2019-12-03 09:03:55

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

The current XCHECK_SZ macro warns if the XFEATURE size reported
by CPUID does not match the size of kernel structure. However, depending
on the hardware implementation, CPUID can report the XSAVE state size
larger than the size of C structures defined for each of the XSAVE state
due to padding. Such case should be safe and should not need to generate
warning message.

Therefore, change the logic to warn only when the CPUID reported size is
less than then size of C structure.

Fixes: ef78f2a4bf84 ("x86/fpu: Check CPU-provided sizes against struct declarations")
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Thomas Lendacky <[email protected]>
Signed-off-by: Suravee Suthikulpanit <[email protected]>
---
arch/x86/kernel/fpu/xstate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index e5cb67d..f002115 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -523,7 +523,7 @@ static void __xstate_dump_leaves(void)

#define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \
if ((nr == nr_macro) && \
- WARN_ONCE(sz != sizeof(__struct), \
+ WARN_ONCE(sz < sizeof(__struct), \
"%s: struct is %zu bytes, cpu state %d bytes\n", \
__stringify(nr_macro), sizeof(__struct), sz)) { \
__xstate_dump_leaves(); \
--
1.8.3.1


Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

On 2019-12-03 04:01:28 [-0500], Suravee Suthikulpanit wrote:
> The current XCHECK_SZ macro warns if the XFEATURE size reported
> by CPUID does not match the size of kernel structure. However, depending
> on the hardware implementation, CPUID can report the XSAVE state size
> larger than the size of C structures defined for each of the XSAVE state
> due to padding. Such case should be safe and should not need to generate
> warning message.

Do you have an example which CPU generation and which feature?

We don't use this these structs in the kernel and the xsave layout is
dynamic based on the memory requirements reported by the CPU.
But we have a warning which complains about different sizes. Now you
change the warning that it is okay if the CPU reports that more memory
is needed than we expect. This looks wrong. The other way around would
be "okay" but this just renders the warning useless.

> Therefore, change the logic to warn only when the CPUID reported size is
> less than then size of C structure.
>
> Fixes: ef78f2a4bf84 ("x86/fpu: Check CPU-provided sizes against struct declarations")
> Cc: Thomas Gleixner <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Borislav Petkov <[email protected]>
> Cc: H. Peter Anvin <[email protected]>
> Cc: Sebastian Andrzej Siewior <[email protected]>
> Cc: Dave Hansen <[email protected]>
> Cc: Thomas Lendacky <[email protected]>
> Signed-off-by: Suravee Suthikulpanit <[email protected]>
> ---
> arch/x86/kernel/fpu/xstate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
> index e5cb67d..f002115 100644
> --- a/arch/x86/kernel/fpu/xstate.c
> +++ b/arch/x86/kernel/fpu/xstate.c
> @@ -523,7 +523,7 @@ static void __xstate_dump_leaves(void)
>
> #define XCHECK_SZ(sz, nr, nr_macro, __struct) do { \
> if ((nr == nr_macro) && \
> - WARN_ONCE(sz != sizeof(__struct), \
> + WARN_ONCE(sz < sizeof(__struct), \
> "%s: struct is %zu bytes, cpu state %d bytes\n", \
> __stringify(nr_macro), sizeof(__struct), sz)) { \
> __xstate_dump_leaves(); \

Sebastian

2019-12-03 17:29:09

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

On 12/3/19 1:01 AM, Suravee Suthikulpanit wrote:
> The current XCHECK_SZ macro warns if the XFEATURE size reported
> by CPUID does not match the size of kernel structure. However, depending
> on the hardware implementation, CPUID can report the XSAVE state size
> larger than the size of C structures defined for each of the XSAVE state
> due to padding.

We have existing architecture for padding. See xfeature_is_aligned(),
for instance. Are you saying that there are implementations out there
that do padding which is not otherwise enumerated and that they do it
within the size of the enumerated state?

> Such case should be safe and should not need to generate warning
> message.

I've seen these error messages trip before, but only on pre-production
processors with goofy microcode. I'd be really suspicious that this is
just papering over a processor issue. Or, that perhaps the compacted
form works but the standard form is broken somehow.

2019-12-06 08:17:53

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

Dave,

On 12/4/19 12:27 AM, Dave Hansen wrote:
> On 12/3/19 1:01 AM, Suravee Suthikulpanit wrote:
>> The current XCHECK_SZ macro warns if the XFEATURE size reported
>> by CPUID does not match the size of kernel structure. However, depending
>> on the hardware implementation, CPUID can report the XSAVE state size
>> larger than the size of C structures defined for each of the XSAVE state
>> due to padding.
>
> We have existing architecture for padding. See xfeature_is_aligned(),
> for instance. Are you saying that there are implementations out there
> that do padding which is not otherwise enumerated and that they do it
> within the size of the enumerated stat
Yes, the implementation includes the padding size within the size of
the enumerated state. This results in the reported size larger than
the amount needed by the feature.

>> Such case should be safe and should not need to generate warning
>> message.
>
> I've seen these error messages trip before, but only on pre-production
> processors with goofy microcode. I'd be really suspicious that this is
> just papering over a processor issue. Or, that perhaps the compacted
> form works but the standard form is broken somehow.

I have verified with the HW folks and the have confirmed that this is
to be expected.

Thanks,
Suravee

2019-12-06 08:18:12

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

Sebastian,

On 12/3/19 5:34 PM, Sebastian Andrzej Siewior wrote:
> On 2019-12-03 04:01:28 [-0500], Suravee Suthikulpanit wrote:
>> The current XCHECK_SZ macro warns if the XFEATURE size reported
>> by CPUID does not match the size of kernel structure. However, depending
>> on the hardware implementation, CPUID can report the XSAVE state size
>> larger than the size of C structures defined for each of the XSAVE state
>> due to padding. Such case should be safe and should not need to generate
>> warning message.
> Do you have an example which CPU generation and which feature?

This is observed with one of the newly supported features in the upcoming product.

> We don't use this these structs in the kernel and the xsave layout is
> dynamic based on the memory requirements reported by the CPU.
> But we have a warning which complains about different sizes. Now you
> change the warning that it is okay if the CPU reports that more memory
> is needed than we expect. This looks wrong. The other way around would
> be "okay" but this just renders the warning useless.

My point is it should be safe for the hardware to save data more than or equal
to the size used by the kernel. However, it is not okay if the hardware
saves data less than the amount used by kernel because that means the data
would lost at the time of restore.

However, in this case the hardware reported size contains data + padding.

Thanks,
Suravee

2019-12-06 15:31:06

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

On 12/6/19 12:14 AM, Suravee Suthikulpanit wrote:
> On 12/4/19 12:27 AM, Dave Hansen wrote:
>> On 12/3/19 1:01 AM, Suravee Suthikulpanit wrote:
>>> The current XCHECK_SZ macro warns if the XFEATURE size reported
>>> by CPUID does not match the size of kernel structure. However, depending
>>> on the hardware implementation, CPUID can report the XSAVE state size
>>> larger than the size of C structures defined for each of the XSAVE state
>>> due to padding.
>>
>> We have existing architecture for padding.  See xfeature_is_aligned(),
>> for instance.  Are you saying that there are implementations out there
>> that do padding which is not otherwise enumerated and that they do it
>> within the size of the enumerated stat
> Yes, the implementation includes the padding size within the size of
> the enumerated state. This results in the reported size larger than
> the amount needed by the feature.

I don't think we've ever had XSAVE state that differed in size between
implementations. This kind of thing ensures that we can't have any
statically-defined inspection into the XSAVE state.

It also increases the amount of blind trust that we have in the CPU
implementations. However, those warnings were specifically added at
Ingo's behest (IIRC) to reduce our blind trust in the CPU.

>>> Such case should be safe and should not need to generate warning
>>> message.
>>
>> I've seen these error messages trip before, but only on pre-production
>> processors with goofy microcode.  I'd be really suspicious that this is
>> just papering over a processor issue.  Or, that perhaps the compacted
>> form works but the standard form is broken somehow.
>
> I have verified with the HW folks and the have confirmed that this is
> to be expected.

From a review perspective, I'd really appreciate being able to have a
concrete discussion about exactly which state this is and exactly how
much padding we are talking about and *why* the existing padding
architecture can't be used. I'd also want guarantees about this state
not getting used for any real state, *ever*.

I'm not sure how we can have those conversations without more details,
though. Would you be able, for instance, to have a private discussion
with the x86 maintainers? That discussion could, understandably,
exclude folks from Intel.

2019-12-11 05:26:55

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

Dave

On 12/6/19 10:28 PM, Dave Hansen wrote:
> On 12/6/19 12:14 AM, Suravee Suthikulpanit wrote:
>> On 12/4/19 12:27 AM, Dave Hansen wrote:
>>> On 12/3/19 1:01 AM, Suravee Suthikulpanit wrote:
>>>> The current XCHECK_SZ macro warns if the XFEATURE size reported
>>>> by CPUID does not match the size of kernel structure. However, depending
>>>> on the hardware implementation, CPUID can report the XSAVE state size
>>>> larger than the size of C structures defined for each of the XSAVE state
>>>> due to padding.
>>>
>>> We have existing architecture for padding.  See xfeature_is_aligned(),
>>> for instance.  Are you saying that there are implementations out there
>>> that do padding which is not otherwise enumerated and that they do it
>>> within the size of the enumerated stat
>> Yes, the implementation includes the padding size within the size of
>> the enumerated state. This results in the reported size larger than
>> the amount needed by the feature.

Actually, please allow me clarify my understanding for this part.

When you referred to "the existing architecture for padding", IIUC,
that's the XSAVE state size, offset, and alignment of each extended
feature reported by the CPUID Fn 0Dh E[A|B|C]X. By "padding", do you
mean the additional area included as part of alignment?

> I don't think we've ever had XSAVE state that differed in size between
> implementations. This kind of thing ensures that we can't have any
> statically-defined inspection into the XSAVE state.
>
> It also increases the amount of blind trust that we have in the CPU
> implementations. However, those warnings were specifically added at
> Ingo's behest (IIRC) to reduce our blind trust in the CPU.
>

I am not quite sure what you meant by "statically-defined inspection" and
"blind trust".

>>>> Such case should be safe and should not need to generate warning
>>>> message.
>>>
>>> I've seen these error messages trip before, but only on pre-production
>>> processors with goofy microcode.  I'd be really suspicious that this is
>>> just papering over a processor issue.  Or, that perhaps the compacted
>>> form works but the standard form is broken somehow.
>>
>> I have verified with the HW folks and the have confirmed that this is
>> to be expected.
>
> From a review perspective, I'd really appreciate being able to have a
> concrete discussion about exactly which state this is and exactly how
> much padding we are talking about and *why* the existing padding
> architecture can't be used. I'd also want guarantees about this state
> not getting used for any real state, *ever*.

Here is the warning message:

------------[ cut here ]------------
XFEATURE_PKRU: struct is 8 bytes, cpu state 32 bytes
WARNING: CPU: 0 PID: 0 at arch/x86/kernel/fpu/xstate.c:558 fpu__init_system_xstate+0x2b5/0x8ea
.......

In this case, the feature is PKRU (feature enum 9), and CPUID 0Dh instruction reports
eax:0x20 (size), ebx:0x340 (offset), ecx:0(aligned). The warning is due to the 32-byte size
is more than the 8 bytes required for struct pkru_state.

What I have been told (by HW folks) is that the hardware reports 32 bytes for PKRU feature
to account for additional padding (24 bytes) required to maintain offset for the XSAVE data
in compact form where:

offset of the next component = offset of current component +
size of current component

In this case, the hardware adds the padding needed before the offset of the subsequent
feature into the PKRU xsave state size.

Please correct me if I am wrong, but I believe this is similar to the case
mentioned in the commit ef78f2a4bf84 ('x86/fpu: Check CPU-provided sizes against struct declarations'),
where it mentions inconsistency b/w the MPX 'bndcsr' state and the C structures.

However, my point is that it should not need to warn if the xsave size is greater than
the size of the C structure.

Thanks,
Suravee

2019-12-11 14:15:29

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

On 12/10/19 9:24 PM, Suravee Suthikulpanit wrote:
>>> Yes, the implementation includes the padding size within the size of
>>> the enumerated state. This results in the reported size larger than
>>> the amount needed by the feature.
>
> Actually, please allow me clarify my understanding for this part.
>
> When you referred to "the existing architecture for padding", IIUC,
> that's the XSAVE state size, offset, and alignment of each extended
> feature reported by the CPUID Fn 0Dh E[A|B|C]X. By "padding", do you
> mean the additional area included as part of alignment?

I was specifically thinking of this nugget in Intel's SDM, Volume 1, 13.2:

> The value returned by ECX[1] indicates the alignment of state
> component i when the compacted format of the extended region of an
> XSAVE area is used (see Section 13.4.3). If ECX[1] returns 0, state
> component i is located immediately following the preceding state
> component; if ECX[1] returns 1, state component i is located on the
> next 64-byte boundary following the preceding state component.

Essentially, if an implementation needs state alignment or (up to) 64
bytes of padding, it could use this existing architecture for it.

>> I don't think we've ever had XSAVE state that differed in size between
>> implementations.  This kind of thing ensures that we can't have any
>> statically-defined inspection into the XSAVE state.
>>
>> It also increases the amount of blind trust that we have in the CPU
>> implementations.  However, those warnings were specifically added at
>> Ingo's behest (IIRC) to reduce our blind trust in the CPU.
>
> I am not quite sure what you meant by "statically-defined inspection"

Previously, it was possible to depend on a constant offset for a state
component in an XSAVE buffer. You could literally say "PKRU is at
offset 0x1234" (or whatever). All implementations had PKRU in the same
spot (without the compacted format).

BTW, I'm not saying this is a problem. The documented XSAVE
architecture itself has no provisions for the state being accessed like
this.

> and "blind trust".

I think I actually parroted the term from Ingo:

https://lore.kernel.org/lkml/[email protected]/

But the point is that with your patch, the kernel becomes less strict
about what we demand from the CPU. That might lead to letting CPU bugs
creep in that might cause real problems later.

> Please correct me if I am wrong, but I believe this is similar to the
> case mentioned in the commit ef78f2a4bf84 ('x86/fpu: Check
> CPU-provided sizes against struct declarations'), where it mentions
> inconsistency b/w the MPX 'bndcsr' state and the C structures.

Yep, but I fixed that by padding the C structure, not silencing the
warning. Also *ALL* MPX implementations have had the same size for that
state.

If we go forward with this patch, we should also removing the
pad_to_64_bytes[] from 'struct mpx_bndcsr_state'.

> What I have been told (by HW folks) is that the hardware reports 32 bytes for PKRU feature
> to account for additional padding (24 bytes) required to maintain offset for the XSAVE data
> in compact form where:
>
> offset of the next component = offset of current component +
> size of current component
>
> In this case, the hardware adds the padding needed before the offset of the subsequent
> feature into the PKRU xsave state size.

Huh, that's exactly why I thought we added ECX[1]. to the architecture.

2019-12-12 06:54:04

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

Dave,

On 12/11/2019 9:13 PM, Dave Hansen wrote:
> On 12/10/19 9:24 PM, Suravee Suthikulpanit wrote:
>> The value returned by ECX[1] indicates the alignment of state
>> component i when the compacted format of the extended region of an
>> XSAVE area is used (see Section 13.4.3). If ECX[1] returns 0, state
>> component i is located immediately following the preceding state
>> component; if ECX[1] returns 1, state component i is located on the
>> next 64-byte boundary following the preceding state component.
>
> Essentially, if an implementation needs state alignment or (up to) 64
> bytes of padding, it could use this existing architecture for it.

Let me check with the HW folks and get back to you on this.

>> Please correct me if I am wrong, but I believe this is similar to the
>> case mentioned in the commit ef78f2a4bf84 ('x86/fpu: Check
>> CPU-provided sizes against struct declarations'), where it mentions
>> inconsistency b/w the MPX 'bndcsr' state and the C structures.
>
> Yep, but I fixed that by padding the C structure, not silencing the
> warning. Also *ALL* MPX implementations have had the same size for that
> state.

Ah I see. But that solution works because the MPX feature is only on Intel.
In case of PKRU, it seems that two hardware implementations have different
padding size for PKRU state. IIUC, Intel has padding of 4 bytes based on
the following struct.

/*
* State component 9: 32-bit PKRU register. The state is
* 8 bytes long but only 4 bytes is used currently.
*/
struct pkru_state {
u32 pkru;
u32 pad;
} __packed;

Therefore, I agree with you that we might need to use the ECX[1].
Let me confirm this and get back to you.

Thanks,
Suravee

2020-01-13 09:56:35

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

Dave,

On 12/12/19 1:52 PM, Suravee Suthikulpanit wrote:
>
> On 12/11/2019 9:13 PM, Dave Hansen wrote:
>> On 12/10/19 9:24 PM, Suravee Suthikulpanit wrote:
>>> The value returned by ECX[1] indicates the alignment of state
>>> component i when the compacted format of the extended region of an
>>> XSAVE area is used (see Section 13.4.3). If ECX[1] returns 0, state
>>> component i is located immediately following the preceding state
>>> component; if ECX[1] returns 1, state component i is located on the
>>> next 64-byte boundary following the preceding state component.
>>
>> Essentially, if an implementation needs state alignment or (up to) 64
>> bytes of padding, it could use this existing architecture for it.
>
> Let me check with the HW folks and get back to you on this.

I don't think the 64-byte aligned bit alone would help with the PKRU size mismatch warning in
check_xstate_against_struct() that we are seeing.

Here is the description from the section 13.4.3, for the compacted format:

* Let j, 2 ≤ j < i, be the greatest value such that XCOMP_BV[j] = 1. Then locationI is
determined by the following values: locationJ; sizeJ, as enumerated in CPUID.(EAX=0DH,ECX=j):EAX;
and the value of alignI, as enumerated in CPUID.(EAX=0DH,ECX=i):ECX[1]:

— If alignI = 0, locationI = locationJ + sizeJ. (This item implies that state component i is located immediately
following the preceding state component whose bit is set in XCOMP_BV.)

— If alignI = 1, locationI = ceiling(locationJ + sizeJ, 64). (This item implies that state component i is located on
the next 64-byte boundary following the preceding state component whose bit is set in XCOMP_BV.)

According to the description above, since we need to add padding for PKRU state, if we set the aligned bit of the
subsequent feature after PKRU state, the current Linux kernel would still generate the warning for PKRU since the
check_xstate_against_struct() does not currently take into account of the align bit (which is only for compact mode).

It would also generate another warning in do_extra_xstate_size_checks() due to XSTATE_WARN_ON(paranoid_xstate_size !=
fpu_kernel_xstate_size) (see https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/fpu/xstate.c#L608). Note
that the paranoid_xstate_size takes into account of the 64-byte alignment in the size calculation, but the
fpu_kernel_xstate_size takes the size reported by CPUID function 0DH, sub-function 0 or 1. So, this logic might need to
change as well.

As for the PKRU size, it should really be 4 bytes (regardless of the alignment) since that's the size of actual PKRU
register, but it seems that the 4-byte padding in struct pkru_state was added to match the Intel's CPUID-report size of
8. Now we also have AMD hardware reporting PKRU size of 32 (also mainly for padding). We shouldn't keep the same warning
logic in check_xstate_against_struct().

Please let me know if I am still missing anything.

Thanks,
Suravee

2020-01-15 16:55:51

by Dave Hansen

[permalink] [raw]
Subject: Re: [PATCH] x86/fpu: Warn only when CPU-provided sizes less than struct declaration

On 1/13/20 1:54 AM, Suravee Suthikulpanit wrote:
>
> It would also generate another warning in do_extra_xstate_size_checks()
> due to XSTATE_WARN_ON(paranoid_xstate_size != fpu_kernel_xstate_size)
> (see
> https://elixir.bootlin.com/linux/latest/source/arch/x86/kernel/fpu/xstate.c#L608).

I'd love to see the patch that fixes this. :)

Or, is it already fixed by this?

https://lore.kernel.org/lkml/[email protected]/

2020-04-29 19:19:55

by Dave Hansen

[permalink] [raw]
Subject: AMD / Memory Protection Keys

The docs for AMD's implementation of protection keys showed up[1].
Welcome to the party!

It all looks fine, except for this nugget:

The MPK mechanism is ignored in the following cases:
... for pages marked in the paging structures as read-only
(R/W=0) or as supervisor addresses (U/S=0)

That R/W=0 would mean that you can't access-disable read-only pages,
which seems a bit goofy. It's certainly a feature that I could imagine
folks wanting to have. Read-only pages might happen in unexpected
places for things like mmap(PROT_WRITE)'d files or even CoW pages after
fork().

Is this an error in the docs?

1. https://www.amd.com/system/files/TechDocs/24593.pdf

2020-05-01 23:52:51

by Suthikulpanit, Suravee

[permalink] [raw]
Subject: Re: AMD / Memory Protection Keys

Dave,

On 4/30/20 2:18 AM, Dave Hansen wrote:
> The docs for AMD's implementation of protection keys showed up[1].
> Welcome to the party!
>
> It all looks fine, except for this nugget:
>
> The MPK mechanism is ignored in the following cases:
> ... for pages marked in the paging structures as read-only
> (R/W=0) or as supervisor addresses (U/S=0)
>
> That R/W=0 would mean that you can't access-disable read-only pages,
> which seems a bit goofy. It's certainly a feature that I could imagine
> folks wanting to have. Read-only pages might happen in unexpected
> places for things like mmap(PROT_WRITE)'d files or even CoW pages after
> fork().
>
> Is this an error in the docs?
>
> 1. https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.amd.com%2Fsystem%2Ffiles%2FTechDocs%2F24593.pdf&amp;data=02%7C01%7Csuravee.suthikulpanit%40amd.com%7C9ea21438bce6488589a308d7ec720d0d%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637237847630823928&amp;sdata=LZLQjvlVT5xSnjyRt0alN0lHleh8VD7Sxx53RL1UrX0%3D&amp;reserved=0
>

Yes, we have checked with the hardware team and this is a mistake in the documentation.
The access disable bit (ADi) bit does apply to read-only pages. We will be correcting
the documentation. Thanks for pointing it out :)

Suravee