On 5/25/21 2:37 PM, Babu Moger wrote:
> My suspicion at this point is towards the selftest tool protection_keys.c.
> I will keep looking. Any feedback would be much appreciated to debug further.
The pkey selftest code that pokes at the signal stack is rather hackish.
If I had to guess, I'd suspect that PKRU ends up in there in a slightly
different place than on Intel CPUs.
One oddity is that xfeatures seems to lose its pkey bit somewhere:
> protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
> protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
The only legitimate way that can happen (on Intel at least) is an XRSTOR
that brings PKRU back to the init state. That would destroy all
meaningful PKRU state, unless PKRU=0, which it almost never does on Linux.
What values do PKRU and the shadow have when the test fails? Is PKRU 0?
Any idea how xfeatures&0x200 got clear?
On 5/25/21 5:18 PM, Dave Hansen wrote:
> On 5/25/21 2:37 PM, Babu Moger wrote:
>> My suspicion at this point is towards the selftest tool protection_keys.c.
>> I will keep looking. Any feedback would be much appreciated to debug further.
>
> The pkey selftest code that pokes at the signal stack is rather hackish.
> If I had to guess, I'd suspect that PKRU ends up in there in a slightly
> different place than on Intel CPUs.
>
> One oddity is that xfeatures seems to lose its pkey bit somewhere:
>
>> protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
>> protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
>
> The only legitimate way that can happen (on Intel at least) is an XRSTOR
> that brings PKRU back to the init state. That would destroy all
> meaningful PKRU state, unless PKRU=0, which it almost never does on Linux.
>
> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
> Any idea how xfeatures&0x200 got clear?
I did observe that PKRU is 0 right before the failure.
Shouldn't this still be a valid value?
On 5/25/21 3:22 PM, Dave Kleikamp wrote:
> On 5/25/21 5:18 PM, Dave Hansen wrote:
>> What values do PKRU and the shadow have when the test fails?? Is PKRU 0?
>> ? Any idea how xfeatures&0x200 got clear?
>
> I did observe that PKRU is 0 right before the failure.
>
> Shouldn't this still be a valid value?
It's architecturall *valid* in the hardware for sure, but nothing in the
tests or the kernel should ever set PKRU=0. The selftest is noticing
that it ends up at a value that's entirely unexpected and properly
bugging out.
The reason I'm suspecting an XRSTOR is that the kernel always calls
XRSTOR(RFBM=-1)
setting all of the bits in the Requested Feature BitMap (RFBM). If
RFBM[X]=1 and XSTATE_BV[i]=0 (XSTATE_BV is 'xfeatures' in the traces),
then the state component is set to its initial configuration, which for
PKRU is 0.
That's why I'm asking how xfeatures&0x200 got clear.
On 5/25/21 5:18 PM, Dave Hansen wrote:
> On 5/25/21 2:37 PM, Babu Moger wrote:
>> My suspicion at this point is towards the selftest tool protection_keys.c.
>> I will keep looking. Any feedback would be much appreciated to debug further.
>
> The pkey selftest code that pokes at the signal stack is rather hackish.
> If I had to guess, I'd suspect that PKRU ends up in there in a slightly
> different place than on Intel CPUs.
You mean the offsets can be different? Not sure how to figure that out.
Let me take a look.
>
> One oddity is that xfeatures seems to lose its pkey bit somewhere:
Yes. I noticed that. I did not see that happening on Intel box where test
runs successfully.
>
>> protection_keys-17350 [035] 59275.834197: x86_fpu_copy_src: x86/fpu: 0xffff93d7595e2dc0 load: 0 xfeatures: 202 xcomp_bv: 8000000000000207
>> protection_keys-17350 [035] 59275.834197: x86_fpu_copy_dst: x86/fpu: 0xffff93d722877800 load: 0 xfeatures: 2 xcomp_bv: 8000000000000207
>
> The only legitimate way that can happen (on Intel at least) is an XRSTOR
> that brings PKRU back to the init state. That would destroy all
> meaningful PKRU state, unless PKRU=0, which it almost never does on Linux.
>
> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
It goes back to default value 0x55555554. The test is expecting it to be
0. Printed them below.
test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow:
0000000000000000
protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
`pkey_reg == shadow_pkey_reg' failed.
> Any idea how xfeatures&0x200 got clear?
Printed all the flags while in __switch_to, the header flags and CR4 flags
appears to be intact. Dont know how the feature 0x200 got cleared. Let me
check if XRSTOR is coming into play here.
On 5/25/21 5:03 PM, Babu Moger wrote:
>> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
> It goes back to default value 0x55555554. The test is expecting it to be
> 0. Printed them below.
>
> test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow:
> 0000000000000000
> protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
> `pkey_reg == shadow_pkey_reg' failed.
That's backwards (shadow vs pkru) from what I was expecting.
Can you turn on all the debuging?
Just compile with -DDEBUG_LEVEL=5
On 5/25/21 7:20 PM, Dave Hansen wrote:
> On 5/25/21 5:03 PM, Babu Moger wrote:
>>> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
>> It goes back to default value 0x55555554. The test is expecting it to be
>> 0. Printed them below.
>>
>> test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow:
>> 0000000000000000
>> protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
>> `pkey_reg == shadow_pkey_reg' failed.
>
> That's backwards (shadow vs pkru) from what I was expecting.
>
> Can you turn on all the debuging?
>
> Just compile with -DDEBUG_LEVEL=5
>
Copied the logs at https://pastebin.com/gtQiHg8Q
On 5/26/21 10:25 AM, Babu Moger wrote:
>
>
> On 5/25/21 7:20 PM, Dave Hansen wrote:
>> On 5/25/21 5:03 PM, Babu Moger wrote:
>>>> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
>>> It goes back to default value 0x55555554. The test is expecting it to be
>>> 0. Printed them below.
>>>
>>> test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow:
>>> 0000000000000000
>>> protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
>>> `pkey_reg == shadow_pkey_reg' failed.
>>
>> That's backwards (shadow vs pkru) from what I was expecting.
>>
>> Can you turn on all the debuging?
>>
>> Just compile with -DDEBUG_LEVEL=5
>>
>
>
> Copied the logs at https://pastebin.com/gtQiHg8Q
I think this failed early due to the debug code. I have a patch that
gets around this.
On 5/26/21 8:25 AM, Babu Moger wrote:
> On 5/25/21 7:20 PM, Dave Hansen wrote:
>> On 5/25/21 5:03 PM, Babu Moger wrote:
>>>> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
>>> It goes back to default value 0x55555554. The test is expecting it to be
>>> 0. Printed them below.
>>>
>>> test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow:
>>> 0000000000000000
>>> protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
>>> `pkey_reg == shadow_pkey_reg' failed.
>> That's backwards (shadow vs pkru) from what I was expecting.
>>
>> Can you turn on all the debuging?
>>
>> Just compile with -DDEBUG_LEVEL=5
>
> Copied the logs at https://pastebin.com/gtQiHg8Q
Well, it's a bit backwards from what I'm expecting. The PKRU=0 value
*WAS* legitimate because all of the pkeys got allocated and their
disable bits cleared.
I think Andy was close when he was blaming:
> static inline void write_pkru(u32 pkru)
> {
...
> pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
...
> if (pk)
> pk->pkru = pkru;
> __write_pkru(pkru);
> }
But that can't be it because PKRU ended up with 0x55555554. Something
must have been writing 'init_pkru_value'.
switch_fpu_finish() does that:
> static inline void switch_fpu_finish(struct fpu *new_fpu)
> {
> u32 pkru_val = init_pkru_value;
...
> if (current->mm) {
> pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
> if (pk)
> pkru_val = pk->pkru;
> }
> __write_pkru(pkru_val);
...
> }
If 'new_fpu' had XSTATE_BV[PKRU]=0 then we'd have pk=NULL and 'pkru_val'
would still have 'init_pkru_value'. *Then*, we'd have a shadow=0x0 and
pkru=0x55555554. It would also only trigger if the hardware has an init
tracker that fires when wrpkru(0). Intel doesn't do that. AMD must.
Anyway, I need to think about this a bit more. But, an entirely
guaranteed to be 100% untested patch is attached. I'm *NOT* confident
this is the right fix.
I don't have much AMD hardware laying around, so testing would be
appreciated.
On 5/26/21 11:06 AM, Dave Hansen wrote:
> On 5/26/21 8:25 AM, Babu Moger wrote:
>> On 5/25/21 7:20 PM, Dave Hansen wrote:
>>> On 5/25/21 5:03 PM, Babu Moger wrote:
>>>>> What values do PKRU and the shadow have when the test fails? Is PKRU 0?
>>>> It goes back to default value 0x55555554. The test is expecting it to be
>>>> 0. Printed them below.
>>>>
>>>> test_ptrace_of_child()::1346, pkey_reg: 0x0000000055555554 shadow:
>>>> 0000000000000000
>>>> protection_keys_64: pkey-helpers.h:127: _read_pkey_reg: Assertion
>>>> `pkey_reg == shadow_pkey_reg' failed.
>>> That's backwards (shadow vs pkru) from what I was expecting.
>>>
>>> Can you turn on all the debuging?
>>>
>>> Just compile with -DDEBUG_LEVEL=5
>>
>> Copied the logs at https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpastebin.com%2FgtQiHg8Q&data=04%7C01%7Cbabu.moger%40amd.com%7Cf35e0082b0f44650045408d920602c08%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637576419688153335%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&sdata=lkJrEo9EJFhfQcOvS%2Be8gLf0GuqZSWGQw2omPZ2Ehb0%3D&reserved=0
>
> Well, it's a bit backwards from what I'm expecting. The PKRU=0 value
> *WAS* legitimate because all of the pkeys got allocated and their
> disable bits cleared.
>
> I think Andy was close when he was blaming:
>
>> static inline void write_pkru(u32 pkru)
>> {
> ...
>> pk = get_xsave_addr(¤t->thread.fpu.state.xsave, XFEATURE_PKRU);
> ...
>> if (pk)
>> pk->pkru = pkru;
>> __write_pkru(pkru);
>> }
>
> But that can't be it because PKRU ended up with 0x55555554. Something
> must have been writing 'init_pkru_value'.
>
> switch_fpu_finish() does that:
Yes, I have noticed switch_fpu_finish writing init_pkru_value sometimes.
But, I was not sure why that was happening..
>
>> static inline void switch_fpu_finish(struct fpu *new_fpu)
>> {
>> u32 pkru_val = init_pkru_value;
> ...
>> if (current->mm) {
>> pk = get_xsave_addr(&new_fpu->state.xsave, XFEATURE_PKRU);
>> if (pk)
>> pkru_val = pk->pkru;
>> }
>> __write_pkru(pkru_val);
> ...
>> }
>
> If 'new_fpu' had XSTATE_BV[PKRU]=0 then we'd have pk=NULL and 'pkru_val'
> would still have 'init_pkru_value'. *Then*, we'd have a shadow=0x0 and
> pkru=0x55555554. It would also only trigger if the hardware has an init
> tracker that fires when wrpkru(0). Intel doesn't do that. AMD must.
Ok. I will check with hardware guys here about this behavior.
>
> Anyway, I need to think about this a bit more. But, an entirely
> guaranteed to be 100% untested patch is attached. I'm *NOT* confident
> this is the right fix.
>
> I don't have much AMD hardware laying around, so testing would be
> appreciated.
>
Yes. Patch fixes problem on AMD. Also tested on Intel box to make sure it
does not cause any regression there. It does work fine there as well.
Thanks for the patch.