2021-06-05 13:14:35

by Jiashuo Liang

[permalink] [raw]
Subject: arch_set_user_pkey_access only works on the current task_struct

Hi,

I am learning the kernel implementation of the x86 PKU feature. I find the
arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
use its first parameter. So it is perhaps a bug?

The arch_set_user_pkey_access function is supposed to set the PKRU register
for the task_struct specified by its first parameter tsk. But it is only
implemented for the current task_struct.

Fortunately, it has been called only with current task_struct in the kernel
code, so it appears to be okay. However, it can introduce bugs in the
future because people may expect it working on other task_struct.

This commit seems to be related: b79daf8589921 ("x86/mm/pkeys: Fix compact
mode by removing protection keys' XSAVE buffer manipulation").

Thank you!
liangjs


2021-06-07 17:55:43

by Dave Hansen

[permalink] [raw]
Subject: Re: arch_set_user_pkey_access only works on the current task_struct

On 6/5/21 6:10 AM, Jiashuo Liang wrote:
> I am learning the kernel implementation of the x86 PKU feature. I find the
> arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
> use its first parameter. So it is perhaps a bug?

I wouldn't really call it a bug. But, yes, it is something we should
clean up.

2021-06-08 03:20:21

by Jiashuo Liang

[permalink] [raw]
Subject: Re: arch_set_user_pkey_access only works on the current task_struct

On Mon, 2021-06-07 at 10:52 -0700, Dave Hansen wrote:
> On 6/5/21 6:10 AM, Jiashuo Liang wrote:
> > I am learning the kernel implementation of the x86 PKU feature. I find the
> > arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
> > use its first parameter. So it is perhaps a bug?
>
> I wouldn't really call it a bug.  But, yes, it is something we should
> clean up.

Should we remove the tsk parameter, or allow it to change the PKRU of tsk?

By the way, we are calling write_pkru, which changes both the CPU's PKRU
and the xsave one. Why is this necessary?

If I want to change PKRU of a task_struct other than current, do I still
need to call __write_pkru?

Thank you!
liangjs

2021-06-08 14:58:27

by Dave Hansen

[permalink] [raw]
Subject: Re: arch_set_user_pkey_access only works on the current task_struct

On 6/7/21 8:16 PM, liangjs wrote:
> On Mon, 2021-06-07 at 10:52 -0700, Dave Hansen wrote:
>> On 6/5/21 6:10 AM, Jiashuo Liang wrote:
>>> I am learning the kernel implementation of the x86 PKU feature. I find the
>>> arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
>>> use its first parameter. So it is perhaps a bug?
>> I wouldn't really call it a bug.  But, yes, it is something we should
>> clean up.
> Should we remove the tsk parameter, or allow it to change the PKRU of tsk?

Probably just remove the parameter.

By the way, there's a big PKRU rework in progress. It might be best to
wait until the dust settles to poke at this.

> By the way, we are calling write_pkru, which changes both the CPU's PKRU
> and the xsave one. Why is this necessary?

PKRU affects kernel accesses to user memory. That means that you can't
run the *kernel* with an out-of-date PKRU, thus the write_pkru().

Returning to userspace blindly restores the *WHOLE* XSAVE buffer to the
regsisters. If you don't update the XSAVE buffer, the write_pkru() will
be overwritten before returning to userspace.

> If I want to change PKRU of a task_struct other than current, do I still
> need to call __write_pkru?

No. You can't do that. Seriously.

The protection keys architecture really doesn't support off-thread
manipulation of PKRU. Imagine you want to mask a bit out of PKRU, you
do the following to make key 2 memory accessible and writable:

reg = read_pkru();
reg &= 0x30;
write_pkru(reg);

Now, imagine that you tried to interrupt this poor task in the middle of
that operation. Let's say you try to *set* the bits for key 4, effectively:

pkru |= 0x300;

Now you try to do that key-4 business with an IPI.

reg = read_pkru(); // PKRU=0x30
reg &= 0x30;
-> IPI
ipireg = read_pkru(); // PKRU=0x0
ipireg |= 0x300;
write_pkru(ipireg); // PKRU=0x300
write_pkru(reg); // PKRU=0x0

You *LOST* the update from the IPI.

2021-06-08 19:31:25

by Thomas Gleixner

[permalink] [raw]
Subject: Re: arch_set_user_pkey_access only works on the current task_struct

On Tue, Jun 08 2021 at 11:16, liangjs wrote:
> On Mon, 2021-06-07 at 10:52 -0700, Dave Hansen wrote:
>> On 6/5/21 6:10 AM, Jiashuo Liang wrote:
>> > I am learning the kernel implementation of the x86 PKU feature. I find the
>> > arch_set_user_pkey_access function in arch/x86/kernel/fpu/xstate.c does not
>> > use its first parameter. So it is perhaps a bug?
>>
>> I wouldn't really call it a bug.  But, yes, it is something we should
>> clean up.
>
> Should we remove the tsk parameter, or allow it to change the PKRU of tsk?
>
> By the way, we are calling write_pkru, which changes both the CPU's PKRU
> and the xsave one. Why is this necessary?

Because PKRU is xstate managed and there is the requirement to keep both
up to to date. There is work in progress to clean this up.

> If I want to change PKRU of a task_struct other than current, do I still
> need to call __write_pkru?

Of course not, but you _cannot_ safely update a different tasks PKRU
value except through ptrace which guarantees that the task is scheduled
out and stays that way until ptrace releases it again.

So tsk != current cannot work which means the function argument can just
go away.

Thanks,

tglx