2024-03-14 17:29:47

by Aruna Ramakrishna

[permalink] [raw]
Subject: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE

This patch is a workaround for a bug where the PKRU value is not
restored to the init value before the signal handler is invoked.

Problem description:
Let's assume there's a multithreaded application that runs untrusted
user code. Each thread has its stack/code protected by a non-zero pkey,
and the PKRU register is set up such that only that particular non-zero
pkey is enabled. Each thread also sets up an alternate signal stack to
handle signals, which is protected by pkey zero. The pkeys man page
documents that the PKRU will be reset to init_pkru when the signal
handler is invoked, which means that pkey zero access will be enabled.
But this reset happens after the kernel attempts to push fpu state
to the alternate stack, which is not (yet) accessible by the kernel,
which leads to a new SIGSEGV being sent to the application, terminating
it.

Enabling both the non-zero pkey (for the thread) and pkey zero (in
userspace) will not work for us. We cannot have the alt stack writeable
by all - the rationale here is that the code running in that thread
(using a non-zero pkey) is untrusted and should not have access to the
alternate signal stack (that uses pkey zero), to prevent the return address
of a function from being changed. The expectation is that kernel should be
able to set up the alternate signal stack and deliver the signal to the
application even if pkey zero is explicitly disabled by the application
(as documented in the pkeys man page). The signal handler accessibility
should not be dictated by whatever PKRU value the thread sets up.

Solution:
The PKRU register is managed by XSAVE, which means the sigframe contents
must match the register contents - which is not the case here. We want
the sigframe to contain the user-defined PKRU value (so that it is
restored correctly from sigcontext) but the actual register must be
reset to init_pkru so that the alt stack is accessible and the signal
can be delivered to the application. It seems that the proper fix here
would be to remove PKRU from the XSAVE framework and manage it
separately, which is quite complicated. As a workaround, this patch does
something like this:

orig_pkru = rdpkru();
wrpkru(init_pkru & orig_pkru);
xsave_to_user_sigframe();
put_user(pkru_sigframe_addr, orig_pkru)

Note:
I think the use_xsave() check in __update_pkru_in_sigframe() is redundant,
but I'm not 100% sure.

Signed-off-by: Aruna Ramakrishna <[email protected]>
Helped-by: Dave Kleikamp <[email protected]>
Helped-by: Prakash Sangappa <[email protected]>
---
arch/x86/include/asm/fpu/xstate.h | 8 ++++
arch/x86/kernel/fpu/internal.h | 7 +---
arch/x86/kernel/fpu/signal.c | 4 +-
arch/x86/kernel/fpu/xstate.c | 13 +++++++
arch/x86/kernel/signal.c | 65 ++++++++++++++++++++++++++++++-
5 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/fpu/xstate.h b/arch/x86/include/asm/fpu/xstate.h
index d4427b88ee12..681d4bb5939c 100644
--- a/arch/x86/include/asm/fpu/xstate.h
+++ b/arch/x86/include/asm/fpu/xstate.h
@@ -112,6 +112,14 @@ void xsaves(struct xregs_state *xsave, u64 mask);
void xrstors(struct xregs_state *xsave, u64 mask);

int xfd_enable_feature(u64 xfd_err);
+void *get_xsave_standard_addr(struct xregs_state *xsave, int xfeature_nr);
+bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
+ struct _fpx_sw_bytes *fx_sw);
+
+static __always_inline __pure bool use_xsave(void)
+{
+ return cpu_feature_enabled(X86_FEATURE_XSAVE);
+}

#ifdef CONFIG_X86_64
DECLARE_STATIC_KEY_FALSE(__fpu_state_size_dynamic);
diff --git a/arch/x86/kernel/fpu/internal.h b/arch/x86/kernel/fpu/internal.h
index dbdb31f55fc7..ab838c9625dd 100644
--- a/arch/x86/kernel/fpu/internal.h
+++ b/arch/x86/kernel/fpu/internal.h
@@ -4,12 +4,7 @@

extern struct fpstate init_fpstate;

-/* CPU feature check wrappers */
-static __always_inline __pure bool use_xsave(void)
-{
- return cpu_feature_enabled(X86_FEATURE_XSAVE);
-}
-
+/* CPU feature check wrapper */
static __always_inline __pure bool use_fxsr(void)
{
return cpu_feature_enabled(X86_FEATURE_FXSR);
diff --git a/arch/x86/kernel/fpu/signal.c b/arch/x86/kernel/fpu/signal.c
index 247f2225aa9f..86fcae8ea141 100644
--- a/arch/x86/kernel/fpu/signal.c
+++ b/arch/x86/kernel/fpu/signal.c
@@ -24,8 +24,8 @@
* Check for the presence of extended state information in the
* user fpstate pointer in the sigcontext.
*/
-static inline bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
- struct _fpx_sw_bytes *fx_sw)
+bool check_xstate_in_sigframe(struct fxregs_state __user *fxbuf,
+ struct _fpx_sw_bytes *fx_sw)
{
int min_xstate_size = sizeof(struct fxregs_state) +
sizeof(struct xstate_header);
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 117e74c44e75..81de5759a71a 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -941,6 +941,19 @@ static void *__raw_xsave_addr(struct xregs_state *xsave, int xfeature_nr)
return (void *)xsave + xfeature_get_offset(xcomp_bv, xfeature_nr);
}

+/*
+ * Given an xstate feature nr, calculate where in the xsave
+ * buffer the state is. The xsave buffer should be in standard
+ * format, not compacted (e.g. user mode signal frames).
+ */
+void *get_xsave_standard_addr(struct xregs_state *xsave, int xfeature_nr)
+{
+ if (WARN_ON_ONCE(!xfeature_enabled(xfeature_nr)))
+ return NULL;
+
+ return (void *)xsave + xstate_offsets[xfeature_nr];
+}
+
/*
* Given the xsave area and a state inside, this function returns the
* address of the state.
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index 31b6f5dddfc2..b52edf9c29e8 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -224,11 +224,52 @@ setup_rt_frame(struct ksignal *ksig, struct pt_regs *regs)
}
}

+static inline int
+__update_pkru_in_sigframe(struct rt_sigframe __user *frame, u32 new_pkru)
+{
+ int err = -EFAULT;
+ struct _fpx_sw_bytes fx_sw;
+ struct pkru_state *pk = NULL;
+ struct sigcontext __user *sc;
+ unsigned long buf;
+
+ if (!cpu_feature_enabled(X86_FEATURE_OSPKE) || !use_xsave())
+ return 0;
+
+ if (!user_read_access_begin(frame, sizeof(*frame)))
+ goto out;
+ sc = (struct sigcontext __user *) &frame->uc.uc_mcontext;
+ unsafe_get_user(buf,
+ (unsigned long __user *)&sc->fpstate,
+ uaccess_end);
+ user_access_end();
+
+ if (unlikely(!check_xstate_in_sigframe((void __user *) buf, &fx_sw)))
+ goto out;
+
+ pk = get_xsave_standard_addr((struct xregs_state __user *) buf,
+ XFEATURE_PKRU);
+ if (!pk || !user_write_access_begin((struct xregs_state __user *) buf,
+ sizeof(struct xregs_state)))
+ goto out;
+ unsafe_put_user(new_pkru, (unsigned int __user *) pk, uaccess_end);
+
+ err = 0;
+
+uaccess_end:
+ user_access_end();
+out:
+ return err;
+}
+
static void
handle_signal(struct ksignal *ksig, struct pt_regs *regs)
{
bool stepping, failed;
struct fpu *fpu = &current->thread.fpu;
+ u32 orig_pkru;
+ u32 init_pkru_snapshot;
+ struct rt_sigframe __user *frame = NULL;

if (v8086_mode(regs))
save_v86_state((struct kernel_vm86_regs *) regs, VM86_SIGNAL);
@@ -264,8 +305,30 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
if (stepping)
user_disable_single_step(current);

+ /*
+ * We must have access to both the current stack and the exception
+ * stack. The exception stack must be accessible by the init PKRU
+ * value. Temporarily set the PKRU to access both. It will be set
+ * properly in fpu__clear_user_states().
+ */
+ orig_pkru = read_pkru();
+ init_pkru_snapshot = pkru_get_init_value();
+ write_pkru(orig_pkru & init_pkru_snapshot);
+
failed = (setup_rt_frame(ksig, regs) < 0);
- if (!failed) {
+ if (failed)
+ write_pkru(orig_pkru);
+ else {
+ /*
+ * Overwrite the PKRU value on the signal frame with the
+ * user-defined value so that it gets restored correctly
+ * from sigcontext.
+ */
+ frame = (struct rt_sigframe __force __user *)(regs->sp);
+ if (__update_pkru_in_sigframe(frame, orig_pkru))
+ pr_info("Failed to reset PKRU value to 0x%x on "
+ "the signal frame\n", orig_pkru);
+
/*
* Clear the direction flag as per the ABI for function entry.
*
--
2.39.3



2024-03-14 17:54:25

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE

On 3/14/24 10:29, Aruna Ramakrishna wrote:
> This patch is a workaround for a bug where the PKRU value is not
> restored to the init value before the signal handler is invoked.

I don't think we should touch this with a ten foot pole without a test
for it in tools/testing/selftests/mm/protection_keys.c.

I'm not sure this is worth the trouble. Protection keys is not a
security feature. This isn't a regression. It's been the behavior
since day one. This really is a feature request for a behavioral
improvement, not a bug fix.

The need for this new feature is highly dependent on the threat model
that it supports. I'm highly dubious that there's a true need to
protect against an attacker with arbitrary write access in the same
address space. We need to have a lot more information there.

I haven't even more than glanced at the code. It looks pretty
unspeakably ugly even at a glance.

2024-03-14 18:15:09

by Aruna Ramakrishna

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE


> On Mar 14, 2024, at 10:54 AM, Dave Hansen <[email protected]> wrote:
>
> On 3/14/24 10:29, Aruna Ramakrishna wrote:
>> This patch is a workaround for a bug where the PKRU value is not
>> restored to the init value before the signal handler is invoked.
>
> I don't think we should touch this with a ten foot pole without a test
> for it in tools/testing/selftests/mm/protection_keys.c.

I’ll add a test case here.

>
> I'm not sure this is worth the trouble. Protection keys is not a
> security feature. This isn't a regression. It's been the behavior
> since day one. This really is a feature request for a behavioral
> improvement, not a bug fix.
>
> The need for this new feature is highly dependent on the threat model
> that it supports. I'm highly dubious that there's a true need to
> protect against an attacker with arbitrary write access in the same
> address space. We need to have a lot more information there.

I thought the PKRU value being reset in the signal handler was supposed to be the default behavior. In which case, this is a bug.

"Signal Handler Behavior
Each time a signal handler is invoked (including nested signals),
the thread is temporarily given a new, default set of protection
key rights that override the rights from the interrupted context.”

(Ref: https://man7.org/linux/man-pages/man7/pkeys.7.html)

I'm not very familiar with protection keys (before I started looking into this issue), so I apologize for misunderstanding.

fpu__clear_user_states() does reset PKRU, but that happens much later in the flow. Before that, the kernel tries to save registers on to the alternate signal stack in setup_rt_frame(), and that fails if the application has explicitly disabled pkey 0 (and the alt stack is protected by pkey 0). This patch attempts to move that reset a little earlier in the flow, so that setup_rt_frame() can succeed.

>
> I haven't even more than glanced at the code. It looks pretty
> unspeakably ugly even at a glance.

I agree with you - no argument there.

But I’m not sure there is a “clean” way to do this. If there is, I’m happy to redo the patch.

Thanks,
Aruna

2024-03-14 18:30:10

by Dave Hansen

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE

On 3/14/24 11:14, Aruna Ramakrishna wrote:
> I thought the PKRU value being reset in the signal handler was supposed
> to be the default behavior. In which case, this is a bug.
>
> "Signal Handler Behavior
> Each time a signal handler is invoked (including nested signals),
> the thread is temporarily given a new, default set of protection
> key rights that override the rights from the interrupted context.”
>
> (Ref: https://man7.org/linux/man-pages/man7/pkeys.7.html)

As the person who wrote those words, I can honestly say that there were
*ZERO* considerations for what the kernel's permissions were while
setting up the frame. I was assuming then and assume to this day that
it's practically impossible to turn off pkey-0 writes and get sane behavior.

If we want to lawyer-word the manpage, I'll just argue that "Each time a
signal handler is invoked" literally doesn't apply until the moment that
RIP is pointing back to userspace. :)

If this is truly about the manpage, then I'll happily amend the manpage
to say, "don't turn off pkey 0 access or else". I'd *MUCH* rather do
that than add more pkey munging to the kernel.

In other words, you're not going to spur me into action my thwapping me
with the manpage that I wrote. You've got to convince me that your new
use case is valid, this is the best way to support your new use case,
and that your implementation of the new feature is sane.



2024-03-15 04:48:14

by Aruna Ramakrishna

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE


> On Mar 14, 2024, at 11:30 AM, Dave Hansen <[email protected]> wrote:
>
> On 3/14/24 11:14, Aruna Ramakrishna wrote:
>> I thought the PKRU value being reset in the signal handler was supposed
>> to be the default behavior. In which case, this is a bug.
>>
>> "Signal Handler Behavior
>> Each time a signal handler is invoked (including nested signals),
>> the thread is temporarily given a new, default set of protection
>> key rights that override the rights from the interrupted context.”
>>
>> (Ref: https://man7.org/linux/man-pages/man7/pkeys.7.html)
>
> As the person who wrote those words, I can honestly say that there were
> *ZERO* considerations for what the kernel's permissions were while
> setting up the frame. I was assuming then and assume to this day that
> it's practically impossible to turn off pkey-0 writes and get sane behavior.
>
> If we want to lawyer-word the manpage, I'll just argue that "Each time a
> signal handler is invoked" literally doesn't apply until the moment that
> RIP is pointing back to userspace. :)
>
> If this is truly about the manpage, then I'll happily amend the manpage
> to say, "don't turn off pkey 0 access or else". I'd *MUCH* rather do
> that than add more pkey munging to the kernel.
>

It’s not about the man page - it's just that, my understanding of this flow and this use case stems from there. I think we assumed that we can turn off pkey 0 and still be able to set up the alt sig stack (and have the kernel reset it to init_pkru anyway) - and when that didn’t work, it seemed like a bug. :)

> In other words, you're not going to spur me into action my thwapping me
> with the manpage that I wrote. You've got to convince me that your new
> use case is valid, this is the best way to support your new use case,
> and that your implementation of the new feature is sane.
>
>

Matthias/Eric,
Can you please talk about the use case in greater detail?

Thanks,
Aruna

2024-03-15 17:36:16

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE

On Thu, Mar 14 2024 at 18:14, Aruna Ramakrishna wrote:
>> On Mar 14, 2024, at 10:54 AM, Dave Hansen <[email protected]> wrote:
>> The need for this new feature is highly dependent on the threat model
>> that it supports. I'm highly dubious that there's a true need to
>> protect against an attacker with arbitrary write access in the same
>> address space. We need to have a lot more information there.
>
> I thought the PKRU value being reset in the signal handler was
> supposed to be the default behavior. In which case, this is a bug.
>
> "Signal Handler Behavior
> Each time a signal handler is invoked (including nested signals),
> the thread is temporarily given a new, default set of protection
> key rights that override the rights from the interrupted context.”
>
> (Ref: https://man7.org/linux/man-pages/man7/pkeys.7.html)
>
> I'm not very familiar with protection keys (before I started looking
> into this issue), so I apologize for misunderstanding.
>
> fpu__clear_user_states() does reset PKRU, but that happens much later
> in the flow. Before that, the kernel tries to save registers on to the
> alternate signal stack in setup_rt_frame(), and that fails if the
> application has explicitly disabled pkey 0 (and the alt stack is
> protected by pkey 0). This patch attempts to move that reset a little
> earlier in the flow, so that setup_rt_frame() can succeed.
>
>> I haven't even more than glanced at the code. It looks pretty
>> unspeakably ugly even at a glance.
>
> I agree with you - no argument there.

It's a horrible hack.

> But I’m not sure there is a “clean” way to do this. If there is, I’m
> happy to redo the patch.

If it turns out to be required, desired whatever then the obvious clean
solution is to hand the PKRU value down:

setup_rt_frame()
xxx_setup_rt_frame()
get_sigframe()
copy_fpstate_to_sigframe()

copy_fpstate_to_sigframe() has the user fpstate pointer already so none
of the __update_pkru_in_sigframe() monstrosities are required. No?

Thanks,

tglx

2024-03-15 18:44:33

by Aruna Ramakrishna

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE



> On Mar 15, 2024, at 10:36 AM, Thomas Gleixner <[email protected]> wrote:
>
> On Thu, Mar 14 2024 at 18:14, Aruna Ramakrishna wrote:
>>> On Mar 14, 2024, at 10:54 AM, Dave Hansen <[email protected]> wrote:
>>> The need for this new feature is highly dependent on the threat model
>>> that it supports. I'm highly dubious that there's a true need to
>>> protect against an attacker with arbitrary write access in the same
>>> address space. We need to have a lot more information there.
>>
>> I thought the PKRU value being reset in the signal handler was
>> supposed to be the default behavior. In which case, this is a bug.
>>
>> "Signal Handler Behavior
>> Each time a signal handler is invoked (including nested signals),
>> the thread is temporarily given a new, default set of protection
>> key rights that override the rights from the interrupted context.”
>>
>> (Ref: https://man7.org/linux/man-pages/man7/pkeys.7.html)
>>
>> I'm not very familiar with protection keys (before I started looking
>> into this issue), so I apologize for misunderstanding.
>>
>> fpu__clear_user_states() does reset PKRU, but that happens much later
>> in the flow. Before that, the kernel tries to save registers on to the
>> alternate signal stack in setup_rt_frame(), and that fails if the
>> application has explicitly disabled pkey 0 (and the alt stack is
>> protected by pkey 0). This patch attempts to move that reset a little
>> earlier in the flow, so that setup_rt_frame() can succeed.
>>
>>> I haven't even more than glanced at the code. It looks pretty
>>> unspeakably ugly even at a glance.
>>
>> I agree with you - no argument there.
>
> It's a horrible hack.
>
>> But I’m not sure there is a “clean” way to do this. If there is, I’m
>> happy to redo the patch.
>
> If it turns out to be required, desired whatever then the obvious clean
> solution is to hand the PKRU value down:
>
> setup_rt_frame()
> xxx_setup_rt_frame()
> get_sigframe()
> copy_fpstate_to_sigframe()
>
> copy_fpstate_to_sigframe() has the user fpstate pointer already so none
> of the __update_pkru_in_sigframe() monstrosities are required. No?
>

(Resending with some edits; I’m not sure why my previous message did not make it to the mailing list.)

I’m not sure I fully understand.

Are you suggesting modifying all these functions down the chain from handle_signal() to take in an additional parameter? Wouldn’t that break kABI?

In this approach too, the snippet where the value is modified on the sigframe after xsave will remain unchanged, because we need the value before xsave to match the register contents.

I guess what I’m saying is, half of __update_pkru_in_sigframe() will remain unchanged - it would just be invoked from copy_fpstate_to_sigframe() instead of handle_signal().

pk = get_xsave_standard_addr((struct xregs_state __user *) buf,
XFEATURE_PKRU);
if (!pk || !user_write_access_begin((struct xregs_state __user *) buf,
sizeof(struct xregs_state)))
goto out;
unsafe_put_user(new_pkru, (unsigned int __user *) pk, uaccess_end);

Right?

If I’ve misunderstood something, I apologize. If there’s a way to do this without overwriting PKRU on the sigframe after xsave, I'd like to understand that flow. Or if it’s just a matter of not needing to extract fpstate pointer in handle_signal(), I can restructure it that way too.

Thanks,
Aruna

> Thanks,
>
> tglx

2024-03-15 19:22:45

by Aruna Ramakrishna

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE



> On Mar 15, 2024, at 10:36 AM, Thomas Gleixner <[email protected]> wrote:
>
> On Thu, Mar 14 2024 at 18:14, Aruna Ramakrishna wrote:
>>> On Mar 14, 2024, at 10:54 AM, Dave Hansen <[email protected]> wrote:
>>> The need for this new feature is highly dependent on the threat model
>>> that it supports. I'm highly dubious that there's a true need to
>>> protect against an attacker with arbitrary write access in the same
>>> address space. We need to have a lot more information there.
>>
>> I thought the PKRU value being reset in the signal handler was
>> supposed to be the default behavior. In which case, this is a bug.
>>
>> "Signal Handler Behavior
>> Each time a signal handler is invoked (including nested signals),
>> the thread is temporarily given a new, default set of protection
>> key rights that override the rights from the interrupted context.”
>>
>> (Ref: https://man7.org/linux/man-pages/man7/pkeys.7.html)
>>
>> I'm not very familiar with protection keys (before I started looking
>> into this issue), so I apologize for misunderstanding.
>>
>> fpu__clear_user_states() does reset PKRU, but that happens much later
>> in the flow. Before that, the kernel tries to save registers on to the
>> alternate signal stack in setup_rt_frame(), and that fails if the
>> application has explicitly disabled pkey 0 (and the alt stack is
>> protected by pkey 0). This patch attempts to move that reset a little
>> earlier in the flow, so that setup_rt_frame() can succeed.
>>
>>> I haven't even more than glanced at the code. It looks pretty
>>> unspeakably ugly even at a glance.
>>
>> I agree with you - no argument there.
>
> It's a horrible hack.
>
>> But I’m not sure there is a “clean” way to do this. If there is, I’m
>> happy to redo the patch.
>
> If it turns out to be required, desired whatever then the obvious clean
> solution is to hand the PKRU value down:
>
> setup_rt_frame()
> xxx_setup_rt_frame()
> get_sigframe()
> copy_fpstate_to_sigframe()
>
> copy_fpstate_to_sigframe() has the user fpstate pointer already so none
> of the __update_pkru_in_sigframe() monstrosities are required. No?
>

I’m not sure I fully understand.

Are you suggesting modifying all these functions down the chain from handle_signal() to take in an additional parameter? Wouldn’t that break kABI?

In this approach too, the snippet where the value is modified on the sigframe after xsave will remain unchanged, because we need the value before xsave to match the register contents.

I guess what I’m saying is, this bit will remain unchanged - it would just be invoked from copy_fpstate_to_sigframe() instead of handle_signal().

pk = get_xsave_standard_addr((struct xregs_state __user *) buf,
XFEATURE_PKRU);
if (!pk || !user_write_access_begin((struct xregs_state __user *) buf,
sizeof(struct xregs_state)))
goto out;
unsafe_put_user(new_pkru, (unsigned int __user *) pk, uaccess_end);

Right?

If I’ve misunderstood something, I apologize. If there’s a way to do this without overwriting PKRU on the sigframe after xsave, I'd like to understand that flow. I'd be happy to redo it and send out a v2.

Thanks,
Aruna

> Thanks,
>
> tglx

2024-03-15 23:13:07

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE

On Fri, Mar 15 2024 at 18:43, Aruna Ramakrishna wrote:
>> On Mar 15, 2024, at 10:36 AM, Thomas Gleixner <[email protected]> wrote:
>>> But I’m not sure there is a “clean” way to do this. If there is, I’m
>>> happy to redo the patch.
>>
>> If it turns out to be required, desired whatever then the obvious clean
>> solution is to hand the PKRU value down:
>>
>> setup_rt_frame()
>> xxx_setup_rt_frame()
>> get_sigframe()
>> copy_fpstate_to_sigframe()
>>
>> copy_fpstate_to_sigframe() has the user fpstate pointer already so none
>> of the __update_pkru_in_sigframe() monstrosities are required. No?
>>
>
> Are you suggesting modifying all these functions down the chain from
> handle_signal() to take in an additional parameter?

Yes.

> Wouldn’t that break kABI?

If so who cares?

kABI is an out of tree madness maintained by distros, but upstream has
never supported it and never will. Aside of that kABI is a driver
interface which hardly has anything to do with the low level
architecture specific signal delivery code.

The kernel has no stable ABI, never had and never will have one, unless
hell freezes over.

> In this approach too, the snippet where the value is modified on the
> sigframe after xsave will remain unchanged, because we need the value
> before xsave to match the register contents.
>
> I guess what I’m saying is, half of __update_pkru_in_sigframe() will
> remain unchanged - it would just be invoked from
> copy_fpstate_to_sigframe() instead of handle_signal().

Yes, but that's the necessary and sane part of it.

> If there’s a way to do this without overwriting PKRU on the sigframe
> after xsave, I'd like to understand that flow.

There is none for obvious reasons unless you figure out how to resolve a
double circular hen and egg problem.

> Or if it’s just a matter of not needing to extract fpstate pointer in
> handle_signal(), I can restructure it that way too.

It's not only the pointer retrieval. Updating xstate is obviously a FPU
specific problem and the general signal handling code simply should not
care. C does not provide encapsulation, but it does not prevent
respecting it either.

Ideally we'd hide all of this in the FPU code, which is anyway the first
one writing to the signal stack. The problem is the error case when any
of the writes after saving the FPU frame terminaly faults or any other
condition makes the signal delivery fail.

So handle_signal() should look like this:

/* Ensure that the signal stack is writeable */
pkru = sig_prepare_pkru();

failed = setup_rt_frame(ksig, regs, pkru);
if (!failed) {
/*
* Clear the direction flag as per the ABI for function entry.
*
* Clear RF when entering the signal handler, because
* it might disable possible debug exception from the
* signal handler.
*
* Clear TF for the case when it wasn't set by debugger to
* avoid the recursive send_sigtrap() in SIGTRAP handler.
*/
regs->flags &= ~(X86_EFLAGS_DF|X86_EFLAGS_RF|X86_EFLAGS_TF);
/*
* Ensure the signal handler starts with the new fpu state.
* This also ensures that the PKRU state is set to the
* initial state.
*/
fpu__clear_user_states(fpu);
} else {
/* Restore the previous PKRU state */
sig_restore_pkru(pkru);
}

and then you'd have:

static inline bool copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
{
if (use_xsave()) {
if (!xsave_to_user_sigframe(buf))
return false;
if (cpu_feature_enabled(X86_FEATURE_OSPKE))
return !__put_user(pkru_address(buf), pkru);
return true;
}
if (use_fxsr())
return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
else
return fnsave_to_user_sigframe((struct fregs_state __user *) buf);
}

And yes, I deliberately changed the return value of setup_rt_frame() to
bool in this mockup because nothing cares about it. The only relevant
information is whether if failed or not. That want's to be a separate
patch obivously.

Thanks,

tglx






2024-03-18 17:25:36

by Aruna Ramakrishna

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE


> On Mar 15, 2024, at 4:05 PM, Thomas Gleixner <[email protected]> wrote:
>
> On Fri, Mar 15 2024 at 18:43, Aruna Ramakrishna wrote:
>>> On Mar 15, 2024, at 10:36 AM, Thomas Gleixner <[email protected]> wrote:
>>>> But I’m not sure there is a “clean” way to do this. If there is, I’m
>>>> happy to redo the patch.
>>>
>>> If it turns out to be required, desired whatever then the obvious clean
>>> solution is to hand the PKRU value down:
>>>
>>> setup_rt_frame()
>>> xxx_setup_rt_frame()
>>> get_sigframe()
>>> copy_fpstate_to_sigframe()
>>>
>>> copy_fpstate_to_sigframe() has the user fpstate pointer already so none
>>> of the __update_pkru_in_sigframe() monstrosities are required. No?
>>>
>>
>> Are you suggesting modifying all these functions down the chain from
>> handle_signal() to take in an additional parameter?
>
> Yes.
>
>> Wouldn’t that break kABI?
>
> If so who cares?
>
> kABI is an out of tree madness maintained by distros, but upstream has
> never supported it and never will. Aside of that kABI is a driver
> interface which hardly has anything to do with the low level
> architecture specific signal delivery code.
>
> The kernel has no stable ABI, never had and never will have one, unless
> hell freezes over.
>
>> In this approach too, the snippet where the value is modified on the
>> sigframe after xsave will remain unchanged, because we need the value
>> before xsave to match the register contents.
>>
>> I guess what I’m saying is, half of __update_pkru_in_sigframe() will
>> remain unchanged - it would just be invoked from
>> copy_fpstate_to_sigframe() instead of handle_signal().
>
> Yes, but that's the necessary and sane part of it.
>
>> If there’s a way to do this without overwriting PKRU on the sigframe
>> after xsave, I'd like to understand that flow.
>
> There is none for obvious reasons unless you figure out how to resolve a
> double circular hen and egg problem.
>
>> Or if it’s just a matter of not needing to extract fpstate pointer in
>> handle_signal(), I can restructure it that way too.
>
> It's not only the pointer retrieval. Updating xstate is obviously a FPU
> specific problem and the general signal handling code simply should not
> care. C does not provide encapsulation, but it does not prevent
> respecting it either.
>
> Ideally we'd hide all of this in the FPU code, which is anyway the first
> one writing to the signal stack. The problem is the error case when any
> of the writes after saving the FPU frame terminaly faults or any other
> condition makes the signal delivery fail.
>
> So handle_signal() should look like this:
>
> /* Ensure that the signal stack is writeable */
> pkru = sig_prepare_pkru();
>
> failed = setup_rt_frame(ksig, regs, pkru);
> if (!failed) {
> /*
> * Clear the direction flag as per the ABI for function entry.
> *
> * Clear RF when entering the signal handler, because
> * it might disable possible debug exception from the
> * signal handler.
> *
> * Clear TF for the case when it wasn't set by debugger to
> * avoid the recursive send_sigtrap() in SIGTRAP handler.
> */
> regs->flags &= ~(X86_EFLAGS_DF|X86_EFLAGS_RF|X86_EFLAGS_TF);
> /*
> * Ensure the signal handler starts with the new fpu state.
> * This also ensures that the PKRU state is set to the
> * initial state.
> */
> fpu__clear_user_states(fpu);
> } else {
> /* Restore the previous PKRU state */
> sig_restore_pkru(pkru);
> }
>
> and then you'd have:
>
> static inline bool copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru)
> {
> if (use_xsave()) {
> if (!xsave_to_user_sigframe(buf))
> return false;
> if (cpu_feature_enabled(X86_FEATURE_OSPKE))
> return !__put_user(pkru_address(buf), pkru);
> return true;
> }
> if (use_fxsr())
> return fxsave_to_user_sigframe((struct fxregs_state __user *) buf);
> else
> return fnsave_to_user_sigframe((struct fregs_state __user *) buf);
> }
>
> And yes, I deliberately changed the return value of setup_rt_frame() to
> bool in this mockup because nothing cares about it. The only relevant
> information is whether if failed or not. That want's to be a separate
> patch obivously.
>
> Thanks,
>
> tglx
>
>
>

I’ll update the patch based on your feedback and send out a v2.

Thank you to both of you for your time - appreciate it!

Thanks,
Aruna

2024-03-18 17:33:12

by Matthias Neugschwandtner

[permalink] [raw]
Subject: Re: [RFC PATCH] x86/pkeys: update PKRU to enable pkey 0 before XSAVE

On 3/15/24 05:47, Aruna Ramakrishna wrote:

> It’s not about the man page - it's just that, my understanding of this flow and this use case stems from there. I think we assumed that we can turn off pkey 0 and still be able to set up the alt sig stack (and have the kernel reset it to init_pkru anyway) - and when that didn’t work, it seemed like a bug. :)
>
>> In other words, you're not going to spur me into action my thwapping me
>> with the manpage that I wrote. You've got to convince me that your new
>> use case is valid, this is the best way to support your new use case,
>> and that your implementation of the new feature is sane.
>>
>>
>
> Matthias/Eric,
> Can you please talk about the use case in greater detail?

Sure. The core use case we are trying to handle is inspired by the seminal
ERIM paper [1] on using protection keys for in-process isolation. We want to
protect the memory regions of an application from corruption by a component
that co-resides in the same address space.
Since all memory allocated by the main application is tagged with pkey 0 by
default, we remove access to it when entering the component. If a signal is
triggered at that time, the kernel subsequently fails to set up the signal
handling stack.

Thank you,
Matthias

[1]
https://www.usenix.org/conference/usenixsecurity19/presentation/vahldiek-oberwagner