2024-02-21 20:01:53

by Aruna Ramakrishna

[permalink] [raw]
Subject: PKRU issue while using alternate signal stack

(Re-sending to the list, previous email had some formatting issues. I apologize.)

Hello,

We?re running into an issue with delayed PKRU update for signal handling, for which we don?t have a proposed solution yet.

Our use case is this:

The application has many threads that runs code that is deemed to be untrusted. Each thread has its stack/code protected by a non-zero pkey, and the PKRU register is set up such that only that particular non-zero pkey is enabled. Each thread also sets up an alternate signal stack to handle signals, which is protected by pkey zero. The pkeys man page documents that the PKRU will be reset to init_pku when the signal handler it is invoked, which means that pkey zero access is enabled, and so the alt sig stack is protected with pkey zero. But this reset happens after the kernel attempts to push fpstate to the alternate stack, which is not (yet) accessible by the kernel, which leads to a new SIGSEGV being sent to the application, terminating it.

This is the relevant snippet:

In handle_signal():

..
failed = (setup_rt_frame(ksig, regs) < 0); <- pkru reset should happen before this
if (!failed) {
/*
* Clear the direction flag as per the ABI for function entry.
*
* Clear RF when entering the signal handler, because
* it might disable possible debug exception from the
* signal handler.
*
* Clear TF for the case when it wasn't set by debugger to
* avoid the recursive send_sigtrap() in SIGTRAP handler.
*/
regs->flags &= ~(X86_EFLAGS_DF|X86_EFLAGS_RF|X86_EFLAGS_TF);
/*
* Ensure the signal handler starts with the new fpu state.
*/
fpu__clear_user_states(fpu); <- pkru resets here, via pkru_write_default()
}
signal_setup_done(failed, ksig, stepping);
..

Failure path: setup_rt_frame() -> x64_setup_rt_frame() -> get_sigframe() -> copy_fpstate_to_sigframe() -> __clear_user() -> fails, with SIGSEGV and si_code set to SEGV_PKUERR.

The PKRU value is reset to the default (enabling pkey 0 only) in fpu__clear_user_states().

If the pkru_write_default() call were to move up the flow here, before copy_fpstate_to_sigframe(), then the signal handling would work as expected. But this code/flow is quite complicated, and we?d appreciate some expert opinion.

Thanks,
Aruna



2024-02-21 20:46:10

by Dave Hansen

[permalink] [raw]
Subject: Re: PKRU issue while using alternate signal stack

On 2/21/24 11:54, Aruna Ramakrishna wrote:
> If the pkru_write_default() call were to move up the flow here, before
> copy_fpstate_to_sigframe(), then the signal handling would work as
> expected. But this code/flow is quite complicated, and we’d appreciate
> some expert opinion.

First, I think you're not the first ones to report this, or want the
behavior tweaked. I can't seem to find the thread at the moment, but
you might want to search to see if you have some fellow travelers here.

This is a bit of a chicken-and-egg problem. We used to have some
complicated code to munge the (compacted+supervisor) kernel fpstate into
the (uncompacted+user) userspace sigframe. That sucked, so we
simplified it to always use XSAVE to write the uncompacted+user format.

But that implementation choice fundamentally means that the register
state *MUST* match sigframe contents, at least at the time of XSAVE.
That's in direct conflict to your requirement that the sigframe be
written with different PKRU contents than what was in place at the time
that the exception happened.

That means we either need to abandon the xsave_to_user_sigframe()
approach, or we need to do something like:

tmp_pkru = rdpkru();
wrpkru(0);
xsave_to_user_sigframe();
put_user(pkru_sigframe_addr, tmp_pkru);

Which is horrid.

There are other games you could play with get_user_pages(), vmap() and
XSAVE but those would be even more horrid.

The simplest option is to just leave the altstacks writeable by all.