2023-01-20 18:10:45

by Dave Hansen

[permalink] [raw]
Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture

On 1/19/23 23:49, Li, Xin3 wrote:
> The x86 sysret_rip test has the following assertion:
>
> /* R11 and EFLAGS should already match. */
> assert(ctx->uc_mcontext.gregs[REG_EFL] ==
> ctx->uc_mcontext.gregs[REG_R11]);
>
> This is being tested to avoid kernel state leak due to sysret vs iret,
> but that on FRED r11 is *always* preserved, and the test just fails.

Let's figure out the reason that FRED acts differently, first. Right
now, the SDM says:

SYSCALL also saves RFLAGS into R11

so that behavior of SYSCALL _looks_ architectural to me. Was this
change in SYSCALL behavior with FRED intentional?

If not intentional, it might be something that can still be fixed. If
it is intentional and is going to be with us for a while we have a few
options. If userspace is _really_ depending on this behavior, we could
just clobber r11 ourselves in the FRED entry path. If not, we can
remove the assertion in the selftest.


2023-01-21 06:51:57

by H. Peter Anvin

[permalink] [raw]
Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture

On January 20, 2023 9:45:26 AM PST, Dave Hansen <[email protected]> wrote:
>On 1/19/23 23:49, Li, Xin3 wrote:
>> The x86 sysret_rip test has the following assertion:
>>
>> /* R11 and EFLAGS should already match. */
>> assert(ctx->uc_mcontext.gregs[REG_EFL] ==
>> ctx->uc_mcontext.gregs[REG_R11]);
>>
>> This is being tested to avoid kernel state leak due to sysret vs iret,
>> but that on FRED r11 is *always* preserved, and the test just fails.
>
>Let's figure out the reason that FRED acts differently, first. Right
>now, the SDM says:
>
> SYSCALL also saves RFLAGS into R11
>
>so that behavior of SYSCALL _looks_ architectural to me. Was this
>change in SYSCALL behavior with FRED intentional?
>
>If not intentional, it might be something that can still be fixed. If
>it is intentional and is going to be with us for a while we have a few
>options. If userspace is _really_ depending on this behavior, we could
>just clobber r11 ourselves in the FRED entry path. If not, we can
>remove the assertion in the selftest.

We can't clobber it in the FRED entry path, since it is common for all events, but we could do it in the syscall dispatch.

However, it doesn't seem to make sense to do so to me. The current behavior is much more of an artifact than desired behavior.

2023-01-21 17:34:09

by Dave Hansen

[permalink] [raw]
Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture

On 1/20/23 20:59, H. Peter Anvin wrote:
>> If not intentional, it might be something that can still be fixed.
>> If it is intentional and is going to be with us for a while we have
>> a few options. If userspace is _really_ depending on this
>> behavior, we could just clobber r11 ourselves in the FRED entry
>> path. If not, we can remove the assertion in the selftest.
> We can't clobber it in the FRED entry path, since it is common for
> all events, but we could do it in the syscall dispatch.
>
> However, it doesn't seem to make sense to do so to me. The current
> behavior is much more of an artifact than desired behavior.
I guess the SDM statements really are for the kernel's benefit and not
for userspace. Userspace _should_ be treating SYSCALL like a CALL and
r11 like any old register that can be clobbered. Right now, the kernel
just happens to clobber it with RFLAGS.

I do the the odds of anyone relying on this behavior are pretty small.
Let's just zap the check from the selftest, document what we did in the
FRED docs and changelog and move on.

If someone screams later, we can fix in some SYSCALL-specific piece of
the FRED code.

2023-01-21 23:21:44

by Brian Gerst

[permalink] [raw]
Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture

On Sat, Jan 21, 2023 at 12:34 PM Dave Hansen <[email protected]> wrote:
>
> On 1/20/23 20:59, H. Peter Anvin wrote:
> >> If not intentional, it might be something that can still be fixed.
> >> If it is intentional and is going to be with us for a while we have
> >> a few options. If userspace is _really_ depending on this
> >> behavior, we could just clobber r11 ourselves in the FRED entry
> >> path. If not, we can remove the assertion in the selftest.
> > We can't clobber it in the FRED entry path, since it is common for
> > all events, but we could do it in the syscall dispatch.
> >
> > However, it doesn't seem to make sense to do so to me. The current
> > behavior is much more of an artifact than desired behavior.
> I guess the SDM statements really are for the kernel's benefit and not
> for userspace. Userspace _should_ be treating SYSCALL like a CALL and
> r11 like any old register that can be clobbered. Right now, the kernel
> just happens to clobber it with RFLAGS.
>
> I do the the odds of anyone relying on this behavior are pretty small.
> Let's just zap the check from the selftest, document what we did in the
> FRED docs and changelog and move on.

Keep the selftest check, but also accept preserved RCX/R11. What
really matters is that the kernel isn't leaking data.

--
Brian Gerst

2023-01-22 03:32:02

by Li, Xin3

[permalink] [raw]
Subject: RE: the x86 sysret_rip test fails on the Intel FRED architecture

> > >> If not intentional, it might be something that can still be fixed.
> > >> If it is intentional and is going to be with us for a while we have
> > >> a few options. If userspace is _really_ depending on this
> > >> behavior, we could just clobber r11 ourselves in the FRED entry
> > >> path. If not, we can remove the assertion in the selftest.
> > > We can't clobber it in the FRED entry path, since it is common for
> > > all events, but we could do it in the syscall dispatch.
> > >
> > > However, it doesn't seem to make sense to do so to me. The current
> > > behavior is much more of an artifact than desired behavior.
> > I guess the SDM statements really are for the kernel's benefit and not
> > for userspace. Userspace _should_ be treating SYSCALL like a CALL and
> > r11 like any old register that can be clobbered. Right now, the
> > kernel just happens to clobber it with RFLAGS.
> >
> > I do the the odds of anyone relying on this behavior are pretty small.
> > Let's just zap the check from the selftest, document what we did in
> > the FRED docs and changelog and move on.
>
> Keep the selftest check, but also accept preserved RCX/R11. What really matters is
> that the kernel isn't leaking data.

I feel it the same way, it looks to me that the check is to make sure
R11 doesn't leak any kernel data because the Linux kernel deliberately
overwrites R11 with the value of user level flags just before returning
to user level.

I wanted to zap the check, but as HPA said, this is an artifact to not leak
any kernel data. I guess it doesn't make a difference if the kernel sets
R11 to 0.

Maybe it's still reasonable to keep such a check for IDT. However, it makes
no sense for FRED systems, because all GP registers are saved/restored upon
event delivery/return.

Thanks!
Xin

>
> --
> Brian Gerst

2023-01-22 03:46:35

by H. Peter Anvin

[permalink] [raw]
Subject: RE: the x86 sysret_rip test fails on the Intel FRED architecture

On January 21, 2023 7:01:53 PM PST, "Li, Xin3" <[email protected]> wrote:
>> > >> If not intentional, it might be something that can still be fixed.
>> > >> If it is intentional and is going to be with us for a while we have
>> > >> a few options. If userspace is _really_ depending on this
>> > >> behavior, we could just clobber r11 ourselves in the FRED entry
>> > >> path. If not, we can remove the assertion in the selftest.
>> > > We can't clobber it in the FRED entry path, since it is common for
>> > > all events, but we could do it in the syscall dispatch.
>> > >
>> > > However, it doesn't seem to make sense to do so to me. The current
>> > > behavior is much more of an artifact than desired behavior.
>> > I guess the SDM statements really are for the kernel's benefit and not
>> > for userspace. Userspace _should_ be treating SYSCALL like a CALL and
>> > r11 like any old register that can be clobbered. Right now, the
>> > kernel just happens to clobber it with RFLAGS.
>> >
>> > I do the the odds of anyone relying on this behavior are pretty small.
>> > Let's just zap the check from the selftest, document what we did in
>> > the FRED docs and changelog and move on.
>>
>> Keep the selftest check, but also accept preserved RCX/R11. What really matters is
>> that the kernel isn't leaking data.
>
>I feel it the same way, it looks to me that the check is to make sure
>R11 doesn't leak any kernel data because the Linux kernel deliberately
>overwrites R11 with the value of user level flags just before returning
>to user level.
>
>I wanted to zap the check, but as HPA said, this is an artifact to not leak
>any kernel data. I guess it doesn't make a difference if the kernel sets
>R11 to 0.
>
>Maybe it's still reasonable to keep such a check for IDT. However, it makes
>no sense for FRED systems, because all GP registers are saved/restored upon
>event delivery/return.
>
>Thanks!
> Xin
>
>>
>> --
>> Brian Gerst
>

The big thing is that the system calls that return with sysret v iret on IDT systems need to be consistent, in order to not leak kernel state.

2023-01-22 04:00:36

by Li, Xin3

[permalink] [raw]
Subject: RE: the x86 sysret_rip test fails on the Intel FRED architecture

> >> The x86 sysret_rip test has the following assertion:
> >>
> >> /* R11 and EFLAGS should already match. */
> >> assert(ctx->uc_mcontext.gregs[REG_EFL] ==
> >> ctx->uc_mcontext.gregs[REG_R11]);
> >>
> >> This is being tested to avoid kernel state leak due to sysret vs
> >> iret, but that on FRED r11 is *always* preserved, and the test just fails.
> >
> >Let's figure out the reason that FRED acts differently, first. Right
> >now, the SDM says:
> >
> > SYSCALL also saves RFLAGS into R11
> >
> >so that behavior of SYSCALL _looks_ architectural to me. Was this
> >change in SYSCALL behavior with FRED intentional?
> >
> >If not intentional, it might be something that can still be fixed. If
> >it is intentional and is going to be with us for a while we have a few
> >options. If userspace is _really_ depending on this behavior, we could
> >just clobber r11 ourselves in the FRED entry path. If not, we can
> >remove the assertion in the selftest.
>
> We can't clobber it in the FRED entry path, since it is common for all events, but we
> could do it in the syscall dispatch.

Yes, adding "regs->r11 = regs->flags" in the SYSCALL dispatch does make
the test pass.

>
> However, it doesn't seem to make sense to do so to me. The current behavior is
> much more of an artifact than desired behavior.

We kind of have an agreement that %r11 = %flags after returning from the kernel.

And the question is, is it what we want?

2023-01-22 04:55:16

by Dave Hansen

[permalink] [raw]
Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture

On 1/21/23 19:38, Li, Xin3 wrote:
>> However, it doesn't seem to make sense to do so to me. The current behavior is
>> much more of an artifact than desired behavior.
> We kind of have an agreement that %r11 = %flags after returning from the kernel.
>
> And the question is, is it what we want?

Can the selftest just set r11=rflags before the syscall? The old
syscall entry path will set r11=rflags. The FRED path won't touch it.
Either case will pass an r11==rflags check.

2023-01-22 06:04:13

by H. Peter Anvin

[permalink] [raw]
Subject: Re: the x86 sysret_rip test fails on the Intel FRED architecture

On January 21, 2023 8:34:09 PM PST, Dave Hansen <[email protected]> wrote:
>On 1/21/23 19:38, Li, Xin3 wrote:
>>> However, it doesn't seem to make sense to do so to me. The current behavior is
>>> much more of an artifact than desired behavior.
>> We kind of have an agreement that %r11 = %flags after returning from the kernel.
>>
>> And the question is, is it what we want?
>
>Can the selftest just set r11=rflags before the syscall? The old
>syscall entry path will set r11=rflags. The FRED path won't touch it.
>Either case will pass an r11==rflags check.

That's a good idea.

2023-01-22 09:15:01

by Ammar Faizi

[permalink] [raw]
Subject: Re: RE: the x86 sysret_rip test fails on the Intel FRED architecture

On 1/22/23 3:22 PM, Li, Xin3 wrote:
> The problem is where/how to set %r11 = %rflags in the test code.
>
> The check happens in the USER1 signal handler, and we could set %r11
> just before calling raise(SIGUSR1). However, the C library implementation
> of raise() modifies %r11, thus we can't preserve %r11 until the SYSCALL
> instruction. And the test still fails.

From "man 3 raise":

"""
The raise() function sends a signal to the calling process or thread.
In a single-threaded program it is equivalent to

kill(getpid(), sig);
"""

Implementing kill syscall with %r11 modified before entering the kernel
may look like this?

static void __raise(int sig)
{
__asm__ volatile (
"pushf\n\t"
"popq %%r11\n\t"
"syscall"
:
: "D"(getpid()), /* %rdi */
"S"(sig), /* %rsi */
"a"(__NR_kill) /* %rax */
: "rcx", "r11", "memory"
);
}

--
Ammar Faizi

2023-01-22 10:10:00

by Li, Xin3

[permalink] [raw]
Subject: RE: the x86 sysret_rip test fails on the Intel FRED architecture

> On January 21, 2023 8:34:09 PM PST, Dave Hansen <[email protected]>
> wrote:
> >On 1/21/23 19:38, Li, Xin3 wrote:
> >>> However, it doesn't seem to make sense to do so to me. The current
> >>> behavior is much more of an artifact than desired behavior.
> >> We kind of have an agreement that %r11 = %flags after returning from the
> kernel.
> >>
> >> And the question is, is it what we want?
> >
> >Can the selftest just set r11=rflags before the syscall? The old
> >syscall entry path will set r11=rflags. The FRED path won't touch it.
> >Either case will pass an r11==rflags check.
>
> That's a good idea.

The problem is where/how to set %r11 = %rflags in the test code.

The check happens in the USER1 signal handler, and we could set %r11
just before calling raise(SIGUSR1). However, the C library implementation
of raise() modifies %r11, thus we can't preserve %r11 until the SYSCALL
instruction. And the test still fails.

Thanks!
XIn

2023-01-22 11:05:49

by H. Peter Anvin

[permalink] [raw]
Subject: Re: RE: the x86 sysret_rip test fails on the Intel FRED architecture

On January 22, 2023 12:54:56 AM PST, Ammar Faizi <[email protected]> wrote:
>On 1/22/23 3:22 PM, Li, Xin3 wrote:
>> The problem is where/how to set %r11 = %rflags in the test code.
>>
>> The check happens in the USER1 signal handler, and we could set %r11
>> just before calling raise(SIGUSR1). However, the C library implementation
>> of raise() modifies %r11, thus we can't preserve %r11 until the SYSCALL
>> instruction. And the test still fails.
>
>From "man 3 raise":
>
>"""
>The raise() function sends a signal to the calling process or thread.
>In a single-threaded program it is equivalent to
>
> kill(getpid(), sig);
>"""
>
>Implementing kill syscall with %r11 modified before entering the kernel
>may look like this?
>
>static void __raise(int sig)
>{
> __asm__ volatile (
> "pushf\n\t"
> "popq %%r11\n\t"
> "syscall"
> :
> : "D"(getpid()), /* %rdi */
> "S"(sig), /* %rsi */
> "a"(__NR_kill) /* %rax */
> : "rcx", "r11", "memory"
> );
>}
>

Exactly.