Hi.
I am playing with 32bit compatibility segments,
and I am observing something very strange.
To demonstrate the problem, I did the change
to the kernel sigreturn test-case, and it is attached.
The change just moves the magic value to EAX
and calls an interrupt that produces a SIGSEGV.
The SIGSEGV handler prints the needed regs.
This patch intentionally adds 0x100000000 to
the RIP register, because AFAIK the high part
or 64bit regs are irrelevant in compatibility mode.
Now with high part of RIP non-zero, we see this:
$ ./sigreturn_64
err=0 trapno=d ax=0 ip=100000003
OK, obviously our asm code didn't execute -
why is so? How does the high part of RIP
affect compatibility mode?
Then lets start the same example under gdb:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000403008 in int31 ()
(gdb)
Continuing.
err=18a trapno=d ax=a5f3 ip=403008
Wow! Much better now: error code is correct,
ax is correct, but ip is invalid.
Now lets remove that high-part trashing from
source and recompile, then run again:
$ ./sigreturn_64
err=0 trapno=d ax=a5f3 ip=6
This time error code is incorrect! All other
values are good.
Lets see under gdb:
err=18a trapno=d ax=a5f3 ip=403008
Cool, under gdb the err code fixed itself but
IP is wrong...
Can anyone make any sense out of this? :)
To me it seems completely chaotic and I
suspect both sigreturn() and ptrace() are
somehow "broken" here.
Stas,
in your github report you mention that you believe this is a regression
https://github.com/dosemu2/dosemu2/pull/1830.
Can you tell us the last kernel this worked on? Which kernel you tested
that this fails on? It would be awesome if you could bisect this to the
commit that is broken but at least knowing kernel's you have tried that
work and don't work would be very useful.
As everything doesemu and vm86 does is pretty x86 centric I have added
the overall x86 maintainers as well as Al Viro in the hopes of this
getting routed to someone who has a bit more time then I do at the
moment. I remember Al looking quite a bit at the architecture specific
signal handling bits.
Regressions where things used to work, but no longer work are much
more interesting than a simple bug report in a compatibility layer.
We have old bugs where 32bit compatibility code on 64bit kernels have
didn't always give the correct siginfo that I think I have fixed in
recent years. I don't know if we have similar issues for sigcontext
which it seems that you are dealing with.
Dosemu is old enough that anything it has down historically that no
longer works certainly counts as a regression and should be fixed.
stsp <[email protected]> writes:
> I am playing with 32bit compatibility segments,
> and I am observing something very strange.
> To demonstrate the problem, I did the change
> to the kernel sigreturn test-case, and it is attached.
> The change just moves the magic value to EAX
> and calls an interrupt that produces a SIGSEGV.
> The SIGSEGV handler prints the needed regs.
> This patch intentionally adds 0x100000000 to
> the RIP register, because AFAIK the high part
> or 64bit regs are irrelevant in compatibility mode.
> Now with high part of RIP non-zero, we see this:
> $ ./sigreturn_64
> err=0 trapno=d ax=0 ip=100000003
>
> OK, obviously our asm code didn't execute -
> why is so? How does the high part of RIP
> affect compatibility mode?
> Then lets start the same example under gdb:
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000403008 in int31 ()
> (gdb)
> Continuing.
> err=18a trapno=d ax=a5f3 ip=403008
>
> Wow! Much better now: error code is correct,
> ax is correct, but ip is invalid.
>
> Now lets remove that high-part trashing from
> source and recompile, then run again:
> $ ./sigreturn_64
> err=0 trapno=d ax=a5f3 ip=6
>
> This time error code is incorrect! All other
> values are good.
> Lets see under gdb:
> err=18a trapno=d ax=a5f3 ip=403008
>
> Cool, under gdb the err code fixed itself but
> IP is wrong...
>
> Can anyone make any sense out of this? :)
> To me it seems completely chaotic and I
> suspect both sigreturn() and ptrace() are
> somehow "broken" here.
>
> diff --git a/tools/testing/selftests/x86/sigreturn.c b/tools/testing/selftests/x86/sigreturn.c
> index 5d7961a5f7f6..6c8f2431d77d 100644
> --- a/tools/testing/selftests/x86/sigreturn.c
> +++ b/tools/testing/selftests/x86/sigreturn.c
> @@ -101,9 +101,14 @@ asm (".pushsection .text\n\t"
> "mov %ss,%ecx\n\t"
> "int3\n\t"
> ".size int3, . - int3\n\t"
> + ".type int31, @function\n\t"
> + "int31:\n\t"
> + "movl $0xa5f3, %eax\n\t"
> + "int $0x31\n\t"
> ".align 4096, 0xcc\n\t"
> ".popsection");
> extern char int3[4096];
> +extern char int31[];
>
> /*
> * At startup, we prepapre:
> @@ -296,6 +301,7 @@ static volatile sig_atomic_t sig_corrupt_final_ss;
> # define REG_IP REG_RIP
> # define REG_SP REG_RSP
> # define REG_CX REG_RCX
> +# define REG_AX REG_RAX
>
> struct selectors {
> unsigned short cs, gs, fs, ss;
> @@ -316,6 +322,7 @@ static unsigned short *csptr(ucontext_t *ctx)
> # define REG_IP REG_EIP
> # define REG_SP REG_ESP
> # define REG_CX REG_ECX
> +# define REG_AX REG_EAX
>
> static greg_t *ssptr(ucontext_t *ctx)
> {
> @@ -444,9 +451,12 @@ static void sigusr1(int sig, siginfo_t *info, void *ctx_void)
> *ssptr(ctx) = sig_ss;
>
> ctx->uc_mcontext.gregs[REG_IP] =
> - sig_cs == code16_sel ? 0 : (unsigned long)&int3;
> + sig_cs == code16_sel ? ((unsigned long)&int31 -
> + (unsigned long)&int3) | 0x100000000 :
> + (unsigned long)&int3;
> ctx->uc_mcontext.gregs[REG_SP] = (unsigned long)0x8badf00d5aadc0deULL;
> ctx->uc_mcontext.gregs[REG_CX] = 0;
> + ctx->uc_mcontext.gregs[REG_AX] = 0;
>
> #ifdef __i386__
> /*
> @@ -515,6 +525,20 @@ static void sigtrap(int sig, siginfo_t *info, void *ctx_void)
> sig_trapped = sig;
> }
>
> +static void sigsegv(int sig, siginfo_t *info, void *ctx_void)
> +{
> + ucontext_t *ctx = (ucontext_t*)ctx_void;
> +
> + validate_signal_ss(sig, ctx);
> +
> + sig_err = ctx->uc_mcontext.gregs[REG_ERR];
> + sig_trapno = ctx->uc_mcontext.gregs[REG_TRAPNO];
> + printf("err=%x trapno=%x ax=%x ip=%llx\n", sig_err, sig_trapno,
> + (unsigned)ctx->uc_mcontext.gregs[REG_AX],
> + (unsigned long long)ctx->uc_mcontext.gregs[REG_IP]);
> + _exit(0);
> +}
> +
> #ifdef __x86_64__
> /* Tests recovery if !UC_STRICT_RESTORE_SS */
> static void sigusr2(int sig, siginfo_t *info, void *ctx_void)
> @@ -777,6 +801,7 @@ int main()
>
> sethandler(SIGUSR1, sigusr1, 0);
> sethandler(SIGTRAP, sigtrap, SA_ONSTACK);
> + sethandler(SIGSEGV, sigsegv, SA_ONSTACK);
>
> /* Easy cases: return to a 32-bit SS in each possible CS bitness. */
> total_nerrs += test_valid_sigreturn(64, false, -1);
Eric
Hi Eric,
13.12.2022 00:36, Eric W. Biederman пишет:
> Stas,
>
> in your github report you mention that you believe this is a regression
> https://github.com/dosemu2/dosemu2/pull/1830.
>
> Can you tell us the last kernel this worked on?
The only thing I actually think is a
regression, is the return of 0 as an error
code for GPF. I am pretty sure it used
to work, because I was reporting the
zeroed-out err code to @amluto and
he fixed it. But that was something like
5 years ago. These days @amluto seems
to be inactive, does anyone know what
have happened? He was always providing
a very quick help in the past (and well,
he wrote all that 64-32 switching code
in sigreturn for us).
Other problems I've found, are likely not
a regressions. I.e. I never tried such
tests under gdb before and I never tried
to set high dword of RIP to non-zero.
In fact, reliable err code is what I care
most. If things can't be fixed under gdb
or if I should always clear high part of
RIP before switching to 32bit segment -
fine. But zero error code is bad.
> Which kernel you tested
> that this fails on?
5.19.0-26-generic from ubuntu.
> It would be awesome if you could bisect this to the
> commit that is broken but at least knowing kernel's you have tried that
> work and don't work would be very useful.
Perhaps I'll look into setting up the test
env under qemu if everything else fails.
> Dosemu is old enough that anything it has down historically that no
> longer works certainly counts as a regression and should be fixed.
dosemu2 nowadays is using Andy's sigreturn
new code, which is now about 10 years old.
Historic dosemu afaik is not using anything
like that, switching to 32bit by hands with iret.
On Sat, Dec 10, 2022, at 1:08 AM, stsp wrote:
> Hi.
>
> I am playing with 32bit compatibility segments,
> and I am observing something very strange.
> To demonstrate the problem, I did the change
> to the kernel sigreturn test-case, and it is attached.
> The change just moves the magic value to EAX
> and calls an interrupt that produces a SIGSEGV.
> The SIGSEGV handler prints the needed regs.
> This patch intentionally adds 0x100000000 to
> the RIP register, because AFAIK the high part
> or 64bit regs are irrelevant in compatibility mode.
> Now with high part of RIP non-zero, we see this:
> $ ./sigreturn_64
> err=0 trapno=d ax=0 ip=100000003
>
> OK, obviously our asm code didn't execute -
> why is so? How does the high part of RIP
> affect compatibility mode?
> Then lets start the same example under gdb:
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000000403008 in int31 ()
> (gdb)
> Continuing.
> err=18a trapno=d ax=a5f3 ip=403008
>
> Wow! Much better now: error code is correct,
> ax is correct, but ip is invalid.
I generally distrust gdb when mixed modes are involved -- it's fundamentally intensely buggy. Now maybe you're not hitting the bugs I know of, but still...
Anyway, the behavior I expect (not that I've tested this, but based on my memory of how this is all supposed to work) is that an attempt to return to user mode will fail with #GP because the full value of RIP is compared to the segment limit, which is 2^32-1. And #GP is 0xd, so your non-gdb outputs look broadly correct...
On Sat, Dec 10 2022 at 14:08, stsp wrote:
Can you please Cc LKML on such mails? [email protected] is not used
by any x86 maintainer as you can figure out from looking at the
MAINTAINERS file in the kernel.
> I am playing with 32bit compatibility segments, and I am observing
> something very strange. To demonstrate the problem, I did the change
> to the kernel sigreturn test-case, and it is attached. The change
> just moves the magic value to EAX and calls an interrupt that produces
> a SIGSEGV. The SIGSEGV handler prints the needed regs. This patch
> intentionally adds 0x100000000 to the RIP register, because AFAIK the
> high part or 64bit regs are irrelevant in compatibility mode.
>
> Now with high part of RIP non-zero, we see this:
> $ ./sigreturn_64
> err=0 trapno=d ax=0 ip=100000003
I just applied the patch and on a 6.1 kernel I get:
# ./sigreturn_64
[OK] set_thread_area refused 16-bit data
[OK] set_thread_area refused 16-bit data
[RUN] Valid sigreturn: 64-bit CS (33), 32-bit SS (2b, GDT)
[OK] all registers okay
[RUN] Valid sigreturn: 32-bit CS (23), 32-bit SS (2b, GDT)
[NOTE] SP: 8badf00d5aadc0de -> 5aadc0de
[OK] all registers okay
[RUN] Valid sigreturn: 16-bit CS (37), 32-bit SS (2b, GDT)
err=0 trapno=d ax=a5f3 ip=6
Let's look at the disassmbly:
0000000000403000 <int3>:
403000: 8c d1 mov %ss,%ecx
403002: cc int3
0000000000403003 <int31>:
403003: b8 f3 a5 00 00 mov $0xa5f3,%eax
403008: cd 31 int $0x31
which is expected and correct:
trapno = 0xd = 13 = #GP
ax = the magic value
ip = 6 (Offset to the 'int3:' label in the 16bit CS)
err = 0
Both 'ip' and 'err' are completely correct here. Why?
The #GP's on 403006. Because in 16bit mode the CPU the disassmbly looks
like this:
403003: b8 f3 a5 mov $0xa5f3,%eax
403006: 00 00 add %al, (%bx, %si)
so 403006 which is offset 6 into the 16bit CS translates to:
bx[si] += al;
so in my case:
bx=0x0 si=0x2903e6d0
which is clearly outside of the DS segment limit resulting in a #GP with
error code == 0.
Your observation that running this under GDB changes the behaviour of
the error is completely correct because BX/SI are subject to context. So
depending where the combo points to it results in random behaviour.
So nothing strange to see here, really. You got what you asked for:
> I am playing with ....
Thanks,
tglx
13.12.2022 02:59, Andy Lutomirski пишет:
> I generally distrust gdb when mixed modes are involved -- it's fundamentally intensely buggy. Now maybe you're not hitting the bugs I know of, but still...
>
> Anyway, the behavior I expect (not that I've tested this, but based on my memory of how this is all supposed to work) is that an attempt to return to user mode will fail with #GP because the full value of RIP is compared to the segment limit, which is 2^32-1. And #GP is 0xd, so your non-gdb outputs look broadly correct...
Yes, that may explain the problem.
So where is this check? And should it
be fixed to apply the mask to RIP?
Or should I always clear high parts
by hands? If so - only for RIP?
Hi,
13.12.2022 05:24, Thomas Gleixner пишет:
> Your observation that running this under GDB changes the behaviour of
> the error is completely correct because BX/SI are subject to context. So
> depending where the combo points to it results in random behaviour.
>
> So nothing strange to see here, really. You got what you asked for:
Thanks for checking, so some problems
were not valid ones, but lets remove the
mov to eax from the test.
Without gdb:
err=0 trapno=d ax=0 ip=100000003
With gdb:
err=18a trapno=d ax=0 ip=403003
Without high RIP poison:
err=18a trapno=d ax=0 ip=3
This case is perfectly valid now, thanks.
Without high RIP poison and with gdb:
err=18a trapno=d ax=0 ip=403003
So under gdb we still see the wrong RIP
value and high RIP part breaks things
only without gdb (gdb "fixes" it).
Attaching the new diff that doesn't do
the mov to eax, so should be correct now.