Hi,
I'm experiencing VDSO-related crashes on 5.5 with MIPS64. The MIPS64
builders on build.wireguard.com are all red at the moment.
It looks like libc is crashing with a null pointer dereference when
doing any work after returning from clock_gettime. This manifests
itself, for me, with calls to clock_gettime(CLOCK_PROCESS_CPUTIME_ID),
because CLOCK_PROCESS_CPUTIME_ID is not in the VDSO. It looks in the
VDSO, doesn't find it, and then proceeds to make the real syscall, when
it crashes. I can simulate the same crash by simply adding a printf
after a successful call to the vdso before returning. For example:
int __clock_gettime(clockid_t clk, struct timespec *ts)
{
int r;
#ifdef VDSO_CGT_SYM
int (*f)(clockid_t, struct timespec *) =
(int (*)(clockid_t, struct timespec *))vdso_func;
printf("vdso %p\n", f); // <-- this line does NOT crash.
if (f) {
r = f(clk, ts);
if (!r) {
printf("ret %d\n", r); // <-- this line DOES crash.
return r;
}
if (r == -EINVAL)
return __syscall_ret(r);
}
#endif
printf("falling through\n"); // <--- this line DOES crash.
r = __syscall(SYS_clock_gettime, clk, ts); // <-- also, this line will crash too
if (r == -ENOSYS) {
if (clk == CLOCK_REALTIME) {
__syscall(SYS_gettimeofday, ts, 0);
ts->tv_nsec = (int)ts->tv_nsec * 1000;
return 0;
}
r = -EINVAL;
}
return __syscall_ret(r);
}
It seems like somehow the stack frame is corrupted/unusable after a call
to the vdso. But, returning immediately from clock_gettime after a call
to the vdso allows the program to continue. Thus, this problem only
manifests itself when using clocks that aren't handled by the vdso.
It's possible this is due to some compiler ABI mismatch situation
between userspace and kernelspace. However, I've only started seeing
this happen with 5.5 and not on 5.4.
Does the above description immediately point to some recognizable
change? If not, I'll keep debugging.
Thanks,
Jason
Looks like the problem goes away by reverting 24640f233b46 ("mips: Add
support for generic vDSO").
Hi Jason,
Copying Vincenzo.
On Mon, Dec 23, 2019 at 02:08:34PM +0100, Jason A. Donenfeld wrote:
> I'm experiencing VDSO-related crashes on 5.5 with MIPS64. The MIPS64
> builders on build.wireguard.com are all red at the moment.
>
> It looks like libc is crashing with a null pointer dereference when
> doing any work after returning from clock_gettime. This manifests
> itself, for me, with calls to clock_gettime(CLOCK_PROCESS_CPUTIME_ID),
> because CLOCK_PROCESS_CPUTIME_ID is not in the VDSO. It looks in the
> VDSO, doesn't find it, and then proceeds to make the real syscall, when
> it crashes. I can simulate the same crash by simply adding a printf
> after a successful call to the vdso before returning. For example:
>
> int __clock_gettime(clockid_t clk, struct timespec *ts)
> {
> int r;
>
> #ifdef VDSO_CGT_SYM
> int (*f)(clockid_t, struct timespec *) =
> (int (*)(clockid_t, struct timespec *))vdso_func;
> printf("vdso %p\n", f); // <-- this line does NOT crash.
> if (f) {
> r = f(clk, ts);
> if (!r) {
> printf("ret %d\n", r); // <-- this line DOES crash.
> return r;
> }
> if (r == -EINVAL)
> return __syscall_ret(r);
> }
> #endif
> printf("falling through\n"); // <--- this line DOES crash.
> r = __syscall(SYS_clock_gettime, clk, ts); // <-- also, this line will crash too
> if (r == -ENOSYS) {
> if (clk == CLOCK_REALTIME) {
> __syscall(SYS_gettimeofday, ts, 0);
> ts->tv_nsec = (int)ts->tv_nsec * 1000;
> return 0;
> }
> r = -EINVAL;
> }
> return __syscall_ret(r);
> }
>
> It seems like somehow the stack frame is corrupted/unusable after a call
> to the vdso. But, returning immediately from clock_gettime after a call
> to the vdso allows the program to continue. Thus, this problem only
> manifests itself when using clocks that aren't handled by the vdso.
>
> It's possible this is due to some compiler ABI mismatch situation
> between userspace and kernelspace. However, I've only started seeing
> this happen with 5.5 and not on 5.4.
>
> Does the above description immediately point to some recognizable
> change? If not, I'll keep debugging.
There is one pending fix for the VDSO in mips-fixes, commit 7d2aa4bb90f5
("mips: Fix gettimeofday() in the vdso library") but your symptoms sound
different to the problem fixed there...
Could you share your kernel config & tell us which platform you're
running on? (QEMU Malta?)
Thanks,
Paul
More details forthcoming, but I just bisected this to:
commit 942437c97fd9ff23a17c13118f50bd0490f6868c (refs/bisect/bad)
Author: Arnd Bergmann <[email protected]>
Date: Mon Jul 15 11:46:10 2019 +0200
y2038: allow disabling time32 system calls
At the moment, the compilation of the old time32 system calls depends
purely on the architecture. As systems with new libc based on 64-bit
time_t are getting deployed, even architectures that previously supported
these (notably x86-32 and arm32 but also many others) no longer depend on
them, and removing them from a kernel image results in a smaller kernel
binary, the same way we can leave out many other optional system calls.
More importantly, on an embedded system that needs to keep working
beyond year 2038, any user space program calling these system calls
is likely a bug, so removing them from the kernel image does provide
an extra debugging help for finding broken applications.
I've gone back and forth on hiding this option unless CONFIG_EXPERT
is set. This version leaves it visible based on the logic that
eventually it will be turned off indefinitely.
Acked-by: Christian Brauner <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
When the VDSO falls back to 32-bit time functions on kernels with
COMPAT_32BIT_TIME=n, userspace becomes corrupted and appears to crash
shortly after, with something like:
[ 0.359617] do_page_fault(): sending SIGSEGV to init for invalid read access from 000000007ff790d0
[ 0.359843] epc = 0000000077e45df4 in libc.so[77da6000+de000]
[ 0.360319] ra = 0000000010000c50 in init[10000000+2000]
[ 0.364456] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
This can be reproduced with simply calling `clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts)`,
since `CLOCK_PROCESS_CPUTIME_ID` is not exported to the VDSO, invoking
the syscall callback branch. This crash was observed with musl 1.20's
clock_gettime implementation:
int __clock_gettime(clockid_t clk, struct timespec *ts)
{
int r;
int (*f)(clockid_t, struct timespec *) =
(int (*)(clockid_t, struct timespec *))vdso_func;
if (f) {
r = f(clk, ts);
if (!r) {
return r;
}
if (r == -EINVAL)
return __syscall_ret(r);
}
r = __syscall(SYS_clock_gettime, clk, ts); // <-- CRASH
if (r == -ENOSYS) {
if (clk == CLOCK_REALTIME) {
__syscall(SYS_gettimeofday, ts, 0);
ts->tv_nsec = (int)ts->tv_nsec * 1000;
return 0;
}
r = -EINVAL;
}
return __syscall_ret(r);
}
The particular kernel and libc are built as part of the MIPS64 CI on
build.wireguard.com which generally uses as minimal configurations as
possible.
Signed-off-by: Jason A. Donenfeld <[email protected]>
Cc: Paul Burton <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Vincenzo Frascino <[email protected]>
Cc: Christian Brauner <[email protected]>
Fixes: 942437c97fd9 ("y2038: allow disabling time32 system calls")
---
arch/mips/include/asm/vdso/gettimeofday.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/mips/include/asm/vdso/gettimeofday.h b/arch/mips/include/asm/vdso/gettimeofday.h
index b08825531e9f..7f1aa610e68e 100644
--- a/arch/mips/include/asm/vdso/gettimeofday.h
+++ b/arch/mips/include/asm/vdso/gettimeofday.h
@@ -107,7 +107,7 @@ static __always_inline int clock_getres_fallback(
return error ? -ret : ret;
}
-#if _MIPS_SIM != _MIPS_SIM_ABI64
+#if _MIPS_SIM != _MIPS_SIM_ABI64 && defined(CONFIG_COMPAT_32BIT_TIME)
#define VDSO_HAS_32BIT_FALLBACK 1
--
2.24.1
Hi Paul,
.config is attached. This is run with:
qemu-system-mips64 -nodefaults -nographic -smp 4 -cpu MIPS64R2-generic
-machine malta -smp 1 -m $(grep -q CONFIG_DEBUG_KMEMLEAK=y
/home/zx2c4/Projects/WireGuard/src/tests/qemu/../../../qemu-build/mips64/linux-linus-git/.config
&& echo 1G || echo 256M) -serial stdio -serial
file:/home/zx2c4/Projects/WireGuard/src/tests/qemu/../../../qemu-build/mips64/result
-no-reboot -monitor none -kernel
/home/zx2c4/Projects/WireGuard/src/tests/qemu/../../../qemu-build/mips64/linux-linus-git/vmlinux
This patch appears to "fix" the problem:
https://lore.kernel.org/linux-mips/[email protected]/
though I don't yet have a handle on exactly what's causing the prior
crash.
Thanks,
Jason
On Tue, Dec 24, 2019 at 2:54 PM Jason A. Donenfeld <[email protected]> wrote:
>
> When the VDSO falls back to 32-bit time functions on kernels with
> COMPAT_32BIT_TIME=n, userspace becomes corrupted and appears to crash
> shortly after, with something like:
>
> [ 0.359617] do_page_fault(): sending SIGSEGV to init for invalid read access from 000000007ff790d0
> [ 0.359843] epc = 0000000077e45df4 in libc.so[77da6000+de000]
> [ 0.360319] ra = 0000000010000c50 in init[10000000+2000]
> [ 0.364456] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> This can be reproduced with simply calling `clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts)`,
> since `CLOCK_PROCESS_CPUTIME_ID` is not exported to the VDSO, invoking
> the syscall callback branch. This crash was observed with musl 1.20's
> clock_gettime implementation:
Thanks for the bug report! I'm not completely sure why this fails in
this particular
way though. I assume you are using musl-1.1.20, not a musl-1.2.0 snapshot
(the version 1.20 you list does not exist), so the combination you are testing
is supposed to just return -ENOSYS here to match the behavior of hte
system call.
> --- a/arch/mips/include/asm/vdso/gettimeofday.h
> +++ b/arch/mips/include/asm/vdso/gettimeofday.h
> @@ -107,7 +107,7 @@ static __always_inline int clock_getres_fallback(
> return error ? -ret : ret;
> }
>
> -#if _MIPS_SIM != _MIPS_SIM_ABI64
> +#if _MIPS_SIM != _MIPS_SIM_ABI64 && defined(CONFIG_COMPAT_32BIT_TIME)
>
> #define VDSO_HAS_32BIT_FALLBACK 1
>
I don't think this is the correct fix, it may actually make it worse
by changing the vdso implementation for clock_gettime32()
to fall back to clock_gettime64(), which would appear to work
correctly before y2038 but fail afterwards. How about this one:
diff --git a/arch/mips/vdso/vdso.lds.S b/arch/mips/vdso/vdso.lds.S
index da4627430aba..0bdc6a026be8 100644
--- a/arch/mips/vdso/vdso.lds.S
+++ b/arch/mips/vdso/vdso.lds.S
@@ -93,9 +93,11 @@ VERSION
LINUX_2.6 {
#ifndef DISABLE_MIPS_VDSO
global:
+#if (_MIPS_SIM == _MIPS_SIM_ABI64) || defined(CONFIG_COMPAT_32BIT_TIME)
__vdso_clock_gettime;
__vdso_gettimeofday;
__vdso_clock_getres;
+#endif
#if _MIPS_SIM != _MIPS_SIM_ABI64
__vdso_clock_gettime64;
#endif
That should ensure that no user space can call the old vdso
functions on a kernel that intentionally breaks the actual
syscalls.
Arnd
On Mon, Dec 30, 2019 at 12:58 PM Arnd Bergmann <[email protected]> wrote:
> Thanks for the bug report! I'm not completely sure why this fails in
> this particular
> way though. I assume you are using musl-1.1.20, not a musl-1.2.0 snapshot
Yes, that's the one, sorry.
> diff --git a/arch/mips/vdso/vdso.lds.S b/arch/mips/vdso/vdso.lds.S
> index da4627430aba..0bdc6a026be8 100644
> --- a/arch/mips/vdso/vdso.lds.S
> +++ b/arch/mips/vdso/vdso.lds.S
> @@ -93,9 +93,11 @@ VERSION
> LINUX_2.6 {
> #ifndef DISABLE_MIPS_VDSO
> global:
> +#if (_MIPS_SIM == _MIPS_SIM_ABI64) || defined(CONFIG_COMPAT_32BIT_TIME)
> __vdso_clock_gettime;
> __vdso_gettimeofday;
> __vdso_clock_getres;
> +#endif
> #if _MIPS_SIM != _MIPS_SIM_ABI64
> __vdso_clock_gettime64;
> #endif
>
> That should ensure that no user space can call the old vdso
> functions on a kernel that intentionally breaks the actual
> syscalls.
I can confirm that patch fixes things. Thanks.
Jason
On Mon, Dec 30, 2019 at 1:27 PM Jason A. Donenfeld <[email protected]> wrote:
>
> On Mon, Dec 30, 2019 at 12:58 PM Arnd Bergmann <[email protected]> wrote:
> > Thanks for the bug report! I'm not completely sure why this fails in
> > this particular
> > way though. I assume you are using musl-1.1.20, not a musl-1.2.0 snapshot
>
> Yes, that's the one, sorry.
>
> > diff --git a/arch/mips/vdso/vdso.lds.S b/arch/mips/vdso/vdso.lds.S
> > index da4627430aba..0bdc6a026be8 100644
> > --- a/arch/mips/vdso/vdso.lds.S
> > +++ b/arch/mips/vdso/vdso.lds.S
> > @@ -93,9 +93,11 @@ VERSION
> > LINUX_2.6 {
> > #ifndef DISABLE_MIPS_VDSO
> > global:
> > +#if (_MIPS_SIM == _MIPS_SIM_ABI64) || defined(CONFIG_COMPAT_32BIT_TIME)
> > __vdso_clock_gettime;
> > __vdso_gettimeofday;
> > __vdso_clock_getres;
> > +#endif
> > #if _MIPS_SIM != _MIPS_SIM_ABI64
> > __vdso_clock_gettime64;
> > #endif
> >
> > That should ensure that no user space can call the old vdso
> > functions on a kernel that intentionally breaks the actual
> > syscalls.
>
> I can confirm that patch fixes things. Thanks.
Ok, that's good news, but I think we still need to answer two questions:
- Why does it crash in the first place rather than returning -ENOSYS?
- How does it actually work if you run an application built against
an old musl version on a kernel that tries to make this not work?
Do you just get a random time (uninitialized user space stack) and
work with that without checking the error code?
Arnd
On Mon, Dec 30, 2019 at 1:34 PM Arnd Bergmann <[email protected]> wrote:
>
> - Why does it crash in the first place rather than returning -ENOSYS?
There's a bit of speculation about this in the original thread that
prompted this patch (you're CC'd).
>
> - How does it actually work if you run an application built against
> an old musl version on a kernel that tries to make this not work?
> Do you just get a random time (uninitialized user space stack) and
> work with that without checking the error code?
Actually, your patch fails here. The ts struct remains as it was
before, filled with garbage. No good. My original patch in this
thread, though, does result in the correct value being written to ts.
Jason
On Mon, Dec 30, 2019 at 3:37 PM Jason A. Donenfeld <[email protected]> wrote:
>
> On Mon, Dec 30, 2019 at 1:34 PM Arnd Bergmann <[email protected]> wrote:
> >
> > - Why does it crash in the first place rather than returning -ENOSYS?
>
> There's a bit of speculation about this in the original thread that
> prompted this patch (you're CC'd).
The following will provoke the crash:
__attribute__((noinline)) void somefunc(void) { }
int __clock_gettime(clockid_t clk, struct timespec *ts)
{
((int (*)(clockid_t, struct timespec *))vdso_func)(clk, ts);
somefunc();
return 88;
}
It seems like the VDSO is doing something to the stack.
On Mon, Dec 30, 2019 at 3:37 PM Jason A. Donenfeld <[email protected]> wrote:
>
> On Mon, Dec 30, 2019 at 1:34 PM Arnd Bergmann <[email protected]> wrote:
> >
> > - Why does it crash in the first place rather than returning -ENOSYS?
>
> There's a bit of speculation about this in the original thread that
> prompted this patch (you're CC'd).
>
> >
> > - How does it actually work if you run an application built against
> > an old musl version on a kernel that tries to make this not work?
> > Do you just get a random time (uninitialized user space stack) and
> > work with that without checking the error code?
>
> Actually, your patch fails here. The ts struct remains as it was
> before, filled with garbage. No good. My original patch in this
> thread, though, does result in the correct value being written to ts.
Ok, that is the intended behavior then, clock_gettime() needs
to fail with -EINVAL or -ENOSYS here (depending on the libc
implementation), and of course the data is not updated.
Returning success from clock_gettime() on a kernel with only
time64 support and a libc with only time32 support (or vice
versa) would be a bug.
Arnd
On Mon, Dec 30, 2019 at 4:37 PM Arnd Bergmann <[email protected]> wrote:
>
> On Mon, Dec 30, 2019 at 3:37 PM Jason A. Donenfeld <[email protected]> wrote:
> >
> > On Mon, Dec 30, 2019 at 1:34 PM Arnd Bergmann <[email protected]> wrote:
> > >
> > > - Why does it crash in the first place rather than returning -ENOSYS?
> >
> > There's a bit of speculation about this in the original thread that
> > prompted this patch (you're CC'd).
> >
> > >
> > > - How does it actually work if you run an application built against
> > > an old musl version on a kernel that tries to make this not work?
> > > Do you just get a random time (uninitialized user space stack) and
> > > work with that without checking the error code?
> >
> > Actually, your patch fails here. The ts struct remains as it was
> > before, filled with garbage. No good. My original patch in this
> > thread, though, does result in the correct value being written to ts.
>
> Ok, that is the intended behavior then, clock_gettime() needs
> to fail with -EINVAL or -ENOSYS here (depending on the libc
> implementation), and of course the data is not updated.
>
> Returning success from clock_gettime() on a kernel with only
> time64 support and a libc with only time32 support (or vice
> versa) would be a bug.
Ah, right, hence why the 32-bit compat code is behind a
still-on-by-default-but-not-for-long menu option.
On Mon, Dec 30, 2019 at 4:39 PM Jason A. Donenfeld <[email protected]> wrote:
>
> On Mon, Dec 30, 2019 at 4:37 PM Arnd Bergmann <[email protected]> wrote:
> >
> > On Mon, Dec 30, 2019 at 3:37 PM Jason A. Donenfeld <[email protected]> wrote:
> > >
> > > On Mon, Dec 30, 2019 at 1:34 PM Arnd Bergmann <[email protected]> wrote:
> > Returning success from clock_gettime() on a kernel with only
> > time64 support and a libc with only time32 support (or vice
> > versa) would be a bug.
>
> Ah, right, hence why the 32-bit compat code is behind a
> still-on-by-default-but-not-for-long menu option.
I expect this to remain on-by-default for a rather long time, as we still
to run old binaries well into 2030s. Making it configurable is done for
two reasons:
- on embedded systems that have all user space built with time64,
turning this off can make the kernel slightly smaller
- turning it off lets you check that no code relies on calling the old
interfaces internally, bypassing the C library, in a way that works
at the moment but may break after 2038.
Arnd
On Tue, Dec 24, 2019 at 2:37 PM Jason A. Donenfeld <[email protected]> wrote:
>
> More details forthcoming, but I just bisected this to:
>
> commit 942437c97fd9ff23a17c13118f50bd0490f6868c (refs/bisect/bad)
> Author: Arnd Bergmann <[email protected]>
> Date: Mon Jul 15 11:46:10 2019 +0200
>
> y2038: allow disabling time32 system calls
>
> At the moment, the compilation of the old time32 system calls depends
> purely on the architecture. As systems with new libc based on 64-bit
> time_t are getting deployed, even architectures that previously supported
> these (notably x86-32 and arm32 but also many others) no longer depend on
> them, and removing them from a kernel image results in a smaller kernel
> binary, the same way we can leave out many other optional system calls.
My best guess right now is that there is a bug in the error handling
inside the mips vdso: clock_gettime32_fallback() is defined like
static __always_inline long clock_gettime32_fallback(
clockid_t _clkid,
struct old_timespec32 *_ts)
{
register struct old_timespec32 *ts asm("a1") = _ts;
register clockid_t clkid asm("a0") = _clkid;
register long ret asm("v0");
register long nr asm("v0") = __NR_clock_gettime;
register long error asm("a3");
asm volatile(
" syscall\n"
: "=r" (ret), "=r" (error)
: "r" (clkid), "r" (ts), "r" (nr)
: "$1", "$3", "$8", "$9", "$10", "$11", "$12", "$13",
"$14", "$15", "$24", "$25", "hi", "lo", "memory");
return error ? -ret : ret;
}
and it's possible that this does not work correctly when the system
call fails. With my patch, the __NR_clock_gettime syscall always
fails, and this may end up corrupting the registers or the stack
in some way.
One thing you could try is to use a working kernel version (before
my patch, with my patch reverted, or with your workaround applied)
and pass an invalid _clkid argument to make the system call
fail in a different way. Does this cause the same crash?
I see that the previous vdso implementation (added in
180902e08f05, fixed in b399ee28c29c07, removed in
90800281e761) had an issue with the clobber list originally,
but the current version looks identical to the version after
the fix.
Arnd
Makes sense w.r.t. time32 situation.
I still think that in spite of that there's still something weird
happening with the mips VDSO.
Here's a register dump before the call:
$ 0 : 0000000000000000 0000000000000001 0000000010000000 fffffffffffffffc
$ 4 : 0000000000000002 000000007fff2e40 0000000000000000 0000000000000001
$ 8 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000
$12 : 0000000000000000 000000000000000a ffffffff80000000 000000007fffffda
$16 : 0000000010001ba8 0000005800000015 0000000010000000 0000000010000000
$20 : 0000000010000000 0000000010000000 0000000000000000 0000000077ff2ae8
$24 : 0000000000000005 0000000077fa1d18
$28 : 0000000010019cf0 000000007fff2e40 0000000000000000 0000000010000c30
Hi : 0000000000000000
Lo : 0000000000000000
epc : 0000000077fa1d18 0x77fa1d18
ra : 0000000010000c30 0x10000c30
And here it is immediately after:
$ 0 : 0000000000000000 0000000000000001 ffffffffffffffa7 000000007fff5000
$ 4 : 0000000000000002 000000007fff2e40 0000000077ff2000 0000000000000001
$ 8 : 0000000000000006 0000000000000020 0000000000000002 0000000000000000
$12 : 0000000000000000 0000000000001852 ffffffff80156160 000000007fffffda
$16 : 0000000010001ba8 0000005800000015 0000000010000000 0000000010000000
$20 : 0000000010000000 0000000010000000 0000000000000000 0000000077ff2b00
$24 : 0000000000000005 0000000000000000
$28 : 000000007fff5000 000000007fff2e30 0000000000000000 0000000077fa1e00
Hi : 0000000000000000
Lo : 0000000000000000
epc : 0000000077fa1e00 0x77fa1e00
ra : 0000000077fa1e00 0x77fa1e00
I wonder if a toolchain option or compiler bug or something is causing
the vdso to not restore certain registers (gp? ra?).
On Mon, Dec 30, 2019 at 4:58 PM Jason A. Donenfeld <[email protected]> wrote:
>
> Makes sense w.r.t. time32 situation.
>
> I still think that in spite of that there's still something weird
> happening with the mips VDSO.
Agreed.
> Here's a register dump before the call:
>
> $ 0 : 0000000000000000 0000000000000001 0000000010000000 fffffffffffffffc
> $ 4 : 0000000000000002 000000007fff2e40 0000000000000000 0000000000000001
> $ 8 : 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> $12 : 0000000000000000 000000000000000a ffffffff80000000 000000007fffffda
> $16 : 0000000010001ba8 0000005800000015 0000000010000000 0000000010000000
> $20 : 0000000010000000 0000000010000000 0000000000000000 0000000077ff2ae8
> $24 : 0000000000000005 0000000077fa1d18
> $28 : 0000000010019cf0 000000007fff2e40 0000000000000000 0000000010000c30
> Hi : 0000000000000000
> Lo : 0000000000000000
> epc : 0000000077fa1d18 0x77fa1d18
> ra : 0000000010000c30 0x10000c30
>
> And here it is immediately after:
>
> $ 0 : 0000000000000000 0000000000000001 ffffffffffffffa7 000000007fff5000
> $ 4 : 0000000000000002 000000007fff2e40 0000000077ff2000 0000000000000001
> $ 8 : 0000000000000006 0000000000000020 0000000000000002 0000000000000000
> $12 : 0000000000000000 0000000000001852 ffffffff80156160 000000007fffffda
> $16 : 0000000010001ba8 0000005800000015 0000000010000000 0000000010000000
> $20 : 0000000010000000 0000000010000000 0000000000000000 0000000077ff2b00
> $24 : 0000000000000005 0000000000000000
> $28 : 000000007fff5000 000000007fff2e30 0000000000000000 0000000077fa1e00
> Hi : 0000000000000000
> Lo : 0000000000000000
> epc : 0000000077fa1e00 0x77fa1e00
> ra : 0000000077fa1e00 0x77fa1e00
Is this immediately before/after the syscall instruction or the
indirect function call?
> I wonder if a toolchain option or compiler bug or something is causing
> the vdso to not restore certain registers (gp? ra?).
Here is the assembler output I see for the o32 vdso, hopefully I got all
the relevant bits:
# /git/arm-soc/lib/vdso/gettimeofday.c:130: if (unlikely(ret))
.set noreorder
.set nomacro
beqz $2,$L86 #,,
lw $28,16($sp) #,
.set macro
.set reorder
$L46:
# /git/arm-soc/arch/mips/include/asm/vdso/gettimeofday.h:118:
register struct old_timespec32 *ts asm("a1") = _ts;
move $5,$16 # ts, ts
# /git/arm-soc/arch/mips/include/asm/vdso/gettimeofday.h:119:
register clockid_t clkid asm("a0") = _clkid;
move $4,$17 # clkid, clock
# /git/arm-soc/arch/mips/include/asm/vdso/gettimeofday.h:121:
register long nr asm("v0") = __NR_clock_gettime;
li $2,4263 # 0x10a7 # nr,
# /git/arm-soc/arch/mips/include/asm/vdso/gettimeofday.h:124: asm volatile(
#APP
# 124 "/git/arm-soc/arch/mips/include/asm/vdso/gettimeofday.h" 1
syscall
# 0 "" 2
# /git/arm-soc/arch/mips/vdso/vgettimeofday.c:18: }
#NO_APP
lw $31,60($sp) #,
lw $19,56($sp) #,
# /git/arm-soc/arch/mips/include/asm/vdso/gettimeofday.h:131: return
error ? -ret : ret;
subu $3,$0,$2 # <retval>, ret
selnez $3,$3,$7 # tmp406, <retval>, error
seleqz $2,$2,$7 # tmp407, ret, error
or $3,$3,$2 # <retval>, tmp406, tmp407
# /git/arm-soc/arch/mips/vdso/vgettimeofday.c:18: }
lw $18,52($sp) #,
lw $17,48($sp) #,
lw $16,44($sp) #,
move $2,$3 #, <retval>
.set noreorder
.set nomacro
jr $31 #
addiu $sp,$sp,64 #,,
.set macro
.set reorder
gp ($28) and ra ($31) sound like good guesses to me,
SP ($r29) changed from 000000007fff2e40
to 000000007fff2e30, if that is not the intention, it would clearly explain why
anything afterwards crashes, but that seems unlikely.
r3 contains the error code -ENOSYS on mips, so that's good.
$23 is supposed to be preserved across function calls and is
consequently not part of the clobber list but is modified.
$25 is part of the clobber list and is also modified, but there
is no code to save/restore it in the assembler output.
Arnd
On Mon, Dec 30, 2019 at 6:33 PM Arnd Bergmann <[email protected]> wrote
> Is this immediately before/after the syscall instruction or the
> indirect function call?
It's immediately after/before the call to the VDSO function itself.
Next I'll try to instrument the VDSO to get closer to that syscall.
I produced those reg dumps by hooking the page fault handler in the
kernel to print them and then disabling aslr and sticking a
`*(volatile int *)0 = 0;` in the code. Pretty gnarly.
On Mon, Dec 30, 2019 at 10:09 PM Jason A. Donenfeld <[email protected]> wrote:
>
> On Mon, Dec 30, 2019 at 6:33 PM Arnd Bergmann <[email protected]> wrote
> > Is this immediately before/after the syscall instruction or the
> > indirect function call?
>
> It's immediately after/before the call to the VDSO function itself.
> Next I'll try to instrument the VDSO to get closer to that syscall.
>
> I produced those reg dumps by hooking the page fault handler in the
> kernel to print them and then disabling aslr and sticking a
> `*(volatile int *)0 = 0;` in the code. Pretty gnarly.
Here's immediately before and immediately after the syscall asm that
the vdso has in mips/include/asm/vdso/gettimeofday.h. sp and ra are
wrong?
Before:
[ 0.546364] $ 0 : 0000000000000000 0000000000000001
0000000000000002 0000000000000000
[ 0.546545] $ 4 : 000000007fff4000 0000000000000000
0000000077ff0000 0000000000000406
[ 0.546762] $ 8 : 000000007fff5000 0000000000000020
0000000000000002 0000000000000000
[ 0.546912] $12 : 0000000000000000 000000000000000a
ffffffff80000000 000000000000006d
[ 0.547046] $16 : 000000007fff2e40 000000007fff2e40
0000000010000000 0000000010000000
[ 0.547178] $20 : 0000000010000000 0000000010000000
0000000000000000 0000000077ff0000
[ 0.547548] $24 : 0000000000000005 0000000000000000
[ 0.547743] $28 : 000000007fff5000 000000007fff2df0
0000000000000000 000000007fff550c
[ 0.547898] Hi : 0000000000000000
[ 0.548010] Lo : 0000000000000000
[ 0.548175] epc : 000000007fff5580 0x7fff5580
[ 0.548358] ra : 000000007fff550c 0x7fff550c
[ 0.549305] Stack : 0000000000000002 000000007fff2e40
0000000000000002 0000000077f9e80c
[ 0.549500] 0000000000000000 0000000000000000
ffffffffffffffff 0000000010000000
[ 0.549687] 0000000010019dd0 0000000010000c20
0000000077ff0000 0000000077fa4868
[ 0.549951] 0000000377ff19b8 0000000000000000
000000007fff2f04 0000000000000001
[ 0.550127] 0000000010000870 0000000077ff0000
0000000077fa4868 0000000077ff19b8
[ 0.550277] 0000000077ff7180 0000000077f297ac
7fff2f0c77ff7180 0000000077f29800
[ 0.550458] 0000000000000000 000000007fff2f00
0000000077ff19b8 0000000077ff1e30
[ 0.550613] 0000000010019dd0 0000000010000dec
0000000010019dd0 0000000010000db0
[ 0.550811] 0000000000000000 0000000000000000
000000017fff2fda 000000007fff2fe0
[ 0.550957] 7fff2fe700000000 000000217fff5000
0000001000000020 0000000600001000
After:
[ 0.577975] $ 0 : 0000000000000000 0000000000000001
0000000000000059 000000007fff5000
[ 0.578191] $ 4 : 0000000000000002 000000007fff2e40
0000000077ff0000 0000000000000001
[ 0.578392] $ 8 : 0000000000000006 0000000000000020
0000000000000002 0000000000000000
[ 0.578611] $12 : 0000000000000000 0000000000001852
ffffffff801560e0 000000000000006d
[ 0.578817] $16 : 0000000000000002 000000007fff2e40
0000000010000000 0000000010000000
[ 0.579004] $20 : 0000000010000000 0000000010000000
0000000000000000 0000000077ff0000
[ 0.579149] $24 : 0000000000000005 0000000000000000
[ 0.579375] $28 : 000000007fff5000 000000007fff2de0
0000000000000000 000000007fff551c
[ 0.579640] Hi : 0000000000000000
[ 0.579799] Lo : 0000000000000000
[ 0.579974] epc : 000000007fff55a0 0x7fff55a0
[ 0.580134] ra : 000000007fff551c 0x7fff551c
[ 0.581293] Stack : 0000000000000000 0000000077f9e760
0000000000000002 000000007fff2e40
[ 0.581456] 0000000077ff0000 0000000077f9e80c
0000000000000000 0000000000000000
[ 0.581619] ffffffffffffffff 0000000010000000
0000000010019dd0 0000000010000c20
[ 0.581834] 0000000077ff0000 0000000077fa4868
0000000377ff19b8 0000000000000000
[ 0.581985] 000000007fff2f04 0000000000000001
0000000010000870 0000000077ff0000
[ 0.582136] 0000000077fa4868 0000000077ff19b8
0000000077ff7180 0000000077f297ac
[ 0.582288] 7fff2f0c77ff7180 0000000077f29800
0000000000000000 000000007fff2f00
[ 0.582438] 0000000077ff19b8 0000000077ff1e30
0000000010019dd0 0000000010000dec
[ 0.582585] 0000000010019dd0 0000000010000db0
0000000000000000 0000000000000000
[ 0.582732] 000000017fff2fda 000000007fff2fe0
7fff2fe700000000 000000217fff5000
Here's a "one click" reproducer:
https://data.zx2c4.com/mips-musl-libc-weird-crash-time32-compat.tar.xz
Untar that and hit `make -j$(nproc)`, and you'll get a freshly built
and crashing kernel+userland.
Hi Jason,
On Tue, Dec 31, 2019 at 05:14:41PM +0100, Jason A. Donenfeld wrote:
> Here's a "one click" reproducer:
> https://data.zx2c4.com/mips-musl-libc-weird-crash-time32-compat.tar.xz
>
> Untar that and hit `make -j$(nproc)`, and you'll get a freshly built
> and crashing kernel+userland.
Thanks for the test case. It seems like the VDSO code isn't saving &
restoring $gp/$28, even though it's meant to be callee-saved in both the
n32 & n64 ABIs. With some digging I found that the below seems to
resolve the issue. Could you check whether it works for you?
I'm still not quite sure *why* this happens; perhaps GCC just decides it
doesn't need to save & restore $gp/$28 when it spots that it's being
"used" for __current_thread_info (even though that's never actually
referenced in the VDSO)?
Just moving the declaration of __current_thread_info inside the
current_thread_info() function seems to do the trick too, and is
probably a bit neater.
Thanks,
Paul
---
diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
index 4993db40482c..ac33959bbb1f 100644
--- a/arch/mips/include/asm/thread_info.h
+++ b/arch/mips/include/asm/thread_info.h
@@ -50,7 +50,11 @@ struct thread_info {
}
/* How to get the thread information struct from C. */
+#ifdef __VDSO__
+register struct thread_info *__current_thread_info __asm__("$0");
+#else
register struct thread_info *__current_thread_info __asm__("$28");
+#endif
static inline struct thread_info *current_thread_info(void)
{
Hi Jason,
On Tue, Dec 31, 2019 at 08:10:56PM -0800, Paul Burton wrote:
> I'm still not quite sure *why* this happens; perhaps GCC just decides it
> doesn't need to save & restore $gp/$28 when it spots that it's being
> "used" for __current_thread_info (even though that's never actually
> referenced in the VDSO)?
Ah:
> After defining a global register variable, for the current compilation
> unit:
>
> - If the register is a call-saved register, call ABI is affected: the
> register will not be restored in function epilogue sequences after
> the variable has been assigned. Therefore, functions cannot safely
> return to callers that assume standard ABI.
https://gcc.gnu.org/onlinedocs/gcc/Global-Register-Variables.html
That makes sense then. What doesn't make sense is how this ever
worked. A question for another day...
Thanks,
Paul
On Wed, Jan 1, 2020 at 5:08 AM Paul Burton <[email protected]> wrote:
>
> Hi Jason,
>
> On Tue, Dec 31, 2019 at 05:14:41PM +0100, Jason A. Donenfeld wrote:
> > Here's a "one click" reproducer:
> > https://data.zx2c4.com/mips-musl-libc-weird-crash-time32-compat.tar.xz
> >
> > Untar that and hit `make -j$(nproc)`, and you'll get a freshly built
> > and crashing kernel+userland.
>
> Thanks for the test case. It seems like the VDSO code isn't saving &
> restoring $gp/$28, even though it's meant to be callee-saved in both the
> n32 & n64 ABIs. With some digging I found that the below seems to
> resolve the issue. Could you check whether it works for you?
>
> I'm still not quite sure *why* this happens; perhaps GCC just decides it
> doesn't need to save & restore $gp/$28 when it spots that it's being
> "used" for __current_thread_info (even though that's never actually
> referenced in the VDSO)?
>
> Just moving the declaration of __current_thread_info inside the
> current_thread_info() function seems to do the trick too, and is
> probably a bit neater.
>
> Thanks,
> Paul
>
> ---
> diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
> index 4993db40482c..ac33959bbb1f 100644
> --- a/arch/mips/include/asm/thread_info.h
> +++ b/arch/mips/include/asm/thread_info.h
> @@ -50,7 +50,11 @@ struct thread_info {
> }
>
> /* How to get the thread information struct from C. */
> +#ifdef __VDSO__
> +register struct thread_info *__current_thread_info __asm__("$0");
> +#else
> register struct thread_info *__current_thread_info __asm__("$28");
> +#endif
>
> static inline struct thread_info *current_thread_info(void)
> {
Holy guacamole, nice catch. That's interesting behavior indeed...
I'll leave it to you to submit for 5.5?
On Wed, Jan 1, 2020 at 5:22 AM Paul Burton <[email protected]> wrote:
> That makes sense then. What doesn't make sense is how this ever
> worked. A question for another day...
In most other cases, calls to the vdso would be followed in the parent
function immediately by a return, restoring gp then.