I am seeing a bug in clone() on the Alpha architecture. Reported to
Debian as https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=755397
The test suite of glibc sometimes fails in the nptl/tst-eintr3 test
with a segmentation fault. I have tracked it down to the thread
pointer returned by the rduniq PALcall is occasionally zero when
it should point to the TLS. I have only ever seen this occur when
running a SMP kernel.
Running strace on nptl/tst-eintr3 reveals that the clone() syscall
is retried by the kernel if an ERESTARTNOINTR error occurs. At
$syscall_error in arch/alpha/kernel/entry.S the kernel handles the
error and in doing that it writes to 72(sp) which is where the value
of the a3 CPU register on entry to the kernel is stored. Then the
kernel retries the clone() function. But the alpha specific code
for copy_thread() in arch/alpha/kernel/process.c does not use the
passed a3 cpu register (the argument tls), instead it goes to the
saved stack to get the value of the a3 register, which on the
second call to clone() has been modified to no longer be the value
of the a3 cpu register on entry to the kernel. And a latent bomb
is laid for userspace in the form of an incorrect process unique
value (which is the thread pointer) in the PCB.
Am I correct in my analysis and, if so, can we get a fix for this
please.
Cheers
Michael.
On 07/22/2014 10:52 PM, Michael Cree wrote:
> Running strace on nptl/tst-eintr3 reveals that the clone() syscall
> is retried by the kernel if an ERESTARTNOINTR error occurs. At
> $syscall_error in arch/alpha/kernel/entry.S the kernel handles the
> error and in doing that it writes to 72(sp) which is where the value
> of the a3 CPU register on entry to the kernel is stored. Then the
> kernel retries the clone() function. But the alpha specific code
> for copy_thread() in arch/alpha/kernel/process.c does not use the
> passed a3 cpu register (the argument tls), instead it goes to the
> saved stack to get the value of the a3 register, which on the
> second call to clone() has been modified to no longer be the value
> of the a3 cpu register on entry to the kernel. And a latent bomb
> is laid for userspace in the form of an incorrect process unique
> value (which is the thread pointer) in the PCB.
>
> Am I correct in my analysis and, if so, can we get a fix for this
> please.
Well... let me start with the assumption that we can't possibly restart unless
the syscall fails with -ERESTART*.
Before we clobber 72($sp), $syscall_error saves the old value in $19. This is
the r19 parameter to do_work_pending, and is passed all the way down to
syscall_restart where we do restore the original value of a3 for ERESTARTNOINTR.
So if there's a path that leads to restart, but doesn't save a3 before
clobbering, I don't see it. Do you have an strace dump that shows this?
r~
On Thu, Jul 24, 2014 at 08:19:52AM -1000, Richard Henderson wrote:
> On 07/22/2014 10:52 PM, Michael Cree wrote:
> > Running strace on nptl/tst-eintr3 reveals that the clone() syscall
> > is retried by the kernel if an ERESTARTNOINTR error occurs. At
> > $syscall_error in arch/alpha/kernel/entry.S the kernel handles the
> > error and in doing that it writes to 72(sp) which is where the value
> > of the a3 CPU register on entry to the kernel is stored. Then the
> > kernel retries the clone() function. But the alpha specific code
> > for copy_thread() in arch/alpha/kernel/process.c does not use the
> > passed a3 cpu register (the argument tls), instead it goes to the
> > saved stack to get the value of the a3 register, which on the
> > second call to clone() has been modified to no longer be the value
> > of the a3 cpu register on entry to the kernel. And a latent bomb
> > is laid for userspace in the form of an incorrect process unique
> > value (which is the thread pointer) in the PCB.
> >
> > Am I correct in my analysis and, if so, can we get a fix for this
> > please.
>
> Well... let me start with the assumption that we can't possibly restart unless
> the syscall fails with -ERESTART*.
>
> Before we clobber 72($sp), $syscall_error saves the old value in $19. This is
> the r19 parameter to do_work_pending, and is passed all the way down to
> syscall_restart where we do restore the original value of a3 for ERESTARTNOINTR.
>
> So if there's a path that leads to restart, but doesn't save a3 before
> clobbering, I don't see it. Do you have an strace dump that shows this?
Yes. This is an example of a run of nptl/tst-eintr3 that fails after
cutting off quite a bit of stuff at the start to get to the relevant
section:
clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0x2000121f8e0, child_tidptr=0x2000121f2c0) = ? ERESTARTNOINTR (To be restarted)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_TKILL, si_pid=20086, si_uid=1000} ---
write(1, ".", 1.) = 1
sigreturn() (mask []) = -1 ERRNO_312 (Unknown error 312)
clone(child_stack=0x2000121eae0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x2000121f2c0, tls=0, child_tidptr=0x2000121f2c0) = 20089
+++ killed by SIGSEGV +++
Note that the retry of clone() has zero for the tls argument.
Examining the resultant core dump reveals that tst-eintr3 segfaulted
when trying to access a thread local variable and that register v0,
used in calculating the TLS location and set up by the rduniq PALcall,
is zero.
Cheers
Michael.
Ok, I think I've actually found it this time. It's here:
> 281 childregs->r20 = 1; /* OSF/1 has some strange fork() semantics. */
> 282 regs->r20 = 0;
We need to delay this r20 silliness until after restarts or something. Or just
kill it -- it's not like glibc uses this return value.
r~