On Sat, 2 Feb 2019, Heiko Carstens wrote:
> On Fri, Feb 01, 2019 at 10:59:08PM +0100, Thomas Gleixner wrote:
> > Were you able to capture a trace with the last set of additional trace
> > printks?
>
> Of course I forgot to collect that, sorry! But just reproduced; see
> log below (last 1000 lines) and attachment for full log.
The failing futex is here:
<...>-48786 [002] .... 337.231645: sys_futex(uaddr: 3ff90c00460, op: 6, val: 1, utime: 0, uaddr2: 4, val3: 0)
<...>-48786 [002] .... 337.231646: attach_to_pi_owner: Missing pid 49011
<...>-48786 [002] .... 337.231646: handle_exit_race: uval2 vs uval 8000bf73 vs 8000bf73 (-1)
<...>-48786 [002] .... 337.231741: sys_futex -> 0xfffffffffffffffd
Lets look were it was handled in the kernel right before that:
<...>-49014 [006] .... 337.215675: sys_futex(uaddr: 3ff90c00460, op: 7, val: 3ff00000007, utime: 3ff8d3f8910, uaddr2: 3ff8d3f8910, val3: 3ffc64fe8f7)
<...>-49014 [006] .... 337.215675: do_futex: uaddr: 3ff90c00460 cur: 8000bf76 new: 0
49014 unlocks the futex in the kernel and due to lack of waiters it sets it
to unlocked ---> new: 0.
Between this and the failing sys_futex() invocation, the missing task exits:
<...>-49011 [000] .... 337.221543: handle_futex_death: uaddr: 3ff90c00a00 pi: 1
...
<...>-49011 [000] .... 337.221547: handle_futex_death: uaddr: 3ff90c00488 success
<...>-49011 [000] .... 337.221548: sched_process_exit: comm=ld64.so.1 pid=49011 prio=120
but it does not have futex 3ff90c00460 in its robust list.
So after the unlock @timestamp 337.215675 the kernel does not deal with
that futex at all until the failed lock attempt where it rightfully rejects
the attempt due to the alleged owner being gone.
So this looks more like user space doing something stupid...
As we talked about the missing barriers before, I just looked at
pthread_mutex_trylock() and that does still:
if (robust)
{
ENQUEUE_MUTEX_PI (mutex);
THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
}
So it's missing the barriers which pthread_mutex_lock() has. Grasping for
straws obviously....
Thanks,
tglx
On Sat, Feb 02, 2019 at 11:14:27AM +0100, Thomas Gleixner wrote:
> On Sat, 2 Feb 2019, Heiko Carstens wrote:
> So after the unlock @timestamp 337.215675 the kernel does not deal with
> that futex at all until the failed lock attempt where it rightfully rejects
> the attempt due to the alleged owner being gone.
>
> So this looks more like user space doing something stupid...
>
> As we talked about the missing barriers before, I just looked at
> pthread_mutex_trylock() and that does still:
>
> if (robust)
> {
> ENQUEUE_MUTEX_PI (mutex);
> THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> }
>
> So it's missing the barriers which pthread_mutex_lock() has. Grasping for
> straws obviously....
Excellent! Taking a look into the disassembly of nptl/pthread_mutex_trylock.o
reveals this part:
140: a5 1b 00 01 oill %r1,1
144: e5 48 a0 f0 00 00 mvghi 240(%r10),0 <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
14a: e3 10 a0 e0 00 24 stg %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI
I added a barrier between those two and now the code looks like this:
140: a5 1b 00 01 oill %r1,1
144: e3 10 a0 e0 00 24 stg %r1,224(%r10)
14a: e5 48 a0 f0 00 00 mvghi 240(%r10),0
Looks like this was a one instruction race...
I'll try to reproduce with the patch below (sprinkling compiler
barriers just like the other files have).
diff --git a/nptl/pthread_mutex_trylock.c b/nptl/pthread_mutex_trylock.c
index 7de61f4f68..3b093cb09c 100644
--- a/nptl/pthread_mutex_trylock.c
+++ b/nptl/pthread_mutex_trylock.c
@@ -116,8 +116,12 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex)
mutex->__data.__count = 1;
/* But it is inconsistent unless marked otherwise. */
mutex->__data.__owner = PTHREAD_MUTEX_INCONSISTENT;
-
+ /* We must not enqueue the mutex before we have acquired it.
+ Also see comments at ENQUEUE_MUTEX. */
+ __asm ("" ::: "memory");
ENQUEUE_MUTEX (mutex);
+ /* We need to clear op_pending after we enqueue the mutex. */
+ __asm ("" ::: "memory");
THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
/* Note that we deliberately exist here. If we fall
@@ -177,7 +181,12 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex)
}
while ((oldval & FUTEX_OWNER_DIED) != 0);
+ /* We must not enqueue the mutex before we have acquired it.
+ Also see comments at ENQUEUE_MUTEX. */
+ __asm ("" ::: "memory");
ENQUEUE_MUTEX (mutex);
+ /* We need to clear op_pending after we enqueue the mutex. */
+ __asm ("" ::: "memory");
THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
mutex->__data.__owner = id;
@@ -279,7 +288,11 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex)
/* But it is inconsistent unless marked otherwise. */
mutex->__data.__owner = PTHREAD_MUTEX_INCONSISTENT;
+ /* We must not enqueue the mutex before we have acquired it.
+ Also see comments at ENQUEUE_MUTEX. */
+ __asm ("" ::: "memory");
ENQUEUE_MUTEX (mutex);
+ __asm ("" ::: "memory");
THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
/* Note that we deliberately exit here. If we fall
@@ -308,7 +321,12 @@ __pthread_mutex_trylock (pthread_mutex_t *mutex)
if (robust)
{
+ /* We must not enqueue the mutex before we have acquired it.
+ Also see comments at ENQUEUE_MUTEX. */
+ __asm ("" ::: "memory");
ENQUEUE_MUTEX_PI (mutex);
+ /* We need to clear op_pending after we enqueue the mutex. */
+ __asm ("" ::: "memory");
THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
}
On Sat, 2 Feb 2019, Heiko Carstens wrote:
> On Sat, Feb 02, 2019 at 11:14:27AM +0100, Thomas Gleixner wrote:
> > On Sat, 2 Feb 2019, Heiko Carstens wrote:
> > So after the unlock @timestamp 337.215675 the kernel does not deal with
> > that futex at all until the failed lock attempt where it rightfully rejects
> > the attempt due to the alleged owner being gone.
> >
> > So this looks more like user space doing something stupid...
> >
> > As we talked about the missing barriers before, I just looked at
> > pthread_mutex_trylock() and that does still:
> >
> > if (robust)
> > {
> > ENQUEUE_MUTEX_PI (mutex);
> > THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> > }
> >
> > So it's missing the barriers which pthread_mutex_lock() has. Grasping for
> > straws obviously....
Looks more like a solid tree than a straw now. :)
> Excellent! Taking a look into the disassembly of nptl/pthread_mutex_trylock.o
> reveals this part:
>
> 140: a5 1b 00 01 oill %r1,1
> 144: e5 48 a0 f0 00 00 mvghi 240(%r10),0 <--- THREAD_SETMEM (THREAD_SELF, robust_head.list_op_pending, NULL);
> 14a: e3 10 a0 e0 00 24 stg %r1,224(%r10) <--- last THREAD_SETMEM of ENQUEUE_MUTEX_PI
Awesome.
> I added a barrier between those two and now the code looks like this:
>
> 140: a5 1b 00 01 oill %r1,1
> 144: e3 10 a0 e0 00 24 stg %r1,224(%r10)
> 14a: e5 48 a0 f0 00 00 mvghi 240(%r10),0
>
> Looks like this was a one instruction race...
Fun. JFYI, I said that I reversed the stores in glibc and on my x86 test VM
it took more than _3_ days to trigger. But the good news is, that the trace
looks exactly like the ones you provided. So it looks we are on the right
track.
> I'll try to reproduce with the patch below (sprinkling compiler
> barriers just like the other files have).
Looks about right.
Thanks,
tglx
On Sun, Feb 03, 2019 at 05:30:39PM +0100, Thomas Gleixner wrote:
> On Sat, 2 Feb 2019, Heiko Carstens wrote:
> > I added a barrier between those two and now the code looks like this:
> >
> > 140: a5 1b 00 01 oill %r1,1
> > 144: e3 10 a0 e0 00 24 stg %r1,224(%r10)
> > 14a: e5 48 a0 f0 00 00 mvghi 240(%r10),0
> >
> > Looks like this was a one instruction race...
>
> Fun. JFYI, I said that I reversed the stores in glibc and on my x86 test VM
> it took more than _3_ days to trigger. But the good news is, that the trace
> looks exactly like the ones you provided. So it looks we are on the right
> track.
>
> > I'll try to reproduce with the patch below (sprinkling compiler
> > barriers just like the other files have).
>
> Looks about right.
The test case now runs since two days without failures. So it looks like you
found the bug! Thank you for debugging this!
My glibc patch missed at lease one place where I should have added another
barrier, but the current version was good enough for the test case ;)
Stefan Liebler is kind enough to take care that this will be fixed in glibc.
Thanks!