All,
So we've been seeing PI mutex deadlocks with a few of our applications
using the -rt kernel. After narrowing things down, we were finding that
the applications were indirectly calling futex_unlock_pi(), which on
occasion would return -EFAULT, which is promptly ignored by glibc. This
would cause the application to continue on as if it has unlocked the
mutex, until it tried to reacquire it and deadlock.
In looking into why the kernel was returning -EFAULT, I found the
following:
...
retry_locked:
/*
* To avoid races, try to do the TID -> 0 atomic transition
* again. If it succeeds then we can return without waking
* anyone else up:
*/
if (!(uval & FUTEX_OWNER_DIED)) {
pagefault_disable();
uval = futex_atomic_cmpxchg_inatomic(uaddr, current->pid, 0);
pagefault_enable();
}
if (unlikely(uval == -EFAULT))
goto pi_faulted;
...[snip]...
pi_faulted:
/*
* We have to r/w *(int __user *)uaddr, but we can't modify it
* non-atomically. Therefore, if get_user below is not
* enough, we need to handle the fault ourselves, while
* still holding the mmap_sem.
*/
if (attempt++) {
ret = futex_handle_fault((unsigned long)uaddr, fshared,
attempt);
if (ret)
goto out_unlock;
goto retry_locked;
}
Should we fault through normal causes, on the second round we call
futex_handle_fault, which faults in the address, and we then jump back
to retry_locked. However, since uval is -EFAULT from the last cmpxchg,
it &s w/ FUTEX_OWNER_DIED so we don't enter the first conditional to try
to cmpxchg again. So since uval is still -EFAULT, we loop back to
pi_faulted! This will loop until futex_handle_fault() bombs out because
attempt is too big and we return -EFAULT.
I *think* this is a possible quick fix here, but I'm no futex guru, so I
wanted to run it by folks for review.
Big thanks to Sripathi and Angela Lin for helping debug this, and Steven
for suggesting a cleaner fix then what I first tried.
thanks
-john
Avoid futex_unlock_pi returning -EFAULT (which results in deadlock), by
clearing uval before jumping to retry_locked.
Signed-off-by: John Stultz <[email protected]>
---
diff --git a/kernel/futex.c b/kernel/futex.c
index b7ce15c..9969b36 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2011,6 +2011,7 @@ pi_faulted:
attempt);
if (ret)
goto out_unlock;
+ uval = 0;
goto retry_locked;
}
On Wed, 2007-05-30 at 17:49 -0700, john stultz wrote:
> ...
> retry_locked:
> /*
> * To avoid races, try to do the TID -> 0 atomic transition
> * again. If it succeeds then we can return without waking
> * anyone else up:
> */
> if (!(uval & FUTEX_OWNER_DIED)) {
> pagefault_disable();
> uval = futex_atomic_cmpxchg_inatomic(uaddr, current->pid, 0);
> pagefault_enable();
> }
My question is to all the futex gurus out there.
This code is in futex_unlock_pi. Can the owner of the mutex really die?
Isn't the owner the one doing the unlock?
-- Steve
[CC'ing Uli because John forgot to with the original patch]
As John found with futex_unlock_pi (which that patch should apply nicely
to the -rt tree), the code can get screwed up with faulting in at the
cmpxchg in futex_unlock_pi code.
>From John's original email:
----
In looking into why the kernel was returning -EFAULT, I found the
following:
...
retry_locked:
/*
* To avoid races, try to do the TID -> 0 atomic transition
* again. If it succeeds then we can return without waking
* anyone else up:
*/
if (!(uval & FUTEX_OWNER_DIED)) {
pagefault_disable();
uval = futex_atomic_cmpxchg_inatomic(uaddr,
current->pid, 0);
pagefault_enable();
}
if (unlikely(uval == -EFAULT))
goto pi_faulted;
...[snip]...
pi_faulted:
/*
* We have to r/w *(int __user *)uaddr, but we can't modify it
* non-atomically. Therefore, if get_user below is not
* enough, we need to handle the fault ourselves, while
* still holding the mmap_sem.
*/
if (attempt++) {
ret = futex_handle_fault((unsigned long)uaddr, fshared,
attempt);
if (ret)
goto out_unlock;
goto retry_locked;
}
----
We see that uval is -EFAULT when jumping to pi_faulted. But when it goes
back to retry_locked, the uval is still -EFAULT. Which just happens to
have the FUTEX_OWNER_DIED bit set. So we don't do the
futex_atomic_cmpxchg_inatomic call, and go to the next conditional which
just so happens to be a check of uval for -EFAULT. And guess what? it
still is!!
Way to go John in finding this!!
But, the -rt kernel has pretty much the same code for the
futex_unlock_pi64, and it has the same bug. Hence this email.
This patch is the same fix that John posted, but simply for the
futex_unlock_pi64 that exists in the RT kernel and not the mainline.
Signed-off-by: Steven Rostedt <[email protected]>
Index: linux-2.6-rt-work/kernel/futex.c
===================================================================
--- linux-2.6-rt-work.orig/kernel/futex.c 2007-05-30 22:29:04.000000000 -0400
+++ linux-2.6-rt-work/kernel/futex.c 2007-05-30 22:32:01.000000000 -0400
@@ -3503,6 +3503,7 @@ pi_faulted:
ret = -EFAULT;
goto out_unlock;
}
+ uval = 0;
goto retry_locked;
}
Ingo,
This patch contains both John's and mine for -rt. Might as well make it
a single patch to pull in.
Signed-off-by: Steven Rostedt <[email protected]>
Signed-off-by: John Stultz <[email protected]>
---
Index: linux-2.6.21-rt9/kernel/futex.c
===================================================================
--- linux-2.6.21-rt9.orig/kernel/futex.c 2007-05-30 23:12:34.000000000 -0400
+++ linux-2.6.21-rt9/kernel/futex.c 2007-05-30 23:17:05.000000000 -0400
@@ -1839,6 +1839,7 @@ static int futex_lock_pi(u32 __user *uad
ret = -EFAULT;
goto out_unlock_release_sem;
}
+ uval = 0;
goto retry_locked;
}
@@ -3503,6 +3504,7 @@ pi_faulted:
ret = -EFAULT;
goto out_unlock;
}
+ uval = 0;
goto retry_locked;
}
* john stultz <[email protected]> wrote:
> So we've been seeing PI mutex deadlocks with a few of our
> applications using the -rt kernel. After narrowing things down, we
> were finding that the applications were indirectly calling
> futex_unlock_pi(), which on occasion would return -EFAULT, which is
> promptly ignored by glibc. This would cause the application to
> continue on as if it has unlocked the mutex, until it tried to
> reacquire it and deadlock.
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -2011,6 +2011,7 @@ pi_faulted:
> attempt);
> if (ret)
> goto out_unlock;
> + uval = 0;
> goto retry_locked;
this looks good to me. Oleg has posted a few more fixes as well - have
you tried those too?
Ingo
On Thu, 2007-05-31 at 16:24 +0200, Ingo Molnar wrote:
> * john stultz <[email protected]> wrote:
>
> > So we've been seeing PI mutex deadlocks with a few of our
> > applications using the -rt kernel. After narrowing things down, we
> > were finding that the applications were indirectly calling
> > futex_unlock_pi(), which on occasion would return -EFAULT, which is
> > promptly ignored by glibc. This would cause the application to
> > continue on as if it has unlocked the mutex, until it tried to
> > reacquire it and deadlock.
>
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -2011,6 +2011,7 @@ pi_faulted:
> > attempt);
> > if (ret)
> > goto out_unlock;
> > + uval = 0;
> > goto retry_locked;
>
> this looks good to me. Oleg has posted a few more fixes as well - have
> you tried those too?
Do you have a URL or subject line for the discussion?
I'll take a look and give them a whirl.
thanks
-john
On 5/30/07, Steven Rostedt <[email protected]> wrote:
> > if (!(uval & FUTEX_OWNER_DIED)) {
> > pagefault_disable();
> > uval = futex_atomic_cmpxchg_inatomic(uaddr, current->pid, 0);
> > pagefault_enable();
> > }
> [...]
> This code is in futex_unlock_pi. Can the owner of the mutex really die?
> Isn't the owner the one doing the unlock?
This is part of the implementation of robust PI futexes. The
semantics is that if the owner of a robust futex dies the futex is
marked with FUTEX_OWNER_DIED. The next locking thread then has to
clear that bit before calling pthread_mutex_unlock. If the bit is
still set while unlocking the mutex is permanently marked unusable. I
cannot say right now whether this is the semantics implemented above.
I'll have to check the glibc test suite whether we check for that
semantic.
* john stultz <[email protected]> wrote:
> > > --- a/kernel/futex.c
> > > +++ b/kernel/futex.c
> > > @@ -2011,6 +2011,7 @@ pi_faulted:
> > > attempt);
> > > if (ret)
> > > goto out_unlock;
> > > + uval = 0;
> > > goto retry_locked;
> >
> > this looks good to me. Oleg has posted a few more fixes as well -
> > have you tried those too?
>
> Do you have a URL or subject line for the discussion? I'll take a look
> and give them a whirl.
i meant to say Alexey, not Oleg :-) Here's the thread:
http://lkml.org/lkml/2007/5/7/129
i've Cc:-ed Alexey too.
Ingo
On Thu, 2007-05-31 at 16:55 +0200, Ingo Molnar wrote:
> * john stultz <[email protected]> wrote:
>
> > > > --- a/kernel/futex.c
> > > > +++ b/kernel/futex.c
> > > > @@ -2011,6 +2011,7 @@ pi_faulted:
> > > > attempt);
> > > > if (ret)
> > > > goto out_unlock;
> > > > + uval = 0;
> > > > goto retry_locked;
> > >
> > > this looks good to me. Oleg has posted a few more fixes as well -
> > > have you tried those too?
> >
> > Do you have a URL or subject line for the discussion? I'll take a look
> > and give them a whirl.
>
> i meant to say Alexey, not Oleg :-) Here's the thread:
>
> http://lkml.org/lkml/2007/5/7/129
>
> i've Cc:-ed Alexey too.
I haven't had a chance to run it, but reviewing at Alexey's changes,
they do not look to be sufficient to the issue we were hitting. We still
need to clear uval to avoid -EFAULT &'ing w/ FUTEX_OWNER_DIED.
Here's the patch against -rt9 (which also contains Alexey's fix)
diff -rup 2.6-rt/kernel/futex.c 2.6-rtnew/kernel/futex.c
--- 2.6-rt/kernel/futex.c 2007-05-31 06:38:00.000000000 -0500
+++ 2.6-rtnew/kernel/futex.c 2007-05-31 06:43:08.000000000 -0500
@@ -1954,6 +1954,7 @@ pi_faulted:
ret = -EFAULT;
goto out_unlock;
}
+ uval = 0;
goto retry_locked;
}
@@ -3503,6 +3504,7 @@ pi_faulted:
ret = -EFAULT;
goto out_unlock;
}
+ uval = 0;
goto retry_locked;
}
On 5/30/07, Steven Rostedt <[email protected]> wrote:
> But, the -rt kernel has pretty much the same code for the
> futex_unlock_pi64, and it has the same bug.
Why would you use futex_unlock_pi64?
In fact, the whole 64-bit futex patch should be reviewed and cut down
to the bare minimum. I know, I was asking for it loudly but it became
to Jakub and me after adding the private futexes that all these
orthogonal extensions don't make sense. So far we need 64-bit futexes
only for some rwlocks and here only a few operations (wait, wake). We
don't need 64-bit extended to PI, we don't need requeue.
Yes, if we had started out with 64-bit futexes on 64-bit platforms,
that would be fine. But as it is today making 64-bit a new dimension
is not needed.