2006-08-01 17:46:14

by Esben Nielsen

[permalink] [raw]
Subject: [-rt] Fix race condition and following BUG in PI-futex

I ran into the bug on 2.6.17-rt8 with the previous posted patches which
make pthread_timed_lock() work on UP, but the bug is there without the
patches - I just can't trigger it - and it is also in the mainline kernel.

The problem is that rt_mutex_next_owner() is used unprotected in
wake_futex_pi(). At least it isn't probably serialiazed against the next
owner being signalled or getting a timeout. The only lock, which is
good enough here, is &pi_state->pi_mutex.wait_lock, so I added this
protection.

Esben

kernel/futex.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

Index: linux-2.6.17-rt8/kernel/futex.c
===================================================================
--- linux-2.6.17-rt8.orig/kernel/futex.c
+++ linux-2.6.17-rt8/kernel/futex.c
@@ -565,6 +565,7 @@ static int wake_futex_pi(u32 __user *uad
if (!pi_state)
return -EINVAL;

+ spin_lock(&pi_state->pi_mutex.wait_lock);
new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);

/*
@@ -590,15 +591,22 @@ static int wake_futex_pi(u32 __user *uad
curval = futex_atomic_cmpxchg_inatomic(uaddr, uval, newval);
dec_preempt_count();

- if (curval == -EFAULT)
+ if (curval == -EFAULT) {
+ spin_unlock(&pi_state->pi_mutex.wait_lock);
return -EFAULT;
- if (curval != uval)
+ }
+ if (curval != uval) {
+ spin_unlock(&pi_state->pi_mutex.wait_lock);
return -EINVAL;
+ }

list_del_init(&pi_state->owner->pi_state_list);
list_add(&pi_state->list, &new_owner->pi_state_list);
pi_state->owner = new_owner;
+ atomic_inc(&pi_state->refcount);
+ spin_unlock(&pi_state->pi_mutex.wait_lock);
rt_mutex_unlock(&pi_state->pi_mutex);
+ free_pi_state(pi_state);

return 0;
}


2006-08-01 18:23:13

by Steven Rostedt

[permalink] [raw]
Subject: Re: [-rt] Fix race condition and following BUG in PI-futex

On Tue, 2006-08-01 at 19:46 +0100, Esben Nielsen wrote:
> I ran into the bug on 2.6.17-rt8 with the previous posted patches which
> make pthread_timed_lock() work on UP, but the bug is there without the
> patches - I just can't trigger it - and it is also in the mainline kernel.
>
> The problem is that rt_mutex_next_owner() is used unprotected in
> wake_futex_pi(). At least it isn't probably serialiazed against the next
> owner being signalled or getting a timeout. The only lock, which is
> good enough here, is &pi_state->pi_mutex.wait_lock, so I added this
> protection.
>
> Esben
>
> kernel/futex.c | 12 ++++++++++--
> 1 file changed, 10 insertions(+), 2 deletions(-)
>
> Index: linux-2.6.17-rt8/kernel/futex.c
> ===================================================================
> --- linux-2.6.17-rt8.orig/kernel/futex.c
> +++ linux-2.6.17-rt8/kernel/futex.c
> @@ -565,6 +565,7 @@ static int wake_futex_pi(u32 __user *uad
> if (!pi_state)
> return -EINVAL;
>
> + spin_lock(&pi_state->pi_mutex.wait_lock);
> new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
>
> /*
> @@ -590,15 +591,22 @@ static int wake_futex_pi(u32 __user *uad
> curval = futex_atomic_cmpxchg_inatomic(uaddr, uval, newval);
> dec_preempt_count();
>
> - if (curval == -EFAULT)
> + if (curval == -EFAULT) {
> + spin_unlock(&pi_state->pi_mutex.wait_lock);
> return -EFAULT;
> - if (curval != uval)
> + }
> + if (curval != uval) {
> + spin_unlock(&pi_state->pi_mutex.wait_lock);
> return -EINVAL;
> + }
>
> list_del_init(&pi_state->owner->pi_state_list);
> list_add(&pi_state->list, &new_owner->pi_state_list);
> pi_state->owner = new_owner;
> + atomic_inc(&pi_state->refcount);

There really needs to be a get_pi_state() or some variant. Having to do
a manual atomic_inc is very dangerous.

> + spin_unlock(&pi_state->pi_mutex.wait_lock);
> rt_mutex_unlock(&pi_state->pi_mutex);
> + free_pi_state(pi_state);

And to stay in line with the kernel, perhaps we should rename this to
put_pi_state. We aren't freeing it if there's still references to it.

-- Steve

>
> return 0;
> }

2006-08-01 20:24:22

by Darren Hart

[permalink] [raw]
Subject: Re: [-rt] Fix race condition and following BUG in PI-futex

On Tuesday 01 August 2006 11:23, you wrote:
> On Tue, 2006-08-01 at 19:46 +0100, Esben Nielsen wrote:
> > I ran into the bug on 2.6.17-rt8 with the previous posted patches which
> > make pthread_timed_lock() work on UP, but the bug is there without the
> > patches - I just can't trigger it - and it is also in the mainline
> > kernel.
> >
> > The problem is that rt_mutex_next_owner() is used unprotected in
> > wake_futex_pi(). At least it isn't probably serialiazed against the next
> > owner being signalled or getting a timeout. The only lock, which is
> > good enough here, is &pi_state->pi_mutex.wait_lock, so I added this
> > protection.
> >
> > Esben
> >
> > kernel/futex.c | 12 ++++++++++--
> > 1 file changed, 10 insertions(+), 2 deletions(-)
> >
> > Index: linux-2.6.17-rt8/kernel/futex.c
> > ===================================================================
> > --- linux-2.6.17-rt8.orig/kernel/futex.c
> > +++ linux-2.6.17-rt8/kernel/futex.c
> > @@ -565,6 +565,7 @@ static int wake_futex_pi(u32 __user *uad
> > if (!pi_state)
> > return -EINVAL;
> >
> > + spin_lock(&pi_state->pi_mutex.wait_lock);
> > new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
> >
> > /*
> > @@ -590,15 +591,22 @@ static int wake_futex_pi(u32 __user *uad
> > curval = futex_atomic_cmpxchg_inatomic(uaddr, uval, newval);
> > dec_preempt_count();
> >
> > - if (curval == -EFAULT)
> > + if (curval == -EFAULT) {
> > + spin_unlock(&pi_state->pi_mutex.wait_lock);
> > return -EFAULT;
> > - if (curval != uval)
> > + }
> > + if (curval != uval) {
> > + spin_unlock(&pi_state->pi_mutex.wait_lock);
> > return -EINVAL;
> > + }
> >
> > list_del_init(&pi_state->owner->pi_state_list);
> > list_add(&pi_state->list, &new_owner->pi_state_list);
> > pi_state->owner = new_owner;
> > + atomic_inc(&pi_state->refcount);
>
> There really needs to be a get_pi_state() or some variant. Having to do
> a manual atomic_inc is very dangerous.

I understand the need to grab the wait_lock in order to serialize
rt_mutex_next_owner(), but why the addition of of the atomic_inc() and the
free_pi_state() ? And if we do need them, shouldn't we place them around the
use of the pi_state, rather than just before the unlock calls?

>
> > + spin_unlock(&pi_state->pi_mutex.wait_lock);
> > rt_mutex_unlock(&pi_state->pi_mutex);
> > + free_pi_state(pi_state);
>
> And to stay in line with the kernel, perhaps we should rename this to
> put_pi_state. We aren't freeing it if there's still references to it.
>
> -- Steve
>
> > return 0;
> > }
>

--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team

2006-08-01 21:07:54

by Steven Rostedt

[permalink] [raw]
Subject: Re: [-rt] Fix race condition and following BUG in PI-futex

On Tue, 2006-08-01 at 13:22 -0700, Darren Hart wrote:

> > >
> > > list_del_init(&pi_state->owner->pi_state_list);
> > > list_add(&pi_state->list, &new_owner->pi_state_list);
> > > pi_state->owner = new_owner;
> > > + atomic_inc(&pi_state->refcount);
> >
> > There really needs to be a get_pi_state() or some variant. Having to do
> > a manual atomic_inc is very dangerous.
>
> I understand the need to grab the wait_lock in order to serialize
> rt_mutex_next_owner(), but why the addition of of the atomic_inc() and the
> free_pi_state() ? And if we do need them, shouldn't we place them around the
> use of the pi_state, rather than just before the unlock calls?

Hmm, is the inc really needed? The hb->lock is held through this and
the pi_state can't go away while that lock is held.

-- Steve


2006-08-01 22:28:46

by Esben Nielsen

[permalink] [raw]
Subject: Re: [-rt] Fix race condition and following BUG in PI-futex



On Tue, 1 Aug 2006, Steven Rostedt wrote:

> On Tue, 2006-08-01 at 13:22 -0700, Darren Hart wrote:
>
>>>>
>>>> list_del_init(&pi_state->owner->pi_state_list);
>>>> list_add(&pi_state->list, &new_owner->pi_state_list);
>>>> pi_state->owner = new_owner;
>>>> + atomic_inc(&pi_state->refcount);
>>>
>>> There really needs to be a get_pi_state() or some variant. Having to do
>>> a manual atomic_inc is very dangerous.
>>
>> I understand the need to grab the wait_lock in order to serialize
>> rt_mutex_next_owner(), but why the addition of of the atomic_inc() and the
>> free_pi_state() ? And if we do need them, shouldn't we place them around the
>> use of the pi_state, rather than just before the unlock calls?
>
> Hmm, is the inc really needed? The hb->lock is held through this and
> the pi_state can't go away while that lock is held.

I was going to ask about that... If you say so they can go. I just added
the inc/dec to be sure.

Esben

>
> -- Steve
>
>

2006-08-03 04:48:08

by Darren Hart

[permalink] [raw]
Subject: Re: [-rt] Fix race condition and following BUG in PI-futex

On Tuesday 01 August 2006 16:28, you wrote:
> On Tue, 1 Aug 2006, Steven Rostedt wrote:
> > On Tue, 2006-08-01 at 13:22 -0700, Darren Hart wrote:
> >>>> list_del_init(&pi_state->owner->pi_state_list);
> >>>> list_add(&pi_state->list, &new_owner->pi_state_list);
> >>>> pi_state->owner = new_owner;
> >>>> + atomic_inc(&pi_state->refcount);
> >>>
> >>> There really needs to be a get_pi_state() or some variant. Having to do
> >>> a manual atomic_inc is very dangerous.
> >>
> >> I understand the need to grab the wait_lock in order to serialize
> >> rt_mutex_next_owner(), but why the addition of of the atomic_inc() and
> >> the free_pi_state() ? And if we do need them, shouldn't we place them
> >> around the use of the pi_state, rather than just before the unlock
> >> calls?
> >
> > Hmm, is the inc really needed? The hb->lock is held through this and
> > the pi_state can't go away while that lock is held.
>
> I was going to ask about that... If you say so they can go. I just added
> the inc/dec to be sure.

So the only thing that frees the pi_state is free_pi_state and it's only
caller is unqueue_me_pi() which must be called with hb->lock held, so since
we already hold it, I think we're fine without the inc/free lines (as Steven
already said). The following patch has the lines removed.

The direct use of the rt_mutex wait_lock seems a little out of place here, as
it "ought to be" restricted to rt_mutex.c. Perhaps some kind of an "atomic"
rt_mutex_set_next_owner() call could abstract this away from futex.c? I
confess I don't see a way to do that without putting futex internals into
rt_mutex.c... so not really any better.


Index: 2.6.17-rt8/kernel/futex.c
===================================================================
--- 2.6.17-rt8.orig/kernel/futex.c 2006-08-02 20:29:42.000000000 -0700
+++ 2.6.17-rt8/kernel/futex.c 2006-08-02 21:39:17.000000000 -0700
@@ -565,6 +565,7 @@
if (!pi_state)
return -EINVAL;

+ spin_lock(&pi_state->pi_mutex.wait_lock);
new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);

/*
@@ -598,6 +599,8 @@
list_del_init(&pi_state->owner->pi_state_list);
list_add(&pi_state->list, &new_owner->pi_state_list);
pi_state->owner = new_owner;
+
+ spin_unlock(&pi_state->pi_mutex.wait_lock);
rt_mutex_unlock(&pi_state->pi_mutex);

return 0;



--
Darren Hart
IBM Linux Technology Center
Realtime Linux Team