2014-06-04 01:06:14

by Steven Rostedt

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

Added LKML and Thomas et al., as this looks to be mainline too, and
we've been having so much fun with futexes lately.

What I've thought about is this scenario. In
rt_mutex_adjust_prio_chain(), at the bottom of the loop, just before
goto again is called, all locks are released, and we are fully
preemptible. That means, at that moment, anything can happen.

The reason we are in this code is because we blocked on a lock and we
are pushing the inheritance up as well as checking for deadlocks. But,
there's a flaw here in the deadlock case. Let's say we have this:

Tasks A, B, C and D

Locks L1, L2, L3, L4.

D owns L4, C owns L3, B owns L2. C tries to take L4 and blocks, B tries
to take L3, blocks. We then have:

L2->B->L3->C->L4->D

Perfectly fine. Then A comes along and blocks on L2, where we would
have:

A->L2->B->L3->C->L4->D

Let's say that A is on its chain walk, just before that goto again, and
task is D. top_waiter is C. As all locks are released and preemption is
enabled, lots of things can happen. Let's say they all release their
locks! And now we just have:

A->L2

but things are still running, and they take the locks such that we have:

C->L1->D->L2->B
A->L2

That is, B grabbed L2 (stole it from A), D grabbed L1, D blocked on L2
and C blocked on L1. Now A gets scheduled in and continues.

task->pi_blocked_on

Yep, as task is D, and it's blocked on L2, that is true.

orig_waiter && !rt_mutex_owner(orig_lock)

Well, L2 (the orig_lock) has an owner, thus it won't exit due to this.

top_waiter && (!task_has_pi_waiters(task) ||
top_waiter != task_top_pi_waiter(task))

top_waiter is C, and D has pi_waiters, and C is still the top pi waiter
for D.

Then we get to the test.

lock == orig_lock || rt_mutex_owner(lock) == top_task

lock happens to be L2 and this is the original lock we wanted to take.
This reports a deadlock, but no deadlock scenario ever occurred.

I'm not sure if Brad's patch addresses this, but when reviewing
possible scenarios, this came to my mind.

-- Steve



On Fri, 23 May 2014 09:30:10 -0500
"Brad Mouring" <[email protected]> wrote:

> If, during walking the priority chain on a task blocking on a rtmutex,
> and the task is examining the waiter blocked on the lock owned by a task
> that is not blocking (the end of the chain), the current task is ejected
> from the processor and the owner of the end lock is scheduled in,
> releasing that lock, before the original task is scheduled back in, the
> task misses the fact that the previous owner of the current lock no
> longer holds it.
>
> Signed-off-by: Brad Mouring <[email protected]>
> Acked-by: Scot Salmon <[email protected]>
> Acked-by: Ben Shelton <[email protected]>
> Tested-by: Jeff Westfahl <[email protected]>
> ---
> kernel/locking/rtmutex.c | 19 +++++++++++++++++++
> 1 file changed, 19 insertions(+)
>
> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index fbf152b..029a9ab 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -384,6 +384,25 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
>
> /* Deadlock detection */
> if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> + /*
> + * If the prio chain has changed out from under us, set the task
> + * to the current owner of the lock in the current waiter and
> + * continue walking the prio chain
> + */
> + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
> + /* Release the old owner */
> + raw_spin_unlock_irqrestore(&task->pi_lock, flags);
> + put_task_struct(task);
> +
> + /* Move to the new owner */
> + task = rt_mutex_owner(lock);
> + get_task_struct(task);
> +
> + /* Let's try this again */
> + raw_spin_unlock(&lock->wait_lock);
> + goto retry;
> + }
> +
> debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
> raw_spin_unlock(&lock->wait_lock);
> ret = deadlock_detect ? -EDEADLK : 0;


2014-06-04 13:17:04

by Brad Mouring

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Tue, Jun 03, 2014 at 09:06:09PM -0400, Steven Rostedt wrote:
> Added LKML and Thomas et. al., as this looks to be mainline too, and
> we've been having so much fun with futexes lately.
>
> What I've thought about is this scenario. In
> rt_mutex_adjust_prio_chain(), at the bottom of the loop, just before
> goto again is called, all locks are released, and we are fully
> preemptible. That means, at than moment, anything can happen.
>
> The reason we are in this code is because we blocked on a lock and we
> are pushing the inheritance up as well as checking for deadlocks. But,
> there's a flaw here in the deadlock case. Lets say we have this.
>
> Tasks A, B, C and D
>
> Locks L1, L2, L3, L4.
>
> D owns L4, C owns L3, B owns L2. C tries to take L4 and blocks, B tries
> to take L3, blocks. We then have:
>
> L2->B->L3->C->L4->D
>
> Perfectly fine. Then A comes along and blocks on L2, where we would
> have:
>
> A->L2->B->L3->C->L4->D
>
> Lets say that A on its chain walk just before that goto again, and task
> is D. top_waiter is C. As all locks are released and preemption is
> enabled, lots of things can happen. Lets say they all release their
> locks! And now we just have:
>
> A->L2
>
> but things are still running, and they take the locks such that we have:
>
> C->L1->D->L2->B
> A->L2

This is a slight variation on what I was seeing. To use the nomenclature
that you proposed at the start, rewinding to the point

A->L2->B->L3->C->L4->D

Let's assume things continue to unfold as you explain. Task is D,
top_waiter is C. A is scheduled out and the chain shuffles.

A->L2->B
C->L4->D->'

So, we now have D blocking on L2, and L4 again has a waiter, C. Also,
since the codepath for C to block on L4 again is the same as the
codepath from when it blocked on it in the first place, the location
is the same, since the stack (for what we care about) is the same.

I can see both of these being different expressions of the same problem.

>
> That is, B grabbed L2 (stole it from A), D grabbed L1, D blocked on L2
> and C blocked on L1. Now A gets scheduled in and continues.
>
> task->pi_blocked_on
>
> Yep, as task is D, and it's blocked on L1 that is true.
>
> orig_waiter && !rt_mutex_owner(orig_lock)
>
> well, L1 has a owner, thus it wont exit due to this.
>
> top_waiter && (!task_has_pi_waiters(task) ||
> top_waiter != task_top_pi_waiter(task))
>
> top_waiter is C, and D has pi_waiters, and C is still the top pi waiter
> for D.
>
> Then we get to the test.
>
> lock == orig_lock || rt_mutex_owner(lock) == top_task
>
> lock happens to be L2 and this is the original lock we wanted to take.
> This reports a deadlock, but no deadlock scenario ever occurred.
>
> I'm not sure if Brad's patch addresses this, but when reviewing
> possible scenarios, this came to my mind.

This is what my patch addresses.

>
> -- Steve
>
>
>
> On Fri, 23 May 2014 09:30:10 -0500
> "Brad Mouring" <[email protected]> wrote:
>
> > If, during walking the priority chain on a task blocking on a rtmutex,
> > and the task is examining the waiter blocked on the lock owned by a task
> > that is not blocking (the end of the chain), the current task is ejected
> > from the processor and the owner of the end lock is scheduled in,
> > releasing that lock, before the original task is scheduled back in, the
> > task misses the fact that the previous owner of the current lock no
> > longer holds it.
> >
> > Signed-off-by: Brad Mouring <[email protected]>
> > Acked-by: Scot Salmon <[email protected]>
> > Acked-by: Ben Shelton <[email protected]>
> > Tested-by: Jeff Westfahl <[email protected]>
> > ---
> > kernel/locking/rtmutex.c | 19 +++++++++++++++++++
> > 1 file changed, 19 insertions(+)
> >
> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> > index fbf152b..029a9ab 100644
> > --- a/kernel/locking/rtmutex.c
> > +++ b/kernel/locking/rtmutex.c
> > @@ -384,6 +384,25 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
> >
> > /* Deadlock detection */
> > if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > + /*
> > + * If the prio chain has changed out from under us, set the task
> > + * to the current owner of the lock in the current waiter and
> > + * continue walking the prio chain
> > + */
> > + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
> > + /* Release the old owner */
> > + raw_spin_unlock_irqrestore(&task->pi_lock, flags);
> > + put_task_struct(task);
> > +
> > + /* Move to the new owner */
> > + task = rt_mutex_owner(lock);
> > + get_task_struct(task);
> > +
> > + /* Let's try this again */
> > + raw_spin_unlock(&lock->wait_lock);
> > + goto retry;
> > + }
> > +
> > debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
> > raw_spin_unlock(&lock->wait_lock);
> > ret = deadlock_detect ? -EDEADLK : 0;
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rt-users" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2014-06-04 14:16:18

by Steven Rostedt

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014 08:05:25 -0500
"Brad Mouring" <[email protected]> wrote:

> A->L2
>
> This is a slight variation on what I was seeing. To use the nomenclature
> that you proposed at the start, rewinding to the point
>
> A->L2->B->L3->C->L4->D
>
> Let's assume things continue to unfold as you explain. Task is D,
> top_waiter is C. A is scheduled out and the chain shuffles.
>
> A->L2->B
> C->L4->D->'

But isn't that a lock ordering problem there?

If B can block on L3 owned by C, I see the following:

B->L3->C->L4->D->L2->B

Deadlock!

In my scenario I was very careful to point out that the lock ordering
was: L1->L2->L3->L4

But you show that we can have both:

L2-> ... ->L4

and

L4-> ... ->L2

Which is a reverse of lock ordering and a possible deadlock can occur.

-- Steve


>
> So, we now have D blocking on L2 and L4 has waiters, C again. Also,
> since the codepath to have C block on L4 again is the same as the
> codepath from when it blocked on it in the first place, the location
> is the same since the stack (for what we care about) is the same.
>

2014-06-04 14:39:12

by Brad Mouring

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 08:05:25 -0500
> "Brad Mouring" <[email protected]> wrote:
>
> > A->L2
> >
> > This is a slight variation on what I was seeing. To use the nomenclature
> > that you proposed at the start, rewinding to the point
> >
> > A->L2->B->L3->C->L4->D
> >
> > Let's assume things continue to unfold as you explain. Task is D,
> > top_waiter is C. A is scheduled out and the chain shuffles.
> >
> > A->L2->B
> > C->L4->D->'
>
> But isn't that a lock ordering problem there?
>
> If B can block on L3 owned by C, I see the following:
>
> B->L3->C->L4->D->L2->B
>
> Deadlock!
Yes, it could be. But currently no one owns L3. B is currently not
blocked. Under these circumstances, there is no deadlock. Also, I
somewhat arbitrarily picked L4; it could be Lfoo that C blocks on,
since the process is
...
waiter = D->pi_blocked_on

// waiter is real_waiter D->L2

// orig_waiter still there, orig_lock still has an owner

// top_waiter was pointing to C->L4, now points to C->Lfoo
// D does have top_waiters, and, as noted above, it aliased
// to encompass a different waiter scenario

>
> In my scenario I was very careful to point out that the lock ordering
> was: L1->L2->L3->L4
>
> But you show that we can have both:
>
> L2-> ... ->L4
>
> and
>
> L4-> ... ->L2
>
> Which is a reverse of lock ordering and a possible deadlock can occur.

So the numbering/ordering of the locks is really somewhat arbitrary.
Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
could just as easily block on L8), and we absolutely have
L4-> ... ->L2. A deadlock *could* occur, but in all of the traces that
I dug through, no actual deadlock occurred.
>
> -- Steve
>
>
> >
> > So, we now have D blocking on L2 and L4 has waiters, C again. Also,
> > since the codepath to have C block on L4 again is the same as the
> > codepath from when it blocked on it in the first place, the location
> > is the same since the stack (for what we care about) is the same.
> >

2014-06-04 14:58:28

by Steven Rostedt

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014 09:38:30 -0500
"Brad Mouring" <[email protected]> wrote:

> On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> > On Wed, 4 Jun 2014 08:05:25 -0500
> > "Brad Mouring" <[email protected]> wrote:
> >
> > > A->L2
> > >
> > > This is a slight variation on what I was seeing. To use the nomenclature
> > > that you proposed at the start, rewinding to the point
> > >
> > > A->L2->B->L3->C->L4->D
> > >
> > > Let's assume things continue to unfold as you explain. Task is D,
> > > top_waiter is C. A is scheduled out and the chain shuffles.
> > >
> > > A->L2->B
> > > C->L4->D->'
> >
> > But isn't that a lock ordering problem there?
> >
> > If B can block on L3 owned by C, I see the following:
> >
> > B->L3->C->L4->D->L2->B
> >
> > Deadlock!
> Yes, it could be. But currently no one owns L3. B is currently not
> blocked. Under these circumstances, there is no deadlock. Also, I
> somewhat arbitrarily picked L4, it could be Lfoo that C blocks on
> since the process is

OK, then you should have used L1, which basically makes it exactly my
scenario ;-)

> ...
> waiter = D->pi_blocked_on
>
> // waiter is real_waiter D->L2
>
> // orig_waiter still there, orig_lock still has an owner
>
> // top_waiter was pointing to C->L4, now points to C->Lfoo
> // D does have top_waiters, and, as noted above, it aliased
> // to encompass a different waiter scenario
>
> >
> > In my scenario I was very careful to point out that the lock ordering
> > was: L1->L2->L3->L4
> >
> > But you show that we can have both:
> >
> > L2-> ... ->L4
> >
> > and
> >
> > L4-> ... ->L2
> >
> > Which is a reverse of lock ordering and a possible deadlock can occur.
>
> So the numbering/ordering of the locks is really somewhat arbitrary.
> Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
> could just as easily block on L8), and we absolutely have
> L4-> ... ->L2. A deadlock *could* occur, but all of the traces that
> I dug through, no actual deadlocks occurred.

Heh, but that shows the code is broken. I'm not saying that our
deadlock detector is not returning false positives, I'm just stating
that you probably need to fix your code.

Yes, you can have a locking order of L1 -> L2 and also L2 -> L1, and if
you are lucky, that may never trigger any deadlocks. But why do you
think the kernel folks have put so much effort into lockdep? Lockdep
doesn't tell you that there is a deadlock (although it could); what it
is so useful for is telling us where there are possible deadlocks.

If your code does take L1 -> L2 and then L2 -> L1, you have a chance of
hitting a deadlock right there. If you were to run the userspace
lockdep, it would spit out a nice warning for you.

But this is off topic, as I have shown that there exists an example
that the userspace code would never deadlock but our deadlock detector
would say it did.

-- Steve

2014-06-04 15:12:23

by Brad Mouring

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, Jun 04, 2014 at 10:58:23AM -0400, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 09:38:30 -0500
> "Brad Mouring" <[email protected]> wrote:
>
> > On Wed, Jun 04, 2014 at 10:16:12AM -0400, Steven Rostedt wrote:
> > > On Wed, 4 Jun 2014 08:05:25 -0500
> > > "Brad Mouring" <[email protected]> wrote:
> > >
> > > > A->L2
> > > >
> > > > This is a slight variation on what I was seeing. To use the nomenclature
> > > > that you proposed at the start, rewinding to the point
> > > >
> > > > A->L2->B->L3->C->L4->D
> > > >
> > > > Let's assume things continue to unfold as you explain. Task is D,
> > > > top_waiter is C. A is scheduled out and the chain shuffles.
> > > >
> > > > A->L2->B
> > > > C->L4->D->'
> > >
> > > But isn't that a lock ordering problem there?
> > >
> > > If B can block on L3 owned by C, I see the following:
> > >
> > > B->L3->C->L4->D->L2->B
> > >
> > > Deadlock!
> > Yes, it could be. But currently no one owns L3. B is currently not
> > blocked. Under these circumstances, there is no deadlock. Also, I
> > somewhat arbitrarily picked L4, it could be Lfoo that C blocks on
> > since the process is
>
> OK, then you should have used L1, which basically makes it exactly my
> scenario ;-)

Heh, fair point.

>
> > ...
> > waiter = D->pi_blocked_on
> >
> > // waiter is real_waiter D->L2
> >
> > // orig_waiter still there, orig_lock still has an owner
> >
> > // top_waiter was pointing to C->L4, now points to C->Lfoo
> > // D does have top_waiters, and, as noted above, it aliased
> > // to encompass a different waiter scenario
> >
> > >
> > > In my scenario I was very careful to point out that the lock ordering
> > > was: L1->L2->L3->L4
> > >
> > > But you show that we can have both:
> > >
> > > L2-> ... ->L4
> > >
> > > and
> > >
> > > L4-> ... ->L2
> > >
> > > Which is a reverse of lock ordering and a possible deadlock can occur.
> >
> > So the numbering/ordering of the locks is really somewhat arbitrary.
> > Here we *can* have L2-> ... ->L4 (if B decides to block on L2, it
> > could just as easily block on L8), and we absolutely have
> > L4-> ... ->L2. A deadlock *could* occur, but all of the traces that
> > I dug through, no actual deadlocks occurred.
>
> Heh, but that shows the code is broken. I'm not saying that our
> deadlock detector is not returning false positives, I'm just stating
> that you probably need to fix your code.
>
> Yes, you can have a locking order of L1 -> L2 and also L2 -> L1, and if
> you are lucky, that may never trigger any deadlocks. But why do you
> think the kernel folks have put so much effort into lockdep. Lockdep
> doesn't tell you that there is a deadlock (although it could), what it
> is so useful with is to tell us where there are possible deadlocks.
>
> If your code does take L1 -> L2 and then L2 -> L1, you have a chance of
> hitting a deadlock right there. If you were to run the userspace
> lockdep, it would spit out a nice warning for you.

What I was saying is that the code can take L1 -> L2 and L2 -> Lfoo.
And, in fact, a quick glance back over my notes supports just this
behavior. It was unfortunate that I decided to come up with an example
without thinking it through first.
>
> But this is off topic, as I have shown that there exists an example
> that the userspace code would never deadlock but our deadlock detector
> would say it did.
>
> -- Steve

2014-06-04 15:32:45

by Thomas Gleixner

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Tue, 3 Jun 2014, Steven Rostedt wrote:
> On Fri, 23 May 2014 09:30:10 -0500
> "Brad Mouring" <[email protected]> wrote:
> > /* Deadlock detection */
> > if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > + /*
> > + * If the prio chain has changed out from under us, set the task
> > + * to the current owner of the lock in the current waiter and
> > + * continue walking the prio chain
> > + */
> > + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {

No, sorry. That's wrong.

Your change wrecks the rt_mutex_owner(lock) == top_task test
simply because in that case:

(rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)

evaluates to true.

Aside from that, we need to figure out whether the lock chain changed
while we dropped the locks, even in the non-deadlock case. Otherwise
we follow up the wrong chain there.

T0 blocked on L1 held by T1
T1 blocked on L2 held by T2
T2 blocked on L3 held by T3

So we walk the chain and do:

T1 -> L2 -> T2

Now here we get preempted.

T3 releases L3
T2 gets L3
T2 drops L3 and L2
T2 blocks on L4 held by T4
T4 blocked on L5 held by T5

So we happily boost T4 and T5. Not what we really want to do.

Nasty, isn't it ?

Thanks,

tglx

2014-06-04 15:44:15

by Steven Rostedt

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> T3 releases L3
> T2 gets L3
> T2 drops L3 and L2
> T2 blocks on L4 held by T4
> T4 blocked on L5 held by T5
>
> So we happily boost T4 and T5. Not what we really want to do.
>
> Nasty, isn't it ?
>

Actually, we may go up a chain, but we never do any unnecessary
boosting. That's because the boost is done with rt_mutex_adjust_prio()
which gets the prio from rt_mutex_getprio() which reads the
task->normal_prio and compares it to the task_top_pi_waiter(task)->prio,
which will always be correct as we have the necessary locks.

And we don't even need to worry about the chain we miss. That is, if
task A is blocked on a lock owned by D at the time, but as we go up the
chain, D releases the lock and B grabs it, B will still raise its
priority based on the waiters of the lock (that is, A), and if B
blocks, it will boost the tasks that own the lock it blocks on, where
B is still influenced by A.

Because we only update the prio based on the actual waiters and don't
carry a prio up the chain (which you designed, and I thought was
quite ingenious by the way), we may waste time going up a chain, but
the priority inheritance is still accurate.

-- Steve

2014-06-04 18:02:24

by Thomas Gleixner

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014, Steven Rostedt wrote:

> On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
> > T3 releases L3
> > T2 gets L3
> > T2 drops L3 and L2
> > T2 blocks on L4 held by T4
> > T4 blocked on L5 held by T5
> >
> > So we happily boost T4 and T5. Not what we really want to do.
> >
> > Nasty, isn't it ?
> >
>
> Actually, we may go up a chain, but we never do any unnecessary
> boosting. That's because the boost is done with rt_mutex_adjust_prio()
> which gets the prio from rt_mutex_getprio() which reads the
> task->normal_prio and compares it to the task_top_pi_waiter(task)->prio,
> which will always be correct as we have the necessary locks.

Indeed.

> And we don't even need to worry about the chain we miss. That is, if
> task A is blocked on a lock owned by D at the time, but as we go up the
> chain, D releases the lock and B grabs it, B will still up its priority
> based on the waiters of the lock (that is A), and if B blocks, it will
> boost the tasks that own the lock it blocks on, where B is still
> influenced by A.
>
> The fact that we only update the prio based on the actual waiters and
> don't carry a prio up the chain (which you designed, and I thought was
> quite ingenious by the way), we may waste time going up a chain, but
> the priority inheritance is still accurate.

Duh. I actually had to look up my notes from back then. There is even
a lengthy IRC discussion about not propagating the least waiter's
prio, but looking up the actual lock waiters. Good, so we just walk
for nothing and waste some cpu cycles.

My brain still suffers from 3 days staring into futex.c

I'll fix up the check so it won't break the real deadlock case and queue
it.

Thanks,

tglx

2014-06-04 18:12:30

by Steven Rostedt

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014 20:02:16 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:


> My brain still suffers from 3 days staring into futex.c

Hopefully that's not permanent damage.

>
> I'll fixup the check so it wont break the real deadlock case and queue
> it.
>

Cool. Could you include my write-up on the locking scenario in the
change log so that it is documented somewhere other than mail archives?

Thanks,

-- Steve

2014-06-04 19:26:19

by Brad Mouring

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, Jun 04, 2014 at 08:02:16PM +0200, Thomas Gleixner wrote:
> On Wed, 4 Jun 2014, Steven Rostedt wrote:
>
> > On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> > Thomas Gleixner <[email protected]> wrote:
> >
> > > T3 releases L3
> > > T2 gets L3
> > > T2 drops L3 and L2
> > > T2 blocks on L4 held by T4
> > > T4 blocked on L5 held by T5
> > >
> > > So we happily boost T4 and T5. Not what we really want to do.
> > >
> > > Nasty, isn't it ?
> > >
> >
> > Actually, we may go up a chain, but we never do any unnecessary
> > boosting. That's because the boost is done with rt_mutex_adjust_prio()
> > which gets the prio from rt_mutex_getprio() which reads the
> > task->normal_prio and compares it to the task_top_pi_waiter(task)->prio,
> > which will always be correct as we have the necessary locks.
>
> Indeed.

I had thought through how to try to determine, from what we knew at
this point, whether or not we were walking a different chain for just
this concern, but I had convinced myself that it would be cumbersome,
error-prone, and, as pointed out here, not vital.

>
> > And we don't even need to worry about the chain we miss. That is, if
> > task A is blocked on a lock owned by D at the time, but as we go up the
> > chain, D releases the lock and B grabs it, B will still up its priority
> > based on the waiters of the lock (that is A), and if B blocks, it will
> > boost the tasks that own the lock it blocks on, where B is still
> > influenced by A.
> >
> > The fact that we only update the prio based on the actual waiters and
> > don't carry a prio up the chain (which you designed, and I thought was
> > quite ingenious by the way), we may waste time going up a chain, but
> > the priority inheritance is still accurate.
>
> Duh. I actually had to lookup my notes from back then. There is even a
> lenghty IRC discussion about not propagating the least waiters prio,
> but lookup the actual lock waiters. Good, so we just walk for nothing
> and waste some cpu cycles.
>
I put the check within the deadlock detection block to try to minimize
this case. As evidenced by the spinning on this, it's a rare case where
your userspace code has to be doing some wacky (but valid) stuff.

> My brain still suffers from 3 days staring into futex.c
>
> I'll fixup the check so it wont break the real deadlock case and queue
> it.

How would the change break the real deadlock case?

>
> Thanks,
>
> tglx
>

2014-06-04 19:53:22

by Thomas Gleixner

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014, Brad Mouring wrote:
> On Wed, Jun 04, 2014 at 08:02:16PM +0200, Thomas Gleixner wrote:
> > I'll fixup the check so it wont break the real deadlock case and queue
> > it.
>
> How would the change break the real deadlock case?

> > /* Deadlock detection */
> > if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > + /*
> > + * If the prio chain has changed out from under us, set the task
> > + * to the current owner of the lock in the current waiter and
> > + * continue walking the prio chain
> > + */
> > + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {

No, sorry. That's wrong.

Your change wrecks the rt_mutex_owner(lock) == top_task test
simply because in that case:

(rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)

evaluates to true.

So we want this:

Index: tip/kernel/locking/rtmutex.c
===================================================================
--- tip.orig/kernel/locking/rtmutex.c
+++ tip/kernel/locking/rtmutex.c
@@ -375,6 +375,26 @@ static int rt_mutex_adjust_prio_chain(st
* walk, we detected a deadlock.
*/
if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
+ /*
+ * If the prio chain has changed out from under us, set the task
+ * to the current owner of the lock in the current waiter and
+ * continue walking the prio chain
+ */
+ if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
+ rt_mutex_owner(lock) != top_task) {
+ /* Release the old owner */
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+ put_task_struct(task);
+
+ /* Move to the new owner */
+ task = rt_mutex_owner(lock);
+ get_task_struct(task);
+
+ /* Let's try this again */
+ raw_spin_unlock(&lock->wait_lock);
+ goto retry;
+ }
+
debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
raw_spin_unlock(&lock->wait_lock);
ret = deadlock_detect ? -EDEADLK : 0;

2014-06-04 20:08:28

by Brad Mouring

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, Jun 04, 2014 at 09:53:16PM +0200, Thomas Gleixner wrote:
> On Wed, 4 Jun 2014, Brad Mouring wrote:
> > On Wed, Jun 04, 2014 at 08:02:16PM +0200, Thomas Gleixner wrote:
> > > I'll fixup the check so it wont break the real deadlock case and queue
> > > it.
> >
> > How would the change break the real deadlock case?
>
> > > /* Deadlock detection */
> > > if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > > + /*
> > > + * If the prio chain has changed out from under us, set the task
> > > + * to the current owner of the lock in the current waiter and
> > > + * continue walking the prio chain
> > > + */
> > > + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
>
> No, sorry. That's wrong.
>
> Your change wreckages the rt_mutex_owner(lock) == top_task test
> simply because in that case:
>
> (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)
>
> evaluates to true.

Ah. Yeah. I haven't tested this but it seems sane to me.

>
> So we want this:
>
> Index: tip/kernel/locking/rtmutex.c
> ===================================================================
> --- tip.orig/kernel/locking/rtmutex.c
> +++ tip/kernel/locking/rtmutex.c
> @@ -375,6 +375,26 @@ static int rt_mutex_adjust_prio_chain(st
> * walk, we detected a deadlock.
> */
> if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> + /*
> + * If the prio chain has changed out from under us, set the task
> + * to the current owner of the lock in the current waiter and
> + * continue walking the prio chain
> + */
> + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
> + rt_mutex_owner(lock) != top_task) {
> + /* Release the old owner */
> + raw_spin_unlock_irqrestore(&task->pi_lock, flags);
> + put_task_struct(task);
> +
> + /* Move to the new owner */
> + task = rt_mutex_owner(lock);
> + get_task_struct(task);
> +
> + /* Let's try this again */
> + raw_spin_unlock(&lock->wait_lock);
> + goto retry;
> + }
> +
> debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
> raw_spin_unlock(&lock->wait_lock);
> ret = deadlock_detect ? -EDEADLK : 0;

2014-06-04 20:42:00

by Thomas Gleixner

Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014, Brad Mouring wrote:
> On Wed, Jun 04, 2014 at 09:53:16PM +0200, Thomas Gleixner wrote:
> > Your change wreckages the rt_mutex_owner(lock) == top_task test
> > simply because in that case:
> >
> > (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)
> >
> > evaluates to true.
>
> Ah. Yeah. I haven't tested this but it seems sane to me.
>
> >
> > So we want this:
> >
> > Index: tip/kernel/locking/rtmutex.c
> > ===================================================================
> > --- tip.orig/kernel/locking/rtmutex.c
> > +++ tip/kernel/locking/rtmutex.c
> > @@ -375,6 +375,26 @@ static int rt_mutex_adjust_prio_chain(st
> > * walk, we detected a deadlock.
> > */
> > if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > + /*
> > + * If the prio chain has changed out from under us, set the task
> > + * to the current owner of the lock in the current waiter and
> > + * continue walking the prio chain
> > + */
> > + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
> > + rt_mutex_owner(lock) != top_task) {
> > + /* Release the old owner */

That's not the old owner. It's the task which was blocked in the lock
chain before we dropped the locks and got preemptible. So we come here
just in the case that the task is now blocked on orig_lock.

We really want to be more than careful about the comments here. The
damned thing is complex enough already, so confusing comments are
actually worse than no comments.

Care to resend?

Thanks,

tglx

2014-06-04 20:50:00

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014, Steven Rostedt wrote:

> On Wed, 4 Jun 2014 20:02:16 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
>
>
> > My brain still suffers from 3 days staring into futex.c
>
> Hopefully that's not permanent damage.

Staring into rtmutex.c does not make it any better ...

2014-06-04 22:23:38

by Brad Mouring

[permalink] [raw]
Subject: [PATCH] rtmutex: Handle when top lock owner changes

While walking the priority chain for a task blocked on an rtmutex, the
walk may be examining the waiter blocked on a lock whose owner is not
itself blocked (the end of the chain). If the walking task is preempted
at that point, and the owner of that end lock is scheduled in and
releases the lock before the walker runs again, the walker misses the
fact that the previous owner of the current lock no longer holds it.

Consider the following scenario:
Tasks A, B, C, and D
Locks L1, L2, L3, and L4

D owns L4, C owns L3, B owns L2. C blocks on L4, B blocks on L3.

We have
L2->B->L3->C->L4->D

A comes along and blocks on L2.
A->L2->B->L3->C->L4->D

We walk the priority chain and, with task pointing to D and top_waiter
at C->L4, we fail to take L4's pi_lock and are scheduled out.

Let's assume that the chain changes prior to A being scheduled in.
All of the owners finish with their locks and drop them. We have

A->L2

But, as things are still running, the chain can continue to change,
leading to

A->L2->B
C->L1->D->L2

That is, B ends up winning L2, D blocks on L2 after grabbing L1,
and C blocks on L1. A is scheduled back in and continues the walk.

Since task was pointing to D, and D is indeed blocked, it will
have a waiter (D->L2), and, sadly, the lock is orig_lock. The
deadlock detection will come in and report a deadlock to userspace.

This change provides an additional check for this situation before
reporting a deadlock to userspace.

Signed-off-by: Brad Mouring <[email protected]>
Acked-by: Scot Salmon <[email protected]>
Acked-by: Ben Shelton <[email protected]>
Tested-by: Jeff Westfahl <[email protected]>
---
kernel/locking/rtmutex.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index fbf152b..8ad7f7d 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -384,6 +384,26 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,

/* Deadlock detection */
if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
+ /*
+ * If the prio chain has changed out from under us, set the task
+ * to the current owner of the lock in the current waiter and
+ * continue walking the prio chain
+ */
+ if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
+ rt_mutex_owner(lock) != top_task) {
+ /* Release the old task (blocked before the chain chaged) */
+ raw_spin_unlock_irqrestore(&task->pi_lock, flags);
+ put_task_struct(task);
+
+ /* Move to the owner of the lock now described in waiter */
+ task = rt_mutex_owner(lock);
+ get_task_struct(task);
+
+ /* Let's try this again */
+ raw_spin_unlock(&lock->wait_lock);
+ goto retry;
+ }
+
debug_rt_mutex_deadlock(deadlock_detect, orig_waiter, lock);
raw_spin_unlock(&lock->wait_lock);
ret = deadlock_detect ? -EDEADLK : 0;
--
1.8.3-rc3

2014-06-04 23:03:40

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH] rtmutex: Handle when top lock owner changes


On Wed, 4 Jun 2014, Brad Mouring wrote:

> If, during walking the priority chain on a task blocking on a rtmutex,
> and the task is examining the waiter blocked on the lock owned by a task
> that is not blocking (the end of the chain), the current task is ejected
> from the processor and the owner of the end lock is scheduled in,
> releasing that lock, before the original task is scheduled back in, the
> task misses the fact that the previous owner of the current lock no
> longer holds it.

-ENOPARSE,

> diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
> index fbf152b..8ad7f7d 100644
> --- a/kernel/locking/rtmutex.c
> +++ b/kernel/locking/rtmutex.c
> @@ -384,6 +384,26 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task,
>
> /* Deadlock detection */

Does not apply against 3.15-rc8

> if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> + /*
> + * If the prio chain has changed out from under us, set the task
> + * to the current owner of the lock in the current waiter and
> + * continue walking the prio chain
> + */

You are still describing what the code is doing, not WHY.

Why can it happen that the prio chain changed under us?

Why do we set task to the current owner of the lock ?

Why makes it sense to retry the chain walk from the very
beginning?

What are the conditions which make us go into that code path?

> + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task &&
> + rt_mutex_owner(lock) != top_task) {
> + /* Release the old task (blocked before the chain chaged) */

chaged?

old task ?

Again:

> We really want to be more than careful about the comments here. The
> damned thing is complex enough already, so confusing comments are
> actually worse than no comments.

This is not a speed code contest. Take your time, think about it, run
it through your coworkers and then post again.

Last warning.

Thanks,

tglx

2014-06-06 03:19:45

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Tue, 3 Jun 2014, Steven Rostedt wrote:
> > On Fri, 23 May 2014 09:30:10 -0500
> > "Brad Mouring" <[email protected]> wrote:
> > > /* Deadlock detection */
> > > if (lock == orig_lock || rt_mutex_owner(lock) == top_task) {
> > > + /*
> > > + * If the prio chain has changed out from under us, set the task
> > > + * to the current owner of the lock in the current waiter and
> > > + * continue walking the prio chain
> > > + */
> > > + if (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task) {
>
> No, sorry. That's wrong.
>
> Your change wreckages the rt_mutex_owner(lock) == top_task test
> simply because in that case:
>
> (rt_mutex_owner(lock) && rt_mutex_owner(lock) != task)
>
> evaluates to true.

I'm not so sure that's true. Because if this is a deadlock in the case
that rt_mutex_owner(lock) == top_task, then task == top_task should
also be true.

>
> Aside of that we need to figure out whether the lock chain changed
> while we dropped the locks even in the non dead lock case. Otherwise
> we follow up the wrong chain there.
>
> T0 blocked on L1 held by T1
> T1 blocked on L2 held by T2
> T2 blocked on L3 held by T3
>
> So we walk the chain and do:
>
> T1 -> L2 -> T2
>
> Now here we get preempted.
>
> T3 releases L3
> T2 gets L3
> T2 drops L3 and L2
> T2 blocks on L4 held by T4
> T4 blocked on L5 held by T5
>
> So we happily boost T4 and T5. Not what we really want to do.
>
> Nasty, isn't it ?

As I stated before, it just wastes cycles. But looking at both your
next_lock code and this, I'm thinking we can simply break out if we
find that rt_mutex_owner(lock) != task. Because that means when we let
go of the locks, the current lock we are going up was released and
retaken. That would mean the chain walk should stop. It's similar to
the next lock being what we expected.

Perhaps something like this:

---
kernel/locking/rtmutex.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)

Index: linux-rt.git/kernel/locking/rtmutex.c
===================================================================
--- linux-rt.git.orig/kernel/locking/rtmutex.c 2014-06-05 23:00:56.197855465 -0400
+++ linux-rt.git/kernel/locking/rtmutex.c 2014-06-05 23:14:44.164414857 -0400
@@ -284,7 +284,7 @@
struct rt_mutex_waiter *orig_waiter,
struct task_struct *top_task)
{
- struct rt_mutex *lock;
+ struct rt_mutex *lock = orig_lock;
struct rt_mutex_waiter *waiter, *top_waiter = orig_waiter;
int detect_deadlock, ret = 0, depth = 0;
unsigned long flags;
@@ -322,6 +322,16 @@
*/
raw_spin_lock_irqsave(&task->pi_lock, flags);

+ /*
+ * When we dropped the spinlocks, if the owner of the lock we
+ * are currently processing changed since we chain walked
+ * to that lock, we are done with the chain walk. The previous
+ * owner was obviously running to release it.
+ */
+ if (lock && rt_mutex_owner(lock) != task)
+ goto out_unlock_pi;
+
+
waiter = task->pi_blocked_on;
/*
* Check whether the end of the boosting chain has been

2014-06-06 05:40:19

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Thu, 5 Jun 2014, Steven Rostedt wrote:
> On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> Thomas Gleixner <[email protected]> wrote:
> + /*
> + * When we dropped the spinlocks, if the owner of the lock we
> + * are currently processing changed since we chain walked
> + * to that lock, we are done with the chain walk. The previous
> + * owner was obviously running to release it.
> + */
> + if (lock && rt_mutex_owner(lock) != task)
> + goto out_unlock_pi;

NO. You CANNOT dereference lock after dropping the locks. It might be
gone already.

Thanks,

tglx

2014-06-06 05:44:29

by Thomas Gleixner

[permalink] [raw]
Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Fri, 6 Jun 2014, Thomas Gleixner wrote:
> On Thu, 5 Jun 2014, Steven Rostedt wrote:
> > On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> > Thomas Gleixner <[email protected]> wrote:
> > + /*
> > + * When we dropped the spinlocks, if the owner of the lock we
> > + * are currently processing changed since we chain walked
> > + * to that lock, we are done with the chain walk. The previous
> > + * owner was obviously running to release it.
> > + */
> > + if (lock && rt_mutex_owner(lock) != task)
> > + goto out_unlock_pi;
>
> NO. You CANNOT dereference lock after dropping the locks. It might be
> gone already.

The only information we can check is, whether @task changed the
blocked on lock, really.

Thanks,

tglx

2014-06-06 08:53:19

by Steven Rostedt

[permalink] [raw]
Subject: Re: [PATCH 1/1] rtmutex: Handle when top lock owner changes

On Fri, 6 Jun 2014 07:40:10 +0200 (CEST)
Thomas Gleixner <[email protected]> wrote:

> On Thu, 5 Jun 2014, Steven Rostedt wrote:
> > On Wed, 4 Jun 2014 17:32:37 +0200 (CEST)
> > Thomas Gleixner <[email protected]> wrote:
> > + /*
> > + * When we dropped the spinlocks, if the owner of the lock we
> > + * are currently processing changed since we chain walked
> > + * to that lock, we are done with the chain walk. The previous
> > + * owner was obviously running to release it.
> > + */
> > + if (lock && rt_mutex_owner(lock) != task)
> > + goto out_unlock_pi;
>
> NO. You CANNOT dereference lock after dropping the locks. It might be
> gone already.
>

Ah, the lock's memory can be freed out from under us. Why didn't you
say so in the first place ;-)

-- Steve