2006-03-25 13:52:45

by Esben Nielsen

[permalink] [raw]
Subject: Comment on 2.6.16-rt6 PI

1)
I found a bug in adjust_pi_chain():

if (top_waiter)
        plist_del(&top_waiter->pi_list_entry,
                  &owner->pi_waiters);

if (waiter && waiter == rt_mutex_top_waiter(lock)) {
        waiter->pi_list_entry.prio = waiter->task->prio;
        plist_add(&waiter->pi_list_entry,
                  &owner->pi_waiters);
}

In my test setup this leaves the owner->pi_waiters empty even though there
are waiters. I tried to move the removal of top_waiter inside the second
if statement but then a lot of other tests failed. I don't have time to
fix it.

2) Some time ago I started on a patch which I consider a lot simpler
than the approach in -rt6, which I find _very_ complicated.

My patch doesn't work yet, but I will send it to you anyway. My UP machine
hangs during boot at "Letting udev process events...". I think it is because
I tried to change the xchg(&current->state, x) handling, which I don't feel
too secure about. I don't have time to work on it these days - barely time
to send this mail, actually :-(

The idea I use is to delay adjust_pi_chain() until all spin locks are
released - i.e. just before schedule() is called in the lock operation, or
on the way out when the task is signalled or times out. Only two spin locks
will ever be held at the same time, namely sometask->pi_lock and
somelock->wait_lock. All spinlocks are released before the next iteration -
which is good for latencies and leaves far fewer deadlock issues to
consider.

The patch is against 2.6.16-rt1.

Esben


Attachments:
pi.patch2 (19.03 kB)

2006-03-25 14:07:44

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Comment on 2.6.16-rt6 PI

On Sat, 2006-03-25 at 14:52 +0100, Esben Nielsen wrote:

> In my test setup this leaves the owner->pi_waiters empty even though there
> are waiters. I tried to move the removal of top_waiter inside the second
> if statement but then a lot of other tests failed. I don't have time to
> fix it.

Can you please explain in more detail how that happens? And provide a
test case?

tglx


2006-03-25 18:23:41

by Esben Nielsen

[permalink] [raw]
Subject: Re: Comment on 2.6.16-rt6 PI

On Sat, 25 Mar 2006, Thomas Gleixner wrote:

> On Sat, 2006-03-25 at 14:52 +0100, Esben Nielsen wrote:
>
> > In my test setup this leaves the owner->pi_waiters empty even though there
> > are waiters. I tried to move the removal of top_waiter inside the second
> > if statement but then a lot of other tests failed. I don't have time to
> > fix it.
>
> Can you please explain in more detail how that happens? And provide a
> test case?
>

Sorry for the lack of details. I just thought the test case wouldn't make
much sense to you and didn't paste it in. I was in a bit of a hurry too.
Now I have a little more time and can tell you what is going on:

top_waiter!=NULL
waiter!=NULL
waiter!=rt_mutex_top_waiter(lock)

Therefore the top_waiter is removed but nothing is inserted.

Below is a fix.

Esben

> tglx
>
>
--- linux-2.6.16-rt6/kernel/rtmutex.c.orig	2006-03-25 19:14:35.000000000 +0100
+++ linux-2.6.16-rt6/kernel/rtmutex.c	2006-03-25 19:22:04.000000000 +0100
@@ -223,15 +223,22 @@ static void adjust_pi_chain(struct rt_mu
 	struct list_head *curr = lock_chain->prev;
 
 	for (;;) {
-		if (top_waiter)
+		if (top_waiter) {
 			plist_del(&top_waiter->pi_list_entry,
 				  &owner->pi_waiters);
-
-
-		if (waiter && waiter == rt_mutex_top_waiter(lock)) {
+		}
+
+		if (waiter) {
 			waiter->pi_list_entry.prio = waiter->task->prio;
-			plist_add(&waiter->pi_list_entry, &owner->pi_waiters);
 		}
+
+		if (rt_mutex_has_waiters(lock)) {
+			top_waiter = rt_mutex_top_waiter(lock);
+			plist_add(&top_waiter->pi_list_entry,
+				  &owner->pi_waiters);
+		}
+
+
 		adjust_prio(owner);
 
 		waiter = owner->pi_blocked_on;

2006-03-25 18:34:45

by Thomas Gleixner

[permalink] [raw]
Subject: Re: Comment on 2.6.16-rt6 PI

On Sat, 2006-03-25 at 19:23 +0100, Esben Nielsen wrote:
> Sorry for the lack of details. I just thought the test case wouldn't make
> much sense to you and didn't paste it in. I was in a bit of a hurry too.
> Now I have a little more time and can tell you what is going on:
>
> top_waiter!=NULL
> waiter!=NULL
> waiter!=rt_mutex_top_waiter(lock)
>
> Therefore the top_waiter is removed but nothing is inserted.

How does this happen? From inside the loop this is impossible, and I
don't see a caller with that constellation either.

tglx


2006-03-25 19:52:11

by Esben Nielsen

[permalink] [raw]
Subject: Re: Comment on 2.6.16-rt6 PI

On Sat, 25 Mar 2006, Thomas Gleixner wrote:

> On Sat, 2006-03-25 at 19:23 +0100, Esben Nielsen wrote:
> > Sorry for the lack of details. I just thought the test case wouldn't make
> > much sense to you and didn't paste it in. I was in a bit of a hurry too.
> > Now I have a little more time and can tell you what is going on:
> >
> > top_waiter!=NULL
> > waiter!=NULL
> > waiter!=rt_mutex_top_waiter(lock)
> >
> > Therefore the top_waiter is removed but nothing is inserted.
>
> How does this happen? From inside the loop this is impossible, and I
> don't see a caller with that constellation either.

The test-case where I made it happen is below.

What happens in the last line of execution is that the 3rd task takes
lock 1 and boosts the 1st task, which is blocked on the 2nd task. Now the
1st, 2nd and 3rd tasks all have priority 1.
Then the 4th task is allowed to run. It tries to take lock 2, which is owned
by the 2nd task. The first waiter is the 1st task, boosted to priority 2; so
this is "top_waiter". But waiter refers to the 4th task with priority 2,
so it is not the first waiter.

The result is as above because I am running with deadlock detection on.

Esben


threads:  4            3            1            2
          lock 1       +            +            +
test:     +            +            +            +
test:     prio 4       prio 3       prio 1       prio 2
test:     lockcount 1  lockcount 0  lockcount 0  lockcount 0

          +            lock 2       +            +
test:     +            +            +            +
test:     prio 4       prio 3       prio 1       prio 2
test:     lockcount 1  lockcount 1  lockcount 0  lockcount 0

          lock 2       +            +            +
test:     -            +            +            +
test:     prio 4       prio 3       prio 1       prio 2
test:     lockcount 1  lockcount 1  lockcount 0  lockcount 0

          +            +            lock 1       lock 2
test:     -            +            -            -
test:     prio 1       prio 1       prio 1       prio 2
test:     lockcount 1  lockcount 1  lockcount 0  lockcount 0

>
> tglx
>
>