Oleg brought up some interesting points about grabbing the pi_lock for
protection. In that discussion, I realized that there are some places
where the pi_lock is grabbed when it really isn't necessary. This patch
also does a little bit of cleanup.
This patch basically does three things:
1) renames the "boost" variable to "chain_walk", since it is also used in
the debugging case where nothing is going to be boosted; the new name
better describes what happens when the test succeeds.
2) moves get_task_struct() to just before the unlocking of the wait_lock.
This removes duplicate code, and makes it a little easier to read. The
owner won't go away while either the pi_lock or the wait_lock is held.
3) removes the pi_lock taking and the owner-blocked check completely from
the debugging case. Grabbing the lock, doing the check, and then releasing
the lock is just full of races: right after we release the lock the owner
can block anyway, and we would have missed it. It's just as good to go
ahead and call rt_mutex_adjust_prio_chain(), since in the debug case we
really do want to do the chain walk to test for deadlocks anyway.
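For reference, here is roughly what the tail of task_blocks_on_rt_mutex()
looks like after these changes (a condensed sketch of the patch below,
with the requeue details elided):

	if (waiter == rt_mutex_top_waiter(lock)) {
		/* requeue waiter on owner->pi_waiters, readjust owner */
		...
		if (owner->pi_blocked_on)
			chain_walk = 1;
	} else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock))
		chain_walk = 1;

	if (!chain_walk)
		return 0;

	/*
	 * The owner can't disappear while holding a lock, so the
	 * owner struct is protected by wait_lock.  The reference
	 * gets dropped in rt_mutex_adjust_prio_chain().
	 */
	get_task_struct(owner);

	spin_unlock(&lock->wait_lock);
	res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock,
					 waiter, current);
	spin_lock(&lock->wait_lock);

	return res;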
-- Steve
Signed-off-by: Steven Rostedt <[email protected]>
Index: linux-2.6.18-rc2/kernel/rtmutex.c
===================================================================
--- linux-2.6.18-rc2.orig/kernel/rtmutex.c 2006-08-01 09:22:07.000000000 -0400
+++ linux-2.6.18-rc2/kernel/rtmutex.c 2006-08-01 09:26:14.000000000 -0400
@@ -409,7 +409,7 @@ static int task_blocks_on_rt_mutex(struc
struct task_struct *owner = rt_mutex_owner(lock);
struct rt_mutex_waiter *top_waiter = waiter;
unsigned long flags;
- int boost = 0, res;
+ int chain_walk = 0, res;
spin_lock_irqsave(¤t->pi_lock, flags);
__rt_mutex_adjust_prio(current);
@@ -433,25 +433,23 @@ static int task_blocks_on_rt_mutex(struc
plist_add(&waiter->pi_list_entry, &owner->pi_waiters);
__rt_mutex_adjust_prio(owner);
- if (owner->pi_blocked_on) {
- boost = 1;
- /* gets dropped in rt_mutex_adjust_prio_chain()! */
- get_task_struct(owner);
- }
+ if (owner->pi_blocked_on)
+ chain_walk = 1;
spin_unlock_irqrestore(&owner->pi_lock, flags);
}
- else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock)) {
- spin_lock_irqsave(&owner->pi_lock, flags);
- if (owner->pi_blocked_on) {
- boost = 1;
- /* gets dropped in rt_mutex_adjust_prio_chain()! */
- get_task_struct(owner);
- }
- spin_unlock_irqrestore(&owner->pi_lock, flags);
- }
- if (!boost)
+ else if (debug_rt_mutex_detect_deadlock(waiter, detect_deadlock))
+ chain_walk = 1;
+
+ if (!chain_walk)
return 0;
+ /*
+ * The owner can't disappear while holding a lock,
+ * so the owner struct is protected by wait_lock.
+ * Gets dropped in rt_mutex_adjust_prio_chain()!
+ */
+ get_task_struct(owner);
+
spin_unlock(&lock->wait_lock);
res = rt_mutex_adjust_prio_chain(owner, detect_deadlock, lock, waiter,
@@ -532,7 +530,7 @@ static void remove_waiter(struct rt_mute
int first = (waiter == rt_mutex_top_waiter(lock));
struct task_struct *owner = rt_mutex_owner(lock);
unsigned long flags;
- int boost = 0;
+ int chain_walk = 0;
spin_lock_irqsave(¤t->pi_lock, flags);
plist_del(&waiter->list_entry, &lock->wait_list);
@@ -554,19 +552,20 @@ static void remove_waiter(struct rt_mute
}
__rt_mutex_adjust_prio(owner);
- if (owner->pi_blocked_on) {
- boost = 1;
- /* gets dropped in rt_mutex_adjust_prio_chain()! */
- get_task_struct(owner);
- }
+ if (owner->pi_blocked_on)
+ chain_walk = 1;
+
spin_unlock_irqrestore(&owner->pi_lock, flags);
}
WARN_ON(!plist_node_empty(&waiter->pi_list_entry));
- if (!boost)
+ if (!chain_walk)
return;
+ /* gets dropped in rt_mutex_adjust_prio_chain()! */
+ get_task_struct(owner);
+
spin_unlock(&lock->wait_lock);
rt_mutex_adjust_prio_chain(owner, 0, lock, NULL, current);
On top of Steven's rtmutex-clean-up-and-remove-some-extra-spinlocks.patch
There are still 2 get_task_struct()s done under ->pi_lock. Imho, this is
confusing. Move them outside of the ->pi_lock protected section. The task
can't go away there; otherwise it would have been unsafe to take
task->pi_lock in the first place.
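The pattern in both places changes like this (a condensed before/after
sketch; in the first hunk the reference moves to just before the locked
section, in the second to just after it, but the point is the same):

	/* before: reference taken inside the ->pi_lock section */
	spin_lock_irqsave(&task->pi_lock, flags);
	/* ... adjust prio, look at ->pi_blocked_on ... */
	/* gets dropped in rt_mutex_adjust_prio_chain()! */
	get_task_struct(task);
	spin_unlock_irqrestore(&task->pi_lock, flags);

	/* after: taking the reference needs no lock; the fact that it
	 * was safe to take task->pi_lock at all already implies the
	 * task cannot have gone away yet */
	spin_lock_irqsave(&task->pi_lock, flags);
	/* ... adjust prio, look at ->pi_blocked_on ... */
	spin_unlock_irqrestore(&task->pi_lock, flags);
	/* gets dropped in rt_mutex_adjust_prio_chain()! */
	get_task_struct(task);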
Signed-off-by: Oleg Nesterov <[email protected]>
--- 2.6.18-rc3/kernel/rtmutex.c~ 2006-08-13 18:49:28.000000000 +0400
+++ 2.6.18-rc3/kernel/rtmutex.c 2006-08-13 19:07:45.000000000 +0400
@@ -249,6 +249,7 @@ static int rt_mutex_adjust_prio_chain(st
/* Grab the next task */
task = rt_mutex_owner(lock);
+ get_task_struct(task);
spin_lock_irqsave(&task->pi_lock, flags);
if (waiter == rt_mutex_top_waiter(lock)) {
@@ -267,7 +268,6 @@ static int rt_mutex_adjust_prio_chain(st
__rt_mutex_adjust_prio(task);
}
- get_task_struct(task);
spin_unlock_irqrestore(&task->pi_lock, flags);
top_waiter = rt_mutex_top_waiter(lock);
@@ -589,10 +589,10 @@ void rt_mutex_adjust_pi(struct task_stru
return;
}
- /* gets dropped in rt_mutex_adjust_prio_chain()! */
- get_task_struct(task);
spin_unlock_irqrestore(&task->pi_lock, flags);
+ /* gets dropped in rt_mutex_adjust_prio_chain()! */
+ get_task_struct(task);
rt_mutex_adjust_prio_chain(task, 0, NULL, NULL, task);
}
Another question: why should we take ->pi_lock to modify rt_mutex's
->wait_list? It looks confusing and unneeded to me, because we already
hold ->wait_lock. For example, wakeup_next_waiter() takes current's
->pi_lock before plist_del(), which seems to be completely offtopic,
since current->pi_blocked_on has nothing to do with that rt_mutex.
Note also that ->pi_blocked_on is always modified while also holding
->pi_blocked_on->lock->wait_lock, and things like rt_mutex_top_waiter()
need ->wait_lock too, so I don't think we need ->pi_lock for ->wait_list.
In other words, could you please explain to me whether the patch below
is correct or not?
Thanks,
Oleg.
--- 2.6.18-rc3/kernel/rtmutex.c~2_rtm 2006-08-13 19:07:45.000000000 +0400
+++ 2.6.18-rc3/kernel/rtmutex.c 2006-08-13 22:09:45.000000000 +0400
@@ -236,6 +236,10 @@ static int rt_mutex_adjust_prio_chain(st
goto out_unlock_pi;
}
+ /* Release the task */
+ spin_unlock_irqrestore(&task->pi_lock, flags);
+ put_task_struct(task);
+
top_waiter = rt_mutex_top_waiter(lock);
/* Requeue the waiter */
@@ -243,10 +247,6 @@ static int rt_mutex_adjust_prio_chain(st
waiter->list_entry.prio = task->prio;
plist_add(&waiter->list_entry, &lock->wait_list);
- /* Release the task */
- spin_unlock_irqrestore(&task->pi_lock, flags);
- put_task_struct(task);
-
/* Grab the next task */
task = rt_mutex_owner(lock);
get_task_struct(task);
@@ -416,15 +416,15 @@ static int task_blocks_on_rt_mutex(struc
plist_node_init(&waiter->list_entry, current->prio);
plist_node_init(&waiter->pi_list_entry, current->prio);
+ current->pi_blocked_on = waiter;
+
+ spin_unlock_irqrestore(¤t->pi_lock, flags);
+
/* Get the top priority waiter on the lock */
if (rt_mutex_has_waiters(lock))
top_waiter = rt_mutex_top_waiter(lock);
plist_add(&waiter->list_entry, &lock->wait_list);
- current->pi_blocked_on = waiter;
-
- spin_unlock_irqrestore(¤t->pi_lock, flags);
-
if (waiter == rt_mutex_top_waiter(lock)) {
spin_lock_irqsave(&owner->pi_lock, flags);
plist_del(&top_waiter->pi_list_entry, &owner->pi_waiters);
@@ -472,11 +472,10 @@ static void wakeup_next_waiter(struct rt
struct task_struct *pendowner;
unsigned long flags;
- spin_lock_irqsave(¤t->pi_lock, flags);
-
waiter = rt_mutex_top_waiter(lock);
plist_del(&waiter->list_entry, &lock->wait_list);
+ spin_lock_irqsave(¤t->pi_lock, flags);
/*
* Remove it from current->pi_waiters. We do not adjust a
* possible priority boost right now. We execute wakeup in the
@@ -530,8 +529,9 @@ static void remove_waiter(struct rt_mute
unsigned long flags;
int chain_walk = 0;
- spin_lock_irqsave(¤t->pi_lock, flags);
plist_del(&waiter->list_entry, &lock->wait_list);
+
+ spin_lock_irqsave(¤t->pi_lock, flags);
waiter->task = NULL;
current->pi_blocked_on = NULL;
spin_unlock_irqrestore(¤t->pi_lock, flags);
On Sun, 13 Aug 2006, Oleg Nesterov wrote:
> Another question: why should we take ->pi_lock to modify rt_mutex's
> ->wait_list?
> It looks confusing and unneeded to me, because we already
> hold ->wait_lock. For example, wakeup_next_waiter() takes current's
> ->pi_lock before plist_del(), which seems to be completely offtopic,
> since current->pi_blocked_on has nothing to do with that rt_mutex.
>
> Note also that ->pi_blocked_on is always modified while also holding
> ->pi_blocked_on->lock->wait_lock, and things like rt_mutex_top_waiter()
> need ->wait_lock too, so I don't think we need ->pi_lock for ->wait_list.
>
Yes, that was the basic design:
lock->wait_list and the related waiter->list_entry are protected by
lock->wait_lock, while task->pi_waiters and the related
waiter->pi_list_entry are protected by task->pi_lock.
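Spelled out as a lock/data map (a restatement of the design above, not a
comment from the source tree):

	/*
	 * rtmutex locking design:
	 *
	 *   lock->wait_list, waiter->list_entry
	 *       protected by lock->wait_lock
	 *
	 *   task->pi_waiters, waiter->pi_list_entry, task->pi_blocked_on
	 *       protected by task->pi_lock
	 *
	 * In addition, task->pi_blocked_on is only ever changed while
	 * task->pi_blocked_on->lock->wait_lock is also held, which is
	 * the property Oleg's proposed reordering relies on.
	 */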
> In other words, could you please explain to me whether the patch below
> is correct or not?
>
Well, we are talking about small optimizations now, moving only a few
instructions outside the lock. Except for one of them, the changes are
correct, but is it worth risking stability for that right now?
> Thanks,
>
> Oleg.
>
> --- 2.6.18-rc3/kernel/rtmutex.c~2_rtm 2006-08-13 19:07:45.000000000 +0400
> +++ 2.6.18-rc3/kernel/rtmutex.c 2006-08-13 22:09:45.000000000 +0400
> @@ -236,6 +236,10 @@ static int rt_mutex_adjust_prio_chain(st
> goto out_unlock_pi;
> }
>
> + /* Release the task */
> + spin_unlock_irqrestore(&task->pi_lock, flags);
> + put_task_struct(task);
> +
So you want the task to go away here and use it below?
> top_waiter = rt_mutex_top_waiter(lock);
>
> /* Requeue the waiter */
> @@ -243,10 +247,6 @@ static int rt_mutex_adjust_prio_chain(st
> waiter->list_entry.prio = task->prio;
> plist_add(&waiter->list_entry, &lock->wait_list);
>
> - /* Release the task */
> - spin_unlock_irqrestore(&task->pi_lock, flags);
> - put_task_struct(task);
> -
No! It is used in the line just above, so we better be sure it still
exists!
> /* Grab the next task */
> task = rt_mutex_owner(lock);
> get_task_struct(task);
> @@ -416,15 +416,15 @@ static int task_blocks_on_rt_mutex(struc
> plist_node_init(&waiter->list_entry, current->prio);
> plist_node_init(&waiter->pi_list_entry, current->prio);
>
> + current->pi_blocked_on = waiter;
> +
> + spin_unlock_irqrestore(¤t->pi_lock, flags);
> +
> /* Get the top priority waiter on the lock */
> if (rt_mutex_has_waiters(lock))
> top_waiter = rt_mutex_top_waiter(lock);
> plist_add(&waiter->list_entry, &lock->wait_list);
>
> - current->pi_blocked_on = waiter;
> -
> - spin_unlock_irqrestore(¤t->pi_lock, flags);
> -
Ok, this change might work out.
> if (waiter == rt_mutex_top_waiter(lock)) {
> spin_lock_irqsave(&owner->pi_lock, flags);
> plist_del(&top_waiter->pi_list_entry, &owner->pi_waiters);
> @@ -472,11 +472,10 @@ static void wakeup_next_waiter(struct rt
> struct task_struct *pendowner;
> unsigned long flags;
>
> - spin_lock_irqsave(¤t->pi_lock, flags);
> -
> waiter = rt_mutex_top_waiter(lock);
> plist_del(&waiter->list_entry, &lock->wait_list);
>
> + spin_lock_irqsave(¤t->pi_lock, flags);
This might be ok, too...
> /*
> * Remove it from current->pi_waiters. We do not adjust a
> * possible priority boost right now. We execute wakeup in the
> @@ -530,8 +529,9 @@ static void remove_waiter(struct rt_mute
> unsigned long flags;
> int chain_walk = 0;
>
> - spin_lock_irqsave(¤t->pi_lock, flags);
> plist_del(&waiter->list_entry, &lock->wait_list);
> +
> + spin_lock_irqsave(¤t->pi_lock, flags);
> waiter->task = NULL;
> current->pi_blocked_on = NULL;
> spin_unlock_irqrestore(¤t->pi_lock, flags);
>
And ok.
On 08/14, Esben Nielsen wrote:
>
> Well, we are talking about small optimizations now, moving only a few
> instructions outside the lock. Except for one of them, the changes are
> correct, but is it worth risking stability for that right now?
Yes, the optimization is small, but I think this cleans up the code, which
is (imho) more important. That said, I am not pushing this patch, it was a
question. I still can't find the time to read the code carefully and
convince myself that I understand it :)
Also, I think such a change opens the possibility for further cleanups.
> >--- 2.6.18-rc3/kernel/rtmutex.c~2_rtm 2006-08-13 19:07:45.000000000 +0400
> >+++ 2.6.18-rc3/kernel/rtmutex.c 2006-08-13 22:09:45.000000000 +0400
> >@@ -236,6 +236,10 @@ static int rt_mutex_adjust_prio_chain(st
> > goto out_unlock_pi;
> > }
> >
> >+ /* Release the task */
> >+ spin_unlock_irqrestore(&task->pi_lock, flags);
> >+ put_task_struct(task);
> >+
>
> So you want the task to go away here and use it below?
task->pi_blocked_on != NULL, and we hold task->pi_blocked_on->lock->wait_lock.
Can it go away?
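Spelling the argument out (a paraphrase, not wording from the code):

	/*
	 * Why the task is pinned here: task->pi_blocked_on != NULL
	 * means the task is blocked on that rtmutex.  Before it can
	 * resume (and eventually exit), it has to either take the lock
	 * or dequeue itself via remove_waiter(), and both paths need
	 * task->pi_blocked_on->lock->wait_lock - the very lock we are
	 * holding.  So holding that wait_lock keeps the task alive
	 * without an extra reference.
	 */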
>
> > top_waiter = rt_mutex_top_waiter(lock);
> >
> > /* Requeue the waiter */
> >@@ -243,10 +247,6 @@ static int rt_mutex_adjust_prio_chain(st
> > waiter->list_entry.prio = task->prio;
> > plist_add(&waiter->list_entry, &lock->wait_list);
> >
> >- /* Release the task */
> >- spin_unlock_irqrestore(&task->pi_lock, flags);
> >- put_task_struct(task);
> >-
>
> No! It is used in the line just above, so we better be sure it still
> exists!
See above. If I am wrong, we can move this line
waiter->list_entry.prio = task->prio;
up, under ->pi_lock. plist_del() doesn't need a valid ->prio.
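If that assumption were wrong, the fixup described above would look
roughly like this (a hypothetical rearrangement, not a posted patch):

	/* Record the new prio while task->pi_lock still pins the
	 * task; plist_del() below never looks at ->prio, so updating
	 * it while the waiter is still queued is harmless */
	waiter->list_entry.prio = task->prio;

	/* Release the task */
	spin_unlock_irqrestore(&task->pi_lock, flags);
	put_task_struct(task);

	top_waiter = rt_mutex_top_waiter(lock);

	/* Requeue the waiter; plist_add() uses the prio set above */
	plist_del(&waiter->list_entry, &lock->wait_list);
	plist_add(&waiter->list_entry, &lock->wait_list);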
Thanks for your answer!
Oleg.
On Tue, 15 Aug 2006, Oleg Nesterov wrote:
> On 08/14, Esben Nielsen wrote:
>>
>> Well, we are talking about small optimizations now, moving only a few
>> instructions outside the lock. Except for one of them, the changes are
>> correct, but is it worth risking stability for that right now?
>
> Yes, the optimization is small, but I think this cleans up the code, which
> is (imho) more important. That said, I am not pushing this patch, it was a
> question. I still can't find the time to read the code carefully and
> convince myself that I understand it :)
>
> Also, I think such a change opens the possibility for further cleanups.
>
>>> --- 2.6.18-rc3/kernel/rtmutex.c~2_rtm 2006-08-13 19:07:45.000000000 +0400
>>> +++ 2.6.18-rc3/kernel/rtmutex.c 2006-08-13 22:09:45.000000000 +0400
>>> @@ -236,6 +236,10 @@ static int rt_mutex_adjust_prio_chain(st
>>> goto out_unlock_pi;
>>> }
>>>
>>> + /* Release the task */
>>> + spin_unlock_irqrestore(&task->pi_lock, flags);
>>> + put_task_struct(task);
>>> +
>>
>> So you want the task to go away here and use it below?
>
> task->pi_blocked_on != NULL, we hold task->pi_blocked_on->lock->wait_lock.
> Can it go away ?
That is correct. But does it make the code more readable? When you read
the code you shouldn't need to go through that kind of complicated argument
to see its correctness - unless the code can't be written any other way.
>
>>
>>> top_waiter = rt_mutex_top_waiter(lock);
>>>
>>> /* Requeue the waiter */
>>> @@ -243,10 +247,6 @@ static int rt_mutex_adjust_prio_chain(st
>>> waiter->list_entry.prio = task->prio;
>>> plist_add(&waiter->list_entry, &lock->wait_list);
>>>
>>> - /* Release the task */
>>> - spin_unlock_irqrestore(&task->pi_lock, flags);
>>> - put_task_struct(task);
>>> -
>>
>> No! It is used in the line just above, so we better be sure it still
>> exists!
>
> See above. If I am wrong, we can move this line
>
> waiter->list_entry.prio = task->prio;
>
> up, under ->pi_lock. plist_del() doesn't need a valid ->prio.
>
Correct.
> Thanks for your answer!
>
> Oleg.
>
Esben
On 08/15, Esben Nielsen wrote:
>
> On Tue, 15 Aug 2006, Oleg Nesterov wrote:
> >
> >task->pi_blocked_on != NULL, we hold task->pi_blocked_on->lock->wait_lock.
> >Can it go away ?
>
> That is correct. But does it make the code more readable?
Well, in my opinion - yes. But yes, it's only my personal feeling :)
> When you read
> the code you shouldn't need to go through that kind of complicated argument
> to see its correctness - unless the code can't be written any other way.
Sure, this needs a comment.
Thanks again,
Oleg.
On Tue, 15 Aug 2006, Oleg Nesterov wrote:
> On 08/15, Esben Nielsen wrote:
>>
>> On Tue, 15 Aug 2006, Oleg Nesterov wrote:
>>>
>>> task->pi_blocked_on != NULL, we hold task->pi_blocked_on->lock->wait_lock.
>>> Can it go away ?
>>
>> That is correct. But does it make the code more readable?
>
> Well, in my opinion - yes. But yes, it's only my personal feeling :)
>
>> When you read
>> the code you shouldn't need to go through that kind of complicated argument
>> to see its correctness - unless the code can't be written any other way.
>
> Sure, this needs a comment.
It is even better if you can read the code without a comment. Good code
doesn't need comments.
Esben
>
> Thanks again,
>
> Oleg.
>
On 08/15, Esben Nielsen wrote:
>
> On Tue, 15 Aug 2006, Oleg Nesterov wrote:
> >
> >Sure, this needs a comment.
>
> It is even better if you can read the code without a comment. Good code
> doesn't need comments.
Yes. But from my point of view, the current code needs a comment to explain
why we take ->pi_lock; at least to me, this is hard to understand.
It is even better if we don't take ->pi_lock for ->wait_list manipulation.
Imho.
Oleg.