2006-03-02 10:38:14

by Simon Derr

[permalink] [raw]
Subject: Deadlock in net/sunrpc/sched.c


Hi,

My colleague Bruno Faccini has found a deadlock in the rpc wake up code.
This happened with 2.6.12 but it seems that the code has not changed and
the issue is very probably still present in the current kernels.

I think what happens is this:

One process (A) enters rpc_wake_up_task().
It enters rpc_start_wakeup() and sets the RPC_TASK_WAKEUP bit.

#define rpc_start_wakeup(t) \
(test_and_set_bit(RPC_TASK_WAKEUP, &(t)->tk_runstate) == 0)

void rpc_wake_up_task(struct rpc_task *task)
{
if (rpc_start_wakeup(task)) {
if (RPC_IS_QUEUED(task)) {
struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;

spin_lock_bh(&queue->lock);
__rpc_do_wake_up_task(task);
spin_unlock_bh(&queue->lock);
}
rpc_finish_wakeup(task);
}
}


Now an interrupt has occured on another CPU and process (B) enters
rpc_wake_up(). It takes the queue spinlock, and enters this `while' loop:

void rpc_wake_up(struct rpc_wait_queue *queue)
{
struct rpc_task *task;

struct list_head *head;
spin_lock_bh(&queue->lock);
head = &queue->tasks[queue->maxpriority];
for (;;) {
while (!list_empty(head)) {
task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
__rpc_wake_up_task(task);
}
if (head == &queue->tasks[0])
break;
head--;
}
spin_unlock_bh(&queue->lock);
}

static void __rpc_wake_up_task(struct rpc_task *task)
{
if (rpc_start_wakeup(task)) {
if (RPC_IS_QUEUED(task))
__rpc_do_wake_up_task(task);
rpc_finish_wakeup(task);
}
}


Now to exit this loop, B needs to reach __rpc_do_wake_up_task() where a
list_del will occur. But for this the RPC_TASK_WAKEUP must be released by
process A, and this won't happen until process B releases the queue
spinlock. --> deadlock.


A possible fix would be to take the spinlock earlier in rpc_wake_up_task():



Signed-off-by: [email protected]
Signed-off-by: [email protected]

Index: linux-2.6.12.6/net/sunrpc/sched.c
===================================================================
--- linux-2.6.12.6.orig/net/sunrpc/sched.c 2005-08-29 18:55:27.000000000 +0200
+++ linux-2.6.12.6/net/sunrpc/sched.c 2006-03-02 11:10:38.000000000 +0100
@@ -400,16 +400,16 @@ __rpc_default_timer(struct rpc_task *tas
*/
void rpc_wake_up_task(struct rpc_task *task)
{
+ spin_lock_bh(&queue->lock);
if (rpc_start_wakeup(task)) {
if (RPC_IS_QUEUED(task)) {
struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;

- spin_lock_bh(&queue->lock);
__rpc_do_wake_up_task(task);
- spin_unlock_bh(&queue->lock);
}
rpc_finish_wakeup(task);
}
+ spin_unlock_bh(&queue->lock);
}

/*


2006-03-02 10:59:50

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, Mar 02, 2006 at 11:38:10AM +0100, Simon Derr wrote:
> This happened with 2.6.12 but it seems that the code has not changed and
> the issue is very probably still present in the current kernels.
Looks like it's fixed in 2.6.16-rc5, could you check agains the current
tree?

> + spin_lock_bh(&queue->lock);
> if (rpc_start_wakeup(task)) {
> if (RPC_IS_QUEUED(task)) {
> struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
>
> - spin_lock_bh(&queue->lock);
> __rpc_do_wake_up_task(task);
> - spin_unlock_bh(&queue->lock);
> }
> rpc_finish_wakeup(task);
> }
> + spin_unlock_bh(&queue->lock);

Hmm, not sure this will even compile...
Regards,
Frederik

2006-03-02 11:45:31

by Simon Derr

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, 2 Mar 2006, Frederik Deweerdt wrote:

> On Thu, Mar 02, 2006 at 11:38:10AM +0100, Simon Derr wrote:
> > This happened with 2.6.12 but it seems that the code has not changed and
> > the issue is very probably still present in the current kernels.
> Looks like it's fixed in 2.6.16-rc5, could you check agains the current
> tree?

No, the code is still the same.

> Hmm, not sure this will even compile...

oops....

Index: linux-2.6.12.6/net/sunrpc/sched.c
===================================================================
--- linux-2.6.12.6.orig/net/sunrpc/sched.c 2005-08-29 18:55:27.000000000 +0200
+++ linux-2.6.12.6/net/sunrpc/sched.c 2006-03-02 12:41:42.000000000 +0100
@@ -400,16 +400,16 @@ __rpc_default_timer(struct rpc_task *tas
*/
void rpc_wake_up_task(struct rpc_task *task)
{
+ struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
+ spin_lock_bh(&queue->lock);
if (rpc_start_wakeup(task)) {
if (RPC_IS_QUEUED(task)) {
- struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;

- spin_lock_bh(&queue->lock);
__rpc_do_wake_up_task(task);
- spin_unlock_bh(&queue->lock);
}
rpc_finish_wakeup(task);
}
+ spin_unlock_bh(&queue->lock);
}

/*

2006-03-02 12:17:58

by Simon Derr

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, 2 Mar 2006, Simon Derr wrote:

> Index: linux-2.6.12.6/net/sunrpc/sched.c
> ===================================================================
> --- linux-2.6.12.6.orig/net/sunrpc/sched.c 2005-08-29 18:55:27.000000000 +0200
> +++ linux-2.6.12.6/net/sunrpc/sched.c 2006-03-02 12:41:42.000000000 +0100
> @@ -400,16 +400,16 @@ __rpc_default_timer(struct rpc_task *tas
> */
> void rpc_wake_up_task(struct rpc_task *task)
> {
> + struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
> + spin_lock_bh(&queue->lock);
> if (rpc_start_wakeup(task)) {
> if (RPC_IS_QUEUED(task)) {
> - struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
>
> - spin_lock_bh(&queue->lock);
> __rpc_do_wake_up_task(task);
> - spin_unlock_bh(&queue->lock);
> }
> rpc_finish_wakeup(task);
> }
> + spin_unlock_bh(&queue->lock);
> }

Hmm, this is just to show how to reverse the locking order.
I'm not too familiar with the rpc_tasks, and
task->u.tk_wait.rpc_waitq might not be valid.

Maybe this would be better:



--- linux-2.6.12.6.orig/net/sunrpc/sched.c 2006-03-02 13:11:51.000000000 +0100
+++ linux-2.6.12.6/net/sunrpc/sched.c 2006-03-02 13:16:19.000000000 +0100
@@ -400,15 +400,14 @@ __rpc_default_timer(struct rpc_task *tas
*/
void rpc_wake_up_task(struct rpc_task *task)
{
- if (rpc_start_wakeup(task)) {
- if (RPC_IS_QUEUED(task)) {
- struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
-
- spin_lock_bh(&queue->lock);
+ if (RPC_IS_QUEUED(task)) {
+ struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
+ spin_lock_bh(&queue->lock);
+ if (rpc_start_wakeup(task)) {
__rpc_do_wake_up_task(task);
- spin_unlock_bh(&queue->lock);
+ rpc_finish_wakeup(task);
}
- rpc_finish_wakeup(task);
+ spin_unlock_bh(&queue->lock);
}
}

2006-03-02 17:32:09

by Trond Myklebust

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, 2006-03-02 at 12:45 +0100, Simon Derr wrote:
> On Thu, 2 Mar 2006, Frederik Deweerdt wrote:
>
> > On Thu, Mar 02, 2006 at 11:38:10AM +0100, Simon Derr wrote:
> > > This happened with 2.6.12 but it seems that the code has not changed and
> > > the issue is very probably still present in the current kernels.
> > Looks like it's fixed in 2.6.16-rc5, could you check agains the current
> > tree?
>
> No, the code is still the same.
>
> > Hmm, not sure this will even compile...
>
> oops....
>
> Index: linux-2.6.12.6/net/sunrpc/sched.c
> ===================================================================
> --- linux-2.6.12.6.orig/net/sunrpc/sched.c 2005-08-29 18:55:27.000000000 +0200
> +++ linux-2.6.12.6/net/sunrpc/sched.c 2006-03-02 12:41:42.000000000 +0100
> @@ -400,16 +400,16 @@ __rpc_default_timer(struct rpc_task *tas
> */
> void rpc_wake_up_task(struct rpc_task *task)
> {
> + struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
> + spin_lock_bh(&queue->lock);
> if (rpc_start_wakeup(task)) {
> if (RPC_IS_QUEUED(task)) {
> - struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
>
> - spin_lock_bh(&queue->lock);
> __rpc_do_wake_up_task(task);
> - spin_unlock_bh(&queue->lock);
> }
> rpc_finish_wakeup(task);
> }
> + spin_unlock_bh(&queue->lock);
> }


Bzzzt... Race: How do you know that task->u.tk_wait.rpc_waitq contains a
valid pointer to a queue before you call rpc_start_wakeup()?

Cheers,
Trond

> /*
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

2006-03-02 17:45:20

by Trond Myklebust

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, 2006-03-02 at 11:38 +0100, Simon Derr wrote:
> Hi,
>
> My colleague Bruno Faccini has found a deadlock in the rpc wake up code.
> This happened with 2.6.12 but it seems that the code has not changed and
> the issue is very probably still present in the current kernels.
>
> I think what happens is this:
>
> One process (A) enters rpc_wake_up_task().
> It enters rpc_start_wakeup() and sets the RPC_TASK_WAKEUP bit.
>
> #define rpc_start_wakeup(t) \
> (test_and_set_bit(RPC_TASK_WAKEUP, &(t)->tk_runstate) == 0)
>
> void rpc_wake_up_task(struct rpc_task *task)
> {
> if (rpc_start_wakeup(task)) {
> if (RPC_IS_QUEUED(task)) {
> struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
>
> spin_lock_bh(&queue->lock);
> __rpc_do_wake_up_task(task);
> spin_unlock_bh(&queue->lock);
> }
> rpc_finish_wakeup(task);
> }
> }
>
>
> Now an interrupt has occured on another CPU and process (B) enters
> rpc_wake_up(). It takes the queue spinlock, and enters this `while' loop:
>
> void rpc_wake_up(struct rpc_wait_queue *queue)
> {
> struct rpc_task *task;
>
> struct list_head *head;
> spin_lock_bh(&queue->lock);
> head = &queue->tasks[queue->maxpriority];
> for (;;) {
> while (!list_empty(head)) {
> task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
> __rpc_wake_up_task(task);
> }
> if (head == &queue->tasks[0])
> break;
> head--;
> }
> spin_unlock_bh(&queue->lock);
> }
>
> static void __rpc_wake_up_task(struct rpc_task *task)
> {
> if (rpc_start_wakeup(task)) {
> if (RPC_IS_QUEUED(task))
> __rpc_do_wake_up_task(task);
> rpc_finish_wakeup(task);
> }
> }
>
>
> Now to exit this loop, B needs to reach __rpc_do_wake_up_task() where a
> list_del will occur. But for this the RPC_TASK_WAKEUP must be released by
> process A, and this won't happen until process B releases the queue
> spinlock. --> deadlock.

Could you see if this fixes it?

Cheers,
Trond
-----------------------
Author: Trond Myklebust <[email protected]>
SUNRPC: Fix potential deadlock in RPC code

In rpc_wake_up() and rpc_wake_up_status(), it is possible for the call to
__rpc_wake_up_task() to fail if another thread happens to be calling
rpc_wake_up_task() on the same rpc_task.

Problem noticed by Bruno Faccini.

Signed-off-by: Trond Myklebust <[email protected]>
---

net/sunrpc/sched.c | 13 +++++--------
1 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index a04cf3b..cd51b54 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -517,16 +517,14 @@ struct rpc_task * rpc_wake_up_next(struc
*/
void rpc_wake_up(struct rpc_wait_queue *queue)
{
- struct rpc_task *task;
-
+ struct rpc_task *task, *next;
struct list_head *head;
+
spin_lock_bh(&queue->lock);
head = &queue->tasks[queue->maxpriority];
for (;;) {
- while (!list_empty(head)) {
- task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
+ list_for_each_entry_safe(task, next, head, u.tk_wait.list)
__rpc_wake_up_task(task);
- }
if (head == &queue->tasks[0])
break;
head--;
@@ -543,14 +541,13 @@ void rpc_wake_up(struct rpc_wait_queue *
*/
void rpc_wake_up_status(struct rpc_wait_queue *queue, int status)
{
+ struct rpc_task *task, *next;
struct list_head *head;
- struct rpc_task *task;

spin_lock_bh(&queue->lock);
head = &queue->tasks[queue->maxpriority];
for (;;) {
- while (!list_empty(head)) {
- task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
+ list_for_each_entry_safe(task, next, head, u.tk_wait.list) {
task->tk_status = status;
__rpc_wake_up_task(task);
}


2006-03-02 17:51:34

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

(Added Trond Myklebust to the cc list, as he's working on sunrpc)
> + if (RPC_IS_QUEUED(task)) {
> + struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
> + spin_lock_bh(&queue->lock);
> + if (rpc_start_wakeup(task)) {
We may end up list_del'ing a task that is not queued
anymore: we may be interrupted just after the RPC_IS_QUEUED test.
Don't you think this patch could be enough?

--- linux-2.6.16-rc5/net/sunrpc/sched.c 2006-03-01 11:26:15.000000000 +0100
+++ linux-2.6.16-rc5-2/net/sunrpc/sched.c 2006-03-02 15:43:18.000000000 +0100
@@ -521,8 +521,7 @@
spin_lock_bh(&queue->lock);
head = &queue->tasks[queue->maxpriority];
for (;;) {
- while (!list_empty(head)) {
- task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
+ list_for_each_entry(task, head, u.tk_wait.list) {
__rpc_wake_up_task(task);
}
if (head == &queue->tasks[0])

We don't need to __rpc_wake_up_task a task that is already
rpc_start_wakeup'ed after all. BTW, there are other while
(!list_empty(head)) on sched.c that could need a similar rewrite.

Regards,
Frederik

2006-03-02 17:58:06

by Myklebust, Trond

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, 2006-03-02 at 18:51 +0100, Frederik Deweerdt wrote:
> (Added Trond Myklebust to the cc list, as he's working on sunrpc)
> > + if (RPC_IS_QUEUED(task)) {
> > + struct rpc_wait_queue *queue = task->u.tk_wait.rpc_waitq;
> > + spin_lock_bh(&queue->lock);
> > + if (rpc_start_wakeup(task)) {
> We may end up list_del'ing a task that is not queued
> anymore: we may be interrupted just after the RPC_IS_QUEUED test.
> Don't you think this patch could be enough?
>
> --- linux-2.6.16-rc5/net/sunrpc/sched.c 2006-03-01 11:26:15.000000000 +0100
> +++ linux-2.6.16-rc5-2/net/sunrpc/sched.c 2006-03-02 15:43:18.000000000 +0100
> @@ -521,8 +521,7 @@
> spin_lock_bh(&queue->lock);
> head = &queue->tasks[queue->maxpriority];
> for (;;) {
> - while (!list_empty(head)) {
> - task = list_entry(head->next, struct rpc_task, u.tk_wait.list);
> + list_for_each_entry(task, head, u.tk_wait.list) {
> __rpc_wake_up_task(task);
> }
> if (head == &queue->tasks[0])

You need a list_for_each_entry_safe() here, since __rpc_wake_up_task()
will cause the task to be removed from the list. See the patch I sent
out 1/2 hour ago.

Cheers
Trond

2006-03-02 18:33:33

by Frederik Deweerdt

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, Mar 02, 2006 at 09:58:02AM -0800, Trond Myklebust wrote:
> You need a list_for_each_entry_safe() here, since __rpc_wake_up_task()
> will cause the task to be removed from the list. See the patch I sent
> out 1/2 hour ago.
>
Indeed, I missed it. Thanks,
Frederik

2006-03-03 08:53:52

by Simon Derr

[permalink] [raw]
Subject: Re: Deadlock in net/sunrpc/sched.c

On Thu, 2 Mar 2006, Trond Myklebust wrote:

> > Now to exit this loop, B needs to reach __rpc_do_wake_up_task() where a
> > list_del will occur. But for this the RPC_TASK_WAKEUP must be released by
> > process A, and this won't happen until process B releases the queue
> > spinlock. --> deadlock.
>
> Could you see if this fixes it?
>
> Cheers,
> Trond


Thanks, I will apply it.
But don't hold your breath though, this happened on a large cluster with
high security and restricted access so I can't just try it like I would
on my desktop machine.
There will probably be several weeks before I get back to you.

Simon.