The sunrpc scheduler contains a race condition that can let an RPC
task end up being neither running nor on any wait queue. The race takes
place between rpc_make_runnable (called from rpc_wake_up_task) and
__rpc_execute under the following condition:
First __rpc_execute calls tk_action which puts the task on some wait
queue. The task is dequeued by another process before __rpc_execute
continues its execution. While executing rpc_make_runnable exactly after
setting the task `running' bit and before clearing the `queued' bit
__rpc_execute picks up execution, clears `running' and subsequently
both functions fall through, both under the false assumption somebody
else took the job.
Swapping rpc_test_and_set_running with rpc_clear_queued in
rpc_make_runnable fixes that hole. The reordering hopefully doesn't
introduce some new race condition, in fact the only possible one is
already correctly detected and handled in __rpc_execute.
Bug noticed on a 4-way x86_64 system under XEN with an NFSv4 server
on the same physical machine, apparently one of the few ways to hit
this race condition at all.
Cc: Trond Myklebust <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Cc: Olaf Kirch <[email protected]>
Signed-off-by: Christophe Saout <[email protected]>
--- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
+++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
@@ -302,12 +302,9 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
*/
static void rpc_make_runnable(struct rpc_task *task)
{
- int do_ret;
-
BUG_ON(task->tk_timeout_fn);
- do_ret = rpc_test_and_set_running(task);
rpc_clear_queued(task);
- if (do_ret)
+ if (rpc_test_and_set_running(task))
return;
if (RPC_IS_ASYNC(task)) {
int status;
On Sun, 2006-11-05 at 02:04 +0100, Christophe Saout wrote:
> The sunrpc scheduler contains a race condition that can let an RPC
> task end up being neither running nor on any wait queue. The race takes
> place between rpc_make_runnable (called from rpc_wake_up_task) and
> __rpc_execute under the following condition:
>
> First __rpc_execute calls tk_action which puts the task on some wait
> queue. The task is dequeued by another process before __rpc_execute
> continues its execution. While executing rpc_make_runnable exactly after
> setting the task `running' bit and before clearing the `queued' bit
> __rpc_execute picks up execution, clears `running' and subsequently
> both functions fall through, both under the false assumption somebody
> else took the job.
>
> Swapping rpc_test_and_set_running with rpc_clear_queued in
> rpc_make_runnable fixes that hole. The reordering hopefully doesn't
> introduce some new race condition, in fact the only possible one is
> already correctly detected and handled in __rpc_execute.
>
> Bug noticed on a 4-way x86_64 system under XEN with an NFSv4 server
> on the same physical machine, apparently one of the few ways to hit
> this race condition at all.
>
> Cc: Trond Myklebust <[email protected]>
> Cc: J. Bruce Fields <[email protected]>
> Cc: Olaf Kirch <[email protected]>
> Signed-off-by: Christophe Saout <[email protected]>
>
> --- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
> +++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
> @@ -302,12 +302,9 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
> */
> static void rpc_make_runnable(struct rpc_task *task)
> {
> - int do_ret;
> -
> BUG_ON(task->tk_timeout_fn);
> - do_ret = rpc_test_and_set_running(task);
> rpc_clear_queued(task);
> - if (do_ret)
> + if (rpc_test_and_set_running(task))
> return;
> if (RPC_IS_ASYNC(task)) {
> int status;
This fix looks wrong to me. If we've made it to 'rpc_make_runnable',
then the rpc_task will have already been removed from the
rpc_wait_queue.
The only question left is "who is responsible for waking up the
synchronous task / setting up the asynchronous workqueue item?".
Cheers,
Trond
Am Sonntag, den 05.11.2006, 01:50 -0500 schrieb Trond Myklebust:
> > --- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
> > +++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
> > @@ -302,12 +302,9 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
> > */
> > static void rpc_make_runnable(struct rpc_task *task)
> > {
> > - int do_ret;
> > -
> > BUG_ON(task->tk_timeout_fn);
> > - do_ret = rpc_test_and_set_running(task);
> > rpc_clear_queued(task);
> > - if (do_ret)
> > + if (rpc_test_and_set_running(task))
> > return;
> > if (RPC_IS_ASYNC(task)) {
> > int status;
>
> This fix looks wrong to me. If we've made it to 'rpc_make_runnable',
> then the rpc_task will have already been removed from the
> rpc_wait_queue.
I just flipped the two lines, changed nothing else. Why exactly do you
think that's wrong, I don't see anything particular that could be broken
by chaning the ordering. Anyway, the fsstress has been running for 18
hours straight now without showing any signs of problems.
On Sun, 2006-11-05 at 11:32 +0100, Christophe Saout wrote:
> Am Sonntag, den 05.11.2006, 01:50 -0500 schrieb Trond Myklebust:
>
> > > --- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
> > > +++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
> > > @@ -302,12 +302,9 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
> > > */
> > > static void rpc_make_runnable(struct rpc_task *task)
> > > {
> > > - int do_ret;
> > > -
> > > BUG_ON(task->tk_timeout_fn);
> > > - do_ret = rpc_test_and_set_running(task);
> > > rpc_clear_queued(task);
> > > - if (do_ret)
> > > + if (rpc_test_and_set_running(task))
> > > return;
> > > if (RPC_IS_ASYNC(task)) {
> > > int status;
> >
> > This fix looks wrong to me. If we've made it to 'rpc_make_runnable',
> > then the rpc_task will have already been removed from the
> > rpc_wait_queue.
>
> I just flipped the two lines, changed nothing else. Why exactly do you
> think that's wrong, I don't see anything particular that could be broken
> by chaning the ordering. Anyway, the fsstress has been running for 18
> hours straight now without showing any signs of problems.
OK. I finally see the bug that you've spotted. The problem occurs when
__rpc_execute clears RPC_TASK_RUNNING after rpc_make_runnable has called
rpc_test_and_set_running, but before it has called rpc_clear_queued.
However if you just swap the two lines, you run into a new race:
__rpc_execute() may just put the rpc_task back to sleep before your call
to rpc_test_and_set_running() finishes executing.
We therefore need an extra test for RPC_IS_QUEUED() in
rpc_make_runnable().
Cheers,
Trond
On Sun, 2006-11-05 at 12:01 -0500, Trond Myklebust wrote:
> On Sun, 2006-11-05 at 11:32 +0100, Christophe Saout wrote:
> > Am Sonntag, den 05.11.2006, 01:50 -0500 schrieb Trond Myklebust:
> >
> > > > --- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
> > > > +++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
> > > > @@ -302,12 +302,9 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
> > > > */
> > > > static void rpc_make_runnable(struct rpc_task *task)
> > > > {
> > > > - int do_ret;
> > > > -
> > > > BUG_ON(task->tk_timeout_fn);
> > > > - do_ret = rpc_test_and_set_running(task);
> > > > rpc_clear_queued(task);
> > > > - if (do_ret)
> > > > + if (rpc_test_and_set_running(task))
> > > > return;
> > > > if (RPC_IS_ASYNC(task)) {
> > > > int status;
> > >
> > > This fix looks wrong to me. If we've made it to 'rpc_make_runnable',
> > > then the rpc_task will have already been removed from the
> > > rpc_wait_queue.
> >
> > I just flipped the two lines, changed nothing else. Why exactly do you
> > think that's wrong, I don't see anything particular that could be broken
> > by chaning the ordering. Anyway, the fsstress has been running for 18
> > hours straight now without showing any signs of problems.
>
> OK. I finally see the bug that you've spotted. The problem occurs when
> __rpc_execute clears RPC_TASK_RUNNING after rpc_make_runnable has called
> rpc_test_and_set_running, but before it has called rpc_clear_queued.
Yes, exactly.
> However if you just swap the two lines, you run into a new race:
> __rpc_execute() may just put the rpc_task back to sleep before your call
> to rpc_test_and_set_running() finishes executing.
> We therefore need an extra test for RPC_IS_QUEUED() in
> rpc_make_runnable().
Damn, you're right. I missed that one. What about that:
----
The sunrpc scheduler contains a race condition that can let an RPC
task end up being neither running nor on any wait queue. The race takes
place between rpc_make_runnable (called from rpc_wake_up_task) and
__rpc_execute under the following condition:
First __rpc_execute calls tk_action which puts the task on some wait
queue. The task is dequeued by another process before __rpc_execute
continues its execution. While executing rpc_make_runnable exactly after
setting the task `running' bit and before clearing the `queued' bit
__rpc_execute picks up execution, clears `running' and subsequently
both functions fall through, both under the false assumption somebody
else took the job.
Swapping rpc_test_and_set_running with rpc_clear_queued in
rpc_make_runnable fixes that hole. This introduces another possible
race condition that can be handled by checking for `queued' after
setting the `running' bit.
Bug noticed on a 4-way x86_64 system under XEN with an NFSv4 server
on the same physical machine, apparently one of the few ways to hit
this race condition at all.
Cc: Trond Myklebust <[email protected]>
Cc: J. Bruce Fields <[email protected]>
Signed-off-by: Christophe Saout <[email protected]>
--- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
+++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
@@ -302,12 +302,15 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
*/
static void rpc_make_runnable(struct rpc_task *task)
{
- int do_ret;
-
BUG_ON(task->tk_timeout_fn);
- do_ret = rpc_test_and_set_running(task);
rpc_clear_queued(task);
- if (do_ret)
+ if (rpc_test_and_set_running(task))
return;
+ /* We might have raced */
+ if (RPC_IS_QUEUED(task)) {
+ rpc_clear_running(task);
+ return;
+ }
if (RPC_IS_ASYNC(task)) {
int status;
- int do_ret;
-
BUG_ON(task->tk_timeout_fn);
- do_ret = rpc_test_and_set_running(task);
rpc_clear_queued(task);
- if (do_ret)
+ if (rpc_test_and_set_running(task))
return;
+ /* We might have raced with __rpc_execute */
+ if (RPC_IS_QUEUED(task)) {
+ rpc_clear_running(task);
+ return;
+ }
if (RPC_IS_ASYNC(task)) {
int status;
On Sun, 2006-11-05 at 18:42 +0100, Christophe Saout wrote:
> On Sun, 2006-11-05 at 12:01 -0500, Trond Myklebust wrote:
> > On Sun, 2006-11-05 at 11:32 +0100, Christophe Saout wrote:
> > > Am Sonntag, den 05.11.2006, 01:50 -0500 schrieb Trond Myklebust:
> > >
> > > > > --- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
> > > > > +++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
> > > > > @@ -302,12 +302,9 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
> > > > > */
> > > > > static void rpc_make_runnable(struct rpc_task *task)
> > > > > {
> > > > > - int do_ret;
> > > > > -
> > > > > BUG_ON(task->tk_timeout_fn);
> > > > > - do_ret = rpc_test_and_set_running(task);
> > > > > rpc_clear_queued(task);
> > > > > - if (do_ret)
> > > > > + if (rpc_test_and_set_running(task))
> > > > > return;
> > > > > if (RPC_IS_ASYNC(task)) {
> > > > > int status;
> > > >
> > > > This fix looks wrong to me. If we've made it to 'rpc_make_runnable',
> > > > then the rpc_task will have already been removed from the
> > > > rpc_wait_queue.
> > >
> > > I just flipped the two lines, changed nothing else. Why exactly do you
> > > think that's wrong, I don't see anything particular that could be broken
> > > by chaning the ordering. Anyway, the fsstress has been running for 18
> > > hours straight now without showing any signs of problems.
> >
> > OK. I finally see the bug that you've spotted. The problem occurs when
> > __rpc_execute clears RPC_TASK_RUNNING after rpc_make_runnable has called
> > rpc_test_and_set_running, but before it has called rpc_clear_queued.
>
> Yes, exactly.
>
> > However if you just swap the two lines, you run into a new race:
> > __rpc_execute() may just put the rpc_task back to sleep before your call
> > to rpc_test_and_set_running() finishes executing.
> > We therefore need an extra test for RPC_IS_QUEUED() in
> > rpc_make_runnable().
>
> Damn, you're right. I missed that one. What about that:
>
> ----
> The sunrpc scheduler contains a race condition that can let an RPC
> task end up being neither running nor on any wait queue. The race takes
> place between rpc_make_runnable (called from rpc_wake_up_task) and
> __rpc_execute under the following condition:
>
> First __rpc_execute calls tk_action which puts the task on some wait
> queue. The task is dequeued by another process before __rpc_execute
> continues its execution. While executing rpc_make_runnable exactly after
> setting the task `running' bit and before clearing the `queued' bit
> __rpc_execute picks up execution, clears `running' and subsequently
> both functions fall through, both under the false assumption somebody
> else took the job.
>
> Swapping rpc_test_and_set_running with rpc_clear_queued in
> rpc_make_runnable fixes that hole. This introduces another possible
> race condition that can be handled by checking for `queued' after
> setting the `running' bit.
>
> Bug noticed on a 4-way x86_64 system under XEN with an NFSv4 server
> on the same physical machine, apparently one of the few ways to hit
> this race condition at all.
>
> Cc: Trond Myklebust <[email protected]>
> Cc: J. Bruce Fields <[email protected]>
> Signed-off-by: Christophe Saout <[email protected]>
>
> --- linux-2.6.18/net/sunrpc/sched.c 2006-09-20 05:42:06.000000000 +0200
> +++ linux/net/sunrpc/sched.c 2006-11-04 20:38:56.000000000 +0100
> @@ -302,12 +302,15 @@ EXPORT_SYMBOL(__rpc_wait_for_completion_
> */
> static void rpc_make_runnable(struct rpc_task *task)
> {
> - int do_ret;
> -
> BUG_ON(task->tk_timeout_fn);
> - do_ret = rpc_test_and_set_running(task);
> rpc_clear_queued(task);
> - if (do_ret)
> + if (rpc_test_and_set_running(task))
> return;
> + /* We might have raced */
> + if (RPC_IS_QUEUED(task)) {
> + rpc_clear_running(task);
> + return;
> + }
> if (RPC_IS_ASYNC(task)) {
> int status;
This looks OK.
> - int do_ret;
> -
> BUG_ON(task->tk_timeout_fn);
> - do_ret = rpc_test_and_set_running(task);
> rpc_clear_queued(task);
> - if (do_ret)
> + if (rpc_test_and_set_running(task))
> return;
> + /* We might have raced with __rpc_execute */
> + if (RPC_IS_QUEUED(task)) {
> + rpc_clear_running(task);
> + return;
> + }
> if (RPC_IS_ASYNC(task)) {
> int status;
A cut'n paste error?
Cheers,
Trond