2015-01-14 16:51:38

by Trond Myklebust

Subject: Race in bl_resolve_deviceid?

Hi Christoph,

bl_resolve_deviceid() has:

	add_wait_queue(&nn->bl_wq, &wq);
	rc = rpc_queue_upcall(nn->bl_device_pipe, msg);
	if (rc < 0) {
		remove_wait_queue(&nn->bl_wq, &wq);
		goto out_free_data;
	}

	set_current_state(TASK_UNINTERRUPTIBLE);
	schedule();
	remove_wait_queue(&nn->bl_wq, &wq);


Doesn't that call to 'set_current_state()' need to come before the
rpc_queue_upcall() if you want the wait for the downcall to be
race-free? It looks to me as if the right thing to do here is to
replace the above with a prepare_to_wait()/finish_wait() pair...

--
Trond Myklebust
Linux NFS client maintainer, PrimaryData
[email protected]
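
The race described above is the classic lost wakeup: if the downcall's wake-up arrives after rpc_queue_upcall() returns but before set_current_state(TASK_UNINTERRUPTIBLE) runs, the task is still TASK_RUNNING when it is woken, and the later schedule() then sleeps with nobody left to wake it. The sketch below is only a user-space analogy using POSIX threads, not kernel code and not the block layout driver's implementation. The names upcall_wait, upcall_wait_init, downcall_thread and resolve_deviceid_sketch are hypothetical. It shows the general rule that the waiter must test a "done" condition under the same lock the waker uses, so a wake-up that fires early is never missed.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state shared between upcall and downcall. */
struct upcall_wait {
	pthread_mutex_t lock;
	pthread_cond_t  wq;    /* plays the role of the wait queue */
	bool            done;  /* set by the downcall side */
};

static void upcall_wait_init(struct upcall_wait *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->wq, NULL);
	w->done = false;
}

/* Simulated downcall: record completion, then wake any waiter. */
static void *downcall_thread(void *arg)
{
	struct upcall_wait *w = arg;

	pthread_mutex_lock(&w->lock);
	w->done = true;
	pthread_cond_broadcast(&w->wq);
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

/*
 * Simulated upcall plus wait. The downcall may run before we reach the
 * wait loop; because "done" is checked under the lock, we then skip the
 * sleep entirely instead of missing the wake-up.
 */
static int resolve_deviceid_sketch(struct upcall_wait *w)
{
	pthread_t t;

	if (pthread_create(&t, NULL, downcall_thread, w) != 0)
		return -1;

	pthread_mutex_lock(&w->lock);
	while (!w->done)
		pthread_cond_wait(&w->wq, &w->lock);
	pthread_mutex_unlock(&w->lock);

	pthread_join(t, NULL);
	return 0;
}
```

In the kernel the corresponding ordering is to put the task on the wait queue and set its state (which prepare_to_wait() does in one step) before queueing the upcall, so that a wake-up arriving at any later point puts the task back in TASK_RUNNING and schedule() returns promptly.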


2015-01-15 15:12:57

by Christoph Hellwig

Subject: Re: Race in bl_resolve_deviceid?

On Wed, Jan 14, 2015 at 11:51:37AM -0500, Trond Myklebust wrote:
> Hi Christoph,
>
> bl_resolve_deviceid() has:
>
> 	add_wait_queue(&nn->bl_wq, &wq);
> 	rc = rpc_queue_upcall(nn->bl_device_pipe, msg);
> 	if (rc < 0) {
> 		remove_wait_queue(&nn->bl_wq, &wq);
> 		goto out_free_data;
> 	}
>
> 	set_current_state(TASK_UNINTERRUPTIBLE);
> 	schedule();
> 	remove_wait_queue(&nn->bl_wq, &wq);
>
>
> Doesn't that call to 'set_current_state()' need to come before the
> rpc_queue_upcall() if you want the wait for the downcall to be
> race-free? It looks to me as if the right thing to do here is to
> replace the above with a prepare_to_wait()/finish_wait() pair...

That code predates my involvement with the block layout driver,
but from a quick inspection I'd say you're right. Let me cook up
a patch and run it through testing.

2015-01-23 13:41:53

by Christoph Hellwig

Subject: Re: Race in bl_resolve_deviceid?

On Wed, Jan 14, 2015 at 11:51:37AM -0500, Trond Myklebust wrote:
> Doesn't that call to 'set_current_state()' need to come before the
> rpc_queue_upcall() if you want the wait for the downcall to be
> race-free? It looks to me as if the right thing to do here is to
> replace the above with a prepare_to_wait()/finish_wait() pair...

After the trivial switch to prepare_to_wait()/finish_wait() the
thread asking for the deviceid never gets woken. I'll need more
time to understand all the code around the rpc_pipefs upcalls.