We have found a problem with the BKL not being held during a call to
nfs_asynch_unlink_release(), the call stack (this is an rpciod process)
is:
nfs_async_unlink_release (called via tk_ops->rpc_release())
rpc_release_task
__rpc_execute
run_workqueue
worker_thread
kthread
kernel_thread
I have been talking to Chuck Lever about this because I had noted one of
his patches for removing the BKL added a spin lock to protect the
nfs_deletes list used in fs/nfs/unlink.c. This patch happens to fix the
specific problem we are seeing, but discussion with Chuck suggests there
may be a larger problem.
Chuck pointed out that tk_ops is a recent addition, and that perhaps the
think to do would be to add a lock_kernel() around the call to
tk_ops->rpc_release().
When I had been investigating removing the BKL, I had noted that it
seemed inconsistent whether the BKL was held for calls to rpc_execute(),
and in some cases, the BKL was held ONLY for the call to rpc_execute().
My investigations found that rpciod does not hold the BKL when calling
__rpc_execute(), and I also think some of the direct I/O code in
fs/nfs/direct.c does not hold the BKL, or possible the asynch code (I
didn't keep good notes - I had made a cursory examination, then noted
for sure there were code paths where rpc_execute was called without the
BKL, and naively assumed it was not needed, and didn't pursue exactly
which calls did and did not hold the BKL - it seems that assumption may
be wrong).
At first, I thought that using Chuck's patch to unlink.c would be a good
fix since it would not introduce new lock_kernel() calls, however, if
there may be other holes, it might be simpler to put the proper
lock_kernel() calls in, and then cleanly remove all uses of the BKL when
Chuck's patch set is adopted.
Frank Filz
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2006-10-05 at 10:46 -0700, Frank Filz wrote:
> We have found a problem with the BKL not being held during a call to
> nfs_asynch_unlink_release(), the call stack (this is an rpciod process)
> is:
>
> nfs_async_unlink_release (called via tk_ops->rpc_release())
> rpc_release_task
> __rpc_execute
> run_workqueue
> worker_thread
> kthread
> kernel_thread
Until we fix the unlink call (and possibly others), then that looks like
a bug.
> I have been talking to Chuck Lever about this because I had noted one of
> his patches for removing the BKL added a spin lock to protect the
> nfs_deletes list used in fs/nfs/unlink.c. This patch happens to fix the
> specific problem we are seeing, but discussion with Chuck suggests there
> may be a larger problem.
>
> Chuck pointed out that tk_ops is a recent addition, and that perhaps the
> think to do would be to add a lock_kernel() around the call to
> tk_ops->rpc_release().
Agreed.
> When I had been investigating removing the BKL, I had noted that it
> seemed inconsistent whether the BKL was held for calls to rpc_execute(),
> and in some cases, the BKL was held ONLY for the call to rpc_execute().
> My investigations found that rpciod does not hold the BKL when calling
> __rpc_execute(), and I also think some of the direct I/O code in
> fs/nfs/direct.c does not hold the BKL, or possible the asynch code (I
> didn't keep good notes - I had made a cursory examination, then noted
> for sure there were code paths where rpc_execute was called without the
> BKL, and naively assumed it was not needed, and didn't pursue exactly
> which calls did and did not hold the BKL - it seems that assumption may
> be wrong).
>
> At first, I thought that using Chuck's patch to unlink.c would be a good
> fix since it would not introduce new lock_kernel() calls, however, if
> there may be other holes, it might be simpler to put the proper
> lock_kernel() calls in, and then cleanly remove all uses of the BKL when
> Chuck's patch set is adopted.
No. The RPC code itself should no longer require the BKL in order to
function correctly. The callbacks into the NFS or lockd layers are the
only routines that might care, and they only care for the duration of
the callback itself (holding the BKL across the entire RPC call is
unnecessary).
Cheers,
Trond
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2006-10-05 at 16:20 -0400, Trond Myklebust wrote:
> On Thu, 2006-10-05 at 10:46 -0700, Frank Filz wrote:
> > We have found a problem with the BKL not being held during a call to
> > nfs_asynch_unlink_release(), the call stack (this is an rpciod process)
> > is:
> >
> > nfs_async_unlink_release (called via tk_ops->rpc_release())
> > rpc_release_task
> > __rpc_execute
> > run_workqueue
> > worker_thread
> > kthread
> > kernel_thread
>
> Until we fix the unlink call (and possibly others), then that looks like
> a bug.
>
> > I have been talking to Chuck Lever about this because I had noted one of
> > his patches for removing the BKL added a spin lock to protect the
> > nfs_deletes list used in fs/nfs/unlink.c. This patch happens to fix the
> > specific problem we are seeing, but discussion with Chuck suggests there
> > may be a larger problem.
> >
> > Chuck pointed out that tk_ops is a recent addition, and that perhaps the
> > think to do would be to add a lock_kernel() around the call to
> > tk_ops->rpc_release().
>
> Agreed.
>
> > When I had been investigating removing the BKL, I had noted that it
> > seemed inconsistent whether the BKL was held for calls to rpc_execute(),
> > and in some cases, the BKL was held ONLY for the call to rpc_execute().
> > My investigations found that rpciod does not hold the BKL when calling
> > __rpc_execute(), and I also think some of the direct I/O code in
> > fs/nfs/direct.c does not hold the BKL, or possible the asynch code (I
> > didn't keep good notes - I had made a cursory examination, then noted
> > for sure there were code paths where rpc_execute was called without the
> > BKL, and naively assumed it was not needed, and didn't pursue exactly
> > which calls did and did not hold the BKL - it seems that assumption may
> > be wrong).
> >
> > At first, I thought that using Chuck's patch to unlink.c would be a good
> > fix since it would not introduce new lock_kernel() calls, however, if
> > there may be other holes, it might be simpler to put the proper
> > lock_kernel() calls in, and then cleanly remove all uses of the BKL when
> > Chuck's patch set is adopted.
>
> No. The RPC code itself should no longer require the BKL in order to
> function correctly. The callbacks into the NFS or lockd layers are thehe
> only routines that might care, and they only care for the duration of
> the callback itself (holding the BKL across the entire RPC call is
> unnecessary).
Ok. Should the BKL be acquired for all the tk_ops calls? I see these
four:
net/sunrpc/clnt.c rpc_call_async 535 tk_ops->rpc_release(data);
net/sunrpc/sched.c rpc_prepare_task 568 task->tk_ops->rpc_call_prepare
(task, task->tk_calldata);
net/sunrpc/sched.c rpc_exit_task 578 task->tk_ops->rpc_call_done(task,
task->tk_calldata);
net/sunrpc/sched.c rpc_release_task 892 tk_ops->rpc_release(calldata);
On the surface, it looks like it may only be necessary for rpc_release.
Frank
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2006-10-05 at 13:58 -0700, Frank Filz wrote:
> Ok. Should the BKL be acquired for all the tk_ops calls? I see these
> four:
>
> net/sunrpc/clnt.c rpc_call_async 535 tk_ops->rpc_release(data);
Yup. This needs a BKL wrapper.
> net/sunrpc/sched.c rpc_prepare_task 568 task->tk_ops->rpc_call_prepare
> (task, task->tk_calldata);
Nope. Should be covered by the BKL wrapper of task->tk_action() in
__rpc_execute().
> net/sunrpc/sched.c rpc_exit_task 578 task->tk_ops->rpc_call_done(task,
> task->tk_calldata);
Nope. Should be covered by the BKL wrapper of task->tk_action() in
__rpc_execute().
> net/sunrpc/sched.c rpc_release_task 892 tk_ops->rpc_release(calldata);
Yup. This needs a BKL wrapper.
Note that this would allow us to immediately get rid of all the BKL
wrappers around rpc_call_*() and friends.
Cheers,
Trond
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On 10/5/06, Trond Myklebust <[email protected]> wrote:
> On Thu, 2006-10-05 at 13:58 -0700, Frank Filz wrote:
> > Ok. Should the BKL be acquired for all the tk_ops calls? I see these
> > four:
> >
> > net/sunrpc/clnt.c rpc_call_async 535 tk_ops->rpc_release(data);
>
> Yup. This needs a BKL wrapper.
>
> > net/sunrpc/sched.c rpc_prepare_task 568 task->tk_ops->rpc_call_prepare
> > (task, task->tk_calldata);
>
> Nope. Should be covered by the BKL wrapper of task->tk_action() in
> __rpc_execute().
>
> > net/sunrpc/sched.c rpc_exit_task 578 task->tk_ops->rpc_call_done(task,
> > task->tk_calldata);
>
> Nope. Should be covered by the BKL wrapper of task->tk_action() in
> __rpc_execute().
>
> > net/sunrpc/sched.c rpc_release_task 892 tk_ops->rpc_release(calldata);
>
> Yup. This needs a BKL wrapper.
One more, in rpc_run_task()...
> Note that this would allow us to immediately get rid of all the BKL
> wrappers around rpc_call_*() and friends.
Do you mean rpc_call_{a,}sync() or do you mean the tk_ops calls?
--
"We who cut mere stones must always be envisioning cathedrals"
-- Quarry worker's creed
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2006-10-05 at 17:25 -0400, Chuck Lever wrote:
> On 10/5/06, Trond Myklebust <[email protected]> wrote:
> > On Thu, 2006-10-05 at 13:58 -0700, Frank Filz wrote:
> > > Ok. Should the BKL be acquired for all the tk_ops calls? I see these
> > > four:
> > >
> > > net/sunrpc/clnt.c rpc_call_async 535 tk_ops->rpc_release(data);
> >
> > Yup. This needs a BKL wrapper.
> >
> > > net/sunrpc/sched.c rpc_prepare_task 568 task->tk_ops->rpc_call_prepare
> > > (task, task->tk_calldata);
> >
> > Nope. Should be covered by the BKL wrapper of task->tk_action() in
> > __rpc_execute().
> >
> > > net/sunrpc/sched.c rpc_exit_task 578 task->tk_ops->rpc_call_done(task,
> > > task->tk_calldata);
> >
> > Nope. Should be covered by the BKL wrapper of task->tk_action() in
> > __rpc_execute().
> >
> > > net/sunrpc/sched.c rpc_release_task 892 tk_ops->rpc_release(calldata);
> >
> > Yup. This needs a BKL wrapper.
>
> One more, in rpc_run_task()...
Oops, missed that one. I'll code up and test a patch to add these.
> > Note that this would allow us to immediately get rid of all the BKL
> > wrappers around rpc_call_*() and friends.
I'll also code up a separate patch or patch set to remove these.
> Do you mean rpc_call_{a,}sync() or do you mean the tk_ops calls?
I assume rpc_call_async() and rpc_call_sync() along with rpc_execute().
Frank
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2006-10-05 at 17:25 -0400, Chuck Lever wrote:
> One more, in rpc_run_task()...
Yep.
> > Note that this would allow us to immediately get rid of all the BKL
> > wrappers around rpc_call_*() and friends.
>
> Do you mean rpc_call_{a,}sync() or do you mean the tk_ops calls?
All direct calls to rpc_call_(a|)sync, rpc_run_task, and rpc_execute
that are being wrapped. See for instance nfs_execute_write,
nfs_execute_read, nfs_direct_read_schedule, nfs_direct_commit_schedule,
nfs_direct_write_schedule.
Also see nfs_symlink_filler, nfs3_proc_readdir, nfs_proc_readdir.
Cheers,
Trond
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs
On Thu, 2006-10-05 at 17:53 -0400, Trond Myklebust wrote:
> On Thu, 2006-10-05 at 17:25 -0400, Chuck Lever wrote:
>
> > One more, in rpc_run_task()...
>
> Yep.
>
> > > Note that this would allow us to immediately get rid of all the BKL
> > > wrappers around rpc_call_*() and friends.
> >
> > Do you mean rpc_call_{a,}sync() or do you mean the tk_ops calls?
>
> All direct calls to rpc_call_(a|)sync, rpc_run_task, and rpc_execute
> that are being wrapped. See for instance nfs_execute_write,
> nfs_execute_read, nfs_direct_read_schedule, nfs_direct_commit_schedule,
> nfs_direct_write_schedule.
>
> Also see nfs_symlink_filler, nfs3_proc_readdir, nfs_proc_readdir.
I looked over all of those. A few had a bit more than just a call to rpc
inside the BKL, but I'm pretty sure they are all safe (references to the
inode and such, which should be locked, or references to stack
variables). I won't get a chance to test this until next week, but looks
promising. This will definitely remove a bunch of lock_kernel() calls at
the expense of one new one.
Frank
-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys -- and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs