2008-01-14 19:56:42

by Benjamin Coddington

Subject: Re: Strange lockup during unmount in 2.6.22 - maybe rpciod deadlock?

Trond Myklebust wrote:

> I'm surprised that we can get into this state, though. How is
> sys_umount() able to exit with either readaheads or writebacks still
> pending? Is this perhaps occurring on a lazy umount?

Should we be submitting async tasks in gss_destroying_context? I get
this problem on umount with 3 or more cached GSS creds -- and end up
blocked in rpciod_down - destroy_workqueue..

With a short expiry, the dcache (or something) holds a reference, so
these creds stack up, and then they all queue tasks to clear their
contexts on shutdown.

Ben Coddington
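
The pattern described above can be pictured with a toy model. This is
only a sketch under stated assumptions, not kernel code: Cred,
shutdown and drain are hypothetical names, the "workqueue" is a plain
deque, and each cached cred is assumed to hold exactly one reference
that is dropped at shutdown. It shows only that N cached creds turn
into N teardown tasks that must run before the queue is empty.

```python
from collections import deque


class Cred:
    """Hypothetical stand-in for a cached GSS credential."""

    def __init__(self, name):
        self.name = name
        self.refs = 1  # assumption: one reference held by the cache


def shutdown(creds, queue):
    """Drop each cached reference; the final put queues one async teardown task."""
    for cred in creds:
        cred.refs -= 1
        if cred.refs == 0:
            queue.append("destroy-context:" + cred.name)
    return len(queue)


def drain(queue):
    """Run every pending task; only after this is the queue safe to destroy."""
    done = 0
    while queue:
        queue.popleft()
        done += 1
    return done


creds = [Cred("cred%d" % i) for i in range(3)]
workqueue = deque()
pending = shutdown(creds, workqueue)   # one task per cached cred
first_task = workqueue[0]
completed = drain(workqueue)           # all of them must finish first
```

In this model, anything that keeps the queued tasks from completing
would leave the drain step (and so the queue teardown) waiting, which
is the shape of the hang reported above.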


2008-01-14 19:31:53

by Trond Myklebust

Subject: Re: Strange lockup during unmount in 2.6.22 - maybe rpciod deadlock?


On Mon, 2008-01-14 at 14:01 -0500, Benjamin Coddington wrote:
> Trond Myklebust wrote:
>
> > I'm surprised that we can get into this state, though. How is
> > sys_umount() able to exit with either readaheads or writebacks still
> > pending? Is this perhaps occurring on a lazy umount?
>
> Should we be submitting async tasks in gss_destroying_context? I get
> this problem on umount with 3 or more cached GSS creds -- and end up
> blocked in rpciod_down - destroy_workqueue..

That shouldn't be the case in standard 2.6.22 unless you've also applied
the NFS_ALL patches. In that case, there is a fix (which has also been
applied to mainline) in

http://client.linux-nfs.org/Linux-2.6.x/2.6.23-rc1/linux-2.6.23-001-fix_rpciod_down_race.dif

Cheers
Trond