2008-02-28 17:29:15

by Frans Pop

Subject: [regression] nfs4: 30 second delay during umount of remote fs on system shutdown

With 2.6.25-rc2 and -rc3 I noticed a pause of 30 seconds when my system
(Debian unstable, x86_64) is being shut down. The cause is the unmounting of
an nfs4 remote file system.
The pause does not happen with 2.6.24. I'm unsure about 2.6.25-rc1 as I was
concentrating on other issues when I was testing that.

I can reproduce the same pause on an active system by stopping portmap
before doing an umount of the filesystem.
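
A minimal sketch of that reproduction, timing the umount after portmap has
been stopped. The mount point and the init script path are assumptions, not
taken from the report, and it has to run as root:

```python
import subprocess
import time


def timed_run(cmd):
    """Run cmd (a list of arguments) and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = subprocess.run(cmd)
    return result.returncode, time.monotonic() - start


def reproduce(mount_point="/mnt/nfs4"):
    # Hypothetical mount point and init script path; adjust for the local system.
    timed_run(["/etc/init.d/portmap", "stop"])
    code, elapsed = timed_run(["umount", mount_point])
    print(f"umount exited with {code} after {elapsed:.1f} s")
```

Calling `reproduce()` on an affected kernel would be expected to report an
elapsed time close to the pause described above.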

The shutdown sequence on my box includes:
K20nfs-common
[...]
S20sendsigs
[...]
S31umountnfs.sh
S32portmap

So it seems that things should be OK (S32portmap is after S31umountnfs.sh),
but in fact portmap has already been killed during S20sendsigs!
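
The ordering can also be checked mechanically. This is a sketch only, assuming
the usual sysv-rc convention that the entries of a runlevel directory run in
lexical order (K* before S*, then by two-digit sequence number); the function
names are illustrative:

```python
import os


def shutdown_order(rc_dir):
    """Return the K* and S* entries of rc_dir in the order they would run."""
    names = [n for n in os.listdir(rc_dir) if n[:1] in ("K", "S")]
    # "K" sorts before "S", and two-digit sequence numbers sort lexically.
    return sorted(names)


def runs_before(rc_dir, first, second):
    """True if the entry named `first` runs before the entry named `second`."""
    order = shutdown_order(rc_dir)
    return order.index(first) < order.index(second)
```

For example, `runs_before("/etc/rc0.d", "S20sendsigs", "S31umountnfs.sh")`
(assuming /etc/rc0.d is the halt runlevel directory) would show that sendsigs
gets its turn before the NFS unmount.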

I have verified that portmap is also not running during S31umountnfs.sh with
2.6.24, so apparently there has been a change in the kernel that has made
umount look for portmap and cause the 30 second delay if it is not running.
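
One way to confirm whether portmap is still answering at a given point in the
shutdown sequence is to probe its port. A rough sketch, assuming portmap
listens on TCP port 111 on the local host; the function name is made up for
illustration:

```python
import socket


def portmapper_reachable(host="127.0.0.1", port=111, timeout=1.0):
    """Return True if a TCP connection to the portmapper port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```

This only shows that something accepts connections on that port; it does not
issue an actual RPC request.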

Is this a kernel regression?

Cheers,
FJP


2008-02-28 21:47:03

by Trond Myklebust

Subject: Re: [regression] nfs4: 30 second delay during umount of remote fs on system shutdown


On Thu, 2008-02-28 at 18:29 +0100, Frans Pop wrote:

> I have verified that portmap is also not running during S31umountnfs.sh with
> 2.6.24, so apparently there has been a change in the kernel that has made
> umount look for portmap and cause the 30 second delay if it is not running.

You are probably seeing the effect of lockd going down: it always
attempts to unregister from the portmapper (and no, this is _not_ new
behaviour).

If debian is killing the portmapper while lockd is still up, then that
is a definite debian bug...

Trond


2008-02-29 01:03:14

by Frans Pop

Subject: Re: [regression] nfs4: 30 second delay during umount of remote fs on system shutdown

On Thursday 28 February 2008, Trond Myklebust wrote:
> On Thu, 2008-02-28 at 18:29 +0100, Frans Pop wrote:
> > I have verified that portmap is also not running during S31umountnfs.sh
> > with 2.6.24, so apparently there has been a change in the kernel that
> > has made umount look for portmap and cause the 30 second delay if it is
> > not running.
>
> You are probably seeing the effect of lockd going down: it always
> attempts to unregister from the portmapper (and no, this is _not_ new
> behaviour).

It is _very much_ new behavior because as soon as I boot (and shut down)
with a 2.6.24 kernel the problem completely disappears. Userland is *100%
identical* between my tests.

> If debian is killing the portmapper while lockd is still up, then that
> is a definite debian bug...

I have no idea about that. AFAICT lockd is not even running (at least,
there's no process named lockd, even when the system is running normally).
Isn't lockd an nfs3 thing? I'm using nfs4 here.

Cheers,
FJP

2008-02-29 01:10:27

by Jeff Layton

Subject: Re: [regression] nfs4: 30 second delay during umount of remote fs on system shutdown

On Fri, 29 Feb 2008 02:03:05 +0100
Frans Pop wrote:

> On Thursday 28 February 2008, Trond Myklebust wrote:
> > On Thu, 2008-02-28 at 18:29 +0100, Frans Pop wrote:
> > > I have verified that portmap is also not running during S31umountnfs.sh
> > > with 2.6.24, so apparently there has been a change in the kernel that
> > > has made umount look for portmap and cause the 30 second delay if it is
> > > not running.
> >
> > You are probably seeing the effect of lockd going down: it always
> > attempts to unregister from the portmapper (and no, this is _not_ new
> > behaviour).
>
> It is _very much_ new behavior because as soon as I boot (and shut down)
> with a 2.6.24 kernel the problem completely disappears. Userland is *100%
> identical* between my tests.
>
> > If debian is killing the portmapper while lockd is still up, then that
> > is a definite debian bug...
>
> I have no idea about that. AFAICT lockd is not even running (at least,
> there's no process named lockd, even when the system is running normally).
> Isn't lockd an nfs3 thing? I'm using nfs4 here.
>

It may be the NFSv4 callback thread going down then. It also tries to
unregister with the portmapper and recent patches fixed the reference
counting so that it actually tears things down instead of just leaking
memory and sockets.

I'm not sure what we can do about that other than try to fix it so that
it doesn't try to unregister with the portmapper. It really doesn't need
to since it no longer registers with it.

--
Jeff Layton

2008-02-29 04:55:21

by Trond Myklebust

Subject: Re: [regression] nfs4: 30 second delay during umount of remote fs on system shutdown


On Fri, 2008-02-29 at 02:03 +0100, Frans Pop wrote:
> On Thursday 28 February 2008, Trond Myklebust wrote:
> > On Thu, 2008-02-28 at 18:29 +0100, Frans Pop wrote:
> > > I have verified that portmap is also not running during S31umountnfs.sh
> > > with 2.6.24, so apparently there has been a change in the kernel that
> > > has made umount look for portmap and cause the 30 second delay if it is
> > > not running.
> >
> > You are probably seeing the effect of lockd going down: it always
> > attempts to unregister from the portmapper (and no, this is _not_ new
> > behaviour).
>
> It is _very much_ new behavior because as soon as I boot (and shut down)
> with a 2.6.24 kernel the problem completely disappears. Userland is *100%
> identical* between my tests.

The "new behaviour" is that we fixed a bug whereby the lockd code was
failing to flush signals before attempting the rpc unregister. The sad
fact is that userland appears to have been 100% broken for some time.
The difference is that it was able to get away with it earlier...

> > If debian is killing the portmapper while lockd is still up, then that
> > is a definite debian bug...
>
> I have no idea about that. AFAICT lockd is not even running (at least,
> there's no process named lockd, even when the system is running normally).
> Isn't lockd an nfs3 thing? I'm using nfs4 here.

NFSv4 has a delegation recall server to kill. Same difference...

Trond