2007-08-28 06:22:19

by Janne Karhunen

Subject: NFSv3 lock recovery

Hi,

Brief question about NFSv3 lock recovery to those who might
know: does the Linux implementation (or the NLM/NSM protocol)
properly support the case in which client and server state
change simultaneously?

The reason I'm asking is that this very case occasionally gives
me stale locks. When the NFSv3 server crashes, it's possible
that a client 'rooting' from it crashes as well. Once the
server comes back up, it tries to notify the client that just
crashed. Hardly surprisingly, this notification doesn't go
anywhere, and the server discards the notification/client. Once
the client starts to boot again, it tries to notify the server, which
instantly whines about SM_NOTIFY when no one is being
monitored. Thus, the whole notification cycle is busted and lock
states go haywire :/. Is this even supposed to work?
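
To make the sequence above concrete, here is a toy model in Python. It is
only a sketch of the scenario as described, not the real statd/sm-notify
logic: the Statd class, its fields, and run_scenario() are hypothetical
names, the "give up after one failed notify" behaviour is a
simplification, and the surviving stale lock record is an assumption
taken from the observed symptom.

```python
class Statd:
    """Hypothetical stand-in for one host's status monitor."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.monitored = set()    # peers this host was asked to monitor
        self.stale_locks = set()  # assumed: lock owners still on record

    def crash(self):
        self.up = False

    def reboot(self, monitored_on_disk):
        # After a reboot the monitor list is reloaded from stable storage.
        self.up = True
        self.monitored = set(monitored_on_disk)

    def notify(self, peer):
        """Send a reboot notification; return True if the peer acted on it."""
        if not peer.up:
            # Simplification: an unreachable peer is dropped right away.
            self.monitored.discard(peer.name)
            return False
        return peer.on_notify(self.name)

    def on_notify(self, sender):
        if sender not in self.monitored:
            # Sender is not being monitored, so the notification is ignored.
            return False
        self.stale_locks.discard(sender)
        return True


def run_scenario():
    server, client = Statd("server"), Statd("client")
    server.monitored.add("client")
    server.stale_locks.add("client")

    server.crash()
    client.crash()

    server.reboot({"client"})
    first = server.notify(client)   # client is still down

    client.reboot({"server"})
    second = client.notify(server)  # server no longer monitors the client

    return first, second, sorted(server.stale_locks)
```

In this model both notifications are lost, so nothing ever clears the
record for the client on the server side.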



--
// Janne


2007-08-28 06:31:23

by NeilBrown

Subject: Re: NFSv3 lock recovery

On Tuesday August 28, [email protected] wrote:
> Hi,
>
> Brief question about NFSv3 lock recovery to those who might
> know - does Linux implementation (or NLM/NSM protocol)
> properly support the case in which client and server state
> change simultaneously?

If both crash, there is nothing for a client to reclaim and nothing
for a server to discard, so it is hard to see where a problem could
lie.

>
> Reason I'm asking is that this very case is occasionally giving
> me stale locks. Given that NFSv3 server crashes it's possible
> that client 'rooting' from it crashes as well. Now, once the
> server comes back up it tries to notify the client that just
> crashed. Hardly surprisingly, this notification doesn't go
> anywhere and server discards the notification/client. And once
> the client starts to boot again it tries to notify the server which
> instantly whines about SM_NOTIFY when no-one is being
> monitored. Thus, whole notification cycle is busted and lock
> states go haywire :/. Is this even supposed to work?
>

More details. What, exactly, goes "haywire"...

If your client is diskless and mounting root from the server, then it
should be mounting the root with "-o nolock" and there should be no
locking issues at all.
Possibly it mounts some other filesystems as well, and they are
mounted with locks. But still, I cannot imagine a problem scenario.

Please explain in detail your configuration (What is mounted where and
with what options etc) and what happens (why does the client crash
just because the server crashed - it shouldn't), and what actually
fails that you expected to work.

NeilBrown

2007-08-28 07:22:09

by Janne Karhunen

Subject: Re: NFSv3 lock recovery

On 8/28/07, Neil Brown <[email protected]> wrote:

> > Brief question about NFSv3 lock recovery to those who might
> > know - does Linux implementation (or NLM/NSM protocol)
> > properly support the case in which client and server state
> > change simultaneously?
>
> If both crash, there is nothing for a client to reclaim and nothing
> for a server to discard, so it is hard to see where a problem could
> lie.

What I'm getting is the usual stale lock case: node bootup
gets stuck on the first attempted file lock (wtmp in this case).


> > Reason I'm asking is that this very case is occasionally giving
> > me stale locks. Given that NFSv3 server crashes it's possible
> > that client 'rooting' from it crashes as well. Now, once the
> > server comes back up it tries to notify the client that just
> > crashed. Hardly surprisingly, this notification doesn't go
> > anywhere and server discards the notification/client. And once
> > the client starts to boot again it tries to notify the server which
> > instantly whines about SM_NOTIFY when no-one is being
> > monitored. Thus, whole notification cycle is busted and lock
> > states go haywire :/. Is this even supposed to work?
> >
>
> More details. What, exactly, goes "haywire"...

Wtmp is locked and the client keeps waiting for it to be
released. It's as if the server's lock state persisted across the
crash, and the notification cycle being broken in both directions
makes sure the lock is never released. The server never grants a
new lock on this file.
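
For anyone who wants to check the same symptom without hanging a shell,
a non-blocking lock attempt can be sketched roughly as below. This
assumes Python is available on the client; try_lock() is a hypothetical
helper name, and the path passed to it is whatever file is stuck.

```python
import fcntl


def try_lock(path):
    """Return True if an exclusive POSIX lock on path can be taken
    without blocking, False if the lock request is refused."""
    with open(path, "r+b") as f:
        try:
            # LOCK_NB makes the request fail instead of waiting.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        # Release the lock again so the probe leaves no state behind.
        fcntl.lockf(f, fcntl.LOCK_UN)
        return True
```

Calling try_lock() on the wtmp file from the affected client should
return False for as long as the server keeps refusing the lock.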

But I take it that lock state is not supposed to persist
across a crash? Umm.


> If your client is diskless and mounting root from the server, then it
> should be mounting the root with "-o nolock" and there should be no
> locking issues at all.

Sure, but I need the locks..


> Please explain in detail your configuration (What is mounted where and
> with what options etc) and what happens (why does the client crash
> just because the server crashed - it shouldn't), and what actually
> fails that you expected to work.

The client has its root on NFS. Thus, if the server goes
away, it's 'quite likely' that things jam on the client, and
in this case we have an HA setup that reacts to this.

One more thing: restarting statd on the server once the
clients are up helps instantly.


--
// Janne