2011-11-14 17:15:11

by Pavel A

[permalink] [raw]
Subject: clients fail to reclaim locks after server reboot or manual sm-notify

Hi! I'm trying to set up an NFS server (particularly an A/A NFS cluster) and
having issues with locking and reboot notifications. These are the tests I have
done:

1. The simplest test involves a single NFS server machine (Debian Squeeze)
running nfs-kernel-server (nfs-utils 1.2.2-4) and a single client machine (same
OS) that mounts a share with the '-o vers=3' option. From the client I lock a
file on the share using 'testlk -w <filename>' (testlk from
nfs-utils/tools/locktest) so that a corresponding file appears in
/var/lib/nfs/sm/ on the server. Then I reboot the server, and this is what I
get in the client logs:

lockd: request from 127.0.0.1, port=1007
lockd: SM_NOTIFY called
lockd: host nfs-server1 (192.168.0.101) rebooted, cnt 2
lockd: get host nfs-server1
lockd: get host nfs-server1
lockd: release host nfs-server1
lockd: reclaiming locks for host nfs-server1
lockd: rebind host nfs-server1
lockd: call procedure 2 on nfs-server1
lockd: nlm_bind_host nfs-server1 (192.168.0.101)
lockd: rpc_call returned error 13
lockd: failed to reclaim lock for pid 1555 (errno -13, status 0)
NLM: done reclaiming locks for host nfs-server1
lockd: release host nfs-server1
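
A minimal sketch of this reproduction, for reference; the export path and file
name are placeholders, and the rpcdebug step is only needed to make lockd's
verbose messages (like the ones above) show up in the client's log:

# on the client: enable NLM (lockd) debugging, mount, and take a write lock
rpcdebug -m nlm -s all
mount -o vers=3 nfs-server1:/export /mnt
./testlk -w /mnt/somefile    # from nfs-utils/tools/locktest

# on the server: statd should now have recorded the client; then reboot
ls /var/lib/nfs/sm/
reboot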

2. As I'm building a cluster, I'll need to notify clients when an NFS resource
migrates (since it is an A/A cluster, nfs-kernel-server is always running on
all nodes and shares migrate using the exportfs resource agent), but manually
calling sm-notify ('sm-notify -f -v <virtual IP of share>') from either the
initial node for that share or the backup node results in the following
(client logs):

lockd: request from 127.0.0.1, port=637
lockd: SM_NOTIFY called
lockd: host B (192.168.0.110) rebooted, cnt 2
lockd: get host B
lockd: get host B
lockd: release host B
lockd: reclaiming locks for host B
lockd: rebind host B
lockd: call procedure 2 on B
lockd: nlm_bind_host B (192.168.0.110)
lockd: server in grace period
lockd: spurious grace period reject?!
lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
NLM: done reclaiming locks for host B
lockd: release host B

even though the grace period is intended for lock reclamation. Btw, after such
an invocation, if I try locking from any of the notified clients, no files
corresponding to them appear in /var/lib/nfs/sm/ on the server for about 10
minutes, even though locking itself succeeds. Locking from other clients
generates files for them instantly.

As for the rest: simple concurrent lock tests from a couple of clients work
fine, and the server frees the locks of rebooted clients.

I'm new to NFS and may be missing obvious things, but I've already spent
several days googling around and don't seem to find any solution.
Any help or guidance is highly appreciated. Thanks!



2011-11-16 21:56:06

by Pavel A

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

2011/11/16 Bryan Schumaker <[email protected]>:
> On 11/16/2011 03:08 PM, J. Bruce Fields wrote:
>> On Wed, Nov 16, 2011 at 09:09:07PM +0200, Pavel A wrote:
>>> I've read about this issue here:
>>> http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
>>>
>>> /*-----
>>> In the event of server failure (e.g. server reboot or lock daemon
>>> restart), all client locks are lost. However, the clients are not
>>> informed of this, and because the other operations (read, write, and
>>> so on) are not visibly interrupted, they have no reliable way to
>>> prevent other clients from obtaining a lock on a file they think they
>>> have locked.
>>> -----*/
>>
>> That's incorrect.  Perhaps the article is out of date, I don't know.
>
> Looks like it was written about 11 years ago, so I'll believe that it's out of date.

Yes, I should have watched out for that.

>
> - Bryan
>
>>
>>> Can't get this. If there is a grace period after reboot and clients
>>> can successfully reclaim locks, then how other clients can obtain
>>> locks?
>>
>> That's right, in the absence of bugs, if a client successfully reclaims a
>> lock, then it knows that no other client can have acquired that lock in
>> the interim: since the reclaim succeeded, that means the server is still
>> in the grace period, which means the only other locks that it has
>> allowed are also reclaims.  If some reclaim conflicts with this lock,
>> then the other client must have reclaimed a lock that it didn't actually
>> hold before (hence must be buggy).
>>
>>>> You need to restart nfsd on the node that is taking over.  That means
>>>> that clients using both filesystems (A and B) will have to do lock
>>>> recovery, when in theory only those using volume B should have to, and
>>>> that is suboptimal.  But it is also correct.
>>>>
>>>
>>> Seems to work. As of a more optimal solution: what do you think of the
>>> contents of /proc/locks? May it be possible to use this info to then
>>> perform locking locally on the other node (after failover)?
>>
>> No, I don't think so.  And I'd be careful about using /proc/locks for
>> anything but debugging.
>>
>> --b.
>
>
Well, looks like this is it.
Thank you very much, Bruce, Bryan - you really helped me to keep this going :)

2011-11-14 21:56:00

by Anna Schumaker

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On Mon 14 Nov 2011 02:10:05 PM EST, Bryan Schumaker wrote:
> Hello Pavel,
>
> What kernel version is Debian using? I haven't been able to reproduce the problem using 3.0 (But I'm on Archlinux, so there might be other differences).

It might also be useful if you could share the /etc/exports file on the
server.

Thanks!

- Bryan
>
> - Bryan
>
> On Mon 14 Nov 2011 12:11:56 PM EST, Pavel wrote:
>> Hi! I'm trying to set up an NFS server (particularly an A/A NFS cluster) and
>> having issues with locking and reboot notifications. These are the tests I have
>> done:
>>
>> 1. The simplest test includes single NFS server machine (Debian Squeeze),
>> running nfs-kernel-server (nfs-utils 1.2.2-4) and a single client machine (same
>> OS), that mounts a share with “-o 'vers=3'” option. From the client I lock some
>> file on share using 'testlk -w <filename>' (testlk from nfsutils/tools/locktest)
>> so that a corresponding file appears in /var/lib/nfs/sm/ on server. Then I
>> reboot the server and this is what I get in client logs:
>>
>> lockd: request from 127.0.0.1, port=1007
>> lockd: SM_NOTIFY called
>> lockd: host nfs-server1 (192.168.0.101) rebooted, cnt 2
>> lockd: get host nfs-server1
>> lockd: get host nfs-server1
>> lockd: release host nfs-server1
>> lockd: reclaiming locks for host nfs-server1
>> lockd: rebind host nfs-server1
>> lockd: call procedure 2 on nfs-server1
>> lockd: nlm_bind_host nfs-server1 (192.168.0.101)
>> lockd: rpc_call returned error 13
>> lockd: failed to reclaim lock for pid 1555 (errno -13, status 0)
>> NLM: done reclaiming locks for host nfs-server1
>> lockd: release host nfs-server1
>>
>> 2. As I'm building a cluster I'll need to notify clients when NFS resource
>> migrates (since it is an A/A cluster nfs-kernel-server is always running on all
>> nodes and shares migrate using exportfs resource agent), but manually calling
>> sm-notify ('sm-notify -f -v <virtual IP of share>') from either the initial for
>> that share or backup node results in the following (client logs):
>>
>> lockd: request from 127.0.0.1, port=637
>> lockd: SM_NOTIFY called
>> lockd: host B (192.168.0.110) rebooted, cnt 2
>> lockd: get host B
>> lockd: get host B
>> lockd: release host B
>> lockd: reclaiming locks for host B
>> lockd: rebind host B
>> lockd: call procedure 2 on B
>> lockd: nlm_bind_host B (192.168.0.110)
>> lockd: server in grace period
>> lockd: spurious grace period reject?!
>> lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
>> NLM: done reclaiming locks for host B
>> lockd: release host B
>>
>> even though grace period is intended for lock reclamation. B/w after such
>> invocation no files, corresponding to the notified clients, appear in
>> /var/lib/nfs/sm/ on server for about 10 minutes, if I try locking from any of
>> these notified clients, even though locking itself is ok. Locking from other
>> clients generates files for them instantly.
>>
>> As of the rest: simple concurrent lock tests from couple of clients work fine as
>> well as server frees locks of rebooted clients.
>>
>> I'm new to NFS an may be missing obvious things, but I've already spent several
>> days googling around, but don't seem to find any solution.
>> Any help or guidance is highly appreciated. Thanks!



2011-11-16 17:28:07

by J. Bruce Fields

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On Wed, Nov 16, 2011 at 07:15:42PM +0200, Pasha Z wrote:
> 2011/11/16 J. Bruce Fields <[email protected]>:
> > On Wed, Nov 16, 2011 at 09:25:01AM -0500, Bryan Schumaker wrote:
> >> Here is what I'm doing (On debian with 2.6.32):
> >> - (On Client) Mount the server: `sudo mount -o vers=3
> >> 192.168.122.202:/home/bjschuma /mnt`
> >> - (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
> >> /mnt/test`
> >> - (On Server) Call sm-notify with the server's IP address: `sudo
> >> sm-notify -f -v 192.168.122.202`
> >> - dmesg on the client has this message:
> >>     lockd: spurious grace period reject?!
> >>     lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
> >> - (In wireshark) The client sends a lock request with the "Reclaim" bit
> >> set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".
> >
> > That sounds like correct server behavior to me.
> >
> > Once the server ends the grace period and starts accepting regular
> > non-reclaim locks, there's the chance of a situation like:
> >
> >        client A                client B
> >        --------                --------
> >
> >        acquires lock
> >
> >                ---server reboot---
> >                ---grace period ends---
> >
> >                                acquires conflicting lock
> >                                drops conflicting lock
> >
> > And if the server permits a reclaim of the original lock from client A,
> > then it gives client A the impression that it has held its lock
> > continuously over this whole time, when in fact someone else has held a
> > conflicting lock.
>
> Hm... This is how NFS behaves on a real server reboot:
>
> client A                client B
> --------                --------
>         ---server started, serving regular locks---
> acquires lock
>
>         ---server rebooted--- (at this point sm-notify is called automatically)
> reacquires lock
>         ---grace period ends---
>
>                         cannot acquire lock,
>                         client A is holding it.

Yes.

>
> Shouldn't a manual 'sm-notify -f' behave the same way
> as a real server reboot?

No, sm-notify does *not* restart knfsd (so does not cause knfsd to drop
existing locks or to enter a new grace period). It *only* sends NSM
notifications.

> I can't see how your example can take place.
> If client B acquires the lock, then client A must have
> released it some time before.

No, in my example above, there is a real server reboot; client A's lock
is lost in the reboot, it does not reclaim the lock in time, and so
client B is able to grab the lock.

> > So: no non-reclaim locks are allowed outside the grace period.
>
> I'm sorry, is that what you meant?

To restate it in different words: locks with the reclaim bit set will fail
outside of the grace period.

> As of HA setup. It is as follows, so you can understand, what I plan to use
> sm-notify for:
>
> Some background:
>
> I'm building an Active/Active NFS cluster and nfs-kernel-server is
> always running
> on all nodes. Note: each node in cluster exports shares, different from
> other nodes (they do not overlap), so clients never access same files through
> more than one server node and a usual file system (not cluster one) is
> used for storage.
> What I'm doing is moving NFS share (with resources underneath: virtual
> IP, drbd storage)
> between the nodes with exportfs OCF resource agent.
>
> This is how this setup is described here: http://ben.timby.com/?p=109
>
> /*-----
> > I have need for an active-active NFS cluster. For review, an active-active
> cluster is two boxes that export two resources (one each). Each box acts as a
> backup for the other box’s resource. This way, both boxes actively
> serve clients
> (albeit for different NFS exports).
>
> *** To be clear, this means that half my users use Volume A and half
> of them use
> Volume B. Server A exports Volume A and Server B exports Volume B. If server A
> fails, Server B will export both volumes. I use DRBD to synchronize the primary
> server to the secondary server, for each volume. You can think of this like
> cross-replication, where Server A replicates changes to Volume A to Server B. I
> hope this makes it clear how this setup works. ***
> -----*/
>
> The goal:
>
> The solution by the link above allows to move NFS shares between the nodes, but
> doesn't support locking. Therefore I'll need to inform clients when share
> migrates to the other node (due to a node failure or manually), so
> that they can
> reclaim locks (given that files from /var/lib/nfs/sm are transferred to the
> other node).
>
> The problem:
>
> When I run sm-notify manually ('sm-notify -f -v <virtual IP of
> share>'), clients
> fail to reclaim locks. The log on the client looks like this:
>
> lockd: request from 127.0.0.1, port=637
> lockd: SM_NOTIFY called
> lockd: host B (192.168.0.110) rebooted, cnt 2
> lockd: get host B
> lockd: get host B
> lockd: release host B
> lockd: reclaiming locks for host B
> lockd: rebind host B
> lockd: call procedure 2 on B
> lockd: nlm_bind_host B (192.168.0.110)
> lockd: server in grace period
> lockd: spurious grace period reject?!
> lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
> NLM: done reclaiming locks for host B
> lockd: release host B

You need to restart nfsd on the node that is taking over. That means
that clients using both filesystems (A and B) will have to do lock
recovery, when in theory only those using volume B should have to, and
that is suboptimal. But it is also correct.
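
In practice that boils down to something like the following on the node taking
over (a sketch, not from the thread; the init script name is Debian's, and it
assumes the statd files for the migrated clients are already in
/var/lib/nfs/sm on this node):

# restart knfsd so it drops stale lock state and enters a fresh grace period
/etc/init.d/nfs-kernel-server restart
# notify clients immediately, so their reclaims arrive during the grace period
sm-notify -f -v <virtual IP of share>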

--b.

2011-11-15 22:16:25

by J. Bruce Fields

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On Tue, Nov 15, 2011 at 04:48:57PM -0500, Bryan Schumaker wrote:
> On 11/15/2011 10:50 AM, Pavel wrote:
> > Bryan Schumaker <bjschuma@...> writes:
> >
> >>
> >> On Mon 14 Nov 2011 02:10:05 PM EST, Bryan Schumaker wrote:
> >>> Hello Pavel,
> >>>
> >>> What kernel version is Debian using? I haven't been able to reproduce the
> > problem using 3.0 (But I'm on
> >> Archlinux, so there might be other differences).
> >
> > Thanks, Bryan, for your reply.
> >
> > Debian is using Linux kernel version 2.6.32 - I haven't upgraded it.
> >
> >> It might also be useful if you could share the /etc/exports file on the
> >> server.
> >>
> >> Thanks!
> >>
> >> - Bryan
> >
> > Thank you for the question - that was my rude mistake. For managing exports I'm
> > using OCF resource agent 'exportfs'. It uses Linux build-in command 'exportfs'
> > to export shares and /etc/exports file is empty. However Heartbeat starts much
> > later than NFS...Now it is clear why this wasn't working. Setting up share that
> > doesn't rely on Heartbeat resources, resolves the issue.
> >
> > Still though, the first test was just to make sure NFS functions the way it is
> > supposed to, and not the goal - the second/main question remains open. When I
> > run sm-notify in this case, shares are already exported and all the other needed
> > resources are available as well. Why doesn't sm-notify work? It doesn't work
> > even in case of single server test. As of using files from /var/lib/nfs/sm/ when
> > notifying clients from the other node in cluster, it should be okay with -v
> > option of sm-notify, because it is a common practice to store the whole
> > /var/lib/nfs folder on shared storage in Active/Passive clusters and trigger sm-
> > notify from the active node. It would be awesome if you could give me a clue.
>
> I'm seeing the same thing you are, using some Debian VMs I set up yesterday afternoon. It does look like the server is replying with NLM_DENIED_GRACE_PERIOD when sm-notify is used. Bruce, any idea what's going on here?

Sorry, I'm having trouble keeping up.... What exactly do you do, on
which machine, and what do you then see happen?

--b.

>
> When I try using my Linux 3.0 / Archlinux machines I don't see any NLM requests due to sm-notify. I'm not sure that's correct...

2011-11-16 17:37:57

by Anna Schumaker

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On 11/16/2011 10:30 AM, J. Bruce Fields wrote:
> On Wed, Nov 16, 2011 at 09:25:01AM -0500, Bryan Schumaker wrote:
>> Here is what I'm doing (On debian with 2.6.32):
>> - (On Client) Mount the server: `sudo mount -o vers=3
>> 192.168.122.202:/home/bjschuma /mnt`
>> - (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
>> /mnt/test`
>> - (On Server) Call sm-notify with the server's IP address: `sudo
>> sm-notify -f -v 192.168.122.202`
>> - dmesg on the client has this message:
>> lockd: spurious grace period reject?!
>> lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
>> - (In wireshark) The client sends a lock request with the "Reclaim" bit
>> set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".
>
> That sounds like correct server behavior to me.
>
> Once the server ends the grace period and starts accepting regular
> non-reclaim locks, there's the chance of a situation like:
>
>        client A                client B
>        --------                --------
>
>        acquires lock
>
>                ---server reboot---
>                ---grace period ends---
>
>                                acquires conflicting lock
>                                drops conflicting lock
>
> And if the server permits a reclaim of the original lock from client A,
> then it gives client A the impression that it has held its lock
> continuously over this whole time, when in fact someone else has held a
> conflicting lock.
>
> So: no non-reclaim locks are allowed outside the grace period.

I see where I was confused. I thought that running sm-notify also restarted the grace period.

- Bryan

>
> If you restart the server, and *then* immediately run sm-notify while
> the new nfsd is still in its grace period, I'd expect the reclaim to
> succeed.
>
> And that may be where the HA setup isn't right--if you're doing
> active/passive failover, then you need to make sure you don't start nfsd
> on the backup machine until just before you send the sm-notify.
>
> --b.
>
>>
>> Shouldn't the server be allowing the lock reclaim? When I tried
>> yesterday using 3.0 it only triggered DNS packets, I tried again a few
>> minutes ago and got the same results that I did using .32.


2011-11-16 15:30:54

by J. Bruce Fields

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On Wed, Nov 16, 2011 at 09:25:01AM -0500, Bryan Schumaker wrote:
> Here is what I'm doing (On debian with 2.6.32):
> - (On Client) Mount the server: `sudo mount -o vers=3
> 192.168.122.202:/home/bjschuma /mnt`
> - (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
> /mnt/test`
> - (On Server) Call sm-notify with the server's IP address: `sudo
> sm-notify -f -v 192.168.122.202`
> - dmesg on the client has this message:
> lockd: spurious grace period reject?!
> lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
> - (In wireshark) The client sends a lock request with the "Reclaim" bit
> set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".

That sounds like correct server behavior to me.

Once the server ends the grace period and starts accepting regular
non-reclaim locks, there's the chance of a situation like:

       client A                client B
       --------                --------

       acquires lock

               ---server reboot---
               ---grace period ends---

                               acquires conflicting lock
                               drops conflicting lock

And if the server permits a reclaim of the original lock from client A,
then it gives client A the impression that it has held its lock
continuously over this whole time, when in fact someone else has held a
conflicting lock.

So: no non-reclaim locks are allowed outside the grace period.

If you restart the server, and *then* immediately run sm-notify while
the new nfsd is still in its grace period, I'd expect the reclaim to
succeed.

And that may be where the HA setup isn't right--if you're doing
active/passive failover, then you need to make sure you don't start nfsd
on the backup machine until just before you send the sm-notify.
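
As a sketch of that ordering on the backup machine (paths and the init script
name are placeholders; the point is only that nfsd must start, opening its
grace period, right before the notification goes out):

# storage, virtual IP and statd state are assumed to be failed over already
cp -a /shared/nfs-state/sm /var/lib/nfs/   # placeholder path for statd state
/etc/init.d/nfs-kernel-server start        # grace period begins here
sm-notify -f -v <virtual IP of share>      # reclaims now land inside grace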

--b.

>
> Shouldn't the server be allowing the lock reclaim? When I tried
> yesterday using 3.0 it only triggered DNS packets, I tried again a few
> minutes ago and got the same results that I did using .32.

2011-11-16 20:08:41

by J. Bruce Fields

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On Wed, Nov 16, 2011 at 09:09:07PM +0200, Pavel A wrote:
> I've read about this issue here:
> http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
>
> /*-----
> In the event of server failure (e.g. server reboot or lock daemon
> restart), all client locks are lost. However, the clients are not
> informed of this, and because the other operations (read, write, and
> so on) are not visibly interrupted, they have no reliable way to
> prevent other clients from obtaining a lock on a file they think they
> have locked.
> -----*/

That's incorrect. Perhaps the article is out of date, I don't know.

> Can't get this. If there is a grace period after reboot and clients
> can successfully reclaim locks, then how other clients can obtain
> locks?

That's right, in the absence of bugs, if a client successfully reclaims a
lock, then it knows that no other client can have acquired that lock in
the interim: since the reclaim succeeded, that means the server is still
in the grace period, which means the only other locks that it has
allowed are also reclaims. If some reclaim conflicts with this lock,
then the other client must have reclaimed a lock that it didn't actually
hold before (hence must be buggy).

> > You need to restart nfsd on the node that is taking over. That means
> > that clients using both filesystems (A and B) will have to do lock
> > recovery, when in theory only those using volume B should have to, and
> > that is suboptimal. But it is also correct.
> >
>
> Seems to work. As of a more optimal solution: what do you think of the
> contents of /proc/locks? May it be possible to use this info to then
> perform locking locally on the other node (after failover)?

No, I don't think so. And I'd be careful about using /proc/locks for
anything but debugging.

--b.

2011-11-15 21:49:00

by Anna Schumaker

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On 11/15/2011 10:50 AM, Pavel wrote:
> Bryan Schumaker <bjschuma@...> writes:
>
>>
>> On Mon 14 Nov 2011 02:10:05 PM EST, Bryan Schumaker wrote:
>>> Hello Pavel,
>>>
>>> What kernel version is Debian using? I haven't been able to reproduce the
> problem using 3.0 (But I'm on
>> Archlinux, so there might be other differences).
>
> Thanks, Bryan, for your reply.
>
> Debian is using Linux kernel version 2.6.32 - I haven't upgraded it.
>
>> It might also be useful if you could share the /etc/exports file on the
>> server.
>>
>> Thanks!
>>
>> - Bryan
>
> Thank you for the question - that was my rude mistake. For managing exports I'm
> using OCF resource agent 'exportfs'. It uses Linux build-in command 'exportfs'
> to export shares and /etc/exports file is empty. However Heartbeat starts much
> later than NFS...Now it is clear why this wasn't working. Setting up share that
> doesn't rely on Heartbeat resources, resolves the issue.
>
> Still though, the first test was just to make sure NFS functions the way it is
> supposed to, and not the goal - the second/main question remains open. When I
> run sm-notify in this case, shares are already exported and all the other needed
> resources are available as well. Why doesn't sm-notify work? It doesn't work
> even in case of single server test. As of using files from /var/lib/nfs/sm/ when
> notifying clients from the other node in cluster, it should be okay with -v
> option of sm-notify, because it is a common practice to store the whole
> /var/lib/nfs folder on shared storage in Active/Passive clusters and trigger sm-
> notify from the active node. It would be awesome if you could give me a clue.

I'm seeing the same thing you are, using some Debian VMs I set up yesterday afternoon. It does look like the server is replying with NLM_DENIED_GRACE_PERIOD when sm-notify is used. Bruce, any idea what's going on here?

When I try using my Linux 3.0 / Archlinux machines I don't see any NLM requests due to sm-notify. I'm not sure that's correct...

- Bryan

>
>>>
>>> - Bryan
>>>
>>> On Mon 14 Nov 2011 12:11:56 PM EST, Pavel wrote:
>>>> Hi! I'm trying to set up an NFS server (particularly an A/A NFS cluster)
> and
>>>> having issues with locking and reboot notifications. These are the tests I
> have
>>>> done:
>>>>
>>>> 1. The simplest test includes single NFS server machine (Debian Squeeze),
>>>> running nfs-kernel-server (nfs-utils 1.2.2-4) and a single client machine
> (same
>>>> OS), that mounts a share with “-o 'vers=3'” option. From the client I lock
> some
>>>> file on share using 'testlk -w <filename>' (testlk from
> nfsutils/tools/locktest)
>>>> so that a corresponding file appears in /var/lib/nfs/sm/ on server. Then I
>>>> reboot the server and this is what I get in client logs:
>>>>
>>>> lockd: request from 127.0.0.1, port=1007
>>>> lockd: SM_NOTIFY called
>>>> lockd: host nfs-server1 (192.168.0.101) rebooted, cnt 2
>>>> lockd: get host nfs-server1
>>>> lockd: get host nfs-server1
>>>> lockd: release host nfs-server1
>>>> lockd: reclaiming locks for host nfs-server1
>>>> lockd: rebind host nfs-server1
>>>> lockd: call procedure 2 on nfs-server1
>>>> lockd: nlm_bind_host nfs-server1 (192.168.0.101)
>>>> lockd: rpc_call returned error 13
>>>> lockd: failed to reclaim lock for pid 1555 (errno -13, status 0)
>>>> NLM: done reclaiming locks for host nfs-server1
>>>> lockd: release host nfs-server1
>>>>
>>>> 2. As I'm building a cluster I'll need to notify clients when NFS resource
>>>> migrates (since it is an A/A cluster nfs-kernel-server is always running on
> all
>>>> nodes and shares migrate using exportfs resource agent), but manually
> calling
>>>> sm-notify ('sm-notify -f -v <virtual IP of share>') from either the initial
> for
>>>> that share or backup node results in the following (client logs):
>>>>
>>>> lockd: request from 127.0.0.1, port=637
>>>> lockd: SM_NOTIFY called
>>>> lockd: host B (192.168.0.110) rebooted, cnt 2
>>>> lockd: get host B
>>>> lockd: get host B
>>>> lockd: release host B
>>>> lockd: reclaiming locks for host B
>>>> lockd: rebind host B
>>>> lockd: call procedure 2 on B
>>>> lockd: nlm_bind_host B (192.168.0.110)
>>>> lockd: server in grace period
>>>> lockd: spurious grace period reject?!
>>>> lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
>>>> NLM: done reclaiming locks for host B
>>>> lockd: release host B
>>>>
>>>> even though grace period is intended for lock reclamation. B/w after such
>>>> invocation no files, corresponding to the notified clients, appear in
>>>> /var/lib/nfs/sm/ on server for about 10 minutes, if I try locking from any
> of
>>>> these notified clients, even though locking itself is ok. Locking from
> other
>>>> clients generates files for them instantly.
>>>>
>>>> As of the rest: simple concurrent lock tests from couple of clients work
> fine as
>>>> well as server frees locks of rebooted clients.
>>>>
>>>> I'm new to NFS an may be missing obvious things, but I've already spent
> several
>>>> days googling around, but don't seem to find any solution.
>>>> Any help or guidance is highly appreciated. Thanks!


2011-11-15 17:20:23

by Pavel A

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

Pavel <free.lan.c2.718r@...> writes:

>
> Bryan Schumaker <bjschuma@...> writes:
>
> >
> > On Mon 14 Nov 2011 02:10:05 PM EST, Bryan Schumaker wrote:
> > It might also be useful if you could share the /etc/exports file on the
> > server.
> >
> > Thanks!
> >
> > - Bryan

The options I export shares with are: rw,no_root_squash,no_all_squash,sync;
wdelay and no_subtree_check are added by default.

The output of 'exportfs -v' (whether I export the share using an /etc/exports
entry or the exportfs resource agent) is the following:

/mnt/B/share 192.168.0.0/24(rw,wdelay,no_root_squash,no_subtree_check)
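
For comparison, a sketch of producing the same export directly from the
command line with the options listed above (path and network taken from the
exportfs -v output):

exportfs -o rw,no_root_squash,no_all_squash,sync 192.168.0.0/24:/mnt/B/share
exportfs -v    # verify the export and its effective options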





2011-11-14 19:10:07

by Anna Schumaker

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

Hello Pavel,

What kernel version is Debian using? I haven't been able to reproduce the problem using 3.0 (But I'm on Archlinux, so there might be other differences).

- Bryan

On Mon 14 Nov 2011 12:11:56 PM EST, Pavel wrote:
> Hi! I'm trying to set up an NFS server (particularly an A/A NFS cluster) and
> having issues with locking and reboot notifications. These are the tests I have
> done:
>
> 1. The simplest test includes single NFS server machine (Debian Squeeze),
> running nfs-kernel-server (nfs-utils 1.2.2-4) and a single client machine (same
> OS), that mounts a share with “-o 'vers=3'” option. From the client I lock some
> file on share using 'testlk -w <filename>' (testlk from nfsutils/tools/locktest)
> so that a corresponding file appears in /var/lib/nfs/sm/ on server. Then I
> reboot the server and this is what I get in client logs:
>
> lockd: request from 127.0.0.1, port=1007
> lockd: SM_NOTIFY called
> lockd: host nfs-server1 (192.168.0.101) rebooted, cnt 2
> lockd: get host nfs-server1
> lockd: get host nfs-server1
> lockd: release host nfs-server1
> lockd: reclaiming locks for host nfs-server1
> lockd: rebind host nfs-server1
> lockd: call procedure 2 on nfs-server1
> lockd: nlm_bind_host nfs-server1 (192.168.0.101)
> lockd: rpc_call returned error 13
> lockd: failed to reclaim lock for pid 1555 (errno -13, status 0)
> NLM: done reclaiming locks for host nfs-server1
> lockd: release host nfs-server1
>
> 2. As I'm building a cluster I'll need to notify clients when NFS resource
> migrates (since it is an A/A cluster nfs-kernel-server is always running on all
> nodes and shares migrate using exportfs resource agent), but manually calling
> sm-notify ('sm-notify -f -v <virtual IP of share>') from either the initial for
> that share or backup node results in the following (client logs):
>
> lockd: request from 127.0.0.1, port=637
> lockd: SM_NOTIFY called
> lockd: host B (192.168.0.110) rebooted, cnt 2
> lockd: get host B
> lockd: get host B
> lockd: release host B
> lockd: reclaiming locks for host B
> lockd: rebind host B
> lockd: call procedure 2 on B
> lockd: nlm_bind_host B (192.168.0.110)
> lockd: server in grace period
> lockd: spurious grace period reject?!
> lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
> NLM: done reclaiming locks for host B
> lockd: release host B
>
> even though grace period is intended for lock reclamation. B/w after such
> invocation no files, corresponding to the notified clients, appear in
> /var/lib/nfs/sm/ on server for about 10 minutes, if I try locking from any of
> these notified clients, even though locking itself is ok. Locking from other
> clients generates files for them instantly.
>
> As of the rest: simple concurrent lock tests from couple of clients work fine as
> well as server frees locks of rebooted clients.
>
> I'm new to NFS an may be missing obvious things, but I've already spent several
> days googling around, but don't seem to find any solution.
> Any help or guidance is highly appreciated. Thanks!


2011-11-16 20:21:44

by Anna Schumaker

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On 11/16/2011 03:08 PM, J. Bruce Fields wrote:
> On Wed, Nov 16, 2011 at 09:09:07PM +0200, Pavel A wrote:
>> I've read about this issue here:
>> http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html
>>
>> /*-----
>> In the event of server failure (e.g. server reboot or lock daemon
>> restart), all client locks are lost. However, the clients are not
>> informed of this, and because the other operations (read, write, and
>> so on) are not visibly interrupted, they have no reliable way to
>> prevent other clients from obtaining a lock on a file they think they
>> have locked.
>> -----*/
>
> That's incorrect. Perhaps the article is out of date, I don't know.

Looks like it was written about 11 years ago, so I'll believe that it's out of date.

- Bryan

>
>> Can't get this. If there is a grace period after reboot and clients
>> can successfully reclaim locks, then how other clients can obtain
>> locks?
>
> That's right, in the absence of bugs, if a client successfully reclaims a
> lock, then it knows that no other client can have acquired that lock in
> the interim: since the reclaim succeeded, that means the server is still
> in the grace period, which means the only other locks that it has
> allowed are also reclaims. If some reclaim conflicts with this lock,
> then the other client must have reclaimed a lock that it didn't actually
> hold before (hence must be buggy).
>
>>> You need to restart nfsd on the node that is taking over. That means
>>> that clients using both filesystems (A and B) will have to do lock
>>> recovery, when in theory only those using volume B should have to, and
>>> that is suboptimal. But it is also correct.
>>>
>>
>> Seems to work. As of a more optimal solution: what do you think of the
>> contents of /proc/locks? May it be possible to use this info to then
>> perform locking locally on the other node (after failover)?
>
> No, I don't think so. And I'd be careful about using /proc/locks for
> anything but debugging.
>
> --b.


2011-11-16 14:58:22

by Pavel A

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

Bryan Schumaker <bjschuma@...> writes:

> > On Tue, Nov 15, 2011 at 04:48:57PM -0500, Bryan Schumaker wrote:
> >
> > Sorry, I'm having trouble keeping up.... What exactly do you do, on
> > which machine, and what do you then see happen?
>
> Here is what I'm doing (On debian with 2.6.32):
> - (On Client) Mount the server: `sudo mount -o vers=3
> 192.168.122.202:/home/bjschuma /mnt`
> - (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
> /mnt/test`
> - (On Server) Call sm-notify with the server's IP address: `sudo
> sm-notify -f -v 192.168.122.202`
> - dmesg on the client has this message:
> lockd: spurious grace period reject?!
> lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
> - (In wireshark) The client sends a lock request with the "Reclaim" bit
> set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".
>
> Shouldn't the server be allowing the lock reclaim? When I tried
> yesterday using 3.0 it only triggered DNS packets, I tried again a few
> minutes ago and got the same results that I did using .32.
>
> - Bryan
>

Yes, everything is exactly as you wrote.

After the steps above I also get the following in client logs:
lockd: server in grace period
lockd: spurious grace period reject?!
lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)

Thank you all for taking the time!


2011-11-16 19:09:08

by Pavel A

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

J. Bruce Fields <bfields@...> writes:

>
> On Wed, Nov 16, 2011 at 07:15:42PM +0200, Pasha Z wrote:
> > 2011/11/16 J. Bruce Fields <bfields@...>:
> > > On Wed, Nov 16, 2011 at 09:25:01AM -0500, Bryan Schumaker wrote:
> > >> Here is what I'm doing (On debian with 2.6.32):
> > >> - (On Client) Mount the server: `sudo mount -o vers=3
> > >> 192.168.122.202:/home/bjschuma /mnt`
> > >> - (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
> > >> /mnt/test`
> > >> - (On Server) Call sm-notify with the server's IP address: `sudo
> > >> sm-notify -f -v 192.168.122.202`
> > >> - dmesg on the client has this message:
> > >>     lockd: spurious grace period reject?!
> > >>     lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
> > >> - (In wireshark) The client sends a lock request with the "Reclaim" bit
> > >> set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".
> > >
> > > That sounds like correct server behavior to me.
> > >
> > > Once the server ends the grace period and starts accepting regular
> > > non-reclaim locks, there's the chance of a situation like:
> > >
> > >        client A                client B
> > >        --------                --------
> > >
> > >        acquires lock
> > >
> > >                ---server reboot---
> > >                ---grace period ends---
> > >
> > >                                acquires conflicting lock
> > >                                drops conflicting lock
> > >
> > > And if the server permits a reclaim of the original lock from client A,
> > > then it gives client A the impression that it has held its lock
> > > continuously over this whole time, when in fact someone else has held a
> > > conflicting lock.
> >
> > Hm... This is how NFS behaves on a real server reboot:
> >
> > client A                client B
> > --------                --------
> >         ---server started, serving regular locks---
> > acquires lock
> >
> >         ---server rebooted--- (at this point sm-notify is called automatically)
> > reacquires lock
> >         ---grace period ends---
> >
> >                         cannot acquire lock,
> >                         client A is holding it.
>
> Yes.
>
> >
> > Shouldn't a manual 'sm-notify -f' behave the same way
> > as a real server reboot?
>
> No, sm-notify does *not* restart knfsd (so does not cause knfsd to drop
> existing locks or to enter a new grace period). It *only* sends NSM
> notifications.

Thank you for the explanation.

>
> > I can't see how your example can take place.
> > If client B acquires lock, then client A has to have
> > released it some time before.
>
> No, in my example above, there is a real server reboot; client A's lock
> is lost in the reboot, it does not reclaim the lock in time, and so
> client B is able to grab the lock.
>

I've read about this issue here:
http://www.time-travellers.org/shane/papers/NFS_considered_harmful.html

/*-----
In the event of server failure (e.g. server reboot or lock daemon
restart), all client locks are lost. However, the clients are not
informed of this, and because the other operations (read, write, and
so on) are not visibly interrupted, they have no reliable way to
prevent other clients from obtaining a lock on a file they think they
have locked.
-----*/

I can't get this. If there is a grace period after reboot and clients
can successfully reclaim locks, then how can other clients obtain
locks?
Can you please explain when this happens, given that you've answered 'Yes'
to my example and your example is a real server reboot.

> > > So: no non-reclaim locks are allowed outside the grace period.
> >
> > I'm sorry, is that what you meant?
>
> To restate it in different words: locks with the reclaim bit set will fail
> outside of the grace period.

I've got it now, thanks.

>
> > As of HA setup. It is as follows, so you can understand, what I plan to use
> > sm-notify for:
> >
> > Some background:
> >
> > I'm building an Active/Active NFS cluster and nfs-kernel-server is
> > always running
> > on all nodes. Note: each node in cluster exports shares, different from
> > other nodes (they do not overlap), so clients never access same files through
> > more than one server node and a usual file system (not cluster one) is
> > used for storage.
> > What I'm doing is moving NFS share (with resources underneath: virtual
> > IP, drbd storage)
> > between the nodes with exportfs OCF resource agent.
> >
> > This is how this setup is described here: http://ben.timby.com/?p=109
> >
> > /*-----
> > I have need for an active-active NFS cluster. For review, an active-active
> > cluster is two boxes that export two resources (one each). Each box acts as a
> > backup for the other box’s resource. This way, both boxes actively
> > serve clients
> > (albeit for different NFS exports).
> >
> > *** To be clear, this means that half my users use Volume A and half
> > of them use
> > Volume B. Server A exports Volume A and Server B exports Volume B. If server A
> > fails, Server B will export both volumes. I use DRBD to synchronize the primary
> > server to the secondary server, for each volume. You can think of this like
> > cross-replication, where Server A replicates changes to Volume A to Server B. I
> > hope this makes it clear how this setup works. ***
> > -----*/
> >
> > The goal:
> >
> > The solution by the link above allows to move NFS shares between the nodes, but
> > doesn't support locking. Therefore I'll need to inform clients when share
> > migrates to the other node (due to a node failure or manually), so
> > that they can
> > reclaim locks (given that files from /var/lib/nfs/sm are transferred to the
> > other node).
> >
> > The problem:
> >
> > When I run sm-notify manually ('sm-notify -f -v <virtual IP of
> > share>'), clients
> > fail to reclaim locks. The log on the client looks like this:
> >
> > lockd: request from 127.0.0.1, port=637
> > lockd: SM_NOTIFY called
> > lockd: host B (192.168.0.110) rebooted, cnt 2
> > lockd: get host B
> > lockd: get host B
> > lockd: release host B
> > lockd: reclaiming locks for host B
> > lockd: rebind host B
> > lockd: call procedure 2 on B
> > lockd: nlm_bind_host B (192.168.0.110)
> > lockd: server in grace period
> > lockd: spurious grace period reject?!
> > lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
> > NLM: done reclaiming locks for host B
> > lockd: release host B
>
> You need to restart nfsd on the node that is taking over. That means
> that clients using both filesystems (A and B) will have to do lock
> recovery, when in theory only those using volume B should have to, and
> that is suboptimal. But it is also correct.
>

Seems to work. As for a more optimal solution: what do you think of the
contents of /proc/locks? Might it be possible to use this info to
perform locking locally on the other node (after failover)?
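
For illustration, /proc/locks is a plain-text listing with one held lock per
line (a hypothetical sample; the fields are lock number, lock type, mode,
access, pid, device:inode, and the byte range):

cat /proc/locks
# 1: POSIX  ADVISORY  WRITE 2508 08:01:393218 0 EOF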

Thanks!

> --b.

2011-11-15 15:51:05

by Pavel A

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

Bryan Schumaker <bjschuma@...> writes:

>
> On Mon 14 Nov 2011 02:10:05 PM EST, Bryan Schumaker wrote:
> > Hello Pavel,
> >
> > What kernel version is Debian using? I haven't been able to reproduce the
problem using 3.0 (But I'm on
> Archlinux, so there might be other differences).

Thanks, Bryan, for your reply.

Debian is using Linux kernel version 2.6.32 - I haven't upgraded it.

> It might also be useful if you could share the /etc/exports file on the
> server.
>
> Thanks!
>
> - Bryan

Thank you for the question - that was my mistake. For managing exports I'm
using the OCF resource agent 'exportfs'. It uses the Linux built-in command
'exportfs' to export shares, and the /etc/exports file is empty. However,
Heartbeat starts much later than NFS... Now it is clear why this wasn't
working. Setting up a share that doesn't rely on Heartbeat resources resolves
the issue.

Still though, the first test was just to make sure NFS functions the way it is
supposed to, and was not the goal - the second/main question remains open. When
I run sm-notify in this case, shares are already exported and all the other
needed resources are available as well. Why doesn't sm-notify work? It doesn't
work even in the case of a single-server test. As for using files from
/var/lib/nfs/sm/ when notifying clients from the other node in the cluster, it
should be okay with the -v option of sm-notify, because it is common practice
to store the whole /var/lib/nfs folder on shared storage in Active/Passive
clusters and trigger sm-notify from the active node. It would be awesome if
you could give me a clue.

> >
> > - Bryan
> >
> > On Mon 14 Nov 2011 12:11:56 PM EST, Pavel wrote:
> >> Hi! I'm trying to set up an NFS server (particularly an A/A NFS cluster)
and
> >> having issues with locking and reboot notifications. These are the tests I
have
> >> done:
> >>
> >> 1. The simplest test includes single NFS server machine (Debian Squeeze),
> >> running nfs-kernel-server (nfs-utils 1.2.2-4) and a single client machine
(same
> >> OS), that mounts a share with “-o 'vers=3'” option. From the client I lock
some
> >> file on share using 'testlk -w <filename>' (testlk from
nfsutils/tools/locktest)
> >> so that a corresponding file appears in /var/lib/nfs/sm/ on server. Then I
> >> reboot the server and this is what I get in client logs:
> >>
> >> lockd: request from 127.0.0.1, port=1007
> >> lockd: SM_NOTIFY called
> >> lockd: host nfs-server1 (192.168.0.101) rebooted, cnt 2
> >> lockd: get host nfs-server1
> >> lockd: get host nfs-server1
> >> lockd: release host nfs-server1
> >> lockd: reclaiming locks for host nfs-server1
> >> lockd: rebind host nfs-server1
> >> lockd: call procedure 2 on nfs-server1
> >> lockd: nlm_bind_host nfs-server1 (192.168.0.101)
> >> lockd: rpc_call returned error 13
> >> lockd: failed to reclaim lock for pid 1555 (errno -13, status 0)
> >> NLM: done reclaiming locks for host nfs-server1
> >> lockd: release host nfs-server1
> >>
> >> 2. As I'm building a cluster I'll need to notify clients when NFS resource
> >> migrates (since it is an A/A cluster nfs-kernel-server is always running on
all
> >> nodes and shares migrate using exportfs resource agent), but manually
calling
> >> sm-notify ('sm-notify -f -v <virtual IP of share>') from either the initial
for
> >> that share or backup node results in the following (client logs):
> >>
> >> lockd: request from 127.0.0.1, port=637
> >> lockd: SM_NOTIFY called
> >> lockd: host B (192.168.0.110) rebooted, cnt 2
> >> lockd: get host B
> >> lockd: get host B
> >> lockd: release host B
> >> lockd: reclaiming locks for host B
> >> lockd: rebind host B
> >> lockd: call procedure 2 on B
> >> lockd: nlm_bind_host B (192.168.0.110)
> >> lockd: server in grace period
> >> lockd: spurious grace period reject?!
> >> lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
> >> NLM: done reclaiming locks for host B
> >> lockd: release host B
> >>
> >> even though grace period is intended for lock reclamation. B/w after such
> >> invocation no files, corresponding to the notified clients, appear in
> >> /var/lib/nfs/sm/ on server for about 10 minutes, if I try locking from any
of
> >> these notified clients, even though locking itself is ok. Locking from
other
> >> clients generates files for them instantly.
> >>
> >> As of the rest: simple concurrent lock tests from couple of clients work
fine as
> >> well as server frees locks of rebooted clients.
> >>
> >> I'm new to NFS an may be missing obvious things, but I've already spent
several
> >> days googling around, but don't seem to find any solution.
> >> Any help or guidance is highly appreciated. Thanks!





2011-11-16 17:15:43

by Pavel A

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

2011/11/16 J. Bruce Fields <[email protected]>:
> On Wed, Nov 16, 2011 at 09:25:01AM -0500, Bryan Schumaker wrote:
>> Here is what I'm doing (On debian with 2.6.32):
>> - (On Client) Mount the server: `sudo mount -o vers=3
>> 192.168.122.202:/home/bjschuma /mnt`
>> - (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
>> /mnt/test`
>> - (On Server) Call sm-notify with the server's IP address: `sudo
>> sm-notify -f -v 192.168.122.202`
>> - dmesg on the client has this message:
>>     lockd: spurious grace period reject?!
>>     lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
>> - (In wireshark) The client sends a lock request with the "Reclaim" bit
>> set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".
>
> That sounds like correct server behavior to me.
>
> Once the server ends the grace period and starts accepting regular
> non-reclaim locks, there's the chance of a situation like:
>
>        client A                client B
>        --------                --------
>
>        acquires lock
>
>                ---server reboot---
>                ---grace period ends---
>
>                                acquires conflicting lock
>                                drops conflicting lock
>
> And if the server permits a reclaim of the original lock from client A,
> then it gives client A the impression that it has held its lock
> continuously over this whole time, when in fact someone else has held a
> conflicting lock.

Hm... This is how NFS behaves on a real server reboot:

client A                client B
--------                --------
        ---server started, serving regular locks---
acquires lock

        ---server rebooted--- (at this point sm-notify is called automatically)
reacquires lock
        ---grace period ends---

                        cannot acquire lock,
                        client A is holding it.

Shouldn't a manual 'sm-notify -f' behave the same way
as a real server reboot?
I can't see how your example can take place.
If client B acquires the lock, then client A must have
released it some time before.

>
> So: no non-reclaim locks are allowed outside the grace period.

I'm sorry, is that what you meant?
>
> If you restart the server, and *then* immediately run sm-notify while
> the new nfsd is still in its grace period, I'd expect the reclaim to
> succeed.
>
> And that may be where the HA setup isn't right--if you're doing
> active/passive failover, then you need to make sure you don't start nfsd
> on the backup machine until just before you send the sm-notify.
>
As for the HA setup, it is as follows, so you can understand what I plan to
use sm-notify for:

Some background:

I'm building an Active/Active NFS cluster, and nfs-kernel-server is always
running on all nodes. Note: each node in the cluster exports shares different
from the other nodes' (they do not overlap), so clients never access the same
files through more than one server node, and a usual file system (not a
cluster one) is used for storage.
What I'm doing is moving an NFS share (with the resources underneath it:
virtual IP, DRBD storage) between the nodes with the exportfs OCF resource
agent.

This is how this setup is described here: http://ben.timby.com/?p=109

/*-----
I have need for an active-active NFS cluster. For review, an active-active
cluster is two boxes that export two resources (one each). Each box acts as a
backup for the other box's resource. This way, both boxes actively serve
clients (albeit for different NFS exports).

*** To be clear, this means that half my users use Volume A and half of them
use Volume B. Server A exports Volume A and Server B exports Volume B. If
server A fails, Server B will export both volumes. I use DRBD to synchronize
the primary server to the secondary server, for each volume. You can think of
this like cross-replication, where Server A replicates changes to Volume A to
Server B. I hope this makes it clear how this setup works. ***
-----*/

The goal:

The solution at the link above allows moving NFS shares between the nodes, but
doesn't support locking. Therefore I'll need to inform clients when a share
migrates to the other node (due to a node failure or manually), so that they
can reclaim locks (given that the files from /var/lib/nfs/sm are transferred
to the other node).

The problem:

When I run sm-notify manually ('sm-notify -f -v <virtual IP of share>'),
clients fail to reclaim locks. The log on the client looks like this:

lockd: request from 127.0.0.1, port=637
lockd: SM_NOTIFY called
lockd: host B (192.168.0.110) rebooted, cnt 2
lockd: get host B
lockd: get host B
lockd: release host B
lockd: reclaiming locks for host B
lockd: rebind host B
lockd: call procedure 2 on B
lockd: nlm_bind_host B (192.168.0.110)
lockd: server in grace period
lockd: spurious grace period reject?!
lockd: failed to reclaim lock for pid 2508 (errno -37, status 4)
NLM: done reclaiming locks for host B
lockd: release host B

Note, however, that this happens even in the case of a standard
single-machine NFS server!

The Active/Passive setup you have described is known to work.

> --b.
>
>>
>> Shouldn't the server be allowing the lock reclaim?  When I tried
>> yesterday using 3.0 it only triggered DNS packets, I tried again a few
>> minutes ago and got the same results that I did using .32.
>

2011-11-16 14:25:20

by Anna Schumaker

[permalink] [raw]
Subject: Re: clients fail to reclaim locks after server reboot or manual sm-notify

On Tue 15 Nov 2011 05:16:23 PM EST, J. Bruce Fields wrote:
> On Tue, Nov 15, 2011 at 04:48:57PM -0500, Bryan Schumaker wrote:
>> On 11/15/2011 10:50 AM, Pavel wrote:
>>> Bryan Schumaker <bjschuma@...> writes:
>>>
>>>>
>>>> On Mon 14 Nov 2011 02:10:05 PM EST, Bryan Schumaker wrote:
>>>>> Hello Pavel,
>>>>>
>>>>> What kernel version is Debian using? I haven't been able to reproduce the
>>> problem using 3.0 (But I'm on
>>>> Archlinux, so there might be other differences).
>>>
>>> Thanks, Bryan, for your reply.
>>>
>>> Debian is using Linux kernel version 2.6.32 - I haven't upgraded it.
>>>
>>>> It might also be useful if you could share the /etc/exports file on the
>>>> server.
>>>>
>>>> Thanks!
>>>>
>>>> - Bryan
>>>
>>> Thank you for the question - that was my rude mistake. For managing exports I'm
>>> using OCF resource agent 'exportfs'. It uses Linux build-in command 'exportfs'
>>> to export shares and /etc/exports file is empty. However Heartbeat starts much
>>> later than NFS...Now it is clear why this wasn't working. Setting up share that
>>> doesn't rely on Heartbeat resources, resolves the issue.
>>>
>>> Still though, the first test was just to make sure NFS functions the way it is
>>> supposed to, and not the goal - the second/main question remains open. When I
>>> run sm-notify in this case, shares are already exported and all the other needed
>>> resources are available as well. Why doesn't sm-notify work? It doesn't work
>>> even in case of single server test. As of using files from /var/lib/nfs/sm/ when
>>> notifying clients from the other node in cluster, it should be okay with -v
>>> option of sm-notify, because it is a common practice to store the whole
>>> /var/lib/nfs folder on shared storage in Active/Passive clusters and trigger sm-
>>> notify from the active node. It would be awesome if you could give me a clue.
>>
>> I'm seeing the same thing you are, using some Debian VMs I set up yesterday afternoon. It does look like the server is replying with NLM_DENIED_GRACE_PERIOD when sm-notify is used. Bruce, any idea what's going on here?
>
> Sorry, I'm having trouble keeping up.... What exactly do you do, on
> which machine, and what do you then see happen?

Here is what I'm doing (On debian with 2.6.32):
- (On Client) Mount the server: `sudo mount -o vers=3
192.168.122.202:/home/bjschuma /mnt`
- (On Client) Lock a file using nfs-utils/tools/locktest: `./testlk
/mnt/test`
- (On Server) Call sm-notify with the server's IP address: `sudo
sm-notify -f -v 192.168.122.202`
- dmesg on the client has this message:
    lockd: spurious grace period reject?!
    lockd: failed to reclaim lock for pid 2099 (errno -37, status 4)
- (In wireshark) The client sends a lock request with the "Reclaim" bit
set to "yes" but the server replies with "NLM_DENIED_GRACE_PERIOD".

Shouldn't the server be allowing the lock reclaim? When I tried
yesterday using 3.0 it only triggered DNS packets, I tried again a few
minutes ago and got the same results that I did using .32.
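
A minimal capture recipe for checking that exchange (interface name and
server address are placeholders):

# on the client: capture traffic to/from the server, then open in wireshark
tcpdump -i eth0 -s 0 -w nlm.pcap host 192.168.122.202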

- Bryan

>
> --b.
>
>>
>> When I try using my Linux 3.0 / Archlinux machines I don't see any NLM requests due to sm-notify. I'm not sure that's correct...