Sorry, that patch wasn't supposed to be an attachment. Here it is inline.
Tom.
At 09:36 AM 10/17/2007, Talpey, Thomas wrote:
>I've discovered a serious and somewhat interesting issue in the NLM
>client that's been present for quite a while. Basically, if an NLM server
>GRANTED callback arrives for a lock the client already holds, the client
>*rejects* the lock. This leads to incorrect behavior (since the server
>may give the lock to another process), but it also may lead to deadlock
>in the affected client when the next waiting process is on the same machine;
>it means two processes are contending in the local locking code. We
>have seen this in the field recently.
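>
>Roughly, the failing sequence looks like this (an illustrative timeline,
>not an actual trace):
>
>    client                              server
>    ------                              ------
>    NLM_LOCK (blocking) ------------->  lock busy; replies "blocked"
>                       X<-------------  NLM_GRANTED (UDP callback, lost)
>    [NLMCLNT_POLL_TIMEOUT, ~30s]
>    NLM_LOCK retry ------------------>  lock now free; replies "granted"
>                        <-------------  NLM_GRANTED (stale retransmission)
>    no blocked lock in nlm_blocked,
>    so reply nlm_lck_denied -------->   server may now grant the lock
>                                        to another process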
>
>I have a proposed fix (patch below), which adds further checking to
>the NLM callback handler: if the callback does not find a blocked lock
>in the client's nlm_blocked list, it scans for locks the client already
>holds. If one is found, the client will *accept* the lock instead of
>rejecting it. We're currently testing the patch at high scale; an
>earlier version (for kernel 2.6.15) works well at extreme load.
>
>To see this, you need NLM lock contention and enough network traffic
>to cause UDP NLM_GRANTED callbacks to be lost in the network. Typically,
>the client will retry the NLM_LOCK RPC after 30 seconds
>(NLMCLNT_POLL_TIMEOUT) when this occurs. These retries often succeed,
>but the old GRANTED callbacks are still floating around, either at the
>server or in the network itself. We can duplicate this in minutes at
>high load, even with server callback retry.
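>
>(One way to induce the necessary loss is netem on the server, which
>drops a fraction of the outgoing packets, UDP callbacks included; the
>interface name and loss rate below are only illustrative:
>
>    tc qdisc add dev eth0 root netem loss 10%
>
>Any other error injection that loses some NLM_GRANTED traffic works
>as well.)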
>
>Comments?
>
>Tom.
>
>=====
>
Author: Tom Talpey <[email protected]>
NLM: close race condition in GRANTED callback
Don't reply with nlm_lck_denied to a late GRANTED callback for
a lock which we already hold. These callbacks can be delayed in
the network, and also may cross the client's NLMCLNT_POLL_TIMEOUT
retry.
Signed-off-by: Tom Talpey <[email protected]>
---
 fs/lockd/clntlock.c         |    3 ++
 fs/lockd/host.c             |   52 ++++++++++++++++++++++++++++++++++++++++++++
 include/linux/lockd/lockd.h |    2 +
 3 files changed, 57 insertions(+)

Index: linux-2.6.22.10/fs/lockd/clntlock.c
===================================================================
--- linux-2.6.22.10.orig/fs/lockd/clntlock.c
+++ linux-2.6.22.10/fs/lockd/clntlock.c
@@ -135,6 +135,9 @@ __be32 nlmclnt_grant(const struct sockad
 		wake_up(&block->b_wait);
 		res = nlm_granted;
 	}
+	/* Do NOT deny a lock which we already hold */
+	if (res != nlm_granted && nlmclnt_check_lock(addr, lock))
+		res = nlm_granted;
 	return res;
 }
 
Index: linux-2.6.22.10/include/linux/lockd/lockd.h
===================================================================
--- linux-2.6.22.10.orig/include/linux/lockd/lockd.h
+++ linux-2.6.22.10/include/linux/lockd/lockd.h
@@ -174,6 +174,8 @@ void nlmclnt_next_cookie(struct nlm_c
  */
 struct nlm_host * nlmclnt_lookup_host(const struct sockaddr_in *, int, int, const char *, int);
 struct nlm_host * nlmsvc_lookup_host(struct svc_rqst *, const char *, int);
+int		  nlmclnt_check_lock(const struct sockaddr_in *,
+			const struct nlm_lock *);
 struct rpc_clnt * nlm_bind_host(struct nlm_host *);
 void		  nlm_rebind_host(struct nlm_host *);
 struct nlm_host * nlm_get_host(struct nlm_host *);
Index: linux-2.6.22.10/fs/lockd/host.c
===================================================================
--- linux-2.6.22.10.orig/fs/lockd/host.c
+++ linux-2.6.22.10/fs/lockd/host.c
@@ -11,6 +11,7 @@
 #include <linux/types.h>
 #include <linux/slab.h>
 #include <linux/in.h>
+#include <linux/nfs_fs.h>
 #include <linux/sunrpc/clnt.h>
 #include <linux/sunrpc/svc.h>
 #include <linux/lockd/lockd.h>
@@ -197,6 +198,57 @@ nlmsvc_lookup_host(struct svc_rqst *rqst
 }
 
 /*
+ * Find an existing client NLM lock, used for validating old GRANTED callbacks.
+ */
+
+static inline int
+nlm_lock_match(const struct nlm_lock *lock, const struct file_lock *fl)
+{
+	return lock->fl.fl_start == fl->fl_start &&
+	       lock->fl.fl_end == fl->fl_end &&
+	       lock->svid == fl->fl_u.nfs_fl.owner->pid &&
+	       0 == nfs_compare_fh(&lock->fh,
+			NFS_FH(fl->fl_file->f_path.dentry->d_inode));
+}
+
+int
+nlmclnt_check_lock(const struct sockaddr_in *sin, const struct nlm_lock *lock)
+{
+	struct hlist_head *chain;
+	struct hlist_node *pos;
+	struct nlm_host *host;
+	struct file_lock *fl;
+	int hash;
+
+	dprintk("lockd: nlmclnt_check_lock(%u.%u.%u.%u)\n",
+		NIPQUAD(sin->sin_addr.s_addr));
+
+	hash = NLM_ADDRHASH(sin->sin_addr.s_addr);
+
+	/* Lock hash table */
+	mutex_lock(&nlm_host_mutex);
+
+	/* We must check all nlm_host objects for a peer, because the
+	 * callback does not provide protocol, version or other context.
+	 * When each peer is found, we scan for any matching granted lock.
+	 * Also check for a granted lock in the process of being reclaimed. */
+	chain = &nlm_hosts[hash];
+	hlist_for_each_entry(host, pos, chain, h_hash) {
+		if (!nlm_cmp_addr(&host->h_addr, sin))
+			continue;
+		list_for_each_entry(fl, &host->h_granted, fl_u.nfs_fl.list)
+			if (nlm_lock_match(lock, fl))
+				goto found;
+		list_for_each_entry(fl, &host->h_reclaim, fl_u.nfs_fl.list)
+			if (nlm_lock_match(lock, fl))
+				goto found;
+	}
+	fl = NULL;
+found:
+	mutex_unlock(&nlm_host_mutex);
+	return fl != NULL;
+}
+/*
  * Create the NLM RPC client for an NLM peer
  */
 struct rpc_clnt *
On Wed, 2007-10-17 at 09:36 -0400, Talpey, Thomas wrote:
> I've discovered a serious and somewhat interesting issue in the NLM
> client that's been present for quite a while. Basically, if an NLM server
> GRANTED callback arrives for a lock the client already holds, the client
> *rejects* the lock. This leads to incorrect behavior (since the server
> may give the lock to another process), but it also may lead to deadlock
> in the affected client when the next waiting process is on the same machine;
> it means two processes are contending in the local locking code. We
> have seen this in the field recently.
>
> I have a proposed fix (patch below), which adds further checking to
> the NLM callback handler: if the callback does not find a blocked lock
> in the client's nlm_blocked list, it scans for locks the client already
> holds. If one is found, the client will *accept* the lock instead of
> rejecting it. We're currently testing the patch at high scale; an
> earlier version (for kernel 2.6.15) works well at extreme load.
>
> To see this, you need NLM lock contention and enough network traffic
> to cause UDP NLM_GRANTED callbacks to be lost in the network. Typically,
> the client will retry the NLM_LOCK RPC after 30 seconds
> (NLMCLNT_POLL_TIMEOUT) when this occurs. These retries often succeed,
> but the old GRANTED callbacks are still floating around, either at the
> server or in the network itself. We can duplicate this in minutes at
> high load, even with server callback retry.
>
> Comments?
So what will happen in this case if the NLM_GRANTED reply races with my
UNLOCK call to release the lock?
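
Concretely (an illustrative interleaving):

    client                              server
    ------                              ------
    (already holds the lock)
    reply "granted" to a stale
    NLM_GRANTED callback ----+
    UNLOCK ----------------------->     releases the lock
                             +----->    "granted" reply arrives after the
                                        UNLOCK; a server that trusts it may
                                        conclude the client holds a lock it
                                        has in fact released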
Servers have to perform consistency checks on NLM_GRANTED, or the
protocol will break. Once the client has been granted the lock via a
separate LOCK call, then the result of the NLM_GRANTED request _MUST_ be
ignored on the server.
Cheers
Trond
At 12:44 PM 10/17/2007, Trond Myklebust wrote:
>So what will happen in this case if the NLM_GRANTED reply races with my
>UNLOCK call to release the lock?
Alas, that is the beast which is NLM. The callback reply can race the UNLOCK
even without this change. I can't fix that; it's a protocol issue. Wait a sec,
yes I can: it's called NFSv4.1 Sessions. :-)
>Servers have to perform consistency checks on NLM_GRANTED, or the
>protocol will break. Once the client has been granted the lock via a
>separate LOCK call, then the result of the NLM_GRANTED request _MUST_ be
>ignored on the server.
I agree the server should do this, but the spec doesn't say so, there are
servers out there that don't, and the protocol itself doesn't help. So the
intent of this fix is to restore order, even though there are perhaps
better-but-still-imperfect fixes that require global change.
BTW, the Linux server discards unsent NLM_GRANTED callbacks when
the client is granted a lock in its retry case. But it performs no such
checking on any responses that arrive from prior sends (see
fs/lockd/svclock.c nlmsvc_grant_reply() and nlmsvc_unlink_block()).
It simply processes the next blocked lock, as I described.
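
For reference, the reply path looks roughly like this (paraphrased and
abbreviated from 2.6.22 fs/lockd/svclock.c, from memory; see the real
source for the exact code):

    void nlmsvc_grant_reply(struct nlm_cookie *cookie, __be32 status)
    {
            struct nlm_block *block;

            /* find the block tracking the pending GRANTED callback */
            if (!(block = nlmsvc_find_block(cookie)))
                    return;

            if (status == nlm_lck_denied_grace_period) {
                    /* client still in grace; requeue the callback */
                    nlmsvc_insert_block(block, 10 * HZ);
            } else {
                    /* granted OR denied: the block is unlinked either
                     * way, with no check of the client's answer against
                     * the server's own lock state */
                    nlmsvc_unlink_block(block);
            }
            nlmsvc_release_block(block);
    }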
Tom.
On Wed, Oct 17, 2007 at 01:30:27PM -0400, Talpey, Thomas wrote:
> At 12:44 PM 10/17/2007, Trond Myklebust wrote:
> >So what will happen in this case if the NLM_GRANTED reply races with my
> >UNLOCK call to release the lock?
>
> Alas, that is the beast which is NLM. The callback reply can race the UNLOCK
> even without this change. I can't fix that; it's a protocol issue. Wait a sec,
> yes I can: it's called NFSv4.1 Sessions. :-)
>
> >Servers have to perform consistency checks on NLM_GRANTED, or the
> >protocol will break. Once the client has been granted the lock via a
> >separate LOCK call, then the result of the NLM_GRANTED request _MUST_ be
> >ignored on the server.
>
> I agree the server should do this, but the spec doesn't say so, there are
> servers out there that don't, and the protocol itself doesn't help. So the
> intent of this fix is to restore order, even though there are perhaps
> better-but-still-imperfect fixes that require global change.
>
> BTW, the Linux server discards unsent NLM_GRANTED callbacks when
> the client is granted a lock in its retry case. But it performs no such
> checking on any responses that arrive from prior sends (see
> fs/lockd/svclock.c nlmsvc_grant_reply() and nlmsvc_unlink_block()).
> It simply processes the next blocked lock, as I described.
I don't understand. As far as I can tell, the server acquires a lock
locally at the time it sends the GRANT, and it never unlocks in the code
that handles replies to the grant. Note that nlmsvc_unlink_block() just
removes the data structures that track the blocked (not-yet-acquired)
lock, and is (err, I hope) more or less a no-op in the case the lock in
question is already acquired.
But that's just based on a quick look at the code, and I haven't tested
it.
--b.
At 03:23 PM 10/17/2007, J. Bruce Fields wrote:
>> BTW, the Linux server discards unsent NLM_GRANTED callbacks when
>> the client is granted a lock in its retry case. But it performs no such
>> checking on any responses that arrive from prior sends (see
>> fs/lockd/svclock.c nlmsvc_grant_reply() and nlmsvc_unlink_block()).
>> It simply processes the next blocked lock, as I described.
>
>I don't understand. As far as I can tell, the server acquires a lock
>locally at the time it sends the GRANT, and it never unlocks in the code
>that handles replies to the grant. Note that nlmsvc_unlink_block() just
>removes the data structures that track the blocked (not-yet-acquired)
>lock, and is (err, I hope) more or less a no-op in the case the lock in
>question is already acquired.
I don't fully understand the Linux server nlm code either, but I am sure
it suffers a similar problem, because it ends up with locks in different
states between client and server. I'm happy to help you with fixing the
server in conjunction with this. It's relatively easy to recreate it with
a couple of clients and some error injection.
Tom.
On Wed, Oct 17, 2007 at 05:15:10PM -0400, Talpey, Thomas wrote:
> At 03:23 PM 10/17/2007, J. Bruce Fields wrote:
> >> BTW, the Linux server discards unsent NLM_GRANTED callbacks when
> >> the client is granted a lock in its retry case. But it performs no such
> >> checking on any responses that arrive from prior sends (see
> >> fs/lockd/svclock.c nlmsvc_grant_reply() and nlmsvc_unlink_block()).
> >> It simply processes the next blocked lock, as I described.
> >
> >I don't understand. As far as I can tell, the server acquires a lock
> >locally at the time it sends the GRANT, and it never unlocks in the code
> >that handles replies to the grant. Note that nlmsvc_unlink_block() just
> >removes the data structures that track the blocked (not-yet-acquired)
> >lock, and is (err, I hope) more or less a no-op in the case the lock in
> >question is already acquired.
>
> I don't fully understand the Linux server nlm code either, but I am sure
> it suffers a similar problem, because it ends up with locks in different
> states between client and server.
OK. But it's possible this is a slightly different race? Your
description of the race certainly doesn't match my understanding of the
server nlm code, but of course it's possible I could be missing
something.
> I'm happy to help you with fixing the server in conjunction with this.
> It's relatively easy to recreate it with a couple of clients and some
> error injection.
That would be great, thanks!
--b.
At 06:25 PM 10/17/2007, J. Bruce Fields wrote:
>OK. But it's possible this is a slightly different race? Your
>description of the race certainly doesn't match my understanding of the
>server nlm code, but of course it's possible I could be missing
>something.
To be clear, I first detected this error when serving off a NetApp filer.
I believe a similar issue can occur with a Linux server, but the manifestation
may well be different. It's just as serious (false locks, data corruption,
and client deadlock).
I'll drop you a line tomorrow.
Tom.