2003-08-01 11:32:08

by Mark Hemment

[permalink] [raw]
Subject: lockd fails to purge blocked NLM_LOCKs

When stopping server-side NFS, or deleting a client (NFSCTL_DELCLIENT),
lockd fails to purge blocked lockd requests.

The call path is;
exp_freeclient()
nfsd_lockd_unexport()
nlmsvc_invalidate_client()

Where nlmsvc_invalidate_client(), looks up the host with;
host = nlm_lookup_host(clnt, NULL, 0, 0)
ie. the host-lookup keys off the client's "svc_client" (memory) address
from NFSD's "clients" list.

This "host" value is passed through;
nlmsvc_free_host_resources()
nlm_traverse_files()
nlm_inspect_file()
and into nlmsvc_traverse_blocks(), where blocked locks are tested for
with;
nlmsvc_traverse_blocks(host, ....)
{
...
if (host == NULL || host == block->b_host)
nlmsvc_delete_block(block, 1);
...
}

However, where the block entries are created in nlmsvc_create_block();
host = nlmclnt_lookup_host(&rqstp->rq_addr,
rqstp->rq_prot, rqstp->rq_vers);
...
block->b_host = host;

this host-lookup (nlmclnt_lookup_host()) is keyed off the client's network
address.

So, back in nlmsvc_traverse_blocks() the test;
host == block->b_host
is checking different values; "host" found via "svc_client"s memory
address and "b_host" found/created via the client's network address.

The bug/confusion probably originates as "nlm_hosts[]" is overload with
two different types of entries.

nlmsvc_create_block() still needs to create an entry in nlm_hosts[] for
the sending of the 'NLM_GRANTED', but it already stores a reference to
that entry in "call->a_host".
Solution is to use the value in "call->a_host" for the
nlm_rebind_host(), and to set "b_host" with the host value found via an
nlmsvc_lookup_host().

Patch against 2.4.21-pre9 attached.

Mark


diff -urN linux-2.4.21-pre9/fs/lockd/svclock.c lockd-block/fs/lockd/svclock.c
--- linux-2.4.21-pre9/fs/lockd/svclock.c 2003-07-31 20:49:28.000000000 +0100
+++ lockd-block/fs/lockd/svclock.c 2003-08-01 12:06:04.000000000 +0100
@@ -188,6 +188,11 @@
locks_init_lock(&block->b_call.a_args.lock.fl);
locks_init_lock(&block->b_call.a_res.lock.fl);

+ block->b_host = nlmsvc_lookup_host(rqstp);
+ if (block->b_host == NULL) {
+ goto failed_free;
+ }
+
if (!nlmclnt_setgrantargs(&block->b_call, lock))
goto failed_free;

@@ -199,7 +204,6 @@

/* Create and initialize the block */
block->b_daemon = rqstp->rq_server;
- block->b_host = host;
block->b_file = file;

/* Add to file's list of blocks */
@@ -265,8 +269,7 @@
}
}

- if (block->b_host)
- nlm_release_host(block->b_host);
+ nlm_release_host(block->b_host);
nlmclnt_freegrantargs(&block->b_call);
kfree(block);
}
@@ -515,7 +518,7 @@
* Just retry the grant callback, possibly refreshing the RPC
* binding */
if (block->b_granted) {
- nlm_rebind_host(block->b_host);
+ nlm_rebind_host(block->b_call.a_host);
goto callback;
}





-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2003-08-04 05:36:11

by NeilBrown

[permalink] [raw]
Subject: Re: lockd fails to purge blocked NLM_LOCKs

On Friday August 1, [email protected] wrote:
> When stopping server-side NFS, or deleting a client (NFSCTL_DELCLIENT),
> lockd fails to purge blocked lockd requests.

Thanks Mark. I'm at least largely convinced.

My only question concerns nlmsvc_traverse_blocks.
This uses b_host and you have (subtley) changed the meaning of b_host.

Do we also need to change both occurances of b_host in
nlmsvc_traverse_blocks to b_call.a_host ??

NeilBrown


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs

2003-08-06 11:27:22

by Mark Hemment

[permalink] [raw]
Subject: Re: lockd fails to purge blocked NLM_LOCKs

On Mon, 4 Aug 2003, Neil Brown wrote:

> On Friday August 1, [email protected] wrote:
> > When stopping server-side NFS, or deleting a client (NFSCTL_DELCLIENT),
> > lockd fails to purge blocked lockd requests.
>
> Thanks Mark. I'm at least largely convinced.
>
> My only question concerns nlmsvc_traverse_blocks.
> This uses b_host and you have (subtley) changed the meaning of b_host.

That is the idea.
In host.c, the nlm_hosts[] 'database' contains two types of entries;
a) those created with an "svc_client" (h_exportent != NULL)
b) those created without an "svc_client" (h_exportent = NULL)

Type a) entries refer back to nfsd's client 'database'.
Type b) entries are independent of nfsd.

nlmsvc_traverse_blocks() is expecting ->b_host to refer to an entry in
"nlm_hosts[]" which refers back to a type a) entry. Without the patch,
->b_host refers to a type b) entry.

I've attached my write-up from our internal bug tracking system; it has
a bit more detailed.

Also, I've attached a simple program for demonstrating the problem
(foo-lock). The steps are;
i) Create a scratch filesystem on the Linux NFS server and
mount it, say at point /mnt/fs1, and export it. Created a
small test file;
# cp /etc/hosts /mnt/fs1/testfile
ii) Mount the exported filesystem on a client, sat at point
/mnt/scratch
iii) Login to a client twice.
On the first login, run the test program;
# ./foo-lock /mnt/scratch/testfile
Attempting to get lock...
Got lock..sleeping for 60 seconds
On the second login, run the test program before the
first expires - it will
block for the lock;
# ./foo-lock /mnt/scratch/testfile
Attempting to get lock...
iv) Stop NFS on the server before the first test program expires.
For RedHat;
# /etc/init.d/nfs stop
v) Now try to umount the fileystem on the server;
# umount /mnt/fs1
Busy? Even stopping lockd will not help. The server needs
to be rebooted.

Thanks,
Mark


Attachments:
foo-lock.c (856.00 B)
foo-lock.c
incident-127591.txt (4.76 kB)
incident-127591.txt
Download all attachments

2003-08-06 23:55:33

by NeilBrown

[permalink] [raw]
Subject: Re: lockd fails to purge blocked NLM_LOCKs

On Wednesday August 6, [email protected] wrote:
> On Mon, 4 Aug 2003, Neil Brown wrote:
>
> > On Friday August 1, [email protected] wrote:
> > > When stopping server-side NFS, or deleting a client (NFSCTL_DELCLIENT),
> > > lockd fails to purge blocked lockd requests.
> >
> > Thanks Mark. I'm at least largely convinced.
> >
> > My only question concerns nlmsvc_traverse_blocks.
> > This uses b_host and you have (subtley) changed the meaning of b_host.
>
> That is the idea.

Thanks for the extra detail. I'm convinced now.

I know I must have understood all this code once, because I changed
this stuff for 2.6 and got rid of the per-export-entry 'host's and
just kept per-ip-address hosts. The actual patch can be found at:

http://www.cse.unsw.edu.au/~neilb/patches/linux-devel/2.5.34/patch-F-LockdSeparate

I hope it's right :-)

I'll submit your patch for 2.4.23.

NeilBrown


-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100003ave/direct;at.aspnet_072303_01/01
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs