2004-08-25 00:31:10

by Marc Eshel

Subject: NFSv3 blocking locks broken in 2.6.7





Hi Trond,

I have a question about the serialization of the nlm_blocked list. Is it
possible that the lockd thread will manipulate the list while a call is
made to nlmsvc_notify_blocked() on a different thread which is also
searching the nlm_blocked list?

Thanks, Marc.



-------------------------------------------------------
SF.Net email is sponsored by Shop4tech.com-Lowest price on Blank Media
100pk Sonic DVD-R 4x for only $29 -100pk Sonic DVD+R for only $33
Save 50% off Retail on Ink & Toner - Free Shipping and Free Gift.
http://www.shop4tech.com/z/Inkjet_Cartridges/9_108_r285
_______________________________________________
NFS maillist - [email protected]
https://lists.sourceforge.net/lists/listinfo/nfs


2004-08-25 01:31:36

by Trond Myklebust

Subject: Re: NFSv3 blocking locks broken in 2.6.7

svclock.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

--- linux-2.6.8.1-grrr/fs/lockd/svclock.c.orig 2004-08-14 14:25:56.000000000 -0400
+++ linux-2.6.8.1-grrr/fs/lockd/svclock.c 2004-08-18 19:34:13.000000000 -0400
@@ -238,8 +238,13 @@ nlmsvc_delete_block(struct nlm_block *bl

 	/* Remove block from list */
 	nlmsvc_remove_block(block);
-	posix_unblock_lock(&file->f_file, fl);
-	block->b_granted = 0;
+	if (fl->fl_next)
+		posix_unblock_lock(&file->f_file, fl);
+	if (unlock) {
+		fl->fl_type = F_UNLCK;
+		posix_lock_file(&file->f_file, fl);
+		block->b_granted = 0;
+	}

 	/* If the block is in the middle of a GRANT callback,
 	 * don't kill it yet. */


Attachments:
linux-2.6.8.1-10-lockd_bug.dif (707.00 B)

2004-08-18 03:38:10

by Marc Eshel

Subject: Re: NFSv3 blocking locks broken in 2.6.7





I just tried the latest 2.6.8 kernel and the following problem is still
there.

Marc Eshel/Almaden/IBM wrote on 07/17/2004 02:11:22 PM:

> Hi,
>
> There appears to be a problem with blocking locks granted by lockd
> (observed on 2.6.7 and some previous versions of 2.6). An NFS client
> waiting for a blocking lock unlocks the file incorrectly right after
> being granted the lock even though the application still holds the
> lock; thereby allowing another NFS client or local fcntl locks to be
> granted. This problem will not manifest itself while using two
> processes from the same NFS client because of the local locking on
> the client machine.
>
> The problem can be easily reproduced using two different NFS client
> machines and issuing the following sequence of locking calls (assume
> that /mnt is the NFS mount point on both NFS clients in the example):
>
> Client 1: fcntl(F_SETLK) /mnt/f1 granted and held
> Client 2: fcntl(F_SETLKW) /mnt/f1 blocked
> Client 1: fcntl(F_UNLCK) /mnt/f1 unlocked on Machine 1; granted on Machine 2
> Client 2: After a delay, lock granted and held
> Client 1: fcntl(F_SETLK) /mnt/f1 granted (should be DENIED)
>
> The problem also occurs when the first process is executing the
> fcntl locks locally.
>
> Thanks, Marc.
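
The sequence above can be checked with a small helper built on fcntl(2). The following is a minimal sketch, not taken from the thread: range_lock() and second_locker_denied() are hypothetical names, and the fork() only stands in for the second locker, so on a single machine it exercises local locking only. To reproduce the reported behaviour, the two lockers would have to run on different NFS client machines against the same file under /mnt.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set or clear a POSIX byte-range lock on [start, start + len).
 * cmd is F_SETLK (non-blocking) or F_SETLKW (blocking);
 * type is F_RDLCK, F_WRLCK or F_UNLCK.
 * Returns 0 on success, -1 with errno set on failure. */
static int range_lock(int fd, int cmd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, cmd, &fl);
}

/* Fork a second process that attempts a non-blocking write lock on the
 * same range. Returns 1 if the attempt was denied, 0 if it was granted,
 * and -1 on any other error. (Hypothetical helper for illustration.) */
static int second_locker_denied(const char *path, off_t start, off_t len)
{
    int status, code;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        int fd = open(path, O_RDWR);

        if (fd < 0)
            _exit(2);
        if (range_lock(fd, F_SETLK, F_WRLCK, start, len) == 0)
            _exit(0);               /* granted; released when we exit */
        _exit((errno == EAGAIN || errno == EACCES) ? 1 : 2);
    }
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}
```

While one process holds a write lock on a range, second_locker_denied() on that range should report a denial. The bug described above corresponds to the opposite result: a second client's F_SETLK is granted while the first lock is still held.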




2004-08-18 17:02:19

by Trond Myklebust

Subject: Re: Re: NFSv3 blocking locks broken in 2.6.7

--- linux-2.6.8.1-up/net/sunrpc/svcauth_unix.c.orig 2004-08-14 14:26:55.000000000 -0400
+++ linux-2.6.8.1-up/net/sunrpc/svcauth_unix.c 2004-08-18 12:33:47.000000000 -0400
@@ -368,6 +368,7 @@ svcauth_null_accept(struct svc_rqst *rqs
 	svc_putu32(resv, RPC_AUTH_NULL);
 	svc_putu32(resv, 0);

+#if 0
 	key.m_class = rqstp->rq_server->sv_program->pg_class;
 	key.m_addr = rqstp->rq_addr.sin_addr;

@@ -395,6 +396,10 @@ svcauth_null_accept(struct svc_rqst *rqs

 	if (rqstp->rq_client == NULL && rqstp->rq_proc != 0)
 		*authp = rpc_autherr_badcred;
+#else
+	rqstp->rq_client = NULL;
+	rv = SVC_OK;
+#endif

 	return rv;
 }
@@ -461,6 +466,7 @@ svcauth_unix_accept(struct svc_rqst *rqs
 	}


+#if 0
 	key.m_class = rqstp->rq_server->sv_program->pg_class;
 	key.m_addr = rqstp->rq_addr.sin_addr;

@@ -487,6 +493,9 @@ svcauth_unix_accept(struct svc_rqst *rqs

 	if (rv == SVC_OK && rqstp->rq_client == NULL && rqstp->rq_proc != 0)
 		goto badcred;
+#else
+	rv = SVC_OK;
+#endif

 	/* Put NULL verifier */
 	svc_putu32(resv, RPC_AUTH_NULL);


Attachments:
gnurr.dif (1.06 kB)

2004-08-18 21:30:38

by Marc Eshel

Subject: Re: Re: NFSv3 blocking locks broken in 2.6.7





Hi Trond,

I tried the change that you recommended on the NFS client, but it did not
help. Following is an NLM trace on the NFS server that shows the problem.
Tracing the code points to a problem that is marked below with '*****'.

Thanks, Marc.
-----------------------------------------------------------------------------------------------------
1. Local lock with fcntl(fd, F_SETLK, 1-5)

2. Lock from NFS client with fcntl(fd, F_SETLKW, 1-5) - blocked

Aug 18 13:49:00 lockd: request from 09014042
Aug 18 13:49:00 lockd: LOCK called
Aug 18 13:49:00 lockd: nlm_lookup_host(09014042, p=17, v=4)
Aug 18 13:49:00 lockd: host garbage collection
Aug 18 13:49:00 lockd: nlmsvc_mark_resources
Aug 18 13:49:00 nlm_gc_hosts skipping 9.1.64.66 (cnt 0 use 0 exp 3784570)
Aug 18 13:49:00 lockd: delete host 9.1.64.66
Aug 18 13:49:00 lockd: get host 9.1.64.66
Aug 18 13:49:00 lockd: nlm_file_lookup(02000001 03000300 0006fc81 00070d65 7c32bc32 0006fc81)
Aug 18 13:49:00 lockd: found file c1584c00 (count 1)
Aug 18 13:49:00 lockd: nlmsvc_lock(hda3/462181, ty=1, pi=1380, 1-5, bl=1)
Aug 18 13:49:00 lockd: nlmsvc_lookup_block f=c1584c00 pd=1380 1-5 ty=1
Aug 18 13:49:00 lockd: blocking on this lock (allocating).
Aug 18 13:49:00 lockd: nlm_lookup_host(09014042, p=17, v=4)
Aug 18 13:49:01 lockd: creating host entry
Aug 18 13:49:02 lockd: created block d8d5e000...
Aug 18 13:49:03 lockd: nlmsvc_insert_block(d8d5e000, -1)
Aug 18 13:49:03 lockd: blocking on this lock.
Aug 18 13:49:04 lockd: LOCK status 3
Aug 18 13:49:04 lockd: release host 9.1.64.66
Aug 18 13:49:05 lockd: nlm_release_file(c1584c00, ct = 2)
Aug 18 13:49:05 nlmsvc_retry_blocked(d8d5e000, when=-1)
Aug 18 13:49:05 nlmsvc_retry_blocked(d8d5e000, when=-1)

3. Release local lock with fcntl(fd, F_UNLCK, 1-5)

4. NFS lock granted

Aug 18 13:49:30 lockd: request from 09014042
Aug 18 13:49:30 lockd: LOCK called
Aug 18 13:49:30 lockd: nlm_lookup_host(09014042, p=17, v=4)
Aug 18 13:49:30 lockd: get host 9.1.64.66
Aug 18 13:49:30 lockd: nlm_file_lookup(02000001 03000300 0006fc81 00070d65 7c32bc32 0006fc81)
Aug 18 13:49:30 lockd: found file c1584c00 (count 1)
Aug 18 13:49:30 lockd: nlmsvc_lock(hda3/462181, ty=1, pi=1380, 1-5, bl=1)
Aug 18 13:49:30 lockd: nlmsvc_lookup_block f=c1584c00 pd=1380 1-5 ty=1
Aug 18 13:49:30 lockd: check f=c1584c00 pd=1380 1-5 ty=1 cookie=123d
Aug 18 13:49:30 lockd: nlmsvc_insert_block(d8d5e000, -1)
Aug 18 13:49:30 lockd: LOCK status 3
Aug 18 13:49:30 lockd: release host 9.1.64.66
Aug 18 13:49:30 lockd: nlm_release_file(c1584c00, ct = 2)
Aug 18 13:49:37 nlmsvc_retry_blocked(d8d5e000, when=-1)
Aug 18 13:49:37 nlmsvc_retry_blocked(d8d5e000, when=-1)
Aug 18 13:49:38 lockd: VFS unblock notification for block d8d5e0b0
Aug 18 13:49:39 lockd: nlmsvc_insert_block(d8d5e000, 0)
Aug 18 13:49:39 nlmsvc_retry_blocked(d8d5e000, when=3799958)
Aug 18 13:49:40 nlmsvc_retry_blocked(d8d5e000, when=3799958, done=0)
Aug 18 13:49:42 lockd: grant blocked lock d8d5e000
Aug 18 13:49:43 lockd: GRANTing blocked lock.
Aug 18 13:49:44 lockd: nlmsvc_insert_block(d8d5e000, 30000)
Aug 18 13:49:45 lockd: get host 9.1.64.66
Aug 18 13:49:47 lockd: call procedure 10 on 9.1.64.66 (async)
Aug 18 13:49:48 lockd: nlm_bind_host(09014042)
Aug 18 13:49:49 lockd: GRANT_MSG RPC callback
Aug 18 13:49:50 callback: looking for cookie 123d, host (09014042)
Aug 18 13:49:50 cookie: head of blocked queue d8d5e000, block d8d5e000
Aug 18 13:49:52 lockd: nlmsvc_insert_block(d8d5e000, 60000)
Aug 18 13:49:53 lockd: release host 9.1.64.66
Aug 18 13:49:54 lockd: request from 09014042
Aug 18 13:49:55 lockd: GRANTED_RES called
Aug 18 13:49:55 grant_reply: looking for cookie 123d, host (09014042), s=0
Aug 18 13:49:56 cookie: head of blocked queue d8d5e000, block d8d5e000
Aug 18 13:49:57 cookie: head of blocked queue d8d5e000, block d8d5e000
Aug 18 13:49:57 lockd: deleting block d8d5e000...

***** This appears to be the problem. posix_unblock_lock is called from
***** nlmsvc_delete_block with waiter->fl_next NULL, resulting in a call to
***** posix_lock_file(UNLCK), which releases the lock on the server.
Code from 2.6.8.1: fs/locks.c

void
posix_unblock_lock(struct file *filp, struct file_lock *waiter)
{
	/*
	 * A remote machine may cancel the lock request after it's been
	 * granted locally. If that happens, we need to delete the lock.
	 */
	lock_kernel();
	if (waiter->fl_next) {
		__locks_delete_block(waiter);
		unlock_kernel();
	} else {
		unlock_kernel();
		waiter->fl_type = F_UNLCK;
		posix_lock_file(filp, waiter);	<<<<******
	}
}

Aug 18 13:49:58 lockd: release host 9.1.64.66
Aug 18 13:49:59 lockd: nlm_release_file(c1584c00, ct = 3)
Aug 18 13:50:01 nlmsvc_retry_blocked(00000000, when=0)
Aug 18 13:50:06 nlmsvc_retry_blocked(00000000, when=0)


5. Release NFS lock. (At this point, the lock has already been released on the server.)

Aug 18 13:51:41 lockd: request from 09014042
Aug 18 13:51:41 lockd: UNLOCK called
Aug 18 13:51:41 lockd: nlm_lookup_host(09014042, p=17, v=4)
Aug 18 13:51:41 lockd: host garbage collection
Aug 18 13:51:41 lockd: nlmsvc_mark_resources
Aug 18 13:51:41 lockd: delete host 9.1.64.66
Aug 18 13:51:41 lockd: nsm_unmonitor(9.1.64.66)
Aug 18 13:51:41 lockd: delete host 9.1.64.66
Aug 18 13:51:41 lockd: creating host entry
Aug 18 13:51:41 lockd: nlm_file_lookup(02000001 03000300 0006fc81 00070d65 7c32bc32 0006fc81)
Aug 18 13:51:41 lockd: found file c1584c00 (count 2)
Aug 18 13:51:41 lockd: nlmsvc_unlock(hda3/462181, pi=1380, 0-9223372036854775807)
Aug 18 13:51:41 lockd: nlmsvc_cancel(hda3/462181, pi=1380, 0-9223372036854775807)
Aug 18 13:51:41 lockd: nlmsvc_lookup_block f=c1584c00 pd=1380 0-9223372036854775807 ty=2
Aug 18 13:51:41 lockd: UNLOCK status 0
Aug 18 13:51:41 lockd: release host 9.1.64.66
Aug 18 13:51:41 lockd: nlm_release_file(c1584c00, ct = 3)
Aug 18 13:51:41 nlmsvc_retry_blocked(00000000, when=0)





2004-08-18 23:39:20

by Trond Myklebust

Subject: Re: Re: NFSv3 blocking locks broken in 2.6.7

--- linux-2.6.8.1-grrr/fs/lockd/svclock.c.orig 2004-08-14 14:25:56.000000000 -0400
+++ linux-2.6.8.1-grrr/fs/lockd/svclock.c 2004-08-18 19:34:13.000000000 -0400
@@ -238,8 +238,13 @@ nlmsvc_delete_block(struct nlm_block *bl

 	/* Remove block from list */
 	nlmsvc_remove_block(block);
-	posix_unblock_lock(&file->f_file, fl);
-	block->b_granted = 0;
+	if (fl->fl_next)
+		posix_unblock_lock(&file->f_file, fl);
+	if (unlock) {
+		fl->fl_type = F_UNLCK;
+		posix_lock_file(&file->f_file, fl);
+		block->b_granted = 0;
+	}

 	/* If the block is in the middle of a GRANT callback,
 	 * don't kill it yet. */


Attachments:
gnurr.dif (625.00 B)
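
The effect of the patch can be illustrated with a userspace toy model. This is a sketch under stated assumptions, not kernel code: the toy_* names, the single lock_held flag, and the reduced struct fields are all invented for illustration. Only the branch structure follows posix_unblock_lock() as quoted earlier in the thread and the svclock.c hunk above.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_UNLCK = 0, TOY_WRLCK = 1 };

/* Reduced stand-in for struct file_lock. */
struct toy_lock {
    struct toy_lock *fl_next;   /* non-NULL only while still blocked */
    int fl_type;
};

/* Reduced stand-in for the server-side file: one lock, held or not. */
struct toy_file {
    int lock_held;
};

/* Stand-in for posix_lock_file(): applies fl_type to the file. */
static void toy_lock_file(struct toy_file *f, struct toy_lock *fl)
{
    f->lock_held = (fl->fl_type != TOY_UNLCK);
}

/* Same branches as the posix_unblock_lock() quoted from fs/locks.c:
 * a waiter that is no longer blocked falls through to an unlock. */
static void toy_unblock_lock(struct toy_file *f, struct toy_lock *waiter)
{
    if (waiter->fl_next) {
        waiter->fl_next = NULL;     /* models __locks_delete_block() */
    } else {
        waiter->fl_type = TOY_UNLCK;
        toy_lock_file(f, waiter);
    }
}

/* Pre-patch shape: always calls the unblock helper. */
static void toy_delete_block_old(struct toy_file *f, struct toy_lock *fl)
{
    toy_unblock_lock(f, fl);
}

/* Post-patch shape: unblock only real waiters, and drop the lock
 * only when the caller explicitly asks for it. */
static void toy_delete_block_patched(struct toy_file *f,
                                     struct toy_lock *fl, int unlock)
{
    if (fl->fl_next)
        toy_unblock_lock(f, fl);
    if (unlock) {
        fl->fl_type = TOY_UNLCK;
        toy_lock_file(f, fl);
    }
}
```

In this model, deleting the block for an already granted lock (fl_next == NULL) clears lock_held in the old variant, which matches the behaviour seen in the trace after GRANTED_RES. The patched variant leaves the lock in place unless unlock is non-zero.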

2004-08-19 00:44:48

by Marc Eshel

Subject: Re: Re: NFSv3 blocking locks broken in 2.6.7





Thanks for the quick response, this patch fixed the problem.
Marc.

[email protected] wrote on 08/18/2004 04:39:20 PM:

> On Wed, 18/08/2004 at 17:30, Marc Eshel wrote:
> >
> >
> > Hi Trond,
> >
> > I tried the change that you recommended on the NFS client, but it did not
> > help. Following is an NLM trace on the NFS server that shows the problem.
> > Tracing the code points to a problem that is marked below with '*****'.
> >

> Well debugged!

> Grr... These "design cleanups" are costing a lot of extra work to fix up
> the bugs.

> So the following patch fixes it for you then?

> Cheers,
> Trond
> --- linux-2.6.8.1-grrr/fs/lockd/svclock.c.orig 2004-08-14 14:25:56.000000000 -0400
> +++ linux-2.6.8.1-grrr/fs/lockd/svclock.c 2004-08-18 19:34:13.000000000 -0400
> @@ -238,8 +238,13 @@ nlmsvc_delete_block(struct nlm_block *bl
>
>  	/* Remove block from list */
>  	nlmsvc_remove_block(block);
> -	posix_unblock_lock(&file->f_file, fl);
> -	block->b_granted = 0;
> +	if (fl->fl_next)
> +		posix_unblock_lock(&file->f_file, fl);
> +	if (unlock) {
> +		fl->fl_type = F_UNLCK;
> +		posix_lock_file(&file->f_file, fl);
> +		block->b_granted = 0;
> +	}
>
>  	/* If the block is in the middle of a GRANT callback,
>  	 * don't kill it yet. */



