2008-11-15 13:28:33

by Frank van Maarseveen

Subject: [NLM] 2.6.27 broken

Try running multiple instances of attached program on 1 NFS client
against a 2.6.27(.5) NFSv3 server:

gcc -Wall -Wstrict-prototypes -o lck lck.c
for i in `seq 30`
do
lck &
done

Depending on the client Linux version, one or more processes hang
indefinitely (on 2.6.22) or receive an ENOLCK (on 2.6.27), printing:

lck: fcntl: No locks available

Either way, /proc/locks on the server grows indefinitely.
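
One way to watch for this growth on the server is to count the records in /proc/locks before and after running the reproducer. The following is a minimal sketch and not part of the original report: the helper name count_lock_records is hypothetical, and it assumes the file prints one record per line.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Count the records in a /proc/locks-style file, assuming one record
 * per line. Returns -1 if the file cannot be opened.
 */
long count_lock_records(const char *path)
{
    FILE *f;
    long records = 0;
    int c, last = '\n';

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    while ((c = getc(f)) != EOF) {
        if (c == '\n')
            records++;
        last = c;
    }
    if (last != '\n')   /* final line without a trailing newline */
        records++;
    fclose(f);
    return records;
}
```

Calling count_lock_records("/proc/locks") on the server before the test and again after all client processes have exited should give roughly the same number on a healthy server; a count that keeps rising would match the behaviour described above.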

--
Frank


Attachments:
(No filename) (421.00 B)
lck.c (1.48 kB)

2008-11-28 11:24:49

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27 broken

On Thu, Nov 20, 2008 at 05:27:31PM -0500, J. Bruce Fields wrote:
> On Sat, Nov 15, 2008 at 02:28:31PM +0100, Frank van Maarseveen wrote:
> > Try running multiple instances of attached program on 1 NFS client
> > against a 2.6.27(.5) NFSv3 server:
> >
> > gcc -Wall -Wstrict-prototypes -o lck lck.c
> > for i in `seq 30`
> > do
> > lck &
> > done
>
> Or reproducible using the "flock" utility with:
>
> for i in `seq 30`
> do
> flock /mnt/foo sleep 10
> done
>
> Hm. What's the last known good server version?

2.6.24.4

--
Frank

2008-11-20 22:27:34

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Sat, Nov 15, 2008 at 02:28:31PM +0100, Frank van Maarseveen wrote:
> Try running multiple instances of attached program on 1 NFS client
> against a 2.6.27(.5) NFSv3 server:
>
> gcc -Wall -Wstrict-prototypes -o lck lck.c
> for i in `seq 30`
> do
> lck &
> done

Or reproducible using the "flock" utility with:

for i in `seq 30`
do
flock /mnt/foo sleep 10
done

Hm. What's the last known good server version?

--b.

>
> Depending on the client Linux version, one or more processes hang
> indefinitely (on 2.6.22) or receive an ENOLCK (on 2.6.27), printing:
>
> lck: fcntl: No locks available
>
> Either way, /proc/locks on the server grows indefinitely.
>
> --
> Frank

> #include <stdio.h>
> #include <ctype.h>
> #include <unistd.h>
> #include <fcntl.h>
> #include <errno.h>
> #include <string.h>
> #include <stdarg.h>
> #include <stdlib.h>
>
> void die(const char *fmt, ...) __attribute__((format(printf, 1, 2), noreturn));
> void die(const char *fmt, ...)
> {
>     va_list ap;
>
>     va_start(ap, fmt);
>     fprintf(stderr, "lck: ");
>     vfprintf(stderr, fmt, ap);
>     va_end(ap);
>     exit(1);
> }
>
> int main(int argc, char **argv)
> {
>     struct flock flock = {0};
>     int i, d, locktime, cmd;
>     const char *name;
>
>     flock.l_type = F_WRLCK; /* -w */
>     flock.l_whence = SEEK_SET;
>     cmd = F_SETLKW; /* no -t */
>     name = NULL;
>     locktime = 10;
>     for (i = 1; i < argc; ++i) {
>         if (strcmp(argv[i], "-r") == 0)
>             flock.l_type = F_RDLCK; /* lock for N readers */
>         else if (strcmp(argv[i], "-w") == 0)
>             flock.l_type = F_WRLCK; /* lock for 1 writer */
>         else if (strcmp(argv[i], "-t") == 0)
>             cmd = F_SETLK; /* test for a lock, don't wait */
>         else if (argv[i][0] == '-')
>             die("Usage: lck [-r|-w] [-t] [<filename> [<locktime>]]\n");
>         else if (name && isdigit(argv[i][0]))
>             locktime = atoi(argv[i]); /* after acquiring lock, wait locktime seconds */
>         else
>             name = argv[i];
>     }
>     if (!name)
>         name = "lck-filename";
>     d = open(name, O_RDWR|O_CREAT, 0666);
>     if (d == -1)
>         die("open %s: %s\n", name, strerror(errno));
>     if (fcntl(d, cmd, &flock) == -1)
>         die("fcntl: %s\n", strerror(errno));
>     printf("locked...");
>     fflush(NULL);
>     sleep(locktime);
>     if (close(d))
>         die("close: %s\n", strerror(errno));
>     printf("unlocked.\n");
>     return 0;
> }


2008-12-16 17:39:25

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Fri, Nov 28, 2008 at 12:24:47PM +0100, Frank van Maarseveen wrote:
> On Thu, Nov 20, 2008 at 05:27:31PM -0500, J. Bruce Fields wrote:
> > On Sat, Nov 15, 2008 at 02:28:31PM +0100, Frank van Maarseveen wrote:
> > > Try running multiple instances of attached program on 1 NFS client
> > > against a 2.6.27(.5) NFSv3 server:
> > >
> > > gcc -Wall -Wstrict-prototypes -o lck lck.c
> > > for i in `seq 30`
> > > do
> > > lck &
> > > done
> >
> > Or reproducible using the "flock" utility with:
> >
> > for i in `seq 30`
> > do
> > flock /mnt/foo sleep 10

(Sorry, note there should be an ampersand at the end there....)

> > done
> >
> > Hm. What's the last known good server version?
>
> 2.6.24.4

More precisely, it looks like this started with

bde74e4bc64415b142e "locks: add special return value for
asynchronous locks"

But I haven't had the chance to look any harder yet. Miklos? Is this
easy for you to reproduce?

--b.

2008-12-16 19:43:52

by Miklos Szeredi

Subject: Re: [NLM] 2.6.27 broken

On Tue, 2008-12-16 at 12:39 -0500, J. Bruce Fields wrote:
> More precisely, it looks like this started with
>
> bde74e4bc64415b142e "locks: add special return value for
> asynchronous locks"
>
> But I haven't had the chance to look any harder yet. Miklos? Is this
> easy for you to reproduce?

Not immediately, at the moment I don't have NFS set up. But if you
don't beat me to it, I'll look into this.

Thanks,
Miklos



2008-12-16 20:16:14

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Tue, Dec 16, 2008 at 08:43:52PM +0100, Miklos Szeredi wrote:
> On Tue, 2008-12-16 at 12:39 -0500, J. Bruce Fields wrote:
> > More precisely, it looks like this started with
> >
> > bde74e4bc64415b142e "locks: add special return value for
> > asynchronous locks"
> >
> > But I haven't had the chance to look any harder yet. Miklos? Is this
> > easy for you to reproduce?
>
> Not immediately, at the moment I don't have NFS set up. But if you
> don't beat me to it, I'll look into this.

OK, thanks. I'll take another look too when I get the chance, so let me
know of any partial result.

It may just for example be returning the wrong error to the client on an
nlm blocking lock request, so that the client assumes the lock is gone
and goes away rather than waiting for a grant request.

--b.

2009-02-04 23:33:46

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Tue, Dec 16, 2008 at 03:16:10PM -0500, bfields wrote:
> On Tue, Dec 16, 2008 at 08:43:52PM +0100, Miklos Szeredi wrote:
> > On Tue, 2008-12-16 at 12:39 -0500, J. Bruce Fields wrote:
> > > More precisely, it looks like this started with
> > >
> > > bde74e4bc64415b142e "locks: add special return value for
> > > asynchronous locks"
> > >
> > > But I haven't had the chance to look any harder yet. Miklos? Is this
> > > easy for you to reproduce?
> >
> > Not immediately, at the moment I don't have NFS set up. But if you
> > don't beat me to it, I'll look into this.
>
> OK, thanks. I'll take another look too when I get the chance, so let me
> know of any partial result.
>
> It may just for example be returning the wrong error to the client on an
> nlm blocking lock request, so that the client assumes the lock is gone
> and goes away rather than waiting for a grant request.

Sorry, I've gotten a bit backlogged, but I finally got back to this. If
there are no objections, the following is what I intend to submit.

--b.

commit cb8b864ea6addd3a3e72fe835aafecec63f06cbd
Author: J. Bruce Fields <[email protected]>
Date: Wed Feb 4 17:35:38 2009 -0500

lockd: fix regression in lockd's handling of blocked locks

If a client requests a blocking lock, is denied, then requests it again,
then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP
set, because we've already queued a block and don't need the locks code
to do it again.

But that means vfs_lock_file() will return -EAGAIN instead of
FILE_LOCK_DENIED. So we still need to translate that -EAGAIN return
into a nlm_lck_blocked error in this case, and put ourselves back on
lockd's block list.

The bug was introduced by bde74e4bc64415b1 "locks: add special return
value for asynchronous locks".

Thanks to Frank van Maarseveen for the report; his original test
case was essentially

for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done

Cc: Frank van Maarseveen <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 6063a8e..763b78a 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -427,7 +427,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
             goto out;
         case -EAGAIN:
             ret = nlm_lck_denied;
-            goto out;
+            break;
         case FILE_LOCK_DEFERRED:
             if (wait)
                 break;
@@ -443,6 +443,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
             goto out;
     }

+    ret = nlm_lck_denied;
+    if (!wait)
+        goto out;
+
     ret = nlm_lck_blocked;

     /* Append to list of blocked */
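
To make the effect of the two hunks easier to follow, here is a small user-space model of how nlmsvc_lock() picks its reply before and after the patch. It is only a sketch under stated assumptions: the enum names and helper functions are invented for illustration, MODEL_LOCK_DEFERRED merely stands in for the kernel's FILE_LOCK_DEFERRED, and every case the diff does not show is folded into R_OTHER.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins; only their distinctness matters in this model. */
enum { MODEL_LOCK_DEFERRED = 1 };
enum model_reply { R_GRANTED, R_DENIED, R_BLOCKED, R_OTHER };

/* 2.6.27 behaviour: -EAGAIN always ends in "denied", even when waiting. */
enum model_reply reply_before_fix(int error, bool wait)
{
    switch (error) {
    case 0:
        return R_GRANTED;
    case -EAGAIN:
        return R_DENIED;            /* "goto out" with nlm_lck_denied */
    case MODEL_LOCK_DEFERRED:
        if (wait)
            break;
        return R_OTHER;             /* path not shown in the diff */
    default:
        return R_OTHER;
    }
    return R_BLOCKED;               /* shared tail: append to blocked list */
}

/* Patched behaviour: -EAGAIN falls through to the shared tail. */
enum model_reply reply_after_fix(int error, bool wait)
{
    switch (error) {
    case 0:
        return R_GRANTED;
    case -EAGAIN:
        break;
    case MODEL_LOCK_DEFERRED:
        if (wait)
            break;
        return R_OTHER;
    default:
        return R_OTHER;
    }
    if (!wait)
        return R_DENIED;
    return R_BLOCKED;
}

/* The regression: a waiting client was told "denied" instead of "blocked". */
void check_regression_is_fixed(void)
{
    assert(reply_before_fix(-EAGAIN, true) == R_DENIED);
    assert(reply_after_fix(-EAGAIN, true) == R_BLOCKED);
}
```

In this model the only input whose reply changes is (-EAGAIN, wait), which is the retried blocking request described in the commit message.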

2009-02-05 10:47:13

by Miklos Szeredi

Subject: Re: [NLM] 2.6.27 broken

On Wed, 2009-02-04 at 18:33 -0500, J. Bruce Fields wrote:
> On Tue, Dec 16, 2008 at 03:16:10PM -0500, bfields wrote:
> > On Tue, Dec 16, 2008 at 08:43:52PM +0100, Miklos Szeredi wrote:
> > > On Tue, 2008-12-16 at 12:39 -0500, J. Bruce Fields wrote:
> > > > More precisely, it looks like this started with
> > > >
> > > > bde74e4bc64415b142e "locks: add special return value for
> > > > asynchronous locks"
> > > >
> > > > But I haven't had the chance to look any harder yet. Miklos? Is this
> > > > easy for you to reproduce?
> > >
> > > Not immediately, at the moment I don't have NFS set up. But if you
> > > don't beat me to it, I'll look into this.
> >
> > OK, thanks. I'll take another look too when I get the chance, so let me
> > know of any partial result.
> >
> > It may just for example be returning the wrong error to the client on an
> > nlm blocking lock request, so that the client assumes the lock is gone
> > and goes away rather than waiting for a grant request.
>
> Sorry, I've gotten a bit backlogged, but I finally got back to this. If
> there's no objections, the following is what I intend to submit.

OK (though I don't really understand why we make a lock request to the
VFS _at all_ if we know the lock is already queued???).

But I think at least a comment in the code would be in order, or this
same mistake might be made again. Also I think the original code flow
is somewhat illogical.

How about this (it's essentially the same patch just a bit rearranged,
the authorship is still yours of course ;)

Thanks,
Miklos

Index: linux-2.6/fs/lockd/svclock.c
===================================================================
--- linux-2.6.orig/fs/lockd/svclock.c 2009-01-26 14:47:48.000000000 +0100
+++ linux-2.6/fs/lockd/svclock.c 2009-02-05 11:42:20.000000000 +0100
@@ -426,6 +426,13 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru
             ret = nlm_granted;
             goto out;
         case -EAGAIN:
+            /*
+             * If this is a blocking request for an
+             * already pending lock request then we need
+             * to put it back on lockd's block list
+             */
+            if (wait)
+                break;
             ret = nlm_lck_denied;
             goto out;
         case FILE_LOCK_DEFERRED:





2009-02-05 11:25:01

by Frank van Maarseveen

Subject: Re: [NLM] 2.6.27 broken

On Wed, Feb 04, 2009 at 06:33:48PM -0500, J. Bruce Fields wrote:
> On Tue, Dec 16, 2008 at 03:16:10PM -0500, bfields wrote:
> > On Tue, Dec 16, 2008 at 08:43:52PM +0100, Miklos Szeredi wrote:
> > > On Tue, 2008-12-16 at 12:39 -0500, J. Bruce Fields wrote:
> > > > More precisely, it looks like this started with
> > > >
> > > > bde74e4bc64415b142e "locks: add special return value for
> > > > asynchronous locks"
> > > >
> > > > But I haven't had the chance to look any harder yet. Miklos? Is this
> > > > easy for you to reproduce?
> > >
> > > Not immediately, at the moment I don't have NFS set up. But if you
> > > don't beat me to it, I'll look into this.
> >
> > OK, thanks. I'll take another look too when I get the chance, so let me
> > know of any partial result.
> >
> > It may just for example be returning the wrong error to the client on an
> > nlm blocking lock request, so that the client assumes the lock is gone
> > and goes away rather than waiting for a grant request.
>
> Sorry, I've gotten a bit backlogged, but I finally got back to this. If
> there's no objections, the following is what I intend to submit.
>
> --b.
>
> commit cb8b864ea6addd3a3e72fe835aafecec63f06cbd
> Author: J. Bruce Fields <[email protected]>
> Date: Wed Feb 4 17:35:38 2009 -0500
>
> lockd: fix regression in lockd's handling of blocked locks
>
> If a client requests a blocking lock, is denied, then requests it again,
> then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP
> set, because we've already queued a block and don't need the locks code
> to do it again.
>
> But that means vfs_lock_file() will return -EAGAIN instead of
> FILE_LOCK_DENIED. So we still need to translate that -EAGAIN return
> into a nlm_lck_blocked error in this case, and put ourselves back on
> lockd's block list.
>
> The bug was introduced by bde74e4bc64415b1 "locks: add special return
> value for asynchronous locks".
>
> Thanks to Frank van Maarseveen for the report; his original test
> case was essentially
>
> for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done
>
> Cc: Frank van Maarseveen <[email protected]>
> Cc: Miklos Szeredi <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> index 6063a8e..763b78a 100644
> --- a/fs/lockd/svclock.c
> +++ b/fs/lockd/svclock.c
> @@ -427,7 +427,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> goto out;
> case -EAGAIN:
> ret = nlm_lck_denied;
> - goto out;
> + break;
> case FILE_LOCK_DEFERRED:
> if (wait)
> break;
> @@ -443,6 +443,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> goto out;
> }
>
> + ret = nlm_lck_denied;
> + if (!wait)
> + goto out;
> +
> ret = nlm_lck_blocked;
>
> /* Append to list of blocked */


fix confirmed, thanks!

--
Frank

2009-02-05 19:52:00

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Thu, Feb 05, 2009 at 11:47:09AM +0100, Miklos Szeredi wrote:
> On Wed, 2009-02-04 at 18:33 -0500, J. Bruce Fields wrote:
> > On Tue, Dec 16, 2008 at 03:16:10PM -0500, bfields wrote:
> > > On Tue, Dec 16, 2008 at 08:43:52PM +0100, Miklos Szeredi wrote:
> > > > On Tue, 2008-12-16 at 12:39 -0500, J. Bruce Fields wrote:
> > > > > More precisely, it looks like this started with
> > > > >
> > > > > bde74e4bc64415b142e "locks: add special return value for
> > > > > asynchronous locks"
> > > > >
> > > > > But I haven't had the chance to look any harder yet. Miklos? Is this
> > > > > easy for you to reproduce?
> > > >
> > > > Not immediately, at the moment I don't have NFS set up. But if you
> > > > don't beat me to it, I'll look into this.
> > >
> > > OK, thanks. I'll take another look too when I get the chance, so let me
> > > know of any partial result.
> > >
> > > It may just for example be returning the wrong error to the client on an
> > > nlm blocking lock request, so that the client assumes the lock is gone
> > > and goes away rather than waiting for a grant request.
> >
> > Sorry, I've gotten a bit backlogged, but I finally got back to this. If
> > there's no objections, the following is what I intend to submit.
>
> OK (though I don't really understand why we make a lock request to the
> VFS _at all_ if we know the lock is already queued???).

I think you're right, we might be able to bypass the lock entirely in
that case, but we'd need to think about it carefully.

> But I think at least a comment in the code would be in order, or this
> same mistake might be made again. Also I think the original code flow
> is somewhat illogical.

Yeah, I was literally just reverting the problematic lines of your
previous commit. I'd rather keep it that way for now, just as a clear
separation between the revert/bugfix and the cleanup.

> How about this (it's essentially the same patch just a bit rearranged,
> the authorship is still yours of course ;)

... but would happily queue up the cleanup for 2.6.30.

Actually, I find it strange to have just that single case which breaks,
so that the code after the switch, which looks like it should be shared,
actually just applies to one case. I'd be inclined to just suck
everything up to "out:" into the -EAGAIN case and then make all cases
"goto out" (or, equivalently, break).

--b.

> Thanks,
> Miklos
>
> Index: linux-2.6/fs/lockd/svclock.c
> ===================================================================
> --- linux-2.6.orig/fs/lockd/svclock.c 2009-01-26 14:47:48.000000000 +0100
> +++ linux-2.6/fs/lockd/svclock.c 2009-02-05 11:42:20.000000000 +0100
> @@ -426,6 +426,13 @@ nlmsvc_lock(struct svc_rqst *rqstp, stru
> ret = nlm_granted;
> goto out;
> case -EAGAIN:
> + /*
> + * If this is a blocking request for an
> + * already pending lock request then we need
> + * to put it back on lockd's block list
> + */
> + if (wait)
> + break;
> ret = nlm_lck_denied;
> goto out;
> case FILE_LOCK_DEFERRED:
>
>
>
>

2009-02-05 19:52:15

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Thu, Feb 05, 2009 at 11:21:53AM +0100, Frank van Maarseveen wrote:
> On Wed, Feb 04, 2009 at 06:33:48PM -0500, J. Bruce Fields wrote:
> > diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> > index 6063a8e..763b78a 100644
> > --- a/fs/lockd/svclock.c
> > +++ b/fs/lockd/svclock.c
> > @@ -427,7 +427,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> > goto out;
> > case -EAGAIN:
> > ret = nlm_lck_denied;
> > - goto out;
> > + break;
> > case FILE_LOCK_DEFERRED:
> > if (wait)
> > break;
> > @@ -443,6 +443,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
> > goto out;
> > }
> >
> > + ret = nlm_lck_denied;
> > + if (!wait)
> > + goto out;
> > +
> > ret = nlm_lck_blocked;
> >
> > /* Append to list of blocked */
>
>
> fix confirmed, thanks!

Good, thanks.

--b.

2009-02-06 11:30:01

by Miklos Szeredi

Subject: Re: [NLM] 2.6.27 broken

On Thu, 2009-02-05 at 14:52 -0500, J. Bruce Fields wrote:
> On Thu, Feb 05, 2009 at 11:47:09AM +0100, Miklos Szeredi wrote:
> > But I think at least a comment in the code would be in order, or this
> > same mistake might be made again. Also I think the original code flow
> > is somewhat illogical.
>
> Yeah, I was literally just reverting the problematic lines of your
> previous commit. I'd rather keep it that way for now, just as a clear
> separation between the revert/bugfix and the cleanup.

OK.

> > How about this (it's essentially the same patch just a bit rearranged,
> > the authorship is still yours of course ;)
>
> ... but would happily queue up the cleanup for 2.6.30.

Cool.

> Actually, I find it strange to have just that single case which breaks,
> so that the code after the switch, which looks like it should be shared,
> actually just applies to one case. I'd be inclined to just suck
> everything up to "out:" into the -EAGAIN case and then make all cases
> "goto out" (or, equivalently, break).

Yes, but it needs to be sucked into the FILE_LOCK_DEFERRED case as well.
It's just two lines and one of them is setting the error value, so it's
not real duplication.
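
As a rough illustration of that alternative structure (a sketch only, not a proposed patch): every case returns directly, and the two waiting cases each repeat the short "mark as blocked and append to the list" step. All names here are hypothetical, DEMO_LOCK_DEFERRED stands in for FILE_LOCK_DEFERRED, and the paths not discussed in this thread are collapsed into DEMO_OTHER.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum { DEMO_LOCK_DEFERRED = 1 };    /* stand-in for FILE_LOCK_DEFERRED */
enum demo_reply { DEMO_GRANTED, DEMO_DENIED, DEMO_BLOCKED, DEMO_OTHER };

/* Models lockd's list of blocked requests with a simple counter. */
static int demo_blocked_list_len;

/* The two duplicated lines: set the "blocked" reply and queue the block. */
static enum demo_reply append_to_blocked_list(void)
{
    demo_blocked_list_len++;
    return DEMO_BLOCKED;
}

/* No code after the switch: each case decides its own reply. */
enum demo_reply reply_without_shared_tail(int error, bool wait)
{
    switch (error) {
    case 0:
        return DEMO_GRANTED;
    case -EAGAIN:
        if (wait)
            return append_to_blocked_list();
        return DEMO_DENIED;
    case DEMO_LOCK_DEFERRED:
        if (wait)
            return append_to_blocked_list();
        return DEMO_OTHER;          /* path not discussed in this thread */
    default:
        return DEMO_OTHER;
    }
}

/* Both waiting cases must end up on the blocked list. */
void check_both_wait_cases_queue(void)
{
    int before = demo_blocked_list_len;

    assert(reply_without_shared_tail(-EAGAIN, true) == DEMO_BLOCKED);
    assert(reply_without_shared_tail(DEMO_LOCK_DEFERRED, true) == DEMO_BLOCKED);
    assert(demo_blocked_list_len == before + 2);
}
```

The duplication amounts to one call per waiting case, which is the trade-off weighed above against keeping a shared tail after the switch.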

Thanks,
Miklos



2009-02-09 18:10:31

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Fri, Feb 06, 2009 at 12:29:58PM +0100, Miklos Szeredi wrote:
> On Thu, 2009-02-05 at 14:52 -0500, J. Bruce Fields wrote:
> > On Thu, Feb 05, 2009 at 11:47:09AM +0100, Miklos Szeredi wrote:
> > > But I think at least a comment in the code would be in order, or this
> > > same mistake might be made again. Also I think the original code flow
> > > is somewhat illogical.
> >
> > Yeah, I was literally just reverting the problematic lines of your
> > previous commit. I'd rather keep it that way for now, just as a clear
> > separation between the revert/bugfix and the cleanup.
>
> OK.
>
> > > How about this (it's essentially the same patch just a bit rearranged,
> > > the authorship is still yours of course ;)
> >
> > ... but would happily queue up the cleanup for 2.6.30.
>
> Cool.
>
> > Actually, I find it strange to have just that single case which breaks,
> > so that the code after the switch, which looks like it should be shared,
> > actually just applies to one case. I'd be inclined to just suck
> > everything up to "out:" into the -EAGAIN case and then make all cases
> > "goto out" (or, equivalently, break).
>
> Yes, but it needs to be sucked into the FILE_LOCK_DEFERRED case as well.
> It's just two lines and one of them is setting the error value, so it's
> not real duplication.

Whoops, right, missed that; so, I'm applying the below, sending the
fixup in now, and queuing up the cleanup for 2.6.30 (with the blame
assigned back to you, hah--object or have me add your signed-off-by).

--b.


commit c4a06d0957ea5b386b1cd83fa9a9d6c19b736346
Author: Miklos Szeredi <[email protected]>
Date: Mon Feb 9 12:30:43 2009 -0500

lockd: clean up blocking lock cases of nlmsvc_lock()

No change in behavior, just rearranging the switch so that we break out
of the switch if and only if we're in the wait case.

Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 763b78a..83ee342 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -426,8 +426,15 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
             ret = nlm_granted;
             goto out;
         case -EAGAIN:
+            /*
+             * If this is a blocking request for an
+             * already pending lock request then we need
+             * to put it back on lockd's block list
+             */
+            if (wait)
+                break;
             ret = nlm_lck_denied;
-            break;
+            goto out;
         case FILE_LOCK_DEFERRED:
             if (wait)
                 break;
@@ -443,10 +450,6 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
             goto out;
     }

-    ret = nlm_lck_denied;
-    if (!wait)
-        goto out;
-
     ret = nlm_lck_blocked;

     /* Append to list of blocked */
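
The "no change in behavior" claim can be checked on a small user-space model by comparing the switch before and after this cleanup over every modelled input. This is a sketch under assumptions: the names are hypothetical, SKETCH_LOCK_DEFERRED stands in for FILE_LOCK_DEFERRED, and all paths outside the diff context are collapsed into SK_OTHER.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum { SKETCH_LOCK_DEFERRED = 1 };  /* stand-in for FILE_LOCK_DEFERRED */
enum sketch_reply { SK_GRANTED, SK_DENIED, SK_BLOCKED, SK_OTHER };

/* Switch as left by the regression fix: -EAGAIN breaks to a shared tail. */
enum sketch_reply reply_before_cleanup(int error, bool wait)
{
    switch (error) {
    case 0:
        return SK_GRANTED;
    case -EAGAIN:
        break;
    case SKETCH_LOCK_DEFERRED:
        if (wait)
            break;
        return SK_OTHER;
    default:
        return SK_OTHER;
    }
    if (!wait)
        return SK_DENIED;
    return SK_BLOCKED;
}

/* Switch after the cleanup: break out if and only if we are waiting. */
enum sketch_reply reply_after_cleanup(int error, bool wait)
{
    switch (error) {
    case 0:
        return SK_GRANTED;
    case -EAGAIN:
        if (wait)
            break;
        return SK_DENIED;
    case SKETCH_LOCK_DEFERRED:
        if (wait)
            break;
        return SK_OTHER;
    default:
        return SK_OTHER;
    }
    return SK_BLOCKED;
}

/* Compare both versions on every modelled (error, wait) combination. */
void check_cleanup_preserves_behavior(void)
{
    static const int errors[] = { 0, -EAGAIN, SKETCH_LOCK_DEFERRED, -EDEADLK };
    size_t i;
    int w;

    for (i = 0; i < sizeof(errors) / sizeof(errors[0]); i++)
        for (w = 0; w < 2; w++)
            assert(reply_before_cleanup(errors[i], w != 0) ==
                   reply_after_cleanup(errors[i], w != 0));
}
```

The check covers only the return values modelled here; it says nothing about side effects in the real function.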

commit 716cb6d7901f92bdfe1c80dbf4765027dceab384
Author: J. Bruce Fields <[email protected]>
Date: Wed Feb 4 17:35:38 2009 -0500

lockd: fix regression in lockd's handling of blocked locks

If a client requests a blocking lock, is denied, then requests it again,
then here in nlmsvc_lock() we will call vfs_lock_file() without FL_SLEEP
set, because we've already queued a block and don't need the locks code
to do it again.

But that means vfs_lock_file() will return -EAGAIN instead of
FILE_LOCK_DENIED. So we still need to translate that -EAGAIN return
into a nlm_lck_blocked error in this case, and put ourselves back on
lockd's block list.

The bug was introduced by bde74e4bc64415b1 "locks: add special return
value for asynchronous locks".

Thanks to Frank van Maarseveen for the report; his original test
case was essentially

for i in `seq 30`; do flock /nfsmount/foo sleep 10 & done

Tested-by: Frank van Maarseveen <[email protected]>
Reported-by: Frank van Maarseveen <[email protected]>
Cc: Miklos Szeredi <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 6063a8e..763b78a 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -427,7 +427,7 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
             goto out;
         case -EAGAIN:
             ret = nlm_lck_denied;
-            goto out;
+            break;
         case FILE_LOCK_DEFERRED:
             if (wait)
                 break;
@@ -443,6 +443,10 @@ nlmsvc_lock(struct svc_rqst *rqstp, struct nlm_file *file,
             goto out;
     }

+    ret = nlm_lck_denied;
+    if (!wait)
+        goto out;
+
     ret = nlm_lck_blocked;

     /* Append to list of blocked */

2009-02-09 20:18:40

by Miklos Szeredi

Subject: Re: [NLM] 2.6.27 broken

On Mon, 2009-02-09 at 13:10 -0500, J. Bruce Fields wrote:
> On Fri, Feb 06, 2009 at 12:29:58PM +0100, Miklos Szeredi wrote:
> > On Thu, 2009-02-05 at 14:52 -0500, J. Bruce Fields wrote:
> > > On Thu, Feb 05, 2009 at 11:47:09AM +0100, Miklos Szeredi wrote:
> > > > But I think at least a comment in the code would be in order, or this
> > > > same mistake might be made again. Also I think the original code flow
> > > > is somewhat illogical.
> > >
> > > Yeah, I was literally just reverting the problematic lines of your
> > > previous commit. I'd rather keep it that way for now, just as a clear
> > > separation between the revert/bugfix and the cleanup.
> >
> > OK.
> >
> > > > How about this (it's essentially the same patch just a bit rearranged,
> > > > the authorship is still yours of course ;)
> > >
> > > ... but would happily queue up the cleanup for 2.6.30.
> >
> > Cool.
> >
> > > Actually, I find it strange to have just that single case which breaks,
> > > so that the code after the switch, which looks like it should be shared,
> > > actually just applies to one case. I'd be inclined to just suck
> > > everything up to "out:" into the -EAGAIN case and then make all cases
> > > "goto out" (or, equivalently, break).
> >
> > Yes, but it needs to be sucked into the FILE_LOCK_DEFERRED case as well.
> > It's just two lines and one of them is setting the error value, so it's
> > not real duplication.
>
> Whoops, right, missed that; so, I'm applying the below, sending the
> fixup in now, and queuing up the cleanup for 2.6.30 (with the blame
> assigned back to you, hah--object or have me add your signed-off-by).

No objections :)

Signed-off-by: Miklos Szeredi <[email protected]>

BTW, one tip for stable patches: if you add a "Cc: [email protected]"
line to the Signed-off-by block, then it will ease the patch's way into
the stable kernels as it will automatically be picked up by Greg's
scripts when it hits the mainline tree.

Thanks,
Miklos



2009-02-09 20:51:06

by J. Bruce Fields

Subject: Re: [NLM] 2.6.27 broken

On Mon, Feb 09, 2009 at 09:18:37PM +0100, Miklos Szeredi wrote:
> On Mon, 2009-02-09 at 13:10 -0500, J. Bruce Fields wrote:
> > On Fri, Feb 06, 2009 at 12:29:58PM +0100, Miklos Szeredi wrote:
> > > On Thu, 2009-02-05 at 14:52 -0500, J. Bruce Fields wrote:
> > > > On Thu, Feb 05, 2009 at 11:47:09AM +0100, Miklos Szeredi wrote:
> > > > > But I think at least a comment in the code would be in order, or this
> > > > > same mistake might be made again. Also I think the original code flow
> > > > > is somewhat illogical.
> > > >
> > > > Yeah, I was literally just reverting the problematic lines of your
> > > > previous commit. I'd rather keep it that way for now, just as a clear
> > > > separation between the revert/bugfix and the cleanup.
> > >
> > > OK.
> > >
> > > > > How about this (it's essentially the same patch just a bit rearranged,
> > > > > the authorship is still yours of course ;)
> > > >
> > > > ... but would happily queue up the cleanup for 2.6.30.
> > >
> > > Cool.
> > >
> > > > Actually, I find it strange to have just that single case which breaks,
> > > > so that the code after the switch, which looks like it should be shared,
> > > > actually just applies to one case. I'd be inclined to just suck
> > > > everything up to "out:" into the -EAGAIN case and then make all cases
> > > > "goto out" (or, equivalently, break).
> > >
> > > Yes, but it needs to be sucked into the FILE_LOCK_DEFERRED case as well.
> > > It's just two lines and one of them is setting the error value, so it's
> > > not real duplication.
> >
> > Whoops, right, missed that; so, I'm applying the below, sending the
> > fixup in now, and queuing up the cleanup for 2.6.30 (with the blame
> > assigned back to you, hah--object or have me add your signed-off-by).
>
> No objections :)
>
> Signed-off-by: Miklos Szeredi <[email protected]>
>
> BTW, one tip for stable patches: if you add a "Cc: [email protected]"
> line to the Signed-off-by block, then it will ease the patch's way into
> the stable kernels as it will automatically be picked up by Greg's
> scripts when it hits the mainline tree.

Thanks! I do cc: [email protected], but hadn't thought of adding that
to the changelog itself--makes sense, I'll do that next time.

--b.