2008-11-19 21:37:07

by Jeff Layton

[permalink] [raw]
Subject: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

Dave Teigland opened a bug stating that he was having some problems with
DLM and lockd. Essentially, the problem boiled down to the fact that
when you do a posix lock of a region that touches another lock, the VFS
will coalesce the locks and return a lock that encompasses the whole
range.

The problem here is that DLM then does a fl_grant callback to lockd with
the new coalesced lock. The fl_grant callback then looks through all
of the blocks and eventually returns -ENOENT since none match the
coalesced lock.

I'm having a very hard time tracking down info about how the fl_grant
callback is supposed to work. Is it OK to send an fl_grant callback
with a lock that's larger than the one requested? If so, then lockd
needs to account for that possibility. Also, what exactly is the
purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?

What follows is a patch that changes nlmsvc_grant_deferred to account
for the possibility that it may receive an fl_grant that has already
been coalesced. It changes nlmsvc_grant_deferred to walk the entire
list of blocks and grant any blocks that are covered by the range of
the lock in the grant callback, if doing so will not conflict with an
earlier grant.

The patch is still very rough and is probably broken in subtle (and
maybe overt) ways, but it fixes the reproducer that Dave provided. It's
just intended as a starting point for discussion. Can anyone clarify how
fl_grant is supposed to work? Who's wrong here? DLM or NLM?

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svclock.c | 58 ++++++++++++++++++++++++++++++++++---------
include/linux/lockd/lockd.h | 35 ++++++++++++++++++++++++++
2 files changed, 81 insertions(+), 12 deletions(-)

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 6063a8e..5ecd1db 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -611,6 +611,35 @@ nlmsvc_cancel_blocked(struct nlm_file *file, struct nlm_lock *lock)
}

/*
+ * Would granting this block cause a conflict with one already granted? Check
+ * by walking the f_blocks list for the file.
+ */
+static int
+nlmsvc_check_overlap(struct nlm_block *block)
+{
+ struct nlm_block *fblock, *next;
+
+ list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
+ b_flist) {
+ /* skip myself */
+ if (fblock == block)
+ continue;
+
+ /* skip any locks that aren't granted */
+ if (!fblock->b_granted)
+ continue;
+
+ /* do block and fblock have any overlap in their ranges? */
+ if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
+ &fblock->b_call->a_args.lock.fl))
+ return 1;
+ }
+
+ /* no conflicts found */
+ return 0;
+}
+
+/*
* This is a callback from the filesystem for VFS file lock requests.
* It will be used if fl_grant is defined and the filesystem can not
* respond to the request immediately.
@@ -625,9 +654,10 @@ nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
int result)
{
block->b_flags |= B_GOT_CALLBACK;
- if (result == 0)
- block->b_granted = 1;
- else
+ if (result == 0) {
+ if (!nlmsvc_check_overlap(block))
+ block->b_granted = 1;
+ } else
block->b_flags |= B_TIMED_OUT;
if (conf) {
if (block->b_fl)
@@ -643,22 +673,26 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,

lock_kernel();
list_for_each_entry(block, &nlm_blocked, b_list) {
- if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
- dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
- block, block->b_flags);
+ if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
+ dprintk("lockd: nlmsvc_notify_blocked block %p flags "
+ "%d\n", block, block->b_flags);
+
if (block->b_flags & B_QUEUED) {
if (block->b_flags & B_TIMED_OUT) {
- rc = -ENOLCK;
- break;
+ if (rc == -ENOENT)
+ rc = -ENOLCK;
+ continue;
}
- nlmsvc_update_deferred_block(block, conf, result);
- } else if (result == 0)
- block->b_granted = 1;
+ nlmsvc_update_deferred_block(block, conf,
+ result);
+ } else if (result == 0) {
+ if (!nlmsvc_check_overlap(block))
+ block->b_granted = 1;
+ }

nlmsvc_insert_block(block, 0);
svc_wake_up(block->b_daemon);
rc = 0;
- break;
}
}
unlock_kernel();
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index b56d5aa..aa38d47 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -367,6 +367,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
&&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
}

+/*
+ * Compare two NLM locks.
+ * When the second lock is of type F_UNLCK, this acts like a wildcard.
+ * Similar to nlm_compare_locks, but also return true as long as fl1's range
+ * is completely covered by fl2.
+ */
+static inline int nlm_lock_in_range(const struct file_lock *fl1,
+ const struct file_lock *fl2)
+{
+ return fl1->fl_pid == fl2->fl_pid
+ && fl1->fl_owner == fl2->fl_owner
+ && fl1->fl_start >= fl2->fl_start
+ && fl1->fl_end <= fl2->fl_end
+ && (fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
+}
+
+static inline int nlm_overlapping_locks(const struct file_lock *fl1,
+ const struct file_lock *fl2)
+{
+ /* does fl1 completely cover fl2? */
+ if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
+ return 1;
+
+ /* does fl2 completely cover fl1 */
+ if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
+ return 1;
+
+ /* does the start or end of fl1 sit inside of fl2? */
+ if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
+ (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
+ return 1;
+
+ return 0;
+}
+
extern struct lock_manager_operations nlmsvc_lock_operations;

#endif /* __KERNEL__ */
--
1.5.5.1



2008-11-25 15:14:09

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Mon, 24 Nov 2008 12:06:53 -0500
"J. Bruce Fields" <[email protected]> wrote:

>
> I still haven't looked at the code yet. Probably the first thing I'd
> check would whether the previous code depends on an assumption that each
> grant request results in revisit of exactly one rpc. I can't see a
> problem.
>

That's the case with the existing code, since it stops looking for
blocks once it finds a match. The patch I sent changes
nlmsvc_grant_deferred to walk the entire list and set up matching
blocks to be granted.

I think at that point, lockd will rewalk the list (via
nlmsvc_retry_blocked) and revisit each one that got moved to the head
of the list. Please correct me if I'm wrong here though. This code has
a lot of indirection and I'm not sure I fully understand the way
revisits work.

As a side note, one thing that might be nice for this is to have
nlmsvc_grant_deferred to only walk the list of blocks that is
associated with this file. nlm_blocked is an ordered list though, so
I'm not sure whether this might have an effect on "fairness" when
several hosts or processes are contending for the same range.

--
Jeff Layton <[email protected]>

2008-11-22 01:16:00

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Wed, Nov 19, 2008 at 04:37:03PM -0500, Jeff Layton wrote:
> Dave Teigland opened a bug stating that he was having some problems with
> DLM and lockd. Essentially, the problem boiled down to the fact that
> when you do a posix lock of a region that touches another lock, the VFS
> will coalesce the locks and return a lock that encompasses the whole
> range.
>
> The problem here is that DLM then does a fl_grant callback to lockd with
> the new coalesced lock. The fl_grant callback then looks through all
> of the blocks and eventually returns -ENOENT since none match the
> coalesced lock.

Ugh.

> I'm having a very hard time tracking down info about how the fl_grant
> callback is supposed to work. Is it OK to send an fl_grant callback
> with a lock that's larger than the one requested? If so, then lockd
> needs to account for that possibility. Also, what exactly is the
> purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?

It's only used in the case the lock failed, and it can optionally be set
to a copy of the conflicting lock.

> What follows is a patch that changes nlmsvc_grant_deferred to account
> for the possibility that it may receive an fl_grant that has already
> been coalesced. It changes nlmsvc_grant_deferred to walk the entire
> list of blocks and grant any blocks that are covered by the range of
> the lock in the grant callback, if doing so will not conflict with an
> earlier grant.

Hm. That might work.

> The patch is still very rough and is probably broken in subtle (and
> maybe overt) ways, but it fixes the reproducer that Dave provided. It's
> just intended as a starting point for discussion. Can anyone clarify how
> fl_grant is supposed to work? Who's wrong here? DLM or NLM?

I think this wasn't thought through, apologies. (It was my
responsibility to make sure it was!)

I also occasionally think that it would be better to keep any actual
lock application the caller's responsibility, to the extent that's
possible--so fl_grant would just be a notification, and it would be up
to lockd to try the lock again and check the result. That's more or
less the way it always worked before, and it seems a simpler model.

Some work in the filesystem would be required to make sure it would be
ready to return an answer on that second request.

I also think there's a problem with lock cancelling in the current code.

--b.

>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/lockd/svclock.c | 58 ++++++++++++++++++++++++++++++++++---------
> include/linux/lockd/lockd.h | 35 ++++++++++++++++++++++++++
> 2 files changed, 81 insertions(+), 12 deletions(-)
>
> diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
> index 6063a8e..5ecd1db 100644
> --- a/fs/lockd/svclock.c
> +++ b/fs/lockd/svclock.c
> @@ -611,6 +611,35 @@ nlmsvc_cancel_blocked(struct nlm_file *file, struct nlm_lock *lock)
> }
>
> /*
> + * Would granting this block cause a conflict with one already granted? Check
> + * by walking the f_blocks list for the file.
> + */
> +static int
> +nlmsvc_check_overlap(struct nlm_block *block)
> +{
> + struct nlm_block *fblock, *next;
> +
> + list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
> + b_flist) {
> + /* skip myself */
> + if (fblock == block)
> + continue;
> +
> + /* skip any locks that aren't granted */
> + if (!fblock->b_granted)
> + continue;
> +
> + /* do block and fblock have any overlap in their ranges? */
> + if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
> + &fblock->b_call->a_args.lock.fl))
> + return 1;
> + }
> +
> + /* no conflicts found */
> + return 0;
> +}
> +
> +/*
> * This is a callback from the filesystem for VFS file lock requests.
> * It will be used if fl_grant is defined and the filesystem can not
> * respond to the request immediately.
> @@ -625,9 +654,10 @@ nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
> int result)
> {
> block->b_flags |= B_GOT_CALLBACK;
> - if (result == 0)
> - block->b_granted = 1;
> - else
> + if (result == 0) {
> + if (!nlmsvc_check_overlap(block))
> + block->b_granted = 1;
> + } else
> block->b_flags |= B_TIMED_OUT;
> if (conf) {
> if (block->b_fl)
> @@ -643,22 +673,26 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
>
> lock_kernel();
> list_for_each_entry(block, &nlm_blocked, b_list) {
> - if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
> - dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> - block, block->b_flags);
> + if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
> + dprintk("lockd: nlmsvc_notify_blocked block %p flags "
> + "%d\n", block, block->b_flags);
> +
> if (block->b_flags & B_QUEUED) {
> if (block->b_flags & B_TIMED_OUT) {
> - rc = -ENOLCK;
> - break;
> + if (rc == -ENOENT)
> + rc = -ENOLCK;
> + continue;
> }
> - nlmsvc_update_deferred_block(block, conf, result);
> - } else if (result == 0)
> - block->b_granted = 1;
> + nlmsvc_update_deferred_block(block, conf,
> + result);
> + } else if (result == 0) {
> + if (!nlmsvc_check_overlap(block))
> + block->b_granted = 1;
> + }
>
> nlmsvc_insert_block(block, 0);
> svc_wake_up(block->b_daemon);
> rc = 0;
> - break;
> }
> }
> unlock_kernel();
> diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> index b56d5aa..aa38d47 100644
> --- a/include/linux/lockd/lockd.h
> +++ b/include/linux/lockd/lockd.h
> @@ -367,6 +367,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
> &&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> }
>
> +/*
> + * Compare two NLM locks.
> + * When the second lock is of type F_UNLCK, this acts like a wildcard.
> + * Similar to nlm_compare_locks, but also return true as long as fl1's range
> + * is completely covered by fl2.
> + */
> +static inline int nlm_lock_in_range(const struct file_lock *fl1,
> + const struct file_lock *fl2)
> +{
> + return fl1->fl_pid == fl2->fl_pid
> + && fl1->fl_owner == fl2->fl_owner
> + && fl1->fl_start >= fl2->fl_start
> + && fl1->fl_end <= fl2->fl_end
> + && (fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> +}
> +
> +static inline int nlm_overlapping_locks(const struct file_lock *fl1,
> + const struct file_lock *fl2)
> +{
> + /* does fl1 completely cover fl2? */
> + if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
> + return 1;
> +
> + /* does fl2 completely cover fl1 */
> + if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
> + return 1;
> +
> + /* does the start or end of fl1 sit inside of fl2? */
> + if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
> + (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
> + return 1;
> +
> + return 0;
> +}
> +
> extern struct lock_manager_operations nlmsvc_lock_operations;
>
> #endif /* __KERNEL__ */
> --
> 1.5.5.1
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-11-24 15:33:22

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Fri, 21 Nov 2008 20:15:55 -0500
"J. Bruce Fields" <[email protected]> wrote:

> On Wed, Nov 19, 2008 at 04:37:03PM -0500, Jeff Layton wrote:
> > Dave Teigland opened a bug stating that he was having some problems with
> > DLM and lockd. Essentially, the problem boiled down to the fact that
> > when you do a posix lock of a region that touches another lock, the VFS
> > will coalesce the locks and return a lock that encompasses the whole
> > range.
> >
> > The problem here is that DLM then does a fl_grant callback to lockd with
> > the new coalesced lock. The fl_grant callback then looks through all
> > of the blocks and eventually returns -ENOENT since none match the
> > coalesced lock.
>
> Ugh.
>

My sentiments exactly...

> > I'm having a very hard time tracking down info about how the fl_grant
> > callback is supposed to work. Is it OK to send an fl_grant callback
> > with a lock that's larger than the one requested? If so, then lockd
> > needs to account for that possibility. Also, what exactly is the
> > purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?
>
> It's only used in the case the lock failed, and it can optionally be set
> to a copy of the conflicting lock.
>

Ok, good to know. At some point, a file in Documentation on these interfaces
this might be a nice addition...

> > What follows is a patch that changes nlmsvc_grant_deferred to account
> > for the possibility that it may receive an fl_grant that has already
> > been coalesced. It changes nlmsvc_grant_deferred to walk the entire
> > list of blocks and grant any blocks that are covered by the range of
> > the lock in the grant callback, if doing so will not conflict with an
> > earlier grant.
>
> Hm. That might work.
>

It seems to with very basic, cursory testing, but it could be
broken and I'm just not seeing it. This code looks like it's only used
in a NLM over DLM setup though, so it's hard to comprehensively test
it. This scheme is certainly more complex than the current code, though
and I'm not sure I have everything right with it.

> > The patch is still very rough and is probably broken in subtle (and
> > maybe overt) ways, but it fixes the reproducer that Dave provided. It's
> > just intended as a starting point for discussion. Can anyone clarify how
> > fl_grant is supposed to work? Who's wrong here? DLM or NLM?
>
> I think this wasn't thought through, apologies. (It was my
> responsibility to make sure it was!)
>
> I also occasionally think that it would be better to keep any actual
> lock application the caller's responsibility, to the extent that's
> possible--so fl_grant would just be a notification, and it would be up
> to lockd to try the lock again and check the result. That's more or
> less the way it always worked before, and it seems a simpler model.
>
> Some work in the filesystem would be required to make sure it would be
> ready to return an answer on that second request.
>
> I also think there's a problem with lock cancelling in the current code.
>

No worries, I think we can come up with something workable now that we
understand the problem.

When I dug through the mailing list archives, the only thing I could
find on this was this post by you:

http://lkml.org/lkml/2008/4/15/246

...and in particular:

- With fl_grant the filesystem says: I'm giving you the final
result of the lock operation. In particular, if I'm telling
you it succeeded, that means I've already granted you the
lock; don't ask me about it again.

...that seems to contradict what you're suggesting above. I suppose we
could consider changing NLM to use the .fl_notify interface, which you
described as:

- With fl_notify the filesystem says: something just happened to
this lock. I'm not guaranteeing anything, I'm just telling
you this would be a good time to try the lock again. Do it
and maybe you'll get lucky!

...is that what you were suggesting?

--
Jeff Layton <[email protected]>

2008-11-24 17:06:55

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Mon, Nov 24, 2008 at 10:33:13AM -0500, Jeff Layton wrote:
> On Fri, 21 Nov 2008 20:15:55 -0500
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Wed, Nov 19, 2008 at 04:37:03PM -0500, Jeff Layton wrote:
> > > Dave Teigland opened a bug stating that he was having some problems with
> > > DLM and lockd. Essentially, the problem boiled down to the fact that
> > > when you do a posix lock of a region that touches another lock, the VFS
> > > will coalesce the locks and return a lock that encompasses the whole
> > > range.
> > >
> > > The problem here is that DLM then does a fl_grant callback to lockd with
> > > the new coalesced lock. The fl_grant callback then looks through all
> > > of the blocks and eventually returns -ENOENT since none match the
> > > coalesced lock.
> >
> > Ugh.
> >
>
> My sentiments exactly...
>
> > > I'm having a very hard time tracking down info about how the fl_grant
> > > callback is supposed to work. Is it OK to send an fl_grant callback
> > > with a lock that's larger than the one requested? If so, then lockd
> > > needs to account for that possibility. Also, what exactly is the
> > > purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?
> >
> > It's only used in the case the lock failed, and it can optionally be set
> > to a copy of the conflicting lock.
> >
>
> Ok, good to know. At some point, a file in Documentation on these interfaces
> this might be a nice addition...

OK. I've made a note to do that. (Which doesn't mean it will happen
soon.)

>
> > > What follows is a patch that changes nlmsvc_grant_deferred to account
> > > for the possibility that it may receive an fl_grant that has already
> > > been coalesced. It changes nlmsvc_grant_deferred to walk the entire
> > > list of blocks and grant any blocks that are covered by the range of
> > > the lock in the grant callback, if doing so will not conflict with an
> > > earlier grant.
> >
> > Hm. That might work.
> >
>
> It seems to with very basic, cursory testing, but it could be
> broken and I'm just not seeing it. This code looks like it's only used
> in a NLM over DLM setup though, so it's hard to comprehensively test
> it. This scheme is certainly more complex than the current code, though
> and I'm not sure I have everything right with it.

I still haven't looked at the code yet. Probably the first thing I'd
check would whether the previous code depends on an assumption that each
grant request results in revisit of exactly one rpc. I can't see a
problem.

> > > The patch is still very rough and is probably broken in subtle (and
> > > maybe overt) ways, but it fixes the reproducer that Dave provided. It's
> > > just intended as a starting point for discussion. Can anyone clarify how
> > > fl_grant is supposed to work? Who's wrong here? DLM or NLM?
> >
> > I think this wasn't thought through, apologies. (It was my
> > responsibility to make sure it was!)
> >
> > I also occasionally think that it would be better to keep any actual
> > lock application the caller's responsibility, to the extent that's
> > possible--so fl_grant would just be a notification, and it would be up
> > to lockd to try the lock again and check the result. That's more or
> > less the way it always worked before, and it seems a simpler model.
> >
> > Some work in the filesystem would be required to make sure it would be
> > ready to return an answer on that second request.
> >
> > I also think there's a problem with lock cancelling in the current code.
> >
>
> No worries, I think we can come up with something workable now that we
> understand the problem.
>
> When I dug through the mailing list archives, the only thing I could
> find on this was this post by you:
>
> http://lkml.org/lkml/2008/4/15/246
>
> ...and in particular:
>
> - With fl_grant the filesystem says: I'm giving you the final
> result of the lock operation. In particular, if I'm telling
> you it succeeded, that means I've already granted you the
> lock; don't ask me about it again.
>
> ...that seems to contradict what you're suggesting above. I suppose we
> could consider changing NLM to use the .fl_notify interface, which you
> described as:
>
> - With fl_notify the filesystem says: something just happened to
> this lock. I'm not guaranteeing anything, I'm just telling
> you this would be a good time to try the lock again. Do it
> and maybe you'll get lucky!
>
> ...is that what you were suggesting?

Yes. But it would be a fundamental change. We should try to salvage
the current approach first.

--b.

2008-12-13 12:40:48

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Mon, 24 Nov 2008 12:06:53 -0500
"J. Bruce Fields" <[email protected]> wrote:

> >
> > It seems to with very basic, cursory testing, but it could be
> > broken and I'm just not seeing it. This code looks like it's only used
> > in a NLM over DLM setup though, so it's hard to comprehensively test
> > it. This scheme is certainly more complex than the current code, though
> > and I'm not sure I have everything right with it.
>
> I still haven't looked at the code yet. Probably the first thing I'd
> check would whether the previous code depends on an assumption that each
> grant request results in revisit of exactly one rpc. I can't see a
> problem.
>

Hi Bruce,

I've gone over this code in more detail. I'm fairly certain that this
assumption is not there in the case of a blocking lock. In the case of a
non-blocking lock (or testlock) then I don't *think* this assumption is
there either, but I could use someone to sanity check me there. The
non-blocking case is a lot harder to follow.

Any thoughts on the patch? I had figured this to be a relatively rare
situation, but I've had a couple of people ask me about it recently.
Some KDE/QT apps do this sort of locking in home dirs, so it's
apparently a real-world problem.

The most current patch follows...

Thanks,
Jeff

--------------------[snip]---------------------

>From 6e29be4fea77a57e29c0f67335dcb6750a31fb6a Mon Sep 17 00:00:00 2001
From: Jeff Layton <[email protected]>
Date: Sat, 13 Dec 2008 06:59:01 -0500
Subject: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

Dave Teigland opened a bug stating that he was having some problems with
DLM and lockd. Essentially, the problem boiled down to the fact that
when you do a posix lock of a region that touches another lock, the VFS
will coalesce the locks and return a lock that encompasses the whole
range.

The problem here is that DLM then does a fl_grant callback to lockd with
the new coalesced lock. The fl_grant callback then looks through all
of the blocks and eventually returns -ENOENT since none match the
coalesced lock.

I'm having a very hard time tracking down info about how the fl_grant
callback is supposed to work. Is it OK to send an fl_grant callback
with a lock that's larger than the one requested? If so, then lockd
needs to account for that possibility. Also, what exactly is the
purpose of the second arg on fl_grant ("conf" in nlmsvc_grant_deferred)?

What follows is a patch that changes nlmsvc_grant_deferred to account
for the possibility that it may receive an fl_grant that has already
been coalesced. It changes nlmsvc_grant_deferred to walk the entire
list of blocks and grant any blocks that are covered by the range of
the lock in the grant callback, if doing so will not conflict with an
earlier grant.

The patch is still very rough and is probably broken in subtle (and
maybe overt) ways, but it fixes the reproducer that Dave provided. It's
just intended as a starting point for discussion. Can anyone clarify how
fl_grant is supposed to work? Who's wrong here? DLM or NLM?

Signed-off-by: Jeff Layton <[email protected]>
---
fs/lockd/svclock.c | 72 ++++++++++++++++++++++++++++++++++---------
include/linux/lockd/lockd.h | 35 +++++++++++++++++++++
2 files changed, 92 insertions(+), 15 deletions(-)

diff --git a/fs/lockd/svclock.c b/fs/lockd/svclock.c
index 6063a8e..5ae1c28 100644
--- a/fs/lockd/svclock.c
+++ b/fs/lockd/svclock.c
@@ -611,28 +611,65 @@ nlmsvc_cancel_blocked(struct nlm_file *file, struct nlm_lock *lock)
}

/*
+ * Would granting this block cause a conflict with one already granted? Check
+ * by walking the f_blocks list for the file.
+ */
+static bool
+nlmsvc_check_overlap(struct nlm_block *block)
+{
+ struct nlm_block *fblock, *next;
+
+ list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
+ b_flist) {
+ /* skip myself */
+ if (fblock == block)
+ continue;
+
+ /* skip any locks that aren't granted */
+ if (!fblock->b_granted)
+ continue;
+
+ /* do block and fblock have any overlap in their ranges? */
+ if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
+ &fblock->b_call->a_args.lock.fl))
+ return true;
+ }
+
+ /* no conflicts found */
+ return false;
+}
+
+/*
* This is a callback from the filesystem for VFS file lock requests.
* It will be used if fl_grant is defined and the filesystem can not
* respond to the request immediately.
* For GETLK request it will copy the reply to the nlm_block.
* For SETLK or SETLKW request it will get the local posix lock.
- * In all cases it will move the block to the head of nlm_blocked q where
+ * It will generally will move the block to the head of nlm_blocked q where
* nlmsvc_retry_blocked() can send back a reply for SETLKW or revisit the
- * deferred rpc for GETLK and SETLK.
+ * deferred rpc for GETLK and SETLK. The exception is in the case where the
+ * block is not granted.
*/
-static void
+static bool
nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
int result)
{
+ bool requeue = true;
+
block->b_flags |= B_GOT_CALLBACK;
- if (result == 0)
- block->b_granted = 1;
- else
+ if (result == 0) {
+ if (nlmsvc_check_overlap(block))
+ requeue = false;
+ else
+ block->b_granted = 1;
+ } else
block->b_flags |= B_TIMED_OUT;
if (conf) {
if (block->b_fl)
__locks_copy_lock(block->b_fl, conf);
}
+
+ return requeue;
}

static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
@@ -643,22 +680,27 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,

lock_kernel();
list_for_each_entry(block, &nlm_blocked, b_list) {
- if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
- dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
- block, block->b_flags);
+ if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
+ dprintk("lockd: nlmsvc_notify_blocked block %p flags "
+ "%d\n", block, block->b_flags);
+
if (block->b_flags & B_QUEUED) {
if (block->b_flags & B_TIMED_OUT) {
- rc = -ENOLCK;
- break;
+ if (rc == -ENOENT)
+ rc = -ENOLCK;
+ continue;
}
- nlmsvc_update_deferred_block(block, conf, result);
- } else if (result == 0)
- block->b_granted = 1;
+ if (!nlmsvc_update_deferred_block(block, conf,
+ result))
+ continue;
+ } else if (result == 0) {
+ if (!nlmsvc_check_overlap(block))
+ block->b_granted = 1;
+ }

nlmsvc_insert_block(block, 0);
svc_wake_up(block->b_daemon);
rc = 0;
- break;
}
}
unlock_kernel();
diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
index f7146e6..4a5406c 100644
--- a/include/linux/lockd/lockd.h
+++ b/include/linux/lockd/lockd.h
@@ -385,6 +385,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
&&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
}

+/*
+ * Compare two NLM locks.
+ * When the second lock is of type F_UNLCK, this acts like a wildcard.
+ * Similar to nlm_compare_locks, but also return true as long as fl1's range
+ * is completely covered by fl2.
+ */
+static inline int nlm_lock_in_range(const struct file_lock *fl1,
+ const struct file_lock *fl2)
+{
+ return fl1->fl_pid == fl2->fl_pid
+ && fl1->fl_owner == fl2->fl_owner
+ && fl1->fl_start >= fl2->fl_start
+ && fl1->fl_end <= fl2->fl_end
+ && (fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
+}
+
+static inline int nlm_overlapping_locks(const struct file_lock *fl1,
+ const struct file_lock *fl2)
+{
+ /* does fl1 completely cover fl2? */
+ if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
+ return 1;
+
+ /* does fl2 completely cover fl1 */
+ if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
+ return 1;
+
+ /* does the start or end of fl1 sit inside of fl2? */
+ if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
+ (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
+ return 1;
+
+ return 0;
+}
+
extern struct lock_manager_operations nlmsvc_lock_operations;

#endif /* __KERNEL__ */
--
1.5.5.1


2008-12-16 19:38:08

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Sat, Dec 13, 2008 at 07:40:42AM -0500, Jeff Layton wrote:
> Any thoughts on the patch? I had figured this to be a relatively rare
> situation, but I've had a couple of people ask me about it recently.
> Some KDE/QT apps do this sort of locking in home dirs, so it's
> apparently a real-world problem.

If you knew of a simple description of the locking they're doing (and
why), I'd be really interested.

> The most current patch follows...

Thanks. sorry for the delay responding. This deserves a closer look
than I've been able to give it so far:

> /*
> + * Would granting this block cause a conflict with one already granted? Check
> + * by walking the f_blocks list for the file.
> + */
> +static bool
> +nlmsvc_check_overlap(struct nlm_block *block)
> +{
> + struct nlm_block *fblock, *next;
> +
> + list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
> + b_flist) {
> + /* skip myself */
> + if (fblock == block)
> + continue;
> +
> + /* skip any locks that aren't granted */
> + if (!fblock->b_granted)
> + continue;
> +
> + /* do block and fblock have any overlap in their ranges? */
> + if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
> + &fblock->b_call->a_args.lock.fl))
> + return true;
> + }
> +
> + /* no conflicts found */
> + return false;
> +}
> +
> +/*
> * This is a callback from the filesystem for VFS file lock requests.
> * It will be used if fl_grant is defined and the filesystem can not
> * respond to the request immediately.
> * For GETLK request it will copy the reply to the nlm_block.
> * For SETLK or SETLKW request it will get the local posix lock.
> - * In all cases it will move the block to the head of nlm_blocked q where
> + * It will generally will move the block to the head of nlm_blocked q where
> * nlmsvc_retry_blocked() can send back a reply for SETLKW or revisit the
> - * deferred rpc for GETLK and SETLK.
> + * deferred rpc for GETLK and SETLK. The exception is in the case where the
> + * block is not granted.
> */
> -static void
> +static bool
> nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
> int result)
> {
> + bool requeue = true;
> +
> block->b_flags |= B_GOT_CALLBACK;
> - if (result == 0)
> - block->b_granted = 1;
> - else
> + if (result == 0) {
> + if (nlmsvc_check_overlap(block))
> + requeue = false;

So these check_overlap()'s are in attempt to ensure that grants for
overlapping ranges get sent back to the client in some given order?
What order is that exactly? (Do we know that the existing order is
right?)

This can never be perfect, since we can't for example guarantee the
grant rpc's get received on the client in the right order. I'm inclined
not to bother, but perhaps I'm missing the point.

Maybe it would help to walk through of an example that demonstrates the
problem.

> + else
> + block->b_granted = 1;

Also, shouldn't we be granting in this case? Or are you counting on a
later overlapping grant that will cover this one as well?

> + } else
> block->b_flags |= B_TIMED_OUT;
> if (conf) {
> if (block->b_fl)
> __locks_copy_lock(block->b_fl, conf);
> }
> +
> + return requeue;
> }
>
> static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
> @@ -643,22 +680,27 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
>
> lock_kernel();
> list_for_each_entry(block, &nlm_blocked, b_list) {
> - if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
> - dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> - block, block->b_flags);
> + if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
> + dprintk("lockd: nlmsvc_notify_blocked block %p flags "
> + "%d\n", block, block->b_flags);
> +
> if (block->b_flags & B_QUEUED) {
> if (block->b_flags & B_TIMED_OUT) {
> - rc = -ENOLCK;
> - break;
> + if (rc == -ENOENT)
> + rc = -ENOLCK;

I don't think the conditional is necessary.

> + continue;
> }
> - nlmsvc_update_deferred_block(block, conf, result);
> - } else if (result == 0)
> - block->b_granted = 1;
> + if (!nlmsvc_update_deferred_block(block, conf,
> + result))
> + continue;

Don't we still want the svc_wake_up in this case?

I'm sorry this is so hairy.... Thanks very much for looking into it.

--b.

> + } else if (result == 0) {
> + if (!nlmsvc_check_overlap(block))
> + block->b_granted = 1;
> + }
>
> nlmsvc_insert_block(block, 0);
> svc_wake_up(block->b_daemon);
> rc = 0;
> - break;
> }
> }
> unlock_kernel();
> diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> index f7146e6..4a5406c 100644
> --- a/include/linux/lockd/lockd.h
> +++ b/include/linux/lockd/lockd.h
> @@ -385,6 +385,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
> &&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> }
>
> +/*
> + * Compare two NLM locks.
> + * When the second lock is of type F_UNLCK, this acts like a wildcard.
> + * Similar to nlm_compare_locks, but also return true as long as fl1's range
> + * is completely covered by fl2.
> + */
> +static inline int nlm_lock_in_range(const struct file_lock *fl1,
> + const struct file_lock *fl2)
> +{
> + return fl1->fl_pid == fl2->fl_pid
> + && fl1->fl_owner == fl2->fl_owner
> + && fl1->fl_start >= fl2->fl_start
> + && fl1->fl_end <= fl2->fl_end
> + && (fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> +}
> +
> +static inline int nlm_overlapping_locks(const struct file_lock *fl1,
> + const struct file_lock *fl2)
> +{
> + /* does fl1 completely cover fl2? */
> + if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
> + return 1;
> +
> + /* does fl2 completely cover fl1 */
> + if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
> + return 1;
> +
> + /* does the start or end of fl1 sit inside of fl2? */
> + if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
> + (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
> + return 1;
> +
> + return 0;
> +}
> +
> extern struct lock_manager_operations nlmsvc_lock_operations;
>
> #endif /* __KERNEL__ */
> --
> 1.5.5.1
>

2008-12-16 19:56:37

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Tue, Dec 16, 2008 at 02:38:06PM -0500, bfields wrote:
> On Sat, Dec 13, 2008 at 07:40:42AM -0500, Jeff Layton wrote:
> > Any thoughts on the patch? I had figured this to be a relatively rare
> > situation, but I've had a couple of people ask me about it recently.
> > Some KDE/QT apps do this sort of locking in home dirs, so it's
> > apparently a real-world problem.
>
> If you knew of a simple description of the locking they're doing (and
> why), I'd be really interested.
>
> > The most current patch follows...
>
> Thanks. sorry for the delay responding. This deserves a closer look
> than I've been able to give it so far:

After thinking a little more: the interface is a lot simpler if it's
just a simple request and reply (with the reply for a lock identical to
the request). I believe that's more or less what gfs2 is already do
internally, if we look at the posix lock upcalls it's making to
userspace. So it shouldn't be hard to fix gfs2 to hand us back a lock
that doesn't take into account any coalescing. If it needs to keep an
extra (unmodified) copy of the lock around, that's OK.

Did you try that and find a reason that doesn't work?

--b.

>
> > /*
> > + * Would granting this block cause a conflict with one already granted? Check
> > + * by walking the f_blocks list for the file.
> > + */
> > +static bool
> > +nlmsvc_check_overlap(struct nlm_block *block)
> > +{
> > + struct nlm_block *fblock, *next;
> > +
> > + list_for_each_entry_safe(fblock, next, &block->b_file->f_blocks,
> > + b_flist) {
> > + /* skip myself */
> > + if (fblock == block)
> > + continue;
> > +
> > + /* skip any locks that aren't granted */
> > + if (!fblock->b_granted)
> > + continue;
> > +
> > + /* do block and fblock have any overlap in their ranges? */
> > + if (nlm_overlapping_locks(&block->b_call->a_args.lock.fl,
> > + &fblock->b_call->a_args.lock.fl))
> > + return true;
> > + }
> > +
> > + /* no conflicts found */
> > + return false;
> > +}
> > +
> > +/*
> > * This is a callback from the filesystem for VFS file lock requests.
> > * It will be used if fl_grant is defined and the filesystem can not
> > * respond to the request immediately.
> > * For GETLK request it will copy the reply to the nlm_block.
> > * For SETLK or SETLKW request it will get the local posix lock.
> > - * In all cases it will move the block to the head of nlm_blocked q where
> > + * It will generally will move the block to the head of nlm_blocked q where
> > * nlmsvc_retry_blocked() can send back a reply for SETLKW or revisit the
> > - * deferred rpc for GETLK and SETLK.
> > + * deferred rpc for GETLK and SETLK. The exception is in the case where the
> > + * block is not granted.
> > */
> > -static void
> > +static bool
> > nlmsvc_update_deferred_block(struct nlm_block *block, struct file_lock *conf,
> > int result)
> > {
> > + bool requeue = true;
> > +
> > block->b_flags |= B_GOT_CALLBACK;
> > - if (result == 0)
> > - block->b_granted = 1;
> > - else
> > + if (result == 0) {
> > + if (nlmsvc_check_overlap(block))
> > + requeue = false;
>
> So these check_overlap()'s are in attempt to ensure that grants for
> overlapping ranges get sent back to the client in some given order?
> What order is that exactly? (Do we know that the existing order is
> right?)
>
> This can never be perfect, since we can't for example guarantee the
> grant rpc's get received on the client in the right order. I'm inclined
> not to bother, but perhaps I'm missing the point.
>
> Maybe it would help to walk through of an example that demonstrates the
> problem.
>
> > + else
> > + block->b_granted = 1;
>
> Also, shouldn't we be granting in this case? Or are you counting on a
> later overlapping grant that will cover this one as well?
>
> > + } else
> > block->b_flags |= B_TIMED_OUT;
> > if (conf) {
> > if (block->b_fl)
> > __locks_copy_lock(block->b_fl, conf);
> > }
> > +
> > + return requeue;
> > }
> >
> > static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
> > @@ -643,22 +680,27 @@ static int nlmsvc_grant_deferred(struct file_lock *fl, struct file_lock *conf,
> >
> > lock_kernel();
> > list_for_each_entry(block, &nlm_blocked, b_list) {
> > - if (nlm_compare_locks(&block->b_call->a_args.lock.fl, fl)) {
> > - dprintk("lockd: nlmsvc_notify_blocked block %p flags %d\n",
> > - block, block->b_flags);
> > + if (nlm_lock_in_range(&block->b_call->a_args.lock.fl, fl)) {
> > + dprintk("lockd: nlmsvc_notify_blocked block %p flags "
> > + "%d\n", block, block->b_flags);
> > +
> > if (block->b_flags & B_QUEUED) {
> > if (block->b_flags & B_TIMED_OUT) {
> > - rc = -ENOLCK;
> > - break;
> > + if (rc == -ENOENT)
> > + rc = -ENOLCK;
>
> I don't think the conditional is necessary.
>
> > + continue;
> > }
> > - nlmsvc_update_deferred_block(block, conf, result);
> > - } else if (result == 0)
> > - block->b_granted = 1;
> > + if (!nlmsvc_update_deferred_block(block, conf,
> > + result))
> > + continue;
>
> Don't we still want the svc_wake_up in this case?
>
> I'm sorry this is so hairy.... Thanks very much for looking into it.
>
> --b.
>
> > + } else if (result == 0) {
> > + if (!nlmsvc_check_overlap(block))
> > + block->b_granted = 1;
> > + }
> >
> > nlmsvc_insert_block(block, 0);
> > svc_wake_up(block->b_daemon);
> > rc = 0;
> > - break;
> > }
> > }
> > unlock_kernel();
> > diff --git a/include/linux/lockd/lockd.h b/include/linux/lockd/lockd.h
> > index f7146e6..4a5406c 100644
> > --- a/include/linux/lockd/lockd.h
> > +++ b/include/linux/lockd/lockd.h
> > @@ -385,6 +385,41 @@ static inline int nlm_compare_locks(const struct file_lock *fl1,
> > &&(fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> > }
> >
> > +/*
> > + * Compare two NLM locks.
> > + * When the second lock is of type F_UNLCK, this acts like a wildcard.
> > + * Similar to nlm_compare_locks, but also return true as long as fl1's range
> > + * is completely covered by fl2.
> > + */
> > +static inline int nlm_lock_in_range(const struct file_lock *fl1,
> > + const struct file_lock *fl2)
> > +{
> > + return fl1->fl_pid == fl2->fl_pid
> > + && fl1->fl_owner == fl2->fl_owner
> > + && fl1->fl_start >= fl2->fl_start
> > + && fl1->fl_end <= fl2->fl_end
> > + && (fl1->fl_type == fl2->fl_type || fl2->fl_type == F_UNLCK);
> > +}
> > +
> > +static inline int nlm_overlapping_locks(const struct file_lock *fl1,
> > + const struct file_lock *fl2)
> > +{
> > + /* does fl1 completely cover fl2? */
> > + if (fl1->fl_start <= fl2->fl_start && fl1->fl_end >= fl2->fl_end)
> > + return 1;
> > +
> > + /* does fl2 completely cover fl1 */
> > + if (fl2->fl_start <= fl1->fl_start && fl2->fl_end >= fl1->fl_end)
> > + return 1;
> > +
> > + /* does the start or end of fl1 sit inside of fl2? */
> > + if ((fl1->fl_start >= fl2->fl_start && fl1->fl_start <= fl2->fl_end) ||
> > + (fl1->fl_end >= fl2->fl_start && fl1->fl_end <= fl2->fl_end))
> > + return 1;
> > +
> > + return 0;
> > +}
> > +
> > extern struct lock_manager_operations nlmsvc_lock_operations;
> >
> > #endif /* __KERNEL__ */
> > --
> > 1.5.5.1
> >

2008-12-16 21:12:04

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Tue, 16 Dec 2008 14:56:35 -0500
"J. Bruce Fields" <[email protected]> wrote:

> > + if (result == 0) {
> > + if (nlmsvc_check_overlap(block))
> > + requeue = false;
>
> So these check_overlap()'s are in attempt to ensure that grants for
> overlapping ranges get sent back to the client in some given order?
> What order is that exactly? (Do we know that the existing order is
> right?)
>

No, the idea was to have this code walk the entire list of blocks and
grant any block that the lock completely covers. But, I think we need to
check and make sure that block doesn't conflict with another grant,
correct? That's what that function is intended to do.

> After thinking a little more: the interface is a lot simpler if it's
> just a simple request and reply (with the reply for a lock identical to
> the request). I believe that's more or less what gfs2 is already do
> internally, if we look at the posix lock upcalls it's making to
> userspace. So it shouldn't be hard to fix gfs2 to hand us back a lock
> that doesn't take into account any coalescing. If it needs to keep an
> extra (unmodified) copy of the lock around, that's OK.
>
> Did you try that and find a reason that doesn't work?
>
> --b.
>

Agreed. That would be much simpler, I think...

I didn't try that, though I did consider it before wandering down the
rabbit hole. Dave, any thoughts?

--
Jeff Layton <[email protected]>

2008-12-17 19:17:30

by David Teigland

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Tue, Dec 16, 2008 at 04:11:58PM -0500, Jeff Layton wrote:
> On Tue, 16 Dec 2008 14:56:35 -0500
> "J. Bruce Fields" <[email protected]> wrote:
> > After thinking a little more: the interface is a lot simpler if it's
> > just a simple request and reply (with the reply for a lock identical to
> > the request). I believe that's more or less what gfs2 is already do
> > internally, if we look at the posix lock upcalls it's making to
> > userspace. So it shouldn't be hard to fix gfs2 to hand us back a lock
> > that doesn't take into account any coalescing. If it needs to keep an
> > extra (unmodified) copy of the lock around, that's OK.
> >
> > Did you try that and find a reason that doesn't work?
> >
> > --b.
> >
>
> Agreed. That would be much simpler, I think...
>
> I didn't try that, though I did consider it before wandering down the
> rabbit hole. Dave, any thoughts?

Jeff suggested the following patch, which I've tried and it fixes the
problem I was seeing. It passes the original, unmodified file_lock to
notify(), instead of the copy which is passed to (and coalesced by)
posix_lock_file(). I'm guessing this was reason for having a copy of the
file_lock in the first place, but it was just not used correctly.

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index eba87ff..502b1ea 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -168,7 +168,7 @@ static int dlm_plock_callback(struct plock_op *op)
notify = xop->callback;

if (op->info.rv) {
- notify(flc, NULL, op->info.rv);
+ notify(fl, NULL, op->info.rv);
goto out;
}

@@ -187,7 +187,7 @@ static int dlm_plock_callback(struct plock_op *op)
(unsigned long long)op->info.number, file, fl);
}

- rv = notify(flc, NULL, 0);
+ rv = notify(fl, NULL, 0);
if (rv) {
/* XXX: We need to cancel the fs lock here: */
log_print("dlm_plock_callback: lock granted after lock request "

2008-12-17 20:01:57

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Wed, Dec 17, 2008 at 01:14:53PM -0600, David Teigland wrote:
> On Tue, Dec 16, 2008 at 04:11:58PM -0500, Jeff Layton wrote:
> > On Tue, 16 Dec 2008 14:56:35 -0500
> > "J. Bruce Fields" <[email protected]> wrote:
> > > After thinking a little more: the interface is a lot simpler if it's
> > > just a simple request and reply (with the reply for a lock identical to
> > > the request). I believe that's more or less what gfs2 is already do
> > > internally, if we look at the posix lock upcalls it's making to
> > > userspace. So it shouldn't be hard to fix gfs2 to hand us back a lock
> > > that doesn't take into account any coalescing. If it needs to keep an
> > > extra (unmodified) copy of the lock around, that's OK.
> > >
> > > Did you try that and find a reason that doesn't work?
> > >
> > > --b.
> > >
> >
> > Agreed. That would be much simpler, I think...
> >
> > I didn't try that, though I did consider it before wandering down the
> > rabbit hole. Dave, any thoughts?
>
> Jeff suggested the following patch, which I've tried and it fixes the
> problem I was seeing. It passes the original, unmodified file_lock to
> notify(), instead of the copy which is passed to (and coalesced by)
> posix_lock_file(). I'm guessing this was reason for having a copy of the
> file_lock in the first place, but it was just not used correctly.

Yep, that looks much better. Though actually I suspect what was really
intended was to use "flc" for the notifies, and "fl" for the
posix_lock_file().

Also, since flc is never actually handed to the posix lock system, I
think it should be a "shallow" lock copy--so it should be created with
__locks_copy_lock(). Something like the below?

--b.

diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index eba87ff..e8d9086 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -103,7 +103,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
op->info.owner = (__u64) fl->fl_pid;
xop->callback = fl->fl_lmops->fl_grant;
locks_init_lock(&xop->flc);
- locks_copy_lock(&xop->flc, fl);
+ __locks_copy_lock(&xop->flc, fl);
xop->fl = fl;
xop->file = file;
} else {
@@ -173,8 +173,8 @@ static int dlm_plock_callback(struct plock_op *op)
}

/* got fs lock; bookkeep locally as well: */
- flc->fl_flags &= ~FL_SLEEP;
- if (posix_lock_file(file, flc, NULL)) {
+ fl->fl_flags &= ~FL_SLEEP;
+ if (posix_lock_file(file, fl, NULL)) {
/*
* This can only happen in the case of kmalloc() failure.
* The filesystem's own lock is the authoritative lock,

2008-12-17 21:31:05

by David Teigland

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> Yep, that looks much better. Though actually I suspect what was really
> intended was to use "flc" for the notifies, and "fl" for the
> posix_lock_file().
>
> Also, since flc is never actually handed to the posix lock system, I
> think it should be a "shallow" lock copy--so it should be created with
> __locks_copy_lock(). Something like the below?

With this I'm back to seeing the same problem, but with the mismatch in
the reverse direction.

It seems fl points to lockd's file_lock, and that lockd expects notify()
will be called with a pointer to a file_lock that matches one of its own.
Based on that I think we'd always pass fl to notify().

The question then is whether lockd's file_lock should be coalesced or not.
If so, we'd pass fl to posix_lock_file(). If not, we'd pass flc to
posix_lock_file(). In both cases, fl would be passed to notify() and
would match. In the former case, I don't see much purpose for flc to even
exist. The patch I sent was the later case.

In the original code, we coalesce flc which then fails to match the
original (fl) in lockd. In your patch, we coalesce fl which then fails to
match the copy of the original (flc).

Dave

>
> diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> index eba87ff..e8d9086 100644
> --- a/fs/dlm/plock.c
> +++ b/fs/dlm/plock.c
> @@ -103,7 +103,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> op->info.owner = (__u64) fl->fl_pid;
> xop->callback = fl->fl_lmops->fl_grant;
> locks_init_lock(&xop->flc);
> - locks_copy_lock(&xop->flc, fl);
> + __locks_copy_lock(&xop->flc, fl);
> xop->fl = fl;
> xop->file = file;
> } else {
> @@ -173,8 +173,8 @@ static int dlm_plock_callback(struct plock_op *op)
> }
>
> /* got fs lock; bookkeep locally as well: */
> - flc->fl_flags &= ~FL_SLEEP;
> - if (posix_lock_file(file, flc, NULL)) {
> + fl->fl_flags &= ~FL_SLEEP;
> + if (posix_lock_file(file, fl, NULL)) {
> /*
> * This can only happen in the case of kmalloc() failure.
> * The filesystem's own lock is the authoritative lock,

2009-01-19 22:54:11

by David Teigland

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

> On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> > On Wed, Dec 17, 2008 at 01:14:53PM -0600, David Teigland wrote:
> > Also, since flc is never actually handed to the posix lock system, I
> > think it should be a "shallow" lock copy--so it should be created with
> > __locks_copy_lock(). Something like the below?

> (I'd like to do the s/locks_copy_lock/__locks_copy_lock/ in a separate
> patch since it's not directly related to fixing the bug.)

I haven't looked at why, but s/locks_copy_lock/__locks_copy_lock/ creates
problems with the file_lock's kept by the vfs. With two programs doing
locking on an nfs client I get lots of messages like this on the server:

dlm: dlm_plock_callback: vfs lock error -11 num 20573 file
ffff88007e16a818 fl ffff88007dc796b0
dlm: dlm_plock_callback: vfs lock error -11 num 2055f file
ffff88007e16a818 fl ffff88007dc796b0
dlm: dlm_plock_callback: vfs lock error -11 num 2058b file
ffff880017cf56d0 fl ffff88007e566750
dlm: dlm_plock_callback: vfs lock error -11 num 2055c file
ffff88007e16a818 fl ffff88007e566750

(code modified to report the -11 / -EAGAIN)

And /proc/locks on the server has entries that look like:

54: POSIX *NOINODE* WRITE 8682 <none>:0 5 9
55: POSIX *NOINODE* READ 8682 <none>:0 0 4
56: POSIX *NOINODE* READ 8682 <none>:0 0 4
57: POSIX *NOINODE* READ 8682 <none>:0 5 9
58: POSIX *NOINODE* READ 8682 <none>:0 5 9
59: POSIX *NOINODE* WRITE 8682 <none>:0 0 4

My tests are looking good using the current locks_copy_lock(), so I plan
to just stick with that.
Dave


2009-01-20 23:05:48

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

Sorry for the delay responding!

On Wed, Dec 17, 2008 at 03:28:27PM -0600, David Teigland wrote:
> On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> > Yep, that looks much better. Though actually I suspect what was really
> > intended was to use "flc" for the notifies, and "fl" for the
> > posix_lock_file().
> >
> > Also, since flc is never actually handed to the posix lock system, I
> > think it should be a "shallow" lock copy--so it should be created with
> > __locks_copy_lock(). Something like the below?
>
> With this I'm back to seeing the same problem, but with the mismatch in
> the reverse direction.
>
> It seems fl points to lockd's file_lock, and that lockd expects notify()
> will be called with a pointer to a file_lock that matches one of its own.
> Based on that I think we'd always pass fl to notify().

The lockd grant function that's called is nlmsvc_grant_deferred(), and
it uses the passed-in fl only in nlm_compare_locks().

Perhaps the problem is that the posix_lock_file() modifies the original
file lock which lockd is also holding a pointer to, and thus the
coalescing has also changed the lock that *lockd*'s sees?

--b.

>
> The question then is whether lockd's file_lock should be coalesced or not.
> If so, we'd pass fl to posix_lock_file(). If not, we'd pass flc to
> posix_lock_file(). In both cases, fl would be passed to notify() and
> would match. In the former case, I don't see much purpose for flc to even
> exist. The patch I sent was the later case.
>
> In the original code, we coalesce flc which then fails to match the
> original (fl) in lockd. In your patch, we coalesce fl which then fails to
> match the copy of the original (flc).
>
> Dave
>
> >
> > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > index eba87ff..e8d9086 100644
> > --- a/fs/dlm/plock.c
> > +++ b/fs/dlm/plock.c
> > @@ -103,7 +103,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > op->info.owner = (__u64) fl->fl_pid;
> > xop->callback = fl->fl_lmops->fl_grant;
> > locks_init_lock(&xop->flc);
> > - locks_copy_lock(&xop->flc, fl);
> > + __locks_copy_lock(&xop->flc, fl);
> > xop->fl = fl;
> > xop->file = file;
> > } else {
> > @@ -173,8 +173,8 @@ static int dlm_plock_callback(struct plock_op *op)
> > }
> >
> > /* got fs lock; bookkeep locally as well: */
> > - flc->fl_flags &= ~FL_SLEEP;
> > - if (posix_lock_file(file, flc, NULL)) {
> > + fl->fl_flags &= ~FL_SLEEP;
> > + if (posix_lock_file(file, fl, NULL)) {
> > /*
> > * This can only happen in the case of kmalloc() failure.
> > * The filesystem's own lock is the authoritative lock,

2009-01-20 23:15:34

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Tue, Jan 20, 2009 at 06:05:48PM -0500, bfields wrote:
> Sorry for the delay responding!
>
> On Wed, Dec 17, 2008 at 03:28:27PM -0600, David Teigland wrote:
> > On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> > > Yep, that looks much better. Though actually I suspect what was really
> > > intended was to use "flc" for the notifies, and "fl" for the
> > > posix_lock_file().
> > >
> > > Also, since flc is never actually handed to the posix lock system, I
> > > think it should be a "shallow" lock copy--so it should be created with
> > > __locks_copy_lock(). Something like the below?
> >
> > With this I'm back to seeing the same problem, but with the mismatch in
> > the reverse direction.
> >
> > It seems fl points to lockd's file_lock, and that lockd expects notify()
> > will be called with a pointer to a file_lock that matches one of its own.
> > Based on that I think we'd always pass fl to notify().
>
> The lockd grant function that's called is nlmsvc_grant_deferred(), and
> it uses the passed-in fl only in nlm_compare_locks().
>
> Perhaps the problem is that the posix_lock_file() modifies the original
> file lock which lockd is also holding a pointer to, and thus the
> coalescing has also changed the lock that *lockd*'s sees?

Whoops, sorry, I stopped reading too early:

> > The question then is whether lockd's file_lock should be coalesced or not.
> > If so, we'd pass fl to posix_lock_file(). If not, we'd pass flc to
> > posix_lock_file(). In both cases, fl would be passed to notify() and
> > would match. In the former case, I don't see much purpose for flc to even
> > exist. The patch I sent was the later case.

Yes, OK, makes sense. But the former case (removing flc entirely) might
be simpler:

In the current code, the locks_copy_lock() results in ->fl_copy_lock()
calls without corresponding ->fl_release_private() calls on the copy
that's created; not a problem for the current code, but also not the way
the lock api should work.

--b.

> >
> > In the original code, we coalesce flc which then fails to match the
> > original (fl) in lockd. In your patch, we coalesce fl which then fails to
> > match the copy of the original (flc).
> >
> > Dave
> >
> > >
> > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > index eba87ff..e8d9086 100644
> > > --- a/fs/dlm/plock.c
> > > +++ b/fs/dlm/plock.c
> > > @@ -103,7 +103,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > > op->info.owner = (__u64) fl->fl_pid;
> > > xop->callback = fl->fl_lmops->fl_grant;
> > > locks_init_lock(&xop->flc);
> > > - locks_copy_lock(&xop->flc, fl);
> > > + __locks_copy_lock(&xop->flc, fl);
> > > xop->fl = fl;
> > > xop->file = file;
> > > } else {
> > > @@ -173,8 +173,8 @@ static int dlm_plock_callback(struct plock_op *op)
> > > }
> > >
> > > /* got fs lock; bookkeep locally as well: */
> > > - flc->fl_flags &= ~FL_SLEEP;
> > > - if (posix_lock_file(file, flc, NULL)) {
> > > + fl->fl_flags &= ~FL_SLEEP;
> > > + if (posix_lock_file(file, fl, NULL)) {
> > > /*
> > > * This can only happen in the case of kmalloc() failure.
> > > * The filesystem's own lock is the authoritative lock,

2009-01-15 16:32:22

by David Teigland

[permalink] [raw]
Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> On Wed, Dec 17, 2008 at 01:14:53PM -0600, David Teigland wrote:
> > Jeff suggested the following patch, which I've tried and it fixes the
> > problem I was seeing. It passes the original, unmodified file_lock to
> > notify(), instead of the copy which is passed to (and coalesced by)
> > posix_lock_file(). I'm guessing this was reason for having a copy of the
> > file_lock in the first place, but it was just not used correctly.
>
> Yep, that looks much better. Though actually I suspect what was really
> intended was to use "flc" for the notifies, and "fl" for the
> posix_lock_file().
>
> Also, since flc is never actually handed to the posix lock system, I
> think it should be a "shallow" lock copy--so it should be created with
> __locks_copy_lock(). Something like the below?

I left this hanging over the holidays and I'd like to get it wrapped up.
In summary, I think the following is the correct fix:
http://marc.info/?l=linux-nfs&m=122954145532438&w=2

(I'd like to do the s/locks_copy_lock/__locks_copy_lock/ in a separate
patch since it's not directly related to fixing the bug.)

Bruce suggested that perhaps my patch should swap "fl" and "flc", which
I don't think is correct (and doesn't fix the problem in a test). Here's
my complicated explanation of that:
http://marc.info/?l=linux-nfs&m=122954948914263&w=2

Without any further feedback, I'll plan to send my patch soon for 2.6.29.
Thanks,
Dave