by J. Bruce Fields

Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Tue, Jan 20, 2009 at 06:05:48PM -0500, bfields wrote:
> Sorry for the delay responding!
>
> On Wed, Dec 17, 2008 at 03:28:27PM -0600, David Teigland wrote:
> > On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> > > Yep, that looks much better. Though actually I suspect what was really
> > > intended was to use "flc" for the notifies, and "fl" for the
> > > posix_lock_file().
> > >
> > > Also, since flc is never actually handed to the posix lock system, I
> > > think it should be a "shallow" lock copy--so it should be created with
> > > __locks_copy_lock(). Something like the below?
> >
> > With this I'm back to seeing the same problem, but with the mismatch in
> > the reverse direction.
> >
> > It seems fl points to lockd's file_lock, and that lockd expects notify()
> > will be called with a pointer to a file_lock that matches one of its own.
> > Based on that I think we'd always pass fl to notify().
>
> The lockd grant function that's called is nlmsvc_grant_deferred(), and
> it uses the passed-in fl only in nlm_compare_locks().
>
> Perhaps the problem is that the posix_lock_file() modifies the original
> file lock which lockd is also holding a pointer to, and thus the
> coalescing has also changed the lock that *lockd* sees?

Whoops, sorry, I stopped reading too early:

> > The question then is whether lockd's file_lock should be coalesced or not.
> > If so, we'd pass fl to posix_lock_file(). If not, we'd pass flc to
> > posix_lock_file(). In both cases, fl would be passed to notify() and
> > would match. In the former case, I don't see much purpose for flc to even
> > exist. The patch I sent was the latter case.

Yes, OK, makes sense. But the former case (removing flc entirely) might
be simpler:

In the current code, the locks_copy_lock() results in ->fl_copy_lock()
calls without corresponding ->fl_release_private() calls on the copy
that's created; not a problem for the current code, but also not the way
the lock api should work.
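
To make the pairing concern concrete, here is a rough userspace sketch. It is a toy model, not kernel code: every toy_* name is invented for illustration, and the counter only stands in for whatever private state a lock owner's copy callback might set up. It assumes a "deep" copy runs the owner's copy callback while a "shallow" copy does not, which is the distinction drawn above between locks_copy_lock() and __locks_copy_lock().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counter: private-state copies made minus copies released. */
static int toy_outstanding;

struct toy_lock;

/* Toy stand-in for per-owner lock callbacks (not the kernel structures). */
struct toy_lock_ops {
    void (*copy_private)(struct toy_lock *new_lock, const struct toy_lock *old);
    void (*release_private)(struct toy_lock *lock);
};

struct toy_lock {
    long start, end;
    const struct toy_lock_ops *ops;
    int has_private;    /* set once copy_private has run on this lock */
};

static void toy_copy_private(struct toy_lock *new_lock, const struct toy_lock *old)
{
    (void)old;
    new_lock->has_private = 1;
    toy_outstanding++;
}

static void toy_release_private(struct toy_lock *lock)
{
    if (lock->has_private) {
        lock->has_private = 0;
        toy_outstanding--;
    }
}

static const struct toy_lock_ops toy_ops = { toy_copy_private, toy_release_private };

/* Shallow copy: plain fields only, no ops and no callbacks. */
static void toy_shallow_copy(struct toy_lock *dst, const struct toy_lock *src)
{
    assert(dst && src);
    dst->start = src->start;
    dst->end = src->end;
    dst->ops = NULL;
    dst->has_private = 0;
}

/* Deep copy: shallow copy plus the owner's copy callback. */
static void toy_deep_copy(struct toy_lock *dst, const struct toy_lock *src)
{
    toy_shallow_copy(dst, src);
    dst->ops = src->ops;
    if (dst->ops && dst->ops->copy_private)
        dst->ops->copy_private(dst, src);
}

/* Release: has to follow every deep copy to keep the callbacks balanced. */
static void toy_release(struct toy_lock *lock)
{
    if (lock->ops && lock->ops->release_private)
        lock->ops->release_private(lock);
    lock->ops = NULL;
}
```

In this model a deep copy that is never passed to toy_release() leaves toy_outstanding above zero, while a shallow copy never touches it. That is the argument for either making the copy shallow or dropping it.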

--b.

> >
> > In the original code, we coalesce flc which then fails to match the
> > original (fl) in lockd. In your patch, we coalesce fl which then fails to
> > match the copy of the original (flc).
> >
> > Dave
> >
> > >
> > > diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
> > > index eba87ff..e8d9086 100644
> > > --- a/fs/dlm/plock.c
> > > +++ b/fs/dlm/plock.c
> > > @@ -103,7 +103,7 @@ int dlm_posix_lock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
> > >  		op->info.owner = (__u64) fl->fl_pid;
> > >  		xop->callback = fl->fl_lmops->fl_grant;
> > >  		locks_init_lock(&xop->flc);
> > > -		locks_copy_lock(&xop->flc, fl);
> > > +		__locks_copy_lock(&xop->flc, fl);
> > >  		xop->fl = fl;
> > >  		xop->file = file;
> > >  	} else {
> > > @@ -173,8 +173,8 @@ static int dlm_plock_callback(struct plock_op *op)
> > >  	}
> > >
> > >  	/* got fs lock; bookkeep locally as well: */
> > > -	flc->fl_flags &= ~FL_SLEEP;
> > > -	if (posix_lock_file(file, flc, NULL)) {
> > > +	fl->fl_flags &= ~FL_SLEEP;
> > > +	if (posix_lock_file(file, fl, NULL)) {
> > >  		/*
> > >  		 * This can only happen in the case of kmalloc() failure.
> > >  		 * The filesystem's own lock is the authoritative lock,

2009-01-15 16:32:22

by David Teigland

Subject: Re: [PATCH] lockd: handle fl_grant callbacks with coalesced locks (RFC)

On Wed, Dec 17, 2008 at 03:01:56PM -0500, J. Bruce Fields wrote:
> On Wed, Dec 17, 2008 at 01:14:53PM -0600, David Teigland wrote:
> > Jeff suggested the following patch, which I've tried and it fixes the
> > problem I was seeing. It passes the original, unmodified file_lock to
> > notify(), instead of the copy which is passed to (and coalesced by)
> > posix_lock_file(). I'm guessing this was the reason for having a copy of the
> > file_lock in the first place, but it was just not used correctly.
>
> Yep, that looks much better. Though actually I suspect what was really
> intended was to use "flc" for the notifies, and "fl" for the
> posix_lock_file().
>
> Also, since flc is never actually handed to the posix lock system, I
> think it should be a "shallow" lock copy--so it should be created with
> __locks_copy_lock(). Something like the below?

I left this hanging over the holidays and I'd like to get it wrapped up.
In summary, I think the following is the correct fix:
http://marc.info/?l=linux-nfs&m=122954145532438&w=2

(I'd like to do the s/locks_copy_lock/__locks_copy_lock/ in a separate
patch since it's not directly related to fixing the bug.)

Bruce suggested that perhaps my patch should swap "fl" and "flc", which
I don't think is correct (and doesn't fix the problem in a test). Here's
my complicated explanation of that:
http://marc.info/?l=linux-nfs&m=122954948914263&w=2
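
For readers without the archive handy, the mismatch can be sketched with a small userspace toy model. This is not the kernel code: all toy_* and TOY_* names are made up, locks are reduced to byte ranges, and coalescing is assumed to widen the lock that is passed in, in place. The caller's record is modeled as a pointer to fl, as described for lockd earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a file lock: only the byte range matters here. */
struct toy_lock {
    long start;
    long end;
};

/* Toy stand-in for the range comparison done when matching a grant. */
static bool toy_same_range(const struct toy_lock *a, const struct toy_lock *b)
{
    return a->start == b->start && a->end == b->end;
}

/*
 * Toy stand-in for local bookkeeping: if the request touches or overlaps
 * an already-held lock, widen the request in place to cover both ranges.
 */
static void toy_apply_and_coalesce(const struct toy_lock *held, struct toy_lock *req)
{
    assert(held && req);
    if (req->start <= held->end + 1 && held->start <= req->end + 1) {
        if (held->start < req->start)
            req->start = held->start;
        if (held->end > req->end)
            req->end = held->end;
    }
}

enum toy_variant {
    TOY_ORIGINAL,   /* bookkeep with flc, notify with flc */
    TOY_SWAPPED,    /* bookkeep with fl,  notify with flc */
    TOY_FIXED       /* bookkeep with flc, notify with fl  */
};

/* Returns true if the lock handed to notify matches the caller's own fl. */
static bool toy_grant_matches(enum toy_variant v)
{
    struct toy_lock held = { 0, 9 };    /* adjacent lock already held */
    struct toy_lock fl = { 10, 19 };    /* caller keeps a pointer to this */
    struct toy_lock flc = fl;           /* private copy taken at request time */
    struct toy_lock *booked = (v == TOY_SWAPPED) ? &fl : &flc;
    struct toy_lock *notified = (v == TOY_FIXED) ? &fl : &flc;

    toy_apply_and_coalesce(&held, booked);
    return toy_same_range(&fl, notified);
}
```

In this model only TOY_FIXED produces a match. TOY_ORIGINAL widens the copy and then reports the widened copy, and TOY_SWAPPED widens the caller's own lock and then reports the stale copy.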

Without any further feedback, I'll plan to send my patch soon for 2.6.29.
Thanks,
Dave