v4:
- close more potential races in the setlease code, and fix some error
  handling bugs in that code.
- clean up delegation setting functions, eliminating unused arguments
and avoiding allocations when there has already been a delegation
break
- add separate spinlock for block_delegations/delegation_blocked code
v3:
- use alternate method for checking for delegation break races after
getting a lease (just check fi_had_conflict instead)
- drop file_has_lease patch -- no longer needed
- move cl_revoked handling patch into this set. It means altering a
few of the later patches, but it keeps the set more topically
coherent
v2:
- move the remove_stid call out of nfs4_free_stid and into its callers
Yet another respin of the delegation rework to prepare for client_mutex
removal. This fixes all of the error handling bugs I could find and
should fix up the potential races between setting a delegation and
having it broken.
I also added some cleanup of the delegation setting code itself, moving
the allocation into the nfs4_set_delegation call, so that we can avoid
allocating one when we know there has already been a conflict.
Jeff Layton (7):
nfsd: Protect the nfs4_file delegation fields using the fi_lock
nfsd: Fix delegation revocation
nfsd: ensure that clp->cl_revoked list is protected by clp->cl_lock
nfsd: drop unused stp arg to alloc_init_deleg
nfsd: clean up arguments to nfs4_open_delegation
nfsd: clean up nfs4_set_delegation
nfsd: give block_delegation and delegation_blocked its own spinlock
Trond Myklebust (3):
nfsd: Move the delegation reference counter into the struct nfs4_stid
nfsd: simplify stateid allocation and file handling
nfsd: Convert delegation counter to an atomic_long_t type
fs/nfsd/nfs4state.c | 250 +++++++++++++++++++++++++++++++++-------------------
fs/nfsd/state.h | 2 +-
2 files changed, 161 insertions(+), 91 deletions(-)
--
1.9.3
On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> On Fri, 18 Jul 2014 12:28:25 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > more granular than the state_lock and in later patches we'll want to
> > > be able to rely on it in addition to the state_lock.
> > >
> > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > delegation. With the client_mutex gone, we'll have a potential race
> > > condition.
> > >
> > > It's possible that the delegation could be recalled after we acquire the
> > > lease but before we ever get around to hashing it. If that happens, then
> > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > actually has none.
> >
> > I understand now, thanks: so the lease break code walks the list of
> > delegations associated with the file, finds none, and issues no recall,
> > but the open code continues merrily on and returns a delegation, with
> > the result that we return the client a delegation that will never be
> > recalled.
> >
> > That could be worded more carefully, and would be worth a separate patch
> > (since the bug predates the new locking).
> >
>
> Yes, that's basically correct. I'd have to think about how to fix that
> with the current code. It's probably doable if you think it's
> worthwhile, but I'll need to rebase this set on top of it.
Well, I was wondering if this patch could just be split in two, no need
to backport further than that.
> > > Attempt to acquire a delegation. If that succeeds, take the spinlocks
> > > and then check to see if the file has had a conflict show up since then.
> > > If it has, then we assume that the lease is no longer valid and that
> > > we shouldn't hand out a delegation.
> > >
> > > There's also one more potential (but very unlikely) problem. If the
> > > lease is broken before the delegation is hashed, then it could leak.
> > > In the event that the fi_delegations list is empty, reset the
> > > fl_break_time to jiffies so that it's cleaned up ASAP by
> > > the normal lease handling code.
> >
> > Is there actually any guarantee time_out_leases() will get called on
> > this inode again?
> >
> > --b.
> >
>
> Yes. Lease breaks are handled in two phases. We walk the i_flock list
> and issue a ->lm_break on each lease, and then later we walk the list
> again after putting the task to sleep, and try to time out the leases.
> So by doing this, we should ensure that the task will wake up after
> sleeping and delete it.
In the case of an interrupt or a nonblocking break (which is what nfsd
will do), then time_out_leases isn't called again from what I could
tell.
--b.
>
> > >
> > > Signed-off-by: Trond Myklebust <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> > > ---
> > > fs/nfsd/nfs4state.c | 90 +++++++++++++++++++++++++++++++++++++++--------------
> > > 1 file changed, 66 insertions(+), 24 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index fd4deb049ddf..9ab067e85b51 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
> > >
> > > static void nfs4_put_deleg_lease(struct nfs4_file *fp)
> > > {
> > > + lockdep_assert_held(&state_lock);
> > > +
> > > if (!fp->fi_lease)
> > > return;
> > > if (atomic_dec_and_test(&fp->fi_delegees)) {
> > > @@ -643,11 +645,10 @@ static void
> > > hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > > {
> > > lockdep_assert_held(&state_lock);
> > > + lockdep_assert_held(&fp->fi_lock);
> > >
> > > dp->dl_stid.sc_type = NFS4_DELEG_STID;
> > > - spin_lock(&fp->fi_lock);
> > > list_add(&dp->dl_perfile, &fp->fi_delegations);
> > > - spin_unlock(&fp->fi_lock);
> > > list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> > > }
> > >
> > > @@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
> > >
> > > spin_lock(&state_lock);
> > > dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> > > + spin_lock(&fp->fi_lock);
> > > list_del_init(&dp->dl_perclnt);
> > > list_del_init(&dp->dl_recall_lru);
> > > - spin_lock(&fp->fi_lock);
> > > list_del_init(&dp->dl_perfile);
> > > spin_unlock(&fp->fi_lock);
> > > - spin_unlock(&state_lock);
> > > if (fp) {
> > > nfs4_put_deleg_lease(fp);
> > > - put_nfs4_file(fp);
> > > dp->dl_file = NULL;
> > > }
> > > + spin_unlock(&state_lock);
> > > + if (fp)
> > > + put_nfs4_file(fp);
> > > }
> > >
> > > static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> > > @@ -3143,10 +3145,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
> > > */
> > > fl->fl_break_time = 0;
> > >
> > > - fp->fi_had_conflict = true;
> > > spin_lock(&fp->fi_lock);
> > > - list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> > > - nfsd_break_one_deleg(dp);
> > > + fp->fi_had_conflict = true;
> > > + /*
> > > + * If there are no delegations on the list, then we can't count on this
> > > + * lease ever being cleaned up. Set the fl_break_time to jiffies so that
> > > + * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
> > > + * true should keep any new delegations from being hashed.
> > > + */
> > > + if (list_empty(&fp->fi_delegations))
> > > + fl->fl_break_time = jiffies;
> > > + else
> > > + list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> > > + nfsd_break_one_deleg(dp);
> > > spin_unlock(&fp->fi_lock);
> > > }
> > >
> > > @@ -3493,46 +3504,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
> > > {
> > > struct nfs4_file *fp = dp->dl_file;
> > > struct file_lock *fl;
> > > - int status;
> > > + struct file *filp;
> > > + int status = 0;
> > >
> > > fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> > > if (!fl)
> > > return -ENOMEM;
> > > - fl->fl_file = find_readable_file(fp);
> > > - status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
> > > - if (status)
> > > - goto out_free;
> > > + filp = find_readable_file(fp);
> > > + if (!filp) {
> > > + /* We should always have a readable file here */
> > > + WARN_ON_ONCE(1);
> > > + return -EBADF;
> > > + }
> > > + status = vfs_setlease(filp, fl->fl_type, &fl);
> > > + if (status) {
> > > + locks_free_lock(fl);
> > > + goto out_fput;
> > > + }
> > > + spin_lock(&state_lock);
> > > + spin_lock(&fp->fi_lock);
> > > + /* Did the lease get broken before we took the lock? */
> > > + status = -EAGAIN;
> > > + if (fp->fi_had_conflict)
> > > + goto out_unlock;
> > > + /* Race breaker */
> > > + if (fp->fi_lease) {
> > > + status = 0;
> > > + atomic_inc(&fp->fi_delegees);
> > > + hash_delegation_locked(dp, fp);
> > > + goto out_unlock;
> > > + }
> > > fp->fi_lease = fl;
> > > - fp->fi_deleg_file = fl->fl_file;
> > > + fp->fi_deleg_file = filp;
> > > atomic_set(&fp->fi_delegees, 1);
> > > - spin_lock(&state_lock);
> > > hash_delegation_locked(dp, fp);
> > > + spin_unlock(&fp->fi_lock);
> > > spin_unlock(&state_lock);
> > > return 0;
> > > -out_free:
> > > - if (fl->fl_file)
> > > - fput(fl->fl_file);
> > > - locks_free_lock(fl);
> > > +out_unlock:
> > > + spin_unlock(&fp->fi_lock);
> > > + spin_unlock(&state_lock);
> > > +out_fput:
> > > + if (filp)
> > > + fput(filp);
> > > return status;
> > > }
> > >
> > > static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > > {
> > > + int status = 0;
> > > +
> > > if (fp->fi_had_conflict)
> > > return -EAGAIN;
> > > get_nfs4_file(fp);
> > > + spin_lock(&state_lock);
> > > + spin_lock(&fp->fi_lock);
> > > dp->dl_file = fp;
> > > - if (!fp->fi_lease)
> > > + if (!fp->fi_lease) {
> > > + spin_unlock(&fp->fi_lock);
> > > + spin_unlock(&state_lock);
> > > return nfs4_setlease(dp);
> > > - spin_lock(&state_lock);
> > > + }
> > > atomic_inc(&fp->fi_delegees);
> > > if (fp->fi_had_conflict) {
> > > - spin_unlock(&state_lock);
> > > - return -EAGAIN;
> > > + status = -EAGAIN;
> > > + goto out_unlock;
> > > }
> > > hash_delegation_locked(dp, fp);
> > > +out_unlock:
> > > + spin_unlock(&fp->fi_lock);
> > > spin_unlock(&state_lock);
> > > - return 0;
> > > + return status;
> > > }
> > >
> > > static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
> > > --
> > > 1.9.3
> > >
>
>
> --
> Jeff Layton <[email protected]>
On Mon, 21 Jul 2014 17:17:57 -0400 "J. Bruce Fields" <[email protected]>
wrote:
> On Tue, Jul 22, 2014 at 06:40:49AM +1000, NeilBrown wrote:
> > On Mon, 21 Jul 2014 07:44:12 -0400 Jeff Layton <[email protected]>
> > wrote:
> >
> > > On Mon, 21 Jul 2014 17:02:54 +1000
> > > NeilBrown <[email protected]> wrote:
> >
> > > > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > > > >
> > > > > __set_bit(hash&255, bd->set[bd->new]);
> > > > > __set_bit((hash>>8)&255, bd->set[bd->new]);
> > > > > __set_bit((hash>>16)&255, bd->set[bd->new]);
> > > > > + spin_lock(&blocked_delegations_lock);
> > > >
> > > > __set_bit isn't atomic. The spin_lock should be taken *before* these
> > > > __set_bit() calls.
> > > >
> > > > Otherwise, looks fine.
> > > >
> > > > Thanks,
> > > > NeilBrown
> > > >
> > > >
> > >
> > > Ok. I guess the worry is that we could end up setting bits in the
> > > middle of swapping the two fields? Makes sense -- fixed in my repo.
> >
> > It is more subtle than that.
> > __set_bit() will:
> > read a value from memory to a register
> > set a bit in the register
> > write the register back out to memory
> >
> > If two threads both run __set_bit on the same word of memory at the same
> > time, one of the updates can get lost.
> > set_bit() (no underscore) performs an atomic RMW to avoid this, but is more
> > expensive.
> > spin_lock() obviously ensures the required exclusion and as we are going to
> > take the lock anyway we may as well take it before setting bits so we can use
> > the non-atomic (cheaper) __set_bit function.
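To illustrate the lost update Neil describes, here is a rough sketch of
the plain read-modify-write that __set_bit() boils down to (illustrative
only -- not the actual arch implementation, and sketch_set_bit is my
name for it):

	static inline void sketch_set_bit(unsigned long nr, unsigned long *addr)
	{
		unsigned long mask = 1UL << (nr % BITS_PER_LONG);
		unsigned long *word = addr + nr / BITS_PER_LONG;

		*word |= mask;	/* load, OR, store -- three separate steps */
	}

	/*
	 * Two tasks interleaving on the same word:
	 *
	 *	task A: loads *word (bit X clear)
	 *	task B: loads *word (bit Y clear)
	 *	task A: stores *word | X
	 *	task B: stores *word | Y   <- overwrites A's store; bit X lost
	 *
	 * Holding blocked_delegations_lock across the __set_bit() calls
	 * serializes the read-modify-writes, so the cheaper non-atomic
	 * helper remains safe.
	 */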
> >
> > > I'll send out the updated set later today (it also includes a few nits
> > > that HCH pointed out last week).
> > >
> > > As a side note...I wonder how much we'll get in the way of false
> > > positives with this scheme?
> >
> > If a future version of NFSv4 could allow delegations to be granted while a
> > file is open (oh, it seems you are the only client using this file at the
> > moment, you can treat this "open" as a delegation if you like) a few false
> > positives would be a complete non-issue.
>
> For what it's worth, I think 4.1 provides what you're asking for here;
> see
>
> http://tools.ietf.org/html/rfc5661#section-20.7
>
> and the discussion of the various WANT_ flags in
>
> http://tools.ietf.org/html/rfc5661#section-18.16.3
>
> As far as I know none of that is implemented yet.
>
> --b.
I guess I should really read the 4.1 (and 4.2) spec some day....
Though the 20.7 section seems to be about saying "resources in general are
available" rather than "this specific file that you wanted a delegation for
but didn't get one is now up for delegation"....
But I only had a quick read so I might have missed something.
Thanks,
NeilBrown
Move the alloc_init_deleg call into nfs4_set_delegation and change the
function to return a pointer to the delegation or an IS_ERR return. This
allows us to skip allocating a delegation if the file has already
experienced a lease conflict.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 34 ++++++++++++++++++++++------------
1 file changed, 22 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 1834d9bd5935..a2c6c85adfc7 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3553,12 +3553,20 @@ out_fput:
return status;
}
-static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
+static struct nfs4_delegation *
+nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
+ struct nfs4_file *fp)
{
- int status = 0;
+ int status;
+ struct nfs4_delegation *dp;
if (fp->fi_had_conflict)
- return -EAGAIN;
+ return ERR_PTR(-EAGAIN);
+
+ dp = alloc_init_deleg(clp, fh);
+ if (!dp)
+ return ERR_PTR(-ENOMEM);
+
get_nfs4_file(fp);
spin_lock(&state_lock);
spin_lock(&fp->fi_lock);
@@ -3566,7 +3574,8 @@ static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
if (!fp->fi_lease) {
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
- return nfs4_setlease(dp);
+ status = nfs4_setlease(dp);
+ goto out;
}
atomic_inc(&fp->fi_delegees);
if (fp->fi_had_conflict) {
@@ -3574,10 +3583,16 @@ static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
goto out_unlock;
}
hash_delegation_locked(dp, fp);
+ status = 0;
out_unlock:
spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
- return status;
+out:
+ if (status) {
+ nfs4_put_delegation(dp);
+ return ERR_PTR(status);
+ }
+ return dp;
}
static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
@@ -3651,12 +3666,9 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open,
default:
goto out_no_deleg;
}
- dp = alloc_init_deleg(clp, fh);
- if (dp == NULL)
+ dp = nfs4_set_delegation(clp, fh, stp->st_file);
+ if (IS_ERR(dp))
goto out_no_deleg;
- status = nfs4_set_delegation(dp, stp->st_file);
- if (status)
- goto out_free;
memcpy(&open->op_delegate_stateid, &dp->dl_stid.sc_stateid, sizeof(dp->dl_stid.sc_stateid));
@@ -3664,8 +3676,6 @@ nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open,
STATEID_VAL(&dp->dl_stid.sc_stateid));
open->op_delegate_type = NFS4_OPEN_DELEGATE_READ;
return;
-out_free:
- nfs4_put_delegation(dp);
out_no_deleg:
open->op_delegate_type = NFS4_OPEN_DELEGATE_NONE;
if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS &&
--
1.9.3
On Fri, 18 Jul 2014 12:28:25 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > Move more of the delegation fields to be protected by the fi_lock. It's
> > more granular than the state_lock and in later patches we'll want to
> > be able to rely on it in addition to the state_lock.
> >
> > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > client_mutex to ensure that it doesn't disappear before we can hash the
> > delegation. With the client_mutex gone, we'll have a potential race
> > condition.
> >
> > It's possible that the delegation could be recalled after we acquire the
> > lease but before we ever get around to hashing it. If that happens, then
> > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > actually has none.
>
> I understand now, thanks: so the lease break code walks the list of
> delegations associated with the file, finds none, and issues no recall,
> but the open code continues merrily on and returns a delegation, with
> the result that we return the client a delegation that will never be
> recalled.
>
> That could be worded more carefully, and would be worth a separate patch
> (since the bug predates the new locking).
>
Yes, that's basically correct. I'd have to think about how to fix that
with the current code. It's probably doable if you think it's
worthwhile, but I'll need to rebase this set on top of it.
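To make the window concrete, here is a sketch of the interleaving under
discussion (the column labels are mine):

	/*
	 * nfsd OPEN path                    conflicting opener
	 * --------------                    ------------------
	 * nfs4_setlease()
	 *   vfs_setlease() succeeds
	 *                                   __break_lease()
	 *                                     nfsd_break_deleg_cb()
	 *                                       fi_delegations is empty,
	 *                                       so no recall is issued
	 *   hash_delegation_locked()
	 * delegation returned to client     ...and it is never recalled
	 */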
> > Attempt to acquire a delegation. If that succeeds, take the spinlocks
> > and then check to see if the file has had a conflict show up since then.
> > If it has, then we assume that the lease is no longer valid and that
> > we shouldn't hand out a delegation.
> >
> > There's also one more potential (but very unlikely) problem. If the
> > lease is broken before the delegation is hashed, then it could leak.
> > In the event that the fi_delegations list is empty, reset the
> > fl_break_time to jiffies so that it's cleaned up ASAP by
> > the normal lease handling code.
>
> Is there actually any guarantee time_out_leases() will get called on
> this inode again?
>
> --b.
>
Yes. Lease breaks are handled in two phases. We walk the i_flock list
and issue a ->lm_break on each lease, and then later we walk the list
again after putting the task to sleep, and try to time out the leases.
So by doing this, we should ensure that the task will wake up after
sleeping and delete it.
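Condensed into a sketch (drawn from the description above, not the
actual fs/locks.c code):

	/*
	 * Phase 1: __break_lease() walks inode->i_flock and calls
	 *	    ->lm_break() on each lease (nfsd_break_deleg_cb()
	 *	    here), which may adjust fl_break_time.
	 * Phase 2: the breaker sleeps, wakes, and walks the list again
	 *	    via time_out_leases(), deleting any lease whose
	 *	    fl_break_time has passed.
	 *
	 * Setting fl_break_time = jiffies in phase 1 should therefore get
	 * the orphaned lease deleted on the phase 2 pass.
	 */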
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 90 +++++++++++++++++++++++++++++++++++++++--------------
> > 1 file changed, 66 insertions(+), 24 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index fd4deb049ddf..9ab067e85b51 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
> >
> > static void nfs4_put_deleg_lease(struct nfs4_file *fp)
> > {
> > + lockdep_assert_held(&state_lock);
> > +
> > if (!fp->fi_lease)
> > return;
> > if (atomic_dec_and_test(&fp->fi_delegees)) {
> > @@ -643,11 +645,10 @@ static void
> > hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > {
> > lockdep_assert_held(&state_lock);
> > + lockdep_assert_held(&fp->fi_lock);
> >
> > dp->dl_stid.sc_type = NFS4_DELEG_STID;
> > - spin_lock(&fp->fi_lock);
> > list_add(&dp->dl_perfile, &fp->fi_delegations);
> > - spin_unlock(&fp->fi_lock);
> > list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> > }
> >
> > @@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
> >
> > spin_lock(&state_lock);
> > dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> > + spin_lock(&fp->fi_lock);
> > list_del_init(&dp->dl_perclnt);
> > list_del_init(&dp->dl_recall_lru);
> > - spin_lock(&fp->fi_lock);
> > list_del_init(&dp->dl_perfile);
> > spin_unlock(&fp->fi_lock);
> > - spin_unlock(&state_lock);
> > if (fp) {
> > nfs4_put_deleg_lease(fp);
> > - put_nfs4_file(fp);
> > dp->dl_file = NULL;
> > }
> > + spin_unlock(&state_lock);
> > + if (fp)
> > + put_nfs4_file(fp);
> > }
> >
> > static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> > @@ -3143,10 +3145,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
> > */
> > fl->fl_break_time = 0;
> >
> > - fp->fi_had_conflict = true;
> > spin_lock(&fp->fi_lock);
> > - list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> > - nfsd_break_one_deleg(dp);
> > + fp->fi_had_conflict = true;
> > + /*
> > + * If there are no delegations on the list, then we can't count on this
> > + * lease ever being cleaned up. Set the fl_break_time to jiffies so that
> > + * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
> > + * true should keep any new delegations from being hashed.
> > + */
> > + if (list_empty(&fp->fi_delegations))
> > + fl->fl_break_time = jiffies;
> > + else
> > + list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> > + nfsd_break_one_deleg(dp);
> > spin_unlock(&fp->fi_lock);
> > }
> >
> > @@ -3493,46 +3504,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
> > {
> > struct nfs4_file *fp = dp->dl_file;
> > struct file_lock *fl;
> > - int status;
> > + struct file *filp;
> > + int status = 0;
> >
> > fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> > if (!fl)
> > return -ENOMEM;
> > - fl->fl_file = find_readable_file(fp);
> > - status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
> > - if (status)
> > - goto out_free;
> > + filp = find_readable_file(fp);
> > + if (!filp) {
> > + /* We should always have a readable file here */
> > + WARN_ON_ONCE(1);
> > + return -EBADF;
> > + }
> > + status = vfs_setlease(filp, fl->fl_type, &fl);
> > + if (status) {
> > + locks_free_lock(fl);
> > + goto out_fput;
> > + }
> > + spin_lock(&state_lock);
> > + spin_lock(&fp->fi_lock);
> > + /* Did the lease get broken before we took the lock? */
> > + status = -EAGAIN;
> > + if (fp->fi_had_conflict)
> > + goto out_unlock;
> > + /* Race breaker */
> > + if (fp->fi_lease) {
> > + status = 0;
> > + atomic_inc(&fp->fi_delegees);
> > + hash_delegation_locked(dp, fp);
> > + goto out_unlock;
> > + }
> > fp->fi_lease = fl;
> > - fp->fi_deleg_file = fl->fl_file;
> > + fp->fi_deleg_file = filp;
> > atomic_set(&fp->fi_delegees, 1);
> > - spin_lock(&state_lock);
> > hash_delegation_locked(dp, fp);
> > + spin_unlock(&fp->fi_lock);
> > spin_unlock(&state_lock);
> > return 0;
> > -out_free:
> > - if (fl->fl_file)
> > - fput(fl->fl_file);
> > - locks_free_lock(fl);
> > +out_unlock:
> > + spin_unlock(&fp->fi_lock);
> > + spin_unlock(&state_lock);
> > +out_fput:
> > + if (filp)
> > + fput(filp);
> > return status;
> > }
> >
> > static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > {
> > + int status = 0;
> > +
> > if (fp->fi_had_conflict)
> > return -EAGAIN;
> > get_nfs4_file(fp);
> > + spin_lock(&state_lock);
> > + spin_lock(&fp->fi_lock);
> > dp->dl_file = fp;
> > - if (!fp->fi_lease)
> > + if (!fp->fi_lease) {
> > + spin_unlock(&fp->fi_lock);
> > + spin_unlock(&state_lock);
> > return nfs4_setlease(dp);
> > - spin_lock(&state_lock);
> > + }
> > atomic_inc(&fp->fi_delegees);
> > if (fp->fi_had_conflict) {
> > - spin_unlock(&state_lock);
> > - return -EAGAIN;
> > + status = -EAGAIN;
> > + goto out_unlock;
> > }
> > hash_delegation_locked(dp, fp);
> > +out_unlock:
> > + spin_unlock(&fp->fi_lock);
> > spin_unlock(&state_lock);
> > - return 0;
> > + return status;
> > }
> >
> > static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
> > --
> > 1.9.3
> >
--
Jeff Layton <[email protected]>
No need to pass in a net pointer since we can derive it from the
stateid's client.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3e761ea48a70..1834d9bd5935 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3608,11 +3608,12 @@ static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
* proper support for them.
*/
static void
-nfs4_open_delegation(struct net *net, struct svc_fh *fh,
- struct nfsd4_open *open, struct nfs4_ol_stateid *stp)
+nfs4_open_delegation(struct svc_fh *fh, struct nfsd4_open *open,
+ struct nfs4_ol_stateid *stp)
{
struct nfs4_delegation *dp;
- struct nfs4_openowner *oo = container_of(stp->st_stateowner, struct nfs4_openowner, oo_owner);
+ struct nfs4_openowner *oo = openowner(stp->st_stateowner);
+ struct nfs4_client *clp = stp->st_stid.sc_client;
int cb_up;
int status = 0;
@@ -3631,7 +3632,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
* Let's not give out any delegations till everyone's
* had the chance to reclaim theirs....
*/
- if (locks_in_grace(net))
+ if (locks_in_grace(clp->net))
goto out_no_deleg;
if (!cb_up || !(oo->oo_flags & NFS4_OO_CONFIRMED))
goto out_no_deleg;
@@ -3650,7 +3651,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
default:
goto out_no_deleg;
}
- dp = alloc_init_deleg(oo->oo_owner.so_client, fh);
+ dp = alloc_init_deleg(clp, fh);
if (dp == NULL)
goto out_no_deleg;
status = nfs4_set_delegation(dp, stp->st_file);
@@ -3764,7 +3765,7 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
* Attempt to hand out a delegation. No error return, because the
* OPEN succeeds even if we fail.
*/
- nfs4_open_delegation(SVC_NET(rqstp), current_fh, open, stp);
+ nfs4_open_delegation(current_fh, open, stp);
nodeleg:
status = nfs_ok;
--
1.9.3
The state lock can be fairly heavily contended, and there's no reason
that nfs4_file lookups and delegation_blocked should be mutually
exclusive. Let's give the new block_delegation code its own spinlock.
It does mean that we'll need to take a different lock in the delegation
break code, but that's not generally as critical to performance.
Cc: Neil Brown <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index a2c6c85adfc7..952def00363b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -506,10 +506,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
* Each filter is 256 bits. We hash the filehandle to 32bit and use the
* low 3 bytes as hash-table indices.
*
- * 'state_lock', which is always held when block_delegations() is called,
- * is used to manage concurrent access. Testing does not need the lock
- * except when swapping the two filters.
+ * 'blocked_delegations_lock', which is always held when block_delegations()
+ * is called, is used to manage concurrent access. Testing does not need the
+ * lock except when swapping the two filters.
*/
+static DEFINE_SPINLOCK(blocked_delegations_lock);
static struct bloom_pair {
int entries, old_entries;
time_t swap_time;
@@ -525,7 +526,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
if (bd->entries == 0)
return 0;
if (seconds_since_boot() - bd->swap_time > 30) {
- spin_lock(&state_lock);
+ spin_lock(&blocked_delegations_lock);
if (seconds_since_boot() - bd->swap_time > 30) {
bd->entries -= bd->old_entries;
bd->old_entries = bd->entries;
@@ -534,7 +535,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
bd->new = 1-bd->new;
bd->swap_time = seconds_since_boot();
}
- spin_unlock(&state_lock);
+ spin_unlock(&blocked_delegations_lock);
}
hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
if (test_bit(hash&255, bd->set[0]) &&
@@ -555,16 +556,16 @@ static void block_delegations(struct knfsd_fh *fh)
u32 hash;
struct bloom_pair *bd = &blocked_delegations;
- lockdep_assert_held(&state_lock);
-
hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
__set_bit(hash&255, bd->set[bd->new]);
__set_bit((hash>>8)&255, bd->set[bd->new]);
__set_bit((hash>>16)&255, bd->set[bd->new]);
+ spin_lock(&blocked_delegations_lock);
if (bd->entries == 0)
bd->swap_time = seconds_since_boot();
bd->entries += 1;
+ spin_unlock(&blocked_delegations_lock);
}
static struct nfs4_delegation *
@@ -3097,16 +3098,16 @@ void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp)
struct nfs4_client *clp = dp->dl_stid.sc_client;
struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
- /*
- * We can't do this in nfsd_break_deleg_cb because it is
- * already holding inode->i_lock
- */
- spin_lock(&state_lock);
block_delegations(&dp->dl_fh);
+
/*
+ * We can't do this in nfsd_break_deleg_cb because it is
+ * already holding inode->i_lock.
+ *
* If the dl_time != 0, then we know that it has already been
* queued for a lease break. Don't queue it again.
*/
+ spin_lock(&state_lock);
if (dp->dl_time == 0) {
dp->dl_time = get_seconds();
list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
--
1.9.3
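As an aside for anyone skimming the bloom filter logic above: the
two-generation scheme can be modeled in plain userspace C like this (a
minimal sketch -- the helper names and the caller-supplied 32-bit hash
are mine, not nfsd's):

	#include <string.h>
	#include <time.h>

	struct bloom_pair {
		unsigned char set[2][256 / 8];	/* two 256-bit generations */
		int entries, old_entries;
		time_t swap_time;
		int new;			/* generation taking inserts */
	};

	static void setbit256(unsigned char *f, unsigned int b)
	{
		f[(b & 255) >> 3] |= 1u << (b & 7);
	}

	static int testbit256(const unsigned char *f, unsigned int b)
	{
		return f[(b & 255) >> 3] & (1u << (b & 7));
	}

	/* Every 30s, forget the older generation and start refilling it;
	 * an entry therefore matches for between 30 and 60 seconds. */
	static void bloom_age(struct bloom_pair *bd, time_t now)
	{
		if (now - bd->swap_time <= 30)
			return;
		bd->entries -= bd->old_entries;
		bd->old_entries = bd->entries;
		memset(bd->set[bd->new], 0, sizeof(bd->set[bd->new]));
		bd->new = 1 - bd->new;
		bd->swap_time = now;
	}

	/* Set three bits derived from one hash, as block_delegations() does. */
	static void bloom_insert(struct bloom_pair *bd, unsigned int hash)
	{
		setbit256(bd->set[bd->new], hash);
		setbit256(bd->set[bd->new], hash >> 8);
		setbit256(bd->set[bd->new], hash >> 16);
		bd->entries++;
	}

	/* A hit needs all three bits present in either generation. */
	static int bloom_test(const struct bloom_pair *bd, unsigned int hash)
	{
		int g;

		for (g = 0; g < 2; g++)
			if (testbit256(bd->set[g], hash) &&
			    testbit256(bd->set[g], hash >> 8) &&
			    testbit256(bd->set[g], hash >> 16))
				return 1;
		return 0;
	}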
On Fri, Jul 18, 2014 at 03:21:49PM -0400, J. Bruce Fields wrote:
> On Fri, Jul 18, 2014 at 03:04:04PM -0400, Jeff Layton wrote:
> > On Fri, 18 Jul 2014 13:49:57 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> > > > On Fri, 18 Jul 2014 12:28:25 -0400
> > > > "J. Bruce Fields" <[email protected]> wrote:
> > > >
> > > > > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > > > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > > > > more granular than the state_lock and in later patches we'll want to
> > > > > > be able to rely on it in addition to the state_lock.
> > > > > >
> > > > > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > > > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > > > > delegation. With the client_mutex gone, we'll have a potential race
> > > > > > condition.
> > > > > >
> > > > > > It's possible that the delegation could be recalled after we acquire the
> > > > > > lease but before we ever get around to hashing it. If that happens, then
> > > > > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > > > > actually has none.
> > > > >
> > > > > I understand now, thanks: so the lease break code walks the list of
> > > > > delegations associated with the file, finds none, and issues no recall,
> > > > > but the open code continues merrily on and returns a delegation, with
> > > > > the result that we return the client a delegation that will never be
> > > > > recalled.
> > > > >
> > > > > That could be worded more carefully, and would be worth a separate patch
> > > > > (since the bug predates the new locking).
> > > > >
> > > >
> > > > Yes, that's basically correct. I'd have to think about how to fix that
> > > > with the current code. It's probably doable if you think it's
> > > > worthwhile, but I'll need to rebase this set on top of it.
> > >
> > > Well, I was wondering if this patch could just be split in two, no need
> > > to backport further than that.
> > >
> >
> > Erm, now that I've looked, I don't think it'll be that easy. The key
> > here is to ensure that fi_had_conflict is set while holding the
> > fi_lock. The trick here is that we need to take it in nfs4_setlease as
> > well, and check the flag before hashing the delegation without dropping
> > the fi_lock.
>
> OK, I'll live. For the sake of anyone that actually runs across that
> bug I'll update the summary and changelog to emphasize the bugfix over
> the locking change.
So, intending to apply as follows.
--b.
commit 417c6629b2d81d5a18d29c4bbb6a9a4c64282a36
Author: Jeff Layton <[email protected]>
Date: Mon Jul 21 09:34:57 2014 -0400
nfsd: fix race that grants unrecallable delegation
If nfs4_setlease successfully acquires a new delegation, then another
task breaks the delegation before we reach hash_delegation_locked, then
the breaking task will see an empty fi_delegations list and do nothing.
The client will receive an open reply incorrectly granting a delegation
and will never receive a recall.
Move more of the delegation fields to be protected by the fi_lock. It's
more granular than the state_lock and in later patches we'll want to
be able to rely on it in addition to the state_lock.
Attempt to acquire a delegation. If that succeeds, take the spinlocks
and then check to see if the file has had a conflict show up since then.
If it has, then we assume that the lease is no longer valid and that
we shouldn't hand out a delegation.
There's also one more potential (but very unlikely) problem. If the
lease is broken before the delegation is hashed, then it could leak.
In the event that the fi_delegations list is empty, reset the
fl_break_time to jiffies so that it's cleaned up ASAP by
the normal lease handling code.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: J. Bruce Fields <[email protected]>
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 10cdb67..cc477dd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
static void nfs4_put_deleg_lease(struct nfs4_file *fp)
{
+ lockdep_assert_held(&state_lock);
+
if (!fp->fi_lease)
return;
if (atomic_dec_and_test(&fp->fi_delegees)) {
@@ -643,11 +645,10 @@ static void
hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
{
lockdep_assert_held(&state_lock);
+ lockdep_assert_held(&fp->fi_lock);
dp->dl_stid.sc_type = NFS4_DELEG_STID;
- spin_lock(&fp->fi_lock);
list_add(&dp->dl_perfile, &fp->fi_delegations);
- spin_unlock(&fp->fi_lock);
list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
}
@@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
spin_lock(&state_lock);
dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
+ spin_lock(&fp->fi_lock);
list_del_init(&dp->dl_perclnt);
list_del_init(&dp->dl_recall_lru);
- spin_lock(&fp->fi_lock);
list_del_init(&dp->dl_perfile);
spin_unlock(&fp->fi_lock);
- spin_unlock(&state_lock);
if (fp) {
nfs4_put_deleg_lease(fp);
- put_nfs4_file(fp);
dp->dl_file = NULL;
}
+ spin_unlock(&state_lock);
+ if (fp)
+ put_nfs4_file(fp);
}
static void destroy_revoked_delegation(struct nfs4_delegation *dp)
@@ -3141,10 +3143,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
*/
fl->fl_break_time = 0;
- fp->fi_had_conflict = true;
spin_lock(&fp->fi_lock);
- list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
- nfsd_break_one_deleg(dp);
+ fp->fi_had_conflict = true;
+ /*
+ * If there are no delegations on the list, then we can't count on this
+ * lease ever being cleaned up. Set the fl_break_time to jiffies so that
+ * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
+ * true should keep any new delegations from being hashed.
+ */
+ if (list_empty(&fp->fi_delegations))
+ fl->fl_break_time = jiffies;
+ else
+ list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
+ nfsd_break_one_deleg(dp);
spin_unlock(&fp->fi_lock);
}
@@ -3491,46 +3502,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
{
struct nfs4_file *fp = dp->dl_file;
struct file_lock *fl;
- int status;
+ struct file *filp;
+ int status = 0;
fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
if (!fl)
return -ENOMEM;
- fl->fl_file = find_readable_file(fp);
- status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
- if (status)
- goto out_free;
+ filp = find_readable_file(fp);
+ if (!filp) {
+ /* We should always have a readable file here */
+ WARN_ON_ONCE(1);
+ return -EBADF;
+ }
+ fl->fl_file = filp;
+ status = vfs_setlease(filp, fl->fl_type, &fl);
+ if (status) {
+ locks_free_lock(fl);
+ goto out_fput;
+ }
+ spin_lock(&state_lock);
+ spin_lock(&fp->fi_lock);
+ /* Did the lease get broken before we took the lock? */
+ status = -EAGAIN;
+ if (fp->fi_had_conflict)
+ goto out_unlock;
+ /* Race breaker */
+ if (fp->fi_lease) {
+ status = 0;
+ atomic_inc(&fp->fi_delegees);
+ hash_delegation_locked(dp, fp);
+ goto out_unlock;
+ }
fp->fi_lease = fl;
- fp->fi_deleg_file = fl->fl_file;
+ fp->fi_deleg_file = filp;
atomic_set(&fp->fi_delegees, 1);
- spin_lock(&state_lock);
hash_delegation_locked(dp, fp);
+ spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
return 0;
-out_free:
- if (fl->fl_file)
- fput(fl->fl_file);
- locks_free_lock(fl);
+out_unlock:
+ spin_unlock(&fp->fi_lock);
+ spin_unlock(&state_lock);
+out_fput:
+ fput(filp);
return status;
}
static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
{
+ int status = 0;
+
if (fp->fi_had_conflict)
return -EAGAIN;
get_nfs4_file(fp);
+ spin_lock(&state_lock);
+ spin_lock(&fp->fi_lock);
dp->dl_file = fp;
- if (!fp->fi_lease)
+ if (!fp->fi_lease) {
+ spin_unlock(&fp->fi_lock);
+ spin_unlock(&state_lock);
return nfs4_setlease(dp);
- spin_lock(&state_lock);
+ }
atomic_inc(&fp->fi_delegees);
if (fp->fi_had_conflict) {
- spin_unlock(&state_lock);
- return -EAGAIN;
+ status = -EAGAIN;
+ goto out_unlock;
}
hash_delegation_locked(dp, fp);
+out_unlock:
+ spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
- return 0;
+ return status;
}
static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
On Fri, 18 Jul 2014 12:44:17 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 18, 2014 at 11:13:30AM -0400, Jeff Layton wrote:
> > Ensure that the delegations cannot be found by the laundromat etc once
> > we add them to the various 'revoke' lists.
>
> So if I understand right, the big mutex protects all of this right now,
> so this is all just moving the unhashing under a finer-grained lock to
> prevent that in-between state being exposed after the big lock's
> dropped. Looks reasonable.
>
> --b.
>
Yes, sorry -- I forgot the obligatory "this is not a problem until the
client_mutex goes away" comment...
> >
> > Signed-off-by: Trond Myklebust <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
> > Reviewed-by: Christoph Hellwig <[email protected]>
> > ---
> > fs/nfsd/nfs4state.c | 37 +++++++++++++++++++++----------------
> > 1 file changed, 21 insertions(+), 16 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 60ae21abce00..7c5233427c9b 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -650,13 +650,13 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> > }
> >
> > -/* Called under the state lock. */
> > static void
> > -unhash_delegation(struct nfs4_delegation *dp)
> > +unhash_delegation_locked(struct nfs4_delegation *dp)
> > {
> > struct nfs4_file *fp = dp->dl_file;
> >
> > - spin_lock(&state_lock);
> > + lockdep_assert_held(&state_lock);
> > +
> > dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> > spin_lock(&fp->fi_lock);
> > list_del_init(&dp->dl_perclnt);
> > @@ -665,7 +665,6 @@ unhash_delegation(struct nfs4_delegation *dp)
> > spin_unlock(&fp->fi_lock);
> > if (fp)
> > nfs4_put_deleg_lease(fp);
> > - spin_unlock(&state_lock);
> > }
> >
> > static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> > @@ -676,7 +675,9 @@ static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> >
> > static void destroy_delegation(struct nfs4_delegation *dp)
> > {
> > - unhash_delegation(dp);
> > + spin_lock(&state_lock);
> > + unhash_delegation_locked(dp);
> > + spin_unlock(&state_lock);
> > nfs4_put_delegation(dp);
> > }
> >
> > @@ -685,11 +686,10 @@ static void revoke_delegation(struct nfs4_delegation *dp)
> > struct nfs4_client *clp = dp->dl_stid.sc_client;
> >
> > if (clp->cl_minorversion == 0)
> > - destroy_delegation(dp);
> > + destroy_revoked_delegation(dp);
> > else {
> > - unhash_delegation(dp);
> > dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
> > - list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> > + list_move(&dp->dl_recall_lru, &clp->cl_revoked);
> > }
> > }
> >
> > @@ -1447,15 +1447,16 @@ destroy_client(struct nfs4_client *clp)
> > spin_lock(&state_lock);
> > while (!list_empty(&clp->cl_delegations)) {
> > dp = list_entry(clp->cl_delegations.next, struct nfs4_delegation, dl_perclnt);
> > - list_del_init(&dp->dl_perclnt);
> > + unhash_delegation_locked(dp);
> > /* Ensure that deleg break won't try to requeue it */
> > ++dp->dl_time;
> > - list_move(&dp->dl_recall_lru, &reaplist);
> > + list_add(&dp->dl_recall_lru, &reaplist);
> > }
> > spin_unlock(&state_lock);
> > while (!list_empty(&reaplist)) {
> > dp = list_entry(reaplist.next, struct nfs4_delegation, dl_recall_lru);
> > - destroy_delegation(dp);
> > + list_del_init(&dp->dl_recall_lru);
> > + nfs4_put_delegation(dp);
> > }
> > list_splice_init(&clp->cl_revoked, &reaplist);
> > while (!list_empty(&reaplist)) {
> > @@ -3655,7 +3656,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
> > open->op_delegate_type = NFS4_OPEN_DELEGATE_READ;
> > return;
> > out_free:
> > - destroy_delegation(dp);
> > + nfs4_put_delegation(dp);
> > out_no_deleg:
> > open->op_delegate_type = NFS4_OPEN_DELEGATE_NONE;
> > if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS &&
> > @@ -3894,7 +3895,8 @@ nfs4_laundromat(struct nfsd_net *nn)
> > new_timeo = min(new_timeo, t);
> > break;
> > }
> > - list_move(&dp->dl_recall_lru, &reaplist);
> > + unhash_delegation_locked(dp);
> > + list_add(&dp->dl_recall_lru, &reaplist);
> > }
> > spin_unlock(&state_lock);
> > list_for_each_safe(pos, next, &reaplist) {
> > @@ -5369,7 +5371,8 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
> > * don't monkey with it now that we are.
> > */
> > ++dp->dl_time;
> > - list_move(&dp->dl_recall_lru, victims);
> > + unhash_delegation_locked(dp);
> > + list_add(&dp->dl_recall_lru, victims);
> > }
> > if (++count == max)
> > break;
> > @@ -5624,12 +5627,14 @@ nfs4_state_shutdown_net(struct net *net)
> > spin_lock(&state_lock);
> > list_for_each_safe(pos, next, &nn->del_recall_lru) {
> > dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
> > - list_move(&dp->dl_recall_lru, &reaplist);
> > + unhash_delegation_locked(dp);
> > + list_add(&dp->dl_recall_lru, &reaplist);
> > }
> > spin_unlock(&state_lock);
> > list_for_each_safe(pos, next, &reaplist) {
> > dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
> > - destroy_delegation(dp);
> > + list_del_init(&dp->dl_recall_lru);
> > + nfs4_put_delegation(dp);
> > }
> >
> > nfsd4_client_tracking_exit(net);
> > --
> > 1.9.3
> >
--
Jeff Layton <[email protected]>
Move more of the delegation fields to be protected by the fi_lock. It's
more granular than the state_lock and in later patches we'll want to
be able to rely on it in addition to the state_lock.
Also, the current code in nfs4_setlease calls vfs_setlease and uses the
client_mutex to ensure that it doesn't disappear before we can hash the
delegation. With the client_mutex gone, we'll have a potential race
condition.
It's possible that the delegation could be recalled after we acquire the
lease but before we ever get around to hashing it. If that happens, then
we'd have an nfs4_file that *thinks* it has a delegation, when it
actually has none.
Attempt to acquire a delegation. If that succeeds, take the spinlocks
and then check to see if the file has had a conflict show up since then.
If it has, then we assume that the lease is no longer valid and that
we shouldn't hand out a delegation.
There's also one more potential (but very unlikely) problem. If the
lease is broken before the delegation is hashed, then it could leak.
In the event that the fi_delegations list is empty, reset the
fl_break_time to jiffies so that it's cleaned up ASAP by
the normal lease handling code.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 90 +++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 66 insertions(+), 24 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index fd4deb049ddf..9ab067e85b51 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
static void nfs4_put_deleg_lease(struct nfs4_file *fp)
{
+ lockdep_assert_held(&state_lock);
+
if (!fp->fi_lease)
return;
if (atomic_dec_and_test(&fp->fi_delegees)) {
@@ -643,11 +645,10 @@ static void
hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
{
lockdep_assert_held(&state_lock);
+ lockdep_assert_held(&fp->fi_lock);
dp->dl_stid.sc_type = NFS4_DELEG_STID;
- spin_lock(&fp->fi_lock);
list_add(&dp->dl_perfile, &fp->fi_delegations);
- spin_unlock(&fp->fi_lock);
list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
}
@@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
spin_lock(&state_lock);
dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
+ spin_lock(&fp->fi_lock);
list_del_init(&dp->dl_perclnt);
list_del_init(&dp->dl_recall_lru);
- spin_lock(&fp->fi_lock);
list_del_init(&dp->dl_perfile);
spin_unlock(&fp->fi_lock);
- spin_unlock(&state_lock);
if (fp) {
nfs4_put_deleg_lease(fp);
- put_nfs4_file(fp);
dp->dl_file = NULL;
}
+ spin_unlock(&state_lock);
+ if (fp)
+ put_nfs4_file(fp);
}
static void destroy_revoked_delegation(struct nfs4_delegation *dp)
@@ -3143,10 +3145,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
*/
fl->fl_break_time = 0;
- fp->fi_had_conflict = true;
spin_lock(&fp->fi_lock);
- list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
- nfsd_break_one_deleg(dp);
+ fp->fi_had_conflict = true;
+ /*
+ * If there are no delegations on the list, then we can't count on this
+ * lease ever being cleaned up. Set the fl_break_time to jiffies so that
+ * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
+ * true should keep any new delegations from being hashed.
+ */
+ if (list_empty(&fp->fi_delegations))
+ fl->fl_break_time = jiffies;
+ else
+ list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
+ nfsd_break_one_deleg(dp);
spin_unlock(&fp->fi_lock);
}
@@ -3493,46 +3504,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
{
struct nfs4_file *fp = dp->dl_file;
struct file_lock *fl;
- int status;
+ struct file *filp;
+ int status = 0;
fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
if (!fl)
return -ENOMEM;
- fl->fl_file = find_readable_file(fp);
- status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
- if (status)
- goto out_free;
+ filp = find_readable_file(fp);
+ if (!filp) {
+ /* We should always have a readable file here */
+ WARN_ON_ONCE(1);
+ return -EBADF;
+ }
+ status = vfs_setlease(filp, fl->fl_type, &fl);
+ if (status) {
+ locks_free_lock(fl);
+ goto out_fput;
+ }
+ spin_lock(&state_lock);
+ spin_lock(&fp->fi_lock);
+ /* Did the lease get broken before we took the lock? */
+ status = -EAGAIN;
+ if (fp->fi_had_conflict)
+ goto out_unlock;
+ /* Race breaker */
+ if (fp->fi_lease) {
+ status = 0;
+ atomic_inc(&fp->fi_delegees);
+ hash_delegation_locked(dp, fp);
+ goto out_unlock;
+ }
fp->fi_lease = fl;
- fp->fi_deleg_file = fl->fl_file;
+ fp->fi_deleg_file = filp;
atomic_set(&fp->fi_delegees, 1);
- spin_lock(&state_lock);
hash_delegation_locked(dp, fp);
+ spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
return 0;
-out_free:
- if (fl->fl_file)
- fput(fl->fl_file);
- locks_free_lock(fl);
+out_unlock:
+ spin_unlock(&fp->fi_lock);
+ spin_unlock(&state_lock);
+out_fput:
+ if (filp)
+ fput(filp);
return status;
}
static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
{
+ int status = 0;
+
if (fp->fi_had_conflict)
return -EAGAIN;
get_nfs4_file(fp);
+ spin_lock(&state_lock);
+ spin_lock(&fp->fi_lock);
dp->dl_file = fp;
- if (!fp->fi_lease)
+ if (!fp->fi_lease) {
+ spin_unlock(&fp->fi_lock);
+ spin_unlock(&state_lock);
return nfs4_setlease(dp);
- spin_lock(&state_lock);
+ }
atomic_inc(&fp->fi_delegees);
if (fp->fi_had_conflict) {
- spin_unlock(&state_lock);
- return -EAGAIN;
+ status = -EAGAIN;
+ goto out_unlock;
}
hash_delegation_locked(dp, fp);
+out_unlock:
+ spin_unlock(&fp->fi_lock);
spin_unlock(&state_lock);
- return 0;
+ return status;
}
static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
--
1.9.3
On Fri, Jul 18, 2014 at 11:13:30AM -0400, Jeff Layton wrote:
> Ensure that the delegations cannot be found by the laundromat etc once
> we add them to the various 'revoke' lists.
So if I understand right, the big mutex protects all of this right now,
so this is all just moving the unhashing under a finer-grained lock to
prevent that in-between state being exposed after the big lock's
dropped. Looks reasonable.
--b.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 37 +++++++++++++++++++++----------------
> 1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 60ae21abce00..7c5233427c9b 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -650,13 +650,13 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> }
>
> -/* Called under the state lock. */
> static void
> -unhash_delegation(struct nfs4_delegation *dp)
> +unhash_delegation_locked(struct nfs4_delegation *dp)
> {
> struct nfs4_file *fp = dp->dl_file;
>
> - spin_lock(&state_lock);
> + lockdep_assert_held(&state_lock);
> +
> dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> spin_lock(&fp->fi_lock);
> list_del_init(&dp->dl_perclnt);
> @@ -665,7 +665,6 @@ unhash_delegation(struct nfs4_delegation *dp)
> spin_unlock(&fp->fi_lock);
> if (fp)
> nfs4_put_deleg_lease(fp);
> - spin_unlock(&state_lock);
> }
>
> static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> @@ -676,7 +675,9 @@ static void destroy_revoked_delegation(struct nfs4_delegation *dp)
>
> static void destroy_delegation(struct nfs4_delegation *dp)
> {
> - unhash_delegation(dp);
> + spin_lock(&state_lock);
> + unhash_delegation_locked(dp);
> + spin_unlock(&state_lock);
> nfs4_put_delegation(dp);
> }
>
> @@ -685,11 +686,10 @@ static void revoke_delegation(struct nfs4_delegation *dp)
> struct nfs4_client *clp = dp->dl_stid.sc_client;
>
> if (clp->cl_minorversion == 0)
> - destroy_delegation(dp);
> + destroy_revoked_delegation(dp);
> else {
> - unhash_delegation(dp);
> dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
> - list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> + list_move(&dp->dl_recall_lru, &clp->cl_revoked);
> }
> }
>
> @@ -1447,15 +1447,16 @@ destroy_client(struct nfs4_client *clp)
> spin_lock(&state_lock);
> while (!list_empty(&clp->cl_delegations)) {
> dp = list_entry(clp->cl_delegations.next, struct nfs4_delegation, dl_perclnt);
> - list_del_init(&dp->dl_perclnt);
> + unhash_delegation_locked(dp);
> /* Ensure that deleg break won't try to requeue it */
> ++dp->dl_time;
> - list_move(&dp->dl_recall_lru, &reaplist);
> + list_add(&dp->dl_recall_lru, &reaplist);
> }
> spin_unlock(&state_lock);
> while (!list_empty(&reaplist)) {
> dp = list_entry(reaplist.next, struct nfs4_delegation, dl_recall_lru);
> - destroy_delegation(dp);
> + list_del_init(&dp->dl_recall_lru);
> + nfs4_put_delegation(dp);
> }
> list_splice_init(&clp->cl_revoked, &reaplist);
> while (!list_empty(&reaplist)) {
> @@ -3655,7 +3656,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
> open->op_delegate_type = NFS4_OPEN_DELEGATE_READ;
> return;
> out_free:
> - destroy_delegation(dp);
> + nfs4_put_delegation(dp);
> out_no_deleg:
> open->op_delegate_type = NFS4_OPEN_DELEGATE_NONE;
> if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS &&
> @@ -3894,7 +3895,8 @@ nfs4_laundromat(struct nfsd_net *nn)
> new_timeo = min(new_timeo, t);
> break;
> }
> - list_move(&dp->dl_recall_lru, &reaplist);
> + unhash_delegation_locked(dp);
> + list_add(&dp->dl_recall_lru, &reaplist);
> }
> spin_unlock(&state_lock);
> list_for_each_safe(pos, next, &reaplist) {
> @@ -5369,7 +5371,8 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
> * don't monkey with it now that we are.
> */
> ++dp->dl_time;
> - list_move(&dp->dl_recall_lru, victims);
> + unhash_delegation_locked(dp);
> + list_add(&dp->dl_recall_lru, victims);
> }
> if (++count == max)
> break;
> @@ -5624,12 +5627,14 @@ nfs4_state_shutdown_net(struct net *net)
> spin_lock(&state_lock);
> list_for_each_safe(pos, next, &nn->del_recall_lru) {
> dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
> - list_move(&dp->dl_recall_lru, &reaplist);
> + unhash_delegation_locked(dp);
> + list_add(&dp->dl_recall_lru, &reaplist);
> }
> spin_unlock(&state_lock);
> list_for_each_safe(pos, next, &reaplist) {
> dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
> - destroy_delegation(dp);
> + list_del_init(&dp->dl_recall_lru);
> + nfs4_put_delegation(dp);
> }
>
> nfsd4_client_tracking_exit(net);
> --
> 1.9.3
>
On Fri, Jul 18, 2014 at 03:32:24PM -0400, Jeff Layton wrote:
> On Fri, 18 Jul 2014 15:21:49 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Fri, Jul 18, 2014 at 03:04:04PM -0400, Jeff Layton wrote:
> > > On Fri, 18 Jul 2014 13:49:57 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> > > > > On Fri, 18 Jul 2014 12:28:25 -0400
> > > > > "J. Bruce Fields" <[email protected]> wrote:
> > > > >
> > > > > > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > > > > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > > > > > more granular than the state_lock and in later patches we'll want to
> > > > > > > be able to rely on it in addition to the state_lock.
> > > > > > >
> > > > > > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > > > > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > > > > > delegation. With the client_mutex gone, we'll have a potential race
> > > > > > > condition.
> > > > > > >
> > > > > > > It's possible that the delegation could be recalled after we acquire the
> > > > > > > lease but before we ever get around to hashing it. If that happens, then
> > > > > > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > > > > > actually has none.
> > > > > >
> > > > > > I understand now, thanks: so the lease break code walks the list of
> > > > > > delegations associated with the file, finds none, and issues no recall,
> > > > > > but the open code continues merrily on and returns a delegation, with
> > > > > > the result that we return the client a delegation that will never be
> > > > > > recalled.
> > > > > >
> > > > > > That could be worded more carefully, and would be worth a separate patch
> > > > > > (since the bug predates the new locking).
> > > > > >
> > > > >
> > > > > Yes, that's basically correct. I'd have to think about how to fix that
> > > > > with the current code. It's probably doable if you think it's
> > > > > worthwhile, but I'll need to rebase this set on top of it.
> > > >
> > > > Well, I was wondering if this patch could just be split in two, no need
> > > > to backport further than that.
> > > >
> > >
> > > Erm, now that I've looked, I don't think it'll be that easy. The key
> > > here is to ensure that fi_had_conflict is set while holding the
> > > fi_lock. The trick here is that we need to take it in nfs4_setlease as
> > > well, and check the flag before hashing the delegation without dropping
> > > the fi_lock.
> >
> > OK, I'll live. For the sake of anyone that actually runs across that
> > bug I'll update the summary and changelog to emphasize the bugfix over
> > the locking change.
> >
>
> Ok, thanks.
>
> > > > > > > Attempt to acquire a delegation. If that succeeds, take the spinlocks
> > > > > > > and then check to see if the file has had a conflict show up since then.
> > > > > > > If it has, then we assume that the lease is no longer valid and that
> > > > > > > we shouldn't hand out a delegation.
> > > > > > >
> > > > > > > There's also one more potential (but very unlikely) problem. If the
> > > > > > > lease is broken before the delegation is hashed, then it could leak.
> > > > > > > In the event that the fi_delegations list is empty, reset the
> > > > > > > fl_break_time to jiffies so that it's cleaned up ASAP by
> > > > > > > the normal lease handling code.
> > > > > >
> > > > > > Is there actually any guarantee time_out_leases() will get called on
> > > > > > this inode again?
> > > > > >
> > > > > > --b.
> > > > > >
> > > > >
> > > > > Yes. Lease breaks are handled in two phases. We walk the i_flock list
> > > > > and issue a ->lm_break on each lease, and then later we walk the list
> > > > > again after putting the task to sleep, and try to time out the leases.
> > > > > So by doing this, we should ensure that the task will wake up after
> > > > > sleeping and delete it.
> > > >
> > > > In the case of an interrupt or a nonblocking break (which is what nfsd
> > > > will do), then time_out_leases isn't called again from what I could
> > > > tell.
> > > >
> > > > --b.
> > > >
> > >
> > > In both cases, time_out_leases is still called at the beginning of
> > > __break_lease. So the next time that function is called it'll
> > > get cleaned up, or when the filp is closed (in locks_remove_file).
> >
> > Right, but there's no guarantee another break_lease comes. E.g. the
> > process waiting on the lease break could get killed.
> >
> > --b.
>
> In that case, there's no harm in leaving the lease on the list until
> the filp closes.
Doh, of course.
> FWIW, I looked at trying to just remove the lease from the list, but
> that's not safe from the lm_break callback. So, I think this is the
> best we can reasonably do here.
Makes sense, thanks!
--b.
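Aside: the two-phase break sequence discussed above, as a sketch. This is a
reconstruction from the exchange, not code copied from fs/locks.c:

/*
 * Sketch of __break_lease()-style two-phase handling (simplified and
 * illustrative only; the real logic lives in fs/locks.c):
 *
 *   __break_lease(inode):
 *     time_out_leases(inode)        reap anything already expired
 *     for each lease on i_flock:
 *       ->lm_break(fl)              phase 1: notify the holder
 *     if nonblocking (nfsd's case):
 *       return -EWOULDBLOCK         no phase 2 for this caller...
 *     sleep until fl_break_time
 *     time_out_leases(inode)        phase 2: delete timed-out leases
 *
 * So a lease whose ->lm_break set fl_break_time = jiffies is reaped by
 * the next time_out_leases() pass, whichever caller triggers it -- or
 * by locks_remove_file() when the filp is finally closed.
 */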
From: Trond Myklebust <[email protected]>
We want to convert to an atomic type so that we don't need to lock
across the call to alloc_init_deleg(). Then convert to a long type so
that we match the size of 'max_delegations'.
None of this is a problem today, but it will be once we remove
client_mutex protection.
Signed-off-by: Trond Myklebust <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/nfsd/nfs4state.c | 18 +++++++++++-------
1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index d11b298e625e..40def186ee50 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -343,7 +343,7 @@ find_any_file(struct nfs4_file *f)
return ret;
}
-static int num_delegations;
+static atomic_long_t num_delegations;
unsigned long max_delegations;
/*
@@ -571,22 +571,23 @@ static struct nfs4_delegation *
alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
{
struct nfs4_delegation *dp;
+ long n;
dprintk("NFSD alloc_init_deleg\n");
- if (num_delegations > max_delegations)
- return NULL;
+ n = atomic_long_inc_return(&num_delegations);
+ if (n < 0 || n > max_delegations)
+ goto out_dec;
if (delegation_blocked(&current_fh->fh_handle))
- return NULL;
+ goto out_dec;
dp = delegstateid(nfs4_alloc_stid(clp, deleg_slab));
if (dp == NULL)
- return dp;
+ goto out_dec;
/*
* delegation seqid's are never incremented. The 4.1 special
* meaning of seqid 0 isn't meaningful, really, but let's avoid
* 0 anyway just for consistency and use 1:
*/
dp->dl_stid.sc_stateid.si_generation = 1;
- num_delegations++;
INIT_LIST_HEAD(&dp->dl_perfile);
INIT_LIST_HEAD(&dp->dl_perclnt);
INIT_LIST_HEAD(&dp->dl_recall_lru);
@@ -594,6 +595,9 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct sv
fh_copy_shallow(&dp->dl_fh, &current_fh->fh_handle);
INIT_WORK(&dp->dl_recall.cb_work, nfsd4_run_cb_recall);
return dp;
+out_dec:
+ atomic_long_dec(&num_delegations);
+ return NULL;
}
static void remove_stid(struct nfs4_stid *s)
@@ -616,7 +620,7 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
put_nfs4_file(dp->dl_file);
remove_stid(&dp->dl_stid);
nfs4_free_stid(deleg_slab, &dp->dl_stid);
- num_delegations--;
+ atomic_long_dec(&num_delegations);
}
}
--
1.9.3
> +out_unlock:
> + spin_unlock(&fp->fi_lock);
> + spin_unlock(&state_lock);
> +out_fput:
> + if (filp)
> + fput(filp);
I don't think filp can be NULL here.
> static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> {
> + int status = 0;
> +
> if (fp->fi_had_conflict)
> return -EAGAIN;
> get_nfs4_file(fp);
> + spin_lock(&state_lock);
> + spin_lock(&fp->fi_lock);
> dp->dl_file = fp;
> + if (!fp->fi_lease) {
> + spin_unlock(&fp->fi_lock);
> + spin_unlock(&state_lock);
> return nfs4_setlease(dp);
> + }
> atomic_inc(&fp->fi_delegees);
> if (fp->fi_had_conflict) {
> + status = -EAGAIN;
> + goto out_unlock;
> }
> hash_delegation_locked(dp, fp);
> +out_unlock:
> + spin_unlock(&fp->fi_lock);
> spin_unlock(&state_lock);
> + return status;
I have to admit that I didn't have time to go through all the
surrounding code yet, but is the error handling correct here,
i.e. no need to drop the file reference and clean up dp->dl_file for any
error or race case?
On Fri, Jul 18, 2014 at 11:13:33AM -0400, Jeff Layton wrote:
> Signed-off-by: Jeff Layton <[email protected]>
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Mon, 21 Jul 2014 17:02:54 +1000
NeilBrown <[email protected]> wrote:
> On Fri, 18 Jul 2014 11:13:36 -0400 Jeff Layton <[email protected]>
> wrote:
>
> > The state lock can be fairly heavily contended, and there's no reason
> > that nfs4_file lookups and delegation_blocked should be mutually
> > exclusive. Let's give the new block_delegation code its own spinlock.
> > It does mean that we'll need to take a different lock in the delegation
> > break code, but that's not generally as critical to performance.
> >
> > Cc: Neil Brown <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
>
> Makes sense, thanks.
> However.....
>
>
> > ---
> > fs/nfsd/nfs4state.c | 25 +++++++++++++------------
> > 1 file changed, 13 insertions(+), 12 deletions(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index a2c6c85adfc7..952def00363b 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -506,10 +506,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
> > * Each filter is 256 bits. We hash the filehandle to 32bit and use the
> > * low 3 bytes as hash-table indices.
> > *
> > - * 'state_lock', which is always held when block_delegations() is called,
> > - * is used to manage concurrent access. Testing does not need the lock
> > - * except when swapping the two filters.
> > + * 'blocked_delegations_lock', which is always held when block_delegations()
> > + * is called, is used to manage concurrent access. Testing does not need the
> > + * lock except when swapping the two filters.
>
> ...this comment is wrong. blocked_delegations_lock is *not* held when
> block_delegations() is called; it is taken when needed (almost) by
> block_delegations.
>
Thanks, fixed.
> > */
> > +static DEFINE_SPINLOCK(blocked_delegations_lock);
> > static struct bloom_pair {
> > int entries, old_entries;
> > time_t swap_time;
> > @@ -525,7 +526,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > if (bd->entries == 0)
> > return 0;
> > if (seconds_since_boot() - bd->swap_time > 30) {
> > - spin_lock(&state_lock);
> > + spin_lock(&blocked_delegations_lock);
> > if (seconds_since_boot() - bd->swap_time > 30) {
> > bd->entries -= bd->old_entries;
> > bd->old_entries = bd->entries;
> > @@ -534,7 +535,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > bd->new = 1-bd->new;
> > bd->swap_time = seconds_since_boot();
> > }
> > - spin_unlock(&state_lock);
> > + spin_unlock(&blocked_delegations_lock);
> > }
> > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > if (test_bit(hash&255, bd->set[0]) &&
> > @@ -555,16 +556,16 @@ static void block_delegations(struct knfsd_fh *fh)
> > u32 hash;
> > struct bloom_pair *bd = &blocked_delegations;
> >
> > - lockdep_assert_held(&state_lock);
> > -
> > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> >
> > __set_bit(hash&255, bd->set[bd->new]);
> > __set_bit((hash>>8)&255, bd->set[bd->new]);
> > __set_bit((hash>>16)&255, bd->set[bd->new]);
> > + spin_lock(&blocked_delegations_lock);
>
> __set_bit isn't atomic. The spin_lock should be taken *before* these
> __set_bit() calls.
>
> Otherwise, looks fine.
>
> Thanks,
> NeilBrown
>
>
Ok. I guess the worry is that we could end up setting bits in the
middle of swapping the two fields? Makes sense -- fixed in my repo.
I'll send out the updated set later today (it also includes a few nits
that HCH pointed out last week).
As a side note...I wonder how much we'll get in the way of false
positives with this scheme?
Given that we'll always have (or will have had) a nfs4_file
corresponding to this inode, perhaps we'd be better off doing something
like storing (and maybe hashing on) the filehandle in the nfs4_file,
and just ensuring that we hold on to it for 30s or so after the last
put?
Not something I'm looking at doing today, but it might be worth
considering for a later delegations rework.
> > if (bd->entries == 0)
> > bd->swap_time = seconds_since_boot();
> > bd->entries += 1;
> > + spin_unlock(&blocked_delegations_lock);
> > }
> >
> > static struct nfs4_delegation *
> > @@ -3097,16 +3098,16 @@ void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp)
> > struct nfs4_client *clp = dp->dl_stid.sc_client;
> > struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> >
> > - /*
> > - * We can't do this in nfsd_break_deleg_cb because it is
> > - * already holding inode->i_lock
> > - */
> > - spin_lock(&state_lock);
> > block_delegations(&dp->dl_fh);
> > +
> > /*
> > + * We can't do this in nfsd_break_deleg_cb because it is
> > + * already holding inode->i_lock.
> > + *
> > * If the dl_time != 0, then we know that it has already been
> > * queued for a lease break. Don't queue it again.
> > */
> > + spin_lock(&state_lock);
> > if (dp->dl_time == 0) {
> > dp->dl_time = get_seconds();
> > list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
>
--
Jeff Layton <[email protected]>
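Aside: a back-of-the-envelope answer to the false-positive question above,
using the standard Bloom filter estimate (this calculation is editorial, not
from the thread). Each filter is m = 256 bits and each filehandle sets k = 3
bits, so after n insertions the chance that an unrelated filehandle tests
positive is roughly

    p ~= (1 - e^(-k*n/m))^k = (1 - e^(-3n/256))^3

which works out to about 0.1% at n = 10 and about 9% at n = 50; and since
both the old and new filters are consulted, the effective rate can be
somewhat higher. A false positive only delays a delegation grant, so this
is likely tolerable at typical recall rates.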
On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> Move more of the delegation fields to be protected by the fi_lock. It's
> more granular than the state_lock and in later patches we'll want to
> be able to rely on it in addition to the state_lock.
>
> Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> client_mutex to ensure that it doesn't disappear before we can hash the
> delegation. With the client_mutex gone, we'll have a potential race
> condition.
>
> It's possible that the delegation could be recalled after we acquire the
> lease but before we ever get around to hashing it. If that happens, then
> we'd have a nfs4_file that *thinks* it has a delegation, when it
> actually has none.
I understand now, thanks: so the lease break code walks the list of
delegations associated with the file, finds none, and issues no recall,
but the open code continues merrily on and returns a delegation, with
the result that we return the client a delegation that will never be
recalled.
That could be worded more carefully, and would be worth a separate patch
(since the bug predates the new locking).
> Attempt to acquire a delegation. If that succeeds, take the spinlocks
> and then check to see if the file has had a conflict show up since then.
> If it has, then we assume that the lease is no longer valid and that
> we shouldn't hand out a delegation.
>
> There's also one more potential (but very unlikely) problem. If the
> lease is broken before the delegation is hashed, then it could leak.
> In the event that the fi_delegations list is empty, reset the
> fl_break_time to jiffies so that it's cleaned up ASAP by
> the normal lease handling code.
Is there actually any guarantee time_out_leases() will get called on
this inode again?
--b.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> ---
> fs/nfsd/nfs4state.c | 90 +++++++++++++++++++++++++++++++++++++++--------------
> 1 file changed, 66 insertions(+), 24 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index fd4deb049ddf..9ab067e85b51 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
>
> static void nfs4_put_deleg_lease(struct nfs4_file *fp)
> {
> + lockdep_assert_held(&state_lock);
> +
> if (!fp->fi_lease)
> return;
> if (atomic_dec_and_test(&fp->fi_delegees)) {
> @@ -643,11 +645,10 @@ static void
> hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> {
> lockdep_assert_held(&state_lock);
> + lockdep_assert_held(&fp->fi_lock);
>
> dp->dl_stid.sc_type = NFS4_DELEG_STID;
> - spin_lock(&fp->fi_lock);
> list_add(&dp->dl_perfile, &fp->fi_delegations);
> - spin_unlock(&fp->fi_lock);
> list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> }
>
> @@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
>
> spin_lock(&state_lock);
> dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> + spin_lock(&fp->fi_lock);
> list_del_init(&dp->dl_perclnt);
> list_del_init(&dp->dl_recall_lru);
> - spin_lock(&fp->fi_lock);
> list_del_init(&dp->dl_perfile);
> spin_unlock(&fp->fi_lock);
> - spin_unlock(&state_lock);
> if (fp) {
> nfs4_put_deleg_lease(fp);
> - put_nfs4_file(fp);
> dp->dl_file = NULL;
> }
> + spin_unlock(&state_lock);
> + if (fp)
> + put_nfs4_file(fp);
> }
>
> static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> @@ -3143,10 +3145,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
> */
> fl->fl_break_time = 0;
>
> - fp->fi_had_conflict = true;
> spin_lock(&fp->fi_lock);
> - list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> - nfsd_break_one_deleg(dp);
> + fp->fi_had_conflict = true;
> + /*
> + * If there are no delegations on the list, then we can't count on this
> + * lease ever being cleaned up. Set the fl_break_time to jiffies so that
> + * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
> + * true should keep any new delegations from being hashed.
> + */
> + if (list_empty(&fp->fi_delegations))
> + fl->fl_break_time = jiffies;
> + else
> + list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> + nfsd_break_one_deleg(dp);
> spin_unlock(&fp->fi_lock);
> }
>
> @@ -3493,46 +3504,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
> {
> struct nfs4_file *fp = dp->dl_file;
> struct file_lock *fl;
> - int status;
> + struct file *filp;
> + int status = 0;
>
> fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> if (!fl)
> return -ENOMEM;
> - fl->fl_file = find_readable_file(fp);
> - status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
> - if (status)
> - goto out_free;
> + filp = find_readable_file(fp);
> + if (!filp) {
> + /* We should always have a readable file here */
> + WARN_ON_ONCE(1);
> + return -EBADF;
> + }
> + status = vfs_setlease(filp, fl->fl_type, &fl);
> + if (status) {
> + locks_free_lock(fl);
> + goto out_fput;
> + }
> + spin_lock(&state_lock);
> + spin_lock(&fp->fi_lock);
> + /* Did the lease get broken before we took the lock? */
> + status = -EAGAIN;
> + if (fp->fi_had_conflict)
> + goto out_unlock;
> + /* Race breaker */
> + if (fp->fi_lease) {
> + status = 0;
> + atomic_inc(&fp->fi_delegees);
> + hash_delegation_locked(dp, fp);
> + goto out_unlock;
> + }
> fp->fi_lease = fl;
> - fp->fi_deleg_file = fl->fl_file;
> + fp->fi_deleg_file = filp;
> atomic_set(&fp->fi_delegees, 1);
> - spin_lock(&state_lock);
> hash_delegation_locked(dp, fp);
> + spin_unlock(&fp->fi_lock);
> spin_unlock(&state_lock);
> return 0;
> -out_free:
> - if (fl->fl_file)
> - fput(fl->fl_file);
> - locks_free_lock(fl);
> +out_unlock:
> + spin_unlock(&fp->fi_lock);
> + spin_unlock(&state_lock);
> +out_fput:
> + if (filp)
> + fput(filp);
> return status;
> }
>
> static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> {
> + int status = 0;
> +
> if (fp->fi_had_conflict)
> return -EAGAIN;
> get_nfs4_file(fp);
> + spin_lock(&state_lock);
> + spin_lock(&fp->fi_lock);
> dp->dl_file = fp;
> - if (!fp->fi_lease)
> + if (!fp->fi_lease) {
> + spin_unlock(&fp->fi_lock);
> + spin_unlock(&state_lock);
> return nfs4_setlease(dp);
> - spin_lock(&state_lock);
> + }
> atomic_inc(&fp->fi_delegees);
> if (fp->fi_had_conflict) {
> - spin_unlock(&state_lock);
> - return -EAGAIN;
> + status = -EAGAIN;
> + goto out_unlock;
> }
> hash_delegation_locked(dp, fp);
> +out_unlock:
> + spin_unlock(&fp->fi_lock);
> spin_unlock(&state_lock);
> - return 0;
> + return status;
> }
>
> static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
> --
> 1.9.3
>
On Fri, 18 Jul 2014 13:49:57 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> > On Fri, 18 Jul 2014 12:28:25 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > > more granular than the state_lock and in later patches we'll want to
> > > > be able to rely on it in addition to the state_lock.
> > > >
> > > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > > delegation. With the client_mutex gone, we'll have a potential race
> > > > condition.
> > > >
> > > > It's possible that the delegation could be recalled after we acquire the
> > > > lease but before we ever get around to hashing it. If that happens, then
> > > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > > actually has none.
> > >
> > > I understand now, thanks: so the lease break code walks the list of
> > > delegations associated with the file, finds none, and issues no recall,
> > > but the open code continues merrily on and returns a delegation, with
> > > the result that we return the client a delegation that will never be
> > > recalled.
> > >
> > > That could be worded more carefully, and would be worth a separate patch
> > > (since the bug predates the new locking).
> > >
> >
> > Yes, that's basically correct. I'd have to think about how to fix that
> > with the current code. It's probably doable if you think it's
> > worthwhile, but I'll need to rebase this set on top of it.
>
> Well, I was wondering if this patch could just be split in two, no need
> to backport further than that.
>
Erm, now that I've looked, I don't think it'll be that easy. The key
here is to ensure that fi_had_conflict is set while holding the
fi_lock. The trick here is that we need to take it in nfs4_setlease as
well, and check the flag before hashing the delegation without dropping
the fi_lock.
> > > > Attempt to acquire a delegation. If that succeeds, take the spinlocks
> > > > and then check to see if the file has had a conflict show up since then.
> > > > If it has, then we assume that the lease is no longer valid and that
> > > > we shouldn't hand out a delegation.
> > > >
> > > > There's also one more potential (but very unlikely) problem. If the
> > > > lease is broken before the delegation is hashed, then it could leak.
> > > > In the event that the fi_delegations list is empty, reset the
> > > > fl_break_time to jiffies so that it's cleaned up ASAP by
> > > > the normal lease handling code.
> > >
> > > Is there actually any guarantee time_out_leases() will get called on
> > > this inode again?
> > >
> > > --b.
> > >
> >
> > Yes. Lease breaks are handled in two phases. We walk the i_flock list
> > and issue a ->lm_break on each lease, and then later we walk the list
> > again after putting the task to sleep, and try to time out the leases.
> > So by doing this, we should ensure that the task will wake up after
> > sleeping and delete it.
>
> In the case of an interrupt or a nonblocking break (which is what nfsd
> will do), time_out_leases isn't called again from what I could
> tell.
>
> --b.
>
In both cases, time_out_leases is still called at the beginning of
__break_lease. So the lease will get cleaned up the next time that function
is called, or when the filp is closed (in locks_remove_file).
> >
> > > >
> > > > Signed-off-by: Trond Myklebust <[email protected]>
> > > > Signed-off-by: Jeff Layton <[email protected]>
> > > > ---
> > > > fs/nfsd/nfs4state.c | 90 +++++++++++++++++++++++++++++++++--------------
> > > > 1 file changed, 66 insertions(+), 24 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index fd4deb049ddf..9ab067e85b51 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > > > @@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
> > > >
> > > > static void nfs4_put_deleg_lease(struct nfs4_file *fp)
> > > > {
> > > > + lockdep_assert_held(&state_lock);
> > > > +
> > > > if (!fp->fi_lease)
> > > > return;
> > > > if (atomic_dec_and_test(&fp->fi_delegees)) {
> > > > @@ -643,11 +645,10 @@ static void
> > > > hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > > > {
> > > > lockdep_assert_held(&state_lock);
> > > > + lockdep_assert_held(&fp->fi_lock);
> > > >
> > > > dp->dl_stid.sc_type = NFS4_DELEG_STID;
> > > > - spin_lock(&fp->fi_lock);
> > > > list_add(&dp->dl_perfile, &fp->fi_delegations);
> > > > - spin_unlock(&fp->fi_lock);
> > > > list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> > > > }
> > > >
> > > > @@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
> > > >
> > > > spin_lock(&state_lock);
> > > > dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> > > > + spin_lock(&fp->fi_lock);
> > > > list_del_init(&dp->dl_perclnt);
> > > > list_del_init(&dp->dl_recall_lru);
> > > > - spin_lock(&fp->fi_lock);
> > > > list_del_init(&dp->dl_perfile);
> > > > spin_unlock(&fp->fi_lock);
> > > > - spin_unlock(&state_lock);
> > > > if (fp) {
> > > > nfs4_put_deleg_lease(fp);
> > > > - put_nfs4_file(fp);
> > > > dp->dl_file = NULL;
> > > > }
> > > > + spin_unlock(&state_lock);
> > > > + if (fp)
> > > > + put_nfs4_file(fp);
> > > > }
> > > >
> > > > static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> > > > @@ -3143,10 +3145,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
> > > > */
> > > > fl->fl_break_time = 0;
> > > >
> > > > - fp->fi_had_conflict = true;
> > > > spin_lock(&fp->fi_lock);
> > > > - list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> > > > - nfsd_break_one_deleg(dp);
> > > > + fp->fi_had_conflict = true;
> > > > + /*
> > > > + * If there are no delegations on the list, then we can't count on this
> > > > + * lease ever being cleaned up. Set the fl_break_time to jiffies so that
> > > > + * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
> > > > + * true should keep any new delegations from being hashed.
> > > > + */
> > > > + if (list_empty(&fp->fi_delegations))
> > > > + fl->fl_break_time = jiffies;
> > > > + else
> > > > + list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> > > > + nfsd_break_one_deleg(dp);
> > > > spin_unlock(&fp->fi_lock);
> > > > }
> > > >
> > > > @@ -3493,46 +3504,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
> > > > {
> > > > struct nfs4_file *fp = dp->dl_file;
> > > > struct file_lock *fl;
> > > > - int status;
> > > > + struct file *filp;
> > > > + int status = 0;
> > > >
> > > > fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> > > > if (!fl)
> > > > return -ENOMEM;
> > > > - fl->fl_file = find_readable_file(fp);
> > > > - status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
> > > > - if (status)
> > > > - goto out_free;
> > > > + filp = find_readable_file(fp);
> > > > + if (!filp) {
> > > > + /* We should always have a readable file here */
> > > > + WARN_ON_ONCE(1);
> > > > + return -EBADF;
> > > > + }
> > > > + status = vfs_setlease(filp, fl->fl_type, &fl);
> > > > + if (status) {
> > > > + locks_free_lock(fl);
> > > > + goto out_fput;
> > > > + }
> > > > + spin_lock(&state_lock);
> > > > + spin_lock(&fp->fi_lock);
> > > > + /* Did the lease get broken before we took the lock? */
> > > > + status = -EAGAIN;
> > > > + if (fp->fi_had_conflict)
> > > > + goto out_unlock;
> > > > + /* Race breaker */
> > > > + if (fp->fi_lease) {
> > > > + status = 0;
> > > > + atomic_inc(&fp->fi_delegees);
> > > > + hash_delegation_locked(dp, fp);
> > > > + goto out_unlock;
> > > > + }
> > > > fp->fi_lease = fl;
> > > > - fp->fi_deleg_file = fl->fl_file;
> > > > + fp->fi_deleg_file = filp;
> > > > atomic_set(&fp->fi_delegees, 1);
> > > > - spin_lock(&state_lock);
> > > > hash_delegation_locked(dp, fp);
> > > > + spin_unlock(&fp->fi_lock);
> > > > spin_unlock(&state_lock);
> > > > return 0;
> > > > -out_free:
> > > > - if (fl->fl_file)
> > > > - fput(fl->fl_file);
> > > > - locks_free_lock(fl);
> > > > +out_unlock:
> > > > + spin_unlock(&fp->fi_lock);
> > > > + spin_unlock(&state_lock);
> > > > +out_fput:
> > > > + if (filp)
> > > > + fput(filp);
> > > > return status;
> > > > }
> > > >
> > > > static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > > > {
> > > > + int status = 0;
> > > > +
> > > > if (fp->fi_had_conflict)
> > > > return -EAGAIN;
> > > > get_nfs4_file(fp);
> > > > + spin_lock(&state_lock);
> > > > + spin_lock(&fp->fi_lock);
> > > > dp->dl_file = fp;
> > > > - if (!fp->fi_lease)
> > > > + if (!fp->fi_lease) {
> > > > + spin_unlock(&fp->fi_lock);
> > > > + spin_unlock(&state_lock);
> > > > return nfs4_setlease(dp);
> > > > - spin_lock(&state_lock);
> > > > + }
> > > > atomic_inc(&fp->fi_delegees);
> > > > if (fp->fi_had_conflict) {
> > > > - spin_unlock(&state_lock);
> > > > - return -EAGAIN;
> > > > + status = -EAGAIN;
> > > > + goto out_unlock;
> > > > }
> > > > hash_delegation_locked(dp, fp);
> > > > +out_unlock:
> > > > + spin_unlock(&fp->fi_lock);
> > > > spin_unlock(&state_lock);
> > > > - return 0;
> > > > + return status;
> > > > }
> > > >
> > > > static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
> > > > --
> > > > 1.9.3
> > > >
> >
> >
> > --
> > Jeff Layton <[email protected]>
--
Jeff Layton <[email protected]>
On Mon, Jul 21, 2014 at 07:44:12AM -0400, Jeff Layton wrote:
> On Mon, 21 Jul 2014 17:02:54 +1000
> NeilBrown <[email protected]> wrote:
>
> > On Fri, 18 Jul 2014 11:13:36 -0400 Jeff Layton <[email protected]>
> > wrote:
> >
> > > The state lock can be fairly heavily contended, and there's no reason
> > > that nfs4_file lookups and delegation_blocked should be mutually
> > > exclusive. Let's give the new block_delegation code its own spinlock.
> > > It does mean that we'll need to take a different lock in the delegation
> > > break code, but that's not generally as critical to performance.
> > >
> > > Cc: Neil Brown <[email protected]>
> > > Signed-off-by: Jeff Layton <[email protected]>
> >
> > Makes sense, thanks.
> > However.....
> >
> >
> > > ---
> > > fs/nfsd/nfs4state.c | 25 +++++++++++++------------
> > > 1 file changed, 13 insertions(+), 12 deletions(-)
> > >
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index a2c6c85adfc7..952def00363b 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -506,10 +506,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
> > > * Each filter is 256 bits. We hash the filehandle to 32bit and use the
> > > * low 3 bytes as hash-table indices.
> > > *
> > > - * 'state_lock', which is always held when block_delegations() is called,
> > > - * is used to manage concurrent access. Testing does not need the lock
> > > - * except when swapping the two filters.
> > > + * 'blocked_delegations_lock', which is always held when block_delegations()
> > > + * is called, is used to manage concurrent access. Testing does not need the
> > > + * lock except when swapping the two filters.
> >
> > ...this comment is wrong. blocked_delegations_lock is *not* held when
> > block_delegations() is called; it is taken when needed (almost) by
> > block_delegations.
> >
>
> Thanks, fixed.
>
> > > */
> > > +static DEFINE_SPINLOCK(blocked_delegations_lock);
> > > static struct bloom_pair {
> > > int entries, old_entries;
> > > time_t swap_time;
> > > @@ -525,7 +526,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > > if (bd->entries == 0)
> > > return 0;
> > > if (seconds_since_boot() - bd->swap_time > 30) {
> > > - spin_lock(&state_lock);
> > > + spin_lock(&blocked_delegations_lock);
> > > if (seconds_since_boot() - bd->swap_time > 30) {
> > > bd->entries -= bd->old_entries;
> > > bd->old_entries = bd->entries;
> > > @@ -534,7 +535,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > > bd->new = 1-bd->new;
> > > bd->swap_time = seconds_since_boot();
> > > }
> > > - spin_unlock(&state_lock);
> > > + spin_unlock(&blocked_delegations_lock);
> > > }
> > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > > if (test_bit(hash&255, bd->set[0]) &&
> > > @@ -555,16 +556,16 @@ static void block_delegations(struct knfsd_fh *fh)
> > > u32 hash;
> > > struct bloom_pair *bd = &blocked_delegations;
> > >
> > > - lockdep_assert_held(&state_lock);
> > > -
> > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > >
> > > __set_bit(hash&255, bd->set[bd->new]);
> > > __set_bit((hash>>8)&255, bd->set[bd->new]);
> > > __set_bit((hash>>16)&255, bd->set[bd->new]);
> > > + spin_lock(&blocked_delegations_lock);
> >
> > __set_bit isn't atomic. The spin_lock should be taken *before* these
> > __set_bit() calls.
> >
> > Otherwise, looks fine.
> >
> > Thanks,
> > NeilBrown
> >
> >
>
> Ok. I guess the worry is that we could end up setting bits in the
> middle of swapping the two fields? Makes sense -- fixed in my repo.
> I'll send out the updated set later today (it also includes a few nits
> that HCH pointed out last week).
>
> As a side note...I wonder how much we'll get in the way of false
> positives with this scheme?
>
> Given that we'll always have (or will have had) a nfs4_file
> corresponding to this inode, perhaps we'd be better off doing something
> like storing (and maybe hashing on) the filehandle in the nfs4_file,
> and just ensuring that we hold on to it for 30s or so after the last
> put?
You don't want to hold a reference to the inode unnecessarily.
(Consider for example the case of a deleted-but-still-opened file, in
which case people can notice if a large file hangs around eating up
space for an extra 30 seconds.) So I suppose you'd put fi_inode on last
close and just make sure the rest of the code is prepared to deal with
nfs4_files without struct inodes. That might make sense to do.
Occasional false positives aren't necessarily a big deal, so the current
approach seems a reasonable compromise for now.
--b.
>
> Not something I'm looking at doing today, but it might be worth
> considering for a later delegations rework.
>
> > > if (bd->entries == 0)
> > > bd->swap_time = seconds_since_boot();
> > > bd->entries += 1;
> > > + spin_unlock(&blocked_delegations_lock);
> > > }
> > >
> > > static struct nfs4_delegation *
> > > @@ -3097,16 +3098,16 @@ void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp)
> > > struct nfs4_client *clp = dp->dl_stid.sc_client;
> > > struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> > >
> > > - /*
> > > - * We can't do this in nfsd_break_deleg_cb because it is
> > > - * already holding inode->i_lock
> > > - */
> > > - spin_lock(&state_lock);
> > > block_delegations(&dp->dl_fh);
> > > +
> > > /*
> > > + * We can't do this in nfsd_break_deleg_cb because it is
> > > + * already holding inode->i_lock.
> > > + *
> > > * If the dl_time != 0, then we know that it has already been
> > > * queued for a lease break. Don't queue it again.
> > > */
> > > + spin_lock(&state_lock);
> > > if (dp->dl_time == 0) {
> > > dp->dl_time = get_seconds();
> > > list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
> >
>
>
> --
> Jeff Layton <[email protected]>
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Fri, 18 Jul 2014 08:54:02 -0700
Christoph Hellwig <[email protected]> wrote:
> > +out_unlock:
> > + spin_unlock(&fp->fi_lock);
> > + spin_unlock(&state_lock);
> > +out_fput:
> > + if (filp)
> > + fput(filp);
>
> I don't think filp can be NULL here.
>
Correct. I'll fix that...
> > static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> > {
> > + int status = 0;
> > +
> > if (fp->fi_had_conflict)
> > return -EAGAIN;
> > get_nfs4_file(fp);
> > + spin_lock(&state_lock);
> > + spin_lock(&fp->fi_lock);
> > dp->dl_file = fp;
> > + if (!fp->fi_lease) {
> > + spin_unlock(&fp->fi_lock);
> > + spin_unlock(&state_lock);
> > return nfs4_setlease(dp);
> > + }
> > atomic_inc(&fp->fi_delegees);
> > if (fp->fi_had_conflict) {
> > + status = -EAGAIN;
> > + goto out_unlock;
> > }
> > hash_delegation_locked(dp, fp);
> > +out_unlock:
> > + spin_unlock(&fp->fi_lock);
> > spin_unlock(&state_lock);
> > + return status;
>
> I have to admit that I didn't have time to go through all the
> surrounding code yet, but is the error handling correct here,
> i.e. no need to drop the file reference and clean up dp->dl_file for any
> error or race case?
>
No, we take a reference to the nfs4_file and then immediately set
dl_file. At that point, once we put the delegation's final reference it
will put the file reference.
So, I *think* the error handling is now correct, but please do sanity
check me on this if you're able.
--
Jeff Layton <[email protected]>
On Fri, 18 Jul 2014 11:13:36 -0400 Jeff Layton <[email protected]>
wrote:
> The state lock can be fairly heavily contended, and there's no reason
> that nfs4_file lookups and delegation_blocked should be mutually
> exclusive. Let's give the new block_delegation code its own spinlock.
> It does mean that we'll need to take a different lock in the delegation
> break code, but that's not generally as critical to performance.
>
> Cc: Neil Brown <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
Makes sense, thanks.
However.....
> ---
> fs/nfsd/nfs4state.c | 25 +++++++++++++------------
> 1 file changed, 13 insertions(+), 12 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index a2c6c85adfc7..952def00363b 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -506,10 +506,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
> * Each filter is 256 bits. We hash the filehandle to 32bit and use the
> * low 3 bytes as hash-table indices.
> *
> - * 'state_lock', which is always held when block_delegations() is called,
> - * is used to manage concurrent access. Testing does not need the lock
> - * except when swapping the two filters.
> + * 'blocked_delegations_lock', which is always held when block_delegations()
> + * is called, is used to manage concurrent access. Testing does not need the
> + * lock except when swapping the two filters.
...this comment is wrong. blocked_delegations_lock is *not* held when
block_delegations() is called; it is taken when needed (almost) by
block_delegations.
> */
> +static DEFINE_SPINLOCK(blocked_delegations_lock);
> static struct bloom_pair {
> int entries, old_entries;
> time_t swap_time;
> @@ -525,7 +526,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> if (bd->entries == 0)
> return 0;
> if (seconds_since_boot() - bd->swap_time > 30) {
> - spin_lock(&state_lock);
> + spin_lock(&blocked_delegations_lock);
> if (seconds_since_boot() - bd->swap_time > 30) {
> bd->entries -= bd->old_entries;
> bd->old_entries = bd->entries;
> @@ -534,7 +535,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> bd->new = 1-bd->new;
> bd->swap_time = seconds_since_boot();
> }
> - spin_unlock(&state_lock);
> + spin_unlock(&blocked_delegations_lock);
> }
> hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> if (test_bit(hash&255, bd->set[0]) &&
> @@ -555,16 +556,16 @@ static void block_delegations(struct knfsd_fh *fh)
> u32 hash;
> struct bloom_pair *bd = &blocked_delegations;
>
> - lockdep_assert_held(&state_lock);
> -
> hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
>
> __set_bit(hash&255, bd->set[bd->new]);
> __set_bit((hash>>8)&255, bd->set[bd->new]);
> __set_bit((hash>>16)&255, bd->set[bd->new]);
> + spin_lock(&blocked_delegations_lock);
__set_bit isn't atomic. The spin_lock should be taken *before* these
__set_bit() calls.
Otherwise, looks fine.
Thanks,
NeilBrown
> if (bd->entries == 0)
> bd->swap_time = seconds_since_boot();
> bd->entries += 1;
> + spin_unlock(&blocked_delegations_lock);
> }
>
> static struct nfs4_delegation *
> @@ -3097,16 +3098,16 @@ void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp)
> struct nfs4_client *clp = dp->dl_stid.sc_client;
> struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>
> - /*
> - * We can't do this in nfsd_break_deleg_cb because it is
> - * already holding inode->i_lock
> - */
> - spin_lock(&state_lock);
> block_delegations(&dp->dl_fh);
> +
> /*
> + * We can't do this in nfsd_break_deleg_cb because it is
> + * already holding inode->i_lock.
> + *
> * If the dl_time != 0, then we know that it has already been
> * queued for a lease break. Don't queue it again.
> */
> + spin_lock(&state_lock);
> if (dp->dl_time == 0) {
> dp->dl_time = get_seconds();
> list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
(although the description still feels a little vague...)
On Mon, 21 Jul 2014 17:05:31 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 18, 2014 at 03:21:49PM -0400, J. Bruce Fields wrote:
> > On Fri, Jul 18, 2014 at 03:04:04PM -0400, Jeff Layton wrote:
> > > On Fri, 18 Jul 2014 13:49:57 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> > > > > On Fri, 18 Jul 2014 12:28:25 -0400
> > > > > "J. Bruce Fields" <[email protected]> wrote:
> > > > >
> > > > > > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > > > > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > > > > > more granular than the state_lock and in later patches we'll want to
> > > > > > > be able to rely on it in addition to the state_lock.
> > > > > > >
> > > > > > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > > > > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > > > > > delegation. With the client_mutex gone, we'll have a potential race
> > > > > > > condition.
> > > > > > >
> > > > > > > It's possible that the delegation could be recalled after we acquire the
> > > > > > > lease but before we ever get around to hashing it. If that happens, then
> > > > > > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > > > > > actually has none.
> > > > > >
> > > > > > I understand now, thanks: so the lease break code walks the list of
> > > > > > delegations associated with the file, finds none, and issues no recall,
> > > > > > but the open code continues merrily on and returns a delegation, with
> > > > > > the result that we return the client a delegation that will never be
> > > > > > recalled.
> > > > > >
> > > > > > That could be worded more carefully, and would be worth a separate patch
> > > > > > (since the bug predates the new locking).
> > > > > >
> > > > >
> > > > > Yes, that's basically correct. I'd have to think about how to fix that
> > > > > with the current code. It's probably doable if you think it's
> > > > > worthwhile, but I'll need to rebase this set on top of it.
> > > >
> > > > Well, I was wondering if this patch could just be split in two, no need
> > > > to backport further than that.
> > > >
> > >
> > > Erm, now that I've looked, I don't think it'll be that easy. The key
> > > here is to ensure that fi_had_conflict is set while holding the
> > > fi_lock. The trick here is that we need to take it in nfs4_setlease as
> > > well, and check the flag before hashing the delegation without dropping
> > > the fi_lock.
> >
> > OK, I'll live. For the sake of anyone that actually runs across that
> > bug I'll update the summary and changelog to emphasize the bugfix over
> > the locking change.
>
> So, intending to apply as follows.
>
> --b.
>
> commit 417c6629b2d81d5a18d29c4bbb6a9a4c64282a36
> Author: Jeff Layton <[email protected]>
> Date: Mon Jul 21 09:34:57 2014 -0400
>
> nfsd: fix race that grants unrecallable delegation
>
> If nfs4_setlease successfully acquires a new delegation, then another
> task breaks the delegation before we reach hash_delegation_locked, then
> the breaking task will see an empty fi_delegations list and do nothing.
> The client will receive an open reply incorrectly granting a delegation
> and will never receive a recall.
>
> Move more of the delegation fields to be protected by the fi_lock. It's
> more granular than the state_lock and in later patches we'll want to
> be able to rely on it in addition to the state_lock.
>
> Attempt to acquire a delegation. If that succeeds, take the spinlocks
> and then check to see if the file has had a conflict show up since then.
> If it has, then we assume that the lease is no longer valid and that
> we shouldn't hand out a delegation.
>
> There's also one more potential (but very unlikely) problem. If the
> lease is broken before the delegation is hashed, then it could leak.
> In the event that the fi_delegations list is empty, reset the
> fl_break_time to jiffies so that it's cleaned up ASAP by
> the normal lease handling code.
>
> Signed-off-by: Trond Myklebust <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>
> Reviewed-by: Christoph Hellwig <[email protected]>
> Signed-off-by: J. Bruce Fields <[email protected]>
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 10cdb67..cc477dd 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -624,6 +624,8 @@ nfs4_put_delegation(struct nfs4_delegation *dp)
>
> static void nfs4_put_deleg_lease(struct nfs4_file *fp)
> {
> + lockdep_assert_held(&state_lock);
> +
> if (!fp->fi_lease)
> return;
> if (atomic_dec_and_test(&fp->fi_delegees)) {
> @@ -643,11 +645,10 @@ static void
> hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
> {
> lockdep_assert_held(&state_lock);
> + lockdep_assert_held(&fp->fi_lock);
>
> dp->dl_stid.sc_type = NFS4_DELEG_STID;
> - spin_lock(&fp->fi_lock);
> list_add(&dp->dl_perfile, &fp->fi_delegations);
> - spin_unlock(&fp->fi_lock);
> list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
> }
>
> @@ -659,17 +660,18 @@ unhash_delegation(struct nfs4_delegation *dp)
>
> spin_lock(&state_lock);
> dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
> + spin_lock(&fp->fi_lock);
> list_del_init(&dp->dl_perclnt);
> list_del_init(&dp->dl_recall_lru);
> - spin_lock(&fp->fi_lock);
> list_del_init(&dp->dl_perfile);
> spin_unlock(&fp->fi_lock);
> - spin_unlock(&state_lock);
> if (fp) {
> nfs4_put_deleg_lease(fp);
> - put_nfs4_file(fp);
> dp->dl_file = NULL;
> }
> + spin_unlock(&state_lock);
> + if (fp)
> + put_nfs4_file(fp);
> }
>
> static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> @@ -3141,10 +3143,19 @@ static void nfsd_break_deleg_cb(struct file_lock *fl)
> */
> fl->fl_break_time = 0;
>
> - fp->fi_had_conflict = true;
> spin_lock(&fp->fi_lock);
> - list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> - nfsd_break_one_deleg(dp);
> + fp->fi_had_conflict = true;
> + /*
> + * If there are no delegations on the list, then we can't count on this
> + * lease ever being cleaned up. Set the fl_break_time to jiffies so that
> + * time_out_leases will do it ASAP. The fact that fi_had_conflict is now
> + * true should keep any new delegations from being hashed.
> + */
> + if (list_empty(&fp->fi_delegations))
> + fl->fl_break_time = jiffies;
> + else
> + list_for_each_entry(dp, &fp->fi_delegations, dl_perfile)
> + nfsd_break_one_deleg(dp);
> spin_unlock(&fp->fi_lock);
> }
>
> @@ -3491,46 +3502,77 @@ static int nfs4_setlease(struct nfs4_delegation *dp)
> {
> struct nfs4_file *fp = dp->dl_file;
> struct file_lock *fl;
> - int status;
> + struct file *filp;
> + int status = 0;
>
> fl = nfs4_alloc_init_lease(fp, NFS4_OPEN_DELEGATE_READ);
> if (!fl)
> return -ENOMEM;
> - fl->fl_file = find_readable_file(fp);
> - status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
> - if (status)
> - goto out_free;
> + filp = find_readable_file(fp);
> + if (!filp) {
> + /* We should always have a readable file here */
> + WARN_ON_ONCE(1);
> + return -EBADF;
> + }
> + fl->fl_file = filp;
> + status = vfs_setlease(filp, fl->fl_type, &fl);
> + if (status) {
> + locks_free_lock(fl);
> + goto out_fput;
> + }
> + spin_lock(&state_lock);
> + spin_lock(&fp->fi_lock);
> + /* Did the lease get broken before we took the lock? */
> + status = -EAGAIN;
> + if (fp->fi_had_conflict)
> + goto out_unlock;
> + /* Race breaker */
> + if (fp->fi_lease) {
> + status = 0;
> + atomic_inc(&fp->fi_delegees);
> + hash_delegation_locked(dp, fp);
> + goto out_unlock;
> + }
> fp->fi_lease = fl;
> - fp->fi_deleg_file = fl->fl_file;
> + fp->fi_deleg_file = filp;
> atomic_set(&fp->fi_delegees, 1);
> - spin_lock(&state_lock);
> hash_delegation_locked(dp, fp);
> + spin_unlock(&fp->fi_lock);
> spin_unlock(&state_lock);
> return 0;
> -out_free:
> - if (fl->fl_file)
> - fput(fl->fl_file);
> - locks_free_lock(fl);
> +out_unlock:
> + spin_unlock(&fp->fi_lock);
> + spin_unlock(&state_lock);
> +out_fput:
> + fput(filp);
> return status;
> }
>
> static int nfs4_set_delegation(struct nfs4_delegation *dp, struct nfs4_file *fp)
> {
> + int status = 0;
> +
> if (fp->fi_had_conflict)
> return -EAGAIN;
> get_nfs4_file(fp);
> + spin_lock(&state_lock);
> + spin_lock(&fp->fi_lock);
> dp->dl_file = fp;
> - if (!fp->fi_lease)
> + if (!fp->fi_lease) {
> + spin_unlock(&fp->fi_lock);
> + spin_unlock(&state_lock);
> return nfs4_setlease(dp);
> - spin_lock(&state_lock);
> + }
> atomic_inc(&fp->fi_delegees);
> if (fp->fi_had_conflict) {
> - spin_unlock(&state_lock);
> - return -EAGAIN;
> + status = -EAGAIN;
> + goto out_unlock;
> }
> hash_delegation_locked(dp, fp);
> +out_unlock:
> + spin_unlock(&fp->fi_lock);
> spin_unlock(&state_lock);
> - return 0;
> + return status;
> }
>
> static void nfsd4_open_deleg_none_ext(struct nfsd4_open *open, int status)
Looks good -- ACK. Thanks for fixing that up...
--
Jeff Layton <[email protected]>
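Aside: the race the applied patch closes, drawn as a timeline (a
reconstruction from the changelog above, not text from the thread):

/*
 *   open task                         conflicting task
 *   ---------                         ----------------
 *   nfs4_setlease():
 *     vfs_setlease() succeeds
 *                                     __break_lease() ->
 *                                       nfsd_break_deleg_cb():
 *                                         fi_delegations is still empty,
 *                                         so no recall is issued
 *     hash_delegation_locked()
 *   open reply grants a delegation that will never be recalled
 *
 * With the fix, fi_had_conflict is set under fp->fi_lock and rechecked
 * under that same lock before hashing, so the left column bails out
 * with -EAGAIN instead of granting the delegation.
 */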
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 40def186ee50..3e761ea48a70 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -568,7 +568,7 @@ static void block_delegations(struct knfsd_fh *fh)
}
static struct nfs4_delegation *
-alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct svc_fh *current_fh)
+alloc_init_deleg(struct nfs4_client *clp, struct svc_fh *current_fh)
{
struct nfs4_delegation *dp;
long n;
@@ -3650,7 +3650,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
default:
goto out_no_deleg;
}
- dp = alloc_init_deleg(oo->oo_owner.so_client, stp, fh);
+ dp = alloc_init_deleg(oo->oo_owner.so_client, fh);
if (dp == NULL)
goto out_no_deleg;
status = nfs4_set_delegation(dp, stp->st_file);
--
1.9.3
On Fri, Jul 18, 2014 at 03:04:04PM -0400, Jeff Layton wrote:
> On Fri, 18 Jul 2014 13:49:57 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> > > On Fri, 18 Jul 2014 12:28:25 -0400
> > > "J. Bruce Fields" <[email protected]> wrote:
> > >
> > > > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > > > more granular than the state_lock and in later patches we'll want to
> > > > > be able to rely on it in addition to the state_lock.
> > > > >
> > > > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > > > delegation. With the client_mutex gone, we'll have a potential race
> > > > > condition.
> > > > >
> > > > > It's possible that the delegation could be recalled after we acquire the
> > > > > lease but before we ever get around to hashing it. If that happens, then
> > > > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > > > actually has none.
> > > >
> > > > I understand now, thanks: so the lease break code walks the list of
> > > > delegations associated with the file, finds none, and issues no recall,
> > > > but the open code continues merrily on and returns a delegation, with
> > > > the result that we return the client a delegation that will never be
> > > > recalled.
> > > >
> > > > That could be worded more carefully, and would be worth a separate patch
> > > > (since the bug predates the new locking).
> > > >
> > >
> > > Yes, that's basically correct. I'd have to think about how to fix that
> > > with the current code. It's probably doable if you think it's
> > > worthwhile, but I'll need to rebase this set on top of it.
> >
> > Well, I was wondering if this patch could just be split in two, no need
> > to backport further than that.
> >
>
> Erm, now that I've looked, I don't think it'll be that easy. The key
> here is to ensure that fi_had_conflict is set while holding the
> fi_lock. The trick here is that we need to take it in nfs4_setlease as
> well, and check the flag before hashing the delegation without dropping
> the fi_lock.
OK, I'll live. For the sake of anyone that actually runs across that
bug I'll update the summary and changelog to emphasize the bugfix over
the locking change.
> > > > > Attempt to acquire a delegation. If that succeeds, take the spinlocks
> > > > > and then check to see if the file has had a conflict show up since then.
> > > > > If it has, then we assume that the lease is no longer valid and that
> > > > > we shouldn't hand out a delegation.
> > > > >
> > > > > There's also one more potential (but very unlikely) problem. If the
> > > > > lease is broken before the delegation is hashed, then it could leak.
> > > > > In the event that the fi_delegations list is empty, reset the
> > > > > fl_break_time to jiffies so that it's cleaned up ASAP by
> > > > > the normal lease handling code.
> > > >
> > > > Is there actually any guarantee time_out_leases() will get called on
> > > > this inode again?
> > > >
> > > > --b.
> > > >
> > >
> > > Yes. Lease breaks are handled in two phases. We walk the i_flock list
> > > and issue a ->lm_break on each lease, and then later we walk the list
> > > again after putting the task to sleep, and try to time out the leases.
> > > So by doing this, we should ensure that the task will wake up after
> > > sleeping and delete it.
> >
> > In the case of an interrupt or a nonblocking break (which is what nfsd
> > will do), time_out_leases isn't called again from what I could
> > tell.
> >
> > --b.
> >
>
> In both cases, time_out_leases is still called at the beginning of
> __break_lease. So the lease will get cleaned up the next time that function
> is called, or when the filp is closed (in locks_remove_file).
Right, but there's no guarantee another break_lease comes. E.g. the
process waiting on the lease break could get killed.
--b.
From: Trond Myklebust <[email protected]>
Don't allow stateids to clear the open file pointer until they are
being destroyed. In a later patch we'll want to move the putting of
the nfs4_file into generic stateid handling code.
Also, move to kzalloc and get rid of explicit zeroing of fields.
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfsd/nfs4state.c | 22 ++++++++++------------
1 file changed, 10 insertions(+), 12 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 49a734cd2f20..60ae21abce00 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -459,7 +459,7 @@ kmem_cache *slab)
struct nfs4_stid *stid;
int new_id;
- stid = kmem_cache_alloc(slab, GFP_KERNEL);
+ stid = kmem_cache_zalloc(slab, GFP_KERNEL);
if (!stid)
return NULL;
@@ -467,11 +467,9 @@ kmem_cache *slab)
if (new_id < 0)
goto out_free;
stid->sc_client = cl;
- stid->sc_type = 0;
stid->sc_stateid.si_opaque.so_id = new_id;
stid->sc_stateid.si_opaque.so_clid = cl->cl_clientid;
/* Will be incremented before return to client: */
- stid->sc_stateid.si_generation = 0;
atomic_set(&stid->sc_count, 1);
/*
@@ -592,10 +590,8 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct sv
INIT_LIST_HEAD(&dp->dl_perfile);
INIT_LIST_HEAD(&dp->dl_perclnt);
INIT_LIST_HEAD(&dp->dl_recall_lru);
- dp->dl_file = NULL;
dp->dl_type = NFS4_OPEN_DELEGATE_READ;
fh_copy_shallow(&dp->dl_fh, &current_fh->fh_handle);
- dp->dl_time = 0;
INIT_WORK(&dp->dl_recall.cb_work, nfsd4_run_cb_recall);
return dp;
}
@@ -616,6 +612,8 @@ void
nfs4_put_delegation(struct nfs4_delegation *dp)
{
if (atomic_dec_and_test(&dp->dl_stid.sc_count)) {
+ if (dp->dl_file)
+ put_nfs4_file(dp->dl_file);
remove_stid(&dp->dl_stid);
nfs4_free_stid(deleg_slab, &dp->dl_stid);
num_delegations--;
@@ -665,13 +663,9 @@ unhash_delegation(struct nfs4_delegation *dp)
list_del_init(&dp->dl_recall_lru);
list_del_init(&dp->dl_perfile);
spin_unlock(&fp->fi_lock);
- if (fp) {
+ if (fp)
nfs4_put_deleg_lease(fp);
- dp->dl_file = NULL;
- }
spin_unlock(&state_lock);
- if (fp)
- put_nfs4_file(fp);
}
static void destroy_revoked_delegation(struct nfs4_delegation *dp)
@@ -879,12 +873,12 @@ static void unhash_generic_stateid(struct nfs4_ol_stateid *stp)
static void close_generic_stateid(struct nfs4_ol_stateid *stp)
{
release_all_access(stp);
- put_nfs4_file(stp->st_file);
- stp->st_file = NULL;
}
static void free_generic_stateid(struct nfs4_ol_stateid *stp)
{
+ if (stp->st_file)
+ put_nfs4_file(stp->st_file);
remove_stid(&stp->st_stid);
nfs4_free_stid(stateid_slab, &stp->st_stid);
}
@@ -4463,6 +4457,10 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
if (list_empty(&oo->oo_owner.so_stateids))
release_openowner(oo);
} else {
+ if (s->st_file) {
+ put_nfs4_file(s->st_file);
+ s->st_file = NULL;
+ }
oo->oo_last_closed_stid = s;
/*
* In the 4.0 case we need to keep the owners around a
--
1.9.3
Currently, both destroy_revoked_delegation and revoke_delegation
manipulate the cl_revoked list without any locking aside from the
client_mutex. Ensure that the clp->cl_lock is held when manipulating it,
except for the list walking in destroy_client. At that point, the client
should no longer be in use, and so it should be safe to walk the list
without any locking. That also means that we don't need to do the
list_splice_init there either.
Also, the fact that destroy_revoked_delegation and revoke_delegation
delete dl_recall_lru without any locking makes it difficult to know
whether they're doing so safely in all cases. Move the list_del_init
calls into the callers, and add WARN_ONs in the event that these calls
are passed a delegation that has a non-empty list_head.
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 7c5233427c9b..d11b298e625e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -669,7 +669,7 @@ unhash_delegation_locked(struct nfs4_delegation *dp)
static void destroy_revoked_delegation(struct nfs4_delegation *dp)
{
- list_del_init(&dp->dl_recall_lru);
+ WARN_ON(!list_empty(&dp->dl_recall_lru));
nfs4_put_delegation(dp);
}
@@ -685,11 +685,15 @@ static void revoke_delegation(struct nfs4_delegation *dp)
{
struct nfs4_client *clp = dp->dl_stid.sc_client;
+ WARN_ON(!list_empty(&dp->dl_recall_lru));
+
if (clp->cl_minorversion == 0)
destroy_revoked_delegation(dp);
else {
dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
- list_move(&dp->dl_recall_lru, &clp->cl_revoked);
+ spin_lock(&clp->cl_lock);
+ list_add(&dp->dl_recall_lru, &clp->cl_revoked);
+ spin_unlock(&clp->cl_lock);
}
}
@@ -1458,9 +1462,9 @@ destroy_client(struct nfs4_client *clp)
list_del_init(&dp->dl_recall_lru);
nfs4_put_delegation(dp);
}
- list_splice_init(&clp->cl_revoked, &reaplist);
- while (!list_empty(&reaplist)) {
- dp = list_entry(reaplist.next, struct nfs4_delegation, dl_recall_lru);
+ while (!list_empty(&clp->cl_revoked)) {
+ dp = list_entry(clp->cl_revoked.next, struct nfs4_delegation, dl_recall_lru);
+ list_del_init(&dp->dl_recall_lru);
destroy_revoked_delegation(dp);
}
while (!list_empty(&clp->cl_openowners)) {
@@ -3899,8 +3903,10 @@ nfs4_laundromat(struct nfsd_net *nn)
list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
- list_for_each_safe(pos, next, &reaplist) {
- dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
+ while (!list_empty(&reaplist)) {
+ dp = list_first_entry(&reaplist, struct nfs4_delegation,
+ dl_recall_lru);
+ list_del_init(&dp->dl_recall_lru);
revoke_delegation(dp);
}
list_for_each_safe(pos, next, &nn->close_lru) {
@@ -4244,6 +4250,9 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
break;
case NFS4_REVOKED_DELEG_STID:
dp = delegstateid(s);
+ spin_lock(&cl->cl_lock);
+ list_del_init(&dp->dl_recall_lru);
+ spin_unlock(&cl->cl_lock);
destroy_revoked_delegation(dp);
ret = nfs_ok;
break;
--
1.9.3
On Fri, 18 Jul 2014 10:19:30 -0700
Christoph Hellwig <[email protected]> wrote:
> > +nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
> > + struct nfs4_file *fp)
> > {
> > - int status = 0;
> > + int status;
> > + struct nfs4_delegation *dp;
> >
> > if (fp->fi_had_conflict)
> > - return -EAGAIN;
> > + return ERR_PTR(-EAGAIN);
> > +
> > + dp = alloc_init_deleg(clp, fh);
> > + if (!dp)
> > + return ERR_PTR(-ENOMEM);
> > +
> > get_nfs4_file(fp);
>
> Seems like we should pass the file pointer to alloc_init_deleg as well
> so it can set that one one up and grab the reference without any lock?
>
> Otherwise looks good,
>
> Reviewed-by: Christoph Hellwig <[email protected]>
I actually started to do a patch that did that, but the error handling
turned out to be a little twisted. Might be reasonable for a future
cleanup though.
--
Jeff Layton <[email protected]>
On Tue, Jul 22, 2014 at 06:40:49AM +1000, NeilBrown wrote:
> On Mon, 21 Jul 2014 07:44:12 -0400 Jeff Layton <[email protected]>
> wrote:
>
> > On Mon, 21 Jul 2014 17:02:54 +1000
> > NeilBrown <[email protected]> wrote:
>
> > > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > > >
> > > > __set_bit(hash&255, bd->set[bd->new]);
> > > > __set_bit((hash>>8)&255, bd->set[bd->new]);
> > > > __set_bit((hash>>16)&255, bd->set[bd->new]);
> > > > + spin_lock(&blocked_delegations_lock);
> > >
> > > __set_bit isn't atomic. The spin_lock should be taken *before* these
> > > __set_bit() calls.
> > >
> > > Otherwise, looks fine.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> >
> > Ok. I guess the worry is that we could end up setting bits in the
> > middle of swapping the two fields? Makes sense -- fixed in my repo.
>
> It is more subtle than that.
> __set_bit() will:
> read a value from memory to a register
> set a bit in the register
> write the register back out to memory
>
> If two threads both run __set_bit on the same word of memory at the same
> time, one of the updates can get lost.
> set_bit() (no underscore) performs an atomic RMW to avoid this, but is more
> expensive.
> spin_lock() obviously ensures the required exclusion and as we are going to
> take the lock anyway we may as well take it before setting bits so we can use
> the non-atomic (cheaper) __set_bit function.
>
> > I'll send out the updated set later today (it also includes a few nits
> > that HCH pointed out last week).
> >
> > As a side note...I wonder how much we'll get in the way of false
> > positives with this scheme?
>
> If a future version of NFSv4 could allow delegations to be granted while a
> file is open (oh, it seems you are the only client using this file at the
> moment, you can treat this "open" as a delegation if you like) a few false
> positives would be a complete non-issue.
For what it's worth, I think 4.1 provides what you're asking for here;
see
http://tools.ietf.org/html/rfc5661#section-20.7
and the discussion of the various WANT_ flags in
http://tools.ietf.org/html/rfc5661#section-18.16.3
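For reference, the WANT_ flags in question look like this (values per
RFC 5661, quoted here for convenience -- consult the spec for the
authoritative list):

#define OPEN4_SHARE_ACCESS_WANT_NO_PREFERENCE	0x0000
#define OPEN4_SHARE_ACCESS_WANT_READ_DELEG	0x0100
#define OPEN4_SHARE_ACCESS_WANT_WRITE_DELEG	0x0200
#define OPEN4_SHARE_ACCESS_WANT_ANY_DELEG	0x0300
#define OPEN4_SHARE_ACCESS_WANT_NO_DELEG	0x0400
#define OPEN4_SHARE_ACCESS_WANT_CANCEL		0x0500
#define OPEN4_SHARE_ACCESS_WANT_SIGNAL_DELEG_WHEN_RESRC_AVAIL	0x10000
#define OPEN4_SHARE_ACCESS_WANT_PUSH_DELEG_WHEN_UNCONTENDED	0x20000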
As far as I know none of that is implemented yet.
--b.
On Tue, Jul 22, 2014 at 08:50:25AM +1000, NeilBrown wrote:
> On Mon, 21 Jul 2014 17:17:57 -0400 "J. Bruce Fields" <[email protected]>
> wrote:
>
> > [...]
>
> I guess I should really read the 4.1 (and 4.2) spec some day....
> Though the 20.7 section seems to be about saying "resources in general are
> available" rather than "this specific file that you wanted a delegation for
> but didn't get one is now up for delegation"....
> But I only had a quick read so I might have missed something.
It was me that missed something; looks like CB_PUSH_DELEG is what you
actually want, not CB_RECALLABLE_OBJ_AVAIL:
http://tools.ietf.org/html/rfc5661#section-20.5
NFS4.1: it's the whole bell-and-whistle orchestra.
--b.
From: Trond Myklebust <[email protected]>
We will want to add reference counting to the lock stateid and open
stateids too in later patches.
Signed-off-by: Trond Myklebust <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/nfsd/nfs4state.c | 6 +++---
fs/nfsd/state.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 9ab067e85b51..49a734cd2f20 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -472,6 +472,7 @@ kmem_cache *slab)
stid->sc_stateid.si_opaque.so_clid = cl->cl_clientid;
/* Will be incremented before return to client: */
stid->sc_stateid.si_generation = 0;
+ atomic_set(&stid->sc_count, 1);
/*
* It shouldn't be a problem to reuse an opaque stateid value.
@@ -595,7 +596,6 @@ alloc_init_deleg(struct nfs4_client *clp, struct nfs4_ol_stateid *stp, struct sv
dp->dl_type = NFS4_OPEN_DELEGATE_READ;
fh_copy_shallow(&dp->dl_fh, &current_fh->fh_handle);
dp->dl_time = 0;
- atomic_set(&dp->dl_count, 1);
INIT_WORK(&dp->dl_recall.cb_work, nfsd4_run_cb_recall);
return dp;
}
@@ -615,7 +615,7 @@ static void nfs4_free_stid(struct kmem_cache *slab, struct nfs4_stid *s)
void
nfs4_put_delegation(struct nfs4_delegation *dp)
{
- if (atomic_dec_and_test(&dp->dl_count)) {
+ if (atomic_dec_and_test(&dp->dl_stid.sc_count)) {
remove_stid(&dp->dl_stid);
nfs4_free_stid(deleg_slab, &dp->dl_stid);
num_delegations--;
@@ -3120,7 +3120,7 @@ static void nfsd_break_one_deleg(struct nfs4_delegation *dp)
* lock) we know the server hasn't removed the lease yet, we know
* it's safe to take a reference.
*/
- atomic_inc(&dp->dl_count);
+ atomic_inc(&dp->dl_stid.sc_count);
nfsd4_cb_recall(dp);
}
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 996d61eeb357..e68a9ae30fd7 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -73,6 +73,7 @@ struct nfsd4_callback {
};
struct nfs4_stid {
+ atomic_t sc_count;
#define NFS4_OPEN_STID 1
#define NFS4_LOCK_STID 2
#define NFS4_DELEG_STID 4
@@ -91,7 +92,6 @@ struct nfs4_delegation {
struct list_head dl_perfile;
struct list_head dl_perclnt;
struct list_head dl_recall_lru; /* delegation recalled */
- atomic_t dl_count; /* ref count */
struct nfs4_file *dl_file;
u32 dl_type;
time_t dl_time;
--
1.9.3
On Fri, Jul 18, 2014 at 11:13:34AM -0400, Jeff Layton wrote:
> No need to pass in a net pointer since we can derive that.
>
> Signed-off-by: Jeff Layton <[email protected]>
Looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
> static void destroy_revoked_delegation(struct nfs4_delegation *dp)
> {
> - list_del_init(&dp->dl_recall_lru);
> + WARN_ON(!list_empty(&dp->dl_recall_lru));
> nfs4_put_delegation(dp);
> }
Is there any point in keeping destroy_revoked_delegation and not just
calling nfs4_put_delegation directly?
> +nfs4_set_delegation(struct nfs4_client *clp, struct svc_fh *fh,
> + struct nfs4_file *fp)
> {
> - int status = 0;
> + int status;
> + struct nfs4_delegation *dp;
>
> if (fp->fi_had_conflict)
> - return -EAGAIN;
> + return ERR_PTR(-EAGAIN);
> +
> + dp = alloc_init_deleg(clp, fh);
> + if (!dp)
> + return ERR_PTR(-ENOMEM);
> +
> get_nfs4_file(fp);
Seems like we should pass the file pointer to alloc_init_deleg as well
so it can set that one up and grab the reference without any lock?
Otherwise looks good,
Reviewed-by: Christoph Hellwig <[email protected]>
On Mon, 21 Jul 2014 09:11:27 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Mon, Jul 21, 2014 at 07:44:12AM -0400, Jeff Layton wrote:
> > On Mon, 21 Jul 2014 17:02:54 +1000
> > NeilBrown <[email protected]> wrote:
> >
> > > On Fri, 18 Jul 2014 11:13:36 -0400 Jeff Layton <[email protected]>
> > > wrote:
> > >
> > > > The state lock can be fairly heavily contended, and there's no reason
> > > > that nfs4_file lookups and delegation_blocked should be mutually
> > > > exclusive. Let's give the new block_delegation code its own spinlock.
> > > > It does mean that we'll need to take a different lock in the delegation
> > > > break code, but that's not generally as critical to performance.
> > > >
> > > > Cc: Neil Brown <[email protected]>
> > > > Signed-off-by: Jeff Layton <[email protected]>
> > >
> > > Makes sense, thanks.
> > > However.....
> > >
> > >
> > > > ---
> > > > fs/nfsd/nfs4state.c | 25 +++++++++++++------------
> > > > 1 file changed, 13 insertions(+), 12 deletions(-)
> > > >
> > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > index a2c6c85adfc7..952def00363b 100644
> > > > --- a/fs/nfsd/nfs4state.c
> > > > +++ b/fs/nfsd/nfs4state.c
> > > > @@ -506,10 +506,11 @@ static struct nfs4_ol_stateid * nfs4_alloc_stateid(struct nfs4_client *clp)
> > > > * Each filter is 256 bits. We hash the filehandle to 32bit and use the
> > > > * low 3 bytes as hash-table indices.
> > > > *
> > > > - * 'state_lock', which is always held when block_delegations() is called,
> > > > - * is used to manage concurrent access. Testing does not need the lock
> > > > - * except when swapping the two filters.
> > > > + * 'blocked_delegations_lock', which is always held when block_delegations()
> > > > + * is called, is used to manage concurrent access. Testing does not need the
> > > > + * lock except when swapping the two filters.
> > >
> > > ...this comment is wrong. blocked_delegations_lock is *not* held when
> > > block_delegations() is called; it is taken when needed (almost) by
> > > block_delegations.
> > >
> >
> > Thanks, fixed.
> >
> > > > */
> > > > +static DEFINE_SPINLOCK(blocked_delegations_lock);
> > > > static struct bloom_pair {
> > > > int entries, old_entries;
> > > > time_t swap_time;
> > > > @@ -525,7 +526,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > > > if (bd->entries == 0)
> > > > return 0;
> > > > if (seconds_since_boot() - bd->swap_time > 30) {
> > > > - spin_lock(&state_lock);
> > > > + spin_lock(&blocked_delegations_lock);
> > > > if (seconds_since_boot() - bd->swap_time > 30) {
> > > > bd->entries -= bd->old_entries;
> > > > bd->old_entries = bd->entries;
> > > > @@ -534,7 +535,7 @@ static int delegation_blocked(struct knfsd_fh *fh)
> > > > bd->new = 1-bd->new;
> > > > bd->swap_time = seconds_since_boot();
> > > > }
> > > > - spin_unlock(&state_lock);
> > > > + spin_unlock(&blocked_delegations_lock);
> > > > }
> > > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > > > if (test_bit(hash&255, bd->set[0]) &&
> > > > @@ -555,16 +556,16 @@ static void block_delegations(struct knfsd_fh *fh)
> > > > u32 hash;
> > > > struct bloom_pair *bd = &blocked_delegations;
> > > >
> > > > - lockdep_assert_held(&state_lock);
> > > > -
> > > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > > >
> > > > __set_bit(hash&255, bd->set[bd->new]);
> > > > __set_bit((hash>>8)&255, bd->set[bd->new]);
> > > > __set_bit((hash>>16)&255, bd->set[bd->new]);
> > > > + spin_lock(&blocked_delegations_lock);
> > >
> > > __set_bit isn't atomic. The spin_lock should be taken *before* these
> > > __set_bit() calls.
> > >
> > > Otherwise, looks fine.
> > >
> > > Thanks,
> > > NeilBrown
> > >
> > >
> >
> > Ok. I guess the worry is that we could end up setting bits in the
> > middle of swapping the two fields? Makes sense -- fixed in my repo.
> > I'll send out the updated set later today (it also includes a few nits
> > that HCH pointed out last week).
> >
> > As a side note...I wonder how much we'll get in the way of false
> > positives with this scheme?
> >
> > Given that we'll always have (or will have had) a nfs4_file
> > corresponding to this inode, perhaps we'd be better off doing something
> > like storing (and maybe hashing on) the filehandle in the nfs4_file,
> > and just ensuring that we hold on to it for 30s or so after the last
> > put?
>
> You don't want to hold a reference to the inode unnecessarily.
> (Consider for example the case of a deleted-but-still-opened file, in
> which case people can notice if a large file hangs around eating up
> space for an extra 30 seconds.) So I suppose you'd put fi_inode on last
> close and just make sure the rest of the code is prepared to deal with
> nfs4_file's without struct inodes. That might make sense to do.
>
Yeah, that's what I was thinking. Change the code to hash the nfs4_file
based on filehandle instead of inode (which may make sense anyway), and
then just keep it around for a little while to handle delegation checks
without pinning down any vfs objects. We could institute some sort of
LRU collection of unused nfs4_files too to ensure the cache doesn't
grow too large.
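Purely as a sketch of that idea (hypothetical, not part of this series;
the struct and helper names below are made up, though knfsd_fh,
arch_fast_hash and FILE_HASH_SIZE all exist), the nfs4_file might end
up keyed like this:

/*
 * Key the nfs4_file on the filehandle instead of the inode, and let
 * entries linger briefly after the last put so delegation-conflict
 * checks don't pin any VFS objects.
 */
struct nfs4_file_stub {
	struct hlist_node	fi_hash;	/* bucketed by filehandle */
	struct knfsd_fh		fi_fhandle;
	atomic_t		fi_ref;
	time_t			fi_last_put;	/* drives 30s grace + LRU reap */
};

static unsigned int file_fh_hashval(struct knfsd_fh *fh)
{
	/* same style of filehandle hash that block_delegations() uses */
	return arch_fast_hash(&fh->fh_base, fh->fh_size, 0) &
		(FILE_HASH_SIZE - 1);
}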
> Occasional false positives aren't necessarily a big deal, so the current
> approach seems a reasonable compromise for now.
>
Right, it may be no big deal at all, but the question is -- "how often
do we hit false positives here?" I imagine it depends on workload to
some degree.
Is there some way we could sanity check the hit/miss rate without
needing to do too much tracking?
Anyway...it's more food for thought for later work in this area...
> >
> > Not something I'm looking at doing today, but it might be worth
> > considering for a later delegations rework.
> >
> > > > if (bd->entries == 0)
> > > > bd->swap_time = seconds_since_boot();
> > > > bd->entries += 1;
> > > > + spin_unlock(&blocked_delegations_lock);
> > > > }
> > > >
> > > > static struct nfs4_delegation *
> > > > @@ -3097,16 +3098,16 @@ void nfsd4_prepare_cb_recall(struct nfs4_delegation *dp)
> > > > struct nfs4_client *clp = dp->dl_stid.sc_client;
> > > > struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
> > > >
> > > > - /*
> > > > - * We can't do this in nfsd_break_deleg_cb because it is
> > > > - * already holding inode->i_lock
> > > > - */
> > > > - spin_lock(&state_lock);
> > > > block_delegations(&dp->dl_fh);
> > > > +
> > > > /*
> > > > + * We can't do this in nfsd_break_deleg_cb because it is
> > > > + * already holding inode->i_lock.
> > > > + *
> > > > * If the dl_time != 0, then we know that it has already been
> > > > * queued for a lease break. Don't queue it again.
> > > > */
> > > > + spin_lock(&state_lock);
> > > > if (dp->dl_time == 0) {
> > > > dp->dl_time = get_seconds();
> > > > list_add_tail(&dp->dl_recall_lru, &nn->del_recall_lru);
> > >
> >
> >
> > --
> > Jeff Layton <[email protected]>
>
>
--
Jeff Layton <[email protected]>
On Mon, 21 Jul 2014 07:44:12 -0400 Jeff Layton <[email protected]>
wrote:
> On Mon, 21 Jul 2014 17:02:54 +1000
> NeilBrown <[email protected]> wrote:
> > > hash = arch_fast_hash(&fh->fh_base, fh->fh_size, 0);
> > >
> > > __set_bit(hash&255, bd->set[bd->new]);
> > > __set_bit((hash>>8)&255, bd->set[bd->new]);
> > > __set_bit((hash>>16)&255, bd->set[bd->new]);
> > > + spin_lock(&blocked_delegations_lock);
> >
> > __set_bit isn't atomic. The spin_lock should be taken *before* these
> > __set_bit() calls.
> >
> > Otherwise, looks fine.
> >
> > Thanks,
> > NeilBrown
> >
> >
>
> Ok. I guess the worry is that we could end up setting bits in the
> middle of swapping the two fields? Makes sense -- fixed in my repo.
It is more subtle than that.
__set_bit() will:
read a value from memory to a register
set a bit in the register
write the register back out to memory
If two threads both run __set_bit on the same word of memory at the same
time, one of the updates can get lost.
set_bit() (no underscore) performs an atomic RMW to avoid this, but is more
expensive.
spin_lock() obviously ensures the required exclusion and as we are going to
take the lock anyway we may as well take it before setting bits so we can use
the non-atomic (cheaper) __set_bit function.
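To make the failure mode concrete, here is roughly what a non-atomic
__set_bit() boils down to, and the interleaving that loses an update
(a simplified sketch; the real implementation is per-architecture):

static inline void sketch_set_bit(unsigned long nr, unsigned long *addr)
{
	unsigned long *word = addr + (nr / BITS_PER_LONG);

	*word |= 1UL << (nr % BITS_PER_LONG);	/* load, OR, store */
}

/*
 * The lost update, step by step:
 *
 *	CPU0				CPU1
 *	----				----
 *	load *word (== 0)
 *					load *word (== 0, stale)
 *	store 0 | BIT(3)
 *					store 0 | BIT(5)  <- BIT(3) lost
 *
 * set_bit() avoids this with an atomic RMW; since we take
 * blocked_delegations_lock anyway, taking it first lets us keep the
 * cheaper __set_bit().
 */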
> I'll send out the updated set later today (it also includes a few nits
> that HCH pointed out last week).
>
> As a side note...I wonder how much we'll get in the way of false
> positives with this scheme?
If a future version of NFSv4 could allow delegations to be granted while a
file is open (oh, it seems you are the only client using this file at the
moment, you can treat this "open" as a delegation if you like) a few false
positives would be a complete non-issue. As it is, I think we just have to
hope.
Thanks,
NeilBrown
On Fri, 18 Jul 2014 15:21:49 -0400
"J. Bruce Fields" <[email protected]> wrote:
> On Fri, Jul 18, 2014 at 03:04:04PM -0400, Jeff Layton wrote:
> > On Fri, 18 Jul 2014 13:49:57 -0400
> > "J. Bruce Fields" <[email protected]> wrote:
> >
> > > On Fri, Jul 18, 2014 at 01:31:40PM -0400, Jeff Layton wrote:
> > > > On Fri, 18 Jul 2014 12:28:25 -0400
> > > > "J. Bruce Fields" <[email protected]> wrote:
> > > >
> > > > > On Fri, Jul 18, 2014 at 11:13:27AM -0400, Jeff Layton wrote:
> > > > > > Move more of the delegation fields to be protected by the fi_lock. It's
> > > > > > more granular than the state_lock and in later patches we'll want to
> > > > > > be able to rely on it in addition to the state_lock.
> > > > > >
> > > > > > Also, the current code in nfs4_setlease calls vfs_setlease and uses the
> > > > > > client_mutex to ensure that it doesn't disappear before we can hash the
> > > > > > delegation. With the client_mutex gone, we'll have a potential race
> > > > > > condition.
> > > > > >
> > > > > > It's possible that the delegation could be recalled after we acquire the
> > > > > > lease but before we ever get around to hashing it. If that happens, then
> > > > > > we'd have a nfs4_file that *thinks* it has a delegation, when it
> > > > > > actually has none.
> > > > >
> > > > > I understand now, thanks: so the lease break code walks the list of
> > > > > delegations associated with the file, finds none, and issues no recall,
> > > > > but the open code continues merrily on and returns a delegation, with
> > > > > the result that we return the client a delegation that will never be
> > > > > recalled.
> > > > >
> > > > > That could be worded more carefully, and would be worth a separate patch
> > > > > (since the bug predates the new locking).
> > > > >
> > > >
> > > > Yes, that's basically correct. I'd have to think about how to fix that
> > > > with the current code. It's probably doable if you think it's
> > > > worthwhile, but I'll need to rebase this set on top of it.
> > >
> > > Well, I was wondering if this patch could just be split in two, no need
> > > to backport further than that.
> > >
> >
> > Erm, now that I've looked, I don't think it'll be that easy. The key
> > here is to ensure that fi_had_conflict is set while holding the
> > fi_lock. The trick here is that we need to take it in nfs4_setlease as
> > well, and check the flag before hashing the delegation without dropping
> > the fi_lock.
>
> OK, I'll live. For the sake of anyone that actually runs across that
> bug I'll update the summary and changelog to emphasize the bugfix over
> the locking change.
>
Ok, thanks.
> > > > > > Attempt to acquire a delegation. If that succeeds, take the spinlocks
> > > > > > and then check to see if the file has had a conflict show up since then.
> > > > > > If it has, then we assume that the lease is no longer valid and that
> > > > > > we shouldn't hand out a delegation.
> > > > > >
> > > > > > There's also one more potential (but very unlikely) problem. If the
> > > > > > lease is broken before the delegation is hashed, then it could leak.
> > > > > > In the event that the fi_delegations list is empty, reset the
> > > > > > fl_break_time to jiffies so that it's cleaned up ASAP by
> > > > > > the normal lease handling code.
> > > > >
> > > > > Is there actually any guarantee time_out_leases() will get called on
> > > > > this inode again?
> > > > >
> > > > > --b.
> > > > >
> > > >
> > > > Yes. Lease breaks are handled in two phases. We walk the i_flock list
> > > > and issue a ->lm_break on each lease, and then later we walk the list
> > > > again after putting the task to sleep, and try to time out the leases.
> > > > So by doing this, we should ensure that the task will wake up after
> > > > sleeping and delete it.
> > >
> > > In the case of an interrupt or a nonblocking break (which is what nfsd
> > > will do), then time_out_leases isn't called again from what I could
> > > tell.
> > >
> > > --b.
> > >
> >
> > In both cases, time_out_leases is still called at the beginning of
> > __break_lease. So the next time that function is called it'll
> > get cleaned up, or when the filp is closed (in locks_remove_file).
>
> Right, but there's no guarantee another break_lease comes. E.g. the
> process waiting on the lease break could get killed.
>
> --b.
In that case, there's no harm in leaving the lease on the list until
the filp closes.
FWIW, I looked at trying to just remove the lease from the list, but
that's not safe from the lm_break callback. So, I think this is the
best we can reasonably do here.
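For context, the check-after-acquire sequence being discussed looks
roughly like this -- a sketch using this series' names (vfs_setlease,
hash_delegation_locked, fi_had_conflict), with the wrapper function
made up and the error handling abbreviated:

static int sketch_nfs4_setlease(struct nfs4_delegation *dp,
				struct nfs4_file *fp, struct file_lock *fl)
{
	int status;

	/* 1) Acquire the lease first, without our spinlocks held. */
	status = vfs_setlease(fl->fl_file, fl->fl_type, &fl);
	if (status)
		return status;

	/* 2) Only under the locks do we decide whether it's still safe
	 *    to hash the delegation. */
	spin_lock(&state_lock);
	spin_lock(&fp->fi_lock);
	if (fp->fi_had_conflict)	/* broken before we could hash it */
		status = -EAGAIN;
	else
		hash_delegation_locked(dp, fp);
	spin_unlock(&fp->fi_lock);
	spin_unlock(&state_lock);
	return status;
}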
--
Jeff Layton <[email protected]>
Ensure that the delegations cannot be found by the laundromat etc. once
we add them to the various 'revoke' lists.
Signed-off-by: Trond Myklebust <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
---
fs/nfsd/nfs4state.c | 37 +++++++++++++++++++++----------------
1 file changed, 21 insertions(+), 16 deletions(-)
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 60ae21abce00..7c5233427c9b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -650,13 +650,13 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
}
-/* Called under the state lock. */
static void
-unhash_delegation(struct nfs4_delegation *dp)
+unhash_delegation_locked(struct nfs4_delegation *dp)
{
struct nfs4_file *fp = dp->dl_file;
- spin_lock(&state_lock);
+ lockdep_assert_held(&state_lock);
+
dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
spin_lock(&fp->fi_lock);
list_del_init(&dp->dl_perclnt);
@@ -665,7 +665,6 @@ unhash_delegation(struct nfs4_delegation *dp)
spin_unlock(&fp->fi_lock);
if (fp)
nfs4_put_deleg_lease(fp);
- spin_unlock(&state_lock);
}
static void destroy_revoked_delegation(struct nfs4_delegation *dp)
@@ -676,7 +675,9 @@ static void destroy_revoked_delegation(struct nfs4_delegation *dp)
static void destroy_delegation(struct nfs4_delegation *dp)
{
- unhash_delegation(dp);
+ spin_lock(&state_lock);
+ unhash_delegation_locked(dp);
+ spin_unlock(&state_lock);
nfs4_put_delegation(dp);
}
@@ -685,11 +686,10 @@ static void revoke_delegation(struct nfs4_delegation *dp)
struct nfs4_client *clp = dp->dl_stid.sc_client;
if (clp->cl_minorversion == 0)
- destroy_delegation(dp);
+ destroy_revoked_delegation(dp);
else {
- unhash_delegation(dp);
dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
- list_add(&dp->dl_recall_lru, &clp->cl_revoked);
+ list_move(&dp->dl_recall_lru, &clp->cl_revoked);
}
}
@@ -1447,15 +1447,16 @@ destroy_client(struct nfs4_client *clp)
spin_lock(&state_lock);
while (!list_empty(&clp->cl_delegations)) {
dp = list_entry(clp->cl_delegations.next, struct nfs4_delegation, dl_perclnt);
- list_del_init(&dp->dl_perclnt);
+ unhash_delegation_locked(dp);
/* Ensure that deleg break won't try to requeue it */
++dp->dl_time;
- list_move(&dp->dl_recall_lru, &reaplist);
+ list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
while (!list_empty(&reaplist)) {
dp = list_entry(reaplist.next, struct nfs4_delegation, dl_recall_lru);
- destroy_delegation(dp);
+ list_del_init(&dp->dl_recall_lru);
+ nfs4_put_delegation(dp);
}
list_splice_init(&clp->cl_revoked, &reaplist);
while (!list_empty(&reaplist)) {
@@ -3655,7 +3656,7 @@ nfs4_open_delegation(struct net *net, struct svc_fh *fh,
open->op_delegate_type = NFS4_OPEN_DELEGATE_READ;
return;
out_free:
- destroy_delegation(dp);
+ nfs4_put_delegation(dp);
out_no_deleg:
open->op_delegate_type = NFS4_OPEN_DELEGATE_NONE;
if (open->op_claim_type == NFS4_OPEN_CLAIM_PREVIOUS &&
@@ -3894,7 +3895,8 @@ nfs4_laundromat(struct nfsd_net *nn)
new_timeo = min(new_timeo, t);
break;
}
- list_move(&dp->dl_recall_lru, &reaplist);
+ unhash_delegation_locked(dp);
+ list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
list_for_each_safe(pos, next, &reaplist) {
@@ -5369,7 +5371,8 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
* don't monkey with it now that we are.
*/
++dp->dl_time;
- list_move(&dp->dl_recall_lru, victims);
+ unhash_delegation_locked(dp);
+ list_add(&dp->dl_recall_lru, victims);
}
if (++count == max)
break;
@@ -5624,12 +5627,14 @@ nfs4_state_shutdown_net(struct net *net)
spin_lock(&state_lock);
list_for_each_safe(pos, next, &nn->del_recall_lru) {
dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
- list_move(&dp->dl_recall_lru, &reaplist);
+ unhash_delegation_locked(dp);
+ list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
list_for_each_safe(pos, next, &reaplist) {
dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
- destroy_delegation(dp);
+ list_del_init(&dp->dl_recall_lru);
+ nfs4_put_delegation(dp);
}
nfsd4_client_tracking_exit(net);
--
1.9.3