2015-08-24 16:41:56

by Jeff Layton

Subject: [PATCH v2 0/2] nfsd: fix unhashing races in nfsv4 stateid handling

v2:
- fix the fault injection code to handle the prototype change to
unhash_lock_stateid

Slightly updated set, to fix a problem that Wu found when enabling the
nfsd fault injection code. Original cover-letter follows:

-------------------------[snip]-------------------

I hadn't heard from Andrew in a while so I'm not 100% clear on whether
his patches had fixed the race that he had spotted in the NFSv4
unhashing code.

Still, once he had done the analysis, the problem was clear. I think
this patchset should fix the problem in a relatively clean way. I've
run some tests against this and it seems to not have broken anything,
but I haven't been able to reproduce the actual race so I can't
verify that this fixes it.

Andrew, Anna, if you are able to do so could you test these patches
and let us know whether this fixes those races?

Jeff Layton (2):
nfsd: ensure that the ol stateid hash reference is only put once
nfsd: ensure that delegation stateid hash references are only put once

fs/nfsd/nfs4state.c | 84 +++++++++++++++++++++++++++++++++--------------------
1 file changed, 53 insertions(+), 31 deletions(-)

--
2.4.3



2015-08-24 16:41:57

by Jeff Layton

Subject: [PATCH v2 1/2] nfsd: ensure that the ol stateid hash reference is only put once

When an open or lock stateid is hashed, we take an extra reference to
it. When we unhash it, we drop that reference. The code however does
not properly account for the case where we have two callers concurrently
trying to unhash the stateid. This can lead to list corruption and the
hash reference being put more than once.

Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
list_head, and then testing to see if that list_head is empty before
releasing the hash reference. This means that some of the unhashing
wrappers now become bool return functions so we can test to see whether
the stateid was unhashed before we put the reference.

Reported-by: Andrew W Elble <[email protected]>
Reported-by: Anna Schumaker <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 58 +++++++++++++++++++++++++++++++++--------------------
1 file changed, 36 insertions(+), 22 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index c0c47a878cc6..4b4faf5e4bc7 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1009,16 +1009,20 @@ static void nfs4_put_stateowner(struct nfs4_stateowner *sop)
nfs4_free_stateowner(sop);
}

-static void unhash_ol_stateid(struct nfs4_ol_stateid *stp)
+static bool unhash_ol_stateid(struct nfs4_ol_stateid *stp)
{
struct nfs4_file *fp = stp->st_stid.sc_file;

lockdep_assert_held(&stp->st_stateowner->so_client->cl_lock);

+ if (list_empty(&stp->st_perfile))
+ return false;
+
spin_lock(&fp->fi_lock);
- list_del(&stp->st_perfile);
+ list_del_init(&stp->st_perfile);
spin_unlock(&fp->fi_lock);
list_del(&stp->st_perstateowner);
+ return true;
}

static void nfs4_free_ol_stateid(struct nfs4_stid *stid)
@@ -1068,25 +1072,27 @@ static void put_ol_stateid_locked(struct nfs4_ol_stateid *stp,
list_add(&stp->st_locks, reaplist);
}

-static void unhash_lock_stateid(struct nfs4_ol_stateid *stp)
+static bool unhash_lock_stateid(struct nfs4_ol_stateid *stp)
{
struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);

lockdep_assert_held(&oo->oo_owner.so_client->cl_lock);

list_del_init(&stp->st_locks);
- unhash_ol_stateid(stp);
nfs4_unhash_stid(&stp->st_stid);
+ return unhash_ol_stateid(stp);
}

static void release_lock_stateid(struct nfs4_ol_stateid *stp)
{
struct nfs4_openowner *oo = openowner(stp->st_openstp->st_stateowner);
+ bool unhashed;

spin_lock(&oo->oo_owner.so_client->cl_lock);
- unhash_lock_stateid(stp);
+ unhashed = unhash_lock_stateid(stp);
spin_unlock(&oo->oo_owner.so_client->cl_lock);
- nfs4_put_stid(&stp->st_stid);
+ if (unhashed)
+ nfs4_put_stid(&stp->st_stid);
}

static void unhash_lockowner_locked(struct nfs4_lockowner *lo)
@@ -1134,7 +1140,7 @@ static void release_lockowner(struct nfs4_lockowner *lo)
while (!list_empty(&lo->lo_owner.so_stateids)) {
stp = list_first_entry(&lo->lo_owner.so_stateids,
struct nfs4_ol_stateid, st_perstateowner);
- unhash_lock_stateid(stp);
+ WARN_ON(!unhash_lock_stateid(stp));
put_ol_stateid_locked(stp, &reaplist);
}
spin_unlock(&clp->cl_lock);
@@ -1147,21 +1153,26 @@ static void release_open_stateid_locks(struct nfs4_ol_stateid *open_stp,
{
struct nfs4_ol_stateid *stp;

+ lockdep_assert_held(&open_stp->st_stid.sc_client->cl_lock);
+
while (!list_empty(&open_stp->st_locks)) {
stp = list_entry(open_stp->st_locks.next,
struct nfs4_ol_stateid, st_locks);
- unhash_lock_stateid(stp);
+ WARN_ON(!unhash_lock_stateid(stp));
put_ol_stateid_locked(stp, reaplist);
}
}

-static void unhash_open_stateid(struct nfs4_ol_stateid *stp,
+static bool unhash_open_stateid(struct nfs4_ol_stateid *stp,
struct list_head *reaplist)
{
+ bool unhashed;
+
lockdep_assert_held(&stp->st_stid.sc_client->cl_lock);

- unhash_ol_stateid(stp);
+ unhashed = unhash_ol_stateid(stp);
release_open_stateid_locks(stp, reaplist);
+ return unhashed;
}

static void release_open_stateid(struct nfs4_ol_stateid *stp)
@@ -1169,8 +1180,8 @@ static void release_open_stateid(struct nfs4_ol_stateid *stp)
LIST_HEAD(reaplist);

spin_lock(&stp->st_stid.sc_client->cl_lock);
- unhash_open_stateid(stp, &reaplist);
- put_ol_stateid_locked(stp, &reaplist);
+ if (unhash_open_stateid(stp, &reaplist))
+ put_ol_stateid_locked(stp, &reaplist);
spin_unlock(&stp->st_stid.sc_client->cl_lock);
free_ol_stateid_reaplist(&reaplist);
}
@@ -1215,8 +1226,8 @@ static void release_openowner(struct nfs4_openowner *oo)
while (!list_empty(&oo->oo_owner.so_stateids)) {
stp = list_first_entry(&oo->oo_owner.so_stateids,
struct nfs4_ol_stateid, st_perstateowner);
- unhash_open_stateid(stp, &reaplist);
- put_ol_stateid_locked(stp, &reaplist);
+ if (unhash_open_stateid(stp, &reaplist))
+ put_ol_stateid_locked(stp, &reaplist);
}
spin_unlock(&clp->cl_lock);
free_ol_stateid_reaplist(&reaplist);
@@ -4752,7 +4763,7 @@ nfsd4_free_stateid(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
if (check_for_locks(stp->st_stid.sc_file,
lockowner(stp->st_stateowner)))
break;
- unhash_lock_stateid(stp);
+ WARN_ON(!unhash_lock_stateid(stp));
spin_unlock(&cl->cl_lock);
nfs4_put_stid(s);
ret = nfs_ok;
@@ -4968,20 +4979,23 @@ out:
static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
{
struct nfs4_client *clp = s->st_stid.sc_client;
+ bool unhashed;
LIST_HEAD(reaplist);

s->st_stid.sc_type = NFS4_CLOSED_STID;
spin_lock(&clp->cl_lock);
- unhash_open_stateid(s, &reaplist);
+ unhashed = unhash_open_stateid(s, &reaplist);

if (clp->cl_minorversion) {
- put_ol_stateid_locked(s, &reaplist);
+ if (unhashed)
+ put_ol_stateid_locked(s, &reaplist);
spin_unlock(&clp->cl_lock);
free_ol_stateid_reaplist(&reaplist);
} else {
spin_unlock(&clp->cl_lock);
free_ol_stateid_reaplist(&reaplist);
- move_to_close_lru(s, clp->net);
+ if (unhashed)
+ move_to_close_lru(s, clp->net);
}
}

@@ -6014,7 +6028,7 @@ nfsd_inject_add_lock_to_list(struct nfs4_ol_stateid *lst,

static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
struct list_head *collect,
- void (*func)(struct nfs4_ol_stateid *))
+ bool (*func)(struct nfs4_ol_stateid *))
{
struct nfs4_openowner *oop;
struct nfs4_ol_stateid *stp, *st_next;
@@ -6028,9 +6042,9 @@ static u64 nfsd_foreach_client_lock(struct nfs4_client *clp, u64 max,
list_for_each_entry_safe(lst, lst_next,
&stp->st_locks, st_locks) {
if (func) {
- func(lst);
- nfsd_inject_add_lock_to_list(lst,
- collect);
+ if (func(lst))
+ nfsd_inject_add_lock_to_list(lst,
+ collect);
}
++count;
/*
--
2.4.3


2015-08-24 16:41:59

by Jeff Layton

Subject: [PATCH v2 2/2] nfsd: ensure that delegation stateid hash references are only put once

It's possible that a DELEGRETURN could race with (e.g.) client expiry,
in which case we could end up putting the delegation hash reference more
than once.

Have unhash_delegation_locked return a bool indicating whether the
delegation was actually unhashed. In destroy_delegation, put the hash
reference only if that returns true.

The other callers of unhash_delegation_locked invoke it while walking
list_heads that shouldn't yet be detached. If it returns false in those
cases, throw a WARN_ON, as that indicates a partially hashed delegation
and that something is likely very wrong.

Signed-off-by: Jeff Layton <[email protected]>
---
fs/nfsd/nfs4state.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4b4faf5e4bc7..0a82daa41555 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -777,13 +777,16 @@ hash_delegation_locked(struct nfs4_delegation *dp, struct nfs4_file *fp)
list_add(&dp->dl_perclnt, &dp->dl_stid.sc_client->cl_delegations);
}

-static void
+static bool
unhash_delegation_locked(struct nfs4_delegation *dp)
{
struct nfs4_file *fp = dp->dl_stid.sc_file;

lockdep_assert_held(&state_lock);

+ if (list_empty(&dp->dl_perfile))
+ return false;
+
dp->dl_stid.sc_type = NFS4_CLOSED_DELEG_STID;
/* Ensure that deleg break won't try to requeue it */
++dp->dl_time;
@@ -792,16 +795,21 @@ unhash_delegation_locked(struct nfs4_delegation *dp)
list_del_init(&dp->dl_recall_lru);
list_del_init(&dp->dl_perfile);
spin_unlock(&fp->fi_lock);
+ return true;
}

static void destroy_delegation(struct nfs4_delegation *dp)
{
+ bool unhashed;
+
spin_lock(&state_lock);
- unhash_delegation_locked(dp);
+ unhashed = unhash_delegation_locked(dp);
spin_unlock(&state_lock);
- put_clnt_odstate(dp->dl_clnt_odstate);
- nfs4_put_deleg_lease(dp->dl_stid.sc_file);
- nfs4_put_stid(&dp->dl_stid);
+ if (unhashed) {
+ put_clnt_odstate(dp->dl_clnt_odstate);
+ nfs4_put_deleg_lease(dp->dl_stid.sc_file);
+ nfs4_put_stid(&dp->dl_stid);
+ }
}

static void revoke_delegation(struct nfs4_delegation *dp)
@@ -1730,7 +1738,7 @@ __destroy_client(struct nfs4_client *clp)
spin_lock(&state_lock);
while (!list_empty(&clp->cl_delegations)) {
dp = list_entry(clp->cl_delegations.next, struct nfs4_delegation, dl_perclnt);
- unhash_delegation_locked(dp);
+ WARN_ON(!unhash_delegation_locked(dp));
list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
@@ -4357,7 +4365,7 @@ nfs4_laundromat(struct nfsd_net *nn)
new_timeo = min(new_timeo, t);
break;
}
- unhash_delegation_locked(dp);
+ WARN_ON(!unhash_delegation_locked(dp));
list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
@@ -6314,7 +6322,7 @@ static u64 nfsd_find_all_delegations(struct nfs4_client *clp, u64 max,
continue;

atomic_inc(&clp->cl_refcount);
- unhash_delegation_locked(dp);
+ WARN_ON(!unhash_delegation_locked(dp));
list_add(&dp->dl_recall_lru, victims);
}
++count;
@@ -6645,7 +6653,7 @@ nfs4_state_shutdown_net(struct net *net)
spin_lock(&state_lock);
list_for_each_safe(pos, next, &nn->del_recall_lru) {
dp = list_entry (pos, struct nfs4_delegation, dl_recall_lru);
- unhash_delegation_locked(dp);
+ WARN_ON(!unhash_delegation_locked(dp));
list_add(&dp->dl_recall_lru, &reaplist);
}
spin_unlock(&state_lock);
--
2.4.3


2015-08-24 19:54:20

by Andrew W Elble

Subject: Re: [PATCH v2 0/2] nfsd: fix unhashing races in nfsv4 stateid handling

Jeff Layton <[email protected]> writes:

> v2:
> - fix the fault injection code to handle the prototype change to
> unhash_lock_stateid

Jeff,

We're running with this in place now. It definitely covers the same
bases my quick hack was covering, so I don't foresee any issues with
it - But I'll give a definitive answer tomorrow.

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-08-24 20:34:56

by J. Bruce Fields

Subject: Re: [PATCH v2 1/2] nfsd: ensure that the ol stateid hash reference is only put once

On Mon, Aug 24, 2015 at 12:41:47PM -0400, Jeff Layton wrote:
> When an open or lock stateid is hashed, we take an extra reference to
> it. When we unhash it, we drop that reference. The code however does
> not properly account for the case where we have two callers concurrently
> trying to unhash the stateid. This can lead to list corruption and the
> hash reference being put more than once.
>
> Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
> list_head, and then testing to see if that list_head is empty before
> releasing the hash reference. This means that some of the unhashing
> wrappers now become bool return functions so we can test to see whether
> the stateid was unhashed before we put the reference.

Can we make this any simpler if we make unhash_ol_stateid do the put
itself instead of making every caller do it?

Also, while I'm looking at that.... The stateid-putting code is
partially duplicated between nfs4_put_stid and put_ol_stateid_locked,
with the difference just being that the latter moves stuff to a list so
we can put a bunch at once under one cl_lock. It'd be nice if we could
remove that bit of duplication.

Haven't tried yet, though.

--b.


2015-08-24 20:41:00

by Jeff Layton

[permalink] [raw]
Subject: Re: [PATCH v2 1/2] nfsd: ensure that the ol stateid hash reference is only put once

On Mon, 24 Aug 2015 16:34:56 -0400
"J. Bruce Fields" <[email protected]> wrote:

> On Mon, Aug 24, 2015 at 12:41:47PM -0400, Jeff Layton wrote:
> > When an open or lock stateid is hashed, we take an extra reference to
> > it. When we unhash it, we drop that reference. The code however does
> > not properly account for the case where we have two callers concurrently
> > trying to unhash the stateid. This can lead to list corruption and the
> > hash reference being put more than once.
> >
> > Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
> > list_head, and then testing to see if that list_head is empty before
> > releasing the hash reference. This means that some of the unhashing
> > wrappers now become bool return functions so we can test to see whether
> > the stateid was unhashed before we put the reference.
>
> Can we make this any simpler if we make unhash_ol_stateid do the put
> itself instead of making every caller do it?
>

The problem is that unhashing requires you to hold the cl_lock, whereas
the put requires that you do not hold it.

> Also, while I'm looking at that.... The stateid-putting code is
> partially duplicated between nfs4_put_stid and put_ol_stateid_locked,
> with the difference just being that the latter moves stuff to a list so
> we can put a bunch at once under one cl_lock. It'd be nice if we could
> remove that bit of duplication.
>
> Haven't tried yet, though.
>

There is a lot of variation here, but again the locking dictates how it
works. nfs4_put_stid is called without holding the cl_lock, and it only
takes it if the refcount goes to 0.

put_ol_stateid_locked is called while holding it. Since we can't call
sc_free while holding the spinlock, it has to defer that activity by
putting the objects on the list.

There might be a little opportunity for consolidation there, but I
think we're stuck with at least some duplication.




--
Jeff Layton <[email protected]>

2015-08-24 21:25:53

by Anna Schumaker

Subject: Re: [PATCH v2 0/2] nfsd: fix unhashing races in nfsv4 stateid handling

Hi Jeff,

On 08/24/2015 12:41 PM, Jeff Layton wrote:
> v2:
> - fix the fault injection code to handle the prototype change to
> unhash_lock_stateid
>
> Slightly updated set, to fix a problem that Wu found when enabling the
> nfsd fault injection code. Original cover-letter follows:
>
> -------------------------[snip]-------------------
>
> I hadn't heard from Andrew in a while so I'm not 100% clear on whether
> his patches had fixed the race that he had spotted in the NFSv4
> unhashing code.
>
> Still, once he had done the analysis, the problem was clear. I think
> this patchset should fix the problem in a relatively clean way. I've
> run some tests against this and it seems to not have broken anything,
> but I haven't been able to reproduce the actual race so I can't
> verify that this fixes it.
>
> Andrew, Anna, if you are able to do so could you test these patches
> and let us know whether this fixes those races?

I just did a quick test, and it looks like my problem has been fixed.
I'll do more in-depth testing tomorrow, just to make sure.

Thanks for working on this!

Anna


2015-08-25 18:32:53

by Andrew W Elble

Subject: Re: [PATCH v2 0/2] nfsd: fix unhashing races in nfsv4 stateid handling

>> v2:
>> - fix the fault injection code to handle the prototype change to
>> unhash_lock_stateid
>
> Jeff,
>
> We're running with this in place now. It definitely covers the same
> bases my quick hack was covering, so I don't foresee any issues with
> it - But I'll give a definitive answer tomorrow.

It's good, no server crashes or memory corruption.

Thanks,

Andy

--
Andrew W. Elble
[email protected]
Infrastructure Engineer, Communications Technical Lead
Rochester Institute of Technology
PGP: BFAD 8461 4CCF DC95 DA2C B0EB 965B 082E 863E C912

2015-08-25 19:17:56

by J. Bruce Fields

Subject: Re: [PATCH v2 0/2] nfsd: fix unhashing races in nfsv4 stateid handling

On Tue, Aug 25, 2015 at 02:32:52PM -0400, Andrew W Elble wrote:
> >> v2:
> >> - fix the fault injection code to handle the prototype change to
> >> unhash_lock_stateid
> >
> > Jeff,
> >
> > We're running with this in place now. It definitely covers the same
> > bases my quick hack was covering, so I don't foresee any issues with
> > it - But I'll give a definitive answer tomorrow.
>
> It's good, no server crashes or memory corruption.

Thanks!--b.

2015-08-25 19:21:13

by J. Bruce Fields

Subject: Re: [PATCH v2 1/2] nfsd: ensure that the ol stateid hash reference is only put once

On Mon, Aug 24, 2015 at 04:40:56PM -0400, Jeff Layton wrote:
> On Mon, 24 Aug 2015 16:34:56 -0400
> "J. Bruce Fields" <[email protected]> wrote:
>
> > On Mon, Aug 24, 2015 at 12:41:47PM -0400, Jeff Layton wrote:
> > > When an open or lock stateid is hashed, we take an extra reference to
> > > it. When we unhash it, we drop that reference. The code however does
> > > not properly account for the case where we have two callers concurrently
> > > trying to unhash the stateid. This can lead to list corruption and the
> > > hash reference being put more than once.
> > >
> > > Fix this by having unhash_ol_stateid use list_del_init on the st_perfile
> > > list_head, and then testing to see if that list_head is empty before
> > > releasing the hash reference. This means that some of the unhashing
> > > wrappers now become bool return functions so we can test to see whether
> > > the stateid was unhashed before we put the reference.
> >
> > Can we make this any simpler if we make unhash_ol_stateid do the put
> > itself instead of making every caller do it?
> >
>
> The problem is that unhashing requires you to hold the cl_lock, whereas
> the put requires that you do not hold it.
>
> > Also, while I'm looking at that.... The stateid-putting code is
> > partially duplicated between nfs4_put_stid and put_ol_stateid_locked,
> > with the difference just being that the latter moves stuff to a list so
> > we can put a bunch at once under one cl_lock. It'd be nice if we could
> > remove that bit of duplication.
> >
> > Haven't tried yet, though.
> >
>
> There is a lot of variation here, but again the locking dictates how it
> works. nfs4_put_stid is called without holding the cl_lock, and it only
> takes it if the refcount goes to 0.
>
> put_ol_stateid_locked is called while holding it. Since we can't call
> sc_free while holding the spinlock, it has to defer that activity by
> putting the objects on the list.
>
> There might be a little opportunity for consolidation there, but I
> think we're stuck with at least some duplication.

OK, fair enough. Anyway, these look correct and we've got confirmation
from testing now, so if we do find any chances to clean up, that should
probably be incremental--applying these for now.

--b.