2024-03-04 22:47:41

by NeilBrown

Subject: [PATCH 0/4 v3] nfsd: fix deadlock in move_to_close_lru()

This is very similar to v2 with:
- first two patches swapped
- while(1) loop changed to "goto retry"
- add a comment to patch 3
- rebase on nfsd-next
- rb from Jeff added.

NeilBrown

[PATCH 1/4] nfsd: move nfsd4_cstate_assign_replay() earlier in open handling.
[PATCH 2/4] nfsd: perform all find_openstateowner_str calls in the one place.
[PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()
[PATCH 4/4] nfsd: drop st_mutex before calling move_to_close_lru()


2024-03-04 22:47:42

by NeilBrown

Subject: [PATCH 1/4] nfsd: move nfsd4_cstate_assign_replay() earlier in open handling.

Rather than taking the rp_mutex in nfsd4_cleanup_open_state() (which
seems counter-intuitive), take it and assign cstate->replay_owner as
soon as possible.

This will support a future change when nfsd4_cstate_assign_replay() might
fail.

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index ee9aa4843443..b9b3a2601675 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -5333,15 +5333,15 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate,
strhashval = ownerstr_hashval(&open->op_owner);
oo = find_openstateowner_str(strhashval, open, clp);
open->op_openowner = oo;
- if (!oo) {
+ if (!oo)
goto new_owner;
- }
if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
/* Replace unconfirmed owners without checking for replay. */
release_openowner(oo);
open->op_openowner = NULL;
goto new_owner;
}
+ nfsd4_cstate_assign_replay(cstate, &oo->oo_owner);
status = nfsd4_check_seqid(cstate, &oo->oo_owner, open->op_seqid);
if (status)
return status;
@@ -5351,6 +5351,7 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate,
if (oo == NULL)
return nfserr_jukebox;
open->op_openowner = oo;
+ nfsd4_cstate_assign_replay(cstate, &oo->oo_owner);
alloc_stateid:
open->op_stp = nfs4_alloc_open_stateid(clp);
if (!open->op_stp)
@@ -6122,12 +6123,8 @@ nfsd4_process_open2(struct svc_rqst *rqstp, struct svc_fh *current_fh, struct nf
void nfsd4_cleanup_open_state(struct nfsd4_compound_state *cstate,
struct nfsd4_open *open)
{
- if (open->op_openowner) {
- struct nfs4_stateowner *so = &open->op_openowner->oo_owner;
-
- nfsd4_cstate_assign_replay(cstate, so);
- nfs4_put_stateowner(so);
- }
+ if (open->op_openowner)
+ nfs4_put_stateowner(&open->op_openowner->oo_owner);
if (open->op_file)
kmem_cache_free(file_slab, open->op_file);
if (open->op_stp)
--
2.43.0


2024-03-04 22:47:55

by NeilBrown

Subject: [PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in move_to_close_lru()

move_to_close_lru() waits for sc_count to become zero while holding
rp_mutex. This can deadlock if another thread holds a reference and is
waiting for rp_mutex.

By the time we get to move_to_close_lru() the openowner is unhashed and
cannot be found any more. So code waiting for the mutex can safely
retry the lookup if move_to_close_lru() has started.

So change rp_mutex to an atomic_t with three states:

RP_UNLOCKED - state is still hashed, not locked for reply
RP_LOCKED - state is still hashed, is locked for reply
RP_UNHASHED - state is not hashed, no code can get a lock.

Use wait_var_event() to wait for either a lock, or for the owner to be
unhashed. In the latter case, retry the lookup.
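The state machine above can be modelled in userspace C. This is a sketch only, not kernel code: the kernel version pairs the cmpxchg with wait_var_event()/wake_up_var() to sleep and wake, which a userspace model cannot reproduce, so the "would block" case is reported as a return value instead. Names other than the RP_* states are illustrative.

```c
#include <stdatomic.h>

/* Userspace model of the three-state replay lock. */
enum rp_lock { RP_UNLOCKED, RP_LOCKED, RP_UNHASHED };

/* Try to take the lock.  Returns:
 *   1  - lock acquired (state was RP_UNLOCKED)
 *   0  - already RP_LOCKED; the kernel caller sleeps in wait_var_event()
 *  -1  - owner unhashed; caller must retry the lookup (the -EAGAIN case)
 */
static int rp_try_lock(atomic_int *rp_locked)
{
	int expected = RP_UNLOCKED;

	if (atomic_compare_exchange_strong(rp_locked, &expected, RP_LOCKED))
		return 1;
	if (expected == RP_UNHASHED)
		return -1;
	return 0;
}

static void rp_unlock(atomic_int *rp_locked)
{
	/* nfsd4_cstate_clear_replay(): in the kernel this is
	 * followed by wake_up_var() to wake any waiters. */
	atomic_store(rp_locked, RP_UNLOCKED);
}

static void rp_unhash(atomic_int *rp_locked)
{
	/* move_to_close_lru(): abort all waiters permanently;
	 * the state can never be locked for reply again. */
	atomic_store(rp_locked, RP_UNHASHED);
}
```

The key property is that RP_UNHASHED is terminal: once move_to_close_lru() has set it, every waiter's cmpxchg fails with that value, so waiters stop waiting and retry the lookup rather than blocking forever.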

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 43 ++++++++++++++++++++++++++++++++++++-------
fs/nfsd/state.h | 2 +-
2 files changed, 37 insertions(+), 8 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 0c1058e8cc4b..52838db0c46e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -4639,21 +4639,37 @@ nfsd4_init_leases_net(struct nfsd_net *nn)
atomic_set(&nn->nfsd_courtesy_clients, 0);
}

+enum rp_lock {
+ RP_UNLOCKED,
+ RP_LOCKED,
+ RP_UNHASHED,
+};
+
static void init_nfs4_replay(struct nfs4_replay *rp)
{
rp->rp_status = nfserr_serverfault;
rp->rp_buflen = 0;
rp->rp_buf = rp->rp_ibuf;
- mutex_init(&rp->rp_mutex);
+ atomic_set(&rp->rp_locked, RP_UNLOCKED);
}

-static void nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
- struct nfs4_stateowner *so)
+static int nfsd4_cstate_assign_replay(struct nfsd4_compound_state *cstate,
+ struct nfs4_stateowner *so)
{
if (!nfsd4_has_session(cstate)) {
- mutex_lock(&so->so_replay.rp_mutex);
+ /*
+ * rp_locked is an open-coded mutex which is unlocked in
+ * nfsd4_cstate_clear_replay() or aborted in
+ * move_to_close_lru().
+ */
+ wait_var_event(&so->so_replay.rp_locked,
+ atomic_cmpxchg(&so->so_replay.rp_locked,
+ RP_UNLOCKED, RP_LOCKED) != RP_LOCKED);
+ if (atomic_read(&so->so_replay.rp_locked) == RP_UNHASHED)
+ return -EAGAIN;
cstate->replay_owner = nfs4_get_stateowner(so);
}
+ return 0;
}

void nfsd4_cstate_clear_replay(struct nfsd4_compound_state *cstate)
@@ -4662,7 +4678,8 @@ void nfsd4_cstate_clear_replay(struct nfsd4_compound_state *cstate)

if (so != NULL) {
cstate->replay_owner = NULL;
- mutex_unlock(&so->so_replay.rp_mutex);
+ atomic_set(&so->so_replay.rp_locked, RP_UNLOCKED);
+ wake_up_var(&so->so_replay.rp_locked);
nfs4_put_stateowner(so);
}
}
@@ -4958,7 +4975,11 @@ move_to_close_lru(struct nfs4_ol_stateid *s, struct net *net)
* Wait for the refcount to drop to 2. Since it has been unhashed,
* there should be no danger of the refcount going back up again at
* this point.
+ * Some threads with a reference might be waiting for rp_locked,
+ * so tell them to stop waiting.
*/
+ atomic_set(&oo->oo_owner.so_replay.rp_locked, RP_UNHASHED);
+ wake_up_var(&oo->oo_owner.so_replay.rp_locked);
wait_event(close_wq, refcount_read(&s->st_stid.sc_count) == 2);

release_all_access(s);
@@ -5331,11 +5352,15 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate,
clp = cstate->clp;

strhashval = ownerstr_hashval(&open->op_owner);
+retry:
oo = find_or_alloc_open_stateowner(strhashval, open, cstate);
open->op_openowner = oo;
if (!oo)
return nfserr_jukebox;
- nfsd4_cstate_assign_replay(cstate, &oo->oo_owner);
+ if (nfsd4_cstate_assign_replay(cstate, &oo->oo_owner) == -EAGAIN) {
+ nfs4_put_stateowner(&oo->oo_owner);
+ goto retry;
+ }
status = nfsd4_check_seqid(cstate, &oo->oo_owner, open->op_seqid);
if (status)
return status;
@@ -7175,12 +7200,16 @@ nfs4_preprocess_seqid_op(struct nfsd4_compound_state *cstate, u32 seqid,
trace_nfsd_preprocess(seqid, stateid);

*stpp = NULL;
+retry:
status = nfsd4_lookup_stateid(cstate, stateid,
typemask, statusmask, &s, nn);
if (status)
return status;
stp = openlockstateid(s);
- nfsd4_cstate_assign_replay(cstate, stp->st_stateowner);
+ if (nfsd4_cstate_assign_replay(cstate, stp->st_stateowner) == -EAGAIN) {
+ nfs4_put_stateowner(stp->st_stateowner);
+ goto retry;
+ }

status = nfs4_seqid_op_checks(cstate, stateid, seqid, stp);
if (!status)
diff --git a/fs/nfsd/state.h b/fs/nfsd/state.h
index 01c6f3445646..0400441c87c1 100644
--- a/fs/nfsd/state.h
+++ b/fs/nfsd/state.h
@@ -486,7 +486,7 @@ struct nfs4_replay {
unsigned int rp_buflen;
char *rp_buf;
struct knfsd_fh rp_openfh;
- struct mutex rp_mutex;
+ atomic_t rp_locked;
char rp_ibuf[NFSD4_REPLAY_ISIZE];
};

--
2.43.0


2024-03-04 22:48:06

by NeilBrown

Subject: [PATCH 2/4] nfsd: perform all find_openstateowner_str calls in the one place.

Currently find_openstateowner_str lookups are done both in
nfsd4_process_open1() and alloc_init_open_stateowner() - the latter
possibly being a surprise based on its name.

It would be easier to follow, and more conformant to common patterns, if
the lookup was all in the one place.

So replace alloc_init_open_stateowner() with
find_or_alloc_open_stateowner() and use the latter in
nfsd4_process_open1() without any calls to find_openstateowner_str().

This means all lookups use find_openstateowner_str_locked(), and
find_openstateowner_str() is no longer needed. So discard
find_openstateowner_str() and rename find_openstateowner_str_locked() to
find_openstateowner_str().
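The find-or-alloc loop this patch introduces follows a common pattern: look up under the lock; if not found, drop the lock, allocate, and retry, inserting the new object only if the lookup still fails. A minimal userspace sketch of the pattern, with a pthread mutex and a plain linked list standing in for cl_lock and the owner hash (all names here are illustrative, not nfsd API):

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct owner {
	char name[32];
	struct owner *next;
};

static struct owner *owners;
static pthread_mutex_t owners_lock = PTHREAD_MUTEX_INITIALIZER;

/* Must be called with owners_lock held (the _locked variant). */
static struct owner *find_owner_locked(const char *name)
{
	for (struct owner *o = owners; o; o = o->next)
		if (strcmp(o->name, name) == 0)
			return o;
	return NULL;
}

static struct owner *find_or_alloc_owner(const char *name)
{
	struct owner *o, *new = NULL;

	while (1) {
		pthread_mutex_lock(&owners_lock);
		o = find_owner_locked(name);
		if (o) {
			pthread_mutex_unlock(&owners_lock);
			/* Lost the race: discard our speculative alloc. */
			free(new);
			return o;
		}
		if (new) {
			/* Still absent on the second pass: insert ours
			 * (the hash_openowner() step in the patch). */
			new->next = owners;
			owners = new;
			pthread_mutex_unlock(&owners_lock);
			return new;
		}
		pthread_mutex_unlock(&owners_lock);

		/* Allocate outside the lock, then retry the lookup. */
		new = calloc(1, sizeof(*new));
		if (!new)
			return NULL;
		strncpy(new->name, name, sizeof(new->name) - 1);
	}
}
```

Allocating outside the lock keeps the critical section short; the cost is that a racing inserter can win, which is why the freshly allocated object must be freed rather than inserted when the retried lookup succeeds.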

Signed-off-by: NeilBrown <[email protected]>
---
fs/nfsd/nfs4state.c | 93 +++++++++++++++++++--------------------------
1 file changed, 40 insertions(+), 53 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index b9b3a2601675..0c1058e8cc4b 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -541,7 +541,7 @@ same_owner_str(struct nfs4_stateowner *sop, struct xdr_netobj *owner)
}

static struct nfs4_openowner *
-find_openstateowner_str_locked(unsigned int hashval, struct nfsd4_open *open,
+find_openstateowner_str(unsigned int hashval, struct nfsd4_open *open,
struct nfs4_client *clp)
{
struct nfs4_stateowner *so;
@@ -558,18 +558,6 @@ find_openstateowner_str_locked(unsigned int hashval, struct nfsd4_open *open,
return NULL;
}

-static struct nfs4_openowner *
-find_openstateowner_str(unsigned int hashval, struct nfsd4_open *open,
- struct nfs4_client *clp)
-{
- struct nfs4_openowner *oo;
-
- spin_lock(&clp->cl_lock);
- oo = find_openstateowner_str_locked(hashval, open, clp);
- spin_unlock(&clp->cl_lock);
- return oo;
-}
-
static inline u32
opaque_hashval(const void *ptr, int nbytes)
{
@@ -4855,34 +4843,46 @@ nfsd4_find_and_lock_existing_open(struct nfs4_file *fp, struct nfsd4_open *open)
}

static struct nfs4_openowner *
-alloc_init_open_stateowner(unsigned int strhashval, struct nfsd4_open *open,
- struct nfsd4_compound_state *cstate)
+find_or_alloc_open_stateowner(unsigned int strhashval, struct nfsd4_open *open,
+ struct nfsd4_compound_state *cstate)
{
struct nfs4_client *clp = cstate->clp;
- struct nfs4_openowner *oo, *ret;
+ struct nfs4_openowner *oo, *new = NULL;

- oo = alloc_stateowner(openowner_slab, &open->op_owner, clp);
- if (!oo)
- return NULL;
- oo->oo_owner.so_ops = &openowner_ops;
- oo->oo_owner.so_is_open_owner = 1;
- oo->oo_owner.so_seqid = open->op_seqid;
- oo->oo_flags = 0;
- if (nfsd4_has_session(cstate))
- oo->oo_flags |= NFS4_OO_CONFIRMED;
- oo->oo_time = 0;
- oo->oo_last_closed_stid = NULL;
- INIT_LIST_HEAD(&oo->oo_close_lru);
- spin_lock(&clp->cl_lock);
- ret = find_openstateowner_str_locked(strhashval, open, clp);
- if (ret == NULL) {
- hash_openowner(oo, clp, strhashval);
- ret = oo;
- } else
- nfs4_free_stateowner(&oo->oo_owner);
+ while (1) {
+ spin_lock(&clp->cl_lock);
+ oo = find_openstateowner_str(strhashval, open, clp);
+ if (oo && !(oo->oo_flags & NFS4_OO_CONFIRMED)) {
+ /* Replace unconfirmed owners without checking for replay. */
+ release_openowner(oo);
+ oo = NULL;
+ }
+ if (oo) {
+ spin_unlock(&clp->cl_lock);
+ if (new)
+ nfs4_free_stateowner(&new->oo_owner);
+ return oo;
+ }
+ if (new) {
+ hash_openowner(new, clp, strhashval);
+ spin_unlock(&clp->cl_lock);
+ return new;
+ }
+ spin_unlock(&clp->cl_lock);

- spin_unlock(&clp->cl_lock);
- return ret;
+ new = alloc_stateowner(openowner_slab, &open->op_owner, clp);
+ if (!new)
+ return NULL;
+ new->oo_owner.so_ops = &openowner_ops;
+ new->oo_owner.so_is_open_owner = 1;
+ new->oo_owner.so_seqid = open->op_seqid;
+ new->oo_flags = 0;
+ if (nfsd4_has_session(cstate))
+ new->oo_flags |= NFS4_OO_CONFIRMED;
+ new->oo_time = 0;
+ new->oo_last_closed_stid = NULL;
+ INIT_LIST_HEAD(&new->oo_close_lru);
+ }
}

static struct nfs4_ol_stateid *
@@ -5331,28 +5331,15 @@ nfsd4_process_open1(struct nfsd4_compound_state *cstate,
clp = cstate->clp;

strhashval = ownerstr_hashval(&open->op_owner);
- oo = find_openstateowner_str(strhashval, open, clp);
+ oo = find_or_alloc_open_stateowner(strhashval, open, cstate);
open->op_openowner = oo;
if (!oo)
- goto new_owner;
- if (!(oo->oo_flags & NFS4_OO_CONFIRMED)) {
- /* Replace unconfirmed owners without checking for replay. */
- release_openowner(oo);
- open->op_openowner = NULL;
- goto new_owner;
- }
+ return nfserr_jukebox;
nfsd4_cstate_assign_replay(cstate, &oo->oo_owner);
status = nfsd4_check_seqid(cstate, &oo->oo_owner, open->op_seqid);
if (status)
return status;
- goto alloc_stateid;
-new_owner:
- oo = alloc_init_open_stateowner(strhashval, open, cstate);
- if (oo == NULL)
- return nfserr_jukebox;
- open->op_openowner = oo;
- nfsd4_cstate_assign_replay(cstate, &oo->oo_owner);
-alloc_stateid:
+
open->op_stp = nfs4_alloc_open_stateid(clp);
if (!open->op_stp)
return nfserr_jukebox;
--
2.43.0


2024-03-04 22:48:08

by NeilBrown

Subject: [PATCH 4/4] nfsd: drop st_mutex before calling move_to_close_lru()

move_to_close_lru() is currently called with ->st_mutex held.
This can lead to a deadlock as move_to_close_lru() waits for sc_count to
drop to 2, and some threads holding a reference might be waiting for the
mutex. These references will never be dropped so sc_count will never
reach 2.

There can be no harm in dropping ->st_mutex before
move_to_close_lru() because the only place that takes the mutex is
nfsd4_lock_ol_stateid(), and it quickly aborts if sc_type is
NFS4_CLOSED_STID, which it will be before move_to_close_lru() is called.

See also
https://lore.kernel.org/lkml/[email protected]/T/
where this problem was raised but not successfully resolved.
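The restructuring can be sketched in userspace C. The locked section no longer performs the blocking wait itself; it only decides whether the wait is needed, and the caller performs it after dropping the mutex, so threads that hold a reference and need the mutex can make progress. This is a model of the control-flow change only; all names are illustrative, and an int flag stands in for the call to move_to_close_lru().

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t st_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for nfsd4_close_open_stateid(): previously it called the
 * blocking move_to_close_lru() directly; now it just reports whether
 * the caller must do so after dropping the mutex. */
static bool close_stateid_locked(bool unhashed, bool has_locks)
{
	if (has_locks)
		return false;	/* reaplist path: nothing to defer */
	return unhashed;
}

/* Stand-in for nfsd4_close().  *waited records whether the (blocking)
 * move_to_close_lru() stand-in ran, and crucially it runs only after
 * st_mutex has been released. */
static int do_close(bool unhashed, bool has_locks, int *waited)
{
	bool need_move_to_close_list;

	pthread_mutex_lock(&st_mutex);
	need_move_to_close_list = close_stateid_locked(unhashed, has_locks);
	pthread_mutex_unlock(&st_mutex);

	if (need_move_to_close_list)
		*waited = 1;	/* move_to_close_lru() would block here */
	return 0;
}
```

The pattern generalizes: never perform an unbounded wait while holding a lock that the threads you are waiting on may need; compute the decision under the lock, then wait outside it.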

Signed-off-by: NeilBrown <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Chuck Lever <[email protected]>
---
fs/nfsd/nfs4state.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 52838db0c46e..13cb91d704fd 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -7351,7 +7351,7 @@ nfsd4_open_downgrade(struct svc_rqst *rqstp,
return status;
}

-static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
+static bool nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
{
struct nfs4_client *clp = s->st_stid.sc_client;
bool unhashed;
@@ -7368,11 +7368,11 @@ static void nfsd4_close_open_stateid(struct nfs4_ol_stateid *s)
list_for_each_entry(stp, &reaplist, st_locks)
nfs4_free_cpntf_statelist(clp->net, &stp->st_stid);
free_ol_stateid_reaplist(&reaplist);
+ return false;
} else {
spin_unlock(&clp->cl_lock);
free_ol_stateid_reaplist(&reaplist);
- if (unhashed)
- move_to_close_lru(s, clp->net);
+ return unhashed;
}
}

@@ -7388,6 +7388,7 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
struct nfs4_ol_stateid *stp;
struct net *net = SVC_NET(rqstp);
struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+ bool need_move_to_close_list;

dprintk("NFSD: nfsd4_close on file %pd\n",
cstate->current_fh.fh_dentry);
@@ -7412,8 +7413,10 @@ nfsd4_close(struct svc_rqst *rqstp, struct nfsd4_compound_state *cstate,
*/
nfs4_inc_and_copy_stateid(&close->cl_stateid, &stp->st_stid);

- nfsd4_close_open_stateid(stp);
+ need_move_to_close_list = nfsd4_close_open_stateid(stp);
mutex_unlock(&stp->st_mutex);
+ if (need_move_to_close_list)
+ move_to_close_lru(stp, net);

/* v4.1+ suggests that we send a special stateid in here, since the
* clients should just ignore this anyway. Since this is not useful
--
2.43.0


2024-03-05 15:18:07

by Jeffrey Layton

Subject: Re: [PATCH 0/4 v3] nfsd: fix deadlock in move_to_close_lru()

On Tue, 2024-03-05 at 09:45 +1100, NeilBrown wrote:
> This is very similar to v2 with:
> - first two patches swapped
> - while(1) loop changed to "goto retry"
> - add a comment to patch 3
> - rebase on nfsd-next
> - rb from Jeff added.
>
> NeilBrown
>
> [PATCH 1/4] nfsd: move nfsd4_cstate_assign_replay() earlier in open
> [PATCH 2/4] nfsd: perform all find_openstateowner_str calls in the
> [PATCH 3/4] nfsd: replace rp_mutex to avoid deadlock in
> [PATCH 4/4] nfsd: drop st_mutex_mutex before calling

You can add this to the set. Nice work!

Reviewed-by: Jeff Layton <[email protected]>

2024-03-09 18:57:06

by Chuck Lever

Subject: Re: [PATCH 2/4] nfsd: perform all find_openstateowner_str calls in the one place.

On Tue, Mar 05, 2024 at 09:45:22AM +1100, NeilBrown wrote:
> [...]

While running the NFSv4.0 pynfs tests with KASAN and lock debugging
enabled, I get a number of CPU soft lockup warnings on the test
server and then it wedges hard. I bisected to this patch.

Because we are just a day shy of the v6.9 merge window, I'm going to
drop these four patches for v6.9. We can try them again for v6.10.


--
Chuck Lever

2024-03-10 22:14:01

by NeilBrown

Subject: Re: [PATCH 2/4] nfsd: perform all find_openstateowner_str calls in the one place.

On Sun, 10 Mar 2024, Chuck Lever wrote:
> On Tue, Mar 05, 2024 at 09:45:22AM +1100, NeilBrown wrote:
> > [...]
>
> While running the NFSv4.0 pynfs tests with KASAN and lock debugging
> enabled, I get a number of CPU soft lockup warnings on the test
> server and then it wedges hard. I bisected to this patch.

Almost certainly this is the problem Dan Carpenter found and reported.

>
> Because we are just a day shy of the v6.9 merge window, I'm going to
> drop these four patches for v6.9. We can try them again for v6.10.

I'll post a revised version of this patch - it shouldn't affect the
subsequent patches (though I can resend them too if/when you want).

Up to you when it lands of course.

NeilBrown