2024-04-05 18:07:55

by Chuck Lever

Subject: Re: [PATCH v2] nfsd: hold a lighter-weight client reference over CB_RECALL_ANY

On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> client. While a callback job is technically an RPC that counter is
> really more for client-driven RPCs, and this has the effect of
> preventing the client from being unhashed until the callback completes.
>
> If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> can end up in a situation where the callback can't complete on the (now
> dead) callback channel, but the new client can't connect because the old
> client can't be unhashed. This usually manifests as a NFS4ERR_DELAY
> return on the CREATE_SESSION operation.
>
> The job is only holding a reference to the client so it can clear a flag
> after the RPC completes. Fix this by having CB_RECALL_ANY instead
> hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> sort of reference when dealing with the nfsdfs info files, but it should
> work appropriately here to ensure that the nfs4_client doesn't
> disappear.
>
> Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> Reported-by: Vladimir Benes <[email protected]>
> Signed-off-by: Jeff Layton <[email protected]>

Applied to nfsd-fixes while waiting for review and testing. Thanks!


> ---
> Changes in v2:
> - Clean up the changelog
> - Add Fixes: tag
> - Use kref_get instead of kref_get_unless_zero
> ---
> fs/nfsd/nfs4state.c | 7 ++-----
> 1 file changed, 2 insertions(+), 5 deletions(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 5fcd93f7cb8c..3cef81e196c6 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -3042,12 +3042,9 @@ static void
>  nfsd4_cb_recall_any_release(struct nfsd4_callback *cb)
>  {
>  	struct nfs4_client *clp = cb->cb_clp;
> -	struct nfsd_net *nn = net_generic(clp->net, nfsd_net_id);
>
> -	spin_lock(&nn->client_lock);
>  	clear_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
> -	put_client_renew_locked(clp);
> -	spin_unlock(&nn->client_lock);
> +	drop_client(clp);
>  }
>
> static int
> @@ -6616,7 +6613,7 @@ deleg_reaper(struct nfsd_net *nn)
>  		list_add(&clp->cl_ra_cblist, &cblist);
>
>  		/* release in nfsd4_cb_recall_any_release */
> -		atomic_inc(&clp->cl_rpc_users);
> +		kref_get(&clp->cl_nfsdfs.cl_ref);
>  		set_bit(NFSD4_CLIENT_CB_RECALL_ANY, &clp->cl_flags);
>  		clp->cl_ra_time = ktime_get_boottime_seconds();
>  	}
>
> ---
> base-commit: 05258a0a69b3c5d2c003f818702c0a52b6fea861
> change-id: 20240405-rhel-31513-028ab6f14252
>
> Best regards,
> --
> Jeff Layton <[email protected]>
>
>

--
Chuck Lever
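
[Editor's sketch, not part of the original message.] The changelog above
distinguishes two kinds of reference: one that keeps the client from being
unhashed, and a lighter one that only keeps the structure from being freed.
The following is a minimal userspace model of that distinction, offered as
an illustration only. Every name in it (model_client, model_try_unhash,
model_cb_start_old, and so on) is hypothetical and does not exist in nfsd;
the assumption that the hash table owns one lifetime reference is also part
of the model, not a statement about the kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not the nfsd data structures. Only the
 * two counters and the flag discussed in the patch are represented.
 */
struct model_client {
	int rpc_users;           /* stands in for cl_rpc_users */
	int lifetime_refs;       /* stands in for cl_nfsdfs.cl_ref */
	bool hashed;             /* still findable by new requests */
	bool freed;              /* memory would have been released */
	bool recall_any_pending; /* stands in for NFSD4_CLIENT_CB_RECALL_ANY */
};

static void model_client_init(struct model_client *c)
{
	c->rpc_users = 0;
	c->lifetime_refs = 1; /* assumed: one reference owned by the hash table */
	c->hashed = true;
	c->freed = false;
	c->recall_any_pending = false;
}

/* Drop one lifetime reference; the last put marks the client as freed. */
static void model_put_lifetime(struct model_client *c)
{
	assert(c->lifetime_refs > 0);
	if (--c->lifetime_refs == 0)
		c->freed = true;
}

/*
 * Try to unhash the client. The model refuses while rpc_users is nonzero,
 * which is the situation the changelog says surfaces as NFS4ERR_DELAY.
 */
static bool model_try_unhash(struct model_client *c)
{
	if (c->rpc_users > 0)
		return false;
	if (c->hashed) {
		c->hashed = false;
		model_put_lifetime(c); /* give up the hash table's reference */
	}
	return true;
}

/* Before the patch: the callback pins the client via rpc_users. */
static void model_cb_start_old(struct model_client *c)
{
	c->rpc_users++;
	c->recall_any_pending = true;
}

static void model_cb_release_old(struct model_client *c)
{
	c->recall_any_pending = false;
	c->rpc_users--;
}

/* After the patch: the callback only holds a lifetime reference. */
static void model_cb_start_new(struct model_client *c)
{
	c->lifetime_refs++;
	c->recall_any_pending = true;
}

static void model_cb_release_new(struct model_client *c)
{
	c->recall_any_pending = false;
	model_put_lifetime(c);
}
```

In this model, calling model_try_unhash() between model_cb_start_old() and
model_cb_release_old() fails, whereas between model_cb_start_new() and
model_cb_release_new() it succeeds and the structure stays valid until the
release function clears the flag and drops the last reference.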


2024-04-06 06:07:44

by Cedric Blancher

Subject: Re: [PATCH v2] nfsd: hold a lighter-weight client reference over CB_RECALL_ANY

On Fri, 5 Apr 2024 at 20:07, Chuck Lever <[email protected]> wrote:
>
> On Fri, Apr 05, 2024 at 01:56:18PM -0400, Jeff Layton wrote:
> > Currently the CB_RECALL_ANY job takes a cl_rpc_users reference to the
> > client. While a callback job is technically an RPC that counter is
> > really more for client-driven RPCs, and this has the effect of
> > preventing the client from being unhashed until the callback completes.
> >
> > If nfsd decides to send a CB_RECALL_ANY just as the client reboots, we
> > can end up in a situation where the callback can't complete on the (now
> > dead) callback channel, but the new client can't connect because the old
> > client can't be unhashed. This usually manifests as a NFS4ERR_DELAY
> > return on the CREATE_SESSION operation.
> >
> > The job is only holding a reference to the client so it can clear a flag
> > after the RPC completes. Fix this by having CB_RECALL_ANY instead
> > hold a reference to the cl_nfsdfs.cl_ref. Typically we only take that
> > sort of reference when dealing with the nfsdfs info files, but it should
> > work appropriately here to ensure that the nfs4_client doesn't
> > disappear.
> >
> > Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> > Reported-by: Vladimir Benes <[email protected]>
> > Signed-off-by: Jeff Layton <[email protected]>
>
> Applied to nfsd-fixes while waiting for review and testing. Thanks!

Please add this to the 6.6 LTS branch, too.

Ced
--
Cedric Blancher <[email protected]>
[https://plus.google.com/u/0/+CedricBlancher/]
Institute Pasteur