2023-09-17 23:12:44

by Trond Myklebust

[permalink] [raw]
Subject: [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race

From: Trond Myklebust <[email protected]>

If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
nfs4_schedule_state_manager(), and are responsible for handling the
recovery situation.

Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
Signed-off-by: Trond Myklebust <[email protected]>
---
fs/nfs/nfs4state.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index e079987af4a3..0bc160fbabec 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct nfs_client *clp)
nfs4_end_drain_session(clp);
nfs4_clear_state_manager_bit(clp);

+ if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) &&
+ !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
+ &clp->cl_state)) {
+ memflags = memalloc_nofs_save();
+ continue;
+ }
+
if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state)) {
if (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
nfs_client_return_marked_delegations(clp);
--
2.41.0


2023-09-18 02:39:56

by Trond Myklebust

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race

On Mon, 2023-09-18 at 11:17 +1000, NeilBrown wrote:
> On Mon, 18 Sep 2023, [email protected] wrote:
> > From: Trond Myklebust <[email protected]>
> >
> > If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
> > NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
> > nfs4_schedule_state_manager(), and are responsible for handling the
> > recovery situation.
> >
> > Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> > Signed-off-by: Trond Myklebust <[email protected]>
> > ---
> >  fs/nfs/nfs4state.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> > index e079987af4a3..0bc160fbabec 100644
> > --- a/fs/nfs/nfs4state.c
> > +++ b/fs/nfs/nfs4state.c
> > @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct
> > nfs_client *clp)
> >                 nfs4_end_drain_session(clp);
> >                 nfs4_clear_state_manager_bit(clp);
> >  
> > +               if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state)
> > &&
> > +                   !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
> > +                                     &clp->cl_state)) {
> > +                       memflags = memalloc_nofs_save();
> > +                       continue;
> > +               }
> > +
>
> I cannot see what race this closes.
> When we cleared MANAGER_RUNNING, the only possible consequence is
> that
> nfs4_wait_clnt_recover could have woken up.  This leads to
> nfs4_schedule_state_manager() being run, which sets RUN_MANAGER
> whether
> it was set or not.
>
> I understand that there are problems with MANAGER_AVAILABLE which
> your
> next patch addresses, but I cannot see what this one actually fixes.
> Can you help me?
>

If NFS4CLNT_RUN_MANAGER gets set while we're handling the reboot
recovery or network partition, then NFS4CLNT_MANAGER_RUNNING will be
set, so nfs4_schedule_state_manager() will just exit rather than start
a new thread. If we don't catch that situation before we start handling
the asynchronous delegation returns, then we can deadlock.

If, OTOH, nfs4_schedule_state_manager() runs after we've cleared
NFS4CLNT_MANAGER_RUNNING, then we should be OK (assuming both patches
are applied).

Cheers
Trond

> Thanks,
> NeilBrown
>
>
>
> >                 if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING,
> > &clp->cl_state)) {
> >                         if
> > (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
> >                                 nfs_client_return_marked_delegation
> > s(clp);
> > --
> > 2.41.0
> >
> >
>

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2023-09-18 03:31:31

by NeilBrown

[permalink] [raw]
Subject: Re: [PATCH 1/2] NFSv4: Fix a nfs4_state_manager() race

On Mon, 18 Sep 2023, [email protected] wrote:
> From: Trond Myklebust <[email protected]>
>
> If the NFS4CLNT_RUN_MANAGER flag got set just before we cleared
> NFS4CLNT_MANAGER_RUNNING, then we might have won the race against
> nfs4_schedule_state_manager(), and are responsible for handling the
> recovery situation.
>
> Fixes: aeabb3c96186 ("NFSv4: Fix a NFSv4 state manager deadlock")
> Signed-off-by: Trond Myklebust <[email protected]>
> ---
> fs/nfs/nfs4state.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
> index e079987af4a3..0bc160fbabec 100644
> --- a/fs/nfs/nfs4state.c
> +++ b/fs/nfs/nfs4state.c
> @@ -2703,6 +2703,13 @@ static void nfs4_state_manager(struct nfs_client *clp)
> nfs4_end_drain_session(clp);
> nfs4_clear_state_manager_bit(clp);
>
> + if (test_bit(NFS4CLNT_RUN_MANAGER, &clp->cl_state) &&
> + !test_and_set_bit(NFS4CLNT_MANAGER_RUNNING,
> + &clp->cl_state)) {
> + memflags = memalloc_nofs_save();
> + continue;
> + }
> +

I cannot see what race this closes.
When we cleared MANAGER_RUNNING, the only possible consequence is that
nfs4_wait_clnt_recover could have woken up. This leads to
nfs4_schedule_state_manager() being run, which sets RUN_MANAGER whether
it was set or not.

I understand that there are problems with MANAGER_AVAILABLE which your
next patch addresses, but I cannot see what this one actually fixes.
Can you help me?

Thanks,
NeilBrown



> if (!test_and_set_bit(NFS4CLNT_RECALL_RUNNING, &clp->cl_state)) {
> if (test_and_clear_bit(NFS4CLNT_DELEGRETURN, &clp->cl_state)) {
> nfs_client_return_marked_delegations(clp);
> --
> 2.41.0
>
>