2023-08-04 15:10:45

by Benjamin Coddington

Subject: [PATCH] nfsd: Fix race to FREE_STATEID and cl_revoked

We have some reports of Linux NFS clients that cannot satisfy a Linux knfsd
server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
those clients repeatedly walk all their known state using TEST_STATEID and
receive NFS4_OK for all.

It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
cl_revoked. This would produce the observed client/server effect.

Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
and the move to cl_revoked happen within the same cl_lock. This will allow
nfsd4_free_stateid() to properly remove the delegation from cl_revoked.

Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
Signed-off-by: Benjamin Coddington <[email protected]>
---
fs/nfsd/nfs4state.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 3aefbad4cc09..daf305daa751 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
 	trace_nfsd_stid_revoke(&dp->dl_stid);

 	if (clp->cl_minorversion) {
+		spin_lock(&clp->cl_lock);
 		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
 		refcount_inc(&dp->dl_stid.sc_count);
-		spin_lock(&clp->cl_lock);
 		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
 		spin_unlock(&clp->cl_lock);
 	}
--
2.40.1
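
To illustrate the ordering problem described in the patch, here is a minimal single-threaded C model of the interleaving. It is only a sketch, not kernel code: `struct model`, `try_free_stateid()` and `stuck_on_cl_revoked()` are hypothetical names, the spinlock is modeled as a boolean, and it assumes that the FREE_STATEID path takes cl_lock and unlinks a delegation whose type is already revoked.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the nfsd state; not the kernel definitions. */
enum stid_type { DELEG_STID, REVOKED_DELEG_STID };

struct model {
	bool cl_lock_held;      /* models clp->cl_lock */
	enum stid_type sc_type; /* models dp->dl_stid.sc_type */
	bool on_cl_revoked;     /* models dp being linked on clp->cl_revoked */
};

/*
 * Models the FREE_STATEID path (assumption: it needs cl_lock and unlinks
 * a delegation that is already marked revoked).
 * Returns false if the caller would have to wait for cl_lock.
 */
static bool try_free_stateid(struct model *m)
{
	if (m->cl_lock_held)
		return false;                /* would spin on cl_lock */
	if (m->sc_type == REVOKED_DELEG_STID)
		m->on_cl_revoked = false;    /* unlink; no-op if not linked yet */
	return true;
}

/*
 * Runs the revoke path with FREE_STATEID arriving right after sc_type
 * is set. Returns true if the delegation is left on cl_revoked.
 */
static bool stuck_on_cl_revoked(bool fixed)
{
	struct model m = { .sc_type = DELEG_STID };
	bool freed;

	if (fixed)
		m.cl_lock_held = true;       /* spin_lock() before the type change */
	m.sc_type = REVOKED_DELEG_STID;

	freed = try_free_stateid(&m);    /* the racing FREE_STATEID */

	if (!fixed)
		m.cl_lock_held = true;       /* original position of spin_lock() */
	m.on_cl_revoked = true;          /* list_add(..., &clp->cl_revoked) */
	m.cl_lock_held = false;          /* spin_unlock() */

	if (!freed)
		freed = try_free_stateid(&m); /* waiter gets cl_lock after unlock */
	assert(freed);
	return m.on_cl_revoked;
}
```

Under these assumptions, stuck_on_cl_revoked(false) models the pre-patch order and ends with the delegation still linked on cl_revoked, while stuck_on_cl_revoked(true) models the patched order and ends with it unlinked.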



2023-08-04 15:14:55

by Chuck Lever

Subject: Re: [PATCH] nfsd: Fix race to FREE_STATEID and cl_revoked

On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
> We have some reports of Linux NFS clients that cannot satisfy a Linux knfsd
> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
> those clients repeatedly walk all their known state using TEST_STATEID and
> receive NFS4_OK for all.
>
> It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
> FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
> cl_revoked. This would produce the observed client/server effect.
>
> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
> and the move to cl_revoked happen within the same cl_lock. This will allow
> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
> Signed-off-by: Benjamin Coddington <[email protected]>

Hi Ben, does this fix deserve:

Cc: [email protected] # v4.17+

??

> ---
> fs/nfsd/nfs4state.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3aefbad4cc09..daf305daa751 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>  	trace_nfsd_stid_revoke(&dp->dl_stid);
>
>  	if (clp->cl_minorversion) {
> +		spin_lock(&clp->cl_lock);
>  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
>  		refcount_inc(&dp->dl_stid.sc_count);
> -		spin_lock(&clp->cl_lock);
>  		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
>  		spin_unlock(&clp->cl_lock);
>  	}
> --
> 2.40.1
>

--
Chuck Lever

2023-08-04 15:26:45

by Benjamin Coddington

Subject: Re: [PATCH] nfsd: Fix race to FREE_STATEID and cl_revoked

On 4 Aug 2023, at 11:02, Chuck Lever wrote:

> On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
>> We have some reports of Linux NFS clients that cannot satisfy a Linux knfsd
>> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
>> those clients repeatedly walk all their known state using TEST_STATEID and
>> receive NFS4_OK for all.
>>
>> It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
>> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
>> FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
>> cl_revoked. This would produce the observed client/server effect.
>>
>> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
>> and the move to cl_revoked happen within the same cl_lock. This will allow
>> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
>>
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
>> Signed-off-by: Benjamin Coddington <[email protected]>
>
> Hi Ben, does this fix deserve:
>
> Cc: [email protected] # v4.17+

Yes, that's probably appropriate.

Ben


2023-08-04 15:55:27

by Jeffrey Layton

Subject: Re: [PATCH] nfsd: Fix race to FREE_STATEID and cl_revoked

On Fri, 2023-08-04 at 11:02 -0400, Chuck Lever wrote:
> On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
> > We have some reports of Linux NFS clients that cannot satisfy a Linux knfsd
> > server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
> > those clients repeatedly walk all their known state using TEST_STATEID and
> > receive NFS4_OK for all.
> >
> > It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
> > nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
> > FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
> > cl_revoked. This would produce the observed client/server effect.
> >
> > Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
> > and the move to cl_revoked happen within the same cl_lock. This will allow
> > nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
> >
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
> > Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
> > Signed-off-by: Benjamin Coddington <[email protected]>
>
> Hi Ben, does this fix deserve:
>
> Cc: [email protected] # v4.17+
>
> ??
>

What's special about v4.17? Is there a patch that broke it that went
into that release?

> > ---
> > fs/nfsd/nfs4state.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > index 3aefbad4cc09..daf305daa751 100644
> > --- a/fs/nfsd/nfs4state.c
> > +++ b/fs/nfsd/nfs4state.c
> > @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
> >  	trace_nfsd_stid_revoke(&dp->dl_stid);
> >
> >  	if (clp->cl_minorversion) {
> > +		spin_lock(&clp->cl_lock);
> >  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
> >  		refcount_inc(&dp->dl_stid.sc_count);
> > -		spin_lock(&clp->cl_lock);
> >  		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
> >  		spin_unlock(&clp->cl_lock);
> >  	}
> > --
> > 2.40.1
> >
>

The fix looks correct though. You can also add:

Reviewed-by: Jeff Layton <[email protected]>

2023-08-04 15:55:56

by Benjamin Coddington

Subject: Re: [PATCH] nfsd: Fix race to FREE_STATEID and cl_revoked

On 4 Aug 2023, at 11:33, Jeff Layton wrote:

> On Fri, 2023-08-04 at 11:02 -0400, Chuck Lever wrote:
>> On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
>>> We have some reports of Linux NFS clients that cannot satisfy a Linux knfsd
>>> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
>>> those clients repeatedly walk all their known state using TEST_STATEID and
>>> receive NFS4_OK for all.
>>>
>>> It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
>>> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
>>> FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
>>> cl_revoked. This would produce the observed client/server effect.
>>>
>>> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
>>> and the move to cl_revoked happen within the same cl_lock. This will allow
>>> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
>>>
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
>>> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
>>> Signed-off-by: Benjamin Coddington <[email protected]>
>>
>> Hi Ben, does this fix deserve:
>>
>> Cc: [email protected] # v4.17+
>>
>> ??
>>
>
> What's special about v4.17? Is there a patch that broke it that went
> into that release?

Before 0af6e690f0d4e the patch won't apply.

Ben


2023-08-04 16:00:03

by Chuck Lever

Subject: Re: [PATCH] nfsd: Fix race to FREE_STATEID and cl_revoked

On Fri, Aug 04, 2023 at 10:52:20AM -0400, Benjamin Coddington wrote:
> We have some reports of Linux NFS clients that cannot satisfy a Linux knfsd
> server that always sets SEQ4_STATUS_RECALLABLE_STATE_REVOKED even though
> those clients repeatedly walk all their known state using TEST_STATEID and
> receive NFS4_OK for all.
>
> It's possible for revoke_delegation() to set NFS4_REVOKED_DELEG_STID, then
> nfsd4_free_stateid() finds the delegation and returns NFS4_OK to
> FREE_STATEID. Afterward, revoke_delegation() moves the same delegation to
> cl_revoked. This would produce the observed client/server effect.
>
> Fix this by ensuring that the setting of sc_type to NFS4_REVOKED_DELEG_STID
> and the move to cl_revoked happen within the same cl_lock. This will allow
> nfsd4_free_stateid() to properly remove the delegation from cl_revoked.
>
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2217103
> Link: https://bugzilla.redhat.com/show_bug.cgi?id=2176575
> Signed-off-by: Benjamin Coddington <[email protected]>

Applied to nfsd-fixes (for v6.5-rc).


> ---
> fs/nfsd/nfs4state.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> index 3aefbad4cc09..daf305daa751 100644
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -1354,9 +1354,9 @@ static void revoke_delegation(struct nfs4_delegation *dp)
>  	trace_nfsd_stid_revoke(&dp->dl_stid);
>
>  	if (clp->cl_minorversion) {
> +		spin_lock(&clp->cl_lock);
>  		dp->dl_stid.sc_type = NFS4_REVOKED_DELEG_STID;
>  		refcount_inc(&dp->dl_stid.sc_count);
> -		spin_lock(&clp->cl_lock);
>  		list_add(&dp->dl_recall_lru, &clp->cl_revoked);
>  		spin_unlock(&clp->cl_lock);
>  	}
> --
> 2.40.1
>

--
Chuck Lever