2024-01-16 13:52:54

by Connor Smith

[permalink] [raw]
Subject: Possible discrepancy in CREATE_SESSION slot number accounting

Hi,

I noticed what I think might be a discrepancy in the way Linux nfs and nfsd manage CREATE_SESSION reply cache sequence IDs.

In fs/nfs/nfs4proc.c (_nfs4_proc_create_session), the seqid looks to be incremented so long as the status is not one of NFS4ERR_STALE_CLIENTID, NFS4ERR_DELAY, or an RPC timeout. This is mostly due to:

commit b519d408ea32040b1c7e10b155a3ee9a36660947
Author: Trond Myklebust <[email protected]>
Date: Sun Sep 11 14:50:01 2016 -0400

NFSv4.1: Fix the CREATE_SESSION slot number accounting

Ensure that we conform to the algorithm described in RFC5661, section
18.36.4 for when to bump the sequence id. In essence we do it for all
cases except when the RPC call timed out, or in case of the server returning
NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.

(Note that the comment /* Increment the clientid slot sequence id */ not being moved in the above commit has, I think, left it in the wrong place.)

Meanwhile, in fs/nfsd/nfs4state.c (nfsd4_create_session), the seqid looks to be incremented only when the session has been successfully created and the status will be NFS4_OK. This is mostly due to:

commit 86c3e16cc7aace4d1143952813b6cc2a80c51295
Author: J. Bruce Fields <[email protected]>
Date: Sat Oct 2 17:04:00 2010 -0400

nfsd4: confirm only on succesful create_session

Following rfc 5661, section 18.36.4: "If the session is not successfully
created, then no changes are made to any client records on the server."
We shouldn't be confirming or incrementing the sequence id in this case.

My reading of RFC8881 18.36.4 suggests that the former is correct - after the sequence ID has been processed in phase 2...

> [...] the CREATE_SESSION processing goes to the next phase. A subsequent new CREATE_SESSION call over the same client ID MUST use a csa_sequenceid that is one greater than the sequence ID in the slot.

Client ID confirmation is in phase 3, for example, and returns NFS4ERR_CLID_INUSE. Since this is after phase 2, the sequence ID should be incremented on that error code. My understanding is that a "client record" refers specifically to the 5-tuple described in 18.35.4, which does not include the CREATE_SESSION reply cache, so although I think the latter commit was right to move the confirmation of the client record, I'm not sure about the incrementing of the sequence ID.

Apologies if I'm mistaken about something here, but I was confused by what looked to me like a difference in 'slot number accounting'.

Thanks,
Connor


2024-01-16 14:19:57

by Chuck Lever III

[permalink] [raw]
Subject: Re: Possible discrepancy in CREATE_SESSION slot number accounting



> On Jan 16, 2024, at 8:52 AM, Connor Smith <[email protected]> wrote:
>
> Hi,
>
> I noticed what I think might be a discrepancy in the way Linux nfs and nfsd manage CREATE_SESSION reply cache sequence IDs.
>
> In fs/nfs/nfs4proc.c (_nfs4_proc_create_session), the seqid looks to be incremented so long as the status is not one of NFS4ERR_STALE_CLIENTID, NFS4ERR_DELAY, or an RPC timeout. This is mostly due to:
>
> commit b519d408ea32040b1c7e10b155a3ee9a36660947
> Author: Trond Myklebust <[email protected]>
> Date: Sun Sep 11 14:50:01 2016 -0400
>
> NFSv4.1: Fix the CREATE_SESSION slot number accounting
>
> Ensure that we conform to the algorithm described in RFC5661, section
> 18.36.4 for when to bump the sequence id. In essence we do it for all
> cases except when the RPC call timed out, or in case of the server returning
> NFS4ERR_DELAY or NFS4ERR_STALE_CLIENTID.
>
> (Note that the comment /* Increment the clientid slot sequence id */ not being moved in the above commit has, I think, left it in the wrong place.)
>
> Meanwhile, in fs/nfsd/nfs4state.c (nfsd4_create_session), the seqid looks to be incremented only when the session has been successfully created and the status will be NFS4_OK. This is mostly due to:
>
> commit 86c3e16cc7aace4d1143952813b6cc2a80c51295
> Author: J. Bruce Fields <[email protected]>
> Date: Sat Oct 2 17:04:00 2010 -0400
>
> nfsd4: confirm only on succesful create_session
>
> Following rfc 5661, section 18.36.4: "If the session is not successfully
> created, then no changes are made to any client records on the server."
> We shouldn't be confirming or incrementing the sequence id in this case.
>
> My reading of RFC8881 18.36.4 suggests that the former is correct - after the sequence ID has been processed in phase 2...
>
>> [...] the CREATE_SESSION processing goes to the next phase. A subsequent new CREATE_SESSION call over the same client ID MUST use a csa_sequenceid that is one greater than the sequence ID in the slot.
>
> Client ID confirmation is in phase 3, for example, and returns NFS4ERR_CLID_INUSE. Since this is after phase 2, the sequence ID should be incremented on that error code. My understanding is that a "client record" refers specifically to the 5-tuple described in 18.35.4, which does not include the CREATE_SESSION reply cache, so although I think the latter commit was right to move the confirmation of the client record, I'm not sure about the incrementing of the sequence ID.
>
> Apologies if I'm mistaken about something here, but I was confused by what looked to me like a difference in 'slot number accounting'.

Hello Connor -

On first blush, your interpretation of S18.36.4 looks correct
to me. We need to study this further and also look into how
pynfs treats CREATE_SESSION retransmits. Please file a bug
report here:

https://bugzilla.kernel.org/

under File Systems / NFS

--
Chuck Lever


2024-01-16 14:39:35

by Connor Smith

[permalink] [raw]
Subject: RE: Possible discrepancy in CREATE_SESSION slot number accounting

> On first blush, your interpretation of S18.36.4 looks correct
> to me. We need to study this further and also look into how
> pynfs treats CREATE_SESSION retransmits. Please file a bug
> report here:

Thanks Chuck - I've raised https://bugzilla.kernel.org/show_bug.cgi?id=218382

Thanks,
Connor