2016-08-24 18:11:13

by Chuck Lever III

Subject: READ during state recovery uses zero stateid

Hi-

I have a wire capture that shows this race while a simple I/O workload is
running:

0. The client reconnects after a network partition
1. The client sends a couple of READ requests
2. The client independently discovers its lease has expired
3. The client establishes a fresh lease
4. The client destroys open, lock, and delegation stateids for the file
that was open under the previous lease
5. The client issues a new OPEN to recover state for that file
6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
7. The client turns the READs around immediately using the current open
stateid for that file, which is the zero stateid
8. The server replies NFS4_OK to the OPEN from step 5

If I understand the code correctly, if the server happened to send those
READ replies after its OPEN reply (rather than before), the client would
have used the recovered open stateid instead of the zero stateid when
resending the READ requests.

Would it be better if the client recognized there is state recovery in
progress, and then waited for recovery to complete, before retrying the
READs?
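
Just to illustrate the idea (this is only a rough userspace sketch, not the
actual client code; all of these names are made up), a retried READ would
block until the state manager signals that recovery is finished:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t recovery_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  recovery_done = PTHREAD_COND_INITIALIZER;
static bool recovery_in_progress;

/* State manager marks the start of lease/state recovery. */
static void recovery_begin(void)
{
	pthread_mutex_lock(&recovery_lock);
	recovery_in_progress = true;
	pthread_mutex_unlock(&recovery_lock);
}

/* State manager marks the end of recovery and wakes any waiters. */
static void recovery_end(void)
{
	pthread_mutex_lock(&recovery_lock);
	recovery_in_progress = false;
	pthread_cond_broadcast(&recovery_done);
	pthread_mutex_unlock(&recovery_lock);
}

/* A READ that got NFS4ERR_EXPIRED waits here before it is resent, so the
 * resend picks up the recovered open stateid instead of the zero stateid. */
static void wait_for_recovery_before_resend(void)
{
	pthread_mutex_lock(&recovery_lock);
	while (recovery_in_progress)
		pthread_cond_wait(&recovery_done, &recovery_lock);
	pthread_mutex_unlock(&recovery_lock);
}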


--
Chuck Lever





2016-08-24 18:23:41

by Trond Myklebust

Subject: Re: READ during state recovery uses zero stateid


> On Aug 24, 2016, at 14:10, Chuck Lever <[email protected]> wrote:
>
> Hi-
>
> I have a wire capture that shows this race while a simple I/O workload is
> running:
>
> 0. The client reconnects after a network partition
> 1. The client sends a couple of READ requests
> 2. The client independently discovers its lease has expired
> 3. The client establishes a fresh lease
> 4. The client destroys open, lock, and delegation stateids for the file
> that was open under the previous lease
> 5. The client issues a new OPEN to recover state for that file
> 6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
> 7. The client turns the READs around immediately using the current open
> stateid for that file, which is the zero stateid
> 8. The server replies NFS4_OK to the OPEN from step 5
>
> If I understand the code correctly, if the server happened to send those
> READ replies after its OPEN reply (rather than before), the client would
> have used the recovered open stateid instead of the zero stateid when
> resending the READ requests.
>
> Would it be better if the client recognized there is state recovery in
> progress, and then waited for recovery to complete, before retrying the
> READs?
>

Why isn't the session draining taking care of ensuring the READs don't
happen until after recovery is done?


2016-08-24 18:47:53

by Chuck Lever III

Subject: Re: READ during state recovery uses zero stateid


> On Aug 24, 2016, at 2:23 PM, Trond Myklebust <[email protected]> wrote:
>
>>
>> On Aug 24, 2016, at 14:10, Chuck Lever <[email protected]> wrote:
>>
>> Hi-
>>
>> I have a wire capture that shows this race while a simple I/O workload is
>> running:
>>
>> 0. The client reconnects after a network partition
>> 1. The client sends a couple of READ requests
>> 2. The client independently discovers its lease has expired
>> 3. The client establishes a fresh lease
>> 4. The client destroys open, lock, and delegation stateids for the file
>> that was open under the previous lease
>> 5. The client issues a new OPEN to recover state for that file
>> 6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
>> 7. The client turns the READs around immediately using the current open
>> stateid for that file, which is the zero stateid
>> 8. The server replies NFS4_OK to the OPEN from step 5
>>
>> If I understand the code correctly, if the server happened to send those
>> READ replies after its OPEN reply (rather than before), the client would
>> have used the recovered open stateid instead of the zero stateid when
>> resending the READ requests.
>>
>> Would it be better if the client recognized there is state recovery in
>> progress, and then waited for recovery to complete, before retrying the
>> READs?
>>
>
> Why isn't the session draining taking care of ensuring the READs don't happen until after recovery is done?

This is NFSv4.0. (Apologies, I recalled NFS4ERR_EXPIRED had been removed
from NFSv4.1, but I see that I was mistaken).

Here's step 1 and 2, exactly. After the partition heals, the client sends:

C READ
C GETATTR
C READ
C RENEW

The server responds to the RENEW first with GSS_CTXPROBLEM. The client's
gssd connects and establishes a fresh GSS context. The client sends the
RENEW again with the fresh context, and the server responds NFS4ERR_EXPIRED.
This triggers step 3.

The replies for those READ calls are in step 6., after state recovery
has started.

--
Chuck Lever




2016-08-24 19:05:27

by Trond Myklebust

Subject: Re: READ during state recovery uses zero stateid


> On Aug 24, 2016, at 14:47, Chuck Lever <[email protected]> wrote:
>
>
>> On Aug 24, 2016, at 2:23 PM, Trond Myklebust <[email protected]> wrote:
>>
>>>
>>> On Aug 24, 2016, at 14:10, Chuck Lever <[email protected]> wrote:
>>>
>>> Hi-
>>>
>>> I have a wire capture that shows this race while a simple I/O workload is
>>> running:
>>>
>>> 0. The client reconnects after a network partition
>>> 1. The client sends a couple of READ requests
>>> 2. The client independently discovers its lease has expired
>>> 3. The client establishes a fresh lease
>>> 4. The client destroys open, lock, and delegation stateids for the file
>>> that was open under the previous lease
>>> 5. The client issues a new OPEN to recover state for that file
>>> 6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
>>> 7. The client turns the READs around immediately using the current open
>>> stateid for that file, which is the zero stateid
>>> 8. The server replies NFS4_OK to the OPEN from step 5
>>>
>>> If I understand the code correctly, if the server happened to send those
>>> READ replies after its OPEN reply (rather than before), the client would
>>> have used the recovered open stateid instead of the zero stateid when
>>> resending the READ requests.
>>>
>>> Would it be better if the client recognized there is state recovery in
>>> progress, and then waited for recovery to complete, before retrying the
>>> READs?
>>>
>>
>> Why isn't the session draining taking care of ensuring the READs don't
>> happen until after recovery is done?
>
> This is NFSv4.0. (Apologies, I recalled NFS4ERR_EXPIRED had been removed
> from NFSv4.1, but I see that I was mistaken).
>
> Here's step 1 and 2, exactly. After the partition heals, the client sends:
>
> C READ
> C GETATTR
> C READ
> C RENEW
>
> The server responds to the RENEW first with GSS_CTXPROBLEM. The client's
> gssd connects and establishes a fresh GSS context. The client sends the
> RENEW again with the fresh context, and the server responds NFS4ERR_EXPIRED.
> This triggers step 3.
>
> The replies for those READ calls are in step 6., after state recovery
> has started.

This is what I'm confused about: Normally, I'd expect the NFSv4.0 code
to drain, due to the checks in nfs40_setup_sequence().

IOW: there should be 2 steps

 2.5) Call nfs4_drain_slot_tbl() and wait for operations to complete
 2.6) Process the NFS4ERR_EXPIRED errors returned by the READ requests sent in (1).

before we get to recovering the lease in (3)...
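
Something like the following, sketched as a standalone program with made-up
helper names (not the actual state manager code), just to show the ordering:

#include <stdio.h>

/* Made-up stand-ins for the real steps; they only print the ordering. */
static void drain_slot_table(void)       { printf("2.5: drain slot table, wait for in-flight ops\n"); }
static void process_expired_errors(void) { printf("2.6: handle NFS4ERR_EXPIRED from the READs\n"); }
static void reestablish_lease(void)      { printf("3:   re-establish the lease\n"); }
static void recover_open_state(void)     { printf("5:   re-OPEN files, recover stateids\n"); }

int main(void)
{
	/* The drain and the error processing come before lease recovery. */
	drain_slot_table();
	process_expired_errors();
	reestablish_lease();
	recover_open_state();
	return 0;
}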


2016-08-24 19:37:27

by Chuck Lever III

Subject: Re: READ during state recovery uses zero stateid


> On Aug 24, 2016, at 3:05 PM, Trond Myklebust <[email protected]> wrote:
>
>>
>> On Aug 24, 2016, at 14:47, Chuck Lever <[email protected]> wrote:
>>
>>
>>> On Aug 24, 2016, at 2:23 PM, Trond Myklebust <[email protected]> wrote:
>>>
>>>>
>>>> On Aug 24, 2016, at 14:10, Chuck Lever <[email protected]> wrote:
>>>>
>>>> Hi-
>>>>
>>>> I have a wire capture that shows this race while a simple I/O workload is
>>>> running:
>>>>
>>>> 0. The client reconnects after a network partition
>>>> 1. The client sends a couple of READ requests
>>>> 2. The client independently discovers its lease has expired
>>>> 3. The client establishes a fresh lease
>>>> 4. The client destroys open, lock, and delegation stateids for the file
>>>> that was open under the previous lease
>>>> 5. The client issues a new OPEN to recover state for that file
>>>> 6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
>>>> 7. The client turns the READs around immediately using the current open
>>>> stateid for that file, which is the zero stateid
>>>> 8. The server replies NFS4_OK to the OPEN from step 5
>>>>
>>>> If I understand the code correctly, if the server happened to send those
>>>> READ replies after its OPEN reply (rather than before), the client would
>>>> have used the recovered open stateid instead of the zero stateid when
>>>> resending the READ requests.
>>>>
>>>> Would it be better if the client recognized there is state recovery in
>>>> progress, and then waited for recovery to complete, before retrying the
>>>> READs?
>>>>
>>>
>>> Why isn't the session draining taking care of ensuring the READs don't happen until after recovery is done?
>>
>> This is NFSv4.0. (Apologies, I recalled NFS4ERR_EXPIRED had been removed
>> from NFSv4.1, but I see that I was mistaken).
>>
>> Here's step 1 and 2, exactly. After the partition heals, the client sends:
>>
>> C READ
>> C GETATTR
>> C READ
>> C RENEW
>>
>> The server responds to the RENEW first with GSS_CTXPROBLEM. The client's
>> gssd connects and establishes a fresh GSS context. The client sends the
>> RENEW again with the fresh context, and the server responds NFS4ERR_EXPIRED.
>> This triggers step 3.
>>
>> The replies for those READ calls are in step 6., after state recovery
>> has started.
>
> This is what I'm confused about: Normally, I'd expect the NFSv4.0 code to drain, due to the checks in nfs40_setup_sequence().
>
> IOW: there should be 2 steps
>
> 2.5) Call nfs4_drain_slot_tbl() and wait for operations to complete
> 2.6) Process the NFS4ERR_EXPIRED errors returned by the READ requests sent in (1).
>
> before we get to recovering the lease in (3)...

My kernel is missing commit 5cae02f42793 ("NFSv4: Always drain the slot
table before re-establishing the lease"). I can give that a try, thank
you!


--
Chuck Lever




2016-08-25 15:33:20

by Chuck Lever III

Subject: Re: READ during state recovery uses zero stateid


> On Aug 24, 2016, at 3:37 PM, Chuck Lever <[email protected]> wrote:
>
>>
>> On Aug 24, 2016, at 3:05 PM, Trond Myklebust <[email protected]> wrote:
>>
>>>
>>> On Aug 24, 2016, at 14:47, Chuck Lever <[email protected]> wrote:
>>>
>>>
>>>> On Aug 24, 2016, at 2:23 PM, Trond Myklebust <[email protected]> wrote:
>>>>
>>>>>
>>>>> On Aug 24, 2016, at 14:10, Chuck Lever <[email protected]> wrote:
>>>>>
>>>>> Hi-
>>>>>
>>>>> I have a wire capture that shows this race while a simple I/O workload is
>>>>> running:
>>>>>
>>>>> 0. The client reconnects after a network partition
>>>>> 1. The client sends a couple of READ requests
>>>>> 2. The client independently discovers its lease has expired
>>>>> 3. The client establishes a fresh lease
>>>>> 4. The client destroys open, lock, and delegation stateids for the file
>>>>> that was open under the previous lease
>>>>> 5. The client issues a new OPEN to recover state for that file
>>>>> 6. The server replies to the READs in step 1. with NFS4ERR_EXPIRED
>>>>> 7. The client turns the READs around immediately using the current open
>>>>> stateid for that file, which is the zero stateid
>>>>> 8. The server replies NFS4_OK to the OPEN from step 5
>>>>>
>>>>> If I understand the code correctly, if the server happened to send those
>>>>> READ replies after its OPEN reply (rather than before), the client would
>>>>> have used the recovered open stateid instead of the zero stateid when
>>>>> resending the READ requests.
>>>>>
>>>>> Would it be better if the client recognized there is state recovery in
>>>>> progress, and then waited for recovery to complete, before retrying the
>>>>> READs?
>>>>>
>>>>
>>>> Why isn't the session draining taking care of ensuring the READs don't happen until after recovery is done?
>>>
>>> This is NFSv4.0. (Apologies, I recalled NFS4ERR_EXPIRED had been removed
>>> from NFSv4.1, but I see that I was mistaken).
>>>
>>> Here's step 1 and 2, exactly. After the partition heals, the client sends:
>>>
>>> C READ
>>> C GETATTR
>>> C READ
>>> C RENEW
>>>
>>> The server responds to the RENEW first with GSS_CTXPROBLEM. The client's
>>> gssd connects and establishes a fresh GSS context. The client sends the
>>> RENEW again with the fresh context, and the server responds NFS4ERR_EXPIRED.
>>> This triggers step 3.
>>>
>>> The replies for those READ calls are in step 6., after state recovery
>>> has started.
>>
>> This is what I'm confused about: Normally, I'd expect the NFSv4.0 code to drain, due to the checks in nfs40_setup_sequence().
>>
>> IOW: there should be 2 steps
>>
>> 2.5) Call nfs4_drain_slot_tbl() and wait for operations to complete
>> 2.6) Process the NFS4ERR_EXPIRED errors returned by the READ requests sent in (1).
>>
>> before we get to recovering the lease in (3)...
>
> My kernel is missing commit 5cae02f42793 ("NFSv4: Always drain the slot
> table before re-establishing the lease"). I can give that a try, thank
> you!

This is v4.1.31, btw.


--
Chuck Lever