2019-02-28 10:10:07

by Jiufei Xue

Subject: [bug report] task hang while testing xfstests generic/323

Hi,

When I tested xfstests generic/323 with NFSv4.1 and v4.2, the task
occasionally turned into a zombie while one of its threads hung with the
following stack:

[<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
[<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
[<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
[<0>] nfs_file_release+0x35/0x50 [nfs]
[<0>] __fput+0xa2/0x1c0
[<0>] task_work_run+0x82/0xa0
[<0>] do_exit+0x2ac/0xc20
[<0>] do_group_exit+0x39/0xa0
[<0>] get_signal+0x1ce/0x5d0
[<0>] do_signal+0x36/0x620
[<0>] exit_to_usermode_loop+0x5e/0xc2
[<0>] do_syscall_64+0x16c/0x190
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[<0>] 0xffffffffffffffff

Since commit 12f275cdd163 ("NFSv4: Retry CLOSE and DELEGRETURN on
NFS4ERR_OLD_STATEID"), the client retries the CLOSE when the stateid
generation number held by the client is lower than the server's.

The original intention of that commit was to retry the operation while
racing with an OPEN. In this case, however, the stateid generations remain
mismatched forever.
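
For reference, the retry that this commit adds to the CLOSE completion path
looks roughly like the following (paraphrased and simplified, not the
verbatim kernel code; the helper name is only illustrative):

        /* In nfs4_close_done(), paraphrased: on NFS4ERR_OLD_STATEID the client
         * assumes it raced with an OPEN, copies the newest open stateid it
         * currently holds into the CLOSE arguments, and resends the CLOSE.
         * If no OPEN reply ever refreshes that stateid, the CLOSE loops. */
        case -NFS4ERR_OLD_STATEID:
                copy_latest_open_stateid(&calldata->arg.stateid, state); /* illustrative helper */
                rpc_restart_call_prepare(task);
                return;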

Any suggestions?

Thanks,
Jiufei


2019-02-28 22:26:48

by Olga Kornievskaia

Subject: Re: [bug report] task hang while testing xfstests generic/323

On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <[email protected]> wrote:
>
> Hi,
>
> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
> changed to zombie occasionally while a thread is hanging with the
> following stack:
>
> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> [<0>] nfs_file_release+0x35/0x50 [nfs]
> [<0>] __fput+0xa2/0x1c0
> [<0>] task_work_run+0x82/0xa0
> [<0>] do_exit+0x2ac/0xc20
> [<0>] do_group_exit+0x39/0xa0
> [<0>] get_signal+0x1ce/0x5d0
> [<0>] do_signal+0x36/0x620
> [<0>] exit_to_usermode_loop+0x5e/0xc2
> [<0>] do_syscall_64+0x16c/0x190
> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [<0>] 0xffffffffffffffff
>
> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
> NFS4ERR_OLD_STATEID), the client will retry to close the file when
> stateid generation number in client is lower than server.
>
> The original intention of this commit is retrying the operation while
> racing with an OPEN. However, in this case the stateid generation remains
> mismatch forever.
>
> Any suggestions?

Can you include a network trace of the failure? Is it possible that
the server has crashed on reply to the close and that's why the task
is hung? What server are you testing against?

I have seen traces where the CLOSE would get ERR_OLD_STATEID and keep
retrying with the same open stateid until a reply to the OPEN arrived and
changed the state; once the client received that reply, it retried the
CLOSE with the updated stateid.

2019-02-28 23:57:00

by Trond Myklebust

Subject: Re: [bug report] task hang while testing xfstests generic/323

On Thu, 2019-02-28 at 17:26 -0500, Olga Kornievskaia wrote:
> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
> [email protected]> wrote:
> > Hi,
> >
> > when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
> > changed to zombie occasionally while a thread is hanging with the
> > following stack:
> >
> > [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> > [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> > [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> > [<0>] nfs_file_release+0x35/0x50 [nfs]
> > [<0>] __fput+0xa2/0x1c0
> > [<0>] task_work_run+0x82/0xa0
> > [<0>] do_exit+0x2ac/0xc20
> > [<0>] do_group_exit+0x39/0xa0
> > [<0>] get_signal+0x1ce/0x5d0
> > [<0>] do_signal+0x36/0x620
> > [<0>] exit_to_usermode_loop+0x5e/0xc2
> > [<0>] do_syscall_64+0x16c/0x190
> > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [<0>] 0xffffffffffffffff
> >
> > Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
> > NFS4ERR_OLD_STATEID), the client will retry to close the file when
> > stateid generation number in client is lower than server.
> >
> > The original intention of this commit is retrying the operation
> > while
> > racing with an OPEN. However, in this case the stateid generation
> > remains
> > mismatch forever.
> >
> > Any suggestions?
>
> Can you include a network trace of the failure? Is it possible that
> the server has crashed on reply to the close and that's why the task
> is hung? What server are you testing against?
>
> I have seen trace where close would get ERR_OLD_STATEID and would
> still retry with the same open state until it got a reply to the OPEN
> which changed the state and when the client received reply to that,
> it'll retry the CLOSE with the updated stateid.

I agree with Olga's assessment. The server is not allowed to randomly
change the values of the seqid, and the client should be taking pains
to replay any OPEN calls for which a reply is missed. The expectation
is therefore that NFS4ERR_OLD_STATEID should always be a temporary
state.
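
For reference, the server-side classification works roughly as follows
(paraphrased and simplified from nfsd's check_stateid_generation(); the real
code also guards against seqid wraparound):

        /* Simplified sketch, not the verbatim nfsd code. */
        if (has_session && in->si_generation == 0)
                return nfs_ok;                  /* client asked for "whatever is current" */
        if (in->si_generation == ref->si_generation)
                return nfs_ok;
        if (in->si_generation > ref->si_generation)
                return nfserr_bad_stateid;      /* stateid from the future: client bug */
        return nfserr_old_stateid;              /* client is behind: expected to be temporary */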

If it is not, then the bug report needs to explain why the server bumped
the seqid without informing the client.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-01 05:08:47

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Olga,

On 2019/3/1 6:26 AM, Olga Kornievskaia wrote:
> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <[email protected]> wrote:
>>
>> Hi,
>>
>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
>> changed to zombie occasionally while a thread is hanging with the
>> following stack:
>>
>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>> [<0>] __fput+0xa2/0x1c0
>> [<0>] task_work_run+0x82/0xa0
>> [<0>] do_exit+0x2ac/0xc20
>> [<0>] do_group_exit+0x39/0xa0
>> [<0>] get_signal+0x1ce/0x5d0
>> [<0>] do_signal+0x36/0x620
>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>> [<0>] do_syscall_64+0x16c/0x190
>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> [<0>] 0xffffffffffffffff
>>
>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
>> NFS4ERR_OLD_STATEID), the client will retry to close the file when
>> stateid generation number in client is lower than server.
>>
>> The original intention of this commit is retrying the operation while
>> racing with an OPEN. However, in this case the stateid generation remains
>> mismatch forever.
>>
>> Any suggestions?
>
> Can you include a network trace of the failure? Is it possible that
> the server has crashed on reply to the close and that's why the task
> is hung? What server are you testing against?
>
I am testing against a local server; the test case is attached below.

# uname -r
5.0.0-rc8

# cat /etc/exports
/mnt/vdd *(rw,no_subtree_check,no_root_squash)

# cat test.sh
mount /dev/vdd /mnt/vdd
systemctl start nfs
mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd /mnt/nfsscratch

for ((i=0;i<10000;i++))
do
echo "loop $i started"
/usr/local/xfstests/src/aio-dio-regress/aio-last-ref-held-by-io 1 100 /mnt/nfsscratch/aio-testfile
sleep 5
done

The aio-last-ref-held-by-io program from xfstests spawns 100 threads. Each
thread repeatedly does open/io_submit/close/io_getevents/io_destroy until
the parent exits because of a timeout.
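
Roughly, one iteration of a worker thread looks like the sketch below (a
minimal illustration of that open/io_submit/close/io_getevents/io_destroy
pattern, not the xfstests source; the O_RDWR|O_DIRECT flags and the 4k
buffer are assumptions):

#define _GNU_SOURCE
#include <fcntl.h>
#include <libaio.h>             /* link with -laio */
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One worker iteration: the fd is closed while the AIO is still in flight,
 * so the last reference to the open file is dropped from the I/O
 * completion path rather than from close(). */
static void one_iteration(const char *path)
{
        io_context_t ctx = 0;
        struct iocb cb, *cbs[1] = { &cb };
        struct io_event ev;
        void *buf = NULL;
        int fd;

        if (posix_memalign(&buf, 4096, 4096))
                return;
        memset(buf, 0, 4096);

        if (io_setup(1, &ctx))
                goto out_free;
        fd = open(path, O_RDWR | O_DIRECT);
        if (fd < 0)
                goto out_destroy;

        io_prep_pwrite(&cb, fd, buf, 4096, 0);
        if (io_submit(ctx, 1, cbs) == 1) {
                close(fd);                      /* close before the I/O completes */
                io_getevents(ctx, 1, 1, &ev, NULL);
        } else {
                close(fd);
        }
out_destroy:
        io_destroy(ctx);
out_free:
        free(buf);
}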


Here is the log when the task hung.

[10841.121523] nfs4_close_prepare: begin!
[10841.121526] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=10
[10841.121528] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
[10841.121530] nfs4_close_prepare: done!
[10841.121536] encode_sequence: sessionid=1551405749:3822331530:1:0 seqid=59495845 slotid=0 max_slotid=0 cache_this=1
[10841.121583] nfsd_dispatch: vers 4 proc 1
[10841.121588] __find_in_sessionid_hashtbl: 1551405749:3822331530:1:0
[10841.121590] nfsd4_sequence: slotid 0
[10841.121592] check_slot_seqid enter. seqid 59495845 slot_seqid 59495844
[10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 61e12d9e c03beba6 0000000b)
[10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 61e12d9e c03beba6 0000000b)
[10841.121613] NFSD: nfsd4_close on file aio-testfile
[10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid = (5c7892b5/e3d4268a/000ab905/0000746b)
[10841.121619] nfsv4 compound returned 10024
[10841.121621] --> nfsd4_store_cache_entry slot 00000000f540ddd9
[10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
[10841.122526] decode_attr_type: type=00
......
[10841.122558] nfs4_close_done: begin!
[10841.122561] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=10
[10841.122564] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
[10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
[10841.122568] nfs41_sequence_process: Error 0 free the slot
[10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
[10841.122575] nfs4_close_done: done, ret = 0!
[10841.122584] nfs4_close_prepare: begin!
......
The client keeps retrying the CLOSE.

> I have seen trace where close would get ERR_OLD_STATEID and would
> still retry with the same open state until it got a reply to the OPEN
> which changed the state and when the client received reply to that,
> it'll retry the CLOSE with the updated stateid.
>

Yes, you are right. I have also noticed that. But what happens if the
OPEN task exits on receiving a signal? How can the client update the
stateid then?

Thanks,
Jiufei

2019-03-01 05:20:00

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323



On 2019/3/1 7:56 AM, Trond Myklebust wrote:
> On Thu, 2019-02-28 at 17:26 -0500, Olga Kornievskaia wrote:
>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
>> [email protected]> wrote:
>>> Hi,
>>>
>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
>>> changed to zombie occasionally while a thread is hanging with the
>>> following stack:
>>>
>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>> [<0>] __fput+0xa2/0x1c0
>>> [<0>] task_work_run+0x82/0xa0
>>> [<0>] do_exit+0x2ac/0xc20
>>> [<0>] do_group_exit+0x39/0xa0
>>> [<0>] get_signal+0x1ce/0x5d0
>>> [<0>] do_signal+0x36/0x620
>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>> [<0>] do_syscall_64+0x16c/0x190
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>>
>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
>>> NFS4ERR_OLD_STATEID), the client will retry to close the file when
>>> stateid generation number in client is lower than server.
>>>
>>> The original intention of this commit is retrying the operation
>>> while
>>> racing with an OPEN. However, in this case the stateid generation
>>> remains
>>> mismatch forever.
>>>
>>> Any suggestions?
>>
>> Can you include a network trace of the failure? Is it possible that
>> the server has crashed on reply to the close and that's why the task
>> is hung? What server are you testing against?
>>
>> I have seen trace where close would get ERR_OLD_STATEID and would
>> still retry with the same open state until it got a reply to the OPEN
>> which changed the state and when the client received reply to that,
>> it'll retry the CLOSE with the updated stateid.
>
> I agree with Olga's assessment. The server is not allowed to randomly
> change the values of the seqid, and the client should be taking pains
> to replay any OPEN calls for which a reply is missed. The expectation
> is therefore that NFS4ERR_OLD_STATEID should always be a temporary
> state.
>
The server bumped the seqid because of a new OPEN from another thread. I
suspect that the new OPEN task may have exited on receiving a signal
without updating the stateid.

> If it is not, then the bugreport needs to explain why the server bumped
> the seqid without informing the client.
>

2019-03-01 08:50:00

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323



On 2019/3/1 1:08 PM, Jiufei Xue wrote:
> Hi Olga,
>
> On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the task
>>> changed to zombie occasionally while a thread is hanging with the
>>> following stack:
>>>
>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>> [<0>] __fput+0xa2/0x1c0
>>> [<0>] task_work_run+0x82/0xa0
>>> [<0>] do_exit+0x2ac/0xc20
>>> [<0>] do_group_exit+0x39/0xa0
>>> [<0>] get_signal+0x1ce/0x5d0
>>> [<0>] do_signal+0x36/0x620
>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>> [<0>] do_syscall_64+0x16c/0x190
>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>> [<0>] 0xffffffffffffffff
>>>
>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
>>> NFS4ERR_OLD_STATEID), the client will retry to close the file when
>>> stateid generation number in client is lower than server.
>>>
>>> The original intention of this commit is retrying the operation while
>>> racing with an OPEN. However, in this case the stateid generation remains
>>> mismatch forever.
>>>
>>> Any suggestions?
>>
>> Can you include a network trace of the failure? Is it possible that
>> the server has crashed on reply to the close and that's why the task
>> is hung? What server are you testing against?
>>
> I am testing the local server, the testcase is attached below.
>
> # uname -r
> 5.0.0-rc8
>
> # cat /etc/exports
> /mnt/vdd *(rw,no_subtree_check,no_root_squash)
>
> # cat test.sh
> mount /dev/vdd /mnt/vdd
> systemctl start nfs
> mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd /mnt/nfsscratch
>
> for ((i=0;i<10000;i++))
> do
> echo "loop $i started"
> /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-held-by-io 1 100 /mnt/nfsscratch/aio-testfile
> sleep 5
> done
>
> The command aio-last-ref-held-by-io from xfstests will spawn 100 threads. Each thread
> repeated doing open/io_submit/close/io_getevents/io_destroy until the parent exist
> because of timeout.
>
>
> Here is the log when the task hung.
>
> [10841.121523] nfs4_close_prepare: begin!
> [10841.121526] --> nfs4_alloc_slot used_slots=0000 highest_used=4294967295 max_slots=10
> [10841.121528] <-- nfs4_alloc_slot used_slots=0001 highest_used=0 slotid=0
> [10841.121530] nfs4_close_prepare: done!
> [10841.121536] encode_sequence: sessionid=1551405749:3822331530:1:0 seqid=59495845 slotid=0 max_slotid=0 cache_this=1
> [10841.121583] nfsd_dispatch: vers 4 proc 1
> [10841.121588] __find_in_sessionid_hashtbl: 1551405749:3822331530:1:0
> [10841.121590] nfsd4_sequence: slotid 0
> [10841.121592] check_slot_seqid enter. seqid 59495845 slot_seqid 59495844
> [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 61e12d9e c03beba6 0000000b)
> [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9 61e12d9e c03beba6 0000000b)
> [10841.121613] NFSD: nfsd4_close on file aio-testfile
> [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid = (5c7892b5/e3d4268a/000ab905/0000746b)
> [10841.121619] nfsv4 compound returned 10024
> [10841.121621] --> nfsd4_store_cache_entry slot 00000000f540ddd9
> [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
> [10841.122526] decode_attr_type: type=00
> ......
> [10841.122558] nfs4_close_done: begin!
> [10841.122561] --> nfs4_alloc_slot used_slots=0001 highest_used=0 max_slots=10
> [10841.122564] <-- nfs4_alloc_slot used_slots=0003 highest_used=1 slotid=1
> [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
> [10841.122568] nfs41_sequence_process: Error 0 free the slot
> [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid 4294967295
> [10841.122575] nfs4_close_done: done, ret = 0!
> [10841.122584] nfs4_close_prepare: begin!
> ......
> Client keep retry the CLOSE.
>
>> I have seen trace where close would get ERR_OLD_STATEID and would
>> still retry with the same open state until it got a reply to the OPEN
>> which changed the state and when the client received reply to that,
>> it'll retry the CLOSE with the updated stateid.
>>
>
> Yes, you are right. I have also noticed that. But what happened if the
> OPEN task exit receiving a signal? How can a client update the stateid?
>
> Thanks,
> Jiufei
>

I have added a tracepoint, nfsd_inc_and_copy_stateid, and enabled the
following tracepoints: nfs4_open_file and nfs4_open_stateid_update on the
client, and nfsd_inc_and_copy_stateid on the server.
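
The server-side tracepoint looks roughly like this (a simplified sketch of
my debug patch, not the exact code; it is hooked into
nfs4_inc_and_copy_stateid() in fs/nfsd/nfs4state.c):

TRACE_EVENT(nfsd_inc_and_copy_stateid,
        TP_PROTO(const stateid_t *stp),
        TP_ARGS(stp),
        TP_STRUCT__entry(
                __field(u32, cl_boot)
                __field(u32, cl_id)
                __field(u32, si_id)
                __field(u32, si_generation)
        ),
        TP_fast_assign(
                __entry->cl_boot = stp->si_opaque.so_clid.cl_boot;
                __entry->cl_id = stp->si_opaque.so_clid.cl_id;
                __entry->si_id = stp->si_opaque.so_id;
                __entry->si_generation = stp->si_generation;
        ),
        TP_printk("client %08x:%08x stateid %08x:%08x",
                __entry->cl_boot, __entry->cl_id,
                __entry->si_id, __entry->si_generation)
);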

Here is some trace output from right before the problem. The generation on
the server is 31856 (0x7c70), which is higher than the client's (31855), and
the client never updates its stateid.

aio-last-ref-he-5224 [000] ...1 1105.173446: nfs4_open_file: error=0 (OK) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:11 fhandle=0x0ca0e657 name=00:2b:2/aio-testfile stateid=31854:0x2a82450b openstateid=31854:0x2a82450b
aio-last-ref-he-5222 [000] ...1 1105.173568: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
nfsd-2991 [000] ...1 1105.173661: nfsd_inc_and_copy_stateid: client 5c78ec26:60bfa0c2 stateid 00095d13:00007c6f
aio-last-ref-he-5234 [000] ...1 1105.174000: nfs4_open_file: error=0 (OK) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:11 fhandle=0x0ca0e657 name=00:2b:2/aio-testfile stateid=31854:0x2a82450b openstateid=31854:0x2a82450b
aio-last-ref-he-5151 [001] ...1 1105.174226: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
aio-last-ref-he-5235 [001] ...1 1105.174544: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
aio-last-ref-he-5141 [001] ...4 1105.174752: nfs4_open_stateid_update: error=-4 (0x4) fileid=00:2b:11 fhandle=0x0ca0e657 stateid=25925:0xe0cb0a41
aio-last-ref-he-5151 [001] ...4 1105.178258: nfs4_open_stateid_update: error=-4 (0x4) fileid=00:2b:11 fhandle=0x0ca0e657 stateid=31855:0x2a82450b
<...>-5181 [001] .N.1 1105.180254: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
aio-last-ref-he-5151 [001] ...1 1105.180670: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
aio-last-ref-he-5235 [001] ...1 1105.180929: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
aio-last-ref-he-5181 [001] ...1 1105.181179: nfs4_open_file: error=-512 (ERESTARTSYS) flags=49152 (O_DIRECT|0x8000) fmode=READ fileid=00:2b:0 fhandle=0x00000000 name=00:2b:2/aio-testfile stateid=0:0x00000000 openstateid=0:0x00000000
aio-last-ref-he-5141 [001] ...1 1105.181816: nfs4_open_file: error=0 (OK) flags=49152 (O_DIRECT|0x8000) fmode=READ|0x801c fileid=00:2b:11 fhandle=0x0ca0e657 name=00:2b:2/aio-testfile stateid=31855:0x2a82450b openstateid=31855:0x2a82450b
nfsd-2990 [001] ...1 1105.182033: nfsd_inc_and_copy_stateid: client 5c78ec26:60bfa0c2 stateid 00095d13:00007c70

Thanks,
Jiufei


2019-03-01 13:08:56

by Trond Myklebust

Subject: Re: [bug report] task hang while testing xfstests generic/323

On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote:
>
> On 2019/3/1 下午1:08, Jiufei Xue wrote:
> > Hi Olga,
> >
> > On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
> > > On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
> > > [email protected]> wrote:
> > > > Hi,
> > > >
> > > > when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the
> > > > task
> > > > changed to zombie occasionally while a thread is hanging with
> > > > the
> > > > following stack:
> > > >
> > > > [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> > > > [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> > > > [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> > > > [<0>] nfs_file_release+0x35/0x50 [nfs]
> > > > [<0>] __fput+0xa2/0x1c0
> > > > [<0>] task_work_run+0x82/0xa0
> > > > [<0>] do_exit+0x2ac/0xc20
> > > > [<0>] do_group_exit+0x39/0xa0
> > > > [<0>] get_signal+0x1ce/0x5d0
> > > > [<0>] do_signal+0x36/0x620
> > > > [<0>] exit_to_usermode_loop+0x5e/0xc2
> > > > [<0>] do_syscall_64+0x16c/0x190
> > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > [<0>] 0xffffffffffffffff
> > > >
> > > > Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
> > > > NFS4ERR_OLD_STATEID), the client will retry to close the file
> > > > when
> > > > stateid generation number in client is lower than server.
> > > >
> > > > The original intention of this commit is retrying the operation
> > > > while
> > > > racing with an OPEN. However, in this case the stateid
> > > > generation remains
> > > > mismatch forever.
> > > >
> > > > Any suggestions?
> > >
> > > Can you include a network trace of the failure? Is it possible
> > > that
> > > the server has crashed on reply to the close and that's why the
> > > task
> > > is hung? What server are you testing against?
> > >
> > I am testing the local server, the testcase is attached below.
> >
> > # uname -r
> > 5.0.0-rc8
> >
> > # cat /etc/exports
> > /mnt/vdd *(rw,no_subtree_check,no_root_squash)
> >
> > # cat test.sh
> > mount /dev/vdd /mnt/vdd
> > systemctl start nfs
> > mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd
> > /mnt/nfsscratch
> >
> > for ((i=0;i<10000;i++))
> > do
> > echo "loop $i started"
> > /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-held-
> > by-io 1 100 /mnt/nfsscratch/aio-testfile
> > sleep 5
> > done
> >
> > The command aio-last-ref-held-by-io from xfstests will spawn 100
> > threads. Each thread
> > repeated doing open/io_submit/close/io_getevents/io_destroy until
> > the parent exist
> > because of timeout.
> >
> >
> > Here is the log when the task hung.
> >
> > [10841.121523] nfs4_close_prepare: begin!
> > [10841.121526] --> nfs4_alloc_slot used_slots=0000
> > highest_used=4294967295 max_slots=10
> > [10841.121528] <-- nfs4_alloc_slot used_slots=0001 highest_used=0
> > slotid=0
> > [10841.121530] nfs4_close_prepare: done!
> > [10841.121536] encode_sequence: sessionid=1551405749:3822331530:1:0
> > seqid=59495845 slotid=0 max_slotid=0 cache_this=1
> > [10841.121583] nfsd_dispatch: vers 4 proc 1
> > [10841.121588] __find_in_sessionid_hashtbl:
> > 1551405749:3822331530:1:0
> > [10841.121590] nfsd4_sequence: slotid 0
> > [10841.121592] check_slot_seqid enter. seqid 59495845 slot_seqid
> > 59495844
> > [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
> > 61e12d9e c03beba6 0000000b)
> > [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
> > 61e12d9e c03beba6 0000000b)
> > [10841.121613] NFSD: nfsd4_close on file aio-testfile
> > [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid =
> > (5c7892b5/e3d4268a/000ab905/0000746b)
> > [10841.121619] nfsv4 compound returned 10024
> > [10841.121621] --> nfsd4_store_cache_entry slot 00000000f540ddd9
> > [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
> > [10841.122526] decode_attr_type: type=00
> > ......
> > [10841.122558] nfs4_close_done: begin!
> > [10841.122561] --> nfs4_alloc_slot used_slots=0001 highest_used=0
> > max_slots=10
> > [10841.122564] <-- nfs4_alloc_slot used_slots=0003 highest_used=1
> > slotid=1
> > [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
> > [10841.122568] nfs41_sequence_process: Error 0 free the slot
> > [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid
> > 4294967295
> > [10841.122575] nfs4_close_done: done, ret = 0!
> > [10841.122584] nfs4_close_prepare: begin!
> > ......
> > Client keep retry the CLOSE.
> >
> > > I have seen trace where close would get ERR_OLD_STATEID and would
> > > still retry with the same open state until it got a reply to the
> > > OPEN
> > > which changed the state and when the client received reply to
> > > that,
> > > it'll retry the CLOSE with the updated stateid.
> > >
> >
> > Yes, you are right. I have also noticed that. But what happened if
> > the
> > OPEN task exit receiving a signal? How can a client update the
> > stateid?
> >

static void nfs4_open_release(void *calldata)
{
        struct nfs4_opendata *data = calldata;
        struct nfs4_state *state = NULL;

        /* If this request hasn't been cancelled, do nothing */
        if (!data->cancelled)
                goto out_free;
        /* In case of error, no cleanup! */
        if (data->rpc_status != 0 || !data->rpc_done)
                goto out_free;
        /* In case we need an open_confirm, no cleanup! */
        if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM)
                goto out_free;
        state = nfs4_opendata_to_nfs4_state(data);
        if (!IS_ERR(state))
                nfs4_close_state(state, data->o_arg.fmode);
out_free:
        nfs4_opendata_put(data);
}

Note what happens when the 'data->cancelled' flag is set. This is why
we always ensure that stateful operations such as OPEN, and LOCK,...
are run as asynchronous RPC tasks.
OPEN_DOWNGRADE, CLOSE and LOCKU are also asynchronous, but they are
inherently safe against signals since they do not establish new state
that needs to be torn down.
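
For context, the hand-off on the submitting side looks roughly like this
(paraphrased from nfs4_run_open_task(), not the verbatim code):

        task = rpc_run_task(&task_setup_data);          /* OPEN runs as an async rpc_task */
        if (IS_ERR(task))
                return PTR_ERR(task);
        status = rpc_wait_for_completion_task(task);    /* killable wait for the reply */
        if (status != 0) {
                /* Signalled: the RPC task keeps running to completion, and
                 * nfs4_open_release() above tears down any state the reply
                 * establishes once it sees data->cancelled set. */
                data->cancelled = 1;
                smp_wmb();
        } else
                status = data->rpc_status;
        rpc_put_task(task);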

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-02 16:34:08

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Trond,

On 2019/3/1 9:08 PM, Trond Myklebust wrote:
> On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote:
>>
>> On 2019/3/1 下午1:08, Jiufei Xue wrote:
>>> Hi Olga,
>>>
>>> On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
>>>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
>>>> [email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the
>>>>> task
>>>>> changed to zombie occasionally while a thread is hanging with
>>>>> the
>>>>> following stack:
>>>>>
>>>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>>>> [<0>] __fput+0xa2/0x1c0
>>>>> [<0>] task_work_run+0x82/0xa0
>>>>> [<0>] do_exit+0x2ac/0xc20
>>>>> [<0>] do_group_exit+0x39/0xa0
>>>>> [<0>] get_signal+0x1ce/0x5d0
>>>>> [<0>] do_signal+0x36/0x620
>>>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>>>> [<0>] do_syscall_64+0x16c/0x190
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
>>>>> NFS4ERR_OLD_STATEID), the client will retry to close the file
>>>>> when
>>>>> stateid generation number in client is lower than server.
>>>>>
>>>>> The original intention of this commit is retrying the operation
>>>>> while
>>>>> racing with an OPEN. However, in this case the stateid
>>>>> generation remains
>>>>> mismatch forever.
>>>>>
>>>>> Any suggestions?
>>>>
>>>> Can you include a network trace of the failure? Is it possible
>>>> that
>>>> the server has crashed on reply to the close and that's why the
>>>> task
>>>> is hung? What server are you testing against?
>>>>
>>> I am testing the local server, the testcase is attached below.
>>>
>>> # uname -r
>>> 5.0.0-rc8
>>>
>>> # cat /etc/exports
>>> /mnt/vdd *(rw,no_subtree_check,no_root_squash)
>>>
>>> # cat test.sh
>>> mount /dev/vdd /mnt/vdd
>>> systemctl start nfs
>>> mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd
>>> /mnt/nfsscratch
>>>
>>> for ((i=0;i<10000;i++))
>>> do
>>> echo "loop $i started"
>>> /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-held-
>>> by-io 1 100 /mnt/nfsscratch/aio-testfile
>>> sleep 5
>>> done
>>>
>>> The command aio-last-ref-held-by-io from xfstests will spawn 100
>>> threads. Each thread
>>> repeated doing open/io_submit/close/io_getevents/io_destroy until
>>> the parent exist
>>> because of timeout.
>>>
>>>
>>> Here is the log when the task hung.
>>>
>>> [10841.121523] nfs4_close_prepare: begin!
>>> [10841.121526] --> nfs4_alloc_slot used_slots=0000
>>> highest_used=4294967295 max_slots=10
>>> [10841.121528] <-- nfs4_alloc_slot used_slots=0001 highest_used=0
>>> slotid=0
>>> [10841.121530] nfs4_close_prepare: done!
>>> [10841.121536] encode_sequence: sessionid=1551405749:3822331530:1:0
>>> seqid=59495845 slotid=0 max_slotid=0 cache_this=1
>>> [10841.121583] nfsd_dispatch: vers 4 proc 1
>>> [10841.121588] __find_in_sessionid_hashtbl:
>>> 1551405749:3822331530:1:0
>>> [10841.121590] nfsd4_sequence: slotid 0
>>> [10841.121592] check_slot_seqid enter. seqid 59495845 slot_seqid
>>> 59495844
>>> [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
>>> 61e12d9e c03beba6 0000000b)
>>> [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
>>> 61e12d9e c03beba6 0000000b)
>>> [10841.121613] NFSD: nfsd4_close on file aio-testfile
>>> [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid =
>>> (5c7892b5/e3d4268a/000ab905/0000746b)
>>> [10841.121619] nfsv4 compound returned 10024
>>> [10841.121621] --> nfsd4_store_cache_entry slot 00000000f540ddd9
>>> [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
>>> [10841.122526] decode_attr_type: type=00
>>> ......
>>> [10841.122558] nfs4_close_done: begin!
>>> [10841.122561] --> nfs4_alloc_slot used_slots=0001 highest_used=0
>>> max_slots=10
>>> [10841.122564] <-- nfs4_alloc_slot used_slots=0003 highest_used=1
>>> slotid=1
>>> [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
>>> [10841.122568] nfs41_sequence_process: Error 0 free the slot
>>> [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid
>>> 4294967295
>>> [10841.122575] nfs4_close_done: done, ret = 0!
>>> [10841.122584] nfs4_close_prepare: begin!
>>> ......
>>> Client keep retry the CLOSE.
>>>
>>>> I have seen trace where close would get ERR_OLD_STATEID and would
>>>> still retry with the same open state until it got a reply to the
>>>> OPEN
>>>> which changed the state and when the client received reply to
>>>> that,
>>>> it'll retry the CLOSE with the updated stateid.
>>>>
>>>
>>> Yes, you are right. I have also noticed that. But what happened if
>>> the
>>> OPEN task exit receiving a signal? How can a client update the
>>> stateid?
>>>
>
> static void nfs4_open_release(void *calldata)
> {
> struct
> nfs4_opendata *data = calldata;
> struct nfs4_state *state = NULL;
>
> /* If this request hasn't been cancelled, do nothing */
> if (!data->cancelled)
> goto out_free;
> /* In case of error, no cleanup! */
> if (data->rpc_status != 0 || !data->rpc_done)
> goto out_free;
> /* In case we need an open_confirm, no cleanup! */
> if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM)
> goto out_free;
> state = nfs4_opendata_to_nfs4_state(data);
> if (!IS_ERR(state))
> nfs4_close_state(state, data->o_arg.fmode);
> out_free:
> nfs4_opendata_put(data);
> }
>
> Note what happens when the 'data->cancelled' flag is set. This is why
> we always ensure that stateful operations such as OPEN, and LOCK,...
> are run as asynchronous RPC tasks.
> OPEN_DOWNGRADE, CLOSE and LOCKU are also asynchronous, but they are
> inherently safe against signals since they do not establish new state
> that needs to be torn down.
>
Thank you for the correction. I had not noticed that before. However,
the problem can still be reproduced on 5.0.0-rc8. I found that the server
bumped the seqid in nfsd4_process_open2() while, for some reason, the
client was not updated.

Thanks,
Jiufei

2019-03-04 15:20:10

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Trond,

On 2019/3/1 9:08 PM, Trond Myklebust wrote:
> On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote:
>>
>> On 2019/3/1 下午1:08, Jiufei Xue wrote:
>>> Hi Olga,
>>>
>>> On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
>>>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
>>>> [email protected]> wrote:
>>>>> Hi,
>>>>>
>>>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2, the
>>>>> task
>>>>> changed to zombie occasionally while a thread is hanging with
>>>>> the
>>>>> following stack:
>>>>>
>>>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>>>> [<0>] __fput+0xa2/0x1c0
>>>>> [<0>] task_work_run+0x82/0xa0
>>>>> [<0>] do_exit+0x2ac/0xc20
>>>>> [<0>] do_group_exit+0x39/0xa0
>>>>> [<0>] get_signal+0x1ce/0x5d0
>>>>> [<0>] do_signal+0x36/0x620
>>>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>>>> [<0>] do_syscall_64+0x16c/0x190
>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>> [<0>] 0xffffffffffffffff
>>>>>
>>>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and DELEGRETURN on
>>>>> NFS4ERR_OLD_STATEID), the client will retry to close the file
>>>>> when
>>>>> stateid generation number in client is lower than server.
>>>>>
>>>>> The original intention of this commit is retrying the operation
>>>>> while
>>>>> racing with an OPEN. However, in this case the stateid
>>>>> generation remains
>>>>> mismatch forever.
>>>>>
>>>>> Any suggestions?
>>>>
>>>> Can you include a network trace of the failure? Is it possible
>>>> that
>>>> the server has crashed on reply to the close and that's why the
>>>> task
>>>> is hung? What server are you testing against?
>>>>
>>> I am testing the local server, the testcase is attached below.
>>>
>>> # uname -r
>>> 5.0.0-rc8
>>>
>>> # cat /etc/exports
>>> /mnt/vdd *(rw,no_subtree_check,no_root_squash)
>>>
>>> # cat test.sh
>>> mount /dev/vdd /mnt/vdd
>>> systemctl start nfs
>>> mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd
>>> /mnt/nfsscratch
>>>
>>> for ((i=0;i<10000;i++))
>>> do
>>> echo "loop $i started"
>>> /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-held-
>>> by-io 1 100 /mnt/nfsscratch/aio-testfile
>>> sleep 5
>>> done
>>>
>>> The command aio-last-ref-held-by-io from xfstests will spawn 100
>>> threads. Each thread
>>> repeated doing open/io_submit/close/io_getevents/io_destroy until
>>> the parent exist
>>> because of timeout.
>>>
>>>
>>> Here is the log when the task hung.
>>>
>>> [10841.121523] nfs4_close_prepare: begin!
>>> [10841.121526] --> nfs4_alloc_slot used_slots=0000
>>> highest_used=4294967295 max_slots=10
>>> [10841.121528] <-- nfs4_alloc_slot used_slots=0001 highest_used=0
>>> slotid=0
>>> [10841.121530] nfs4_close_prepare: done!
>>> [10841.121536] encode_sequence: sessionid=1551405749:3822331530:1:0
>>> seqid=59495845 slotid=0 max_slotid=0 cache_this=1
>>> [10841.121583] nfsd_dispatch: vers 4 proc 1
>>> [10841.121588] __find_in_sessionid_hashtbl:
>>> 1551405749:3822331530:1:0
>>> [10841.121590] nfsd4_sequence: slotid 0
>>> [10841.121592] check_slot_seqid enter. seqid 59495845 slot_seqid
>>> 59495844
>>> [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
>>> 61e12d9e c03beba6 0000000b)
>>> [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
>>> 61e12d9e c03beba6 0000000b)
>>> [10841.121613] NFSD: nfsd4_close on file aio-testfile
>>> [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid =
>>> (5c7892b5/e3d4268a/000ab905/0000746b)
>>> [10841.121619] nfsv4 compound returned 10024
>>> [10841.121621] --> nfsd4_store_cache_entry slot 00000000f540ddd9
>>> [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
>>> [10841.122526] decode_attr_type: type=00
>>> ......
>>> [10841.122558] nfs4_close_done: begin!
>>> [10841.122561] --> nfs4_alloc_slot used_slots=0001 highest_used=0
>>> max_slots=10
>>> [10841.122564] <-- nfs4_alloc_slot used_slots=0003 highest_used=1
>>> slotid=1
>>> [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
>>> [10841.122568] nfs41_sequence_process: Error 0 free the slot
>>> [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid
>>> 4294967295
>>> [10841.122575] nfs4_close_done: done, ret = 0!
>>> [10841.122584] nfs4_close_prepare: begin!
>>> ......
>>> Client keep retry the CLOSE.
>>>
>>>> I have seen trace where close would get ERR_OLD_STATEID and would
>>>> still retry with the same open state until it got a reply to the
>>>> OPEN
>>>> which changed the state and when the client received reply to
>>>> that,
>>>> it'll retry the CLOSE with the updated stateid.
>>>>
>>>
>>> Yes, you are right. I have also noticed that. But what happened if
>>> the
>>> OPEN task exit receiving a signal? How can a client update the
>>> stateid?
>>>
>
> static void nfs4_open_release(void *calldata)
> {
> struct
> nfs4_opendata *data = calldata;
> struct nfs4_state *state = NULL;
>
> /* If this request hasn't been cancelled, do nothing */
> if (!data->cancelled)
> goto out_free;
> /* In case of error, no cleanup! */
> if (data->rpc_status != 0 || !data->rpc_done)
> goto out_free;
> /* In case we need an open_confirm, no cleanup! */
> if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM)
> goto out_free;
> state = nfs4_opendata_to_nfs4_state(data);
> if (!IS_ERR(state))
> nfs4_close_state(state, data->o_arg.fmode);
> out_free:
> nfs4_opendata_put(data);
> }
>
> Note what happens when the 'data->cancelled' flag is set. This is why
> we always ensure that stateful operations such as OPEN, and LOCK,...
> are run as asynchronous RPC tasks.

I noticed that when a signal is received, rpc_status is set to -ERESTARTSYS
in rpc_exit->rpc_exit_task->nfs4_open_done, which prevents the stateid from
being updated in nfs4_open_release().

So is it possible for the stateid on the client and the server to remain
mismatched if the OPEN task receives the signal right after transmitting the
message or receiving the response, but before calling rpc_exit_task normally?

Thanks,
Jiufei

> OPEN_DOWNGRADE, CLOSE and LOCKU are also asynchronous, but they are
> inherently safe against signals since they do not establish new state
> that needs to be torn down.
>

2019-03-04 15:50:17

by Trond Myklebust

Subject: Re: [bug report] task hang while testing xfstests generic/323

On Mon, 2019-03-04 at 23:20 +0800, Jiufei Xue wrote:
> Hi Trond,
>
> On 2019/3/1 下午9:08, Trond Myklebust wrote:
> > On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote:
> > > On 2019/3/1 下午1:08, Jiufei Xue wrote:
> > > > Hi Olga,
> > > >
> > > > On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
> > > > > On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
> > > > > [email protected]> wrote:
> > > > > > Hi,
> > > > > >
> > > > > > when I tested xfstests/generic/323 with NFSv4.1 and v4.2,
> > > > > > the
> > > > > > task
> > > > > > changed to zombie occasionally while a thread is hanging
> > > > > > with
> > > > > > the
> > > > > > following stack:
> > > > > >
> > > > > > [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
> > > > > > [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
> > > > > > [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
> > > > > > [<0>] nfs_file_release+0x35/0x50 [nfs]
> > > > > > [<0>] __fput+0xa2/0x1c0
> > > > > > [<0>] task_work_run+0x82/0xa0
> > > > > > [<0>] do_exit+0x2ac/0xc20
> > > > > > [<0>] do_group_exit+0x39/0xa0
> > > > > > [<0>] get_signal+0x1ce/0x5d0
> > > > > > [<0>] do_signal+0x36/0x620
> > > > > > [<0>] exit_to_usermode_loop+0x5e/0xc2
> > > > > > [<0>] do_syscall_64+0x16c/0x190
> > > > > > [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > > > > > [<0>] 0xffffffffffffffff
> > > > > >
> > > > > > Since commit 12f275cdd163(NFSv4: Retry CLOSE and
> > > > > > DELEGRETURN on
> > > > > > NFS4ERR_OLD_STATEID), the client will retry to close the
> > > > > > file
> > > > > > when
> > > > > > stateid generation number in client is lower than server.
> > > > > >
> > > > > > The original intention of this commit is retrying the
> > > > > > operation
> > > > > > while
> > > > > > racing with an OPEN. However, in this case the stateid
> > > > > > generation remains
> > > > > > mismatch forever.
> > > > > >
> > > > > > Any suggestions?
> > > > >
> > > > > Can you include a network trace of the failure? Is it
> > > > > possible
> > > > > that
> > > > > the server has crashed on reply to the close and that's why
> > > > > the
> > > > > task
> > > > > is hung? What server are you testing against?
> > > > >
> > > > I am testing the local server, the testcase is attached below.
> > > >
> > > > # uname -r
> > > > 5.0.0-rc8
> > > >
> > > > # cat /etc/exports
> > > > /mnt/vdd *(rw,no_subtree_check,no_root_squash)
> > > >
> > > > # cat test.sh
> > > > mount /dev/vdd /mnt/vdd
> > > > systemctl start nfs
> > > > mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd
> > > > /mnt/nfsscratch
> > > >
> > > > for ((i=0;i<10000;i++))
> > > > do
> > > > echo "loop $i started"
> > > > /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-
> > > > held-
> > > > by-io 1 100 /mnt/nfsscratch/aio-testfile
> > > > sleep 5
> > > > done
> > > >
> > > > The command aio-last-ref-held-by-io from xfstests will spawn
> > > > 100
> > > > threads. Each thread
> > > > repeated doing open/io_submit/close/io_getevents/io_destroy
> > > > until
> > > > the parent exist
> > > > because of timeout.
> > > >
> > > >
> > > > Here is the log when the task hung.
> > > >
> > > > [10841.121523] nfs4_close_prepare: begin!
> > > > [10841.121526] --> nfs4_alloc_slot used_slots=0000
> > > > highest_used=4294967295 max_slots=10
> > > > [10841.121528] <-- nfs4_alloc_slot used_slots=0001
> > > > highest_used=0
> > > > slotid=0
> > > > [10841.121530] nfs4_close_prepare: done!
> > > > [10841.121536] encode_sequence:
> > > > sessionid=1551405749:3822331530:1:0
> > > > seqid=59495845 slotid=0 max_slotid=0 cache_this=1
> > > > [10841.121583] nfsd_dispatch: vers 4 proc 1
> > > > [10841.121588] __find_in_sessionid_hashtbl:
> > > > 1551405749:3822331530:1:0
> > > > [10841.121590] nfsd4_sequence: slotid 0
> > > > [10841.121592] check_slot_seqid enter. seqid 59495845
> > > > slot_seqid
> > > > 59495844
> > > > [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
> > > > 61e12d9e c03beba6 0000000b)
> > > > [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
> > > > 61e12d9e c03beba6 0000000b)
> > > > [10841.121613] NFSD: nfsd4_close on file aio-testfile
> > > > [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid
> > > > =
> > > > (5c7892b5/e3d4268a/000ab905/0000746b)
> > > > [10841.121619] nfsv4 compound returned 10024
> > > > [10841.121621] --> nfsd4_store_cache_entry slot
> > > > 00000000f540ddd9
> > > > [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
> > > > [10841.122526] decode_attr_type: type=00
> > > > ......
> > > > [10841.122558] nfs4_close_done: begin!
> > > > [10841.122561] --> nfs4_alloc_slot used_slots=0001
> > > > highest_used=0
> > > > max_slots=10
> > > > [10841.122564] <-- nfs4_alloc_slot used_slots=0003
> > > > highest_used=1
> > > > slotid=1
> > > > [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
> > > > [10841.122568] nfs41_sequence_process: Error 0 free the slot
> > > > [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid
> > > > 4294967295
> > > > [10841.122575] nfs4_close_done: done, ret = 0!
> > > > [10841.122584] nfs4_close_prepare: begin!
> > > > ......
> > > > Client keep retry the CLOSE.
> > > >
> > > > > I have seen trace where close would get ERR_OLD_STATEID and
> > > > > would
> > > > > still retry with the same open state until it got a reply to
> > > > > the
> > > > > OPEN
> > > > > which changed the state and when the client received reply to
> > > > > that,
> > > > > it'll retry the CLOSE with the updated stateid.
> > > > >
> > > >
> > > > Yes, you are right. I have also noticed that. But what happened
> > > > if
> > > > the
> > > > OPEN task exit receiving a signal? How can a client update the
> > > > stateid?
> > > >
> >
> > static void nfs4_open_release(void *calldata)
> > {
> > struct
> > nfs4_opendata *data = calldata;
> > struct nfs4_state *state = NULL;
> >
> > /* If this request hasn't been cancelled, do nothing */
> > if (!data->cancelled)
> > goto out_free;
> > /* In case of error, no cleanup! */
> > if (data->rpc_status != 0 || !data->rpc_done)
> > goto out_free;
> > /* In case we need an open_confirm, no cleanup! */
> > if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM)
> > goto out_free;
> > state = nfs4_opendata_to_nfs4_state(data);
> > if (!IS_ERR(state))
> > nfs4_close_state(state, data->o_arg.fmode);
> > out_free:
> > nfs4_opendata_put(data);
> > }
> >
> > Note what happens when the 'data->cancelled' flag is set. This is
> > why
> > we always ensure that stateful operations such as OPEN, and
> > LOCK,...
> > are run as asynchronous RPC tasks.
>
> I noticed that when receiving a signal, the rpc_status will be set to
> -ERESTARTSYS in rpc_exit->rpc_exit_task->nfs4_open_done, which made
> stateid
> not updated in nfs4_open_release().
>
> So is it possible that the stateid between the client and the server
> remain
> mismatch if the OPEN task received the signal right after
> transmitting the
> message or receiving the response, but before calling rpc_exit_task
> normally?

If the rpc_task could be signalled, yes. However I don't see how that
can happen to an async RPC call. Have you found such a case?

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-05 05:09:21

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323



On 2019/3/4 11:50 PM, Trond Myklebust wrote:
> On Mon, 2019-03-04 at 23:20 +0800, Jiufei Xue wrote:
>> Hi Trond,
>>
>> On 2019/3/1 下午9:08, Trond Myklebust wrote:
>>> On Fri, 2019-03-01 at 16:49 +0800, Jiufei Xue wrote:
>>>> On 2019/3/1 下午1:08, Jiufei Xue wrote:
>>>>> Hi Olga,
>>>>>
>>>>> On 2019/3/1 上午6:26, Olga Kornievskaia wrote:
>>>>>> On Thu, Feb 28, 2019 at 5:11 AM Jiufei Xue <
>>>>>> [email protected]> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> when I tested xfstests/generic/323 with NFSv4.1 and v4.2,
>>>>>>> the
>>>>>>> task
>>>>>>> changed to zombie occasionally while a thread is hanging
>>>>>>> with
>>>>>>> the
>>>>>>> following stack:
>>>>>>>
>>>>>>> [<0>] rpc_wait_bit_killable+0x1e/0xa0 [sunrpc]
>>>>>>> [<0>] nfs4_do_close+0x21b/0x2c0 [nfsv4]
>>>>>>> [<0>] __put_nfs_open_context+0xa2/0x110 [nfs]
>>>>>>> [<0>] nfs_file_release+0x35/0x50 [nfs]
>>>>>>> [<0>] __fput+0xa2/0x1c0
>>>>>>> [<0>] task_work_run+0x82/0xa0
>>>>>>> [<0>] do_exit+0x2ac/0xc20
>>>>>>> [<0>] do_group_exit+0x39/0xa0
>>>>>>> [<0>] get_signal+0x1ce/0x5d0
>>>>>>> [<0>] do_signal+0x36/0x620
>>>>>>> [<0>] exit_to_usermode_loop+0x5e/0xc2
>>>>>>> [<0>] do_syscall_64+0x16c/0x190
>>>>>>> [<0>] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>>>>> [<0>] 0xffffffffffffffff
>>>>>>>
>>>>>>> Since commit 12f275cdd163(NFSv4: Retry CLOSE and
>>>>>>> DELEGRETURN on
>>>>>>> NFS4ERR_OLD_STATEID), the client will retry to close the
>>>>>>> file
>>>>>>> when
>>>>>>> stateid generation number in client is lower than server.
>>>>>>>
>>>>>>> The original intention of this commit is retrying the
>>>>>>> operation
>>>>>>> while
>>>>>>> racing with an OPEN. However, in this case the stateid
>>>>>>> generation remains
>>>>>>> mismatch forever.
>>>>>>>
>>>>>>> Any suggestions?
>>>>>>
>>>>>> Can you include a network trace of the failure? Is it
>>>>>> possible
>>>>>> that
>>>>>> the server has crashed on reply to the close and that's why
>>>>>> the
>>>>>> task
>>>>>> is hung? What server are you testing against?
>>>>>>
>>>>> I am testing the local server, the testcase is attached below.
>>>>>
>>>>> # uname -r
>>>>> 5.0.0-rc8
>>>>>
>>>>> # cat /etc/exports
>>>>> /mnt/vdd *(rw,no_subtree_check,no_root_squash)
>>>>>
>>>>> # cat test.sh
>>>>> mount /dev/vdd /mnt/vdd
>>>>> systemctl start nfs
>>>>> mount -t nfs -o rw,relatime,vers=4.1 127.0.0.1:/mnt/vdd
>>>>> /mnt/nfsscratch
>>>>>
>>>>> for ((i=0;i<10000;i++))
>>>>> do
>>>>> echo "loop $i started"
>>>>> /usr/local/xfstests/src/aio-dio-regress/aio-last-ref-
>>>>> held-
>>>>> by-io 1 100 /mnt/nfsscratch/aio-testfile
>>>>> sleep 5
>>>>> done
>>>>>
>>>>> The command aio-last-ref-held-by-io from xfstests will spawn
>>>>> 100
>>>>> threads. Each thread
>>>>> repeated doing open/io_submit/close/io_getevents/io_destroy
>>>>> until
>>>>> the parent exist
>>>>> because of timeout.
>>>>>
>>>>>
>>>>> Here is the log when the task hung.
>>>>>
>>>>> [10841.121523] nfs4_close_prepare: begin!
>>>>> [10841.121526] --> nfs4_alloc_slot used_slots=0000
>>>>> highest_used=4294967295 max_slots=10
>>>>> [10841.121528] <-- nfs4_alloc_slot used_slots=0001
>>>>> highest_used=0
>>>>> slotid=0
>>>>> [10841.121530] nfs4_close_prepare: done!
>>>>> [10841.121536] encode_sequence:
>>>>> sessionid=1551405749:3822331530:1:0
>>>>> seqid=59495845 slotid=0 max_slotid=0 cache_this=1
>>>>> [10841.121583] nfsd_dispatch: vers 4 proc 1
>>>>> [10841.121588] __find_in_sessionid_hashtbl:
>>>>> 1551405749:3822331530:1:0
>>>>> [10841.121590] nfsd4_sequence: slotid 0
>>>>> [10841.121592] check_slot_seqid enter. seqid 59495845
>>>>> slot_seqid
>>>>> 59495844
>>>>> [10841.121596] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
>>>>> 61e12d9e c03beba6 0000000b)
>>>>> [10841.121607] nfsd: fh_verify(28: 01060001 0c66c6db d74ea4c9
>>>>> 61e12d9e c03beba6 0000000b)
>>>>> [10841.121613] NFSD: nfsd4_close on file aio-testfile
>>>>> [10841.121615] NFSD: nfs4_preprocess_seqid_op: seqid=0 stateid
>>>>> =
>>>>> (5c7892b5/e3d4268a/000ab905/0000746b)
>>>>> [10841.121619] nfsv4 compound returned 10024
>>>>> [10841.121621] --> nfsd4_store_cache_entry slot
>>>>> 00000000f540ddd9
>>>>> [10841.121625] renewing client (clientid 5c7892b5/e3d4268a)
>>>>> [10841.122526] decode_attr_type: type=00
>>>>> ......
>>>>> [10841.122558] nfs4_close_done: begin!
>>>>> [10841.122561] --> nfs4_alloc_slot used_slots=0001
>>>>> highest_used=0
>>>>> max_slots=10
>>>>> [10841.122564] <-- nfs4_alloc_slot used_slots=0003
>>>>> highest_used=1
>>>>> slotid=1
>>>>> [10841.122566] nfs4_free_slot: slotid 1 highest_used_slotid 0
>>>>> [10841.122568] nfs41_sequence_process: Error 0 free the slot
>>>>> [10841.122571] nfs4_free_slot: slotid 0 highest_used_slotid
>>>>> 4294967295
>>>>> [10841.122575] nfs4_close_done: done, ret = 0!
>>>>> [10841.122584] nfs4_close_prepare: begin!
>>>>> ......
>>>>> Client keep retry the CLOSE.
>>>>>
>>>>>> I have seen trace where close would get ERR_OLD_STATEID and
>>>>>> would
>>>>>> still retry with the same open state until it got a reply to
>>>>>> the
>>>>>> OPEN
>>>>>> which changed the state and when the client received reply to
>>>>>> that,
>>>>>> it'll retry the CLOSE with the updated stateid.
>>>>>>
>>>>>
>>>>> Yes, you are right. I have also noticed that. But what happened
>>>>> if
>>>>> the
>>>>> OPEN task exit receiving a signal? How can a client update the
>>>>> stateid?
>>>>>
>>>
>>> static void nfs4_open_release(void *calldata)
>>> {
>>> struct
>>> nfs4_opendata *data = calldata;
>>> struct nfs4_state *state = NULL;
>>>
>>> /* If this request hasn't been cancelled, do nothing */
>>> if (!data->cancelled)
>>> goto out_free;
>>> /* In case of error, no cleanup! */
>>> if (data->rpc_status != 0 || !data->rpc_done)
>>> goto out_free;
>>> /* In case we need an open_confirm, no cleanup! */
>>> if (data->o_res.rflags & NFS4_OPEN_RESULT_CONFIRM)
>>> goto out_free;
>>> state = nfs4_opendata_to_nfs4_state(data);
>>> if (!IS_ERR(state))
>>> nfs4_close_state(state, data->o_arg.fmode);
>>> out_free:
>>> nfs4_opendata_put(data);
>>> }
>>>
>>> Note what happens when the 'data->cancelled' flag is set. This is
>>> why
>>> we always ensure that stateful operations such as OPEN, and
>>> LOCK,...
>>> are run as asynchronous RPC tasks.
>>
>> I noticed that when receiving a signal, the rpc_status will be set to
>> -ERESTARTSYS in rpc_exit->rpc_exit_task->nfs4_open_done, which made
>> stateid
>> not updated in nfs4_open_release().
>>
>> So is it possible that the stateid between the client and the server
>> remain
>> mismatch if the OPEN task received the signal right after
>> transmitting the
>> message or receiving the response, but before calling rpc_exit_task
>> normally?
>
> If the rpc_task could be signalled, yes. However I don't see how that
> can happen to an async RPC call. Have you found such a case?
>

I haven't found such a case; it was just a guess :)

But I have now found something that may be helpful. When the problem
happened, rpc_status was set to -121 (-EREMOTEIO), returned from
nfs4_xdr_dec_open->decode_putfh. I think that is why the client cannot
update the stateid.

kworker/u4:0-7 [001] .... 559.920542: nfs4_xdr_dec_open: status=0 openstateid=40289:0xd4bdfeb7
kworker/1:0-2929 [001] .... 559.920550: nfs4_open_stateid_update: error=0 (OK) fileid=00:2b:12 fhandle=0x1e4566cb stateid=40289:0xd4bdfeb7
kworker/1:0-2929 [001] .... 559.920551: nfs4_open_release: cancel=1 rpc_status=0 rpc_done=1 openstateid=40289:0xd4bdfeb7

nfsd-2741 [001] .... 559.920571: nfsd_compound_status: op=1/6 OP_SEQUENCE status=0
nfsd-2741 [001] .... 559.920575: nfsd_compound_status: op=2/6 OP_PUTFH status=0
nfsd-2741 [001] .... 559.920581: nfsd_inc_and_copy_stateid: client 5c7df867:90134041 stateid 00031b7d:00009d62 // 40290
nfsd-2741 [001] .... 559.920584: nfsd_compound_status: op=3/6 OP_OPEN status=0
nfsd-2741 [001] .... 559.920584: nfsd_compound_status: op=4/6 OP_GETFH status=0
nfsd-2741 [001] .... 559.920585: nfsd_compound_status: op=5/6 OP_ACCESS status=0
nfsd-2741 [001] .... 559.920588: nfsd_compound_status: op=6/6 OP_GETATTR status=0
kworker/u4:0-7 [001] .... 559.920622: nfs4_xdr_dec_open: status=-121 openstateid=0:0x7bd5c66f // added for debug
kworker/1:0-2929 [001] .... 559.920628: nfs4_open_release: cancel=1 rpc_status=-121 rpc_done=1 openstateid=0:0x7bd5c66f // added for debug

And we got this error in dmesg:
[ 559.834808] nfs: Server returned operation 53 but we issued a request for 22
[ 559.835741] nfs: Server returned operation 53 but we issued a request for 22

However, I don't know why the client received two OP_SEQUENCE responses
(operation 53 is OP_SEQUENCE; 22 is OP_PUTFH). Any ideas?

Thanks,
Jiufei

2019-03-05 14:45:51

by Trond Myklebust

Subject: Re: [bug report] task hang while testing xfstests generic/323

On Tue, 2019-03-05 at 13:09 +0800, Jiufei Xue wrote:
> But now I have found something new that may helpful. When the problem
> happened, rpc_status was set to -121 which returned from
> nfs4_xdr_dec_open->decode_putfh. I think it is the cause that client
> can
> not update the stateid.
>
> kworker/u4:0-7 [001] .... 559.920542: nfs4_xdr_dec_open:
> status=0 openstateid=40289:0xd4bdfeb7
> kworker/1:0-2929 [001] .... 559.920550:
> nfs4_open_stateid_update: error=0 (OK) fileid=00:2b:12
> fhandle=0x1e4566cb stateid=40289:0xd4bdfeb7
> kworker/1:0-2929 [001] .... 559.920551: nfs4_open_release:
> cancel=1 rpc_status=0 rpc_done=1 openstateid=40289:0xd4bdfeb7
>
> nfsd-2741 [001] .... 559.920571: nfsd_compound_status:
> op=1/6 OP_SEQUENCE status=0
> nfsd-2741 [001] .... 559.920575: nfsd_compound_status:
> op=2/6 OP_PUTFH status=0
> nfsd-2741 [001] .... 559.920581:
> nfsd_inc_and_copy_stateid: client 5c7df867:90134041 stateid
> 00031b7d:00009d62 // 40290
> nfsd-2741 [001] .... 559.920584: nfsd_compound_status:
> op=3/6 OP_OPEN status=0
> nfsd-2741 [001] .... 559.920584: nfsd_compound_status:
> op=4/6 OP_GETFH status=0
> nfsd-2741 [001] .... 559.920585: nfsd_compound_status:
> op=5/6 OP_ACCESS status=0
> nfsd-2741 [001] .... 559.920588: nfsd_compound_status:
> op=6/6 OP_GETATTR status=0
> kworker/u4:0-7 [001] .... 559.920622: nfs4_xdr_dec_open:
> status=-121 openstateid=0:0x7bd5c66f // added for debug
> kworker/1:0-2929 [001] .... 559.920628: nfs4_open_release:
> cancel=1 rpc_status=-121 rpc_done=1 openstateid=0:0x7bd5c66f // added
> for debug
>
> And we got this error in dmesg:
> [ 559.834808] nfs: Server returned operation 53 but we issued a
> request for 22
> [ 559.835741] nfs: Server returned operation 53 but we issued a
> request for 22
>
> However, I don't kown why the client received two OP_SEQUENCE
> response.
> Any ideas?
>

No, that's quite weird... Were you able to catch this behaviour on
wireshark? It would be interesting to see if the server is actually
sending 2 SEQUENCE ops or if that is a corruption that occurs on the
client.

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-06 09:59:55

by Jiufei Xue

Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Trond,

>
> No, that's quite weird... Were you able to catch this behaviour on
> wireshark? It would be interesting to see if the server is actually
> sending 2 SEQUENCE ops or if that is a corruption that occurs on the
> client.
>

Many packets are dropped by the kernel when I use tcpdump, so I am not sure
whether the server is sending two SEQUENCE ops.
However, according to the nfsd_compound_status tracepoint, the server
processes the SEQUENCE op only once, so I think the client is more likely
to blame.
Is there any other way to catch this behaviour?

Thanks,
Jiufei



2019-03-06 16:09:36

by J. Bruce Fields

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Wed, Mar 06, 2019 at 05:59:39PM +0800, Jiufei Xue wrote:
> Hi Trond,
>
> >
> > No, that's quite weird... Were you able to catch this behaviour on
> > wireshark? It would be interesting to see if the server is actually
> > sending 2 SEQUENCE ops or if that is a corruption that occurs on the
> > client.
> >
>
> Many packages are dropped by kernel when I use tcpdump, so I am not sure
> whether the server is sending 2 SEQUENCE.

For what it's worth, tcpdump with -s200 or so should be enough to get
this. Or maybe capturing from a different machine could help?
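
For example, something along these lines is usually enough (the
interface name and output file here are just placeholders, and port
2049 assumes the default NFS port):

  tcpdump -i eth0 -s 200 -w /tmp/generic-323.pcap port 2049

If packets are still being dropped, bumping the capture buffer with
-B may also help.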

> However, according to tracepoint nfsd_compound_status, the server is
> processing SEQUENCE op only once. I think it is more likely blames to
> the client.

If there was a bug in nfsd's xdr decoding, that'd happen after that
tracepoint. That would be weird corruption, but perhaps all the
possibilities are weird at this point.

--b.

> Is there any other way to catch the behaviour?
>
> Thanks,
> Jiufei
>

2019-03-10 22:20:19

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Wed, Mar 6, 2019 at 11:09 AM [email protected]
<[email protected]> wrote:
>
> On Wed, Mar 06, 2019 at 05:59:39PM +0800, Jiufei Xue wrote:
> > Hi Trond,
> >
> > >
> > > No, that's quite weird... Were you able to catch this behaviour on
> > > wireshark? It would be interesting to see if the server is actually
> > > sending 2 SEQUENCE ops or if that is a corruption that occurs on the
> > > client.
> > >
> >
> > Many packages are dropped by kernel when I use tcpdump, so I am not sure
> > whether the server is sending 2 SEQUENCE.
>
> For what it's worth, tcpdump with -s200 or so should be enough to get
> this. Or maybe capturing from a different machine could help?
>
> > However, according to tracepoint nfsd_compound_status, the server is
> > processing SEQUENCE op only once. I think it is more likely blames to
> > the client.
>
> If there was a bug in nfsd's xdr decoding, that'd happen after that
> tracepoint. That'd be weird corruption, but perhaps all the
> possibilites are weird at this point.

This is not a bug in the Linux nfsd. I was able to reproduce it
against an ONTAP server. I ran with the RHEL7.6 client (but the
problem exists upstream as well, since that is what Jiufei hit). (I
wanted to attach a trace, but it looks like it's too big, as my first
attempt at posting this was unsuccessful. I can send a trace to a
personal email if anyone is interested.)

There are a bunch of cases where multiple operations are using the
same seqid and slot.

An example of such weirdness is (nfs.seqid == 0x000002f4) &&
(nfs.slotid == 0), which is the one leading to the hang.

In frame 415870, there is an OPEN using that seqid and slot for the
first time (but this slot will be re-used a bunch of times before it
gets a reply in frame 415908 with the open stateid seq=40). (Also in
this packet there is an example of slot=1+seqid=0x000128f7 being
reused by both TEST_STATEID and OPEN, but let's set that aside.)

In frame 415874 (in the same packet), the client sends 5 OPENs on the
SAME seqid and slot (all with distinct xids). In a way that ends up
being alright, since the opens are for the same file and are therefore
answered out of the reply cache with ERR_DELAY. But in frame 415876
the client again uses the same seqid and slot, and this time it is
used by 3 OPENs and a TEST_STATEID.

In all this mess the client never processes the open stateid seq=40
and keeps resending CLOSE with seq=37 (note also that the client
"missed" processing seqid=38 and 39 as well; 39 probably because it
was a reply on the same kind of "reused" slot=1 and seq=0x000128f7. I
haven't tracked 38 down, but I'm assuming it's the same). I don't know
how many times, but after 5 minutes I see a TEST_STATEID that again
uses the same seqid+slot (which gets a reply from the cache matching
the OPEN). There is also an open + close (still with seq=37); the open
is replied to, but after this the client logs
"nfs4_schedule_state_manager: kthread_run: -4" over and over again,
and then goes into a soft lockup.

Looking back at slot 0: nfs.seqid=0x000002f3 was used in frame 415866
by the TEST_STATEID. This is replied to in frame 415877 (with an
ERR_DELAY). But before the client got that reply, it had reused the
slot and the seqid in frame 415874. TEST_STATEID is a synchronous and
interruptible operation, so I suspect that it was somehow interrupted
and that is how the slot was able to be re-used by frame 415874. But
how the several opens were able to get the same slot I don't know.

Regardless of how we got into the soft lockup, the client shouldn't be
using the same slot+seqid on multiple operations.
>
> --b.
>
> > Is there any other way to catch the behaviour?
> >
> > Thanks,
> > Jiufei
> >

2019-03-11 14:30:46

by Trond Myklebust

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Olga,

On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> There are a bunch of cases where multiple operations are using the
> same seqid and slot.
>
> Example of such weirdness is (nfs.seqid == 0x000002f4) && (nfs.slotid
> == 0) and the one leading to the hang.
>
> In frame 415870, there is an OPEN using that seqid and slot for the
> first time (but this slot will be re-used a bunch of times before it
> gets a reply in frame 415908 with the open stateid seq=40). (also in
> this packet there is an example of reuse slot=1+seqid=0x000128f7 by
> both TEST_STATEID and OPEN but let's set that aside).
>
> In frame 415874 (in the same packet), client sends 5 opens on the
> SAME
> seqid and slot (all have distinct xids). In a ways that's end up
> being
> alright since opens are for the same file and thus reply out of the
> cache and the reply is ERR_DELAY. But in frame 415876, client sends
> again uses the same seqid and slot and in this case it's used by
> 3opens and a test_stateid.
>
> Client in all this mess never processes the open stateid seq=40 and
> keeps on resending CLOSE with seq=37 (also to note client "missed"
> processing seqid=38 and 39 as well. 39 probably because it was a
> reply
> on the same kind of "Reused" slot=1 and seq=0x000128f7. I haven't
> tracked 38 but i'm assuming it's the same). I don't know how many
> times but after 5mins, I see a TEST_STATEID that again uses the same
> seqid+slot (which gets a reply from the cache matching OPEN). Also
> open + close (still with seq=37) open is replied to but after this
> client goes into a soft lockup logs have
> "nfs4_schedule_state_manager:
> kthread_ruan: -4" over and over again . then a soft lockup.
>
> Looking back on slot 0. nfs.seqid=0x000002f3 was used in frame=415866
> by the TEST_STATEID. This is replied to in frame 415877 (with an
> ERR_DELAY). But before the client got a reply, it used the slot and
> the seq by frame 415874. TEST_STATEID is a synchronous and
> interruptible operation. I'm suspecting that somehow it was
> interrupted and that's who the slot was able to be re-used by the
> frame 415874. But how the several opens were able to get the same
> slot
> I don't know..

Is this still true with the current linux-next? I would expect this
patch
http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971
to change the Linux client behaviour in the above regard.

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-11 15:07:35

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Mon, Mar 11, 2019 at 10:30 AM Trond Myklebust
<[email protected]> wrote:
>
> Hi Olga,
>
> On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> > There are a bunch of cases where multiple operations are using the
> > same seqid and slot.
> >
> > Example of such weirdness is (nfs.seqid == 0x000002f4) && (nfs.slotid
> > == 0) and the one leading to the hang.
> >
> > In frame 415870, there is an OPEN using that seqid and slot for the
> > first time (but this slot will be re-used a bunch of times before it
> > gets a reply in frame 415908 with the open stateid seq=40). (also in
> > this packet there is an example of reuse slot=1+seqid=0x000128f7 by
> > both TEST_STATEID and OPEN but let's set that aside).
> >
> > In frame 415874 (in the same packet), client sends 5 opens on the
> > SAME
> > seqid and slot (all have distinct xids). In a ways that's end up
> > being
> > alright since opens are for the same file and thus reply out of the
> > cache and the reply is ERR_DELAY. But in frame 415876, client sends
> > again uses the same seqid and slot and in this case it's used by
> > 3opens and a test_stateid.
> >
> > Client in all this mess never processes the open stateid seq=40 and
> > keeps on resending CLOSE with seq=37 (also to note client "missed"
> > processing seqid=38 and 39 as well. 39 probably because it was a
> > reply
> > on the same kind of "Reused" slot=1 and seq=0x000128f7. I haven't
> > tracked 38 but i'm assuming it's the same). I don't know how many
> > times but after 5mins, I see a TEST_STATEID that again uses the same
> > seqid+slot (which gets a reply from the cache matching OPEN). Also
> > open + close (still with seq=37) open is replied to but after this
> > client goes into a soft lockup logs have
> > "nfs4_schedule_state_manager:
> > kthread_ruan: -4" over and over again . then a soft lockup.
> >
> > Looking back on slot 0. nfs.seqid=0x000002f3 was used in frame=415866
> > by the TEST_STATEID. This is replied to in frame 415877 (with an
> > ERR_DELAY). But before the client got a reply, it used the slot and
> > the seq by frame 415874. TEST_STATEID is a synchronous and
> > interruptible operation. I'm suspecting that somehow it was
> > interrupted and that's who the slot was able to be re-used by the
> > frame 415874. But how the several opens were able to get the same
> > slot
> > I don't know..
>
> Is this still true with the current linux-next? I would expect this
> patch
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971
> to change the Linux client behaviour in the above regard.

Yes. I reproduced it against your origin/testing (5.0.0-rc7+) commit
0d1bf3407c4ae88 ("SUNRPC: allow dynamic allocation of back channel
slots").


>
> Cheers
> Trond
>
> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2019-03-11 15:12:14

by Trond Myklebust

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Mon, 2019-03-11 at 14:30 +0000, Trond Myklebust wrote:
> Hi Olga,
>
> On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> > There are a bunch of cases where multiple operations are using the
> > same seqid and slot.
> >
> > Example of such weirdness is (nfs.seqid == 0x000002f4) &&
> > (nfs.slotid
> > == 0) and the one leading to the hang.
> >
> > In frame 415870, there is an OPEN using that seqid and slot for the
> > first time (but this slot will be re-used a bunch of times before
> > it
> > gets a reply in frame 415908 with the open stateid seq=40). (also
> > in
> > this packet there is an example of reuse slot=1+seqid=0x000128f7 by
> > both TEST_STATEID and OPEN but let's set that aside).
> >
> > In frame 415874 (in the same packet), client sends 5 opens on the
> > SAME
> > seqid and slot (all have distinct xids). In a ways that's end up
> > being
> > alright since opens are for the same file and thus reply out of the
> > cache and the reply is ERR_DELAY. But in frame 415876, client sends
> > again uses the same seqid and slot and in this case it's used by
> > 3opens and a test_stateid.

This should result in exactly 1 bump of the stateid seqid.

> > Client in all this mess never processes the open stateid seq=40 and
> > keeps on resending CLOSE with seq=37 (also to note client "missed"
> > processing seqid=38 and 39 as well. 39 probably because it was a
> > reply
> > on the same kind of "Reused" slot=1 and seq=0x000128f7. I haven't
> > tracked 38 but i'm assuming it's the same). I don't know how many
> > times but after 5mins, I see a TEST_STATEID that again uses the
> > same
> > seqid+slot (which gets a reply from the cache matching OPEN). Also
> > open + close (still with seq=37) open is replied to but after this
> > client goes into a soft lockup logs have
> > "nfs4_schedule_state_manager:
> > kthread_ruan: -4" over and over again . then a soft lockup.
> >
> > Looking back on slot 0. nfs.seqid=0x000002f3 was used in
> > frame=415866
> > by the TEST_STATEID. This is replied to in frame 415877 (with an
> > ERR_DELAY). But before the client got a reply, it used the slot and
> > the seq by frame 415874. TEST_STATEID is a synchronous and
> > interruptible operation. I'm suspecting that somehow it was
> > interrupted and that's who the slot was able to be re-used by the
> > frame 415874. But how the several opens were able to get the same
> > slot
> > I don't know..
>
> Is this still true with the current linux-next? I would expect this
> patch
> http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971
> to change the Linux client behaviour in the above regard.
>


Note also that what you describe would appear to indicate a serious
server bug. If the client is reusing a slot+seqid, then the correct
server behaviour is to either return one of the errors NFS4ERR_DELAY,
NFS4ERR_SEQ_FALSE_RETRY, NFS4ERR_RETRY_UNCACHED_REP,
NFS4ERR_SEQ_MISORDERED, or else it must replay the exact same reply
that it had cached for the original request.

It is protocol violation for the server to execute new requests on a
slot+seqid combination that has already been used. For that reason, it
is also impossible for a replay to cause further state changes on the
server; a replay means that the server belts out the exact same reply
that was attempted sent earlier with no changes (with stateids that
match bit for bit what would have been sent earlier).
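
To illustrate, the per-slot sequencing rules described above boil down
to roughly the following (a standalone sketch for illustration only,
not the knfsd implementation; all names below are made up):

#include <stdbool.h>
#include <stdint.h>

#define NFS4ERR_SEQ_MISORDERED     10063
#define NFS4ERR_RETRY_UNCACHED_REP 10068

struct seq_slot {
        uint32_t last_seqid;   /* highest seqid accepted on this slot */
        bool     reply_cached; /* is the reply to it still cached?    */
};

enum slot_action { SLOT_EXECUTE, SLOT_REPLAY, SLOT_ERROR };

static enum slot_action check_slot_seqid(const struct seq_slot *slot,
                                         uint32_t seqid, int *nfserr)
{
        if (seqid == slot->last_seqid + 1)
                return SLOT_EXECUTE;   /* new request: execute and cache */

        if (seqid == slot->last_seqid) {
                /* Retry: never execute again, only replay the cached
                 * reply (or return NFS4ERR_SEQ_FALSE_RETRY if the
                 * retried compound does not match the cached one). */
                if (!slot->reply_cached) {
                        *nfserr = NFS4ERR_RETRY_UNCACHED_REP;
                        return SLOT_ERROR;
                }
                return SLOT_REPLAY;
        }

        *nfserr = NFS4ERR_SEQ_MISORDERED;      /* anything else */
        return SLOT_ERROR;
}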

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-11 15:13:13

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Mon, Mar 11, 2019 at 11:07 AM Olga Kornievskaia <[email protected]> wrote:
>
> On Mon, Mar 11, 2019 at 10:30 AM Trond Myklebust
> <[email protected]> wrote:
> >
> > Hi Olga,
> >
> > On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> > > There are a bunch of cases where multiple operations are using the
> > > same seqid and slot.
> > >
> > > Example of such weirdness is (nfs.seqid == 0x000002f4) && (nfs.slotid
> > > == 0) and the one leading to the hang.
> > >
> > > In frame 415870, there is an OPEN using that seqid and slot for the
> > > first time (but this slot will be re-used a bunch of times before it
> > > gets a reply in frame 415908 with the open stateid seq=40). (also in
> > > this packet there is an example of reuse slot=1+seqid=0x000128f7 by
> > > both TEST_STATEID and OPEN but let's set that aside).
> > >
> > > In frame 415874 (in the same packet), client sends 5 opens on the
> > > SAME
> > > seqid and slot (all have distinct xids). In a ways that's end up
> > > being
> > > alright since opens are for the same file and thus reply out of the
> > > cache and the reply is ERR_DELAY. But in frame 415876, client sends
> > > again uses the same seqid and slot and in this case it's used by
> > > 3opens and a test_stateid.
> > >
> > > Client in all this mess never processes the open stateid seq=40 and
> > > keeps on resending CLOSE with seq=37 (also to note client "missed"
> > > processing seqid=38 and 39 as well. 39 probably because it was a
> > > reply
> > > on the same kind of "Reused" slot=1 and seq=0x000128f7. I haven't
> > > tracked 38 but i'm assuming it's the same). I don't know how many
> > > times but after 5mins, I see a TEST_STATEID that again uses the same
> > > seqid+slot (which gets a reply from the cache matching OPEN). Also
> > > open + close (still with seq=37) open is replied to but after this
> > > client goes into a soft lockup logs have
> > > "nfs4_schedule_state_manager:
> > > kthread_ruan: -4" over and over again . then a soft lockup.
> > >
> > > Looking back on slot 0. nfs.seqid=0x000002f3 was used in frame=415866
> > > by the TEST_STATEID. This is replied to in frame 415877 (with an
> > > ERR_DELAY). But before the client got a reply, it used the slot and
> > > the seq by frame 415874. TEST_STATEID is a synchronous and
> > > interruptible operation. I'm suspecting that somehow it was
> > > interrupted and that's who the slot was able to be re-used by the
> > > frame 415874. But how the several opens were able to get the same
> > > slot
> > > I don't know..
> >
> > Is this still true with the current linux-next? I would expect this
> > patch
> > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971
> > to change the Linux client behaviour in the above regard.
>
> Yes. I reproduced it against your origin/testing (5.0.0-rc7+) commit
> 0d1bf3407c4ae88 ("SUNRPC: allow dynamic allocation of back channel
> slots").

Let me double check that. I have reproduced the "infinite loop" of
CLOSE on upstream (I'm looking through the tracepoints from Friday).

>
>
> >
> > Cheers
> > Trond
> >
> > --
> > Trond Myklebust
> > Linux NFS client maintainer, Hammerspace
> > [email protected]
> >
> >

2019-03-11 15:14:54

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Mon, Mar 11, 2019 at 11:12 AM Trond Myklebust
<[email protected]> wrote:
>
> On Mon, 2019-03-11 at 14:30 +0000, Trond Myklebust wrote:
> > Hi Olga,
> >
> > On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> > > There are a bunch of cases where multiple operations are using the
> > > same seqid and slot.
> > >
> > > Example of such weirdness is (nfs.seqid == 0x000002f4) &&
> > > (nfs.slotid
> > > == 0) and the one leading to the hang.
> > >
> > > In frame 415870, there is an OPEN using that seqid and slot for the
> > > first time (but this slot will be re-used a bunch of times before
> > > it
> > > gets a reply in frame 415908 with the open stateid seq=40). (also
> > > in
> > > this packet there is an example of reuse slot=1+seqid=0x000128f7 by
> > > both TEST_STATEID and OPEN but let's set that aside).
> > >
> > > In frame 415874 (in the same packet), client sends 5 opens on the
> > > SAME
> > > seqid and slot (all have distinct xids). In a ways that's end up
> > > being
> > > alright since opens are for the same file and thus reply out of the
> > > cache and the reply is ERR_DELAY. But in frame 415876, client sends
> > > again uses the same seqid and slot and in this case it's used by
> > > 3opens and a test_stateid.
>
> This should result in exactly 1 bump of the stateid seqid.
>
> > > Client in all this mess never processes the open stateid seq=40 and
> > > keeps on resending CLOSE with seq=37 (also to note client "missed"
> > > processing seqid=38 and 39 as well. 39 probably because it was a
> > > reply
> > > on the same kind of "Reused" slot=1 and seq=0x000128f7. I haven't
> > > tracked 38 but i'm assuming it's the same). I don't know how many
> > > times but after 5mins, I see a TEST_STATEID that again uses the
> > > same
> > > seqid+slot (which gets a reply from the cache matching OPEN). Also
> > > open + close (still with seq=37) open is replied to but after this
> > > client goes into a soft lockup logs have
> > > "nfs4_schedule_state_manager:
> > > kthread_ruan: -4" over and over again . then a soft lockup.
> > >
> > > Looking back on slot 0. nfs.seqid=0x000002f3 was used in
> > > frame=415866
> > > by the TEST_STATEID. This is replied to in frame 415877 (with an
> > > ERR_DELAY). But before the client got a reply, it used the slot and
> > > the seq by frame 415874. TEST_STATEID is a synchronous and
> > > interruptible operation. I'm suspecting that somehow it was
> > > interrupted and that's who the slot was able to be re-used by the
> > > frame 415874. But how the several opens were able to get the same
> > > slot
> > > I don't know..
> >
> > Is this still true with the current linux-next? I would expect this
> > patch
> > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971
> > to change the Linux client behaviour in the above regard.
> >
>
>
> Note also that what you describe would appear to indicate a serious
> server bug. If the client is reusing a slot+seqid, then the correct
> server behaviour is to either return one of the errors NFS4ERR_DELAY,
> NFS4ERR_SEQ_FALSE_RETRY, NFS4ERR_RETRY_UNCACHED_REP,
> NFS4ERR_SEQ_MISORDERED, or else it must replay the exact same reply
> that it had cached for the original request.
>
> It is protocol violation for the server to execute new requests on a
> slot+seqid combination that has already been used. For that reason, it
> is also impossible for a replay to cause further state changes on the
> server; a replay means that the server belts out the exact same reply
> that was attempted sent earlier with no changes (with stateids that
> match bit for bit what would have been sent earlier).
>

But it is the same request being repeated, because all of them are
opens of the same file, with everything else the same.

> --
> Trond Myklebust
> Linux NFS client maintainer, Hammerspace
> [email protected]
>
>

2019-03-11 15:29:02

by Trond Myklebust

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Mon, 2019-03-11 at 11:14 -0400, Olga Kornievskaia wrote:
> On Mon, Mar 11, 2019 at 11:12 AM Trond Myklebust
> <[email protected]> wrote:
> > On Mon, 2019-03-11 at 14:30 +0000, Trond Myklebust wrote:
> > > Hi Olga,
> > >
> > > On Sun, 2019-03-10 at 18:20 -0400, Olga Kornievskaia wrote:
> > > > There are a bunch of cases where multiple operations are using
> > > > the
> > > > same seqid and slot.
> > > >
> > > > Example of such weirdness is (nfs.seqid == 0x000002f4) &&
> > > > (nfs.slotid
> > > > == 0) and the one leading to the hang.
> > > >
> > > > In frame 415870, there is an OPEN using that seqid and slot for
> > > > the
> > > > first time (but this slot will be re-used a bunch of times
> > > > before
> > > > it
> > > > gets a reply in frame 415908 with the open stateid
> > > > seq=40). (also
> > > > in
> > > > this packet there is an example of reuse
> > > > slot=1+seqid=0x000128f7 by
> > > > both TEST_STATEID and OPEN but let's set that aside).
> > > >
> > > > In frame 415874 (in the same packet), client sends 5 opens on
> > > > the
> > > > SAME
> > > > seqid and slot (all have distinct xids). In a ways that's end
> > > > up
> > > > being
> > > > alright since opens are for the same file and thus reply out of
> > > > the
> > > > cache and the reply is ERR_DELAY. But in frame 415876, client
> > > > sends
> > > > again uses the same seqid and slot and in this case it's used
> > > > by
> > > > 3opens and a test_stateid.
> >
> > This should result in exactly 1 bump of the stateid seqid.
> >
> > > > Client in all this mess never processes the open stateid seq=40
> > > > and
> > > > keeps on resending CLOSE with seq=37 (also to note client
> > > > "missed"
> > > > processing seqid=38 and 39 as well. 39 probably because it was
> > > > a
> > > > reply
> > > > on the same kind of "Reused" slot=1 and seq=0x000128f7. I
> > > > haven't
> > > > tracked 38 but i'm assuming it's the same). I don't know how
> > > > many
> > > > times but after 5mins, I see a TEST_STATEID that again uses the
> > > > same
> > > > seqid+slot (which gets a reply from the cache matching OPEN).
> > > > Also
> > > > open + close (still with seq=37) open is replied to but after
> > > > this
> > > > client goes into a soft lockup logs have
> > > > "nfs4_schedule_state_manager:
> > > > kthread_ruan: -4" over and over again . then a soft lockup.
> > > >
> > > > Looking back on slot 0. nfs.seqid=0x000002f3 was used in
> > > > frame=415866
> > > > by the TEST_STATEID. This is replied to in frame 415877 (with
> > > > an
> > > > ERR_DELAY). But before the client got a reply, it used the slot
> > > > and
> > > > the seq by frame 415874. TEST_STATEID is a synchronous and
> > > > interruptible operation. I'm suspecting that somehow it was
> > > > interrupted and that's who the slot was able to be re-used by
> > > > the
> > > > frame 415874. But how the several opens were able to get the
> > > > same
> > > > slot
> > > > I don't know..
> > >
> > > Is this still true with the current linux-next? I would expect
> > > this
> > > patch
> > > http://git.linux-nfs.org/?p=trondmy/linux-nfs.git;a=commitdiff;h=3453d5708b33efe76f40eca1c0ed60923094b971
> > > to change the Linux client behaviour in the above regard.
> > >
> >
> > Note also that what you describe would appear to indicate a serious
> > server bug. If the client is reusing a slot+seqid, then the correct
> > server behaviour is to either return one of the errors
> > NFS4ERR_DELAY,
> > NFS4ERR_SEQ_FALSE_RETRY, NFS4ERR_RETRY_UNCACHED_REP,
> > NFS4ERR_SEQ_MISORDERED, or else it must replay the exact same reply
> > that it had cached for the original request.
> >
> > It is protocol violation for the server to execute new requests on
> > a
> > slot+seqid combination that has already been used. For that reason,
> > it
> > is also impossible for a replay to cause further state changes on
> > the
> > server; a replay means that the server belts out the exact same
> > reply
> > that was attempted sent earlier with no changes (with stateids that
> > match bit for bit what would have been sent earlier).
> >
>
> But it is the same requests because all of them are opens on the same
> file same everything.

That is irrelevant. The whole point of the session slot mechanism is to
provide reliable exactly-once semantics, as defined in section 2.10.6 of
RFC 5661: https://tools.ietf.org/html/rfc5661#section-2.10.6

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-15 06:30:57

by Jiufei Xue

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Olga,

On 2019/3/11 11:13 PM, Olga Kornievskaia wrote:
> Let me double check that. I have reproduced the "infinite loop" or
> CLOSE on the upstream (I'm looking thru the trace points from friday).

Did you try to capture the packets when you reproduced this issue on
upstream? I still lose packets in the kernel capture after some
adjustments following bfields' suggestion :(

Thanks,
Jiufei

2019-03-15 20:34:10

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Fri, Mar 15, 2019 at 2:31 AM Jiufei Xue <[email protected]> wrote:
>
> Hi Olga,
>
> On 2019/3/11 下午11:13, Olga Kornievskaia wrote:
> > Let me double check that. I have reproduced the "infinite loop" or
> > CLOSE on the upstream (I'm looking thru the trace points from friday).
>
> Do you try to capture the packages when reproduced this issue on the
> upstream. I still lost kernel packages after some adjustment according
> to bfield's suggestion :(

Hi Jiufei,

Yes, I have network trace captures, but they are too big to post to
the mailing list. I have reproduced the problem on the latest upstream
origin/testing branch, at commit "SUNRPC: Take the transport send lock
before binding+connecting". As you noted before, the infinite loop is
due to the client "losing" an update to the seqid.

One packet sends out a (recovery) OPEN with slot=0 seqid=Y. The
nfs4_open_file tracepoint logs status=ERESTARTSYS. The rpc task is
still sent, and it eventually receives a reply, but there is nobody
left to process it... That OPEN reply carries an updated stateid seqid
which the client never records. When CLOSE is sent, it goes out with
the "old" stateid, and that puts the client in an infinite loop. Btw,
the CLOSE is sent on the interrupted slot, which should get
FALSE_RETRY and cause the client to terminate the session, but it
would still keep sending the CLOSE with the old stateid.
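
The resulting loop looks something like this (purely illustrative,
compilable toy code rather than the actual nfs4_close_done() logic;
the numbers are the ones from my trace):

#include <stdio.h>

#define NFS4ERR_OLD_STATEID 10024

static unsigned int recorded_open_seqid = 37;  /* seq=40 reply was lost */
static unsigned int server_seqid        = 40;

static int send_close(unsigned int seqid)
{
        return (seqid < server_seqid) ? -NFS4ERR_OLD_STATEID : 0;
}

int main(void)
{
        unsigned int seqid = recorded_open_seqid;
        int i;

        for (i = 0; i < 5; i++) {              /* in reality: forever */
                if (send_close(seqid) != -NFS4ERR_OLD_STATEID)
                        break;
                /* The retry path re-reads the locally recorded seqid,
                 * expecting a racing OPEN reply to have bumped it --
                 * but that update never happened. */
                seqid = recorded_open_seqid;
                printf("CLOSE retry %d with stale seqid %u\n", i + 1, seqid);
        }
        return 0;
}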

One thing I've noticed is that the TEST_STATEID op (as part of
nfs41_test_and_free_expired_stateid()) for some reason always has a
signal set even before the RPC task is issued, so the task never
completes (ever).

I always thought that OPENs can't be interrupted, but I guess they can
be, since they call rpc_wait_for_completion_task() and that wait is
killable. But I don't know how to find out what's sending a signal to
the process. I'm rather stuck here trying to figure out where to go
from there, so I'm still trying to work out what is causing the
signal, and also how to recover from it so that the client doesn't
lose that seqid.
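
(One thing I have not tried yet that might show the sender: the
signal:signal_generate tracepoint should record which task generated
the signal, along the lines of

  echo 1 > /sys/kernel/debug/tracing/events/signal/signal_generate/enable
  cat /sys/kernel/debug/tracing/trace_pipe

-- the exact tracefs path may differ depending on where debugfs is
mounted.)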

>
> Thanks,
> Jiufei

2019-03-15 20:56:28

by Trond Myklebust

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

Hi Olga,

On Fri, 2019-03-15 at 16:33 -0400, Olga Kornievskaia wrote:
> On Fri, Mar 15, 2019 at 2:31 AM Jiufei Xue <
> [email protected]> wrote:
> > Hi Olga,
> >
> > On 2019/3/11 下午11:13, Olga Kornievskaia wrote:
> > > Let me double check that. I have reproduced the "infinite loop"
> > > or
> > > CLOSE on the upstream (I'm looking thru the trace points from
> > > friday).
> >
> > Do you try to capture the packages when reproduced this issue on
> > the
> > upstream. I still lost kernel packages after some adjustment
> > according
> > to bfield's suggestion :(
>
> Hi Jiufei,
>
> Yes I have network trace captures but they are too big to post to the
> mailing list. I have reproduced the problem on the latest upstream
> origin/testing branch commit "SUNRPC: Take the transport send lock
> before binding+connecting". As you have noted before infinite loops
> is
> due to client "losing" an update to the seqid.
>
> one packet would send out an (recovery) OPEN with slot=0 seqid=Y.
> tracepoint (nfs4_open_file) would log that status=ERESTARTSYS. The
> rpc
> task would be sent and the rpc task would receive a reply but there
> is
> nobody there to receive it... This open that got a reply has an
> updated stateid seqid which client never updates. When CLOSE is sent,
> it's sent with the "old" stateid and puts the client in an infinite
> loop. Btw, CLOSE is sent on the interrupted slot which should get
> FALSE_RETRY which causes the client to terminate the session. But it
> would still keep sending the CLOSE with the old stateid.
>
> Some things I've noticed is that TEST_STATE op (as a part of the
> nfs41_test_and _free_expired_stateid()) for some reason always has a
> signal set even before issuing and RPC task so the task never
> completes (ever).
>
> I always thought that OPEN's can't be interrupted but I guess they
> are
> since they call rpc_wait_for_completion_task() and that's a killable
> event. But I don't know how to find out what's sending a signal to
> the
> process. I'm rather stuck here trying to figure out where to go from
> there. So I'm still trying to figure out what's causing the signal or
> also how to recover from it that the client doesn't lose that seqid.
>

Sending a fatal signal to a process waiting in
rpc_wait_for_completion_task() will interrupt that wait, but it will
not interrupt the running asynchronous RPC call. All it should do is
switch the call to taking the 'data->cancelled == true' path in the
RPC call cleanup.
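
Roughly, the pattern is the following (a simplified sketch, not the
actual nfs4proc.c code; the struct and function names here are made
up for illustration):

#include <linux/sunrpc/sched.h>

struct opendata {
        bool cancelled;   /* waiter gave up before the task finished */
        int  rpc_status;  /* filled in by the async ->rpc_call_done  */
};

static int wait_for_open(struct rpc_task *task, struct opendata *data)
{
        int status;

        /* A fatal signal interrupts only this wait; the asynchronous
         * RPC call itself keeps running to completion. */
        status = rpc_wait_for_completion_task(task);
        if (status != 0) {
                data->cancelled = true;
                return status;            /* e.g. -ERESTARTSYS */
        }
        return data->rpc_status;
}

/* ->rpc_release callback: runs when the async task finally finishes. */
static void open_release(void *calldata)
{
        struct opendata *data = calldata;

        if (data->cancelled) {
                /* Nobody is waiting for the result any more, so any
                 * state the server handed out (e.g. a new open
                 * stateid) has to be dealt with here. */
        }
        /* free data ... */
}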

Cheers
Trond

--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
[email protected]


2019-03-16 14:11:21

by Jiufei Xue

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323


Hi Olga,
On 2019/3/16 4:33 AM, Olga Kornievskaia wrote:
> On Fri, Mar 15, 2019 at 2:31 AM Jiufei Xue <[email protected]> wrote:
>>
>> Hi Olga,
>>
>> On 2019/3/11 下午11:13, Olga Kornievskaia wrote:
>>> Let me double check that. I have reproduced the "infinite loop" or
>>> CLOSE on the upstream (I'm looking thru the trace points from friday).
>>
>> Do you try to capture the packages when reproduced this issue on the
>> upstream. I still lost kernel packages after some adjustment according
>> to bfield's suggestion :(
>
> Hi Jiufei,
>
> Yes I have network trace captures but they are too big to post to the
> mailing list. I have reproduced the problem on the latest upstream
> origin/testing branch commit "SUNRPC: Take the transport send lock
> before binding+connecting". As you have noted before infinite loops is
> due to client "losing" an update to the seqid.
>
> one packet would send out an (recovery) OPEN with slot=0 seqid=Y.
> tracepoint (nfs4_open_file) would log that status=ERESTARTSYS. The rpc
> task would be sent and the rpc task would receive a reply but there is
> nobody there to receive it... This open that got a reply has an
> updated stateid seqid which client never updates. When CLOSE is sent,
> it's sent with the "old" stateid and puts the client in an infinite
> loop. Btw, CLOSE is sent on the interrupted slot which should get
> FALSE_RETRY which causes the client to terminate the session. But it
> would still keep sending the CLOSE with the old stateid.
>
> Some things I've noticed is that TEST_STATE op (as a part of the
> nfs41_test_and _free_expired_stateid()) for some reason always has a
> signal set even before issuing and RPC task so the task never
> completes (ever).
>
> I always thought that OPEN's can't be interrupted but I guess they are
> since they call rpc_wait_for_completion_task() and that's a killable
> event. But I don't know how to find out what's sending a signal to the
> process. I'm rather stuck here trying to figure out where to go from
> there. So I'm still trying to figure out what's causing the signal or
> also how to recover from it that the client doesn't lose that seqid.
>
>>
Thank you for your quick reply.

I have also noticed the ERESTARTSYS status for the OPEN op, but I
think it is returned by the open process that is woken in
nfs4_run_open_task(). I found that the decode routine nfs4_xdr_dec_open
returned -121, which I think is the root cause of this problem. Could
you please post the content of the last OPEN message?

Thanks,
Jiufei.



>> Thanks,
>> Jiufei
>

2019-03-19 15:33:25

by Olga Kornievskaia

[permalink] [raw]
Subject: Re: [bug report] task hang while testing xfstests generic/323

On Sat, Mar 16, 2019 at 10:11 AM Jiufei Xue
<[email protected]> wrote:
>
>
> Hi Olga,
> On 2019/3/16 上午4:33, Olga Kornievskaia wrote:
> > On Fri, Mar 15, 2019 at 2:31 AM Jiufei Xue <[email protected]> wrote:
> >>
> >> Hi Olga,
> >>
> >> On 2019/3/11 下午11:13, Olga Kornievskaia wrote:
> >>> Let me double check that. I have reproduced the "infinite loop" or
> >>> CLOSE on the upstream (I'm looking thru the trace points from friday).
> >>
> >> Do you try to capture the packages when reproduced this issue on the
> >> upstream. I still lost kernel packages after some adjustment according
> >> to bfield's suggestion :(
> >
> > Hi Jiufei,
> >
> > Yes I have network trace captures but they are too big to post to the
> > mailing list. I have reproduced the problem on the latest upstream
> > origin/testing branch commit "SUNRPC: Take the transport send lock
> > before binding+connecting". As you have noted before infinite loops is
> > due to client "losing" an update to the seqid.
> >
> > one packet would send out an (recovery) OPEN with slot=0 seqid=Y.
> > tracepoint (nfs4_open_file) would log that status=ERESTARTSYS. The rpc
> > task would be sent and the rpc task would receive a reply but there is
> > nobody there to receive it... This open that got a reply has an
> > updated stateid seqid which client never updates. When CLOSE is sent,
> > it's sent with the "old" stateid and puts the client in an infinite
> > loop. Btw, CLOSE is sent on the interrupted slot which should get
> > FALSE_RETRY which causes the client to terminate the session. But it
> > would still keep sending the CLOSE with the old stateid.
> >
> > Some things I've noticed is that TEST_STATE op (as a part of the
> > nfs41_test_and _free_expired_stateid()) for some reason always has a
> > signal set even before issuing and RPC task so the task never
> > completes (ever).
> >
> > I always thought that OPEN's can't be interrupted but I guess they are
> > since they call rpc_wait_for_completion_task() and that's a killable
> > event. But I don't know how to find out what's sending a signal to the
> > process. I'm rather stuck here trying to figure out where to go from
> > there. So I'm still trying to figure out what's causing the signal or
> > also how to recover from it that the client doesn't lose that seqid.
> >
> >>
> Thank you for you quick relpy.
>
> I have also noticed the ERESTARTSYS status for OPEN op, but I think it
> is returned by the open process which is woken in nfs4_run_open_task().
> I found that the decode routine nfs4_xdr_dec_open returned -121, which
> I thought is the root cause of this problem. Could you please post the
> content of the last OPEN message?


Hi Jiufei,

Yes, I think that is why the update isn't happening: the rpc_status
isn't 0.

Trond,

The rpc_status of the rpc tasks that were interrupted but are still
finishing can never be 0, because when they try to decode_sequence,
res->sr_slot is NULL. The SEQUENCE op is therefore not decoded, and
when the decoder then tries to decode the PUTFH it hits an unexpected
op (it expects PUTFH but SEQUENCE is still there instead).
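
To connect this back to the -121 and the earlier dmesg lines: -121 is
-EREMOTEIO, and the failure mode looks roughly like this (a standalone
illustration only, not the real nfs4xdr.c; decode_op_hdr_sketch is a
made-up name):

#include <errno.h>
#include <stdio.h>

#define OP_PUTFH    22
#define OP_SEQUENCE 53

static int decode_op_hdr_sketch(unsigned int op_in_buffer,
                                unsigned int expected)
{
        if (op_in_buffer != expected) {
                printf("nfs: Server returned operation %u but we issued a request for %u\n",
                       op_in_buffer, expected);
                return -EREMOTEIO;             /* == -121 */
        }
        return 0;
}

int main(void)
{
        /* With res->sr_slot == NULL the SEQUENCE reply is never
         * consumed, so the PUTFH decoder still sees OP_SEQUENCE: */
        int status = decode_op_hdr_sketch(OP_SEQUENCE, OP_PUTFH);

        printf("decode status: %d\n", status);  /* prints -121 */
        return 0;
}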

res->sr_slot goes away because, after the open(s) were interrupted and
_nfs4_proc_open() returned an error (interrupted), the caller goes and
frees the slot.

Would it perhaps be appropriate to only free the slot there when
!data->cancelled, and otherwise let the async RPC task continue and
free the slot itself once it is done?

How's this for a proposed fix:
Subject: [PATCH 1/1] NFSv4.1 don't free interrupted slot on open

Allow the async rpc task to finish and update the open state if needed,
then free the slot. Otherwise, the async rpc task is unable to decode
the reply.

Signed-off-by: Olga Kornievskaia <[email protected]>
---
fs/nfs/nfs4proc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 4dbb0ee..96c2499 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2933,7 +2933,8 @@ static int _nfs4_open_and_get_state(struct nfs4_opendata *opendata,
         }
 
 out:
-        nfs4_sequence_free_slot(&opendata->o_res.seq_res);
+        if (!opendata->cancelled)
+                nfs4_sequence_free_slot(&opendata->o_res.seq_res);
         return ret;
 }




>
> Thanks,
> Jiufei.
>
>
>
> >> Thanks,
> >> Jiufei
> >