2021-06-29 18:29:15

by Marciniszyn, Mike

[permalink] [raw]
Subject: NFS trace new to 5.13.0 (GA)

During our continuous integration testing on 5.13.0 kernel our testing trips on NFS testing with the following trace on the client:

[32936.156848] INFO: task kworker/9:1:519 blocked for more than 122 seconds.
[32936.165201] Tainted: G S 5.13.0 #1
[32936.171562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[32936.180773] task:kworker/9:1 state:D stack: 0 pid: 519 ppid: 2 flags:0x00004000
[32936.190565] Workqueue: events xprt_destroy_cb [sunrpc]
[32936.196854] Call Trace:
[32936.200107] __schedule+0x38e/0x8b0
[32936.204482] schedule+0x3c/0xa0
[32936.208464] schedule_timeout+0x215/0x2b0
[32936.213401] ? check_preempt_curr+0x3f/0x70
[32936.218518] ? ttwu_do_wakeup+0x17/0x140
[32936.223336] wait_for_completion+0x98/0xf0
[32936.228396] __flush_work+0x128/0x1e0
[32936.232942] ? worker_attach_to_pool+0xb0/0xb0
[32936.238351] ? work_busy+0x80/0x80
[32936.242555] __cancel_work_timer+0x110/0x1a0
[32936.247726] ? xprt_rdma_bc_destroy+0xc6/0xe0 [rpcrdma]
[32936.254034] xprt_rdma_destroy+0x15/0x50 [rpcrdma]
[32936.259873] process_one_work+0x1cb/0x360
[32936.264788] ? process_one_work+0x360/0x360
[32936.269915] worker_thread+0x30/0x370
[32936.274436] ? process_one_work+0x360/0x360
[32936.279526] kthread+0x116/0x130
[32936.283534] ? __kthread_cancel_work+0x40/0x40
[32936.288924] ret_from_fork+0x22/0x30

The same tests and same servers see no such issue from rc4 to rc7, so the failure seems new.

Any thoughts?

I'm currently rerunning rc7 just to be sure.

Mike
External recipient


2021-06-29 18:39:52

by Chuck Lever

[permalink] [raw]
Subject: Re: NFS trace new to 5.13.0 (GA)

Hi Mike-

> On Jun 29, 2021, at 2:28 PM, Marciniszyn, Mike <[email protected]> wrote:
>
> During our continuous integration testing on 5.13.0 kernel our testing trips on NFS testing with the following trace on the client:
>
> [32936.156848] INFO: task kworker/9:1:519 blocked for more than 122 seconds.
> [32936.165201] Tainted: G S 5.13.0 #1
> [32936.171562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [32936.180773] task:kworker/9:1 state:D stack: 0 pid: 519 ppid: 2 flags:0x00004000
> [32936.190565] Workqueue: events xprt_destroy_cb [sunrpc]
> [32936.196854] Call Trace:
> [32936.200107] __schedule+0x38e/0x8b0
> [32936.204482] schedule+0x3c/0xa0
> [32936.208464] schedule_timeout+0x215/0x2b0
> [32936.213401] ? check_preempt_curr+0x3f/0x70
> [32936.218518] ? ttwu_do_wakeup+0x17/0x140
> [32936.223336] wait_for_completion+0x98/0xf0
> [32936.228396] __flush_work+0x128/0x1e0
> [32936.232942] ? worker_attach_to_pool+0xb0/0xb0
> [32936.238351] ? work_busy+0x80/0x80
> [32936.242555] __cancel_work_timer+0x110/0x1a0
> [32936.247726] ? xprt_rdma_bc_destroy+0xc6/0xe0 [rpcrdma]
> [32936.254034] xprt_rdma_destroy+0x15/0x50 [rpcrdma]
> [32936.259873] process_one_work+0x1cb/0x360
> [32936.264788] ? process_one_work+0x360/0x360
> [32936.269915] worker_thread+0x30/0x370
> [32936.274436] ? process_one_work+0x360/0x360
> [32936.279526] kthread+0x116/0x130
> [32936.283534] ? __kthread_cancel_work+0x40/0x40
> [32936.288924] ret_from_fork+0x22/0x30
>
> The same tests and same servers see no such issue from rc4 to rc7, so the failure seems new.
>
> Any thoughts?
>
> I'm currently rerunning rc7 just to be sure.

The NFS server in v5.13 is afflicted by a late-breaking bug-fix
to the alloc_pages_bulk_array() API. It's been fixed in Linus'
tree, but that tree is otherwise unstable for me.

Have a look at commit 66d9282523b for a one-liner fix, it should
apply cleanly to v5.13.

--
Chuck Lever



2021-06-29 19:35:39

by Marciniszyn, Mike

[permalink] [raw]
Subject: RE: NFS trace new to 5.13.0 (GA)

> The NFS server in v5.13 is afflicted by a late-breaking bug-fix to the
> alloc_pages_bulk_array() API. It's been fixed in Linus'
> tree, but that tree is otherwise unstable for me.
>
> Have a look at commit 66d9282523b for a one-liner fix, it should apply cleanly
> to v5.13.
>

Thanks. I'm testing with 5.13 + that fix right now. Will know in the AM.

Mike
External recipient

2021-06-30 13:23:44

by Marciniszyn, Mike

[permalink] [raw]
Subject: RE: NFS trace new to 5.13.0 (GA)

> > The NFS server in v5.13 is afflicted by a late-breaking bug-fix to the
> > alloc_pages_bulk_array() API. It's been fixed in Linus'
> > tree, but that tree is otherwise unstable for me.
> >
> > Have a look at commit 66d9282523b for a one-liner fix, it should apply
> > cleanly to v5.13.
> >
>
> Thanks. I'm testing with 5.13 + that fix right now. Will know in the AM.
>

The testing completed with zero errors.

I also re-tested rc7 with similar results.

Thanks for your help,
Mike
External recipient