2022-12-23 13:07:09

by Mike Galbraith

Subject: Re: regression: nfs mount (even idle) eventually hangs server

On Fri, 2022-12-23 at 04:02 -0800, [email protected] wrote:
> Hi Mike,
>
> I think the problem is the nfsd4_state_shrinker_worker is being
> scheduled to run multiple times. This triggers the WARN_ON_ONCE
> in __queue_delayed_work.
>
> Could you try the attached patch to see if it fixes this problem?
> I tried to reproduce it on my test VMs but no success so I can't
> verify the patch.

That was a nogo.

bart:/root # grep WARNING: netconsole.log
[ 1030.364594] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1655 __queue_delayed_work+0x6a/0x90
[ 1030.364970] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1657 __queue_delayed_work+0x5a/0x90
[ 1030.365315] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.365666] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.365992] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.366333] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.366669] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.366995] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.367317] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.367636] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
[ 1030.367962] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0


2023-01-09 00:51:15

by Dai Ngo

Subject: Re: regression: nfs mount (even idle) eventually hangs server

Hi Mike,

Could you give the v2 patch a try? I will send you the location
to upload the core, if it happens, in a private email.

Thanks,
-Dai

On 12/23/22 5:05 AM, Mike Galbraith wrote:
> On Fri, 2022-12-23 at 04:02 -0800, [email protected] wrote:
>> Hi Mike,
>>
>> I think the problem is the nfsd4_state_shrinker_worker is being
>> scheduled to run multiple times. This triggers the WARN_ON_ONCE
>> in __queue_delayed_work.
>>
>> Could you try the attached patch to see if it fixes this problem?
>> I tried to reproduce it on my test VMs but no success so I can't
>> verify the patch.
> That was a nogo.
>
> bart:/root # grep WARNING: netconsole.log
> [ 1030.364594] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1655 __queue_delayed_work+0x6a/0x90
> [ 1030.364970] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1657 __queue_delayed_work+0x5a/0x90
> [ 1030.365315] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.365666] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.365992] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.366333] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.366669] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.366995] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.367317] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.367636] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
> [ 1030.367962] WARNING: CPU: 4 PID: 79 at kernel/workqueue.c:1500 __queue_work+0x33b/0x3d0
>


Attachments:
0001-PATCH-v2-NFSD-fix-WARN_ON_ONCE-in-__queue_delayed_wo.patch (7.15 kB)

2023-01-09 02:47:16

by Mike Galbraith

Subject: Re: regression: nfs mount (even idle) eventually hangs server

On Sun, 2023-01-08 at 16:50 -0800, [email protected] wrote:
> Hi Mike,

Greetings,

> Could you give the v2 patch a try? I will send you the location
> to upload the core, if it happens, in a private email.

V2 cured it. I also verified that virgin rc3 still does spew/brick.

-Mike

2023-01-09 06:35:26

by Dai Ngo

Subject: Re: regression: nfs mount (even idle) eventually hangs server


On 1/8/23 6:41 PM, Mike Galbraith wrote:
> On Sun, 2023-01-08 at 16:50 -0800, [email protected] wrote:
>> Hi Mike,
> Greetings,
>
>> Could you give the v2 patch a try? I will send you the location
>> to upload the core, if it happens, in a private email.
> V2 cured it. I also verified that virgin rc3 still does spew/brick.

Thank you much Mike!

-Dai