2023-07-08 19:13:28

by Chuck Lever III

Subject: NFS workload leaves nfsd threads in D state

Hi -

I have a "standard" test of running the git regression suite with
many threads against an NFS mount. I found that with 6.5-rc, the
test stalled and several nfsd threads on the server were stuck
in D state.

I can reproduce this stall 100% with both an xfs and an ext4
export, so I bisected with both, and both bisects landed on the
same commit:

615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
Author: Christoph Hellwig <[email protected]>
Date: Fri May 19 06:40:48 2023 +0200

blk-mq: defer to the normal submission path for post-flush requests

Requests with the FUA bit on hardware without FUA support need a post
flush before returning to the caller, but they can still be sent using
the normal I/O path after initializing the flush-related fields and
end I/O handler.

Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Bart Van Assche <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>

block/blk-flush.c | 11 +++++++++++
1 file changed, 11 insertions(+)
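
For reference, the eleven added lines extend the switch in blk_insert_flush().
Reconstructed from the commit description, the new case looks roughly like
this (a sketch, not the verbatim hunk):

	case REQ_FSEQ_DATA | REQ_FSEQ_POSTFLUSH:
		/*
		 * Initialize the flush fields and completion handler to
		 * trigger the post flush, and then just pass the command on.
		 */
		blk_rq_init_flush(rq);
		rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
		spin_lock_irq(&fq->mq_flush_lock);
		list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
		spin_unlock_irq(&fq->mq_flush_lock);
		return false;	/* caller continues on the normal submission path */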

On system 1: the exports are on top of /dev/mapper and reside on
an "INTEL SSDSC2BA400G3" SATA device.

On system 2: the exports are on top of /dev/mapper and reside on
an "INTEL SSDSC2KB240G8" SATA device.

System 1 was where I discovered the stall. System 2 is where I ran
the bisects.

The call stacks vary a little. I've seen stalls in both the WRITE
and SETATTR paths. Here's a sample from system 1:

INFO: task nfsd:1237 blocked for more than 122 seconds.
Tainted: G W 6.4.0-08699-g9e268189cb14 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:nfsd state:D stack:0 pid:1237 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x78f/0x7db
schedule+0x93/0xc8
jbd2_log_wait_commit+0xb4/0xf4
? __pfx_autoremove_wake_function+0x10/0x10
jbd2_complete_transaction+0x85/0x97
ext4_fc_commit+0x118/0x70a
? _raw_spin_unlock+0x18/0x2e
? __mark_inode_dirty+0x282/0x302
ext4_write_inode+0x94/0x121
ext4_nfs_commit_metadata+0x72/0x7d
commit_inode_metadata+0x1f/0x31 [nfsd]
commit_metadata+0x26/0x33 [nfsd]
nfsd_setattr+0x2f2/0x30e [nfsd]
nfsd_create_setattr+0x4e/0x87 [nfsd]
nfsd4_open+0x604/0x8fa [nfsd]
nfsd4_proc_compound+0x4a8/0x5e3 [nfsd]
? nfs4svc_decode_compoundargs+0x291/0x2de [nfsd]
nfsd_dispatch+0xb3/0x164 [nfsd]
svc_process_common+0x3c7/0x53a [sunrpc]
? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
svc_process+0xc6/0xe3 [sunrpc]
nfsd+0xf2/0x18c [nfsd]
? __pfx_nfsd+0x10/0x10 [nfsd]
kthread+0x10d/0x115
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2c/0x50
</TASK>

--
Chuck Lever




2023-07-09 07:35:41

by Thorsten Leemhuis

Subject: Re: NFS workload leaves nfsd threads in D state

[CCing the regression list, as it should be in the loop for regressions:
https://docs.kernel.org/admin-guide/reporting-regressions.html]

[TLDR: I'm adding this report to the list of tracked Linux kernel
regressions; the text you find below is based on a few template
paragraphs you might have encountered already in similar form.
See link in footer if these mails annoy you.]

On 08.07.23 20:30, Chuck Lever III wrote:
>
> I have a "standard" test of running the git regression suite with
> many threads against an NFS mount. I found that with 6.5-rc, the
> test stalled and several nfsd threads on the server were stuck
> in D state.
>
> I can reproduce this stall 100% with both an xfs and an ext4
> export, so I bisected with both, and both bisects landed on the
> same commit:
>
> 615939a2ae734e3e68c816d6749d1f5f79c62ab7 is the first bad commit
> commit 615939a2ae734e3e68c816d6749d1f5f79c62ab7
> Author: Christoph Hellwig <[email protected]>
> Date: Fri May 19 06:40:48 2023 +0200
>
> blk-mq: defer to the normal submission path for post-flush requests
>
> Requests with the FUA bit on hardware without FUA support need a post
> flush before returning to the caller, but they can still be sent using
> the normal I/O path after initializing the flush-related fields and
> end I/O handler.
>
> Signed-off-by: Christoph Hellwig <[email protected]>
> Reviewed-by: Bart Van Assche <[email protected]>
> Link: https://lore.kernel.org/r/[email protected]
> Signed-off-by: Jens Axboe <[email protected]>
>
> block/blk-flush.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> On system 1: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2BA400G3" SATA device.
>
> On system 2: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2KB240G8" SATA device.
>
> System 1 was where I discovered the stall. System 2 is where I ran
> the bisects.
>
> The call stacks vary a little. I've seen stalls in both the WRITE
> and SETATTR paths. Here's a sample from system 1:
>
> INFO: task nfsd:1237 blocked for more than 122 seconds.
> Tainted: G W 6.4.0-08699-g9e268189cb14 #1
> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> task:nfsd state:D stack:0 pid:1237 ppid:2 flags:0x00004000
> Call Trace:
> <TASK>
> __schedule+0x78f/0x7db
> schedule+0x93/0xc8
> jbd2_log_wait_commit+0xb4/0xf4
> ? __pfx_autoremove_wake_function+0x10/0x10
> jbd2_complete_transaction+0x85/0x97
> ext4_fc_commit+0x118/0x70a
> ? _raw_spin_unlock+0x18/0x2e
> ? __mark_inode_dirty+0x282/0x302
> ext4_write_inode+0x94/0x121
> ext4_nfs_commit_metadata+0x72/0x7d
> commit_inode_metadata+0x1f/0x31 [nfsd]
> commit_metadata+0x26/0x33 [nfsd]
> nfsd_setattr+0x2f2/0x30e [nfsd]
> nfsd_create_setattr+0x4e/0x87 [nfsd]
> nfsd4_open+0x604/0x8fa [nfsd]
> nfsd4_proc_compound+0x4a8/0x5e3 [nfsd]
> ? nfs4svc_decode_compoundargs+0x291/0x2de [nfsd]
> nfsd_dispatch+0xb3/0x164 [nfsd]
> svc_process_common+0x3c7/0x53a [sunrpc]
> ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
> svc_process+0xc6/0xe3 [sunrpc]
> nfsd+0xf2/0x18c [nfsd]
> ? __pfx_nfsd+0x10/0x10 [nfsd]
> kthread+0x10d/0x115
> ? __pfx_kthread+0x10/0x10
> ret_from_fork+0x2c/0x50
> </TASK>

Thanks for the report. To be sure the issue doesn't fall through the
cracks unnoticed, I'm adding it to regzbot, the Linux kernel regression
tracking bot:

#regzbot ^introduced 615939a2ae734e
#regzbot title blk-mq: NFS workload leaves nfsd threads in D state
#regzbot ignore-activity

This isn't a regression? This issue or a fix for it is already
being discussed somewhere else? It was fixed already? You want to clarify when
the regression started to happen? Or point out I got the title or
something else totally wrong? Then just reply and tell me -- ideally
while also telling regzbot about it, as explained by the page listed in
the footer of this mail.

Developers: When fixing the issue, remember to add 'Link:' tags pointing
to the report (the parent of this mail). See page linked in footer for
details.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.

2023-07-10 08:11:01

by Christoph Hellwig

Subject: Re: NFS workload leaves nfsd threads in D state

On Sat, Jul 08, 2023 at 06:30:26PM +0000, Chuck Lever III wrote:
> Hi -
>
> I have a "standard" test of running the git regression suite with
> many threads against an NFS mount. I found that with 6.5-rc, the
> test stalled and several nfsd threads on the server were stuck
> in D state.

Can you paste the exact reproducer here?

> I can reproduce this stall 100% with both an xfs and an ext4
> export, so I bisected with both, and both bisects landed on the
> same commit:

> On system 1: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2BA400G3" SATA device.
>
> On system 2: the exports are on top of /dev/mapper and reside on
> an "INTEL SSDSC2KB240G8" SATA device.
>
> System 1 was where I discovered the stall. System 2 is where I ran
> the bisects.

Ok. I'd be curious if this reproduces without either device mapper
or on a non-SATA device. If you have an easy way to run it in a VM
that'd be great. Otherwise I'll try to recreate it in various
setups if you post the exact reproducer.


2023-07-10 14:09:26

by Chuck Lever III

Subject: Re: NFS workload leaves nfsd threads in D state


> On Jul 10, 2023, at 3:56 AM, Christoph Hellwig <[email protected]> wrote:
>
> On Sat, Jul 08, 2023 at 06:30:26PM +0000, Chuck Lever III wrote:
>> Hi -
>>
>> I have a "standard" test of running the git regression suite with
>> many threads against an NFS mount. I found that with 6.5-rc, the
>> test stalled and several nfsd threads on the server were stuck
>> in D state.
>
> Can you paste the exact reproducer here?

It's a twisty little maze of scripts, but it does essentially this:

1. export a test filesystem on system B

2. mount that export on system A via NFS (I think I used NFSv4.1)

3. download the latest git tarball on system A

4. unpack the tarball on the test NFS mount on system A. umount / mount

5. "make -jN all docs" on system A, where N is nprocs. umount / mount

6. "make -jN test" on system A, where N is as in step 5.

(For "make test" to work, the mounted on dir on system A has to be
exactly the same for all steps).

My system A has 12 cores, and B has 4, fwiw. The network fabric
is InfiniBand, but I suspect that won't make much difference.

During step 6, the tests will slow down and then stop cold.
After another two minutes, on system B you'll start to see the
INFO splats about hung processes.

As an interesting side note, I have a btrfs filesystem on that same
mapper group and physical device. I'm not able to reproduce the problem
on that filesystem.


>> I can reproduce this stall 100% with both an xfs and an ext4
>> export, so I bisected with both, and both bisects landed on the
>> same commit:
>
>> On system 1: the exports are on top of /dev/mapper and reside on
>> an "INTEL SSDSC2BA400G3" SATA device.
>>
>> On system 2: the exports are on top of /dev/mapper and reside on
>> an "INTEL SSDSC2KB240G8" SATA device.
>>
>> System 1 was where I discovered the stall. System 2 is where I ran
>> the bisects.
>
> Ok. I'd be curious if this reproduces without either device mapper
> or on a non-SATA device. If you have an easy way to run it in a VM
> that'd be great. Otherwise I'll try to recreate it in various
> setups if you post the exact reproducer.

I have a way to test it on an xfs export backed by a pair of AIC
NVMe devices. Stand by.

--
Chuck Lever



2023-07-10 15:42:33

by Christoph Hellwig

Subject: Re: NFS workload leaves nfsd threads in D state

On Mon, Jul 10, 2023 at 03:10:41PM +0000, Chuck Lever III wrote:
> > I have a way to test it on an xfs export backed by a pair of AIC
> > NVMe devices. Stand by.
>
> The issue is not reproducible with PCIe NVMe devices.

Thanks. ATA is special because it can't queue flush commands, unlike
all other protocols. I'll see if I can get an ATA test setup and will
look into it ASAP.

2023-07-10 15:49:51

by Chuck Lever III

Subject: Re: NFS workload leaves nfsd threads in D state


> On Jul 10, 2023, at 10:06 AM, Chuck Lever III <[email protected]> wrote:
>
>> On Jul 10, 2023, at 3:56 AM, Christoph Hellwig <[email protected]> wrote:
>>
>> Ok. I'd be curious if this reproduces without either device mapper
>> or on a non-SATA device. If you have an easy way to run it in a VM
>> that'd be great. Otherwise I'll try to recreate it in various
>> setups if you post the exact reproducer.
>
> I have a way to test it on an xfs export backed by a pair of AIC
> NVMe devices. Stand by.

The issue is not reproducible with PCIe NVMe devices.


--
Chuck Lever



2023-07-10 18:02:19

by Chuck Lever III

Subject: Re: NFS workload leaves nfsd threads in D state


> On Jul 10, 2023, at 1:28 PM, Christoph Hellwig <[email protected]> wrote:
>
> Only found a virtualized AHCI setup so far without much success. Can
> you try this (from Chengming Zhou) in the meantime?
>
> diff --git a/block/blk-flush.c b/block/blk-flush.c
> index dba392cf22bec6..5c392a277b9eb2 100644
> --- a/block/blk-flush.c
> +++ b/block/blk-flush.c
> @@ -443,7 +443,7 @@ bool blk_insert_flush(struct request *rq)
> * the post flush, and then just pass the command on.
> */
> blk_rq_init_flush(rq);
> - rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
> + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
> spin_lock_irq(&fq->mq_flush_lock);
> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> spin_unlock_irq(&fq->mq_flush_lock);

Thanks for the quick response. No change.

--
Chuck Lever



2023-07-10 18:36:25

by Christoph Hellwig

Subject: Re: NFS workload leaves nfsd threads in D state

Only found a virtualized AHCI setup so far without much success. Can
you try this (from Chengming Zhou) in the meantime?

diff --git a/block/blk-flush.c b/block/blk-flush.c
index dba392cf22bec6..5c392a277b9eb2 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -443,7 +443,7 @@ bool blk_insert_flush(struct request *rq)
* the post flush, and then just pass the command on.
*/
blk_rq_init_flush(rq);
- rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
+ rq->flush.seq |= REQ_FSEQ_PREFLUSH;
spin_lock_irq(&fq->mq_flush_lock);
list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
spin_unlock_irq(&fq->mq_flush_lock);
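
For reference, the flush sequence flags that this one-line change toggles are
defined near the top of block/blk-flush.c; this is a sketch from a recent
tree, so double-check against the tree you are testing:

	/* FLUSH/FUA sequences */
	enum {
		REQ_FSEQ_PREFLUSH	= (1 << 0),	/* pre-flushing in progress */
		REQ_FSEQ_DATA		= (1 << 1),	/* data write in progress */
		REQ_FSEQ_POSTFLUSH	= (1 << 2),	/* post-flushing in progress */
		REQ_FSEQ_DONE		= (1 << 3),
	};

Assuming rq->flush.seq records which steps of the PREFLUSH -> DATA ->
POSTFLUSH sequence have already completed, the change marks the pre-flush
rather than the post-flush as done, so the state machine should still issue
the post flush once the data write finishes.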

2023-07-11 12:08:49

by Christoph Hellwig

Subject: Re: NFS workload leaves nfsd threads in D state

On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
> > blk_rq_init_flush(rq);
> > - rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
> > + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
> > spin_lock_irq(&fq->mq_flush_lock);
> > list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
> > spin_unlock_irq(&fq->mq_flush_lock);
>
> Thanks for the quick response. No change.

I'm a bit lost and still can't reproduce it. Below is a patch with the
only behavior differences I can find. It has two "#if 1" blocks,
which I'll need to bisect to find out which made it work (if any,
but I hope so).

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 5504719b970d59..67364e607f2d1d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2927,6 +2927,7 @@ void blk_mq_submit_bio(struct bio *bio)
struct request_queue *q = bdev_get_queue(bio->bi_bdev);
struct blk_plug *plug = blk_mq_plug(bio);
const int is_sync = op_is_sync(bio->bi_opf);
+ bool is_flush = op_is_flush(bio->bi_opf);
struct blk_mq_hw_ctx *hctx;
struct request *rq;
unsigned int nr_segs = 1;
@@ -2967,16 +2968,23 @@ void blk_mq_submit_bio(struct bio *bio)
return;
}

- if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
- return;
-
- if (plug) {
- blk_add_rq_to_plug(plug, rq);
- return;
+#if 1 /* Variant 1, the plug is holding us back */
+ if (op_is_flush(bio->bi_opf)) {
+ if (blk_insert_flush(rq))
+ return;
+ } else {
+ if (plug) {
+ blk_add_rq_to_plug(plug, rq);
+ return;
+ }
}
+#endif

hctx = rq->mq_hctx;
if ((rq->rq_flags & RQF_USE_SCHED) ||
+#if 1 /* Variant 2 (unlikely), blk_mq_try_issue_directly causes problems */
+ is_flush ||
+#endif
(hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
blk_mq_insert_request(rq, 0);
blk_mq_run_hw_queue(hctx, true);

2023-07-12 11:44:23

by Chengming Zhou

Subject: Re: NFS workload leaves nfsd threads in D state

On 2023/7/11 20:01, Christoph Hellwig wrote:
> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>> blk_rq_init_flush(rq);
>>> - rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>> + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>> spin_lock_irq(&fq->mq_flush_lock);
>>> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>> spin_unlock_irq(&fq->mq_flush_lock);
>>
>> Thanks for the quick response. No change.
>
> I'm a bit lost and still can't reproduce it. Below is a patch with the
> only behavior differences I can find. It has two "#if 1" blocks,
> which I'll need to bisect to find out which made it work (if any,
> but I hope so).

Hello,

I tried today to reproduce, but can't unfortunately.

Could you please also try the fix patch [1] from Ross Lagerwall that fixes
an I/O hang problem with recursive plug flushing?

(Since the main difference is that post-flush requests can now go into the plug.)

[1] https://lore.kernel.org/all/[email protected]/

Thanks!

>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 5504719b970d59..67364e607f2d1d 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2927,6 +2927,7 @@ void blk_mq_submit_bio(struct bio *bio)
> struct request_queue *q = bdev_get_queue(bio->bi_bdev);
> struct blk_plug *plug = blk_mq_plug(bio);
> const int is_sync = op_is_sync(bio->bi_opf);
> + bool is_flush = op_is_flush(bio->bi_opf);
> struct blk_mq_hw_ctx *hctx;
> struct request *rq;
> unsigned int nr_segs = 1;
> @@ -2967,16 +2968,23 @@ void blk_mq_submit_bio(struct bio *bio)
> return;
> }
>
> - if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
> - return;
> -
> - if (plug) {
> - blk_add_rq_to_plug(plug, rq);
> - return;
> +#if 1 /* Variant 1, the plug is holding us back */
> + if (op_is_flush(bio->bi_opf)) {
> + if (blk_insert_flush(rq))
> + return;
> + } else {
> + if (plug) {
> + blk_add_rq_to_plug(plug, rq);
> + return;
> + }
> }
> +#endif
>
> hctx = rq->mq_hctx;
> if ((rq->rq_flags & RQF_USE_SCHED) ||
> +#if 1 /* Variant 2 (unlikely), blk_mq_try_issue_directly causes problems */
> + is_flush ||
> +#endif
> (hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
> blk_mq_insert_request(rq, 0);
> blk_mq_run_hw_queue(hctx, true);

2023-07-12 13:41:05

by Chuck Lever III

Subject: Re: NFS workload leaves nfsd threads in D state



> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <[email protected]> wrote:
>
> On 2023/7/11 20:01, Christoph Hellwig wrote:
>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>>> blk_rq_init_flush(rq);
>>>> - rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>>> + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>>> spin_lock_irq(&fq->mq_flush_lock);
>>>> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>>> spin_unlock_irq(&fq->mq_flush_lock);
>>>
>>> Thanks for the quick response. No change.
>>
>> I'm a bit lost and still can't reproduce it. Below is a patch with the
>> only behavior differences I can find. It has two "#if 1" blocks,
>> which I'll need to bisect to find out which made it work (if any,
>> but I hope so).
>
> Hello,
>
> I tried today to reproduce, but can't unfortunately.
>
> Could you please also try the fix patch [1] from Ross Lagerwall that fixes
> an I/O hang problem with recursive plug flushing?
>
> (Since the main difference is that post-flush requests can now go into the plug.)
>
> [1] https://lore.kernel.org/all/[email protected]/

Thanks for the suggestion. No change, unfortunately.


> Thanks!
>
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 5504719b970d59..67364e607f2d1d 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2927,6 +2927,7 @@ void blk_mq_submit_bio(struct bio *bio)
>> struct request_queue *q = bdev_get_queue(bio->bi_bdev);
>> struct blk_plug *plug = blk_mq_plug(bio);
>> const int is_sync = op_is_sync(bio->bi_opf);
>> + bool is_flush = op_is_flush(bio->bi_opf);
>> struct blk_mq_hw_ctx *hctx;
>> struct request *rq;
>> unsigned int nr_segs = 1;
>> @@ -2967,16 +2968,23 @@ void blk_mq_submit_bio(struct bio *bio)
>> return;
>> }
>>
>> - if (op_is_flush(bio->bi_opf) && blk_insert_flush(rq))
>> - return;
>> -
>> - if (plug) {
>> - blk_add_rq_to_plug(plug, rq);
>> - return;
>> +#if 1 /* Variant 1, the plug is holding us back */
>> + if (op_is_flush(bio->bi_opf)) {
>> + if (blk_insert_flush(rq))
>> + return;
>> + } else {
>> + if (plug) {
>> + blk_add_rq_to_plug(plug, rq);
>> + return;
>> + }
>> }
>> +#endif
>>
>> hctx = rq->mq_hctx;
>> if ((rq->rq_flags & RQF_USE_SCHED) ||
>> +#if 1 /* Variant 2 (unlikely), blk_mq_try_issue_directly causes problems */
>> + is_flush ||
>> +#endif
>> (hctx->dispatch_busy && (q->nr_hw_queues == 1 || !is_sync))) {
>> blk_mq_insert_request(rq, 0);
>> blk_mq_run_hw_queue(hctx, true);


--
Chuck Lever



2023-07-25 10:18:32

by Thorsten Leemhuis

Subject: Re: NFS workload leaves nfsd threads in D state

On 12.07.23 15:29, Chuck Lever III wrote:
>> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <[email protected]> wrote:
>> On 2023/7/11 20:01, Christoph Hellwig wrote:
>>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>>>> blk_rq_init_flush(rq);
>>>>> - rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>>>> + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>>>> spin_lock_irq(&fq->mq_flush_lock);
>>>>> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>>>> spin_unlock_irq(&fq->mq_flush_lock);
>>>>
>>>> Thanks for the quick response. No change.
>>> I'm a bit lost and still can't reproduce it. Below is a patch with the
>>> only behavior differences I can find. It has two "#if 1" blocks,
>>> which I'll need to bisect to find out which made it work (if any,
>>> but I hope so).
>>
>> I tried today to reproduce, but can't unfortunately.
>>
>> Could you please also try the fix patch [1] from Ross Lagerwall that fixes
>> an I/O hang problem with recursive plug flushing?
>>
>> (Since the main difference is that post-flush requests can now go into the plug.)
>>
>> [1] https://lore.kernel.org/all/[email protected]/
>
> Thanks for the suggestion. No change, unfortunately.

Chuck, what's the status here? This thread looks stalled, that's why I
wonder.

FWIW, I noticed a commit with a Fixes: tag for your culprit in next (see
28b24123747098 ("blk-flush: fix rq->flush.seq for post-flush
requests")). But unless I missed something you are not CCed, so I guess
that's a different issue?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

2023-07-25 13:25:06

by Chuck Lever III

Subject: Re: NFS workload leaves nfsd threads in D state



> On Jul 25, 2023, at 5:57 AM, Linux regression tracking (Thorsten Leemhuis) <[email protected]> wrote:
>
> On 12.07.23 15:29, Chuck Lever III wrote:
>>> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <[email protected]> wrote:
>>> On 2023/7/11 20:01, Christoph Hellwig wrote:
>>>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>>>>> blk_rq_init_flush(rq);
>>>>>> - rq->flush.seq |= REQ_FSEQ_POSTFLUSH;
>>>>>> + rq->flush.seq |= REQ_FSEQ_PREFLUSH;
>>>>>> spin_lock_irq(&fq->mq_flush_lock);
>>>>>> list_move_tail(&rq->flush.list, &fq->flush_data_in_flight);
>>>>>> spin_unlock_irq(&fq->mq_flush_lock);
>>>>>
>>>>> Thanks for the quick response. No change.
>>>> I'm a bit lost and still can't reproduce it. Below is a patch with the
>>>> only behavior differences I can find. It has two "#if 1" blocks,
>>>> which I'll need to bisect to find out which made it work (if any,
>>>> but I hope so).
>>>
>>> I tried today to reproduce, but can't unfortunately.
>>>
>>> Could you please also try the fix patch [1] from Ross Lagerwall that fixes
>>> an I/O hang problem with recursive plug flushing?
>>>
>>> (Since the main difference is that post-flush requests can now go into the plug.)
>>>
>>> [1] https://lore.kernel.org/all/[email protected]/
>>
>> Thanks for the suggestion. No change, unfortunately.
>
> Chuck, what's the status here? This thread looks stalled, that's why I
> wonder.
>
> FWIW, I noticed a commit with a Fixes: tag for your culprit in next (see
> 28b24123747098 ("blk-flush: fix rq->flush.seq for post-flush
> requests")). But unless I missed something you are not CCed, so I guess
> that's a different issue?

Hi Thorsten-

This issue was fixed in 6.5-rc2 by commit

9f87fc4d72f5 ("block: queue data commands from the flush state machine at the head")


--
Chuck Lever



2023-07-25 13:48:59

by Thorsten Leemhuis

Subject: Re: NFS workload leaves nfsd threads in D state

On 25.07.23 15:21, Chuck Lever III wrote:
>> On Jul 25, 2023, at 5:57 AM, Linux regression tracking (Thorsten Leemhuis) <[email protected]> wrote:
>>
>> On 12.07.23 15:29, Chuck Lever III wrote:
>>>> On Jul 12, 2023, at 7:34 AM, Chengming Zhou <[email protected]> wrote:
>>>> On 2023/7/11 20:01, Christoph Hellwig wrote:
>>>>> On Mon, Jul 10, 2023 at 05:40:42PM +0000, Chuck Lever III wrote:
>>
>> Chuck, what's the status here? This thread looks stalled, that's why I
>> wonder.
>>
>> FWIW, I noticed a commit with a Fixes: tag for your culprit in next (see
>> 28b24123747098 ("blk-flush: fix rq->flush.seq for post-flush
>> requests")). But unless I missed something you are not CCed, so I guess
>> that's a different issue?
>
> This issue was fixed in 6.5-rc2 by commit
>
> 9f87fc4d72f5 ("block: queue data commands from the flush state machine at the head")

Ahh, many thx for the update, it lacked a proper Link:/Closes: tag and
mentioned another culprit, so I had missed that!

#regzbot fix: 9f87fc4d72f5
#regzbot ignore-activity

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
That page also explains what to do if mails like this annoy you.