Hi,
Recently on torvalds master, I/O on USB flash drives started hanging
here:
task:systemd-udevd state:D stack: 0 pid: 374 ppid: 347 flags:0x00004000
Call Trace:
<TASK>
? __schedule+0x319/0x4a0
? schedule+0x77/0xa0
? io_schedule+0x43/0x60
? blk_mq_get_tag+0x175/0x2b0
? mempool_alloc+0x33/0x170
? init_wait_entry+0x30/0x30
? __blk_mq_alloc_requests+0x1b4/0x220
? blk_mq_submit_bio+0x213/0x490
? submit_bio_noacct+0x22c/0x270
? xa_load+0x48/0x80
? mpage_readahead+0x114/0x130
? blkdev_fallocate+0x170/0x170
? read_pages+0x48/0x1d0
? page_cache_ra_unbounded+0xee/0x1f0
? force_page_cache_ra+0x68/0xc0
? filemap_read+0x18c/0x9a0
? blkdev_read_iter+0x4e/0x120
? vfs_read+0x265/0x2d0
? ksys_read+0x50/0xa0
? do_syscall_64+0x62/0x90
? do_user_addr_fault+0x271/0x3c0
? asm_exc_page_fault+0x8/0x30
? exc_page_fault+0x58/0x80
? entry_SYSCALL_64_after_hwframe+0x44/0xae
</TASK>
mount(8) hangs with a similar backtrace, making the device effectively
unusable. It does not seem to affect NVMe- or SATA-attached drives. The
affected drive does not support UAS. I don't currently have UAS drives
to test with. The default I/O scheduler is set to noop.
I found that reverting 180dccb0dba4 ("blk-mq: fix tag_get wait
task can't be awakened") appears to resolve the issue.
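For reference, the revert I tested amounts to the usual steps (a
sketch; adjust the build/install commands to your own setup):

git revert 180dccb0dba4
make -j"$(nproc)" && make modules_install install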
Let me know what other information is needed.
Cheers,
Alex.
Hi Alex,

1. Please help dump this structure:

blk_mq_tags <= request_queue->blk_mq_hw_ctx->blk_mq_tags

If there is no kernel dump, please check the value of

cat /sys/block/sda/mq/0/nr_tags
(replace "sda" with the problem device)

and report how many block devices there are in total (lsblk); see the
sketch after point 3.

2. Please describe in detail how to reproduce the issue, and what type
of USB device it is.

3. Please try the attached patch and see whether the issue can still
be reproduced.
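Both pieces of information can be gathered in one go (a sketch that
walks the same sysfs path as above for every blk-mq device):

for f in /sys/block/*/mq/*/nr_tags; do
        echo "$f: $(cat "$f")"
done
lsblk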
Thanks.
On 2022/1/25 0:24, Alex Xu (Hello71) wrote:
> [snip]
BR
Laibin
Hi Alex,
Same issue here. I just spent an hour bisecting the issue and hit the
same commit you did.
If I dd from a USB card reader to /dev/null (see the command below), I
can see a few commands wake up the usb-storage thread, and then
nothing more. The user process then blocks forever, waiting for the
data it asked for.
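Concretely, something like this hangs after the first few reads ("sdX"
stands in for the card reader's device node):

dd if=/dev/sdX of=/dev/null bs=1M status=progress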
Cheers,
Daniel
Excerpts from QiuLaibin's message of January 24, 2022 11:08 pm:
> Hi Alex
>
> 1. Please help dump this structure:
>
> blk_mq_tags <= request_queue->blk_mq_hw_ctx->blk_mq_tags
I don't understand what you mean.
> If there is no kernel dump, please check the value of
>
> cat /sys/block/sda/mq/0/nr_tags
> (replace "sda" with the problem device)
The affected device returns 1. My understanding is that mq does not work
with legacy non-UAS devices.
> and report how many block devices there are in total (lsblk).
My device topology roughly looks like:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sda 8:0 0 [snip] 0 disk
├─sda1 8:1 0 [snip] 0 part
├─sda2 8:2 0 [snip] 0 part
└─sda3 8:3 0 [snip] 0 part
sdb 8:16 0 [snip] 0 disk
├─sdb1 8:17 0 [snip] 0 part
├─sdb2 8:18 0 [snip] 0 part
├─sdb3 8:19 0 [snip] 0 part
└─sdb4 8:20 0 [snip] 0 part
sdc 8:32 1 [snip] 0 disk
├─sdc1 8:33 1 [snip] 0 part
└─sdc2 8:34 1 [snip] 0 part
nvme0n1 259:0 0 [snip] 0 disk
├─nvme0n1p1 259:1 0 [snip] 0 part
└─nvme0n1p2 259:2 0 [snip] 0 part
└─root 254:0 0 [snip] 0 crypt /
> 2. Please describe in detail how to reproduce the issue,
1. Plug in the device.
2. Trigger "show blocked tasks" (SysRq-W); udev is stuck.
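("Show blocked tasks" meaning the magic SysRq 'w' trigger, e.g.

echo w > /proc/sysrq-trigger

which is where the backtrace quoted above comes from.)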
> and what type of USB device it is.
It is a cheap unbranded USB flash drive.
> 3. Please try the attached patch and see whether the issue can still be reproduced.
From a quick test, it appears to resolve the issue.
> Thanks.
Cheers,
Alex.
On 1/24/22 9:08 PM, QiuLaibin wrote:
> [snip]
Any progress on this? I strongly suspect that any QD=1 setup would
trivially show the issue, based on the reports.
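For anyone without the hardware, a QD=1 queue can be faked with
null_blk (a sketch using the module's queue_mode, submit_queues, and
hw_queue_depth parameters; untested here):

modprobe null_blk queue_mode=2 submit_queues=1 hw_queue_depth=1
dd if=/dev/nullb0 of=/dev/null bs=4k count=100000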
--
Jens Axboe
Hi
On 2022/1/26 21:38, Jens Axboe wrote:
> On 1/24/22 9:08 PM, QiuLaibin wrote:
>> [snip]
>
> Any progress on this? I strongly suspect that any QD=1 setup would
> trivially show the issue, based on the reports.
Yes, QD = 1: the issue reproduces reliably in Alex Xu's environment.
I'm trying to set up a reliable local reproducer, and I will submit
the fixed patch as soon as possible.
>
BR
Laibin