2023-02-03 07:28:32

by Zhongjie Zhu

[permalink] [raw]
Subject: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

From: Zhongjie Zhu <[email protected]>

When disconnecting a usb mass storege, if there are a lot of inodes
like 10 thousands files need to be freed, the invalidate_inodes() will
run for a loog time to freeing all inodes, this will block other worker
to run in the cpu, so mark the usb_hub workqueue to WQ_CPU_INTENSIVE to
avoid this situation.

Sometimes the flowing call stack will hanppen: the cma_alloc() will be
blocked at __flush_work() by the hub_event worker.

usb_hub worker:
Call trace:
[<ffffffc010335e84>] invalidate_inodes+0xc0/0x294
<ffffffc0103659cc>] __invalidate_device+0x44/0xbc
[<ffffffc01059efe4>] invalidate_partition+0x7c/0xac
[<ffffffc01059ed54>] del_gendisk+0x148/0x35c
[<ffffffc0107620e4>] sd_remove+0x58/0xc4
[<ffffffc01070cb3c>] device_release_driver_internal+0x184/0x248
[<ffffffc010709e40>] bus_remove_device+0xdc/0x104
[<ffffffc0107068dc>] device_del+0x2b0/0x534
[<ffffffc01075c1a8>] __scsi_remove_device+0xc8/0x14c
[<ffffffc01075b788>] scsi_forget_host+0x60/0x7c
[<ffffffc010750040>] scsi_remove_host+0x84/0x130
[<ffffffc010882d60>] usb_stor_disconnect+0x7c/0xd0
[<ffffffc010828430>] usb_unbind_interface+0xc0/0x25c
[<ffffffc01070cb3c>] device_release_driver_internal+0x184/0x248
[<ffffffc010709e40>] bus_remove_device+0xdc/0x104
[<ffffffc0107068dc>] device_del+0x2b0/0x534
[<ffffffc010825b38>] usb_disable_device+0x80/0x180
[<ffffffc010817ef0>] usb_disconnect+0x128/0x314
[<ffffffc01081c95c>] hub_event+0x99c/0x16c8
[<ffffffc01011ed94>] process_one_work+0x2e8/0x574
[<ffffffc01011f2cc>] worker_thread+0x2ac/0x5e8
[<ffffffc0101252b4>] kthread+0x14c/0x15c
[<ffffffc010083ff0>] ret_from_fork+0x10/0x18

cma_alloc():
Call trace
[<ffffffc010090c18>] __switch_to+0x244/0x264
[<ffffffc0112153e0>] __schedule+0x564/0x754
[<ffffffc011215660>] schedule+0x90/0xc4
[<ffffffc011219468>] schedule_timeout+0x4c/0x1d0
[<ffffffc011216814>] do_wait_for_common+0xf8/0x1b4
[<ffffffc0112163bc>] wait_for_completion+0x50/0x68
[<ffffffc010119d94>] __flush_work.llvm.11756653060903382828+0x270/0x324
[<ffffffc0102d51e4>] drain_all_pages+0x224/0x2a8
[<ffffffc0103031fc>] start_isolate_page_range+0x1e4/0x2dc
[<ffffffc010e25624>] aml_cma_alloc_range+0xfc/0x3ec
[<ffffffc0103072e0>] cma_alloc+0x1ac/0x6d8

Signed-off-by: Zhongjie Zhu <[email protected]>
---
drivers/usb/core/hub.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 9eca403af2a8..850549b30277 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -5904,7 +5904,8 @@ int usb_hub_init(void)
* device was gone before the EHCI controller had handed its port
* over to the companion full-speed controller.
*/
- hub_wq = alloc_workqueue("usb_hub_wq", WQ_FREEZABLE, 0);
+ hub_wq = alloc_workqueue("usb_hub_wq",
+ WQ_FREEZABLE | WQ_CPU_INTENSIVE, 0);
if (hub_wq)
return 0;

--
2.34.1



2023-02-03 14:47:34

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

On Fri, Feb 03, 2023 at 03:28:19PM +0800, Zhu Zhongjie wrote:
> From: Zhongjie Zhu <[email protected]>
>
> When disconnecting a usb mass storege, if there are a lot of inodes
> like 10 thousands files need to be freed, the invalidate_inodes() will
> run for a loog time to freeing all inodes, this will block other worker
> to run in the cpu, so mark the usb_hub workqueue to WQ_CPU_INTENSIVE to
> avoid this situation.

Very infrequently this will happen. In the vast majority of cases, the
usb_hub workqueue uses very little CPU time. Marking it
WQ_CPU_INTENSIVE seems inappropriate.

Alan Stern

2023-02-06 03:36:04

by Zhongjie Zhu

[permalink] [raw]
Subject: Re: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

Yes, this is a very special case.

It will happen only when disconnecting the mass storage if there are
too many files in the storage, and the scanning operation is running,
and the file system is not unmounted.
It looks like this issue should be fixed in the usb mass storage
driver, but I don't find an appropriate place.

On Fri, Feb 3, 2023 at 10:47 PM Alan Stern <[email protected]> wrote:
>
> On Fri, Feb 03, 2023 at 03:28:19PM +0800, Zhu Zhongjie wrote:
> > From: Zhongjie Zhu <[email protected]>
> >
> > When disconnecting a usb mass storege, if there are a lot of inodes
> > like 10 thousands files need to be freed, the invalidate_inodes() will
> > run for a loog time to freeing all inodes, this will block other worker
> > to run in the cpu, so mark the usb_hub workqueue to WQ_CPU_INTENSIVE to
> > avoid this situation.
>
> Very infrequently this will happen. In the vast majority of cases, the
> usb_hub workqueue uses very little CPU time. Marking it
> WQ_CPU_INTENSIVE seems inappropriate.
>
> Alan Stern

2023-02-06 15:18:08

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> Yes, this is a very special case.
>
> It will happen only when disconnecting the mass storage if there are
> too many files in the storage, and the scanning operation is running,
> and the file system is not unmounted.
> It looks like this issue should be fixed in the usb mass storage
> driver, but I don't find an appropriate place.

That's not surprising, because usb-storage doesn't know anything about
what's happening on the mass-storage device it connects to. All it does
is send the commands that it gets from the SCSI subsystem to the device
and receive the results back. It has no idea whether there is a mounted
filesystem on the device, if the filesystem contains any files, or
whether a scanning operation is running,

A better place to look for fixing this might be the filesystem code.
That's where the information about mounting, files, and scanning can be
found.

Alan Stern

2023-02-07 02:05:42

by Zhongjie Zhu

[permalink] [raw]
Subject: Re: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

On Mon, Feb 6, 2023 at 11:17 PM Alan Stern <[email protected]> wrote:
>
> On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> > Yes, this is a very special case.
> >
> > It will happen only when disconnecting the mass storage if there are
> > too many files in the storage, and the scanning operation is running,
> > and the file system is not unmounted.
> > It looks like this issue should be fixed in the usb mass storage
> > driver, but I don't find an appropriate place.
>
> That's not surprising, because usb-storage doesn't know anything about
> what's happening on the mass-storage device it connects to. All it does
> is send the commands that it gets from the SCSI subsystem to the device
> and receive the results back. It has no idea whether there is a mounted
> filesystem on the device, if the filesystem contains any files, or
> whether a scanning operation is running,
>
> A better place to look for fixing this might be the filesystem code.
> That's where the information about mounting, files, and scanning can be
> found.
>
> Alan Stern

The problem is there is a for loop in the invalidate_inodes(), this
function is in the block device driver. when the usb_disconnect is
called, the filesystem is not umounted, userspace applications will be
noticed the usb storage is disconnected, and then do the umounting
work.
the invalidate_inodes() is called in the usb hub worker, and will run
for a long time. To fix this issue, the long running loop need to be
moved out from the usb hub worker.

2023-02-07 04:07:15

by Alan Stern

[permalink] [raw]
Subject: Re: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

On Tue, Feb 07, 2023 at 10:02:51AM +0800, Zhongjie Zhu wrote:
> On Mon, Feb 6, 2023 at 11:17 PM Alan Stern <[email protected]> wrote:
> >
> > On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> > > Yes, this is a very special case.
> > >
> > > It will happen only when disconnecting the mass storage if there are
> > > too many files in the storage, and the scanning operation is running,
> > > and the file system is not unmounted.
> > > It looks like this issue should be fixed in the usb mass storage
> > > driver, but I don't find an appropriate place.
> >
> > That's not surprising, because usb-storage doesn't know anything about
> > what's happening on the mass-storage device it connects to. All it does
> > is send the commands that it gets from the SCSI subsystem to the device
> > and receive the results back. It has no idea whether there is a mounted
> > filesystem on the device, if the filesystem contains any files, or
> > whether a scanning operation is running,
> >
> > A better place to look for fixing this might be the filesystem code.
> > That's where the information about mounting, files, and scanning can be
> > found.
> >
> > Alan Stern
>
> The problem is there is a for loop in the invalidate_inodes(), this
> function is in the block device driver. when the usb_disconnect is
> called, the filesystem is not umounted, userspace applications will be
> noticed the usb storage is disconnected, and then do the umounting
> work.
> the invalidate_inodes() is called in the usb hub worker, and will run
> for a long time. To fix this issue, the long running loop need to be
> moved out from the usb hub worker.

Oh, maybe I didn't understand.

You've got a USB mass-storage device with a mounted filesystem and a lot
of dirty inodes, right? Then a USB disconnect happens, and as part of
the disconnect processing, invalidate_inodes() runs for a long time.

Do you know why it takes so long? The I/O operations shouldn't need any
time; they will all fail immediately because the device has been
disconnected and so there is no way to communicate with it.

Alan Stern

2023-02-07 08:26:35

by Zhongjie Zhu

[permalink] [raw]
Subject: Re: [PATCH] USB: core: hub: fix usb_hub worker blocking drain_all_pages() worker issue

On Tue, Feb 7, 2023 at 12:07 PM Alan Stern <[email protected]> wrote:
>
> On Tue, Feb 07, 2023 at 10:02:51AM +0800, Zhongjie Zhu wrote:
> > On Mon, Feb 6, 2023 at 11:17 PM Alan Stern <[email protected]> wrote:
> > >
> > > On Mon, Feb 06, 2023 at 11:33:15AM +0800, 朱忠杰 wrote:
> > > > Yes, this is a very special case.
> > > >
> > > > It will happen only when disconnecting the mass storage if there are
> > > > too many files in the storage, and the scanning operation is running,
> > > > and the file system is not unmounted.
> > > > It looks like this issue should be fixed in the usb mass storage
> > > > driver, but I don't find an appropriate place.
> > >
> > > That's not surprising, because usb-storage doesn't know anything about
> > > what's happening on the mass-storage device it connects to. All it does
> > > is send the commands that it gets from the SCSI subsystem to the device
> > > and receive the results back. It has no idea whether there is a mounted
> > > filesystem on the device, if the filesystem contains any files, or
> > > whether a scanning operation is running,
> > >
> > > A better place to look for fixing this might be the filesystem code.
> > > That's where the information about mounting, files, and scanning can be
> > > found.
> > >
> > > Alan Stern
> >
> > The problem is there is a for loop in the invalidate_inodes(), this
> > function is in the block device driver. when the usb_disconnect is
> > called, the filesystem is not umounted, userspace applications will be
> > noticed the usb storage is disconnected, and then do the umounting
> > work.
> > the invalidate_inodes() is called in the usb hub worker, and will run
> > for a long time. To fix this issue, the long running loop need to be
> > moved out from the usb hub worker.
>
> Oh, maybe I didn't understand.
>
> You've got a USB mass-storage device with a mounted filesystem and a lot
> of dirty inodes, right? Then a USB disconnect happens, and as part of
> the disconnect processing, invalidate_inodes() runs for a long time.
>
> Do you know why it takes so long? The I/O operations shouldn't need any
> time; they will all fail immediately because the device has been
> disconnected and so there is no way to communicate with it.
>
> Alan Stern

Yes, invalidate_inodes() will free all the inodes related to the
supper_block, there are more
than 20 thousands inodes (some times more) need to be freed, the perf
record shows the
cpu is busy running the spin_lock and spin_unlock in the
invalidate_inodes(). The work in
this function is to free all the inodes with the super_block.
Maybe I need to find out why the spin_lock is running so much first.