2022-03-20 16:29:47

by Tetsuo Handa

Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

On 2022/03/20 2:15, Linus Torvalds wrote:
> On Fri, Mar 18, 2022 at 9:43 PM Tetsuo Handa
> <[email protected]> wrote:
>>
>> Since the flush operation synchronously waits for completion, flushing
>> kernel-global WQs (e.g. system_wq) might introduce the possibility of
>> deadlock due to unexpected locking dependencies. Tejun Heo commented that
>> it makes no sense at all to call flush_workqueue() on the shared WQs as
>> the caller has no idea what it's gonna end up waiting for.
>
> NAK on this patch for a very simple reason:
>
> static inline void flush_scheduled_work(void)
> {
> flush_workqueue(system_wq);
> }
>
> and now grep for flush_scheduled_work().

I know there are in-tree flush_scheduled_work() users.

>
> The *other* system workqueue flushes may be rare and "that subsystem
> should just be converted to do its own workqueue".
>
> But flush_scheduled_work() is literally exported as an explicit and
> separate interface,
>
> The fact that the function has a big warning in the comment above it
> doesn't change that fact. At the very least, this patch has to be
> preceded by a couple of other patches that fix a couple of subsystems
> and document "this is what you should do".

This change was on Tejun's todo list
( https://lore.kernel.org/all/YgnQGZWT%[email protected] ).
But scrubbing the existing users will take some time.

This patch is intended to

(a) prevent developers from adding more flush_workqueue() calls on
kernel-global WQs, and

(b) serve as an anchor to be referenced from individual conversion patches.

>
> Because suddenly saying "hey, we gave you this interface, now we're
> warning about it because it's going to go away" without actually
> showing how to do it instead is not acceptable.

This patch avoids emitting "WARNING:" in order not to disturb kernel testing.
This change is not as urgent as a security fix. We can wait for several release
cycles until all in-tree users update their modules. I don't want to send
blind conversion patches, because the proper fix depends on what each module
is doing. For example, commit 081bdc9fe05bb232 ("RDMA/ib_srp: Fix a
deadlock") chose simply removing the flush_workqueue(system_long_wq) call. This
change is expected to be carefully handled by each module's maintainers.

>
> And honestly, I don't personally see a good conversion. We literally
> have all those "schedule_{delayed_}work{_on}()" etc helper functions
> that are *designed* to use this system_wq. People *need* the ability
> to flush those things, even if it's only for module unload.

Users of all those "schedule_{delayed_}work{_on}()" etc. helper functions
need to be updated only if flush_workqueue() is needed. And even if
flush_workqueue() happens only upon module unload, there is still a
possibility of deadlock.

>
> So I really think this patch on its own is completely broken. It's not
> pointing to a way forward, it's just saying "don't do this" with no
> really acceptable way to not do it.

If you want to know how to convert, I can include it in the patch description.

>
> Removing flush_scheduled_work() needs to be paired with removing the
> "schedule_{delayed_}work{_on}()" helpers too.

No need to remove the "schedule_{delayed_}work{_on}()" helpers.
Those who don't need flush_workqueue() can continue using these helpers.

Those who need flush_workqueue() can convert

schedule_work(some_work);

into

queue_work(some_workqueue_for_that_module, some_work);


>
> And I don't see you having a good alternative.

What alternative are you expecting? We already have alternatives.
This change (replacing system_wq with a module-local workqueue, as
an example) is a matter of doing the following:

(1) add

some_workqueue_for_that_module = alloc_workqueue("some_name", 0, 0);

into module's __init function.

(2) add

destroy_workqueue(some_workqueue_for_that_module);

into module's __exit function.

(3) replace

schedule_work(some_work);

with

queue_work(some_workqueue_for_that_module, some_work);

throughout that module.

(4) replace

flush_scheduled_work();

with

flush_workqueue(some_workqueue_for_that_module);

throughout that module, if flush_scheduled_work() cannot be avoided.
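
As a rough sketch of steps (1)-(4) above (the "mymod" names are purely
illustrative, not taken from any real module):

```c
/* Hypothetical module showing the conversion; "mymod" names are made up. */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *mymod_wq;
static struct work_struct mymod_work;

static void mymod_work_fn(struct work_struct *work)
{
	/* ... the deferred work ... */
}

static int __init mymod_init(void)
{
	/* (1) allocate a module-local workqueue instead of using system_wq */
	mymod_wq = alloc_workqueue("mymod_wq", 0, 0);
	if (!mymod_wq)
		return -ENOMEM;
	INIT_WORK(&mymod_work, mymod_work_fn);
	/* (3) queue_work() on the local WQ instead of schedule_work() */
	queue_work(mymod_wq, &mymod_work);
	return 0;
}

static void __exit mymod_exit(void)
{
	/* (4) flush only this module's WQ instead of flush_scheduled_work() */
	flush_workqueue(mymod_wq);
	/* (2) destroy_workqueue() tears down the local WQ */
	destroy_workqueue(mymod_wq);
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("GPL");
```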

>
> So until that clear way forward exists, NAK.

We know that allocating a !WQ_MEM_RECLAIM workqueue consumes few
resources ( https://lore.kernel.org/all/YhUq70Y%[email protected] ).
This is a step-by-step conversion for cases where flush_workqueue() is
unavoidable.

Again, we already have alternatives. Why NAK?

>
> Linus


2022-03-21 21:53:58

by Tejun Heo

Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

Hello,

On Sun, Mar 20, 2022 at 03:06:30PM +0900, Tetsuo Handa wrote:
...
> What alternative are you expecting? We already have alternatives.
> This change (replacing system_wq with module's local workqueue as
> an example) is a matter of doing
>
> (1) add
>
> some_workqueue_for_that_module = alloc_workqueue("some_name", 0, 0);
>
> into module's __init function.
>
> (2) add
>
> destroy_workqueue(some_workqueue_for_that_module);
>
> into module's __exit function.
>
> (3) replace
>
> schedule_work(some_work);
>
> with
>
> queue_work(some_workqueue_for_that_module, some_work);
>
> throughout that module.
>
> (4) replace
>
> flush_scheduled_work();
>
> with
>
> flush_workqueue(some_workqueue_for_that_module);
>
> throughout that module.
>
> if flush_scheduled_work() cannot be avoided.

I'm willing to bet that the majority of the use cases can be converted to
use flush_work(), and that'd be the preference. We need a separate workqueue
iff the flush requirement is complex (e.g. there are multiple dynamic work
items in flight which need to be flushed together) or the work items need
some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
to the system_wq users in the first place.
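
A minimal sketch of that flush_work() conversion (the work item name is
hypothetical, not from any real module):

```c
#include <linux/workqueue.h>

/* "mymod_work" is an illustrative name for a work item this code owns. */
static struct work_struct mymod_work;

static void mymod_teardown(void)
{
	/*
	 * flush_work() waits only for this specific work item, so there
	 * is no dependency on unrelated work queued on system_wq and no
	 * need for a separate module-local workqueue.
	 */
	flush_work(&mymod_work);
}
```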

Thanks.

--
tejun

2022-03-21 22:14:02

by Linus Torvalds

Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

On Sat, Mar 19, 2022 at 11:06 PM Tetsuo Handa
<[email protected]> wrote:
>
> I know there are in-tree flush_scheduled_work() users.

It's not just that there are in-tree users - there are in-tree users
in CORE CODE that pretty much everybody runs.

This is not some kind of "it's some broken rare driver that we can't
test because almost nobody has the hardware".

> This patch is intended for
>
> (a) preventing developers from adding more flush_workqueue() calls on
> kernel-global WQs.

No. We're not taking stuff like this when core code has it, and will
trigger the same warning.

A warning that nobody will look at anyway, because it's just a single
line in dmesg, and because I bet there are perfectly regular machines
that will trigger the warning with existing users.

For example, I'm looking at that acpi_release_memory() case as an
example - used by ucsi_acpi_probe. I have no idea whether UCSI is
actually all that common or not, but the Kconfig docs say

"UCSI is available on most of the new Intel based systems
that are equipped with Embedded Controller and USB Type-C ports"

which makes me think it's not exactly rare.

We're not adding warnings for alleged new uses when that warning
would trigger on normal systems.

> (b) serving as an anchor to be referenced from individual patches

No. You can already reference the existing comment. There's no need to
create a bad commit just to make excuses to fix things up.

> This patch avoids emitting "WARNING:" in order not to disturb kernel testing.

See above: the patch is _pointless_.

And until normal systems don't have the warning, I don't want this
kind of pointless and wrong patch.

> If you want how to convert, I can include it into patch description.

I just want the existing users converted.

Once there are no existing users, we remove the interface.

And once the regular "flush_scheduled_work()" and friends are gone,
*THEN* we can add real verification that nobody tries to flush those
system-wide workqueues.

Linus

2022-05-12 15:37:52

by Dmitry Torokhov

Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

Hi Tejun,

On Mon, Mar 21, 2022 at 07:02:45AM -1000, Tejun Heo wrote:
> I'm willing to bet that the majority of the use cases can be converted to
> use flush_work(), and that'd be the preference. We need a separate workqueue
> iff the flush requirement is complex (e.g. there are multiple dynamic work
> items in flight which need to be flushed together) or the work items need
> some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
> to the system_wq users in the first place.

This means that now the code has to keep track of all work items that it
allocated, instead of being able to "fire and forget" work items (when dealing
with extremely infrequent events) and rely on flush_workqueue() to
clean up. That flush typically happens in the module unload path, and I
wonder if the restriction on flush_workqueue() could be relaxed to allow
calling it on unload.

Thanks.

--
Dmitry

2022-05-14 01:43:07

by Tetsuo Handa

Subject: Re: [PATCH v3 (repost)] workqueue: Warn flushing of kernel-global workqueues

On 2022/05/12 19:38, Dmitry Torokhov wrote:
> Hi Tejun,
>
> On Mon, Mar 21, 2022 at 07:02:45AM -1000, Tejun Heo wrote:
>> I'm willing to bet that the majority of the use cases can be converted to
>> use flush_work(), and that'd be the preference. We need a separate workqueue
>> iff the flush requirement is complex (e.g. there are multiple dynamic work
>> items in flight which need to be flushed together) or the work items need
>> some special attributes (such as MEM_RECLAIM or HIGHPRI) which don't apply
>> to the system_wq users in the first place.
>
> This means that now the code has to keep track of all work items that it
> allocated, instead of being able to "fire and forget" work items (when
> dealing with extremely infrequent events) and rely on flush_workqueue() to
> clean up.

Yes. Moreover, a patch to catch and reject such usage at compile time was
proposed at
https://lkml.kernel.org/r/[email protected] .

> That flush typically happens in module unload path, and I
> wonder if the restriction on flush_workqueue() could be relaxed to allow
> calling it on unload.

A patch for drivers/input/mouse/psmouse-smbus.c is waiting for your response at
https://lkml.kernel.org/r/[email protected] .
As in many modules, flush_workqueue() happens only on module unload in your case.

We currently don't have a flag to tell whether the caller is inside the module
unload path. And even inside the module unload path, flushing the system-wide
workqueue is problematic under e.g. GFP_NOFS/GFP_NOIO context. Therefore, I
don't think that being inside the module unload path makes a good exception.

Removing flush_scheduled_work() is for proactively avoiding new problems like
https://lkml.kernel.org/r/[email protected]
and https://lkml.kernel.org/r/[email protected] .

Using a local WQ also serves a documentation purpose:
this change makes clear where the work's dependencies lie.
Please grep the linux-next.git tree. Some users have already been converted.

Any chance you have too many out-of-tree modules to convert?