2022-02-14 05:01:52

by Tejun Heo

Subject: Re: [syzbot] possible deadlock in worker_thread

Hello,

On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
> + destroy_workqueue(srp_tl_err_wq);
>
> Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.

Yeah, this is the right thing to do. It makes no sense at all to call
flush_workqueue() on the shared workqueues as the caller has no idea what
it's gonna end up waiting for. It was on my todo list a long while ago but
slipped through the crack. If anyone wanna take a stab at it (including
scrubbing the existing users, of course), please be my guest.
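
As a rough sketch of the conversion being asked for here (the "foo" driver, its
work item and function names are hypothetical, not taken from this thread), a
caller that used to rely on system_wq plus flush_scheduled_work() would switch
to a private workqueue it alone queues on, so that flushing waits only for its
own work:

/* Hypothetical driver-local workqueue replacing system_wq usage. */
#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *foo_wq;

static void foo_work_fn(struct work_struct *work)
{
	/* ... actual driver work ... */
}
static DECLARE_WORK(foo_work, foo_work_fn);

static int __init foo_init(void)
{
	foo_wq = alloc_workqueue("foo_wq", 0, 0);
	if (!foo_wq)
		return -ENOMEM;
	/* was: schedule_work(&foo_work); */
	queue_work(foo_wq, &foo_work);
	return 0;
}

static void __exit foo_exit(void)
{
	/* was: flush_scheduled_work(), which also waits for unrelated work. */
	destroy_workqueue(foo_wq);	/* drains and flushes only foo_wq */
}

module_init(foo_init);
module_exit(foo_exit);
MODULE_LICENSE("GPL");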

Thanks.

--
tejun


2022-02-14 19:13:07

by Tetsuo Handa

Subject: Re: [syzbot] possible deadlock in worker_thread

On 2022/02/14 12:44, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
>> + destroy_workqueue(srp_tl_err_wq);
>>
>> Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.
>
> Yeah, this is the right thing to do. It makes no sense at all to call
> flush_workqueue() on the shared workqueues as the caller has no idea what
> it's gonna end up waiting for. It was on my todo list a long while ago but
> slipped through the crack. If anyone wanna take a stab at it (including
> scrubbing the existing users, of course), please be my guest.
>
> Thanks.
>

OK. Then I propose the patch below. If you are OK with this approach, I can
carry it in my tree as a linux-next-only experimental patch for one or two
weeks, to see whether anyone complains.

From 95a3aa8d46c8479c95672305645247ba70312113 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <[email protected]>
Date: Mon, 14 Feb 2022 22:28:21 +0900
Subject: [PATCH] workqueue: Warn on flushing system-wide workqueues

syzbot found a circular locking dependency which is caused by flushing
system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
to call flush_workqueue() on the shared workqueues as the caller has no
idea what it's gonna end up waiting for.

Although flush_scheduled_work(), which flushes the system_wq WQ, carries the
warning "Think twice before calling this function! It's very easy to get into
trouble if you don't take great care.", it is too difficult to guarantee that
all users flush system-wide WQs safely.

Therefore, let's change direction and ask developers to use their own WQs when
flushing is unavoidable. To give developers time to update their modules, for
now just emit a warning message when flush_workqueue() is called on
system-wide WQs. We will eventually convert this warning message into
WARN_ON() and kill flush_scheduled_work().

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <[email protected]>
---
kernel/workqueue.c | 33 +++++++++++++++++++++++++++++++++
1 file changed, 33 insertions(+)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..5ef40b9a1842 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,37 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
return wait;
}

+static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
+{
+#ifdef CONFIG_PROVE_LOCKING
+ static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+ const char *name;
+
+ if (wq == system_wq)
+ name = "system_wq";
+ else if (wq == system_highpri_wq)
+ name = "system_highpri_wq";
+ else if (wq == system_long_wq)
+ name = "system_long_wq";
+ else if (wq == system_unbound_wq)
+ name = "system_unbound_wq";
+ else if (wq == system_freezable_wq)
+ name = "system_freezable_wq";
+ else if (wq == system_power_efficient_wq)
+ name = "system_power_efficient_wq";
+ else if (wq == system_freezable_power_efficient_wq)
+ name = "system_freezable_power_efficient_wq";
+ else
+ return;
+ ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+ if (!__ratelimit(&flush_warn_rs))
+ return;
+ pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
+ name);
+ dump_stack();
+#endif
+}
+
/**
* flush_workqueue - ensure that any scheduled work has run to completion.
* @wq: workqueue to flush
@@ -2824,6 +2855,8 @@ void flush_workqueue(struct workqueue_struct *wq)
if (WARN_ON(!wq_online))
return;

+ warn_if_flushing_global_workqueue(wq);
+
lock_map_acquire(&wq->lockdep_map);
lock_map_release(&wq->lockdep_map);

--
2.32.0

2022-02-14 20:34:22

by Tejun Heo

Subject: Re: [syzbot] possible deadlock in worker_thread

Hello,

On Mon, Feb 14, 2022 at 10:36:57PM +0900, Tetsuo Handa wrote:
> OK. Then I propose the patch below. If you are OK with this approach, I can
> carry it in my tree as a linux-next-only experimental patch for one or two
> weeks, to see whether anyone complains.

I don't mind you testing that way, but I would much prefer this and related
changes to go in through the wq tree.

> +static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
> +{
> +#ifdef CONFIG_PROVE_LOCKING
> + static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> + const char *name;
> +
> + if (wq == system_wq)
> + name = "system_wq";
> + else if (wq == system_highpri_wq)
> + name = "system_highpri_wq";
> + else if (wq == system_long_wq)
> + name = "system_long_wq";
> + else if (wq == system_unbound_wq)
> + name = "system_unbound_wq";
> + else if (wq == system_freezable_wq)
> + name = "system_freezable_wq";
> + else if (wq == system_power_efficient_wq)
> + name = "system_power_efficient_wq";
> + else if (wq == system_freezable_power_efficient_wq)
> + name = "system_freezable_power_efficient_wq";
> + else
> + return;
> + ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> + if (!__ratelimit(&flush_warn_rs))
> + return;
> + pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
> + name);
> + dump_stack();
> +#endif

Instead of doing the above, please add a wq flag to mark system wqs and
trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.

Thanks.

--
tejun

2022-02-15 13:05:59

by Tetsuo Handa

Subject: Re: [syzbot] possible deadlock in worker_thread

On 2022/02/15 2:34, Tejun Heo wrote:
>
> Instead of doing the above, please add a wq flag to mark system wqs and
> trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.
>

Do you mean something like the patch below?
I think the above is easier to understand (for developers), because:

The above prints the variable's name (one of 'system_wq', 'system_highpri_wq',
'system_long_wq', 'system_unbound_wq', 'system_freezable_wq', 'system_power_efficient_wq'
or 'system_freezable_power_efficient_wq') together with a backtrace (which will be
translated into filename:line format), while the below prints the queue's name (one of
'events', 'events_highpri', 'events_long', 'events_unbound', 'events_freezable',
'events_power_efficient' or 'events_freezable_power_efficient'). If CONFIG_KALLSYMS_ALL=y,
we could print the variable's name using "%ps", but CONFIG_KALLSYMS_ALL is not always
enabled.

The flag must not be used on user-defined WQs, because destroy_workqueue() involves
flush_workqueue(). If this flag is mistakenly used on user-defined WQs, pointless
warnings will be printed. I didn't pass this flag when creating the system-wide WQs
because some developer might copy&paste the
system_wq = alloc_workqueue("events", 0, 0);
lines when converting.

---
include/linux/workqueue.h | 1 +
kernel/workqueue.c | 24 ++++++++++++++++++++++++
2 files changed, 25 insertions(+)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..9e33dcd417d3 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -339,6 +339,7 @@ enum {
__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
__WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
__WQ_ORDERED_EXPLICIT = 1 << 19, /* internal: alloc_ordered_workqueue() */
+ __WQ_SYSTEM_WIDE = 1 << 20, /* interbal: don't flush this workqueue */

WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */
WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..dbb9c6bb54a7 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,21 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
return wait;
}

+static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
+{
+ static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+ if (likely(!(wq->flags & __WQ_SYSTEM_WIDE)))
+ return;
+
+ ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+ if (!__ratelimit(&flush_warn_rs))
+ return;
+ pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
+ wq->name);
+ dump_stack();
+}
+
/**
* flush_workqueue - ensure that any scheduled work has run to completion.
* @wq: workqueue to flush
@@ -2824,6 +2839,8 @@ void flush_workqueue(struct workqueue_struct *wq)
if (WARN_ON(!wq_online))
return;

+ warn_if_flushing_global_workqueue(wq);
+
lock_map_acquire(&wq->lockdep_map);
lock_map_release(&wq->lockdep_map);

@@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
!system_unbound_wq || !system_freezable_wq ||
!system_power_efficient_wq ||
!system_freezable_power_efficient_wq);
+ system_wq->flags |= __WQ_SYSTEM_WIDE;
+ system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
+ system_long_wq->flags |= __WQ_SYSTEM_WIDE;
+ system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
+ system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
+ system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
+ system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
}

/**
--
2.32.0


2022-02-15 15:28:49

by Håkon Bugge

Subject: Re: [syzbot] possible deadlock in worker_thread



> On 15 Feb 2022, at 11:26, Tetsuo Handa <[email protected]> wrote:
>
> On 2022/02/15 2:34, Tejun Heo wrote:
>>
>> Instead of doing the above, please add a wq flag to mark system wqs and
>> trigger the warning that way and I'd leave it regardless of PROVE_LOCKING.
>>
>
> Do you mean something like the patch below?
> I think the above is easier to understand (for developers), because:
>
> The above prints the variable's name (one of 'system_wq', 'system_highpri_wq',
> 'system_long_wq', 'system_unbound_wq', 'system_freezable_wq', 'system_power_efficient_wq'
> or 'system_freezable_power_efficient_wq') together with a backtrace (which will be
> translated into filename:line format), while the below prints the queue's name (one of
> 'events', 'events_highpri', 'events_long', 'events_unbound', 'events_freezable',
> 'events_power_efficient' or 'events_freezable_power_efficient'). If CONFIG_KALLSYMS_ALL=y,
> we could print the variable's name using "%ps", but CONFIG_KALLSYMS_ALL is not always
> enabled.
>
> The flag must not be used on user-defined WQs, because destroy_workqueue() involves
> flush_workqueue(). If this flag is mistakenly used on user-defined WQs, pointless
> warnings will be printed. I didn't pass this flag when creating the system-wide WQs
> because some developer might copy&paste the
> system_wq = alloc_workqueue("events", 0, 0);
> lines when converting.
>
> ---
> include/linux/workqueue.h | 1 +
> kernel/workqueue.c | 24 ++++++++++++++++++++++++
> 2 files changed, 25 insertions(+)
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 7fee9b6cfede..9e33dcd417d3 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -339,6 +339,7 @@ enum {
> __WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
> __WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
> __WQ_ORDERED_EXPLICIT = 1 << 19, /* internal: alloc_ordered_workqueue() */
> + __WQ_SYSTEM_WIDE = 1 << 20, /* interbal: don't flush this workqueue */

s/interbal/internal/

>
> WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */
> WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..dbb9c6bb54a7 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2805,6 +2805,21 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
> return wait;
> }
>
> +static void warn_if_flushing_global_workqueue(struct workqueue_struct *wq)
> +{
> + static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> + if (likely(!(wq->flags & __WQ_SYSTEM_WIDE)))
> + return;
> +
> + ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> + if (!__ratelimit(&flush_warn_rs))
> + return;
> + pr_warn("Since system-wide WQ is shared, flushing system-wide WQ can introduce unexpected locking dependency. Please replace %s usage in your code with your local WQ.\n",
> + wq->name);
> + dump_stack();
> +}
> +
> /**
> * flush_workqueue - ensure that any scheduled work has run to completion.
> * @wq: workqueue to flush
> @@ -2824,6 +2839,8 @@ void flush_workqueue(struct workqueue_struct *wq)
> if (WARN_ON(!wq_online))
> return;
>
> + warn_if_flushing_global_workqueue(wq);
> +
> lock_map_acquire(&wq->lockdep_map);
> lock_map_release(&wq->lockdep_map);
>
> @@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
> !system_unbound_wq || !system_freezable_wq ||
> !system_power_efficient_wq ||
> !system_freezable_power_efficient_wq);
> + system_wq->flags |= __WQ_SYSTEM_WIDE;
> + system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
> + system_long_wq->flags |= __WQ_SYSTEM_WIDE;
> + system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
> + system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
> + system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
> + system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;

Better to OR this in, in the alloc_workqueue() call? Perceive the notion of an opaque object?


Thxs, Håkon

> }
>
> /**
> --
> 2.32.0
>
>

2022-02-15 20:02:19

by Tetsuo Handa

Subject: Re: [syzbot] possible deadlock in worker_thread

On 2022/02/15 19:43, Haakon Bugge wrote:
>> @@ -6070,6 +6087,13 @@ void __init workqueue_init_early(void)
>> !system_unbound_wq || !system_freezable_wq ||
>> !system_power_efficient_wq ||
>> !system_freezable_power_efficient_wq);
>> + system_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_highpri_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_long_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_unbound_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_freezable_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
>> + system_freezable_power_efficient_wq->flags |= __WQ_SYSTEM_WIDE;
>
> Better to OR this in, in the alloc_workqueue() call? Perceive the notion of an opaque object?
>

I do not want to do something like

- system_wq = alloc_workqueue("events", 0, 0);
+ system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);

because the intent of this change is to ask developers to create their own WQs.
If I pass __WQ_SYSTEM_WIDE to alloc_workqueue(), developers might mistakenly
create something like

srp_tl_err_wq = alloc_workqueue("srp_tl_err_wq", __WQ_SYSTEM_WIDE, 0);

because of the

system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);

line. __WQ_SYSTEM_WIDE is meant to be applied only to the 'system_wq',
'system_highpri_wq', 'system_long_wq', 'system_unbound_wq', 'system_freezable_wq',
'system_power_efficient_wq' and 'system_freezable_power_efficient_wq' WQs, in order
to avoid calling flush_workqueue() on these system-wide WQs.

I wish I could define __WQ_SYSTEM_WIDE inside kernel/workqueue_internal.h, but
it seems that kernel/workqueue_internal.h does not define internal flags.

2022-02-16 01:46:53

by Bart Van Assche

Subject: Re: [syzbot] possible deadlock in worker_thread

On 2/15/22 04:48, Tetsuo Handa wrote:
> I do not want to do something like
>
> - system_wq = alloc_workqueue("events", 0, 0);
> + system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
>
> because the intent of this change is to ask developers to create their own WQs.

I want more developers to use the system-wide workqueues since that
reduces memory usage. That matters for embedded devices running Linux.

Thanks,

Bart.

2022-02-16 07:32:08

by Tetsuo Handa

Subject: Re: [syzbot] possible deadlock in worker_thread

On 2022/02/16 2:05, Bart Van Assche wrote:
> On 2/15/22 04:48, Tetsuo Handa wrote:
>> I do not want to do something like
>>
>> -    system_wq = alloc_workqueue("events", 0, 0);
>> +    system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
>>
>> because the intent of this change is to ask developers to create their own WQs.
>
> I want more developers to use the system-wide workqueues since that reduces memory usage. That matters for embedded devices running Linux.

Reserving a kernel thread for a WQ_MEM_RECLAIM WQ might consume some memory,
but I don't think that creating a !WQ_MEM_RECLAIM WQ consumes much memory.
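
To illustrate that distinction (the "foo" names here are made up), only the
WQ_MEM_RECLAIM variant reserves a dedicated rescuer kthread; a plain workqueue
is little more than a small front-end structure over the shared worker pools:

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *foo_wq, *foo_reclaim_wq;

static int __init foo_init(void)
{
	/* No rescuer thread; only the workqueue bookkeeping is allocated. */
	foo_wq = alloc_workqueue("foo", 0, 0);

	/* WQ_MEM_RECLAIM additionally spawns a "foo_reclaim" rescuer kthread
	 * to guarantee forward progress under memory pressure. */
	foo_reclaim_wq = alloc_workqueue("foo_reclaim", WQ_MEM_RECLAIM, 0);

	if (!foo_wq || !foo_reclaim_wq)
		return -ENOMEM;
	return 0;
}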

2022-02-17 13:43:12

by Fabio M. De Francesco

Subject: Re: [syzbot] possible deadlock in worker_thread

On Monday, 14 February 2022 at 04:44:25 CET, Tejun Heo wrote:
> Hello,
>
> On Mon, Feb 14, 2022 at 10:08:00AM +0900, Tetsuo Handa wrote:
> > + destroy_workqueue(srp_tl_err_wq);
> >
> > Then, we can call WARN_ON() if e.g. flush_workqueue() is called on system-wide workqueues.
>
> Yeah, this is the right thing to do. It makes no sense at all to call
> flush_workqueue() on the shared workqueues as the caller has no idea what
> it's gonna end up waiting for. It was on my todo list a long while ago but
> slipped through the crack. If anyone wanna take a stab at it (including
> scrubbing the existing users, of course), please be my guest.
>

Just to think and understand... what if the system-wide WQ were allocated as unbound
ordered (i.e., as in alloc_ordered_workqueue()) with "max_active" of one?

1) Would it solve the locks dependency problem?
2) Would it introduce performance penalties (bottlenecks)?

Greetings,

Fabio

>
> Thanks.
>
> --
> tejun
>



2022-02-17 14:49:39

by Tetsuo Handa

Subject: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues

syzbot found a circular locking dependency which is caused by flushing
system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
to call flush_workqueue() on the shared workqueues as the caller has no
idea what it's gonna end up waiting for.

Although flush_scheduled_work(), which flushes the system_wq WQ, carries the
warning "Think twice before calling this function! It's very easy to get into
trouble if you don't take great care.", it is too difficult to guarantee that
all users flush system-wide WQs safely.

Therefore, let's change direction and ask developers to use their own WQs when
flushing is unavoidable. To give developers time to update their modules, for
now just emit a warning message when flush_workqueue() or flush_work() is
called on system-wide WQs. We will eventually convert this warning message
into WARN_ON() and kill flush_scheduled_work().

Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <[email protected]>
---
Changes in v2:
Removed #ifdef CONFIG_PROVE_LOCKING=y check.
Also check flush_work() attempt.
Shorten warning message.
Introduced a public WQ_ flag, which is initially meant for use by
only system-wide WQs, but allows private WQs used by built-in modules
to use this flag for detecting unexpected flush attempts if they want.

include/linux/workqueue.h | 26 +++++++++++++------------
kernel/workqueue.c | 41 ++++++++++++++++++++++++++++-----------
2 files changed, 44 insertions(+), 23 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..4b698917b9d5 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -335,6 +335,18 @@ enum {
*/
WQ_POWER_EFFICIENT = 1 << 7,

+ /*
+ * Since flush operation synchronously waits for completion, flushing
+ * system-wide workqueues (e.g. system_wq) or a work on a system-wide
+ * workqueue might introduce possibility of deadlock due to unexpected
+ * locking dependency.
+ *
+ * This flag emits warning if flush operation is attempted. Don't set
+ * this flag on user-defined workqueues, for destroy_workqueue() will
+ * involve flush operation.
+ */
+ WQ_WARN_FLUSH_ATTEMPT = 1 << 8,
+
__WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */
__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
__WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
@@ -569,18 +581,8 @@ static inline bool schedule_work(struct work_struct *work)
* Forces execution of the kernel-global workqueue and blocks until its
* completion.
*
- * Think twice before calling this function! It's very easy to get into
- * trouble if you don't take great care. Either of the following situations
- * will lead to deadlock:
- *
- * One of the work items currently on the workqueue needs to acquire
- * a lock held by your code or its caller.
- *
- * Your code is running in the context of a work routine.
- *
- * They will be detected by lockdep when they occur, but the first might not
- * occur very often. It depends on what work items are on the workqueue and
- * what locks they need, which you have no control over.
+ * Please stop calling this function. If you need to flush, please use your
+ * own workqueue.
*
* In most situations flushing the entire workqueue is overkill; you merely
* need to know that a particular work item isn't queued and isn't running.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..8e6e64372441 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2618,6 +2618,20 @@ static int rescuer_thread(void *__rescuer)
goto repeat;
}

+static void warn_flush_attempt(struct workqueue_struct *wq)
+{
+ static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+
+ /* Use ratelimit for now in order not to flood warning messages. */
+ ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+ if (!__ratelimit(&flush_warn_rs))
+ return;
+ /* Don't use WARN_ON() for now in order not to break kernel testing. */
+ pr_warn("Please do not flush %s WQ.\n", wq->name);
+ dump_stack();
+}
+
/**
* check_flush_dependency - check for flush dependency sanity
* @target_wq: workqueue being flushed
@@ -2635,6 +2649,9 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
work_func_t target_func = target_work ? target_work->func : NULL;
struct worker *worker;

+ if (unlikely(target_wq->flags & WQ_WARN_FLUSH_ATTEMPT))
+ warn_flush_attempt(target_wq);
+
if (target_wq->flags & WQ_MEM_RECLAIM)
return;

@@ -6054,18 +6071,20 @@ void __init workqueue_init_early(void)
ordered_wq_attrs[i] = attrs;
}

- system_wq = alloc_workqueue("events", 0, 0);
- system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
- system_long_wq = alloc_workqueue("events_long", 0, 0);
- system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
+ system_wq = alloc_workqueue("events", WQ_WARN_FLUSH_ATTEMPT, 0);
+ system_highpri_wq = alloc_workqueue("events_highpri",
+ WQ_WARN_FLUSH_ATTEMPT | WQ_HIGHPRI, 0);
+ system_long_wq = alloc_workqueue("events_long", WQ_WARN_FLUSH_ATTEMPT, 0);
+ system_unbound_wq = alloc_workqueue("events_unbound", WQ_WARN_FLUSH_ATTEMPT | WQ_UNBOUND,
WQ_UNBOUND_MAX_ACTIVE);
- system_freezable_wq = alloc_workqueue("events_freezable",
- WQ_FREEZABLE, 0);
- system_power_efficient_wq = alloc_workqueue("events_power_efficient",
- WQ_POWER_EFFICIENT, 0);
- system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
- WQ_FREEZABLE | WQ_POWER_EFFICIENT,
- 0);
+ system_freezable_wq =
+ alloc_workqueue("events_freezable", WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE, 0);
+ system_power_efficient_wq =
+ alloc_workqueue("events_power_efficient",
+ WQ_WARN_FLUSH_ATTEMPT | WQ_POWER_EFFICIENT, 0);
+ system_freezable_power_efficient_wq =
+ alloc_workqueue("events_freezable_power_efficient",
+ WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
!system_unbound_wq || !system_freezable_wq ||
!system_power_efficient_wq ||
--
2.32.0


2022-02-23 00:19:12

by Tejun Heo

Subject: Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues

On Thu, Feb 17, 2022 at 08:22:30PM +0900, Tetsuo Handa wrote:
> @@ -335,6 +335,18 @@ enum {
> */
> WQ_POWER_EFFICIENT = 1 << 7,
>
> + /*
> + * Since flush operation synchronously waits for completion, flushing
> + * system-wide workqueues (e.g. system_wq) or a work on a system-wide
> + * workqueue might introduce possibility of deadlock due to unexpected
> + * locking dependency.
> + *
> + * This flag emits warning if flush operation is attempted. Don't set
> + * this flag on user-defined workqueues, for destroy_workqueue() will
> + * involve flush operation.
> + */
> + WQ_WARN_FLUSH_ATTEMPT = 1 << 8,

Maybe just __WQ_NO_FLUSH?

> __WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */
> __WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
> __WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
...
> +static void warn_flush_attempt(struct workqueue_struct *wq)
> +{
> + static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> +
> + /* Use ratelimit for now in order not to flood warning messages. */
> + ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> + if (!__ratelimit(&flush_warn_rs))
> + return;

If you're worried about spamming console while conversion is in progress, we
can just print the immediate (and maybe one more) caller with %pf and
__builtin_return_address() so that it only prints out one line.
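
Something along those lines might look like the sketch below; using the %pS
format and passing __builtin_return_address(0) down from flush_workqueue() is
my reading of the suggestion, not code from this thread:

/*
 * Sketch only: intended to be called from flush_workqueue() as
 *	warn_flush_attempt(wq, __builtin_return_address(0));
 * so that only the immediate caller is reported, on a single line.
 */
static void warn_flush_attempt(struct workqueue_struct *wq, void *caller)
{
	pr_warn("Please do not flush %s WQ (called from %pS)\n", wq->name, caller);
}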

Thanks.

--
tejun

2022-02-23 02:34:04

by Tejun Heo

Subject: Re: [syzbot] possible deadlock in worker_thread

Hello,

On Thu, Feb 17, 2022 at 01:27:08PM +0100, Fabio M. De Francesco wrote:
> Just to think and understand... what if the system-wide WQ were allocated as unbound
> ordered (i.e., as in alloc_ordered_workqueue()) with "max_active" of one?
>
> 1) Would it solve the locks dependency problem?

It'll actually make deadlocks a lot more prevalent. Some work items take more
than one work item to complete (e.g. flushing another work item directly, or
waiting for something which must be completed by something else which may
itself involve a system work item), and the system wq's max_active must be
high enough that all those chains taking place at the same time require fewer
work items than max_active.
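
A contrived example of such a chain (hypothetical work items, not taken from
the syzbot report): on an ordered workqueue, work_a occupies the only
execution slot while it waits for work_b, which therefore can never start.

#include <linux/workqueue.h>

static struct workqueue_struct *ordered_wq; /* alloc_ordered_workqueue("foo", 0) */

static void work_b_fn(struct work_struct *work)
{
	/* ... */
}
static DECLARE_WORK(work_b, work_b_fn);

static void work_a_fn(struct work_struct *work)
{
	queue_work(ordered_wq, &work_b);
	/* Deadlocks: work_b cannot run until work_a_fn returns. */
	flush_work(&work_b);
}
/* work_a is assumed to be queued elsewhere via queue_work(ordered_wq, &work_a). */
static DECLARE_WORK(work_a, work_a_fn);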

> 2) Would it introduce performance penalties (bottlenecks)?

I'd be surprised if it didn't cause at least noticeable latency increases for
some workloads.

Thanks.

--
tejun

2022-02-23 08:40:04

by Tejun Heo

Subject: Re: [syzbot] possible deadlock in worker_thread

On Tue, Feb 15, 2022 at 09:05:38AM -0800, Bart Van Assche wrote:
> On 2/15/22 04:48, Tetsuo Handa wrote:
> > I do not want to do something like
> >
> > - system_wq = alloc_workqueue("events", 0, 0);
> > + system_wq = alloc_workqueue("events", __WQ_SYSTEM_WIDE, 0);
> >
> > because the intent of this change is to ask developers to create their own WQs.
>
> I want more developers to use the system-wide workqueues since that reduces
> memory usage. That matters for embedded devices running Linux.

Each wq is just a frontend interface to backend shard pool and doesn't
consume a lot of memory. The only consumption which would matter is when
WQ_MEM_RECLAIM is specified which creates its dedicated rescuer thread to
guarantee forward progress under memory contention, but we aren't talking
about those here.

Thanks.

--
tejun

2022-02-24 00:44:04

by Tejun Heo

Subject: Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues

On Wed, Feb 23, 2022 at 10:20:47PM +0100, Marek Szyprowski wrote:
> Hi All,
>
> On 17.02.2022 12:22, Tetsuo Handa wrote:
> > syzbot found a circular locking dependency which is caused by flushing
> > system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
> > to call flush_workqueue() on the shared workqueues as the caller has no
> > idea what it's gonna end up waiting for.
> >
> > Although flush_scheduled_work(), which flushes the system_wq WQ, carries the
> > warning "Think twice before calling this function! It's very easy to get into
> > trouble if you don't take great care.", it is too difficult to guarantee that
> > all users flush system-wide WQs safely.
> >
> > Therefore, let's change direction and ask developers to use their own WQs when
> > flushing is unavoidable. To give developers time to update their modules, for
> > now just emit a warning message when flush_workqueue() or flush_work() is
> > called on system-wide WQs. We will eventually convert this warning message
> > into WARN_ON() and kill flush_scheduled_work().
> >
> > Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
> > Signed-off-by: Tetsuo Handa <[email protected]>
>
> This patch landed in linux-next (next-20220222) as commit 4a6a0ce060e4
> ("workqueue: Warn flush attempt using system-wide workqueues"). As might be
> expected, it exposed some calls that flush work. However, it also causes a
> boot failure on the Raspberry Pi 3 and 4 boards (kernel compiled from
> arm64/defconfig). In the log I see one call from deferred_probe_initcall(),
> but that one isn't critical for the boot process. The deadlock occurs when
> DRM registers the emulated framebuffer on the RPi4. The RPi3 boots a bit
> further, to the shell prompt, but then the console freezes. Reverting this
> patch on top of linux-next 'fixes' the boot.

Tetsuo, can you please revert the patch? The patch is incorrect in that it
also triggers on work item flushes, not just workqueue flushes.
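
For reference, the difference is where the check is wired up: v2 warns from
check_flush_dependency(), which also runs for flush_work(), whereas limiting
the message to whole-queue flushes would mean calling the helper from
flush_workqueue() itself, roughly as the v1 patch did (sketch only, not the
eventual fix):

	/* In flush_workqueue(), after the wq_online check: */
	if (unlikely(wq->flags & WQ_WARN_FLUSH_ATTEMPT))
		warn_flush_attempt(wq);

	/* check_flush_dependency() is left untouched, so flush_work() on a
	 * system-wide WQ no longer triggers the message. */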

Thanks.

--
tejun

2022-02-24 01:05:27

by Tetsuo Handa

Subject: Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues

On 2022/02/24 6:35, Tejun Heo wrote:
> Tetsuo, can you please revert the patch? The patch is incorrect in that it's
> triggering also on work item flushes, not just workqueue flushes.

OK. I removed these patches from my tree.

2022-02-24 01:53:38

by Marek Szyprowski

Subject: Re: [PATCH v2] workqueue: Warn flush attempt using system-wide workqueues

Hi All,

On 17.02.2022 12:22, Tetsuo Handa wrote:
> syzbot found a circular locking dependency which is caused by flushing
> system_long_wq WQ [1]. Tejun Heo commented that it makes no sense at all
> to call flush_workqueue() on the shared workqueues as the caller has no
> idea what it's gonna end up waiting for.
>
> Although flush_scheduled_work(), which flushes the system_wq WQ, carries the
> warning "Think twice before calling this function! It's very easy to get into
> trouble if you don't take great care.", it is too difficult to guarantee that
> all users flush system-wide WQs safely.
>
> Therefore, let's change direction and ask developers to use their own WQs when
> flushing is unavoidable. To give developers time to update their modules, for
> now just emit a warning message when flush_workqueue() or flush_work() is
> called on system-wide WQs. We will eventually convert this warning message
> into WARN_ON() and kill flush_scheduled_work().
>
> Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
> Signed-off-by: Tetsuo Handa <[email protected]>

This patch landed in linux-next (next-20220222) as commit 4a6a0ce060e4
("workqueue: Warn flush attempt using system-wide workqueues"). As might be
expected, it exposed some calls that flush work. However, it also causes a
boot failure on the Raspberry Pi 3 and 4 boards (kernel compiled from
arm64/defconfig). In the log I see one call from deferred_probe_initcall(),
but that one isn't critical for the boot process. The deadlock occurs when
DRM registers the emulated framebuffer on the RPi4. The RPi3 boots a bit
further, to the shell prompt, but then the console freezes. Reverting this
patch on top of linux-next 'fixes' the boot.

> ---
> Changes in v2:
> Removed #ifdef CONFIG_PROVE_LOCKING=y check.
> Also check flush_work() attempt.
> Shorten warning message.
> Introduced a public WQ_ flag, which is initially meant for use by
> only system-wide WQs, but allows private WQs used by built-in modules
> to use this flag for detecting unexpected flush attempts if they want.
>
> include/linux/workqueue.h | 26 +++++++++++++------------
> kernel/workqueue.c | 41 ++++++++++++++++++++++++++++-----------
> 2 files changed, 44 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
> index 7fee9b6cfede..4b698917b9d5 100644
> --- a/include/linux/workqueue.h
> +++ b/include/linux/workqueue.h
> @@ -335,6 +335,18 @@ enum {
> */
> WQ_POWER_EFFICIENT = 1 << 7,
>
> + /*
> + * Since flush operation synchronously waits for completion, flushing
> + * system-wide workqueues (e.g. system_wq) or a work on a system-wide
> + * workqueue might introduce possibility of deadlock due to unexpected
> + * locking dependency.
> + *
> + * This flag emits warning if flush operation is attempted. Don't set
> + * this flag on user-defined workqueues, for destroy_workqueue() will
> + * involve flush operation.
> + */
> + WQ_WARN_FLUSH_ATTEMPT = 1 << 8,
> +
> __WQ_DRAINING = 1 << 16, /* internal: workqueue is draining */
> __WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
> __WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
> @@ -569,18 +581,8 @@ static inline bool schedule_work(struct work_struct *work)
> * Forces execution of the kernel-global workqueue and blocks until its
> * completion.
> *
> - * Think twice before calling this function! It's very easy to get into
> - * trouble if you don't take great care. Either of the following situations
> - * will lead to deadlock:
> - *
> - * One of the work items currently on the workqueue needs to acquire
> - * a lock held by your code or its caller.
> - *
> - * Your code is running in the context of a work routine.
> - *
> - * They will be detected by lockdep when they occur, but the first might not
> - * occur very often. It depends on what work items are on the workqueue and
> - * what locks they need, which you have no control over.
> + * Please stop calling this function. If you need to flush, please use your
> + * own workqueue.
> *
> * In most situations flushing the entire workqueue is overkill; you merely
> * need to know that a particular work item isn't queued and isn't running.
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 33f1106b4f99..8e6e64372441 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -2618,6 +2618,20 @@ static int rescuer_thread(void *__rescuer)
> goto repeat;
> }
>
> +static void warn_flush_attempt(struct workqueue_struct *wq)
> +{
> + static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
> +
> +
> + /* Use ratelimit for now in order not to flood warning messages. */
> + ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
> + if (!__ratelimit(&flush_warn_rs))
> + return;
> + /* Don't use WARN_ON() for now in order not to break kernel testing. */
> + pr_warn("Please do not flush %s WQ.\n", wq->name);
> + dump_stack();
> +}
> +
> /**
> * check_flush_dependency - check for flush dependency sanity
> * @target_wq: workqueue being flushed
> @@ -2635,6 +2649,9 @@ static void check_flush_dependency(struct workqueue_struct *target_wq,
> work_func_t target_func = target_work ? target_work->func : NULL;
> struct worker *worker;
>
> + if (unlikely(target_wq->flags & WQ_WARN_FLUSH_ATTEMPT))
> + warn_flush_attempt(target_wq);
> +
> if (target_wq->flags & WQ_MEM_RECLAIM)
> return;
>
> @@ -6054,18 +6071,20 @@ void __init workqueue_init_early(void)
> ordered_wq_attrs[i] = attrs;
> }
>
> - system_wq = alloc_workqueue("events", 0, 0);
> - system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
> - system_long_wq = alloc_workqueue("events_long", 0, 0);
> - system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
> + system_wq = alloc_workqueue("events", WQ_WARN_FLUSH_ATTEMPT, 0);
> + system_highpri_wq = alloc_workqueue("events_highpri",
> + WQ_WARN_FLUSH_ATTEMPT | WQ_HIGHPRI, 0);
> + system_long_wq = alloc_workqueue("events_long", WQ_WARN_FLUSH_ATTEMPT, 0);
> + system_unbound_wq = alloc_workqueue("events_unbound", WQ_WARN_FLUSH_ATTEMPT | WQ_UNBOUND,
> WQ_UNBOUND_MAX_ACTIVE);
> - system_freezable_wq = alloc_workqueue("events_freezable",
> - WQ_FREEZABLE, 0);
> - system_power_efficient_wq = alloc_workqueue("events_power_efficient",
> - WQ_POWER_EFFICIENT, 0);
> - system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
> - WQ_FREEZABLE | WQ_POWER_EFFICIENT,
> - 0);
> + system_freezable_wq =
> + alloc_workqueue("events_freezable", WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE, 0);
> + system_power_efficient_wq =
> + alloc_workqueue("events_power_efficient",
> + WQ_WARN_FLUSH_ATTEMPT | WQ_POWER_EFFICIENT, 0);
> + system_freezable_power_efficient_wq =
> + alloc_workqueue("events_freezable_power_efficient",
> + WQ_WARN_FLUSH_ATTEMPT | WQ_FREEZABLE | WQ_POWER_EFFICIENT, 0);
> BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
> !system_unbound_wq || !system_freezable_wq ||
> !system_power_efficient_wq ||

Best regards
--
Marek Szyprowski, PhD
Samsung R&D Institute Poland