Greetings,
FYI, we noticed the following commit (built with gcc-9):
commit: 29bd199e4e739fe8b13203c92275bfafd97b7020 ("workqueue: Warn flush attempt using system-wide workqueues")
url: https://github.com/0day-ci/linux/commits/UPDATE-20220217-192406/Tetsuo-Handa/workqueue-Warn-on-flushing-system-wide-workqueues/20220214-213908
in testcase: trinity
version: trinity-static-i386-x86_64-f93256fb_2019-08-28
with the following parameters:
runtime: 300s
test-description: Trinity is a linux system call fuzz tester.
test-url: http://codemonkey.org.uk/projects/trinity/
on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G
caused the changes below (please refer to the attached dmesg/kmsg for the entire log/backtrace):
+-------------------------------------------------------+------------+------------+
|                                                       | bc35f7ef96 | 29bd199e4e |
+-------------------------------------------------------+------------+------------+
| boot_successes                                        | 8          | 0          |
| boot_failures                                         | 0          | 11         |
| WARNING:possible_circular_locking_dependency_detected | 0          | 11         |
+-------------------------------------------------------+------------+------------+
If you fix the issue, kindly add the following tag
Reported-by: kernel test robot <[email protected]>
[ 74.067427][ T254] WARNING: possible circular locking dependency detected
[ 74.067431][ T254] 5.16.0-06523-g29bd199e4e73 #1 Not tainted
[ 74.067435][ T254] ------------------------------------------------------
[ 74.067437][ T254] modprobe/254 is trying to acquire lock:
[ 74.067441][ T254] ffff88839d4393d8 (&pool->lock){-.-.}-{2:2}, at: __queue_work (kernel/workqueue.c:1053 kernel/workqueue.c:1464)
[ 74.067471][ T254]
[ 74.067471][ T254] but task is already holding lock:
[ 74.067473][ T254] ffffffff85201c40 (console_owner){-...}-{0:0}, at: console_unlock (kernel/printk/printk.c:2706 (discriminator 3))
[ 74.067494][ T254]
[ 74.067494][ T254] which lock already depends on the new lock.
[ 74.067494][ T254]
[ 74.067496][ T254]
[ 74.067496][ T254] the existing dependency chain (in reverse order) is:
[ 74.067498][ T254]
[ 74.067498][ T254] -> #1 (console_owner){-...}-{0:0}:
[ 74.067507][ T254] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5641 kernel/locking/lockdep.c:5604)
[ 74.067517][ T254] console_unlock (kernel/printk/printk.c:2709 (discriminator 3))
[ 74.067522][ T254] vprintk_emit (arch/x86/include/asm/preempt.h:85 kernel/printk/printk.c:2245)
[ 74.067528][ T254] _printk (kernel/printk/printk.c:2260)
[ 74.067542][ T254] check_flush_dependency.cold (kernel/workqueue.c:2614 kernel/workqueue.c:2635)
[ 74.067548][ T254] __flush_work (include/linux/lockdep.h:195 include/linux/lockdep.h:202 include/linux/lockdep.h:208 kernel/workqueue.c:2702 kernel/workqueue.c:3051 kernel/workqueue.c:3091)
[ 74.067554][ T254] schedule_on_each_cpu (kernel/workqueue.c:3113 kernel/workqueue.c:3345)
[ 74.067561][ T254] rcu_tasks_kthread (kernel/rcu/rcu.h:71 kernel/rcu/tasks.h:516)
[ 74.067569][ T254] kthread (kernel/kthread.c:327)
[ 74.067577][ T254] ret_from_fork (arch/x86/entry/entry_64.S:301)
[ 74.067586][ T254]
[ 74.067586][ T254] -> #0 (&pool->lock){-.-.}-{2:2}:
[ 74.067596][ T254] check_prev_add (kernel/locking/lockdep.c:3064)
[ 74.067603][ T254] __lock_acquire (kernel/locking/lockdep.c:3187 kernel/locking/lockdep.c:3801 kernel/locking/lockdep.c:5027)
[ 74.067608][ T254] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5641 kernel/locking/lockdep.c:5604)
[ 74.067614][ T254] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 74.067624][ T254] __queue_work (kernel/workqueue.c:1053 kernel/workqueue.c:1464)
[ 74.067629][ T254] queue_work_on (kernel/workqueue.c:1546)
[ 74.067635][ T254] soft_cursor (drivers/video/fbdev/core/softcursor.c:75)
[ 74.067644][ T254] bit_cursor (drivers/video/fbdev/core/bitblit.c:362)
[ 74.067650][ T254] hide_cursor (drivers/tty/vt/vt.c:893 drivers/tty/vt/vt.c:908)
[ 74.067663][ T254] vt_console_print (drivers/tty/vt/vt.c:3108)
[ 74.067668][ T254] console_unlock (kernel/printk/printk.c:1928 kernel/printk/printk.c:2710)
[ 74.067672][ T254] register_framebuffer (drivers/video/fbdev/core/fbmem.c:1876)
[ 74.067679][ T254] __drm_fb_helper_initial_config_and_unlock (drivers/gpu/drm/drm_fb_helper.c:1867) drm_kms_helper
[ 74.067760][ T254] drm_fbdev_client_hotplug (drivers/gpu/drm/drm_fb_helper.c:1933 drivers/gpu/drm/drm_fb_helper.c:1925 drivers/gpu/drm/drm_fb_helper.c:2435) drm_kms_helper
[ 74.067816][ T254] drm_fbdev_generic_setup (drivers/gpu/drm/drm_fb_helper.c:2522) drm_kms_helper
[ 74.067883][ T254] bochs_pci_probe.cold (drivers/gpu/drm/tiny/bochs.c:354 drivers/gpu/drm/tiny/bochs.c:430) bochs
[ 74.067898][ T254] local_pci_probe (drivers/pci/pci-driver.c:323)
[ 74.067907][ T254] pci_device_probe (drivers/pci/pci-driver.c:380 drivers/pci/pci-driver.c:405 drivers/pci/pci-driver.c:448)
[ 74.067914][ T254] really_probe (drivers/base/dd.c:748)
[ 74.067924][ T254] __driver_probe_device (drivers/base/dd.c:751)
[ 74.067930][ T254] driver_probe_device (drivers/base/dd.c:781)
[ 74.067937][ T254] __driver_attach (drivers/base/dd.c:1141)
[ 74.067943][ T254] bus_for_each_dev (drivers/base/bus.c:301)
[ 74.067950][ T254] bus_add_driver (drivers/base/bus.c:619)
[ 74.067956][ T254] driver_register (drivers/base/driver.c:171)
[ 74.067963][ T254] do_one_initcall (init/main.c:1297)
[ 74.067969][ T254] do_init_module (kernel/module.c:3734)
[ 74.067978][ T254] load_module (kernel/module.c:4133)
[ 74.067984][ T254] __do_sys_init_module (kernel/module.c:4198)
[ 74.067990][ T254] __do_fast_syscall_32 (arch/x86/entry/common.c:112 arch/x86/entry/common.c:178)
[ 74.067996][ T254] do_fast_syscall_32 (arch/x86/entry/common.c:203)
[ 74.068001][ T254] entry_SYSENTER_compat_after_hwframe (arch/x86/entry/entry_64_compat.S:141)
[ 74.068011][ T254]
[ 74.068011][ T254] other info that might help us debug this:
[ 74.068011][ T254]
[ 74.068013][ T254] Possible unsafe locking scenario:
[ 74.068013][ T254]
[ 74.068015][ T254]        CPU0                    CPU1
[ 74.068017][ T254]        ----                    ----
[ 74.068019][ T254]   lock(console_owner);
[ 74.068025][ T254]                               lock(&pool->lock);
[ 74.068030][ T254]                               lock(console_owner);
[ 74.068035][ T254]   lock(&pool->lock);
[ 74.068040][ T254]
[ 74.068040][ T254] *** DEADLOCK ***
[ 74.068040][ T254]
[ 74.068041][ T254] 6 locks held by modprobe/254:
[ 74.068046][ T254] #0: ffff88810dfea248 (&dev->mutex){....}-{3:3}, at: __driver_attach (drivers/base/dd.c:1140)
[ 74.068065][ T254] #1: ffffffff85a6cb48 (registration_lock#2){+.+.}-{3:3}, at: register_framebuffer (drivers/video/fbdev/core/fbmem.c:1491 drivers/video/fbdev/core/fbmem.c:1586 drivers/video/fbdev/core/fbmem.c:1875)
[ 74.068082][ T254] #2: ffffffff855822e0 (console_lock){+.+.}-{0:0}, at: register_framebuffer (include/linux/fb.h:638 drivers/video/fbdev/core/fbmem.c:1656 drivers/video/fbdev/core/fbmem.c:1875)
[ 74.068098][ T254] #3: ffffffff85201c40 (console_owner){-...}-{0:0}, at: console_unlock (kernel/printk/printk.c:2706 (discriminator 3))
[ 74.068115][ T254] #4: ffffffff85b0e078 (printing_lock){....}-{2:2}, at: vt_console_print (drivers/tty/vt/vt.c:3090)
[ 74.068130][ T254] #5: ffffffff855a5f80 (rcu_read_lock){....}-{1:2}, at: __queue_work (arch/x86/include/asm/preempt.h:80 include/linux/rcupdate.h:68 include/linux/rcupdate.h:692 kernel/workqueue.c:1440)
[ 74.068146][ T254]
[ 74.068146][ T254] stack backtrace:
[ 74.068149][ T254] CPU: 1 PID: 254 Comm: modprobe Not tainted 5.16.0-06523-g29bd199e4e73 #1
[ 74.068157][ T254] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[ 74.068161][ T254] Call Trace:
[ 74.068165][ T254] <TASK>
[ 74.068169][ T254] dump_stack_lvl (lib/dump_stack.c:107)
[ 74.068180][ T254] check_noncircular (kernel/locking/lockdep.c:2143)
[ 74.068187][ T254] ? stack_trace_save (kernel/stacktrace.c:123)
[ 74.068202][ T254] ? print_circular_bug+0x480/0x480
[ 74.068212][ T254] ? check_path+0x24/0x80
[ 74.068219][ T254] ? check_noncircular (kernel/locking/lockdep.c:2131)
[ 74.068228][ T254] ? save_trace (kernel/locking/lockdep.c:553)
[ 74.068242][ T254] check_prev_add (kernel/locking/lockdep.c:3064)
[ 74.068254][ T254] ? mark_lock+0xca/0x13c0
[ 74.068264][ T254] __lock_acquire (kernel/locking/lockdep.c:3187 kernel/locking/lockdep.c:3801 kernel/locking/lockdep.c:5027)
[ 74.068283][ T254] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[ 74.068291][ T254] ? lock_is_held_type (kernel/locking/lockdep.c:5380 kernel/locking/lockdep.c:5682)
[ 74.068303][ T254] ? rcu_read_lock_sched_held (include/linux/lockdep.h:283 kernel/rcu/update.c:125)
[ 74.068316][ T254] lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5641 kernel/locking/lockdep.c:5604)
[ 74.068324][ T254] ? __queue_work (kernel/workqueue.c:1053 kernel/workqueue.c:1464)
[ 74.068334][ T254] ? rcu_read_unlock (include/linux/rcupdate.h:723 (discriminator 5))
[ 74.068344][ T254] ? radix_tree_insert (lib/radix-tree.c:748)
[ 74.068358][ T254] ? rcu_read_lock_held (include/linux/lockdep.h:283 kernel/rcu/update.c:311 kernel/rcu/update.c:305)
[ 74.068366][ T254] ? rcu_read_lock_sched_held (kernel/rcu/update.c:306)
[ 74.068376][ T254] _raw_spin_lock (include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 74.068382][ T254] ? __queue_work (kernel/workqueue.c:1053 kernel/workqueue.c:1464)
[ 74.068389][ T254] __queue_work (kernel/workqueue.c:1053 kernel/workqueue.c:1464)
[ 74.068399][ T254] ? do_raw_spin_unlock (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 include/asm-generic/qspinlock.h:28 kernel/locking/spinlock_debug.c:100 kernel/locking/spinlock_debug.c:140)
[ 74.068412][ T254] queue_work_on (kernel/workqueue.c:1546)
[ 74.068422][ T254] soft_cursor (drivers/video/fbdev/core/softcursor.c:75)
[ 74.068435][ T254] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4885)
[ 74.068442][ T254] ? lock_is_held_type (kernel/locking/lockdep.c:5380 kernel/locking/lockdep.c:5682)
[ 74.068452][ T254] bit_cursor (drivers/video/fbdev/core/bitblit.c:362)
[ 74.068467][ T254] ? bit_putcs (drivers/video/fbdev/core/bitblit.c:223)
[ 74.068474][ T254] ? lock_acquire (kernel/locking/lockdep.c:438 kernel/locking/lockdep.c:5641 kernel/locking/lockdep.c:5604)
[ 74.068487][ T254] ? get_color (drivers/video/fbdev/core/fbcon.c:287)
[ 74.068494][ T254] ? wait_for_xmitr (drivers/tty/serial/8250/8250_port.c:2070)
[ 74.068507][ T254] ? bit_putcs (drivers/video/fbdev/core/bitblit.c:223)
[ 74.068514][ T254] ? fbcon_cursor (drivers/video/fbdev/core/fbcon.c:1320)
[ 74.068524][ T254] hide_cursor (drivers/tty/vt/vt.c:893 drivers/tty/vt/vt.c:908)
[ 74.068535][ T254] vt_console_print (drivers/tty/vt/vt.c:3108)
[ 74.068543][ T254] ? serial8250_config_port (drivers/tty/serial/8250/8250_port.c:3339)
[ 74.068552][ T254] ? lock_downgrade (kernel/locking/lockdep.c:5647)
[ 74.068560][ T254] ? do_raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 kernel/locking/spinlock_debug.c:115)
[ 74.068568][ T254] ? vc_con_write_normal (drivers/tty/vt/vt.c:3079)
[ 74.068574][ T254] ? rwlock_bug+0xc0/0xc0
[ 74.068580][ T254] ? prb_read_valid (kernel/printk/printk_ringbuffer.c:1928)
[ 74.068589][ T254] ? prb_final_commit (kernel/printk/printk_ringbuffer.c:1928)
[ 74.068602][ T254] console_unlock (kernel/printk/printk.c:1928 kernel/printk/printk.c:2710)
[ 74.068615][ T254] ? do_syslog (kernel/printk/printk.c:2087)
[ 74.068638][ T254] ? do_fbcon_takeover (drivers/video/fbdev/core/fbcon.c:545)
[ 74.068649][ T254] register_framebuffer (drivers/video/fbdev/core/fbmem.c:1876)
[ 74.068656][ T254] ? drm_setup_crtcs_fb (drivers/gpu/drm/drm_fb_helper.c:1801) drm_kms_helper
[ 74.068725][ T254] ? do_remove_conflicting_framebuffers (drivers/video/fbdev/core/fbmem.c:1871)
[ 74.068740][ T254] ? lockdep_init_map_type (kernel/locking/lockdep.c:4810)
[ 74.068748][ T254] ? __raw_spin_lock_init (kernel/locking/spinlock_debug.c:26)
[ 74.068760][ T254] __drm_fb_helper_initial_config_and_unlock (drivers/gpu/drm/drm_fb_helper.c:1867) drm_kms_helper
[ 74.068829][ T254] drm_fbdev_client_hotplug (drivers/gpu/drm/drm_fb_helper.c:1933 drivers/gpu/drm/drm_fb_helper.c:1925 drivers/gpu/drm/drm_fb_helper.c:2435) drm_kms_helper
[ 74.068901][ T254] drm_fbdev_generic_setup (drivers/gpu/drm/drm_fb_helper.c:2522) drm_kms_helper
[ 74.068972][ T254] bochs_pci_probe.cold (drivers/gpu/drm/tiny/bochs.c:354 drivers/gpu/drm/tiny/bochs.c:430) bochs
[ 74.068990][ T254] ? bochs_connector_get_modes (drivers/gpu/drm/tiny/bochs.c:632) bochs
[ 74.069002][ T254] ? lockdep_hardirqs_on_prepare (kernel/locking/lockdep.c:4232 kernel/locking/lockdep.c:4292 kernel/locking/lockdep.c:4244)
[ 74.069011][ T254] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 74.069019][ T254] ? trace_hardirqs_on (kernel/trace/trace_preemptirq.c:50 (discriminator 22))
[ 74.069033][ T254] ? _raw_spin_unlock_irqrestore (arch/x86/include/asm/irqflags.h:45 arch/x86/include/asm/irqflags.h:80 arch/x86/include/asm/irqflags.h:138 include/linux/spinlock_api_smp.h:151 kernel/locking/spinlock.c:194)
[ 74.069043][ T254] ? bochs_connector_get_modes (drivers/gpu/drm/tiny/bochs.c:632) bochs
[ 74.069057][ T254] local_pci_probe (drivers/pci/pci-driver.c:323)
[ 74.069068][ T254] pci_device_probe (drivers/pci/pci-driver.c:380 drivers/pci/pci-driver.c:405 drivers/pci/pci-driver.c:448)
[ 74.069075][ T254] ? kernfs_put (arch/x86/include/asm/atomic.h:123 (discriminator 1) include/linux/atomic/atomic-instrumented.h:576 (discriminator 1) fs/kernfs/dir.c:513 (discriminator 1))
[ 74.069085][ T254] ? pci_device_remove (drivers/pci/pci-driver.c:433)
To reproduce:
# build kernel
cd linux
cp config-5.16.0-06523-g29bd199e4e73 .config
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
make HOSTCC=gcc-9 CC=gcc-9 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
cd <mod-install-dir>
find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz
git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email
# if you come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.
---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/[email protected]       Intel Corporation
Thanks,
Oliver Sang
Since schedule_on_each_cpu() calls schedule_work_on() and flush_work(),
we should avoid using system_wq in order to avoid unexpected locking
dependency.
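For reference, the call pattern in question, abridged from kernel/workqueue.c
around v5.16 (a simplified sketch, not the verbatim function), is:

int schedule_on_each_cpu(work_func_t func)
{
        int cpu;
        struct work_struct __percpu *works;

        works = alloc_percpu(struct work_struct);
        if (!works)
                return -ENOMEM;

        cpus_read_lock();
        for_each_online_cpu(cpu) {
                struct work_struct *work = per_cpu_ptr(works, cpu);

                INIT_WORK(work, func);
                schedule_work_on(cpu, work);    /* queues on system_wq */
        }
        for_each_online_cpu(cpu)
                flush_work(per_cpu_ptr(works, cpu));    /* flushes a system_wq work item */
        cpus_read_unlock();

        free_percpu(works);
        return 0;
}

Since schedule_work_on() is a thin wrapper around queue_work_on(cpu, system_wq,
work), each flush_work() above waits on a system_wq work item, which is what
trips the warning shown below: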
[T1] Running RCU-tasks wait API self tests
[T9] Please do not flush events WQ.
[T9] CPU: 0 PID: 9 Comm: rcu_tasks_rude_ Not tainted 5.16.0-06523-g29bd199e4e73 #1
[T9] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
[T9] Call Trace:
[T9] <TASK>
[T9] dump_stack_lvl+0x45/0x59
[T9] check_flush_dependency.cold+0x18/0x49
[T9] __flush_work+0x202/0xa00
[T9] schedule_on_each_cpu+0x1d8/0x300
[T9] rcu_tasks_kthread+0x21f/0x580
[T9] kthread+0x3a4/0x480
[T9] ret_from_fork+0x22/0x30
[T9] </TASK>
[T1] Performance Events: unsupported p6 CPU model 42 no PMU driver, software events only.
Reported-by: kernel test robot <[email protected]>
Signed-off-by: Tetsuo Handa <[email protected]>
---
Tejun, are you OK with https://lkml.kernel.org/r/[email protected] ?
kernel/workqueue.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 8e6e64372441..6ebc6cb15c1c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3345,10 +3345,16 @@ int schedule_on_each_cpu(work_func_t func)
{
int cpu;
struct work_struct __percpu *works;
+ struct workqueue_struct *wq;
works = alloc_percpu(struct work_struct);
if (!works)
return -ENOMEM;
+ wq = alloc_workqueue("events_sync", 0, 0);
+ if (!wq) {
+ free_percpu(works);
+ return -ENOMEM;
+ }
cpus_read_lock();
@@ -3363,6 +3369,7 @@ int schedule_on_each_cpu(work_func_t func)
flush_work(per_cpu_ptr(works, cpu));
cpus_read_unlock();
+ destroy_workqueue(wq);
free_percpu(works);
return 0;
}
--
2.32.0
On Mon, Feb 21, 2022 at 07:38:09PM +0900, Tetsuo Handa wrote:
> Since schedule_on_each_cpu() calls schedule_work_on() and flush_work(),
> we should avoid using system_wq in order to avoid unexpected locking
> dependency.
I don't get it. schedule_on_each_cpu() is flushing each work item and thus
shouldn't need its own flushing domain. What's this change for?
Thanks.
--
tejun
On 2022/02/23 2:29, Tejun Heo wrote:
> On Mon, Feb 21, 2022 at 07:38:09PM +0900, Tetsuo Handa wrote:
>> Since schedule_on_each_cpu() calls schedule_work_on() and flush_work(),
>> we should avoid using system_wq in order to avoid unexpected locking
>> dependency.
>
> I don't get it. schedule_on_each_cpu() is flushing each work item and thus
> shouldn't need its own flushing domain. What's this change for?
A kernel test robot tested "[PATCH v2] workqueue: Warn flush attempt using
system-wide workqueues" on 5.16.0-06523-g29bd199e4e73 and hit a lockdep
warning ( https://lkml.kernel.org/r/20220221083358.GC835@xsang-OptiPlex-9020 ).
Although the circular locking dependency itself needs to be handled by
lockless console printing support, we won't be able to apply
"[PATCH v2] workqueue: Warn flush attempt using system-wide workqueues"
if schedule_on_each_cpu() continues using system-wide workqueues.
On 2022/02/24 6:33, Tejun Heo wrote:
> On Wed, Feb 23, 2022 at 09:57:27AM +0900, Tetsuo Handa wrote:
>> On 2022/02/23 2:29, Tejun Heo wrote:
>>> On Mon, Feb 21, 2022 at 07:38:09PM +0900, Tetsuo Handa wrote:
>>>> Since schedule_on_each_cpu() calls schedule_work_on() and flush_work(),
>>>> we should avoid using system_wq in order to avoid unexpected locking
>>>> dependency.
>>>
>>> I don't get it. schedule_on_each_cpu() is flushing each work item and thus
>>> shouldn't need its own flushing domain. What's this change for?
>>
>> A kernel test robot tested "[PATCH v2] workqueue: Warn flush attempt using
>> system-wide workqueues" on 5.16.0-06523-g29bd199e4e73 and hit a lockdep
>> warning ( https://lkml.kernel.org/r/20220221083358.GC835@xsang-OptiPlex-9020 ).
>>
>> Although the circular locking dependency itself needs to be handled by
>> lockless console printing support, we won't be able to apply
>> "[PATCH v2] workqueue: Warn flush attempt using system-wide workqueues"
>> if schedule_on_each_cpu() continues using system-wide workqueues.
>
> The patch seems pretty wrong. What's problematic is system workqueue flushes
> (which flush the entire workqueue), not work item flushes.
Why? My understanding is that

  flushing a workqueue waits for completion of all work items in that
  workqueue

  flushing a work item waits for completion of that work item, using the
  workqueue specified at queue_work() time

and

  if a work item in some workqueue is blocked by other work in that workqueue
  (e.g. the max_active limit, or work items on that workqueue and the locks
  they need), it has a risk of deadlock.

Then, how can flushing a work item using system-wide workqueues be free of
deadlock risk? Isn't it just "unlikely to deadlock" rather than "impossible
to deadlock"?
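To make the question concrete, here is a minimal, hypothetical sketch (all
names are illustrative, not code from this thread) of the simplest way
flush_work() can deadlock, which lockdep's workqueue annotations catch on
any WQ:

#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(my_lock);           /* hypothetical lock */
static struct work_struct my_work;      /* assume INIT_WORK(&my_work, my_work_fn) at init */

static void my_work_fn(struct work_struct *work)
{
        mutex_lock(&my_lock);           /* blocks: my_lock is held by the flusher */
        /* ... */
        mutex_unlock(&my_lock);
}

static void broken_flush(void)
{
        mutex_lock(&my_lock);
        schedule_work(&my_work);        /* queued on system_wq */
        flush_work(&my_work);           /* waits for my_work_fn() to complete: deadlock */
        mutex_unlock(&my_lock);
}

The shared-queue variant asked about above is less direct: unrelated work
items can occupy all of system_wq's max_active slots while blocked on a lock
the flusher holds, leaving the flushed work item stuck on the pending list.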
On Wed, Feb 23, 2022 at 09:57:27AM +0900, Tetsuo Handa wrote:
> On 2022/02/23 2:29, Tejun Heo wrote:
> > On Mon, Feb 21, 2022 at 07:38:09PM +0900, Tetsuo Handa wrote:
> >> Since schedule_on_each_cpu() calls schedule_work_on() and flush_work(),
> >> we should avoid using system_wq in order to avoid unexpected locking
> >> dependency.
> >
> > I don't get it. schedule_on_each_cpu() is flushing each work item and thus
> > shouldn't need its own flushing domain. What's this change for?
>
> A kernel test robot tested "[PATCH v2] workqueue: Warn flush attempt using
> system-wide workqueues" on 5.16.0-06523-g29bd199e4e73 and hit a lockdep
> warning ( https://lkml.kernel.org/r/20220221083358.GC835@xsang-OptiPlex-9020 ).
>
> Although the circular locking dependency itself needs to be handled by
> lockless console printing support, we won't be able to apply
> "[PATCH v2] workqueue: Warn flush attempt using system-wide workqueues"
> if schedule_on_each_cpu() continues using system-wide workqueues.
The patch seems pretty wrong. What's problematic is system workqueue flushes
(which flush the entire workqueue), not work item flushes.
Thanks.
--
tejun
On Thu, Feb 24, 2022 at 07:26:30AM +0900, Tetsuo Handa wrote:
> > The patch seems pretty wrong. What's problematic is system workqueue flushes
> > (which flush the entire workqueue), not work item flushes.
>
> Why? My understanding is that
>
>   flushing a workqueue waits for completion of all work items in that
>   workqueue
>
>   flushing a work item waits for completion of that work item, using the
>   workqueue specified at queue_work() time
>
> and
>
>   if a work item in some workqueue is blocked by other work in that workqueue
>   (e.g. the max_active limit, or work items on that workqueue and the locks
>   they need), it has a risk of deadlock.
>
> Then, how can flushing a work item using system-wide workqueues be free of
> deadlock risk? Isn't it just "unlikely to deadlock" rather than "impossible
> to deadlock"?
If we're jamming system_wq with a combination of work items which need more
than max_active to make forward progress, we're stuck regardless of flushes.
What's needed at that point is increasing max_active (or something along
that line).
Thanks.
--
tejun
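For reference, "increasing max_active" amounts to sizing a dedicated WQ at
allocation time; a hypothetical sketch (name and value are illustrative):

#include <linux/workqueue.h>

static struct workqueue_struct *my_wq;  /* hypothetical dedicated WQ */

static int my_init(void)
{
        /*
         * The third argument is max_active; 0 means the default
         * (WQ_DFL_ACTIVE, i.e. 256), and WQ_MAX_ACTIVE (512) is the
         * upper bound.  Size it for the worst-case number of
         * interdependent work items.
         */
        my_wq = alloc_workqueue("my_driver_wq", 0, 512);
        if (!my_wq)
                return -ENOMEM;
        return 0;
}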
On 2022/02/24 7:29, Tejun Heo wrote:
> On Thu, Feb 24, 2022 at 07:26:30AM +0900, Tetsuo Handa wrote:
>>> The patch seems pretty wrong. What's problematic is system workqueue flushes
>>> (which flush the entire workqueue), not work item flushes.
>>
>> Why? My understanding is that
>>
>>   flushing a workqueue waits for completion of all work items in that
>>   workqueue
>>
>>   flushing a work item waits for completion of that work item, using the
>>   workqueue specified at queue_work() time
>>
>> and
>>
>>   if a work item in some workqueue is blocked by other work in that workqueue
>>   (e.g. the max_active limit, or work items on that workqueue and the locks
>>   they need), it has a risk of deadlock.
>>
>> Then, how can flushing a work item using system-wide workqueues be free of
>> deadlock risk? Isn't it just "unlikely to deadlock" rather than "impossible
>> to deadlock"?
>
> If we're jamming system_wq with a combination of work items which need more
> than max_active to make forward progress, we're stuck regardless of flushes.
> What's needed at that point is increasing max_active (or something along
> that line).
Then, what about this?
If this looks OK, I'll test this patch using linux-next via my tree.
From 75ccc165bfcea368728d8c0f7efcc8b4f904f98d Mon Sep 17 00:00:00 2001
From: Tetsuo Handa <[email protected]>
Date: Mon, 28 Feb 2022 22:49:16 +0900
Subject: [PATCH v3] workqueue: Warn flushing of kernel-global workqueues
Since a flush operation synchronously waits for completion, flushing the
kernel-global WQs (e.g. system_wq) might introduce a possibility of deadlock
due to an unexpected locking dependency. Tejun Heo commented that it makes
no sense at all to call flush_workqueue() on the shared WQs as the caller
has no idea what it's gonna end up waiting for.

Although flush_scheduled_work(), which flushes the system_wq WQ, carries a
"Think twice before calling this function! It's very easy to get into
trouble if you don't take great care." warning, syzbot found a circular
locking dependency caused by flushing the system_long_wq WQ [1]. Therefore,
let's change direction: developers should use their own local WQs when
flush_workqueue() is unavoidable.

To give developers time to update their modules, for now just emit a
ratelimited warning message when flush_workqueue() is called on a
kernel-global WQ. We will eventually convert this warning into WARN_ON()
and kill flush_scheduled_work().

This patch introduces the __WQ_NO_FLUSH flag for emitting the warning. Do
not set this flag when creating your own local WQs while converting your
module, because destroy_workqueue() itself involves a flush operation (and
would trigger the warning).

Theoretically, flushing a specific work item queued on a kernel-global WQ
(which is !WQ_MEM_RECLAIM) via flush_work() also has a possibility of
deadlock. But this patch does not emit a warning when flush_work() is
called on such work items, on the assumption that kworker threads can be
created as needed and the max_active limit will not be hit.
Link: https://syzkaller.appspot.com/bug?extid=831661966588c802aae9 [1]
Signed-off-by: Tetsuo Handa <[email protected]>
---
Changes in v3:
Don't check flush_work() attempt.
Use a private WQ_ flag.
Changes in v2:
Removed #ifdef CONFIG_PROVE_LOCKING=y check.
Also check flush_work() attempt.
Shorten warning message.
Introduced a public WQ_ flag, initially meant for use only by system-wide
WQs, but also allowing private WQs used by built-in modules to set it for
detecting unexpected flush attempts if they want.
include/linux/workqueue.h | 15 +++------------
kernel/workqueue.c | 36 +++++++++++++++++++++++++++++-------
2 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 7fee9b6cfede..7b13fae377e2 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -339,6 +339,7 @@ enum {
__WQ_ORDERED = 1 << 17, /* internal: workqueue is ordered */
__WQ_LEGACY = 1 << 18, /* internal: create*_workqueue() */
__WQ_ORDERED_EXPLICIT = 1 << 19, /* internal: alloc_ordered_workqueue() */
+ __WQ_NO_FLUSH = 1 << 20, /* internal: warn flush_workqueue() */
WQ_MAX_ACTIVE = 512, /* I like 512, better ideas? */
WQ_MAX_UNBOUND_PER_CPU = 4, /* 4 * #cpus for unbound wq */
@@ -569,18 +570,8 @@ static inline bool schedule_work(struct work_struct *work)
* Forces execution of the kernel-global workqueue and blocks until its
* completion.
*
- * Think twice before calling this function! It's very easy to get into
- * trouble if you don't take great care. Either of the following situations
- * will lead to deadlock:
- *
- * One of the work items currently on the workqueue needs to acquire
- * a lock held by your code or its caller.
- *
- * Your code is running in the context of a work routine.
- *
- * They will be detected by lockdep when they occur, but the first might not
- * occur very often. It depends on what work items are on the workqueue and
- * what locks they need, which you have no control over.
+ * Please stop calling this function. If you need to flush a kernel-global
+ * workqueue, please use your own local workqueue instead.
*
* In most situations flushing the entire workqueue is overkill; you merely
* need to know that a particular work item isn't queued and isn't running.
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 33f1106b4f99..bc271579704f 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -2805,6 +2805,25 @@ static bool flush_workqueue_prep_pwqs(struct workqueue_struct *wq,
return wait;
}
+static void warn_flush_attempt(struct workqueue_struct *wq)
+{
+ /*
+ * Since there are known in-tree modules which will emit this warning,
+ * for now don't use WARN_ON() in order not to break kernel testing.
+ *
+ * Print the whole trace with a ratelimit, so that this warning is
+ * not overlooked while also not flooding the console and kernel
+ * log buffer.
+ */
+ static DEFINE_RATELIMIT_STATE(flush_warn_rs, 600 * HZ, 1);
+
+ ratelimit_set_flags(&flush_warn_rs, RATELIMIT_MSG_ON_RELEASE);
+ if (!__ratelimit(&flush_warn_rs))
+ return;
+ pr_warn("Please do not flush %s WQ.\n", wq->name);
+ dump_stack();
+}
+
/**
* flush_workqueue - ensure that any scheduled work has run to completion.
* @wq: workqueue to flush
@@ -2824,6 +2843,9 @@ void flush_workqueue(struct workqueue_struct *wq)
if (WARN_ON(!wq_online))
return;
+ if (unlikely(wq->flags & __WQ_NO_FLUSH))
+ warn_flush_attempt(wq);
+
lock_map_acquire(&wq->lockdep_map);
lock_map_release(&wq->lockdep_map);
@@ -6054,17 +6076,17 @@ void __init workqueue_init_early(void)
ordered_wq_attrs[i] = attrs;
}
- system_wq = alloc_workqueue("events", 0, 0);
- system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
- system_long_wq = alloc_workqueue("events_long", 0, 0);
- system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
+ system_wq = alloc_workqueue("events", __WQ_NO_FLUSH, 0);
+ system_highpri_wq = alloc_workqueue("events_highpri", __WQ_NO_FLUSH | WQ_HIGHPRI, 0);
+ system_long_wq = alloc_workqueue("events_long", __WQ_NO_FLUSH, 0);
+ system_unbound_wq = alloc_workqueue("events_unbound", __WQ_NO_FLUSH | WQ_UNBOUND,
WQ_UNBOUND_MAX_ACTIVE);
system_freezable_wq = alloc_workqueue("events_freezable",
- WQ_FREEZABLE, 0);
+ __WQ_NO_FLUSH | WQ_FREEZABLE, 0);
system_power_efficient_wq = alloc_workqueue("events_power_efficient",
- WQ_POWER_EFFICIENT, 0);
+ __WQ_NO_FLUSH | WQ_POWER_EFFICIENT, 0);
system_freezable_power_efficient_wq = alloc_workqueue("events_freezable_power_efficient",
- WQ_FREEZABLE | WQ_POWER_EFFICIENT,
+ __WQ_NO_FLUSH | WQ_FREEZABLE | WQ_POWER_EFFICIENT,
0);
BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
!system_unbound_wq || !system_freezable_wq ||
--
2.32.0
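For module authors hitting the new warning, the intended conversion is
roughly the following (a hypothetical sketch; all names are illustrative):
switch from schedule_work()/flush_scheduled_work() to a private WQ, which
may still be flushed:

#include <linux/module.h>
#include <linux/workqueue.h>

static struct workqueue_struct *mymod_wq;       /* hypothetical private WQ */
static struct work_struct mymod_work;

static void mymod_work_fn(struct work_struct *work)
{
        /* ... deferred processing ... */
}

static int __init mymod_init(void)
{
        /* A plain local WQ; __WQ_NO_FLUSH is internal and never set here. */
        mymod_wq = alloc_workqueue("mymod", 0, 0);
        if (!mymod_wq)
                return -ENOMEM;
        INIT_WORK(&mymod_work, mymod_work_fn);
        queue_work(mymod_wq, &mymod_work);      /* was: schedule_work(&mymod_work) */
        return 0;
}

static void __exit mymod_exit(void)
{
        flush_workqueue(mymod_wq);      /* was: flush_scheduled_work(); no warning on a local WQ */
        destroy_workqueue(mymod_wq);
}

module_init(mymod_init);
module_exit(mymod_exit);
MODULE_LICENSE("GPL");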
On Mon, Feb 28, 2022 at 11:03:47PM +0900, Tetsuo Handa wrote:
> Then, what about this?
> If this looks OK, I'll test this patch using linux-next via my tree.
Yeah, looks good to me. Can you boot test and fix the obvious ones before
pushing it to linux-next? It probably would be best if this is kept in a
separate branch together with all the fix-up patches. Once things settle
enough, we should be able to route the patches through the wq tree to Linus.
Thank you so much for doing this.
--
tejun
On 2022/03/01 3:42, Tejun Heo wrote:
> On Mon, Feb 28, 2022 at 11:03:47PM +0900, Tetsuo Handa wrote:
>> Then, what about this?
>> If this looks OK, I'll test this patch using linux-next via my tree.
>
> Yeah, looks good to me. Can you boot test and fix the obvious ones before
> pushing it to linux-next? It probably would be best if this is kept in a
> separate branch together with all the fix-up patches. Once things settle
> enough, we should be able to route the patches through the wq tree to Linus.
>
> Thank you so much for doing this.
>
This patch has been in linux-next.git as commit b9c20da356db1b39 ("workqueue:
Warn flushing of kernel-global workqueues") since next-20220301, and I have
received no failure reports. I think we can send this patch to linux.git in
order to let each module's developers write their patches. (I will drop this
patch from my tree when you pick it up.)