2018-04-19 05:27:35

by Fengguang Wu

Subject: [btrfs_destroy_workqueue] WARNING: CPU: 0 PID: 6954 at kernel/workqueue.c:4142 destroy_workqueue+0x64/0x1e0

Hello,

FYI, this happens in mainline kernel 4.17.0-rc1.
It dates back to at least v4.14-rc2.

It's triggered when running fio tests. It's really hard to reproduce
(it happened only once in 4.17-rc1 and several times in v4.14-rc2), and
all bisect attempts have failed so far.

[ 133.751073] WARNING: stack going in the wrong direction? ip=__schedule+0x489/0x830:
perf_sw_event_sched at include/linux/perf_event.h:1062
(inlined by) perf_event_task_sched_out at include/linux/perf_event.h:1100
(inlined by) prepare_task_switch at kernel/sched/core.c:2636
(inlined by) context_switch at kernel/sched/core.c:2813
(inlined by) __schedule at kernel/sched/core.c:3490
[ 134.048965] perf: interrupt took too long (9682 > 9626), lowering kernel.perf_event_max_sample_rate to 20000
[ 134.472390] perf: interrupt took too long (12178 > 12102), lowering kernel.perf_event_max_sample_rate to 16000
[ 234.324541] 2018-04-17 16:08:50 umount /fs/pmem0
[ 234.324546]
[ 240.185400] WARNING: CPU: 0 PID: 6954 at kernel/workqueue.c:4142 destroy_workqueue+0x64/0x1e0:
destroy_workqueue at kernel/workqueue.c:4142 (discriminator 1)
[ 240.197915] Modules linked in: btrfs xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod sr_mod cdrom intel_rapl sd_mod sg sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp mgag200 kvm_intel ttm kvm irqbypass crct10dif_pclmul crc32_pclmul drm_kms_helper crc32c_intel ghash_clmulni_intel syscopyarea nd_pmem(O) dax_pmem(O) snd_pcm pcbc sysfillrect device_dax(O) nd_btt(O) snd_timer sysimgblt aesni_intel fb_sys_fops nd_e820(O) crypto_simd ipmi_si libnvdimm(O) snd soundcore ahci mxm_wmi cryptd ipmi_devintf wdat_wdt dcdbas nfit_test_iomap(O) libahci pcspkr drm megaraid_sas glue_helper libata ipmi_msghandler wmi acpi_power_meter shpchp ip_tables
[ 240.267813] CPU: 0 PID: 6954 Comm: umount Tainted: G O 4.17.0-rc1 #1
[ 240.277473] Hardware name: Dell Inc. PowerEdge R630/0CNCJW, BIOS 2.1.7 06/16/2016
[ 240.286967] RIP: 0010:destroy_workqueue+0x64/0x1e0:
destroy_workqueue at kernel/workqueue.c:4142 (discriminator 1)
[ 240.293463] RSP: 0018:ffffc90021b0fde0 EFLAGS: 00010202
[ 240.300462] RAX: ffff884072f4c058 RBX: ffff88407a17dc00 RCX: ffff884072f4c000
[ 240.309628] RDX: ffff884072f4c058 RSI: 0000000000000000 RDI: ffffffff820cfa30
[ 240.318802] RBP: ffff88407a17dc20 R08: ffffc90021b0fd40 R09: 0000000000000000
[ 240.327990] R10: ffffc90021b0fdb8 R11: 0000000000000000 R12: ffff8820347f4fc0
[ 240.337306] R13: ffff884079c23138 R14: ffffffff82d47aa0 R15: 0000000000000000
[ 240.346546] FS: 00007f941fee2840(0000) GS:ffff882067600000(0000) knlGS:0000000000000000
[ 240.356876] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 240.364602] CR2: 00007ff793645258 CR3: 0000004074f4a005 CR4: 00000000001606f0
[ 240.373917] Call Trace:
[ 240.378060] btrfs_destroy_workqueue+0x40/0x110 [btrfs]
[ 240.385322] btrfs_stop_all_workers+0x2d/0xf0 [btrfs]
[ 240.392397] close_ctree+0x133/0x2f0 [btrfs]
[ 240.398581] generic_shutdown_super+0x6c/0x120:
__read_once_size at include/linux/compiler.h:188
(inlined by) list_empty at include/linux/list.h:203
(inlined by) generic_shutdown_super at fs/super.c:442
[ 240.404956] kill_anon_super+0xe/0x20:
kill_anon_super at fs/super.c:1038
[ 240.410482] btrfs_kill_super+0x13/0x100 [btrfs]
[ 240.417076] deactivate_locked_super+0x3f/0x70:
deactivate_locked_super at fs/super.c:320
[ 240.423483] cleanup_mnt+0x3b/0x70:
cleanup_mnt at fs/namespace.c:1174
[ 240.428737] task_work_run+0xa3/0xe0:
task_work_run at kernel/task_work.c:115 (discriminator 1)
[ 240.434199] exit_to_usermode_loop+0x9e/0xa0:
tracehook_notify_resume at include/linux/tracehook.h:191
(inlined by) exit_to_usermode_loop at arch/x86/entry/common.c:166
[ 240.440456] do_syscall_64+0x16c/0x180:
prepare_exit_to_usermode at arch/x86/entry/common.c:196
(inlined by) syscall_return_slowpath at arch/x86/entry/common.c:265
(inlined by) do_syscall_64 at arch/x86/entry/common.c:290
[ 240.446133] entry_SYSCALL_64_after_hwframe+0x44/0xa9:
entry_SYSCALL_64_after_hwframe at arch/x86/entry/entry_64.S:247
[ 240.453285] RIP: 0033:0x7f941f7c7277
[ 240.458782] RSP: 002b:00007fffb9a17cf8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
[ 240.468778] RAX: 0000000000000000 RBX: 000000000158b6e0 RCX: 00007f941f7c7277
[ 240.478299] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 000000000158b8c0
[ 240.487824] RBP: 000000000158b8c0 R08: 0000000000000000 R09: 0000000000000015
[ 240.497317] R10: 00000000000006b0 R11: 0000000000000246 R12: 00007f941fcc9e44
[ 240.506810] R13: 0000000000000000 R14: 0000000000000000 R15: 00007fffb9a17f80
[ 240.516315] Code: c2 74 19 8b 30 85 f6 74 f1 0f 0b 48 89 ef e8 84 9f 8c 00 5b 5d 41 5c e9 cb fa ff ff 48 39 8b a0 00 00 00 74 0a 83 79 18 01 7e 04 <0f> 0b eb dc 8b 41 58 85 c0 0f 85 3d 01 00 00 48 8b 41 60 48 8d
[ 240.540655] ---[ end trace faf649c5bf432714 ]---
[ 240.547594] Showing busy workqueues and worker pools:

The full dmesg and kconfig are attached.

Thanks,
Fengguang


Attachments:
(No filename) (5.06 kB)
dmesg-lkp-hsw-ep6:20180417161212:x86_64-rhel-7.2:gcc-7:4.17.0-rc1:1 (153.85 kB)
.config (166.89 kB)