2022-05-20 07:26:58

by kernel test robot

[permalink] [raw]
Subject: [writeback] b47372dac9: BUG:soft_lockup-CPU##stuck_for#s![kworker/u16:#:#]



Greeting,

FYI, we noticed the following commit (built with gcc-11):

commit: b47372dac98f419466a5459b4eb98f9f88b7aa38 ("[PATCH v4] writeback: Fix inode->i_io_list not be protected by inode->i_lock error")
url: https://github.com/intel-lab-lkp/linux/commits/Jchao-Sun/writeback-Fix-inode-i_io_list-not-be-protected-by-inode-i_lock-error/20220511-221801
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git feb9c5e19e913b53cb536a7aa7c9f20107bb51ec
patch link: https://lore.kernel.org/linux-fsdevel/[email protected]

in testcase: xfstests
version: xfstests-x86_64-46e1b83-1_20220414
with following parameters:

disk: 4HDD
fs: f2fs
test: generic-group-03
ucode: 0x21

test-description: xfstests is a regression test suite for xfs and other files ystems.
test-url: git://git.kernel.org/pub/scm/fs/xfs/xfstests-dev.git


on test machine: 8 threads 1 sockets Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz with 16G memory

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):



If you fix the issue, kindly add following tag
Reported-by: kernel test robot <[email protected]>


[ 116.563986][ C1] watchdog: BUG: soft lockup - CPU#1 stuck for 26s! [kworker/u16:3:56]
[ 116.564176][ C1] Modules linked in: dm_mod f2fs crc32_generic netconsole btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c intel_rapl_msr intel_rapl_common i915 sd_mod t10_pi intel_gtt x86_pkg_temp_thermal drm_buddy intel_powerclamp crc64_rocksoft_generic drm_dp_helper crc64_rocksoft crc64 ttm sg coretemp ipmi_devintf kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ipmi_msghandler crc32c_intel ghash_clmulni_intel rapl drm_kms_helper intel_cstate syscopyarea firewire_ohci sysfillrect sysimgblt ahci fb_sys_fops firewire_core libahci intel_uncore crc_itu_t drm libata mxm_wmi wmi video ip_tables
[ 116.565104][ C1] CPU: 1 PID: 56 Comm: kworker/u16:3 Not tainted 5.18.0-rc6-00010-gb47372dac98f #1
[ 116.565265][ C1] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[ 116.565439][ C1] Workqueue: writeback wb_workfn (flush-8:16)
[ 116.565547][ C1] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:382)
[ 116.565669][ C1] Code: 37 48 b8 00 00 00 00 00 fc ff df 49 89 dc 48 89 dd 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 f3 90 41 0f b6 04 24 40 38 c5 7c 08 <84> c0 0f 85 9a 06 00 00 8b 03 84 c0 75 e6 48 b8 00 00 00 00 00 fc
All code
========
0: 37 (bad)
1: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
8: fc ff df
b: 49 89 dc mov %rbx,%r12
e: 48 89 dd mov %rbx,%rbp
11: 49 c1 ec 03 shr $0x3,%r12
15: 83 e5 07 and $0x7,%ebp
18: 49 01 c4 add %rax,%r12
1b: 83 c5 03 add $0x3,%ebp
1e: f3 90 pause
20: 41 0f b6 04 24 movzbl (%r12),%eax
25: 40 38 c5 cmp %al,%bpl
28: 7c 08 jl 0x32
2a:* 84 c0 test %al,%al <-- trapping instruction
2c: 0f 85 9a 06 00 00 jne 0x6cc
32: 8b 03 mov (%rbx),%eax
34: 84 c0 test %al,%al
36: 75 e6 jne 0x1e
38: 48 rex.W
39: b8 00 00 00 00 mov $0x0,%eax
3e: 00 fc add %bh,%ah

Code starting with the faulting instruction
===========================================
0: 84 c0 test %al,%al
2: 0f 85 9a 06 00 00 jne 0x6a2
8: 8b 03 mov (%rbx),%eax
a: 84 c0 test %al,%al
c: 75 e6 jne 0xfffffffffffffff4
e: 48 rex.W
f: b8 00 00 00 00 mov $0x0,%eax
14: 00 fc add %bh,%ah
[ 116.565997][ C1] RSP: 0018:ffffc900004477b0 EFLAGS: 00000206
[ 116.566111][ C1] RAX: 0000000000000000 RBX: ffff8884095d7218 RCX: ffffffff812ee01d
[ 116.566240][ C1] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff8884095d7218
[ 116.566370][ C1] RBP: 0000000000000003 R08: 0000000000000000 R09: ffff8884095d721b
[ 116.566499][ C1] R10: ffffed10812bae43 R11: 0000000000000001 R12: ffffed10812bae43
[ 116.566628][ C1] R13: ffff888413631180 R14: 0000000000000001 R15: 1ffff92000088ef7
[ 116.566758][ C1] FS: 0000000000000000(0000) GS:ffff888361e80000(0000) knlGS:0000000000000000
[ 116.566903][ C1] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 116.567031][ C1] CR2: 00007f76e3ce3180 CR3: 000000041d60e004 CR4: 00000000001706e0
[ 116.567163][ C1] Call Trace:
[ 116.567223][ C1] <TASK>
[ 116.567279][ C1] ? .slowpath (kernel/locking/qspinlock.c:316)
[ 116.567357][ C1] ? __get_node_page (fs/f2fs/node.c:35 fs/f2fs/node.c:1410) f2fs
[ 116.567495][ C1] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 116.567584][ C1] _raw_spin_lock (arch/x86/include/asm/paravirt.h:591 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 116.567664][ C1] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153)
[ 116.567753][ C1] ? __inode_add_lru (arch/x86/include/asm/atomic.h:29 include/linux/atomic/atomic-instrumented.h:28 fs/inode.c:458)
[ 116.567839][ C1] writeback_sb_inodes (fs/fs-writeback.c:1812)
[ 116.567927][ C1] ? sync_inode_metadata (fs/fs-writeback.c:1768)
[ 116.568035][ C1] ? wb_io_lists_populated (arch/x86/include/asm/bitops.h:207 include/asm-generic/bitops/instrumented-non-atomic.h:135 include/linux/backing-dev.h:55 fs/fs-writeback.c:87)
[ 116.568129][ C1] ? queue_io (fs/fs-writeback.c:1427)
[ 116.568205][ C1] wb_writeback (fs/fs-writeback.c:2029)
[ 116.568283][ C1] ? __writeback_inodes_wb (fs/fs-writeback.c:1980)
[ 116.568376][ C1] ? _raw_spin_lock_bh (arch/x86/include/asm/atomic.h:202 include/linux/atomic/atomic-instrumented.h:543 include/asm-generic/qspinlock.h:82 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:127 kernel/locking/spinlock.c:178)
[ 116.568463][ C1] ? _raw_spin_lock (kernel/locking/spinlock.c:177)
[ 116.568548][ C1] ? pointer (lib/vsprintf.c:2737)
[ 116.568624][ C1] wb_do_writeback (fs/fs-writeback.c:2173)
[ 116.568706][ C1] ? set_worker_desc (kernel/workqueue.c:4645)
[ 116.568791][ C1] ? wb_writeback (fs/fs-writeback.c:2165)
[ 116.568872][ C1] ? update_load_avg (kernel/sched/fair.c:3647 kernel/sched/fair.c:3902)
[ 116.568958][ C1] wb_workfn (fs/fs-writeback.c:2212 (discriminator 1))
[ 116.569052][ C1] ? inode_wait_for_writeback (fs/fs-writeback.c:2196)
[ 116.569147][ C1] ? __switch_to (arch/x86/include/asm/bitops.h:55 include/asm-generic/bitops/instrumented-atomic.h:29 include/linux/thread_info.h:89 arch/x86/include/asm/fpu/sched.h:65 arch/x86/kernel/process_64.c:622)
[ 116.569228][ C1] ? __schedule (kernel/sched/core.c:6267)
[ 116.569311][ C1] ? read_word_at_a_time (include/asm-generic/rwonce.h:86)
[ 116.569398][ C1] ? strscpy (lib/string.c:203)
[ 116.569473][ C1] process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/workqueue.h:108 kernel/workqueue.c:2294)
[ 116.569558][ C1] worker_thread (include/linux/list.h:292 kernel/workqueue.c:2437)
[ 116.569638][ C1] ? process_one_work (kernel/workqueue.c:2379)
[ 116.569727][ C1] kthread (kernel/kthread.c:376)
[ 116.569798][ C1] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 116.569895][ C1] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 116.569984][ C1] </TASK>
[ 116.570050][ C1] Kernel panic - not syncing: softlockup: hung tasks
[ 116.570161][ C1] CPU: 1 PID: 56 Comm: kworker/u16:3 Tainted: G L 5.18.0-rc6-00010-gb47372dac98f #1
[ 116.570332][ C1] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[ 116.570477][ C1] Workqueue: writeback wb_workfn (flush-8:16)
[ 116.570582][ C1] Call Trace:
[ 116.570640][ C1] <IRQ>
[ 116.570694][ C1] dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1))
[ 116.570773][ C1] panic (kernel/panic.c:268)
[ 116.570844][ C1] ? panic_print_sys_info+0x60/0x60
[ 116.570946][ C1] ? add_taint (arch/x86/include/asm/bitops.h:60 include/asm-generic/bitops/instrumented-atomic.h:29 kernel/panic.c:464)
[ 116.570986][ C7] watchdog: BUG: soft lockup - CPU#7 stuck for 26s! [kworker/u16:5:58]
[ 116.570991][ C7] Modules linked in: dm_mod f2fs crc32_generic netconsole btrfs blake2b_generic xor raid6_pq zstd_compress libcrc32c intel_rapl_msr intel_rapl_common i915 sd_mod t10_pi intel_gtt x86_pkg_temp_thermal drm_buddy intel_powerclamp crc64_rocksoft_generic drm_dp_helper crc64_rocksoft crc64 ttm sg coretemp ipmi_devintf kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ipmi_msghandler crc32c_intel ghash_clmulni_intel rapl drm_kms_helper intel_cstate syscopyarea firewire_ohci sysfillrect sysimgblt ahci fb_sys_fops firewire_core libahci intel_uncore crc_itu_t drm libata mxm_wmi wmi video ip_tables
[ 116.571041][ C7] CPU: 7 PID: 58 Comm: kworker/u16:5 Tainted: G L 5.18.0-rc6-00010-gb47372dac98f #1
[ 116.571044][ C7] Hardware name: /DZ77BH-55K, BIOS BHZ7710H.86A.0097.2012.1228.1346 12/28/2012
[ 116.571054][ C7] Workqueue: writeback wb_update_bandwidth_workfn
[ 116.571059][ C7] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:382)
[ 116.571063][ C7] Code: 37 48 b8 00 00 00 00 00 fc ff df 49 89 dc 48 89 dd 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 f3 90 41 0f b6 04 24 40 38 c5 7c 08 <84> c0 0f 85 9a 06 00 00 8b 03 84 c0 75 e6 48 b8 00 00 00 00 00 fc
All code
========
0: 37 (bad)
1: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
8: fc ff df
b: 49 89 dc mov %rbx,%r12
e: 48 89 dd mov %rbx,%rbp
11: 49 c1 ec 03 shr $0x3,%r12
15: 83 e5 07 and $0x7,%ebp
18: 49 01 c4 add %rax,%r12
1b: 83 c5 03 add $0x3,%ebp
1e: f3 90 pause
20: 41 0f b6 04 24 movzbl (%r12),%eax
25: 40 38 c5 cmp %al,%bpl
28: 7c 08 jl 0x32
2a:* 84 c0 test %al,%al <-- trapping instruction
2c: 0f 85 9a 06 00 00 jne 0x6cc
32: 8b 03 mov (%rbx),%eax
34: 84 c0 test %al,%al
36: 75 e6 jne 0x1e
38: 48 rex.W
39: b8 00 00 00 00 mov $0x0,%eax
3e: 00 fc add %bh,%ah

Code starting with the faulting instruction
===========================================
0: 84 c0 test %al,%al
2: 0f 85 9a 06 00 00 jne 0x6a2
8: 8b 03 mov (%rbx),%eax
a: 84 c0 test %al,%al
c: 75 e6 jne 0xfffffffffffffff4
e: 48 rex.W
f: b8 00 00 00 00 mov $0x0,%eax
14: 00 fc add %bh,%ah
[ 116.571065][ C7] RSP: 0018:ffffc90000467b60 EFLAGS: 00000206
[ 116.571068][ C7] RAX: 0000000000000000 RBX: ffff888418e4e0b8 RCX: ffffffff812ee01d
[ 116.571069][ C7] RDX: 0000000000000000 RSI: 0000000000000004 RDI: ffff888418e4e0b8
[ 116.571071][ C7] RBP: 0000000000000003 R08: 0000000000000000 R09: ffff888418e4e0bb
[ 116.571072][ C7] R10: ffffed10831c9c17 R11: 0000000000000001 R12: ffffed10831c9c17
[ 116.571074][ C7] R13: ffff888418e4e060 R14: 0000000000000001 R15: 1ffff9200008cf6d
[ 116.571075][ C7] FS: 0000000000000000(0000) GS:ffff888362180000(0000) knlGS:0000000000000000
[ 116.571077][ C7] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 116.571079][ C7] CR2: 00007f8c1d498710 CR3: 000000041d60e003 CR4: 00000000001706e0
[ 116.571080][ C7] Call Trace:
[ 116.571082][ C7] <TASK>
[ 116.571083][ C7] ? .slowpath (kernel/locking/qspinlock.c:316)
[ 116.571086][ C7] ? out_of_line_wait_on_bit_timeout (kernel/sched/completion.c:100)
[ 116.571089][ C7] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 116.571092][ C7] ? newidle_balance (kernel/sched/fair.c:10172 kernel/sched/fair.c:10183 kernel/sched/fair.c:10945)
[ 116.571095][ C7] _raw_spin_lock (arch/x86/include/asm/paravirt.h:591 arch/x86/include/asm/qspinlock.h:51 include/asm-generic/qspinlock.h:85 include/linux/spinlock.h:185 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154)
[ 116.571098][ C7] ? _raw_write_lock_irq (kernel/locking/spinlock.c:153)
[ 116.571100][ C7] ? update_curr (include/linux/cgroup.h:482 include/linux/cgroup.h:551 include/linux/cgroup.h:795 kernel/sched/fair.c:909)
[ 116.571102][ C7] ? start_poll_synchronize_srcu (kernel/rcu/srcutree.c:913)
[ 116.571106][ C7] ? load_balance (kernel/sched/fair.c:10902)
[ 116.571109][ C7] __wb_update_bandwidth (mm/page-writeback.c:1339)
[ 116.571112][ C7] wb_update_bandwidth (mm/page-writeback.c:1365)
[ 116.571115][ C7] ? wb_calc_thresh (mm/page-writeback.c:1365)
[ 116.571117][ C7] ? strscpy (lib/string.c:203)
[ 116.571120][ C7] process_one_work (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/workqueue.h:108 kernel/workqueue.c:2294)
[ 116.571124][ C7] worker_thread (include/linux/list.h:292 kernel/workqueue.c:2437)
[ 116.571126][ C7] ? process_one_work (kernel/workqueue.c:2379)
[ 116.571129][ C7] kthread (kernel/kthread.c:376)
[ 116.571131][ C7] ? kthread_complete_and_exit (kernel/kthread.c:331)
[ 116.571133][ C7] ret_from_fork (arch/x86/entry/entry_64.S:304)
[ 116.571137][ C7] </TASK>
[ 116.575850][ C1] watchdog_timer_fn.cold (kernel/watchdog.c:435)
[ 116.575943][ C1] ? lockup_detector_update_enable (kernel/watchdog.c:355)
[ 116.576066][ C1] __hrtimer_run_queues (kernel/time/hrtimer.c:1685 kernel/time/hrtimer.c:1749)
[ 116.576157][ C1] ? __remove_hrtimer (kernel/time/hrtimer.c:1719)
[ 116.576245][ C1] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 116.576343][ C1] ? recalibrate_cpu_khz (arch/x86/include/asm/msr.h:224 arch/x86/kernel/tsc.c:1095)
[ 116.576431][ C1] ? ktime_get_update_offsets_now (kernel/time/timekeeping.c:290 (discriminator 3) kernel/time/timekeeping.c:386 (discriminator 3) kernel/time/timekeeping.c:2299 (discriminator 3))
[ 116.576533][ C1] hrtimer_interrupt (kernel/time/hrtimer.c:1814)
[ 116.576618][ C1] __sysvec_apic_timer_interrupt (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 arch/x86/include/asm/trace/irq_vectors.h:41 arch/x86/kernel/apic/apic.c:1104)
[ 116.576721][ C1] sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1097 (discriminator 14))
[ 116.576819][ C1] </IRQ>
[ 116.576874][ C1] <TASK>
[ 116.576929][ C1] asm_sysvec_apic_timer_interrupt (arch/x86/include/asm/idtentry.h:645)
[ 116.577050][ C1] RIP: 0010:native_queued_spin_lock_slowpath (kernel/locking/qspinlock.c:382)
[ 116.577171][ C1] Code: 37 48 b8 00 00 00 00 00 fc ff df 49 89 dc 48 89 dd 49 c1 ec 03 83 e5 07 49 01 c4 83 c5 03 f3 90 41 0f b6 04 24 40 38 c5 7c 08 <84> c0 0f 85 9a 06 00 00 8b 03 84 c0 75 e6 48 b8 00 00 00 00 00 fc
All code
========
0: 37 (bad)
1: 48 b8 00 00 00 00 00 movabs $0xdffffc0000000000,%rax
8: fc ff df
b: 49 89 dc mov %rbx,%r12
e: 48 89 dd mov %rbx,%rbp
11: 49 c1 ec 03 shr $0x3,%r12
15: 83 e5 07 and $0x7,%ebp
18: 49 01 c4 add %rax,%r12
1b: 83 c5 03 add $0x3,%ebp
1e: f3 90 pause
20: 41 0f b6 04 24 movzbl (%r12),%eax
25: 40 38 c5 cmp %al,%bpl
28: 7c 08 jl 0x32
2a:* 84 c0 test %al,%al <-- trapping instruction
2c: 0f 85 9a 06 00 00 jne 0x6cc
32: 8b 03 mov (%rbx),%eax
34: 84 c0 test %al,%al
36: 75 e6 jne 0x1e
38: 48 rex.W
39: b8 00 00 00 00 mov $0x0,%eax
3e: 00 fc add %bh,%ah

Code starting with the faulting instruction
===========================================
0: 84 c0 test %al,%al
2: 0f 85 9a 06 00 00 jne 0x6a2
8: 8b 03 mov (%rbx),%eax
a: 84 c0 test %al,%al
c: 75 e6 jne 0xfffffffffffffff4
e: 48 rex.W
f: b8 00 00 00 00 mov $0x0,%eax
14: 00 fc add %bh,%ah


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://01.org/lkp



Attachments:
(No filename) (17.01 kB)
config-5.18.0-rc6-00010-gb47372dac98f (168.55 kB)
job-script (6.03 kB)
dmesg.xz (5.70 kB)
xfstests (20.33 kB)
job.yaml (4.73 kB)
Download all attachments