2023-06-12 02:43:33

by kernel test robot

[permalink] [raw]
Subject: [tj-wq:affinity-scopes-dbg-invalid-cpu] [workqueue] e3485721f9: WARNING:at_kernel/workqueue.c:#workqueue_online_cpu



Hello,

kernel test robot noticed "WARNING:at_kernel/workqueue.c:#workqueue_online_cpu" on:

commit: e3485721f97d7230d5fd46d77047a5dc4e6e39c8 ("workqueue: Add workqueue_attrs->__pod_cpumask")
https://git.kernel.org/cgit/linux/kernel/git/tj/wq.git affinity-scopes-dbg-invalid-cpu

in testcase: xfstests
version: xfstests-x86_64-06c027a-1_20230529
with following parameters:

disk: 4HDD
fs: ext4
fs2: smbv3
test: generic-group-19



compiler: gcc-12
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-----------------------------------------------------+------------+------------+
| | d5e2e83891 | e3485721f9 |
+-----------------------------------------------------+------------+------------+
| WARNING:at_kernel/workqueue.c:#workqueue_online_cpu | 0 | 14 |
| RIP:workqueue_online_cpu | 0 | 14 |
| WARNING:at_kernel/workqueue.c:#process_one_work | 0 | 14 |
| RIP:process_one_work | 0 | 14 |
+-----------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <[email protected]>
| Closes: https://lore.kernel.org/oe-lkp/[email protected]


[ 3.388228][ T21] ------------[ cut here ]------------
[ 3.394990][ T21] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:5393 workqueue_online_cpu (kernel/workqueue.c:5393 kernel/workqueue.c:5481)
[ 3.407780][ T21] Modules linked in:
[ 3.412747][ T21] CPU: 1 PID: 21 Comm: cpuhp/1 Not tainted 6.4.0-rc1-00033-ge3485721f97d #1
[ 3.423777][ T21] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.8.1 12/05/2017
[ 3.433776][ T21] RIP: 0010:workqueue_online_cpu (kernel/workqueue.c:5393 kernel/workqueue.c:5481)
[ 3.441752][ T21] Code: e8 03 80 3c 18 00 0f 85 c5 00 00 00 48 8b 45 60 48 8d 68 a0 49 39 c6 75 94 4c 89 ff e8 e1 1a 5a 02 e9 34 fc ff ff 0f 0b eb a5 <0f> 0b e9 08 ff ff ff 48 8b 05 ea 32 30 03 4c 8b 7c 24 28 48 bd 00
All code
========
0: e8 03 80 3c 18 callq 0x183c8008
5: 00 0f add %cl,(%rdi)
7: 85 c5 test %eax,%ebp
9: 00 00 add %al,(%rax)
b: 00 48 8b add %cl,-0x75(%rax)
e: 45 60 rex.RB (bad)
10: 48 8d 68 a0 lea -0x60(%rax),%rbp
14: 49 39 c6 cmp %rax,%r14
17: 75 94 jne 0xffffffffffffffad
19: 4c 89 ff mov %r15,%rdi
1c: e8 e1 1a 5a 02 callq 0x25a1b02
21: e9 34 fc ff ff jmpq 0xfffffffffffffc5a
26: 0f 0b ud2
28: eb a5 jmp 0xffffffffffffffcf
2a:* 0f 0b ud2 <-- trapping instruction
2c: e9 08 ff ff ff jmpq 0xffffffffffffff39
31: 48 8b 05 ea 32 30 03 mov 0x33032ea(%rip),%rax # 0x3303322
38: 4c 8b 7c 24 28 mov 0x28(%rsp),%r15
3d: 48 rex.W
3e: bd .byte 0xbd
...

Code starting with the faulting instruction
===========================================
0: 0f 0b ud2
2: e9 08 ff ff ff jmpq 0xffffffffffffff0f
7: 48 8b 05 ea 32 30 03 mov 0x33032ea(%rip),%rax # 0x33032f8
e: 4c 8b 7c 24 28 mov 0x28(%rsp),%r15
13: 48 rex.W
14: bd .byte 0xbd
...
[ 3.466823][ T21] RSP: 0000:ffffc900001cfcd0 EFLAGS: 00010282
[ 3.474752][ T21] RAX: 00000000ffffffea RBX: dffffc0000000000 RCX: 0000000000000003
[ 3.484775][ T21] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff88810d78e094
[ 3.495775][ T21] RBP: ffffed10f3997388 R08: ffff88810d78d7e0 R09: ffff88810c98f168
[ 3.505775][ T21] R10: fffff52000039f52 R11: 0000000000000040 R12: ffff88810d081c50
[ 3.515775][ T21] R13: ffff88810d081c00 R14: ffff88879ccb9c08 R15: ffff88879ccb9940
[ 3.525775][ T21] FS: 0000000000000000(0000) GS:ffff88879cc80000(0000) knlGS:0000000000000000
[ 3.537777][ T21] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3.545754][ T21] CR2: 0000000000000000 CR3: 000000081be44001 CR4: 00000000003706e0
[ 3.555775][ T21] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3.566775][ T21] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3.576775][ T21] Call Trace:
[ 3.580746][ T21] <TASK>
[ 3.584746][ T21] ? workqueue_prepare_cpu (kernel/workqueue.c:5470)
[ 3.591752][ T21] ? local_clock (kernel/sched/clock.c:270 kernel/sched/clock.c:307)
[ 3.596749][ T21] cpuhp_invoke_callback (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/cpuhp.h:65 kernel/cpu.c:193)
[ 3.603751][ T21] ? try_to_wake_up (include/linux/sched.h:2254 kernel/sched/core.c:4312)
[ 3.610751][ T21] cpuhp_thread_fun (kernel/cpu.c:815)
[ 3.616750][ T21] ? __cpuhp_remove_state (kernel/cpu.c:772)
[ 3.622751][ T21] ? _raw_read_unlock_irqrestore (kernel/locking/spinlock.c:161)
[ 3.630752][ T21] ? __kthread_parkme (kernel/kthread.c:270)
[ 3.636750][ T21] ? schedule (arch/x86/include/asm/preempt.h:85 (discriminator 1) kernel/sched/core.c:6749 (discriminator 1))
[ 3.642749][ T21] ? smpboot_thread_fn (kernel/smpboot.c:112)
[ 3.648750][ T21] smpboot_thread_fn (kernel/smpboot.c:164 (discriminator 3))
[ 3.654751][ T21] ? sort_range (kernel/smpboot.c:107)
[ 3.660748][ T21] kthread (kernel/kthread.c:379)
[ 3.665748][ T21] ? kthread_complete_and_exit (kernel/kthread.c:332)
[ 3.672752][ T21] ret_from_fork (arch/x86/entry/entry_64.S:314)
[ 3.678751][ T21] </TASK>
[ 3.682745][ T21] ---[ end trace 0000000000000000 ]---
[ 3.690931][ T1] #2
[ 1.085404][ T0] masked ExtINT on CPU#2
[ 3.700819][ T1] #3
[ 1.085404][ T0] masked ExtINT on CPU#3
[ 3.709915][ T1] smp: Brought up 1 node, 4 CPUs
[ 3.710982][ T1] smpboot: Max logical packages: 1
[ 3.711750][ T1] smpboot: Total of 4 processors activated (25599.84 BogoMIPS)
[ 3.810660][ T39] node 0 deferred pages initialised in 95ms
[ 3.820827][ T1] devtmpfs: initialized
[ 3.826633][ T1] x86/mm: Memory block size: 128MB


To reproduce:

git clone https://github.com/intel/lkp-tests.git
cd lkp-tests
sudo bin/lkp install job.yaml # job file is attached in this email
bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
sudo bin/lkp run generated-yaml-file

# if come across any failure that blocks the test,
# please remove ~/.lkp and /lkp dir to run from a clean state.



--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



Attachments:
(No filename) (6.95 kB)
config-6.4.0-rc1-00033-ge3485721f97d (165.11 kB)
job-script (5.89 kB)
dmesg.xz (45.82 kB)
xfstests (1.69 kB)
job.yaml (5.01 kB)
reproduce (1.42 kB)
Download all attachments

2023-06-12 17:55:37

by Tejun Heo

[permalink] [raw]
Subject: Re: [tj-wq:affinity-scopes-dbg-invalid-cpu] [workqueue] e3485721f9: WARNING:at_kernel/workqueue.c:#workqueue_online_cpu

On Mon, Jun 12, 2023 at 10:04:02AM +0800, kernel test robot wrote:
> [ 3.388228][ T21] ------------[ cut here ]------------
> [ 3.394990][ T21] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:5393 workqueue_online_cpu (kernel/workqueue.c:5393 kernel/workqueue.c:5481)
...
> [ 3.584746][ T21] ? workqueue_prepare_cpu (kernel/workqueue.c:5470)
> [ 3.591752][ T21] ? local_clock (kernel/sched/clock.c:270 kernel/sched/clock.c:307)
> [ 3.596749][ T21] cpuhp_invoke_callback (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/cpuhp.h:65 kernel/cpu.c:193)
> [ 3.603751][ T21] ? try_to_wake_up (include/linux/sched.h:2254 kernel/sched/core.c:4312)
> [ 3.610751][ T21] cpuhp_thread_fun (kernel/cpu.c:815)

This is set_cpus_allowed_ptr() failing on kworkers when workqueue tries to
put back the per-cpu kworkers on the onlining CPU, which is rather
unexpected. Let me include the error code in the warning message so that we
can understand why it fails the next time.

Thanks.

--
tejun

2023-06-12 17:59:21

by Tejun Heo

[permalink] [raw]
Subject: Re: [tj-wq:affinity-scopes-dbg-invalid-cpu] [workqueue] e3485721f9: WARNING:at_kernel/workqueue.c:#workqueue_online_cpu

On Mon, Jun 12, 2023 at 07:42:11AM -1000, Tejun Heo wrote:
> On Mon, Jun 12, 2023 at 10:04:02AM +0800, kernel test robot wrote:
> > [ 3.388228][ T21] ------------[ cut here ]------------
> > [ 3.394990][ T21] WARNING: CPU: 1 PID: 21 at kernel/workqueue.c:5393 workqueue_online_cpu (kernel/workqueue.c:5393 kernel/workqueue.c:5481)
> ...
> > [ 3.584746][ T21] ? workqueue_prepare_cpu (kernel/workqueue.c:5470)
> > [ 3.591752][ T21] ? local_clock (kernel/sched/clock.c:270 kernel/sched/clock.c:307)
> > [ 3.596749][ T21] cpuhp_invoke_callback (arch/x86/include/asm/jump_label.h:27 include/linux/jump_label.h:207 include/trace/events/cpuhp.h:65 kernel/cpu.c:193)
> > [ 3.603751][ T21] ? try_to_wake_up (include/linux/sched.h:2254 kernel/sched/core.c:4312)
> > [ 3.610751][ T21] cpuhp_thread_fun (kernel/cpu.c:815)
>
> This is set_cpus_allowed_ptr() failing on kworkers when workqueue tries to
> put back the per-cpu kworkers on the onlining CPU, which is rather
> unexpected. Let me include the error code in the warning message so that we
> can understand why it fails the next time.

Ah, never mind. This ->__pod_cpumaks not being initialized correctly by the
patch and is already fixed in the newer version of the code.

Thanks.

--
tejun