2018-04-19 03:03:21

by Fengguang Wu

[permalink] [raw]
Subject: [kcov] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

Hello,

FYI this happens in mainline kernel 4.17.0-rc1.
It at least dates back to v4.8 .

[ 25.697463]
[ 25.697463] Start testing find_bit() with random-filled bitmap
[ 25.736615] find_next_bit: 30203201 ns, 163965 iterations
[ 25.783266] find_next_zero_bit: 45554622 ns, 163716 iterations
[ 25.832480] find_last_bit: 48003878 ns, 163965 iterations
[ 52.050031] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 52.050031] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #210
[ 52.050031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 52.050031] RIP: 0010:__sanitizer_cov_trace_pc+0x65/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:111
[ 52.050031] RSP: 0000:ffff88000fd43e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 52.050031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000017
[ 52.050031] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
[ 52.050031] RBP: 0000000000050000 R08: 0000000000000008 R09: 0000000000000000
[ 52.050031] R10: ffff88000fd3e000 R11: 00000000222e2c32 R12: ffffffff85565d90
[ 52.050031] R13: 0000000000030f80 R14: 0000000000030f80 R15: 0000000000000000
[ 52.050031] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
[ 52.050031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 52.050031] CR2: 0000000000000000 CR3: 0000000003c6a000 CR4: 00000000000006b0
[ 52.050031] Call Trace:
[ 52.050031] find_first_bit+0x66/0x7c:
find_first_bit at lib/find_bit.c:106 (discriminator 1)
[ 52.050031] test_find_first_bit+0x3a/0x93
[ 52.050031] find_bit_test+0x84/0x173:
find_bit_test at lib/find_bit_benchmark.c:135
[ 52.050031] ? test_find_next_and_bit+0x9b/0x9b
[ 52.050031] do_one_initcall+0x18e/0x3df:
do_one_initcall at init/main.c:883
[ 52.050031] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 52.050031] ? __sanitizer_cov_trace_pc+0x2b/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:101
[ 52.050031] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 52.050031] kernel_init_freeable+0x250/0x3a5:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 52.050031] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 52.050031] kernel_init+0x17/0x218:
kernel_init at init/main.c:1053
[ 52.050031] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 52.050031] ret_from_fork+0x35/0x40:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 52.050031] Code: 1c 00 00 8b 8b 1c 1c 00 00 48 ff 05 d2 2d fd 04 48 8b 02 48 ff c0 48 39 c1 76 0e 48 ff 05 c8 2d fd 04 48 89 2c c2 48 89 02 5b 5d <c3> 48 8b 0c 24 40 0f b6 d6 40 0f b6 f7 31 ff 48 ff 05 e9 2d fd
[ 52.050031] Kernel panic - not syncing: softlockup: hung tasks
[ 52.050031] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G L 4.17.0-rc1 #210
[ 52.050031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[ 52.050031] Call Trace:
[ 52.050031] <IRQ>
[ 52.050031] dump_stack+0xf3/0x135:
__dump_stack at lib/dump_stack.c:77
(inlined by) dump_stack at lib/dump_stack.c:113
[ 52.050031] panic+0x173/0x440:
panic at kernel/panic.c:195
[ 52.050031] watchdog_timer_fn+0x343/0x382:
watchdog_timer_fn at kernel/watchdog.c:444
[ 52.050031] __hrtimer_run_queues+0x2a0/0x4c3:
__run_hrtimer at kernel/time/hrtimer.c:1386
(inlined by) __hrtimer_run_queues at kernel/time/hrtimer.c:1448
[ 52.050031] hrtimer_run_queues+0xcb/0xed:
hrtimer_run_queues at kernel/time/hrtimer.c:1621
[ 52.050031] run_local_timers+0x2d/0x85:
run_local_timers at kernel/time/timer.c:1704
[ 52.050031] update_process_times+0x30/0x7e:
update_process_times at kernel/time/timer.c:1636
[ 52.050031] tick_nohz_handler+0x141/0x19a:
tick_sched_handle at kernel/time/tick-sched.c:173
(inlined by) tick_nohz_handler at kernel/time/tick-sched.c:1182
[ 52.050031] smp_apic_timer_interrupt+0x1a5/0x2c4:
local_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1025
(inlined by) smp_apic_timer_interrupt at arch/x86/kernel/apic/apic.c:1050
[ 52.050031] apic_timer_interrupt+0xf/0x20:
apic_timer_interrupt at arch/x86/entry/entry_64.S:863
[ 52.050031] </IRQ>
[ 52.050031] RIP: 0010:__sanitizer_cov_trace_pc+0x65/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:111
[ 52.050031] RSP: 0000:ffff88000fd43e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 52.050031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000017
[ 52.050031] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
[ 52.050031] RBP: 0000000000050000 R08: 0000000000000008 R09: 0000000000000000
[ 52.050031] R10: ffff88000fd3e000 R11: 00000000222e2c32 R12: ffffffff85565d90
[ 52.050031] R13: 0000000000030f80 R14: 0000000000030f80 R15: 0000000000000000
[ 52.050031] find_first_bit+0x66/0x7c:
find_first_bit at lib/find_bit.c:106 (discriminator 1)
[ 52.050031] test_find_first_bit+0x3a/0x93
[ 52.050031] find_bit_test+0x84/0x173:
find_bit_test at lib/find_bit_benchmark.c:135
[ 52.050031] ? test_find_next_and_bit+0x9b/0x9b
[ 52.050031] do_one_initcall+0x18e/0x3df:
do_one_initcall at init/main.c:883
[ 52.050031] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 52.050031] ? __sanitizer_cov_trace_pc+0x2b/0x66:
__sanitizer_cov_trace_pc at kernel/kcov.c:101
[ 52.050031] ? check_kcov_mode+0x5/0x37:
check_kcov_mode at kernel/kcov.c:60
[ 52.050031] kernel_init_freeable+0x250/0x3a5:
do_initcall_level at init/main.c:951
(inlined by) do_initcalls at init/main.c:959
(inlined by) do_basic_setup at init/main.c:977
(inlined by) kernel_init_freeable at init/main.c:1127
[ 52.050031] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 52.050031] kernel_init+0x17/0x218:
kernel_init at init/main.c:1053
[ 52.050031] ? rest_init+0x13a/0x13a:
kernel_init at init/main.c:1050
[ 52.050031] ret_from_fork+0x35/0x40:
ret_from_fork at arch/x86/entry/entry_64.S:418
[ 52.050031] Kernel Offset: disabled

Attached the full dmesg, kconfig and reproduce scripts.

Thanks,
Fengguang


Attachments:
(No filename) (6.49 kB)
dmesg-vm-intel12-yocto-x86_64-5:20180417140128:x86_64-randconfig-s3-04171327:4.17.0-rc1:210 (43.04 kB)
.config (114.49 kB)
job-script (4.11 kB)
reproduce-vm-intel12-yocto-x86_64-5:20180417140128:x86_64-randconfig-s3-04171327:4.17.0-rc1:210 (2.15 kB)
Download all attachments

2018-04-20 14:09:13

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [kcov] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]



On 04/19/2018 06:01 AM, Fengguang Wu wrote:
> Hello,
>
> FYI this happens in mainline kernel 4.17.0-rc1.
> It at least dates back to v4.8 .
>
> [ 25.697463]
> [ 25.697463] Start testing find_bit() with random-filled bitmap
> [ 25.736615] find_next_bit: 30203201 ns, 163965 iterations
> [ 25.783266] find_next_zero_bit: 45554622 ns, 163716 iterations
> [ 25.832480] find_last_bit: 48003878 ns, 163965 iterations
> [ 52.050031] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> [ 52.050031] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #210
> [ 52.050031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> [ 52.050031] RIP: 0010:__sanitizer_cov_trace_pc+0x65/0x66:
> __sanitizer_cov_trace_pc at kernel/kcov.c:111
> [ 52.050031] RSP: 0000:ffff88000fd43e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> [ 52.050031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000017
> [ 52.050031] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
> [ 52.050031] RBP: 0000000000050000 R08: 0000000000000008 R09: 0000000000000000
> [ 52.050031] R10: ffff88000fd3e000 R11: 00000000222e2c32 R12: ffffffff85565d90
> [ 52.050031] R13: 0000000000030f80 R14: 0000000000030f80 R15: 0000000000000000
> [ 52.050031] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
> [ 52.050031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 52.050031] CR2: 0000000000000000 CR3: 0000000003c6a000 CR4: 00000000000006b0
> [ 52.050031] Call Trace:
> [ 52.050031] find_first_bit+0x66/0x7c:
> find_first_bit at lib/find_bit.c:106 (discriminator 1)
> [ 52.050031] test_find_first_bit+0x3a/0x93


I suppose test_find_first_bit() needs a little more than zero cond_resched() calls.


> [ 52.050031] find_bit_test+0x84/0x173:
> find_bit_test at lib/find_bit_benchmark.c:135
> [ 52.050031] ? test_find_next_and_bit+0x9b/0x9b
> [ 52.050031] do_one_initcall+0x18e/0x3df:
> do_one_initcall at init/main.c:883
> [ 52.050031] ? check_kcov_mode+0x5/0x37:
> check_kcov_mode at kernel/kcov.c:60
> [ 52.050031] ? __sanitizer_cov_trace_pc+0x2b/0x66:
> __sanitizer_cov_trace_pc at kernel/kcov.c:101
> [ 52.050031] ? check_kcov_mode+0x5/0x37:
> check_kcov_mode at kernel/kcov.c:60
> [ 52.050031] kernel_init_freeable+0x250/0x3a5:
> do_initcall_level at init/main.c:951
> (inlined by) do_initcalls at init/main.c:959
> (inlined by) do_basic_setup at init/main.c:977
> (inlined by) kernel_init_freeable at init/main.c:1127
> [ 52.050031] ? rest_init+0x13a/0x13a:
> kernel_init at init/main.c:1050
> [ 52.050031] kernel_init+0x17/0x218:
> kernel_init at init/main.c:1053
> [ 52.050031] ? rest_init+0x13a/0x13a:
> kernel_init at init/main.c:1050
> [ 52.050031] ret_from_fork+0x35/0x40:
> ret_from_fork at arch/x86/entry/entry_64.S:418
> [ 52.050031] Code: 1c 00 00 8b 8b 1c 1c 00 00 48 ff 05 d2 2d fd 04 48 8b 02 48 ff c0 48 39 c1 76 0e 48 ff 05 c8 2d fd 04 48 89 2c c2 48 89 02 5b 5d <c3> 48 8b 0c 24 40 0f b6 d6 40 0f b6 f7 31 ff 48 ff 05 e9 2d fd
> [ 52.050031] Kernel panic - not syncing: softlockup: hung tasks

2018-04-20 16:50:00

by Yury Norov

[permalink] [raw]
Subject: Re: [kcov] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

On Fri, Apr 20, 2018 at 05:08:23PM +0300, Andrey Ryabinin wrote:
>
>
> On 04/19/2018 06:01 AM, Fengguang Wu wrote:
> > Hello,
> >
> > FYI this happens in mainline kernel 4.17.0-rc1.
> > It at least dates back to v4.8 .
> >
> > [ 25.697463]
> > [ 25.697463] Start testing find_bit() with random-filled bitmap
> > [ 25.736615] find_next_bit: 30203201 ns, 163965 iterations
> > [ 25.783266] find_next_zero_bit: 45554622 ns, 163716 iterations
> > [ 25.832480] find_last_bit: 48003878 ns, 163965 iterations
> > [ 52.050031] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
> > [ 52.050031] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #210
> > [ 52.050031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
> > [ 52.050031] RIP: 0010:__sanitizer_cov_trace_pc+0x65/0x66:
> > __sanitizer_cov_trace_pc at kernel/kcov.c:111
> > [ 52.050031] RSP: 0000:ffff88000fd43e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
> > [ 52.050031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000017
> > [ 52.050031] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
> > [ 52.050031] RBP: 0000000000050000 R08: 0000000000000008 R09: 0000000000000000
> > [ 52.050031] R10: ffff88000fd3e000 R11: 00000000222e2c32 R12: ffffffff85565d90
> > [ 52.050031] R13: 0000000000030f80 R14: 0000000000030f80 R15: 0000000000000000
> > [ 52.050031] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
> > [ 52.050031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 52.050031] CR2: 0000000000000000 CR3: 0000000003c6a000 CR4: 00000000000006b0
> > [ 52.050031] Call Trace:
> > [ 52.050031] find_first_bit+0x66/0x7c:
> > find_first_bit at lib/find_bit.c:106 (discriminator 1)
> > [ 52.050031] test_find_first_bit+0x3a/0x93
>
>
> I suppose test_find_first_bit() needs a little more than zero cond_resched() calls.

Or just decrease length of bitmap in this test. find_first_bit() takes
lion's share of time of the test.

Start testing find_bit() with random-filled bitmap
find_next_bit: 240043 cycles, 164062 iterations
find_next_zero_bit: 312848 cycles, 163619 iterations
find_last_bit: 193748 cycles, 164062 iterations
find_first_bit: 177720874 cycles, 164062 iterations

I'll test the patch and send it soon if you OK with it.

Yury

2018-04-20 19:55:11

by Andrey Ryabinin

[permalink] [raw]
Subject: Re: [kcov] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]

On 04/20/2018 07:45 PM, Yury Norov wrote:
> On Fri, Apr 20, 2018 at 05:08:23PM +0300, Andrey Ryabinin wrote:
>>
>>
>> On 04/19/2018 06:01 AM, Fengguang Wu wrote:
>>> Hello,
>>>
>>> FYI this happens in mainline kernel 4.17.0-rc1.
>>> It at least dates back to v4.8 .
>>>
>>> [ 25.697463]
>>> [ 25.697463] Start testing find_bit() with random-filled bitmap
>>> [ 25.736615] find_next_bit: 30203201 ns, 163965 iterations
>>> [ 25.783266] find_next_zero_bit: 45554622 ns, 163716 iterations
>>> [ 25.832480] find_last_bit: 48003878 ns, 163965 iterations
>>> [ 52.050031] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
>>> [ 52.050031] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.17.0-rc1 #210
>>> [ 52.050031] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
>>> [ 52.050031] RIP: 0010:__sanitizer_cov_trace_pc+0x65/0x66:
>>> __sanitizer_cov_trace_pc at kernel/kcov.c:111
>>> [ 52.050031] RSP: 0000:ffff88000fd43e40 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
>>> [ 52.050031] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000017
>>> [ 52.050031] RDX: 0000000000000000 RSI: ffff88000fd3e000 RDI: 0000000000000002
>>> [ 52.050031] RBP: 0000000000050000 R08: 0000000000000008 R09: 0000000000000000
>>> [ 52.050031] R10: ffff88000fd3e000 R11: 00000000222e2c32 R12: ffffffff85565d90
>>> [ 52.050031] R13: 0000000000030f80 R14: 0000000000030f80 R15: 0000000000000000
>>> [ 52.050031] FS: 0000000000000000(0000) GS:ffff88001f600000(0000) knlGS:0000000000000000
>>> [ 52.050031] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [ 52.050031] CR2: 0000000000000000 CR3: 0000000003c6a000 CR4: 00000000000006b0
>>> [ 52.050031] Call Trace:
>>> [ 52.050031] find_first_bit+0x66/0x7c:
>>> find_first_bit at lib/find_bit.c:106 (discriminator 1)
>>> [ 52.050031] test_find_first_bit+0x3a/0x93
>>
>>
>> I suppose test_find_first_bit() needs a little more than zero cond_resched() calls.
>
> Or just decrease length of bitmap in this test. find_first_bit() takes
> lion's share of time of the test.
>

Yeah, that would be better.

> Start testing find_bit() with random-filled bitmap
> find_next_bit: 240043 cycles, 164062 iterations
> find_next_zero_bit: 312848 cycles, 163619 iterations
> find_last_bit: 193748 cycles, 164062 iterations
> find_first_bit: 177720874 cycles, 164062 iterations
>
> I'll test the patch and send it soon if you OK with it.
>
> Yury
>