2017-04-14 03:01:14

by kernel test robot

[permalink] [raw]
Subject: [lkp-robot] [locking/ww] 57dd924e54: BUG:soft_lockup-CPU##stuck_for#s


FYI, we noticed the following commit:

commit: 57dd924e541a98219bf3a508623db2e0c07e75a7 ("locking/ww-mutex: Limit stress test to 2 seconds")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G

caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):


+--------------------------------------------------+------------+------------+
| | 44fe84459f | 57dd924e54 |
+--------------------------------------------------+------------+------------+
| boot_successes | 0 | 0 |
| boot_failures | 8 | 8 |
| INFO:task_blocked_for_more_than#seconds | 8 | |
| calltrace:debug_show_all_locks | 8 | |
| calltrace:ww_mutex_lock_slow | 7 | |
| Kernel_panic-not_syncing:hung_task:blocked_tasks | 8 | |
| calltrace:stress_inorder_work | 1 | |
| IP-Config:Auto-configuration_of_network_failed | 0 | 4 |
| BUG:soft_lockup-CPU##stuck_for#s | 0 | 4 |
| Kernel_panic-not_syncing:softlockup:hung_tasks | 0 | 4 |
+--------------------------------------------------+------------+------------+



[ 273.632796] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/0:1]
[ 273.675540] irq event stamp: 4865008
[ 273.685130] hardirqs last enabled at (4865007): [<ffffffff8e848aad>] restore_regs_and_iret+0x0/0x1d
[ 273.698804] hardirqs last disabled at (4865008): [<ffffffff8e848d08>] apic_timer_interrupt+0x98/0xb0
[ 273.712354] softirqs last enabled at (4865006): [<ffffffff8d6c6601>] __do_softirq+0x1cf/0x22a
[ 273.725144] softirqs last disabled at (4864999): [<ffffffff8d6c6886>] irq_exit+0x56/0xa6
[ 273.737170] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc2-00029-g57dd924 #1
[ 273.748511] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 273.763553] task: ffff89a4bae9c040 task.stack: ffff89a4baea0000
[ 273.772653] RIP: 0010:erase_augmented+0x79/0x23b
[ 273.781081] RSP: 0000:ffff89a4baea3e20 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff10
[ 273.792428] RAX: 00000000ffffffff RBX: ffffffff91ce7a38 RCX: ffff89a4bae9c040
[ 273.802992] RDX: ffffffff91ce7b50 RSI: 0000000000000000 RDI: ffffffff91ce7328
[ 273.813680] RBP: ffff89a4baea3e50 R08: ffffffff91ce7560 R09: 0000000000000000
[ 273.824245] R10: 0000000000000000 R11: ffff89a4bae9c768 R12: 0000000000000000
[ 273.834880] R13: ffffffff91ce7a38 R14: ffffffff91ce7a39 R15: ffffffff91ce8200
[ 273.845602] FS: 0000000000000000(0000) GS:ffff89a4bbc00000(0000) knlGS:0000000000000000
[ 273.857595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 273.884265] CR2: 00000000ffffffff CR3: 000000000d419000 CR4: 00000000000006a0
[ 273.894885] Call Trace:
[ 273.898736] rbtree_test_init+0x1c3/0x27f
[ 273.904839] ? irq_poll_setup+0x96/0x96
[ 273.910698] ? set_debug_rodata+0x20/0x20
[ 273.916791] do_one_initcall+0xa2/0x17c
[ 273.922687] ? set_debug_rodata+0x20/0x20
[ 273.928744] kernel_init_freeable+0x202/0x2a3
[ 273.935363] ? rest_init+0x16d/0x16d
[ 273.940809] kernel_init+0xf/0x142
[ 273.946056] ? rest_init+0x16d/0x16d
[ 273.951528] ret_from_fork+0x31/0x40
[ 273.956974] Code: 45 10 75 0b e8 e5 db e1 ff 4d 89 65 10 eb 17 e8 da db e1 ff 4d 89 65 08 eb 0c e8 cf db e1 ff 4c 89 25 e7 ca 3b 04 e8 c3 db e1 ff <4d> 85 e4 74 0e e8 b9 db e1 ff 4d 89 34 24 e9 4e 01 00 00 e8 ab
[ 273.987248] Kernel panic - not syncing: softlockup: hung tasks
[ 273.996061] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G L 4.11.0-rc2-00029-g57dd924 #1
[ 274.009087] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
[ 274.024313] Call Trace:
[ 274.028133] <IRQ>
[ 274.031381] dump_stack+0xb1/0x103
[ 274.036617] panic+0xfb/0x2c2
[ 274.041180] watchdog_timer_fn+0x227/0x25a
[ 274.047441] ? __touch_watchdog+0x1c/0x1c
[ 274.053526] hrtimer_run_queues+0x140/0x1e4
[ 274.059914] run_local_timers+0x1e/0x64
[ 274.065741] update_process_times+0x21/0x4d
[ 274.072159] tick_nohz_handler+0xbd/0xf6
[ 274.078129] smp_trace_apic_timer_interrupt+0x62/0x74
[ 274.085735] smp_apic_timer_interrupt+0x9/0xb
[ 274.092356] apic_timer_interrupt+0x9d/0xb0
[ 274.098767] RIP: 0010:erase_augmented+0x79/0x23b
[ 274.107194] RSP: 0000:ffff89a4baea3e20 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff10
[ 274.118506] RAX: 00000000ffffffff RBX: ffffffff91ce7a38 RCX: ffff89a4bae9c040
[ 274.129096] RDX: ffffffff91ce7b50 RSI: 0000000000000000 RDI: ffffffff91ce7328
[ 274.139761] RBP: ffff89a4baea3e50 R08: ffffffff91ce7560 R09: 0000000000000000
[ 274.150265] R10: 0000000000000000 R11: ffff89a4bae9c768 R12: 0000000000000000
[ 274.160893] R13: ffffffff91ce7a38 R14: ffffffff91ce7a39 R15: ffffffff91ce8200
[ 274.171516] </IRQ>
[ 274.174831] rbtree_test_init+0x1c3/0x27f
[ 274.182921] ? irq_poll_setup+0x96/0x96
[ 274.188758] ? set_debug_rodata+0x20/0x20
[ 274.194881] do_one_initcall+0xa2/0x17c
[ 274.200776] ? set_debug_rodata+0x20/0x20
[ 274.206941] kernel_init_freeable+0x202/0x2a3
[ 274.213644] ? rest_init+0x16d/0x16d
[ 274.219061] kernel_init+0xf/0x142
[ 274.224238] ? rest_init+0x16d/0x16d
[ 274.229723] ret_from_fork+0x31/0x40
[ 274.236556] Kernel Offset: 0xc600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Elapsed time: 290

initrds=(
/osimage/debian/debian-x86_64-2016-08-31.cgz
/lkp/scheduled/vm-lkp-wsx03-4G-7/boot-1-debian-x86_64-2016-08-31.cgz-57dd924e541a98219bf3a508623db2e0c07e75a7-20170412-119024-102ictu-0.cgz
/lkp/lkp/lkp-x86_64.cgz
/osimage/deps/debian-x86_64-2016-08-31.cgz/lkp_2017-04-01.cgz
/osimage/deps/debian-x86_64-2016-08-31.cgz/rsync-rootfs_2016-11-15.cgz
/osimage/deps/debian-x86_64-2016-08-31.cgz/run-ipconfig_2016-11-15.cgz
)

cat "${initrds[@]}" > /fs/sdc1/initrd-vm-lkp-wsx03-4G-7

kvm=(
qemu-system-x86_64
-enable-kvm
-cpu host
-kernel /pkg/linux/x86_64-randconfig-r0-04020107/gcc-6/57dd924e541a98219bf3a508623db2e0c07e75a7/vmlinuz-4.11.0-rc2-00029-g57dd924
-initrd /fs/sdc1/initrd-vm-lkp-wsx03-4G-7
-m 4096
-smp 2
-device e1000,netdev=net0
-netdev user,id=net0,hostfwd=tcp::23666-:22
-boot order=nc
-no-reboot
-watchdog i6300esb
-watchdog-action debug
-rtc base=localtime
-drive file=/fs/sdc1/disk0-vm-lkp-wsx03-4G-7,media=disk,if=virtio
-drive file=/fs/sdc1/disk1-vm-lkp-wsx03-4G-7,media=disk,if=virtio
-pidfile /dev/shm/kboot/pid-vm-lkp-wsx03-4G-7
-serial file:/dev/shm/kboot/vm-lkp-wsx03-4G-7/serial
-serial file:/dev/shm/kboot/vm-lkp-wsx03-4G-7/kmsg
-daemonize


To reproduce:

git clone https://github.com/01org/lkp-tests.git
cd lkp-tests
bin/lkp qemu -k <bzImage> job-script # job-script is attached in this email



Thanks,
Xiaolong


Attachments:
(No filename) (6.93 kB)
config-4.11.0-rc2-00029-g57dd924 (113.61 kB)
job-script (3.99 kB)
dmesg.xz (9.64 kB)
Download all attachments

2017-04-14 08:56:58

by Chris Wilson

[permalink] [raw]
Subject: Re: [lkp-robot] [locking/ww] 57dd924e54: BUG:soft_lockup-CPU##stuck_for#s

On Fri, Apr 14, 2017 at 10:59:27AM +0800, kernel test robot wrote:
>
> FYI, we noticed the following commit:
>
> commit: 57dd924e541a98219bf3a508623db2e0c07e75a7 ("locking/ww-mutex: Limit stress test to 2 seconds")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> in testcase: boot
>
> on test machine: qemu-system-x86_64 -enable-kvm -cpu host -smp 2 -m 4G
>
> caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
>
>
> +--------------------------------------------------+------------+------------+
> | | 44fe84459f | 57dd924e54 |
> +--------------------------------------------------+------------+------------+
> | boot_successes | 0 | 0 |
> | boot_failures | 8 | 8 |
> | INFO:task_blocked_for_more_than#seconds | 8 | |
> | calltrace:debug_show_all_locks | 8 | |
> | calltrace:ww_mutex_lock_slow | 7 | |
> | Kernel_panic-not_syncing:hung_task:blocked_tasks | 8 | |
> | calltrace:stress_inorder_work | 1 | |
> | IP-Config:Auto-configuration_of_network_failed | 0 | 4 |
> | BUG:soft_lockup-CPU##stuck_for#s | 0 | 4 |
> | Kernel_panic-not_syncing:softlockup:hung_tasks | 0 | 4 |
> +--------------------------------------------------+------------+------------+
>
>
>
> [ 273.632796] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [swapper/0:1]
> [ 273.675540] irq event stamp: 4865008
> [ 273.685130] hardirqs last enabled at (4865007): [<ffffffff8e848aad>] restore_regs_and_iret+0x0/0x1d
> [ 273.698804] hardirqs last disabled at (4865008): [<ffffffff8e848d08>] apic_timer_interrupt+0x98/0xb0
> [ 273.712354] softirqs last enabled at (4865006): [<ffffffff8d6c6601>] __do_softirq+0x1cf/0x22a
> [ 273.725144] softirqs last disabled at (4864999): [<ffffffff8d6c6886>] irq_exit+0x56/0xa6
> [ 273.737170] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 4.11.0-rc2-00029-g57dd924 #1
> [ 273.748511] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-20161025_171302-gandalf 04/01/2014
> [ 273.763553] task: ffff89a4bae9c040 task.stack: ffff89a4baea0000
> [ 273.772653] RIP: 0010:erase_augmented+0x79/0x23b
> [ 273.781081] RSP: 0000:ffff89a4baea3e20 EFLAGS: 00000296 ORIG_RAX: ffffffffffffff10
> [ 273.792428] RAX: 00000000ffffffff RBX: ffffffff91ce7a38 RCX: ffff89a4bae9c040
> [ 273.802992] RDX: ffffffff91ce7b50 RSI: 0000000000000000 RDI: ffffffff91ce7328
> [ 273.813680] RBP: ffff89a4baea3e50 R08: ffffffff91ce7560 R09: 0000000000000000
> [ 273.824245] R10: 0000000000000000 R11: ffff89a4bae9c768 R12: 0000000000000000
> [ 273.834880] R13: ffffffff91ce7a38 R14: ffffffff91ce7a39 R15: ffffffff91ce8200
> [ 273.845602] FS: 0000000000000000(0000) GS:ffff89a4bbc00000(0000) knlGS:0000000000000000
> [ 273.857595] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 273.884265] CR2: 00000000ffffffff CR3: 000000000d419000 CR4: 00000000000006a0
> [ 273.894885] Call Trace:
> [ 273.898736] rbtree_test_init+0x1c3/0x27f
> [ 273.904839] ? irq_poll_setup+0x96/0x96
> [ 273.910698] ? set_debug_rodata+0x20/0x20
> [ 273.916791] do_one_initcall+0xa2/0x17c
> [ 273.922687] ? set_debug_rodata+0x20/0x20
> [ 273.928744] kernel_init_freeable+0x202/0x2a3
> [ 273.935363] ? rest_init+0x16d/0x16d
> [ 273.940809] kernel_init+0xf/0x142
> [ 273.946056] ? rest_init+0x16d/0x16d
> [ 273.951528] ret_from_fork+0x31/0x40
> [ 273.956974] Code: 45 10 75 0b e8 e5 db e1 ff 4d 89 65 10 eb 17 e8 da db e1 ff 4d 89 65 08 eb 0c e8 cf db e1 ff 4c 89 25 e7 ca 3b 04 e8 c3 db e1 ff <4d> 85 e4 74 0e e8 b9 db e1 ff 4d 89 34 24 e9 4e 01 00 00 e8 ab

It looks like (summary + trace) that we have moved the timeout from the
ww_mutex stress test to the rbtree tests.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre