2016-04-18 01:06:43

by kernel test robot

[permalink] [raw]
Subject: [lkp] [parallel lookups machinery, part 2] 96352435af: kernel BUG at mm/slab.c:2599!

FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git testing.lookups
commit 96352435afd792d713ec91e2d5693bdcf40a6b03 ("parallel lookups machinery, part 2")


+-------------------------------------------------------+------------+------------+
| | 028d11dcb3 | 96352435af |
+-------------------------------------------------------+------------+------------+
| boot_successes | 12 | 0 |
| boot_failures | 0 | 12 |
| kernel_BUG_at_mm/slab.c | 0 | 12 |
| invalid_opcode:#[##]SMP | 0 | 12 |
| RIP:kfree_debugcheck | 0 | 12 |
| Kernel_panic-not_syncing:Fatal_exception_in_interrupt | 0 | 12 |
| backtrace:smpboot_thread_fn | 0 | 11 |
+-------------------------------------------------------+------------+------------+



[ 56.931276] random: nonblocking pool is initialized
[ 81.306979] kfree_debugcheck: out of range ptr a0h
[ 81.308254] ------------[ cut here ]------------
[ 81.309323] kernel BUG at mm/slab.c:2599!
[ 81.310557] invalid opcode: 0000 [#1] SMP
[ 81.311669] Modules linked in:
[ 81.312539] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.6.0-rc1-00021-g9635243 #167
[ 81.314687] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Debian-1.8.2-1 04/01/2014
[ 81.316678] task: ffff8800314b4080 ti: ffff8800314b8000 task.ti: ffff8800314b8000
[ 81.318633] RIP: 0010:[<ffffffff811fecc1>] [<ffffffff811fecc1>] kfree_debugcheck+0x25/0x2a
[ 81.320659] RSP: 0000:ffff8800314bbcc8 EFLAGS: 00010092
[ 81.321788] RAX: 0000000000000026 RBX: 00000000000000a0 RCX: 0000000000000000
[ 81.323327] RDX: ffffffff8112e952 RSI: ffffffff82ea5638 RDI: ffffffff8111c147
[ 81.324783] RBP: ffff8800314bbcd0 R08: 0000000000000001 R09: 0000000000000000
[ 81.326214] R10: ffff8800314bbca8 R11: ffffffff831728b0 R12: 0000000000000206
[ 81.327586] R13: ffffffff811c97d1 R14: ffff88002abca1b8 R15: 0000000000000000
[ 81.329022] FS: 0000000000000000(0000) GS:ffff880033600000(0000) knlGS:0000000000000000
[ 81.331078] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 81.332283] CR2: 0000000047eb9da0 CR3: 000000002dba4000 CR4: 00000000000006f0
[ 81.333670] DR0: 00000000089cb000 DR1: 0000000000000000 DR2: 0000000000000000
[ 81.335118] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000600
[ 81.336572] Stack:
[ 81.337315] 00000000000000a0 ffff8800314bbcf8 ffffffff812007a4 ffff88002a5c4380
[ 81.339449] ffffffff84081080 ffff88002a5c4380 ffff8800314bbd10 ffffffff811c97d1
[ 81.341560] ffff8800337d8d40 ffff8800314bbda0 ffffffff8113da92 ffff8800314bc000
[ 81.343828] Call Trace:
[ 81.344757] [<ffffffff812007a4>] kfree+0xdc/0x18c
[ 81.346043] [<ffffffff811c97d1>] shmem_destroy_callback+0x19/0x2f
[ 81.347545] [<ffffffff8113da92>] rcu_process_callbacks+0x60d/0x830
[ 81.349030] [<ffffffff811c97b8>] ? shmem_destroy_inode+0x3b/0x3b
[ 81.350491] [<ffffffff82ea8df0>] __do_softirq+0x1a0/0x403
[ 81.351840] [<ffffffff810e58f0>] run_ksoftirqd+0x23/0x5c
[ 81.353179] [<ffffffff811000d8>] smpboot_thread_fn+0x1a9/0x1c0
[ 81.354574] [<ffffffff810fff2f>] ? cpumask_next+0x2f/0x2f
[ 81.355885] [<ffffffff810fc9a8>] kthread+0xc5/0xcd
[ 81.357110] [<ffffffff82ea200c>] ? __wait_for_common+0x119/0x14c
[ 81.358502] [<ffffffff82ea6072>] ret_from_fork+0x22/0x50
[ 81.359780] [<ffffffff810fc8e3>] ? cpumask_check+0x35/0x35
[ 81.361114] Code: 41 5e 41 5f 5d c3 66 66 66 66 90 55 48 89 e5 53 48 89 fb e8 50 44 ea ff 84 c0 75 11 48 89 de 48 c7 c7 04 4c 9e 83 e8 bb 3b fb ff <0f> 0b 5b 5d c3 8b 4f 18 48 29 f2 55 89 d6 48 89 e5 5d 48 0f af
[ 81.366666] RIP [<ffffffff811fecc1>] kfree_debugcheck+0x25/0x2a
[ 81.367604] RSP <ffff8800314bbcc8>
[ 81.368286] ---[ end trace 1f4ba232faf192ba ]---
[ 81.369021] Kernel panic - not syncing: Fatal exception in interrupt


vm-kbuild-yocto-ia32: 1 threads qemu-system-x86_64 -enable-kvm -cpu Westmere with 320M memory

FYI, raw QEMU command line is:

qemu-system-x86_64 -enable-kvm -cpu Westmere -kernel /pkg/linux/x86_64-allyesdebian/gcc-5/96352435afd792d713ec91e2d5693bdcf40a6b03/vmlinuz-4.6.0-rc1-00021-g9635243 -append 'root=/dev/ram0 user=lkp job=/lkp/scheduled/vm-kbuild-yocto-ia32-8/rand_boot-1-yocto-minimal-i386.cgz-x86_64-allyesdebian-96352435afd792d713ec91e2d5693bdcf40a6b03-20160416-73769-rktar3-0.yaml ARCH=x86_64 kconfig=x86_64-allyesdebian branch=vfs/testing.lookups commit=96352435afd792d713ec91e2d5693bdcf40a6b03 BOOT_IMAGE=/pkg/linux/x86_64-allyesdebian/gcc-5/96352435afd792d713ec91e2d5693bdcf40a6b03/vmlinuz-4.6.0-rc1-00021-g9635243 max_uptime=600 RESULT_ROOT=/result/boot/1/vm-kbuild-yocto-ia32/yocto-minimal-i386.cgz/x86_64-allyesdebian/gcc-5/96352435afd792d713ec91e2d5693bdcf40a6b03/0 LKP_SERVER=inn earlyprintk=ttyS0,115200 systemd.log_level=err debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 console=ttyS0,115200 console=tty0 vga=normal rw ip=::::vm-kbuild-yocto-ia32-8::dhcp drbd.minor_count=8' -initrd /fs/sdh1/initrd-vm-kbuild-yocto-ia32-8 -m 832 -smp 1 -device e1000,netdev=net0 -netdev user,id=net0 -boot order=nc -no-reboot -watchdog i6300esb -rtc base=localtime -drive file=/fs/sdh1/disk0-vm-kbuild-yocto-ia32-8,media=disk,if=virtio -pidfile /dev/shm/kboot/pid-vm-kbuild-yocto-ia32-8 -serial file:/dev/shm/kboot/serial-vm-kbuild-yocto-ia32-8 -daemonize -display none -monitor null





Thanks,
Kernel Test Robot


Attachments:
(No filename) (5.67 kB)
config-4.6.0-rc1-00021-g9635243 (148.35 kB)
dmesg.xz (30.45 kB)
Download all attachments

2016-04-18 01:55:27

by Al Viro

[permalink] [raw]
Subject: Re: [lkp] [parallel lookups machinery, part 2] 96352435af: kernel BUG at mm/slab.c:2599!

On Mon, Apr 18, 2016 at 09:06:35AM +0800, kernel test robot wrote:
> FYI, we noticed the below changes on
>
> https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git testing.lookups
> commit 96352435afd792d713ec91e2d5693bdcf40a6b03 ("parallel lookups machinery, part 2")

Very odd. All stores in that commit are to ->d_parent->d_inode->i_dir_seq
of in-lookup dentries. And if ->d_parent in those gets buggered, we were
really deep in trouble. Could you try to reproduce that without __d_move()
or __d_add() parts and see which one ends up triggering that crap?

Note that we don't *use* ->i_dir_seq yet...

2016-04-18 02:08:41

by Huang, Ying

[permalink] [raw]
Subject: Re: [LKP] [lkp] [parallel lookups machinery, part 2] 96352435af: kernel BUG at mm/slab.c:2599!

Al Viro <[email protected]> writes:

> On Mon, Apr 18, 2016 at 09:06:35AM +0800, kernel test robot wrote:
>> FYI, we noticed the below changes on
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git testing.lookups
>> commit 96352435afd792d713ec91e2d5693bdcf40a6b03 ("parallel lookups machinery, part 2")
>
> Very odd. All stores in that commit are to ->d_parent->d_inode->i_dir_seq
> of in-lookup dentries. And if ->d_parent in those gets buggered, we were
> really deep in trouble. Could you try to reproduce that without __d_move()
> or __d_add() parts and see which one ends up triggering that crap?

Could you provide a debug branch in your tree for that?

Best Regards,
Huang, Ying

> Note that we don't *use* ->i_dir_seq yet...
> _______________________________________________
> LKP mailing list
> [email protected]
> https://lists.01.org/mailman/listinfo/lkp

2016-04-18 02:15:09

by Al Viro

[permalink] [raw]
Subject: Re: [LKP] [lkp] [parallel lookups machinery, part 2] 96352435af: kernel BUG at mm/slab.c:2599!

On Mon, Apr 18, 2016 at 10:08:37AM +0800, Huang, Ying wrote:

> Could you provide a debug branch in your tree for that?

Done - vfs.git#T1 and vfs.git#T2 resp.

How reproducible it is and how certain you are that it's not there on
the parent commit?

2016-04-18 07:12:51

by Huang, Ying

[permalink] [raw]
Subject: Re: [LKP] [lkp] [parallel lookups machinery, part 2] 96352435af: kernel BUG at mm/slab.c:2599!

Al Viro <[email protected]> writes:

Hi, Viro,

> On Mon, Apr 18, 2016 at 10:08:37AM +0800, Huang, Ying wrote:
>
>> Could you provide a debug branch in your tree for that?
>
> Done - vfs.git#T1 and vfs.git#T2 resp.

The compare result between T1 and T2 is as follow:

=========================================================================================
compiler/kconfig/rootfs/sleep/tbox_group/testcase:
gcc-5/x86_64-allyesdebian/quantal-core-x86_64.cgz/1/vm-vp-quantal-x86_64/boot

commit:
3ab6378a68555a90a11360e2f2d63c9adb4e7101 (T1)
e570d87b645a0bfd80e36df9753faf3598c9fce0 (T2)

3ab6378a68555a90 e570d87b645a0bfd80e36df975
---------------- --------------------------
fail:runs %reproduction fail:runs
| | |
47:47 -100% :3 dmesg.Kernel_panic-not_syncing:Fatal_exception_in_interrupt
47:47 -100% :3 dmesg.RIP:kfree_debugcheck
47:47 -100% :3 dmesg.invalid_opcode:#[##]SMP
47:47 -100% :3 dmesg.kernel_BUG_at_mm/slab.c
7:47 -15% :3 kmsg.APIC_calibration_not_consistent_with_PM-Timer:#ms_instead_of#ms
1:47 -2% :3 kmsg.kfree_debugcheck:out_of_range_ptr#a8h
7:47 -15% :3 kmsg.kfree_debugcheck:out_of_range_ptr#aah
21:47 -45% :3 kmsg.kfree_debugcheck:out_of_range_ptr#ach
1:47 -2% :3 kmsg.kfree_debugcheck:out_of_range_ptr#bah
2:47 -4% :3 kmsg.kfree_debugcheck:out_of_range_ptr#ch
4:47 -9% :3 kmsg.kfree_debugcheck:out_of_range_ptr#eh
11:47 -23% :3 kmsg.kfree_debugcheck:out_of_range_ptr#h

That is, there is same issue for T1, but no such issue for T2.

> How reproducible it is and how certain you are that it's not there on
> the parent commit?

The result is quite stable for tens tests for parents and the first bad
commit.

Best Regards,
Huang, Ying

2016-04-18 11:09:57

by Al Viro

[permalink] [raw]
Subject: Re: [LKP] [lkp] [parallel lookups machinery, part 2] 96352435af: kernel BUG at mm/slab.c:2599!

On Mon, Apr 18, 2016 at 03:12:47PM +0800, Huang, Ying wrote:

> That is, there is same issue for T1, but no such issue for T2.
>
> > How reproducible it is and how certain you are that it's not there on
> > the parent commit?
>
> The result is quite stable for tens tests for parents and the first bad
> commit.

Nuts... Try vfs.git#T3, please - the only way this could cause memory
corruption, AFAICS, is some really weird caller of d_add() or d_splice_alias(),
passing it a dentry with bogus ->d_parent...