2015-12-18 20:08:10

by Alex Ng (LIS)

Subject: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next

Hi,

I was running a "git clone" of the linux-next source tree and hit the following BUG_ON condition. My box is running kernel 4.4.0-rc5-next-20151217-52.27. Any ideas on how to pin down the cause?

The trace indicates that the following condition in compare_css_sets() triggered the oops:

BUG_ON(cgrp1->root != cgrp2->root);

[ 1859.800805] ------------[ cut here ]------------
[ 1859.804082] kernel BUG at kernel/cgroup.c:834!
[ 1859.804082] invalid opcode: 0000 [#1] SMP
[ 1859.804082] Modules linked in: iscsi_ibft iscsi_boot_sysfs af_packet crct10dif_pclmul crc32_pclmul ghash_clmulni_intel drbg ansi_cprng aesni_intel i2c_piix4 hv_netvsc serio_raw pcspkr hyperv_keyboard aes_x86_64 lrw hyperv_fb joydev gf128mul glue_helper ablk_helper hv_utils acpi_cpufreq cryptd processor button dm_mod xfs libcrc32c sd_mod hid_generic sr_mod cdrom ata_generic ata_piix hid_hyperv hv_storvsc ahci libahci crc32c_intel hv_vmbus libata floppy sg scsi_mod autofs4
[ 1859.804082] CPU: 2 PID: 1 Comm: systemd Not tainted 4.4.0-rc5-next-20151217-52.27-default+ #2
[ 1859.804082] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006 05/23/2012
[ 1859.804082] task: ffff880101c54040 ti: ffff880101c58000 task.ti: ffff880101c58000
[ 1859.804082] RIP: 0010:[<ffffffff810f108d>] [<ffffffff810f108d>] find_css_set+0x3ad/0x3e0
[ 1859.804082] RSP: 0018:ffff880101c5bc38 EFLAGS: 00010207
[ 1859.804082] RAX: ffff88003694b238 RBX: ffff8800f10d0638 RCX: ffff8800eefa8220
[ 1859.804082] RDX: ffff8800f14b5a20 RSI: ffff88003694b250 RDI: ffff880101c5bc48
[ 1859.804082] RBP: ffff880101c5bcc0 R08: 0000000000000000 R09: ffff8800f12efc00
[ 1859.804082] R10: ffff8800f18e3800 R11: 0000000000000000 R12: ffff8800f3938400
[ 1859.804082] R13: ffff880101c5bc48 R14: ffff8800f10d0600 R15: ffff88003694b200
[ 1859.804082] FS: 00007f994345a880(0000) GS:ffff880102e40000(0000) knlGS:0000000000000000
[ 1859.804082] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1859.804082] CR2: 00007fc829d19000 CR3: 0000000036e46000 CR4: 00000000000006e0
[ 1859.804082] Stack:
[ 1859.804082] ffff880101c5bc88 ffffffff810c3970 ffffffff81a74b00 ffffffff81dcc380
[ 1859.804082] ffffffff81a4d100 ffffffff81f5c660 ffff8801023df800 ffff8801023db500
[ 1859.804082] ffff8801023d7400 ffff8801023d7340 ffff8801023d7280 ffff8801023db400
[ 1859.804082] Call Trace:
[ 1859.804082] [<ffffffff810c3970>] ? __wait_rcu_gp+0xd0/0xf0
[ 1859.804082] [<ffffffff810f115a>] cgroup_migrate_prepare_dst+0x9a/0x200
[ 1859.804082] [<ffffffff810f2065>] cgroup_attach_task+0x65/0xd0
[ 1859.804082] [<ffffffff810abf1d>] ? percpu_down_write+0x5d/0xd0
[ 1859.804082] [<ffffffff810f2348>] __cgroup_procs_write.isra.22+0x1b8/0x2d0
[ 1859.804082] [<ffffffff810f2493>] cgroup_procs_write+0x13/0x20
[ 1859.804082] [<ffffffff810edb28>] cgroup_file_write+0x38/0xf0
[ 1859.804082] [<ffffffff81250380>] kernfs_fop_write+0x120/0x170
[ 1859.804082] [<ffffffff811daf08>] __vfs_write+0x28/0xe0
[ 1859.804082] [<ffffffff8129a618>] ? apparmor_file_permission+0x18/0x20
[ 1859.804082] [<ffffffff81273dbd>] ? security_file_permission+0x3d/0xc0
[ 1859.804082] [<ffffffff810abe47>] ? percpu_down_read+0x17/0x50
[ 1859.804082] [<ffffffff811db7c2>] vfs_write+0xa2/0x1a0
[ 1859.804082] [<ffffffff81051310>] ? __do_page_fault+0x1a0/0x3f0
[ 1859.804082] [<ffffffff811dc726>] SyS_write+0x46/0xa0
[ 1859.804082] [<ffffffff815aafee>] entry_SYSCALL_64_fastpath+0x12/0x71
[ 1859.804082] Code: 03 10 48 8b 72 08 48 89 4a 08 48 89 11 48 89 71 08 48 89 0e f6 40 74 01 75 c3 48 8b 50 18 f6 c2 03 75 22 65 48 ff 02 eb b4 0f 0b <0f> 0b 31 c0 e9 b0 fd ff ff 4c 89 ff e8 72 92 0c 00 31 c0 e9 a1
[ 1860.196107] RIP [<ffffffff810f108d>] find_css_set+0x3ad/0x3e0
[ 1860.196107] RSP <ffff880101c5bc38>
[ 1860.199742] ---[ end trace 3a415fee224c72a3 ]---
[ 1860.199744] Kernel panic - not syncing: Fatal exception in interrupt
[ 1860.203733] Kernel Offset: disabled
[ 1860.203733] ---[ end Kernel panic - not syncing: Fatal exception in interrupt
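For readers unfamiliar with this assertion: in 4.4-era kernel/cgroup.c, compare_css_sets() walks the candidate css_set's and the old css_set's per-hierarchy cgroup links in lockstep. Both lists are expected to be linked in the same hierarchy order, so each pair of cgroups must share a root. The BUG_ON above fires when that ordering invariant is broken. The sketch below is a minimal userspace model of that walk, not the kernel source. It assumes plain arrays in place of the kernel's cgrp_links lists, omits the subsystem-template (css) comparison, and every model_* name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct cgroup_root, struct cgroup and
 * struct css_set; these are NOT the kernel definitions. */
struct model_root { int id; };
struct model_cgroup { const struct model_root *root; };

#define MODEL_MAX_LINKS 8
struct model_cset {
	/* One cgroup per hierarchy, in hierarchy order (models cgrp_links).
	 * Callers must keep nr_links <= MODEL_MAX_LINKS. */
	const struct model_cgroup *links[MODEL_MAX_LINKS];
	size_t nr_links;
};

enum model_cmp {
	MODEL_MISMATCH, /* the kernel function would return false */
	MODEL_MATCH,    /* the kernel function would return true */
	MODEL_BUG       /* a BUG_ON() would fire in the kernel */
};

/* Decide whether cset can replace old_cset once the task moves to new_cgrp. */
static enum model_cmp model_compare_css_sets(const struct model_cset *cset,
					     const struct model_cset *old_cset,
					     const struct model_cgroup *new_cgrp)
{
	for (size_t i = 0;; i++) {
		bool end1 = (i == cset->nr_links);
		bool end2 = (i == old_cset->nr_links);

		/* Both lists must end at the same step. */
		if (end1 || end2)
			return (end1 && end2) ? MODEL_MATCH : MODEL_BUG;

		const struct model_cgroup *cgrp1 = cset->links[i];
		const struct model_cgroup *cgrp2 = old_cset->links[i];

		/* Models BUG_ON(cgrp1->root != cgrp2->root): hierarchies
		 * must be linked in the same order in both css_sets. */
		if (cgrp1->root != cgrp2->root)
			return MODEL_BUG;

		if (cgrp1->root == new_cgrp->root) {
			/* The hierarchy being changed must point at new_cgrp. */
			if (cgrp1 != new_cgrp)
				return MODEL_MISMATCH;
		} else if (cgrp1 != cgrp2) {
			/* Every other hierarchy must be unchanged. */
			return MODEL_MISMATCH;
		}
	}
}
```

In this model, a candidate whose links list the same hierarchies in a different order returns MODEL_BUG rather than MODEL_MISMATCH. That mirrors why the kernel treats such a pair as a corrupted invariant rather than an ordinary non-match.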

--
Alex Ng


2015-12-21 21:56:22

by Tejun Heo

Subject: Re: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next

Hello, Alex.

On Fri, Dec 18, 2015 at 08:08:03PM +0000, Alex Ng (LIS) wrote:
> Hi,
>
> I was running a "git clone" of the linux-next source tree and hit the following BUG_ON condition. My box is running kernel 4.4.0-rc5-next-20151217-52.27. Any ideas on how to pin down the cause?
>
> The trace indicates that the following condition in compare_css_sets() triggered the oops:

Can you please let me know the steps to reproduce the bug?

Thanks.

--
tejun

2015-12-22 20:40:36

by Alex Ng (LIS)

Subject: RE: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next

> Hello, Alex.
>
> On Fri, Dec 18, 2015 at 08:08:03PM +0000, Alex Ng (LIS) wrote:
> > Hi,
> >
> > I was running a "git clone" of the linux-next source tree and hit the
> following BUG_ON condition. My box is running kernel 4.4.0-rc5-next-
> 20151217-52.27. Any ideas on how to pin down the cause?
> >
> > The trace indicates that the following condition in compare_css_sets()
> triggered the oops:
>
> Can you please let me know the steps to reproduce the bug?

I tried this on a Hyper-V VM hosted on Windows Server 2012 R2 and ran the attached script.
The script clones the linux-next tree into a random directory under /tmp in a tight loop.

This panic is not always reproducible; I have hit it only once in about 10 runs of the script. A different kernel panic occurs on each run, and the panics always happen during the first iteration of the loop.

Let me know if you need more information.

Hope this helps,
Alex


Attachments:
test.sh (207.00 B)

2015-12-23 16:54:28

by Tejun Heo

Subject: Re: [OOPS] BUG_ON in cgroups on 4.4.0-rc5-next

Hello, Alex.

On Tue, Dec 22, 2015 at 07:06:41PM +0000, Alex Ng (LIS) wrote:
> > Can you please let me know the steps to reproduce the bug?
>
> I tried this on a Hyper-V VM hosted in Windows Server 2012R2 and ran
> the attached script. The script clones the linux-next tree in a
> random directory under /tmp in a tight loop.
>
> This panic is not always reproducible, and I have only hit it once
> after running the script about 10 times. A different kernel panic
> happens each time I run this script; and the panics always happen
> during the first iteration of the loop.

Heh, I don't get it. The script doesn't do anything cgroup-specific.
Can you please apply the attached patch, reproduce the issue and
report the kernel log?

Thanks.

--
tejun


Attachments:
(No filename) (756.00 B)
dbg (1.11 kB)