2014-02-07 10:14:06

by David Rientjes

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

On Fri, 7 Feb 2014, Fengguang Wu wrote:

> [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
> [ 1.627004] BTRFS: selftest: Running find delalloc tests
> [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
> [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
> [ 292.086439] kthreadd cpuset=
> [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c

This looks like a problem with the cpuset cgroup name. Are you sure this
isn't related to the removal of cgroup->name?

> [ 292.087372] PGD 0
> [ 292.087372] Oops: 0000 [#1]
> [ 292.087372] Modules linked in:
> [ 292.087372] CPU: 0 PID: 2 Comm: kthreadd Not tainted 3.14.0-rc1-wl-ath-00978-g4830363 #2
> [ 292.087372] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 292.087372] task: ffff880000148050 ti: ffff88000014a000 task.ti: ffff88000014a000
> [ 292.087372] RIP: 0010:[<ffffffff812119de>] [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> [ 292.087372] RSP: 0000:ffff88000014bb20 EFLAGS: 00010046
> [ 292.087372] RAX: 0000000000000282 RBX: 0000000000000000 RCX: 0000000000000002
> [ 292.087372] RDX: ffffffff812119de RSI: ffffffff8247d4a8 RDI: 0000000000000046
> [ 292.087372] RBP: ffff88000014bb30 R08: ffffffff82f31218 R09: 0000000000000000
> [ 292.087372] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff833c4ab8
> [ 292.087372] R13: 00000000003000d0 R14: 0000000000000001 R15: 0000000000000000
> [ 292.087372] FS: 0000000000000000(0000) GS:ffffffff82279000(0000) knlGS:0000000000000000
> [ 292.087372] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> [ 292.087372] CR2: 0000000000000038 CR3: 0000000002269000 CR4: 00000000000006b0
> [ 292.087372] Stack:
> [ 292.087372] ffff880000148050 ffffffff833c4ab8 ffff88000014bb50 ffffffff8110836d
> [ 292.087372] ffff880000148050 ffff880000148530 ffff88000014bbd0 ffffffff81ae2614
> [ 292.087372] ffff880000148640 ffff88000014bb78 ffffffff810c7dd7 ffff88000014bb98
> [ 292.087372] Call Trace:
> [ 292.087372] [<ffffffff8110836d>] cpuset_print_task_mems_allowed+0x65/0x8b
> [ 292.087372] [<ffffffff81ae2614>] dump_header.isra.11+0x68/0x29a
> [ 292.087372] [<ffffffff810c7dd7>] ? local_clock+0x2b/0x34
> [ 292.087372] [<ffffffff810ce52b>] ? lock_release_holdtime+0xcf/0xdb
> [ 292.087372] [<ffffffff811533d2>] ? out_of_memory+0x318/0x40f
> [ 292.087372] [<ffffffff8115342d>] out_of_memory+0x373/0x40f
> [ 292.087372] [<ffffffff8115324d>] ? out_of_memory+0x193/0x40f
> [ 292.087372] [<ffffffff8115a0ac>] __alloc_pages_nodemask+0xdd1/0x12c4
> [ 292.087372] [<ffffffff8100d301>] ? native_sched_clock+0xe4/0xfc
> [ 292.087372] [<ffffffff81089519>] copy_process+0x21d/0x2115
> [ 292.087372] [<ffffffff810b693e>] ? kthread_create_on_node+0x23e/0x23e
> [ 292.087372] [<ffffffff8100d322>] ? sched_clock+0x9/0xd
> [ 292.087372] [<ffffffff810c7aa7>] ? sched_clock_local.constprop.2+0x35/0xc8
> [ 292.087372] [<ffffffff810360bf>] ? pvclock_clocksource_read+0x9b/0x140
> [ 292.087372] [<ffffffff810b693e>] ? kthread_create_on_node+0x23e/0x23e
> [ 292.087372] [<ffffffff8108b61c>] do_fork+0x105/0x4db
> [ 292.087372] [<ffffffff810c7d54>] ? sched_clock_cpu+0xc9/0xdb
> [ 292.087372] [<ffffffff810c7dd7>] ? local_clock+0x2b/0x34
> [ 292.087372] [<ffffffff810ce52b>] ? lock_release_holdtime+0xcf/0xdb
> [ 292.087372] [<ffffffff810b768d>] ? kthreadd+0x1b3/0x24a
> [ 292.087372] [<ffffffff8108ba18>] kernel_thread+0x26/0x28
> [ 292.087372] [<ffffffff810b76a1>] kthreadd+0x1c7/0x24a
> [ 292.087372] [<ffffffff81afc50a>] ? ret_from_fork+0x7a/0xb0
> [ 292.087372] [<ffffffff810b74da>] ? kthread_create_on_cpu+0x7a/0x7a
> [ 292.087372] [<ffffffff81afc50a>] ret_from_fork+0x7a/0xb0
> [ 292.087372] [<ffffffff810b74da>] ? kthread_create_on_cpu+0x7a/0x7a
> [ 292.087372] Code: 1c 92 8e 00 48 8b 45 e0 59 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 c7 c7 90 d4 47 82 e8 b1 8f 8e 00 <48> 83 7b 38 00 49 89 c4 74 06 48 8b 73 40 eb 07 48 c7 c6 48 fe
> [ 292.087372] RIP [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> [ 292.087372] RSP <ffff88000014bb20>
> [ 292.087372] CR2: 0000000000000038
> [ 292.087372] ---[ end trace df25444498a82119 ]---
> [ 292.087372] ---[ end trace df25444498a82119 ]---
>
> git bisect start 483036322b45d800fd68cb028874b6cd4fee3dba 38dbfb59d1175ef458d006556061adeaa8751b72 --
> git bisect bad 50a557bc0d03997d9203d749b0304c54d2c6e5f5 # 02:20 0- 5 Merge 'cgroup/review-simplify' into devel-hourly-2014020601
> git bisect bad c04e036e53a39cf2986ddaaf5607ff011f9ea55b # 03:14 0- 1 Merge 'pinctrl/for-next' into devel-hourly-2014020601
> git bisect good 08ce20bd7ef9a08b74cdfe6fe454374be72b5825 # 03:58 25+ 0 Merge 'pci/pci/msi' into devel-hourly-2014020601
> git bisect good 4f86564c657669f69c9cc03151ef6f23ae9e2015 # 04:32 25+ 0 Merge 'm68knommu/cf' into devel-hourly-2014020601
> git bisect bad 29ae23c20b538664beaea72bb0721ce2538b4ca9 # 04:50 0- 11 Merge 'm68knommu/cfmmu' into devel-hourly-2014020601
> git bisect bad bbce71dbd03db3cf6df7faba0132d87a1a055827 # 04:58 0- 3 Merge 'nfs/devel' into devel-hourly-2014020601
> git bisect good 88a78a912ee059467ae6db7429a6efe4654620a5 # 06:16 25+ 1 Merge branch 'acl_fixes' into linux-next
> git bisect good 12b13835a0a8bfabea68741e1ab4d4a4cb77d037 # 06:36 25+ 0 kbuild: don't enable DEBUG_INFO when building for COMPILE_TEST
> git bisect bad 1f35d872a0b9dd05de8f0e50fc2957bf74ea30f7 # 06:55 0- 9 NFS: Shrink nfs_inode by sharing storage for cookieverf and commit_info
> git bisect bad 878a876b2e10888afe53766dcca33f723ae20edc # 07:16 0- 4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
> git bisect good d7512f79fd6cb8e2d9b78770289df6391a867ca1 # 07:20 25+ 0 Merge tag 'nfs-for-3.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
> git bisect bad 9343224bfd4be6a02e6ae0c0d66426c955c7d76e # 07:34 0- 12 Merge branch 'akpm' (patches from Andrew Morton)
> git bisect bad 0cc2aa51be9d2f2b001c0e070b2e5cdde89b39f4 # 07:47 0- 10 Add linux-next specific files for 20140206
>
> Thanks,
> Fengguang
>

2014-02-07 12:10:45

by Fengguang Wu

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
> On Fri, 7 Feb 2014, Fengguang Wu wrote:
>
> > [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
> > [ 1.627004] BTRFS: selftest: Running find delalloc tests
> > [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
> > [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
> > [ 292.086439] kthreadd cpuset=
> > [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
>
> This looks like a problem with the cpuset cgroup name, are you sure this
> isn't related to the removal of cgroup->name?

It does not look related to the patch "cgroup: remove cgroup->name", because
that patch lives in the cgroup tree and is not contained in the output of
"git log BAD_COMMIT".

Thanks,
Fengguang

> > [ 292.087372] PGD 0
> > [ 292.087372] Oops: 0000 [#1]
> > [ 292.087372] Modules linked in:
> > [ 292.087372] CPU: 0 PID: 2 Comm: kthreadd Not tainted 3.14.0-rc1-wl-ath-00978-g4830363 #2
> > [ 292.087372] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> > [ 292.087372] task: ffff880000148050 ti: ffff88000014a000 task.ti: ffff88000014a000
> > [ 292.087372] RIP: 0010:[<ffffffff812119de>] [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> > [ 292.087372] RSP: 0000:ffff88000014bb20 EFLAGS: 00010046
> > [ 292.087372] RAX: 0000000000000282 RBX: 0000000000000000 RCX: 0000000000000002
> > [ 292.087372] RDX: ffffffff812119de RSI: ffffffff8247d4a8 RDI: 0000000000000046
> > [ 292.087372] RBP: ffff88000014bb30 R08: ffffffff82f31218 R09: 0000000000000000
> > [ 292.087372] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff833c4ab8
> > [ 292.087372] R13: 00000000003000d0 R14: 0000000000000001 R15: 0000000000000000
> > [ 292.087372] FS: 0000000000000000(0000) GS:ffffffff82279000(0000) knlGS:0000000000000000
> > [ 292.087372] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> > [ 292.087372] CR2: 0000000000000038 CR3: 0000000002269000 CR4: 00000000000006b0
> > [ 292.087372] Stack:
> > [ 292.087372] ffff880000148050 ffffffff833c4ab8 ffff88000014bb50 ffffffff8110836d
> > [ 292.087372] ffff880000148050 ffff880000148530 ffff88000014bbd0 ffffffff81ae2614
> > [ 292.087372] ffff880000148640 ffff88000014bb78 ffffffff810c7dd7 ffff88000014bb98
> > [ 292.087372] Call Trace:
> > [ 292.087372] [<ffffffff8110836d>] cpuset_print_task_mems_allowed+0x65/0x8b
> > [ 292.087372] [<ffffffff81ae2614>] dump_header.isra.11+0x68/0x29a
> > [ 292.087372] [<ffffffff810c7dd7>] ? local_clock+0x2b/0x34
> > [ 292.087372] [<ffffffff810ce52b>] ? lock_release_holdtime+0xcf/0xdb
> > [ 292.087372] [<ffffffff811533d2>] ? out_of_memory+0x318/0x40f
> > [ 292.087372] [<ffffffff8115342d>] out_of_memory+0x373/0x40f
> > [ 292.087372] [<ffffffff8115324d>] ? out_of_memory+0x193/0x40f
> > [ 292.087372] [<ffffffff8115a0ac>] __alloc_pages_nodemask+0xdd1/0x12c4
> > [ 292.087372] [<ffffffff8100d301>] ? native_sched_clock+0xe4/0xfc
> > [ 292.087372] [<ffffffff81089519>] copy_process+0x21d/0x2115
> > [ 292.087372] [<ffffffff810b693e>] ? kthread_create_on_node+0x23e/0x23e
> > [ 292.087372] [<ffffffff8100d322>] ? sched_clock+0x9/0xd
> > [ 292.087372] [<ffffffff810c7aa7>] ? sched_clock_local.constprop.2+0x35/0xc8
> > [ 292.087372] [<ffffffff810360bf>] ? pvclock_clocksource_read+0x9b/0x140
> > [ 292.087372] [<ffffffff810b693e>] ? kthread_create_on_node+0x23e/0x23e
> > [ 292.087372] [<ffffffff8108b61c>] do_fork+0x105/0x4db
> > [ 292.087372] [<ffffffff810c7d54>] ? sched_clock_cpu+0xc9/0xdb
> > [ 292.087372] [<ffffffff810c7dd7>] ? local_clock+0x2b/0x34
> > [ 292.087372] [<ffffffff810ce52b>] ? lock_release_holdtime+0xcf/0xdb
> > [ 292.087372] [<ffffffff810b768d>] ? kthreadd+0x1b3/0x24a
> > [ 292.087372] [<ffffffff8108ba18>] kernel_thread+0x26/0x28
> > [ 292.087372] [<ffffffff810b76a1>] kthreadd+0x1c7/0x24a
> > [ 292.087372] [<ffffffff81afc50a>] ? ret_from_fork+0x7a/0xb0
> > [ 292.087372] [<ffffffff810b74da>] ? kthread_create_on_cpu+0x7a/0x7a
> > [ 292.087372] [<ffffffff81afc50a>] ret_from_fork+0x7a/0xb0
> > [ 292.087372] [<ffffffff810b74da>] ? kthread_create_on_cpu+0x7a/0x7a
> > [ 292.087372] Code: 1c 92 8e 00 48 8b 45 e0 59 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 55 48 89 e5 41 54 53 48 89 fb 48 c7 c7 90 d4 47 82 e8 b1 8f 8e 00 <48> 83 7b 38 00 49 89 c4 74 06 48 8b 73 40 eb 07 48 c7 c6 48 fe
> > [ 292.087372] RIP [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> > [ 292.087372] RSP <ffff88000014bb20>
> > [ 292.087372] CR2: 0000000000000038
> > [ 292.087372] ---[ end trace df25444498a82119 ]---
> > [ 292.087372] ---[ end trace df25444498a82119 ]---
> >
> > git bisect start 483036322b45d800fd68cb028874b6cd4fee3dba 38dbfb59d1175ef458d006556061adeaa8751b72 --
> > git bisect bad 50a557bc0d03997d9203d749b0304c54d2c6e5f5 # 02:20 0- 5 Merge 'cgroup/review-simplify' into devel-hourly-2014020601
> > git bisect bad c04e036e53a39cf2986ddaaf5607ff011f9ea55b # 03:14 0- 1 Merge 'pinctrl/for-next' into devel-hourly-2014020601
> > git bisect good 08ce20bd7ef9a08b74cdfe6fe454374be72b5825 # 03:58 25+ 0 Merge 'pci/pci/msi' into devel-hourly-2014020601
> > git bisect good 4f86564c657669f69c9cc03151ef6f23ae9e2015 # 04:32 25+ 0 Merge 'm68knommu/cf' into devel-hourly-2014020601
> > git bisect bad 29ae23c20b538664beaea72bb0721ce2538b4ca9 # 04:50 0- 11 Merge 'm68knommu/cfmmu' into devel-hourly-2014020601
> > git bisect bad bbce71dbd03db3cf6df7faba0132d87a1a055827 # 04:58 0- 3 Merge 'nfs/devel' into devel-hourly-2014020601
> > git bisect good 88a78a912ee059467ae6db7429a6efe4654620a5 # 06:16 25+ 1 Merge branch 'acl_fixes' into linux-next
> > git bisect good 12b13835a0a8bfabea68741e1ab4d4a4cb77d037 # 06:36 25+ 0 kbuild: don't enable DEBUG_INFO when building for COMPILE_TEST
> > git bisect bad 1f35d872a0b9dd05de8f0e50fc2957bf74ea30f7 # 06:55 0- 9 NFS: Shrink nfs_inode by sharing storage for cookieverf and commit_info
> > git bisect bad 878a876b2e10888afe53766dcca33f723ae20edc # 07:16 0- 4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs
> > git bisect good d7512f79fd6cb8e2d9b78770289df6391a867ca1 # 07:20 25+ 0 Merge tag 'nfs-for-3.14-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs
> > git bisect bad 9343224bfd4be6a02e6ae0c0d66426c955c7d76e # 07:34 0- 12 Merge branch 'akpm' (patches from Andrew Morton)
> > git bisect bad 0cc2aa51be9d2f2b001c0e070b2e5cdde89b39f4 # 07:47 0- 10 Add linux-next specific files for 20140206
> >
> > Thanks,
> > Fengguang
> >

2014-02-07 15:09:16

by Chris Mason

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

On Fri 07 Feb 2014 07:10:38 AM EST, Fengguang Wu wrote:
> On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
>> On Fri, 7 Feb 2014, Fengguang Wu wrote:
>>
>>> [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
>>> [ 1.627004] BTRFS: selftest: Running find delalloc tests
>>> [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
>>> [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
>>> [ 292.086439] kthreadd cpuset=
>>> [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
>>> [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
>>
>> This looks like a problem with the cpuset cgroup name, are you sure this
>> isn't related to the removal of cgroup->name?
>
> It looks not related to patch "cgroup: remove cgroup->name", because
> that patch lies in the cgroup tree and not contained in output of "git log BAD_COMMIT".

Still not sure exactly what is going on, but I can't trigger it here.
My first guess is that it is related to having btrfs built in statically:
some part of our init is happening at the wrong time, and the self tests
are swooping in and causing trouble.

2014-02-07 15:22:16

by Filipe Manana

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

On Fri, Feb 7, 2014 at 3:10 PM, Chris Mason <[email protected]> wrote:
> On Fri 07 Feb 2014 07:10:38 AM EST, Fengguang Wu wrote:
>>
>> On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
>>>
>>> On Fri, 7 Feb 2014, Fengguang Wu wrote:
>>>
>>>> [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
>>>> [ 1.627004] BTRFS: selftest: Running find delalloc tests
>>>> [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
>>>> [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1,
>>>> oom_score_adj=0
>>>> [ 292.086439] kthreadd cpuset=
>>>> [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at
>>>> 0000000000000038
>>>> [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
>>>
>>>
>>> This looks like a problem with the cpuset cgroup name, are you sure this
>>> isn't related to the removal of cgroup->name?
>>
>>
>> It looks not related to patch "cgroup: remove cgroup->name", because
>> that patch lies in the cgroup tree and not contained in output of "git log
>> BAD_COMMIT".
>
>
> Still not sure exactly what is going on, but I can't trigger it here. My
> first guess is that it is related to having btrfs static, some part of our
> init is happening at the wrong time, and the self tests are swooping in and
> causing trouble.

So far I couldn't reproduce it either, on a physical machine or in a
VM (qemu+kvm), with CONFIG_BTRFS_FS=y, CONFIG_CRYPTO_CRC32C=y and
CONFIG_CRYPTO_CRC32C_INTEL=y.
If you disable CONFIG_BTRFS_FS_RUN_SANITY_TESTS, does it still crash?

thanks

>
>



--
Filipe David Manana,

"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."

2014-02-07 21:13:10

by David Rientjes

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

On Fri, 7 Feb 2014, Fengguang Wu wrote:

> On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
> > On Fri, 7 Feb 2014, Fengguang Wu wrote:
> >
> > > [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
> > > [ 1.627004] BTRFS: selftest: Running find delalloc tests
> > > [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
> > > [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
> > > [ 292.086439] kthreadd cpuset=
> > > [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> >
> > This looks like a problem with the cpuset cgroup name, are you sure this
> > isn't related to the removal of cgroup->name?
>
> It looks not related to patch "cgroup: remove cgroup->name", because
> that patch lies in the cgroup tree and not contained in output of "git log BAD_COMMIT".
>

It's dying in pr_cont_kernfs_name(), which comes from some tree that has
"kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends".
That patch is not in linux-next, and the function is obviously printing the
cpuset cgroup name.

It doesn't look like this has anything at all to do with btrfs, and I don't
see why they would care about this failure.

2014-02-08 13:07:52

by Fengguang Wu

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

> If you disable CONFIG_BTRFS_FS_RUN_SANITY_TESTS, does it still crash?

Good idea! I've queued test jobs for that config. Sorry, but I'll be
offline for the next 2 days, so please expect some delays.

Thanks,
Fengguang

2014-02-08 20:10:44

by Tejun Heo

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

Hello, David, Fengguang, Chris.

On Fri, Feb 07, 2014 at 01:13:06PM -0800, David Rientjes wrote:
> On Fri, 7 Feb 2014, Fengguang Wu wrote:
>
> > On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
> > > On Fri, 7 Feb 2014, Fengguang Wu wrote:
> > >
> > > > [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
> > > > [ 1.627004] BTRFS: selftest: Running find delalloc tests
> > > > [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
> > > > [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
> > > > [ 292.086439] kthreadd cpuset=
> > > > [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > > [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> > >
> > > This looks like a problem with the cpuset cgroup name, are you sure this
> > > isn't related to the removal of cgroup->name?
> >
> > It looks not related to patch "cgroup: remove cgroup->name", because
> > that patch lies in the cgroup tree and not contained in output of "git log BAD_COMMIT".
> >
>
> It's dying on pr_cont_kernfs_name which is some tree that has "kernfs:
> implement kernfs_get_parent(), kernfs_name/path() and friends", which is
> not in linux-next, and is obviously printing the cpuset cgroup name.
>
> It doesn't look like it has anything at all to do with btrfs or why they
> would care about this failure.

Yeah, this is from a patch in the cgroup/review-post-kernfs-conversion
branch which updates cgroup to use pr_cont_kernfs_name(). I forgot
that cgrp->kn is NULL for the dummy_root's top cgroup, so it ends
up calling the kernfs functions with a NULL kn, hence the oops. I
posted an updated patch and the git branch has been updated.
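
For illustration only, here is a minimal userspace sketch of that failure
mode. It is not the kernel code and not the updated patch. The faulting
instruction in the Code: line, "48 83 7b 38 00", compares a field at offset
0x38 from RBX against zero, and RBX is 0 in the register dump, which matches
CR2 = 0x38. The struct layout, the pad member and both function names below
are hypothetical, chosen only to reproduce that offset on an LP64 target and
to show one possible NULL-tolerant lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a kernfs node. The padding exists only to
 * place `parent` at offset 0x38, the faulting address in the oops. */
struct node {
    char pad[0x38];
    struct node *parent;   /* offset 0x38 */
    const char *name;      /* offset 0x40 on LP64 */
};

/* Address that would be touched when reading kn->parent.
 * For kn == NULL this is just the field offset, i.e. 0x38. */
uintptr_t parent_field_addr(const struct node *kn)
{
    return (uintptr_t)kn + offsetof(struct node, parent);
}

/* NULL-tolerant name lookup: return a placeholder instead of
 * dereferencing a NULL node. A node without a parent is shown as "/". */
const char *node_name(const struct node *kn)
{
    if (!kn)
        return "(null)";
    return kn->parent ? kn->name : "/";
}
```

With this sketch, parent_field_addr(NULL) evaluates to 0x38, and
node_name(NULL) returns the placeholder string rather than faulting. Whether
the real fix guards in the caller or in the kernfs helpers is not something
this sketch tries to settle.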

http://lkml.kernel.org/g/[email protected]

So, nothing to do with btrfs and it looks like somehow the test
apparatus is mixing up branches?

Thanks!

--
tejun

2014-02-10 09:13:08

by Fengguang Wu

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

Hi Filipe,

> If you disable CONFIG_BTRFS_FS_RUN_SANITY_TESTS, does it still crash?

I tried disabling CONFIG_BTRFS_FS_RUN_SANITY_TESTS in the 3 reported
randconfigs and they all boot fine.

Thanks,
Fengguang

2014-02-10 09:25:19

by Fengguang Wu

Subject: Re: [btrfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038

On Sat, Feb 08, 2014 at 03:10:37PM -0500, Tejun Heo wrote:
> Hello, David, Fengguang, Chris.
>
> On Fri, Feb 07, 2014 at 01:13:06PM -0800, David Rientjes wrote:
> > On Fri, 7 Feb 2014, Fengguang Wu wrote:
> >
> > > On Fri, Feb 07, 2014 at 02:13:59AM -0800, David Rientjes wrote:
> > > > On Fri, 7 Feb 2014, Fengguang Wu wrote:
> > > >
> > > > > [ 1.625020] BTRFS: selftest: Running btrfs_split_item tests
> > > > > [ 1.627004] BTRFS: selftest: Running find delalloc tests
> > > > > [ 2.289182] tsc: Refined TSC clocksource calibration: 2299.967 MHz
> > > > > [ 292.084537] kthreadd invoked oom-killer: gfp_mask=0x3000d0, order=1, oom_score_adj=0
> > > > > [ 292.086439] kthreadd cpuset=
> > > > > [ 292.087072] BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
> > > > > [ 292.087372] IP: [<ffffffff812119de>] pr_cont_kernfs_name+0x1b/0x6c
> > > >
> > > > This looks like a problem with the cpuset cgroup name, are you sure this
> > > > isn't related to the removal of cgroup->name?
> > >
> > > It looks not related to patch "cgroup: remove cgroup->name", because
> > > that patch lies in the cgroup tree and not contained in output of "git log BAD_COMMIT".

Sorry, I was wrong here. I found that the above dmesg is for commit
4830363, which is a merge HEAD that contains the cgroup code.

The dmesg for commit 878a876b2e1 ("Merge branch 'for-linus' of
git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs")
looks different: it hangs after the tsc line:

[ 2.428110] Btrfs loaded, assert=on, integrity-checker=on
[ 2.429469] BTRFS: selftest: Running btrfs free space cache tests
[ 2.430874] BTRFS: selftest: Running extent only tests
[ 2.432135] BTRFS: selftest: Running bitmap only tests
[ 2.433359] BTRFS: selftest: Running bitmap and extent tests
[ 2.434675] BTRFS: selftest: Free space cache tests finished
[ 2.435959] BTRFS: selftest: Running extent buffer operation tests
[ 2.437350] BTRFS: selftest: Running btrfs_split_item tests
[ 2.438843] BTRFS: selftest: Running find delalloc tests
[ 3.158351] tsc: Refined TSC clocksource calibration: 2666.596 MHz


> > It's dying on pr_cont_kernfs_name which is some tree that has "kernfs:
> > implement kernfs_get_parent(), kernfs_name/path() and friends", which is
> > not in linux-next, and is obviously printing the cpuset cgroup name.
> >
> > It doesn't look like it has anything at all to do with btrfs or why they
> > would care about this failure.
>
> Yeah, this is from a patch in cgroup/review-post-kernfs-conversion
> branch which updates cgroup to use pr_cont_kernfs_name(). I forget
> that cgrp->kn is NULL for the dummy_root's top cgroup and thus it ends
> up calling the kernfs functions with NULL kn and thus the oops. I
> posted an updated patch and the git branch has been updated.
>
> http://lkml.kernel.org/g/[email protected]
>
> So, nothing to do with btrfs and it looks like somehow the test
> appratus is mixing up branches?

Yes - I may do random merges and boot test the resulting kernels.

Thanks,
Fengguang