LinuxLists.cc - [REGRESSION 2.6.35+] crash (maybe kmemleak related)

2010-08-12 20:37:57

Subject: [REGRESSION 2.6.35+] crash (maybe kmemleak related)

I am getting this crash on boot very often on git master of Linus tree.

(Note that I use kmemleak, but disable its scan thread using

'echo "scan=off" > /sys/kernel/debug/kmemleak'
in rc.local.

<7>[ 24.438184] wlan0: authenticated
<7>[ 24.441687] wlan0: associate with 00:1b:9e:d8:77:02 (try 1)
<7>[ 24.444425] wlan0: RX AssocResp from 00:1b:9e:d8:77:02 (capab=0x431 status=0 aid=3)
<7>[ 24.444448] wlan0: associated
<4>[ 24.511898] kmemleak: Cannot insert 0xffff880079fe6000 into the object search tree (already existing)
<4>[ 24.511922] Pid: 2031, comm: NetworkManager Not tainted 2.6.35-rc0+ #108
<4>[ 24.511936] Call Trace:
<4>[ 24.511950] [<ffffffff810dc41a>] create_object+0x27a/0x2d0
<4>[ 24.511966] [<ffffffff810d7b9b>] ? __slab_alloc+0xab/0x5c0
<4>[ 24.511982] [<ffffffff81137e71>] ? proc_self_follow_link+0x71/0xa0
<4>[ 24.511999] [<ffffffff8138ea29>] kmemleak_alloc+0x59/0xd0
<4>[ 24.512013] [<ffffffff810d8144>] kmem_cache_alloc+0x94/0x100
<4>[ 24.512028] [<ffffffff81137e71>] proc_self_follow_link+0x71/0xa0
<4>[ 24.512045] [<ffffffff810ea1bc>] link_path_walk+0x26c/0xd00
<4>[ 24.512060] [<ffffffff810ebe22>] do_filp_open+0x152/0x660
<4>[ 24.512075] [<ffffffff810648a0>] ? sched_clock_cpu+0xc0/0x110
<4>[ 24.512091] [<ffffffff8106e6ac>] ? lock_release_holdtime+0xac/0x150
<4>[ 24.512107] [<ffffffff8103888d>] ? sub_preempt_count+0x9d/0xd0
<4>[ 24.512123] [<ffffffff813a0eb0>] ? _raw_spin_unlock+0x30/0x60
<4>[ 24.512139] [<ffffffff810f7222>] ? alloc_fd+0xf2/0x140
<4>[ 24.512153] [<ffffffff810dc890>] do_sys_open+0x60/0x110
<4>[ 24.512166] [<ffffffff810dc96b>] sys_open+0x1b/0x20
<4>[ 24.512181] [<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
<6>[ 24.512205] kmemleak: Kernel memory leak detector disabled
<5>[ 24.512221] kmemleak: Object 0xffff880079fe6000 (size 4096):
<5>[ 24.512234] kmemleak: comm "NetworkManager", pid 2031, jiffies 4294939747
<5>[ 24.512248] kmemleak: min_count = 1
<5>[ 24.512257] kmemleak: count = 0
<5>[ 24.512265] kmemleak: flags = 0x1
<5>[ 24.512273] kmemleak: checksum = 0
<5>[ 24.512281] kmemleak: backtrace:
<4>[ 24.512289] [<ffffffff8138ea29>] kmemleak_alloc+0x59/0xd0
<4>[ 24.512304] [<ffffffff810d8144>] kmem_cache_alloc+0x94/0x100
<4>[ 24.512319] [<ffffffff810e9d76>] getname+0x36/0x210
<4>[ 24.512333] [<ffffffff810dc85d>] do_sys_open+0x2d/0x110
<4>[ 24.512347] [<ffffffff810dc96b>] sys_open+0x1b/0x20
<4>[ 24.512361] [<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
<4>[ 24.512377] [<ffffffffffffffff>] 0xffffffffffffffff
<0>[ 24.578578] general protection fault: 0000 [#1] PREEMPT SMP
<0>[ 24.578667] last sysfs file: /sys/devices/virtual/net/lo/type
<4>[ 24.578719] CPU 0
<4>[ 24.578734] Modules linked in: af_packet nfsd exportfs nfs lockd nfs_acl auth_rpcgss sunrpc iwl3945 snd_hda_codec_realtek ir_lirc_codec usb_storage lirc_dev uvcvideo iwlcore videodev v4l2_compat_ioctl32 ir_sony_decoder usb_libusual ir_jvc_decoder cpufreq_powersave mac80211 uhci_hcd cpufreq_conservative snd_hda_intel ehci_hcd snd_hda_codec ir_rc6_decoder sdhci_pci cpufreq_userspace joydev r852 sm_common nand sdhci snd_hwdep snd_pcm ir_rc5_decoder acpi_cpufreq mmc_core usbcore tg3 nand_ids ir_nec_decoder mperf cfg80211 ac battery iTCO_wdt coretemp r592 psmouse nand_ecc ene_ir mtd ir_core memstick snd_page_alloc serio_raw evdev libphy sg video
<4>[ 24.579449]
<4>[ 24.579475] Pid: 2031, comm: autoconf.h Not tainted 2.6.35-rc0+ #108 Nettiling/Aspire 5720
<4>[ 24.579538] RIP: 0010:[<ffffffff810d8111>] [<ffffffff810d8111>] kmem_cache_alloc+0x61/0x100
<4>[ 24.579612] RSP: 0018:ffff88007ca0fec8 EFLAGS: 00010002
<4>[ 24.579652] RAX: ffff8800025cffa0 RBX: 00000000000000d0 RCX: 0000000000000000
<4>[ 24.579700] RDX: 0000000000000000 RSI: 00000000000006a5 RDI: ffffffff814d0460
<4>[ 24.579746] RBP: ffff88007ca0fef8 R08: 00007fa12c547300 R09: 00007fa12c55b220
<4>[ 24.579793] R10: 00007fff3dc48160 R11: 0000000000000246 R12: ffffffff8154bfe0
<4>[ 24.579840] R13: 0000000000000292 R14: 68642f6e6962732f R15: ffffffff810e9d76
<4>[ 24.579886] FS: 00007fa12c73e700(0000) GS:ffff880002400000(0000) knlGS:0000000000000000
<4>[ 24.579947] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
<4>[ 24.579988] CR2: 00007fa12c2a20e0 CR3: 000000007cb2e000 CR4: 00000000000006f0
<4>[ 24.580036] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>[ 24.580084] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
<4>[ 24.580130] Process autoconf.h (pid: 2031, threadinfo ffff88007ca0e000, task ffff88007c7d0000)
<0>[ 24.580193] Stack:
<4>[ 24.580218] ffff88007ca0fee8 0000000000000000 0000000000464356 00000000ffffff9c
<4>[ 24.580281] <0> fffffffffffffff4 000000003dc485d8 ffff88007ca0ff28 ffffffff810e9d76
<4>[ 24.580367] <0> 0000000000000000 0000000000402b40 00000000ffffff9c 0000000000008002
<0>[ 24.580468] Call Trace:
<4>[ 24.580500] [<ffffffff810e9d76>] getname+0x36/0x210
<4>[ 24.580540] [<ffffffff810dc85d>] do_sys_open+0x2d/0x110
<4>[ 24.580581] [<ffffffff810dc96b>] sys_open+0x1b/0x20
<4>[ 24.580621] [<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
<0>[ 24.580663] Code: 85 8d 00 00 00 9c 41 5d fa 65 48 8b 14 25 38 d4 00 00 49 8b 04 24 48 8d 04 02 4c 8b 30 4d 85 f6 0f 84 84 00 00 00 49 63 54 24 18 <49> 8b 14 16 48 89 10 41 55 9d 4d 85 f6 74 05 66 85 db 78 3b 41
<1>[ 24.581148] RIP [<ffffffff810d8111>] kmem_cache_alloc+0x61/0x100
<4>[ 24.581198] RSP <ffff88007ca0fec8>
<4>[ 24.581233] ---[ end trace 299e97b266f497ee ]---
<0>[ 24.581272] Kernel panic - not syncing: Fatal exception
<4>[ 24.581312] Pid: 2031, comm: autoconf.h Tainted: G D 2.6.35-rc0+ #108
<4>[ 24.581369] Call Trace:
<4>[ 24.581400] [<ffffffff8139c61a>] panic+0x96/0x1a3
<4>[ 24.581439] [<ffffffff810418b5>] ? kmsg_dump+0x125/0x140
<4>[ 24.581480] [<ffffffff8100737c>] oops_end+0xcc/0xe0
<4>[ 24.581519] [<ffffffff81007586>] die+0x56/0x90
<4>[ 24.581557] [<ffffffff81004b32>] do_general_protection+0x152/0x160
<4>[ 24.581608] [<ffffffff813a142f>] general_protection+0x1f/0x30
<4>[ 24.581651] [<ffffffff810e9d76>] ? getname+0x36/0x210
<4>[ 24.581692] [<ffffffff810d8111>] ? kmem_cache_alloc+0x61/0x100
<4>[ 24.581736] [<ffffffff810e9d76>] getname+0x36/0x210
<4>[ 24.581775] [<ffffffff810dc85d>] do_sys_open+0x2d/0x110
<4>[ 24.581816] [<ffffffff810dc96b>] sys_open+0x1b/0x20
<4>[ 24.581855] [<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
<0>[ 24.581915] Rebooting in 20 seconds..

I attach full log of two instances of the crash.

Best regards,
Maxim Levitsky

Attachments:

kernel_crash.gz (17.47 kB)
kernel_crash2.gz (15.79 kB)
Download all attachments

2010-08-13 09:30:26

by Catalin Marinas

[permalink] [raw]

Subject: Re: [REGRESSION 2.6.35+] crash (maybe kmemleak related)

Hi Maxim,

Could you please send us your .config file as well?

Maxim Levitsky <[email protected]> wrote:
> I am getting this crash on boot very often on git master of Linus tree.
>
> (Note that I use kmemleak, but disable its scan thread using
>
> 'echo "scan=off" > /sys/kernel/debug/kmemleak'
> in rc.local.

It may crash before you get to this point. Anyway, there is no scanning
taking place during booting - there is a specific 1 minute delay to
avoid slowing down the boot process.

I don't think the problem is caused by kmemleak (see below).

> <4>[ 24.511898] kmemleak: Cannot insert 0xffff880079fe6000 into the object search tree (already existing)
> <4>[ 24.511922] Pid: 2031, comm: NetworkManager Not tainted 2.6.35-rc0+ #108
> <4>[ 24.511936] Call Trace:
> <4>[ 24.511950] [<ffffffff810dc41a>] create_object+0x27a/0x2d0
> <4>[ 24.511966] [<ffffffff810d7b9b>] ? __slab_alloc+0xab/0x5c0
> <4>[ 24.511982] [<ffffffff81137e71>] ? proc_self_follow_link+0x71/0xa0
> <4>[ 24.511999] [<ffffffff8138ea29>] kmemleak_alloc+0x59/0xd0
> <4>[ 24.512013] [<ffffffff810d8144>] kmem_cache_alloc+0x94/0x100
> <4>[ 24.512028] [<ffffffff81137e71>] proc_self_follow_link+0x71/0xa0
> <4>[ 24.512045] [<ffffffff810ea1bc>] link_path_walk+0x26c/0xd00
> <4>[ 24.512060] [<ffffffff810ebe22>] do_filp_open+0x152/0x660
> <4>[ 24.512075] [<ffffffff810648a0>] ? sched_clock_cpu+0xc0/0x110
> <4>[ 24.512091] [<ffffffff8106e6ac>] ? lock_release_holdtime+0xac/0x150
> <4>[ 24.512107] [<ffffffff8103888d>] ? sub_preempt_count+0x9d/0xd0
> <4>[ 24.512123] [<ffffffff813a0eb0>] ? _raw_spin_unlock+0x30/0x60
> <4>[ 24.512139] [<ffffffff810f7222>] ? alloc_fd+0xf2/0x140
> <4>[ 24.512153] [<ffffffff810dc890>] do_sys_open+0x60/0x110
> <4>[ 24.512166] [<ffffffff810dc96b>] sys_open+0x1b/0x20
> <4>[ 24.512181] [<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
> <6>[ 24.512205] kmemleak: Kernel memory leak detector disabled

Kmemleak shouldn't cause a panic, it just prints the message and
disables itself.

> <5>[ 24.512221] kmemleak: Object 0xffff880079fe6000 (size 4096):
> <5>[ 24.512234] kmemleak: comm "NetworkManager", pid 2031, jiffies 4294939747
> <5>[ 24.512248] kmemleak: min_count = 1
> <5>[ 24.512257] kmemleak: count = 0
> <5>[ 24.512265] kmemleak: flags = 0x1
> <5>[ 24.512273] kmemleak: checksum = 0
> <5>[ 24.512281] kmemleak: backtrace:
> <4>[ 24.512289] [<ffffffff8138ea29>] kmemleak_alloc+0x59/0xd0
> <4>[ 24.512304] [<ffffffff810d8144>] kmem_cache_alloc+0x94/0x100
> <4>[ 24.512319] [<ffffffff810e9d76>] getname+0x36/0x210
> <4>[ 24.512333] [<ffffffff810dc85d>] do_sys_open+0x2d/0x110
> <4>[ 24.512347] [<ffffffff810dc96b>] sys_open+0x1b/0x20
> <4>[ 24.512361] [<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
> <4>[ 24.512377] [<ffffffffffffffff>] 0xffffffffffffffff

This kmemleak warning tells us that the kmemleak_alloc() hook got called
with a pointer (0xffff880079fe6000) that's already registered with
kmemleak. The first trace tells us where the hook gets called from while
the second trace shows the details of the already existing pointer.

So __getname() allocates the same 4096 bytes block twice via
kmem_cache_alloc() but there is no kmemleak_free() call between them and
kmemleak gives up. It disables itself but does not panic the system.

Pekka, were there any recent changes in the slab/slob/slub area and
maybe a kmemleak_free() hook is missing? Or maybe something else went
wrong with the slab allocator and returns an already allocated block?

> <0>[ 24.578578] general protection fault: 0000 [#1] PREEMPT SMP

That's what's causing the fault but it doesn't seem to be related to
kmemleak.

Thanks.

--
Catalin

2010-08-13 10:28:12

by Pekka Enberg

[permalink] [raw]

Subject: Re: [REGRESSION 2.6.35+] crash (maybe kmemleak related)

Hi,

On Fri, Aug 13, 2010 at 12:26 PM, Catalin Marinas
<[email protected]> wrote:
> I don't think the problem is caused by kmemleak (see below).
>
>> <4>[ ? 24.511898] kmemleak: Cannot insert 0xffff880079fe6000 into the object search tree (already existing)
>> <4>[ ? 24.511922] Pid: 2031, comm: NetworkManager Not tainted 2.6.35-rc0+ #108
>> <4>[ ? 24.511936] Call Trace:
>> <4>[ ? 24.511950] ?[<ffffffff810dc41a>] create_object+0x27a/0x2d0
>> <4>[ ? 24.511966] ?[<ffffffff810d7b9b>] ? __slab_alloc+0xab/0x5c0
>> <4>[ ? 24.511982] ?[<ffffffff81137e71>] ? proc_self_follow_link+0x71/0xa0
>> <4>[ ? 24.511999] ?[<ffffffff8138ea29>] kmemleak_alloc+0x59/0xd0
>> <4>[ ? 24.512013] ?[<ffffffff810d8144>] kmem_cache_alloc+0x94/0x100
>> <4>[ ? 24.512028] ?[<ffffffff81137e71>] proc_self_follow_link+0x71/0xa0
>> <4>[ ? 24.512045] ?[<ffffffff810ea1bc>] link_path_walk+0x26c/0xd00
>> <4>[ ? 24.512060] ?[<ffffffff810ebe22>] do_filp_open+0x152/0x660
>> <4>[ ? 24.512075] ?[<ffffffff810648a0>] ? sched_clock_cpu+0xc0/0x110
>> <4>[ ? 24.512091] ?[<ffffffff8106e6ac>] ? lock_release_holdtime+0xac/0x150
>> <4>[ ? 24.512107] ?[<ffffffff8103888d>] ? sub_preempt_count+0x9d/0xd0
>> <4>[ ? 24.512123] ?[<ffffffff813a0eb0>] ? _raw_spin_unlock+0x30/0x60
>> <4>[ ? 24.512139] ?[<ffffffff810f7222>] ? alloc_fd+0xf2/0x140
>> <4>[ ? 24.512153] ?[<ffffffff810dc890>] do_sys_open+0x60/0x110
>> <4>[ ? 24.512166] ?[<ffffffff810dc96b>] sys_open+0x1b/0x20
>> <4>[ ? 24.512181] ?[<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
>> <6>[ ? 24.512205] kmemleak: Kernel memory leak detector disabled
>
> Kmemleak shouldn't cause a panic, it just prints the message and
> disables itself.
>
>> <5>[ ? 24.512221] kmemleak: Object 0xffff880079fe6000 (size 4096):
>> <5>[ ? 24.512234] kmemleak: ? comm "NetworkManager", pid 2031, jiffies 4294939747
>> <5>[ ? 24.512248] kmemleak: ? min_count = 1
>> <5>[ ? 24.512257] kmemleak: ? count = 0
>> <5>[ ? 24.512265] kmemleak: ? flags = 0x1
>> <5>[ ? 24.512273] kmemleak: ? checksum = 0
>> <5>[ ? 24.512281] kmemleak: ? backtrace:
>> <4>[ ? 24.512289] ? ? ?[<ffffffff8138ea29>] kmemleak_alloc+0x59/0xd0
>> <4>[ ? 24.512304] ? ? ?[<ffffffff810d8144>] kmem_cache_alloc+0x94/0x100
>> <4>[ ? 24.512319] ? ? ?[<ffffffff810e9d76>] getname+0x36/0x210
>> <4>[ ? 24.512333] ? ? ?[<ffffffff810dc85d>] do_sys_open+0x2d/0x110
>> <4>[ ? 24.512347] ? ? ?[<ffffffff810dc96b>] sys_open+0x1b/0x20
>> <4>[ ? 24.512361] ? ? ?[<ffffffff81002deb>] system_call_fastpath+0x16/0x1b
>> <4>[ ? 24.512377] ? ? ?[<ffffffffffffffff>] 0xffffffffffffffff
>
> This kmemleak warning tells us that the kmemleak_alloc() hook got called
> with a pointer (0xffff880079fe6000) that's already registered with
> kmemleak. The first trace tells us where the hook gets called from while
> the second trace shows the details of the already existing pointer.
>
> So __getname() allocates the same 4096 bytes block twice via
> kmem_cache_alloc() but there is no kmemleak_free() call between them and
> kmemleak gives up. It disables itself but does not panic the system.
>
> Pekka, were there any recent changes in the slab/slob/slub area and
> maybe a kmemleak_free() hook is missing? Or maybe something else went
> wrong with the slab allocator and returns an already allocated block?
>
>> <0>[ ? 24.578578] general protection fault: 0000 [#1] PREEMPT SMP
>
> That's what's causing the fault but it doesn't seem to be related to
> kmemleak.

Looking at Maxim's log, slab seems to be corrupted. I don't see
anything obviously wrong with SLUB (which he is using) kmemleak hooks
so it doesn't look like a slab allocator problem either. Could this be
related to the bootmem kmemleak hook changes?

Maxim, can you see some kind of crashes with CONFIG_KMEMLEAK disabled?
If yes, can you please enable SLUB debugging either by passing
'slub_debug' as kernel parameter or enabling CONFIG_SLUB_DEBUG_ON
config option and try to reproduce the issue?

Pekka

2010-08-13 10:54:27

by Catalin Marinas

[permalink] [raw]

Subject: Re: [REGRESSION 2.6.35+] crash (maybe kmemleak related)

On Fri, 2010-08-13 at 11:28 +0100, Pekka Enberg wrote:
> On Fri, Aug 13, 2010 at 12:26 PM, Catalin Marinas
> <[email protected]> wrote:
> > This kmemleak warning tells us that the kmemleak_alloc() hook got called
> > with a pointer (0xffff880079fe6000) that's already registered with
> > kmemleak. The first trace tells us where the hook gets called from while
> > the second trace shows the details of the already existing pointer.
> >
> > So __getname() allocates the same 4096 bytes block twice via
> > kmem_cache_alloc() but there is no kmemleak_free() call between them and
> > kmemleak gives up. It disables itself but does not panic the system.
> >
> > Pekka, were there any recent changes in the slab/slob/slub area and
> > maybe a kmemleak_free() hook is missing? Or maybe something else went
> > wrong with the slab allocator and returns an already allocated block?
> >
> >> <0>[ 24.578578] general protection fault: 0000 [#1] PREEMPT SMP
> >
> > That's what's causing the fault but it doesn't seem to be related to
> > kmemleak.
>
> Looking at Maxim's log, slab seems to be corrupted. I don't see
> anything obviously wrong with SLUB (which he is using) kmemleak hooks
> so it doesn't look like a slab allocator problem either. Could this be
> related to the bootmem kmemleak hook changes?

I don't think in this case since both allocations came via
kmem_cache_alloc() (and not one from bootmem and the other from slab, as
it happened to me in the past). But it's worth trying with kmemleak
disabled.

--
Catalin

2010-08-13 13:10:40

by Maxim Levitsky

[permalink] [raw]

Subject: Re: [REGRESSION 2.6.35+] crash (maybe kmemleak related)

On Fri, 2010-08-13 at 11:38 +0100, Catalin Marinas wrote:
> On Fri, 2010-08-13 at 11:28 +0100, Pekka Enberg wrote:
> > On Fri, Aug 13, 2010 at 12:26 PM, Catalin Marinas
> > <[email protected]> wrote:
> > > This kmemleak warning tells us that the kmemleak_alloc() hook got called
> > > with a pointer (0xffff880079fe6000) that's already registered with
> > > kmemleak. The first trace tells us where the hook gets called from while
> > > the second trace shows the details of the already existing pointer.
> > >
> > > So __getname() allocates the same 4096 bytes block twice via
> > > kmem_cache_alloc() but there is no kmemleak_free() call between them and
> > > kmemleak gives up. It disables itself but does not panic the system.
> > >
> > > Pekka, were there any recent changes in the slab/slob/slub area and
> > > maybe a kmemleak_free() hook is missing? Or maybe something else went
> > > wrong with the slab allocator and returns an already allocated block?
> > >
> > >> <0>[ 24.578578] general protection fault: 0000 [#1] PREEMPT SMP
> > >
> > > That's what's causing the fault but it doesn't seem to be related to
> > > kmemleak.
> >
> > Looking at Maxim's log, slab seems to be corrupted. I don't see
> > anything obviously wrong with SLUB (which he is using) kmemleak hooks
> > so it doesn't look like a slab allocator problem either. Could this be
> > related to the bootmem kmemleak hook changes?
>
> I don't think in this case since both allocations came via
> kmem_cache_alloc() (and not one from bootmem and the other from slab, as
> it happened to me in the past). But it's worth trying with kmemleak
> disabled.

Thank you.
I will try all the suggestions soon.

Best regards,
Maxim Levitsky