2009-06-25 18:59:53

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: possible recursive locking detected at kmem_cache_free


Not sure if this has been reported elsewhere. I started seeing this with git
kernel from this morning on a system with RedHat EL5 base installation with
ext3 root file system. Attached is the config I am using.

Let me know if you need any more details. This seems to be at every boot. I
am trying the bisect right now.

Thanks,
Venki

[ 11.554714]
[ 11.554715] =============================================
[ 11.555249] [ INFO: possible recursive locking detected ]
[ 11.555560] 2.6.31-rc1 #896
[ 11.555861] ---------------------------------------------
[ 11.556127] udevd/1899 is trying to acquire lock:
[ 11.556436] (&nc->lock){-.-...}, at: [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b
[ 11.557101]
[ 11.557102] but task is already holding lock:
[ 11.557706] (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292
[ 11.558109]
[ 11.558109] other info that might help us debug this:
[ 11.558720] 2 locks held by udevd/1899:
[ 11.558983] #0: (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292
[ 11.559734] #1: (&parent->list_lock){-.-...}, at: [<ffffffff810c36c7>] __drain_alien_cache+0x3b/0xbd
[ 11.560442]
[ 11.560443] stack backtrace:
[ 11.561009] Pid: 1899, comm: udevd Not tainted 2.6.31-rc1 #896
[ 11.561276] Call Trace:
[ 11.561632] [<ffffffff81065ed6>] __lock_acquire+0x15ec/0x168f
[ 11.561901] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f
[ 11.562171] [<ffffffff81063c52>] ? trace_hardirqs_on_caller+0x113/0x13e
[ 11.562490] [<ffffffff8150c337>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 11.562807] [<ffffffff8106603a>] lock_acquire+0xc1/0xe5
[ 11.563073] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b
[ 11.563385] [<ffffffff8150c8fc>] _spin_lock+0x31/0x66
[ 11.563696] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b
[ 11.563964] [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b
[ 11.564235] [<ffffffff8109bf8c>] ? __free_pages+0x1b/0x24
[ 11.564551] [<ffffffff810c3564>] slab_destroy+0x57/0x5c
[ 11.564860] [<ffffffff810c3641>] free_block+0xd8/0x123
[ 11.565126] [<ffffffff810c372e>] __drain_alien_cache+0xa2/0xbd
[ 11.565441] [<ffffffff810c3ce5>] kfree+0x14c/0x292
[ 11.565752] [<ffffffff8144a007>] skb_release_data+0xc6/0xcb
[ 11.566020] [<ffffffff81449cf0>] __kfree_skb+0x19/0x86
[ 11.566286] [<ffffffff81449d88>] consume_skb+0x2b/0x2d
[ 11.566631] [<ffffffff8144cbe0>] skb_free_datagram+0x14/0x3a
[ 11.566901] [<ffffffff81462eef>] netlink_recvmsg+0x164/0x258
[ 11.567170] [<ffffffff81443461>] sock_recvmsg+0xe5/0xfe
[ 11.567486] [<ffffffff810ab063>] ? might_fault+0xaf/0xb1
[ 11.567802] [<ffffffff81053a78>] ? autoremove_wake_function+0x0/0x38
[ 11.568073] [<ffffffff810d84ca>] ? core_sys_select+0x3d/0x2b4
[ 11.568378] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f
[ 11.568693] [<ffffffff81442dc1>] ? sockfd_lookup_light+0x1b/0x54
[ 11.568961] [<ffffffff81444416>] sys_recvfrom+0xa3/0xf8
[ 11.569228] [<ffffffff81063c8a>] ? trace_hardirqs_on+0xd/0xf
[ 11.569546] [<ffffffff8100af2b>] system_call_fastpath+0x16/0x1b


Attachments:
(No filename) (3.03 kB)
dot.config (47.65 kB)
Download all attachments

2009-06-25 19:38:08

by Andrew Morton

[permalink] [raw]
Subject: Re: possible recursive locking detected at kmem_cache_free

On Thu, 25 Jun 2009 11:59:42 -0700
"Pallipadi, Venkatesh" <[email protected]> wrote:

>
> Not sure if this has been reported elsewhere. I started seeing this with git
> kernel from this morning on a system with RedHat EL5 base installation with
> ext3 root file system. Attached is the config I am using.
>
> Let me know if you need any more details. This seems to be at every boot. I
> am trying the bisect right now.
>

Thanks. Let's cc Pekka.

>
> [ 11.554714]
> [ 11.554715] =============================================
> [ 11.555249] [ INFO: possible recursive locking detected ]
> [ 11.555560] 2.6.31-rc1 #896
> [ 11.555861] ---------------------------------------------
> [ 11.556127] udevd/1899 is trying to acquire lock:
> [ 11.556436] (&nc->lock){-.-...}, at: [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b
> [ 11.557101]
> [ 11.557102] but task is already holding lock:
> [ 11.557706] (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292
> [ 11.558109]
> [ 11.558109] other info that might help us debug this:
> [ 11.558720] 2 locks held by udevd/1899:
> [ 11.558983] #0: (&nc->lock){-.-...}, at: [<ffffffff810c3cd0>] kfree+0x137/0x292
> [ 11.559734] #1: (&parent->list_lock){-.-...}, at: [<ffffffff810c36c7>] __drain_alien_cache+0x3b/0xbd
> [ 11.560442]
> [ 11.560443] stack backtrace:
> [ 11.561009] Pid: 1899, comm: udevd Not tainted 2.6.31-rc1 #896
> [ 11.561276] Call Trace:
> [ 11.561632] [<ffffffff81065ed6>] __lock_acquire+0x15ec/0x168f
> [ 11.561901] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f
> [ 11.562171] [<ffffffff81063c52>] ? trace_hardirqs_on_caller+0x113/0x13e
> [ 11.562490] [<ffffffff8150c337>] ? trace_hardirqs_on_thunk+0x3a/0x3f
> [ 11.562807] [<ffffffff8106603a>] lock_acquire+0xc1/0xe5
> [ 11.563073] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b
> [ 11.563385] [<ffffffff8150c8fc>] _spin_lock+0x31/0x66
> [ 11.563696] [<ffffffff810c337f>] ? kmem_cache_free+0xcd/0x25b
> [ 11.563964] [<ffffffff810c337f>] kmem_cache_free+0xcd/0x25b
> [ 11.564235] [<ffffffff8109bf8c>] ? __free_pages+0x1b/0x24
> [ 11.564551] [<ffffffff810c3564>] slab_destroy+0x57/0x5c
> [ 11.564860] [<ffffffff810c3641>] free_block+0xd8/0x123
> [ 11.565126] [<ffffffff810c372e>] __drain_alien_cache+0xa2/0xbd
> [ 11.565441] [<ffffffff810c3ce5>] kfree+0x14c/0x292
> [ 11.565752] [<ffffffff8144a007>] skb_release_data+0xc6/0xcb
> [ 11.566020] [<ffffffff81449cf0>] __kfree_skb+0x19/0x86
> [ 11.566286] [<ffffffff81449d88>] consume_skb+0x2b/0x2d
> [ 11.566631] [<ffffffff8144cbe0>] skb_free_datagram+0x14/0x3a
> [ 11.566901] [<ffffffff81462eef>] netlink_recvmsg+0x164/0x258
> [ 11.567170] [<ffffffff81443461>] sock_recvmsg+0xe5/0xfe
> [ 11.567486] [<ffffffff810ab063>] ? might_fault+0xaf/0xb1
> [ 11.567802] [<ffffffff81053a78>] ? autoremove_wake_function+0x0/0x38
> [ 11.568073] [<ffffffff810d84ca>] ? core_sys_select+0x3d/0x2b4
> [ 11.568378] [<ffffffff81065f60>] ? __lock_acquire+0x1676/0x168f
> [ 11.568693] [<ffffffff81442dc1>] ? sockfd_lookup_light+0x1b/0x54
> [ 11.568961] [<ffffffff81444416>] sys_recvfrom+0xa3/0xf8
> [ 11.569228] [<ffffffff81063c8a>] ? trace_hardirqs_on+0xd/0xf
> [ 11.569546] [<ffffffff8100af2b>] system_call_fastpath+0x16/0x1b
>

2009-06-25 21:55:47

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: Re: possible recursive locking detected at kmem_cache_free

On Thu, Jun 25, 2009 at 12:37:50PM -0700, Andrew Morton wrote:
> On Thu, 25 Jun 2009 11:59:42 -0700
> "Pallipadi, Venkatesh" <[email protected]> wrote:
>
> >
> > Not sure if this has been reported elsewhere. I started seeing this with git
> > kernel from this morning on a system with RedHat EL5 base installation with
> > ext3 root file system. Attached is the config I am using.
> >
> > Let me know if you need any more details. This seems to be at every boot. I
> > am trying the bisect right now.
> >
>
> Thanks. Let's cc Pekka.
>

git bisect result.

Thanks,
Venki

8429db5c6336083594036c30f49401405d536911 is first bad commit
commit 8429db5c6336083594036c30f49401405d536911
Author: Pekka Enberg <[email protected]>
Date: Fri Jun 12 15:58:59 2009 +0300

slab: setup cpu caches later on when interrupts are enabled

Fixes the following boot-time warning:

[ 0.000000] ------------[ cut here ]------------
[ 0.000000] WARNING: at kernel/smp.c:369 smp_call_function_many+0x56/0x1bc()
[ 0.000000] Hardware name:
[ 0.000000] Modules linked in:
[ 0.000000] Pid: 0, comm: swapper Not tainted 2.6.30 #492
[ 0.000000] Call Trace:
[ 0.000000] [<ffffffff8149e021>] ? _spin_unlock+0x4f/0x5c
[ 0.000000] [<ffffffff8108f11b>] ? smp_call_function_many+0x56/0x1bc
[ 0.000000] [<ffffffff81061764>] warn_slowpath_common+0x7c/0xa9
[ 0.000000] [<ffffffff810617a5>] warn_slowpath_null+0x14/0x16
[ 0.000000] [<ffffffff8108f11b>] smp_call_function_many+0x56/0x1bc
[ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
[ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
[ 0.000000] [<ffffffff8108f2be>] smp_call_function+0x3d/0x68
[ 0.000000] [<ffffffff810f3e00>] ? do_ccupdate_local+0x0/0x54
[ 0.000000] [<ffffffff81066fd8>] on_each_cpu+0x31/0x7c
[ 0.000000] [<ffffffff810f64f5>] do_tune_cpucache+0x119/0x454
[ 0.000000] [<ffffffff81087080>] ? lockdep_init_map+0x94/0x10b
[ 0.000000] [<ffffffff818133b0>] ? kmem_cache_init+0x421/0x593
[ 0.000000] [<ffffffff810f69cf>] enable_cpucache+0x68/0xad
[ 0.000000] [<ffffffff818133c3>] kmem_cache_init+0x434/0x593
[ 0.000000] [<ffffffff8180987c>] ? mem_init+0x156/0x161
[ 0.000000] [<ffffffff817f8aae>] start_kernel+0x1cc/0x3b9
[ 0.000000] [<ffffffff817f829a>] x86_64_start_reservations+0xaa/0xae
[ 0.000000] [<ffffffff817f837f>] x86_64_start_kernel+0xe1/0xe8
[ 0.000000] ---[ end trace 4eaa2a86a8e2da22 ]---

Cc: Christoph Lameter <[email protected]>
Cc: Nick Piggin <[email protected]>
Signed-off-by: Pekka Enberg <[email protected]>

:040000 040000 4575f193f5b94a4c1110cdef769a47fd41ae1f69 54d87f956098163232fce87a59bdc90a42de14bf M mm


$ git bisect log
git bisect start
# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect good 07a2039b8eb0af4ff464efd3dfd95de5c02648c6
# bad: [28d0325ce6e0a52f53d8af687e6427fee59004d3] Linux 2.6.31-rc1
git bisect bad 28d0325ce6e0a52f53d8af687e6427fee59004d3
# bad: [e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a] powerpc/5121: make clock debug output more readable
git bisect bad e7c5a4f292e0d1f4ba9a3a94b2c8e8b71e35b25a
# good: [8a1ca8cedd108c8e76a6ab34079d0bbb4f244799] Merge branch 'perfcounters-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
git bisect good 8a1ca8cedd108c8e76a6ab34079d0bbb4f244799
# bad: [6b702462cbe5b6f372966a53f4465d745d86b65c] Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6
git bisect bad 6b702462cbe5b6f372966a53f4465d745d86b65c
# bad: [6d21491838a2a9f22843c7530b118596ee9f4d77] Merge branch 'topic/slab/earlyboot-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6
git bisect bad 6d21491838a2a9f22843c7530b118596ee9f4d77
# good: [c34752bc8b3196aee3115d1aa41972604ab0aea8] Merge branch 'cuse' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
git bisect good c34752bc8b3196aee3115d1aa41972604ab0aea8
# good: [ca9e9c3eb118d0cb9dc2e5232f6f2dcaa4b7a5e0] [SCSI] qla2xxx: Check status of qla2x00_get_fw_version() call.
git bisect good ca9e9c3eb118d0cb9dc2e5232f6f2dcaa4b7a5e0
# good: [c59a264c9e932c828d533497e286b89e43c8d1be] Merge git://git.infradead.org/~dwmw2/firmware-2.6
git bisect good c59a264c9e932c828d533497e286b89e43c8d1be
# good: [e7deff3374cc5951f914dcb3c66bc1b8de8a084e] [SCSI] mpt fusion: Adding DeviceResetCtx for internal Device reset frame
git bisect good e7deff3374cc5951f914dcb3c66bc1b8de8a084e
# good: [0e35afbc8b054e04a35faa796c72abb3b82bd33b] [SCSI] libosd: osd_req_{read,write}_kern new API
git bisect good 0e35afbc8b054e04a35faa796c72abb3b82bd33b
# good: [82681a318f9f028ea64e61f24bbd9ac535531921] [SCSI] Merge branch 'linus'
git bisect good 82681a318f9f028ea64e61f24bbd9ac535531921
# good: [eb91f1d0a531289e18f5587dc197d12a251c66a3] slab: fix gfp flag in setup_cpu_cache()
git bisect good eb91f1d0a531289e18f5587dc197d12a251c66a3
# bad: [8429db5c6336083594036c30f49401405d536911] slab: setup cpu caches later on when interrupts are enabled
git bisect bad 8429db5c6336083594036c30f49401405d536911
# good: [7e85ee0c1d15ca5f8bff0f514f158eba1742dd87] slab,slub: don't enable interrupts during early boot
git bisect good 7e85ee0c1d15ca5f8bff0f514f158eba1742dd87

2009-06-26 07:37:33

by Pekka Enberg

[permalink] [raw]
Subject: Re: possible recursive locking detected at kmem_cache_free

Hi Venki,

On Thu, 25 Jun 2009, Pallipadi, Venkatesh wrote:
> > > Not sure if this has been reported elsewhere. I started seeing this with git
> > > kernel from this morning on a system with RedHat EL5 base installation with
> > > ext3 root file system. Attached is the config I am using.
> > >
> > > Let me know if you need any more details. This seems to be at every boot. I
> > > am trying the bisect right now.
> > >
> >
> > Thanks. Let's cc Pekka.
> >
>
> git bisect result.

Thanks! I think I just broke lockdep annotations for slab. Does this patch
fix things?

Pekka

diff --git a/mm/slab.c b/mm/slab.c
index e74a16e..37ee532 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1544,9 +1544,6 @@ void __init kmem_cache_init(void)
}

g_cpucache_up = EARLY;
-
- /* Annotate slab for lockdep -- annotate the malloc caches */
- init_lock_keys();
}

void __init kmem_cache_init_late(void)
@@ -1563,6 +1560,9 @@ void __init kmem_cache_init_late(void)
/* Done! */
g_cpucache_up = FULL;

+ /* Annotate slab for lockdep -- annotate the malloc caches */
+ init_lock_keys();
+
/*
* Register a cpu startup notifier callback that initializes
* cpu_cache_get for all new cpus

2009-06-26 17:00:18

by Pallipadi, Venkatesh

[permalink] [raw]
Subject: Re: possible recursive locking detected at kmem_cache_free

On Fri, 2009-06-26 at 00:37 -0700, Pekka J Enberg wrote:
> Hi Venki,
>
> On Thu, 25 Jun 2009, Pallipadi, Venkatesh wrote:
> > > > Not sure if this has been reported elsewhere. I started seeing this with git
> > > > kernel from this morning on a system with RedHat EL5 base installation with
> > > > ext3 root file system. Attached is the config I am using.
> > > >
> > > > Let me know if you need any more details. This seems to be at every boot. I
> > > > am trying the bisect right now.
> > > >
> > >
> > > Thanks. Let's cc Pekka.
> > >
> >
> > git bisect result.
>
> Thanks! I think I just broke lockdep annotations for slab. Does this patch
> fix things?
>

Yes. This patch helps.

Thanks,
Venki

> Pekka
>
> diff --git a/mm/slab.c b/mm/slab.c
> index e74a16e..37ee532 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -1544,9 +1544,6 @@ void __init kmem_cache_init(void)
> }
>
> g_cpucache_up = EARLY;
> -
> - /* Annotate slab for lockdep -- annotate the malloc caches */
> - init_lock_keys();
> }
>
> void __init kmem_cache_init_late(void)
> @@ -1563,6 +1560,9 @@ void __init kmem_cache_init_late(void)
> /* Done! */
> g_cpucache_up = FULL;
>
> + /* Annotate slab for lockdep -- annotate the malloc caches */
> + init_lock_keys();
> +
> /*
> * Register a cpu startup notifier callback that initializes
> * cpu_cache_get for all new cpus

2009-06-26 17:42:45

by Pekka Enberg

[permalink] [raw]
Subject: Re: possible recursive locking detected at kmem_cache_free

On Fri, Jun 26, 2009 at 7:58 PM, Pallipadi,
Venkatesh<[email protected]> wrote:
> On Fri, 2009-06-26 at 00:37 -0700, Pekka J Enberg wrote:
>> Hi Venki,
>>
>> On Thu, 25 Jun 2009, Pallipadi, Venkatesh wrote:
>> > > > Not sure if this has been reported elsewhere. I started seeing this with git
>> > > > kernel from this morning on a system with RedHat EL5 base installation with
>> > > > ext3 root file system. Attached is the config I am using.
>> > > >
>> > > > Let me know if you need any more details. This seems to be at every boot. I
>> > > > am trying the bisect right now.
>> > > >
>> > >
>> > > Thanks. ?Let's cc Pekka.
>> > >
>> >
>> > git bisect result.
>>
>> Thanks! I think I just broke lockdep annotations for slab. Does this patch
>> fix things?
>>
>
> Yes. This patch helps.

OK, thanks for testing, Venki! I'll merge this to slab.git first thing
on Monday.

Pekka