I get this on an NFS root system while booting.
This must be a recent change from the last week;
I didn't see it in a post-rc1 git* from last week
(I haven't done an exact bisect).
It's triggered by the r8169 driver close function,
but looks more like a slab problem?
I haven't checked in detail whether the locks are
really different or lockdep just doesn't know
enough classes.
-Andi
=============================================
[ INFO: possible recursive locking detected ]
2.6.33-rc2 #19
---------------------------------------------
swapper/1 is trying to acquire lock:
(&(&parent->list_lock)->rlock){-.-...}, at: [<ffffffff810cc93a>] cache_flusharray+0x55/0x10a
but task is already holding lock:
(&(&parent->list_lock)->rlock){-.-...}, at: [<ffffffff810cc93a>] cache_flusharray+0x55/0x10a
other info that might help us debug this:
2 locks held by swapper/1:
#0: (rtnl_mutex){+.+.+.}, at: [<ffffffff813e24d6>] rtnl_lock+0x12/0x14
#1: (&(&parent->list_lock)->rlock){-.-...}, at: [<ffffffff810cc93a>] cache_flusharray+0x55/0x10a
stack backtrace:
Pid: 1, comm: swapper Not tainted 2.6.33-rc2-MCE6 #19
Call Trace:
[<ffffffff810687da>] __lock_acquire+0xf94/0x1771
[<ffffffff81066402>] ? mark_held_locks+0x4d/0x6b
[<ffffffff81066663>] ? trace_hardirqs_on_caller+0x10b/0x12f
[<ffffffff8105b061>] ? sched_clock_local+0x1c/0x80
[<ffffffff8105b061>] ? sched_clock_local+0x1c/0x80
[<ffffffff81069073>] lock_acquire+0xbc/0xd9
[<ffffffff810cc93a>] ? cache_flusharray+0x55/0x10a
[<ffffffff8149639d>] _raw_spin_lock+0x31/0x66
[<ffffffff810cc93a>] ? cache_flusharray+0x55/0x10a
[<ffffffff810cbbf8>] ? kfree_debugcheck+0x11/0x2d
[<ffffffff810cc93a>] cache_flusharray+0x55/0x10a
[<ffffffff81066d67>] ? debug_check_no_locks_freed+0x119/0x12f
[<ffffffff810cc387>] kmem_cache_free+0x18f/0x1f2
[<ffffffff810cc515>] slab_destroy+0x12b/0x138
[<ffffffff810cc683>] free_block+0x161/0x1a2
[<ffffffff810cc982>] cache_flusharray+0x9d/0x10a
[<ffffffff81066d67>] ? debug_check_no_locks_freed+0x119/0x12f
[<ffffffff810ccbf3>] kfree+0x204/0x23b
[<ffffffff81066694>] ? trace_hardirqs_on+0xd/0xf
[<ffffffff813d002a>] skb_release_data+0xc6/0xcb
[<ffffffff813cfd19>] __kfree_skb+0x19/0x86
[<ffffffff813cfdb1>] consume_skb+0x2b/0x2d
[<ffffffff8133929a>] rtl8169_rx_clear+0x7f/0xbb
[<ffffffff8133ada2>] rtl8169_down+0x12c/0x13b
[<ffffffff8133b58a>] rtl8169_close+0x30/0x131
[<ffffffff813e8d98>] ? dev_deactivate+0x168/0x198
[<ffffffff813d94d6>] dev_close+0x8c/0xae
[<ffffffff813d8e62>] dev_change_flags+0xba/0x180
[<ffffffff81a87e63>] ic_close_devs+0x2e/0x48
[<ffffffff81a88a5b>] ip_auto_config+0x914/0xe1e
[<ffffffff8105b061>] ? sched_clock_local+0x1c/0x80
[<ffffffff810649a1>] ? trace_hardirqs_off+0xd/0xf
[<ffffffff8105b1c0>] ? cpu_clock+0x2d/0x3f
[<ffffffff810649c7>] ? lock_release_holdtime+0x24/0x181
[<ffffffff81a86967>] ? tcp_congestion_default+0x0/0x12
[<ffffffff81496c60>] ? _raw_spin_unlock+0x26/0x2b
[<ffffffff81a86967>] ? tcp_congestion_default+0x0/0x12
[<ffffffff81a88147>] ? ip_auto_config+0x0/0xe1e
[<ffffffff810001f0>] do_one_initcall+0x5a/0x14f
[<ffffffff81a5364c>] kernel_init+0x141/0x197
[<ffffffff81003794>] kernel_thread_helper+0x4/0x10
[<ffffffff81496efc>] ? restore_args+0x0/0x30
[<ffffffff81a5350b>] ? kernel_init+0x0/0x197
[<ffffffff81003790>] ? kernel_thread_helper+0x0/0x10
IP-Config: Retrying forever (NFS root)...
r8169: eth0: link up
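
For readers unfamiliar with this kind of report: lockdep tracks lock classes rather than individual lock instances, so two different list_lock instances that share one class look like the same lock taken twice. The following is a toy sketch of that check, not lockdep's actual implementation; `struct held_locks`, `toy_acquire` and `MAX_HELD` are made-up names used only for illustration.

```c
#include <stddef.h>

#define MAX_HELD 8	/* arbitrary depth limit for this sketch */

/* Per-task stack of the class keys of the locks currently held. */
struct held_locks {
	const void *class_key[MAX_HELD];
	size_t depth;
};

/*
 * Record an acquisition of a lock whose class is identified by class_key.
 * Returns 1 if a lock of the same class is already held (the situation
 * reported as "possible recursive locking"), 0 if the class is new to
 * the stack, and -1 if the stack is full.
 */
static int toy_acquire(struct held_locks *h, const void *class_key)
{
	int same_class_held = 0;
	size_t i;

	if (h->depth == MAX_HELD)
		return -1;

	/* Only the class key is compared, never the lock's own address. */
	for (i = 0; i < h->depth; i++)
		if (h->class_key[i] == class_key)
			same_class_held = 1;

	h->class_key[h->depth++] = class_key;
	return same_class_held;
}
```

With one shared key, the second acquisition is flagged even if the two locks are distinct objects; giving the inner lock its own key makes the same sequence pass.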
--
[email protected] -- Speaking for myself only.
Hi Andi,
On Sun, 2009-12-27 at 13:06 +0100, Andi Kleen wrote:
> I get this on an NFS root system while booting.
> This must be a recent change from the last week;
> I didn't see it in a post-rc1 git* from last week
> (I haven't done an exact bisect).
>
> It's triggered by the r8169 driver close function,
> but looks more like a slab problem?
>
> I haven't checked it in detail if the locks are
> really different or just lockdep not knowing
> enough classes.
I broke the lockdep annotations in commit
ce79ddc8e2376a9a93c7d42daf89bfcbb9187e62 ("SLAB: Fix lockdep annotations
for CPU hotplug"). Does this fix things for you? Heiko, the following
patch should fix it for you too.
Pekka
diff --git a/mm/slab.c b/mm/slab.c
index 7d41f15..7451bda 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -654,7 +654,7 @@ static void init_node_lock_keys(int q)
 
 		l3 = s->cs_cachep->nodelists[q];
 		if (!l3 || OFF_SLAB(s->cs_cachep))
-			return;
+			continue;
 		lockdep_set_class(&l3->list_lock, &on_slab_l3_key);
 		alc = l3->alien;
 		/*
@@ -665,7 +665,7 @@ static void init_node_lock_keys(int q)
 		 * for alloc_alien_cache,
 		 */
 		if (!alc || (unsigned long)alc == BAD_ALIEN_MAGIC)
-			return;
+			continue;
 		for_each_node(r) {
 			if (alc[r])
 				lockdep_set_class(&alc[r]->lock,
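
To illustrate why the one-word change matters: the use of `continue` implies that the patched statements sit inside a loop over caches, so a `return` stops at the first cache that is skipped and every later cache keeps its default lock class, while `continue` skips only that one cache. A minimal sketch under that assumption follows; `struct fake_cache`, `init_keys_return`, `init_keys_continue` and `count_on_slab` are hypothetical stand-ins, not code from mm/slab.c.

```c
#include <stddef.h>

enum lock_class { CLASS_DEFAULT, CLASS_ON_SLAB };

/* Simplified stand-in for a cache and the class of its list_lock. */
struct fake_cache {
	int has_node_list;	/* stands in for "l3 != NULL" */
	int off_slab;		/* stands in for OFF_SLAB(cachep) */
	enum lock_class list_lock_class;
};

/* Behaviour before the patch: give up at the first skipped cache. */
static void init_keys_return(struct fake_cache *caches, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!caches[i].has_node_list || caches[i].off_slab)
			return;		/* later caches are never visited */
		caches[i].list_lock_class = CLASS_ON_SLAB;
	}
}

/* Behaviour after the patch: skip only the cache that does not qualify. */
static void init_keys_continue(struct fake_cache *caches, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (!caches[i].has_node_list || caches[i].off_slab)
			continue;
		caches[i].list_lock_class = CLASS_ON_SLAB;
	}
}

/* Count how many caches ended up with the on-slab class. */
static size_t count_on_slab(const struct fake_cache *caches, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (caches[i].list_lock_class == CLASS_ON_SLAB)
			count++;
	return count;
}
```

For four caches where only the second one is off-slab, the `return` variant reclassifies one cache and the `continue` variant reclassifies three.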
> I broke the lockdep annotations in commit
> ce79ddc8e2376a9a93c7d42daf89bfcbb9187e62 ("SLAB: Fix lockdep annotations
> for CPU hotplug"). Does this fix things for you? Heiko, the following
> patch should fix it for you too.
Yes, that patch fixes it. Thanks.
-Andi
On Sun, Dec 27, 2009 at 02:33:14PM +0200, Pekka Enberg wrote:
> Hi Andi,
>
> On Sun, 2009-12-27 at 13:06 +0100, Andi Kleen wrote:
> > I get this on an NFS root system while booting.
> > This must be a recent change from the last week;
> > I didn't see it in a post-rc1 git* from last week
> > (I haven't done an exact bisect).
> >
> > It's triggered by the r8169 driver close function,
> > but looks more like a slab problem?
> >
> > I haven't checked it in detail if the locks are
> > really different or just lockdep not knowing
> > enough classes.
>
> I broke the lockdep annotations in commit
> ce79ddc8e2376a9a93c7d42daf89bfcbb9187e62 ("SLAB: Fix lockdep annotations
> for CPU hotplug"). Does this fix things for you? Heiko, the following
> patch should fix it for you too.
Works fine here too. Thanks!
On Sun, Dec 27, 2009 at 02:33:14PM +0200, Pekka Enberg wrote:
> Hi Andi,
>
> On Sun, 2009-12-27 at 13:06 +0100, Andi Kleen wrote:
> > I get this on an NFS root system while booting.
> > This must be a recent change from the last week;
> > I didn't see it in a post-rc1 git* from last week
> > (I haven't done an exact bisect).
> >
> > It's triggered by the r8169 driver close function,
> > but looks more like a slab problem?
> >
> > I haven't checked it in detail if the locks are
> > really different or just lockdep not knowing
> > enough classes.
>
> I broke the lockdep annotations in commit
> ce79ddc8e2376a9a93c7d42daf89bfcbb9187e62 ("SLAB: Fix lockdep annotations
> for CPU hotplug"). Does this fix things for you? Heiko, the following
> patch should fix it for you too.
And no lockdep warnings here, either. I did get the following
new-to-me preempt_count underflow, but doubt that it is related.
Thanx, Paul
Badness at kernel/sched.c:5350
NIP: c0000000005b2e58 LR: c0000000005b2e3c CTR: c000000000025f0c
REGS: c000000042893b30 TRAP: 0700 Not tainted (2.6.33-rc2-autokern1)
MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 22000082 XER: 0000000c
TASK = c00000007d8737e0[0] 'swapper' THREAD: c000000042890000 CPU: 2
GPR00: 0000000000000000 c000000042893db0 c0000000009c07f8 0000000000000001
GPR04: 0000000000000001 0000000000000006 0000000000000001 000000000000004a
GPR08: 0000000000000000 c00000000128adb8 c00000000088aa20 c000000000a0da08
GPR12: 0000000000000002 c0000000009df880 0000000000000000 0000000000c00020
GPR16: 0000000000000002 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 c0000000009e24b0 0000000000000001 c0000000009df480
GPR24: 0000000000000000 c0000000009d8628 c0000000009df880 0000000000000002
GPR28: c0000000009e2068 c0000000009d8628 c00000000093c000 c000000042890000
NIP [c0000000005b2e58] .sub_preempt_count+0x58/0xc8
LR [c0000000005b2e3c] .sub_preempt_count+0x3c/0xc8
Call Trace:
[c000000042893db0] [c000000042893e30] 0xc000000042893e30 (unreliable)
[c000000042893e30] [c000000000014d38] .cpu_idle+0x1f0/0x20c
[c000000042893ec0] [c0000000005ba678] .start_secondary+0x380/0x3c4
[c000000042893f90] [c000000000008264] .start_secondary_prolog+0x10/0x14
Instruction dump:
78290464 80090014 7f801800 40bc0074 4bd45745 60000000 2fa30000 419e0070
e93e8a08 80090000 2f800000 409e0060 <0fe00000> 48000058 78000620 2fa00000
BUG: scheduling while atomic: swapper/0/0x00000000
INFO: lockdep is turned off.
Modules linked in: ehea
Call Trace:
[c000000042897bf0] [c0000000000123b0] .show_stack+0x70/0x184 (unreliable)
[c000000042897ca0] [c00000000005eaa0] .__schedule_bug+0xa4/0xc4
[c000000042897d30] [c0000000005abe4c] .schedule+0xd8/0xa8c
[c000000042897e30] [c000000000014d40] .cpu_idle+0x1f8/0x20c
[c000000042897ec0] [c0000000005ba678] .start_secondary+0x380/0x3c4
[c000000042897f90] [c000000000008264] .start_secondary_prolog+0x10/0x14