2006-10-24 20:24:14

by Ravikiran G Thirumalai

Subject: [bug] 2.6.19-rc3 oops at __drain_pages during cpu hotplug tests + lockdep warning with xfs

Both 2.6.19-rc2-mm2 and 2.6.19-rc3 panic with the following oops when I
run kernbench continuously while taking CPUs offline and back online in
parallel. This is a 2-socket, 4-core Tyan Opteron system with 8 GB of RAM.
The kernel has lockdep and other debug options enabled. We also see a
lockdep warning with XFS. The dmesg and the .config used are attached.
Here is a snapshot of the lockdep warning and the oops.


[ 469.336719]
[ 469.336721] =============================================
[ 469.336728] [ INFO: possible recursive locking detected ]
[ 469.336731] 2.6.19-rc3 #2
[ 469.336737] ---------------------------------------------
[ 469.336745] rm/4538 is trying to acquire lock:
[ 469.336753] (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80320050>] xfs_ilock+0x52/0x77
[ 469.336782]
[ 469.336784] but task is already holding lock:
[ 469.336791] (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80320050>] xfs_ilock+0x52/0x77
[ 469.336816]
[ 469.336817] other info that might help us debug this:
[ 469.336825] 3 locks held by rm/4538:
[ 469.336832] #0: (&inode->i_mutex/1){--..}, at: [<ffffffff80286055>] do_unlinkat+0x75/0x136
[ 469.336866] #1: (&inode->i_mutex){--..}, at: [<ffffffff80285f98>] vfs_unlink+0x45/0x8d
[ 469.336897] #2: (&(&ip->i_lock)->mr_lock){----}, at: [<ffffffff80320050>] xfs_ilock+0x52/0x77
[ 469.336928]
[ 469.336929] stack backtrace:
[ 469.336936]
[ 469.336938] Call Trace:
[ 469.336947] [<ffffffff8024b4cc>] __lock_acquire+0x381/0xb8c
[ 469.336957] [<ffffffff8024bf80>] lock_acquire+0x4b/0x66
[ 469.336966] [<ffffffff80320050>] xfs_ilock+0x52/0x77
[ 469.336976] [<ffffffff80247c97>] down_write+0x1e/0x27
[ 469.336985] [<ffffffff80320050>] xfs_ilock+0x52/0x77
[ 469.336993] [<ffffffff8033cad1>] xfs_lock_dir_and_entry+0x92/0xce
[ 469.337002] [<ffffffff8033cd23>] xfs_remove+0x216/0x44a
[ 469.337013] [<ffffffff80346c96>] xfs_vn_unlink+0x21/0x4f
[ 469.337024] [<ffffffff804a72b9>] __mutex_lock_slowpath+0x249/0x279
[ 469.337034] [<ffffffff8024aaaf>] mark_held_locks+0x65/0x84
[ 469.337043] [<ffffffff804a72b9>] __mutex_lock_slowpath+0x249/0x279
[ 469.337053] [<ffffffff8024ac7c>] trace_hardirqs_on+0x101/0x128
[ 469.337062] [<ffffffff804a72dc>] __mutex_lock_slowpath+0x26c/0x279
[ 469.337072] [<ffffffff802832d9>] permission+0xa0/0xa9
[ 469.337081] [<ffffffff80285fb4>] vfs_unlink+0x61/0x8d
[ 469.337090] [<ffffffff80286098>] do_unlinkat+0xb8/0x136
[ 469.337100] [<ffffffff804a8649>] trace_hardirqs_on_thunk+0x35/0x37
[ 469.337109] [<ffffffff8024ac7c>] trace_hardirqs_on+0x101/0x128
[ 469.337118] [<ffffffff804a8649>] trace_hardirqs_on_thunk+0x35/0x37
[ 469.337129] [<ffffffff80209a4e>] system_call+0x7e/0x83
[ 469.337137]


[ 488.264267] Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[ 488.264271] [<ffffffff8025d255>] __drain_pages+0x33/0x7d
[ 488.264288] PGD 2753ae067 PUD 275f66067 PMD 0
[ 488.264302] Oops: 0000 [1] PREEMPT SMP
[ 488.264315] CPU 2
[ 488.264323] Modules linked in:
[ 488.264332] Pid: 7617, comm: sh Not tainted 2.6.19-rc3 #2
[ 488.264339] RIP: 0010:[<ffffffff8025d255>] [<ffffffff8025d255>] __drain_pages+0x33/0x7d
[ 488.264353] RSP: 0018:ffff81016dd3fdf8 EFLAGS: 00010046
[ 488.264358] RAX: 0000000000000000 RBX: 0000000000000086 RCX: 0000000000000000
[ 488.264366] RDX: 0000000000000010 RSI: 0000000000000000 RDI: ffff810180000000
[ 488.264373] RBP: 0000000000000000 R08: 0000000000000002 R09: 0000000000000001
[ 488.264381] R10: ffffffff8025cb12 R11: 0000000000000000 R12: ffff810180000000
[ 488.264389] R13: 0000000000000001 R14: 0000000000000001 R15: ffff81027a7e5998
[ 488.264397] FS: 00002af4d80a5ae0(0000) GS:ffff81018001ad18(0000) knlGS:0000000000000000
[ 488.264405] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 488.264412] CR2: 0000000000000000 CR3: 00000002767bf000 CR4: 00000000000006a0
[ 488.264420] Process sh (pid: 7617, threadinfo ffff81016dd3e000, task ffff81016e03d080)
[ 488.264427] Stack: 0000000000000001 0000000000000001 0000000000000007 0000000000000001
[ 488.264448] 000000000000000f ffffffff8025e771 ffffffff80538860 ffffffff8023e3c2
[ 488.264466] 0000000000000001 0000000000000001 ffff81016e11a0c0 ffffffff8024ff50
[ 488.264481] Call Trace:
[ 488.264490] [<ffffffff8025e771>] page_alloc_cpu_notify+0x18/0x33
[ 488.264500] [<ffffffff8023e3c2>] notifier_call_chain+0x23/0x32
[ 488.264509] [<ffffffff8024ff50>] _cpu_down+0x184/0x245
[ 488.264516] [<ffffffff8025003c>] cpu_down+0x2b/0x42
[ 488.264525] [<ffffffff803b25b8>] store_online+0x27/0x71
[ 488.264534] [<ffffffff802b7b4a>] sysfs_write_file+0xb6/0xe5
[ 488.264544] [<ffffffff8027c6e5>] vfs_write+0xb2/0x155
[ 488.264551] [<ffffffff8027c83d>] sys_write+0x45/0x70
[ 488.264560] [<ffffffff80209a4e>] system_call+0x7e/0x83
[ 488.264566]
[ 488.264570]
[ 488.264572] Code: 8b 75 00 4c 89 e7 e8 82 f8 ff ff f6 c7 02 c7 45 00 00 00 00
[ 488.264634] RIP [<ffffffff8025d255>] __drain_pages+0x33/0x7d
[ 488.264644] RSP <ffff81016dd3fdf8>
[ 488.264650] CR2: 0000000000000000
[ 488.264655]
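
For anyone trying to reproduce this, the hotplug half of the test can be
sketched roughly as below. This is only an illustration, not the exact
script used here: the helper names, the CPU list, the round count and the
delay are all made up for the example. The one thing taken from the trace
above is that the offline request arrives as a write to the sysfs "online"
file (store_online via sysfs_write_file). Run it as root while kernbench
loops in another shell.

```python
import os
import time

# Standard sysfs location of the per-CPU "online" control files.
SYSFS_CPU_ROOT = "/sys/devices/system/cpu"


def set_cpu_online(cpu, online, sysfs_root=SYSFS_CPU_ROOT):
    """Write 1 (online) or 0 (offline) to cpu<N>/online.

    Hypothetical helper for this sketch; needs root on a real system.
    """
    path = os.path.join(sysfs_root, "cpu%d" % cpu, "online")
    with open(path, "w") as f:
        f.write("1" if online else "0")


def hotplug_cycle(cpus, rounds, delay=1.0, sysfs_root=SYSFS_CPU_ROOT):
    """Take each CPU offline and bring it back online, `rounds` times.

    The CPU list, round count and delay are assumptions; tune them
    for the machine under test.
    """
    for _ in range(rounds):
        for cpu in cpus:
            set_cpu_online(cpu, False, sysfs_root)
            time.sleep(delay)
            set_cpu_online(cpu, True, sysfs_root)
            time.sleep(delay)


# Example invocation (assumed values, run as root):
#   hotplug_cycle(cpus=[1, 2, 3], rounds=100, delay=2.0)
```

The sysfs root is a parameter only so the helpers can be exercised against
a scratch directory without touching real CPUs. CPU 0 is left out of the
example list because it often cannot be taken offline.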

Thanks,
Kiran


Attachments:
(No filename) (4.98 kB)
2.6.19-rc3 (30.81 kB)
config.gz (6.82 kB)