Dear all,
First, I apologize for my poor English...
Since I tried to boot a 2.6.32.x kernel, my system hangs during the
boot process, and I think it could be related to the problem reported
earlier by Megastorage (http://lkml.org/lkml/2010/1/10/92).
The hardware is a Dell PowerEdge 2950 which runs fine with the
2.6.31.x kernel series (currently running the latest, 2.6.31.11),
and the system is Debian etch.
Here is the trace of the bug I got (using netconsole) with a
2.6.32.3 kernel:
BUG: Dentry ffff880667690000{i=41a46,n=sleep} still in use (8)
[unmount of ext3 dm-4]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:670!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/dm-2/removable
CPU 0
Modules linked in: i5k_amb hwmon button processor thermal fan [last
unloaded: scsi_wait_scan]
Pid: 3311, comm: kpartx Not tainted 2.6.32.3 #2 PowerEdge 2950
RIP: 0010:[<ffffffff810f95f0>]  [<ffffffff810f95f0>] shrink_dcache_for_umount_subtree+0x280/0x290
RSP: 0018:ffff88066670dcf8  EFLAGS: 00010296
RAX: 000000000000005c RBX: ffff8806677696c0 RCX: 0000000000000096
RDX: 0000000000006767 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff880667690000 R08: 0000000000000000 R09: ffff8806670d1628
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880667690060
R13: 0000000000000007 R14: ffff8806654d1a88 R15: 0000000000dec0b0
FS:  00007f176e96b770(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007fff0a2e0080 CR3: 0000000666607000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kpartx (pid: 3311, threadinfo ffff88066670c000, task ffff8806652997d0)
Stack:
ffff880665b8b178 ffff880665b8af18 ffffffff81619600 0000000000000001
<0> ffff880667408e00 ffffffff810f9629 ffff880665b8af18 ffffffff810e8049
<0> ffff8806651333f8 ffff880667408e00 ffffffff8185fc00 ffffffff810e8159
Call Trace:
[<ffffffff810f9629>] ? shrink_dcache_for_umount+0x29/0x50
[<ffffffff810e8049>] ? generic_shutdown_super+0x19/0x100
[<ffffffff810e8159>] ? kill_block_super+0x29/0x50
[<ffffffff810e8238>] ? deactivate_locked_super+0x58/0x80
[<ffffffff81112842>] ? thaw_bdev+0xd2/0x110
[<ffffffff814b0c67>] ? dm_resume+0xf7/0x160
[<ffffffff814b5f00>] ? dev_suspend+0x0/0x220
[<ffffffff814b60b1>] ? dev_suspend+0x1b1/0x220
[<ffffffff814b6c7b>] ? ctl_ioctl+0x1eb/0x260
[<ffffffff810c0b1b>] ? handle_mm_fault+0x63b/0x990
[<ffffffff814b6cfe>] ? dm_ctl_ioctl+0xe/0x20
[<ffffffff8104991a>] ? finish_task_switch+0x3a/0xc0
[<ffffffff810f4e9f>] ? vfs_ioctl+0x2f/0xb0
[<ffffffff810f53bb>] ? do_vfs_ioctl+0x3fb/0x580
[<ffffffff815fb101>] ? thread_return+0x3e/0x64d
[<ffffffff810f55e1>] ? sys_ioctl+0xa1/0xb0
[<ffffffff8100bf02>] ? system_call_fastpath+0x16/0x1b
Code: 4d 38 48 8b 45 10 48 85 c0 74 04 48 8b 50 40 48 8d 86 60 02 00
00 48 c7 c7 a8 66 76 81 48 89 04 24 48 89 ee 31 c0 e8 a9 11 50 00 <0f>
0b eb fe 0f 0b eb fe 0f 1f 84 00 00 00 00 00 53 48 89 fb 48
RIP  [<ffffffff810f95f0>] shrink_dcache_for_umount_subtree+0x280/0x290
RSP <ffff88066670dcf8>
---[ end trace 3cc1cb65fcc6a8ca ]---
Here is another trace showing the same behavior, from a freshly compiled
kernel with more debug options enabled; I can't see any difference:
BUG: Dentry ffff880667556738{i=41a46,n=sleep} still in use (8)
[unmount of ext3 dm-4]
------------[ cut here ]------------
kernel BUG at fs/dcache.c:670!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/block/dm-3/removable
CPU 1
Modules linked in: i5k_amb(+) button hwmon processor thermal fan [last
unloaded: scsi_wait_scan]
Pid: 3315, comm: kpartx Not tainted 2.6.32.3 #3 PowerEdge 2950
RIP: 0010:[<ffffffff810f95f0>]  [<ffffffff810f95f0>] shrink_dcache_for_umount_subtree+0x280/0x290
RSP: 0018:ffff880667089cf8  EFLAGS: 00010296
RAX: 000000000000005c RBX: ffff880667790a60 RCX: 0000000000000096
RDX: 0000000000006767 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff880667556738 R08: 0000000000000000 R09: ffff88066604b420
R10: 0000000000000000 R11: 0000000000000000 R12: ffff880667556798
R13: 0000000000000007 R14: ffff880665842360 R15: 0000000000b3c0b0
FS:  00007f7b1006c770(0000) GS:ffff880028240000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f6e67f1c350 CR3: 0000000664ff1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kpartx (pid: 3315, threadinfo ffff880667088000, task ffff880664f55f40)
Stack:
ffff880667058af0 ffff880667058890 ffffffff81619600 0000000000000001
<0> ffff880667408e00 ffffffff810f9629 ffff880667058890 ffffffff810e8049
<0> ffff88067f83e758 ffff880667408e00 ffffffff8185fc00 ffffffff810e8159
Call Trace:
[<ffffffff810f9629>] ? shrink_dcache_for_umount+0x29/0x50
[<ffffffff810e8049>] ? generic_shutdown_super+0x19/0x100
[<ffffffff810e8159>] ? kill_block_super+0x29/0x50
[<ffffffff810e8238>] ? deactivate_locked_super+0x58/0x80
[<ffffffff81112842>] ? thaw_bdev+0xd2/0x110
[<ffffffff814b0c67>] ? dm_resume+0xf7/0x160
[<ffffffff814b5f00>] ? dev_suspend+0x0/0x220
[<ffffffff814b60b1>] ? dev_suspend+0x1b1/0x220
[<ffffffff814b6c7b>] ? ctl_ioctl+0x1eb/0x260
[<ffffffff810c0b1b>] ? handle_mm_fault+0x63b/0x990
[<ffffffff814b6cfe>] ? dm_ctl_ioctl+0xe/0x20
[<ffffffff8104991a>] ? finish_task_switch+0x3a/0xc0
[<ffffffff810f4e9f>] ? vfs_ioctl+0x2f/0xb0
[<ffffffff810f53bb>] ? do_vfs_ioctl+0x3fb/0x580
[<ffffffff815fb101>] ? thread_return+0x3e/0x64d
[<ffffffff810f55e1>] ? sys_ioctl+0xa1/0xb0
[<ffffffff8100bf02>] ? system_call_fastpath+0x16/0x1b
Code: 4d 38 48 8b 45 10 48 85 c0 74 04 48 8b 50 40 48 8d 86 60 02 00
00 48 c7 c7 a8 66 76 81 48 89 04 24 48 89 ee 31 c0 e8 a9 11 50 00 <0f>
0b eb fe 0f 0b eb fe 0f 1f 84 00 00 00 00 00 53 48 89 fb 48
RIP  [<ffffffff810f95f0>] shrink_dcache_for_umount_subtree+0x280/0x290
RSP <ffff880667089cf8>
---[ end trace a9fb3c2286e56cbd ]---
I think the problem is related to LVM or device mapper, because
I could boot a 2.6.32.2 kernel without problems on another PowerEdge 2950
that has no LVM or dm configured...
but I'm really not an expert in kernel debugging.
Here is the fstab of the affected system:
# /etc/fstab: static file system information.
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
proc            /proc           proc    defaults        0       0
/dev/dm-4       /               ext3    errors=remount-ro 0       1
/dev/dm-1       /boot           ext3    defaults        0       2
/dev/dm-7       /home           ext3    defaults        0       2
/dev/dm-5       /usr            ext3    defaults        0       2
/dev/dm-6       /var            ext3    defaults        0       2
/dev/dm-2       none            swap    sw              0       0
/dev/hda        /media/cdrom0   udf,iso9660 user,noauto     0       0
debugfs /sys/kernel/debug debugfs noauto 0 0
I hope this helps; I will try to provide more information if necessary.
François.
(cc's added)
On Sat, 16 Jan 2010 10:58:30 +0100
François Figarola <[email protected]> wrote:
> BUG: Dentry ffff880667690000{i=41a46,n=sleep} still in use (8)
> [unmount of ext3 dm-4]
> ------------[ cut here ]------------
> kernel BUG at fs/dcache.c:670!
That's
	if (atomic_read(&dentry->d_count) != 0) {
		printk(KERN_ERR
		       "BUG: Dentry %p{i=%lx,n=%s}"
		       " still in use (%d)"
		       " [unmount of %s %s]\n",
		       dentry,
		       dentry->d_inode ?
		       dentry->d_inode->i_ino : 0UL,
		       dentry->d_name.name,
		       atomic_read(&dentry->d_count),
		       dentry->d_sb->s_type->name,
		       dentry->d_sb->s_id);
		BUG();
	}
I'm a bit surprised that the system is doing a dm suspend/resume during
the boot process.
I assume it's a DM bug, dunno.
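For reading the report line against the printk format quoted above, here is a
minimal userspace sketch that splits the message back into its fields. The
struct dentry_report type and the parse_dentry_report() helper are hypothetical
names introduced only for this illustration; they are not kernel code. The
sketch relies on the format string shown above: the inode number is printed
with %lx, so "i=41a46" is hexadecimal (268870 in decimal), and the number in
parentheses is the dentry's remaining d_count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of one "still in use" report. */
struct dentry_report {
	unsigned long long dentry; /* dentry address (kernel prints it with %p) */
	unsigned long ino;         /* inode number (kernel prints it with %lx, i.e. hex) */
	char name[64];             /* dentry->d_name.name */
	int count;                 /* dentry->d_count at unmount time */
	char fstype[32];           /* sb->s_type->name */
	char dev[32];              /* sb->s_id */
};

/*
 * Parse one report. Whitespace in a scanf format matches any run of
 * whitespace, so the line break netconsole put before "[unmount" is fine.
 * Returns 0 on success, -1 if the text does not match the expected layout.
 */
int parse_dentry_report(const char *text, struct dentry_report *r)
{
	int n = sscanf(text,
		       "BUG: Dentry %llx{i=%lx,n=%63[^}]} still in use (%d)"
		       " [unmount of %31s %31[^]]]",
		       &r->dentry, &r->ino, r->name, &r->count,
		       r->fstype, r->dev);
	return n == 6 ? 0 : -1;
}
```

Fed the first report from this thread, the helper yields the name "sleep",
a count of 8, filesystem type "ext3" and device "dm-4", which is the root
filesystem according to the fstab posted earlier.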
On Fri, 22 Jan 2010 16:07:40 -0800
Andrew Morton <[email protected]> wrote:
> (cc's added)
(another cc added, one that might actually be useful.....)
> I'm a bit surprised that the system is doing a dm suspend/resume during
> the boot process.
It could be that a dm_resume is how you activate a dm device once it is
built, but I'm not sure....
Maybe the guys on dm-devel can help.
NeilBrown
>> On Sat, 16 Jan 2010 10:58:30 +0100
>> François Figarola <[email protected]> wrote:
>>> Since I've tried to boot 2.6.32.x kernel, my system hangs during the
>>> boot process, and I think it could be related to the problem reported
>>> earlier by Megastorage (http://lkml.org/lkml/2010/1/10/92).
>>>
>>> The hardware is a Dell PowerEdge 2950 which runs fine with the
>>> 2.6.31.x kernel series (actually running with the latest 2.6.31.11),
>>> and the system is debian etch.
>>>
>>> Here is the trace of the bug I've got (using netconsole) with a
>>> 2.6.32.3 kernel :
>>>
>>> BUG: Dentry ffff880667690000{i=41a46,n=sleep} still in use (8)
>>> [unmount of ext3 dm-4]
>>> ------------[ cut here ]------------
>>> kernel BUG at fs/dcache.c:670!
I can reproduce this by suspending and resuming a dm device that is mounted read-only.
When MS_RDONLY is set, both freeze_bdev and thaw_bdev call deactivate_locked_super,
which seems wrong. The change was introduced by the commit below:
commit 4504230a71566785a05d3e6b53fa1ee071b864eb
Author: Christoph Hellwig <[email protected]>
Date: Mon Aug 3 23:28:35 2009 +0200
freeze_bdev: grab active reference to frozen superblocks
With the attached patch, both remount-ro and remount-rw are
rejected with EBUSY on a frozen device, as expected.
Christoph, do you think this is the right fix?
--
Jun'ichi Nomura, NEC Corporation
If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
deactivate_locked_super().
Also, keep sb->s_frozen consistent so that remount can check the frozen state.
Signed-off-by: Jun'ichi Nomura <[email protected]>
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 73d6a73..600261f 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -246,7 +246,9 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 	if (!sb)
 		goto out;
 	if (sb->s_flags & MS_RDONLY) {
-		deactivate_locked_super(sb);
+		sb->s_frozen = SB_FREEZE_TRANS;
+		smp_wmb();
+		up_write(&sb->s_umount);
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
 		return sb;
 	}
@@ -307,7 +309,7 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 	BUG_ON(sb->s_bdev != bdev);
 	down_write(&sb->s_umount);
 	if (sb->s_flags & MS_RDONLY)
-		goto out_deactivate;
+		goto out_unfrozen;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
@@ -321,11 +323,11 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 		}
 	}
 
+out_unfrozen:
 	sb->s_frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_wait_unfrozen);
 
-out_deactivate:
 	if (sb)
 		deactivate_locked_super(sb);
 out_unlock:
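
The reference imbalance described in this message can be modeled in a few
lines of userspace C. This is only a sketch under stated assumptions: struct
toy_sb and every toy_* function are hypothetical stand-ins, not kernel APIs.
The model assumes, from the commit title and the hunks above, that freeze_bdev
takes one active reference before the MS_RDONLY check and that thaw_bdev always
ends in deactivate_locked_super(). The mount itself is modeled as holding one
reference.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct super_block: only what the sketch needs. */
struct toy_sb {
	int s_active; /* number of active references */
	bool rdonly;  /* models MS_RDONLY being set in s_flags */
};

/* Models taking an active reference at the start of freeze. */
void toy_get_active(struct toy_sb *sb) { sb->s_active++; }

/* Models deactivate_locked_super(): drops one active reference. */
void toy_deactivate(struct toy_sb *sb) { sb->s_active--; }

/* Freeze before the fix: the read-only path drops the reference at once. */
void toy_freeze_buggy(struct toy_sb *sb)
{
	toy_get_active(sb);
	if (sb->rdonly)
		toy_deactivate(sb); /* reference handed back too early */
}

/* Freeze with the fix: the reference is kept until thaw. */
void toy_freeze_fixed(struct toy_sb *sb)
{
	toy_get_active(sb);
}

/* Thaw: both the read-only and the read-write path end in a deactivate. */
void toy_thaw(struct toy_sb *sb)
{
	toy_deactivate(sb);
}

/* Run one freeze/thaw cycle on a read-only mount; return what is left. */
int toy_cycle(bool fixed)
{
	struct toy_sb sb = { .s_active = 1, .rdonly = true }; /* 1 = the mount */

	if (fixed)
		toy_freeze_fixed(&sb);
	else
		toy_freeze_buggy(&sb);
	toy_thaw(&sb);
	return sb.s_active;
}
```

In this model toy_cycle(false) ends at zero references, meaning the mount's
own reference was consumed and the superblock would be torn down while still
in use, which is consistent with the "still in use" BUG reported at the top
of the thread. toy_cycle(true) ends back at one.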
On 28.01.2010 08:32, Jun'ichi Nomura wrote:
>
> I can reproduce this when suspend/resume read-only mounted dm device.
>
> When MS_RDONLY, both freeze_bdev and thaw_bdev call deactivate_locked_super,
> which seems wrong. The change was introduced with the commit below:
>
> commit 4504230a71566785a05d3e6b53fa1ee071b864eb
> Author: Christoph Hellwig<[email protected]>
> Date: Mon Aug 3 23:28:35 2009 +0200
>
> freeze_bdev: grab active reference to frozen superblocks
>
> With the attached patch, both remount-ro and remount-rw are
> rejected as EBUSY on freezed device as expected.
>
> Christoph, do you think this is the right fix?
>
I can confirm that either reverting the above commit or applying the fix
below resolves the issue on both 2.6.32 and 2.6.33-rc5.
So if it's considered the correct fix, it needs to be cc'd to stable@ for 2.6.32.
(I reported this same issue this morning here:
http://marc.info/?l=linux-kernel&m=126467195500908&w=2,
but then I found this thread and fix.)
The system I tested on is a 4-disk dmraid10 array connected to an Intel
ICH10R on an Asus P7P55D Deluxe, running x86_64.
> Jun'ichi Nomura, NEC Corporation
>
>
> If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
> deactivate_locked_super().
> Also, keep sb->s_frozen consistent so that remount can check the frozen state.
>
> Signed-off-by: Jun'ichi Nomura <[email protected]>
Tested-by: Thomas Backlund <[email protected]>
On Thu, Jan 28, 2010 at 03:32:41PM +0900, Jun'ichi Nomura wrote:
> When MS_RDONLY, both freeze_bdev and thaw_bdev call deactivate_locked_super,
> which seems wrong. The change was introduced with the commit below:
>
> commit 4504230a71566785a05d3e6b53fa1ee071b864eb
> Author: Christoph Hellwig <[email protected]>
> Date: Mon Aug 3 23:28:35 2009 +0200
>
> freeze_bdev: grab active reference to frozen superblocks
>
> With the attached patch, both remount-ro and remount-rw are
> rejected as EBUSY on freezed device as expected.
>
> Christoph, do you think this is the right fix?
Indeed, this looks wrong in my original code, and the patch looks like
the correct fix. Thanks a lot!
Reviewed-by: Christoph Hellwig <[email protected]>
Thanks Thomas and Christoph for testing and review.
I removed the 'smp_wmb()' before up_write() from the previous patch,
since up_write() should already provide the necessary ordering.
(I.e. the change to s_frozen is visible to others after up_write().)
I'm quite sure the change is harmless, but if you are uncomfortable
with Tested-by/Reviewed-by on the modified patch, please remove them.
If MS_RDONLY, freeze_bdev should just up_write(s_umount) instead of
deactivate_locked_super().
Also, keep sb->s_frozen consistent so that remount can check the frozen state.
Otherwise a crash reported here can happen:
http://lkml.org/lkml/2010/1/16/37
http://lkml.org/lkml/2010/1/28/53
This patch should be applied for 2.6.32 stable series, too.
Reviewed-by: Christoph Hellwig <[email protected]>
Tested-by: Thomas Backlund <[email protected]>
Signed-off-by: Jun'ichi Nomura <[email protected]>
Cc: [email protected]
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 73d6a73..d11d028 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -246,7 +246,8 @@ struct super_block *freeze_bdev(struct block_device *bdev)
 	if (!sb)
 		goto out;
 	if (sb->s_flags & MS_RDONLY) {
-		deactivate_locked_super(sb);
+		sb->s_frozen = SB_FREEZE_TRANS;
+		up_write(&sb->s_umount);
 		mutex_unlock(&bdev->bd_fsfreeze_mutex);
 		return sb;
 	}
@@ -307,7 +308,7 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 	BUG_ON(sb->s_bdev != bdev);
 	down_write(&sb->s_umount);
 	if (sb->s_flags & MS_RDONLY)
-		goto out_deactivate;
+		goto out_unfrozen;
 
 	if (sb->s_op->unfreeze_fs) {
 		error = sb->s_op->unfreeze_fs(sb);
@@ -321,11 +322,11 @@ int thaw_bdev(struct block_device *bdev, struct super_block *sb)
 		}
 	}
 
+out_unfrozen:
 	sb->s_frozen = SB_UNFROZEN;
 	smp_wmb();
 	wake_up(&sb->s_wait_unfrozen);
 
-out_deactivate:
 	if (sb)
 		deactivate_locked_super(sb);
 out_unlock:
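
The patch description says that keeping sb->s_frozen consistent lets remount
check the frozen state, and the earlier message reports that remount-ro and
remount-rw are rejected with EBUSY on a frozen device. A minimal userspace
sketch of that behavior follows. Everything named rm_* or RM_* is hypothetical
and exists only for this illustration; in particular, the exact form of the
remount check is an assumption of the sketch, not a quote of kernel code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical freeze states, loosely mirroring SB_UNFROZEN / SB_FREEZE_TRANS. */
enum rm_state { RM_UNFROZEN, RM_FREEZE_TRANS };

/* Toy stand-in for the parts of a superblock this sketch cares about. */
struct rm_sb {
	enum rm_state s_frozen; /* models sb->s_frozen */
	int rdonly;             /* models MS_RDONLY in sb->s_flags */
};

/* With the patch, even a read-only superblock is marked frozen. */
void rm_freeze(struct rm_sb *sb)
{
	sb->s_frozen = RM_FREEZE_TRANS;
}

/* Thaw resets the state on both the read-only and read-write paths. */
void rm_thaw(struct rm_sb *sb)
{
	sb->s_frozen = RM_UNFROZEN;
}

/*
 * Assumed shape of the remount check: refuse any change of the read-only
 * flag while the superblock is frozen. Returns 0 or a negative errno value.
 */
int rm_remount(struct rm_sb *sb, int want_rdonly)
{
	if (sb->s_frozen != RM_UNFROZEN)
		return -EBUSY;
	sb->rdonly = want_rdonly;
	return 0;
}
```

Without the s_frozen assignment in the read-only branch of freeze_bdev, a
check of this kind would have nothing to test against, so a frozen read-only
filesystem could be remounted read-write underneath the freeze.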
Jun'ichi Nomura wrote:
>
> I can reproduce this when suspend/resume read-only mounted dm device.
>
> When MS_RDONLY, both freeze_bdev and thaw_bdev call deactivate_locked_super,
> which seems wrong. The change was introduced with the commit below:
>
> commit 4504230a71566785a05d3e6b53fa1ee071b864eb
> Author: Christoph Hellwig <[email protected]>
> Date: Mon Aug 3 23:28:35 2009 +0200
>
> freeze_bdev: grab active reference to frozen superblocks
>
> With the attached patch, both remount-ro and remount-rw are
> rejected as EBUSY on freezed device as expected.
>
> Christoph, do you think this is the right fix?
>
>
With the fix from Jun'ichi Nomura, a 2.6.32.5 kernel
now boots correctly.
Thanks.
On 29.01.2010 02:56, Jun'ichi Nomura wrote:
> Thanks Thomas and Christoph for testing and review.
> I removed 'smp_wmb()' before up_write from the previous patch,
> since up_write() should have necessary ordering constraints.
> (I.e. the change of s_frozen is visible to others after up_write)
> I'm quite sure the change is harmless but if you are uncomfortable
> with Tested-by/Reviewed-by on the modified patch, please remove them.
>
I've just verified that this patch works as intended on both 2.6.32 and
2.6.33-rc6, so for me it's still OK.