2007-02-01 09:08:40

by Andrew Morton

Subject: Fw: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs



Begin forwarded message:

Date: Thu, 1 Feb 2007 16:44:39 +0800
From: Fengguang Wu <[email protected]>
To: LKML <[email protected]>
Subject: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs


I accidentally ran two qemu instances on the same ext3 fs, after which bad
things happened. After exiting the two qemus and starting a new one, I got the
following oops:

root ~# ll /etc/mtab
/bin/ls: /etc/mtab: Input/output error
root ~# rm /etc/mtab
[ 147.213090] EXT3-fs warning (device hda): ext3_unlink: Deleting nonexistent file (1775838), 0
root ~# halt
[ 152.651209] list_add corruption. next->prev should be prev (ffff810007be1a38), but was ffff81000717e3d8. (next=ffff81000717e3d8).
[ 152.652507] ------------[ cut here ]------------
[ 152.652900] kernel BUG at lib/list_debug.c:27!
[ 152.653283] invalid opcode: 0000 [1] SMP
[ 152.653649] last sysfs file: /block/md2/uevent
[ 152.654020] CPU 0
[ 152.654228] Modules linked in:
[ 152.654549] Pid: 1107, comm: zsh Not tainted 2.6.20-rc6-mm3 #1
[ 152.655397] RIP: 0010:[<ffffffff8116f558>] [<ffffffff8116f558>] __list_add+0x48/0xb0
[ 152.656139] RSP: 0018:ffff8100062bdd78 EFLAGS: 00000296
[ 152.656572] RAX: 0000000000000088 RBX: ffff81000717e3d8 RCX: 0000000000000000
[ 152.657140] RDX: ffffffff8101a433 RSI: 0000000000000001 RDI: ffffffff8141fb40
[ 152.657708] RBP: ffff8100062bdd98 R08: 0000000000000002 R09: ffffffff8101a270
[ 152.658275] R10: ffff8100062bdb58 R11: 0000000000000006 R12: ffff810007be1a38
[ 152.658842] R13: ffff81000717e3d8 R14: ffff810005a52170 R15: ffff81000717e3d8
[ 152.659415] FS: 00002ba30c98ae90(0000) GS:ffffffff81488000(0000) knlGS:0000000000000000
[ 152.660068] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 152.660531] CR2: 00002ba30d670000 CR3: 00000000062ee000 CR4: 00000000000006e0
[ 152.661103] Process zsh (pid: 1107, threadinfo ffff8100062bc000, task ffff810007895080)
[ 152.661731] Stack: ffff8100061245b0 0000000000000000 ffff810007bcac20 ffff81000717e470
[ 152.662483] ffff8100062bdda8 ffffffff8116f5cc ffff8100062bde18 ffffffff81129463
[ 152.663147] ffff81000717e470 ffff81000717e300 ffff8100061245b0 0000000000000f80
[ 152.663779] Call Trace:
[ 152.664035] [<ffffffff8116f5cc>] list_add+0xc/0x10
[ 152.664439] [<ffffffff81129463>] ext3_orphan_add+0x163/0x1a0
[ 152.664943] [<ffffffff8112a5c0>] ext3_unlink+0x150/0x1c0
[ 152.665385] [<ffffffff81056942>] vfs_unlink+0xb2/0x110
[ 152.665813] [<ffffffff81045e58>] do_unlinkat+0x108/0x1f0
[ 152.666255] [<ffffffff81073341>] trace_hardirqs_on_thunk+0x35/0x37
[ 152.666761] [<ffffffff810b2749>] trace_hardirqs_on+0x1a9/0x1d0
[ 152.667239] [<ffffffff81073341>] trace_hardirqs_on_thunk+0x35/0x37
[ 152.667758] [<ffffffff810ee891>] sys_unlink+0x11/0x20
[ 152.668180] [<ffffffff8106c11e>] system_call+0x7e/0x83
[ 152.668602]
[ 152.668749]
[ 152.668754] Code: 0f 0b 66 66 90 66 66 90 eb fe 31 f6 49 3b 1c 24 48 c7 c7 60
[ 152.669850] RIP [<ffffffff8116f558>] __list_add+0x48/0xb0
[ 152.670322] RSP <ffff8100062bdd78>
[ 152.670842] BUG: at kernel/exit.c:860 do_exit()
[ 152.671209]
[ 152.671214] Call Trace:
[ 152.671543] [<ffffffff8109a835>] profile_task_exit+0x15/0x20
[ 152.671992] [<ffffffff81017f7b>] do_exit+0x6b/0xac0
[ 152.672384] [<ffffffff81073d6c>] _spin_unlock_irqrestore+0x4c/0x60
[ 152.672871] [<ffffffff8107b7b1>] die+0x61/0x70
[ 152.673230] [<ffffffff81074b00>] do_trap+0xf0/0x110
[ 152.673624] [<ffffffff8107be63>] do_invalid_op+0xb3/0xc0
[ 152.674048] [<ffffffff8116f558>] __list_add+0x48/0xb0
[ 152.675955] [<ffffffff8107439d>] error_exit+0x0/0x96
[ 152.676385] [<ffffffff8101a270>] release_console_sem+0x50/0x230
[ 152.676876] [<ffffffff8101a433>] release_console_sem+0x213/0x230
[ 152.677370] [<ffffffff8116f558>] __list_add+0x48/0xb0
[ 152.677777] [<ffffffff8116f558>] __list_add+0x48/0xb0
[ 152.678200] [<ffffffff8116f5cc>] list_add+0xc/0x10
[ 152.678598] [<ffffffff81129463>] ext3_orphan_add+0x163/0x1a0
[ 152.679105] [<ffffffff8112a5c0>] ext3_unlink+0x150/0x1c0
[ 152.679570] [<ffffffff81056942>] vfs_unlink+0xb2/0x110
[ 152.679991] [<ffffffff81045e58>] do_unlinkat+0x108/0x1f0
[ 152.680436] [<ffffffff81073341>] trace_hardirqs_on_thunk+0x35/0x37
[ 152.680945] [<ffffffff810b2749>] trace_hardirqs_on+0x1a9/0x1d0
[ 152.681411] [<ffffffff81073341>] trace_hardirqs_on_thunk+0x35/0x37
[ 152.681909] [<ffffffff810ee891>] sys_unlink+0x11/0x20
[ 152.682341] [<ffffffff8106c11e>] system_call+0x7e/0x83
[ 152.682761]
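
For reference, the check that fired is the CONFIG_DEBUG_LIST sanity test
in __list_add() (lib/list_debug.c, paraphrased; the second, symmetric
check on prev->next is omitted here):

void __list_add(struct list_head *new,
		struct list_head *prev,
		struct list_head *next)
{
	/* Verify the neighbour's back-pointer before linking in the
	 * new entry; a mismatch means the list is already corrupt. */
	if (unlikely(next->prev != prev)) {
		printk(KERN_ERR "list_add corruption. next->prev should be "
			"prev (%p), but was %p. (next=%p).\n",
			prev, next->prev, next);
		BUG();
	}
	next->prev = new;
	new->next = next;
	new->prev = prev;
	prev->next = new;
}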

Regards,
Wu


2007-02-01 10:25:36

by Andreas Dilger

Subject: Re: Fw: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs

I don't have a comment on the actual bug here, but this is another case
where it would be nice to have multi-mount protection built into ext3...
When I last proposed this it was refused on the grounds that an external
HA manager should be doing this job, but I don't think that is realistic.

Fengguang Wu <[email protected]> wrote:
> I accidentally ran two qemu instances on the same ext3 fs, after which bad
> things happened. After exiting the two qemus and starting a new one, I got the
> following oops:
>
> [... full oops trace snipped ...]

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-02-01 16:57:49

by Eric Sandeen

Subject: Re: Fw: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs

Andrew Morton wrote:
>
> Begin forwarded message:
>
> Date: Thu, 1 Feb 2007 16:44:39 +0800
> From: Fengguang Wu <[email protected]>
> To: LKML <[email protected]>
> Subject: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs
>
>
> I accidentally ran two qemu instances on the same ext3 fs, after which bad
> things happened. After exiting the two qemus and starting a new one, I got the
> following oops:

Is this equivalent to mounting the same SAN block device on two different
machines? And if so, how well can the filesystem really be expected to
cope with this?

(remembering to read the rest of his inbox...)

Andreas Dilger wrote:

> I don't have a comment on the actual bug here, but this is another case
> where it would be nice to have multi-mount protection built into ext3...
> When I last proposed this it was refused on the grounds that an external
> HA manager should be doing this job but I don't think that is realistic.

I'm with Andreas on this one; in the era of SANs, iSCSI, virtual
machines, and suspended images, it would be nice to prevent multiple
mounts at the fs (or vfs?) level....

-Eric

2007-02-01 17:28:30

by Alex Tomas

Subject: Re: Fw: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs

>>>>> Andreas Dilger (AD) writes:

AD> I don't have a comment on the actual bug here, but this is another case
AD> where it would be nice to have multi-mount protection built into ext3...
AD> When I last proposed this it was refused on the grounds that an external
AD> HA manager should be doing this job but I don't think that is realistic.

Can we use a JBD-like approach? Export some inode, and MMP/whatever
would 'ping' the 1st block of that inode.
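
Roughly, as a userspace sketch of the reader side (the block layout,
offset, and interval are all made up for illustration, not a real
on-disk format):

/*
 * Userspace sketch of a multi-mount heartbeat check.  A block is
 * reserved on disk; the live mounter bumps a sequence number at a
 * fixed interval, and a second mounter notices the block changing.
 */
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

struct mmp_block {
	uint32_t magic;        /* identifies an MMP block */
	uint32_t seq;          /* bumped by the live mounter */
	uint64_t time;         /* time of last update */
	char     nodename[64]; /* who holds the mount */
};

#define MMP_OFFSET   (1024 * 1024)   /* hypothetical location */
#define MMP_INTERVAL 5               /* seconds between updates */

static int read_mmp(int fd, struct mmp_block *mmp)
{
	return pread(fd, mmp, sizeof(*mmp), MMP_OFFSET) == sizeof(*mmp)
		? 0 : -1;
}

int main(int argc, char **argv)
{
	struct mmp_block before, after;
	int fd;

	if (argc != 2 || (fd = open(argv[1], O_RDONLY)) < 0)
		return 1;
	if (read_mmp(fd, &before))
		return 1;
	/* Wait longer than the update interval, then re-read; if the
	 * sequence moved, someone else has the filesystem mounted. */
	sleep(2 * MMP_INTERVAL);
	if (read_mmp(fd, &after))
		return 1;
	if (after.seq != before.seq)
		printf("busy: in use by %.64s\n", after.nodename);
	else
		printf("no live mounter detected\n");
	close(fd);
	return 0;
}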

thanks, Alex

2007-02-01 20:56:22

by Mingming Cao

Subject: ext3_forget() and ext3_free_blocks()

I am chasing an ext3 bug which double-frees the same xattr block from two
different inodes. While looking at the code in ext3_xattr_release_block()
I found that ext3_free_blocks() is called before ext3_forget():

static void
ext3_xattr_release_block(handle_t *handle, struct inode *inode,
                         struct buffer_head *bh)
{
        struct mb_cache_entry *ce = NULL;

        ce = mb_cache_entry_get(ext3_xattr_cache, bh->b_bdev, bh->b_blocknr);
        if (BHDR(bh)->h_refcount == cpu_to_le32(1)) {
                ea_bdebug(bh, "refcount now=0; freeing");
                if (ce)
                        mb_cache_entry_free(ce);
                ext3_free_blocks(handle, inode, bh->b_blocknr, 1);
                get_bh(bh);
                ext3_forget(handle, 1, inode, bh, bh->b_blocknr);
        } else {

Is this a potential problem? It looks like every other place that calls
ext3_free_blocks() calls ext3_forget() first.

This seems unrelated to the double-free bug I am seeing, though: I
reversed the order and reran the test, and the bug still reproduced. But
I am just curious...
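
For comparison, the order used elsewhere is (sketch of the swap I tested):

        /* Tell the journal to forget any pending metadata for the
         * buffer first, then give the block back to the allocator. */
        get_bh(bh);
        ext3_forget(handle, 1, inode, bh, bh->b_blocknr);
        ext3_free_blocks(handle, inode, bh->b_blocknr, 1);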


Thanks,
Mingming

2007-02-02 01:26:44

by Andreas Dilger

Subject: Re: Fw: [BUG -mm] ext3_orphan_add() accessing corrupted list on a corrupted ext3fs

On Feb 01, 2007 20:28 +0300, Alex Tomas wrote:
> >>>>> Andreas Dilger (AD) writes:
> AD> I don't have a comment on the actual bug here, but this is another case
> AD> where it would be nice to have multi-mount protection built into ext3...
> AD> When I last proposed this it was refused on the grounds that an external
> AD> HA manager should be doing this job but I don't think that is realistic.
>
> Can we use a JBD-like approach? Export some inode, and MMP/whatever
> would 'ping' the 1st block of that inode.

I'd be happy enough to implement the engine for this in a generic kernel
layer instead of making it ext3-specific. That said, ext3 also has the
ability to mark the filesystem incompatible with older kernels that do
not support the MMP protocol, so this avoids the requirement that both
kernels be up-to-date in order to avoid corrupting shared images.
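
The incompat machinery already exists; this is roughly the mount-time
check from ext3_fill_super(), with a hypothetical MMP bit added (the
name and value below are made up, since the feature doesn't exist yet):

/* A kernel refuses to mount when the superblock carries an incompat
 * feature bit it does not recognize, so a pre-MMP kernel could never
 * mount (and therefore never corrupt) an MMP-protected image. */
#define EXT3_FEATURE_INCOMPAT_MMP	0x0100	/* hypothetical */

	features = EXT3_HAS_INCOMPAT_FEATURE(sb, ~EXT3_FEATURE_INCOMPAT_SUPP);
	if (features) {
		printk(KERN_ERR "EXT3-fs: %s: couldn't mount because of "
		       "unsupported optional features (%x).\n",
		       sb->s_id, le32_to_cpu(features));
		goto failed_mount;
	}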

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-02-02 10:30:18

by Andreas Gruenbacher

Subject: Re: ext3_forget() and ext3_free_blocks()

On Thursday 01 February 2007 12:56, Mingming Cao wrote:
> Is this a potential problem? It looks like every other place that calls
> ext3_free_blocks() calls ext3_forget() first.

No, the order between the two doesn't matter; both calls run under the
same transaction handle, so their relative order within it has no effect.
I'm still unclear about what leads to the xattr block double-free you
are seeing.

Andreas