2004-11-22 15:35:03

by Jan De Luyck

Subject: [2.6.10-rc2] XFS filesystem corruption

Hello lists,

[resend with correct email address for LKML]

[Please CC all answers from linux-xfs to me, since I'm not subscribed on that list]

Yesterday I encountered an on-the-fly corruption of my /home filesystem. It worked perfectly one second, the next I hit these nice errors:

Nov 21 16:37:22 precious kernel: 0x0: 31 9e ce 63 cf ff 9c cf ff 31 61 63 ff ff ff ff
Nov 21 16:37:23 precious kernel: Filesystem "hda5": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c. Caller 0xc01fb908
Nov 21 16:37:23 precious kernel: [xfs_da_do_buf+905/2160] xfs_da_do_buf+0x389/0x870
Nov 21 16:37:23 precious kernel: [xfs_da_read_buf+88/96] xfs_da_read_buf+0x58/0x60
Nov 21 16:37:23 precious last message repeated 2 times
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_lookup_int+386/720] xfs_dir2_leaf_lookup_int+0x182/0x2d0
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_lookup_int+386/720] xfs_dir2_leaf_lookup_int+0x182/0x2d0
Nov 21 16:37:23 precious kernel: [__wake_up_common+65/112] __wake_up_common+0x41/0x70
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_lookup+55/224] xfs_dir2_leaf_lookup+0x37/0xe0
Nov 21 16:37:23 precious kernel: [xfs_dir2_lookup+298/352] xfs_dir2_lookup+0x12a/0x160
Nov 21 16:37:23 precious kernel: [xfs_dir_lookup_int+76/304] xfs_dir_lookup_int+0x4c/0x130
Nov 21 16:37:23 precious kernel: [xfs_lookup+80/144] xfs_lookup+0x50/0x90
Nov 21 16:37:23 precious kernel: [linvfs_lookup+82/144] linvfs_lookup+0x52/0x90
Nov 21 16:37:23 precious kernel: [real_lookup+193/240] real_lookup+0xc1/0xf0
Nov 21 16:37:23 precious kernel: [do_lookup+150/176] do_lookup+0x96/0xb0
Nov 21 16:37:23 precious kernel: [link_path_walk+1732/3424] link_path_walk+0x6c4/0xd60
Nov 21 16:37:23 precious kernel: [link_path_walk+2603/3424] link_path_walk+0xa2b/0xd60
Nov 21 16:37:23 precious kernel: [cp_new_stat64+248/272] cp_new_stat64+0xf8/0x110
Nov 21 16:37:23 precious kernel: [path_lookup+124/320] path_lookup+0x7c/0x140
Nov 21 16:37:23 precious kernel: [__user_walk+51/96] __user_walk+0x33/0x60
Nov 21 16:37:23 precious kernel: [dput+51/544] dput+0x33/0x220
Nov 21 16:37:23 precious kernel: [sys_access+133/336] sys_access+0x85/0x150
Nov 21 16:37:23 precious kernel: [path_release+21/80] path_release+0x15/0x50
Nov 21 16:37:23 precious kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Nov 21 16:37:23 precious kernel: 0x0: 31 9e ce 63 cf ff 9c cf ff 31 61 63 ff ff ff ff
Nov 21 16:37:23 precious kernel: Filesystem "hda5": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c. Caller 0xc01fb908
Nov 21 16:37:23 precious kernel: [xfs_da_do_buf+905/2160] xfs_da_do_buf+0x389/0x870
Nov 21 16:37:23 precious kernel: [xfs_da_read_buf+88/96] xfs_da_read_buf+0x58/0x60
Nov 21 16:37:23 precious last message repeated 2 times
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_lookup_int+386/720] xfs_dir2_leaf_lookup_int+0x182/0x2d0
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_lookup_int+386/720] xfs_dir2_leaf_lookup_int+0x182/0x2d0
Nov 21 16:37:23 precious kernel: [__wake_up_common+65/112] __wake_up_common+0x41/0x70
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_lookup+55/224] xfs_dir2_leaf_lookup+0x37/0xe0
Nov 21 16:37:23 precious kernel: [xfs_dir2_lookup+298/352] xfs_dir2_lookup+0x12a/0x160
Nov 21 16:37:23 precious kernel: [xfs_dir_lookup_int+76/304] xfs_dir_lookup_int+0x4c/0x130
Nov 21 16:37:23 precious kernel: [xfs_lookup+80/144] xfs_lookup+0x50/0x90
Nov 21 16:37:23 precious kernel: [linvfs_lookup+82/144] linvfs_lookup+0x52/0x90
Nov 21 16:37:23 precious kernel: [real_lookup+193/240] real_lookup+0xc1/0xf0
Nov 21 16:37:23 precious kernel: [do_lookup+150/176] do_lookup+0x96/0xb0
Nov 21 16:37:23 precious kernel: [link_path_walk+1732/3424] link_path_walk+0x6c4/0xd60
Nov 21 16:37:23 precious kernel: [link_path_walk+2603/3424] link_path_walk+0xa2b/0xd60
Nov 21 16:37:23 precious kernel: [cp_new_stat64+248/272] cp_new_stat64+0xf8/0x110
Nov 21 16:37:23 precious kernel: [path_lookup+124/320] path_lookup+0x7c/0x140
Nov 21 16:37:23 precious kernel: [__user_walk+51/96] __user_walk+0x33/0x60
Nov 21 16:37:23 precious kernel: [sys_access+133/336] sys_access+0x85/0x150
Nov 21 16:37:23 precious kernel: [path_release+21/80] path_release+0x15/0x50
Nov 21 16:37:23 precious kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Nov 21 16:37:23 precious kernel: 0x0: 10 10 10 10 10 10 00 00 21 10 10 10 10 10 10 10
Nov 21 16:37:23 precious kernel: Filesystem "hda5": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c. Caller 0xc01fb908
Nov 21 16:37:23 precious kernel: [xfs_da_do_buf+905/2160] xfs_da_do_buf+0x389/0x870
Nov 21 16:37:23 precious kernel: [xfs_da_read_buf+88/96] xfs_da_read_buf+0x58/0x60
Nov 21 16:37:23 precious kernel: [xfs_da_read_buf+88/96] xfs_da_read_buf+0x58/0x60
Nov 21 16:37:23 precious kernel: [xfs_initialize_vnode+780/800] xfs_initialize_vnode+0x30c/0x320
Nov 21 16:37:23 precious kernel: [xfs_da_read_buf+88/96] xfs_da_read_buf+0x58/0x60
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_addname+875/2688] xfs_dir2_leaf_addname+0x36b/0xa80
Nov 21 16:37:23 precious kernel: [xfs_dir2_leaf_addname+875/2688] xfs_dir2_leaf_addname+0x36b/0xa80
Nov 21 16:37:23 precious kernel: [xfs_bmap_last_offset+197/304] xfs_bmap_last_offset+0xc5/0x130
Nov 21 16:37:23 precious kernel: [xfs_dir2_isleaf+44/112] xfs_dir2_isleaf+0x2c/0x70
Nov 21 16:37:23 precious kernel: [xfs_dir2_createname+341/384] xfs_dir2_createname+0x155/0x180
Nov 21 16:37:23 precious kernel: [xfs_dir_ialloc+145/736] xfs_dir_ialloc+0x91/0x2e0
Nov 21 16:37:23 precious kernel: [xfs_trans_ijoin+53/144] xfs_trans_ijoin+0x35/0x90
Nov 21 16:37:23 precious kernel: [xfs_create+1115/1888] xfs_create+0x45b/0x760
Nov 21 16:37:23 precious kernel: [linvfs_mknod+475/576] linvfs_mknod+0x1db/0x240
Nov 21 16:37:23 precious kernel: [xfs_dir2_lookup+298/352] xfs_dir2_lookup+0x12a/0x160
Nov 21 16:37:23 precious kernel: [real_lookup+193/240] real_lookup+0xc1/0xf0
Nov 21 16:37:23 precious kernel: [dput+51/544] dput+0x33/0x220
Nov 21 16:37:23 precious kernel: [xfs_dir_lookup_int+76/304] xfs_dir_lookup_int+0x4c/0x130
Nov 21 16:37:23 precious kernel: [permission+53/96] permission+0x35/0x60
Nov 21 16:37:23 precious kernel: [vfs_create+121/224] vfs_create+0x79/0xe0
Nov 21 16:37:23 precious kernel: [open_namei+1462/1552] open_namei+0x5b6/0x610
Nov 21 16:37:23 precious kernel: [filp_open+62/112] filp_open+0x3e/0x70
Nov 21 16:37:23 precious kernel: [get_unused_fd+57/224] get_unused_fd+0x39/0xe0
Nov 21 16:37:23 precious kernel: [sys_open+73/144] sys_open+0x49/0x90
Nov 21 16:37:23 precious kernel: [syscall_call+7/11] syscall_call+0x7/0xb
Nov 21 16:37:23 precious kernel: xfs_force_shutdown(hda5,0x8) called from line 1091 of file fs/xfs/xfs_trans.c. Return address = 0xc0244e4b
Nov 21 16:37:23 precious kernel: Filesystem "hda5": Corruption of in-memory data detected. Shutting down filesystem: hda5
Nov 21 16:37:23 precious kernel: Please umount the filesystem, and rectify the problem(s)

Doing an unmount/remount didn't solve it; I had to run xfs_repair on the filesystem to get it back to 'work', which caused me to lose
a lot of stuff. (Thank <deity> for backups.)

Any idea what can cause this?

Thanks,

Jan

--
What's the MATTER Sid? ... Is your BEVERAGE unsatisfactory?



2004-11-22 23:35:43

by Eric Sandeen

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

The trigger was a bad magic number related to directories... hard to say
what happened in the first place. Can you send the output from
xfs_repair? That might offer some hints.

Thanks,

-Eric
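
For readers following the log: the `0x0:` line printed before each "XFS internal error" shows the first 16 bytes of the buffer that failed the check. Below is a minimal Python sketch of a magic-number test of the kind Eric describes. It is not the kernel's code. The constants and offsets are assumptions based on the XFS on-disk format of that era (a 16-bit big-endian magic at byte offset 8 for da/leaf blocks, a 32-bit magic at offset 0 for dir2 block/data/free blocks), and the function names are hypothetical.

```python
# Illustrative only: magic values and offsets are assumptions about the
# XFS on-disk format, not something stated in this thread.
DA_MAGICS_16 = {
    0xFEBE: "XFS_DA_NODE_MAGIC",
    0xFEEB: "XFS_DIR_LEAF_MAGIC",
    0xFBEE: "XFS_ATTR_LEAF_MAGIC",
    0xD2F1: "XFS_DIR2_LEAF1_MAGIC",
    0xD2FF: "XFS_DIR2_LEAFN_MAGIC",
}
DIR2_MAGICS_32 = {
    0x58443242: "XFS_DIR2_BLOCK_MAGIC",  # ASCII "XD2B"
    0x58443244: "XFS_DIR2_DATA_MAGIC",   # ASCII "XD2D"
    0x58443246: "XFS_DIR2_FREE_MAGIC",   # ASCII "XD2F"
}


def parse_dump(line):
    """Return the bytes that follow the '0x0:' marker of a hex-dump line."""
    _, _, hexpart = line.partition("0x0:")
    return bytes(int(b, 16) for b in hexpart.split())


def identify_block(buf):
    """Return the name of a recognised magic, or None if nothing matches."""
    magic16 = int.from_bytes(buf[8:10], "big")  # da/leaf header magic
    magic32 = int.from_bytes(buf[0:4], "big")   # dir2 block/data/free magic
    return DA_MAGICS_16.get(magic16) or DIR2_MAGICS_32.get(magic32)


# The two distinct dumps from the original report.
for line in (
    "kernel: 0x0: 31 9e ce 63 cf ff 9c cf ff 31 61 63 ff ff ff ff",
    "kernel: 0x0: 10 10 10 10 10 10 00 00 21 10 10 10 10 10 10 10",
):
    print(identify_block(parse_dump(line)))
```

Under these assumptions neither dump carries a recognised magic, which is consistent with the "bad magic number" reading; it says nothing about how the blocks came to hold that data.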

2004-11-22 16:30:27

by William Lee Irwin III

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

On Mon, Nov 22, 2004 at 03:30:29PM +0100, Jan De Luyck wrote:
> [resend with correct email address for LKML]
> [Please CC all answers from linux-xfs to me, since I'm not subscribed on that list]
> Yesterday I encountered an on-the-fly corruption of my /home filesystem. It worked perfectly one second, the next I hit these nice errors:
> Nov 21 16:37:22 precious kernel: 0x0: 31 9e ce 63 cf ff 9c cf ff 31 61 63 ff ff ff ff
> Nov 21 16:37:23 precious kernel: Filesystem "hda5": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c. Caller 0xc01fb908
> Nov 21 16:37:23 precious kernel: [xfs_da_do_buf+905/2160] xfs_da_do_buf+0x389/0x870

I don't have any ideas at the moment, but please cc: me also. I'd like
to watch for issues I do understand as this bug's nature is clarified.


-- wli

2004-11-23 06:39:48

by Jan De Luyck

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

On Tuesday 23 November 2004 00:34, Eric Sandeen wrote:
> The trigger was a bad magic number related to directories... hard to say
> what happened in the first place. Can you send the output from
> xfs_repair, that might offer some hints.

Sorry, but as the repair was very urgent, I didn't think of saving the
xfs_repair output. My bad, I guess.

Jan

--
The seven year itch comes from fooling around during the fourth, fifth,
and sixth years.



2004-11-23 10:14:24

by Prakash K. Cheemplavam

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

William Lee Irwin III wrote:
> On Mon, Nov 22, 2004 at 03:30:29PM +0100, Jan De Luyck wrote:
>
>>[resend with correct email address for LKML]
>>[Please CC all answers from linux-xfs to me, since I'm not subscribed on that list]
>>Yesterday I encountered an on-the-fly corruption of my /home filesystem. It worked perfectly one second, the next I hit these nice errors:
>>Nov 21 16:37:22 precious kernel: 0x0: 31 9e ce 63 cf ff 9c cf ff 31 61 63 ff ff ff ff
>>Nov 21 16:37:23 precious kernel: Filesystem "hda5": XFS internal error xfs_da_do_buf(2) at line 2273 of file fs/xfs/xfs_da_btree.c. Caller 0xc01fb908
>>Nov 21 16:37:23 precious kernel: [xfs_da_do_buf+905/2160] xfs_da_do_buf+0x389/0x870
>
>
> I don't have any ideas at the moment, but please cc: me also. I'd like
> to watch for issues I do understand as this bug's nature is clarified.

While we are at it: Is xfs known to be broken while preempt is on? (Esp.
using ck's preempt big kernel lock?) I got the following using a raid0 setup
with xfs. I thought it would be a driver issue, but since reformatting to ext3
the stripe array has run w/o problems for a few days. (xfs crapped out
after a few hours of heavy disk activity.)

Nov 21 10:10:15 tachyon ata2: command 0x25 timeout, stat 0xd0 host_stat 0x61
Nov 21 10:10:15 tachyon ata2: status=0xd0 { Busy }
Nov 21 10:10:15 tachyon SCSI error : <1 0 0 0> return code = 0x8000002
Nov 21 10:10:15 tachyon Current sdb: sense = 70 10
Nov 21 10:10:15 tachyon end_request: I/O error, dev sdb, sector 10480847
Nov 21 10:10:15 tachyon ATA: abnormal status 0xD0 on port 0xF08060C7
Nov 21 10:10:15 tachyon ATA: abnormal status 0xD0 on port 0xF08060C7
Nov 21 10:10:15 tachyon ATA: abnormal status 0xD0 on port 0xF08060C7
Nov 21 10:10:45 tachyon ata2: command 0x25 timeout, stat 0xd0 host_stat 0x61
Nov 21 10:10:45 tachyon ata2: status=0xd0 { Busy }
Nov 21 10:10:45 tachyon SCSI error : <1 0 0 0> return code = 0x8000002
Nov 21 10:10:45 tachyon Current sdb: sense = 70 10
Nov 21 10:10:45 tachyon end_request: I/O error, dev sdb, sector 10480855
Nov 21 10:10:45 tachyon I/O error in filesystem ("md0") meta-data dev
md0 block 0x13fd990 ("xfs_trans_read_buf") error 5 buf count 8192

etc...

If you need more info (dmesg, .config, etc.) let me know.

Prakash
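
As an aid to reading the `ata2` lines above, here is a small Python sketch that decodes the status byte and command opcode. It assumes the standard ATA status register bit layout and that opcode 0x25 is READ DMA EXT. The helper names are hypothetical and nothing here comes from the libata source.

```python
# Standard ATA status register bits (assumed layout), highest bit first.
ATA_STATUS_BITS = [
    (0x80, "BSY"),   # device busy
    (0x40, "DRDY"),  # device ready
    (0x20, "DF"),    # device fault
    (0x10, "DSC"),   # seek complete
    (0x08, "DRQ"),   # data request
    (0x04, "CORR"),  # corrected data
    (0x02, "IDX"),   # index
    (0x01, "ERR"),   # error
]

# Only the opcode seen in the log is listed; extend as needed.
ATA_COMMANDS = {0x25: "READ DMA EXT"}


def decode_status(stat):
    """Return the names of all bits set in an ATA status byte."""
    return [name for bit, name in ATA_STATUS_BITS if stat & bit]


def describe(command, stat):
    """Format a command/status pair like the one in the timeout message."""
    name = ATA_COMMANDS.get(command, "unknown")
    return f"command 0x{command:02x} ({name}), status {decode_status(stat)}"


print(describe(0x25, 0xD0))
```

With BSY set the remaining status bits are generally not meaningful, so the useful information is simply that a read command timed out while the device still reported busy.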



2004-11-23 19:25:10

by Prakash K. Cheemplavam

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

Lee Revell wrote:
> On Tue, 2004-11-23 at 11:13 +0100, Prakash K. Cheemplavam wrote:
>
>>Is xfs known to be broken while preempt is on? (Esp
>>using ck's preempt big kernel lock?)
>
>
> Minor nitpick: Ingo wrote the preempt BKL code, not Con.

Oh yes, I know, but it is in Con's kernel, and IIRC he merged it into all of
his patches, so I thought this would be more precise. ;-)

Prakash



2004-11-23 19:18:22

by Lee Revell

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

On Tue, 2004-11-23 at 11:13 +0100, Prakash K. Cheemplavam wrote:
> Is xfs known to be broken while preempt is on? (Esp
> using ck's preempt big kernel lock?)

Minor nitpick: Ingo wrote the preempt BKL code, not Con.

Lee

2004-11-23 21:30:57

by Nathan Scott

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

On Tue, Nov 23, 2004 at 11:13:18AM +0100, Prakash K. Cheemplavam wrote:
>
> While we are at it: Is xfs known to be broken while preempt is on? (Esp

Nope.

> using ck's preempt big kernel lock?) I got following using a raid0 setup
> with xfs. I thought it would be a driver issue, but reformatting to ext3
> the stripe array runs now w/o probs for a few days. (xfs crapped out
> after a few hours on heavy disk activity.)
> ...
> Nov 21 10:10:45 tachyon end_request: I/O error, dev sdb, sector 10480855
> Nov 21 10:10:45 tachyon I/O error in filesystem ("md0") meta-data dev
> md0 block 0x13fd990 ("xfs_trans_read_buf") error 5 buf count 8192

This looks like your driver passed an error back up to the
filesystem while it was doing metadata IO and XFS chose to
shut it down to prevent further damage. It's unlikely to
be a preempt/xfs problem. Possibly hardware. Did you see
any of those device errors since switching to ext3?

cheers.

--
Nathan
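
To make the quoted XFS message easier to read, the following Python sketch pulls out its fields and maps the numeric error to its errno name. The regular expression and the `decode` helper are hypothetical, written only to match the message format shown in this thread, and the errno lookup assumes a Linux-style errno table.

```python
import errno
import re

# The wrapped log message from the thread, joined onto one line.
LINE = ('I/O error in filesystem ("md0") meta-data dev md0 block 0x13fd990 '
        '("xfs_trans_read_buf") error 5 buf count 8192')

PATTERN = re.compile(
    r'block (0x[0-9a-fA-F]+)\s+\("(\w+)"\)\s+error (\d+) buf count (\d+)'
)


def decode(line):
    """Extract block, caller, errno name and buffer size, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    block, caller, err, count = m.groups()
    return {
        "block": int(block, 16),
        "caller": caller,
        "errno_name": errno.errorcode.get(int(err), "unknown"),
        "buf_bytes": int(count),
    }


print(decode(LINE))
```

Error 5 is EIO, a generic I/O error handed up from the block layer, which fits Nathan's reading that the filesystem was reacting to a failure reported from below rather than detecting a problem of its own.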

2004-11-24 08:20:06

by Prakash K. Cheemplavam

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

Nathan Scott wrote:
> On Tue, Nov 23, 2004 at 11:13:18AM +0100, Prakash K. Cheemplavam wrote:
>>using ck's preempt big kernel lock?) I got following using a raid0 setup
>>with xfs. I thought it would be a driver issue, but reformatting to ext3
>>the stripe array runs now w/o probs for a few days. (xfs crapped out
>>after a few hours on heavy disk activity.)
>>...
>>Nov 21 10:10:45 tachyon end_request: I/O error, dev sdb, sector 10480855
>>Nov 21 10:10:45 tachyon I/O error in filesystem ("md0") meta-data dev
>>md0 block 0x13fd990 ("xfs_trans_read_buf") error 5 buf count 8192
>
>
> This looks like your driver passed an error back up to the
> filesystem while it was doing metadata IO and XFS chose to
> shut it down to prevent further damage. It's unlikely to
> be a preempt/xfs problem. Possibly hardware. Did you see
> any of those device errors since switching to ext3?

No. That's why I am wondering. I have read about errors like the ones I got
on lkml before, and usually they were not fs related but libata siimage
driver related. It could be just a coincidence that it came up with xfs,
but so far (I guess 5 days now, though not running 24/7) ext3 is
behaving nicely.

bye,

Prakash



2004-11-24 14:09:53

by Eric Sandeen

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

Prakash K. Cheemplavam wrote:
> Nathan Scott wrote:
>
>> Did you see
>> any of those device errors since switching to ext3?
>
>
> No. That's why I am wondering. I read about such errors like I got
> before in lkml and usually they were not fs related but libata siimage
> driver related. It could be just a coincidence that it came up with xfs,
> but till now (I guess 5 days now, though not 24/7 running) ext3 is
> behaving nicely.

It's almost certainly not a filesystem problem, but an IO layer problem.
Maybe you only see it with xfs due to different disk IO patterns with
xfs vs. ext3... the two will certainly be allocating & writing to the
disk in different ways.

-Eric

2004-11-25 07:28:47

by Prakash K. Cheemplavam

Subject: Re: [2.6.10-rc2] XFS filesystem corruption

Eric Sandeen wrote:
> Prakash K. Cheemplavam wrote:
>
>> Nathan Scott wrote:
>>
>>> Did you see
>>> any of those device errors since switching to ext3?
>>
>>
>>
>> No. That's why I am wondering. I read about such errors like I got
>> before in lkml and usually they were not fs related but libata siimage
>> driver related. It could be just a coincidence that it came up with
>> xfs, but till now (I guess 5 days now, though not 24/7 running) ext3
>> is behaving nicely.
>
>
> It's almost certainly not a filesystem problem, but an IO layer problem.
> Maybe you only see it with xfs due to different disk IO patterns with
> xfs vs. ext3... the two will certainly be allocating & writing to the
> disk in different ways.

Hmm, OK. When I have some hd space again, I might try to reproduce this
error. Whom should I bug then if it reappears?

Cheers,

Prakash

