2007-10-07 01:10:22

by Max Waterman

Subject: XFS internal error

Hi,

I have just had an XFS error occur while deleting some directory
hierarchy. I hope this is the correct place to report it.

It essentially shut down the filesystem, and a reboot seemed to return
everything to normal.

This is in syslog:

> Oct 6 23:40:33 jeeves kernel: xfs_da_do_buf: bno 16777216
> Oct 6 23:40:33 jeeves kernel: dir: inode 2095141277
> Oct 6 23:40:33 jeeves kernel: Filesystem "md2": XFS internal error xfs_da_do_buf(1) at line 1994 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff889b2de4
> Oct 6 23:40:33 jeeves kernel:
> Oct 6 23:40:33 jeeves kernel: Call Trace:
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889b2a21>] :xfs:xfs_da_do_buf+0x2da/0x633
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889bafb3>] :xfs:xfs_dir2_leafn_lookup_int+0x2c6/0x44b
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889bb013>] :xfs:xfs_dir2_leafn_lookup_int+0x326/0x44b
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889d721a>] :xfs:xfs_trans_log_buf+0x55/0x81
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889b2de4>] :xfs:xfs_da_read_buf+0x24/0x29
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889b988e>] :xfs:xfs_dir2_node_removename+0x23a/0x43a
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889b988e>] :xfs:xfs_dir2_node_removename+0x23a/0x43a
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8106c632>] find_lock_page+0x26/0xa2
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889a5521>] :xfs:xfs_bmap_last_offset+0xcd/0xdb
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889b5189>] :xfs:xfs_dir_removename+0x102/0x110
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e0de6>] :xfs:kmem_zone_alloc+0x52/0x9f
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889c7c98>] :xfs:xfs_inode_item_init+0x1e/0x7a
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e050e>] :xfs:xfs_remove+0x2a9/0x437
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109d0f5>] __link_path_walk+0x16e/0xd9c
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e6da7>] :xfs:xfs_vn_unlink+0x21/0x4f
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889c2310>] :xfs:xfs_iunlock+0x57/0x79
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889db297>] :xfs:xfs_access+0x3d/0x46
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e6eaa>] :xfs:xfs_vn_permission+0x14/0x19
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109b7e5>] permission+0xaf/0xf7
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109c583>] vfs_unlink+0xbc/0x102
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109e4ef>] do_unlinkat+0xaa/0x144
> Oct 6 23:40:33 jeeves kernel: [<ffffffff81009c71>] tracesys+0x71/0xda
> Oct 6 23:40:33 jeeves kernel: [<ffffffff81009cd5>] tracesys+0xd5/0xda
> Oct 6 23:40:33 jeeves kernel:
> Oct 6 23:40:33 jeeves kernel: Filesystem "md2": XFS internal error xfs_trans_cancel at line 1132 of file fs/xfs/xfs_trans.c. Caller 0xffffffff889e0668
> Oct 6 23:40:33 jeeves kernel:
> Oct 6 23:40:33 jeeves kernel: Call Trace:
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889d622d>] :xfs:xfs_trans_cancel+0x5b/0xf1
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e0668>] :xfs:xfs_remove+0x403/0x437
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109d0f5>] __link_path_walk+0x16e/0xd9c
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e6da7>] :xfs:xfs_vn_unlink+0x21/0x4f
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889c2310>] :xfs:xfs_iunlock+0x57/0x79
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889db297>] :xfs:xfs_access+0x3d/0x46
> Oct 6 23:40:33 jeeves kernel: [<ffffffff889e6eaa>] :xfs:xfs_vn_permission+0x14/0x19
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109b7e5>] permission+0xaf/0xf7
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109c583>] vfs_unlink+0xbc/0x102
> Oct 6 23:40:33 jeeves kernel: [<ffffffff8109e4ef>] do_unlinkat+0xaa/0x144
> Oct 6 23:40:33 jeeves kernel: [<ffffffff81009c71>] tracesys+0x71/0xda
> Oct 6 23:40:33 jeeves kernel: [<ffffffff81009cd5>] tracesys+0xd5/0xda
> Oct 6 23:40:33 jeeves kernel:
> Oct 6 23:40:33 jeeves kernel: xfs_force_shutdown(md2,0x8) called from line 1133 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff889d624b
> Oct 6 23:40:33 jeeves kernel: Filesystem "md2": Corruption of in-memory data detected. Shutting down filesystem: md2
> Oct 6 23:40:33 jeeves kernel: Please umount the filesystem, and rectify the problem(s)
> Oct 6 23:43:53 jeeves shutdown[18347]: shutting down for system reboot
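The `0x8` argument to `xfs_force_shutdown` is a bitmask of shutdown reasons, and it lines up with the "Corruption of in-memory data detected" message that follows it. Below is a minimal decoder sketch. The flag names and values are recalled from the kernel's `fs/xfs/xfs_mount.h` for kernels of this era and should be checked against the source of the running kernel; `decode_shutdown` is a hypothetical helper, not part of any XFS tool.

```python
import re

# Shutdown reason bits, as recalled from fs/xfs/xfs_mount.h (2.6.2x era).
# Assumption: verify these against the source of the kernel you are running.
SHUTDOWN_FLAGS = {
    0x0001: "SHUTDOWN_META_IO_ERROR",   # write attempt to metadata failed
    0x0002: "SHUTDOWN_LOG_IO_ERROR",    # write attempt to the log failed
    0x0004: "SHUTDOWN_FORCE_UMOUNT",    # shutdown from a forced unmount
    0x0008: "SHUTDOWN_CORRUPT_INCORE",  # corrupt in-memory data structures
    0x0010: "SHUTDOWN_REMOTE_REQ",      # shutdown requested remotely
    0x0020: "SHUTDOWN_DEVICE_REQ",      # failed all paths of the device
}

# Matches e.g. "xfs_force_shutdown(md2,0x8)"; device names may contain '-'.
_SHUTDOWN_RE = re.compile(r"xfs_force_shutdown\(([^,]+),(0x[0-9a-fA-F]+)\)")


def decode_shutdown(line):
    """Return (device, [flag names]) for an xfs_force_shutdown log line, or None."""
    m = _SHUTDOWN_RE.search(line)
    if m is None:
        return None
    device, value = m.group(1), int(m.group(2), 16)
    # Report every known bit that is set, lowest bit first.
    names = [name for bit, name in sorted(SHUTDOWN_FLAGS.items()) if value & bit]
    return device, names


line = ("Oct 6 23:40:33 jeeves kernel: xfs_force_shutdown(md2,0x8) called "
        "from line 1133 of file fs/xfs/xfs_trans.c.")
print(decode_shutdown(line))  # ('md2', ['SHUTDOWN_CORRUPT_INCORE'])
```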

I am fairly sure there is nothing I can do about this, but I thought it
prudent to mention it. Searching turned up some similar issues, but they
seem related to a previous kernel version and claimed to be fixed in
subsequent versions.

> Linux jeeves.mydomain 2.6.22.7-57.fc6 #1 SMP Fri Sep 21 19:45:12 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux

The array is a little 'unorthodox', if that matters.

It's using 4 on-board (nForce) SATA drives and 4 PCI IDE drives:

> /dev/md2:
> Version : 00.90.03
> Creation Time : Sat Aug 6 10:18:41 2005
> Raid Level : raid5
> Array Size : 976804480 (931.55 GiB 1000.25 GB)
> Device Size : 195360896 (186.31 GiB 200.05 GB)
> Raid Devices : 6
> Total Devices : 8
> Preferred Minor : 2
> Persistence : Superblock is persistent
>
> Update Time : Sun Oct 7 09:05:43 2007
> State : clean
> Active Devices : 6
> Working Devices : 8
> Failed Devices : 0
> Spare Devices : 2
>
> Layout : left-symmetric
> Chunk Size : 64K
>
> UUID : 15bfec75:595ac793:0914f8ee:862effd8
> Events : 0.9341058
>
> Number Major Minor RaidDevice State
> 0 33 0 0 active sync /dev/hde
> 1 34 0 1 active sync /dev/hdg
> 2 56 0 2 active sync /dev/hdi
> 3 8 32 3 active sync /dev/sdc
> 4 8 48 4 active sync /dev/sdd
> 5 8 80 5 active sync /dev/sdf
>
> 6 8 64 - spare /dev/sde
> 7 57 0 - spare /dev/hdk
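The array figures above are self-consistent for RAID5: one member's worth of capacity goes to parity, and the two spares contribute nothing until one of them replaces a failed disk. A quick arithmetic check (the helper name is illustrative only):

```python
def raid5_usable_kib(device_size_kib, raid_devices):
    """Usable RAID5 capacity: (n - 1) members' worth holds data, one member's worth holds parity.

    Spare devices are not counted; they only hold data after a rebuild.
    """
    if raid_devices < 2:
        raise ValueError("RAID5 needs at least two member devices")
    return device_size_kib * (raid_devices - 1)


array_kib = raid5_usable_kib(195360896, 6)  # "Device Size" and "Raid Devices" above
print(array_kib)                             # 976804480, the reported "Array Size"
print(round(array_kib / 2**20, 2))           # 931.55 (GiB)
print(round(array_kib * 1024 / 10**9, 2))    # 1000.25 (GB)
```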

Max.


2007-10-08 00:15:16

by David Chinner

Subject: Re: XFS internal error

[please cc [email protected] on XFS bug reports. thx.]

On Sun, Oct 07, 2007 at 09:09:58AM +0800, Max Waterman wrote:
> Hi,
>
> I have just had an XFS error occur while deleting some directory
> hierarchy. I hope this is the correct place to report it.

.....
> This is in syslog :
>
> > Oct 6 23:40:33 jeeves kernel: xfs_da_do_buf: bno 16777216
^^^^^^^^^^^^^
> > Oct 6 23:40:33 jeeves kernel: dir: inode 2095141277
> > Oct 6 23:40:33 jeeves kernel: Filesystem "md2": XFS internal error xfs_da_do_buf(1) at line 1994 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff889b2de4

Did you ever run 2.6.17-2.6.17.6? If so, this implies:

http://oss.sgi.com/projects/xfs/faq.html#dir2
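The highlighted block number is not arbitrary. In the version 2 directory format, a directory's logical address space is split into fixed 32 GiB regions (data, then leaf, then free index), and 16777216 is exactly where the free-index region begins when counted in 4 KiB filesystem blocks. That is the same "freespace block 16777216" that xfs_repair reports later in this thread. The sketch below reproduces the arithmetic; the constants are as recalled from the kernel's dir2 headers, and the 4 KiB block size is an assumption because the md2 superblock was not posted.

```python
# Directory v2 layout constants, as recalled from the kernel's xfs_dir2 headers.
XFS_DIR2_DATA_ALIGN_LOG = 3
XFS_DIR2_SPACE_SIZE = 1 << (32 + XFS_DIR2_DATA_ALIGN_LOG)  # 2**35 bytes = 32 GiB per region
XFS_DIR2_FREE_SPACE = 2                                     # regions: 0 = data, 1 = leaf, 2 = free index
XFS_DIR2_FREE_OFFSET = XFS_DIR2_FREE_SPACE * XFS_DIR2_SPACE_SIZE  # byte offset 2**36


def first_free_index_block(fs_block_size):
    """Logical block number (in filesystem blocks) of a directory's first free-index block."""
    return XFS_DIR2_FREE_OFFSET // fs_block_size


print(first_free_index_block(4096))  # 16777216, i.e. 2**24
```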

> I am fairly sure there is nothing I can do about this, but I thought it
> prudent to mention it. Searching turned up some similar issues, but they
> seem related to a previous kernel version and claimed to be fixed in
> subsequent versions.

Yes, but those previous corruptions get left on disk as a landmine
for you to trip over some time later, even on a kernel that has the
bug fixed.

I suggest that you run xfs_check on the filesystem and if that
shows up errors, run xfs_repair on the filesystem to correct them.
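The suggestion amounts to "check first, repair only if needed". A minimal sketch of that sequence, assuming the filesystem is already unmounted and that a clean xfs_check run prints nothing (the behaviour seen later in this thread). By default it runs xfs_repair in its no-modify mode (`-n`) so the report can be read before anything changes; the function names are hypothetical.

```python
import subprocess


def xfs_check_is_clean(check_output):
    """Treat empty xfs_check output as a clean filesystem (behaviour observed in this thread)."""
    return check_output.strip() == ""


def check_then_repair(device, modify=False):
    """Run xfs_check on an UNMOUNTED device; run xfs_repair only if problems are reported.

    With modify=False, xfs_repair runs with -n (no-modify mode) and only reports.
    Returns 0 if the check was clean, otherwise xfs_repair's exit status.
    """
    check = subprocess.run(["xfs_check", device], capture_output=True, text=True)
    report = check.stdout + check.stderr
    if xfs_check_is_clean(report):
        return 0
    print(report)  # show what xfs_check found before repairing
    cmd = ["xfs_repair", device] if modify else ["xfs_repair", "-n", device]
    return subprocess.run(cmd).returncode
```

Typical use: unmount first, call `check_then_repair("/dev/md2")` to review the findings, then `check_then_repair("/dev/md2", modify=True)` to apply the fixes.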

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group

2007-10-08 01:54:50

by Max Waterman

Subject: Re: XFS internal error

David Chinner wrote:
>> 1994 of file fs/xfs/xfs_da_btree.c. Caller 0xffffffff889b2de4
>>
>
> Did you ever run 2.6.17-2.6.17.6?
I guess so, since I've been upgrading steadily since I installed FC6
some time ago.
> If so, this implies:
>
> http://oss.sgi.com/projects/xfs/faq.html#dir2
>
Ah. I did see that, but stopped reading when I saw it was fixed in
later versions... I didn't get to the part saying the filesystem still
needed to be repaired, etc.

>> I am fairly sure there is nothing I can do about this, but I thought it
>> prudent to mention it. Searching turned up some similar issues, but they
>> seem related to a previous kernel version and claimed to be fixed in
>> subsequent versions.
>>
>
> Yes, but those previous corruptions get left on disk as a landmine
> for you to trip over some time later, even on a kernel that has the
> bug fixed.
>
ah, ok.
> I suggest that you run xfs_check on the filesystem and if that
> shows up errors, run xfs_repair on the filesystem to correct them.
>
It did, and I did, and another xfs_check produced no output.

Do I need to do anything else to correct it? xfs_repair produced a whole
bunch of stuff that I don't understand... this is the bit that looks most
significant:

> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
> can't read freespace block 16777216 for directory inode 2095141277
> rebuilding directory inode 2095141277
> free block 16777216 for directory inode 2100841732 bad nused
> rebuilding directory inode 2100841732
> free block 16777216 for directory inode 2102199514 bad nused
> rebuilding directory inode 2102199514
> free block 16777216 for directory inode 2102200124 bad nused
> rebuilding directory inode 2102200124
> free block 16777216 for directory inode 2102905843 bad nused
> rebuilding directory inode 2102905843
> free block 16777216 for directory inode 3277510927 bad nused
> rebuilding directory inode 3277510927
> free block 16777216 for directory inode 3277524487 bad nused
> rebuilding directory inode 3277524487
> free block 16777216 for directory inode 3379886019 bad nused
> rebuilding directory inode 3379886019
> - traversal finished ...
> - moving disconnected inodes to lost+found ...
That last line looks suspicious... Furthermore, when I mount the
filesystem, I don't see a 'lost+found' directory (which I've been used
to seeing on IRIX). Ah, perhaps the '...' with *nothing* after it means
it didn't move anything. Am I right?
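To see which directories were rebuilt, the inode numbers can be pulled out of the repair output and mapped back to path names with `find <mountpoint> -xdev -inum <N>` once the filesystem is mounted again. A sketch of the extraction step, assuming the xfs_repair output was saved as text (the helper name is illustrative):

```python
import re

# Matches the "rebuilding directory inode N" lines printed in phase 6.
_REBUILT_RE = re.compile(r"rebuilding directory inode (\d+)")


def rebuilt_directory_inodes(repair_output):
    """Return the directory inode numbers xfs_repair reports rebuilding, in order."""
    return [int(m.group(1)) for m in _REBUILT_RE.finditer(repair_output)]


sample = """\
can't read freespace block 16777216 for directory inode 2095141277
rebuilding directory inode 2095141277
free block 16777216 for directory inode 2100841732 bad nused
rebuilding directory inode 2100841732
"""
print(rebuilt_directory_inodes(sample))  # [2095141277, 2100841732]
```

Applied to the full phase 6 output above, this yields eight directories; checking each one's contents is a quick sanity pass after the repair.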

Max.

2007-10-08 02:32:39

by Barry Naujok

Subject: Re: XFS internal error

On Mon, 08 Oct 2007 11:54:28 +1000, Max Waterman
<[email protected]> wrote:

> David Chinner wrote:
>> I suggest that you run xfs_check on the filesystem and if that
>> shows up errors, run xfs_repair on the filesystem to correct them.
>>
> It did, and I did, and another xfs_check produced no output.
>
> Do I need to do anything else to correct it? xfs_repair produced a whole
> bunch of stuff that I don't understand...this is the bit that looks most
> significant :
>
>> Phase 6 - check inode connectivity...
>> - resetting contents of realtime bitmap and summary inodes
>> - traversing filesystem ...
>> can't read freespace block 16777216 for directory inode 2095141277
>> rebuilding directory inode 2095141277
>> free block 16777216 for directory inode 2100841732 bad nused
>> rebuilding directory inode 2100841732
>> free block 16777216 for directory inode 2102199514 bad nused
>> rebuilding directory inode 2102199514
>> free block 16777216 for directory inode 2102200124 bad nused
>> rebuilding directory inode 2102200124
>> free block 16777216 for directory inode 2102905843 bad nused
>> rebuilding directory inode 2102905843
>> free block 16777216 for directory inode 3277510927 bad nused
>> rebuilding directory inode 3277510927
>> free block 16777216 for directory inode 3277524487 bad nused
>> rebuilding directory inode 3277524487
>> free block 16777216 for directory inode 3379886019 bad nused
>> rebuilding directory inode 3379886019
>> - traversal finished ...
>> - moving disconnected inodes to lost+found ...
> That last line looks suspicious...furthermore, when I mount the
> filesystem, I don't see a 'lost+found' directory (which I've been used
> to seeing on IRIX). Ah, perhaps the '...' with *nothing* after it means
> it didn't do any moving. Am I right?

Yes, the latest xfs_repair doesn't create a lost+found unless it
needs to, and if it does so, it will list the inodes moved there.

So, in your case, nothing went to lost+found.

Regards,
Barry.

2007-10-08 02:48:39

by Max Waterman

Subject: Re: XFS internal error

Barry Naujok wrote:
> Yes, the latest xfs_repair doesn't create a lost+found unless it
> needs to, and if it does so, it will list the inodes moved there.
>
> So, in your case, nothing went to lost+found.
>
> Regards,
> Barry.
Great. Thanks a lot for your help :)

Max.

PS. I'm still missing working at SGI :|

2008-03-10 12:56:18

by Andreas Kotes

Subject: Re: XFS internal error

Hello,

* David Chinner <[email protected]> [20080310 13:18]:
> Yes, but those previous corruptions get left on disk as a landmine
> for you to trip over some time later, even on a kernel that has the
> bug fixed.
>
> I suggest that you run xfs_check on the filesystem and if that
> shows up errors, run xfs_repair on the filesystem to correct them.

I seem to be having similar problems, and xfs_repair is not helping :(

I always run into:

[ 137.099267] Filesystem "sda2": XFS internal error xfs_trans_cancel at line 1132 of file fs/xfs/xfs_trans.c. Caller 0xffffffff80372156
[ 137.106267]
[ 137.106268] Call Trace:
[ 137.113129] [<ffffffff803692f0>] xfs_trans_cancel+0x100/0x130
[ 137.116524] [<ffffffff80372156>] xfs_create+0x256/0x6e0
[ 137.119904] [<ffffffff80341e09>] xfs_dir2_isleaf+0x19/0x50
[ 137.123269] [<ffffffff8037e145>] xfs_vn_mknod+0x195/0x250
[ 137.126607] [<ffffffff8028f32c>] vfs_create+0xac/0xf0
[ 137.129920] [<ffffffff80292b3c>] open_namei+0x5dc/0x700
[ 137.133227] [<ffffffff8022a443>] __wake_up+0x43/0x70
[ 137.136477] [<ffffffff802851bc>] do_filp_open+0x1c/0x50
[ 137.139693] [<ffffffff8028524a>] do_sys_open+0x5a/0x100
[ 137.142838] [<ffffffff80220a83>] sysenter_do_call+0x1b/0x67
[ 137.145964]
[ 137.149014] xfs_force_shutdown(sda2,0x8) called from line 1133 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff8036930e
[ 137.163485] Filesystem "sda2": Corruption of in-memory data detected. Shutting down filesystem: sda2

directly after booting.

I'm using kernel 2.6.22.16 and xfs_repair version 2.9.7

How can I help find the problem? I'd like xfs_repair to be able to
fix this.

Br,

Andreas

--
flatline IT services - Andreas Kotes - Tailored solutions for your IT needs

2008-03-10 22:30:56

by David Chinner

Subject: Re: XFS internal error

On Mon, Mar 10, 2008 at 01:22:16PM +0100, Andreas Kotes wrote:
> Hello,
>
> * David Chinner <[email protected]> [20080310 13:18]:
> > Yes, but those previous corruptions get left on disk as a landmine
> > for you to trip over some time later, even on a kernel that has the
> > bug fixed.
> >
> > I suggest that you run xfs_check on the filesystem and if that
> > shows up errors, run xfs_repair on the filesystem to correct them.
>
> I seem to be having similar problems, and xfs_repair is not helping :(

xfs_repair is ensuring that the problem is not being caused by on-disk
corruption. In this case, it does not appear to be caused by on-disk
corruption, so xfs_repair won't help.

> I always run into:
>
> [ 137.099267] Filesystem "sda2": XFS internal error xfs_trans_cancel at line 1132 of file fs/xfs/xfs_trans.c. Caller 0xffffffff80372156
> [ 137.106267]
> [ 137.106268] Call Trace:
> [ 137.113129] [<ffffffff803692f0>] xfs_trans_cancel+0x100/0x130
> [ 137.116524] [<ffffffff80372156>] xfs_create+0x256/0x6e0
> [ 137.119904] [<ffffffff80341e09>] xfs_dir2_isleaf+0x19/0x50
> [ 137.123269] [<ffffffff8037e145>] xfs_vn_mknod+0x195/0x250
> [ 137.126607] [<ffffffff8028f32c>] vfs_create+0xac/0xf0
> [ 137.129920] [<ffffffff80292b3c>] open_namei+0x5dc/0x700
> [ 137.133227] [<ffffffff8022a443>] __wake_up+0x43/0x70
> [ 137.136477] [<ffffffff802851bc>] do_filp_open+0x1c/0x50
> [ 137.139693] [<ffffffff8028524a>] do_sys_open+0x5a/0x100
> [ 137.142838] [<ffffffff80220a83>] sysenter_do_call+0x1b/0x67
> [ 137.145964]
> [ 137.149014] xfs_force_shutdown(sda2,0x8) called from line 1133 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff8036930e
> [ 137.163485] Filesystem "sda2": Corruption of in-memory data detected. Shutting down filesystem: sda2
>
> directly after booting.

Interesting. I think I just found a cause of this shutdown under
certain circumstances:

http://marc.info/?l=linux-xfs&m=120518791828200&w=2

To confirm it might be the same issue, can you dump the superblock of this
filesystem for me? i.e.:

# xfs_db -r -c 'sb 0' -c p /dev/sda2

Also, what mount options are you using?

Cheers,

Dave.

2008-03-10 23:36:26

by Andreas Kotes

Subject: Re: XFS internal error

Hello Dave,

* David Chinner <[email protected]> [20080310 23:30]:
> On Mon, Mar 10, 2008 at 01:22:16PM +0100, Andreas Kotes wrote:
> > * David Chinner <[email protected]> [20080310 13:18]:
> > > Yes, but those previous corruptions get left on disk as a landmine
> > > for you to trip over some time later, even on a kernel that has the
> > > bug fixed.
> > >
> > > I suggest that you run xfs_check on the filesystem and if that
> > > shows up errors, run xfs_repair on the filesystem to correct them.
> >
> > I seem to be having similar problems, and xfs_repair is not helping :(
>
> xfs_repair is ensuring that the problem is not being caused by on-disk
> corruption. In this case, it does not appear to be caused by on-disk
> corruption, so xfs_repair won't help.

ok, too bad - btw, is it a problem that I'm doing the xfs_repair on a
mounted filesystem with xfs_repair -f -L after a remount rw?

> > I always run into:
> >
> > [ 137.099267] Filesystem "sda2": XFS internal error xfs_trans_cancel at line 1132 of file fs/xfs/xfs_trans.c. Caller 0xffffffff80372156
> > [ 137.106267]
> > [ 137.106268] Call Trace:
> > [ 137.113129] [<ffffffff803692f0>] xfs_trans_cancel+0x100/0x130
> > [ 137.116524] [<ffffffff80372156>] xfs_create+0x256/0x6e0
> > [ 137.119904] [<ffffffff80341e09>] xfs_dir2_isleaf+0x19/0x50
> > [ 137.123269] [<ffffffff8037e145>] xfs_vn_mknod+0x195/0x250
> > [ 137.126607] [<ffffffff8028f32c>] vfs_create+0xac/0xf0
> > [ 137.129920] [<ffffffff80292b3c>] open_namei+0x5dc/0x700
> > [ 137.133227] [<ffffffff8022a443>] __wake_up+0x43/0x70
> > [ 137.136477] [<ffffffff802851bc>] do_filp_open+0x1c/0x50
> > [ 137.139693] [<ffffffff8028524a>] do_sys_open+0x5a/0x100
> > [ 137.142838] [<ffffffff80220a83>] sysenter_do_call+0x1b/0x67
> > [ 137.145964]
> > [ 137.149014] xfs_force_shutdown(sda2,0x8) called from line 1133 of file fs/xfs/xfs_trans.c. Return address = 0xffffffff8036930e
> > [ 137.163485] Filesystem "sda2": Corruption of in-memory data detected. Shutting down filesystem: sda2
> >
> > directly after booting.
>
> Interesting. I think I just found a cause of this shutdown under
> certain circumstances:
>
> http://marc.info/?l=linux-xfs&m=120518791828200&w=2
>
> To confirm it might be the same issue, can you dump the superblock of this
> filesystem for me? i.e.:
>
> # xfs_db -r -c 'sb 0' -c p /dev/sda2

certainly:

magicnum = 0x58465342
blocksize = 4096
dblocks = 35613152
rblocks = 0
rextents = 0
uuid = 62dae5fa-4085-4edc-ad76-5652d9fb00ae
logstart = 33554436
rootino = 128
rbmino = 129
rsumino = 130
rextsize = 1
agblocks = 2225822
agcount = 16
rbmblocks = 0
logblocks = 17389
versionnum = 0x3084
sectsize = 512
inodesize = 256
inopblock = 16
fname = "s2g-serv\000\000\000\000"
blocklog = 12
sectlog = 9
inodelog = 8
inopblog = 4
agblklog = 22
rextslog = 0
inprogress = 0
imax_pct = 25
icount = 15232
ifree = 2379
fdblocks = 5942436
frextents = 0
uquotino = 0
gquotino = 0
qflags = 0
flags = 0
shared_vn = 0
inoalignmt = 2
unit = 0
width = 0
dirblklog = 0
logsectlog = 0
logsectsize = 0
logsunit = 0
features2 = 0

> Also, what mount options are you using?

rw,noatime ...

if you want more info, just let me know :)

Kind regards from Berlin,

Andreas

2008-03-10 23:46:00

by David Chinner

Subject: Re: XFS internal error

On Mon, Mar 10, 2008 at 11:59:27PM +0100, Andreas Kotes wrote:
> * David Chinner <[email protected]> [20080310 23:30]:
> > On Mon, Mar 10, 2008 at 01:22:16PM +0100, Andreas Kotes wrote:
> > > * David Chinner <[email protected]> [20080310 13:18]:
> > > > Yes, but those previous corruptions get left on disk as a landmine
> > > > for you to trip over some time later, even on a kernel that has the
> > > > bug fixed.
> > > >
> > > > I suggest that you run xfs_check on the filesystem and if that
> > > > shows up errors, run xfs_repair on the filesystem to correct them.
> > >
> > > I seem to be having similar problems, and xfs_repair is not helping :(
> >
> > xfs_repair is ensuring that the problem is not being caused by on-disk
> > corruption. In this case, it does not appear to be caused by on-disk
> > corruption, so xfs_repair won't help.
>
> ok, too bad - btw, is it a problem that I'm doing the xfs_repair on a
> mounted filesystem with xfs_repair -f -L after a remount rw?

If it was read only, and you rebooted immediately afterwards, you'd
probably be ok. Doing this to a mounted, rw filesystem is asking
for trouble. If the shutdown is occurring after you've run xfs_repair,
then it is almost certainly the cause....
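Before any repair it is worth confirming that the device really is unmounted. A minimal guard, assuming the usual Linux `/proc/mounts` layout (device, mount point, filesystem type, options, ...). A device can appear there under another name (a symlink or `/dev/root`, for example), so this plain string comparison is only a first-line check; the helper names are hypothetical.

```python
def mount_options(device, mounts_text):
    """Return the option list for `device` in /proc/mounts-style text, or None if absent."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == device:
            return fields[3].split(",")
    return None


def safe_to_repair(device, mounts_text):
    """Only a filesystem that is not mounted at all (not even ro) is treated as safe."""
    return mount_options(device, mounts_text) is None


def read_proc_mounts(path="/proc/mounts"):
    """Read the kernel's current mount table."""
    with open(path) as f:
        return f.read()


sample = "/dev/sda2 /data xfs rw,noatime 0 0\nproc /proc proc rw 0 0\n"
print(mount_options("/dev/sda2", sample))   # ['rw', 'noatime']
print(safe_to_repair("/dev/sda2", sample))  # False
```

On a live system the call would be `safe_to_repair("/dev/sda2", read_proc_mounts())`. A read-only mount is still reported as unsafe here, which matches the advice to repair from a rescue environment instead.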

I'd suggest getting a knoppix (or similar) rescue disk and repairing
from that, rebooting and seeing if the problem persists. If it
does, then we'll have to look further into it.

FWIW, you've got plenty of free inodes so this does not look
to be the same problem I've just found.
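The free-inode figure comes straight from the superblock fields `icount` (inodes allocated) and `ifree` (allocated but unused). A small sketch that parses the `xfs_db ... -c p` output ("field = value" lines) and reports those numbers alongside overall space. The helper names are hypothetical, and the summary is only arithmetic on the posted fields, not necessarily the test that was applied here.

```python
def parse_xfs_db_print(text):
    """Parse "field = value" lines from xfs_db's print command into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize_sb(sb):
    """Derive size and inode usage figures from superblock fields."""
    blocksize = int(sb["blocksize"])
    icount, ifree = int(sb["icount"]), int(sb["ifree"])
    return {
        "size_gib": int(sb["dblocks"]) * blocksize / 2**30,
        "free_gib": int(sb["fdblocks"]) * blocksize / 2**30,
        "inodes_allocated": icount,
        "inodes_free": ifree,
        "inodes_free_pct": 100.0 * ifree / icount if icount else 0.0,
    }


# A subset of the superblock posted above.
sb_text = """\
blocksize = 4096
dblocks = 35613152
icount = 15232
ifree = 2379
fdblocks = 5942436
"""
summary = summarize_sb(parse_xfs_db_print(sb_text))
print(round(summary["inodes_free_pct"], 1))  # 15.6
print(round(summary["free_gib"], 2))         # 22.67
```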

Cheers,

Dave.

2008-03-11 13:48:00

by Andreas Kotes

Subject: Re: XFS internal error

Hello,

* David Chinner <[email protected]> [20080311 00:45]:
> On Mon, Mar 10, 2008 at 11:59:27PM +0100, Andreas Kotes wrote:
> > * David Chinner <[email protected]> [20080310 23:30]:
> > > On Mon, Mar 10, 2008 at 01:22:16PM +0100, Andreas Kotes wrote:
> > > > * David Chinner <[email protected]> [20080310 13:18]:
> > > > > Yes, but those previous corruptions get left on disk as a landmine
> > > > > for you to trip over some time later, even on a kernel that has the
> > > > > bug fixed.
> > > > >
> > > > > I suggest that you run xfs_check on the filesystem and if that
> > > > > shows up errors, run xfs_repair on the filesystem to correct them.
> > > >
> > > > I seem to be having similar problems, and xfs_repair is not helping :(
> > >
> > > xfs_repair is ensuring that the problem is not being caused by on-disk
> > > corruption. In this case, it does not appear to be caused by on-disk
> > > corruption, so xfs_repair won't help.
> >
> > ok, too bad - btw, is it a problem that I'm doing the xfs_repair on a
> > mounted filesystem with xfs_repair -f -L after a remount rw?
>
> If it was read only, and you rebooted immediately afterwards, you'd
> probably be ok. Doing this to a mounted, rw filesystem is asking
> for trouble. If the shutdown is occurring after you've run xfs_repair,
> then it is almost certainly the cause....

whoops, that should have read 'remount ro'... xfs_repair on a live and
writable filesystem is of course inviting disaster. I was trying it read
only. By the way, the system as such is booted via PXE and runs completely
out of an initrd, using the HDD just for local data storage, so not much
happens on shutdown/reboot either way.

> I'd suggest getting a knoppix (or similar) rescue disk and repairing
> from that, rebooting and seeing if the problem persists. If it
> does, then we'll have to look further into it.

I basically built a PXE image that runs xfs_repair -L /dev/sda2 from the
initrd, and the problem persists. Sigh. Exactly no change.

> FWIW, you've got plenty of free inodes so this does not look
> to be the same problem I've just found.

okay ... it happens on several of the dozens of machines I'm running
this way, but not on others - I have yet to find the difference.

what can I do to help find the problem?

Andreas
