2007-06-08 13:59:59

by Marco Berizzi

Subject: Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd

David Chinner wrote:

> > Jun 6 09:47:09 Pleiadi kernel: =======================
> > Jun 6 09:47:09 Pleiadi kernel: 0x0: 28 f1 45 d4 22 53 35 11 09 80 37 5a 47 8a 22 ee
> > Jun 6 09:47:09 Pleiadi kernel: Filesystem "sda8": XFS internal error xfs_da_do_buf(2) at line 2086 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b2301
> > Jun 6 09:47:09 Pleiadi kernel: [<c01b21f7>] xfs_da_do_buf+0x70c/0x7b1
> > Jun 6 09:47:09 Pleiadi kernel: [<c01b2301>] xfs_da_read_buf+0x30/0x35
> > Jun 6 09:47:09 Pleiadi kernel: [<c01b2301>] xfs_da_read_buf+0x30/0x35
>
> The stack trace above is the sign of a corrupted directory.
>
> Chopping out the rest of the top posting (please don't do that)

apologies

> we get down to 3 months ago:
>
> > > On Mon, Mar 19, 2007 at 11:32:27AM +0100, Marco Berizzi wrote:
> > > > Marco Berizzi wrote:
> > > > Here is the relevant results:
> > > >
> > > > Phase 2 - found root inode chunk
> > > > Phase 3 - ...
> > > > agno = 0
> > > > ...
> > > > agno = 12
> > > > LEAFN node level is 1 inode 1610612918 bno = 8388608
> > >
> > > Hmmm - single bit error in the bno - that reminds me of this:
> > >
> > > http://oss.sgi.com/projects/xfs/faq.html#dir2
> > >
> > > So I'd definitely make sure that is repaired....
>
> Where we saw signs of on disk directory corruption. Have you run
> xfs_repair successfully on the filesystem since you reported
> this?

yes.

> If you did clean up the error, does xfs_repair report the same sort
> of error again?

I have run xfs_repair this morning.
Here is the report:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done

> Have you run a 2.6.16-rcX or 2.6.17.[0-6] kernel since you last
> reported this problem?

No. I have run only 2.6.19.x and 2.6.21.x

After the xfs_repair run I remounted the file system.
After a few hours Linux crashed with this message:
BUG: at arch/i386/kernel/smp.c:546 smp_call_function()
I also have a bitmap image of the monitor output.
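
A side note on the "single bit error in the bno" remark quoted above: the block number from the LEAFN line, 8388608, has exactly one bit set (it equals 2^23). A minimal sketch of that check in Python follows; the helper name is hypothetical and not part of any XFS tool, and the check only shows that one bit is set, not that a bit actually flipped.

```python
def is_single_bit(value: int) -> bool:
    """Return True when exactly one bit is set in value (hypothetical helper)."""
    return value > 0 and (value & (value - 1)) == 0


# Block number taken from the quoted xfs_repair line:
#   LEAFN node level is 1 inode 1610612918 bno = 8388608
bno = 8388608

print(is_single_bit(bno))    # exactly one bit set?
print(bno.bit_length() - 1)  # index of the highest set bit
```

For this value the first line prints True and the second prints 23, since 8388608 == 1 << 23.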

2007-06-10 07:05:20

by Satyam Sharma

Subject: Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd

Hi,

On 6/8/07, Marco Berizzi wrote:
> After a few hours Linux crashed with this message:
> BUG: at arch/i386/kernel/smp.c:546 smp_call_function()

Which kernel (exactly) was this, and does this occur
reproducibly? Also, could you please send the dmesg,
stack trace, etc for when this happened?

Satyam

2007-06-12 06:15:10

by David Chinner

Subject: Re: XFS internal error xfs_da_do_buf(2) at line 2087 of file fs/xfs/xfs_da_btree.c. Caller 0xc01b00bd

On Fri, Jun 08, 2007 at 03:59:39PM +0200, Marco Berizzi wrote:
> David Chinner wrote:
> > Where we saw signs of on disk directory corruption. Have you run
> > xfs_repair successfully on the filesystem since you reported
> > this?
>
> yes.
>
> > If you did clean up the error, does xfs_repair report the same sort
> > of error again?
>
> I have run xfs_repair this morning.
> Here is the report:

<reports no on disk errors>

> > Have you run a 2.6.16-rcX or 2.6.17.[0-6] kernel since you last
> > reported this problem?
>
> No. I have run only 2.6.19.x and 2.6.21.x
>
> After the xfs_repair run I remounted the file system.
> After a few hours Linux crashed with this message:
> BUG: at arch/i386/kernel/smp.c:546 smp_call_function()
> I have also the monitor bitmap.

This is sounding like memory corruption if no corruption is being
found on disk by xfs_repair. Have you run memtest86 on that box to
see if it's got bad memory?

Cheers

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group