2009-01-21 22:57:20

by Lars Noschinski

Subject: XFS error & call trace

Hello!

I just had a crash of my XFS file system on 2.6.29-rc1 (syslog output
follows). After this, the file system on /home returned -EIO on most (all?)
operations. After rebooting from a CD-ROM, I was interested to see that
xfs_check found no errors on this partition. Instead, the XFS file system
on the / partition had a lot of errors.

Below are the call traces.

Jan 19 10:01:29 vertikal kernel: Pid: 3514, comm: firefox-bin Not tainted 2.6.29-rc1 #5
Jan 19 10:01:29 vertikal kernel: Call Trace:
Jan 19 10:01:29 vertikal kernel: [<c01f3e0a>] xfs_error_report+0x2c/0x2e
Jan 19 10:01:29 vertikal kernel: [<c01e863d>] xfs_btree_delrec+0x58a/0xa2c
Jan 19 10:01:29 vertikal kernel: [<c01e8b06>] ? xfs_btree_delete+0x27/0x70
Jan 19 10:01:29 vertikal kernel: [<c01e556f>] ? xfs_lookup_get_search_key+0x26/0x39
Jan 19 10:01:29 vertikal kernel: [<c01e72d3>] ? xfs_btree_lookup+0x145/0x2c0
Jan 19 10:01:29 vertikal kernel: [<c01e8b06>] xfs_btree_delete+0x27/0x70
Jan 19 10:01:29 vertikal kernel: [<c01e19a9>] xfs_bmap_del_extent+0x390/0x99b
Jan 19 10:01:29 vertikal kernel: [<c020ff2d>] ? kmem_zone_alloc+0x4d/0x90
Jan 19 10:01:29 vertikal kernel: [<c020ff2d>] ? kmem_zone_alloc+0x4d/0x90
Jan 19 10:01:29 vertikal kernel: [<c01e2965>] xfs_bunmapi+0x882/0xbe8
Jan 19 10:01:29 vertikal kernel: [<c020b377>] ? xfs_trans_unlock_items+0x3b/0xa3
Jan 19 10:01:29 vertikal kernel: [<c01fa16c>] xfs_itruncate_finish+0x1a9/0x2c9
Jan 19 10:01:29 vertikal kernel: [<c020e686>] xfs_inactive+0x1d8/0x3c7
Jan 19 10:01:29 vertikal kernel: [<c017f7c0>] ? inotify_inode_is_dead+0x71/0x79
Jan 19 10:01:29 vertikal kernel: [<c0217353>] xfs_fs_clear_inode+0x1f/0x21
Jan 19 10:01:29 vertikal kernel: [<c016e50c>] clear_inode+0x6f/0xbe
Jan 19 10:01:29 vertikal kernel: [<c016e9db>] generic_delete_inode+0x76/0xbd
Jan 19 10:01:29 vertikal kernel: [<c016ea34>] generic_drop_inode+0x12/0x117
Jan 19 10:01:29 vertikal kernel: [<c016e0b0>] iput+0x4b/0x4e
Jan 19 10:01:29 vertikal kernel: [<c016893e>] do_unlinkat+0xb9/0xfd
Jan 19 10:01:29 vertikal kernel: [<c0168992>] sys_unlink+0x10/0x12
Jan 19 10:01:29 vertikal kernel: [<c0102cee>] syscall_call+0x7/0xb
Jan 19 10:01:29 vertikal kernel: Pid: 3514, comm: firefox-bin Not tainted 2.6.29-rc1 #5
Jan 19 10:01:29 vertikal kernel: Call Trace:
Jan 19 10:01:29 vertikal kernel: [<c01f3e0a>] xfs_error_report+0x2c/0x2e
Jan 19 10:01:29 vertikal kernel: [<c02095b0>] xfs_trans_cancel+0x4b/0xd5
Jan 19 10:01:29 vertikal kernel: [<c020e69e>] ? xfs_inactive+0x1f0/0x3c7
Jan 19 10:01:29 vertikal kernel: [<c020e69e>] xfs_inactive+0x1f0/0x3c7
Jan 19 10:01:29 vertikal kernel: [<c017f7c0>] ? inotify_inode_is_dead+0x71/0x79
Jan 19 10:01:29 vertikal kernel: [<c0217353>] xfs_fs_clear_inode+0x1f/0x21
Jan 19 10:01:29 vertikal kernel: [<c016e50c>] clear_inode+0x6f/0xbe
Jan 19 10:01:29 vertikal kernel: [<c016e9db>] generic_delete_inode+0x76/0xbd
Jan 19 10:01:29 vertikal kernel: [<c016ea34>] generic_drop_inode+0x12/0x117
Jan 19 10:01:29 vertikal kernel: [<c016e0b0>] iput+0x4b/0x4e
Jan 19 10:01:29 vertikal kernel: [<c016893e>] do_unlinkat+0xb9/0xfd
Jan 19 10:01:29 vertikal kernel: [<c0168992>] sys_unlink+0x10/0x12
Jan 19 10:01:29 vertikal kernel: [<c0102cee>] syscall_call+0x7/0xb
Jan 19 10:01:29 vertikal kernel: xfs_force_shutdown(dm-3,0x8) called from line 1165 of file fs/xfs/xfs_trans.c. Return address = 0xc02095c6
Jan 19 10:01:51 vertikal kernel: Filesystem "dm-3": xfs_log_force: error 5 returned.
Jan 19 10:02:21 vertikal kernel: Filesystem "dm-3": xfs_log_force: error 5 returned.
Jan 19 10:03:21 vertikal last message repeated 2 times


In addition to this, over the last month I twice had a few truncated files
after a reboot (shell history, firefox configuration, mutt's newsrc, ...),
even though the shutdown was a clean one. The kernel was 2.6.26-git
(a26929fb489188ff959b1715ee67f0c9f84405b5, if I'm not mistaken).
This filesystem is running on top of dm-crypt. Can a volume group that is
not deactivated on shutdown cause such a failure? (The volume group cannot
be deactivated, because it also contains the root volume.)


- Lars


2009-01-21 23:14:45

by Christoph Hellwig

Subject: Re: XFS error & call trace

On Wed, Jan 21, 2009 at 11:27:03PM +0100, Lars Noschinski wrote:
> Hello!
>
> I just had a crash of my XFS file system on 2.6.29-rc1 (syslog output
> follows) After this, the file system on /home returned -EIO on most (all?)
> operations. After a reboot from cdrom, I found it interesting to see that
> xfs_check found no error on this partition. Instead, the XFS on the /
> partition had a lot of errors.

The patch below from Dave Chinner fixes it:

---

[XFS] Long btree pointers are still 64 bit on disk

On 32 bit machines with CONFIG_LBD=n, XFS reduces the
in memory size of xfs_fsblock_t to 32 bits so that it
will fit within 32 bit addressing. However, the disk format
for long btree pointers is still 64 bits in size.

The recent btree rewrite failed to take this into account
when initialising new btree blocks, setting sibling pointers
to NULL and checking if they are NULL. Hence checking whether
a 64 bit NULL was the same as a 32 bit NULL was failing,
resulting in NULL sibling pointers failing to be detected
correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns
in xfs_btree_delrec.

Fix this by making all the comparisons and setting of long
pointer btree NULL blocks to the disk format, not the
in memory format. i.e. use NULLDFSBNO.

Reported-by: Alexander Beregalov
Reported-by: Jacek Luczak
Reported-by: Danny ter Haar
Signed-off-by: Dave Chinner

---
fs/xfs/xfs_btree.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_btree.c b/fs/xfs/xfs_btree.c
index 2c3ef20..6bc2136 100644
--- a/fs/xfs/xfs_btree.c
+++ b/fs/xfs/xfs_btree.c
@@ -843,7 +843,7 @@ xfs_btree_ptr_is_null(
 	union xfs_btree_ptr *ptr)
 {
 	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
-		return be64_to_cpu(ptr->l) == NULLFSBLOCK;
+		return be64_to_cpu(ptr->l) == NULLDFSBNO;
 	else
 		return be32_to_cpu(ptr->s) == NULLAGBLOCK;
 }
@@ -854,7 +854,7 @@ xfs_btree_set_ptr_null(
 	union xfs_btree_ptr *ptr)
 {
 	if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
-		ptr->l = cpu_to_be64(NULLFSBLOCK);
+		ptr->l = cpu_to_be64(NULLDFSBNO);
 	else
 		ptr->s = cpu_to_be32(NULLAGBLOCK);
 }
@@ -918,8 +918,8 @@ xfs_btree_init_block(
 	new->bb_numrecs = cpu_to_be16(numrecs);
 
 	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
-		new->bb_u.l.bb_leftsib = cpu_to_be64(NULLFSBLOCK);
-		new->bb_u.l.bb_rightsib = cpu_to_be64(NULLFSBLOCK);
+		new->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
+		new->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
 	} else {
 		new->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK);
 		new->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK);
@@ -971,7 +971,7 @@ xfs_btree_ptr_to_daddr(
 	union xfs_btree_ptr *ptr)
 {
 	if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
-		ASSERT(be64_to_cpu(ptr->l) != NULLFSBLOCK);
+		ASSERT(be64_to_cpu(ptr->l) != NULLDFSBNO);
 
 		return XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
 	} else {

_______________________________________________
xfs mailing list
xfs@oss.sgi.com
http://oss.sgi.com/mailman/listinfo/xfs

2009-01-23 05:51:22

by Tino Keitel

Subject: Re: XFS error & call trace

On Wed, Jan 21, 2009 at 18:14:35 -0500, Christoph Hellwig wrote:
> On Wed, Jan 21, 2009 at 11:27:03PM +0100, Lars Noschinski wrote:
> > Hello!
> >
> > I just had a crash of my XFS file system on 2.6.29-rc1 (syslog output
> > follows) After this, the file system on /home returned -EIO on most (all?)
> > operations. After a reboot from cdrom, I found it interesting to see that
> > xfs_check found no error on this partition. Instead, the XFS on the /
> > partition had a lot of errors.
>
> The patch below from Dave Chinner fixes it:
>
> ---
>
> [XFS] Long btree pointers are still 64 bit on disk
>
> On 32 bit machines with CONFIG_LBD=n, XFS reduces the
> in memory size of xfs_fsblock_t to 32 bits so that it
> will fit within 32 bit addressing. However, the disk format
> for long btree pointers are still 64 bits in size.
>
> The recent btree rewrite failed to take this into account

Hi,

does this commit also fix http://lkml.org/lkml/2009/1/7/324 ?

Regards,
Tino

2009-01-24 03:23:51

by Christoph Hellwig

Subject: Re: XFS error & call trace

On Fri, Jan 23, 2009 at 06:51:09AM +0100, Tino Keitel wrote:
> does this commit also fix http://lkml.org/lkml/2009/1/7/324 ?

Yes, it should.