2009-01-19 11:48:24

by Jacek Luczak

Subject: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO


Hi All,

I've run into an XFS issue/bug. Yesterday I compiled 2.6.29-rc2 and didn't
find any errors. Today I booted my notebook and the XFS bug occurred.
Rebooting the system didn't help; the same error appeared.

Some info:
[1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
[2] kernel logs:
http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
[3] most interesting part of log below.

Regards,
-Jacek

----------- BUG START HERE -----------
Jan 19 11:18:32 difrost kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at
line 3327 of file fs/xfs/xfs_btree.c. Caller 0xc01b047c
Jan 19 11:18:32 difrost kernel: Pid: 680, comm: mount Not tainted 2.6.29-rc2 #1
Jan 19 11:18:32 difrost kernel: Call Trace:
Jan 19 11:18:32 difrost kernel: [<c01afef0>] xfs_btree_delrec+0x657/0xbc2
Jan 19 11:18:32 difrost kernel: [<c01b047c>] xfs_btree_delete+0x21/0x66
Jan 19 11:18:32 difrost kernel: [<c01ac0b0>] xfs_bmbt_init_key_from_rec+0xa/0x16
Jan 19 11:18:32 difrost kernel: [<c01b047c>] xfs_btree_delete+0x21/0x66
Jan 19 11:18:32 difrost kernel: [<c01a8de3>] xfs_bmap_del_extent+0x309/0x974
Jan 19 11:18:32 difrost kernel: [<c01d8d33>] kmem_zone_alloc+0x53/0x90
Jan 19 11:18:32 difrost kernel: [<c01a9b28>] xfs_bunmapi+0x59c/0x95d
Jan 19 11:18:32 difrost kernel: [<c01c1ac1>] xfs_itruncate_finish+0x1c7/0x2f3
Jan 19 11:18:32 difrost kernel: [<c01d742b>] xfs_inactive+0x1d2/0x3ce
Jan 19 11:18:32 difrost kernel: [<c01c23b5>] xfs_imap_to_bp+0x5d/0xcb
Jan 19 11:18:32 difrost kernel: [<c0171bc4>] clear_inode+0x6c/0xb8
Jan 19 11:18:32 difrost kernel: [<c01720f1>] generic_delete_inode+0x72/0xcc
Jan 19 11:18:32 difrost kernel: [<c0171703>] iput+0x48/0x4a
Jan 19 11:18:32 difrost kernel: [<c01cba97>]
xlog_recover_process_one_iunlink+0xb0/0xda
Jan 19 11:18:32 difrost kernel: [<c01cbb38>]
xlog_recover_process_iunlinks+0x77/0xd8
Jan 19 11:18:32 difrost kernel: [<c01cbbd8>] xlog_recover_finish+0x3f/0x8d
Jan 19 11:18:32 difrost kernel: [<c01cff74>] xfs_mountfs+0x44e/0x54b
Jan 19 11:18:32 difrost kernel: [<c01d8e3a>] kmem_alloc+0x57/0xa8
Jan 19 11:18:32 difrost kernel: [<c01d06e9>] xfs_mru_cache_create+0xe6/0x11c
Jan 19 11:18:32 difrost kernel: [<c01e1616>] xfs_fs_fill_super+0x182/0x2d8
Jan 19 11:18:32 difrost kernel: [<c0165a80>] get_sb_bdev+0xe8/0x130
Jan 19 11:18:32 difrost kernel: [<c01750e6>] alloc_vfsmnt+0x69/0xf5
Jan 19 11:18:32 difrost kernel: [<c01dfe39>] xfs_fs_get_sb+0x12/0x16
Jan 19 11:18:32 difrost kernel: [<c01e1494>] xfs_fs_fill_super+0x0/0x2d8
Jan 19 11:18:32 difrost kernel: [<c0164cca>] vfs_kern_mount+0x39/0x72
Jan 19 11:18:32 difrost kernel: [<c0164d41>] do_kern_mount+0x2f/0xb4
Jan 19 11:18:32 difrost kernel: [<c0175c6a>] do_mount+0x632/0x66d
Jan 19 11:18:32 difrost kernel: [<c0175d14>] sys_mount+0x6f/0xaf
Jan 19 11:18:32 difrost kernel: [<c0102d05>] sysenter_do_call+0x12/0x25
Jan 19 11:18:32 difrost kernel: Filesystem "sda5": XFS internal error
xfs_trans_cancel at line 1164 of file fs/xfs/xfs_trans.c. Caller 0xc01d7444
Jan 19 11:18:32 difrost kernel:
Jan 19 11:18:32 difrost kernel: Pid: 680, comm: mount Not tainted 2.6.29-rc2 #1
Jan 19 11:18:32 difrost kernel: Call Trace:
Jan 19 11:18:32 difrost kernel: [<c01d2255>] xfs_trans_cancel+0x49/0xcf
Jan 19 11:18:32 difrost kernel: [<c01d7444>] xfs_inactive+0x1eb/0x3ce
Jan 19 11:18:32 difrost kernel: [<c01d7444>] xfs_inactive+0x1eb/0x3ce
Jan 19 11:18:32 difrost kernel: [<c01c23b5>] xfs_imap_to_bp+0x5d/0xcb
Jan 19 11:18:32 difrost kernel: [<c0171bc4>] clear_inode+0x6c/0xb8
Jan 19 11:18:32 difrost kernel: [<c01720f1>] generic_delete_inode+0x72/0xcc
Jan 19 11:18:32 difrost kernel: [<c0171703>] iput+0x48/0x4a
Jan 19 11:18:32 difrost kernel: [<c01cba97>]
xlog_recover_process_one_iunlink+0xb0/0xda
Jan 19 11:18:32 difrost kernel: [<c01cbb38>]
xlog_recover_process_iunlinks+0x77/0xd8
Jan 19 11:18:32 difrost kernel: [<c01cbbd8>] xlog_recover_finish+0x3f/0x8d
Jan 19 11:18:32 difrost kernel: [<c01cff74>] xfs_mountfs+0x44e/0x54b
Jan 19 11:18:32 difrost kernel: [<c01d8e3a>] kmem_alloc+0x57/0xa8
Jan 19 11:18:32 difrost kernel: [<c01d06e9>] xfs_mru_cache_create+0xe6/0x11c
Jan 19 11:18:32 difrost kernel: [<c01e1616>] xfs_fs_fill_super+0x182/0x2d8
Jan 19 11:18:32 difrost kernel: [<c0165a80>] get_sb_bdev+0xe8/0x130
Jan 19 11:18:32 difrost kernel: [<c01750e6>] alloc_vfsmnt+0x69/0xf5
Jan 19 11:18:32 difrost kernel: [<c01dfe39>] xfs_fs_get_sb+0x12/0x16
Jan 19 11:18:32 difrost kernel: [<c01e1494>] xfs_fs_fill_super+0x0/0x2d8
Jan 19 11:18:32 difrost kernel: [<c0164cca>] vfs_kern_mount+0x39/0x72
Jan 19 11:18:32 difrost kernel: [<c0164d41>] do_kern_mount+0x2f/0xb4
Jan 19 11:18:32 difrost kernel: [<c0175c6a>] do_mount+0x632/0x66d
Jan 19 11:18:32 difrost kernel: [<c0175d14>] sys_mount+0x6f/0xaf
Jan 19 11:18:32 difrost kernel: [<c0102d05>] sysenter_do_call+0x12/0x25
Jan 19 11:18:32 difrost kernel: Filesystem "sda5": Corruption of in-memory data
detected. Shutting down filesystem: sda5
Jan 19 11:18:32 difrost kernel: Please umount the filesystem, and rectify the
problem(s)
Jan 19 11:18:32 difrost kernel: BUG: unable to handle kernel NULL pointer
dereference at 0000005c
Jan 19 11:18:32 difrost kernel: IP: [<c01cbb51>]
xlog_recover_process_iunlinks+0x90/0xd8
Jan 19 11:18:32 difrost kernel: *pde = 00000000
Jan 19 11:18:32 difrost kernel: Oops: 0000 [#1] SMP
Jan 19 11:18:32 difrost kernel: last sysfs file:
/sys/devices/platform/i8042/modalias
Jan 19 11:18:32 difrost kernel: Modules linked in: psmouse arc4 ecb cryptomgr
aead crypto_blkcipher crypto_hash crypto_algapi iwl3945 rfkill mac80211 lib80211
cfg80211 sky2 sg
Jan 19 11:18:32 difrost kernel:
Jan 19 11:18:32 difrost kernel: Pid: 680, comm: mount Not tainted (2.6.29-rc2
#1) AMILO Pro Edition V3505
Jan 19 11:18:32 difrost kernel: EIP: 0060:[<c01cbb51>] EFLAGS: 00010286 CPU: 0
Jan 19 11:18:32 difrost kernel: EIP is at xlog_recover_process_iunlinks+0x90/0xd8
Jan 19 11:18:32 difrost kernel: EAX: 00000000 EBX: f6a71e40 ECX: 00000005 EDX:
f695fe20
Jan 19 11:18:32 difrost kernel: ESI: ffffffff EDI: f6825400 EBP: 00000026 ESP:
f695fe14
Jan 19 11:18:32 difrost kernel: DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Jan 19 11:18:32 difrost kernel: Process mount (pid: 680, ti=f695e000
task=f6aa5980 task.ti=f695e000)
Jan 19 11:18:32 difrost kernel: Stack:
Jan 19 11:18:32 difrost kernel: 00000026 00000002 00000000 00000000 f699c500
00000000 00000003 f6825400
Jan 19 11:18:32 difrost kernel: c01cbbd8 00000003 00000000 00000000 c01cff74
00000013 00000246 00400004
Jan 19 11:18:32 difrost kernel: 00000000 00000000 00000001 00000058 00000000
c01d8e3a 00000001 00000058
Jan 19 11:18:32 difrost kernel: Call Trace:
Jan 19 11:18:32 difrost kernel: [<c01cbbd8>] xlog_recover_finish+0x3f/0x8d
Jan 19 11:18:32 difrost kernel: [<c01cff74>] xfs_mountfs+0x44e/0x54b
Jan 19 11:18:32 difrost kernel: [<c01d8e3a>] kmem_alloc+0x57/0xa8
Jan 19 11:18:32 difrost kernel: [<c01d06e9>] xfs_mru_cache_create+0xe6/0x11c
Jan 19 11:18:32 difrost kernel: [<c01e1616>] xfs_fs_fill_super+0x182/0x2d8
Jan 19 11:18:32 difrost kernel: [<c0165a80>] get_sb_bdev+0xe8/0x130
Jan 19 11:18:32 difrost kernel: [<c01750e6>] alloc_vfsmnt+0x69/0xf5
Jan 19 11:18:32 difrost kernel: [<c01dfe39>] xfs_fs_get_sb+0x12/0x16
Jan 19 11:18:32 difrost kernel: [<c01e1494>] xfs_fs_fill_super+0x0/0x2d8
Jan 19 11:18:32 difrost kernel: [<c0164cca>] vfs_kern_mount+0x39/0x72
Jan 19 11:18:32 difrost kernel: [<c0164d41>] do_kern_mount+0x2f/0xb4
Jan 19 11:18:32 difrost kernel: [<c0175c6a>] do_mount+0x632/0x66d
Jan 19 11:18:32 difrost kernel: [<c0175d14>] sys_mount+0x6f/0xaf
Jan 19 11:18:32 difrost kernel: [<c0102d05>] sysenter_do_call+0x12/0x25
Jan 19 11:18:32 difrost kernel: Code: 1d fd 00 00 89 f1 55 89 f8 8b 54 24 04 e8
af fe ff ff 31 d2 89 c6 8d 44 24 0c 50 89 f8 8b 4c 24 08 e8 0b 14 ff ff 8b 44 24
10 5a <8b> 40 5c 59 83 fe ff 75 b8 45 83 fd 40 75 aa 8b 5c 24 08 83 7b
Jan 19 11:18:32 difrost kernel: EIP: [<c01cbb51>]
xlog_recover_process_iunlinks+0x90/0xd8 SS:ESP 0068:f695fe14
Jan 19 11:18:32 difrost kernel: ---[ end trace 0d722cd205608c78 ]---
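
For readers following the trace, here is a minimal sketch of pulling just the
symbol names out of a pasted log like the one above. It assumes a grep that
supports -o (GNU and BSD grep both do); the function name trace_symbols is made
up for illustration. Matching on the symbol+0xOFFSET/0xLENGTH pattern instead of
the address column means entries that the mail client wrapped onto a second line
are still picked up.

```shell
#!/bin/sh
# trace_symbols: read a kernel log on stdin and print the symbol name of
# every "symbol+0xOFFSET/0xLENGTH" entry, one per line, in log order.
# (Hypothetical helper, not part of any XFS or kernel tooling.)
trace_symbols() {
    grep -oE '[A-Za-z_][A-Za-z0-9_.]*\+0x[0-9a-f]+/0x[0-9a-f]+' |
        sed 's/+0x.*$//'
}
```

Something like `trace_symbols < saved.log` would then list the call chain
without timestamps or addresses, which makes the three traces easier to compare.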


2009-01-19 18:45:38

by Eric Sandeen

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Jacek Luczak wrote:
> Hi All,
>
> I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
> didn't found errors. Today I've booted my notebook and XFS bug have occurred.
> System reboot didn't helped, same error appeared.
>
> Some info:
> [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
> [2] kernel logs:
> http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
> [3] most interesting part of log below.

So this happens on every mount? Reproducible is good. How large is the
filesystem (too large to extract elsewhere for analysis...?) (plus I
suppose it'll be hard to get to it when you can't even boot....)

-Eric


2009-01-20 00:46:40

by Dave Chinner

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Mon, Jan 19, 2009 at 12:44:48PM -0600, Eric Sandeen wrote:
> Jacek Luczak wrote:
> > Hi All,
> >
> > I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
> > didn't found errors. Today I've booted my notebook and XFS bug have occurred.
> > System reboot didn't helped, same error appeared.
> >
> > Some info:
> > [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
> > [2] kernel logs:
> > http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
> > [3] most interesting part of log below.
>
> so this happens every mount? Reproducible is good. How large is the
> filesystem (too large to extract elsewhere for analysis...?) (plus I
> suppose it'll be hard to get to it when you can't even boot....)

XFS folks, I suspect the common link between all the reports of this
bug is that they are on 32-bit kernels. I can't reproduce this on
a 64-bit kernel, and I'm trying to get a 32-bit UML built right now
to test this theory.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2009-01-20 09:23:27

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Eric Sandeen wrote:
> Jacek Luczak wrote:
>> Hi All,
>>
>> I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
>> didn't found errors. Today I've booted my notebook and XFS bug have occurred.
>> System reboot didn't helped, same error appeared.
>>
>> Some info:
>> [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
>> [2] kernel logs:
>> http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
>> [3] most interesting part of log below.
>
> so this happens every mount? Reproducible is good. How large is the
> filesystem (too large to extract elsewhere for analysis...?) (plus I
> suppose it'll be hard to get to it when you can't even boot....)
>
> -Eric
>

Hi Eric,

the funny (or sad) thing is that this happens while mounting only one of the
partitions (/home), which is:
$ df -h | grep /home
/dev/sda5 20G 14G 6,0G 69% /home

This bug is quite strange: as I mentioned, the first boot on the new kernel
went OK, and the next two resulted in this behavior. Now I'm running
2.6.29-rc2-12097-gf3b8436 and there is no bug here. Nevertheless I'm not fully
happy: as was seen before, the bug might still happen (I will boot a few times
to test it and report back to you).

The first boot yesterday was buggy, but that went unnoticed, so I started
fluxbox and firefox (errors seen in the log) as usual, and only then found that
something was wrong. The second boot was also buggy, so I returned to the old
kernel, where everything was OK ... nearly everything, because firefox suddenly
"forgot" all its configuration.

I will boot my notebook on all those 2.6.29-* kernels a few times and maybe I
will be able to reproduce the bug.

-Jacek

2009-01-20 09:25:28

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Dave Chinner wrote:
> On Mon, Jan 19, 2009 at 12:44:48PM -0600, Eric Sandeen wrote:
>> Jacek Luczak wrote:
>>> Hi All,
>>>
>>> I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
>>> didn't found errors. Today I've booted my notebook and XFS bug have occurred.
>>> System reboot didn't helped, same error appeared.
>>>
>>> Some info:
>>> [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
>>> [2] kernel logs:
>>> http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
>>> [3] most interesting part of log below.
>> so this happens every mount? Reproducible is good. How large is the
>> filesystem (too large to extract elsewhere for analysis...?) (plus I
>> suppose it'll be hard to get to it when you can't even boot....)
>
> XFS folks, I suspect the common link between all the reports of this
> bug is that they are on 32-bit kernels. I can't reproduce this on
> a 64 bit kernel, and I'm trying to get a 32-bit UML built right now
> to test this theory.
>

Yep, 32-bit here. I googled for a while looking for an answer, and it looks
like this has happened before in various kernel versions (no reports regarding
2.6.29, AFAIR).

-Jacek

2009-01-20 10:41:58

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Jacek Luczak wrote:
> Eric Sandeen wrote:
>> Jacek Luczak wrote:
>>> Hi All,
>>>
>>> I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
>>> didn't found errors. Today I've booted my notebook and XFS bug have occurred.
>>> System reboot didn't helped, same error appeared.
>>>
>>> Some info:
>>> [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
>>> [2] kernel logs:
>>> http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
>>> [3] most interesting part of log below.
>> so this happens every mount? Reproducible is good. How large is the
>> filesystem (too large to extract elsewhere for analysis...?) (plus I
>> suppose it'll be hard to get to it when you can't even boot....)
>>
>> -Eric
>>
>
> Hi Eric,
>
> funny or sad thing is that this happens while mounting only one of partitions
> (/home) which is:
> $ df -h | grep /home
> /dev/sda5 20G 14G 6,0G 69% /home
>
> This bug is quite strange, as I mentioned, first boot on new kernel went OK,
> next two resulted in such behavior. Now I'm running 2.6.29-rc2-12097-gf3b8436
> and no bug here. Nevertheless I'm not fully happy, as it was seen before, the
> bug might still happen (will boot few times to test it and report back to you).
>
> First yesterday buggy but went unnoticed so I've started fluxbox and firefox
> (errors seen in log), as usual, then found that sth is wrong. Second boot was
> also buggy than I've returned to old kernel where everything was OK ... nearly
> everything, cause firefox suddenly ,,forgot'' all configuration.
>
> I will boot my notebook on all those 2.6.29-* kernel few times and maybe be able
> to reproduce that bug.
>

I've run some tests, basically umount + mount cycles (with time measurement):

$ for i in $(seq 1 20); do
    echo "[=> umount [$MNT]: $i"
    time umount $MNT || break
    echo "[=> mount [$MNT]: $i"
    time mount $MNT || break
  done 2>&1 | tee ~/${MNT##*/}_mount_git.log

where MNT was set in turn to three different partitions (all XFS). The bug
appeared on neither rc2 nor the git version. If anyone is interested (I've seen
some delays in mounting before), the timing results are here:
http://pin.if.uz.zgora.pl/~difrost/linux-next/mount_logs/

I will keep experimenting, so maybe it will appear once more; then I will try
to do some more tests (suggestions are welcome).

-Jacek
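
The umount/mount cycling described above can also be sketched as a reusable
function. This is only an illustration: the name cycle_mount and the
UMOUNT_CMD/MOUNT_CMD override variables are hypothetical, added so the loop can
be dry-run without touching real mounts. Timing is left out; in bash each
command could be prefixed with `time` as in the original one-liner.

```shell
#!/bin/sh
# cycle_mount MNT [COUNT]: unmount and remount MNT COUNT times (default 20),
# stopping at the first command that fails.
# UMOUNT_CMD and MOUNT_CMD are hypothetical overrides for dry runs.
cycle_mount() {
    mnt=$1
    count=${2:-20}
    i=1
    while [ "$i" -le "$count" ]; do
        echo "[=> umount [$mnt]: $i"
        ${UMOUNT_CMD:-umount} "$mnt" || return 1
        echo "[=> mount [$mnt]: $i"
        ${MOUNT_CMD:-mount} "$mnt" || return 1
        i=$((i + 1))
    done
}
```

A dry run such as `UMOUNT_CMD=true MOUNT_CMD=true cycle_mount /home 3` prints
the progress lines only; without the overrides it needs the same privileges and
fstab entries as the loop above.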

2009-01-20 11:29:24

by Christoph Hellwig

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 11:46:11AM +1100, Dave Chinner wrote:
> On Mon, Jan 19, 2009 at 12:44:48PM -0600, Eric Sandeen wrote:
> > Jacek Luczak wrote:
> > > Hi All,
> > >
> > > I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
> > > didn't found errors. Today I've booted my notebook and XFS bug have occurred.
> > > System reboot didn't helped, same error appeared.
> > >
> > > Some info:
> > > [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
> > > [2] kernel logs:
> > > http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
> > > [3] most interesting part of log below.
> >
> > so this happens every mount? Reproducible is good. How large is the
> > filesystem (too large to extract elsewhere for analysis...?) (plus I
> > suppose it'll be hard to get to it when you can't even boot....)
>
> XFS folks, I suspect the common link between all the reports of this
> bug is that they are on 32-bit kernels. I can't reproduce this on
> a 64 bit kernel, and I'm trying to get a 32-bit UML built right now
> to test this theory.

I'm doing about half of my testing on 32-bit x86, and I couldn't
reproduce the detailed recipe in the kernel.org bugzilla yet.

Just curious: do you have CONFIG_LBD set?

2009-01-20 11:46:18

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Christoph Hellwig wrote:
> On Tue, Jan 20, 2009 at 11:46:11AM +1100, Dave Chinner wrote:
>> On Mon, Jan 19, 2009 at 12:44:48PM -0600, Eric Sandeen wrote:
>>> Jacek Luczak wrote:
>>>> Hi All,
>>>>
>>>> I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
>>>> didn't found errors. Today I've booted my notebook and XFS bug have occurred.
>>>> System reboot didn't helped, same error appeared.
>>>>
>>>> Some info:
>>>> [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
>>>> [2] kernel logs:
>>>> http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
>>>> [3] most interesting part of log below.
>>> so this happens every mount? Reproducible is good. How large is the
>>> filesystem (too large to extract elsewhere for analysis...?) (plus I
>>> suppose it'll be hard to get to it when you can't even boot....)
>> XFS folks, I suspect the common link between all the reports of this
>> bug is that they are on 32-bit kernels. I can't reproduce this on
>> a 64 bit kernel, and I'm trying to get a 32-bit UML built right now
>> to test this theory.
>
> I'm doing about half of my testing on 32 bit x86, and I couldn't
> reproduce the detailed receipe in the kernel.org bugzilla yet.
>
> Just curious: do you have CONFIG_LBD set?
>
Hi Christoph,

the answer is:
$ grep LBD .config
# CONFIG_LBD is not set

-Jacek
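
The same check can be wrapped in a small helper for anyone comparing several
configs. This is a sketch under stated assumptions: the function name lbd_status
is hypothetical, and it only looks at the text of a .config file in the two
forms kconfig writes ("CONFIG_LBD=y" or "# CONFIG_LBD is not set").

```shell
#!/bin/sh
# lbd_status CONFIG_FILE: print how CONFIG_LBD appears in a kernel .config.
# Prints "set", "not set", or "absent" (no mention of the option at all).
# (Hypothetical helper; pass whatever .config the kernel was built from.)
lbd_status() {
    if grep -q '^CONFIG_LBD=y' "$1"; then
        echo "set"
    elif grep -q '^# CONFIG_LBD is not set' "$1"; then
        echo "not set"
    else
        echo "absent"
    fi
}
```

For the config in this report, `lbd_status .config` would print "not set".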

2009-01-20 11:49:21

by Christoph Hellwig

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 12:47:16PM +0100, Jacek Luczak wrote:
> Christoph Hellwig wrote:
> > On Tue, Jan 20, 2009 at 11:46:11AM +1100, Dave Chinner wrote:
> >> On Mon, Jan 19, 2009 at 12:44:48PM -0600, Eric Sandeen wrote:
> >>> Jacek Luczak wrote:
> >>>> Hi All,
> >>>>
> >>>> I've stepped into XFS issue/bug. Yesterday I've compiled 2.6.29-rc2 and no
> >>>> didn't found errors. Today I've booted my notebook and XFS bug have occurred.
> >>>> System reboot didn't helped, same error appeared.
> >>>>
> >>>> Some info:
> >>>> [1] config: http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2.config
> >>>> [2] kernel logs:
> >>>> http://pin.if.uz.zgora.pl/~difrost/linux-next/2.6.29-rc2_XFS-bug.log
> >>>> [3] most interesting part of log below.
> >>> so this happens every mount? Reproducible is good. How large is the
> >>> filesystem (too large to extract elsewhere for analysis...?) (plus I
> >>> suppose it'll be hard to get to it when you can't even boot....)
> >> XFS folks, I suspect the common link between all the reports of this
> >> bug is that they are on 32-bit kernels. I can't reproduce this on
> >> a 64 bit kernel, and I'm trying to get a 32-bit UML built right now
> >> to test this theory.
> >
> > I'm doing about half of my testing on 32 bit x86, and I couldn't
> > reproduce the detailed receipe in the kernel.org bugzilla yet.
> >
> > Just curious: do you have CONFIG_LBD set?
> >
> Hi Christoph,
>
> the answer is:
> $ grep LBD .config
> # CONFIG_LBD is not set

Ok, let me reproduce it without that set..

2009-01-20 12:13:46

by Christoph Hellwig

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 06:49:06AM -0500, Christoph Hellwig wrote:
> > > Just curious: do you have CONFIG_LBD set?
> > >
> > Hi Christoph,
> >
> > the answer is:
> > $ grep LBD .config
> > # CONFIG_LBD is not set
>
> Ok, let me reproduce it without that set..

Ok, on 32-bit x86 without CONFIG_LBD I can reliably reproduce the issue
with the following script:


#!/bin/bash

TESTDIR=/mnt/test
SCRATCHMNT=/mnt/scratch
file=$SCRATCHMNT/f

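# do_pwrite START END: write the range [START, END), given in units of
# 512 bytes, to $file using direct I/O.
do_pwrite()
{
    offset=`expr $1 \* 512`
    end=`expr $2 \* 512`
    length=`expr $end - $offset`

    xfs_io -d -f $file -c "pwrite $offset $length" >/dev/null
}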


mkfs.xfs \
-b size=1024 \
-d file,name=$TESTDIR/fsfile,size=40146592b,agcount=16 \
-i attr=0 \
-l version=1

mount -o loop,rw,noatime,nodiratime $TESTDIR/fsfile $SCRATCHMNT

do_pwrite 30792 31039
do_pwrite 30320 30791
do_pwrite 29688 30319
do_pwrite 29536 29687
do_pwrite 27216 29535
do_pwrite 24368 27215
do_pwrite 21616 24367
do_pwrite 20608 21615
do_pwrite 19680 20607
do_pwrite 19232 19679
do_pwrite 17840 19231
do_pwrite 16928 17839
do_pwrite 15168 16927
do_pwrite 14048 15167
do_pwrite 12152 14047
do_pwrite 11344 12151
do_pwrite 8792 11343
do_pwrite 6456 8791
do_pwrite 5000 6455
do_pwrite 1728 4999
do_pwrite 0 1727

sync
sync

# Truncate the fragmented file written above; freeing its extents is the
# step that hits the error.
> $file

#umount $SCRATCHMNT
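
To make the arithmetic in do_pwrite easier to follow, here is a dry-run sketch
that prints the byte offset and length for a given pair of arguments without
touching any file. The name pwrite_args is hypothetical; the computation mirrors
the expr lines in the script above.

```shell
#!/bin/sh
# pwrite_args START END: print the byte offset and byte length that
# do_pwrite would pass to xfs_io for a range given in 512-byte units.
# (Hypothetical dry-run helper; it performs no I/O.)
pwrite_args() {
    offset=$(( $1 * 512 ))
    end=$(( $2 * 512 ))
    echo "$offset $(( end - offset ))"
}
```

For example, `pwrite_args 0 1727` prints `0 884224` and `pwrite_args 1728 4999`
prints `884736 1674752`. As computed, each write stops one 512-byte unit short
of where the previously written range begins, and the ranges are written from
the end of the file towards the start.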

2009-01-20 12:45:43

by Christoph Hellwig

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 07:13:35AM -0500, Christoph Hellwig wrote:
> On Tue, Jan 20, 2009 at 06:49:06AM -0500, Christoph Hellwig wrote:
> > > > Just curious: do you have CONFIG_LBD set?
> > > >
> > > Hi Christoph,
> > >
> > > the answer is:
> > > $ grep LBD .config
> > > # CONFIG_LBD is not set
> >
> > Ok, let me reproduce it without that set..
>
> Ok, on 32-bit x86 without CONFIG_LBD I can reliably reproduce the issue
> with the following script:

Bisected down to:

commit 91cca5df9bc85efdabfa645f51d54259ed09f4bf
Author: Christoph Hellwig <[email protected]>
Date: Thu Oct 30 16:58:01 2008 +1100

[XFS] implement generic xfs_btree_delete/delrec

Make the btree delete code generic. Based on a patch from David Chinner
with lots of changes to follow the original btree implementations more
closely. While this loses some of the generic helper routines for
inserting/moving/removing records it also solves some of the one off bugs
in the original code and makes it easier to verify.

2009-01-20 13:35:35

by Dave Chinner

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 06:49:06AM -0500, Christoph Hellwig wrote:
> On Tue, Jan 20, 2009 at 12:47:16PM +0100, Jacek Luczak wrote:
> > Christoph Hellwig wrote:
> > > I'm doing about half of my testing on 32 bit x86, and I couldn't
> > > reproduce the detailed receipe in the kernel.org bugzilla yet.
> > >
> > > Just curious: do you have CONFIG_LBD set?
> >
> > the answer is:
> > $ grep LBD .config
> > # CONFIG_LBD is not set
>
> Ok, let me reproduce it without that set..

Good call, Christoph. I have a reproducer on ia32, CONFIG_LBD=n,
1k block size, 16 AGs in 4GB. The filesystem was prepared in advance by
copying a built kernel tree onto it, then running 'make mrproper' to put
holes in it. Then, on boot:

dave@xfs-32:/mnt$ sudo mount /dev/sdb /mnt; cd /mnt
dave@xfs-32:/mnt$ cp /home/dave/linux-2.6.tar.gz . ; sync
dave@xfs-32:/mnt$ sudo xfs_bmap -v linux-2.6.tar.gz
linux-2.6.tar.gz:
 EXT: FILE-OFFSET       BLOCK-RANGE      AG AG-OFFSET         TOTAL
   0: [0..150271]:      92112..242383     0 (92112..242383)  150272
   1: [150272..346879]: 256188..452795    0 (256188..452795) 196608
   2: [346880..445183]: 906390..1004693   1 (382102..480405)  98304
   3: [445184..494335]: 770022..819173    1 (245734..294885)  49152
   4: [494336..543487]: 720870..770021    1 (196582..245733)  49152
   5: [543488..592639]: 671718..720869    1 (147430..196581)  49152
   6: [592640..641791]: 622566..671717    1 (98278..147429)   49152
   7: [641792..737023]: 1398498..1493729  2 (349922..445153)  95232
   8: [737024..781055]: 1353872..1397903  2 (305296..349327)  44032
   9: [781056..830207]: 1304720..1353871  2 (256144..305295)  49152
  10: [830208..879359]: 1255566..1304717  2 (206990..256141)  49152
  11: [879360..925367]: 1209558..1255565  2 (160982..206989)  46008
dave@xfs-32:/mnt$ > linux-2.6.tar.gz
Connection to xfs-32 closed.

I'll see if this is reproducible, and if it is I'll start
instrumenting in the morning during the LCA keynote. ;)

Cheers,

Dave.
--
Dave Chinner
[email protected]
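
A note on reading the xfs_bmap listing above: the AG and AG-OFFSET columns follow from BLOCK-RANGE by plain division. The sketch below is only an illustration in user-space C, not XFS code. It assumes the 16 AGs are of equal size and that xfs_bmap reports 512-byte units, which gives 524288 units per AG; that figure matches the listing (906390 - 382102 = 524288). The names AG_SIZE_BBS, struct ag_loc and bb_to_ag are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4 GiB split into 16 equal AGs, counted in 512-byte units. */
#define AG_SIZE_BBS ((uint64_t)4 * 1024 * 1024 * 1024 / 16 / 512)

/* Hypothetical helper type: which AG a block lives in, and where. */
struct ag_loc {
	uint64_t agno;   /* allocation group number */
	uint64_t agoff;  /* offset inside that AG, in 512-byte units */
};

/* Split a filesystem-wide 512-byte block number into (AG, AG offset). */
static struct ag_loc bb_to_ag(uint64_t bb)
{
	struct ag_loc loc;

	loc.agno = bb / AG_SIZE_BBS;
	loc.agoff = bb % AG_SIZE_BBS;
	return loc;
}

#ifdef AG_DEMO_MAIN
int main(void)
{
	/* First block of extent 2 in the listing. */
	struct ag_loc loc = bb_to_ag(906390);

	assert(loc.agno == 1);
	printf("AG %llu, offset %llu\n",
	       (unsigned long long)loc.agno,
	       (unsigned long long)loc.agoff);
	return 0;
}
#endif
```

Built with -DAG_DEMO_MAIN, this checks extent 2: block 906390 lands in AG 1 at offset 382102, as the listing shows.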

2009-01-20 13:57:44

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Christoph Hellwig wrote:
> On Tue, Jan 20, 2009 at 07:13:35AM -0500, Christoph Hellwig wrote:
>> On Tue, Jan 20, 2009 at 06:49:06AM -0500, Christoph Hellwig wrote:
>>>>> Just curious: do you have CONFIG_LBD set?
>>>>>
>>>> Hi Christoph,
>>>>
>>>> the answer is:
>>>> $ grep LBD .config
>>>> # CONFIG_LBD is not set
>>> Ok, let me reproduce it without that set..
>> Ok, on 32-bit x86 without CONFIG_LBD I can reliably reproduce the issue
>> with the following script:
>
> Bisected down to:
>
> commit 91cca5df9bc85efdabfa645f51d54259ed09f4bf
> Author: Christoph Hellwig <[email protected]>
> Date: Thu Oct 30 16:58:01 2008 +1100
>
> [XFS] implement generic xfs_btree_delete/delrec
>
> Make the btree delete code generic. Based on a patch from David Chinner
> with lots of changes to follow the original btree implementations more
> closely. While this loses some of the generic helper routines for
> inserting/moving/removing records it also solves some of the one off bugs
> in the original code and makes it easier to verify.
>

Good job! Is there some ,,quick'' fix?

-Jacek

2009-01-20 14:05:30

by Christoph Hellwig

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 02:58:35PM +0100, Jacek Luczak wrote:
> Good job! Is there some ,,quick'' fix?

The patch below makes it go away for me, alternatively just enable
CONFIG_LBD.


Index: linux-2.6/fs/xfs/xfs_types.h
===================================================================
--- linux-2.6.orig/fs/xfs/xfs_types.h 2009-01-20 14:55:55.806068213 +0100
+++ linux-2.6/fs/xfs/xfs_types.h 2009-01-20 14:56:01.437945154 +0100
@@ -96,7 +96,7 @@ typedef __uint64_t xfs_dfilblks_t; /* nu
/*
* Memory based types are conditional.
*/
-#if XFS_BIG_BLKNOS
+#if 1 //XFS_BIG_BLKNOS
typedef __uint64_t xfs_fsblock_t; /* blockno in filesystem (agno|agbno) */
typedef __uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */
typedef __uint64_t xfs_rtblock_t; /* extent (block) in realtime area */

2009-01-20 14:14:01

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Christoph Hellwig wrote:
> On Tue, Jan 20, 2009 at 02:58:35PM +0100, Jacek Luczak wrote:
>> Good job! Is there some ,,quick'' fix?
>
> The patch below makes it go away for me, alternatively just enable
> CONFIG_LBD.
>
>
> Index: linux-2.6/fs/xfs/xfs_types.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_types.h 2009-01-20 14:55:55.806068213 +0100
> +++ linux-2.6/fs/xfs/xfs_types.h 2009-01-20 14:56:01.437945154 +0100
> @@ -96,7 +96,7 @@ typedef __uint64_t xfs_dfilblks_t; /* nu
> /*
> * Memory based types are conditional.
> */
> -#if XFS_BIG_BLKNOS
> +#if 1 //XFS_BIG_BLKNOS
> typedef __uint64_t xfs_fsblock_t; /* blockno in filesystem (agno|agbno) */
> typedef __uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */
> typedef __uint64_t xfs_rtblock_t; /* extent (block) in realtime area */
>

Applied. Thanks. Will do some tests with your script.

-Jacek

2009-01-20 14:22:15

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Christoph Hellwig wrote:
> On Tue, Jan 20, 2009 at 02:58:35PM +0100, Jacek Luczak wrote:
>> Good job! Is there some ,,quick'' fix?
>
> The patch below makes it go away for me, alternatively just enable
> CONFIG_LBD.
>
>
> Index: linux-2.6/fs/xfs/xfs_types.h
> ===================================================================
> --- linux-2.6.orig/fs/xfs/xfs_types.h 2009-01-20 14:55:55.806068213 +0100
> +++ linux-2.6/fs/xfs/xfs_types.h 2009-01-20 14:56:01.437945154 +0100
> @@ -96,7 +96,7 @@ typedef __uint64_t xfs_dfilblks_t; /* nu
> /*
> * Memory based types are conditional.
> */
> -#if XFS_BIG_BLKNOS
> +#if 1 //XFS_BIG_BLKNOS
> typedef __uint64_t xfs_fsblock_t; /* blockno in filesystem (agno|agbno) */
> typedef __uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */
> typedef __uint64_t xfs_rtblock_t; /* extent (block) in realtime area */
>

I've applied it and am now running the ,,fixed'' kernel. What I've noticed is:
$ LC_ALL=C df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 20G -40E 40E - /
/dev/sda5 20G -23E 23E - /home
/dev/sda6 56G 56G 774M 99% /NORA
/dev/sda7 45G 44G 1.2G 98% /MAGAZYN

-Jacek

2009-01-20 14:33:15

by Christoph Hellwig

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 03:23:01PM +0100, Jacek Luczak wrote:
> I've applied it and am now running the ,,fixed'' kernel. What I've noticed is:
> $ LC_ALL=C df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda1 20G -40E 40E - /
> /dev/sda5 20G -23E 23E - /home
> /dev/sda6 56G 56G 774M 99% /NORA
> /dev/sda7 45G 44G 1.2G 98% /MAGAZYN

Yeah, it's more of a hack. If you drop the patch and just enable
CONFIG_LBD it should be fine.

2009-01-20 23:03:25

by Dave Chinner

Subject: [PATCH] Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Wed, Jan 21, 2009 at 12:35:21AM +1100, Dave Chinner wrote:
> On Tue, Jan 20, 2009 at 06:49:06AM -0500, Christoph Hellwig wrote:
> > On Tue, Jan 20, 2009 at 12:47:16PM +0100, Jacek Luczak wrote:
> > > Christoph Hellwig wrote:
> > > > I'm doing about half of my testing on 32 bit x86, and I couldn't
> > > > reproduce the detailed recipe in the kernel.org bugzilla yet.
> > > >
> > > > Just curious: do you have CONFIG_LBD set?
> > >
> > > the answer is:
> > > $ grep LBD .config
> > > # CONFIG_LBD is not set
> >
> > Ok, let me reproduce it without that set..
>
> Good call, Christoph. I have a reproducer on ia32, CONFIG_LBD=n,
> 1k block size, 16 AGs in 4GB.

Christoph nailed it down to a problem with xfs_fsblock_t last night.

> I'll see if this is reproducible, and if it is I'll start
> instrumenting in the morning during the LCA keynote. ;)

And here's the patch to fix it, posted direct from the keynote ;)

Cheers,

Dave.

------

[XFS] Long btree pointers are still 64 bit on disk

On 32 bit machines with CONFIG_LBD=n, XFS reduces the
in memory size of xfs_fsblock_t to 32 bits so that it
will fit within 32 bit addressing. However, the disk format
for long btree pointers is still 64 bits in size.

The recent btree rewrite failed to take this into account
when initialising new btree blocks, setting sibling pointers
to NULL and checking if they are NULL. Hence checking whether
a 64 bit NULL was the same as a 32 bit NULL was failing,
resulting in NULL sibling pointers failing to be detected
correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns
in xfs_btree_delrec.

Fix this by making all the comparisons and setting of long
pointer btree NULL blocks to the disk format, not the
in memory format. i.e. use NULLDFSBNO.

Signed-off-by: Dave Chinner <[email protected]>
---
fs/xfs/xfs_btree.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/xfs_btree.c b/fs/xfs/xfs_btree.c
index 2c3ef20..6bc2136 100644
--- a/fs/xfs/xfs_btree.c
+++ b/fs/xfs/xfs_btree.c
@@ -843,7 +843,7 @@ xfs_btree_ptr_is_null(
union xfs_btree_ptr *ptr)
{
if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
- return be64_to_cpu(ptr->l) == NULLFSBLOCK;
+ return be64_to_cpu(ptr->l) == NULLDFSBNO;
else
return be32_to_cpu(ptr->s) == NULLAGBLOCK;
}
@@ -854,7 +854,7 @@ xfs_btree_set_ptr_null(
union xfs_btree_ptr *ptr)
{
if (cur->bc_flags & XFS_BTREE_LONG_PTRS)
- ptr->l = cpu_to_be64(NULLFSBLOCK);
+ ptr->l = cpu_to_be64(NULLDFSBNO);
else
ptr->s = cpu_to_be32(NULLAGBLOCK);
}
@@ -918,8 +918,8 @@ xfs_btree_init_block(
new->bb_numrecs = cpu_to_be16(numrecs);

if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
- new->bb_u.l.bb_leftsib = cpu_to_be64(NULLFSBLOCK);
- new->bb_u.l.bb_rightsib = cpu_to_be64(NULLFSBLOCK);
+ new->bb_u.l.bb_leftsib = cpu_to_be64(NULLDFSBNO);
+ new->bb_u.l.bb_rightsib = cpu_to_be64(NULLDFSBNO);
} else {
new->bb_u.s.bb_leftsib = cpu_to_be32(NULLAGBLOCK);
new->bb_u.s.bb_rightsib = cpu_to_be32(NULLAGBLOCK);
@@ -971,7 +971,7 @@ xfs_btree_ptr_to_daddr(
union xfs_btree_ptr *ptr)
{
if (cur->bc_flags & XFS_BTREE_LONG_PTRS) {
- ASSERT(be64_to_cpu(ptr->l) != NULLFSBLOCK);
+ ASSERT(be64_to_cpu(ptr->l) != NULLDFSBNO);

return XFS_FSB_TO_DADDR(cur->bc_mp, be64_to_cpu(ptr->l));
} else {

--
Dave Chinner
[email protected]
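
To make the mismatch described in the commit message concrete, here is a small user-space model in C. It is a sketch under stated assumptions, not the kernel code. The typedefs mimic a CONFIG_LBD=n build where the in-memory xfs_fsblock_t is 32 bits and the on-disk xfs_dfsbno_t is 64 bits, and both NULL macros are assumed to be "all ones" in their own type. The helpers ptr_is_null_broken and ptr_is_null_fixed are hypothetical stand-ins for the before and after versions of xfs_btree_ptr_is_null, and the be64_to_cpu conversion is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed model of a 32 bit, CONFIG_LBD=n build. */
typedef uint32_t xfs_fsblock_t;  /* in-memory block number */
typedef uint64_t xfs_dfsbno_t;   /* on-disk block number, always 64 bit */

#define NULLFSBLOCK ((xfs_fsblock_t)-1)  /* 0xffffffff in this model */
#define NULLDFSBNO  ((xfs_dfsbno_t)-1)   /* 0xffffffffffffffff */

/*
 * Before the fix: the 32 bit NULL is zero-extended to 64 bits for the
 * comparison, so it can never equal an all-ones 64 bit disk pointer.
 */
static int ptr_is_null_broken(uint64_t disk_ptr)
{
	return disk_ptr == NULLFSBLOCK;
}

/* After the fix: compare against the NULL value of the disk format. */
static int ptr_is_null_fixed(uint64_t disk_ptr)
{
	return disk_ptr == NULLDFSBNO;
}

#ifdef BTREE_NULL_DEMO_MAIN
int main(void)
{
	uint64_t on_disk_null = UINT64_MAX;  /* a NULL sibling pointer on disk */

	assert(!ptr_is_null_broken(on_disk_null));
	assert(ptr_is_null_fixed(on_disk_null));
	printf("broken=%d fixed=%d\n",
	       ptr_is_null_broken(on_disk_null),
	       ptr_is_null_fixed(on_disk_null));
	return 0;
}
#endif
```

In this model the broken check misses a NULL sibling pointer read from disk, which is the kind of failure the commit message says showed up as WANT_CORRUPTED_GOTO in xfs_btree_delrec. With CONFIG_LBD=y both types would be 64 bits and the two checks would agree.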

2009-01-20 23:22:53

by Christoph Hellwig

Subject: Re: [PATCH] Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Wed, Jan 21, 2009 at 10:03:06AM +1100, Dave Chinner wrote:
> [XFS] Long btree pointers are still 64 bit on disk
>
> On 32 bit machines with CONFIG_LBD=n, XFS reduces the
> in memory size of xfs_fsblock_t to 32 bits so that it
> will fit within 32 bit addressing. However, the disk format
> for long btree pointers is still 64 bits in size.
>
> The recent btree rewrite failed to take this into account
> when initialising new btree blocks, setting sibling pointers
> to NULL and checking if they are NULL. Hence checking whether
> a 64 bit NULL was the same as a 32 bit NULL was failing,
> resulting in NULL sibling pointers failing to be detected
> correctly. This showed up as WANT_CORRUPTED_GOTO shutdowns
> in xfs_btree_delrec.
>
> Fix this by making all the comparisons and setting of long
> pointer btree NULL blocks to the disk format, not the
> in memory format. i.e. use NULLDFSBNO.

Thanks, this fixes the testcase for me.

2009-01-21 04:05:28

by Dave Chinner

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Tue, Jan 20, 2009 at 03:23:01PM +0100, Jacek Luczak wrote:
> Christoph Hellwig wrote:
> > On Tue, Jan 20, 2009 at 02:58:35PM +0100, Jacek Luczak wrote:
> >> Good job! Is there some ,,quick'' fix?
> >
> > The patch below makes it go away for me, alternatively just enable
> > CONFIG_LBD.
> >
> >
> > Index: linux-2.6/fs/xfs/xfs_types.h
> > ===================================================================
> > --- linux-2.6.orig/fs/xfs/xfs_types.h 2009-01-20 14:55:55.806068213 +0100
> > +++ linux-2.6/fs/xfs/xfs_types.h 2009-01-20 14:56:01.437945154 +0100
> > @@ -96,7 +96,7 @@ typedef __uint64_t xfs_dfilblks_t; /* nu
> > /*
> > * Memory based types are conditional.
> > */
> > -#if XFS_BIG_BLKNOS
> > +#if 1 //XFS_BIG_BLKNOS
> > typedef __uint64_t xfs_fsblock_t; /* blockno in filesystem (agno|agbno) */
> > typedef __uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */
> > typedef __uint64_t xfs_rtblock_t; /* extent (block) in realtime area */
> >
>
> I've applied it and am now running the ,,fixed'' kernel. What I've noticed is:
> $ LC_ALL=C df -h
> Filesystem Size Used Avail Use% Mounted on
> /dev/sda1 20G -40E 40E - /
> /dev/sda5 20G -23E 23E - /home
> /dev/sda6 56G 56G 774M 99% /NORA
> /dev/sda7 45G 44G 1.2G 98% /MAGAZYN

Please try the patch I posted this morning - it fixes the problem
properly and shouldn't have this side effect.

Cheers,

Dave.
--
Dave Chinner
[email protected]

2009-01-21 09:03:41

by Jacek Luczak

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

Dave Chinner wrote:
> On Tue, Jan 20, 2009 at 03:23:01PM +0100, Jacek Luczak wrote:
>> Christoph Hellwig wrote:
>>> On Tue, Jan 20, 2009 at 02:58:35PM +0100, Jacek Luczak wrote:
>>>> Good job! Is there some ,,quick'' fix?
>>> The patch below makes it go away for me, alternatively just enable
>>> CONFIG_LBD.
>>>
>>>
>>> Index: linux-2.6/fs/xfs/xfs_types.h
>>> ===================================================================
>>> --- linux-2.6.orig/fs/xfs/xfs_types.h 2009-01-20 14:55:55.806068213 +0100
>>> +++ linux-2.6/fs/xfs/xfs_types.h 2009-01-20 14:56:01.437945154 +0100
>>> @@ -96,7 +96,7 @@ typedef __uint64_t xfs_dfilblks_t; /* nu
>>> /*
>>> * Memory based types are conditional.
>>> */
>>> -#if XFS_BIG_BLKNOS
>>> +#if 1 //XFS_BIG_BLKNOS
>>> typedef __uint64_t xfs_fsblock_t; /* blockno in filesystem (agno|agbno) */
>>> typedef __uint64_t xfs_rfsblock_t; /* blockno in filesystem (raw) */
>>> typedef __uint64_t xfs_rtblock_t; /* extent (block) in realtime area */
>>>
>> I've applied it and am now running the ,,fixed'' kernel. What I've noticed is:
>> $ LC_ALL=C df -h
>> Filesystem Size Used Avail Use% Mounted on
>> /dev/sda1 20G -40E 40E - /
>> /dev/sda5 20G -23E 23E - /home
>> /dev/sda6 56G 56G 774M 99% /NORA
>> /dev/sda7 45G 44G 1.2G 98% /MAGAZYN
>
> Please try the patch I posted this morning - it fixes the problem
> properly and shouldn't have this side effect.
>

Your patch works for me. I've also run some tests like the one proposed by
Christoph, and everything works there too. Good work guys!

Have a nice day,

-Jacek

2009-01-21 22:58:21

by Dave Chinner

Subject: Re: [XFS] 2.6.29-rc2: XFS internal error XFS_WANT_CORRUPTED_GOTO

On Wed, Jan 21, 2009 at 10:04:36AM +0100, Jacek Luczak wrote:
> Dave Chinner wrote:
> > Please try the patch I posted this morning - it fixes the problem
> > properly and shouldn't have this side effect.
>
> Your patch works for me. I've also run some tests like the one proposed by
> Christoph, and everything works there too. Good work guys!

Thanks for testing the fix, Jacek. I'll get it pushed upstream
now.

Cheers,

Dave.
--
Dave Chinner
[email protected]