2008-03-01 11:21:33

by Thomas Müller

Subject: Kernel oops / XFS filesystem corruption

Mar 1 10:32:03 linux kernel: BUG: unable to handle kernel NULL pointer dereference at virtual address 00000002
Mar 1 10:32:03 linux kernel: printing eip: f8a96141 *pde = 38ccb067
Mar 1 10:32:03 linux kernel: Oops: 0000 [#1] SMP
Mar 1 10:32:03 linux kernel: Modules linked in: asb100 hwmon_vid hwmon tun sch_sfq sch_htb pppoe pppox ppp_synctty ppp_async crc_ccitt ppp_generic slhc bridge xt_NOTRACK iptable_raw ipt_MASQUERADE iptable_nat nf_nat ipt_REJECT xt_mac ipt_LOG nf_conntrack_ipv4 xt_state nf_conntrack nfnetlink iptable_filter xt_CLASSIFY xt_length ipt_owner xt_TCPMSS xt_comment xt_tcpudp iptable_mangle ip_tables x_tables ext2 mbcache dm_mirror dm_mod 8139too r8169 mii i2c_i801 iTCO_wdt iTCO_vendor_support i2c_core sg sr_mod cdrom ata_generic ata_piix libata sd_mod scsi_mod xfs ehci_hcd
Mar 1 10:32:03 linux kernel: CPU: 0
Mar 1 10:32:03 linux kernel: EIP: 0060:[<f8a96141>] Not tainted VLI
Mar 1 10:32:03 linux kernel: EFLAGS: 00010292 (2.6.23.15-137.fc8 #1)
Mar 1 10:32:03 linux kernel: EIP is at xfs_attr_shortform_getvalue+0x15/0xdb [xfs]
Mar 1 10:32:03 linux kernel: eax: 00000000 ebx: f268cddc ecx: f8ae4d9d edx: 08d26645
Mar 1 10:32:03 linux kernel: esi: f04d1600 edi: 00000004 ebp: f8ae4d91 esp: f268cdbc
Mar 1 10:32:03 linux kernel: ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Mar 1 10:32:03 linux kernel: Process smbd (pid: 2036, ti=f268c000 task=f7207840 task.ti=f268c000)
Mar 1 10:32:03 linux kernel: Stack: 00000003 f37888d4 00000003 f04d1600 f04d1600 f268ce38 f8ae4d91 f8a93a97
Mar 1 10:32:03 linux kernel: f8ae4d91 0000000c c1ba6000 00000130 00000402 275b19c4 00000000 00000000
Mar 1 10:32:03 linux kernel: f04d1600 00000000 00000000 00000000 00000000 00000001 00000000 00000000
Mar 1 10:32:03 linux kernel: Call Trace:
Mar 1 10:32:03 linux kernel: [<f8a93a97>] xfs_attr_fetch+0x9e/0xee [xfs]
Mar 1 10:32:03 linux kernel: [<f8a8d843>] xfs_acl_iaccess+0x59/0xc2 [xfs]
Mar 1 10:32:03 linux kernel: [<f8abe3c2>] xfs_iaccess+0x87/0x15c [xfs]
Mar 1 10:32:03 linux kernel: [<f8ad53ec>] xfs_access+0x26/0x3a [xfs]
Mar 1 10:32:03 linux kernel: [<f8ae08ae>] xfs_vn_permission+0x0/0x13 [xfs]
Mar 1 10:32:03 linux kernel: [<f8ae08bd>] xfs_vn_permission+0xf/0x13 [xfs]
Mar 1 10:32:03 linux kernel: [<c0487419>] permission+0x9e/0xdb
Mar 1 10:32:03 linux kernel: [<c04887d0>] may_open+0x5c/0x205
Mar 1 10:32:03 linux kernel: [<c048a8b4>] open_namei+0x27d/0x576
Mar 1 10:32:03 linux kernel: [<c047fdb7>] do_filp_open+0x2a/0x3e
Mar 1 10:32:03 linux kernel: [<c047fafe>] get_unused_fd_flags+0x52/0xc5
Mar 1 10:32:03 linux kernel: [<c047fe13>] do_sys_open+0x48/0xca
Mar 1 10:32:03 linux kernel: [<c047fece>] sys_open+0x1c/0x1e
Mar 1 10:32:03 linux kernel: [<c040518a>] syscall_call+0x7/0xb
Mar 1 10:32:03 linux kernel: =======================
Mar 1 10:32:03 linux kernel: Code: 00 00 c6 40 02 00 66 c7 00 00 04 8b 47 2c 5b 5e 5f e9 08 bc 03 00 55 57 56 53 89 c3 83 ec 0c 8b 40 20 8b 40 4c 8b 40 14 8d 78 04 <0f> b6 40 02 c7 44 24 08 00 00 00 00 89 44 24 04 e9 96 00 00 00
Mar 1 10:32:03 linux kernel: EIP: [<f8a96141>] xfs_attr_shortform_getvalue+0x15/0xdb [xfs] SS:ESP 0068:f268cdbc


Attachments:
xfs_check (2.67 kB)
xfs_oops (3.09 kB)

2008-03-01 21:02:21

by Eric Sandeen

Subject: Re: Kernel oops / XFS filesystem corruption

Thomas Müller wrote:
> Hello :)
>
> My system just crashed because of a power fluctuation and the root
> filesystem was damaged.
> The system booted up just fine, but when samba tried to start up
> the kernel oops'd.
>
> xfs_repair was apparently able to repair the damage, though I seem
> to have lost some files.
>
> I do realize that a lot of awful things can happen if you just cut
> the power, but the kernel shouldn't oops on a mounted file
> system, right?

right.

here's the disassembly of that function in your kernel FWIW:

0001012c <xfs_attr_shortform_getvalue>:
   1012c:  55                      push   %ebp
   1012d:  57                      push   %edi
   1012e:  56                      push   %esi
   1012f:  53                      push   %ebx
   10130:  89 c3                   mov    %eax,%ebx
   10132:  83 ec 0c                sub    $0xc,%esp
   10135:  8b 40 20                mov    0x20(%eax),%eax
   10138:  8b 40 4c                mov    0x4c(%eax),%eax
   1013b:  8b 40 14                mov    0x14(%eax),%eax
   1013e:  8d 78 04                lea    0x4(%eax),%edi
   10141:  0f b6 40 02             movzbl 0x2(%eax),%eax    <--- boom.
   10145:  c7 44 24 08 00 00 00    movl   $0x0,0x8(%esp)
   1014c:  00
   1014d:  89 44 24 04             mov    %eax,0x4(%esp)
   10151:  e9 96 00 00 00          jmp    101ec <xfs_attr_shortform_getvalue+0xc0>
...

at this point eax is "sf" (0x0) and edi is "sfe" (0x04)

Mar 1 10:32:03 linux kernel: eax: 00000000 ebx: f268cddc ecx: f8ae4d9d edx: 08d26645
Mar 1 10:32:03 linux kernel: esi: f04d1600 edi: 00000004 ebp: f8ae4d91 esp: f268cdbc
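
The offset in the oops line maps directly onto the listing: "xfs_attr_shortform_getvalue+0x15/0xdb" means 0x15 bytes into a 0xdb-byte function, and 0x1012c + 0x15 = 0x10141, which is the movzbl marked above. A minimal C sketch of that arithmetic follows. Note that parse_oops_symbol, faulting_address and struct oops_loc are hypothetical names made up for illustration; they are not kernel or binutils APIs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: one "symbol+0xOFF/0xLEN" location from an oops. */
struct oops_loc {
    char name[64];
    unsigned long offset;   /* bytes from the start of the function */
    unsigned long length;   /* total size of the function in bytes */
};

/* Parse e.g. "xfs_attr_shortform_getvalue+0x15/0xdb". Returns 0 on success. */
static int parse_oops_symbol(const char *s, struct oops_loc *out)
{
    const char *plus = strchr(s, '+');
    size_t n;

    if (plus == NULL)
        return -1;
    n = (size_t)(plus - s);
    if (n == 0 || n >= sizeof(out->name))
        return -1;
    memcpy(out->name, s, n);
    out->name[n] = '\0';
    /* %lx accepts an optional 0x prefix, like strtoul() with base 16. */
    if (sscanf(plus + 1, "%lx/%lx", &out->offset, &out->length) != 2)
        return -1;
    return 0;
}

/* Address of the faulting instruction in an objdump listing of the module. */
static unsigned long faulting_address(unsigned long func_start,
                                      const struct oops_loc *loc)
{
    assert(loc->offset < loc->length);
    return func_start + loc->offset;
}
```

Feeding it the string from the EIP line and the function start 0x1012c from the listing gives the address of the instruction that faulted.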

first part of the function:

int
xfs_attr_shortform_getvalue(xfs_da_args_t *args)
{
        xfs_attr_shortform_t *sf;
        xfs_attr_sf_entry_t *sfe;
        int i;

        ASSERT(args->dp->i_d.di_aformat == XFS_IFINLINE);
        sf = (xfs_attr_shortform_t *)args->dp->i_afp->if_u1.if_data;
        sfe = &sf->list[0];
        for (i = 0; i < sf->hdr.count;     <--- died here, sf is 0
                        sfe = XFS_ATTR_SF_NEXTENTRY(sfe), i++) {

we blew up on sf->hdr.count because sf is NULL (hdr.count is at offset 0x2
in sf, which matches the faulting address 00000002 in the oops)
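
To make the two offsets concrete, here is a small C sketch using a simplified stand-in for the shortform attribute layout. The structs below are an assumption modelled on the fields the function touches (a header holding a count, followed by a list of entries); they are not copied from the kernel headers. sf_count_checked is a hypothetical helper that only illustrates the kind of NULL guard that would avoid this dereference; it is not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the shortform layout (assumed, not the kernel definition). */
struct sf_hdr {
    uint16_t totsize;   /* total bytes in the shortform list */
    uint8_t  count;     /* number of active entries */
};

struct sf_entry {
    uint8_t namelen;
    uint8_t valuelen;
    uint8_t flags;
    uint8_t nameval[1]; /* name and value bytes, variable length in practice */
};

struct shortform {
    struct sf_hdr   hdr;
    struct sf_entry list[1];
};

/* Offset of hdr.count from the start of the structure. */
static size_t count_offset(void)
{
    return offsetof(struct shortform, hdr) + offsetof(struct sf_hdr, count);
}

/* Offset of list[0]; on common ABIs the header is padded to 4 bytes. */
static size_t list_offset(void)
{
    return offsetof(struct shortform, list);
}

/* Hypothetical guard: return the entry count, or -1 if there is no data. */
static int sf_count_checked(const struct shortform *sf)
{
    if (sf == NULL)
        return -1;
    return sf->hdr.count;
}
```

With a NULL base pointer those offsets become the addresses 0x2 and 0x4, which lines up with the faulting address in the oops and with edi holding 00000004 after the lea.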

maybe the sgi guys can take it from there ;) Did you also happen to
save the xfs_repair output?

-Eric

2008-03-02 00:33:57

by Thomas Müller

Subject: Re: Kernel oops / XFS filesystem corruption

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
data fork in ino 128638 claims free block 19018
- agno = 1
- agno = 2
b5ac7b90: Badness in key lookup (length)
bp=(bno 11701280, len 32768 bytes) key=(bno 11701280, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11708896, len 32768 bytes) key=(bno 11708896, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11739296, len 32768 bytes) key=(bno 11739296, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11751440, len 32768 bytes) key=(bno 11751440, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 11754176, len 32768 bytes) key=(bno 11754176, len 8192 bytes)
b5ac7b90: Badness in key lookup (length)
bp=(bno 12026592, len 32768 bytes) key=(bno 12026592, len 8192 bytes)
- agno = 3
b50c6b90: Badness in key lookup (length)
bp=(bno 15569728, len 32768 bytes) key=(bno 15569728, len 8192 bytes)
b50c6b90: Badness in key lookup (length)
bp=(bno 15626080, len 32768 bytes) key=(bno 15626080, len 8192 bytes)
- agno = 4
- agno = 5
- agno = 6
- agno = 7
b41ffb90: Badness in key lookup (length)
bp=(bno 31116224, len 32768 bytes) key=(bno 31116224, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31117856, len 32768 bytes) key=(bno 31117856, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31128704, len 32768 bytes) key=(bno 31128704, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31239104, len 32768 bytes) key=(bno 31239104, len 8192 bytes)
b41ffb90: Badness in key lookup (length)
bp=(bno 31261408, len 32768 bytes) key=(bno 31261408, len 8192 bytes)
- agno = 8
local inode 33609156 attr too small (size = 0, min size = 4)
bad attribute fork in inode 33609156, clearing attr fork
clearing inode 33609156 attributes
cleared inode 33609156
- agno = 9
b50c6b90: Badness in key lookup (length)
bp=(bno 38861808, len 32768 bytes) key=(bno 38861808, len 8192 bytes)
- agno = 10
b41ffb90: Badness in key lookup (length)
bp=(bno 42752032, len 32768 bytes) key=(bno 42752032, len 8192 bytes)
- agno = 11
- agno = 12
b50c6b90: Badness in key lookup (length)
bp=(bno 50475360, len 32768 bytes) key=(bno 50475360, len 8192 bytes)
b50c6b90: Badness in key lookup (length)
bp=(bno 50629312, len 32768 bytes) key=(bno 50629312, len 8192 bytes)
- agno = 13
- agno = 14
- agno = 15
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
bad bmap btree ptr 0xc3a0000100000000 in ino 33609156
bad data fork in inode 33609156
cleared inode 33609156
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- agno = 13
- agno = 14
- agno = 15
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
entry "locking.tdb" in directory inode 33585205 points to free inode 33609156
bad hash table for directory inode 33585205 (no data entry): rebuilding
rebuilding directory inode 33585205
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 12636191, moving to lost+found
disconnected inode 12643748, moving to lost+found
disconnected inode 12643751, moving to lost+found
disconnected inode 12674162, moving to lost+found
disconnected inode 12674190, moving to lost+found
disconnected inode 12686342, moving to lost+found
disconnected inode 12689047, moving to lost+found
disconnected inode 12689059, moving to lost+found
disconnected inode 12961449, moving to lost+found
disconnected inode 16816212, moving to lost+found
disconnected inode 16872569, moving to lost+found
disconnected inode 33609179, moving to lost+found
disconnected inode 33609189, moving to lost+found
disconnected inode 33610799, moving to lost+found
disconnected inode 33610824, moving to lost+found
disconnected inode 33610838, moving to lost+found
disconnected inode 33610839, moving to lost+found
disconnected inode 33621664, moving to lost+found
disconnected inode 33621671, moving to lost+found
disconnected inode 33621672, moving to lost+found
disconnected inode 33732053, moving to lost+found
disconnected inode 33754372, moving to lost+found
disconnected inode 41977984, moving to lost+found
disconnected inode 46179860, moving to lost+found
disconnected inode 54526415, moving to lost+found
disconnected inode 54680382, moving to lost+found
Phase 7 - verify and correct link counts...
done


Attachments:
xfs_repair (5.05 kB)

2008-03-02 01:35:08

by Eric Sandeen

Subject: Re: Kernel oops / XFS filesystem corruption

Thomas Müller wrote:
> Eric Sandeen wrote:
>> Did you also happen to save the xfs_repair output?
> No, but I made a complete copy of the file system before
> repairing it, so I can easily recreate it... :)

oh, like a dd image? great. You can use xfs_metadump to make a more
transportable image... xfs folks might even be able to use that to
recreate the oops.

-Eric

2008-03-02 19:03:11

by Thomas Müller

Subject: Re: Kernel oops / XFS filesystem corruption

Eric Sandeen wrote:
> oh, like a dd image? great.
Yup :)

> You can use xfs_metadump to make a more transportable image...
I will, if someone needs it.

As said, I have a complete file system image, so if anyone needs
more information/data, just tell me.


Thomas

2008-03-03 01:02:11

by Barry Naujok

Subject: Re: Kernel oops / XFS filesystem corruption

On Mon, 03 Mar 2008 06:02:28 +1100, Thomas Müller wrote:

> Eric Sandeen wrote:
>> oh, like a dd image? great.
> Yup :)
>
>> You can use xfs_metadump to make a more transportable image...
> I will, if someone needs it.
>
> As said, I have a complete file system image, so if anyone needs
> more information/data, just tell me.

I could use the metadump image for the badness in key lookups that
xfs_repair was reporting.

Thanks,
Barry.

2008-03-03 01:03:59

by Mark Goodwin

Subject: Re: Kernel oops / XFS filesystem corruption



Eric Sandeen wrote:
> Thomas Müller wrote:
>> Eric Sandeen wrote:
>>> Did you also happen to save the xfs_repair output?
>> No, but I made a complete copy of the file system before
>> repairing it, so I can easily recreate it... :)
>
> oh, like a dd image? great. You can use xfs_metadump to make a more
> transportable image... xfs folks might even be able to use that to
> recreate the oops.

YES PLEASE. See the xfs_metadump man page for instructions. It will
obfuscate filenames by default (but please only do so if you need to).

Please make it available for Barry, thanks.

Cheers
-- Mark