2006-02-16 18:36:43

by Bauke Jan Douma

Subject: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

Trying to mount a corrupted xfs partition hanging off a Promise
PDC20267 FastTrak100/Ultra100 controller.

The partition contains a test Linux system that, due to an
incomplete install, had to be halted by brute force (i.e. power off).

After a reboot and xfs_repair the filesystem proved to be mildly
corrupted, but recoverable.

The oops is replicable.

Is there anyone else I should CC?


XFS mounting filesystem hda3
Starting XFS recovery on filesystem: hda3 (logdev: internal)
Unable to handle kernel paging request at virtual address 40000010
printing eip:
c022d0b9
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: sunkbd serport mousedev usbhid uhci_hcd snd_seq snd_via82xx gameport snd_mpu401_uart snd_emu10k1 snd_rawmidi snd_ac97_codec snd_ac97_bus snd_seq_device snd_util_mem snd_hwdep nvidia v4l1_compat i2c_viapro i2c_via bttv snd_bt87x snd_pcm snd_timer snd snd_page_alloc msp3400 tuner tda9887 video_buf firmware_class compat_ioctl32 v4l2_common btcx_risc ir_common tveeprom videodev i2c_dev i2c_algo_bit rtc udf ide_cd cdrom video thermal processor fan container button battery ac nls_utf8 nls_iso8859_15 nls_iso8859_1 nfsd 8250_pnp 8250_pci 8250 serial_core sk98lin dummy nfs lockd sunrpc loop floppy evdev pcspkr usbcore
CPU: 0
EIP: 0060:[xlog_recover_do_inode_trans+473/2688] Tainted: P VLI
EFLAGS: 00010297 (2.6.16-rc3 #2)
EIP is at xlog_recover_do_inode_trans+0x1d9/0xa80
eax: 40000010 ebx: e774ed40 ecx: 00000010 edx: 00000000
esi: e92dd1c0 edi: 00000000 ebp: 00000080 esp: e7041aa4
ds: 007b es: 007b ss: 0068
Process mount (pid: 2515, threadinfo=e7040000 task=e9a0d550)
Stack: <0>e9f93f08 00000010 00000040 00000000 00004000 c0359515 e9a0d550 c041b160
00000001 00010000 0880433e 40c00000 e9a569c0 c03a2a00 00000000 40000010
e9f93f08 e91cdc00 00000000 00000040 00000000 00000286 0010bffe e774eda4
Call Trace:
[schedule+1205/1664] schedule+0x4b5/0x680
[xlog_recover_do_trans+288/384] xlog_recover_do_trans+0x120/0x180
[kmem_zalloc+31/80] kmem_zalloc+0x1f/0x50
[xlog_recover_commit_trans+57/80] xlog_recover_commit_trans+0x39/0x50
[xlog_recover_process_data+378/528] xlog_recover_process_data+0x17a/0x210
[xlog_do_recovery_pass+1744/2864] xlog_do_recovery_pass+0x6d0/0xb30
[xlog_do_log_recovery+143/208] xlog_do_log_recovery+0x8f/0xd0
[xlog_do_recover+59/416] xlog_do_recover+0x3b/0x1a0
[xlog_recover+219/240] xlog_recover+0xdb/0xf0
[xfs_log_mount+165/320] xfs_log_mount+0xa5/0x140
[xfs_mountfs+2064/4128] xfs_mountfs+0x810/0x1020
[__sched_text_start+7/12] __down_failed+0x7/0xc
[xfs_setsize_buftarg_flags+64/208] xfs_setsize_buftarg_flags+0x40/0xd0
[xfs_buf_rele+37/224] xfs_buf_rele+0x25/0xe0
[xfs_readsb+409/560] xfs_readsb+0x199/0x230
[xfs_ioinit+38/80] xfs_ioinit+0x26/0x50
[xfs_mount+1001/1744] xfs_mount+0x3e9/0x6d0
[linvfs_fill_super+161/512] linvfs_fill_super+0xa1/0x200
[snprintf+39/48] snprintf+0x27/0x30
[disk_name+98/208] disk_name+0x62/0xd0
[sb_set_blocksize+46/96] sb_set_blocksize+0x2e/0x60
[get_sb_bdev+245/336] get_sb_bdev+0xf5/0x150
[linvfs_get_sb+47/64] linvfs_get_sb+0x2f/0x40
[linvfs_fill_super+0/512] linvfs_fill_super+0x0/0x200
[do_kern_mount+174/400] do_kern_mount+0xae/0x190
[do_new_mount+131/224] do_new_mount+0x83/0xe0
[do_mount+580/608] do_mount+0x244/0x260
[exact_copy_from_user+50/112] exact_copy_from_user+0x32/0x70
[copy_mount_options+96/192] copy_mount_options+0x60/0xc0
[sys_mount+159/224] sys_mount+0x9f/0xe0
[sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
Code: 24 70 8b 6c 24 74 83 c4 78 c3 0f b7 44 24 5a 8b 4c 24 40 c7 44 24 38 00 00 00 00 89 0c 24 89 44 24 04 e8 2b b1 01 00 89 44 24 3c <0f> b7 00 89 c2 c1 e8 08 c1 e2 08 09 c2 66 81 fa 4e 49 75 67 8b
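The Code: bytes around the fault can be read back into C. Decoded by hand (so treat this as tentative), <0f> b7 00 is movzwl (%eax),%eax, the following mov/shr/shl/or swaps the two bytes of that 16-bit value, and 66 81 fa 4e 49 compares the result with 0x494e, the ASCII "IN" that XFS uses as its on-disk inode magic. EAX is 40000010, the same address the paging request failed on, so the pointer was already bad before any magic could be compared. The sketch below shows what that instruction sequence computes; the function names are hypothetical, and a plain byte buffer stands in for the kernel's inode buffer types.

```c
#include <stdbool.h>
#include <stdint.h>

/* XFS on-disk inode magic: the ASCII bytes 'I' 'N' (0x49 0x4e). */
#define DINODE_MAGIC 0x494e

/* Assemble a big-endian 16-bit value from two bytes. On x86 the compiler
 * emitted the equivalent as a 16-bit load followed by shr/shl/or. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical helper: does this buffer start with an XFS inode magic?
 * In the oops the load itself faulted (bad pointer 0x40000010), so the
 * comparison never ran. */
static bool looks_like_dinode(const unsigned char *p)
{
    return load_be16(p) == DINODE_MAGIC;
}
```

A check like this only helps once the pointer is known to be inside the buffer. Here the address came out of log recovery, which is why running xfs_repair first (clearing the log) avoids the path entirely.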


2006-02-16 19:32:05

by Nathan Scott

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Thu, Feb 16, 2006 at 07:36:29PM +0100, bjd wrote:
> Trying to mount a corrupted xfs partition hanging off a Promise
> PDC20267 FastTrak100/Ultra100 controller.
>
> The partition contains a test Linux system that, due to an
> incomplete install, had to be halted by brute force (i.e. power off).

Any idea how/at what point it became corrupted?

> After a reboot and xfs_repair the filesystem proved to be mildly
> corrupted, but recoverable.
>

This filesystem has not been repaired; if it had been, you would not
be going into log recovery when you mount it (xfs_repair clears
out the log).

> XFS mounting filesystem hda3
> Starting XFS recovery on filesystem: hda3 (logdev: internal)
> EIP: 0060:[xlog_recover_do_inode_trans+473/2688] Tainted: P VLI

This indicates you are running recovery - run xfs_repair first
if you know the filesystem is corrupt.

cheers.

--
Nathan

2006-02-17 16:54:57

by Jan Engelhardt

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

>> XFS mounting filesystem hda3
>> Starting XFS recovery on filesystem: hda3 (logdev: internal)
>> EIP: 0060:[xlog_recover_do_inode_trans+473/2688] Tainted: P VLI
>
>This indicates you are running recovery - run xfs_repair first
>if you know the filesystem is corrupt.
>
How does one know a filesystem got "corrupt enough" to require xfs_repair
first?



Jan Engelhardt
--

2006-02-19 21:30:43

by Nathan Scott

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Fri, Feb 17, 2006 at 05:54:49PM +0100, Jan Engelhardt wrote:
> >> XFS mounting filesystem hda3
> >> Starting XFS recovery on filesystem: hda3 (logdev: internal)
> >> EIP: 0060:[xlog_recover_do_inode_trans+473/2688] Tainted: P VLI
> >
> >This indicates you are running recovery - run xfs_repair first
> >if you know the filesystem is corrupt.
> >
> How does one know a filesystem got "corrupt enough" to require xfs_repair
> first?

Any corruption should be repaired. You'd notice corruption either by
running repair (as the bug reporter here had asserted), or via the
filesystem shutting down when the on-disk corruption was encountered.

cheers.

--
Nathan

2006-02-19 21:52:24

by Dave Jones

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Mon, Feb 20, 2006 at 08:29:46AM +1100, Nathan Scott wrote:
> On Fri, Feb 17, 2006 at 05:54:49PM +0100, Jan Engelhardt wrote:
> > >> XFS mounting filesystem hda3
> > >> Starting XFS recovery on filesystem: hda3 (logdev: internal)
> > >> EIP: 0060:[xlog_recover_do_inode_trans+473/2688] Tainted: P VLI
> > >
> > >This indicates you are running recovery - run xfs_repair first
> > >if you know the filesystem is corrupt.
> > >
> > How does one know a filesystem got "corrupt enough" to require xfs_repair
> > first?
>
> Any corruption should be repaired. You'd notice corruption by
> either running repair (as the bug reporter here had asserted),
> or via the filesystem shutting down when the ondisk corruption
> was encountered.

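Just for kicks, I hacked this up:

#!/bin/bash
# Fetch and build the mangle file fuzzer.
wget http://www.digitaldwarf.be/products/mangle.c
gcc mangle.c -o mangle

# Backing file: 70000 blocks of dd's default 512 bytes (~35 MB).
dd if=/dev/zero of=data.img count=70000
mkdir -p mntpt

while true
do
	mkfs.xfs -f data.img >/dev/null      # fresh filesystem every pass
	./mangle data.img $RANDOM            # corrupt the fresh image
	sudo mount -t xfs data.img mntpt -o loop
	sudo ls -R mntpt                     # walk the tree, reading metadata
	sudo umount mntpt
done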


xfs wins the award for 'noisiest fs in the face of corruption' :-)
After a few dozen backtraces from xfs_corruption_error,
this fell out...

divide error: 0000 [1] SMP
last sysfs file: /devices/pci0000:00/0000:00:1d.7/usb1/1-0:1.0/bAlternateSetting
CPU 3
Modules linked in: loop xfs exportfs relayfs snd_usb_audio snd_usb_lib hwmon_vid hwmon i2c_isa snd_seq_midi vfat fat usb_storage radeon drm ppdev autofs4 nfs lockd nfs_acl rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink xt_tcpudp iptable_filter ip_tables x_tables video button battery ac ipv6 lp parport_pc parport floppy nvram uhci_hcd ehci_hcd sg snd_emu10k1_synth snd_emux_synth snd_seq_virmidi snd_seq_midi_emul snd_intel8x0 snd_seq_dummy snd_seq_oss snd_emu10k1 snd_seq_midi_event snd_seq snd_rawmidi snd_pcm_oss snd_mixer_oss snd_ac97_codec snd_ac97_bus snd_seq_device snd_util_mem snd_pcm snd_hwdep snd_timer emu10k1_gp gameport i2c_i801 snd soundcore e1000 i2c_core snd_page_alloc e752x_edac edac_mc dm_snapshot dm_zero dm_mirror dm_mod ext3 jbd ata_piix libata sd_mod scsi_mod
Pid: 15299, comm: mount Not tainted 2.6.15-1.1963_FC5 #1
RIP: 0010:[<ffffffff886b3e93>] <ffffffff886b3e93>{:xfs:xfs_mountfs+1031}
RSP: 0000:ffff81001bacfa28 EFLAGS: 00010246
RAX: 0000000000000800 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 00000000000000aa RDI: ffff81002f1115f8
RBP: ffff81002f1115f8 R08: 0000000000000008 R09: 0000000000000003
R10: 0000000000000001 R11: ffff81002f1115f8 R12: ffff8100162fa188
R13: ffff81002f111650 R14: ffffffffffffffff R15: ffff81003ab16c78
FS: 00002b2a25b01380(0000) GS:ffff81003fe4adf0(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000392bb17560 CR3: 00000000195b9000 CR4: 00000000000006e0
Process mount (pid: 15299, threadinfo ffff81001bace000, task ffff810038a0a820)
Stack: 000000003ecf72f8 ffff81003ab16c78 0000000000000000 000000001dcd52ea
0000000000000000 0000000000000002 ffff81003205dc08 ffff810031be8010
dead4ead00000001 00000000ffffffff
Call Trace: <ffffffff886c2699>{:xfs:xfs_setsize_buftarg_flags+48}
<ffffffff886ba733>{:xfs:xfs_mount+1880} <ffffffff886c945c>{:xfs:linvfs_fill_super+0}
<ffffffff886c94ed>{:xfs:linvfs_fill_super+145} <ffffffff80187ade>{bd_claim+131}
<ffffffff8018756c>{get_sb_bdev+271} <ffffffff801e1f0a>{selinux_sb_copy_data+328}
<ffffffff8018730a>{do_kern_mount+156} <ffffffff8019bdf5>{do_mount+1755}
<ffffffff8019a93f>{mntput_no_expire+25} <ffffffff8018fe98>{link_path_walk+211}
<ffffffff8017bbd2>{poison_obj+38} <ffffffff8017bf38>{cache_free_debugcheck+547}
<ffffffff80190278>{do_path_lookup+610} <ffffffff8015ac5a>{audit_getname+145}
<ffffffff801604d8>{bad_range+20} <ffffffff801616e7>{get_page_from_freelist+710}
<ffffffff8016189a>{__alloc_pages+112} <ffffffff8019bef3>{sys_mount+140}
<ffffffff8010a91c>{tracesys+209}

Code: 48 f7 f3 48 0f af c1 41 0f b6 4d 7b 48 d3 e0 48 89 85 d0 03
RIP <ffffffff886b3e93>{:xfs:xfs_mountfs+1031} RSP <ffff81001bacfa28>

(The kernel is based on 2.6.16-rc4.)

Dave

2006-02-20 07:13:51

by Sonny Rao

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Sun, Feb 19, 2006 at 04:52:09PM -0500, Dave Jones wrote:
<snip>
> Just for kicks, I just hacked this up..
>
> #!/bin/bash
> wget http://www.digitaldwarf.be/products/mangle.c
> gcc mangle.c -o mangle
>
> dd if=/dev/zero of=data.img count=70000
>
> while [ 1 ];
> do
> mkfs.xfs -f data.img >/dev/null
> ./mangle data.img $RANDOM
> sudo mount -t xfs data.img mntpt -o loop
> sudo ls -R mntpt
> sudo umount mntpt
> done

Cool script. You might want to multiply $RANDOM by some factor (I used
8) to catch more cases; JFS, for example, doesn't put anything in the
first 32k, so the first time I ran it on JFS it did nothing ;-)


Reiserfs folks,

I also found an infinite loop in Reiserfs on 2.6.15. If the Reiser
folks are interested, I've gzipped the fs and put it here:

http://burdell.org/~sonny/data.img.breaks.reiserfs.gz

The fs is only 52k when gzipped, so it's not too bad to download.

This is under stock 2.6.15. Sorry I can't post dmesg output: I end up
having to reboot when it happens and don't have time to debug right
now. It looks like it's in the journal replay code, where it keeps
trying to grab some block with a ridiculously large offset.


>
> xfs wins the award for 'noisiest fs in the face of corruption' :-)
> After a few dozen backtraces from xfs_corruption_error,
> this fell out...
>
> divide error: 0000 [1] SMP
<snip trace>

> (The kernel is based on 2.6.16rc4)

I see a similar breakage (divide error) on x86 using 2.6.15

Sonny

2006-02-20 07:21:20

by Hans Reiser

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

Thanks kindly Sonny, Chris is this bug known/fixed?

Hans

Sonny Rao wrote:

><snip>

2006-02-20 16:45:58

by Sonny Rao

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

(trimmed the cc list a bit since this is all Reiserfs specific)

On Sun, Feb 19, 2006 at 11:21:13PM -0800, Hans Reiser wrote:
> Thanks kindly Sonny, Chris is this bug known/fixed?

Hi, I'm still seeing the issue on 2.6.16-rc4 so I don't think it's
fixed yet.

Here's some output:

Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: found reiserfs format "3.6" with standard journal
Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: using ordered data mode
Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: journal params: device loop0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: checking transaction log (loop0)
Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
Feb 20 10:36:27 localhost kernel: device blocksize: 4096
Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
Feb 20 10:36:27 localhost kernel: device blocksize: 4096
Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
Feb 20 10:36:27 localhost kernel: device blocksize: 4096
Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
Feb 20 10:36:27 localhost kernel: device blocksize: 4096
...
ad infinitum
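The two numbers in each __find_get_block_slow() message are related: 3472891923 is 0xCF002013, and 18446744072887476243 is 0xFFFFFFFFCF002013, i.e. the same 32-bit value sign-extended to 64 bits. The low 32 bits match b_blocknr, but the full 64-bit block never will, which fits a lookup that keeps failing. One plausible reading, inferred from the numbers rather than shown by the trace, is that a corrupted 32-bit block number with bit 31 set passed through a signed 32-bit type on its way in. A minimal C sketch of that widening, plus a hypothetical range check of the kind that would stop the retry (the helper names are illustrative, not ReiserFS functions):

```c
#include <stdint.h>

/* Widen a 32-bit block number the way a signed 32-bit intermediate
 * would: values with bit 31 set turn into huge 64-bit numbers. */
static uint64_t widen_via_signed32(uint32_t raw)
{
    /* Two's-complement interpretation done arithmetically, so the result
     * does not rely on implementation-defined narrowing conversions. */
    int64_t s = (raw >= UINT32_C(0x80000000))
                    ? (int64_t)raw - INT64_C(0x100000000)
                    : (int64_t)raw;
    return (uint64_t)s;            /* negative values wrap modulo 2^64 */
}

/* Widen it as an unsigned quantity instead. */
static uint64_t widen_unsigned(uint32_t raw)
{
    return (uint64_t)raw;
}

/* Hypothetical guard: reject any block past the end of the device
 * rather than retrying the lookup forever. */
static int block_in_range(uint64_t block, uint64_t dev_blocks)
{
    return block < dev_blocks;
}
```

The test image is 70000 512-byte sectors, i.e. 8750 blocks of 4 KiB, so both the sign-extended value and the raw 3472891923 fail block_in_range() against 8750; either way the block is bogus, and the check turns an endless retry into an I/O error.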

I'll try and add a dump_stack() to the code that prints this stuff later today

Sonny





2006-02-20 17:15:58

by Sonny Rao

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Mon, Feb 20, 2006 at 11:41:20AM -0500, Sonny Rao wrote:
> (trimmed the cc list a bit since this is all Reiserfs specific)
>
> On Sun, Feb 19, 2006 at 11:21:13PM -0800, Hans Reiser wrote:
> > Thanks kindly Sonny, Chris is this bug known/fixed?
>
> Hi, I'm still seeing the issue on 2.6.16-rc4 so I don't think it's
> fixed yet.
>
> Here's some output :
>
> Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: found reiserfs format "3.6" with standard journal
> Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: using ordered data mode
> Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: journal params: device loop0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
> Feb 20 10:36:25 localhost kernel: ReiserFS: loop0: checking transaction log (loop0)
> Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
> Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
> Feb 20 10:36:27 localhost kernel: device blocksize: 4096
> Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
> Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
> Feb 20 10:36:27 localhost kernel: device blocksize: 4096
> Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
> Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
> Feb 20 10:36:27 localhost kernel: device blocksize: 4096
> Feb 20 10:36:27 localhost kernel: __find_get_block_slow() failed. block=18446744072887476243, b_blocknr=3472891923
> Feb 20 10:36:27 localhost kernel: b_state=0x00000020, b_size=4096
> Feb 20 10:36:27 localhost kernel: device blocksize: 4096
> ...
> ad infinitum
>
> I'll try and add a dump_stack() to the code that prints this stuff later today

Ok, didn't take as long as I thought :)

Feb 20 11:03:57 localhost kernel: device blocksize: 4096
Feb 20 11:03:57 localhost kernel: [<b01042dd>] show_trace+0xd/0x10
Feb 20 11:03:57 localhost kernel: [<b01042f7>] dump_stack+0x17/0x20
Feb 20 11:03:57 localhost kernel: [<b0166973>] __find_get_block_slow+0x143/0x180
Feb 20 11:03:57 localhost kernel: [<b0168b72>] __find_get_block+0xf2/0x210
Feb 20 11:03:57 localhost kernel: [<b0168e59>] __getblk+0x1c9/0x280
Feb 20 11:03:57 localhost kernel: [<f1209125>] search_by_key+0xb5/0x1330 [reiserfs]
Feb 20 11:03:57 localhost kernel: [<f11f4130>] reiserfs_read_locked_inode+0x60/0x5e0 [reiserfs]
Feb 20 11:03:57 localhost kernel: [<f120207c>] reiserfs_fill_super+0xfec/0x1430 [reiserfs]
Feb 20 11:03:57 localhost kernel: [<b016b8f9>] get_sb_bdev+0xd9/0x107
Feb 20 11:03:57 localhost kernel: [<f11ff0eb>] get_super_block+0x1b/0x30 [reiserfs]
Feb 20 11:03:57 localhost kernel: [<b016ad1b>] do_kern_mount+0xbb/0x160
Feb 20 11:03:57 localhost kernel: [<b0182fcd>] do_mount+0x2bd/0x6f0
Feb 20 11:03:57 localhost kernel: [<b018346f>] sys_mount+0x6f/0xb0
Feb 20 11:03:57 localhost kernel: [<b0102f97>] sysenter_past_esp+0x54/0x75

So, search_by_key isn't terminating.

Also, FWIW, I don't see this bug on ppc64; I get this message instead:

ReiserFS: loop0: found reiserfs format "3.6" with standard journal
ReiserFS: loop0: using ordered data mode
ReiserFS: loop0: journal params: device loop0, size 8192, journal first block 18, max trans len 1024, max batch 900, max commit age 30, max trans age 30
ReiserFS: loop0: checking transaction log (loop0)
attempt to access beyond end of device
loop0: rw=0, want=18446744067132948640, limit=70000
ReiserFS: loop0: warning: vs-13070: reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [1 2 0x0 SD]
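The ppc64 message appears to report the same bogus block. The block layer prints want and limit in 512-byte sectors: limit=70000 is exactly the dd-created image (count=70000 at dd's default 512-byte block size), and a 4096-byte block spans 8 sectors. Multiplying 18446744072887476243 by 8 wraps modulo 2^64 to sector 18446744067132948632, and an 8-sector request starting there ends at 18446744067132948640, the reported want (assuming want is the request's start sector plus its length). The arithmetic as a small C sketch; the function names are hypothetical:

```c
#include <stdint.h>

#define SECTOR_SHIFT 9   /* the block layer counts in 512-byte sectors */

/* End sector (exclusive) of a request for one filesystem block, computed
 * in 64-bit unsigned arithmetic, so a huge block number wraps mod 2^64. */
static uint64_t end_sector(uint64_t block, unsigned blocksize)
{
    uint64_t sectors_per_block = blocksize >> SECTOR_SHIFT;
    return block * sectors_per_block + sectors_per_block;
}

/* Size of the test image in sectors, from dd's count and block size. */
static uint64_t image_sectors(uint64_t dd_count, unsigned dd_bs)
{
    return dd_count * dd_bs >> SECTOR_SHIFT;
}
```

So ppc64 and x86 agree on the bad block; the difference is that on ppc64 the request is caught by the end-of-device check and reported as an I/O failure instead of looping.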

2006-02-21 02:08:40

by Nathan Scott

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Mon, Feb 20, 2006 at 02:09:16AM -0500, Sonny Rao wrote:
> On Sun, Feb 19, 2006 at 04:52:09PM -0500, Dave Jones wrote:
> <snip>
> > Just for kicks, I just hacked this up..
> >
> > #!/bin/bash
> > wget http://www.digitaldwarf.be/products/mangle.c
> > gcc mangle.c -o mangle
> >
> > dd if=/dev/zero of=data.img count=70000
> >
> > while [ 1 ];
> > do
> > mkfs.xfs -f data.img >/dev/null
> > ./mangle data.img $RANDOM
> > sudo mount -t xfs data.img mntpt -o loop
> > sudo ls -R mntpt
> > sudo umount mntpt
> > done
> ...
> >
> > xfs wins the award for 'noisiest fs in the face of corruption' :-)
> > After a few dozen backtraces from xfs_corruption_error,
> > this fell out...
> >
> > divide error: 0000 [1] SMP
> <snip trace>
>
> > (The kernel is based on 2.6.16rc4)
>
> I see a similar breakage (divide error) on x86 using 2.6.15

From a quick look at the image you sent me, Sonny, I guess this is
the same problem Dave was seeing too -- a divide by zero when we're
working out some of the per-mount constants during mount(2). There
are probably one or two other superblock fields that could use more
verification, but this will do for now.

cheers.

--
Nathan


Index: xfs-linux/xfs_mount.c
===================================================================
--- xfs-linux.orig/xfs_mount.c
+++ xfs-linux/xfs_mount.c
@@ -268,9 +268,12 @@ xfs_mount_validate_sb(
sbp->sb_blocklog > XFS_MAX_BLOCKSIZE_LOG ||
sbp->sb_inodesize < XFS_DINODE_MIN_SIZE ||
sbp->sb_inodesize > XFS_DINODE_MAX_SIZE ||
+ sbp->sb_inodelog < XFS_DINODE_MIN_LOG ||
+ sbp->sb_inodelog > XFS_DINODE_MAX_LOG ||
+ (sbp->sb_blocklog - sbp->sb_inodelog != sbp->sb_inopblog) ||
(sbp->sb_rextsize * sbp->sb_blocksize > XFS_MAX_RTEXTSIZE) ||
(sbp->sb_rextsize * sbp->sb_blocksize < XFS_MIN_RTEXTSIZE) ||
- sbp->sb_imax_pct > 100)) {
+ (sbp->sb_imax_pct > 100 || sbp->sb_imax_pct < 1))) {
cmn_err(CE_WARN, "XFS: SB sanity check 1 failed");
XFS_CORRUPTION_ERROR("xfs_mount_validate_sb(3)",
XFS_ERRLEVEL_LOW, mp, sbp);
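Dave's divide error fits this explanation: his Code: line starts with 48 f7 f3, which decodes as div %rbx, and RBX is 0 in his register dump. The patch closes the hole by rejecting an inode-size log that is out of range or inconsistent with the block size (and a zero sb_imax_pct) before any per-mount constant is derived. The standalone C sketch below illustrates why the consistency check matters. It is not XFS code: the struct is a hypothetical subset of the superblock, the limits are assumed to mirror XFS's constants of the time (block sizes 512 bytes to 64 KiB, inode sizes 256 to 2048 bytes, 64 inodes per allocation chunk), and the chunk-size calculation is a simplified model of a derived divisor that can collapse to zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed limits, modelled on XFS's constants of the time. */
#define BLOCKLOG_MIN      9   /* 512-byte blocks */
#define BLOCKLOG_MAX     16   /* 64 KiB blocks */
#define INODELOG_MIN      8   /* 256-byte inodes */
#define INODELOG_MAX     11   /* 2 KiB inodes */
#define INODES_PER_CHUNK 64

/* Hypothetical subset of the on-disk superblock. */
struct sb_geom {
    uint8_t  blocklog;   /* log2(block size) */
    uint8_t  inodelog;   /* log2(inode size) */
    uint8_t  inopblog;   /* stored log2(inodes per block) */
    uint8_t  imax_pct;   /* max percentage of space used for inodes */
    uint64_t dblocks;    /* data blocks in the filesystem */
};

/* Checks in the spirit of the patch, run before anything is derived. */
static bool sb_geom_valid(const struct sb_geom *sb)
{
    return sb->blocklog >= BLOCKLOG_MIN && sb->blocklog <= BLOCKLOG_MAX &&
           sb->inodelog >= INODELOG_MIN && sb->inodelog <= INODELOG_MAX &&
           sb->blocklog - sb->inodelog == sb->inopblog &&
           sb->imax_pct >= 1 && sb->imax_pct <= 100;
}

/* Blocks per inode-allocation chunk (simplified model): the chunk holds at
 * least 64 inodes, divided by the *stored* inopblog. Caller must ensure
 * blocklog >= inodelog and inopblog < 64. A corrupted inopblog that is too
 * large shifts the result down to 0. */
static uint64_t chunk_blocks(const struct sb_geom *sb)
{
    unsigned true_log = sb->blocklog - sb->inodelog;   /* from the sizes */
    uint64_t inos = UINT64_C(1) << true_log;           /* inodes per block */
    if (inos < INODES_PER_CHUNK)
        inos = INODES_PER_CHUNK;
    return inos >> sb->inopblog;
}

/* Upper bound on the inode count, rounded down to whole chunks.
 * Returns 0 for a geometry that fails validation instead of dividing. */
static uint64_t max_inodes(const struct sb_geom *sb)
{
    if (!sb_geom_valid(sb))
        return 0;
    uint64_t cb = chunk_blocks(sb);                    /* >= 1 once valid */
    uint64_t blocks = sb->dblocks * sb->imax_pct / 100;
    return blocks / cb * cb << sb->inopblog;
}
```

With 4 KiB blocks and 256-byte inodes the stored inopblog must be 4. If corruption turns it into 7, the modelled chunk size shifts down to 0, which is exactly the kind of divisor the validation step rejects before the division is reached.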

2006-02-21 04:11:01

by Sonny Rao

Subject: Re: kernel oops: trying to mount a corrupted xfs partition (2.6.16-rc3)

On Tue, Feb 21, 2006 at 01:04:47PM +1100, Nathan Scott wrote:
> On Mon, Feb 20, 2006 at 02:09:16AM -0500, Sonny Rao wrote:
> > On Sun, Feb 19, 2006 at 04:52:09PM -0500, Dave Jones wrote:
> > <snip>
> > > Just for kicks, I just hacked this up..
> > >
> > > #!/bin/bash
> > > wget http://www.digitaldwarf.be/products/mangle.c
> > > gcc mangle.c -o mangle
> > >
> > > dd if=/dev/zero of=data.img count=70000
> > >
> > > while [ 1 ];
> > > do
> > > mkfs.xfs -f data.img >/dev/null
> > > ./mangle data.img $RANDOM
> > > sudo mount -t xfs data.img mntpt -o loop
> > > sudo ls -R mntpt
> > > sudo umount mntpt
> > > done
> > ...
> > >
> > > xfs wins the award for 'noisiest fs in the face of corruption' :-)
> > > After a few dozen backtraces from xfs_corruption_error,
> > > this fell out...
> > >
> > > divide error: 0000 [1] SMP
> > <snip trace>
> >
> > > (The kernel is based on 2.6.16rc4)
> >
> > I see a similar breakage (divide error) on x86 using 2.6.15
>
> From a quick look at the image you sent me Sonny, I guess this is
> the same problem Dave was seeing too -- a divide by zero when we're
> working out some of the per-mount constants during mount(2). There
> is probably one or two other superblock fields that could use more
> verification, but this will do for now.

Yep, this patch fixes it.

Sonny