2007-10-17 00:26:29

by Martin Habets

Subject: ext3 warnings from LTP rename14

Hello,

I ran the ltp-full-20070930 tests on 2.6.23-rc9-mph4 (sparc32 SMP), and am
seeing the following warnings:

EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
cEXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
oEXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
mEXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs warning (device sdb5): ext3_rename: Deleting old file (215845), 2, error=-2
EXT3-fs error (device sdb5): ext3_add_entry: bad entry in directory #215845: rec_len is smaller than minimal - offset=56, inode=8026488, rec_len=0, name_len=0
Aborting journal on device sdb5.
EXT3-fs error (device sdb5) in start_transaction: Readonly filesystem
Aborting journal on device sdb5.
ext3_abort called.

The first ones are triggered by the rename14 test (code attached), but so far
I cannot reproduce the latter issue (the ext3_add_entry error and journal abort).
Could it be a result of the earlier warnings?
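
For readers unfamiliar with the "rec_len is smaller than minimal" message, here is a minimal sketch of the kind of sanity check that produces it. It assumes the standard ext2/ext3 on-disk directory entry header (4-byte inode, 2-byte rec_len, 1-byte name_len, 1-byte file type, all little-endian) and a minimal record length of 12 bytes. The function names `dir_rec_len` and `check_dir_entry` are hypothetical, and this is an illustration written from memory of the check, not the kernel code itself.

```python
import struct

def dir_rec_len(name_len):
    """Smallest record that holds an 8-byte header plus the name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3

def check_dir_entry(block, offset):
    """Return None if the entry at `offset` looks sane, else a reason string.

    Assumption: the header is <inode:u32, rec_len:u16, name_len:u8, type:u8>,
    stored little-endian on disk.
    """
    _inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, offset)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > len(block):
        return "directory entry across blocks"
    return None

# Rebuild the values reported in the log: offset=56, inode=8026488,
# rec_len=0, name_len=0.
block = bytearray(4096)
struct.pack_into("<IHBB", block, 0, 2, 12, 1, 2)     # a plausible "." entry
block[8:9] = b"."
struct.pack_into("<IHBB", block, 56, 8026488, 0, 0, 0)

print(check_dir_entry(block, 0))
print(check_dir_entry(block, 56))
```

With rec_len=0 the first test already fails, which matches the wording of the error. As a side observation, 8026488 is 0x007a7978, so the four little-endian bytes of that "inode" field are 'x', 'y', 'z' and NUL; whether that is a coincidence is not established here.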

An attempt to avoid the warning is also attached. But with that I see:
EXT3-fs warning (device sdb4): ext3_unlink: Deleting nonexistent file (305412), 0

Best regards,
Martin
---------------------------------------------------------------------------
30 years from now GNU/Linux will be as redundant a term as MERT/UNIX is
today. - Martin Habets
---------------------------------------------------------------------------


Attachments:
(No filename) (2.30 kB)
rename14.c (4.24 kB)
namei.c.patch (1.14 kB)

2007-10-17 01:21:00

by Eric Sandeen

Subject: Re: ext3 warnings from LTP rename14

Martin Habets wrote:
> Hello,
>
> I ran the ltp-full-20070930 tests on 2.6.23-rc9-mph4 (sparc32 SMP), and am
> seeing the following warnings:

This makes me a little nervous about my change
ef2b02d3e617cb0400eedf2668f86215e1b0e6af
(ext34: ensure do_split leaves enough free space in both blocks)

Do you know when this first showed up? Could you test -rc6?

You say you can reproduce it; have you checked (fsck'd) the filesystem
in between, and is it in good shape?

-Eric

2007-10-17 02:17:47

by Eric Sandeen

Subject: Re: ext3 warnings from LTP rename14

Eric Sandeen wrote:
> Martin Habets wrote:
>> Hello,
>>
>> I ran the ltp-full-20070930 tests on 2.6.23-rc9-mph4 (sparc32 SMP), and am
>> seeing the following warnings:
>
> This makes me a little nervous about my change
> ef2b02d3e617cb0400eedf2668f86215e1b0e6af
> (ext34: ensure do_split leaves enough free space in both blocks)
>
> Do you know when this first showed up? Could you test -rc6?
>
> You say you can reproduce it; have you checked (fsck'd) the filesystem
> in between, and is it in good shape?
>
> -Eric

FWIW, I ran rename14 "standalone" a few times on 2.6.23.1 with no
problems...

[root@bear-05 ltp-full-20070930]# ./testcases/kernel/syscalls/rename/rename14
rename14 1 PASS : Test Passed
[root@bear-05 ltp-full-20070930]# ./testcases/kernel/syscalls/rename/rename14
rename14 1 PASS : Test Passed
[root@bear-05 ltp-full-20070930]# ./testcases/kernel/syscalls/rename/rename14
rename14 1 PASS : Test Passed
[root@bear-05 ~]# uname -a
Linux bear-05.lab.msp.redhat.com 2.6.23.1 #1 SMP Mon Oct 15 15:28:08 CDT 2007 i686 athlon i386 GNU/Linux


-Eric

2007-10-18 00:09:06

by Martin Habets

Subject: Re: ext3 warnings from LTP rename14

Hi,

On Tue, Oct 16, 2007 at 08:20:59PM -0500, Eric Sandeen wrote:
> Martin Habets wrote:
> > Hello,
> >
> > I ran the ltp-full-20070930 tests on 2.6.23-rc9-mph4 (sparc32 SMP), and am
> > seeing the following warnings:
>
> This makes me a little nervous about my change
> ef2b02d3e617cb0400eedf2668f86215e1b0e6af
> (ext34: ensure do_split leaves enough free space in both blocks)
>
> Do you know when this first showed up? Could you test -rc6?

Thanks for the input.
I tried this on -rc6, and it behaves worse. Besides the earlier warnings
EXT3-fs warning (device sdb4): ext3_rename: Deleting old file (268753), 2, error=-2
I also get a lot of these
EXT3-fs error (device sdb4): ext3_add_entry: bad entry in directory #268753: rec_len is smaller than minimal - offset=56, inode=8026488, rec_len=0, name_len=0
and these at the end of the test:
EXT3-fs warning (device sdb4): ext3_unlink: Deleting nonexistent file (268761), 0
EXT3-fs error (device sdb4): ext3_readdir: bad entry in directory #268753: rec_len is smaller than minimal - offset=56, inode=8026488, rec_len=0, name_len=0
EXT3-fs error (device sdb4): empty_dir: bad entry in directory #268753: rec_len is smaller than minimal - offset=56, inode=8026488, rec_len=0, name_len=0

The FS is still mounted rw.

> You say you can reproduce it; have you checked (fsck'd) the filesystem
> in between, and is it in good shape?

Like you, I am rerunning rename14 stand-alone at the moment.
I didn't do an explicit fsck between all runs, but I did do 2 reboots,
which checked the FS.

I'll try to bisect the original ext3_rename warning further.
--
Martin

2007-10-18 06:08:58

by Andreas Dilger

Subject: Re: ext3 warnings from LTP rename14

On Oct 16, 2007 21:17 -0500, Eric Sandeen wrote:
> Eric Sandeen wrote:
> > Martin Habets wrote:
> >> I ran the ltp-full-20070930 tests on 2.6.23-rc9-mph4 (sparc32 SMP), and am
> >> seeing the following warnings:
> >
> > This makes me a little nervous about my change
> > ef2b02d3e617cb0400eedf2668f86215e1b0e6af
> > (ext34: ensure do_split leaves enough free space in both blocks)
> >
> > Do you know when this first showed up? Could you test -rc6?
> >
> > You say you can reproduce it; have you checked (fsck'd) the filesystem
> > in between, and is it in good shape?
> >
> > -Eric
>
> FWIW, I ran rename14 "standalone" a few times on 2.6.23.1 with no
> problems...
>
> [root@bear-05 ltp-full-20070930]# ./testcases/kernel/syscalls/rename/rename14
> rename14 1 PASS : Test Passed
> [root@bear-05 ltp-full-20070930]# ./testcases/kernel/syscalls/rename/rename14
> rename14 1 PASS : Test Passed
> [root@bear-05 ltp-full-20070930]# ./testcases/kernel/syscalls/rename/rename14
> rename14 1 PASS : Test Passed
> [root@bear-05 ~]# uname -a
> Linux bear-05.lab.msp.redhat.com 2.6.23.1 #1 SMP Mon Oct 15 15:28:08 CDT 2007 i686 athlon i386 GNU/Linux

It is probably significant that the original machine is a sparc32 (big
endian). I'd suspect you can reproduce this on a PPC system also. You
might also consider running sparse on ext3/ext4 in case you missed an
le32_to_cpu() or something.
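
To illustrate what a missed conversion would look like: ext3 stores its on-disk integers little-endian, so on a big-endian host le32_to_cpu() has to swap bytes, while on a little-endian host it is a no-op. The sketch below uses hypothetical helper names (`read_le32`, `read_without_conversion`) and simply reuses the directory inode number from the first warning as sample data; it does not claim that this particular field is the one affected.

```python
import struct

def read_le32(raw):
    """What le32_to_cpu() yields on any host: decode the bytes as little-endian."""
    return struct.unpack("<I", raw)[0]

def read_without_conversion(raw):
    """What a big-endian host sees if the conversion is forgotten."""
    return struct.unpack(">I", raw)[0]

# Sample value only: the directory inode number from the first warning.
on_disk = struct.pack("<I", 215845)

print(read_le32(on_disk))
print(hex(read_without_conversion(on_disk)))
```

On x86 both readings agree, which is one reason a bug of this class can go unnoticed until the code runs on sparc or PPC.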

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

2007-10-18 14:43:38

by Eric Sandeen

Subject: Re: ext3 warnings from LTP rename14

Andreas Dilger wrote:

> It is probably significant that the original machine is a sparc32 (big
> endian).

Oh, true.

> I'd suspect you can reproduce this on a PPC system also. You
> might also consider running sparse on ext3/ext4 in case you missed an
> le32_to_cpu() or something.

sparse checks out OK, and -rc6, which did not have that change, showed
the same behavior.

-Eric

2007-10-21 13:21:56

by Martin Habets

Subject: Re: ext3 warnings from LTP rename14

I had a chance to try this on 2.6.23 today. The output from
rename14 was:

rename14 1 PASS : Test Passed
rename14 0 WARN : tst_rmdir(): rmobj(/tmp/renD0iZU9) failed: lstat(/tmp/renD0iZU9/rename14) failed; errno=2: No such file or directory

with these messages on the console:

EXT3-fs warning (device sdb4): ext3_rename: Deleting old file (162881), 2, error=-2
EXT3-fs warning (device sdb4): ext3_rename: Deleting old file (162881), 2, error=-2
EXT3-fs warning (device sdb4): ext3_rename: Deleting old file (162881), 2, error=-2
EXT3-fs warning (device sdb4): ext3_rename: Deleting old file (162881), 2, error=-2
EXT3-fs warning (device sdb4): ext3_unlink: Deleting nonexistent file (162885), 0
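
When comparing runs it can help to pull the fields out of these warnings mechanically. The following is a rough sketch; the regular expression and the name `parse_warning` are illustrative only, and reading the middle number as a link count is an assumption. The error=-2 value is a negated errno, and errno 2 is ENOENT, the same "No such file or directory" that tst_rmdir() reported.

```python
import errno
import re

# Matches both warning shapes seen in this thread; "error=" is optional.
WARNING_RE = re.compile(
    r"EXT3-fs warning \(device (?P<dev>\w+)\): (?P<func>\w+): "
    r"(?P<msg>[^(]+)\((?P<ino>\d+)\), (?P<count>\d+)(?:, error=(?P<err>-?\d+))?"
)

def parse_warning(line):
    """Return the fields of one EXT3-fs warning line, or None if it does not match."""
    m = WARNING_RE.search(line)
    if m is None:
        return None
    err = m.group("err")
    return {
        "device": m.group("dev"),
        "function": m.group("func"),
        "message": m.group("msg").strip(),
        "inode": int(m.group("ino")),
        "count": int(m.group("count")),  # assumed to be a link count
        "errno_name": errno.errorcode.get(-int(err)) if err is not None else None,
    }

rename_warning = parse_warning(
    "EXT3-fs warning (device sdb4): ext3_rename: Deleting old file (162881), 2, error=-2"
)
unlink_warning = parse_warning(
    "EXT3-fs warning (device sdb4): ext3_unlink: Deleting nonexistent file (162885), 0"
)
print(rename_warning)
print(unlink_warning)
```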

After this, umount /tmp resulted in the output below.
The filesystem shows up in mount, but another umount says it's not
mounted. After a hard reboot the fsck gives the following:

/dev/sdb4: Truncating orphaned inode 162884 (uid=1001, gid=1001, mode=0100644, size=0)
/dev/sdb4: Truncating orphaned inode 162882 (uid=1001, gid=1001, mode=0100644, size=0)
/dev/sdb4: Truncating orphaned inode 162883 (uid=1001, gid=1001, mode=0100644, size=0)
/dev/sdb4: Superblock last write time is in the future. FIXED.
/dev/sdb4: clean, 19/488640 files, 69643/976600 blocks

The filesystem is okay again after this.

I am not able to interpret the output below. Does it give any clues as
to the cause of the problem, given that it is probably an endian issue?

The other bit of info I have is that 2.6.21 works okay. I have not yet had a
chance to bisect this further.

Martin

---
palantir9:~# umount /tmp
EXT3 Inode f10f18f0: orphan list check failed!
f10f18f0: 00000000 00000000 00000000 00000000
f10f1900: 00000000 00000000 00000000 00000000
f10f1910: 00000000 00000000 00000000 00000000
f10f1920: 00000000 00000000 00000000 00000000
f10f1930: 00000000 00000000 00000000 00000028
f10f1940: 00000000 00000000 00000000 f0e8df0c
f10f1950: f10f179c 00000000 00000000 00000000
f10f1960: 00000000 00000001 00000000 f10f196c
f10f1970: f10f196c 00000000 00000000 00000000
f10f1980: 00100100 00200200 f10f1988 f10f1988
f10f1990: f10f1990 f10f1990 00027c43 00000000
f10f19a0: ffffffff 000003e9 000003e9 00000000
f10f19b0: 00000001 00000000 00000000 00000000
f10f19c0: 00000000 471b498a 00000000 471b498a
f10f19d0: 00000000 471b498b 00000000 0000000a
f10f19e0: 00000000 000081a4 00000000 00000001
f10f19f0: 00000000 f10f19f4 f10f19f4 00000000
f10f1a00: 00000000 f10f1a04 f10f1a04 f01a0674
f10f1a10: f01a06cc f0e97200 00000000 f10f1a20
f10f1a20: f10f1978 00000000 00000020 00000000
f10f1a30: 00000000 00000000 00000000 00010001
f10f1a40: f10f1a40 f10f1a40 00000000 00000000
f10f1a50: 00000000 00000000 f01a0800 001200d2
f10f1a60: f0d8fc6c 00000000 f10f1a68 f10f1a68
f10f1a70: 00000000 f10f1a74 f10f1a74 00000000
f10f1a80: 00000000 7588799e 00000040 00000000
f10f1a90: 00000000 00000000 00000000 00000000
[f0096b78 : dispose_list+0xc8/0x110 ] [f0096cb8 : invalidate_inodes+0xf8/0x108 ]
[f008302c : generic_shutdown_super+0xa8/0x134 ] [f0083dcc : kill_block_super+0x10/0x24 ]
[f0082e7c : deactivate_super+0x50/0x6c ] [f0099f9c : sys_umount+0x40/0x238 ]
[f001541c : syscall_is_too_hard+0x3c/0x40 ] [00012438 : 0x12440 ]
EXT3 Inode f10f1740: orphan list check failed!
f10f1740: 00000000 00000000 00000000 00000000
f10f1750: 00000000 00000000 00000000 00000000
f10f1760: 00000000 00000000 00000000 00000000
f10f1770: 00000000 00000000 00000000 00000000
f10f1780: 00000000 00000000 00027c43 00000028
f10f1790: 00000000 00000000 00000000 f10f194c
f10f17a0: f10f15ec 00000000 00000000 00000000
f10f17b0: 00000000 00000001 00000000 f10f17bc
f10f17c0: f10f17bc 00000000 00000000 00000000
f10f17d0: 00100100 00200200 f10f17d8 f10f17d8
f10f17e0: f10f17e0 f10f17e0 00027c42 00000000
f10f17f0: ffffffff 000003e9 000003e9 00000000
f10f1800: 00000001 00000000 00000000 00000000
f10f1810: 00000000 471b498b 00000000 471b498b
f10f1820: 00000000 471b498b 00000000 0000000a
f10f1830: 00000000 000081a4 00000000 00000001
f10f1840: 00000000 f10f1844 f10f1844 00000000
f10f1850: 00000000 f10f1854 f10f1854 f01a0674
f10f1860: f01a06cc f0e97200 00000000 f10f1870
f10f1870: f10f17c8 00000000 00000020 00000000
f10f1880: 00000000 00000000 00000000 00010001
f10f1890: f10f1890 f10f1890 00000000 00000000
f10f18a0: 00000000 00000000 f01a0800 001200d2
f10f18b0: f0d8fc6c 00000000 f10f18b8 f10f18b8
f10f18c0: 00000000 f10f18c4 f10f18c4 00000000
f10f18d0: 00000000 758879a0 00000040 00000000
f10f18e0: 00000000 00000000 00000000 00000000
[f0096b78 : dispose_list+0xc8/0x110 ] [f0096cb8 : invalidate_inodes+0xf8/0x108 ]
[f008302c : generic_shutdown_super+0xa8/0x134 ] [f0083dcc : kill_block_super+0x10/0x24 ]
[f0082e7c : deactivate_super+0x50/0x6c ] [f0099f9c : sys_umount+0x40/0x238 ]
[f001541c : syscall_is_too_hard+0x3c/0x40 ] [00012438 : 0x12440 ]
EXT3 Inode f10f1590: orphan list check failed!
f10f1590: 00000000 00000000 00000000 00000000
f10f15a0: 00000000 00000000 00000000 00000000
f10f15b0: 00000000 00000000 00000000 00000000
f10f15c0: 00000000 00000000 00000000 00000000
f10f15d0: 00000000 00000000 00027c42 00000028
f10f15e0: 00000000 00000000 00000000 f10f179c
f10f15f0: f0e8df0c 00000000 00000000 00000000
f10f1600: 00000000 00000001 00000000 f10f160c
f10f1610: f10f160c 00000000 00000000 00000000
f10f1620: 00100100 00200200 f10f1628 f10f1628
f10f1630: f10f1630 f10f1630 00027c44 00000000
f10f1640: ffffffff 000003e9 000003e9 00000000
f10f1650: 00000001 00000000 00000000 00000000
f10f1660: 00000000 471b498b 00000000 471b498b
f10f1670: 00000000 471b498b 00000000 0000000a
f10f1680: 00000000 000081a4 00000000 00000001
f10f1690: 00000000 f10f1694 f10f1694 00000000
f10f16a0: 00000000 f10f16a4 f10f16a4 f01a0674
f10f16b0: f01a06cc f0e97200 00000000 f10f16c0
f10f16c0: f10f1618 00000000 00000020 00000000
f10f16d0: 00000000 00000000 00000000 00010001
f10f16e0: f10f16e0 f10f16e0 00000000 00000000
f10f16f0: 00000000 00000000 f01a0800 001200d2
f10f1700: f0d8fc6c 00000000 f10f1708 f10f1708
f10f1710: 00000000 f10f1714 f10f1714 00000000
f10f1720: 00000000 758879a4 00000040 00000000
f10f1730: 00000000 00000000 00000000 00000000
[f0096b78 : dispose_list+0xc8/0x110 ] [f0096cb8 : invalidate_inodes+0xf8/0x108 ]
[f008302c : generic_shutdown_super+0xa8/0x134 ] [f0083dcc : kill_block_super+0x10/0x24 ]
[f0082e7c : deactivate_super+0x50/0x6c ] [f0099f9c : sys_umount+0x40/0x238 ]
[f001541c : syscall_is_too_hard+0x3c/0x40 ] [00012438 : 0x12440 ]
sb orphan head is 162884
sb_info orphan list:
inode sdb4:162884 at f10f1618: mode 100644, nlink -1, next 162882
inode sdb4:162882 at f10f17c8: mode 100644, nlink -1, next 162883
inode sdb4:162883 at f10f1978: mode 100644, nlink -1, next 0
Assertion failure in ext3_put_super() at fs/ext3/super.c:423: "list_empty(&sbi->s_orphan)"
kernel BUG at fs/ext3/super.c:423!
\|/ ____ \|/
"@'/ ,. \`@"
/_| \__/ |_\
\__U_/
umount(769): Kernel bad trap [#1]
PSR: 1e4000c3 PC: f00d0b80 NPC: f00d0b84 Y: 00000000 Not tainted
PC: <ext3_put_super+0x138/0x224>
%G: 000001a7 1e4000e2 f01f5084 f01f5080 f00368f0 f010e97c f0f7c000 00000100
%O: 00000026 f01cdf08 000001a7 000001a7 f01cdf18 ffffffff f0f7dce8 f00d0b78
RPC: <ext3_put_super+0x130/0x224>
%L: f01cdc00 f0e8df0c f0e97200 f01cdc00 0000000a f0235400 00000001 f01fb000
%I: f0e8dc00 1e4000e4 00000001 00000063 00000078 f0f7dcf0 f0f7dd58 f00830b4
Caller[f00830b4]: generic_shutdown_super+0x130/0x134
Caller[f0083dcc]: kill_block_super+0x10/0x24
Caller[f0082e7c]: deactivate_super+0x50/0x6c
Caller[f0099f9c]: sys_umount+0x40/0x238
Caller[f001541c]: syscall_is_too_hard+0x3c/0x40
Caller[00012438]: 0x12440
Instruction DUMP: 90142308 7ffd1642 921021a7 <91d02005> 7fff55da d004a0a0 d0062318 80a22000 2280000d

WARNING: at kernel/exit.c:890 do_exit()
[f0016754 : do_hw_interrupt+0x50/0x8c ] [f00146a0 : bad_trap_handler+0x28/0x30 ]
[f00d0b78 : ext3_put_super+0x130/0x224 ] [f00830b4 : generic_shutdown_super+0x130/0x134 ]
[f0083dcc : kill_block_super+0x10/0x24 ] [f0082e7c : deactivate_super+0x50/0x6c ]
[f0099f9c : sys_umount+0x40/0x238 ] [f001541c : syscall_is_too_hard+0x3c/0x40 ]
[00012438 : 0x12440 ]
Killed
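
A few words in the hex dumps above can be cross-checked against the rest of the output without knowing the exact struct layout. The sketch below (helper names `as_mode`, `as_utc` and `orphan_chain` are hypothetical) only shows that certain values coincide: 00027c42..00027c44 are the three orphan inode numbers, 000081a4 is mode 0100644, 000003e9 is uid/gid 1001, and 471b498a is a Unix timestamp from the day of the test. Which field of the in-memory inode each word belongs to is an assumption left open here.

```python
import re
import stat
from datetime import datetime, timezone

ORPHAN_RE = re.compile(
    r"inode \w+:(\d+) at [0-9a-f]+: mode \d+, nlink -?\d+, next (\d+)"
)

def orphan_chain(lines):
    """Return (inode, next) pairs in the order the kernel printed them."""
    return [(int(m.group(1)), int(m.group(2)))
            for m in map(ORPHAN_RE.search, lines) if m]

def as_mode(word):
    """Render a 32-bit dump word as an ls-style file mode string."""
    return stat.filemode(int(word, 16))

def as_utc(word):
    """Render a 32-bit dump word as a UTC timestamp."""
    return datetime.fromtimestamp(int(word, 16), tz=timezone.utc).isoformat()

chain = orphan_chain([
    "inode sdb4:162884 at f10f1618: mode 100644, nlink -1, next 162882",
    "inode sdb4:162882 at f10f17c8: mode 100644, nlink -1, next 162883",
    "inode sdb4:162883 at f10f1978: mode 100644, nlink -1, next 0",
])
fsck_orphans = {162884, 162882, 162883}  # from the fsck output after reboot

print(chain)
print({int(w, 16) for w in ("00027c42", "00027c43", "00027c44")} == fsck_orphans)
print(as_mode("000081a4"), int("000003e9", 16), as_utc("471b498a"))
```

The chain printed by the kernel and the inodes that fsck later truncated are the same three files, so the on-disk orphan list appears to have been consistent with the in-memory one at umount time.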


On Thu, Oct 18, 2007 at 09:43:34AM -0500, Eric Sandeen wrote:
> Andreas Dilger wrote:
>
> > It is probably significant that the original machine is a sparc32 (big
> > endian).
>
> Oh, true.
>
> > I'd suspect you can reproduce this on a PPC system also. You
> > might also consider running sparse on ext3/ext4 in case you missed an
> > le32_to_cpu() or something.
>
> sparse checks out ok... and, -rc6 which did not have that change showed
> the same behavior....
>
> -Eric