2003-05-28 07:51:54

by Florian Weimer

Subject: [2.5.69] ext3 error: rec_len %% 4 != 0

Sometimes, our 2.5 test machine (actually, it's a production machine,
please don't ask why we can't use 2.4 *sigh*) stops with an ext3 error
message. We have now activated proper logging, and that's what we
got:

May 28 03:23:00 kernel: EXT3-fs error (device md0): ext3_readdir: bad entry in directory #16056745: rec_len %% 4 != 0 - offset=52, inode=431743, rec_len=37017, name_len=41
May 28 03:23:00 kernel: Aborting journal on device md0.
May 28 03:23:00 kernel: ext3_abort called.
May 28 03:23:00 kernel: EXT3-fs abort (device md0): ext3_journal_start: Detected aborted journal
May 28 03:23:00 kernel: Remounting filesystem read-only
May 28 03:23:00 kernel: EXT3-fs error (device md0) in ext3_commit_write: IO failure
May 28 03:23:00 kernel: EXT3-fs error (device md0) in start_transaction: Journal has aborted

What could cause this? Spurious data transmission errors? md0 is a
RAID-5, the machine is a Siemens Primergy H450 (Quad Pentium 4/Xeon, 4
GB RAM, two-channel Adaptec aic7899 Ultra160 SCSI adapter).

According to fsck.ext3, the on-disk data structures are clean, and if
I run "find" across the file system after the reboot, it doesn't
complain about bad directory entries, either.


2003-05-28 08:11:46

by Andrew Morton

Subject: Re: [2.5.69] ext3 error: rec_len %% 4 != 0

Florian Weimer <[email protected]> wrote:
>
> Sometimes, our 2.5 test machine (actually, it's a production machine,
> please don't ask why we can't use 2.4 *sigh*) stops with an ext3 error
> message. We have now activated proper logging, and that's what we
> got:
>
> May 28 03:23:00 kernel: EXT3-fs error (device md0): ext3_readdir: bad entry in directory #16056745: rec_len %% 4 != 0 - offset=52, inode=431743, rec_len=37017, name_len=41

Are you using htree? Run

dumpe2fs -h /dev/hda1 | grep features

and if it says "dir_index" then try turning it off:

tune2fs -O ^dir_index /dev/hda1

and reboot.
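
The feature check above can also be scripted. This is only a minimal sketch: it assumes the header text printed by "dumpe2fs -h" contains a "Filesystem features:" line followed by space-separated feature names. The helper name parse_features and the SAMPLE text are made up for illustration and are not output from the machine discussed here.

```python
def parse_features(dumpe2fs_header):
    """Return the set of feature names found in `dumpe2fs -h` style text."""
    for line in dumpe2fs_header.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    # No features line found: report an empty set rather than guessing.
    return set()


# Hypothetical header excerpt, for illustration only.
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal filetype needs_recovery sparse_super
"""

features = parse_features(SAMPLE)
uses_htree = "dir_index" in features
print("htree enabled" if uses_htree else "no dir_index")
```

If "dir_index" is in the returned set, the tune2fs command above is the next step.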

If it is not an htree problem (and htree seems pretty stable now) then
possibly the IO system has lost some data. If possible, try using a normal
old disk (no RAID).

Falling back to ext2 for a while would be interesting.

2003-05-28 08:37:35

by Florian Weimer

Subject: Re: [2.5.69] ext3 error: rec_len %% 4 != 0

Andrew Morton <[email protected]> writes:

> Are you using htree? Run
>
> dumpe2fs -h /dev/hda1 | grep features
>
> and if it says "dir_index" then try turning it off:
>
> tune2fs -O ^dir_index /dev/hda1
>
> and reboot.

No dir_index here, sorry.

> If it is not an htree problem (and htree seems pretty stable now)
> then possibly the IO system has lost some data. If possible, try
> using a normal old disk (no RAID).

Sorry, that's not possible, the data does not fit onto a single
disk. 8-(

> Falling back to ext2 for a while would be interesting.

Okay, will fall back to ext2 next time a reboot is required. (I guess
removing the has_journal feature using tune2fs is the easiest way to
do this, after a clean unmount, of course.)

2003-05-28 08:45:55

by Andrew Morton

Subject: Re: [2.5.69] ext3 error: rec_len %% 4 != 0

Florian Weimer <[email protected]> wrote:
>
> > Falling back to ext2 for a while would be interesting.
>
> Okay, will fall back to ext2 next time a reboot is required. (I guess
> removing the has_journal feature using tune2fs is the easiest way to
> do this, after a clean unmount, of course.)

No, just change its type to ext2 in /etc/fstab.

If it is the root filesystem, reboot with "rootfstype=ext2" on the
kernel command line and it will do what you want.

Make sure it's a clean shutdown though - ext2 will not mount a
needs-recovery ext3 filesystem.
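
Why a clean shutdown matters can be sketched with the superblock feature masks. The journal itself is a "compat" feature, which ext2 is free to ignore, while the needs-recovery flag is an "incompat" feature, and a driver must refuse to mount when it sees an incompat bit it does not know. The bit values below are the usual ext2/ext3 header constants as far as I know. The supported set is a deliberately simplified assumption, and ext2_can_mount is a made-up name, not a kernel function.

```python
# Feature bits (compat and incompat are separate superblock fields).
COMPAT_HAS_JOURNAL = 0x0004   # "has_journal"
INCOMPAT_FILETYPE = 0x0002    # "filetype"
INCOMPAT_RECOVER = 0x0004     # "needs_recovery"

# Assumption: a simplified set of incompat features the ext2 driver accepts.
EXT2_INCOMPAT_SUPP = INCOMPAT_FILETYPE


def ext2_can_mount(feature_compat, feature_incompat):
    """Mimic the mount-time rule: any unknown incompat bit means refuse."""
    # feature_compat is intentionally not inspected: unknown compat
    # features (such as has_journal) do not prevent mounting.
    unsupported = feature_incompat & ~EXT2_INCOMPAT_SUPP
    return unsupported == 0


# Cleanly unmounted ext3: journal present, no recovery pending.
clean = ext2_can_mount(COMPAT_HAS_JOURNAL, INCOMPAT_FILETYPE)
# Unclean ext3: the needs_recovery bit is still set.
dirty = ext2_can_mount(COMPAT_HAS_JOURNAL, INCOMPAT_FILETYPE | INCOMPAT_RECOVER)
print(clean, dirty)
```

So after a clean unmount there is no need to strip has_journal first.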

2003-05-30 15:02:37

by Florian Weimer

Subject: Re: [2.5.69] ext3 error: rec_len %% 4 != 0

Andrew Morton <[email protected]> writes:

> Florian Weimer <[email protected]> wrote:
>>
>> Sometimes, our 2.5 test machine (actually, it's a production machine,
>> please don't ask why we can't use 2.4 *sigh*) stops with an ext3 error
>> message. We have now activated proper logging, and that's what we
>> got:
>>
>> May 28 03:23:00 kernel: EXT3-fs error (device md0): ext3_readdir: bad entry in directory #16056745: rec_len %% 4 != 0 - offset=52, inode=431743, rec_len=37017, name_len=41

Another error message is the following one:

EXT3-fs error (device md0): ext3_readdir: bad entry in directory #12812298: directory entry across blocks - offset=0, inode=1308, rec_len=38720, name_len=225

Hmm.
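
For reference, both messages come from the sanity checks applied to each directory entry before it is used. Below is a rough sketch of those checks, modelled from memory on the ext3 directory-entry check and not copied from the kernel. The 4096-byte block size is an assumption (the thread does not state it), and the function names are made up.

```python
def dir_rec_len(name_len):
    """Minimal record length: 8-byte header plus name, rounded up to 4."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, rec_len, name_len, block_size=4096):
    """Return the first failed check as text, or None if the entry looks sane."""
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    # The entry must end inside the block it starts in.
    if (offset % block_size) + rec_len > block_size:
        return "directory entry across blocks"
    return None


# Values taken from the two log lines in this thread.
first = check_dir_entry(offset=52, rec_len=37017, name_len=41)
second = check_dir_entry(offset=0, rec_len=38720, name_len=225)
print(first)
print(second)
```

Both rec_len values are far larger than any block size in use here, so in either case the entry looks like arbitrary data rather than a slightly damaged directory entry.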

> Falling back to ext2 for a while would be interesting.

I see the following messages with dmesg, but they appear to be
non-critical:

init_special_inode: bogus i_mode (67)
init_special_inode: bogus i_mode (177766)
init_special_inode: bogus i_mode (5)
init_special_inode: bogus i_mode (65)
init_special_inode: bogus i_mode (53664)
init_special_inode: bogus i_mode (5)

Are they related? They didn't appear with ext3.
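
One way to read these numbers, assuming they are printed in octal (177766 would not fit a 16-bit mode field as a decimal number, and no digit exceeds 7): mask out the file-type bits and see whether they name a known type. The sketch below uses Python's stat module. classify_mode is a made-up helper name.

```python
import stat

KNOWN_TYPES = {
    stat.S_IFREG: "regular file",
    stat.S_IFDIR: "directory",
    stat.S_IFLNK: "symlink",
    stat.S_IFCHR: "char device",
    stat.S_IFBLK: "block device",
    stat.S_IFIFO: "fifo",
    stat.S_IFSOCK: "socket",
}


def classify_mode(octal_text):
    """Interpret a mode printed in octal and name its file type, if any."""
    mode = int(octal_text, 8)
    # S_IFMT() keeps only the file-type bits (mask 0o170000).
    return KNOWN_TYPES.get(stat.S_IFMT(mode), "bogus")


# The i_mode values reported by dmesg above.
reported = ["67", "177766", "5", "65", "53664", "5"]
results = {text: classify_mode(text) for text in reported}
print(results)
```

None of the reported values carries a valid file-type field, which suggests that the inode contents themselves are garbage and not merely an odd permission setting.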

2003-06-06 07:45:40

by Florian Weimer

Subject: Re: [2.5.69] ext3 error: rec_len %% 4 != 0

Florian Weimer <[email protected]> writes:

> I see the following messages with dmesg, but they appear to be
> non-critical:
>
> init_special_inode: bogus i_mode (67)
> init_special_inode: bogus i_mode (177766)
> init_special_inode: bogus i_mode (5)
> init_special_inode: bogus i_mode (65)
> init_special_inode: bogus i_mode (53664)
> init_special_inode: bogus i_mode (5)
>
> Are they related? They didn't appear with ext3.

Some more data points:

o 2.5.70 fails as well.

o ext2 on 2.5.70 fails, too.

o The errors quoted above are definitely critical. 8-(

o It seems as if the kernel assumes that a file is a directory. (At
  least after running 2.5.70/ext2 for a while, e2fsck had some
  problems regenerating a proper file system structure and marked a
  few files as directories.)

o Vanilla 2.4.20 is rock solid on the same hardware (no more file
  system corruption after the downgrade).