Sometimes, our 2.5 test machine (actually, it's a production machine,
please don't ask why we can't use 2.4 *sigh*) stops with an ext3 error
message. We have now activated proper logging, and that's what we
got:
May 28 03:23:00 kernel: EXT3-fs error (device md0): ext3_readdir: bad entry in directory #16056745: rec_len %% 4 != 0 - offset=52, inode=431743, rec_len=37017, name_len=41
May 28 03:23:00 kernel: Aborting journal on device md0.
May 28 03:23:00 kernel: ext3_abort called.
May 28 03:23:00 kernel: EXT3-fs abort (device md0): ext3_journal_start: Detected aborted journal
May 28 03:23:00 kernel: Remounting filesystem read-only
May 28 03:23:00 kernel: EXT3-fs error (device md0) in ext3_commit_write: IO failure
May 28 03:23:00 kernel: EXT3-fs error (device md0) in start_transaction: Journal has aborted
What could cause this? Spurious data transmission errors? md0 is a
RAID-5, the machine is a Siemens Primergy H450 (Quad Pentium 4/Xeon, 4
GB RAM, two-channel Adaptec aic7899 Ultra160 SCSI adapter).
According to fsck.ext3, the on-disk data structures are clean, and if
I run "find" across the file system after the reboot, it doesn't
complain about bad directory entries, either.
Florian Weimer wrote:
>
> Sometimes, our 2.5 test machine (actually, it's a production machine,
> please don't ask why we can't use 2.4 *sigh*) stops with an ext3 error
> message. We have now activated proper logging, and that's what we
> got:
>
> May 28 03:23:00 kernel: EXT3-fs error (device md0): ext3_readdir: bad entry in directory #16056745: rec_len %% 4 != 0 - offset=52, inode=431743, rec_len=37017, name_len=41
Are you using htree? Run
dumpe2fs -h /dev/hda1 | grep features
and if it says "dir_index" then try turning it off:
tune2fs -O ^dir_index /dev/hda1
and reboot.
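A small sketch of the same check in Python instead of grep. It assumes the output of `dumpe2fs -h` has been captured as text; `has_fs_feature`, `device_uses_htree` and the sample dump are hypothetical and only for illustration. The helper looks for the `Filesystem features:` line and tests whether `dir_index` appears in its space-separated list.

```python
import subprocess


def has_fs_feature(dumpe2fs_output: str, feature: str) -> bool:
    """Return True if `feature` is listed on the 'Filesystem features:' line.

    Hypothetical helper: it only parses text and never touches a device.
    """
    for line in dumpe2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the first colon is a space-separated list.
            names = line.split(":", 1)[1].split()
            return feature in names
    return False


def device_uses_htree(device: str) -> bool:
    """Run `dumpe2fs -h` on `device` (usually needs root), check dir_index."""
    out = subprocess.run(
        ["dumpe2fs", "-h", device],
        capture_output=True, text=True, check=True,
    ).stdout
    return has_fs_feature(out, "dir_index")


# Made-up sample of a superblock dump, for illustration only.
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal filetype needs_recovery sparse_super
"""
print(has_fs_feature(SAMPLE, "dir_index"))  # False
```

Turning the feature off would still be done with `tune2fs -O ^dir_index`; the sketch only automates the check.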
If it is not an htree problem (and htree seems pretty stable now) then
possibly the IO system has lost some data. If possible, try using a normal
old disk (no RAID).
Falling back to ext2 for a while would be interesting.
Andrew Morton writes:
> Are you using htree? Run
>
> dumpe2fs -h /dev/hda1 | grep features
>
> and if it says "dir_index" then try turning it off:
>
> tune2fs -O ^dir_index /dev/hda1
>
> and reboot.
No dir_index here, sorry.
> If it is not an htree problem (and htree seems pretty stable now)
> then possibly the IO system has lost some data. If possible, try
> using a normal old disk (no RAID).
Sorry, that's not possible; the data does not fit onto a single
disk. 8-(
> Falling back to ext2 for a while would be interesting.
Okay, will fall back to ext2 next time a reboot is required. (I guess
removing the has_journal feature using tune2fs is the easiest way to
do this, after a clean unmount, of course.)
Florian Weimer wrote:
>
> > Falling back to ext2 for a while would be interesting.
>
> Okay, will fall back to ext2 next time a reboot is required. (I guess
> removing the has_journal feature using tune2fs is the easiest way to
> do this, after a clean unmount, of course.)
No, just change its type to ext2 in /etc/fstab.
If it is the root filesystem, reboot with "rootfstype=ext2" on the
kernel command line and it will do what you want.
Make sure it's a clean shutdown though - ext2 will not mount a
needs-recovery ext3 filesystem.
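Changing the type in /etc/fstab amounts to editing the third field of one line. A minimal Python sketch of that edit, assuming the usual whitespace-separated fstab fields (device, mount point, type, options, dump, pass); `fstab_to_ext2` and the `/srv` mount point in the test data are hypothetical. It only rewrites text; the filesystem still has to be cleanly unmounted beforehand, because ext2 will not mount a needs-recovery ext3 filesystem.

```python
def fstab_to_ext2(fstab_text: str, mount_point: str) -> str:
    """Return fstab text with the ext3 entry for `mount_point` set to ext2.

    Hypothetical helper: comments and all other entries are left untouched.
    """
    out = []
    for line in fstab_text.splitlines(keepends=True):
        fields = line.split()
        is_target = (
            len(fields) >= 3
            and not fields[0].startswith("#")
            and fields[1] == mount_point
            and fields[2] == "ext3"
        )
        if is_target:
            fields[2] = "ext2"
            # Keep the original line ending (the last line may lack one).
            end = "\n" if line.endswith("\n") else ""
            line = "\t".join(fields) + end
        out.append(line)
    return "".join(out)
```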
Andrew Morton writes:
> Florian Weimer wrote:
>>
>> Sometimes, our 2.5 test machine (actually, it's a production machine,
>> please don't ask why we can't use 2.4 *sigh*) stops with an ext3 error
>> message. We have now activated proper logging, and that's what we
>> got:
>>
>> May 28 03:23:00 kernel: EXT3-fs error (device md0): ext3_readdir: bad entry in directory #16056745: rec_len %% 4 != 0 - offset=52, inode=431743, rec_len=37017, name_len=41
Another error message is the following one:
EXT3-fs error (device md0): ext3_readdir: bad entry in directory #12812298: directory entry across blocks - offset=0, inode=1308, rec_len=38720, name_len=225
Hmm.
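Both ext3_readdir messages come from a sanity check on each directory entry. The Python sketch below is modelled on the checks that, as far as I recall, ext3_check_dir_entry performs (the order and wording are from memory and should be verified against the kernel source). It assumes `offset` is the offset within the directory block and a 4096-byte block size; `check_dir_entry` and `dir_rec_len` are hypothetical names. Fed the two logged entries, it reports "rec_len % 4 != 0" for the first and "directory entry across blocks" for the second. Both rec_len values (37017 and 38720) are far larger than any ext3 block size, which would be consistent with garbage in the directory block rather than a slightly miscomputed record.

```python
def dir_rec_len(name_len: int) -> int:
    """Minimal record length: 8-byte header plus name, padded to 4 bytes."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(offset, rec_len, name_len, inode,
                    block_size=4096, inodes_count=None):
    """Return the reason an ext3-style check would reject the entry, or None.

    Assumptions: `offset` is relative to the start of the directory block,
    and the block size defaults to 4096 bytes.
    """
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > block_size:
        return "directory entry across blocks"
    if inodes_count is not None and inode > inodes_count:
        return "inode out of bounds"
    return None


# The two entries from the logs above.
print(check_dir_entry(offset=52, rec_len=37017, name_len=41, inode=431743))
print(check_dir_entry(offset=0, rec_len=38720, name_len=225, inode=1308))
```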
> Falling back to ext2 for a while would be interesting.
I see the following messages with dmesg, but they appear to be
non-critical:
init_special_inode: bogus i_mode (67)
init_special_inode: bogus i_mode (177766)
init_special_inode: bogus i_mode (5)
init_special_inode: bogus i_mode (65)
init_special_inode: bogus i_mode (53664)
init_special_inode: bogus i_mode (5)
Are they related? They didn't appear with ext3.
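One way to read these values: the inode-reading code hands an inode to init_special_inode() when its mode is not a regular file, directory or symlink, and the message is printed when the mode is not a device, FIFO or socket either. Assuming the number is printed in octal (the kernel uses `%o` here, as far as I know), the file-type field is the bits under mask 0170000. The hypothetical `describe_mode` helper below decodes it with Python's `stat` module. None of the six logged values has a valid type: four have type bits 0, and 177766 and 53664 decode to 0170000 and 0050000, which are not defined file types. That would fit inodes whose on-disk contents are garbage.

```python
import stat

# File-type field values (mode & 0o170000) and their names.
FILE_TYPES = {
    stat.S_IFSOCK: "socket",
    stat.S_IFLNK: "symlink",
    stat.S_IFREG: "regular file",
    stat.S_IFBLK: "block device",
    stat.S_IFDIR: "directory",
    stat.S_IFCHR: "character device",
    stat.S_IFIFO: "fifo",
}


def describe_mode(octal_text: str) -> str:
    """Decode the file-type bits of an i_mode printed in octal."""
    mode = int(octal_text, 8)
    fmt = stat.S_IFMT(mode)  # same as mode & 0o170000
    return FILE_TYPES.get(fmt, f"invalid file type {fmt:o}")


for logged in ["67", "177766", "5", "65", "53664", "5"]:
    print(logged, describe_mode(logged))
```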
Florian Weimer writes:
> I see the following messages with dmesg, but they appear to be
> non-critical:
>
> init_special_inode: bogus i_mode (67)
> init_special_inode: bogus i_mode (177766)
> init_special_inode: bogus i_mode (5)
> init_special_inode: bogus i_mode (65)
> init_special_inode: bogus i_mode (53664)
> init_special_inode: bogus i_mode (5)
>
> Are they related? They didn't appear with ext3.
Some more data points:
o 2.5.70 fails as well.
o ext2 on 2.5.70 fails, too.
o The above errors are definitely critical. 8-(
o It seems as if the kernel assumes that a file is a directory. (At
least after running 2.5.70/ext2 for a while, e2fsck had some
problems regenerating a proper file system structure and marked a
few files as directories.)
o Vanilla 2.4.20 is rock solid on the same hardware (no more file
system corruption after downgrade).