LinuxLists.cc - ext3 error with 2.6.9-rc4

2004-10-12 14:32:51

Subject: ext3 error with 2.6.9-rc4

The fs is on a 200gb seagate hd on a promise pci card (20267 - latest
firmware). It's hdh1. I was tarring a fs on hde1 onto hdh1. It ran for a
bit and then stopped with my kern.log providing the following error:

Oct 13 00:12:03 nessie kernel: EXT3-fs: mounted filesystem with ordered data mode.
Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1): ext3_readdir: bad entry in directory #3522561: rec_len is smaller than minimal - offset=4084, inode=3523431, rec_len=0, name_len=0
Oct 13 00:17:03 nessie kernel: Aborting journal on device hdh1.
Oct 13 00:17:03 nessie kernel: ext3_abort called.
Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1): ext3_journal_start: Detected aborted journal
Oct 13 00:17:03 nessie kernel: Remounting filesystem read-only
Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1) in start_transaction: Journal has aborted
Oct 13 00:17:58 nessie kernel: __journal_remove_journal_head: freeing b_committed_data

A similarish error occurred under 2.6.8-rc2:

Sep 19 07:39:29 nessie kernel: attempt to access beyond end of device
Sep 19 07:39:29 nessie kernel: hdh1: rw=1, want=3186822344, limit=390716802
Sep 19 07:39:29 nessie kernel: Aborting journal on device hdh1.
Sep 19 07:39:29 nessie kernel: ext3_abort called.
Sep 19 07:39:29 nessie kernel: EXT3-fs abort (device hdh1): ext3_journal_start: Detected aborted journal
Sep 19 07:39:29 nessie kernel: Remounting filesystem read-only
Sep 19 07:39:29 nessie kernel: EXT3-fs error (device hdh1) in start_transaction: Journal has aborted

This was during a copy from hde2 to hdh1. 2.6.9-rc4 survived this bit
but died anyway when more data was written to the fs when tarring.
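
To put the numbers in the "attempt to access beyond end of device" message above in perspective, here is a small sketch of the arithmetic. It assumes that want and limit are counted in 512-byte sectors; the helper name sectors_to_bytes is made up for illustration and is not from any kernel source.

```python
SECTOR_BYTES = 512  # assumption: want/limit are counted in 512-byte sectors


def sectors_to_bytes(sectors: int) -> int:
    """Convert a sector count into bytes (hypothetical helper)."""
    return sectors * SECTOR_BYTES


limit = 390716802   # from the log: size of hdh1
want = 3186822344   # from the log: where the write tried to land

limit_bytes = sectors_to_bytes(limit)
want_bytes = sectors_to_bytes(want)

print(f"limit = {limit_bytes} bytes (about {limit_bytes / 1e9:.2f} GB)")
print(f"want  = {want_bytes} bytes (about {want_bytes / 1e12:.2f} TB)")
print(f"want is about {want / limit:.1f} times the size of the partition")
```

Under that assumption the limit works out to roughly 200 GB, which matches the drive, while the requested sector lies far past the end of it. That points at a garbage block number rather than a write that merely ran slightly over.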

The HD is brand new.

Any help I can provide in helping debug this I will gladly give. Just
give me a shout.

--
Red herrings strewn hither and yon.

2004-10-14 04:12:32

by Robert Hancock

Subject: Re: ext3 error with 2.6.9-rc4

Tried fsck on that file system? Sounds like it may be corrupted somehow.

----- Original Message -----
From: "CaT" <[email protected]>
Newsgroups: fa.linux.kernel
To: <[email protected]>
Cc: <[email protected]>; <[email protected]>; <[email protected]>
Sent: Tuesday, October 12, 2004 8:43 AM
Subject: ext3 error with 2.6.9-rc4

> The fs is on a 200gb seagate hd on a promise pci card (20267 - latest
> firmware). It's hdh1. I was tarring a fs on hde1 onto hdh1. It ran for a
> bit and then stopped with my kern.log providing the following error:
>
> Oct 13 00:12:03 nessie kernel: EXT3-fs: mounted filesystem with ordered
> data mode.
> Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1): ext3_readdir:
> bad entry in directory #3522561: rec_len is smaller than minimal -
> offset=4084, inode=3523431, rec_len=0, name_len=0
> Oct 13 00:17:03 nessie kernel: Aborting journal on device hdh1.
> Oct 13 00:17:03 nessie kernel: ext3_abort called.
> Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1):
> ext3_journal_start: Detected aborted journal
> Oct 13 00:17:03 nessie kernel: Remounting filesystem read-only
> Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1) in
> start_transaction: Journal has aborted
> Oct 13 00:17:58 nessie kernel: __journal_remove_journal_head: freeing
> b_committed_data
>
> A similarish error occurred under 2.6.8-rc2:
>
> Sep 19 07:39:29 nessie kernel: attempt to access beyond end of device
> Sep 19 07:39:29 nessie kernel: hdh1: rw=1, want=3186822344,
> limit=390716802
> Sep 19 07:39:29 nessie kernel: Aborting journal on device hdh1.
> Sep 19 07:39:29 nessie kernel: ext3_abort called.
> Sep 19 07:39:29 nessie kernel: EXT3-fs abort (device hdh1):
> ext3_journal_start: Detected aborted journal
> Sep 19 07:39:29 nessie kernel: Remounting filesystem read-only
> Sep 19 07:39:29 nessie kernel: EXT3-fs error (device hdh1) in
> start_transaction: Journal has aborted
>
> This was during a copy from hde2 to hdh1. 2.6.9-rc4 survived this bit
> but died anyway when more data was written to the fs when tarring.
>
> The HD is brand new.
>
> Any help I can provide in helping debug this I will gladly give. Just
> give me a shout.
>
> --
> Red herrings strewn hither and yon.

2004-10-14 16:51:26

by Stephen C. Tweedie

Subject: Re: ext3 error with 2.6.9-rc4

Hi,

On Tue, 2004-10-12 at 15:29, CaT wrote:
> The fs is on a 200gb seagate hd on a promise pci card (20267 - latest
> firmware). It's hdh1. I was tarring a fs on hde1 onto hdh1. It ran for a
> bit and then stopped with my kern.log providing the following error:
>
> Oct 13 00:12:03 nessie kernel: EXT3-fs: mounted filesystem with ordered data mode.
> Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1): ext3_readdir: bad entry in directory #3522561: rec_len is smaller than minimal - offset=4084, inode=3523431, rec_len=0, name_len=0

All this really tells us is that there's something bogus on disk, not
how it got there.
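
For anyone wondering what the check behind that message looks like, here is a simplified sketch in Python. It is only loosely modelled on the sanity checks ext3 applies to directory entries and is not the kernel code. The 4096-byte block size is an assumption (it fits offset=4084), and the function names dir_rec_len and check_dir_entry are invented for this example.

```python
import struct
from typing import Optional

BLOCK_SIZE = 4096  # assumption: 4 KiB blocks, consistent with offset=4084


def dir_rec_len(name_len: int) -> int:
    """Space an entry needs: 8-byte header plus name, rounded up to 4 bytes."""
    return (name_len + 8 + 3) & ~3


def check_dir_entry(block: bytes, offset: int) -> Optional[str]:
    """Return a description of the problem, or None if the entry looks sane."""
    # On-disk layout: inode (u32), rec_len (u16), name_len (u8), file_type (u8)
    inode, rec_len, name_len, _ftype = struct.unpack_from("<IHBB", block, offset)
    if rec_len < dir_rec_len(1):
        return "rec_len is smaller than minimal"
    if rec_len % 4 != 0:
        return "rec_len % 4 != 0"
    if rec_len < dir_rec_len(name_len):
        return "rec_len is too small for name_len"
    if offset + rec_len > len(block):
        return "directory entry across blocks"
    return None


# Rebuild the entry from the log: offset=4084, inode=3523431, rec_len=0, name_len=0
block = bytearray(BLOCK_SIZE)
struct.pack_into("<IHBB", block, 4084, 3523431, 0, 0, 0)
result = check_dir_entry(bytes(block), 4084)
print(result)  # prints: rec_len is smaller than minimal
```

A rec_len of 0 can never be valid, since every entry needs at least 12 bytes. The check catches the damage but cannot say whether the filesystem or something below it wrote the bad data.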

There are tools like "dt" which may help identify whether there's data
going bad on the way to disk, or whether it might be a fs fault.

http://www.bit-net.com/~rmiller/dt.html
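
The general idea behind that kind of test can be sketched in a few lines of Python: write a known pattern, read it back, and report any block that differs. This is not how dt itself is implemented, just an illustration; the block size, function names and the use of a temporary file are all assumptions. A real test would target a file on the suspect disk and write more data than fits in RAM, otherwise the read may be served from the page cache.

```python
import os
import struct
import tempfile
from typing import List

BLOCK = 4096  # assumption: test in 4 KiB blocks


def pattern_block(index: int) -> bytes:
    """A block filled with its own index, so misplaced blocks are identifiable."""
    return struct.pack("<Q", index) * (BLOCK // 8)


def write_pattern(path: str, blocks: int) -> None:
    """Write `blocks` pattern blocks and push them out to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, blocks: int) -> List[int]:
    """Return the indices of blocks that do not match what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(blocks):
            if f.read(BLOCK) != pattern_block(i):
                bad.append(i)
    return bad


# Demo on a temporary file; point `path` at the suspect filesystem in practice.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pattern.bin")
    write_pattern(path, 256)
    print("bad blocks:", verify_pattern(path, 256))
```

If a test like this fails on a raw file with no directory operations involved, that suggests the damage happens below the filesystem.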

--Stephen

2004-10-30 03:53:39

by CaT

Subject: Re: ext3 error with 2.6.9-rc4

On Thu, Oct 14, 2004 at 05:51:10PM +0100, Stephen C. Tweedie wrote:
> > Oct 13 00:17:03 nessie kernel: EXT3-fs error (device hdh1): ext3_readdir: bad entry in directory #3522561: rec_len is smaller than minimal - offset=4084, inode=3523431, rec_len=0, name_len=0
>
> All this really tells us is that there's something bogus on disk, not
> how it got there.
>
> There are tools like "dt" which may help identify whether there's data
> going bad on the way to disk, or whether it might be a fs fault.
>
> http://www.bit-net.com/~rmiller/dt.html

Thanks for that. A new utility to learn. :) Anyway, after getting my
laptop (and hence my access to email) back after a week, I did a fair bit
of testing, and the only way I can reproduce the above is by copying from
one hd to another. Further testing has led me to believe that the ext3
error is a symptom of data corruption caused somewhere in the IDE layer
rather than anything else.
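
A disk-to-disk copy test of this sort can be sketched roughly as follows: copy a file, then compare checksums of source and destination. This is only an illustration, not the exact procedure used here. The function names are invented, and the demo uses a temporary directory where a real run would put the source and destination on the two drives in question (and use files large enough to defeat the page cache).

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files do not need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def copy_and_verify(src: str, dst: str) -> bool:
    """Copy src to dst and report whether the two files hash identically."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)


# Demo with made-up paths inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "source.bin")
    dst = os.path.join(tmp, "copy.bin")
    with open(src, "wb") as f:
        f.write(os.urandom(1 << 20))
    print("copy intact:", copy_and_verify(src, dst))
```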

I've posted a bug report on what I've discovered in the ide update thread
(my message before this one).

--
Red herrings strewn hither and yon.