2007-05-28 10:08:55

by Rasmus Andersen

Subject: RAID1 questions and errors and questions about errors :)

Hello,

For some time I have been facing some RAID1 issues, which I have finally
found the time to write about. I hope you can help me shed some light on
them.

My main problem is that a check/repair run of my RAID1 device reports
errors. The number of errors is not always the same and is not
monotonically increasing. It has not always been like this, but I have
not been able to link this to any external event.
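
For readers following along: on 2.6-series kernels, an md check is
normally driven through sysfs. The sketch below is not necessarily the
exact procedure used in this post. It assumes the array is md0 (the post
does not name the device), and the function names are hypothetical
helpers.

```shell
# Sketch only. The default path assumes the array is md0; the post does not
# name the device. All function names here are hypothetical helpers.

# Ask md to start a read-only consistency check of the array.
md_start_check() {
    echo check > "${1:-/sys/block/md0/md}/sync_action"
}

# Block until md reports the array as idle again.
# $2 is the polling interval in seconds (default 30).
md_wait_idle() {
    while [ "$(cat "${1:-/sys/block/md0/md}/sync_action")" != "idle" ]; do
        sleep "${2:-30}"
    done
}

# Print the mismatch count recorded by the last check/repair pass.
md_mismatch_count() {
    cat "${1:-/sys/block/md0/md}/mismatch_cnt"
}
```

Run as root, something like `md_start_check && md_wait_idle &&
md_mismatch_count` would print the count for one pass. Repeating it a
few times shows whether the number varies between runs.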

Secondary problems, probably linked to the main one, are that rtorrent
complains about checksum errors from time to time and that a loop of ten
sha1/md5 sums over a DVD image returns varying checksums, usually with
only a single different one. The latter is more easily provoked if the
system (my home server) is otherwise busy (rdiff-backup, compiling,
etc.). (The torrents and images reside on the RAID1 device.)
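
The checksum loop described above can be sketched roughly as follows.
This is an illustration, not the exact loop used here: the function name
is hypothetical and only md5sum is shown. It hashes the same file
repeatedly and prints how many distinct digests were seen, so anything
other than 1 means the reads disagreed.

```shell
# Sketch only; count_distinct_digests is a hypothetical helper name.
# Usage: count_distinct_digests FILE [COUNT]   (COUNT defaults to 10)
count_distinct_digests() {
    file="$1"
    count="${2:-10}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Hash from stdin so the output is just "<digest>  -".
        md5sum < "$file" | cut -d' ' -f1
        i=$((i + 1))
    done | sort -u | wc -l | tr -d ' '
}
```

Note that if the file fits in the page cache, repeated runs may be
served from memory rather than from the disks, so a large image is the
more telling test.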

I have run SMART tests on the component drives, memtest86+ for about
24 hours, and Doug Ledford's memtest script[1] for ~12 hours, all
without any errors reported. The box is running Gentoo, so it also sees
a fair amount of gcc action, which has never had a SIGSEGV or SIGBUS
error pop up either.

The system dmesg is attached. It is a basic, oldish Athlon system with
two SATA drives on a SIL adapter, plus a PATA drive. The SATA drives
are the ones in the RAID1 setup. Running the md5sums on the PATA drive
has not resulted in any errors so far.

A tangential question: if I use mdadm to create a single-device RAID1
setup consisting of my existing PATA partition, the resulting device is
smaller than the existing one, causing fsck to complain bitterly about
the FS being bigger than the device. Is that a bug? The command I use to
create the device is:

mdadm --create /dev/md1 -l 1 -n 2 /dev/hdb2 missing

(from memory, but it should capture the essence). I wanted to create a
test RAID1 on the PATA drive to see whether errors would occur there as
well, but this prevents me from doing so, at least without some pain.


Lastly, I hope I made some sense :) It might well be that this is not
related to RAID1 at all, but I have to start somewhere :)


[1] http://people.redhat.com/dledford/memtest.html

Thanks in advance,
Rasmus


Attachments:
(No filename) (2.13 kB)
dmesg (14.67 kB)

2007-05-28 10:16:52

by Rasmus Andersen

Subject: Re: RAID1 questions and errors and questions about errors :)

On Mon, May 28, 2007 at 11:57:55AM +0200, Rasmus Andersen wrote:
> My main problem is that a check/repair run of my RAID1 device reports
> errors. Not always the same number of errors and not monotonically
> increasing. It has not always been like this but I have not been able to
> link this to any external event.

Meh. I forgot to state that I have seen this with vanilla .20 and .21.1
and also with various Gentoo kernels based on .20. But, as stated, I
have not been able to link this to a kernel upgrade.

Rasmus