2006-05-03 09:17:23

by Andrea Gelmini

Subject: 2.6.16/MD/DM-crypt/fs corruption

Hi all,
to make a long story short:
a) five Maxtor PATA disks, 500GB each;
b) one big software RAID 5 (/dev/md1);[1]
c) dm-crypt on top of /dev/md1;[2]
d) after copying 500GB I got fs corruption;[3]

vanilla kernel, well-tested hardware,[4] Debian testing (to have
dm-crypt).
I've been using the same configuration (md+dm-crypt+ext2/3) on another
server, without problems, for years.
This is the first time I've used it on a big partition (1.9TB). Surely there
are setups with much bigger storage, but it's the first time I've had
problems. So:
a) is anybody out there using this kind of configuration
successfully?
b) what can I do to debug the problem? I've spent the last weeks
working on the hardware (mostly Dell servers), to rule out
hardware problems;
c) does anyone (OSDL?) have regression tests for md+dm-crypt?
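
One way to approach question b) is to make each copy run verifiable, so a reproduction attempt reports exactly which files went missing or changed. The following is a minimal sketch, not a tested procedure from this thread: it assumes the source tree and the copy on the encrypted filesystem are both mounted and readable, and the helper names (sha256_of, build_manifest, compare_manifests) are hypothetical.

```python
import hashlib
import os


def sha256_of(path, block_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its checksum."""
    manifest = {}
    for directory, _subdirs, files in os.walk(root):
        for name in files:
            full = os.path.join(directory, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest


def compare_manifests(source, copy):
    """Return (missing, changed) relative to the source manifest.

    missing: paths present in the source but absent from the copy
    changed: paths present in both whose checksums differ
    """
    missing = sorted(p for p in source if p not in copy)
    changed = sorted(p for p in source if p in copy and source[p] != copy[p])
    return missing, changed
```

Building the manifest of the source once, then rebuilding the manifest of the copy after each test run (and again after fsck), separates damaged file data ("changed") from files that lost their directory entry ("missing", for example because they ended up in /lost+found).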

Thanks a lot for your time,
gelma


----
[1] /root/mdadm -Cv /dev/md1 --bitmap-chunk=1024 --chunk=256 \
--assume-clean --bitmap=internal -l5 -n5 /dev/hd[aefgh]
[2] mkfs.ext3 -L 'tritone' -m0 -N 5000000 /dev/mapper/raidone
[3] usually metadata; file checksums are good, but I get a lot of files in
/lost+found after fsck, and sometimes I have to use debugfs to
unlink some of them (chattr/lsattr don't work on them). I can see the
problem with ext2/ext3/reiserfs.
[4] memtest and so on. I also tried the same set of disks on totally
different hardware, with the same results. I changed everything from the
ATA cables to the RAM. I also tried different IDE controllers (the ones on
the motherboards, old HPT 370/372, new HPT 302).
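
For reference, the geometry implied by the mdadm command in [1] can be worked out with a few lines of arithmetic. This is a sketch under stated assumptions: "500GB" is taken as the decimal 500 * 10^9 bytes, each member contributes its full nominal size (md metadata and the internal bitmap are ignored), and --chunk=256 is read as 256 KiB. The function names are hypothetical.

```python
def raid5_data_disks(num_disks):
    """RAID 5 spends one disk's worth of space on parity."""
    if num_disks < 2:
        raise ValueError("need at least one data disk plus parity")
    return num_disks - 1


def raid5_usable_bytes(num_disks, disk_bytes):
    """Usable capacity of the array, ignoring metadata overhead."""
    return raid5_data_disks(num_disks) * disk_bytes


def raid5_full_stripe_kib(num_disks, chunk_kib):
    """Amount of data (in KiB) covered by one full stripe."""
    return raid5_data_disks(num_disks) * chunk_kib


usable = raid5_usable_bytes(5, 500 * 10**9)   # five 500GB members
stripe = raid5_full_stripe_kib(5, 256)        # --chunk=256
print(usable, round(usable / 2**40, 2), stripe)
```

Under these assumptions the array holds 2 * 10^12 bytes (about 1.82 TiB), the same ballpark as the 1.9TB quoted above, and a full stripe carries 1024 KiB of data across the four data chunks.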


2006-05-03 15:50:52

by Andrea Gelmini

Subject: Re: 2.6.16/MD/DM-crypt/fs corruption

On Wed, May 03, 2006 at 11:16:53 +0200, Andrea Gelmini wrote:
> Hi all,
> to make a long story short:
> a) five Maxtor PATA disks, 500GB each;
> b) one big software RAID 5 (/dev/md1);[1]
> c) dm-crypt on top of /dev/md1;[2]
> d) after copying 500GB I got fs corruption;[3]

it seems it's not only my problem.[1]
it would be good to put an advisory about it in menuconfig. I spent an
incredible number of hours trying to find a hardware failure, and weeks
re-testing and reproducing the problem every time.
here[2] you can find many more details.

thanks a lot for your time,
gelma

------
[1] http://episteme.arstechnica.com/groupee/forums/a/tpc/f/96509133/m/282007248731
[2] http://marc.theaimsgroup.com/?l=linux-raid&m=114579714925936&w=2