2007-01-18 20:12:00

by noah

Subject: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2

Hi!

I'm experiencing data corruption in the following setup:

1. mdadm --create /dev/md0 -n3 -lraid5 /dev/hda1 /dev/hdc1 /dev/hde1
2. cryptsetup -c aes-cbc-essiv:sha256 luksFormat /dev/md0 mykey
3. cryptsetup -d mykey luksOpen /dev/md0 cryptvol
4. pvcreate /dev/mapper/cryptvol
5. vgcreate vg0 /dev/mapper/cryptvol
6. lvcreate -n root -L10G vg0
7. mkreiserfs -q /dev/vg0/root
8. mkdir /.newroot; mount /dev/vg0/root /.newroot
9. mkdir /.realroot; mount -o bind / /.realroot
10. tar cf - -C /.realroot . | tar xvpf - -C /.newroot

With Linux 2.6.18 (it's broken, OK, but there's still something wrong
even in 2.6.19.2, so keep reading) I started getting warnings from
ReiserFS indicating severe data corruption. Reiserfsck confirmed
this. It usually happened while extracting the Linux source tree.
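
For reference, the check itself is nothing special, roughly this (the
volume has to be unmounted first, device path as in the setup above):

  # Read-only consistency check of the copied root filesystem.
  umount /.newroot
  reiserfsck --check /dev/vg0/root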

After asking around I found out that dm-crypt had a bug[1] that was
fixed in early December.
The fix went into 2.6.19 and was backported and included in 2.6.18.6[2].

Fine, so I upgraded to 2.6.18.6, rebuilt the array from scratch and
did the whole procedure again.
There were no messages from reiserfs in dmesg this time, but reiserfsck
still revealed severe data corruption.
I also found that compressed archives and ISO images for which I had
md5sums were corrupt.
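
(For anyone who wants to reproduce this, a simple way to catch the
corruption is to checksum everything before and after the copy,
something like the following; the manifest path is just an example and
the source should be quiet while the checksums are taken:)

  # Record checksums on the original root, then verify them on the copy.
  (cd /.realroot && find . -type f -print0 | xargs -0 md5sum) > /tmp/manifest.md5
  (cd /.newroot && md5sum -c /tmp/manifest.md5 | grep -v ': OK$')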

I then upgraded to 2.6.19.2 with the exact same result as with 2.6.18.6.
I even verified this on a fairly new computer with different hardware
(Intel CPU and chipset).

I figured it might be some kind of race condition, so on my second try
on 2.6.19.2, when recreating the array, I let md finish resyncing it
before copying over the files.
This time, reiserfsck didn't complain.
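
(Waiting for the resync was just a matter of polling /proc/mdstat until
the recovery line disappeared, roughly:)

  # Block until md0 no longer reports an ongoing resync/recovery.
  while grep -A 2 '^md0' /proc/mdstat | grep -Eq 'resync|recovery'; do
      sleep 60
  done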

Just for fun, I did the whole thing again: rebuilt the array from
scratch, let md resync the third drive and then started copying over
all the files again. Thinking the cause of the problem was heavy disk
I/O, I tried to stress the other LVM volumes residing on md0 with tar
during the copy. Everything seemed fine; no problems arose.
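
(The stressing was nothing fancier than reading another volume in
parallel with the copy, something like the following, where
/mnt/othervol stands for one of the other mounted LVM volumes on md0:)

  # Read another LVM volume in the background while the copy runs.
  tar cf - -C /mnt/othervol . > /dev/null &
  tar cf - -C /.realroot . | tar xvpf - -C /.newroot
  wait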

I did a few reboots and confirmed that reiserfsck had no complaints
about any of the filesystems residing on the LVM volumes on md0.

I started using the machine as normal, and half a day later I unmounted
the filesystems and ran reiserfsck just to make sure everything was
still OK. Unfortunately, it wasn't.


The drives in the array are three brand new IDE drives, two 250GB and
one 200GB.
According to SMART there are no problems with them, and they worked
fine in my previous RAID1 setup with dm-crypt and LVM, by the way.
The computer itself is an Athlon XP with less than 1GB of RAM on a
motherboard with an nForce2 chipset, FWIW. No memory errors were
detected with memtest86+ (I completed the full test).
I haven't tried another filesystem as I've got quite a lot of faith in
reiserfs's stability.
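
(The SMART checks were done with smartctl, roughly like this, for each
of the three drives:)

  # Dump SMART attributes and start a long offline self-test per drive.
  for d in /dev/hda /dev/hdc /dev/hde; do
      smartctl -a "$d"
      smartctl -t long "$d"
  done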

Is anybody else experiencing these problems?
Unfortunately I'm only able to do limited testing due to busy days,
but I'd love to help if I can.


[1] Here's a thread on the recently fixed data corruption bug in dm-crypt
http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1974

[2] The backport of the dm-crypt fix for 2.6.18.6 is here
http://uwsg.iu.edu/hypermail/linux/kernel/0612.1/2299.html


2007-01-22 19:57:48

by Andrew Morton

Subject: Re: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2

> On Thu, 18 Jan 2007 21:11:58 +0100 noah <[email protected]> wrote:
> Hi!
>
> I'm experiencing data corruption in the following setup:
>
> 1. mdadm --create /dev/md0 -n3 -lraid5 /dev/hda1 /dev/hdc1 /dev/hde1
> 2. cryptsetup -c aes-cbc-essiv:sha256 luksFormat /dev/md0 mykey
> 3. cryptsetup -d mykey luksOpen /dev/md0 cryptvol
> 4. pvcreate /dev/mapper/cryptvol
> 5. vgcreate vg0 /dev/mapper/cryptvol
> 6. lvcreate -n root -L10G vg0
> 7. mkreiserfs -q /dev/vg0/root
> 8. mkdir /.newroot; mount /dev/vg0/root /.newroot
> 9. mkdir /.realroot; mount -o bind / /.realroot
> 10. tar cf - -C /.realroot . | tar xvpf - -C /.newroot
>
> [...]
>
> Is anybody else experiencing these problems?
> Unfortunately I'm only able to do limited testing due to busy days,
> but I'd love to help if I can.

There has been a long history of similar problems when raid and dm-crypt
are used together. I thought a couple of months ago that we were hot on
the trail of a fix, but I don't think we ever got there. Perhaps
Christophe can comment?

2007-01-22 21:49:10

by Christophe Saout

Subject: Re: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2

On Monday, 22.01.2007, 11:56 -0800, Andrew Morton wrote:

> There has been a long history of similar problems when raid and dm-crypt
> are used together. I thought a couple of months ago that we were hot on
> the trail of a fix, but I don't think we ever got there. Perhaps
> Christophe can comment?

No, I think it's exactly this bug. Three months ago someone came up
with a very reliable test case and I managed to nail down the bug.

Readaheads that were aborted by the raid5 code (or some layer below)
were signalled with a cleared BIO_UPTODATE bit but no error code, so
dm-crypt didn't recognize them as aborted (all other layers apparently
do set the error code in this case, which is why this only happened
with raid5), and that could mess up the buffer cache.

Anyway, it then turned out that this bug had already been
"accidentally" fixed in 2.6.19 by RedHat in order to play nicely with
the make_request changes (the stuff to reduce stack usage with stacked
block device layers); that's probably why you missed that it got fixed.
The fix for pre-2.6.19 kernels went into some 2.6.16.x release and into
2.6.18.6.


2007-02-22 23:57:26

by Piet Delaney

Subject: Re: [dm-devel] Re: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2

On Mon, 2007-01-22 at 22:42 +0100, Christophe Saout wrote:
> [...]
>
> Anyway, it then turned out that this bug had already been
> "accidentally" fixed in 2.6.19 by RedHat in order to play nicely with
> the make_request changes (the stuff to reduce stack usage with stacked
> block device layers); that's probably why you missed that it got
> fixed. The fix for pre-2.6.19 kernels went into some 2.6.16.x release
> and into 2.6.18.6.

Hi Chris:

I've been trying Andrew's suggestion of doing fault injection,
currently just for kmalloc() and mempool_alloc(), and just got a hang
on 2.6.12 that I'm poking around in with kgdb. I'm using dm-crypt on a
SCSI RAID-1 (mirrored) root partition, and I'm using your patch that
fixes the raid5 problem just to play it safe.
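
In case you want to compare, the slab side is driven through the
fault-injection debugfs knobs, roughly as below. This is only a sketch:
it assumes CONFIG_FAULT_INJECTION, CONFIG_FAILSLAB and
CONFIG_FAULT_INJECTION_DEBUG_FS are enabled, it only covers the
kmalloc()/slab side (the mempool_alloc() failures are separate), and
the exact knobs may differ between kernel versions.

  mount -t debugfs none /sys/kernel/debug
  cd /sys/kernel/debug/failslab
  echo 1    > probability   # fail roughly 1% of slab allocations
  echo 1    > interval
  echo 1000 > times         # stop injecting after 1000 failures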

So far it looks like processes are waiting to be woken up by
the buffer cache once reads have completed.

-piet

--
Piet Delaney Phone: (408) 200-5256
Blue Lane Technologies Fax: (408) 200-5299
10450 Bubb Rd.
Cupertino, Ca. 95014 Email: [email protected]

