2006-05-11 15:15:31

by Paul Slootman

Subject: Re: [dm-crypt] dm-crypt is broken and causes massive data corruption

Alasdair G Kergon wrote:
>On Mon, May 08, 2006 at 07:20:12PM +0200, Tillmann Steinbrecher wrote:
>> it's been many months that dm-crypt has been broken, and is known to
>> cause massive data corruption.

>So far there isn't much in the way of controlled experiments, but:
>
> All the reports agree the problem is independent of filesystem.
>
> One thread suggests only filesystem metadata is corrupted, not file
> data, and wonders if something's going wrong with (unsupported) write
> barriers.
>
> Another report said dm-crypt over raid5 failed while raid5
> over dm-crypt worked.

A data point:

I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
at least a year now, without any problems. Currently running 2.6.13.4
(that's my "stable" work system...).


Paul Slootman


2006-05-11 23:17:08

by Christian Schmidt

Subject: Re: [dm-crypt] dm-crypt is broken and causes massive data corruption

Paul Slootman wrote:
> Alasdair G Kergon wrote:
>> On Mon, May 08, 2006 at 07:20:12PM +0200, Tillmann Steinbrecher wrote:
>>> it's been many months that dm-crypt has been broken, and is known to
>>> cause massive data corruption.
>
>> So far there isn't much in the way of controlled experiments, but:
>>
>> All the reports agree the problem is independent of filesystem.
>>
>> One thread suggests only filesystem metadata is corrupted, not file
>> data, and wonders if something's going wrong with (unsupported) write
>> barriers.
>>
>> Another report said dm-crypt over raid5 failed while raid5
>> over dm-crypt worked.
>
> A data point:
>
> I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
> at least a year now, without any problems. Currently running 2.6.13.4
> (that's my "stable" work system...).

Just so you know,

I'm running dm-crypt on top of raid-5 as well. Kernels ranging from
gentoo's hardened 2.6.11 to 2.6.15.X with gentoo patchset on AMD64. The
raid has been running since February 2005 with >1TB and survived a disk
failure with rebuild.
Cipher module was aes, now the asm-accelerated x86_64 version. The
filesystem is ext3. It survived several hard lockups (damn cheap SATA
controllers hanging if a drive passes out), an LV/filesystem resize, and
being fed GBytes of data in a row (at max ~30MByte/s to 2-3 files in
parallel).

Just re-checked the filesystem: no metadata errors found. I
remember I checked the CRC of several bigger archives when I had to
replace a drive two months ago, and couldn't find any problems then.

Best regards,
Christian
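
For anyone wanting to repeat the archive check described above, here is a
minimal sketch in Python. The function name file_crc32 and the chunk size
are assumptions made for illustration; the message does not say which tool
was actually used. It computes a CRC-32 over a file in chunks, which can
then be compared with a value recorded before the drive replacement.

```python
import zlib


def file_crc32(path, chunk_size=1 << 20):
    """Return the CRC-32 of the file at `path`, read in chunks.

    Hypothetical helper: the name and chunk size are illustrative only.
    """
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # zlib.crc32 accepts the running value as its second argument.
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

A matching CRC only shows that the file reads back the same as when the
reference value was taken; it says nothing about filesystem metadata.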

2006-05-12 21:47:08

by Harik

Subject: Re: [dm-crypt] dm-crypt is broken and causes massive data corruption

On 5/11/06, Paul Slootman wrote:

> A data point:
>
> I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
> at least a year now, without any problems. Currently running 2.6.13.4
> (that's my "stable" work system...).

Datapoint:

Linux fileserver 2.6.15.6 #1 PREEMPT Wed Mar 8 20:26:55 EST 2006
x86_64 GNU/Linux
CONFIG_MD_RAID5=y
CONFIG_BLK_DEV_DM=y
CONFIG_DM_SNAPSHOT=y
CONFIG_CRYPTO_AES_X86_64=y

encrypted logical volume on a raid-5 MD on 4 SATA drives, mounted reiser3.

aes-cbc-plain

It's worked through multiple kernels, and moving from 32 to 64 bits.
2.6.11 (64-bit), 2.6.10 (64-bit), 2.6.8 (32-bit) is the kernel history I
have so far. I'm not sure when I switched from cryptoloop to dm-crypt
though, at least before May '05.

I'm not running dm-crypt directly on MD, though, the stack is
SATA->MD->DM->DM-crypt->reiser3. That may be the difference.

I've got plenty of free space, I could make a ~75 GB encrypted
partition and run any sort of write pattern test/filesystem you want
me to try.
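
One possible shape for such a write pattern test, as a rough sketch only:
the names block_data, write_pattern and verify_pattern, the 4 KiB block
size and the seed scheme are all assumptions, not anything agreed in this
thread. It writes deterministic blocks to a file on the filesystem under
test, either sequentially or in shuffled order, and later reports which
blocks read back differently.

```python
import hashlib
import os
import random

BLOCK = 4096  # assumed block size for the test file


def block_data(seed, index):
    """Deterministic content for block `index` (32-byte digest repeated)."""
    digest = hashlib.sha256(f"{seed}:{index}".encode()).digest()
    return digest * (BLOCK // len(digest))


def write_pattern(path, n_blocks, seed=0, shuffle=True):
    """Write n_blocks blocks in sequential or shuffled order, then fsync."""
    order = list(range(n_blocks))
    if shuffle:
        random.Random(seed).shuffle(order)
    with open(path, "wb") as f:
        f.truncate(n_blocks * BLOCK)
        for i in order:
            f.seek(i * BLOCK)
            f.write(block_data(seed, i))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path, n_blocks, seed=0):
    """Return the indices of blocks that differ from what was written."""
    bad = []
    with open(path, "rb") as f:
        for i in range(n_blocks):
            if f.read(BLOCK) != block_data(seed, i):
                bad.append(i)
    return bad
```

Reading back right after writing is likely to be served from the page
cache, so for a meaningful result the filesystem should be unmounted and
remounted (or the machine rebooted) between write_pattern and
verify_pattern.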

2006-05-12 15:04:13

by Andrea Gelmini

Subject: Re: [dm-crypt] dm-crypt is broken and causes massive data corruption

On Thu, May 11, 2006 at 03:15:29PM +0000, Paul Slootman wrote:
> A data point:
>
> I'm running my /home on reiserfs3 over dm-crypt over lvm over raid5 for
> at least a year now, without any problems. Currently running 2.6.13.4
> (that's my "stable" work system...).

It seems the write pattern is important... I can reproduce the corruption
by copying gigabytes of data from a locally attached IDE disk. Do you write
mostly from the network or from slow devices?

ciao,
gelma
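
A bulk copy like the one described above could be checked with something
along these lines. This is a sketch under assumptions: the helper names
sha256_of and copy_and_check are made up here, and the original report
does not say how the corruption was detected. It copies a file onto the
dm-crypt volume and compares SHA-256 digests of source and destination.

```python
import hashlib
import shutil


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def copy_and_check(src, dst):
    """Copy src to dst and report whether both files hash identically."""
    shutil.copyfile(src, dst)
    return sha256_of(src) == sha256_of(dst)
```

As with any read-back check, the destination may still be in the page
cache right after the copy, so hashing it again after a remount is the
more telling comparison.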