2008-02-03 20:55:29

by Christoph Anton Mitterer

Subject: data corruption with dmcrypt/LUKS

Hi.

I think I've found a bug somewhere in dm-crypt...

First of all the system that I use:
Debian (sid) with kernel 2.6.24 on AMD64 (Intel Core 2 Duo), 2GB RAM

For several days now I have been trying to fully encrypt that system
(that is, all partitions are encrypted and I boot from a USB stick).
There are two errors that appear again and again, but first of
all an explanation of how I set everything up:
/dev/sda1 is my unencrypted debian installation
/dev/sda2 is the partition that will hold the encrypted root
/dev/sda3 is swap

I boot from a USB stick (with the same Debian sid/2.6.24 kernel as
on /dev/sda1), which is /dev/sdb(1).
The key itself is on /dev/sdc (also a USB stick).

How I've made the key:
dd if=/dev/random of=/tmp/KEY count=32 bs=1

How I've formatted sda2:
cryptsetup --verbose --cipher aes-cbc-essiv:sha256 --key-size 256 \
    --iter-time 10000 luksFormat /dev/sda2 /tmp/KEY
cryptsetup --key-file /tmp/KEY luksOpen /dev/sda2 sda2
mkfs.ext3 /dev/mapper/sda2
cryptsetup luksClose sda2

<reboot>
<create mappings and mount everything again>

cp -a /mnt/unencrypted /mnt/encrypted/

<when I diff -q -r /mnt/unencrypted /mnt/encrypted/ here, everything is
ok, but that is only because those files are still cached in RAM>
<unmount + close mapping + reboot>
<create mappings and mount everything again>

Here's the first problem:
1) When I now diff the two versions again (the unencrypted and the one
from the encrypted partition) I get differences...
I'm quite sure that this is not due to damaged RAM or harddisk (checked
several times with memtest and badblocks), and the corruption always
follows the same pattern, although it is not fully reproducible.
The filesystem tree itself seems to be the same on both discs (but I'm
not sure if the permissions and owners are copied correctly), but there
are differences in some (though not all) files.
The difference is always the same: for one or more bytes of the
affected files, the byte value is reduced by 0x10.
That is:
If the file contains a byte "t" (0x74) on the unencrypted partition, it
will have a "d" (0x64) on the encrypted one.
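
The values 0x74 and 0x64 differ in exactly one bit (their XOR is 0x10).
A minimal sketch of how the differing bytes of two files could be listed
to check for that pattern, assuming a POSIX shell with cmp and mktemp;
the function name diff_bytes and the demo files are hypothetical and
not part of the original report:

```sh
# Hypothetical helper: list every differing byte of two files together with
# the XOR of both values, so that single-bit flips stand out.
# "cmp -l" prints the 1-based decimal offset and both byte values in octal.
diff_bytes() {
    # cmp exits non-zero when the files differ; that is expected here.
    { cmp -l "$1" "$2" || true; } | while read -r offset a b; do
        # A leading 0 makes printf and $(( )) read the values as octal.
        printf 'offset=%d good=0x%02x bad=0x%02x xor=0x%02x\n' \
            "$offset" "0$a" "0$b" "$(( 0$a ^ 0$b ))"
    done
}

# Demo with two throwaway files that differ the way the report describes.
demo_dir=$(mktemp -d)
printf 'test' > "$demo_dir/good"
printf 'tesd' > "$demo_dir/bad"
diff_bytes "$demo_dir/good" "$demo_dir/bad"
# prints: offset=4 good=0x74 bad=0x64 xor=0x10
rm -rf "$demo_dir"
```

If every reported line shows xor=0x10, the damage is limited to one
and the same bit position in each affected byte.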

I first noticed this bug some weeks ago, when I used a 2.6.18
kernel on my boot+copy USB stick (/dev/sdb), but I thought this might be
a bug in that pretty old version...
But now it even happens with 2.6.24...

2) The second bug happens only rarely and leads to a panic.
Unfortunately it's difficult to reproduce, but it always happened when I
ran mkfs.ext3 on /dev/mapper/sda2.
A stack trace is printed which clearly involves some dmcrypt
lines...


Unfortunately this bug makes dm-crypt completely unusable for me (and
everybody who needs correctness for his data ;-) )

I'd ask you to run your own (massive) copying tests and report here if
you can reproduce that error.
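
A minimal sketch of how such a copy test could be checked with checksums
instead of diff, assuming GNU coreutils sha256sum; the function names
make_manifest and verify_manifest and all paths are hypothetical:

```sh
# Hypothetical helpers for a copy test: record checksums of the source tree,
# then verify the copy after it has been unmounted and mounted again.

# Print "checksum  ./relative/path" lines for every regular file below $1.
make_manifest() {
    ( cd "$1" && find . -type f -exec sha256sum {} + )
}

# Check the files below $1 against manifest $2 (absolute path, because the
# check runs from inside $1). Prints only failures; non-zero exit on mismatch.
verify_manifest() {
    ( cd "$1" && sha256sum --quiet -c "$2" )
}

# Demo on throwaway directories instead of real partitions.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/dst"
printf 'test\n' > "$demo/src/file"
make_manifest "$demo/src" > "$demo/manifest"
printf 'tesd\n' > "$demo/dst/file"    # simulate the 0x74 -> 0x64 change
verify_manifest "$demo/dst" "$demo/manifest" || echo "corruption detected"
rm -rf "$demo"
```

For a real test the manifest would be created from the unencrypted tree
before copying and checked against the encrypted tree only after the
mapping has been closed and reopened, so that the files are read from
disk rather than from the cache.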

Best wishes,
Chris.

btw: are there any other currently known bugs in dmcrypt? Or is it
considered "production stable"?


2008-02-03 22:06:43

by Milan Broz

Subject: Re: data corruption with dmcrypt/LUKS

Christoph Anton Mitterer wrote:
> <when I diff -q -r /mnt/unencrypted /mnt/encrypted/ here, everything is
> ok but this is just, because those files are still cached in RAM>
> <unmount + close mapping + reboot>
> <create mappings and mount everything again>
>
> Here's the first problem:
> 1) When I now diff the two versions again (the unencrypted and the one
> from the encrypted partition) I get differences...
> I'm quite sure that this is not due to damaged RAM or harddisk (checked
> several times with memtest and badblocks) and the corruption is always
> the same, although not fully reproducibly.
> The filesystem tree itself seems to be the same on both discs (but I'm
> not sure if the permissions and owners are copied correctly), but there
> are differences in some (though not all) files.
> The difference is always the same, that for one or more bytes of the
> affected files, the hexcode is reduced by 0x10
> That is:
> If the file contains a byte "t" (0x74) on the unencrypted partition it
> will have a "d" (0x64) on the encrypted.

Hi,
Are you sure that your USB stick is not faulty?
Could you reproduce it with a different piece of hardware?
(Several strange reports for dm-crypt over USB were identified as
USB hardware faults.)

> 2) The second bug happens only rarely and leads to a panic.
> Unfortunately it's difficult to reproduce, but it always happened when I
> mkfs.ext3 on the /dev/mapper/sda2.
> There's a stack-trace printed which clearly involves some dmcrypt
> lines...

But no stack trace is attached here... please attach it.

It could be a known bug that was fixed in the stable version some time ago;
see http://lkml.org/lkml/2007/7/20/211

> btw: are there any other currently known bugs in dmcrypt? Or is it
> considered as "production stable"?

No known bugs causing data corruption, no such reports so far
for stable kernel.

Milan

2008-02-04 02:42:07

by Christoph Anton Mitterer

Subject: Re: data corruption with dmcrypt/LUKS

Hi Milan Broz

On Sun, 2008-02-03 at 23:06 +0100, Milan Broz wrote:
> Are you sure that your USB stick is not faulty?
I actually tested the stick, too. But I consider problems with the stick
(you mean the key-holding stick, don't you?) highly unlikely.
If the key were wrong, a good crypto system should give me completely
different data and not just these "minor" faults.


> Could you reproduce it with a different piece of hardware?
> (Several strange reports for dm-crypt over USB were identified as
> USB hardware faults.)
I'll test it tomorrow.



> > 2) The second bug happens only rarely and leads to a panic.
> > Unfortunately it's difficult to reproduce, but it always happened when I
> > mkfs.ext3 on the /dev/mapper/sda2.
> > There's a stack-trace printed which clearly involves some dmcrypt
> > lines...
> But no stack trace attached here... please attach it.
Unfortunately I don't have one... nothing was written to the logs and I
forgot to write it down :-/



> It could be a known bug that was fixed in the stable version some time ago;
> see http://lkml.org/lkml/2007/7/20/211
Uhm but that patch should be part of 2.6.24, shouldn't it?


> No known bugs causing data corruption, no such reports so far
> for stable kernel.
Uhm ok... well, as said above, I'll run some more tests (without the
USB sticks), but it would be great if some people here could try this, too.

Best wishes,
Chris.

btw: What about the dmcrypt mailing list? I've tried to subscribe
but got no answer, and I receive no posts (not even my own).

2008-02-04 09:17:36

by Milan Broz

Subject: Re: data corruption with dmcrypt/LUKS

Christoph Anton Mitterer wrote:
> On Sun, 2008-02-03 at 23:06 +0100, Milan Broz wrote:
...
>>> 2) The second bug happens only rarely and leads to a panic.
>>> Unfortunately it's difficult to reproduce, but it always happened when I
>>> mkfs.ext3 on the /dev/mapper/sda2.
>>> There's a stack-trace printed which clearly involves some dmcrypt
>>> lines...
>> But no stack trace attached here... please attach it.
> Unfortunately I don't have one... nothing was written to the logs and I
> forgot to write it down :-/
>
>
>
>> It could be a known bug that was fixed in the stable version some time ago;
>> see http://lkml.org/lkml/2007/7/20/211
> Uhm but that patch should be part of 2.6.24, shouldn't it?

Yes, so if you hit this with 2.6.24 too, it is very important to send the
oops log to identify the problem (or a link to a screen snapshot, a digital
camera snapshot or so).

...
> btw: What about the dmcrypt mailing list? I've tried to subscribe
> but got no answer, and I receive no posts (not even my own).

This is another story...
I hope that someone with admin rights will fix it (Christophe ?)

Milan

2008-02-05 01:28:30

by Christoph Anton Mitterer

Subject: Re: data corruption with dmcrypt/LUKS

On Mon, 2008-02-04 at 10:17 +0100, Milan Broz wrote:
> Yes, so if you hit this with 2.6.24 too, it is very important to send the
> oops log to identify the problem (or a link to a screen snapshot, a digital
> camera snapshot or so).
I did about 5 complete tests today and dozens of mkfs.ext3 runs, but I
wasn't able to reproduce either of the two errors... very, very strange.
(I used the same sequence of commands, with and without the
USB stick.)
I'll do some other tests tomorrow, because these problems were real and I
cannot believe that they're simply gone...

And IMHO hardware problems are still very unlikely, or am I wrong?

Anyway, is there anybody who has done deeper testing of dmcrypt? I mean
really massive tests, perhaps with different filesystems and so on?
What are your experiences at Red Hat?

Best wishes,
Chris



2008-02-15 21:26:07

by Christoph Anton Mitterer

Subject: Re: data corruption with dmcrypt/LUKS

Hi Filippo.


On Wed, 2008-02-13 at 22:39 +0100, Filippo Zangheri wrote:
> have you conducted further tests? Have you discovered anything?
I actually conducted some tests last week (also with aes-cbc-essiv) but
wasn't able to reproduce the two errors (tested it on the same computer,
with the same USB-sticks, same commands, kernel 2.6.24 etc.).
I'm not sure if I should be glad about this, because I definitely had
those two problems, but of course it's still possible (though I consider
it unlikely) that there were hardware problems.


Today I copied a complete Debian installation (about 6 GB) from an
unencrypted partition to a luks/dm-crypt partition with aes-xts-plain
(according to Herbert Xu, plain is the "most secure" and it's not
required to use the benbi mode for the IV generation).
I had no problems with this mode, either.

Best wishes,
Chris.

