2008-02-28 13:54:38

by Ritesh Raj Sarraf

Subject: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Hi Christophe,

I noticed kernel soft lockup messages on my laptop when doing a lot of I/O
(200GB) to a dm-crypt device. It was set up using LUKS.
The I/O was never disrupted and nothing failed. There were just the messages.

Kernel: 2.6.24
Distribution: Debian Testing/Unstable
Tainted: Yes (nvidia proprietary drivers)

I've not filed a bugzilla report because my kernel is tainted by the
nvidia drivers.

I'm attaching the messages. Please let me know if it stands as a candidate for
a bug report.


Ritesh
--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."



2008-02-29 07:22:54

by Andrew Morton

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Thu, 28 Feb 2008 19:24:03 +0530 Ritesh Raj Sarraf <[email protected]> wrote:

> Hi Christophe,

(cc's added)

> I noticed kernel soft lockup messages on my laptop when doing a lot of I/O
> (200GB) to a dm-crypt device. It was set up using LUKS.
> The I/O was never disrupted and nothing failed. There were just the messages.
>
> Kernel: 2.6.24
> Distribution: Debian Testing/Unstable
> Tainted: Yes (nvidia proprietary drivers)
>
> I've not filed a bugzilla report because my kernel is tainted by the
> nvidia drivers.

That would be pretty dogmatic - if nuking the nvidia module prevents this
I'll eat several hats.

> I'm attaching the messages. Please let me know if it stands as a candidate for
> a bug report.
>

> a200 EDI: 0000000a EBP: 00000000 ESP: f32bfd7c
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: b3c3e000 CR3: 003b5000 CR4: 000026d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> [<c012902d>] do_softirq+0x45/0x53
> [<c0129291>] irq_exit+0x38/0x6b
> [<c01066f2>] do_IRQ+0x5a/0x70
> [<c01048c3>] common_interrupt+0x23/0x28
> [<f899202f>] xor_128+0x0/0x17 [cbc]
> [<f899237e>] crypto_cbc_encrypt+0xe4/0x146 [cbc]
> [<f899202f>] xor_128+0x0/0x17 [cbc]
> [<c01dd80a>] cfq_allow_merge+0x0/0x5a
> [<f89ad6ef>] aes_encrypt+0x0/0x17 [aes_i586]
> [<f88fe648>] crypt_convert_scatterlist+0x73/0xc3 [dm_crypt]
> [<f88fe7e0>] crypt_convert+0x148/0x185 [dm_crypt]
> [<f88fe9fe>] kcryptd_do_crypt+0x1e1/0x25e [dm_crypt]
> [<f88fe81d>] kcryptd_do_crypt+0x0/0x25e [dm_crypt]
> [<c0132225>] run_workqueue+0x7d/0x109
> [<c0135554>] prepare_to_wait+0x12/0x49
> [<c0132a9b>] worker_thread+0x0/0xc5
> [<c0132b55>] worker_thread+0xba/0xc5
> [<c0135441>] autoremove_wake_function+0x0/0x35
> [<c013537a>] kthread+0x38/0x5e
> [<c0135342>] kthread+0x0/0x5e
> [<c0104b0f>] kernel_thread_helper+0x7/0x10
> =======================
> BUG: soft lockup - CPU#0 stuck for 11s! [kcryptd:22652]
>
> Pid: 22652, comm: kcryptd Tainted: P (2.6.24-1-686 #1)
> EIP: 0060:[<c0128f6c>] EFLAGS: 00000202 CPU: 0
> EIP is at __do_softirq+0x57/0xd3
> EAX: c03b4860 EBX: 00000020 ECX: 00000009 EDX: 01c5c000
> ESI: c036a200 EDI: 0000000a EBP: 00000000 ESP: f32bfd30
> DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> CR0: 8005003b CR2: b3c3e000 CR3: 003b5000 CR4: 000026d0
> DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
> DR6: ffff0ff0 DR7: 00000400
> [<c012902d>] do_softirq+0x45/0x53
> [<c0129291>] irq_exit+0x38/0x6b
> [<c01066f2>] do_IRQ+0x5a/0x70
> [<c01048c3>] common_interrupt+0x23/0x28
> [<c01100d8>] cyrix_get_arr+0xb4/0x126
> [<c011ad36>] native_flush_tlb_single+0x3/0x4
> [<c011d0e9>] kunmap_atomic+0x60/0x94
> [<f89742d5>] blkcipher_walk_done+0x87/0x1fe [blkcipher]
> [<f89923cc>] crypto_cbc_encrypt+0x132/0x146 [cbc]
> [<f899202f>] xor_128+0x0/0x17 [cbc]
> [<c01dd80a>] cfq_allow_merge+0x0/0x5a
> [<f89ad6ef>] aes_encrypt+0x0/0x17 [aes_i586]
> [<f88fe648>] crypt_convert_scatterlist+0x73/0xc3 [dm_crypt]
> [<f88fe7e0>] crypt_convert+0x148/0x185 [dm_crypt]
> [<f88fe9fe>] kcryptd_do_crypt+0x1e1/0x25e [dm_crypt]
> [<f88fe81d>] kcryptd_do_crypt+0x0/0x25e [dm_crypt]
> [<c0132225>] run_workqueue+0x7d/0x109
> [<c0135554>] prepare_to_wait+0x12/0x49
> [<c0132a9b>] worker_thread+0x0/0xc5
> [<c0132b55>] worker_thread+0xba/0xc5
> [<c0135441>] autoremove_wake_function+0x0/0x35
> [<c013537a>] kthread+0x38/0x5e
> [<c0135342>] kthread+0x0/0x5e
> [<c0104b0f>] kernel_thread_helper+0x7/0x10
> =======================
> BUG: soft lockup - CPU#0 stuck for 11s! [kcryptd:22652]
>

Could be a dm-crypt problem, could be a crypto problem, could even be a
core block problem.

If nothing happens in the next few days, yes, please do raise a bugzilla
report. That helps us to avoid forgetting about it, but it doesn't do much
to get things fixed, I'm afraid.

If you can provide us with a simple step-by-step recipe to reproduce this,
and if others can indeed reproduce it, the chances of getting it fixed will
increase.


Now, I'm assuming that it's just unreasonable for a machine to spend a full
11 seconds crunching away on crypto in that code path. Maybe it _is_
reasonable, and all we need to do is to poke a cond_resched() in there
somewhere. Herbert, any thoughts? What's the speed of that code?

Thanks.

2008-02-29 18:16:18

by Herbert Xu

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Thu, Feb 28, 2008 at 11:20:48PM -0800, Andrew Morton wrote:
>
> Now, I'm assuming that it's just unreasonable for a machine to spend a full
> 11 seconds crunching away on crypto in that code path. Maybe it _is_
> reasonable, and all we need to do is to poke a cond_resched() in there
> somewhere. Herbert, any thoughts? What's the speed of that code?

It encrypts 512 bytes each time so it should definitely be pretty
quick. Perhaps the caller is disabling interrupts or something?

Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <[email protected]>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

2008-02-29 18:47:17

by Alasdair G Kergon

Subject: Re: [dm-devel] Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Thu, Feb 28, 2008 at 11:20:48PM -0800, Andrew Morton wrote:
> On Thu, 28 Feb 2008 19:24:03 +0530 Ritesh Raj Sarraf <[email protected]> wrote:
> > Kernel: 2.6.24
> > Distribution: Debian Testing/Unstable
> > Tainted: Yes (nvidia proprietary drivers)

Any chance you can try to reproduce it upstream e.g. in 2.6.25-rc3?

There have been significant changes in this area of the code.

Alasdair
--
[email protected]

2008-02-29 18:59:46

by Ritesh Raj Sarraf

Subject: Re: [dm-devel] Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Saturday 01 March 2008, Alasdair G Kergon wrote:
> On Thu, Feb 28, 2008 at 11:20:48PM -0800, Andrew Morton wrote:
> > On Thu, 28 Feb 2008 19:24:03 +0530 Ritesh Raj Sarraf <[email protected]> wrote:
> > > Kernel: 2.6.24
> > > Distribution: Debian Testing/Unstable
> > > Tainted: Yes (nvidia proprietary drivers)
>
> Any chance you can try to reproduce it upstream e.g. in 2.6.25-rc3?
>
I can do that but only by Monday Evening IST.

Meanwhile I was able to reproduce the bug again with the same configuration
and the same scenario. So I believe that the bug can be reproduced
consistently.

Here are the steps:

1) Initialize a device using dm-crypt and LUKS
2) Create a filesystem on top of it and mount it.
3) Write a huge amount of data (as a normal user). Something like 150GB.

As the load goes high (to something like 12-14), the kernel soft lockup is
logged in dmesg.
At that moment, the OS is barely responsive.

The I/O scheduler in use is:
rrs@learner:/sys/block/sdb/queue$ cat scheduler
noop anticipatory deadline [cfq]
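
The bracketed entry in that output is the scheduler currently in effect. A minimal sketch for extracting it in a script follows; the function name active_scheduler is made up for illustration, and the sysfs path in the usage comment is simply the one from the report above.

```shell
# Print the active I/O scheduler, i.e. the name inside [brackets],
# from the contents of a /sys/block/<dev>/queue/scheduler file on stdin.
# active_scheduler is a hypothetical helper name, not an existing tool.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

# Usage (device name taken from the report above):
#   active_scheduler < /sys/block/sdb/queue/scheduler
```

Recording this alongside each lockup report makes it easier to tell later whether the I/O scheduler plays any role.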

The kernel logs are the same as last time, but I'm attaching them anyway. The
reported stall is still 11 seconds.

Ritesh
--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."



2008-03-01 19:46:20

by Milan Broz

Subject: Re: [dm-devel] Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Ritesh Raj Sarraf wrote:

> 1) Initialize a device using dm-crypt and LUKS
> 2) Create a filesystem on top of it and mount it.
> 3) Write a huge amount of data (as a normal user). Something like 150GB.
>
> As the load goes high (to something like 12-14), the kernel soft lockup is
> logged in dmesg.
> At that moment, the OS is barely responsive.
>
>
Could you please try to reproduce it with this patch applied?
(This patch is for 2.6.25-rc3; one for 2.6.24 will follow, since the code changed here.)

Milan
[email protected]
--

Add cond_resched() to prevent stuck in big bio processing.

Signed-off-by: Milan Broz <[email protected]>
---
drivers/md/dm-crypt.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b04f98d..2032228 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -432,6 +432,7 @@ static int crypt_convert(struct crypt_config *cc,
/* fall through*/
case 0:
ctx->sector++;
+ cond_resched();
continue;
}


2008-03-01 19:48:53

by Milan Broz

Subject: Re: [dm-devel] Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

(the same patch for 2.6.24)

Milan
--
Add cond_resched() to prevent stuck in big bio processing.

Signed-off-by: Milan Broz <[email protected]>
---
drivers/md/dm-crypt.c | 1 +
1 file changed, 1 insertion(+)

Index: linux-2.6.24.3/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.24.3.orig/drivers/md/dm-crypt.c 2008-02-26 01:20:20.000000000 +0100
+++ linux-2.6.24.3/drivers/md/dm-crypt.c 2008-03-01 16:46:24.000000000 +0100
@@ -374,6 +374,7 @@ static int crypt_convert(struct crypt_co
break;

ctx->sector++;
+ cond_resched();
}

return r;

2008-03-06 14:43:27

by Ritesh Raj Sarraf

Subject: Re: [dm-devel] Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

I have a problem now.
I'm not able to reproduce the bug on an IBM xSeries box.

CPU: Dual Core 3.8 GHz
RAM: 1 GB

The big difference in configuration is that the laptop read data from one
USB HDD and copied it to another USB HDD (dm-crypted), whereas the IBM box
reads it from an NFS share and copies it to the local HDD (dm-crypted).

Ritesh


On Sunday 02 March 2008, Milan Broz wrote:
> (the same patch for 2.6.24)
>
> Milan
> --
> Add cond_resched() to prevent stuck in big bio processing.
>
> Signed-off-by: Milan Broz <[email protected]>
> ---
> drivers/md/dm-crypt.c | 1 +
> 1 file changed, 1 insertion(+)
>
> Index: linux-2.6.24.3/drivers/md/dm-crypt.c
> ===================================================================
> --- linux-2.6.24.3.orig/drivers/md/dm-crypt.c 2008-02-26 01:20:20.000000000 +0100
> +++ linux-2.6.24.3/drivers/md/dm-crypt.c 2008-03-01 16:46:24.000000000 +0100
> @@ -374,6 +374,7 @@ static int crypt_convert(struct crypt_co
> break;
>
> ctx->sector++;
> + cond_resched();
> }
>
> return r;



--
Ritesh Raj Sarraf
RESEARCHUT - http://www.researchut.com
"Necessity is the mother of invention."



2008-06-02 03:08:00

by Yan Li

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Thu, 28 Feb 2008 23:20:48 -0800, Andrew Morton wrote:
> On Thu, 28 Feb 2008 19:24:03 +0530 Ritesh Raj Sarraf <[email protected]> wrote:
> > I noticed kernel soft lockup messages on my laptop when doing a lot of I/O
> > (200GB) to a dm-crypt device. It was set up using LUKS.
> > The I/O was never disrupted and nothing failed. There were just the messages.

I ran into the same problem yesterday.

> Could be a dm-crypt problem, could be a crypto problem, could even be a
> core block problem.

I think it's due to heavy encryption computation that runs longer than
10s and triggers the warning. By heavy I mean dm-crypt with
aes-xts-plain and a 512-bit key.

This is a typical soft lockup call trace snippet from dmesg:
Call Trace:
[<ffffffff882c60b6>] :xts:crypt+0x9d/0xea
[<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
[<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
[<ffffffff882c622e>] :xts:encrypt+0x41/0x46
[<ffffffff8828273f>] :dm_crypt:crypt_convert_scatterlist+0x7b/0xc7
[<ffffffff882828ae>] :dm_crypt:crypt_convert+0x123/0x15d
[<ffffffff88282abd>] :dm_crypt:kcryptd_do_crypt+0x1d5/0x253
[<ffffffff882828e8>] :dm_crypt:kcryptd_do_crypt+0x0/0x253
[<ffffffff802448e5>] run_workqueue+0x7f/0x10b
... (omitted)

> If nothing happens in the next few days, yes, please do raise a bugzilla
> report.

Has anybody done this yet? If not, I'll do it.

> If you can provide us with a simple step-by-step recipe to reproduce this,
> and if others can indeed reproduce it, the chances of getting it fixed will
> increase.

Here are my steps to reproduce:

1. You need a moderate computer; it can't be too fast (I'm testing
this on an Intel(R) Xeon Duo 3040 @ 1.86GHz with 2G ECC RAM in a
Dell SC440 server, and it's slow enough). On a faster computer the
computation may be fast enough not to trigger the soft lockup
detector.

2. Use a 2.6.24+ kernel (I'm using a 2.6.24-etchnhalf.1-amd64 from
Debian)

3. Create a big partition (or a loop file, which I think is OK too), at least
40G.

4. # modprobe xts
# modprobe aes (or aes-x86_64, same result)
# cryptsetup -c aes-xts-plain -s 512 luksFormat /dev/sd<Partition>
# cryptsetup luksOpen /dev/sd<Partition> open_par

5. Do heavy I/O on it, like this:
# dd if=/dev/zero of=/dev/mapper/open_par

6. After some time (about one hour), run top. I found "kcryptd"
running at 100%sy. Checking dmesg, I found the soft lockup warning.
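
The check in step 6 can be scripted. A minimal sketch, assuming the warning text matches the "BUG: soft lockup" lines quoted earlier in this thread; the function name count_soft_lockups is hypothetical.

```shell
# Count "BUG: soft lockup" lines in kernel log text read from stdin.
# count_soft_lockups is a hypothetical helper name.
count_soft_lockups() {
    # grep -c prints 0 but exits non-zero when nothing matches;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'BUG: soft lockup' || true
}

# Usage while the dd from step 5 is running:
#   dmesg | count_soft_lockups
```

A count that keeps growing between invocations indicates the lockup detector is still firing.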

I think disk I/O speed is not important here. I'm using a 500G SATA2
drive.

On my server, only AES-XTS with a 512-bit key is slow enough to trigger
the lockup detector. Other slow cipher modes such as AES-CBC are OK; I have
tested them for hours without any problem.

> Now, I'm assuming that it's just unreasonable for a machine to spend a full
> 11 seconds crunching away on crypto in that code path. Maybe it _is_
> reasonable, and all we need to do is to poke a cond_resched() in there
> somewhere.

I think this can solve the problem. However, it may harm performance
for most average users, who use only simple crypto such as
CBC-ESSIV, or on high-end servers that can handle XTS
with a 512-bit key in less than 10s.

Or we can just ignore this problem if there's no data
corruption. For moderate computers running XTS with a 512-bit key,
the status quo is not very bad: only some soft lockup warnings in dmesg and
an unresponsive system. We could add a warning to the documentation like
"running AES-XTS with a 512-bit key is a CPU hog and may slow down
your computer."

Has anybody seen data corruption?

--
Li, Yan

2008-06-02 06:53:41

by Milan Broz

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt


Yan Li wrote:
> On Thu, 28 Feb 2008 23:20:48 -0800, Andrew Morton wrote:
>> On Thu, 28 Feb 2008 19:24:03 +0530 Ritesh Raj Sarraf <[email protected]> wrote:
>>> I noticed kernel soft lockup messages on my laptop when doing a lot of I/O
>>> (200GB) to a dm-crypt device. It was set up using LUKS.
>>> The I/O was never disrupted and nothing failed. There were just the messages.
>
> I ran into the same problem yesterday.
>
>> Could be a dm-crypt problem, could be a crypto problem, could even be a
>> core block problem.
>
> I think it's due to heavy encryption computation that runs longer than
> 10s and triggers the warning. By heavy I mean dm-crypt with
> aes-xts-plain and a 512-bit key.
>
> This is a typical soft lockup call trace snippet from dmesg:
> Call Trace:
> [<ffffffff882c60b6>] :xts:crypt+0x9d/0xea
> [<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
> [<ffffffff882b5705>] :aes_x86_64:aes_encrypt+0x0/0x5
> [<ffffffff882c622e>] :xts:encrypt+0x41/0x46
> [<ffffffff8828273f>] :dm_crypt:crypt_convert_scatterlist+0x7b/0xc7
> [<ffffffff882828ae>] :dm_crypt:crypt_convert+0x123/0x15d
> [<ffffffff88282abd>] :dm_crypt:kcryptd_do_crypt+0x1d5/0x253
> [<ffffffff882828e8>] :dm_crypt:kcryptd_do_crypt+0x0/0x253
> [<ffffffff802448e5>] run_workqueue+0x7f/0x10b
> ... (omitted)

Could you please check whether the patch here helps and doesn't cause performance degradation?

http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch

...
> Has anybody seen data corruption?

It shouldn't cause any corruption of data.

Milan

2008-06-02 12:32:20

by Yan Li

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Hi Milan,

On Mon, Jun 02, 2008 at 08:52:00AM +0200, Milan Broz wrote:
> Could you please check whether the patch here helps and doesn't cause performance degradation?
> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch

Will results from testing a Debian 2.6.24-etchnhalf.1-amd64 kernel
(very near a vanilla kernel) be of the same value? The data on some
other drives on this server is important, so I dare not try 2.6.25-rc
on it.

Following is my test plan; comments are welcome:

Test command:
# dd if=/dev/zero of=/dev/mapper/open_device bs=500M count=10
(this server has 2G memory)

The command will be run 3 times, and the average speed of the last two
runs will be taken as the result score.

Dm-crypt LUKS Encryption scenarios:
aes-cbc-essiv:sha256, keysize 128
aes-xts-plain, keysize 256
aes-xts-plain, keysize 512

I will compare the speed of all 3 encryption scenarios above, with and
without the patch.
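
The scoring rule in the plan (three runs, average of the last two) can be sketched as a small helper. The name avg_last_two is hypothetical, and the inputs are assumed to be throughput figures in a common unit, for example the MB/s values that dd reports.

```shell
# Average the last two of the throughput figures given as arguments,
# discarding any earlier (warm-up) runs.
# avg_last_two is a hypothetical helper name.
avg_last_two() {
    if [ "$#" -lt 2 ]; then
        echo "need at least two values" >&2
        return 1
    fi
    # Shift until only the last two arguments remain.
    while [ "$#" -gt 2 ]; do shift; done
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (a + b) / 2 }'
}

# Usage with three made-up measurements in MB/s:
#   avg_last_two 40 50 60
```

Dropping the first run is a common way to keep one-off start-up effects out of the score.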

--
Li, Yan

2008-06-02 12:52:27

by Milan Broz

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Yan Li wrote:

>> Could you please check whether the patch here helps and doesn't cause performance degradation?
>> http://www2.kernel.org/pub/linux/kernel/people/agk/patches/2.6/2.6.25/dm-crypt-add-cond_resched.patch
>>
>
> Will results from testing a Debian 2.6.24-etchnhalf.1-amd64 kernel
> (very near a vanilla kernel) be of the same value? The data on some
> other drives on this server is important, so I dare not try 2.6.25-rc
> on it.
>
The patch just adds cond_resched(); the problem is the same in all recent kernels, I think.
For the 2.6.24 kernel the patch just needs to be slightly modified (see below).

> Following is my test plan; comments are welcome:
>
> Test command:
> # dd if=/dev/zero of=/dev/mapper/open_device bs=500M count=10
> (this server has 2G memory)
>
A bonnie++ test or something like that would be more appropriate, but
for this problem a dd test is enough.

> The command will be run 3 times, and the average speed of the last two
> runs will be taken as the result score.
>
>
Flush caches between tests, or simply luksClose & luksOpen + mount the device
between test runs.
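
One way to flush caches between runs is the kernel's drop_caches interface. This is a minimal sketch rather than the procedure anyone in this thread actually used; the helper name flush_caches and its optional argument are hypothetical, and writing to the real file requires root.

```shell
# Write back dirty pages, then ask the kernel to drop the clean page cache,
# dentries and inodes by writing 3 to /proc/sys/vm/drop_caches.
# flush_caches is a hypothetical helper; the optional argument overrides the
# target file so the function can be dry-run against a scratch file.
flush_caches() {
    target="${1:-/proc/sys/vm/drop_caches}"
    sync
    echo 3 > "$target"
}

# Usage (as root), between two benchmark runs:
#   flush_caches
```

Running sync first matters because drop_caches only discards clean pages; dirty data has to be written back before it can be dropped.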

> Dm-crypt LUKS Encryption scenarios:
> aes-cbc-essiv:sha256, keysize 128
> aes-xts-plain, keysize 256
> aes-xts-plain, keysize 512
>
> I will compare the speed of all 3 encryption scenarios above, with and
> without the patch.
>
>
Patch for 2.6.24 kernel

Add cond_resched() to prevent stuck in big bio processing.

Signed-off-by: Milan Broz <[email protected]>
---
drivers/md/dm-crypt.c | 1 +
1 file changed, 1 insertion(+)

Index: linux-2.6.24.3/drivers/md/dm-crypt.c
===================================================================
--- linux-2.6.24.3.orig/drivers/md/dm-crypt.c 2008-02-26 01:20:20.000000000 +0100
+++ linux-2.6.24.3/drivers/md/dm-crypt.c 2008-03-01 16:46:24.000000000 +0100
@@ -374,6 +374,7 @@ static int crypt_convert(struct crypt_co
break;

ctx->sector++;
+ cond_resched();
}

return r;

2008-06-03 23:14:15

by Yan Li

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Wed, Jun 04, 2008 at 01:16:30AM +0530, Ritesh Raj Sarraf wrote:
> Following is the bugzilla that was opened against this problem.
> http://bugzilla.kernel.org/show_bug.cgi?id=10378
>
> Since I wasn't able to reproduce it on a server machine again, it was later
> closed.
>
> If you think it is the same issue, please feel free to re-open it.

I think they are not the same. My problem lay in slow crypto
computation under heavy I/O. I'm testing Milan Broz's patch; so far
it seems to have solved my problem.

--
Li, Yan

2008-06-05 22:44:32

by Yan Li

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

On Mon, Jun 02, 2008 at 02:51:04PM +0200, Milan Broz wrote:
> Patch for 2.6.24 kernel
> Add cond_resched() to prevent stuck in big bio processing.

This patch has actually led to a performance _gain_.

Test Result, performance gain:
aes-cbc-essiv:sha256, keysize 128: 2.53%
aes-xts-plain, keysize 256: 0.26%
aes-xts-plain, keysize 512: 9.31%
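
The spreadsheet behind these percentages is not reproduced here. Assuming the gain is the throughput difference relative to the unpatched kernel, it could be computed as in this sketch; the name percent_gain and the numbers in the usage comment are hypothetical.

```shell
# Relative gain of a patched result over an unpatched baseline, in percent:
#   (patched - baseline) / baseline * 100
# percent_gain is a hypothetical helper; both arguments are throughput
# figures in the same unit.
percent_gain() {
    awk -v base="$1" -v new="$2" 'BEGIN {
        if (base <= 0) {
            print "baseline must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", (new - base) / base * 100
    }'
}

# Usage with made-up figures (unpatched 50 MB/s, patched 55 MB/s):
#   percent_gain 50 55
```

A negative result would indicate that the patched kernel was slower than the baseline.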

Test kernel:
AMD64 2.6.24 from Debian Etch-and-a-half

Test command:
# dd if=/dev/zero of=/dev/mapper/open_device bs=500M count=100

This writes 50G of zeros to an open LUKS raw device (no
filesystem overhead here), in 500M blocks. This mainly stresses
the cryptographic and dm code, with little other overhead. During the test,
CPU usage was always at 100%, so HD speed was not the bottleneck.

The count is 10 times bigger than in my initial plan. By doing this
I found that, on my server, every encryption method triggered a
soft lockup at least once. So this problem is universal, not
limited to the XTS or LRW operation modes.

With the patched kernel, soft lockups _no longer_ occurred.

This server has 2G memory, Intel Xeon Duo @ 1.86GHz.

The command was run 3 times, and the average speed of the last two
runs was taken as the result score.

The device was synced (luksClose ; sync ; luksOpen) between tests.

My test script (Makefile), calculation spreadsheet and raw test
results are attached.

--
Li, Yan


Attachments:
(No filename) (1.34 kB)
Makefile (3.27 kB)
2008-06-05-patched.out (4.47 kB)

2008-06-06 06:48:11

by Milan Broz

Subject: Re: 2.6.24 Kernel Soft Lock Up with heavy I/O in dm-crypt

Yan Li wrote:
> On Mon, Jun 02, 2008 at 02:51:04PM +0200, Milan Broz wrote:
>> Patch for 2.6.24 kernel
>> Add cond_resched() to prevent stuck in big bio processing.
>
> This patch has actually led to a performance _gain_.
hmmm, nice:)

> With the patched kernel, soft lockups _no longer_ occurred.

Alasdair, could you please move this patch back into the current tree
and send it upstream?

We have at least two separate reports confirming that it fixes
the problem.

Milan
--
[email protected]