2009-10-12 14:02:46

by Christian Pernegger

Subject: *Really* bad I/O latency with md raid5+dm-crypt+lvm

[Please keep me CCed as I'm not subscribed to LKML]

Summary: I was hoping to use a layered storage setup, namely lvm on
dm-crypt on md raid5 for a new box I'm setting up, but that isn't
looking so good since a single heavyish writer will monopolise any and
all I/O on the "device". F. ex. while cp'ing a few GB of data from an
external disk to the array it takes ~10sec to run ls and ~2min to
start aptitude. Clueless attempts at a diagnosis below.

Hardware:
AMD Athlon II X2 250
2GB Crucial DDR2-ECC RAM (more after testing)
ASUS M4A785D-M PRO
4x WD1000FYPS
connected to onboard SATA controller (AMD SB710 / ahci)

Software:
Debian 5.0.3 (lenny/stable)
Kernel: linux-image-2.6.30-bpo.2-amd64 (based on 2.6.30.5 it seems)

The 4 disks are each partitioned into a 256MB sdX1 and a $REST sdX2.
The sdX1s make up md0, a raid1 w/ 1.0 superblock for /boot.
The sdX2s make up md1, a raid5 w/ 1.1 superblock, 1MiB chunk size and
stripe_cache_size = 8192.
On top of md1 sits md1_crypt, a dm-crypt/luks layer using
aes-cbc-essiv:sha256 and a 256 bit key. It's aligned to 6144 sectors
(=3MiB / 1 stripe)
The whole of md1_crypt is an lvm PV with a metadatasize of 3008KiB.
(That's the poor man's way of aligning the data to
3MiB / 1 stripe. The lvm tools in stable are too old for proper
alignment options.)
The VG consisting of md1_crypt has 16GiB root, 4GiB swap, 200GiB home
and $REST data LVs.
All filesystems are ext3 with stride=256 and stripe-width=768. home is
mounted acl,user_xattr, data acl,user_xattr,noatime. Readahead on the
LVs is at 6MiB (2 stripes).
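
For reference, the alignment figures above can be re-derived with a short
sketch. The variable names are illustrative only, and a 4 KiB ext3 block
size is assumed (the post does not state it). It only covers the stripe,
stride and readahead numbers; how LVM rounds the 3008KiB metadata area up
to the 3MiB boundary is not modelled here.

```python
# Sketch of the alignment arithmetic for the layout described above.
# Input values come from the post; FS_BLOCK is an assumption.

KIB = 1024
MIB = 1024 * KIB
SECTOR = 512              # bytes per sector
FS_BLOCK = 4 * KIB        # assumed ext3 block size

disks = 4
chunk = 1 * MIB           # md raid5 chunk size
data_disks = disks - 1    # raid5 spends one chunk per stripe on parity

stripe = data_disks * chunk             # data bytes in one full stripe
stripe_sectors = stripe // SECTOR       # dm-crypt payload alignment
stride = chunk // FS_BLOCK              # ext3 stride, in fs blocks
stripe_width = stride * data_disks      # ext3 stripe-width, in fs blocks
readahead = 2 * stripe                  # readahead set on the LVs

print("stripe:", stripe // MIB, "MiB =", stripe_sectors, "sectors")
print("stride:", stride, "stripe-width:", stripe_width)
print("readahead:", readahead // MIB, "MiB")
```

The results agree with the values quoted in the post (6144 sectors,
stride=256, stripe-width=768, 6MiB readahead).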

So, first question: should this kind of setup work at all or am I
doing something pathological in the first place?

Anyway, as soon as I copy something to the array or create a larger
(upwards of a few hundred MiB) tar archive the box becomes utterly
unresponsive until that job is finished. Even on the local console the
completion time for a simple ls or cat is of the order of tens of
seconds, just forget about launching emacs.

Now I know that people have been ranting about desktop responsiveness
for a while but that was very much an abstract thing for me until now.
I'd never have thought it would hit me on a personal streaming media /
backups / multi-user general purpose server. Well, at the moment it's
single-user, single-job ... :-(

Here's what I tried:
changing scheduler from cfq to deadline (no effect)
tuning /proc/sys/vm/dirty*ratio way down (no effect)
turning off NCQ (some effect, maybe)
raising queue/nr_requests really high, e. g. 1000000 (helps
noticeably, especially when NCQ is off)
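
For anyone wanting to reproduce the list above, here is a dry-run sketch
of the knobs involved. It only prints what would be written and touches
nothing. The disk name "sda", the function name and the two dirty ratio
values are assumptions for illustration (the post only says "way down");
setting the queue depth to 1 is one common way of turning NCQ off.

```python
# Dry-run sketch: list the sysfs/procfs writes matching the list above.
# Nothing is written; the dirty ratio defaults are placeholder values.

def tuning_writes(disk, scheduler="deadline", nr_requests=1000000,
                  ncq=False, dirty_ratio=5, dirty_background_ratio=1):
    """Return (path, value) pairs for one component disk."""
    queue = f"/sys/block/{disk}/queue"
    writes = [
        (f"{queue}/scheduler", scheduler),
        (f"{queue}/nr_requests", str(nr_requests)),
        ("/proc/sys/vm/dirty_ratio", str(dirty_ratio)),
        ("/proc/sys/vm/dirty_background_ratio", str(dirty_background_ratio)),
    ]
    if not ncq:
        # A queue depth of 1 leaves no room for command queueing.
        writes.append((f"/sys/block/{disk}/device/queue_depth", "1"))
    return writes


for path, value in tuning_writes("sda"):
    print(f"echo {value} > {path}")
```

The scheduler and queue settings are per component disk, so the same
writes would have to be repeated for each of the four drives.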

Ideas:
According to openssl speed aes-256-cbc the CPU's encryption speed is
~113 MiB/s (single core, est. for 512b blocks). Obviously the array is
much faster than that. I can't find the benchmarks ATM but the numbers
seemed plausible for 70 MiB/s (optimistic est. for sequential access)
disks at the time. So let's say at least 50% faster. Wouldn't this move
the bottleneck for requests away from the scheduler queue thus
rendering it ineffective?
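
The comparison can be put into numbers with a rough sketch. The 113 and
70 MiB/s figures are the estimates from the paragraph above; scaling the
array linearly with the three data disks is an idealised assumption for
full-stripe sequential writes, not a measurement.

```python
# Rough sketch: which stage of the stack is the slowest for streaming writes?
# Figures are the estimates quoted above; linear scaling is an assumption.

crypto_mib_s = 113      # openssl speed estimate, one core
disk_mib_s = 70         # optimistic sequential rate of one disk
data_disks = 3          # 4-disk raid5, one chunk per stripe is parity

array_mib_s = disk_mib_s * data_disks   # idealised full-stripe write rate

stages = {
    "dm-crypt (one core)": crypto_mib_s,
    "md raid5 (sequential)": array_mib_s,
}
bottleneck = min(stages, key=stages.get)
headroom = array_mib_s / crypto_mib_s

print("slowest stage:", bottleneck)
print(f"array is about {headroom:.2f}x the crypto rate")
```

If the assumption holds, the crypto layer rather than the disks limits a
streaming write, which is consistent with the "at least 50% faster"
estimate.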

Also, running btrace on the various block device layers I never see
>4k writes, even when using dd with a blocksize of 3 MiB. Is this
normal? btrace on (one of) the component disks shows some merged
requests at least. Am I wrong, or would scheduling/merging lots and
lots of 4k blocks effectively take an *insane* queue length?

All comments and suggestions welcome

Thank you,

Chris


2009-10-12 14:26:39

by Arjan van de Ven

Subject: Re: *Really* bad I/O latency with md raid5+dm-crypt+lvm

On Mon, 12 Oct 2009 16:01:58 +0200
Christian Pernegger wrote:

> [Please keep me CCed as I'm not subscribed to LKML]
>
> Summary: I was hoping to use a layered storage setup, namely lvm on
> dm-crypt on md raid5 for a new box I'm setting up, but that isn't
> looking so good since a single heavyish writer will monopolise any and
> all I/O on the "device". F. ex. while cp'ing a few GB of data from an
> external disk to the array it takes ~10sec to run ls and ~2min to
> start aptitude. Clueless attempts at a diagnosis below.


Have you run latencytop?

2009-10-12 14:49:13

by Tomasz Chmielewski

Subject: Re: *Really* bad I/O latency with md raid5+dm-crypt+lvm

> Summary: I was hoping to use a layered storage setup, namely lvm on
> dm-crypt on md raid5 for a new box I'm setting up, but that isn't
> looking so good since a single heavyish writer will monopolise any and
> all I/O on the "device". F. ex. while cp'ing a few GB of data from an
> external disk to the array it takes ~10sec to run ls and ~2min to
> start aptitude. Clueless attempts at a diagnosis below.

Did you try running strace to see where ls pauses?

Did you try running latencytop (and generally, top/htop while doing your
tests)?


(...)

> Anyway, as soon as I copy something to the array or create a larger
> (upwards of a few hundred MiB) tar archive the box becomes utterly
> unresponsive until that job is finished. Even on the local console the
> completion time for a simple ls or cat is of the order of tens of
> seconds, just forget about launching emacs.
> Now I know that people have been ranting about desktop responsiveness
> for a while but that was very much an abstract thing for me until now.

I think the above (big latency when doing some bigger IO) is a general
Linux problem.

I see similar behaviour on quite powerful hardware, e.g. Core i7, 8 GB
RAM, 2x HDD in a software RAID-1 array (no dm-crypt), when tarring
something big, or writing dd if=/dev/zero of=/home/me/bigfile - doing ls
in another terminal or just starting top can take up to a minute.

Quite interestingly, background RAID synchronization has almost no
effect on latency.

(...)


> According to openssl speed aes-256-cbc the CPUs encryption speed is
> ~113 MiB/s (single core, est. for 512b blocks). Obviously the array is
> much faster than that. I can't find the benchmarks ATM but the numbers
> seemed plausible for 70 MiB/s (optimistic est. for sequential access)
> disks at the time.

You can find some dm-crypt benchmarks e.g. here:

http://blog.wpkg.org/2009/04/23/cipher-benchmark-for-dm-crypt-luks/

Obviously, they will not match your hardware.

Also note that dm-crypt is not "SMP-ready", so whatever hardware you
have, it will only use one CPU - this may seriously limit the
performance, depending on your usage and hardware.


--
Tomasz Chmielewski

2009-10-12 17:39:14

by Mike Galbraith

Subject: Re: *Really* bad I/O latency with md raid5+dm-crypt+lvm

On Mon, 2009-10-12 at 16:48 +0200, Tomasz Chmielewski wrote:
> > Summary: I was hoping to use a layered storage setup, namely lvm on
> > dm-crypt on md raid5 for a new box I'm setting up, but that isn't
> > looking so good since a single heavyish writer will monopolise any and
> > all I/O on the "device". F. ex. while cp'ing a few GB of data from an
> > external disk to the array it takes ~10sec to run ls and ~2min to
> > start aptitude. Clueless attempts at a diagnosis below.
>
> Did you try running strace to see where ls pauses?
>
> Did you try running latencytop (and generally, top/htop while doing your
> tests)?
>
>
> (...)
>
> > Anyway, as soon as I copy something to the array or create a larger
> > (upwards of a few hundred MiB) tar archive the box becomes utterly
> > unresponsive until that job is finished. Even on the local console the
> > completion time for a simple ls or cat is of the order of tens of
> > seconds, just forget about launching emacs.
> > Now I know that people have been ranting about desktop responsiveness
> > for a while but that was very much an abstract thing for me until now.
>
> I think the above (big latency when doing some bigger IO) is a general
> Linux problem.

It would be interesting to test latest -rc. Though it may prove to be
unrelated, the symptoms sound very much like a recent thread wrt writers
starving readers.

-Mike

2009-10-12 19:06:38

by Christian Pernegger

Subject: Re: *Really* bad I/O latency with md raid5+dm-crypt+lvm

>> [Please keep me CCed as I'm not subscribed to LKML]
>>
>> Summary: I was hoping to use a layered storage setup, namely lvm on
>> dm-crypt on md raid5 for a new box I'm setting up, but that isn't
>> looking so good since a single heavyish writer will monopolise any and
>> all I/O on the "device". F. ex. while cp'ing a few GB of data from an
>> external disk to the array it takes ~10sec to run ls and ~2min to
>> start aptitude. Clueless attempts at a diagnosis below.

> Also note that dm-crypt is not "SMP-ready", so whatever hardware you have,
> it will only use once CPU - this may seriously limit the performance,
> depending on your usage and hardware.

The crypto performance itself is fine. Yes, it limits throughput to a
little over 100MiB/s but so what, that's plenty. Multi-core support
will come in time, I can wait. What I can't live with is a single
streaming write singlehandedly starving all reads. Linux has never
been great at this and it has been getting worse since ~2.6.18 but it
was never more than a nuisance (say <1sec delay).

It's as if the I/O scheduler weren't there.

> [latencytop? regular top?]

Actually I hadn't heard of latencytop but it looks nifty. Will have to
compile a custom kernel for it, though, since Debian kernels don't
have CONFIG_LATENCYTOP set.

Regular top has kcryptd (67%), mv (36%), md1_raid5 (36%), pdflush
(7%), kjournald (5%) at the top. Seems a bit much for md, doesn't it?
This is while mv'ing in some data from an additional SATA disk; latency
isn't *too* bad, ~3s for an ls. According to iostat mv is writing to
the array at 50-60 MB/s. The fun part: it's using ~15000tps averaging
out to 4k per transaction as observed via btrace.

That can't be normal, can it?
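
A quick sketch of the arithmetic behind the 4k figure, using the iostat
numbers above. Treating iostat's MB as 2^20 bytes is an assumption; with
10^6 bytes the result is still roughly 4k. The names are illustrative.

```python
# Sketch: average request size implied by the iostat figures above.
# Treating "MB" as MiB is an assumption; the conclusion does not depend on it.

KIB = 1024
MIB = 1024 * KIB

tps = 15000                         # transactions per second
low, high = 50 * MIB, 60 * MIB      # observed write rate range, bytes/s

avg_low = low / tps                 # bytes per transaction, lower bound
avg_high = high / tps               # bytes per transaction, upper bound

stripe = 3 * MIB                    # one full raid5 data stripe
requests_per_stripe = stripe // (4 * KIB)

print(f"average request: {avg_low / KIB:.1f} to {avg_high / KIB:.1f} KiB")
print("4k requests needed to fill one stripe:", requests_per_stripe)
```

So filling a single 3MiB stripe takes 768 separate 4k requests, which
gives an idea of how much merging would be needed further down the stack.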

Thanks,

C.