2005-11-21 20:31:17

by Lars Roland

Subject: Poor Software RAID-0 performance with 2.6.14.2

I have created a stripe across two 500 GB disks located on separate IDE
channels using:

mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd

the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
with hdparm and blockdev tuning); both bonnie++ and hdparm (included
below) show a single disk operating faster than the stripe:

----
dkstorage01:~# hdparm -t /dev/md0
/dev/md0:
Timing buffered disk reads: 182 MB in 3.01 seconds = 60.47 MB/sec

dkstorage02:~# hdparm -t /dev/hdc1
/dev/hdc1:
Timing buffered disk reads: 184 MB in 3.02 seconds = 60.93 MB/sec
----

I am aware of cpu overhead with software raid but such a degradation
should not be the case with raid 0, especially not when the OS is
located on a separate SCSI disk - the IDE disks should just be ready
to work.

There have been some earlier reports of this problem, but they all
seem to end more or less inconclusively (here is one:
http://kerneltrap.org/node/4745). Some people favor switching to
dmraid with device mapper; is this the de facto standard today?

Examining the setup with mdadm gives:
-------
dkstorage01:~# mdadm -E /dev/hdb
/dev/hdb:
Magic : a92b4efc
Version : 00.90.02
UUID : 7edc2c10:6cb402e8:06d9bd91:57b11f01
Creation Time : Mon Nov 21 19:38:30 2005
Raid Level : raid0
Device Size : 488386496 (465.76 GiB 500.11 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0

Update Time : Mon Nov 21 19:38:30 2005
State : active
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Checksum : 9766c3ec - correct
Events : 0.1

Chunk Size : 4K

Number Major Minor RaidDevice State
this 1 3 64 1 active sync /dev/hdb

0 0 22 64 0 active sync /dev/hdd
1 1 3 64 1 active sync /dev/hdb
-------

mdadm is v1.12.0.



--
Lars Roland


2005-11-21 20:47:54

by Lennart Sorensen

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

On Mon, Nov 21, 2005 at 09:31:14PM +0100, Lars Roland wrote:
> I have created a stripe across two 500 GB disks located on separate IDE
> channels using:
>
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd

Does -l0 equal stripe or linear? The mdadm man page doesn't seem clear
on that to me.

If it defaults to linear, then you shouldn't expect any performance gain
since that would just stick one drive after the other (no striping).
Try explicitly stating -l stripe instead of -l 0.
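
I.e. something like this (untested - just the same command with the level
spelled out):

mdadm -Cv /dev/md0 -c32 -n2 -l stripe /dev/hdb /dev/hdd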

> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning); both bonnie++ and hdparm (included
> below) show a single disk operating faster than the stripe:
>
> ----
> dkstorage01:~# hdparm -t /dev/md0
> /dev/md0:
> Timing buffered disk reads: 182 MB in 3.01 seconds = 60.47 MB/sec
>
> dkstorage02:~# hdparm -t /dev/hdc1
> /dev/hdc1:
> Timing buffered disk reads: 184 MB in 3.02 seconds = 60.93 MB/sec

How about at least testing one of the drives involved in the raid,
although I assume they are identical in your case given the numbers.

Did you test this with other kernel versions (older ones) to see if it
was better in the past?

Any idea where the IDE controller is connected? If it is PCI, the whole
bus only has 133 MB/s to give on many systems (some have more, of course),
so maybe 60 MB/s is quite good.
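
If in doubt, lspci should tell you (assuming the controller is a PCI device
at all; the exact output will of course depend on the chipset):

lspci | grep -i 'ide interface'   # identify the IDE controller
lspci -vv                         # full details, including bus and latency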

Len Sorensen

2005-11-21 21:56:54

by NeilBrown

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

On Monday November 21, [email protected] wrote:
> I have created a stripe across two 500 GB disks located on separate IDE
> channels using:
>
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
>
> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning); both bonnie++ and hdparm (included
> below) show a single disk operating faster than the stripe:
>
> ----
> dkstorage01:~# hdparm -t /dev/md0
> /dev/md0:
> Timing buffered disk reads: 182 MB in 3.01 seconds = 60.47 MB/sec
>
> dkstorage02:~# hdparm -t /dev/hdc1
> /dev/hdc1:
> Timing buffered disk reads: 184 MB in 3.02 seconds = 60.93 MB/sec
> ----

Could you try hdparm tests on the two drives in parallel?
hdparm -t /dev/hdb & hdparm -t /dev/hdd

It could be that the controller doesn't handle parallel traffic very
well.


>
> I am aware of cpu overhead with software raid but such a degradation
> should not be the case with raid 0, especially not when the OS is
> located on a separate SCSI disk - the IDE disks should just be ready
> to work.

raid0 has essentially 0 cpu overhead. It would be maybe a couple of
hundred instructions which would be lost in the noise. It just
figures out which drive each request should go to, and directs it
there.


>
> There have been some earlier reports of this problem, but they all
> seem to end more or less inconclusively (here is one:
> http://kerneltrap.org/node/4745). Some people favor switching to
> dmraid with device mapper; is this the de facto standard today?
>

The kerneltrap reference is about raid5.
raid5 is implemented very differently to raid0.

It might be worth experimenting with different read-ahead values using
the 'blockdev' command. Alternatively, use a larger chunk size.
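
A rough sketch (the numbers here are only starting points to experiment
with, not recommendations):

blockdev --getra /dev/md0        # current read-ahead, in 512-byte sectors
blockdev --setra 8192 /dev/md0   # e.g. 4MB of read-ahead
mdadm -S /dev/md0                # or re-create with a bigger chunk...
mdadm -Cv /dev/md0 -c256 -n2 -l0 /dev/hdb /dev/hdd   # ...which loses the old array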

I don't think there is a de facto standard. Many people use md. Many
use dm.

NeilBrown

2005-11-21 21:58:10

by NeilBrown

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

On Monday November 21, [email protected] wrote:
> On Mon, Nov 21, 2005 at 09:31:14PM +0100, Lars Roland wrote:
> > I have created a stripe across two 500 GB disks located on separate IDE
> > channels using:
> >
> > mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
>
> Does -l0 equal stripe or linear? The mdadm man page doesn't seem clear
> on that to me.

0 is raid0. I thought that was so blatantly obvious that it wasn't
worth spelling it out in the man page. Maybe I was wrong :-(.

NeilBrown

2005-11-22 08:26:43

by Helge Hafting

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

On Mon, Nov 21, 2005 at 09:31:14PM +0100, Lars Roland wrote:
> I have created a stripe across two 500 GB disks located on separate IDE
> channels using:
>
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
>
> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning); both bonnie++ and hdparm (included
> below) show a single disk operating faster than the stripe:
>
To rule out hardware problems (hardware not as parallel as you might think):

Try running the performance test (bonnie++ or hdparm)
on both /dev/hdb and /dev/hdd at the same time.

Two hdparms on different disks should not take longer than one,
unless you have bad hardware.

One bonnie with size x MB takes y minutes to run.
Two bonnies, each of size x/2 MB, should take between
y/2 and y minutes to run. If they need more, then something
is wrong, which would explain the bad RAID performance.
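
For example (only a sketch - it assumes each disk already has a filesystem
mounted on the given test directories, and the size is arbitrary):

bonnie++ -d /mnt/hdb-test -s 2048 -u root &
bonnie++ -d /mnt/hdd-test -s 2048 -u root &
wait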

Helge Hafting


2005-11-22 09:46:16

by Lars Roland

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

On 11/21/05, Lennart Sorensen <[email protected]> wrote:
> > dkstorage01:~# hdparm -t /dev/md0
> > /dev/md0:
> > Timing buffered disk reads: 182 MB in 3.01 seconds = 60.47 MB/sec
> >
> > dkstorage02:~# hdparm -t /dev/hdc1
> > /dev/hdc1:
> > Timing buffered disk reads: 184 MB in 3.02 seconds = 60.93 MB/sec
>
> How about at least testing one of the drives involved in the raid,
> although I assume they are identical in your case given the numbers.

There are four identical drives in the machines, although I only stripe
on two of them - I can assure you that I get the same numbers from all
the drives. I should of course have put this info in the original post.

>
> Did you test this with other kernel versions (older ones) to see if it
> was better in the past?

Also tried 2.4.27 and 2.4.30 - no difference there.

2005-11-22 10:04:06

by Lars Roland

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

On 11/21/05, Neil Brown <[email protected]> wrote:
> On Monday November 21, [email protected] wrote:
> > I have created a stripe across two 500 GB disks located on separate IDE
> > channels using:
> >
> > mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
> >
> > the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> > with hdparm and blockdev tuning); both bonnie++ and hdparm (included
> > below) show a single disk operating faster than the stripe:
> >
> > ----
> > dkstorage01:~# hdparm -t /dev/md0
> > /dev/md0:
> > Timing buffered disk reads: 182 MB in 3.01 seconds = 60.47 MB/sec
> >
> > dkstorage02:~# hdparm -t /dev/hdc1
> > /dev/hdc1:
> > Timing buffered disk reads: 184 MB in 3.02 seconds = 60.93 MB/sec
> > ----
>
> Could you try hdparm tests on the two drives in parallel?
> hdparm -t /dev/hdb & hdparm -t /dev/hdd
>
> It could be that the controller doesn't handle parallel traffic very
> well.
>

Hmm, I should of course have thought of this earlier - it does indeed
seem that the controller does not handle parallel traffic very well:

-----------
dkstorage01:~# hdparm -t /dev/hdb
/dev/hdb:
Timing buffered disk reads: 112 MB in 3.02 seconds = 37.09 MB/sec

dkstorage01:~# hdparm -t /dev/hdd
/dev/hdd:
Timing buffered disk reads: 108 MB in 3.02 seconds = 35.76 MB/sec
-----------

Bonnie test shows the same picture.

> raid0 has essentially 0 cpu overhead. It would be maybe a couple of
> hundred instructions which would be lost in the noise. It just
> figures out which drive each request should go to, and directs it
> there.

Yeah, so it is probably just a poor controller.


--
Lars Roland

2005-11-22 17:23:42

by Bill Davidsen

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

Lars Roland wrote:
> I have created a stripe across two 500 GB disks located on separate IDE
> channels using:
>
> mdadm -Cv /dev/md0 -c32 -n2 -l0 /dev/hdb /dev/hdd
>
> the performance is awful on both kernel 2.6.12.5 and 2.6.14.2 (even
> with hdparm and blockdev tuning); both bonnie++ and hdparm (included
> below) show a single disk operating faster than the stripe:

In looking at this I found something interesting, even though you
identified your problem before I was able to use the data for its
intended purpose. So other than suggesting that the stripe size is too
small, I have nothing to add on that; your hardware is the issue.

I have two ATA drives connected, and each has two partitions. The first
partition of each is mirrored for reliability with default 64k chunks,
and the second is striped, with 512k chunks (I write a lot of 100MB
files to this f/s).

Reading the individual devices with dd, I saw a transfer rate of about
60 MB/s, while the striped md1 device gave just under 120 MB/s (60.3573
and 119.6458, actually). However, the mirrored md0 also gave just 60 MB/s
read speed.
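
(Roughly the kind of dd read I mean - the device names and count here are
only examples:)

dd if=/dev/hda2 of=/dev/null bs=1M count=2048
dd if=/dev/md1 of=/dev/null bs=1M count=2048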

One of the advantages of mirroring is that if there is heavy read load
when one drive is busy there is another copy of the data on the other
drive(s). But doing 1MB reads on the mirrored device did not show that
the kernel took advantage of this in any way. In fact, it looks as if
all the reads are going to the first device, even with multiple
processes running. Does the md code now set "write-mostly" by default
and only go to the redundant drives if the first fails?

I won't be able to do a lot of testing until Thursday, or perhaps
Wednesday night, but that is not as I expected and not what I want. I do
mirroring on web and news servers to spread the head motion, so now I will
be looking at the stats to see whether that's happening.
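
(Something along these lines should show whether reads really are being
spread across both mirror members - iostat is from sysstat, and the drive
names are only examples:)

iostat -x 5                              # watch the per-drive read columns
grep -E ' (hda|hdc) ' /proc/diskstats    # field 4 is reads completed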

I added the RAID M/L to the addresses, since this is getting to be a
general RAID question.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me

2005-11-22 18:23:34

by Paul Clements

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

Bill Davidsen wrote:

> One of the advantages of mirroring is that if there is heavy read load
> when one drive is busy there is another copy of the data on the other
> drive(s). But doing 1MB reads on the mirrored device did not show that
> the kernel took advantage of this in any way. In fact, it looks as if
> all the reads are going to the first device, even with multiple
> processes running. Does the md code now set "write-mostly" by default
> and only go to the redundant drives if the first fails?

No, it doesn't use write-mostly by default. The way raid1 read balancing
works (in recent kernels) is this:

- sequential reads continue to go to the first disk

- for non-sequential reads, the code tries to pick the disk whose head
is "closest" to the sector that needs to be read

So even if the reads aren't exactly sequential, you probably still end
up reading from the first disk most of the time. I imagine with a more
random read pattern you'd see the second disk getting used.

--
Paul

2005-11-23 15:51:06

by Bill Davidsen

Subject: Re: Poor Software RAID-0 performance with 2.6.14.2

Paul Clements wrote:
> Bill Davidsen wrote:
>
>> One of the advantages of mirroring is that if there is heavy read load
>> when one drive is busy there is another copy of the data on the other
>> drive(s). But doing 1MB reads on the mirrored device did not show that
>> the kernel took advantage of this in any way. In fact, it looks as if
>> all the reads are going to the first device, even with multiple
>> processes running. Does the md code now set "write-mostly" by default
>> and only go to the redundant drives if the first fails?
>
>
> No, it doesn't use write-mostly by default. The way raid1 read balancing
> works (in recent kernels) is this:
>
> - sequential reads continue to go to the first disk
>
> - for non-sequential reads, the code tries to pick the disk whose head
> is "closest" to the sector that needs to be read
>
> So even if the reads aren't exactly sequential, you probably still end
> up reading from the first disk most of the time. I imagine with a more
> random read pattern you'd see the second disk getting used.

Thanks for the clarification. I think the current method is best for
most cases. I will have to think about how large a file you would need
before there is any saving in transfer time, given that you have to consider
the slowest seek, drives doing other things on a busy system, etc.

--
-bill davidsen ([email protected])
"The secret to procrastination is to put things off until the
last possible moment - but no longer" -me