2000-12-26 00:00:42

by Felix von Leitner

Subject: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

Hi,

I bought 4 ATA-100 Maxtor drives and put them on a Promise Ultra100
controller to make a single striping RAID of them to increase
throughput.

I wrote a small test program that simply reads stdin linearly and
displays the throughput. The block size is 100k.
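
The tool boils down to something like this (a minimal sketch - the real
program differs in the details):

#include <stdio.h>
#include <sys/time.h>
#include <unistd.h>

/* Read stdin in 100k blocks, time the whole run, print MB/sec. */
int main(void)
{
	static char buf[100 * 1024];	/* 100k block size */
	long long total = 0;
	struct timeval t0, t1;
	double secs;
	ssize_t n;

	gettimeofday(&t0, NULL);
	while ((n = read(0, buf, sizeof buf)) > 0)
		total += n;
	gettimeofday(&t1, NULL);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
	if (secs > 0)
		printf("%.1f meg/sec\n", total / secs / (1024 * 1024));
	return 0;
}

Here is my setup: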

# cat /etc/raidtab
raiddev /dev/md/0
raid-level 0
nr-raid-disks 4
persistent-superblock 1
chunk-size 32

device /dev/ide/host2/bus0/target0/lun0/part1
raid-disk 0
device /dev/ide/host2/bus0/target1/lun0/part1
raid-disk 2

device /dev/ide/host2/bus1/target0/lun0/part1
raid-disk 1
device /dev/ide/host2/bus1/target1/lun0/part1
raid-disk 3
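
(The array was created with mkraid /dev/md/0, which reads the raidtab above.)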

Here are the results of my test program on the disk devices:
# rb < /dev/ide/host2/bus0/target0/lun0/part1
27.8 meg/sec
# rb < /dev/ide/host2/bus0/target1/lun0/part1
26.8 meg/sec

The other two disks have approximately the same numbers.

Here is the result of my test program on the stripe set:
# rb < /dev/md/0
30.3 meg/sec
#

While this is faster than linear mode, I would have expected much better
performance. These are the boot messages of the Promise adapter:

PDC20267: IDE controller on PCI bus 00 dev 60
PDC20267: chipset revision 2
PDC20267: not 100% native mode: will probe irqs later
PDC20267: (U)DMA Burst Bit ENABLED Primary PCI Mode Secondary PCI Mode.
ide2: BM-DMA at 0xec00-0xec07, BIOS settings: hde:pio, hdf:pio
ide3: BM-DMA at 0xec08-0xec0f, BIOS settings: hdg:pio, hdh:pio
ide2 at 0xdc00-0xdc07,0xe002 on irq 10
ide3 at 0xe400-0xe407,0xe802 on irq 10
hde: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
hdf: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
hdg: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
hdh: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)

I tuned the devices with hdparm -c 1 -a 32 -m 16 -p -u 1, for what it's
worth (it did not increase throughput, but it appeared to lessen CPU usage).
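
Applied to each drive, that was something like:

# for d in hde hdf hdg hdh; do hdparm -c 1 -a 32 -m 16 -p -u 1 /dev/$d; done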

To verify that this is not an issue of the Promise controller, I started
two instances of my test tool at the same time, one working on hde, the
other on hdg (the two channels). Both yielded approximately 25 meg/sec,
so it does not appear to be a hardware or driver issue. Is the RAID
code really this slow? Any ideas what I can do?

I am using the user space tools from raidtools-19990421-0.90.tar.bz2,
but that should not have any influence, right?

I heard that there is a new, faster RAID code somewhere, but it only
claimed to be faster on RAID level 5, not on striping.

Any tuning advice?

By the way, I noticed another thing: one of the Maxtor hard disks was
broken. It caused the whole box to freeze solid (no numlock, no console
switches, no sysrq). To me, that severely limits the usefulness of IDE
RAID. While SCSI problems cause trouble too, I have never seen one
cause a complete freeze. How am I supposed to hot-swap the disks?
I am using the VESA framebuffer, so maybe there was a panic and it simply
did not appear on my screen (or in the logs).

Hope to hear from you soon (the RAID is needed on Dec 27).
Should I use LVM instead of the MD code?

Felix


2000-12-26 00:19:47

by Felix von Leitner

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

Thus spake Felix von Leitner ([email protected]):
> Here is the result of my test program on the stripe set:
> # rb < /dev/md/0
> 30.3 meg/sec
> #

One more detail: top says the CPU is 50% system when reading from either
one of the disk or raid devices. That seems awfully high considering
that the Promise controller claims to do UDMA.

Any comments?

Felix

2000-12-26 08:10:16

by Andreas Dilger

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

Felix von Leitner writes:
> I bought 4 ATA-100 Maxtor drives and put them on a Promise Ultra100
> controller to make a single striping RAID of them to increase
> throughput.
>
> I wrote a small test program that simply reads stdin linearly and
> displays the throughput. Here are the results of my test program:
> # rb < /dev/ide/host2/bus0/target0/lun0/part1
> 27.8 meg/sec
> # rb < /dev/ide/host2/bus0/target1/lun0/part1
> 26.8 meg/sec
>
> Here is the result of my test program on the stripe set:
> # rb < /dev/md/0
> 30.3 meg/sec

> hde: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
> hdf: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
> hdg: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)
> hdh: 160086528 sectors (81964 MB) w/2048KiB Cache, CHS=158816/16/63, UDMA(100)

That's because IDE doesn't allow multiple requests on the same bus, unlike
SCSI. That's why IDE disks on the same bus are "master" and "slave". If
you look at the 3ware IDE RAID systems, each drive has its own IDE bus.
Maybe try a stripe set on only two disks, hde and hdg, and see how it works.
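
Something like this in /etc/raidtab (a sketch - you would have to take the
four-disk array down first, since this reuses the same partitions):

raiddev /dev/md/1
raid-level            0
nr-raid-disks         2
persistent-superblock 1
chunk-size            32

device /dev/ide/host2/bus0/target0/lun0/part1
raid-disk 0
device /dev/ide/host2/bus1/target0/lun0/part1
raid-disk 1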

Cheers, Andreas
--
Andreas Dilger \ "If a man ate a pound of pasta and a pound of antipasto,
\ would they cancel out, leaving him still hungry?"
http://www-mddsp.enel.ucalgary.ca/People/adilger/ -- Dogbert

2000-12-26 14:29:59

by Rik van Riel

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

On Tue, 26 Dec 2000, Felix von Leitner wrote:
> Thus spake Felix von Leitner ([email protected]):
> > Here is the result of my test program on the stripe set:
> > # rb < /dev/md/0
> > 30.3 meg/sec
> > #
>
> One more detail: top says the CPU is 50% system when reading from either
> one of the disk or raid devices. That seems awfully high considering
> that the Promise controller claims to do UDMA.
>
> Any comments?

Your program reads in data at 30MB/second, on a memory bus
that most likely supports something like 60 to 100MB/second.

Part of this memory bandwidth is needed for the UDMA controller
to push the data to memory, probably between 30% and 50%.

Every time the UDMA controller has the memory bus for itself the
CPU will busy-wait on memory, which shows up as CPU busy time.
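
In round numbers:

  UDMA controller writing into RAM :  ~30 MB/s  (30-50% of the bus)
  your program reading it back     :  ~30 MB/s
  total                            :  ~60 MB/s on a 60-100 MB/s bus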

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2000-12-26 17:06:58

by Rik van Riel

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

On Tue, 26 Dec 2000, Felix von Leitner wrote:
> Thus spake Rik van Riel ([email protected]):
> > > One more detail: top says the CPU is 50% system when reading from either
> > > one of the disk or raid devices. That seems awfully high considering
> > > that the Promise controller claims to do UDMA.
> > >
> > > Any comments?
> > Your program reads in data at 30MB/second, on a memory bus
> > that most likely supports something like 60 to 100MB/second.
>
> 100.

So that's 30% for the UDMA controller and maybe
30% for the CPU (if your program reads in all the
data).

> > Part of this memory bandwidth is needed for the UDMA controller
> > to push the data to memory, probably between 30% and 50%.
>
> That would be 30%.

Add to that the overhead of allocating and reclaiming
the memory, doing the RAID mapping, sending commands
to the hard disk, ...
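
Rough accounting, taking your 100 MB/s figure:

  UDMA -> RAM     :  30 MB/s  = ~30% of the bus
  CPU reads data  :  30 MB/s  = ~30% of the bus
  + kernel overhead (buffers, RAID mapping, IRQs) on top

so 50% system time during the read is about what you would expect.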

> > Every time the UDMA controller has the memory bus for itself the
> > CPU will busy-wait on memory, which shows up as CPU busy time.
>
> So, you are saying, when I add a gigabit ethernet card, CPU will hit
> 100% at about 30 MB/second? That sounds like a weak architecture ;-)

Hey, there's a reason PCs are so cheap ;)

regards,

Rik
--
Hollywood goes for world dumbination,
Trailer at 11.

http://www.surriel.com/
http://www.conectiva.com/ http://distro.conectiva.com.br/

2000-12-26 20:24:17

by Ian Stirling

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

>
> On Tue, 26 Dec 2000, Felix von Leitner wrote:
> > Thus spake Rik van Riel ([email protected]):
> > > > One more detail: top says the CPU is 50% system when reading from either
> > > > one of the disk or raid devices. That seems awfully high considering
> > > > that the Promise controller claims to do UDMA.
> > > >
> > > > Any comments?
> > > Your program reads in data at 30MB/second, on a memory bus
> > > that most likely supports something like 60 to 100MB/second.
> >
> > 100.
>
> So that's 30% for the UDMA controller and maybe
> 30% for the CPU (if your program reads in all the
> data).

Where are you getting 100MB/s?
The PCI bus can move around 130MB/sec, but RAM is a lot faster.
A single PC100 DIMM can move 800MB/sec.
This P100 laptop I'm typing on gets better than 100MB/s RAM reads.


Anyway, in clarification, Rik mentioned that two reads from different
disks (arrays?) on the same controller at the same time get more or less
the same speed.

2000-12-26 20:55:34

by Barry K. Nathan

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

Ian Stirling wrote:
> Where are you getting 100MB/s?
> The PCI bus can move around 130MB/sec, but RAM is a lot faster.

I'll clarify your clarification further. :) Your typical PC has 33MHz
32-bit PCI. Increasing it to 66MHz or 64-bit can double the transfer rate,
and doing both can quadruple it. (Perhaps I've overlooked a detail or
oversimplified something, in which case I'd appreciate being corrected.)
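
In numbers, the theoretical peaks:

  33 MHz x 32 bits = 133 MB/s (standard PCI)
  66 MHz x 32 bits = 266 MB/s
  33 MHz x 64 bits = 266 MB/s
  66 MHz x 64 bits = 533 MB/s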

-Barry K. Nathan <[email protected]>

2000-12-27 11:30:01

by Ruth Ivimey-Cook

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

At 11:29 PM 12/25/00, you wrote:
>To verify that this is not an issue of the Promise controller, I started
>two instances of my test tool at the same time, one working on hde, the
>other on hdg (the two channels). Both yielded approximately 25 meg/sec,
>so it does not appear to be a hardware or driver issue. Is the RAID
>code really this slow? Any ideas what I can do?

Use two PDC controller cards. You have shown that you can read 50MB/s from
the disks into memory, so neither the IDE nor the PCI bus is overloaded.
However, the RAID code contends for control of the IDE buses, because it
is not optimized for the case where the disks cannot be controlled
independently.

>By the way: I noticed another thing: one of the Maxtor hard disks was
>broken. It caused the whole box to freeze solid (no numlock, no console
>switches, no sysrq). That to me severely limits the usefulness of IDE

Did you not read the RAID FAQ? Look on the raidtools web site: it
specifically states:

a) you cannot hot swap IDE
b) if you put RAID disks on the same IDE bus (i.e. use master/slave) you
can expect abysmal performance. Only use the master for a given IDE bus
(i.e. the PDC supports 2, not 4, disks).

>RAID. While SCSI problems cause trouble, too, I have never seen one
>cause a complete freeze. How am I supposed to hot-swap the disks?

On IDE, you don't. IDE never supports hot-swap, RAID or no. If you want
that, use SCSI.

Regards,

Ruth

--

Ruth Ivimey-Cook             [email protected]
Technical Author, ARM Ltd    [email protected]

2000-12-27 16:56:02

by Paul Jakma

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

On Tue, 26 Dec 2000, Ian Stirling wrote:

> The PCI bus can move around 130MB/sec,

in bursts yes, but sustained data bandwidth of PCI is a lot lower,
maybe 30 to 50MB/s. And you won't get sustained RAID performance >
sustained PCI performance.

> Anyway, in clarification, Rik mentioned that two reads from different
> disk (arrays?) on the same controller at the same time get more or less
> the same speed.

try scsi.

--paulj

2000-12-27 18:08:30

by Jakob Oestergaard

Subject: RAID - IDE - here we go again...

On Wed, Dec 27, 2000 at 04:23:43PM +0000, Paul Jakma wrote:
> On Tue, 26 Dec 2000, Ian Stirling wrote:
>
> > The PCI bus can move around 130MB/sec,
>
> in bursts yes, but sustained data bandwidth of PCI is a lot lower,
> maybe 30 to 50MB/s. And you won't get sustained RAID performance >
> sustained PCI performance.

Much higher than 30-50 - but yes, the total bandwidth won't exceed
the slowest channel.

>
> > Anyway, in clarification, Rik mentioned that two reads from different
> > disk (arrays?) on the same controller at the same time get more or less
> > the same speed.
>
> try scsi.

SCSI won't get you a faster PCI bus. Guys, *PLEASE* - everyone on this list
knows what the respective virtues and horrors of SCSI and IDE are. Configured
properly, both can perform well; configured wrongly, both will suck.

The timings below are from a dual PII-350 (Asus P2B-DS) with a six-disk SCSI
RAID and a five-disk IDE RAID. The IDE RAID is configured properly, with one
channel per disk - which would have solved the performance problem in the
array that spawned this thread.

(By the way: the SCSI RAID is configured with three controllers for the six
disks because of the low SCSI bus bandwidth; reality rules in the SCSI world
as well.)

Kernel is 2.2, RAID is 0.90, IDE is Andre's
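
(The tables are Bonnie output on a 1 GB test file - large enough that the
256 MB of RAM cannot cache it; the exact invocation was presumably along
the lines of bonnie -s 1024.)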

---------------------------------------
Dual PII-350, 256 MB RAM, test on 1 GB file

Filesystem: 75 GB ext2fs
RAID: Linux Software RAID-5
Disks: 5 pcs. IBM Deskstar 75 GXP (30GB)
Controller: 3 pcs. Promise PDC20267

-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
1024 4998 98.0 28660 45.9 14586 50.4 5386 98.5 71468 79.1 338.6 5.2

28 MB/sec write, 71 MB/sec read on RAID-5.

---------------------------------------
Same box, test on 1 GB file

Filesystem: 18 GB ext2fs
RAID: Linux Software RAID-0

This array is built on other partitions on the same disks as above.

-------Sequential Output-------- ---Sequential Input-- --Random--
-Per Char- --Block--- -Rewrite-- -Per Char- --Block--- --Seeks---
Machine MB K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU K/sec %CPU /sec %CPU
1024 5381 99.7 44267 50.1 22613 62.1 5469 99.4 98075 91.4 311.0 5.0
44 MB/sec write, 98 MB/sec read on RAID-0. That's through *one* PCI bus, folks.

Please, for further comments or another IDE-hotplug-can!-cannot!-can-so!, let's
take this to linux-raid or #offtopic ;)

Enough said.

--
................................................................
: [email protected] : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:

2000-12-27 18:40:57

by safemode

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

Ruth Ivimey-Cook wrote:

>
> On IDE, you don't. IDE never supports hot-swap, RAID or no. If you want
> that, use SCSI.

That's not necessarily true. There is work in Linux to support tri-stating
the IDE devices with the help of a custom card that allows one to cut power
to a specific IDE device. Tri-stating allows hot-swapping of IDE devices
now. I even had a picture of the device the person is using to hot-swap.
I'm sorry that I have forgotten this kernel hacker's name, as I have lost
the original email along with said picture. I'm pretty sure the person who
gave it to me was 2.4.x's IDE guy, but I can't be sure right now.

2000-12-28 22:13:56

by Tim Wright

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

On Wed, Dec 27, 2000 at 04:23:43PM +0000, Paul Jakma wrote:
> On Tue, 26 Dec 2000, Ian Stirling wrote:
>
> > The PCI bus can move around 130MB/sec,
>
> in bursts yes, but sustained data bandwidth of PCI is a lot lower,
> maybe 30 to 50MB/s. And you won't get sustained RAID performance >
> sustained PCI performance.
>

No. A well-designed card and driver doing cache-line sized transfers can
achieve ~100MB/s. On the IBM (Sequent) NUMA machines, we achieved in excess
of 3GB/s sustained read I/O (database full table scan) on a 16-quad (32 PCI
bus) system. That works out at around 100MB/s per bus.

Regards,

Tim

--
Tim Wright - [email protected] or [email protected] or [email protected]
IBM Linux Technology Center, Beaverton, Oregon
"Nobody ever said I was charming, they said "Rimmer, you're a git!"" RD VI

by Mathieu Chouquet-Stringer

Subject: Re: Abysmal RAID 0 performance on 2.4.0-test10 for IDE?

[email protected] (Tim Wright) writes:

> On Wed, Dec 27, 2000 at 04:23:43PM +0000, Paul Jakma wrote:
> > On Tue, 26 Dec 2000, Ian Stirling wrote:
> >
> > > The PCI bus can move around 130MB/sec,
> >
> > in bursts yes, but sustained data bandwidth of PCI is a lot lower,
> > maybe 30 to 50MB/s. And you won't get sustained RAID performance >
> > sustained PCI performance.
> >
>
> No. A well-designed card and driver doing cache-line sized transfers can
> achieve ~100MB/s. On the IBM (Sequent) NUMA machines, we achieved in excess
> of 3GB/s sustained read I/O (database full table scan) on a 16-quad (32 PCI
> bus) system. That works out at around 100MB/s per bus.

Sadly, I am sure that your "well-designed" system must be costly as
hell... :(

--
Mathieu CHOUQUET-STRINGER E-Mail : [email protected]
Learning French is trivial: the word for horse is cheval, and
everything else follows in the same way.
-- Alan J. Perlis