2002-04-17 12:58:43

by Baldur Norddahl

Subject: IDE/raid performance

Hi,

I have been doing some simple benchmarks on my IDE system. It has 12 disks
and a system disk. The 12 disks are organized in two RAIDs like this:

Personalities : [raid5]
read_ahead 1024 sectors
md1 : active raid5 hds1[0] hdo1[1] hdk1[2] hdg1[3]
480238656 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]

md0 : active raid5 hdt1[6] hdq1[1] hdp1[7] hdm1[0] hdl1[2] hdi1[4] hdh1[3] hde1[5]
547054144 blocks level 5, 4k chunk, algorithm 0 [8/8] [UUUUUUUU]

unused devices: <none>

The md0 raid is eight 80 GB Western Digital disks. The md1 raid is four 160
GB Maxtor disks.

I am using two Promise Technology UltraDMA133 TX2 controllers and two
Promise Technology UltraDMA100 TX2 controllers. The two UltraDMA133
controllers are on a 66 MHz PCI bus, while the two UltraDMA100 controllers
are on a 33 MHz PCI bus.

An example of a test run is:

echo Testing hdo1, hds1 and hdk1
time dd if=/dev/hdo1 of=/dev/null bs=1M count=1k &
time dd if=/dev/hds1 of=/dev/null bs=1M count=1k &
time dd if=/dev/hdk1 of=/dev/null bs=1M count=1k &
wait

I am then watching the progress in another window with vmstat 1. I copied
typical lines for each test below. What interests me is the "bi" column for
the transfer rate, and the "id" column as an indicator of how much CPU is
being spent.

This test was done on an SMP system running kernel 2.4.18 with IDE patches.
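
For reference, the whole sweep can be scripted along these lines. This is a
sketch rather than the exact script I ran; the log file names are made up:

DISKS="hdt hdg hdp hdm hdl hdi hdh hde"
started=""
for d in $DISKS; do
        started="$started $d"
        echo "Testing:$started"
        vmstat 1 > vmstat-$d.log &      # sample transfer rate and CPU once per second
        vmpid=$!
        pids=""
        for s in $started; do
                dd if=/dev/${s}1 of=/dev/null bs=1M count=1k &
                pids="$pids $!"
        done
        wait $pids                      # wait for the dd readers only
        kill $vmpid                     # stop this pass's vmstat logger
done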

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id

hdt:
0 1 0 26096 6600 159888 599004 0 0 34432 0 709 1690 1 10 88

hdt and hdg:
0 2 1 26096 6428 70736 687496 0 0 64768 0 1167 2931 1 25 74

hdt, hdg and hdp:
0 3 0 26092 7832 42632 712712 0 0 75620 0 1383 3242 7 33 60

hdt, hdg, hdp and hdm:
0 4 0 26092 6400 42464 713044 0 0 74376 0 1374 3289 0 30 70

hdt, hdg, hdp, hdm and hdl:
0 5 0 26092 6196 42412 712188 0 0 107008 696 2000 4397 5 43 51

hdt, hdg, hdp, hdm, hdl and hdi:
2 4 1 26172 5480 42184 713432 0 0 137104 0 2137 4602 5 75 20

hdt, hdg, hdp, hdm, hdl, hdi and hdh:
5 2 1 27324 5020 35268 737336 0 108 144640 108 2177 2271 0 99 0

hdt, hdg, hdp, hdm, hdl, hdi, hdh and hde:
4 4 1 27324 5420 35572 735752 0 0 143796 0 2102 2180 1 99 0

hdo:
0 1 0 27324 7032 55732 666408 0 0 36352 0 710 1796 0 12 87

hdo and hds:
0 2 1 27324 6516 40012 691588 0 0 72576 0 1264 3311 0 24 75

hdo, hds and hdk:
0 3 0 27316 6012 40048 692088 0 0 108944 484 1970 4523 0 50 50

hdo, hds, hdk and hdg:
4 0 1 27316 5552 40080 694124 0 0 134572 0 2252 4825 1 70 29

md0:
1 0 0 27324 13460 38104 692140 0 0 76676 0 4639 2611 4 74 22

md1:
0 1 0 27316 10224 40340 697780 0 0 69504 0 1893 3892 1 55 44

md0 and md1:
2 1 1 27316 7188 40224 675200 0 0 81470 164 3935 2389 9 77 14

It is clear that the 33 MHz PCI bus maxes out at 75 MB/s. Is there a reason
it doesn't reach 132 MB/s?

Second, why are the md devices so slow? I would have expected them to reach
130+ MB/s on both md0 and md1. It even has spare CPU time to do it with.

Another issue: when the system is under heavy load, this often happens:

hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdt: dma_intr: bad DMA status (dma_stat=75)
hdt: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: timeout waiting for DMA
PDC202XX: Primary channel reset.
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
hdq: dma_intr: bad DMA status (dma_stat=35)
hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
hdq: timeout waiting for DMA
PDC202XX: Primary channel reset.
ide_dmaproc: chipset supported ide_dma_timeout func only: 14
etc.

It did not happen during the test above though.

Baldur


2002-04-17 13:43:53

by Mike Dresser

Subject: Re: IDE/raid performance

On Wed, 17 Apr 2002, Baldur Norddahl wrote:

> Hi,
>
> I have been doing some simple benchmarks on my IDE system. It got 12 disks
> and a system disk. The 12 disks are organized in two raids like this:

What is the exact hardware configuration of the system, besides the disk?
CPU/Motherboard/etc?

> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdt: dma_intr: bad DMA status (dma_stat=75)
> hdt: dma_intr: status=0x50 { DriveReady SeekComplete }

I'd take a look at the cable on hdq and hdt. Try replacing it, and see
what happens.

Also, with 12 hd's, dual cpu's, etc, what kind of power supply are you
using?

Mike

2002-04-17 14:00:48

by Baldur Norddahl

Subject: Re: IDE/raid performance

Quoting Mike Dresser ([email protected]):
> On Wed, 17 Apr 2002, Baldur Norddahl wrote:
>
> > Hi,
> >
> > I have been doing some simple benchmarks on my IDE system. It got 12 disks
> > and a system disk. The 12 disks are organized in two raids like this:
>
> What is the exact hardware configuration of the system, besides the disk?
> CPU/Motherboard/etc?

Motherboard: Tyan Tiger MPX S2466N
Chipset: AMD 760MPX
CPU: Dual Athlon MP 1800+ 1.53 GHz
Two Promise Technology UltraDMA133 TX2 controllers
Two Promise Technology UltraDMA100 TX2 controllers
Matrox G200 AGP video card.
1 GB registered DDR RAM

> > hdq: dma_intr: bad DMA status (dma_stat=35)
> > hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> > hdq: dma_intr: bad DMA status (dma_stat=35)
> > hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> > hdt: dma_intr: bad DMA status (dma_stat=75)
> > hdt: dma_intr: status=0x50 { DriveReady SeekComplete }
>
> I'd take a look at the cable on hdq and hdt. Try replacing it, and see
> what happens.

It happens randomly for all disks on the Promise controllers. Never for the
system disk on hda (on the motherboard's built-in controller). The above was
just a random snippet from dmesg; there is a lot more of the same spam.

> Also, with 12 hd's, dual cpu's, etc, what kind of power supply are you
> using?

It is a 350W power supply. I wanted something bigger, but couldn't get one
for a sane price. I can't rule out that it is overloaded, of course. If it is,
I haven't seen any other symptoms; the system is rock stable so far.

Baldur

2002-04-17 15:13:11

by Bill Davidsen

Subject: Re: IDE/raid performance

On Wed, 17 Apr 2002, Baldur Norddahl wrote:

> I have been doing some simple benchmarks on my IDE system. It got 12 disks
> and a system disk. The 12 disks are organized in two raids like this:

> echo Testing hdo1, hds1 and hdk1
> time dd if=/dev/hdo1 of=/dev/null bs=1M count=1k &
> time dd if=/dev/hds1 of=/dev/null bs=1M count=1k &
> time dd if=/dev/hdk1 of=/dev/null bs=1M count=1k &
> wait

> It is clear that the 33 MHz PCI bus maxes out at 75 MB/s. Is there a reason
> it doesn't reach 132 MB/s?

I suspect you have tuned your system to the max, but I will mention
using 32 bit transfers on all drives, read ahead via hdparm, etc.
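
Something along these lines is what I mean. A sketch only; these are standard
hdparm flags, but the read-ahead value is just an example:

for d in hde hdg hdh hdi hdk hdl hdm hdo hdp hdq hds hdt; do
        hdparm -c1 -d1 -a1024 /dev/$d   # 32-bit I/O, DMA on, 1024-sector read-ahead
done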

> Second, why are the md devices so slow? I would have expected it to reach
> 130+ MB/s on both md0 and md1. It even has spare CPU time to do it with.

Possibly contention? Try smaller read sizes and see if the rate goes up.
Also, your stripe size is small for stuff like this; for high volume
sequential data I used 256k. That was SCSI, though.
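
If you want to experiment with that: the 2.4 raidtools fix the chunk size at
creation time, so it means rebuilding the array from an /etc/raidtab along
these lines. This is only a sketch using the md1 member disks from your
mdstat; 256k is purely an illustration, and re-running mkraid destroys the
existing data:

raiddev /dev/md1
        raid-level              5
        nr-raid-disks           4
        persistent-superblock   1
        chunk-size              256
        device                  /dev/hds1
        raid-disk               0
        device                  /dev/hdo1
        raid-disk               1
        device                  /dev/hdk1
        raid-disk               2
        device                  /dev/hdg1
        raid-disk               3

followed by "mkraid /dev/md1", after backing everything up.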

> Another issue is when the system is under heavy load this often happens:
>
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdt: dma_intr: bad DMA status (dma_stat=75)
> hdt: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: timeout waiting for DMA
> PDC202XX: Primary channel reset.
> ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: timeout waiting for DMA
> PDC202XX: Primary channel reset.
> ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> etc.

That looks like a hardware issue to me. I haven't looked at this
closely, but does the fallback include switching to PIO mode on errors
like this?

> It did not happen during the test above though.

Good to eliminate, however.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-04-17 15:16:23

by Nick

Subject: Re: IDE/raid performance

On Wed, 17 Apr 2002, Baldur Norddahl wrote:
> Quoting Mike Dresser ([email protected]):
> > On Wed, 17 Apr 2002, Baldur Norddahl wrote:
> Motherboard: Tyan Tiger MPX S2466N
> Chipset: AMD 760MPX
> CPU: Dual Athlon MP 1800+ 1.53 GHz
> Two Promise Technology UltraDMA133 TX2 controllers
> Two Promise Technology UltraDMA100 TX2 controllers
> Matrox G200 AGP video card.
> 1 GB registered DDR RAM
> > Also, with 12 hd's, dual cpu's, etc, what kind of power supply are you
> > using?
> It is a 350W powersupply. I wanted something bigger, but couldn't get it for
> sane prices. I can't rule out that it is overloaded of course. If it is, I
> haven't seen any other symptoms, the system is rock stable so far.
> Baldur
AMD recommends a minimum of 400 watts for a dual Athlon system,
IIRC. Ignoring that, the startup current on an IBM 5400rpm IDE disk seems
to be about 25-30 watts. Each 1800+ MP puts out 66W of heat, meaning it
uses more than 66W (I couldn't find the power usage stats), for a total of
132 watts. So on boot, ignoring everything but the disks and the chips,
you've got 12x25W (for the disks) + 2x66W (for the procs), or about
432 watts. This will go down a lot after all your disks spin up, but I'm
amazed your system boots. Moral of this message: don't be a dipshit and
put 12 IDE disks on a single power supply.
Nick

2002-04-17 16:27:24

by Alan

Subject: Re: IDE/raid performance

> 432watts. This will go down alot after all your disks spin up, but I'm
> amazed your system boots. Morale of this message: Don't be a dipshit and
> put 12 IDE disks on a single power supply.

I've run a dual athlon set up fully loaded with cards with 10 disks - that
takes a 550W PSU but works

2002-04-17 17:07:47

by Jeff V. Merkey

Subject: Re: IDE/raid performance


From my analysis with 3Ware at 32-drive configurations, you really
need to power the drives from a separate power supply if you have
more than 16 devices. They really suck the power during initial
spinup.

Jeff


On Wed, Apr 17, 2002 at 04:48:20PM +0100, Alan Cox wrote:
> > 432watts. This will go down alot after all your disks spin up, but I'm
> > amazed your system boots. Morale of this message: Don't be a dipshit and
> > put 12 IDE disks on a single power supply.
>
> I've run a dual athlon set up fully loaded with cards with 10 disks - that
> takes a 550W PSU but works

2002-04-17 17:36:32

by Baldur Norddahl

Subject: Re: IDE/raid performance

Quoting [email protected] ([email protected]):
> On Wed, 17 Apr 2002, Baldur Norddahl wrote:
> > Quoting Mike Dresser ([email protected]):
> > > On Wed, 17 Apr 2002, Baldur Norddahl wrote:
> > Motherboard: Tyan Tiger MPX S2466N
> > Chipset: AMD 760MPX
> > CPU: Dual Athlon MP 1800+ 1.53 GHz
> > Two Promise Technology UltraDMA133 TX2 controllers
> > Two Promise Technology UltraDMA100 TX2 controllers
> > Matrox G200 AGP video card.
> > 1 GB registered DDR RAM
> > > Also, with 12 hd's, dual cpu's, etc, what kind of power supply are you
> > > using?
> > It is a 350W powersupply. I wanted something bigger, but couldn't get it for
> > sane prices. I can't rule out that it is overloaded of course. If it is, I
> > haven't seen any other symptoms, the system is rock stable so far.
> > Baldur
> AMD recommends a minimum of 400watts for a dual athlon system
> IIRC. Ignoreing that the startup current on an IBM 5400rpm IDE disk seems
> to be about 25-30watts. Each 1800+ MP puts out 66w of heat, meaning it
> uses more than 66w (I couldn't find the power useage stats) for a total of
> 132 watts, so on boot ignoreing everything but the disks and the chips
> you've got 12x25w (for the disks) + 2x66w (for the procs) or about
> 432watts. This will go down alot after all your disks spin up, but I'm
> amazed your system boots. Morale of this message: Don't be a dipshit and
> put 12 IDE disks on a single power supply.
> Nick
>

I checked and it is actually a 400W power supply. Sorry about that.

Actually the power usage for the 80 GB Western Digital disks is:

Spinup(peak) 17.0W
Read/Write 8.0W
Seek 14.0W

And for the 160 GB Maxtor disks:

Spinup(peak) 23.7W
Read/Write 4.8W
Seek 5.2W

This adds up to 8*17.0+4*23.7 = 231W spinup and 8*14.0+4*5.2 = 132W during
seek.

The Athlon MP 1800+ has the following stats:

Voltage: 1.75V
Maximum thermal power: 66.0W
Maximum Icc 37.7A
Typical thermal power: 58.9W
Typical Icc 33.7A

So the CPU maximum uses Vcc*Icc = 66.0W (wow they didn't lie about the
thermal power).

It is not likely that both CPUs are burning 66W during the initial phase of
boot where the disks do their spinup. Even then, you might also be able to
draw more current out of the power supply than its rating for a short while.

For this discussion it is more interesting whether it is enough during normal
operation. Worst case we have 132W for the disks and 2*66W for the CPUs,
which leaves 400W - 132W - 2*66W = 136W for the motherboard, gfx, 4 Promise
controllers and the system disk.

The recommended minimum powersupply for this motherboard is 300W.

Baldur

2002-04-17 17:41:20

by Mike Dresser

Subject: Re: IDE/raid performance

> I checked and it is actually a 400W powersupply. Sorry about that.

What is the max current on the +5/+12/whatever combo?
I've frequently seen power supplies that list max current on the three or
four main voltages, and then the rest is left over for the small stuff.

Mike

2002-04-17 17:48:06

by Nick

Subject: Re: IDE/raid performance

Well, what's your GFX card? Anyone know the specs on a Promise UltraDMA
133 or 100 card or the 760MPX chipset?
Nick

On Wed, 17 Apr 2002, Baldur Norddahl wrote:

> Quoting [email protected] ([email protected]):
> > On Wed, 17 Apr 2002, Baldur Norddahl wrote:
> > > Quoting Mike Dresser ([email protected]):
> > > > On Wed, 17 Apr 2002, Baldur Norddahl wrote:
> > > Motherboard: Tyan Tiger MPX S2466N
> > > Chipset: AMD 760MPX
> > > CPU: Dual Athlon MP 1800+ 1.53 GHz
> > > Two Promise Technology UltraDMA133 TX2 controllers
> > > Two Promise Technology UltraDMA100 TX2 controllers
> > > Matrox G200 AGP video card.
> > > 1 GB registered DDR RAM
> > > > Also, with 12 hd's, dual cpu's, etc, what kind of power supply are you
> > > > using?
> > > It is a 350W powersupply. I wanted something bigger, but couldn't get it for
> > > sane prices. I can't rule out that it is overloaded of course. If it is, I
> > > haven't seen any other symptoms, the system is rock stable so far.
> > > Baldur
> > AMD recommends a minimum of 400watts for a dual athlon system
> > IIRC. Ignoreing that the startup current on an IBM 5400rpm IDE disk seems
> > to be about 25-30watts. Each 1800+ MP puts out 66w of heat, meaning it
> > uses more than 66w (I couldn't find the power useage stats) for a total of
> > 132 watts, so on boot ignoreing everything but the disks and the chips
> > you've got 12x25w (for the disks) + 2x66w (for the procs) or about
> > 432watts. This will go down alot after all your disks spin up, but I'm
> > amazed your system boots. Morale of this message: Don't be a dipshit and
> > put 12 IDE disks on a single power supply.
> > Nick
> >
>
> I checked and it is actually a 400W powersupply. Sorry about that.
>
> Actually the power usage for the 80 GB western digital disks are:
>
> Spinup(peak) 17.0W
> Read/Write 8.0W
> Seek 14.0W
>
> And for the 160 GB maxtor disks:
>
> Spinup(peak) 23.7W
> Read/Write 4.8W
> Seek 5.2W
>
> This adds up to 8*17.0+4*23.7 = 231W spinup and 8*14.0+4*5.2 = 132W during
> seek.
>
> The Athlon MP 1800+ has the following stats:
>
> Voltage: 1.75V
> Maximum thermal power: 66.0W
> Maximum Icc 37.7A
> Typical thermal power: 58.9W
> Typical Icc 33.7A
>
> So the CPU maximum uses Vcc*Icc = 66.0W (wow they didn't lie about the
> thermal power).
>
> It is not likely that both CPUs are burning 66W during the inital phases of
> boot where the disk do their spinup. Even then you might also be able to
> draw more current out of the powersupply than its rating for a short while.
>
> For this discussion it is more interresting if it is enough during normal
> operations. Worst case we got 132W for the disks and 2*66W for the CPUs,
> which leaves 400W - 132 - 2*66 = 136WW for the motherboard, gfx, 4 promise
> controllers and the system disk.
>
> The recommended minimum powersupply for this motherboard is 300W.
>
> Baldur
>

2002-04-17 17:47:18

by Kent Borg

Subject: Re: IDE/raid performance


On Wed, Apr 17, 2002 at 10:27:22AM -0700, Jeff V. Merkey wrote:
> From my analysis with 3Ware at 32 drive configurations, you really
> need to power the drives from a separate power supply is you have
> more than 16 devices. They really suck the power during initial
> spinup.

It seems an obvious help would be to have the option of spinning up
the drives one at a time at 2-3 second intervals. I know a fast drive
doesn't get up to speed in 3 seconds, but the nastiest draw is going
to be over by then.

A machine with 32 drives is pretty serious stuff and probably isn't
booting in a few seconds anyway--another 60-some seconds might be a
desirable option.

Does this exist anywhere? Would it have to be a BIOS feature?


-kb

2002-04-17 17:50:16

by Nick

Subject: Re: IDE/raid performance

It's fairly common in SCSI RAID setups; however, I've never seen it for
IDE.
Nick

On Wed, 17 Apr 2002, Kent Borg wrote:

>
> On Wed, Apr 17, 2002 at 10:27:22AM -0700, Jeff V. Merkey wrote:
> > From my analysis with 3Ware at 32 drive configurations, you really
> > need to power the drives from a separate power supply is you have
> > more than 16 devices. They really suck the power during initial
> > spinup.
>
> It seems an obvious help would be to have the option of spinning up
> the drives one at a time at 2-3 second intervals. I know a fast drive
> doesn't get up to speed in 3 seconds, but the nastiest draw is going
> to be over by then.
>
> A machine with 32 drives is pretty serious stuff and probably isn't
> booting in a few seconds anyway--another 60-some seconds might be a
> desirable option.
>
> Does this exist anywhere? Would it have to be a BIOS feature?
>
>
> -kb
>

2002-04-17 18:39:18

by Andre Hedrick

Subject: Re: IDE/raid performance



	case ide_dma_test_irq:	/* returns 1 if dma irq issued, 0 otherwise */
		dma_stat = IN_BYTE(dma_base+2);
		if (newchip)
			return (dma_stat & 4) == 4;

		sc1d = IN_BYTE(high_16 + 0x001d);
		if (HWIF(drive)->channel) {
			if ((sc1d & 0x50) == 0x50) goto somebody_else;
			else if ((sc1d & 0x40) == 0x40)
				return (dma_stat & 4) == 4;
		} else {
			if ((sc1d & 0x05) == 0x05) goto somebody_else;
			else if ((sc1d & 0x04) == 0x04)
				return (dma_stat & 4) == 4;
		}
somebody_else:
		return (dma_stat & 4) == 4;	/* return 1 if INTR asserted */

Please note the old chips have an interrupt parser and ownership check.
The new chips do not report this feature.

Once you hit high load/IO, the cards/driver get confused about who owns
the interrupt being returned, so expect more dma_intr error 35 reports
to show up.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Wed, 17 Apr 2002, Baldur Norddahl wrote:

> Hi,
>
> I have been doing some simple benchmarks on my IDE system. It got 12 disks
> and a system disk. The 12 disks are organized in two raids like this:
>
> Personalities : [raid5]
> read_ahead 1024 sectors
> md1 : active raid5 hds1[0] hdo1[1] hdk1[2] hdg1[3]
> 480238656 blocks level 5, 32k chunk, algorithm 0 [4/4] [UUUU]
>
> md0 : active raid5 hdt1[6] hdq1[1] hdp1[7] hdm1[0] hdl1[2] hdi1[4] hdh1[3] hde1[5]
> 547054144 blocks level 5, 4k chunk, algorithm 0 [8/8] [UUUUUUUU]
>
> unused devices: <none>
>
> The md0 raid is eight 80 GB Western Digital disks. The md1 raid is four 160
> GB Maxtor disks.
>
> I am using two Promise Technology ultradma133 TX2 controllers and two
> Promise Technologu ultradma100 TX2 controllers. The two ultradma133
> controllers are on a 66 MHz PCI bus, while the two ultradma100 controllers
> are on a 33 MHz PCI bus.
>
> An example of a test run is:
>
> echo Testing hdo1, hds1 and hdk1
> time dd if=/dev/hdo1 of=/dev/null bs=1M count=1k &
> time dd if=/dev/hds1 of=/dev/null bs=1M count=1k &
> time dd if=/dev/hdk1 of=/dev/null bs=1M count=1k &
> wait
>
> I am then watching the progress in another window with vmstat 1. I copied
> typical lines for each test below. What interrest me is the "bi" column for
> the transfer rate. Ad the "id" column as an indicator of how much CPU is being
> spend.
>
> This test is done on a SMP system with kernel 2.4.18 with IDE patches.
>
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
>
> hdt:
> 0 1 0 26096 6600 159888 599004 0 0 34432 0 709 1690 1 10 88
>
> hdt and hdg:
> 0 2 1 26096 6428 70736 687496 0 0 64768 0 1167 2931 1 25 74
>
> hdt, hdg and hdp:
> 0 3 0 26092 7832 42632 712712 0 0 75620 0 1383 3242 7 33 60
>
> hdt, hdg, hdp and hdm:
> 0 4 0 26092 6400 42464 713044 0 0 74376 0 1374 3289 0 30 70
>
> hdt, hdg, hdp, hdm and hdl:
> 0 5 0 26092 6196 42412 712188 0 0 107008 696 2000 4397 5 43 51
>
> hdt, hdg, hdp, hdm, hdl and hdi:
> 2 4 1 26172 5480 42184 713432 0 0 137104 0 2137 4602 5 75 20
>
> hdt, hdg, hdp, hdm, hdl, hdi and hdh:
> 5 2 1 27324 5020 35268 737336 0 108 144640 108 2177 2271 0 99 0
>
> hdt, hdg, hdp, hdm, hdl, hdi, hdh and hde:
> 4 4 1 27324 5420 35572 735752 0 0 143796 0 2102 2180 1 99 0
>
> hdo:
> 0 1 0 27324 7032 55732 666408 0 0 36352 0 710 1796 0 12 87
>
> hdo and hds:
> 0 2 1 27324 6516 40012 691588 0 0 72576 0 1264 3311 0 24 75
>
> hdo, hds and hdk:
> 0 3 0 27316 6012 40048 692088 0 0 108944 484 1970 4523 0 50 50
>
> hdo, hds, hdk and hdg:
> 4 0 1 27316 5552 40080 694124 0 0 134572 0 2252 4825 1 70 29
>
> md0:
> 1 0 0 27324 13460 38104 692140 0 0 76676 0 4639 2611 4 74 22
>
> md1:
> 0 1 0 27316 10224 40340 697780 0 0 69504 0 1893 3892 1 55 44
>
> md0 and md1:
> 2 1 1 27316 7188 40224 675200 0 0 81470 164 3935 2389 9 77 14
>
> It is clear that the 33 MHz PCI bus maxes out at 75 MB/s. Is there a reason
> it doesn't reach 132 MB/s?
>
> Second, why are the md devices so slow? I would have expected it to reach
> 130+ MB/s on both md0 and md1. It even has spare CPU time to do it with.
>
> Another issue is when the system is under heavy load this often happens:
>
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdt: dma_intr: bad DMA status (dma_stat=75)
> hdt: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: timeout waiting for DMA
> PDC202XX: Primary channel reset.
> ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> hdq: dma_intr: bad DMA status (dma_stat=35)
> hdq: dma_intr: status=0x50 { DriveReady SeekComplete }
> hdq: timeout waiting for DMA
> PDC202XX: Primary channel reset.
> ide_dmaproc: chipset supported ide_dma_timeout func only: 14
> etc.
>
> It did not happen during the test above though.
>
> Baldur

2002-04-17 20:36:30

by Kurt Garloff

Subject: Re: IDE/raid performance

Hi Nick,

On Wed, Apr 17, 2002 at 11:15:15AM -0400, [email protected] wrote:
> to be about 25-30watts. Each 1800+ MP puts out 66w of heat, meaning it
> uses more than 66w (I couldn't find the power useage stats)

Meaning that they consume 66W. All the energy is transferred to heat.
(Where else would you expect energy to go? My CPUs don't do mechanical
work nor do they build up potential energy.)

Regards,
--
Kurt Garloff <[email protected]> [Eindhoven, NL]
Physics: Plasma simulations <[email protected]> [TU Eindhoven, NL]
Linux: SCSI, Security <[email protected]> [SuSE Nuernberg, DE]
(See mail header or public key servers for PGP2 and GPG public keys.)



2002-04-17 22:35:48

by dean gaudet

Subject: Re: IDE/raid performance

On Wed, 17 Apr 2002, Baldur Norddahl wrote:

> It is not likely that both CPUs are burning 66W during the inital phases of
> boot where the disk do their spinup.

while one cpu is almost certainly idle during spinup, the other probably is
not doing nice things like going into HALT states; so it's entirely likely
that it's consuming a substantial amount of power. the BIOS does not
typically ever use HALT or other power saving states except on non-boot
CPUs.

i've got an external power meter for doing measurements of this sort of
thing, and i recently built a dual athlon system (tyan S2462NG mobo) w/
four maxtor D740X 80GB disks (on a pair of promise ultra100 TX2). its
peak during powerup is 255W. it idles at 193W, and will run up to 225W
while compiling a kernel.

(i made no attempt to find a "power virus" for this system. i have a 460W
power supply and i'm happy i'm well within limits.)

fwiw, my drives are rated at 24W each for power up... and the CPUs are
1.4GHz (purchased in feb/02, which was when the 1.4GHz were the best
price/performance, so around the middle of AMD's yield curve.)

my meter is a ~USD1000 lab quality meter... but you can get reasonably
accurate measurements by picking up a ~USD50 "AC clamp meter" from an
electrician's supply store. look for one with 0.1A accuracy. AC clamp
meters use magnetic inductance to measure the current flow around an AC
wire. (for example, see
<http://www.fluke.com/products/home.asp?SID=5&AGID=3&PID=30405>)

to make the measurements you need to put the clamp around a single live
wire. you don't need to remove insulation from a live wire -- the
magnetic induction occurs even if there's still insulation around it.
but you can't measure with both live wires inside the clamp (their fields
are opposite and cancel)... so you do need to isolate the clamp around one
live wire ...

i've carefully removed the outer (black) insulation from a computer power
cable, exposing the three (still insulated) wires inside (which happen to
be black, white, and green). then i put the clamp around the black wire
for measurement. (or you can measure entire circuits at the fuse box.)

WARNING DANGER! i'm not responsible for any damage, injury and so forth,
which you incur as a result of trying to use one of these devices. you
assume all responsibility, and so forth. see my legal disclaimer at
<http://arctic.org/~dean/legal.html> if you're in doubt.

remember that power = current * volts-rms... (and now all the EE geeks
will jump in and tell me how i'm wrong and what the real detailed formula
actually is, and how power supplies quote their power numbers in confusing
manners, and so forth... and me being a software engineer i'm happy just
to see that i'm only consuming half the rated power of my power supply,
and that's probably a fine enough safety margin :)
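
(rough worked example, with made-up numbers: a clamp reading of 2.2A on a
115V circuit comes to about 2.2 * 115 ~= 250W at the wall, ignoring power
factor -- the same ballpark as the meter readings above.)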

-dean

2002-04-17 22:45:25

by Lincoln Dale

Subject: Re: IDE/raid performance

At 02:58 PM 17/04/2002 +0200, Baldur Norddahl wrote:
>It is clear that the 33 MHz PCI bus maxes out at 75 MB/s. Is there a reason
>it doesn't reach 132 MB/s?

welcome to the world of PC hardware, real-world performance and theoretical
numbers.

in theory, a 32/33 PCI bus can get 132 Mbytes/sec.

in reality, the more cards you have on a bus, the more arbitration you
have, the less overall efficiency.

in theory, with neither the initiator nor the target inserting wait-states, and
with continual bursting, you can achieve maximum throughput.
in reality, continual bursting doesn't happen very often and/or many
hardware devices are not designed to either perform i/o without some
wait-states in some conditions or provide continual bursting.

in short: you're working on theoretical numbers. reality is typically far
far different!


something you may want to try:
if your motherboard supports it, change the "PCI Burst" settings and see
what effect this has.
you can probably extract another 20-25% performance by changing the PCI
Burst from 32 to 64.
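
a related knob that can be poked from Linux is the PCI latency timer, which
bounds how long a busmaster may hold the bus per burst. sketch only -- the
bus address below is hypothetical (find the real one with lspci), and whether
it helps at all depends on the chipset:

lspci | grep -i promise                   # find the controllers' bus addresses
setpci -s 02:04.0 latency_timer=b0        # raise the latency timer (hex value)
lspci -v -s 02:04.0 | grep -i latency     # confirm the new setting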

>Second, why are the md devices so slow? I would have expected it to reach
>130+ MB/s on both md0 and md1. It even has spare CPU time to do it with.

you don't mention what your motherboard or chipset actually is --
and where the 32/33 and 64/66 PCI buses connect in.
you also don't mention what your FSB & memory clock-speed are, or how these
are connected to the PCI busses.


it is likely that you have a motherboard where the throughput between PCI
to memory will also contend with the FSB.
given you're using "time dd if=/dev/hdo1 of=/dev/null bs=1M count=1k" as
your test, you're effectively issuing read() and write() system-calls from
user-space to kernel.
this implies a memory-copy.
count the number of times you're doing a memory-copy (or, more correctly,
moving data across the front-side-bus), and you should be able to see
another reason for the bottlenecks you see.
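
(to put rough numbers on it: a plain read() of a block device DMAs each byte
into the page cache once and then copies it to the user buffer once -- one
memory write for the DMA, plus a read and a write for the copy_to_user -- so
~75 Mbytes/sec of "bi" is already on the order of 3x that, roughly
225 Mbytes/sec, of real memory-bus traffic.)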


cheers,

lincoln.

2002-04-17 23:24:38

by Mike Fedyk

Subject: Re: IDE/raid performance

On Wed, Apr 17, 2002 at 01:47:16PM -0400, Kent Borg wrote:
>
> On Wed, Apr 17, 2002 at 10:27:22AM -0700, Jeff V. Merkey wrote:
> > From my analysis with 3Ware at 32 drive configurations, you really
> > need to power the drives from a separate power supply is you have
> > more than 16 devices. They really suck the power during initial
> > spinup.
>
> It seems an obvious help would be to have the option of spinning up
> the drives one at a time at 2-3 second intervals. I know a fast drive
> doesn't get up to speed in 3 seconds, but the nastiest draw is going
> to be over by then.
>
> A machine with 32 drives is pretty serious stuff and probably isn't
> booting in a few seconds anyway--another 60-some seconds might be a
> desirable option.
>
> Does this exist anywhere? Would it have to be a BIOS feature?

I doubt it.

All of the IDE drives I have used spin up when power is applied. Most of
the SCSI drives (except for some really old ones) have a jumper that tells the
drive to wait until it receives a message from the SCSI controller to spin up.

I'd imagine that IDE would need some protocol spec changes before this could
be supported (at least a "spin the drive up" message...).

Mike

2002-04-17 23:36:10

by Alan

Subject: Re: IDE/raid performance

> doing nice things like going into HALT states; so it's entirely likely
> that it's consuming a substantial amount of power. the BIOS does not

halt makes little if any difference on the newer processors for a typical
setup

2002-04-18 00:36:26

by Mike Fedyk

Subject: Re: IDE/raid performance

On Thu, Apr 18, 2002 at 08:44:45AM +1000, Lincoln Dale wrote:
> At 02:58 PM 17/04/2002 +0200, Baldur Norddahl wrote:
> >It is clear that the 33 MHz PCI bus maxes out at 75 MB/s. Is there a reason
> >it doesn't reach 132 MB/s?
>
> welcome to the world of PC hardware, real-world performance and theoretical
> numbers.
>
> in theory, a 32/33 PCI bus can get 132mbyte/sec.
>
> in reality, the more cards you have on a bus, the more arbitration you
> have, the less overall efficiency.
>
> in theory, with neither the initiator or target inserting wait-states, and
> with continual bursting, you can achieve maximum throughput.
> in reality, continual bursting doesn't happen very often and/or many
> hardware devices are not designed to either perform i/o without some
> wait-states in some conditions or provide continual bursting.
>
> in short: you're working on theoretical numbers. reality is typically far
> far different!
>
>
> something you may want to try:
> if your motherboard supports it, change the "PCI Burst" settings and see
> what effect this has.
> you can probably extract another 20-25% performance by changing the PCI
> Burst from 32 to 64.

This is a problem with the VIA chipsets. Intel chipsets burst 4096
bytes per burst, while the VIA chipsets were doing 64 bytes per burst.

AMD chipsets (like the original poster later mentioned) weren't covered
in the comparison article I read, so I don't know if they have the same
trouble.

Mike

2002-04-18 06:49:08

by Andre Hedrick

Subject: Re: IDE/raid performance


Already there ...

It is called "specific configuration".
I will add support for it in 2.4 soon enough, once I am satisfied that it
functionally works.

Cheers,

Andre Hedrick
LAD Storage Consulting Group

On Wed, 17 Apr 2002, Mike Fedyk wrote:

> On Wed, Apr 17, 2002 at 01:47:16PM -0400, Kent Borg wrote:
> >
> > On Wed, Apr 17, 2002 at 10:27:22AM -0700, Jeff V. Merkey wrote:
> > > From my analysis with 3Ware at 32 drive configurations, you really
> > > need to power the drives from a separate power supply is you have
> > > more than 16 devices. They really suck the power during initial
> > > spinup.
> >
> > It seems an obvious help would be to have the option of spinning up
> > the drives one at a time at 2-3 second intervals. I know a fast drive
> > doesn't get up to speed in 3 seconds, but the nastiest draw is going
> > to be over by then.
> >
> > A machine with 32 drives is pretty serious stuff and probably isn't
> > booting in a few seconds anyway--another 60-some seconds might be a
> > desirable option.
> >
> > Does this exist anywhere? Would it have to be a BIOS feature?
>
> I doubt it.
>
> All of the IDE drives I have used spin up when power is applied. Most of
> the scsi (except for some really old ones) have a jumper that tells the
> drive to wait until it receives a message from the scsi controller to spin up.
>
> I'd imagine that IDE would need some protocol spec changes before this could
> be supported (at least a "spin the drive up" message...).
>
> Mike

2002-04-18 07:41:36

by Helge Hafting

Subject: Re: IDE/raid performance

Mike Fedyk wrote:

> I'd imagine that IDE would need some protocol spec changes before this could
> be supported (at least a "spin the drive up" message...).
>
Exists already. You may use hdparm to tell IDE drives
to spin up and down or even set a timeout. This is
mostly for power-saving or no-noise setups.
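
For example (standard hdparm flags; the device name is only an example):

hdparm -y /dev/hde      # put the drive into standby (spin down) right now
hdparm -S 120 /dev/hde  # spin down after 10 minutes idle; the next access
                        # spins it back up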

So they could indeed add a jumper to IDE drives to let them
power up in the spun-down state. But that's not what
the vast majority of one-disk users want.

Helge Hafting

2002-04-18 15:55:03

by Bill Davidsen

Subject: Re: IDE/raid performance

On Wed, 17 Apr 2002 [email protected] wrote:

> to be about 25-30watts. Each 1800+ MP puts out 66w of heat, meaning it
> uses more than 66w

Unless they changed the laws of physics, the power in is the same as the
power out, and the temp will rise to increase power out (or limit power in
by melting). The power of the output driver lines is really too small to
consider.

> Morale of this message: Don't be a dipshit and
> put 12 IDE disks on a single power supply.

1. learn physics
2. learn vocabulary
3. learn diplomacy

Since he has problems while running, when there's no question of power being
adequate, rather than while booting, I think looking for the real problem
is now in order. There have been several constructive suggestions on this
which address the problem.

--
bill davidsen <[email protected]>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

2002-04-18 18:35:23

by Mike Galbraith

Subject: Re: IDE/raid performance


----- Original Message -----
From: "Bill Davidsen" <[email protected]>
To: <[email protected]>
Cc: "Linux Kernel Mailing List" <[email protected]>
Sent: Thursday, April 18, 2002 5:51 PM
Subject: Re: IDE/raid performance


> On Wed, 17 Apr 2002 [email protected] wrote:
>
> > to be about 25-30watts. Each 1800+ MP puts out 66w of heat, meaning it
> > uses more than 66w
>
> Unless they changed the laws of physics, the power in is the same as the
> power out, and the temp will rise to increase power out (or limit power in
> by melting). The power of the output driver lines is really too small to
> consider.
>
> > Morale of this message: Don't be a dipshit and
> > put 12 IDE disks on a single power supply.
>
> 1. learn physics
> 2. learn vocabulary
> 3. learn diplomacy
>
> Since he has problems running, when there's no question of power being
> adequate, rather than while booting, I think looking for the real problem
> is now in order. There have been several constructive suggestions on this,
> which address the problem.

I like your point #3 quite a lot.

-Mike

2002-04-18 21:20:06

by Mike Fedyk

Subject: Re: IDE/raid performance

On Thu, Apr 18, 2002 at 09:41:20AM +0200, Helge Hafting wrote:
> Mike Fedyk wrote:
>
> > I'd imagine that IDE would need some protocol spec changes before this could
> > be supported (at least a "spin the drive up" message...).
> >
> Exists already. You may use hdparm to tell IDE drives
> to spin up and down or even set a timeout. This is
> mostly for power-saving or no-noise setups.
>

Oh yes, I know about that, but didn't remember it when I posted.

> So they could indeed add a jumper to IDE drives to let them
> power up in the spun-down state. But that's not what
> the vast majority of one-disk users want.
>

This is the specific thing I was talking about. Even if the drive can power
down with a command, it doesn't wait for a command to perform the spinup
when power is applied, and that's what's missing.

It seems like there is already protocol support in IDE, so the drive just
needs a way to be configured... Maybe some drives will allow software
config of this when they implement it?

2002-04-22 16:07:00

by Pavel Machek

Subject: Re: IDE/raid performance

Hi!

> On Wed, Apr 17, 2002 at 11:15:15AM -0400, [email protected] wrote:
> > to be about 25-30watts. Each 1800+ MP puts out 66w of heat, meaning it
> > uses more than 66w (I couldn't find the power useage stats)
>
> Meaning that they consume 66w. All energy is tranfered to heat.
> (Where else would you expect energy to go? My CPUs don't do mechanical
> work nor do they build up potential energy.)

But they drive other parts of the mainboard. [Imagine an LED on the mainboard
powered from the CPU. Imagine that LED takes 3W (unrealistic).] Then the CPU
needs 66W from the power supply but only makes heat from 63W.
Pavel


--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.