2008-06-01 09:45:50

by Justin Piszcz

Subject: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?

I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and
it appears I have hit the limit. If I could get the maximum speed out of
all drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
around 774MiB/s (currently running badblocks on all drives).

I am testing some drives for someone and was curious to see how far one
can push the disks/backplane to their theoretical limit.
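
For reference, one badblocks pass per disk can be kicked off in
parallel with something along these lines (sd[a-l] is illustrative;
adding -w turns it into the destructive write test):

# read-only badblocks scan of each disk, in parallel; progress and any
# bad sectors land in /tmp/bb.<disk>
for d in /dev/sd[a-l]; do
    badblocks -s -b 4096 $d > /tmp/bb.${d##*/} 2>&1 &
done
wait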

dstat output:
----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
1 12 0 83 3 2| - 0 | 0 0 | 0 0 | 13k 23k
1 11 0 84 2 2| 774M 0 |2100B 7373B| 0 0 | 13k 23k
1 12 0 83 3 2| 774M 0 | 0 0 | 0 0 | 13k 23k
1 11 0 82 4 2| 774M 0 |2030B 5178B| 0 0 | 13k 23k
1 11 0 83 4 2| 774M 0 | 0 0 | 0 0 | 13k 23k
1 11 0 83 3 2| 774M 0 |2264B 6225B| 0 0 | 13k 23k

vmstat 1 output:
~$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 12 124 3841772 8 13012 0 0 12595 163880 379 352 0 31 37 32
2 12 124 3841772 8 12992 0 0 791744 8 12796 23033 1 18 0 82
0 12 124 3841772 8 12992 0 0 792192 0 12677 22918 1 15 0 84
0 12 124 3841772 8 12992 0 0 792960 0 12894 22929 1 15 0 84

When I was writing to all of the drives, it was maxing out at ~650MiB/s.

I also have 2 Raptors on a PCI card (as I ran out of PCI-e cards). When
I read from one of the Raptors as well (with dd, as in the example shown
below), the speed drops:

1 13 0 79 5 3| 764M 0 |2240B 7105B| 0 0 | 12k 21k
1 13 0 80 5 2| 764M 0 | 0 0 | 0 0 | 12k 21k
1 11 0 82 5 2| 764M 0 |2170B 5446B| 0 0 | 12k 21k
1 12 0 81 5 2| 762M 0 | 0 0 | 0 0 | 12k 21k

Has anyone done this with a server-class Intel board? Would greater
speeds be achievable?

Also, how does AMD fare in this regard? Has anyone run similar tests?
For instance if you have 12 disks in your host you could:

dd if=/dev/disk1 of=/dev/null bs=1M
dd if=/dev/disk2 of=/dev/null bs=1M

What rate(s) do you get?
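
E.g. to kick them all off in parallel and watch the aggregate with
dstat or vmstat, something like this (adjust the device names for
your host):

# one sequential reader per disk, all running at once
for d in /dev/sd[a-l]; do
    dd if=$d of=/dev/null bs=1M &
done
wait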

Justin.


2008-06-01 11:01:22

by Justin Piszcz

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?


On Sun, 1 Jun 2008, Justin Piszcz wrote:

> I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and it
> appears I have hit the limit. If I could get the maximum speed out of all
> drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
> around 774MiB/s (currently running badblocks on all drives).

Small correction, they are 7200.11 Seagate Desktop Drives (ST31000340AS),
not enterprise drives:

http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148274

2008-06-01 11:26:22

by Justin Piszcz

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?



On Sun, 1 Jun 2008, Justin Piszcz wrote:

>
> On Sun, 1 Jun 2008, Justin Piszcz wrote:
>
>> I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and
>> it appears I have hit the limit. If I could get the maximum speed out of
>> all drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
>> around 774MiB/s (currently running badblocks on all drives).
>
> Small correction, they are 7200.11 Seagate Desktop Drives (ST31000340AS), not
> enterprise drives:
>
> http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD
> http://www.newegg.com/Product/Product.aspx?Item=N82E16822148274
>
>

http://www.intel.com/products/chipsets/g965/diagram.jpg

Basically it appears I am hammering the southbridge, as on this board
the PCI-e (x1) slots also hang off the southbridge.

6_SATA -> G965 ICH8
3_PCI-e -> G965 ICH8

The ICH8 then has to ship all of that data across the DMI (2GB/s) link
to the northbridge.

If one used a 12-, 16- or 24-port RAID card (but ran SW RAID) in the
x16 slot attached to the northbridge itself, would this barrier still
exist, given that the GMCH<->CPU link is 8.5GB/s?

Also on the X38 and X48 the speed increases slightly:
http://www.intel.com/products/chipsets/X38/X38_Block_Diagram.jpg (10.6GB/s)
http://www.intel.com/products/chipsets/x48/x48_block_diagram.jpg (12.8GB/s)

Why would one need such speed?

Example:
LTO-4 drives can write at 120 MiB/s each:
http://en.wikipedia.org/wiki/Linear_Tape-Open

It is therefore important to be able to sustain this rate to possibly
multiple(!) tape drives on a single system.
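
For example, streaming to four LTO-4 drives at full speed already means
a sustained 4 * 120MiB/s = 480MiB/s of reads from the array, well over
half of the ~774MiB/s read ceiling observed above.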

Justin.

2008-06-01 12:20:59

by Willy Tarreau

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?

On Sun, Jun 01, 2008 at 07:26:09AM -0400, Justin Piszcz wrote:
>
>
> On Sun, 1 Jun 2008, Justin Piszcz wrote:
>
> >
> >On Sun, 1 Jun 2008, Justin Piszcz wrote:
> >
> >>I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and
> >>it appears I have hit the limit. If I could get the maximum speed out of
> >>all drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
> >>around 774MiB/s (currently running badblocks on all drives).
> >
> >Small correction, they are 7200.11 Seagate Desktop Drives (ST31000340AS),
> >not enterprise drives:
> >
> >http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD
> >http://www.newegg.com/Product/Product.aspx?Item=N82E16822148274
> >
> >
>
> http://www.intel.com/products/chipsets/g965/diagram.jpg
>
> Basically it appears I am hammering the southbridge, as on this board
> the PCI-e (x1) slots also hang off the southbridge.
>
> 6_SATA -> G965 ICH8
> 3_PCI-e -> G965 ICH8
>
> The ICH8 then has to ship all of that data across the DMI (2GB/s) link
> to the northbridge.
>
> If one used a 12-, 16- or 24-port RAID card (but ran SW RAID) in the
> x16 slot attached to the northbridge itself, would this barrier still
> exist, given that the GMCH<->CPU link is 8.5GB/s?
>
> Also on the X38 and X48 the speed increases slightly:
> http://www.intel.com/products/chipsets/X38/X38_Block_Diagram.jpg (10.6GB/s)
> http://www.intel.com/products/chipsets/x48/x48_block_diagram.jpg (12.8GB/s)
>
> Why would one need such speed?

It looks like graphics games are pushing the technology to its limits,
which is good for us. I have bought X38 motherboards for 10Gbps
experiments, and this chipset is perfectly capable of feeding two
Myri10GE NICs (20Gbps total). That is 2.5GB/s, not counting overhead.
So I/O bandwidth is a premium requirement today.

Other chipsets I have tested (945 and 965) were very poor (about 4.7
and 6.5 Gbps respectively if my memory serves me right).

Willy

2008-06-01 13:55:48

by David Lethe

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?



-----Original Message-----

From: "Justin Piszcz" <[email protected]>
Subj: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?
Date: Sun Jun 1, 2008 6:02 am
To: "[email protected]" <[email protected]>; "[email protected]" <[email protected]>; "[email protected]" <[email protected]>


On Sun, 1 Jun 2008, Justin Piszcz wrote:

> I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and it
> appears I have hit the limit. If I could get the maximum speed out of all
> drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
> around 774MiB/s (currently running badblocks on all drives).

Small correction, they are 7200.11 Seagate Desktop Drives (ST31000340AS),
not enterprise drives:

http://www.seagate.com/ww/v/index.jsp?vgnextoid=0732f141e7f43110VgnVCM100000f5ee0a0aRCRD
http://www.newegg.com/Product/Product.aspx?Item=N82E16822148274



2008-06-01 21:23:18

by Daniel J Blueman

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?

On 1 Jun, 10:50, Justin Piszcz <[email protected]> wrote:
> I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and
> it appears I have hit the limit. If I could get the maximum speed out of
> all drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
> around 774MiB/s (currently running badblocks on all drives).

Nice test. The Seagate 7200.11 drives deliver 120MB/s (outer zone,
raw) each, and there is an issue with CFQ dispatching requests; see:

http://groups.google.co.uk/group/linux.kernel/browse_thread/thread/b88264b084a2dfe0/a1bc0f67837bad00

A quick workaround tweak is:

# echo 0 >/sys/block/sda/queue/iosched/slice_idle

Does this help any? This gives the difference of ~68MB/s vs ~120MB/s
on my 7200.11 ;-) .
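
To apply the same tweak across a whole set of disks, a loop along
these lines does it (sd[a-l] stands in for your device names):

# disable CFQ's idle slice on every disk in the set
for q in /sys/block/sd[a-l]/queue/iosched/slice_idle; do
    echo 0 > $q
done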

That said, the i965 chipset is fairly contemporary, but if that 2GB/s
DMI connection is the bidirectional bandwidth (likely), then maybe
you're hitting that limit: Intel's DMI bus is based on PCIe, thus will
use 128 byte PCI-e Max Payload packets (as in the rest of the
chipset), which IIRC theoretically maxes you out near 800MB/s.
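
Back-of-the-envelope, assuming ~24 bytes of TLP framing/header
overhead per 128-byte payload and 1GB/s per DMI direction:

1GB/s * 128 / (128 + 24) ~= 840MB/s

minus link-layer traffic, which lines up nicely with the ~774MiB/s
plateau you're seeing.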

The X48 chipset may allow you to crank the Max Payload to 256 (setpci
and the Intel chipset docs), if it doesn't default to 256, like in
5400 server chipsets. This chipset is where the fun really starts eg
hdparm -T giving >10GB/s, like in Itanium2s ;-) .
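
A hedged sketch of the poke (the 00:01.0 root-port address is just an
example; check lspci and the chipset docs first). Device Control sits
at offset 8 in the PCI Express capability, with Max Payload Size in
bits 7:5, so 001b (0x20) selects 256 bytes:

# read the current Device Control register
setpci -s 00:01.0 CAP_EXP+8.w
# set the Max Payload Size field to 256 bytes, preserving other bits
val=0x$(setpci -s 00:01.0 CAP_EXP+8.w)
setpci -s 00:01.0 CAP_EXP+8.w=$(printf '%04x' $(( (val & ~0xe0) | 0x20 )))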

Thanks,
Daniel
--
Daniel J Blueman

2008-06-01 21:26:51

by Justin Piszcz

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?



On Sun, 1 Jun 2008, Daniel J Blueman wrote:

> On 1 Jun, 10:50, Justin Piszcz <[email protected]> wrote:
>> I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and
>> it appears I have hit the limit. If I could get the maximum speed out of
>> all drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
>> around 774MiB/s (currently running badblocks on all drives).
>
> Nice test. The Seagate 7200.11 drives deliver 120MB/s (outer zone,
> raw) each, and there is an issue with CFQ dispatching requests; see:
Thanks, wow...!

>
> http://groups.google.co.uk/group/linux.kernel/browse_thread/thread/b88264b084a2dfe0/a1bc0f67837bad00
>
> A quick workaround tweak is:
>
> # echo 0 >/sys/block/sda/queue/iosched/slice_idle
Unfortunately the disks have already been removed from the host; they
were only in it to verify that all of them were good (1 was DOA).

>
> Does this help any? This gives the difference of ~68MB/s vs ~120MB/s
> on my 7200.11 ;-) .
>
> That said, the i965 chipset is fairly contemporary, but if that 2GB/s
> DMI connection is the bidirectional bandwidth (likely), then maybe
> you're hitting that limit: Intel's DMI bus is based on PCIe, thus will
> use 128 byte PCI-e Max Payload packets (as in the rest of the
> chipset), which IIRC theoretically maxes you out near 800MB/s.
Very interesting; do you know what AMD uses on their boards, by any
chance? I'll most likely stick with some type of Intel chipset, but I
was curious about AMD/Nvidia.

>
> The X48 chipset may allow you to crank the Max Payload to 256 (setpci
> and the Intel chipset docs), if it doesn't default to 256, like in
> 5400 server chipsets. This chipset is where the fun really starts eg
> hdparm -T giving >10GB/s, like in Itanium2s ;-) .
Thanks for this information. What kind of HW are you seeing >10GB/s on? ;)

>
> Thanks,
> Daniel
> --
> Daniel J Blueman
>

2008-06-01 23:44:00

by Justin Piszcz

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?



On Sun, 1 Jun 2008, Daniel J Blueman wrote:

> On 1 Jun, 10:50, Justin Piszcz <[email protected]> wrote:
>> I have 12 enterprise-class Seagate 1TB disks on a 965 desktop board and
>> it appears I have hit the limit. If I could get the maximum speed out of
>> all drives it would be ~70MiB/s avg * 12 = 840MiB/s, but it seems to stop
>> around 774MiB/s (currently running badblocks on all drives).
>
> Nice test. The Seagate 7200.11 drives deliver 120MB/s (outer zone,
> raw) each, and there is an issue with CFQ dispatching requests; see:
>
> http://groups.google.co.uk/group/linux.kernel/browse_thread/thread/b88264b084a2dfe0/a1bc0f67837bad00
>
> A quick workaround tweak is:
>
> # echo 0 >/sys/block/sda/queue/iosched/slice_idle
>
> Does this help any? This gives the difference of ~68MB/s vs ~120MB/s
> on my 7200.11 ;-) .
>
> That said, the i965 chipset is fairly contemporary, but if that 2GB/s
> DMI connection is the bidirectional bandwidth (likely), then maybe
> you're hitting that limit: Intel's DMI bus is based on PCIe, thus will
> use 128 byte PCI-e Max Payload packets (as in the rest of the
> chipset), which IIRC theoretically maxes you out near 800MB/s.
>
> The X48 chipset may allow you to crank the Max Payload to 256 (setpci
> and the Intel chipset docs), if it doesn't default to 256, like in
> 5400 server chipsets. This chipset is where the fun really starts eg
> hdparm -T giving >10GB/s, like in Itanium2s ;-) .
>
> Thanks,
> Daniel
> --
> Daniel J Blueman
>

slice_idle=0 does help a little with my 10-disk raid5 for sequential
reads/writes:

http://home.comcast.net/~jpiszcz/raid/20080601/raid5.html

2008-06-03 18:44:30

by Bryan Mesich

Subject: Re: Limits of the 965 chipset & 3 PCI-e cards/southbridge? ~774MiB/s peak for read, ~650MiB/s peak for write?

On Sun, Jun 01, 2008 at 05:45:39AM -0400, Justin Piszcz wrote:

> I am testing some drives for someone and was curious to see how far one can
> push the disks/backplane to their theoretical limit.

This testing would indeed only suggest theoretical limits. In a
production environment, I think a person would be hard pressed to
reproduce these numbers.

> Has anyone done this with a server-class Intel board? Would greater
> speeds be achievable?

Nope, but your post inspired me to give it a try. My setup is as
follows:

Kernel: Linux 2.6.25.3-18 (Fedora 9)
Motherboard: Intel SE7520BD2-DDR2
SATA Controller: (2) 8-port 3Ware 9550SX
Disks: (12) 750GB Seagate ST3750640NS

Disks sd[a-h] are plugged into the first 3Ware controller while
sd[i-l] are plugged into the second controller. Both 3Ware cards
are plugged into PCI-X 100MHz slots. The disks are being exported as
"single disk" and write caching has been disabled. The OS is
loaded on sd[a-d] (small 10GB partitions, mirrored). For my first
test, I ran dd on a single disk:

dd if=/dev/sde of=/dev/null bs=1M

dstat -D sde

----total-cpu-usage---- --dsk/sde-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 7 53 40 0 0| 78M 0 | 526B 420B| 0 0 |1263 2559
0 8 53 38 0 0| 79M 0 | 574B 420B| 0 0 |1262 2529
0 7 54 39 0 0| 78M 0 | 390B 420B| 0 0 |1262 2576
0 7 54 39 0 0| 76M 0 | 284B 420B| 0 0 |1216 2450
0 8 54 38 0 0| 76M 0 | 376B 420B| 0 0 |1236 2489
0 9 54 36 0 0| 79M 0 | 397B 420B| 0 0 |1265 2537
0 9 54 37 0 0| 77M 0 | 344B 510B| 0 0 |1262 2872
0 8 54 38 0 0| 75M 0 | 637B 420B| 0 0 |1214 2992
0 8 53 38 0 0| 78M 0 | 422B 420B| 0 0 |1279 3179

And for a write:

dd if=/dev/zero of=/dev/sde bs=1M

dstat -D sde

----total-cpu-usage---- --dsk/sde-- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 7 2 90 0 0| 0 73M| 637B 420B| 0 0 | 614 166
0 7 0 93 0 0| 0 73M| 344B 420B| 0 0 | 586 105
0 7 0 93 0 0| 0 75M| 344B 420B| 0 0 | 629 177
0 7 0 93 0 0| 0 74M| 344B 420B| 0 0 | 600 103
0 7 0 93 0 0| 0 73M| 875B 420B| 0 0 | 612 219
0 8 0 92 0 0| 0 68M| 595B 420B| 0 0 | 546 374
0 8 5 86 0 0| 0 76M| 132B 420B| 0 0 | 632 453
0 9 0 91 0 0| 0 74M| 799B 420B| 0 0 | 596 421
0 8 0 92 0 0| 0 74M| 693B 420B| 0 0 | 624 436


For my next test, I ran dd on 8 disks (sd[e-l]). These are
non-system disks (the OS is installed on sd[a-d]) and they are split
between the 3Ware controllers. Here are my results:

for d in /dev/sd[e-l]; do dd if=$d of=/dev/null bs=1M & done

dstat

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 91 0 0 1 8| 397M 0 | 811B 306B| 0 0 |6194 6654
0 91 0 0 1 7| 420M 0 | 158B 322B| 0 0 |6596 7097
1 91 0 0 1 8| 415M 0 | 324B 322B| 0 0 |6406 6839
1 91 0 0 1 8| 413M 0 | 316B 436B| 0 0 |6464 6941
0 90 0 0 2 8| 419M 0 | 66B 306B| 0 0 |6588 7121
1 91 0 0 2 7| 412M 0 | 461B 322B| 0 0 |6449 6916
0 91 0 0 1 7| 415M 0 | 665B 436B| 0 0 |6535 7044
0 92 0 0 1 7| 418M 0 | 299B 306B| 0 0 |6555 7028
0 90 0 0 1 8| 412M 0 | 192B 436B| 0 0 |6496 7014

And for write:

for d in /dev/sd[e-l]; do dd if=/dev/zero of=$d bs=1M & done

dstat

----total-cpu-usage---- -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read writ| recv send| in out | int csw
0 86 0 0 1 12| 0 399M| 370B 306B| 0 0 |3520 855
0 87 0 0 1 12| 0 407M| 310B 322B| 0 0 |3506 813
1 87 0 0 1 12| 0 413M| 218B 322B| 0 0 |3568 827
0 87 0 0 0 12| 0 425M| 278B 322B| 0 0 |3641 785
0 87 0 0 1 12| 0 430M| 310B 322B| 0 0 |3658 845
0 86 0 0 1 14| 0 421M| 218B 322B| 0 0 |3605 756
1 85 0 0 1 14| 0 417M| 627B 322B| 0 0 |3579 984
0 84 0 0 1 14| 0 420M| 224B 436B| 0 0 |3548 1006
0 86 0 0 1 13| 0 433M| 310B 306B| 0 0 |3679 836


It seems that I'm running into a wall around 420-430MB/s. Assuming
each disk can push 75MB/s, 8 disks should push 600MB/s together. This
is obviously not the case. According to Intel's Tech
Specifications:

http://download.intel.com/support/motherboards/server/se7520bd2/sb/se7520bd2_server_board_tps_r23.pdf

I think the IO contention (in my case) is due to the PXH.
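
Rough numbers, as a sanity check: a 64-bit/100MHz PCI-X segment tops
out at 8 bytes * 100MHz = 800MB/s theoretical, so if both 9550SX cards
end up contending behind the PXH (or sharing one segment), arbitration
and burst overhead from two bus-masters could plausibly drag sustained
throughput down into the 420-430MB/s range seen here.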

All in all, when it comes down to moving IO in reality, these
tests are pretty much useless in my opinion. Filesystem overhead
and other operations limit the amount of IO that can be serviced
by the PCI bus and/or the block devices (although it's interesting
to see whether the theoretical speeds are possible).

For example, the box used in the tests above will be deployed as
a fibre channel target server. Below is a performance printout
of a running fibre target with the same hardware as tested above:

mayacli> show performance controller=fc1
read/sec write/sec IOPS
16k 844k 141
52k 548k 62
1m 344k 64
52k 132k 26
0 208k 27
12k 396k 42
168k 356k 64
32k 76k 16
952k 248k 124
860k 264k 132
1m 544k 165
1m 280k 166
900k 344k 105
340k 284k 60
1m 280k 125
1m 340k 138
764k 592k 118
1m 448k 127
2m 356k 276
2m 480k 174
2m 8m 144
540k 376k 89
324k 380k 77
4k 348k 71

This particular fibre target is providing storage to 8
initiators, 4 of which are busy IMAP mail servers. Granted, this
isn't the busiest time of the year for us, but we're not coming even
close to the numbers mentioned in the tests above.

As always, corrections to my babble above are appreciated and
welcome :-)

Bryan