Hi,
I have two identical X8DTH-6's, each with Xeon 5520s.
=== RAID
Two 3ware RAID-0's (testing only) with 24 drives (2TB WD).
Read: 1.2Gbyte/sec
Write: 1.2Gbyte/sec
=== NETWORK
When I run iperf between the two 10GbE interfaces, I get 1.1-1.2Gbyte/sec.
MTU=9000 on both hosts.
=== COPY OVER NETWORK
When I copy using cp/NFS, tar/nc, etc., I only see an average of about
250MiB/s. I also have two desktop boards (8GB RAM each) from which I get
500-550MiB/s (the RAID is slower on the desktop boards, so the copy runs at
whatever the RAID can read). Why can't these server boards push > 1Gbyte/sec
sustained?
I know on this board:
CPU1 => SLOT 1,2,3
CPU2 => SLOT 4,5,6,7
The RAID card is plugged into 2 (CPU1)
The 10GbE card is plugged into 6 (CPU2)
Should both cards be plugged into the same set of slots controlled by the
same CPU?
Has anyone experienced anything like this before, e.g., > 1Gbyte/sec read
& write for the network and the 3ware RAID individually, but slow when
using them in combination?
OS = CentOS 5.5 x86_64
Filesystem = XFS
NIC = 10GbE AT2 Server Adapter
No single component is < 1Gbyte/sec but when data is transferred from
hostA to hostB, then it degrades to 250MiB/s.
Justin.
Hi Justin,
When you run the copy over the network, could you issue a "vmstat 1" on
both machines?
Thanks,
J.
--
Jean Gobin, CCENT, CCNA, CCNA Security
-----------------------------------------------------------------------
http://newsfromjean.blogspot.com/
On Sat, 5 Feb 2011, Jean Gobin wrote:
> Hi Justin,
>
> When you run the copy over the network, could you issue a "vmstat 1" on
> both machines?
>
> Thanks,
> J.
Hi,
Sure:
Server reading: (NFSD)
0 8 0 3491300 19816 335144 0 0 166720 0 16994 31355 0 2 79 20 0
0 9 0 3307132 19816 518392 0 0 183840 0 18893 33150 0 2 77 21 0
0 9 0 2948104 19816 876748 0 0 358560 0 36059 63193 0 4 78 18 0
3 3 0 2608568 19816 1216140 0 0 339008 0 34319 57289 0 3 80 17 0
0 7 0 2271500 19816 1552364 0 0 336352 0 34622 54913 0 3 81 16 0
0 5 0 2014760 19820 1808904 0 0 256320 4 27490 39669 0 2 82 15 0
1 12 0 1935804 7648 1897868 0 0 245696 0 26247 37934 0 2 82 15 0
3 7 0 1910924 7648 1923044 0 0 255232 0 27192 39606 0 3 82 15 0
2 12 0 1882688 7648 1950252 0 0 297760 0 30942 47381 0 3 81 16 0
3 13 0 1859100 7648 1975408 0 0 264480 0 27720 41203 0 3 82 15 0
0 13 0 1835308 7648 1997980 0 0 255872 0 26953 40114 0 2 83 15 0
Server writing: (NFS)
0 1 0 2963588 18460 858404 0 0 0 0 15461 24085 0 4 92 4 0
0 1 0 2241844 18460 1568852 0 0 0 307522 30341 43566 0 7 90 2 0
1 0 0 1958904 6792 1856808 0 0 0 348492 29154 49240 0 9 89 1 0
2 0 0 1959028 6792 1856504 0 0 0 336152 28843 47120 0 9 90 1 0
1 0 0 1958656 6792 1856892 0 0 0 258260 22357 34049 0 7 91 2 0
0 1 0 1958576 4424 1858956 0 0 0 241876 20976 31781 0 7 91 2 0
1 0 0 1958736 4428 1859184 0 0 0 262441 22843 34776 0 7 91 2 0
1 0 0 1958600 4428 1858880 0 0 0 295685 25271 38730 0 8 90 2 0
1 0 0 1958408 4428 1858916 0 0 0 262336 22351 32853 0 7 91 2 0
2 0 0 1958660 4428 1858916 0 0 0 254172 21894 32753 0 7 91 2 0
1 1 0 1958788 4428 1859972 0 0 0 286964 24219 36885 0 8 90
Justin.
On Sat, 5 Feb 2011, Jean Gobin wrote:
> Hi Justin,
>
> When you run the copy over the network, could you issue a "vmstat 1" on
> both machines?
>
> Thanks,
> J.
iperf & nload:
# ./iperf -c 10.0.1.2
------------------------------------------------------------
Client connecting to 10.0.1.2, TCP port 5001
TCP window size: 27.7 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.1.1 port 34935 connected with 10.0.1.2 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 11.5 GBytes 9.90 Gbits/sec
Device eth0 [10.0.1.2] (1/5):
================================================================================
Incoming:
#######
#######
#######
#######
#######
####### Curr: 1188.69 MByte/s
####### Avg: 510.87 MByte/s
####### Min: 0.00 MByte/s
####### Max: 1188.82 MByte/s
####### Ttl: 26.76 GByte
Outgoing:
#####
#####
#####
##### Curr: 1.01 MByte/s
###### Avg: 0.43 MByte/s
###### Min: 0.00 MByte/s
###### Max: 1.01 MByte/s
###### Ttl: 23.20 MByte
Read from RAID-0: (1.4Gbyte/sec)
# dd if=bigfile3 of=/dev/null bs=1M
1 0
1 0 0 1049192 440 2795456 0 0 1400832 0 11274 2868 0 5 94 1 0
1 0 0 1049072 440 2789112 0 0 1359872 0 10947 2728 0 5 94 2 0
1 0 0 1049196 440 2795736 0 0 1228800 0 9931 2749 0 4 94 2 0
1 0 0 1049196 440 2795736 0 0 1392640 0 11144 2773 0 5 94 1 0
0 1 0 1049320 440 2795748 0 0 1220608 0 9971 2760 0 4 94 2 0
0 1 0 1049320 440 2795748 0 0 1392640 0 11163 2785 0 5 94
Write to RAID-0: (1.4Gbyte/sec)
2 0 0 1995352 4604 1798940 0 0 8 1443783 12074 4957 0 9 86 5 0
1 1 0 1995268 4604 1798828 0 0 0 1494872 12410 5127 0 9 86 5 0
1 1 0 1995492 4604 1799132 0 0 0 1463364 12239 5031 0 9 86 5 0
1 2 0 1995432 4604 1798656 0 0 0 1431832 11993 4697 0 9 86
Everything is > 1 Gbyte/sec on its own, but a copy over the network only
reaches 250MiB/s; it does not make sense.
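The figures in this message mix units: iperf reports Gbits/sec, nload reports MByte/s, and vmstat reports bi/bo in 1024-byte blocks per second. A minimal sketch for putting them on the same scale follows; the function names are made up for illustration and the input values are taken from the output above.

```shell
#!/bin/sh
# Convert an iperf figure in Gbit/s (decimal, 10^9 bits) to MiB/s.
gbit_to_mib() {
    awk -v g="$1" 'BEGIN { printf "%.0f\n", g * 1e9 / 8 / 1048576 }'
}

# Convert a vmstat bi/bo value (1024-byte blocks per second) to MiB/s.
vmstat_blocks_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.0f\n", b / 1024 }'
}

gbit_to_mib 9.90               # iperf result
vmstat_blocks_to_mib 1400832   # dd read from RAID-0, first sample
vmstat_blocks_to_mib 262336    # NFS write, a typical sample
```

Read this way, the network and the array each run at well over 1000 MiB/s on their own, while the NFS samples sit near 250 MiB/s.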
Justin.
Justin Piszcz put forth on 2/5/2011 2:18 PM:
> Everything > 1 Gbyte/sec but when you use the network, 250MiB/s, it does not
> make sense.
Did you try launching 4 simultaneous cp operations over nfs to get to 1.2 GB/s?
I've witnessed single stream copy performance with Samba being less than
maximum due to Samba limitations. Running multiple copy ops in parallel then
usually saturates the pipe.
I'm thinking you may have a single threaded process involved that's eating all
of one core, at which point there is no scalability left for that operation.
--
Stan
On Sat, 5 Feb 2011 14:35:52 -0500 (EST) you wrote:
> No single component is < 1Gbyte/sec but when data is transferred from
> hostA to hostB, then it degrades to 250MiB/s.
Running 2 Supermicro H8-DIi+ with 2378 Opteron, Intel 10 GigE CX4, 3Ware
RAID-6 24 drives : 550 MB/s reading sustained over NFS with 2.6.32.*
(plain vanilla kernel).
I know that Xeons used to suck I/O-wise, but this is surprising. Could
something be preventing DMA?
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------
On Sat, 5 Feb 2011, Emmanuel Florac wrote:
> On Sat, 5 Feb 2011 14:35:52 -0500 (EST) you wrote:
>
To respond to everyone:
> Did you try launching 4 simultaneous cp operations over nfs to get to
> 1.2 GB/s?
> I've witnessed single stream copy performance with Samba being less than
> maximum due to Samba limitations. Running multiple copy ops in parallel then
> usually saturates the pipe.
I tried 4 simultaneous cp's and there was little change, 250-320MiB/s.
Reader: 250MiB/s-300MiB/s
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 6 0 927176 16076 2886308 0 0 206416 0 26850 40663 0 4 86 10 0
0 0 0 925620 16076 2887768 0 0 261620 0 31057 61111 0 6 86 8 0
0 0 0 923928 16076 2889016 0 0 328852 0 40954 102136 0 8 90 2 0
5 2 0 921112 16076 2890780 0 0 343476 0 39469 97957 0 8 90 2 0
Writer (it's almost as if it's caching the data and writing it out in
1.2-1.3Gbyte/sec chunks):
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
7 0 232 13980 408 3746900 0 0 0 1351926 37877 81924 2 32 61 5 0
6 0 232 12740 404 3748308 0 0 0 1336768 38822 86505 2 31 62 5 0
4 0 232 12524 404 3744672 0 0 0 1295000 39368 91005 1 30 63 5 0
6 0 232 12304 404 3748148 0 0 0 1332776 39351 86929 2 31 62 5 0
I also noticed this: 'FS-Cache: Loaded'
Could this be slowing things down?
********* [Desktop test below]
When I copy on my desktop (Debian) systems, it transfers immediately:
0 0 9172 83648 2822716 3343120 0 0 52 320 4249 3682 2 1 97 0
0 0 9172 86532 2822716 3343176 0 0 0 0 4362 3074 0 1 99 0
0 4 9172 62924 2822716 3363572 0 0 94212 0 5083 3044 1 3 90 7
1 7 9172 63444 2822708 3364008 0 0 360448 32 19058 8825 0 15 48 37
0 5 9172 61828 2821692 3367004 0 0 491520 0 26283 15282 0 24 43 32
0 5 9172 59212 2821672 3373292 0 0 524288 0 28810 17370 0 27 33 40
3 6 9172 57620 2821660 3355500 0 0 469364 128 25399 15825 0 21 42 36
********* [End of desktop test]
When I transfer using CentOS 5.5, there are a bunch of little reads, then it
averages out at around 250MiB/s:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 0 0 871088 16108 2942808 0 0 4091 3 552 1165 0 2 96 2 0
0 0 0 871088 16108 2942808 0 0 0 0 1013 2342 0 0 100 0 0
0 0 0 871224 16108 2942808 0 0 0 0 1031 2375 0 0 100 0 0
0 8 0 870864 16108 2942808 0 0 288 0 1071 2396 0 0 86 14 0
2 0 0 868636 16108 2943256 0 0 160 0 6348 18642 0 0 80 19 0
0 0 0 868884 16116 2943248 0 0 0 28 31717 99894 0 4 96 0 0
1 0 0 467040 16116 3343524 0 0 401444 0 40241 105777 0 4 93 4 0
2 1 0 12300 8540 3802792 0 0 482668 0 44694 116988 0 6 88 6 0
1 0 0 12412 3528 3805056 0 0 480124 0 44765 112272 0 6 88 6 0
0 8 0 17560 3528 3798900 0 0 388288 0 37987 68367 0 5 85 10 0
1 7 0 17868 3528 3802172 0 0 323296 0 33470 38398 0 5 83 11 0
1 7 0 17260 3524 3801688 0 0 299584 0 30991 35153 0 5 83 11 0
0 8 0 17272 3512 3802208 0 0 304352 0 31463 35400 0 5 84 11 0
0 8 0 17228 3512 3801476 0 0 258816 0 27035 30651 0 4 84 1
Is there a way to disable the VFS/page cache somehow to avoid whatever
FS-Cache is doing?
> Ok, could you do the following:
> dd if=/dev/urandom of=hugefile bs=1M count=<c>
> Where <c> is chosen so the resulting file is 2-3 times the RAM available
> in your server.
> Then redo the dd to null. Let's also check with the rsize from the nfs.
> /usr/bin/time -v dd if=hugefile of=/dev/null bs=<rsize used by NFS>
I have also tried an rsize/wsize of 65536 with NFS; no difference there. rsize
and wsize are now back at the default, 8192 I believe.
221M hugefile
223M hugefile
226M hugefile
229M hugefile
232M hugefile
Had to skip urandom, too slow (7MiB/s or so)..
# dd if=/dev/zero of=hugefile bs=1M count=16384
16384+0 records in
16384+0 records out
17179869184 bytes (17 GB) copied, 14.6305 seconds, 1.2 GB/s
# /usr/bin/time -v dd if=hugefile of=/dev/null bs=8192
2097152+0 records in
2097152+0 records out
17179869184 bytes (17 GB) copied, 14.7656 seconds, 1.2 GB/s
Command being timed: "dd if=hugefile of=/dev/null bs=8192"
User time (seconds): 2.45
System time (seconds): 8.70
Percent of CPU this job got: 75%
Elapsed (wall clock) time (h:mm:ss or m:ss): 0:14.76
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 0
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 2
Minor (reclaiming a frame) page faults: 216
Voluntary context switches: 57753
Involuntary context switches: 1843
Swaps: 0
File system inputs: 0
File system outputs: 0
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
0 1 0 17100 432 3758476 0 0 4607 1147 575 1051 0 2 95 3 0
1 0 0 16980 432 3758888 0 0 1064760 0 9344 10698 1 4 94 1 0
1 0 0 17100 432 3758888 0 0 1081808 0 9520 10608 1 4 94 1 0
1 1 0 16912 440 3758908 0 0 1136764 48 9899 11305 1 4 94 1 0
1 0 0 16972 440 3758908 0 0 1112664 0 9756 11126 1 4 94 1 0
1 0 0 17080 440 3759040 0 0 1140564 0 9943 11492 1 4 94 1 0
2 0 0 16976 440 3759040 0 0 1134020 0 9910 11475 1 4 94 1 0
1 0 0 17364 440 3758940 0 0 1132380 0 9865 11236 1 4 94
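The suggestion to make the test file 2-3 times the size of RAM (so the page cache cannot serve the reads) can be turned into a count for dd. This is only a sketch: the function name is invented here, the 4 GiB fallback is a placeholder rather than a measured value, and the dd command is printed instead of executed.

```shell
#!/bin/sh
# Number of 1 MiB blocks needed for a file of <factor> x RAM.
# $1 = RAM in KiB (the unit MemTotal uses), $2 = integer factor (2 or 3).
dd_count_for_ram() {
    echo $(( $1 * $2 / 1024 ))
}

# MemTotal in /proc/meminfo is in KiB; fall back to a 4 GiB placeholder
# if the file cannot be read.
mem_kib=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo 2>/dev/null)
mem_kib=${mem_kib:-4194304}

count=$(dd_count_for_ram "$mem_kib" 2)
# Print the command rather than running it; creating the file is left
# to the reader.
echo "dd if=/dev/zero of=hugefile bs=1M count=$count"
```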
Justin.
* Justin Piszcz ([email protected]) wrote:
> I tried 4 simultaneous cp's and there was little change, 250-320MiB/s.
So I think you've said network benchmarks are OK, disc benchmarks are OK, but
a copy over the network is slow.
What happens if you run a disc benchmark at the same time as a network benchmark;
even though the two aren't related? Can you keep the disc busy writing even when
the network is being pushed?
Dave
--
-----Open up your eyes, open up your mind, open up your code -------
/ Dr. David Alan Gilbert | Running GNU/Linux | Happy \
\ gro.gilbert @ treblig.org | | In Hex /
\ _________________________|_____ http://www.treblig.org |_______/
On Sat, 5 Feb 2011, Dr. David Alan Gilbert wrote:
> * Justin Piszcz ([email protected]) wrote:
>> I tried 4 simultaneous cp's and there was little change, 250-320MiB/s.
>
> So I think you've said network benchmarks are OK, disc benchmarks are OK, but
> a copy over the network is slow.
>
> What happens if you run a disc benchmark at the same time as a network benchmark;
> even though the two aren't related? Can you keep the disc busy writing even when
> the network is being pushed?
Hi,
Still OK when reading or writing while running the iperf test, so it does not
appear to be I/O or backplane bound. I also put both cards on the same CPU
(lane-wise) and it made no difference.
Device eth0 [10.0.1.4] (1/1):
================================================================================
Incoming:
###
###
###
###
###
### Curr: 1188.85 MByte/s
### Avg: 680.11 MByte/s
### Min: 0.00 MByte/s
### Max: 1188.88 MByte/s
### Ttl: 18.32 GByte
Outgoing:
On Sat, 5 Feb 2011, Justin Piszcz wrote:
Not sure how to copy/paste from an IPMI window, but I made my own
kernel for 2.6.37 in CentOS 5.5 (a pain) and now it is doing ~482-500MiB/s
sustained with a single copy (netcat from A->B). Poor performance with the
default 2.6.18 kernel. Seems to also slow down over time, down to
434MiB/s now, but it started very quick and it remains between
420-500MiB/s sustained. Now 435-445MiB/s..
Justin.
Justin Piszcz put forth on 2/5/2011 4:56 PM:
> Not sure how to copy/paste from an IPMI window, but I made my own
> kernel for 2.6.37 in CentOS 5.5 (a pain) and now it is doing ~482-500MiB/s
> sustained with a single copy (netcat from A->B). Poor performance with the
> default 2.6.18 kernel. Seems to also slow down over time, down to 434MiB/s now,
> but it started very quick and it remains between 420-500MiB/s sustained. Now
> 435-445MiB/s..
I forgot you mentioned CentOS. Their kernel and apps are always very old.
2.6.18 was released in Sept 2006 IIRC--4+ years ago. It was the "pirate themed"
release.
With 2.6.37 what do you get with 4 concurrent nfs copy ops? If the aggregate
nfs throughput doesn't increase, you need to tweak your nfs server (and possibly
client). With that hardware and a recent kernel you should be able to fill that
10 GbE pipe, or come really close.
--
Stan
On Sat, 5 Feb 2011, Stan Hoeppner wrote:
> I forgot you mentioned CentOS. Their kernel and apps are always very old.
> 2.6.18 was released in Sept 2006 IIRC--4+ years ago. It was the "pirate themed"
> release.
>
> With 2.6.37 what do you get with 4 concurrent nfs copy ops? If the aggregate
> nfs throughput doesn't increase, you need to tweak your nfs server (and possibly
> client). With that hardware and a recent kernel you should be able to fill that
> 10 GbE pipe, or come really close.
Hi,
I installed Debian Testing & my own kernel again (2.6.37):
One thread:
# get bigfile.1
`bigfile.1' at 1168441344 (5%) 493.10M/s eta:38s [Receiving data]
`bigfile.1' at 2226847744 (10%) 557.16M/s eta:32s [Receiving data]
`bigfile.1' at 3274768384 (15%) 578.57M/s eta:29s [Receiving data]
`bigfile.1' at 4781113344 (22%) 585.02M/s eta:26s [Receiving data]
`bigfile.1' at 6294077440 (30%) 588.96M/s eta:24s [Receiving data]
`bigfile.1' at 9318563840 (44%) 592.87M/s eta:19s [Receiving data]
`bigfile.1' at 12805996544 (61%) 592.46M/s eta:13s [Receiving data]
20971520000 bytes transferred in 34 seconds (592.72M/s)
Two threads:
[0] get bigfile.1
`bigfile.1' at 3225878528 (15%) 306.51M/s eta:55s [Receiving data]
[1] get bigfile.2
`bigfile.2' at 3200516096 (15%) 306.49M/s eta:55s [Receiving data]
Seems like some problem achieving > 600MiB/s now.
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 0 34184 112 3875676 0 0 622592 0 49060 9798 0 5 95 0
0 0 0 32820 112 3876976 0 0 622592 0 49043 9745 0 5 95 0
0 0 0 34184 112 3876188 0 0 622592 0 48997 9714 0 5 95 0
1 0 0 32572 112 3877780 0 0 606208 0 48815 9537 0 5 95 0
0 0 0 32820 112 3878104 0 0 622592 0 48968 9741 0 5 95 0
0 0 0 31456 112 3878024 0 0 622592 0 49051 9773 0 5 95 0
0 0 0 31952 112 3877448 0 0 622592 4 49076 9804 0 5 95 0
3 0 0 30836 112 3871364 0 0 606208 0 48901 9650 0 5 95 0
But much better than 250MiB/s.
[0] Done (get bigfile.1)
20971520000 bytes transferred in 64 seconds (312.95M/s)
[1] Done (get bigfile.2)
20971520000 bytes transferred in 64 seconds (312.78M/s)
So approx 624MiB/s..
With three threads:
[0] get bigfile.1
`bigfile.1' at 6659241132 (31%) 222.41M/s eta:61s [Receiving data]
[1] get bigfile.2
`bigfile.2' at 6620446720 (31%) 222.00M/s eta:62s [Receiving data]
[2] get bigfile.3
`bigfile.3' at 6601441280 (31%) 222.17M/s eta:62s [Receiving data]
vmstat:
0 0 0 33028 112 3877856 0 0 704512 0 53859 13108 0 6 94 0
1 0 0 32532 112 3877476 0 0 688128 0 54063 12484 0 6 94 0
1 0 0 33276 112 3877128 0 0 704512 0 54143 12548 0 6 94 0
1 0 0 33284 112 3868860 0 0 700672 0 54200 12398 0 6 94 0
[0] Done (get bigfile.1)
20971520000 bytes transferred in 90 seconds (221.70M/s)
[1] Done (get bigfile.2)
20971520000 bytes transferred in 90 seconds (221.50M/s)
[2] Done (get bigfile.3)
20971520000 bytes transferred in 90 seconds (221.42M/s)
A little better, 663MiB/s.
Four threads:
[0] get bigfile.1
`bigfile.1' at 2641887232 (12%) 162.77M/s eta:2m [Receiving data]
[1] get bigfile.2
`bigfile.2' at 2601254912 (12%) 161.97M/s eta:2m [Receiving data]
[2] get bigfile.3
`bigfile.3' at 2592931840 (12%) 162.05M/s eta:2m [Receiving data]
[3] get bigfile.4
`bigfile.4' at 2546794496 (12%) 159.92M/s eta:2m [Receiving data]
vmstat:
1 0 0 34492 112 3859748 0 0 678144 0 58816 11481 0 7 93 0
0 0 0 35360 112 3871060 0 0 681728 0 58889 11344 0 6 94 0
1 0 0 33872 112 3874252 0 0 688128 0 59052 11560 0 6 94 0
0 0 0 34492 112 3871400 0 0 688128 0 59079 11564 0 7 93 0
1 0 0 32136 112 3874888 0 0 688128 0 59089 11432 0 6 93 0
2 0 0 35360 112 3872436 0 0 655360 0 56672 10943 0 7 93 0
[0] Done (get bigfile.1)
20971520000 bytes transferred in 120 seconds (166.72M/s)
[1] Done (get bigfile.2)
20971520000 bytes transferred in 120 seconds (166.42M/s)
[2] Done (get bigfile.3)
20971520000 bytes transferred in 120 seconds (166.40M/s)
[3] Done (get bigfile.4)
20971520000 bytes transferred in 120 seconds (166.30M/s)
664MiB/s, so it appears this is the best it can do without any major tweaking
other than setting the blockdev readahead to 16384.
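The aggregate figures for the multi-stream runs come from adding up the per-stream rates reported at the end of each transfer. A small sketch of that arithmetic (the helper name is made up; the inputs are the per-stream rates listed above):

```shell
#!/bin/sh
# Sum per-stream rates (MiB/s) given as arguments and print the aggregate.
aggregate() {
    printf '%s\n' "$@" | awk '{ s += $1 } END { printf "%.2f\n", s }'
}

aggregate 312.95 312.78                 # two streams
aggregate 221.70 221.50 221.42          # three streams
aggregate 166.72 166.42 166.40 166.30   # four streams
```

The sums land within a couple of MiB/s of the rounded totals quoted above, and they barely move between two and four streams, which suggests the limit is shared rather than per connection.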
Overall performance is good, but now I need to figure out how to tweak the
network and/or disk paths so I can achieve > 1Gbyte/sec, as the RAID-0s can
read and write > 1.2Gbyte/sec.
Has anyone with 10GbE <-> 10GbE been able to transfer at or near line rate
with a single connection, or > 700MiB/s with multiple connections? The problem
I worry about with creating multiple connections between two machines is that
it will create contention within the RAID that is being read from.
Justin.
Justin Piszcz put forth on 2/5/2011 7:08 PM:
> Overall performance is good, but now I need to figure out how to tweak the
> network and/or disk paths so I can achieve > 1Gbyte/sec, as the RAID-0s can
> read and write > 1.2Gbyte/sec.
>
> Has anyone with 10GbE <-> 10GbE been able to transfer at or near line rate
> with a single connection, or > 700MiB/s with multiple connections? Problem
> I worry about with creating multiple connections between two machines it will
> create contention within the RAID that is being read from..
Are you trying to maximize this for the sake of maximizing it, or do you have an
actual application or work flow process that needs to be able to transfer a
single large file via NFS at >1 GB/s throughput?
--
Stan
On Sat, 5 Feb 2011, Stan Hoeppner wrote:
>
> Are you trying to maximize this for the sake of maximizing it, or do you have an
> actual application or work flow process that needs to be able to transfer a
> single large file via NFS at >1 GB/s throughput?
Workflow process-
Migrate data from old/legacy RAID sets to new ones, possibly also
2TB->3TB, so the faster the transfer speed, the better.
Justin.
On Sun, 6 Feb 2011, Justin Piszcz wrote:
Hi,
1. Defaults below:
sysctl -w net.core.wmem_max=131071
sysctl -w net.core.rmem_max=131071
sysctl -w net.core.wmem_default=118784
sysctl -w net.core.rmem_default=118784
sysctl -w net.core.optmem_max=20480
sysctl -w net.ipv4.igmp_max_memberships=20
sysctl -w net.ipv4.tcp_mem="379104 505472 758208"
sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"
sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
sysctl -w net.ipv4.udp_mem="379104 505472 758208"
sysctl -w net.ipv4.udp_rmem_min=4096
sysctl -w net.ipv4.udp_wmem_min=4096
sysctl -w net.core.netdev_max_backlog=1024
2. Optimized settings, for > 800MiB/s:
# for 3ware raid, use 16384 readahead, > 16384 readahead, no improvement
blockdev --setra 16384 /dev/sda
# not sure if this helps much
ethtool -K eth0 lro on
# seems to get performance > 600-700MiB/s faster
sysctl -w net.core.wmem_max=4194304
sysctl -w net.core.rmem_max=4194304
sysctl -w net.core.wmem_default=4194304
sysctl -w net.core.rmem_default=4194304
sysctl -w net.core.optmem_max=20480
sysctl -w net.ipv4.igmp_max_memberships=20
sysctl -w net.ipv4.tcp_mem="4194304 4194304 4194304"
sysctl -w net.ipv4.tcp_wmem="4194304 4194304 4194304"
sysctl -w net.ipv4.tcp_rmem="4194304 4194304 4194304"
sysctl -w net.ipv4.udp_mem="4194304 4194304 4194304"
sysctl -w net.ipv4.udp_rmem_min=4096
sysctl -w net.ipv4.udp_wmem_min=4096
sysctl -w net.core.netdev_max_backlog=1048576
# the main option that makes all of the difference, the golden option,
# is the rsize and wsize of 1 megabyte below:
10.0.1.4:/r1 /nfs/box2/r1 nfs tcp,bg,rw,hard,intr,nolock,nfsvers=3,rsize=1048576,wsize=1048576 0 0
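Two quick sanity checks on the buffer sizes above, as a sketch only. The first computes the bandwidth-delay product, which is roughly how much data a single TCP connection must keep in flight to fill the link; the 1 ms round-trip time is an assumption for illustration, not a measurement from these hosts. The second converts a tcp_mem/udp_mem value to GiB, since those two sysctls are counted in pages rather than bytes (4096-byte pages, per the "Page size" line in the time -v output earlier). Function names are hypothetical.

```shell
#!/bin/sh
# Bandwidth-delay product in bytes for a link of $1 Gbit/s and an RTT of $2 ms.
bdp_bytes() {
    awk -v g="$1" -v rtt="$2" 'BEGIN { printf "%.0f\n", g * 1e9 / 8 * rtt / 1000 }'
}

# net.ipv4.tcp_mem and net.ipv4.udp_mem are counted in pages, not bytes.
# $1 = number of pages, $2 = page size in bytes.
pages_to_gib() {
    awk -v p="$1" -v sz="$2" 'BEGIN { printf "%.0f\n", p * sz / 1073741824 }'
}

bdp_bytes 10 1              # 10GbE with an assumed 1 ms round trip
pages_to_gib 4194304 4096   # the tcp_mem value used above, in GiB
```

Under that RTT assumption a 4 MiB socket buffer is comfortably larger than the bandwidth-delay product, while 4194304 pages for tcp_mem works out to far more memory than these machines appear to have, so that setting is effectively "no limit".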
CPU utilization:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2069 root 20 0 18640 1304 688 R 91 0.0 0:15.50 cp
703 root 20 0 0 0 0 S 25 0.0 2:46.95 kswapd0
With a single copy I get roughly 700-800MiB/s:
Device eth0 [10.0.1.3] (1/1):
================================================================================
Incoming:
###################### #################### ####
###################### #################### ####
###################### #################### ####
###################### #################### ####
###################### #################### ####
###################### #################### #### Curr: 808.71 MByte/s
###################### #################### #### Avg: 706.11 MByte/s
###################### #################### #### Min: 0.00 MByte/s
###################### #################### #### Max: 860.17 MByte/s
###################### #################### #### Ttl: 344.70 GByte
With two copies I get up to 830-850MiB/s:
Device eth0 [10.0.1.3] (1/1):
================================================================================
Incoming:
############################################ ####
############################################ ####
############################################ ####
############################################ ####
############################################ ####
############################################ #### Curr: 846.61 MByte/s
############################################ #### Avg: 683.14 MByte/s
############################################ #### Min: 0.00 MByte/s
############################################ #### Max: 860.17 MByte/s
############################################ #### Ttl: 305.71 GByte
Using a 4MiB r/w size with NFS seems to sustain > 750MiB/s a little better,
I think:
10.0.1.4:/r1 /nfs/box2/r1 nfs tcp,bg,rw,hard,intr,nolock,nfsvers=3,rsize=4194304,wsize=4194304 0
Anyhow, roughly 750-850MiB/s. It would be nice to get 1Gbyte/sec, but I guess
the kernel (or my HW, CPU not fast enough) is not there yet.
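The rsize and wsize requested in fstab are negotiated between client and server, so the values actually in effect can be lower than what was asked for (the Linux NFS client may cap them, possibly at 1 MiB, though that should be verified on the kernel in use). A sketch for reading the effective values from a /proc/mounts line follows; the helper name and the sample line are illustrative, not captured from these hosts.

```shell
#!/bin/sh
# Extract the value of a mount option (e.g. rsize) from one /proc/mounts line.
# $1 = option name, $2 = the full line; field 4 holds the option list.
mount_opt() {
    printf '%s\n' "$2" | awk -v key="$1" '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (index(opts[i], key "=") == 1) {
                print substr(opts[i], length(key) + 2)
            }
        }
    }'
}

# Hypothetical line, shaped like what /proc/mounts shows for an NFS mount.
line='10.0.1.4:/r1 /nfs/box2/r1 nfs rw,vers=3,rsize=1048576,wsize=1048576,hard,proto=tcp 0 0'
mount_opt rsize "$line"
mount_opt wsize "$line"

# On a live system, something like:
#   grep ' nfs ' /proc/mounts | while read -r l; do mount_opt rsize "$l"; done
```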
Also found a good doc from RedHat:
http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
Justin.
On Sun, 6 Feb 2011 08:46:08 -0500 (EST) you wrote:
> # for 3ware raid, use 16384 readahead, > 16384 readahead, no improvement
> blockdev --setra 16384 /dev/sda
Also try this:
echo 512 > /sys/block/sda/queue/nr_requests
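Before changing queue settings it can help to record the current ones. The sketch below only reads values; it also converts the blockdev --setra figure, which counts 512-byte sectors, into the KiB unit that sysfs read_ahead_kb uses. The function names and the optional sysfs-root argument are made up for illustration; sda is the device named in this thread.

```shell
#!/bin/sh
# blockdev --setra counts 512-byte sectors; sysfs read_ahead_kb counts KiB.
setra_to_kib() {
    echo $(( $1 * 512 / 1024 ))
}

# Print one queue attribute for a device, or "n/a" if it cannot be read.
# $1 = device (e.g. sda), $2 = attribute, $3 = sysfs root (default /sys).
queue_attr() {
    f="${3:-/sys}/block/$1/queue/$2"
    if [ -r "$f" ]; then cat "$f"; else echo "n/a"; fi
}

setra_to_kib 16384
for attr in nr_requests read_ahead_kb scheduler; do
    echo "$attr: $(queue_attr sda "$attr")"
done
```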
--
------------------------------------------------------------------------
Emmanuel Florac | Direction technique
| Intellique
| <[email protected]>
| +33 1 78 94 84 02
------------------------------------------------------------------------
On 6.2.2011 14:46, Justin Piszcz wrote:
> Hi,
Hi, just a few comments for maximal throughput:
> 1. Defaults below:
> sysctl -w net.core.wmem_max=131071
> sysctl -w net.core.rmem_max=131071
> sysctl -w net.core.wmem_default=118784
> sysctl -w net.core.rmem_default=118784
> sysctl -w net.core.optmem_max=20480
> sysctl -w net.ipv4.igmp_max_memberships=20
> sysctl -w net.ipv4.tcp_mem="379104 505472 758208"
> sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"
> sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
> sysctl -w net.ipv4.udp_mem="379104 505472 758208"
> sysctl -w net.ipv4.udp_rmem_min=4096
> sysctl -w net.ipv4.udp_wmem_min=4096
> sysctl -w net.core.netdev_max_backlog=1024
sysctl -w net.ipv4.tcp_timestamps=0
> 2. Optimized settings, for > 800MiB/:
>
> # for 3ware raid, use 16384 readahead, > 16384 readahead, no improvement
> blockdev --setra 16384 /dev/sda
elevator=deadline
> # not sure if this helps much
> ethtool -K eth0 lro on
Maybe try to _disable_ the NIC offload functions; sometimes they are
counterproductive (with enough CPU power, though I doubt that on a 2-socket
box). Also check irqbalance.
If the connection is just between the two machines, try the biggest possible MTU.
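elevator=deadline is a kernel boot parameter; the same scheduler can also be selected at runtime through sysfs, where the scheduler file lists the available schedulers with the active one in brackets. A sketch, assuming the sda and eth0 names used elsewhere in this thread: the helper (a made-up name) extracts the active scheduler from such a line, and the commands that would change settings are printed, not executed.

```shell
#!/bin/sh
# The sysfs scheduler file lists all schedulers and brackets the active one,
# e.g. "noop anticipatory deadline [cfq]". Print only the active name.
active_scheduler() {
    printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

active_scheduler "noop anticipatory deadline [cfq]"
active_scheduler "noop anticipatory [deadline] cfq"

# Runtime forms of the suggestions above, printed rather than executed:
cat <<'EOF'
echo deadline > /sys/block/sda/queue/scheduler
sysctl -w net.ipv4.tcp_timestamps=0
ethtool -K eth0 lro off
EOF
```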
> Using a 4MiB r/w size with NFS improves performance to sustain > 750MiB/s
> a little better I think:
> 10.0.1.4:/r1 /nfs/box2/r1 nfs
> tcp,bg,rw,hard,intr,nolock,nfsvers=3,rsize=4194304,wsize=4194304 0
What about using UDP ?
> Anyhow, roughly 750-850MiB/s. It would be nice to get 1Gbyte/sec, but I guess
> the kernel (or my HW, CPU not fast enough) is not there yet.
>
> Also found a good doc from RedHat:
> http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
>
>
> Justin.
>
HTH, Z.
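
[Editor's sketch] The extra suggestions in this message (disable TCP timestamps, use the deadline elevator, try the NIC with offloads turned off) could be tried roughly as below. This is only a sketch under assumptions: the interface and disk names (eth0, sda) are the ones that appear in the thread, the helper names run and tune_commands are made up for illustration, and the exact set of offload flags a given ethtool version and driver accept may differ. By default it only prints the commands; nothing here was measured by anyone in the thread.

```shell
#!/bin/sh
# Dry-run by default: prints each command. Set APPLY=1 (as root) to execute.
IFACE="${IFACE:-eth0}"   # NIC name as used in the thread
DISK="${DISK:-sda}"      # block device as used in the thread

run() {
    # $1 is a complete command line
    if [ "${APPLY:-0}" = "1" ]; then
        sh -c "$1"
    else
        echo "$1"
    fi
}

tune_commands() {
    # Disable TCP timestamps
    run "sysctl -w net.ipv4.tcp_timestamps=0"
    # Select the deadline I/O scheduler at runtime for one disk
    # (elevator=deadline on the kernel command line does it at boot)
    run "echo deadline > /sys/block/$DISK/queue/scheduler"
    # Turn offloads off for an A/B comparison against 'lro on'
    run "ethtool -K $IFACE lro off tso off gso off"
}

tune_commands
```

Running it once with the defaults shows what would be changed; comparing throughput before and after each individual setting is probably more informative than applying all three at once.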
On Feb 6, 2011, at 11:55 AM, Zdenek Kaspar <[email protected]> wrote:
> On 6.2.2011 14:46, Justin Piszcz wrote:
>>
>>
>> On Sun, 6 Feb 2011, Justin Piszcz wrote:
>>
>>>
>>>
>>> On Sat, 5 Feb 2011, Stan Hoeppner wrote:
>>>
>>>> Justin Piszcz put forth on 2/5/2011 7:08 PM:
>>>>
>>
>>
>> Hi,
>
> Hi, just a few comments for maximal throughput..
>
>> 1. Defaults below:
>> sysctl -w net.core.wmem_max=131071
>> sysctl -w net.core.rmem_max=131071
>> sysctl -w net.core.wmem_default=118784
>> sysctl -w net.core.rmem_default=118784
>> sysctl -w net.core.optmem_max=20480
>> sysctl -w net.ipv4.igmp_max_memberships=20
>> sysctl -w net.ipv4.tcp_mem="379104 505472 758208"
>> sysctl -w net.ipv4.tcp_wmem="4096 16384 4194304"
>> sysctl -w net.ipv4.tcp_rmem="4096 87380 4194304"
>> sysctl -w net.ipv4.udp_mem="379104 505472 758208"
>> sysctl -w net.ipv4.udp_rmem_min=4096
>> sysctl -w net.ipv4.udp_wmem_min=4096
>> sysctl -w net.core.netdev_max_backlog=1024
>
> sysctl net.ipv4.tcp_timestamps=0
>
>> 2. Optimized settings, for > 800MiB/s:
>>
>> # for 3ware raid, use 16384 readahead, > 16384 readahead, no improvement
>> blockdev --setra 16384 /dev/sda
>
> elevator=deadline
>
>> # not sure if this helps much
>> ethtool -K eth0 lro on
>
> Maybe try to _disable_ the NIC offload functions, sometimes they are
> counterproductive (with enough CPU power, but I doubt on 2 socket box) + check
> irqbalance..
>
> If the connection is directly between the two machines, try the largest
> possible MTU.
>
>> # seems to get performance > 600-700MiB/s faster
>> sysctl -w net.core.wmem_max=4194304
>> sysctl -w net.core.rmem_max=4194304
>> sysctl -w net.core.wmem_default=4194304
>> sysctl -w net.core.rmem_default=4194304
>> sysctl -w net.core.optmem_max=20480
>> sysctl -w net.ipv4.igmp_max_memberships=20
>> sysctl -w net.ipv4.tcp_mem="4194304 4194304 4194304"
>> sysctl -w net.ipv4.tcp_wmem="4194304 4194304 4194304"
>> sysctl -w net.ipv4.tcp_rmem="4194304 4194304 4194304"
>> sysctl -w net.ipv4.udp_mem="4194304 4194304 4194304"
>> sysctl -w net.ipv4.udp_rmem_min=4096
>> sysctl -w net.ipv4.udp_wmem_min=4096
>> sysctl -w net.core.netdev_max_backlog=1048576
>>
>> # the main option that makes all of the difference, the golden option
>> # is the rsize and wsize of 1 megabyte below:
>> 10.0.1.4:/r1 /nfs/box2/r1 nfs
>> tcp,bg,rw,hard,intr,nolock,nfsvers=3,rsize=1048576,wsize=1048576 0 0
>>
>> CPU utilization:
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 2069 root 20 0 18640 1304 688 R 91 0.0 0:15.50 cp
>> 703 root 20 0 0 0 0 S 25 0.0 2:46.95 kswapd0
>>
>> With a single copy I get roughly 700-800MiB/s:
>>
>> Device eth0 [10.0.1.3] (1/1):
>> ================================================================================
>>
>> Incoming:
>> ###################### #################### ####
>> ###################### #################### ####
>> ###################### #################### ####
>> ###################### #################### ####
>> ###################### #################### ####
>> ###################### #################### #### Curr: 808.71 MByte/s
>> ###################### #################### #### Avg: 706.11 MByte/s
>> ###################### #################### #### Min: 0.00 MByte/s
>> ###################### #################### #### Max: 860.17 MByte/s
>> ###################### #################### #### Ttl: 344.70 GByte
>>
>> With two copies I get up to 830-850MiB/s:
>>
>> Device eth0 [10.0.1.3] (1/1):
>> ================================================================================
>>
>> Incoming:
>> ############################################ ####
>> ############################################ ####
>> ############################################ ####
>> ############################################ ####
>> ############################################ ####
>> ############################################ #### Curr: 846.61 MByte/s
>> ############################################ #### Avg: 683.14 MByte/s
>> ############################################ #### Min: 0.00 MByte/s
>> ############################################ #### Max: 860.17 MByte/s
>> ############################################ #### Ttl: 305.71 GByte
>>
>> Using a 4MiB r/w size with NFS improves performance to sustain > 750MiB/s
>> a little better I think:
>> 10.0.1.4:/r1 /nfs/box2/r1 nfs
>> tcp,bg,rw,hard,intr,nolock,nfsvers=3,rsize=4194304,wsize=4194304 0
>
> What about using UDP ?
>
>> Anyhow, roughly 750-850MiB/s. It would be nice to get 1Gbyte/sec, but I guess
>> the kernel (or my HW, CPU not fast enough) is not there yet.
>>
>> Also found a good doc from RedHat:
>> http://www.redhat.com/promo/summit/2008/downloads/pdf/Thursday/Mark_Wagner.pdf
>>
>>
>> Justin.
>>
>
> HTH, Z.
Thanks for the suggestions. I have tried some of those additional optimizations in the past and they did not seem to improve performance, but I will revisit them if I have time.
UDP seemed to hit and stay at 650MiB/s.
Justin.
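
[Editor's sketch] One detail about the quoted settings may be worth checking: net.ipv4.tcp_mem and net.ipv4.udp_mem are counted in memory pages, while tcp_rmem, tcp_wmem and the net.core.*mem* values are in bytes. The small helper below (pages_to_mib is a made-up name) converts a page count to MiB, assuming 4 KiB pages, which is the usual x86_64 page size.

```shell
#!/bin/sh
# Convert a tcp_mem/udp_mem value (pages) to MiB.
# $1 = number of pages, $2 = page size in bytes (defaults to this system's)
pages_to_mib() {
    pagesize="${2:-$(getconf PAGESIZE)}"
    echo $(( $1 * pagesize / 1048576 ))
}

# Value used in the "optimized" settings, assuming 4 KiB pages:
pages_to_mib 4194304 4096    # prints 16384, i.e. 16 GiB
# Default maximum quoted in the thread (758208 pages):
pages_to_mib 758208 4096     # prints 2961
```

So tcp_mem="4194304 4194304 4194304" reads as a 16 GiB threshold rather than 4 MiB. On a machine with less RAM than that, the limit is likely never reached, so the setting probably does no harm, but it is not the 4 MiB the matching byte-based values suggest.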
Justin Piszcz put forth on 2/6/2011 4:16 AM:
> Workflow process-
>
> Migrate data from old/legacy RAID sets to new ones, possibly also 2TB->3TB, so
> the faster the transfer speed, the better.
This type of data migration is probably going to include many many files of
various sizes from small to large. You have optimized your system performance
only for individual large file xfers. Thus, when you go to copy directories
containing hundreds or thousands of files of various sizes, you will likely see
much lower throughput using a single copy stream. Thus if you want to keep that
10 GbE pipe full, you'll likely need to run multiple copies in parallel, one per
large parent directory. Or, run a single copy from say, 10 legacy systems to
one new system simultaneously, etc.
Given this situation, you may want to consider tar'ing up entire directories
with gz or bz compression, if you have enough free space on the legacy machines,
and copying the tarballs to the new system. This will maximize your throughput,
although I don't know if it will decrease your total work flow completion time,
which should really be your overall goal.
--
Stan
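
[Editor's sketch] The "multiple copies in parallel, one per large parent directory" idea could look roughly like this with GNU find and xargs. The function name parallel_copy and the destination path in the usage line are assumptions for illustration; only /nfs/box2/r1 comes from the thread. Treat it as a starting point, not a tested migration script.

```shell
#!/bin/sh
# Run one 'cp -a' per top-level entry of a source tree, several at a time.
# $1 = source directory, $2 = destination directory,
# $3 = number of parallel copies (default 4)
parallel_copy() {
    src="$1"; dst="$2"; jobs="${3:-4}"
    mkdir -p "$dst" || return 1
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -P runs up to $jobs cp processes concurrently.
    find "$src" -mindepth 1 -maxdepth 1 -print0 |
        xargs -0 -P "$jobs" -I{} cp -a {} "$dst"/
}

# Hypothetical usage: copy from the NFS mount in the thread to a local array
# parallel_copy /nfs/box2/r1 /r1 4
```

How many parallel copies help depends on how evenly the data is spread over the top-level directories; one huge directory next to many tiny ones will still end up as a single stream.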
On Mon, Feb 7, 2011 at 09:01, Stan Hoeppner <[email protected]> wrote:
> Justin Piszcz put forth on 2/6/2011 4:16 AM:
>
>> Workflow process-
>>
>> Migrate data from old/legacy RAID sets to new ones, possibly also 2TB->3TB, so
>> the faster the transfer speed, the better.
>
> This type of data migration is probably going to include many many files of
> various sizes from small to large. You have optimized your system performance
> only for individual large file xfers. Thus, when you go to copy directories
> containing hundreds or thousands of files of various sizes, you will likely see
> much lower throughput using a single copy stream. Thus if you want to keep that
> 10 GbE pipe full, you'll likely need to run multiple copies in parallel, one per
> large parent directory. Or, run a single copy from say, 10 legacy systems to
> one new system simultaneously, etc.
>
> Given this situation, you may want to consider tar'ing up entire directories
> with gz or bz compression, if you have enough free space on the legacy machines,
> and copying the tarballs to the new system. This will maximize your throughput,
> although I don't know if it will decrease your total work flow completion time,
> which should really be your overall goal.
Another option might be to use tar and gzip to bundle the data up,
then pipe it through netcat or ssh. When I have to transfer large
chunks of data I find this is the fastest method. That said, if the
connection is interrupted, then you're on your own. rsync might also
be a good option.
Thanks,
--
Julian Calaby
Email: [email protected]
Profile: http://www.google.com/profiles/julian.calaby/
.Plan: http://sites.google.com/site/juliancalaby/
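
[Editor's sketch] The tar-through-netcat (or ssh) approach mentioned above can be split into a sender end and a receiver end. The function names send_tree and recv_tree, the port 7000, the user name and the /r1 paths in the usage comments are assumptions for illustration; 10.0.1.3 is the receiving address shown in the thread. Netcat's listen syntax differs between variants, so both common forms are noted.

```shell
#!/bin/sh
# Sender end: write a gzip-compressed tar of directory $1 to stdout.
send_tree() {
    tar -C "$1" -czf - .
}

# Receiver end: unpack a gzip-compressed tar from stdin into directory $1,
# preserving permissions.
recv_tree() {
    mkdir -p "$1" && tar -C "$1" -xzpf -
}

# Hypothetical usage over netcat (start the receiver first):
#   receiver:  nc -l -p 7000 | recv_tree /r1     # traditional netcat
#              nc -l 7000    | recv_tree /r1     # OpenBSD netcat
#   sender:    send_tree /r1 | nc 10.0.1.3 7000
#
# Hypothetical usage over ssh:
#   send_tree /r1 | ssh user@10.0.1.3 'tar -C /r1 -xzpf -'
```

gzip runs on a single core, so on a link this fast it may be worth comparing against an uncompressed stream (drop the z flag on both ends) before committing to it. As noted above, an interrupted transfer has to be restarted from scratch.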