2008-01-30 10:21:26

by Bruce Allen

Subject: e1000 full-duplex TCP performance well below wire speed

Dear LKML,

We've connected a pair of modern high-performance boxes with integrated
copper Gb/s Intel NICs, with an ethernet crossover cable, and have run
some netperf full duplex TCP tests. The transfer rates are well below
wire speed. We're reporting this as a kernel bug, because we expect a
vanilla kernel with default settings to give wire speed (or close to wire
speed) performance in this case. We DO see wire speed in simplex
transfers. The behavior has been verified on multiple machines with
identical hardware.

Details:
Kernel version: 2.6.23.12
ethernet NIC: Intel 82573L
ethernet driver: e1000 version 7.3.20-k2
motherboard: Supermicro PDSML-LN2+ (one quad core Intel Xeon X3220, Intel
3000 chipset, 8GB memory)

The test was done with various MTU sizes ranging from 1500 to 9000, with
ethernet flow control switched on and off, and using reno and cubic as the
TCP congestion control.
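
Settings of this kind are typically switched with commands along the
following lines (interface name and values illustrative, not our exact
invocations):

ifconfig eth0 mtu 9000                           # set the interface MTU (we used 1500 ... 9000)
ethtool -A eth0 rx on tx on                      # ethernet flow control on (or "rx off tx off")
sysctl -w net.ipv4.tcp_congestion_control=cubic  # TCP congestion control (cubic or reno)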

The behavior depends on the setup. In one test we used cubic congestion
control, flow control off. The transfer rate in one direction was above
0.9 Gb/s while in the other direction it was 0.6 to 0.8 Gb/s. After 15-20s
the rates flipped. Perhaps the two streams are fighting for resources. (The
performance of a full duplex stream should be close to 1Gb/s in both
directions.) A graph of the transfer speed as a function of time is here:
https://n0.aei.uni-hannover.de/networktest/node19-new20-noflow.jpg
Red shows transmit and green shows receive (please ignore the other plots).

We're happy to do additional testing, if that would help, and very
grateful for any advice!

Bruce Allen
Carsten Aulbert
Henning Fehrmann


2008-01-30 13:18:20

by Andi Kleen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Bruce Allen <[email protected]> writes:

> Dear LKML,

You forgot to specify what user programs you used to get the
benchmark results, e.g. if user space does not use large
enough reads/writes then performance will not be optimal.
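
For example, netperf's test-specific -m option sets the application
send size; the host and sizes below are purely illustrative:

netperf -H 192.168.0.2 -t TCP_STREAM -l 30 -- -m 1024   # small writes, often suboptimal
netperf -H 192.168.0.2 -t TCP_STREAM -l 30 -- -m 64K    # large writes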

Also best you repost your results with full information
on netdev@vger.kernel.org

-Andi

2008-01-30 13:39:34

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Andi,

Thanks for the reply.

> You forgot to specify what user programs you used to get the
> benchmark results, e.g. if user space does not use large enough
> reads/writes then performance will not be optimal.

We used netperf (as stated in the first paragraph of the original post).
Tell us if you want the command line. Previous testing with older kernels
and Broadcom NICs has shown full-duplex wire speed.
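
Roughly speaking, the full-duplex test runs two TCP streams at once, one
in each direction, along these lines (address illustrative; not our exact
command line):

netperf -H 192.168.0.2 -t TCP_STREAM -l 60 &    # data flowing out to the peer
netperf -H 192.168.0.2 -t TCP_MAERTS -l 60 &    # data flowing back from the peer
wait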

> Also best you repost your results with full information on
> netdev@vger.kernel.org

Wilco. Just subscribing now.

Cheers,
Bruce

2008-01-30 13:53:35

by David Miller

Subject: Re: e1000 full-duplex TCP performance well below wire speed

From: Bruce Allen <[email protected]>
Date: Wed, 30 Jan 2008 03:51:51 -0600 (CST)

[ netdev@vger.kernel.org added to CC: list, that is where
kernel networking issues are discussed. ]

> (The performance of a full duplex stream should be close to 1Gb/s in
> both directions.)

This is not a reasonable expectation.

ACKs take up space on the link in the opposite direction of the
transfer.

So the link usage in the opposite direction of the transfer is
very far from zero.

2008-01-30 14:02:21

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi David,

Thanks for your note.

>> (The performance of a full duplex stream should be close to 1Gb/s in
>> both directions.)
>
> This is not a reasonable expectation.
>
> ACKs take up space on the link in the opposite direction of the
> transfer.
>
> So the link usage in the opposite direction of the transfer is
> very far from zero.

Indeed, we are not asking to see 1000 Mb/s. We'd be happy to see 900
Mb/s.

Netperf is transmitting a large buffer in MTU-sized packets (min 1500
bytes). Since the ACKs are only about 60 bytes in size, they should be
around 4% of the total traffic. Hence we would not expect to see more
than 960 Mb/s.
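
The back-of-the-envelope arithmetic, assuming roughly one 60-byte ACK per
1500-byte frame:

echo "scale=1; 100*60/1500" | bc    # ACKs take ~4% of the reverse direction
echo "1000*(1500-60)/1500" | bc     # leaving an expected ceiling of ~960 Mb/s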

We have run these same tests on older kernels (with Broadcom NICs) and
gotten above 900 Mb/s full duplex.

Cheers,
Bruce

2008-01-30 14:09:21

by David Miller

Subject: Re: e1000 full-duplex TCP performance well below wire speed

From: Bruce Allen <[email protected]>
Date: Wed, 30 Jan 2008 07:38:56 -0600 (CST)

> Wilco. Just subscribing now.

You don't need to subscribe to any list at vger.kernel.org in order to
post a message to it.

2008-01-30 16:22:15

by Stephen Hemminger

Subject: Re: e1000 full-duplex TCP performance well below wire speed

On Wed, 30 Jan 2008 08:01:46 -0600 (CST)
Bruce Allen <[email protected]> wrote:

> Hi David,
>
> Thanks for your note.
>
> >> (The performance of a full duplex stream should be close to 1Gb/s in
> >> both directions.)
> >
> > This is not a reasonable expectation.
> >
> > ACKs take up space on the link in the opposite direction of the
> > transfer.
> >
> > So the link usage in the opposite direction of the transfer is
> > very far from zero.
>
> Indeed, we are not asking to see 1000 Mb/s. We'd be happy to see 900
> Mb/s.
>
> Netperf is transmitting a large buffer in MTU-sized packets (min 1500
> bytes). Since the ACKs are only about 60 bytes in size, they should be
> around 4% of the total traffic. Hence we would not expect to see more
> than 960 Mb/s.
>
> We have run these same tests on older kernels (with Broadcom NICs) and
> gotten above 900 Mb/s full duplex.
>
> Cheers,
> Bruce

Don't forget the network overhead: http://sd.wareonearth.com/~phil/net/overhead/
Max TCP Payload data rates over ethernet:
(1500-40)/(38+1500) = 94.9285 % IPv4, minimal headers
(1500-52)/(38+1500) = 94.1482 % IPv4, TCP timestamps

I believe what you are seeing is an effect that occurs when using
cubic on links with no other idle traffic. With two flows at high speed,
the first flow consumes most of the router buffer and backs off gradually,
and the second flow is not very aggressive. It has been discussed
back and forth between TCP researchers with no agreement, one side
says that it is unfairness and the other side says it is not a problem in
the real world because of the presence of background traffic.

See:
http://www.hamilton.ie/net/pfldnet2007_cubic_final.pdf
http://www.csc.ncsu.edu/faculty/rhee/Rebuttal-LSM-new.pdf


--
Stephen Hemminger <[email protected]>

2008-01-30 22:25:36

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Stephen,

Thanks for your helpful reply and especially for the literature pointers.

>> Indeed, we are not asking to see 1000 Mb/s. We'd be happy to see 900
>> Mb/s.
>>
>> Netperf is transmitting a large buffer in MTU-sized packets (min 1500
>> bytes). Since the ACKs are only about 60 bytes in size, they should be
>> around 4% of the total traffic. Hence we would not expect to see more
>> than 960 Mb/s.

> Don't forget the network overhead: http://sd.wareonearth.com/~phil/net/overhead/
> Max TCP Payload data rates over ethernet:
> (1500-40)/(38+1500) = 94.9285 % IPv4, minimal headers
> (1500-52)/(38+1500) = 94.1482 % IPv4, TCP timestamps

Yes. If you look further down the page, you will see that with jumbo
frames (which we have also tried) on Gb/s ethernet the maximum throughput
is:

(9000-20-20-12)/(9000+14+4+7+1+12)*1000000000/1000000 = 990.042 Mbps

We are very far from this number -- averaging perhaps 600 or 700 Mbps.

> I believe what you are seeing is an effect that occurs when using
> cubic on links with no other idle traffic. With two flows at high speed,
> the first flow consumes most of the router buffer and backs off gradually,
> and the second flow is not very aggressive. It has been discussed
> back and forth between TCP researchers with no agreement, one side
> says that it is unfairness and the other side says it is not a problem in
> the real world because of the presence of background traffic.

At least in principle, we should have NO congestion here. We have ports
on two different machines wired with a crossover cable. Box A can not
transmit faster than 1 Gb/s. Box B should be able to receive that data
without dropping packets. It's not doing anything else!

> See:
> http://www.hamilton.ie/net/pfldnet2007_cubic_final.pdf
> http://www.csc.ncsu.edu/faculty/rhee/Rebuttal-LSM-new.pdf

This is extremely helpful. The typical oscillation (startup) period shown
in the plots in these papers is of order 10 seconds, which is similar to
the types of oscillation periods that we are seeing.

*However* we have also seen similar behavior with the Reno congestion
control algorithm. So this might not be due to cubic, or at least not
entirely due to cubic.

In our application (cluster computing) we use a very tightly coupled
high-speed low-latency network. There is no 'wide area traffic'. So it's
hard for me to understand why any networking components or software layers
should take more than milliseconds to ramp up or back off in speed.
Perhaps we should be asking for a TCP congestion avoidance algorithm which
is designed for a data center environment where there are very few hops
and typical packet delivery times are tens or hundreds of microseconds.
It's very different than delivering data thousands of km across a WAN.
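
(For experimenting, the congestion control algorithm can at least be
swapped at runtime; the module and algorithm names below are just
examples:)

sysctl net.ipv4.tcp_available_congestion_control   # what the running kernel offers
modprobe tcp_htcp                                  # load another algorithm module
sysctl -w net.ipv4.tcp_congestion_control=htcp     # make it the system-wide default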

Cheers,
Bruce

2008-01-30 22:34:32

by Stephen Hemminger

Subject: Re: e1000 full-duplex TCP performance well below wire speed

On Wed, 30 Jan 2008 16:25:12 -0600 (CST)
Bruce Allen <[email protected]> wrote:

> Hi Stephen,
>
> Thanks for your helpful reply and especially for the literature pointers.
>
> >> Indeed, we are not asking to see 1000 Mb/s. We'd be happy to see 900
> >> Mb/s.
> >>
> >> Netperf is transmitting a large buffer in MTU-sized packets (min 1500
> >> bytes). Since the ACKs are only about 60 bytes in size, they should be
> >> around 4% of the total traffic. Hence we would not expect to see more
> >> than 960 Mb/s.
>
> > Don't forget the network overhead: http://sd.wareonearth.com/~phil/net/overhead/
> > Max TCP Payload data rates over ethernet:
> > (1500-40)/(38+1500) = 94.9285 % IPv4, minimal headers
> > (1500-52)/(38+1500) = 94.1482 % IPv4, TCP timestamps
>
> Yes. If you look further down the page, you will see that with jumbo
> frames (which we have also tried) on Gb/s ethernet the maximum throughput
> is:
>
> (9000-20-20-12)/(9000+14+4+7+1+12)*1000000000/1000000 = 990.042 Mbps
>
> We are very far from this number -- averaging perhaps 600 or 700 Mbps.
>


That is the upper bound of performance on a standard PCI bus (32 bit).
To go higher you need PCI-X or PCI-Express. Also make sure you are really
getting 64-bit PCI, because I have seen some e1000 PCI-X boards that
are only 32-bit.

2008-01-30 23:23:27

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Stephen,

>>>> Indeed, we are not asking to see 1000 Mb/s. We'd be happy to see 900
>>>> Mb/s.
>>>>
>>>> Netperf is transmitting a large buffer in MTU-sized packets (min 1500
>>>> bytes). Since the ACKs are only about 60 bytes in size, they should be
>>>> around 4% of the total traffic. Hence we would not expect to see more
>>>> than 960 Mb/s.
>>
>>> Don't forget the network overhead: http://sd.wareonearth.com/~phil/net/overhead/
>>> Max TCP Payload data rates over ethernet:
>>> (1500-40)/(38+1500) = 94.9285 % IPv4, minimal headers
>>> (1500-52)/(38+1500) = 94.1482 % IPv4, TCP timestamps
>>
>> Yes. If you look further down the page, you will see that with jumbo
>> frames (which we have also tried) on Gb/s ethernet the maximum throughput
>> is:
>>
>> (9000-20-20-12)/(9000+14+4+7+1+12)*1000000000/1000000 = 990.042 Mbps
>>
>> We are very far from this number -- averaging perhaps 600 or 700 Mbps.

> That is the upper bound of performance on a standard PCI bus (32 bit).
> To go higher you need PCI-X or PCI-Express. Also make sure you are really
> getting 64-bit PCI, because I have seen some e1000 PCI-X boards that
> are only 32-bit.

The motherboard NIC is in a PCI-e x1 slot. This has a maximum speed of
250 MB/s (2 Gb/s) in each direction, which is a factor of two more
interface bandwidth than is needed.

Cheers,
Bruce

2008-01-31 00:17:36

by SANGTAE HA

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Bruce,

On Jan 30, 2008 5:25 PM, Bruce Allen <[email protected]> wrote:
>
> In our application (cluster computing) we use a very tightly coupled
> high-speed low-latency network. There is no 'wide area traffic'. So it's
> hard for me to understand why any networking components or software layers
> should take more than milliseconds to ramp up or back off in speed.
> Perhaps we should be asking for a TCP congestion avoidance algorithm which
> is designed for a data center environment where there are very few hops
> and typical packet delivery times are tens or hundreds of microseconds.
> It's very different than delivering data thousands of km across a WAN.
>

If your network latency is low, then regardless of the protocol you
should get more than 900 Mbps. I would guess the RTT between your two
machines is less than 4 ms, and I remember the throughputs of all
high-speed protocols (including TCP Reno) were more than 900 Mbps with
a 4 ms RTT. So my question is: which kernel version did you use with
your Broadcom NIC to get more than 900 Mbps?

I have two machines connected by a gig switch, so I can check what
happens in my environment. Could you post the parameters you used for
the netperf testing? Also, if you set any other parameters for your
testing, please post them here so that I can see whether the same
thing happens for me as well.

Regards,
Sangtae

2008-01-31 08:52:37

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Sangtae,

Thanks for joining this discussion -- it's good to have a CUBIC author
and expert here!

>> In our application (cluster computing) we use a very tightly coupled
>> high-speed low-latency network. There is no 'wide area traffic'. So
>> it's hard for me to understand why any networking components or
>> software layers should take more than milliseconds to ramp up or back
>> off in speed. Perhaps we should be asking for a TCP congestion
>> avoidance algorithm which is designed for a data center environment
>> where there are very few hops and typical packet delivery times are
>> tens or hundreds of microseconds. It's very different than delivering
>> data thousands of km across a WAN.

> If your network latency is low, then regardless of the protocol you
> should get more than 900 Mbps.

Yes, this is also what I had thought.

In the graph that we posted, the two machines are connected by an ethernet
crossover cable. The total RTT of the two machines is probably AT MOST a
couple of hundred microseconds. Typically it takes 20 or 30 microseconds
to get the first packet out the NIC. Travel across the wire is a few
nanoseconds. Then getting the packet into the receiving NIC might be
another 20 or 30 microseconds. The ACK should fly back in about the same
time.
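
(A quick sanity check of that estimate is an ordinary ping across the
crossover cable; the address is illustrative:)

ping -c 100 -q 192.168.0.2    # expect an average RTT well under a millisecond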

> I would guess the RTT between your two machines is less than 4 ms, and
> I remember the throughputs of all high-speed protocols (including
> TCP Reno) were more than 900 Mbps with a 4 ms RTT. So my question is:
> which kernel version did you use with your Broadcom NIC to get more
> than 900 Mbps?

We are going to double-check this (we did the Broadcom testing about two
months ago). Carsten is going to re-run the Broadcom experiments later
today and will then post the results.

You can see results from some testing on crossover-cable wired systems
with Broadcom NICs that I did about two years ago, here:
http://www.lsc-group.phys.uwm.edu/beowulf/nemo/design/SMC_8508T_Performance.html
You'll notice that total TCP throughput on the crossover cable was about
220 MB/sec. With TCP overhead this is very close to 2Gb/s.

> I have two machines connected by a gig switch, so I can check what
> happens in my environment. Could you post the parameters you used for
> the netperf testing?

Carsten will post these in the next few hours. If you want to simplify
further, you can even take away the gig switch and just use a crossover
cable.

> Also, if you set any other parameters for your testing, please post
> them here so that I can see whether the same thing happens for me as
> well.

Carsten will post all the sysctl and ethtool parameters shortly.

Thanks again for chiming in. I am sure that with help from you, Jesse, and
Rick, we can figure out what is going on here, and get it fixed.

Cheers,
Bruce

2008-01-31 11:46:07

by Bill Fink

Subject: Re: e1000 full-duplex TCP performance well below wire speed

On Wed, 30 Jan 2008, SANGTAE HA wrote:

> On Jan 30, 2008 5:25 PM, Bruce Allen <[email protected]> wrote:
> >
> > In our application (cluster computing) we use a very tightly coupled
> > high-speed low-latency network. There is no 'wide area traffic'. So it's
> > hard for me to understand why any networking components or software layers
> > should take more than milliseconds to ramp up or back off in speed.
> > Perhaps we should be asking for a TCP congestion avoidance algorithm which
> > is designed for a data center environment where there are very few hops
> > and typical packet delivery times are tens or hundreds of microseconds.
> > It's very different than delivering data thousands of km across a WAN.
> >
>
> If your network latency is low, then regardless of the protocol you
> should get more than 900 Mbps. I would guess the RTT between your two
> machines is less than 4 ms, and I remember the throughputs of all
> high-speed protocols (including TCP Reno) were more than 900 Mbps with
> a 4 ms RTT. So my question is: which kernel version did you use with
> your Broadcom NIC to get more than 900 Mbps?
>
> I have two machines connected by a gig switch, so I can check what
> happens in my environment. Could you post the parameters you used for
> the netperf testing? Also, if you set any other parameters for your
> testing, please post them here so that I can see whether the same
> thing happens for me as well.

I see similar results on my test systems, using a Tyan Thunder K8WE (S2895)
motherboard with dual Intel Xeon 3.06 GHz CPUs and 1 GB memory, running
a 2.6.15.4 kernel. The GigE NICs are Intel PRO/1000 82546EB_QUAD_COPPER,
on a 64-bit/133-MHz PCI-X bus, using version 6.1.16-k2 of the e1000
driver, and running with 9000-byte jumbo frames. The TCP congestion
control is BIC.

Unidirectional TCP test:

[bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79
tx: 1186.5649 MB / 10.05 sec = 990.2741 Mbps 11 %TX 9 %RX 0 retrans

and:

[bill@chance4 ~]$ nuttcp -f-beta -Irx -r -w2m 192.168.6.79
rx: 1186.8281 MB / 10.05 sec = 990.5634 Mbps 14 %TX 9 %RX 0 retrans

Each direction gets full GigE line rate.

Bidirectional TCP test:

[bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.6.79
tx: 898.9934 MB / 10.05 sec = 750.1634 Mbps 10 %TX 8 %RX 0 retrans
rx: 1167.3750 MB / 10.06 sec = 973.8617 Mbps 14 %TX 11 %RX 0 retrans

While one direction gets close to line rate, the other only got 750 Mbps.
Note there were no TCP retransmitted segments for either data stream, so
that doesn't appear to be the cause of the slower transfer rate in one
direction.

If the receive direction uses a different GigE NIC that's part of the
same quad-GigE, all is fine:

[bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.5.79
tx: 1186.5051 MB / 10.05 sec = 990.2250 Mbps 12 %TX 13 %RX 0 retrans
rx: 1186.7656 MB / 10.05 sec = 990.5204 Mbps 15 %TX 14 %RX 0 retrans

Here's a test using the same GigE NIC for both directions with 1-second
interval reports:

[bill@chance4 ~]$ nuttcp -f-beta -Itx -i1 -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -i1 -w2m 192.168.6.79
tx: 92.3750 MB / 1.01 sec = 767.2277 Mbps 0 retrans
rx: 104.5625 MB / 1.01 sec = 872.4757 Mbps 0 retrans
tx: 83.3125 MB / 1.00 sec = 700.1845 Mbps 0 retrans
rx: 117.6250 MB / 1.00 sec = 986.5541 Mbps 0 retrans
tx: 83.8125 MB / 1.00 sec = 703.0322 Mbps 0 retrans
rx: 117.6250 MB / 1.00 sec = 986.5502 Mbps 0 retrans
tx: 83.0000 MB / 1.00 sec = 696.1779 Mbps 0 retrans
rx: 117.6250 MB / 1.00 sec = 986.5522 Mbps 0 retrans
tx: 83.7500 MB / 1.00 sec = 702.4989 Mbps 0 retrans
rx: 117.6250 MB / 1.00 sec = 986.5512 Mbps 0 retrans
tx: 83.1250 MB / 1.00 sec = 697.2270 Mbps 0 retrans
rx: 117.6250 MB / 1.00 sec = 986.5512 Mbps 0 retrans
tx: 84.1875 MB / 1.00 sec = 706.1665 Mbps 0 retrans
rx: 117.5625 MB / 1.00 sec = 985.5510 Mbps 0 retrans
tx: 83.0625 MB / 1.00 sec = 696.7167 Mbps 0 retrans
rx: 117.6875 MB / 1.00 sec = 987.5543 Mbps 0 retrans
tx: 84.1875 MB / 1.00 sec = 706.1545 Mbps 0 retrans
rx: 117.6250 MB / 1.00 sec = 986.5472 Mbps 0 retrans
rx: 117.6875 MB / 1.00 sec = 987.0724 Mbps 0 retrans
tx: 83.3125 MB / 1.00 sec = 698.8137 Mbps 0 retrans

tx: 844.9375 MB / 10.07 sec = 703.7699 Mbps 11 %TX 6 %RX 0 retrans
rx: 1167.4414 MB / 10.05 sec = 973.9980 Mbps 14 %TX 11 %RX 0 retrans

In this test case, the receiver ramped up to nearly full GigE line rate,
while the transmitter was stuck at about 700 Mbps. I ran one longer
60-second test and didn't see the oscillating behavior between receiver
and transmitter, but maybe that's because I have the GigE NIC interrupts
and nuttcp client/server applications both locked to CPU 0.
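
(Roughly how the pinning is done; the IRQ number below is just an example,
check /proc/interrupts for the real one:)

echo 1 > /proc/irq/48/smp_affinity                    # bind the NIC interrupt to CPU 0
taskset -c 0 nuttcp -f-beta -Itx -w2m 192.168.6.79    # run nuttcp pinned to CPU 0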

So in my tests, once one direction gets the upper hand, it seems to
stay that way. Could this be because the slower side is so busy
processing the transmits of the faster side that it just doesn't
get to do its fair share of transmits (although it doesn't seem to
be a bus or CPU issue)? Hopefully those more knowledgeable about
the Linux TCP/IP stack and network drivers might have some more
concrete ideas.

-Bill

2008-01-31 14:55:56

by David Acker

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Bill Fink wrote:
> If the receive direction uses a different GigE NIC that's part of the
> same quad-GigE, all is fine:
>
> [bill@chance4 ~]$ nuttcp -f-beta -Itx -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -w2m 192.168.5.79
> tx: 1186.5051 MB / 10.05 sec = 990.2250 Mbps 12 %TX 13 %RX 0 retrans
> rx: 1186.7656 MB / 10.05 sec = 990.5204 Mbps 15 %TX 14 %RX 0 retrans
Could this be an issue with pause frames? At a previous job I remember
having issues with a similar configuration using two Broadcom SB1250
3-port GigE devices. If I ran bidirectional tests on a single pair of
ports connected via crossover, it was slower than when I gave each
direction its own pair of ports. The problem turned out to be that
pause frame generation and handling was not configured correctly.
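
(ethtool will show and set the pause parameters; eth0 here is illustrative:)

ethtool -a eth0                            # show the current pause frame settings
ethtool -A eth0 autoneg off rx on tx on    # explicitly enable pause generation/handling
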
-Ack

2008-01-31 15:54:51

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Bill,

> I see similar results on my test systems

Thanks for this report and for confirming our observations. Could you
please confirm that a single-port bidirectional UDP link runs at wire
speed? This helps to localize the problem to the TCP stack or interaction
of the TCP stack with the e1000 driver and hardware.

Cheers,
Bruce

2008-01-31 15:58:29

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi David,

> Could this be an issue with pause frames? At a previous job I remember
> having issues with a similar configuration using two Broadcom SB1250
> 3-port GigE devices. If I ran bidirectional tests on a single pair of
> ports connected via crossover, it was slower than when I gave each
> direction its own pair of ports. The problem turned out to be that
> pause frame generation and handling was not configured correctly.

We had PAUSE frames turned off for our testing. The idea is to let TCP
do the flow and congestion control.

The problem with PAUSE+TCP is that it can cause head-of-line blocking,
where a single oversubscribed output port on a switch can PAUSE a large
number of flows on other paths.

Cheers,
Bruce

2008-01-31 17:36:47

by Bill Fink

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Bruce,

On Thu, 31 Jan 2008, Bruce Allen wrote:

> > I see similar results on my test systems
>
> Thanks for this report and for confirming our observations. Could you
> please confirm that a single-port bidirectional UDP link runs at wire
> speed? This helps to localize the problem to the TCP stack or interaction
> of the TCP stack with the e1000 driver and hardware.

Yes, a single-port bidirectional UDP test gets full GigE line rate
in both directions with no packet loss.

[bill@chance4 ~]$ nuttcp -f-beta -Itx -u -Ru -w2m 192.168.6.79 & nuttcp -f-beta -Irx -r -u -Ru -w2m 192.168.6.79
tx: 1187.0078 MB / 10.04 sec = 992.0550 Mbps 19 %TX 7 %RX 0 / 151937 drop/pkt 0.00 %loss
rx: 1187.1016 MB / 10.03 sec = 992.3408 Mbps 19 %TX 7 %RX 0 / 151949 drop/pkt 0.00 %loss

-Bill

2008-01-31 18:28:09

by Jesse Brandeburg

Subject: RE: e1000 full-duplex TCP performance well below wire speed

Bill Fink wrote:
> a 2.6.15.4 kernel. The GigE NICs are Intel PRO/1000
> 82546EB_QUAD_COPPER,
> on a 64-bit/133-MHz PCI-X bus, using version 6.1.16-k2 of the e1000
> driver, and running with 9000-byte jumbo frames. The TCP congestion
> control is BIC.

Bill, FYI, there was a known issue with e1000 (fixed in 7.0.38-k2) and
socket charge due to truesize that kept one end or the other from
opening its window. The result is not-so-great performance, and you
must upgrade the driver at both ends to fix it.

It was fixed in commit
9e2feace1acd38d7a3b1275f7f9f8a397d09040e

That commit itself needed a couple of follow-on bug fixes, but the point
is that you could download 7.3.20 from sourceforge (which would compile
on your kernel) and compare the performance with it if you were
interested in a further experiment.
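
Roughly, assuming the usual layout of the Intel sourceforge driver
package:

tar xzf e1000-7.3.20.tar.gz      # tarball from the sourceforge e1000 project
cd e1000-7.3.20/src
make install                     # builds against the running kernel and installs the module
rmmod e1000 && modprobe e1000    # reload so the new driver takes effect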

Jesse

2008-01-31 19:37:52

by Bruce Allen

Subject: Re: e1000 full-duplex TCP performance well below wire speed

Hi Bill,

>>> I see similar results on my test systems
>>
>> Thanks for this report and for confirming our observations. Could you
>> please confirm that a single-port bidirectional UDP link runs at wire
>> speed? This helps to localize the problem to the TCP stack or interaction
>> of the TCP stack with the e1000 driver and hardware.
>
> Yes, a single-port bidirectional UDP test gets full GigE line rate
> in both directions with no packet loss.

Thanks for confirming this. And thanks also for nuttcp! I just
recognized you as the author.

Cheers,
Bruce