2015-11-06 20:31:48

by Simon Xiao

Subject: linux-next network throughput performance regression

I compared network throughput on SLES12 bare-metal servers between the SLES12 default kernel and the latest linux-next (2015-11-05) kernel. Based on the test results, I suspect there is a network regression in linux-next over the 40G Ethernet network:
a) iperf3 reports a 50% throughput drop with a single TCP stream on the latest linux-next;
b) iperf3 reports a 10% ~ 30% throughput drop with 2 to 128 TCP streams on the latest linux-next.
Results from another throughput benchmarking tool (ntttcp-for-linux) are also listed at the end of this email for reference.


Server configuration:
------------------------------
Two servers (one client and one server, cross-linked by 40G Ethernet), each with:
a) CPU: Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz, 2 sockets, 16 CPUs, cache size: 20480 KB
b) Memory: 64 GB
c) Ethernet controller: Mellanox Technologies MT27520 Family [ConnectX-3 Pro], 40G Ethernet, default driver


Test with iperf3:
------------------------------
iperf3: https://github.com/esnet/iperf
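
For reference, a typical way to drive this kind of test with iperf3 is sketched below; the exact options used for the runs above may have differed.

  receiver:  iperf3 -s
  sender:    iperf3 -c <receiver-ip> -t 60          # single TCP stream
             iperf3 -c <receiver-ip> -t 60 -P 8     # 8 parallel TCP streams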

a) SLES12 default kernel, network throughput tested by iperf3:
Test connections       1     2     4     8    16    32    64   128
Throughput (Gbps)   36.7  37.3  37.6  37.7  37.7  37.7  37.7  25.7

b) SLES12 + Linux-Next 20151105, network throughput tested by iperf3:
Test connections       1     2     4     8    16    32    64   128
Throughput (Gbps)   18.2  32.2  34.6  32.8  27.6  32.0  27.0  21.3
Percentage dropped  -50%  -14%   -8%  -13%  -27%  -15%  -28%  -17%


Test with ntttcp-for-linux:
------------------------------
ntttcp-for-linux: https://github.com/Microsoft/ntttcp-for-linux

a) SLES12 default kernel, network throughput tested by ntttcp-for-linux:
Test connections        1      2      4      8     16     32     64    128    256    512
Throughput (Gbps)   36.19  37.29  37.67  37.68  37.7   37.72  37.74  37.76  37.81  37.9

b) SLES12 + Linux-Next 20151105, network throughput tested by ntttcp-for-linux:
Test connections        1      2      4      8     16     32     64    128    256    512
Throughput (Gbps)   28.12  34.01  37.6   36.53  32.94  33.07  33.63  33.44  33.83  34.42
Percentage dropped   -22%    -9%     0%    -3%   -13%   -12%   -11%   -11%   -11%    -9%


2015-11-06 20:46:01

by David Ahern

Subject: Re: linux-next network throughput performance regression

On 11/6/15 1:31 PM, Simon Xiao wrote:
> I compared the network throughput performance on SLES12 bare metal servers, between SLES12 default kernel and latest linux-next (2015-11-05) kernel, based on the test results, I suspect there is a network regression exists on Linux-Next over the 40G Ethernet network:
> a) iperf3 reports 50% performance drop with single TCP stream on latest linux-next;
> b) iperf3 reports 10% ~ 30% performance drop with 2 to 128 TCP streams on latest linux-next;
> Another throughput benchmarking tool (ntttcp-for-linux) test result is also listed at the end of the email for reference.
>

Can you post your kernel config file?

2015-11-06 21:34:06

by Simon Xiao

Subject: RE: linux-next network throughput performance regression

The .config file used to build linux-next kernel is attached to this mail.


> -----Original Message-----
> From: David Ahern [mailto:[email protected]]
> Sent: Friday, November 6, 2015 12:46 PM
> To: Simon Xiao <[email protected]>; [email protected];
> [email protected]; [email protected]
> Cc: David Miller <[email protected]>; KY Srinivasan
> <[email protected]>; Haiyang Zhang <[email protected]>
> Subject: Re: linux-next network throughput performance regression
>
> On 11/6/15 1:31 PM, Simon Xiao wrote:
> > I compared the network throughput performance on SLES12 bare metal
> servers, between SLES12 default kernel and latest linux-next (2015-11-05)
> kernel, based on the test results, I suspect there is a network regression
> exists on Linux-Next over the 40G Ethernet network:
> > a) iperf3 reports 50% performance drop with single TCP stream on
> > latest linux-next;
> > b) iperf3 reports 10% ~ 30% performance drop with 2 to 128 TCP streams
> > on latest linux-next; Another throughput benchmarking tool (ntttcp-for-
> linux) test result is also listed at the end of the email for reference.
> >
>
> Can you post your kernel config file?


Attachments:
ATT25492.config (147.04 kB)

2015-11-06 21:30:23

by David Ahern

Subject: Re: linux-next network throughput performance regression

On 11/6/15 2:18 PM, Simon Xiao wrote:
> The .config file used to build linux-next kernel is attached to this mail.

Thanks.

Failed to notice this on the first response; my brain filled it in. Why
the linux-next tree? Can you try net-next, which is more relevant for this
mailing list, and post the top commit id and config file used?
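
For reference, a minimal way to do that (assuming the usual net-next location on git.kernel.org; adjust paths as needed) would be:

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
  cd net-next
  git log -1 --oneline          # top commit id to include in the report
  cp /path/to/your.config .config
  make olddefconfig && make -j"$(nproc)"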

2015-11-07 19:36:06

by Eric Dumazet

Subject: Re: linux-next network throughput performance regression

On Fri, 2015-11-06 at 14:30 -0700, David Ahern wrote:
> On 11/6/15 2:18 PM, Simon Xiao wrote:
> > The .config file used to build linux-next kernel is attached to this mail.
>
> Thanks.
>
> Failed to notice this on the first response; my brain filled in. Why
> linux-next tree? Can you try net-next which is more relevant for this
> mailing list, post the top commit id and config file used?

Throughput on a single TCP flow for a 40G NIC can be tricky to tune.

Make sure IRQs are properly set up/balanced; I know that IRQ names were
changed recently and your scripts might not have noticed...

Also, "ethtool -c eth0" might show very different interrupt coalescing
params?

I too have a Mellanox 40Gb in my lab and saw no difference in
performance with recent kernels.

Of course, a simple "perf record -a -g sleep 4 ; perf report" might
point to some obvious issue, like unexpected segmentation in the
forwarding case...
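
A minimal sketch for checking the IRQ side (assuming the mlx4 vectors show up under the interface name, here eth0):

  grep eth0 /proc/interrupts        # per-vector interrupt counts, one column per CPU
  for irq in $(awk -F: '/eth0/ {gsub(/ /, "", $1); print $1}' /proc/interrupts); do
      printf 'IRQ %s affinity: ' "$irq"; cat /proc/irq/"$irq"/smp_affinity
  done

Comparing this between the two kernels should show whether the queue interrupts are still spread across CPUs the same way.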


2015-11-07 19:50:37

by Eric Dumazet

Subject: Re: linux-next network throughput performance regression

On Sat, 2015-11-07 at 11:35 -0800, Eric Dumazet wrote:
> On Fri, 2015-11-06 at 14:30 -0700, David Ahern wrote:
> > On 11/6/15 2:18 PM, Simon Xiao wrote:
> > > The .config file used to build linux-next kernel is attached to this mail.
> >
> > Thanks.
> >
> > Failed to notice this on the first response; my brain filled in. Why
> > linux-next tree? Can you try net-next which is more relevant for this
> > mailing list, post the top commit id and config file used?
>
> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
>
> Make sure IRQ are properly setup/balanced, as I know that IRQ names were
> changed recently and your scripts might have not noticed...
>
> Also "ethtool -c eth0" might show very different interrupt coalescing
> params ?
>
> I too have a Mellanox 40Gb in my lab and saw no difference in
> performance with recent kernels.
>
> Of course, a simple "perf record -a -g sleep 4 ; perf report" might
> point to some obvious issue. Like unexpected segmentation in case of
> forwarding...
>
>

I did a test with current net tree on both sender and receiver

lpaa23:~# ./netperf -H 10.246.7.152
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    26864.98
lpaa23:~# ethtool -c eth1
Coalesce parameters for eth1:
Adaptive RX: on TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 400000
pkt-rate-high: 450000

rx-usecs: 16
rx-frames: 44
rx-usecs-irq: 0
rx-frames-irq: 0

tx-usecs: 16
tx-frames: 16
tx-usecs-irq: 0
tx-frames-irq: 256

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 128
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

lpaa23:~# ethtool -C eth1 tx-usecs 4 tx-frames 4
lpaa23:~# ./netperf -H 10.246.7.152
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
Recv   Send    Send
Socket Socket  Message  Elapsed
Size   Size    Size     Time     Throughput
bytes  bytes   bytes    secs.    10^6bits/sec

 87380  16384  16384    10.00    30206.27

2015-11-09 02:53:55

by Dexuan Cui

Subject: RE: linux-next network throughput performance regression

> From: devel [mailto:[email protected]] On Behalf
> Of Eric Dumazet
> Sent: Sunday, November 8, 2015 3:36
> To: David Ahern <[email protected]>
> Cc: [email protected]; Haiyang Zhang <[email protected]>; linux-
> [email protected]; [email protected]; David Miller
> <[email protected]>
> Subject: Re: linux-next network throughput performance regression
>
> On Fri, 2015-11-06 at 14:30 -0700, David Ahern wrote:
> > On 11/6/15 2:18 PM, Simon Xiao wrote:
> > > The .config file used to build linux-next kernel is attached to this mail.
> >
> > Thanks.
> >
> > Failed to notice this on the first response; my brain filled in. Why
> > linux-next tree? Can you try net-next which is more relevant for this
> > mailing list, post the top commit id and config file used?
>
> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
Why is a single TCP flow trickier than multiple TCP flows?
IMO it should be easier to analyze the issue of a single TCP flow?

Here the performance drop in Simon's test is very obvious (50%), but it
looks like Eric can't reproduce it, so I suppose some net-related kernel
config options may be making the difference?

Maybe Simon can narrow the regression down by bisecting. :-)
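
A rough sketch of that (the good/bad starting points below are just assumptions; pick whatever kernels actually show ~36 vs. ~18 Gbps):

  git bisect start
  git bisect bad                  # the linux-next build that shows ~18 Gbps
  git bisect good v4.2            # or whichever older kernel still reaches ~36 Gbps
  # then for each step: build, boot, rerun the single-stream iperf3 test, and mark
  git bisect good                 # or: git bisect bad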

> Make sure IRQ are properly setup/balanced, as I know that IRQ names were
> changed recently and your scripts might have not noticed...
>
> Also "ethtool -c eth0" might show very different interrupt coalescing
> params ?
>
> I too have a Mellanox 40Gb in my lab and saw no difference in
> performance with recent kernels.
>
> Of course, a simple "perf record -a -g sleep 4 ; perf report" might
> point to some obvious issue. Like unexpected segmentation in case of
> forwarding...
>

Thanks,
-- Dexuan

2015-11-09 02:52:54

by David Miller

Subject: Re: linux-next network throughput performance regression

From: Dexuan Cui <[email protected]>
Date: Mon, 9 Nov 2015 02:39:24 +0000

>> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
> Why is a single TCP flow trickier than multiple TCP flows?
> IMO it should be easier to analyze the issue of a single TCP flow?

Because a single TCP flow can only use one of the many TX queues
that such modern NICs have.

The single TX queue becomes the bottleneck.

Whereas if you have several TCP flows, all of them can use independent
TX queues on the NIC in parallel to fill the link with traffic.

That's why.
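
To see this on a given box, something like the following (assuming the interface is eth1) shows how many TX queues the NIC exposes and how traffic spreads across them:

  ethtool -l eth1                    # channel/queue counts exposed by the NIC
  ls /sys/class/net/eth1/queues/     # tx-0, tx-1, ... one directory per queue
  tc -s qdisc show dev eth1          # per-queue byte/packet counters under the mq qdisc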

2015-11-09 03:27:05

by Dexuan Cui

Subject: RE: linux-next network throughput performance regression

> -----Original Message-----
> From: David Miller [mailto:[email protected]]
> Sent: Monday, November 9, 2015 10:53
> To: Dexuan Cui <[email protected]>
> Cc: [email protected]; [email protected]; Simon Xiao
> <[email protected]>; [email protected]; Haiyang Zhang
> <[email protected]>; [email protected];
> [email protected]
> Subject: Re: linux-next network throughput performance regression
>
> From: Dexuan Cui <[email protected]>
> Date: Mon, 9 Nov 2015 02:39:24 +0000
>
> >> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
> > Why is a single TCP flow trickier than multiple TCP flows?
> > IMO it should be easier to analyze the issue of a single TCP flow?
>
> Because a single TCP flow can only use one of the many TX queues
> that such modern NICs have.
>
> The single TX queue becomes the bottleneck.
>
> Whereas if you have several TCP flows, all of them can use independant
> TX queues on the NIC in parallel to fill the link with traffic.
>
> That's why.

Thanks, David!
I understand 1 TX queue is the bottleneck (however, in Simon's
test, TX=1 => 36.7 Gb/s and TX=8 => 37.7 Gb/s, so it looks like the TX=1
bottleneck is not that pronounced).
I'm just wondering how the bottleneck became so much narrower with
recent linux-next in Simon's result (36.7 Gb/s vs. 18.2 Gb/s). IMO there
must be added latency somewhere.

Thanks,
-- Dexuan

2015-11-09 03:23:49

by David Miller

Subject: Re: linux-next network throughput performance regression

From: Dexuan Cui <[email protected]>
Date: Mon, 9 Nov 2015 03:11:35 +0000

>> -----Original Message-----
>> From: David Miller [mailto:[email protected]]
>> Sent: Monday, November 9, 2015 10:53
>> To: Dexuan Cui <[email protected]>
>> Cc: [email protected]; [email protected]; Simon Xiao
>> <[email protected]>; [email protected]; Haiyang Zhang
>> <[email protected]>; [email protected];
>> [email protected]
>> Subject: Re: linux-next network throughput performance regression
>>
>> From: Dexuan Cui <[email protected]>
>> Date: Mon, 9 Nov 2015 02:39:24 +0000
>>
>> >> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
>> > Why is a single TCP flow trickier than multiple TCP flows?
>> > IMO it should be easier to analyze the issue of a single TCP flow?
>>
>> Because a single TCP flow can only use one of the many TX queues
>> that such modern NICs have.
>>
>> The single TX queue becomes the bottleneck.
>>
>> Whereas if you have several TCP flows, all of them can use independant
>> TX queues on the NIC in parallel to fill the link with traffic.
>>
>> That's why.
>
> Thanks, David!
> I understand 1 TX queue is the bottleneck (however in Simon's
> test, TX=1 => 36.7Gb/s, TX=8 => 37.7 Gb/s, so it looks the TX=1 bottleneck
> is not so obvious).
> I'm just wondering how the bottleneck became much narrower with
> recent linux-next in Simon's result (36.7 Gb/s vs. 18.2 Gb/s). IMO there
> must be some latency somewhere.

I think the whole thing here is that you misinterpreted what Eric said.

He is not arguing that some regression did, or did not, happen.

He instead was making the basic point that, due to the lack of
parallelism, the single-stream TCP case is harder to optimize for
high-speed NICs.

That is all.

2015-11-09 03:46:25

by Dexuan Cui

Subject: RE: linux-next network throughput performance regression

> From: David Miller [mailto:[email protected]]
> Sent: Monday, November 9, 2015 11:24
> ...
> > Thanks, David!
> > I understand 1 TX queue is the bottleneck (however in Simon's
> > test, TX=1 => 36.7Gb/s, TX=8 => 37.7 Gb/s, so it looks the TX=1 bottleneck
> > is not so obvious).
> > I'm just wondering how the bottleneck became much narrower with
> > recent linux-next in Simon's result (36.7 Gb/s vs. 18.2 Gb/s). IMO there
> > must be some latency somewhere.
>
> I think the whole thing here is that you misinterpreted what Eric said.
>
> He is not arguing that some regression did, or did not, happen.
>
> He instead was making the basic statement about the fact that due to
> the lack of paralellness a single stream TCP case is harder to
> optimize for high speed NICs.
>
> That is all.
Thanks, I got it.
I'm actually new to network performance tuning, trying to understand
all the related details. :-)

Thanks,
-- Dexuan

2015-11-09 03:33:01

by Dave Airlie

Subject: Re: linux-next network throughput performance regression

On 9 November 2015 at 13:23, David Miller <[email protected]> wrote:
> From: Dexuan Cui <[email protected]>
> Date: Mon, 9 Nov 2015 03:11:35 +0000
>
>>> -----Original Message-----
>>> From: David Miller [mailto:[email protected]]
>>> Sent: Monday, November 9, 2015 10:53
>>> To: Dexuan Cui <[email protected]>
>>> Cc: [email protected]; [email protected]; Simon Xiao
>>> <[email protected]>; [email protected]; Haiyang Zhang
>>> <[email protected]>; [email protected];
>>> [email protected]
>>> Subject: Re: linux-next network throughput performance regression
>>>
>>> From: Dexuan Cui <[email protected]>
>>> Date: Mon, 9 Nov 2015 02:39:24 +0000
>>>
>>> >> Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
>>> > Why is a single TCP flow trickier than multiple TCP flows?
>>> > IMO it should be easier to analyze the issue of a single TCP flow?
>>>
>>> Because a single TCP flow can only use one of the many TX queues
>>> that such modern NICs have.
>>>
>>> The single TX queue becomes the bottleneck.
>>>
>>> Whereas if you have several TCP flows, all of them can use independant
>>> TX queues on the NIC in parallel to fill the link with traffic.
>>>
>>> That's why.
>>
>> Thanks, David!
>> I understand 1 TX queue is the bottleneck (however in Simon's
>> test, TX=1 => 36.7Gb/s, TX=8 => 37.7 Gb/s, so it looks the TX=1 bottleneck
>> is not so obvious).
>> I'm just wondering how the bottleneck became much narrower with
>> recent linux-next in Simon's result (36.7 Gb/s vs. 18.2 Gb/s). IMO there
>> must be some latency somewhere.
>
> I think the whole thing here is that you misinterpreted what Eric said.
>
> He is not arguing that some regression did, or did not, happen.
>
> He instead was making the basic statement about the fact that due to
> the lack of paralellness a single stream TCP case is harder to
> optimize for high speed NICs.
>
> That is all.

We recently had a regression in a similar area that was tracked down to
link order.

Dave.

2015-11-09 05:30:33

by Tom Herbert

Subject: Re: linux-next network throughput performance regression

On Sun, Nov 8, 2015 at 7:31 PM, Dexuan Cui <[email protected]> wrote:
>> From: David Miller [mailto:[email protected]]
>> Sent: Monday, November 9, 2015 11:24
>> ...
>> > Thanks, David!
>> > I understand 1 TX queue is the bottleneck (however in Simon's
>> > test, TX=1 => 36.7Gb/s, TX=8 => 37.7 Gb/s, so it looks the TX=1 bottleneck
>> > is not so obvious).
>> > I'm just wondering how the bottleneck became much narrower with
>> > recent linux-next in Simon's result (36.7 Gb/s vs. 18.2 Gb/s). IMO there
>> > must be some latency somewhere.
>>
>> I think the whole thing here is that you misinterpreted what Eric said.
>>
>> He is not arguing that some regression did, or did not, happen.
>>
>> He instead was making the basic statement about the fact that due to
>> the lack of paralellness a single stream TCP case is harder to
>> optimize for high speed NICs.
>>
>> That is all.
> Thanks, I got it.
> I'm actually new to network performance tuning, trying to understand
> all the related details. :-)
>

You might want to look at
https://www.kernel.org/doc/Documentation/networking/scaling.txt as an
introduction to the scaling capabilities of the stack.
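
As one concrete example from that document, receive packet steering (RPS) and transmit packet steering (XPS) are configured per queue through sysfs; the interface name and CPU mask below are just placeholders:

  # let CPUs 0-7 process received packets from eth0's first RX queue
  echo ff > /sys/class/net/eth0/queues/rx-0/rps_cpus
  # map CPUs 0-7 to eth0's first TX queue for transmit queue selection
  echo ff > /sys/class/net/eth0/queues/tx-0/xps_cpus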

Tom

> Thanks,
> -- Dexuan

2015-11-09 20:23:10

by Simon Xiao

Subject: RE: linux-next network throughput performance regression

Thanks, Eric, for providing the data. I am looping in Tom (as I am looking into his recent patches) and Olaf (from SUSE).

So, if I understand it correctly, you are running netperf with a single TCP connection, and you got ~26 Gbps initially and ~30 Gbps after tuning tx-usecs and tx-frames.

Do you have a baseline for the best/peak throughput in your environment?
Again, in my environment (SLES bare metal), using the SLES12 default kernel as a baseline, we see a significant performance drop (10% ~ 50%) on the latest linux-next kernel.
I will absolutely try the same test on net-next soon and post the results here later.

Thanks,
Simon


> -----Original Message-----
> From: Eric Dumazet [mailto:[email protected]]
> Sent: Saturday, November 7, 2015 11:50 AM
> To: David Ahern <[email protected]>
> Cc: Simon Xiao <[email protected]>; [email protected];
> [email protected]; [email protected]; David Miller
> <[email protected]>; KY Srinivasan <[email protected]>; Haiyang
> Zhang <[email protected]>
> Subject: Re: linux-next network throughput performance regression
>
> On Sat, 2015-11-07 at 11:35 -0800, Eric Dumazet wrote:
> > On Fri, 2015-11-06 at 14:30 -0700, David Ahern wrote:
> > > On 11/6/15 2:18 PM, Simon Xiao wrote:
> > > > The .config file used to build linux-next kernel is attached to this mail.
> > >
> > > Thanks.
> > >
> > > Failed to notice this on the first response; my brain filled in. Why
> > > linux-next tree? Can you try net-next which is more relevant for
> > > this mailing list, post the top commit id and config file used?
> >
> > Throughput on a single TCP flow for a 40G NIC can be tricky to tune.
> >
> > Make sure IRQ are properly setup/balanced, as I know that IRQ names
> > were changed recently and your scripts might have not noticed...
> >
> > Also "ethtool -c eth0" might show very different interrupt coalescing
> > params ?
> >
> > I too have a Mellanox 40Gb in my lab and saw no difference in
> > performance with recent kernels.
> >
> > Of course, a simple "perf record -a -g sleep 4 ; perf report" might
> > point to some obvious issue. Like unexpected segmentation in case of
> > forwarding...
> >
> >
>
> I did a test with current net tree on both sender and receiver
>
> lpaa23:~# ./netperf -H 10.246.7.152
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 10.246.7.152 () port 0 AF_INET
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 87380 16384 16384 10.00 26864.98
> lpaa23:~# ethtool -c eth1
> Coalesce parameters for eth1:
> Adaptive RX: on TX: off
> stats-block-usecs: 0
> sample-interval: 0
> pkt-rate-low: 400000
> pkt-rate-high: 450000
>
> rx-usecs: 16
> rx-frames: 44
> rx-usecs-irq: 0
> rx-frames-irq: 0
>
> tx-usecs: 16
> tx-frames: 16
> tx-usecs-irq: 0
> tx-frames-irq: 256
>
> rx-usecs-low: 0
> rx-frame-low: 0
> tx-usecs-low: 0
> tx-frame-low: 0
>
> rx-usecs-high: 128
> rx-frame-high: 0
> tx-usecs-high: 0
> tx-frame-high: 0
>
> lpaa23:~# ethtool -C eth1 tx-usecs 4 tx-frames 4


> lpaa23:~# ./netperf -H 10.246.7.152
> MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
> 10.246.7.152 () port 0 AF_INET
> Recv Send Send
> Socket Socket Message Elapsed
> Size Size Size Time Throughput
> bytes bytes bytes secs. 10^6bits/sec
>
> 87380 16384 16384 10.00 30206.27
>


2015-11-09 23:04:43

by Eric Dumazet

Subject: Re: linux-next network throughput performance regression

On Mon, 2015-11-09 at 20:23 +0000, Simon Xiao wrote:
> Thanks Eric to provide the data. I am looping Tom (as I am looking into his recent patches) and Olaf (from Suse).
>
> So, if I understand it correctly, you are running netperf with single
> TCP connection, and you got ~26Gbps initially and got ~30Gbps after
> turning the tx-usecs and tx-frames.
>
> Do you have a baseline on your environment for the best/max/or peak
> throughput?

The peak on my lab pair is about 34 Gbit; I usually get this if I pin
the receiving thread to a cpu, otherwise the process scheduler can really
hurt too much.

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H lpaa24 -l 20 -Cc -T ,1
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpaa24.prod.google.com () port 0 AF_INET : cpu bind
tcpi_rto 201000 tcpi_ato 0 tcpi_pmtu 1500 tcpi_rcv_ssthresh 29200
tcpi_rtt 101 tcpi_rttvar 15 tcpi_snd_ssthresh 289 tpci_snd_cwnd 289
tcpi_reordering 3 tcpi_total_retrans 453
Recv   Send    Send                       Utilization       Service Demand
Socket Socket  Message  Elapsed           Send     Recv     Send    Recv
Size   Size    Size     Time  Throughput  local    remote   local   remote
bytes  bytes   bytes    secs. 10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    20.00  33975.99   1.27     3.36     0.147   0.389

Not too bad, I don't recall reaching more than that ever.