Return-path:
Received: from mail-ig0-f180.google.com ([209.85.213.180]:55462 "EHLO
        mail-ig0-f180.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S965521AbbBDNQv (ORCPT ); Wed, 4 Feb 2015 08:16:51 -0500
Message-ID: <1423055810.907.125.camel@edumazet-glaptop2.roam.corp.google.com> (sfid-20150204_141655_956580_AE324F8C)
Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
From: Eric Dumazet
To: Michal Kazior, Neal Cardwell
Cc: linux-wireless, Network Development, eyalpe@dev.mellanox.co.il
Date: Wed, 04 Feb 2015 05:16:50 -0800
In-Reply-To:
References: <1422537297.21689.15.camel@edumazet-glaptop2.roam.corp.google.com>
        <1422628835.21689.95.camel@edumazet-glaptop2.roam.corp.google.com>
        <1422903136.21689.114.camel@edumazet-glaptop2.roam.corp.google.com>
        <1422926330.21689.138.camel@edumazet-glaptop2.roam.corp.google.com>
        <1422973660.907.10.camel@edumazet-glaptop2.roam.corp.google.com>
        <1423051045.907.108.camel@edumazet-glaptop2.roam.corp.google.com>
        <1423053531.907.115.camel@edumazet-glaptop2.roam.corp.google.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Sender: linux-wireless-owner@vger.kernel.org
List-ID:

OK guys

Using a mlx4 testbed I can reproduce the problem by pushing coalescing
settings and disabling SG (thus disabling GSO)

ethtool -K eth0 sg off
Actual changes:
scatter-gather: off
        tx-scatter-gather: off
generic-segmentation-offload: off [requested on]

ethtool -C eth0 tx-usecs 1024 tx-frames 64

Meaning that NIC waits one ms before sending the TX IRQ,
and can accumulate 64 frames before forcing the interrupt.
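
The coalescing settings above can be read as a simple rule: the TX completion interrupt fires once 64 frames have completed or 1024 us have elapsed, whichever comes first. Below is a minimal Python sketch of that rule, not a description of what the mlx4 hardware actually does. The names `tx_irq_delay_us` and `frame_time_us` are hypothetical, and the 1.2 us per-frame time assumes 1500-byte frames on a 10 Gbit/s link, which is not stated in this thread.

```python
def tx_irq_delay_us(frames_in_burst, tx_usecs=1024, tx_frames=64,
                    frame_time_us=1.2):
    """Rough delay (in microseconds) until the TX IRQ fires for one burst.

    Simplified model, assumed for illustration only:
    - if the burst holds at least tx_frames frames, the frame-count
      threshold triggers the IRQ after tx_frames frames are on the wire;
    - otherwise the NIC waits for the tx_usecs timer to expire.
    """
    if frames_in_burst >= tx_frames:
        # The frame-count threshold can never take longer than the timer.
        return min(tx_frames * frame_time_us, float(tx_usecs))
    return float(tx_usecs)


if __name__ == "__main__":
    for burst in (32, 59, 64, 128):
        print(burst, "frames ->", tx_irq_delay_us(burst), "us")
```

In this model any burst smaller than 64 frames pays the full 1024 us timer, while a burst of 64 frames or more gets its interrupt much sooner.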

We probably have a bug in cwnd expansion logic :

lpaa23:~# DUMP_TCP_INFO=1 ./netperf -H 10.246.7.152 -Cc
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.246.7.152 () port 0 AF_INET
rto=201000 ato=0 pmtu=1500 rcv_ssthresh=29200 rtt=230 rttvar=30 snd_ssthresh=41 cwnd=59 reordering=3 total_retrans=1 ca_state=0 pacing_rate=5943.1 Mbits
Recv   Send   Send                          Utilization       Service Demand
Socket Socket Message  Elapsed              Send     Recv     Send    Recv
Size   Size   Size     Time     Throughput  local    remote   local   remote
bytes  bytes  bytes    secs.    10^6bits/s  % S      % S      us/KB   us/KB

 87380  16384  16384    10.00       530.39   0.40     0.32     2.965   2.398

-> final cwnd=59 which is not enough to avoid the 1ms delay between each
burst.

So sender sends ~60 packets, then has to wait 1ms (to get NIC TX IRQ)
before sending the following burst.

I am CCing Neal, he probably can help to root cause the problem.

Thanks
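
The arithmetic behind "cwnd=59 is not enough" can be sketched as follows. This is a back-of-the-envelope estimate, not a measurement: it assumes one cwnd worth of packets is sent per coalescing interval, and an MSS of 1448 bytes (1500-byte pmtu minus 40 bytes of IP/TCP headers and 12 bytes of TCP timestamps, which is an assumption about this connection). The function names are hypothetical.

```python
import math

# Assumption: 1500-byte MTU, 40 bytes IPv4+TCP headers, 12 bytes timestamps.
MSS_BYTES = 1448


def burst_limited_throughput_mbps(cwnd_packets, burst_interval_us,
                                  mss_bytes=MSS_BYTES):
    """Throughput ceiling when one cwnd is sent per burst interval.

    Bits per microsecond is numerically the same as Mbit/s.
    """
    bits_per_burst = cwnd_packets * mss_bytes * 8
    return bits_per_burst / burst_interval_us


def cwnd_needed(target_mbps, burst_interval_us, mss_bytes=MSS_BYTES):
    """Smallest cwnd (in packets) that sustains target_mbps per interval."""
    return math.ceil(target_mbps * burst_interval_us / (mss_bytes * 8))


if __name__ == "__main__":
    # cwnd=59 with one burst per tx-usecs=1024 interval.
    print(burst_limited_throughput_mbps(59, 1024))
    # cwnd required to sustain the reported pacing_rate of 5943.1 Mbit/s.
    print(cwnd_needed(5943.1, 1024))
```

Under these assumptions, cwnd=59 with one burst per 1024 us caps throughput at roughly 667 Mbit/s, which is consistent with the measured 530.39 sitting below that ceiling. Sustaining the reported pacing rate with the same interval would take a cwnd of several hundred packets.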