Subject: Re: Throughput regression with `tcp: refine TSO autosizing`
From: Michal Kazior
To: Eric Dumazet
Cc: Ben Greear, linux-wireless, Network Development, eyalpe@dev.mellanox.co.il
Date: Tue, 3 Feb 2015 10:00:33 +0100

On 3 February 2015 at 00:06, Eric Dumazet wrote:
> On Mon, 2015-02-02 at 13:25 -0800, Ben Greear wrote:
>
>> It is a big throughput win to have fewer TCP ack packets on
>> wireless since it is a half-duplex environment. Is there anything
>> we could improve so that we can have fewer acks and still get
>> good tcp stack behaviour?
>
> First apply TCP stretch ack fixes to the sender. There is no way to get
> good performance if the sender does not handle stretch ack.
>
> d6b1a8a92a14 tcp: fix timing issue in CUBIC slope calculation
> 9cd981dcf174 tcp: fix stretch ACK bugs in CUBIC
> c22bdca94782 tcp: fix stretch ACK bugs in Reno
> 814d488c6126 tcp: fix the timid additive increase on stretch ACKs
> e73ebb0881ea tcp: stretch ACK fixes prep
>
> Then, make sure you do not throttle ACK too long, especially if you hope
> to get Gbit line rate on a 4 ms RTT flow.
>
> GRO does not mean: send one ACK every ms, or after 3ms delay...

I think it's worth pointing out that if you assume a 3-frame A-MSDU
and a 64-frame A-MPDU, you get 192 frames (as far as TCP/IP is
concerned) per aggregation window. Assuming an effective 600mbps
throughput, one window takes roughly 3.7 ms to fill:

python> 1.0/((((600/8)*1024*1024)/1500)/(3*64))
0.003663003663003663

This is probably the worst case, but still worth keeping in mind (an
expanded version of this arithmetic is in the PS below).

ath10k has a knob to tune the A-MSDU aggregation count (see PS2 below
for how I set it). The default is "3" and it's what I've been testing
so far. When I change it to "1" on the sender I get a 250->400mbps
boost with TCP -P5, but see no difference with -P1 (-P being the
number of flows). Changing it to "1" on the receiver yields no
difference. I can add this configuration permutation to my future
tests if you're interested. To give you an idea of the trade-off:
using "1" on the sender degrades UDP throughput (even 690->500mbps in
some cases).

Michał
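
PS: An expanded, commented version of the one-liner above. This is
the same back-of-envelope arithmetic written out in plain Python, with
the assumptions (1500-byte frames, 3-frame A-MSDU, 64-frame A-MPDU,
~600mbps effective throughput) made explicit:

  # Time needed to fill one aggregation window at a given line rate.
  FRAME_BYTES = 1500                # TCP/IP frame size assumed above
  AMSDU_FRAMES = 3                  # frames per A-MSDU (ath10k default)
  AMPDU_SUBFRAMES = 64              # subframes per A-MPDU

  bytes_per_sec = (600 / 8.0) * 1024 * 1024    # ~600mbps in bytes/s
  frames_per_sec = bytes_per_sec / FRAME_BYTES
  frames_per_window = AMSDU_FRAMES * AMPDU_SUBFRAMES   # 192 frames

  window_fill_sec = frames_per_window / frames_per_sec
  print("window fill time: %.2f ms" % (window_fill_sec * 1e3))
  # -> ~3.66 ms, i.e. a receiver that only responds to complete
  #    aggregates would send an ACK only every ~3.7 ms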
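
PS2: For completeness, this is roughly how I set the A-MSDU knob.
Take the path and format with a grain of salt; I'm quoting the ath10k
debugfs file from memory ("htt_max_amsdu_ampdu", which takes
"<amsdu> <ampdu>") and it may differ between kernel versions:

  # Needs root and an ath10k built with debugfs support
  # (CONFIG_ATH10K_DEBUGFS); "phy0" is just an example.
  # Writes "<amsdu> <ampdu>", e.g. 1-frame A-MSDU, 64-frame A-MPDU.
  path = "/sys/kernel/debug/ieee80211/phy0/ath10k/htt_max_amsdu_ampdu"
  with open(path, "w") as f:
      f.write("1 64")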