Subject: Re: debugging TCP stalls on high-speed wifi
From: Johannes Berg
To: Dave Taht
Cc: Eric Dumazet, Neal Cardwell, Toke Høiland-Jørgensen,
    linux-wireless, Netdev, Make-Wifi-fast
Date: Fri, 13 Dec 2019 09:08:10 +0100
X-Mailing-List: linux-wireless@vger.kernel.org

On Thu, 2019-12-12 at 15:42 -0800, Dave Taht wrote:
> If you captured the air you'd probably see the sender winning the
> election for airtime 2 or more times in a row; it's random and oft
> dependent on a variety of factors.

I'm going to try to capture more details - I can probably extract this
out of the firmware, but it's more effort.

> Most Wifi is *not* "half" duplex, which implies it ping pongs between
> send and receive.

That's an interesting definition of "half duplex" that doesn't really
match anything I've seen in use or in the literature? What you're
describing sounds more like some sort of "half duplex with token-based
flow control" or something like that to me ...

> > But unless somehow you think processing the (many) ACKs on the
> > sender will cause it to stop transmitting, or something like that,
> > I don't think I should be seeing what I described earlier: we
> > sometimes (have to?) reclaim the entire transmit queue before TCP
> > starts pushing data again. That's less than 2MB split across at
> > least two TCP streams; I don't see why we should have to get to 0
> > (which takes about 7ms) until more packets come in from TCP?
>
> Perhaps having a budget for ack processing within a 1ms window?

What do you mean? There's such a budget? What kind of budget? I have
plenty of CPU time left, as far as I can tell.

> It would be interesting to repeat this test in ht20 mode,

Why? HT20 is far slower - what would be the advantage? In my
experience I don't hit this until I get to HE80.

> flent --socket-stats --step-size=.04 --te=upload_streams=2 -t
> whatever_variant_of_test tcp_nup
>
> That will capture some of the tcp stats for you.

I guess I can try, but the upload_streams=2 won't actually help - I
need to run towards two different IP addresses - remember that I'm
otherwise limited by a GBit LAN link on the other side right now.

> > But that is something the *receiver* would have to do.
>
> Well it is certainly feasible to thin acks on the driver as we did
> in cake.

I really don't think it would help in my case: either the ACKs are the
problem (which I doubt), and then they're the problem on the air, or
they're not the problem, since I have plenty of CPU time to waste on
them ...

> One thing comcast inadvertently does to most flows is remark them
> cs1, which tosses big data into the bk queue and acks into the be
> queue. It actually helps sometimes.

I thought about doing this, but if I make my flows BK it halves my
throughput (perhaps due to the more than double AIFSN?)

> > (**) As another aside to this, the next generation HW after this
> > will have 256 frames in a block-ack, so that means instead of up
> > to 64 (we only use 63 for internal reasons) frames aggregated
> > together we'll be able to aggregate 256 (or maybe again only 255?).
>
> My fervent wish is to somehow be able to mark every frame we can as
> not needing a retransmit in future standards.

This has been possible since ... oh, I don't know, probably 2005, with
the 802.11e amendment? Not sure off the top of my head how it
interacts with A-MPDUs though, and it probably has bugs if you do
that.

> I've lost track of what ax can do.

?

> And for block ack retries to give up far sooner.

You can do that too; it's just a local configuration how much you try
each packet. If you give up you leave a hole in the reorder window,
but if you start sending packets that are further ahead than the
window, the old ones will (have to be) released regardless.

> you can safely drop all but the last three acks in a flow, and the
> txop itself provides a suitable clock.
Now that's more tricky, because once you stick the packets into the
hardware queue you likely have to decide whether or not they're
important. I can probably think of ways of working around that
(similar to the table-based rate scaling we use), but it's tricky.

> And, ya know, releasing packets ooo doesn't hurt as much as it used
> to, with rack. :)

That, I think, is not currently possible with A-MPDUs. It'd also still
have to be opt-in per frame, since you can't really do that for
anything but TCP (and probably QUIC? Maybe SCTP?)

> Just wearing my usual hat, I would prefer to optimize for service
> time, not bandwidth, in the future, using smaller txops with more
> data in them, rather than the biggest txops possible.

Patience. We're getting there now. HE will allow the AP to schedule
everything, and then you don't need TXOPs anymore. The problem is that
winning a TXOP is costly, so you *need* to put as much as possible
into it for good performance.

With HE and the AP scheduling, you win some, you lose some. The client
will lose the ability to actually make any decisions about its
transmit rate and things like that, but the AP can schedule & poll the
clients better, without all the overhead.

> If you constrain your max txop to 2ms in this test, you will see tcp
> in slow start ramp up faster, and the ap scale to way more devices,
> with way less jitter and retries. Most flows never get out of
> slowstart.

I'm running a client ... you're forgetting that there's something else
that's actually talking to the AP you're thinking of :-)

> > ... we'll probably have to bump the sk_pacing_shift to be able to
> > fill that with a single TCP stream, though since we run all our
> > performance numbers with many streams, maybe we should just leave
> > it :)
>
> Please. Optimizing for single flow performance is an academic's game.

Same here, kinda.

johannes
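
---

As a back-of-the-envelope illustration of the sk_pacing_shift point
above (a sketch assuming a 1 Gbit/s pacing rate - an illustrative
number, not a measurement from this thread): TCP Small Queues keeps
roughly rate >> sk_pacing_shift bytes of un-ACKed data queued per
socket, i.e. 2^-shift seconds' worth at the pacing rate, so "bumping"
the shift down is what allows more data in flight to fill a larger
block-ack window.

  #include <stdio.h>

  int main(void)
  {
          /* Assumed pacing rate: 1 Gbit/s, in bytes per second. */
          unsigned long long rate = 1000ULL * 1000 * 1000 / 8;
          int shift;

          /*
           * TSQ allows roughly (rate >> shift) bytes of un-ACKed data
           * per socket, i.e. 2^-shift seconds at the pacing rate. The
           * core kernel default is 10 (~1 ms); mac80211 lowers it to
           * 8 (~4 ms).
           */
          for (shift = 10; shift >= 6; shift--)
                  printf("shift %2d -> ~%4llu KB (~%5.2f ms) in flight\n",
                         shift, (rate >> shift) / 1024,
                         1000.0 / (1 << shift));
          return 0;
  }

Under these assumptions, a single stream at the mac80211 default
(shift 8, ~488 KB) can't keep a ~2 MB transmit queue full by itself,
which is roughly consistent with the thread's observation that it
takes multiple streams - or a lower shift - to fill it.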