Subject: Re: debugging TCP stalls on high-speed wifi
From: Johannes Berg
To: Eric Dumazet, Neal Cardwell
Cc: Toke Høiland-Jørgensen, linux-wireless@vger.kernel.org, Netdev
Date: Thu, 12 Dec 2019 22:11:17 +0100
In-Reply-To: <99748db5-7898-534b-d407-ed819f07f939@gmail.com>
References: <14cedbb9300f887fecc399ebcdb70c153955f876.camel@sipsolutions.net>
 <99748db5-7898-534b-d407-ed819f07f939@gmail.com>

Hi Eric,

Thanks for looking :)

> > I'm not sure how to do headers-only, but I guess -s100 will work.
> >
> > https://johannes.sipsolutions.net/files/he-tcp.pcap.xz
>
> Lack of GRO on receiver is probably what is killing performance,
> both for receiver (generating gazillions of acks) and sender
> (to process all these acks)

Yes, I'm aware of this, to some extent. And I'm not saying we should
see even close to the 1800 Mbps we get with UDP...

Mind you, the biggest thing that kills performance with many ACKs
isn't the load on the system - the sender system is only moderately
loaded at ~20-25% of a single core with TSO, and around double that
without TSO. The thing that kills performance is eating up all the
medium time with small non-aggregated packets, due to the half-duplex
nature of WiFi. I know you know, but in case somebody else is reading
along :-)

But unless you somehow think processing the (many) ACKs on the sender
will cause it to stop transmitting, or something like that, I don't
think I should be seeing what I described earlier: we sometimes (have
to?) reclaim the entire transmit queue before TCP starts pushing data
again. That's less than 2MB split across at least two TCP streams, and
I don't see why we should have to get to 0 (which takes about 7ms)
before more packets come in from TCP.

Or put another way - if I free, say, 400kB worth of SKBs, what could
be the reason we don't see more packets being sent out of the TCP
stack within the next few ms? I guess I have to correlate this somehow
with the ACKs so I know how much data is outstanding for ACKs. (*)

The sk_pacing_shift is set to 7, btw, which should give us 8ms of
outstanding data. For now, in this setup, that's enough (**), and
indeed bumping the limit up (setting sk_pacing_shift to say 5) doesn't
change anything. So I think this part we actually solved - I get
basically the same performance and behaviour with two streams (needed
due to GBit LAN on the other side) as with 20 streams.
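To put rough numbers on that 8ms (back-of-envelope only - as far as I
understand, the limit below the TCP stack is roughly
pacing_rate >> sk_pacing_shift, the 1800 Mbps is the UDP number from
above so it's an upper bound, and the helper name is just for this
sketch):

    # How much data TSQ lets us keep queued below the TCP stack,
    # assuming the limit is ~ pacing_rate >> sk_pacing_shift,
    # i.e. about 2**-shift seconds worth of data at the pacing rate.
    def tsq_budget_bytes(rate_mbps, pacing_shift):
        rate_bytes_per_sec = rate_mbps * 1e6 / 8
        return rate_bytes_per_sec / (1 << pacing_shift)

    for shift in (10, 7, 5):
        budget = tsq_budget_bytes(1800, shift)
        print(f"shift {shift:2}: ~{1000 / (1 << shift):5.1f} ms "
              f"-> ~{budget / 1e6:.2f} MB at 1800 Mbps")

    # shift 10: ~  1.0 ms -> ~0.22 MB
    # shift  7: ~  7.8 ms -> ~1.76 MB  (same ballpark as the <2MB on the queue)
    # shift  5: ~ 31.2 ms -> ~7.03 MB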
> I had a plan about enabling compressing ACK as I did for SACK
> in commit
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5d9f4262b7ea41ca9981cc790e37cca6e37c789e
>
> But I have not done it yet.
> It is a pity because this would tremendously help wifi, I am sure.

Nice :-) But that is something the *receiver* would have to do.

The dirty secret here is that we're getting close to 1700 Mbps TCP
with Windows in place of Linux in the setup, with the same receiver on
the other end (which is actually a single Linux machine with two GBit
network connections to the AP). So if we had this I'm sure it'd
increase performance, but it still wouldn't explain why we're so much
slower than Windows :-)

Now, I'm certainly not saying that TCP behaviour is the only reason
for the difference; we already found an issue, for example, where due
to a small Windows driver bug some packet extension was always used,
and the AP is also buggy in that it needs the extension but didn't
request it ... so the two bugs cancelled each other out and things
worked well, but our Linux driver believed the AP ... :)

Certainly there can be more things like that still; I just started on
the TCP side and ran into the queueing behaviour that I cannot
explain.

In any case, I'll try to dig deeper into the TCP stack to understand
the reason for this transmit behaviour.

Thanks,
johannes

(*) Hmm. Now I have another idea. Maybe we have some kind of problem
with the medium access configuration, and we transmit all this data
without the AP having a chance to send back all the ACKs? Too bad I
can't put an air sniffer into the setup - it's a conducted setup.

(**) As another aside to this, the next generation HW after this will
have 256 frames in a block-ack, so instead of up to 64 frames (we only
use 63 for internal reasons) aggregated together we'll be able to
aggregate 256 (or maybe again only 255?). Each one of those frames may
be an A-MSDU with ~11k content though (only 8k in the setup I have
here right now), which means we can get a LOT of data into a single
PPDU ... we'll probably have to bump the sk_pacing_shift limit again
to be able to fill that with a single TCP stream, though since we run
all our performance numbers with many streams, maybe we should just
leave it :)
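A quick sizing sketch for that aside, using only the numbers from this
mail (63/255 frames, ~8k/~11k A-MSDUs, and the rough budget at
sk_pacing_shift 7 from above), so purely illustrative - the helper is
again just for the sketch:

    # Rough A-MPDU payload sizes vs. what one stream may queue below TCP.
    def max_ampdu_bytes(frames, amsdu_bytes):
        return frames * amsdu_bytes

    today    = max_ampdu_bytes(63, 8 * 1024)     # current: 63 frames x ~8k A-MSDU
    next_gen = max_ampdu_bytes(255, 11 * 1024)   # next-gen: 255 frames x ~11k A-MSDU

    tsq_at_shift7 = 1800e6 / 8 / 128             # ~1.76 MB queued below TCP, see above

    print(f"today:    ~{today / 1e6:.2f} MB per PPDU")     # ~0.52 MB
    print(f"next-gen: ~{next_gen / 1e6:.2f} MB per PPDU")  # ~2.87 MB
    print(f"budget at shift 7: ~{tsq_at_shift7 / 1e6:.2f} MB")

So a single stream's ~1.8MB budget wouldn't fill even one maximal
next-gen PPDU on its own, which is why the limit would have to go up
again for the single-stream case.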