Subject: Re: [PATCH] ath10k: Restart xmit queues below low-water mark.
To: Toke Høiland-Jørgensen, linux-wireless@vger.kernel.org
References: <20200427145435.13151-1-greearb@candelatech.com> <87h7x3v1tn.fsf@toke.dk> <87a72vuyyn.fsf@toke.dk>
From: Ben Greear
Date: Tue, 28 Apr 2020 14:18:42 -0700
In-Reply-To: <87a72vuyyn.fsf@toke.dk>
List-ID: <linux-wireless.vger.kernel.org>

On 04/28/2020 01:39 PM, Toke Høiland-Jørgensen wrote:
> Ben Greear writes:
>
>> On 04/28/2020 12:37 PM, Toke Høiland-Jørgensen wrote:
>>> greearb@candelatech.com writes:
>>>
>>>> From: Ben Greear
>>>>
>>>> While running tcp upload + download tests with ~200
>>>> concurrent TCP streams, 1-2 processes, and 30 station
>>>> vdevs, I noticed that the __ieee80211_stop_queue was taking
>>>> around 20% of the CPU according to perf-top, with other locking
>>>> taking an additional ~15%.
>>>>
>>>> I believe the issue is that the ath10k driver would unlock the
>>>> txqueue when a single frame could be transmitted, instead of
>>>> waiting for a low-water mark.
>>>>
>>>> So, this patch adds a low-water mark that is 1/4 of the total
>>>> tx buffers allowed.
>>>>
>>>> This appears to resolve the performance problem that I saw.
>>>>
>>>> Tested with recent wave-1 ath10k-ct firmware.
>>>>
>>>> Signed-off-by: Ben Greear
>>>> ---
>>>>  drivers/net/wireless/ath/ath10k/htt.h    | 1 +
>>>>  drivers/net/wireless/ath/ath10k/htt_tx.c | 8 ++++++--
>>>>  2 files changed, 7 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt.h b/drivers/net/wireless/ath/ath10k/htt.h
>>>> index 31c4ddbf45cb..b5634781c0dc 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/htt.h
>>>> +++ b/drivers/net/wireless/ath/ath10k/htt.h
>>>> @@ -1941,6 +1941,7 @@ struct ath10k_htt {
>>>>
>>>>  	u8 target_version_major;
>>>>  	u8 target_version_minor;
>>>> +	bool needs_unlock;
>>>>  	struct completion target_version_received;
>>>>  	u8 max_num_amsdu;
>>>>  	u8 max_num_ampdu;
>>>> diff --git a/drivers/net/wireless/ath/ath10k/htt_tx.c b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>> index 9b3c3b080e92..44795d9a7c0c 100644
>>>> --- a/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>> +++ b/drivers/net/wireless/ath/ath10k/htt_tx.c
>>>> @@ -145,8 +145,10 @@ void ath10k_htt_tx_dec_pending(struct ath10k_htt *htt)
>>>>  	lockdep_assert_held(&htt->tx_lock);
>>>>
>>>>  	htt->num_pending_tx--;
>>>> -	if (htt->num_pending_tx == htt->max_num_pending_tx - 1)
>>>> +	if ((htt->num_pending_tx <= (htt->max_num_pending_tx / 4)) && htt->needs_unlock) {
>>>
>>> Why /4? Seems a bit arbitrary?
>>
>> Yes, arbitrary for sure. I figure restart filling the queue when 1/4
>> full so that it is unlikely to run dry. Possibly it should restart
>> sooner to keep it more full on average?
>
> Theoretically, the "keep the queue at the lowest possible level that
> keeps it from underflowing" is what BQL is supposed to do. The diff
> below uses the dynamic adjustment bit (from dynamic_queue_limits.h) in
> place of num_pending_tx. I've only compile tested it, and I'm a bit
> skeptical that it will work right for this, but if anyone wants to give
> it a shot, there it is.
>
> BTW, while doing that, I noticed there's a similar arbitrary limit in
> ath10k_mac_tx_push_pending() at max_num_pending_tx/2. So if you're going
> to keep the arbitrary limit maybe use the same one? :)
>
>> Before my patch, the behaviour would be to try to keep it as full as
>> possible, as in restart the queues as soon as a single slot opens up
>> in the tx queue.
>
> Yeah, that seems somewhat misguided as well, from a latency perspective,
> at least. But I guess that's what we're fixing with AQL. What does the
> firmware do with the frames queued within? Do they just go on a FIFO
> queue altogether, or something smarter?

Sort of like a mini-mac80211 stack inside the firmware is used to create
ampdu/amsdu chains and schedule them with its own scheduler.

For optimal throughput with 200 users streaming video, the ath10k driver
should think that it has only a few active peers wanting to send data at
a time (and so firmware would think the same), and the driver should be
fed a large chunk of pkts for those peers.  And then the next few peers.
That should let firmware send large ampdu/amsdu to each peer, increasing
throughput overall.

If you feed a few frames to each of the 200 peers, then even if firmware
has 2000 tx buffers, that is only 10 frames per peer at best, leading to
small ampdu/amsdu and thus worse overall throughput and utilization of
airtime.
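
To put rough numbers on that, here is a toy, standalone simulation of the
two fill policies (purely illustrative, not ath10k code; the peer count,
slot count, and burst size are made-up values matching the example above):

/*
 * Toy model of the two fill policies discussed above (illustrative only;
 * this is not ath10k code, and the peer/slot/burst numbers are made up).
 * Build and run with:  cc -o fill fill.c && ./fill
 */
#include <stdio.h>

#define NUM_PEERS 200   /* backlogged stations */
#define HW_SLOTS  2000  /* tx descriptors available to the firmware */
#define BURST     64    /* frames handed to one peer at a time */

/*
 * Fill the hardware queue by giving each peer 'chunk' frames in turn and
 * report how many frames a served peer ends up with -- a rough proxy for
 * the ampdu/amsdu size the firmware can build for that peer.
 */
static void fill(const char *name, int chunk)
{
	int frames[NUM_PEERS] = { 0 };
	int slots = HW_SLOTS, peer = 0, served = 0;

	while (slots > 0) {
		int n = chunk < slots ? chunk : slots;

		if (frames[peer] == 0)
			served++;
		frames[peer] += n;
		slots -= n;
		peer = (peer + 1) % NUM_PEERS;
	}

	printf("%-24s peers served: %3d   frames per served peer: %d\n",
	       name, served, HW_SLOTS / served);
}

int main(void)
{
	fill("one frame per peer:", 1);        /* small aggregates */
	fill("large per-peer bursts:", BURST); /* large aggregates */
	return 0;
}

Spreading the 2000 slots one frame at a time over 200 backlogged peers
leaves ~10 frames per peer to aggregate; per-peer bursts leave ~60.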
It would be nice to be able to set certain traffic flows to have the
throughput optimization and others to have the latency optimization.  For
instance, high latency on a streaming download is a good trade-off if it
increases total throughput.  The end device will have plenty of buffers
to handle the bursts of data.  And of course other traffic will benefit
from lower latency.

Maybe some of the AI folks training their AI to categorize cat pictures
could instead start categorizing traffic flows and adjusting the stack in
real time...

And now...back to the grind for me.

Thanks,
Ben

-- 
Ben Greear
Candela Technologies Inc  http://www.candelatech.com