Received: by 2002:a05:6358:9144:b0:117:f937:c515 with SMTP id r4csp905630rwr; Thu, 4 May 2023 11:07:53 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ6xfsiPllaUa4DMqmFiZIBNPtFytegI1ksb6XVA8UGAinjRJMmgLDeWurE8rI1meyibX9p7 X-Received: by 2002:a17:90a:d482:b0:24e:4a1a:3994 with SMTP id s2-20020a17090ad48200b0024e4a1a3994mr3080957pju.3.1683223673666; Thu, 04 May 2023 11:07:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1683223673; cv=none; d=google.com; s=arc-20160816; b=vc/d2MKSAAFOFjiUouYazz8FNxGSAKrlN01lFzW2XTCbBKG2IfgOSBW/WNiLN/Sy6/ tP6xT3398WlMbDE6fUxfnprOnxScgyBZucj74OZ7T1aP75gcyGakvyzhEOXbBdLGYn/r OqN+uTNpdWnWHR+M6b8Qllwn/5epvhw6h6fXIHeOYPYwuyEtM4tsLfogmvMPoBYdQ8UJ 20xcMA7nQ3RIYqFDcpWgkQYdqsCp69JA8ThylSZ6X7wk1NFiPIC6Ijxrlf1S9UwCVrlg DIPYqAxiczv/OyyQ/lQ5trNYPeIrVPbo+NtzXj8fuAl4BZEDMsioAeogVNtnSDDiabGz TXvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:in-reply-to:content-disposition:mime-version :references:message-id:subject:cc:to:from:date; bh=kYGmSvNgFDY7g35LY35L0BjxtnZwRN1BFpVvX5bSjqA=; b=NrYt0ot/IBHn7Sv8SqKKzJmV1Y8XckgEkMM/AJXP+Go5ACUya3kyHjvvLdhrUJLnKt +2ejY9VXfa5nt4Y/r6dhVUsGYgBLiA3r2o90uv5rDMmrkz9TZ+cSOWme9s3sTVL6xbAv si/lXCqnuql30kb0M7scGsiW8tRP441Wmeela4pJ7irj24fwtid41Eof+uO129Qmn+ex 7lFvA+jN05rwFgr8o/Wb8cPMyabtsU7+7jT/fERyIJ1uqxyS+ZwzAErAeMhBSlLD7hUz 0Be62wkSxaqvsHeOPK+Rzodyvke4TeLZrkD+VZH17W42kAXjNjNSEM4sLP+F4y6G4eQ+ 1wtw== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org Return-Path: Received: from out1.vger.email (out1.vger.email. [2620:137:e000::1:20]) by mx.google.com with ESMTP id mq5-20020a17090b380500b0024b5a028e7asi17590401pjb.47.2023.05.04.11.07.23; Thu, 04 May 2023 11:07:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) client-ip=2620:137:e000::1:20; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-wireless-owner@vger.kernel.org designates 2620:137:e000::1:20 as permitted sender) smtp.mailfrom=linux-wireless-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S229892AbjEDRzh (ORCPT + 62 others); Thu, 4 May 2023 13:55:37 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48230 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S229626AbjEDRze (ORCPT ); Thu, 4 May 2023 13:55:34 -0400 Received: from mail-pl1-f172.google.com (mail-pl1-f172.google.com [209.85.214.172]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4ECA92694; Thu, 4 May 2023 10:55:33 -0700 (PDT) Received: by mail-pl1-f172.google.com with SMTP id d9443c01a7336-1aaea43def7so5419155ad.2; Thu, 04 May 2023 10:55:33 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1683222933; x=1685814933; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kYGmSvNgFDY7g35LY35L0BjxtnZwRN1BFpVvX5bSjqA=; b=FGxtAsbS7c91Wes4epS1OwkuW/gKvQO1KCveikOJ0pePTF1iuC7n7O8/8AY89Tsfb1 lY+0iw/eD3guAJiFbd1BoydIZgGZ6ph78pZPPat4mBzNhgFsub0tI6QjdDn+5pt+wvTe Waf4EZVMc5BJ4pNl2hn/O73KILcXY9Kg11hDqiGFcaYK0WC4Z4FO7lxT2wiKUIc2XStA J/ZE2U/ff9eizeqjMXDgCF52quR7iUkz9GM/6IpOdezDdeaBsGQdymHhJL0PQao4fwV4 XzI+3jqCNRWd/HdjdmTKwUZrxTwOcx6aFVBG5MN7jZZbIRJCwIrkaobahFMqAfxvhJYF eZkg== X-Gm-Message-State: AC+VfDzYUU4AXQ84MjH0SUrR8sB2jAahKCsU5PLOUbLCjqY8tDRNbwdA ouNXVKWpMu8gVeQ+v09ZTLo= X-Received: by 2002:a17:902:d50a:b0:1a6:5fa2:3293 with SMTP id b10-20020a170902d50a00b001a65fa23293mr5686816plg.56.1683222932569; Thu, 04 May 2023 10:55:32 -0700 (PDT) Received: from sultan-box.localdomain ([142.147.89.230]) by smtp.gmail.com with ESMTPSA id f14-20020a170902ab8e00b001aafe56ea70sm7076770plr.5.2023.05.04.10.55.31 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 04 May 2023 10:55:32 -0700 (PDT) Date: Thu, 4 May 2023 10:55:29 -0700 From: Sultan Alsawaf To: Johannes Berg Cc: "Greenman, Gregory" , Kalle Valo , "David S. Miller" , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "Goodstein, Mordechay" , "Coelho, Luciano" , "Sisodiya, Mukesh" , "linux-wireless@vger.kernel.org" , "netdev@vger.kernel.org" , "linux-kernel@vger.kernel.org" Subject: Re: [PATCH] wifi: iwlwifi: Fix spurious packet drops with RSS Message-ID: References: <20230430001348.3552-1-sultan@kerneltoast.com> <8d2b0aec270b8cd0111654dc4b361987a112d3ce.camel@sipsolutions.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <8d2b0aec270b8cd0111654dc4b361987a112d3ce.camel@sipsolutions.net> X-Spam-Status: No, score=-1.4 required=5.0 tests=BAYES_00, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-wireless@vger.kernel.org On Thu, May 04, 2023 at 02:10:50PM +0200, Johannes Berg wrote: > [let's see if my reply will make it to the list, the original seems to > not have] > > On Sun, 2023-04-30 at 00:13 +0000, Sultan Alsawaf wrote: > > From: Sultan Alsawaf > > > > When RSS is used and one of the RX queues lags behind others by more than > > 2048 frames, then new frames arriving on the lagged RX queue are > > incorrectly treated as old rather than new by the reorder buffer, and are > > thus spuriously dropped. This is because the reorder buffer treats frames > > as old when they have an SN that is more than 2048 away from the head SN, > > which causes the reorder buffer to drop frames that are actually valid. > > > > The odds of this occurring naturally increase with the number of > > RX queues used, so CPUs with many threads are more susceptible to > > encountering spurious packet drops caused by this issue. > > > > As it turns out, the firmware already detects when a frame is either old or > > duplicated and exports this information, but it's currently unused. Using > > these firmware bits to decide when frames are old or duplicated fixes the > > spurious drops. > > So I assume you tested it now, and it works? Somehow I had been under > the impression we never got it to work back when... Yep, I've been using this for about a year and have let it run through the original iperf3 reproducer I mentioned on bugzilla for hours with no stalls. My big git clones don't freeze anymore either. :) What I wasn't able to get working was the big reorder buffer cleanup that's made possible by using these firmware bits. The explicit queue sync can be removed easily, but there were further potential cleanups you had mentioned that I wasn't able to get working. I hadn't submitted this patch until now because I was hoping to get the big cleanup done simultaneously but I got too busy until now. Since this small patch does fix the issue, my thought is that this could be merged and sent to stable, and with subsequent patches I can chip away at cleaning up the reorder buffer. > > Johannes mentions that the 9000 series' firmware doesn't support these > > bits, so disable RSS on the 9000 series chipsets since they lack a > > mechanism to properly detect old and duplicated frames. > > Indeed, I checked this again, I also somehow thought it was backported > to some versions but doesn't look like. We can either leave those old > ones broken (they only shipped with fewer cores anyway), or just disable > it as you did here, not sure. RSS is probably not as relevant with those > slower speeds anyway. Agreed, I think it's worth disabling RSS on 9000 series to fix it there. If the RX queues are heavily backed up and incoming packets are not released fast enough due to a slow CPU, then I think the spurious drops could happen somewhat regularly on slow devices using 9000 series. It's probably also difficult to judge the impact/frequency of these spurious drops in the wild due to TCP retries potentially masking them. The issue can be very noticeable when a lot of packets are spuriously dropped at once though, so I think it's certainly worth the tradeoff to disable RSS on the older chipsets. > > +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c > > @@ -918,7 +918,6 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm, > > struct iwl_mvm_sta *mvm_sta; > > struct iwl_mvm_baid_data *baid_data; > > struct iwl_mvm_reorder_buffer *buffer; > > - struct sk_buff *tail; > > u32 reorder = le32_to_cpu(desc->reorder_data); > > bool amsdu = desc->mac_flags2 & IWL_RX_MPDU_MFLG2_AMSDU; > > bool last_subframe = > > @@ -1020,7 +1019,7 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm, > > rx_status->device_timestamp, queue); > > > > /* drop any oudated packets */ > > - if (ieee80211_sn_less(sn, buffer->head_sn)) > > + if (reorder & IWL_RX_MPDU_REORDER_BA_OLD_SN) > > goto drop; > > > > /* release immediately if allowed by nssn and no stored frames */ > > @@ -1068,24 +1067,12 @@ static bool iwl_mvm_reorder(struct iwl_mvm *mvm, > > return false; > > } > > All that "send queue sync" code in the middle that was _meant_ to fix > this issue but I guess never really did can also be removed, no? And the > timer, etc. etc. Indeed, and removing the queue sync + timer are easy. Would you prefer I send additional patches for at least those cleanups before the fix itself can be considered for merging? Sultan