Message-ID: <18c406b6-ca57-b8e4-e60c-f4f0186ed392@leemhuis.info>
Date: Thu, 9 Mar 2023 08:59:24 +0100
From: "Linux regression tracking (Thorsten Leemhuis)"
Reply-To: Linux regressions mailing list
To: Alexander Wetzel, Linux regressions mailing list, Felix Fietkau
Cc: "linux-wireless@vger.kernel.org", LKML, Thomas Mann, Stanislaw Gruszka, Helmut Schaa, Johannes Berg
Subject: Re: [Regression] rt2800usb - Wifi performance issues and connection drops
References: <5a7cd098-1d83-6297-e802-ce998c8ec116@leemhuis.info> <6025e17e-4c29-6d36-6b9c-2fec543b21c4@wetzel-home.de> <4a02173f-3a60-0a7e-8962-3778e6c55bf3@nbd.name> <42185fa2-4191-fcf5-9c0f-fd7098bb856b@nbd.name> <2246a9d5-789d-08c9-f6a7-fb9db2edfe9f@leemhuis.info>
X-Mailing-List: linux-kernel@vger.kernel.org

On 08.03.23 17:50, Alexander Wetzel wrote:
> On 08.03.23 13:21, Linux regression tracking (Thorsten Leemhuis) wrote:
>> On 08.03.23 12:57, Felix Fietkau wrote:
>>> On 08.03.23 12:41, Alexander Wetzel wrote:
>>>> On 08.03.23 08:52, Felix Fietkau wrote:
>>>>>> I'm also planning to provide some more debug patches, to figure out
>>>>>> which part of commit 4444bc2116ae ("wifi: mac80211: Proper mark iTXQs
>>>>>> for resumption") fixes the issue for you. Assuming my understanding
>>>>>> above is correct the patch should not really fix/break anything for
>>>>>> you... With the findings above I would have expected your git bisect to
>>>>>> identify commit a790cc3a4fad ("wifi: mac80211: add wake_tx_queue
>>>>>> callback to drivers") as the first broken commit...
>>>>> I can't point to any specific series of events where it would go
>>>>> wrong, but I suspect that the problem might be the fact that you're
>>>>> doing tx scheduling from within ieee80211_handle_wake_tx_queue. I
>>>>> don't see how it's properly protected from potentially being called
>>>>> on different CPUs concurrently.
>>>>> Back when I was debugging some iTXQ issues in mt76, I also had
>>>>> problems when tx scheduling could happen from multiple places. My
>>>>> solution was to have a single worker thread that handles tx, which is
>>>>> scheduled from the wake_tx_queue op.
>>>>> Maybe you could do something similar in mac80211 for non-iTXQ drivers.
>>>> I think it's already doing all of that:
>>>> ieee80211_handle_wake_tx_queue() is the mac80211 implementation for the
>>>> wake_tx_queue op. The drivers without native iTXQ support simply
>>>> link it to this handler.
>>> I know.
>>> The problem I see is that I can't find anything that guarantees
>>> that .wake_tx_queue_op is not being called concurrently from multiple
>>> different places. ieee80211_handle_wake_tx_queue is doing the scheduling
>>> directly, instead of deferring it to a single workqueue/tasklet/thread,
>>> and multiple concurrent calls to it could potentially cause issues.
>>
>> Alexander, Felix, many thx for looking into this.
>>
>> This more and more sounds like something that might take a while to get
>> fixed, which makes it harder to get this fixed within those time-frames
>> Documentation/process/handling-regressions.rst outlines. So please allow
>> me to ask:
>>
>> Is reverting the culprit (and reapplying it later once the real cause is
>> found and fixed) an option, or would that cause other regressions?
>
> This patch turned out to fix a (much worse) pre-release regression. See
> e.g.
> https://lore.kernel.org/linux-wireless/7cff27f8-d363-bbfb-241e-8d6fc0009c40@leemhuis.info/T/#t

Uggh, thx for the update, that's unfortunate, but that's how it is
sometimes. I just asked because the culprit didn't have a Reported-by:
tag or a Link: to the backstory, so it looked like it might be fine to
revert. But then it's not an option.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
-- 
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.