From: Thomas Gleixner
To: Matthew Wilcox, Liao Chang
Cc: mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com,
    clg@kaod.org, nitesh@redhat.com, edumazet@google.com,
    peterz@infradead.org, joshdon@google.com, masahiroy@kernel.org,
    nathan@kernel.org, akpm@linux-foundation.org, vbabka@suse.cz,
    gustavoars@kernel.org, arnd@arndb.de, chris@chrisdown.name,
    dmitry.torokhov@gmail.com, linux@rasmusvillemoes.dk,
    daniel@iogearbox.net, john.ogness@linutronix.de, will@kernel.org,
    dave@stgolabs.net, frederic@kernel.org, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, heying24@huawei.com,
    guohanjun@huawei.com, weiyongjun1@huawei.com
Subject: Re: [RFC 0/3] softirq: Introduce softirq throttling
References: <20220406025241.191300-1-liaochang1@huawei.com>
Date: Thu, 07 Apr 2022 14:47:20 +0200
Message-ID: <877d81jc13.ffs@tglx>

On Wed, Apr 06 2022 at 15:54, Matthew Wilcox wrote:
> On Wed, Apr 06, 2022 at 10:52:38AM +0800, Liao Chang wrote:
>> The kernel checks for pending softirqs periodically, at a few points in
>> kernel code such as irq_exit() and __local_bh_enable_ip(). Softirqs
>> that have been activated by a given CPU must be executed on the same
>> CPU. This characteristic of softirqs is always a potentially
>> "dangerous" operation, because one CPU might end up very busy while
>> the others are mostly idle.
>>
>> The above concern is borne out by a networking use case: recently, our
>> engineers found that the time used for connection re-establishment on
>> kernel v5.10 is 300 times larger than on v4.19, while softirqs
>> monopolize almost 99% of the CPU. The problem stems from the connection
>> between the Sender and Receiver nodes getting lost: the NIC driver on
>> the Sender node keeps raising the NET_TX softirq until the connection
>> recovers. The system log shows that most softirqs are performed from
>> __local_bh_enable_ip(). Since __local_bh_enable_ip() is widely used in
>> kernel code, it is very easy to use up most of the CPU, and the
>> user-mode application can't obtain enough CPU cycles to re-establish
>> the connection quickly.
>
> Shouldn't you fix that bug instead? This seems like papering over the
> bad effects of a bug and would make it harder to find bugs like this in
> the future. Essentially, it's the same as a screaming hardware interrupt,
> except that it's a software interrupt, so we can fix the bug instead of
> working around broken hardware.

It's not necessarily broken hardware. It's a fundamental issue of our
softirq processing magic, which can happen in these contexts:

  1) On return from interrupt

  2) In local_bh_enable()

  3) In ksoftirqd

We have heuristics in place which delegate processing to ksoftirqd,
which brings softirq processing under scheduler control to some extent,
but those heuristics are rather easy to evade.
Delegation to ksoftirqd happens when the runtime of the __do_softirq()
loop exceeds a threshold. But if that threshold is not exceeded, you can
still get into a situation where softirq processing eats up a large
share of CPU time in #1 and #2, which is the real problem because it
prevents the scheduler from applying fairness.

That's a known issue, and attempts to fix it pop up on a regular basis.

There are several issues here:

  1) The runtime check in __do_softirq() is jiffies based, and depending
     on CONFIG_HZ it's easy to stay under the threshold for one
     invocation but still eat a large amount of CPU time.

     Also, the runtime check happens at the end of the loop, which means
     that if a single softirq callback runs too long, we still process
     all other pending ones.

  2) The decision to process softirqs directly on return from interrupt
     or in local_bh_enable() is error prone. The logic is:

       if (!ksoftirq_running() ||
           (local_softirq_pending() & (SOFTIRQ_HI | SOFTIRQ_TASKLET)))
               process_direct();

     The reason for the HI/TASKLET exception is documented here:

       3c53776e29f8 ("Mark HI and TASKLET softirq synchronous")

     But this is nasty because tasklets are widely used in networking
     and crypto, and quite a few of them rearm themselves when they take
     too long. See mlx5_cq_tasklet_cb() as one example, which also uses
     jiffies for time limiting...

     With the HI/TASKLET exception this means that there is no
     delegation to ksoftirqd, simply because softirqs are processed on
     the next return from interrupt or the next local_bh_enable() in
     task context. And this processes _all_ pending bits, not only the
     HI/TASKLET ones...

The approach of just not running softirqs at all via a throttle
mechanism, as proposed with these patches, is definitely wrong and
going nowhere.

The proper solution for load accounting is a moving average with
exponential decay based on sched_clock() and not on jiffies.
Such a decaying average gives a reasonable basis for deciding when to
enforce ksoftirqd processing, but of course it solves neither the
tasklet issues nor any of the other problems with softirqs.

We tried splitting ksoftirqd into different threads a couple of years
ago in RT, but that turned out to be problematic in some cases.

Frederic did some experiments to make local_bh_disable() take a mask
argument to disable only particular softirqs, but that ran into a dead
end and is problematic because quite some code, esp. networking, relies
on multiple softirqs being disabled.

Softirqs are semantically ill-defined, and that has been known for a
very long time, but of course they are convenient, and with a few hacks
piled on top to address the most urgent horrors they work by some
definition of work. IOW, we are accumulating technical debt at a fast
pace.

TBH, I have no really good plan for how to address this properly, but
it's about time to tackle this in a concerted effort.

Thanks,

        tglx