Date: Wed, 6 Apr 2022 15:54:28 +0100
From: Matthew Wilcox
To: Liao Chang
Cc: mcgrof@kernel.org, keescook@chromium.org, yzaikin@google.com, tglx@linutronix.de, clg@kaod.org, nitesh@redhat.com, edumazet@google.com, peterz@infradead.org, joshdon@google.com, masahiroy@kernel.org, nathan@kernel.org, akpm@linux-foundation.org, vbabka@suse.cz, gustavoars@kernel.org, arnd@arndb.de, chris@chrisdown.name, dmitry.torokhov@gmail.com, linux@rasmusvillemoes.dk, daniel@iogearbox.net, john.ogness@linutronix.de, will@kernel.org, dave@stgolabs.net, frederic@kernel.org, linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, heying24@huawei.com, guohanjun@huawei.com, weiyongjun1@huawei.com
Subject: Re: [RFC 0/3] softirq: Introduce softirq throttling
In-Reply-To: <20220406025241.191300-1-liaochang1@huawei.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 06, 2022 at 10:52:38AM +0800, Liao Chang wrote:
> The kernel checks for pending softirqs periodically, at a few points in
> kernel code such as irq_exit() and __local_bh_enable_ip(). Softirqs that
> have been activated on a given CPU must be executed on the same CPU. This
> characteristic of softirqs is potentially "dangerous", because one CPU
> might end up very busy while the others are mostly idle.
>
> This concern has been borne out in a networking use case: recently, our
> engineers found that the time taken to re-establish a connection on
> kernel v5.10 is 300 times longer than on v4.19, while softirqs consume
> almost 99% of the CPU. The problem arises when the connection between
> the Sender and Receiver nodes is lost: the NIC driver on the Sender node
> keeps raising the NET_TX softirq until the connection recovers. The
> system log shows that most softirqs are executed from
> __local_bh_enable_ip(). Since __local_bh_enable_ip() is used widely in
> kernel code, softirqs can easily consume most of the CPU, and the
> user-mode application cannot obtain enough CPU cycles to re-establish
> the connection quickly.

Shouldn't you fix that bug instead? This seems like papering over the
bad effects of a bug and would make it harder to find bugs like this in
the future. Essentially, it's the same as a screaming hardware interrupt,
except that it's a software interrupt, so we can fix the bug instead of
working around broken hardware.
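The per-CPU behaviour described in the quoted cover letter can be illustrated with a small user-space model. This is a sketch under stated assumptions, not kernel code: every `model_*` name, the `net_tx_stuck` flag and the CPU/vector counts are hypothetical. The 10-pass cap mirrors `MAX_SOFTIRQ_RESTART` in `kernel/softirq.c`; the model omits the real code's 2 ms time limit and `need_resched()` check, and represents the hand-off to ksoftirqd as a counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical vector numbers for this model only; the real kernel
 * defines more (HI_SOFTIRQ, TIMER_SOFTIRQ, NET_TX_SOFTIRQ, ...). */
enum { MODEL_NET_TX = 0, MODEL_NET_RX = 1, MODEL_NR_SOFTIRQS = 2 };

#define MODEL_NR_CPUS 4
/* Assumed to mirror MAX_SOFTIRQ_RESTART (10) in kernel/softirq.c. */
#define MODEL_MAX_RESTART 10

struct model_cpu {
    unsigned int pending;   /* bitmask of raised softirqs on this CPU */
    unsigned long handled;  /* handler invocations on this CPU */
    unsigned long deferred; /* times leftover work was punted to "ksoftirqd" */
};

static struct model_cpu cpus[MODEL_NR_CPUS];

/* Hypothetical: when true, NET_TX re-raises itself on every run, like a
 * driver that keeps retrying transmission on a lost connection. */
static bool net_tx_stuck;

/* Raising a softirq only marks it pending on the given (local) CPU;
 * nothing in this model ever moves the work to another CPU. */
static void model_raise_softirq(int cpu, int nr)
{
    assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
    assert(nr >= 0 && nr < MODEL_NR_SOFTIRQS);
    cpus[cpu].pending |= 1u << nr;
}

static void model_handler(int cpu, int nr)
{
    cpus[cpu].handled++;
    if (nr == MODEL_NET_TX && net_tx_stuck)
        model_raise_softirq(cpu, MODEL_NET_TX);
}

/* Models a check point such as irq_exit() or __local_bh_enable_ip():
 * run everything pending on this CPU, restart while handlers raise new
 * work, and give up after MODEL_MAX_RESTART passes, leaving the rest
 * pending (the real kernel calls wakeup_softirqd() at that point). */
static void model_do_softirq(int cpu)
{
    int restarts = MODEL_MAX_RESTART;

    while (cpus[cpu].pending) {
        unsigned int pending = cpus[cpu].pending;

        cpus[cpu].pending = 0; /* handlers may re-raise during this pass */
        for (int nr = 0; nr < MODEL_NR_SOFTIRQS; nr++)
            if (pending & (1u << nr))
                model_handler(cpu, nr);

        if (cpus[cpu].pending && --restarts == 0) {
            cpus[cpu].deferred++;
            return;
        }
    }
}
```

In this model, a well-behaved handler drains in a single pass. A self-re-raising NET_TX handler is bounded to 10 passes per check point, but every later check point on the same CPU runs another 10 passes while the other CPUs stay idle, which is loosely analogous to the "screaming interrupt" comparison in the reply.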