Date: Sun, 23 Jun 2019 18:38:34 +0200 (CEST)
From: Thomas Gleixner
To: Zhiqiang Liu
Cc: corbet@lwn.net, mcgrof@kernel.org, Kees Cook, akpm@linux-foundation.org,
    manfred@colorfullife.com, jwilk@jwilk.net, dvyukov@google.com,
    feng.tang@intel.com, sunilmut@microsoft.com, quentin.perret@arm.com,
    linux@leemhuis.info, alex.popov@linux.com, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
    "wangxiaogang (F)", "Zhoukang (A)", Mingfangsen, tedheadster@gmail.com,
    Eric Dumazet
Subject: Re: [PATCH next] softirq: enable MAX_SOFTIRQ_TIME tuning with sysctl max_softirq_time_usecs

Zhiqiang,

On Thu, 20 Jun 2019, Zhiqiang Liu wrote:

> From: Zhiqiang liu
>
> In __do_softirq func, MAX_SOFTIRQ_TIME was set to 2ms via experimentation by
> commit c10d73671 ("softirq: reduce latencies") in 2013, which was designed
> to reduce latencies for various network workloads. The key reason is that the
> maximum number of microseconds in one NAPI polling cycle in net_rx_action func
> was set to 2 jiffies, so different HZ settings will lead to different latencies.
>
> However, commit 7acf8a1e8 ("Replace 2 jiffies with sysctl netdev_budget_usecs
> to enable softirq tuning") adopts netdev_budget_usecs to tune the maximum number
> of microseconds in one NAPI polling cycle. So the latencies of net_rx_action can
> be controlled by sysadmins to cope with hardware changes over time.

So much for the theory. See below.

> Correspondingly, MAX_SOFTIRQ_TIME should be tunable by sysadmins, who know
> best about hardware performance, for the expected tradeoff between latency
> and fairness.
>
> Here, we add the sysctl variable max_softirq_time_usecs to replace
> MAX_SOFTIRQ_TIME, with a default value of 2ms.

...

>   */
> -#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)
> +unsigned int __read_mostly max_softirq_time_usecs = 2000;
>  #define MAX_SOFTIRQ_RESTART 10
>
>  #ifdef CONFIG_TRACE_IRQFLAGS
> @@ -248,7 +249,8 @@ static inline void lockdep_softirq_end(bool in_hardirq) { }
>
>  asmlinkage __visible void __softirq_entry __do_softirq(void)
>  {
> -	unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
> +	unsigned long end = jiffies +
> +		usecs_to_jiffies(max_softirq_time_usecs);

That's still jiffies based and therefore depends on CONFIG_HZ. Any budget
value will be rounded up to the next jiffy. So in the case of HZ=100 and
time=1000us this will still result in 10ms of allowed loop time.
I'm not saying that we must use a more fine grained time source, but both
the changelog and the sysctl documentation are misleading. If we keep it
jiffies based, then microseconds do not make any sense. They just give a
false sense of controllability.

Also keep in mind that with jiffies the accuracy depends on the distance to
the next tick when 'end' is evaluated. The next tick might be imminent, so
the effective loop time can be up to one tick shorter than the rounded-up
budget.

All of that information needs to be in the documentation.

> +	{
> +		.procname	= "max_softirq_time_usecs",
> +		.data		= &max_softirq_time_usecs,
> +		.maxlen		= sizeof(unsigned int),
> +		.mode		= 0644,
> +		.proc_handler	= proc_dointvec_minmax,
> +		.extra1		= &zero,
> +	},

Zero as the lower limit? That means it allows a single loop. Fine, but that
needs to be documented as well.

Thanks,

	tglx