Date: Tue, 11 Jul 2017 18:09:27 +0200
From: Frederic Weisbecker
To: Peter Zijlstra, Christoph Lameter
Cc: "Li, Aubrey", Andi Kleen, Aubrey Li, tglx@linutronix.de,
	len.brown@intel.com, rjw@rjwysocki.net, tim.c.chen@linux.intel.com,
	arjan@linux.intel.com, paulmck@linux.vnet.ibm.com,
	yang.zhang.wz@gmail.com, x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
Message-ID: <20170711160926.GA18805@lerouge>
References: <1499650721-5928-1-git-send-email-aubrey.li@intel.com>
	<20170710084647.zs6wkl3fumszd33g@hirez.programming.kicks-ass.net>
	<20170710144609.GD31832@tassilo.jf.intel.com>
	<20170710164206.5aon5kelbisxqyxq@hirez.programming.kicks-ass.net>
	<20170710172705.GA3441@tassilo.jf.intel.com>
	<20170711094157.5xcwkloxnjehieqv@hirez.programming.kicks-ass.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20170711094157.5xcwkloxnjehieqv@hirez.programming.kicks-ass.net>
User-Agent: Mutt/1.5.24 (2015-08-30)
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jul 11, 2017 at 11:41:57AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 11, 2017 at 12:40:06PM +0800, Li, Aubrey wrote:
> > > On Mon, Jul 10, 2017 at 06:42:06PM +0200, Peter Zijlstra wrote:
>
> > >> Data to indicate what hurts how much would be a very good addition to
> > >> the Changelogs. Clearly you have some, you really should have shared.
> >
> > In the idle loop,
> >
> > - quiet_vmstat costs 5562ns - 6296ns
>
> Urgh, that thing is horrible, also I think its placed wrong. The comment
> near that function says it should be called when we enter NOHZ.
>
> Which suggests something like so:
>
> ---
>  kernel/sched/idle.c      | 1 -
>  kernel/time/tick-sched.c | 1 +
>  2 files changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 6c23e30c0e5c..ef63adce0c9c 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -219,7 +219,6 @@ static void do_idle(void)
>  	 */
>
>  	__current_set_polling();
> -	quiet_vmstat();
>  	tick_nohz_idle_enter();
>
>  	while (!need_resched()) {
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index c7a899c5ce64..eb0e9753db8f 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -787,6 +787,7 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	if (!ts->tick_stopped) {
>  		calc_load_nohz_start();
>  		cpu_load_update_nohz_start();
> +		quiet_vmstat();

This patch seems to make sense. Christoph?

>
>  		ts->last_tick = hrtimer_get_expires(&ts->sched_timer);
>  		ts->tick_stopped = 1;
>
> > - tick_nohz_idle_enter costs 7058ns - 10726ns
> > - tick_nohz_idle_exit costs 8372ns - 20850ns
>
> Right, those are horrible expensive, but skipping them isn't 'hard', the
> only tricky bit is finding a condition that makes sense.

Note you can statically disable it with the nohz=0 boot parameter.

>
> See Mike's patch: https://patchwork.kernel.org/patch/2839221/
>
> Combined with the above, and possibly a better condition, that should
> get rid of most of this.

Such a patch could work well if the scheduler's decision not to stop the
tick happens on idle entry. Now if sched_needs_cpu() first allows the tick
to be stopped and then refuses later, at the end of an idle IRQ, this won't
have the desired effect. As long as ts->tick_stopped=1, it stays so until
we really restart the tick.
So the whole costly nohz machinery stays on.

I guess it doesn't matter though, as we are talking about making idle entry
fast, so the decision not to stop the tick is likely to be made once on
idle entry, when ts->tick_stopped=0.

One exception though: if the tick is already stopped when we enter idle
(full nohz case). And BTW, stopping the tick outside idle shouldn't be
affected here.

So I'd rather put that in can_stop_idle_tick().

>
> > - totally from arch_cpu_idle_enter entry to arch_cpu_idle_exit return costs
> >   9122ns - 15318ns.
> > --In this period, rcu_idle_enter costs 1985ns - 2262ns, rcu_idle_exit costs
> >   1813ns - 3507ns
>
> Is that the POPF being painful? or something else?

Probably that and the atomic_add_return().

Thanks.