Date: Wed, 12 Jul 2017 11:46:17 -0700
From: "Paul E. McKenney"
To: Peter Zijlstra
Cc: Frederic Weisbecker, Christoph Lameter, "Li, Aubrey", Andi Kleen,
	Aubrey Li, tglx@linutronix.de, len.brown@intel.com,
	rjw@rjwysocki.net, tim.c.chen@linux.intel.com,
	arjan@linux.intel.com, yang.zhang.wz@gmail.com, x86@kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
Reply-To: paulmck@linux.vnet.ibm.com
In-Reply-To: <20170712171756.e3fnc3waanbaiiss@hirez.programming.kicks-ass.net>
Message-Id: <20170712184617.GZ2393@linux.vnet.ibm.com>

On Wed, Jul 12, 2017 at 07:17:56PM +0200, Peter Zijlstra wrote:
> On Wed, Jul 12, 2017 at 08:54:58AM -0700, Paul E. McKenney wrote:
> > On Wed, Jul 12, 2017 at 02:22:49PM +0200, Peter Zijlstra wrote:
> > > On Tue, Jul 11, 2017 at 11:09:31AM -0700, Paul E. McKenney wrote:
> > > > On Tue, Jul 11, 2017 at 06:34:22PM +0200, Peter Zijlstra wrote:
> > > > > Also, RCU_FAST_NO_HZ will make a fairly large difference here.. Paul
> > > > > what's the state of that thing, do we actually want that or not?
> > > >
> > > > If you are battery powered and don't have tight real-time latency
> > > > constraints, you want it -- it has represented a 30-40% boost in
> > > > battery lifetime for some low-utilization battery-powered devices.
> > > > Otherwise, probably not.
> > >
> > > Would it make sense to hook that off of tick_nohz_idle_enter(); in
> > > specific the part where we actually stop the tick; instead of every
> > > idle?
> > The actions RCU takes on RCU_FAST_NO_HZ depend on the current state
> > of the CPU's callback lists, so it seems to me that the decision has
> > to be made on each idle entry.
> >
> > Now it might be possible to make the checks more efficient, and doing
> > that is on my list.
> >
> > Or am I missing your point?
>
> Could be I'm just not remembering how all that works.. But I was
> wondering if we can do the expensive bits if we've decided to actually
> go NOHZ and avoid doing it on every idle entry.
>
> IIRC the RCU fast NOHZ bits try and flush the callback list (or paw it
> off to another CPU?) such that we can go NOHZ sooner. Having a !empty
> callback list avoid NOHZ from happening.

The code did indeed attempt to flush the callback list back in the day,
but that proved not to actually save any power.  There have been several
variations since then, but what the code does now is check whether there
are callbacks at rcu_needs_cpu() time:

1.	If there are none, RCU tells the caller that it doesn't need
	the CPU.

2.	If there are some, and some of them are non-lazy (as in doing
	something other than just freeing memory), RCU updates its idea
	of which grace period the callbacks are waiting for but
	otherwise leaves the callbacks alone, and returns saying that
	it needs the CPU again in about four jiffies (by default),
	rounded to allow one wakeup to handle all CPUs in the power
	domain.  Use the rcu_idle_gp_delay boot/sysfs parameter to
	adjust the wait duration if required.  (I haven't heard of
	adjustment ever being required.)

	Note that a non-lazy callback might well be synchronize_rcu(),
	so we cannot wait too long, or we will be delaying things too
	much.

3.	If there are some callbacks, and all of them are lazy, RCU
	again updates its idea of which grace period the callbacks are
	waiting for but otherwise leaves the callbacks alone, and
	returns saying that it needs the CPU around six seconds (by
	default) in the future, using round_jiffies(), again to share
	wakeups within a power domain.  Use the rcu_idle_lazy_gp_delay
	boot/sysfs parameter to adjust the wait, and again, as far as
	I know adjustment has never been necessary.

When the CPU is awakened, it will update its callbacks based on any
grace periods that have elapsed in the meantime.

There is a bit of work later at rcu_idle_enter() time, but it is quite
small.

> Now if we've already decided we can't in fact go NOHZ due to other
> concerns, flushing the callback list is pointless work. So I'm thinking
> we can find a better place to do this.

True, if the tick will still be happening, there is little point in
bothering RCU about it.  And if CPUs tend to go idle with RCU callbacks
queued, then it would be cheaper to check arch_needs_cpu() and
irq_work_needs_cpu() first.  If CPUs tend to be free of callbacks when
they go idle, this won't help, and might be counterproductive.

But if rcu_needs_cpu() or rcu_prepare_for_idle() is showing up on
profiles, I could adjust things.  This would include making
rcu_prepare_for_idle() no longer expect that rcu_needs_cpu() had
previously been called on the current path to idle.  (Not a big deal,
just that the obvious change to tick_nohz_stop_sched_tick() won't
necessarily do what you want.)

So please let me know if rcu_needs_cpu() or rcu_prepare_for_idle() are
prominent contributors to to-idle latency.

							Thanx, Paul
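
P.S.  For anyone following along, here is a minimal standalone C model
of the rcu_needs_cpu() decision walked through in cases 1-3 above.
This is a sketch, not the kernel implementation: the struct, the field
names, the jiffy constants, and model_rcu_needs_cpu() itself are
illustrative assumptions, and the real code also records which grace
period the callbacks are waiting for.

	#include <stdbool.h>
	#include <stdio.h>

	/* Illustrative defaults; the kernel's rcu_idle_gp_delay and
	 * rcu_idle_lazy_gp_delay parameters play these roles. */
	#define IDLE_GP_DELAY		4	/* jiffies, non-lazy case */
	#define IDLE_LAZY_GP_DELAY	6000	/* ~six seconds, assuming HZ=1000 */

	struct cb_state {
		int nr_callbacks;	/* callbacks queued on this CPU */
		int nr_nonlazy;		/* those doing more than freeing memory */
	};

	/*
	 * Returns true if RCU needs this CPU; *wake_in says how many
	 * jiffies the CPU may sleep before RCU wants it back.  The
	 * grace-period bookkeeping done in cases 2 and 3 is elided.
	 */
	static bool model_rcu_needs_cpu(const struct cb_state *cb,
					unsigned long *wake_in)
	{
		if (cb->nr_callbacks == 0)
			return false;	/* case 1: no callbacks, sleep indefinitely */

		if (cb->nr_nonlazy > 0)
			*wake_in = IDLE_GP_DELAY;	/* case 2: synchronize_rcu() may be waiting */
		else
			*wake_in = IDLE_LAZY_GP_DELAY;	/* case 3: only memory-freeing callbacks */

		/* The kernel also rounds the wakeup so that one wakeup
		 * can service all CPUs in a power domain. */
		return true;
	}

	int main(void)
	{
		struct cb_state cb = { .nr_callbacks = 3, .nr_nonlazy = 1 };
		unsigned long wake_in;

		if (model_rcu_needs_cpu(&cb, &wake_in))
			printf("RCU needs the CPU again in ~%lu jiffies\n", wake_in);
		else
			printf("RCU does not need the CPU\n");
		return 0;
	}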
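
P.P.S.  And a sketch of the check ordering discussed above, where the
cheap predicates run before the RCU one so that rcu_needs_cpu() is
skipped whenever the tick cannot be stopped anyway.  can_stop_tick()
and the zero-argument stubs here are simplifications standing in for
what tick_nohz_stop_sched_tick() actually does:

	#include <stdbool.h>
	#include <stdio.h>

	/* Stubs standing in for the kernel predicates of the same names. */
	static bool arch_needs_cpu(void)     { return false; }
	static bool irq_work_needs_cpu(void) { return false; }
	static bool rcu_needs_cpu(void)      { return false; }	/* the expensive one */

	/*
	 * Consult the cheap reasons to keep the tick first; only when
	 * none of them fires do we pay for the RCU callback-list check.
	 */
	static bool can_stop_tick(void)
	{
		if (arch_needs_cpu() || irq_work_needs_cpu())
			return false;	/* tick stays anyway; don't bother RCU */
		if (rcu_needs_cpu())
			return false;	/* RCU wants a near-term wakeup */
		return true;
	}

	int main(void)
	{
		printf("tick can%s be stopped\n", can_stop_tick() ? "" : "not");
		return 0;
	}

As noted above, this ordering wins only if CPUs often go idle with
callbacks queued, and rcu_prepare_for_idle() currently expects
rcu_needs_cpu() to have run on the path to idle, so short-circuiting
the RCU check would need care.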