Date: Mon, 16 Nov 2015 17:41:03 -0800
From: Josh Triplett
To: "Paul E. McKenney"
Cc: Jacob Pan, Thomas Gleixner, Peter Zijlstra, Ingo Molnar,
	John Stultz, LKML, Arjan van de Ven, Srinivas Pandruvada,
	Len Brown, Rafael Wysocki, Eduardo Valentin, Paul Turner
Subject: Re: [PATCH 2/4] timer: relax tick stop in idle entry
Message-ID: <20151117014103.GA6629@x>
In-Reply-To: <20151116232640.GM5184@linux.vnet.ibm.com>

On Mon, Nov 16, 2015 at 03:26:40PM -0800, Paul E. McKenney wrote:
> On Mon, Nov 16, 2015 at 02:32:11PM -0800, Josh Triplett wrote:
> > On Mon, Nov 16, 2015 at 01:51:26PM -0800, Jacob Pan wrote:
> > > On Mon, 16 Nov 2015 16:06:57 +0100 (CET)
> > > Thomas Gleixner wrote:
> > >
> > > > > -0 [000] 30.093474: bprint:
> > > > > __tick_nohz_idle_enter: JPAN: tick_nohz_stop_sched_tick 609 delta
> > > > > 1000000 [JP] but sees delta is exactly 1 tick away. didn't stop
> > > > > tick.
> > > >
> > > > If the delta is 1 tick then it is not supposed to stop it. Did you
> > > > ever try to figure out WHY it is 1 tick?
> > > >
> > > > There are two code paths which can set it to basemono + TICK_NSEC:
> > > >
> > > > 	if (rcu_needs_cpu(basemono, &next_rcu) ||
> > > > 	    arch_needs_cpu() || irq_work_needs_cpu()) {
> > > > 		next_tick = basemono + TICK_NSEC;
> > > > 	} else {
> > > > 		next_tmr = get_next_timer_interrupt(basejiff, basemono);
> > > > 		ts->next_timer = next_tmr;
> > > > 		/* Take the next rcu event into account */
> > > > 		next_tick = next_rcu < next_tmr ? next_rcu : next_tmr;
> > > > 	}
> > > >
> > > > Can you please figure out WHY the tick is requested to continue
> > > > instead of blindly wrecking the logic in that code?
> > >
> > > It looks like it hits in both cases during forced idle.
> > > + Josh
> > > + Paul
> > >
> > > For the first case, it is always related to RCU. I found there are two
> > > CONFIG options to avoid this undesired tick in the idle loop:
> > > 1. enable CONFIG_RCU_NOCB_CPU_ALL, offloading callbacks to rcuo kthreads
> > > 2. or enable CONFIG_RCU_FAST_NO_HZ (enter dyntick idle with RCU
> > >    callbacks pending)
> > >
> > > Either one works, but my concern is that users may not realize the
> > > intricate CONFIG_ options and how they translate into energy savings.
> > > After consulting with Josh, it seems we could add a check here to
> > > recognize the forced idle state and relax rcu_needs_cpu() to return
> > > false even if it has callbacks. Since we are blocking everybody for a
> > > short time (5 ticks by default), it should not impact synchronize_rcu()
> > > and kfree_rcu().
> >
> > Right; as long as you're blocking *everybody*, and RCU priority boosting
> > doesn't come into play (meaning a real-time task is waiting on RCU
> > callbacks), then I don't see any harm in blocking RCU callbacks for a
> > while. You'd block completion of synchronize_rcu() and similar, as well
> > as memory reclamation, but since you've blocked *every* CPU systemwide
> > then that doesn't cause a problem.
>
> True enough.
> But how does RCU distinguish between this being a
> normal idle cycle that might last indefinitely on the one hand and the
> five-jiffy system-wide throttling on the other? OK, maybe there is a
> global variable that says that the just-now-starting idle period is
> system-wide throttling. But then what about the CPU that just went
> idle 10 microseconds ago, and therefore left its timer tick running?
> Fine and well, we could IPI it to wake it up and let it see that we
> are now doing thermal throttling. But then we presumably also have to
> IPI it at the end of the thermal-throttling interval in order for it to
> re-evaluate whether or not it should have the tick going. :-/
>
> On the one hand, I am sure that all of this can be made to work,
> but simply having systems using thermal throttling enable either
> CONFIG_RCU_NOCB_CPU_ALL or CONFIG_RCU_FAST_NO_HZ seems -way- simpler.
> CONFIG_RCU_FAST_NO_HZ is probably the better choice for generic workloads,
> but CONFIG_RCU_NOCB_CPU_ALL is the better choice for embedded workloads
> where it is less likely that RCU callbacks will be posted with continuous
> wild abandon.
>
> Or am I missing something subtle here?

I agree that it seems preferable to require an existing RCU solution
rather than add more complexity to the RCU idle path.

One thing that may affect the choice of solution: this needs to idle
*every* CPU, without leaving any CPU awake to handle callbacks or
similar.

- Josh Triplett