Subject: Re: [RFC PATCH v1 04/11] sched/idle: make the fast idle path for short idle periods
To: paulmck@linux.vnet.ibm.com
Cc: Frederic Weisbecker, Aubrey Li, tglx@linutronix.de, peterz@infradead.org,
    len.brown@intel.com, rjw@rjwysocki.net, ak@linux.intel.com,
    tim.c.chen@linux.intel.com, arjan@linux.intel.com, yang.zhang.wz@gmail.com,
    x86@kernel.org, linux-kernel@vger.kernel.org
From: "Li, Aubrey"
Date: Wed, 12 Jul 2017 13:22:38 +0800

On 2017/7/12 13:03, Paul E. McKenney wrote:
> On Wed, Jul 12, 2017 at 11:19:59AM +0800, Li, Aubrey wrote:
>> On 2017/7/12 2:11, Paul E. McKenney wrote:
>>> On Tue, Jul 11, 2017 at 06:33:55PM +0200, Frederic Weisbecker wrote:
>>>> On Tue, Jul 11, 2017 at 05:58:47AM -0700, Paul E.
McKenney wrote:
>>>>> On Mon, Jul 10, 2017 at 09:38:34AM +0800, Aubrey Li wrote:
>>>>>> From: Aubrey Li
>>>>>>
>>>>>> The system will enter a fast idle loop if the predicted idle period
>>>>>> is shorter than the threshold.
>>>>>> ---
>>>>>>  kernel/sched/idle.c | 9 ++++++++-
>>>>>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
>>>>>> index cf6c11f..16a766c 100644
>>>>>> --- a/kernel/sched/idle.c
>>>>>> +++ b/kernel/sched/idle.c
>>>>>> @@ -280,6 +280,8 @@ static void cpuidle_generic(void)
>>>>>>   */
>>>>>>  static void do_idle(void)
>>>>>>  {
>>>>>> +	unsigned int predicted_idle_us;
>>>>>> +	unsigned int short_idle_threshold = jiffies_to_usecs(1) / 2;
>>>>>>  	/*
>>>>>>  	 * If the arch has a polling bit, we maintain an invariant:
>>>>>>  	 *
>>>>>> @@ -291,7 +293,12 @@ static void do_idle(void)
>>>>>>
>>>>>>  	__current_set_polling();
>>>>>>
>>>>>> -	cpuidle_generic();
>>>>>> +	predicted_idle_us = cpuidle_predict();
>>>>>> +
>>>>>> +	if (likely(predicted_idle_us < short_idle_threshold))
>>>>>> +		cpuidle_fast();
>>>>>
>>>>> What if we get here from nohz_full usermode execution?  In that
>>>>> case, if I remember correctly, the scheduling-clock interrupt
>>>>> will still be disabled, and would have to be re-enabled before
>>>>> we could safely invoke cpuidle_fast().
>>>>>
>>>>> Or am I missing something here?
>>>>
>>>> That's a good point. It's partially ok because if the tick is needed
>>>> for something specific, it is not entirely stopped but programmed to that
>>>> deadline.
>>>>
>>>> Now there is some idle specific code when we enter dynticks-idle. See
>>>> tick_nohz_start_idle(), tick_nohz_stop_idle(), sched_clock_idle_wakeup_event()
>>>> and some subsystems that react differently when we enter dyntick idle
>>>> mode (scheduler_tick_max_deferment) so the tick may need a reevaluation.
>>>>
>>>> For now I'd rather suggest that we treat full nohz as an exception case here
>>>> and do:
>>>>
>>>>       if (!tick_nohz_full_cpu(smp_processor_id()) && likely(predicted_idle_us < short_idle_threshold))
>>>>               cpuidle_fast();
>>>>
>>>> Ugly but safer!
>>>
>>> Works for me!
>>
>> I guess who enabled full nohz(for example the financial guys who need the system
>> response as fast as possible) does not like this compromise, ;)
>
> And some HPC guys and some real-time guys with CPU-bound real-time
> processing, so there are likely quite a few different views on this
> compromise.
>
>> How about add rcu_idle enter/exit back only for full nohz case in fast idle? RCU idle
>> is the only risky ops if removing them from fast idle path. Comparing to adding RCU
>> idle back, going to normal idle path has more overhead IMHO.
>
> That might work, but I would need to see the actual patch.  Frederic
> Weisbecker should look at it as well.
>

Okay, let me address the first round of comments and deliver v2 soon.

Thanks,
-Aubrey
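
To make the condition discussed in this thread concrete, here is a minimal user-space model of the idle-path selection with Frederic's full-nohz exception applied. It is only a sketch under stated assumptions: MODEL_HZ, model_jiffies_to_usecs() and model_use_fast_idle() are hypothetical names, HZ=1000 is an assumed configuration, and none of this is code from the patch series or from the kernel.

```c
#include <stdbool.h>

/* Assumption: a stand-in for the kernel's HZ; 1000 is chosen for illustration. */
#define MODEL_HZ 1000u

/* Hypothetical user-space model of jiffies_to_usecs(): ticks to microseconds. */
static unsigned int model_jiffies_to_usecs(unsigned int jiffies)
{
	return jiffies * (1000000u / MODEL_HZ);
}

/*
 * Hypothetical model of the check in do_idle(): returns true when the
 * fast idle path would be taken, false when the normal path is used.
 */
static bool model_use_fast_idle(bool nohz_full_cpu, unsigned int predicted_idle_us)
{
	/* Same arithmetic as the quoted patch: half of one tick, in microseconds. */
	unsigned int short_idle_threshold = model_jiffies_to_usecs(1) / 2;

	/* Suggested exception: full-nohz CPUs always take the normal idle path. */
	if (nohz_full_cpu)
		return false;

	return predicted_idle_us < short_idle_threshold;
}
```

With the assumed HZ of 1000, half a tick is 500 us, so in this model a 499 us prediction selects the fast path on an ordinary CPU, a 500 us prediction does not, and a full-nohz CPU falls back to the normal path regardless of the prediction.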