Received: by 10.213.65.68 with SMTP id h4csp1015359imn; Tue, 20 Mar 2018 23:49:47 -0700 (PDT) X-Google-Smtp-Source: AG47ELu/IQpy78GDregRKYXKY7sFF1UmkmKMajrjAhn/ts4y2hFcx5NJwqMt7zX530iDwwtvV2v+ X-Received: by 10.98.194.87 with SMTP id l84mr16079620pfg.6.1521614987012; Tue, 20 Mar 2018 23:49:47 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1521614986; cv=none; d=google.com; s=arc-20160816; b=aiCZaejU8WmAxKYb/J8GjbejigjBYoA4Cw1Gxvv9nY64VaHOvXdaBrdAQZcnsbKez0 5avL81cV/MNKGKuC2UwmCkJw8C4dli9O0R++3Ft0zNR1zVKOepc0Wt3PqQgBC4pRNr1i 2buPpMg90/y+P8bPdC0nDG+GH1AJ53HBBA0YW9ZPBDi922AErJ1DQJErW3o7RBb8N9eP Dyw2AXLliVvEWc4JH3pRUEsQm37jsDdQtyDURpxgiHQCuep1i4EiQJ0qKAKVCWlZkgds OtYQroU5XG/R9tYTIv78c4EYZhN+rbO37FIDhCUGYdcDTZqh5Vhnf/MUaqTaRvh28KjQ 3WvQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=hlwwaz5F69LpoPFGWDskKYKsu5G4pp7uy8l5p0zpWDg=; b=UVHg5YkSfu9nrjAiF1oKs86yHKKpFLQt/RdmroMs3+73lSk3GGQlryosHdVjrneExI 3ySgz5QeOOCOm0YyHBIzEGPBgvyX8JpGf8qcV914xU/NjmND27wL9sIhliXHoFkSHq42 ud8EiqnvhcbzPayq5cMhunxEz1eshN8ke2rXibD/cuHJY8jAE5hFtfXApzVSqjOJ/VJu Uq1Sja3swgX55O+ZaiX6UmO/rj42rdliiUOGJKj9Ymrl0YejfsDJHoRgqexufwUOivU0 xonwEMDKsJLeSTqtTNf+xaO02L41zvxEhMNAkQjNLb6V+5VfdVMax0mAYmrJbOUAIFy2 DDBQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id k11si2309655pgo.753.2018.03.20.23.49.32; Tue, 20 Mar 2018 23:49:46 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751657AbeCUGsP (ORCPT + 99 others); Wed, 21 Mar 2018 02:48:15 -0400 Received: from cloudserver094114.home.pl ([79.96.170.134]:49235 "EHLO cloudserver094114.home.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751411AbeCUGsM (ORCPT ); Wed, 21 Mar 2018 02:48:12 -0400 Received: from 79.184.254.228.ipv4.supernova.orange.pl (79.184.254.228) (HELO aspire.rjw.lan) by serwer1319399.home.pl (79.96.170.134) with SMTP (IdeaSmtpServer 0.83) id 948345542eaa6036; Wed, 21 Mar 2018 07:48:11 +0100 From: "Rafael J. Wysocki" To: Peter Zijlstra , Linux PM Cc: Frederic Weisbecker , Thomas Gleixner , Paul McKenney , Thomas Ilsche , Doug Smythies , Rik van Riel , Aubrey Li , Mike Galbraith , LKML Subject: [RFT][PATCH v7.1 5/8] cpuidle: Return nohz hint from cpuidle_select() Date: Wed, 21 Mar 2018 07:48:38 +0100 Message-ID: <2111949.c172CzMuf8@aspire.rjw.lan> In-Reply-To: <1635957.yuHkCe9oyz@aspire.rjw.lan> References: <2390019.oHdSGtR3EE@aspire.rjw.lan> <1635957.yuHkCe9oyz@aspire.rjw.lan> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org From: Rafael J. Wysocki Add a new pointer argument to cpuidle_select() and to the ->select cpuidle governor callback to allow a boolean value indicating whether or not the tick should be stopped before entering the selected state to be returned from there. Make the ladder governor ignore that pointer (to preserve its current behavior) and make the menu governor return 'false" through it if: (1) the idle exit latency is constrained at 0, or (2) the selected state is a polling one, or (3) the expected idle period duration is within the tick period range. In addition to that, the correction factor computations in the menu governor need to take the possibility that the tick may not be stopped into account to avoid artificially small correction factor values. To that end, add a mechanism to record tick wakeups, as suggested by Peter Zijlstra, and use it to modify the menu_reflect() behavior when tick wakeup occurs. Namely, if the CPU is woken up by the tick, the predicted idle duration is likely too short, so make menu_reflect() try to compensate by bumping up the correction factor with a (sufficiently large) constant value. Since the value returned through the new argument pointer of cpuidle_select() is not used by its caller yet, this change by itself is not expected to alter the functionality of the code. Signed-off-by: Rafael J. Wysocki --- Another variant of patch [5/8] to test. This one doesn't run menu_update() at all on tick wakeups, but simply bumps up the correction factor alone then. It may give better results, but please test and compare. I have created a git branch with this variant for easier testing: git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git \ idle-loop-v7.1 --- drivers/cpuidle/cpuidle.c | 10 ++++++- drivers/cpuidle/governors/ladder.c | 3 +- drivers/cpuidle/governors/menu.c | 49 +++++++++++++++++++++++++++++++++---- include/linux/cpuidle.h | 8 +++--- include/linux/tick.h | 2 + kernel/sched/idle.c | 4 ++- kernel/time/tick-sched.c | 20 +++++++++++++++ 7 files changed, 84 insertions(+), 12 deletions(-) Index: linux-pm/include/linux/cpuidle.h =================================================================== --- linux-pm.orig/include/linux/cpuidle.h +++ linux-pm/include/linux/cpuidle.h @@ -135,7 +135,8 @@ extern bool cpuidle_not_available(struct struct cpuidle_device *dev); extern int cpuidle_select(struct cpuidle_driver *drv, - struct cpuidle_device *dev); + struct cpuidle_device *dev, + bool *stop_tick); extern int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev, int index); extern void cpuidle_reflect(struct cpuidle_device *dev, int index); @@ -167,7 +168,7 @@ static inline bool cpuidle_not_available struct cpuidle_device *dev) {return true; } static inline int cpuidle_select(struct cpuidle_driver *drv, - struct cpuidle_device *dev) + struct cpuidle_device *dev, bool *stop_tick) {return -ENODEV; } static inline int cpuidle_enter(struct cpuidle_driver *drv, struct cpuidle_device *dev, int index) @@ -250,7 +251,8 @@ struct cpuidle_governor { struct cpuidle_device *dev); int (*select) (struct cpuidle_driver *drv, - struct cpuidle_device *dev); + struct cpuidle_device *dev, + bool *stop_tick); void (*reflect) (struct cpuidle_device *dev, int index); }; Index: linux-pm/kernel/sched/idle.c =================================================================== --- linux-pm.orig/kernel/sched/idle.c +++ linux-pm/kernel/sched/idle.c @@ -188,13 +188,15 @@ static void cpuidle_idle_call(void) next_state = cpuidle_find_deepest_state(drv, dev); call_cpuidle(drv, dev, next_state); } else { + bool stop_tick = true; + tick_nohz_idle_stop_tick(); rcu_idle_enter(); /* * Ask the cpuidle framework to choose a convenient idle state. */ - next_state = cpuidle_select(drv, dev); + next_state = cpuidle_select(drv, dev, &stop_tick); entered_state = call_cpuidle(drv, dev, next_state); /* * Give the governor an opportunity to reflect on the outcome Index: linux-pm/drivers/cpuidle/cpuidle.c =================================================================== --- linux-pm.orig/drivers/cpuidle/cpuidle.c +++ linux-pm/drivers/cpuidle/cpuidle.c @@ -272,12 +272,18 @@ int cpuidle_enter_state(struct cpuidle_d * * @drv: the cpuidle driver * @dev: the cpuidle device + * @stop_tick: indication on whether or not to stop the tick * * Returns the index of the idle state. The return value must not be negative. + * + * The memory location pointed to by @stop_tick is expected to be written the + * 'false' boolean value if the scheduler tick should not be stopped before + * entering the returned state. */ -int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) +int cpuidle_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, + bool *stop_tick) { - return cpuidle_curr_governor->select(drv, dev); + return cpuidle_curr_governor->select(drv, dev, stop_tick); } /** Index: linux-pm/drivers/cpuidle/governors/ladder.c =================================================================== --- linux-pm.orig/drivers/cpuidle/governors/ladder.c +++ linux-pm/drivers/cpuidle/governors/ladder.c @@ -63,9 +63,10 @@ static inline void ladder_do_selection(s * ladder_select_state - selects the next state to enter * @drv: cpuidle driver * @dev: the CPU + * @dummy: not used */ static int ladder_select_state(struct cpuidle_driver *drv, - struct cpuidle_device *dev) + struct cpuidle_device *dev, bool *dummy) { struct ladder_device *ldev = this_cpu_ptr(&ladder_devices); struct device *device = get_cpu_device(dev->cpu); Index: linux-pm/drivers/cpuidle/governors/menu.c =================================================================== --- linux-pm.orig/drivers/cpuidle/governors/menu.c +++ linux-pm/drivers/cpuidle/governors/menu.c @@ -279,8 +279,10 @@ again: * menu_select - selects the next idle state to enter * @drv: cpuidle driver containing state data * @dev: the CPU + * @stop_tick: indication on whether or not to stop the tick */ -static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev) +static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, + bool *stop_tick) { struct menu_device *data = this_cpu_ptr(&menu_devices); struct device *device = get_cpu_device(dev->cpu); @@ -303,8 +305,10 @@ static int menu_select(struct cpuidle_dr latency_req = resume_latency; /* Special case when user has set very strict latency requirement */ - if (unlikely(latency_req == 0)) + if (unlikely(latency_req == 0)) { + *stop_tick = false; return 0; + } /* determine the expected residency time, round up */ data->next_timer_us = ktime_to_us(tick_nohz_get_sleep_length()); @@ -354,6 +358,7 @@ static int menu_select(struct cpuidle_dr if (latency_req > interactivity_req) latency_req = interactivity_req; + expected_interval = data->predicted_us; /* * Find the idle state with the lowest power while satisfying * our constraints. @@ -369,15 +374,30 @@ static int menu_select(struct cpuidle_dr idx = i; /* first enabled state */ if (s->target_residency > data->predicted_us) break; - if (s->exit_latency > latency_req) + if (s->exit_latency > latency_req) { + /* + * If we break out of the loop for latency reasons, use + * the target residency of the selected state as the + * expected idle duration so that the tick is retained + * as long as that target residency is low enough. + */ + expected_interval = drv->states[idx].target_residency; break; - + } idx = i; } if (idx == -1) idx = 0; /* No states enabled. Must use 0. */ + /* + * Don't stop the tick if the selected state is a polling one or if the + * expected idle duration is shorter than the tick period length. + */ + if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) || + expected_interval < TICK_USEC) + *stop_tick = false; + data->last_state_idx = idx; return data->last_state_idx; @@ -396,7 +416,26 @@ static void menu_reflect(struct cpuidle_ struct menu_device *data = this_cpu_ptr(&menu_devices); data->last_state_idx = index; - data->needs_update = 1; + if (tick_nohz_idle_got_tick()) { + unsigned int new_factor = data->correction_factor[data->bucket]; + + /* + * Only update the correction factor, don't update the repeating + * pattern data to avoid polluting it with the tick period + * length which is an artificial addition. + */ + new_factor -= new_factor / DECAY; + /* + * If the CPU was woken up by the tick, the predicted idle time + * was likely too short. Try to compensate by bumping up the + * correction factor. Use 0.75 * RESOLUTION (which is easy + * enough to get) that should work on the average. + */ + new_factor += RESOLUTION / 2 + RESOLUTION / 4; + data->correction_factor[data->bucket] = new_factor; + } else { + data->needs_update = 1; + } } /** Index: linux-pm/kernel/time/tick-sched.c =================================================================== --- linux-pm.orig/kernel/time/tick-sched.c +++ linux-pm/kernel/time/tick-sched.c @@ -991,6 +991,20 @@ void tick_nohz_irq_exit(void) } /** + * tick_nohz_idle_got_tick - Check whether or not the tick handler has run + */ +bool tick_nohz_idle_got_tick(void) +{ + struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched); + + if (ts->inidle > 1) { + ts->inidle = 1; + return true; + } + return false; +} + +/** * tick_nohz_get_sleep_length - return the length of the current sleep * * Called from power state control code with interrupts disabled @@ -1101,6 +1115,9 @@ static void tick_nohz_handler(struct clo struct pt_regs *regs = get_irq_regs(); ktime_t now = ktime_get(); + if (ts->inidle) + ts->inidle = 2; + dev->next_event = KTIME_MAX; tick_sched_do_timer(now); @@ -1198,6 +1215,9 @@ static enum hrtimer_restart tick_sched_t struct pt_regs *regs = get_irq_regs(); ktime_t now = ktime_get(); + if (ts->inidle) + ts->inidle = 2; + tick_sched_do_timer(now); /* Index: linux-pm/include/linux/tick.h =================================================================== --- linux-pm.orig/include/linux/tick.h +++ linux-pm/include/linux/tick.h @@ -119,6 +119,7 @@ extern void tick_nohz_idle_restart_tick( extern void tick_nohz_idle_enter(void); extern void tick_nohz_idle_exit(void); extern void tick_nohz_irq_exit(void); +extern bool tick_nohz_idle_got_tick(void); extern ktime_t tick_nohz_get_sleep_length(void); extern unsigned long tick_nohz_get_idle_calls(void); extern unsigned long tick_nohz_get_idle_calls_cpu(int cpu); @@ -139,6 +140,7 @@ static inline void tick_nohz_idle_stop_t static inline void tick_nohz_idle_restart_tick(void) { } static inline void tick_nohz_idle_enter(void) { } static inline void tick_nohz_idle_exit(void) { } +static inline bool tick_nohz_idle_got_tick(void) { return false; } static inline ktime_t tick_nohz_get_sleep_length(void) {