From: "Rafael J. Wysocki"
To: Peter Zijlstra, Linux PM, Frederic Weisbecker
Cc: Thomas Gleixner, Paul McKenney, Thomas Ilsche, Doug Smythies, Rik van Riel, Aubrey Li, Mike Galbraith, LKML
Subject: [RFC/RFT][PATCH v3 5/6] sched: idle: Select idle state before stopping the tick
Date: Fri, 09 Mar 2018 10:46:55 +0100
Message-ID: <3333184.WFRNK4vEe0@aspire.rjw.lan>
In-Reply-To: <2450532.XN8DODrtDf@aspire.rjw.lan>
References: <2450532.XN8DODrtDf@aspire.rjw.lan>
X-Mailing-List: linux-kernel@vger.kernel.org

From: Rafael J. 
Wysocki

In order to address the issue with short idle duration predictions
by the idle governor after the tick has been stopped, reorder the
code in cpuidle_idle_call() so that the governor idle state selection
runs before tick_nohz_idle_stop_tick() and use the "nohz" hint
returned by cpuidle_select() to decide whether or not to stop the
tick.

This isn't straightforward, because menu_select() invokes
tick_nohz_get_sleep_length() to get the time to the next timer
event and the number returned by the latter comes from
__tick_nohz_idle_stop_tick().  Fortunately, however, it is possible
to compute that number without actually stopping the tick and with
the help of the existing code.

Namely, notice that tick_nohz_stop_sched_tick() already computes the
next timer event time to reprogram the scheduler tick hrtimer and
that time can be used as a proxy for the actual next timer event
time in the idle duration prediction.  Moreover, it is possible
to split tick_nohz_stop_sched_tick() into two separate routines,
one computing the time to the next timer event and the other
simply stopping the tick when the time to the next timer event
is known.

Accordingly, split tick_nohz_stop_sched_tick() into
__tick_nohz_next_event() and __tick_nohz_stop_tick() and use the
former in tick_nohz_get_sleep_length().  Add two new fields,
timer_expires and timer_expires_basemono, to struct tick_sched for
passing data between these two functions and one more field,
tick_may_stop, to indicate that __tick_nohz_next_event() has run
and __tick_nohz_stop_tick() can be called now.  Also drop the now
redundant sleep_length field from there.

Signed-off-by: Rafael J. 
Wysocki
---
 include/linux/tick.h     |    2 
 kernel/sched/idle.c      |   11 ++-
 kernel/time/tick-sched.c |  146 ++++++++++++++++++++++++++++++-----------------
 kernel/time/tick-sched.h |    6 +
 4 files changed, 110 insertions(+), 55 deletions(-)

Index: linux-pm/kernel/time/tick-sched.h
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.h
+++ linux-pm/kernel/time/tick-sched.h
@@ -30,6 +30,7 @@ enum tick_nohz_mode {
  *			when the CPU returns from nohz sleep.
  * @next_tick:		Next tick to be fired when in dynticks mode.
  * @tick_stopped:	Indicator that the idle tick has been stopped
+ * @tick_may_stop:	Indicator that the idle tick may be stopped shortly
  * @idle_jiffies:	jiffies at the entry to idle for idle time accounting
  * @idle_calls:		Total number of idle calls
  * @idle_sleeps:	Number of idle calls, where the sched tick was stopped
@@ -38,7 +39,6 @@ enum tick_nohz_mode {
  * @idle_exittime:	Time when the idle state was left
  * @idle_sleeptime:	Sum of the time slept in idle with sched tick stopped
  * @iowait_sleeptime:	Sum of the time slept in idle with sched tick stopped, with IO outstanding
- * @sleep_length:	Duration of the current idle sleep
  * @do_timer_lst:	CPU was the last one doing do_timer before going idle
  */
 struct tick_sched {
@@ -49,6 +49,7 @@ struct tick_sched {
 	ktime_t				next_tick;
 	int				inidle;
 	int				tick_stopped;
+	int				tick_may_stop;
 	unsigned long			idle_jiffies;
 	unsigned long			idle_calls;
 	unsigned long			idle_sleeps;
@@ -58,8 +59,9 @@ struct tick_sched {
 	ktime_t				idle_exittime;
 	ktime_t				idle_sleeptime;
 	ktime_t				iowait_sleeptime;
-	ktime_t				sleep_length;
 	unsigned long			last_jiffies;
+	u64				timer_expires;
+	u64				timer_expires_basemono;
 	u64				next_timer;
 	ktime_t				idle_expires;
 	int				do_timer_last;
Index: linux-pm/kernel/sched/idle.c
===================================================================
--- linux-pm.orig/kernel/sched/idle.c
+++ linux-pm/kernel/sched/idle.c
@@ -190,13 +190,18 @@ static void cpuidle_idle_call(void)
 	} else {
 		bool nohz = true;
 
-		tick_nohz_idle_stop_tick();
-		rcu_idle_enter();
-
 		/*
 		 * Ask the cpuidle framework to choose a convenient idle state.
 		 */
 		next_state = cpuidle_select(drv, dev, &nohz);
+
+		if (nohz)
+			tick_nohz_idle_stop_tick();
+		else
+			tick_nohz_idle_retain_tick();
+
+		rcu_idle_enter();
+
 		entered_state = call_cpuidle(drv, dev, next_state);
 		/*
 		 * Give the governor an opportunity to reflect on the outcome
Index: linux-pm/kernel/time/tick-sched.c
===================================================================
--- linux-pm.orig/kernel/time/tick-sched.c
+++ linux-pm/kernel/time/tick-sched.c
@@ -652,13 +652,10 @@ static inline bool local_timer_softirq_p
 	return local_softirq_pending() & TIMER_SOFTIRQ;
 }
 
-static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
-					 ktime_t now, int cpu)
+static ktime_t __tick_nohz_next_event(struct tick_sched *ts, int cpu)
 {
-	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 	u64 basemono, next_tick, next_tmr, next_rcu, delta, expires;
 	unsigned long seq, basejiff;
-	ktime_t tick;
 
 	/* Read jiffies and the time when jiffies were updated last */
 	do {
@@ -667,6 +664,7 @@ static ktime_t tick_nohz_stop_sched_tick
 		basejiff = jiffies;
 	} while (read_seqretry(&jiffies_lock, seq));
 	ts->last_jiffies = basejiff;
+	ts->timer_expires_basemono = basemono;
 
 	/*
 	 * Keep the periodic tick, when RCU, architecture or irq_work
@@ -711,31 +709,24 @@ static ktime_t tick_nohz_stop_sched_tick
 		 * next period, so no point in stopping it either, bail.
 		 */
 		if (!ts->tick_stopped) {
-			tick = 0;
+			ts->timer_expires = 0;
 			goto out;
 		}
 	}
 
 	/*
-	 * If this CPU is the one which updates jiffies, then give up
-	 * the assignment and let it be taken by the CPU which runs
-	 * the tick timer next, which might be this CPU as well. If we
-	 * don't drop this here the jiffies might be stale and
-	 * do_timer() never invoked. Keep track of the fact that it
-	 * was the one which had the do_timer() duty last. If this CPU
-	 * is the one which had the do_timer() duty last, we limit the
-	 * sleep time to the timekeeping max_deferment value.
+	 * If this CPU is the one which had the do_timer() duty last, we limit
+	 * the sleep time to the timekeeping max_deferment value.
 	 * Otherwise we can sleep as long as we want.
 	 */
 	delta = timekeeping_max_deferment();
-	if (cpu == tick_do_timer_cpu) {
-		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
-		ts->do_timer_last = 1;
-	} else if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
-		delta = KTIME_MAX;
-		ts->do_timer_last = 0;
-	} else if (!ts->do_timer_last) {
-		delta = KTIME_MAX;
+	if (cpu != tick_do_timer_cpu) {
+		if (tick_do_timer_cpu != TICK_DO_TIMER_NONE) {
+			delta = KTIME_MAX;
+			ts->do_timer_last = 0;
+		} else if (!ts->do_timer_last) {
+			delta = KTIME_MAX;
+		}
 	}
 
 #ifdef CONFIG_NO_HZ_FULL
@@ -750,14 +741,41 @@ static ktime_t tick_nohz_stop_sched_tick
 	else
 		expires = KTIME_MAX;
 
-	expires = min_t(u64, expires, next_tick);
-	tick = expires;
+	ts->timer_expires = min_t(u64, expires, next_tick);
+
+out:
+	ts->tick_may_stop = 1;
+	return ts->timer_expires;
+}
+
+static void __tick_nohz_stop_tick(struct tick_sched *ts, int cpu)
+{
+	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
+	u64 basemono = ts->timer_expires_basemono;
+	u64 expires = ts->timer_expires;
+	ktime_t tick = expires;
+
+	/* Make sure we won't be trying to stop it twice in a row. */
+	ts->tick_may_stop = 0;
+
+	/*
+	 * If this CPU is the one which updates jiffies, then give up
+	 * the assignment and let it be taken by the CPU which runs
+	 * the tick timer next, which might be this CPU as well. If we
+	 * don't drop this here the jiffies might be stale and
+	 * do_timer() never invoked. Keep track of the fact that it
+	 * was the one which had the do_timer() duty last.
+	 */
+	if (cpu == tick_do_timer_cpu) {
+		tick_do_timer_cpu = TICK_DO_TIMER_NONE;
+		ts->do_timer_last = 1;
+	}
 
 	/* Skip reprogram of event if its not changed */
 	if (ts->tick_stopped && (expires == ts->next_tick)) {
 		/* Sanity check: make sure clockevent is actually programmed */
 		if (tick == KTIME_MAX || ts->next_tick == hrtimer_get_expires(&ts->sched_timer))
-			goto out;
+			return;
 
 		WARN_ON_ONCE(1);
 		printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n",
@@ -791,7 +809,7 @@ static ktime_t tick_nohz_stop_sched_tick
 	if (unlikely(expires == KTIME_MAX)) {
 		if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
 			hrtimer_cancel(&ts->sched_timer);
-		goto out;
+		return;
 	}
 
 	hrtimer_set_expires(&ts->sched_timer, tick);
@@ -800,15 +818,18 @@ static ktime_t tick_nohz_stop_sched_tick
 		hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS_PINNED);
 	else
 		tick_program_event(tick, 1);
-out:
-	/*
-	 * Update the estimated sleep length until the next timer
-	 * (not only the tick).
-	 */
-	ts->sleep_length = ktime_sub(dev->next_event, now);
-	return tick;
 }
 
+#ifdef CONFIG_NO_HZ_FULL
+static void tick_nohz_stop_sched_tick(struct tick_sched *ts, int cpu)
+{
+	if (__tick_nohz_next_event(ts, cpu))
+		__tick_nohz_stop_tick(ts, cpu);
+	else
+		ts->tick_may_stop = 0;
+}
+#endif /* CONFIG_NO_HZ_FULL */
+
 static void tick_nohz_restart_sched_tick(struct tick_sched *ts, ktime_t now)
 {
 	/* Update jiffies first */
@@ -844,7 +865,7 @@ static void tick_nohz_full_update_tick(s
 		return;
 
 	if (can_stop_full_tick(cpu, ts))
-		tick_nohz_stop_sched_tick(ts, ktime_get(), cpu);
+		tick_nohz_stop_sched_tick(ts, cpu);
 	else if (ts->tick_stopped)
 		tick_nohz_restart_sched_tick(ts, ktime_get());
 #endif
@@ -870,10 +891,8 @@ static bool can_stop_idle_tick(int cpu,
 		return false;
 	}
 
-	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE)) {
-		ts->sleep_length = NSEC_PER_SEC / HZ;
+	if (unlikely(ts->nohz_mode == NOHZ_MODE_INACTIVE))
 		return false;
-	}
 
 	if (need_resched())
 		return false;
@@ -913,25 +932,33 @@ static void __tick_nohz_idle_stop_tick(s
 	ktime_t expires;
 	int cpu = smp_processor_id();
 
-	if (can_stop_idle_tick(cpu, ts)) {
+	/*
+	 * If tick_nohz_get_sleep_length() ran __tick_nohz_next_event(), the
+	 * tick timer expiration time is known already.
+	 */
+	if (ts->tick_may_stop)
+		expires = ts->timer_expires;
+	else if (can_stop_idle_tick(cpu, ts))
+		expires = __tick_nohz_next_event(ts, cpu);
+	else
+		return;
+
+	ts->idle_calls++;
+
+	if (expires > 0LL) {
 		int was_stopped = ts->tick_stopped;
 
-		ts->idle_calls++;
+		__tick_nohz_stop_tick(ts, cpu);
 
-		/*
-		 * The idle entry time should be a sufficient approximation of
-		 * the current time at this point.
-		 */
-		expires = tick_nohz_stop_sched_tick(ts, ts->idle_entrytime, cpu);
-		if (expires > 0LL) {
-			ts->idle_sleeps++;
-			ts->idle_expires = expires;
-		}
+		ts->idle_sleeps++;
+		ts->idle_expires = expires;
 
 		if (!was_stopped && ts->tick_stopped) {
 			ts->idle_jiffies = ts->last_jiffies;
 			nohz_balance_enter_idle(cpu);
 		}
+	} else {
+		ts->tick_may_stop = 0;
 	}
 }
 
@@ -945,6 +972,11 @@ void tick_nohz_idle_stop_tick(void)
 	__tick_nohz_idle_stop_tick(this_cpu_ptr(&tick_cpu_sched));
 }
 
+void tick_nohz_idle_retain_tick(void)
+{
+	__this_cpu_write(tick_cpu_sched.tick_may_stop, 0);
+}
+
 /**
  * tick_nohz_idle_enter - prepare for entering idle on the current CPU
  *
@@ -957,7 +989,7 @@ void tick_nohz_idle_enter(void)
 	lockdep_assert_irqs_enabled();
 	/*
 	 * Update the idle state in the scheduler domain hierarchy
-	 * when tick_nohz_stop_sched_tick() is called from the idle loop.
+	 * when __tick_nohz_stop_tick() is called from the idle loop.
 	 * State will be updated to busy during the first busy tick after
 	 * exiting idle.
 	 */
@@ -991,15 +1023,29 @@ void tick_nohz_irq_exit(void)
 }
 
 /**
- * tick_nohz_get_sleep_length - return the length of the current sleep
+ * tick_nohz_get_sleep_length - return the expected length of the current sleep
  *
  * Called from power state control code with interrupts disabled
  */
 ktime_t tick_nohz_get_sleep_length(void)
 {
+	struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev);
 	struct tick_sched *ts = this_cpu_ptr(&tick_cpu_sched);
+	int cpu = smp_processor_id();
+	/*
+	 * The idle entry time is expected to be a sufficient approximation of
+	 * the current time at this point.
+	 */
+	ktime_t now = ts->idle_entrytime;
+
+	if (can_stop_idle_tick(cpu, ts)) {
+		ktime_t next_event = __tick_nohz_next_event(ts, cpu);
+
+		if (next_event)
+			return ktime_sub(next_event, now);
+	}
 
-	return ts->sleep_length;
+	return ktime_sub(dev->next_event, now);
 }
 
 /**
Index: linux-pm/include/linux/tick.h
===================================================================
--- linux-pm.orig/include/linux/tick.h
+++ linux-pm/include/linux/tick.h
@@ -115,6 +115,7 @@ enum tick_dep_bits {
 extern bool tick_nohz_enabled;
 extern int tick_nohz_tick_stopped(void);
 extern void tick_nohz_idle_stop_tick(void);
+extern void tick_nohz_idle_retain_tick(void);
 extern void tick_nohz_idle_enter(void);
 extern void tick_nohz_idle_exit(void);
 extern void tick_nohz_irq_exit(void);
@@ -127,6 +128,7 @@ extern u64 get_cpu_iowait_time_us(int cp
 #define tick_nohz_enabled (0)
 static inline int tick_nohz_tick_stopped(void) { return 0; }
 static inline void tick_nohz_idle_stop_tick(void) { }
+static inline void tick_nohz_idle_retain_tick(void) { }
 static inline void tick_nohz_idle_enter(void) { }
 static inline void tick_nohz_idle_exit(void) { }
 
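
For readers who want the new control flow in isolation, below is a
rough user-space model of the two-phase scheme described in the
changelog: first compute the next event and cache it, then let the
governor decide, and only then stop or retain the tick.  It is a sketch
under simplifying assumptions, not kernel code.  All of the names in it
(struct tick_model, the model_*() helpers, TICK_PERIOD_NS and the toy
governor threshold) are hypothetical; only the timer_expires and
tick_may_stop fields mirror the ones added by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TICK_PERIOD_NS 4000000ULL	/* assumption: HZ=250 */

/* Hypothetical, heavily simplified stand-in for struct tick_sched. */
struct tick_model {
	uint64_t now;		/* current time in ns */
	uint64_t next_timer;	/* absolute time of the next timer, >= now */
	uint64_t timer_expires;	/* cached by phase 1, 0 means "keep the tick" */
	bool tick_may_stop;	/* phase 1 has run, phase 2 may follow */
	bool tick_stopped;
};

/* Phase 1: compute and cache the next event without touching the tick. */
static uint64_t model_next_event(struct tick_model *ts)
{
	uint64_t delta = ts->next_timer - ts->now;

	/* No point in stopping the tick if a timer fires within one period. */
	if (delta <= TICK_PERIOD_NS && !ts->tick_stopped)
		ts->timer_expires = 0;
	else
		ts->timer_expires = ts->next_timer;

	ts->tick_may_stop = true;
	return ts->timer_expires;
}

/* Phase 2: stop the tick using the value cached by phase 1. */
static void model_stop_tick(struct tick_model *ts)
{
	assert(ts->tick_may_stop);	/* phase 1 must have run first */
	ts->tick_may_stop = false;	/* do not stop it twice in a row */
	ts->tick_stopped = true;
}

/* What the governor sees as the expected sleep length. */
static uint64_t model_sleep_length(struct tick_model *ts)
{
	uint64_t next = model_next_event(ts);

	/* Assumed fallback: one tick period when the tick has to stay. */
	return next ? next - ts->now : TICK_PERIOD_NS;
}

/* Toy governor: shallow state and no tick stopping for short sleeps. */
static int model_select(uint64_t sleep_ns, bool *nohz)
{
	if (sleep_ns < 2 * TICK_PERIOD_NS) {
		*nohz = false;
		return 0;
	}
	*nohz = true;
	return 1;
}

/* Idle loop ordering after the patch: select first, then stop or retain. */
static int model_idle_call(struct tick_model *ts)
{
	bool nohz = true;
	int state = model_select(model_sleep_length(ts), &nohz);

	if (nohz && ts->tick_may_stop && ts->timer_expires)
		model_stop_tick(ts);
	else
		ts->tick_may_stop = false;	/* retain the tick */

	return state;
}
```

With a timer 50 tick periods away the model picks the deep state and
stops the tick; with a timer half a period away it picks the shallow
state and leaves the tick running, clearing tick_may_stop either way.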