Received: by 10.192.245.15 with SMTP id i15csp1117519imn; Sun, 11 Mar 2018 03:32:03 -0700 (PDT) X-Google-Smtp-Source: AG47ELs5+o9u7956nAcHvh8HLaZ1blI+OYqcUInHvIbJaqlgbhMgaBJz6VYiE5R0uVhtke9w4aVy X-Received: by 10.99.67.1 with SMTP id q1mr3675890pga.365.1520764323855; Sun, 11 Mar 2018 03:32:03 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1520764323; cv=none; d=google.com; s=arc-20160816; b=vO9hQAEI4A6KaishCt+ZEpMyGdqYXzc9/H1g1uMXkUuIpgtx9orBLVtms1NNazG3Kr c1BPjKg2nhURr+UY5Mvw30hGKqE/uAIqd2pmfUibLc4kZS5G2lQZQcDlHua2pdB/vicb Q2srjN867Xbn4c93JeVUmvlOCNbWi0QIQio/wnOvmmJ6D5YUvfj0Ob8NggYoJk3rV5Rq vT+YwZND0NbHB+nn8gz7BqjTT1bZUcTLOx0wNpP5O12puO6t7NxK7l/4URh1hM04/ptR fJUo2CfXmR9eELpadxKlzCjBQ3QXJWH3k98VXyS+QnfLOTvy1PCSPDWn91WBOYBYvCvT c1UQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-transfer-encoding:mime-version :references:in-reply-to:message-id:date:subject:cc:to:from :arc-authentication-results; bh=8SeCsoSb66PLffQOFHbIBCDiaW/tr+m4G5WHd/FvcHg=; b=vVLydydL09/cyrcmvPZOQmzjZUCrVfuV4KOS8StSVqgzvkjr16ZQ6NGrtx1lps8MXy G0/lq55csWdIuTQ0X4xVV2zlF2OO5Icsa0uqmT64cMDDM227NfT1eZTzPG/RNN0oi40E 6ZKk+rcyC289TbmGMXe8oVRGuCnbbMY7eWvxGpE2LtfNSfxydRkoOGEq8QMFjL+j8Drl YcL4vrc//ibAscrMyIyHyQxCLaesbBhdimGWQPdt3+bArhjPuRWmxg8A0NcAg80j1+Fn hD9aWjJYgnusvboFrlmvT6ItWWpbzvL8v7XIy01nfUEb8KsZCOJIeJshpq3ypiMncO2w /xwQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id m12-v6si4113155pls.74.2018.03.11.03.31.48; Sun, 11 Mar 2018 03:32:03 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S932149AbeCKKa4 (ORCPT + 99 others); Sun, 11 Mar 2018 06:30:56 -0400 Received: from cloudserver094114.home.pl ([79.96.170.134]:61653 "EHLO cloudserver094114.home.pl" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932113AbeCKKay (ORCPT ); Sun, 11 Mar 2018 06:30:54 -0400 Received: from 79.184.254.228.ipv4.supernova.orange.pl (79.184.254.228) (HELO aspire.rjw.lan) by serwer1319399.home.pl (79.96.170.134) with SMTP (IdeaSmtpServer 0.83) id 94e082b9e636b848; Sun, 11 Mar 2018 11:30:52 +0100 From: "Rafael J. Wysocki" To: Frederic Weisbecker Cc: Peter Zijlstra , Linux PM , Frederic Weisbecker , Thomas Gleixner , Paul McKenney , Thomas Ilsche , Doug Smythies , Rik van Riel , Aubrey Li , Mike Galbraith , LKML Subject: Re: [RFC/RFT][PATCH v3 5/6] sched: idle: Select idle state before stopping the tick Date: Sun, 11 Mar 2018 11:31:31 +0100 Message-ID: <1998186.Se6Zm9i2s4@aspire.rjw.lan> In-Reply-To: <20180311014435.GA30062@lerouge> References: <2450532.XN8DODrtDf@aspire.rjw.lan> <3333184.WFRNK4vEe0@aspire.rjw.lan> <20180311014435.GA30062@lerouge> MIME-Version: 1.0 Content-Transfer-Encoding: 7Bit Content-Type: text/plain; charset="us-ascii" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Sunday, March 11, 2018 2:44:37 AM CET Frederic Weisbecker wrote: > On Fri, Mar 09, 2018 at 10:46:55AM +0100, Rafael J. Wysocki wrote: > > --- linux-pm.orig/kernel/time/tick-sched.h > > +++ linux-pm/kernel/time/tick-sched.h > > @@ -30,6 +30,7 @@ enum tick_nohz_mode { > > * when the CPU returns from nohz sleep. > > * @next_tick: Next tick to be fired when in dynticks mode. > > * @tick_stopped: Indicator that the idle tick has been stopped > > + * @tick_may_stop: Indicator that the idle tick may be stopped shortly > > Perhaps we can set timer_expires to 0 instead when we want to invalidate > the last value? timer_expires can be 0 for other reasons. I can use timer_expires_basemono for that (I actually did that in one version of my patches, but it looked odd and I decided to use a new field for clarity) if you prefer to avoid adding an extra field. > > * @idle_jiffies: jiffies at the entry to idle for idle time accounting > > * @idle_calls: Total number of idle calls > > * @idle_sleeps: Number of idle calls, where the sched tick was stopped > > @@ -38,7 +39,6 @@ enum tick_nohz_mode { > > * @idle_exittime: Time when the idle state was left > > * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped > > * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding > > - * @sleep_length: Duration of the current idle sleep > > * @do_timer_lst: CPU was the last one doing do_timer before going idle > > */ > > struct tick_sched { > > @@ -49,6 +49,7 @@ struct tick_sched { > > ktime_t next_tick; > > int inidle; > > int tick_stopped; > > + int tick_may_stop; > > unsigned long idle_jiffies; > > unsigned long idle_calls; > > unsigned long idle_sleeps; > > @@ -58,8 +59,9 @@ struct tick_sched { > > ktime_t idle_exittime; > > ktime_t idle_sleeptime; > > ktime_t iowait_sleeptime; > > - ktime_t sleep_length; > > unsigned long last_jiffies; > > + u64 timer_expires; > > + u64 timer_expires_basemono; > > We may need documentation for the above fields too. OK > > Index: linux-pm/kernel/time/tick-sched.c > > =================================================================== > > --- linux-pm.orig/kernel/time/tick-sched.c > > +++ linux-pm/kernel/time/tick-sched.c > > @@ -652,13 +652,10 @@ static inline bool local_timer_softirq_p > > return local_softirq_pending() & TIMER_SOFTIRQ; > > } > > > > -static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, > > - ktime_t now, int cpu) > > +static ktime_t __tick_nohz_next_event(struct tick_sched *ts, int cpu) > > Since we don't seem to have a lower level version, can we remove the underscores? Sure, will do. > > { > > - struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); > > u64 basemono, next_tick, next_tmr, next_rcu, delta, expires; > > unsigned long seq, basejiff; > > - ktime_t tick; > > > > /* Read jiffies and the time when jiffies were updated last */ > > do { > [...] > > > > +static void __tick_nohz_stop_tick(struct tick_sched *ts, int cpu) > > (Same comment here about the underscores). Yup. > > +{ > > + struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); > > + u64 basemono = ts->timer_expires_basemono; > > + u64 expires = ts->timer_expires; > > + ktime_t tick = expires; > > + > > + /* Make sure we won't be trying to stop it twice in a row. */ > > + ts->tick_may_stop = 0; > > + > > + /* > > + * If this CPU is the one which updates jiffies, then give up > > + * the assignment and let it be taken by the CPU which runs > > + * the tick timer next, which might be this CPU as well. If we > > + * don't drop this here the jiffies might be stale and > > + * do_timer() never invoked. Keep track of the fact that it > > + * was the one which had the do_timer() duty last. > > + */ > > + if (cpu == tick_do_timer_cpu) { > > + tick_do_timer_cpu = TICK_DO_TIMER_NONE; > > + ts->do_timer_last = 1; > > + } > > > > /* Skip reprogram of event if its not changed */ > > if (ts->tick_stopped && (expires == ts->next_tick)) { > > /* Sanity check: make sure clockevent is actually programmed */ > > if (tick == KTIME_MAX || ts->next_tick == hrtimer_get_expires(&ts->sched_timer)) > > - goto out; > > + return; > > > > WARN_ON_ONCE(1); > > printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n", > [...] > > +void tick_nohz_idle_retain_tick(void) > > +{ > > + __this_cpu_write(tick_cpu_sched.tick_may_stop, 0); > > +} > > + > > So, I've become overly-paranoid about cached expiration on nohz code, we've run > into bugs that took months to debug before. It seems that the cached version shouldn't > leak in any way there, still can we have checks such as this in tick_nohz_idle_enter/exit()? > > WARN_ON_ONCE(__this_cpu_read(tick_cpu_sched.tick_may_stop)); OK > Otherwise a leaking cached expiration may mislead further nohz tick stop and > bypass calls to tick_nohz_next_event(). > > Also let's make sure we never call tick_nohz_get_sleep_length() outside idle: > > WARN_ON_ONCE(!__this_cpu_read(tick_cpu_sched.inidle)); OK I'll respin the series with the above comments addressed early next week. Thanks!