Received: by 10.192.245.15 with SMTP id i15csp992906imn; Sat, 10 Mar 2018 17:47:32 -0800 (PST) X-Google-Smtp-Source: AG47ELsJhpUwVlDTnZeNovpOwccZKNzj3sB+wfTe7noz+7S2FB7Om/sT6yFmoZYDPqX/HE0zAQ9r X-Received: by 2002:a17:902:8b85:: with SMTP id ay5-v6mr3516536plb.329.1520732852162; Sat, 10 Mar 2018 17:47:32 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1520732852; cv=none; d=google.com; s=arc-20160816; b=WcjCKYY9IFnAh8vJ8ktO3j2wAZzrxFVY9msO+Pct8aoq2I9A+xqfKYTck51hKVuIgV k+X5hxiTBA7chJLAmh7syVvEGuaNvPcLGjDdt9FWdv/bwuXCYjJJIfk6fKIAAP2YPF5t AinAw8Y4+Az2IdYD2GmXNRPSEEVNTVp506nGkqr401A/QuCfBYvbQ974ul1xq05X4B1B VXFA64mduXYdC02Af07UX54Dw3H6ggvIPtJw6kXvf8kn0ts+Q5f80bhJUpZeoQpCirsv IFVFTil876YNEslpwGdurIaWx2LEAbWiIYBV8bPeSkxLET5ungA/rmAQ4lJGvpd7u/tz 2gnA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:user-agent:in-reply-to :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:dmarc-filter:arc-authentication-results; bh=FLdU5KirAjXSgLvKtlvw9T5qFeH5goYavUmTw1oU32A=; b=fSbJTVzVgZKYIcu35NP4sSqeb0PzVtuamWsdAZAgrJ3mTWbC86nQVloBHMows6M5/d i0qJc0bCjIbHtpH21C59ywc1aN4OLkuTkzmQ53nfH6CZSDvGPAoGysw+qLk3Ad0nLE/r B/VW20gfjLUdCALRjZBmjHXMoqF2xb6Dy/aA4RtPR0gA2QfBv9r78oGG0IPKXCge/65D MPNK7bb+8eRFhX7fk327wu47T4a5lHX6TcXvRfP6GDVGN/L8ocjAhKt2nkBE9gYTZIkd L8UU2DlPtA1nJcneomgn3lHnTSLajij9JlFhkdChRFPa7jzEnSfOVyJ6m8IiWQcEbN/j iVoQ== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id d24si3436735pfn.336.2018.03.10.17.46.40; Sat, 10 Mar 2018 17:47:32 -0800 (PST) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751234AbeCKBom (ORCPT + 99 others); Sat, 10 Mar 2018 20:44:42 -0500 Received: from mail.kernel.org ([198.145.29.99]:55600 "EHLO mail.kernel.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751199AbeCKBok (ORCPT ); Sat, 10 Mar 2018 20:44:40 -0500 Received: from localhost (i16-les03-th2-31-37-47-191.sfr.lns.abo.bbox.fr [31.37.47.191]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPSA id 8C67820685; Sun, 11 Mar 2018 01:44:39 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8C67820685 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=kernel.org Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=frederic@kernel.org Date: Sun, 11 Mar 2018 02:44:37 +0100 From: Frederic Weisbecker To: "Rafael J. Wysocki" Cc: Peter Zijlstra , Linux PM , Frederic Weisbecker , Thomas Gleixner , Paul McKenney , Thomas Ilsche , Doug Smythies , Rik van Riel , Aubrey Li , Mike Galbraith , LKML Subject: Re: [RFC/RFT][PATCH v3 5/6] sched: idle: Select idle state before stopping the tick Message-ID: <20180311014435.GA30062@lerouge> References: <2450532.XN8DODrtDf@aspire.rjw.lan> <3333184.WFRNK4vEe0@aspire.rjw.lan> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3333184.WFRNK4vEe0@aspire.rjw.lan> User-Agent: Mutt/1.5.24 (2015-08-30) Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Mar 09, 2018 at 10:46:55AM +0100, Rafael J. Wysocki wrote: > --- linux-pm.orig/kernel/time/tick-sched.h > +++ linux-pm/kernel/time/tick-sched.h > @@ -30,6 +30,7 @@ enum tick_nohz_mode { > * when the CPU returns from nohz sleep. > * @next_tick: Next tick to be fired when in dynticks mode. > * @tick_stopped: Indicator that the idle tick has been stopped > + * @tick_may_stop: Indicator that the idle tick may be stopped shortly Perhaps we can set timer_expires to 0 instead when we want to invalidate the last value? > * @idle_jiffies: jiffies at the entry to idle for idle time accounting > * @idle_calls: Total number of idle calls > * @idle_sleeps: Number of idle calls, where the sched tick was stopped > @@ -38,7 +39,6 @@ enum tick_nohz_mode { > * @idle_exittime: Time when the idle state was left > * @idle_sleeptime: Sum of the time slept in idle with sched tick stopped > * @iowait_sleeptime: Sum of the time slept in idle with sched tick stopped, with IO outstanding > - * @sleep_length: Duration of the current idle sleep > * @do_timer_lst: CPU was the last one doing do_timer before going idle > */ > struct tick_sched { > @@ -49,6 +49,7 @@ struct tick_sched { > ktime_t next_tick; > int inidle; > int tick_stopped; > + int tick_may_stop; > unsigned long idle_jiffies; > unsigned long idle_calls; > unsigned long idle_sleeps; > @@ -58,8 +59,9 @@ struct tick_sched { > ktime_t idle_exittime; > ktime_t idle_sleeptime; > ktime_t iowait_sleeptime; > - ktime_t sleep_length; > unsigned long last_jiffies; > + u64 timer_expires; > + u64 timer_expires_basemono; We may need documentation for the above fields too. > Index: linux-pm/kernel/time/tick-sched.c > =================================================================== > --- linux-pm.orig/kernel/time/tick-sched.c > +++ linux-pm/kernel/time/tick-sched.c > @@ -652,13 +652,10 @@ static inline bool local_timer_softirq_p > return local_softirq_pending() & TIMER_SOFTIRQ; > } > > -static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts, > - ktime_t now, int cpu) > +static ktime_t __tick_nohz_next_event(struct tick_sched *ts, int cpu) Since we don't seem to have a lower level version, can we remove the underscores? > { > - struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); > u64 basemono, next_tick, next_tmr, next_rcu, delta, expires; > unsigned long seq, basejiff; > - ktime_t tick; > > /* Read jiffies and the time when jiffies were updated last */ > do { [...] > +static void __tick_nohz_stop_tick(struct tick_sched *ts, int cpu) (Same comment here about the underscores). > +{ > + struct clock_event_device *dev = __this_cpu_read(tick_cpu_device.evtdev); > + u64 basemono = ts->timer_expires_basemono; > + u64 expires = ts->timer_expires; > + ktime_t tick = expires; > + > + /* Make sure we won't be trying to stop it twice in a row. */ > + ts->tick_may_stop = 0; > + > + /* > + * If this CPU is the one which updates jiffies, then give up > + * the assignment and let it be taken by the CPU which runs > + * the tick timer next, which might be this CPU as well. If we > + * don't drop this here the jiffies might be stale and > + * do_timer() never invoked. Keep track of the fact that it > + * was the one which had the do_timer() duty last. > + */ > + if (cpu == tick_do_timer_cpu) { > + tick_do_timer_cpu = TICK_DO_TIMER_NONE; > + ts->do_timer_last = 1; > + } > > /* Skip reprogram of event if its not changed */ > if (ts->tick_stopped && (expires == ts->next_tick)) { > /* Sanity check: make sure clockevent is actually programmed */ > if (tick == KTIME_MAX || ts->next_tick == hrtimer_get_expires(&ts->sched_timer)) > - goto out; > + return; > > WARN_ON_ONCE(1); > printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n", [...] > +void tick_nohz_idle_retain_tick(void) > +{ > + __this_cpu_write(tick_cpu_sched.tick_may_stop, 0); > +} > + So, I've become overly-paranoid about cached expiration on nohz code, we've run into bugs that took months to debug before. It seems that the cached version shouldn't leak in any way there, still can we have checks such as this in tick_nohz_idle_enter/exit()? WARN_ON_ONCE(__this_cpu_read(tick_cpu_sched.tick_may_stop)); Otherwise a leaking cached expiration may mislead further nohz tick stop and bypass calls to tick_nohz_next_event(). Also let's make sure we never call tick_nohz_get_sleep_length() outside idle: WARN_ON_ONCE(!__this_cpu_read(tick_cpu_sched.inidle)); Thanks!