Date: Thu, 20 Jul 2017 10:11:00 +0200 (CEST)
From: Thomas Gleixner <tglx@linutronix.de>
To: "Li, Aubrey" <aubrey.li@linux.intel.com>
cc: Andi Kleen <ak@linux.intel.com>, Peter Zijlstra <peterz@infradead.org>,
        Frederic Weisbecker <fweisbec@gmail.com>,
        Christoph Lameter <cl@linux.com>, Aubrey Li <aubrey.li@intel.com>,
        len.brown@intel.com, rjw@rjwysocki.net, tim.c.chen@linux.intel.com,
        arjan@linux.intel.com, paulmck@linux.vnet.ibm.com,
        yang.zhang.wz@gmail.com, x86@kernel.org, linux-kernel@vger.kernel.org,
        daniel.lezcano@linaro.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle
 periods
In-Reply-To: <0c346d95-baae-715a-bff5-4738b924ccff@linux.intel.com>
Message-ID: <alpine.DEB.2.20.1707200936260.3168@nanos>
References: <20170713182820.sn3fjitnd3mca27p@hirez.programming.kicks-ass.net> <31170ac6-9db1-f0b8-4841-f1661c8ed6e1@linux.intel.com> <20170714153818.pjauqxebxyhs6ljp@hirez.programming.kicks-ass.net> <20170714155356.GH3441@tassilo.jf.intel.com>
 <20170714160648.tg2u6eo2id6gmnjz@hirez.programming.kicks-ass.net> <20170714162619.GJ3441@tassilo.jf.intel.com> <20170717192309.ubn5muvc3u7htuaw@hirez.programming.kicks-ass.net> <34371ef8-b8bc-d2bf-93de-3fccd6beb032@linux.intel.com> <20170718044521.GO3441@tassilo.jf.intel.com>
 <alpine.DEB.2.20.1707180841290.1945@nanos> <20170718065926.GP3441@tassilo.jf.intel.com> <alpine.DEB.2.20.1707180918080.1945@nanos> <348019f4-85ae-ba91-3fce-9886533e8d22@linux.intel.com> <alpine.DEB.2.20.1707190928510.2286@nanos>
 <0c346d95-baae-715a-bff5-4738b924ccff@linux.intel.com>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 1349
Lines: 43

On Thu, 20 Jul 2017, Li, Aubrey wrote:
> Don't get me wrong, even if a fast path is acceptable, we still need to
> figure out if the coming idle is short and when to switch. I'm just worried
> about if irq timings is not an ideal statistics, we have to skip it too.

There is no ideal solution ever.

Let's sit back and look at that from the big picture first before dismissing
a particular item upfront.

The current NOHZ implementation does:

    predict = nohz_predict(timers, rcu, arch, irqwork);

    if ((predict - now) > X)
        stop_tick();

The C-State machinery does something like:

    predict = cstate_predict(next_timer, scheduler);

    cstate = cstate_select(predict);

That disconnect is part of the problem. What we really want is:

    predict = idle_predict(timers, rcu, arch, irqwork, scheduler, irq_timings);

and use that prediction for both the NOHZ and the C-State decision
function. That's the first thing which needs to be addressed.
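
To make the shape of that concrete, here is a minimal userspace sketch, not
kernel code. Everything in it is an assumption made for illustration: the
struct layout, the "earliest expected event wins" rule, the helper names
should_stop_tick() and the residency table, and the convention that
UINT64_MAX means "this source expects nothing". The only point is that one
prediction feeds both decisions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inputs: absolute time (ns) of the next event each source
 * expects. UINT64_MAX means the source has nothing pending. */
struct idle_inputs {
	uint64_t timers;
	uint64_t rcu;
	uint64_t arch;
	uint64_t irqwork;
	uint64_t scheduler;
	uint64_t irq_timings;
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Single prediction: the earliest wakeup any source expects (assumed rule). */
uint64_t idle_predict(const struct idle_inputs *in)
{
	uint64_t p = in->timers;

	p = min_u64(p, in->rcu);
	p = min_u64(p, in->arch);
	p = min_u64(p, in->irqwork);
	p = min_u64(p, in->scheduler);
	p = min_u64(p, in->irq_timings);
	return p;
}

/* NOHZ decision: stop the tick only if the predicted idle exceeds X. */
bool should_stop_tick(uint64_t predict, uint64_t now, uint64_t x)
{
	return predict > now && (predict - now) > x;
}

/* Hypothetical C-state descriptor, table ordered shallow to deep. */
struct cstate {
	const char *name;
	uint64_t target_residency_ns;
};

/* C-state decision: deepest state whose target residency fits the same
 * prediction. Falls back to index 0 if nothing fits. */
size_t cstate_select(const struct cstate *tbl, size_t n,
		     uint64_t predict, uint64_t now)
{
	uint64_t idle = predict > now ? predict - now : 0;
	size_t best = 0;

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].target_residency_ns <= idle)
			best = i;
	}
	return best;
}
```

With that split, a caller computes idle_predict() once per idle entry and
hands the same value to should_stop_tick() and cstate_select(), so the two
decisions cannot disagree about how long the idle period is expected to be.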

Once that is done, you can look into the individual prediction functions,
optimize them or tweak the bits and pieces there, and decide which
predictors work best for a particular workload.

As long as you just look into a particular predictor function and do not
address the underlying conceptual issues first, the outcome is very much
predictable: It's going to be useless crap.

Thanks,

	tglx