Date: Tue, 18 Jul 2017 18:45:14 +0200
From: Peter Zijlstra
To: "Li, Aubrey"
Cc: Thomas Gleixner, Andi Kleen, Frederic Weisbecker, Christoph Lameter, Aubrey Li, len.brown@intel.com, rjw@rjwysocki.net, tim.c.chen@linux.intel.com, arjan@linux.intel.com, paulmck@linux.vnet.ibm.com, yang.zhang.wz@gmail.com, x86@kernel.org, linux-kernel@vger.kernel.org, daniel.lezcano@linaro.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
Message-ID: <20170718164514.wljhan5nmp4kiyb7@hirez.programming.kicks-ass.net>

On Tue, Jul 18, 2017 at 02:56:47PM +0800, Li, Aubrey wrote:
> 3) for tick nohz idle, we want to skip if the coming idle is short. If we can
> skip the tick nohz idle, we then skip all the items depending on it. But, there
> are two hard points:
>
> 3.1) how to compute the period of the coming idle. My current proposal is to
> use two factors in the current idle menu governor.
> There are two possible
> options from Peter and Thomas, the one is to use scheduler idle estimate, which
> is task activity based, the other is to use the statistics generated from irq
> timings work.
>
> 3.2) how to determine if the idle is short or long. My current proposal is to
> use a tunable value via /sys, while Peter prefers an auto-adjust mechanism. I
> didn't get the details of an auto-adjust mechanism yet

So the problem is that the cost of NOHZ enter/exit is, for a large part,
determined by (micro)architecture costs of programming timer hardware. A
single static threshold will never be the right value across all the
various machines we run Linux on.

So my suggestion was simply timing the cost of doing those functions
every time we do them and keeping a running average of their cost. Then
use that measured cost as a basis for selecting when to skip them. For
example, if the estimated idle period (by whatever estimate we end up
using) is less than 4 times the cost of doing NOHZ, don't do NOHZ.

Note how both tick_nohz_idle_{enter,exit}() already take a timestamp at
entry; all you need to do is take another one at exit and subtract.
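The auto-adjust scheme described above can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not kernel code: the names struct cost_avg, struct nohz_cost, cost_avg_update(), cost_avg_time() and nohz_should_skip() are hypothetical. The "running average" is assumed to be an exponentially weighted moving average with weight 1/8. clock_gettime(CLOCK_MONOTONIC) stands in for the timestamps the real tick_nohz_idle_{enter,exit}() take. The factor of 4 is the example from the mail.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Assumed EWMA weight: each new sample moves the average by 1/8 of the error. */
#define COST_EWMA_DIV 8
/* Skip NOHZ when predicted idle < NOHZ_SKIP_FACTOR * measured cost ("4 times"). */
#define NOHZ_SKIP_FACTOR 4

/* Running average of one operation's cost (hypothetical, not a kernel type). */
struct cost_avg {
	uint64_t avg_ns;
	bool valid;	/* false until the first sample has been recorded */
};

/* Separate averages for the enter and exit halves; NOHZ cost is their sum. */
struct nohz_cost {
	struct cost_avg enter_cost;
	struct cost_avg exit_cost;
};

/* Monotonic timestamp in nanoseconds (user-space stand-in). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Fold one measured interval [start_ns, end_ns] into the running average. */
static void cost_avg_update(struct cost_avg *c, uint64_t start_ns, uint64_t end_ns)
{
	assert(end_ns >= start_ns);
	uint64_t sample = end_ns - start_ns;

	if (!c->valid) {
		c->avg_ns = sample;	/* seed the average with the first measurement */
		c->valid = true;
		return;
	}
	/* Signed step so the average can decrease as well as increase. */
	int64_t delta = (int64_t)sample - (int64_t)c->avg_ns;
	c->avg_ns = (uint64_t)((int64_t)c->avg_ns + delta / COST_EWMA_DIV);
}

/* Time one call of fn, a stand-in for tick_nohz_idle_enter() or _exit(). */
static void cost_avg_time(struct cost_avg *c, void (*fn)(void))
{
	uint64_t start = now_ns();	/* plays the role of the existing entry timestamp */

	fn();
	cost_avg_update(c, start, now_ns());	/* the one extra timestamp at exit */
}

/* Decide whether the upcoming idle period is too short to be worth NOHZ. */
static bool nohz_should_skip(const struct nohz_cost *c, uint64_t predicted_idle_ns)
{
	if (!c->enter_cost.valid || !c->exit_cost.valid)
		return false;	/* no measurements yet: keep today's behaviour */
	return predicted_idle_ns <
	       NOHZ_SKIP_FACTOR * (c->enter_cost.avg_ns + c->exit_cost.avg_ns);
}
```

With enter and exit each averaging 1000 ns, the measured NOHZ cost is 2000 ns. Any predicted idle period below 8000 ns would then skip NOHZ. In a real implementation the averages would presumably be per-CPU. The predicted idle length would come from whichever estimator (scheduler idle estimate or irq timings) is eventually chosen.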