Date: Wed, 12 Jul 2017 10:34:10 +0200
From: Peter Zijlstra
To: "Li, Aubrey"
Cc: Frederic Weisbecker, Christoph Lameter, Andi Kleen, Aubrey Li, tglx@linutronix.de, len.brown@intel.com, rjw@rjwysocki.net, tim.c.chen@linux.intel.com, arjan@linux.intel.com, paulmck@linux.vnet.ibm.com, yang.zhang.wz@gmail.com, x86@kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v1 00/11] Create fast idle path for short idle periods
Message-ID: <20170712083410.ualmvnvzoohyami5@hirez.programming.kicks-ass.net>
In-Reply-To: <496d4921-5768-cd1e-654b-38630b7d2e13@linux.intel.com>

On Wed, Jul 12, 2017 at 12:15:08PM +0800, Li, Aubrey wrote:
> Okay, the difference is that Mike's patch uses a very simple algorithm
> to make the decision.

No, the difference is that we don't end up with duplication of a metric
ton of code. It uses the normal idle path; it just makes the NOHZ enter
fail.
The condition Mike uses is why that patch never really went anywhere; it
needs work. For the condition I tend to prefer something auto-adjusting
over a tunable threshold that everybody + dog needs to manually adjust.
So add something that measures the cost of tick_nohz_idle_{enter,exit}()
and base the threshold off of that.

Then of course there's the question of which of the idle estimates to
use. The cpuidle idle estimate includes IRQs, which is important for
actual idle states, but not all interrupts re-enable the tick. The
scheduler idle estimate only considers task activity, which tends to
re-enable the tick.

So the cpuidle estimate is pessimistic in that it'll vastly
underestimate the actual nohz period, while the scheduler estimate will
overestimate it. I suspect the scheduler one is closer to the actual
nohz duration, but this is something we'll have to play with.