Date: Mon, 7 Mar 2016 21:43:17 +0100
From: Peter Zijlstra
To: Chris Metcalf
Cc: Daniel Thompson, Russell King, Thomas Gleixner, Ingo Molnar, Andrew Morton, linux-kernel@vger.kernel.org, Aaron Tomlin, "Rafael J. Wysocki", Daniel Lezcano
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus
Message-ID: <20160307204317.GR6344@twins.programming.kicks-ass.net>
In-Reply-To: <56DDBC88.9060308@mellanox.com>

On Mon, Mar 07, 2016 at 12:38:16PM -0500, Chris Metcalf wrote:
> On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> I'm a little skeptical that a single percpu write is going to add much
> measurable overhead to this path.

So that write is almost guaranteed to be a cacheline miss; those things
hurt and do show up on profiles.

> However, we can certainly adapt
> alternate approaches that stay away from the actual idle code.
>
> One approach (diff appended) is to just test to see if the PC is
> actually in the architecture-specific halt code. There are two downsides:
>
> 1. It requires a small amount of per-architecture support. I've provided
> the tile support as an example, since that's what I tested. I expect
> x86 is a little more complicated since there are more idle paths and
> they don't currently run the idle instruction(s) at a fixed address, but
> it's unlikely to be too complicated on any platform.
> Still, adding anything per-architecture is certainly a downside.
>
> 2. As proposed, my new alternate solution only handles the non-polling
> case, so if you are in the polling loop, we won't benefit from having
> the NMI backtrace code skip over you. However my guess is that 99% of
> the time folks do choose to run the default non-polling mode, so this
> probably still achieves a pretty reasonable outcome.
>
> A different approach that would handle downside #2 and probably make it
> easier to implement the architecture-specific code for more complicated
> platforms like x86 would be to use the SCHED_TEXT model and tag all the
> low-level idling functions as CPUIDLE_TEXT. Then the "are we idling"
> test is just a range compare on the PC against __cpuidle_text_{start,end}.
>
> We'd have to decide whether to make cpu_idle_poll() non-inline and just
> test for being in that function, or whether we could tag all of
> cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
> whenever the PC is anywhere in that function. Obviously if we have
> called out to more complicated code (e.g. Daniel's concern about calling
> out to power management code) the PC would no longer be in the CPUIDLE_TEXT
> at that point, so that might be OK too.

But the CPU would also not be idle if it's running pm code. So I like the
CPUIDLE_TEXT approach, since it has no impact on the generated code.

An alternative option could be to inspect the stack. We already take a
stack dump, so you could say that everything that has cpuidle_enter() in
its callchain is an 'idle' cpu.

Yet another option would be to look at rq->idle_state or any other state
cpuidle already tracks.
The 'obvious' downside is relying on cpuidle, which I understand isn't supported by everyone.
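
For illustration only, the CPUIDLE_TEXT range compare discussed above could look roughly like the user-space sketch below. It is not the appended diff and not kernel code: struct text_range, pc_in_range(), nmi_report_kind() and the report_kind enum are hypothetical names, and the section bounds are plain values standing in for the linker-provided __cpuidle_text_{start,end} symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the section bounds __cpuidle_text_start and
 * __cpuidle_text_end. In a kernel these would be linker-provided symbols;
 * here they are ordinary values so the sketch runs in user space.
 */
struct text_range {
	uintptr_t start;	/* first address inside the section */
	uintptr_t end;		/* one past the last address */
};

/* Return true if pc lies in the half-open interval [start, end). */
static bool pc_in_range(const struct text_range *r, uintptr_t pc)
{
	assert(r->start <= r->end);
	return pc >= r->start && pc < r->end;
}

/* Hypothetical report types: a one-line note or a full backtrace. */
enum report_kind { REPORT_ONE_LINE, REPORT_FULL };

/*
 * Decide what to print for a CPU interrupted at pc: a one-line report if
 * the PC is inside the idle text, a full backtrace otherwise.
 */
static enum report_kind nmi_report_kind(const struct text_range *idle_text,
					uintptr_t pc)
{
	return pc_in_range(idle_text, pc) ? REPORT_ONE_LINE : REPORT_FULL;
}
```

The check is only a comparison in the NMI path, so nothing is added to the idle path itself, which is the property argued for above. A PC that has left the tagged text (for example in power management code) falls outside the range and gets a full backtrace.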