Date: Mon, 7 Mar 2016 21:43:17 +0100
From: Peter Zijlstra <peterz@infradead.org>
To: Chris Metcalf <cmetcalf@mellanox.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>,
        Russell King <linux@arm.linux.org.uk>,
        Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
        Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org,
        Aaron Tomlin <atomlin@redhat.com>,
        "Rafael J. Wysocki" <rjw@rjwysocki.net>,
        Daniel Lezcano <daniel.lezcano@linaro.org>
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle
 cpus
Message-ID: <20160307204317.GR6344@twins.programming.kicks-ass.net>
References: <1456782024-7122-1-git-send-email-cmetcalf@ezchip.com>
 <1456782024-7122-3-git-send-email-cmetcalf@ezchip.com>
 <56D5A5E6.9050206@linaro.org>
 <56D5BCE6.3010300@mellanox.com>
 <20160307094852.GA6356@twins.programming.kicks-ass.net>
 <56DDBC88.9060308@mellanox.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <56DDBC88.9060308@mellanox.com>
User-Agent: Mutt/1.5.21 (2012-12-30)
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 2731
Lines: 53

On Mon, Mar 07, 2016 at 12:38:16PM -0500, Chris Metcalf wrote:
> On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> I'm a little skeptical that a single percpu write is going to add much
> measurable overhead to this path. 

That write is almost guaranteed to be a cacheline miss; those things
hurt and do show up on profiles.

> However, we can certainly adapt
> alternate approaches that stay away from the actual idle code.
> 
> One approach (diff appended) is to just test to see if the PC is
> actually in the architecture-specific halt code.  There are two downsides:
> 
> 1. It requires a small amount of per-architecture support.  I've provided
>    the tile support as an example, since that's what I tested.  I expect
>    x86 is a little more complicated since there are more idle paths and
>    they don't currently run the idle instruction(s) at a fixed address, but
>    it's unlikely to be too complicated on any platform.
>    Still, adding anything per-architecture is certainly a downside.
> 
> 2. As proposed, my new alternate solution only handles the non-polling
>    case, so if you are in the polling loop, we won't benefit from having
>    the NMI backtrace code skip over you.  However my guess is that 99% of
>    the time folks do choose to run the default non-polling mode, so this
>    probably still achieves a pretty reasonable outcome.
> 
> A different approach that would handle downside #2 and probably make it
> easier to implement the architecture-specific code for more complicated
> platforms like x86 would be to use the SCHED_TEXT model and tag all the
> low-level idling functions as CPUIDLE_TEXT.  Then the "are we idling"
> test is just a range compare on the PC against __cpuidle_text_{start,end}.
> 
> We'd have to decide whether to make cpu_idle_poll() non-inline and just
> test for being in that function, or whether we could tag all of
> cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
> whenever the PC is anywhere in that function.  Obviously if we have
> called out to more complicated code (e.g. Daniel's concern about calling
> out to power management code) the PC would no longer be in the CPUIDLE_TEXT
> at that point, so that might be OK too.

But the CPU would also not be idle if it's running pm code.

So I like the CPUIDLE_TEXT approach, since it has no impact on the
generated code.
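
To make the range compare concrete, here is a rough userspace sketch.
It is illustrative only: struct text_range and pc_in_text_range() are
made-up names, and in the kernel the bounds would presumably come from
the linker-provided __cpuidle_text_{start,end} symbols rather than a
struct.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the [__cpuidle_text_start, __cpuidle_text_end) bounds. */
struct text_range {
	uintptr_t start;	/* first address inside the section */
	uintptr_t end;		/* first address past the section */
};

/* True if the interrupted PC lies inside the half-open range. */
static bool pc_in_text_range(uintptr_t pc, const struct text_range *r)
{
	return pc >= r->start && pc < r->end;
}
```

The NMI handler would then print a one-line "idle" report instead of a
full backtrace whenever this returns true for the interrupted PC.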

An alternative option could be to inspect the stack. We already take a
stack dump, so you could say that everything that has cpuidle_enter() in
its callchain is an 'idle' cpu.
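
Something like the sketch below is what I have in mind for the
callchain test. It is plain userspace C and assumes the frames have
already been resolved to symbol names; callchain_is_idle() is a
made-up helper, not an existing kernel function, and real code would
more likely compare return addresses than strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: scan an already-symbolized callchain and report
 * whether cpuidle_enter() appears anywhere in it.
 */
static bool callchain_is_idle(const char *const *frames, size_t nr_frames)
{
	for (size_t i = 0; i < nr_frames; i++) {
		if (strcmp(frames[i], "cpuidle_enter") == 0)
			return true;
	}
	return false;
}
```

The cost is that you have to walk the stack before you know whether
the dump can be skipped, but we pay for that walk anyway.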

Yet another option would be to look at rq->idle_state or any other state
cpuidle already tracks. The 'obvious' downside is relying on cpuidle,
which I understand isn't supported by everyone.
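
A rough sketch of that last option, with mock types standing in for
the real scheduler structures. struct mock_rq, struct
mock_cpuidle_state and rq_is_idle() are invented for illustration, and
the sketch assumes a non-NULL idle_state pointer means the CPU is
currently in a cpuidle state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-in for a cpuidle state descriptor (hypothetical). */
struct mock_cpuidle_state {
	const char *name;
};

/* Mock stand-in for the per-cpu runqueue; only the field we care about. */
struct mock_rq {
	struct mock_cpuidle_state *idle_state;	/* assumed NULL when not idle */
};

/* Assumption: the pointer is set on idle entry and cleared on exit. */
static bool rq_is_idle(const struct mock_rq *rq)
{
	return rq->idle_state != NULL;
}
```

On a platform without cpuidle the pointer would never be set, so this
test would always report the CPU as busy, which is the downside noted
above.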