Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752476AbcCGI05 (ORCPT ); Mon, 7 Mar 2016 03:26:57 -0500 Received: from mail-pa0-f52.google.com ([209.85.220.52]:33952 "EHLO mail-pa0-f52.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752339AbcCGI0z (ORCPT ); Mon, 7 Mar 2016 03:26:55 -0500 Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus To: Chris Metcalf References: <1456782024-7122-1-git-send-email-cmetcalf@ezchip.com> <1456782024-7122-3-git-send-email-cmetcalf@ezchip.com> <56D5A5E6.9050206@linaro.org> <56D5BCE6.3010300@mellanox.com> Cc: Russell King , Thomas Gleixner , Ingo Molnar , Andrew Morton , linux-kernel@vger.kernel.org, Aaron Tomlin , Peter Zijlstra , "Rafael J. Wysocki" , Daniel Lezcano From: Daniel Thompson Message-ID: <56DD3B3D.1010404@linaro.org> Date: Mon, 7 Mar 2016 15:26:37 +0700 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0 MIME-Version: 1.0 In-Reply-To: <56D5BCE6.3010300@mellanox.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: 7bit Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Length: 6116 Lines: 153 On 01/03/16 23:01, Chris Metcalf wrote: > (+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler) > > On 03/01/2016 09:23 AM, Daniel Thompson wrote: >> On 29/02/16 21:40, Chris Metcalf wrote: >>> When doing an nmi backtrace of many cores, and most of them are idle, >>> the output is a little overwhelming and very uninformative. Suppress >>> messages for cpus that are idling when they are interrupted and >>> just emit one line, "NMI backtrace for N skipped: idle". >> >> I can see this makes the logs more attractive but this is code for >> emergency situations. >> >> The idle task is responsible for certain power management activities. >> How can you be sure the system is wedged because of bugs in that code? > > It's a fair point, but as core count increases, you really run the risk > of losing the valuable data in a sea of data that isn't. For example, > for the architecture I maintain, we have the TILE-Gx72, which is a > 72-core chip. If each core's register dump and backtrace is 40 lines, > we're up to around 3,000 lines of console output. Filtering that down by > a factor of 10x or more (if only a handful of cores happen to be active, > which is not uncommon) is a substantial usability improvement. No objections to your use case. The output feels very verbose even with "only" eight cores. > That said, it's true that the original solution I offered (examining > just is_idle_task() plus interrupt nesting) is imprecise. It is > relatively straightforward to add a bit of per-cpu state that is set at > the same moment we currently do stop/start_critical_timings(), which > would indicate much more specifically that the cpu was running the > idling code itself, and not anything more complex. In that case if the > flag was set, you would know you were either sitting on a > processor-specific idle instruction in arch_cpu_idle(), or else polling > one or two memory locations in a tight loop in cpu_idle_poll(), which > presumably would offer sufficient precision to feel safe. > > Here's an alternative version of the patch which incorporates this > idea. Do you think this is preferable? Thanks! I prefer the approach taken by the new patch although I think the implementation might be buggy... > commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262 > Author: Chris Metcalf > Date: Mon Feb 29 11:56:32 2016 -0500 > > nmi_backtrace: generate one-line reports for idle cpus > When doing an nmi backtrace of many cores, and most of them are idle, > the output is a little overwhelming and very uninformative. Suppress > messages for cpus that are idling when they are interrupted and > just emit one line, "NMI backtrace for N skipped: idle". > Signed-off-by: Chris Metcalf > > diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h > index 786ad32631a6..b8c3c4cf88ad 100644 > --- a/include/linux/cpuidle.h > +++ b/include/linux/cpuidle.h > @@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct > cpuidle_driver *drv, > /* kernel/sched/idle.c */ > extern void sched_idle_set_state(struct cpuidle_state *idle_state); > extern void default_idle_call(void); > +extern bool in_cpu_idle(void); > > #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED > void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, > atomic_t *a); > diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c > index 544a7133cbd1..9aff315f278b 100644 > --- a/kernel/sched/idle.c > +++ b/kernel/sched/idle.c > @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused) > __setup("hlt", cpu_idle_nopoll_setup); > #endif > > +static DEFINE_PER_CPU(bool, cpu_idling); > + > +/* Was the cpu was in the low-level idle code when interrupted? */ > +bool in_cpu_idle(void) > +{ > + return this_cpu_read(cpu_idling); I think we continue to need the code to identify a core that is running an interrupt handler. Interrupts are not masked at the point we set cpu_idling to false meaning we can easily be preempted before we clear the flag. > +} > + > static inline int cpu_idle_poll(void) > { > rcu_idle_enter(); > trace_cpu_idle_rcuidle(0, smp_processor_id()); > local_irq_enable(); > stop_critical_timings(); > + this_cpu_write(cpu_idling, true); > while (!tif_need_resched() && > (cpu_idle_force_poll || tick_check_broadcast_expired())) > cpu_relax(); > + this_cpu_write(cpu_idling, false); > start_critical_timings(); > trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id()); > rcu_idle_exit(); > @@ -89,7 +99,9 @@ void default_idle_call(void) > local_irq_enable(); > } else { > stop_critical_timings(); > + this_cpu_write(cpu_idling, true); > arch_cpu_idle(); > + this_cpu_write(cpu_idling, false); > start_critical_timings(); > } > } > diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c > index db63ac75eba0..75b5eacaa5d3 100644 > --- a/lib/nmi_backtrace.c > +++ b/lib/nmi_backtrace.c > @@ -17,6 +17,7 @@ > #include > #include > #include > +#include > > #ifdef arch_trigger_cpumask_backtrace > /* For reliability, we're prepared to waste bits here. */ > @@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs) > > /* Replace printk to write into the NMI seq */ > this_cpu_write(printk_func, nmi_vprintk); > - pr_warn("NMI backtrace for cpu %d\n", cpu); > - if (regs) > - show_regs(regs); > - else > - dump_stack(); > + if (in_cpu_idle()) { > + pr_warn("NMI backtrace for cpu %d skipped: idle\n", > + cpu); > + } else { > + pr_warn("NMI backtrace for cpu %d\n", cpu); > + if (regs) > + show_regs(regs); > + else > + dump_stack(); > + } > this_cpu_write(printk_func, printk_func_save); > > cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask)); >