Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle
 cpus
To: Chris Metcalf <cmetcalf@mellanox.com>
References: <1456782024-7122-1-git-send-email-cmetcalf@ezchip.com>
 <1456782024-7122-3-git-send-email-cmetcalf@ezchip.com>
 <56D5A5E6.9050206@linaro.org> <56D5BCE6.3010300@mellanox.com>
Cc: Russell King <linux@arm.linux.org.uk>,
        Thomas Gleixner <tglx@linutronix.de>, Ingo Molnar <mingo@redhat.com>,
        Andrew Morton <akpm@osdl.org>, linux-kernel@vger.kernel.org,
        Aaron Tomlin <atomlin@redhat.com>,
        Peter Zijlstra <peterz@infradead.org>,
        "Rafael J. Wysocki" <rjw@rjwysocki.net>,
        Daniel Lezcano <daniel.lezcano@linaro.org>
From: Daniel Thompson <daniel.thompson@linaro.org>
Message-ID: <56DD3B3D.1010404@linaro.org>
Date: Mon, 7 Mar 2016 15:26:37 +0700
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101
 Thunderbird/38.6.0
MIME-Version: 1.0
In-Reply-To: <56D5BCE6.3010300@mellanox.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
Content-Length: 6116
Lines: 153

On 01/03/16 23:01, Chris Metcalf wrote:
> (+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)
>
> On 03/01/2016 09:23 AM, Daniel Thompson wrote:
>> On 29/02/16 21:40, Chris Metcalf wrote:
>>> When doing an nmi backtrace of many cores, and most of them are idle,
>>> the output is a little overwhelming and very uninformative. Suppress
>>> messages for cpus that are idling when they are interrupted and
>>> just emit one line, "NMI backtrace for N skipped: idle".
>>
>> I can see this makes the logs more attractive but this is code for
>> emergency situations.
>>
>> The idle task is responsible for certain power management activities.
>> How can you be sure the system is wedged because of bugs in that code?
>
> It's a fair point, but as core count increases, you really run the risk
> of losing the valuable data in a sea of data that isn't.  For example,
> for the architecture I maintain, we have the TILE-Gx72, which is a
> 72-core chip.  If each core's register dump and backtrace is 40 lines,
> we're up to around 3,000 lines of console output. Filtering that down by
> a factor of 10x or more (if only a handful of cores happen to be active,
> which is not uncommon) is a substantial usability improvement.

No objections to your use case. The output feels very verbose even with 
"only" eight cores.


> That said, it's true that the original solution I offered (examining
> just is_idle_task() plus interrupt nesting) is imprecise.  It is
> relatively straightforward to add a bit of per-cpu state that is set at
> the same moment we currently do stop/start_critical_timings(), which
> would indicate much more specifically that the cpu was running the
> idling code itself, and not anything more complex.  In that case if the
> flag was set, you would know you were either sitting on a
> processor-specific idle instruction in arch_cpu_idle(), or else polling
> one or two memory locations in a tight loop in cpu_idle_poll(), which
> presumably would offer sufficient precision to feel safe.
>
> Here's an alternative version of the patch which incorporates this
> idea.  Do you think this is preferable?  Thanks!

I prefer the approach taken by the new patch although I think the 
implementation might be buggy...


> commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
> Author: Chris Metcalf <cmetcalf@ezchip.com>
> Date:   Mon Feb 29 11:56:32 2016 -0500
>
>      nmi_backtrace: generate one-line reports for idle cpus
>      When doing an nmi backtrace of many cores, and most of them are idle,
>      the output is a little overwhelming and very uninformative.  Suppress
>      messages for cpus that are idling when they are interrupted and
>      just emit one line, "NMI backtrace for N skipped: idle".
>      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 786ad32631a6..b8c3c4cf88ad 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct
> cpuidle_driver *drv,
>   /* kernel/sched/idle.c */
>   extern void sched_idle_set_state(struct cpuidle_state *idle_state);
>   extern void default_idle_call(void);
> +extern bool in_cpu_idle(void);
>
>   #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
>   void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev,
> atomic_t *a);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 544a7133cbd1..9aff315f278b 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>   __setup("hlt", cpu_idle_nopoll_setup);
>   #endif
>
> +static DEFINE_PER_CPU(bool, cpu_idling);
> +
> +/* Was the cpu was in the low-level idle code when interrupted? */
> +bool in_cpu_idle(void)
> +{
> +    return this_cpu_read(cpu_idling);

I think we continue to need the code to identify a core that is running 
an interrupt handler. Interrupts are not masked at the point we set 
cpu_idling to false meaning we can easily be preempted before we clear 
the flag.


> +}
> +
>   static inline int cpu_idle_poll(void)
>   {
>       rcu_idle_enter();
>       trace_cpu_idle_rcuidle(0, smp_processor_id());
>       local_irq_enable();
>       stop_critical_timings();
> +    this_cpu_write(cpu_idling, true);
>       while (!tif_need_resched() &&
>           (cpu_idle_force_poll || tick_check_broadcast_expired()))
>           cpu_relax();
> +    this_cpu_write(cpu_idling, false);
>       start_critical_timings();
>       trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>       rcu_idle_exit();
> @@ -89,7 +99,9 @@ void default_idle_call(void)
>           local_irq_enable();
>       } else {
>           stop_critical_timings();
> +        this_cpu_write(cpu_idling, true);
>           arch_cpu_idle();
> +        this_cpu_write(cpu_idling, false);
>           start_critical_timings();
>       }
>   }
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index db63ac75eba0..75b5eacaa5d3 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -17,6 +17,7 @@
>   #include <linux/kprobes.h>
>   #include <linux/nmi.h>
>   #include <linux/seq_buf.h>
> +#include <linux/cpuidle.h>
>
>   #ifdef arch_trigger_cpumask_backtrace
>   /* For reliability, we're prepared to waste bits here. */
> @@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>
>           /* Replace printk to write into the NMI seq */
>           this_cpu_write(printk_func, nmi_vprintk);
> -        pr_warn("NMI backtrace for cpu %d\n", cpu);
> -        if (regs)
> -            show_regs(regs);
> -        else
> -            dump_stack();
> +        if (in_cpu_idle()) {
> +            pr_warn("NMI backtrace for cpu %d skipped: idle\n",
> +                cpu);
> +        } else {
> +            pr_warn("NMI backtrace for cpu %d\n", cpu);
> +            if (regs)
> +                show_regs(regs);
> +            else
> +                dump_stack();
> +        }
>           this_cpu_write(printk_func, printk_func_save);
>
>           cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>