2016-03-01 14:23:38

by Daniel Thompson

Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On 29/02/16 21:40, Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, and most of them are idle,
> the output is a little overwhelming and very uninformative. Suppress
> messages for cpus that are idling when they are interrupted and
> just emit one line, "NMI backtrace for N skipped: idle".

I can see this makes the logs more attractive, but this is code for
emergency situations.

The idle task is responsible for certain power management activities.
How can you be sure the system isn't wedged because of bugs in that code?


Daniel.


2016-03-01 16:01:59

by Chris Metcalf

Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

(+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)

On 03/01/2016 09:23 AM, Daniel Thompson wrote:
> On 29/02/16 21:40, Chris Metcalf wrote:
>> When doing an nmi backtrace of many cores, and most of them are idle,
>> the output is a little overwhelming and very uninformative. Suppress
>> messages for cpus that are idling when they are interrupted and
>> just emit one line, "NMI backtrace for N skipped: idle".
>
> I can see this makes the logs more attractive, but this is code for
> emergency situations.
>
> The idle task is responsible for certain power management activities.
> How can you be sure the system isn't wedged because of bugs in that code?

It's a fair point, but as core count increases, you really run the risk
of losing the valuable data in a sea of data that isn't. For example,
for the architecture I maintain, we have the TILE-Gx72, which is a
72-core chip. If each core's register dump and backtrace is 40 lines,
we're up to around 3,000 lines of console output. Filtering that down by
a factor of 10x or more (if only a handful of cores happen to be active,
which is not uncommon) is a substantial usability improvement.

That said, it's true that the original solution I offered (examining
just is_idle_task() plus interrupt nesting) is imprecise. It is
relatively straightforward to add a bit of per-cpu state that is set at
the same moment we currently do stop/start_critical_timings(), which
would indicate much more specifically that the cpu was running the
idling code itself, and not anything more complex. In that case if the
flag was set, you would know you were either sitting on a
processor-specific idle instruction in arch_cpu_idle(), or else polling
one or two memory locations in a tight loop in cpu_idle_poll(), which
presumably would offer sufficient precision to feel safe.

Here's an alternative version of the patch which incorporates this
idea. Do you think this is preferable? Thanks!

commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
Author: Chris Metcalf <[email protected]>
Date: Mon Feb 29 11:56:32 2016 -0500

nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, and most of them are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and
just emit one line, "NMI backtrace for N skipped: idle".

Signed-off-by: Chris Metcalf <[email protected]>

diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
index 786ad32631a6..b8c3c4cf88ad 100644
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct cpuidle_driver *drv,
/* kernel/sched/idle.c */
extern void sched_idle_set_state(struct cpuidle_state *idle_state);
extern void default_idle_call(void);
+extern bool in_cpu_idle(void);

#ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, atomic_t *a);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..9aff315f278b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

+static DEFINE_PER_CPU(bool, cpu_idling);
+
+/* Was the cpu in the low-level idle code when interrupted? */
+bool in_cpu_idle(void)
+{
+ return this_cpu_read(cpu_idling);
+}
+
static inline int cpu_idle_poll(void)
{
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
local_irq_enable();
stop_critical_timings();
+ this_cpu_write(cpu_idling, true);
while (!tif_need_resched() &&
(cpu_idle_force_poll || tick_check_broadcast_expired()))
cpu_relax();
+ this_cpu_write(cpu_idling, false);
start_critical_timings();
trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
rcu_idle_exit();
@@ -89,7 +99,9 @@ void default_idle_call(void)
local_irq_enable();
} else {
stop_critical_timings();
+ this_cpu_write(cpu_idling, true);
arch_cpu_idle();
+ this_cpu_write(cpu_idling, false);
start_critical_timings();
}
}
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..75b5eacaa5d3 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpuidle.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (in_cpu_idle()) {
+ pr_warn("NMI backtrace for cpu %d skipped: idle\n",
+ cpu);
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);

cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-07 08:26:57

by Daniel Thompson

Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On 01/03/16 23:01, Chris Metcalf wrote:
> (+PeterZ, Rafael, and Daniel Lezcano for cpuidle and scheduler)
>
> On 03/01/2016 09:23 AM, Daniel Thompson wrote:
>> On 29/02/16 21:40, Chris Metcalf wrote:
>>> When doing an nmi backtrace of many cores, and most of them are idle,
>>> the output is a little overwhelming and very uninformative. Suppress
>>> messages for cpus that are idling when they are interrupted and
>>> just emit one line, "NMI backtrace for N skipped: idle".
>>
>> I can see this makes the logs more attractive, but this is code for
>> emergency situations.
>>
>> The idle task is responsible for certain power management activities.
>> How can you be sure the system isn't wedged because of bugs in that code?
>
> It's a fair point, but as core count increases, you really run the risk
> of losing the valuable data in a sea of data that isn't. For example,
> for the architecture I maintain, we have the TILE-Gx72, which is a
> 72-core chip. If each core's register dump and backtrace is 40 lines,
> we're up to around 3,000 lines of console output. Filtering that down by
> a factor of 10x or more (if only a handful of cores happen to be active,
> which is not uncommon) is a substantial usability improvement.

No objections to your use case. The output feels very verbose even with
"only" eight cores.


> That said, it's true that the original solution I offered (examining
> just is_idle_task() plus interrupt nesting) is imprecise. It is
> relatively straightforward to add a bit of per-cpu state that is set at
> the same moment we currently do stop/start_critical_timings(), which
> would indicate much more specifically that the cpu was running the
> idling code itself, and not anything more complex. In that case if the
> flag was set, you would know you were either sitting on a
> processor-specific idle instruction in arch_cpu_idle(), or else polling
> one or two memory locations in a tight loop in cpu_idle_poll(), which
> presumably would offer sufficient precision to feel safe.
>
> Here's an alternative version of the patch which incorporates this
> idea. Do you think this is preferable? Thanks!

I prefer the approach taken by the new patch although I think the
implementation might be buggy...


> commit 5b6dca9bad908ae66fa764025c4e6046a6cc0262
> Author: Chris Metcalf <[email protected]>
> Date: Mon Feb 29 11:56:32 2016 -0500
>
> nmi_backtrace: generate one-line reports for idle cpus
>
> When doing an nmi backtrace of many cores, and most of them are idle,
> the output is a little overwhelming and very uninformative. Suppress
> messages for cpus that are idling when they are interrupted and
> just emit one line, "NMI backtrace for N skipped: idle".
>
> Signed-off-by: Chris Metcalf <[email protected]>
>
> diff --git a/include/linux/cpuidle.h b/include/linux/cpuidle.h
> index 786ad32631a6..b8c3c4cf88ad 100644
> --- a/include/linux/cpuidle.h
> +++ b/include/linux/cpuidle.h
> @@ -206,6 +206,7 @@ static inline int cpuidle_enter_freeze(struct cpuidle_driver *drv,
> /* kernel/sched/idle.c */
> extern void sched_idle_set_state(struct cpuidle_state *idle_state);
> extern void default_idle_call(void);
> +extern bool in_cpu_idle(void);
>
> #ifdef CONFIG_ARCH_NEEDS_CPU_IDLE_COUPLED
> void cpuidle_coupled_parallel_barrier(struct cpuidle_device *dev, atomic_t *a);
> diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
> index 544a7133cbd1..9aff315f278b 100644
> --- a/kernel/sched/idle.c
> +++ b/kernel/sched/idle.c
> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
> __setup("hlt", cpu_idle_nopoll_setup);
> #endif
>
> +static DEFINE_PER_CPU(bool, cpu_idling);
> +
> +/* Was the cpu in the low-level idle code when interrupted? */
> +bool in_cpu_idle(void)
> +{
> + return this_cpu_read(cpu_idling);

I think we continue to need the code to identify a core that is running
an interrupt handler. Interrupts are not masked at the point we set
cpu_idling to false, meaning we can easily be preempted before we clear
the flag.
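
I.e. the window is something like this:

	this_cpu_write(cpu_idling, true);
	arch_cpu_idle();		/* returns with irqs enabled */
	/*
	 * <-- an irq can fire and run its handler here, while
	 * cpu_idling is still true, so an NMI backtrace taken at
	 * this point wrongly reports this cpu as idle.
	 */
	this_cpu_write(cpu_idling, false);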


> +}
> +
> static inline int cpu_idle_poll(void)
> {
> rcu_idle_enter();
> trace_cpu_idle_rcuidle(0, smp_processor_id());
> local_irq_enable();
> stop_critical_timings();
> + this_cpu_write(cpu_idling, true);
> while (!tif_need_resched() &&
> (cpu_idle_force_poll || tick_check_broadcast_expired()))
> cpu_relax();
> + this_cpu_write(cpu_idling, false);
> start_critical_timings();
> trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> rcu_idle_exit();
> @@ -89,7 +99,9 @@ void default_idle_call(void)
> local_irq_enable();
> } else {
> stop_critical_timings();
> + this_cpu_write(cpu_idling, true);
> arch_cpu_idle();
> + this_cpu_write(cpu_idling, false);
> start_critical_timings();
> }
> }
> diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
> index db63ac75eba0..75b5eacaa5d3 100644
> --- a/lib/nmi_backtrace.c
> +++ b/lib/nmi_backtrace.c
> @@ -17,6 +17,7 @@
> #include <linux/kprobes.h>
> #include <linux/nmi.h>
> #include <linux/seq_buf.h>
> +#include <linux/cpuidle.h>
>
> #ifdef arch_trigger_cpumask_backtrace
> /* For reliability, we're prepared to waste bits here. */
> @@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
>
> /* Replace printk to write into the NMI seq */
> this_cpu_write(printk_func, nmi_vprintk);
> - pr_warn("NMI backtrace for cpu %d\n", cpu);
> - if (regs)
> - show_regs(regs);
> - else
> - dump_stack();
> + if (in_cpu_idle()) {
> + pr_warn("NMI backtrace for cpu %d skipped: idle\n",
> + cpu);
> + } else {
> + pr_warn("NMI backtrace for cpu %d\n", cpu);
> + if (regs)
> + show_regs(regs);
> + else
> + dump_stack();
> + }
> this_cpu_write(printk_func, printk_func_save);
>
> cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
>

2016-03-07 09:57:47

by Peter Zijlstra

Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On Tue, Mar 01, 2016 at 11:01:42AM -0500, Chris Metcalf wrote:
> +++ b/kernel/sched/idle.c
> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
> __setup("hlt", cpu_idle_nopoll_setup);
> #endif
> +static DEFINE_PER_CPU(bool, cpu_idling);
> +
> +/* Was the cpu in the low-level idle code when interrupted? */
> +bool in_cpu_idle(void)
> +{
> + return this_cpu_read(cpu_idling);
> +}
> +
> static inline int cpu_idle_poll(void)
> {
> rcu_idle_enter();
> trace_cpu_idle_rcuidle(0, smp_processor_id());
> local_irq_enable();
> stop_critical_timings();
> + this_cpu_write(cpu_idling, true);
> while (!tif_need_resched() &&
> (cpu_idle_force_poll || tick_check_broadcast_expired()))
> cpu_relax();
> + this_cpu_write(cpu_idling, false);
> start_critical_timings();
> trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
> rcu_idle_exit();
> @@ -89,7 +99,9 @@ void default_idle_call(void)
> local_irq_enable();
> } else {
> stop_critical_timings();
> + this_cpu_write(cpu_idling, true);
> arch_cpu_idle();
> + this_cpu_write(cpu_idling, false);
> start_critical_timings();
> }
> }

No, we're not going to add random crap here. This is actually considered
a fast path for some workloads.

There's already far too much fat in the whole going to idle and coming
out of idle. We should be trimming this, not adding to it.

2016-03-07 17:05:30

by Chris Metcalf

Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On 03/07/2016 03:26 AM, Daniel Thompson wrote:
>> Chris Metcalf wrote:
>> +static DEFINE_PER_CPU(bool, cpu_idling);
>> +
>> +/* Was the cpu in the low-level idle code when interrupted? */
>> +bool in_cpu_idle(void)
>> +{
>> + return this_cpu_read(cpu_idling);
>
> I think we continue to need the code to identify a core that is
> running an interrupt handler. Interrupts are not masked at the point
> we set cpu_idling to false, meaning we can easily be preempted before
> we clear the flag.

Yes, good catch. However, it's mooted by PeterZ wanting to keep any extra
state-switching code out of the idle path. See my reply to him for more
on that.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-07 17:38:43

by Chris Metcalf

Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> On Tue, Mar 01, 2016 at 11:01:42AM -0500, Chris Metcalf wrote:
>> +++ b/kernel/sched/idle.c
>> @@ -52,15 +52,25 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
>> __setup("hlt", cpu_idle_nopoll_setup);
>> #endif
>> +static DEFINE_PER_CPU(bool, cpu_idling);
>> +
>> +/* Was the cpu in the low-level idle code when interrupted? */
>> +bool in_cpu_idle(void)
>> +{
>> + return this_cpu_read(cpu_idling);
>> +}
>> +
>> static inline int cpu_idle_poll(void)
>> {
>> rcu_idle_enter();
>> trace_cpu_idle_rcuidle(0, smp_processor_id());
>> local_irq_enable();
>> stop_critical_timings();
>> + this_cpu_write(cpu_idling, true);
>> while (!tif_need_resched() &&
>> (cpu_idle_force_poll || tick_check_broadcast_expired()))
>> cpu_relax();
>> + this_cpu_write(cpu_idling, false);
>> start_critical_timings();
>> trace_cpu_idle_rcuidle(PWR_EVENT_EXIT, smp_processor_id());
>> rcu_idle_exit();
>> @@ -89,7 +99,9 @@ void default_idle_call(void)
>> local_irq_enable();
>> } else {
>> stop_critical_timings();
>> + this_cpu_write(cpu_idling, true);
>> arch_cpu_idle();
>> + this_cpu_write(cpu_idling, false);
>> start_critical_timings();
>> }
>> }
> No, we're not going to add random crap here. This is actually considered
> a fast path for some workloads.
>
> There's already far too much fat in the whole going to idle and coming
> out of idle. We should be trimming this, not adding to it.

I'm a little skeptical that a single percpu write is going to add much
measurable overhead to this path. However, we can certainly adopt
alternate approaches that stay away from the actual idle code.

One approach (diff appended) is to just test to see if the PC is
actually in the architecture-specific halt code. There are two downsides:

1. It requires a small amount of per-architecture support. I've provided
the tile support as an example, since that's what I tested. I expect
x86 is a little more complicated since there are more idle paths and
they don't currently run the idle instruction(s) at a fixed address, but
it's unlikely to be too complicated on any platform.
Still, adding anything per-architecture is certainly a downside.

2. As proposed, my new alternate solution only handles the non-polling
case, so if you are in the polling loop, we won't benefit from having
the NMI backtrace code skip over you. However my guess is that 99% of
the time folks do choose to run the default non-polling mode, so this
probably still achieves a pretty reasonable outcome.

A different approach that would handle downside #2 and probably make it
easier to implement the architecture-specific code for more complicated
platforms like x86 would be to use the SCHED_TEXT model and tag all the
low-level idling functions as CPUIDLE_TEXT. Then the "are we idling"
test is just a range compare on the PC against __cpuidle_text_{start,end}.

We'd have to decide whether to make cpu_idle_poll() non-inline and just
test for being in that function, or whether we could tag all of
cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
whenever the PC is anywhere in that function. Obviously if we have
called out to more complicated code (e.g. Daniel's concern about calling
out to power management code) the PC would no longer be in the CPUIDLE_TEXT
at that point, so that might be OK too.
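
Concretely, the "are we idling" test would then be something like the
following sketch (with the section bounds coming from the new linker
script macro):

	/* Sketch: linker-provided bounds of the new .cpuidle.text section. */
	extern char __cpuidle_text_start[], __cpuidle_text_end[];

	bool cpu_in_idle(unsigned long pc)
	{
		return pc >= (unsigned long)__cpuidle_text_start &&
		       pc < (unsigned long)__cpuidle_text_end;
	}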

Let me know what you think is the right direction here.

Thanks!

diff --git a/arch/tile/include/asm/thread_info.h b/arch/tile/include/asm/thread_info.h
index 4b7cef9e94e0..93ec51a4853b 100644
--- a/arch/tile/include/asm/thread_info.h
+++ b/arch/tile/include/asm/thread_info.h
@@ -92,6 +92,9 @@ extern void smp_nap(void);
/* Enable interrupts racelessly and nap forever: helper for arch_cpu_idle(). */
extern void _cpu_idle(void);

+/* The address of the actual nap instruction. */
+extern long _cpu_idle_nap[];
+
#else /* __ASSEMBLY__ */

/*
diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index b5f30d376ce1..a83a426f1755 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -70,6 +70,11 @@ void arch_cpu_idle(void)
_cpu_idle();
}

+bool arch_cpu_in_idle(struct pt_regs *regs)
+{
+ return regs->pc == (unsigned long)_cpu_idle_nap;
+}
+
/*
* Release a thread_info structure
*/
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d2ca8c38f9c4..24462927fa49 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -279,6 +279,7 @@ void arch_cpu_idle_prepare(void);
void arch_cpu_idle_enter(void);
void arch_cpu_idle_exit(void);
void arch_cpu_idle_dead(void);
+bool arch_cpu_in_idle(struct pt_regs *);

DECLARE_PER_CPU(bool, cpu_dead_idle);

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..d9dbab6526a9 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -77,6 +77,7 @@ void __weak arch_cpu_idle(void)
cpu_idle_force_poll = 1;
local_irq_enable();
}
+bool __weak arch_cpu_in_idle(struct pt_regs *regs) { return false; }

/**
* default_idle_call - Default CPU idle routine.
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..bcc4ecc828f2 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpu.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -151,11 +152,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (regs != NULL && arch_cpu_in_idle(regs)) {
+ pr_warn("NMI backtrace for cpu %d skipped: idle\n",
+ cpu);
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);

cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-07 20:43:34

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH 2/4] nmi_backtrace: generate one-line reports for idle cpus

On Mon, Mar 07, 2016 at 12:38:16PM -0500, Chris Metcalf wrote:
> On 03/07/2016 04:48 AM, Peter Zijlstra wrote:
> I'm a little skeptical that a single percpu write is going to add much
> measurable overhead to this path.

So that write is almost guaranteed to be a cacheline miss, those things
hurt and do show up on profiles.

> However, we can certainly adopt
> alternate approaches that stay away from the actual idle code.
>
> One approach (diff appended) is to just test to see if the PC is
> actually in the architecture-specific halt code. There are two downsides:
>
> 1. It requires a small amount of per-architecture support. I've provided
> the tile support as an example, since that's what I tested. I expect
> x86 is a little more complicated since there are more idle paths and
> they don't currently run the idle instruction(s) at a fixed address, but
> it's unlikely to be too complicated on any platform.
> Still, adding anything per-architecture is certainly a downside.
>
> 2. As proposed, my new alternate solution only handles the non-polling
> case, so if you are in the polling loop, we won't benefit from having
> the NMI backtrace code skip over you. However my guess is that 99% of
> the time folks do choose to run the default non-polling mode, so this
> probably still achieves a pretty reasonable outcome.
>
> A different approach that would handle downside #2 and probably make it
> easier to implement the architecture-specific code for more complicated
> platforms like x86 would be to use the SCHED_TEXT model and tag all the
> low-level idling functions as CPUIDLE_TEXT. Then the "are we idling"
> test is just a range compare on the PC against __cpuidle_text_{start,end}.
>
> We'd have to decide whether to make cpu_idle_poll() non-inline and just
> test for being in that function, or whether we could tag all of
> cpu_idle_loop() as being CPUIDLE_TEXT and just omit any backtrace
> whenever the PC is anywhere in that function. Obviously if we have
> called out to more complicated code (e.g. Daniel's concern about calling
> out to power management code) the PC would no longer be in the CPUIDLE_TEXT
> at that point, so that might be OK too.

But the CPU would also not be idle if it's running pm code.

So I like the CPUIDLE_TEXT approach, since it has no impact on the
generated code.

An alternative option could be to inspect the stack; we already take a
stack dump, so you could say that everything that has cpuidle_enter() in
its callchain is an 'idle' cpu.

Yet another option would be to look at rq->idle_state or any other state
cpuidle already tracks. The 'obvious' downside is relying on cpuidle,
which I understand isn't supported by everyone.
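
E.g. something like this completely untested sketch, using the
sched-internal helpers from kernel/sched/sched.h (and only meaningful
where cpuidle is actually driving the cpu):

	/* Untested sketch: treat a cpu as idle if cpuidle recorded a state. */
	static bool cpu_idle_state_set(int cpu)
	{
		return idle_get_state(cpu_rq(cpu)) != NULL;
	}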

2016-03-16 17:02:39

by Chris Metcalf

Subject: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

Currently you can only request a backtrace of either all cpus, or
all cpus but yourself. It can also be helpful to request a remote
backtrace of a single cpu, and since we want that, the logical
extension is to support a cpumask as the underlying primitive.

This change modifies the existing lib/nmi_backtrace.c code to take
a cpumask as its basic primitive, and modifies the linux/nmi.h code
to use either the old "all/all_but_self" arch methods, or the new
"cpumask" method, depending on which is available.

The existing clients of nmi_backtrace (arm and x86) are converted
to using the new cpumask approach in this change.
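
For example, with this in place a caller can request the stack of a
single remote cpu (a sketch of the intended usage):

	/* Sketch: request an NMI backtrace of one remote cpu. */
	if (!trigger_single_cpu_backtrace(cpu))
		pr_warn("no arch support for remote backtraces\n");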

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/include/asm/irq.h | 4 +--
arch/arm/kernel/smp.c | 4 +--
arch/x86/include/asm/irq.h | 4 +--
arch/x86/kernel/apic/hw_nmi.c | 6 ++---
include/linux/nmi.h | 63 ++++++++++++++++++++++++++++++++++---------
lib/nmi_backtrace.c | 15 +++++------
6 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1bd9510de1b9..13f9a9a17eca 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -36,8 +36,8 @@ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
#endif

#ifdef CONFIG_SMP
-extern void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask);
+#define arch_trigger_cpumask_backtrace(x) arch_trigger_cpumask_backtrace(x)
#endif

static inline int nr_legacy_irqs(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 37312f6749f3..208125658e56 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -758,7 +758,7 @@ static void raise_nmi(cpumask_t *mask)
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+ nmi_trigger_cpumask_backtrace(mask, raise_nmi);
}
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index e7de5c9a4fbd..18bdc8cc5c63 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -50,8 +50,8 @@ extern int vector_used_by_percpu_irq(unsigned int vector);
extern void init_ISA_irqs(void);

#ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_X86_IRQ_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..63f0b69ad6a6 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -27,15 +27,15 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
}
#endif

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
static void nmi_raise_cpu_backtrace(cpumask_t *mask)
{
apic->send_IPI_mask(mask, NMI_VECTOR);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
}

static int
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 7ec5b86735f3..951875f4f072 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -31,38 +31,75 @@ static inline void hardlockup_detector_disable(void) {}
#endif

/*
- * Create trigger_all_cpu_backtrace() out of the arch-provided
- * base function. Return whether such support was available,
+ * Create trigger_all_cpu_backtrace() etc out of the arch-provided
+ * base function(s). Return whether such support was available,
* to allow calling code to fall back to some other mechanism:
*/
-#ifdef arch_trigger_all_cpu_backtrace
static inline bool trigger_all_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(true);
-
return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(cpu_online_mask);
+ return true;
+#else
+ return false;
+#endif
}
+
static inline bool trigger_allbutself_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(false);
return true;
-}
-
-/* generic implementation */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
- void (*raise)(cpumask_t *mask));
-bool nmi_cpu_backtrace(struct pt_regs *regs);
+#elif defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+ int cpu = get_cpu();

+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_copy(mask, cpu_online_mask);
+ cpumask_clear_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ put_cpu();
+ free_cpumask_var(mask);
+ return true;
#else
-static inline bool trigger_all_cpu_backtrace(void)
-{
return false;
+#endif
}
-static inline bool trigger_allbutself_cpu_backtrace(void)
+
+static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
{
+#if defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(mask);
+ return true;
+#else
return false;
+#endif
}
+
+static inline bool trigger_single_cpu_backtrace(int cpu)
+{
+#if defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_set_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ free_cpumask_var(mask);
+ return true;
+#else
+ return false;
#endif
+}
+
+/* generic implementation */
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+ void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);

#ifdef CONFIG_LOCKUP_DETECTOR
int hw_nmi_is_cpu_stuck(struct pt_regs *);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..db63ac75eba0 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -18,7 +18,7 @@
#include <linux/nmi.h>
#include <linux/seq_buf.h>

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
static cpumask_t printtrace_mask;
@@ -44,12 +44,12 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
}

/*
- * When raise() is called it will be is passed a pointer to the
+ * When raise() is called it will be passed a pointer to the
* backtrace_mask. Architectures that call nmi_cpu_backtrace()
* directly from their raise() functions may rely on the mask
* they are passed being updated as a side effect of this call.
*/
-void nmi_trigger_all_cpu_backtrace(bool include_self,
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
void (*raise)(cpumask_t *mask))
{
struct nmi_seq_buf *s;
@@ -64,10 +64,7 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
return;
}

- cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
- if (!include_self)
- cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
-
+ cpumask_copy(to_cpumask(backtrace_mask), mask);
cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));

/*
@@ -80,8 +77,8 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
}

if (!cpumask_empty(to_cpumask(backtrace_mask))) {
- pr_info("Sending NMI to %s CPUs:\n",
- (include_self ? "all" : "other"));
+ pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
+ this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
raise(to_cpumask(backtrace_mask));
}

--
2.7.2

2016-03-16 17:02:42

by Chris Metcalf

Subject: [PATCH v2 0/4] improvements to the nmi_backtrace code

From the version 1 cover letter:

This patch series modifies the trigger_xxx_backtrace() NMI-based
remote backtracing code to make it more flexible, and makes a few
small improvements along the way.

The motivation comes from the task isolation code, where there are
scenarios in which we want to diagnose a case where some cpu is about
to interrupt a task-isolated cpu. It can be helpful to
see both where the interrupting cpu is, and also an approximation
of where the cpu that is being interrupted is. The nmi_backtrace
framework allows us to discover the stack of the interrupted cpu.

Version 2 of the patch series adopts the CPUIDLE_TEXT approach that I
suggested in some discussion around identifying idle cpus, and that
Peter Zijlstra endorsed. I renumbered the patches to put the idle-test
patch last in the series (4/4) and it is the only one modified in this
version of the patch series. (To be fair I did also change all the
S-O-B and author lines to be mellanox.com instead of ezchip.com).

I've tested that the change works as desired on tile, and build-tested
x86, arm64, and arm. For x86 and arm64 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm I just build-tested it and made sure the
generic cpuidle routines were in the new cpuidle section, but I didn't
attempt to tease apart the tangle of platform-specific idle routines
that arm has and tag them with __cpuidle. That might be more usefully
done by someone with arm platform experience in a follow-up patch.

v1 of the series is here:

https://lkml.kernel.org/r/[email protected]

Chris Metcalf (4):
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
nmi_backtrace: do a local dump_stack() instead of a self-NMI
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: generate one-line reports for idle cpus

arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/include/asm/irq.h | 4 +-
arch/arm/kernel/smp.c | 13 +------
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 +
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/include/asm/irq.h | 4 +-
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++----------------------------
arch/tile/kernel/traps.c | 7 +++-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/include/asm/irq.h | 4 +-
arch/x86/kernel/apic/hw_nmi.c | 6 +--
arch/x86/kernel/process.c | 4 +-
arch/x86/kernel/vmlinux.lds.S | 1 +
include/asm-generic/vmlinux.lds.h | 6 +++
include/linux/cpu.h | 5 +++
include/linux/nmi.h | 63 ++++++++++++++++++++++++-------
kernel/sched/idle.c | 13 ++++++-
lib/nmi_backtrace.c | 40 +++++++++++++-------
scripts/mod/modpost.c | 4 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
52 files changed, 172 insertions(+), 115 deletions(-)

--
2.7.2

2016-03-16 17:02:53

by Chris Metcalf

Subject: [PATCH v2 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI. Similarly,
the forthcoming arch/tile support uses an IPI mechanism that does
not support generating an NMI to self.

Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a
backtrace of the current cpu. It seems plausible that in all cases,
dump_stack() will generate better information than generating a
stack trace from the NMI handler. The register state will be missing,
but that state is likely not particularly helpful in any case.

Or, if we think it is helpful, we should be capturing and emitting
the current register state in all cases when regs == NULL is passed
to nmi_cpu_backtrace().

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/kernel/smp.c | 9 ---------
lib/nmi_backtrace.c | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 208125658e56..26a9ac6bc616 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -746,15 +746,6 @@ core_initcall(register_cpufreq_notifier);

static void raise_nmi(cpumask_t *mask)
{
- /*
- * Generate the backtrace directly if we are running in a calling
- * context that is not preemptible by the backtrace IPI. Note
- * that nmi_cpu_backtrace() automatically removes the current cpu
- * from mask.
- */
- if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled())
- nmi_cpu_backtrace(NULL);
-
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..9375c0279b73 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -76,6 +76,15 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
}

+ /*
+ * Don't try to send an NMI to this cpu; it may work on some
+ * architectures, but on others it may not, and we'll get
+ * information at least as useful just by doing a dump_stack() here.
+ * Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
+ */
+ if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
+ nmi_cpu_backtrace(NULL);
+
if (!cpumask_empty(to_cpumask(backtrace_mask))) {
pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
--
2.7.2

2016-03-16 17:03:06

by Chris Metcalf

Subject: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 ++
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/kernel/process.c | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 1 +
include/asm-generic/vmlinux.lds.h | 6 ++++++
include/linux/cpu.h | 5 +++++
kernel/sched/idle.c | 13 +++++++++++--
lib/nmi_backtrace.c | 16 +++++++++++-----
scripts/mod/modpost.c | 4 ++--
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
43 files changed, 75 insertions(+), 12 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 647b84c15382..cebecfb76fbf 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -22,6 +22,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
index 894e696bddaa..65652160cfda 100644
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 8b60fde5ce48..6c13d570e9c9 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -107,6 +107,7 @@ SECTIONS
IRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.warning)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index e3928f578891..a5cbecf8a74c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -104,6 +104,7 @@ SECTIONS
IRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
HYPERVISOR_TEXT
IDMAP_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c164d2cb35c0..b1b60fc438f6 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -48,11 +48,13 @@
*
* Idle the processor (wait for interrupt).
*/
+ .pushsection ".cpuidle.text","ax"
ENTRY(cpu_do_idle)
dsb sy // WFI may enter a low-power mode
wfi
ret
ENDPROC(cpu_do_idle)
+ .popsection

#ifdef CONFIG_CPU_PM
/**
diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
index a4589176bed5..17f2730eb497 100644
--- a/arch/avr32/kernel/vmlinux.lds.S
+++ b/arch/avr32/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
KPROBES_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index c9eec84aa258..63a02c342830 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS
#ifndef CONFIG_SCHEDULE_L1
SCHED_TEXT
#endif
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
KPROBES_TEXT
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 5a6e141d1641..9cabd962ab36 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -70,6 +70,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
KPROBES_TEXT
diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
index 7552c2557506..979586261520 100644
--- a/arch/cris/kernel/vmlinux.lds.S
+++ b/arch/cris/kernel/vmlinux.lds.S
@@ -43,6 +43,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.text.__*)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 7e958d829ec9..aa6e573d57da 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -63,6 +63,7 @@ SECTIONS
*(.text..tlbmiss)
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#ifdef CONFIG_DEBUG_INFO
INIT_TEXT
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index cb5dfb02c88d..7f11da1b895e 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -29,6 +29,7 @@ SECTIONS
_stext = . ;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#if defined(CONFIG_ROMKERNEL)
*(.int_redirect)
diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
index 5f268c1071b3..ec87e67feb19 100644
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index dc506b05ffbd..f89d20c97412 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS {
__end_ivt_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.linkonce.t*)
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 018e4a711d79..ad1fe56455aa 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -31,6 +31,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
index 06a763f49fd3..d2c8abf1c8c4 100644
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -45,6 +45,7 @@ SECTIONS {
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
. = ALIGN(16);
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d0993594f558..5b5ce1e4d1ed 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8080469ee6c1..fe5ea1974b16 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index e12055e88bfe..9fc48354d519 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -21,6 +21,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index be9488d69734..5913c7863067 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS {
EXIT_TEXT
EXIT_CALL
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 0a93e83cd014..e0fc08cb0c89 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
index 13c4814c29f8..2d5f1c3f1afb 100644
--- a/arch/mn10300/kernel/vmlinux.lds.S
+++ b/arch/mn10300/kernel/vmlinux.lds.S
@@ -30,6 +30,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index 326fab40a9de..340c7ab1d8b0 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
KPROBES_TEXT
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index 2d69a853b742..6c3cf834b5d8 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -47,6 +47,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 308f29081d46..7e53bf44fdd2 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
.text ALIGN(PAGE_SIZE) : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index d41fd0af8980..bf423392b20a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
/* careful! __ftr_alt_* sections need to be close to .text */
*(.text .fixup __ftr_alt_* .ref.text)
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 445657fe658c..cbc74fd4a6db 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -25,6 +25,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
index 7274b5c4287e..4117890b1db1 100644
--- a/arch/score/kernel/vmlinux.lds.S
+++ b/arch/score/kernel/vmlinux.lds.S
@@ -40,6 +40,7 @@ SECTIONS
_text = .; /* Text and read-only data */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.text.*)
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index db88cbf9eafd..989500c17358 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS
TEXT_TEXT
EXTRA_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index f1a2f688b28a..93029a4b5299 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -45,6 +45,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 670a3569450f..101de132e363 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
* When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
* as a result return to the function that called _cpu_idle().
*/
-STD_ENTRY(_cpu_idle)
+STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
movei r1, 1
IRQ_ENABLE_LOAD(r2, r3)
mtspr INTERRUPT_CRITICAL_SECTION, r1
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 0e059a0101ea..a92931e8c4f9 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -42,6 +42,7 @@ SECTIONS
.text : AT (ADDR(.text) - LOAD_OFFSET) {
HEAD_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index adde088aeeff..4fdbcf958cd5 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -68,6 +68,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.stub .text.* .gnu.linkonce.t.*)
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 6899195602b7..1840f55ed042 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -28,6 +28,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
/* .gnu.warning sections are handled specially by elf32.em. */
diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
index 77e407e49a63..56e788e8ee83 100644
--- a/arch/unicore32/kernel/vmlinux.lds.S
+++ b/arch/unicore32/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : { /* Real text segment */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT

*(.fixup)
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..d569ae7fde37 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -298,7 +298,7 @@ void arch_cpu_idle(void)
/*
* We use this if we don't have any better idle routine..
*/
-void default_idle(void)
+void __cpuidle default_idle(void)
{
trace_cpu_idle_rcuidle(1, smp_processor_id());
safe_halt();
@@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
* with interrupts enabled and no flags, which is backwards compatible with the
* original MWAIT implementation.
*/
-static void mwait_idle(void)
+static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
trace_cpu_idle_rcuidle(1, smp_processor_id());
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 74e4bf11f562..95f80be7632f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -98,6 +98,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
ENTRY_TEXT
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2c173c..18af5199f97c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -444,6 +444,12 @@
*(.spinlock.text) \
VMLINUX_SYMBOL(__lock_text_end) = .;

+#define CPUIDLE_TEXT \
+ ALIGN_FUNCTION(); \
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .; \
+ *(.cpuidle.text) \
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
+
#define KPROBES_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__kprobes_text_start) = .; \
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d2ca8c38f9c4..0cbe214e8f4b 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -274,6 +274,11 @@ void cpu_startup_entry(enum cpuhp_state state);

void cpu_idle_poll_ctrl(bool enable);

+/* Attach to any functions which should be considered cpuidle. */
+#define __cpuidle __attribute__((__section__(".cpuidle.text")))
+
+bool cpu_in_idle(unsigned long pc);
+
void arch_cpu_idle(void);
void arch_cpu_idle_prepare(void);
void arch_cpu_idle_enter(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..ffca482beab5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -15,6 +15,9 @@

#include "sched.h"

+/* Linker adds these: start and end of __cpuidle functions */
+extern char __cpuidle_text_start[], __cpuidle_text_end[];
+
/**
* sched_idle_set_state - Record idle state for the current CPU.
* @idle_state: State to record.
@@ -52,7 +55,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

-static inline int cpu_idle_poll(void)
+static int noinline __cpuidle cpu_idle_poll(void)
{
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -83,7 +86,7 @@ void __weak arch_cpu_idle(void)
*
* To use when the cpuidle framework cannot be used.
*/
-void default_idle_call(void)
+void __cpuidle default_idle_call(void)
{
if (current_clr_polling_and_test()) {
local_irq_enable();
@@ -273,6 +276,12 @@ static void cpu_idle_loop(void)
}
}

+bool cpu_in_idle(unsigned long pc)
+{
+ return pc >= (unsigned long)__cpuidle_text_start &&
+ pc < (unsigned long)__cpuidle_text_end;
+}
+
void cpu_startup_entry(enum cpuhp_state state)
{
/*
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 9375c0279b73..ac41f3c84e8d 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpu.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -160,11 +161,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (regs != NULL && cpu_in_idle(instruction_pointer(regs))) {
+ pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
+ cpu, instruction_pointer(regs));
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);

cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 48958d3cec9e..37afd721ec99 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -887,8 +887,8 @@ static void check_section(const char *modname, struct elf_info *elf,
#define ALL_EXIT_SECTIONS EXIT_SECTIONS, ALL_XXXEXIT_SECTIONS

#define DATA_SECTIONS ".data", ".data.rel"
-#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
- ".kprobes.text"
+#define TEXT_SECTIONS ".text", ".text.unlikely", \
+ ".kprobes.text", ".cpuidle.text"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", ".text.*", \
".coldtext"
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index e167592793a7..9a6ec6ce00b5 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -357,6 +357,7 @@ is_mcounted_section_name(char const *const txtname)
strcmp(".spinlock.text", txtname) == 0 ||
strcmp(".irqentry.text", txtname) == 0 ||
strcmp(".kprobes.text", txtname) == 0 ||
+ strcmp(".cpuidle.text", txtname) == 0 ||
strcmp(".text.unlikely", txtname) == 0;
}

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 96e2486a6fc4..29cecf9b504f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -135,6 +135,7 @@ my %text_sections = (
".spinlock.text" => 1,
".irqentry.text" => 1,
".kprobes.text" => 1,
+ ".cpuidle.text" => 1,
".text.unlikely" => 1,
);

--
2.7.2

2016-03-16 17:02:50

by Chris Metcalf

Subject: [PATCH v2 3/4] arch/tile: adopt the new nmi_backtrace framework

Previously tile was rolling its own method of capturing backtrace
data in the NMI handlers, but it was relying on running printk()
from the NMI handler, which is not always safe. So adopt the
nmi_backtrace model (with the new cpumask extension) instead.

So that we can call the nmi_backtrace code directly from the NMI handler,
we move the nmi_enter()/exit() into the top-level tile NMI handler.

The semantics of the routine change slightly since it is now
synchronous with the remote cores completing the backtraces.
Previously it was asynchronous, but with protection to avoid starting
a new remote backtrace if the old one was still in progress.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/tile/include/asm/irq.h | 4 +--
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++++-----------------------------------
arch/tile/kernel/traps.c | 7 +++--
4 files changed, 23 insertions(+), 63 deletions(-)

diff --git a/arch/tile/include/asm/irq.h b/arch/tile/include/asm/irq.h
index 84a924034bdb..909230a02ea8 100644
--- a/arch/tile/include/asm/irq.h
+++ b/arch/tile/include/asm/irq.h
@@ -79,8 +79,8 @@ void tile_irq_activate(unsigned int irq, int tile_irq_type);
void setup_irq_regs(void);

#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_TILE_IRQ_H */
diff --git a/arch/tile/kernel/pmc.c b/arch/tile/kernel/pmc.c
index db62cc34b955..81cf8743a3f3 100644
--- a/arch/tile/kernel/pmc.c
+++ b/arch/tile/kernel/pmc.c
@@ -16,7 +16,6 @@
#include <linux/spinlock.h>
#include <linux/module.h>
#include <linux/atomic.h>
-#include <linux/interrupt.h>

#include <asm/processor.h>
#include <asm/pmc.h>
@@ -29,9 +28,7 @@ int handle_perf_interrupt(struct pt_regs *regs, int fault)
if (!perf_irq)
panic("Unexpected PERF_COUNT interrupt %d\n", fault);

- nmi_enter();
retval = perf_irq(regs, fault);
- nmi_exit();
return retval;
}

diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index b5f30d376ce1..6594df5fed53 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -22,7 +22,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/compat.h>
-#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <linux/syscalls.h>
#include <linux/kernel.h>
#include <linux/tracehook.h>
@@ -593,66 +593,18 @@ void show_regs(struct pt_regs *regs)
tile_show_stack(&kbt);
}

-/* To ensure stack dump on tiles occurs one by one. */
-static DEFINE_SPINLOCK(backtrace_lock);
-/* To ensure no backtrace occurs before all of the stack dump are done. */
-static atomic_t backtrace_cpus;
-/* The cpu mask to avoid reentrance. */
-static struct cpumask backtrace_mask;
-
-void do_nmi_dump_stack(struct pt_regs *regs)
-{
- int is_idle = is_idle_task(current) && !in_interrupt();
- int cpu;
-
- nmi_enter();
- cpu = smp_processor_id();
- if (WARN_ON_ONCE(!cpumask_test_and_clear_cpu(cpu, &backtrace_mask)))
- goto done;
-
- spin_lock(&backtrace_lock);
- if (is_idle)
- pr_info("CPU: %d idle\n", cpu);
- else
- show_regs(regs);
- spin_unlock(&backtrace_lock);
- atomic_dec(&backtrace_cpus);
-done:
- nmi_exit();
-}
-
#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self)
+void nmi_raise_cpu_backtrace(struct cpumask *in_mask)
{
struct cpumask mask;
HV_Coord tile;
unsigned int timeout;
int cpu;
- int ongoing;
HV_NMI_Info info[NR_CPUS];

- ongoing = atomic_cmpxchg(&backtrace_cpus, 0, num_online_cpus() - 1);
- if (ongoing != 0) {
- pr_err("Trying to do all-cpu backtrace.\n");
- pr_err("But another all-cpu backtrace is ongoing (%d cpus left)\n",
- ongoing);
- if (self) {
- pr_err("Reporting the stack on this cpu only.\n");
- dump_stack();
- }
- return;
- }
-
- cpumask_copy(&mask, cpu_online_mask);
- cpumask_clear_cpu(smp_processor_id(), &mask);
- cpumask_copy(&backtrace_mask, &mask);
-
- /* Backtrace for myself first. */
- if (self)
- dump_stack();
-
/* Tentatively dump stack on remote tiles via NMI. */
timeout = 100;
+ cpumask_copy(&mask, in_mask);
while (!cpumask_empty(&mask) && timeout) {
for_each_cpu(cpu, &mask) {
tile.x = cpu_x(cpu);
@@ -663,12 +615,17 @@ void arch_trigger_all_cpu_backtrace(bool self)
}

mdelay(10);
+ touch_softlockup_watchdog();
timeout--;
}

- /* Warn about cpus stuck in ICS and decrement their counts here. */
+ /* Warn about cpus stuck in ICS. */
if (!cpumask_empty(&mask)) {
for_each_cpu(cpu, &mask) {
+
+ /* Clear the bit as if nmi_cpu_backtrace() ran. */
+ cpumask_clear_cpu(cpu, in_mask);
+
switch (info[cpu].result) {
case HV_NMI_RESULT_FAIL_ICS:
pr_warn("Skipping stack dump of cpu %d in ICS at pc %#llx\n",
@@ -679,16 +636,19 @@ void arch_trigger_all_cpu_backtrace(bool self)
cpu);
break;
case HV_ENOSYS:
- pr_warn("Hypervisor too old to allow remote stack dumps.\n");
- goto skip_for_each;
+ WARN_ONCE(1, "Hypervisor too old to allow remote stack dumps.\n");
+ break;
default: /* should not happen */
pr_warn("Skipping stack dump of cpu %d [%d,%#llx]\n",
cpu, info[cpu].result, info[cpu].pc);
break;
}
}
-skip_for_each:
- atomic_sub(cpumask_weight(&mask), &backtrace_cpus);
}
}
+
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
+{
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
+}
#endif /* __tilegx__ */
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 4d9651c5b1ad..934a7d88eb29 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -20,6 +20,8 @@
#include <linux/reboot.h>
#include <linux/uaccess.h>
#include <linux/ptrace.h>
+#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <asm/stack.h>
#include <asm/traps.h>
#include <asm/setup.h>
@@ -392,14 +394,15 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,

void do_nmi(struct pt_regs *regs, int fault_num, unsigned long reason)
{
+ nmi_enter();
switch (reason) {
case TILE_NMI_DUMP_STACK:
- do_nmi_dump_stack(regs);
+ nmi_cpu_backtrace(regs);
break;
default:
panic("Unexpected do_nmi type %ld", reason);
- return;
}
+ nmi_exit();
}

/* Deprecated function currently only used here. */
--
2.7.2

2016-03-16 18:48:17

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

Hi Chris,

[auto build test ERROR on tile/master]
[also build test ERROR on v4.5]
[cannot apply to next-20160316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Chris-Metcalf/improvements-to-the-nmi_backtrace-code/20160317-010929
base: https://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git master
config: xtensa-common_defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=xtensa

All errors (new ones prefixed by >>):

kernel/built-in.o: In function `SyS_setgroups':
>> (.text+0x16688): undefined reference to `__cpuidle_text_start'
kernel/built-in.o: In function `SyS_setgroups':
>> (.text+0x1668c): undefined reference to `__cpuidle_text_end'

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


2016-03-17 19:36:33

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
> Currently you can only request a backtrace of either all cpus, or
> all cpus but yourself. It can also be helpful to request a remote
> backtrace of a single cpu, and since we want that, the logical
> extension is to support a cpumask as the underlying primitive.
>
> This change modifies the existing lib/nmi_backtrace.c code to take
> a cpumask as its basic primitive, and modifies the linux/nmi.h code
> to use either the old "all/all_but_self" arch methods, or the new
> "cpumask" method, depending on which is available.
>
> The existing clients of nmi_backtrace (arm and x86) are converted
> to using the new cpumask approach in this change.

So the past days I've been staring at RCU stall warns, and they can use
a little of this. Their remote stack unwinds are less than useful.


2016-03-17 22:32:10

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On 3/17/2016 3:36 PM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
>> Currently you can only request a backtrace of either all cpus, or
>> all cpus but yourself. It can also be helpful to request a remote
>> backtrace of a single cpu, and since we want that, the logical
>> extension is to support a cpumask as the underlying primitive.
>>
>> This change modifies the existing lib/nmi_backtrace.c code to take
>> a cpumask as its basic primitive, and modifies the linux/nmi.h code
>> to use either the old "all/all_but_self" arch methods, or the new
>> "cpumask" method, depending on which is available.
>>
>> The existing clients of nmi_backtrace (arm and x86) are converted
>> to using the new cpumask approach in this change.
> So the past days I've been staring at RCU stall warns, and they can use
> a little of this. Their remote stack unwinds are less than useful.

Were you suggesting this as an improvement for a possible v3, or just a
kind of implicit ack of the patch series? Thanks!

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-17 22:38:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Thu, Mar 17, 2016 at 06:31:44PM -0400, Chris Metcalf wrote:
> On 3/17/2016 3:36 PM, Peter Zijlstra wrote:
> >On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
> >>Currently you can only request a backtrace of either all cpus, or
> >>all cpus but yourself. It can also be helpful to request a remote
> >>backtrace of a single cpu, and since we want that, the logical
> >>extension is to support a cpumask as the underlying primitive.
> >>
> >>This change modifies the existing lib/nmi_backtrace.c code to take
> >>a cpumask as its basic primitive, and modifies the linux/nmi.h code
> >>to use either the old "all/all_but_self" arch methods, or the new
> >>"cpumask" method, depending on which is available.
> >>
> >>The existing clients of nmi_backtrace (arm and x86) are converted
> >>to using the new cpumask approach in this change.
> >So the past days I've been staring at RCU stall warns, and they can use
> >a little of this. Their remote stack unwinds are less than useful.
>
> Were you suggesting this as an improvement for a possible v3, or just a
> kind of implicit ack of the patch series? Thanks!

A suggestion more like. I've not actually looked at the 4th patch.

I'll try and fold the patches into the runs I do tomorrow, I'm sure to
trigger lots of fail. Maybe I'll even do that RCU patch.

2016-03-17 22:42:24

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On 3/17/2016 6:38 PM, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 06:31:44PM -0400, Chris Metcalf wrote:
>> On 3/17/2016 3:36 PM, Peter Zijlstra wrote:
>>> On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
>>>> Currently you can only request a backtrace of either all cpus, or
>>>> all cpus but yourself. It can also be helpful to request a remote
>>>> backtrace of a single cpu, and since we want that, the logical
>>>> extension is to support a cpumask as the underlying primitive.
>>>>
>>>> This change modifies the existing lib/nmi_backtrace.c code to take
>>>> a cpumask as its basic primitive, and modifies the linux/nmi.h code
>>>> to use either the old "all/all_but_self" arch methods, or the new
>>>> "cpumask" method, depending on which is available.
>>>>
>>>> The existing clients of nmi_backtrace (arm and x86) are converted
>>>> to using the new cpumask approach in this change.
>>> So the past days I've been staring at RCU stall warns, and they can use
>>> a little of this. Their remote stack unwinds are less than useful.
>> Were you suggesting this as an improvement for a possible v3, or just a
>> kind of implicit ack of the patch series? Thanks!
> A suggestion more like. I've not actually looked at the 4th patch.
>
> I'll try and fold the patches into the runs I do tomorrow, I'm sure to
> trigger lots of fail. Maybe I'll even do that RCU patch.

The build bot caught the fact that I missed arch/xtensa since it doesn't use
LOCK_TEXT, so if you're testing on that (ok maybe unlikely) you can add this:

diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S
index c417cbe4ec87..18a174c7fb87 100644
--- a/arch/xtensa/kernel/vmlinux.lds.S
+++ b/arch/xtensa/kernel/vmlinux.lds.S
@@ -93,6 +93,9 @@ SECTIONS
VMLINUX_SYMBOL(__sched_text_start) = .;
*(.sched.literal .sched.text)
VMLINUX_SYMBOL(__sched_text_end) = .;
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .;
+ *(.cpuidle.literal .cpuidle.text)
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
VMLINUX_SYMBOL(__lock_text_start) = .;
*(.spinlock.literal .spinlock.text)
VMLINUX_SYMBOL(__lock_text_end) = .;


--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-17 22:56:41

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Thu, Mar 17, 2016 at 08:36:00PM +0100, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:10PM -0400, Chris Metcalf wrote:
> > Currently you can only request a backtrace of either all cpus, or
> > all cpus but yourself. It can also be helpful to request a remote
> > backtrace of a single cpu, and since we want that, the logical
> > extension is to support a cpumask as the underlying primitive.
> >
> > This change modifies the existing lib/nmi_backtrace.c code to take
> > a cpumask as its basic primitive, and modifies the linux/nmi.h code
> > to use either the old "all/all_but_self" arch methods, or the new
> > "cpumask" method, depending on which is available.
> >
> > The existing clients of nmi_backtrace (arm and x86) are converted
> > to using the new cpumask approach in this change.
>
> So the past days I've been staring at RCU stall warns, and they can use
> a little of this. Their remote stack unwinds are less than useful.

The RCU stall-warn stack traces can be ugly, agreed.

That said, RCU used to use NMI-based stack traces, but switched to the
current scheme due to the NMIs having the unfortunate habit of locking
things up, which IIRC often meant no stack traces at all. If I recall
correctly, one of the problems was self-deadlock in printk().

Have these problems been fixed?

Thanx, Paul

2016-03-17 23:09:32

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Thu, Mar 17, 2016 at 03:55:57PM -0700, Paul E. McKenney wrote:
> That said, RCU used to use NMI-based stack traces, but switched to the
> current scheme due to the NMIs having the unfortunate habit of locking
> things up, which IIRC often meant no stack traces at all. If I recall
> correctly, one of the problems was self-deadlock in printk().
>
> Have these problems been fixed?

Improved is I think the word.

Although I've butchered my printk() into absolute submission :-)

2016-03-17 23:11:41

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Thu, Mar 17, 2016 at 03:55:57PM -0700, Paul E. McKenney wrote:
> The RCU stall-warn stack traces can be ugly, agreed.

Ugly isn't the problem, completely random bollocks that puts you on the
wrong path was more the problem.

It uses a stack pointer saved at some random time in the past to start
unwinding an active stack from. Complete and utter misery.

2016-03-17 23:15:56

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Thu, Mar 17, 2016 at 06:41:42PM -0400, Chris Metcalf wrote:
> The build bot caught the fact that I missed arch/xtensa since it doesn't use
> LOCK_TEXT, so if you're testing on that (ok maybe unlikely) you can add this:

Ha!, no. regular boring x86_64.

2016-03-18 00:18:31

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> The RCU stall-warn stack traces can be ugly, agreed.
>
> That said, RCU used to use NMI-based stack traces, but switched to the
> current scheme due to the NMIs having the unfortunate habit of locking
> things up, which IIRC often meant no stack traces at all. If I recall
> correctly, one of the problems was self-deadlock in printk().

Steven Rostedt enabled the per_cpu printk func support in June 2014, and
the nmi_backtrace code uses it to just capture printk output to percpu
buffers, so I think it's going to be a lot more robust than earlier attempts.
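
For reference, the capture pattern is roughly this (a sketch condensed
from the lib/nmi_backtrace.c context visible in the patches above, not
the complete file):

    /* While the NMI handler runs, printk() on this cpu is redirected
     * into a per-cpu seq_buf, so nothing takes console locks from
     * NMI context.
     */
    static int nmi_vprintk(const char *fmt, va_list args)
    {
            struct nmi_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
            unsigned int len = seq_buf_used(&s->seq);

            seq_buf_vprintf(&s->seq, fmt, args);
            return seq_buf_used(&s->seq) - len;  /* bytes added */
    }

    /* ...and nmi_cpu_backtrace() brackets the dump with the swap: */
    this_cpu_write(printk_func, nmi_vprintk);
    show_regs(regs);
    this_cpu_write(printk_func, printk_func_save);

The requesting cpu then replays each per-cpu buffer to the real
console once the remote cpus have finished (or the timeout expires).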

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-18 00:37:02

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Fri, Mar 18, 2016 at 12:11:28AM +0100, Peter Zijlstra wrote:
> On Thu, Mar 17, 2016 at 03:55:57PM -0700, Paul E. McKenney wrote:
> > The RCU stall-warn stack traces can be ugly, agreed.
>
> Ugly isn't the problem, completely random bollocks that puts you on the
> wrong path was more the problem.
>
> It uses a stack pointer saved at some random time in the past to start
> unwinding an active stack from. Complete and utter misery.

Yep, its accuracy does depend on what is going on, which was also my
experience with the NMI-based approach's reliability.

Perhaps a boot-time parameter enabling the sysadm to pick the desired
flavor of poison?

Thanx, Paul

2016-03-18 00:42:33

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
> On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> >The RCU stall-warn stack traces can be ugly, agreed.
> >
> >That said, RCU used to use NMI-based stack traces, but switched to the
> >current scheme due to the NMIs having the unfortunate habit of locking
> >things up, which IIRC often meant no stack traces at all. If I recall
> >correctly, one of the problems was self-deadlock in printk().
>
> Steven Rostedt enabled the per_cpu printk func support in June 2014, and
> the nmi_backtrace code uses it to just capture printk output to percpu
> buffers, so I think it's going to be a lot more robust than earlier attempts.

That would be a very good thing, give or take the "I think" qualifier.
And assuming that the target CPU is healthy enough to find its way back
to some place that can dump the per-CPU printk buffer. I might well
be overly paranoid, but I have to suspect that the probability of that
buffer getting dumped is reduced greatly on a CPU that isn't healthy
enough to respond to RCU, though.

But it seems like enabling the experiment might be useful.

"Try enabling the NMI version. If that doesn't get you your RCU CPU
stall warning stack trace, try the remote-print variant."

Or I suppose we could just do both in succession, just in case their
console was a serial port. ;-)

Thanx, Paul

2016-03-18 09:40:45

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On 18/03/16 00:33, Paul E. McKenney wrote:
> On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
>> On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
>>> The RCU stall-warn stack traces can be ugly, agreed.
>>>
>>> That said, RCU used to use NMI-based stack traces, but switched to the
>>> current scheme due to the NMIs having the unfortunate habit of locking
>>> things up, which IIRC often meant no stack traces at all. If I recall
>>> correctly, one of the problems was self-deadlock in printk().
>>
>> Steven Rostedt enabled the per_cpu printk func support in June 2014, and
>> the nmi_backtrace code uses it to just capture printk output to percpu
>> buffers, so I think it's going to be a lot more robust than earlier attempts.
>
> That would be a very good thing, give or take the "I think" qualifier.
> And assuming that the target CPU is healthy enough to find its way back
> to some place that can dump the per-CPU printk buffer. I might well
> be overly paranoid, but I have to suspect that the probability of that
> buffer getting dumped is reduced greatly on a CPU that isn't healthy
> enough to respond to RCU, though.

The target CPU doesn't dump the buffer. It "just" fields the NMI, stores
the backtrace and sets a flag.

The buffer is dumped to console by the requesting CPU, either when all
backtraces have come back or when a timeout is reached.


> But it seems like enabling the experiment might be useful.
>
> "Try enabling the NMI version. If that doesn't get you your RCU CPU
> stall warning stack trace, try the remote-print variant."
>
> Or I suppose we could just do both in succession, just in case their
> console was a serial port. ;-)

I guess both might be needed but only when the target CPU is dead enough
to fail to respond to NMI. In principle, we could exploit the timeout in
the NMI backtrace logic and only issue the missing backtraces.
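
A rough sketch of that idea (remote_print_stack() is invented here as
a stand-in for the saved-stack-pointer remote unwind; only the wait
loop mirrors the real nmi_trigger_cpumask_backtrace() code):

    /* Wait for the remote cpus to self-report via NMI, as today. */
    for (i = 0; i < 10 * 1000; i++) {
            if (cpumask_empty(to_cpumask(backtrace_mask)))
                    break;
            mdelay(1);
    }

    /* Any cpu still set in the mask never answered its NMI, so
     * fall back to the remote-print variant for just those cpus.
     */
    for_each_cpu(cpu, to_cpumask(backtrace_mask))
            remote_print_stack(cpu);  /* hypothetical fallback */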


Daniel.

2016-03-18 23:54:55

by Paul E. McKenney

[permalink] [raw]
Subject: Re: [PATCH v2 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Fri, Mar 18, 2016 at 09:40:25AM +0000, Daniel Thompson wrote:
> On 18/03/16 00:33, Paul E. McKenney wrote:
> >On Thu, Mar 17, 2016 at 08:17:59PM -0400, Chris Metcalf wrote:
> >>On 3/17/2016 6:55 PM, Paul E. McKenney wrote:
> >>>The RCU stall-warn stack traces can be ugly, agreed.
> >>>
> >>>That said, RCU used to use NMI-based stack traces, but switched to the
> >>>current scheme due to the NMIs having the unfortunate habit of locking
> >>>things up, which IIRC often meant no stack traces at all. If I recall
> >>>correctly, one of the problems was self-deadlock in printk().
> >>
> >>Steven Rostedt enabled the per_cpu printk func support in June 2014, and
> >>the nmi_backtrace code uses it to just capture printk output to percpu
> >>buffers, so I think it's going to be a lot more robust than earlier attempts.
> >
> >That would be a very good thing, give or take the "I think" qualifier.
> >And assuming that the target CPU is healthy enough to find its way back
> >to some place that can dump the per-CPU printk buffer. I might well
> >be overly paranoid, but I have to suspect that the probability of that
> >buffer getting dumped is reduced greatly on a CPU that isn't healthy
> >enough to respond to RCU, though.
>
> The target CPU doesn't dump the buffer. It "just" fields the NMI,
> stores the backtrace and sets a flag.
>
> The buffer is dumped to console by the requesting CPU, either when
> all backtraces have come back or when a timeout is reached.

That does sound a bit more robust, good!

> >But it seems like enabling the experiment might be useful.
> >
> >"Try enabling the NMI version. If that doesn't get you your RCU CPU
> >stall warning stack trace, try the remote-print variant."
> >
> >Or I suppose we could just do both in succession, just in case their
> >console was a serial port. ;-)
>
> I guess both might be needed but only when the target CPU is dead
> enough to fail to respond to NMI. In principle, we could exploit the
> timeout in the NMI backtrace logic and only issue the missing
> backtraces.

It would be really nice if I could call one function that used the
best strategy for getting information (including stack trace) about a
specified CPU. Ditto for getting information about a specified task,
which might be running or might be preempted at the time.

Thanx, Paul

2016-03-21 15:38:45

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
> index 48958d3cec9e..37afd721ec99 100644
> --- a/scripts/mod/modpost.c
> +++ b/scripts/mod/modpost.c
> @@ -887,8 +887,8 @@ static void check_section(const char *modname, struct elf_info *elf,
> #define ALL_EXIT_SECTIONS EXIT_SECTIONS, ALL_XXXEXIT_SECTIONS
>
> #define DATA_SECTIONS ".data", ".data.rel"
> -#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
> - ".kprobes.text"
> +#define TEXT_SECTIONS ".text", ".text.unlikely", \
> + ".kprobes.text", ".cpuidle.text"

Where did .sched.text go?

> #define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
> ".fixup", ".entry.text", ".exception.text", ".text.*", \
> ".coldtext"

2016-03-21 15:42:06

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 9f7c21c22477..d569ae7fde37 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -298,7 +298,7 @@ void arch_cpu_idle(void)
> /*
> * We use this if we don't have any better idle routine..
> */
> -void default_idle(void)
> +void __cpuidle default_idle(void)
> {
> trace_cpu_idle_rcuidle(1, smp_processor_id());
> safe_halt();
> @@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
> * with interrupts enabled and no flags, which is backwards compatible with the
> * original MWAIT implementation.
> */
> -static void mwait_idle(void)
> +static __cpuidle void mwait_idle(void)
> {
> if (!current_set_polling_and_test()) {
> trace_cpu_idle_rcuidle(1, smp_processor_id());

The most common idle function for x86 is mwait_idle_with_hints();
trouble is, it's an inline, so I'm not sure adding __cpuidle to it does
anything.

I've yet to find the magic objdump incantation to check. Or rather
objdump -h doesn't appear to list .cpuidle.text at all :/

I'm probably doing something silly...

2016-03-21 16:02:44

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On 03/21/2016 11:38 AM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
>> diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
>> index 48958d3cec9e..37afd721ec99 100644
>> --- a/scripts/mod/modpost.c
>> +++ b/scripts/mod/modpost.c
>> @@ -887,8 +887,8 @@ static void check_section(const char *modname, struct elf_info *elf,
>> #define ALL_EXIT_SECTIONS EXIT_SECTIONS, ALL_XXXEXIT_SECTIONS
>>
>> #define DATA_SECTIONS ".data", ".data.rel"
>> -#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
>> - ".kprobes.text"
>> +#define TEXT_SECTIONS ".text", ".text.unlikely", \
>> + ".kprobes.text", ".cpuidle.text"
> Where did .sched.text go?

Indeed! Good catch. I can't even speculate as to how I managed
to delete the thing on the previous line while adding something
on the following line :-)

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-21 16:15:33

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On 03/21/2016 11:42 AM, Peter Zijlstra wrote:
> On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
>> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>> index 9f7c21c22477..d569ae7fde37 100644
>> --- a/arch/x86/kernel/process.c
>> +++ b/arch/x86/kernel/process.c
>> @@ -298,7 +298,7 @@ void arch_cpu_idle(void)
>> /*
>> * We use this if we don't have any better idle routine..
>> */
>> -void default_idle(void)
>> +void __cpuidle default_idle(void)
>> {
>> trace_cpu_idle_rcuidle(1, smp_processor_id());
>> safe_halt();
>> @@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
>> * with interrupts enabled and no flags, which is backwards compatible with the
>> * original MWAIT implementation.
>> */
>> -static void mwait_idle(void)
>> +static __cpuidle void mwait_idle(void)
>> {
>> if (!current_set_polling_and_test()) {
>> trace_cpu_idle_rcuidle(1, smp_processor_id());
> The most common idle function for x86 is mwait_idle_with_hints();
> trouble is, it's an inline, so I'm not sure adding __cpuidle to it does
> anything.

No, you're right, it wouldn't help. I didn't look at the drivers/cpuidle
subsystem at all in my patch, since I'm not that familiar with it,
but it seems like tagging acpi_processor_ffh_cstate_enter(), as the
only user of mwait_idle_with_hints(), will do the job.

I do see that native_play_dead() also uses mwait/monitor, but since
that's hotplug I don't think it's relevant to this patch series.

> I've yet to find the magic objdump incantation to check. Or rather
> objdump -h doesn't appear to list .cpuidle.text at all :/
>
> I'm probably doing something silly...

The easiest way to check for a given function is just to look
at the "nm -n" output and see that all the functions you expect
to reflect idle behavior are in the cpuidle begin/end range.
Or look at the "objdump -dr" output and search for monitor/mwait.

objdump -h certainly works to show .cpuidle.text if you look at
individual objects (e.g. arch/x86/kernel/process.o) but by the time
you're looking at the linked vmlinux image they have all been linked
into the giant .text section.
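
Concretely, something like this (assuming an x86_64 build tree; the
sed range just brackets the two marker symbols):

    # per-object, the section is still distinct:
    objdump -h arch/x86/kernel/process.o | grep cpuidle

    # post-link, bracket the range with the start/end symbols instead:
    nm -n vmlinux | sed -n '/__cpuidle_text_start/,/__cpuidle_text_end/p'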

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-21 16:32:26

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Mon, Mar 21, 2016 at 12:15:12PM -0400, Chris Metcalf wrote:
> On 03/21/2016 11:42 AM, Peter Zijlstra wrote:

> >The most common idle function for x86 is mwait_idle_with_hints();
> >trouble is, it's an inline, so I'm not sure adding __cpuidle to it does
> >anything.
>
> No, you're right, it wouldn't help. I didn't look at the drivers/cpuidle
> subsystem at all in my patch, since I'm not that familiar with it,
> but it seems like tagging acpi_processor_ffh_cstate_enter(), as the
> only user of mwait_idle_with_hints(), will do the job.

intel_idle() also uses it.

> >I've yet to find the magic objdump incantation to check. Or rather
> >objdump -h doesn't appear to list .cpuidle.text at all :/
> >
> >I'm probably doing something silly...
>
> The easiest way to check for a given function is just to look
> at the "nm -n" output and see that all the functions you expect
> to reflect idle behavior are in the cpuidle begin/end range.

# nm -n ivb-ep-build/vmlinux | awk '/__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}'
ffffffff81b16ca8 T __cpuidle_text_start
ffffffff81b16cb0 T default_idle
ffffffff81b16e50 t mwait_idle
ffffffff81b17080 t cpu_idle_poll
ffffffff81b17280 T default_idle_call
ffffffff81b172be T __cpuidle_text_end

So no intel_idle for me..

> objdump -h certainly works to show .cpuidle.text if you look at
> individual objects (e.g. arch/x86/kernel/process.o) but by the time
> you're looking at the linked vmlinux image they have all been linked
> into the giant .text section.

Indeed.

2016-03-21 16:48:53

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus


The below annotates the two most used idle functions on x86


--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -152,7 +152,7 @@ int acpi_processor_ffh_cstate_probe(unsi
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);

-void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
+__cpuidle void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
{
unsigned int cpu = smp_processor_id();
struct cstate_entry *percpu_entry;
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -725,7 +725,7 @@ static struct cpuidle_state avn_cstates[
*
* Must be called under local_irq_disable().
*/
-static int intel_idle(struct cpuidle_device *dev,
+__cpuidle static int intel_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */

2016-03-21 17:17:49

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Mon, Mar 21, 2016 at 01:12:39PM -0400, Chris Metcalf wrote:
> I do see mwait used in the ACPI 4.0 Processor Aggregator Device driver, but
> this seems sufficiently far removed from regular cpuidle that I don't
> think it's appropriate to tag the power_saving_thread() function -
> the initial commit talks about using the mechanism "to ride-out
> transient electrical and thermal emergencies."
>
> There's also the thermal "powerclamp" driver that enforces a particular
> amount of idle time across the system. For this one it's less clear to
> me whether this is a valid "idle" state that we should ignore when doing
> NMI backtracing. This would be the clamp_thread() function in
> drivers/thermal/intel_powerclamp.c. For now I'm not including it,
> but what do you think?

Both the acpi power aggregator and the powerclamp driver are forced idle
and have some serious issues, so are safe to ignore for now.

Also, I would explicitly not include them, because forced idle might
still be interesting.


> ># nm -n ivb-ep-build/vmlinux | awk '/__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}'
> >ffffffff81b16ca8 T __cpuidle_text_start
> >ffffffff81b16cb0 T default_idle
> >ffffffff81b16e50 t mwait_idle
> >ffffffff81b17080 t cpu_idle_poll
> >ffffffff81b17280 T default_idle_call
> >ffffffff81b172be T __cpuidle_text_end
> >
> >So no intel_idle for me..
>
> With the changes discussed so far in this email thread, we've gotten to:
>
> ffffffff818df178 T __cpuidle_text_start
> ffffffff818df180 T default_idle
> ffffffff818df260 t mwait_idle
> ffffffff818df3f0 T acpi_processor_ffh_cstate_enter
> ffffffff818df4a0 T default_idle_call
> ffffffff818df4e0 t cpu_idle_poll

> ffffffff818df600 t intel_idle_freeze

You can skip this one; it only happens when you suspend to idle.

> ffffffff818df6a0 t intel_idle
> ffffffff818df7b5 T __cpuidle_text_end
>
> This is about 1,600 bytes (or about 450 instructions) that will cause
> NMI to skip doing a backtrace if the PC is anywhere in the range.

Yeah, the alternative is making mwait_idle_with_hints an actual
function, but then we get to somehow exclude the other users like the
forced idle stuff.


2016-03-21 17:44:31

by Chris Metcalf

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On 03/21/2016 12:32 PM, Peter Zijlstra wrote:
> On Mon, Mar 21, 2016 at 12:15:12PM -0400, Chris Metcalf wrote:
>> On 03/21/2016 11:42 AM, Peter Zijlstra wrote:
>>> The most common idle function for x86 is mwait_idle_with_hints();
>>> trouble is, it's an inline, so I'm not sure adding __cpuidle to it does
>>> anything.
>> No, you're right, it wouldn't help. I didn't look at the drivers/cpuidle
>> subsystem at all in my patch, since I'm not that familiar with it,
>> but it seems like tagging acpi_processor_ffh_cstate_enter(), as the
>> only user of mwait_idle_with_hints(), will do the job.
> intel_idle() also uses it.

Ah, of course. I was only looking at the config options enabled in the
kernel I was building. I've added INTEL_IDLE now and grep'ed the whole
kernel tree as well, finding a couple of extra possibilities:

I do see mwait used in the ACPI 4.0 Processor Aggregator Device driver, but
this seems sufficiently far removed from regular cpuidle that I don't
think it's appropriate to tag the power_saving_thread() function -
the initial commit talks about using the mechanism "to ride-out
transient electrical and thermal emergencies."

There's also the thermal "powerclamp" driver that enforces a particular
amount of idle time across the system. For this one it's less clear to
me whether this is a valid "idle" state that we should ignore when doing
NMI backtracing. This would be the clamp_thread() function in
drivers/thermal/intel_powerclamp.c. For now I'm not including it,
but what do you think?

> # nm -n ivb-ep-build/vmlinux | awk '/__cpuidle_text_start/ {p=1} {if (p) print $0} /__cpuidle_text_end/ {p=0}'
> ffffffff81b16ca8 T __cpuidle_text_start
> ffffffff81b16cb0 T default_idle
> ffffffff81b16e50 t mwait_idle
> ffffffff81b17080 t cpu_idle_poll
> ffffffff81b17280 T default_idle_call
> ffffffff81b172be T __cpuidle_text_end
>
> So no intel_idle for me..

With the changes discussed so far in this email thread, we've gotten to:

ffffffff818df178 T __cpuidle_text_start
ffffffff818df180 T default_idle
ffffffff818df260 t mwait_idle
ffffffff818df3f0 T acpi_processor_ffh_cstate_enter
ffffffff818df4a0 T default_idle_call
ffffffff818df4e0 t cpu_idle_poll
ffffffff818df600 t intel_idle_freeze
ffffffff818df6a0 t intel_idle
ffffffff818df7b5 T __cpuidle_text_end

This is about 1,600 bytes (or about 450 instructions) that will cause
NMI to skip doing a backtrace if the PC is anywhere in the range.

--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com

2016-03-21 21:49:50

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Wed, Mar 16, 2016 at 01:02:13PM -0400, Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, most of which are idle,
> the output is a little overwhelming and very uninformative. Suppress
> messages for cpus that are idling when they are interrupted and just
> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

This is still 100+ lines on a modern system, but better than the many
many thousands it would otherwise generate.

> We do this by grouping all the cpuidle code together into a new
> .cpuidle.text section, and then checking the address of the
> interrupted PC to see if it lies within that section.
>
> Signed-off-by: Chris Metcalf <[email protected]>

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>

Please Cc Rafael on the next posting.

2016-03-22 17:20:04

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v3 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI. Similarly,
the forthcoming arch/tile support uses an IPI mechanism that does
not support generating an NMI to self.

Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a
backtrace of the current cpu. It seems plausible that in all cases,
dump_stack() will generate better information than generating a stack
trace from the NMI handler. The register state will be missing,
but that state is likely not particularly helpful in any case.

Or, if we think it is helpful, we should be capturing and emitting
the current register state in all cases when regs == NULL is passed
to nmi_cpu_backtrace().

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/kernel/smp.c | 9 ---------
lib/nmi_backtrace.c | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 208125658e56..26a9ac6bc616 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -746,15 +746,6 @@ core_initcall(register_cpufreq_notifier);

static void raise_nmi(cpumask_t *mask)
{
- /*
- * Generate the backtrace directly if we are running in a calling
- * context that is not preemptible by the backtrace IPI. Note
- * that nmi_cpu_backtrace() automatically removes the current cpu
- * from mask.
- */
- if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled())
- nmi_cpu_backtrace(NULL);
-
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..9375c0279b73 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -76,6 +76,15 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
}

+ /*
+ * Don't try to send an NMI to this cpu; it may work on some
+ * architectures, but on others it may not, and we'll get
+ * information at least as useful just by doing a dump_stack() here.
+ * Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
+ */
+ if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
+ nmi_cpu_backtrace(NULL);
+
if (!cpumask_empty(to_cpumask(backtrace_mask))) {
pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
--
2.7.2

2016-03-22 17:20:19

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v3 3/4] arch/tile: adopt the new nmi_backtrace framework

Previously tile was rolling its own method of capturing backtrace
data in the NMI handlers, but it was relying on running printk()
from the NMI handler, which is not always safe. So adopt the
nmi_backtrace model (with the new cpumask extension) instead.

So that we can call the nmi_backtrace code directly from the NMI
handler, move the nmi_enter()/exit() calls into the top-level tile
NMI handler.

The semantics of the routine change slightly since it is now
synchronous with the remote cores completing the backtraces.
Previously it was asynchronous, but with protection to avoid starting
a new remote backtrace if the old one was still in progress.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/tile/include/asm/irq.h | 4 +--
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++++-----------------------------------
arch/tile/kernel/traps.c | 7 +++--
4 files changed, 23 insertions(+), 63 deletions(-)

diff --git a/arch/tile/include/asm/irq.h b/arch/tile/include/asm/irq.h
index 84a924034bdb..909230a02ea8 100644
--- a/arch/tile/include/asm/irq.h
+++ b/arch/tile/include/asm/irq.h
@@ -79,8 +79,8 @@ void tile_irq_activate(unsigned int irq, int tile_irq_type);
void setup_irq_regs(void);

#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_TILE_IRQ_H */
diff --git a/arch/tile/kernel/pmc.c b/arch/tile/kernel/pmc.c
index db62cc34b955..81cf8743a3f3 100644
--- a/arch/tile/kernel/pmc.c
+++ b/arch/tile/kernel/pmc.c
@@ -16,7 +16,6 @@
#include <linux/spinlock.h>
#include <linux/module.h>
#include <linux/atomic.h>
-#include <linux/interrupt.h>

#include <asm/processor.h>
#include <asm/pmc.h>
@@ -29,9 +28,7 @@ int handle_perf_interrupt(struct pt_regs *regs, int fault)
if (!perf_irq)
panic("Unexpected PERF_COUNT interrupt %d\n", fault);

- nmi_enter();
retval = perf_irq(regs, fault);
- nmi_exit();
return retval;
}

diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index b5f30d376ce1..6594df5fed53 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -22,7 +22,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/compat.h>
-#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <linux/syscalls.h>
#include <linux/kernel.h>
#include <linux/tracehook.h>
@@ -593,66 +593,18 @@ void show_regs(struct pt_regs *regs)
tile_show_stack(&kbt);
}

-/* To ensure stack dump on tiles occurs one by one. */
-static DEFINE_SPINLOCK(backtrace_lock);
-/* To ensure no backtrace occurs before all of the stack dump are done. */
-static atomic_t backtrace_cpus;
-/* The cpu mask to avoid reentrance. */
-static struct cpumask backtrace_mask;
-
-void do_nmi_dump_stack(struct pt_regs *regs)
-{
- int is_idle = is_idle_task(current) && !in_interrupt();
- int cpu;
-
- nmi_enter();
- cpu = smp_processor_id();
- if (WARN_ON_ONCE(!cpumask_test_and_clear_cpu(cpu, &backtrace_mask)))
- goto done;
-
- spin_lock(&backtrace_lock);
- if (is_idle)
- pr_info("CPU: %d idle\n", cpu);
- else
- show_regs(regs);
- spin_unlock(&backtrace_lock);
- atomic_dec(&backtrace_cpus);
-done:
- nmi_exit();
-}
-
#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self)
+void nmi_raise_cpu_backtrace(struct cpumask *in_mask)
{
struct cpumask mask;
HV_Coord tile;
unsigned int timeout;
int cpu;
- int ongoing;
HV_NMI_Info info[NR_CPUS];

- ongoing = atomic_cmpxchg(&backtrace_cpus, 0, num_online_cpus() - 1);
- if (ongoing != 0) {
- pr_err("Trying to do all-cpu backtrace.\n");
- pr_err("But another all-cpu backtrace is ongoing (%d cpus left)\n",
- ongoing);
- if (self) {
- pr_err("Reporting the stack on this cpu only.\n");
- dump_stack();
- }
- return;
- }
-
- cpumask_copy(&mask, cpu_online_mask);
- cpumask_clear_cpu(smp_processor_id(), &mask);
- cpumask_copy(&backtrace_mask, &mask);
-
- /* Backtrace for myself first. */
- if (self)
- dump_stack();
-
/* Tentatively dump stack on remote tiles via NMI. */
timeout = 100;
+ cpumask_copy(&mask, in_mask);
while (!cpumask_empty(&mask) && timeout) {
for_each_cpu(cpu, &mask) {
tile.x = cpu_x(cpu);
@@ -663,12 +615,17 @@ void arch_trigger_all_cpu_backtrace(bool self)
}

mdelay(10);
+ touch_softlockup_watchdog();
timeout--;
}

- /* Warn about cpus stuck in ICS and decrement their counts here. */
+ /* Warn about cpus stuck in ICS. */
if (!cpumask_empty(&mask)) {
for_each_cpu(cpu, &mask) {
+
+ /* Clear the bit as if nmi_cpu_backtrace() ran. */
+ cpumask_clear_cpu(cpu, in_mask);
+
switch (info[cpu].result) {
case HV_NMI_RESULT_FAIL_ICS:
pr_warn("Skipping stack dump of cpu %d in ICS at pc %#llx\n",
@@ -679,16 +636,19 @@ void arch_trigger_all_cpu_backtrace(bool self)
cpu);
break;
case HV_ENOSYS:
- pr_warn("Hypervisor too old to allow remote stack dumps.\n");
- goto skip_for_each;
+ WARN_ONCE(1, "Hypervisor too old to allow remote stack dumps.\n");
+ break;
default: /* should not happen */
pr_warn("Skipping stack dump of cpu %d [%d,%#llx]\n",
cpu, info[cpu].result, info[cpu].pc);
break;
}
}
-skip_for_each:
- atomic_sub(cpumask_weight(&mask), &backtrace_cpus);
}
}
+
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
+{
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
+}
#endif /* __tilegx__ */
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 4d9651c5b1ad..934a7d88eb29 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -20,6 +20,8 @@
#include <linux/reboot.h>
#include <linux/uaccess.h>
#include <linux/ptrace.h>
+#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <asm/stack.h>
#include <asm/traps.h>
#include <asm/setup.h>
@@ -392,14 +394,15 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,

void do_nmi(struct pt_regs *regs, int fault_num, unsigned long reason)
{
+ nmi_enter();
switch (reason) {
case TILE_NMI_DUMP_STACK:
- do_nmi_dump_stack(regs);
+ nmi_cpu_backtrace(regs);
break;
default:
panic("Unexpected do_nmi type %ld", reason);
- return;
}
+ nmi_exit();
}

/* Deprecated function currently only used here. */
--
2.7.2

2016-03-22 17:30:14

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Tue, Mar 22, 2016 at 01:19:39PM -0400, Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, most of which are idle,
> the output is a little overwhelming and very uninformative. Suppress
> messages for cpus that are idling when they are interrupted and just
> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
>
> We do this by grouping all the cpuidle code together into a new
> .cpuidle.text section, and then checking the address of the
> interrupted PC to see if it lies within that section.
>
> This commit suitably tags x86, arm64, and tile idle routines,
> and only adds in the minimal framework for other architectures.
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> Tested-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Chris Metcalf <[email protected]>

For some reason I found a few CPUs using poll_idle().

Rafael, when and why would that ever get selected as a useful idle
state? When the predicted idle time is so short even C1 isn't worth it?


--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -14,6 +14,7 @@
#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/tick.h>
+#include <linux/cpu.h>

#include "cpuidle.h"

@@ -178,7 +179,7 @@ static void __cpuidle_driver_init(struct
}

#ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev,
+__cpuidle static int poll_idle(struct cpuidle_device *dev,
struct cpuidle_driver *drv, int index)
{
local_irq_enable();

2016-03-22 17:35:04

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v3 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

Currently you can only request a backtrace of either all cpus, or
all cpus but yourself. It can also be helpful to request a remote
backtrace of a single cpu, and since we want that, the logical
extension is to support a cpumask as the underlying primitive.

This change modifies the existing lib/nmi_backtrace.c code to take
a cpumask as its basic primitive, and modifies the linux/nmi.h code
to use either the old "all/all_but_self" arch methods, or the new
"cpumask" method, depending on which is available.

The existing clients of nmi_backtrace (arm and x86) are converted
to using the new cpumask approach in this change.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/include/asm/irq.h | 4 +--
arch/arm/kernel/smp.c | 4 +--
arch/x86/include/asm/irq.h | 4 +--
arch/x86/kernel/apic/hw_nmi.c | 6 ++---
include/linux/nmi.h | 63 ++++++++++++++++++++++++++++++++++---------
lib/nmi_backtrace.c | 15 +++++------
6 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1bd9510de1b9..13f9a9a17eca 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -36,8 +36,8 @@ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
#endif

#ifdef CONFIG_SMP
-extern void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask);
+#define arch_trigger_cpumask_backtrace(x) arch_trigger_cpumask_backtrace(x)
#endif

static inline int nr_legacy_irqs(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 37312f6749f3..208125658e56 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -758,7 +758,7 @@ static void raise_nmi(cpumask_t *mask)
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+ nmi_trigger_cpumask_backtrace(mask, raise_nmi);
}
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index e7de5c9a4fbd..18bdc8cc5c63 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -50,8 +50,8 @@ extern int vector_used_by_percpu_irq(unsigned int vector);
extern void init_ISA_irqs(void);

#ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_X86_IRQ_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..63f0b69ad6a6 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -27,15 +27,15 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
}
#endif

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
static void nmi_raise_cpu_backtrace(cpumask_t *mask)
{
apic->send_IPI_mask(mask, NMI_VECTOR);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
}

static int
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 7ec5b86735f3..951875f4f072 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -31,38 +31,75 @@ static inline void hardlockup_detector_disable(void) {}
#endif

/*
- * Create trigger_all_cpu_backtrace() out of the arch-provided
- * base function. Return whether such support was available,
+ * Create trigger_all_cpu_backtrace() etc out of the arch-provided
+ * base function(s). Return whether such support was available,
* to allow calling code to fall back to some other mechanism:
*/
-#ifdef arch_trigger_all_cpu_backtrace
static inline bool trigger_all_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(true);
-
return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(cpu_online_mask);
+ return true;
+#else
+ return false;
+#endif
}
+
static inline bool trigger_allbutself_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(false);
return true;
-}
-
-/* generic implementation */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
- void (*raise)(cpumask_t *mask));
-bool nmi_cpu_backtrace(struct pt_regs *regs);
+#elif defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+ int cpu = get_cpu();

+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_copy(mask, cpu_online_mask);
+ cpumask_clear_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ put_cpu();
+ free_cpumask_var(mask);
+ return true;
#else
-static inline bool trigger_all_cpu_backtrace(void)
-{
return false;
+#endif
}
-static inline bool trigger_allbutself_cpu_backtrace(void)
+
+static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
{
+#if defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(mask);
+ return true;
+#else
return false;
+#endif
}
+
+static inline bool trigger_single_cpu_backtrace(int cpu)
+{
+#if defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_set_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ free_cpumask_var(mask);
+ return true;
+#else
+ return false;
#endif
+}
+
+/* generic implementation */
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+ void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);

#ifdef CONFIG_LOCKUP_DETECTOR
int hw_nmi_is_cpu_stuck(struct pt_regs *);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..db63ac75eba0 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -18,7 +18,7 @@
#include <linux/nmi.h>
#include <linux/seq_buf.h>

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
static cpumask_t printtrace_mask;
@@ -44,12 +44,12 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
}

/*
- * When raise() is called it will be is passed a pointer to the
+ * When raise() is called it will be passed a pointer to the
* backtrace_mask. Architectures that call nmi_cpu_backtrace()
* directly from their raise() functions may rely on the mask
* they are passed being updated as a side effect of this call.
*/
-void nmi_trigger_all_cpu_backtrace(bool include_self,
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
void (*raise)(cpumask_t *mask))
{
struct nmi_seq_buf *s;
@@ -64,10 +64,7 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
return;
}

- cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
- if (!include_self)
- cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
-
+ cpumask_copy(to_cpumask(backtrace_mask), mask);
cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));

/*
@@ -80,8 +77,8 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
}

if (!cpumask_empty(to_cpumask(backtrace_mask))) {
- pr_info("Sending NMI to %s CPUs:\n",
- (include_self ? "all" : "other"));
+ pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
+ this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
raise(to_cpumask(backtrace_mask));
}

--
2.7.2

2016-03-22 17:35:27

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.

This commit suitably tags x86, arm64, and tile idle routines,
and only adds in the minimal framework for other architectures.

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
---
arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 ++
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/kernel/acpi/cstate.c | 2 +-
arch/x86/kernel/process.c | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 3 +++
drivers/idle/intel_idle.c | 4 ++--
include/asm-generic/vmlinux.lds.h | 6 ++++++
include/linux/cpu.h | 5 +++++
kernel/sched/idle.c | 13 +++++++++++--
lib/nmi_backtrace.c | 16 +++++++++++-----
scripts/mod/modpost.c | 2 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
46 files changed, 80 insertions(+), 14 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 647b84c15382..cebecfb76fbf 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -22,6 +22,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
index 894e696bddaa..65652160cfda 100644
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 8b60fde5ce48..6c13d570e9c9 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -107,6 +107,7 @@ SECTIONS
IRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.warning)
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index e3928f578891..a5cbecf8a74c 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -104,6 +104,7 @@ SECTIONS
IRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
HYPERVISOR_TEXT
IDMAP_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index c164d2cb35c0..b1b60fc438f6 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -48,11 +48,13 @@
*
* Idle the processor (wait for interrupt).
*/
+ .pushsection ".cpuidle.text","ax"
ENTRY(cpu_do_idle)
dsb sy // WFI may enter a low-power mode
wfi
ret
ENDPROC(cpu_do_idle)
+ .popsection

#ifdef CONFIG_CPU_PM
/**
diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
index a4589176bed5..17f2730eb497 100644
--- a/arch/avr32/kernel/vmlinux.lds.S
+++ b/arch/avr32/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
KPROBES_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index c9eec84aa258..63a02c342830 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS
#ifndef CONFIG_SCHEDULE_L1
SCHED_TEXT
#endif
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
KPROBES_TEXT
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 5a6e141d1641..9cabd962ab36 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -70,6 +70,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
KPROBES_TEXT
diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
index 7552c2557506..979586261520 100644
--- a/arch/cris/kernel/vmlinux.lds.S
+++ b/arch/cris/kernel/vmlinux.lds.S
@@ -43,6 +43,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.text.__*)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 7e958d829ec9..aa6e573d57da 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -63,6 +63,7 @@ SECTIONS
*(.text..tlbmiss)
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#ifdef CONFIG_DEBUG_INFO
INIT_TEXT
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index cb5dfb02c88d..7f11da1b895e 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -29,6 +29,7 @@ SECTIONS
_stext = . ;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#if defined(CONFIG_ROMKERNEL)
*(.int_redirect)
diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
index 5f268c1071b3..ec87e67feb19 100644
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index dc506b05ffbd..f89d20c97412 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS {
__end_ivt_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.linkonce.t*)
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 018e4a711d79..ad1fe56455aa 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -31,6 +31,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
index 06a763f49fd3..d2c8abf1c8c4 100644
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -45,6 +45,7 @@ SECTIONS {
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
. = ALIGN(16);
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d0993594f558..5b5ce1e4d1ed 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8080469ee6c1..fe5ea1974b16 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index e12055e88bfe..9fc48354d519 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -21,6 +21,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index be9488d69734..5913c7863067 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS {
EXIT_TEXT
EXIT_CALL
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 0a93e83cd014..e0fc08cb0c89 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
index 13c4814c29f8..2d5f1c3f1afb 100644
--- a/arch/mn10300/kernel/vmlinux.lds.S
+++ b/arch/mn10300/kernel/vmlinux.lds.S
@@ -30,6 +30,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index 326fab40a9de..340c7ab1d8b0 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
KPROBES_TEXT
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index 2d69a853b742..6c3cf834b5d8 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -47,6 +47,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index 308f29081d46..7e53bf44fdd2 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
.text ALIGN(PAGE_SIZE) : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index d41fd0af8980..bf423392b20a 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
/* careful! __ftr_alt_* sections need to be close to .text */
*(.text .fixup __ftr_alt_* .ref.text)
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 445657fe658c..cbc74fd4a6db 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -25,6 +25,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
index 7274b5c4287e..4117890b1db1 100644
--- a/arch/score/kernel/vmlinux.lds.S
+++ b/arch/score/kernel/vmlinux.lds.S
@@ -40,6 +40,7 @@ SECTIONS
_text = .; /* Text and read-only data */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.text.*)
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index db88cbf9eafd..989500c17358 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS
TEXT_TEXT
EXTRA_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index f1a2f688b28a..93029a4b5299 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -45,6 +45,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 670a3569450f..101de132e363 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
* When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
* as a result return to the function that called _cpu_idle().
*/
-STD_ENTRY(_cpu_idle)
+STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
movei r1, 1
IRQ_ENABLE_LOAD(r2, r3)
mtspr INTERRUPT_CRITICAL_SECTION, r1
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 0e059a0101ea..a92931e8c4f9 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -42,6 +42,7 @@ SECTIONS
.text : AT (ADDR(.text) - LOAD_OFFSET) {
HEAD_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index adde088aeeff..4fdbcf958cd5 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -68,6 +68,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.stub .text.* .gnu.linkonce.t.*)
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 6899195602b7..1840f55ed042 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -28,6 +28,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
/* .gnu.warning sections are handled specially by elf32.em. */
diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
index 77e407e49a63..56e788e8ee83 100644
--- a/arch/unicore32/kernel/vmlinux.lds.S
+++ b/arch/unicore32/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : { /* Real text segment */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT

*(.fixup)
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index 4b28159e0421..7efbb4d19024 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -152,7 +152,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);

-void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
+void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
{
unsigned int cpu = smp_processor_id();
struct cstate_entry *percpu_entry;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 9f7c21c22477..d569ae7fde37 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -298,7 +298,7 @@ void arch_cpu_idle(void)
/*
* We use this if we don't have any better idle routine..
*/
-void default_idle(void)
+void __cpuidle default_idle(void)
{
trace_cpu_idle_rcuidle(1, smp_processor_id());
safe_halt();
@@ -413,7 +413,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
* with interrupts enabled and no flags, which is backwards compatible with the
* original MWAIT implementation.
*/
-static void mwait_idle(void)
+static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
trace_cpu_idle_rcuidle(1, smp_processor_id());
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 74e4bf11f562..95f80be7632f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -98,6 +98,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
ENTRY_TEXT
diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S
index c417cbe4ec87..18a174c7fb87 100644
--- a/arch/xtensa/kernel/vmlinux.lds.S
+++ b/arch/xtensa/kernel/vmlinux.lds.S
@@ -93,6 +93,9 @@ SECTIONS
VMLINUX_SYMBOL(__sched_text_start) = .;
*(.sched.literal .sched.text)
VMLINUX_SYMBOL(__sched_text_end) = .;
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .;
+ *(.cpuidle.literal .cpuidle.text)
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
VMLINUX_SYMBOL(__lock_text_start) = .;
*(.spinlock.literal .spinlock.text)
VMLINUX_SYMBOL(__lock_text_end) = .;
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index cd4510a63375..924554f920fb 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -725,8 +725,8 @@ static struct cpuidle_state avn_cstates[] = {
*
* Must be called under local_irq_disable().
*/
-static int intel_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static __cpuidle int intel_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */
struct cpuidle_state *state = &drv->states[index];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index c4bd0e2c173c..18af5199f97c 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -444,6 +444,12 @@
*(.spinlock.text) \
VMLINUX_SYMBOL(__lock_text_end) = .;

+#define CPUIDLE_TEXT \
+ ALIGN_FUNCTION(); \
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .; \
+ *(.cpuidle.text) \
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
+
#define KPROBES_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__kprobes_text_start) = .; \
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index d2ca8c38f9c4..0cbe214e8f4b 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -274,6 +274,11 @@ void cpu_startup_entry(enum cpuhp_state state);

void cpu_idle_poll_ctrl(bool enable);

+/* Attach to any functions which should be considered cpuidle. */
+#define __cpuidle __attribute__((__section__(".cpuidle.text")))
+
+bool cpu_in_idle(unsigned long pc);
+
void arch_cpu_idle(void);
void arch_cpu_idle_prepare(void);
void arch_cpu_idle_enter(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 544a7133cbd1..ffca482beab5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -15,6 +15,9 @@

#include "sched.h"

+/* Linker adds these: start and end of __cpuidle functions */
+extern char __cpuidle_text_start[], __cpuidle_text_end[];
+
/**
* sched_idle_set_state - Record idle state for the current CPU.
* @idle_state: State to record.
@@ -52,7 +55,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

-static inline int cpu_idle_poll(void)
+static int noinline __cpuidle cpu_idle_poll(void)
{
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -83,7 +86,7 @@ void __weak arch_cpu_idle(void)
*
* To use when the cpuidle framework cannot be used.
*/
-void default_idle_call(void)
+void __cpuidle default_idle_call(void)
{
if (current_clr_polling_and_test()) {
local_irq_enable();
@@ -273,6 +276,12 @@ static void cpu_idle_loop(void)
}
}

+bool cpu_in_idle(unsigned long pc)
+{
+ return pc >= (unsigned long)__cpuidle_text_start &&
+ pc < (unsigned long)__cpuidle_text_end;
+}
+
void cpu_startup_entry(enum cpuhp_state state)
{
/*
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 9375c0279b73..ac41f3c84e8d 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpu.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -160,11 +161,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (regs != NULL && cpu_in_idle(instruction_pointer(regs))) {
+ pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
+ cpu, instruction_pointer(regs));
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);

cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 48958d3cec9e..bd8349759095 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -888,7 +888,7 @@ static void check_section(const char *modname, struct elf_info *elf,

#define DATA_SECTIONS ".data", ".data.rel"
#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
- ".kprobes.text"
+ ".kprobes.text", ".cpuidle.text"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", ".text.*", \
".coldtext"
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index e167592793a7..9a6ec6ce00b5 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -357,6 +357,7 @@ is_mcounted_section_name(char const *const txtname)
strcmp(".spinlock.text", txtname) == 0 ||
strcmp(".irqentry.text", txtname) == 0 ||
strcmp(".kprobes.text", txtname) == 0 ||
+ strcmp(".cpuidle.text", txtname) == 0 ||
strcmp(".text.unlikely", txtname) == 0;
}

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 96e2486a6fc4..29cecf9b504f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -135,6 +135,7 @@ my %text_sections = (
".spinlock.text" => 1,
".irqentry.text" => 1,
".kprobes.text" => 1,
+ ".cpuidle.text" => 1,
".text.unlikely" => 1,
);

--
2.7.2
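For illustration, the resulting console output on a mostly-idle machine
looks something like this (the cpu numbers and pc values here are
invented; the message formats are the ones added by this series):

Sending NMI from CPU 0 to CPUs 1-3:
NMI backtrace for cpu 1 skipped: idling at pc 0xfffffe0000103a08
NMI backtrace for cpu 2 skipped: idling at pc 0xfffffe0000103a08
NMI backtrace for cpu 3
[register dump and full backtrace for the one busy cpu]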

2016-03-22 17:35:31

by Chris Metcalf

Subject: [PATCH v3 0/4] improvements to the nmi_backtrace code

From the version 1 cover letter:

This patch series modifies the trigger_xxx_backtrace() NMI-based
remote backtracing code to make it more flexible, and makes a few
small improvements along the way.

The motivation comes from the task isolation code, where we want to
be able to diagnose cases in which some cpu is about to interrupt a
task-isolated cpu. It can be helpful to see both where the
interrupting cpu is and an approximation of where the interrupted
cpu is. The nmi_backtrace framework allows us to discover the stack
of the interrupted cpu.

I've tested that the change works as desired on tile, and build-tested
x86, arm64, and arm. For x86 and arm64 I confirmed that both the generic
cpuidle code and the architecture-specific routines are in the
new cpuidle section. For arm I just build-tested it and made sure the
generic cpuidle routines were in the new cpuidle section, but I didn't
attempt to tease apart the tangle of platform-specific idle routines
that arm has and tag them with __cpuidle. That might be more usefully
done by someone with arm platform experience in a follow-up patch.

I have also pushed it up to kernel.org to pull if that's easier:

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git nmi-backtrace

v3: Various improvements to the set of __cpuidle functions;
Add back in a missing section accidentally removed in modpost.c (PeterZ)

v2: Switch to using __cpuidle tagging, switch S-O-B to Mellanox
https://lkml.kernel.org/r/[email protected]

Chris Metcalf (4):
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
nmi_backtrace: do a local dump_stack() instead of a self-NMI
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: generate one-line reports for idle cpus

arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/include/asm/irq.h | 4 +-
arch/arm/kernel/smp.c | 13 +------
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 +
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/include/asm/irq.h | 4 +-
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++----------------------------
arch/tile/kernel/traps.c | 7 +++-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/include/asm/irq.h | 4 +-
arch/x86/kernel/acpi/cstate.c | 2 +-
arch/x86/kernel/apic/hw_nmi.c | 6 +--
arch/x86/kernel/process.c | 4 +-
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 3 ++
drivers/idle/intel_idle.c | 4 +-
include/asm-generic/vmlinux.lds.h | 6 +++
include/linux/cpu.h | 5 +++
include/linux/nmi.h | 63 ++++++++++++++++++++++++-------
kernel/sched/idle.c | 13 ++++++-
lib/nmi_backtrace.c | 40 +++++++++++++-------
scripts/mod/modpost.c | 2 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
55 files changed, 177 insertions(+), 117 deletions(-)

--
2.7.2

2016-03-22 22:25:54

by Rafael J. Wysocki

Subject: Re: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Tuesday, March 22, 2016 06:30:05 PM Peter Zijlstra wrote:
> On Tue, Mar 22, 2016 at 01:19:39PM -0400, Chris Metcalf wrote:
> > When doing an nmi backtrace of many cores, most of which are idle,
> > the output is a little overwhelming and very uninformative. Suppress
> > messages for cpus that are idling when they are interrupted and just
> > emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
> >
> > We do this by grouping all the cpuidle code together into a new
> > .cpuidle.text section, and then checking the address of the
> > interrupted PC to see if it lies within that section.
> >
> > This commit suitably tags x86, arm64, and tile idle routines,
> > and only adds in the minimal framework for other architectures.
> >
> > Acked-by: Peter Zijlstra (Intel) <[email protected]>
> > Tested-by: Peter Zijlstra (Intel) <[email protected]>
> > Signed-off-by: Chris Metcalf <[email protected]>
>
> For some reason I found a few CPUs using poll_idle().
>
> Rafael, when and why would that ever get selected as a useful idle
> state? When the predicted idle time is so short even C1 isn't worth it?

Yes, that's the case.

2016-03-22 22:29:04

by Rafael J. Wysocki

Subject: Re: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Tuesday, March 22, 2016 01:19:39 PM Chris Metcalf wrote:
> When doing an nmi backtrace of many cores, most of which are idle,
> the output is a little overwhelming and very uninformative. Suppress
> messages for cpus that are idling when they are interrupted and just
> emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
>
> We do this by grouping all the cpuidle code together into a new
> .cpuidle.text section, and then checking the address of the
> interrupted PC to see if it lies within that section.
>
> This commit suitably tags x86, arm64, and tile idle routines,
> and only adds in the minimal framework for other architectures.
>
> Acked-by: Peter Zijlstra (Intel) <[email protected]>
> Tested-by: Peter Zijlstra (Intel) <[email protected]>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---

[cut]

> diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> index cd4510a63375..924554f920fb 100644
> --- a/drivers/idle/intel_idle.c
> +++ b/drivers/idle/intel_idle.c
> @@ -725,8 +725,8 @@ static struct cpuidle_state avn_cstates[] = {
> *
> * Must be called under local_irq_disable().
> */
> -static int intel_idle(struct cpuidle_device *dev,
> - struct cpuidle_driver *drv, int index)
> +static __cpuidle int intel_idle(struct cpuidle_device *dev,
> + struct cpuidle_driver *drv, int index)
> {
> unsigned long ecx = 1; /* break on interrupt flag */
> struct cpuidle_state *state = &drv->states[index];

Well, what about intel_idle_freeze()? Or do we not care?

And analogous stuff in processor_idle.c for that matter?

acpi_idle_enter()/acpi_idle_enter_freeze() plus stuff called by those?

Thanks,
Rafael

2016-03-22 22:46:08

by Peter Zijlstra

Subject: Re: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Tue, Mar 22, 2016 at 11:31:11PM +0100, Rafael J. Wysocki wrote:

> > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> > index cd4510a63375..924554f920fb 100644
> > --- a/drivers/idle/intel_idle.c
> > +++ b/drivers/idle/intel_idle.c
> > @@ -725,8 +725,8 @@ static struct cpuidle_state avn_cstates[] = {
> > *
> > * Must be called under local_irq_disable().
> > */
> > -static int intel_idle(struct cpuidle_device *dev,
> > - struct cpuidle_driver *drv, int index)
> > +static __cpuidle int intel_idle(struct cpuidle_device *dev,
> > + struct cpuidle_driver *drv, int index)
> > {
> > unsigned long ecx = 1; /* break on interrupt flag */
> > struct cpuidle_state *state = &drv->states[index];
>
> Well, what about intel_idle_freeze()? Or do we not care?

I argued against it; when you're suspended the NMI watchdog is stopped
too. Then again, you have more experience debugging that thing, so if
you think it's useful it's not much effort to add it.

> And analogous stuff in processor_idle.c for that matter?
>
> acpi_idle_enter()/acpi_idle_enter_freeze() plus stuff called by those?

Ah, I only tagged acpi_processor_ffh_cstate_enter() because I started
from mwait_idle_with_hints(). I suppose acpi_safe_halt(), and
acpi_idle_do_entry() itself for the INB method, should cover it?

(This being one of the reasons I asked Chris to Cc you; you know this
stuff far better than I do)


---
drivers/acpi/processor_idle.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 175c86bee3a9..d5b11fff9e88 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -111,7 +111,7 @@ static const struct dmi_system_id processor_power_dmi_table[] = {
* Callers should disable interrupts before the call and enable
* interrupts after return.
*/
-static void acpi_safe_halt(void)
+__cpuidle static void acpi_safe_halt(void)
{
if (!tif_need_resched()) {
safe_halt();
@@ -680,7 +680,7 @@ static int acpi_idle_bm_check(void)
*
* Caller disables interrupt before call and enables interrupt after return.
*/
-static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
+__cpuidle static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
{
if (cx->entry_method == ACPI_CSTATE_FFH) {
/* Call into architectural FFH based C-state */

2016-03-23 00:48:00

by Rafael J. Wysocki

Subject: Re: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Tuesday, March 22, 2016 11:45:57 PM Peter Zijlstra wrote:
> On Tue, Mar 22, 2016 at 11:31:11PM +0100, Rafael J. Wysocki wrote:
>
> > > diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
> > > index cd4510a63375..924554f920fb 100644
> > > --- a/drivers/idle/intel_idle.c
> > > +++ b/drivers/idle/intel_idle.c
> > > @@ -725,8 +725,8 @@ static struct cpuidle_state avn_cstates[] = {
> > > *
> > > * Must be called under local_irq_disable().
> > > */
> > > -static int intel_idle(struct cpuidle_device *dev,
> > > - struct cpuidle_driver *drv, int index)
> > > +static __cpuidle int intel_idle(struct cpuidle_device *dev,
> > > + struct cpuidle_driver *drv, int index)
> > > {
> > > unsigned long ecx = 1; /* break on interrupt flag */
> > > struct cpuidle_state *state = &drv->states[index];
> >
> > Well, what about intel_idle_freeze()? Or do we not care?
>
> I argued against it; when you're suspended the NMI watchdog is stopped
> too.

Is it also stopped for suspend-to-idle? I'm not sure about that.

Where do I need to look to find out?

> Then again, you've more experience debugging that thing, so if
> you think its useful its not much effort adding it.
>
> > And analogous stuff in processor_idle.c for that matter?
> >
> > acpi_idle_enter()/acpi_idle_enter_freeze() plus stuff called by those?
>
> Ah, I only tagged acpi_processor_ffh_cstate_enter() because I started
> from mwait_idle_with_hints(). I suppose acpi_safe_halt(), and
> acpi_idle_do_entry() itself for the INB method, should cover it?

Yes, these two should be sufficient.

> (This being one of the reasons I asked Chris to Cc you; you know this
> stuff far better than I do)
>
> ---
> drivers/acpi/processor_idle.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
> index 175c86bee3a9..d5b11fff9e88 100644
> --- a/drivers/acpi/processor_idle.c
> +++ b/drivers/acpi/processor_idle.c
> @@ -111,7 +111,7 @@ static const struct dmi_system_id processor_power_dmi_table[] = {
> * Callers should disable interrupts before the call and enable
> * interrupts after return.
> */
> -static void acpi_safe_halt(void)
> +__cpuidle static void acpi_safe_halt(void)
> {
> if (!tif_need_resched()) {
> safe_halt();
> @@ -680,7 +680,7 @@ static int acpi_idle_bm_check(void)
> *
> * Caller disables interrupt before call and enables interrupt after return.
> */
> -static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
> +__cpuidle static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
> {
> if (cx->entry_method == ACPI_CSTATE_FFH) {
> /* Call into architectural FFH based C-state */

2016-03-23 07:54:19

by Peter Zijlstra

Subject: Re: [PATCH v3 4/4] nmi_backtrace: generate one-line reports for idle cpus

On Wed, Mar 23, 2016 at 01:50:00AM +0100, Rafael J. Wysocki wrote:

> > > Well, what about intel_idle_freeze()? Or do we not care?
> >
> > I argued against it; when you're suspended the NMI watchdog is stopped
> > too.
>
> Is it also stopped for suspend-to-idle? I'm not sure about that.
>
> Where do I need to look to find out?

Hmm, I have memories of writing a patch to that effect when we were
starting with that suspend-to-idle stuff, because people didn't like
being woken up all the time.

But now that I look, I cannot find it either...

2016-03-30 17:16:30

by Chris Metcalf

Subject: [PATCH v4 0/4] improvements to the nmi_backtrace code

From the version 1 cover letter:

This patch series modifies the trigger_xxx_backtrace() NMI-based
remote backtracing code to make it more flexible, and makes a few
small improvements along the way.

The motivation comes from the task isolation code, where we want to
be able to diagnose cases in which some cpu is about to interrupt a
task-isolated cpu. It can be helpful to see both where the
interrupting cpu is and an approximation of where the interrupted
cpu is. The nmi_backtrace framework allows us to discover the stack
of the interrupted cpu.

I've tested that the change works as desired on tile, and build-tested
x86, arm64, and arm. For x86 and arm64 I confirmed that both the generic
cpuidle code and the architecture-specific routines are in the
new cpuidle section. For arm I just build-tested it and made sure the
generic cpuidle routines were in the new cpuidle section, but I didn't
attempt to tease apart the tangle of platform-specific idle routines
that arm has and tag them with __cpuidle. That might be more usefully
done by someone with arm platform experience in a follow-up patch.

I have also pushed it up to kernel.org to pull if that's easier:

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git nmi-backtrace

The change conflicts with Petr Mladek's NMI printk cleanup patches:

https://lkml.kernel.org/r/[email protected]

He has kindly offered to resolve the conflicts.

v4: Added some more __cpuidle functions (PeterZ, Rafael Wysocki)
Rebased to kernel v4.6-rc1

v3: Various improvements to the set of __cpuidle functions;
Add back in a missing section accidentally removed in modpost.c (PeterZ)
https://lkml.kernel.org/r/[email protected]

v2: Switch to using __cpuidle tagging, switch S-O-B to Mellanox
https://lkml.kernel.org/r/[email protected]

Chris Metcalf (4):
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
nmi_backtrace: do a local dump_stack() instead of a self-NMI
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: generate one-line reports for idle cpus

arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/include/asm/irq.h | 4 +-
arch/arm/kernel/smp.c | 13 +------
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 +
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/include/asm/irq.h | 4 +-
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++----------------------------
arch/tile/kernel/traps.c | 7 +++-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/include/asm/irq.h | 4 +-
arch/x86/kernel/acpi/cstate.c | 2 +-
arch/x86/kernel/apic/hw_nmi.c | 6 +--
arch/x86/kernel/process.c | 4 +-
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 3 ++
drivers/acpi/processor_idle.c | 5 ++-
drivers/cpuidle/driver.c | 5 ++-
drivers/idle/intel_idle.c | 4 +-
include/asm-generic/vmlinux.lds.h | 6 +++
include/linux/cpu.h | 5 +++
include/linux/nmi.h | 63 ++++++++++++++++++++++++-------
kernel/sched/idle.c | 13 ++++++-
lib/nmi_backtrace.c | 40 +++++++++++++-------
scripts/mod/modpost.c | 2 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
57 files changed, 183 insertions(+), 121 deletions(-)

--
2.7.2

2016-03-30 17:16:48

by Chris Metcalf

Subject: [PATCH v4 4/4] nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.

This commit suitably tags x86, arm64, and tile idle routines,
and only adds in the minimal framework for other architectures.

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
---
arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 ++
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/kernel/acpi/cstate.c | 2 +-
arch/x86/kernel/process.c | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 3 +++
drivers/acpi/processor_idle.c | 5 +++--
drivers/cpuidle/driver.c | 5 +++--
drivers/idle/intel_idle.c | 4 ++--
include/asm-generic/vmlinux.lds.h | 6 ++++++
include/linux/cpu.h | 5 +++++
kernel/sched/idle.c | 13 +++++++++++--
lib/nmi_backtrace.c | 16 +++++++++++-----
scripts/mod/modpost.c | 2 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
48 files changed, 86 insertions(+), 18 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 647b84c15382..cebecfb76fbf 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -22,6 +22,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
index 894e696bddaa..65652160cfda 100644
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index e2c6da096cef..b5376e87e61c 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -111,6 +111,7 @@ SECTIONS
SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
HYPERVISOR_TEXT
KPROBES_TEXT
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5a1939a74ff3..fbedb7f489c7 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -106,6 +106,7 @@ SECTIONS
SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
HYPERVISOR_TEXT
IDMAP_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 543f5198005a..580fec01f009 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -50,11 +50,13 @@
*
* Idle the processor (wait for interrupt).
*/
+ .pushsection ".cpuidle.text","ax"
ENTRY(cpu_do_idle)
dsb sy // WFI may enter a low-power mode
wfi
ret
ENDPROC(cpu_do_idle)
+ .popsection

#ifdef CONFIG_CPU_PM
/**
diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
index a4589176bed5..17f2730eb497 100644
--- a/arch/avr32/kernel/vmlinux.lds.S
+++ b/arch/avr32/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
KPROBES_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index d920b959ff3a..68069a120055 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS
#ifndef CONFIG_SCHEDULE_L1
SCHED_TEXT
#endif
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 50bc10f97bcb..a1a5c166bc9b 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -70,6 +70,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
index 7552c2557506..979586261520 100644
--- a/arch/cris/kernel/vmlinux.lds.S
+++ b/arch/cris/kernel/vmlinux.lds.S
@@ -43,6 +43,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.text.__*)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 7e958d829ec9..aa6e573d57da 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -63,6 +63,7 @@ SECTIONS
*(.text..tlbmiss)
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#ifdef CONFIG_DEBUG_INFO
INIT_TEXT
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index cb5dfb02c88d..7f11da1b895e 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -29,6 +29,7 @@ SECTIONS
_stext = . ;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#if defined(CONFIG_ROMKERNEL)
*(.int_redirect)
diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
index 5f268c1071b3..ec87e67feb19 100644
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index dc506b05ffbd..f89d20c97412 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS {
__end_ivt_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.linkonce.t*)
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 018e4a711d79..ad1fe56455aa 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -31,6 +31,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
index 06a763f49fd3..d2c8abf1c8c4 100644
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -45,6 +45,7 @@ SECTIONS {
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
. = ALIGN(16);
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d0993594f558..5b5ce1e4d1ed 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8080469ee6c1..fe5ea1974b16 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index 150ace92c7ad..e6c700eaf207 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -21,6 +21,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index 0a47f0410554..289d0e7f3e3a 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS {
EXIT_TEXT
EXIT_CALL
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 54d653ee17e1..f6ca8e5caaf6 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
index 13c4814c29f8..2d5f1c3f1afb 100644
--- a/arch/mn10300/kernel/vmlinux.lds.S
+++ b/arch/mn10300/kernel/vmlinux.lds.S
@@ -30,6 +30,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index e23e89539967..6a8045bb1a77 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index d936de4c07ca..d68b9ede8423 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -47,6 +47,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index f3ead0b6ce46..9ec8ec075dae 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
.text ALIGN(PAGE_SIZE) : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 2dd91f79de05..ac425ff39b4d 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
/* careful! __ftr_alt_* sections need to be close to .text */
*(.text .fixup __ftr_alt_* .ref.text)
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 0f41a8286378..b1c8958e72ad 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -25,6 +25,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
index 7274b5c4287e..4117890b1db1 100644
--- a/arch/score/kernel/vmlinux.lds.S
+++ b/arch/score/kernel/vmlinux.lds.S
@@ -40,6 +40,7 @@ SECTIONS
_text = .; /* Text and read-only data */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.text.*)
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index 235a4101999f..5b9a3cc90c58 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS
TEXT_TEXT
EXTRA_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index aadd321aa05d..846a734e3882 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -45,6 +45,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 670a3569450f..101de132e363 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
* When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
* as a result return to the function that called _cpu_idle().
*/
-STD_ENTRY(_cpu_idle)
+STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
movei r1, 1
IRQ_ENABLE_LOAD(r2, r3)
mtspr INTERRUPT_CRITICAL_SECTION, r1
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 378f5d8d1ec8..9e54bee9c048 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -42,6 +42,7 @@ SECTIONS
.text : AT (ADDR(.text) - LOAD_OFFSET) {
HEAD_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index adde088aeeff..4fdbcf958cd5 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -68,6 +68,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.stub .text.* .gnu.linkonce.t.*)
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 6899195602b7..1840f55ed042 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -28,6 +28,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
/* .gnu.warning sections are handled specially by elf32.em. */
diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
index 77e407e49a63..56e788e8ee83 100644
--- a/arch/unicore32/kernel/vmlinux.lds.S
+++ b/arch/unicore32/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : { /* Real text segment */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT

*(.fixup)
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index 4b28159e0421..7efbb4d19024 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -152,7 +152,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);

-void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
+void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
{
unsigned int cpu = smp_processor_id();
struct cstate_entry *percpu_entry;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 2915d54e9dd5..3e1db7fdd69d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -301,7 +301,7 @@ void arch_cpu_idle(void)
/*
* We use this if we don't have any better idle routine..
*/
-void default_idle(void)
+void __cpuidle default_idle(void)
{
trace_cpu_idle_rcuidle(1, smp_processor_id());
safe_halt();
@@ -416,7 +416,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
* with interrupts enabled and no flags, which is backwards compatible with the
* original MWAIT implementation.
*/
-static void mwait_idle(void)
+static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
trace_cpu_idle_rcuidle(1, smp_processor_id());
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4c941f88d405..e611d0dc9942 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
ENTRY_TEXT
diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S
index c417cbe4ec87..18a174c7fb87 100644
--- a/arch/xtensa/kernel/vmlinux.lds.S
+++ b/arch/xtensa/kernel/vmlinux.lds.S
@@ -93,6 +93,9 @@ SECTIONS
VMLINUX_SYMBOL(__sched_text_start) = .;
*(.sched.literal .sched.text)
VMLINUX_SYMBOL(__sched_text_end) = .;
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .;
+ *(.cpuidle.literal .cpuidle.text)
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
VMLINUX_SYMBOL(__lock_text_start) = .;
*(.spinlock.literal .spinlock.text)
VMLINUX_SYMBOL(__lock_text_end) = .;
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 444e3745c8b3..2477f9a351d3 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -31,6 +31,7 @@
#include <linux/sched.h> /* need_resched() */
#include <linux/tick.h>
#include <linux/cpuidle.h>
+#include <linux/cpu.h>
#include <acpi/processor.h>

/*
@@ -109,7 +110,7 @@ static const struct dmi_system_id processor_power_dmi_table[] = {
* Callers should disable interrupts before the call and enable
* interrupts after return.
*/
-static void acpi_safe_halt(void)
+static void __cpuidle acpi_safe_halt(void)
{
if (!tif_need_resched()) {
safe_halt();
@@ -640,7 +641,7 @@ static int acpi_idle_bm_check(void)
*
* Caller disables interrupt before call and enables interrupt after return.
*/
-static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
+static void __cpuidle acpi_idle_do_entry(struct acpi_processor_cx *cx)
{
if (cx->entry_method == ACPI_CSTATE_FFH) {
/* Call into architectural FFH based C-state */
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 389ade4572be..ab264d393233 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -14,6 +14,7 @@
#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/tick.h>
+#include <linux/cpu.h>

#include "cpuidle.h"

@@ -178,8 +179,8 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv)
}

#ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static int __cpuidle poll_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
local_irq_enable();
if (!current_set_polling_and_test()) {
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index ba947df5a8c7..d30127a0f3ac 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -745,8 +745,8 @@ static struct cpuidle_state knl_cstates[] = {
*
* Must be called under local_irq_disable().
*/
-static int intel_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static __cpuidle int intel_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */
struct cpuidle_state *state = &drv->states[index];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 339125bb4d2c..5ed7075f7ef1 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -444,6 +444,12 @@
*(.spinlock.text) \
VMLINUX_SYMBOL(__lock_text_end) = .;

+#define CPUIDLE_TEXT \
+ ALIGN_FUNCTION(); \
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .; \
+ *(.cpuidle.text) \
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
+
#define KPROBES_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__kprobes_text_start) = .; \
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index f9b1fab4388a..07642073989c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -268,6 +268,11 @@ void cpu_startup_entry(enum cpuhp_state state);

void cpu_idle_poll_ctrl(bool enable);

+/* Attach to any functions which should be considered cpuidle. */
+#define __cpuidle __attribute__((__section__(".cpuidle.text")))
+
+bool cpu_in_idle(unsigned long pc);
+
void arch_cpu_idle(void);
void arch_cpu_idle_prepare(void);
void arch_cpu_idle_enter(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index bd12c6c714ec..d4dc16e6749b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -16,6 +16,9 @@

#include "sched.h"

+/* Linker adds these: start and end of __cpuidle functions */
+extern char __cpuidle_text_start[], __cpuidle_text_end[];
+
/**
* sched_idle_set_state - Record idle state for the current CPU.
* @idle_state: State to record.
@@ -53,7 +56,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

-static inline int cpu_idle_poll(void)
+static noinline int __cpuidle cpu_idle_poll(void)
{
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -84,7 +87,7 @@ void __weak arch_cpu_idle(void)
*
* To use when the cpuidle framework cannot be used.
*/
-void default_idle_call(void)
+void __cpuidle default_idle_call(void)
{
if (current_clr_polling_and_test()) {
local_irq_enable();
@@ -269,6 +272,12 @@ static void cpu_idle_loop(void)
}
}

+bool cpu_in_idle(unsigned long pc)
+{
+ return pc >= (unsigned long)__cpuidle_text_start &&
+ pc < (unsigned long)__cpuidle_text_end;
+}
+
void cpu_startup_entry(enum cpuhp_state state)
{
/*
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 9375c0279b73..ac41f3c84e8d 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpu.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -160,11 +161,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (regs != NULL && cpu_in_idle(instruction_pointer(regs))) {
+ pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
+ cpu, instruction_pointer(regs));
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);

cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 48958d3cec9e..bd8349759095 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -888,7 +888,7 @@ static void check_section(const char *modname, struct elf_info *elf,

#define DATA_SECTIONS ".data", ".data.rel"
#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
- ".kprobes.text"
+ ".kprobes.text", ".cpuidle.text"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", ".text.*", \
".coldtext"
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index e167592793a7..9a6ec6ce00b5 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -357,6 +357,7 @@ is_mcounted_section_name(char const *const txtname)
strcmp(".spinlock.text", txtname) == 0 ||
strcmp(".irqentry.text", txtname) == 0 ||
strcmp(".kprobes.text", txtname) == 0 ||
+ strcmp(".cpuidle.text", txtname) == 0 ||
strcmp(".text.unlikely", txtname) == 0;
}

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 96e2486a6fc4..29cecf9b504f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -135,6 +135,7 @@ my %text_sections = (
".spinlock.text" => 1,
".irqentry.text" => 1,
".kprobes.text" => 1,
+ ".cpuidle.text" => 1,
".text.unlikely" => 1,
);

--
2.7.2

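For illustration, a minimal sketch of how the pieces above are meant
to compose. The function my_arch_idle() is hypothetical; __cpuidle,
cpu_in_idle() and instruction_pointer() come from the patch and from
existing kernel code.

#include <linux/cpu.h>		/* __cpuidle, cpu_in_idle() */
#include <linux/ptrace.h>	/* instruction_pointer() */

/* Tagging an idle entry point places its text in .cpuidle.text,
 * between __cpuidle_text_start and __cpuidle_text_end. */
void __cpuidle my_arch_idle(void)	/* hypothetical arch hook */
{
	/* e.g. a wait-for-interrupt; the interrupted pc parks here */
}

/* In the NMI path, the interrupted pc alone identifies an idle cpu. */
static bool interrupted_in_idle(struct pt_regs *regs)
{
	return regs && cpu_in_idle(instruction_pointer(regs));
}
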
2016-03-30 17:16:41

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v4 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI. Similarly,
the forthcoming arch/tile support uses an IPI mechanism that does
not support generating an NMI to self.

Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a
backtrace of the current cpu. It seems plausible that in all cases,
dump_stack() will generate better information than a backtrace taken
from the NMI handler. The register state will be missing, but that
state is unlikely to be particularly helpful in any case.

Or, if we think it is helpful, we should be capturing and emitting
the current register state in all cases when regs == NULL is passed
to nmi_cpu_backtrace().

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/kernel/smp.c | 9 ---------
lib/nmi_backtrace.c | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 72ad8485993a..07223f2a3ee0 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -746,15 +746,6 @@ core_initcall(register_cpufreq_notifier);

static void raise_nmi(cpumask_t *mask)
{
- /*
- * Generate the backtrace directly if we are running in a calling
- * context that is not preemptible by the backtrace IPI. Note
- * that nmi_cpu_backtrace() automatically removes the current cpu
- * from mask.
- */
- if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled())
- nmi_cpu_backtrace(NULL);
-
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..9375c0279b73 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -76,6 +76,15 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
}

+ /*
+ * Don't try to send an NMI to this cpu; it may work on some
+ * architectures, but on others it may not, and we'll get
+ * information at least as useful just by doing a dump_stack() here.
+ * Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
+ */
+ if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
+ nmi_cpu_backtrace(NULL);
+
if (!cpumask_empty(to_cpumask(backtrace_mask))) {
pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
--
2.7.2

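To make the effect concrete, a minimal sketch using the names from the
arm diff above: with the guard moved into the generic code, an
architecture's raise() reduces to the cross-call itself.

/* Sketch: arm's raise_nmi() after this patch.  The current cpu has
 * already been dump_stack()ed and cleared from the mask by
 * nmi_trigger_cpumask_backtrace(), so no self-NMI is attempted. */
static void raise_nmi(cpumask_t *mask)
{
	smp_cross_call(mask, IPI_CPU_BACKTRACE);
}
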
2016-03-30 17:17:12

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v4 3/4] arch/tile: adopt the new nmi_backtrace framework

Previously tile was rolling its own method of capturing backtrace
data in the NMI handlers, but it was relying on running printk()
from the NMI handler, which is not always safe. So adopt the
nmi_backtrace model (with the new cpumask extension) instead.

So that we can call the nmi_backtrace code directly from the NMI
handler, move the nmi_enter()/exit() calls into the top-level tile
NMI handler.

The semantics of the routine change slightly since it is now
synchronous with the remote cores completing the backtraces.
Previously it was asynchronous, but with protection to avoid starting
a new remote backtrace if the old one was still in progress.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/tile/include/asm/irq.h | 4 +--
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++++-----------------------------------
arch/tile/kernel/traps.c | 7 +++--
4 files changed, 23 insertions(+), 63 deletions(-)

diff --git a/arch/tile/include/asm/irq.h b/arch/tile/include/asm/irq.h
index 84a924034bdb..909230a02ea8 100644
--- a/arch/tile/include/asm/irq.h
+++ b/arch/tile/include/asm/irq.h
@@ -79,8 +79,8 @@ void tile_irq_activate(unsigned int irq, int tile_irq_type);
void setup_irq_regs(void);

#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_TILE_IRQ_H */
diff --git a/arch/tile/kernel/pmc.c b/arch/tile/kernel/pmc.c
index db62cc34b955..81cf8743a3f3 100644
--- a/arch/tile/kernel/pmc.c
+++ b/arch/tile/kernel/pmc.c
@@ -16,7 +16,6 @@
#include <linux/spinlock.h>
#include <linux/module.h>
#include <linux/atomic.h>
-#include <linux/interrupt.h>

#include <asm/processor.h>
#include <asm/pmc.h>
@@ -29,9 +28,7 @@ int handle_perf_interrupt(struct pt_regs *regs, int fault)
if (!perf_irq)
panic("Unexpected PERF_COUNT interrupt %d\n", fault);

- nmi_enter();
retval = perf_irq(regs, fault);
- nmi_exit();
return retval;
}

diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index b5f30d376ce1..6594df5fed53 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -22,7 +22,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/compat.h>
-#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <linux/syscalls.h>
#include <linux/kernel.h>
#include <linux/tracehook.h>
@@ -593,66 +593,18 @@ void show_regs(struct pt_regs *regs)
tile_show_stack(&kbt);
}

-/* To ensure stack dump on tiles occurs one by one. */
-static DEFINE_SPINLOCK(backtrace_lock);
-/* To ensure no backtrace occurs before all of the stack dump are done. */
-static atomic_t backtrace_cpus;
-/* The cpu mask to avoid reentrance. */
-static struct cpumask backtrace_mask;
-
-void do_nmi_dump_stack(struct pt_regs *regs)
-{
- int is_idle = is_idle_task(current) && !in_interrupt();
- int cpu;
-
- nmi_enter();
- cpu = smp_processor_id();
- if (WARN_ON_ONCE(!cpumask_test_and_clear_cpu(cpu, &backtrace_mask)))
- goto done;
-
- spin_lock(&backtrace_lock);
- if (is_idle)
- pr_info("CPU: %d idle\n", cpu);
- else
- show_regs(regs);
- spin_unlock(&backtrace_lock);
- atomic_dec(&backtrace_cpus);
-done:
- nmi_exit();
-}
-
#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self)
+void nmi_raise_cpu_backtrace(struct cpumask *in_mask)
{
struct cpumask mask;
HV_Coord tile;
unsigned int timeout;
int cpu;
- int ongoing;
HV_NMI_Info info[NR_CPUS];

- ongoing = atomic_cmpxchg(&backtrace_cpus, 0, num_online_cpus() - 1);
- if (ongoing != 0) {
- pr_err("Trying to do all-cpu backtrace.\n");
- pr_err("But another all-cpu backtrace is ongoing (%d cpus left)\n",
- ongoing);
- if (self) {
- pr_err("Reporting the stack on this cpu only.\n");
- dump_stack();
- }
- return;
- }
-
- cpumask_copy(&mask, cpu_online_mask);
- cpumask_clear_cpu(smp_processor_id(), &mask);
- cpumask_copy(&backtrace_mask, &mask);
-
- /* Backtrace for myself first. */
- if (self)
- dump_stack();
-
/* Tentatively dump stack on remote tiles via NMI. */
timeout = 100;
+ cpumask_copy(&mask, in_mask);
while (!cpumask_empty(&mask) && timeout) {
for_each_cpu(cpu, &mask) {
tile.x = cpu_x(cpu);
@@ -663,12 +615,17 @@ void arch_trigger_all_cpu_backtrace(bool self)
}

mdelay(10);
+ touch_softlockup_watchdog();
timeout--;
}

- /* Warn about cpus stuck in ICS and decrement their counts here. */
+ /* Warn about cpus stuck in ICS. */
if (!cpumask_empty(&mask)) {
for_each_cpu(cpu, &mask) {
+
+ /* Clear the bit as if nmi_cpu_backtrace() ran. */
+ cpumask_clear_cpu(cpu, in_mask);
+
switch (info[cpu].result) {
case HV_NMI_RESULT_FAIL_ICS:
pr_warn("Skipping stack dump of cpu %d in ICS at pc %#llx\n",
@@ -679,16 +636,19 @@ void arch_trigger_all_cpu_backtrace(bool self)
cpu);
break;
case HV_ENOSYS:
- pr_warn("Hypervisor too old to allow remote stack dumps.\n");
- goto skip_for_each;
+ WARN_ONCE(1, "Hypervisor too old to allow remote stack dumps.\n");
+ break;
default: /* should not happen */
pr_warn("Skipping stack dump of cpu %d [%d,%#llx]\n",
cpu, info[cpu].result, info[cpu].pc);
break;
}
}
-skip_for_each:
- atomic_sub(cpumask_weight(&mask), &backtrace_cpus);
}
}
+
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
+{
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
+}
#endif /* __tilegx__ */
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 4d9651c5b1ad..934a7d88eb29 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -20,6 +20,8 @@
#include <linux/reboot.h>
#include <linux/uaccess.h>
#include <linux/ptrace.h>
+#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <asm/stack.h>
#include <asm/traps.h>
#include <asm/setup.h>
@@ -392,14 +394,15 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,

void do_nmi(struct pt_regs *regs, int fault_num, unsigned long reason)
{
+ nmi_enter();
switch (reason) {
case TILE_NMI_DUMP_STACK:
- do_nmi_dump_stack(regs);
+ nmi_cpu_backtrace(regs);
break;
default:
panic("Unexpected do_nmi type %ld", reason);
- return;
}
+ nmi_exit();
}

/* Deprecated function currently only used here. */
--
2.7.2

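A sketch of the resulting call chain on tile, with all names taken
from the diff above, spelling out the new synchronous behavior:

/*
 * arch_trigger_cpumask_backtrace(mask)
 *   -> nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace)
 *        - dump_stack() locally if this cpu is in the mask
 *        - NMI the remaining cpus and wait for their output
 * do_nmi() on each remote cpu
 *   -> nmi_enter(); nmi_cpu_backtrace(regs); nmi_exit();
 *
 * On return, the requested backtraces have been printed, which is
 * the synchronous semantics described in the commit message.
 */
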
2016-03-30 17:17:33

by Chris Metcalf

[permalink] [raw]
Subject: [PATCH v4 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

Currently you can only request a backtrace of either all cpus, or
all cpus but yourself. It can also be helpful to request a remote
backtrace of a single cpu, and since we want that, the logical
extension is to support a cpumask as the underlying primitive.

This change modifies the existing lib/nmi_backtrace.c code to take
a cpumask as its basic primitive, and modifies the linux/nmi.h code
to use either the old "all/all_but_self" arch methods, or the new
"cpumask" method, depending on which is available.

The existing clients of nmi_backtrace (arm and x86) are converted
to use the new cpumask approach in this change.

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/include/asm/irq.h | 4 +--
arch/arm/kernel/smp.c | 4 +--
arch/x86/include/asm/irq.h | 4 +--
arch/x86/kernel/apic/hw_nmi.c | 6 ++---
include/linux/nmi.h | 63 ++++++++++++++++++++++++++++++++++---------
lib/nmi_backtrace.c | 15 +++++------
6 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1bd9510de1b9..13f9a9a17eca 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -36,8 +36,8 @@ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
#endif

#ifdef CONFIG_SMP
-extern void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask);
+#define arch_trigger_cpumask_backtrace(x) arch_trigger_cpumask_backtrace(x)
#endif

static inline int nr_legacy_irqs(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index baee70267f29..72ad8485993a 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -758,7 +758,7 @@ static void raise_nmi(cpumask_t *mask)
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+ nmi_trigger_cpumask_backtrace(mask, raise_nmi);
}
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index e7de5c9a4fbd..18bdc8cc5c63 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -50,8 +50,8 @@ extern int vector_used_by_percpu_irq(unsigned int vector);
extern void init_ISA_irqs(void);

#ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_X86_IRQ_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..63f0b69ad6a6 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -27,15 +27,15 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
}
#endif

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
static void nmi_raise_cpu_backtrace(cpumask_t *mask)
{
apic->send_IPI_mask(mask, NMI_VECTOR);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
}

static int
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eeae18e0..434208af10fc 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -31,38 +31,75 @@ static inline void hardlockup_detector_disable(void) {}
#endif

/*
- * Create trigger_all_cpu_backtrace() out of the arch-provided
- * base function. Return whether such support was available,
+ * Create trigger_all_cpu_backtrace() etc out of the arch-provided
+ * base function(s). Return whether such support was available,
* to allow calling code to fall back to some other mechanism:
*/
-#ifdef arch_trigger_all_cpu_backtrace
static inline bool trigger_all_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(true);
-
return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(cpu_online_mask);
+ return true;
+#else
+ return false;
+#endif
}
+
static inline bool trigger_allbutself_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(false);
return true;
-}
-
-/* generic implementation */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
- void (*raise)(cpumask_t *mask));
-bool nmi_cpu_backtrace(struct pt_regs *regs);
+#elif defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+ int cpu;
+
+ /* Allocate before disabling preemption: GFP_KERNEL may sleep. */
+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+
+ cpu = get_cpu();
+ cpumask_copy(mask, cpu_online_mask);
+ cpumask_clear_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ put_cpu();
+ free_cpumask_var(mask);
+ return true;
#else
-static inline bool trigger_all_cpu_backtrace(void)
-{
return false;
+#endif
}
-static inline bool trigger_allbutself_cpu_backtrace(void)
+
+static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
{
+#if defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(mask);
+ return true;
+#else
return false;
+#endif
}
+
+static inline bool trigger_single_cpu_backtrace(int cpu)
+{
+#if defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_set_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ free_cpumask_var(mask);
+ return true;
+#else
+ return false;
#endif
+}
+
+/* generic implementation */
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+ void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);

#ifdef CONFIG_LOCKUP_DETECTOR
u64 hw_nmi_get_sample_period(int watchdog_thresh);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..db63ac75eba0 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -18,7 +18,7 @@
#include <linux/nmi.h>
#include <linux/seq_buf.h>

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
static cpumask_t printtrace_mask;
@@ -44,12 +44,12 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
}

/*
- * When raise() is called it will be is passed a pointer to the
+ * When raise() is called it will be passed a pointer to the
* backtrace_mask. Architectures that call nmi_cpu_backtrace()
* directly from their raise() functions may rely on the mask
* they are passed being updated as a side effect of this call.
*/
-void nmi_trigger_all_cpu_backtrace(bool include_self,
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
void (*raise)(cpumask_t *mask))
{
struct nmi_seq_buf *s;
@@ -64,10 +64,7 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
return;
}

- cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
- if (!include_self)
- cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
-
+ cpumask_copy(to_cpumask(backtrace_mask), mask);
cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));

/*
@@ -80,8 +77,8 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
}

if (!cpumask_empty(to_cpumask(backtrace_mask))) {
- pr_info("Sending NMI to %s CPUs:\n",
- (include_self ? "all" : "other"));
+ pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
+ this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
raise(to_cpumask(backtrace_mask));
}

--
2.7.2
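
As a usage sketch of the new single-cpu primitive: the caller shown
is hypothetical, while trigger_single_cpu_backtrace() is defined in
the patch above.

#include <linux/nmi.h>
#include <linux/printk.h>

/* Hypothetical caller: ask one apparently-stuck cpu for its stack. */
static void report_stuck_cpu(int cpu)
{
	if (!trigger_single_cpu_backtrace(cpu))
		pr_warn("cpu %d: no NMI backtrace support\n", cpu);
}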