2016-04-05 17:27:15

by Chris Metcalf

Subject: [PATCH v5 0/4] improvements to the nmi_backtrace code

This is just a one-line change to the v4 series, to catch the new arm
vmlinux-xip.lds.S file, which I missed when I rebased to 4.6 for v4
(my arm config for testing did not include CONFIG_XIP_KERNEL).
Thanks to Fengguang Wu and the 0-day test robot for that.

Whose tree would this go through? I have an ack from Peter Z for
patch 4/4 and no other feedback for patches 1/4 or 2/4; I can
certainly push 3/4 through the tile tree myself if that helps, though
my guess is keeping it with the rest of the series makes more sense
for tile since it doesn't lose any functionality that way.

From the version 1 cover letter:

This patch series modifies the trigger_xxx_backtrace() NMI-based
remote backtracing code to make it more flexible, and makes a few
small improvements along the way.

The motivation comes from the task isolation code, where we want to
be able to diagnose cases in which some cpu is about to interrupt a
task-isolated cpu. It can be helpful to
see both where the interrupting cpu is, and also an approximation
of where the cpu that is being interrupted is. The nmi_backtrace
framework allows us to discover the stack of the interrupted cpu.

I've tested that the change works as desired on tile, and build-tested
x86, arm64, and arm. For x86 and arm64 I confirmed that the generic
cpuidle stuff as well as the architecture-specific routines are in the
new cpuidle section. For arm I just build-tested it and made sure the
generic cpuidle routines were in the new cpuidle section, but I didn't
attempt to tease apart the tangle of platform-specific idle routines
that arm has and tag them with __cpuidle. That might be more usefully
done by someone with arm platform experience in a follow-up patch.

I have also pushed it up to kernel.org to pull if that's easier:

git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git nmi-backtrace

The change conflicts with Petr Mladek's NMI printk cleanup patches:

https://lkml.kernel.org/r/[email protected]

He has kindly offered to resolve the conflicts.

v5: Add CPUIDLE_TEXT to the new arch/arm/kernel/vmlinux-xip.lds.S

v4: Added some more __cpuidle functions (PeterZ, Rafael Wysocki)
Rebased to kernel v4.6-rc1

v3: Various improvements to the set of __cpuidle functions;
Add back in a missing section accidentally removed in modpost.c (PeterZ)
https://lkml.kernel.org/r/[email protected]

v2: Switch to using __cpuidle tagging, switch S-O-B to Mellanox
https://lkml.kernel.org/r/[email protected]

Chris Metcalf (4):
nmi_backtrace: add more trigger_*_cpu_backtrace() methods
nmi_backtrace: do a local dump_stack() instead of a self-NMI
arch/tile: adopt the new nmi_backtrace framework
nmi_backtrace: generate one-line reports for idle cpus

arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/include/asm/irq.h | 4 +-
arch/arm/kernel/smp.c | 13 +------
arch/arm/kernel/vmlinux-xip.lds.S | 1 +
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 +
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/include/asm/irq.h | 4 +-
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++----------------------------
arch/tile/kernel/traps.c | 7 +++-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/include/asm/irq.h | 4 +-
arch/x86/kernel/acpi/cstate.c | 2 +-
arch/x86/kernel/apic/hw_nmi.c | 6 +--
arch/x86/kernel/process.c | 4 +-
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 3 ++
drivers/acpi/processor_idle.c | 5 ++-
drivers/cpuidle/driver.c | 5 ++-
drivers/idle/intel_idle.c | 4 +-
include/asm-generic/vmlinux.lds.h | 6 +++
include/linux/cpu.h | 5 +++
include/linux/nmi.h | 63 ++++++++++++++++++++++++-------
kernel/sched/idle.c | 13 ++++++-
lib/nmi_backtrace.c | 40 +++++++++++++-------
scripts/mod/modpost.c | 2 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
58 files changed, 184 insertions(+), 121 deletions(-)

--
2.7.2


2016-04-05 17:27:10

by Chris Metcalf

Subject: [PATCH v5 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

Currently you can only request a backtrace of either all cpus, or
all cpus but yourself. It can also be helpful to request a remote
backtrace of a single cpu, and since we want that, the logical
extension is to support a cpumask as the underlying primitive.

This change modifies the existing lib/nmi_backtrace.c code to take
a cpumask as its basic primitive, and modifies the linux/nmi.h code
to use either the old "all/all_but_self" arch methods, or the new
"cpumask" method, depending on which is available.

The existing clients of nmi_backtrace (arm and x86) are converted
to using the new cpumask approach in this change.

Signed-off-by: Chris Metcalf <[email protected]>
---
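As an illustration of the commit message above, here is a minimal
user-space sketch of how the "all", "all but self", and "single cpu"
requests can each be expressed through one mask-based primitive. It is
not kernel code: model_mask_t and the model_*() names are hypothetical
stand-ins for cpumask_t and the trigger_*_cpu_backtrace() helpers, and
clamping the request to the online set is an assumption of the model
rather than something the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means cpu N is selected. */
typedef uint64_t model_mask_t;

/*
 * Stand-in for the single arch primitive. A real implementation would
 * signal each cpu in the mask; this model only returns the set of
 * online cpus that would be signalled (the clamp is a model assumption).
 */
static model_mask_t model_trigger_cpumask(model_mask_t online,
					  model_mask_t mask)
{
	return mask & online;
}

/* "All cpus" is the online mask passed straight through. */
static model_mask_t model_trigger_all(model_mask_t online)
{
	return model_trigger_cpumask(online, online);
}

/* "All but self" is the online mask with the caller's bit cleared. */
static model_mask_t model_trigger_allbutself(model_mask_t online, int self)
{
	assert(self >= 0 && self < 64);
	return model_trigger_cpumask(online,
				     online & ~((model_mask_t)1 << self));
}

/* "Single cpu" is a mask with exactly one bit set. */
static model_mask_t model_trigger_single(model_mask_t online, int cpu)
{
	assert(cpu >= 0 && cpu < 64);
	return model_trigger_cpumask(online, (model_mask_t)1 << cpu);
}
```

With cpus 0-3 online (mask 0x0f), a request for all but cpu 2 reduces
to the mask 0x0b, and a request for cpu 1 alone reduces to 0x02; only
the mask construction differs between the three entry points.
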
arch/arm/include/asm/irq.h | 4 +--
arch/arm/kernel/smp.c | 4 +--
arch/x86/include/asm/irq.h | 4 +--
arch/x86/kernel/apic/hw_nmi.c | 6 ++---
include/linux/nmi.h | 63 ++++++++++++++++++++++++++++++++++---------
lib/nmi_backtrace.c | 15 +++++------
6 files changed, 65 insertions(+), 31 deletions(-)

diff --git a/arch/arm/include/asm/irq.h b/arch/arm/include/asm/irq.h
index 1bd9510de1b9..13f9a9a17eca 100644
--- a/arch/arm/include/asm/irq.h
+++ b/arch/arm/include/asm/irq.h
@@ -36,8 +36,8 @@ extern void set_handle_irq(void (*handle_irq)(struct pt_regs *));
#endif

#ifdef CONFIG_SMP
-extern void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace(x) arch_trigger_all_cpu_backtrace(x)
+extern void arch_trigger_cpumask_backtrace(const cpumask_t *mask);
+#define arch_trigger_cpumask_backtrace(x) arch_trigger_cpumask_backtrace(x)
#endif

static inline int nr_legacy_irqs(void)
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index baee70267f29..72ad8485993a 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -758,7 +758,7 @@ static void raise_nmi(cpumask_t *mask)
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, raise_nmi);
+ nmi_trigger_cpumask_backtrace(mask, raise_nmi);
}
diff --git a/arch/x86/include/asm/irq.h b/arch/x86/include/asm/irq.h
index e7de5c9a4fbd..18bdc8cc5c63 100644
--- a/arch/x86/include/asm/irq.h
+++ b/arch/x86/include/asm/irq.h
@@ -50,8 +50,8 @@ extern int vector_used_by_percpu_irq(unsigned int vector);
extern void init_ISA_irqs(void);

#ifdef CONFIG_X86_LOCAL_APIC
-void arch_trigger_all_cpu_backtrace(bool);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_X86_IRQ_H */
diff --git a/arch/x86/kernel/apic/hw_nmi.c b/arch/x86/kernel/apic/hw_nmi.c
index 045e424fb368..63f0b69ad6a6 100644
--- a/arch/x86/kernel/apic/hw_nmi.c
+++ b/arch/x86/kernel/apic/hw_nmi.c
@@ -27,15 +27,15 @@ u64 hw_nmi_get_sample_period(int watchdog_thresh)
}
#endif

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
static void nmi_raise_cpu_backtrace(cpumask_t *mask)
{
apic->send_IPI_mask(mask, NMI_VECTOR);
}

-void arch_trigger_all_cpu_backtrace(bool include_self)
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
{
- nmi_trigger_all_cpu_backtrace(include_self, nmi_raise_cpu_backtrace);
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
}

static int
diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 4630eeae18e0..434208af10fc 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -31,38 +31,75 @@ static inline void hardlockup_detector_disable(void) {}
#endif

/*
- * Create trigger_all_cpu_backtrace() out of the arch-provided
- * base function. Return whether such support was available,
+ * Create trigger_all_cpu_backtrace() etc out of the arch-provided
+ * base function(s). Return whether such support was available,
* to allow calling code to fall back to some other mechanism:
*/
-#ifdef arch_trigger_all_cpu_backtrace
static inline bool trigger_all_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(true);
-
return true;
+#elif defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(cpu_online_mask);
+ return true;
+#else
+ return false;
+#endif
}
+
static inline bool trigger_allbutself_cpu_backtrace(void)
{
+#if defined(arch_trigger_all_cpu_backtrace)
arch_trigger_all_cpu_backtrace(false);
return true;
-}
-
-/* generic implementation */
-void nmi_trigger_all_cpu_backtrace(bool include_self,
- void (*raise)(cpumask_t *mask));
-bool nmi_cpu_backtrace(struct pt_regs *regs);
+#elif defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+ int cpu = get_cpu();

+ if (!alloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_copy(mask, cpu_online_mask);
+ cpumask_clear_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ put_cpu();
+ free_cpumask_var(mask);
+ return true;
#else
-static inline bool trigger_all_cpu_backtrace(void)
-{
return false;
+#endif
}
-static inline bool trigger_allbutself_cpu_backtrace(void)
+
+static inline bool trigger_cpumask_backtrace(struct cpumask *mask)
{
+#if defined(arch_trigger_cpumask_backtrace)
+ arch_trigger_cpumask_backtrace(mask);
+ return true;
+#else
return false;
+#endif
}
+
+static inline bool trigger_single_cpu_backtrace(int cpu)
+{
+#if defined(arch_trigger_cpumask_backtrace)
+ cpumask_var_t mask;
+
+ if (!zalloc_cpumask_var(&mask, GFP_KERNEL))
+ return false;
+ cpumask_set_cpu(cpu, mask);
+ arch_trigger_cpumask_backtrace(mask);
+ free_cpumask_var(mask);
+ return true;
+#else
+ return false;
#endif
+}
+
+/* generic implementation */
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
+ void (*raise)(cpumask_t *mask));
+bool nmi_cpu_backtrace(struct pt_regs *regs);

#ifdef CONFIG_LOCKUP_DETECTOR
u64 hw_nmi_get_sample_period(int watchdog_thresh);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 6019c53c669e..db63ac75eba0 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -18,7 +18,7 @@
#include <linux/nmi.h>
#include <linux/seq_buf.h>

-#ifdef arch_trigger_all_cpu_backtrace
+#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
static DECLARE_BITMAP(backtrace_mask, NR_CPUS) __read_mostly;
static cpumask_t printtrace_mask;
@@ -44,12 +44,12 @@ static void print_seq_line(struct nmi_seq_buf *s, int start, int end)
}

/*
- * When raise() is called it will be is passed a pointer to the
+ * When raise() is called it will be passed a pointer to the
* backtrace_mask. Architectures that call nmi_cpu_backtrace()
* directly from their raise() functions may rely on the mask
* they are passed being updated as a side effect of this call.
*/
-void nmi_trigger_all_cpu_backtrace(bool include_self,
+void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
void (*raise)(cpumask_t *mask))
{
struct nmi_seq_buf *s;
@@ -64,10 +64,7 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
return;
}

- cpumask_copy(to_cpumask(backtrace_mask), cpu_online_mask);
- if (!include_self)
- cpumask_clear_cpu(this_cpu, to_cpumask(backtrace_mask));
-
+ cpumask_copy(to_cpumask(backtrace_mask), mask);
cpumask_copy(&printtrace_mask, to_cpumask(backtrace_mask));

/*
@@ -80,8 +77,8 @@ void nmi_trigger_all_cpu_backtrace(bool include_self,
}

if (!cpumask_empty(to_cpumask(backtrace_mask))) {
- pr_info("Sending NMI to %s CPUs:\n",
- (include_self ? "all" : "other"));
+ pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
+ this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
raise(to_cpumask(backtrace_mask));
}

--
2.7.2

2016-04-05 17:27:18

by Chris Metcalf

Subject: [PATCH v5 3/4] arch/tile: adopt the new nmi_backtrace framework

Previously tile was rolling its own method of capturing backtrace
data in the NMI handlers, but it was relying on running printk()
from the NMI handler, which is not always safe. So adopt the
nmi_backtrace model (with the new cpumask extension) instead.

So that we can call the nmi_backtrace code directly from the NMI
handler, move the nmi_enter()/nmi_exit() calls into the top-level
tile NMI handler.

The semantics of the routine change slightly since it is now
synchronous with the remote cores completing the backtraces.
Previously it was asynchronous, but with protection to avoid starting
a new remote backtrace if the old one was still in progress.

Signed-off-by: Chris Metcalf <[email protected]>
---
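The bounded retry loop that the tile raise function uses can be
modelled in user space roughly as follows. This is only a sketch under
stated assumptions: the 64-bit mask, the callback type, and all
model_*() names are hypothetical, the delay between rounds is omitted,
and "the cpu accepted the NMI" stands in for whatever the hypervisor
call reports. The one behaviour taken from the patch is that cpus which
never accept are cleared from the caller's mask, as if their backtrace
had already run.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical delivery callback: returns true if the cpu accepted the NMI. */
typedef bool (*model_deliver_fn)(int cpu);

/*
 * Try to deliver to every cpu in *in_mask, retrying up to max_rounds
 * times. Cpus that never accept are cleared from *in_mask so that a
 * caller waiting on the mask does not wait for them. Returns the mask
 * of cpus that were given up on.
 */
static uint64_t model_raise_with_retries(uint64_t *in_mask,
					 unsigned int max_rounds,
					 model_deliver_fn deliver)
{
	uint64_t pending = *in_mask;

	while (pending != 0 && max_rounds > 0) {
		for (int cpu = 0; cpu < 64; cpu++) {
			uint64_t bit = (uint64_t)1 << cpu;

			if ((pending & bit) && deliver(cpu))
				pending &= ~bit;
		}
		max_rounds--;
	}

	/* Give up on the stragglers: clear them from the caller's mask. */
	*in_mask &= ~pending;
	return pending;
}

/* Example callback for experimentation: only even-numbered cpus accept. */
static bool model_even_cpus_accept(int cpu)
{
	return (cpu % 2) == 0;
}
```

Starting from a mask of cpus 0-3 with the example callback, cpus 1 and
3 are reported as stuck and removed from the caller's mask, leaving
only the cpus that will actually produce a backtrace.
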
arch/tile/include/asm/irq.h | 4 +--
arch/tile/kernel/pmc.c | 3 --
arch/tile/kernel/process.c | 72 ++++++++++-----------------------------------
arch/tile/kernel/traps.c | 7 +++--
4 files changed, 23 insertions(+), 63 deletions(-)

diff --git a/arch/tile/include/asm/irq.h b/arch/tile/include/asm/irq.h
index 84a924034bdb..909230a02ea8 100644
--- a/arch/tile/include/asm/irq.h
+++ b/arch/tile/include/asm/irq.h
@@ -79,8 +79,8 @@ void tile_irq_activate(unsigned int irq, int tile_irq_type);
void setup_irq_regs(void);

#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self);
-#define arch_trigger_all_cpu_backtrace arch_trigger_all_cpu_backtrace
+void arch_trigger_cpumask_backtrace(const struct cpumask *mask);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
#endif

#endif /* _ASM_TILE_IRQ_H */
diff --git a/arch/tile/kernel/pmc.c b/arch/tile/kernel/pmc.c
index db62cc34b955..81cf8743a3f3 100644
--- a/arch/tile/kernel/pmc.c
+++ b/arch/tile/kernel/pmc.c
@@ -16,7 +16,6 @@
#include <linux/spinlock.h>
#include <linux/module.h>
#include <linux/atomic.h>
-#include <linux/interrupt.h>

#include <asm/processor.h>
#include <asm/pmc.h>
@@ -29,9 +28,7 @@ int handle_perf_interrupt(struct pt_regs *regs, int fault)
if (!perf_irq)
panic("Unexpected PERF_COUNT interrupt %d\n", fault);

- nmi_enter();
retval = perf_irq(regs, fault);
- nmi_exit();
return retval;
}

diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
index b5f30d376ce1..6594df5fed53 100644
--- a/arch/tile/kernel/process.c
+++ b/arch/tile/kernel/process.c
@@ -22,7 +22,7 @@
#include <linux/init.h>
#include <linux/mm.h>
#include <linux/compat.h>
-#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <linux/syscalls.h>
#include <linux/kernel.h>
#include <linux/tracehook.h>
@@ -593,66 +593,18 @@ void show_regs(struct pt_regs *regs)
tile_show_stack(&kbt);
}

-/* To ensure stack dump on tiles occurs one by one. */
-static DEFINE_SPINLOCK(backtrace_lock);
-/* To ensure no backtrace occurs before all of the stack dump are done. */
-static atomic_t backtrace_cpus;
-/* The cpu mask to avoid reentrance. */
-static struct cpumask backtrace_mask;
-
-void do_nmi_dump_stack(struct pt_regs *regs)
-{
- int is_idle = is_idle_task(current) && !in_interrupt();
- int cpu;
-
- nmi_enter();
- cpu = smp_processor_id();
- if (WARN_ON_ONCE(!cpumask_test_and_clear_cpu(cpu, &backtrace_mask)))
- goto done;
-
- spin_lock(&backtrace_lock);
- if (is_idle)
- pr_info("CPU: %d idle\n", cpu);
- else
- show_regs(regs);
- spin_unlock(&backtrace_lock);
- atomic_dec(&backtrace_cpus);
-done:
- nmi_exit();
-}
-
#ifdef __tilegx__
-void arch_trigger_all_cpu_backtrace(bool self)
+void nmi_raise_cpu_backtrace(struct cpumask *in_mask)
{
struct cpumask mask;
HV_Coord tile;
unsigned int timeout;
int cpu;
- int ongoing;
HV_NMI_Info info[NR_CPUS];

- ongoing = atomic_cmpxchg(&backtrace_cpus, 0, num_online_cpus() - 1);
- if (ongoing != 0) {
- pr_err("Trying to do all-cpu backtrace.\n");
- pr_err("But another all-cpu backtrace is ongoing (%d cpus left)\n",
- ongoing);
- if (self) {
- pr_err("Reporting the stack on this cpu only.\n");
- dump_stack();
- }
- return;
- }
-
- cpumask_copy(&mask, cpu_online_mask);
- cpumask_clear_cpu(smp_processor_id(), &mask);
- cpumask_copy(&backtrace_mask, &mask);
-
- /* Backtrace for myself first. */
- if (self)
- dump_stack();
-
/* Tentatively dump stack on remote tiles via NMI. */
timeout = 100;
+ cpumask_copy(&mask, in_mask);
while (!cpumask_empty(&mask) && timeout) {
for_each_cpu(cpu, &mask) {
tile.x = cpu_x(cpu);
@@ -663,12 +615,17 @@ void arch_trigger_all_cpu_backtrace(bool self)
}

mdelay(10);
+ touch_softlockup_watchdog();
timeout--;
}

- /* Warn about cpus stuck in ICS and decrement their counts here. */
+ /* Warn about cpus stuck in ICS. */
if (!cpumask_empty(&mask)) {
for_each_cpu(cpu, &mask) {
+
+ /* Clear the bit as if nmi_cpu_backtrace() ran. */
+ cpumask_clear_cpu(cpu, in_mask);
+
switch (info[cpu].result) {
case HV_NMI_RESULT_FAIL_ICS:
pr_warn("Skipping stack dump of cpu %d in ICS at pc %#llx\n",
@@ -679,16 +636,19 @@ void arch_trigger_all_cpu_backtrace(bool self)
cpu);
break;
case HV_ENOSYS:
- pr_warn("Hypervisor too old to allow remote stack dumps.\n");
- goto skip_for_each;
+ WARN_ONCE(1, "Hypervisor too old to allow remote stack dumps.\n");
+ break;
default: /* should not happen */
pr_warn("Skipping stack dump of cpu %d [%d,%#llx]\n",
cpu, info[cpu].result, info[cpu].pc);
break;
}
}
-skip_for_each:
- atomic_sub(cpumask_weight(&mask), &backtrace_cpus);
}
}
+
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask)
+{
+ nmi_trigger_cpumask_backtrace(mask, nmi_raise_cpu_backtrace);
+}
#endif /* __tilegx_ */
diff --git a/arch/tile/kernel/traps.c b/arch/tile/kernel/traps.c
index 4d9651c5b1ad..934a7d88eb29 100644
--- a/arch/tile/kernel/traps.c
+++ b/arch/tile/kernel/traps.c
@@ -20,6 +20,8 @@
#include <linux/reboot.h>
#include <linux/uaccess.h>
#include <linux/ptrace.h>
+#include <linux/hardirq.h>
+#include <linux/nmi.h>
#include <asm/stack.h>
#include <asm/traps.h>
#include <asm/setup.h>
@@ -392,14 +394,15 @@ void __kprobes do_trap(struct pt_regs *regs, int fault_num,

void do_nmi(struct pt_regs *regs, int fault_num, unsigned long reason)
{
+ nmi_enter();
switch (reason) {
case TILE_NMI_DUMP_STACK:
- do_nmi_dump_stack(regs);
+ nmi_cpu_backtrace(regs);
break;
default:
panic("Unexpected do_nmi type %ld", reason);
- return;
}
+ nmi_exit();
}

/* Deprecated function currently only used here. */
--
2.7.2

2016-04-05 17:27:51

by Chris Metcalf

Subject: [PATCH v5 4/4] nmi_backtrace: generate one-line reports for idle cpus

When doing an nmi backtrace of many cores, most of which are idle,
the output is a little overwhelming and very uninformative. Suppress
messages for cpus that are idling when they are interrupted and just
emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

We do this by grouping all the cpuidle code together into a new
.cpuidle.text section, and then checking the address of the
interrupted PC to see if it lies within that section.

This commit suitably tags x86, arm64, and tile idle routines,
and only adds in the minimal framework for other architectures.

Acked-by: Peter Zijlstra (Intel) <[email protected]>
Tested-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Chris Metcalf <[email protected]>
---
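The section check described in the commit message above amounts to a
half-open address-range test on the interrupted PC. The following
user-space sketch shows the idea; struct idle_text_range and both
function names are hypothetical, and the range bounds stand in for
whatever start and end symbols the linker script provides for
.cpuidle.text. The report string follows the wording quoted in the
commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical half-open range [start, end) covering the idle text. */
struct idle_text_range {
	uintptr_t start;
	uintptr_t end;
};

/* True if the interrupted PC lies inside the idle text section. */
static bool pc_in_idle_text(const struct idle_text_range *r, uintptr_t pc)
{
	assert(r->start <= r->end);
	return pc >= r->start && pc < r->end;
}

/*
 * Write the one-line report into buf when the cpu was idling.
 * Returns true if the short report was produced, false if the caller
 * should fall back to a full backtrace.
 */
static bool report_if_idle(char *buf, size_t len,
			   const struct idle_text_range *r,
			   int cpu, uintptr_t pc)
{
	if (!pc_in_idle_text(r, pc))
		return false;
	snprintf(buf, len, "NMI backtrace for %d skipped: idling at pc %#lx",
		 cpu, (unsigned long)pc);
	return true;
}
```

Because the test is a plain address comparison, it needs no per-function
bookkeeping at run time; the cost is that every idle routine has to be
placed in the section for the check to recognise it.
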
arch/alpha/kernel/vmlinux.lds.S | 1 +
arch/arc/kernel/vmlinux.lds.S | 1 +
arch/arm/kernel/vmlinux-xip.lds.S | 1 +
arch/arm/kernel/vmlinux.lds.S | 1 +
arch/arm64/kernel/vmlinux.lds.S | 1 +
arch/arm64/mm/proc.S | 2 ++
arch/avr32/kernel/vmlinux.lds.S | 1 +
arch/blackfin/kernel/vmlinux.lds.S | 1 +
arch/c6x/kernel/vmlinux.lds.S | 1 +
arch/cris/kernel/vmlinux.lds.S | 1 +
arch/frv/kernel/vmlinux.lds.S | 1 +
arch/h8300/kernel/vmlinux.lds.S | 1 +
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/kernel/vmlinux.lds.S | 1 +
arch/m32r/kernel/vmlinux.lds.S | 1 +
arch/m68k/kernel/vmlinux-nommu.lds | 1 +
arch/m68k/kernel/vmlinux-std.lds | 1 +
arch/m68k/kernel/vmlinux-sun3.lds | 1 +
arch/metag/kernel/vmlinux.lds.S | 1 +
arch/microblaze/kernel/vmlinux.lds.S | 1 +
arch/mips/kernel/vmlinux.lds.S | 1 +
arch/mn10300/kernel/vmlinux.lds.S | 1 +
arch/nios2/kernel/vmlinux.lds.S | 1 +
arch/openrisc/kernel/vmlinux.lds.S | 1 +
arch/parisc/kernel/vmlinux.lds.S | 1 +
arch/powerpc/kernel/vmlinux.lds.S | 1 +
arch/s390/kernel/vmlinux.lds.S | 1 +
arch/score/kernel/vmlinux.lds.S | 1 +
arch/sh/kernel/vmlinux.lds.S | 1 +
arch/sparc/kernel/vmlinux.lds.S | 1 +
arch/tile/kernel/entry.S | 2 +-
arch/tile/kernel/vmlinux.lds.S | 1 +
arch/um/kernel/dyn.lds.S | 1 +
arch/um/kernel/uml.lds.S | 1 +
arch/unicore32/kernel/vmlinux.lds.S | 1 +
arch/x86/kernel/acpi/cstate.c | 2 +-
arch/x86/kernel/process.c | 4 ++--
arch/x86/kernel/vmlinux.lds.S | 1 +
arch/xtensa/kernel/vmlinux.lds.S | 3 +++
drivers/acpi/processor_idle.c | 5 +++--
drivers/cpuidle/driver.c | 5 +++--
drivers/idle/intel_idle.c | 4 ++--
include/asm-generic/vmlinux.lds.h | 6 ++++++
include/linux/cpu.h | 5 +++++
kernel/sched/idle.c | 13 +++++++++++--
lib/nmi_backtrace.c | 16 +++++++++++-----
scripts/mod/modpost.c | 2 +-
scripts/recordmcount.c | 1 +
scripts/recordmcount.pl | 1 +
49 files changed, 87 insertions(+), 18 deletions(-)

diff --git a/arch/alpha/kernel/vmlinux.lds.S b/arch/alpha/kernel/vmlinux.lds.S
index 647b84c15382..cebecfb76fbf 100644
--- a/arch/alpha/kernel/vmlinux.lds.S
+++ b/arch/alpha/kernel/vmlinux.lds.S
@@ -22,6 +22,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/arc/kernel/vmlinux.lds.S b/arch/arc/kernel/vmlinux.lds.S
index 894e696bddaa..65652160cfda 100644
--- a/arch/arc/kernel/vmlinux.lds.S
+++ b/arch/arc/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/arm/kernel/vmlinux-xip.lds.S b/arch/arm/kernel/vmlinux-xip.lds.S
index cba1ec899a69..7fa487ef7e2f 100644
--- a/arch/arm/kernel/vmlinux-xip.lds.S
+++ b/arch/arm/kernel/vmlinux-xip.lds.S
@@ -98,6 +98,7 @@ SECTIONS
IRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.warning)
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index e2c6da096cef..b5376e87e61c 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -111,6 +111,7 @@ SECTIONS
SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
HYPERVISOR_TEXT
KPROBES_TEXT
diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S
index 5a1939a74ff3..fbedb7f489c7 100644
--- a/arch/arm64/kernel/vmlinux.lds.S
+++ b/arch/arm64/kernel/vmlinux.lds.S
@@ -106,6 +106,7 @@ SECTIONS
SOFTIRQENTRY_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
HYPERVISOR_TEXT
IDMAP_TEXT
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 543f5198005a..580fec01f009 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -50,11 +50,13 @@
*
* Idle the processor (wait for interrupt).
*/
+ .pushsection ".cpuidle.text","ax"
ENTRY(cpu_do_idle)
dsb sy // WFI may enter a low-power mode
wfi
ret
ENDPROC(cpu_do_idle)
+ .popsection

#ifdef CONFIG_CPU_PM
/**
diff --git a/arch/avr32/kernel/vmlinux.lds.S b/arch/avr32/kernel/vmlinux.lds.S
index a4589176bed5..17f2730eb497 100644
--- a/arch/avr32/kernel/vmlinux.lds.S
+++ b/arch/avr32/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
KPROBES_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/blackfin/kernel/vmlinux.lds.S b/arch/blackfin/kernel/vmlinux.lds.S
index d920b959ff3a..68069a120055 100644
--- a/arch/blackfin/kernel/vmlinux.lds.S
+++ b/arch/blackfin/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS
#ifndef CONFIG_SCHEDULE_L1
SCHED_TEXT
#endif
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
diff --git a/arch/c6x/kernel/vmlinux.lds.S b/arch/c6x/kernel/vmlinux.lds.S
index 50bc10f97bcb..a1a5c166bc9b 100644
--- a/arch/c6x/kernel/vmlinux.lds.S
+++ b/arch/c6x/kernel/vmlinux.lds.S
@@ -70,6 +70,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
diff --git a/arch/cris/kernel/vmlinux.lds.S b/arch/cris/kernel/vmlinux.lds.S
index 7552c2557506..979586261520 100644
--- a/arch/cris/kernel/vmlinux.lds.S
+++ b/arch/cris/kernel/vmlinux.lds.S
@@ -43,6 +43,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.text.__*)
diff --git a/arch/frv/kernel/vmlinux.lds.S b/arch/frv/kernel/vmlinux.lds.S
index 7e958d829ec9..aa6e573d57da 100644
--- a/arch/frv/kernel/vmlinux.lds.S
+++ b/arch/frv/kernel/vmlinux.lds.S
@@ -63,6 +63,7 @@ SECTIONS
*(.text..tlbmiss)
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#ifdef CONFIG_DEBUG_INFO
INIT_TEXT
diff --git a/arch/h8300/kernel/vmlinux.lds.S b/arch/h8300/kernel/vmlinux.lds.S
index cb5dfb02c88d..7f11da1b895e 100644
--- a/arch/h8300/kernel/vmlinux.lds.S
+++ b/arch/h8300/kernel/vmlinux.lds.S
@@ -29,6 +29,7 @@ SECTIONS
_stext = . ;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
#if defined(CONFIG_ROMKERNEL)
*(.int_redirect)
diff --git a/arch/hexagon/kernel/vmlinux.lds.S b/arch/hexagon/kernel/vmlinux.lds.S
index 5f268c1071b3..ec87e67feb19 100644
--- a/arch/hexagon/kernel/vmlinux.lds.S
+++ b/arch/hexagon/kernel/vmlinux.lds.S
@@ -50,6 +50,7 @@ SECTIONS
_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/ia64/kernel/vmlinux.lds.S b/arch/ia64/kernel/vmlinux.lds.S
index dc506b05ffbd..f89d20c97412 100644
--- a/arch/ia64/kernel/vmlinux.lds.S
+++ b/arch/ia64/kernel/vmlinux.lds.S
@@ -46,6 +46,7 @@ SECTIONS {
__end_ivt_text = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.gnu.linkonce.t*)
diff --git a/arch/m32r/kernel/vmlinux.lds.S b/arch/m32r/kernel/vmlinux.lds.S
index 018e4a711d79..ad1fe56455aa 100644
--- a/arch/m32r/kernel/vmlinux.lds.S
+++ b/arch/m32r/kernel/vmlinux.lds.S
@@ -31,6 +31,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-nommu.lds b/arch/m68k/kernel/vmlinux-nommu.lds
index 06a763f49fd3..d2c8abf1c8c4 100644
--- a/arch/m68k/kernel/vmlinux-nommu.lds
+++ b/arch/m68k/kernel/vmlinux-nommu.lds
@@ -45,6 +45,7 @@ SECTIONS {
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
. = ALIGN(16);
diff --git a/arch/m68k/kernel/vmlinux-std.lds b/arch/m68k/kernel/vmlinux-std.lds
index d0993594f558..5b5ce1e4d1ed 100644
--- a/arch/m68k/kernel/vmlinux-std.lds
+++ b/arch/m68k/kernel/vmlinux-std.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/m68k/kernel/vmlinux-sun3.lds b/arch/m68k/kernel/vmlinux-sun3.lds
index 8080469ee6c1..fe5ea1974b16 100644
--- a/arch/m68k/kernel/vmlinux-sun3.lds
+++ b/arch/m68k/kernel/vmlinux-sun3.lds
@@ -16,6 +16,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.gnu.warning)
diff --git a/arch/metag/kernel/vmlinux.lds.S b/arch/metag/kernel/vmlinux.lds.S
index 150ace92c7ad..e6c700eaf207 100644
--- a/arch/metag/kernel/vmlinux.lds.S
+++ b/arch/metag/kernel/vmlinux.lds.S
@@ -21,6 +21,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/microblaze/kernel/vmlinux.lds.S b/arch/microblaze/kernel/vmlinux.lds.S
index 0a47f0410554..289d0e7f3e3a 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -33,6 +33,7 @@ SECTIONS {
EXIT_TEXT
EXIT_CALL
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mips/kernel/vmlinux.lds.S b/arch/mips/kernel/vmlinux.lds.S
index 54d653ee17e1..f6ca8e5caaf6 100644
--- a/arch/mips/kernel/vmlinux.lds.S
+++ b/arch/mips/kernel/vmlinux.lds.S
@@ -55,6 +55,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/mn10300/kernel/vmlinux.lds.S b/arch/mn10300/kernel/vmlinux.lds.S
index 13c4814c29f8..2d5f1c3f1afb 100644
--- a/arch/mn10300/kernel/vmlinux.lds.S
+++ b/arch/mn10300/kernel/vmlinux.lds.S
@@ -30,6 +30,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.fixup)
diff --git a/arch/nios2/kernel/vmlinux.lds.S b/arch/nios2/kernel/vmlinux.lds.S
index e23e89539967..6a8045bb1a77 100644
--- a/arch/nios2/kernel/vmlinux.lds.S
+++ b/arch/nios2/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
IRQENTRY_TEXT
SOFTIRQENTRY_TEXT
diff --git a/arch/openrisc/kernel/vmlinux.lds.S b/arch/openrisc/kernel/vmlinux.lds.S
index d936de4c07ca..d68b9ede8423 100644
--- a/arch/openrisc/kernel/vmlinux.lds.S
+++ b/arch/openrisc/kernel/vmlinux.lds.S
@@ -47,6 +47,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/parisc/kernel/vmlinux.lds.S b/arch/parisc/kernel/vmlinux.lds.S
index f3ead0b6ce46..9ec8ec075dae 100644
--- a/arch/parisc/kernel/vmlinux.lds.S
+++ b/arch/parisc/kernel/vmlinux.lds.S
@@ -69,6 +69,7 @@ SECTIONS
.text ALIGN(PAGE_SIZE) : {
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S
index 2dd91f79de05..ac425ff39b4d 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -52,6 +52,7 @@ SECTIONS
/* careful! __ftr_alt_* sections need to be close to .text */
*(.text .fixup __ftr_alt_* .ref.text)
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 0f41a8286378..b1c8958e72ad 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -25,6 +25,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/score/kernel/vmlinux.lds.S b/arch/score/kernel/vmlinux.lds.S
index 7274b5c4287e..4117890b1db1 100644
--- a/arch/score/kernel/vmlinux.lds.S
+++ b/arch/score/kernel/vmlinux.lds.S
@@ -40,6 +40,7 @@ SECTIONS
_text = .; /* Text and read-only data */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
*(.text.*)
diff --git a/arch/sh/kernel/vmlinux.lds.S b/arch/sh/kernel/vmlinux.lds.S
index 235a4101999f..5b9a3cc90c58 100644
--- a/arch/sh/kernel/vmlinux.lds.S
+++ b/arch/sh/kernel/vmlinux.lds.S
@@ -36,6 +36,7 @@ SECTIONS
TEXT_TEXT
EXTRA_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/sparc/kernel/vmlinux.lds.S b/arch/sparc/kernel/vmlinux.lds.S
index aadd321aa05d..846a734e3882 100644
--- a/arch/sparc/kernel/vmlinux.lds.S
+++ b/arch/sparc/kernel/vmlinux.lds.S
@@ -45,6 +45,7 @@ SECTIONS
HEAD_TEXT
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/tile/kernel/entry.S b/arch/tile/kernel/entry.S
index 670a3569450f..101de132e363 100644
--- a/arch/tile/kernel/entry.S
+++ b/arch/tile/kernel/entry.S
@@ -50,7 +50,7 @@ STD_ENTRY(smp_nap)
* When interrupted at _cpu_idle_nap, we bump the PC forward 8, and
* as a result return to the function that called _cpu_idle().
*/
-STD_ENTRY(_cpu_idle)
+STD_ENTRY_SECTION(_cpu_idle, .cpuidle.text)
movei r1, 1
IRQ_ENABLE_LOAD(r2, r3)
mtspr INTERRUPT_CRITICAL_SECTION, r1
diff --git a/arch/tile/kernel/vmlinux.lds.S b/arch/tile/kernel/vmlinux.lds.S
index 378f5d8d1ec8..9e54bee9c048 100644
--- a/arch/tile/kernel/vmlinux.lds.S
+++ b/arch/tile/kernel/vmlinux.lds.S
@@ -42,6 +42,7 @@ SECTIONS
.text : AT (ADDR(.text) - LOAD_OFFSET) {
HEAD_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
IRQENTRY_TEXT
diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index adde088aeeff..4fdbcf958cd5 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -68,6 +68,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
*(.stub .text.* .gnu.linkonce.t.*)
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 6899195602b7..1840f55ed042 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -28,6 +28,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
*(.fixup)
/* .gnu.warning sections are handled specially by elf32.em. */
diff --git a/arch/unicore32/kernel/vmlinux.lds.S b/arch/unicore32/kernel/vmlinux.lds.S
index 77e407e49a63..56e788e8ee83 100644
--- a/arch/unicore32/kernel/vmlinux.lds.S
+++ b/arch/unicore32/kernel/vmlinux.lds.S
@@ -37,6 +37,7 @@ SECTIONS
.text : { /* Real text segment */
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT

*(.fixup)
diff --git a/arch/x86/kernel/acpi/cstate.c b/arch/x86/kernel/acpi/cstate.c
index 4b28159e0421..7efbb4d19024 100644
--- a/arch/x86/kernel/acpi/cstate.c
+++ b/arch/x86/kernel/acpi/cstate.c
@@ -152,7 +152,7 @@ int acpi_processor_ffh_cstate_probe(unsigned int cpu,
}
EXPORT_SYMBOL_GPL(acpi_processor_ffh_cstate_probe);

-void acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
+void __cpuidle acpi_processor_ffh_cstate_enter(struct acpi_processor_cx *cx)
{
unsigned int cpu = smp_processor_id();
struct cstate_entry *percpu_entry;
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 2915d54e9dd5..3e1db7fdd69d 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -301,7 +301,7 @@ void arch_cpu_idle(void)
/*
* We use this if we don't have any better idle routine..
*/
-void default_idle(void)
+void __cpuidle default_idle(void)
{
trace_cpu_idle_rcuidle(1, smp_processor_id());
safe_halt();
@@ -416,7 +416,7 @@ static int prefer_mwait_c1_over_halt(const struct cpuinfo_x86 *c)
* with interrupts enabled and no flags, which is backwards compatible with the
* original MWAIT implementation.
*/
-static void mwait_idle(void)
+static __cpuidle void mwait_idle(void)
{
if (!current_set_polling_and_test()) {
trace_cpu_idle_rcuidle(1, smp_processor_id());
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 4c941f88d405..e611d0dc9942 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -97,6 +97,7 @@ SECTIONS
_stext = .;
TEXT_TEXT
SCHED_TEXT
+ CPUIDLE_TEXT
LOCK_TEXT
KPROBES_TEXT
ENTRY_TEXT
diff --git a/arch/xtensa/kernel/vmlinux.lds.S b/arch/xtensa/kernel/vmlinux.lds.S
index c417cbe4ec87..18a174c7fb87 100644
--- a/arch/xtensa/kernel/vmlinux.lds.S
+++ b/arch/xtensa/kernel/vmlinux.lds.S
@@ -93,6 +93,9 @@ SECTIONS
VMLINUX_SYMBOL(__sched_text_start) = .;
*(.sched.literal .sched.text)
VMLINUX_SYMBOL(__sched_text_end) = .;
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .;
+ *(.cpuidle.literal .cpuidle.text)
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
VMLINUX_SYMBOL(__lock_text_start) = .;
*(.spinlock.literal .spinlock.text)
VMLINUX_SYMBOL(__lock_text_end) = .;
diff --git a/drivers/acpi/processor_idle.c b/drivers/acpi/processor_idle.c
index 444e3745c8b3..2477f9a351d3 100644
--- a/drivers/acpi/processor_idle.c
+++ b/drivers/acpi/processor_idle.c
@@ -31,6 +31,7 @@
#include <linux/sched.h> /* need_resched() */
#include <linux/tick.h>
#include <linux/cpuidle.h>
+#include <linux/cpu.h>
#include <acpi/processor.h>

/*
@@ -109,7 +110,7 @@ static const struct dmi_system_id processor_power_dmi_table[] = {
* Callers should disable interrupts before the call and enable
* interrupts after return.
*/
-static void acpi_safe_halt(void)
+static void __cpuidle acpi_safe_halt(void)
{
if (!tif_need_resched()) {
safe_halt();
@@ -640,7 +641,7 @@ static int acpi_idle_bm_check(void)
*
* Caller disables interrupt before call and enables interrupt after return.
*/
-static void acpi_idle_do_entry(struct acpi_processor_cx *cx)
+static void __cpuidle acpi_idle_do_entry(struct acpi_processor_cx *cx)
{
if (cx->entry_method == ACPI_CSTATE_FFH) {
/* Call into architectural FFH based C-state */
diff --git a/drivers/cpuidle/driver.c b/drivers/cpuidle/driver.c
index 389ade4572be..ab264d393233 100644
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -14,6 +14,7 @@
#include <linux/cpuidle.h>
#include <linux/cpumask.h>
#include <linux/tick.h>
+#include <linux/cpu.h>

#include "cpuidle.h"

@@ -178,8 +179,8 @@ static void __cpuidle_driver_init(struct cpuidle_driver *drv)
}

#ifdef CONFIG_ARCH_HAS_CPU_RELAX
-static int poll_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static int __cpuidle poll_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
local_irq_enable();
if (!current_set_polling_and_test()) {
diff --git a/drivers/idle/intel_idle.c b/drivers/idle/intel_idle.c
index ba947df5a8c7..d30127a0f3ac 100644
--- a/drivers/idle/intel_idle.c
+++ b/drivers/idle/intel_idle.c
@@ -745,8 +745,8 @@ static struct cpuidle_state knl_cstates[] = {
*
* Must be called under local_irq_disable().
*/
-static int intel_idle(struct cpuidle_device *dev,
- struct cpuidle_driver *drv, int index)
+static __cpuidle int intel_idle(struct cpuidle_device *dev,
+ struct cpuidle_driver *drv, int index)
{
unsigned long ecx = 1; /* break on interrupt flag */
struct cpuidle_state *state = &drv->states[index];
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 339125bb4d2c..5ed7075f7ef1 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -444,6 +444,12 @@
*(.spinlock.text) \
VMLINUX_SYMBOL(__lock_text_end) = .;

+#define CPUIDLE_TEXT \
+ ALIGN_FUNCTION(); \
+ VMLINUX_SYMBOL(__cpuidle_text_start) = .; \
+ *(.cpuidle.text) \
+ VMLINUX_SYMBOL(__cpuidle_text_end) = .;
+
#define KPROBES_TEXT \
ALIGN_FUNCTION(); \
VMLINUX_SYMBOL(__kprobes_text_start) = .; \
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index f9b1fab4388a..07642073989c 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -268,6 +268,11 @@ void cpu_startup_entry(enum cpuhp_state state);

void cpu_idle_poll_ctrl(bool enable);

+/* Attach to any functions which should be considered cpuidle. */
+#define __cpuidle __attribute__((__section__(".cpuidle.text")))
+
+bool cpu_in_idle(unsigned long pc);
+
void arch_cpu_idle(void);
void arch_cpu_idle_prepare(void);
void arch_cpu_idle_enter(void);
diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index bd12c6c714ec..d4dc16e6749b 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -16,6 +16,9 @@

#include "sched.h"

+/* Linker adds these: start and end of __cpuidle functions */
+extern char __cpuidle_text_start[], __cpuidle_text_end[];
+
/**
* sched_idle_set_state - Record idle state for the current CPU.
* @idle_state: State to record.
@@ -53,7 +56,7 @@ static int __init cpu_idle_nopoll_setup(char *__unused)
__setup("hlt", cpu_idle_nopoll_setup);
#endif

-static inline int cpu_idle_poll(void)
+static noinline int __cpuidle cpu_idle_poll(void)
{
rcu_idle_enter();
trace_cpu_idle_rcuidle(0, smp_processor_id());
@@ -84,7 +87,7 @@ void __weak arch_cpu_idle(void)
*
* To use when the cpuidle framework cannot be used.
*/
-void default_idle_call(void)
+void __cpuidle default_idle_call(void)
{
if (current_clr_polling_and_test()) {
local_irq_enable();
@@ -269,6 +272,12 @@ static void cpu_idle_loop(void)
}
}

+bool cpu_in_idle(unsigned long pc)
+{
+ return pc >= (unsigned long)__cpuidle_text_start &&
+ pc < (unsigned long)__cpuidle_text_end;
+}
+
void cpu_startup_entry(enum cpuhp_state state)
{
/*
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index 9375c0279b73..ac41f3c84e8d 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -17,6 +17,7 @@
#include <linux/kprobes.h>
#include <linux/nmi.h>
#include <linux/seq_buf.h>
+#include <linux/cpu.h>

#ifdef arch_trigger_cpumask_backtrace
/* For reliability, we're prepared to waste bits here. */
@@ -160,11 +161,16 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)

/* Replace printk to write into the NMI seq */
this_cpu_write(printk_func, nmi_vprintk);
- pr_warn("NMI backtrace for cpu %d\n", cpu);
- if (regs)
- show_regs(regs);
- else
- dump_stack();
+ if (regs != NULL && cpu_in_idle(instruction_pointer(regs))) {
+ pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
+ cpu, instruction_pointer(regs));
+ } else {
+ pr_warn("NMI backtrace for cpu %d\n", cpu);
+ if (regs)
+ show_regs(regs);
+ else
+ dump_stack();
+ }
this_cpu_write(printk_func, printk_func_save);

cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c
index 48958d3cec9e..bd8349759095 100644
--- a/scripts/mod/modpost.c
+++ b/scripts/mod/modpost.c
@@ -888,7 +888,7 @@ static void check_section(const char *modname, struct elf_info *elf,

#define DATA_SECTIONS ".data", ".data.rel"
#define TEXT_SECTIONS ".text", ".text.unlikely", ".sched.text", \
- ".kprobes.text"
+ ".kprobes.text", ".cpuidle.text"
#define OTHER_TEXT_SECTIONS ".ref.text", ".head.text", ".spinlock.text", \
".fixup", ".entry.text", ".exception.text", ".text.*", \
".coldtext"
diff --git a/scripts/recordmcount.c b/scripts/recordmcount.c
index e167592793a7..9a6ec6ce00b5 100644
--- a/scripts/recordmcount.c
+++ b/scripts/recordmcount.c
@@ -357,6 +357,7 @@ is_mcounted_section_name(char const *const txtname)
strcmp(".spinlock.text", txtname) == 0 ||
strcmp(".irqentry.text", txtname) == 0 ||
strcmp(".kprobes.text", txtname) == 0 ||
+ strcmp(".cpuidle.text", txtname) == 0 ||
strcmp(".text.unlikely", txtname) == 0;
}

diff --git a/scripts/recordmcount.pl b/scripts/recordmcount.pl
index 96e2486a6fc4..29cecf9b504f 100755
--- a/scripts/recordmcount.pl
+++ b/scripts/recordmcount.pl
@@ -135,6 +135,7 @@ my %text_sections = (
".spinlock.text" => 1,
".irqentry.text" => 1,
".kprobes.text" => 1,
+ ".cpuidle.text" => 1,
".text.unlikely" => 1,
);

--
2.7.2
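
For readers following the diff above, here is a minimal user-space sketch of the range test that cpu_in_idle() performs and of the branch it feeds in nmi_cpu_backtrace(). It is an illustration only, not the kernel code: idle_text_start, idle_text_end, pc_in_idle() and should_dump() are hypothetical names, and the addresses are made up to stand in for the linker-provided __cpuidle_text_start and __cpuidle_text_end symbols.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the linker-provided section bounds
 * (__cpuidle_text_start / __cpuidle_text_end in the patch).
 * The addresses are invented for this example.
 */
static const unsigned long idle_text_start = 0x1000;
static const unsigned long idle_text_end = 0x2000;

/* Half-open range test, as in cpu_in_idle(): start <= pc < end. */
bool pc_in_idle(unsigned long pc)
{
	/* Sanity check: the section bounds must be ordered. */
	assert(idle_text_start <= idle_text_end);
	return pc >= idle_text_start && pc < idle_text_end;
}

/*
 * Mirrors the branch added to nmi_cpu_backtrace(): the dump is skipped
 * only when registers are available and the pc lies in idle text.
 * Returns true when a full backtrace would be emitted.
 */
bool should_dump(bool have_regs, unsigned long pc)
{
	return !(have_regs && pc_in_idle(pc));
}
```

Note that the end address is exclusive, so a pc equal to idle_text_end is not treated as idle, and that a call without registers (the regs == NULL case) always produces a dump.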

2016-04-05 17:42:42

by Chris Metcalf

Subject: [PATCH v5 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

Currently on arm there is code that checks whether it should call
dump_stack() explicitly, to avoid trying to raise an NMI when the
current context is not preemptible by the backtrace IPI. Similarly,
the forthcoming arch/tile support uses an IPI mechanism that does
not support generating an NMI to self.

Accordingly, move the code that guards this case into the generic
mechanism, and invoke it unconditionally whenever we want a
backtrace of the current cpu. It seems plausible that in all cases,
dump_stack() will generate better information than generating a
stack from the NMI handler. The register state will be missing,
but that state is likely not particularly helpful in any case.

Or, if we think it is helpful, we should be capturing and emitting
the current register state in all cases when regs == NULL is passed
to nmi_cpu_backtrace().

Signed-off-by: Chris Metcalf <[email protected]>
---
arch/arm/kernel/smp.c | 9 ---------
lib/nmi_backtrace.c | 9 +++++++++
2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 72ad8485993a..07223f2a3ee0 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -746,15 +746,6 @@ core_initcall(register_cpufreq_notifier);

static void raise_nmi(cpumask_t *mask)
{
- /*
- * Generate the backtrace directly if we are running in a calling
- * context that is not preemptible by the backtrace IPI. Note
- * that nmi_cpu_backtrace() automatically removes the current cpu
- * from mask.
- */
- if (cpumask_test_cpu(smp_processor_id(), mask) && irqs_disabled())
- nmi_cpu_backtrace(NULL);
-
smp_cross_call(mask, IPI_CPU_BACKTRACE);
}

diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index db63ac75eba0..9375c0279b73 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -76,6 +76,15 @@ void nmi_trigger_cpumask_backtrace(const cpumask_t *mask,
seq_buf_init(&s->seq, s->buffer, NMI_BUF_SIZE);
}

+ /*
+ * Don't try to send an NMI to this cpu; it may work on some
+ * architectures, but on others it may not, and we'll get
+ * information at least as useful just by doing a dump_stack() here.
+ * Note that nmi_cpu_backtrace(NULL) will clear the cpu bit.
+ */
+ if (cpumask_test_cpu(this_cpu, to_cpumask(backtrace_mask)))
+ nmi_cpu_backtrace(NULL);
+
if (!cpumask_empty(to_cpumask(backtrace_mask))) {
pr_info("Sending NMI from CPU %d to CPUs %*pbl:\n",
this_cpu, nr_cpumask_bits, to_cpumask(backtrace_mask));
--
2.7.2
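
The self-handling step in the diff above can be sketched in user space roughly as follows. This is a simplified model under stated assumptions, not the kernel implementation: a 64-bit integer stands in for the cpumask, and struct backtrace_plan and plan_backtrace() are hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: what the calling cpu should do. */
struct backtrace_plan {
	int dump_locally;     /* non-zero: call dump_stack() on this cpu */
	uint64_t remote_mask; /* cpus that still need the NMI/IPI */
};

/*
 * Model of the check added to nmi_trigger_cpumask_backtrace(): if the
 * calling cpu is in the requested mask, it is handled locally and its
 * bit is cleared, so no NMI is ever sent to self.
 */
struct backtrace_plan plan_backtrace(uint64_t mask, unsigned int this_cpu)
{
	struct backtrace_plan plan = { 0, mask };
	uint64_t self_bit;

	assert(this_cpu < 64); /* the model only covers 64 cpus */
	self_bit = (uint64_t)1 << this_cpu;

	if (mask & self_bit) {
		plan.dump_locally = 1;
		plan.remote_mask &= ~self_bit;
	}
	return plan;
}
```

In the patch itself the bit is cleared as a side effect of nmi_cpu_backtrace(NULL) rather than explicitly; the sketch folds both steps into one function to keep the model small. If remote_mask ends up empty, no NMI needs to be raised at all.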

2016-04-14 15:17:16

by Aaron Tomlin

Subject: Re: [PATCH v5 1/4] nmi_backtrace: add more trigger_*_cpu_backtrace() methods

On Tue 2016-04-05 13:26 -0400, Chris Metcalf wrote:
> Currently you can only request a backtrace of either all cpus, or
> all cpus but yourself. It can also be helpful to request a remote
> backtrace of a single cpu, and since we want that, the logical
> extension is to support a cpumask as the underlying primitive.
>
> This change modifies the existing lib/nmi_backtrace.c code to take
> a cpumask as its basic primitive, and modifies the linux/nmi.h code
> to use either the old "all/all_but_self" arch methods, or the new
> "cpumask" method, depending on which is available.
>
> The existing clients of nmi_backtrace (arm and x86) are converted
> to using the new cpumask approach in this change.
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> arch/arm/include/asm/irq.h | 4 +--
> arch/arm/kernel/smp.c | 4 +--
> arch/x86/include/asm/irq.h | 4 +--
> arch/x86/kernel/apic/hw_nmi.c | 6 ++---
> include/linux/nmi.h | 63 ++++++++++++++++++++++++++++++++++---------
> lib/nmi_backtrace.c | 15 +++++------
> 6 files changed, 65 insertions(+), 31 deletions(-)

Looks good to me.

Reviewed-by: Aaron Tomlin <[email protected]>

--
Aaron Tomlin

2016-04-14 15:19:40

by Aaron Tomlin

Subject: Re: [PATCH v5 2/4] nmi_backtrace: do a local dump_stack() instead of a self-NMI

On Tue 2016-04-05 13:26 -0400, Chris Metcalf wrote:
> Currently on arm there is code that checks whether it should call
> dump_stack() explicitly, to avoid trying to raise an NMI when the
> current context is not preemptible by the backtrace IPI. Similarly,
> the forthcoming arch/tile support uses an IPI mechanism that does
> not support generating an NMI to self.
>
> Accordingly, move the code that guards this case into the generic
> mechanism, and invoke it unconditionally whenever we want a
> backtrace of the current cpu. It seems plausible that in all cases,
> dump_stack() will generate better information than generating a
> stack from the NMI handler. The register state will be missing,
> but that state is likely not particularly helpful in any case.
>
> Or, if we think it is helpful, we should be capturing and emitting
> the current register state in all cases when regs == NULL is passed
> to nmi_cpu_backtrace().
>
> Signed-off-by: Chris Metcalf <[email protected]>
> ---
> arch/arm/kernel/smp.c | 9 ---------
> lib/nmi_backtrace.c | 9 +++++++++
> 2 files changed, 9 insertions(+), 9 deletions(-)

Thanks Chris.

Acked-by: Aaron Tomlin <[email protected]>

--
Aaron Tomlin