This is an attempt to resurrect Sumit's old patch series [1] that
allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
also to round up CPUs in kdb/kgdb. The last post from Sumit that I
could find was v7, so I started my series at v8. I haven't copied all
of his old changelongs here, but you can find them from the link.
I'm really looking for a way to land this patch series. In response to
v8, Mark Rutland indicated [2] that he was worried about the soundness
of pseudo NMI. Those definitely need to get fixed, but IMO this patch
series could still land in the meantime. That would at least let
people test with it.
Request for anyone reading this: please help indicate your support of
this patch series landing by replying, even if you don't have the
background for a full review. My suspicion is that there are a lot of
people who agree that this would be super useful to get landed.
Since v8, I have cleaned up this patch series by integrating the 10th
patch from v8 [3] into the whole series. As part of this, I renamed
the "NMI IPI" to the "debug IPI" since it could now be backed by a
regular IPI in the case that pseudo NMIs weren't available. With the
fallback, this allowed me to drop some extra patches from the
series. This feels (to me) to be pretty clean and hopefully others
agree. Any patch I touched significantly I removed Masayoshi and
Chen-Yu's tags from.
...also in v8, I reorderd the patches a bit in a way that seemed a
little cleaner to me.
Since v7, I have:
* Addressed the small amount of feedback that was there for v7.
* Rebased.
* Added a new patch that prevents us from spamming the logs with idle
tasks.
* Added an extra patch to gracefully fall back to regular IPIs if
pseudo-NMIs aren't there.
It can be noted that this patch series works very well with the recent
"hardlockup" patches that have landed through Andrew Morton's tree and
are currently in linuxnext. It works especially well with the "buddy"
lockup detector.
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
[2] https://lore.kernel.org/lkml/ZFvGqD%2F%2Fpm%[email protected]/
[3] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
Changes in v9:
- Add a warning if we don't have enough IPIs for the NMI IPI
- Added comments that we might not be using NMI always.
- Added missing "inline"
- Added to commit message that this doesn't catch all cases.
- Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
- Moved header file out of "include" since it didn't need to be there.
- Remove arm64_supports_nmi()
- Remove fallback for when debug IPI isn't available.
- Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
- Update commit description
- arch_trigger_cpumask_backtrace() no longer returns bool
Changes in v8:
- "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
- "Tag the arm64 idle functions as __cpuidle" new for v8
- Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
- debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
Douglas Anderson (2):
arm64: idle: Tag the arm64 idle functions as __cpuidle
kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
Sumit Garg (5):
irqchip/gic-v3: Enable support for SGIs to act as NMIs
arm64: Add framework for a debug IPI
arm64: smp: Assign and setup the debug IPI
arm64: ipi_debug: Add support for backtrace using the debug IPI
arm64: kgdb: Roundup cpus using the debug IPI
arch/arm64/include/asm/irq.h | 3 +
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/idle.c | 4 +-
arch/arm64/kernel/ipi_debug.c | 102 ++++++++++++++++++++++++++++++++++
arch/arm64/kernel/ipi_debug.h | 13 +++++
arch/arm64/kernel/kgdb.c | 14 +++++
arch/arm64/kernel/smp.c | 11 ++++
drivers/irqchip/irq-gic-v3.c | 29 +++++++---
include/linux/kgdb.h | 1 +
9 files changed, 168 insertions(+), 11 deletions(-)
create mode 100644 arch/arm64/kernel/ipi_debug.c
create mode 100644 arch/arm64/kernel/ipi_debug.h
--
2.41.0.rc2.161.g9c6817b8e7-goog
From: Sumit Garg <[email protected]>
Enable arch_trigger_cpumask_backtrace() support on arm64 using the new
debug IPI. With this arm64 can now get backtraces in cases where
callers of the trigger_xyz_backtrace() class of functions don't check
the return code and implement a fallback. One example is
`kernel.softlockup_all_cpu_backtrace`. This also allows us to
backtrace hard locked up CPUs in cases where the debug IPI is backed
by an NMI (or pseudo NMI).
Signed-off-by: Sumit Garg <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v9:
- Added comments that we might not be using NMI always.
- Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
- arch_trigger_cpumask_backtrace() no longer returns bool
Changes in v8:
- Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
arch/arm64/include/asm/irq.h | 3 +++
arch/arm64/kernel/ipi_debug.c | 25 +++++++++++++++++++++++--
2 files changed, 26 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
index fac08e18bcd5..be2d103f316e 100644
--- a/arch/arm64/include/asm/irq.h
+++ b/arch/arm64/include/asm/irq.h
@@ -6,6 +6,9 @@
#include <asm-generic/irq.h>
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self);
+#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
+
struct pt_regs;
int set_handle_irq(void (*handle_irq)(struct pt_regs *));
diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c
index b57833e31eaf..6984ed507e1f 100644
--- a/arch/arm64/kernel/ipi_debug.c
+++ b/arch/arm64/kernel/ipi_debug.c
@@ -8,6 +8,7 @@
#include <linux/interrupt.h>
#include <linux/irq.h>
+#include <linux/nmi.h>
#include <linux/smp.h>
#include "ipi_debug.h"
@@ -24,11 +25,31 @@ void arm64_debug_ipi(cpumask_t *mask)
__ipi_send_mask(ipi_debug_desc, mask);
}
+void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
+{
+ /*
+ * NOTE: though nmi_trigger_cpumask_backtrace has "nmi_" in the name,
+ * nothing about it truly needs to be backed by an NMI, it's just that
+ * it's _allowed_ to be called from an NMI. If set_smp_debug_ipi()
+ * failed to get an NMI (or pseudo-NMI) this will just be backed by a
+ * regular IPI.
+ */
+ nmi_trigger_cpumask_backtrace(mask, exclude_self, arm64_debug_ipi);
+}
+
static irqreturn_t ipi_debug_handler(int irq, void *data)
{
- /* nop, NMI handlers for special features can be added here. */
+ irqreturn_t ret = IRQ_NONE;
+
+ /*
+ * NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling
+ * a function with "nmi_" in the name but it works fine even if we
+ * are using a regulaor IPI.
+ */
+ if (nmi_cpu_backtrace(get_irq_regs()))
+ ret = IRQ_HANDLED;
- return IRQ_NONE;
+ return ret;
}
void debug_ipi_setup(void)
--
2.41.0.rc2.161.g9c6817b8e7-goog
To save architectures from needing to wrap the call in #ifdefs, add a
stub no-op version of kgdb_nmicallback(), which returns 1 if it didn't
handle anything.
Reviewed-by: Daniel Thompson <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
In v9 this is the only kgdb dependency. I'm assuming it could go
through the arm64 tree? If that's not a good idea, we could always
change the patch ("arm64: kgdb: Roundup cpus using IPI as NMI") not to
depend on it by only calling kgdb_nmicallback() if CONFIG_KGDB is not
defined.
Changes in v9:
- Added missing "inline"
Changes in v8:
- "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
include/linux/kgdb.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 258cdde8d356..76e891ee9e37 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -365,5 +365,6 @@ extern void kgdb_free_init_mem(void);
#define dbg_late_init()
static inline void kgdb_panic(const char *msg) {}
static inline void kgdb_free_init_mem(void) { }
+static inline int kgdb_nmicallback(int cpu, void *regs) { return 1; }
#endif /* ! CONFIG_KGDB */
#endif /* _KGDB_H_ */
--
2.41.0.rc2.161.g9c6817b8e7-goog
As per the (somewhat recent) comment before the definition of
`__cpuidle`, the tag is like `noinstr` but also marks a function so it
can be identified by cpu_in_idle(). Let'a add this.
After doing this then when we dump stack traces of all processors
using nmi_cpu_backtrace() then instead of getting useless backtraces
we get things like:
NMI backtrace for cpu N skipped: idling at cpu_do_idle+0x94/0x98
NOTE: this patch won't make cpu_in_idle() work perfectly for arm64,
but it doesn't hurt and does catch some cases. Specifically an example
that wasn't caught in my testing looked like this:
gic_cpu_sys_reg_init+0x1f8/0x314
gic_cpu_pm_notifier+0x40/0x78
raw_notifier_call_chain+0x5c/0x134
cpu_pm_notify+0x38/0x64
cpu_pm_exit+0x20/0x2c
psci_enter_idle_state+0x48/0x70
cpuidle_enter_state+0xb8/0x260
cpuidle_enter+0x44/0x5c
do_idle+0x188/0x30c
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v9:
- Added to commit message that this doesn't catch all cases.
Changes in v8:
- "Tag the arm64 idle functions as __cpuidle" new for v8
arch/arm64/kernel/idle.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
index c1125753fe9b..05cfb347ec26 100644
--- a/arch/arm64/kernel/idle.c
+++ b/arch/arm64/kernel/idle.c
@@ -20,7 +20,7 @@
* ensure that interrupts are not masked at the PMR (because the core will
* not wake up if we block the wake up signal in the interrupt controller).
*/
-void noinstr cpu_do_idle(void)
+void __cpuidle cpu_do_idle(void)
{
struct arm_cpuidle_irq_context context;
@@ -35,7 +35,7 @@ void noinstr cpu_do_idle(void)
/*
* This is our default idle handler.
*/
-void noinstr arch_cpu_idle(void)
+void __cpuidle arch_cpu_idle(void)
{
/*
* This should do all the clock switching and wait for interrupt
--
2.41.0.rc2.161.g9c6817b8e7-goog
From: Sumit Garg <[email protected]>
Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
IPI is backed by an NMI (or pseudo NMI) then this will let us debug
even hard locked CPUs. When the debug IPI isn't backed by an NMI then
this won't really have any huge benefit but it will still work.
Signed-off-by: Sumit Garg <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
Changes in v9:
- Remove fallback for when debug IPI isn't available.
- Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
arch/arm64/kernel/ipi_debug.c | 5 +++++
arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c
index 6984ed507e1f..5794894d94f1 100644
--- a/arch/arm64/kernel/ipi_debug.c
+++ b/arch/arm64/kernel/ipi_debug.c
@@ -8,6 +8,7 @@
#include <linux/interrupt.h>
#include <linux/irq.h>
+#include <linux/kgdb.h>
#include <linux/nmi.h>
#include <linux/smp.h>
@@ -40,6 +41,7 @@ void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
static irqreturn_t ipi_debug_handler(int irq, void *data)
{
irqreturn_t ret = IRQ_NONE;
+ unsigned int cpu = smp_processor_id();
/*
* NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling
@@ -49,6 +51,9 @@ static irqreturn_t ipi_debug_handler(int irq, void *data)
if (nmi_cpu_backtrace(get_irq_regs()))
ret = IRQ_HANDLED;
+ if (!kgdb_nmicallback(cpu, get_irq_regs()))
+ ret = IRQ_HANDLED;
+
return ret;
}
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 4e1f983df3d1..9c4c47507cd4 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -20,6 +20,8 @@
#include <asm/patching.h>
#include <asm/traps.h>
+#include "ipi_debug.h"
+
struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
{ "x0", 8, offsetof(struct pt_regs, regs[0])},
{ "x1", 8, offsetof(struct pt_regs, regs[1])},
@@ -356,3 +358,15 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
return aarch64_insn_write((void *)bpt->bpt_addr,
*(u32 *)bpt->saved_instr);
}
+
+void kgdb_roundup_cpus(void)
+{
+ struct cpumask mask;
+
+ cpumask_copy(&mask, cpu_online_mask);
+ cpumask_clear_cpu(raw_smp_processor_id(), &mask);
+ if (cpumask_empty(&mask))
+ return;
+
+ arm64_debug_ipi(&mask);
+}
--
2.41.0.rc2.161.g9c6817b8e7-goog
From: Sumit Garg <[email protected]>
Add support to handle SGIs as pseudo NMIs. As SGIs or IPIs default to a
special flow handler: handle_percpu_devid_fasteoi_ipi(), so skip NMI
handler update in case of SGIs.
Also, enable NMI support prior to gic_smp_init() as allocation of SGIs
as IRQs/NMIs happen as part of this routine.
Signed-off-by: Sumit Garg <[email protected]>
Reviewed-by: Masayoshi Mizuma <[email protected]>
Tested-by: Chen-Yu Tsai <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
(no changes since v1)
drivers/irqchip/irq-gic-v3.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index 0c6c1af9a5b7..ed37e02d4c5f 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -525,6 +525,7 @@ static u32 gic_get_ppi_index(struct irq_data *d)
static int gic_irq_nmi_setup(struct irq_data *d)
{
struct irq_desc *desc = irq_to_desc(d->irq);
+ u32 idx;
if (!gic_supports_nmi())
return -EINVAL;
@@ -542,16 +543,22 @@ static int gic_irq_nmi_setup(struct irq_data *d)
return -EINVAL;
/* desc lock should already be held */
- if (gic_irq_in_rdist(d)) {
- u32 idx = gic_get_ppi_index(d);
+ switch (get_intid_range(d)) {
+ case SGI_RANGE:
+ break;
+ case PPI_RANGE:
+ case EPPI_RANGE:
+ idx = gic_get_ppi_index(d);
/* Setting up PPI as NMI, only switch handler for first NMI */
if (!refcount_inc_not_zero(&ppi_nmi_refs[idx])) {
refcount_set(&ppi_nmi_refs[idx], 1);
desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
}
- } else {
+ break;
+ default:
desc->handle_irq = handle_fasteoi_nmi;
+ break;
}
gic_irq_set_prio(d, GICD_INT_NMI_PRI);
@@ -562,6 +569,7 @@ static int gic_irq_nmi_setup(struct irq_data *d)
static void gic_irq_nmi_teardown(struct irq_data *d)
{
struct irq_desc *desc = irq_to_desc(d->irq);
+ u32 idx;
if (WARN_ON(!gic_supports_nmi()))
return;
@@ -579,14 +587,20 @@ static void gic_irq_nmi_teardown(struct irq_data *d)
return;
/* desc lock should already be held */
- if (gic_irq_in_rdist(d)) {
- u32 idx = gic_get_ppi_index(d);
+ switch (get_intid_range(d)) {
+ case SGI_RANGE:
+ break;
+ case PPI_RANGE:
+ case EPPI_RANGE:
+ idx = gic_get_ppi_index(d);
/* Tearing down NMI, only switch handler for last NMI */
if (refcount_dec_and_test(&ppi_nmi_refs[idx]))
desc->handle_irq = handle_percpu_devid_irq;
- } else {
+ break;
+ default:
desc->handle_irq = handle_fasteoi_irq;
+ break;
}
gic_irq_set_prio(d, GICD_INT_DEF_PRI);
@@ -2001,6 +2015,7 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
gic_dist_init();
gic_cpu_init();
+ gic_enable_nmi_support();
gic_smp_init();
gic_cpu_pm_init();
@@ -2013,8 +2028,6 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
gicv2m_init(handle, gic_data.domain);
}
- gic_enable_nmi_support();
-
return 0;
out_free:
--
2.41.0.rc2.161.g9c6817b8e7-goog
From: Sumit Garg <[email protected]>
Introduce a framework for an IPI that will be used for debug
purposes. The primary use case of this IPI will be to generate stack
crawls on other CPUs, but it will also be used to round up CPUs for
kgdb.
When possible, we try to allocate this debug IPI as an NMI (or a
pseudo NMI). If that fails (due to CONFIG, an incompatible interrupt
controller, a quirk, missing the "irqchip.gicv3_pseudo_nmi=1" kernel
parameter, etc) we fall back to a normal IPI.
NOTE: hooking this up for CPU backtrace / kgdb will happen in a future
patch, this just adds the framework.
Signed-off-by: Sumit Garg <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
I didn't get any feedback from v8 patch #10 [1], but I went ahead and
folded it in here anyway since it really simplfies things. If people
don't like the fallback to regular IPI, I can also undo it.
[1] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
Changes in v9:
- Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
- Moved header file out of "include" since it didn't need to be there.
- Remove arm64_supports_nmi()
- Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
Changes in v8:
- debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
arch/arm64/kernel/Makefile | 2 +-
arch/arm64/kernel/ipi_debug.c | 76 +++++++++++++++++++++++++++++++++++
arch/arm64/kernel/ipi_debug.h | 13 ++++++
3 files changed, 90 insertions(+), 1 deletion(-)
create mode 100644 arch/arm64/kernel/ipi_debug.c
create mode 100644 arch/arm64/kernel/ipi_debug.h
diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index cc22011ab66a..737838f803b7 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -34,7 +34,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
cpufeature.o alternative.o cacheinfo.o \
smp.o smp_spin_table.o topology.o smccc-call.o \
syscall.o proton-pack.o idreg-override.o idle.o \
- patching.o
+ patching.o ipi_debug.o
obj-$(CONFIG_COMPAT) += sys32.o signal32.o \
sys_compat.o
diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c
new file mode 100644
index 000000000000..b57833e31eaf
--- /dev/null
+++ b/arch/arm64/kernel/ipi_debug.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Debug IPI support
+ *
+ * Copyright (C) 2020 Linaro Limited
+ * Author: Sumit Garg <[email protected]>
+ */
+
+#include <linux/interrupt.h>
+#include <linux/irq.h>
+#include <linux/smp.h>
+
+#include "ipi_debug.h"
+
+static struct irq_desc *ipi_debug_desc __read_mostly;
+static int ipi_debug_id __read_mostly;
+static bool is_nmi;
+
+void arm64_debug_ipi(cpumask_t *mask)
+{
+ if (WARN_ON_ONCE(!ipi_debug_desc))
+ return;
+
+ __ipi_send_mask(ipi_debug_desc, mask);
+}
+
+static irqreturn_t ipi_debug_handler(int irq, void *data)
+{
+ /* nop, NMI handlers for special features can be added here. */
+
+ return IRQ_NONE;
+}
+
+void debug_ipi_setup(void)
+{
+ if (!ipi_debug_desc)
+ return;
+
+ if (is_nmi) {
+ if (!prepare_percpu_nmi(ipi_debug_id))
+ enable_percpu_nmi(ipi_debug_id, IRQ_TYPE_NONE);
+ } else {
+ enable_percpu_irq(ipi_debug_id, IRQ_TYPE_NONE);
+ }
+}
+
+void debug_ipi_teardown(void)
+{
+ if (!ipi_debug_desc)
+ return;
+
+ if (is_nmi) {
+ disable_percpu_nmi(ipi_debug_id);
+ teardown_percpu_nmi(ipi_debug_id);
+ } else {
+ disable_percpu_irq(ipi_debug_id);
+ }
+}
+
+void __init set_smp_debug_ipi(int ipi)
+{
+ int err;
+
+ if (!request_percpu_nmi(ipi, ipi_debug_handler, "IPI", &cpu_number)) {
+ is_nmi = true;
+ } else {
+ err = request_percpu_irq(ipi, ipi_debug_handler, "IPI", &cpu_number);
+ if (WARN_ON(err))
+ return;
+
+ irq_set_status_flags(ipi, IRQ_HIDDEN);
+ }
+
+ ipi_debug_desc = irq_to_desc(ipi);
+ ipi_debug_id = ipi;
+}
diff --git a/arch/arm64/kernel/ipi_debug.h b/arch/arm64/kernel/ipi_debug.h
new file mode 100644
index 000000000000..f6011a09282f
--- /dev/null
+++ b/arch/arm64/kernel/ipi_debug.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_NMI_H
+#define __ASM_NMI_H
+
+#include <linux/cpumask.h>
+
+void arm64_debug_ipi(cpumask_t *mask);
+
+void set_smp_debug_ipi(int ipi);
+void debug_ipi_setup(void);
+void debug_ipi_teardown(void);
+
+#endif
--
2.41.0.rc2.161.g9c6817b8e7-goog
From: Sumit Garg <[email protected]>
All current arm64 interrupt controllers have at least 8
IPIs. Currently we are only using 7 of them on arm64. Let's use the
8th one as a debug IPI. This uses the new "debug IPI" infrastructure
which will try to allocate this IPI as an NMI/pseudo NMI if possible.
Signed-off-by: Sumit Garg <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---
I could imagine that people object to using up the last free IPI on
interrupt controllers with only 8 IPIs. However, it shouldn't be a big
deal. If we later need an extra IPI, it shouldn't be too hard to
combine some of the existing ones. Presumably we could just get rid of
the "crash stop" IPI and have the normal "stop" IPI do the crash if
"waiting_for_crash_ipi" is non-zero
Changes in v9:
- Add a warning if we don't have enough IPIs for the NMI IPI
- Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
- Update commit description
Changes in v8:
- debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
arch/arm64/kernel/smp.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index edd63894d61e..db019b49d3bd 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -53,6 +53,8 @@
#include <trace/events/ipi.h>
+#include "ipi_debug.h"
+
DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
EXPORT_PER_CPU_SYMBOL(cpu_number);
@@ -935,6 +937,8 @@ static void ipi_setup(int cpu)
for (i = 0; i < nr_ipi; i++)
enable_percpu_irq(ipi_irq_base + i, 0);
+
+ debug_ipi_setup();
}
#ifdef CONFIG_HOTPLUG_CPU
@@ -947,6 +951,8 @@ static void ipi_teardown(int cpu)
for (i = 0; i < nr_ipi; i++)
disable_percpu_irq(ipi_irq_base + i);
+
+ debug_ipi_teardown();
}
#endif
@@ -968,6 +974,11 @@ void __init set_smp_ipi_range(int ipi_base, int n)
irq_set_status_flags(ipi_base + i, IRQ_HIDDEN);
}
+ if (n > nr_ipi)
+ set_smp_debug_ipi(ipi_base + nr_ipi);
+ else
+ WARN(1, "Not enough IPIs for NMI IPI\n");
+
ipi_irq_base = ipi_base;
/* Setup the boot CPU immediately */
--
2.41.0.rc2.161.g9c6817b8e7-goog
Hi Doug,
On Fri, 2 Jun 2023 at 03:07, Douglas Anderson <[email protected]> wrote:
>
> This is an attempt to resurrect Sumit's old patch series [1] that
> allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> could find was v7, so I started my series at v8. I haven't copied all
> of his old changelongs here, but you can find them from the link.
>
> I'm really looking for a way to land this patch series. In response to
> v8, Mark Rutland indicated [2] that he was worried about the soundness
> of pseudo NMI. Those definitely need to get fixed, but IMO this patch
> series could still land in the meantime. That would at least let
> people test with it.
+1
>
> Request for anyone reading this: please help indicate your support of
> this patch series landing by replying, even if you don't have the
> background for a full review. My suspicion is that there are a lot of
> people who agree that this would be super useful to get landed.
>
You can ofcourse count me in here. This will certainly bring NMI
debugging capabilities to arm64 and I have been advocating for that
since the advent of pseudo NMIs on arm64.
> Since v8, I have cleaned up this patch series by integrating the 10th
> patch from v8 [3] into the whole series. As part of this, I renamed
> the "NMI IPI" to the "debug IPI" since it could now be backed by a
> regular IPI in the case that pseudo NMIs weren't available. With the
> fallback, this allowed me to drop some extra patches from the
> series. This feels (to me) to be pretty clean and hopefully others
> agree. Any patch I touched significantly I removed Masayoshi and
> Chen-Yu's tags from.
>
> ...also in v8, I reorderd the patches a bit in a way that seemed a
> little cleaner to me.
This cleanup looks good to me.
-Sumit
>
> Since v7, I have:
> * Addressed the small amount of feedback that was there for v7.
> * Rebased.
> * Added a new patch that prevents us from spamming the logs with idle
> tasks.
> * Added an extra patch to gracefully fall back to regular IPIs if
> pseudo-NMIs aren't there.
>
> It can be noted that this patch series works very well with the recent
> "hardlockup" patches that have landed through Andrew Morton's tree and
> are currently in linuxnext. It works especially well with the "buddy"
> lockup detector.
>
> [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> [2] https://lore.kernel.org/lkml/ZFvGqD%2F%2Fpm%[email protected]/
> [3] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
>
> Changes in v9:
> - Add a warning if we don't have enough IPIs for the NMI IPI
> - Added comments that we might not be using NMI always.
> - Added missing "inline"
> - Added to commit message that this doesn't catch all cases.
> - Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
> - Moved header file out of "include" since it didn't need to be there.
> - Remove arm64_supports_nmi()
> - Remove fallback for when debug IPI isn't available.
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> - Update commit description
> - arch_trigger_cpumask_backtrace() no longer returns bool
>
> Changes in v8:
> - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
> - "Tag the arm64 idle functions as __cpuidle" new for v8
> - Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
> - debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
>
> Douglas Anderson (2):
> arm64: idle: Tag the arm64 idle functions as __cpuidle
> kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
>
> Sumit Garg (5):
> irqchip/gic-v3: Enable support for SGIs to act as NMIs
> arm64: Add framework for a debug IPI
> arm64: smp: Assign and setup the debug IPI
> arm64: ipi_debug: Add support for backtrace using the debug IPI
> arm64: kgdb: Roundup cpus using the debug IPI
>
> arch/arm64/include/asm/irq.h | 3 +
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/idle.c | 4 +-
> arch/arm64/kernel/ipi_debug.c | 102 ++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/ipi_debug.h | 13 +++++
> arch/arm64/kernel/kgdb.c | 14 +++++
> arch/arm64/kernel/smp.c | 11 ++++
> drivers/irqchip/irq-gic-v3.c | 29 +++++++---
> include/linux/kgdb.h | 1 +
> 9 files changed, 168 insertions(+), 11 deletions(-)
> create mode 100644 arch/arm64/kernel/ipi_debug.c
> create mode 100644 arch/arm64/kernel/ipi_debug.h
>
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
Daniel,
On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <[email protected]> wrote:
>
> To save architectures from needing to wrap the call in #ifdefs, add a
> stub no-op version of kgdb_nmicallback(), which returns 1 if it didn't
> handle anything.
>
> Reviewed-by: Daniel Thompson <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> In v9 this is the only kgdb dependency. I'm assuming it could go
> through the arm64 tree? If that's not a good idea, we could always
> change the patch ("arm64: kgdb: Roundup cpus using IPI as NMI") not to
> depend on it by only calling kgdb_nmicallback() if CONFIG_KGDB is not
> defined.
>
> Changes in v9:
> - Added missing "inline"
>
> Changes in v8:
> - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
>
> include/linux/kgdb.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index 258cdde8d356..76e891ee9e37 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -365,5 +365,6 @@ extern void kgdb_free_init_mem(void);
> #define dbg_late_init()
> static inline void kgdb_panic(const char *msg) {}
> static inline void kgdb_free_init_mem(void) { }
> +static inline int kgdb_nmicallback(int cpu, void *regs) { return 1; }
What do you think about landing just ${SUBJECT} patch in kgdb right
now so it can end up in v6.5-rc1? It seems like this series is
currently blocked on Mark getting a spare moment and it seems unlikely
that'll happen this cycle. If we at least land the kgdb patch then it
would make things all that much easier to land in the next cycle. The
kgdb patch feels like it can make sense on its own...
-Doug
On Thu, Jun 15, 2023 at 11:14:18AM -0700, Doug Anderson wrote:
> Daniel,
>
> On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <[email protected]> wrote:
> >
> > To save architectures from needing to wrap the call in #ifdefs, add a
> > stub no-op version of kgdb_nmicallback(), which returns 1 if it didn't
> > handle anything.
> >
> > Reviewed-by: Daniel Thompson <[email protected]>
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
> > In v9 this is the only kgdb dependency. I'm assuming it could go
> > through the arm64 tree? If that's not a good idea, we could always
> > change the patch ("arm64: kgdb: Roundup cpus using IPI as NMI") not to
> > depend on it by only calling kgdb_nmicallback() if CONFIG_KGDB is not
> > defined.
> >
> > Changes in v9:
> > - Added missing "inline"
> >
> > Changes in v8:
> > - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
> >
> > include/linux/kgdb.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> > index 258cdde8d356..76e891ee9e37 100644
> > --- a/include/linux/kgdb.h
> > +++ b/include/linux/kgdb.h
> > @@ -365,5 +365,6 @@ extern void kgdb_free_init_mem(void);
> > #define dbg_late_init()
> > static inline void kgdb_panic(const char *msg) {}
> > static inline void kgdb_free_init_mem(void) { }
> > +static inline int kgdb_nmicallback(int cpu, void *regs) { return 1; }
>
> What do you think about landing just ${SUBJECT} patch in kgdb right
> now so it can end up in v6.5-rc1? It seems like this series is
> currently blocked on Mark getting a spare moment and it seems unlikely
> that'll happen this cycle. If we at least land the kgdb patch then it
> would make things all that much easier to land in the next cycle. The
> kgdb patch feels like it can make sense on its own...
Yes, grabbing this one should be fine!
Daniel.
Hi folks,
On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <[email protected]> wrote:
>
> This is an attempt to resurrect Sumit's old patch series [1] that
> allowed us to use the arm64 pseudo-NMI to get backtraces of CPUs and
> also to round up CPUs in kdb/kgdb. The last post from Sumit that I
> could find was v7, so I started my series at v8. I haven't copied all
> of his old changelongs here, but you can find them from the link.
>
> I'm really looking for a way to land this patch series. In response to
> v8, Mark Rutland indicated [2] that he was worried about the soundness
> of pseudo NMI. Those definitely need to get fixed, but IMO this patch
> series could still land in the meantime. That would at least let
> people test with it.
>
> Request for anyone reading this: please help indicate your support of
> this patch series landing by replying, even if you don't have the
> background for a full review. My suspicion is that there are a lot of
> people who agree that this would be super useful to get landed.
>
> Since v8, I have cleaned up this patch series by integrating the 10th
> patch from v8 [3] into the whole series. As part of this, I renamed
> the "NMI IPI" to the "debug IPI" since it could now be backed by a
> regular IPI in the case that pseudo NMIs weren't available. With the
> fallback, this allowed me to drop some extra patches from the
> series. This feels (to me) to be pretty clean and hopefully others
> agree. Any patch I touched significantly I removed Masayoshi and
> Chen-Yu's tags from.
>
> ...also in v8, I reorderd the patches a bit in a way that seemed a
> little cleaner to me.
>
> Since v7, I have:
> * Addressed the small amount of feedback that was there for v7.
> * Rebased.
> * Added a new patch that prevents us from spamming the logs with idle
> tasks.
> * Added an extra patch to gracefully fall back to regular IPIs if
> pseudo-NMIs aren't there.
>
> It can be noted that this patch series works very well with the recent
> "hardlockup" patches that have landed through Andrew Morton's tree and
> are currently in linuxnext. It works especially well with the "buddy"
> lockup detector.
>
> [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
> [2] https://lore.kernel.org/lkml/ZFvGqD%2F%2Fpm%[email protected]/
> [3] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
>
> Changes in v9:
> - Add a warning if we don't have enough IPIs for the NMI IPI
> - Added comments that we might not be using NMI always.
> - Added missing "inline"
> - Added to commit message that this doesn't catch all cases.
> - Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
> - Moved header file out of "include" since it didn't need to be there.
> - Remove arm64_supports_nmi()
> - Remove fallback for when debug IPI isn't available.
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> - Update commit description
> - arch_trigger_cpumask_backtrace() no longer returns bool
>
> Changes in v8:
> - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
> - "Tag the arm64 idle functions as __cpuidle" new for v8
> - Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
> - debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
>
> Douglas Anderson (2):
> arm64: idle: Tag the arm64 idle functions as __cpuidle
> kgdb: Provide a stub kgdb_nmicallback() if !CONFIG_KGDB
>
> Sumit Garg (5):
> irqchip/gic-v3: Enable support for SGIs to act as NMIs
> arm64: Add framework for a debug IPI
> arm64: smp: Assign and setup the debug IPI
> arm64: ipi_debug: Add support for backtrace using the debug IPI
> arm64: kgdb: Roundup cpus using the debug IPI
>
> arch/arm64/include/asm/irq.h | 3 +
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/idle.c | 4 +-
> arch/arm64/kernel/ipi_debug.c | 102 ++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/ipi_debug.h | 13 +++++
> arch/arm64/kernel/kgdb.c | 14 +++++
> arch/arm64/kernel/smp.c | 11 ++++
> drivers/irqchip/irq-gic-v3.c | 29 +++++++---
> include/linux/kgdb.h | 1 +
> 9 files changed, 168 insertions(+), 11 deletions(-)
I'm looking for some ideas on what to do to move this patch series
forward. Thanks to Daniel, the kgdb patch is now in Linus's tree which
hopefully makes this simpler to land. I guess there is still the
irqchip dependency that will need to be sorted out, though...
Even if folks aren't in agreement about whether this is ready to be
enabled in production, I don't think anything here is super
objectionable or controversial, is it? Can we land it? If you feel
like it needs extra review, would it help if I tried to drum up some
extra people to provide review feedback?
Also: in case it's interesting to anyone, I've been doing benchmarks
on sc7180-trogdor devices in preparation for enabling this. On that
platform, I did manage to see about 4% reduction in a set of hackbench
numbers when fully enabling pseudo-NMI. However, when I instead ran
Speedometer 2.1 I saw no difference. See:
https://issuetracker.google.com/issues/197061987
-Doug
On Thu, Jun 01, 2023 at 02:31:48PM -0700, Douglas Anderson wrote:
> From: Sumit Garg <[email protected]>
>
> All current arm64 interrupt controllers have at least 8
> IPIs. Currently we are only using 7 of them on arm64. Let's use the
> 8th one as a debug IPI. This uses the new "debug IPI" infrastructure
> which will try to allocate this IPI as an NMI/pseudo NMI if possible.
>
> Signed-off-by: Sumit Garg <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
I think with my suggestion on the prior patch, we don't need the additional
logic here.
> ---
> I could imagine that people object to using up the last free IPI on
> interrupt controllers with only 8 IPIs. However, it shouldn't be a big
> deal. If we later need an extra IPI, it shouldn't be too hard to
> combine some of the existing ones. Presumably we could just get rid of
> the "crash stop" IPI and have the normal "stop" IPI do the crash if
> "waiting_for_crash_ipi" is non-zero
TBH, I'd love to unify the logic for IPI_CPU_STOP and IPI_CPU_CRASH_STOP as
they have a bunch of pointless divergence.
We could also remove IPI_WAKEUP and replace that with something else (e.g.
IPI_CALL_FUNC with an empty function) in order to free up an IPI.
I'd *also* prefer to have separate IPI_CPU_BACKTRACE and IPI_CPU_DEBUG as the
backtrace logic is used for a bunch of things other than KGDB (e.g. RCU stalls,
panic), and that would clearly separate the logic for the two cases.
Thanks,
Mark.
>
> Changes in v9:
> - Add a warning if we don't have enough IPIs for the NMI IPI
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> - Update commit description
>
> Changes in v8:
> - debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
>
> arch/arm64/kernel/smp.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> index edd63894d61e..db019b49d3bd 100644
> --- a/arch/arm64/kernel/smp.c
> +++ b/arch/arm64/kernel/smp.c
> @@ -53,6 +53,8 @@
>
> #include <trace/events/ipi.h>
>
> +#include "ipi_debug.h"
> +
> DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number);
> EXPORT_PER_CPU_SYMBOL(cpu_number);
>
> @@ -935,6 +937,8 @@ static void ipi_setup(int cpu)
>
> for (i = 0; i < nr_ipi; i++)
> enable_percpu_irq(ipi_irq_base + i, 0);
> +
> + debug_ipi_setup();
> }
>
> #ifdef CONFIG_HOTPLUG_CPU
> @@ -947,6 +951,8 @@ static void ipi_teardown(int cpu)
>
> for (i = 0; i < nr_ipi; i++)
> disable_percpu_irq(ipi_irq_base + i);
> +
> + debug_ipi_teardown();
> }
> #endif
>
> @@ -968,6 +974,11 @@ void __init set_smp_ipi_range(int ipi_base, int n)
> irq_set_status_flags(ipi_base + i, IRQ_HIDDEN);
> }
>
> + if (n > nr_ipi)
> + set_smp_debug_ipi(ipi_base + nr_ipi);
> + else
> + WARN(1, "Not enough IPIs for NMI IPI\n");
> +
> ipi_irq_base = ipi_base;
>
> /* Setup the boot CPU immediately */
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
On Thu, Jun 01, 2023 at 02:31:50PM -0700, Douglas Anderson wrote:
> To save architectures from needing to wrap the call in #ifdefs, add a
> stub no-op version of kgdb_nmicallback(), which returns 1 if it didn't
> handle anything.
>
> Reviewed-by: Daniel Thompson <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
> In v9 this is the only kgdb dependency. I'm assuming it could go
> through the arm64 tree? If that's not a good idea, we could always
> change the patch ("arm64: kgdb: Roundup cpus using IPI as NMI") not to
> depend on it by only calling kgdb_nmicallback() if CONFIG_KGDB is not
> defined.
>
> Changes in v9:
> - Added missing "inline"
>
> Changes in v8:
> - "Provide a stub kgdb_nmicallback() if !CONFIG_KGDB" new for v8
>
> include/linux/kgdb.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
> index 258cdde8d356..76e891ee9e37 100644
> --- a/include/linux/kgdb.h
> +++ b/include/linux/kgdb.h
> @@ -365,5 +365,6 @@ extern void kgdb_free_init_mem(void);
> #define dbg_late_init()
> static inline void kgdb_panic(const char *msg) {}
> static inline void kgdb_free_init_mem(void) { }
> +static inline int kgdb_nmicallback(int cpu, void *regs) { return 1; }
Is '1' an error?
Can we return an actual error code if so? It makes it *much* clearer to anyone
looking at the code.
Thanks,
Mark.
> #endif /* ! CONFIG_KGDB */
> #endif /* _KGDB_H_ */
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
> From: Sumit Garg <[email protected]>
>
> Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
> IPI is backed by an NMI (or pseudo NMI) then this will let us debug
> even hard locked CPUs. When the debug IPI isn't backed by an NMI then
> this won't really have any huge benefit but it will still work.
>
> Signed-off-by: Sumit Garg <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> Changes in v9:
> - Remove fallback for when debug IPI isn't available.
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
>
> arch/arm64/kernel/ipi_debug.c | 5 +++++
> arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
> 2 files changed, 19 insertions(+)
This looks fine to me, but I'd feel a bit happier if we had separate SGIs for
the backtrace and the KGDB callback as they're logically unrelated.
Thanks,
Mark.
>
> diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c
> index 6984ed507e1f..5794894d94f1 100644
> --- a/arch/arm64/kernel/ipi_debug.c
> +++ b/arch/arm64/kernel/ipi_debug.c
> @@ -8,6 +8,7 @@
>
> #include <linux/interrupt.h>
> #include <linux/irq.h>
> +#include <linux/kgdb.h>
> #include <linux/nmi.h>
> #include <linux/smp.h>
>
> @@ -40,6 +41,7 @@ void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
> static irqreturn_t ipi_debug_handler(int irq, void *data)
> {
> irqreturn_t ret = IRQ_NONE;
> + unsigned int cpu = smp_processor_id();
>
> /*
> * NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling
> @@ -49,6 +51,9 @@ static irqreturn_t ipi_debug_handler(int irq, void *data)
> if (nmi_cpu_backtrace(get_irq_regs()))
> ret = IRQ_HANDLED;
>
> + if (!kgdb_nmicallback(cpu, get_irq_regs()))
> + ret = IRQ_HANDLED;
> +
> return ret;
> }
>
> diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
> index 4e1f983df3d1..9c4c47507cd4 100644
> --- a/arch/arm64/kernel/kgdb.c
> +++ b/arch/arm64/kernel/kgdb.c
> @@ -20,6 +20,8 @@
> #include <asm/patching.h>
> #include <asm/traps.h>
>
> +#include "ipi_debug.h"
> +
> struct dbg_reg_def_t dbg_reg_def[DBG_MAX_REG_NUM] = {
> { "x0", 8, offsetof(struct pt_regs, regs[0])},
> { "x1", 8, offsetof(struct pt_regs, regs[1])},
> @@ -356,3 +358,15 @@ int kgdb_arch_remove_breakpoint(struct kgdb_bkpt *bpt)
> return aarch64_insn_write((void *)bpt->bpt_addr,
> *(u32 *)bpt->saved_instr);
> }
> +
> +void kgdb_roundup_cpus(void)
> +{
> + struct cpumask mask;
> +
> + cpumask_copy(&mask, cpu_online_mask);
> + cpumask_clear_cpu(raw_smp_processor_id(), &mask);
> + if (cpumask_empty(&mask))
> + return;
> +
> + arm64_debug_ipi(&mask);
> +}
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
On Thu, Jun 01, 2023 at 02:31:47PM -0700, Douglas Anderson wrote:
> From: Sumit Garg <[email protected]>
>
> Introduce a framework for an IPI that will be used for debug
> purposes. The primary use case of this IPI will be to generate stack
> crawls on other CPUs, but it will also be used to round up CPUs for
> kgdb.
>
> When possible, we try to allocate this debug IPI as an NMI (or a
> pseudo NMI). If that fails (due to CONFIG, an incompatible interrupt
> controller, a quirk, missing the "irqchip.gicv3_pseudo_nmi=1" kernel
> parameter, etc) we fall back to a normal IPI.
>
> NOTE: hooking this up for CPU backtrace / kgdb will happen in a future
> patch, this just adds the framework.
>
> Signed-off-by: Sumit Garg <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
I think that we shouldn't add a framework in a separate file for this:
* This is very similar to our existing IPI management in smp.c, so it feels
like duplication, or at least another thing we'd like to keep in-sync.
* We're going to want an NMI backtrace regardless of KGDB
* We're going to want the IPI_CPU_STOP and IPI_CRASH_CPU_STOP IPIs to be NMIs
too.
I reckon it'd be better to extend the existing IPI logic in smp.c to allow IPIs
to be requested as NMIs, e.g.
----
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index edd63894d61e8..48e6aa62c473e 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -33,6 +33,7 @@
#include <linux/kernel_stat.h>
#include <linux/kexec.h>
#include <linux/kvm_host.h>
+#include <linux/nmi.h>
#include <asm/alternative.h>
#include <asm/atomic.h>
@@ -926,6 +927,21 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
__ipi_send_mask(ipi_desc[ipinr], target);
}
+static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
+{
+ if (!system_uses_irq_prio_masking())
+ return false;
+
+ switch (ipi) {
+ /*
+ * TODO: select NMI IPIs here
+ */
+ return true;
+ default:
+ return false;
+ }
+}
+
static void ipi_setup(int cpu)
{
int i;
@@ -933,8 +949,14 @@ static void ipi_setup(int cpu)
if (WARN_ON_ONCE(!ipi_irq_base))
return;
- for (i = 0; i < nr_ipi; i++)
- enable_percpu_irq(ipi_irq_base + i, 0);
+ for (i = 0; i < nr_ipi; i++) {
+ if (ipi_should_be_nmi(i)) {
+ prepare_percpu_nmi(ipi_irq_base + i);
+ enable_percpu_nmi(ipi_irq_base + i, 0);
+ } else {
+ enable_percpu_irq(ipi_irq_base + i, 0);
+ }
+ }
}
#ifdef CONFIG_HOTPLUG_CPU
@@ -945,8 +967,14 @@ static void ipi_teardown(int cpu)
if (WARN_ON_ONCE(!ipi_irq_base))
return;
- for (i = 0; i < nr_ipi; i++)
- disable_percpu_irq(ipi_irq_base + i);
+ for (i = 0; i < nr_ipi; i++) {
+ if (ipi_should_be_nmi(i)) {
+ disable_percpu_nmi(ipi_irq_base + i);
+ teardown_percpu_nmi(ipi_irq_base + i);
+ } else {
+ disable_percpu_irq(ipi_irq_base + i);
+ }
+ }
}
#endif
@@ -958,11 +986,19 @@ void __init set_smp_ipi_range(int ipi_base, int n)
nr_ipi = min(n, NR_IPI);
for (i = 0; i < nr_ipi; i++) {
- int err;
-
- err = request_percpu_irq(ipi_base + i, ipi_handler,
- "IPI", &cpu_number);
- WARN_ON(err);
+ int err = -EINVAL;
+
+ if (ipi_should_be_nmi(i)) {
+ err = request_percpu_nmi(ipi_base + i, ipi_handler,
+ "IPI", &cpu_number);
+ WARN(err, "Could not request IPI %d as NMI, err=%d\n",
+ i, err);
+ } else {
+ err = request_percpu_irq(ipi_base + i, ipi_handler,
+ "IPI", &cpu_number);
+ WARN(err, "Could not request IPI %d as IRQ, err=%d\n",
+ i, err);
+ }
ipi_desc[i] = irq_to_desc(ipi_base + i);
irq_set_status_flags(ipi_base + i, IRQ_HIDDEN);
----
... and then if we need an IPI for KGDB, we can add that to the existing list
of IPIs, and have it requested/enabled/disabled as usual.
Thanks,
Mark.
> ---
> I didn't get any feedback from v8 patch #10 [1], but I went ahead and
> folded it in here anyway since it really simplfies things. If people
> don't like the fallback to regular IPI, I can also undo it.
>
> [1] https://lore.kernel.org/r/20230419155341.v8.10.Ic3659997d6243139d0522fc3afcdfd88d7a5f030@changeid/
>
> Changes in v9:
> - Fold in v8 patch #10 ("Fallback to a regular IPI if NMI isn't enabled")
> - Moved header file out of "include" since it didn't need to be there.
> - Remove arm64_supports_nmi()
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
>
> Changes in v8:
> - debug_ipi_setup() and debug_ipi_teardown() no longer take cpu param
>
> arch/arm64/kernel/Makefile | 2 +-
> arch/arm64/kernel/ipi_debug.c | 76 +++++++++++++++++++++++++++++++++++
> arch/arm64/kernel/ipi_debug.h | 13 ++++++
> 3 files changed, 90 insertions(+), 1 deletion(-)
> create mode 100644 arch/arm64/kernel/ipi_debug.c
> create mode 100644 arch/arm64/kernel/ipi_debug.h
>
> diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
> index cc22011ab66a..737838f803b7 100644
> --- a/arch/arm64/kernel/Makefile
> +++ b/arch/arm64/kernel/Makefile
> @@ -34,7 +34,7 @@ obj-y := debug-monitors.o entry.o irq.o fpsimd.o \
> cpufeature.o alternative.o cacheinfo.o \
> smp.o smp_spin_table.o topology.o smccc-call.o \
> syscall.o proton-pack.o idreg-override.o idle.o \
> - patching.o
> + patching.o ipi_debug.o
>
> obj-$(CONFIG_COMPAT) += sys32.o signal32.o \
> sys_compat.o
> diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c
> new file mode 100644
> index 000000000000..b57833e31eaf
> --- /dev/null
> +++ b/arch/arm64/kernel/ipi_debug.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Debug IPI support
> + *
> + * Copyright (C) 2020 Linaro Limited
> + * Author: Sumit Garg <[email protected]>
> + */
> +
> +#include <linux/interrupt.h>
> +#include <linux/irq.h>
> +#include <linux/smp.h>
> +
> +#include "ipi_debug.h"
> +
> +static struct irq_desc *ipi_debug_desc __read_mostly;
> +static int ipi_debug_id __read_mostly;
> +static bool is_nmi;
> +
> +void arm64_debug_ipi(cpumask_t *mask)
> +{
> + if (WARN_ON_ONCE(!ipi_debug_desc))
> + return;
> +
> + __ipi_send_mask(ipi_debug_desc, mask);
> +}
> +
> +static irqreturn_t ipi_debug_handler(int irq, void *data)
> +{
> + /* nop, NMI handlers for special features can be added here. */
> +
> + return IRQ_NONE;
> +}
> +
> +void debug_ipi_setup(void)
> +{
> + if (!ipi_debug_desc)
> + return;
> +
> + if (is_nmi) {
> + if (!prepare_percpu_nmi(ipi_debug_id))
> + enable_percpu_nmi(ipi_debug_id, IRQ_TYPE_NONE);
> + } else {
> + enable_percpu_irq(ipi_debug_id, IRQ_TYPE_NONE);
> + }
> +}
> +
> +void debug_ipi_teardown(void)
> +{
> + if (!ipi_debug_desc)
> + return;
> +
> + if (is_nmi) {
> + disable_percpu_nmi(ipi_debug_id);
> + teardown_percpu_nmi(ipi_debug_id);
> + } else {
> + disable_percpu_irq(ipi_debug_id);
> + }
> +}
> +
> +void __init set_smp_debug_ipi(int ipi)
> +{
> + int err;
> +
> + if (!request_percpu_nmi(ipi, ipi_debug_handler, "IPI", &cpu_number)) {
> + is_nmi = true;
> + } else {
> + err = request_percpu_irq(ipi, ipi_debug_handler, "IPI", &cpu_number);
> + if (WARN_ON(err))
> + return;
> +
> + irq_set_status_flags(ipi, IRQ_HIDDEN);
> + }
> +
> + ipi_debug_desc = irq_to_desc(ipi);
> + ipi_debug_id = ipi;
> +}
> diff --git a/arch/arm64/kernel/ipi_debug.h b/arch/arm64/kernel/ipi_debug.h
> new file mode 100644
> index 000000000000..f6011a09282f
> --- /dev/null
> +++ b/arch/arm64/kernel/ipi_debug.h
> @@ -0,0 +1,13 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef __ASM_NMI_H
> +#define __ASM_NMI_H
> +
> +#include <linux/cpumask.h>
> +
> +void arm64_debug_ipi(cpumask_t *mask);
> +
> +void set_smp_debug_ipi(int ipi);
> +void debug_ipi_setup(void);
> +void debug_ipi_teardown(void);
> +
> +#endif
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
On Thu, Jun 01, 2023 at 02:31:46PM -0700, Douglas Anderson wrote:
> As per the (somewhat recent) comment before the definition of
> `__cpuidle`, the tag is like `noinstr` but also marks a function so it
> can be identified by cpu_in_idle(). Let'a add this.
>
> After doing this then when we dump stack traces of all processors
> using nmi_cpu_backtrace() then instead of getting useless backtraces
> we get things like:
>
> NMI backtrace for cpu N skipped: idling at cpu_do_idle+0x94/0x98
>
> NOTE: this patch won't make cpu_in_idle() work perfectly for arm64,
> but it doesn't hurt and does catch some cases. Specifically an example
> that wasn't caught in my testing looked like this:
>
> gic_cpu_sys_reg_init+0x1f8/0x314
> gic_cpu_pm_notifier+0x40/0x78
> raw_notifier_call_chain+0x5c/0x134
> cpu_pm_notify+0x38/0x64
> cpu_pm_exit+0x20/0x2c
> psci_enter_idle_state+0x48/0x70
> cpuidle_enter_state+0xb8/0x260
> cpuidle_enter+0x44/0x5c
> do_idle+0x188/0x30c
>
> Signed-off-by: Douglas Anderson <[email protected]>
I don't have strong feelings either way for this, so:
Acked-by: Mark Rutland <[email protected]>
Thanks,
Mark.
> ---
>
> Changes in v9:
> - Added to commit message that this doesn't catch all cases.
>
> Changes in v8:
> - "Tag the arm64 idle functions as __cpuidle" new for v8
>
> arch/arm64/kernel/idle.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/kernel/idle.c b/arch/arm64/kernel/idle.c
> index c1125753fe9b..05cfb347ec26 100644
> --- a/arch/arm64/kernel/idle.c
> +++ b/arch/arm64/kernel/idle.c
> @@ -20,7 +20,7 @@
> * ensure that interrupts are not masked at the PMR (because the core will
> * not wake up if we block the wake up signal in the interrupt controller).
> */
> -void noinstr cpu_do_idle(void)
> +void __cpuidle cpu_do_idle(void)
> {
> struct arm_cpuidle_irq_context context;
>
> @@ -35,7 +35,7 @@ void noinstr cpu_do_idle(void)
> /*
> * This is our default idle handler.
> */
> -void noinstr arch_cpu_idle(void)
> +void __cpuidle arch_cpu_idle(void)
> {
> /*
> * This should do all the clock switching and wait for interrupt
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
On Thu, Jun 01, 2023 at 02:31:49PM -0700, Douglas Anderson wrote:
> From: Sumit Garg <[email protected]>
>
> Enable arch_trigger_cpumask_backtrace() support on arm64 using the new
> debug IPI. With this arm64 can now get backtraces in cases where
> callers of the trigger_xyz_backtrace() class of functions don't check
> the return code and implement a fallback. One example is
> `kernel.softlockup_all_cpu_backtrace`. This also allows us to
> backtrace hard locked up CPUs in cases where the debug IPI is backed
> by an NMI (or pseudo NMI).
>
> Signed-off-by: Sumit Garg <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> Changes in v9:
> - Added comments that we might not be using NMI always.
> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> - arch_trigger_cpumask_backtrace() no longer returns bool
>
> Changes in v8:
> - Removed "#ifdef CONFIG_SMP" since arm64 is always SMP
>
> arch/arm64/include/asm/irq.h | 3 +++
> arch/arm64/kernel/ipi_debug.c | 25 +++++++++++++++++++++++--
> 2 files changed, 26 insertions(+), 2 deletions(-)
As with prior patches, I'd prefer if this lived in smp.c with the other IPI
logic.
>
> diff --git a/arch/arm64/include/asm/irq.h b/arch/arm64/include/asm/irq.h
> index fac08e18bcd5..be2d103f316e 100644
> --- a/arch/arm64/include/asm/irq.h
> +++ b/arch/arm64/include/asm/irq.h
> @@ -6,6 +6,9 @@
>
> #include <asm-generic/irq.h>
>
> +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self);
> +#define arch_trigger_cpumask_backtrace arch_trigger_cpumask_backtrace
> +
> struct pt_regs;
>
> int set_handle_irq(void (*handle_irq)(struct pt_regs *));
> diff --git a/arch/arm64/kernel/ipi_debug.c b/arch/arm64/kernel/ipi_debug.c
> index b57833e31eaf..6984ed507e1f 100644
> --- a/arch/arm64/kernel/ipi_debug.c
> +++ b/arch/arm64/kernel/ipi_debug.c
> @@ -8,6 +8,7 @@
>
> #include <linux/interrupt.h>
> #include <linux/irq.h>
> +#include <linux/nmi.h>
> #include <linux/smp.h>
>
> #include "ipi_debug.h"
> @@ -24,11 +25,31 @@ void arm64_debug_ipi(cpumask_t *mask)
> __ipi_send_mask(ipi_debug_desc, mask);
> }
>
> +void arch_trigger_cpumask_backtrace(const cpumask_t *mask, bool exclude_self)
> +{
> + /*
> + * NOTE: though nmi_trigger_cpumask_backtrace has "nmi_" in the name,
> + * nothing about it truly needs to be backed by an NMI, it's just that
> + * it's _allowed_ to be called from an NMI. If set_smp_debug_ipi()
> + * failed to get an NMI (or pseudo-NMI) this will just be backed by a
> + * regular IPI.
> + */
> + nmi_trigger_cpumask_backtrace(mask, exclude_self, arm64_debug_ipi);
> +}
> +
> static irqreturn_t ipi_debug_handler(int irq, void *data)
> {
> - /* nop, NMI handlers for special features can be added here. */
> + irqreturn_t ret = IRQ_NONE;
> +
> + /*
> + * NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling
> + * a function with "nmi_" in the name but it works fine even if we
> + * are using a regulaor IPI.
> + */
> + if (nmi_cpu_backtrace(get_irq_regs()))
> + ret = IRQ_HANDLED;
>
Does this need the printk_deferred_{enter,exit}() that 32-bit arm has?
Thanks,
Mark.
> - return IRQ_NONE;
> + return ret;
> }
>
> void debug_ipi_setup(void)
> --
> 2.41.0.rc2.161.g9c6817b8e7-goog
>
Hi Doug,
On Thu, Jun 01, 2023 at 02:31:45PM -0700, Douglas Anderson wrote:
> From: Sumit Garg <[email protected]>
>
> Add support to handle SGIs as pseudo NMIs. As SGIs or IPIs default to a
> special flow handler: handle_percpu_devid_fasteoi_ipi(), so skip NMI
> handler update in case of SGIs.
I couldn't find handle_percpu_devid_fasteoi_ipi() in mainline, and when
researching I found that we changed that in commit:
6abbd6988971aaa6 ("irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()")
... which was in v5.11, so it looks like this is stale?
Since that commit, SGIs are treated the same as PPIs/EPPIs, and use
handle_percpu_devid_irq() by default.
IIUC handle_percpu_devid_irq() isn't NMI safe, and so to run in an NMI context
those should use handle_percpu_devid_fasteoi_nmi().
Marc, does that sound right to you? i.e. SGI NMIs should be handled exactly the
same as PPI NMIs, and use handle_percpu_devid_fasteoi_nmi()?
I have some comments below assuming that SGI NMIs should use
handle_percpu_devid_fasteoi_nmi().
> Also, enable NMI support prior to gic_smp_init() as allocation of SGIs
> as IRQs/NMIs happen as part of this routine.
This bit looks fine to me.
> Signed-off-by: Sumit Garg <[email protected]>
> Reviewed-by: Masayoshi Mizuma <[email protected]>
> Tested-by: Chen-Yu Tsai <[email protected]>
> Signed-off-by: Douglas Anderson <[email protected]>
> ---
>
> (no changes since v1)
>
> drivers/irqchip/irq-gic-v3.c | 29 +++++++++++++++++++++--------
> 1 file changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index 0c6c1af9a5b7..ed37e02d4c5f 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -525,6 +525,7 @@ static u32 gic_get_ppi_index(struct irq_data *d)
> static int gic_irq_nmi_setup(struct irq_data *d)
> {
> struct irq_desc *desc = irq_to_desc(d->irq);
> + u32 idx;
>
> if (!gic_supports_nmi())
> return -EINVAL;
> @@ -542,16 +543,22 @@ static int gic_irq_nmi_setup(struct irq_data *d)
> return -EINVAL;
>
> /* desc lock should already be held */
> - if (gic_irq_in_rdist(d)) {
> - u32 idx = gic_get_ppi_index(d);
> + switch (get_intid_range(d)) {
> + case SGI_RANGE:
> + break;
> + case PPI_RANGE:
> + case EPPI_RANGE:
> + idx = gic_get_ppi_index(d);
>
> /* Setting up PPI as NMI, only switch handler for first NMI */
> if (!refcount_inc_not_zero(&ppi_nmi_refs[idx])) {
> refcount_set(&ppi_nmi_refs[idx], 1);
> desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> }
> - } else {
> + break;
> + default:
> desc->handle_irq = handle_fasteoi_nmi;
> + break;
> }
As above, I reckon this isn't right, and we should treat all rdist interrupts
(which are all percpu) the same.
I reckon what we should be doing here is make ppi_nmi_refs cover all of the
rdist interrupts (e.g. make that rdist_nmi_refs, add a gic_get_rdist_idx()
helper), and then here have something like:
if (gic_irq_in_rdist(d)) {
u32 idx = gic_get_rdist_idx(d);
/*
* Setting up a percpu interrupt as NMI, only switch handler
* for first NMI
*/
if (!refcount_inc_not_zero(&rdist_nmi_refs[idx])) {
refcount_set(&ppi_nmi_refs[idx], 1);
desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
}
}
... as an aside, it'd be nicer if we could switch the handler at request time,
as then we wouldn't need the refcount at all, but I couldn't see a good irqchip
hook to hang that off, so I don't think that needs to change as a prerequisite.
>
> gic_irq_set_prio(d, GICD_INT_NMI_PRI);
> @@ -562,6 +569,7 @@ static int gic_irq_nmi_setup(struct irq_data *d)
> static void gic_irq_nmi_teardown(struct irq_data *d)
> {
> struct irq_desc *desc = irq_to_desc(d->irq);
> + u32 idx;
>
> if (WARN_ON(!gic_supports_nmi()))
> return;
> @@ -579,14 +587,20 @@ static void gic_irq_nmi_teardown(struct irq_data *d)
> return;
>
> /* desc lock should already be held */
> - if (gic_irq_in_rdist(d)) {
> - u32 idx = gic_get_ppi_index(d);
> + switch (get_intid_range(d)) {
> + case SGI_RANGE:
> + break;
> + case PPI_RANGE:
> + case EPPI_RANGE:
> + idx = gic_get_ppi_index(d);
>
> /* Tearing down NMI, only switch handler for last NMI */
> if (refcount_dec_and_test(&ppi_nmi_refs[idx]))
> desc->handle_irq = handle_percpu_devid_irq;
> - } else {
> + break;
> + default:
> desc->handle_irq = handle_fasteoi_irq;
> + break;
> }
Same comments as for gic_irq_nmi_setup() here.
>
> gic_irq_set_prio(d, GICD_INT_DEF_PRI);
> @@ -2001,6 +2015,7 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
>
> gic_dist_init();
> gic_cpu_init();
> + gic_enable_nmi_support();
> gic_smp_init();
> gic_cpu_pm_init();
>
> @@ -2013,8 +2028,6 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
> gicv2m_init(handle, gic_data.domain);
> }
>
> - gic_enable_nmi_support();
> -
This bit looks fine to me.
Thanks,
Mark.
Hi Doug,
Apologies for the delay.
On Mon, Jul 24, 2023 at 08:55:44AM -0700, Doug Anderson wrote:
> On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <[email protected]> wrote:
> I'm looking for some ideas on what to do to move this patch series
> forward. Thanks to Daniel, the kgdb patch is now in Linus's tree which
> hopefully makes this simpler to land. I guess there is still the
> irqchip dependency that will need to be sorted out, though...
>
> Even if folks aren't in agreement about whether this is ready to be
> enabled in production, I don't think anything here is super
> objectionable or controversial, is it? Can we land it? If you feel
> like it needs extra review, would it help if I tried to drum up some
> extra people to provide review feedback?
Ignoring the soundness issues I mentioned before (which I'm slowly chipping
away at, and you're likely lucky enough to avoid in practice)...
Having looked over the series, I think the GICv3 bit isn't quite right, but is
easy enough to fix. I've commented on the patch with what I think we should
have there.
The only major thing otherwise from my PoV is the structure of the debug IPI
framework. I'm not keen on that being a separate body of code and I think it
should live in smp.c along with the other IPIs. I'd also strongly prefer if we
could have separate IPI_CPU_BACKTRACE and IPI_CPU_KGDB IPIs, and I think we can
do that either by unifying IPI_CPU_STOP && IPI_CPU_CRASH_STOP or by reclaiming
IPI_WAKEUP by reusing a different IPI for the parking protocol (e.g.
IPI_RESCHEDULE).
I think it'd be nice if the series could enable NMIs for backtrace and the
CPU_{,CRASH_}STOP cases, with KGDB being the bonus atop. That way it'd be
clearly beneficial for anyone trying to debug lockups even if they're not a
KGDB user.
> Also: in case it's interesting to anyone, I've been doing benchmarks
> on sc7180-trogdor devices in preparation for enabling this. On that
> platform, I did manage to see about 4% reduction in a set of hackbench
> numbers when fully enabling pseudo-NMI. However, when I instead ran
> Speedometer 2.1 I saw no difference. See:
>
> https://issuetracker.google.com/issues/197061987
Thanks for the pointer!
I know that there are a couple of things that we could do to slightly improve
local_irq_*() when using pNMIs, though I suspect that the bulk of the cost
there will come from the necessary synchronization.
Thanks,
Mark.
On 2023-08-07 11:28, Mark Rutland wrote:
> On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
>> From: Sumit Garg <[email protected]>
>>
>> Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
>> IPI is backed by an NMI (or pseudo NMI) then this will let us debug
>> even hard locked CPUs. When the debug IPI isn't backed by an NMI then
>> this won't really have any huge benefit but it will still work.
>>
>> Signed-off-by: Sumit Garg <[email protected]>
>> Signed-off-by: Douglas Anderson <[email protected]>
>> ---
>>
>> Changes in v9:
>> - Remove fallback for when debug IPI isn't available.
>> - Renamed "NMI IPI" to "debug IPI" since it might not be backed by
>> NMI.
>>
>> arch/arm64/kernel/ipi_debug.c | 5 +++++
>> arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
>> 2 files changed, 19 insertions(+)
>
> This looks fine to me, but I'd feel a bit happier if we had separate
> SGIs for
> the backtrace and the KGDB callback as they're logically unrelated.
Well, we're a bit stuck here.
We have exactly *one* spare SGI with GICv3, as we lose 8 of them
to the secure side. One possibility would be to mux some of the
lesser used IPIs onto two SGIs (one with standard priority, and
one with NMI priority).
M.
--
Jazz is not dead. It just smells funny...
On Mon, Aug 07, 2023 at 11:27:00AM +0100, Mark Rutland wrote:
> On Thu, Jun 01, 2023 at 02:31:50PM -0700, Douglas Anderson wrote:
> > +static inline int kgdb_nmicallback(int cpu, void *regs) { return 1; }
>
> Is '1' an error?
>
> Can we return an actual error code if so? It makes it *much* clearer to anyone
> looking at the code.
Never mind; I see this was already queued. Sorry for the noise.
Thanks,
Mark.
On Mon, Aug 07, 2023 at 11:47:04AM +0100, Marc Zyngier wrote:
> On 2023-08-07 11:28, Mark Rutland wrote:
> > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
> > > From: Sumit Garg <[email protected]>
> > >
> > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
> > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug
> > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then
> > > this won't really have any huge benefit but it will still work.
> > >
> > > Signed-off-by: Sumit Garg <[email protected]>
> > > Signed-off-by: Douglas Anderson <[email protected]>
> > > ---
> > >
> > > Changes in v9:
> > > - Remove fallback for when debug IPI isn't available.
> > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by
> > > NMI.
> > >
> > > arch/arm64/kernel/ipi_debug.c | 5 +++++
> > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
> > > 2 files changed, 19 insertions(+)
> >
> > This looks fine to me, but I'd feel a bit happier if we had separate
> > SGIs for
> > the backtrace and the KGDB callback as they're logically unrelated.
>
> Well, we're a bit stuck here.
>
> We have exactly *one* spare SGI with GICv3, as we lose 8 of them
> to the secure side. One possibility would be to mux some of the
> lesser used IPIs onto two SGIs (one with standard priority, and
> one with NMI priority).
Understood; Doug and I suggested two options for that:
1) Unify/mux the IPI_CPU_STOP and IPI_CPU_CRASH_STOP IPIs
The only *intended* difference between the two is that IPI_CPU_CRASH_STOP
calls crash_save_cpu() before trying to stop the CPU, but the
implementations have diverged significantly for unrelated reasons.
2) Remove IPI_WAKEUP
We only use IPI_WAKEUP for the ACPI parking protocol, and we could reuse
another IPI (e.g. IPI_RESCHEDULE) to achieve the same thing witout a
dedicated IPI.
Thanks,
Mark.
On 2023-08-07 11:54, Mark Rutland wrote:
> On Mon, Aug 07, 2023 at 11:47:04AM +0100, Marc Zyngier wrote:
>> On 2023-08-07 11:28, Mark Rutland wrote:
>> > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
>> > > From: Sumit Garg <[email protected]>
>> > >
>> > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
>> > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug
>> > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then
>> > > this won't really have any huge benefit but it will still work.
>> > >
>> > > Signed-off-by: Sumit Garg <[email protected]>
>> > > Signed-off-by: Douglas Anderson <[email protected]>
>> > > ---
>> > >
>> > > Changes in v9:
>> > > - Remove fallback for when debug IPI isn't available.
>> > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by
>> > > NMI.
>> > >
>> > > arch/arm64/kernel/ipi_debug.c | 5 +++++
>> > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
>> > > 2 files changed, 19 insertions(+)
>> >
>> > This looks fine to me, but I'd feel a bit happier if we had separate
>> > SGIs for
>> > the backtrace and the KGDB callback as they're logically unrelated.
>>
>> Well, we're a bit stuck here.
>>
>> We have exactly *one* spare SGI with GICv3, as we lose 8 of them
>> to the secure side. One possibility would be to mux some of the
>> lesser used IPIs onto two SGIs (one with standard priority, and
>> one with NMI priority).
>
> Understood; Doug and I suggested two options for that:
>
> 1) Unify/mux the IPI_CPU_STOP and IPI_CPU_CRASH_STOP IPIs
>
> The only *intended* difference between the two is that
> IPI_CPU_CRASH_STOP
> calls crash_save_cpu() before trying to stop the CPU, but the
> implementations have diverged significantly for unrelated reasons.
>
> 2) Remove IPI_WAKEUP
>
> We only use IPI_WAKEUP for the ACPI parking protocol, and we could
> reuse
> another IPI (e.g. IPI_RESCHEDULE) to achieve the same thing witout a
> dedicated IPI.
Sure. My concern is that we're papering over the fundamental problem,
which is that IPIs are limited resource, and that we're bound to pile
more stuff on them.
I'm all for reclaiming the ones that can be merged, but we may
ultimately
need a real fix for this.
M.
--
Jazz is not dead. It just smells funny...
Hi Mark,
On Mon, 7 Aug 2023 at 15:20, Mark Rutland <[email protected]> wrote:
>
> Hi Doug,
>
> On Thu, Jun 01, 2023 at 02:31:45PM -0700, Douglas Anderson wrote:
> > From: Sumit Garg <[email protected]>
> >
> > Add support to handle SGIs as pseudo NMIs. As SGIs or IPIs default to a
> > special flow handler: handle_percpu_devid_fasteoi_ipi(), so skip NMI
> > handler update in case of SGIs.
>
> I couldn't find handle_percpu_devid_fasteoi_ipi() in mainline, and when
> researching I found that we changed that in commit:
>
> 6abbd6988971aaa6 ("irqchip/gic, gic-v3: Make SGIs use handle_percpu_devid_irq()")
>
> ... which was in v5.11, so it looks like this is stale?
The last time I tested this patchset (v7 [1]) was with kernel
v5.9.0-rc3. So I agree with you that later
handle_percpu_devid_fasteoi_ipi() was removed and the SGI handling
flow almost became identical to PPI.
[1] https://lore.kernel.org/linux-arm-kernel/[email protected]/
>
> Since that commit, SGIs are treated the same as PPIs/EPPIs, and use
> handle_percpu_devid_irq() by default.
>
> IIUC handle_percpu_devid_irq() isn't NMI safe, and so to run in an NMI context
> those should use handle_percpu_devid_fasteoi_nmi().
True.
>
> Marc, does that sound right to you? i.e. SGI NMIs should be handled exactly the
> same as PPI NMIs, and use handle_percpu_devid_fasteoi_nmi()?
>
> I have some comments below assuming that SGI NMIs should use
> handle_percpu_devid_fasteoi_nmi().
>
This sounds fine to me.
> > Also, enable NMI support prior to gic_smp_init() as allocation of SGIs
> > as IRQs/NMIs happen as part of this routine.
>
> This bit looks fine to me.
>
> > Signed-off-by: Sumit Garg <[email protected]>
> > Reviewed-by: Masayoshi Mizuma <[email protected]>
> > Tested-by: Chen-Yu Tsai <[email protected]>
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
> >
> > (no changes since v1)
> >
> > drivers/irqchip/irq-gic-v3.c | 29 +++++++++++++++++++++--------
> > 1 file changed, 21 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> > index 0c6c1af9a5b7..ed37e02d4c5f 100644
> > --- a/drivers/irqchip/irq-gic-v3.c
> > +++ b/drivers/irqchip/irq-gic-v3.c
> > @@ -525,6 +525,7 @@ static u32 gic_get_ppi_index(struct irq_data *d)
> > static int gic_irq_nmi_setup(struct irq_data *d)
> > {
> > struct irq_desc *desc = irq_to_desc(d->irq);
> > + u32 idx;
> >
> > if (!gic_supports_nmi())
> > return -EINVAL;
> > @@ -542,16 +543,22 @@ static int gic_irq_nmi_setup(struct irq_data *d)
> > return -EINVAL;
> >
> > /* desc lock should already be held */
> > - if (gic_irq_in_rdist(d)) {
> > - u32 idx = gic_get_ppi_index(d);
> > + switch (get_intid_range(d)) {
> > + case SGI_RANGE:
> > + break;
> > + case PPI_RANGE:
> > + case EPPI_RANGE:
> > + idx = gic_get_ppi_index(d);
> >
> > /* Setting up PPI as NMI, only switch handler for first NMI */
> > if (!refcount_inc_not_zero(&ppi_nmi_refs[idx])) {
> > refcount_set(&ppi_nmi_refs[idx], 1);
> > desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> > }
> > - } else {
> > + break;
> > + default:
> > desc->handle_irq = handle_fasteoi_nmi;
> > + break;
> > }
>
> As above, I reckon this isn't right, and we should treat all rdist interrupts
> (which are all percpu) the same.
>
> I reckon what we should be doing here is make ppi_nmi_refs cover all of the
> rdist interrupts (e.g. make that rdist_nmi_refs, add a gic_get_rdist_idx()
> helper), and then here have something like:
>
> if (gic_irq_in_rdist(d)) {
> u32 idx = gic_get_rdist_idx(d);
>
> /*
> * Setting up a percpu interrupt as NMI, only switch handler
> * for first NMI
> */
> if (!refcount_inc_not_zero(&rdist_nmi_refs[idx])) {
> refcount_set(&ppi_nmi_refs[idx], 1);
> desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> }
> }
It looks like you missed the else part here as follows for all other
interrupt types:
} else {
desc->handle_irq = handle_fasteoi_nmi;
}
Apart from that, your logic sounds good to me.
-Sumit
>
> ... as an aside, it'd be nicer if we could switch the handler at request time,
> as then we wouldn't need the refcount at all, but I couldn't see a good irqchip
> hook to hang that off, so I don't think that needs to change as a prerequisite.
>
> >
> > gic_irq_set_prio(d, GICD_INT_NMI_PRI);
> > @@ -562,6 +569,7 @@ static int gic_irq_nmi_setup(struct irq_data *d)
> > static void gic_irq_nmi_teardown(struct irq_data *d)
> > {
> > struct irq_desc *desc = irq_to_desc(d->irq);
> > + u32 idx;
> >
> > if (WARN_ON(!gic_supports_nmi()))
> > return;
> > @@ -579,14 +587,20 @@ static void gic_irq_nmi_teardown(struct irq_data *d)
> > return;
> >
> > /* desc lock should already be held */
> > - if (gic_irq_in_rdist(d)) {
> > - u32 idx = gic_get_ppi_index(d);
> > + switch (get_intid_range(d)) {
> > + case SGI_RANGE:
> > + break;
> > + case PPI_RANGE:
> > + case EPPI_RANGE:
> > + idx = gic_get_ppi_index(d);
> >
> > /* Tearing down NMI, only switch handler for last NMI */
> > if (refcount_dec_and_test(&ppi_nmi_refs[idx]))
> > desc->handle_irq = handle_percpu_devid_irq;
> > - } else {
> > + break;
> > + default:
> > desc->handle_irq = handle_fasteoi_irq;
> > + break;
> > }
>
> Same comments as for gic_irq_nmi_setup() here.
>
> >
> > gic_irq_set_prio(d, GICD_INT_DEF_PRI);
> > @@ -2001,6 +2015,7 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
> >
> > gic_dist_init();
> > gic_cpu_init();
> > + gic_enable_nmi_support();
> > gic_smp_init();
> > gic_cpu_pm_init();
> >
> > @@ -2013,8 +2028,6 @@ static int __init gic_init_bases(phys_addr_t dist_phys_base,
> > gicv2m_init(handle, gic_data.domain);
> > }
> >
> > - gic_enable_nmi_support();
> > -
>
> This bit looks fine to me.
>
> Thanks,
> Mark.
On Mon, Aug 07, 2023 at 12:08:06PM +0100, Marc Zyngier wrote:
> On 2023-08-07 11:54, Mark Rutland wrote:
> > On Mon, Aug 07, 2023 at 11:47:04AM +0100, Marc Zyngier wrote:
> > > On 2023-08-07 11:28, Mark Rutland wrote:
> > > > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
> > > > > From: Sumit Garg <[email protected]>
> > > > >
> > > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
> > > > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug
> > > > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then
> > > > > this won't really have any huge benefit but it will still work.
> > > > >
> > > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > > Signed-off-by: Douglas Anderson <[email protected]>
> > > > > ---
> > > > >
> > > > > Changes in v9:
> > > > > - Remove fallback for when debug IPI isn't available.
> > > > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by
> > > > > NMI.
> > > > >
> > > > > arch/arm64/kernel/ipi_debug.c | 5 +++++
> > > > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
> > > > > 2 files changed, 19 insertions(+)
> > > >
> > > > This looks fine to me, but I'd feel a bit happier if we had separate
> > > > SGIs for
> > > > the backtrace and the KGDB callback as they're logically unrelated.
> > >
> > > Well, we're a bit stuck here.
> > >
> > > We have exactly *one* spare SGI with GICv3, as we lose 8 of them
> > > to the secure side. One possibility would be to mux some of the
> > > lesser used IPIs onto two SGIs (one with standard priority, and
> > > one with NMI priority).
> >
> > Understood; Doug and I suggested two options for that:
> >
> > 1) Unify/mux the IPI_CPU_STOP and IPI_CPU_CRASH_STOP IPIs
> >
> > The only *intended* difference between the two is that
> > IPI_CPU_CRASH_STOP
> > calls crash_save_cpu() before trying to stop the CPU, but the
> > implementations have diverged significantly for unrelated reasons.
> >
> > 2) Remove IPI_WAKEUP
> >
> > We only use IPI_WAKEUP for the ACPI parking protocol, and we could
> > reuse
> > another IPI (e.g. IPI_RESCHEDULE) to achieve the same thing witout a
> > dedicated IPI.
>
> Sure. My concern is that we're papering over the fundamental problem,
> which is that IPIs are limited resource, and that we're bound to pile
> more stuff on them.
>
> I'm all for reclaiming the ones that can be merged, but we may ultimately
> need a real fix for this.
Sure; I will bear that in mind.
Thanks,
Mark.
On Mon, 7 Aug 2023 at 16:11, Mark Rutland <[email protected]> wrote:
>
> Hi Doug,
>
> Apologies for the delay.
>
> On Mon, Jul 24, 2023 at 08:55:44AM -0700, Doug Anderson wrote:
> > On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <[email protected]> wrote:
> > I'm looking for some ideas on what to do to move this patch series
> > forward. Thanks to Daniel, the kgdb patch is now in Linus's tree which
> > hopefully makes this simpler to land. I guess there is still the
> > irqchip dependency that will need to be sorted out, though...
> >
> > Even if folks aren't in agreement about whether this is ready to be
> > enabled in production, I don't think anything here is super
> > objectionable or controversial, is it? Can we land it? If you feel
> > like it needs extra review, would it help if I tried to drum up some
> > extra people to provide review feedback?
>
> Ignoring the soundness issues I mentioned before (which I'm slowly chipping
> away at, and you're likely lucky enough to avoid in practice)...
>
> Having looked over the series, I think the GICv3 bit isn't quite right, but is
> easy enough to fix. I've commented on the patch with what I think we should
> have there.
Thanks for catching this and I agree with your proposed fix.
>
> The only major thing otherwise from my PoV is the structure of the debug IPI
> framework. I'm not keen on that being a separate body of code and I think it
> should live in smp.c along with the other IPIs.
That's a fair point.
> I'd also strongly prefer if we
> could have separate IPI_CPU_BACKTRACE and IPI_CPU_KGDB IPIs,
With current logic of single debug IPI, it is not required for a user
to enable KGDB in order to use that IPI for backtrace. The original
motivation for this logic was that the IPIs are a scarce resource on
arm64 as per comments from Marc. So I am fine either way to keep them
separate or unified.
> and I think we can
> do that either by unifying IPI_CPU_STOP && IPI_CPU_CRASH_STOP or by reclaiming
> IPI_WAKEUP by reusing a different IPI for the parking protocol (e.g.
> IPI_RESCHEDULE).
That sounds like a good cleanup.
>
> I think it'd be nice if the series could enable NMIs for backtrace and the
> CPU_{,CRASH_}STOP cases, with KGDB being the bonus atop. That way it'd be
> clearly beneficial for anyone trying to debug lockups even if they're not a
> KGDB user.
>
It's good to see other use-cases of IPIs turned into NMIs.
-Sumit
> > Also: in case it's interesting to anyone, I've been doing benchmarks
> > on sc7180-trogdor devices in preparation for enabling this. On that
> > platform, I did manage to see about 4% reduction in a set of hackbench
> > numbers when fully enabling pseudo-NMI. However, when I instead ran
> > Speedometer 2.1 I saw no difference. See:
> >
> > https://issuetracker.google.com/issues/197061987
>
> Thanks for the pointer!
>
> I know that there are a couple of things that we could do to slightly improve
> local_irq_*() when using pNMIs, though I suspect that the bulk of the cost
> there will come from the necessary synchronization.
>
> Thanks,
> Mark.
On Mon, Aug 07, 2023 at 04:52:40PM +0530, Sumit Garg wrote:
> On Mon, 7 Aug 2023 at 15:20, Mark Rutland <[email protected]> wrote:
> > On Thu, Jun 01, 2023 at 02:31:45PM -0700, Douglas Anderson wrote:
> > > @@ -542,16 +543,22 @@ static int gic_irq_nmi_setup(struct irq_data *d)
> > > return -EINVAL;
> > >
> > > /* desc lock should already be held */
> > > - if (gic_irq_in_rdist(d)) {
> > > - u32 idx = gic_get_ppi_index(d);
> > > + switch (get_intid_range(d)) {
> > > + case SGI_RANGE:
> > > + break;
> > > + case PPI_RANGE:
> > > + case EPPI_RANGE:
> > > + idx = gic_get_ppi_index(d);
> > >
> > > /* Setting up PPI as NMI, only switch handler for first NMI */
> > > if (!refcount_inc_not_zero(&ppi_nmi_refs[idx])) {
> > > refcount_set(&ppi_nmi_refs[idx], 1);
> > > desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> > > }
> > > - } else {
> > > + break;
> > > + default:
> > > desc->handle_irq = handle_fasteoi_nmi;
> > > + break;
> > > }
> >
> > As above, I reckon this isn't right, and we should treat all rdist interrupts
> > (which are all percpu) the same.
> >
> > I reckon what we should be doing here is make ppi_nmi_refs cover all of the
> > rdist interrupts (e.g. make that rdist_nmi_refs, add a gic_get_rdist_idx()
> > helper), and then here have something like:
> >
> > if (gic_irq_in_rdist(d)) {
> > u32 idx = gic_get_rdist_idx(d);
> >
> > /*
> > * Setting up a percpu interrupt as NMI, only switch handler
> > * for first NMI
> > */
> > if (!refcount_inc_not_zero(&rdist_nmi_refs[idx])) {
> > refcount_set(&ppi_nmi_refs[idx], 1);
> > desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
> > }
> > }
>
> It looks like you missed the else part here as follows for all other
> interrupt types:
>
> } else {
> desc->handle_irq = handle_fasteoi_nmi;
> }
Yes, sorry; I had meant for that to be included, i.e.
if (gic_irq_in_rdist(d)) {
u32 idx = gic_get_rdist_idx(d);
/*
* Setting up a percpu interrupt as NMI, only switch handler
* for first NMI
*/
if (!refcount_inc_not_zero(&rdist_nmi_refs[idx])) {
refcount_set(&ppi_nmi_refs[idx], 1);
desc->handle_irq = handle_percpu_devid_fasteoi_nmi;
}
} else {
desc->handle_irq = handle_fasteoi_nmi;
}
> Apart from that, your logic sounds good to me.
Great!
Thanks,
Mark.
On Mon, Aug 07, 2023 at 06:16:24PM +0530, Sumit Garg wrote:
> On Mon, 7 Aug 2023 at 16:11, Mark Rutland <[email protected]> wrote:
> > On Mon, Jul 24, 2023 at 08:55:44AM -0700, Doug Anderson wrote:
> > > On Thu, Jun 1, 2023 at 2:37 PM Douglas Anderson <[email protected]> wrote:
> > I'd also strongly prefer if we
> > could have separate IPI_CPU_BACKTRACE and IPI_CPU_KGDB IPIs,
>
> With current logic of single debug IPI, it is not required for a user
> to enable KGDB in order to use that IPI for backtrace. The original
> motivation for this logic was that the IPIs are a scarce resource on
> arm64 as per comments from Marc. So I am fine either way to keep them
> separate or unified.
>
> > and I think we can
> > do that either by unifying IPI_CPU_STOP && IPI_CPU_CRASH_STOP or by reclaiming
> > IPI_WAKEUP by reusing a different IPI for the parking protocol (e.g.
> > IPI_RESCHEDULE).
>
> That sounds like a good cleanup.
I took a quick stab at removing IPI_WAKEUP, and it seems simple enough; patch
below.
Mark.
----<8----
From 267401824576c1dd3bb6d380142c92f710098b67 Mon Sep 17 00:00:00 2001
From: Mark Rutland <[email protected]>
Date: Mon, 7 Aug 2023 15:05:47 +0100
Subject: [PATCH] arm64: smp: Remove dedicated wakeup IPI
To enable NMI backtrace and KGDB's NMI cpu roundup, we need to free up
at least one dedicated IPI.
On arm64 the IPI_WAKEUP IPI is only used for the ACPI parking protocol,
which itself is only used on some very early ARMv8 systems which
couldn't implement PSCI.
Remove the IPI_WAKEUP IPI, and rely on the IPI_RESCHEDULE IPI to wake
CPUs from the parked state. This will cause a tiny amonut of redundant
work to check the thread flags, but this is miniscule in relation to the
cost of taking and handling the IPI in the first place. We can safely
handle redundant IPI_RESCHEDULE IPIs, so there should be no functional
impact as a result of this change.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Douglas Anderson <[email protected]>
Cc: Marc Zyngier <[email protected]>
Cc: Sumit Garg <[email protected]>
Cc: Will Deacon <[email protected]>
---
arch/arm64/include/asm/smp.h | 4 ++--
arch/arm64/kernel/acpi_parking_protocol.c | 2 +-
arch/arm64/kernel/smp.c | 28 +++++++++--------------
3 files changed, 14 insertions(+), 20 deletions(-)
diff --git a/arch/arm64/include/asm/smp.h b/arch/arm64/include/asm/smp.h
index 9b31e6d0da174..798fd76a883a0 100644
--- a/arch/arm64/include/asm/smp.h
+++ b/arch/arm64/include/asm/smp.h
@@ -89,9 +89,9 @@ extern void arch_send_call_function_single_ipi(int cpu);
extern void arch_send_call_function_ipi_mask(const struct cpumask *mask);
#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
-extern void arch_send_wakeup_ipi_mask(const struct cpumask *mask);
+extern void arch_send_wakeup_ipi(int cpu);
#else
-static inline void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
+static inline void arch_send_wakeup_ipi(int cpu)
{
BUILD_BUG();
}
diff --git a/arch/arm64/kernel/acpi_parking_protocol.c b/arch/arm64/kernel/acpi_parking_protocol.c
index b1990e38aed0a..e1be29e608b71 100644
--- a/arch/arm64/kernel/acpi_parking_protocol.c
+++ b/arch/arm64/kernel/acpi_parking_protocol.c
@@ -103,7 +103,7 @@ static int acpi_parking_protocol_cpu_boot(unsigned int cpu)
&mailbox->entry_point);
writel_relaxed(cpu_entry->gic_cpu_id, &mailbox->cpu_id);
- arch_send_wakeup_ipi_mask(cpumask_of(cpu));
+ arch_send_wakeup_ipi(cpu);
return 0;
}
diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
index edd63894d61e8..be00c8c5c1711 100644
--- a/arch/arm64/kernel/smp.c
+++ b/arch/arm64/kernel/smp.c
@@ -72,7 +72,6 @@ enum ipi_msg_type {
IPI_CPU_CRASH_STOP,
IPI_TIMER,
IPI_IRQ_WORK,
- IPI_WAKEUP,
NR_IPI
};
@@ -764,7 +763,6 @@ static const char *ipi_types[NR_IPI] __tracepoint_string = {
[IPI_CPU_CRASH_STOP] = "CPU stop (for crash dump) interrupts",
[IPI_TIMER] = "Timer broadcast interrupts",
[IPI_IRQ_WORK] = "IRQ work interrupts",
- [IPI_WAKEUP] = "CPU wake-up interrupts",
};
static void smp_cross_call(const struct cpumask *target, unsigned int ipinr);
@@ -797,13 +795,6 @@ void arch_send_call_function_single_ipi(int cpu)
smp_cross_call(cpumask_of(cpu), IPI_CALL_FUNC);
}
-#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
-void arch_send_wakeup_ipi_mask(const struct cpumask *mask)
-{
- smp_cross_call(mask, IPI_WAKEUP);
-}
-#endif
-
#ifdef CONFIG_IRQ_WORK
void arch_irq_work_raise(void)
{
@@ -897,14 +888,6 @@ static void do_handle_IPI(int ipinr)
break;
#endif
-#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
- case IPI_WAKEUP:
- WARN_ONCE(!acpi_parking_protocol_valid(cpu),
- "CPU%u: Wake-up IPI outside the ACPI parking protocol\n",
- cpu);
- break;
-#endif
-
default:
pr_crit("CPU%u: Unknown IPI message 0x%x\n", cpu, ipinr);
break;
@@ -979,6 +962,17 @@ void arch_smp_send_reschedule(int cpu)
smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE);
}
+#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
+void arch_send_wakeup_ipi(int cpu)
+{
+ /*
+ * We use a scheduler IPI to wake the CPU as this avoids the need for a
+ * dedicated IPI and we can safely handle spurious scheduler IPIs.
+ */
+ arch_smp_send_reschedule(cpu);
+}
+#endif
+
#ifdef CONFIG_GENERIC_CLOCKEVENTS_BROADCAST
void tick_broadcast(const struct cpumask *mask)
{
--
2.30.2
On Mon, Aug 07, 2023 at 11:28:52AM +0100, Mark Rutland wrote:
> On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
> > From: Sumit Garg <[email protected]>
> >
> > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
> > IPI is backed by an NMI (or pseudo NMI) then this will let us debug
> > even hard locked CPUs. When the debug IPI isn't backed by an NMI then
> > this won't really have any huge benefit but it will still work.
> >
> > Signed-off-by: Sumit Garg <[email protected]>
> > Signed-off-by: Douglas Anderson <[email protected]>
> > ---
> >
> > Changes in v9:
> > - Remove fallback for when debug IPI isn't available.
> > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> >
> > arch/arm64/kernel/ipi_debug.c | 5 +++++
> > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
> > 2 files changed, 19 insertions(+)
>
> This looks fine to me, but I'd feel a bit happier if we had separate SGIs for
> the backtrace and the KGDB callback as they're logically unrelated.
I've no objections to seperate SGIs (if one can be found) but I'm curious
what benefit emerges from giving them seperate IPIs.
Both interfaces are already designed to share and NMI-like IPI nicely
(and IIUC they must share one on x86), neither is performance
critical[1] and the content of /proc/interrupts for the IPI is seldom
going to be of much interest.
As mentioned it is perfectly OK to separate them if there is space in
the SGI allocations. However if any two functions are good candidates to
share a scarce resource such as an SGI then it is these!
Daniel.
[1] In both cases their results are only required at human-scale
and the work of the both handlers is hugely more expensive than
either up front quick-check.
On Mon, Aug 07, 2023 at 04:24:44PM +0100, Daniel Thompson wrote:
> On Mon, Aug 07, 2023 at 11:28:52AM +0100, Mark Rutland wrote:
> > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
> > > From: Sumit Garg <[email protected]>
> > >
> > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
> > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug
> > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then
> > > this won't really have any huge benefit but it will still work.
> > >
> > > Signed-off-by: Sumit Garg <[email protected]>
> > > Signed-off-by: Douglas Anderson <[email protected]>
> > > ---
> > >
> > > Changes in v9:
> > > - Remove fallback for when debug IPI isn't available.
> > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> > >
> > > arch/arm64/kernel/ipi_debug.c | 5 +++++
> > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
> > > 2 files changed, 19 insertions(+)
> >
> > This looks fine to me, but I'd feel a bit happier if we had separate SGIs for
> > the backtrace and the KGDB callback as they're logically unrelated.
>
> I've no objections to seperate SGIs (if one can be found) but I'm curious
> what benefit emerges from giving them seperate IPIs.
Mostly an "I'd feel happier"; they're logically unrelated and having them
separate avoids the risk of them unintentionally getting in the way of the
other.
> Both interfaces are already designed to share and NMI-like IPI nicely
> (and IIUC they must share one on x86), neither is performance
> critical[1] and the content of /proc/interrupts for the IPI is seldom
> going to be of much interest.
Sure; I understand that. The flip side of "neither is performance critical" is
that they're seldom tested in terms of interaction with one another, and hence
for robustness I'd prefer they're separate.
I agree it's not strictly necessary, but given we can easily free up an SGI
slot, I think it'd be worthwhile. We can always decide to fold them together in
future if we have to.
I realise a similar argument could be applied to IPI_WAKEUP and IPI_RESCHEDULE,
but given that IPI_RESCHEDULE happens *all the time* and the wakeup handler
does literally nothing, I think the risk there is substantially lower.
Thanks,
Mark.
> As mentioned it is perfectly OK to separate them if there is space in
> the SGI allocations. However if any two functions are good candidates to
> share a scarce resource such as an SGI then it is these!
>
>
> Daniel.
>
>
> [1] In both cases their results are only required at human-scale
> and the work of the both handlers is hugely more expensive than
> either up front quick-check.
On Mon, Aug 07, 2023 at 05:04:14PM +0100, Mark Rutland wrote:
> On Mon, Aug 07, 2023 at 04:24:44PM +0100, Daniel Thompson wrote:
> > On Mon, Aug 07, 2023 at 11:28:52AM +0100, Mark Rutland wrote:
> > > On Thu, Jun 01, 2023 at 02:31:51PM -0700, Douglas Anderson wrote:
> > > > From: Sumit Garg <[email protected]>
> > > >
> > > > Let's use the debug IPI for rounding up CPUs in kgdb. When the debug
> > > > IPI is backed by an NMI (or pseudo NMI) then this will let us debug
> > > > even hard locked CPUs. When the debug IPI isn't backed by an NMI then
> > > > this won't really have any huge benefit but it will still work.
> > > >
> > > > Signed-off-by: Sumit Garg <[email protected]>
> > > > Signed-off-by: Douglas Anderson <[email protected]>
> > > > ---
> > > >
> > > > Changes in v9:
> > > > - Remove fallback for when debug IPI isn't available.
> > > > - Renamed "NMI IPI" to "debug IPI" since it might not be backed by NMI.
> > > >
> > > > arch/arm64/kernel/ipi_debug.c | 5 +++++
> > > > arch/arm64/kernel/kgdb.c | 14 ++++++++++++++
> > > > 2 files changed, 19 insertions(+)
> > >
> > > This looks fine to me, but I'd feel a bit happier if we had separate SGIs for
> > > the backtrace and the KGDB callback as they're logically unrelated.
> >
> > I've no objections to seperate SGIs (if one can be found) but I'm curious
> > what benefit emerges from giving them seperate IPIs.
>
> Mostly an "I'd feel happier"; they're logically unrelated and having them
> separate avoids the risk of them unintentionally getting in the way of the
> other.
>
> > Both interfaces are already designed to share and NMI-like IPI nicely
> > (and IIUC they must share one on x86), neither is performance
> > critical[1] and the content of /proc/interrupts for the IPI is seldom
> > going to be of much interest.
>
> Sure; I understand that. The flip side of "neither is performance critical" is
> that they're seldom tested in terms of interaction with one another, and hence
> for robustness I'd prefer they're separate.
We can't really stop them interacting when kdb is enabled: both activate
special logic to intercept console messages and this is much more
complex than the "is this my IPI?" tests.
> I agree it's not strictly necessary, but given we can easily free up an SGI
> slot, I think it'd be worthwhile. We can always decide to fold them together in
> future if we have to.
Agreed.
It should be very little bother to "reclaim" an IPI from the diagnostics
machinery if that is ever needed. So seperate IPIs is no problem from my
point of view.
Daniel.
On Mon, Aug 21, 2023 at 05:06:50PM -0700, Doug Anderson wrote:
> On Mon, Aug 7, 2023 at 3:23 AM Mark Rutland <[email protected]> wrote:
> > On Thu, Jun 01, 2023 at 02:31:49PM -0700, Douglas Anderson wrote:
> > > From: Sumit Garg <[email protected]>
> > > static irqreturn_t ipi_debug_handler(int irq, void *data)
> > > {
> > > - /* nop, NMI handlers for special features can be added here. */
> > > + irqreturn_t ret = IRQ_NONE;
> > > +
> > > + /*
> > > + * NOTE: Just like in arch_trigger_cpumask_backtrace(), we're calling
> > > + * a function with "nmi_" in the name but it works fine even if we
> > > + * are using a regulaor IPI.
> > > + */
> > > + if (nmi_cpu_backtrace(get_irq_regs()))
> > > + ret = IRQ_HANDLED;
> > >
> >
> > Does this need the printk_deferred_{enter,exit}() that 32-bit arm has?
>
> I don't _think_ so, but I also don't _think_ it's needed for arm32.
> ;-) Let me explain my logic and you can tell me if it sounds right to
> you.
>
> If we're doing the backtrace in pseudo-NMI context then we definitely
> don't need it. Specifically, the printk_deferred_{enter,exit}() just
> manages the per-cpu "printk_context" value. That value doesn't matter
> if "in_nmi()" returns true.
>
> If we're _not_ doing the backtrace in pseudo-NMI context then I think
> we also don't need it. After all, if we're not in pseudo-NMI context
> then it's perfectly fine to print, right?
>
> While it's hard to know 100% for sure, my best guess is that in arm
> this might have helped prevent stack traces from getting interspersed
> among one another. I guess this is no longer needed as of commit
> 55d6af1d6688 ("lib/nmi_backtrace: explicitly serialize banner and
> regs")? In any case, when I tested this earlier things seemed to
> printout fine without it...
Thanks for that explanation; that makes sense to me! Looking around a bit I see
that x86 doesn't bother either.
> That being said, it wouldn't hurt to include it here and I'll do it if you
> want.
No need -- I'm happy without it.
Thanks,
Mark.
On Mon, Aug 21, 2023 at 03:16:56PM -0700, Doug Anderson wrote:
> Hi,
>
> On Mon, Aug 7, 2023 at 3:12 AM Mark Rutland <[email protected]> wrote:
> >
> > On Thu, Jun 01, 2023 at 02:31:47PM -0700, Douglas Anderson wrote:
> > > From: Sumit Garg <[email protected]>
> > >
> > > Introduce a framework for an IPI that will be used for debug
> > > purposes. The primary use case of this IPI will be to generate stack
> > > crawls on other CPUs, but it will also be used to round up CPUs for
> > > kgdb.
> > >
> > > When possible, we try to allocate this debug IPI as an NMI (or a
> > > pseudo NMI). If that fails (due to CONFIG, an incompatible interrupt
> > > controller, a quirk, missing the "irqchip.gicv3_pseudo_nmi=1" kernel
> > > parameter, etc) we fall back to a normal IPI.
> > >
> > > NOTE: hooking this up for CPU backtrace / kgdb will happen in a future
> > > patch, this just adds the framework.
> > >
> > > Signed-off-by: Sumit Garg <[email protected]>
> > > Signed-off-by: Douglas Anderson <[email protected]>
> >
> > I think that we shouldn't add a framework in a separate file for this:
> >
> > * This is very similar to our existing IPI management in smp.c, so it feels
> > like duplication, or at least another thing we'd like to keep in-sync.
> >
> > * We're going to want an NMI backtrace regardless of KGDB
> >
> > * We're going to want the IPI_CPU_STOP and IPI_CRASH_CPU_STOP IPIs to be NMIs
> > too.
> >
> > I reckon it'd be better to extend the existing IPI logic in smp.c to allow IPIs
> > to be requested as NMIs, e.g.
> >
> > ----
> > diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c
> > index edd63894d61e8..48e6aa62c473e 100644
> > --- a/arch/arm64/kernel/smp.c
> > +++ b/arch/arm64/kernel/smp.c
> > @@ -33,6 +33,7 @@
> > #include <linux/kernel_stat.h>
> > #include <linux/kexec.h>
> > #include <linux/kvm_host.h>
> > +#include <linux/nmi.h>
> >
> > #include <asm/alternative.h>
> > #include <asm/atomic.h>
> > @@ -926,6 +927,21 @@ static void smp_cross_call(const struct cpumask *target, unsigned int ipinr)
> > __ipi_send_mask(ipi_desc[ipinr], target);
> > }
> >
> > +static bool ipi_should_be_nmi(enum ipi_msg_type ipi)
> > +{
> > + if (!system_uses_irq_prio_masking())
> > + return false;
> > +
> > + switch (ipi) {
> > + /*
> > + * TODO: select NMI IPIs here
> > + */
> > + return true;
> > + default:
> > + return false;
> > + }
> > +}
> > +
> > static void ipi_setup(int cpu)
> > {
> > int i;
> > @@ -933,8 +949,14 @@ static void ipi_setup(int cpu)
> > if (WARN_ON_ONCE(!ipi_irq_base))
> > return;
> >
> > - for (i = 0; i < nr_ipi; i++)
> > - enable_percpu_irq(ipi_irq_base + i, 0);
> > + for (i = 0; i < nr_ipi; i++) {
> > + if (ipi_should_be_nmi(i)) {
> > + prepare_percpu_nmi(ipi_irq_base + i);
> > + enable_percpu_nmi(ipi_irq_base + i, 0);
> > + } else {
> > + enable_percpu_irq(ipi_irq_base + i, 0);
> > + }
> > + }
> > }
> >
> > #ifdef CONFIG_HOTPLUG_CPU
> > @@ -945,8 +967,14 @@ static void ipi_teardown(int cpu)
> > if (WARN_ON_ONCE(!ipi_irq_base))
> > return;
> >
> > - for (i = 0; i < nr_ipi; i++)
> > - disable_percpu_irq(ipi_irq_base + i);
> > + for (i = 0; i < nr_ipi; i++) {
> > + if (ipi_should_be_nmi(i)) {
> > + disable_percpu_nmi(ipi_irq_base + i);
> > + teardown_percpu_nmi(ipi_irq_base + i);
> > + } else {
> > + disable_percpu_irq(ipi_irq_base + i);
> > + }
> > + }
> > }
> > #endif
> >
> > @@ -958,11 +986,19 @@ void __init set_smp_ipi_range(int ipi_base, int n)
> > nr_ipi = min(n, NR_IPI);
> >
> > for (i = 0; i < nr_ipi; i++) {
> > - int err;
> > -
> > - err = request_percpu_irq(ipi_base + i, ipi_handler,
> > - "IPI", &cpu_number);
> > - WARN_ON(err);
> > + int err = -EINVAL;
> > +
> > + if (ipi_should_be_nmi(i)) {
> > + err = request_percpu_nmi(ipi_base + i, ipi_handler,
> > + "IPI", &cpu_number);
> > + WARN(err, "Could not request IPI %d as NMI, err=%d\n",
> > + i, err);
> > + } else {
> > + err = request_percpu_irq(ipi_base + i, ipi_handler,
> > + "IPI", &cpu_number);
> > + WARN(err, "Could not request IPI %d as IRQ, err=%d\n",
> > + i, err);
> > + }
> >
> > ipi_desc[i] = irq_to_desc(ipi_base + i);
> > irq_set_status_flags(ipi_base + i, IRQ_HIDDEN);
> > ----
> >
> > ... and then if we need an IPI for KGDB, we can add that to the existing list
> > of IPIs, and have it requested/enabled/disabled as usual.
>
> Sounds good. I'm starting to work on v10 incorporating your feedback.
> A few quick questions:
>
> 1. If I mostly take your patch above verbatim, do you have any
> suggested tags for Author/Signed-off-by? I'd tend to set you as the
> author but I can't do that because you didn't provide a
> Signed-off-by...
Sorry about that. For the above:
Signed-off-by: Mark Rutland <[email protected]>
If squashed into another patch, then feel free to use:
Co-developed-by: Mark Rutland <[email protected]>
> 2. Would you prefer this patch on its own, or would you rather it be
> squashed with the first user ("backtrace")? On its own, I think I have
> to get rid of the "switch" statement in ipi_should_be_nmi() and just
> return false;
I reckon it makes sense to squash it with the first user.
Thanks,
Mark.