2018-10-30 22:21:59

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v2 0/2] kgdb: Fix kgdb_roundup_cpus()

This series was originally part of the series ("serial: Finish kgdb on
qcom_geni; fix many lockdep splats w/ kgdb") but it made sense to
split it up.

It's believed that dropping into kgdb should be more robust once these
patches are applied.

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

Douglas Anderson (2):
kgdb: Remove irq flags from roundup
kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

arch/arc/kernel/kgdb.c | 10 ++--------
arch/arm/kernel/kgdb.c | 12 ------------
arch/arm64/kernel/kgdb.c | 12 ------------
arch/hexagon/kernel/kgdb.c | 32 --------------------------------
arch/mips/kernel/kgdb.c | 9 +--------
arch/powerpc/kernel/kgdb.c | 6 +++---
arch/sh/kernel/kgdb.c | 12 ------------
arch/sparc/kernel/smp_64.c | 2 +-
arch/x86/kernel/kgdb.c | 9 ++-------
include/linux/kgdb.h | 22 ++++++++++++++--------
kernel/debug/debug_core.c | 38 +++++++++++++++++++++++++++++++++++++-
11 files changed, 60 insertions(+), 104 deletions(-)

--
2.19.1.568.g152ad8e336-goog



2018-10-30 22:20:59

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

When I had lockdep turned on and dropped into kgdb I got a nice splat
on my system. Specifically it hit:
DEBUG_LOCKS_WARN_ON(current->hardirq_context)

Specifically it looked like this:
sysrq: SysRq : DEBUG
------------[ cut here ]------------
DEBUG_LOCKS_WARN_ON(current->hardirq_context)
WARNING: CPU: 0 PID: 0 at .../kernel/locking/lockdep.c:2875 lockdep_hardirqs_on+0xf0/0x160
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0 #27
pstate: 604003c9 (nZCv DAIF +PAN -UAO)
pc : lockdep_hardirqs_on+0xf0/0x160
...
Call trace:
lockdep_hardirqs_on+0xf0/0x160
trace_hardirqs_on+0x188/0x1ac
kgdb_roundup_cpus+0x14/0x3c
kgdb_cpu_enter+0x53c/0x5cc
kgdb_handle_exception+0x180/0x1d4
kgdb_compiled_brk_fn+0x30/0x3c
brk_handler+0x134/0x178
do_debug_exception+0xfc/0x178
el1_dbg+0x18/0x78
kgdb_breakpoint+0x34/0x58
sysrq_handle_dbg+0x54/0x5c
__handle_sysrq+0x114/0x21c
handle_sysrq+0x30/0x3c
qcom_geni_serial_isr+0x2dc/0x30c
...
...
irq event stamp: ...45
hardirqs last enabled at (...44): [...] __do_softirq+0xd8/0x4e4
hardirqs last disabled at (...45): [...] el1_irq+0x74/0x130
softirqs last enabled at (...42): [...] _local_bh_enable+0x2c/0x34
softirqs last disabled at (...43): [...] irq_exit+0xa8/0x100
---[ end trace adf21f830c46e638 ]---

Looking closely at it, it seems like a really bad idea to be calling
local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems
like it could violate spinlock semantics and cause a deadlock.

Instead, let's use a private csd alongside
smp_call_function_single_async() to round up the other CPUs. Using
smp_call_function_single_async() doesn't require interrupts to be
enabled so we can remove the offending bit of code.

In order to avoid duplicating this across all the architectures that
use the default kgdb_roundup_cpus(), we'll add a "weak" implementation
to debug_core.c.

Looking at all the people who previously had copies of this code,
there were a few variants. I've attempted to keep the variants
working like they used to. Specifically:
* For arch/arc we passed NULL to kgdb_nmicallback() instead of
get_irq_regs().
* For arch/mips there was a bit of extra code around
kgdb_nmicallback()

Suggested-by: Daniel Thompson <[email protected]>
Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.
- Don't use smp_call_function (Daniel).

arch/arc/kernel/kgdb.c | 10 ++--------
arch/arm/kernel/kgdb.c | 12 ------------
arch/arm64/kernel/kgdb.c | 12 ------------
arch/hexagon/kernel/kgdb.c | 27 ---------------------------
arch/mips/kernel/kgdb.c | 9 +--------
arch/powerpc/kernel/kgdb.c | 4 ++--
arch/sh/kernel/kgdb.c | 12 ------------
include/linux/kgdb.h | 15 +++++++++++++--
kernel/debug/debug_core.c | 36 ++++++++++++++++++++++++++++++++++++
9 files changed, 54 insertions(+), 83 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 0932851028e0..68d9fe4b5aa7 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -192,18 +192,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long ip)
instruction_pointer(regs) = ip;
}

-static void kgdb_call_nmi_hook(void *ignored)
+void kgdb_call_nmi_hook(void *ignored)
{
+ /* Default implementation passes get_irq_regs() but we don't */
kgdb_nmicallback(raw_smp_processor_id(), NULL);
}

-void kgdb_roundup_cpus(void)
-{
- local_irq_enable();
- smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
-}
-
struct kgdb_arch arch_kgdb_ops = {
/* breakpoint instruction: TRAP_S 0x3 */
#ifdef CONFIG_CPU_BIG_ENDIAN
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index f21077b077be..d9a69e941463 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -170,18 +170,6 @@ static struct undef_hook kgdb_compiled_brkpt_hook = {
.fn = kgdb_compiled_brk_fn
};

-static void kgdb_call_nmi_hook(void *ignored)
-{
- kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
- local_irq_enable();
- smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
-}
-
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
{
struct pt_regs *regs = args->regs;
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index 12c339ff6e75..da880247c734 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -284,18 +284,6 @@ static struct step_hook kgdb_step_hook = {
.fn = kgdb_step_brk_fn
};

-static void kgdb_call_nmi_hook(void *ignored)
-{
- kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
- local_irq_enable();
- smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
-}
-
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
{
struct pt_regs *regs = args->regs;
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 012e0e230ac2..b95d12038a4e 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -115,33 +115,6 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)
instruction_pointer(regs) = pc;
}

-#ifdef CONFIG_SMP
-
-/**
- * kgdb_roundup_cpus - Get other CPUs into a holding pattern
- *
- * On SMP systems, we need to get the attention of the other CPUs
- * and get them be in a known state. This should do what is needed
- * to get the other CPUs to call kgdb_wait(). Note that on some arches,
- * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs.
- *
- * On non-SMP systems, this is not called.
- */
-
-static void hexagon_kgdb_nmi_hook(void *ignored)
-{
- kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
- local_irq_enable();
- smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
- local_irq_disable();
-}
-#endif
-

/* Not yet working */
void sleeping_thread_to_gdb_regs(unsigned long *gdb_regs,
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index 2b05effc17b4..42f057a6c215 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -207,7 +207,7 @@ void arch_kgdb_breakpoint(void)
".set\treorder");
}

-static void kgdb_call_nmi_hook(void *ignored)
+void kgdb_call_nmi_hook(void *ignored)
{
mm_segment_t old_fs;

@@ -219,13 +219,6 @@ static void kgdb_call_nmi_hook(void *ignored)
set_fs(old_fs);
}

-void kgdb_roundup_cpus(void)
-{
- local_irq_enable();
- smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
-}
-
static int compute_signal(int tt)
{
struct hard_trap_info *ht;
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index b0e804844be0..b4ce54d73337 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -117,7 +117,7 @@ int kgdb_skipexception(int exception, struct pt_regs *regs)
return kgdb_isremovedbreak(regs->nip);
}

-static int kgdb_call_nmi_hook(struct pt_regs *regs)
+static int kgdb_debugger_ipi(struct pt_regs *regs)
{
kgdb_nmicallback(raw_smp_processor_id(), regs);
return 0;
@@ -502,7 +502,7 @@ int kgdb_arch_init(void)
old__debugger_break_match = __debugger_break_match;
old__debugger_fault_handler = __debugger_fault_handler;

- __debugger_ipi = kgdb_call_nmi_hook;
+ __debugger_ipi = kgdb_debugger_ipi;
__debugger = kgdb_debugger;
__debugger_bpt = kgdb_handle_breakpoint;
__debugger_sstep = kgdb_singlestep;
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index cc57630f6bf2..14e012ad7c57 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -314,18 +314,6 @@ BUILD_TRAP_HANDLER(singlestep)
local_irq_restore(flags);
}

-static void kgdb_call_nmi_hook(void *ignored)
-{
- kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
-}
-
-void kgdb_roundup_cpus(void)
-{
- local_irq_enable();
- smp_call_function(kgdb_call_nmi_hook, NULL, 0);
- local_irq_disable();
-}
-
static int __kgdb_notify(struct die_args *args, unsigned long cmd)
{
int ret;
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 05e5b2eb0d32..f4f866273ed2 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -176,14 +176,25 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,
char *remcom_out_buffer,
struct pt_regs *regs);

+/**
+ * kgdb_call_nmi_hook - Call kgdb_nmicallback() on the current CPU
+ *
+ * If you're using the default implementation of kgdb_roundup_cpus()
+ * this function will be called per CPU. If you don't implement
+ * kgdb_call_nmi_hook() a default will be used.
+ */
+
+extern void kgdb_call_nmi_hook(void *ignored);
+
+
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them into a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
- * the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs.
+ * the NMI approach is not used for rounding up all the CPUs. Normally
+ * those architectures can just not implement this and get the default.
*
* On non-SMP systems, this is not called.
*/
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index f3cadda45f07..9a3f952de6ed 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -55,6 +55,7 @@
#include <linux/mm.h>
#include <linux/vmacache.h>
#include <linux/rcupdate.h>
+#include <linux/irq.h>

#include <asm/cacheflush.h>
#include <asm/byteorder.h>
@@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
return 0;
}

+/*
+ * Default (weak) implementation for kgdb_roundup_cpus
+ */
+
+static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
+
+void __weak kgdb_call_nmi_hook(void *ignored)
+{
+ kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
+}
+
+void __weak kgdb_roundup_cpus(void)
+{
+ call_single_data_t *csd;
+ int cpu;
+
+ for_each_cpu(cpu, cpu_online_mask) {
+ csd = &per_cpu(kgdb_roundup_csd, cpu);
+ smp_call_function_single_async(cpu, csd);
+ }
+}
+
+static void kgdb_generic_roundup_init(void)
+{
+ call_single_data_t *csd;
+ int cpu;
+
+ for_each_possible_cpu(cpu) {
+ csd = &per_cpu(kgdb_roundup_csd, cpu);
+ csd->func = kgdb_call_nmi_hook;
+ }
+}
+
/*
* Some architectures need cache flushes when we set/clear a
* breakpoint:
@@ -993,6 +1027,8 @@ int kgdb_register_io_module(struct kgdb_io *new_dbg_io_ops)
return -EBUSY;
}

+ kgdb_generic_roundup_init();
+
if (new_dbg_io_ops->init) {
err = new_dbg_io_ops->init();
if (err) {
--
2.19.1.568.g152ad8e336-goog


2018-10-30 22:22:16

by Doug Anderson

[permalink] [raw]
Subject: [PATCH v2 1/2] kgdb: Remove irq flags from roundup

The function kgdb_roundup_cpus() was passed a parameter that was
documented as:

> the flags that will be used when restoring the interrupts. There is
> local_irq_save() call before kgdb_roundup_cpus().

Nobody used those flags. Anyone who wanted to temporarily turn on
interrupts just did local_irq_enable() and local_irq_disable() without
looking at them. So we can definitely remove the flags.

Signed-off-by: Douglas Anderson <[email protected]>
---

Changes in v2:
- Removing irq flags separated from fixing lockdep splat.

arch/arc/kernel/kgdb.c | 2 +-
arch/arm/kernel/kgdb.c | 2 +-
arch/arm64/kernel/kgdb.c | 2 +-
arch/hexagon/kernel/kgdb.c | 9 ++-------
arch/mips/kernel/kgdb.c | 2 +-
arch/powerpc/kernel/kgdb.c | 2 +-
arch/sh/kernel/kgdb.c | 2 +-
arch/sparc/kernel/smp_64.c | 2 +-
arch/x86/kernel/kgdb.c | 9 ++-------
include/linux/kgdb.h | 9 ++-------
kernel/debug/debug_core.c | 2 +-
11 files changed, 14 insertions(+), 29 deletions(-)

diff --git a/arch/arc/kernel/kgdb.c b/arch/arc/kernel/kgdb.c
index 9a3c34af2ae8..0932851028e0 100644
--- a/arch/arc/kernel/kgdb.c
+++ b/arch/arc/kernel/kgdb.c
@@ -197,7 +197,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), NULL);
}

-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm/kernel/kgdb.c b/arch/arm/kernel/kgdb.c
index caa0dbe3dc61..f21077b077be 100644
--- a/arch/arm/kernel/kgdb.c
+++ b/arch/arm/kernel/kgdb.c
@@ -175,7 +175,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}

-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/arm64/kernel/kgdb.c b/arch/arm64/kernel/kgdb.c
index a20de58061a8..12c339ff6e75 100644
--- a/arch/arm64/kernel/kgdb.c
+++ b/arch/arm64/kernel/kgdb.c
@@ -289,7 +289,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}

-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/hexagon/kernel/kgdb.c b/arch/hexagon/kernel/kgdb.c
index 16c24b22d0b2..012e0e230ac2 100644
--- a/arch/hexagon/kernel/kgdb.c
+++ b/arch/hexagon/kernel/kgdb.c
@@ -119,17 +119,12 @@ void kgdb_arch_set_pc(struct pt_regs *regs, unsigned long pc)

/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them be in a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
* the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
*
* On non-SMP systems, this is not called.
*/
@@ -139,7 +134,7 @@ static void hexagon_kgdb_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}

-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
local_irq_enable();
smp_call_function(hexagon_kgdb_nmi_hook, NULL, 0);
diff --git a/arch/mips/kernel/kgdb.c b/arch/mips/kernel/kgdb.c
index eb6c0d582626..2b05effc17b4 100644
--- a/arch/mips/kernel/kgdb.c
+++ b/arch/mips/kernel/kgdb.c
@@ -219,7 +219,7 @@ static void kgdb_call_nmi_hook(void *ignored)
set_fs(old_fs);
}

-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index 59c578f865aa..b0e804844be0 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -124,7 +124,7 @@ static int kgdb_call_nmi_hook(struct pt_regs *regs)
}

#ifdef CONFIG_SMP
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
smp_send_debugger_break();
}
diff --git a/arch/sh/kernel/kgdb.c b/arch/sh/kernel/kgdb.c
index 4f04c6638a4d..cc57630f6bf2 100644
--- a/arch/sh/kernel/kgdb.c
+++ b/arch/sh/kernel/kgdb.c
@@ -319,7 +319,7 @@ static void kgdb_call_nmi_hook(void *ignored)
kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
}

-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
local_irq_enable();
smp_call_function(kgdb_call_nmi_hook, NULL, 0);
diff --git a/arch/sparc/kernel/smp_64.c b/arch/sparc/kernel/smp_64.c
index d3ea1f3c06a0..68098bbf3705 100644
--- a/arch/sparc/kernel/smp_64.c
+++ b/arch/sparc/kernel/smp_64.c
@@ -1014,7 +1014,7 @@ void flush_dcache_page_all(struct mm_struct *mm, struct page *page)
}

#ifdef CONFIG_KGDB
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
smp_cross_call(&xcall_kgdb_capture, 0, 0, 0);
}
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8e36f249646e..ac6291a4178d 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -422,21 +422,16 @@ static void kgdb_disable_hw_debug(struct pt_regs *regs)
#ifdef CONFIG_SMP
/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them be in a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
* the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
*
* On non-SMP systems, this is not called.
*/
-void kgdb_roundup_cpus(unsigned long flags)
+void kgdb_roundup_cpus(void)
{
apic->send_IPI_allbutself(APIC_DM_NMI);
}
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index e465bb15912d..05e5b2eb0d32 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -178,21 +178,16 @@ kgdb_arch_handle_exception(int vector, int signo, int err_code,

/**
* kgdb_roundup_cpus - Get other CPUs into a holding pattern
- * @flags: Current IRQ state
*
* On SMP systems, we need to get the attention of the other CPUs
* and get them into a known state. This should do what is needed
* to get the other CPUs to call kgdb_wait(). Note that on some arches,
* the NMI approach is not used for rounding up all the CPUs. For example,
- * in case of MIPS, smp_call_function() is used to roundup CPUs. In
- * this case, we have to make sure that interrupts are enabled before
- * calling smp_call_function(). The argument to this function is
- * the flags that will be used when restoring the interrupts. There is
- * local_irq_save() call before kgdb_roundup_cpus().
+ * in case of MIPS, smp_call_function() is used to roundup CPUs.
*
* On non-SMP systems, this is not called.
*/
-extern void kgdb_roundup_cpus(unsigned long flags);
+extern void kgdb_roundup_cpus(void);

/**
* kgdb_arch_set_pc - Generic call back to the program counter
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 65c0f1363788..f3cadda45f07 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -593,7 +593,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,

/* Signal the other CPUs to enter kgdb_wait() */
else if ((!kgdb_single_step) && kgdb_do_roundup)
- kgdb_roundup_cpus(flags);
+ kgdb_roundup_cpus();
#endif

/*
--
2.19.1.568.g152ad8e336-goog


2018-10-31 01:55:26

by kernel test robot

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

Hi Douglas,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kgdb/kgdb-next]
[also build test WARNING on v4.19 next-20181030]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url: https://github.com/0day-ci/linux/commits/Douglas-Anderson/kgdb-Fix-kgdb_roundup_cpus/20181031-063733
base: https://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb.git kgdb-next
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

WARNING: convert(1) not found, for SVG to PDF conversion install ImageMagick (https://www.imagemagick.org)
include/linux/srcu.h:175: warning: Function parameter or member 'p' not described in 'srcu_dereference_notrace'
include/linux/srcu.h:175: warning: Function parameter or member 'sp' not described in 'srcu_dereference_notrace'
include/linux/gfp.h:1: warning: no structured comments found
>> include/linux/kgdb.h:188: warning: Function parameter or member 'ignored' not described in 'kgdb_call_nmi_hook'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ibss' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.connect' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.keys' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ie' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ie_len' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.bssid' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.ssid' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.default_key' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.default_mgmt_key' not described in 'wireless_dev'
include/net/cfg80211.h:4381: warning: Function parameter or member 'wext.prev_bssid_valid' not described in 'wireless_dev'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'
include/net/cfg80211.h:4869: warning: Excess function parameter 'ptr' description in 'reg_query_regdb_wmm'

vim +188 include/linux/kgdb.h

> 188
189

---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation


Attachments:
(No filename) (12.16 kB)
.config.gz (6.42 kB)
Download all attachments

2018-10-31 13:53:40

by Peter Zijlstra

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> Looking closely at it, it seems like a really bad idea to be calling
> local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems
> like it could violate spinlock semantics and cause a deadlock.
>
> Instead, let's use a private csd alongside
> smp_call_function_single_async() to round up the other CPUs. Using
> smp_call_function_single_async() doesn't require interrupts to be
> enabled so we can remove the offending bit of code.

You might want to mention that the only reason this isn't a deadlock
itself is because there is a timeout on waiting for the slaves to
check-in.

> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -55,6 +55,7 @@
> #include <linux/mm.h>
> #include <linux/vmacache.h>
> #include <linux/rcupdate.h>
> +#include <linux/irq.h>
>
> #include <asm/cacheflush.h>
> #include <asm/byteorder.h>
> @@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
> return 0;
> }
>
> +/*
> + * Default (weak) implementation for kgdb_roundup_cpus
> + */
> +
> +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> +
> +void __weak kgdb_call_nmi_hook(void *ignored)
> +{
> + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> +}
> +
> +void __weak kgdb_roundup_cpus(void)
> +{
> + call_single_data_t *csd;
> + int cpu;
> +
> + for_each_cpu(cpu, cpu_online_mask) {
> + csd = &per_cpu(kgdb_roundup_csd, cpu);
> + smp_call_function_single_async(cpu, csd);
> + }
> +}
> +
> +static void kgdb_generic_roundup_init(void)
> +{
> + call_single_data_t *csd;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + csd = &per_cpu(kgdb_roundup_csd, cpu);
> + csd->func = kgdb_call_nmi_hook;
> + }
> +}
> +
> /*
> * Some architectures need cache flushes when we set/clear a
> * breakpoint:
> @@ -993,6 +1027,8 @@ int kgdb_register_io_module(struct kgdb_io *new_dbg_io_ops)
> return -EBUSY;
> }
>
> + kgdb_generic_roundup_init();
> +
> if (new_dbg_io_ops->init) {
> err = new_dbg_io_ops->init();
> if (err) {

That's a little bit of overhead for those architectures not using that
generic code; but I suppose we can live with that.

2018-10-31 17:04:01

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

On Wed, Oct 31, 2018 at 02:49:26PM +0100, Peter Zijlstra wrote:
> On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> > Looking closely at it, it seems like a really bad idea to be calling
> > local_irq_enable() in kgdb_roundup_cpus(). If nothing else that seems
> > like it could violate spinlock semantics and cause a deadlock.
> >
> > Instead, let's use a private csd alongside
> > smp_call_function_single_async() to round up the other CPUs. Using
> > smp_call_function_single_async() doesn't require interrupts to be
> > enabled so we can remove the offending bit of code.
>
> You might want to mention that the only reason this isn't a deadlock
> itself is because there is a timeout on waiting for the slaves to
> check-in.

dbg_master_lock must be owned to call kgdb_roundup_cpus() so
the calls to smp_call_function_single_async() should never deadlock the
calling CPU unless there has been a previous failure to round up (e.g.
cores that cannot react to the round up signal).

When there is a failure to round up when we resume, there is a window (before
whatever locks that prevented the IPI being serviced are released) during which
the system will deadlock if the debugger is re entered.

I don't think there is any point trying to round up a CPU that did not
previously respond... it should still have an IPI pending. The deadlock
can be eliminated by getting the round up code to avoid CPUs whose csd->flags
are non-zero either by checking the flag in the kgdb code or adding something
like smp_trycall_function_single_async().


Daniel.

2018-10-31 18:43:05

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> index f3cadda45f07..9a3f952de6ed 100644
> --- a/kernel/debug/debug_core.c
> +++ b/kernel/debug/debug_core.c
> @@ -55,6 +55,7 @@
> #include <linux/mm.h>
> #include <linux/vmacache.h>
> #include <linux/rcupdate.h>
> +#include <linux/irq.h>
>
> #include <asm/cacheflush.h>
> #include <asm/byteorder.h>
> @@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
> return 0;
> }
>
> +/*
> + * Default (weak) implementation for kgdb_roundup_cpus
> + */
> +
> +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> +
> +void __weak kgdb_call_nmi_hook(void *ignored)
> +{
> + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> +}
> +
> +void __weak kgdb_roundup_cpus(void)
> +{
> + call_single_data_t *csd;
> + int cpu;
> +
> + for_each_cpu(cpu, cpu_online_mask) {
> + csd = &per_cpu(kgdb_roundup_csd, cpu);
> + smp_call_function_single_async(cpu, csd);
> + }

smp_call_function() automatically skips the calling CPU but this code does
not. It isn't a hard bug since kgdb_nmicallback() does a re-entrancy
check but I'd still prefer to skip the calling CPU.

As mentioned in another part of the thread we can also add robustness
by skipping a cpu where csd->flags != 0 (and adding an appropriately
large comment regarding why). Doing the check directly is abusing
internal knowledge that smp.c normally keeps to itself so an accessor
of some kind would be needed.


> +}
> +
> +static void kgdb_generic_roundup_init(void)
> +{
> + call_single_data_t *csd;
> + int cpu;
> +
> + for_each_possible_cpu(cpu) {
> + csd = &per_cpu(kgdb_roundup_csd, cpu);
> + csd->func = kgdb_call_nmi_hook;
> + }
> +}

I can't help noticing this code is very similar to kgdb_roundup_cpus. Do
we really gain much from ahead-of-time initializing csd->func?


Daniel.

2018-10-31 21:51:57

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

Hi,

On Wed, Oct 31, 2018 at 11:40 AM Daniel Thompson
<[email protected]> wrote:
>
> On Tue, Oct 30, 2018 at 03:18:43PM -0700, Douglas Anderson wrote:
> > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > index f3cadda45f07..9a3f952de6ed 100644
> > --- a/kernel/debug/debug_core.c
> > +++ b/kernel/debug/debug_core.c
> > @@ -55,6 +55,7 @@
> > #include <linux/mm.h>
> > #include <linux/vmacache.h>
> > #include <linux/rcupdate.h>
> > +#include <linux/irq.h>
> >
> > #include <asm/cacheflush.h>
> > #include <asm/byteorder.h>
> > @@ -220,6 +221,39 @@ int __weak kgdb_skipexception(int exception, struct pt_regs *regs)
> > return 0;
> > }
> >
> > +/*
> > + * Default (weak) implementation for kgdb_roundup_cpus
> > + */
> > +
> > +static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd);
> > +
> > +void __weak kgdb_call_nmi_hook(void *ignored)
> > +{
> > + kgdb_nmicallback(raw_smp_processor_id(), get_irq_regs());
> > +}
> > +
> > +void __weak kgdb_roundup_cpus(void)
> > +{
> > + call_single_data_t *csd;
> > + int cpu;
> > +
> > + for_each_cpu(cpu, cpu_online_mask) {
> > + csd = &per_cpu(kgdb_roundup_csd, cpu);
> > + smp_call_function_single_async(cpu, csd);
> > + }
>
> smp_call_function() automatically skips the calling CPU but this code does
> not. It isn't a hard bug since kgdb_nmicallback() does a re-entrancy
> check but I'd still prefer to skip the calling CPU.

I'll incorporate this into the next version.


> As mentioned in another part of the thread we can also add robustness
> by skipping a cpu where csd->flags != 0 (and adding an appropriately
> large comment regarding why). Doing the check directly is abusing
> internal knowledge that smp.c normally keeps to itself so an accessor
> of some kind would be needed.

Sure. I could add smp_async_func_finished() that just looked like:

int smp_async_func_finished(call_single_data_t *csd)
{
return !(csd->flags & CSD_FLAG_LOCK);
}

My understanding of all the mutual exclusion / memory barrier concepts
employed by smp.c is pretty weak, though. I'm hoping that it's safe
to just access the structure and check the bit directly.

...but do you think adding a generic accessor like this is better than
just keeping track of this in kgdb directly? I could avoid the
accessor by adding a "rounding_up" member to "struct
debuggerinfo_struct" and doing something like this in roundup:

/* If it didn't round up last time, don't try again */
if (kgdb_info[cpu].rounding_up)
continue

kgdb_info[cpu].rounding_up = true
smp_call_function_single_async(cpu, csd);

...and then in kgdb_nmicallback() I could just add:

kgdb_info[cpu].rounding_up = false

In that case we're not adding a generic accessor to smp.c that most
people should never use.


I'll wait to hear back from you if you think the accessor is OK. It
seems like it might be nice not to have to add something to smp.c just
for this one use case.


> > +}
> > +
> > +static void kgdb_generic_roundup_init(void)
> > +{
> > + call_single_data_t *csd;
> > + int cpu;
> > +
> > + for_each_possible_cpu(cpu) {
> > + csd = &per_cpu(kgdb_roundup_csd, cpu);
> > + csd->func = kgdb_call_nmi_hook;
> > + }
> > +}
>
> I can't help noticing this code is very similar to kgdb_roundup_cpus. Do
> we really gain much from ahead-of-time initializing csd->func?

Oh! Right... At first I thought about just trying to put the "csd"
on the stack in kgdb_roundup_cpus() but then I realized that it needed
to persist past the end of kgdb_roundup_cpus(). ...and once I gave up
on the idea of putting it on the stack I decided I needed the init.

...but you're right that I don't really. The only thing I'm initting
is the function pointer and it totally wouldn't hurt to just init that
over and over again every time kgdb_roundup_cpus() is called.

-Doug

2018-11-03 10:45:54

by Daniel Thompson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

On Wed, Oct 31, 2018 at 02:41:14PM -0700, Doug Anderson wrote:
> > As mentioned in another part of the thread we can also add robustness
> > by skipping a cpu where csd->flags != 0 (and adding an appropriately
> > large comment regarding why). Doing the check directly is abusing
> > internal knowledge that smp.c normally keeps to itself so an accessor
> > of some kind would be needed.
>
> Sure. I could add smp_async_func_finished() that just looked like:
>
> int smp_async_func_finished(call_single_data_t *csd)
> {
> return !(csd->flags & CSD_FLAG_LOCK);
> }
>
> My understanding of all the mutual exclusion / memory barrier concepts
> employed by smp.c is pretty weak, though. I'm hoping that it's safe
> to just access the structure and check the bit directly.
>
> ...but do you think adding a generic accessor like this is better than
> just keeping track of this in kgdb directly? I could avoid the
> accessor by adding a "rounding_up" member to "struct
> debuggerinfo_struct" and doing something like this in roundup:
>
> /* If it didn't round up last time, don't try again */
> if (kgdb_info[cpu].rounding_up)
> continue
>
> kgdb_info[cpu].rounding_up = true
> smp_call_function_single_async(cpu, csd);
>
> ...and then in kgdb_nmicallback() I could just add:
>
> kgdb_info[cpu].rounding_up = false
>
> In that case we're not adding a generic accessor to smp.c that most
> people should never use.

Whilst it is very tempting to make a sarcastic reply here ("Of course! What
kgdb really needs is yet another set of condition variables") I can't
because I actually agree with the proposal. I don't really want kgdb to
be too "special" especially when it doesn't need to be.

Only thing to note is that rounding_up will not be manipulated within a
common spin lock so you might have to invest a bit of thought to make
sure any races between the master and slave as the slave CPU clears the
flag are benign.


Daniel.

2018-11-07 01:11:17

by Doug Anderson

[permalink] [raw]
Subject: Re: [PATCH v2 2/2] kgdb: Fix kgdb_roundup_cpus() for arches who used smp_call_function()

Hi,

On Sat, Nov 3, 2018 at 3:45 AM Daniel Thompson
<[email protected]> wrote:
>
> On Wed, Oct 31, 2018 at 02:41:14PM -0700, Doug Anderson wrote:
> > > As mentioned in another part of the thread we can also add robustness
> > > by skipping a cpu where csd->flags != 0 (and adding an appropriately
> > > large comment regarding why). Doing the check directly is abusing
> > > internal knowledge that smp.c normally keeps to itself so an accessor
> > > of some kind would be needed.
> >
> > Sure. I could add smp_async_func_finished() that just looked like:
> >
> > int smp_async_func_finished(call_single_data_t *csd)
> > {
> > return !(csd->flags & CSD_FLAG_LOCK);
> > }
> >
> > My understanding of all the mutual exclusion / memory barrier concepts
> > employed by smp.c is pretty weak, though. I'm hoping that it's safe
> > to just access the structure and check the bit directly.
> >
> > ...but do you think adding a generic accessor like this is better than
> > just keeping track of this in kgdb directly? I could avoid the
> > accessor by adding a "rounding_up" member to "struct
> > debuggerinfo_struct" and doing something like this in roundup:
> >
> > /* If it didn't round up last time, don't try again */
> > if (kgdb_info[cpu].rounding_up)
> > continue
> >
> > kgdb_info[cpu].rounding_up = true
> > smp_call_function_single_async(cpu, csd);
> >
> > ...and then in kgdb_nmicallback() I could just add:
> >
> > kgdb_info[cpu].rounding_up = false
> >
> > In that case we're not adding a generic accessor to smp.c that most
> > people should never use.
>
> Whilst it is very tempting to make a sarcastic reply here ("Of course! What
> kgdb really needs is yet another set of condition variables") I can't
> because I actually agree with the proposal. I don't really want kgdb to
> be too "special" especially when it doesn't need to be.
>
> Only thing to note is that rounding_up will not be manipulated within a
> common spin lock so you might have to invest a bit of thought to make
> sure any races between the master and slave as the slave CPU clears the
> flag are benign.

OK, so I've hopefully got all this thought through and posted v3.
I've put most of my thoughts in the patch descriptions themselves so
let's continue the discussion there.

-Doug