The execution context for kgdb/kdb is pretty much unique. We are running
a debug trap handler with all CPUs parked in a holding loop and with
interrupts disabled. At least one CPU is in an unknowable execution
state (it could be in NMI or IRQ context, or simply have interrupts
disabled) and the others are servicing either an IRQ or an NMI,
depending on the architecture.
Breakpoints (including some implicit breakpoints planted when serious
errors are detected) can happen in more or less any context, including
when we own important spin locks.
As such, spin lock waits should never happen whilst we are executing the
kgdb trap handler, except, occasionally, via an explicit command from a
(forewarned?) local operator.
Currently kdb doesn't meet this criterion (although I think kgdb does),
so I started thinking about what tooling we could employ to reinforce
code review and bring problems to the surface.
The result is a short patch series that extends DEBUG_SPINLOCK with a
check on whether the execution context is safe. The "except via an
explicit command" aspect (mentioned above) convinced me to make the
checks conditional on KGDB_DEBUG_SPINLOCKS.
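To give an idea of the shape of the check, here is a minimal sketch.
This is not the exact patch: KGDB_DEBUG_SPINLOCKS is the new Kconfig
symbol proposed by this series, the helper shown is the existing
in_dbg_master(), and since the series also touches include/linux/kgdb.h
the real check may well use a different helper.

static void debug_spin_lock_before(raw_spinlock_t *lock)
{
#ifdef CONFIG_KGDB_DEBUG_SPINLOCKS
	/*
	 * A parked CPU may own this lock, so waiting for it from
	 * inside the kgdb/kdb trap handler risks deadlock.
	 */
	SPIN_BUG_ON(in_dbg_master(), lock, "spin lock wait in kgdb trap");
#endif
	SPIN_BUG_ON(lock->magic != SPINLOCK_MAGIC, lock, "bad magic");
	SPIN_BUG_ON(lock->owner == current, lock, "recursion");
	SPIN_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
							lock, "cpu recursion");
}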
Daniel Thompson (2):
debug: Convert dbg_slave_lock to an atomic
locking/spinlock/debug: Add checks for kgdb trap safety
include/linux/kgdb.h | 16 ++++++++++++++++
kernel/debug/debug_core.c | 8 ++++----
kernel/locking/spinlock_debug.c | 4 ++++
lib/Kconfig.kgdb | 11 +++++++++++
4 files changed, 35 insertions(+), 4 deletions(-)
base-commit: 6a8b55ed4056ea5559ebe4f6a4b247f627870d4c
--
2.25.4
Currently the debug core takes and releases dbg_slave_lock, knowing it to
be uncontended, as a means to set and clear a flag that it uses to herd
the other cores into or out of a holding pen.
Let's convert this to a regular atomic instead. This change is worthwhile
simply for the subtle increase in clarity in a very tangled bit of code.
It is also useful because it removes a raw_spin_lock() from within the
debug trap, which will make it easier to introduce spin lock debugging
code for the debug trap handler.
Signed-off-by: Daniel Thompson <[email protected]>
---
kernel/debug/debug_core.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index 2b7c9b67931d..8f43171ddeac 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -121,12 +121,12 @@ static struct kgdb_bkpt kgdb_break[KGDB_MAX_BREAKPOINTS] = {
atomic_t kgdb_active = ATOMIC_INIT(-1);
EXPORT_SYMBOL_GPL(kgdb_active);
static DEFINE_RAW_SPINLOCK(dbg_master_lock);
-static DEFINE_RAW_SPINLOCK(dbg_slave_lock);
/*
* We use NR_CPUs not PERCPU, in case kgdb is used to debug early
* bootup code (which might not have percpu set up yet):
*/
+static atomic_t slaves_must_spin;
static atomic_t masters_in_kgdb;
static atomic_t slaves_in_kgdb;
static atomic_t kgdb_break_tasklet_var;
@@ -615,7 +615,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
dump_stack();
kgdb_info[cpu].exception_state &= ~DCPU_WANT_BT;
} else if (kgdb_info[cpu].exception_state & DCPU_IS_SLAVE) {
- if (!raw_spin_is_locked(&dbg_slave_lock))
+ if (!atomic_read(&slaves_must_spin))
goto return_normal;
} else {
return_normal:
@@ -677,7 +677,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
* CPU in a spin state while the debugger is active
*/
if (!kgdb_single_step)
- raw_spin_lock(&dbg_slave_lock);
+ atomic_set(&slaves_must_spin, 1);
#ifdef CONFIG_SMP
/* If send_ready set, slaves are already waiting */
@@ -741,7 +741,7 @@ static int kgdb_cpu_enter(struct kgdb_state *ks, struct pt_regs *regs,
dbg_io_ops->post_exception();
if (!kgdb_single_step) {
- raw_spin_unlock(&dbg_slave_lock);
+ atomic_set(&slaves_must_spin, 0);
/* Wait till all the CPUs have quit from the debugger. */
while (kgdb_do_roundup && atomic_read(&slaves_in_kgdb))
cpu_relax();
--
2.25.4
On Fri, May 22, 2020 at 03:55:09PM +0100, Daniel Thompson wrote:
> +static atomic_t slaves_must_spin;
> + if (!atomic_read(&slaves_must_spin))
> + atomic_set(&slaves_must_spin, 1);
> + atomic_set(&slaves_must_spin, 0);
There is no atomic operation there; this is pointless use of atomic_t.
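To illustrate the alternative that observation points at: since only
simple loads and stores are involved, a plain flag accessed with
READ_ONCE()/WRITE_ONCE() would express the same thing. A rough sketch
(not a tested patch) against the hunks above:

static bool slaves_must_spin;

	/* slave path: spin while the master holds us in the pen */
	if (!READ_ONCE(slaves_must_spin))
		goto return_normal;

	/* master path: close the pen before rounding up the slaves ... */
	if (!kgdb_single_step)
		WRITE_ONCE(slaves_must_spin, true);

	/* ... and open it again once the exception has been handled */
	if (!kgdb_single_step)
		WRITE_ONCE(slaves_must_spin, false);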