2020-05-22 14:57:53

by Daniel Thompson

Subject: [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety

In general it is not safe to call spin_lock() whilst executing in the
kgdb trap handler. The trap can be entered from all sorts of execution
contexts (NMI, IRQ, irqs disabled, etc.) and kgdb/kdb needs to be
as resilient as possible.

Currently it is difficult to spot mistakes in the kgdb/kdb logic
(especially so for kdb because it uses more kernel features than
pure-kgdb). Let's provide a means to bring attention to deadlock
risks in the debug code.

Signed-off-by: Daniel Thompson <[email protected]>
---
 include/linux/kgdb.h            | 16 ++++++++++++++++
 kernel/locking/spinlock_debug.c |  4 ++++
 lib/Kconfig.kgdb                | 11 +++++++++++
 3 files changed, 31 insertions(+)

diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index b072aeb1fd78..de30ce8078cf 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -332,4 +332,20 @@ extern void kgdb_panic(const char *msg);
 #define dbg_late_init()
 static inline void kgdb_panic(const char *msg) {}
 #endif /* ! CONFIG_KGDB */
+
+#ifdef CONFIG_KGDB_DEBUG_SPINLOCK
+/**
+ * check_kgdb_context_before() - Check whether to issue a spinlock warning
+ *
+ * Currently this only reports when the master processor violates the
+ * locking rules (because we are using the in_dbg_master() macro since
+ * we are confident that will avoid false positives).
+ *
+ * Return: True if we are executing in the debug trap
+ */
+static inline int check_kgdb_context_before(void) { return in_dbg_master(); }
+#else
+static inline int check_kgdb_context_before(void) { return 0; }
+#endif
+
#endif /* _KGDB_H_ */
diff --git a/kernel/locking/spinlock_debug.c b/kernel/locking/spinlock_debug.c
index b9d93087ee66..b49789e0fed8 100644
--- a/kernel/locking/spinlock_debug.c
+++ b/kernel/locking/spinlock_debug.c
@@ -12,6 +12,7 @@
 #include <linux/debug_locks.h>
 #include <linux/delay.h>
 #include <linux/export.h>
+#include <linux/kgdb.h>

 void __raw_spin_lock_init(raw_spinlock_t *lock, const char *name,
 			  struct lock_class_key *key, short inner)
@@ -84,6 +85,7 @@ debug_spin_lock_before(raw_spinlock_t *lock)
 	SPIN_BUG_ON(READ_ONCE(lock->owner) == current, lock, "recursion");
 	SPIN_BUG_ON(READ_ONCE(lock->owner_cpu) == raw_smp_processor_id(),
 							lock, "cpu recursion");
+	SPIN_BUG_ON(check_kgdb_context_before(), lock, "in debug trap");
 }

 static inline void debug_spin_lock_after(raw_spinlock_t *lock)
@@ -174,6 +176,7 @@ int do_raw_read_trylock(rwlock_t *lock)
 void do_raw_read_unlock(rwlock_t *lock)
 {
 	RWLOCK_BUG_ON(lock->magic != RWLOCK_MAGIC, lock, "bad magic");
+	RWLOCK_BUG_ON(check_kgdb_context_before(), lock, "in debug trap");
 	arch_read_unlock(&lock->raw_lock);
 }

@@ -183,6 +186,7 @@ static inline void debug_write_lock_before(rwlock_t *lock)
 	RWLOCK_BUG_ON(lock->owner == current, lock, "recursion");
 	RWLOCK_BUG_ON(lock->owner_cpu == raw_smp_processor_id(),
 							lock, "cpu recursion");
+	RWLOCK_BUG_ON(check_kgdb_context_before(), lock, "in debug trap");
 }

 static inline void debug_write_lock_after(rwlock_t *lock)
diff --git a/lib/Kconfig.kgdb b/lib/Kconfig.kgdb
index 933680b59e2d..4d57900d6c53 100644
--- a/lib/Kconfig.kgdb
+++ b/lib/Kconfig.kgdb
@@ -29,6 +29,17 @@ config KGDB_SERIAL_CONSOLE
 	  Share a serial console with kgdb. Sysrq-g must be used
 	  to break in initially.

+config KGDB_DEBUG_SPINLOCK
+	bool "KGDB: Check for spin lock usage when system is halted"
+	select DEBUG_SPINLOCK
+	default n
+	help
+	  Say Y here to catch spin lock waiting when we are running
+	  in the kgdb trap handler and report it. When the trap handler
+	  is executing all other system activity is halted and spin lock
+	  contention will lead to deadlock. This makes any spin lock wait
+	  from this execution context risky.
+
 config KGDB_TESTS
 	bool "KGDB: internal test suite"
 	default n
--
2.25.4
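
Editorial illustration (not part of the patch): the check added above can be
modelled in userspace roughly as follows. This is only a sketch under stated
assumptions. Every model_* name is hypothetical, the flag model_in_debug_trap
stands in for in_dbg_master(), and the counter plus fprintf() stands in for
the SPIN_BUG_ON() report. The model is single-threaded, so it shows only where
the check sits in the lock path, not real contention or deadlock.

```c
/* Userspace model only: none of the model_* names exist in the kernel. */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumption: stands in for in_dbg_master(), true while in the debug trap. */
static bool model_in_debug_trap;

/* Assumption: counts reports that SPIN_BUG_ON() would otherwise print. */
static int model_violations;

struct model_lock {
	bool locked;
	const char *name;
};

/* Mirrors the shape of check_kgdb_context_before() from the patch. */
static int model_check_context_before(void)
{
	return model_in_debug_trap;
}

/* Mirrors the shape of debug_spin_lock_before(): report, do not block. */
static void model_debug_lock_before(const struct model_lock *lock)
{
	if (model_check_context_before()) {
		model_violations++;
		fprintf(stderr, "BUG: lock %s: in debug trap\n", lock->name);
	}
}

static void model_lock_acquire(struct model_lock *lock)
{
	model_debug_lock_before(lock);
	lock->locked = true;	/* single-threaded model: nothing to spin on */
}

static void model_lock_release(struct model_lock *lock)
{
	assert(lock->locked);	/* releasing an unheld lock is a model bug */
	lock->locked = false;
}
```

Acquiring a lock while model_in_debug_trap is false leaves model_violations
untouched; setting the flag and acquiring again increments it once. As in the
patch, the check reports the violation and still takes the lock.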


2020-05-22 16:37:37

by Peter Zijlstra

Subject: Re: [RFC PATCH 2/2] locking/spinlock/debug: Add checks for kgdb trap safety

On Fri, May 22, 2020 at 03:55:10PM +0100, Daniel Thompson wrote:
> In general it is not safe to call spin_lock() whilst executing in the
> kgdb trap handler. The trap can be entered from all sorts of execution
> contexts (NMI, IRQ, irqs disabled, etc.) and kgdb/kdb needs to be
> as resilient as possible.
>
> Currently it is difficult to spot mistakes in the kgdb/kdb logic
> (especially so for kdb because it uses more kernel features than
> pure-kgdb). Let's provide a means to bring attention to deadlock
> risks in the debug code.

I really dislike this thing. Also, commit:

f6f48e180404 ("lockdep: Teach lockdep about "USED" <- "IN-NMI" inversions")

should be able to trigger here when the kgdb traps are marked as NMI.
x86 will soon have that.