2022-01-21 13:17:32

by Stephen Brennan

[permalink] [raw]
Subject: [PATCH v2] printk: disable optimistic spin during panic

A CPU executing with console lock spinning enabled might be halted
during a panic. Before the panicking CPU calls console_flush_on_panic(),
it may call console_trylock(), which attempts to optimistically spin,
deadlocking the panic CPU:

CPU 0 (panic CPU) CPU 1
----------------- ------
printk() {
vprintk_func() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_lock_spinning_enable();
... printing to console ...
panic() {
crash_smp_send_stop() {
NMI -------------------> HALT
}
atomic_notifier_call_chain() {
printk() {
...
console_trylock_spinnning() {
// optimistic spin infinitely

This hang during panic can be induced when a kdump kernel is loaded, and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script which concurrently writes to /dev/kmsg, and triggers a
panic, can result in this hang:

#!/bin/bash
date
# 991 chars (based on log buffer size):
chars="$(printf 'a%.0s' {1..991})"
while :; do
echo $chars > /dev/kmsg
done &
echo c > /proc/sysrq-trigger &
date
exit

To avoid this deadlock, ensure that console_trylock_spinning() does not
allow spinning once a panic has begun. Below are the hang rates for the
above test case, based on v5.16-rc8 before and after this patch:

before: 776 hangs / 1484 trials - 52.3%
after : 0 hangs / 102 trials - 0.0%

Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")

Signed-off-by: Stephen Brennan <[email protected]>
---

Note: I continue to run my tests past 100 trials, but at this point I'm
confident enough in the fix to send the patch.

As Petr suggested I will send another patch making the non-panic CPUs
bail out of console_unlock(), but I wanted to get this one first, since
it is implemented and ready now.

kernel/printk/printk.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 57b132b658e1..612ed895b967 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1843,6 +1843,16 @@ static int console_trylock_spinning(void)
if (console_trylock())
return 1;

+ /*
+ * It's unsafe for the panic CPU to spin, since the CPU holding the
+ * console_sem may have already been halted. However, if a panic has
+ * started, but we're not the panic CPU, we still spin. This protects
+ * the panic CPU from needing to write out messages buffered by other
+ * CPUs in console_unlock().
+ */
+ if (atomic_read(&panic_cpu) == raw_smp_processor_id())
+ return 0;
+
printk_safe_enter_irqsave(flags);

raw_spin_lock(&console_owner_lock);
--
2.30.2