Hi,
This is the next part of our printk-rework effort (points 3 and
4 of the LPC 2019 summary [0]).
Here the concept of "atomic consoles" is introduced through a
new (optional) write_atomic() callback for console drivers. This
callback must be implemented as an NMI-safe variant of the
write() callback, meaning that it can function from any context
without relying on questionable tactics such as ignoring locking
and without relying on the console semaphore for
synchronization.
As an example of what such an atomic console can look like, this
series implements write_atomic() for the 8250 UART driver.
This series also introduces a new console printing mode called
"sync mode" that is only activated when the kernel is about to
end (such as panic, oops, shutdown, reboot). Sync mode can only
be activated if atomic consoles are available. A system without
registered atomic consoles will be unaffected by this series.
When in sync mode, the console printing behavior becomes:
- only consoles implementing write_atomic() will be called
- printing occurs within vprintk_store() instead of
console_unlock(), since the console semaphore is irrelevant
for atomic consoles
For systems that have registered atomic consoles, this series
improves the reliability of seeing crash messages by using new
locking techniques rather than "ignoring locks and hoping for
the best". In particular, atomic consoles rely on the
CPU-reentrant spinlock (i.e. the printk cpulock) for
synchronizing console output.
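To give a feel for the driver-side interface, below is a minimal,
hypothetical sketch of a console opting in to atomic printing (the
"myser" name and the my_console_write*() callbacks are made up).
The driver's Kconfig entry would additionally select
HAVE_ATOMIC_CONSOLE:

  /* Hypothetical sketch: write_atomic() is the NMI-safe variant
   * of write() and may be called from any context. */
  static struct console my_console = {
          .name         = "myser",
          .write        = my_console_write,        /* normal path */
          .write_atomic = my_console_write_atomic, /* any context */
          .flags        = CON_PRINTBUFFER,
          .index        = -1,
  };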
John Ogness
[0] https://lore.kernel.org/lkml/[email protected]/
John Ogness (10):
printk: relocate printk cpulock functions
printk: rename printk cpulock API and always disable interrupts
kgdb: delay roundup if holding printk cpulock
printk: relocate printk_delay()
printk: call boot_delay_msec() in printk_delay()
printk: use seqcount_latch for console_seq
console: add write_atomic interface
printk: introduce kernel sync mode
kdb: if available, only use atomic consoles for output mirroring
serial: 8250: implement write_atomic
arch/powerpc/include/asm/smp.h | 1 +
arch/powerpc/kernel/kgdb.c | 10 +-
arch/powerpc/kernel/smp.c | 5 +
arch/x86/kernel/kgdb.c | 9 +-
drivers/tty/serial/8250/8250.h | 47 ++-
drivers/tty/serial/8250/8250_core.c | 17 +-
drivers/tty/serial/8250/8250_fsl.c | 9 +
drivers/tty/serial/8250/8250_ingenic.c | 7 +
drivers/tty/serial/8250/8250_mtk.c | 29 +-
drivers/tty/serial/8250/8250_port.c | 92 ++--
drivers/tty/serial/8250/Kconfig | 1 +
include/linux/console.h | 32 ++
include/linux/kgdb.h | 3 +
include/linux/printk.h | 57 +--
include/linux/serial_8250.h | 5 +
kernel/debug/debug_core.c | 45 +-
kernel/debug/kdb/kdb_io.c | 16 +
kernel/printk/printk.c | 554 +++++++++++++++++--------
lib/Kconfig.debug | 3 +
lib/dump_stack.c | 4 +-
lib/nmi_backtrace.c | 4 +-
21 files changed, 684 insertions(+), 266 deletions(-)
base-commit: 23d8adcf8022b9483605531d8985f5b77533cb3a
--
2.20.1
kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
during cpu roundup. This will conflict with the printk cpulock.
Therefore, a CPU must ensure that it is not holding the printk
cpulock when calling kgdb_cpu_enter(). If it is, it must allow its
printk context to complete first.
A new helper function kgdb_roundup_delay() is introduced so that
kgdb can determine whether a CPU is holding the printk cpulock.
If so, a flag is set so that kgdb is re-triggered for that CPU
once the printk cpulock is released.
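Concretely, the arch roundup/IPI handlers change from calling
kgdb_nmicallback() unconditionally to roughly the following pattern
(a condensed sketch with a made-up function name; the real per-arch
changes are in the diff below):

  static int example_debugger_ipi(struct pt_regs *regs)
  {
          int cpu = raw_smp_processor_id();

          /* Do not round up a CPU holding the printk cpulock. */
          if (!kgdb_roundup_delay(cpu))
                  kgdb_nmicallback(cpu, regs);
          /*
           * Otherwise kgdb_roundup_delay() has flagged the CPU and
           * kgdb_roundup_cpu() will re-trigger the roundup once the
           * printk cpulock is released.
           */
          return 0;
  }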
Signed-off-by: John Ogness <[email protected]>
---
arch/powerpc/include/asm/smp.h | 1 +
arch/powerpc/kernel/kgdb.c | 10 +++++++-
arch/powerpc/kernel/smp.c | 5 ++++
arch/x86/kernel/kgdb.c | 9 ++++---
include/linux/kgdb.h | 3 +++
include/linux/printk.h | 8 ++++++
kernel/debug/debug_core.c | 45 ++++++++++++++++++++--------------
kernel/printk/printk.c | 12 +++++++++
8 files changed, 70 insertions(+), 23 deletions(-)
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 03b3d010cbab..eec452e647b3 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -58,6 +58,7 @@ struct smp_ops_t {
extern int smp_send_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us);
extern int smp_send_safe_nmi_ipi(int cpu, void (*fn)(struct pt_regs *), u64 delay_us);
+extern void smp_send_debugger_break_cpu(unsigned int cpu);
extern void smp_send_debugger_break(void);
extern void start_secondary_resume(void);
extern void smp_generic_give_timebase(void);
diff --git a/arch/powerpc/kernel/kgdb.c b/arch/powerpc/kernel/kgdb.c
index bdee7262c080..d57d37497862 100644
--- a/arch/powerpc/kernel/kgdb.c
+++ b/arch/powerpc/kernel/kgdb.c
@@ -120,11 +120,19 @@ int kgdb_skipexception(int exception, struct pt_regs *regs)
static int kgdb_debugger_ipi(struct pt_regs *regs)
{
- kgdb_nmicallback(raw_smp_processor_id(), regs);
+ int cpu = raw_smp_processor_id();
+
+ if (!kgdb_roundup_delay(cpu))
+ kgdb_nmicallback(cpu, regs);
return 0;
}
#ifdef CONFIG_SMP
+void kgdb_roundup_cpu(unsigned int cpu)
+{
+ smp_send_debugger_break_cpu(cpu);
+}
+
void kgdb_roundup_cpus(void)
{
smp_send_debugger_break();
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 447b78a87c8f..816d7f09bbf9 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -582,6 +582,11 @@ static void debugger_ipi_callback(struct pt_regs *regs)
debugger_ipi(regs);
}
+void smp_send_debugger_break_cpu(unsigned int cpu)
+{
+ smp_send_nmi_ipi(cpu, debugger_ipi_callback, 1000000);
+}
+
void smp_send_debugger_break(void)
{
smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, debugger_ipi_callback, 1000000);
diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 3a43a2dee658..37bd37cdf2b6 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -502,9 +502,12 @@ static int kgdb_nmi_handler(unsigned int cmd, struct pt_regs *regs)
if (atomic_read(&kgdb_active) != -1) {
/* KGDB CPU roundup */
cpu = raw_smp_processor_id();
- kgdb_nmicallback(cpu, regs);
- set_bit(cpu, was_in_debug_nmi);
- touch_nmi_watchdog();
+
+ if (!kgdb_roundup_delay(cpu)) {
+ kgdb_nmicallback(cpu, regs);
+ set_bit(cpu, was_in_debug_nmi);
+ touch_nmi_watchdog();
+ }
return NMI_HANDLED;
}
diff --git a/include/linux/kgdb.h b/include/linux/kgdb.h
index 258cdde8d356..9bca0d98db5a 100644
--- a/include/linux/kgdb.h
+++ b/include/linux/kgdb.h
@@ -212,6 +212,8 @@ extern void kgdb_call_nmi_hook(void *ignored);
*/
extern void kgdb_roundup_cpus(void);
+extern void kgdb_roundup_cpu(unsigned int cpu);
+
/**
* kgdb_arch_set_pc - Generic call back to the program counter
* @regs: Current &struct pt_regs.
@@ -365,5 +367,6 @@ extern void kgdb_free_init_mem(void);
#define dbg_late_init()
static inline void kgdb_panic(const char *msg) {}
static inline void kgdb_free_init_mem(void) { }
+static inline void kgdb_roundup_cpu(unsigned int cpu) {}
#endif /* ! CONFIG_KGDB */
#endif /* _KGDB_H_ */
diff --git a/include/linux/printk.h b/include/linux/printk.h
index ac738d1d9934..974ea2c99749 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -280,10 +280,18 @@ static inline void dump_stack(void)
extern int __printk_cpu_trylock(void);
extern void __printk_wait_on_cpu_lock(void);
extern void __printk_cpu_unlock(void);
+extern bool kgdb_roundup_delay(unsigned int cpu);
+
#else
+
#define __printk_cpu_trylock() 1
#define __printk_wait_on_cpu_lock()
#define __printk_cpu_unlock()
+
+static inline bool kgdb_roundup_delay(unsigned int cpu)
+{
+ return false;
+}
#endif /* CONFIG_SMP */
/**
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index b4aa6bb6b2bd..9117ca86b81c 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -241,35 +241,42 @@ NOKPROBE_SYMBOL(kgdb_call_nmi_hook);
static DEFINE_PER_CPU(call_single_data_t, kgdb_roundup_csd) =
CSD_INIT(kgdb_call_nmi_hook, NULL);
-void __weak kgdb_roundup_cpus(void)
+void __weak kgdb_roundup_cpu(unsigned int cpu)
{
call_single_data_t *csd;
+ int ret;
+
+ csd = &per_cpu(kgdb_roundup_csd, cpu);
+
+ /*
+ * If it didn't round up last time, don't try again
+ * since smp_call_function_single_async() will block.
+ *
+ * If rounding_up is false then we know that the
+ * previous call must have at least started and that
+ * means smp_call_function_single_async() won't block.
+ */
+ if (kgdb_info[cpu].rounding_up)
+ return;
+ kgdb_info[cpu].rounding_up = true;
+
+ ret = smp_call_function_single_async(cpu, csd);
+ if (ret)
+ kgdb_info[cpu].rounding_up = false;
+}
+NOKPROBE_SYMBOL(kgdb_roundup_cpu);
+
+void __weak kgdb_roundup_cpus(void)
+{
int this_cpu = raw_smp_processor_id();
int cpu;
- int ret;
for_each_online_cpu(cpu) {
/* No need to roundup ourselves */
if (cpu == this_cpu)
continue;
- csd = &per_cpu(kgdb_roundup_csd, cpu);
-
- /*
- * If it didn't round up last time, don't try again
- * since smp_call_function_single_async() will block.
- *
- * If rounding_up is false then we know that the
- * previous call must have at least started and that
- * means smp_call_function_single_async() won't block.
- */
- if (kgdb_info[cpu].rounding_up)
- continue;
- kgdb_info[cpu].rounding_up = true;
-
- ret = smp_call_function_single_async(cpu, csd);
- if (ret)
- kgdb_info[cpu].rounding_up = false;
+ kgdb_roundup_cpu(cpu);
}
}
NOKPROBE_SYMBOL(kgdb_roundup_cpus);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 3d0c933937b4..1b546e117f10 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -44,6 +44,7 @@
#include <linux/irq_work.h>
#include <linux/ctype.h>
#include <linux/uio.h>
+#include <linux/kgdb.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
@@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
#ifdef CONFIG_SMP
static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
+static unsigned int kgdb_cpu = -1;
/**
* __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
@@ -325,6 +327,16 @@ void __printk_cpu_unlock(void)
-1); /* LMM(__printk_cpu_unlock:B) */
}
EXPORT_SYMBOL(__printk_cpu_unlock);
+
+bool kgdb_roundup_delay(unsigned int cpu)
+{
+ if (cpu != atomic_read(&printk_cpulock_owner))
+ return false;
+
+ kgdb_cpu = cpu;
+ return true;
+}
+EXPORT_SYMBOL(kgdb_roundup_delay);
#endif /* CONFIG_SMP */
/* Number of registered extended console drivers. */
--
2.20.1
In preparation for synchronous printing, change @console_seq to use
seqcount_latch so that it can be read without requiring @console_sem.
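For reference, the lockless read side follows the usual
seqcount_latch pattern; the existing latched_seq_read_nolock()
helper in printk.c looks roughly like this:

  static u64 latched_seq_read_nolock(struct latched_seq *ls)
  {
          unsigned int seq;
          unsigned int idx;
          u64 val;

          do {
                  seq = raw_read_seqcount_latch(&ls->latch);
                  idx = seq & 0x1;
                  val = ls->val[idx];
          } while (read_seqcount_latch_retry(&ls->latch, seq));

          return val;
  }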
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 73 ++++++++++++++++++++++++------------------
1 file changed, 41 insertions(+), 32 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index d07d98c1e846..f8f46d9fba9b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -489,9 +489,7 @@ static u64 syslog_seq;
static size_t syslog_partial;
static bool syslog_time;
-/* All 3 protected by @console_sem. */
-/* the next printk record to write to the console */
-static u64 console_seq;
+/* Both protected by @console_sem. */
static u64 exclusive_console_stop_seq;
static unsigned long console_dropped;
@@ -500,6 +498,17 @@ struct latched_seq {
u64 val[2];
};
+/*
+ * The next printk record to write to the console. There are two
+ * copies (updated with seqcount_latch) so that reads can locklessly
+ * access a valid value. Writers are synchronized by @console_sem.
+ */
+static struct latched_seq console_seq = {
+ .latch = SEQCNT_LATCH_ZERO(console_seq.latch),
+ .val[0] = 0,
+ .val[1] = 0,
+};
+
/*
* The next printk record to read after the last 'clear' command. There are
* two copies (updated with seqcount_latch) so that reads can locklessly
@@ -563,7 +572,7 @@ bool printk_percpu_data_ready(void)
return __printk_percpu_data_ready;
}
-/* Must be called under syslog_lock. */
+/* Must be called under associated write-protection lock. */
static void latched_seq_write(struct latched_seq *ls, u64 val)
{
raw_write_seqcount_latch(&ls->latch);
@@ -2405,9 +2414,9 @@ EXPORT_SYMBOL(_printk);
#define prb_read_valid(rb, seq, r) false
#define prb_first_valid_seq(rb) 0
+#define latched_seq_read_nolock(seq) 0
+#define latched_seq_write(dst, src)
-static u64 syslog_seq;
-static u64 console_seq;
static u64 exclusive_console_stop_seq;
static unsigned long console_dropped;
@@ -2735,7 +2744,7 @@ void console_unlock(void)
bool do_cond_resched, retry;
struct printk_info info;
struct printk_record r;
- u64 __maybe_unused next_seq;
+ u64 seq;
if (console_suspended) {
up_console_sem();
@@ -2779,12 +2788,14 @@ void console_unlock(void)
size_t len;
skip:
- if (!prb_read_valid(prb, console_seq, &r))
+ seq = latched_seq_read_nolock(&console_seq);
+ if (!prb_read_valid(prb, seq, &r))
break;
- if (console_seq != r.info->seq) {
- console_dropped += r.info->seq - console_seq;
- console_seq = r.info->seq;
+ if (seq != r.info->seq) {
+ console_dropped += r.info->seq - seq;
+ latched_seq_write(&console_seq, r.info->seq);
+ seq = r.info->seq;
}
if (suppress_message_printing(r.info->level)) {
@@ -2793,13 +2804,13 @@ void console_unlock(void)
* directly to the console when we received it, and
* record that has level above the console loglevel.
*/
- console_seq++;
+ latched_seq_write(&console_seq, seq + 1);
goto skip;
}
/* Output to all consoles once old messages replayed. */
if (unlikely(exclusive_console &&
- console_seq >= exclusive_console_stop_seq)) {
+ seq >= exclusive_console_stop_seq)) {
exclusive_console = NULL;
}
@@ -2820,7 +2831,7 @@ void console_unlock(void)
len = record_print_text(&r,
console_msg_format & MSG_FORMAT_SYSLOG,
printk_time);
- console_seq++;
+ latched_seq_write(&console_seq, seq + 1);
/*
* While actively printing out messages, if another printk()
@@ -2848,9 +2859,6 @@ void console_unlock(void)
cond_resched();
}
- /* Get consistent value of the next-to-be-used sequence number. */
- next_seq = console_seq;
-
console_locked = 0;
up_console_sem();
@@ -2860,7 +2868,7 @@ void console_unlock(void)
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- retry = prb_read_valid(prb, next_seq, NULL);
+ retry = prb_read_valid(prb, latched_seq_read_nolock(&console_seq), NULL);
if (retry && console_trylock())
goto again;
}
@@ -2912,18 +2920,19 @@ void console_unblank(void)
*/
void console_flush_on_panic(enum con_flush_mode mode)
{
- /*
- * If someone else is holding the console lock, trylock will fail
- * and may_schedule may be set. Ignore and proceed to unlock so
- * that messages are flushed out. As this can be called from any
- * context and we don't want to get preempted while flushing,
- * ensure may_schedule is cleared.
- */
- console_trylock();
- console_may_schedule = 0;
-
- if (mode == CONSOLE_REPLAY_ALL)
- console_seq = prb_first_valid_seq(prb);
+ if (console_trylock()) {
+ if (mode == CONSOLE_REPLAY_ALL)
+ latched_seq_write(&console_seq, prb_first_valid_seq(prb));
+ } else {
+ /*
+ * Another context is holding the console lock and
+ * @console_may_schedule may be set. Ignore and proceed to
+ * unlock so that messages are flushed out. As this can be
+ * called from any context and we don't want to get preempted
+ * while flushing, ensure @console_may_schedule is cleared.
+ */
+ console_may_schedule = 0;
+ }
console_unlock();
}
@@ -3159,11 +3168,11 @@ void register_console(struct console *newcon)
* ignores console_lock.
*/
exclusive_console = newcon;
- exclusive_console_stop_seq = console_seq;
+ exclusive_console_stop_seq = latched_seq_read_nolock(&console_seq);
/* Get a consistent copy of @syslog_seq. */
mutex_lock(&syslog_lock);
- console_seq = syslog_seq;
+ latched_seq_write(&console_seq, syslog_seq);
mutex_unlock(&syslog_lock);
}
console_unlock();
--
2.20.1
Implement an NMI-safe write_atomic() console function in order to
support synchronous console printing.
Since interrupts need to be disabled during transmit, all usage of
the IER register is wrapped with access functions that use the
printk cpulock to synchronize register access while tracking the
state of the interrupts. This is necessary because write_atomic()
can be called from an NMI context that has preempted write_atomic().
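The resulting wrapping pattern looks roughly as follows (condensed
from the serial8250_set_IER() helper added in the diff below); the
cpulock is only taken when the port is used as a console:

  static inline void serial8250_set_IER(struct uart_8250_port *up,
                                        unsigned char ier)
  {
          unsigned long flags;
          bool is_console = uart_console(&up->port);

          /* Serialize IER writes against write_atomic(). */
          if (is_console)
                  console_atomic_cpu_lock(flags);

          serial_out(up, UART_IER, ier);

          if (is_console)
                  console_atomic_cpu_unlock(flags);
  }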
Signed-off-by: John Ogness <[email protected]>
---
drivers/tty/serial/8250/8250.h | 47 ++++++++++++-
drivers/tty/serial/8250/8250_core.c | 17 +++--
drivers/tty/serial/8250/8250_fsl.c | 9 +++
drivers/tty/serial/8250/8250_ingenic.c | 7 ++
drivers/tty/serial/8250/8250_mtk.c | 29 +++++++-
drivers/tty/serial/8250/8250_port.c | 92 ++++++++++++++++----------
drivers/tty/serial/8250/Kconfig | 1 +
include/linux/serial_8250.h | 5 ++
8 files changed, 163 insertions(+), 44 deletions(-)
diff --git a/drivers/tty/serial/8250/8250.h b/drivers/tty/serial/8250/8250.h
index 6473361525d1..9e8785ab9886 100644
--- a/drivers/tty/serial/8250/8250.h
+++ b/drivers/tty/serial/8250/8250.h
@@ -132,12 +132,55 @@ static inline void serial_dl_write(struct uart_8250_port *up, int value)
up->dl_write(up, value);
}
+static inline void serial8250_set_IER(struct uart_8250_port *up,
+ unsigned char ier)
+{
+ struct uart_port *port = &up->port;
+ unsigned long flags;
+ bool is_console;
+
+ is_console = uart_console(port);
+
+ if (is_console)
+ console_atomic_cpu_lock(flags);
+
+ serial_out(up, UART_IER, ier);
+
+ if (is_console)
+ console_atomic_cpu_unlock(flags);
+}
+
+static inline unsigned char serial8250_clear_IER(struct uart_8250_port *up)
+{
+ struct uart_port *port = &up->port;
+ unsigned int clearval = 0;
+ unsigned long flags;
+ unsigned int prior;
+ bool is_console;
+
+ is_console = uart_console(port);
+
+ if (up->capabilities & UART_CAP_UUE)
+ clearval = UART_IER_UUE;
+
+ if (is_console)
+ console_atomic_cpu_lock(flags);
+
+ prior = serial_port_in(port, UART_IER);
+ serial_port_out(port, UART_IER, clearval);
+
+ if (is_console)
+ console_atomic_cpu_unlock(flags);
+
+ return prior;
+}
+
static inline bool serial8250_set_THRI(struct uart_8250_port *up)
{
if (up->ier & UART_IER_THRI)
return false;
up->ier |= UART_IER_THRI;
- serial_out(up, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
return true;
}
@@ -146,7 +189,7 @@ static inline bool serial8250_clear_THRI(struct uart_8250_port *up)
if (!(up->ier & UART_IER_THRI))
return false;
up->ier &= ~UART_IER_THRI;
- serial_out(up, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
return true;
}
diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index 1ce193daea7f..fad00c0414e3 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -264,10 +264,8 @@ static void serial8250_backup_timeout(struct timer_list *t)
* Must disable interrupts or else we risk racing with the interrupt
* based handler.
*/
- if (up->port.irq) {
- ier = serial_in(up, UART_IER);
- serial_out(up, UART_IER, 0);
- }
+ if (up->port.irq)
+ ier = serial8250_clear_IER(up);
iir = serial_in(up, UART_IIR);
@@ -290,7 +288,7 @@ static void serial8250_backup_timeout(struct timer_list *t)
serial8250_tx_chars(up);
if (up->port.irq)
- serial_out(up, UART_IER, ier);
+ serial8250_set_IER(up, ier);
spin_unlock_irqrestore(&up->port.lock, flags);
@@ -568,6 +566,14 @@ serial8250_register_ports(struct uart_driver *drv, struct device *dev)
#ifdef CONFIG_SERIAL_8250_CONSOLE
+static void univ8250_console_write_atomic(struct console *co, const char *s,
+ unsigned int count)
+{
+ struct uart_8250_port *up = &serial8250_ports[co->index];
+
+ serial8250_console_write_atomic(up, s, count);
+}
+
static void univ8250_console_write(struct console *co, const char *s,
unsigned int count)
{
@@ -661,6 +667,7 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
static struct console univ8250_console = {
.name = "ttyS",
+ .write_atomic = univ8250_console_write_atomic,
.write = univ8250_console_write,
.device = uart_console_device,
.setup = univ8250_console_setup,
diff --git a/drivers/tty/serial/8250/8250_fsl.c b/drivers/tty/serial/8250/8250_fsl.c
index 4e75d2e4f87c..00e0f8f6607b 100644
--- a/drivers/tty/serial/8250/8250_fsl.c
+++ b/drivers/tty/serial/8250/8250_fsl.c
@@ -59,9 +59,18 @@ int fsl8250_handle_irq(struct uart_port *port)
/* Stop processing interrupts on input overrun */
if ((orig_lsr & UART_LSR_OE) && (up->overrun_backoff_time_ms > 0)) {
+ unsigned long flags;
unsigned long delay;
+ bool is_console;
+ is_console = uart_console(port);
+
+ if (is_console)
+ console_atomic_cpu_lock(flags);
up->ier = port->serial_in(port, UART_IER);
+ if (is_console)
+ console_atomic_cpu_unlock(flags);
+
if (up->ier & (UART_IER_RLSI | UART_IER_RDI)) {
port->ops->stop_rx(port);
} else {
diff --git a/drivers/tty/serial/8250/8250_ingenic.c b/drivers/tty/serial/8250/8250_ingenic.c
index 988bf6bcce42..ff0e1d84ec7e 100644
--- a/drivers/tty/serial/8250/8250_ingenic.c
+++ b/drivers/tty/serial/8250/8250_ingenic.c
@@ -146,6 +146,8 @@ OF_EARLYCON_DECLARE(x1000_uart, "ingenic,x1000-uart",
static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value)
{
+ unsigned long flags;
+ bool is_console;
int ier;
switch (offset) {
@@ -167,7 +169,12 @@ static void ingenic_uart_serial_out(struct uart_port *p, int offset, int value)
* If we have enabled modem status IRQs we should enable
* modem mode.
*/
+ is_console = uart_console(p);
+ if (is_console)
+ console_atomic_cpu_lock(flags);
ier = p->serial_in(p, UART_IER);
+ if (is_console)
+ console_atomic_cpu_unlock(flags);
if (ier & UART_IER_MSI)
value |= UART_MCR_MDCE | UART_MCR_FCM;
diff --git a/drivers/tty/serial/8250/8250_mtk.c b/drivers/tty/serial/8250/8250_mtk.c
index f7d3023f860f..f26140ebd67a 100644
--- a/drivers/tty/serial/8250/8250_mtk.c
+++ b/drivers/tty/serial/8250/8250_mtk.c
@@ -213,12 +213,37 @@ static void mtk8250_shutdown(struct uart_port *port)
static void mtk8250_disable_intrs(struct uart_8250_port *up, int mask)
{
- serial_out(up, UART_IER, serial_in(up, UART_IER) & (~mask));
+ struct uart_port *port = &up->port;
+ unsigned long flags;
+ unsigned int ier;
+ bool is_console;
+
+ is_console = uart_console(port);
+
+ if (is_console)
+ console_atomic_cpu_lock(flags);
+
+ ier = serial_in(up, UART_IER);
+ serial_out(up, UART_IER, ier & (~mask));
+
+ if (is_console)
+ console_atomic_cpu_unlock(flags);
}
static void mtk8250_enable_intrs(struct uart_8250_port *up, int mask)
{
- serial_out(up, UART_IER, serial_in(up, UART_IER) | mask);
+ struct uart_port *port = &up->port;
+ unsigned long flags;
+ unsigned int ier;
+
+ if (uart_console(port))
+ console_atomic_cpu_lock(flags);
+
+ ier = serial_in(up, UART_IER);
+ serial_out(up, UART_IER, ier | mask);
+
+ if (uart_console(port))
+ console_atomic_cpu_unlock(flags);
}
static void mtk8250_set_flow_ctrl(struct uart_8250_port *up, int mode)
diff --git a/drivers/tty/serial/8250/8250_port.c b/drivers/tty/serial/8250/8250_port.c
index 2164290cbd31..2c2a5abc1baf 100644
--- a/drivers/tty/serial/8250/8250_port.c
+++ b/drivers/tty/serial/8250/8250_port.c
@@ -757,7 +757,7 @@ static void serial8250_set_sleep(struct uart_8250_port *p, int sleep)
serial_out(p, UART_EFR, UART_EFR_ECB);
serial_out(p, UART_LCR, 0);
}
- serial_out(p, UART_IER, sleep ? UART_IERX_SLEEP : 0);
+ serial8250_set_IER(p, sleep ? UART_IERX_SLEEP : 0);
if (p->capabilities & UART_CAP_EFR) {
serial_out(p, UART_LCR, UART_LCR_CONF_MODE_B);
serial_out(p, UART_EFR, efr);
@@ -1429,7 +1429,7 @@ static void serial8250_stop_rx(struct uart_port *port)
up->ier &= ~(UART_IER_RLSI | UART_IER_RDI);
up->port.read_status_mask &= ~UART_LSR_DR;
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
serial8250_rpm_put(up);
}
@@ -1459,7 +1459,7 @@ void serial8250_em485_stop_tx(struct uart_8250_port *p)
serial8250_clear_and_reinit_fifos(p);
p->ier |= UART_IER_RLSI | UART_IER_RDI;
- serial_port_out(&p->port, UART_IER, p->ier);
+ serial8250_set_IER(p, p->ier);
}
}
EXPORT_SYMBOL_GPL(serial8250_em485_stop_tx);
@@ -1681,7 +1681,7 @@ static void serial8250_disable_ms(struct uart_port *port)
mctrl_gpio_disable_ms(up->gpios);
up->ier &= ~UART_IER_MSI;
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
}
static void serial8250_enable_ms(struct uart_port *port)
@@ -1697,7 +1697,7 @@ static void serial8250_enable_ms(struct uart_port *port)
up->ier |= UART_IER_MSI;
serial8250_rpm_get(up);
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
serial8250_rpm_put(up);
}
@@ -2124,14 +2124,7 @@ static void serial8250_put_poll_char(struct uart_port *port,
struct uart_8250_port *up = up_to_u8250p(port);
serial8250_rpm_get(up);
- /*
- * First save the IER then disable the interrupts
- */
- ier = serial_port_in(port, UART_IER);
- if (up->capabilities & UART_CAP_UUE)
- serial_port_out(port, UART_IER, UART_IER_UUE);
- else
- serial_port_out(port, UART_IER, 0);
+ ier = serial8250_clear_IER(up);
wait_for_xmitr(up, BOTH_EMPTY);
/*
@@ -2144,7 +2137,7 @@ static void serial8250_put_poll_char(struct uart_port *port,
* and restore the IER
*/
wait_for_xmitr(up, BOTH_EMPTY);
- serial_port_out(port, UART_IER, ier);
+ serial8250_set_IER(up, ier);
serial8250_rpm_put(up);
}
@@ -2447,7 +2440,7 @@ void serial8250_do_shutdown(struct uart_port *port)
*/
spin_lock_irqsave(&port->lock, flags);
up->ier = 0;
- serial_port_out(port, UART_IER, 0);
+ serial8250_set_IER(up, 0);
spin_unlock_irqrestore(&port->lock, flags);
synchronize_irq(port->irq);
@@ -2816,7 +2809,7 @@ serial8250_do_set_termios(struct uart_port *port, struct ktermios *termios,
if (up->capabilities & UART_CAP_RTOIE)
up->ier |= UART_IER_RTOIE;
- serial_port_out(port, UART_IER, up->ier);
+ serial8250_set_IER(up, up->ier);
if (up->capabilities & UART_CAP_EFR) {
unsigned char efr = 0;
@@ -3282,7 +3275,7 @@ EXPORT_SYMBOL_GPL(serial8250_set_defaults);
#ifdef CONFIG_SERIAL_8250_CONSOLE
-static void serial8250_console_putchar(struct uart_port *port, int ch)
+static void serial8250_console_putchar_locked(struct uart_port *port, int ch)
{
struct uart_8250_port *up = up_to_u8250p(port);
@@ -3290,6 +3283,18 @@ static void serial8250_console_putchar(struct uart_port *port, int ch)
serial_port_out(port, UART_TX, ch);
}
+static void serial8250_console_putchar(struct uart_port *port, int ch)
+{
+ struct uart_8250_port *up = up_to_u8250p(port);
+ unsigned long flags;
+
+ wait_for_xmitr(up, UART_LSR_THRE);
+
+ console_atomic_cpu_lock(flags);
+ serial8250_console_putchar_locked(port, ch);
+ console_atomic_cpu_unlock(flags);
+}
+
/*
* Restore serial console when h/w power-off detected
*/
@@ -3311,6 +3316,32 @@ static void serial8250_console_restore(struct uart_8250_port *up)
serial8250_out_MCR(up, UART_MCR_DTR | UART_MCR_RTS);
}
+void serial8250_console_write_atomic(struct uart_8250_port *up,
+ const char *s, unsigned int count)
+{
+ struct uart_port *port = &up->port;
+ unsigned long flags;
+ unsigned int ier;
+
+ console_atomic_cpu_lock(flags);
+
+ touch_nmi_watchdog();
+
+ ier = serial8250_clear_IER(up);
+
+ if (atomic_fetch_inc(&up->console_printing)) {
+ uart_console_write(port, "\n", 1,
+ serial8250_console_putchar_locked);
+ }
+ uart_console_write(port, s, count, serial8250_console_putchar_locked);
+ atomic_dec(&up->console_printing);
+
+ wait_for_xmitr(up, BOTH_EMPTY);
+ serial8250_set_IER(up, ier);
+
+ console_atomic_cpu_unlock(flags);
+}
+
/*
* Print a string to the serial port trying not to disturb
* any possible real use of the port...
@@ -3327,24 +3358,12 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
struct uart_port *port = &up->port;
unsigned long flags;
unsigned int ier;
- int locked = 1;
touch_nmi_watchdog();
- if (oops_in_progress)
- locked = spin_trylock_irqsave(&port->lock, flags);
- else
- spin_lock_irqsave(&port->lock, flags);
-
- /*
- * First save the IER then disable the interrupts
- */
- ier = serial_port_in(port, UART_IER);
+ spin_lock_irqsave(&port->lock, flags);
- if (up->capabilities & UART_CAP_UUE)
- serial_port_out(port, UART_IER, UART_IER_UUE);
- else
- serial_port_out(port, UART_IER, 0);
+ ier = serial8250_clear_IER(up);
/* check scratch reg to see if port powered off during system sleep */
if (up->canary && (up->canary != serial_port_in(port, UART_SCR))) {
@@ -3358,7 +3377,9 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
mdelay(port->rs485.delay_rts_before_send);
}
+ atomic_inc(&up->console_printing);
uart_console_write(port, s, count, serial8250_console_putchar);
+ atomic_dec(&up->console_printing);
/*
* Finally, wait for transmitter to become empty
@@ -3371,8 +3392,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
if (em485->tx_stopped)
up->rs485_stop_tx(up);
}
-
- serial_port_out(port, UART_IER, ier);
+ serial8250_set_IER(up, ier);
/*
* The receive handling will happen properly because the
@@ -3384,8 +3404,7 @@ void serial8250_console_write(struct uart_8250_port *up, const char *s,
if (up->msr_saved_flags)
serial8250_modem_status(up);
- if (locked)
- spin_unlock_irqrestore(&port->lock, flags);
+ spin_unlock_irqrestore(&port->lock, flags);
}
static unsigned int probe_baud(struct uart_port *port)
@@ -3405,6 +3424,7 @@ static unsigned int probe_baud(struct uart_port *port)
int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
{
+ struct uart_8250_port *up = up_to_u8250p(port);
int baud = 9600;
int bits = 8;
int parity = 'n';
@@ -3414,6 +3434,8 @@ int serial8250_console_setup(struct uart_port *port, char *options, bool probe)
if (!port->iobase && !port->membase)
return -ENODEV;
+ atomic_set(&up->console_printing, 0);
+
if (options)
uart_parse_options(options, &baud, &parity, &bits, &flow);
else if (probe)
diff --git a/drivers/tty/serial/8250/Kconfig b/drivers/tty/serial/8250/Kconfig
index d1b3c2373fa4..1671d2882ba7 100644
--- a/drivers/tty/serial/8250/Kconfig
+++ b/drivers/tty/serial/8250/Kconfig
@@ -9,6 +9,7 @@ config SERIAL_8250
depends on !S390
select SERIAL_CORE
select SERIAL_MCTRL_GPIO if GPIOLIB
+ select HAVE_ATOMIC_CONSOLE
help
This selects whether you want to include the driver for the standard
serial ports. The standard answer is Y. People who might say N
diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h
index 5db211f43b29..aa011f668705 100644
--- a/include/linux/serial_8250.h
+++ b/include/linux/serial_8250.h
@@ -7,6 +7,7 @@
#ifndef _LINUX_SERIAL_8250_H
#define _LINUX_SERIAL_8250_H
+#include <linux/atomic.h>
#include <linux/serial_core.h>
#include <linux/serial_reg.h>
#include <linux/platform_device.h>
@@ -125,6 +126,8 @@ struct uart_8250_port {
#define MSR_SAVE_FLAGS UART_MSR_ANY_DELTA
unsigned char msr_saved_flags;
+ atomic_t console_printing;
+
struct uart_8250_dma *dma;
const struct uart_8250_ops *ops;
@@ -180,6 +183,8 @@ void serial8250_init_port(struct uart_8250_port *up);
void serial8250_set_defaults(struct uart_8250_port *up);
void serial8250_console_write(struct uart_8250_port *up, const char *s,
unsigned int count);
+void serial8250_console_write_atomic(struct uart_8250_port *up, const char *s,
+ unsigned int count);
int serial8250_console_setup(struct uart_port *port, char *options, bool probe);
int serial8250_console_exit(struct uart_port *port);
--
2.20.1
Move printk_delay() "as is" further up so that it can be used by
new functions in an upcoming commit.
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 28 ++++++++++++++--------------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 1b546e117f10..8bdfac4c9ee9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1875,6 +1875,20 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
return do_syslog(type, buf, len, SYSLOG_FROM_READER);
}
+int printk_delay_msec __read_mostly;
+
+static inline void printk_delay(void)
+{
+ if (unlikely(printk_delay_msec)) {
+ int m = printk_delay_msec;
+
+ while (m--) {
+ mdelay(1);
+ touch_nmi_watchdog();
+ }
+ }
+}
+
/*
* Special console_lock variants that help to reduce the risk of soft-lockups.
* They allow to pass console_lock to another printk() call using a busy wait.
@@ -2129,20 +2143,6 @@ static u8 *__printk_recursion_counter(void)
local_irq_restore(flags); \
} while (0)
-int printk_delay_msec __read_mostly;
-
-static inline void printk_delay(void)
-{
- if (unlikely(printk_delay_msec)) {
- int m = printk_delay_msec;
-
- while (m--) {
- mdelay(1);
- touch_nmi_watchdog();
- }
- }
-}
-
static inline u32 printk_caller_id(void)
{
return in_task() ? task_pid_nr(current) :
--
2.20.1
Introduce "sync mode", which means that all printk calls will
synchronously write to the console. Once activated, this mode is
never deactivated. It is used when the kernel is about to end
(such as panic, oops, shutdown, reboot).
Sync mode can only be activated if atomic consoles are available.
In sync mode:
- only atomic consoles (write_atomic() callback) will print
- printing occurs within vprintk_store() instead of console_unlock()
CONSOLE_LOG_MAX is moved to printk.h to support the per-console
buffer used in sync mode.
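The resulting output path, condensed from the diff below (the NMI
variant of the sequence tracking is omitted here):

  /* In vprintk_store(), after the record has been committed: */
          if (in_sync_mode() && facility == 0 && final_commit)
                  call_sync_console_drivers(seq);

  /* ...which writes the record to every enabled atomic console: */
  static void call_sync_console_drivers(u64 seq)
  {
          struct console *con;

          for_each_console(con) {
                  if (!(con->flags & CON_ENABLED))
                          continue;
                  if (!con->write_atomic)
                          continue;
                  print_sync_to(con, seq);
          }
  }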
Signed-off-by: John Ogness <[email protected]>
---
include/linux/console.h | 3 +
include/linux/printk.h | 6 ++
kernel/printk/printk.c | 190 +++++++++++++++++++++++++++++++++++++---
3 files changed, 189 insertions(+), 10 deletions(-)
diff --git a/include/linux/console.h b/include/linux/console.h
index 2f11b604e487..eda9b96e3fb6 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -151,6 +151,9 @@ struct console {
short flags;
short index;
int cflag;
+#if defined(CONFIG_PRINTK) && defined(CONFIG_HAVE_ATOMIC_CONSOLE)
+ char sync_buf[CONSOLE_LOG_MAX];
+#endif
void *data;
struct console *next;
};
diff --git a/include/linux/printk.h b/include/linux/printk.h
index e8181934cfc5..04dd6e3617f0 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -47,6 +47,12 @@ static inline const char *printk_skip_headers(const char *buffer)
#define CONSOLE_EXT_LOG_MAX 8192
+/*
+ * The maximum size of a record formatted for console printing
+ * (i.e. with the prefix prepended to every line).
+ */
+#define CONSOLE_LOG_MAX 1024
+
/* printk's without a loglevel use this.. */
#define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index acd80a8d299f..3fed3be9effe 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -45,6 +45,7 @@
#include <linux/ctype.h>
#include <linux/uio.h>
#include <linux/kgdb.h>
+#include <linux/clocksource.h>
#include <linux/sched/clock.h>
#include <linux/sched/debug.h>
#include <linux/sched/task_stack.h>
@@ -509,6 +510,35 @@ static struct latched_seq console_seq = {
.val[1] = 0,
};
+#ifdef CONFIG_HAVE_ATOMIC_CONSOLE
+/*
+ * A separate console_seq is used during sync mode. This allows tracking the
+ * current printk record to write to the console without holding @console_sem.
+ * Writers are synchronized by the printk cpulock.
+ */
+static struct latched_seq console_sync_seq = {
+ .latch = SEQCNT_LATCH_ZERO(console_sync_seq.latch),
+ .val[0] = 0,
+ .val[1] = 0,
+};
+
+#ifdef CONFIG_HAVE_NMI
+/*
+ * A separate console_seq is used in NMI context during sync mode. This allows
+ * tracking the current printk record to write to the console from NMI
+ * context. Writers are synchronized by the printk cpulock.
+ */
+static struct latched_seq console_sync_nmi_seq = {
+ .latch = SEQCNT_LATCH_ZERO(console_sync_nmi_seq.latch),
+ .val[0] = 0,
+ .val[1] = 0,
+};
+#endif
+
+/* Set to enable sync mode. Once set, it is never cleared. */
+static bool sync_mode;
+#endif /* CONFIG_HAVE_ATOMIC_CONSOLE */
+
/*
* The next printk record to read after the last 'clear' command. There are
* two copies (updated with seqcount_latch) so that reads can locklessly
@@ -526,9 +556,6 @@ static struct latched_seq clear_seq = {
#define PREFIX_MAX 32
#endif
-/* the maximum size of a formatted record (i.e. with prefix added per line) */
-#define CONSOLE_LOG_MAX 1024
-
/* the maximum size allowed to be reserved for a record */
#define LOG_LINE_MAX (CONSOLE_LOG_MAX - PREFIX_MAX)
@@ -1900,6 +1927,126 @@ static inline void printk_delay(int level)
}
}
+#ifdef CONFIG_HAVE_ATOMIC_CONSOLE
+/*
+ * Locklessly determine the next record seq to print. The sync and sync+nmi
+ * variants must also be checked because the latest records may have been
+ * printed from these contexts.
+ */
+static u64 read_console_seq(void)
+{
+ u64 sync_seq;
+ u64 seq;
+
+ seq = latched_seq_read_nolock(&console_seq);
+ sync_seq = latched_seq_read_nolock(&console_sync_seq);
+ if (sync_seq > seq)
+ seq = sync_seq;
+#ifdef CONFIG_HAVE_NMI
+ sync_seq = latched_seq_read_nolock(&console_sync_nmi_seq);
+ if (sync_seq > seq)
+ seq = sync_seq;
+#endif
+ return seq;
+}
+
+static void enable_sync_mode(void)
+{
+ if (!sync_mode) {
+ /*
+ * The trailing printk message is important in order
+ * to flush any pending messages.
+ */
+ sync_mode = true;
+ pr_info("sync mode enabled\n");
+ }
+}
+
+static bool in_sync_mode(void)
+{
+ if (sync_mode)
+ return true;
+ if (oops_in_progress && have_atomic_console()) {
+ enable_sync_mode();
+ return true;
+ }
+ return false;
+}
+
+static bool print_sync(struct console *con, u64 *seq)
+{
+ struct printk_info info;
+ struct printk_record r;
+ size_t text_len;
+
+ prb_rec_init_rd(&r, &info, &con->sync_buf[0], sizeof(con->sync_buf));
+
+ if (!prb_read_valid(prb, *seq, &r))
+ return false;
+
+ text_len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
+
+ con->write_atomic(con, &con->sync_buf[0], text_len);
+
+ *seq = r.info->seq;
+
+ touch_softlockup_watchdog_sync();
+ clocksource_touch_watchdog();
+ rcu_cpu_stall_reset();
+ touch_nmi_watchdog();
+
+ if (text_len)
+ printk_delay(r.info->level);
+
+ return true;
+}
+
+static void print_sync_to(struct console *con, u64 seq)
+{
+ u64 printk_seq;
+
+ while (!__printk_cpu_trylock())
+ __printk_wait_on_cpu_lock();
+
+ for (;;) {
+ printk_seq = read_console_seq();
+ if (printk_seq > seq)
+ break;
+ if (!print_sync(con, &printk_seq))
+ break;
+#ifdef CONFIG_PRINTK_NMI
+ if (in_nmi()) {
+ latched_seq_write(&console_sync_nmi_seq, printk_seq + 1);
+ continue;
+ }
+#endif
+ latched_seq_write(&console_sync_seq, printk_seq + 1);
+ }
+
+ __printk_cpu_unlock();
+}
+
+static void call_sync_console_drivers(u64 seq)
+{
+ struct console *con;
+
+ for_each_console(con) {
+ if (!(con->flags & CON_ENABLED))
+ continue;
+ if (!con->write_atomic)
+ continue;
+ print_sync_to(con, seq);
+ }
+}
+#else
+
+#define read_console_seq() latched_seq_read_nolock(&console_seq)
+#define in_sync_mode() false
+#define enable_sync_mode()
+#define call_sync_console_drivers(seq) ((void)seq)
+
+#endif /* CONFIG_HAVE_ATOMIC_CONSOLE */
+
/*
* Special console_lock variants that help to reduce the risk of soft-lockups.
* They allow to pass console_lock to another printk() call using a busy wait.
@@ -2084,6 +2231,8 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
if (!cpu_online(smp_processor_id()) &&
!(con->flags & CON_ANYTIME))
continue;
+ if (in_sync_mode())
+ continue;
if (con->flags & CON_EXTENDED)
con->write(con, ext_text, ext_len);
else {
@@ -2251,6 +2400,7 @@ int vprintk_store(int facility, int level,
const u32 caller_id = printk_caller_id();
struct prb_reserved_entry e;
enum printk_info_flags flags = 0;
+ bool final_commit = false;
struct printk_record r;
unsigned long irqflags;
u16 trunc_msg_len = 0;
@@ -2261,6 +2411,7 @@ int vprintk_store(int facility, int level,
u16 text_len;
int ret = 0;
u64 ts_nsec;
+ u64 seq;
/*
* Since the duration of printk() can vary depending on the message
@@ -2299,6 +2450,7 @@ int vprintk_store(int facility, int level,
if (flags & LOG_CONT) {
prb_rec_init_wr(&r, reserve_size);
if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) {
+ seq = r.info->seq;
text_len = printk_sprint(&r.text_buf[r.info->text_len], reserve_size,
facility, &flags, fmt, args);
r.info->text_len += text_len;
@@ -2306,6 +2458,7 @@ int vprintk_store(int facility, int level,
if (flags & LOG_NEWLINE) {
r.info->flags |= LOG_NEWLINE;
prb_final_commit(&e);
+ final_commit = true;
} else {
prb_commit(&e);
}
@@ -2330,6 +2483,8 @@ int vprintk_store(int facility, int level,
goto out;
}
+ seq = r.info->seq;
+
/* fill message */
text_len = printk_sprint(&r.text_buf[0], reserve_size, facility, &flags, fmt, args);
if (trunc_msg_len)
@@ -2344,13 +2499,19 @@ int vprintk_store(int facility, int level,
memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info));
/* A message without a trailing newline can be continued. */
- if (!(flags & LOG_NEWLINE))
+ if (!(flags & LOG_NEWLINE)) {
prb_commit(&e);
- else
+ } else {
prb_final_commit(&e);
+ final_commit = true;
+ }
ret = text_len + trunc_msg_len;
out:
+ /* only the kernel may perform synchronous printing */
+ if (in_sync_mode() && facility == 0 && final_commit)
+ call_sync_console_drivers(seq);
+
printk_exit_irqrestore(recursion_ptr, irqflags);
return ret;
}
@@ -2419,13 +2580,13 @@ EXPORT_SYMBOL(_printk);
#else /* CONFIG_PRINTK */
-#define CONSOLE_LOG_MAX 0
#define printk_time false
#define prb_read_valid(rb, seq, r) false
#define prb_first_valid_seq(rb) 0
-#define latched_seq_read_nolock(seq) 0
+#define read_console_seq() 0
#define latched_seq_write(dst, src)
+#define in_sync_mode() false
static u64 exclusive_console_stop_seq;
static unsigned long console_dropped;
@@ -2739,6 +2900,8 @@ static inline int can_use_console(void)
if (atomic_read(&printk_cpulock_owner) == cpu)
return 0;
#endif
+ if (in_sync_mode())
+ return 0;
return cpu_online(cpu) || have_callable_console();
}
@@ -2808,7 +2971,7 @@ void console_unlock(void)
size_t len;
skip:
- seq = latched_seq_read_nolock(&console_seq);
+ seq = read_console_seq();
if (!prb_read_valid(prb, seq, &r))
break;
@@ -2888,7 +3051,7 @@ void console_unlock(void)
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- retry = prb_read_valid(prb, latched_seq_read_nolock(&console_seq), NULL);
+ retry = prb_read_valid(prb, read_console_seq(), NULL);
if (retry && console_trylock())
goto again;
}
@@ -3188,7 +3351,7 @@ void register_console(struct console *newcon)
* ignores console_lock.
*/
exclusive_console = newcon;
- exclusive_console_stop_seq = latched_seq_read_nolock(&console_seq);
+ exclusive_console_stop_seq = read_console_seq();
/* Get a consistent copy of @syslog_seq. */
mutex_lock(&syslog_lock);
@@ -3558,6 +3721,13 @@ void kmsg_dump(enum kmsg_dump_reason reason)
{
struct kmsg_dumper *dumper;
+ /*
+ * If atomic consoles are available, activate sync mode
+ * to make sure any final messages are visible.
+ */
+ if (have_atomic_console())
+ enable_sync_mode();
+
rcu_read_lock();
list_for_each_entry_rcu(dumper, &dump_list, list) {
enum kmsg_dump_reason max_reason = dumper->max_reason;
--
2.20.1
The printk cpulock functions use local_irq_disable(). This means that
hardware interrupts are also disabled on PREEMPT_RT. To make this
clear, rename the functions to use the raw_ prefix:
raw_printk_cpu_lock_irqsave(flags);
raw_printk_cpu_unlock_irqrestore(flags);
Also, these functions were a NOP for !CONFIG_SMP, but they still
need to disable hardware interrupts in that case. Modify them
accordingly.
Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 30 ++++++++++++++----------------
lib/dump_stack.c | 4 ++--
lib/nmi_backtrace.c | 4 ++--
3 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 259af4f97f50..ac738d1d9934 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -280,17 +280,22 @@ static inline void dump_stack(void)
extern int __printk_cpu_trylock(void);
extern void __printk_wait_on_cpu_lock(void);
extern void __printk_cpu_unlock(void);
+#else
+#define __printk_cpu_trylock() 1
+#define __printk_wait_on_cpu_lock()
+#define __printk_cpu_unlock()
+#endif /* CONFIG_SMP */
/**
- * printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning
- * lock and disable interrupts.
+ * raw_printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning
+ * lock and disable interrupts.
* @flags: Stack-allocated storage for saving local interrupt state,
- * to be passed to printk_cpu_unlock_irqrestore().
+ * to be passed to raw_printk_cpu_unlock_irqrestore().
*
* If the lock is owned by another CPU, spin until it becomes available.
* Interrupts are restored while spinning.
*/
-#define printk_cpu_lock_irqsave(flags) \
+#define raw_printk_cpu_lock_irqsave(flags) \
for (;;) { \
local_irq_save(flags); \
if (__printk_cpu_trylock()) \
@@ -300,22 +305,15 @@ extern void __printk_cpu_unlock(void);
}
/**
- * printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant spinning
- * lock and restore interrupts.
- * @flags: Caller's saved interrupt state, from printk_cpu_lock_irqsave().
+ * raw_printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant
+ * spinning lock and restore interrupts.
+ * @flags: Caller's saved interrupt state from raw_printk_cpu_lock_irqsave().
*/
-#define printk_cpu_unlock_irqrestore(flags) \
+#define raw_printk_cpu_unlock_irqrestore(flags) \
do { \
__printk_cpu_unlock(); \
local_irq_restore(flags); \
- } while (0) \
-
-#else
-
-#define printk_cpu_lock_irqsave(flags) ((void)flags)
-#define printk_cpu_unlock_irqrestore(flags) ((void)flags)
-
-#endif /* CONFIG_SMP */
+ } while (0)
extern int kptr_restrict;
diff --git a/lib/dump_stack.c b/lib/dump_stack.c
index cd3387bb34e5..7af32829b062 100644
--- a/lib/dump_stack.c
+++ b/lib/dump_stack.c
@@ -101,9 +101,9 @@ asmlinkage __visible void dump_stack_lvl(const char *log_lvl)
* Permit this cpu to perform nested stack dumps while serialising
* against other CPUs
*/
- printk_cpu_lock_irqsave(flags);
+ raw_printk_cpu_lock_irqsave(flags);
__dump_stack(log_lvl);
- printk_cpu_unlock_irqrestore(flags);
+ raw_printk_cpu_unlock_irqrestore(flags);
}
EXPORT_SYMBOL(dump_stack_lvl);
diff --git a/lib/nmi_backtrace.c b/lib/nmi_backtrace.c
index f9e89001b52e..569ec8676072 100644
--- a/lib/nmi_backtrace.c
+++ b/lib/nmi_backtrace.c
@@ -93,7 +93,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
* Allow nested NMI backtraces while serializing
* against other CPUs.
*/
- printk_cpu_lock_irqsave(flags);
+ raw_printk_cpu_lock_irqsave(flags);
if (!READ_ONCE(backtrace_idle) && regs && cpu_in_idle(instruction_pointer(regs))) {
pr_warn("NMI backtrace for cpu %d skipped: idling at %pS\n",
cpu, (void *)instruction_pointer(regs));
@@ -104,7 +104,7 @@ bool nmi_cpu_backtrace(struct pt_regs *regs)
else
dump_stack();
}
- printk_cpu_unlock_irqrestore(flags);
+ raw_printk_cpu_unlock_irqrestore(flags);
cpumask_clear_cpu(cpu, to_cpumask(backtrace_mask));
return true;
}
--
2.20.1
Add a write_atomic() callback to the console. This is an optional
callback for console drivers. It must be atomic (including
NMI-safe) when writing to the console.
Console drivers implementing write_atomic() must select the new
HAVE_ATOMIC_CONSOLE Kconfig option.
Console drivers must still implement the write() callback. The
write_atomic() callback will only be used in special situations,
such as when the kernel panics.
Creating an NMI-safe write_atomic() that must synchronize with
write() requires a careful implementation of the console driver.
To aid with the implementation, a pair of console_atomic_*()
helpers is provided:
console_atomic_cpu_lock(flags);
console_atomic_cpu_unlock(flags);
These functions synchronize using the printk cpulock and disable
hardware interrupts.
In order to increase effectiveness, the printk cpulock functions are
also made more aggressive and now keep interrupts disabled while
spinning.
CPUs holding the printk cpulock must not spin on any other lock.
Therefore can_use_console() will now return false if the printk
cpulock is held, in order to avoid calling into console driver
code, which typically contains spinlocks.
Likewise, console_trylock_spinning() will fail rather than attempt
to acquire the console_sem (which includes a spinlock in its
implementation) if the printk cpulock is held.
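As an illustration, a hypothetical driver implementing both
callbacks would take the cpulock around the actual hardware access
in both paths (a condensed sketch; my_port, my_tx_ready() and
my_tx() are made up):

  /* Low-level output, called with the printk cpulock held. */
  static void my_putc_locked(struct my_port *port, char c)
  {
          while (!my_tx_ready(port))      /* poll the hardware */
                  cpu_relax();
          my_tx(port, c);
  }

  static void my_write(struct console *con, const char *s,
                       unsigned int n)
  {
          struct my_port *port = con->data;
          unsigned long flags;

          while (n--) {
                  console_atomic_cpu_lock(flags);
                  my_putc_locked(port, *s++);
                  console_atomic_cpu_unlock(flags);
          }
  }

  static void my_write_atomic(struct console *con, const char *s,
                              unsigned int n)
  {
          struct my_port *port = con->data;
          unsigned long flags;

          /*
           * The cpulock is CPU-reentrant, so this is safe even if
           * this NMI interrupted my_write() on the same CPU.
           */
          console_atomic_cpu_lock(flags);
          while (n--)
                  my_putc_locked(port, *s++);
          console_atomic_cpu_unlock(flags);
  }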
Signed-off-by: John Ogness <[email protected]>
---
include/linux/console.h | 29 +++++++++++++++++++++++++++++
include/linux/printk.h | 19 ++++++++-----------
kernel/printk/printk.c | 22 +++++++++++++++++++++-
lib/Kconfig.debug | 3 +++
4 files changed, 61 insertions(+), 12 deletions(-)
diff --git a/include/linux/console.h b/include/linux/console.h
index 20874db50bc8..2f11b604e487 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -16,6 +16,7 @@
#include <linux/atomic.h>
#include <linux/types.h>
+#include <linux/printk.h>
struct vc_data;
struct console_font_op;
@@ -140,6 +141,7 @@ static inline int con_debug_leave(void)
struct console {
char name[16];
void (*write)(struct console *, const char *, unsigned);
+ void (*write_atomic)(struct console *co, const char *s, unsigned int count);
int (*read)(struct console *, char *, unsigned);
struct tty_driver *(*device)(struct console *, int *);
void (*unblank)(void);
@@ -159,6 +161,33 @@ struct console {
#define for_each_console(con) \
for (con = console_drivers; con != NULL; con = con->next)
+#ifdef CONFIG_HAVE_ATOMIC_CONSOLE
+#define have_atomic_console() \
+ ({ \
+ struct console *con; \
+ \
+ for_each_console(con) { \
+ if (!(con->flags & CON_ENABLED)) \
+ continue; \
+ if (con->write_atomic) \
+ break; \
+ } \
+ (con != NULL); \
+ })
+
+/*
+ * For write_atomic() implementations, the printk cpu-reentrant spinlock can
+ * be used to help synchronize between write_atomic() and write().
+ */
+#define console_atomic_cpu_lock raw_printk_cpu_lock_irqsave
+#define console_atomic_cpu_unlock raw_printk_cpu_unlock_irqrestore
+
+#else
+
+#define have_atomic_console() false
+
+#endif /* CONFIG_HAVE_ATOMIC_CONSOLE */
+
extern int console_set_on_cmdline;
extern struct console *early_console;
diff --git a/include/linux/printk.h b/include/linux/printk.h
index 974ea2c99749..e8181934cfc5 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -295,22 +295,19 @@ static inline bool kgdb_roundup_delay(unsigned int cpu)
#endif /* CONFIG_SMP */
/**
- * raw_printk_cpu_lock_irqsave() - Acquire the printk cpu-reentrant spinning
- * lock and disable interrupts.
+ * raw_printk_cpu_lock_irqsave() - Disable interrupts and acquire the printk
+ * cpu-reentrant spinning lock.
* @flags: Stack-allocated storage for saving local interrupt state,
* to be passed to raw_printk_cpu_unlock_irqrestore().
*
* If the lock is owned by another CPU, spin until it becomes available.
- * Interrupts are restored while spinning.
*/
-#define raw_printk_cpu_lock_irqsave(flags) \
- for (;;) { \
- local_irq_save(flags); \
- if (__printk_cpu_trylock()) \
- break; \
- local_irq_restore(flags); \
- __printk_wait_on_cpu_lock(); \
- }
+#define raw_printk_cpu_lock_irqsave(flags) \
+ do { \
+ local_irq_save(flags); \
+ while (!__printk_cpu_trylock()) \
+ __printk_wait_on_cpu_lock(); \
+ } while (0)
/**
* raw_printk_cpu_unlock_irqrestore() - Release the printk cpu-reentrant
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index f8f46d9fba9b..acd80a8d299f 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1993,6 +1993,16 @@ static int console_trylock_spinning(void)
bool spin = false;
unsigned long flags;
+#ifdef CONFIG_SMP
+ /*
+ * CPUs holding the printk cpulock must not spin on any lock. Even
+ * console_trylock() must not be called because its implementation
+ * uses spinlocks.
+ */
+ if (atomic_read(&printk_cpulock_owner) == smp_processor_id())
+ return 0;
+#endif
+
if (console_trylock())
return 1;
@@ -2719,7 +2729,17 @@ static int have_callable_console(void)
*/
static inline int can_use_console(void)
{
- return cpu_online(raw_smp_processor_id()) || have_callable_console();
+ int cpu = raw_smp_processor_id();
+#ifdef CONFIG_SMP
+ /*
+ * CPUs holding the printk cpulock must not spin on any lock.
+ * Allowing console usage could call into the spinlocks of the
+ * various console drivers.
+ */
+ if (atomic_read(&printk_cpulock_owner) == cpu)
+ return 0;
+#endif
+ return cpu_online(cpu) || have_callable_console();
}
/**
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 831212722924..a32e57329f0a 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -72,6 +72,9 @@ config CONSOLE_LOGLEVEL_QUIET
will be used as the loglevel. IOW passing "quiet" will be the
equivalent of passing "loglevel=<CONSOLE_LOGLEVEL_QUIET>"
+config HAVE_ATOMIC_CONSOLE
+ bool
+
config MESSAGE_LOGLEVEL_DEFAULT
int "Default message log level (1-7)"
range 1 7
--
2.20.1
Move the printk cpulock functions "as is" further up so that they
can be used by other printk.c functions in an upcoming commit.
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 232 ++++++++++++++++++++---------------------
1 file changed, 116 insertions(+), 116 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 825277e1e742..3d0c933937b4 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -211,6 +211,122 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
return 0;
}
+#ifdef CONFIG_SMP
+static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
+static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
+
+/**
+ * __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
+ * spinning lock is not owned by any CPU.
+ *
+ * Context: Any context.
+ */
+void __printk_wait_on_cpu_lock(void)
+{
+ do {
+ cpu_relax();
+ } while (atomic_read(&printk_cpulock_owner) != -1);
+}
+EXPORT_SYMBOL(__printk_wait_on_cpu_lock);
+
+/**
+ * __printk_cpu_trylock() - Try to acquire the printk cpu-reentrant
+ * spinning lock.
+ *
+ * If no processor has the lock, the calling processor takes the lock and
+ * becomes the owner. If the calling processor is already the owner of the
+ * lock, this function succeeds immediately.
+ *
+ * Context: Any context. Expects interrupts to be disabled.
+ * Return: 1 on success, otherwise 0.
+ */
+int __printk_cpu_trylock(void)
+{
+ int cpu;
+ int old;
+
+ cpu = smp_processor_id();
+
+ /*
+ * Guarantee loads and stores from this CPU when it is the lock owner
+ * are _not_ visible to the previous lock owner. This pairs with
+ * __printk_cpu_unlock:B.
+ *
+ * Memory barrier involvement:
+ *
+ * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B, then
+ * __printk_cpu_unlock:A can never read from __printk_cpu_trylock:B.
+ *
+ * Relies on:
+ *
+ * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
+ * of the previous CPU
+ * matching
+ * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
+ * of this CPU
+ */
+ old = atomic_cmpxchg_acquire(&printk_cpulock_owner, -1,
+ cpu); /* LMM(__printk_cpu_trylock:A) */
+ if (old == -1) {
+ /*
+ * This CPU is now the owner and begins loading/storing
+ * data: LMM(__printk_cpu_trylock:B)
+ */
+ return 1;
+
+ } else if (old == cpu) {
+ /* This CPU is already the owner. */
+ atomic_inc(&printk_cpulock_nested);
+ return 1;
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(__printk_cpu_trylock);
+
+/**
+ * __printk_cpu_unlock() - Release the printk cpu-reentrant spinning lock.
+ *
+ * The calling processor must be the owner of the lock.
+ *
+ * Context: Any context. Expects interrupts to be disabled.
+ */
+void __printk_cpu_unlock(void)
+{
+ if (atomic_read(&printk_cpulock_nested)) {
+ atomic_dec(&printk_cpulock_nested);
+ return;
+ }
+
+ /*
+ * This CPU is finished loading/storing data:
+ * LMM(__printk_cpu_unlock:A)
+ */
+
+ /*
+ * Guarantee loads and stores from this CPU when it was the
+ * lock owner are visible to the next lock owner. This pairs
+ * with __printk_cpu_trylock:A.
+ *
+ * Memory barrier involvement:
+ *
+ * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B,
+ * then __printk_cpu_trylock:B reads from __printk_cpu_unlock:A.
+ *
+ * Relies on:
+ *
+ * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
+ * of this CPU
+ * matching
+ * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
+ * of the next CPU
+ */
+ atomic_set_release(&printk_cpulock_owner,
+ -1); /* LMM(__printk_cpu_unlock:B) */
+}
+EXPORT_SYMBOL(__printk_cpu_unlock);
+#endif /* CONFIG_SMP */
+
/* Number of registered extended console drivers. */
static int nr_ext_console_drivers;
@@ -3578,119 +3694,3 @@ void kmsg_dump_rewind(struct kmsg_dump_iter *iter)
EXPORT_SYMBOL_GPL(kmsg_dump_rewind);
#endif
-
-#ifdef CONFIG_SMP
-static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
-static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
-
-/**
- * __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
- * spinning lock is not owned by any CPU.
- *
- * Context: Any context.
- */
-void __printk_wait_on_cpu_lock(void)
-{
- do {
- cpu_relax();
- } while (atomic_read(&printk_cpulock_owner) != -1);
-}
-EXPORT_SYMBOL(__printk_wait_on_cpu_lock);
-
-/**
- * __printk_cpu_trylock() - Try to acquire the printk cpu-reentrant
- * spinning lock.
- *
- * If no processor has the lock, the calling processor takes the lock and
- * becomes the owner. If the calling processor is already the owner of the
- * lock, this function succeeds immediately.
- *
- * Context: Any context. Expects interrupts to be disabled.
- * Return: 1 on success, otherwise 0.
- */
-int __printk_cpu_trylock(void)
-{
- int cpu;
- int old;
-
- cpu = smp_processor_id();
-
- /*
- * Guarantee loads and stores from this CPU when it is the lock owner
- * are _not_ visible to the previous lock owner. This pairs with
- * __printk_cpu_unlock:B.
- *
- * Memory barrier involvement:
- *
- * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B, then
- * __printk_cpu_unlock:A can never read from __printk_cpu_trylock:B.
- *
- * Relies on:
- *
- * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
- * of the previous CPU
- * matching
- * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
- * of this CPU
- */
- old = atomic_cmpxchg_acquire(&printk_cpulock_owner, -1,
- cpu); /* LMM(__printk_cpu_trylock:A) */
- if (old == -1) {
- /*
- * This CPU is now the owner and begins loading/storing
- * data: LMM(__printk_cpu_trylock:B)
- */
- return 1;
-
- } else if (old == cpu) {
- /* This CPU is already the owner. */
- atomic_inc(&printk_cpulock_nested);
- return 1;
- }
-
- return 0;
-}
-EXPORT_SYMBOL(__printk_cpu_trylock);
-
-/**
- * __printk_cpu_unlock() - Release the printk cpu-reentrant spinning lock.
- *
- * The calling processor must be the owner of the lock.
- *
- * Context: Any context. Expects interrupts to be disabled.
- */
-void __printk_cpu_unlock(void)
-{
- if (atomic_read(&printk_cpulock_nested)) {
- atomic_dec(&printk_cpulock_nested);
- return;
- }
-
- /*
- * This CPU is finished loading/storing data:
- * LMM(__printk_cpu_unlock:A)
- */
-
- /*
- * Guarantee loads and stores from this CPU when it was the
- * lock owner are visible to the next lock owner. This pairs
- * with __printk_cpu_trylock:A.
- *
- * Memory barrier involvement:
- *
- * If __printk_cpu_trylock:A reads from __printk_cpu_unlock:B,
- * then __printk_cpu_trylock:B reads from __printk_cpu_unlock:A.
- *
- * Relies on:
- *
- * RELEASE from __printk_cpu_unlock:A to __printk_cpu_unlock:B
- * of this CPU
- * matching
- * ACQUIRE from __printk_cpu_trylock:A to __printk_cpu_trylock:B
- * of the next CPU
- */
- atomic_set_release(&printk_cpulock_owner,
- -1); /* LMM(__printk_cpu_unlock:B) */
-}
-EXPORT_SYMBOL(__printk_cpu_unlock);
-#endif /* CONFIG_SMP */
--
2.20.1
boot_delay_msec() is always called immediately before printk_delay()
so just call it from within printk_delay().
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 8bdfac4c9ee9..d07d98c1e846 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1877,8 +1877,10 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, int, len)
int printk_delay_msec __read_mostly;
-static inline void printk_delay(void)
+static inline void printk_delay(int level)
{
+ boot_delay_msec(level);
+
if (unlikely(printk_delay_msec)) {
int m = printk_delay_msec;
@@ -2350,8 +2352,7 @@ asmlinkage int vprintk_emit(int facility, int level,
in_sched = true;
}
- boot_delay_msec(level);
- printk_delay();
+ printk_delay(level);
printed_len = vprintk_store(facility, level, dev_info, fmt, args);
--
2.20.1
On Tue, Aug 03, 2021 at 03:18:51PM +0206, John Ogness wrote:
> Hi,
>
> This is the next part of our printk-rework effort (points 3 and
> 4 of the LPC 2019 summary [0]).
>
> Here the concept of "atomic consoles" is introduced through a
> new (optional) write_atomic() callback for console drivers. This
> callback must be implemented as an NMI-safe variant of the
> write() callback, meaning that it can function from any context
> without relying on questionable tactics such as ignoring locking
> and also without relying on the synchronization of console
> semaphore.
>
> As an example of how such an atomic console can look like, this
> series implements write_atomic() for the 8250 UART driver.
>
> This series also introduces a new console printing mode called
> "sync mode" that is only activated when the kernel is about to
> end (such as panic, oops, shutdown, reboot). Sync mode can only
> be activated if atomic consoles are available. A system without
> registered atomic consoles will be unaffected by this series.
>
> When in sync mode, the console printing behavior becomes:
>
> - only consoles implementing write_atomic() will be called
>
> - printing occurs within vprintk_store() instead of
> console_unlock(), since the console semaphore is irrelevant
> for atomic consoles
>
> For systems that have registered atomic consoles, this series
> improves the reliability of seeing crash messages by using new
> locking techniques rather than "ignoring locks and hoping for
> the best". In particular, atomic consoles rely on the
> CPU-reentrant spinlock (i.e. the printk cpulock) for
> synchronizing console output.
If the console is runtime suspended, who will bring it up?
Does it mean that this callback can't be implemented on the consoles that
do runtime suspend (some of 8250 currently, for example)?
--
With Best Regards,
Andy Shevchenko
On Tue, Aug 03, 2021 at 03:18:58PM +0206, John Ogness wrote:
> Add a write_atomic() callback to the console. This is an optional
> function for console drivers. The function must be atomic (including
> NMI safe) for writing to the console.
>
> Console drivers implementing write_atomic() must select the new
> HAVE_ATOMIC_CONSOLE Kconfig option.
>
> Console drivers must still implement the write() callback. The
> write_atomic() callback will only be used in special situations,
> such as when the kernel panics.
>
> Creating an NMI safe write_atomic() that must synchronize with
> write() requires a careful implementation of the console driver. To
> aid with the implementation, a set of console_atomic_*() functions
> are provided:
>
> void console_atomic_lock(unsigned long flags);
> void console_atomic_unlock(unsigned long flags);
>
> These functions synchronize using the printk cpulock and disable
> hardware interrupts.
>
> In order to increase effectiveness, the printk cpulock functions are
> also made more aggressive and now keep interrupts disabled while
> spinning.
>
> CPUs holding the printk cpulock must not spin on any other lock.
> Therefore can_use_console() will now return false if the printk
> cpulock is held in order to avoid calling into console driver code,
> which typically contains spinlocks.
>
> Likewise, console_trylock_spinning() will fail rather than attempt
> to acquire the console_sem (which includes a spinlock in its
...
> #include <linux/atomic.h>
> #include <linux/types.h>
> +#include <linux/printk.h>
Ordered?
...
> + if (!(con->flags & CON_ENABLED)) \
> + continue; \
What about
#define console_is_enabled(con) (!!(con->flags & CON_ENABLED))
or inliner equivalent
static inline bool console_is_enabled(struct console *con)
{
return !!(con->flags & CON_ENABLED);
}
?
--
With Best Regards,
Andy Shevchenko
On Tue, Aug 03, 2021 at 03:19:01PM +0206, John Ogness wrote:
> Implement an NMI-safe write_atomic() console function in order to
> support synchronous console printing.
>
> Since interrupts need to be disabled during transmit, all usage of
> the IER register is wrapped with access functions that use the
> printk cpulock to synchronize register access while tracking the
> state of the interrupts. This is necessary because write_atomic()
> can be called from an NMI context that has preempted write_atomic().
...
> +static inline void serial8250_set_IER(struct uart_8250_port *up,
> + unsigned char ier)
> +{
> + struct uart_port *port = &up->port;
> + unsigned long flags;
> + bool is_console;
> + is_console = uart_console(port);
> +
> + if (is_console)
> + console_atomic_cpu_lock(flags);
> +
> + serial_out(up, UART_IER, ier);
> +
> + if (is_console)
> + console_atomic_cpu_unlock(flags);
I would rewrite it as
if (uart_console()) {
console_atomic_cpu_lock(flags);
serial_out(up, UART_IER, ier);
console_atomic_cpu_unlock(flags);
} else {
serial_out(up, UART_IER, ier);
}
No additional variable, easier to get the algorithm on the first glance, less
error prone.
> +}
> +static inline unsigned char serial8250_clear_IER(struct uart_8250_port *up)
> +{
> + struct uart_port *port = &up->port;
> + unsigned int clearval = 0;
> + unsigned long flags;
> + unsigned int prior;
> + bool is_console;
> +
> + is_console = uart_console(port);
> +
> + if (up->capabilities & UART_CAP_UUE)
> + clearval = UART_IER_UUE;
> +
> + if (is_console)
> + console_atomic_cpu_lock(flags);
> +
> + prior = serial_port_in(port, UART_IER);
> + serial_port_out(port, UART_IER, clearval);
> +
> + if (is_console)
> + console_atomic_cpu_unlock(flags);
Ditto.
> + return prior;
> +}
...
> + is_console = uart_console(port);
> +
> + if (is_console)
> + console_atomic_cpu_lock(flags);
> up->ier = port->serial_in(port, UART_IER);
> + if (is_console)
> + console_atomic_cpu_unlock(flags);
> +
I'm wondering why you can't call the above function here?
...
> + is_console = uart_console(p);
> + if (is_console)
> + console_atomic_cpu_lock(flags);
> ier = p->serial_in(p, UART_IER);
> + if (is_console)
> + console_atomic_cpu_unlock(flags);
Ditto.
...
> + is_console = uart_console(port);
> +
> + if (is_console)
> + console_atomic_cpu_lock(flags);
> +
> + ier = serial_in(up, UART_IER);
> + serial_out(up, UART_IER, ier & (~mask));
> +
> + if (is_console)
> + console_atomic_cpu_unlock(flags);
Ditto.
...
> + if (uart_console(port))
> + console_atomic_cpu_lock(flags);
> +
> + ier = serial_in(up, UART_IER);
> + serial_out(up, UART_IER, ier | mask);
> +
> + if (uart_console(port))
> + console_atomic_cpu_unlock(flags);
Ditto.
Looking at the above, note that uart_console(port) can give different results here, AFAIR.
--
With Best Regards,
Andy Shevchenko
On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> during cpu roundup. This will conflict with the printk cpulock.
When the full vision is realized what will be the purpose of the printk
cpulock?
I'm asking largely because its current role is actively unhelpful
w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
be a better (and safer) solution. However it sounds like there is a
larger role planned for the printk cpulock...
> Therefore, a CPU must ensure that it is not holding the printk
> cpulock when calling kgdb_cpu_enter(). If it is, it must allow its
> printk context to complete first.
>
> A new helper function kgdb_roundup_delay() is introduced for kgdb
> to determine if it is holding the printk cpulock. If so, a flag is
> set so that when the printk cpulock is released, kgdb will be
> re-triggered for that CPU.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> arch/powerpc/include/asm/smp.h | 1 +
> arch/powerpc/kernel/kgdb.c | 10 +++++++-
> arch/powerpc/kernel/smp.c | 5 ++++
> arch/x86/kernel/kgdb.c | 9 ++++---
> include/linux/kgdb.h | 3 +++
> include/linux/printk.h | 8 ++++++
> kernel/debug/debug_core.c | 45 ++++++++++++++++++++--------------
> kernel/printk/printk.c | 12 +++++++++
> 8 files changed, 70 insertions(+), 23 deletions(-)
>
> [...]
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 3d0c933937b4..1b546e117f10 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -44,6 +44,7 @@
> #include <linux/irq_work.h>
> #include <linux/ctype.h>
> #include <linux/uio.h>
> +#include <linux/kgdb.h>
> #include <linux/sched/clock.h>
> #include <linux/sched/debug.h>
> #include <linux/sched/task_stack.h>
> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> #ifdef CONFIG_SMP
> static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
> static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
> +static unsigned int kgdb_cpu = -1;
Is this the flag to provoke retriggering? It appears to be a write-only
variable (at least in this patch). How is it consumed?
Daniel.
> /**
> * __printk_wait_on_cpu_lock() - Busy wait until the printk cpu-reentrant
> @@ -325,6 +327,16 @@ void __printk_cpu_unlock(void)
> -1); /* LMM(__printk_cpu_unlock:B) */
> }
> EXPORT_SYMBOL(__printk_cpu_unlock);
> +
> +bool kgdb_roundup_delay(unsigned int cpu)
> +{
> + if (cpu != atomic_read(&printk_cpulock_owner))
> + return false;
> +
> + kgdb_cpu = cpu;
> + return true;
> +}
> +EXPORT_SYMBOL(kgdb_roundup_delay);
> #endif /* CONFIG_SMP */
>
> /* Number of registered extended console drivers. */
> --
> 2.20.1
>
On 2021-08-03, Daniel Thompson <[email protected]> wrote:
> On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
>> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
>> during cpu roundup. This will conflict with the printk cpulock.
>
> When the full vision is realized what will be the purpose of the printk
> cpulock?
>
> I'm asking largely because it's current role is actively unhelpful
> w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> be a better (and safer) solution. However it sounds like there is a
> larger role planned for the printk cpulock...
The printk cpulock is used as a synchronization mechanism for
implementing atomic consoles, which need to be able to safely interrupt
the console write() activity at any time and immediately continue with
their own printing. The ultimate goal is to move all console printing
into per-console dedicated kthreads, so the primary function of the
printk cpulock is really to immediately _stop_ the CPU/kthread
performing write() in order to allow write_atomic() (from any context on
any CPU) to safely and reliably take over.
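To illustrate the idea, here is a grossly simplified sketch (this is not how
the 8250 patch is actually structured; the example_* driver bits are made up,
only the console_atomic_cpu_lock()/unlock() helpers come from this series):

/* Hypothetical hardware stub; a real driver pokes its registers here. */
static void example_hw_putc(unsigned char c) { }

static void example_write(struct console *con, const char *s,
			  unsigned int count)
{
	unsigned long flags;

	/*
	 * Regular write(): runs under the console semaphore, but takes the
	 * cpulock per character so that a write_atomic() caller never waits
	 * for more than one character to complete.
	 */
	while (count--) {
		console_atomic_cpu_lock(flags);
		example_hw_putc(*s++);
		console_atomic_cpu_unlock(flags);
	}
}

static void example_write_atomic(struct console *con, const char *s,
				 unsigned int count)
{
	unsigned long flags;

	/*
	 * write_atomic(): may run in NMI context on any CPU. The cpulock is
	 * CPU-reentrant and never sleeps, so it nests above an interrupted
	 * write() on this CPU and spins only briefly if another CPU owns it.
	 */
	console_atomic_cpu_lock(flags);
	while (count--)
		example_hw_putc(*s++);
	console_atomic_cpu_unlock(flags);
}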
Atomic consoles are actually quite similar to the kgdb_io ops. For
example, comparing:
serial8250_console_write_atomic() + serial8250_console_putchar_locked()
with
serial8250_put_poll_char()
The difference is that serial8250_console_write_atomic() is line-based
and synchronizing with serial8250_console_write() so that if the kernel
crashes while outputting to the console, write() can be interrupted by
write_atomic() and cleanly formatted crash data can be output.
Also serial8250_put_poll_char() is calling into __pm_runtime_resume(),
which includes a spinlock and possibly sleeping. This would not be
acceptable for atomic consoles. Although, as Andy pointed out [0], I
will need to figure out how to deal with suspended consoles. Or just
implement a policy that registered atomic consoles may never be
suspended.
I had not considered merging kgdb_io ops with atomic console ops. But
now that I look at it more closely, there may be some useful overlap. I
will consider this. Thank you for this idea.
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index 3d0c933937b4..1b546e117f10 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
>> #ifdef CONFIG_SMP
>> static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
>> static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
>> +static unsigned int kgdb_cpu = -1;
>
> Is this the flag to provoke retriggering? It appears to be a write-only
> variable (at least in this patch). How is it consumed?
Critical catch! Thank you. I am quite unhappy to see these hunks were
accidentally dropped when generating this series.
@@ -3673,6 +3675,9 @@ EXPORT_SYMBOL(__printk_cpu_trylock);
*/
void __printk_cpu_unlock(void)
{
+ bool trigger_kgdb = false;
+ unsigned int cpu;
+
if (atomic_read(&printk_cpulock_nested)) {
atomic_dec(&printk_cpulock_nested);
return;
@@ -3683,6 +3688,12 @@ void __printk_cpu_unlock(void)
* LMM(__printk_cpu_unlock:A)
*/
+ cpu = smp_processor_id();
+ if (kgdb_cpu == cpu) {
+ trigger_kgdb = true;
+ kgdb_cpu = -1;
+ }
+
/*
* Guarantee loads and stores from this CPU when it was the
* lock owner are visible to the next lock owner. This pairs
@@ -3703,6 +3714,21 @@ void __printk_cpu_unlock(void)
*/
atomic_set_release(&printk_cpulock_owner,
-1); /* LMM(__printk_cpu_unlock:B) */
+
+ if (trigger_kgdb) {
+ pr_warn("re-triggering kgdb roundup for CPU#%d\n", cpu);
+ kgdb_roundup_cpu(cpu);
+ }
}
EXPORT_SYMBOL(__printk_cpu_unlock);
John Ogness
[0] https://lore.kernel.org/lkml/[email protected]
On Tue 2021-08-03 15:18:53, John Ogness wrote:
> The printk cpulock functions use local_irq_disable(). This means that
> hardware interrupts are also disabled on PREEMPT_RT. To make this
> clear, rename the functions to use the raw_ prefix:
>
> raw_printk_cpu_lock_irqsave(flags);
> raw_printk_cpu_unlock_irqrestore(flags);
>
> Also, these functions were a NOP for !CONFIG_SMP. But for !CONFIG_SMP
> they still need to disable hardware interrupts. So modify them
> appropriately for this.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> include/linux/printk.h | 30 ++++++++++++++----------------
> lib/dump_stack.c | 4 ++--
> lib/nmi_backtrace.c | 4 ++--
> 3 files changed, 18 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/printk.h b/include/linux/printk.h
> index 259af4f97f50..ac738d1d9934 100644
> --- a/include/linux/printk.h
> +++ b/include/linux/printk.h
> @@ -280,17 +280,22 @@ static inline void dump_stack(void)
> extern int __printk_cpu_trylock(void);
> extern void __printk_wait_on_cpu_lock(void);
> extern void __printk_cpu_unlock(void);
> +#else
> +#define __printk_cpu_trylock() 1
> +#define __printk_wait_on_cpu_lock()
> +#define __printk_cpu_unlock()
> +#endif /* CONFIG_SMP */
IMHO, it is not obvious that
while (!__printk_cpu_trylock()) \
__printk_wait_on_cpu_lock(); \
does nothing in the !CONFIG_SMP case. Please, make it more obvious.
I suggest to define:
#ifdef CONFIG_SMP
#define __printk_cpu_lock() \
do { \
while (!__printk_cpu_trylock()) \
__printk_wait_on_cpu_lock(); \
} while (0)
#else
#define __printk_cpu_lock()
#endif
/**
* raw_printk_cpu_lock_irqsave() - Disable interrupts and acquire the printk
* cpu-reentrant spinning lock.
* @flags: Stack-allocated storage for saving local interrupt state,
* to be passed to raw_printk_cpu_unlock_irqrestore().
*
* If the lock is owned by another CPU, spin until it becomes available.
*/
#define raw_printk_cpu_lock_irqsave(flags) \
do { \
local_irq_save(flags); \
__printk_cpu_lock(); \
} while (0)
Best Regards,
Petr
On Tue 2021-08-03 15:18:52, John Ogness wrote:
> Move the printk cpulock functions "as is" further up so that they
> can be used by other printk.c functions in an upcoming commit.
>
> Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> On 2021-08-03, Daniel Thompson <[email protected]> wrote:
> > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> >> during cpu roundup. This will conflict with the printk cpulock.
> >
> > When the full vision is realized what will be the purpose of the printk
> > cpulock?
> >
> > I'm asking largely because it's current role is actively unhelpful
> > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> > be a better (and safer) solution. However it sounds like there is a
> > larger role planned for the printk cpulock...
>
> The printk cpulock is used as a synchronization mechanism for
> implementing atomic consoles, which need to be able to safely interrupt
> the console write() activity at any time and immediately continue with
> their own printing. The ultimate goal is to move all console printing
> into per-console dedicated kthreads, so the primary function of the
> printk cpulock is really to immediately _stop_ the CPU/kthread
> performing write() in order to allow write_atomic() (from any context on
> any CPU) to safely and reliably take over.
I see.
Is there any mileage in allowing in_dbg_master() to suppress taking
the console lock?
There's a couple of reasons to worry about the current approach.
The first is that we don't want this code to trigger in the case when
kgdb is enabled and kdb is not since it is only kdb (a self-hosted
debugger) that uses the consoles. This case is relatively trivial to
address since we can rename it kdb_roundup_delay() and alter the way it
is conditionally compiled.
The second is more of a problem however. kdb will only call into the
console code from the debug master. By default this is the CPU that
takes the debug trap so initial prints will work fine. However it is
possible to switch to a different master (so we can read per-CPU
registers and things like that). This will result in one of the CPUs
that did the IPI round up calling into console code and this is unsafe
in that instance.
There are a couple of tricks we could adopt to work around this but
given the slightly odd calling context for kdb (all CPUs quiesced, no
log interleaving possible) it sounds like it would remain safe to
bypass the lock if in_dbg_master() is true.
Bypassing an inconvenient lock might sound icky but:
1. If the lock is not owned by any CPU then what kdb will do is safe.
2. If the lock is owned by any CPU then we have quiesced it anyway
and this makes it safe for the owning CPU to share its ownership
(since it isn't much different to recursive acquisition on a single
CPU)
> Atomic consoles are actually quite similar to the kgdb_io ops. For
> example, comparing:
>
> serial8250_console_write_atomic() + serial8250_console_putchar_locked()
>
> with
>
> serial8250_put_poll_char()
>
> The difference is that serial8250_console_write_atomic() is line-based
> and synchronizing with serial8250_console_write() so that if the kernel
> crashes while outputing to the console, write() can be interrupted by
> write_atomic() and cleanly formatted crash data can be output.
>
> Also serial8250_put_poll_char() is calling into __pm_runtime_resume(),
> which includes a spinlock and possibly sleeping. This would not be
> acceptable for atomic consoles.
spinlocks aren't allowed in polled I/O either.
However IIRC there is a rather nasty trick being played here to allow
code sharing. I believe there was a deliberate unbalanced resume in the
poll_init() function that results (again IIRC) in the PM calls in
poll_char() becoming nothing but atomic add and subtract (e.g. enabling
polled I/O effectively suppresses PM activity).
Daniel.
> Although, as Andy pointed out [0], I
> will need to figure out how to deal with suspended consoles. Or just
> implement a policy that registered atomic consoles may never be
> suspended.
>
> I had not considered merging kgdb_io ops with atomic console ops. But
> now that I look at it more closely, there may be some useful overlap. I
> will consider this. Thank you for this idea.
>
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 3d0c933937b4..1b546e117f10 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> >> #ifdef CONFIG_SMP
> >> static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
> >> static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
> >> +static unsigned int kgdb_cpu = -1;
> >
> > Is this the flag to provoke retriggering? It appears to be a write-only
> > variable (at least in this patch). How is it consumed?
>
> Critical catch! Thank you. I am quite unhappy to see these hunks were
> accidentally dropped when generating this series.
>
> @@ -3673,6 +3675,9 @@ EXPORT_SYMBOL(__printk_cpu_trylock);
> */
> void __printk_cpu_unlock(void)
> {
> + bool trigger_kgdb = false;
> + unsigned int cpu;
> +
> if (atomic_read(&printk_cpulock_nested)) {
> atomic_dec(&printk_cpulock_nested);
> return;
> @@ -3683,6 +3688,12 @@ void __printk_cpu_unlock(void)
> * LMM(__printk_cpu_unlock:A)
> */
>
> + cpu = smp_processor_id();
> + if (kgdb_cpu == cpu) {
> + trigger_kgdb = true;
> + kgdb_cpu = -1;
> + }
> +
> /*
> * Guarantee loads and stores from this CPU when it was the
> * lock owner are visible to the next lock owner. This pairs
> @@ -3703,6 +3714,21 @@ void __printk_cpu_unlock(void)
> */
> atomic_set_release(&printk_cpulock_owner,
> -1); /* LMM(__printk_cpu_unlock:B) */
> +
> + if (trigger_kgdb) {
> + pr_warn("re-triggering kgdb roundup for CPU#%d\n", cpu);
> + kgdb_roundup_cpu(cpu);
> + }
> }
> EXPORT_SYMBOL(__printk_cpu_unlock);
>
> John Ogness
>
> [0] https://lore.kernel.org/lkml/[email protected]
On Tue 2021-08-03 15:18:56, John Ogness wrote:
> boot_delay_msec() is always called immediately before printk_delay()
> so just call it from within printk_delay().
>
> Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Tue 2021-08-03 17:36:32, John Ogness wrote:
> On 2021-08-03, Daniel Thompson <[email protected]> wrote:
> > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> >> during cpu roundup. This will conflict with the printk cpulock.
> >
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index 3d0c933937b4..1b546e117f10 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -214,6 +215,7 @@ int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
> >> #ifdef CONFIG_SMP
> >> static atomic_t printk_cpulock_owner = ATOMIC_INIT(-1);
> >> static atomic_t printk_cpulock_nested = ATOMIC_INIT(0);
> >> +static unsigned int kgdb_cpu = -1;
> >
> > Is this the flag to provoke retriggering? It appears to be a write-only
> > variable (at least in this patch). How is it consumed?
>
> Critical catch! Thank you. I am quite unhappy to see these hunks were
> accidentally dropped when generating this series.
>
> @@ -3673,6 +3675,9 @@ EXPORT_SYMBOL(__printk_cpu_trylock);
> */
> void __printk_cpu_unlock(void)
> {
> + bool trigger_kgdb = false;
> + unsigned int cpu;
> +
> if (atomic_read(&printk_cpulock_nested)) {
> atomic_dec(&printk_cpulock_nested);
> return;
> @@ -3683,6 +3688,12 @@ void __printk_cpu_unlock(void)
> * LMM(__printk_cpu_unlock:A)
> */
>
> + cpu = smp_processor_id();
> + if (kgdb_cpu == cpu) {
> + trigger_kgdb = true;
> + kgdb_cpu = -1;
> + }
Just in case this approach is used in the end.
This code looks racy. kgdb_roundup_delay() seems to be called in NMI
context. NMI might happen at this point and set kgdb_cpu after
it was checked.
I am afraid that it won't be easy to make this safe using a single
global variable. A solution might be a per-CPU variable set
by kgdb_roundup_delay() when it owns printk_cpu_lock.
__printk_cpu_unlock() would call kgdb_roundup_cpu(cpu) when
the variable is set.
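Something along these lines (just an illustrative sketch, the names do not
matter; kgdb_roundup_cpu(cpu) is the call from John's dropped hunks):

static DEFINE_PER_CPU(bool, kgdb_delayed_roundup);

bool kgdb_roundup_delay(unsigned int cpu)
{
	if (cpu != atomic_read(&printk_cpulock_owner))
		return false;

	/* Only @cpu itself writes its flag, so a roundup NMI on another
	 * CPU cannot clobber it. */
	per_cpu(kgdb_delayed_roundup, cpu) = true;
	return true;
}

/* Called from __printk_cpu_unlock() after ownership has been released. */
static void printk_cpulock_kgdb_check(unsigned int cpu)
{
	if (per_cpu(kgdb_delayed_roundup, cpu)) {
		per_cpu(kgdb_delayed_roundup, cpu) = false;
		kgdb_roundup_cpu(cpu);
	}
}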
Nit: The name "kgdb_cpu" is too generic. It is not clear what is
so special about this CPU. I would call the per-CPU variable
"kgdb_delayed_roundup" or so.
Best Regards,
Petr
> /*
> * Guarantee loads and stores from this CPU when it was the
> * lock owner are visible to the next lock owner. This pairs
> @@ -3703,6 +3714,21 @@ void __printk_cpu_unlock(void)
> */
> atomic_set_release(&printk_cpulock_owner,
> -1); /* LMM(__printk_cpu_unlock:B) */
> +
> + if (trigger_kgdb) {
> + pr_warn("re-triggering kgdb roundup for CPU#%d\n", cpu);
> + kgdb_roundup_cpu(cpu);
> + }
> }
On Tue 2021-08-03 15:18:55, John Ogness wrote:
> Move printk_delay() "as is" further up so that it can be used by
> new functions in an upcoming commit.
>
> Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Best Regards,
Petr
On Wed 2021-08-04 12:31:59, Daniel Thompson wrote:
> On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> > On 2021-08-03, Daniel Thompson <[email protected]> wrote:
> > > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> > >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> > >> during cpu roundup. This will conflict with the printk cpulock.
> > >
> > > When the full vision is realized what will be the purpose of the printk
> > > cpulock?
> > >
> > > I'm asking largely because it's current role is actively unhelpful
> > > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> > > be a better (and safer) solution. However it sounds like there is a
> > > larger role planned for the printk cpulock...
> >
> > The printk cpulock is used as a synchronization mechanism for
> > implementing atomic consoles, which need to be able to safely interrupt
> > the console write() activity at any time and immediately continue with
> > their own printing. The ultimate goal is to move all console printing
> > into per-console dedicated kthreads, so the primary function of the
> > printk cpulock is really to immediately _stop_ the CPU/kthread
> > performing write() in order to allow write_atomic() (from any context on
> > any CPU) to safely and reliably take over.
>
> I see.
>
> Is there any mileage in allowing in_dbg_master() to suppress taking
> the console lock?
>
> There's a couple of reasons to worry about the current approach.
>
> The first is that we don't want this code to trigger in the case when
> kgdb is enabled and kdb is not since it is only kdb (a self-hosted
> debugger) than uses the consoles. This case is relatively trivial to
> address since we can rename it kdb_roundup_delay() and alter the way it
> is conditionally compiled.
>
> The second is more of a problem however. kdb will only call into the
> console code from the debug master. By default this is the CPU that
> takes the debug trap so initial prints will work fine. However it is
> possible to switch to a different master (so we can read per-CPU
> registers and things like that). This will result in one of the CPUs
> that did the IPI round up calling into console code and this is unsafe
> in that instance.
>
> There are a couple of tricks we could adopt to work around this but
> given the slightly odd calling context for kdb (all CPUs quiesced, no
> log interleaving possible) it sounds like it would remain safe to
> bypass the lock if in_dbg_master() is true.
>
> Bypassing an inconvenient lock might sound icky but:
>
> 1. If the lock is not owned by any CPU then what kdb will do is safe.
>
> 2. If the lock is owned by any CPU then we have quiesced it anyway
> and this makes is safe for the owning CPU to share its ownership
> (since it isn't much different to recursive acquisition on a single
> CPU)
I think about the following:
void kgdb_roundup_cpus(void)
{
__printk_cpu_lock();
__kgdb_roundup_cpus();
}
, where __printk_cpu_lock() waits/takes printk_cpu_lock()
__kgdb_roundup_cpus() is the original kgdb_roundup_cpus();
The idea is that the kgdb_roundup_cpus() caller takes the printk_cpu lock.
The owner will be well defined.
As a result any other CPU will not be able to take the printk_cpu lock
as long as it is owned by the kgdb lock. But as you say, kgdb will
make sure that everything is serialized at this stage. So that
the original raw_printk_cpu_lock_irqsave() might just disable
IRQs when called under the debugger.
Does it make any sense?
I have to say that it is a bit hairy. But it looks slightly better
than the delayed/repeated IPI proposed by this patch.
Best Regards,
Petr
On Wed, Aug 04, 2021 at 02:12:22PM +0200, Petr Mladek wrote:
> On Wed 2021-08-04 12:31:59, Daniel Thompson wrote:
> > On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> > > On 2021-08-03, Daniel Thompson <[email protected]> wrote:
> > > > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> > > >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> > > >> during cpu roundup. This will conflict with the printk cpulock.
> > > >
> > > > When the full vision is realized what will be the purpose of the printk
> > > > cpulock?
> > > >
> > > > I'm asking largely because it's current role is actively unhelpful
> > > > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> > > > be a better (and safer) solution. However it sounds like there is a
> > > > larger role planned for the printk cpulock...
> > >
> > > The printk cpulock is used as a synchronization mechanism for
> > > implementing atomic consoles, which need to be able to safely interrupt
> > > the console write() activity at any time and immediately continue with
> > > their own printing. The ultimate goal is to move all console printing
> > > into per-console dedicated kthreads, so the primary function of the
> > > printk cpulock is really to immediately _stop_ the CPU/kthread
> > > performing write() in order to allow write_atomic() (from any context on
> > > any CPU) to safely and reliably take over.
> >
> > I see.
> >
> > Is there any mileage in allowing in_dbg_master() to suppress taking
> > the console lock?
> >
> > There's a couple of reasons to worry about the current approach.
> >
> > The first is that we don't want this code to trigger in the case when
> > kgdb is enabled and kdb is not since it is only kdb (a self-hosted
> > debugger) than uses the consoles. This case is relatively trivial to
> > address since we can rename it kdb_roundup_delay() and alter the way it
> > is conditionally compiled.
> >
> > The second is more of a problem however. kdb will only call into the
> > console code from the debug master. By default this is the CPU that
> > takes the debug trap so initial prints will work fine. However it is
> > possible to switch to a different master (so we can read per-CPU
> > registers and things like that). This will result in one of the CPUs
> > that did the IPI round up calling into console code and this is unsafe
> > in that instance.
> >
> > There are a couple of tricks we could adopt to work around this but
> > given the slightly odd calling context for kdb (all CPUs quiesced, no
> > log interleaving possible) it sounds like it would remain safe to
> > bypass the lock if in_dbg_master() is true.
> >
> > Bypassing an inconvenient lock might sound icky but:
> >
> > 1. If the lock is not owned by any CPU then what kdb will do is safe.
> >
> > 2. If the lock is owned by any CPU then we have quiesced it anyway
> > and this makes is safe for the owning CPU to share its ownership
> > (since it isn't much different to recursive acquisition on a single
> > CPU)
>
> I think about the following:
>
> void kgdb_roundup_cpus(void)
> {
> __printk_cpu_lock();
> __kgdb_roundup_cpus();
> }
>
> , where __printk_cpu_lock() waits/takes printk_cpu_lock()
> __kgdb_roundup_cpus() is the original kgdb_roundup_cpus();
>
>
> The idea is that kgdb_roundup_cpus() caller takes the printk_cpu lock.
> The owner will be well defined.
>
> As a result any other CPU will not be able to take the printk_cpu lock
> as long as it is owned by the kgdb lock. But as you say, kgdb will
> make sure that everything is serialized at this stage. So that
> the original raw_printk_cpu_lock_irqsave() might just disable
> IRQs when called under debugger.
>
> Does it make any sense?
Yes, but I think it still has problems.
Primarily it doesn't solve the issue I raised. It would still be unsafe
to change debug master: we can guarantee the initial master owns the
lock but if it has been multiply acquired we cannot transfer ownership
when we want to change master.
Additionally it will delay the round up of cores that do not own the
lock. The quiescing is never atomic and the operator needs to know
that, but the longer CPUs are allowed to execute the more confusing
things can become for the operator.
Finally on machines without an NMI this could cause trouble with the
interrupt disable in raw_printk_cpu_lock_irqsave() (or any outer level
interrupt disable). If the master gets the lock then the other processors
will become incapable of being rounded up if they are waiting for the
printk lock.
> I have to say that it is a bit hairy. But it looks slightly better
> than the delayed/repeated IPI proposed by this patch.
I'd like to reserve judgement for now which one is least worst...
largely because if the purpose of the lock is simply to prevent interleaving
of console output then the debugger quiescing code should already have
this covered.
It leaves me wondering if a change like the one below is sufficient
(based on code without John's patches but hopefully still clear enough).
I've given the new code its own branch which it doesn't, strictly
speaking, need but it is easier to comment this way... and perhaps also
just a little easier for people who have not set CONFIG_KGDB to
ignore ;-).
~~~
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 142a58d124d9..41a7e103bb66 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3599,6 +3599,18 @@ int __printk_cpu_trylock(void)
/* This CPU is already the owner. */
atomic_inc(&printk_cpulock_nested);
return 1;
+ } else if (in_dbg_master()) {
+ /*
+ * If we are executing as the primary CPU and with the debugger
+ * active then all other CPUs in the system are quiesced by
+ * the time kdb winds up calling this function. To execute this
+ * branch then the lock must be owned by one of the quiesced CPUs.
+ * Happily, because it is quiesced and cannot release it, it is
+ * safe for us to allow the lock to be taken from a different CPU!
+ * The lock will be released prior to resuming the real owner.
+ */
+ atomic_inc(&printk_cpulock_nested);
+ return 1;
}
return 0;
~~~
Daniel.
PS In the interest of full disclosure there is a special case
in the debugger to allow it to try to cope if it fails to
quiesce a CPU and I deliberately omitted this from the long
comment above. That special case is expected to be unstable
but since the alternative is likely to be a permanent deadlock
without any indication of why, we choose to take the risk of
continuing. Personally I don't recommend reasoning about
console safety based on this emergency case hence omitting the
comment.
On 2021-08-04, Daniel Thompson <[email protected]> wrote:
> On Wed, Aug 04, 2021 at 02:12:22PM +0200, Petr Mladek wrote:
>> On Wed 2021-08-04 12:31:59, Daniel Thompson wrote:
>> > On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
>> > > On 2021-08-03, Daniel Thompson <[email protected]> wrote:
>> > > > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
>> > > >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
>> > > >> during cpu roundup. This will conflict with the printk cpulock.
>> > > >
>> > > > When the full vision is realized what will be the purpose of the printk
>> > > > cpulock?
>> > > >
>> > > > I'm asking largely because it's current role is actively unhelpful
>> > > > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
>> > > > be a better (and safer) solution. However it sounds like there is a
>> > > > larger role planned for the printk cpulock...
>> > >
>> > > The printk cpulock is used as a synchronization mechanism for
>> > > implementing atomic consoles, which need to be able to safely interrupt
>> > > the console write() activity at any time and immediately continue with
>> > > their own printing. The ultimate goal is to move all console printing
>> > > into per-console dedicated kthreads, so the primary function of the
>> > > printk cpulock is really to immediately _stop_ the CPU/kthread
>> > > performing write() in order to allow write_atomic() (from any context on
>> > > any CPU) to safely and reliably take over.
>> >
>> > I see.
>> >
>> > Is there any mileage in allowing in_dbg_master() to suppress taking
>> > the console lock?
>> >
>> > There's a couple of reasons to worry about the current approach.
>> >
>> > The first is that we don't want this code to trigger in the case when
>> > kgdb is enabled and kdb is not since it is only kdb (a self-hosted
>> > debugger) than uses the consoles. This case is relatively trivial to
>> > address since we can rename it kdb_roundup_delay() and alter the way it
>> > is conditionally compiled.
Well, _I_ want this code to trigger even without kdb. The printk cpulock
is meant to be the innermost locking for the entire kernel. No code is
allowed to block/spin on any kind of lock if holding the printk
cpulock. This is the only way to guarantee the functionality of the
atomic consoles.
For example, if the kernel were to crash while inside kgdb code, we want
to see the backtrace.
Since kgdb _does_ take locks (spinning on @dbg_slave_lock during roundup
and the master's own cpu lock as a retry loop on @dbg_master_lock),
clearly it is not allowed to hold the printk cpulock. The simplest
solution I could find was just to make sure kgdb_cpu_enter() isn't
called while holding the printk cpulock.
>> > The second is more of a problem however. kdb will only call into the
>> > console code from the debug master. By default this is the CPU that
>> > takes the debug trap so initial prints will work fine. However it is
>> > possible to switch to a different master (so we can read per-CPU
>> > registers and things like that). This will result in one of the CPUs
>> > that did the IPI round up calling into console code and this is unsafe
>> > in that instance.
It is only unsafe if a CPU enters "kgdb/kdb context" while holding the
printk cpulock. That is what I want to prevent.
>> > There are a couple of tricks we could adopt to work around this but
>> > given the slightly odd calling context for kdb (all CPUs quiesced, no
>> > log interleaving possible) it sounds like it would remain safe to
>> > bypass the lock if in_dbg_master() is true.
>> >
>> > Bypassing an inconvenient lock might sound icky but:
>> >
>> > 1. If the lock is not owned by any CPU then what kdb will do is safe.
No. The printk cpulock exists for low-level synchronization. The atomic
consoles need this synchronization. (For example, the 8250 needs this
for correct tracking of its interrupt register, even for
serial8250_put_poll_char().)
>> > 2. If the lock is owned by any CPU then we have quiesced it anyway
>> > and this makes is safe for the owning CPU to share its ownership
>> > (since it isn't much different to recursive acquisition on a single
>> > CPU)
Quiescing the printk cpulock is not permitted.
Just because it is kdb, does not mean that the atomic consoles were
interrupted in a convenient place. The whole purpose of the atomic
consoles is so that we can have guaranteed console output from _any_
context and _any_ line of code in the kernel.
>> I think about the following:
>>
>> void kgdb_roundup_cpus(void)
>> {
>> __printk_cpu_lock();
>> __kgdb_roundup_cpus();
>> }
>>
>> , where __printk_cpu_lock() waits/takes printk_cpu_lock()
>> __kgdb_roundup_cpus() is the original kgdb_roundup_cpus();
>>
>>
>> The idea is that kgdb_roundup_cpus() caller takes the printk_cpu lock.
>> The owner will be well defined.
>>
>> As a result any other CPU will not be able to take the printk_cpu lock
>> as long as it is owned by the kgdb lock. But as you say, kgdb will
>> make sure that everything is serialized at this stage. So that
>> the original raw_printk_cpu_lock_irqsave() might just disable
>> IRQs when called under debugger.
>>
>> Does it make any sense?
>
> Yes but I think it is still has problems.
>
> Primarily is doesn't solve the issue I raised. It would still be unsafe
> to change debug master: we can guarantee the initial master owns the
> lock but if it has been multiply acquired we cannot transfer ownership
> when we want to change master.
>
> Additionally it will delay the round up of cores that do not own the
> lock. The quiescing is never atomic and the operator needs to know
> that but the longer CPUs are allows to execute for the more confusing
> things can become for the operator.
>
> Finally on machines without an NMI this could cause trouble with the
> interrupt disable in raw_printk_cpu_lock_irqsave() (or any outer level
> interrupt disable). If the master get the lock then the other processes
> will become incapable of being rounded up if they are waiting for the
> printk lock).
I am also not happy with such a solution. Aside from Daniel's comments,
it also violates the basic principle of the printk cpulock by allowing
further locking while holding the printk cpulock. That is a recipe for
deadlock.
>> I have to say that it is a bit hairy. But it looks slightly better
>> than the delayed/repeated IPI proposed by this patch.
>
> I'd like to reserve judgement for now which one is least worst...
> largely because if the purpose of the lock simply to prevent interleaving
> of console output then the debugger quiescing code should already have
> this covered.
>
> It leaves me wondering if a change like the one below is sufficient
> (based on code without John's patches but hopefully still clear enough).
> I've given the new code it's own branch which it doesn't, strictly
> speaking, need but it is easier to comment this way... and perhaps also
> just a little easier for people who have not set CONFIG_KGDB to
> ignore ;-).
>
> ~~~
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index 142a58d124d9..41a7e103bb66 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3599,6 +3599,18 @@ int __printk_cpu_trylock(void)
> /* This CPU is already the owner. */
> atomic_inc(&printk_cpulock_nested);
> return 1;
> + } else if (in_dbg_master()) {
> + /*
> + * If we are executing as the primary CPU and with the debugger
> + * active than all other CPUs in the system are quiesced by
> + * the time kdb winds up calling this function. To execute this
> + * branch then the lock must be owned by one of the quiesced CPUs.
> + * Happily, because it is quiesced and cannot release it, it is
> + * safe for us to allow the lock to be taken from a different CPU!
> + * The lock will be released prior to resuming the real owner.
> + */
> + atomic_inc(&printk_cpulock_nested);
> + return 1;
> }
>
> return 0;
> ~~~
Being in kgdb/kdb context is similar to being in atomic console
context. (Of course, they are both using cpu locks.) But the contexts
are not the same. It is incorrect to handle them as the same.
We need to decide who is inside of who. Either printk is the innermost,
in which case the printk cpulock cannot be held when calling
kgdb_cpu_enter(). Or kgdb is the innermost, meaning that the atomic
consoles are no longer atomic/reliable while in kgdb.
I prefer and am pushing for the first, but am willing to accept the
second (i.e. that kgdb is the innermost function of the kernel).
> PS In the interested of full disclosure there is a special case
> in the debugger to allow it to try to cope if it fails to
> quiesce a CPU and I deliberately omitted this from the long
> comment above. That special case is expected to be unstable
> but since the alternative is likely to be a permanent deadlock
> without any indication of why we choose to take the risk of
> continuing. Personally I don't recommend reasoning about
> console safety based on this emergency case hence omitting the
> comment.
John Ogness
On 03. 08. 21, 16:07, Andy Shevchenko wrote:
> On Tue, Aug 03, 2021 at 03:19:01PM +0206, John Ogness wrote:
>> Implement an NMI-safe write_atomic() console function in order to
>> support synchronous console printing.
>>
>> Since interrupts need to be disabled during transmit, all usage of
>> the IER register is wrapped with access functions that use the
>> printk cpulock to synchronize register access while tracking the
>> state of the interrupts. This is necessary because write_atomic()
>> can be called from an NMI context that has preempted write_atomic().
>
> ...
>
>> +static inline void serial8250_set_IER(struct uart_8250_port *up,
>> + unsigned char ier)
>> +{
>> + struct uart_port *port = &up->port;
>> + unsigned long flags;
>> + bool is_console;
>
>> + is_console = uart_console(port);
>> +
>> + if (is_console)
>> + console_atomic_cpu_lock(flags);
>> +
>> + serial_out(up, UART_IER, ier);
>> +
>> + if (is_console)
>> + console_atomic_cpu_unlock(flags);
>
> I would rewrite it as
>
> if (uart_console()) {
> console_atomic_cpu_lock(flags);
> serial_out(up, UART_IER, ier);
> console_atomic_cpu_unlock(flags);
> } else {
> serial_out(up, UART_IER, ier);
> }
>
> No additional variable, easier to get the algorithm on the first glance, less
> error prone.
Yes, the original is terrible.
Another option:
bool locked = console_atomic_cpu_lock(flags, uart_console());
serial_out(up, UART_IER, ier);
console_atomic_cpu_unlock(flags, locked);
Which makes console_atomic_cpu_lock lock only if the second parameter is
true, and return its value too.
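For example (just a sketch; the _cond names are made up and only the
underlying console_atomic_cpu_lock()/unlock() calls come from the series):

/* Lock only when @cond is true and report whether we did. */
#define console_atomic_cpu_lock_cond(flags, cond)		\
({								\
	bool __locked = (cond);					\
	if (__locked)						\
		console_atomic_cpu_lock(flags);			\
	__locked;						\
})

#define console_atomic_cpu_unlock_cond(flags, locked)		\
do {								\
	if (locked)						\
		console_atomic_cpu_unlock(flags);		\
} while (0)

The call sites would then reduce to the three lines shown above.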
BTW I actually don't know what console_atomic_cpu_lock does, so I can't think
about it further, as I was not CCed and lore doesn't have the other patches either:
https://lore.kernel.org/linux-mips/[email protected]/
thanks,
--
js
suse labs
On 2021-08-05, Jiri Slaby <[email protected]> wrote:
> On 03. 08. 21, 16:07, Andy Shevchenko wrote:
>> On Tue, Aug 03, 2021 at 03:19:01PM +0206, John Ogness wrote:
>>> Implement an NMI-safe write_atomic() console function in order to
>>> support synchronous console printing.
>>>
>>> Since interrupts need to be disabled during transmit, all usage of
>>> the IER register is wrapped with access functions that use the
>>> printk cpulock to synchronize register access while tracking the
>>> state of the interrupts. This is necessary because write_atomic()
>>> can be called from an NMI context that has preempted write_atomic().
>>
>> ...
>>
>>> +static inline void serial8250_set_IER(struct uart_8250_port *up,
>>> + unsigned char ier)
>>> +{
>>> + struct uart_port *port = &up->port;
>>> + unsigned long flags;
>>> + bool is_console;
>>
>>> + is_console = uart_console(port);
>>> +
>>> + if (is_console)
>>> + console_atomic_cpu_lock(flags);
>>> +
>>> + serial_out(up, UART_IER, ier);
>>> +
>>> + if (is_console)
>>> + console_atomic_cpu_unlock(flags);
>>
>> I would rewrite it as
>>
>> if (uart_console()) {
>> console_atomic_cpu_lock(flags);
>> serial_out(up, UART_IER, ier);
>> console_atomic_cpu_unlock(flags);
>> } else {
>> serial_out(up, UART_IER, ier);
>> }
Some locations have more than just 1 line of code in between
lock/unlock. I agree this looks better, but am unsure how much
copy/paste code is acceptable.
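For example, serial8250_clear_IER() in that style would look roughly like
this (sketch only), with the two register accesses duplicated:

static inline unsigned char serial8250_clear_IER(struct uart_8250_port *up)
{
	struct uart_port *port = &up->port;
	unsigned int clearval = 0;
	unsigned long flags;
	unsigned int prior;

	if (up->capabilities & UART_CAP_UUE)
		clearval = UART_IER_UUE;

	if (uart_console(port)) {
		console_atomic_cpu_lock(flags);
		prior = serial_port_in(port, UART_IER);
		serial_port_out(port, UART_IER, clearval);
		console_atomic_cpu_unlock(flags);
	} else {
		/* the same two accesses, duplicated in the unlocked path */
		prior = serial_port_in(port, UART_IER);
		serial_port_out(port, UART_IER, clearval);
	}

	return prior;
}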
>> No additional variable, easier to get the algorithm on the first
>> glance, less error prone.
>
> Yes, the original is terrible.
>
> Another option:
>
> bool locked = console_atomic_cpu_lock(flags, uart_console());
> serial_out(up, UART_IER, ier);
> console_atomic_cpu_unlock(flags, locked);
>
> Which makes console_atomic_cpu_lock to lock only if second parameter
> is true and return its value too.
I am not sure how common such semantics for lock/unlock functions
are. But since this pattern, using uart_console(), will most likely be a
common pattern for atomic consoles, I can see how this will be useful.
I will choose one of these 2 suggestions for v2. Thanks.
> BTW I actually don't know what console_atomic_cpu_lock does to think
> about it more as I was not CCed, and neither lore sees the other patches:
> https://lore.kernel.org/linux-mips/[email protected]/
Only the lkml mailing list saw the full series:
https://lore.kernel.org/lkml/[email protected]/
John Ogness
On 2021-08-05, Petr Mladek <[email protected]> wrote:
> On Tue 2021-08-03 15:18:57, John Ogness wrote:
>> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
>> index d07d98c1e846..f8f46d9fba9b 100644
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> @@ -2912,18 +2920,19 @@ void console_unblank(void)
>> */
>> void console_flush_on_panic(enum con_flush_mode mode)
>> {
>> - /*
>> - * If someone else is holding the console lock, trylock will fail
>> - * and may_schedule may be set. Ignore and proceed to unlock so
>> - * that messages are flushed out. As this can be called from any
>> - * context and we don't want to get preempted while flushing,
>> - * ensure may_schedule is cleared.
>> - */
>> - console_trylock();
>> - console_may_schedule = 0;
>> -
>> - if (mode == CONSOLE_REPLAY_ALL)
>> - console_seq = prb_first_valid_seq(prb);
>> + if (console_trylock()) {
>> + if (mode == CONSOLE_REPLAY_ALL)
>> + latched_seq_write(&console_seq, prb_first_valid_seq(prb));
>
> I am scratching my head about this. Of course, latched_seq_write() does
> not guarantee the result when the console lock it taken by another process.
> But console_lock(), called below, will call latched_seq_write()
> anyway.
>
> Also CONSOLE_REPLAY_ALL is used by panic_print_sys_info().
> It is called the following way:
>
> void panic(const char *fmt, ...)
> {
> [...]
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> panic_print_sys_info();
> [...]
> }
>
> On one hand, console_flush_on_panic(CONSOLE_FLUSH_PENDING) will
> most likely take over the console lock even when it was taken
> by another CPU before. And the 2nd console_flush_on_panic()
> called from panic_print_sys_info() will not even notice.
>
> On the other hand, CONSOLE_REPLAY_ALL would not even try to
> reply the log when the console log was not available.
>
> The risk of broken console_seq is neglible. console_unlock()
> should be safe even with invalid console_seq.
>
> My opinion:
>
> I suggest to keep the original logic and maybe add some comment:
>
> void console_flush_on_panic(enum con_flush_mode mode)
> {
> /*
> * If someone else is holding the console lock, trylock will fail
> * and may_schedule may be set. Ignore and proceed to unlock so
> * that messages are flushed out. As this can be called from any
> * context and we don't want to get preempted while flushing,
> * ensure may_schedule is cleared.
> */
> console_trylock();
> console_may_schedule = 0;
>
> /*
> * latched_seq_write() does not guarantee consistent values
> * when console_trylock() failed. But this is the best effort.
> * console_unlock() will update anyway console_seq. prb_read_valid()
> * handles even invalid sequence numbers.
> */
> if (mode == CONSOLE_REPLAY_ALL)
> latched_seq_write(&console_seq, prb_first_valid_seq(prb));
>
> console_unlock();
> }
I see now that CONSOLE_REPLAY_ALL is not handled correctly. And in the
follow-up patch "printk: introduce kernel sync mode" the situation gets
worse. I am trying to find ways to handle things without blindly
ignoring locks and hoping for the best.
I need to re-evaluate how to correctly support this feature.
John Ogness
On Tue 2021-08-03 15:18:57, John Ogness wrote:
> In preparation for synchronous printing, change @console_seq to use
> seqcount_latch so that it can be read without requiring @console_sem.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> kernel/printk/printk.c | 73 ++++++++++++++++++++++++------------------
> 1 file changed, 41 insertions(+), 32 deletions(-)
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index d07d98c1e846..f8f46d9fba9b 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -489,9 +489,7 @@ static u64 syslog_seq;
> static size_t syslog_partial;
> static bool syslog_time;
>
> -/* All 3 protected by @console_sem. */
> -/* the next printk record to write to the console */
> -static u64 console_seq;
> +/* Both protected by @console_sem. */
> static u64 exclusive_console_stop_seq;
> static unsigned long console_dropped;
>
> @@ -500,6 +498,17 @@ struct latched_seq {
> u64 val[2];
> };
>
> +/*
> + * The next printk record to write to the console. There are two
> + * copies (updated with seqcount_latch) so that reads can locklessly
> + * access a valid value. Writers are synchronized by @console_sem.
> + */
> +static struct latched_seq console_seq = {
> + .latch = SEQCNT_LATCH_ZERO(console_seq.latch),
> + .val[0] = 0,
> + .val[1] = 0,
> +};
> +
> /*
> * The next printk record to read after the last 'clear' command. There are
> * two copies (updated with seqcount_latch) so that reads can locklessly
> @@ -563,7 +572,7 @@ bool printk_percpu_data_ready(void)
> return __printk_percpu_data_ready;
> }
>
> -/* Must be called under syslog_lock. */
> +/* Must be called under associated write-protection lock. */
> static void latched_seq_write(struct latched_seq *ls, u64 val)
> {
> raw_write_seqcount_latch(&ls->latch);
> @@ -2405,9 +2414,9 @@ EXPORT_SYMBOL(_printk);
>
> #define prb_read_valid(rb, seq, r) false
> #define prb_first_valid_seq(rb) 0
> +#define latched_seq_read_nolock(seq) 0
> +#define latched_seq_write(dst, src)
>
> -static u64 syslog_seq;
> -static u64 console_seq;
> static u64 exclusive_console_stop_seq;
> static unsigned long console_dropped;
>
> @@ -2735,7 +2744,7 @@ void console_unlock(void)
> bool do_cond_resched, retry;
> struct printk_info info;
> struct printk_record r;
> - u64 __maybe_unused next_seq;
> + u64 seq;
>
> if (console_suspended) {
> up_console_sem();
> @@ -2779,12 +2788,14 @@ void console_unlock(void)
> size_t len;
>
> skip:
> - if (!prb_read_valid(prb, console_seq, &r))
> + seq = latched_seq_read_nolock(&console_seq);
> + if (!prb_read_valid(prb, seq, &r))
> break;
>
> - if (console_seq != r.info->seq) {
> - console_dropped += r.info->seq - console_seq;
> - console_seq = r.info->seq;
> + if (seq != r.info->seq) {
> + console_dropped += r.info->seq - seq;
> + latched_seq_write(&console_seq, r.info->seq);
> + seq = r.info->seq;
> }
>
> if (suppress_message_printing(r.info->level)) {
> @@ -2793,13 +2804,13 @@ void console_unlock(void)
> * directly to the console when we received it, and
> * record that has level above the console loglevel.
> */
> - console_seq++;
> + latched_seq_write(&console_seq, seq + 1);
> goto skip;
> }
>
> /* Output to all consoles once old messages replayed. */
> if (unlikely(exclusive_console &&
> - console_seq >= exclusive_console_stop_seq)) {
> + seq >= exclusive_console_stop_seq)) {
> exclusive_console = NULL;
> }
>
> @@ -2820,7 +2831,7 @@ void console_unlock(void)
> len = record_print_text(&r,
> console_msg_format & MSG_FORMAT_SYSLOG,
> printk_time);
> - console_seq++;
> + latched_seq_write(&console_seq, seq + 1);
>
> /*
> * While actively printing out messages, if another printk()
> @@ -2848,9 +2859,6 @@ void console_unlock(void)
> cond_resched();
> }
>
> - /* Get consistent value of the next-to-be-used sequence number. */
> - next_seq = console_seq;
> -
> console_locked = 0;
> up_console_sem();
>
> @@ -2860,7 +2868,7 @@ void console_unlock(void)
> * there's a new owner and the console_unlock() from them will do the
> * flush, no worries.
> */
> - retry = prb_read_valid(prb, next_seq, NULL);
> + retry = prb_read_valid(prb, latched_seq_read_nolock(&console_seq), NULL);
> if (retry && console_trylock())
> goto again;
> }
> @@ -2912,18 +2920,19 @@ void console_unblank(void)
> */
> void console_flush_on_panic(enum con_flush_mode mode)
> {
> - /*
> - * If someone else is holding the console lock, trylock will fail
> - * and may_schedule may be set. Ignore and proceed to unlock so
> - * that messages are flushed out. As this can be called from any
> - * context and we don't want to get preempted while flushing,
> - * ensure may_schedule is cleared.
> - */
> - console_trylock();
> - console_may_schedule = 0;
> -
> - if (mode == CONSOLE_REPLAY_ALL)
> - console_seq = prb_first_valid_seq(prb);
> + if (console_trylock()) {
> + if (mode == CONSOLE_REPLAY_ALL)
> + latched_seq_write(&console_seq, prb_first_valid_seq(prb));
I am scratching my head about this. Of course, latched_seq_write() does
not guarantee the result when the console lock is taken by another process.
But console_unlock(), called below, will call latched_seq_write()
anyway.
Also CONSOLE_REPLAY_ALL is used by panic_print_sys_info().
It is called the following way:
void panic(const char *fmt, ...)
{
[...]
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
panic_print_sys_info();
[...]
}
On one hand, console_flush_on_panic(CONSOLE_FLUSH_PENDING) will
most likely take over the console lock even when it was taken
by another CPU before. And the 2nd console_flush_on_panic()
called from panic_print_sys_info() will not even notice.
On the other hand, CONSOLE_REPLAY_ALL would not even try to
replay the log when the console lock was not available.
The risk of broken console_seq is negligible. console_unlock()
should be safe even with invalid console_seq.
My opinion:
I suggest to keep the original logic and maybe add some comment:
void console_flush_on_panic(enum con_flush_mode mode)
{
/*
* If someone else is holding the console lock, trylock will fail
* and may_schedule may be set. Ignore and proceed to unlock so
* that messages are flushed out. As this can be called from any
* context and we don't want to get preempted while flushing,
* ensure may_schedule is cleared.
*/
console_trylock();
console_may_schedule = 0;
/*
* latched_seq_write() does not guarantee consistent values
* when console_trylock() failed. But this is the best effort.
* console_unlock() will update anyway console_seq. prb_read_valid()
* handles even invalid sequence numbers.
*/
if (mode == CONSOLE_REPLAY_ALL)
latched_seq_write(&console_seq, prb_first_valid_seq(prb));
console_unlock();
}
Best Regards,
Petr
On Tue 2021-08-03 15:18:59, John Ogness wrote:
> Introduce "sync mode", which means that all printk calls will
> synchronously write to the console. Once activated, this mode is
> never deactivated. It is used when the kernel is about to end
> (such as panic, oops, shutdown, reboot).
>
> Sync mode can only be activated if atomic consoles are available.
>
> In sync mode:
>
> - only atomic consoles (write_atomic() callback) will print
> - printing occurs within vprintk_store() instead of console_unlock()
>
> CONSOLE_LOG_MAX is moved to printk.h to support the per-console
> buffer used in sync mode.
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 2f11b604e487..eda9b96e3fb6 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -151,6 +151,9 @@ struct console {
> short flags;
> short index;
> int cflag;
> +#if defined(CONFIG_PRINTK) && defined(CONFIG_HAVE_ATOMIC_CONSOLE)
> + char sync_buf[CONSOLE_LOG_MAX];
Could it be allocated in register_console()?
It is needed only when the write_atomic() callback is defined...
> +#endif
> void *data;
> struct console *next;
> };
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> +static bool print_sync(struct console *con, u64 *seq)
> +{
> + struct printk_info info;
> + struct printk_record r;
> + size_t text_len;
> +
> + prb_rec_init_rd(&r, &info, &con->sync_buf[0], sizeof(con->sync_buf));
> +
> + if (!prb_read_valid(prb, *seq, &r))
> + return false;
> +
It should check suppress_message_printing().
> + text_len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
> +
> + con->write_atomic(con, &con->sync_buf[0], text_len);
> +
> + *seq = r.info->seq;
> +
> + touch_softlockup_watchdog_sync();
> + clocksource_touch_watchdog();
> + rcu_cpu_stall_reset();
> + touch_nmi_watchdog();
> +
> + if (text_len)
> + printk_delay(r.info->level);
> +
> + return true;
> +}
> +
> +static void print_sync_to(struct console *con, u64 seq)
> +{
> + u64 printk_seq;
> +
> + while (!__printk_cpu_trylock())
> + __printk_wait_on_cpu_lock();
> +
> + for (;;) {
> + printk_seq = read_console_seq();
> + if (printk_seq > seq)
> + break;
> + if (!print_sync(con, &printk_seq))
> + break;
> +#ifdef CONFIG_PRINTK_NMI
> + if (in_nmi()) {
> + latched_seq_write(&console_sync_nmi_seq, printk_seq + 1);
> + continue;
> + }
> +#endif
> + latched_seq_write(&console_sync_seq, printk_seq + 1);
> + }
> +
> + __printk_cpu_unlock();
> +}
> +
> +static void call_sync_console_drivers(u64 seq)
> +{
> + struct console *con;
> +
> + for_each_console(con) {
> + if (!(con->flags & CON_ENABLED))
> + continue;
> + if (!con->write_atomic)
> + continue;
> + print_sync_to(con, seq);
> + }
> +}
> +#else
> +
> +#define read_console_seq() latched_seq_read_nolock(&console_seq)
> +#define in_sync_mode() false
> +#define enable_sync_mode()
> +#define call_sync_console_drivers(seq) ((void)seq)
> +
> +#endif /* CONFIG_HAVE_ATOMIC_CONSOLE */
> +
> /*
> * Special console_lock variants that help to reduce the risk of soft-lockups.
> * They allow to pass console_lock to another printk() call using a busy wait.
> @@ -2084,6 +2231,8 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
> if (!cpu_online(smp_processor_id()) &&
> !(con->flags & CON_ANYTIME))
> continue;
> + if (in_sync_mode())
> + continue;
IMHO, this causes console_unlock() to iterate over all pending
messages without calling the consoles. But it will still increment
console_seq.
As a result, the messages will also be skipped by print_sync_to() because
read_console_seq() will return the incremented value.
I think that we need to break the iteration in console_unlock().
Or do I miss something?
> if (con->flags & CON_EXTENDED)
> con->write(con, ext_text, ext_len);
> else {
> @@ -2251,6 +2400,7 @@ int vprintk_store(int facility, int level,
> const u32 caller_id = printk_caller_id();
> struct prb_reserved_entry e;
> enum printk_info_flags flags = 0;
> + bool final_commit = false;
> struct printk_record r;
> unsigned long irqflags;
> u16 trunc_msg_len = 0;
> @@ -2261,6 +2411,7 @@ int vprintk_store(int facility, int level,
> u16 text_len;
> int ret = 0;
> u64 ts_nsec;
> + u64 seq;
>
> /*
> * Since the duration of printk() can vary depending on the message
> @@ -2299,6 +2450,7 @@ int vprintk_store(int facility, int level,
> if (flags & LOG_CONT) {
> prb_rec_init_wr(&r, reserve_size);
> if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) {
> + seq = r.info->seq;
> text_len = printk_sprint(&r.text_buf[r.info->text_len], reserve_size,
> facility, &flags, fmt, args);
> r.info->text_len += text_len;
> @@ -2306,6 +2458,7 @@ int vprintk_store(int facility, int level,
> if (flags & LOG_NEWLINE) {
> r.info->flags |= LOG_NEWLINE;
> prb_final_commit(&e);
> + final_commit = true;
> } else {
> prb_commit(&e);
> }
> @@ -2330,6 +2483,8 @@ int vprintk_store(int facility, int level,
> goto out;
> }
>
> + seq = r.info->seq;
> +
> /* fill message */
> text_len = printk_sprint(&r.text_buf[0], reserve_size, facility, &flags, fmt, args);
> if (trunc_msg_len)
> @@ -2344,13 +2499,19 @@ int vprintk_store(int facility, int level,
> memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info));
>
> /* A message without a trailing newline can be continued. */
> - if (!(flags & LOG_NEWLINE))
> + if (!(flags & LOG_NEWLINE)) {
> prb_commit(&e);
> - else
> + } else {
> prb_final_commit(&e);
> + final_commit = true;
> + }
>
> ret = text_len + trunc_msg_len;
> out:
> + /* only the kernel may perform synchronous printing */
> + if (in_sync_mode() && facility == 0 && final_commit)
> + call_sync_console_drivers(seq);
Is there any reason why this is called from vprintk_emit()?
I guess that you wanted to call it before releasing IRQ.
But is it really necessary? call_sync_console_drivers(seq)
reads the message again via the seq number anyway.
I have to say that the new code makes the printk() code/api much more
twisted. It is a combination of the naming scheme and design.
The original code path is:
+ printk()
+ vprintk_emit()
+ vprintk_store()
+ console_trylock()
+ console_unlock()
+ prb_read_valid()
+ record_print_text()
+ call_console_drivers()
+ con->write()
The new code path is:
+ printk()
+ vprintk_emit()
+ vprintk_store()
+ call_sync_console_drivers()
+ print_sync_to()
+ print_sync()
+ prb_read_valid()
+ record_print_text()
+ con->write_atomic()
One thing is the ordering of the api names:
+ printk -> vprintk -> console -> record_print -> call_console -> con -> write
vs.
+ printk -> vprintk -> call_console -> print -> record_print -> con -> write
The original patch called console() API from printk() API. The most
ugly things were:
+ console_unlock() flushed the messages to the console.
A cleaner API would be:
console_lock();
console_flush();
console_unlock();
+ record_print() called from console_unlock(). The "print" name
makes it hard to distinguish from the "printk" API.
But it does a completely different job:
+ "printk" API stores the message and call console
+ "record_print" API converts the message into the console format
The new code adds even more twists:
+ Adds yet another "print" API. It has another meaning than
"printk" or "record_print" API:
+ "printk" API stores the message and call console
+ "print" API prints the message to the console
+ "record_print" API converts the message into the console format
+ call_sync_console_drivers() does similar job as console_unlock()
(iterates over all pending messages, read, format, call console).
While the original call_console_drivers() only called the
console. The logic is somehow inside out.
This is why I opened the discussion about the behavior with console
kthreads.
I think that we might need another synchronous mode also for the early
boot, suspend, kexec, panic, for non-atomic consoles. We might need
another cycle/solution when there are per-console kthreads.
I would prefer to somehow refactor the existing console_unlock()
so that the iteration over pending messages and call_console_drivers()
might be usable also in the other sync modes, console kthreads, ...
Best Regards,
Petr
PS: I have vacation the following two weeks. I am still going to work
tomorrow (Friday) but I am not sure how much time I will have to
discuss this. I am afraid that I won't be able to help much before
I am back from the vacation.
On 2021-08-05, Petr Mladek <[email protected]> wrote:
> On Tue 2021-08-03 15:18:59, John Ogness wrote:
>> Introduce "sync mode", which means that all printk calls will
>> synchronously write to the console. Once activated, this mode is
>> never deactivated. It is used when the kernel is about to end
>> (such as panic, oops, shutdown, reboot).
>>
>> Sync mode can only be activated if atomic consoles are available.
>>
>> In sync mode:
>>
>> - only atomic consoles (write_atomic() callback) will print
>> - printing occurs within vprintk_store() instead of console_unlock()
>>
>> CONSOLE_LOG_MAX is moved to printk.h to support the per-console
>> buffer used in sync mode.
>>
>> diff --git a/include/linux/console.h b/include/linux/console.h
>> index 2f11b604e487..eda9b96e3fb6 100644
>> --- a/include/linux/console.h
>> +++ b/include/linux/console.h
>> @@ -151,6 +151,9 @@ struct console {
>> short flags;
>> short index;
>> int cflag;
>> +#if defined(CONFIG_PRINTK) && defined(CONFIG_HAVE_ATOMIC_CONSOLE)
>> + char sync_buf[CONSOLE_LOG_MAX];
>
> Could it be allocated in register_console()?
>
> It is needed only when the write_atomic() callback is defined...
Agreed.
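Something like this, perhaps (only an untested sketch; it assumes
@sync_buf becomes a "char *" member and that an allocation failure simply
disables synchronous printing for that console):

	/* in register_console(), once the console has been accepted */
	if (newcon->write_atomic) {
		newcon->sync_buf = kzalloc(CONSOLE_LOG_MAX, GFP_KERNEL);
		if (!newcon->sync_buf)
			newcon->write_atomic = NULL;
	}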
>> +#endif
>> void *data;
>> struct console *next;
>> };
>> --- a/kernel/printk/printk.c
>> +++ b/kernel/printk/printk.c
>> +static bool print_sync(struct console *con, u64 *seq)
>> +{
>> + struct printk_info info;
>> + struct printk_record r;
>> + size_t text_len;
>> +
>> + prb_rec_init_rd(&r, &info, &con->sync_buf[0], sizeof(con->sync_buf));
>> +
>> + if (!prb_read_valid(prb, *seq, &r))
>> + return false;
>> +
>
> It should check suppress_message_printing().
Agreed. Nice catch.
>> + text_len = record_print_text(&r, console_msg_format & MSG_FORMAT_SYSLOG, printk_time);
>> +
>> + con->write_atomic(con, &con->sync_buf[0], text_len);
>> +
>> + *seq = r.info->seq;
>> +
>> + touch_softlockup_watchdog_sync();
>> + clocksource_touch_watchdog();
>> + rcu_cpu_stall_reset();
>> + touch_nmi_watchdog();
>> +
>> + if (text_len)
>> + printk_delay(r.info->level);
>> +
>> + return true;
>> +}
>> +
>> +static void print_sync_to(struct console *con, u64 seq)
>> +{
>> + u64 printk_seq;
>> +
>> + while (!__printk_cpu_trylock())
>> + __printk_wait_on_cpu_lock();
>> +
>> + for (;;) {
>> + printk_seq = read_console_seq();
>> + if (printk_seq > seq)
>> + break;
>> + if (!print_sync(con, &printk_seq))
>> + break;
>> +#ifdef CONFIG_PRINTK_NMI
>> + if (in_nmi()) {
>> + latched_seq_write(&console_sync_nmi_seq, printk_seq + 1);
>> + continue;
>> + }
>> +#endif
>> + latched_seq_write(&console_sync_seq, printk_seq + 1);
>> + }
>> +
>> + __printk_cpu_unlock();
>> +}
>> +
>> +static void call_sync_console_drivers(u64 seq)
>> +{
>> + struct console *con;
>> +
>> + for_each_console(con) {
>> + if (!(con->flags & CON_ENABLED))
>> + continue;
>> + if (!con->write_atomic)
>> + continue;
>> + print_sync_to(con, seq);
>> + }
>> +}
>> +#else
>> +
>> +#define read_console_seq() latched_seq_read_nolock(&console_seq)
>> +#define in_sync_mode() false
>> +#define enable_sync_mode()
>> +#define call_sync_console_drivers(seq) ((void)seq)
>> +
>> +#endif /* CONFIG_HAVE_ATOMIC_CONSOLE */
>> +
>> /*
>> * Special console_lock variants that help to reduce the risk of soft-lockups.
>> * They allow to pass console_lock to another printk() call using a busy wait.
>> @@ -2084,6 +2231,8 @@ static void call_console_drivers(const char *ext_text, size_t ext_len,
>> if (!cpu_online(smp_processor_id()) &&
>> !(con->flags & CON_ANYTIME))
>> continue;
>> + if (in_sync_mode())
>> + continue;
>
> IMHO, this causes console_unlock() to iterate over all pending
> messages without calling the consoles. But it will still increment
> console_seq.
>
> As a result, the messages will also be skipped by print_sync_to() because
> read_console_seq() will return the incremented value.
>
> I think that we need to break the iteration in console_unlock().
>
> Or do I miss something?
You are correct.
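Something along these lines, perhaps (untested sketch, expressed as an
extra hunk on top of this patch, against the main loop of
console_unlock()):

 skip:
+		/*
+		 * Do not consume records here once sync mode is active.
+		 * Otherwise console_seq is advanced past messages that
+		 * only the atomic consoles will print.
+		 */
+		if (in_sync_mode())
+			break;
+
 		seq = latched_seq_read_nolock(&console_seq);
 		if (!prb_read_valid(prb, seq, &r))
 			break;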
>> if (con->flags & CON_EXTENDED)
>> con->write(con, ext_text, ext_len);
>> else {
>> @@ -2251,6 +2400,7 @@ int vprintk_store(int facility, int level,
>> const u32 caller_id = printk_caller_id();
>> struct prb_reserved_entry e;
>> enum printk_info_flags flags = 0;
>> + bool final_commit = false;
>> struct printk_record r;
>> unsigned long irqflags;
>> u16 trunc_msg_len = 0;
>> @@ -2261,6 +2411,7 @@ int vprintk_store(int facility, int level,
>> u16 text_len;
>> int ret = 0;
>> u64 ts_nsec;
>> + u64 seq;
>>
>> /*
>> * Since the duration of printk() can vary depending on the message
>> @@ -2299,6 +2450,7 @@ int vprintk_store(int facility, int level,
>> if (flags & LOG_CONT) {
>> prb_rec_init_wr(&r, reserve_size);
>> if (prb_reserve_in_last(&e, prb, &r, caller_id, LOG_LINE_MAX)) {
>> + seq = r.info->seq;
>> text_len = printk_sprint(&r.text_buf[r.info->text_len], reserve_size,
>> facility, &flags, fmt, args);
>> r.info->text_len += text_len;
>> @@ -2306,6 +2458,7 @@ int vprintk_store(int facility, int level,
>> if (flags & LOG_NEWLINE) {
>> r.info->flags |= LOG_NEWLINE;
>> prb_final_commit(&e);
>> + final_commit = true;
>> } else {
>> prb_commit(&e);
>> }
>> @@ -2330,6 +2483,8 @@ int vprintk_store(int facility, int level,
>> goto out;
>> }
>>
>> + seq = r.info->seq;
>> +
>> /* fill message */
>> text_len = printk_sprint(&r.text_buf[0], reserve_size, facility, &flags, fmt, args);
>> if (trunc_msg_len)
>> @@ -2344,13 +2499,19 @@ int vprintk_store(int facility, int level,
>> memcpy(&r.info->dev_info, dev_info, sizeof(r.info->dev_info));
>>
>> /* A message without a trailing newline can be continued. */
>> - if (!(flags & LOG_NEWLINE))
>> + if (!(flags & LOG_NEWLINE)) {
>> prb_commit(&e);
>> - else
>> + } else {
>> prb_final_commit(&e);
>> + final_commit = true;
>> + }
>>
>> ret = text_len + trunc_msg_len;
>> out:
>> + /* only the kernel may perform synchronous printing */
>> + if (in_sync_mode() && facility == 0 && final_commit)
>> + call_sync_console_drivers(seq);
>
> Is there any reason why this is called from vprintk_emit()?
>
> I guess that you wanted to call it before releasing IRQ.
> But is it really necessary? call_sync_console_drivers(seq)
> reads the message again via the seq number anyway.
The goal of sync mode is for all printk callers to print
synchronously. The synchronous printing occurs at the earliest possible
moment (immediately after the record has been stored in the
ringbuffer).
Although sync mode is used for non-crash scenarios (halt/reboot), it is
tuned for crash scenarios. I think we should be pushing those messages
out as soon as possible and not do things like re-enable interrupts
first.
> I have to say that the new code makes the printk() code/api much more
> twisted. It is a combination of the naming scheme and design.
Thank you for this comment. I may be too focussed on the end goal and
not investing enough care to maximize the beauty of these intermediate
releases.
> The original code path is:
>
> + printk()
> + vprintk_emit()
> + vprintk_store()
> + console_trylock()
> + console_unlock()
> + prb_read_valid()
> + record_print_text()
> + call_console_drivers()
> + con->write()
>
> The new code path is:
>
> + printk()
> + vprintk_emit()
> + vprintk_store()
> + call_sync_console_drivers()
> + print_sync_to()
> + print_sync()
> + prb_read_valid()
> + record_print_text()
> + con->write_atomic()
>
>
> One thing is the ordering of the api names:
>
> + printk -> vprintk -> console -> record_print -> call_console -> con -> write
> vs.
> + printk -> vprintk -> call_console -> print -> record_print -> con -> write
>
>
> The original patch called console() API from printk() API. The most
> ugly things were:
>
> + console_unlock() flushed the messages to the console.
> A cleaner API would be:
>
> console_lock();
> console_flush();
> console_unlock();
>
>
> + record_print() called from console_unlock(). The "print" name
> makes it hard to distinguish from the "printk" API.
> But it does a completely different job:
>
> + "printk" API stores the message and call console
> + "record_print" API converts the message into the console format
>
>
> The new code adds even more twists:
>
> + Adds yet another "print" API. It has another meaning than
> "printk" or "record_print" API:
>
> + "printk" API stores the message and call console
> + "print" API prints the message to the console
> + "record_print" API converts the message into the console format
>
>
> + call_sync_console_drivers() does similar job as console_unlock()
> (iterates over all pending messages, read, format, call console).
> While the original call_console_drivers() only called the
> console. The logic is somehow inside out.
It is difficult for me to compare code that is planned for complete
removal with new functions (which in several cases are also planned for
complete removal).
I will seriously consider your naming/design comments and try to rework
things that fit better into the current scheme of things.
> This is why I opened the discussion about the behavior with console
> kthreads.
>
> I think that we might need another synchronous mode also for the early
> boot, suspend, kexec, panic, for non-atomic consoles. We might need
> another cycle/solution when there are per-console kthreads.
I think our energies are better spent engineering more atomic
consoles. Trying to make non-atomic code work in atomic context usually
causes more problems than it solves.
Keep in mind we do have a real world implementation (PREEMPT_RT series)
where 8250 UART users can test if kthread printers and atomic consoles
are a genuine improvement (i.e. printk never disturbs the system during
runtime and crash scenarios are reliably pushing out dumps via the 8250
atomic console). Rather than speculating if something is missing, I
prefer that users test and point out real existing issues. Obviously the
PREEMPT_RT series is not being merged 1:1 into mainline, but with regard
to printk, PREEMPT_RT is functionally fairly close to what I am aiming
for.
> I would prefer to somehow refactor the existing console_unlock()
> so that the iteration over pending messages and call_console_drivers()
> might be usable also in the other sync modes, console kthreads, ...
I will do my best to interpret these suggestions.
> PS: I have vacation the following two weeks. I am still going to work
> tomorrow (Friday) but I am not sure how much time I will have to
> discuss this. I am afraid that I won't be able to help much before
> I am back from the vacation.
Thank you for taking time this week for so much feedback. I have plenty
to prepare for v2.
John Ogness
On Tue 2021-08-03 15:18:51, John Ogness wrote:
> Hi,
>
> This is the next part of our printk-rework effort (points 3 and
> 4 of the LPC 2019 summary [0]).
>
> Here the concept of "atomic consoles" is introduced through a
> new (optional) write_atomic() callback for console drivers. This
> callback must be implemented as an NMI-safe variant of the
> write() callback, meaning that it can function from any context
> without relying on questionable tactics such as ignoring locking
> and also without relying on the synchronization of console
> semaphore.
>
> As an example of how such an atomic console can look like, this
> series implements write_atomic() for the 8250 UART driver.
>
> This series also introduces a new console printing mode called
> "sync mode" that is only activated when the kernel is about to
> end (such as panic, oops, shutdown, reboot). Sync mode can only
> be activated if atomic consoles are available. A system without
> registered atomic consoles will be unaffected by this series.
>
> When in sync mode, the console printing behavior becomes:
>
> - only consoles implementing write_atomic() will be called
>
> - printing occurs within vprintk_store() instead of
> console_unlock(), since the console semaphore is irrelevant
> for atomic consoles
I am fine with the new behavior at this stage. It is quite a clear
win when (only) the atomic console is used. And it does not make any
difference when atomic consoles are disabled.
But I am not sure about the proposed terms and implementation.
I want to be sure that we are on the right path for introducing
console kthreads.
Let me try to compare the behavior:
1. Before this patchset:
/* printk: store immediately; try all consoles immediately */
int printk(...)
{
vprintk_store();
if (console_try_lock()) {
/* flush pending messages to the consoles */
console_unlock();
}
}
/* panic: try hard to flush messages to the consoles and avoid deadlock */
void panic()
{
/* Ignore locks in console drivers */
bust_spinlocks(1);
printk("Kernel panic ...);
dump_stack();
smp_send_stop();
/* ignore console lock */
console_flush_on_panic();
}
2. After this patchset:
+ same as before in normal mode or when there is no atomic console
+ in panic with atomic console; it modifies the behavior:
/*
* printk: store immediately; immediately flush atomic consoles;
* unsafe consoles are not used anymore;
*/
int printk(...)
{
vprintk_store();
flush_atomic_consoles();
}
/* panic: no hacks; only atomic consoles are used */
void panic()
{
printk("Kernel panic ...);
dump_stack();
}
3. After introducing console kthread(s):
int printk(...)
{
vprintk_store();
wake_consoles_via_irqwork();
}
+ in panic:
+ with atomic console like after this patchset?
+ without atomic consoles?
+ during early boot?
I guess that we will need another sync mode for the early boot,
panic, suspend, kexec, etc. It must be possible to debug these states
even without atomic consoles and working kthreads.
Best Regards,
Petr
On 2021-08-03, Andy Shevchenko <[email protected]> wrote:
>> #include <linux/atomic.h>
>> #include <linux/types.h>
>> +#include <linux/printk.h>
>
> Ordered?
Agreed. v2 will include printk.h first.
>> + if (!(con->flags & CON_ENABLED)) \
>> + continue; \
>
> What about
>
> #define console_is_enabled(con) (!!(con->flags & CON_ENABLED))
>
> or inliner equivalent
>
> static inline bool console_is_enabled(struct console *con)
> {
> return !!(con->flags & CON_ENABLED);
> }
Generally kernel code uses the console flags directly. A quick check for
CON_ENABLED shows direct flag queries all over:
$ git grep -l -e 'flags.*& .*CON_ENABLED' | wc -c
16
Are you suggesting I replace this usage in all of these files? Or just
the one macro in console.h for now? And since there are 6 more console
flags, they should probably also have equivalent wrappers?
Thanks.
John Ogness
On Thu, Aug 05, 2021 at 05:52:43AM +0206, John Ogness wrote:
> On 2021-08-04, Daniel Thompson <[email protected]> wrote:
> > On Wed, Aug 04, 2021 at 02:12:22PM +0200, Petr Mladek wrote:
> >> On Wed 2021-08-04 12:31:59, Daniel Thompson wrote:
> >> > On Tue, Aug 03, 2021 at 05:36:32PM +0206, John Ogness wrote:
> >> > > On 2021-08-03, Daniel Thompson <[email protected]> wrote:
> >> > > > On Tue, Aug 03, 2021 at 03:18:54PM +0206, John Ogness wrote:
> >> > > >> kgdb makes use of its own cpulock (@dbg_master_lock, @kgdb_active)
> >> > > >> during cpu roundup. This will conflict with the printk cpulock.
> >> > > >
> >> > > > When the full vision is realized what will be the purpose of the printk
> >> > > > cpulock?
> >> > > >
> >> > > I'm asking largely because its current role is actively unhelpful
> >> > > > w.r.t. kdb. It is possible that cautious use of in_dbg_master() might
> >> > > > be a better (and safer) solution. However it sounds like there is a
> >> > > > larger role planned for the printk cpulock...
> >> > >
> >> > > The printk cpulock is used as a synchronization mechanism for
> >> > > implementing atomic consoles, which need to be able to safely interrupt
> >> > > the console write() activity at any time and immediately continue with
> >> > > their own printing. The ultimate goal is to move all console printing
> >> > > into per-console dedicated kthreads, so the primary function of the
> >> > > printk cpulock is really to immediately _stop_ the CPU/kthread
> >> > > performing write() in order to allow write_atomic() (from any context on
> >> > > any CPU) to safely and reliably take over.
> >> >
> >> > I see.
> >> >
> >> > Is there any mileage in allowing in_dbg_master() to suppress taking
> >> > the console lock?
> >> >
> >> > There's a couple of reasons to worry about the current approach.
> >> >
> >> > The first is that we don't want this code to trigger in the case when
> >> > kgdb is enabled and kdb is not since it is only kdb (a self-hosted
> >> > debugger) than uses the consoles. This case is relatively trivial to
> >> > address since we can rename it kdb_roundup_delay() and alter the way it
> >> > is conditionally compiled.
>
> Well, _I_ want this code to trigger even without kdb. The printk cpulock
> is meant to be the innermost locking for the entire kernel. No code is
> allowed to block/spin on any kind of lock if holding the printk
> cpulock. This is the only way to guarantee the functionality of the
> atomic consoles.
>
> For example, if the kernel were to crash while inside kgdb code, we want
> to see the backtrace.
That would certainly help me debug any such problems in kgdb ;-) .
> Since kgdb _does_ take locks (spinning on @dbg_slave_lock during roundup
> and the master's own cpu lock as a retry loop on @dbg_master_lock),
> clearly it is not allowed to hold the printk cpulock. The simplest
> solution I could find was just to make sure kgdb_cpu_enter() isn't
> called while holding the printk cpulock.
We might have to come back to this. I'm pretty certain your patch
does not currently achieve this goal.
> >> > The second is more of a problem however. kdb will only call into the
> >> > console code from the debug master. By default this is the CPU that
> >> > takes the debug trap so initial prints will work fine. However it is
> >> > possible to switch to a different master (so we can read per-CPU
> >> > registers and things like that). This will result in one of the CPUs
> >> > that did the IPI round up calling into console code and this is unsafe
> >> > in that instance.
>
> It is only unsafe if a CPU enters "kgdb/kdb context" while holding the
> printk cpulock. That is what I want to prevent.
Currently you can prevent this only for CPUs that enter the debugger
via an IPI. CPUs that enter due to a breakpoint (and there can be more
than one at a time) cannot just continue until the lock is dropped
since they would end up re-executing the breakpoint instruction.
> >> > There are a couple of tricks we could adopt to work around this but
> >> > given the slightly odd calling context for kdb (all CPUs quiesced, no
> >> > log interleaving possible) it sounds like it would remain safe to
> >> > bypass the lock if in_dbg_master() is true.
> >> >
> >> > Bypassing an inconvenient lock might sound icky but:
> >> >
> >> > 1. If the lock is not owned by any CPU then what kdb will do is safe.
>
> No. The printk cpulock exists for low-level synchronization. The atomic
> consoles need this synchronization. (For example, the 8250 needs this
> for correct tracking of its interrupt register, even for
> serial8250_put_poll_char().)
What I mean is that, because kdb is mono-threaded (even on SMP systems
due to the quiescing of other CPUs), if the lock is not taken when
we enter kdb then it is safe for kdb to contend for the lock in the
normal way since it cannot deadlock.
BTW the synchronization in question is the need for strict nesting, is
that right? (e.g. that each context that recursively acquires the lock
will release it in strict reverse order?).
> >> > 2. If the lock is owned by any CPU then we have quiesced it anyway
> >> > and this makes it safe for the owning CPU to share its ownership
> >> > (since it isn't much different to recursive acquisition on a single
> >> > CPU)
>
> Quiescing the printk cpulock is not permitted.
Sorry I wasn't quite clear in phrasing here. I don't think of it as
quiescing the lock, I think of it as quiescing the CPU that owns the
lock.
If any CPU that owns the lock *and* all CPUs except the debug master are
quiesced then allowing the debug master to take the lock is essentially
a special case of recursive acquisition and it will nest correctly.
> Just because it is kdb, does not mean that the atomic consoles were
> interrupted in a convenient place. The whole purpose of the atomic
> consoles is so that we can have guaranteed console output from _any_
> context and _any_ line of code in the kernel.
>
> >> I think about the following:
> >>
> >> void kgdb_roundup_cpus(void)
> >> {
> >> __printk_cpu_lock();
> >> __kgdb_roundup_cpus();
> >> }
> >>
> >> , where __printk_cpu_lock() waits/takes printk_cpu_lock()
> >> __kgdb_roundup_cpus() is the original kgdb_roundup_cpus();
> >>
> >>
> >> The idea is that kgdb_roundup_cpus() caller takes the printk_cpu lock.
> >> The owner will be well defined.
> >>
> >> As a result any other CPU will not be able to take the printk_cpu lock
> >> as long as it is owned by the kgdb lock. But as you say, kgdb will
> >> make sure that everything is serialized at this stage. So that
> >> the original raw_printk_cpu_lock_irqsave() might just disable
> >> IRQs when called under debugger.
> >>
> >> Does it make any sense?
> >
> > Yes but I think it is still has problems.
> >
> > Primarily it doesn't solve the issue I raised. It would still be unsafe
> > to change debug master: we can guarantee the initial master owns the
> > lock but if it has been multiply acquired we cannot transfer ownership
> > when we want to change master.
> >
> > Additionally it will delay the round up of cores that do not own the
> > lock. The quiescing is never atomic and the operator needs to know
> > that but the longer CPUs are allows to execute for the more confusing
> > things can become for the operator.
> >
> > Finally on machines without an NMI this could cause trouble with the
> > interrupt disable in raw_printk_cpu_lock_irqsave() (or any outer level
> > interrupt disable). If the master gets the lock then the other processors
> > will become incapable of being rounded up if they are waiting for the
> > printk lock.
>
> I am also not happy with such a solution. Aside from Daniel's comments,
> it also violates the basic principle of the printk cpulock by allowing
> further locking while holding the printk cpulock. That is a recipe for
> deadlock.
>
> >> I have to say that it is a bit hairy. But it looks slightly better
> >> than the delayed/repeated IPI proposed by this patch.
> >
> > I'd like to reserve judgement for now which one is least worst...
> > largely because if the purpose of the lock is simply to prevent interleaving
> > of console output then the debugger quiescing code should already have
> > this covered.
> >
> > It leaves me wondering if a change like the one below is sufficient
> > (based on code without John's patches but hopefully still clear enough).
> > I've given the new code its own branch which it doesn't, strictly
> > speaking, need but it is easier to comment this way... and perhaps also
> > just a little easier for people who have not set CONFIG_KGDB to
> > ignore ;-).
> >
> > ~~~
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index 142a58d124d9..41a7e103bb66 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3599,6 +3599,18 @@ int __printk_cpu_trylock(void)
> > /* This CPU is already the owner. */
> > atomic_inc(&printk_cpulock_nested);
> > return 1;
> > + } else if (in_dbg_master()) {
> > + /*
> > + * If we are executing as the primary CPU and with the debugger
> > + * active then all other CPUs in the system are quiesced by
> > + * the time kdb winds up calling this function. To execute this
> > + * branch then the lock must be owned by one of the quiesced CPUs.
> > + * Happily, because it is quiesced and cannot release it, it is
> > + * safe for us to allow the lock to be taken from a different CPU!
> > + * The lock will be released prior to resuming the real owner.
> > + */
> > + atomic_inc(&printk_cpulock_nested);
> > + return 1;
> > }
> >
> > return 0;
> > ~~~
>
> Being in kgdb/kdb context is similar to being in atomic console
> context. (Of course, they are both using cpu locks.) But the contexts
> are not the same. It is incorrect to handle them as the same.
>
> We need to decide who is inside of who. Either printk is the innermost,
> in which case the printk cpulock cannot be held when calling
> kgdb_cpu_enter().
It is difficult to prevent this for the breakpoint cases... although
since everything about your current work is difficult I don't expect
that to be a sufficient argument on its own!
> Or kgdb is the innermost, meaning that the atomic
> consoles are no longer atomic/reliable while in kgdb.
>
> I prefer and am pushing for the first, but am willing to accept the
> second (i.e. that kgdb is the innermost function of the kernel).
I think it will always be the case that we might execute breakpoints in
an NMI context since the collateral damage from forbidding breakpoints
on all APIs that *might* be called from NMI is likely to constrain the
not-NMI debugging experience too much. However it *is* possible to defer
breakpoints: we could defer them by calling into the
out-of-line-single-step logic that is needed to support kprobes. I
dislike this approach since there is no way to fixup the PC so when
we eventually stop then gdb would have trouble figuring out
why the system has stopped.
However, taking on board what you are saying about innermost functions,
I think there might be an approach we could look into that is much nicer
from an analysis point of view than relying on in_dbg_master() to
implicitly borrow the printk lock.
Would you consider a means for kgdb to *explicitly* allow a slave CPU
to donate ownership to the debug master as part of its spin loop (e.g.
explicitly transfer ownership if and only if we are quiesced). This
has a number of nice properties:
1. The ownership transfer happens *after* we have decided who the
master actually is and before that point everything works as
normal!
2. Safe-nesting is guaranteed by the slave CPU's exception stack.
3. We can print (and expect it to be seen) pretty much anywhere in the
master code path (including the ones before we find out who will be
master since that happens before the IPIs) with no trouble.
4. Handling change of master is easy... we can re-donate the lock
to the new master using the same or similar API.
5. We can print anywhere in the slave code *except* for the tight
loop we run after donating ownership to the master and the code
after a former master CPU donates the lock to the next master
and before the former master drops into the slave loop.
6. Apart from the function to donate ownership all the nasty code
to handle it ends up in kgdb where it belongs rather than smeared
in your lock code.
I can't decide if this makes a tiny piece of kgdb innermost or not,
but it is certainly much easier to analyse how kgdb and atomic consoles
interact.
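For the sake of discussion, the donation primitive could be as small as
this (hypothetical name, untested; it reuses @printk_cpulock_owner from
your series and relies on the donating CPU staying quiesced until the
lock comes back):

	/* Called by a quiesced CPU that currently owns the printk cpulock. */
	void printk_cpulock_donate(int new_owner_cpu)
	{
		if (atomic_read(&printk_cpulock_owner) != smp_processor_id())
			return;

		/*
		 * The nesting count is left untouched; the new owner must
		 * release the lock (or donate it back) before this CPU is
		 * allowed to resume.
		 */
		atomic_set(&printk_cpulock_owner, new_owner_cpu);
	}

All the logic for deciding when to call it (and for re-donating on a
change of master) would then stay inside kgdb, as described in the list
above.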
> > PS In the interest of full disclosure there is a special case
> > in the debugger to allow it to try to cope if it fails to
> > quiesce a CPU and I deliberately omitted this from the long
> > comment above. That special case is expected to be unstable
> > but since the alternative is likely to be a permanent deadlock
> > without any indication of why we choose to take the risk of
> > continuing. Personally I don't recommend reasoning about
> > console safety based on this emergency case hence omitting the
> > comment.
The above idea of explicitly transferring lock ownership also allows us
to analyse this case (whereas the in_dbg_master() approach meant it was
too hard). If the CPU cannot be rounded up (will not react to the NMI
IPI) *and* it owns the printk lock and won't give it back then kdb will
deadlock. Given your goals w.r.t. reliability of atomic consoles then I
am more than happy to live with this!
Daniel.
On Fri, Aug 06, 2021 at 01:02:40PM +0206, John Ogness wrote:
> On 2021-08-03, Andy Shevchenko <[email protected]> wrote:
...
> >> + if (!(con->flags & CON_ENABLED)) \
> >> + continue; \
> >
> > What about
> >
> > #define console_is_enabled(con) (!!(con->flags & CON_ENABLED))
> >
> > or inliner equivalent
> >
> > static inline bool console_is_enabled(struct console *con)
> > {
> > return !!(con->flags & CON_ENABLED);
> > }
>
> Generally kernel code uses the console flags directly. A quick check for
> CON_ENABLED shows direct flag queries all over:
>
> $ git grep -l -e 'flags.*& .*CON_ENABLED' | wc -c
> 16
>
> Are you suggesting I replace this usage in all of these files? Or just
> the one macro in console.h for now? And since there are 6 more console
> flags, they should probably also have equivalent wrappers?
Introduce it now and reuse in new code, but somebody may clean up the rest in
the future.
--
With Best Regards,
Andy Shevchenko
On Thu 2021-08-05 17:32:40, John Ogness wrote:
> On 2021-08-05, Petr Mladek <[email protected]> wrote:
> > On Tue 2021-08-03 15:18:57, John Ogness wrote:
> >> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> >> index d07d98c1e846..f8f46d9fba9b 100644
> >> --- a/kernel/printk/printk.c
> >> +++ b/kernel/printk/printk.c
> >> @@ -2912,18 +2920,19 @@ void console_unblank(void)
> >> */
> >> void console_flush_on_panic(enum con_flush_mode mode)
> >> {
> >> - /*
> >> - * If someone else is holding the console lock, trylock will fail
> >> - * and may_schedule may be set. Ignore and proceed to unlock so
> >> - * that messages are flushed out. As this can be called from any
> >> - * context and we don't want to get preempted while flushing,
> >> - * ensure may_schedule is cleared.
> >> - */
> >> - console_trylock();
> >> - console_may_schedule = 0;
> >> -
> >> - if (mode == CONSOLE_REPLAY_ALL)
> >> - console_seq = prb_first_valid_seq(prb);
> >> + if (console_trylock()) {
> >> + if (mode == CONSOLE_REPLAY_ALL)
> >> + latched_seq_write(&console_seq, prb_first_valid_seq(prb));
> >
> > I am scratching my head about this. Of course, latched_seq_write() does
> > not guarantee the result when the console lock is taken by another process.
> > But console_unlock(), called below, will call latched_seq_write()
> > anyway.
> >
> > Also CONSOLE_REPLAY_ALL is used by panic_print_sys_info().
> > It is called the following way:
> >
> > void panic(const char *fmt, ...)
> > {
> > [...]
> > debug_locks_off();
> > console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> >
> > panic_print_sys_info();
> > [...]
> > }
> >
> > On one hand, console_flush_on_panic(CONSOLE_FLUSH_PENDING) will
> > most likely take over the console lock even when it was taken
> > by another CPU before. And the 2nd console_flush_on_panic()
> > called from panic_print_sys_info() will not even notice.
> >
> > On the other hand, CONSOLE_REPLAY_ALL would not even try to
> > replay the log when the console lock was not available.
> >
> > The risk of broken console_seq is negligible. console_unlock()
> > should be safe even with invalid console_seq.
> >
> > My opinion:
> >
> > I suggest to keep the original logic and maybe add some comment:
> >
> > void console_flush_on_panic(enum con_flush_mode mode)
> > {
> > /*
> > * If someone else is holding the console lock, trylock will fail
> > * and may_schedule may be set. Ignore and proceed to unlock so
> > * that messages are flushed out. As this can be called from any
> > * context and we don't want to get preempted while flushing,
> > * ensure may_schedule is cleared.
> > */
> > console_trylock();
> > console_may_schedule = 0;
> >
> > /*
> > * latched_seq_write() does not guarantee consistent values
> > * when console_trylock() failed. But this is the best effort.
> > * console_unlock() will update anyway console_seq. prb_read_valid()
> > * handles even invalid sequence numbers.
> > */
> > if (mode == CONSOLE_REPLAY_ALL)
> > latched_seq_write(&console_seq, prb_first_valid_seq(prb));
> >
> > console_unlock();
> > }
>
> I see now that CONSOLE_REPLAY_ALL is not handled correctly. And in the
> follow-up patch "printk: introduce kernel sync mode" the situation gets
> worse. I am trying to find ways to handle things without blindly
> ignoring locks and hoping for the best.
>
> I need to re-evaluate how to correctly support this feature.
A solution might be to implement a generic cycle that would use
the right latched_seq and buffers. Something like:
enum console_mode {
CONSOLE_MODE_NORMAL = 0,
CONSOLE_MODE_ATOMIC,
CONSOLE_MODE_ATOMIC_NMI,
CONSOLE_MODE_REPLAY_ALL,
CONSOLE_MODE_LAST
};
struct console_mode_info
{
char text[CONSOLE_LOG_MAX];
char ext_text[CONSOLE_EXT_LOG_MAX];
struct latched_seq seq;
};
static struct console_mode_info console_mode_info[CONSOLE_MODE_LAST];
void set_console_seq(u64 seq, enum console_mode mode)
{
latched_seq_write(&console_mode_info[mode].seq, seq);
}
u64 get_console_seq(enum console_mode mode)
{
/* Atomic console calls might be nested. Return the highest value. */
if (mode == CONSOLE_MODE_ATOMIC ||
mode == CONSOLE_MODE_ATOMIC_NMI) {
u64 seq, seq_nmi;
seq = latched_seq_read_nolock(&console_mode_info[CONSOLE_MODE_ATOMIC].seq);
seq_nmi = latched_seq_read_nolock(&console_mode_info[CONSOLE_MODE_ATOMIC_NMI].seq);
if (seq_nmi > seq)
seq = seq_nmi;
/*
* Return what has already been processed by normal consoles
* when the atomic consoles have not been used yet.
*/
if (seq > 0)
return seq;
return latched_seq_read_nolock(&console_mode_info[CONSOLE_MODE_NORMAL].seq);
}
return latched_seq_read_nolock(&console_mode_info[mode].seq);
}
/*
* Generic cycle for flushing messages to the console.
*
* Return: the next to-be-processed sequence number;
*/
void console_write_pending_messages(enum console_mode mode)
{
struct printk_info info;
struct printk_record r;
u64 seq;
size_t len;
prb_rec_init_rd(&r, &info, console_mode_info[mode].text,
sizeof(console_mode_info[mode].text));
for (;;) {
/*
* Stop normal console when atomic consoles got
* activated in a panic mode.
*/
if (mode == CONSOLE_MODE_NORMAL && atomic_console_used)
break;
seq = get_console_seq(mode);
if (!prb_read_valid(prb, seq, &r))
return;
if (suppress_message_printing(r.info->level)) {
set_console_seq(seq + 1, mode);
continue;
}
len = console_format_msg(r);
call_console_drivers(...);
set_console_seq(seq + 1, mode);
}
}
void console_unlock(void)
{
bool retry;
int seq;
again:
/* Prevent infinite loop when normal consoles were stopped */
if (atomic_console_used) {
up_console_sem();
return;
}
console_write_pending_messages(CONSOLE_MODE_NORMAL);
up_console_sem();
/*
* Make sure that someone handles messages added in the small
* race window before console_sem was released.
*/
retry = prb_read_valid(prb, get_console_seq(CONSOLE_MODE_NORMAL), NULL);
if (retry && console_trylock())
goto again;
}
void atomic_console_write_pending()
{
unsigned long flags;
raw_printk_cpu_lock_irqsave(flags);
atomic_console_used = true;
if (in_nmi())
console_write_pending_messages(CONSOLE_MODE_ATOMIC_NMI);
else
console_write_pending_messages(CONSOLE_MODE_ATOMIC);
raw_printk_cpu_unlock_irqrestore(flags);
}
void replay_all_messages()
{
set_console_seq(prb_first_valid_seq(prb), CONSOLE_MODE_REPLAY_ALL);
console_write_pending_messages(CONSOLE_MODE_REPLAY_ALL);
}
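For completeness, console_flush_on_panic() might then map onto this
scheme roughly like below (again only a sketch):

void console_flush_on_panic(enum con_flush_mode mode)
{
	/* Best effort; the trylock might fail, see the earlier discussion. */
	console_trylock();
	console_may_schedule = 0;

	if (mode == CONSOLE_REPLAY_ALL)
		replay_all_messages();

	console_unlock();
}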
Problems:
a) The same line might be printed by more contexts.
b) per-console kthreads?
Ad a) I am not sure if we could prevent duplicated lines when
the nested IRQ/NMI writes the same message that is just
being written by the outer context. But it should be
an acceptable corner case.
Ad b) Everything will get much more complicated with per-console
kthreads. We will need counters and buffers for each console
and each context.
This is what I was able to come up with before leaving for vacation. I am
not sure if it is the best design/naming and if it has a chance
to work.
But it looks like a way how to re-use the same code in all modes.
It might help to see what is the same and what is special about each
mode.
I would prefer to see something like this instead of the completely
different code paths for atomic consoles that are proposed by 7th
patch of this patchset.
Best Regards,
Petr
On (21/08/05 17:47), Petr Mladek wrote:
[..]
> 3. After introducing console kthread(s):
>
> int printk(...)
> {
> vprintk_store();
> wake_consoles_via_irqwork();
> }
>
> + in panic:
>
> + with atomic console like after this patchset?
> + without atomic consoles?
>
> + during early boot?
I guess I'd also add netconsole to the list.
On (21/08/03 15:18), John Ogness wrote:
> boot_delay_msec() is always called immediately before printk_delay()
> so just call it from within printk_delay().
[..]
Interesting. Apparently boot_delay_msec() does not do anything
if suppress_message_printing(level). I wonder if we want a similar
thing for printk_delay() as well. Otherwise we have some imbalance in
behaviour.
IOW,
> @@ -1222,10 +1222,8 @@ static void boot_delay_msec(int level)
> unsigned long long k;
> unsigned long timeout;
>
>- if ((boot_delay == 0 || system_state >= SYSTEM_RUNNING)
>- || suppress_message_printing(level)) {
>+ if (boot_delay == 0 || system_state >= SYSTEM_RUNNING)
> return;
>- }
>
> k = (unsigned long long)loops_per_msec * boot_delay;
[..]
> +static inline void printk_delay(int level)
> {
+ if (suppress_message_printing(level))
+ return;
> + boot_delay_msec(level);
> +
> if (unlikely(printk_delay_msec)) {
> int m = printk_delay_msec;
>
On (21/08/03 15:18), John Ogness wrote:
[..]
> @@ -1993,6 +1993,16 @@ static int console_trylock_spinning(void)
> bool spin = false;
> unsigned long flags;
>
> +#ifdef CONFIG_SMP
> + /*
> + * CPUs holding the printk cpulock must not spin on any lock. Even
> + * console_trylock() must not be called because its implementation
> + * uses spinlocks.
> + */
> + if (atomic_read(&printk_cpulock_owner) == smp_processor_id())
> + return 0;
> +#endif
> +
> if (console_trylock())
> return 1;
>
> @@ -2719,7 +2729,17 @@ static int have_callable_console(void)
> */
> static inline int can_use_console(void)
> {
> - return cpu_online(raw_smp_processor_id()) || have_callable_console();
> + int cpu = raw_smp_processor_id();
> +#ifdef CONFIG_SMP
> + /*
> + * CPUs holding the printk cpulock must not spin on any lock.
> + * Allowing console usage could call into the spinlocks of the
> + * various console drivers.
> + */
> + if (atomic_read(&printk_cpulock_owner) == cpu)
> + return 0;
I guess the only reason this is done in can_use_console() is
console_flush_on_panic()?
Because otherwise, I think, we can move this check to vprintk_emit().
can_use_console() can be called from preemptible() context. But
if it's called from preemptible() then we know that this is not the
printk()/NMI path (but console_device() and friends instead) and
that this CPU is definitely not holding the printk CPU lock.
console_trylock_spinning() and console_unlock()->can_use_console()
follow each other
if (console_trylock_spinning())
console_unlock();
so a single `atomic_read(&printk_cpulock_owner) == cpu` can suffice.
Now we get to console_flush_on_panic(), which still calls console_unlock()
and iterates over messages, but when called by the CPU that owns the
printk cpulock it just skips all the messages. But there is no point in
calling console_unlock() in such a case; we can check if we're
printk_cpulock_owner and bail out if so.
Or am I missing something?
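I.e. something like this at the top of console_flush_on_panic() (sketch
only, reusing @printk_cpulock_owner from the series, modulo the
CONFIG_SMP guards):

	/*
	 * The printk cpulock owner must not spin on console_sem or call
	 * into console drivers, so there is nothing useful it can do here.
	 */
	if (atomic_read(&printk_cpulock_owner) == raw_smp_processor_id())
		return;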
On (21/08/06 17:56), Petr Mladek wrote:
>
> A solution might be to implement a generic cycle that would use
> the right latched_seq and buffers. Something like:
>
> enum console_mode {
> CONSOLE_MODE_NORMAL = 0,
> CONSOLE_MODE_ATOMIC,
> CONSOLE_MODE_ATOMIC_NMI,
> CONSOLE_MODE_REPLAY_ALL,
> CONSOLE_MODE_LAST
> };
>
> struct console_mode_info
> {
> char text[CONSOLE_LOG_MAX];
> char ext_text[CONSOLE_EXT_LOG_MAX];
> struct latched_seq seq;
> };
Seems to me like this wants to be part of struct console.
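Something like this, say (sketch, reusing console_mode_info from the
proposal):

	struct console {
		...
		/* per-mode formatting buffers and sequence counters */
		struct console_mode_info mode_info[CONSOLE_MODE_LAST];
		...
	};

That would also keep the buffers tied to the consoles that actually need
them.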
[..]
> Problems:
>
> a) The same line might be printed by more contexts.
> b) per-console kthreads?
>
>
> Ad a) I am not sure if we could prevent duplicated lines when
> the nested IRQ/NMI writes the same message that is just
> being written by the outer context. But it should be
> an acceptable corner case.
>
> Ad b) Everything will get much more complicated with per-console
> kthreads. We will need counters and buffers for each console
> and each context.
Oh, yes, you are talking about per-console counters/buffers too.
> This is what I was able to come up before leaving for vacation. I am
> not sure if it is the best design/naming and it if it has a chance
> to work.
>
> But it looks like a way how to re-use the same code in all modes.
> It might help to see what is the same and what is special about each
> mode.
>
>
> I would prefer to see something like this instead of the completely
> different code paths for atomic consoles that are proposed by 7th
> patch of this patchset.
I agree.