2024-04-02 22:12:02

by John Ogness

Subject: [PATCH printk v4 00/27] wire up write_atomic() printing

Hi,

This is v4 of a series to wire up the nbcon consoles so that
they actually perform printing using their write_atomic()
callback. v3 is here [0]. For information about the motivation
of the atomic consoles, please read the cover letter of v1 [1].

The main focus of this series:

- For nbcon consoles, always call write_atomic() directly from
printk() caller context for the panic CPU.

- For nbcon consoles, call write_atomic() when unlocking the
console lock.

- Only perform the console lock/unlock dance if legacy or boot
consoles are registered.

- For legacy consoles, if nbcon consoles are registered, do not
attempt to print from printk() caller context for the panic
CPU until nbcon consoles have had a chance to print the most
significant messages.

- Mark emergency sections. In these sections printk() calls
will only store the messages. Upon exiting the emergency
section, nbcon consoles are flushed directly and legacy
console flushing is triggered via irq_work.

This series does _not_ include threaded printing or nbcon
drivers. Those features will be added in separate follow-up
series.

Note: With this series, a system with _only_ nbcon consoles
registered will not perform console printing unless the
console lock is used (for synchronization), or when
exiting emergency sections, or on panic. This is on
purpose. When nbcon kthreads are introduced, they will
fill the gaps.

The changes since v3:

- Modify the documentation of console_srcu_read_flags() to
clarify that it is needed anytime a console _might_ be
registered and the caller is not holding the
console_list_lock. Hopefully this makes it clear when this
helper function is needed.

- Create a function uart_port_set_cons() for setting @cons of
struct uart_port. It modifies @cons under the port lock to
avoid possible races within the port lock wrapper. All (5)
code sites are modified to use the new function.

- Introduce 2 new required nbcon console callbacks
device_lock()/device_unlock() to implement any internal
locking required by the driver. (For example, for uart serial
consoles it is locking/unlocking the port lock.) This is used
during console registration to ensure that the hardware is
not in use while the console transitions to registered. This
avoids the risk that the port lock wrappers fail to take the
nbcon console lock while the console is being registered on
another CPU. These callbacks will also be used later by the
printing kthreads.

- Introduce struct nbcon_drvdata to track ownership state when
using the port lock wrappers. This provides a race-free
alternative to the @nbcon_locked_port flag used in v3.

- Split the functionality of uart_nbcon_acquire() and
uart_nbcon_release() into driver-specific and generic parts.
The generic functions are named nbcon_driver_acquire() and
nbcon_driver_release(). The driver-specific part is moved
into serial_core.h into the new helper functions
__uart_port_nbcon_acquire() and __uart_port_nbcon_release().

- Rename nbcon_atomic_flush_all() to
nbcon_atomic_flush_pending() to emphasize that it only prints
up to the latest record at the time of the call. Also, flush
all the pending records of a console (without releasing
ownership in between) before flushing the next nbcon console.
This allows the full emergency block to be printed on at
least one atomic console before trying the next.

- Flush nbcon consoles directly in the caller context when
exiting an emergency section.

- If a CPU is in EMERGENCY context, do not trigger printing
of legacy consoles via irq_work.

- In panic, allow synchronous legacy printing before calling
the panic handlers. Attempt to flush there in the panic
context as well.

- Remove the return value for the nbcon console write_atomic()
callback. If ownership has not been lost, it is assumed the
printing was successful.

- Add a WARN_ON_ONCE if nbcon_emit_next_record() is called for
a console that has not provided a write_atomic() callback.

- Change the meaning of the return value of
nbcon_atomic_emit_one() to allow
nbcon_legacy_emit_next_record() to have the same return value
meaning as console_emit_next_record().

- Remove all legacy @seq handling from nbcon.c. For nbcon
consoles, printk.c handles the transfer and resetting of the
legacy @seq value to @nbcon_seq.

- Add a compiler barrier in __pr_flush() to ensure the compiler
does not optimize out a local variable by replacing it with
a racy read of multiple global variables.

- Let __wake_up_klogd() remove unnecessary flags before
possibly queuing irq_work.

- Eliminate header proxying in nbcon.c.

- Mark _all_ lockdep output blocks as emergency sections.

- Mark _all_ rcu stall blocks as emergency sections.

- Remove "(Optional)" in the documentation of the
write_atomic() callback. Once threads are available, it will
be optional. But at this point in the rework it is not.

John Ogness

[0] https://lore.kernel.org/lkml/[email protected]
[1] https://lore.kernel.org/lkml/[email protected]

John Ogness (23):
printk: Add notation to console_srcu locking
printk: Properly deal with nbcon consoles on seq init
printk: nbcon: Remove return value for write_atomic()
printk: nbcon: Add detailed doc for write_atomic()
printk: nbcon: Add callbacks to synchronize with driver
printk: nbcon: Use driver synchronization while registering
serial: core: Provide low-level functions to lock port
printk: nbcon: Implement processing in port->lock wrapper
printk: nbcon: Do not rely on proxy headers
printk: nbcon: Fix kerneldoc for enums
printk: Make console_is_usable() available to nbcon
printk: Let console_is_usable() handle nbcon
printk: Add @flags argument for console_is_usable()
printk: Track registered boot consoles
printk: nbcon: Use nbcon consoles in console_flush_all()
printk: nbcon: Assign priority based on CPU state
printk: nbcon: Add unsafe flushing on panic
printk: Avoid console_lock dance if no legacy or boot consoles
printk: Track nbcon consoles
printk: Coordinate direct printing in panic
panic: Mark emergency section in oops
rcu: Mark emergency section in rcu stalls
lockdep: Mark emergency sections in lockdep splats

Sebastian Andrzej Siewior (1):
printk: Check printk_deferred_enter()/_exit() usage

Thomas Gleixner (3):
printk: nbcon: Provide function to flush using write_atomic()
printk: nbcon: Implement emergency sections
panic: Mark emergency section in warn

drivers/tty/serial/8250/8250_core.c | 6 +-
drivers/tty/serial/amba-pl011.c | 2 +-
drivers/tty/serial/serial_core.c | 2 +-
include/linux/console.h | 138 ++++++++--
include/linux/printk.h | 32 ++-
include/linux/serial_core.h | 116 ++++++++-
kernel/locking/lockdep.c | 91 ++++++-
kernel/panic.c | 9 +
kernel/printk/internal.h | 56 +++-
kernel/printk/nbcon.c | 382 ++++++++++++++++++++++++++--
kernel/printk/printk.c | 287 ++++++++++++++++-----
kernel/printk/printk_ringbuffer.h | 2 +
kernel/printk/printk_safe.c | 12 +
kernel/rcu/tree_exp.h | 7 +
kernel/rcu/tree_stall.h | 9 +
15 files changed, 1038 insertions(+), 113 deletions(-)


base-commit: a2b4cab9da7746c42f87c13721d305baf0085a20
--
2.39.2



2024-04-02 22:12:41

by John Ogness

Subject: [PATCH printk v4 02/27] printk: Properly deal with nbcon consoles on seq init

If a non-boot console is registering and boot consoles exist, the
consoles are flushed before being unregistered. This allows the
non-boot console to continue where the boot console left off.

If for whatever reason flushing fails, the lowest seq found from
any of the enabled boot consoles is used. Until now con->seq was
checked. However, if it is an nbcon boot console, the function
nbcon_seq_read() must be used to read seq because con->seq is
not updated for nbcon consoles.

Check if it is an nbcon boot console and if so call
nbcon_seq_read() to read seq.

Also setup the nbcon sequence number and reset the legacy
sequence number from register_console() (rather than in
nbcon_init() and nbcon_seq_force()). This removes all legacy
sequence handling from nbcon.c so the code is easier to follow
and maintain.

Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/nbcon.c | 7 +------
kernel/printk/printk.c | 29 ++++++++++++++++++++++++-----
2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index c8093bcc01fe..d741659d26ec 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -172,9 +172,6 @@ void nbcon_seq_force(struct console *con, u64 seq)
u64 valid_seq = max_t(u64, seq, prb_first_valid_seq(prb));

atomic_long_set(&ACCESS_PRIVATE(con, nbcon_seq), __u64seq_to_ulseq(valid_seq));
-
- /* Clear con->seq since nbcon consoles use con->nbcon_seq instead. */
- con->seq = 0;
}

/**
@@ -964,8 +961,6 @@ bool nbcon_alloc(struct console *con)
*
* nbcon_alloc() *must* be called and succeed before this function
* is called.
- *
- * This function expects that the legacy @con->seq has been set.
*/
void nbcon_init(struct console *con)
{
@@ -974,7 +969,7 @@ void nbcon_init(struct console *con)
/* nbcon_alloc() must have been called and successful! */
BUG_ON(!con->pbufs);

- nbcon_seq_force(con, con->seq);
+ nbcon_seq_force(con, 0);
nbcon_state_set(con, &state);
}

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index c7c0ee2b47eb..b7e52b3f3e96 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3348,6 +3348,7 @@ static void try_enable_default_console(struct console *newcon)
newcon->flags |= CON_CONSDEV;
}

+/* Set @newcon->seq to the first record this console should print. */
static void console_init_seq(struct console *newcon, bool bootcon_registered)
{
struct console *con;
@@ -3396,11 +3397,20 @@ static void console_init_seq(struct console *newcon, bool bootcon_registered)

newcon->seq = prb_next_seq(prb);
for_each_console(con) {
- if ((con->flags & CON_BOOT) &&
- (con->flags & CON_ENABLED) &&
- con->seq < newcon->seq) {
- newcon->seq = con->seq;
+ u64 seq;
+
+ if (!((con->flags & CON_BOOT) &&
+ (con->flags & CON_ENABLED))) {
+ continue;
}
+
+ if (con->flags & CON_NBCON)
+ seq = nbcon_seq_read(con);
+ else
+ seq = con->seq;
+
+ if (seq < newcon->seq)
+ newcon->seq = seq;
}
}

@@ -3517,9 +3527,18 @@ void register_console(struct console *newcon)
newcon->dropped = 0;
console_init_seq(newcon, bootcon_registered);

- if (newcon->flags & CON_NBCON)
+ if (newcon->flags & CON_NBCON) {
nbcon_init(newcon);

+ /*
+ * nbcon consoles have their own sequence counter. The legacy
+ * sequence counter is reset so that it is clear it is not
+ * being used.
+ */
+ nbcon_seq_force(newcon, newcon->seq);
+ newcon->seq = 0;
+ }
+
/*
* Put this console in the list - keep the
* preferred driver at the head of the list.
--
2.39.2


2024-04-02 22:12:45

by John Ogness

Subject: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

Console drivers typically must deal with access to the hardware
via user input/output (such as an interactive login shell) and
output of kernel messages via printk() calls.

Follow-up commits require that the printk subsystem is able to
synchronize with the driver. Require nbcon consoles to implement
two new callbacks (device_lock(), device_unlock()) that will
use whatever synchronization mechanism the driver is using for
itself (for example, the port lock for uart serial consoles).

Signed-off-by: John Ogness <[email protected]>
---
include/linux/console.h | 42 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 42 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index e4028d4079e1..ad85594e070e 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -352,6 +352,48 @@ struct console {
*/
void (*write_atomic)(struct console *con, struct nbcon_write_context *wctxt);

+ /**
+ * @device_lock:
+ *
+ * NBCON callback to begin synchronization with driver code.
+ *
+ * Console drivers typically must deal with access to the hardware
+ * via user input/output (such as an interactive login shell) and
+ * output of kernel messages via printk() calls. This callback is
+ * called by the printk-subsystem whenever it needs to synchronize
+ * with hardware access by the driver. It should be implemented to
+ * use whatever synchronization mechanism the driver is using for
+ * itself (for example, the port lock for uart serial consoles).
+ *
+ * This callback is always called from task context. It may use any
+ * synchronization method required by the driver. BUT this callback
+ * MUST also disable migration. The console driver may be using a
+ * synchronization mechanism that already takes care of this (such as
+ * spinlocks). Otherwise this function must explicitly call
+ * migrate_disable().
+ *
+ * The flags argument is provided as a convenience to the driver. It
+ * will be passed again to device_unlock(). It can be ignored if the
+ * driver does not need it.
+ */
+ void (*device_lock)(struct console *con, unsigned long *flags);
+
+ /**
+ * @device_unlock:
+ *
+ * NBCON callback to finish synchronization with driver code.
+ *
+ * It is the counterpart to device_lock().
+ *
+ * This callback is always called from task context. It must
+ * appropriately re-enable migration (depending on how device_lock()
+ * disabled migration).
+ *
+ * The flags argument is the value of the same variable that was
+ * passed to device_lock().
+ */
+ void (*device_unlock)(struct console *con, unsigned long flags);
+
atomic_t __private nbcon_state;
atomic_long_t __private nbcon_seq;
struct printk_buffers *pbufs;
--
2.39.2


2024-04-02 22:12:47

by John Ogness

Subject: [PATCH printk v4 04/27] printk: Check printk_deferred_enter()/_exit() usage

From: Sebastian Andrzej Siewior <[email protected]>

Add validation that printk_deferred_enter()/_exit() are called in
non-migration contexts.

Signed-off-by: Sebastian Andrzej Siewior <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
include/linux/printk.h | 9 +++++----
kernel/printk/internal.h | 3 +++
kernel/printk/printk_safe.c | 12 ++++++++++++
3 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 2fde40cc9677..d8b3f51d9e98 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -157,15 +157,16 @@ int _printk(const char *fmt, ...);
*/
__printf(1, 2) __cold int _printk_deferred(const char *fmt, ...);

-extern void __printk_safe_enter(void);
-extern void __printk_safe_exit(void);
+extern void __printk_deferred_enter(void);
+extern void __printk_deferred_exit(void);
+
/*
* The printk_deferred_enter/exit macros are available only as a hack for
* some code paths that need to defer all printk console printing. Interrupts
* must be disabled for the deferred duration.
*/
-#define printk_deferred_enter __printk_safe_enter
-#define printk_deferred_exit __printk_safe_exit
+#define printk_deferred_enter() __printk_deferred_enter()
+#define printk_deferred_exit() __printk_deferred_exit()

/*
* Please don't use printk_ratelimit(), because it shares ratelimiting state
diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 6c2afee5ef62..4e0edcb3c311 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -53,6 +53,9 @@ int vprintk_store(int facility, int level,
__printf(1, 0) int vprintk_default(const char *fmt, va_list args);
__printf(1, 0) int vprintk_deferred(const char *fmt, va_list args);

+void __printk_safe_enter(void);
+void __printk_safe_exit(void);
+
bool printk_percpu_data_ready(void);

#define printk_safe_enter_irqsave(flags) \
diff --git a/kernel/printk/printk_safe.c b/kernel/printk/printk_safe.c
index 6d10927a07d8..4421ccac3113 100644
--- a/kernel/printk/printk_safe.c
+++ b/kernel/printk/printk_safe.c
@@ -26,6 +26,18 @@ void __printk_safe_exit(void)
this_cpu_dec(printk_context);
}

+void __printk_deferred_enter(void)
+{
+ cant_migrate();
+ __printk_safe_enter();
+}
+
+void __printk_deferred_exit(void)
+{
+ cant_migrate();
+ __printk_safe_exit();
+}
+
asmlinkage int vprintk(const char *fmt, va_list args)
{
#ifdef CONFIG_KGDB_KDB
--
2.39.2


2024-04-02 22:13:06

by John Ogness

Subject: [PATCH printk v4 07/27] printk: nbcon: Use driver synchronization while registering

Depending on whether an nbcon console is registered, a driver may
handle its internal locking differently. If a driver is holding
its internal lock while the nbcon console is registered, there
may be a risk that two different contexts access the hardware
simultaneously without synchronization. (For example, if the
printk subsystem invokes atomic printing while another driver
context acquired the internal lock without considering the
atomic console.)

Use the driver synchronization while an nbcon console
transitions to being registered. This guarantees that if the
driver acquires its internal lock when the nbcon console was not
registered, it will remain unregistered until that context
releases the lock.

Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b7e52b3f3e96..cd32648372a0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3448,6 +3448,7 @@ void register_console(struct console *newcon)
struct console *con;
bool bootcon_registered = false;
bool realcon_registered = false;
+ unsigned long flags;
int err;

console_list_lock();
@@ -3539,6 +3540,19 @@ void register_console(struct console *newcon)
newcon->seq = 0;
}

+ /*
+ * If another context is actively using the hardware of this new
+ * console, it will not be aware of the nbcon synchronization. There
+ * is a risk that two contexts could access the hardware
+ * simultaneously if this new console is used for atomic printing
+ * and the other context is still using the hardware.
+ *
+ * Use the driver synchronization to ensure that the hardware is not
+ * in use while this new console transitions to being registered.
+ */
+ if ((newcon->flags & CON_NBCON) && newcon->write_atomic)
+ newcon->device_lock(newcon, &flags);
+
/*
* Put this console in the list - keep the
* preferred driver at the head of the list.
@@ -3563,6 +3577,10 @@ void register_console(struct console *newcon)
* register_console() completes.
*/

+ /* This new console is now registered. */
+ if ((newcon->flags & CON_NBCON) && newcon->write_atomic)
+ newcon->device_unlock(newcon, flags);
+
console_sysfs_notify();

/*
--
2.39.2


2024-04-02 22:13:11

by John Ogness

Subject: [PATCH printk v4 08/27] serial: core: Provide low-level functions to lock port

It will be necessary at times for the uart nbcon console
drivers to acquire the port lock directly (without the
additional nbcon functionality of the port lock wrappers).
These are special cases such as the implementation of the
device_lock()/device_unlock() callbacks or for internal
port lock wrapper synchronization.

Provide low-level variants __uart_port_lock_irqsave() and
__uart_port_unlock_irqrestore() for this purpose.

Signed-off-by: John Ogness <[email protected]>
---
include/linux/serial_core.h | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 55b1f3ba48ac..bb3324d49453 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -588,6 +588,24 @@ struct uart_port {
void *private_data; /* generic platform data pointer */
};

+/*
+ * Only for console->device_lock()/_unlock() callbacks and internal
+ * port lock wrapper synchronization.
+ */
+static inline void __uart_port_lock_irqsave(struct uart_port *up, unsigned long *flags)
+{
+ spin_lock_irqsave(&up->lock, *flags);
+}
+
+/*
+ * Only for console->device_lock()/_unlock() callbacks and internal
+ * port lock wrapper synchronization.
+ */
+static inline void __uart_port_unlock_irqrestore(struct uart_port *up, unsigned long flags)
+{
+ spin_unlock_irqrestore(&up->lock, flags);
+}
+
/**
* uart_port_lock - Lock the UART port
* @up: Pointer to UART port structure
--
2.39.2


2024-04-02 22:13:39

by John Ogness

Subject: [PATCH printk v4 10/27] printk: nbcon: Do not rely on proxy headers

The headers kernel.h, serial_core.h, and console.h transitively
provide many type and function definitions from other headers.
Rather than relying on these as proxy headers, explicitly
include all headers providing needed definitions. Also sort the
list alphabetically to be able to easily detect duplicates.

Suggested-by: Andy Shevchenko <[email protected]>
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/internal.h | 8 ++++++--
kernel/printk/nbcon.c | 12 +++++++++++-
kernel/printk/printk_ringbuffer.h | 2 ++
3 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 4e0edcb3c311..c040fc8f1fd9 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -2,11 +2,12 @@
/*
* internal.h - printk internal definitions
*/
-#include <linux/percpu.h>
#include <linux/console.h>
-#include "printk_ringbuffer.h"
+#include <linux/percpu.h>
+#include <linux/types.h>

#if defined(CONFIG_PRINTK) && defined(CONFIG_SYSCTL)
+struct ctl_table;
void __init printk_sysctl_init(void);
int devkmsg_sysctl_set_loglvl(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos);
@@ -43,6 +44,9 @@ enum printk_info_flags {
LOG_CONT = 8, /* text is a fragment of a continuation line */
};

+struct printk_ringbuffer;
+struct dev_printk_info;
+
extern struct printk_ringbuffer *prb;

__printf(4, 0)
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 38328cf0fd5c..1de6062d4ce3 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -2,14 +2,24 @@
// Copyright (C) 2022 Linutronix GmbH, John Ogness
// Copyright (C) 2022 Intel, Thomas Gleixner

-#include <linux/kernel.h>
+#include <linux/atomic.h>
#include <linux/bug.h>
#include <linux/console.h>
#include <linux/delay.h>
+#include <linux/errno.h>
#include <linux/export.h>
+#include <linux/init.h>
+#include <linux/irqflags.h>
+#include <linux/minmax.h>
+#include <linux/percpu.h>
+#include <linux/preempt.h>
#include <linux/slab.h>
+#include <linux/smp.h>
+#include <linux/stddef.h>
#include <linux/string.h>
+#include <linux/types.h>
#include "internal.h"
+#include "printk_ringbuffer.h"
/*
* Printk console printing implementation for consoles which does not depend
* on the legacy style console_lock mechanism.
diff --git a/kernel/printk/printk_ringbuffer.h b/kernel/printk/printk_ringbuffer.h
index 52626d0f1fa3..bd2a892deac1 100644
--- a/kernel/printk/printk_ringbuffer.h
+++ b/kernel/printk/printk_ringbuffer.h
@@ -5,6 +5,8 @@

#include <linux/atomic.h>
#include <linux/dev_printk.h>
+#include <linux/stddef.h>
+#include <linux/types.h>

/*
* Meta information about each stored message.
--
2.39.2


2024-04-02 22:13:48

by John Ogness

Subject: [PATCH printk v4 11/27] printk: nbcon: Fix kerneldoc for enums

Kerneldoc requires enums to be specified as such. Otherwise the
comment is interpreted as function documentation.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
include/linux/console.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index e7c35c686720..5f1758891aec 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -137,7 +137,7 @@ static inline int con_debug_leave(void)
*/

/**
- * cons_flags - General console flags
+ * enum cons_flags - General console flags
* @CON_PRINTBUFFER: Used by newly registered consoles to avoid duplicate
* output of messages that were already shown by boot
* consoles or read by userspace via syslog() syscall.
@@ -218,7 +218,7 @@ struct nbcon_state {
static_assert(sizeof(struct nbcon_state) <= sizeof(int));

/**
- * nbcon_prio - console owner priority for nbcon consoles
+ * enum nbcon_prio - console owner priority for nbcon consoles
* @NBCON_PRIO_NONE: Unused
* @NBCON_PRIO_NORMAL: Normal (non-emergency) usage
* @NBCON_PRIO_EMERGENCY: Emergency output (WARN/OOPS...)
--
2.39.2


2024-04-02 22:14:00

by John Ogness

Subject: [PATCH printk v4 12/27] printk: Make console_is_usable() available to nbcon

Move console_is_usable() as-is into internal.h so that it can
be used by nbcon printing functions as well.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
kernel/printk/internal.h | 32 ++++++++++++++++++++++++++++++++
kernel/printk/printk.c | 30 ------------------------------
2 files changed, 32 insertions(+), 30 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index c040fc8f1fd9..bad22092cd5e 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -85,6 +85,36 @@ bool nbcon_alloc(struct console *con);
void nbcon_init(struct console *con);
void nbcon_free(struct console *con);

+/*
+ * Check if the given console is currently capable and allowed to print
+ * records.
+ *
+ * Requires the console_srcu_read_lock.
+ */
+static inline bool console_is_usable(struct console *con)
+{
+ short flags = console_srcu_read_flags(con);
+
+ if (!(flags & CON_ENABLED))
+ return false;
+
+ if ((flags & CON_SUSPENDED))
+ return false;
+
+ if (!con->write)
+ return false;
+
+ /*
+ * Console drivers may assume that per-cpu resources have been
+ * allocated. So unless they're explicitly marked as being able to
+ * cope (CON_ANYTIME) don't call them until this CPU is officially up.
+ */
+ if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
+ return false;
+
+ return true;
+}
+
#else

#define PRINTK_PREFIX_MAX 0
@@ -106,6 +136,8 @@ static inline bool nbcon_alloc(struct console *con) { return false; }
static inline void nbcon_init(struct console *con) { }
static inline void nbcon_free(struct console *con) { }

+static inline bool console_is_usable(struct console *con) { return false; }
+
#endif /* CONFIG_PRINTK */

extern struct printk_buffers printk_shared_pbufs;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index cd32648372a0..9ea51cb2aca9 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2697,36 +2697,6 @@ int is_console_locked(void)
}
EXPORT_SYMBOL(is_console_locked);

-/*
- * Check if the given console is currently capable and allowed to print
- * records.
- *
- * Requires the console_srcu_read_lock.
- */
-static inline bool console_is_usable(struct console *con)
-{
- short flags = console_srcu_read_flags(con);
-
- if (!(flags & CON_ENABLED))
- return false;
-
- if ((flags & CON_SUSPENDED))
- return false;
-
- if (!con->write)
- return false;
-
- /*
- * Console drivers may assume that per-cpu resources have been
- * allocated. So unless they're explicitly marked as being able to
- * cope (CON_ANYTIME) don't call them until this CPU is officially up.
- */
- if (!cpu_online(raw_smp_processor_id()) && !(flags & CON_ANYTIME))
- return false;
-
- return true;
-}
-
static void __console_unlock(void)
{
console_locked = 0;
--
2.39.2


2024-04-02 22:14:00

by John Ogness

Subject: [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper

Currently the port->lock wrappers uart_port_lock(),
uart_port_unlock() (and their variants) only lock/unlock
the spin_lock.

If the port is an nbcon console, the wrappers must also
acquire/release the console and mark the region as unsafe. This
allows general port->lock usage to be synchronized with the
nbcon console ownership.

Introduce a new struct nbcon_drvdata within struct console that
provides the necessary components for the port lock wrappers to
acquire the nbcon console and track its ownership.

Also introduce uart_port_set_cons() as a wrapper to set @cons
of a uart_port. The wrapper sets @cons under the port lock in
order to prevent @cons from disappearing while another context
owns the port lock via the port lock wrappers.

Also cleanup the description of the console_srcu_read_flags()
function. It is used by the port lock wrappers to ensure a
console cannot be fully unregistered while another context
owns the port lock via the port lock wrappers.

Signed-off-by: John Ogness <[email protected]>
---
drivers/tty/serial/8250/8250_core.c | 6 +-
drivers/tty/serial/amba-pl011.c | 2 +-
drivers/tty/serial/serial_core.c | 2 +-
include/linux/console.h | 57 +++++++++++++----
include/linux/printk.h | 13 ++++
include/linux/serial_core.h | 98 ++++++++++++++++++++++++++++-
kernel/printk/nbcon.c | 52 +++++++++++++++
7 files changed, 212 insertions(+), 18 deletions(-)

diff --git a/drivers/tty/serial/8250/8250_core.c b/drivers/tty/serial/8250/8250_core.c
index b62ad9006780..41d74ee3d95a 100644
--- a/drivers/tty/serial/8250/8250_core.c
+++ b/drivers/tty/serial/8250/8250_core.c
@@ -627,11 +627,11 @@ static int univ8250_console_setup(struct console *co, char *options)

port = &serial8250_ports[co->index].port;
/* link port to console */
- port->cons = co;
+ uart_port_set_cons(port, co);

retval = serial8250_console_setup(port, options, false);
if (retval != 0)
- port->cons = NULL;
+ uart_port_set_cons(port, NULL);
return retval;
}

@@ -689,7 +689,7 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
continue;

co->index = i;
- port->cons = co;
+ uart_port_set_cons(port, co);
return serial8250_console_setup(port, options, true);
}

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c
index cf2c890a560f..347aacf8400f 100644
--- a/drivers/tty/serial/amba-pl011.c
+++ b/drivers/tty/serial/amba-pl011.c
@@ -2496,7 +2496,7 @@ static int pl011_console_match(struct console *co, char *name, int idx,
continue;

co->index = i;
- port->cons = co;
+ uart_port_set_cons(port, co);
return pl011_console_setup(co, options);
}

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index d6a58a9e072a..2652b4d5c944 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3146,7 +3146,7 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
uport->state = state;

state->pm_state = UART_PM_STATE_UNDEFINED;
- uport->cons = drv->cons;
+ uart_port_set_cons(uport, drv->cons);
uport->minor = drv->tty_driver->minor_start + uport->line;
uport->name = kasprintf(GFP_KERNEL, "%s%d", drv->dev_name,
drv->tty_driver->name_base + uport->line);
diff --git a/include/linux/console.h b/include/linux/console.h
index ad85594e070e..e7c35c686720 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -282,6 +282,25 @@ struct nbcon_write_context {
bool unsafe_takeover;
};

+/**
+ * struct nbcon_drvdata - Data to allow nbcon acquire in non-print context
+ * @ctxt: The core console context
+ * @srcu_cookie: Storage for a console_srcu_lock cookie, if needed
+ * @owner_index: Storage for the owning console index, if needed
+ * @locked: Storage for the locked state, if needed
+ *
+ * All fields (except for @ctxt) are available exclusively to the driver to
+ * use as needed. They are not used by the printk subsystem.
+ */
+struct nbcon_drvdata {
+ struct nbcon_context __private ctxt;
+
+ /* reserved for driver use */
+ int srcu_cookie;
+ short owner_index;
+ bool locked;
+};
+
/**
* struct console - The console descriptor structure
* @name: The name of the console driver
@@ -396,6 +415,21 @@ struct console {

atomic_t __private nbcon_state;
atomic_long_t __private nbcon_seq;
+
+ /**
+ * @nbcon_drvdata:
+ *
+ * Data for nbcon ownership tracking to allow acquiring nbcon consoles
+ * in non-printing contexts.
+ *
+ * Drivers may need to acquire nbcon consoles in non-printing
+ * contexts. This is achieved by providing a struct nbcon_drvdata.
+ * Then the driver can call nbcon_driver_acquire() and
+ * nbcon_driver_release(). The struct does not require any special
+ * initialization.
+ */
+ struct nbcon_drvdata *nbcon_drvdata;
+
struct printk_buffers *pbufs;
};

@@ -425,28 +459,29 @@ extern void console_list_unlock(void) __releases(console_mutex);
extern struct hlist_head console_list;

/**
- * console_srcu_read_flags - Locklessly read the console flags
+ * console_srcu_read_flags - Locklessly read flags of a possibly registered
+ * console
* @con: struct console pointer of console to read flags from
*
- * This function provides the necessary READ_ONCE() and data_race()
- * notation for locklessly reading the console flags. The READ_ONCE()
- * in this function matches the WRITE_ONCE() when @flags are modified
- * for registered consoles with console_srcu_write_flags().
+ * Locklessly reading @con->flags provides a consistent read value because
+ * there is at most one CPU modifying @con->flags and that CPU is using only
+ * read-modify-write operations to do so.
*
- * Only use this function to read console flags when locklessly
- * iterating the console list via srcu.
+ * Requires console_srcu_read_lock to be held, which implies that @con might
+ * be a registered console. If the caller is holding the console_list_lock or
+ * it is certain that the console is not registered, the caller may read
+ * @con->flags directly instead.
*
* Context: Any context.
+ * Return: The current value of the @con->flags field.
*/
static inline short console_srcu_read_flags(const struct console *con)
{
WARN_ON_ONCE(!console_srcu_read_lock_is_held());

/*
- * Locklessly reading console->flags provides a consistent
- * read value because there is at most one CPU modifying
- * console->flags and that CPU is using only read-modify-write
- * operations to do so.
+ * The READ_ONCE() matches the WRITE_ONCE() when @flags are modified
+ * for registered consoles with console_srcu_write_flags().
*/
return data_race(READ_ONCE(con->flags));
}
diff --git a/include/linux/printk.h b/include/linux/printk.h
index d8b3f51d9e98..0ad3ee752635 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -9,6 +9,8 @@
#include <linux/ratelimit_types.h>
#include <linux/once_lite.h>

+struct console;
+
extern const char linux_banner[];
extern const char linux_proc_banner[];

@@ -193,6 +195,8 @@ void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
extern asmlinkage void dump_stack(void) __cold;
void printk_trigger_flush(void);
+extern void nbcon_driver_acquire(struct console *con);
+extern void nbcon_driver_release(struct console *con);
#else
static inline __printf(1, 0)
int vprintk(const char *s, va_list args)
@@ -272,6 +276,15 @@ static inline void dump_stack(void)
static inline void printk_trigger_flush(void)
{
}
+
+static inline void nbcon_driver_acquire(struct console *con)
+{
+}
+
+static inline void nbcon_driver_release(struct console *con)
+{
+}
+
#endif

bool this_cpu_in_panic(void);
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index bb3324d49453..9a73dee32ad9 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -8,10 +8,13 @@
#define LINUX_SERIAL_CORE_H

#include <linux/bitops.h>
+#include <linux/bug.h>
#include <linux/compiler.h>
#include <linux/console.h>
#include <linux/interrupt.h>
#include <linux/circ_buf.h>
+#include <linux/lockdep.h>
+#include <linux/printk.h>
#include <linux/spinlock.h>
#include <linux/sched.h>
#include <linux/tty.h>
@@ -606,6 +609,83 @@ static inline void __uart_port_unlock_irqrestore(struct uart_port *up, unsigned
spin_unlock_irqrestore(&up->lock, flags);
}

+/**
+ * uart_port_set_cons - Safely set the @cons field for a uart
+ * @up: The uart port to set
+ * @con: The new console to set to
+ *
+ * This function must be used to set @up->cons. It uses the port lock to
+ * synchronize with the port lock wrappers in order to ensure that the console
+ * cannot change or disappear while another context is holding the port lock.
+ */
+static inline void uart_port_set_cons(struct uart_port *up, struct console *con)
+{
+ unsigned long flags;
+
+ __uart_port_lock_irqsave(up, &flags);
+ up->cons = con;
+ __uart_port_unlock_irqrestore(up, flags);
+}
+
+/* Only for internal port lock wrapper usage. */
+static inline void __uart_port_nbcon_acquire(struct uart_port *up)
+{
+ lockdep_assert_held_once(&up->lock);
+
+ if (likely(!uart_console(up)))
+ return;
+
+ if (up->cons->nbcon_drvdata) {
+ /*
+ * If @up->cons is registered, prevent it from fully
+ * unregistering until this context releases the nbcon.
+ */
+ int cookie = console_srcu_read_lock();
+
+ /* Ensure console is registered and is an nbcon console. */
+ if (!hlist_unhashed_lockless(&up->cons->node) &&
+ (console_srcu_read_flags(up->cons) & CON_NBCON)) {
+ WARN_ON_ONCE(up->cons->nbcon_drvdata->locked);
+
+ nbcon_driver_acquire(up->cons);
+
+ /*
+ * Record @up->line to be used during release because
+ * @up->cons->index can change while the port and
+ * nbcon are locked.
+ */
+ up->cons->nbcon_drvdata->owner_index = up->line;
+ up->cons->nbcon_drvdata->srcu_cookie = cookie;
+ up->cons->nbcon_drvdata->locked = true;
+ } else {
+ console_srcu_read_unlock(cookie);
+ }
+ }
+}
+
+/* Only for internal port lock wrapper usage. */
+static inline void __uart_port_nbcon_release(struct uart_port *up)
+{
+ lockdep_assert_held_once(&up->lock);
+
+ /*
+ * uart_console() cannot be used here because @up->cons->index might
+ * have changed. Check against @up->cons->nbcon_drvdata->owner_index
+ * instead.
+ */
+
+ if (unlikely(up->cons &&
+ up->cons->nbcon_drvdata &&
+ up->cons->nbcon_drvdata->locked &&
+ up->cons->nbcon_drvdata->owner_index == up->line)) {
+ WARN_ON_ONCE(!up->cons->nbcon_drvdata->locked);
+
+ up->cons->nbcon_drvdata->locked = false;
+ nbcon_driver_release(up->cons);
+ console_srcu_read_unlock(up->cons->nbcon_drvdata->srcu_cookie);
+ }
+}
+
/**
* uart_port_lock - Lock the UART port
* @up: Pointer to UART port structure
@@ -613,6 +693,7 @@ static inline void __uart_port_unlock_irqrestore(struct uart_port *up, unsigned
static inline void uart_port_lock(struct uart_port *up)
{
spin_lock(&up->lock);
+ __uart_port_nbcon_acquire(up);
}

/**
@@ -622,6 +703,7 @@ static inline void uart_port_lock(struct uart_port *up)
static inline void uart_port_lock_irq(struct uart_port *up)
{
spin_lock_irq(&up->lock);
+ __uart_port_nbcon_acquire(up);
}

/**
@@ -632,6 +714,7 @@ static inline void uart_port_lock_irq(struct uart_port *up)
static inline void uart_port_lock_irqsave(struct uart_port *up, unsigned long *flags)
{
spin_lock_irqsave(&up->lock, *flags);
+ __uart_port_nbcon_acquire(up);
}

/**
@@ -642,7 +725,11 @@ static inline void uart_port_lock_irqsave(struct uart_port *up, unsigned long *f
*/
static inline bool uart_port_trylock(struct uart_port *up)
{
- return spin_trylock(&up->lock);
+ if (!spin_trylock(&up->lock))
+ return false;
+
+ __uart_port_nbcon_acquire(up);
+ return true;
}

/**
@@ -654,7 +741,11 @@ static inline bool uart_port_trylock(struct uart_port *up)
*/
static inline bool uart_port_trylock_irqsave(struct uart_port *up, unsigned long *flags)
{
- return spin_trylock_irqsave(&up->lock, *flags);
+ if (!spin_trylock_irqsave(&up->lock, *flags))
+ return false;
+
+ __uart_port_nbcon_acquire(up);
+ return true;
}

/**
@@ -663,6 +754,7 @@ static inline bool uart_port_trylock_irqsave(struct uart_port *up, unsigned long
*/
static inline void uart_port_unlock(struct uart_port *up)
{
+ __uart_port_nbcon_release(up);
spin_unlock(&up->lock);
}

@@ -672,6 +764,7 @@ static inline void uart_port_unlock(struct uart_port *up)
*/
static inline void uart_port_unlock_irq(struct uart_port *up)
{
+ __uart_port_nbcon_release(up);
spin_unlock_irq(&up->lock);
}

@@ -682,6 +775,7 @@ static inline void uart_port_unlock_irq(struct uart_port *up)
*/
static inline void uart_port_unlock_irqrestore(struct uart_port *up, unsigned long flags)
{
+ __uart_port_nbcon_release(up);
spin_unlock_irqrestore(&up->lock, flags);
}

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 2516449f921d..38328cf0fd5c 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -3,9 +3,12 @@
// Copyright (C) 2022 Intel, Thomas Gleixner

#include <linux/kernel.h>
+#include <linux/bug.h>
#include <linux/console.h>
#include <linux/delay.h>
+#include <linux/export.h>
#include <linux/slab.h>
+#include <linux/string.h>
#include "internal.h"
/*
* Printk console printing implementation for consoles which does not depend
@@ -988,3 +991,52 @@ void nbcon_free(struct console *con)

con->pbufs = NULL;
}
+
+/**
+ * nbcon_driver_acquire - Acquire nbcon console and enter unsafe section
+ * @con: The nbcon console to acquire
+ *
+ * Context: Any context which could not be migrated to another CPU.
+ *
+ * Console drivers will usually use their own internal synchronization
+ * mechanism to synchronize between console printing and non-printing
+ * activities (such as setting baud rates). However, nbcon console drivers
+ * supporting atomic consoles may also want to mark unsafe sections when
+ * performing non-printing activities.
+ *
+ * This function acquires the nbcon console using priority NBCON_PRIO_NORMAL
+ * and marks it unsafe for handover/takeover.
+ *
+ * Console drivers using this function must have provided @nbcon_drvdata in
+ * their struct console, which is used to track ownership and state
+ * information.
+ */
+void nbcon_driver_acquire(struct console *con)
+{
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(con->nbcon_drvdata, ctxt);
+
+ cant_migrate();
+
+ do {
+ do {
+ memset(ctxt, 0, sizeof(*ctxt));
+ ctxt->console = con;
+ ctxt->prio = NBCON_PRIO_NORMAL;
+ } while (!nbcon_context_try_acquire(ctxt));
+
+ } while (!nbcon_context_enter_unsafe(ctxt));
+}
+EXPORT_SYMBOL_GPL(nbcon_driver_acquire);
+
+/**
+ * nbcon_driver_release - Exit unsafe section and release the nbcon console
+ * @con: The nbcon console acquired in nbcon_driver_acquire()
+ */
+void nbcon_driver_release(struct console *con)
+{
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(con->nbcon_drvdata, ctxt);
+
+ if (nbcon_context_exit_unsafe(ctxt))
+ nbcon_context_release(ctxt);
+}
+EXPORT_SYMBOL_GPL(nbcon_driver_release);
--
2.39.2


2024-04-02 22:14:23

by John Ogness

Subject: [PATCH printk v4 14/27] printk: Add @flags argument for console_is_usable()

The caller of console_is_usable() usually needs @console->flags
for its own checks. Rather than having console_is_usable() read
its own copy, make the caller pass in the @flags. This also
ensures that the caller and console_is_usable() see the same
@flags value.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
kernel/printk/internal.h | 8 ++------
kernel/printk/printk.c | 5 +++--
2 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index 398e99a30427..b7a0072eb2a4 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -90,13 +90,9 @@ void nbcon_free(struct console *con);
* records. Note that this function does not consider the current context,
* which can also play a role in deciding if @con can be used to print
* records.
- *
- * Requires the console_srcu_read_lock.
*/
-static inline bool console_is_usable(struct console *con)
+static inline bool console_is_usable(struct console *con, short flags)
{
- short flags = console_srcu_read_flags(con);
-
if (!(flags & CON_ENABLED))
return false;

@@ -143,7 +139,7 @@ static inline bool nbcon_alloc(struct console *con) { return false; }
static inline void nbcon_init(struct console *con) { }
static inline void nbcon_free(struct console *con) { }

-static inline bool console_is_usable(struct console *con) { return false; }
+static inline bool console_is_usable(struct console *con, short flags) { return false; }

#endif /* CONFIG_PRINTK */

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9ea51cb2aca9..fe06856f7653 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2942,9 +2942,10 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove

cookie = console_srcu_read_lock();
for_each_console_srcu(con) {
+ short flags = console_srcu_read_flags(con);
bool progress;

- if (!console_is_usable(con))
+ if (!console_is_usable(con, flags))
continue;
any_usable = true;

@@ -3814,7 +3815,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
* that they make forward progress, so only increment
* @diff for usable consoles.
*/
- if (!console_is_usable(c))
+ if (!console_is_usable(c, flags))
continue;

if (flags & CON_NBCON) {
--
2.39.2


2024-04-02 22:15:05

by John Ogness

Subject: [PATCH printk v4 18/27] printk: nbcon: Assign priority based on CPU state

Use the current state of the CPU to determine which priority to
assign to the printing context.

The EMERGENCY priority handling is added in a follow-up commit.
It will use a per-CPU variable.

Note: nbcon_driver_acquire(), which is used by console drivers
to acquire the nbcon console for non-printing activities,
will always use NORMAL priority.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
kernel/printk/internal.h | 2 ++
kernel/printk/nbcon.c | 20 ++++++++++++++++++--
2 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index acf53c35b7a0..bcf2105a5c5c 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -86,6 +86,7 @@ void nbcon_seq_force(struct console *con, u64 seq);
bool nbcon_alloc(struct console *con);
void nbcon_init(struct console *con);
void nbcon_free(struct console *con);
+enum nbcon_prio nbcon_get_default_prio(void);
void nbcon_atomic_flush_pending(void);
bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
int cookie);
@@ -143,6 +144,7 @@ static inline void nbcon_seq_force(struct console *con, u64 seq) { }
static inline bool nbcon_alloc(struct console *con) { return false; }
static inline void nbcon_init(struct console *con) { }
static inline void nbcon_free(struct console *con) { }
+static inline enum nbcon_prio nbcon_get_default_prio(void) { return NBCON_PRIO_NONE; }
static inline void nbcon_atomic_flush_pending(void) { }
static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
int cookie) { return false; }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 599fff3c0ab3..fe5a96ab1f40 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -967,6 +967,22 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
return ctxt->backlog;
}

+/**
+ * nbcon_get_default_prio - The appropriate nbcon priority to use for nbcon
+ * printing on the current CPU
+ *
+ * Context: Any context which could not be migrated to another CPU.
+ * Return: The nbcon_prio to use for acquiring an nbcon console in this
+ * context for printing.
+ */
+enum nbcon_prio nbcon_get_default_prio(void)
+{
+ if (this_cpu_in_panic())
+ return NBCON_PRIO_PANIC;
+
+ return NBCON_PRIO_NORMAL;
+}
+
/**
* nbcon_legacy_emit_next_record - Print one record for an nbcon console
* in legacy contexts
@@ -1001,7 +1017,7 @@ bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
stop_critical_timings();

ctxt->console = con;
- ctxt->prio = NBCON_PRIO_NORMAL;
+ ctxt->prio = nbcon_get_default_prio();

progress = nbcon_atomic_emit_one(&wctxt);

@@ -1032,7 +1048,7 @@ static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)

ctxt->console = con;
ctxt->spinwait_max_us = 2000;
- ctxt->prio = NBCON_PRIO_NORMAL;
+ ctxt->prio = nbcon_get_default_prio();

if (!nbcon_context_try_acquire(ctxt))
return false;
--
2.39.2


2024-04-02 22:15:04

by John Ogness

Subject: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

Allow nbcon consoles to print messages in the legacy printk()
caller context (printing via unlock) by integrating them into
console_flush_all(). The write_atomic() callback is used for
printing.

Provide nbcon_legacy_emit_next_record(), which acts as the
nbcon variant of console_emit_next_record(). Call this variant
within console_flush_all() for nbcon consoles. Since nbcon
consoles use their own @nbcon_seq variable to track the next
record to print, this also must be appropriately handled.

Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/internal.h | 6 ++++
kernel/printk/nbcon.c | 77 ++++++++++++++++++++++++++++++++++++++++
kernel/printk/printk.c | 17 ++++++---
3 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index a8df764fd0c5..acf53c35b7a0 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -78,6 +78,8 @@ void defer_console_output(void);

u16 printk_parse_prefix(const char *text, int *level,
enum printk_info_flags *flags);
+void console_lock_spinning_enable(void);
+int console_lock_spinning_disable_and_check(int cookie);

u64 nbcon_seq_read(struct console *con);
void nbcon_seq_force(struct console *con, u64 seq);
@@ -85,6 +87,8 @@ bool nbcon_alloc(struct console *con);
void nbcon_init(struct console *con);
void nbcon_free(struct console *con);
void nbcon_atomic_flush_pending(void);
+bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
+ int cookie);

/*
* Check if the given console is currently capable and allowed to print
@@ -140,6 +144,8 @@ static inline bool nbcon_alloc(struct console *con) { return false; }
static inline void nbcon_init(struct console *con) { }
static inline void nbcon_free(struct console *con) { }
static inline void nbcon_atomic_flush_pending(void) { }
+static inline bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
+ int cookie) { return false; }

static inline bool console_is_usable(struct console *con, short flags) { return false; }

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index fcdab2eaaedb..599fff3c0ab3 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -541,6 +541,7 @@ static struct printk_buffers panic_nbcon_pbufs;
* nbcon_context_try_acquire - Try to acquire nbcon console
* @ctxt: The context of the caller
*
+ * Context: Any context which could not be migrated to another CPU.
* Return: True if the console was acquired. False otherwise.
*
* If the caller allowed an unsafe hostile takeover, on success the
@@ -935,6 +936,82 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
return nbcon_context_exit_unsafe(ctxt);
}

+/**
+ * nbcon_atomic_emit_one - Print one record for an nbcon console using the
+ * write_atomic() callback
+ * @wctxt: An initialized write context struct to use for this context
+ *
+ * Return: False if it is known there are no more records to print,
+ * otherwise true.
+ *
+ * This is an internal helper to handle the locking of the console before
+ * calling nbcon_emit_next_record().
+ */
+static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
+{
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
+
+ if (!nbcon_context_try_acquire(ctxt))
+ return true;
+
+ /*
+ * nbcon_emit_next_record() returns false when the console was
+ * handed over or taken over. In both cases the context is no
+ * longer valid.
+ */
+ if (!nbcon_emit_next_record(wctxt))
+ return true;
+
+ nbcon_context_release(ctxt);
+
+ return ctxt->backlog;
+}
+
+/**
+ * nbcon_legacy_emit_next_record - Print one record for an nbcon console
+ * in legacy contexts
+ * @con: The console to print on
+ * @handover: Will be set to true if a printk waiter has taken over the
+ * console_lock, in which case the caller is no longer holding
+ * both the console_lock and the SRCU read lock. Otherwise it
+ * is set to false.
+ * @cookie: The cookie from the SRCU read lock.
+ *
+ * Context: Any context except NMI.
+ * Return: False if the given console has no next record to print,
+ * otherwise true.
+ *
+ * This function is meant to be called by console_flush_all() to print records
+ * on nbcon consoles from legacy context (printing via console unlocking).
+ * Essentially it is the nbcon version of console_emit_next_record().
+ */
+bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
+ int cookie)
+{
+ struct nbcon_write_context wctxt = { };
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+ unsigned long flags;
+ bool progress;
+
+ *handover = false;
+
+ /* Use the same procedure as console_emit_next_record(). */
+ printk_safe_enter_irqsave(flags);
+ console_lock_spinning_enable();
+ stop_critical_timings();
+
+ ctxt->console = con;
+ ctxt->prio = NBCON_PRIO_NORMAL;
+
+ progress = nbcon_atomic_emit_one(&wctxt);
+
+ start_critical_timings();
+ *handover = console_lock_spinning_disable_and_check(cookie);
+ printk_safe_exit_irqrestore(flags);
+
+ return progress;
+}
+
/**
* __nbcon_atomic_flush_pending_con - Flush specified nbcon console using its
* write_atomic() callback
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index a1b3309e12c1..df84c6bfbb2d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1860,7 +1860,7 @@ static bool console_waiter;
* there may be a waiter spinning (like a spinlock). Also it must be
* ready to hand over the lock at the end of the section.
*/
-static void console_lock_spinning_enable(void)
+void console_lock_spinning_enable(void)
{
/*
* Do not use spinning in panic(). The panic CPU wants to keep the lock.
@@ -1899,7 +1899,7 @@ static void console_lock_spinning_enable(void)
*
* Return: 1 if the lock rights were passed, 0 otherwise.
*/
-static int console_lock_spinning_disable_and_check(int cookie)
+int console_lock_spinning_disable_and_check(int cookie)
{
int waiter;

@@ -2951,13 +2951,20 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
cookie = console_srcu_read_lock();
for_each_console_srcu(con) {
short flags = console_srcu_read_flags(con);
+ u64 printk_seq;
bool progress;

if (!console_is_usable(con, flags))
continue;
any_usable = true;

- progress = console_emit_next_record(con, handover, cookie);
+ if (flags & CON_NBCON) {
+ progress = nbcon_legacy_emit_next_record(con, handover, cookie);
+ printk_seq = nbcon_seq_read(con);
+ } else {
+ progress = console_emit_next_record(con, handover, cookie);
+ printk_seq = con->seq;
+ }

/*
* If a handover has occurred, the SRCU read lock
@@ -2967,8 +2974,8 @@ static bool console_flush_all(bool do_cond_resched, u64 *next_seq, bool *handove
return false;

/* Track the next of the highest seq flushed. */
- if (con->seq > *next_seq)
- *next_seq = con->seq;
+ if (printk_seq > *next_seq)
+ *next_seq = printk_seq;

if (!progress)
continue;
--
2.39.2


2024-04-02 22:15:09

by John Ogness

Subject: [PATCH printk v4 16/27] printk: Track registered boot consoles

Unfortunately it is not known if a boot console and a regular
(legacy or nbcon) console use the same hardware. For this reason
they must not be allowed to print simultaneously.

For legacy consoles this is not an issue because they are
already synchronized with the boot consoles using the console
lock. However, nbcon consoles can be triggered separately.

Add a global flag @have_boot_console to identify if any boot
consoles are registered. This will be used in follow-up commits
to ensure that boot consoles and nbcon consoles cannot print
simultaneously.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
kernel/printk/printk.c | 24 ++++++++++++++++++++++++
1 file changed, 24 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 6404f2044ceb..a1b3309e12c1 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -463,6 +463,14 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
/* syslog_lock protects syslog_* variables and write access to clear_seq. */
static DEFINE_MUTEX(syslog_lock);

+/*
+ * Specifies if a boot console is registered. If boot consoles are present,
+ * nbcon consoles cannot print simultaneously and must be synchronized by
+ * the console lock. This is because boot consoles and nbcon consoles may
+ * have mapped the same hardware.
+ */
+static bool have_boot_console;
+
#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* All 3 protected by @syslog_lock. */
@@ -3513,6 +3521,9 @@ void register_console(struct console *newcon)
newcon->seq = 0;
}

+ if (newcon->flags & CON_BOOT)
+ have_boot_console = true;
+
/*
* If another context is actively using the hardware of this new
* console, it will not be aware of the nbcon synchronization. This
@@ -3582,6 +3593,8 @@ EXPORT_SYMBOL(register_console);
/* Must be called under console_list_lock(). */
static int unregister_console_locked(struct console *console)
{
+ bool found_boot_con = false;
+ struct console *c;
int res;

lockdep_assert_console_list_lock_held();
@@ -3629,6 +3642,17 @@ static int unregister_console_locked(struct console *console)
if (console->exit)
res = console->exit(console);

+ /*
+ * With this console gone, the global flags tracking registered
+ * console types may have changed. Update them.
+ */
+ for_each_console(c) {
+ if (c->flags & CON_BOOT)
+ found_boot_con = true;
+ }
+ if (!found_boot_con)
+ have_boot_console = found_boot_con;
+
return res;
}

--
2.39.2


2024-04-02 22:15:24

by John Ogness

Subject: [PATCH printk v4 19/27] printk: nbcon: Add unsafe flushing on panic

Add nbcon_atomic_flush_unsafe() to flush all nbcon consoles
using the write_atomic() callback and allowing unsafe hostile
takeovers. Call this at the end of panic() as a final attempt
to flush any pending messages.

Note that legacy consoles use unsafe methods for flushing
from the beginning of panic (see bust_spinlocks()). Therefore,
systems using both legacy and nbcon consoles may still fail to
see panic messages due to unsafe legacy console usage.

Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 5 +++++
kernel/panic.c | 1 +
kernel/printk/nbcon.c | 26 +++++++++++++++++++++-----
3 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 0ad3ee752635..866683a293af 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -197,6 +197,7 @@ extern asmlinkage void dump_stack(void) __cold;
void printk_trigger_flush(void);
extern void nbcon_driver_acquire(struct console *con);
extern void nbcon_driver_release(struct console *con);
+void nbcon_atomic_flush_unsafe(void);
#else
static inline __printf(1, 0)
int vprintk(const char *s, va_list args)
@@ -285,6 +286,10 @@ static inline void nbcon_driver_release(struct console *con)
{
}

+static inline void nbcon_atomic_flush_unsafe(void)
+{
+}
+
#endif

bool this_cpu_in_panic(void);
diff --git a/kernel/panic.c b/kernel/panic.c
index f22d8f33ea14..c039f8e1ddae 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -453,6 +453,7 @@ void panic(const char *fmt, ...)
* Explicitly flush the kernel log buffer one last time.
*/
console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+ nbcon_atomic_flush_unsafe();

local_irq_enable();
for (i = 0; ; i += PANIC_TIMER_STEP) {
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index fe5a96ab1f40..47f39402a22b 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1033,6 +1033,7 @@ bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
* write_atomic() callback
* @con: The nbcon console to flush
* @stop_seq: Flush up until this record
+ * @allow_unsafe_takeover: True, to allow unsafe hostile takeovers
*
* Return: True if taken over while printing. Otherwise false.
*
@@ -1041,7 +1042,8 @@ bool nbcon_legacy_emit_next_record(struct console *con, bool *handover,
* there are no more records available to read or this context is not allowed
* to acquire the console.
*/
-static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
+static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq,
+ bool allow_unsafe_takeover)
{
struct nbcon_write_context wctxt = { };
struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
@@ -1049,6 +1051,7 @@ static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
ctxt->console = con;
ctxt->spinwait_max_us = 2000;
ctxt->prio = nbcon_get_default_prio();
+ ctxt->allow_unsafe_takeover = allow_unsafe_takeover;

if (!nbcon_context_try_acquire(ctxt))
return false;
@@ -1075,8 +1078,9 @@ static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
* __nbcon_atomic_flush_pending - Flush all nbcon consoles using their
* write_atomic() callback
* @stop_seq: Flush up until this record
+ * @allow_unsafe_takeover: True, to allow unsafe hostile takeovers
*/
-static void __nbcon_atomic_flush_pending(u64 stop_seq)
+static void __nbcon_atomic_flush_pending(u64 stop_seq, bool allow_unsafe_takeover)
{
struct console *con;
bool should_retry;
@@ -1109,8 +1113,8 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq)
*/
local_irq_save(irq_flags);

- should_retry |= __nbcon_atomic_flush_pending_con(con, stop_seq);
-
+ should_retry |= __nbcon_atomic_flush_pending_con(con, stop_seq,
+ allow_unsafe_takeover);
local_irq_restore(irq_flags);
}
console_srcu_read_unlock(cookie);
@@ -1127,7 +1131,19 @@ static void __nbcon_atomic_flush_pending(u64 stop_seq)
*/
void nbcon_atomic_flush_pending(void)
{
- __nbcon_atomic_flush_pending(prb_next_reserve_seq(prb));
+ __nbcon_atomic_flush_pending(prb_next_reserve_seq(prb), false);
+}
+
+/**
+ * nbcon_atomic_flush_unsafe - Flush all nbcon consoles using their
+ * write_atomic() callback and allowing unsafe hostile takeovers
+ *
+ * Flush the backlog up through the currently newest record. Unsafe hostile
+ * takeovers will be performed, if necessary.
+ */
+void nbcon_atomic_flush_unsafe(void)
+{
+ __nbcon_atomic_flush_pending(prb_next_reserve_seq(prb), true);
}

/**
--
2.39.2


2024-04-02 22:15:42

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 21/27] printk: Track nbcon consoles

Add a global flag @have_nbcon_console to identify if any nbcon
consoles are registered. This will be used in follow-up commits
to preserve legacy behavior when no nbcon consoles are registered.

Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 4ff3800e8e8e..329b8507f8a0 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -470,6 +470,11 @@ static DEFINE_MUTEX(syslog_lock);
*/
static bool have_legacy_console;

+/*
+ * Specifies if an nbcon console is registered.
+ */
+static bool have_nbcon_console;
+
/*
* Specifies if a boot console is registered. If boot consoles are present,
* nbcon consoles cannot print simultaneously and must be synchronized by
@@ -3533,6 +3538,7 @@ void register_console(struct console *newcon)
console_init_seq(newcon, bootcon_registered);

if (newcon->flags & CON_NBCON) {
+ have_nbcon_console = true;
nbcon_init(newcon);

/*
@@ -3619,6 +3625,7 @@ EXPORT_SYMBOL(register_console);
static int unregister_console_locked(struct console *console)
{
bool found_legacy_con = false;
+ bool found_nbcon_con = false;
bool found_boot_con = false;
struct console *c;
int res;
@@ -3675,13 +3682,18 @@ static int unregister_console_locked(struct console *console)
for_each_console(c) {
if (c->flags & CON_BOOT)
found_boot_con = true;
- if (!(c->flags & CON_NBCON))
+
+ if (c->flags & CON_NBCON)
+ found_nbcon_con = true;
+ else
found_legacy_con = true;
}
if (!found_boot_con)
have_boot_console = found_boot_con;
if (!found_legacy_con)
have_legacy_console = found_legacy_con;
+ if (!found_nbcon_con)
+ have_nbcon_console = found_nbcon_con;

return res;
}
--
2.39.2


2024-04-02 22:15:43

by John Ogness

Subject: [PATCH printk v4 20/27] printk: Avoid console_lock dance if no legacy or boot consoles

Currently the console lock is used to attempt legacy-type
printing even if there are no legacy or boot consoles registered.
If no such consoles are registered, the console lock does not
need to be taken.

Add tracking of legacy console registration and combine it
with the existing boot console tracking to avoid unnecessary
code paths, i.e. do not take the console lock if there are
no boot consoles and no legacy consoles.

Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 77 +++++++++++++++++++++++++++++++++++-------
1 file changed, 64 insertions(+), 13 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index df84c6bfbb2d..4ff3800e8e8e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -463,6 +463,13 @@ static int console_msg_format = MSG_FORMAT_DEFAULT;
/* syslog_lock protects syslog_* variables and write access to clear_seq. */
static DEFINE_MUTEX(syslog_lock);

+/*
+ * Specifies if a legacy console is registered. If legacy consoles are
+ * present, it is necessary to perform the console lock/unlock dance
+ * whenever console flushing should occur.
+ */
+static bool have_legacy_console;
+
/*
* Specifies if a boot console is registered. If boot consoles are present,
* nbcon consoles cannot print simultaneously and must be synchronized by
@@ -471,6 +478,14 @@ static DEFINE_MUTEX(syslog_lock);
*/
static bool have_boot_console;

+/*
+ * Specifies if the console lock/unlock dance is needed for console
+ * printing. If @have_boot_console is true, the nbcon consoles will
+ * be printed serially along with the legacy consoles because nbcon
+ * consoles cannot print simultaneously with boot consoles.
+ */
+#define printing_via_unlock (have_legacy_console || have_boot_console)
+
#ifdef CONFIG_PRINTK
DECLARE_WAIT_QUEUE_HEAD(log_wait);
/* All 3 protected by @syslog_lock. */
@@ -2339,7 +2354,7 @@ asmlinkage int vprintk_emit(int facility, int level,
printed_len = vprintk_store(facility, level, dev_info, fmt, args);

/* If called from the scheduler, we can not call up(). */
- if (!in_sched) {
+ if (!in_sched && printing_via_unlock) {
/*
* The caller may be holding system-critical or
* timing-sensitive locks. Disable preemption during
@@ -2648,7 +2663,7 @@ void resume_console(void)
*/
static int console_cpu_notify(unsigned int cpu)
{
- if (!cpuhp_tasks_frozen) {
+ if (!cpuhp_tasks_frozen && printing_via_unlock) {
/* If trylock fails, someone else is doing the printing */
if (console_trylock())
console_unlock();
@@ -3189,7 +3204,8 @@ void console_flush_on_panic(enum con_flush_mode mode)

nbcon_atomic_flush_pending();

- console_flush_all(false, &next_seq, &handover);
+ if (printing_via_unlock)
+ console_flush_all(false, &next_seq, &handover);
}

/*
@@ -3526,6 +3542,8 @@ void register_console(struct console *newcon)
*/
nbcon_seq_force(newcon, newcon->seq);
newcon->seq = 0;
+ } else {
+ have_legacy_console = true;
}

if (newcon->flags & CON_BOOT)
@@ -3600,6 +3618,7 @@ EXPORT_SYMBOL(register_console);
/* Must be called under console_list_lock(). */
static int unregister_console_locked(struct console *console)
{
+ bool found_legacy_con = false;
bool found_boot_con = false;
struct console *c;
int res;
@@ -3656,9 +3675,13 @@ static int unregister_console_locked(struct console *console)
for_each_console(c) {
if (c->flags & CON_BOOT)
found_boot_con = true;
+ if (!(c->flags & CON_NBCON))
+ found_legacy_con = true;
}
if (!found_boot_con)
have_boot_console = found_boot_con;
+ if (!found_legacy_con)
+ have_legacy_console = found_legacy_con;

return res;
}
@@ -3810,6 +3833,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
u64 last_diff = 0;
u64 printk_seq;
short flags;
+ bool locked;
int cookie;
u64 diff;
u64 seq;
@@ -3819,22 +3843,35 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
seq = prb_next_reserve_seq(prb);

/* Flush the consoles so that records up to @seq are printed. */
- console_lock();
- console_unlock();
+ if (printing_via_unlock) {
+ console_lock();
+ console_unlock();
+ }

for (;;) {
unsigned long begin_jiffies;
unsigned long slept_jiffies;

+ locked = false;
diff = 0;

+ if (printing_via_unlock) {
+ /*
+ * Hold the console_lock to guarantee safe access to
+ * console->seq. Releasing console_lock flushes more
+ * records in case @seq is still not printed on all
+ * usable consoles.
+ */
+ console_lock();
+ locked = true;
+ }
+
/*
- * Hold the console_lock to guarantee safe access to
- * console->seq. Releasing console_lock flushes more
- * records in case @seq is still not printed on all
- * usable consoles.
+ * Ensure the compiler does not optimize @locked to be
+ * @printing_via_unlock since the latter can change at any
+ * time.
*/
- console_lock();
+ barrier();

cookie = console_srcu_read_lock();
for_each_console_srcu(c) {
@@ -3854,6 +3891,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
if (flags & CON_NBCON) {
printk_seq = nbcon_seq_read(c);
} else {
+ WARN_ON_ONCE(!locked);
printk_seq = c->seq;
}

@@ -3865,7 +3903,8 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
if (diff != last_diff && reset_on_progress)
remaining_jiffies = timeout_jiffies;

- console_unlock();
+ if (locked)
+ console_unlock();

/* Note: @diff is 0 if there are no usable consoles. */
if (diff == 0 || remaining_jiffies == 0)
@@ -3935,6 +3974,7 @@ static void __wake_up_klogd(int val)
return;

preempt_disable();
+
/*
* Guarantee any new records can be seen by tasks preparing to wait
* before this context checks if the wait queue is empty.
@@ -3946,11 +3986,22 @@ static void __wake_up_klogd(int val)
*
* This pairs with devkmsg_read:A and syslog_print:A.
*/
- if (wq_has_sleeper(&log_wait) || /* LMM(__wake_up_klogd:A) */
- (val & PRINTK_PENDING_OUTPUT)) {
+ if (!wq_has_sleeper(&log_wait)) /* LMM(__wake_up_klogd:A) */
+ val &= ~PRINTK_PENDING_WAKEUP;
+
+ /*
+ * Simple read is safe. register_console() would flush a newly
+ * registered legacy console when writing the message about it
+ * being enabled.
+ */
+ if (!printing_via_unlock)
+ val &= ~PRINTK_PENDING_OUTPUT;
+
+ if (val) {
this_cpu_or(printk_pending, val);
irq_work_queue(this_cpu_ptr(&wake_up_klogd_work));
}
+
preempt_enable();
}

--
2.39.2


2024-04-02 22:16:08

by John Ogness

Subject: [PATCH printk v4 22/27] printk: Coordinate direct printing in panic

Perform printing by nbcon consoles on the panic CPU from the
printk() caller context in order to get panic messages printed
as soon as possible.

If legacy and nbcon consoles are registered, the legacy consoles
will no longer perform direct printing on the panic CPU until
after the backtrace has been stored. This will give the safe
nbcon consoles a chance to print the panic messages before
allowing the unsafe legacy consoles to print.

If no nbcon consoles are registered, there is no change in
behavior (i.e. legacy consoles will always attempt to print
from the printk() caller context).

Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 5 ++++
kernel/panic.c | 2 ++
kernel/printk/printk.c | 62 ++++++++++++++++++++++++++++++++++++------
3 files changed, 61 insertions(+), 8 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 866683a293af..2aa7c302d616 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -195,6 +195,7 @@ void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
extern asmlinkage void dump_stack(void) __cold;
void printk_trigger_flush(void);
+void printk_legacy_allow_panic_sync(void);
extern void nbcon_driver_acquire(struct console *con);
extern void nbcon_driver_release(struct console *con);
void nbcon_atomic_flush_unsafe(void);
@@ -278,6 +279,10 @@ static inline void printk_trigger_flush(void)
{
}

+static inline void printk_legacy_allow_panic_sync(void)
+{
+}
+
static inline void nbcon_driver_acquire(struct console *con)
{
}
diff --git a/kernel/panic.c b/kernel/panic.c
index c039f8e1ddae..de8115c829cf 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -364,6 +364,8 @@ void panic(const char *fmt, ...)

panic_other_cpus_shutdown(_crash_kexec_post_notifiers);

+ printk_legacy_allow_panic_sync();
+
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 329b8507f8a0..e3fd157d7ba3 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -471,7 +471,9 @@ static DEFINE_MUTEX(syslog_lock);
static bool have_legacy_console;

/*
- * Specifies if an nbcon console is registered.
+ * Specifies if an nbcon console is registered. If nbcon consoles are present,
+ * synchronous printing of legacy consoles will not occur during panic until
+ * the backtrace has been stored to the ringbuffer.
*/
static bool have_nbcon_console;

@@ -2330,12 +2332,29 @@ int vprintk_store(int facility, int level,
return ret;
}

+static bool legacy_allow_panic_sync;
+
+/*
+ * This acts as a one-way switch to allow legacy consoles to print from
+ * the printk() caller context on a panic CPU. It also attempts to flush
+ * the legacy consoles in this context.
+ */
+void printk_legacy_allow_panic_sync(void)
+{
+ legacy_allow_panic_sync = true;
+
+ if (printing_via_unlock && !in_nmi()) {
+ if (console_trylock())
+ console_unlock();
+ }
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const struct dev_printk_info *dev_info,
const char *fmt, va_list args)
{
+ bool do_trylock_unlock = printing_via_unlock;
int printed_len;
- bool in_sched = false;

/* Suppress unimportant messages after panic happens */
if (unlikely(suppress_printk))
@@ -2351,15 +2370,42 @@ asmlinkage int vprintk_emit(int facility, int level,

if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
- in_sched = true;
+ /* If called from the scheduler, we can not call up(). */
+ do_trylock_unlock = false;
}

printk_delay(level);

printed_len = vprintk_store(facility, level, dev_info, fmt, args);

- /* If called from the scheduler, we can not call up(). */
- if (!in_sched && printing_via_unlock) {
+ if (have_nbcon_console && !have_boot_console) {
+ bool is_panic_context = this_cpu_in_panic();
+
+ /*
+ * In panic, the legacy consoles are not allowed to print from
+ * the printk calling context unless explicitly allowed. This
+ * gives the safe nbcon consoles a chance to print out all the
+ * panic messages first. This restriction only applies if
+ * there are nbcon consoles registered.
+ */
+ if (is_panic_context)
+ do_trylock_unlock &= legacy_allow_panic_sync;
+
+ /*
+ * There are situations where nbcon atomic printing should
+ * happen in the printk() caller context:
+ *
+ * - When this CPU is in panic.
+ *
+ * Note that if boot consoles are registered, the console
+ * lock/unlock dance must be relied upon instead because nbcon
+ * consoles cannot print simultaneously with boot consoles.
+ */
+ if (is_panic_context)
+ nbcon_atomic_flush_pending();
+ }
+
+ if (do_trylock_unlock) {
/*
* The caller may be holding system-critical or
* timing-sensitive locks. Disable preemption during
@@ -2379,10 +2425,10 @@ asmlinkage int vprintk_emit(int facility, int level,
preempt_enable();
}

- if (in_sched)
- defer_console_output();
- else
+ if (do_trylock_unlock)
wake_up_klogd();
+ else
+ defer_console_output();

return printed_len;
}
--
2.39.2


2024-04-02 22:16:19

by John Ogness

Subject: [PATCH printk v4 24/27] panic: Mark emergency section in warn

From: Thomas Gleixner <[email protected]>

Mark the full contents of __warn() as an emergency section. In
this section, the CPU will not perform console output for the
printk() calls. Instead, a flushing of the console output is
triggered when exiting the emergency section.

Co-developed-by: John Ogness <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Thomas Gleixner (Intel) <[email protected]>
---
kernel/panic.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index de8115c829cf..ee03193f9495 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -667,6 +667,8 @@ struct warn_args {
void __warn(const char *file, int line, void *caller, unsigned taint,
struct pt_regs *regs, struct warn_args *args)
{
+ nbcon_cpu_emergency_enter();
+
disable_trace_on_warning();

if (file)
@@ -697,6 +699,8 @@ void __warn(const char *file, int line, void *caller, unsigned taint,

/* Just a warning, don't kill lockdep. */
add_taint(taint, LOCKDEP_STILL_OK);
+
+ nbcon_cpu_emergency_exit();
}

#ifdef CONFIG_BUG
--
2.39.2


2024-04-02 22:16:27

by John Ogness

Subject: [PATCH printk v4 23/27] printk: nbcon: Implement emergency sections

From: Thomas Gleixner <[email protected]>

In emergency situations (something has gone wrong but the
system continues to operate), important information (such
as a backtrace) is usually generated via printk(). Each
individual printk record has little meaning on its own. It
is the collection of printk messages that is most often
needed by developers and users.

To help ensure that all printk() messages of an emergency
situation are stored to the ringbuffer as quickly as
possible, disable console output for that CPU while it is
in the emergency situation. The consoles then need to be
flushed when exiting the emergency situation.

Add per-CPU emergency nesting tracking because an emergency
can arise while in an emergency situation.

Add functions to mark the beginning and end of emergency
sections where the urgent messages are generated.

Do not print if the current CPU is in an emergency state.

When exiting all emergency nesting, flush nbcon consoles
directly using their atomic callback. Legacy console flushing
is triggered via irq_work because it is not known whether the
current context is safe for a trylock on the console lock.

Note that the emergency state is not system-wide. While one CPU
is in an emergency state, another CPU may continue to print
console messages.

Co-developed-by: John Ogness <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Thomas Gleixner (Intel) <[email protected]>
---
include/linux/console.h | 4 ++
kernel/printk/nbcon.c | 83 +++++++++++++++++++++++++++++++++++++++++
kernel/printk/printk.c | 13 ++++++-
3 files changed, 98 insertions(+), 2 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 5f1758891aec..7712e4145164 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -559,10 +559,14 @@ static inline bool console_is_registered(const struct console *con)
hlist_for_each_entry(con, &console_list, node)

#ifdef CONFIG_PRINTK
+extern void nbcon_cpu_emergency_enter(void);
+extern void nbcon_cpu_emergency_exit(void);
extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
#else
+static inline void nbcon_cpu_emergency_enter(void) { }
+static inline void nbcon_cpu_emergency_exit(void) { }
static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return false; }
static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 47f39402a22b..4c852c2e8d89 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -936,6 +936,29 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
return nbcon_context_exit_unsafe(ctxt);
}

+/* Track the nbcon emergency nesting per CPU. */
+static DEFINE_PER_CPU(unsigned int, nbcon_pcpu_emergency_nesting);
+static unsigned int early_nbcon_pcpu_emergency_nesting __initdata;
+
+/**
+ * nbcon_get_cpu_emergency_nesting - Get the per CPU emergency nesting pointer
+ *
+ * Return: Either a pointer to the per CPU emergency nesting counter of
+ * the current CPU or to the init data during early boot.
+ */
+static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
+{
+ /*
+ * The value of __printk_percpu_data_ready gets set in normal
+ * context and before SMP initialization. As a result it could
+ * never change while inside an nbcon emergency section.
+ */
+ if (!printk_percpu_data_ready())
+ return &early_nbcon_pcpu_emergency_nesting;
+
+ return this_cpu_ptr(&nbcon_pcpu_emergency_nesting);
+}
+
/**
* nbcon_atomic_emit_one - Print one record for an nbcon console using the
* write_atomic() callback
@@ -977,9 +1000,15 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
*/
enum nbcon_prio nbcon_get_default_prio(void)
{
+ unsigned int *cpu_emergency_nesting;
+
if (this_cpu_in_panic())
return NBCON_PRIO_PANIC;

+ cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
+ if (*cpu_emergency_nesting)
+ return NBCON_PRIO_EMERGENCY;
+
return NBCON_PRIO_NORMAL;
}

@@ -1146,6 +1175,60 @@ void nbcon_atomic_flush_unsafe(void)
__nbcon_atomic_flush_pending(prb_next_reserve_seq(prb), true);
}

+/**
+ * nbcon_cpu_emergency_enter - Enter an emergency section where printk()
+ * messages for that CPU are only stored
+ *
+ * Upon exiting the emergency section, all stored messages are flushed.
+ *
+ * Context: Any context. Disables preemption.
+ *
+ * When within an emergency section, no printing occurs on that CPU. This
+ * is to allow all emergency messages to be dumped into the ringbuffer before
+ * flushing the ringbuffer. The actual printing occurs when exiting the
+ * outermost emergency section.
+ */
+void nbcon_cpu_emergency_enter(void)
+{
+ unsigned int *cpu_emergency_nesting;
+
+ preempt_disable();
+
+ cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
+ (*cpu_emergency_nesting)++;
+}
+
+/**
+ * nbcon_cpu_emergency_exit - Exit an emergency section and flush the
+ * stored messages
+ *
+ * Flushing only occurs when exiting all nesting for the CPU.
+ *
+ * Context: Any context. Enables preemption.
+ */
+void nbcon_cpu_emergency_exit(void)
+{
+ unsigned int *cpu_emergency_nesting;
+ bool do_trigger_flush = false;
+
+ cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
+
+ WARN_ON_ONCE(*cpu_emergency_nesting == 0);
+
+ if (*cpu_emergency_nesting == 1) {
+ nbcon_atomic_flush_pending();
+ do_trigger_flush = true;
+ }
+
+ /* Undo the nesting count of nbcon_cpu_emergency_enter(). */
+ (*cpu_emergency_nesting)--;
+
+ preempt_enable();
+
+ if (do_trigger_flush)
+ printk_trigger_flush();
+}
+
/**
* nbcon_alloc - Allocate buffers needed by the nbcon console
* @con: Console to allocate buffers for
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index e3fd157d7ba3..ab5dade1352d 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2412,16 +2412,25 @@ asmlinkage int vprintk_emit(int facility, int level,
* printing of all remaining records to all consoles so that
* this context can return as soon as possible. Hopefully
* another printk() caller will take over the printing.
+ *
+ * Also, nbcon_get_default_prio() requires migration disabled.
*/
preempt_disable();
+
/*
* Try to acquire and then immediately release the console
* semaphore. The release will print out buffers. With the
* spinning variant, this context tries to take over the
* printing from another printing context.
+ *
+ * Skip it in EMERGENCY priority. The console will be
+ * explicitly flushed when exiting the emergency section.
*/
- if (console_trylock_spinning())
- console_unlock();
+ if (nbcon_get_default_prio() != NBCON_PRIO_EMERGENCY) {
+ if (console_trylock_spinning())
+ console_unlock();
+ }
+
preempt_enable();
}

--
2.39.2


2024-04-02 22:16:41

by John Ogness

Subject: [PATCH printk v4 25/27] panic: Mark emergency section in oops

Mark an emergency section spanning from oops_enter() until the
end of oops_exit(). In this section, the CPU will not perform
console output for the printk() calls. Instead, a flushing of the
console output is triggered when exiting the emergency section.

The very end of oops_exit() performs a kmsg_dump(). This is not
included in the emergency section because it is another
flushing mechanism that should occur after the consoles have
been triggered to flush.

Signed-off-by: John Ogness <[email protected]>
---
kernel/panic.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index ee03193f9495..3754a2471b4f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -634,6 +634,7 @@ bool oops_may_print(void)
*/
void oops_enter(void)
{
+ nbcon_cpu_emergency_enter();
tracing_off();
/* can't trust the integrity of the kernel anymore: */
debug_locks_off();
@@ -656,6 +657,7 @@ void oops_exit(void)
{
do_oops_enter_exit();
print_oops_end_marker();
+ nbcon_cpu_emergency_exit();
kmsg_dump(KMSG_DUMP_OOPS);
}

--
2.39.2


2024-04-02 22:18:59

by John Ogness

Subject: [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats

Mark emergency sections wherever multiple lines of
lock debugging output are generated. In an emergency
section the CPU will not perform console output for the
printk() calls. Instead, a flushing of the console
output is triggered when exiting the emergency section.
This allows the full message block to be stored as
quickly as possible in the ringbuffer.

Signed-off-by: John Ogness <[email protected]>
---
kernel/locking/lockdep.c | 91 ++++++++++++++++++++++++++++++++++++++--
1 file changed, 88 insertions(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 151bd3de5936..80cfbe7b340e 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -56,6 +56,7 @@
#include <linux/kprobes.h>
#include <linux/lockdep.h>
#include <linux/context_tracking.h>
+#include <linux/console.h>

#include <asm/sections.h>

@@ -574,8 +575,10 @@ static struct lock_trace *save_trace(void)
if (!debug_locks_off_graph_unlock())
return NULL;

+ nbcon_cpu_emergency_enter();
print_lockdep_off("BUG: MAX_STACK_TRACE_ENTRIES too low!");
dump_stack();
+ nbcon_cpu_emergency_exit();

return NULL;
}
@@ -782,6 +785,8 @@ static void lockdep_print_held_locks(struct task_struct *p)
{
int i, depth = READ_ONCE(p->lockdep_depth);

+ nbcon_cpu_emergency_enter();
+
if (!depth)
printk("no locks held by %s/%d.\n", p->comm, task_pid_nr(p));
else
@@ -792,11 +797,13 @@ static void lockdep_print_held_locks(struct task_struct *p)
* and it's not the current task.
*/
if (p != current && task_is_running(p))
- return;
+ goto out;
for (i = 0; i < depth; i++) {
printk(" #%d: ", i);
print_lock(p->held_locks + i);
}
+out:
+ nbcon_cpu_emergency_exit();
}

static void print_kernel_ident(void)
@@ -888,11 +895,13 @@ look_up_lock_class(const struct lockdep_map *lock, unsigned int subclass)
if (unlikely(subclass >= MAX_LOCKDEP_SUBCLASSES)) {
instrumentation_begin();
debug_locks_off();
+ nbcon_cpu_emergency_enter();
printk(KERN_ERR
"BUG: looking up invalid subclass: %u\n", subclass);
printk(KERN_ERR
"turning off the locking correctness validator.\n");
dump_stack();
+ nbcon_cpu_emergency_exit();
instrumentation_end();
return NULL;
}
@@ -969,11 +978,13 @@ static bool assign_lock_key(struct lockdep_map *lock)
else {
/* Debug-check: all keys must be persistent! */
debug_locks_off();
+ nbcon_cpu_emergency_enter();
pr_err("INFO: trying to register non-static key.\n");
pr_err("The code is fine but needs lockdep annotation, or maybe\n");
pr_err("you didn't initialize this object before use?\n");
pr_err("turning off the locking correctness validator.\n");
dump_stack();
+ nbcon_cpu_emergency_exit();
return false;
}

@@ -1317,8 +1328,10 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
return NULL;
}

+ nbcon_cpu_emergency_enter();
print_lockdep_off("BUG: MAX_LOCKDEP_KEYS too low!");
dump_stack();
+ nbcon_cpu_emergency_exit();
return NULL;
}
nr_lock_classes++;
@@ -1350,11 +1363,13 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force)
if (verbose(class)) {
graph_unlock();

+ nbcon_cpu_emergency_enter();
printk("\nnew class %px: %s", class->key, class->name);
if (class->name_version > 1)
printk(KERN_CONT "#%d", class->name_version);
printk(KERN_CONT "\n");
dump_stack();
+ nbcon_cpu_emergency_exit();

if (!graph_lock()) {
return NULL;
@@ -1393,8 +1408,10 @@ static struct lock_list *alloc_list_entry(void)
if (!debug_locks_off_graph_unlock())
return NULL;

+ nbcon_cpu_emergency_enter();
print_lockdep_off("BUG: MAX_LOCKDEP_ENTRIES too low!");
dump_stack();
+ nbcon_cpu_emergency_exit();
return NULL;
}
nr_list_entries++;
@@ -2040,6 +2057,8 @@ static noinline void print_circular_bug(struct lock_list *this,

depth = get_lock_depth(target);

+ nbcon_cpu_emergency_enter();
+
print_circular_bug_header(target, depth, check_src, check_tgt);

parent = get_lock_parent(target);
@@ -2058,6 +2077,8 @@ static noinline void print_circular_bug(struct lock_list *this,

printk("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

static noinline void print_bfs_bug(int ret)
@@ -2570,6 +2591,8 @@ print_bad_irq_dependency(struct task_struct *curr,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("=====================================================\n");
pr_warn("WARNING: %s-safe -> %s-unsafe lock order detected\n",
@@ -2619,11 +2642,13 @@ print_bad_irq_dependency(struct task_struct *curr,
pr_warn(" and %s-irq-unsafe lock:\n", irqclass);
next_root->trace = save_trace();
if (!next_root->trace)
- return;
+ goto out;
print_shortest_lock_dependencies(forwards_entry, next_root);

pr_warn("\nstack backtrace:\n");
dump_stack();
+out:
+ nbcon_cpu_emergency_exit();
}

static const char *state_names[] = {
@@ -2988,6 +3013,8 @@ print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("============================================\n");
pr_warn("WARNING: possible recursive locking detected\n");
@@ -3010,6 +3037,8 @@ print_deadlock_bug(struct task_struct *curr, struct held_lock *prev,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

/*
@@ -3607,6 +3636,8 @@ static void print_collision(struct task_struct *curr,
struct held_lock *hlock_next,
struct lock_chain *chain)
{
+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("============================\n");
pr_warn("WARNING: chain_key collision\n");
@@ -3623,6 +3654,8 @@ static void print_collision(struct task_struct *curr,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}
#endif

@@ -3713,8 +3746,10 @@ static inline int add_chain_cache(struct task_struct *curr,
if (!debug_locks_off_graph_unlock())
return 0;

+ nbcon_cpu_emergency_enter();
print_lockdep_off("BUG: MAX_LOCKDEP_CHAINS too low!");
dump_stack();
+ nbcon_cpu_emergency_exit();
return 0;
}
chain->chain_key = chain_key;
@@ -3731,8 +3766,10 @@ static inline int add_chain_cache(struct task_struct *curr,
if (!debug_locks_off_graph_unlock())
return 0;

+ nbcon_cpu_emergency_enter();
print_lockdep_off("BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!");
dump_stack();
+ nbcon_cpu_emergency_exit();
return 0;
}

@@ -3971,6 +4008,8 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,
if (!debug_locks_off() || debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("================================\n");
pr_warn("WARNING: inconsistent lock state\n");
@@ -3999,6 +4038,8 @@ print_usage_bug(struct task_struct *curr, struct held_lock *this,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

/*
@@ -4033,6 +4074,8 @@ print_irq_inversion_bug(struct task_struct *curr,
if (!debug_locks_off_graph_unlock() || debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("========================================================\n");
pr_warn("WARNING: possible irq lock inversion dependency detected\n");
@@ -4073,11 +4116,13 @@ print_irq_inversion_bug(struct task_struct *curr,
pr_warn("\nthe shortest dependencies between 2nd lock and 1st lock:\n");
root->trace = save_trace();
if (!root->trace)
- return;
+ goto out;
print_shortest_lock_dependencies(other, root);

pr_warn("\nstack backtrace:\n");
dump_stack();
+out:
+ nbcon_cpu_emergency_exit();
}

/*
@@ -4154,6 +4199,8 @@ void print_irqtrace_events(struct task_struct *curr)
{
const struct irqtrace_events *trace = &curr->irqtrace;

+ nbcon_cpu_emergency_enter();
+
printk("irq event stamp: %u\n", trace->irq_events);
printk("hardirqs last enabled at (%u): [<%px>] %pS\n",
trace->hardirq_enable_event, (void *)trace->hardirq_enable_ip,
@@ -4167,6 +4214,8 @@ void print_irqtrace_events(struct task_struct *curr)
printk("softirqs last disabled at (%u): [<%px>] %pS\n",
trace->softirq_disable_event, (void *)trace->softirq_disable_ip,
(void *)trace->softirq_disable_ip);
+
+ nbcon_cpu_emergency_exit();
}

static int HARDIRQ_verbose(struct lock_class *class)
@@ -4687,10 +4736,12 @@ static int mark_lock(struct task_struct *curr, struct held_lock *this,
* We must printk outside of the graph_lock:
*/
if (ret == 2) {
+ nbcon_cpu_emergency_enter();
printk("\nmarked lock as {%s}:\n", usage_str[new_bit]);
print_lock(this);
print_irqtrace_events(curr);
dump_stack();
+ nbcon_cpu_emergency_exit();
}

return ret;
@@ -4731,6 +4782,8 @@ print_lock_invalid_wait_context(struct task_struct *curr,
if (debug_locks_silent)
return 0;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("=============================\n");
pr_warn("[ BUG: Invalid wait context ]\n");
@@ -4750,6 +4803,8 @@ print_lock_invalid_wait_context(struct task_struct *curr,
pr_warn("stack backtrace:\n");
dump_stack();

+ nbcon_cpu_emergency_exit();
+
return 0;
}

@@ -4954,6 +5009,8 @@ print_lock_nested_lock_not_held(struct task_struct *curr,
if (debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("==================================\n");
pr_warn("WARNING: Nested lock was not taken\n");
@@ -4974,6 +5031,8 @@ print_lock_nested_lock_not_held(struct task_struct *curr,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

static int __lock_is_held(const struct lockdep_map *lock, int read);
@@ -5019,11 +5078,13 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
debug_class_ops_inc(class);

if (very_verbose(class)) {
+ nbcon_cpu_emergency_enter();
printk("\nacquire class [%px] %s", class->key, class->name);
if (class->name_version > 1)
printk(KERN_CONT "#%d", class->name_version);
printk(KERN_CONT "\n");
dump_stack();
+ nbcon_cpu_emergency_exit();
}

/*
@@ -5150,6 +5211,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
#endif
if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
debug_locks_off();
+ nbcon_cpu_emergency_enter();
print_lockdep_off("BUG: MAX_LOCK_DEPTH too low!");
printk(KERN_DEBUG "depth: %i max: %lu!\n",
curr->lockdep_depth, MAX_LOCK_DEPTH);
@@ -5157,6 +5219,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
lockdep_print_held_locks(current);
debug_show_all_locks();
dump_stack();
+ nbcon_cpu_emergency_exit();

return 0;
}
@@ -5176,6 +5239,8 @@ static void print_unlock_imbalance_bug(struct task_struct *curr,
if (debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("=====================================\n");
pr_warn("WARNING: bad unlock balance detected!\n");
@@ -5192,6 +5257,8 @@ static void print_unlock_imbalance_bug(struct task_struct *curr,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

static noinstr int match_held_lock(const struct held_lock *hlock,
@@ -5895,6 +5962,8 @@ static void print_lock_contention_bug(struct task_struct *curr,
if (debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("=================================\n");
pr_warn("WARNING: bad contention detected!\n");
@@ -5911,6 +5980,8 @@ static void print_lock_contention_bug(struct task_struct *curr,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

static void
@@ -6524,6 +6595,8 @@ print_freed_lock_bug(struct task_struct *curr, const void *mem_from,
if (debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("=========================\n");
pr_warn("WARNING: held lock freed!\n");
@@ -6536,6 +6609,8 @@ print_freed_lock_bug(struct task_struct *curr, const void *mem_from,

pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

static inline int not_in_range(const void* mem_from, unsigned long mem_len,
@@ -6582,6 +6657,8 @@ static void print_held_locks_bug(void)
if (debug_locks_silent)
return;

+ nbcon_cpu_emergency_enter();
+
pr_warn("\n");
pr_warn("====================================\n");
pr_warn("WARNING: %s/%d still has locks held!\n",
@@ -6591,6 +6668,8 @@ static void print_held_locks_bug(void)
lockdep_print_held_locks(current);
pr_warn("\nstack backtrace:\n");
dump_stack();
+
+ nbcon_cpu_emergency_exit();
}

void debug_check_no_locks_held(void)
@@ -6609,6 +6688,7 @@ void debug_show_all_locks(void)
pr_warn("INFO: lockdep is turned off.\n");
return;
}
+ nbcon_cpu_emergency_enter();
pr_warn("\nShowing all locks held in the system:\n");

rcu_read_lock();
@@ -6623,6 +6703,7 @@ void debug_show_all_locks(void)

pr_warn("\n");
pr_warn("=============================================\n\n");
+ nbcon_cpu_emergency_exit();
}
EXPORT_SYMBOL_GPL(debug_show_all_locks);
#endif
@@ -6648,6 +6729,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
if (unlikely(curr->lockdep_depth)) {
if (!debug_locks_off())
return;
+ nbcon_cpu_emergency_enter();
pr_warn("\n");
pr_warn("================================================\n");
pr_warn("WARNING: lock held when returning to user space!\n");
@@ -6656,6 +6738,7 @@ asmlinkage __visible void lockdep_sys_exit(void)
pr_warn("%s/%d is leaving the kernel with locks still held!\n",
curr->comm, curr->pid);
lockdep_print_held_locks(curr);
+ nbcon_cpu_emergency_exit();
}

/*
@@ -6672,6 +6755,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
bool rcu = warn_rcu_enter();

/* Note: the following can be executed concurrently, so be careful. */
+ nbcon_cpu_emergency_enter();
pr_warn("\n");
pr_warn("=============================\n");
pr_warn("WARNING: suspicious RCU usage\n");
@@ -6710,6 +6794,7 @@ void lockdep_rcu_suspicious(const char *file, const int line, const char *s)
lockdep_print_held_locks(curr);
pr_warn("\nstack backtrace:\n");
dump_stack();
+ nbcon_cpu_emergency_exit();
warn_rcu_exit(rcu);
}
EXPORT_SYMBOL_GPL(lockdep_rcu_suspicious);
--
2.39.2


2024-04-02 22:22:55

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 26/27] rcu: Mark emergency section in rcu stalls

Mark emergency sections wherever multiple lines of
RCU stall information are generated. In an emergency
section the CPU will not perform console output for the
printk() calls. Instead, flushing of the console output
is triggered when exiting the emergency section.
This allows the full message block to be stored as
quickly as possible in the ringbuffer.

Signed-off-by: John Ogness <[email protected]>
---
kernel/rcu/tree_exp.h | 7 +++++++
kernel/rcu/tree_stall.h | 9 +++++++++
2 files changed, 16 insertions(+)

diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h
index 6b83537480b1..0a135e94da08 100644
--- a/kernel/rcu/tree_exp.h
+++ b/kernel/rcu/tree_exp.h
@@ -7,6 +7,7 @@
* Authors: Paul E. McKenney <[email protected]>
*/

+#include <linux/console.h>
#include <linux/lockdep.h>

static void rcu_exp_handler(void *unused);
@@ -571,6 +572,9 @@ static void synchronize_rcu_expedited_wait(void)
return;
if (rcu_stall_is_suppressed())
continue;
+
+ nbcon_cpu_emergency_enter();
+
j = jiffies;
rcu_stall_notifier_call_chain(RCU_STALL_NOTIFY_EXP, (void *)(j - jiffies_start));
trace_rcu_stall_warning(rcu_state.name, TPS("ExpeditedStall"));
@@ -624,6 +628,9 @@ static void synchronize_rcu_expedited_wait(void)
rcu_exp_print_detail_task_stall_rnp(rnp);
}
jiffies_stall = 3 * rcu_exp_jiffies_till_stall_check() + 3;
+
+ nbcon_cpu_emergency_exit();
+
panic_on_rcu_stall();
}
}
diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
index 5d666428546b..f4d73ca20c76 100644
--- a/kernel/rcu/tree_stall.h
+++ b/kernel/rcu/tree_stall.h
@@ -7,6 +7,7 @@
* Author: Paul E. McKenney <[email protected]>
*/

+#include <linux/console.h>
#include <linux/kvm_para.h>
#include <linux/rcu_notifier.h>

@@ -604,6 +605,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
if (rcu_stall_is_suppressed())
return;

+ nbcon_cpu_emergency_enter();
+
/*
* OK, time to rat on our buddy...
* See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -655,6 +658,8 @@ static void print_other_cpu_stall(unsigned long gp_seq, unsigned long gps)
rcu_check_gp_kthread_expired_fqs_timer();
rcu_check_gp_kthread_starvation();

+ nbcon_cpu_emergency_exit();
+
panic_on_rcu_stall();

rcu_force_quiescent_state(); /* Kick them all. */
@@ -675,6 +680,8 @@ static void print_cpu_stall(unsigned long gps)
if (rcu_stall_is_suppressed())
return;

+ nbcon_cpu_emergency_enter();
+
/*
* OK, time to rat on ourselves...
* See Documentation/RCU/stallwarn.rst for info on how to debug
@@ -703,6 +710,8 @@ static void print_cpu_stall(unsigned long gps)
jiffies + 3 * rcu_jiffies_till_stall_check() + 3);
raw_spin_unlock_irqrestore_rcu_node(rnp, flags);

+ nbcon_cpu_emergency_exit();
+
panic_on_rcu_stall();

/*
--
2.39.2


2024-04-02 22:24:37

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 01/27] printk: Add notation to console_srcu locking

kernel/printk/printk.c:284:5: sparse: sparse: context imbalance in
'console_srcu_read_lock' - wrong count at exit
include/linux/srcu.h:301:9: sparse: sparse: context imbalance in
'console_srcu_read_unlock' - unexpected unlock

Fixes: 6c4afa79147e ("printk: Prepare for SRCU console list protection")
Signed-off-by: John Ogness <[email protected]>
---
kernel/printk/printk.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 612c73333848..c7c0ee2b47eb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -282,6 +282,7 @@ EXPORT_SYMBOL(console_list_unlock);
* Return: A cookie to pass to console_srcu_read_unlock().
*/
int console_srcu_read_lock(void)
+ __acquires(&console_srcu)
{
return srcu_read_lock_nmisafe(&console_srcu);
}
@@ -295,6 +296,7 @@ EXPORT_SYMBOL(console_srcu_read_lock);
* Counterpart to console_srcu_read_lock()
*/
void console_srcu_read_unlock(int cookie)
+ __releases(&console_srcu)
{
srcu_read_unlock_nmisafe(&console_srcu, cookie);
}
--
2.39.2


2024-04-02 22:25:00

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 03/27] printk: nbcon: Remove return value for write_atomic()

The return value of write_atomic() does not provide any useful
information. On the contrary, it only makes it more complicated
for the caller to handle the result appropriately.

Change write_atomic() to not have a return value. If the
message did not get printed due to loss of ownership, the
caller will notice this on its own. If ownership was not lost,
it will be assumed that the driver successfully printed the
message and the sequence number for that console will be
incremented.

Signed-off-by: John Ogness <[email protected]>
---
include/linux/console.h | 2 +-
kernel/printk/nbcon.c | 15 +++++++--------
2 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 779d388af8a0..54b98e4f0544 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -327,7 +327,7 @@ struct console {
struct hlist_node node;

/* nbcon console specific members */
- bool (*write_atomic)(struct console *con,
+ void (*write_atomic)(struct console *con,
struct nbcon_write_context *wctxt);
atomic_t __private nbcon_state;
atomic_long_t __private nbcon_seq;
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index d741659d26ec..2516449f921d 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -849,7 +849,6 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
unsigned long con_dropped;
struct nbcon_state cur;
unsigned long dropped;
- bool done;

/*
* The printk buffers are filled within an unsafe section. This
@@ -889,16 +888,16 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
wctxt->unsafe_takeover = cur.unsafe_takeover;

if (con->write_atomic) {
- done = con->write_atomic(con, wctxt);
+ con->write_atomic(con, wctxt);
} else {
- nbcon_context_release(ctxt);
+ /*
+ * This function should never be called for legacy consoles.
+ * Handle it as if ownership was lost and try to continue.
+ */
WARN_ON_ONCE(1);
- done = false;
- }
-
- /* If not done, the emit was aborted. */
- if (!done)
+ nbcon_context_release(ctxt);
return false;
+ }

/*
* Since any dropped message was successfully output, reset the
--
2.39.2


2024-04-02 22:26:37

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 05/27] printk: nbcon: Add detailed doc for write_atomic()

The write_atomic() callback has special requirements and is
allowed to use special helper functions. Provide detailed
documentation of the callback so that a developer has a
chance of implementing it correctly.

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
include/linux/console.h | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 54b98e4f0544..e4028d4079e1 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -285,7 +285,7 @@ struct nbcon_write_context {
/**
* struct console - The console descriptor structure
* @name: The name of the console driver
- * @write: Write callback to output messages (Optional)
+ * @write: Legacy write callback to output messages (Optional)
* @read: Read callback for console input (Optional)
* @device: The underlying TTY device driver (Optional)
* @unblank: Callback to unblank the console (Optional)
@@ -302,7 +302,6 @@ struct nbcon_write_context {
* @data: Driver private data
* @node: hlist node for the console list
*
- * @write_atomic: Write callback for atomic context
* @nbcon_state: State for nbcon consoles
* @nbcon_seq: Sequence number of the next record for nbcon to print
* @pbufs: Pointer to nbcon private buffer
@@ -327,8 +326,32 @@ struct console {
struct hlist_node node;

/* nbcon console specific members */
- void (*write_atomic)(struct console *con,
- struct nbcon_write_context *wctxt);
+
+ /**
+ * @write_atomic:
+ *
+ * NBCON callback to write out text in any context.
+ *
+ * This callback is called with the console already acquired. The
+ * callback can use nbcon_can_proceed() at any time to verify that
+ * it is still the owner of the console. In the case that it has
+ * lost ownership, it is no longer allowed to go forward. In this
+ * case it must back out immediately and carefully. The buffer
+ * content is also no longer trusted since it no longer belongs to
+ * the context.
+ *
+ * If the callback needs to perform actions where ownership is not
+ * allowed to be taken over, nbcon_enter_unsafe() and
+ * nbcon_exit_unsafe() can be used to mark such sections. These
+ * functions are also points of possible ownership transfer. If
+ * either function returns false, ownership has been lost.
+ *
+ * This callback can be called from any context (including NMI).
+ * Therefore it must avoid usage of any locking and instead rely
+ * on the console ownership for synchronization.
+ */
+ void (*write_atomic)(struct console *con, struct nbcon_write_context *wctxt);
+
atomic_t __private nbcon_state;
atomic_long_t __private nbcon_seq;
struct printk_buffers *pbufs;
--
2.39.2


2024-04-02 22:28:22

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 13/27] printk: Let console_is_usable() handle nbcon

The nbcon consoles use a different printing callback. For nbcon
consoles, check for the write_atomic() callback instead of
write().

Signed-off-by: John Ogness <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
---
kernel/printk/internal.h | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index bad22092cd5e..398e99a30427 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -87,6 +87,8 @@ void nbcon_free(struct console *con);

/*
* Check if the given console is currently capable and allowed to print
+ * records. Note that this function does not consider the current context,
+ * which can also play a role in deciding if @con can be used to print
* records.
*
* Requires the console_srcu_read_lock.
@@ -101,8 +103,13 @@ static inline bool console_is_usable(struct console *con)
if ((flags & CON_SUSPENDED))
return false;

- if (!con->write)
- return false;
+ if (flags & CON_NBCON) {
+ if (!con->write_atomic)
+ return false;
+ } else {
+ if (!con->write)
+ return false;
+ }

/*
* Console drivers may assume that per-cpu resources have been
--
2.39.2


2024-04-02 22:29:05

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v4 15/27] printk: nbcon: Provide function to flush using write_atomic()

From: Thomas Gleixner <[email protected]>

Provide nbcon_atomic_flush_pending() to perform flushing of all
registered nbcon consoles using their write_atomic() callback.

Unlike console_flush_all(), nbcon_atomic_flush_pending() will
only flush up through the newest record at the time of the
call. This prevents a CPU from printing unbounded when other
CPUs are adding records.

Also unlike console_flush_all(), nbcon_atomic_flush_pending()
will fully flush one console before flushing the next. This
helps to guarantee that a block of pending records (such as
a stack trace in an emergency situation) can be printed
atomically at once before releasing console ownership.

nbcon_atomic_flush_pending() is safe in any context because it
uses write_atomic() and acquires with unsafe_takeover disabled.

Use it in console_flush_on_panic() before flushing legacy
consoles. The legacy write() callbacks are not fully safe when
oops_in_progress is set.

Co-developed-by: John Ogness <[email protected]>
Signed-off-by: John Ogness <[email protected]>
Signed-off-by: Thomas Gleixner (Intel) <[email protected]>
---
kernel/printk/internal.h | 2 +
kernel/printk/nbcon.c | 104 ++++++++++++++++++++++++++++++++++++++-
kernel/printk/printk.c | 2 +
3 files changed, 106 insertions(+), 2 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index b7a0072eb2a4..a8df764fd0c5 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -84,6 +84,7 @@ void nbcon_seq_force(struct console *con, u64 seq);
bool nbcon_alloc(struct console *con);
void nbcon_init(struct console *con);
void nbcon_free(struct console *con);
+void nbcon_atomic_flush_pending(void);

/*
* Check if the given console is currently capable and allowed to print
@@ -138,6 +139,7 @@ static inline void nbcon_seq_force(struct console *con, u64 seq) { }
static inline bool nbcon_alloc(struct console *con) { return false; }
static inline void nbcon_init(struct console *con) { }
static inline void nbcon_free(struct console *con) { }
+static inline void nbcon_atomic_flush_pending(void) { }

static inline bool console_is_usable(struct console *con, short flags) { return false; }

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 1de6062d4ce3..fcdab2eaaedb 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -548,7 +548,6 @@ static struct printk_buffers panic_nbcon_pbufs;
* in an unsafe state. Otherwise, on success the caller may assume
* the console is not in an unsafe state.
*/
-__maybe_unused
static bool nbcon_context_try_acquire(struct nbcon_context *ctxt)
{
unsigned int cpu = smp_processor_id();
@@ -850,7 +849,6 @@ EXPORT_SYMBOL_GPL(nbcon_exit_unsafe);
* When true is returned, @wctxt->ctxt.backlog indicates whether there are
* still records pending in the ringbuffer,
*/
-__maybe_unused
static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
{
struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);
@@ -937,6 +935,108 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
return nbcon_context_exit_unsafe(ctxt);
}

+/**
+ * __nbcon_atomic_flush_pending_con - Flush specified nbcon console using its
+ * write_atomic() callback
+ * @con: The nbcon console to flush
+ * @stop_seq: Flush up until this record
+ *
+ * Return: True if taken over while printing. Otherwise false.
+ *
+ * If flushing up to @stop_seq was not successful, it only makes sense for the
+ * caller to try again when true was returned. When false is returned, either
+ * there are no more records available to read or this context is not allowed
+ * to acquire the console.
+ */
+static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
+{
+ struct nbcon_write_context wctxt = { };
+ struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
+
+ ctxt->console = con;
+ ctxt->spinwait_max_us = 2000;
+ ctxt->prio = NBCON_PRIO_NORMAL;
+
+ if (!nbcon_context_try_acquire(ctxt))
+ return false;
+
+ while (nbcon_seq_read(con) < stop_seq) {
+ /*
+ * nbcon_emit_next_record() returns false when the console was
+ * handed over or taken over. In both cases the context is no
+ * longer valid.
+ */
+ if (!nbcon_emit_next_record(&wctxt))
+ return true;
+
+ if (!ctxt->backlog)
+ break;
+ }
+
+ nbcon_context_release(ctxt);
+
+ return false;
+}
+
+/**
+ * __nbcon_atomic_flush_pending - Flush all nbcon consoles using their
+ * write_atomic() callback
+ * @stop_seq: Flush up until this record
+ */
+static void __nbcon_atomic_flush_pending(u64 stop_seq)
+{
+ struct console *con;
+ bool should_retry;
+ int cookie;
+
+ do {
+ should_retry = false;
+
+ cookie = console_srcu_read_lock();
+ for_each_console_srcu(con) {
+ short flags = console_srcu_read_flags(con);
+ unsigned long irq_flags;
+
+ if (!(flags & CON_NBCON))
+ continue;
+
+ if (!console_is_usable(con, flags))
+ continue;
+
+ if (nbcon_seq_read(con) >= stop_seq)
+ continue;
+
+ /*
+ * Atomic flushing does not use console driver
+ * synchronization (i.e. it does not hold the port
+ * lock for uart consoles). Therefore IRQs must be
+ * disabled to avoid being interrupted and then
+ * calling into a driver that will deadlock trying
+ * to acquire console ownership.
+ */
+ local_irq_save(irq_flags);
+
+ should_retry |= __nbcon_atomic_flush_pending_con(con, stop_seq);
+
+ local_irq_restore(irq_flags);
+ }
+ console_srcu_read_unlock(cookie);
+ } while (should_retry);
+}
+
+/**
+ * nbcon_atomic_flush_pending - Flush all nbcon consoles using their
+ * write_atomic() callback
+ *
+ * Flush the backlog up through the currently newest record. Any new
+ * records added while flushing will not be flushed. This is to avoid
+ * one CPU printing unbounded because other CPUs continue to add records.
+ */
+void nbcon_atomic_flush_pending(void)
+{
+ __nbcon_atomic_flush_pending(prb_next_reserve_seq(prb));
+}
+
/**
* nbcon_alloc - Allocate buffers needed by the nbcon console
* @con: Console to allocate buffers for
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index fe06856f7653..6404f2044ceb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3172,6 +3172,8 @@ void console_flush_on_panic(enum con_flush_mode mode)
console_srcu_read_unlock(cookie);
}

+ nbcon_atomic_flush_pending();
+
console_flush_all(false, &next_seq, &handover);
}

--
2.39.2


2024-04-03 10:03:38

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH printk v4 20/27] printk: Avoid console_lock dance if no legacy or boot consoles

On 2024-04-03, John Ogness <[email protected]> wrote:
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index df84c6bfbb2d..4ff3800e8e8e 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3810,6 +3833,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> u64 last_diff = 0;
> u64 printk_seq;
> short flags;
> + bool locked;
> int cookie;
> u64 diff;
> u64 seq;
> @@ -3819,22 +3843,35 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> seq = prb_next_reserve_seq(prb);
>
> /* Flush the consoles so that records up to @seq are printed. */
> - console_lock();
> - console_unlock();
> + if (printing_via_unlock) {
> + console_lock();
> + console_unlock();
> + }
>
> for (;;) {
> unsigned long begin_jiffies;
> unsigned long slept_jiffies;
>
> + locked = false;
> diff = 0;
>
> + if (printing_via_unlock) {
> + /*
> + * Hold the console_lock to guarantee safe access to
> + * console->seq. Releasing console_lock flushes more
> + * records in case @seq is still not printed on all
> + * usable consoles.
> + */
> + console_lock();
> + locked = true;
> + }
> +
> /*
> - * Hold the console_lock to guarantee safe access to
> - * console->seq. Releasing console_lock flushes more
> - * records in case @seq is still not printed on all
> - * usable consoles.
> + * Ensure the compiler does not optimize @locked to be
> + * @printing_via_unlock since the latter can change at any
> + * time.
> */
> - console_lock();
> + barrier();

When I originally implemented this, my objective was to force the
compiler to use a local variable. But to be fully paranoid (as we
should be), we must also prevent the compiler from transforming
the code into this nonsense:

if (printing_via_unlock) {
console_lock();
locked = printing_via_unlock;
}

I suggest replacing the __pr_flush() hunks to be as follows instead
(i.e. ensure all conditional console lock usage within the loop is using
the local variable):

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index df84c6bfbb2d..1dbd2a837b67 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3819,22 +3842,34 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
seq = prb_next_reserve_seq(prb);

/* Flush the consoles so that records up to @seq are printed. */
- console_lock();
- console_unlock();
+ if (printing_via_unlock) {
+ console_lock();
+ console_unlock();
+ }

for (;;) {
unsigned long begin_jiffies;
unsigned long slept_jiffies;
-
- diff = 0;
+ bool use_console_lock = printing_via_unlock;

/*
- * Hold the console_lock to guarantee safe access to
- * console->seq. Releasing console_lock flushes more
- * records in case @seq is still not printed on all
- * usable consoles.
+ * Ensure the compiler does not optimize @use_console_lock to
+ * be @printing_via_unlock since the latter can change at any
+ * time.
*/
- console_lock();
+ barrier();
+
+ diff = 0;
+
+ if (use_console_lock) {
+ /*
+ * Hold the console_lock to guarantee safe access to
+ * console->seq. Releasing console_lock flushes more
+ * records in case @seq is still not printed on all
+ * usable consoles.
+ */
+ console_lock();
+ }

cookie = console_srcu_read_lock();
for_each_console_srcu(c) {
@@ -3854,6 +3889,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
if (flags & CON_NBCON) {
printk_seq = nbcon_seq_read(c);
} else {
+ WARN_ON_ONCE(!use_console_lock);
printk_seq = c->seq;
}

@@ -3865,7 +3901,8 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
if (diff != last_diff && reset_on_progress)
remaining_jiffies = timeout_jiffies;

- console_unlock();
+ if (use_console_lock)
+ console_unlock();

/* Note: @diff is 0 if there are no usable consoles. */
if (diff == 0 || remaining_jiffies == 0)

2024-04-03 10:35:17

by Andy Shevchenko

[permalink] [raw]
Subject: Re: [PATCH printk v4 10/27] printk: nbcon: Do not rely on proxy headers

On Wed, Apr 03, 2024 at 12:17:12AM +0206, John Ogness wrote:
> The headers kernel.h, serial_core.h, and console.h allow for the
> definitions of many types and functions from other headers.
> Rather than relying on these as proxy headers, explicitly
> include all headers providing needed definitions. Also sort the
> list alphabetically to be able to easily detect duplicates.

Thank you! LGTM,

Reviewed-by: Andy Shevchenko <[email protected]>

--
With Best Regards,
Andy Shevchenko



2024-04-03 11:36:21

by John Ogness

[permalink] [raw]
Subject: Re: [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper

On 2024-04-03, John Ogness <[email protected]> wrote:
> diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
> index d6a58a9e072a..2652b4d5c944 100644
> --- a/drivers/tty/serial/serial_core.c
> +++ b/drivers/tty/serial/serial_core.c
> @@ -3146,7 +3146,7 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
> uport->state = state;
>
> state->pm_state = UART_PM_STATE_UNDEFINED;
> - uport->cons = drv->cons;
> + uart_port_set_cons(uport, drv->cons);
> uport->minor = drv->tty_driver->minor_start + uport->line;
> uport->name = kasprintf(GFP_KERNEL, "%s%d", drv->dev_name,
> drv->tty_driver->name_base + uport->line);

Sebastian Siewior pointed out that the port lock is initialized shortly
after this code. Since uart_port_set_cons() uses the port lock, the
spinlock initialization must come first. The changes for serial_core.c
should be:

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c
index d6a58a9e072a..0c13ea6a3afa 100644
--- a/drivers/tty/serial/serial_core.c
+++ b/drivers/tty/serial/serial_core.c
@@ -3145,8 +3145,15 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
state->uart_port = uport;
uport->state = state;

+ /*
+ * If this port is in use as a console then the spinlock is already
+ * initialised.
+ */
+ if (!uart_console_registered(uport))
+ uart_port_spin_lock_init(uport);
+
state->pm_state = UART_PM_STATE_UNDEFINED;
- uport->cons = drv->cons;
+ uart_port_set_cons(uport, drv->cons);
uport->minor = drv->tty_driver->minor_start + uport->line;
uport->name = kasprintf(GFP_KERNEL, "%s%d", drv->dev_name,
drv->tty_driver->name_base + uport->line);
@@ -3155,13 +3162,6 @@ static int serial_core_add_one_port(struct uart_driver *drv, struct uart_port *u
goto out;
}

- /*
- * If this port is in use as a console then the spinlock is already
- * initialised.
- */
- if (!uart_console_registered(uport))
- uart_port_spin_lock_init(uport);
-
if (uport->cons && uport->dev)
of_console_check(uport->dev->of_node, uport->cons->name, uport->line);


2024-04-05 15:38:06

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH printk v4 02/27] printk: Properly deal with nbcon consoles on seq init

On Wed 2024-04-03 00:17:04, John Ogness wrote:
> If a non-boot console is registering and boot consoles exist, the
> consoles are flushed before being unregistered. This allows the
> non-boot console to continue where the boot console left off.
>
> If for whatever reason flushing fails, the lowest seq found from
> any of the enabled boot consoles is used. Until now con->seq was
> checked. However, if it is an nbcon boot console, the function
> nbcon_seq_read() must be used to read seq because con->seq is
> not updated for nbcon consoles.
>
> Check if it is an nbcon boot console and if so call
> nbcon_seq_read() to read seq.
>
> Also setup the nbcon sequence number and reset the legacy
> sequence number from register_console() (rather than in
> nbcon_init() and nbcon_seq_force()). This removes all legacy
> sequence handling from nbcon.c so the code is easier to follow
> and maintain.
>
> Signed-off-by: John Ogness <[email protected]>

Sigh, I wanted to wave this through. But then I ended up with
some doubts, see below.


> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -974,7 +969,7 @@ void nbcon_init(struct console *con)
> /* nbcon_alloc() must have been called and successful! */
> BUG_ON(!con->pbufs);
>
> - nbcon_seq_force(con, con->seq);
> + nbcon_seq_force(con, 0);
> nbcon_state_set(con, &state);
> }
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index c7c0ee2b47eb..b7e52b3f3e96 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3348,6 +3348,7 @@ static void try_enable_default_console(struct console *newcon)
> newcon->flags |= CON_CONSDEV;
> }
>
> +/* Set @newcon->seq to the first record this console should print. */
> static void console_init_seq(struct console *newcon, bool bootcon_registered)
> {
> struct console *con;
> @@ -3396,11 +3397,20 @@ static void console_init_seq(struct console *newcon, bool bootcon_registered)
>
> newcon->seq = prb_next_seq(prb);
> for_each_console(con) {
> - if ((con->flags & CON_BOOT) &&
> - (con->flags & CON_ENABLED) &&
> - con->seq < newcon->seq) {
> - newcon->seq = con->seq;
> + u64 seq;
> +
> + if (!((con->flags & CON_BOOT) &&
> + (con->flags & CON_ENABLED))) {
> + continue;
> }
> +
> + if (con->flags & CON_NBCON)
> + seq = nbcon_seq_read(con);
> + else
> + seq = con->seq;
> +
> + if (seq < newcon->seq)
> + newcon->seq = seq;
> }
> }
>
> @@ -3517,9 +3527,18 @@ void register_console(struct console *newcon)
> newcon->dropped = 0;
> console_init_seq(newcon, bootcon_registered);
>
> - if (newcon->flags & CON_NBCON)
> + if (newcon->flags & CON_NBCON) {
> nbcon_init(newcon);
>
> + /*
> + * nbcon consoles have their own sequence counter. The legacy
> + * sequence counter is reset so that it is clear it is not
> + * being used.
> + */
> + nbcon_seq_force(newcon, newcon->seq);
> + newcon->seq = 0;

I have tried to get rid of this hack by assigning the value
correctly already in console_init_seq().

But I ended up with a chicken-and-egg problem over whether to call
nbcon_init() before console_init_seq() or vice versa.
No solution was ideal.

Then I realized that the full solution in RT tree starts the kthread
in nbcon_init() => con->nbcon_seq must be initialized earlier
=> the current code is buggy.

BTW: I wonder if it is sane to start the kthread in the middle of
struct console initialization. Especially when the full
initialization is needed for a correct serialization against
uart_port_lock().

It might be better to create the kthread in a separate function
called later. But this will be another patchset...


For now, I suggest to make the seq initialization cleaner. Here is
my proposal. The patch can be applied on top of this patchset.
It is only compile tested.


From 2fcb73c98dde2c099fd5d5e4de9c0ffb449c7cc6 Mon Sep 17 00:00:00 2001
From: Petr Mladek <[email protected]>
Date: Fri, 5 Apr 2024 15:44:45 +0200
Subject: [PATCH] printk: Clean up code initializing the per-console sequence
number

console_init_seq() reads either con->seq or con->nbcon_seq
depending on the value of CON_NBCON flag. But it always sets
newcon->seq even for nbcon consoles. The value is later
moved in register_console().

Rename console_init_seq() to get_init_console_seq() and just
return the value. Pass the value to nbcon_init() so that
the nbcon sequence counter can be initialized there.

The cleaned-up design should make sure that the value is set
before the kthread is created and stays in place afterwards.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/internal.h | 2 +-
kernel/printk/nbcon.c | 5 +++--
kernel/printk/printk.c | 37 +++++++++++++++++--------------------
3 files changed, 21 insertions(+), 23 deletions(-)

diff --git a/kernel/printk/internal.h b/kernel/printk/internal.h
index bcf2105a5c5c..2366303f31ae 100644
--- a/kernel/printk/internal.h
+++ b/kernel/printk/internal.h
@@ -84,7 +84,7 @@ int console_lock_spinning_disable_and_check(int cookie);
u64 nbcon_seq_read(struct console *con);
void nbcon_seq_force(struct console *con, u64 seq);
bool nbcon_alloc(struct console *con);
-void nbcon_init(struct console *con);
+void nbcon_init(struct console *con, u64 init_seq);
void nbcon_free(struct console *con);
enum nbcon_prio nbcon_get_default_prio(void);
void nbcon_atomic_flush_pending(void);
diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index 4c852c2e8d89..c57ad34a8d56 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -1262,18 +1262,19 @@ bool nbcon_alloc(struct console *con)
/**
* nbcon_init - Initialize the nbcon console specific data
* @con: Console to initialize
+ * @init_seq:	Sequence number of the first to-be-emitted record
*
* nbcon_alloc() *must* be called and succeed before this function
* is called.
*/
-void nbcon_init(struct console *con)
+void nbcon_init(struct console *con, u64 init_seq)
{
struct nbcon_state state = { };

/* nbcon_alloc() must have been called and successful! */
BUG_ON(!con->pbufs);

- nbcon_seq_force(con, 0);
+ nbcon_seq_force(con, init_seq);
nbcon_state_set(con, &state);
}

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 9da70d7dda3d..9164b7f72e17 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -3427,20 +3427,21 @@ static void try_enable_default_console(struct console *newcon)
newcon->flags |= CON_CONSDEV;
}

-/* Set @newcon->seq to the first record this console should print. */
-static void console_init_seq(struct console *newcon, bool bootcon_registered)
+/* Return starting sequence number to be used by a newly registered console. */
+static u64 get_init_console_seq(struct console *newcon, bool bootcon_registered)
{
struct console *con;
+ u64 init_seq;
bool handover;

if (newcon->flags & (CON_PRINTBUFFER | CON_BOOT)) {
/* Get a consistent copy of @syslog_seq. */
mutex_lock(&syslog_lock);
- newcon->seq = syslog_seq;
+ init_seq = syslog_seq;
mutex_unlock(&syslog_lock);
} else {
/* Begin with next message added to ringbuffer. */
- newcon->seq = prb_next_seq(prb);
+ init_seq = prb_next_seq(prb);

/*
* If any enabled boot consoles are due to be unregistered
@@ -3461,7 +3462,7 @@ static void console_init_seq(struct console *newcon, bool bootcon_registered)
* Flush all consoles and set the console to start at
* the next unprinted sequence number.
*/
- if (!console_flush_all(true, &newcon->seq, &handover)) {
+ if (!console_flush_all(true, &init_seq, &handover)) {
/*
* Flushing failed. Just choose the lowest
* sequence of the enabled boot consoles.
@@ -3474,12 +3475,12 @@ static void console_init_seq(struct console *newcon, bool bootcon_registered)
if (handover)
console_lock();

- newcon->seq = prb_next_seq(prb);
+ init_seq = prb_next_seq(prb);
for_each_console(con) {
u64 seq;

- if (!((con->flags & CON_BOOT) &&
- (con->flags & CON_ENABLED))) {
+ if (!(con->flags & CON_BOOT) ||
+ !(con->flags & CON_ENABLED)) {
continue;
}

@@ -3488,14 +3489,16 @@ static void console_init_seq(struct console *newcon, bool bootcon_registered)
else
seq = con->seq;

- if (seq < newcon->seq)
- newcon->seq = seq;
+ if (seq < init_seq)
+ init_seq = seq;
}
}

console_unlock();
}
}
+
+ return init_seq;
}

#define console_first() \
@@ -3528,6 +3531,7 @@ void register_console(struct console *newcon)
bool bootcon_registered = false;
bool realcon_registered = false;
unsigned long flags;
+ u64 init_seq;
int err;

console_list_lock();
@@ -3605,21 +3609,14 @@ void register_console(struct console *newcon)
}

newcon->dropped = 0;
- console_init_seq(newcon, bootcon_registered);
+ init_seq = get_init_console_seq(newcon, have_boot_console);

if (newcon->flags & CON_NBCON) {
have_nbcon_console = true;
- nbcon_init(newcon);
-
- /*
- * nbcon consoles have their own sequence counter. The legacy
- * sequence counter is reset so that it is clear it is not
- * being used.
- */
- nbcon_seq_force(newcon, newcon->seq);
- newcon->seq = 0;
+ nbcon_init(newcon, init_seq);
} else {
have_legacy_console = true;
+ newcon->seq = init_seq;
}

if (newcon->flags & CON_BOOT)
--
2.44.0


2024-04-08 13:22:09

by Petr Mladek

Subject: Re: [PATCH printk v4 03/27] printk: nbcon: Remove return value for write_atomic()

On Wed 2024-04-03 00:17:05, John Ogness wrote:
> The return value of write_atomic() does not provide any useful
> information. On the contrary, it makes things more complicated
> for the caller to appropriately deal with the information.
>
> Change write_atomic() to not have a return value. If the
> message did not get printed due to loss of ownership, the
> caller will notice this on its own. If ownership was not lost,
> it will be assumed that the driver successfully printed the
> message and the sequence number for that console will be
> incremented.
>
> Signed-off-by: John Ogness <[email protected]>

Reviewed-by: Petr Mladek <[email protected]>

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -889,16 +888,16 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> wctxt->unsafe_takeover = cur.unsafe_takeover;
>
> if (con->write_atomic) {
> - done = con->write_atomic(con, wctxt);
> + con->write_atomic(con, wctxt);
> } else {
> - nbcon_context_release(ctxt);
> + /*
> + * This function should never be called for legacy consoles.
> + * Handle it as if ownership was lost and try to continue.
> + */
> WARN_ON_ONCE(1);
> - done = false;
> - }
> -
> - /* If not done, the emit was aborted. */
> - if (!done)
> + nbcon_context_release(ctxt);

I thought a bit about whether it is better to release the context before
or after the WARN(). My conclusion is that it does not really matter.

Anyway, we must make sure that it is safe to call WARN_ON_ONCE()
when the nbcon context is acquired. People will use it. And I believe
that it _is_ safe.

Best Regards,
Petr

2024-04-08 15:20:55

by Petr Mladek

Subject: Re: [PATCH printk v4 05/27] printk: nbcon: Add detailed doc for write_atomic()

On Wed 2024-04-03 00:17:07, John Ogness wrote:
> The write_atomic() callback has special requirements and is
> allowed to use special helper functions. Provide detailed
> documentation of the callback so that a developer has a
> chance of implementing it correctly.
>
> Signed-off-by: John Ogness <[email protected]>
> Reviewed-by: Petr Mladek <[email protected]>

I have re-read the text and found a mistake.

---
> include/linux/console.h | 31 +++++++++++++++++++++++++++----
> 1 file changed, 27 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 54b98e4f0544..e4028d4079e1 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -285,7 +285,7 @@ struct nbcon_write_context {
> /**
> * struct console - The console descriptor structure
> * @name: The name of the console driver
> - * @write: Write callback to output messages (Optional)
> + * @write: Legacy write callback to output messages (Optional)
> * @read: Read callback for console input (Optional)
> * @device: The underlying TTY device driver (Optional)
> * @unblank: Callback to unblank the console (Optional)
> @@ -302,7 +302,6 @@ struct nbcon_write_context {
> * @data: Driver private data
> * @node: hlist node for the console list
> *
> - * @write_atomic: Write callback for atomic context
> * @nbcon_state: State for nbcon consoles
> * @nbcon_seq: Sequence number of the next record for nbcon to print
> * @pbufs: Pointer to nbcon private buffer
> @@ -327,8 +326,32 @@ struct console {
> struct hlist_node node;
>
> /* nbcon console specific members */
> - void (*write_atomic)(struct console *con,
> - struct nbcon_write_context *wctxt);
> +
> + /**
> + * @write_atomic:
> + *
> + * NBCON callback to write out text in any context.
> + *
> + * This callback is called with the console already acquired. The
> + * callback can use nbcon_can_proceed() at any time to verify that
> + * it is still the owner of the console. In the case that it has
> + * lost ownership, it is no longer allowed to go forward. In this
> + * case it must back out immediately and carefully. The buffer
> + * content is also no longer trusted since it no longer belongs to
> + * the context.
> + *
> + * If the callback needs to perform actions where ownership is not
> + * allowed to be taken over, nbcon_enter_unsafe() and
> + * nbcon_exit_unsafe() can be used to mark such sections.

IMHO, the word 'can' is wrong. The callback has to enter the unsafe
state whenever ownership is not allowed to be taken over.

We should probably be more clear about what exactly this means.

Thinking more about this. We should also be more clear about when
to use nbcon_can_proceed() and what it guarantees. Is it actually
useful to call it directly in practice?


> + * functions are also points of possible ownership transfer. If
> + * either function returns false, ownership has been lost.
> + *
> + * This callback can be called from any context (including NMI).
> + * Therefore it must avoid usage of any locking and instead rely
> + * on the console ownership for synchronization.
> + */

My proposal:

/**
* @write_atomic:
*
* NBCON callback to write out text in any context.
*
* This callback is called with the nbcon context acquired. But
* a higher priority context is allowed to take over it by default.
*
* The callback has to call nbcon_enter_unsafe() and nbcon_exit_unsafe()
* around any code where the takeover is not safe, for example, when
* manipulating the serial port registers.
*
* nbcon_enter_unsafe() might fail when the context has lost
* the console ownership in the meantime. In this case, the callback
* is no longer allowed to go forward. It must back out immediately and
* carefully. The buffer content is also no longer trusted since it
* no longer belongs to the context.
*
* The callback should allow the takeover whenever it is safe. It
* increases the chance of seeing messages when the system is in
* trouble.
*
* The callback is called from any context (including NMI).
* Therefore it must avoid usage of any locking and instead rely
* on the console ownership for synchronization.
*/

> + void (*write_atomic)(struct console *con, struct nbcon_write_context *wctxt);
> +
> atomic_t __private nbcon_state;
> atomic_long_t __private nbcon_seq;
> struct printk_buffers *pbufs;


Best Regards,
Petr

2024-04-09 09:20:57

by Petr Mladek

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On Wed 2024-04-03 00:17:08, John Ogness wrote:
> Console drivers typically must deal with access to the hardware
> via user input/output (such as an interactive login shell) and
> output of kernel messages via printk() calls.
>
> Follow-up commits require that the printk subsystem is able to
> synchronize with the driver. Require nbcon consoles to implement
> two new callbacks (device_lock(), device_unlock()) that will
> use whatever synchronization mechanism the driver is using for
> itself (for example, the port lock for uart serial consoles).

We should explain here the bigger picture, see my comment
of the word "whenever" below.

<proposal>
Console drivers typically have to deal with access to the hardware
via user input/output (such as an interactive login shell) and
output of kernel messages via printk() calls.

They use some classic locking mechanism in most situations. But
console->write_atomic() callbacks, used by nbcon consoles,
must be synchronized only by acquiring the console context.

The synchronization via the console context ownership is possible
only when the console driver is registered. That is when a particular
device driver is connected with a particular console driver.

The two synchronization mechanisms must be coordinated with each
other. This is tricky because the console context ownership is quite
special: it might be taken over by a higher-priority context, and
CPU migration has to be disabled.

The trickiest part is to (dis)connect these two mechanisms during
console (un)registration. Let's make it easier by adding two new
callbacks: device_lock() and device_unlock(). They allow taking
the device-specific lock while the device is being (un)registered
as the related console driver.

For example, these callbacks would lock/unlock the port lock
for serial port drivers.
</proposal>

>
> Signed-off-by: John Ogness <[email protected]>
> ---
> include/linux/console.h | 42 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 42 insertions(+)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index e4028d4079e1..ad85594e070e 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -352,6 +352,48 @@ struct console {
> */
> void (*write_atomic)(struct console *con, struct nbcon_write_context *wctxt);
>
> + /**
> + * @device_lock:
> + *
> + * NBCON callback to begin synchronization with driver code.
> + *
> + * Console drivers typically must deal with access to the hardware
> + * via user input/output (such as an interactive login shell) and
> + * output of kernel messages via printk() calls. This callback is
> + * called by the printk-subsystem whenever it needs to synchronize
> + * with hardware access by the driver.

The word "whenever" can be translated as "always" (by dictionary
and by my head ;-) Then the description sounds like this would be
the primary synchronization mechanism between printk and the driver.

In fact, this API has only one purpose: to synchronize
console registration/unregistration.

IMHO, we should explain here the relation between the driver-specific
locking and nbcon console context locking. It would describe the big
picture and hopefully reduce confusion and eventual misuse.


> + It should be implemented to
> + * use whatever synchronization mechanism the driver is using for
> + * itself (for example, the port lock for uart serial consoles).
> + *
> + * This callback is always called from task context. It may use any
> + * synchronization method required by the driver. BUT this callback
> + * MUST also disable migration. The console driver may be using a
> + * synchronization mechanism that already takes care of this (such as
> + * spinlocks). Otherwise this function must explicitly call
> + * migrate_disable().
> + *
> + * The flags argument is provided as a convenience to the driver. It
> + * will be passed again to device_unlock(). It can be ignored if the
> + * driver does not need it.
> + */

<proposal>
/**
* @device_lock:
*
* NBCON callback to serialize registration of a device driver
* for a console driver.
*
* Console drivers typically have to deal with access to the hardware
* via user input/output (such as an interactive login shell) and
* output of kernel messages via printk() calls.
*
* They use some classic locking mechanism in most situations. But
* console->write_atomic() callbacks, used by nbcon consoles,
* must be synchronized only by acquiring the console context.
*
* The synchronization via the console context ownership is possible
* only when the console driver is registered. That is when a particular
* device driver is connected with a particular console driver.
*
* The device_lock() callback must block operations on the device
* while it is being (un)registered as a console driver. It will
* make sure that the classic device locking is aware of the console
* context locking when it might be acquired by the related nbcon
* console driver.
*
* This callback is always called from task context. It may use any
* synchronization method required by the driver, for example
* port lock for serial ports.
*
* IMPORTANT: This callback MUST also disable migration. The console
* driver may be using a synchronization mechanism that already
* takes care of this (such as spinlocks). Otherwise this function
* must explicitly call migrate_disable().
*
* The flags argument is provided as a convenience to the driver.
* It will be passed again to device_unlock(). It can be ignored
* if the driver does not need it.
*/
</proposal>


> + void (*device_lock)(struct console *con, unsigned long *flags);
> +
> + /**
> + * @device_unlock:
> + *
> + * NBCON callback to finish synchronization with driver code.
> + *
> + * It is the counterpart to device_lock().
> + *
> + * This callback is always called from task context. It must
> + * appropriately re-enable migration (depending on how device_lock()
> + * disabled migration).
> + *
> + * The flags argument is the value of the same variable that was
> + * passed to device_lock().
> + */
> + void (*device_unlock)(struct console *con, unsigned long flags);
> +
> atomic_t __private nbcon_state;
> atomic_long_t __private nbcon_seq;
> struct printk_buffers *pbufs;

With the updated commit message and comment:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr


2024-04-09 09:46:52

by Petr Mladek

Subject: Re: [PATCH printk v4 07/27] printk: nbcon: Use driver synchronization while registering

On Wed 2024-04-03 00:17:09, John Ogness wrote:
> Depending on if an nbcon console is registered, a driver may
> handle its internal locking differently. If a driver is holding
> its internal lock while the nbcon console is registered, there
> may be a risk that two different contexts access the hardware
> simultaneously without synchronization. (For example, if the
> printk subsystem invokes atomic printing while another driver
> context acquired the internal lock without considering the
> atomic console.)
>
> Use the driver synchronization while a registering nbcon console
> transitions to being registered. This guarantees that if the
> driver acquires its internal lock when the nbcon console was not
> registered, it will remain unregistered until that context
> releases the lock.
>
> Signed-off-by: John Ogness <[email protected]>

Looks reasonable:

Reviewed-by: Petr Mladek <[email protected]>

Note:

The printk kthread integration is not part of this patchset.
I see in linux-rt-devel that nbcon_kthread_func() emits
a pending record under con->driver_lock(). This is
a solution to prevent a race with the driver lock.

IMHO, it is not strictly necessary to take the driver_lock()
in the kthread. Instead, it would be enough to make sure that
the kthread is running only when the device driver is properly
registered as the console driver.

Well, we should probably discuss this in the patchset introducing
the kthread.

Best Regards,
Petr

2024-04-09 12:00:36

by Petr Mladek

Subject: Re: [PATCH printk v4 08/27] serial: core: Provide low-level functions to lock port

On Wed 2024-04-03 00:17:10, John Ogness wrote:
> It will be necessary at times for the uart nbcon console
> drivers to acquire the port lock directly (without the
> additional nbcon functionality of the port lock wrappers).
> These are special cases such as the implementation of the
> device_lock()/device_unlock() callbacks or for internal
> port lock wrapper synchronization.
>
> Provide low-level variants __uart_port_lock_irqsave() and
> __uart_port_unlock_irqrestore() for this purpose.
>
> Signed-off-by: John Ogness <[email protected]>

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-09 13:23:39

by Greg Kroah-Hartman

Subject: Re: [PATCH printk v4 08/27] serial: core: Provide low-level functions to lock port

On Wed, Apr 03, 2024 at 12:17:10AM +0206, John Ogness wrote:
> It will be necessary at times for the uart nbcon console
> drivers to acquire the port lock directly (without the
> additional nbcon functionality of the port lock wrappers).
> These are special cases such as the implementation of the
> device_lock()/device_unlock() callbacks or for internal
> port lock wrapper synchronization.
>
> Provide low-level variants __uart_port_lock_irqsave() and
> __uart_port_unlock_irqrestore() for this purpose.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> include/linux/serial_core.h | 18 ++++++++++++++++++
> 1 file changed, 18 insertions(+)

Acked-by: Greg Kroah-Hartman <[email protected]>

2024-04-10 12:35:39

by Petr Mladek

Subject: Re: [PATCH printk v4 09/27] printk: nbcon: Implement processing in port->lock wrapper

On Wed 2024-04-03 00:17:11, John Ogness wrote:
> Currently the port->lock wrappers uart_port_lock(),
> uart_port_unlock() (and their variants) only lock/unlock
> the spin_lock.
>
> If the port is an nbcon console, the wrappers must also
> acquire/release the console and mark the region as unsafe. This
> allows general port->lock synchronization to be synchronized
> with the nbcon console ownership.
>
> Introduce a new struct nbcon_drvdata within struct console that
> provides the necessary components for the port lock wrappers to
> acquire the nbcon console and track its ownership.
>
> Also introduce uart_port_set_cons() as a wrapper to set @cons
> of a uart_port. The wrapper sets @cons under the port lock in
> order to prevent @cons from disappearing while another context
> owns the port lock via the port lock wrappers.
>
> Also cleanup the description of the console_srcu_read_flags()
> function. It is used by the port lock wrappers to ensure a
> console cannot be fully unregistered while another context
> owns the port lock via the port lock wrappers.
>
> Signed-off-by: John Ogness <[email protected]>

> --- a/drivers/tty/serial/8250/8250_core.c
> +++ b/drivers/tty/serial/8250/8250_core.c
> @@ -689,7 +689,7 @@ static int univ8250_console_match(struct console *co, char *name, int idx,
> continue;
>
> co->index = i;
> - port->cons = co;
> + uart_port_set_cons(port, co);
> return serial8250_console_setup(port, options, true);

I just noticed that this is a newcon->match() callback. It does:

+ univ8250_console_match()
+ serial8250_console_setup(port, options, true) // @probe == true
+ probe_baud(port)

which manipulates the serial port.

We should call also the con->match() callback under console_lock()
in try_enable_preferred_console() like we do with con->setup,
see the commit 801410b26a0e8 ("serial: Lock console when calling into
driver before registration").

Maybe we should just take console_lock() in register_console()
around these try_enable_*() calls.

Well, this is for a separate patch or separate patchset.

> }
>
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -282,6 +282,25 @@ struct nbcon_write_context {
> bool unsafe_takeover;
> };
>
> +/**
> + * struct nbcon_drvdata - Data to allow nbcon acquire in non-print context
> + * @ctxt: The core console context
> + * @srcu_cookie: Storage for a console_srcu_lock cookie, if needed
> + * @owner_index: Storage for the owning console index, if needed
> + * @locked: Storage for the locked state, if needed
> + *
> + * All fields (except for @ctxt) are available exclusively to the driver to
> + * use as needed. They are not used by the printk subsystem.
> + */
> +struct nbcon_drvdata {
> + struct nbcon_context __private ctxt;
> +
> + /* reserved for driver use */
> + int srcu_cookie;
> + short owner_index;
> + bool locked;
> +};
> +
> /**
> * struct console - The console descriptor structure
> * @name: The name of the console driver
> --- a/include/linux/serial_core.h
> +++ b/include/linux/serial_core.h
> @@ -606,6 +609,83 @@ static inline void __uart_port_unlock_irqrestore(struct uart_port *up, unsigned
> spin_unlock_irqrestore(&up->lock, flags);
> }
>
> +/**
> + * uart_port_set_cons - Safely set the @cons field for a uart
> + * @up: The uart port to set
> + * @con: The new console to set to
> + *
> + * This function must be used to set @up->cons. It uses the port lock to
> + * synchronize with the port lock wrappers in order to ensure that the console
> + * cannot change or disappear while another context is holding the port lock.
> + */
> +static inline void uart_port_set_cons(struct uart_port *up, struct console *con)
> +{
> + unsigned long flags;
> +
> + __uart_port_lock_irqsave(up, &flags);
> + up->cons = con;
> + __uart_port_unlock_irqrestore(up, flags);
> +}
> +
> +/* Only for internal port lock wrapper usage. */
> +static inline void __uart_port_nbcon_acquire(struct uart_port *up)
> +{
> + lockdep_assert_held_once(&up->lock);
> +
> + if (likely(!uart_console(up)))
> + return;
> +
> + if (up->cons->nbcon_drvdata) {
> + /*
> + * If @up->cons is registered, prevent it from fully
> + * unregistering until this context releases the nbcon.
> + */
> + int cookie = console_srcu_read_lock();

[ later update: maybe skip 30 lines and read the "Hum, ho" part first]
[ even later update: or skip 60 lines and read "Win win" part first.]

OK, this makes sense. But I feel like we are in a lock cycle.
This code is called under "up->lock()". It will be taken also by
the newcon->device_lock() in register_console() so we would have:

+ register_console()
+ console_list_lock() // serializes SRCU access to console list.
+ con->device_lock()
+ spin_lock(&up->lock)

=> console_list_lock -> up->lock

and here

+ uart_port_lock()
+ spin_lock(&up->lock)
+ __uart_port_nbcon_acquire()
+ console_srcu_read_lock() // SRCU read lock serialized by console_list_lock

=> up->lock -> SRCU read lock serialized by console_list_lock

and for completeness:

+ unregister_console()
+ console_list_lock()
+ unregister_console_locked()
+ synchronize_srcu(&console_srcu);


OK, it works because srcu_read_lock() is not blocking.
The synchronize_srcu() is called under console_list_lock(), so
the only important thing is not to take console_list_lock() under
console_srcu_read_lock().


Hum, ho, it took me some time to create a mental model around this.
It is not that complicated after all:

+ console_list_lock(): serializes the entire console (un)registration
operation. Well, it primarily serializes
the console_list manipulation, including up->cons->node,
which is tested below.

+ console_lock(): serializes emitting messages on legacy and
boot consoles

+ con->device_lock aka port->lock: serializes more actions:

1. any non-printk related access to the device (HW) like
a generic read/write.

2. Access to the device by con->write() for legacy consoles.

3. console registration, in particular console_list
manipulation, including up->cons->node. It is needed
to avoid races when the non-printk code has to decide
whether it needs to get serialized against nbcon
consoles or not.

For example, it should prevent races in
__uart_port_nbcon_acquire(up) and
__uart_port_nbcon_release(up) which are added in this patch.

But wait, we take con->device_lock() only in register_console().

Is this correct?

IMHO, it is a bug. We should (must) take con->device_lock()
also in unregister_console() when manipulating the
console_list and up->cons->node. Otherwise, uart_console(up)
would provide racy results.


Win win situation:

If we take con->device_lock() in unregister_console() around
console_list manipulation then the console could never
disappear or be assigned to another port when both:

uart_console(up) && hlist_unhashed_lockless(&up->cons->node)

are "true"

=> we would not need to take console_srcu_read_lock() here
=> we would not need to store/check up->line

Heh, we would not even need "bool locked" because

uart_console(up) && hlist_unhashed_lockless(&up->cons->node)

would always return the same result, even in __uart_port_nbcon_release()

=> easier code, straight serialization rules, no races.


> +
> + /* Ensure console is registered and is an nbcon console. */
> + if (!hlist_unhashed_lockless(&up->cons->node) &&
> + (console_srcu_read_flags(up->cons) & CON_NBCON)) {
> + WARN_ON_ONCE(up->cons->nbcon_drvdata->locked);
> +
> + nbcon_driver_acquire(up->cons);
> +
> + /*
> + * Record @up->line to be used during release because
> + * @up->cons->index can change while the port and
> + * nbcon are locked.
> + */
> + up->cons->nbcon_drvdata->owner_index = up->line;
> + up->cons->nbcon_drvdata->srcu_cookie = cookie;
> + up->cons->nbcon_drvdata->locked = true;
> + } else {
> + console_srcu_read_unlock(cookie);
> + }
> + }
> +}
> +
> +/* Only for internal port lock wrapper usage. */
> +static inline void __uart_port_nbcon_release(struct uart_port *up)
> +{
> + lockdep_assert_held_once(&up->lock);
> +
> + /*
> + * uart_console() cannot be used here because @up->cons->index might
> + * have changed. Check against @up->cons->nbcon_drvdata->owner_index
> + * instead.
> + */
> +
> + if (unlikely(up->cons &&
> + up->cons->nbcon_drvdata &&
> + up->cons->nbcon_drvdata->locked &&
> + up->cons->nbcon_drvdata->owner_index == up->line)) {
> + WARN_ON_ONCE(!up->cons->nbcon_drvdata->locked);

The WARN_ON_ONCE() would never trigger because
"up->cons->nbcon_drvdata->locked" is checked by the above
if-condition.

I hope that we can replace this with the same checks as in the
acquire() part, as proposed above.


> +
> + up->cons->nbcon_drvdata->locked = false;
> + nbcon_driver_release(up->cons);
> + console_srcu_read_unlock(up->cons->nbcon_drvdata->srcu_cookie);
> + }
> +}
> +
> /**
> * uart_port_lock - Lock the UART port
> * @up: Pointer to UART port structure
> @@ -654,7 +741,11 @@ static inline bool uart_port_trylock(struct uart_port *up)
> */
> static inline bool uart_port_trylock_irqsave(struct uart_port *up, unsigned long *flags)
> {
> - return spin_trylock_irqsave(&up->lock, *flags);
> + if (!spin_trylock_irqsave(&up->lock, *flags))
> + return false;
> +
> + __uart_port_nbcon_acquire(up);

I would feel more comfortable if we created
__uart_port_nbcon_try_acquire(up). It would give up when it could
not acquire the context within the given timeout.

It would be similar to acquire(). The only difference would be that
it would return false on failure. And it would call:

/**
* nbcon_driver_try_acquire - Try acquire nbcon console and enter unsafe section
* @con: The nbcon console to acquire
*
* Context: Any context which could not be migrated to another CPU.
*
* Console drivers will usually use their own internal synchronization
* mechanism to synchronize between console printing and non-printing
* activities (such as setting baud rates). However, nbcon console drivers
* supporting atomic consoles may also want to mark unsafe sections when
* performing non-printing activities.
*
* This function tries to acquire the nbcon console using priority
* NBCON_PRIO_NORMAL and marks it unsafe for handover/takeover.
*
* Return: true on success, false when it was not able to acquire the
* console and set it "unsafe" for a takeover.
*/
bool nbcon_driver_try_acquire(struct console *con)
{
struct nbcon_context *ctxt = &ACCESS_PRIVATE(con->nbcon_drvdata, ctxt);

cant_migrate();

memset(ctxt, 0, sizeof(*ctxt));
ctxt->console = con;
ctxt->prio = NBCON_PRIO_NORMAL;

if (!nbcon_context_try_acquire(ctxt))
return false;

if (!nbcon_context_enter_unsafe(ctxt))
return false;

return true;
}

It is probably not that important because it should not block emitting
the emergency or panic messages. They would use NBCON_PRIO_EMERGENCY
or NBCON_PRIO_PANIC in the important code paths.

But it looks semantically wrong to use a potentially blocking function
in a try_lock() API. IMHO, it would be a call for trouble.

> + return true;
> }
>
> /**
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 2516449f921d..38328cf0fd5c 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -988,3 +991,52 @@ void nbcon_free(struct console *con)
>
> con->pbufs = NULL;
> }
> +
> +/**
> + * nbcon_driver_acquire - Acquire nbcon console and enter unsafe section
> + * @con: The nbcon console to acquire
> + *
> + * Context: Any context which could not be migrated to another CPU.
> + *
> + * Console drivers will usually use their own internal synchronization
> * mechanism to synchronize between console printing and non-printing
> + * activities (such as setting baud rates). However, nbcon console drivers
> + * supporting atomic consoles may also want to mark unsafe sections when
> + * performing non-printing activities.
> + *
> + * This function acquires the nbcon console using priority NBCON_PRIO_NORMAL
> + * and marks it unsafe for handover/takeover.
> + *
> + * Console drivers using this function must have provided @nbcon_drvdata in
> + * their struct console, which is used to track ownership and state
> + * information.
> + */
> +void nbcon_driver_acquire(struct console *con)
> +{
> + struct nbcon_context *ctxt = &ACCESS_PRIVATE(con->nbcon_drvdata, ctxt);

Hmm, we need to store somewhere the "struct nbcon_context" for this
generic purpose. If we agreed to remove struct nbcon_drvdata then
I would store it in struct console as

struct console {
[...]
/**
* @driver_nbcon_context:
*
* nbcon_context used to serialize non-printing operations on
* the same device.
*
* The device drivers synchronize these operations with a driver-specific
* lock, such as port->lock in the serial consoles. When the
* device is registered as a console, they additionally have to acquire
* this nbcon context to get serialized against the atomic_write()
* callback using the same device.
*
* The struct does not require any special initialization.
*/
struct nbcon_context driver_nbcon_context;
[...]
};

It will be unused for legacy consoles. But the plan is to convert all
console drivers anyway.

IMHO, passing it via an optional pointer is not worth the complexity.

> +
> + cant_migrate();
> +
> + do {
> + do {
> + memset(ctxt, 0, sizeof(*ctxt));
> + ctxt->console = con;
> + ctxt->prio = NBCON_PRIO_NORMAL;
> + } while (!nbcon_context_try_acquire(ctxt));
> +
> + } while (!nbcon_context_enter_unsafe(ctxt));
> +}
> +EXPORT_SYMBOL_GPL(nbcon_driver_acquire);

Best Regards,
Petr

2024-04-10 13:10:02

by Petr Mladek

Subject: Re: [PATCH printk v4 10/27] printk: nbcon: Do not rely on proxy headers

On Wed 2024-04-03 00:17:12, John Ogness wrote:
> The headers kernel.h, serial_core.h, and console.h allow for the
> definitions of many types and functions from other headers.
> Rather than relying on these as proxy headers, explicitly
> include all headers providing needed definitions. Also sort the
> list alphabetically to be able to easily detect duplicates.
>
> Suggested-by: Andy Shevchenko <[email protected]>
> Signed-off-by: John Ogness <[email protected]>

Looks good. Well, I haven't really checked the list of includes one by
one, so I only provide:

Acked-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-10 15:01:54

by Petr Mladek

Subject: Re: [PATCH printk v4 15/27] printk: nbcon: Provide function to flush using write_atomic()

On Wed 2024-04-03 00:17:17, John Ogness wrote:
> From: Thomas Gleixner <[email protected]>
>
> Provide nbcon_atomic_flush_pending() to perform flushing of all
> registered nbcon consoles using their write_atomic() callback.
>
> Unlike console_flush_all(), nbcon_atomic_flush_pending() will
> only flush up through the newest record at the time of the
> call. This prevents a CPU from printing unbounded when other
> CPUs are adding records.
>
> Also unlike console_flush_all(), nbcon_atomic_flush_pending()
> will fully flush one console before flushing the next. This
> helps to guarantee that a block of pending records (such as
> a stack trace in an emergency situation) can be printed
> atomically at once before releasing console ownership.
>
> nbcon_atomic_flush_pending() is safe in any context because it
> uses write_atomic() and acquires with unsafe_takeover disabled.
>
> Use it in console_flush_on_panic() before flushing legacy
> consoles. The legacy write() callbacks are not fully safe when
> oops_in_progress is set.
>
> Co-developed-by: John Ogness <[email protected]>
> Signed-off-by: John Ogness <[email protected]>
> Signed-off-by: Thomas Gleixner (Intel) <[email protected]>

Reviewed-by: Petr Mladek <[email protected]>

See few nits below.

> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -937,6 +935,108 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> return nbcon_context_exit_unsafe(ctxt);
> }
>
> +/**
> + * __nbcon_atomic_flush_pending_con - Flush specified nbcon console using its
> + * write_atomic() callback
> + * @con: The nbcon console to flush
> + * @stop_seq: Flush up until this record
> + *
> + * Return: True if taken over while printing. Otherwise false.
> + *
> + * If flushing up to @stop_seq was not successful, it only makes sense for the
> + * caller to try again when true was returned. When false is returned, either
> + * there are no more records available to read or this context is not allowed
> + * to acquire the console.
> + */
> +static bool __nbcon_atomic_flush_pending_con(struct console *con, u64 stop_seq)
> +{
> + struct nbcon_write_context wctxt = { };
> + struct nbcon_context *ctxt = &ACCESS_PRIVATE(&wctxt, ctxt);
> +
> + ctxt->console = con;
> + ctxt->spinwait_max_us = 2000;
> + ctxt->prio = NBCON_PRIO_NORMAL;

Nit: It looks strange to hardcode NBCON_PRIO_NORMAL and call it from
console_flush_on_panic() in the same patch.

I see. It will get replaced by nbcon_get_default_prio() later.
I guess that it is just a relic from several reworks and
shuffling. I know that it is hard.

It might have been better to either add the call in
console_flush_on_panic() later, or introduce nbcon_get_default_prio()
earlier so that we could use it here.


> +
> + if (!nbcon_context_try_acquire(ctxt))
> + return false;
> +
> + while (nbcon_seq_read(con) < stop_seq) {
> + /*
> + * nbcon_emit_next_record() returns false when the console was
> + * handed over or taken over. In both cases the context is no
> + * longer valid.
> + */
> + if (!nbcon_emit_next_record(&wctxt))
> + return true;
> +
> + if (!ctxt->backlog)
> + break;
> + }
> +
> + nbcon_context_release(ctxt);
> +
> + return false;
> +}
> +
> +/**
> + * __nbcon_atomic_flush_pending - Flush all nbcon consoles using their
> + * write_atomic() callback
> + * @stop_seq: Flush up until this record
> + */
> +static void __nbcon_atomic_flush_pending(u64 stop_seq)
> +{
> + struct console *con;
> + bool should_retry;
> + int cookie;
> +
> + do {
> + should_retry = false;
> +
> + cookie = console_srcu_read_lock();
> + for_each_console_srcu(con) {
> + short flags = console_srcu_read_flags(con);
> + unsigned long irq_flags;
> +
> + if (!(flags & CON_NBCON))
> + continue;
> +
> + if (!console_is_usable(con, flags))
> + continue;
> +
> + if (nbcon_seq_read(con) >= stop_seq)
> + continue;
> +
> + /*
> + * Atomic flushing does not use console driver
> + * synchronization (i.e. it does not hold the port
> + * lock for uart consoles). Therefore IRQs must be
> + * disabled to avoid being interrupted and then
> + * calling into a driver that will deadlock trying
> + * to acquire console ownership.
> + */
> + local_irq_save(irq_flags);
> +
> + should_retry |= __nbcon_atomic_flush_pending_con(con, stop_seq);

Nit: I have to say that this is quite cryptic. The "true" return value
usually means success. But it sets "should_retry" here.

It would mean slightly more code but it would be much clearer
if __nbcon_atomic_flush_pending_con() returned:

+ 0 on success
+ -EAGAIN when it lost the ownership
+ -EPERM when it can't get the ownership, like nbcon_context_try_acquire_direct()

and we made the decision here.

> +
> + local_irq_restore(irq_flags);
> + }
> + console_srcu_read_unlock(cookie);
> + } while (should_retry);
> +}

Neither of the nits is a blocker. They are basically just about potential
complications for the future code archaeologists.

Best Regards,
Petr

2024-04-11 14:15:53

by Petr Mladek

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On Wed 2024-04-03 00:17:19, John Ogness wrote:
> Allow nbcon consoles to print messages in the legacy printk()
> caller context (printing via unlock) by integrating them into
> console_flush_all(). The write_atomic() callback is used for
> printing.

Hmm, this patch tries to flush nbcon console even in context
with NBCON_PRIO_NORMAL. Do we really want this, please?

I would expect that it would do so only when the kthread
is not working.

> Provide nbcon_legacy_emit_next_record(), which acts as the
> nbcon variant of console_emit_next_record(). Call this variant
> within console_flush_all() for nbcon consoles. Since nbcon
> consoles use their own @nbcon_seq variable to track the next
> record to print, this also must be appropriately handled.

I have been a bit confused by all the boolean return values
and what _exactly_ they mean. IMHO, we should make it more
clear how it works when it can't acquire the context.

IMHO, it is important because console_flush_all() interprets
nbcon_legacy_emit_next_record() return value as @progress even when
there is no guaranteed progress. We just expect that
the other context is doing something.

It feels like it might get stuck forever in some situation.
It would be good to understand if it is OK or not.


Later update:

Hmm, console_flush_all() is called from console_unlock().
It might be called in atomic context. But the current
owner might be theoretically scheduled out.

This is from documentation of nbcon_context_try_acquire()

/**
* nbcon_context_try_acquire - Try to acquire nbcon console
* @ctxt: The context of the caller
*
* Context: Any context which could not be migrated to another CPU.


I can't find any situation where nbcon_context_try_acquire() is
currently called in normal (schedulable) context. This is probably
why you did not see any problems with testing.

I see 3 possible solutions:

1. Enforce that nbcon context can be acquired only with preemption
disabled.

2. Enforce that nbcon context can be acquired only with interrupts
disabled. It would prevent a deadlock when some future
code interrupts a flush in NBCON_PRIO_EMERGENCY context.
And then a potential nested console_flush_all() won't be
able to take over the interrupted NBCON_PRIO_CONTEXT
and there will be no progress.

3. console_flush_all() should ignore nbcon console when
it is not able to get the context, aka no progress.


I personally prefer the 3rd solution because I have spent
the last 12 years on attempts to move printk into preemptible
context. And it looks wrong to move it into atomic context.

Warning: console_flush_all() suddenly won't guarantee flushing
all messages.

I am not completely sure about all the consequences until
I see the rest of the patchset and the kthread integration.
We will somehow need to guarantee that all messages
are flushed.


I propose the following changes to make console_flush_all()
safe. The patch is made on top of this patchset. Feel free
to squash them into your patch and remove my SoB.


From 8dd9c0be5ab693c6976a0f7f8b99de48ecebcbf6 Mon Sep 17 00:00:00 2001
From: Petr Mladek <[email protected]>
Date: Thu, 11 Apr 2024 15:45:53 +0200
Subject: [PATCH] printk: nbcon: Prevent deadlock when flushing nbcon consoles
in the legacy loop

Ignore pending records in nbcon consoles in the legacy console_flush_all()
loop when the nbcon console context can't be acquired. They will be
flushed either by the current nbcon context owner or to-be-introduced
printk kthread.

Signed-off-by: Petr Mladek <[email protected]>
---
kernel/printk/nbcon.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
index c57ad34a8d56..88c40f165be4 100644
--- a/kernel/printk/nbcon.c
+++ b/kernel/printk/nbcon.c
@@ -964,8 +964,12 @@ static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
* write_atomic() callback
* @wctxt: An initialized write context struct to use for this context
*
- * Return: False if it is known there are no more records to print,
- * otherwise true.
+ * Return: True, when a record has been printed and there are still
+ * pending records. The caller might want to continue flushing.
+ *
+ * False, when there is no pending record, or when the console
+ * context can't be acquired, or the ownership has been lost.
> + * The caller should give up. Either the job is done or it can't be done.
*
* This is an internal helper to handle the locking of the console before
* calling nbcon_emit_next_record().
@@ -975,7 +979,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
struct nbcon_context *ctxt = &ACCESS_PRIVATE(wctxt, ctxt);

if (!nbcon_context_try_acquire(ctxt))
- return true;
+ return false;

/*
* nbcon_emit_next_record() returns false when the console was
@@ -983,7 +987,7 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
* longer valid.
*/
if (!nbcon_emit_next_record(wctxt))
- return true;
+ return false;

nbcon_context_release(ctxt);

@@ -1023,8 +1027,13 @@ enum nbcon_prio nbcon_get_default_prio(void)
* @cookie: The cookie from the SRCU read lock.
*
* Context: Any context except NMI.
- * Return: False if the given console has no next record to print,
- * otherwise true.
+ *
+ * Return: True, when a record has been printed and there are still
+ * pending records. The caller might want to continue flushing.
+ *
+ * False, when there is no pending record, or when the console
+ * context can't be acquired, or the ownership has been lost.
> + * The caller should give up. Either the job is done or it can't be done.
*
* This function is meant to be called by console_flush_all() to print records
* on nbcon consoles from legacy context (printing via console unlocking).
--
2.44.0


Best Regards,
Petr

2024-04-11 14:19:23

by Petr Mladek

Subject: Re: [PATCH printk v4 19/27] printk: nbcon: Add unsafe flushing on panic

On Wed 2024-04-03 00:17:21, John Ogness wrote:
> Add nbcon_atomic_flush_unsafe() to flush all nbcon consoles
> using the write_atomic() callback and allowing unsafe hostile
> takeovers. Call this at the end of panic() as a final attempt
> to flush any pending messages.
>
> Note that legacy consoles use unsafe methods for flushing
> from the beginning of panic (see bust_spinlocks()). Therefore,
> systems using both legacy and nbcon consoles may still fail to
> see panic messages due to unsafe legacy console usage.
>
> Signed-off-by: John Ogness <[email protected]>

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-11 15:46:09

by Petr Mladek

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On Thu 2024-04-11 16:14:58, Petr Mladek wrote:
> On Wed 2024-04-03 00:17:19, John Ogness wrote:
> > Allow nbcon consoles to print messages in the legacy printk()
> > caller context (printing via unlock) by integrating them into
> > console_flush_all(). The write_atomic() callback is used for
> > printing.
>
> Hmm, this patch tries to flush nbcon console even in context
> with NBCON_PRIO_NORMAL. Do we really want this, please?
>
> I would expect that it would do so only when the kthread
> is not working.
>
> > Provide nbcon_legacy_emit_next_record(), which acts as the
> > nbcon variant of console_emit_next_record(). Call this variant
> > within console_flush_all() for nbcon consoles. Since nbcon
> > consoles use their own @nbcon_seq variable to track the next
> > record to print, this also must be appropriately handled.
>
> I have been a bit confused by all the boolean return values
> and what _exactly_ they mean. IMHO, we should make it more
> clear how it works when it can't acquire the context.
>
> IMHO, it is important because console_flush_all() interprets
> nbcon_legacy_emit_next_record() return value as @progress even when
> there is no guaranteed progress. We just expect that
> the other context is doing something.
>
> It feels like it might get stuck forever in some situation.
> It would be good to understand if it is OK or not.
>
>
> Later update:
>
> Hmm, console_flush_all() is called from console_unlock().
> It might be called in atomic context. But the current
> owner might be theoretically scheduled out.
>
> This is from documentation of nbcon_context_try_acquire()
>
> /**
> * nbcon_context_try_acquire - Try to acquire nbcon console
> * @ctxt: The context of the caller
> *
> * Context: Any context which could not be migrated to another CPU.
>
>
> I can't find any situation where nbcon_context_try_acquire() is
> currently called in normal (schedulable) context. This is probably
> why you did not see any problems with testing.
>
> I see 3 possible solutions:
>
> 1. Enforce that nbcon context can be acquired only with preemption
> disabled.
>
> 2. Enforce that nbcon context can be acquired only with interrupts
> disabled. It would prevent a deadlock when some future
> code interrupts a flush in NBCON_PRIO_EMERGENCY context.
> And then a potential nested console_flush_all() won't be
> able to take over the interrupted NBCON_PRIO_CONTEXT
> and there will be no progress.
>
> 3. console_flush_all() should ignore nbcon console when
> it is not able to get the context, aka no progress.
>
>
> I personally prefer the 3rd solution because I have spent
> the last 12 years on attempts to move printk into preemptible
> context. And it looks wrong to move it into atomic context.
>
> Warning: console_flush_all() suddenly won't guarantee flushing
> all messages.
>
> I am not completely sure about all the consequences until
> I see the rest of the patchset and the kthread integration.
> We will somehow need to guarantee that all messages
> are flushed.

I am trying to form a full picture of when and how the nbcon consoles
will get flushed. My current understanding and view is the following,
starting from the easiest priority:


1. NBCON_PRIO_PANIC messages will be flushed by calling
nbcon_atomic_flush_pending() directly in vprintk_emit()

This will take care of any previously added messages.

Non-panic CPUs are not allowed to add messages anymore
when there is a panic in progress.

[ALL OK]


2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().

This would cover all previously added messages, including
the ones printed by the code between
nbcon_cpu_emergency_enter()/exit().

This won't cover later added messages which might be
a problem. Let's look at this closer. Later added
messages with:

+ NBCON_PRIO_PANIC will be handled in vprintk_emit()
as explained above [OK]

+ NBCON_PRIO_EMERGENCY will be handled in the
related nbcon_cpu_emergency_exit() as described here.
[OK]

+ NBCON_PRIO_NORMAL will be handled, see below. [?]

[ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]


3. NBCON_PRIO_NORMAL messages will be flushed by:

+ the printk kthread when it is available

+ the legacy loop via

+ console_unlock()
+ console_flush_all()
+ nbcon_legacy_emit_next_record() [PROBLEM]


PROBLEM: console_flush_all() does not guarantee progress with
nbcon consoles as explained above (previous mail).


My proposal:

1. console_flush_all() will flush nbcon consoles only
in NBCON_PRIO_NORMAL and when the kthreads are not
available.

It will make it clear that this is the flusher in
this situation.


2. Allow to skip nbcon consoles in console_flush_all() when
it can't take the context (as suggested in my previous
reply).

This won't guarantee flushing NORMAL messages added
while nbcon_cpu_emergency_exit() calls
nbcon_atomic_flush_pending().

Solve this problem by introducing[*] nbcon_atomic_flush_all()
which would flush even newly added messages and
call this in nbcon_cpu_emergency_exit() when the printk
kthread does not work. It should bail out when there
is a panic in progress.

Motivation: It does not matter which "atomic" context
flushes NORMAL/EMERGENCY messages when
the printk kthread is not available.

[*] Alternatively we could modify nbcon_atomic_flush_pending()
to flush even newly added messages when the kthread is
not working. But it might create another mess.

How does it sound, please?
Or do I miss anything?

Best Regards,
Petr

2024-04-12 09:07:50

by Petr Mladek

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On Thu 2024-04-11 16:14:58, Petr Mladek wrote:
> On Wed 2024-04-03 00:17:19, John Ogness wrote:
> > Allow nbcon consoles to print messages in the legacy printk()
> > caller context (printing via unlock) by integrating them into
> > console_flush_all(). The write_atomic() callback is used for
> > printing.
>
> Hmm, this patch tries to flush nbcon console even in context
> with NBCON_PRIO_NORMAL. Do we really want this, please?
>
> I would expect that it would do so only when the kthread
> is not working.
>
> > Provide nbcon_legacy_emit_next_record(), which acts as the
> > nbcon variant of console_emit_next_record(). Call this variant
> > within console_flush_all() for nbcon consoles. Since nbcon
> > consoles use their own @nbcon_seq variable to track the next
> > record to print, this also must be appropriately handled.
>
> I have been a bit confused by all the boolean return values
> and what _exactly_ they mean. IMHO, we should make it more
> clear how it works when it can't acquire the context.
>
> IMHO, it is important because console_flush_all() interprets
> nbcon_legacy_emit_next_record() return value as @progress even when
> there is no guaranteed progress. We just expect that
> the other context is doing something.
>
> It feels like it might get stuck forever in some situation.
> It would be good to understand if it is OK or not.
>
>
> Later update:
>
> Hmm, console_flush_all() is called from console_unlock().
> It might be called in atomic context. But the current
> owner might be theoretically scheduled out.
>
> This is from documentation of nbcon_context_try_acquire()
>
> /**
> * nbcon_context_try_acquire - Try to acquire nbcon console
> * @ctxt: The context of the caller
> *
> * Context: Any context which could not be migrated to another CPU.
>
>
> I can't find any situation where nbcon_context_try_acquire() is
> currently called in normal (schedulable) context. This is probably
> why you did not see any problems with testing.

> I see 3 possible solutions:
>
> 1. Enforce that nbcon context can be acquired only with preemption
> disabled.

We actually have to make sure that preemption is disabled because
nbcon_owner_matches() is not reliable after a wakeup.

The context might be taken by a higher-priority context, then
released, and then taken by another task on the same CPU as
the original sleeping owner. I mean this:


CPU0 CPU1

[ task A ]

nbcon_context_try_acquire()
# success with NORMAL prio
# .unsafe == false; // safe for takeover

[ schedule: task A -> B ]


WARN_ON()
nbcon_atomic_flush_pending()
nbcon_context_try_acquire()
# success with EMERGENCY prio
# .unsafe == false; // safe for takeover

# flushing
nbcon_context_release()


nbcon_context_try_acquire()
# success with NORMAL prio [ task B ]
# .unsafe == false; // safe for takeover

[ schedule: task B -> A ]

nbcon_enter_unsafe()
nbcon_context_can_proceed()

BUG: nbcon_context_can_proceed() returns "true" because
the console is owned by a context on CPU0 with
NBCON_PRIO_NORMAL.

But it should return "false". The console is owned
by a context from task B and we do the check
in a context from task A.


I guess that most of the current code is safe because, for example:

+ __nbcon_atomic_flush_pending() disables interrupts before
acquiring the context

+ nbcon_driver_acquire() is called under spin_lock in
the uart_port_*lock() API.

+ Even the nbcon_kthread_func() in the current RT tree
acquires the context under con->device_lock(). Where
the device_lock() is a spin_lock in the only supported
uart serial console.


To be done:

1. We should make this clear:

+ Add either preempt_disable() or cant_sleep() into
nbcon_context_try_acquire().

+ Replace cant_migrate() with cant_sleep() everywhere

+ Fix/update the documentation


2. We should make sure that the context is acquired for each
emitted record separately at least when using the normal
priority.

For example, __nbcon_atomic_flush_pending() is wrong from
this POV. It is also used from console_unlock(). It should
allow scheduling in between the records in this case.


Best Regards,
Petr

PS: I am still shaking my head over this. Sigh, I hadn't expected
such a big "aha moment" at this stage.

2024-04-12 14:29:39

by Petr Mladek

Subject: Re: [PATCH printk v4 20/27] printk: Avoid console_lock dance if no legacy or boot consoles

On Wed 2024-04-03 12:07:32, John Ogness wrote:
> On 2024-04-03, John Ogness <[email protected]> wrote:
> > diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> > index df84c6bfbb2d..4ff3800e8e8e 100644
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -3810,6 +3833,7 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> > u64 last_diff = 0;
> > u64 printk_seq;
> > short flags;
> > + bool locked;
> > int cookie;
> > u64 diff;
> > u64 seq;
> > @@ -3819,22 +3843,35 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> > seq = prb_next_reserve_seq(prb);
> >
> > /* Flush the consoles so that records up to @seq are printed. */
> > - console_lock();
> > - console_unlock();
> > + if (printing_via_unlock) {
> > + console_lock();
> > + console_unlock();
> > + }
> >
> > for (;;) {
> > unsigned long begin_jiffies;
> > unsigned long slept_jiffies;
> >
> > + locked = false;
> > diff = 0;
> >
> > + if (printing_via_unlock) {
> > + /*
> > + * Hold the console_lock to guarantee safe access to
> > + * console->seq. Releasing console_lock flushes more
> > + * records in case @seq is still not printed on all
> > + * usable consoles.
> > + */
> > + console_lock();
> > + locked = true;
> > + }
> > +
> > /*
> > - * Hold the console_lock to guarantee safe access to
> > - * console->seq. Releasing console_lock flushes more
> > - * records in case @seq is still not printed on all
> > - * usable consoles.
> > + * Ensure the compiler does not optimize @locked to be
> > + * @printing_via_unlock since the latter can change at any
> > + * time.
> > */
> > - console_lock();
> > + barrier();
>
> When I originally implemented this, my objective was to force the
> compiler to use a local variable. But to be fully paranoid (which we
> should be) we must also forbid the compiler from being able to do this
> nonsense:
>
> if (printing_via_unlock) {
> console_lock();
> locked = printing_via_unlock;
> }
>
> I suggest replacing the __pr_flush() hunks to be as follows instead
> (i.e. ensure all conditional console lock usage within the loop is using
> the local variable):
>
> diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
> index df84c6bfbb2d..1dbd2a837b67 100644
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -3819,22 +3842,34 @@ static bool __pr_flush(struct console *con, int timeout_ms, bool reset_on_progre
> seq = prb_next_reserve_seq(prb);
>
> /* Flush the consoles so that records up to @seq are printed. */
> - console_lock();
> - console_unlock();
> + if (printing_via_unlock) {
> + console_lock();
> + console_unlock();
> + }
>
> for (;;) {
> unsigned long begin_jiffies;
> unsigned long slept_jiffies;
> -
> - diff = 0;
> + bool use_console_lock = printing_via_unlock;
>
> /*
> - * Hold the console_lock to guarantee safe access to
> - * console->seq. Releasing console_lock flushes more
> - * records in case @seq is still not printed on all
> - * usable consoles.
> + * Ensure the compiler does not optimize @use_console_lock to
> + * be @printing_via_unlock since the latter can change at any
> + * time.
> */
> - console_lock();
> + barrier();

Indeed, this looks better. I have missed this as well.

It seems that using barrier() is more tricky than I expected.
I wonder if it would be better to use WRITE_ONCE()/READ_ONCE().
It would be more clear what we want to achieve. Also it would let
the compiler optimize other things. Not that performance is
important here but...

Otherwise, the patch looks fine. With this change, feel free to use:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-12 14:34:27

by Petr Mladek

Subject: Re: [PATCH printk v4 21/27] printk: Track nbcon consoles

On Wed 2024-04-03 00:17:23, John Ogness wrote:
> Add a global flag @have_nbcon_console to identify if any nbcon
> consoles are registered. This will be used in follow-up commits
> to preserve legacy behavior when no nbcon consoles are registered.
>
> Signed-off-by: John Ogness <[email protected]>

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-12 14:39:54

by Petr Mladek

Subject: Re: [PATCH printk v4 22/27] printk: Coordinate direct printing in panic

On Wed 2024-04-03 00:17:24, John Ogness wrote:
> Perform printing by nbcon consoles on the panic CPU from the
> printk() caller context in order to get panic messages printed
> as soon as possible.
>
> If legacy and nbcon consoles are registered, the legacy consoles
> will no longer perform direct printing on the panic CPU until
> after the backtrace has been stored. This will give the safe
> nbcon consoles a chance to print the panic messages before
> allowing the unsafe legacy consoles to print.
>
> If no nbcon consoles are registered, there is no change in
> behavior (i.e. legacy consoles will always attempt to print
> from the printk() caller context).
>
> Signed-off-by: John Ogness <[email protected]>

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-12 15:28:07

by Petr Mladek

Subject: Re: [PATCH printk v4 23/27] printk: nbcon: Implement emergency sections

On Wed 2024-04-03 00:17:25, John Ogness wrote:
> From: Thomas Gleixner <[email protected]>
>
> In emergency situations (something has gone wrong but the
> system continues to operate), usually important information
> (such as a backtrace) is generated via printk(). Each
> individual printk record has little meaning. It is the
> collection of printk messages that is most often needed by
> developers and users.
>
> In order to help ensure that the collection of printk messages
> in an emergency situation are all stored to the ringbuffer as
> quickly as possible, disable console output for that CPU while
> it is in the emergency situation. The consoles need to be
> flushed when exiting the emergency situation.
>
> Add per-CPU emergency nesting tracking because an emergency
> can arise while in an emergency situation.
>
> Add functions to mark the beginning and end of emergency
> sections where the urgent messages are generated.
>
> Do not print if the current CPU is in an emergency state.
>
> When exiting all emergency nesting, flush nbcon consoles
> directly using their atomic callback. Legacy consoles are
> triggered for flushing via irq_work because it is not known
> if the context was safe for a trylock on the console lock.
>
> Note that the emergency state is not system-wide. While one CPU
> is in an emergency state, another CPU may continue to print
> console messages.
>
> Co-developed-by: John Ogness <[email protected]>
> Signed-off-by: John Ogness <[email protected]>
> Signed-off-by: Thomas Gleixner (Intel) <[email protected]>
> ---
> include/linux/console.h | 4 ++
> kernel/printk/nbcon.c | 83 +++++++++++++++++++++++++++++++++++++++++
> kernel/printk/printk.c | 13 ++++++-
> 3 files changed, 98 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index 5f1758891aec..7712e4145164 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -559,10 +559,14 @@ static inline bool console_is_registered(const struct console *con)
> hlist_for_each_entry(con, &console_list, node)
>
> #ifdef CONFIG_PRINTK
> +extern void nbcon_cpu_emergency_enter(void);
> +extern void nbcon_cpu_emergency_exit(void);
> extern bool nbcon_can_proceed(struct nbcon_write_context *wctxt);
> extern bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt);
> extern bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt);
> #else
> +static inline void nbcon_cpu_emergency_enter(void) { }
> +static inline void nbcon_cpu_emergency_exit(void) { }
> static inline bool nbcon_can_proceed(struct nbcon_write_context *wctxt) { return false; }
> static inline bool nbcon_enter_unsafe(struct nbcon_write_context *wctxt) { return false; }
> static inline bool nbcon_exit_unsafe(struct nbcon_write_context *wctxt) { return false; }
> diff --git a/kernel/printk/nbcon.c b/kernel/printk/nbcon.c
> index 47f39402a22b..4c852c2e8d89 100644
> --- a/kernel/printk/nbcon.c
> +++ b/kernel/printk/nbcon.c
> @@ -936,6 +936,29 @@ static bool nbcon_emit_next_record(struct nbcon_write_context *wctxt)
> return nbcon_context_exit_unsafe(ctxt);
> }
>
> +/* Track the nbcon emergency nesting per CPU. */
> +static DEFINE_PER_CPU(unsigned int, nbcon_pcpu_emergency_nesting);
> +static unsigned int early_nbcon_pcpu_emergency_nesting __initdata;
> +
> +/**
> + * nbcon_get_cpu_emergency_nesting - Get the per CPU emergency nesting pointer
> + *
> + * Return: Either a pointer to the per CPU emergency nesting counter of
> + * the current CPU or to the init data during early boot.
> + */
> +static __ref unsigned int *nbcon_get_cpu_emergency_nesting(void)
> +{
> + /*
> + * The value of __printk_percpu_data_ready gets set in normal
> + * context and before SMP initialization. As a result it could
> + * never change while inside an nbcon emergency section.
> + */
> + if (!printk_percpu_data_ready())
> + return &early_nbcon_pcpu_emergency_nesting;
> +
> + return this_cpu_ptr(&nbcon_pcpu_emergency_nesting);
> +}
> +
> /**
> * nbcon_atomic_emit_one - Print one record for an nbcon console using the
> * write_atomic() callback
> @@ -977,9 +1000,15 @@ static bool nbcon_atomic_emit_one(struct nbcon_write_context *wctxt)
> */
> enum nbcon_prio nbcon_get_default_prio(void)
> {
> + unsigned int *cpu_emergency_nesting;
> +
> if (this_cpu_in_panic())
> return NBCON_PRIO_PANIC;
>
> + cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> + if (*cpu_emergency_nesting)
> + return NBCON_PRIO_EMERGENCY;
> +
> return NBCON_PRIO_NORMAL;
> }
>
> @@ -1146,6 +1175,60 @@ void nbcon_atomic_flush_unsafe(void)
> __nbcon_atomic_flush_pending(prb_next_reserve_seq(prb), true);
> }
>
> +/**
> + * nbcon_cpu_emergency_enter - Enter an emergency section where printk()
> + * messages for that CPU are only stored
> + *
> + * Upon exiting the emergency section, all stored messages are flushed.
> + *
> + * Context: Any context. Disables preemption.
> + *
> + * When within an emergency section, no printing occurs on that CPU. This
> + * is to allow all emergency messages to be dumped into the ringbuffer before
> + * flushing the ringbuffer. The actual printing occurs when exiting the
> + * outermost emergency section.
> + */
> +void nbcon_cpu_emergency_enter(void)
> +{
> + unsigned int *cpu_emergency_nesting;
> +
> + preempt_disable();
> +
> + cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> + (*cpu_emergency_nesting)++;
> +}
> +
> +/**
> + * nbcon_cpu_emergency_exit - Exit an emergency section and flush the
> + * stored messages
> + *
> + * Flushing only occurs when exiting all nesting for the CPU.
> + *
> + * Context: Any context. Enables preemption.
> + */
> +void nbcon_cpu_emergency_exit(void)
> +{
> + unsigned int *cpu_emergency_nesting;
> + bool do_trigger_flush = false;
> +
> + cpu_emergency_nesting = nbcon_get_cpu_emergency_nesting();
> +
> + WARN_ON_ONCE(*cpu_emergency_nesting == 0);

We should handle the situation safely and make sure that the counter
does not go below zero.

Also, the WARN_ON_ONCE() might be moved after the flush. IMHO, it is
not that important.


> + if (*cpu_emergency_nesting == 1) {
> + nbcon_atomic_flush_pending();
> + do_trigger_flush = true;
> + }

I wanted to reshuffle the code. Then I realized that the messages
are flushed before decrementing the counter, probably on purpose.

> +
> + /* Undo the nesting count of nbcon_cpu_emergency_enter(). */
> + (*cpu_emergency_nesting)--;

I suggest replacing the above code with something like this:

	/*
	 * Flush the messages before enabling preemption to see them ASAP.
	 *
	 * Reduce the risk of a potential softlockup by using the
	 * flush_pending() variant which ignores messages added later.
	 * It is called before decrementing the counter to also avoid
	 * flushing the messages of another nested emergency section.
	 */
	if (*cpu_emergency_nesting == 1) {
		nbcon_atomic_flush_pending();
		do_trigger_flush = true;
	}

	/* The counter is unsigned, so guard the decrement against underflow. */
	if (!WARN_ON_ONCE(*cpu_emergency_nesting == 0))
		(*cpu_emergency_nesting)--;

> +
> + preempt_enable();
>
> + if (do_trigger_flush)
> + printk_trigger_flush();
> +}
> +
> /**
> * nbcon_alloc - Allocate buffers needed by the nbcon console
> * @con: Console to allocate buffers for

With the above change, feel free to use:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-15 13:39:40

by Petr Mladek

Subject: Re: [PATCH printk v4 24/27] panic: Mark emergency section in warn

On Wed 2024-04-03 00:17:26, John Ogness wrote:
> From: Thomas Gleixner <[email protected]>
>
> Mark the full contents of __warn() as an emergency section. In
> this section, the CPU will not perform console output for the
> printk() calls. Instead, a flushing of the console output is
> triggered when exiting the emergency section.
>
> Co-developed-by: John Ogness <[email protected]>
> Signed-off-by: John Ogness <[email protected]>
> Signed-off-by: Thomas Gleixner (Intel) <[email protected]>

Ok, let's try it:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-15 13:40:10

by Petr Mladek

Subject: Re: [PATCH printk v4 26/27] rcu: Mark emergency section in rcu stalls

On Wed 2024-04-03 00:17:28, John Ogness wrote:
> Mark emergency sections wherever multiple lines of
> rcu stall information are generated. In an emergency
> section the CPU will not perform console output for the
> printk() calls. Instead, a flushing of the console
> output is triggered when exiting the emergency section.
> This allows the full message block to be stored as
> quickly as possible in the ringbuffer.
>
> Signed-off-by: John Ogness <[email protected]>

Seems to be on the right location:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-15 13:53:16

by Petr Mladek

Subject: Re: [PATCH printk v4 25/27] panic: Mark emergency section in oops

On Wed 2024-04-03 00:17:27, John Ogness wrote:
> Mark an emergency section beginning with oops_enter() until the
> end of oops_exit(). In this section, the CPU will not perform
> console output for the printk() calls. Instead, a flushing of the
> console output is triggered when exiting the emergency section.
>
> The very end of oops_exit() performs a kmsg_dump(). This is not
> included in the emergency section because it is another
> flushing mechanism that should occur after the consoles have
> been triggered to flush.
>
> Signed-off-by: John Ogness <[email protected]>

Let's try it:

Reviewed-by: Petr Mladek <[email protected]>

Best Regards,
Petr

2024-04-16 09:51:54

by Petr Mladek

Subject: Re: [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats

On Wed 2024-04-03 00:17:29, John Ogness wrote:
> Mark emergency sections wherever multiple lines of
> lock debugging output are generated. In an emergency
> section the CPU will not perform console output for the
> printk() calls. Instead, a flushing of the console
> output is triggered when exiting the emergency section.
> This allows the full message block to be stored as
> quickly as possible in the ringbuffer.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> kernel/locking/lockdep.c | 91 ++++++++++++++++++++++++++++++++++++++--
> 1 file changed, 88 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 151bd3de5936..80cfbe7b340e 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -5150,6 +5211,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> #endif
> if (unlikely(curr->lockdep_depth >= MAX_LOCK_DEPTH)) {
> debug_locks_off();
> + nbcon_cpu_emergency_enter();
> print_lockdep_off("BUG: MAX_LOCK_DEPTH too low!");
> printk(KERN_DEBUG "depth: %i max: %lu!\n",
> curr->lockdep_depth, MAX_LOCK_DEPTH);
> @@ -5157,6 +5219,7 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
> lockdep_print_held_locks(current);
> debug_show_all_locks();

My concern is that the dump of all locks would not fit into the
buffer. And it might even trigger softlockup. See below.


> dump_stack();
> + nbcon_cpu_emergency_exit();
>
> return 0;
> }
> @@ -6609,6 +6688,7 @@ void debug_show_all_locks(void)
> pr_warn("INFO: lockdep is turned off.\n");
> return;
> }
> + nbcon_cpu_emergency_enter();
> pr_warn("\nShowing all locks held in the system:\n");
>
> rcu_read_lock();

The code dumping the locks looks like:

for_each_process_thread(g, p) {
if (!p->lockdep_depth)
continue;
lockdep_print_held_locks(p);
touch_nmi_watchdog();
touch_all_softlockup_watchdogs();
}

I see two problems here:

1. I believe that the watchdogs are touched for a reason. And they will
be useless when we print everything in a single emergency context
and emit all messages at the end.

2. The default config LOG_BUF_SHIFT is 17. So the default kernel
buffer size is 128kB. The number of messages is bound by
the number of processes. I am afraid that all messages might
not fit into the buffer.


I see two solutions:

1. Take the emergency context only around single dump:

nbcon_cpu_emergency_enter();
lockdep_print_held_locks(p);
nbcon_cpu_emergency_exit();


2. Explicitly flush the printk buffer here. Something like:

lockdep_print_held_locks(p);
nbcon_cpu_emergency_flush();
touch_nmi_watchdog();
touch_all_softlockup_watchdogs();


, where nbcon_cpu_emergency_flush() would do something like:

/**
 * nbcon_cpu_emergency_flush - Explicitly flush consoles in
 *	the middle of an emergency section
 *
 * Both nbcon and legacy consoles are flushed.
 *
 * It should be used only when there are too many messages printed
 * in emergency context, for example, when printing backtraces of all
 * CPUs or processes. It is typically needed when the watchdogs
 * have to be touched as well.
 */
void nbcon_cpu_emergency_flush(void)
{
	/* The explicit flush is needed only in the emergency context. */
	if (!*nbcon_get_cpu_emergency_nesting())
		return;

	nbcon_atomic_flush_pending();

	if (printing_via_unlock && !in_nmi()) {
		if (console_trylock())
			console_unlock();
	}
}


I like the 2nd solution. The rule is more or less clear: the explicit
flush is needed when the code needs to touch the watchdogs.

Maybe, we should go even further and call the flush directly
from the touch_*_*watchdog() API. It has effect only in emergency
context.

Best Regards,
Petr

2024-04-16 11:18:09

by Peter Zijlstra

Subject: Re: [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats

On Wed, Apr 03, 2024 at 12:17:29AM +0206, John Ogness wrote:
> Mark emergency sections wherever multiple lines of
> lock debugging output are generated. In an emergency
> section the CPU will not perform console output for the
> printk() calls. Instead, a flushing of the console
> output is triggered when exiting the emergency section.
> This allows the full message block to be stored as
> quickly as possible in the ringbuffer.

I am confused, when in emergency I want the thing to dump everything to
the atomic thing asap.

Storing it all up runs the risk of never getting to the 'complete' point
because we're dead.

2024-04-16 12:53:48

by John Ogness

Subject: Re: [PATCH printk v4 27/27] lockdep: Mark emergency sections in lockdep splats

On 2024-04-16, Peter Zijlstra <[email protected]> wrote:
>> Mark emergency sections wherever multiple lines of
>> lock debugging output are generated. In an emergency
>> section the CPU will not perform console output for the
>> printk() calls. Instead, a flushing of the console
>> output is triggered when exiting the emergency section.
>> This allows the full message block to be stored as
>> quickly as possible in the ringbuffer.
>
> I am confused, when in emergency I want the thing to dump everything to
> the atomic thing asap.

At LPC 2022 we discussed this and agreed that it is more appropriate to
get the full set of emergency printk messages into the ringbuffer
first. This is fast and lockless. Then we can begin the slow process of
printing.

Perhaps I am making these emergency sections too large. Maybe only
certain backtraces should use them. We will need to gain some experience
here.

> Storing it all up runs the risk of never getting to the 'complete' point
> because we're dead.

If no other debugging mechanisms are available to read the ringbuffer
and no other CPUs are alive to handle the printing: yes, we are
dead. But if the machine is dying so quickly, it would probably die
before getting the first line out anyway.

Note that by "dead" we mean an irresponsive system that does not panic.

John

2024-04-16 15:02:20

by John Ogness

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On 2024-04-09, Petr Mladek <[email protected]> wrote:
>> Console drivers typically must deal with access to the hardware
>> via user input/output (such as an interactive login shell) and
>> output of kernel messages via printk() calls.
>>
>> Follow-up commits require that the printk subsystem is able to
>> synchronize with the driver. Require nbcon consoles to implement
>> two new callbacks (device_lock(), device_unlock()) that will
>> use whatever synchronization mechanism the driver is using for
>> itself (for example, the port lock for uart serial consoles).
>
> We should explain here the bigger picture, see my comment
> of the word "whenever" below.
>
> <proposal>
> Console drivers typically have to deal with access to the hardware
> via user input/output (such as an interactive login shell) and
> output of kernel messages via printk() calls.
>
> They use some classic locking mechanism in most situations. But
> console->write_atomic() callbacks, used by nbcon consoles,
> must be synchronized only by acquiring the console context.
>
> The synchronization via the console context ownership is possible
> only when the console driver is registered. It is when a particular
> device driver is connected with a particular console driver.
>
> The two synchronization mechanisms must be synchronized between
> each other. It is tricky because the console context ownership
> is quite special. It might be taken over by a higher priority
> context. Also CPU migration has to be disabled.
>
> The most tricky part is to (dis)connect these two mechanism during
> the console (un)registration. Let's make it easier by adding two new
> callbacks: device_lock(), device_unlock(). They would allow
> to take the device-specific lock while the device is being
> (un)registered by the related console driver.
>
> For example, these callbacks would lock/unlock the port lock
> for serial port drivers.
> </proposal>

I do not like this proposal. IMHO it is focussing only on nbcon
registration details (a corner case) rather than presenting the big
picture.

Yes, we will use these callbacks during registration/unregistration to
avoid some races, but that is just one usage (and not the primary
one). The main reason these callbacks exist is to provide driver
synchronization when printing.

We want to avoid nbcon console ownership contention whenever
possible. In fact, there should _never_ be nbcon console ownership
contention except in emergency or panic situations.

In the normal case, printk will use the driver-specific locking for
synchronization. Previously this was achieved by implementing the
lock/unlock within the write() callback. But with nbcon consoles that is
not possible because the nbcon ownership must be taken outside of the
write callback:

con->device_lock()
nbcon_acquire()
con->write_atomic() or con->write_thread()
nbcon_release()
con->device_unlock()

How about this for the commit message:

printk: nbcon: Add callbacks to synchronize with driver

Console drivers typically must deal with access to the hardware
via user input/output (such as an interactive login shell) and
output of kernel messages via printk() calls. To provide the
necessary synchronization, usually some driver-specific locking
mechanism is used (for example, the port spinlock for uart
serial consoles).

Until now, usage of this driver-specific locking has been hidden
from the printk-subsystem and implemented within the various
console callbacks. However, for nbcon consoles, it is necessary
that the printk-subsystem uses the driver-specific locking so
that nbcon console ownership can be acquired _after_ the
driver-specific locking has succeeded. This allows for lock
contention to exist on the more context-friendly driver-specific
locking rather than nbcon console ownership (for non-emergency
and non-panic cases).

Require nbcon consoles to implement two new callbacks
(device_lock(), device_unlock()) that will use whatever
synchronization mechanism the driver is using for itself.

John

P.S. I think it makes more sense to use your proposal for the commit
message of the follow-up patch "printk: nbcon: Use driver
synchronization while registering".

2024-04-17 13:03:58

by Petr Mladek

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On Tue 2024-04-16 17:07:39, John Ogness wrote:
> On 2024-04-09, Petr Mladek <[email protected]> wrote:
> >> Console drivers typically must deal with access to the hardware
> >> via user input/output (such as an interactive login shell) and
> >> output of kernel messages via printk() calls.
> >>
> >> Follow-up commits require that the printk subsystem is able to
> >> synchronize with the driver. Require nbcon consoles to implement
> >> two new callbacks (device_lock(), device_unlock()) that will
> >> use whatever synchronization mechanism the driver is using for
> >> itself (for example, the port lock for uart serial consoles).
> >
> > We should explain here the bigger picture, see my comment
> > of the word "whenever" below.
> >
> > <proposal>
> > Console drivers typically have to deal with access to the hardware
> > via user input/output (such as an interactive login shell) and
> > output of kernel messages via printk() calls.
> >
> > They use some classic locking mechanism in most situations. But
> > console->write_atomic() callbacks, used by nbcon consoles,
> > must be synchronized only by acquiring the console context.
> >
> > The synchronization via the console context ownership is possible
> > only when the console driver is registered. It is when a particular
> > device driver is connected with a particular console driver.
> >
> > The two synchronization mechanisms must be synchronized between
> > each other. It is tricky because the console context ownership
> > is quite special. It might be taken over by a higher priority
> > context. Also CPU migration has to be disabled.
> >
> > The most tricky part is to (dis)connect these two mechanism during
> > the console (un)registration. Let's make it easier by adding two new
> > callbacks: device_lock(), device_unlock(). They would allow
> > to take the device-specific lock while the device is being
> > (un)registered by the related console driver.
> >
> > For example, these callbacks would lock/unlock the port lock
> > for serial port drivers.
> > </proposal>
>
> I do not like this proposal. IMHO it is focussing only on nbcon
> registration details (a corner case) rather than presenting the big
> picture.

Fair enough.

> Yes, we will use these callbacks during registration/unregistration to
> avoid some races, but that is just one usage (and not the primary
> one). The main reason these callbacks exist is to provide driver
> synchronization when printing.
>
> We want to avoid using nbcon console ownership contention whenever
> possible. In fact, there should _never_ be nbcon console owership
> contention except for in emergency or panic situations.
>
> In the normal case, printk will use the driver-specific locking for
> synchronization. Previously this was achieved by implementing the
> lock/unlock within the write() callback. But with nbcon consoles that is
> not possible because the nbcon ownership must be taken outside of the
> write callback:
>
> con->device_lock()
> nbcon_acquire()
> con->write_atomic() or con->write_thread()
> nbcon_release()
> con->device_unlock()

This sounds like a strong requirement. So there should be a strong
reason ;-) I would like to better understand it [*] and
document it in a clear way.

In principle, we do not need to take the full con->device_lock()
around nbcon_acquire() in the normal context. In other words,
when nbcon_acquire() is safe enough in emergency context
then it should be safe enough in the normal context as well.
Otherwise, we would have a problem.

My understanding is that we want to take con->device_lock()
in the normal context from two reasons:

1. This is historical, kind of speculation, and probably
not the real reason.

In the initial RFC, nbcon_try_acquire() allowed to take over
the context with the same priority.

The con->device() taken in the kthread might have prevented
stealing the context by another "random" uart_port_lock() call from
a code which would not continue emitting the messages.

But the current nbcon_try_acquire() implementation does not take
over the context with the same priority. So, this reason is no
longer valid. And it probably never was the real reason anyway.


2. The con->device() defines the context in which nbcon_acquire()
will be taken and con->write_atomic() called to make it
safe against other operations with the device driver.

For example, con->device() from uart serial consoles would
disable interrupts to prevent deadlocks with the serial
port IRQ handlers.

Some other drivers might just need to disable preemption.
And some (future) drivers might even allow to keep
the preemption enabled.


I believe that the 2nd reason is the right one. But it is far
from obvious from the proposed comments.


[*] I am pretty sure that we had the same discussion during
Plumbers 2023. I am sorry I do not recall all the details now.
In any case, this should be explained in public.


> How about this for the commit message:
>
> printk: nbcon: Add callbacks to synchronize with driver
>
> Console drivers typically must deal with access to the hardware
> via user input/output (such as an interactive login shell) and
> output of kernel messages via printk() calls. To provide the
> necessary synchronization, usually some driver-specific locking
> mechanism is used (for example, the port spinlock for uart
> serial consoles).
>
> Until now, usage of this driver-specific locking has been hidden
> from the printk-subsystem and implemented within the various
> console callbacks.

So far, so good.

> However, for nbcon consoles, it is necessary
> that the printk-subsystem uses the driver-specific locking so
> that nbcon console ownership can be acquired _after_ the
> driver-specific locking has succeeded. This allows for lock
> contention to exist on the more context-friendly driver-specific
> locking rather than nbcon console ownership (for non-emergency
> and non-panic cases).

I guess that the part:

This allows for lock contention to exist on the more
context-friendly driver-specific locking

tries to explain the 2nd reason that I have described above.
But it looks too cryptic to me. I have got it only because
I knew what I was looking for.

> Require nbcon consoles to implement two new callbacks
> (device_lock(), device_unlock()) that will use whatever
> synchronization mechanism the driver is using for itself.


Honestly, I still think that we need con->device_lock() primarily
to synchronize the console registration and unregistration.

In all other cases, we only need to know whether we have
to call nbcon_try_acquire() with interrupts disabled or not[**].
And we need to know this for any nbcon_try_acquire() access,
including the emergency context.

Maybe, we should pass this information another way because
we do not want to call con->device_lock() in
nbcon_cpu_emergency_exit().

[**] According to my last findings, we always need to disable
preemption, see
https://lore.kernel.org/r/[email protected]


I still have to shake my head around this. But I would first like
to know whether:

+ You agree that nbcon_try_acquire() always has to be called with
preemption disabled.

+ What do you think about explicitly disabling preemption
in nbcon_try_acquire().

+ If it is acceptable for the big picture. It should be fine for
serial consoles. But I think that graphics consoles wanted to
be preemptive when called in the printk kthread.


I am sure that it will be possible to make nbcon_try_acquire()
preemption-safe but it will need some more magic. Maybe, we could
do it later when really needed. The primary target are the slow
serial consoles anyway.


> P.S. I think it makes more sense to use your proposal for the commit
> message of the follow-up patch "printk: nbcon: Use driver
> synchronization while registering".

Yup, feel free to use it.

Best Regards,
Petr

2024-04-17 14:54:56

by John Ogness

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On 2024-04-17, Petr Mladek <[email protected]> wrote:
>> We want to avoid nbcon console ownership contention whenever
>> possible. In fact, there should _never_ be nbcon console ownership
>> contention except in emergency or panic situations.
>>
>> In the normal case, printk will use the driver-specific locking for
>> synchronization. Previously this was achieved by implementing the
>> lock/unlock within the write() callback. But with nbcon consoles that
>> is not possible because the nbcon ownership must be taken outside of
>> the write callback:
>>
>> con->device_lock()
>> nbcon_acquire()
>> con->write_atomic() or con->write_thread()
>> nbcon_release()
>> con->device_unlock()
>
> This sounds like a strong requirement. So there should be a strong
> reason

There is: PREEMPT_RT

> when nbcon_acquire() is safe enough in emergency context
> then it should be safe enough in the normal context either.
> Otherwise, we would have a problem.

Of course. That is not the issue.

> My understanding is that we want to take con->device_lock()
> in the normal context from two reasons:
>
> 1. This is historical, kind of speculation, and probably
> not the real reason.

Correct. Not a reason.

> 2. The con->device() defines the context in which nbcon_acquire()
> will be taken and con->write_atomic() called to make it
> safe against other operations with the device driver.
>
> For example, con->device() from uart serial consoles would
> disable interrupts to prevent deadlocks with the serial
> port IRQ handlers.
>
> Some other drivers might just need to disable preemption.
> And some (future) drivers might even allow to keep
> the preemption enabled.

(Side note: In PREEMPT_RT, all drivers keep preemption enabled.)

This 2nd reason is almost correct.

Drivers are implemented using their own locking mechanisms. For UART it
will be spinlocks. For VT it will be mutexes. Whatever these mechanisms
are, that is what printk also wants to use. And since (for the normal
case) printk will always print console messages from task context,
drivers are free to use whatever locking mechanism fits them best. By
using the locking choice of the driver, printk will always do the right
thing and the author of that driver will always be in control of the
context.

Unfortunately printk also needs to deal with the non-normal case
(emergencies, panic, shutdown) when no printing threads are
available. For this (and only for this case) the nbcon console ownership
was introduced. It functions as a special[*] inner lock. This inner lock
will never be contended in the normal case. It exists only so that the
non-normal case can takeover the console for printing.

[*] Special = NMI-safe with priority and handover/takeover semantics.

Generally speaking, driver authors should not be concerned about this
special inner lock. It should be hidden (such as in the port lock
wrapper).

The special lock is interesting _only_ for drivers implementing
write_atomic(). And even then, it is only interesting for the
write_thread() and write_atomic() callback implementations. These
require some special handling to make sure they will print sane output
during handover/takeovers. But no other functions need to be concerned
about it.


> I still have to shake my head around this. But I would first like
> to know whether:
>
> + You agree that nbcon_try_acquire() always have to be called with
> preemption disabled.

No, it must not. PREEMPT_RT requires preemption enabled. That has always
been the core of this whole rework.

> + What do you think about explicitly disabling preemption
> in nbcon_try_acquire().

We cannot do it.

> + If it is acceptable for the big picture. It should be fine for
> serial consoles. But I think that graphics consoles wanted to
> be preemptive when called in the printk kthread.

In PREEMPT_RT, all are preemptive.

> I am sure that it will be possible to make nbcon_try_acquire()
> preemption-safe but it will need some more magic.

I am still investigating why you think it is not safe (as an inner lock
for the normal case). Note that for emergency and panic situations,
preemption _is_ disabled.

John

2024-04-17 21:59:44

by John Ogness

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On 2024-04-11, Petr Mladek <[email protected]> wrote:
> console_flush_all() is called from console_unlock().
> It might be called in atomic context. But the current
> owner might be theoretically scheduled out.

Nice catch. This problem was introduced in this series after you pointed
out that return value of nbcon_legacy_emit_next_record() did not match
the semantics of console_emit_next_record().

> I see 3 possible solutions:
>
> 1. Enforce that nbcon context can be acquired only with preemtion
> disabled.

Not acceptable.

> 2. Enforce that nbcon context can be acquired only with
> interrupts disabled. It would prevent a deadlock when some
> future code interrupts a flush in NBCON_PRIO_EMERGENCY
> context. And then a potential nested console_flush_all()
> won't be able to take over the interrupted NBCON_PRIO_CONTEXT
> and there will be no progress.

Not acceptable.

> 3. console_flush_all() should ignore nbcon console when
> it is not able to get the context, aka no progress.

This was the previous implementation, but as you point out, it is an
issue that console_flush_all() is no longer reliable.

I will continue this topic by responding to your follow-up message.

John

2024-04-17 23:07:39

by John Ogness

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On 2024-04-11, Petr Mladek <[email protected]> wrote:
> I am trying to make a full picture when and how the nbcon consoles
> will get flushed. My current understanding and view is the following,
> starting from the easiest priority:
>
>
> 1. NBCON_PRIO_PANIC messages will be flushed by calling
> nbcon_atomic_flush_pending() directly in vprintk_emit()
>
> This will take care of any previously added messages.
>
> Non-panic CPUs are not allowed to add messages anymore
> when there is a panic in progress.
>
> [ALL OK]

OK, because the end of panic will perform unsafe takeovers if necessary.

> 2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
> nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().
>
> This would cover all previously added messages, including
> the ones printed by the code between
> nbcon_cpu_emergency_enter()/exit().

This is best effort. If the console is owned by another context and is
marked unsafe, nbcon_atomic_flush_pending() will do nothing.

[ PROBLEM: In this case, who will flush the emergency messages? ]

> This won't cover later added messages which might be
> a problem. Let's look at this closer. Later added
> messages with:
>
> + NBCON_PRIO_PANIC will be handled in vprintk_emit()
> as explained above [OK]
>
> + NBCON_PRIO_EMERGENCY() will be handled in the
> related nbcon_cpu_emergency_exit() as described here.
> [OK]
>
> + NBCON_PRIO_NORMAL will be handled, see below. [?]
>
> [ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]

Yes, this is also an issue, although the solution may be the same for
this and the above problem.

> 3. NBCON_PRIO_NORMAL messages will be flushed by:
>
> + the printk kthread when it is available
>
> + the legacy loop via
>
> + console_unlock()
> + console_flush_all()
> + console nbcon_legacy_emit_next_record() [PROBLEM]
>
> PROBLEM: console_flush_all() does not guarantee progress with
> nbcon consoles as explained above (previous mail).

Not only this. If there is no kthread available, no printing will occur
until the _next_ printk(), whenever that is.


Above we have listed 3 problems:

- emergency messages will not flush if owned by another context and
unsafe

- normal messages will not flush if owned by another context

- for the above 2 problems, if there is no kthread, nobody will flush
the messages


My question: Is this really a problem?

The main idea behind the rework is that printing is deferred. The
kthreads exist for this. If the kthreads are not available (early boot
or shutdown) or the kthreads are not reliable enough (emergency
messages), a best-safe-effort is made to print in the caller
context. Only the panic situation is designed to force output (unsafely,
if necessary). Is that not enough?

> My proposal:
>
> 1. console_flush_all() will flush nbcon consoles only
> in NBCON_PRIO_NORMAL and when the kthreads are not
> available.
>
> It will make it clear that this is the flusher in
> this situation.

This is the current PREEMPT_RT implementation.

> 2. Allow to skip nbcon consoles in console_flush_all() when
> it can't take the context (as suggested in my previous
> reply).
>
> This won't guarantee flushing NORMAL messages added
> while nbcon_cpu_emergency_exit() calls
> nbcon_atomic_flush_pending().

This was the previous version. And I agree that we need to go back to
that.

> Solve this problem by introducing[*] nbcon_atomic_flush_all()
> which would flush even newly added messages and
> call this in nbcon_cpu_emergency_exit() when the printk
> kthread does not work. It should bail out when there
> is a panic in progress.
>
> Motivation: It does not matter which "atomic" context
> flushes NORMAL/EMERGENCY messages when
> the printk kthread is not available.

I do not think that solves the problem. If the console is in an unsafe
section, nothing can be printed.

> [*] Alternatively we could modify nbcon_atomic_flush_pending()
> to flush even newly added messages when the kthread is
> not working. But it might create another mess.

This discussion is about when kthreads are not available. If this is a
concern, I wonder if maybe in this situation an irq_work should be
triggered upon release of the console.

For example, something like:

static bool flush_pending(struct console *con)
{
        /* If there is a kthread, let it do the work. */
        if (con->kthread)
                return false;

        /* Make sure a record is pending. */
        if (!prb_read_valid(prb, nbcon_seq_read(con), NULL))
                return false;

        return true;
}

static void nbcon_context_release(struct nbcon_context *ctxt)
{
        struct console *con = ctxt->console;

        ...

        /* Trigger irq_work to flush if necessary. */
        if (flush_pending(con))
                defer_console_output();
}

John

2024-04-18 10:34:11

by Petr Mladek

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On Wed 2024-04-17 17:00:42, John Ogness wrote:
> On 2024-04-17, Petr Mladek <[email protected]> wrote:
> >> We want to avoid using nbcon console ownership contention whenever
> >> possible. In fact, there should _never_ be nbcon console owership
> >> contention except for in emergency or panic situations.
> >>
> >> In the normal case, printk will use the driver-specific locking for
> >> synchronization. Previously this was achieved by implementing the
> >> lock/unlock within the write() callback. But with nbcon consoles that
> >> is not possible because the nbcon ownership must be taken outside of
> >> the write callback:
> >>
> >> con->device_lock()
> >> nbcon_acquire()
> >> con->write_atomic() or con->write_thread()
> >> nbcon_release()
> >> con->device_unlock()
> >
> > This sounds like a strong requirement. So there should be a strong
> > reason
>
> There is: PREEMPT_RT

This explains it!

I think that a lot of the misunderstanding here is caused because
your brain is trained primarily in "RT mode" ;-) while I am not
that familiar with the RT tricks and my brain thinks
in classic preemption mode :-)

I am not sure how it is done in other parts of kernel code where
RT needed to introduce some tricks. But I think that we should
really start mentioning RT behavior in the commit messages
and comments where the RT mode makes huge changes.


> > when nbcon_acquire() is safe enough in emergency context
> > then it should be safe enough in the normal context as well.
> > Otherwise, we would have a problem.
>
> Of course. That is not the issue.
>
> > My understanding is that we want to take con->device_lock()
> > in the normal context from two reasons:
> >
> > 1. This is historical, kind of speculation, and probably
> > not the real reason.
>
> Correct. Not a reason.
>
> > 2. The con->device() defines the context in which nbcon_acquire()
> > will be taken and con->write_atomic() called to make it
> > safe against other operations with the device driver.
> >
> > For example, con->device() from uart serial consoles would
> > disable interrupts to prevent deadlocks with the serial
> > port IRQ handlers.
> >
> > Some other drivers might just need to disable preemption.
> > And some (future) drivers might even allow to keep
> > the preemption enabled.
>
> (Side note: In PREEMPT_RT, all drivers keep preemption enabled.)

This explains everything. It is a huge game changer.

Sigh, I remember that you told me this at Plumbers. But my
non-RT-trained brain forgot this "detail". Well, I hope that
I am not the only one; we should mention this in the comments.

> > I still have to shake my head around this. But I would first like
> > to know whether:
> >
> > + You agree that nbcon_try_acquire() always has to be called with
> > preemption disabled.
>
> No, it must not. PREEMPT_RT requires preemption enabled. That has always
> been the core of this whole rework.

Got it! I had completely forgotten that spin_lock() is a mutex in RT.

> > + What do you think about explicitly disabling preemption
> > in nbcon_try_acquire().
>
> We cannot do it.
>
> > + If it is acceptable for the big picture. It should be fine for
> > serial consoles. But I think that graphics consoles wanted to
> > be preemptive when called in the printk kthread.
>
> In PREEMPT_RT, all are preemptive.
>
> > I am sure that it will be possible to make nbcon_try_acquire()
> > preemption-safe but it will need some more magic.
>
> I am still investigating why you think it is not safe (as an inner lock
> for the normal case). Note that for emergency and panic situations,
> preemption _is_ disabled.

The race scenario has been mentioned in
https://lore.kernel.org/r/[email protected]

CPU0                                    CPU1

[ task A ]

nbcon_context_try_acquire()
  # success with NORMAL prio
  # .unsafe == false;  // safe for takeover

[ schedule: task A -> B ]

                                        WARN_ON()
                                        nbcon_atomic_flush_pending()
                                        nbcon_context_try_acquire()
                                        # success with EMERGENCY prio
                                        # .unsafe == false;  // safe for takeover

                                        # flushing
                                        nbcon_context_release()

                                        # HERE: con->nbcon_state is free
                                        #       to take by anyone !!!

[ task B ]

nbcon_context_try_acquire()
  # success with NORMAL prio
  # .unsafe == false;  // safe for takeover

[ schedule: task B -> A ]

[ task A ]

nbcon_enter_unsafe()
  nbcon_context_can_proceed()

BUG: nbcon_context_can_proceed() returns "true" because
the console is owned by a context on CPU0 with
NBCON_PRIO_NORMAL.

But it should return "false". The console is owned
by a context from task B and we do the check
in a context from task A.


OK, let's look at it with the new RT perspective. Here, the
con->device_lock() plays an important role.

The race could NOT happen in:

+ NBCON_PRIO_PANIC context because it does not schedule

+ NBCON_PRIO_EMERGENCY context because we explicitly disable preemption there

+ NBCON_PRIO_NORMAL context when we ALWAYS do nbcon_try_acquire() under
the con->device() lock. Here the con->device_lock() serializes
nbcon_try_acquire() calls even between running tasks.


Everything makes sense now. And we are probably safe.

I have to double check that we really ALWAYS call nbcon_try_acquire()
under con->device() lock. And I have to think how to describe this
in the commit messages and comments.

Best Regards,
Petr

2024-04-18 12:10:35

by John Ogness

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On 2024-04-18, Petr Mladek <[email protected]> wrote:
> I am not sure how it is done in other parts of kernel code where
> RT needed to introduce some tricks. But I think that we should
> really start mentioning RT behavior in the commit messages
> and comments where the RT mode makes huge changes.

Yes, our motivation is RT. But these semantics are not RT-specific. They
apply to the general kernel locking model. For example, even for a !RT
system, it is semantically incorrect to take a spin_lock while holding a
raw_spin_lock.

In the full PREEMPT_RT series I have tried to be careful about only
mentioning PREEMPT_RT when it is really PREEMPT_RT-specific. For example
[0][1][2].

[0] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-6.9.y-rt-rebase&id=1564af55a92c32fe215af35cf55cb9359c5fff30

[1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-6.9.y-rt-rebase&id=033b416ad25b17dc60d5f71c1a0b33a5fbc17639

[2] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-6.9.y-rt-rebase&id=7929ba9e5c110148a1fcd8bd93d6a4eff37aa265

> The race could NOT happen in:
>
> + NBCON_PRIO_PANIC context because it does not schedule

Yes.

> + NBCON_PRIO_EMERGENCY context because we explicitly disable
> preemption there

Yes.

> + NBCON_PRIO_NORMAL context when we ALWAYS do nbcon_try_acquire()
> under con->device() lock. Here the con->device_lock() serializes
> nbcon_try_acquire() calls even between running tasks.

The nbcon_legacy_emit_next_record() printing as NBCON_PRIO_NORMAL is a
special situation where write_atomic() is used. It is safe because it
disables hard interrupts and is never called from NMI context.

nbcon_atomic_flush_pending() as NBCON_PRIO_NORMAL is safe in !NMI
because it also disables hard interrupts. However,
nbcon_atomic_flush_pending() could be called in NMI with
NBCON_PRIO_NORMAL. I need to think about this case.

John

2024-04-18 12:49:42

by Petr Mladek

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On Thu 2024-04-18 01:11:59, John Ogness wrote:
> On 2024-04-11, Petr Mladek <[email protected]> wrote:
> > I am trying to make a full picture when and how the nbcon consoles
> > will get flushed. My current understanding and view is the following,
> > starting from the easiest priority:
> >
> >
> > 1. NBCON_PRIO_PANIC messages will be flushed by calling
> > nbcon_atomic_flush_pending() directly in vprintk_emit()
> >
> > This will take care of any previously added messages.
> >
> > Non-panic CPUs are not allowed to add messages anymore
> > when there is a panic in progress.
> >
> > [ALL OK]
>
> OK, because the end of panic will perform unsafe takeovers if necessary.
>
> > 2. NBCON_PRIO_EMERGENCY messages will be flushed by calling
> > nbcon_atomic_flush_pending() directly in nbcon_cpu_emergency_exit().
> >
> > This would cover all previously added messages, including
> > the ones printed by the code between
> > nbcon_cpu_emergency_enter()/exit().
>
> This is best effort. If the console is owned by another context and is
> marked unsafe, nbcon_atomic_flush_pending() will do nothing.
>
> [ PROBLEM: In this case, who will flush the emergency messages? ]

They should get flushed by the current owner while the system is still
living. Or the system is heading into panic() and the messages would
be emitted by the final unsafe flush.

IMPORTANT: The optimistic scenario would work only when the current
owner really flushes everything. More on this below.


> > This won't cover later added messages which might be
> > a problem. Let's look at this closer. Later added
> > messages with:
> >
> > + NBCON_PRIO_PANIC will be handled in vprintk_emit()
> > as explained above [OK]
> >
> > + NBCON_PRIO_EMERGENCY will be handled in the
> > related nbcon_cpu_emergency_exit() as described here.
> > [OK]
> >
> > + NBCON_PRIO_NORMAL will be handled, see below. [?]
> >
> > [ PROBLEM: later added NBCON_PRIO_NORMAL messages, see below. ]
>
> Yes, this is also an issue, although the solution may be the same for
> this and the above problem.

This is a bit different. There was an existing console owner in the
above scenario. In this case, the code relies on the printk kthread.
But we need a solution for situations when the kthread is not working,
e.g. early boot.


> > 3. NBCON_PRIO_NORMAL messages will be flushed by:
> >
> > + the printk kthread when it is available
> >
> > + the legacy loop via
> >
> > + console_unlock()
> > + console_flush_all()
> > + console nbcon_legacy_emit_next_record() [PROBLEM]
> >
> > PROBLEM: console_flush_all() does not guarantee progress with
> > nbcon consoles as explained above (previous mail).
>
> Not only this. If there is no kthread available, no printing will occur
> until the _next_ printk(), whenever that is.

> Above we have listed 3 problems:
>
> - emergency messages will not flush if owned by another context and
> unsafe
>
> - normal messages will not flush if owned by another context
>
> - for the above 2 problems, if there is no kthread, nobody will flush
> the messages

All this comes down to the question of who would flush the "ignored"
messages when the system continues working in "normal" mode.


> My question: Is this really a problem?

IMHO, it is. For example, early boot consoles exist for a reason.
People want to debug early boot problems using printk.
We should not reduce the reliability too much by introducing kthreads.

Later update: It is basically only about early boot debugging.

The kthreads should always be created later. And
we assume that they work, except for the emergency
and panic context.

So, this is not a problem as long as the boot consoles
are using the legacy code paths.

Well, I guess that they might use the "write_atomic()"
callback in the future. And then this "bug" might hurt.


> The main idea behind the rework is that printing is deferred. The
> kthreads exist for this. If the kthreads are not available (early boot
> or shutdown) or the kthreads are not reliable enough (emergency
> messages), a best-safe-effort is made to print in the caller
> context. Only the panic situation is designed to force output (unsafely,
> if necessary). Is that not enough?

Simple answer: No, primarily because of the early boot behavior.

Longer answer: I tried to separate all the variants and point out
each particular problem. The above paragraph mixes everything
into a "wave this away" proposal.


> > My proposal:
> >
> > 1. console_flush_all() will flush nbcon consoles only
> > in NBCON_PRIO_NORMAL and when the kthreads are not
> > available.
> >
> > It will make it clear that this is the flusher in
> > this situation.
>
> This is the current PREEMPT_RT implementation.
>
> > 2. Allow to skip nbcon consoles in console_flush_all() when
> > it can't take the context (as suggested in my previous
> > reply).
> >
> > This won't guarantee flushing NORMAL messages added
> > while nbcon_cpu_emergency_exit() calls
> > nbcon_atomic_flush_pending().
>
> This was the previous version. And I agree that we need to go back to
> that.
>
> > Solve this problem by introducing[*] nbcon_atomic_flush_all()
> > which would flush even newly added messages and
> > call this in nbcon_cpu_emergency_exit() when the printk
> > kthread does not work. It should bail out when there
> > is a panic in progress.
> >
> > Motivation: It does not matter which "atomic" context
> > flushes NORMAL/EMERGENCY messages when
> > the printk kthread is not available.
>
> I do not think that solves the problem. If the console is in an unsafe
> section, nothing can be printed.

IMHO, it solves the problem.

The idea is simple:

"The current nbcon console owner will be responsible for flushing
all messages when the printk kthread does not exist."

The proof is more complicated:

1. Let's put aside panic. We already do the best effort there.

2. Emergency mode currently violates the rule because
nbcon_atomic_flush_pending() ignores it.

=> FIX: improve nbcon_cpu_emergency_exit() to flush
all messages when kthreads are not ready.


3. Normal mode flushes nbcon consoles via
nbcon_legacy_emit_next_record() from console_unlock()
before the kthreads are started.

It is not reliable when nbcon_try_acquire() fails.
But it would fail only when there is another user.
The other owner might be:

+ panic: will handle everything

+ emergency: should flush everything [*]

+ normal: can't happen because of con->device() lock.

=> The only remaining problem is to fix nbcon_atomic_flush_pending()
to flush everything when the kthreads are not started yet.


> > [*] Alternatively we could modify nbcon_atomic_flush_pending()
> > to flush even newly added messages when the kthread is
> > not working. But it might create another mess.
>
> This discussion is about when kthreads are not available. If this is a
> concern, I wonder if maybe in this situation an irq_work should be
> triggered upon release of the console.

Triggering an irq_work would solve the problem as well. It would move
the flushing to a context with "normal" priority which is serialized
by con->device_lock(). It works for me.

Does this make any sense?

It is possible that you already knew all this. And it is possible
that you did not see it as a problem because there was no plan
to convert boot consoles to the nbcon variant. Maybe it does
not even make sense because boot consoles could not use
kthreads. The only motivation would be code sharing and
simplification of the legacy loop, but that is a faraway dream.

Sigh, all this is so complicated. I wonder how to document
this so that other people do not have to discover these
dependencies again and again. Is it even possible?

Well, I still think that it makes sense to improve
nbcon_cpu_emergency_exit() to fill the potential hole.
And ideally mention all these details somewhere
(commit message, comments, Documentation/...)

Best Regards,
Petr

2024-04-18 15:04:05

by Petr Mladek

Subject: Re: [PATCH printk v4 06/27] printk: nbcon: Add callbacks to synchronize with driver

On Thu 2024-04-18 14:16:16, John Ogness wrote:
> On 2024-04-18, Petr Mladek <[email protected]> wrote:
> > I am not sure how it is done in other parts of kernel code where
> > RT needed to introduce some tricks. But I think that we should
> really start mentioning RT behavior in the commit messages
> and comments where the RT mode makes huge changes.
>
> Yes, our motivation is RT. But these semantics are not RT-specific. They
> apply to the general kernel locking model.

Yes, but RT is a nice example where it is clear what we want to achieve.
IMHO, a clear example is always better than a scientific formulation
where every word might be important. Especially when different people
might understand some words in different ways.


> For example, even for a !RT system, it is semantically incorrect to
> take a spin_lock while holding a raw_spin_lock.

Really? I am not aware of it. I know that lockdep complains even
in a non-RT configuration. But I had expected that it only helps
to catch potential problems for when the same code is used with
RT enabled.

Is there any difference between spin_lock() and raw_spin_lock()
when RT is disabled? I do not see any. This is from
include/linux/spinlock.h:

/* Non PREEMPT_RT kernel, map to raw spinlocks: */
#ifndef CONFIG_PREEMPT_RT
[...]
static __always_inline void spin_lock(spinlock_t *lock)
{
        raw_spin_lock(&lock->rlock);
}

Would the raw_spin_lock() API even exist without CONFIG_PREEMPT_RT?

Maybe you do not understand what I am suggesting. Let's talk about
particular comments in the code.


> In the full PREEMPT_RT series I have tried to be careful about only
> mentioning PREEMPT_RT when it is really PREEMPT_RT-specific. For example
> [0][1][2].
>
> [0] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-6.9.y-rt-rebase&id=1564af55a92c32fe215af35cf55cb9359c5fff30
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-6.9.y-rt-rebase&id=033b416ad25b17dc60d5f71c1a0b33a5fbc17639
>
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/rt/linux-rt-devel.git/commit/?h=linux-6.9.y-rt-rebase&id=7929ba9e5c110148a1fcd8bd93d6a4eff37aa265
>
> > The race could NOT happen in:
> >
> > + NBCON_PRIO_PANIC context because it does not schedule
>
> Yes.
>
> > + NBCON_PRIO_EMERGENCY context because we explicitly disable
> > preemption there
>
> Yes.
>
> > + NBCON_PRIO_NORMAL context when we ALWAYS do nbcon_try_acquire()
> > under con->device() lock. Here the con->device_lock() serializes
> > nbcon_try_acquire() calls even between running tasks.
>
> The nbcon_legacy_emit_next_record() printing as NBCON_PRIO_NORMAL is a
> special situation where write_atomic() is used. It is safe because it
> disables hard interrupts and is never called from NMI context.
>
> nbcon_atomic_flush_pending() as NBCON_PRIO_NORMAL is safe in !NMI
> because it also disables hard interrupts. However,
> nbcon_atomic_flush_pending() could be called in NMI with
> NBCON_PRIO_NORMAL. I need to think about this case.

It is safe. The race scenario requires _double_ scheduling (A->B->A):

1. [CPU 0] Process A acquires the context and is scheduled out.

2. [CPU 1] The nbcon context is taken over and released in emergency.

3. [CPU 0] Process B acquires the context and is scheduled out.

4. [CPU 0] Process A thinks that it still owns the context
   and continues where it left off.


This could not happen with the current code when:

+ nbcon_try_acquire() is serialized by con->device_lock()
because process B would get blocked on this lock.

+ nbcon_try_acquire() is called in atomic context
because the context is always released before scheduling.


I would say that this is far from obvious and we really need
to document this somehow. I would mention these details above
nbcon_context_try_acquire().

Best Regards,
Petr

2024-04-18 21:45:13

by John Ogness

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On 2024-04-18, Petr Mladek <[email protected]> wrote:
>> > Solve this problem by introducing[*] nbcon_atomic_flush_all()
>> > which would flush even newly added messages and
>> > call this in nbcon_cpu_emergency_exit() when the printk
>> > kthread does not work. It should bail out when there
>> > is a panic in progress.
>> >
>> > Motivation: It does not matter which "atomic" context
>> > flushes NORMAL/EMERGENCY messages when
>> > the printk kthread is not available.
>>
>> I do not think that solves the problem. If the console is in an unsafe
>> section, nothing can be printed.
>
> IMHO, it solves the problem.
>
> The idea is simple:
>
> "The current nbcon console owner will be responsible for flushing
> all messages when the printk kthread does not exist."

Currently this does not hold. It assumes owners are printers. That is
not always true. The owner might be some task modifying the baud rate,
which has nothing to do with printing.

> The proof is more complicated:
>
> 1. Let's put aside panic. We already do the best effort there.
>
> 2. Emergency mode currently violates the rule because
> nbcon_atomic_flush_pending() ignores the simple rule.
>
> => FIX: improve nbcon_cpu_emergency_exit() to flush
> all messages when kthreads are not ready.

Emergency mode cannot flush _anything_ if there is an owner in an unsafe
region. (And that owner may not be a printer.)

> 3. Normal mode flushes nbcon consoles via
> nbcon_legacy_emit_next_record() from console_unlock()
> before the kthreads are started.
>
> It is not reliable when nbcon_try_acquire() fails.
> But it would fail only when there is another user.
> The other owner might be:
>
> + panic: will handle everything
>
> + emergency: should flush everything [*]
>
> + normal: can't happen because of con->device() lock.

As the code is now, "normal" does not imply con->device() lock. When
using con->write_atomic(), we do not (and can not) use the con->device()
lock. So normal _can_ fail in nbcon_legacy_emit_next_record() if another
CPU is adjusting the baud rate. This is why I said the problem with
"emergency" is basically the same problem as "normal" (WRT using
write_atomic()).

> => The only remaining problem is to fix nbcon_atomic_flush_pending()
> to flush everything when the kthreads are not started yet.

I think my proposed change handles it better. I have been doing various
tests and also adjusted it a bit.

The reason the flushing fails is because another context owns the
console. So I think it makes sense for that owner to handle flushing
responsibility when releasing ownership (even if that context was just
changing the baud rate).

[ Keep in mind we are only talking about printing via write_atomic()
when the kthread is not available. ]

If the current owner is a normal printing context, it will print to
completion anyway (via console_flush_all()).

If the current owner is an emergency printing context, it will only
print the emergency messages (as PRIO_EMERGENCY). However, when it
releases ownership, it could flush the remaining records (as
PRIO_NORMAL) in the same fashion as console_flush_all() does it.

If the current owner is a non-printing context, when it releases
ownership, it could flush the remaining records (as PRIO_NORMAL) in the
same fashion as console_flush_all() does it.

So what I am proposing is that we add two new normal-flushing sites that
are only used when the kthread is not available:

1. after exiting emergency mode: in nbcon_cpu_emergency_exit()

2. after releasing ownership for non-printing: in nbcon_driver_release()

I think this will close the gap and it does not require irq_work.

> Sigh, all this is so complicated. I wonder how to document
> this so that other people do not have to discover these
> dependencies again and again. Is it even possible?

In the end we will have the following 5 scenarios (assuming my
proposal):

1. PRIO_NORMAL non-printing activity, always under con->device() lock,
upon release flushes[*] full backlog

2. PRIO_NORMAL printing using write_thread(), always called from task
context and under con->device() lock, always flushes full backlog

3. PRIO_NORMAL printing using write_atomic(), called with hardware
interrupts disabled, always flushes full backlog, (only used when the
kthread is not available)

4. PRIO_EMERGENCY printing using write_atomic(), called with hardware
interrupts disabled, flushes through emergency, upon exit flushes[*]
full backlog

5. PRIO_PANIC printing using write_atomic(), called with hardware
interrupts disabled, flushes full backlog

[*] Only when the kthread is not available. And in that case #3 is the
scenario used for the trailing full flushing by #1 and #4.


Full flushing is attempted in all 5 scenarios. A failed attempt means
there is a new owner, but that owner is also one of the 5 scenarios.

Am I missing something?

IMHO #3 is the only bizarre scenario. It exists only to cover when the
kthread is not available.

John

2024-04-19 09:59:22

by Petr Mladek

Subject: Re: [PATCH printk v4 17/27] printk: nbcon: Use nbcon consoles in console_flush_all()

On Thu 2024-04-18 23:51:01, John Ogness wrote:
> On 2024-04-18, Petr Mladek <[email protected]> wrote:
> >> > Solve this problem by introducing[*] nbcon_atomic_flush_all()
> >> > which would flush even newly added messages and
> >> > call this in nbcon_cpu_emergency_exit() when the printk
> >> > kthread does not work. It should bail out when there
> >> > is a panic in progress.
> >> >
> >> > Motivation: It does not matter which "atomic" context
> >> > flushes NORMAL/EMERGENCY messages when
> >> > the printk kthread is not available.
> >>
> >> I do not think that solves the problem. If the console is in an unsafe
> >> section, nothing can be printed.
> >
> > IMHO, it solves the problem.
> >
> > The idea is simple:
> >
> > "The current nbcon console owner will be responsible for flushing
> > all messages when the printk kthread does not exist."
>
> Currently this is not valid. It assumes owners are printers. That is not
> always true. The owner might be some task modifying the baud rate and
> has nothing to do with printing.

Ah, I have missed this scenario.

> So what I am proposing is that we add two new normal-flushing sites that
> are only used when the kthread is not available:
>
> 1. after exiting emergency mode: in nbcon_cpu_emergency_exit()
>
> 2. after releasing ownership for non-printing: in nbcon_driver_release()
>
> I think this will close the gap and it does not require irq_work.
>
> > Sigh, all this is so complicated. I wonder how to document
> > this so that other people do not have to discover these
> > dependencies again and again. Is it even possible?
>
> In the end we will have the following 5 scenarios (assuming my
> proposal):
>
> 1. PRIO_NORMAL non-printing activity, always under con->device() lock,
> upon release flushes[*] full backlog
>
> 2. PRIO_NORMAL printing using write_thread(), always called from task
> context and under con->device() lock, always flushes full backlog
>
> 3. PRIO_NORMAL printing using write_atomic(), called with hardware
> interrupts disabled, always flushes full backlog, (only used when the
> kthread is not available)
>
> 4. PRIO_EMERGENCY printing using write_atomic(), called with hardware
> interrupts disabled, flushes through emergency, upon exit flushes[*]
> full backlog
>
> 5. PRIO_PANIC printing using write_atomic(), called with hardware
> interrupts disabled, flushes full backlog
>
> [*] Only when the kthread is not available. And in that case #3 is the
> scenario used for the trailing full flushing by #1 and #4.
>
>
> Full flushing is attempted in all 5 scenarios. A failed attempt means
> there is a new owner, but that owner is also one of the 5 scenarios.
>
> Am I missing something?
>
> IMHO #3 is the only bizarre scenario. It exists only to cover when the
> kthread is not available.

Great summary! I like it.

Let me try another summary:

We basically repeat the same trick that was used in the legacy
console_unlock(): when the kthread is not available then
the current owner is responsible for flushing everything.

The game changer is the kthread. When it is available then
it takes care of "everything" as long as the system is
working normally.

The system is working normally until suspend, shutdown, or panic().
It might also be sick, in which case we try to flush the doctor's
report immediately (emergency, when safe). But we wait for
the entire doctor's report before flushing (publishing).


Or another look. We have the following rules:

1. kthread is not available => the owner flushes everything

2. kthread is available:

a) Normal messages are offloaded to the kthread (store + kick)

b) Emergency messages (the doctor's examination) are first stored and
then we try to flush them immediately (when possible/safe),
including all previous messages (other up-to-date notes).

The emergency messages might also be split into more
parts when the report is too long, for example,
backtraces or lock dependencies of all tasks
(reports from more doctor specialists). In this case,
each part (report) is flushed immediately (when possible/safe),
including all previous messages (other up-to-date notes).

When the system tries to continue normally (not dying)
then any later messages (post-exam notes) are
left to the kthread (secretary) to flush.

c) Panic messages are flushed immediately (when possible/safe),
including all previous messages (other up-to-date notes).

d) The final flush in panic() will be done even when not safe
(the patient will try to read the diary even when feeling
dizzy and it might be a complete fiasco, but it is
the very last chance).


In short, go ahead with your proposal.

Best Regards,
Petr