2024-02-18 19:01:33

by John Ogness

[permalink] [raw]
Subject: [PATCH printk v2 21/26] printk: Coordinate direct printing in panic

Perform printing by nbcon consoles on the panic CPU from the
printk() caller context in order to get panic messages printed
as soon as possible.

If legacy and nbcon consoles are registered, the legacy consoles
will no longer perform direct printing on the panic CPU until
after the backtrace has been stored. This will give the safe
nbcon consoles a chance to print the panic messages before
allowing the unsafe legacy consoles to print.

If no nbcon consoles are registered, there is no change in
behavior (i.e. legacy consoles will always attempt to print
from the printk() caller context).

Signed-off-by: John Ogness <[email protected]>
---
include/linux/printk.h | 5 ++++
kernel/panic.c | 2 ++
kernel/printk/printk.c | 53 ++++++++++++++++++++++++++++++++++++------
3 files changed, 53 insertions(+), 7 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index fe37395af9e2..a2d40a637226 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -197,6 +197,7 @@ void show_regs_print_info(const char *log_lvl);
extern asmlinkage void dump_stack_lvl(const char *log_lvl) __cold;
extern asmlinkage void dump_stack(void) __cold;
void printk_trigger_flush(void);
+void printk_legacy_allow_panic_sync(void);
extern void uart_nbcon_acquire(struct uart_port *up);
extern void uart_nbcon_release(struct uart_port *up);
void nbcon_atomic_flush_unsafe(void);
@@ -280,6 +281,10 @@ static inline void printk_trigger_flush(void)
{
}

+static inline void printk_legacy_allow_panic_sync(void)
+{
+}
+
static inline void uart_nbcon_acquire(struct uart_port *up)
{
}
diff --git a/kernel/panic.c b/kernel/panic.c
index c039f8e1ddae..86813305510f 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -370,6 +370,8 @@ void panic(const char *fmt, ...)
*/
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

+ printk_legacy_allow_panic_sync();
+
panic_print_sys_info(false);

kmsg_dump(KMSG_DUMP_PANIC);
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 2c4ab717c110..109cfdd988aa 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2329,12 +2329,23 @@ int vprintk_store(int facility, int level,
return ret;
}

+static bool legacy_allow_panic_sync;
+
+/*
+ * This acts as a one-way switch to allow legacy consoles to print from
+ * the printk() caller context on a panic CPU.
+ */
+void printk_legacy_allow_panic_sync(void)
+{
+ legacy_allow_panic_sync = true;
+}
+
asmlinkage int vprintk_emit(int facility, int level,
const struct dev_printk_info *dev_info,
const char *fmt, va_list args)
{
+ bool do_trylock_unlock = printing_via_unlock;
int printed_len;
- bool in_sched = false;

/* Suppress unimportant messages after panic happens */
if (unlikely(suppress_printk))
@@ -2350,15 +2361,43 @@ asmlinkage int vprintk_emit(int facility, int level,

if (level == LOGLEVEL_SCHED) {
level = LOGLEVEL_DEFAULT;
- in_sched = true;
+ /* If called from the scheduler, we can not call up(). */
+ do_trylock_unlock = false;
}

printk_delay(level);

printed_len = vprintk_store(facility, level, dev_info, fmt, args);

- /* If called from the scheduler, we can not call up(). */
- if (!in_sched && printing_via_unlock) {
+ if (!have_boot_console && have_nbcon_console) {
+ bool is_panic_context = this_cpu_in_panic();
+
+ /*
+ * In panic, the legacy consoles are not allowed to print from
+ * the printk calling context unless explicitly allowed. This
+ * gives the safe nbcon consoles a chance to print out all the
+ * panic messages first. This restriction only applies if
+ * there are nbcon consoles registered.
+ */
+ if (is_panic_context)
+ do_trylock_unlock &= legacy_allow_panic_sync;
+
+ /*
+ * There are situations where nbcon atomic printing should
+ * happen in the printk() caller context:
+ *
+ * - When this CPU is in panic.
+ *
+ * Note that if boot consoles are registered, the
+ * console_lock/console_unlock dance must be relied upon
+ * instead because nbcon consoles cannot print simultaneously
+ * with boot consoles.
+ */
+ if (is_panic_context)
+ nbcon_atomic_flush_all();
+ }
+
+ if (do_trylock_unlock) {
/*
* The caller may be holding system-critical or
* timing-sensitive locks. Disable preemption during
@@ -2378,10 +2417,10 @@ asmlinkage int vprintk_emit(int facility, int level,
preempt_enable();
}

- if (in_sched)
- defer_console_output();
- else
+ if (do_trylock_unlock)
wake_up_klogd();
+ else
+ defer_console_output();

return printed_len;
}
--
2.39.2



2024-03-01 13:06:09

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH printk v2 21/26] printk: Coordinate direct printing in panic

On Sun 2024-02-18 20:03:21, John Ogness wrote:
> Perform printing by nbcon consoles on the panic CPU from the
> printk() caller context in order to get panic messages printed
> as soon as possible.
>
> If legacy and nbcon consoles are registered, the legacy consoles
> will no longer perform direct printing on the panic CPU until
> after the backtrace has been stored. This will give the safe
> nbcon consoles a chance to print the panic messages before
> allowing the unsafe legacy consoles to print.
>
> If no nbcon consoles are registered, there is no change in
> behavior (i.e. legacy consoles will always attempt to print
> from the printk() caller context).

> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -370,6 +370,8 @@ void panic(const char *fmt, ...)
> */
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>
> + printk_legacy_allow_panic_sync();

I would call this before the panic notifiers. They are known
to cause problems.

It was the reason to introduce "crash_kexec_post_notifiers" parameter.
Also there is a patchset which tries to somehow split them
by purpose, see
https://lore.kernel.org/all/[email protected]/

It brings another question whether to try flushing the legacy consoles
before calling the notifiers.

> panic_print_sys_info(false);
>
> kmsg_dump(KMSG_DUMP_PANIC);
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2329,12 +2329,23 @@ int vprintk_store(int facility, int level,
> return ret;
> }
>
> +static bool legacy_allow_panic_sync;
> +
> +/*
> + * This acts as a one-way switch to allow legacy consoles to print from
> + * the printk() caller context on a panic CPU.
> + */
> +void printk_legacy_allow_panic_sync(void)
> +{
> + legacy_allow_panic_sync = true;

I would flush the legacy consoles here. Otherwise it might be done
by another random printk() from the notifiers or by the even more
unsafe printk_console_flush_in_panic().

I mean to do something like:

if (printing_via_unlock && console_trylock)
console_unlock();

> +}
> +
> asmlinkage int vprintk_emit(int facility, int level,
> const struct dev_printk_info *dev_info,
> const char *fmt, va_list args)
> {
> + bool do_trylock_unlock = printing_via_unlock;
> int printed_len;
> - bool in_sched = false;
>
> /* Suppress unimportant messages after panic happens */
> if (unlikely(suppress_printk))
> @@ -2350,15 +2361,43 @@ asmlinkage int vprintk_emit(int facility, int level,
>
> if (level == LOGLEVEL_SCHED) {
> level = LOGLEVEL_DEFAULT;
> - in_sched = true;
> + /* If called from the scheduler, we can not call up(). */
> + do_trylock_unlock = false;
> }
>
> printk_delay(level);
>
> printed_len = vprintk_store(facility, level, dev_info, fmt, args);
>
> - /* If called from the scheduler, we can not call up(). */
> - if (!in_sched && printing_via_unlock) {
> + if (!have_boot_console && have_nbcon_console) {

Nit: The opposite order is more logic ;-)

if (have_nbcon_console && !have_boot_console) {

> + bool is_panic_context = this_cpu_in_panic();
> +
> + /*
> + * In panic, the legacy consoles are not allowed to print from
> + * the printk calling context unless explicitly allowed. This
> + * gives the safe nbcon consoles a chance to print out all the
> + * panic messages first. This restriction only applies if
> + * there are nbcon consoles registered.
> + */
> + if (is_panic_context)
> + do_trylock_unlock &= legacy_allow_panic_sync;
> +
> + /*
> + * There are situations where nbcon atomic printing should
> + * happen in the printk() caller context:
> + *
> + * - When this CPU is in panic.
> + *
> + * Note that if boot consoles are registered, the
> + * console_lock/console_unlock dance must be relied upon
> + * instead because nbcon consoles cannot print simultaneously
> + * with boot consoles.
> + */
> + if (is_panic_context)
> + nbcon_atomic_flush_all();
> + }
> +
> + if (do_trylock_unlock) {
> /*
> * The caller may be holding system-critical or
> * timing-sensitive locks. Disable preemption during

Otherwise, it looks good.

Best Regards,
Petr