LinuxLists.cc - [PATCH printk v2 24/26] panic: Mark emergency section in oops

2024-02-18 19:01:20

Subject: [PATCH printk v2 24/26] panic: Mark emergency section in oops

Mark an emergency section beginning with oops_enter() until the
end of oops_exit(). In this section, the CPU will not perform
console output for the printk() calls. Instead, a flushing of the
console output is triggered when exiting the emergency section.

The very end of oops_exit() performs a kmsg_dump(). This is not
included in the emergency section because it is another
flushing mechanism that should occur after the consoles have
been triggered to flush.

Signed-off-by: John Ogness <[email protected]>
---
kernel/panic.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index d30d261f9246..9fa44bc38f46 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -634,6 +634,7 @@ bool oops_may_print(void)
  */
 void oops_enter(void)
 {
+	nbcon_cpu_emergency_enter();
 	tracing_off();
 	/* can't trust the integrity of the kernel anymore: */
 	debug_locks_off();
@@ -656,6 +657,7 @@ void oops_exit(void)
 {
 	do_oops_enter_exit();
 	print_oops_end_marker();
+	nbcon_cpu_emergency_exit();
 	kmsg_dump(KMSG_DUMP_OOPS);
 }

--
2.39.2
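
To illustrate the deferral described in the commit message, here is a minimal userspace sketch. It is not the kernel's nbcon implementation: every `toy_*` name is hypothetical, the fixed-size array stands in for the printk ringbuffer, stdout stands in for a console, and the nesting counter is an assumption of the model. While the nesting depth is non-zero, messages are only stored; leaving the outermost section triggers the flush.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical toy model; none of these names are kernel APIs. */
#define TOY_LOG_MAX 8
#define TOY_MSG_MAX 64

static char toy_log[TOY_LOG_MAX][TOY_MSG_MAX];	/* stand-in for the ringbuffer */
static int toy_stored;		/* records stored so far */
static int toy_flushed;		/* records already written to the "console" */
static int toy_nesting;		/* > 0 while inside an emergency section */

/* Number of stored records that have not reached the console yet. */
int toy_pending(void)
{
	return toy_stored - toy_flushed;
}

/* Write all pending records to stdout; returns how many were written. */
int toy_flush(void)
{
	int n = 0;

	while (toy_flushed < toy_stored) {
		printf("console: %s\n", toy_log[toy_flushed++]);
		n++;
	}
	return n;
}

void toy_emergency_enter(void)
{
	toy_nesting++;
}

/* Leaving the outermost section triggers the deferred flush. */
int toy_emergency_exit(void)
{
	assert(toy_nesting > 0);
	if (--toy_nesting == 0)
		return toy_flush();
	return 0;
}

/* Always store the record; print right away only outside a section. */
void toy_printk(const char *msg)
{
	if (toy_stored == TOY_LOG_MAX)
		return;	/* toy model only: drop when the buffer is full */
	snprintf(toy_log[toy_stored++], TOY_MSG_MAX, "%s", msg);
	if (toy_nesting == 0)
		toy_flush();
}
```

Calling toy_emergency_enter(), then toy_printk() twice, leaves two records pending; the matching toy_emergency_exit() writes both and returns 2.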

2024-03-01 14:55:51

by Petr Mladek

Subject: Re: [PATCH printk v2 24/26] panic: Mark emergency section in oops

On Sun 2024-02-18 20:03:24, John Ogness wrote:
> Mark an emergency section beginning with oops_enter() until the
> end of oops_exit(). In this section, the CPU will not perform
> console output for the printk() calls. Instead, a flushing of the
> console output is triggered when exiting the emergency section.
>
> The very end of oops_exit() performs a kmsg_dump(). This is not
> included in the emergency section because it is another
> flushing mechanism that should occur after the consoles have
> been triggered to flush.
>
> Signed-off-by: John Ogness <[email protected]>
> ---
> kernel/panic.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index d30d261f9246..9fa44bc38f46 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -634,6 +634,7 @@ bool oops_may_print(void)
>   */
>  void oops_enter(void)
>  {
> +	nbcon_cpu_emergency_enter();
>  	tracing_off();
>  	/* can't trust the integrity of the kernel anymore: */
>  	debug_locks_off();
> @@ -656,6 +657,7 @@ void oops_exit(void)
>  {
>  	do_oops_enter_exit();

The comment above the oops_enter() function says:

/*
 * Called when the architecture enters its oops handler, before it prints
 * anything. If this is the first CPU to oops, and it's oopsing the first
 * time then let it proceed.
 *
 * This is all enabled by the pause_on_oops kernel boot option. We do all
 * this to ensure that oopses don't scroll off the screen. It has the
 * side-effect of preventing later-oopsing CPUs from mucking up the display,
 * too.
 *
 * It turns out that the CPU which is allowed to print ends up pausing for
 * the right duration, whereas all the other CPUs pause for twice as long:
 * once in oops_enter(), once in oops_exit().
 */

and indeed, do_oops_enter_exit() does the waiting.

IMHO, we should enter() the emergency context after waiting in
oops_enter(), and exit() it before waiting in oops_exit(). That is:

 void oops_enter(void)
 {
 	tracing_off();
 	/* can't trust the integrity of the kernel anymore: */
 	debug_locks_off();
 	do_oops_enter_exit();
+	nbcon_cpu_emergency_enter();

 	if (sysctl_oops_all_cpu_backtrace)
 		trigger_all_cpu_backtrace();
 }

 void oops_exit(void)
 {
+	nbcon_cpu_emergency_exit();
 	do_oops_enter_exit();
 	print_oops_end_marker();
 	kmsg_dump(KMSG_DUMP_OOPS);
 }

>  	print_oops_end_marker();
> +	nbcon_cpu_emergency_exit();
>  	kmsg_dump(KMSG_DUMP_OOPS);
>  }

Otherwise, it looks good.

Best Regards,
Petr
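
The difference between the two orderings discussed in this thread can be modeled with a small userspace sketch. It is only an illustration under stated assumptions: all `sim_*` names are hypothetical, and sim_pause() merely records where the do_oops_enter_exit() wait would happen rather than actually waiting. The sketch counts how many of the two waits fall inside the emergency section.

```c
#include <assert.h>

/* Hypothetical model; none of these names are kernel APIs. */
static int sim_nesting;			/* emergency nesting depth */
static int sim_waits_in_emergency;	/* pauses taken while depth > 0 */

void sim_emergency_enter(void)
{
	sim_nesting++;
}

void sim_emergency_exit(void)
{
	assert(sim_nesting > 0);
	sim_nesting--;
}

/* Stand-in for do_oops_enter_exit(): only notes where the wait occurs. */
void sim_pause(void)
{
	if (sim_nesting > 0)
		sim_waits_in_emergency++;
}

/* Ordering of the patch as posted: both waits are inside the section. */
int sim_oops_as_posted(void)
{
	sim_waits_in_emergency = 0;
	sim_emergency_enter();
	sim_pause();		/* wait in oops_enter() */
	/* ... the oops report would be printed here ... */
	sim_pause();		/* wait in oops_exit() */
	sim_emergency_exit();
	return sim_waits_in_emergency;
}

/* Ordering suggested in the review: both waits are outside the section. */
int sim_oops_as_suggested(void)
{
	sim_waits_in_emergency = 0;
	sim_pause();		/* wait in oops_enter() */
	sim_emergency_enter();
	/* ... the oops report would be printed here ... */
	sim_emergency_exit();
	sim_pause();		/* wait in oops_exit() */
	return sim_waits_in_emergency;
}
```

In this model, sim_oops_as_posted() reports two waits inside the emergency section, while sim_oops_as_suggested() reports none.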