LinuxLists.cc - [PATCH] panic: disable optimistic spin after halting CPUs

2022-01-13 00:55:16

Subject: [PATCH] panic: disable optimistic spin after halting CPUs

A CPU executing with console lock spinning enabled might be halted
during a panic. Before console_flush_on_panic(), it's possible for
console_trylock() to attempt optimistic spinning, deadlocking the panic
CPU:

CPU 0 (panic CPU) CPU 1
----------------- ------
printk() {
vprintk_func() {
vprintk_default() {
vprintk_emit() {
console_unlock() {
console_lock_spinning_enable();
... printing to console ...
panic() {
crash_smp_send_stop() {
NMI -------------------> HALT
}
atomic_notifier_call_chain() {
printk() {
...
console_trylock_spinnning() {
// optimistic spin infinitely

This hang during panic can be induced when a kdump kernel is loaded, and
crash_kexec_post_notifiers=1 is present on the kernel command line. The
following script which concurrently writes to /dev/kmsg, and triggers a
panic, can result in this hang:

#!/bin/bash
date
# 991 chars (based on log buffer size):
chars="$(printf 'a%.0s' {1..991})"
while :; do
echo $chars > /dev/kmsg
done &
echo c > /proc/sysrq-trigger &
date
exit

Below are the hang rates for the above test case, based on v5.16-rc8
before and after this patch:

before: 197 hangs / 340 trials - 57.9%
after : 0 hangs / 245 trials - 0.0%

Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")

Signed-off-by: Stephen Brennan <[email protected]>
Reviewed-by: Junxiao Bi <[email protected]>
---
include/linux/console.h | 1 +
kernel/panic.c | 7 +++++++
kernel/printk/printk.c | 17 +++++++++++++++++
3 files changed, 25 insertions(+)

diff --git a/include/linux/console.h b/include/linux/console.h
index a97f277cfdfa..4eeb46927d96 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -179,6 +179,7 @@ extern void console_unlock(void);
extern void console_conditional_schedule(void);
extern void console_unblank(void);
extern void console_flush_on_panic(enum con_flush_mode mode);
+extern void console_lock_spinning_disable_on_panic(void);
extern struct tty_driver *console_device(int *);
extern void console_stop(struct console *);
extern void console_start(struct console *);
diff --git a/kernel/panic.c b/kernel/panic.c
index cefd7d82366f..a9340e580b20 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -265,6 +265,13 @@ void panic(const char *fmt, ...)
crash_smp_send_stop();
}

+ /*
+ * Now that we've halted other CPUs, disable optimistic spinning in
+ * printk(). This avoids deadlocking in console_trylock(), until we call
+ * console_flush_on_panic().
+ */
+ console_lock_spinning_disable_on_panic();
+
/*
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 57b132b658e1..d824fdf7d3fd 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1823,6 +1823,23 @@ static int console_lock_spinning_disable_and_check(void)
return 1;
}

+/**
+ * console_lock_spinning_disable_on_panic - disable spinning so that
+ * a panic CPU does not enter an infinite loop
+ *
+ * This is called once all CPUs are halted. A CPU halted during a section which
+ * allowed spinning, could trigger an infinite loop in console_trylock. To avoid
+ * this, mark console_owner as NULL.
+ */
+void console_lock_spinning_disable_on_panic(void)
+{
+ WRITE_ONCE(console_owner, NULL);
+ if (raw_spin_is_locked(&console_owner_lock)) {
+ debug_locks_off();
+ raw_spin_lock_init(&console_owner_lock);
+ }
+}
+
/**
* console_trylock_spinning - try to get console_lock by busy waiting
*
--
2.30.2

2022-01-13 17:39:34

by Petr Mladek

[permalink] [raw]

Subject: Re: [PATCH] panic: disable optimistic spin after halting CPUs

On Wed 2022-01-12 16:54:25, Stephen Brennan wrote:
> A CPU executing with console lock spinning enabled might be halted
> during a panic. Before console_flush_on_panic(), it's possible for
> console_trylock() to attempt optimistic spinning, deadlocking the panic
> CPU:
>
> CPU 0 (panic CPU) CPU 1
> ----------------- ------
> printk() {
> vprintk_func() {
> vprintk_default() {
> vprintk_emit() {
> console_unlock() {
> console_lock_spinning_enable();
> ... printing to console ...
> panic() {
> crash_smp_send_stop() {
> NMI -------------------> HALT
> }
> atomic_notifier_call_chain() {
> printk() {
> ...
> console_trylock_spinnning() {
> // optimistic spin infinitely

I see.

> This hang during panic can be induced when a kdump kernel is loaded, and
> crash_kexec_post_notifiers=1 is present on the kernel command line. The
> following script which concurrently writes to /dev/kmsg, and triggers a
> panic, can result in this hang:
>
> #!/bin/bash
> date
> # 991 chars (based on log buffer size):
> chars="$(printf 'a%.0s' {1..991})"
> while :; do
> echo $chars > /dev/kmsg
> done &
> echo c > /proc/sysrq-trigger &
> date
> exit
>
> Below are the hang rates for the above test case, based on v5.16-rc8
> before and after this patch:
>
> before: 197 hangs / 340 trials - 57.9%
> after : 0 hangs / 245 trials - 0.0%
>
> Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
>
> Signed-off-by: Stephen Brennan <[email protected]>
> Reviewed-by: Junxiao Bi <[email protected]>
> ---
> include/linux/console.h | 1 +
> kernel/panic.c | 7 +++++++
> kernel/printk/printk.c | 17 +++++++++++++++++
> 3 files changed, 25 insertions(+)
>
> diff --git a/include/linux/console.h b/include/linux/console.h
> index a97f277cfdfa..4eeb46927d96 100644
> --- a/include/linux/console.h
> +++ b/include/linux/console.h
> @@ -179,6 +179,7 @@ extern void console_unlock(void);
> extern void console_conditional_schedule(void);
> extern void console_unblank(void);
> extern void console_flush_on_panic(enum con_flush_mode mode);
> +extern void console_lock_spinning_disable_on_panic(void);
> extern struct tty_driver *console_device(int *);
> extern void console_stop(struct console *);
> extern void console_start(struct console *);
> diff --git a/kernel/panic.c b/kernel/panic.c
> index cefd7d82366f..a9340e580b20 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -265,6 +265,13 @@ void panic(const char *fmt, ...)
> crash_smp_send_stop();
> }
>
> + /*
> + * Now that we've halted other CPUs, disable optimistic spinning in
> + * printk(). This avoids deadlocking in console_trylock(), until we call
> + * console_flush_on_panic().
> + */
> + console_lock_spinning_disable_on_panic();

The proposed implementation causes that the panicing CPU takes over
console_lock even when the current owner is in the middle of
call_console_drivers().

It means that calling console drivers from another CPU is not
completely safe. This is why console_flush_on_panic() is called
at end when the console is the only way to see the messages.

We should not do this before kmsg_dump() and __crash_kexec()
that allow to see the messages a "more safe" way.

I suggest to disable the spinning when panic is in progress. I mean
something like:

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1842,6 +1842,10 @@ static int console_trylock_spinning(void)
if (console_trylock())
return 1;

+ /* Spinning is not safe when the system is stopped */
+ if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID)
+ return 0;
+
printk_safe_enter_irqsave(flags);

raw_spin_lock(&console_owner_lock);

It would be also great when the current owner releases console_lock
so that the panicing CPU could take over it.

I think about the following. Well, I am not sure if it would help
in the real life because other CPUs are stopped quite early in panic().

--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2646,13 +2650,18 @@ void console_unlock(void)

for (;;) {
size_t ext_len = 0;
- int handover;
+ int handover, pcpu;
size_t len;

skip:
if (!prb_read_valid(prb, console_seq, &r))
break;

+ /* Allow the panic_cpu to take over consoles a safe way. */
+ pcpu = read_atomic(&panic_cpu);
+ if (pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id())
+ break;
+
if (console_seq != r.info->seq) {
console_dropped += r.info->seq - console_seq;
console_seq = r.info->seq;

Note that the above code is not even compile tested. panic_cpu is
local variable in panic.c. If we go this way than I would define
some helpers:

bool panic_in_progress(void)
{
return read_atomic(&panic_cpu) != PANIC_CPU_INVALID;
}

bool panic_in_progress_on_other_cpu(void)
{
int pcpu = read_atomic(&panic_cpu);

if (pcpu == PANIC_CPU_INVALID)
return false;

if (pcpu == smp_processor_id())
return false;

return true;
}

Best Regards,
Petr

2022-01-13 19:37:20

by Stephen Brennan

[permalink] [raw]

Subject: Re: [PATCH] panic: disable optimistic spin after halting CPUs

Hi Petr,

Thanks for taking a look!

Petr Mladek <[email protected]> writes:
> On Wed 2022-01-12 16:54:25, Stephen Brennan wrote:
>> A CPU executing with console lock spinning enabled might be halted
>> during a panic. Before console_flush_on_panic(), it's possible for
>> console_trylock() to attempt optimistic spinning, deadlocking the panic
>> CPU:
>>
>> CPU 0 (panic CPU) CPU 1
>> ----------------- ------
>> printk() {
>> vprintk_func() {
>> vprintk_default() {
>> vprintk_emit() {
>> console_unlock() {
>> console_lock_spinning_enable();
>> ... printing to console ...
>> panic() {
>> crash_smp_send_stop() {
>> NMI -------------------> HALT
>> }
>> atomic_notifier_call_chain() {
>> printk() {
>> ...
>> console_trylock_spinnning() {
>> // optimistic spin infinitely
>
> I see.
>
>> This hang during panic can be induced when a kdump kernel is loaded, and
>> crash_kexec_post_notifiers=1 is present on the kernel command line. The
>> following script which concurrently writes to /dev/kmsg, and triggers a
>> panic, can result in this hang:
>>
>> #!/bin/bash
>> date
>> # 991 chars (based on log buffer size):
>> chars="$(printf 'a%.0s' {1..991})"
>> while :; do
>> echo $chars > /dev/kmsg
>> done &
>> echo c > /proc/sysrq-trigger &
>> date
>> exit
>>
>> Below are the hang rates for the above test case, based on v5.16-rc8
>> before and after this patch:
>>
>> before: 197 hangs / 340 trials - 57.9%
>> after : 0 hangs / 245 trials - 0.0%
>>
>> Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
>>
>> Signed-off-by: Stephen Brennan <[email protected]>
>> Reviewed-by: Junxiao Bi <[email protected]>
>> ---
>> include/linux/console.h | 1 +
>> kernel/panic.c | 7 +++++++
>> kernel/printk/printk.c | 17 +++++++++++++++++
>> 3 files changed, 25 insertions(+)
>>
>> diff --git a/include/linux/console.h b/include/linux/console.h
>> index a97f277cfdfa..4eeb46927d96 100644
>> --- a/include/linux/console.h
>> +++ b/include/linux/console.h
>> @@ -179,6 +179,7 @@ extern void console_unlock(void);
>> extern void console_conditional_schedule(void);
>> extern void console_unblank(void);
>> extern void console_flush_on_panic(enum con_flush_mode mode);
>> +extern void console_lock_spinning_disable_on_panic(void);
>> extern struct tty_driver *console_device(int *);
>> extern void console_stop(struct console *);
>> extern void console_start(struct console *);
>> diff --git a/kernel/panic.c b/kernel/panic.c
>> index cefd7d82366f..a9340e580b20 100644
>> --- a/kernel/panic.c
>> +++ b/kernel/panic.c
>> @@ -265,6 +265,13 @@ void panic(const char *fmt, ...)
>> crash_smp_send_stop();
>> }
>>
>> + /*
>> + * Now that we've halted other CPUs, disable optimistic spinning in
>> + * printk(). This avoids deadlocking in console_trylock(), until we call
>> + * console_flush_on_panic().
>> + */
>> + console_lock_spinning_disable_on_panic();
>
> The proposed implementation causes that the panicing CPU takes over
> console_lock even when the current owner is in the middle of
> call_console_drivers().

I don't think this is true. All that
console_lock_spinning_disable_on_panic() does is clear out the
console_owner and console_owner_lock, signalling that the panicking CPU
should not attempt to spin and wait for the lock.

The only time this matters is when another task already holds
console_sem and is executing presently in call_console_drivers(), as you
said.

With console_owner set to NULL, console_trylock_spinning() will get
spin=false, and thus it will bail out, returning 0. Thus, the panicking
CPU would fail to acquire console_sem, and so in vprintk_emit it will
not end up going into console_unlock(), or doing call_console_drivers().

It would not be until console_flush_on_panic() down below (when there is
no other option than to try to use the console) that the panicking CPU
actually attempts to call the drivers.

>
> It means that calling console drivers from another CPU is not
> completely safe. This is why console_flush_on_panic() is called
> at end when the console is the only way to see the messages.
>
> We should not do this before kmsg_dump() and __crash_kexec()
> that allow to see the messages a "more safe" way.
>
> I suggest to disable the spinning when panic is in progress. I mean
> something like:
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -1842,6 +1842,10 @@ static int console_trylock_spinning(void)
> if (console_trylock())
> return 1;
>
> + /* Spinning is not safe when the system is stopped */
> + if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID)
> + return 0;
> +
> printk_safe_enter_irqsave(flags);
>
> raw_spin_lock(&console_owner_lock);

I think this is pretty much equivalent to my fix, since it also results
in the panicking CPU failing to acquire console_sem during
console_trylock_spinning().

I do think this is better than my fix :) since it doesn't clutter up
panic() even more, nor introduce an additional function. It even avoids
resetting the console_owner_lock spinlock.

I'd be happy to do this as a v2, if you'd prefer?

>
> It would be also great when the current owner releases console_lock
> so that the panicing CPU could take over it.
>
> I think about the following. Well, I am not sure if it would help
> in the real life because other CPUs are stopped quite early in panic().
>
> --- a/kernel/printk/printk.c
> +++ b/kernel/printk/printk.c
> @@ -2646,13 +2650,18 @@ void console_unlock(void)
>
> for (;;) {
> size_t ext_len = 0;
> - int handover;
> + int handover, pcpu;
> size_t len;
>
> skip:
> if (!prb_read_valid(prb, console_seq, &r))
> break;
>
> + /* Allow the panic_cpu to take over consoles a safe way. */
> + pcpu = read_atomic(&panic_cpu);
> + if (pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id())
> + break;
> +
> if (console_seq != r.info->seq) {
> console_dropped += r.info->seq - console_seq;
> console_seq = r.info->seq;
>

I suppose this could help reduce the odds of a CPU getting interrupted
during call_console_drivers(), and it might reduce the odds of
console_sem being held by a halted CPU during panic().

However, it occurs to me that there are two cases:

1) a kdump kernel is loaded. In this case, crash_smp_send_stop() is
used, which (at least on x86_64) sends NMIs. In this case,
__crash_kexec() will not return (barring any errors) and we won't ever
really need to get the console_sem.

2) no kdump kernel is loaded. In this case, smp_send_stop() is used,
which (again on x86_64) sends regular IPIs. In this case, we know we can
at least avoid interrupting call_console_drivers(), since that executes
with local IRQ disabled.

So I'm also unsure how much this would help in the real world. But it's
a small change and so it doesn't need to help much in order to be a net
positive :)

Thanks,
Stephen

>
>
> Note that the above code is not even compile tested. panic_cpu is
> local variable in panic.c. If we go this way than I would define
> some helpers:
>
> bool panic_in_progress(void)
> {
> return read_atomic(&panic_cpu) != PANIC_CPU_INVALID;
> }
>
> bool panic_in_progress_on_other_cpu(void)
> {
> int pcpu = read_atomic(&panic_cpu);
>
> if (pcpu == PANIC_CPU_INVALID)
> return false;
>
> if (pcpu == smp_processor_id())
> return false;
>
> return true;
> }
>
> Best Regards,
> Petr

2022-01-14 10:24:31

by Petr Mladek

[permalink] [raw]

Subject: Re: [PATCH] panic: disable optimistic spin after halting CPUs

On Thu 2022-01-13 11:36:35, Stephen Brennan wrote:
> Hi Petr,
>
> Thanks for taking a look!
>
> Petr Mladek <[email protected]> writes:
> > On Wed 2022-01-12 16:54:25, Stephen Brennan wrote:
> >> A CPU executing with console lock spinning enabled might be halted
> >> during a panic. Before console_flush_on_panic(), it's possible for
> >> console_trylock() to attempt optimistic spinning, deadlocking the panic
> >> CPU:
> >>
> >> CPU 0 (panic CPU) CPU 1
> >> ----------------- ------
> >> printk() {
> >> vprintk_func() {
> >> vprintk_default() {
> >> vprintk_emit() {
> >> console_unlock() {
> >> console_lock_spinning_enable();
> >> ... printing to console ...
> >> panic() {
> >> crash_smp_send_stop() {
> >> NMI -------------------> HALT
> >> }
> >> atomic_notifier_call_chain() {
> >> printk() {
> >> ...
> >> console_trylock_spinnning() {
> >> // optimistic spin infinitely
> >
> > I see.
> >
> >> This hang during panic can be induced when a kdump kernel is loaded, and
> >> crash_kexec_post_notifiers=1 is present on the kernel command line. The
> >> following script which concurrently writes to /dev/kmsg, and triggers a
> >> panic, can result in this hang:
> >>
> >> #!/bin/bash
> >> date
> >> # 991 chars (based on log buffer size):
> >> chars="$(printf 'a%.0s' {1..991})"
> >> while :; do
> >> echo $chars > /dev/kmsg
> >> done &
> >> echo c > /proc/sysrq-trigger &
> >> date
> >> exit
> >>
> >> Below are the hang rates for the above test case, based on v5.16-rc8
> >> before and after this patch:
> >>
> >> before: 197 hangs / 340 trials - 57.9%
> >> after : 0 hangs / 245 trials - 0.0%
> >>
> >> Fixes: dbdda842fe96 ("printk: Add console owner and waiter logic to load balance console writes")
> >>
> >> Signed-off-by: Stephen Brennan <[email protected]>
> >> Reviewed-by: Junxiao Bi <[email protected]>
> >> ---
> >> include/linux/console.h | 1 +
> >> kernel/panic.c | 7 +++++++
> >> kernel/printk/printk.c | 17 +++++++++++++++++
> >> 3 files changed, 25 insertions(+)
> >>
> >> diff --git a/include/linux/console.h b/include/linux/console.h
> >> index a97f277cfdfa..4eeb46927d96 100644
> >> --- a/include/linux/console.h
> >> +++ b/include/linux/console.h
> >> @@ -179,6 +179,7 @@ extern void console_unlock(void);
> >> extern void console_conditional_schedule(void);
> >> extern void console_unblank(void);
> >> extern void console_flush_on_panic(enum con_flush_mode mode);
> >> +extern void console_lock_spinning_disable_on_panic(void);
> >> extern struct tty_driver *console_device(int *);
> >> extern void console_stop(struct console *);
> >> extern void console_start(struct console *);
> >> diff --git a/kernel/panic.c b/kernel/panic.c
> >> index cefd7d82366f..a9340e580b20 100644
> >> --- a/kernel/panic.c
> >> +++ b/kernel/panic.c
> >> @@ -265,6 +265,13 @@ void panic(const char *fmt, ...)
> >> crash_smp_send_stop();
> >> }
> >>
> >> + /*
> >> + * Now that we've halted other CPUs, disable optimistic spinning in
> >> + * printk(). This avoids deadlocking in console_trylock(), until we call
> >> + * console_flush_on_panic().
> >> + */
> >> + console_lock_spinning_disable_on_panic();
> >
> > The proposed implementation causes that the panicing CPU takes over
> > console_lock even when the current owner is in the middle of
> > call_console_drivers().
>
> I don't think this is true. All that
> console_lock_spinning_disable_on_panic() does is clear out the
> console_owner and console_owner_lock, signalling that the panicking CPU
> should not attempt to spin and wait for the lock.
>
> The only time this matters is when another task already holds
> console_sem and is executing presently in call_console_drivers(), as you
> said.
>
> With console_owner set to NULL, console_trylock_spinning() will get
> spin=false, and thus it will bail out, returning 0. Thus, the panicking
> CPU would fail to acquire console_sem, and so in vprintk_emit it will
> not end up going into console_unlock(), or doing call_console_drivers().
>
> It would not be until console_flush_on_panic() down below (when there is
> no other option than to try to use the console) that the panicking CPU
> actually attempts to call the drivers.

You are right. I should have used drawing board.

> > It means that calling console drivers from another CPU is not
> > completely safe. This is why console_flush_on_panic() is called
> > at end when the console is the only way to see the messages.
> >
> > We should not do this before kmsg_dump() and __crash_kexec()
> > that allow to see the messages a "more safe" way.
> >
> > I suggest to disable the spinning when panic is in progress. I mean
> > something like:
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -1842,6 +1842,10 @@ static int console_trylock_spinning(void)
> > if (console_trylock())
> > return 1;
> >
> > + /* Spinning is not safe when the system is stopped */
> > + if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID)
> > + return 0;
> > +
> > printk_safe_enter_irqsave(flags);
> >
> > raw_spin_lock(&console_owner_lock);
>
> I think this is pretty much equivalent to my fix, since it also results
> in the panicking CPU failing to acquire console_sem during
> console_trylock_spinning().

Yes.

> I do think this is better than my fix :) since it doesn't clutter up
> panic() even more, nor introduce an additional function. It even avoids
> resetting the console_owner_lock spinlock.

Yes. I agree.

It also looks slightly more straightforward to me. It might be matter
of taste. It is just that I misunderstood the effect of your variant
yesterday ;-)

> I'd be happy to do this as a v2, if you'd prefer?

Yes, please. Go for it.

> > It would be also great when the current owner releases console_lock
> > so that the panicing CPU could take over it.
> >
> > I think about the following. Well, I am not sure if it would help
> > in the real life because other CPUs are stopped quite early in panic().
> >
> > --- a/kernel/printk/printk.c
> > +++ b/kernel/printk/printk.c
> > @@ -2646,13 +2650,18 @@ void console_unlock(void)
> >
> > for (;;) {
> > size_t ext_len = 0;
> > - int handover;
> > + int handover, pcpu;
> > size_t len;
> >
> > skip:
> > if (!prb_read_valid(prb, console_seq, &r))
> > break;
> >
> > + /* Allow the panic_cpu to take over consoles a safe way. */
> > + pcpu = read_atomic(&panic_cpu);
> > + if (pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id())
> > + break;
> > +
> > if (console_seq != r.info->seq) {
> > console_dropped += r.info->seq - console_seq;
> > console_seq = r.info->seq;
> >
>
> I suppose this could help reduce the odds of a CPU getting interrupted
> during call_console_drivers(), and it might reduce the odds of
> console_sem being held by a halted CPU during panic().
>
> However, it occurs to me that there are two cases:
>
> 1) a kdump kernel is loaded. In this case, crash_smp_send_stop() is
> used, which (at least on x86_64) sends NMIs. In this case,
> __crash_kexec() will not return (barring any errors) and we won't ever
> really need to get the console_sem.

This is likely the more safe case. NMI will make sure that
the current owner will not mess with the console drivers any longer.

> 2) no kdump kernel is loaded. In this case, smp_send_stop() is used,
> which (again on x86_64) sends regular IPIs. In this case, we know we can
> at least avoid interrupting call_console_drivers(), since that executes
> with local IRQ disabled.

This is probably more dangerous case. Regular IPIs will not stop the
current owner when it is running with IRQ disabled. It means
the it could still manipulate consoles and race with
console_flush_on_panic().

Note that printk() can be called in IRQ disabled context. Also note
that some architectures do not have NMI. They use normal IRQ even
in the 1st case.

> So I'm also unsure how much this would help in the real world. But it's
> a small change and so it doesn't need to help much in order to be a net
> positive :)

I agree that it is a corner case. But I think that it is worth it.

Well, we could do this change separately. I could send the patch
for this part if you would prefer it. But feel free to send it
yourself.

Best Regards,
Petr

2022-01-21 11:44:42

by Stephen Brennan

[permalink] [raw]

Subject: Re: [PATCH] panic: disable optimistic spin after halting CPUs

Petr Mladek <[email protected]> writes:
> On Thu 2022-01-13 11:36:35, Stephen Brennan wrote:
>> Petr Mladek <[email protected]> writes:
>> > On Wed 2022-01-12 16:54:25, Stephen Brennan wrote:
[snip]
>> > I suggest to disable the spinning when panic is in progress. I mean
>> > something like:
>> >
>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -1842,6 +1842,10 @@ static int console_trylock_spinning(void)
>> > if (console_trylock())
>> > return 1;
>> >
>> > + /* Spinning is not safe when the system is stopped */
>> > + if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID)
>> > + return 0;
>> > +
>> > printk_safe_enter_irqsave(flags);
>> >
>> > raw_spin_lock(&console_owner_lock);
>>
>> I think this is pretty much equivalent to my fix, since it also results
>> in the panicking CPU failing to acquire console_sem during
>> console_trylock_spinning().
>
> Yes.
>
>> I do think this is better than my fix :) since it doesn't clutter up
>> panic() even more, nor introduce an additional function. It even avoids
>> resetting the console_owner_lock spinlock.
>
> Yes. I agree.
>
> It also looks slightly more straightforward to me. It might be matter
> of taste. It is just that I misunderstood the effect of your variant
> yesterday ;-)
>
>> I'd be happy to do this as a v2, if you'd prefer?
>
> Yes, please. Go for it.

Hi Petr,

Sorry for the delayed response due to the US holiday weekend.

I switched to your approach, and began running it through my test script
(from the commit message) on QEMU. However, 14% of the time (200/1409) I
found that the system fell into an interesting race condition / loop.

178 void panic(const char *fmt, ...)
179 {
...
187 /*
188 * Disable local interrupts. This will prevent panic_smp_self_stop
189 * from deadlocking the first cpu that invokes the panic, since
190 * there is nothing to prevent an interrupt handler (that runs
191 * after setting panic_cpu) from invoking panic() again.
192 */
193 local_irq_disable();
194 preempt_disable_notrace();
...
211 this_cpu = raw_smp_processor_id();
212 old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
213
214 if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
215 panic_smp_self_stop();
...
226 pr_emerg("Kernel panic - not syncing: %s\n", buf);
...
250 if (!_crash_kexec_post_notifiers) { // <- HALT CPUs
...

We disable IRQ and preemption at the beginning of panic(), then set
panic_cpu. This opens a window (between lines 212 and 250) where
printk() spinning is disabled. Then, we go ahead and we use printk()
during this window, at line 226.

If another CPU is adding to the log buffer at the right time (concurrent
with the pr_emerg in line 226), then they can successfully prevent the
panic CPU from making progress. Since spinning is disabled, the other
CPU in vprintk_emit() will never get the console_sem, and will just
leave its buffered contents in the log buffer. And if the other CPU can
add to the log buffer faster than the panic CPU can drain it... then the
panic CPU is stuck forever writing into the console. And I do mean
forever. If a watchdog triggers, it will find panic_cpu already set, and
so it won't be able to do anything!

Thus I'm now testing the following modification:

// in console_trylock_spinning()
if (atomic_read(&panic_cpu) == this_cpu)
return 0;

This would ensure that the panic CPU won't fall into the spin loop, but
also ensures that other CPUs can't flood the panic CPU with buffered
messages.

I'll test this method and ensure there are no hangs, then send the patch.

>
>> > It would be also great when the current owner releases console_lock
>> > so that the panicing CPU could take over it.
>> >
>> > I think about the following. Well, I am not sure if it would help
>> > in the real life because other CPUs are stopped quite early in panic().
>> >
>> > --- a/kernel/printk/printk.c
>> > +++ b/kernel/printk/printk.c
>> > @@ -2646,13 +2650,18 @@ void console_unlock(void)
>> >
>> > for (;;) {
>> > size_t ext_len = 0;
>> > - int handover;
>> > + int handover, pcpu;
>> > size_t len;
>> >
>> > skip:
>> > if (!prb_read_valid(prb, console_seq, &r))
>> > break;
>> >
>> > + /* Allow the panic_cpu to take over consoles a safe way. */
>> > + pcpu = read_atomic(&panic_cpu);
>> > + if (pcpu != PANIC_CPU_INVALID && pcpu != smp_processor_id())
>> > + break;
>> > +
>> > if (console_seq != r.info->seq) {
>> > console_dropped += r.info->seq - console_seq;
>> > console_seq = r.info->seq;
>> >
>>
>> I suppose this could help reduce the odds of a CPU getting interrupted
>> during call_console_drivers(), and it might reduce the odds of
>> console_sem being held by a halted CPU during panic().
>>
>> However, it occurs to me that there are two cases:
>>
>> 1) a kdump kernel is loaded. In this case, crash_smp_send_stop() is
>> used, which (at least on x86_64) sends NMIs. In this case,
>> __crash_kexec() will not return (barring any errors) and we won't ever
>> really need to get the console_sem.
>
> This is likely the more safe case. NMI will make sure that
> the current owner will not mess with the console drivers any longer.
>
>
>> 2) no kdump kernel is loaded. In this case, smp_send_stop() is used,
>> which (again on x86_64) sends regular IPIs. In this case, we know we can
>> at least avoid interrupting call_console_drivers(), since that executes
>> with local IRQ disabled.
>
> This is probably more dangerous case. Regular IPIs will not stop the
> current owner when it is running with IRQ disabled. It means
> the it could still manipulate consoles and race with
> console_flush_on_panic().

On x86_64, smp_send_stop() will send an IPI and try to wait a grace
period for CPUs to stop, and after that grace period, use NMI, so there
is thankfully some protection.

>
> Note that printk() can be called in IRQ disabled context. Also note
> that some architectures do not have NMI. They use normal IRQ even
> in the 1st case.

Yes, this is a very important consideration.

>
>
>> So I'm also unsure how much this would help in the real world. But it's
>> a small change and so it doesn't need to help much in order to be a net
>> positive :)
>
> I agree that it is a corner case. But I think that it is worth it.
>
> Well, we could do this change separately. I could send the patch
> for this part if you would prefer it. But feel free to send it
> yourself.

I'd be glad to write this patch, put it through my VM stress testing,
and then send it. I've already got it setup so it might as well get put
to good use :)

Thanks,
Stephen

>
> Best Regards,
> Petr

2022-01-21 22:25:27

by Petr Mladek

[permalink] [raw]

Subject: Re: [PATCH] panic: disable optimistic spin after halting CPUs

On Tue 2022-01-18 15:13:51, Stephen Brennan wrote:
> Petr Mladek <[email protected]> writes:
> > On Thu 2022-01-13 11:36:35, Stephen Brennan wrote:
> Hi Petr,
>
> Sorry for the delayed response due to the US holiday weekend.

No problem at all :-)

> I switched to your approach, and began running it through my test script
> (from the commit message) on QEMU. However, 14% of the time (200/1409) I
> found that the system fell into an interesting race condition / loop.
>
> 178 void panic(const char *fmt, ...)
> 179 {
> ...
> 187 /*
> 188 * Disable local interrupts. This will prevent panic_smp_self_stop
> 189 * from deadlocking the first cpu that invokes the panic, since
> 190 * there is nothing to prevent an interrupt handler (that runs
> 191 * after setting panic_cpu) from invoking panic() again.
> 192 */
> 193 local_irq_disable();
> 194 preempt_disable_notrace();
> ...
> 211 this_cpu = raw_smp_processor_id();
> 212 old_cpu = atomic_cmpxchg(&panic_cpu, PANIC_CPU_INVALID, this_cpu);
> 213
> 214 if (old_cpu != PANIC_CPU_INVALID && old_cpu != this_cpu)
> 215 panic_smp_self_stop();
> ...
> 226 pr_emerg("Kernel panic - not syncing: %s\n", buf);
> ...
> 250 if (!_crash_kexec_post_notifiers) { // <- HALT CPUs
> ...
>
> We disable IRQ and preemption at the beginning of panic(), then set
> panic_cpu. This opens a window (between lines 212 and 250) where
> printk() spinning is disabled. Then, we go ahead and we use printk()
> during this window, at line 226.
>
> If another CPU is adding to the log buffer at the right time (concurrent
> with the pr_emerg in line 226), then they can successfully prevent the
> panic CPU from making progress. Since spinning is disabled, the other
> CPU in vprintk_emit() will never get the console_sem, and will just
> leave its buffered contents in the log buffer. And if the other CPU can
> add to the log buffer faster than the panic CPU can drain it... then the
> panic CPU is stuck forever writing into the console. And I do mean
> forever.

Yeah. The console waiter logic helps a bit to avoid this livelock.

> If a watchdog triggers, it will find panic_cpu already set, and
> so it won't be able to do anything!
>
> Thus I'm now testing the following modification:
>
> // in console_trylock_spinning()
> if (atomic_read(&panic_cpu) == this_cpu)
> return 0;
>
> This would ensure that the panic CPU won't fall into the spin loop, but
> also ensures that other CPUs can't flood the panic CPU with buffered
> messages.

Hmm, I am not sure if we really want it:

The other CPU will likely become the console_sem owner. If the
other CPU is running, the current console_sem owner is likely
running as well and will be able to pass console_sem.

Yes, it will throttle the other CPU. It will become busy with
pushing its own messages to the console. So far so good.

But panic() will stop the other CPU at some stage. And the
panic CPU will not be able to push the messages to the
console because @console_sem is owned by the other (stopped) CPU.

panic CPU will try to get the messages out later in
console_flush_on_panic(). But it is prone to deadlocks.

We want panic CPU to become console_sem owner and be able to flush
the messages to the consoles a clean way. This is why I proposed
the 2nd patch where the non-panic owner releases console_sem.

I wonder if we could prevent the livelock another way. The livelock
happens when the messages added faster into the log buffer than
they can reach the console. The result must be that some messages
are lost, see in console_unlock():

if (console_seq != r.info->seq) {
console_dropped += r.info->seq - console_seq;
console_seq = r.info->seq;
}

A solution might be to disable printk() on non-panic CPUs when
some messages are repeatedly dropped during panic. I mean
something like:

void console_unlock(void)
{
[...]
+ static panic_console_dropped;
[...]
if (console_seq != r.info->seq) {
console_dropped += r.info->seq - console_seq;
console_seq = r.info->seq;
+
+ if (read_atomic(&panic_cpu) != PANIC_CPU_INVALID) {
+ if (panic_console_dropped++ > 10)
+ suppress_non_panic_printk = 1;
+ }
}

, where "suppress_non_panic_printk" will have similar effect as
"supress_printk" except that it will supress only printk
on non-panic CPUs.

I am not completely sure that it is the right approach. But it is not
completely bad:

+ allows the panic() CPU to become clean console_sem owner
+ blocks printk() only when many messages are lost anyway

> > Well, we could do this change separately. I could send the patch
> > for this part if you would prefer it. But feel free to send it
> > yourself.
>
> I'd be glad to write this patch, put it through my VM stress testing,
> and then send it. I've already got it setup so it might as well get put
> to good use :)

Great, thanks.

Best Regards,
Petr