2020-02-10 09:49:22

by Konstantin Khlebnikov

Subject: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

In NMI context printk() may save messages into per-cpu buffers and
schedule a flush via irq_work once IRQs are unblocked. This means the
message about a hardlockup appears in the kernel log only when/if the
lockup is gone.

The comment in irq_work_queue_on() states that remote IPIs aren't
NMI-safe, thus printk() cannot schedule the flush work on another cpu.

This patch adds a simple atomic counter of detected hardlockups and
flushes all per-cpu printk buffers in the context of the softlockup
watchdog on any other cpu when it sees a change of this counter.

Signed-off-by: Konstantin Khlebnikov <[email protected]>
---
include/linux/nmi.h   |  1 +
kernel/watchdog.c     | 22 ++++++++++++++++++++++
kernel/watchdog_hld.c |  1 +
3 files changed, 24 insertions(+)

diff --git a/include/linux/nmi.h b/include/linux/nmi.h
index 9003e29cde46..8406df72ae5a 100644
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
 #if defined(CONFIG_HARDLOCKUP_DETECTOR)
 extern void hardlockup_detector_disable(void);
 extern unsigned int hardlockup_panic;
+extern atomic_t hardlockup_detected;
 #else
 static inline void hardlockup_detector_disable(void) {}
 #endif
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b6b1f54a7837..9f5c68fababe 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
 }
 __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
 # endif /* CONFIG_SMP */
+
+atomic_t hardlockup_detected = ATOMIC_INIT(0);
+
+static inline void flush_hardlockup_messages(void)
+{
+	static atomic_t flushed = ATOMIC_INIT(0);
+
+	/* flush messages from hard lockup detector */
+	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
+		atomic_set(&flushed, atomic_read(&hardlockup_detected));
+		printk_safe_flush();
+	}
+}
+
+#else /* CONFIG_HARDLOCKUP_DETECTOR */
+
+static inline void flush_hardlockup_messages(void)
+{
+}
+
 #endif /* CONFIG_HARDLOCKUP_DETECTOR */

 /*
@@ -370,6 +390,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();

+	flush_hardlockup_messages();
+
 	/* kick the softlockup detector */
 	if (completion_done(this_cpu_ptr(&softlockup_completion))) {
 		reinit_completion(this_cpu_ptr(&softlockup_completion));
diff --git a/kernel/watchdog_hld.c b/kernel/watchdog_hld.c
index 247bf0b1582c..a546bc54f6ff 100644
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -154,6 +154,7 @@ static void watchdog_overflow_callback(struct perf_event *event,

 	if (hardlockup_panic)
 		nmi_panic(regs, "Hard LOCKUP");
+	atomic_inc(&hardlockup_detected);

 	__this_cpu_write(hard_watchdog_warn, true);
 	return;


2020-02-10 22:51:47

by Andrew Morton

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On Mon, 10 Feb 2020 12:48:57 +0300 Konstantin Khlebnikov <[email protected]> wrote:

> In NMI context printk() may save messages into per-cpu buffers and
> schedule a flush via irq_work once IRQs are unblocked. This means the
> message about a hardlockup appears in the kernel log only when/if the
> lockup is gone.

I think I understand what this means. The hard lockup detector runs at
NMI time but if it detects a lockup within IRQ context it cannot call
printk, because it's within NMI context, where synchronous printk
doesn't work. Yes?

> The comment in irq_work_queue_on() states that remote IPIs aren't
> NMI-safe, thus printk() cannot schedule the flush work on another cpu.
>
> This patch adds a simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in the context of the softlockup
> watchdog on any other cpu when it sees a change of this counter.

And I think this works because the softlockup detector runs within irq
context?

>
> ...
>
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> }
> __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)

I don't think this needs to be inlined?

> +{
> + static atomic_t flushed = ATOMIC_INIT(0);
> +
> + /* flush messages from hard lockup detector */
> + if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> + atomic_set(&flushed, atomic_read(&hardlockup_detected));
> + printk_safe_flush();
> + }
> +}

Could we add some explanatory comments here? Explain to the reader why
this code exists, what purpose it serves? Basically a micro version of
the above changelog.

>
> ...
>

2020-02-11 08:15:47

by Kirill Tkhai

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

Hi, Konstantin,

On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> In NMI context printk() may save messages into per-cpu buffers and
> schedule a flush via irq_work once IRQs are unblocked. This means the
> message about a hardlockup appears in the kernel log only when/if the
> lockup is gone.
>
> The comment in irq_work_queue_on() states that remote IPIs aren't
> NMI-safe, thus printk() cannot schedule the flush work on another cpu.
>
> This patch adds a simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in the context of the softlockup
> watchdog on any other cpu when it sees a change of this counter.
>
> Signed-off-by: Konstantin Khlebnikov <[email protected]>
> ---
> include/linux/nmi.h   |  1 +
> kernel/watchdog.c     | 22 ++++++++++++++++++++++
> kernel/watchdog_hld.c |  1 +
> 3 files changed, 24 insertions(+)
>
> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> index 9003e29cde46..8406df72ae5a 100644
> --- a/include/linux/nmi.h
> +++ b/include/linux/nmi.h
> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>  extern void hardlockup_detector_disable(void);
>  extern unsigned int hardlockup_panic;
> +extern atomic_t hardlockup_detected;
>  #else
>  static inline void hardlockup_detector_disable(void) {}
>  #endif
> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> index b6b1f54a7837..9f5c68fababe 100644
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>  }
>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>  # endif /* CONFIG_SMP */
> +
> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> +
> +static inline void flush_hardlockup_messages(void)
> +{
> +	static atomic_t flushed = ATOMIC_INIT(0);
> +
> +	/* flush messages from hard lockup detector */
> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> +		printk_safe_flush();
> +	}
> +}

Do we really need two variables here? They may end up in two different
cache lines, and there will be double cache pollution just because of
this simple check. Why not the below?

if (atomic_read(&hardlockup_detected)) {
	atomic_set(&hardlockup_detected, 0);
	printk_safe_flush();
}

Or even the below, since atomic is not needed here, as it does not give
any ordering guarantees anyway (hardlockup_detected as a plain int):

static inline void flush_hardlockup_messages(void)
{
	if (READ_ONCE(hardlockup_detected)) {
		WRITE_ONCE(hardlockup_detected, 0);
		printk_safe_flush();
	}
}

watchdog_overflow_callback()
{
	...
	WRITE_ONCE(hardlockup_detected, 1);
	...
}

Kirill

2020-02-11 11:45:26

by Konstantin Khlebnikov

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On 11/02/2020 01.51, Andrew Morton wrote:
> On Mon, 10 Feb 2020 12:48:57 +0300 Konstantin Khlebnikov <[email protected]> wrote:
>
>> In NMI context printk() may save messages into per-cpu buffers and
>> schedule a flush via irq_work once IRQs are unblocked. This means the
>> message about a hardlockup appears in the kernel log only when/if the
>> lockup is gone.
>
> I think I understand what this means. The hard lockup detector runs at
> NMI time but if it detects a lockup within IRQ context it cannot call
> printk, because it's within NMI context, where synchronous printk
> doesn't work. Yes?

Yes. Printing from the hardlockup watchdog is the only special case.
Without it, irq_work will flush the per-cpu buffer right after IRQs
are enabled again.
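For reference, the buffering scheme looks roughly like this, paraphrased
from kernel/printk/printk_safe.c (field and helper names approximate,
this is a sketch, not the exact kernel code):

struct printk_safe_seq_buf {
	atomic_t	len;		/* bytes used in buffer */
	atomic_t	message_lost;	/* messages dropped on overflow */
	struct irq_work	work;		/* flushes buffer to main log */
	unsigned char	buffer[SAFE_LOG_BUF_LEN];
};

static DEFINE_PER_CPU(struct printk_safe_seq_buf, nmi_print_seq);

static int vprintk_nmi(const char *fmt, va_list args)
{
	struct printk_safe_seq_buf *s = this_cpu_ptr(&nmi_print_seq);
	int len;

	/* append to the per-cpu buffer; never take logbuf_lock in NMI */
	len = printk_safe_log_store(s, fmt, args);

	/* the irq_work runs only once IRQs are enabled again on this
	 * cpu, which never happens while this cpu stays hardlocked */
	irq_work_queue(&s->work);
	return len;
}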

>
>> The comment in irq_work_queue_on() states that remote IPIs aren't
>> NMI-safe, thus printk() cannot schedule the flush work on another cpu.
>>
>> This patch adds a simple atomic counter of detected hardlockups and
>> flushes all per-cpu printk buffers in the context of the softlockup
>> watchdog on any other cpu when it sees a change of this counter.
>
> And I think this works because the softlockup detector runs within irq
> context?

Yes. The softlockup watchdog is a timer. It can use normal printk and
flush the per-cpu buffers. Any periodically executed code could do
that, but the softlockup watchdog is the most logical place for it.

There is a forward signal from the softlockup to the hardlockup
watchdog in the per-cpu counter hrtimer_interrupts (an increment means
the cpu isn't in hardlockup). This patch adds a backward signal from
the hardlockup to the softlockup detector that some cpus are in
hardlockup.
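The forward signal looks roughly like this, paraphrased from
kernel/watchdog_hld.c:

static bool is_hardlockup(void)
{
	unsigned long hrint = __this_cpu_read(hrtimer_interrupts);

	/* the softlockup timer bumps hrtimer_interrupts; if it hasn't
	 * moved since the last NMI, IRQs are stuck on this cpu */
	if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
		return true;

	__this_cpu_write(hrtimer_interrupts_saved, hrint);
	return false;
}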

>
>>
>> ...
>>
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>> }
>> __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>> # endif /* CONFIG_SMP */
>> +
>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>> +
>> +static inline void flush_hardlockup_messages(void)
>
> I don't think this needs to be inlined?
>
>> +{
>> +	static atomic_t flushed = ATOMIC_INIT(0);
>> +
>> +	/* flush messages from hard lockup detector */
>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>> +		printk_safe_flush();
>> +	}
>> +}
>
> Could we add some explanatory comments here? Explain to the reader why
> this code exists, what purpose it serves? Basically a micro version of
> the above changelog.

Hmm. This seems obvious from the names of the variables and the called
function. Both watchdogs use the same pattern: a monotonic counter and
a side variable with a snapshot to detect changes or stalls.

>
>>
>> ...
>>

2020-02-11 12:48:32

by Konstantin Khlebnikov

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On 11/02/2020 11.14, Kirill Tkhai wrote:
> Hi, Konstantin,
>
> On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
>> In NMI context printk() may save messages into per-cpu buffers and
>> schedule a flush via irq_work once IRQs are unblocked. This means the
>> message about a hardlockup appears in the kernel log only when/if the
>> lockup is gone.
>>
>> The comment in irq_work_queue_on() states that remote IPIs aren't
>> NMI-safe, thus printk() cannot schedule the flush work on another cpu.
>>
>> This patch adds a simple atomic counter of detected hardlockups and
>> flushes all per-cpu printk buffers in the context of the softlockup
>> watchdog on any other cpu when it sees a change of this counter.
>>
>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>> ---
>> include/linux/nmi.h   |  1 +
>> kernel/watchdog.c     | 22 ++++++++++++++++++++++
>> kernel/watchdog_hld.c |  1 +
>> 3 files changed, 24 insertions(+)
>>
>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>> index 9003e29cde46..8406df72ae5a 100644
>> --- a/include/linux/nmi.h
>> +++ b/include/linux/nmi.h
>> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>>  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>>  extern void hardlockup_detector_disable(void);
>>  extern unsigned int hardlockup_panic;
>> +extern atomic_t hardlockup_detected;
>>  #else
>>  static inline void hardlockup_detector_disable(void) {}
>>  #endif
>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>> index b6b1f54a7837..9f5c68fababe 100644
>> --- a/kernel/watchdog.c
>> +++ b/kernel/watchdog.c
>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>  }
>>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>  # endif /* CONFIG_SMP */
>> +
>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>> +
>> +static inline void flush_hardlockup_messages(void)
>> +{
>> +	static atomic_t flushed = ATOMIC_INIT(0);
>> +
>> +	/* flush messages from hard lockup detector */
>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>> +		printk_safe_flush();
>> +	}
>> +}
>
> Do we really need two variables here? They may end up in two different
> cache lines, and there will be double cache pollution just because of
> this simple check. Why not the below?

I don't think anybody could notice a read-only access to the second
variable. This executes once in several seconds.

The watchdogs already use the same pattern (monotonic counter +
snapshot) in a couple of places, so the code looks cleaner this way.

>
> if (atomic_read(&hardlockup_detected)) {
> 	atomic_set(&hardlockup_detected, 0);
> 	printk_safe_flush();
> }
>
> Or even the below, since atomic is not needed here, as it does not give
> any ordering guarantees anyway (hardlockup_detected as a plain int):
>
> static inline void flush_hardlockup_messages(void)
> {
> 	if (READ_ONCE(hardlockup_detected)) {
> 		WRITE_ONCE(hardlockup_detected, 0);
> 		printk_safe_flush();
> 	}
> }
>
> watchdog_overflow_callback()
> {
> 	...
> 	WRITE_ONCE(hardlockup_detected, 1);
> 	...
> }
>
> Kirill
>

2020-02-12 01:16:18

by Sergey Senozhatsky

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On (20/02/10 12:48), Konstantin Khlebnikov wrote:
>
> In NMI context printk() may save messages into per-cpu buffers and
> schedule a flush via irq_work once IRQs are unblocked. This means the
> message about a hardlockup appears in the kernel log only when/if the
> lockup is gone.
>
> The comment in irq_work_queue_on() states that remote IPIs aren't
> NMI-safe, thus printk() cannot schedule the flush work on another cpu.
>
> This patch adds a simple atomic counter of detected hardlockups and
> flushes all per-cpu printk buffers in the context of the softlockup
> watchdog on any other cpu when it sees a change of this counter.

Petr, could you remind me why we do PRINTK_NMI_DIRECT_CONTEXT_MASK
only from ftrace?

-ss

2020-02-12 02:50:46

by Steven Rostedt

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On Wed, 12 Feb 2020 10:15:51 +0900
Sergey Senozhatsky <[email protected]> wrote:

> On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> >
> > In NMI context printk() may save messages into per-cpu buffers and
> > schedule a flush via irq_work once IRQs are unblocked. This means the
> > message about a hardlockup appears in the kernel log only when/if the
> > lockup is gone.
> >
> > The comment in irq_work_queue_on() states that remote IPIs aren't
> > NMI-safe, thus printk() cannot schedule the flush work on another cpu.
> >
> > This patch adds a simple atomic counter of detected hardlockups and
> > flushes all per-cpu printk buffers in the context of the softlockup
> > watchdog on any other cpu when it sees a change of this counter.
>
> Petr, could you remind me why we do PRINTK_NMI_DIRECT_CONTEXT_MASK
> only from ftrace?

Could it be because it's from ftrace_dump(), which can spit out millions
of lines from NMI context?

-- Steve

2020-02-12 03:04:59

by Sergey Senozhatsky

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On (20/02/11 21:49), Steven Rostedt wrote:
> On Wed, 12 Feb 2020 10:15:51 +0900
> Sergey Senozhatsky <[email protected]> wrote:
>
> > On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> > >
> > > In NMI context printk() may save messages into per-cpu buffers and
> > > schedule a flush via irq_work once IRQs are unblocked. This means the
> > > message about a hardlockup appears in the kernel log only when/if the
> > > lockup is gone.
> > >
> > > The comment in irq_work_queue_on() states that remote IPIs aren't
> > > NMI-safe, thus printk() cannot schedule the flush work on another cpu.
> > >
> > > This patch adds a simple atomic counter of detected hardlockups and
> > > flushes all per-cpu printk buffers in the context of the softlockup
> > > watchdog on any other cpu when it sees a change of this counter.
> >
> > Petr, could you remind me why we do PRINTK_NMI_DIRECT_CONTEXT_MASK
> > only from ftrace?
>
> Could it be because it's from ftrace_dump(), which can spit out millions
> of lines from NMI context?

Oh, yes, ftrace printks a lot. But I sort of forgot why we don't do
the same for "regular" NMIs. So NMIs use per-cpu buffers, except for
NMIs which involve ftrace dump. I'm missing something here.

-ss

2020-02-12 03:29:27

by Steven Rostedt

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On Wed, 12 Feb 2020 12:04:03 +0900
Sergey Senozhatsky <[email protected]> wrote:

> Oh, yes, ftrace printks a lot. But I sort of forgot why we don't do
> the same for "regular" NMIs. So NMIs use per-cpu buffers, except for
> NMIs which involve ftrace dump. I'm missing something here.

Well, ftrace_dump() is called from NMI context when the system is about
to crash (ftrace_dump_on_oops). And at that moment, we care more about
the trace than other output (it's most likely to contain the
information that caused the bug).

But for things like sysrq-l, which does printk in NMI context in
normal operation, we don't want strange races to occur and affect
other printk operations. Having them in buffers controls the output a
bit better.

But with the new printk ring buffer work, perhaps that's no longer
necessary.

-- Steve

2020-02-12 14:18:58

by Petr Mladek

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On Wed 2020-02-12 10:15:51, Sergey Senozhatsky wrote:
> On (20/02/10 12:48), Konstantin Khlebnikov wrote:
> >
> > In NMI context printk() may save messages into per-cpu buffers and
> > schedule a flush via irq_work once IRQs are unblocked. This means the
> > message about a hardlockup appears in the kernel log only when/if the
> > lockup is gone.
> >
> > The comment in irq_work_queue_on() states that remote IPIs aren't
> > NMI-safe, thus printk() cannot schedule the flush work on another cpu.
> >
> > This patch adds a simple atomic counter of detected hardlockups and
> > flushes all per-cpu printk buffers in the context of the softlockup
> > watchdog on any other cpu when it sees a change of this counter.
>
> Petr, could you remind me why we do PRINTK_NMI_DIRECT_CONTEXT_MASK
> only from ftrace?

There was a possible deadlock when printing backtraces from all CPUs.
The CPUs were serialized via a lock in nmi_cpu_backtrace(). One of
them might have been interrupted under logbuf_lock.

The direct mode was needed for ftrace because it printed too many
messages. And it was safe because the ftrace log was read from a
single CPU without any lock.
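If I remember correctly, the ftrace side is roughly (simplified sketch,
not the exact code in kernel/trace/trace.c):

	/* in ftrace_dump(): write straight to the main log buffer,
	 * bypassing the small per-cpu NMI buffers */
	printk_nmi_direct_enter();

	/* ... print the potentially huge trace buffer ... */

	printk_nmi_direct_exit();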

Best Regards,
Petr

2020-02-12 14:55:04

by Petr Mladek

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected

On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
> On 11/02/2020 11.14, Kirill Tkhai wrote:
> > Hi, Konstantin,
> >
> > On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
> > > In NMI context printk() may save messages into per-cpu buffers and
> > > schedule a flush via irq_work once IRQs are unblocked. This means the
> > > message about a hardlockup appears in the kernel log only when/if the
> > > lockup is gone.
> > >
> > > The comment in irq_work_queue_on() states that remote IPIs aren't
> > > NMI-safe, thus printk() cannot schedule the flush work on another cpu.
> > >
> > > This patch adds a simple atomic counter of detected hardlockups and
> > > flushes all per-cpu printk buffers in the context of the softlockup
> > > watchdog on any other cpu when it sees a change of this counter.
> > >
> > > Signed-off-by: Konstantin Khlebnikov <[email protected]>
> > > ---
> > > include/linux/nmi.h   |  1 +
> > > kernel/watchdog.c     | 22 ++++++++++++++++++++++
> > > kernel/watchdog_hld.c |  1 +
> > > 3 files changed, 24 insertions(+)
> > >
> > > diff --git a/include/linux/nmi.h b/include/linux/nmi.h
> > > index 9003e29cde46..8406df72ae5a 100644
> > > --- a/include/linux/nmi.h
> > > +++ b/include/linux/nmi.h
> > > @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
> > >  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
> > >  extern void hardlockup_detector_disable(void);
> > >  extern unsigned int hardlockup_panic;
> > > +extern atomic_t hardlockup_detected;
> > >  #else
> > >  static inline void hardlockup_detector_disable(void) {}
> > >  #endif
> > > diff --git a/kernel/watchdog.c b/kernel/watchdog.c
> > > index b6b1f54a7837..9f5c68fababe 100644
> > > --- a/kernel/watchdog.c
> > > +++ b/kernel/watchdog.c
> > > @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
> > >  }
> > >  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
> > >  # endif /* CONFIG_SMP */
> > > +
> > > +atomic_t hardlockup_detected = ATOMIC_INIT(0);
> > > +
> > > +static inline void flush_hardlockup_messages(void)
> > > +{
> > > +	static atomic_t flushed = ATOMIC_INIT(0);
> > > +
> > > +	/* flush messages from hard lockup detector */
> > > +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
> > > +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
> > > +		printk_safe_flush();
> > > +	}
> > > +}
> >
> > Do we really need two variables here? They may end up in two different
> > cache lines, and there will be double cache pollution just because of
> > this simple check. Why not the below?
>
> I don't think anybody could notice a read-only access to the second
> variable. This executes once in several seconds.
>
> The watchdogs already use the same pattern (monotonic counter +
> snapshot) in a couple of places, so the code looks cleaner this way.

It is not only about speed. It is also about code complexity
and correctness. Using two variables is more complex.
Calling atomic_read(&hardlockup_detected) twice does
not look like a correct pattern.

The single-variable pattern is used for similar things there
as well, for example, see hard_watchdog_warn and
hardlockup_allcpu_dumped.

I would call the variable "hardlockup_dump_flush" and
use the same logic as for hardlockup_allcpu_dumped.
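Something along these lines, I mean (hypothetical sketch only;
hardlockup_dump_flush is the suggested name, not existing code):

static unsigned long hardlockup_dump_flush;

/* NMI side, in watchdog_overflow_callback(): */
	set_bit(0, &hardlockup_dump_flush);

/* timer side, in watchdog_timer_fn(): */
static void flush_hardlockup_messages(void)
{
	/* test_and_clear_bit() is an atomic RMW that returns a value,
	 * so it implies full memory barriers, unlike READ_ONCE() */
	if (test_and_clear_bit(0, &hardlockup_dump_flush))
		printk_safe_flush();
}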

Note that simple READ_ONCE(), WRITE_ONCE() are not enough
because they do not provide smp barriers.

Best Regards,
Petr

2020-02-13 13:05:44

by Konstantin Khlebnikov

Subject: Re: [PATCH] kernel/watchdog: flush all printk nmi buffers when hardlockup detected



On 12/02/2020 17.54, Petr Mladek wrote:
> On Tue 2020-02-11 15:36:02, Konstantin Khlebnikov wrote:
>> On 11/02/2020 11.14, Kirill Tkhai wrote:
>>> Hi, Konstantin,
>>>
>>> On 10.02.2020 12:48, Konstantin Khlebnikov wrote:
>>>> In NMI context printk() may save messages into per-cpu buffers and
>>>> schedule a flush via irq_work once IRQs are unblocked. This means the
>>>> message about a hardlockup appears in the kernel log only when/if the
>>>> lockup is gone.
>>>>
>>>> The comment in irq_work_queue_on() states that remote IPIs aren't
>>>> NMI-safe, thus printk() cannot schedule the flush work on another cpu.
>>>>
>>>> This patch adds a simple atomic counter of detected hardlockups and
>>>> flushes all per-cpu printk buffers in the context of the softlockup
>>>> watchdog on any other cpu when it sees a change of this counter.
>>>>
>>>> Signed-off-by: Konstantin Khlebnikov <[email protected]>
>>>> ---
>>>> include/linux/nmi.h   |  1 +
>>>> kernel/watchdog.c     | 22 ++++++++++++++++++++++
>>>> kernel/watchdog_hld.c |  1 +
>>>> 3 files changed, 24 insertions(+)
>>>>
>>>> diff --git a/include/linux/nmi.h b/include/linux/nmi.h
>>>> index 9003e29cde46..8406df72ae5a 100644
>>>> --- a/include/linux/nmi.h
>>>> +++ b/include/linux/nmi.h
>>>> @@ -84,6 +84,7 @@ static inline void reset_hung_task_detector(void) { }
>>>>  #if defined(CONFIG_HARDLOCKUP_DETECTOR)
>>>>  extern void hardlockup_detector_disable(void);
>>>>  extern unsigned int hardlockup_panic;
>>>> +extern atomic_t hardlockup_detected;
>>>>  #else
>>>>  static inline void hardlockup_detector_disable(void) {}
>>>>  #endif
>>>> diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>>>> index b6b1f54a7837..9f5c68fababe 100644
>>>> --- a/kernel/watchdog.c
>>>> +++ b/kernel/watchdog.c
>>>> @@ -92,6 +92,26 @@ static int __init hardlockup_all_cpu_backtrace_setup(char *str)
>>>>  }
>>>>  __setup("hardlockup_all_cpu_backtrace=", hardlockup_all_cpu_backtrace_setup);
>>>>  # endif /* CONFIG_SMP */
>>>> +
>>>> +atomic_t hardlockup_detected = ATOMIC_INIT(0);
>>>> +
>>>> +static inline void flush_hardlockup_messages(void)
>>>> +{
>>>> +	static atomic_t flushed = ATOMIC_INIT(0);
>>>> +
>>>> +	/* flush messages from hard lockup detector */
>>>> +	if (atomic_read(&hardlockup_detected) != atomic_read(&flushed)) {
>>>> +		atomic_set(&flushed, atomic_read(&hardlockup_detected));
>>>> +		printk_safe_flush();
>>>> +	}
>>>> +}
>>>
>>> Do we really need two variables here? They may end up in two different
>>> cache lines, and there will be double cache pollution just because of
>>> this simple check. Why not the below?
>>
>> I don't think anybody could notice a read-only access to the second
>> variable. This executes once in several seconds.
>>
>> The watchdogs already use the same pattern (monotonic counter +
>> snapshot) in a couple of places, so the code looks cleaner this way.
>
> It is not only about speed. It is also about code complexity
> and correctness. Using two variables is more complex.
> Calling atomic_read(&hardlockup_detected) twice does
> not look like a correct pattern.

It guarantees "at least once", which is enough for this case.
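Illustrative interleaving (note each hardlockup message is saved to
the per-cpu buffer before the counter is incremented):

	/* timer cpu */			/* hardlocked cpu, in NMI */
	atomic_read(..) /* sees 1 */
					atomic_inc() /* -> 2 */
	atomic_set(&flushed, 2);	/* second read sees 2 */
	printk_safe_flush();		/* flushes both messages */

So a detection can at worst be flushed together with a later one,
never skipped.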

>
> The single-variable pattern is used for similar things there
> as well, for example, see hard_watchdog_warn and

> hardlockup_allcpu_dumped.

Ouch. This works only once and there is no way to rearm it back.
Now I see why this thing never worked for me in recent years =)

Maybe it's better to reset sysctl_hardlockup_all_cpu_backtrace to let
the user set the sysctl back to 1.
Or rearm it in the softlockup watchdog after a timeout.
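Something like this in the softlockup timer, perhaps (hypothetical
sketch; hardlockup_dump_timestamp is a made-up variable):

	/* rearm the one-shot all-cpu backtrace after a minute */
	if (test_bit(0, &hardlockup_allcpu_dumped) &&
	    time_after(jiffies, hardlockup_dump_timestamp + 60 * HZ))
		clear_bit(0, &hardlockup_allcpu_dumped);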

>
> I would call the variable "hardlockup_dump_flush" and
> use the same logic as for hardlockup_allcpu_dumped.
>
> Note that simple READ_ONCE(), WRITE_ONCE() are not enough
> because they do not provide smp barriers.

It's hard to imagine an arch which actually needs barriers here.
The softlockup timers will eventually see the change.

>
> Best Regards,
> Petr
>