2023-12-20 21:17:13

by Doug Anderson

Subject: [PATCH 0/4] watchdog: Better handling of concurrent lockups


When we get multiple lockups at roughly the same time, the output in
the kernel logs can be very confusing since the reports about the
lockups end up interleaved in the logs. There is some code in the
kernel to try to handle this but it wasn't that complete.

Li Zhe recently made this a bit better for softlockups (specifically
for the case where `kernel.softlockup_all_cpu_backtrace` is not set)
in commit 9d02330abd3e ("softlockup: serialized softlockup's log"),
but that only handled softlockup reports. Hardlockup reports still had
similar issues.

This series also has a small fix to avoid dumping all stacks a second
time in the case of a panic. This is a bit unrelated to the
interleaving fixes but it does also improve the clarity of lockup
reports.


Douglas Anderson (4):
watchdog/hardlockup: Adopt softlockup logic avoiding double-dumps
watchdog/softlockup: Use printk_cpu_sync_get_irqsave() to serialize
reporting
watchdog/hardlockup: Use printk_cpu_sync_get_irqsave() to serialize
reporting
watchdog: If panicking and we dumped everything, don't re-enable
dumping

kernel/watchdog.c | 43 ++++++++++++++++++++++++++++++++-----------
1 file changed, 32 insertions(+), 11 deletions(-)

--
2.43.0.472.g3155946c3a-goog



2023-12-20 21:17:27

by Doug Anderson

Subject: [PATCH 1/4] watchdog/hardlockup: Adopt softlockup logic avoiding double-dumps

The hardlockup detector and softlockup detector both have the ability
to dump the stack of all CPUs (`kernel.hardlockup_all_cpu_backtrace`
and `kernel.softlockup_all_cpu_backtrace`). Both detectors also have
some logic to attempt to avoid interleaving printouts if two CPUs were
trying to do dumps of all CPUs at the same time. However:
- The hardlockup detector's logic still allowed interleaving some
information. Specifically another CPU could print modules and dump
the stack of the locked CPU at the same time we were dumping all
CPUs.
- In the case where `kernel.hardlockup_panic` was set in addition to
`kernel.hardlockup_all_cpu_backtrace`, when two CPUs both detected
hardlockups at the same time the second CPU could call panic() while
the first was still dumping stacks. This was especially bad if the
locked up CPU wasn't responding to the request for a backtrace since
the function nmi_trigger_cpumask_backtrace() can wait up to 10
seconds.

Let's resolve this by adopting the softlockup logic in the hardlockup
handler.

NOTES:
- As part of this, one might think that we should make a helper
function that both the hardlockup and softlockup detectors call. This
turns out not to be trivial: it would need quite a bit of
parameterization since there are separate global variables controlling
each lockup detector, and they print log messages that are just
different enough that sharing code would be a pain. We probably don't
want to change the messages that are printed without good reason, to
avoid throwing log parsers for a loop.
- One might also think that it would be a good idea to have the
hardlockup and softlockup detectors use the same global variable to
prevent interleaving. This would make sure that softlockups and
hardlockups can't interleave each other. That _almost_ works, but it
has a dangerous flaw: if `kernel.hardlockup_panic` is not the same as
`kernel.softlockup_panic`, we might skip a call to panic() when one
type of lockup is detected at the same time as the other.
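
A rough sketch of that flaw (illustrative only, not proposed code; the
name `shared_lockup_warn` is made up for the illustration):

	/* Hypothetical flag shared by the hard and soft lockup reports */
	static unsigned long shared_lockup_warn;

	/* In watchdog_hardlockup_check(): */
	if (test_and_set_bit_lock(0, &shared_lockup_warn))
		return;		/* a concurrent softlockup report owns the bit */

	/*
	 * ... so with softlockup_panic=0 and hardlockup_panic=1 we could
	 * return above and never reach this:
	 */
	if (hardlockup_panic)
		nmi_panic(regs, "Hard LOCKUP");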

Signed-off-by: Douglas Anderson <[email protected]>
---

kernel/watchdog.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index bf30a6fac665..b4fd2f12137f 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -91,7 +91,7 @@ static DEFINE_PER_CPU(atomic_t, hrtimer_interrupts);
static DEFINE_PER_CPU(int, hrtimer_interrupts_saved);
static DEFINE_PER_CPU(bool, watchdog_hardlockup_warned);
static DEFINE_PER_CPU(bool, watchdog_hardlockup_touched);
-static unsigned long watchdog_hardlockup_all_cpu_dumped;
+static unsigned long hard_lockup_nmi_warn;

notrace void arch_touch_nmi_watchdog(void)
{
@@ -156,6 +156,15 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
if (per_cpu(watchdog_hardlockup_warned, cpu))
return;

+ /*
+ * Prevent multiple hard-lockup reports if one cpu is already
+ * engaged in dumping all cpu back traces.
+ */
+ if (sysctl_hardlockup_all_cpu_backtrace) {
+ if (test_and_set_bit_lock(0, &hard_lockup_nmi_warn))
+ return;
+ }
+
pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
print_modules();
print_irqtrace_events(current);
@@ -168,13 +177,10 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
trigger_single_cpu_backtrace(cpu);
}

- /*
- * Perform multi-CPU dump only once to avoid multiple
- * hardlockups generating interleaving traces
- */
- if (sysctl_hardlockup_all_cpu_backtrace &&
- !test_and_set_bit(0, &watchdog_hardlockup_all_cpu_dumped))
+ if (sysctl_hardlockup_all_cpu_backtrace) {
trigger_allbutcpu_cpu_backtrace(cpu);
+ clear_bit_unlock(0, &hard_lockup_nmi_warn);
+ }

if (hardlockup_panic)
nmi_panic(regs, "Hard LOCKUP");
--
2.43.0.472.g3155946c3a-goog


2023-12-20 21:17:42

by Doug Anderson

Subject: [PATCH 2/4] watchdog/softlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave()
and printk_cpu_sync_put_irqrestore() to serialize softlockup
reporting. Alone this doesn't have any real advantage over the
spinlock, but this will allow us to use the same function in a future
change to also serialize hardlockup crawls.

NOTE: for the most part this serialization is important because we
often end up in the show_regs() path, and that has no built-in
serialization if there are multiple callers at once. However, even in
the case where we end up in the dump_stack() path this still has some
advantage because the stack trace is guaranteed to appear together
with the lockup message in the logs, with no interleaving.

NOTE: the fact that printk_cpu_sync_get_irqsave() is allowed to be
called multiple times on the same CPU is important here. Specifically
we hold the "lock" while calling dump_stack() which also gets the same
"lock". This is explicitly documented to be OK and means we don't need
to introduce a variant of dump_stack() that doesn't grab the lock.
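
A minimal sketch of that nesting (illustrative only; dump_stack()'s
internals are paraphrased here, not quoted from the tree):

	unsigned long flags;

	printk_cpu_sync_get_irqsave(flags);	/* outer get (lockup report) */
	pr_emerg("BUG: soft lockup ...\n");
	/*
	 * dump_stack() brackets its output with its own
	 * printk_cpu_sync_get_irqsave()/printk_cpu_sync_put_irqrestore()
	 * pair; a nested get on the CPU that already owns the sync is
	 * documented to be OK.
	 */
	dump_stack();
	printk_cpu_sync_put_irqrestore(flags);	/* outer put */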

Signed-off-by: Douglas Anderson <[email protected]>
---

kernel/watchdog.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index b4fd2f12137f..526041a1100a 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -454,7 +454,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
struct pt_regs *regs = get_irq_regs();
int duration;
int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
- static DEFINE_SPINLOCK(watchdog_output_lock);
+ unsigned long flags;

if (!watchdog_enabled)
return HRTIMER_NORESTART;
@@ -521,7 +521,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
/* Start period for the next softlockup warning. */
update_report_ts();

- spin_lock(&watchdog_output_lock);
+ printk_cpu_sync_get_irqsave(flags);
pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
smp_processor_id(), duration,
current->comm, task_pid_nr(current));
@@ -531,7 +531,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
show_regs(regs);
else
dump_stack();
- spin_unlock(&watchdog_output_lock);
+ printk_cpu_sync_put_irqrestore(flags);

if (softlockup_all_cpu_backtrace) {
trigger_allbutcpu_cpu_backtrace(smp_processor_id());
--
2.43.0.472.g3155946c3a-goog


2023-12-20 21:18:05

by Doug Anderson

Subject: [PATCH 4/4] watchdog: If panicking and we dumped everything, don't re-enable dumping

If, as part of handling a hardlockup or softlockup, we've already
dumped all CPUs and we're just about to panic, don't re-enable dumping
and thereby give some other CPU a chance to hop in and add some
confusing logs right as the panic is happening.

Signed-off-by: Douglas Anderson <[email protected]>
---

kernel/watchdog.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 11f9577accca..81a8862295d6 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -192,7 +192,8 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)

if (sysctl_hardlockup_all_cpu_backtrace) {
trigger_allbutcpu_cpu_backtrace(cpu);
- clear_bit_unlock(0, &hard_lockup_nmi_warn);
+ if (!hardlockup_panic)
+ clear_bit_unlock(0, &hard_lockup_nmi_warn);
}

if (hardlockup_panic)
@@ -548,7 +549,8 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)

if (softlockup_all_cpu_backtrace) {
trigger_allbutcpu_cpu_backtrace(smp_processor_id());
- clear_bit_unlock(0, &soft_lockup_nmi_warn);
+ if (!softlockup_panic)
+ clear_bit_unlock(0, &soft_lockup_nmi_warn);
}

add_taint(TAINT_SOFTLOCKUP, LOCKDEP_STILL_OK);
--
2.43.0.472.g3155946c3a-goog


2023-12-20 21:18:35

by Doug Anderson

Subject: [PATCH 3/4] watchdog/hardlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

If two CPUs end up reporting a hardlockup at the same time, their
logs can get interleaved, which makes them hard to read.

The interleaving problem was especially bad with the "perf" hardlockup
detector where the locked up CPU is always the same as the running CPU
and we end up in show_regs(). show_regs() has no inherent
serialization so we could mix together two crawls if two hardlockups
happened at the same time (and if we didn't have
`sysctl_hardlockup_all_cpu_backtrace` set). With this change we'll
fully serialize hardlockups when using the "perf" hardlockup detector.

The interleaving problem was less bad with the "buddy" hardlockup
detector. With "buddy" we always end up calling
`trigger_single_cpu_backtrace(cpu)` on some CPU other than the running
one. trigger_single_cpu_backtrace() always at least serializes the
individual stack crawls because it eventually uses
printk_cpu_sync_get_irqsave(). Unfortunately the fact that
trigger_single_cpu_backtrace() eventually calls
printk_cpu_sync_get_irqsave() (on a different CPU) means that we have
to drop the "lock" before calling it and we can't fully serialize all
printouts associated with a given hardlockup. However, we still do get
the advantage of serializing the output of print_modules() and
print_irqtrace_events().

Aside from serializing hardlockups from each other, this change also
has the advantage of serializing hardlockups and softlockups from each
other if they happen to happen at the same time since they are both
using the same "lock".

Even though nobody is expected to hang while holding the lock
associated with printk_cpu_sync_get_irqsave(), out of an abundance of
caution, we don't call printk_cpu_sync_get_irqsave() until after we
print out about the hardlockup. This makes extra sure that, even if
printk_cpu_sync_get_irqsave() somehow never runs we at least print
that we saw the hardlockup. This is different than the choice made for
softlockup because hardlockup is really our last resort.

Signed-off-by: Douglas Anderson <[email protected]>
---

kernel/watchdog.c | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 526041a1100a..11f9577accca 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -151,6 +151,7 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
*/
if (is_hardlockup(cpu)) {
unsigned int this_cpu = smp_processor_id();
+ unsigned long flags;

/* Only print hardlockups once. */
if (per_cpu(watchdog_hardlockup_warned, cpu))
@@ -165,7 +166,17 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
return;
}

+ /*
+ * NOTE: we call printk_cpu_sync_get_irqsave() after printing
+ * the lockup message. While it would be nice to serialize
+ * that printout, we really want to make sure that if some
+ * other CPU somehow locked up while holding the lock associated
+ * with printk_cpu_sync_get_irqsave() that we can still at least
+ * get the message about the lockup out.
+ */
pr_emerg("Watchdog detected hard LOCKUP on cpu %d\n", cpu);
+ printk_cpu_sync_get_irqsave(flags);
+
print_modules();
print_irqtrace_events(current);
if (cpu == this_cpu) {
@@ -173,7 +184,9 @@ void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
show_regs(regs);
else
dump_stack();
+ printk_cpu_sync_put_irqrestore(flags);
} else {
+ printk_cpu_sync_put_irqrestore(flags);
trigger_single_cpu_backtrace(cpu);
}

--
2.43.0.472.g3155946c3a-goog


2023-12-22 07:13:58

by lizhe.67

Subject: Re: [PATCH 2/4] watchdog/softlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

On Wed, 20 Dec 2023 13:15:35 -0800, [email protected] wrote:

>diff --git a/kernel/watchdog.c b/kernel/watchdog.c
>index b4fd2f12137f..526041a1100a 100644
>--- a/kernel/watchdog.c
>+++ b/kernel/watchdog.c
>@@ -454,7 +454,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> struct pt_regs *regs = get_irq_regs();
> int duration;
> int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
>- static DEFINE_SPINLOCK(watchdog_output_lock);
>+ unsigned long flags;
>
> if (!watchdog_enabled)
> return HRTIMER_NORESTART;
>@@ -521,7 +521,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> /* Start period for the next softlockup warning. */
> update_report_ts();
>
>- spin_lock(&watchdog_output_lock);
>+ printk_cpu_sync_get_irqsave(flags);
> pr_emerg("BUG: soft lockup - CPU#%d stuck for %us! [%s:%d]\n",
> smp_processor_id(), duration,
> current->comm, task_pid_nr(current));
>@@ -531,7 +531,7 @@ static enum hrtimer_restart watchdog_timer_fn(struct hrtimer *hrtimer)
> show_regs(regs);
> else
> dump_stack();
>- spin_unlock(&watchdog_output_lock);
>+ printk_cpu_sync_put_irqrestore(flags);
>
> if (softlockup_all_cpu_backtrace) {
> trigger_allbutcpu_cpu_backtrace(smp_processor_id());
>--

Reviewed-by: Li Zhe <[email protected]>

2023-12-22 09:30:49

by John Ogness

Subject: Re: [PATCH 2/4] watchdog/softlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

On 2023-12-20, Douglas Anderson <[email protected]> wrote:
> Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave()
> and printk_cpu_sync_put_irqrestore() to serialize softlockup
> reporting. Alone this doesn't have any real advantage over the
> spinlock, but this will allow us to use the same function in a future
> change to also serialize hardlockup crawls.

Thanks for this change. For me, this is the preferred workaround to
best-effort serialize a particular type of output. Hopefully one day we
will get to implementing printk contexts [0] [1] so that message blocks
can be inserted atomically.

> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: John Ogness <[email protected]>

[0] https://lore.kernel.org/lkml/1299043680.4208.97.camel@Joe-Laptop
[1] https://lore.kernel.org/lkml/[email protected]

2023-12-22 09:43:20

by John Ogness

Subject: Re: [PATCH 3/4] watchdog/hardlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

On 2023-12-20, Douglas Anderson <[email protected]> wrote:
> The interleaving problem was less bad with the "buddy" hardlockup
> detector. With "buddy" we always end up calling
> `trigger_single_cpu_backtrace(cpu)` on some CPU other than the running
> one. trigger_single_cpu_backtrace() always at least serializes the
> individual stack crawls because it eventually uses
> printk_cpu_sync_get_irqsave(). Unfortunately the fact that
> trigger_single_cpu_backtrace() eventually calls
> printk_cpu_sync_get_irqsave() (on a different CPU) means that we have
> to drop the "lock" before calling it and we can't fully serialize all
> printouts associated with a given hardlockup.

I think that is good enough. Otherwise there would need to be some kind
of CPU handshaking to ensure things are synchronized correctly in case
multiple CPUs have triggered the situation.

> However, we still do get
> the advantage of serializing the output of print_modules() and
> print_irqtrace_events().
>
> Aside from serializing hardlockups from each other, this change also
> has the advantage of serializing hardlockups and softlockups from each
> other if they happen to happen at the same time since they are both
> using the same "lock".
>
> Even though nobody is expected to hang while holding the lock
> associated with printk_cpu_sync_get_irqsave(), out of an abundance of
> caution, we don't call printk_cpu_sync_get_irqsave() until after we
> print out about the hardlockup. This makes extra sure that, even if
> printk_cpu_sync_get_irqsave() somehow never runs we at least print
> that we saw the hardlockup.

I agree with calling printk() before trying to acquire ownership of the
cpu_sync.

> This is different than the choice made for
> softlockup because hardlockup is really our last resort.
>
> Signed-off-by: Douglas Anderson <[email protected]>

Reviewed-by: John Ogness <[email protected]>

2024-02-06 10:13:37

by Petr Mladek

Subject: Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

Hi,

On Wed 2023-12-20 13:15:33, Douglas Anderson wrote:
>
> When we get multiple lockups at roughly the same time, the output in
> the kernel logs can be very confusing since the reports about the
> lockups end up interleaved in the logs. There is some code in the
> kernel to try to handle this but it wasn't that complete.
>
> Li Zhe recently made this a bit better for softlockups (specifically
> for the case where `kernel.softlockup_all_cpu_backtrace` is not set)
> in commit 9d02330abd3e ("softlockup: serialized softlockup's log"),
> but that only handled softlockup reports. Hardlockup reports still had
> similar issues.
>
> This series also has a small fix to avoid dumping all stacks a second
> time in the case of a panic. This is a bit unrelated to the
> interleaving fixes but it does also improve the clarity of lockup
> reports.

Just for the record, this patchset has finally got to the top of my
queue (after Christmas and a sick leave), and it looks good from my
POV.

I was slightly afraid of using printk_cpu_sync_get_irqsave() in more
locations. It has to be used with care to avoid deadlock.

But the patchset looks good. It takes the lock only around code that
runs on the same CPU, and it always releases the lock before
triggering a backtrace on another CPU.


Idea:

I have just got an idea how to make printk_cpu_sync_get_irqsave()
less error prone for deadlock on the panic() CPU. The idea is
to ignore the lock or give up locking after a timeout on
the panic CPU.

AFAIK, the lock is currently used only to serialize related
printk() calls. The only risk is that some messages might be
interleaved when it is ignored.

I am not sure if this is a good idea though. It might create
another risk when the lock gets used to serialize more
things in the future and a race might create a real problem.

Best Regards,
Petr

2024-02-06 10:30:14

by Petr Mladek

Subject: Re: [PATCH 2/4] watchdog/softlockup: Use printk_cpu_sync_get_irqsave() to serialize reporting

On Fri 2023-12-22 10:36:37, John Ogness wrote:
> On 2023-12-20, Douglas Anderson <[email protected]> wrote:
> > Instead of introducing a spinlock, use printk_cpu_sync_get_irqsave()
> > and printk_cpu_sync_put_irqrestore() to serialize softlockup
> > reporting. Alone this doesn't have any real advantage over the
> > spinlock, but this will allow us to use the same function in a future
> > change to also serialize hardlockup crawls.
>
> Thanks for this change. For me, this is the preferred workaround to
> best-effort serialize a particular type of output.

I agree.

The good thing is that dump_stack_lvl() and nmi_cpu_backtrace() take
this lock on their own. Also, nmi_trigger_cpumask_backtrace() prevents
parallel calls. This means that the individual backtraces should be
serialized for most callers.

> Hopefully one day we
> will get to implementing printk contexts [0] [1] so that message blocks
> can be inserted atomically.

I didn't think about this possibility. You are right. It might be even
better than printk_cpu_sync_get_irqsave()/put_irqrestore() because it
would allow passing the lock to a higher-priority context and supports
a timeout.


Best Regards,
Petr

2024-02-06 10:46:35

by John Ogness

Subject: Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

On 2024-02-06, Petr Mladek <[email protected]> wrote:
> I have just got an idea how to make printk_cpu_sync_get_irqsave()
> less error prone for deadlock on the panic() CPU. The idea is
> to ignore the lock or give up locking after a timeout on
> the panic CPU.

This idea is out of scope for this series. But it is something we should
think about. The issue has always been a possible problem in panic().

> AFAIK, the lock is currently used only to serialize related
> printk() calls. The only risk is that some messages might be
> interleaved when it is ignored.
>
> I am not sure if this is a good idea though. It might create
> another risk when the lock gets used to serialize more
> things in the future and a race might create a real problem.

With the printk series we are currently working on [0], only the panic
CPU can store new printk messages anyway. So there would be nothing to
synchronize against (and it could be safely ignored).

kgdb uses the same technique to quiesce the CPUs. It does not use the
printk_cpu_sync for this, but it is an example of a possible future
usage not related to printk.

My vote is to make it a NOP for the panic CPU and then keep an eye on
any future uses. Should I add this to v4 of [0]?

John

[0] https://lore.kernel.org/lkml/[email protected]
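
To make that concrete, a rough sketch of the NOP-on-the-panic-CPU idea
(purely illustrative, not part of this series; the helper name is made
up, and whether this is actually deadlock-safe is exactly the open
question above):

	/*
	 * Hypothetical helper: panic_cpu is the atomic_t that panic() and
	 * nmi_panic() already use to record the panicking CPU.
	 */
	static bool this_cpu_is_panic_cpu(void)
	{
		return atomic_read(&panic_cpu) == raw_smp_processor_id();
	}

	/*
	 * The cpu_sync acquire path could then report success immediately
	 * when this_cpu_is_panic_cpu() is true, and the release path would
	 * skip the unlock, making the lock a no-op on the panic CPU.
	 */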

2024-02-06 19:31:54

by Doug Anderson

Subject: Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

Hi,

On Tue, Feb 6, 2024 at 2:46 AM John Ogness <[email protected]> wrote:
>
> On 2024-02-06, Petr Mladek <[email protected]> wrote:
> > I have just got an idea how to make printk_cpu_sync_get_irqsave()
> > less error prone for deadlock on the panic() CPU. The idea is
> > to ignore the lock or give up locking after a timeout on
> > the panic CPU.
>
> This idea is out of scope for this series. But it is something we should
> think about. The issue has always been a possible problem in panic().

One thing to be at least a little cognizant of is how this interacts
with the 10-second timeout in nmi_trigger_cpumask_backtrace(), which
we can hit twice in some of the lockup reports since we first trace
the locked CPU and then the rest. Ideally we wouldn't hit that timeout
often, but on arm64, if you don't have pseudo-NMI turned on, it's
actually pretty easy to hit it when you've got a hard-locked CPU.
Probably that 10-second timeout should be shortened...

-Doug

2024-02-07 13:06:07

by Petr Mladek

Subject: Re: [PATCH 0/4] watchdog: Better handling of concurrent lockups

On Tue 2024-02-06 11:51:50, John Ogness wrote:
> On 2024-02-06, Petr Mladek <[email protected]> wrote:
> > I have just got an idea how to make printk_cpu_sync_get_irqsave()
> > less error prone for deadlock on the panic() CPU. The idea is
> > to ignore the lock or give up locking after a timeout on
> > the panic CPU.
>
> This idea is out of scope for this series. But it is something we should
> think about. The issue has always been a possible problem in panic().
>
> > AFAIK, the lock is currently used only to serialize related
> > printk() calls. The only risk is that some messages might be
> > interleaved when it is ignored.
> >
> > I am not sure if this is a good idea though. It might create
> > another risk when the lock gets used to serialize more
> > things in the future and a race might create a real problem.
>
> With the printk series we are currently working on [0], only the panic
> CPU can store new printk messages anyway. So there would be nothing to
> synchronize against (and it could be safely ignored).

Right.

> kgdb uses the same technique to quiesce the CPUs. It does not use the
> printk_cpu_sync for this, but it is an example of a possible future
> usage not related to printk.
>
> My vote is to make it a NOP for the panic CPU and then keep an eye on
> any future uses.

Sounds good.

> Should I add this to v4 of [0]?

Let's not complicate this series any more. It is almost ready ;-)
We could do it in a separate patch on top of it or in another
patchset.

>
> [0] https://lore.kernel.org/lkml/[email protected]

Best Regards,
Petr