When entering kdb/kgdb on a kernel panic, it was be observed that the
console isn't flushed before the `kdb` prompt came up. Specifically,
when using the buddy lockup detector on arm64 and running:
echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
I could see:
[ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
[ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
[ 32.552865] Sending NMI from CPU 5 to CPUs 6:
[ 32.557359] NMI backtrace for cpu 6
... [backtrace for cpu 6] ...
[ 32.558353] NMI backtrace for cpu 5
... [backtrace for cpu 5] ...
[ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
[ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
[5]kdb>
As you can see, backtraces for the other CPUs start printing and get
interleaved with the kdb PANIC print.
Let's replicate the commands to flush the console in the kdb panic
entry point to avoid this.
Signed-off-by: Douglas Anderson <[email protected]>
---
kernel/debug/debug_core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
index d5e9ccde3ab8..3a904d8697c8 100644
--- a/kernel/debug/debug_core.c
+++ b/kernel/debug/debug_core.c
@@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
if (panic_timeout)
return;
+ debug_locks_off();
+ console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+
if (dbg_kdb_mode)
kdb_printf("PANIC: %s\n", msg);
--
2.42.0.rc1.204.g551eb34607-goog
On Wed 2023-08-30 10:53:59, Daniel Thompson wrote:
> On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> > Hi,
> >
> > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> > <[email protected]> wrote:
> > >
> > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > > when using the buddy lockup detector on arm64 and running:
> > > > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > > >
> > > > I could see:
> > > > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > > [ 32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > > [ 32.557359] NMI backtrace for cpu 6
> > > > ... [backtrace for cpu 6] ...
> > > > [ 32.558353] NMI backtrace for cpu 5
> > > > ... [backtrace for cpu 5] ...
> > > > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > > >
> > > > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > > [5]kdb>
> > > >
> > > > As you can see, backtraces for the other CPUs start printing and get
> > > > interleaved with the kdb PANIC print.
> > > >
> > > > Let's replicate the commands to flush the console in the kdb panic
> > > > entry point to avoid this.
> > > >
> > > > Signed-off-by: Douglas Anderson <[email protected]>
> > > > ---
> > > >
> > > > kernel/debug/debug_core.c | 3 +++
> > > > 1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > > --- a/kernel/debug/debug_core.c
> > > > +++ b/kernel/debug/debug_core.c
> > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > > if (panic_timeout)
> > > > return;
> > > >
> > > > + debug_locks_off();
> > > > + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > > +
> > > > if (dbg_kdb_mode)
> > > > kdb_printf("PANIC: %s\n", msg);
> > >
> > > I'm somewhat included to say *this* (calling kdb_printf() when not
> > > actually in the debugger) is the cause of the problem. kdb_printf()
> > > does some pretty horid things to the console and isn't intended to
> > > run while the system is active.
> > >
> > > I'd therefore be more tempted to defer the print to the b.p. trap
> > > handler itself and make this part of kgdb_panic() look more like:
> > >
> > > kgdb_panic_msg = msg;
> > > kgdb_breakpoint();
> > > kgdb_panic_msg = NULL;
> >
> > Unfortunately I think that only solves half the problem. As a quick
> > test, I tried simply commenting out the "kdb_printf" line in
> > kgdb_panic(). While that avoids the interleaved panic message and
> > backtrace, it does nothing to actually get the backtraces printed out
> > before you end up in kdb. As an example, this is what happened when I
> > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> > had the "kdb_printf" in kgdb_panic() commented out:
> >
> > [ 72.658424] lkdtm: Performing direct entry HARDLOCKUP
> > [ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > ...
> > [ 82.234801] Sending NMI from CPU 5 to CPUs 6:
> > [ 82.239296] NMI backtrace for cpu 6
> > ... [ stack trace for CPU 6 ] ...
> > [ 82.240294] NMI backtrace for cpu 5
> > ... [ stack trace for CPU 5 ] ...
> > [ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> > [ 82.581291] NMI backtrace
> > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> > to Keyboard Entry
> > [5]kdb>
> >
> > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> > show up if I use the "dmesg" command but it's a bit of a hassle to run
> > "dmesg" to look for any extra debug messages every time I drop in kdb.
> >
> > I guess perhaps that part isn't obvious from the commit message?
>
> I figured it was a risk.
>
> In fact it's an area where my instinct to honour console messages and my
> instinct to get into the kernel as soon as possible after the decision
> to invoke it has been made come into conflict.
>
> In other words does it matter that the console buffers are not flushed
> before entering kgdb? However having thought about it for a little while
> (and knowing the console code tends to be written to be decently robust)
> I can come to the view the flushing is best.
Just for record. I wondered if there was bug in printk() that the
backtraces were not flushed. And I think that it is because they
are printed in NMI context where the conosle handling is deferred.
From this POV, the explicit flush makes perfect sense.
Best Regards,
Petr
On Wed, Sep 20, 2023 at 02:01:30PM +0200, Petr Mladek wrote:
> On Wed 2023-08-30 10:53:59, Daniel Thompson wrote:
> > On Fri, Aug 25, 2023 at 07:18:44AM -0700, Doug Anderson wrote:
> > > Hi,
> > >
> > > On Fri, Aug 25, 2023 at 3:09 AM Daniel Thompson
> > > <[email protected]> wrote:
> > > >
> > > > On Tue, Aug 22, 2023 at 01:19:46PM -0700, Douglas Anderson wrote:
> > > > > When entering kdb/kgdb on a kernel panic, it was be observed that the
> > > > > console isn't flushed before the `kdb` prompt came up. Specifically,
> > > > > when using the buddy lockup detector on arm64 and running:
> > > > > echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
> > > > >
> > > > > I could see:
> > > > > [ 26.161099] lkdtm: Performing direct entry HARDLOCKUP
> > > > > [ 32.499881] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > > > [ 32.552865] Sending NMI from CPU 5 to CPUs 6:
> > > > > [ 32.557359] NMI backtrace for cpu 6
> > > > > ... [backtrace for cpu 6] ...
> > > > > [ 32.558353] NMI backtrace for cpu 5
> > > > > ... [backtrace for cpu 5] ...
> > > > > [ 32.867471] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > > > [ 32.872321] NMI backtrace forP cpuANC: Hard LOCKUP
> > > > >
> > > > > Entering kdb (current=..., pid 0) on processor 5 due to Keyboard Entry
> > > > > [5]kdb>
> > > > >
> > > > > As you can see, backtraces for the other CPUs start printing and get
> > > > > interleaved with the kdb PANIC print.
> > > > >
> > > > > Let's replicate the commands to flush the console in the kdb panic
> > > > > entry point to avoid this.
> > > > >
> > > > > Signed-off-by: Douglas Anderson <[email protected]>
> > > > > ---
> > > > >
> > > > > kernel/debug/debug_core.c | 3 +++
> > > > > 1 file changed, 3 insertions(+)
> > > > >
> > > > > diff --git a/kernel/debug/debug_core.c b/kernel/debug/debug_core.c
> > > > > index d5e9ccde3ab8..3a904d8697c8 100644
> > > > > --- a/kernel/debug/debug_core.c
> > > > > +++ b/kernel/debug/debug_core.c
> > > > > @@ -1006,6 +1006,9 @@ void kgdb_panic(const char *msg)
> > > > > if (panic_timeout)
> > > > > return;
> > > > >
> > > > > + debug_locks_off();
> > > > > + console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > > > > +
> > > > > if (dbg_kdb_mode)
> > > > > kdb_printf("PANIC: %s\n", msg);
> > > >
> > > > I'm somewhat included to say *this* (calling kdb_printf() when not
> > > > actually in the debugger) is the cause of the problem. kdb_printf()
> > > > does some pretty horid things to the console and isn't intended to
> > > > run while the system is active.
> > > >
> > > > I'd therefore be more tempted to defer the print to the b.p. trap
> > > > handler itself and make this part of kgdb_panic() look more like:
> > > >
> > > > kgdb_panic_msg = msg;
> > > > kgdb_breakpoint();
> > > > kgdb_panic_msg = NULL;
> > >
> > > Unfortunately I think that only solves half the problem. As a quick
> > > test, I tried simply commenting out the "kdb_printf" line in
> > > kgdb_panic(). While that avoids the interleaved panic message and
> > > backtrace, it does nothing to actually get the backtraces printed out
> > > before you end up in kdb. As an example, this is what happened when I
> > > used `echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT` and
> > > had the "kdb_printf" in kgdb_panic() commented out:
> > >
> > > [ 72.658424] lkdtm: Performing direct entry HARDLOCKUP
> > > [ 82.181857] watchdog: Watchdog detected hard LOCKUP on cpu 6
> > > ...
> > > [ 82.234801] Sending NMI from CPU 5 to CPUs 6:
> > > [ 82.239296] NMI backtrace for cpu 6
> > > ... [ stack trace for CPU 6 ] ...
> > > [ 82.240294] NMI backtrace for cpu 5
> > > ... [ stack trace for CPU 5 ] ...
> > > [ 82.576443] Sending NMI from CPU 5 to CPUs 0-4,7:
> > > [ 82.581291] NMI backtrace
> > > Entering kdb (current=0xffffff80da5a1080, pid 6978) on processor 5 due
> > > to Keyboard Entry
> > > [5]kdb>
> > >
> > > As you can see, I don't see the traces for CPUs 0-4 and 7. Those do
> > > show up if I use the "dmesg" command but it's a bit of a hassle to run
> > > "dmesg" to look for any extra debug messages every time I drop in kdb.
> > >
> > > I guess perhaps that part isn't obvious from the commit message?
> >
> > I figured it was a risk.
> >
> > In fact it's an area where my instinct to honour console messages and my
> > instinct to get into the kernel as soon as possible after the decision
> > to invoke it has been made come into conflict.
> >
> > In other words does it matter that the console buffers are not flushed
> > before entering kgdb? However having thought about it for a little while
> > (and knowing the console code tends to be written to be decently robust)
> > I can come to the view the flushing is best.
>
> Just for record. I wondered if there was bug in printk() that the
> backtraces were not flushed. And I think that it is because they
> are printed in NMI context where the conosle handling is deferred.
>
> From this POV, the explicit flush makes perfect sense.
Agreed... and (with apologies for the length of time to get this
done):
Applied!
Thanks
Daniel.