2022-01-14 23:01:59

by Guilherme G. Piccoli

Subject: [PATCH V3] panic: Move panic_print before kmsg dumpers

The panic_print setting allows users to collect more information in a
panic event, like memory stats, tasks, CPUs backtraces, etc.
This is a pretty interesting debug mechanism, but currently the print
event happens *after* kmsg_dump(), meaning that pstore, for example,
cannot collect a dmesg with the panic_print information.

This patch changes that in 2 ways:

(a) First, after a good discussion with Petr in the mailing-list[0],
he noticed that one specific setting of panic_print (the console replay,
bit 5) should not be executed before console proper flushing; hence we
hereby split the panic_print_sys_info() function in upper and lower
portions: if the parameter "after_kmsg_dumpers" is passed, only bit 5
(the console replay thing) is evaluated and the function returns - this
is the lower portion. Otherwise all other bits are checked and the
function prints the user required information; this is the upper/earlier
mode.

(b) With the above change, we can safely insert a panic_print_sys_info()
call up some lines, in order kmsg_dump() accounts this new information
and exposes it through pstore or other kmsg dumpers. Notice that this
new earlier call doesn't set "after_kmsg_dumpers" so we print the
information set by user in panic_print, except the console replay that,
if set, will be executed after the console flushing.
Also, worth to notice we needed to guard the panic_print_sys_info(false)
calls against double print - we use kexec_crash_loaded() helper for that
(see discussion [1] for more details).

In the first version of this patch we discussed the risks in the
mailing-list [0], and seems this approach is the least terrible,
despite still having risks of polluting the log with the bulk of
information that panic_print may bring, losing older messages.
In order to attenuate that, we've added a warning in the
kernel-parameters.txt so that users enabling this mechanism consider
to increase the log buffer size via "log_buf_len" as well.

Finally, another decision was to keep 2 panic_print_sys_info(false)
calls (instead of just bringing it up some code lines and keep a single
call) due to the panic notifiers: if kdump is not set, currently the
panic_print information is collected after the notifiers and since
it's a bit safer this way, we decided to keep it as is, only modifying
the kdump case as per the previous commit [2] (see more details about
this discussion also in thread [1]).

[0] https://lore.kernel.org/lkml/[email protected]
[1] https://lore.kernel.org/lkml/[email protected]
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=5613b7538f69

Cc: Feng Tang <[email protected]>
Cc: Petr Mladek <[email protected]>
Signed-off-by: Guilherme G. Piccoli <[email protected]>
---


V3: Added a guard in the 2nd panic_print_sys_info(false) to prevent
double print - thanks for catching this Petr!

I didn't implement your final suggestion Petr, i.e., putting the first
panic_print_sys_info(false) inside the if (!_crash_kexec_post_notifiers)
block, and the reason is that when we do this, there's 4 cases to consider:

!kexec_crash_load() && !_crash_kexec_post_notifiers
kexec_crash_load() && !_crash_kexec_post_notifiers
kexec_crash_load() && _crash_kexec_post_notifiers
!kexec_crash_load() && _crash_kexec_post_notifiers

The 3rd case, which means user enabled kdump and set the post_notifiers
in the cmdline fails - we end-up not reaching panic_print_sys_info(false)
in this case, unless we add another variable to track the function call
and prevent double print. My preference was to keep the first call
as introduced by commit [2] (mentioned above) and not rely in another
variable.
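
For reference, a quick sketch of how those four cases play out with the
guard in this patch (assuming kdump succeeds when it jumps; the table is
only illustrative, not part of the patch):

  kexec_crash_loaded()  post_notifiers  early call (pre-kdump)  later call (pre-kmsg_dump)
  --------------------  --------------  ----------------------  --------------------------
  false                 false           skipped                 runs
  true                  false           runs, then kdump jumps  never reached
  true                  true            runs                    skipped by the guard
  false                 true            skipped                 runs
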
Thanks again for the great reviews,

Guilherme


V2: https://lore.kernel.org/lkml/[email protected]



.../admin-guide/kernel-parameters.txt | 4 ++++
kernel/panic.c | 22 ++++++++++++++-----
2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index a069d8fe2fee..0f5cbe141bfd 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -3727,6 +3727,10 @@
bit 4: print ftrace buffer
bit 5: print all printk messages in buffer
bit 6: print all CPUs backtrace (if available in the arch)
+ *Be aware* that this option may print a _lot_ of lines,
+ so there are risks of losing older messages in the log.
+ Use this option carefully, maybe worth to setup a
+ bigger log buffer with "log_buf_len" along with this.

panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
Format: <hex>[,nousertaint]
diff --git a/kernel/panic.c b/kernel/panic.c
index 41ecf9ab824a..4ae712665f75 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -148,10 +148,13 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
}
EXPORT_SYMBOL(nmi_panic);

-static void panic_print_sys_info(void)
+static void panic_print_sys_info(bool after_kmsg_dumpers)
{
- if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
- console_flush_on_panic(CONSOLE_REPLAY_ALL);
+ if (after_kmsg_dumpers) {
+ if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
+ console_flush_on_panic(CONSOLE_REPLAY_ALL);
+ return;
+ }

if (panic_print & PANIC_PRINT_ALL_CPU_BT)
trigger_all_cpu_backtrace();
@@ -249,7 +252,7 @@ void panic(const char *fmt, ...)
* show some extra information on kernel log if it was set...
*/
if (kexec_crash_loaded())
- panic_print_sys_info();
+ panic_print_sys_info(false);

/*
* If we have crashed and we have a crash kernel loaded let it handle
@@ -283,6 +286,15 @@ void panic(const char *fmt, ...)
*/
atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

+ /*
+ * If kexec_crash_loaded() is true and we still reach this point,
+ * kernel would double print the information from panic_print; so
+ * let's guard against that possibility (it happens if kdump users
+ * also set crash_kexec_post_notifiers in the command-line).
+ */
+ if (!kexec_crash_loaded())
+ panic_print_sys_info(false);
+
kmsg_dump(KMSG_DUMP_PANIC);

/*
@@ -313,7 +325,7 @@ void panic(const char *fmt, ...)
debug_locks_off();
console_flush_on_panic(CONSOLE_FLUSH_PENDING);

- panic_print_sys_info();
+ panic_print_sys_info(true);

if (!panic_blink)
panic_blink = no_blink;
--
2.34.1


2022-01-17 11:57:58

by Baoquan He

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
......
> .../admin-guide/kernel-parameters.txt | 4 ++++
> kernel/panic.c | 22 ++++++++++++++-----
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a069d8fe2fee..0f5cbe141bfd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3727,6 +3727,10 @@
> bit 4: print ftrace buffer
> bit 5: print all printk messages in buffer
> bit 6: print all CPUs backtrace (if available in the arch)
> + *Be aware* that this option may print a _lot_ of lines,
> + so there are risks of losing older messages in the log.
> + Use this option carefully, maybe worth to setup a
> + bigger log buffer with "log_buf_len" along with this.
>
> panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
> Format: <hex>[,nousertaint]
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 41ecf9ab824a..4ae712665f75 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -148,10 +148,13 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
> }
> EXPORT_SYMBOL(nmi_panic);
>
> -static void panic_print_sys_info(void)
> +static void panic_print_sys_info(bool after_kmsg_dumpers)
> {
> - if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> - console_flush_on_panic(CONSOLE_REPLAY_ALL);
> + if (after_kmsg_dumpers) {
> + if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> + console_flush_on_panic(CONSOLE_REPLAY_ALL);
> + return;
> + }
>
> if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> trigger_all_cpu_backtrace();
> @@ -249,7 +252,7 @@ void panic(const char *fmt, ...)
> * show some extra information on kernel log if it was set...
> */
> if (kexec_crash_loaded())
> - panic_print_sys_info();
> + panic_print_sys_info(false);

The patch can't be applied on the latest code of Linus's tree; can you
tell which branch your code is based on?

>
> /*
> * If we have crashed and we have a crash kernel loaded let it handle
> @@ -283,6 +286,15 @@ void panic(const char *fmt, ...)
> */
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>
> + /*
> + * If kexec_crash_loaded() is true and we still reach this point,
> + * kernel would double print the information from panic_print; so
> + * let's guard against that possibility (it happens if kdump users
> + * also set crash_kexec_post_notifiers in the command-line).
> + */
> + if (!kexec_crash_loaded())
> + panic_print_sys_info(false);
> +
> kmsg_dump(KMSG_DUMP_PANIC);
>
> /*
> @@ -313,7 +325,7 @@ void panic(const char *fmt, ...)
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> - panic_print_sys_info();
> + panic_print_sys_info(true);
>
> if (!panic_blink)
> panic_blink = no_blink;
> --
> 2.34.1
>

2022-01-17 14:31:10

by Baoquan He

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/17/22 at 11:33am, Baoquan He wrote:
> On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
> ......
> > .../admin-guide/kernel-parameters.txt | 4 ++++
> > kernel/panic.c | 22 ++++++++++++++-----
> > 2 files changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> > index a069d8fe2fee..0f5cbe141bfd 100644
> > --- a/Documentation/admin-guide/kernel-parameters.txt
> > +++ b/Documentation/admin-guide/kernel-parameters.txt
> > @@ -3727,6 +3727,10 @@
> > bit 4: print ftrace buffer
> > bit 5: print all printk messages in buffer
> > bit 6: print all CPUs backtrace (if available in the arch)
> > + *Be aware* that this option may print a _lot_ of lines,
> > + so there are risks of losing older messages in the log.
> > + Use this option carefully, maybe worth to setup a
> > + bigger log buffer with "log_buf_len" along with this.
> >
> > panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
> > Format: <hex>[,nousertaint]
> > diff --git a/kernel/panic.c b/kernel/panic.c
> > index 41ecf9ab824a..4ae712665f75 100644
> > --- a/kernel/panic.c
> > +++ b/kernel/panic.c
> > @@ -148,10 +148,13 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
> > }
> > EXPORT_SYMBOL(nmi_panic);
> >
> > -static void panic_print_sys_info(void)
> > +static void panic_print_sys_info(bool after_kmsg_dumpers)
> > {
> > - if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> > - console_flush_on_panic(CONSOLE_REPLAY_ALL);
> > + if (after_kmsg_dumpers) {
> > + if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> > + console_flush_on_panic(CONSOLE_REPLAY_ALL);
> > + return;
> > + }
> >
> > if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> > trigger_all_cpu_backtrace();
> > @@ -249,7 +252,7 @@ void panic(const char *fmt, ...)
> > * show some extra information on kernel log if it was set...
> > */
> > if (kexec_crash_loaded())
> > - panic_print_sys_info();
> > + panic_print_sys_info(false);
>
> Patch can'e be applied on the latest code of linus's tree, can you tell
> which branch your code are based on?

OK, this is based on linux-next, I will apply this patch on the
linux-next/master and have a look.

>
> >
> > /*
> > * If we have crashed and we have a crash kernel loaded let it handle
> > @@ -283,6 +286,15 @@ void panic(const char *fmt, ...)
> > */
> > atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >
> > + /*
> > + * If kexec_crash_loaded() is true and we still reach this point,
> > + * kernel would double print the information from panic_print; so
> > + * let's guard against that possibility (it happens if kdump users
> > + * also set crash_kexec_post_notifiers in the command-line).
> > + */
> > + if (!kexec_crash_loaded())
> > + panic_print_sys_info(false);
> > +
> > kmsg_dump(KMSG_DUMP_PANIC);
> >
> > /*
> > @@ -313,7 +325,7 @@ void panic(const char *fmt, ...)
> > debug_locks_off();
> > console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> >
> > - panic_print_sys_info();
> > + panic_print_sys_info(true);
> >
> > if (!panic_blink)
> > panic_blink = no_blink;
> > --
> > 2.34.1
> >
>

2022-01-18 02:25:59

by Guilherme G. Piccoli

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 17/01/2022 03:13, Baoquan He wrote:
> [...]
>>
>> Patch can'e be applied on the latest code of linus's tree, can you tell
>> which branch your code are based on?
>
> OK, this is based on linux-next, I will apply this patch on the
> linux-next/master and have a look.
>

Indeed, I'm sorry for that - I should have mentioned it, but I forgot in
the V3. It's based on linux-next because it's on top of some patches
there - if I based it on Linus's tree, it wouldn't work on -next heh

Thanks in advance for the reviews/test =)
Cheers,


Guilherme

2022-01-21 17:43:32

by Baoquan He

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
> The panic_print setting allows users to collect more information in a
> panic event, like memory stats, tasks, CPUs backtraces, etc.
> This is a pretty interesting debug mechanism, but currently the print
> event happens *after* kmsg_dump(), meaning that pstore, for example,
> cannot collect a dmesg with the panic_print information.
......

Thanks for the effort.

I have some concerns about this patch, and it's still not clear to me
why the change done in this patch is needed. Hope you don't mind if I
ask stupid questions or bring up things already discussed in an earlier
post.

Firstly, let me add some background information and details about the
problem as I understand it; please correct me if anything is wrong.

Background:
===========
Currently, in panic(), there are 5 different ways to try to collect as
much information as possible:
1) Vmcore dumping
   Done via kdump; heavyweight, gets almost all the information we want,
   but needs to switch to the kdump kernel immediately.
2) Panic notifiers
   Iterate the registered handlers; lightweight, the information gathered
   depends on those panic handlers. Tries its best after panic, does not
   reboot.
3) kmsg_dump
   Iterates the registered dumpers to collect as much kernel log as
   possible and store it, e.g. on pstore; lightweight, kernel log only,
   does not reboot, the log can be checked after the system reboots.
4) console_flush_on_panic(CONSOLE_FLUSH_PENDING)
   Flushes the pending logbuf so it gets printed out to the console.
5) panic_print
   Prints out more system info, including all of dmesg, task states,
   memory info, etc.
===========
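
To make the ordering concrete, here is a rough and simplified sketch of
how these five mechanisms are sequenced in mainline panic() at the time
(without the linux-next changes this series builds on; locking, kgdb and
reboot details are elided, numbers refer to the list above):

void panic(const char *fmt, ...)
{
	/* ... print the panic message, dump_stack() ... */

	if (!_crash_kexec_post_notifiers) {
		__crash_kexec(NULL);		/* 1) kdump / vmcore dumping */
		smp_send_stop();
	} else {
		crash_smp_send_stop();
	}

	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);	/* 2) */

	kmsg_dump(KMSG_DUMP_PANIC);		/* 3) kmsg dumpers, e.g. pstore */

	if (_crash_kexec_post_notifiers)
		__crash_kexec(NULL);		/* 1) deferred variant */

	/* ... */
	console_flush_on_panic(CONSOLE_FLUSH_PENDING);	/* 4) */

	panic_print_sys_info();			/* 5) only after kmsg_dump */

	/* ... panic_blink loop / reboot ... */
}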


The problem encountered:
=======
Currently panic_print is done after kmsg_dump. This causes an issue: the
system info that panic_print pokes into the logbuf has no chance of
being collected by kmsg_dump. E.g. the pstore dumper, registered and
handled in kmsg_dump(), is a lightweight and helpful panic-info
collecting mechanism, yet it will miss all of those helpful messages
poked by panic_print.
======
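
For readers less familiar with this side of it, a minimal sketch of what
such a dumper registers (the example_* names are made up here; error
handling and the actual storage backend are omitted):

#include <linux/kmsg_dump.h>
#include <linux/module.h>

static void example_dump(struct kmsg_dumper *dumper,
			 enum kmsg_dump_reason reason)
{
	if (reason != KMSG_DUMP_PANIC)
		return;
	/*
	 * A real dumper (e.g. pstore) walks the kernel log records here
	 * and writes them to persistent storage; anything panic_print
	 * adds to the log after this callback has run is lost to it,
	 * which is exactly the problem described above.
	 */
}

static struct kmsg_dumper example_dumper = {
	.dump = example_dump,
};

static int __init example_dumper_init(void)
{
	return kmsg_dump_register(&example_dumper);
}
module_init(example_dumper_init);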


> Finally, another decision was to keep 2 panic_print_sys_info(false)
> calls (instead of just bringing it up some code lines and keep a single
> call) due to the panic notifiers: if kdump is not set, currently the
> panic_print information is collected after the notifiers and since
> it's a bit safer this way, we decided to keep it as is, only modifying
> the kdump case as per the previous commit [2] (see more details about
> this discussion also in thread [1]).

About the change, my question is: why not pack
console_flush_on_panic(CONSOLE_FLUSH_PENDING) into panic_print(), and
move panic_print() up above kmsg_dump()? The pending-console flush and
the replay-all flush are both only called inside panic(), and both aim
to print out more information.

And adding panic_print_sys_info(false) before kdump might not be a good
idea. Firstly, a panicked system is very unstable; kdump needs to switch
to the 2nd kernel asap w/o any extra action so that it can survive and
collect tons of information. The added panic_print_sys_info(false) is
obviously unnecessary for kdump, and increases the risk to the kernel
switch.

If I'm missing anything, please help point it out so that I can have a
complete view of this issue and its fix.

Thanks
Baoquan

>
> [0] https://lore.kernel.org/lkml/[email protected]
> [1] https://lore.kernel.org/lkml/[email protected]
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=5613b7538f69
>
> Cc: Feng Tang <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
>
>
> V3: Added a guard in the 2nd panic_print_sys_info(false) to prevent
> double print - thanks for catching this Petr!
>
> I didn't implement your final suggestion Petr, i.e., putting the first
> panic_print_sys_info(false) inside the if (!_crash_kexec_post_notifiers)
> block, and the reason is that when we do this, there's 4 cases to consider:
>
> !kexec_crash_load() && !_crash_kexec_post_notifiers
> kexec_crash_load() && !_crash_kexec_post_notifiers
> kexec_crash_load() && _crash_kexec_post_notifiers
> !kexec_crash_load() && _crash_kexec_post_notifiers
>
> The 3rd case, which means user enabled kdump and set the post_notifiers
> in the cmdline fails - we end-up not reaching panic_print_sys_info(false)
> in this case, unless we add another variable to track the function call
> and prevent double print. My preference was to keep the first call
> as introduced by commit [2] (mentioned above) and not rely in another
> variable.
> Thanks again for the great reviews,
>
> Guilherme
>
>
> V2: https://lore.kernel.org/lkml/[email protected]
>
>
>
> .../admin-guide/kernel-parameters.txt | 4 ++++
> kernel/panic.c | 22 ++++++++++++++-----
> 2 files changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index a069d8fe2fee..0f5cbe141bfd 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -3727,6 +3727,10 @@
> bit 4: print ftrace buffer
> bit 5: print all printk messages in buffer
> bit 6: print all CPUs backtrace (if available in the arch)
> + *Be aware* that this option may print a _lot_ of lines,
> + so there are risks of losing older messages in the log.
> + Use this option carefully, maybe worth to setup a
> + bigger log buffer with "log_buf_len" along with this.
>
> panic_on_taint= Bitmask for conditionally calling panic() in add_taint()
> Format: <hex>[,nousertaint]
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 41ecf9ab824a..4ae712665f75 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -148,10 +148,13 @@ void nmi_panic(struct pt_regs *regs, const char *msg)
> }
> EXPORT_SYMBOL(nmi_panic);
>
> -static void panic_print_sys_info(void)
> +static void panic_print_sys_info(bool after_kmsg_dumpers)
> {
> - if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> - console_flush_on_panic(CONSOLE_REPLAY_ALL);
> + if (after_kmsg_dumpers) {
> + if (panic_print & PANIC_PRINT_ALL_PRINTK_MSG)
> + console_flush_on_panic(CONSOLE_REPLAY_ALL);
> + return;
> + }
>
> if (panic_print & PANIC_PRINT_ALL_CPU_BT)
> trigger_all_cpu_backtrace();
> @@ -249,7 +252,7 @@ void panic(const char *fmt, ...)
> * show some extra information on kernel log if it was set...
> */
> if (kexec_crash_loaded())
> - panic_print_sys_info();
> + panic_print_sys_info(false);
>
> /*
> * If we have crashed and we have a crash kernel loaded let it handle
> @@ -283,6 +286,15 @@ void panic(const char *fmt, ...)
> */
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>
> + /*
> + * If kexec_crash_loaded() is true and we still reach this point,
> + * kernel would double print the information from panic_print; so
> + * let's guard against that possibility (it happens if kdump users
> + * also set crash_kexec_post_notifiers in the command-line).
> + */
> + if (!kexec_crash_loaded())
> + panic_print_sys_info(false);
> +
> kmsg_dump(KMSG_DUMP_PANIC);
>
> /*
> @@ -313,7 +325,7 @@ void panic(const char *fmt, ...)
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> - panic_print_sys_info();
> + panic_print_sys_info(true);
>
> if (!panic_blink)
> panic_blink = no_blink;
> --
> 2.34.1
>

2022-01-21 19:15:30

by Guilherme G. Piccoli

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 19/01/2022 04:13, Baoquan He wrote:
> [...]
> Thanks for the effort.
>
> I have some concerns about this patch, and it's still not clear to me
> why the thing done in this patch is needed. Hope you don't mind if I ask
> stupid question or things have been discussed in the earlier post.
>
> Firstly, let me add some background information and details about the
> problem with my understanding, please correct me if anthing is wrong.
>

Hi Baoquan - first of all, thanks a lot for your review and detailed
analysis, this is pretty good and much appreciated!


> Background:
> ===========
> Currently, in panic(), there are 5 different ways to try best to
> collect information:
> 1) Vmcore dumping
> done via kdump, high weight, almost get all information we want;
> need switch to kdump kernel immediately.
> 2) Panic notifier
> iterating those registered handlers, light weight, information got
> depends on those panic handlers. Try its best after panic, do not reboot.
> 3) kmsg_dump
> iterating registered dumpers to collect as much kernel log as possible
> to store. E.g on pstore, light weight, only kernel log, do not reboot,
> log can be checked after system reboot.
> 4)console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> Flush to get the pending logbuf printed out to console.
> 5)panic_print
> Print out more system info, including all dmesg, task states, mem
> info, etc.
> ===========
>
>
> The problem encoutered:
> =======
> Currently panic_print is done after kmsg_dump. This caused an issue that
> system info poked by panic_print to expose to logbuf have no chance to
> be collected by kmsg_dump. E.g pstore_dumper registered and handled in
> kmsg_dump(), is a light weight and helpful panic info collecting
> mechanism, will miss those lots of and helful message poked by
> panic_print.
> ======
>

OK, I agree with your analysis, it's quite precise - the main problem
resolved in this patch is that some users may want to save the
panic_print information in the pstore-collected logs, and currently this
isn't possible.


> [...]
> About the change, my question is why not packing
> console_flush_on_panic(CONSOLE_FLUSH_PENDING) into panic_print(), and
> move panic_print() up to be above kmsg_dump(). The pending console
> flusing and replaying all flush are all only called inside panic(), and
> aim to print out more information.
>

About this point, the idea of not messing with the console flush comes
from a discussion in the V1 of this patch, see here:
https://lore.kernel.org/lkml/YdQzU0KkAVq2BWpk@alley/

Basically, there are many more risks than benefits in moving the console
flush earlier. Hence, we kept the console flushing intact, and even kept
the replay flag in panic_print - that's the reason we need the new flag
in panic_print_sys_info().


> And adding panic_print_sys_info(false) before kdump might not be a good
> idea. Firstly, panicked system is very unstable, kdump need switch to
> 2nd kernel asap w/o any extra action so that it can survive and collect
> tons of information. Added panic_print_sys_info(false) is obviously
> unnecessary to kdump, and increase risk to kernel switching.
>
> If I missing anything, please help point out so that I can have a
> complete view on this isuse and its fix.
>

And this is another point, and for that I must apologize since I sent
this change in another patch[0] but forgot to CC the kexec list; so
details are in this thread:
https://lore.kernel.org/lkml/[email protected]/

There was a discussion with Dave there, thankfully he was able to see
the patch using some lore/lei search in lkml, but let me give you the
summary here.

So, I agree with you that calling the panic_print handler before kdump
_increases the risk_ of a kdump failure - this is more or less like the
"crash_kexec_post_notifiers" case, right? It's a decision made by the
user, with its natural trade-offs and considerations.
But in the panic_print case, users weren't allowed to decide before my
patch - it was just impossible to ever get the panic_print output before
the kdump.

My patch just enables users to make this decision. It's not as if we are
now always dumping the panic_print info; it's still a parameter that
users must consciously add, but now they are entitled to make this
choice! And the bonus is that we can have kdump collecting only the
dmesg, which is fast and storage-friendly (since the vmcore, even
compressed, is quite big) but still with lots of information from the
panic_print output. Of course, there's a trade-off here: the risk that
printing all that information before the kdump makes kdump more
unstable, though this is a decision that the users/sysadmins must take
into account.

Finally, Dave suggested that we could make panic_print a panic notifier,
and this idea is pretty good! But then, the trade-off is more complex:
in order to have panic_print output, users also must have all the panic
notifiers, by using the "crash_kexec_post_notifiers" option. It's too
much, in my opinion.

I hope this was a good summary, please let me know if you still have any
questions Baoquan - thanks again for the great review!
Cheers,


Guilherme


[0]
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=ab693ae2140

2022-01-21 19:40:00

by Petr Mladek

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On Wed 2022-01-19 15:13:18, Baoquan He wrote:
> On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
> > The panic_print setting allows users to collect more information in a
> > panic event, like memory stats, tasks, CPUs backtraces, etc.
> > This is a pretty interesting debug mechanism, but currently the print
> > event happens *after* kmsg_dump(), meaning that pstore, for example,
> > cannot collect a dmesg with the panic_print information.
> ......
>
> Thanks for the effort.
>
> I have some concerns about this patch, and it's still not clear to me
> why the thing done in this patch is needed. Hope you don't mind if I ask
> stupid question or things have been discussed in the earlier post.
>
> Firstly, let me add some background information and details about the
> problem with my understanding, please correct me if anthing is wrong.
>
> Background:
> ===========
> Currently, in panic(), there are 5 different ways to try best to
> collect information:
> 1) Vmcore dumping
> done via kdump, high weight, almost get all information we want;
> need switch to kdump kernel immediately.

My experience is that it basically always works when it is correctly
configured. It might be tested before the crash.



> 2) Panic notifier
> iterating those registered handlers, light weight, information got
> depends on those panic handlers. Try its best after panic, do not reboot.


From my POV, the function of panic notifiers is not well defined. They
do various things, for example:

+ on_panic_nb() flushes s390 console buffers. It looks like complex
and risky code. For example, __sclp_vt220_flush_buffer()
takes spin locks and calls timer code.

+ dump_gisb_error() prints some info but it also communicates with
the device, see gisb_read() and gisb_write() in
brcmstb_gisb_arb_decode_addr(). I am not sure how reliable this is.

+ parisc_panic_event() re-enables the soft-power switch. I am not
sure how safe this is.

+ pvpanic_panic_notify() takes spin lock and does in iowrite().


They do more than just provide information. Some are risky. It is not
easy to disable a particular one.
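
For reference, the registration pattern they all share looks roughly
like this (a minimal sketch with made-up example_* names; the real
handlers of course do the work discussed above):

#include <linux/notifier.h>
#include <linux/panic_notifier.h>	/* panic_notifier_list */

static int example_panic_event(struct notifier_block *nb,
			       unsigned long event, void *unused)
{
	/* Runs from panic(), on an already dying machine. */
	return NOTIFY_DONE;
}

static struct notifier_block example_panic_nb = {
	.notifier_call = example_panic_event,
};

static int __init example_nb_init(void)
{
	return atomic_notifier_chain_register(&panic_notifier_list,
					      &example_panic_nb);
}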


> 3) kmsg_dump
> iterating registered dumpers to collect as much kernel log as possible
> to store. E.g on pstore, light weight, only kernel log, do not reboot,
> log can be checked after system reboot.

From my POV, it is similar functionality like the crash dump.

It might make sense to allow to call kmsg_dump before panic notifiers
to reduce the risk of a breakage. But I do not have enough experience
with them to judge this.

I can't remember any bug report in this code. I guess that only
few people use kmsg_dump.


> 4)console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> Flush to get the pending logbuf printed out to console.

My experience is that there are many problems with consoles.
There might be deadlocks because of internal locks. Some
serial consoles are incredibly slow. Messages in the console
drivers might cause infinite output. Some drivers are pretty
complex, especially tty that is often enabled.

console_flush_on_panic() makes it even worse. It resets console_sem.
It means that it tries to use the consoles even when they are
already in use.

Note that printk() tries to show the messages on the console
immediately. console_flush_on_panic() is just the last
instance when all safe ways failed.


> 5)panic_print
> Print out more system info, including all dmesg, task states, mem
> info, etc.

This adds more information to the kernel log. They can be requested
by kernel commandline parameter. They can be seen only on the console
at the moment.

It makes sense to allow to see them also in crash dump or kmsg_dump.
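
For context, the non-replay part of panic_print_sys_info() boils down to
a handful of checks along these lines (a simplified sketch of
kernel/panic.c from that time, not the exact code):

static void panic_print_sys_info(void)
{
	if (panic_print & PANIC_PRINT_TASK_INFO)
		show_state();			/* bit 0: all tasks */

	if (panic_print & PANIC_PRINT_MEM_INFO)
		show_mem(0, NULL);		/* bit 1: memory info */

	if (panic_print & PANIC_PRINT_TIMER_INFO)
		sysrq_timer_list_show();	/* bit 2: timers */

	if (panic_print & PANIC_PRINT_LOCK_INFO)
		debug_show_all_locks();		/* bit 3: lock state */

	if (panic_print & PANIC_PRINT_FTRACE_INFO)
		ftrace_dump(DUMP_ALL);		/* bit 4: ftrace buffer */

	if (panic_print & PANIC_PRINT_ALL_CPU_BT)
		trigger_all_cpu_backtrace();	/* bit 6: all-CPU backtraces */
}

Bit 5 (PANIC_PRINT_ALL_PRINTK_MSG) is the console replay handled
separately via console_flush_on_panic(CONSOLE_REPLAY_ALL).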

> About the change, my question is why not packing
> console_flush_on_panic(CONSOLE_FLUSH_PENDING) into panic_print(), and
> move panic_print() up to be above kmsg_dump().

As explained above. console_flush_on_panic() is a dirty hack.
It really has to be called at the end as the last resort.


> And adding panic_print_sys_info(false) before kdump might not be a good
> idea. Firstly, panicked system is very unstable, kdump need switch to
> 2nd kernel asap w/o any extra action so that it can survive and collect
> tons of information. Added panic_print_sys_info(false) is obviously
> unnecessary to kdump, and increase risk to kernel switching.

Yes, panic_print_sys_info() increases the risk that the crash dump
will not succeed. But the change makes sense because:

+ panic_print_sys_info() does nothing by default. Most users will
not enable it together with crash dump.

+ Guilherme uses crash dump only to dump the kernel log. It might
be more reliable than kmsg_dump. In this case, panic_print_sys_info()
is the only way to get the extra information.

+ panic_print_sys_info() might be useful even with full crash dump.
For example, ftrace messages are not easy to read from the memory
dump.

Best Regards,
Petr

2022-01-21 19:43:52

by Guilherme G. Piccoli

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

Thanks again Petr, for the deep analysis! Much appreciated.
Some comments inline below:


On 19/01/2022 12:48, Petr Mladek wrote:
>[...]
> From my POV, the function of panic notifiers is not well defined. They
> do various things, for example:
> [...]
>
> The do more that just providing information. Some are risky. It is not
> easy to disable a particular one.

We are trying to change that here:
https://lore.kernel.org/lkml/[email protected]/

Your review/comments are very welcome =)


> [...]
> It might make sense to allow to call kmsg_dump before panic notifiers
> to reduce the risk of a breakage. But I do not have enough experience
> with them to judge this.
>
> I can't remember any bug report in this code. I guess that only
> few people use kmsg_dump.

One of the problems doing that is that RCU and hung task detector, for
example, have panic notifiers to disable themselves in the panic
scenario - if we kmsg_dump() _before_ the panic notifiers, we may have
intermixed messages, all messy...so for me it makes sense to keep the
kmsg_dump() after panic notifiers.


> [...]
> Yes, panic_print_sys_info() increases the risk that the crash dump
> will not succeed. But the change makes sense because:
>
> + panic_print_sys_info() does nothing by default. Most users will
> not enable it together with crash dump.
>
> + Guilherme uses crash dump only to dump the kernel log. It might
> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> is the only way to get the extra information.
>
> + panic_print_sys_info() might be useful even with full crash dump.
> For example, ftrace messages are not easy to read from the memory
> dump.

The last point is really good, I didn't consider that before but makes a
lot of sense - we can now dump (a hopefully small!) ftrace/event trace
buffer to dmesg before a kdump, making it pretty easy to read that later.
Cheers,


Guilherme

2022-01-21 19:56:35

by Petr Mladek

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On Fri 2022-01-14 15:30:46, Guilherme G. Piccoli wrote:
> The panic_print setting allows users to collect more information in a
> panic event, like memory stats, tasks, CPUs backtraces, etc.
> This is a pretty interesting debug mechanism, but currently the print
> event happens *after* kmsg_dump(), meaning that pstore, for example,
> cannot collect a dmesg with the panic_print information.

The above text is OK.

The below commit message is quite hard to follow. The sentences are very
long and complex. I tend to make the same mistakes. I will try to
suggest some improvements.


> This patch changes that in 2 ways:

> (a) First, after a good discussion with Petr in the mailing-list[0],
> he noticed

The above is not needed ;-) It is better to just describe the problem
and the solution in a direct way.

> that one specific setting of panic_print (the console replay,
> bit 5) should not be executed before console proper flushing; hence we
> hereby split the panic_print_sys_info() function in upper and lower
> portions: if the parameter "after_kmsg_dumpers" is passed, only bit 5
> (the console replay thing) is evaluated and the function returns - this
> is the lower portion. Otherwise all other bits are checked and the
> function prints the user required information; this is the upper/earlier
> mode.

The meaning of the boolean parameter might be explained by:

"panic_print_sys_info() allows to print additional information into the
kernel log. In addition, it allows to replay the existing log on the
console first.

The extra information is useful also when generating crash dump or
kmsg_dump. But the replay functionality makes sense only at the end
of panic().

Allow to distinguish the two situations by a boolean parameter."



> (b) With the above change, we can safely insert a panic_print_sys_info()
> call up some lines, in order kmsg_dump() accounts this new information
> and exposes it through pstore or other kmsg dumpers. Notice that this
> new earlier call doesn't set "after_kmsg_dumpers" so we print the
> information set by user in panic_print, except the console replay that,
> if set, will be executed after the console flushing.

This paragraph is too complicated. The important information here is:

"panic_print_sys_info() is moved above kmsg_dump(). It allows to dump
the extra information."

The trick with the boolean is already explained elsewhere. It is not
needed to repeat it in this paragraph.


> Also, worth to notice we needed to guard the panic_print_sys_info(false)
> calls against double print - we use kexec_crash_loaded() helper for that
> (see discussion [1] for more details).

This should be explained together with the reason to call it on two
locations, see below.


> In the first version of this patch we discussed the risks in the
> mailing-list [0], and seems this approach is the least terrible,

the above is not needed.

> despite still having risks of polluting the log with the bulk of
> information that panic_print may bring, losing older messages.
> In order to attenuate that, we've added a warning in the
> kernel-parameters.txt so that users enabling this mechanism consider
> to increase the log buffer size via "log_buf_len" as well.

Again, I would describe the problem and solution in a direct way.

"The additional messages might overwrite the oldest messages when
the buffer is full. The only reasonable solution is to use a large
enough log buffer. The advice is added into kernel-parameters.txt."



> Finally, another decision was to keep 2 panic_print_sys_info(false)
> calls (instead of just bringing it up some code lines and keep a single
> call) due to the panic notifiers:

The above is too complicated and not important.

> if kdump is not set, currently the
> panic_print information is collected after the notifiers and since
> it's a bit safer this way,

The important information is why it is safer. Honestly, I am still not sure
about this claim. I have checked several notifiers today and they did
not add much stability.


> we decided to keep it as is, only modifying
> the kdump case as per the previous commit [2] (see more details about
> this discussion also in thread [1]).

This does not provide much information. A link to linux-next is a bad
idea, see below.


> [0] https://lore.kernel.org/lkml/[email protected]
> [1] https://lore.kernel.org/lkml/[email protected]
> [2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=5613b7538f69

linux-next is regularly rebased and Andrew's patches are always newly
imported from quilt. The commit probably already has another hash.

>
> Cc: Feng Tang <[email protected]>
> Cc: Petr Mladek <[email protected]>
> Signed-off-by: Guilherme G. Piccoli <[email protected]>
> ---
>
>
> V3: Added a guard in the 2nd panic_print_sys_info(false) to prevent
> double print - thanks for catching this Petr!
>
> I didn't implement your final suggestion Petr, i.e., putting the first
> panic_print_sys_info(false) inside the if (!_crash_kexec_post_notifiers)
> block, and the reason is that when we do this, there's 4 cases to consider:

But then it does not make sense to call panic_print_sys_info(false)
from both locations.

You say that it is called after kexec_post_notifiers because
it is safer. If this is true, then it would make sense to
always call it after the notifiers when _crash_kexec_post_notifiers
is set.

> !kexec_crash_load() && !_crash_kexec_post_notifiers
> kexec_crash_load() && !_crash_kexec_post_notifiers
> kexec_crash_load() && _crash_kexec_post_notifiers
> !kexec_crash_load() && _crash_kexec_post_notifiers
>
> The 3rd case, which means user enabled kdump and set the post_notifiers
> in the cmdline fails - we end-up not reaching panic_print_sys_info(false)
> in this case, unless we add another variable to track the function call
> and prevent double print. My preference was to keep the first call
> as introduced by commit [2] (mentioned above) and not rely in another
> variable.

You could do:

	if (!_crash_kexec_post_notifiers) {
		if (kexec_crash_loaded())
			panic_print_sys_info(false);

		__crash_kexec(NULL);
		smp_send_stop();
	} else {
		crash_smp_send_stop();
	}

	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

	/* Just negate the condition here */
	if (_crash_kexec_post_notifiers || !kexec_crash_loaded())
		panic_print_sys_info(false);

> diff --git a/kernel/panic.c b/kernel/panic.c
> index 41ecf9ab824a..4ae712665f75 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -249,7 +252,7 @@ void panic(const char *fmt, ...)
> * show some extra information on kernel log if it was set...
> */
> if (kexec_crash_loaded())
> - panic_print_sys_info();
> + panic_print_sys_info(false);
>
> /*
> * If we have crashed and we have a crash kernel loaded let it handle
> @@ -283,6 +286,15 @@ void panic(const char *fmt, ...)
> */
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>
> + /*
> + * If kexec_crash_loaded() is true and we still reach this point,
> + * kernel would double print the information from panic_print; so
> + * let's guard against that possibility (it happens if kdump users
> + * also set crash_kexec_post_notifiers in the command-line).
> + */

Too many words. Most of it is obvious from the code. The most important
information is why it is called after the notifiers when the crash
kernel is not loaded. And why it does not depend on _crash_kexec_post_notifiers
setting.


> + if (!kexec_crash_loaded())
> + panic_print_sys_info(false);
> +
> kmsg_dump(KMSG_DUMP_PANIC);
>
> /*

Best Regards,
Petr

2022-01-21 20:02:20

by Guilherme G. Piccoli

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

Hi Petr, thanks again for the thorough review! I will implement all your
suggestions and submit a V4, all makes sense to me =)

Cheers,


Guilherme


2022-01-21 21:12:50

by Baoquan He

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/19/22 at 04:48pm, Petr Mladek wrote:
> On Wed 2022-01-19 15:13:18, Baoquan He wrote:
> > On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
> > > The panic_print setting allows users to collect more information in a
> > > panic event, like memory stats, tasks, CPUs backtraces, etc.
> > > This is a pretty interesting debug mechanism, but currently the print
> > > event happens *after* kmsg_dump(), meaning that pstore, for example,
> > > cannot collect a dmesg with the panic_print information.
> > ......
> >
> > Thanks for the effort.
> >
> > I have some concerns about this patch, and it's still not clear to me
> > why the thing done in this patch is needed. Hope you don't mind if I ask
> > stupid question or things have been discussed in the earlier post.
> >
> > Firstly, let me add some background information and details about the
> > problem with my understanding, please correct me if anthing is wrong.
> >
> > Background:
> > ===========
> > Currently, in panic(), there are 5 different ways to try best to
> > collect information:
> > 1) Vmcore dumping
> > done via kdump, high weight, almost get all information we want;
> > need switch to kdump kernel immediately.
>
Thanks for the input, Petr.

> My experience is that it basically always works when it is correctly
> configured. It might be tested before the crash.

I guess you mean that on a certain machine the admin usually triggers a
crash manually to test the vmcore dumping, and if it works in that test,
it always works later. However, the success of the switch may sometimes
depend on what crashed.

We got a report on a Hyper-V guest: they enable the panic notifiers by
default, w/o the need to add anything to the cmdline. When a crash was
triggered, it entered the panic notifiers and executed Hyper-V's
specific task; that failed, and the jump to kdump was never reached.
>
>
>
> > 2) Panic notifier
> > iterating those registered handlers, light weight, information got
> > depends on those panic handlers. Try its best after panic, do not reboot.
>
>
> From my POV, the function of panic notifiers is not well defined. They
> do various things, for example:
>
> + on_panic_nb() flushes s390 console buffers. It looks like a
> complex and risky code. For example, __sclp_vt220_flush_buffer()
> takes spin locks and calls timer code.
>
> + dump_gisb_error() prints some info but it also communicates with
> the device, see gisb_read() and gisb_write() in
> brcmstb_gisb_arb_decode_addr(). I am not sure how reliable this is.
>
> + parisc_panic_event() re-enables the soft-power switch. I am not
> sure how safe this is.
>
> + pvpanic_panic_notify() takes spin lock and does in iowrite().
>
>
> The do more that just providing information. Some are risky. It is not
> easy to disable a particular one.

Yes, agree. Not all of them just provide information.

Now panic_notifier_filter Guilherme added can help to disable some of
them.

>
>
> > 3) kmsg_dump
> > iterating registered dumpers to collect as much kernel log as possible
> > to store. E.g on pstore, light weight, only kernel log, do not reboot,
> > log can be checked after system reboot.
>
> From my POV, it is similar functionality like the crash dump.
>
> It might make sense to allow to call kmsg_dump before panic notifiers
> to reduce the risk of a breakage. But I do not have enough experience
> with them to judge this.

Hmm, from the content it saves, it's more like the vmcore-dmesg we call
in the kdump kernel. In behaviour it's a little similar to crash dump:
the dumped file is saved and can be checked after the system reboots.
Still, it can't be compared to crash dump. To my understanding,
kmsg_dump is mostly used on embedded systems or mobile devices.

I tend to agree with you about moving kmsg_dump before the panic
notifiers, since kmsg_dump looks lighter weight than the panic
notifiers. Let's see what their users say.
>
> I can't remember any bug report in this code. I guess that only
> few people use kmsg_dump.
>
>
> > 4)console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > Flush to get the pending logbuf printed out to console.
>
> My experience is that there are many problems with consoles.
> There might be deadlocks because of internal locks. Some
> serial consoles are incredibly slow. Messages in the console
> drivers might cause infinite output. Some drivers are pretty
> complex, especially tty that is often enabled.
>
> console_flush_on_panic() makes it even worse. It resets console_sem.
> It means that it tries to use the consoles even when they are
> already in use.
>
> Note that printk() tries to show the messages on the console
> immediately. console_flush_on_panic() is just the last
> instance when all safe ways failed.

OK, then putting them at the last stage to execute sounds reasonable.

>
>
> > 5)panic_print
> > Print out more system info, including all dmesg, task states, mem
> > info, etc.
>
> This adds more information to the kernel log. They can be requested
> by kernel commandline parameter. They can be seen only on the console
> at the moment.
>
> It makes sense to allow to see them also in crash dump or kmsg_dump.

I haven't heard complaints about insufficient messages in crash dump.
Do you have any pointer or case about this?

>
> > About the change, my question is why not packing
> > console_flush_on_panic(CONSOLE_FLUSH_PENDING) into panic_print(), and
> > move panic_print() up to be above kmsg_dump().
>
> As explained above. console_flush_on_panic() is a dirty hack.
> It really has to be called at the end as the last resort.
>
>
> > And adding panic_print_sys_info(false) before kdump might not be a good
> > idea. Firstly, panicked system is very unstable, kdump need switch to
> > 2nd kernel asap w/o any extra action so that it can survive and collect
> > tons of information. Added panic_print_sys_info(false) is obviously
> > unnecessary to kdump, and increase risk to kernel switching.
>
> Yes, panic_print_sys_info() increases the risk that the crash dump
> will not succeed. But the change makes sense because:
>
> + panic_print_sys_info() does nothing by default. Most users will
> not enable it together with crash dump.

Even so, we may need to keep the code easy to maintain. We can handle
it just as the panic notifiers case is handled.

>
> + Guilherme uses crash dump only to dump the kernel log. It might
> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> is the only way to get the extra information.

Hmm, I haven't yet made clear what Guilherme really wants in his recent
posts. In this patch he wants to get the panic_print info into pstore.
He also wants to dump the kernel log poked by panic_print in the kdump
kernel. And it's very weird that people try to collect the kernel log
via the crash dump mechanism; that is obviously using a sledgehammer to
crack a nut. Sometimes we should not add or change code for a too
specific corner case.

>
> + panic_print_sys_info() might be useful even with full crash dump.
> For example, ftrace messages are not easy to read from the memory
> dump.

As said before, do you have a case report about ftrace debugging
requiring more messages in the kdump kernel? If so, I would like to see
a patch posted for that, with a patch log stating it clearly. I noticed
the patch below from Guilherme has been queued in linux-next. At least
from the commit log, I don't understand why a kernel log needs to be
collected via crash dump. Now this patch is posted, and the kernel log
needs to be collected via kmsg_dump. I really hope we can make all
things clear, and then reach a final agreement.

commit ab693ae2140afdf797cc376b3569ca9850a7681d
Author: Guilherme G. Piccoli <[email protected]>
Date: Thu Dec 30 20:29:14 2021 +1100

panic: allow printing extra panic information on kdump


In fact, my suggestion is as below. I would like to see kmsg_dump()
being moved above the panic notifiers, after Guilherme's careful
evaluation.

void panic()
{
	if (!_crash_kexec_post_notifiers && !panic_print) {
		__crash_kexec(NULL);
		smp_send_stop();
	} else {
		crash_smp_send_stop();
	}

	atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
	panic_print_sys_info(false);
	kmsg_dump(KMSG_DUMP_PANIC);
	if (_crash_kexec_post_notifiers || panic_print)
		__crash_kexec(NULL);
	...
	debug_locks_off();
	console_flush_on_panic(CONSOLE_FLUSH_PENDING);

	panic_print_sys_info(true);
	......
}

Please, don't name 'after_kmsg_dumpers', that's too nerd, bro :-)
static void panic_print_sys_info(bool console_flush)
{
	.....
}

2022-01-21 21:14:48

by Petr Mladek

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On Wed 2022-01-19 13:03:15, Guilherme G. Piccoli wrote:
> Thanks again Petr, for the deep analysis! Much appreciated.
> Some comments inline below:
>
>
> On 19/01/2022 12:48, Petr Mladek wrote:
> >[...]
> > From my POV, the function of panic notifiers is not well defined. They
> > do various things, for example:
> > [...]
> >
> > The do more that just providing information. Some are risky. It is not
> > easy to disable a particular one.
>
> We are trying to change that here:
> https://lore.kernel.org/lkml/[email protected]/
>
> Your review/comments are very welcome =)
>
>
> > [...]
> > It might make sense to allow to call kmsg_dump before panic notifiers
> > to reduce the risk of a breakage. But I do not have enough experience
> > with them to judge this.
> >
> > I can't remember any bug report in this code. I guess that only
> > few people use kmsg_dump.
>
> One of the problems doing that is that RCU and hung task detector, for
> example, have panic notifiers to disable themselves in the panic
> scenario - if we kmsg_dump() _before_ the panic notifiers, we may have
> intermixed messages, all messy...so for me it makes sense to keep the
> kmsg_dump() after panic notifiers.

It makes perfect sense to disable the watchdogs during panic().
For example, rcu_panic() just sets a variable:

static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
{
	rcu_cpu_stall_suppress = 1;
	return NOTIFY_DONE;
}

It is quick and super-safe. It might make sense to implement a similar
thing for other watchdogs and do something like:

void panic_supress_watchdogs(void)
{
	rcu_panic();
	softlockup_watchog_panic();
	wq_watchog_panic();
}

It might be called early in panic().

>
> > [...]
> > Yes, panic_print_sys_info() increases the risk that the crash dump
> > will not succeed. But the change makes sense because:
> >
> > + panic_print_sys_info() might be useful even with full crash dump.
> > For example, ftrace messages are not easy to read from the memory
> > dump.
>
> The last point is really good, I didn't consider that before but makes a
> lot of sense - we can now dump (a hopefully small!) ftrace/event trace
> buffer to dmesg before a kdump, making it pretty easy to read that later.
> Cheers,

JFYI, there is an extension for the crash tool that might be able to read
the trace log, see https://crash-utility.github.io/extensions.html

I haven't tested it myself yet.

Best Regards,
Petr

2022-01-21 22:22:27

by Guilherme G. Piccoli

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 20/01/2022 06:39, Petr Mladek wrote:
> [...]
> It makes perfect sense to disable the watchdogs during panic().
> For example, rcu_panic() just sets a variable:
>
> static int rcu_panic(struct notifier_block *this, unsigned long ev, void *ptr)
> {
> rcu_cpu_stall_suppress = 1;
> return NOTIFY_DONE;
> }
>
> It is quick and super-safe. It might make sense to implement similar
> thing for other watchdogs and do something like:
>
> void panic_supress_watchdogs(void)
> {
> rcu_panic();
> softlockup_watchog_panic();
> wq_watchog_panic();
> }
>
> It might be caller early in panic().
>

For reference, I saw your great input about this subject in another
related thread, which I think we should mention here:

https://lore.kernel.org/lkml/Yel8WQiBn%2FHNQN83@alley/


> JFYI, there is an extension for the crash tool that might be able to read
> the trace log, see https://crash-utility.github.io/extensions.html
>
> I haven't tested it myself yet.

Thanks, nice to know =)

2022-01-22 00:05:43

by Guilherme G. Piccoli

Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

Hi Baoquan, some comments inline below:


On 20/01/2022 05:51, Baoquan He wrote:
> [...]
>> From my POV, the function of panic notifiers is not well defined. They
>> do various things, for example:
>> [...]
>> The do more that just providing information. Some are risky. It is not
>> easy to disable a particular one.
>
> Yes, agree. Not all of them just provide information.
>
> Now panic_notifier_filter Guilherme added can help to disable some of
> them.

So, just for completeness, it's worth mentioning that Petr had some
interesting suggestions in the other thread (about the filter) and we
may end up not having this implemented - in other words, maybe a
refactor of that mechanism is going to be proposed.


> [...]
>>
>> + Guilherme uses crash dump only to dump the kernel log. It might
>> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
>> is the only way to get the extra information.
>
> Hmm, I haven't made clear what Guilherme really wants in his recent
> post. In this patch he wants to get panic print info into pstore. He
> also want to dump the kernel log poked by panic_print in kdump kernel.
> And it's very weird people try to collect kernel log via crash dump
> mechnism, that is obviously using a sledgehammer to crack a nut.
> Sometime, we should not add or change code to a too specific corner
> case.

OK, I'll try to be really clear and hopefully explain the use case in
better and simpler words. First of all, I wouldn't call it a corner case
- it's just a valid use case that, in my opinion, should be allowed. Why
not, right? The kernel shouldn't push policy on users; we should instead
let users decide how to use the tools/options.

So imagine you cannot collect a vmcore due to the lack of storage
space. Yet, you want as much information as possible to investigate the
cause of a panic. The kernel flag "panic_print" is the perfect fit: we
can dump backtraces, the task list, memory info...right on a panic event.
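
(Just to illustrate what such a setup could look like on the command
line - the values here are arbitrary, combining the task, memory and
all-CPU-backtrace bits, plus a bigger log buffer as the patch's
documentation suggests:

	panic_print=0x43 log_buf_len=4M

i.e. 0x43 = bit 0 + bit 1 + bit 6.)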

But then, how to save this panic log with lots of information after a
reboot? There are 2 ways in my understanding:

(a) pstore/kmsg_dump()
(b) kdump

Option (a) is easily the best - we don't need to reserve lots of
memory, boot another kernel, etc. This patch (being discussed here)
aims to enable the "panic_print" output for exactly this case!
But...there are cases in which option (a) cannot work. We need a backend
of persistent storage, either a block device or, more commonly, RAM
that is persistent across a reboot. What if it's not available?

Then we fall back to option (b) - kind of a sledgehammer, in your words
heh. It's not ideal, but it might be a last resort for users wanting to
collect the most information they can without saving a full vmcore. And
for that, we need to be able to invoke the "panic_print" function before
the __crash_kexec() call. Continued below...


> [...] I noticed
> below patch from Guilherme has been queued in linux-next. At least, from
> the commit log, I don't understand why a kernel log need be collected
> via crash dump. Now, this patch is posted, kernel log need be collected
> via kmsg_dump. Really hope we can make all things clear, then a final
> agreement is made.
>
> commit ab693ae2140afdf797cc376b3569ca9850a7681d
> Author: Guilherme G. Piccoli <[email protected]>
> Date: Thu Dec 30 20:29:14 2021 +1100
>
> panic: allow printing extra panic information on kdump
>
>
> In fact, my suggestion is as below. I would like to see kmsg_dump()
> being moved above panic_notifer after Guilherme's careful evaluation.
>
> void panic()
> {
> if (!_crash_kexec_post_notifiers && !panic_print) {
> __crash_kexec(NULL);
> smp_send_stop();
> } else {
> crash_smp_send_stop();
> }
>
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> panic_print_sys_info(false);
> kmsg_dump(KMSG_DUMP_PANIC);
> if (_crash_kexec_post_notifiers || panic_print)
> __crash_kexec(NULL);
> ...
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> panic_print_sys_info(true);
> ......
> }
>

So, your idea is good and it mostly works, except that it *requires* users to
make use of "crash_kexec_post_notifiers" in order to use "panic_print"
in case (b) discussed above.

Do you think it should be necessary?
How about we allow users to use "panic_print" with or without
"crash_kexec_post_notifiers", and then we pursue Petr's suggestion of
refactoring the panic notifiers? After this future refactor, we
might have much clearer code.


> Please, don't name 'after_kmsg_dumpers', that's too nerd, bro :-)
> static void panic_print_sys_info(bool console_flush)

Sure, I'll rename "after_kmsg_dumpers" to "console_flush" in next
iteration, although my nerd side won't be so happy ;-)

Thanks for your review/comments once more.
Cheers,


Guilherme

2022-01-22 00:31:21

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> Hi Baoquan, some comments inline below:
>
>
> On 20/01/2022 05:51, Baoquan He wrote:
> > [...]
> >> From my POV, the function of panic notifiers is not well defined. They
> >> do various things, for example:
> >> [...]
> >> The do more that just providing information. Some are risky. It is not
> >> easy to disable a particular one.
> >
> > Yes, agree. Not all of them just provide information.
> >
> > Now panic_notifier_filter Guilherme added can help to disable some of
> > them.
>
> So, just for completeness, worth to mention Petr had some interesting
> suggestions in the other thread (about the filter) and we may end-up not
> having this implemented - in other words, maybe a refactor of that
> mechanism is going to be proposed.

OK, saw that. We can continue discussing that there.

>
>
> > [...]
> >>
> >> + Guilherme uses crash dump only to dump the kernel log. It might
> >> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> >> is the only way to get the extra information.
> >
> > Hmm, I haven't made clear what Guilherme really wants in his recent
> > post. In this patch he wants to get panic print info into pstore. He
> > also want to dump the kernel log poked by panic_print in kdump kernel.
> > And it's very weird people try to collect kernel log via crash dump
> > mechnism, that is obviously using a sledgehammer to crack a nut.
> > Sometime, we should not add or change code to a too specific corner
> > case.
>
> OK, I'll try to be really clear, hopefully I can explain the use case in
> better and simpler words. First of all, I wouldn't call it a corner case
> - it's just a valid use case that, in my opinion, should be allowed. Why
> not, right? Kernel shouldn't push policy on users, we should instead let
> the users decide how to use the tools/options.

Agree, sorry about my wrong expression.

>
> So imagine you cannot collect a vmcore, due to the lack of storage
> space. Yet, you want the most information as possible to investigate the
> cause of a panic. The kernel flag "panic_print" is the perfect fit, we
> can dump backtraces, task list, memory info...right on a panic event.
>
> But then, how to save this panic log with lots of information after a
> reboot? There are 2 ways in my understanding:
>
> (a) pstore/kmsg_dump()
> (b) kdump
>
> The option (a) is easily the best - we don't need to reserve lots of
> memory, then boot another kernel, etc. This patch (being hereby
> discussed) aims to enable the "panic_print" output for this case!
> But...there are cases in which option (a) cannot work. We need a backend
> of persistent storage, either a block device or, more common, RAM memory
> that is persistent across a reboot. What if it's not available?
>
> Then, we fallback to option (b) - kind of a sledgehammer, in your words heh
> It's not ideal, but might be a last resort for users wanting to collect
> the most information they can without saving a full vmcore. And for
> that, we need to be able to invoke "panic_print" function before the
> __crash_kexec() call. Continue below...

OK, so pstore via kmsg_dump is the first option, then we fall back to kdump.
This is what I suggested below, and it is also what the panic notifier path
already does below. I think both of them are similar, so they should be
handled the same way.

void panic()
{
    if (!_crash_kexec_post_notifiers && !panic_print) {
        __crash_kexec(NULL);
        smp_send_stop();
    } else {
        crash_smp_send_stop();
    }

    atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
    panic_print_sys_info(false);
    kmsg_dump(KMSG_DUMP_PANIC);
    if (_crash_kexec_post_notifiers || panic_print)
        __crash_kexec(NULL);
    ...
    debug_locks_off();
    console_flush_on_panic(CONSOLE_FLUSH_PENDING);

    panic_print_sys_info(true);
    ......
}
> >
>
> So, your idea is good and it mostly works, except it *requires* users to
> make use of "crash_kexec_post_notifiers" in order to use "panic_print"
> in the case (b) above discussed.

I don't get it. Why does it have to *require* users to make use of
"crash_kexec_post_notifiers" in order to use "panic_print"?
To enable panic notifiers and panic_print, we need to add the below
parameters to the kernel cmdline separately.

crash_kexec_post_notifiers=1
panic_print=0x7f

With the above code, we have:
1) Nothing specified in the cmdline, only kdump enabled:
   crash dump will work to get the vmcore.
2) crash_kexec_post_notifiers=1, kdump enabled:
   panic_notifiers are executed, then crash dump.
3) panic_print=0x7f, kdump enabled:
   panic_print gets the system info printed, then crash dump.
4) crash_kexec_post_notifiers=1 panic_print=0x7f, kdump enabled:
   panic_notifiers are executed first, then panic_print, and at last crash dump.

Here I don't list the cases where kdump is not enabled. Please help point out
if I misunderstood anything.
>
> Do you think it should be necessary?
> How about if we allow users to just "panic_print" with or without the
> "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> refactoring the panic notifiers? So, after this future refactor, we
> might have a much clear code.

I haven't read Petr's reply in the other panic notifier filter thread yet. The
panic notifiers are only enforced on the Hyper-V platform; except for
that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
them. And we got a bug report on the Hyper-V issue. In our internal discussion,
we strongly suggested that the Hyper-V devs change the default enablement and
instead leave it to the user to decide.

> > Please, don't name 'after_kmsg_dumpers', that's too nerd, bro :-)
> > static void panic_print_sys_info(bool console_flush)
>
> Sure, I'll rename "after_kmsg_dumpers" to "console_flush" in next
> iteration, although my nerd side won't be so happy ;-)

No offence at all. My wife always calls me a nerd. Sorry about that.

2022-01-22 01:32:16

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

Hi Baoquan, thanks again for your prompt reply!
Comments inline below:


On 20/01/2022 23:31, Baoquan He wrote:
> [...]
>> OK, I'll try to be really clear, hopefully I can explain the use case in
>> better and simpler words. First of all, I wouldn't call it a corner case
>> - it's just a valid use case that, in my opinion, should be allowed. Why
>> not, right? Kernel shouldn't push policy on users, we should instead let
>> the users decide how to use the tools/options.
>
> Agree, sorry about my wrong expression.

No need to be sorry at all! And if you indeed consider that a corner
case, feel free to express that and we should take it into account =)
Your opinion is much appreciated!


>
> OK, pstore via kmsg_dump is first option, then fallback to kdump.
> This is what I suggested at below. This is what panic notifier has done
> at below. I think both of them are similar, thus should take the same
> way to handle.
>
> void panic()
> {
>1 if (!_crash_kexec_post_notifiers && !panic_print) {
>2 __crash_kexec(NULL);
>3 smp_send_stop();
>4 } else {
>5 crash_smp_send_stop();
>6 }
>
>8 atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
>9 panic_print_sys_info(false);
>10 kmsg_dump(KMSG_DUMP_PANIC);
>11 if (_crash_kexec_post_notifiers || panic_print)
>12 __crash_kexec(NULL);
> ...
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> panic_print_sys_info(true);
> ......
> }
>[...]
> I don't get. Why it has to *require* users to make use of
> "crash_kexec_post_notifiers" in order to use "panic_print"?
> To enable panic notifiers and panic_print, we need add below parameter
> to kernel cmdline separately.
>
> crash_kexec_post_notifiers=1
> panic_print=0x7f
>
> With above code, we have:
> 1) None specified in cmdline, only kdump enabled.
> Crash dump will work to get vmcore.
> 2) crash_kexec_post_notifiers=1 , kdump enabled
> panic_notifers are executed, then crash dump
> 3) panic_print=0x7f, kdump enabled,
> Panic_print get system info printed, then crash dump
> 4) crash_kexec_post_notifiers=1 panic_print=0x7f, kdump enabled
> panic_notifers are executed firstly, then panic_print, at last crash dump
>
> Here I don't list the no kdump enabled case. Please help point out if I
> misunderstood anything.

OK, this is a really great summary of the possible cases, thanks
for that. I might be wrong here - this code is a bit confusing for
me... so I put line numbers in your code so we can discuss based on them.

Case (1) - Line L2 is reached and we jump to the kdump kernel, right?
Case (2) - Line L5 and lines L8->L12 are executed, correct?

Case (3) - I don't understand this case! If kdump is enabled and
panic_print as well, do we execute line L2? If that's not the case,
then we jump to the kdump kernel at line L12, but that means L8 - the
notifiers list - was executed. Right?

So, how is it possible in your code to execute
"panic_print_sys_info(false);" and then jump to kdump *without* reaching L8?

I apologize in advance if I'm silly and it's obvious - I guess I won't
get the C-programmer-prize of the year anyway heheh


>> Sure, I'll rename "after_kmsg_dumpers" to "console_flush" in next
>> iteration, although my nerd side won't be so happy ;-)
>
> No offence at all. My wife always call me nerd. Sorry about that.

No offense taken, and no need to be sorry - we're cool!
I got the joke =D

And the variable name suggestion was indeed good.

2022-01-22 01:50:49

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V3] panic: Move panic_print before kmsg dumpers

From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
>
> On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > Hi Baoquan, some comments inline below:
> >
> >
> > On 20/01/2022 05:51, Baoquan He wrote:
> > > [...]
> > >> From my POV, the function of panic notifiers is not well defined. They
> > >> do various things, for example:
> > >> [...]
> > >> The do more that just providing information. Some are risky. It is not
> > >> easy to disable a particular one.
> > >
> > > Yes, agree. Not all of them just provide information.
> > >
> > > Now panic_notifier_filter Guilherme added can help to disable some of
> > > them.
> >
> > So, just for completeness, worth to mention Petr had some interesting
> > suggestions in the other thread (about the filter) and we may end-up not
> > having this implemented - in other words, maybe a refactor of that
> > mechanism is going to be proposed.
>
> OK, saw that. We can continue discuss that there.
>
> >
> >
> > > [...]
> > >>
> > >> + Guilherme uses crash dump only to dump the kernel log. It might
> > >> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> > >> is the only way to get the extra information.
> > >
> > > Hmm, I haven't made clear what Guilherme really wants in his recent
> > > post. In this patch he wants to get panic print info into pstore. He
> > > also want to dump the kernel log poked by panic_print in kdump kernel.
> > > And it's very weird people try to collect kernel log via crash dump
> > > mechnism, that is obviously using a sledgehammer to crack a nut.
> > > Sometime, we should not add or change code to a too specific corner
> > > case.
> >
> > OK, I'll try to be really clear, hopefully I can explain the use case in
> > better and simpler words. First of all, I wouldn't call it a corner case
> > - it's just a valid use case that, in my opinion, should be allowed. Why
> > not, right? Kernel shouldn't push policy on users, we should instead let
> > the users decide how to use the tools/options.
>
> Agree, sorry about my wrong expression.
>
> >
> > So imagine you cannot collect a vmcore, due to the lack of storage
> > space. Yet, you want the most information as possible to investigate the
> > cause of a panic. The kernel flag "panic_print" is the perfect fit, we
> > can dump backtraces, task list, memory info...right on a panic event.
> >
> > But then, how to save this panic log with lots of information after a
> > reboot? There are 2 ways in my understanding:
> >
> > (a) pstore/kmsg_dump()
> > (b) kdump
> >
> > The option (a) is easily the best - we don't need to reserve lots of
> > memory, then boot another kernel, etc. This patch (being hereby
> > discussed) aims to enable the "panic_print" output for this case!
> > But...there are cases in which option (a) cannot work. We need a backend
> > of persistent storage, either a block device or, more common, RAM memory
> > that is persistent across a reboot. What if it's not available?
> >
> > Then, we fallback to option (b) - kind of a sledgehammer, in your words heh
> > It's not ideal, but might be a last resort for users wanting to collect
> > the most information they can without saving a full vmcore. And for
> > that, we need to be able to invoke "panic_print" function before the
> > __crash_kexec() call. Continue below...
>
> OK, pstore via kmsg_dump is first option, then fallback to kdump.
> This is what I suggested at below. This is what panic notifier has done
> at below. I think both of them are similar, thus should take the same
> way to handle.
>
> void panic()
> {
> if (!_crash_kexec_post_notifiers && !panic_print) {
> __crash_kexec(NULL);
> smp_send_stop();
> } else {
> crash_smp_send_stop();
> }
>
> atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> panic_print_sys_info(false);
> kmsg_dump(KMSG_DUMP_PANIC);
> if (_crash_kexec_post_notifiers || panic_print)
> __crash_kexec(NULL);
> ...
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>
> panic_print_sys_info(true);
> ......
> }
> > >
> >
> > So, your idea is good and it mostly works, except it *requires* users to
> > make use of "crash_kexec_post_notifiers" in order to use "panic_print"
> > in the case (b) above discussed.
>
> I don't get. Why it has to *require* users to make use of
> "crash_kexec_post_notifiers" in order to use "panic_print"?
> To enable panic notifiers and panic_print, we need add below parameter
> to kernel cmdline separately.
>
> crash_kexec_post_notifiers=1
> panic_print=0x7f
>
> With above code, we have:
> 1) None specified in cmdline, only kdump enabled.
> Crash dump will work to get vmcore.
> 2) crash_kexec_post_notifiers=1 , kdump enabled
> panic_notifers are executed, then crash dump
> 3) panic_print=0x7f, kdump enabled,
> Panic_print get system info printed, then crash dump
> 4) crash_kexec_post_notifiers=1 panic_print=0x7f, kdump enabled
> panic_notifers are executed firstly, then panic_print, at last crash dump
>
> Here I don't list the no kdump enabled case. Please help point out if I
> misunderstood anything.
> >
> > Do you think it should be necessary?
> > How about if we allow users to just "panic_print" with or without the
> > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > refactoring the panic notifiers? So, after this future refactor, we
> > might have a much clear code.
>
> I haven't read Petr's reply in another panic notifier filter thread. For
> panic notifier, it's only enforced to use on HyperV platform, excepto of
> that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> it. And we got bug report on the HyperV issue. In our internal discussion,
> we strongly suggest HyperV dev to change the default enablement, instead
> leave it to user to decide.
>

Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
kdump kernel is necessary for correctness. During initial boot of the
main kernel, the Hyper-V and VMbus code in Linux sets up several guest
physical memory pages that are shared with Hyper-V, and that Hyper-V
may write to. A VMbus connection is also established. Before kexec'ing
into the kdump kernel, the sharing of these pages must be rescinded
and the VMbus connection must be terminated. If this isn't done, the
kdump kernel will see strange memory overwrites if these shared guest
physical memory pages get used for something else.

I hope we've found and fixed all the problems where the Hyper-V
notifier could get hung. Unfortunately, the Hyper-V interfaces were
designed long ago without the Linux kexec scenario in mind, and they
don't provide a simple way to reset everything except by doing a
reboot that goes back through the virtual BIOS/UEFI. So the Hyper-V
notifier code is more complicated than would be desirable, and in
particular, terminating the VMbus connection is tricky.

This has been an evolving area of understanding. It's only been the last
couple of years that we've fully understood the implications of these
shared memory pages on the kexec/kdump scenario and what it takes
to reset everything so the kexec'ed kernel will work.
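
For anyone following along who hasn't looked at that code: the notifier in
question is simply an entry on panic_notifier_list. A minimal sketch of how
such a hook is wired up (made-up names, not the actual Hyper-V callback)
would be:

    #include <linux/init.h>
    #include <linux/notifier.h>
    #include <linux/panic_notifier.h>

    static int hv_like_panic_cb(struct notifier_block *nb,
                                unsigned long action, void *data)
    {
            /* tear down host-visible state (shared pages, VMbus, ...) */
            return NOTIFY_DONE;
    }

    static struct notifier_block hv_like_panic_nb = {
            .notifier_call = hv_like_panic_cb,
    };

    static int __init hv_like_init(void)
    {
            /* runs on panic(), before or after kdump depending on setup */
            atomic_notifier_chain_register(&panic_notifier_list,
                                           &hv_like_panic_nb);
            return 0;
    }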

Michael

>
> > > Please, don't name 'after_kmsg_dumpers', that's too nerd, bro :-)
> > > static void panic_print_sys_info(bool console_flush)
> >
> > Sure, I'll rename "after_kmsg_dumpers" to "console_flush" in next
> > iteration, although my nerd side won't be so happy ;-)
>
> No offence at all. My wife always call me nerd. Sorry about that.

2022-01-23 00:16:31

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> >
> > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > Hi Baoquan, some comments inline below:
> > >
> > >
> > > On 20/01/2022 05:51, Baoquan He wrote:
> > > > [...]
> > > >> From my POV, the function of panic notifiers is not well defined. They
> > > >> do various things, for example:
> > > >> [...]
> > > >> The do more that just providing information. Some are risky. It is not
> > > >> easy to disable a particular one.
> > > >
> > > > Yes, agree. Not all of them just provide information.
> > > >
> > > > Now panic_notifier_filter Guilherme added can help to disable some of
> > > > them.
> > >
> > > So, just for completeness, worth to mention Petr had some interesting
> > > suggestions in the other thread (about the filter) and we may end-up not
> > > having this implemented - in other words, maybe a refactor of that
> > > mechanism is going to be proposed.
> >
> > OK, saw that. We can continue discuss that there.
> >
> > >
> > >
> > > > [...]
> > > >>
> > > >> + Guilherme uses crash dump only to dump the kernel log. It might
> > > >> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> > > >> is the only way to get the extra information.
> > > >
> > > > Hmm, I haven't made clear what Guilherme really wants in his recent
> > > > post. In this patch he wants to get panic print info into pstore. He
> > > > also want to dump the kernel log poked by panic_print in kdump kernel.
> > > > And it's very weird people try to collect kernel log via crash dump
> > > > mechnism, that is obviously using a sledgehammer to crack a nut.
> > > > Sometime, we should not add or change code to a too specific corner
> > > > case.
> > >
> > > OK, I'll try to be really clear, hopefully I can explain the use case in
> > > better and simpler words. First of all, I wouldn't call it a corner case
> > > - it's just a valid use case that, in my opinion, should be allowed. Why
> > > not, right? Kernel shouldn't push policy on users, we should instead let
> > > the users decide how to use the tools/options.
> >
> > Agree, sorry about my wrong expression.
> >
> > >
> > > So imagine you cannot collect a vmcore, due to the lack of storage
> > > space. Yet, you want the most information as possible to investigate the
> > > cause of a panic. The kernel flag "panic_print" is the perfect fit, we
> > > can dump backtraces, task list, memory info...right on a panic event.
> > >
> > > But then, how to save this panic log with lots of information after a
> > > reboot? There are 2 ways in my understanding:
> > >
> > > (a) pstore/kmsg_dump()
> > > (b) kdump
> > >
> > > The option (a) is easily the best - we don't need to reserve lots of
> > > memory, then boot another kernel, etc. This patch (being hereby
> > > discussed) aims to enable the "panic_print" output for this case!
> > > But...there are cases in which option (a) cannot work. We need a backend
> > > of persistent storage, either a block device or, more common, RAM memory
> > > that is persistent across a reboot. What if it's not available?
> > >
> > > Then, we fallback to option (b) - kind of a sledgehammer, in your words heh
> > > It's not ideal, but might be a last resort for users wanting to collect
> > > the most information they can without saving a full vmcore. And for
> > > that, we need to be able to invoke "panic_print" function before the
> > > __crash_kexec() call. Continue below...
> >
> > OK, pstore via kmsg_dump is first option, then fallback to kdump.
> > This is what I suggested at below. This is what panic notifier has done
> > at below. I think both of them are similar, thus should take the same
> > way to handle.
> >
> > void panic()
> > {
> > if (!_crash_kexec_post_notifiers && !panic_print) {
> > __crash_kexec(NULL);
> > smp_send_stop();
> > } else {
> > crash_smp_send_stop();
> > }
> >
> > atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> > panic_print_sys_info(false);
> > kmsg_dump(KMSG_DUMP_PANIC);
> > if (_crash_kexec_post_notifiers || panic_print)
> > __crash_kexec(NULL);
> > ...
> > debug_locks_off();
> > console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> >
> > panic_print_sys_info(true);
> > ......
> > }
> > > >
> > >
> > > So, your idea is good and it mostly works, except it *requires* users to
> > > make use of "crash_kexec_post_notifiers" in order to use "panic_print"
> > > in the case (b) above discussed.
> >
> > I don't get. Why it has to *require* users to make use of
> > "crash_kexec_post_notifiers" in order to use "panic_print"?
> > To enable panic notifiers and panic_print, we need add below parameter
> > to kernel cmdline separately.
> >
> > crash_kexec_post_notifiers=1
> > panic_print=0x7f
> >
> > With above code, we have:
> > 1) None specified in cmdline, only kdump enabled.
> > Crash dump will work to get vmcore.
> > 2) crash_kexec_post_notifiers=1 , kdump enabled
> > panic_notifers are executed, then crash dump
> > 3) panic_print=0x7f, kdump enabled,
> > Panic_print get system info printed, then crash dump
> > 4) crash_kexec_post_notifiers=1 panic_print=0x7f, kdump enabled
> > panic_notifers are executed firstly, then panic_print, at last crash dump
> >
> > Here I don't list the no kdump enabled case. Please help point out if I
> > misunderstood anything.
> > >
> > > Do you think it should be necessary?
> > > How about if we allow users to just "panic_print" with or without the
> > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > refactoring the panic notifiers? So, after this future refactor, we
> > > might have a much clear code.
> >
> > I haven't read Petr's reply in another panic notifier filter thread. For
> > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > it. And we got bug report on the HyperV issue. In our internal discussion,
> > we strongly suggest HyperV dev to change the default enablement, instead
> > leave it to user to decide.
> >
>
> Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> kdump kernel is necessary for correctness. During initial boot of the
> main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> physical memory pages that are shared with Hyper-V, and that Hyper-V
> may write to. A VMbus connection is also established. Before kexec'ing
> into the kdump kernel, the sharing of these pages must be rescinded
> and the VMbus connection must be terminated. If this isn't done, the
> kdump kernel will see strange memory overwrites if these shared guest
> physical memory pages get used for something else.
>
> I hope we've found and fixed all the problems where the Hyper-V
> notifier could get hung. Unfortunately, the Hyper-V interfaces were
> designed long ago without the Linux kexec scenario in mind, and they
> don't provide a simple way to reset everything except by doing a
> reboot that goes back through the virtual BIOS/UEFI. So the Hyper-V
> notifier code is more complicated than would be desirable, and in
> particular, terminating the VMbus connection is tricky.
>
> This has been an evolving area of understanding. It's only been the last
> couple of years that we've fully understood the implications of these
> shared memory pages on the kexec/kdump scenario and what it takes
> to reset everything so the kexec'ed kernel will work.

Glad to know these background details, thanks, Michael. From the
commit which introduced it and the code comment above the code, I thought
Hyper-V wanted to collect data before the crash dump. If that is true,
it might be helpful to add these details to the commit log or as a code
comment; that would also help defend the change when people question it.

int __init hv_common_init(void)
{
    int i;

    /*
     * Hyper-V expects to get crash register data or kmsg when
     * crash enlightment is available and system crashes. Set
     * crash_kexec_post_notifiers to be true to make sure that
     * calling crash enlightment interface before running kdump
     * kernel.
     */
    if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
        crash_kexec_post_notifiers = true;

    ......
}

2022-01-23 15:03:37

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/21/22 at 10:17am, Guilherme G. Piccoli wrote:
> Hi Baoquan , thanks again for your prompt reply!
> Comments inline below:
>
>
> On 20/01/2022 23:31, Baoquan He wrote:
> > [...]
> >> OK, I'll try to be really clear, hopefully I can explain the use case in
> >> better and simpler words. First of all, I wouldn't call it a corner case
> >> - it's just a valid use case that, in my opinion, should be allowed. Why
> >> not, right? Kernel shouldn't push policy on users, we should instead let
> >> the users decide how to use the tools/options.
> >
> > Agree, sorry about my wrong expression.
>
> No need to be sorry at all! And if you indeed consider that a corner
> case, feel free to express that and we should take it into account =)
> Your opinion is much appreciated!

From my old POV, I took pstore as a necessity on handheld devices or
embedded systems, e.g. on Android. In that case, reserving crashkernel
memory to enable kdump just to save the kernel log does not sound very
cost-effective, since memory on those systems is usually not big.
I am also interested in any new use case where people deploy these
and why it's needed, to widen my view.
>
> >
> > OK, pstore via kmsg_dump is first option, then fallback to kdump.
> > This is what I suggested at below. This is what panic notifier has done
> > at below. I think both of them are similar, thus should take the same
> > way to handle.
> >
> > void panic()
> > {
> >1 if (!_crash_kexec_post_notifiers && !panic_print) {
> >2 __crash_kexec(NULL);
> >3 smp_send_stop();
> >4 } else {
> >5 crash_smp_send_stop();
> >6 }
> >
> >8 atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> >9 panic_print_sys_info(false);
> >10 kmsg_dump(KMSG_DUMP_PANIC);
> >11 if (_crash_kexec_post_notifiers || panic_print)
> >12 __crash_kexec(NULL);
> > ...
> > debug_locks_off();
> > console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> >
> > panic_print_sys_info(true);
> > ......
> > }
> >[...]
> > I don't get. Why it has to *require* users to make use of
> > "crash_kexec_post_notifiers" in order to use "panic_print"?
> > To enable panic notifiers and panic_print, we need add below parameter
> > to kernel cmdline separately.
> >
> > crash_kexec_post_notifiers=1
> > panic_print=0x7f
> >
> > With above code, we have:
> > 1) None specified in cmdline, only kdump enabled.
> > Crash dump will work to get vmcore.
> > 2) crash_kexec_post_notifiers=1 , kdump enabled
> > panic_notifers are executed, then crash dump
> > 3) panic_print=0x7f, kdump enabled,
> > Panic_print get system info printed, then crash dump
> > 4) crash_kexec_post_notifiers=1 panic_print=0x7f, kdump enabled
> > panic_notifers are executed firstly, then panic_print, at last crash dump
> >
> > Here I don't list the no kdump enabled case. Please help point out if I
> > misunderstood anything.
>
> OK, this is a really great summary list of the possible cases, thanks
> for that. I might be wrong here, this code is a bit confusing for
> me...so I put line numbers in your code and we can discuss based on that.
>
> Case (1) - Line L2 is reached, we jump to the kdump kernel, right?
> Case (2) - Line L5 and lines L8->L12 executed, correct?
>
> Case (3) - I don't understand this case! If kdump is enabled and
> panic_print as well, we execute Line L2 right? If that's not the case,
> then we jump to kdump kernel at line L12, but that means L8 was
> executed, the notifiers list. Right?
>
> So, how is it possible in your code to execute
> "panic_print_sys_info(false);" and then jump to kdump *without* reaching L8?
>
> I apologize in advance if I'm silly and it's obvious - I guess I won't
> get the C-programmer-prize of the year anyway heheh

It's my bad. My thought is that panic_print and kmsg_dump can be coupled, but
they should be decoupled from panic_notifier: when panic_print is enabled,
we do not necessarily expect to execute the panic notifiers. My personal opinion.

I missed the change at line 8, sorry for the misunderstanding it caused.
Now the chance of holding the C-programmer-prize of the year comes back
again.

void panic()
{
1   if (!_crash_kexec_post_notifiers && !panic_print) {
2       __crash_kexec(NULL);
3       smp_send_stop();
4   } else {
5       crash_smp_send_stop();
6   }

    if (_crash_kexec_post_notifiers)
8       atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
9   panic_print_sys_info(false);
10  kmsg_dump(KMSG_DUMP_PANIC);
11  if (_crash_kexec_post_notifiers || panic_print)
12      __crash_kexec(NULL);
    ...
    debug_locks_off();
    console_flush_on_panic(CONSOLE_FLUSH_PENDING);

    panic_print_sys_info(true);
    ......
}

>
>
> >> Sure, I'll rename "after_kmsg_dumpers" to "console_flush" in next
> >> iteration, although my nerd side won't be so happy ;-)
> >
> > No offence at all. My wife always call me nerd. Sorry about that.
>
> No offense taken, and no need to be sorry - we're cool!
> I got the joke =D
>
> And the variable name suggestion was indeed good.
>

2022-01-23 15:14:37

by Guilherme G. Piccoli

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 22/01/2022 07:31, Baoquan He wrote:
> [...]
> From my old POV, I took pstore as a necessity on handheld devices or
> embeded system, e.g on Andriod. In that case, reserving crashkernel
> memory to enable kdump to save kernel log, it sounds not so
> cost-effective, since memory on those systems is usually not big.
> I am also interested in any new use case where people deploy these
> and why it's needed, to widen my view.

Hi Baoquan, that's great to hear. Indeed, I feel pstore is unfortunately
not widely used on non-embedded devices - if you look at the kdump/error-report
userspace tooling, like on Red Hat/Fedora, Debian/Ubuntu and so on, it
never relies on pstore. And the configuration is not straightforward for
users... I think that's a good thing to change, since pstore consumes far
fewer resources than kdump.
But of course, this is not a discussion related to this patch specifically,
just me thinking out loud heh
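
(For reference, and assuming a ramoops backend, after the reboot the records
usually show up under the pstore filesystem, roughly like this - exact file
names depend on the backend:

    # mount -t pstore pstore /sys/fs/pstore    # often done automatically
    # ls /sys/fs/pstore
    dmesg-ramoops-0
    # cat /sys/fs/pstore/dmesg-ramoops-0       # panic-time dmesg, panic_print included
)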


> [...]
> It's my bad. My thought is panic_print and kmsg_dump can be coupled, but
> they should decouple with panic_notifier. When panic_print is enabled,
> we do not expect to execute panic_notifier? My personal opinion.
>
> I missed the change at line 8, sorry for the caused misunderstanding.
> Now the chance of holding C-programmer-prize of the year comes back
> again.
>
> void panic()
> {
> 1 if (!_crash_kexec_post_notifiers && !panic_print) {
> 2 __crash_kexec(NULL);
> 3 smp_send_stop();
> 4 } else {
> 5 crash_smp_send_stop();
> 6 }
>
> if (_crash_kexec_post_notifiers)
> 8 atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> 9 panic_print_sys_info(false);
> 10 kmsg_dump(KMSG_DUMP_PANIC);
> 11 if (_crash_kexec_post_notifiers || panic_print)
> 12 __crash_kexec(NULL);
> ...
> debug_locks_off();
> console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> panic_print_sys_info(true);

Hmm, yeah, I still don't think I'm a brilliant C programmer heh
Again, in the code above, I can't see how we would reach
"__crash_kexec(NULL)" after printing the extra info of panic_print if
we don't have the panic notifiers enabled.

So, indeed the code currently doesn't really tightly couple "panic_print"
with the panic notifiers. We could change that in another patch series,
based on what Petr suggested in the filter thread (I know you're
following there as well, thanks by the way!), but for now they are
completely independent. My plan, following Petr's suggestions here and if
you agree, is to re-submit this patch with some changes, but in the end
the code will allow users that have kdump enabled + panic_print (without
"crash_kexec_post_notifiers") to have "panic_print_sys_info(false)"
executed before the "__crash_kexec(NULL)" call.

But also, we can add "crash_kexec_post_notifiers" and it will still
work; finally, pstore is going to be able to collect the logs from
"panic_print" as well (the main purpose of this patch).

Once that's all resolved, my goal is to jump into the panic notifiers
refactor suggested in the other thread. Let me know if you agree with
these steps/plans, and I'll work on them.
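
Roughly, the ordering I have in mind is something like the sketch below
(simplified pseudo-code; the exact guards and placement still need to be
confirmed in v4):

    /* early print so kdump's dmesg carries the panic_print info */
    if (kexec_crash_loaded())
        panic_print_sys_info(false);

    if (!_crash_kexec_post_notifiers)
        __crash_kexec(NULL);
    ...
    atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

    /* avoid printing the same info twice */
    if (!kexec_crash_loaded())
        panic_print_sys_info(false);

    kmsg_dump(KMSG_DUMP_PANIC);        /* pstore now sees the info too */

    if (_crash_kexec_post_notifiers)
        __crash_kexec(NULL);
    ...
    console_flush_on_panic(CONSOLE_FLUSH_PENDING);
    panic_print_sys_info(true);        /* console replay (bit 5) only */
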
Cheers,


Guilherme

2022-01-24 19:38:09

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V3] panic: Move panic_print before kmsg dumpers

From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
>
> On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > >
> > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > Hi Baoquan, some comments inline below:
> > > >
> > > > On 20/01/2022 05:51, Baoquan He wrote:

[snip]

> > > > Do you think it should be necessary?
> > > > How about if we allow users to just "panic_print" with or without the
> > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > might have a much clear code.
> > >
> > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > we strongly suggest HyperV dev to change the default enablement, instead
> > > leave it to user to decide.
> > >
> >
> > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > kdump kernel is necessary for correctness. During initial boot of the
> > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > may write to. A VMbus connection is also established. Before kexec'ing
> > into the kdump kernel, the sharing of these pages must be rescinded
> > and the VMbus connection must be terminated. If this isn't done, the
> > kdump kernel will see strange memory overwrites if these shared guest
> > physical memory pages get used for something else.
> >
> > I hope we've found and fixed all the problems where the Hyper-V
> > notifier could get hung. Unfortunately, the Hyper-V interfaces were
> > designed long ago without the Linux kexec scenario in mind, and they
> > don't provide a simple way to reset everything except by doing a
> > reboot that goes back through the virtual BIOS/UEFI. So the Hyper-V
> > notifier code is more complicated than would be desirable, and in
> > particular, terminating the VMbus connection is tricky.
> >
> > This has been an evolving area of understanding. It's only been the last
> > couple of years that we've fully understood the implications of these
> > shared memory pages on the kexec/kdump scenario and what it takes
> > to reset everything so the kexec'ed kernel will work.
>
> Glad to know these background details, thx, Michael. While from the
> commit which introduced it and the code comment above code, I thought
> Hyper-V wants to collect data before crash dump. If this is the true,
> it might be helpful to add these in commit log or add as code comment,
> and also help to defend you when people question it.
>
> int __init hv_common_init(void)
> {
> int i;
>
> /*
> * Hyper-V expects to get crash register data or kmsg when
> * crash enlightment is available and system crashes. Set
> * crash_kexec_post_notifiers to be true to make sure that
> * calling crash enlightment interface before running kdump
> * kernel.
> */
> if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
> crash_kexec_post_notifiers = true;
>
> ......
> }

In the Azure cloud, collecting data before crash dumps is a motivation
as well for setting crash_kexec_post_notifiers to true. That way as
cloud operator we can see broad failure trends, and in specific cases
customers often expect the cloud operator to be able to provide info
about a problem even if they have taken a kdump. Where did you
envision adding a comment in the code to help clarify these intentions?

I looked at the code again, and should revise my previous comments
somewhat. The Hyper-V resets that I described indeed must be done
prior to kexec'ing the kdump kernel. Most such resets are actually
done via __crash_kexec() -> machine_crash_shutdown(), not via the
panic notifier. However, the Hyper-V panic notifier must terminate the
VMbus connection, because that must be done even if kdump is not
being invoked. See commit 74347a99e73.

Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
were probably due to the machine_crash_shutdown() path, and not due
to running the panic notifiers prior to kexec'ing the kdump kernel. The
exception is terminating the VMbus connection, which had problems that
are hopefully now fixed because of adding a timeout.

Michael

2022-01-26 17:38:54

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/22/22 at 10:49am, Guilherme G. Piccoli wrote:
> On 22/01/2022 07:31, Baoquan He wrote:
> > [...]
> > From my old POV, I took pstore as a necessity on handheld devices or
> > embeded system, e.g on Andriod. In that case, reserving crashkernel
> > memory to enable kdump to save kernel log, it sounds not so
> > cost-effective, since memory on those systems is usually not big.
> > I am also interested in any new use case where people deploy these
> > and why it's needed, to widen my view.
>
> Hi Baoquan, that's great to hear. Indeed, I feel pstore is unfortunately
> not very used in non-embedded devices - if you see kdump/error-report
> userspace tooling, like on Red Hat/Fedora, Debian/Ubuntu and so on, they
> never rely on pstore. And the configuration is not straightforward for
> the users...I think that's a good thing to change, since pstore is much
> less resource consuming than kdump.
> But of course, not a discussion related to this patch specifically, just
> me thinking out loud heh
>
>
> > [...]
> > It's my bad. My thought is panic_print and kmsg_dump can be coupled, but
> > they should decouple with panic_notifier. When panic_print is enabled,
> > we do not expect to execute panic_notifier? My personal opinion.
> >
> > I missed the change at line 8, sorry for the caused misunderstanding.
> > Now the chance of holding C-programmer-prize of the year comes back
> > again.
> >
> > void panic()
> > {
> > 1 if (!_crash_kexec_post_notifiers && !panic_print) {
> > 2 __crash_kexec(NULL);
> > 3 smp_send_stop();
> > 4 } else {
> > 5 crash_smp_send_stop();
> > 6 }
> >
> > if (_crash_kexec_post_notifiers)
> > 8 atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
> > 9 panic_print_sys_info(false);
> > 10 kmsg_dump(KMSG_DUMP_PANIC);
> > 11 if (_crash_kexec_post_notifiers || panic_print)
> > 12 __crash_kexec(NULL);
> > ...
> > debug_locks_off();
> > console_flush_on_panic(CONSOLE_FLUSH_PENDING);
> > panic_print_sys_info(true);
>
> Hmm, yeah, I still don't think I'm a brilliant C programmer heh
> Again, in the code above, I can't see how we would reach
> "__crash_kexec(NULL)" after printing the extra info of panic_print, if
> we don't have panic notifiers enabled.

Missed this one.

The above code will allow either of _crash_kexec_post_notifiers and panic_print
to take that path and then reach the crash dump at L11:
L5 -> L11

Since you have posted v4, let's ignore it anyway.

>
> So, indeed the code currently don't really tightly couple "panic_print"
> with the panic notifiers. We could change that in another patch series,
> based on what Petr suggested in the filter thread (I know you're
> following there as well, thanks bu the way!), but for now, they are
> completely independent. My plan, following Petr suggestions here and if
> you agree, is to re-submit this patch with some changes, but in the end
> the code will allow users that have kdump enabled + panic_print
> -"crash_kexec_post_notifiers" to have "panic_print_sys_info(false)"
> executing before the "__crash_kexec(NULL)".
>
> But also, we can add "crash_kexec_post_notifiers" and it will still
> work; finally, pstore is gonna be able to collect the logs from
> "panic_print" as well (the main purpose of this patch).
>
> Once that's all resolved, my goal is to jump into the panic notifiers
> refactor suggested in the other thread. Let me know if you agree with
> these steps/plans, and I'll work them.

I am glad to see any improvement coming from the refactoring. As for the panic
notifiers, I have expressed my concern and worry about the plan. So, as long as
no new action is added before the kdump switch, it's welcome.

2022-01-26 21:14:17

by Petr Mladek

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> >
> > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > >
> > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > Hi Baoquan, some comments inline below:
> > > > >
> > > > > On 20/01/2022 05:51, Baoquan He wrote:
>
> [snip]
>
> > > > > Do you think it should be necessary?
> > > > > How about if we allow users to just "panic_print" with or without the
> > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > might have a much clear code.
> > > >
> > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > leave it to user to decide.
> > > >
> > >
> > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > kdump kernel is necessary for correctness. During initial boot of the
> > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > may write to. A VMbus connection is also established. Before kexec'ing
> > > into the kdump kernel, the sharing of these pages must be rescinded
> > > and the VMbus connection must be terminated. If this isn't done, the
> > > kdump kernel will see strange memory overwrites if these shared guest
> > > physical memory pages get used for something else.
>
> In the Azure cloud, collecting data before crash dumps is a motivation
> as well for setting crash_kexec_post_notifiers to true. That way as
> cloud operator we can see broad failure trends, and in specific cases
> customers often expect the cloud operator to be able to provide info
> about a problem even if they have taken a kdump. Where did you
> envision adding a comment in the code to help clarify these intentions?
>
> I looked at the code again, and should revise my previous comments
> somewhat. The Hyper-V resets that I described indeed must be done
> prior to kexec'ing the kdump kernel. Most such resets are actually
> done via __crash_kexec() -> machine_crash_shutdown(), not via the
> panic notifier. However, the Hyper-V panic notifier must terminate the
> VMbus connection, because that must be done even if kdump is not
> being invoked. See commit 74347a99e73.
>
> Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> were probably due to the machine_crash_shutdown() path, and not due
> to running the panic notifiers prior to kexec'ing the kdump kernel. The
> exception is terminating the VMbus connection, which had problems that
> are hopefully now fixed because of adding a timeout.

My understanding is that we could split the actions into three groups:

1. Actions that have to be done before kexec'ing the kdump kernel, like
resetting the physical memory shared with Hyper-V.

These operation(s) are needed only for kexec and can be done
in the kexec path.


2. Notify Hyper-V so that, for example, the Azure cloud could collect
data before the crash dump.

It is nice to have.

It should be configurable if it is not completely safe. I mean
that there should be a way to disable it when it might increase
the risk that kexec'ing the kdump kernel fails.


3. Some actions are needed only when panic() ends up in the
infinite loop.

For example, unloading the vmbus channel. At least commit
74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
hv panic callback") says that it is done in the kdump path
out of the box.

All these operations are needed and used only when the kernel is
running under Hyper-V.

My main intention is to understand whether we need 2 or 3 notifier lists
or whether the current one is enough.

The 3 notifier lists would be (see the rough sketch below):

+ always do (even before kdump)
+ optionally do before or after kdump
+ needed only when kdump is not called
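
Just to illustrate the split, with completely made-up names:

    /* 1: must always run, even before kexec'ing the kdump kernel */
    ATOMIC_NOTIFIER_HEAD(panic_pre_kdump_list);

    /* 2: informational hooks; configurable to run before or after kdump */
    ATOMIC_NOTIFIER_HEAD(panic_info_list);

    /* 3: only reached when kdump is not called and panic() loops forever */
    ATOMIC_NOTIFIER_HEAD(panic_loop_list);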

Thanks a lot for the very valuable input.

Best Regards,
Petr

2022-01-30 19:05:44

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/24/22 at 04:57pm, Michael Kelley (LINUX) wrote:
> From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> >
> > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > >
> > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > Hi Baoquan, some comments inline below:
> > > > >
> > > > > On 20/01/2022 05:51, Baoquan He wrote:
>
> [snip]
>
> > > > > Do you think it should be necessary?
> > > > > How about if we allow users to just "panic_print" with or without the
> > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > might have a much clear code.
> > > >
> > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > leave it to user to decide.
> > > >
> > >
> > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > kdump kernel is necessary for correctness. During initial boot of the
> > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > may write to. A VMbus connection is also established. Before kexec'ing
> > > into the kdump kernel, the sharing of these pages must be rescinded
> > > and the VMbus connection must be terminated. If this isn't done, the
> > > kdump kernel will see strange memory overwrites if these shared guest
> > > physical memory pages get used for something else.
> > >
> > > I hope we've found and fixed all the problems where the Hyper-V
> > > notifier could get hung. Unfortunately, the Hyper-V interfaces were
> > > designed long ago without the Linux kexec scenario in mind, and they
> > > don't provide a simple way to reset everything except by doing a
> > > reboot that goes back through the virtual BIOS/UEFI. So the Hyper-V
> > > notifier code is more complicated than would be desirable, and in
> > > particular, terminating the VMbus connection is tricky.
> > >
> > > This has been an evolving area of understanding. It's only been the last
> > > couple of years that we've fully understood the implications of these
> > > shared memory pages on the kexec/kdump scenario and what it takes
> > > to reset everything so the kexec'ed kernel will work.
> >
> > Glad to know these background details, thx, Michael. While from the
> > commit which introduced it and the code comment above code, I thought
> > Hyper-V wants to collect data before crash dump. If this is the true,
> > it might be helpful to add these in commit log or add as code comment,
> > and also help to defend you when people question it.
> >
> > int __init hv_common_init(void)
> > {
> > int i;
> >
> > /*
> > * Hyper-V expects to get crash register data or kmsg when
> > * crash enlightment is available and system crashes. Set
> > * crash_kexec_post_notifiers to be true to make sure that
> > * calling crash enlightment interface before running kdump
> > * kernel.
> > */
> > if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
> > crash_kexec_post_notifiers = true;
> >
> > ......
> > }
>
> In the Azure cloud, collecting data before crash dumps is a motivation
> as well for setting crash_kexec_post_notifiers to true. That way as
> cloud operator we can see broad failure trends, and in specific cases
> customers often expect the cloud operator to be able to provide info
> about a problem even if they have taken a kdump. Where did you
> envision adding a comment in the code to help clarify these intentions?
>
> I looked at the code again, and should revise my previous comments
> somewhat. The Hyper-V resets that I described indeed must be done
> prior to kexec'ing the kdump kernel. Most such resets are actually
> done via __crash_kexec() -> machine_crash_shutdown(), not via the
> panic notifier. However, the Hyper-V panic notifier must terminate the
> VMbus connection, because that must be done even if kdump is not
> being invoked. See commit 74347a99e73.
>
> Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> were probably due to the machine_crash_shutdown() path, and not due
> to running the panic notifiers prior to kexec'ing the kdump kernel. The
> exception is terminating the VMbus connection, which had problems that
> are hopefully now fixed because of adding a timeout.
Thanks for the detailed information.

So I understand the status as:
===
Hyper-V needed the panic notifiers to execute before __crash_kexec() in
the past because the VMbus connection needs to be terminated; that was done in
commit 74347a99e73 as a workaround on panic, whether kdump is
enabled or not. But now, the VMbus connection termination is not needed
anymore since it's fixed by adding a timeout on Hyper-V.

Then, in the current kernel, the panic notifiers are executed on Hyper-V
by default for just one reason: Hyper-V wants to collect data
before the crash dump. The data collection is motivated by trying to see
broad failure trends as the cloud operator on the Azure cloud, and in specific
cases providing info to customers even if they have taken a vmcore.
===

Do I get it right?

2022-01-31 11:31:12

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V3] panic: Move panic_print before kmsg dumpers

From: Baoquan He <[email protected]> Sent: Friday, January 28, 2022 1:03 AM
>
> On 01/24/22 at 04:57pm, Michael Kelley (LINUX) wrote:
> > From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> > >
> > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > > >
> > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > Hi Baoquan, some comments inline below:
> > > > > >
> > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> >
> > [snip]
> >
> > > > > > Do you think it should be necessary?
> > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > might have a much clear code.
> > > > >
> > > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > > leave it to user to decide.
> > > > >
> > > >
> > > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > > kdump kernel is necessary for correctness. During initial boot of the
> > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > may write to. A VMbus connection is also established. Before kexec'ing
> > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > and the VMbus connection must be terminated. If this isn't done, the
> > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > physical memory pages get used for something else.
> > > >
> > > > I hope we've found and fixed all the problems where the Hyper-V
> > > > notifier could get hung. Unfortunately, the Hyper-V interfaces were
> > > > designed long ago without the Linux kexec scenario in mind, and they
> > > > don't provide a simple way to reset everything except by doing a
> > > > reboot that goes back through the virtual BIOS/UEFI. So the Hyper-V
> > > > notifier code is more complicated than would be desirable, and in
> > > > particular, terminating the VMbus connection is tricky.
> > > >
> > > > This has been an evolving area of understanding. It's only been the last
> > > > couple of years that we've fully understood the implications of these
> > > > shared memory pages on the kexec/kdump scenario and what it takes
> > > > to reset everything so the kexec'ed kernel will work.
> > >
> > > Glad to know these background details, thx, Michael. While from the
> > > commit which introduced it and the code comment above code, I thought
> > > Hyper-V wants to collect data before crash dump. If this is the true,
> > > it might be helpful to add these in commit log or add as code comment,
> > > and also help to defend you when people question it.
> > >
> > > int __init hv_common_init(void)
> > > {
> > > int i;
> > >
> > > /*
> > > * Hyper-V expects to get crash register data or kmsg when
> > > * crash enlightment is available and system crashes. Set
> > > * crash_kexec_post_notifiers to be true to make sure that
> > > * calling crash enlightment interface before running kdump
> > > * kernel.
> > > */
> > > 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
> > > crash_kexec_post_notifiers = true;
> > >
> > > ......
> > > }
> >
> > In the Azure cloud, collecting data before crash dumps is a motivation
> > as well for setting crash_kexec_post_notifiers to true. That way as
> > cloud operator we can see broad failure trends, and in specific cases
> > customers often expect the cloud operator to be able to provide info
> > about a problem even if they have taken a kdump. Where did you
> > envision adding a comment in the code to help clarify these intentions?
> >
> > I looked at the code again, and should revise my previous comments
> > somewhat. The Hyper-V resets that I described indeed must be done
> > prior to kexec'ing the kdump kernel. Most such resets are actually
> > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > panic notifier. However, the Hyper-V panic notifier must terminate the
> > VMbus connection, because that must be done even if kdump is not
> > being invoked. See commit 74347a99e73.
> >
> > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > were probably due to the machine_crash_shutdown() path, and not due
> > to running the panic notifiers prior to kexec'ing the kdump kernel. The
> > exception is terminating the VMbus connection, which had problems that
> > are hopefully now fixed because of adding a timeout.
> Thanks for detailed information.
>
> So I can understand the status as:
> ===
> Hyper-V needed panic_notifier to execute before __crash_kexec() in
> the past, because VMbus connection need be terminated, that's done in
> commit 74347a99e73 as a workaround when panic happened, whether kdump is
> enabled or not. But now, the VMbus connection termination is not needed
> anymore since it's fixed by adding a timeout on Hyper-V.

No. Sorry I wasn't clear. Even now, specific action needs to be taken to
terminate the VMbus connection before __crash_kexec() runs so that
the new kdump kernel can start fresh and establish its own VMbus
connection. You had originally mentioned hang problems occurring
because of running the Hyper-V panic notifier before __crash_kexec().
Terminating the VMbus connection waits for a reply from Hyper-V,
because terminating the connection can take a while (tens of seconds)
if Hyper-V has a lot of disk data cached. Dirty data must be flushed back
to a cloud disk before the kdump kernel runs (otherwise other weird stuff
happens in the kdump kernel). We've added a timeout in Linux so that if
for whatever reason Hyper-V fails to reply, __crash_kexec() still gets called.
Hopefully that timeout cures any hang problems that were previously
seen. But the timeout does not remove the need to terminate the
VMbus connection.
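
Just to illustrate the shape of that bounded wait (this is not the
actual hv_vmbus code; the helper names and the 10-second value below
are made up):

/*
 * Illustration only: ask Hyper-V to unload the VMbus connection, wait
 * a limited time for the reply, then continue toward __crash_kexec()
 * either way.
 */
#include <linux/completion.h>
#include <linux/jiffies.h>
#include <linux/printk.h>

static DECLARE_COMPLETION(vmbus_unload_done);	/* invented name */

/* The message handler would call this when Hyper-V acks the unload. */
static void vmbus_unload_acked(void)		/* invented name */
{
	complete(&vmbus_unload_done);
}

static void vmbus_initiate_unload_sketch(void)	/* invented name */
{
	/* send the unload request to the hypervisor (details omitted) */

	/*
	 * Flushing cached disk data on the host can take tens of seconds,
	 * so the reply may be slow; bound the wait so a missing reply
	 * cannot block the path into the kdump kernel forever.
	 */
	if (!wait_for_completion_timeout(&vmbus_unload_done,
					 msecs_to_jiffies(10000)))
		pr_warn("VMbus unload not acked; continuing anyway\n");
}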

Michael

>
> Then, in the current kernel, panic_notifier is taken to execute on Hyper-V
> by default just because of one reason, Hyper-V wants to collect data
> before crash dump. The data collecting is motivate by trying to see
> broad failure trends as cloud operator on Azure cloud, and in specific
> cases providing info to customer even if they have taken vmcore.
> ===
>
> Do I get it right?

2022-01-31 23:14:21

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/28/22 at 06:24pm, Michael Kelley (LINUX) wrote:
> From: Baoquan He <[email protected]> Sent: Friday, January 28, 2022 1:03 AM
> >
> > On 01/24/22 at 04:57pm, Michael Kelley (LINUX) wrote:
> > > From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> > > >
> > > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > > > >
> > > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > > Hi Baoquan, some comments inline below:
> > > > > > >
> > > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> > >
> > > [snip]
> > >
> > > > > > > Do you think it should be necessary?
> > > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > > might have a much clear code.
> > > > > >
> > > > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > > > leave it to user to decide.
> > > > > >
> > > > >
> > > > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > > > kdump kernel is necessary for correctness. During initial boot of the
> > > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > > may write to. A VMbus connection is also established. Before kexec'ing
> > > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > > and the VMbus connection must be terminated. If this isn't done, the
> > > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > > physical memory pages get used for something else.
> > > > >
> > > > > I hope we've found and fixed all the problems where the Hyper-V
> > > > > notifier could get hung. Unfortunately, the Hyper-V interfaces were
> > > > > designed long ago without the Linux kexec scenario in mind, and they
> > > > > don't provide a simple way to reset everything except by doing a
> > > > > reboot that goes back through the virtual BIOS/UEFI. So the Hyper-V
> > > > > notifier code is more complicated than would be desirable, and in
> > > > > particular, terminating the VMbus connection is tricky.
> > > > >
> > > > > This has been an evolving area of understanding. It's only been the last
> > > > > couple of years that we've fully understood the implications of these
> > > > > shared memory pages on the kexec/kdump scenario and what it takes
> > > > > to reset everything so the kexec'ed kernel will work.
> > > >
> > > > Glad to know these background details, thx, Michael. While from the
> > > > commit which introduced it and the code comment above code, I thought
> > > > Hyper-V wants to collect data before crash dump. If this is the true,
> > > > it might be helpful to add these in commit log or add as code comment,
> > > > and also help to defend you when people question it.
> > > >
> > > > int __init hv_common_init(void)
> > > > {
> > > > int i;
> > > >
> > > > /*
> > > > * Hyper-V expects to get crash register data or kmsg when
> > > > * crash enlightment is available and system crashes. Set
> > > > * crash_kexec_post_notifiers to be true to make sure that
> > > > * calling crash enlightment interface before running kdump
> > > > * kernel.
> > > > */
> > > > 	if (ms_hyperv.misc_features & HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE)
> > > > crash_kexec_post_notifiers = true;
> > > >
> > > > ......
> > > > }
> > >
> > > In the Azure cloud, collecting data before crash dumps is a motivation
> > > as well for setting crash_kexec_post_notifiers to true. That way as
> > > cloud operator we can see broad failure trends, and in specific cases
> > > customers often expect the cloud operator to be able to provide info
> > > about a problem even if they have taken a kdump. Where did you
> > > envision adding a comment in the code to help clarify these intentions?
> > >
> > > I looked at the code again, and should revise my previous comments
> > > somewhat. The Hyper-V resets that I described indeed must be done
> > > prior to kexec'ing the kdump kernel. Most such resets are actually
> > > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > > panic notifier. However, the Hyper-V panic notifier must terminate the
> > > VMbus connection, because that must be done even if kdump is not
> > > being invoked. See commit 74347a99e73.
> > >
> > > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > > were probably due to the machine_crash_shutdown() path, and not due
> > > to running the panic notifiers prior to kexec'ing the kdump kernel. The
> > > exception is terminating the VMbus connection, which had problems that
> > > are hopefully now fixed because of adding a timeout.
> > Thanks for detailed information.
> >
> > So I can understand the status as:
> > ===
> > Hyper-V needed panic_notifier to execute before __crash_kexec() in
> > the past, because VMbus connection need be terminated, that's done in
> > commit 74347a99e73 as a workaround when panic happened, whether kdump is
> > enabled or not. But now, the VMbus connection termination is not needed
> > anymore since it's fixed by adding a timeout on Hyper-V.
>
> No. Sorry I wasn't clear. Even now, specific action needs to be taken to
> terminate the VMbus connection before __crash_kexec() runs so that
> the new kdump kernel can start fresh and establish its own VMbus
> connection. You had originally mentioned hang problems occurring
> because of running the Hyper-V panic notifier before __crash_kexec().
> Terminating the VMbus connection waits for a reply from Hyper-V
> because terminating the connection can take a while (10's seconds)
> if Hyper-V has a lot of disk data cached. Dirty data must be flushed back
> to a cloud disk before the kdump kernel runs (otherwise other weird stuff
> happens in the kdump kernel). We've added a timeout in Linux so that if
> for whatever reason Hyper-V fails to reply, __crash_kexec() still gets called.
> Hopefully that timeout cures any hang problems that were previously
> seen. But the timeout does not remove the need to terminate the
> VMbus connection.

Ah, got it now, thx.

So the VMbus connection has to be terminated explicitly, and the reply
has to be waited for, no matter whether kdump is configured or not. I
wonder why we don't wrap this into one function and call it before
__crash_kexec() when the platform is Hyper-V. Honestly, hooking it up
to panic_notifier_list and executing the entire set of panic notifiers
for Hyper-V is risky and not really appropriate.
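
A rough sketch of that idea (hv_panic_prepare_kexec() is an invented
name; the platform check shown is the x86 helper, used here only for
illustration):

#include <asm/hypervisor.h>

static void hv_panic_prepare_kexec(void)
{
	/*
	 * Rescind the pages shared with the hypervisor and terminate the
	 * VMbus connection, with the bounded wait discussed above.
	 */
}

/*
 * panic() would call this right before __crash_kexec(), instead of
 * running the whole panic_notifier_list early just for Hyper-V's sake.
 */
static void panic_prepare_crash_kexec(void)
{
	if (hypervisor_is_type(X86_HYPER_MS_HYPERV))
		hv_panic_prepare_kexec();
}

Whether a platform-specific call like this belongs directly in panic()
is of course debatable.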




2022-01-31 23:15:25

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 01/26/22 at 12:51pm, Petr Mladek wrote:
> On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> > From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> > >
> > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > > >
> > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > Hi Baoquan, some comments inline below:
> > > > > >
> > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> >
> > [snip]
> >
> > > > > > Do you think it should be necessary?
> > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > might have a much clear code.
> > > > >
> > > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > > leave it to user to decide.
> > > > >
> > > >
> > > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > > kdump kernel is necessary for correctness. During initial boot of the
> > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > may write to. A VMbus connection is also established. Before kexec'ing
> > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > and the VMbus connection must be terminated. If this isn't done, the
> > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > physical memory pages get used for something else.
> >
> > In the Azure cloud, collecting data before crash dumps is a motivation
> > as well for setting crash_kexec_post_notifiers to true. That way as
> > cloud operator we can see broad failure trends, and in specific cases
> > customers often expect the cloud operator to be able to provide info
> > about a problem even if they have taken a kdump. Where did you
> > envision adding a comment in the code to help clarify these intentions?
> >
> > I looked at the code again, and should revise my previous comments
> > somewhat. The Hyper-V resets that I described indeed must be done
> > prior to kexec'ing the kdump kernel. Most such resets are actually
> > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > panic notifier. However, the Hyper-V panic notifier must terminate the
> > VMbus connection, because that must be done even if kdump is not
> > being invoked. See commit 74347a99e73.
> >
> > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > were probably due to the machine_crash_shutdown() path, and not due
> > to running the panic notifiers prior to kexec'ing the kdump kernel. The
> > exception is terminating the VMbus connection, which had problems that
> > are hopefully now fixed because of adding a timeout.
>
> My undestanding is that we could split the actions into three groups:
>
> 1. Actions that has to be before kexec'ing kdump kernel, like
> resetting physicall memory shared with Hyper-V.
>
> These operation(s) are needed only for kexec and can be done
> in kexec.
>
>
> 2. Notify Hyper-V so that, for example, Azure cloud, could collect
> data before crash dump.
>
> It is nice to have.
>
> It should be configurable if it is not completely safe. I mean
> that there should be a way to disable it when it might increase
> the risk that kexec'ing kdump kernel might fail.
>
>
> 3. Some actions are needed only when panic() ends up in the
> infinite loop.
>
> For example, unloading vmbus channel. At least the commit
> 74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
> hv panic callback") says that it is done in kdump path
> out of box.
>
> All these operations are needed and used only when the kernel is
> running under Hyper-V.
>
> My mine intention is to understand if we need 2 or 3 notifier lists
> or the current one is enough.
>
> The 3 notifier lists would be:
>
> + always do (even before kdump)
> + optionally do before or after kdump
> + needed only when kdump is not called

Totally agree with the above suggestions for Hyper-V. A cleanup along
these lines seems necessary. Stuffing all of these into the panic
notifiers as one package is not appropriate.
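
To visualize the three lists (the list names below are invented; only
the notifier helpers themselves are existing kernel API):

#include <linux/notifier.h>

/* 1. Mandatory actions, always run, even before kdump. */
ATOMIC_NOTIFIER_HEAD(panic_pre_kdump_list);

/*
 * 2. Optional "collect information" callbacks; whether they run before
 *    or after kdump stays configurable (crash_kexec_post_notifiers today).
 */
ATOMIC_NOTIFIER_HEAD(panic_info_list);

/*
 * 3. Callbacks needed only when kdump is not invoked and panic() heads
 *    for the infinite loop.
 */
ATOMIC_NOTIFIER_HEAD(panic_loop_list);

/*
 * panic() would then do, roughly:
 *
 *	atomic_notifier_call_chain(&panic_pre_kdump_list, 0, buf);
 *	if (!crash_kexec_post_notifiers)
 *		__crash_kexec(NULL);
 *	atomic_notifier_call_chain(&panic_info_list, 0, buf);
 *	if (crash_kexec_post_notifiers)
 *		__crash_kexec(NULL);
 *	atomic_notifier_call_chain(&panic_loop_list, 0, buf);
 */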

2022-02-03 13:25:37

by Michael Kelley (LINUX)

[permalink] [raw]
Subject: RE: [PATCH V3] panic: Move panic_print before kmsg dumpers

From: Baoquan He <[email protected]> Sent: Saturday, January 29, 2022 12:00 AM
>
> On 01/26/22 at 12:51pm, Petr Mladek wrote:
> > On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> > > From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> > > >
> > > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > > > >
> > > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > > Hi Baoquan, some comments inline below:
> > > > > > >
> > > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> > >
> > > [snip]
> > >
> > > > > > > Do you think it should be necessary?
> > > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > > might have a much clear code.
> > > > > >
> > > > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > > > leave it to user to decide.
> > > > > >
> > > > >
> > > > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > > > kdump kernel is necessary for correctness. During initial boot of the
> > > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > > may write to. A VMbus connection is also established. Before kexec'ing
> > > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > > and the VMbus connection must be terminated. If this isn't done, the
> > > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > > physical memory pages get used for something else.
> > >
> > > In the Azure cloud, collecting data before crash dumps is a motivation
> > > as well for setting crash_kexec_post_notifiers to true. That way as
> > > cloud operator we can see broad failure trends, and in specific cases
> > > customers often expect the cloud operator to be able to provide info
> > > about a problem even if they have taken a kdump. Where did you
> > > envision adding a comment in the code to help clarify these intentions?
> > >
> > > I looked at the code again, and should revise my previous comments
> > > somewhat. The Hyper-V resets that I described indeed must be done
> > > prior to kexec'ing the kdump kernel. Most such resets are actually
> > > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > > panic notifier. However, the Hyper-V panic notifier must terminate the
> > > VMbus connection, because that must be done even if kdump is not
> > > being invoked. See commit 74347a99e73.
> > >
> > > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > > were probably due to the machine_crash_shutdown() path, and not due
> > > to running the panic notifiers prior to kexec'ing the kdump kernel. The
> > > exception is terminating the VMbus connection, which had problems that
> > > are hopefully now fixed because of adding a timeout.
> >
> > My undestanding is that we could split the actions into three groups:
> >
> > 1. Actions that has to be before kexec'ing kdump kernel, like
> > resetting physicall memory shared with Hyper-V.
> >
> > These operation(s) are needed only for kexec and can be done
> > in kexec.
> >
> >
> > 2. Notify Hyper-V so that, for example, Azure cloud, could collect
> > data before crash dump.
> >
> > It is nice to have.
> >
> > It should be configurable if it is not completely safe. I mean
> > that there should be a way to disable it when it might increase
> > the risk that kexec'ing kdump kernel might fail.
> >
> >
> > 3. Some actions are needed only when panic() ends up in the
> > infinite loop.
> >
> > For example, unloading vmbus channel. At least the commit
> > 74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
> > hv panic callback") says that it is done in kdump path
> > out of box.
> >
> > All these operations are needed and used only when the kernel is
> > running under Hyper-V.
> >
> > My mine intention is to understand if we need 2 or 3 notifier lists
> > or the current one is enough.
> >
> > The 3 notifier lists would be:
> >
> > + always do (even before kdump)
> > + optionally do before or after kdump
> > + needed only when kdump is not called
>
> Totally agree with above suggesitons for Hyper-V. Cleanup as ablove
> seems necesary. Stuffing them into panic_notifiers package is not
> appropriate.

Baoquan -- if the concept of panic notifiers is broadened as Petr is
proposing, with three different notifier lists, are you OK with the
Hyper-V requirements being met that way? Having a generic
mechanism seems better to me than adding #ifdef CONFIG_HYPERV
code into panic().

Michael



2022-02-08 06:40:30

by Baoquan He

[permalink] [raw]
Subject: Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

On 02/02/22 at 05:43pm, Michael Kelley (LINUX) wrote:
> From: Baoquan He <[email protected]> Sent: Saturday, January 29, 2022 12:00 AM
> >
> > On 01/26/22 at 12:51pm, Petr Mladek wrote:
> > > On Mon 2022-01-24 16:57:17, Michael Kelley (LINUX) wrote:
> > > > From: Baoquan He <[email protected]> Sent: Friday, January 21, 2022 8:34 PM
> > > > >
> > > > > On 01/21/22 at 03:00pm, Michael Kelley (LINUX) wrote:
> > > > > > From: Baoquan He <[email protected]> Sent: Thursday, January 20, 2022 6:31 PM
> > > > > > >
> > > > > > > On 01/20/22 at 06:36pm, Guilherme G. Piccoli wrote:
> > > > > > > > Hi Baoquan, some comments inline below:
> > > > > > > >
> > > > > > > > On 20/01/2022 05:51, Baoquan He wrote:
> > > >
> > > > [snip]
> > > >
> > > > > > > > Do you think it should be necessary?
> > > > > > > > How about if we allow users to just "panic_print" with or without the
> > > > > > > > "crash_kexec_post_notifiers", then we pursue Petr suggestion of
> > > > > > > > refactoring the panic notifiers? So, after this future refactor, we
> > > > > > > > might have a much clear code.
> > > > > > >
> > > > > > > I haven't read Petr's reply in another panic notifier filter thread. For
> > > > > > > panic notifier, it's only enforced to use on HyperV platform, excepto of
> > > > > > > that, users need to explicitly add "crash_kexec_post_notifiers=1" to enable
> > > > > > > it. And we got bug report on the HyperV issue. In our internal discussion,
> > > > > > > we strongly suggest HyperV dev to change the default enablement, instead
> > > > > > > leave it to user to decide.
> > > > > > >
> > > > > >
> > > > > > Regarding Hyper-V: Invoking the Hyper-V notifier prior to running the
> > > > > > kdump kernel is necessary for correctness. During initial boot of the
> > > > > > main kernel, the Hyper-V and VMbus code in Linux sets up several guest
> > > > > > physical memory pages that are shared with Hyper-V, and that Hyper-V
> > > > > > may write to. A VMbus connection is also established. Before kexec'ing
> > > > > > into the kdump kernel, the sharing of these pages must be rescinded
> > > > > > and the VMbus connection must be terminated. If this isn't done, the
> > > > > > kdump kernel will see strange memory overwrites if these shared guest
> > > > > > physical memory pages get used for something else.
> > > >
> > > > In the Azure cloud, collecting data before crash dumps is a motivation
> > > > as well for setting crash_kexec_post_notifiers to true. That way as
> > > > cloud operator we can see broad failure trends, and in specific cases
> > > > customers often expect the cloud operator to be able to provide info
> > > > about a problem even if they have taken a kdump. Where did you
> > > > envision adding a comment in the code to help clarify these intentions?
> > > >
> > > > I looked at the code again, and should revise my previous comments
> > > > somewhat. The Hyper-V resets that I described indeed must be done
> > > > prior to kexec'ing the kdump kernel. Most such resets are actually
> > > > done via __crash_kexec() -> machine_crash_shutdown(), not via the
> > > > panic notifier. However, the Hyper-V panic notifier must terminate the
> > > > VMbus connection, because that must be done even if kdump is not
> > > > being invoked. See commit 74347a99e73.
> > > >
> > > > Most of the hangs seen in getting into the kdump kernel on Hyper-V/Azure
> > > > were probably due to the machine_crash_shutdown() path, and not due
> > > > to running the panic notifiers prior to kexec'ing the kdump kernel. The
> > > > exception is terminating the VMbus connection, which had problems that
> > > > are hopefully now fixed because of adding a timeout.
> > >
> > > My undestanding is that we could split the actions into three groups:
> > >
> > > 1. Actions that has to be before kexec'ing kdump kernel, like
> > > resetting physicall memory shared with Hyper-V.
> > >
> > > These operation(s) are needed only for kexec and can be done
> > > in kexec.
> > >
> > >
> > > 2. Notify Hyper-V so that, for example, Azure cloud, could collect
> > > data before crash dump.
> > >
> > > It is nice to have.
> > >
> > > It should be configurable if it is not completely safe. I mean
> > > that there should be a way to disable it when it might increase
> > > the risk that kexec'ing kdump kernel might fail.
> > >
> > >
> > > 3. Some actions are needed only when panic() ends up in the
> > > infinite loop.
> > >
> > > For example, unloading vmbus channel. At least the commit
> > > 74347a99e73ae00b8385f ("x86/Hyper-V: Unload vmbus channel in
> > > hv panic callback") says that it is done in kdump path
> > > out of box.
> > >
> > > All these operations are needed and used only when the kernel is
> > > running under Hyper-V.
> > >
> > > My mine intention is to understand if we need 2 or 3 notifier lists
> > > or the current one is enough.
> > >
> > > The 3 notifier lists would be:
> > >
> > > + always do (even before kdump)
> > > + optionally do before or after kdump
> > > + needed only when kdump is not called
> >
> > Totally agree with above suggesitons for Hyper-V. Cleanup as ablove
> > seems necesary. Stuffing them into panic_notifiers package is not
> > appropriate.
>
> Baoquan -- if the concept of panic notifiers is broadened as Petr is
> proposing, with three different notifier lists, are you OK with the
> Hyper-V requirements being met that way? Having a generic
> mechanism seems better to me than adding #ifdef CONFIG_HYPERV
> code into panic().

Yeah, after reading these two threads and rethinking, I think Petr's
proposal is reasonable. By the way, we may not even call the 1st part
(the actions always done, even before kdump) "panic notifiers" any
more, since they would be split out of the notifier list and become a
fixed code block.
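
A minimal sketch of that split, assuming an invented hook name (not
existing code):

/*
 * Sketch only: the "always do, even before kdump" actions stop being
 * notifiers and become a plain call in panic(); platforms that need it
 * (e.g. Hyper-V) set the hook at boot.
 */
static void (*panic_mandatory_prepare)(void);

static void panic_run_mandatory_prepare(void)
{
	if (panic_mandatory_prepare)
		panic_mandatory_prepare();	/* e.g. VMbus teardown */
}

/*
 * panic() would call panic_run_mandatory_prepare() unconditionally,
 * right before __crash_kexec(), with no notifier list involved.
 */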