2020-09-18 03:29:50

by Dave Young

Subject: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

crash_kexec_post_notifiers enables running various panic notifiers
before the kdump kernel boots. This increases the risk of kdump failure,
as is well documented in kernel-parameters.txt. We do not suggest that
people enable it together with kdump unless they are really sure.
Distributions should also not enable it by default without users being
aware of it.

Unfortunately it is enabled by default in systemd; see the discussion
in the systemd report below. We could not convince systemd to change
it:
https://github.com/systemd/systemd/issues/16661

We have in fact received reports of kdump kernel hangs on both s390x
and powerpcle caused by the systemd change. Some x86 cases could have
the same cause (although there it comes from Hyper-V code rather than
systemd, and needs to be addressed separately).

So, to avoid the automatic enablement, just remove the parameter's
write permission in sysfs.

Signed-off-by: Dave Young <[email protected]>
---
kernel/panic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index aef8872ba843..bea44fc4eb3b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -695,7 +695,7 @@ core_param(panic, panic_timeout, int, 0644);
 core_param(panic_print, panic_print, ulong, 0644);
 core_param(pause_on_oops, pause_on_oops, int, 0644);
 core_param(panic_on_warn, panic_on_warn, int, 0644);
-core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0644);
+core_param(crash_kexec_post_notifiers, crash_kexec_post_notifiers, bool, 0444);
 
 static int __init oops_setup(char *s)
 {
--
2.26.2
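
For readers less familiar with core_param(): its last argument is the mode of the sysfs file that exposes the parameter, so the patch turns a root-writable file (0644) into a read-only one (0444). The following is a minimal userspace sketch of that difference, not part of the patch. The helper names are hypothetical, and the sysfs path is an assumption about where core_param() entries usually appear.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Assumed sysfs location of the parameter; not taken from the patch. */
#define PARAM_PATH "/sys/module/kernel/parameters/crash_kexec_post_notifiers"

/* Return 1 if the mode grants write permission to the file owner. */
static int mode_owner_writable(mode_t mode)
{
    return (mode & S_IWUSR) != 0;
}

/*
 * Return 1 if the file at path is owner-writable, 0 if it is not,
 * and -1 if it cannot be examined (for example, it does not exist).
 */
static int path_owner_writable(const char *path)
{
    struct stat st;

    assert(path != NULL);

    if (stat(path, &st) != 0)
        return -1;
    return mode_owner_writable(st.st_mode);
}
```

Calling path_owner_writable(PARAM_PATH) on a running system would be expected to report whether the parameter can still be changed after boot, assuming the path above is correct for that kernel.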


2020-09-19 00:50:15

by Andrew Morton

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:

> crash_kexec_post_notifiers enables running various panic notifier
> before kdump kernel booting. This increases risks of kdump failure.
> It is well documented in kernel-parameters.txt. We do not suggest
> people to enable it together with kdump unless he/she is really sure.
> This is also not suggested to be enabled by default when users are
> not aware in distributions.
>
> But unfortunately it is enabled by default in systemd, see below
> discussions in a systemd report, we can not convince systemd to change
> it:
> https://github.com/systemd/systemd/issues/16661
>
> Actually we have got reports about kdump kernel hangs in both s390x
> and powerpcle cases caused by the systemd change, also some x86 cases
> could also be caused by the same (although that is in Hyper-V code
> instead of systemd, that need to be addressed separately).
>
> Thus to avoid the auto enablement here just disable the param writable
> permission in sysfs.
>

Well. I don't think this is at all a desirable way of resolving a
disagreement with the systemd developers.

At the above github address I'm seeing "ryncsn added a commit to
ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
enable crash_kexec_post_notifiers by default". So didn't that address
the issue?

2020-09-19 07:30:13

by Dave Young

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On 09/18/20 at 05:47pm, Andrew Morton wrote:
> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
>
> > crash_kexec_post_notifiers enables running various panic notifier
> > before kdump kernel booting. This increases risks of kdump failure.
> > It is well documented in kernel-parameters.txt. We do not suggest
> > people to enable it together with kdump unless he/she is really sure.
> > This is also not suggested to be enabled by default when users are
> > not aware in distributions.
> >
> > But unfortunately it is enabled by default in systemd, see below
> > discussions in a systemd report, we can not convince systemd to change
> > it:
> > https://github.com/systemd/systemd/issues/16661
> >
> > Actually we have got reports about kdump kernel hangs in both s390x
> > and powerpcle cases caused by the systemd change, also some x86 cases
> > could also be caused by the same (although that is in Hyper-V code
> > instead of systemd, that need to be addressed separately).
> >
> > Thus to avoid the auto enablement here just disable the param writable
> > permission in sysfs.
> >
>
> Well. I don't think this is at all a desirable way of resolving a
> disagreement with the systemd developers
>
> At the above github address I'm seeing "ryncsn added a commit to
> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> enable crash_kexec_post_notifiers by default". So didn't that address
> the issue?
>

I hope that commit can be merged in systemd, but we are really not
optimistic about that. The discussion there is clear, but we have not
had a response since Aug 6.

BTW, Kairui sent the systemd pull request 15 days ago; the new update
added some comments.

Thanks
Dave

2020-09-21 20:19:42

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
>
> > crash_kexec_post_notifiers enables running various panic notifier
> > before kdump kernel booting. This increases risks of kdump failure.
> > It is well documented in kernel-parameters.txt. We do not suggest
> > people to enable it together with kdump unless he/she is really sure.
> > This is also not suggested to be enabled by default when users are
> > not aware in distributions.
> >
> > But unfortunately it is enabled by default in systemd, see below
> > discussions in a systemd report, we can not convince systemd to change
> > it:
> > https://github.com/systemd/systemd/issues/16661
> >
> > Actually we have got reports about kdump kernel hangs in both s390x
> > and powerpcle cases caused by the systemd change, also some x86 cases
> > could also be caused by the same (although that is in Hyper-V code
> > instead of systemd, that need to be addressed separately).

Perhaps it may be better to fix the issues on s390x and PowerPC as well?

> >
> > Thus to avoid the auto enablement here just disable the param writable
> > permission in sysfs.
> >
>
> Well. I don't think this is at all a desirable way of resolving a
> disagreement with the systemd developers
>
> At the above github address I'm seeing "ryncsn added a commit to
> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> enable crash_kexec_post_notifiers by default". So didn't that address
> the issue?

It does in systemd, but there is a strong interest in making this on by default.

2020-09-22 03:10:11

by Eric W. Biederman

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

Konrad Rzeszutek Wilk <[email protected]> writes:

> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
>> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
>>
>> > crash_kexec_post_notifiers enables running various panic notifier
>> > before kdump kernel booting. This increases risks of kdump failure.
>> > It is well documented in kernel-parameters.txt. We do not suggest
>> > people to enable it together with kdump unless he/she is really sure.
>> > This is also not suggested to be enabled by default when users are
>> > not aware in distributions.
>> >
>> > But unfortunately it is enabled by default in systemd, see below
>> > discussions in a systemd report, we can not convince systemd to change
>> > it:
>> > https://github.com/systemd/systemd/issues/16661
>> >
>> > Actually we have got reports about kdump kernel hangs in both s390x
>> > and powerpcle cases caused by the systemd change, also some x86 cases
>> > could also be caused by the same (although that is in Hyper-V code
>> > instead of systemd, that need to be addressed separately).
>
> Perhaps it may be better to fix the issus on s390x and PowerPC as well?
>
>> >
>> > Thus to avoid the auto enablement here just disable the param writable
>> > permission in sysfs.
>> >
>>
>> Well. I don't think this is at all a desirable way of resolving a
>> disagreement with the systemd developers
>>
>> At the above github address I'm seeing "ryncsn added a commit to
>> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
>> enable crash_kexec_post_notifiers by default". So didn't that address
>> the issue?
>
> It does in systemd, but there is a strong interest in making this on
> by default.

There is also a strong interest in removing this code entirely from the
kernel.

This failure is a case in point.

I think I am at my "I told you so" point. This is what all of the testing
over all the years has said. Leaving functionality to the peculiarities
of firmware when you don't have to, and can actually control what is
going on, doesn't work.

Eric


2020-09-22 11:01:55

by Philipp Rudo

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

Hi Konrad,


On Mon, 21 Sep 2020 16:18:12 -0400
Konrad Rzeszutek Wilk <[email protected]> wrote:

> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
> >
> > > crash_kexec_post_notifiers enables running various panic notifier
> > > before kdump kernel booting. This increases risks of kdump failure.
> > > It is well documented in kernel-parameters.txt. We do not suggest
> > > people to enable it together with kdump unless he/she is really sure.
> > > This is also not suggested to be enabled by default when users are
> > > not aware in distributions.
> > >
> > > But unfortunately it is enabled by default in systemd, see below
> > > discussions in a systemd report, we can not convince systemd to change
> > > it:
> > > https://github.com/systemd/systemd/issues/16661
> > >
> > > Actually we have got reports about kdump kernel hangs in both s390x
> > > and powerpcle cases caused by the systemd change, also some x86 cases
> > > could also be caused by the same (although that is in Hyper-V code
> > > instead of systemd, that need to be addressed separately).
>
> Perhaps it may be better to fix the issus on s390x and PowerPC as well?

There's little s390 can fix. We use the panic_notifier_list to start
other dumpers in case kdump isn't configured or has failed. This behavior
was introduced in 2006, long before crash_kexec_post_notifiers was
introduced. So I suggest that crash_kexec_post_notifiers be fixed instead.

> > >
> > > Thus to avoid the auto enablement here just disable the param writable
> > > permission in sysfs.
> > >
> >
> > Well. I don't think this is at all a desirable way of resolving a
> > disagreement with the systemd developers
> >
> > At the above github address I'm seeing "ryncsn added a commit to
> > ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > enable crash_kexec_post_notifiers by default". So didn't that address
> > the issue?
>
> It does in systemd, but there is a strong interest in making this on by default.

AFAIK pstore requires UEFI to work. So what's the point of enabling it on
non-UEFI systems?

Thanks
Philipp

2020-09-22 14:53:20

by Boris Ostrovsky

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time


On 9/22/20 6:58 AM, Philipp Rudo wrote:
>
> AFAIK pstore requires UEFI to work. So what's the point to enable it on non-UEFI
> systems?


I don't think UEFI is required; ERST can specify its own backend. That can,
in fact, be quite useful in virtualization scenarios (especially with
direct boot, when there is no OVMF).


-boris

2020-09-22 17:08:45

by Guilherme G. Piccoli

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On Tue, Sep 22, 2020 at 11:53 AM <[email protected]> wrote:
>
>
> On 9/22/20 6:58 AM, Philipp Rudo wrote:
> >
> > AFAIK pstore requires UEFI to work. So what's the point to enable it on non-UEFI
> > systems?
>
>
> I don't think UEFI is required, ERST can specify its own backend. And that, in fact, can be quite useful in virtualization scenarios (especially in cases of direct boot, when there is no OVMF)
>
>
> -boris

There is a ramoops backend too - I was able to collect a dmesg in a
cloud provider using that!

2020-09-23 03:01:05

by Dave Young

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On 09/21/20 at 04:18pm, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
> >
> > > crash_kexec_post_notifiers enables running various panic notifier
> > > before kdump kernel booting. This increases risks of kdump failure.
> > > It is well documented in kernel-parameters.txt. We do not suggest
> > > people to enable it together with kdump unless he/she is really sure.
> > > This is also not suggested to be enabled by default when users are
> > > not aware in distributions.
> > >
> > > But unfortunately it is enabled by default in systemd, see below
> > > discussions in a systemd report, we can not convince systemd to change
> > > it:
> > > https://github.com/systemd/systemd/issues/16661
> > >
> > > Actually we have got reports about kdump kernel hangs in both s390x
> > > and powerpcle cases caused by the systemd change, also some x86 cases
> > > could also be caused by the same (although that is in Hyper-V code
> > > instead of systemd, that need to be addressed separately).
>
> Perhaps it may be better to fix the issus on s390x and PowerPC as well?
>
> > >
> > > Thus to avoid the auto enablement here just disable the param writable
> > > permission in sysfs.
> > >
> >
> > Well. I don't think this is at all a desirable way of resolving a
> > disagreement with the systemd developers
> >
> > At the above github address I'm seeing "ryncsn added a commit to
> > ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > enable crash_kexec_post_notifiers by default". So didn't that address
> > the issue?
>
> It does in systemd, but there is a strong interest in making this on by default.

I understand there could be such interest, but we have to keep in mind
that any extra work done after a system crash can make kdump unreliable.

I do not object to people using pstore, but I do object to enabling the
notifiers by default.

BTW, crash notifiers are not limited to pstore; there are quite a lot of
other pieces, such as the LED trigger.

Thanks
Dave

2020-09-23 03:04:15

by Dave Young

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

+ more people who may care about this param
On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> Konrad Rzeszutek Wilk <[email protected]> writes:
>
> > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
> >>
> >> > crash_kexec_post_notifiers enables running various panic notifier
> >> > before kdump kernel booting. This increases risks of kdump failure.
> >> > It is well documented in kernel-parameters.txt. We do not suggest
> >> > people to enable it together with kdump unless he/she is really sure.
> >> > This is also not suggested to be enabled by default when users are
> >> > not aware in distributions.
> >> >
> >> > But unfortunately it is enabled by default in systemd, see below
> >> > discussions in a systemd report, we can not convince systemd to change
> >> > it:
> >> > https://github.com/systemd/systemd/issues/16661
> >> >
> >> > Actually we have got reports about kdump kernel hangs in both s390x
> >> > and powerpcle cases caused by the systemd change, also some x86 cases
> >> > could also be caused by the same (although that is in Hyper-V code
> >> > instead of systemd, that need to be addressed separately).
> >
> > Perhaps it may be better to fix the issus on s390x and PowerPC as well?
> >
> >> >
> >> > Thus to avoid the auto enablement here just disable the param writable
> >> > permission in sysfs.
> >> >
> >>
> >> Well. I don't think this is at all a desirable way of resolving a
> >> disagreement with the systemd developers
> >>
> >> At the above github address I'm seeing "ryncsn added a commit to
> >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> >> enable crash_kexec_post_notifiers by default". So didn't that address
> >> the issue?
> >
> > It does in systemd, but there is a strong interest in making this on
> > by default.
>
> There is also a strong interest in removing this code entirely from the
> kernel.

I have added the Hyper-V people and the people who created the param
(see the commit below). I would also like to remove it if possible;
let's see what people think. At the very least, the automatic setting
should be disabled in both systemd and the kernel:

commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
Author: Masami Hiramatsu <[email protected]>
Date: Fri Jun 6 14:37:07 2014 -0700

    kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers

    Add a "crash_kexec_post_notifiers" boot option to run kdump after
    running panic_notifiers and dump kmsg. This can help rare situations
    where kdump fails because of unstable crashed kernel or hardware failure
    (memory corruption on critical data/code), or the 2nd kernel is already
    broken by the 1st kernel (it's a broken behavior, but who can guarantee
    that the "crashed" kernel works correctly?).

    Usage: add "crash_kexec_post_notifiers" to kernel boot option.

    Note that this actually increases risks of the failure of kdump. This
    option should be set only if you worry about the rare case of kdump
    failure rather than increasing the chance of success.

>
> This failure is a case in point.
>
> I think I am at my I told you so point. This is what all of the testing
> over all the years has said. Leaving functionality to the peculiarities
> of firmware when you don't have to, and can actually control what is
> going on doesn't work.
>
> Eric
>
>

Thanks
Dave
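
The commit message quoted above says the option makes kdump run after the panic notifiers and the kmsg dump. As a rough illustration of that ordering, here is a simplified userspace model. It is a sketch of the described behavior, not the code in kernel/panic.c, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum panic_step { STEP_KEXEC, STEP_NOTIFIERS, STEP_KMSG_DUMP };

/*
 * Record the order of the panic-time steps in out[] (room for 3 entries).
 * kdump_loaded models whether a crash kernel has been loaded. When it has,
 * the kexec step does not return, so nothing after it is recorded.
 * Returns the number of steps recorded.
 */
static size_t panic_order(bool post_notifiers, bool kdump_loaded,
                          enum panic_step out[3])
{
    size_t n = 0;

    assert(out != NULL);

    /* Default: jump to the kdump kernel before anything else can run. */
    if (!post_notifiers && kdump_loaded) {
        out[n++] = STEP_KEXEC;
        return n;
    }

    out[n++] = STEP_NOTIFIERS;  /* panic notifier chain */
    out[n++] = STEP_KMSG_DUMP;  /* kmsg dumpers such as pstore */

    /* With the option set, kdump only starts after the steps above. */
    if (post_notifiers && kdump_loaded)
        out[n++] = STEP_KEXEC;
    return n;
}
```

In this model, panic_order(true, true, steps) records the notifiers and the kmsg dump ahead of the kexec step, which is the window in which a misbehaving notifier can prevent kdump from ever starting.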

2020-09-23 15:51:06

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote:
> + more people who may care about this param

Paarty time!!

(See below, didn't snip any comments)
> On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> > Konrad Rzeszutek Wilk <[email protected]> writes:
> >
> > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
> > >>
> > >> > crash_kexec_post_notifiers enables running various panic notifier
> > >> > before kdump kernel booting. This increases risks of kdump failure.
> > >> > It is well documented in kernel-parameters.txt. We do not suggest
> > >> > people to enable it together with kdump unless he/she is really sure.
> > >> > This is also not suggested to be enabled by default when users are
> > >> > not aware in distributions.
> > >> >
> > >> > But unfortunately it is enabled by default in systemd, see below
> > >> > discussions in a systemd report, we can not convince systemd to change
> > >> > it:
> > >> > https://github.com/systemd/systemd/issues/16661
> > >> >
> > >> > Actually we have got reports about kdump kernel hangs in both s390x
> > >> > and powerpcle cases caused by the systemd change, also some x86 cases
> > >> > could also be caused by the same (although that is in Hyper-V code
> > >> > instead of systemd, that need to be addressed separately).
> > >
> > > Perhaps it may be better to fix the issus on s390x and PowerPC as well?
> > >
> > >> >
> > >> > Thus to avoid the auto enablement here just disable the param writable
> > >> > permission in sysfs.
> > >> >
> > >>
> > >> Well. I don't think this is at all a desirable way of resolving a
> > >> disagreement with the systemd developers
> > >>
> > >> At the above github address I'm seeing "ryncsn added a commit to
> > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > >> enable crash_kexec_post_notifiers by default". So didn't that address
> > >> the issue?
> > >
> > > It does in systemd, but there is a strong interest in making this on
> > > by default.
> >
> > There is also a strong interest in removing this code entirely from the
> > kernel.
>
> Added Hyper-V people and people who created the param, it is below
> commit, I also want to remove it if possible, let's see how people
> think, but the least way should be to disable the auto setting in both systemd
> and kernel:
>
> commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
> Author: Masami Hiramatsu <[email protected]>
> Date: Fri Jun 6 14:37:07 2014 -0700
>
> kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers
>
> Add a "crash_kexec_post_notifiers" boot option to run kdump after
> running panic_notifiers and dump kmsg. This can help rare situations
> where kdump fails because of unstable crashed kernel or hardware failure
> (memory corruption on critical data/code), or the 2nd kernel is already
> broken by the 1st kernel (it's a broken behavior, but who can guarantee
> that the "crashed" kernel works correctly?).
>
> Usage: add "crash_kexec_post_notifiers" to kernel boot option.
>
> Note that this actually increases risks of the failure of kdump. This
> option should be set only if you worry about the rare case of kdump
> failure rather than increasing the chance of success.


If this is such a risky knob, one that leads to bugs folks back away
from with disgust on their faces, then perhaps the only way to go about
this is to limit the exposure to known working situations on firmware
that we can control?

That is, enable only a subset of post notifiers, which determine for
themselves whether they are OK to run, i.e. whether the conditions are
blessed?

I think that would satisfy the conditions where you have to deal with
unsavory bugs that end up on your plate - and aren't fun because there is
no way of fixing them - while at the same time allowing multiple ways to
save the crash?

Please don't take away something that is quite useful in the field. Can we
hammer out something that will remove your pain points?
>
> >
> > This failure is a case in point.
> >
> > I think I am at my I told you so point. This is what all of the testing
> > over all the years has said. Leaving functionality to the peculiarities
> > of firmware when you don't have to, and can actually control what is
> > going on doesn't work.
> >
> > Eric
> >
> >
>
> Thanks
> Dave
>
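
One possible reading of the "subset of post notifiers" idea in the message above is sketched below as a userspace model. Nothing like this was posted in the thread: the safe_pre_kdump flag, the struct and the function names are all hypothetical, and a real implementation would have to fit the kernel's existing notifier API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical notifier entry. safe_pre_kdump says whether the callback
 * has been vetted to run before the jump into the kdump kernel. */
struct demo_notifier {
    const char *name;
    bool safe_pre_kdump;
    void (*fn)(void);
};

/* Demo callback that only counts how often it was invoked. */
static int demo_calls;

static void demo_cb(void)
{
    demo_calls++;
}

/*
 * Call the notifiers allowed in the current situation and return how many
 * were called. With kdump loaded only vetted entries run; without kdump
 * there is no dump to protect, so every entry runs.
 */
static size_t run_notifiers(const struct demo_notifier *list, size_t len,
                            bool kdump_loaded)
{
    size_t ran = 0;

    assert(list != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (kdump_loaded && !list[i].safe_pre_kdump)
            continue;
        if (!list[i].fn)
            continue;
        list[i].fn();
        ran++;
    }
    return ran;
}
```

With one vetted and one unvetted entry, run_notifiers() calls only the vetted one when kdump_loaded is true and both when it is false. The open question from the thread, who gets to mark a notifier as vetted, is not addressed by the sketch.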

2020-09-24 16:19:25

by Michael Kelley (LINUX)

Subject: RE: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

From: Konrad Rzeszutek Wilk <[email protected]> Sent: Wednesday, September 23, 2020 8:48 AM
>
> On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote:
> > + more people who may care about this param
>
> Paarty time!!
>
> (See below, didn't snip any comments)
> > On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
> > > Konrad Rzeszutek Wilk <[email protected]> writes:
> > >
> > > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
> > > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
> > > >>
> > > >> > crash_kexec_post_notifiers enables running various panic notifier
> > > >> > before kdump kernel booting. This increases risks of kdump failure.
> > > >> > It is well documented in kernel-parameters.txt. We do not suggest
> > > >> > people to enable it together with kdump unless he/she is really sure.
> > > >> > This is also not suggested to be enabled by default when users are
> > > >> > not aware in distributions.
> > > >> >
> > > >> > But unfortunately it is enabled by default in systemd, see below
> > > >> > discussions in a systemd report, we can not convince systemd to change
> > > >> > it:
> > > >> >
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fsyst
> emd%2Fsystemd%2Fissues%2F16661&amp;data=02%7C01%7Cmikelley%40microsoft.com%
> 7C3631bae06f7147c0f92908d85fd7f2b2%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%
> 7C637364728378052956&amp;sdata=9CUpPUxcKLLggbJ1bjubBjbFUAhPVeZhIc4yss8wAiU%3
> D&amp;reserved=0
> > > >> >
> > > >> > Actually we have got reports about kdump kernel hangs in both s390x
> > > >> > and powerpcle cases caused by the systemd change, also some x86 cases
> > > >> > could also be caused by the same (although that is in Hyper-V code
> > > >> > instead of systemd, that need to be addressed separately).
> > > >
> > > > Perhaps it may be better to fix the issus on s390x and PowerPC as well?
> > > >
> > > >> >
> > > >> > Thus to avoid the auto enablement here just disable the param writable
> > > >> > permission in sysfs.
> > > >> >
> > > >>
> > > >> Well. I don't think this is at all a desirable way of resolving a
> > > >> disagreement with the systemd developers
> > > >>
> > > >> At the above github address I'm seeing "ryncsn added a commit to
> > > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
> > > >> enable crash_kexec_post_notifiers by default". So didn't that address
> > > >> the issue?
> > > >
> > > > It does in systemd, but there is a strong interest in making this on
> > > > by default.
> > >
> > > There is also a strong interest in removing this code entirely from the
> > > kernel.
> >
> > Added Hyper-V people and people who created the param, it is below
> > commit, I also want to remove it if possible, let's see how people
> > think, but the least way should be to disable the auto setting in both systemd
> > and kernel:

Hyper-V uses a notifier to inform the host system that a Linux VM has
panic'ed. Informing the host is particularly important in a public cloud
such as Azure so that the cloud software can alert the customer, and can
track cloud-wide reliability statistics. Whether a kdump is taken is controlled
entirely by the customer and how he configures the VM, and we want
the host to be informed either way.

Michael

> >
> > commit f06e5153f4ae2e2f3b0300f0e260e40cb7fefd45
> > Author: Masami Hiramatsu <[email protected]>
> > Date: Fri Jun 6 14:37:07 2014 -0700
> >
> > kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers
> >
> > Add a "crash_kexec_post_notifiers" boot option to run kdump after
> > running panic_notifiers and dump kmsg. This can help rare situations
> > where kdump fails because of unstable crashed kernel or hardware failure
> > (memory corruption on critical data/code), or the 2nd kernel is already
> > broken by the 1st kernel (it's a broken behavior, but who can guarantee
> > that the "crashed" kernel works correctly?).
> >
> > Usage: add "crash_kexec_post_notifiers" to kernel boot option.
> >
> > Note that this actually increases risks of the failure of kdump. This
> > option should be set only if you worry about the rare case of kdump
> > failure rather than increasing the chance of success.
>
>
> If this is such risky knob that leads to bugs where folks are backing away
> from with disgust in their faces - then perhaps the only way to go about
> this is - limit the exposure to known working situations on firmware
> that we can control?
>
> That is enable only a subset of post notifiers which determine if they
> are OK running if the conditions are blessed?
>
> I think that would satisfy the conditions where you have to to deal with unsavory
> bugs that end up on your plate - and aren't fun because there is no
> way to fixing it - but at the same time allowing multiple ways to save the crash?
>
> Please don't take away something that is quite useful in the field. Can we
> hammer out something that will remove your pain points?
> >
> > >
> > > This failure is a case in point.
> > >
> > > I think I am at my I told you so point. This is what all of the testing
> > > over all the years has said. Leaving functionality to the peculiarities
> > > of firmware when you don't have to, and can actually control what is
> > > going on doesn't work.
> > >
> > > Eric
> > >
> > >
> >
> > Thanks
> > Dave
> >

2020-09-24 16:28:57

by Eric W. Biederman

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

Michael Kelley <[email protected]> writes:

> From: Konrad Rzeszutek Wilk <[email protected]> Sent: Wednesday, September 23, 2020 8:48 AM
>>
>> On Wed, Sep 23, 2020 at 10:43:29AM +0800, Dave Young wrote:
>> > + more people who may care about this param
>>
>> Paarty time!!
>>
>> (See below, didn't snip any comments)
>> > On 09/21/20 at 08:45pm, Eric W. Biederman wrote:
>> > > Konrad Rzeszutek Wilk <[email protected]> writes:
>> > >
>> > > > On Fri, Sep 18, 2020 at 05:47:43PM -0700, Andrew Morton wrote:
>> > > >> On Fri, 18 Sep 2020 11:25:46 +0800 Dave Young <[email protected]> wrote:
>> > > >>
>> > > >> > crash_kexec_post_notifiers enables running various panic notifier
>> > > >> > before kdump kernel booting. This increases risks of kdump failure.
>> > > >> > It is well documented in kernel-parameters.txt. We do not suggest
>> > > >> > people to enable it together with kdump unless he/she is really sure.
>> > > >> > This is also not suggested to be enabled by default when users are
>> > > >> > not aware in distributions.
>> > > >> >
>> > > >> > But unfortunately it is enabled by default in systemd, see below
>> > > >> > discussions in a systemd report, we can not convince systemd to change
>> > > >> > it:
>> > > >> > https://github.com/systemd/systemd/issues/16661
>> > > >> >
>> > > >> > Actually we have got reports about kdump kernel hangs in both s390x
>> > > >> > and powerpcle cases caused by the systemd change, also some x86 cases
>> > > >> > could also be caused by the same (although that is in Hyper-V code
>> > > >> > instead of systemd, that need to be addressed separately).
>> > > >
>> > > > Perhaps it may be better to fix the issus on s390x and PowerPC as well?
>> > > >
>> > > >> >
>> > > >> > Thus to avoid the auto enablement here just disable the param writable
>> > > >> > permission in sysfs.
>> > > >> >
>> > > >>
>> > > >> Well. I don't think this is at all a desirable way of resolving a
>> > > >> disagreement with the systemd developers
>> > > >>
>> > > >> At the above github address I'm seeing "ryncsn added a commit to
>> > > >> ryncsn/systemd that referenced this issue 9 days ago", "pstore: don't
>> > > >> enable crash_kexec_post_notifiers by default". So didn't that address
>> > > >> the issue?
>> > > >
>> > > > It does in systemd, but there is a strong interest in making this on
>> > > > by default.
>> > >
>> > > There is also a strong interest in removing this code entirely from the
>> > > kernel.
>> >
>> > Added Hyper-V people and people who created the param, it is below
>> > commit, I also want to remove it if possible, let's see how people
>> > think, but the least way should be to disable the auto setting in both systemd
>> > and kernel:
>
> Hyper-V uses a notifier to inform the host system that a Linux VM has
> panic'ed. Informing the host is particularly important in a public cloud
> such as Azure so that the cloud software can alert the customer, and can
> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> entirely by the customer and how he configures the VM, and we want
> the host to be informed either way.

Why?

Why does the host care?
Especially if the VM continues executing into a kdump kernel?

Further, as I have mentioned every time something like this has come up,
a call on the kexec-on-panic code path should be a direct call (that can
be audited), not something hidden in a notifier call chain (which cannot).

Eric

2020-09-24 16:46:39

by Michael Kelley (LINUX)

Subject: RE: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
>
> Michael Kelley <[email protected]> writes:
>
> >> >
> >> > Added Hyper-V people and people who created the param, it is below
> >> > commit, I also want to remove it if possible, let's see how people
> >> > think, but the least way should be to disable the auto setting in both systemd
> >> > and kernel:
> >
> > Hyper-V uses a notifier to inform the host system that a Linux VM has
> > panic'ed. Informing the host is particularly important in a public cloud
> > such as Azure so that the cloud software can alert the customer, and can
> > track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > entirely by the customer and how he configures the VM, and we want
> > the host to be informed either way.
>
> Why?
>
> Why does the host care?
> Especially if the VM continues executing into a kdump kernel?

The host itself doesn't care. But the host is a convenient out-of-band
channel for recording that a panic has occurred and to collect basic data
about the panic. This out-of-band channel is then used to notify the end
customer that his VM has panic'ed. Sure, the customer should be running
his own monitoring software, but customers don't always do what they
should. Equally important, the out-of-band channel allows the cloud
infrastructure software to notice trends, such as that the rate of Linux
panics has increased, and that perhaps there is a cloud problem that
should be investigated.

>
> Further like I have mentioned everytime something like this has come up
> a call on the kexec on panic code path should be a direct call (That can
> be audited) not something hidden in a notifier call chain (which can not).
>

The use case I describe has no particular requirement that it be
implemented via the notifier call chain. If there's a better way to run
some out-of-band notification code on all Linux panics regardless of
whether a kdump is taken, we're open to such an alternative.
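
The contrast between a notifier call chain and a direct call can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct notifier`, `chain_register`, `chain_call`, `CHAIN_CONTINUE`, `CHAIN_STOP` and the three callbacks are hypothetical names, not the kernel's notifier API. The model assumes a priority-ordered chain in which a callback may stop the traversal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this model (not the kernel's values). */
#define CHAIN_CONTINUE 0
#define CHAIN_STOP     1

/* A tiny log so the order of calls can be inspected afterwards. */
struct log {
    char buf[16];
    size_t len;
};

static void log_add(struct log *l, char c)
{
    assert(l != NULL);
    if (l->len + 1 < sizeof l->buf) {
        l->buf[l->len++] = c;
        l->buf[l->len] = '\0';
    }
}

/* One entry in the chain; higher priority runs first. */
struct notifier {
    int (*call)(void *data);
    int priority;
    struct notifier *next;
};

/* Insert n before the first entry with a lower priority. */
static void chain_register(struct notifier **head, struct notifier *n)
{
    while (*head && (*head)->priority >= n->priority)
        head = &(*head)->next;
    n->next = *head;
    *head = n;
}

/* Walk the chain; returns how many callbacks were invoked. */
static int chain_call(struct notifier *head, void *data)
{
    int called = 0;

    for (struct notifier *n = head; n; n = n->next) {
        called++;
        if (n->call(data) == CHAIN_STOP)
            break;
    }
    return called;
}

/* Example callbacks: each one only records a letter in the log. */
static int notify_host(void *data)   { log_add(data, 'H'); return CHAIN_CONTINUE; }
static int dump_fallback(void *data) { log_add(data, 'D'); return CHAIN_STOP; }
static int led_blink(void *data)     { log_add(data, 'L'); return CHAIN_CONTINUE; }

/* Direct-call variant: the exact steps are visible at the call site. */
static void panic_direct(struct log *l)
{
    notify_host(l);
}
```

In this model, what `chain_call` actually runs depends on which callbacks were registered at runtime and with which priority, so reading the call site alone does not tell you what executes. In `panic_direct` the list of steps is fixed in the source, which is the auditability argument made above.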

Michael

2020-09-24 17:18:33

by Boris Ostrovsky

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time


On 9/24/20 12:43 PM, Michael Kelley wrote:
> From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
>> Michael Kelley <[email protected]> writes:
>>
>>>>> Added Hyper-V people and people who created the param, it is below
>>>>> commit, I also want to remove it if possible, let's see how people
>>>>> think, but the least way should be to disable the auto setting in both systemd
>>>>> and kernel:
>>> Hyper-V uses a notifier to inform the host system that a Linux VM has
>>> panic'ed. Informing the host is particularly important in a public cloud
>>> such as Azure so that the cloud software can alert the customer, and can
>>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
>>> entirely by the customer and how he configures the VM, and we want
>>> the host to be informed either way.
>> Why?
>>
>> Why does the host care?
>> Especially if the VM continues executing into a kdump kernel?
> The host itself doesn't care. But the host is a convenient out-of-band
> channel for recording that a panic has occurred and to collect basic data
> about the panic. This out-of-band channel is then used to notify the end
> customer that his VM has panic'ed. Sure, the customer should be running
> his own monitoring software, but customers don't always do what they
> should. Equally important, the out-of-band channel allows the cloud
> infrastructure software to notice trends, such as that the rate of Linux
> panics has increased, and that perhaps there is a cloud problem that
> should be investigated.


In many cases (especially in cloud environments) your dump device is remote (e.g. iSCSI) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be the cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.


>
>> Further like I have mentioned everytime something like this has come up
>> a call on the kexec on panic code path should be a direct call (That can
>> be audited) not something hidden in a notifier call chain (which can not).
>>

By the way, we already have a direct call from panic() to kmsg_dump(), which is indirectly controlled by crash_kexec_post_notifiers, and it would be preferable to be able to call it before kdump as well.


-boris


> The use case I describe has no particular requirement that it be
> implemented via the notifier call chain. If there's a better way to run
> some out-of-band notification code on all Linux panics regardless of
> whether a kdump is taken, we're open to such an alternative.

2020-09-25 03:10:20

by Dave Young

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

Hi,

On 09/24/20 at 01:16pm, [email protected] wrote:
>
> On 9/24/20 12:43 PM, Michael Kelley wrote:
> > From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
> >> Michael Kelley <[email protected]> writes:
> >>
> >>>>> Added Hyper-V people and people who created the param, it is below
> >>>>> commit, I also want to remove it if possible, let's see how people
> >>>>> think, but the least way should be to disable the auto setting in both systemd
> >>>>> and kernel:
> >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> >>> panic'ed. Informing the host is particularly important in a public cloud
> >>> such as Azure so that the cloud software can alert the customer, and can
> >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> >>> entirely by the customer and how he configures the VM, and we want
> >>> the host to be informed either way.
> >> Why?
> >>
> >> Why does the host care?
> >> Especially if the VM continues executing into a kdump kernel?
> > The host itself doesn't care. But the host is a convenient out-of-band
> > channel for recording that a panic has occurred and to collect basic data
> > about the panic. This out-of-band channel is then used to notify the end
> > customer that his VM has panic'ed. Sure, the customer should be running
> > his own monitoring software, but customers don't always do what they
> > should. Equally important, the out-of-band channel allows the cloud
> > infrastructure software to notice trends, such as that the rate of Linux
> > panics has increased, and that perhaps there is a cloud problem that
> > should be investigated.
>
>
> In many cases (especially in cloud environment) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.

That can probably be done in the kdump kernel if it is really needed, say
by informing the host that a panic happened and a kdump kernel is running.

But I think setting crash_kexec_post_notifiers by default is still bad.

>
>
> >
> >> Further like I have mentioned everytime something like this has come up
> >> a call on the kexec on panic code path should be a direct call (That can
> >> be audited) not something hidden in a notifier call chain (which can not).
> >>
>
> We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.

Right, that is the same thing we are talking about.

Thanks
Dave

2020-09-25 14:58:54

by Konrad Rzeszutek Wilk

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> Hi,
>
> On 09/24/20 at 01:16pm, [email protected] wrote:
> >
> > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
> > >> Michael Kelley <[email protected]> writes:
> > >>
> > >>>>> Added Hyper-V people and people who created the param, it is below
> > >>>>> commit, I also want to remove it if possible, let's see how people
> > >>>>> think, but the least way should be to disable the auto setting in both systemd
> > >>>>> and kernel:
> > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > >>> panic'ed. Informing the host is particularly important in a public cloud
> > >>> such as Azure so that the cloud software can alert the customer, and can
> > >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > >>> entirely by the customer and how he configures the VM, and we want
> > >>> the host to be informed either way.
> > >> Why?
> > >>
> > >> Why does the host care?
> > >> Especially if the VM continues executing into a kdump kernel?
> > > The host itself doesn't care. But the host is a convenient out-of-band
> > > channel for recording that a panic has occurred and to collect basic data
> > > about the panic. This out-of-band channel is then used to notify the end
> > > customer that his VM has panic'ed. Sure, the customer should be running
> > > his own monitoring software, but customers don't always do what they
> > > should. Equally important, the out-of-band channel allows the cloud
> > > infrastructure software to notice trends, such as that the rate of Linux
> > > panics has increased, and that perhaps there is a cloud problem that
> > > should be investigated.
> >
> >
> > In many cases (especially in cloud environment) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
>
> That can probably be done in kdump kernel if it is really needed. Say
> > informing host that panic happened and a kdump kernel is running.

If the kdump kernel gets to that point. Sometimes (sadly) it ends up being
misconfigured and it chokes - and hence having multiple ways to emit
the crash information before running the kdump kernel is a life-saver.

>
> But I think to set crash_kexec_post_notifiers by default is still bad.

Because of the way it is run today, I presume? If there was some
safe/unsafe policy, that should work, right? I would think that the
safe ones that work properly all the time are:

- Hyper-V CRASH_MSRs,
- KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
- pstore EFI variables,
- Dumping in memory.

And then some that depend on the firmware version (aka BIOS, and vendor) are:
- ACPI ERST.

And then the unsafe:
- s390, PowerPC (I don't actually know what they are but that
was Dave's primary motivator).

>
> >
> >
> > >
> > >> Further like I have mentioned everytime something like this has come up
> > >> a call on the kexec on panic code path should be a direct call (That can
> > >> be audited) not something hidden in a notifier call chain (which can not).
> > >>
> >
> > We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
>
> Right, that is the same thing we are talking about.
>
> Thanks
> Dave
>

2020-09-27 02:54:36

by Dave Young

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

Hi,

On 09/25/20 at 10:56am, Konrad Rzeszutek Wilk wrote:
> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> > Hi,
> >
> > On 09/24/20 at 01:16pm, [email protected] wrote:
> > >
> > > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > > From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
> > > >> Michael Kelley <[email protected]> writes:
> > > >>
> > > >>>>> Added Hyper-V people and people who created the param, it is below
> > > >>>>> commit, I also want to remove it if possible, let's see how people
> > > >>>>> think, but the least way should be to disable the auto setting in both systemd
> > > >>>>> and kernel:
> > > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > > >>> panic'ed. Informing the host is particularly important in a public cloud
> > > >>> such as Azure so that the cloud software can alert the customer, and can
> > > >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > > >>> entirely by the customer and how he configures the VM, and we want
> > > >>> the host to be informed either way.
> > > >> Why?
> > > >>
> > > >> Why does the host care?
> > > >> Especially if the VM continues executing into a kdump kernel?
> > > > The host itself doesn't care. But the host is a convenient out-of-band
> > > > channel for recording that a panic has occurred and to collect basic data
> > > > about the panic. This out-of-band channel is then used to notify the end
> > > > customer that his VM has panic'ed. Sure, the customer should be running
> > > > his own monitoring software, but customers don't always do what they
> > > > should. Equally important, the out-of-band channel allows the cloud
> > > > infrastructure software to notice trends, such as that the rate of Linux
> > > > panics has increased, and that perhaps there is a cloud problem that
> > > > should be investigated.
> > >
> > >
> > > In many cases (especially in cloud environment) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
> >
> > That can probably be done in kdump kernel if it is really needed. Say
> > > informing host that panic happened and a kdump kernel is running.
>
> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
> misconfigured and it chokes up - and hence having multiple ways to emit
> the crash information before running kdump kernel is a life-saver.

If it is done in the kdump kernel's boot phase, before PID 1 comes up, then
things should be good enough, specifically for KVM/Hyper-V guests.

>
> >
> > But I think to set crash_kexec_post_notifiers by default is still bad.
>
> Because of the way it is run today I presume? If there was some
> safe/unsafe policy that should work right? I would think that the
> safe ones that work properly all the time are:
>
> - HyperV CRASH_MSRs,
> - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
> - pstore EFI variables
> - Dumping in memory,
>
> And then some that depend on firmware version (aka BIOS, and vendor) are:
> - ACPI ERST,
>
> And then the unsafe:
> - s390, PowerPC (I don't actually know what they are but that
> was Dave's primary motivator).

As I said, we also got reports of kdump kernel hangs with Hyper-V with
crash_kexec_post_notifiers enabled.

EFI pstore also depends on the EFI runtime, which is in firmware, and we
cannot ensure it works well after a panic has happened. Ditto for other
pstore backends: we prefer not to run them before kdump. But as I said, I'm
not saying they are not useful; people can use them by their own choice.

As for the virtual machine panic events, maybe it is OK to add some other
hooks instead of the notifiers. But frankly I still feel it is better to do
it in the kdump kernel boot path, since kdump works well for virt in our
experience.

>
> >
> > >
> > >
> > > >
> > > >> Further like I have mentioned everytime something like this has come up
> > > >> a call on the kexec on panic code path should be a direct call (That can
> > > >> be audited) not something hidden in a notifier call chain (which can not).
> > > >>
> > >
> > > We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
> >
> > Right, that is the same thing we are talking about.
> >
> > Thanks
> > Dave
> >
>

Thanks
Dave

2020-09-29 13:39:39

by Philipp Rudo

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

Hi,

On Fri, 25 Sep 2020 10:56:25 -0400
Konrad Rzeszutek Wilk <[email protected]> wrote:

> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
> > Hi,
> >
> > On 09/24/20 at 01:16pm, [email protected] wrote:
> > >
> > > On 9/24/20 12:43 PM, Michael Kelley wrote:
> > > > From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
> > > >> Michael Kelley <[email protected]> writes:
> > > >>
> > > >>>>> Added Hyper-V people and people who created the param, it is below
> > > >>>>> commit, I also want to remove it if possible, let's see how people
> > > >>>>> think, but the least way should be to disable the auto setting in both systemd
> > > >>>>> and kernel:
> > > >>> Hyper-V uses a notifier to inform the host system that a Linux VM has
> > > >>> panic'ed. Informing the host is particularly important in a public cloud
> > > >>> such as Azure so that the cloud software can alert the customer, and can
> > > >>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
> > > >>> entirely by the customer and how he configures the VM, and we want
> > > >>> the host to be informed either way.
> > > >> Why?
> > > >>
> > > >> Why does the host care?
> > > >> Especially if the VM continues executing into a kdump kernel?
> > > > The host itself doesn't care. But the host is a convenient out-of-band
> > > > channel for recording that a panic has occurred and to collect basic data
> > > > about the panic. This out-of-band channel is then used to notify the end
> > > > customer that his VM has panic'ed. Sure, the customer should be running
> > > > his own monitoring software, but customers don't always do what they
> > > > should. Equally important, the out-of-band channel allows the cloud
> > > > infrastructure software to notice trends, such as that the rate of Linux
> > > > panics has increased, and that perhaps there is a cloud problem that
> > > > should be investigated.
> > >
> > >
> > > In many cases (especially in cloud environment) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
> >
> > That can probably be done in kdump kernel if it is really needed. Say
> > > informing host that panic happened and a kdump kernel is running.
>
> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
> misconfigured and it chokes up - and hence having multiple ways to emit
> the crash information before running kdump kernel is a life-saver.
>
> >
> > But I think to set crash_kexec_post_notifiers by default is still bad.
>
> Because of the way it is run today I presume? If there was some
> safe/unsafe policy that should work right? I would think that the
> safe ones that work properly all the time are:
>
> - HyperV CRASH_MSRs,
> - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
> - pstore EFI variables
> - Dumping in memory,
>
> And then some that depend on firmware version (aka BIOS, and vendor) are:
> - ACPI ERST,
>
> And then the unsafe:
> - s390, PowerPC (I don't actually know what they are but that
> was Dave's primary motivator).

That won't work on s390. Let me emphasize that the problems on s390 are not the
notifiers themselves but the fact that they are called before crash_kexec.

On s390 we have multiple dump methods besides kdump. We use a panic notifier to
trigger these dump methods from the panicking kernel. The problem is that these
dump methods are less powerful than kdump so we only want to use them as
fallback, i.e. only use them when either kdump wasn't configured or loading of
the crash kernel failed for whatever reason. That's why (plus historic reasons)
our notifier stops the machine when it is called and none of the methods is
configured, which means that the second crash_kexec is never reached.

Long story short, the problem on s390 is caused by the two hunks in
kernel/panic.c:panic from f06e5153f4ae ("kernel/panic.c: add
"crash_kexec_post_notifiers" option for kdump after panic_notifers").

Besides the problems on s390 I support Dave and think that setting
crash_kexec_post_notifiers by default is wrong. We should keep in mind that
we are in a panic situation. This means that the kernel is in a state where it
doesn't trust itself anymore. So we should keep the code that is run to the
bare minimum as we cannot rely on it to work properly.
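
The ordering problem described above can be illustrated with a small userspace model. This is only a sketch, not kernel code: `model_panic`, `struct panic_trace` and the step names are hypothetical. It assumes the flow discussed in this thread, where crash_kexec is attempted before the panic notifiers when crash_kexec_post_notifiers is off and after them when it is on, and where a notifier may stop the machine as described for s390.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Steps recorded by the model, in the order they ran. */
enum step {
    STEP_CRASH_KEXEC,   /* attempt to jump to the crash kernel */
    STEP_NOTIFIERS,     /* panic notifier chain is called */
    STEP_KMSG_DUMP,     /* kmsg_dump() is called */
    STEP_HALTED         /* a notifier stopped the machine */
};

struct panic_trace {
    enum step steps[8];
    size_t count;
};

static void record(struct panic_trace *t, enum step s)
{
    if (t->count < sizeof(t->steps) / sizeof(t->steps[0]))
        t->steps[t->count++] = s;
}

/*
 * Model of the ordering in the panic path.
 *
 * post_notifiers: value of crash_kexec_post_notifiers
 * kdump_loaded:   whether a crash kernel is loaded
 * notifier_halts: whether a notifier stops the machine
 *
 * Returns true if the model reached the crash kernel.
 */
static bool model_panic(bool post_notifiers, bool kdump_loaded,
                        bool notifier_halts, struct panic_trace *t)
{
    assert(t != NULL);
    t->count = 0;

    /* Default: try kdump first, before any notifier runs. */
    if (!post_notifiers) {
        record(t, STEP_CRASH_KEXEC);
        if (kdump_loaded)
            return true;
    }

    record(t, STEP_NOTIFIERS);
    if (notifier_halts) {
        /* The later crash_kexec attempt is never reached. */
        record(t, STEP_HALTED);
        return false;
    }

    record(t, STEP_KMSG_DUMP);

    /* With the option set, kdump is only tried at this point. */
    if (post_notifiers) {
        record(t, STEP_CRASH_KEXEC);
        if (kdump_loaded)
            return true;
    }
    return false;
}
```

With `post_notifiers` false and a crash kernel loaded, the trace holds only the crash_kexec step. With `post_notifiers` true and a halting notifier, the trace ends in `STEP_HALTED` and the function returns false even though a crash kernel is loaded, which is the hang scenario in this model.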

Thanks
Philipp

>
> >
> > >
> > >
> > > >
> > > >> Further like I have mentioned everytime something like this has come up
> > > >> a call on the kexec on panic code path should be a direct call (That can
> > > >> be audited) not something hidden in a notifier call chain (which can not).
> > > >>
> > >
> > > We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
> >
> > Right, that is the same thing we are talking about.
> >
> > Thanks
> > Dave
> >
>
> _______________________________________________
> kexec mailing list
> [email protected]
> http://lists.infradead.org/mailman/listinfo/kexec

2020-09-29 19:12:58

by Boris Ostrovsky

Subject: Re: [PATCH] Only allow to set crash_kexec_post_notifiers on boot time

+Lennart


On 9/29/20 9:36 AM, Philipp Rudo wrote:
> Hi,
>
> On Fri, 25 Sep 2020 10:56:25 -0400
> Konrad Rzeszutek Wilk <[email protected]> wrote:
>
>> On Fri, Sep 25, 2020 at 11:05:58AM +0800, Dave Young wrote:
>>> Hi,
>>>
>>> On 09/24/20 at 01:16pm, [email protected] wrote:
>>>> On 9/24/20 12:43 PM, Michael Kelley wrote:
>>>>> From: Eric W. Biederman <[email protected]> Sent: Thursday, September 24, 2020 9:26 AM
>>>>>> Michael Kelley <[email protected]> writes:
>>>>>>
>>>>>>>>> Added Hyper-V people and people who created the param, it is below
>>>>>>>>> commit, I also want to remove it if possible, let's see how people
>>>>>>>>> think, but the least way should be to disable the auto setting in both systemd
>>>>>>>>> and kernel:
>>>>>>> Hyper-V uses a notifier to inform the host system that a Linux VM has
>>>>>>> panic'ed. Informing the host is particularly important in a public cloud
>>>>>>> such as Azure so that the cloud software can alert the customer, and can
>>>>>>> track cloud-wide reliability statistics. Whether a kdump is taken is controlled
>>>>>>> entirely by the customer and how he configures the VM, and we want
>>>>>>> the host to be informed either way.
>>>>>> Why?
>>>>>>
>>>>>> Why does the host care?
>>>>>> Especially if the VM continues executing into a kdump kernel?
>>>>> The host itself doesn't care. But the host is a convenient out-of-band
>>>>> channel for recording that a panic has occurred and to collect basic data
>>>>> about the panic. This out-of-band channel is then used to notify the end
>>>>> customer that his VM has panic'ed. Sure, the customer should be running
>>>>> his own monitoring software, but customers don't always do what they
>>>>> should. Equally important, the out-of-band channel allows the cloud
>>>>> infrastructure software to notice trends, such as that the rate of Linux
>>>>> panics has increased, and that perhaps there is a cloud problem that
>>>>> should be investigated.
>>>>
>>>> In many cases (especially in cloud environment) your dump device is remote (e.g. iscsi) and kdump sometimes (often?) gets stuck because of connectivity issues (which could be cause of the panic in the first place). So it is quite desirable to inform the infrastructure that the VM is on its way out without waiting for kdump to complete.
>>> That can probably be done in kdump kernel if it is really needed. Say
>>> informing host that panic happened and a kdump kernel is running.
>> If kdump kernel gets to that point. Sometimes (sadly) it ends up being
>> misconfigured and it chokes up - and hence having multiple ways to emit
>> the crash information before running kdump kernel is a life-saver.
>>
>>> But I think to set crash_kexec_post_notifiers by default is still bad.
>> Because of the way it is run today I presume? If there was some
>> safe/unsafe policy that should work right? I would think that the
>> safe ones that work properly all the time are:
>>
>> - HyperV CRASH_MSRs,
>> - KVM PVPANIC_[PANIC,CRASHLOAD] push button knob,
>> - pstore EFI variables
>> - Dumping in memory,
>>
>> And then some that depend on firmware version (aka BIOS, and vendor) are:
>> - ACPI ERST,
>>
>> And then the unsafe:
>> - s390, PowerPC (I don't actually know what they are but that
>> was Dave's primary motivator).
> that won't work on s390. Let me emphasize that the problems on s390 are not the
> notifiers themselves but the fact that they are called before crash_kexec.
>
> On s390 we have multiple dump methods besides kdump. We use a panic notifier to
> trigger these dump methods from the panicking kernel. The problem is that these
> dump methods are less powerful than kdump so we only want to use them as
> fallback, i.e. only use them when either kdump wasn't configured or loading of
> the crash kernel failed for whatever reason. That's why (plus historic reasons)
> our notifier stops the machine when it is called and none of the methods is
> configured. Which means that the second crash_kexec is never reached.
>
> Long story short, the problem on s390 is caused by the two hunks in
> kernel/panic.c:panic from f06e5153f4ae ("kernel/panic.c: add
> "crash_kexec_post_notifiers" option for kdump after panic_notifers").
>
> Besides the problems on s390 I support Dave and think that setting
> crash_kexec_post_notifiers by default is wrong. We should keep in mind that
> we are in a panic situation. This means that the kernel is in a state where it
> doesn't trust itself anymore. So we should keep the code that is run to the
> bare minimum as we cannot rely on it to work properly.


There is a pending patch to revert notifiers' default in systemd: https://github.com/systemd/systemd/pull/16950


If this change goes through then Dave's patch will be unnecessary.
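
To see whether something has enabled the option on a running system, one could read the parameter back from sysfs. The following is a minimal sketch, assuming the parameter is exposed at `/sys/module/kernel/parameters/crash_kexec_post_notifiers` and that a bool parameter reads back as `Y` or `N`; the helper names `parse_bool_param` and `read_bool_param` are hypothetical.

```c
#include <stdio.h>

/* Assumed sysfs location of the parameter. */
#define PARAM_PATH "/sys/module/kernel/parameters/crash_kexec_post_notifiers"

/*
 * Parse the text shown for a bool parameter.
 * Returns 1 for enabled, 0 for disabled, -1 if the text is not recognized.
 */
static int parse_bool_param(const char *text)
{
    if (!text)
        return -1;

    switch (text[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1;
    }
}

/*
 * Read a bool parameter from the given path.
 * Returns 1, 0, or -1 if the file cannot be read or parsed.
 */
static int read_bool_param(const char *path)
{
    char buf[8] = "";
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_bool_param(buf);
}
```

Calling `read_bool_param(PARAM_PATH)` returns 1 if the option is currently on. Reading works with either file mode discussed in this thread; with mode 0444 as in Dave's patch, the value could only be set on the kernel command line and not written at runtime.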


-boris



>
> Thanks
> Philipp
>
>>>
>>>>
>>>>>
>>>>>> Further like I have mentioned everytime something like this has come up
>>>>>> a call on the kexec on panic code path should be a direct call (That can
>>>>>> be audited) not something hidden in a notifier call chain (which can not).
>>>>>>
>>>> We btw already have a direct call from panic() to kmsg_dump() which is indirectly controlled by crash_kexec_post_notifiers, and it would also be preferable to be able to call it before kdump as well.
>>> Right, that is the same thing we are talking about.
>>>
>>> Thanks
>>> Dave
>>>