LinuxLists.cc - [RFC V2 PATCH 0/1] kexec: crash_kexec_post

2015-07-24 01:21:13

Subject: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

This is a bugfix patch for crash_kexec_post_notifiers boot option
which allows users to call panic notifiers and kmsg dumpers before
kdump.

This fixes one of the problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

Problem 1:
If crash_kexec_post_notifiers boot option is specified, some
shutting down process which assume other cpus are still alive
don't work properly.

Problem 2 (addressed by this patch):
If crash_kexec_post_notifiers boot option is specified, register
information of other cpus are not saved to crash dumps.

Following Vivek's opinion, this patch replaces smp_send_stop()
in panic() with suitable version for crash_kexec which saves
cpu states and other things like cleaning up VMX/SVM. Since this
needs architecture specific implementation and it's not so trivial,
this version only support for x86. So the problem 1, known to
happen on MIPS/OCTEON, is not addressed now.

To keep the modification impact low, this patch doesn't change
the logic basically if crash_kexec_post_notifiers is not specified.

Please note that crash_kexec() can be called directly without
entering panic(). Stopping other cpus functionality is still
needed in crash_kexec().

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
saves cpu states and does cleanups instead of changing execution
flow
- Drop a fix for Problem 1
- Drop other patches because they aren't needed anymore

V1: https://lkml.org/lkml/2015/7/10/316

---

Hidehiro Kawai (1):
panic/x86: Replace smp_send_stop() with crash_kexec version

arch/x86/kernel/crash.c | 16 +++++++++++-----
kernel/panic.c | 29 +++++++++++++++++++++++------
2 files changed, 34 insertions(+), 11 deletions(-)

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

2015-07-24 01:21:25

by Hidehiro Kawai

[permalink] [raw]

Subject: [RFC V2 PATCH 1/1] panic/x86: Replace smp_send_stop() with crash_kexec version

This patch fixes one of the problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44).

If "crash_kexec_post_notifiers" boot option is specified,
other cpus are stopped by smp_send_stop() before entering
crash_kexec(), while usually machine_crash_shutdown() called by
crash_kexec() does that. This behavior change leads two problems.

Problem 1:
Some functions in the crash_kexec() path depend on other cpus being
still online. If other cpus have been offlined already, they
doesn't work properly.

Example (MIPS OCTEON case):
panic()
crash_kexec()
machine_crash_shutdown()
octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus
machine_kexec()

Problem 2:
Most of architectures stop other cpus in the machine_crash_shutdown()
path and save register information at that time. However, if
smp_send_stop() is called before that, we can't save the register
information.

This patch solves the problem 2 by replacing smp_send_stop() in
panic() with panic_smp_stop_cpus() which is a weak function and can be
replaced with suitable version for crash_kexec context. In fact,
x86 replaces it with a function based on kdump_nmi_shootdown_cpus() to
stop other cpus and save their states.

Please note that crash_kexec() can be called directly without
entering panic(). A stop-other-cpus procedure is still needed
by crash_kexec().

Changes in V2:
- Replace smp_send_stop() call with crash_kexec version which
saves cpu states and cleans up VMX/SVM
- Drop a fix for Problem 1 at this moment

Reported-by: Daniel Walker <[email protected]>
Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option
Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
arch/x86/kernel/crash.c | 16 +++++++++++-----
kernel/panic.c | 29 +++++++++++++++++++++++------
2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e068d66..913c621 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -130,16 +130,22 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
disable_local_APIC();
}

-static void kdump_nmi_shootdown_cpus(void)
+/* Please see the comment on the weak version in kernel/panic.c */
+void panic_smp_stop_cpus(void)
{
+ static int cpus_stopped;
+
in_crash_kexec = 1;
- nmi_shootdown_cpus(kdump_nmi_callback);

- disable_local_APIC();
+ if (!cpus_stopped) {
+ nmi_shootdown_cpus(kdump_nmi_callback);
+ disable_local_APIC();
+ cpus_stopped = 1;
+ }
}

#else
-static void kdump_nmi_shootdown_cpus(void)
+void panic_smp_stop_cpus(void)
{
/* There are no cpus to shootdown */
}
@@ -158,7 +164,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();

- kdump_nmi_shootdown_cpus();
+ panic_smp_stop_cpus();

/*
* VMCLEAR VMCSs loaded on this cpu if needed.
diff --git a/kernel/panic.c b/kernel/panic.c
index 04e91ff..a507637 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -60,6 +60,28 @@ void __weak panic_smp_self_stop(void)
cpu_relax();
}

+/*
+ * Stop other cpus in panic. Architecture code may override this to
+ * with more suitable version. Moreover, if the architecture supports
+ * crash dump, it should also save the states of stopped cpus.
+ *
+ * This function should be called only once.
+ */
+void __weak panic_smp_stop_cpus(void)
+{
+ static int cpus_stopped;
+
+ if (!cpus_stopped) {
+ /*
+ * Note smp_send_stop is the usual smp shutdown function,
+ * which unfortunately means it may not be hardened to
+ * work in a panic situation.
+ */
+ smp_send_stop();
+ cpus_stopped = 1;
+ }
+}
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -120,12 +142,7 @@ void panic(const char *fmt, ...)
if (!crash_kexec_post_notifiers)
crash_kexec(NULL);

- /*
- * Note smp_send_stop is the usual smp shutdown function, which
- * unfortunately means it may not be hardened to work in a panic
- * situation.
- */
- smp_send_stop();
+ panic_smp_stop_cpus();

/*
* Run any panic handlers, including those that might need to

2015-08-03 11:06:13

by Hidehiro Kawai

[permalink] [raw]

Subject: Re: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

Hello Eric and Vivek,

Do you have any comments?

(2015/07/24 10:16), Hidehiro Kawai wrote:
> This is a bugfix patch for crash_kexec_post_notifiers boot option
> which allows users to call panic notifiers and kmsg dumpers before
> kdump.
>
> This fixes one of the problems reported by Daniel Walker
> (https://lkml.org/lkml/2015/6/24/44).
>
> Problem 1:
> If crash_kexec_post_notifiers boot option is specified, some
> shutting down process which assume other cpus are still alive
> don't work properly.
>
> Problem 2 (addressed by this patch):
> If crash_kexec_post_notifiers boot option is specified, register
> information of other cpus are not saved to crash dumps.
>
> Following Vivek's opinion, this patch replaces smp_send_stop()
> in panic() with suitable version for crash_kexec which saves
> cpu states and other things like cleaning up VMX/SVM. Since this
> needs architecture specific implementation and it's not so trivial,
> this version only support for x86. So the problem 1, known to
> happen on MIPS/OCTEON, is not addressed now.
>
> To keep the modification impact low, this patch doesn't change
> the logic basically if crash_kexec_post_notifiers is not specified.
>
> Please note that crash_kexec() can be called directly without
> entering panic(). Stopping other cpus functionality is still
> needed in crash_kexec().
>
> Changes in V2:
> - Replace smp_send_stop() call with crash_kexec version which
> saves cpu states and does cleanups instead of changing execution
> flow
> - Drop a fix for Problem 1
> - Drop other patches because they aren't needed anymore
>
> V1: https://lkml.org/lkml/2015/7/10/316
>
> ---
>
> Hidehiro Kawai (1):
> panic/x86: Replace smp_send_stop() with crash_kexec version
>
>
> arch/x86/kernel/crash.c | 16 +++++++++++-----
> kernel/panic.c | 29 +++++++++++++++++++++++------
> 2 files changed, 34 insertions(+), 11 deletions(-)
>
>

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

2015-08-03 16:39:53

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

Hidehiro Kawai <[email protected]> writes:

> Hello Eric and Vivek,
>
> Do you have any comments?

crash_kexec_post_notifiers is a debugging hack to allow people to test
if the kmsg_dump works better than kexec. crash_kexec_post_notifiers is
not, nor has it ever been a solution for general operation (which is
what I perceive this work trying to push).

I will not support any work that expands crash_kexec_post_notifiers to
be more than it currently is, because people want ``panic hooks'' to
run before kexec. That appropach was extensively tested before kexec on
panic was implemented in the kernel and every implementation failed.
The practical symptom was that everything would work ok in testing but
on failures in the real world there would be enough going on in the
dying kernel that no crash dump would be taken. kexec on panic on the
other hand works a reasonable fraction of the time.

I deeply and fundamentally can not support a general purpose hook being
called before kexec. In 15 years of practice I have never heard of a
case where using a general purpose hook does anything but make kexec on
panic undebuggable in practice.

A specific hook for a very specific purpose when there is no other way
we can consider.

If you don't have something that generalises well into a general purpose
operation that it makes sense for everyone to call you can always use
the world's largest aka you can run code before the new kernel starts
that is loaded with kexec_load.

If you absolutely must run code in the dying code because you need lots
of the kernel infrastructure to work, and it is too hard to code a small
little bit of stand-alone assembly, I am sorry for you. Experience shows
that will never work when the kernel fails in interesting ways.

So no. I don't think there is any point to putting any more effort into
the crash_kexec_post_notifiers path because experience has shown over
the years that in practice it won't work for anyone, and if the code
doesn't work in practice there is no point in developing or implementing
it.

Eric

2015-08-04 11:41:20

by Hidehiro Kawai

[permalink] [raw]

Subject: RE: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

Hello,

Thanks for the reply.

> From: Eric W. Biederman [mailto:[email protected]]
[...]
> A specific hook for a very specific purpose when there is no other way
> we can consider.

So, is kmsg_dump like feature admissible?

> If you don't have something that generalises well into a general purpose
> operation that it makes sense for everyone to call you can always use
> the world's largest aka you can run code before the new kernel starts
> that is loaded with kexec_load.

One of our purposes, notifying "I'm dying", would be achieved by purgatory
code provided by kexec command as I stated before. Since the way of the
notification will differ from each vendor, I think we need to modify
the purgatory codes pluggable. Also, I think we need some parameter
passing mechanism to the purgatory code. For example, passing the panic
message via boot parameter to save it to SEL. Although I'm not sure
we can do that (I've not investigated well yet). Is that acceptable?

Regards,
Kawai

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?

2015-08-05 17:16:54

by Eric W. Biederman

[permalink] [raw]

Subject: Re: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

"河合英宏 / KAWAI，HIDEHIRO" <[email protected]> writes:

> Hello,
>
> Thanks for the reply.
>
>> From: Eric W. Biederman [mailto:[email protected]]
> [...]
>> A specific hook for a very specific purpose when there is no other way
>> we can consider.
>
> So, is kmsg_dump like feature admissible?
>
>> If you don't have something that generalises well into a general purpose
>> operation that it makes sense for everyone to call you can always use
>> the world's largest aka you can run code before the new kernel starts
>> that is loaded with kexec_load.
>
> One of our purposes, notifying "I'm dying", would be achieved by purgatory
> code provided by kexec command as I stated before. Since the way of the
> notification will differ from each vendor, I think we need to modify
> the purgatory codes pluggable. Also, I think we need some parameter
> passing mechanism to the purgatory code. For example, passing the panic
> message via boot parameter to save it to SEL. Although I'm not sure
> we can do that (I've not investigated well yet). Is that acceptable?

I think the address of panic message is available in crash notes. If
not that is very reasonable to add.

Updating the SEL from purgatory after purgatory has validated the
checksums of the crash handling code is acceptable.

All that is desired is to run as little code as possible in a kernel
that is known broken. Once the checksums have verified things in
purgatory you should be in good shape, and there is no possibility of
relying on broken infrastructure because that code simply is not present
in purgatory.

We already have a few early_printk style drivers in purgatory and I
don't the code to update the SEL would be much worse.

On the flip side there are enough firmware bugs that I personally would
not want to rely on firmware code running properly when the machine is
in a known broken state, so I don't want the SEL update to be
unconditional.

Eric

2015-08-07 01:38:54

by Hidehiro Kawai

[permalink] [raw]

Subject: RE: [RFC V2 PATCH 0/1] kexec: crash_kexec_post_notifiers boot option related fixes

> From: Eric W. Biederman [mailto:[email protected]]
> >> From: Eric W. Biederman [mailto:[email protected]]
> > [...]
> >> A specific hook for a very specific purpose when there is no other way
> >> we can consider.
> >
> > So, is kmsg_dump like feature admissible?
> >
> >> If you don't have something that generalises well into a general purpose
> >> operation that it makes sense for everyone to call you can always use
> >> the world's largest aka you can run code before the new kernel starts
> >> that is loaded with kexec_load.
> >
> > One of our purposes, notifying "I'm dying", would be achieved by purgatory
> > code provided by kexec command as I stated before. Since the way of the
> > notification will differ from each vendor, I think we need to modify
> > the purgatory codes pluggable. Also, I think we need some parameter
> > passing mechanism to the purgatory code. For example, passing the panic
> > message via boot parameter to save it to SEL. Although I'm not sure
> > we can do that (I've not investigated well yet). Is that acceptable?
>
> I think the address of panic message is available in crash notes. If
> not that is very reasonable to add.

I believed the boot parameter is prepared by the 1st kernel, but
it's wrong. The boot parameter is completely provieded kexec command.
So, passing the panic message through boot parameter will not
be feasible. I'm not sure we can easily access to the crash notes
from purgatory, but I think it's a reasonable way to pass panic message.

> Updating the SEL from purgatory after purgatory has validated the
> checksums of the crash handling code is acceptable.
>
> All that is desired is to run as little code as possible in a kernel
> that is known broken. Once the checksums have verified things in
> purgatory you should be in good shape, and there is no possibility of
> relying on broken infrastructure because that code simply is not present
> in purgatory.
>
> We already have a few early_printk style drivers in purgatory and I
> don't the code to update the SEL would be much worse.

For developers, early_printk style feature will be better solution.
For end users, however, it will not be true. Sometimes they cannot
use a serial port for early_printk because the serial port is used
for other purpose. Sometimes they cannot place additional machine
which receives messages from the serial port. So we need some
plugin or enable/disable mechanism for specific purgatory code.

> On the flip side there are enough firmware bugs that I personally would
> not want to rely on firmware code running properly when the machine is
> in a known broken state, so I don't want the SEL update to be
> unconditional.

Yes, I don't also trust BMC firmware. The most simple I/F to BMC
is KCS (Keyboard Controller Style) I/F which is accessible via
two I/O ports. If BMC becomes insane, the state machine for the I/F
can go into infinite loop. However, we can avoid this by introducing
proper timeout. Of course, I think we should add some enable/disable
mechanism.

Regards,
Kawai

????{.n?+???????+%?????ݶ??w??{.n?+????{??G?????{ay?ʇڙ?,j??f???h?????????z_??(?階?ݢj"???m??????G????????????&???~???iO???z??v?^?m????????????I?