2015-07-10 11:40:04

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 0/3] kexec: crash_kexec_post_notifiers boot option related fixes

This is a bugfix patch set for crash_kexec_post_notifiers boot option
which allows users to call panic notifiers and kmsg dumpers before
kdump.

The main patch is PATCH 3/3, and it fixes two problems reported by
Daniel Walker (https://lkml.org/lkml/2015/6/24/44).

Problem 1:
If crash_kexec_post_notifiers boot option is specified, some
shutting down process which assume other cpus are still alive
don't work properly.

Problem 2:
If crash_kexec_post_notifiers boot option is specified, register
information of other cpus are not saved to crash dumps.

PATCH 3/3 is also another solution for a bug reported by Daisuke
Hatayama (https://lkml.org/lkml/2015/5/15/284). So, this patch
replace commits 5375b70 and f45d85f which fix the bug.

---

Hidehiro Kawai (3):
panic: Disable crash_kexec_post_notifiers if kdump is not available
kexec: Pass panic message to crash_kexec()
kexec: Change the timing of callbacks related to "crash_kexec_post_notifiers" boot option


Documentation/kernel-parameters.txt | 4 ++
arch/arm/kernel/traps.c | 2 +
arch/arm64/kernel/traps.c | 2 +
arch/metag/kernel/traps.c | 2 +
arch/mips/kernel/traps.c | 2 +
arch/powerpc/kernel/traps.c | 2 +
arch/s390/kernel/ipl.c | 2 +
arch/sh/kernel/traps.c | 2 +
arch/x86/kernel/dumpstack.c | 2 +
arch/x86/platform/uv/uv_nmi.c | 2 +
include/linux/kernel.h | 1 +
include/linux/kexec.h | 7 +++-
kernel/kexec.c | 62 ++++++++++++++++++++++++++++++-----
kernel/panic.c | 23 ++++---------
14 files changed, 80 insertions(+), 35 deletions(-)


--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-07-10 11:40:15

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 3/3] kexec: Change the timing of callbacks related to "crash_kexec_post_notifiers" boot option

This patch fixes problems reported by Daniel Walker
(https://lkml.org/lkml/2015/6/24/44), and also replaces the bug fix
commits 5375b70 and f45d85f.

If "crash_kexec_post_notifiers" boot option is specified,
other cpus are stopped by smp_send_stop() before entering
crash_kexec(), while usually machine_crash_shutdown() called by
crash_kexec() does that. This behavior change leads two problems.

Problem 1:
Some function in the crash_kexec() path depend on other cpus being
still online. If other cpus have been offlined already, they
doesn't work properly.

Example:
panic()
crash_kexec()
machine_crash_shutdown()
octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus
machine_kexec()

Problem 2:
Most of architectures stop other cpus in the machine_crash_shutdown()
path and save register information at the same time. However, if
smp_send_stop() is called before that, we can't save the register
information.

To solve these problems, this patch changes the timing of calling
the callbacks instead of changing the timing of crash_kexec() if
crash_kexec_post_notifiers boot option is specified.

Before:
if (!crash_kexec_post_notifiers)
crash_kexec()

smp_send_stop()
atomic_notifier_call_chain()
kmsg_dump()

if (crash_kexec_post_notifiers)
crash_kexec()

After:
crash_kexec()
machine_crash_shutdown()
if (crash_kexec_post_notifiers) {
atomic_notifier_call_chain()
kmsg_dump()
}
machine_kexec()

smp_send_stop()
if (!crash_kexec_post_notifiers) {
atomic_notifier_call_chain()
kmsg_dump()
}

NOTE: In current implementation, S/390 stops other cpus in
machine_kexec() but not machine_crash_shutdown(). This means
the notifiers run with other cpus being alive. In this case,
user should use SMP-safe notifiers only.

Reported-by: Daniel Walker <[email protected]>
Fixes: f06e5153f4ae (kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers)
Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
include/linux/kernel.h | 1 +
kernel/kexec.c | 39 +++++++++++++++++++++++++++++------
kernel/panic.c | 21 ++++---------------
4 files changed, 42 insertions(+), 23 deletions(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..1dfaf23 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2627,6 +2627,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
Note that this also increases risks of kdump failure,
because some panic notifiers can make the crashed
kernel more unstable.
+ Currently, panic-notifiers and kmsg-dumpers are
+ called without stopping other cpus on S/390. If you
+ don't know if those callbacks will work safely in
+ that case, please don't enable this feature.

parkbd.port= [HW] Parallel port number the keyboard adapter is
connected to, default is 0.
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5f0be58..718b46b 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -251,6 +251,7 @@ static inline void might_fault(void) { }
#endif

extern struct atomic_notifier_head panic_notifier_list;
+extern bool crash_kexec_post_notifiers;
extern long (*panic_blink)(int state);
__printf(1, 2)
void panic(const char *fmt, ...)
diff --git a/kernel/kexec.c b/kernel/kexec.c
index 9c7894f..c3c4279 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -36,6 +36,7 @@
#include <linux/syscore_ops.h>
#include <linux/compiler.h>
#include <linux/hugetlb.h>
+#include <linux/kmsg_dump.h>

#include <asm/page.h>
#include <asm/uaccess.h>
@@ -85,13 +86,6 @@ struct resource crashk_low_res = {
int kexec_should_crash(struct task_struct *p)
{
/*
- * If crash_kexec_post_notifiers is enabled, don't run
- * crash_kexec() here yet, which must be run after panic
- * notifiers in panic().
- */
- if (crash_kexec_post_notifiers)
- return 0;
- /*
* There are 4 panic() calls in do_exit() path, each of which
* corresponds to each of these 4 conditions.
*/
@@ -1472,6 +1466,8 @@ void __weak crash_unmap_reserved_pages(void)

void crash_kexec(struct pt_regs *regs, char *msg)
{
+ int notify_done = 0;
+
/* Take the kexec_mutex here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
* we are using after a panic on a different cpu.
@@ -1487,10 +1483,39 @@ void crash_kexec(struct pt_regs *regs, char *msg)
crash_setup_regs(&fixed_regs, regs);
crash_save_vmcoreinfo();
machine_crash_shutdown(&fixed_regs);
+
+ /*
+ * If you doubt kdump always works fine in any
+ * situation, "crash_kexec_post_notifiers" offers
+ * you a chance to run panic_notifiers and dumping
+ * kmsg before kdump.
+ *
+ * NOTE: Since some panic_notifiers can make crashed
+ * kernel more unstable, it can increase risks of
+ * the kdump failure too.
+ *
+ * NOTE: Some notifiers assume they run in a single
+ * cpu. Most of architectures stop other cpus in
+ * machine_crash_shutdown(), but S/390 does it in
+ * machine_kexec() at this point. Please use
+ * "crash_kexec_post_notifiers" carefully in that
+ * case.
+ */
+ if (crash_kexec_post_notifiers) {
+ atomic_notifier_call_chain(
+ &panic_notifier_list, 0, msg);
+ kmsg_dump(KMSG_DUMP_PANIC);
+ notify_done = 1;
+ }
+
machine_kexec(kexec_crash_image);
}
mutex_unlock(&kexec_mutex);
}
+
+ if (notify_done == 0)
+ /* Force to call panic notifiers and kmsg dumpers */
+ crash_kexec_post_notifiers = 0;
}

void crash_kexec_on_oops(struct pt_regs *regs, struct task_struct *p)
diff --git a/kernel/panic.c b/kernel/panic.c
index 93008b6..834e349 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -114,11 +114,8 @@ void panic(const char *fmt, ...)
/*
* If we have crashed and we have a crash kernel loaded let it handle
* everything else.
- * If we want to run this after calling panic_notifiers, pass
- * the "crash_kexec_post_notifiers" option to the kernel.
*/
- if (!crash_kexec_post_notifiers)
- crash_kexec(NULL, buf);
+ crash_kexec(NULL, buf);

/*
* Note smp_send_stop is the usual smp shutdown function, which
@@ -131,19 +128,11 @@ void panic(const char *fmt, ...)
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
*/
- atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
+ if (!crash_kexec_post_notifiers) {
+ atomic_notifier_call_chain(&panic_notifier_list, 0, buf);

- kmsg_dump(KMSG_DUMP_PANIC);
-
- /*
- * If you doubt kdump always works fine in any situation,
- * "crash_kexec_post_notifiers" offers you a chance to run
- * panic_notifiers and dumping kmsg before kdump.
- * Note: since some panic_notifiers can make crashed kernel
- * more unstable, it can increase risks of the kdump failure too.
- */
- if (crash_kexec_post_notifiers)
- crash_kexec(NULL, buf);
+ kmsg_dump(KMSG_DUMP_PANIC);
+ }

bust_spinlocks(0);


2015-07-10 11:40:24

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 2/3] kexec: Pass panic message to crash_kexec()

Add an argument to crash_kexec() to pass the panic message. This
patch is a preparation for the next patch, and it doesn't change
the current behavior.

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
arch/arm/kernel/traps.c | 2 +-
arch/arm64/kernel/traps.c | 2 +-
arch/metag/kernel/traps.c | 2 +-
arch/mips/kernel/traps.c | 2 +-
arch/powerpc/kernel/traps.c | 2 +-
arch/s390/kernel/ipl.c | 2 +-
arch/sh/kernel/traps.c | 2 +-
arch/x86/kernel/dumpstack.c | 2 +-
arch/x86/platform/uv/uv_nmi.c | 2 +-
include/linux/kexec.h | 7 +++++--
kernel/kexec.c | 23 ++++++++++++++++++++++-
kernel/panic.c | 4 ++--
12 files changed, 38 insertions(+), 14 deletions(-)

diff --git a/arch/arm/kernel/traps.c b/arch/arm/kernel/traps.c
index d358226..abb7b86 100644
--- a/arch/arm/kernel/traps.c
+++ b/arch/arm/kernel/traps.c
@@ -293,7 +293,7 @@ static unsigned long oops_begin(void)
static void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
if (regs && kexec_should_crash(current))
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, current);

bust_spinlocks(0);
die_owner = -1;
diff --git a/arch/arm64/kernel/traps.c b/arch/arm64/kernel/traps.c
index 566bc4c..17a6841 100644
--- a/arch/arm64/kernel/traps.c
+++ b/arch/arm64/kernel/traps.c
@@ -233,7 +233,7 @@ void die(const char *str, struct pt_regs *regs, int err)
ret = __die(str, err, thread, regs);

if (regs && kexec_should_crash(thread->task))
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, thread->task);

bust_spinlocks(0);
add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
diff --git a/arch/metag/kernel/traps.c b/arch/metag/kernel/traps.c
index 17b2e2e..dca46c7 100644
--- a/arch/metag/kernel/traps.c
+++ b/arch/metag/kernel/traps.c
@@ -110,7 +110,7 @@ void __noreturn die(const char *str, struct pt_regs *regs,
bust_spinlocks(0);
add_taint(TAINT_DIE, LOCKDEP_NOW_UNRELIABLE);
if (kexec_should_crash(current))
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, current);

if (in_interrupt())
panic("Fatal exception in interrupt");
diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
index 2a7b38e..46fc9a6 100644
--- a/arch/mips/kernel/traps.c
+++ b/arch/mips/kernel/traps.c
@@ -402,7 +402,7 @@ void __noreturn die(const char *str, struct pt_regs *regs)
}

if (regs && kexec_should_crash(current))
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, current);

do_exit(sig);
}
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 6530f1b..4e1fe20 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -163,7 +163,7 @@ static void __kprobes oops_end(unsigned long flags, struct pt_regs *regs,
* it through the crashdump code.
*/
if (kexec_should_crash(current) || (TRAP(regs) == 0x100)) {
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, current);

/*
* We aren't the primary crash CPU. We need to send it
diff --git a/arch/s390/kernel/ipl.c b/arch/s390/kernel/ipl.c
index 52fbef91d..67c1f8b 100644
--- a/arch/s390/kernel/ipl.c
+++ b/arch/s390/kernel/ipl.c
@@ -1726,7 +1726,7 @@ static void __do_restart(void *ignore)
__arch_local_irq_stosm(0x04); /* enable DAT */
smp_send_stop();
#ifdef CONFIG_CRASH_DUMP
- crash_kexec(NULL);
+ crash_kexec(NULL, "restart");
#endif
on_restart_trigger.action->fn(&on_restart_trigger);
stop_run(&on_restart_trigger);
diff --git a/arch/sh/kernel/traps.c b/arch/sh/kernel/traps.c
index dfdad72..4bfb519 100644
--- a/arch/sh/kernel/traps.c
+++ b/arch/sh/kernel/traps.c
@@ -43,7 +43,7 @@ void die(const char *str, struct pt_regs *regs, long err)
oops_exit();

if (kexec_should_crash(current))
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, current);

if (in_interrupt())
panic("Fatal exception in interrupt");
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index 9c30acf..e973a1d 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -229,7 +229,7 @@ unsigned long oops_begin(void)
void oops_end(unsigned long flags, struct pt_regs *regs, int signr)
{
if (regs && kexec_should_crash(current))
- crash_kexec(regs);
+ crash_kexec_on_oops(regs, current);

bust_spinlocks(0);
die_owner = -1;
diff --git a/arch/x86/platform/uv/uv_nmi.c b/arch/x86/platform/uv/uv_nmi.c
index 020c101..8b67644 100644
--- a/arch/x86/platform/uv/uv_nmi.c
+++ b/arch/x86/platform/uv/uv_nmi.c
@@ -499,7 +499,7 @@ static void uv_nmi_kdump(int cpu, int master, struct pt_regs *regs)
/* Call crash to dump system state */
if (master) {
pr_emerg("UV: NMI executing crash_kexec on CPU%d\n", cpu);
- crash_kexec(regs);
+ crash_kexec(regs, "UV: NMI executing crash_kexec");

pr_emerg("UV: crash_kexec unexpectedly returned, ");
if (!kexec_crash_image) {
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e804306..0a2744d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -237,7 +237,8 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
unsigned int size, bool get_value);
extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
const char *name);
-extern void crash_kexec(struct pt_regs *);
+extern void crash_kexec(struct pt_regs *, char *msg);
+extern void crash_kexec_on_oops(struct pt_regs *, struct task_struct *p);
int kexec_should_crash(struct task_struct *);
void crash_save_cpu(struct pt_regs *regs, int cpu);
void crash_save_vmcoreinfo(void);
@@ -321,7 +322,9 @@ int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
#else /* !CONFIG_KEXEC */
struct pt_regs;
struct task_struct;
-static inline void crash_kexec(struct pt_regs *regs) { }
+static inline void crash_kexec(struct pt_regs *regs, char *msg) { }
+static inline void crash_kexec_on_oops(struct pt_regs *regs,
+ struct task_struct *p) { }
static inline int kexec_should_crash(struct task_struct *p) { return 0; }
#endif /* CONFIG_KEXEC */

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c10..9c7894f 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1470,7 +1470,7 @@ void __weak crash_unmap_reserved_pages(void)

#endif /* CONFIG_KEXEC_FILE */

-void crash_kexec(struct pt_regs *regs)
+void crash_kexec(struct pt_regs *regs, char *msg)
{
/* Take the kexec_mutex here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
@@ -1493,6 +1493,27 @@ void crash_kexec(struct pt_regs *regs)
}
}

+void crash_kexec_on_oops(struct pt_regs *regs, struct task_struct *p)
+{
+ static char buf[128];
+ char *msg = "Die for unknown reason";
+
+ if (in_interrupt())
+ msg = "Fatal exception in interrupt";
+ else if (panic_on_oops)
+ msg = "Fatal exception";
+ else if (!p->pid)
+ msg = "Attempted to kill the idle task!";
+ else if (is_global_init(p)) {
+ snprintf(buf, sizeof(buf),
+ "Attempted to kill init! exitcode=0x%08x",
+ p->signal->group_exit_code ?: p->exit_code);
+ msg = buf;
+ }
+
+ crash_kexec(regs, msg);
+}
+
size_t crash_get_memory_size(void)
{
size_t size = 0;
diff --git a/kernel/panic.c b/kernel/panic.c
index 5252331..93008b6 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -118,7 +118,7 @@ void panic(const char *fmt, ...)
* the "crash_kexec_post_notifiers" option to the kernel.
*/
if (!crash_kexec_post_notifiers)
- crash_kexec(NULL);
+ crash_kexec(NULL, buf);

/*
* Note smp_send_stop is the usual smp shutdown function, which
@@ -143,7 +143,7 @@ void panic(const char *fmt, ...)
* more unstable, it can increase risks of the kdump failure too.
*/
if (crash_kexec_post_notifiers)
- crash_kexec(NULL);
+ crash_kexec(NULL, buf);

bust_spinlocks(0);


2015-07-10 11:40:33

by Hidehiro Kawai

[permalink] [raw]
Subject: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

You can call panic notifiers and kmsg dumpers before kdump by
specifying "crash_kexec_post_notifiers" as a boot parameter.
However, it doesn't make sense if kdump is not available. In that
case, disable "crash_kexec_post_notifiers" boot parameter so that
you can't change the value of the parameter.

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
---
kernel/panic.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/panic.c b/kernel/panic.c
index 04e91ff..5252331 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -502,12 +502,14 @@ __visible void __stack_chk_fail(void)
core_param(pause_on_oops, pause_on_oops, int, 0644);
core_param(panic_on_warn, panic_on_warn, int, 0644);

+#ifdef CONFIG_CRASH_DUMP
static int __init setup_crash_kexec_post_notifiers(char *s)
{
crash_kexec_post_notifiers = true;
return 0;
}
early_param("crash_kexec_post_notifiers", setup_crash_kexec_post_notifiers);
+#endif

static int __init oops_setup(char *s)
{

2015-07-10 13:47:17

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

Hidehiro Kawai <[email protected]> writes:

> You can call panic notifiers and kmsg dumpers before kdump by
> specifying "crash_kexec_post_notifiers" as a boot parameter.
> However, it doesn't make sense if kdump is not available. In that
> case, disable "crash_kexec_post_notifiers" boot parameter so that
> you can't change the value of the parameter.

Nacked-by: "Eric W. Biederman" <[email protected]>

You are confusing kexec on panic and CONFIG_CRASH_DUMP
which is about the tools for reading the state of the previous kernel.

Eric

> Signed-off-by: Hidehiro Kawai <[email protected]>
> Cc: Andrew Morton <[email protected]>
> Cc: Eric Biederman <[email protected]>
> Cc: Vivek Goyal <[email protected]>
> ---
> kernel/panic.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/kernel/panic.c b/kernel/panic.c
> index 04e91ff..5252331 100644
> --- a/kernel/panic.c
> +++ b/kernel/panic.c
> @@ -502,12 +502,14 @@ __visible void __stack_chk_fail(void)
> core_param(pause_on_oops, pause_on_oops, int, 0644);
> core_param(panic_on_warn, panic_on_warn, int, 0644);
>
> +#ifdef CONFIG_CRASH_DUMP
> static int __init setup_crash_kexec_post_notifiers(char *s)
> {
> crash_kexec_post_notifiers = true;
> return 0;
> }
> early_param("crash_kexec_post_notifiers", setup_crash_kexec_post_notifiers);
> +#endif
>
> static int __init oops_setup(char *s)
> {

2015-07-13 20:26:28

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> Hidehiro Kawai <[email protected]> writes:
>
> > You can call panic notifiers and kmsg dumpers before kdump by
> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > However, it doesn't make sense if kdump is not available. In that
> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > you can't change the value of the parameter.
>
> Nacked-by: "Eric W. Biederman" <[email protected]>

I think it would make sense if he just replaced "kdump" with "kexec".

Daniel

2015-07-14 01:26:15

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

[email protected] writes:

> On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
>> Hidehiro Kawai <[email protected]> writes:
>>
>> > You can call panic notifiers and kmsg dumpers before kdump by
>> > specifying "crash_kexec_post_notifiers" as a boot parameter.
>> > However, it doesn't make sense if kdump is not available. In that
>> > case, disable "crash_kexec_post_notifiers" boot parameter so that
>> > you can't change the value of the parameter.
>>
>> Nacked-by: "Eric W. Biederman" <[email protected]>
>
> I think it would make sense if he just replaced "kdump" with "kexec".

It would be less insane, however it still makes no sense as without
kexec on panic support crash_kexec is a noop. So the value of the
seeting makes no difference.

Eric

2015-07-14 01:56:20

by Hidehiro Kawai

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

Hello Eric and Daniel,

(2015/07/14 5:26), [email protected] wrote:
> On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
>> Hidehiro Kawai <[email protected]> writes:
>>
>>> You can call panic notifiers and kmsg dumpers before kdump by
>>> specifying "crash_kexec_post_notifiers" as a boot parameter.
>>> However, it doesn't make sense if kdump is not available. In that
>>> case, disable "crash_kexec_post_notifiers" boot parameter so that
>>> you can't change the value of the parameter.
>>
>> Nacked-by: "Eric W. Biederman" <[email protected]>
>>
>> You are confusing kexec on panic and CONFIG_CRASH_DUMP
>> which is about the tools for reading the state of the previous kernel.

You are right. I mistook the meaning of CONFIG_CRASH_DUMP.

> I think it would make sense if he just replaced "kdump" with "kexec".

If a user specified crash_kexec_post_notifiers option with kexec
being totally disabled, it just disables notifier and kmsg dumper calls
(please see PATCH 3/3). So as Daniel said, I also think it makes sense
to hide crash_kexec_post_notifiers option if kexec is disabled.

Regards,

--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

2015-07-14 13:59:38

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> [email protected] writes:
>
> > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> >> Hidehiro Kawai <[email protected]> writes:
> >>
> >> > You can call panic notifiers and kmsg dumpers before kdump by
> >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> >> > However, it doesn't make sense if kdump is not available. In that
> >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> >> > you can't change the value of the parameter.
> >>
> >> Nacked-by: "Eric W. Biederman" <[email protected]>
> >
> > I think it would make sense if he just replaced "kdump" with "kexec".
>
> It would be less insane, however it still makes no sense as without
> kexec on panic support crash_kexec is a noop. So the value of the
> seeting makes no difference.

Can you explain more, I don't really understand what you mean. Are you suggesting
the whole "crash_kexec_post_notifiers" feature has no value ?

Daniel

2015-07-14 14:20:08

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > [email protected] writes:
> >
> > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > >> Hidehiro Kawai <[email protected]> writes:
> > >>
> > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > >> > However, it doesn't make sense if kdump is not available. In that
> > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > >> > you can't change the value of the parameter.
> > >>
> > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> > >
> > > I think it would make sense if he just replaced "kdump" with "kexec".
> >
> > It would be less insane, however it still makes no sense as without
> > kexec on panic support crash_kexec is a noop. So the value of the
> > seeting makes no difference.
>
> Can you explain more, I don't really understand what you mean. Are you suggesting
> the whole "crash_kexec_post_notifiers" feature has no value ?

If CONFIG_KEXEC=n, then crash_kexec() is a nop. So it does not matter
whether crash_kexec() is called before panic notifiers or after.

IOW, what do you gain by disabling crash_kexec_post_notifiers, in
case of CONFIG_KEXEC=n?

Thanks
Vivek

2015-07-14 14:42:51

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 3/3] kexec: Change the timing of callbacks related to "crash_kexec_post_notifiers" boot option

On Fri, Jul 10, 2015 at 08:33:31PM +0900, Hidehiro Kawai wrote:
> This patch fixes problems reported by Daniel Walker
> (https://lkml.org/lkml/2015/6/24/44), and also replaces the bug fix
> commits 5375b70 and f45d85f.
>
> If "crash_kexec_post_notifiers" boot option is specified,
> other cpus are stopped by smp_send_stop() before entering
> crash_kexec(), while usually machine_crash_shutdown() called by
> crash_kexec() does that. This behavior change leads two problems.
>
> Problem 1:
> Some function in the crash_kexec() path depend on other cpus being
> still online. If other cpus have been offlined already, they
> doesn't work properly.
>
> Example:
> panic()
> crash_kexec()
> machine_crash_shutdown()
> octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus
> machine_kexec()
>
> Problem 2:
> Most of architectures stop other cpus in the machine_crash_shutdown()
> path and save register information at the same time. However, if
> smp_send_stop() is called before that, we can't save the register
> information.
>
> To solve these problems, this patch changes the timing of calling
> the callbacks instead of changing the timing of crash_kexec() if
> crash_kexec_post_notifiers boot option is specified.
>
> Before:
> if (!crash_kexec_post_notifiers)
> crash_kexec()
>
> smp_send_stop()
> atomic_notifier_call_chain()
> kmsg_dump()
>
> if (crash_kexec_post_notifiers)
> crash_kexec()
>
> After:
> crash_kexec()
> machine_crash_shutdown()
> if (crash_kexec_post_notifiers) {
> atomic_notifier_call_chain()
> kmsg_dump()
> }
> machine_kexec()
>
> smp_send_stop()
> if (!crash_kexec_post_notifiers) {
> atomic_notifier_call_chain()
> kmsg_dump()
> }
>

I think this new code flow looks bad. Now we are calling kmsg_dump()
and atomic_notifier_call_chain() from inside the crash_kexec() as well
as from inside panic(). This is bad.

So basic problem seems to be that cpus need to be stopped once (with
or without panic notifiers. So why don't we look into desiginig a
function which stops cpus, saves register states first and then does
rest of the processing.

Something like.

stop_cpus_save_register_state;

if (!crash_kexec_post_notifiers)
crash_kexec()

atomic_notifier_call_chain()
kmsg_dump()

Here crash_kexec() will have to be modified and it will assume that cpus
have already been stopped and register states have already been saved.

IOW, is there a reason that we can't get rid of smp_send_stop() and
use the mechanism crash_kexec() is using to stop cpus after panic()?

Thanks
Vivek

2015-07-14 15:02:13

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > [email protected] writes:
> >
> > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > >> Hidehiro Kawai <[email protected]> writes:
> > >>
> > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > >> > However, it doesn't make sense if kdump is not available. In that
> > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > >> > you can't change the value of the parameter.
> > >>
> > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> > >
> > > I think it would make sense if he just replaced "kdump" with "kexec".
> >
> > It would be less insane, however it still makes no sense as without
> > kexec on panic support crash_kexec is a noop. So the value of the
> > seeting makes no difference.
>
> Can you explain more, I don't really understand what you mean. Are you suggesting
> the whole "crash_kexec_post_notifiers" feature has no value ?

Daniel,

BTW, why are you using crash_kexec_post_notifiers commandline? Why not
without it?

Thanks
Vivek

2015-07-14 15:34:48

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > > [email protected] writes:
> > >
> > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > > >> Hidehiro Kawai <[email protected]> writes:
> > > >>
> > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > > >> > However, it doesn't make sense if kdump is not available. In that
> > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > > >> > you can't change the value of the parameter.
> > > >>
> > > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> > > >
> > > > I think it would make sense if he just replaced "kdump" with "kexec".
> > >
> > > It would be less insane, however it still makes no sense as without
> > > kexec on panic support crash_kexec is a noop. So the value of the
> > > seeting makes no difference.
> >
> > Can you explain more, I don't really understand what you mean. Are you suggesting
> > the whole "crash_kexec_post_notifiers" feature has no value ?
>
> Daniel,
>
> BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> without it?

It was explained in the prior thread but to rehash, the notifiers are used to do a switch
over from the crashed machine to another redundant machine.

Daniel

2015-07-14 15:40:43

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 03:34:30PM +0000, [email protected] wrote:
> On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> > On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > > > [email protected] writes:
> > > >
> > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > > > >> Hidehiro Kawai <[email protected]> writes:
> > > > >>
> > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > > > >> > However, it doesn't make sense if kdump is not available. In that
> > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > > > >> > you can't change the value of the parameter.
> > > > >>
> > > > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> > > > >
> > > > > I think it would make sense if he just replaced "kdump" with "kexec".
> > > >
> > > > It would be less insane, however it still makes no sense as without
> > > > kexec on panic support crash_kexec is a noop. So the value of the
> > > > seeting makes no difference.
> > >
> > > Can you explain more, I don't really understand what you mean. Are you suggesting
> > > the whole "crash_kexec_post_notifiers" feature has no value ?
> >
> > Daniel,
> >
> > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> > without it?
>
> It was explained in the prior thread but to rehash, the notifiers are used to do a switch
> over from the crashed machine to another redundant machine.

So why not detect failure using polling or issue notifications from second
kernel.

IOW, expecting that a crashed machine will be able to deliver notification
reliably is falwed to begin with, IMHO.

If a machine is failing, there are high chance it can't deliver you the
notification. Detecting that failure suing some kind of polling mechanism
might be more reliable. And it will make even kdump mechanism more
reliable so that it does not have to run panic notifiers after the crash.

Thanks
Vivek

2015-07-14 15:48:55

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
> On Tue, Jul 14, 2015 at 03:34:30PM +0000, [email protected] wrote:
> > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > > > > [email protected] writes:
> > > > >
> > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > > > > >> Hidehiro Kawai <[email protected]> writes:
> > > > > >>
> > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > > > > >> > However, it doesn't make sense if kdump is not available. In that
> > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > > > > >> > you can't change the value of the parameter.
> > > > > >>
> > > > > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> > > > > >
> > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
> > > > >
> > > > > It would be less insane, however it still makes no sense as without
> > > > > kexec on panic support crash_kexec is a noop. So the value of the
> > > > > seeting makes no difference.
> > > >
> > > > Can you explain more, I don't really understand what you mean. Are you suggesting
> > > > the whole "crash_kexec_post_notifiers" feature has no value ?
> > >
> > > Daniel,
> > >
> > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> > > without it?
> >
> > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
> > over from the crashed machine to another redundant machine.
>
> So why not detect failure using polling or issue notifications from second
> kernel.
>
> IOW, expecting that a crashed machine will be able to deliver notification
> reliably is falwed to begin with, IMHO.

It's flawed to think you can kexec, but you still do it right ? I've not gotten into
the deep details of this switching process, but that's how this interface is used.

> If a machine is failing, there are high chance it can't deliver you the
> notification. Detecting that failure suing some kind of polling mechanism
> might be more reliable. And it will make even kdump mechanism more
> reliable so that it does not have to run panic notifiers after the crash.

I think what your suggesting is that my company should change how it's hardware works
and that's not really an option for me. This isn't a simple thing like checking over the
network if the machine is down or not, this is way more complex hardware design.

Daniel

2015-07-14 16:16:17

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 03:48:33PM +0000, [email protected] wrote:
> On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
> > On Tue, Jul 14, 2015 at 03:34:30PM +0000, [email protected] wrote:
> > > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> > > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> > > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> > > > > > [email protected] writes:
> > > > > >
> > > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> > > > > > >> Hidehiro Kawai <[email protected]> writes:
> > > > > > >>
> > > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> > > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> > > > > > >> > However, it doesn't make sense if kdump is not available. In that
> > > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> > > > > > >> > you can't change the value of the parameter.
> > > > > > >>
> > > > > > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> > > > > > >
> > > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
> > > > > >
> > > > > > It would be less insane, however it still makes no sense as without
> > > > > > kexec on panic support crash_kexec is a noop. So the value of the
> > > > > > seeting makes no difference.
> > > > >
> > > > > Can you explain more, I don't really understand what you mean. Are you suggesting
> > > > > the whole "crash_kexec_post_notifiers" feature has no value ?
> > > >
> > > > Daniel,
> > > >
> > > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> > > > without it?
> > >
> > > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
> > > over from the crashed machine to another redundant machine.
> >
> > So why not detect failure using polling or issue notifications from second
> > kernel.
> >
> > IOW, expecting that a crashed machine will be able to deliver notification
> > reliably is falwed to begin with, IMHO.
>
> It's flawed to think you can kexec, but you still do it right ? I've not gotten into
> the deep details of this switching process, but that's how this interface is used.

Sure. But the deal here is that users of interface know that sometimes it
can be unreliable. And in the absence of more reliable mechanism, somewhat
less reliable mechanism is fine.

>
> > If a machine is failing, there are high chance it can't deliver you the
> > notification. Detecting that failure suing some kind of polling mechanism
> > might be more reliable. And it will make even kdump mechanism more
> > reliable so that it does not have to run panic notifiers after the crash.
>
> I think what your suggesting is that my company should change how it's hardware works
> and that's not really an option for me. This isn't a simple thing like checking over the
> network if the machine is down or not, this is way more complex hardware design.

That means you are ready to live with an unreliable design. There might be
cases where notifier does not get run properly and you will not do switch
despite the fact that OS has failed. I was just trying to nudge you in
a direction which could be more reliable mechanism.

Thanks
Vivek

2015-07-14 17:12:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

Vivek Goyal <[email protected]> writes:

> On Tue, Jul 14, 2015 at 03:48:33PM +0000, [email protected] wrote:
>> On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
>> > On Tue, Jul 14, 2015 at 03:34:30PM +0000, [email protected] wrote:
>> > > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
>> > > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
>> > > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
>> > > > > > [email protected] writes:
>> > > > > >
>> > > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
>> > > > > > >> Hidehiro Kawai <[email protected]> writes:
>> > > > > > >>
>> > > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
>> > > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
>> > > > > > >> > However, it doesn't make sense if kdump is not available. In that
>> > > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
>> > > > > > >> > you can't change the value of the parameter.
>> > > > > > >>
>> > > > > > >> Nacked-by: "Eric W. Biederman" <[email protected]>
>> > > > > > >
>> > > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
>> > > > > >
>> > > > > > It would be less insane, however it still makes no sense as without
>> > > > > > kexec on panic support crash_kexec is a noop. So the value of the
>> > > > > > seeting makes no difference.
>> > > > >
>> > > > > Can you explain more, I don't really understand what you mean. Are you suggesting
>> > > > > the whole "crash_kexec_post_notifiers" feature has no value ?
>> > > >
>> > > > Daniel,
>> > > >
>> > > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
>> > > > without it?
>> > >
>> > > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
>> > > over from the crashed machine to another redundant machine.
>> >
>> > So why not detect failure using polling or issue notifications from second
>> > kernel.
>> >
>> > IOW, expecting that a crashed machine will be able to deliver notification
>> > reliably is falwed to begin with, IMHO.
>>
>> It's flawed to think you can kexec, but you still do it right ? I've not gotten into
>> the deep details of this switching process, but that's how this interface is used.
>
> Sure. But the deal here is that users of interface know that sometimes it
> can be unreliable. And in the absence of more reliable mechanism, somewhat
> less reliable mechanism is fine.
>
>>
>> > If a machine is failing, there are high chance it can't deliver you the
>> > notification. Detecting that failure suing some kind of polling mechanism
>> > might be more reliable. And it will make even kdump mechanism more
>> > reliable so that it does not have to run panic notifiers after the crash.
>>
>> I think what your suggesting is that my company should change how it's hardware works
>> and that's not really an option for me. This isn't a simple thing like checking over the
>> network if the machine is down or not, this is way more complex hardware design.
>
> That means you are ready to live with an unreliable design. There might be
> cases where notifier does not get run properly and you will not do switch
> despite the fact that OS has failed. I was just trying to nudge you in
> a direction which could be more reliable mechanism.

Sigh I see some deep confusion going on here.

The panic notifiers are just that panic notifiers. They have not been
nor should they be tied to kexec. If those notifiers force a switch
over of between machines I fail to see why you would care if it was
kexec or another panic situation that is forcing that switchover.

Now if you want a reliable design, I strongly recommend as I have been
recommending for the 15 years that magic failover code be placed in
either the new kernel or a stub that preceedes the new kernel.

That gives the greatest reliabilty we know how to engineer, and it lets
you do whatever you need to do.

Especially if it is not desirable for the panic notifiers to run without
the presence of kexec, I very strongly recommend not using them at all
and just writing a stub of code that can run before a new kernel starts.

Eric

2015-07-14 17:30:13

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 12:06:15PM -0500, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Tue, Jul 14, 2015 at 03:48:33PM +0000, [email protected] wrote:
> >> On Tue, Jul 14, 2015 at 11:40:40AM -0400, Vivek Goyal wrote:
> >> > On Tue, Jul 14, 2015 at 03:34:30PM +0000, [email protected] wrote:
> >> > > On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
> >> > > > On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
> >> > > > > On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
> >> > > > > > [email protected] writes:
> >> > > > > >
> >> > > > > > > On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
> >> > > > > > >> Hidehiro Kawai <[email protected]> writes:
> >> > > > > > >>
> >> > > > > > >> > You can call panic notifiers and kmsg dumpers before kdump by
> >> > > > > > >> > specifying "crash_kexec_post_notifiers" as a boot parameter.
> >> > > > > > >> > However, it doesn't make sense if kdump is not available. In that
> >> > > > > > >> > case, disable "crash_kexec_post_notifiers" boot parameter so that
> >> > > > > > >> > you can't change the value of the parameter.
> >> > > > > > >>
> >> > > > > > >> Nacked-by: "Eric W. Biederman" <[email protected]>
> >> > > > > > >
> >> > > > > > > I think it would make sense if he just replaced "kdump" with "kexec".
> >> > > > > >
> >> > > > > > It would be less insane, however it still makes no sense as without
> >> > > > > > kexec on panic support crash_kexec is a noop. So the value of the
> >> > > > > > seeting makes no difference.
> >> > > > >
> >> > > > > Can you explain more, I don't really understand what you mean. Are you suggesting
> >> > > > > the whole "crash_kexec_post_notifiers" feature has no value ?
> >> > > >
> >> > > > Daniel,
> >> > > >
> >> > > > BTW, why are you using crash_kexec_post_notifiers commandline? Why not
> >> > > > without it?
> >> > >
> >> > > It was explained in the prior thread but to rehash, the notifiers are used to do a switch
> >> > > over from the crashed machine to another redundant machine.
> >> >
> >> > So why not detect failure using polling or issue notifications from second
> >> > kernel.
> >> >
> >> > IOW, expecting that a crashed machine will be able to deliver notification
> >> > reliably is falwed to begin with, IMHO.
> >>
> >> It's flawed to think you can kexec, but you still do it right ? I've not gotten into
> >> the deep details of this switching process, but that's how this interface is used.
> >
> > Sure. But the deal here is that users of interface know that sometimes it
> > can be unreliable. And in the absence of more reliable mechanism, somewhat
> > less reliable mechanism is fine.
> >
> >>
> >> > If a machine is failing, there are high chance it can't deliver you the
> >> > notification. Detecting that failure suing some kind of polling mechanism
> >> > might be more reliable. And it will make even kdump mechanism more
> >> > reliable so that it does not have to run panic notifiers after the crash.
> >>
> >> I think what your suggesting is that my company should change how it's hardware works
> >> and that's not really an option for me. This isn't a simple thing like checking over the
> >> network if the machine is down or not, this is way more complex hardware design.
> >
> > That means you are ready to live with an unreliable design. There might be
> > cases where notifier does not get run properly and you will not do switch
> > despite the fact that OS has failed. I was just trying to nudge you in
> > a direction which could be more reliable mechanism.
>
> Sigh I see some deep confusion going on here.
>
> The panic notifiers are just that panic notifiers. They have not been
> nor should they be tied to kexec. If those notifiers force a switch
> over of between machines I fail to see why you would care if it was
> kexec or another panic situation that is forcing that switchover.

Hidehiro isn't fixing the failover situation on my side, he's fixing register
information collection when crash_kexec_post_notifiers is used.

Daniel

2015-07-14 17:55:30

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 05:29:53PM +0000, [email protected] wrote:

[..]
> > >> > If a machine is failing, there are high chance it can't deliver you the
> > >> > notification. Detecting that failure suing some kind of polling mechanism
> > >> > might be more reliable. And it will make even kdump mechanism more
> > >> > reliable so that it does not have to run panic notifiers after the crash.
> > >>
> > >> I think what your suggesting is that my company should change how it's hardware works
> > >> and that's not really an option for me. This isn't a simple thing like checking over the
> > >> network if the machine is down or not, this is way more complex hardware design.
> > >
> > > That means you are ready to live with an unreliable design. There might be
> > > cases where notifier does not get run properly and you will not do switch
> > > despite the fact that OS has failed. I was just trying to nudge you in
> > > a direction which could be more reliable mechanism.
> >
> > Sigh I see some deep confusion going on here.
> >
> > The panic notifiers are just that panic notifiers. They have not been
> > nor should they be tied to kexec. If those notifiers force a switch
> > over of between machines I fail to see why you would care if it was
> > kexec or another panic situation that is forcing that switchover.
>
> Hidehiro isn't fixing the failover situation on my side, he's fixing register
> information collection when crash_kexec_post_notifiers is used.

Sure. Given that we have created this new parameter, let us fix it so that
we can capture the other cpu register state in crash dump.

I am little disappointed that it was not tested well when this parameter was
introuced. We should have atleast tested it to the extent to see if there
is proper cpu state present for all cpus in the crash dump.

At that point of time it looked like a simple modification
to allow panic notifiers before crash_kexec().

Thanks
Vivek

2015-07-14 18:07:41

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

Vivek Goyal <[email protected]> writes:

> On Tue, Jul 14, 2015 at 05:29:53PM +0000, [email protected] wrote:
>
> [..]
>> > >> > If a machine is failing, there are high chance it can't deliver you the
>> > >> > notification. Detecting that failure suing some kind of polling mechanism
>> > >> > might be more reliable. And it will make even kdump mechanism more
>> > >> > reliable so that it does not have to run panic notifiers after the crash.
>> > >>
>> > >> I think what your suggesting is that my company should change how it's hardware works
>> > >> and that's not really an option for me. This isn't a simple thing like checking over the
>> > >> network if the machine is down or not, this is way more complex hardware design.
>> > >
>> > > That means you are ready to live with an unreliable design. There might be
>> > > cases where notifier does not get run properly and you will not do switch
>> > > despite the fact that OS has failed. I was just trying to nudge you in
>> > > a direction which could be more reliable mechanism.
>> >
>> > Sigh I see some deep confusion going on here.
>> >
>> > The panic notifiers are just that panic notifiers. They have not been
>> > nor should they be tied to kexec. If those notifiers force a switch
>> > over of between machines I fail to see why you would care if it was
>> > kexec or another panic situation that is forcing that switchover.
>>
>> Hidehiro isn't fixing the failover situation on my side, he's fixing register
>> information collection when crash_kexec_post_notifiers is used.
>
> Sure. Given that we have created this new parameter, let us fix it so that
> we can capture the other cpu register state in crash dump.
>
> I am little disappointed that it was not tested well when this parameter was
> introuced. We should have atleast tested it to the extent to see if there
> is proper cpu state present for all cpus in the crash dump.
>
> At that point of time it looked like a simple modification
> to allow panic notifiers before crash_kexec().

Either that or we say no one cares enough, and it known broken so let's
just revert the fool thing.

I honestly can't see how to support panic notifiers, before kexec.
There is no way to tell what is being done and all of the pieces
including smp_send_stop are known to be buggy.

It isn't like this latest set of patches was reviewed/tested much
better, as the first patch was wrong.

Eric

2015-07-14 18:23:41

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On Tue, Jul 14, 2015 at 01:01:12PM -0500, Eric W. Biederman wrote:
> Vivek Goyal <[email protected]> writes:
>
> > On Tue, Jul 14, 2015 at 05:29:53PM +0000, [email protected] wrote:
> >
> > [..]
> >> > >> > If a machine is failing, there are high chance it can't deliver you the
> >> > >> > notification. Detecting that failure suing some kind of polling mechanism
> >> > >> > might be more reliable. And it will make even kdump mechanism more
> >> > >> > reliable so that it does not have to run panic notifiers after the crash.
> >> > >>
> >> > >> I think what your suggesting is that my company should change how it's hardware works
> >> > >> and that's not really an option for me. This isn't a simple thing like checking over the
> >> > >> network if the machine is down or not, this is way more complex hardware design.
> >> > >
> >> > > That means you are ready to live with an unreliable design. There might be
> >> > > cases where notifier does not get run properly and you will not do switch
> >> > > despite the fact that OS has failed. I was just trying to nudge you in
> >> > > a direction which could be more reliable mechanism.
> >> >
> >> > Sigh I see some deep confusion going on here.
> >> >
> >> > The panic notifiers are just that panic notifiers. They have not been
> >> > nor should they be tied to kexec. If those notifiers force a switch
> >> > over of between machines I fail to see why you would care if it was
> >> > kexec or another panic situation that is forcing that switchover.
> >>
> >> Hidehiro isn't fixing the failover situation on my side, he's fixing register
> >> information collection when crash_kexec_post_notifiers is used.
> >
> > Sure. Given that we have created this new parameter, let us fix it so that
> > we can capture the other cpu register state in crash dump.
> >
> > I am little disappointed that it was not tested well when this parameter was
> > introuced. We should have atleast tested it to the extent to see if there
> > is proper cpu state present for all cpus in the crash dump.
> >
> > At that point of time it looked like a simple modification
> > to allow panic notifiers before crash_kexec().
>
> Either that or we say no one cares enough, and it known broken so let's
> just revert the fool thing.

Masami, you introduced this option. Are you fine with the revert? Is it
really being used and tested?

> I honestly can't see how to support panic notifiers, before kexec.
> There is no way to tell what is being done and all of the pieces
> including smp_send_stop are known to be buggy.

we should be able to replace smp_send_stop() with what crash_kexec() is
doing to stop the machine? If yes, then it should be fine I guess. This
parameter description clearly says that specify it at your own risk. So
we are not issuing a big support statement for successful kdump after
panic notifiers. If it is something fixable, otherwise user needs
to deal with it.

Thanks
Vivek

Subject: Re: Re: [PATCH 3/3] kexec: Change the timing of callbacks related to "crash_kexec_post_notifiers" boot option

On 2015/07/14 23:42, Vivek Goyal wrote:
> On Fri, Jul 10, 2015 at 08:33:31PM +0900, Hidehiro Kawai wrote:
>> This patch fixes problems reported by Daniel Walker
>> (https://lkml.org/lkml/2015/6/24/44), and also replaces the bug fix
>> commits 5375b70 and f45d85f.
>>
>> If "crash_kexec_post_notifiers" boot option is specified,
>> other cpus are stopped by smp_send_stop() before entering
>> crash_kexec(), while usually machine_crash_shutdown() called by
>> crash_kexec() does that. This behavior change leads two problems.
>>
>> Problem 1:
>> Some function in the crash_kexec() path depend on other cpus being
>> still online. If other cpus have been offlined already, they
>> doesn't work properly.
>>
>> Example:
>> panic()
>> crash_kexec()
>> machine_crash_shutdown()
>> octeon_generic_shutdown() // shutdown watchdog for ONLINE cpus
>> machine_kexec()
>>
>> Problem 2:
>> Most of architectures stop other cpus in the machine_crash_shutdown()
>> path and save register information at the same time. However, if
>> smp_send_stop() is called before that, we can't save the register
>> information.
>>
>> To solve these problems, this patch changes the timing of calling
>> the callbacks instead of changing the timing of crash_kexec() if
>> crash_kexec_post_notifiers boot option is specified.
>>
>> Before:
>> if (!crash_kexec_post_notifiers)
>> crash_kexec()
>>
>> smp_send_stop()
>> atomic_notifier_call_chain()
>> kmsg_dump()
>>
>> if (crash_kexec_post_notifiers)
>> crash_kexec()
>>
>> After:
>> crash_kexec()
>> machine_crash_shutdown()
>> if (crash_kexec_post_notifiers) {
>> atomic_notifier_call_chain()
>> kmsg_dump()
>> }
>> machine_kexec()
>>
>> smp_send_stop()
>> if (!crash_kexec_post_notifiers) {
>> atomic_notifier_call_chain()
>> kmsg_dump()
>> }
>>
>
> I think this new code flow looks bad. Now we are calling kmsg_dump()
> and atomic_notifier_call_chain() from inside the crash_kexec() as well
> as from inside panic(). This is bad.
>
> So basic problem seems to be that cpus need to be stopped once (with
> or without panic notifiers. So why don't we look into desiginig a
> function which stops cpus, saves register states first and then does
> rest of the processing.
>
> Something like.
>
> stop_cpus_save_register_state;
>
> if (!crash_kexec_post_notifiers)
> crash_kexec()
>
> atomic_notifier_call_chain()
> kmsg_dump()
>
> Here crash_kexec() will have to be modified and it will assume that cpus
> have already been stopped and register states have already been saved.

Ah, nice! I like this idea :)

>
> IOW, is there a reason that we can't get rid of smp_send_stop() and
> use the mechanism crash_kexec() is using to stop cpus after panic()?

I think there is no reason why we don't do so. smp_send_stop() just
stops other cpus, but crash's one does more (collect registers and
stop watchdogs if needed, etc.). why don't we just replace(improve) it?

Thank you!


--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: [email protected]

Subject: Re: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

On 2015/07/15 3:23, Vivek Goyal wrote:
> On Tue, Jul 14, 2015 at 01:01:12PM -0500, Eric W. Biederman wrote:
>> Vivek Goyal <[email protected]> writes:
>>
>>> On Tue, Jul 14, 2015 at 05:29:53PM +0000, [email protected] wrote:
>>>
>>> [..]
>>>>>>>> If a machine is failing, there are high chance it can't deliver you the
>>>>>>>> notification. Detecting that failure suing some kind of polling mechanism
>>>>>>>> might be more reliable. And it will make even kdump mechanism more
>>>>>>>> reliable so that it does not have to run panic notifiers after the crash.
>>>>>>>
>>>>>>> I think what your suggesting is that my company should change how it's hardware works
>>>>>>> and that's not really an option for me. This isn't a simple thing like checking over the
>>>>>>> network if the machine is down or not, this is way more complex hardware design.
>>>>>>
>>>>>> That means you are ready to live with an unreliable design. There might be
>>>>>> cases where notifier does not get run properly and you will not do switch
>>>>>> despite the fact that OS has failed. I was just trying to nudge you in
>>>>>> a direction which could be more reliable mechanism.
>>>>>
>>>>> Sigh I see some deep confusion going on here.
>>>>>
>>>>> The panic notifiers are just that panic notifiers. They have not been
>>>>> nor should they be tied to kexec. If those notifiers force a switch
>>>>> over of between machines I fail to see why you would care if it was
>>>>> kexec or another panic situation that is forcing that switchover.
>>>>
>>>> Hidehiro isn't fixing the failover situation on my side, he's fixing register
>>>> information collection when crash_kexec_post_notifiers is used.
>>>
>>> Sure. Given that we have created this new parameter, let us fix it so that
>>> we can capture the other cpu register state in crash dump.
>>>
>>> I am little disappointed that it was not tested well when this parameter was
>>> introuced. We should have atleast tested it to the extent to see if there
>>> is proper cpu state present for all cpus in the crash dump.
>>>
>>> At that point of time it looked like a simple modification
>>> to allow panic notifiers before crash_kexec().
>>
>> Either that or we say no one cares enough, and it known broken so let's
>> just revert the fool thing.
>
> Masami, you introduced this option. Are you fine with the revert? Is it
> really being used and tested?

Actually, it is tested but under very limited situation. I think we
need a clear acceptance criteria, IOW, we need a testset for kdump
so that we can make things better.
Would you have it? maybe we can push it into kselftest.

>> I honestly can't see how to support panic notifiers, before kexec.
>> There is no way to tell what is being done and all of the pieces
>> including smp_send_stop are known to be buggy.
>
> we should be able to replace smp_send_stop() with what crash_kexec() is
> doing to stop the machine? If yes, then it should be fine I guess. This
> parameter description clearly says that specify it at your own risk. So
> we are not issuing a big support statement for successful kdump after
> panic notifiers. If it is something fixable, otherwise user needs
> to deal with it.

Agreed (as I've sent in other replay).

Thank you,

--
Masami HIRAMATSU
Linux Technology Research Center, System Productivity Research Dept.
Center for Technology Innovation - Systems Engineering
Hitachi, Ltd., Research & Development Group
E-mail: [email protected]

2015-07-15 10:49:51

by Hidehiro Kawai

[permalink] [raw]
Subject: Re: [PATCH 1/3] panic: Disable crash_kexec_post_notifiers if kdump is not available

(2015/07/15 0:40), Vivek Goyal wrote:
> On Tue, Jul 14, 2015 at 03:34:30PM +0000, [email protected] wrote:
>> On Tue, Jul 14, 2015 at 11:02:08AM -0400, Vivek Goyal wrote:
>>> On Tue, Jul 14, 2015 at 01:59:19PM +0000, [email protected] wrote:
>>>> On Mon, Jul 13, 2015 at 08:19:45PM -0500, Eric W. Biederman wrote:
>>>>> [email protected] writes:
>>>>>
>>>>>> On Fri, Jul 10, 2015 at 08:41:28AM -0500, Eric W. Biederman wrote:
>>>>>>> Hidehiro Kawai <[email protected]> writes:
>>>>>>>
>>>>>>>> You can call panic notifiers and kmsg dumpers before kdump by
>>>>>>>> specifying "crash_kexec_post_notifiers" as a boot parameter.
>>>>>>>> However, it doesn't make sense if kdump is not available. In that
>>>>>>>> case, disable "crash_kexec_post_notifiers" boot parameter so that
>>>>>>>> you can't change the value of the parameter.
>>>>>>>
>>>>>>> Nacked-by: "Eric W. Biederman" <[email protected]>
>>>>>>
>>>>>> I think it would make sense if he just replaced "kdump" with "kexec".
>>>>>
>>>>> It would be less insane, however it still makes no sense as without
>>>>> kexec on panic support crash_kexec is a noop. So the value of the
>>>>> seeting makes no difference.
>>>>
>>>> Can you explain more, I don't really understand what you mean. Are you suggesting
>>>> the whole "crash_kexec_post_notifiers" feature has no value ?
>>>
>>> Daniel,
>>>
>>> BTW, why are you using crash_kexec_post_notifiers commandline? Why not
>>> without it?
>>
>> It was explained in the prior thread but to rehash, the notifiers are used to do a switch
>> over from the crashed machine to another redundant machine.
>
> So why not detect failure using polling or issue notifications from second
> kernel.

Polling is not sufficient because some kernel parts may be
alive even if the responder of the polling is dead. We want
to notify the failure after stopping other CPUs.

Notifying from second kernel needs to wait for the kernel
booted up and device initialization if needed, and this
is not applicable if we want to do fast switchover.

Notifying just before second kernel, as Eric stated, is
one of the reliable option although we can't do complicate
things there. For example, we can notify the failure by
writing some specific I/O registers in purgatory codes
provided by kexec command. Since the purgatory codes are
currently embedded into kexec command, so we might need to
modify the mechanism to be pluggable because how to notify
will differ among vendors.

Anyway, this is the case of switchover use case. If we want
to save minimal information before kdump, notifiers or
kmsg_dump() can be used.

> IOW, expecting that a crashed machine will be able to deliver notification
> reliably is falwed to begin with, IMHO.

I think it depends on what callback is used. Most of panic
notifiers just do memory copy or I/O register access.
Of course, there are relatively complicate notifiers too,
and I'm preparing patch sets for hardening for that case.

Regards,
Hidehiro Kawai