2015-07-22 07:37:35

by Hidehiro Kawai

Subject: [PATCH 0/3] x86: Fix panic vs. NMI issues

When HA cluster software or an administrator detects that a host is
not responding, they issue an NMI to the host to completely stop
current work and take a crash dump. If the kernel has already panicked
or is capturing a crash dump at that time, a further NMI can cause
the crash dump to fail.

To solve this issue, this patch set does two things:

- Don't panic on NMI if the kernel has already panicked
- Introduce a "noextnmi" boot option which masks external NMIs at
boot time (supported only on x86)

---

Hidehiro Kawai (3):
x86/panic: Fix re-entrance problem due to panic on NMI
kexec: Fix race between panic() and crash_kexec() directly called
x86/apic: Introduce noextnmi boot option


Documentation/kernel-parameters.txt | 4 ++++
arch/x86/kernel/apic/apic.c | 17 +++++++++++++++-
arch/x86/kernel/nmi.c | 18 +++++++++++------
include/linux/kernel.h | 4 ++++
include/linux/kexec.h | 2 ++
kernel/kexec.c | 12 ++++++++++-
kernel/panic.c | 37 ++++++++++++++++++++++++++---------
7 files changed, 76 insertions(+), 18 deletions(-)


--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-07-22 07:37:39

by Hidehiro Kawai

Subject: [PATCH 3/3] x86/apic: Introduce noextnmi boot option

This patch introduces a new boot option, "noextnmi", which disables
external NMIs. This option is useful for the dump capture kernel,
so that an HA application or administrator can't mistakenly
shoot down the kernel with an NMI.

Currently, only x86 supports this option.

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
arch/x86/kernel/apic/apic.c | 17 ++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..2cbd40b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2364,6 +2364,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings

+ noextnmi [X86]
+ Mask external NMI. This option is useful for keeping a
+ dump capture kernel from being shot down by NMI.
+
nosmap [X86]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb5285..a140410 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -82,6 +82,12 @@
static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;

/*
+ * Don't enable external NMI via LINT1 on BSP. This is useful for
+ * the dump capture kernel.
+ */
+static bool apic_noextnmi;
+
+/*
* Map cpu index to physical APIC ID
*/
DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -1150,6 +1156,8 @@ void __init init_bsp_APIC(void)
value = APIC_DM_NMI;
if (!lapic_is_integrated()) /* 82489DX */
value |= APIC_LVT_LEVEL_TRIGGER;
+ if (apic_noextnmi)
+ value |= APIC_LVT_MASKED;
apic_write(APIC_LVT1, value);
}

@@ -1369,7 +1377,7 @@ void setup_local_APIC(void)
/*
* only the BP should see the LINT1 NMI signal, obviously.
*/
- if (!cpu)
+ if (!cpu && !apic_noextnmi)
value = APIC_DM_NMI;
else
value = APIC_DM_NMI | APIC_LVT_MASKED;
@@ -2537,3 +2545,10 @@ static int __init apic_set_disabled_cpu_apicid(char *arg)
return 0;
}
early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
+
+static int __init apic_set_noextnmi(char *arg)
+{
+ apic_noextnmi = true;
+ return 0;
+}
+early_param("noextnmi", apic_set_noextnmi);
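To make the effect of the two LINT1 hunks above concrete, here is a small userspace sketch (not kernel code) that computes the LVT1 value the patched init_bsp_APIC() and setup_local_APIC() would write. The helper names lint1_value() and bsp_lint1_value() are hypothetical; the bit values are the ones defined in arch/x86/include/asm/apicdef.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values from arch/x86/include/asm/apicdef.h. */
#define APIC_DM_NMI            0x00400u
#define APIC_LVT_LEVEL_TRIGGER (1u << 15)
#define APIC_LVT_MASKED        (1u << 16)

/*
 * Hypothetical helper: LINT1 value chosen by the patched
 * setup_local_APIC(). Only the BSP (cpu 0) leaves the NMI entry
 * unmasked, and only when "noextnmi" was not given.
 */
static uint32_t lint1_value(unsigned int cpu, bool noextnmi)
{
	if (cpu == 0 && !noextnmi)
		return APIC_DM_NMI;
	return APIC_DM_NMI | APIC_LVT_MASKED;
}

/*
 * Hypothetical helper: LINT1 value chosen by the patched
 * init_bsp_APIC(), including the 82489DX level-trigger quirk.
 */
static uint32_t bsp_lint1_value(bool integrated, bool noextnmi)
{
	uint32_t value = APIC_DM_NMI;

	if (!integrated)		/* 82489DX */
		value |= APIC_LVT_LEVEL_TRIGGER;
	if (noextnmi)
		value |= APIC_LVT_MASKED;
	return value;
}
```

With noextnmi given, even the BSP gets APIC_LVT_MASKED set, so no CPU accepts an external NMI on LINT1; without it, the written values are the same as before the patch.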

2015-07-22 07:37:37

by Hidehiro Kawai

Subject: [PATCH 2/3] kexec: Fix race between panic() and crash_kexec() directly called

Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
      nmi_shootdown_cpus() // stop other cpus

CPU 1:
  panic()
    crash_kexec()
      mutex_trylock() // failed to acquire
    smp_send_stop() // stop other cpus
    infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:

CPU 0:
  oops_end()
    crash_kexec()
      mutex_trylock() // acquired
        <NMI>
        io_check_error()
          panic()
            crash_kexec()
              mutex_trylock() // failed to acquire
            infinite loop

Clearly, this is an undesirable result.

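To fix this problem, this patch changes crash_kexec() to take
panic_lock so that it excludes a concurrent panic(). panic() itself,
which already holds panic_lock, calls the new lock-free variant
__crash_kexec() instead.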

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Andrew Morton <[email protected]>
---
include/linux/kexec.h | 2 ++
kernel/kexec.c | 12 +++++++++++-
kernel/panic.c | 4 ++--
3 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index e804306..bd6e477 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -238,6 +238,7 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
const char *name);
extern void crash_kexec(struct pt_regs *);
+extern void __crash_kexec(struct pt_regs *);
int kexec_should_crash(struct task_struct *);
void crash_save_cpu(struct pt_regs *regs, int cpu);
void crash_save_vmcoreinfo(void);
@@ -322,6 +323,7 @@ int parse_crashkernel_low(char *cmdline, unsigned long long system_ram,
struct pt_regs;
struct task_struct;
static inline void crash_kexec(struct pt_regs *regs) { }
+static inline void __crash_kexec(struct pt_regs *regs) { }
static inline int kexec_should_crash(struct task_struct *p) { return 0; }
#endif /* CONFIG_KEXEC */

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c10..fcdd825 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1470,7 +1470,7 @@ void __weak crash_unmap_reserved_pages(void)

#endif /* CONFIG_KEXEC_FILE */

-void crash_kexec(struct pt_regs *regs)
+void __crash_kexec(struct pt_regs *regs)
{
/* Take the kexec_mutex here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
@@ -1493,6 +1493,16 @@ void crash_kexec(struct pt_regs *regs)
}
}

+void crash_kexec(struct pt_regs *regs)
+{
+ unsigned long flags;
+
+ if (spin_trylock_irqsave(&panic_lock, flags)) {
+ __crash_kexec(regs);
+ spin_unlock_irqrestore(&panic_lock, flags);
+ }
+}
+
size_t crash_get_memory_size(void)
{
size_t size = 0;
diff --git a/kernel/panic.c b/kernel/panic.c
index 3c8338b..ce5c8ab 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -135,7 +135,7 @@ void __panic(char *msg)
* the "crash_kexec_post_notifiers" option to the kernel.
*/
if (!crash_kexec_post_notifiers)
- crash_kexec(NULL);
+ __crash_kexec(NULL);

/*
* Note smp_send_stop is the usual smp shutdown function, which
@@ -160,7 +160,7 @@ void __panic(char *msg)
* more unstable, it can increase risks of the kdump failure too.
*/
if (crash_kexec_post_notifiers)
- crash_kexec(NULL);
+ __crash_kexec(NULL);

bust_spinlocks(0);
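The locking scheme this patch sets up can be modelled in userspace roughly as follows. This is a sketch, not kernel code: a C11 atomic_flag stands in for panic_lock (the real spin_trylock_irqsave() also disables interrupts, which the model ignores), and the *_model functions and the kdump_runs counter are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for panic_lock; atomic_flag gives a non-blocking trylock. */
static atomic_flag panic_lock = ATOMIC_FLAG_INIT;

/* Counts how many times the (simulated) kdump path actually ran. */
static int kdump_runs;

static bool panic_trylock(void)
{
	/* test_and_set returns the previous value: false means we got it. */
	return !atomic_flag_test_and_set(&panic_lock);
}

static void panic_unlock(void)
{
	atomic_flag_clear(&panic_lock);
}

/* Models __crash_kexec(): the real one would jump to the crash kernel. */
static void do_crash_kexec_model(void)
{
	kdump_runs++;
}

/* Models the new crash_kexec(): skip kdump if panic() owns the lock. */
static void crash_kexec_model(void)
{
	if (panic_trylock()) {
		do_crash_kexec_model();
		panic_unlock();
	}
}

/* Models panic(): take panic_lock first, then call the lock-free variant. */
static void panic_model(void)
{
	if (!panic_trylock())
		return;		/* the kernel would call panic_smp_self_stop() */
	do_crash_kexec_model();
	/* panic() never releases panic_lock */
}
```

A direct crash_kexec() runs kdump while no panic is in progress; once panic() holds panic_lock, a concurrent crash_kexec() backs off instead of racing with panic()'s own kdump attempt.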


2015-07-22 07:38:24

by Hidehiro Kawai

Subject: [PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

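If a panic on NMI happens just after panic() has been entered on the
same CPU, panic() is called recursively. As a result, the nested call
stalls on panic_lock (it ends up in panic_smp_self_stop()), and the
interrupted panic() never resumes.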

To avoid this problem, don't call panic() in NMI context if
we've already entered panic() (i.e. we hold panic_lock).

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
---
arch/x86/kernel/nmi.c | 18 ++++++++++++------
include/linux/kernel.h | 4 ++++
kernel/panic.c | 33 +++++++++++++++++++++++++--------
3 files changed, 41 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index d05bd2e..c14b23f 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -230,8 +230,8 @@ void unregister_nmi_handler(unsigned int type, const char *name)
}
#endif

- if (panic_on_unrecovered_nmi)
- panic("NMI: Not continuing");
+ if (panic_on_unrecovered_nmi && spin_trylock(&panic_lock))
+ __panic("NMI: Not continuing");

pr_emerg("Dazed and confused, but trying to continue\n");

@@ -255,8 +255,12 @@ void unregister_nmi_handler(unsigned int type, const char *name)
reason, smp_processor_id());
show_regs(regs);

- if (panic_on_io_nmi)
- panic("NMI IOCK error: Not continuing");
+ if (panic_on_io_nmi) {
+ if (spin_trylock(&panic_lock))
+ __panic("NMI IOCK error: Not continuing");
+ else
+ return; /* We don't want to wait and re-enable NMI */
+ }

/* Re-enable the IOCK line, wait for a few seconds */
reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
@@ -296,8 +300,10 @@ void unregister_nmi_handler(unsigned int type, const char *name)
reason, smp_processor_id());

pr_emerg("Do you have a strange power saving mode enabled?\n");
- if (unknown_nmi_panic || panic_on_unrecovered_nmi)
- panic("NMI: Not continuing");
+ if (unknown_nmi_panic || panic_on_unrecovered_nmi) {
+ if (spin_trylock(&panic_lock))
+ __panic("NMI: Not continuing");
+ }

pr_emerg("Dazed and confused, but trying to continue\n");
}
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410..be430dc 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -250,11 +250,15 @@ static inline u32 reciprocal_scale(u32 val, u32 ep_ro)
static inline void might_fault(void) { }
#endif

+typedef struct spinlock spinlock_t;
+extern spinlock_t panic_lock;
extern struct atomic_notifier_head panic_notifier_list;
extern long (*panic_blink)(int state);
__printf(1, 2)
void panic(const char *fmt, ...)
__noreturn __cold;
+void __panic(char *msg)
+ __noreturn __cold;
extern void oops_enter(void);
extern void oops_exit(void);
void print_oops_end_marker(void);
diff --git a/kernel/panic.c b/kernel/panic.c
index 04e91ff..3c8338b 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
cpu_relax();
}

+DEFINE_SPINLOCK(panic_lock);
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -70,11 +72,8 @@ void __weak panic_smp_self_stop(void)
*/
void panic(const char *fmt, ...)
{
- static DEFINE_SPINLOCK(panic_lock);
static char buf[1024];
va_list args;
- long i, i_next = 0;
- int state = 0;

/*
* Disable local interrupts. This will prevent panic_smp_self_stop
@@ -97,12 +96,30 @@ void panic(const char *fmt, ...)
if (!spin_trylock(&panic_lock))
panic_smp_self_stop();

- console_verbose();
- bust_spinlocks(1);
va_start(args, fmt);
vsnprintf(buf, sizeof(buf), fmt, args);
va_end(args);
- pr_emerg("Kernel panic - not syncing: %s\n", buf);
+
+ __panic(buf);
+}
+
+/**
+ * __panic - no lock version of panic
+ * @msg: The text string to print
+ *
+ * Normally, please use panic(). This function can be used
+ * only if panic_lock has already been held.
+ *
+ * This function never returns.
+ */
+void __panic(char *msg)
+{
+ long i, i_next = 0;
+ int state = 0;
+
+ console_verbose();
+ bust_spinlocks(1);
+ pr_emerg("Kernel panic - not syncing: %s\n", msg);
#ifdef CONFIG_DEBUG_BUGVERBOSE
/*
* Avoid nested stack-dumping if a panic occurs during oops processing
@@ -131,7 +148,7 @@ void panic(const char *fmt, ...)
* Run any panic handlers, including those that might need to
* add information to the kmsg dump output.
*/
- atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
+ atomic_notifier_call_chain(&panic_notifier_list, 0, msg);

kmsg_dump(KMSG_DUMP_PANIC);

@@ -190,7 +207,7 @@ void panic(const char *fmt, ...)
disabled_wait(caller);
}
#endif
- pr_emerg("---[ end Kernel panic - not syncing: %s\n", buf);
+ pr_emerg("---[ end Kernel panic - not syncing: %s\n", msg);
local_irq_enable();
for (i = 0; ; i += PANIC_TIMER_STEP) {
touch_softlockup_watchdog();
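The panic()/__panic() split and the trylock in the NMI handlers can be sketched in userspace as below. It is only a model under stated assumptions: atomic_flag stands in for panic_lock, panic_body() plays the role of __panic() but returns so the model can be tested, and nmi_unknown_model() is a hypothetical condensation of the unknown-NMI path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_flag panic_lock = ATOMIC_FLAG_INIT;
static int panics_started;	/* how many times the panic body ran */

static bool panic_trylock(void)
{
	return !atomic_flag_test_and_set(&panic_lock);
}

/* Model of __panic(): the caller must already hold panic_lock. */
static void panic_body(const char *msg)
{
	panics_started++;
	fprintf(stderr, "Kernel panic - not syncing: %s\n", msg);
	/* the real __panic() never returns; the model returns for testing */
}

/* Model of panic(): acquire the lock, otherwise this CPU just stops. */
static void panic_model(const char *msg)
{
	if (!panic_trylock())
		return;		/* real code: panic_smp_self_stop() */
	panic_body(msg);
}

/*
 * Model of the unknown-NMI path after the patch: panic only if nobody
 * has panicked yet; otherwise return so the first panic() can finish.
 * Returns true if this call started a panic.
 */
static bool nmi_unknown_model(bool unknown_nmi_panic)
{
	if (unknown_nmi_panic && panic_trylock()) {
		panic_body("NMI: Not continuing");
		return true;
	}
	return false;
}
```

After panic_model() has taken the lock, an NMI arriving on the same CPU returns from the handler instead of parking the CPU in panic_smp_self_stop(), so the interrupted panic() can carry on to kdump.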

2015-07-23 08:16:38

by Peter Zijlstra

Subject: Re: [PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

On Wed, Jul 22, 2015 at 11:14:21AM +0900, Hidehiro Kawai wrote:
> +DEFINE_SPINLOCK(panic_lock);

At the very least this should be a raw spinlock, but wth aren't you
using a simple atomic_xchg() ?
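A rough sketch of what the atomic_xchg() suggestion could look like, using C11 atomic_exchange() as a userspace stand-in for the kernel's atomic_xchg(). The variable and function names are hypothetical. The second helper, which records the owning CPU with a compare-and-exchange so that a nested panic on the same CPU can be told apart from one on another CPU, goes beyond what is suggested here and is shown only as a possible variant.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* 0 = nobody is panicking, 1 = a panic is in progress. */
static atomic_int panic_in_progress = 0;

/*
 * Claim the right to panic. atomic_exchange() stores 1 and returns the
 * old value, so exactly one caller ever sees 0. No spinlock is needed
 * because nobody ever waits on or releases this flag.
 */
static bool claim_panic(void)
{
	return atomic_exchange(&panic_in_progress, 1) == 0;
}

#define NO_CPU (-1)
static atomic_int panic_owner = NO_CPU;

/*
 * Variant: returns the previous owner. NO_CPU means this CPU now owns
 * the panic; this_cpu means a nested panic on the same CPU.
 */
static int claim_panic_cpu(int this_cpu)
{
	int expected = NO_CPU;

	/* On failure, expected is overwritten with the current owner. */
	atomic_compare_exchange_strong(&panic_owner, &expected, this_cpu);
	return expected;
}
```

The plain exchange only answers "am I first?"; recording the CPU lets a caller distinguish re-entry on its own CPU (return from the NMI) from a panic already running elsewhere (stop this CPU).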

2015-07-23 08:25:26

by Michal Hocko

Subject: Re: [PATCH 0/3] x86: Fix panic vs. NMI issues

Hi,

On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote:
> When an HA cluster software or administrator detects non-response
> of a host, they issue an NMI to the host to completely stop current
> works and take a crash dump. If the kernel has already panicked
> or is capturing a crash dump at that time, further NMI can cause
> a crash dump failure.
>
> To solve this issue, this patch set does two things:
>
> - Don't panic on NMI if the kernel has already panicked
> - Introduce "noextnmi" boot option which masks external NMI at the
> boot time (supported only for x86)

I am currently debugging the same issue for our customer. Curiously
enough, the issue happens on Hitachi HW.
I haven't posted my patch for upstream review yet because I still
do not have feedback, but I believe your solution is unnecessarily
complex. Unless I am missing something, the following should be enough,
no?
---
From ba6ef85d26113e720a630ea22b08efef5b70210f Mon Sep 17 00:00:00 2001
From: Michal Hocko <[email protected]>
Date: Fri, 17 Jul 2015 15:17:08 +0200
Subject: [PATCH] kexec: Never return from crash_kexec when kexec is in
progress

We had a report where the kdump kernel didn't boot after an unknown NMI
had been delivered with unknown_nmi_panic enabled. The NMI is triggered
by HW and it is delivered to all CPUs at the same time. The machine has
hundreds of CPUs and the most plausible theory is that one CPU really
manages to kick off the kexec but cannot shut down all the other CPUs
because they are processing the NMI and so cannot process an IPI.
Another CPU then manages to call smp_send_stop from a concurrent panic,
and this stops the kexec CPU, which has already switched to the new
kernel and is no longer running in NMI mode.

Fix this by making crash_kexec never return if there is a kexec in
progress. This can be done easily by relying on the fact that
kexec_mutex will never be released for an ongoing kexec, so we just
have to loop over the trylock. The only tricky part is that
kexec_crash_image might not be loaded, in which case we have to return.
The check has to be done under the lock. Extract the trylock and check
into try_crash_kexec and make it return true only if crash kexec is
disabled (no crash image is loaded).

Signed-off-by: Michal Hocko <[email protected]>
---
kernel/kexec.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c1015e25..d61b1478167d 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1470,7 +1470,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,

#endif /* CONFIG_KEXEC_FILE */

-void crash_kexec(struct pt_regs *regs)
+static bool try_crash_kexec(struct pt_regs *regs)
{
/* Take the kexec_mutex here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
@@ -1490,7 +1490,20 @@ void crash_kexec(struct pt_regs *regs)
machine_kexec(kexec_crash_image);
}
mutex_unlock(&kexec_mutex);
+ return true;
}
+ return false;
+}
+
+void crash_kexec(struct pt_regs *regs)
+{
+ /*
+ * Never return from this function if a kexec is in progress
+ * already because next steps might interfere with it.
+ * try_crash_kexec will never succeed in such a case.
+ */
+ while (!try_crash_kexec(regs))
+ cpu_relax();
}

size_t crash_get_memory_size(void)
--
2.1.4

--
Michal Hocko
SUSE Labs
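The control flow of this proposal can be modelled in userspace as below. It is a sketch under stated assumptions: an atomic_flag stands in for kexec_mutex, the model always assumes no crash image is loaded (the real code jumps into the crash kernel via machine_kexec() and never returns when one is), and max_spins is a hypothetical bound that exists only so the model terminates; the real loop is unbounded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for kexec_mutex. */
static atomic_flag kexec_mutex = ATOMIC_FLAG_INIT;

static bool mutex_trylock_model(void)
{
	return !atomic_flag_test_and_set(&kexec_mutex);
}

static void mutex_unlock_model(void)
{
	atomic_flag_clear(&kexec_mutex);
}

/*
 * Model of try_crash_kexec(): true only if the lock was taken and no
 * crash image is loaded. With an image loaded, the real code would call
 * machine_kexec() here and never come back; the model omits that path.
 */
static bool try_crash_kexec_model(void)
{
	if (!mutex_trylock_model())
		return false;	/* a kexec is (or may be) in progress */
	mutex_unlock_model();
	return true;
}

/*
 * Model of the proposed crash_kexec() loop. Returns true if crash_kexec
 * would return to its caller, false if it would still be spinning.
 */
static bool crash_kexec_model(long max_spins)
{
	for (long i = 0; i < max_spins; i++) {
		if (try_crash_kexec_model())
			return true;
		/* cpu_relax() in the kernel */
	}
	return false;
}
```

If nobody holds kexec_mutex, crash_kexec returns; once another context holds it (simulated by taking the flag directly), the loop never succeeds. That is the intended behaviour when another CPU is mid-kexec, but it is also what happens if the holder is the very context the NMI interrupted on this CPU.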

2015-07-23 09:44:25

by Hidehiro Kawai

Subject: Re: [PATCH 1/3] x86/panic: Fix re-entrance problem due to panic on NMI

(2015/07/23 17:15), Peter Zijlstra wrote:
> On Wed, Jul 22, 2015 at 11:14:21AM +0900, Hidehiro Kawai wrote:
>> +DEFINE_SPINLOCK(panic_lock);
>
> At the very least this should be a raw spinlock, but wth aren't you
> using a simple atomic_xchg() ?

Thanks for the comment.

I just followed the current panic_lock implementation.
Using atomic_xchg() may be OK. I'll try another version
with atomic_xchg().

Regards,
--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

2015-07-23 10:11:15

by Hidehiro Kawai

Subject: Re: [PATCH 0/3] x86: Fix panic vs. NMI issues

Hi,

Thanks for the feedback.

(2015/07/23 17:25), Michal Hocko wrote:
> Hi,
>
> On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote:
>> When an HA cluster software or administrator detects non-response
>> of a host, they issue an NMI to the host to completely stop current
>> works and take a crash dump. If the kernel has already panicked
>> or is capturing a crash dump at that time, further NMI can cause
>> a crash dump failure.
>>
>> To solve this issue, this patch set does two things:
>>
>> - Don't panic on NMI if the kernel has already panicked
>> - Introduce "noextnmi" boot option which masks external NMI at the
>> boot time (supported only for x86)
>
> I am currently debugging the same issue for our customer. Curiously
> enough the issue happens on a Hitachi HW.

I found these issues through white-box testing and source code
reading. So they haven't happened at our customers yet, but they
could.

> I haven't posted my patch for an upstream review yet because I still
> do not have a feedback but I believe your solution is unnecessarily
> too complex. Unless I am missing something the following should be enough,
> no?

Your patch solves some cases, but I think it wouldn't cover
all the cases I want to solve. How about the following cases?

1) panic -> acquire panic_lock -> unknown NMI on this CPU ->
panic -> failed to acquire panic_lock -> infinite loop
==> no one processes kdump procedure.

2) crash_kexec w/o entering panic -> acquire kexec_mutex ->
unknown NMI on this CPU -> panic -> crash_kexec ->
failed to acquire kexec_mutex -> return to panic -> smp_send_stop

Even with your patch, case 2) makes crash_kexec loop on
try_crash_kexec forever, and no one processes the kdump procedure.

Regards,



--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group

2015-07-23 11:25:47

by Michal Hocko

Subject: Re: [PATCH 0/3] x86: Fix panic vs. NMI issues

On Thu 23-07-15 19:11:03, Hidehiro Kawai wrote:
> Hi,
>
> Thanks for the feedback.
>
> (2015/07/23 17:25), Michal Hocko wrote:
> > Hi,
> >
> > On Wed 22-07-15 11:14:21, Hidehiro Kawai wrote:
> >> When an HA cluster software or administrator detects non-response
> >> of a host, they issue an NMI to the host to completely stop current
> >> works and take a crash dump. If the kernel has already panicked
> >> or is capturing a crash dump at that time, further NMI can cause
> >> a crash dump failure.
> >>
> >> To solve this issue, this patch set does two things:
> >>
> >> - Don't panic on NMI if the kernel has already panicked
> >> - Introduce "noextnmi" boot option which masks external NMI at the
> >> boot time (supported only for x86)
> >
> > I am currently debugging the same issue for our customer. Curiously
> > enough the issue happens on a Hitachi HW.
>
> I found these issues by my white-box testing and source code
> reading. So, they haven't happened on our customers yet, but
> possibly happen.
>
> > I haven't posted my patch for an upstream review yet because I still
> > do not have a feedback but I believe your solution is unnecessarily
> > too complex. Unless I am missing something the following should be enough,
> > no?
>
> Your patch solves some cases, but I think it wouldn't cover
> all cases where I want to solve. How about the following cases?
>
> 1) panic -> acquire panic_lock -> unknown NMI on this CPU ->
> panic -> failed to acquire panic_lock -> infinite loop
> ==> no one processes kdump procedure.

Ohh, I wasn't aware of panic_lock. 93e13a360ba3 ("kdump: fix
crash_kexec()/smp_send_stop() race in panic()") was introduced in
3.3 and I was debugging this on a 3.0-based kernel.

> 2) crash_kexec w/o entering panic -> acquire kexec_mutex ->
> unknown NMI on this CPU -> panic -> crash_kexec ->
> failed to acquire kexec_mutex -> return to panic -> smp_send_stop
>
> Even if with your patch, case 2) causes infinite loop of
> try_crash_kexec and no one processes kdump procedure.

You are right - I have missed this case.
--
Michal Hocko
SUSE Labs