2015-08-06 05:47:36

by Hidehiro Kawai

Subject: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

When HA cluster software or an administrator detects that a host is
unresponsive, they issue an NMI to the host to completely stop its
current work and take a crash dump. If the kernel has already panicked
or is capturing a crash dump at that time, a further NMI can cause
the crash dump to fail.

Also, crash_kexec() called directly from oops context can race
with panic().

To solve these issues, this patch set does the following (a minimal
sketch of the exclusion scheme follows this list):

- Don't panic on NMI if the kernel has already panicked
- Extend exclusion control currently done by panic_lock to crash_kexec
- Introduce "noextnmi" boot option which masks external NMI at the
boot time (supported only for x86)
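
For reference, here is a minimal user-space model of the nmi_panic()
exclusion the series introduces (C11 atomics stand in for the kernel's
atomic_t, and the panic/self-stop paths are just printouts, so this is
an illustrative sketch rather than the kernel code itself):

#include <stdatomic.h>
#include <stdio.h>

static atomic_int panic_cpu = -1;	/* -1: nobody has panicked yet */

/* Model of the nmi_panic() macro from patches 1/4 and 2/4. */
static void nmi_panic_model(int this_cpu)
{
	int old_cpu = -1;

	/* Equivalent of atomic_cmpxchg(&panic_cpu, -1, this_cpu). */
	atomic_compare_exchange_strong(&panic_cpu, &old_cpu, this_cpu);

	if (old_cpu == -1)
		printf("cpu%d: first comer, calling panic()\n", this_cpu);
	else if (old_cpu != this_cpu)
		printf("cpu%d: another cpu panicked, stopping here\n", this_cpu);
	else
		printf("cpu%d: already panicking, returning from NMI\n", this_cpu);
}

int main(void)
{
	nmi_panic_model(0);	/* wins the race and panics */
	nmi_panic_model(1);	/* loses; stops itself (and saves regs in 2/4) */
	nmi_panic_model(0);	/* panic on NMI on the panicking cpu: just return */
	return 0;
}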

V3:
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
if another cpu already panicked

V2:
- Use atomic_cmpxchg() instead of current spin_trylock() to exclude
concurrent accesses to panic() and crash_kexec()
- Don't introduce no-lock version of panic() and crash_kexec()

---

Hidehiro Kawai (4):
panic/x86: Fix re-entrance problem due to panic on NMI
panic/x86: Allow cpus to save registers even if they are looping in NMI context
kexec: Fix race between panic() and crash_kexec() called directly
x86/apic: Introduce noextnmi boot option


Documentation/kernel-parameters.txt | 4 ++++
arch/x86/kernel/apic/apic.c | 17 ++++++++++++++++-
arch/x86/kernel/nmi.c | 15 +++++++++++----
arch/x86/kernel/reboot.c | 11 +++++++++++
include/linux/kernel.h | 21 +++++++++++++++++++++
kernel/kexec.c | 20 ++++++++++++++++++++
kernel/panic.c | 23 ++++++++++++++++++++---
kernel/watchdog.c | 5 +++--
8 files changed, 106 insertions(+), 10 deletions(-)


--
Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-08-06 05:47:44

by Hidehiro Kawai

Subject: [V3 PATCH 4/4] x86/apic: Introduce noextnmi boot option

This patch introduces a new boot option, "noextnmi", which disables
external NMIs. This option is useful for the dump capture kernel
so that an HA application or administrator cannot mistakenly
shoot down the kernel with an NMI.

Currently, only x86 supports this option.

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jonathan Corbet <[email protected]>
---
Documentation/kernel-parameters.txt | 4 ++++
arch/x86/kernel/apic/apic.c | 17 ++++++++++++++++-
2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1d6f045..2cbd40b 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -2364,6 +2364,10 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
noexec=on: enable non-executable mappings (default)
noexec=off: disable non-executable mappings

+ noextnmi [X86]
+ Mask external NMI. This option is useful for a
+ dump capture kernel not to be shot down by NMI.
+
nosmap [X86]
Disable SMAP (Supervisor Mode Access Prevention)
even if it is supported by processor.
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index dcb5285..a140410 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -82,6 +82,12 @@
static unsigned int disabled_cpu_apicid __read_mostly = BAD_APICID;

/*
+ * Don't enable external NMI via LINT1 on BSP. This is useful for
+ * the dump capture kernel.
+ */
+static bool apic_noextnmi;
+
+/*
* Map cpu index to physical APIC ID
*/
DEFINE_EARLY_PER_CPU_READ_MOSTLY(u16, x86_cpu_to_apicid, BAD_APICID);
@@ -1150,6 +1156,8 @@ void __init init_bsp_APIC(void)
value = APIC_DM_NMI;
if (!lapic_is_integrated()) /* 82489DX */
value |= APIC_LVT_LEVEL_TRIGGER;
+ if (apic_noextnmi)
+ value |= APIC_LVT_MASKED;
apic_write(APIC_LVT1, value);
}

@@ -1369,7 +1377,7 @@ void setup_local_APIC(void)
/*
* only the BP should see the LINT1 NMI signal, obviously.
*/
- if (!cpu)
+ if (!cpu && !apic_noextnmi)
value = APIC_DM_NMI;
else
value = APIC_DM_NMI | APIC_LVT_MASKED;
@@ -2537,3 +2545,10 @@ static int __init apic_set_disabled_cpu_apicid(char *arg)
return 0;
}
early_param("disable_cpu_apicid", apic_set_disabled_cpu_apicid);
+
+static int __init apic_set_noextnmi(char *arg)
+{
+ apic_noextnmi = true;
+ return 0;
+}
+early_param("noextnmi", apic_set_noextnmi);

2015-08-06 05:47:42

by Hidehiro Kawai

Subject: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
infinitely in NMI context. On x86 in particular, cpu-B issues an NMI
IPI to the other cpus to save their register states and do some
cleanups if kdump is enabled, but cpu-A can't handle that NMI and so
fails to save its register state.

To solve this issue, we wait until the NMI IPI has been issued and
then call the NMI handler which saves the register states.

V3:
- Newly introduced

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Michal Hocko <[email protected]>
---
arch/x86/kernel/nmi.c | 6 +++---
arch/x86/kernel/reboot.c | 11 +++++++++++
include/linux/kernel.h | 12 ++++++++++--
kernel/panic.c | 10 ++++++++++
kernel/watchdog.c | 5 +++--
5 files changed, 37 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index dcd4038..fbb1877 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
#endif

if (panic_on_unrecovered_nmi)
- nmi_panic("NMI: Not continuing");
+ nmi_panic(regs, "NMI: Not continuing");

pr_emerg("Dazed and confused, but trying to continue\n");

@@ -256,7 +256,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
show_regs(regs);

if (panic_on_io_nmi) {
- nmi_panic("NMI IOCK error: Not continuing");
+ nmi_panic(regs, "NMI IOCK error: Not continuing");

/*
* If we return from here, we are already in panic.
@@ -304,7 +304,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)

pr_emerg("Do you have a strange power saving mode enabled?\n");
if (unknown_nmi_panic || panic_on_unrecovered_nmi)
- nmi_panic("NMI: Not continuing");
+ nmi_panic(regs, "NMI: Not continuing");

pr_emerg("Dazed and confused, but trying to continue\n");
}
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 86db4bc..299b4b7 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -718,6 +718,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
static nmi_shootdown_cb shootdown_callback;

static atomic_t waiting_for_crash_ipi;
+static int crash_ipi_done;

static int crash_nmi_callback(unsigned int val, struct pt_regs *regs)
{
@@ -779,6 +780,7 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)
wmb();

smp_send_nmi_allbutself();
+ crash_ipi_done = 1; /* Kick cpus looping in nmi context */

msecs = 1000; /* Wait at most a second for the other cpus to stop */
while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
@@ -788,6 +790,15 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)

/* Leave the nmi callback set */
}
+
+void nmi_panic_self_stop(struct pt_regs *regs)
+{
+ while (crash_ipi_done == 0)
+ cpu_relax();
+
+ crash_nmi_callback(0, regs); /* Shouldn't return */
+}
+
#else /* !CONFIG_SMP */
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 57c33da..9fe9961 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -255,6 +255,7 @@ static inline void might_fault(void) { }
__printf(1, 2)
void panic(const char *fmt, ...)
__noreturn __cold;
+void nmi_panic_self_stop(struct pt_regs *);
extern void oops_enter(void);
extern void oops_exit(void);
void print_oops_end_marker(void);
@@ -448,12 +449,19 @@ extern __scanf(2, 0)
/*
* A variant of panic() called from NMI context.
* If we've already panicked on this cpu, return from here.
+ * If another cpu already panicked, loop in nmi_panic_self_stop() which
+ * can provide architecture dependent code such as saving register states
+ * for crash dump.
*/
-#define nmi_panic(fmt, ...) \
+#define nmi_panic(regs, fmt, ...) \
do { \
+ int old_cpu; \
int this_cpu = raw_smp_processor_id(); \
- if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
+ old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu); \
+ if (old_cpu == -1) \
panic(fmt, ##__VA_ARGS__); \
+ else if (old_cpu != this_cpu) \
+ nmi_panic_self_stop(regs); \
} while (0)

/*
diff --git a/kernel/panic.c b/kernel/panic.c
index 3d4bc73..415c4e7 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -60,6 +60,16 @@ void __weak panic_smp_self_stop(void)
cpu_relax();
}

+/*
+ * Stop ourself in NMI context if another cpu has already panicked.
+ * Architecture code may override this to prepare for crash dumping
+ * (e.g. save register information).
+ */
+void __weak nmi_panic_self_stop(struct pt_regs *regs)
+{
+ panic_smp_self_stop();
+}
+
atomic_t panic_cpu = ATOMIC_INIT(-1);

/**
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index 9a3d738..8e3a31c 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -304,8 +304,9 @@ static void watchdog_overflow_callback(struct perf_event *event,
return;

if (hardlockup_panic)
- nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
- this_cpu);
+ nmi_panic(regs,
+ "Watchdog detected hard LOCKUP on cpu %d",
+ this_cpu);
else
WARN(1, "Watchdog detected hard LOCKUP on cpu %d",
this_cpu);

2015-08-06 05:47:40

by Hidehiro Kawai

Subject: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

Currently, panic() and crash_kexec() can be called at the same time.
For example (x86 case):

CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
nmi_shootdown_cpus() // stop other cpus

CPU 1:
panic()
crash_kexec()
mutex_trylock() // failed to acquire
smp_send_stop() // stop other cpus
infinite loop

If CPU 1 calls smp_send_stop() before nmi_shootdown_cpus(), kdump
fails.

In another case:

CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
<NMI>
io_check_error()
panic()
crash_kexec()
mutex_trylock() // failed to acquire
infinite loop

Clearly, this is an undesirable result.

To fix this problem, this patch changes crash_kexec() to exclude
concurrent callers by using the atomic_t panic_cpu.

V2:
- Use atomic_cmpxchg() instead of spin_trylock() on panic_lock
to exclude concurrent accesses
- Don't introduce no-lock version of crash_kexec()

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Eric Biederman <[email protected]>
Cc: Vivek Goyal <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Michal Hocko <[email protected]>
---
kernel/kexec.c | 20 ++++++++++++++++++++
1 file changed, 20 insertions(+)

diff --git a/kernel/kexec.c b/kernel/kexec.c
index a785c10..ed75a8b 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1472,6 +1472,18 @@ void __weak crash_unmap_reserved_pages(void)

void crash_kexec(struct pt_regs *regs)
{
+ int old_cpu, this_cpu;
+
+ /*
+ * `old_cpu == -1' means we are the first comer and crash_kexec()
+ * was called without entering panic().
+ * `old_cpu == this_cpu' means crash_kexec() was called from panic().
+ */
+ this_cpu = raw_smp_processor_id();
+ old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
+ if (old_cpu != -1 && old_cpu != this_cpu)
+ return;
+
/* Take the kexec_mutex here to prevent sys_kexec_load
* running on one cpu from replacing the crash kernel
* we are using after a panic on a different cpu.
@@ -1491,6 +1503,14 @@ void crash_kexec(struct pt_regs *regs)
}
mutex_unlock(&kexec_mutex);
}
+
+ /*
+ * If we came here from panic(), we have to keep panic_cpu
+ * to prevent other cpus from entering panic(). Otherwise,
+ * reset it so that other cpus can enter panic()/crash_kexec().
+ */
+ if (old_cpu == -1)
+ atomic_xchg(&panic_cpu, -1);
}

size_t crash_get_memory_size(void)

2015-08-06 05:47:39

by Hidehiro Kawai

Subject: [V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

If a panic on NMI happens just after panic() on the same CPU, panic()
is called recursively. As a result, it stalls after failing to
acquire panic_lock.

To avoid this problem, don't call panic() in NMI context if
we've already entered panic().

V3:
- Introduce nmi_panic() macro to reduce code duplication
- In the case of panic on NMI, don't return from NMI handlers
if another cpu already panicked

V2:
- Use atomic_cmpxchg() instead of current spin_trylock() to
exclude concurrent accesses to the panic routines
- Don't introduce no-lock version of panic()

Signed-off-by: Hidehiro Kawai <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Michal Hocko <[email protected]>
---
arch/x86/kernel/nmi.c | 15 +++++++++++----
include/linux/kernel.h | 13 +++++++++++++
kernel/panic.c | 13 ++++++++++---
kernel/watchdog.c | 2 +-
4 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
index d05bd2e..dcd4038 100644
--- a/arch/x86/kernel/nmi.c
+++ b/arch/x86/kernel/nmi.c
@@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
#endif

if (panic_on_unrecovered_nmi)
- panic("NMI: Not continuing");
+ nmi_panic("NMI: Not continuing");

pr_emerg("Dazed and confused, but trying to continue\n");

@@ -255,8 +255,15 @@ void unregister_nmi_handler(unsigned int type, const char *name)
reason, smp_processor_id());
show_regs(regs);

- if (panic_on_io_nmi)
- panic("NMI IOCK error: Not continuing");
+ if (panic_on_io_nmi) {
+ nmi_panic("NMI IOCK error: Not continuing");
+
+ /*
+ * If we return from here, we are already in panic.
+ * So, simply return without the delay and without re-enabling NMI
+ */
+ return;
+ }

/* Re-enable the IOCK line, wait for a few seconds */
reason = (reason & NMI_REASON_CLEAR_MASK) | NMI_REASON_CLEAR_IOCHK;
@@ -297,7 +304,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)

pr_emerg("Do you have a strange power saving mode enabled?\n");
if (unknown_nmi_panic || panic_on_unrecovered_nmi)
- panic("NMI: Not continuing");
+ nmi_panic("NMI: Not continuing");

pr_emerg("Dazed and confused, but trying to continue\n");
}
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 5582410..57c33da 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -443,6 +443,19 @@ extern __scanf(2, 0)

extern bool crash_kexec_post_notifiers;

+extern atomic_t panic_cpu;
+
+/*
+ * A variant of panic() called from NMI context.
+ * If we've already panicked on this cpu, return from here.
+ */
+#define nmi_panic(fmt, ...) \
+ do { \
+ int this_cpu = raw_smp_processor_id(); \
+ if (atomic_cmpxchg(&panic_cpu, -1, this_cpu) != this_cpu) \
+ panic(fmt, ##__VA_ARGS__); \
+ } while (0)
+
/*
* Only to be used by arch init code. If the user over-wrote the default
* CONFIG_PANIC_TIMEOUT, honor it.
diff --git a/kernel/panic.c b/kernel/panic.c
index 04e91ff..3d4bc73 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -60,6 +60,8 @@ void __weak panic_smp_self_stop(void)
cpu_relax();
}

+atomic_t panic_cpu = ATOMIC_INIT(-1);
+
/**
* panic - halt the system
* @fmt: The text string to print
@@ -70,17 +72,17 @@ void __weak panic_smp_self_stop(void)
*/
void panic(const char *fmt, ...)
{
- static DEFINE_SPINLOCK(panic_lock);
static char buf[1024];
va_list args;
long i, i_next = 0;
int state = 0;
+ int old_cpu, this_cpu;

/*
* Disable local interrupts. This will prevent panic_smp_self_stop
* from deadlocking the first cpu that invokes the panic, since
* there is nothing to prevent an interrupt handler (that runs
- * after the panic_lock is acquired) from invoking panic again.
+ * after setting panic_cpu) from invoking panic again.
*/
local_irq_disable();

@@ -93,8 +95,13 @@ void panic(const char *fmt, ...)
* multiple parallel invocations of panic, all other CPUs either
* stop themself or will wait until they are stopped by the 1st CPU
* with smp_send_stop().
+ *
+ * `old_cpu == -1' means we are the first comer.
+ * `old_cpu == this_cpu' means we came here due to panic on NMI.
*/
- if (!spin_trylock(&panic_lock))
+ this_cpu = raw_smp_processor_id();
+ old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
+ if (old_cpu != -1 && old_cpu != this_cpu)
panic_smp_self_stop();

console_verbose();
diff --git a/kernel/watchdog.c b/kernel/watchdog.c
index a6ffa43..9a3d738 100644
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -304,7 +304,7 @@ static void watchdog_overflow_callback(struct perf_event *event,
return;

if (hardlockup_panic)
- panic("Watchdog detected hard LOCKUP on cpu %d",
+ nmi_panic("Watchdog detected hard LOCKUP on cpu %d",
this_cpu);
else
WARN(1, "Watchdog detected hard LOCKUP on cpu %d",

2015-08-07 14:38:43

by Michal Hocko

Subject: Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

JFYI, I have those patches being tested on a largish box and will come
back to you as soon as I have some feedback.

I will also try to review these patches sometime next week.

Thanks!

On Thu 06-08-15 14:45:43, Hidehiro Kawai wrote:
> When an HA cluster software or administrator detects non-response
> of a host, they issue an NMI to the host to completely stop current
> works and take a crash dump. If the kernel has already panicked
> or is capturing a crash dump at that time, further NMI can cause
> a crash dump failure.
>
> Also, crash_kexec() called from oops context and panic() can
> cause race conditions.
>
> To solve these issue, this patch set does following things:
>
> - Don't panic on NMI if the kernel has already panicked
> - Extend exclusion control currently done by panic_lock to crash_kexec
> - Introduce "noextnmi" boot option which masks external NMI at the
> boot time (supported only for x86)
>
> V3:
> - Introduce nmi_panic() macro to reduce code duplication
> - In the case of panic on NMI, don't return from NMI handlers
> if another cpu already panicked
>
> V2:
> - Use atomic_cmpxchg() instead of current spin_trylock() to exclude
> concurrent accesses to panic() and crash_kexec()
> - Don't introduce no-lock version of panic() and crash_kexec()
>
> ---
>
> Hidehiro Kawai (4):
> panic/x86: Fix re-entrance problem due to panic on NMI
> panic/x86: Allow cpus to save registers even if they are looping in NMI context
> kexec: Fix race between panic() and crash_kexec() called directly
> x86/apic: Introduce noextnmi boot option
>
>
> Documentation/kernel-parameters.txt | 4 ++++
> arch/x86/kernel/apic/apic.c | 17 ++++++++++++++++-
> arch/x86/kernel/nmi.c | 15 +++++++++++----
> arch/x86/kernel/reboot.c | 11 +++++++++++
> include/linux/kernel.h | 21 +++++++++++++++++++++
> kernel/kexec.c | 20 ++++++++++++++++++++
> kernel/panic.c | 23 ++++++++++++++++++++---
> kernel/watchdog.c | 5 +++--
> 8 files changed, 106 insertions(+), 10 deletions(-)
>
>
> --
> Hidehiro Kawai
> Hitachi, Ltd. Research & Development Group
>
>

--
Michal Hocko
SUSE Labs

2015-08-20 23:08:54

by Peter Zijlstra

Subject: Re: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> void crash_kexec(struct pt_regs *regs)
> {
> + int old_cpu, this_cpu;
> +
> + /*
> + * `old_cpu == -1' means we are the first comer and crash_kexec()
> + * was called without entering panic().
> + * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> + */
> + this_cpu = raw_smp_processor_id();
> + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> + if (old_cpu != -1 && old_cpu != this_cpu)
> + return;

This allows recursive calling of crash_kexec(), the Changelog did not
mention that. Is this really required?

> +
> /* Take the kexec_mutex here to prevent sys_kexec_load
> * running on one cpu from replacing the crash kernel
> * we are using after a panic on a different cpu.

2015-08-20 23:18:06

by Peter Zijlstra

Subject: Re: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec


User-Agent: StGit/0.16

Fwiw, stgit is broken wrt sending email: all your emails have the exact
same timestamp, which means that the emails get ordered by received
timestamp when threaded and generate the mess below:


Aug 06 Hidehiro Kawai (1.9K) [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec
Aug 06 Hidehiro Kawai (2.4K) ├─>[V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly
Aug 06 Hidehiro Kawai (4.9K) ├─>[V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI
Aug 06 Hidehiro Kawai (5.3K) ├─>[V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context
Aug 06 Hidehiro Kawai (2.5K) ├─>[V3 PATCH 4/4] x86/apic: Introduce noextnmi boot option

2015-08-20 23:18:22

by Peter Zijlstra

Subject: Re: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
> infinitely in NMI context. Especially for x86, cpu-B issues NMI IPI
> to other cpus to save their register states and do some cleanups if
> kdump is enabled, but cpu-A can't handle the NMI and fails to save
> register states.
>
> To solve thie issue, we wait for the timing of the NMI IPI, then
> call the NMI handler which saves register states.

Sorry, I don't follow, what?

2015-08-20 23:45:09

by Peter Zijlstra

Subject: Re: [V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> index d05bd2e..dcd4038 100644
> --- a/arch/x86/kernel/nmi.c
> +++ b/arch/x86/kernel/nmi.c
> @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
> #endif
>
> if (panic_on_unrecovered_nmi)
> - panic("NMI: Not continuing");
> + nmi_panic("NMI: Not continuing");
>
> pr_emerg("Dazed and confused, but trying to continue\n");
>

What tree is this again.. my tree (-tip) doesn't have
panic_on_unrecovered_nmi nonsense.

2015-08-22 00:41:27

by Hidehiro Kawai

Subject: RE: [V3 PATCH 0/4] Fix race issues among panic, NMI and crash_kexec

> From: Peter Zijlstra [mailto:[email protected]]
> User-Agent: StGit/0.16
>
> Fwiw, stgit is broken wrt sending email, all your emails have the exact
> same timestamp, which means that the emails will be ordered on received
> timestamp when threaded and generate the below mess:

Sorry for the inconvenience. I'll try to find some workaround.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-08-22 00:46:25

by Hidehiro Kawai

Subject: RE: [V3 PATCH 1/4] panic/x86: Fix re-entrance problem due to panic on NMI

> From: Peter Zijlstra [mailto:[email protected]]
>
> On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c
> > index d05bd2e..dcd4038 100644
> > --- a/arch/x86/kernel/nmi.c
> > +++ b/arch/x86/kernel/nmi.c
> > @@ -231,7 +231,7 @@ void unregister_nmi_handler(unsigned int type, const char *name)
> > #endif
> >
> > if (panic_on_unrecovered_nmi)
> > - panic("NMI: Not continuing");
> > + nmi_panic("NMI: Not continuing");
> >
> > pr_emerg("Dazed and confused, but trying to continue\n");
> >
>
> What tree is this again.. my tree (-tip) doesn't have
> panic_on_unrecovered_nmi nonsense.

I created this patch set on top of v4.2-rc5, but it is mostly
related to x86 and should be based on the -tip tree.
I'll rebase it in the next version.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-08-22 01:43:08

by Hidehiro Kawai

Subject: RE: [V3 PATCH 2/4] panic/x86: Allow cpus to save registers even if they are looping in NMI context

> From: Peter Zijlstra [mailto:[email protected]]
>
> On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > When cpu-A panics on NMI just after cpu-B has panicked, cpu-A loops
> > infinitely in NMI context. Especially for x86, cpu-B issues NMI IPI
> > to other cpus to save their register states and do some cleanups if
> > kdump is enabled, but cpu-A can't handle the NMI and fails to save
> > register states.
> >
> > To solve thie issue, we wait for the timing of the NMI IPI, then
> > call the NMI handler which saves register states.
>
> Sorry, I don't follow, what?

First, nmi_shootdown_cpus(), a subroutine of crash_kexec(), sends an
NMI IPI to the non-panicking cpus to stop them while saving their
registers and doing some cleanups for crash dumping. So if a non-panicking
cpu is looping infinitely in NMI context at that time, we fail to save
its register information and lose that information from the crash dump.

`Infinite loop in NMI context' can happen when a panic on NMI is about
to happen while another cpu is already processing panic().
To save regs and do some cleanups in that case too, this patch does
two things (see the sketch after this list):

1. Moves the `infinite loop in NMI context' (actually
panic_smp_self_stop()) outside of panic() so that the pt_regs object
is still available
2. Calls a callback of nmi_shootdown_cpus() directly to save regs and
do some cleanups after waiting_for_crash_ipi has been set;
waiting_for_crash_ipi is used for counting down the number of cpus
which handled the callback
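
For reference, here is a user-space model of that flow (pthreads and a
C11 atomic flag stand in for real cpus and the kernel's crash_ipi_done;
crash_nmi_callback_model() is a hypothetical placeholder for the real
register-saving callback, so treat this as an illustrative sketch only):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int crash_ipi_done;

/* Placeholder for crash_nmi_callback(): save registers, then stop. */
static void crash_nmi_callback_model(int cpu)
{
	printf("cpu%d: saving registers for the crash dump\n", cpu);
}

/* Model of nmi_panic_self_stop(): wait for the shootdown, then save regs. */
static void *nmi_looping_cpu(void *arg)
{
	int cpu = *(int *)arg;

	while (!atomic_load(&crash_ipi_done))
		;	/* cpu_relax() in the kernel */

	crash_nmi_callback_model(cpu);
	return NULL;
}

/* Model of nmi_shootdown_cpus() on the panicking cpu. */
static void *panicking_cpu(void *arg)
{
	(void)arg;
	printf("cpu0: nmi_shootdown_cpus(): sending NMI IPI\n");
	atomic_store(&crash_ipi_done, 1);	/* kick cpus looping in NMI */
	return NULL;
}

int main(void)
{
	pthread_t a, b;
	int cpu1 = 1;

	pthread_create(&a, NULL, nmi_looping_cpu, &cpu1);
	sleep(1);	/* let cpu1 reach its wait loop first */
	pthread_create(&b, NULL, panicking_cpu, NULL);

	pthread_join(a, NULL);
	pthread_join(b, NULL);
	return 0;
}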

Does that answer your question?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-08-22 02:35:30

by Hidehiro Kawai

Subject: RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

> From: Peter Zijlstra [mailto:[email protected]]
>
> On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > void crash_kexec(struct pt_regs *regs)
> > {
> > + int old_cpu, this_cpu;
> > +
> > + /*
> > + * `old_cpu == -1' means we are the first comer and crash_kexec()
> > + * was called without entering panic().
> > + * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> > + */
> > + this_cpu = raw_smp_processor_id();
> > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > + if (old_cpu != -1 && old_cpu != this_cpu)
> > + return;
>
> This allows recursive calling of crash_kexec(), the Changelog did not
> mention that. Is this really required?

Which part are you concerned about? A recursive call of crash_kexec()
doesn't happen. In the first place, one of the purposes of this patch
is to prevent a recursive call of crash_kexec() in the following case,
as I stated in the description:

CPU 0:
oops_end()
crash_kexec()
mutex_trylock() // acquired
<NMI>
io_check_error()
panic()
crash_kexec()
mutex_trylock() // failed to acquire
infinite loop


Also, the logic hasn't changed from V1 (although the implementation
has), so I didn't add any further changelog.

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



2015-08-25 14:53:11

by Peter Zijlstra

Subject: Re: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

On Sat, Aug 22, 2015 at 02:35:24AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > From: Peter Zijlstra [mailto:[email protected]]
> >
> > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > void crash_kexec(struct pt_regs *regs)
> > > {
> > > + int old_cpu, this_cpu;
> > > +
> > > + /*
> > > + * `old_cpu == -1' means we are the first comer and crash_kexec()
> > > + * was called without entering panic().
> > > + * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> > > + */
> > > + this_cpu = raw_smp_processor_id();
> > > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > + return;
> >
> > This allows recursive calling of crash_kexec(), the Changelog did not
> > mention that. Is this really required?
>
> What part are you arguing? Recursive call of crash_kexec() doesn't
> happen. In the first place, one of the purpose of this patch is
> to prevent a recursive call of crash_kexec() in the following case
> as I stated in the description:
>
> CPU 0:
> oops_end()
> crash_kexec()
> mutex_trylock() // acquired
> <NMI>
> io_check_error()
> panic()
> crash_kexec()
> mutex_trylock() // failed to acquire
> infinite loop
>

Yes, but what do we want to do there? It seems to me that is wrong; we
do not want to let a recursive crash_kexec() proceed.

Whereas the condition you created explicitly allows this recursion by
virtue of the 'old_cpu != this_cpu' check.

Your changelog does not explain why you want a recursive crash_kexec().

> Also, the logic doesn't change form V1 (although the implementation
> changed), so I didn't add changelogs any more.

I cannot remember V1, nor can any prior patch be relevant.

2015-08-26 03:11:35

by Hidehiro Kawai

Subject: RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

Hi,

> From: Peter Zijlstra [mailto:[email protected]]
>
> On Sat, Aug 22, 2015 at 02:35:24AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > From: Peter Zijlstra [mailto:[email protected]]
> > >
> > > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > > void crash_kexec(struct pt_regs *regs)
> > > > {
> > > > + int old_cpu, this_cpu;
> > > > +
> > > > + /*
> > > > + * `old_cpu == -1' means we are the first comer and crash_kexec()
> > > > + * was called without entering panic().
> > > > + * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> > > > + */
> > > > + this_cpu = raw_smp_processor_id();
> > > > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > > + return;
> > >
> > > This allows recursive calling of crash_kexec(), the Changelog did not
> > > mention that. Is this really required?
> >
> > What part are you arguing? Recursive call of crash_kexec() doesn't
> > happen. In the first place, one of the purpose of this patch is
> > to prevent a recursive call of crash_kexec() in the following case
> > as I stated in the description:
> >
> > CPU 0:
> > oops_end()
> > crash_kexec()
> > mutex_trylock() // acquired
> > <NMI>
> > io_check_error()
> > panic()
> > crash_kexec()
> > mutex_trylock() // failed to acquire
> > infinite loop
> >
>
> Yes, but what to we want to do there? It seems to me that is wrong, we
> do not want to let a recursive crash_kexec() proceed.
>
> Whereas the condition you created explicitly allows this recursion by
> virtue of the 'old_cpu != this_cpu' check.

I understand your question. I don't intend the 'old_cpu != this_cpu'
check to permit a recursive call of crash_kexec(). That check is
needed for the panic() --> crash_kexec() case: since panic_cpu has
already been set to this_cpu in panic() (please see PATCH 1/4), no one
could run crash_kexec() without the 'old_cpu != this_cpu' check.

If you don't like this check, I could also handle this case
like below:

crash_kexec()
{
old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
if (old_cpu != -1)
return;

__crash_kexec();
}

panic()
{
atomic_cmpxchg(&panic_cpu, -1, this_cpu);
__crash_kexec();
...
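
For reference, here is a slightly fuller user-space model of that split
(crash_kexec_body_model() is a hypothetical placeholder for the body of
today's crash_kexec(), i.e. the proposed __crash_kexec(), and C11
atomics stand in for the kernel's atomic_t; an illustrative sketch, not
a concrete implementation):

#include <stdatomic.h>
#include <stdio.h>

static atomic_int panic_cpu = -1;

/* Placeholder for __crash_kexec(): the current crash_kexec() body. */
static void crash_kexec_body_model(int cpu)
{
	printf("cpu%d: running the crash kexec path\n", cpu);
}

/* Direct callers such as oops_end() would use this wrapper. */
static void crash_kexec_model(int cpu)
{
	int old_cpu = -1;

	atomic_compare_exchange_strong(&panic_cpu, &old_cpu, cpu);
	if (old_cpu != -1)
		return;		/* another cpu already owns the panic */

	crash_kexec_body_model(cpu);
}

/* panic() claims panic_cpu itself and calls the no-check variant. */
static void panic_model(int cpu)
{
	int old_cpu = -1;

	atomic_compare_exchange_strong(&panic_cpu, &old_cpu, cpu);
	crash_kexec_body_model(cpu);
	/* ... rest of panic() ... */
}

int main(void)
{
	panic_model(0);		/* cpu0 panics and runs the kexec path */
	crash_kexec_model(1);	/* cpu1's direct call returns early */
	return 0;
}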


Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group


2015-08-31 08:53:18

by Hidehiro Kawai

Subject: RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

Hello Peter,

> From: [email protected] [mailto:[email protected]] On Behalf Of 河合英宏 / KAWAI,
>
> Hi,
>
> > From: Peter Zijlstra [mailto:[email protected]]
> >
> > On Sat, Aug 22, 2015 at 02:35:24AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > > From: Peter Zijlstra [mailto:[email protected]]
> > > >
> > > > On Thu, Aug 06, 2015 at 02:45:43PM +0900, Hidehiro Kawai wrote:
> > > > > void crash_kexec(struct pt_regs *regs)
> > > > > {
> > > > > + int old_cpu, this_cpu;
> > > > > +
> > > > > + /*
> > > > > + * `old_cpu == -1' means we are the first comer and crash_kexec()
> > > > > + * was called without entering panic().
> > > > > + * `old_cpu == this_cpu' means crash_kexec() was called from panic().
> > > > > + */
> > > > > + this_cpu = raw_smp_processor_id();
> > > > > + old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > > > + if (old_cpu != -1 && old_cpu != this_cpu)
> > > > > + return;
> > > >
> > > > This allows recursive calling of crash_kexec(), the Changelog did not
> > > > mention that. Is this really required?
> > >
> > > What part are you arguing? Recursive call of crash_kexec() doesn't
> > > happen. In the first place, one of the purpose of this patch is
> > > to prevent a recursive call of crash_kexec() in the following case
> > > as I stated in the description:
> > >
> > > CPU 0:
> > > oops_end()
> > > crash_kexec()
> > > mutex_trylock() // acquired
> > > <NMI>
> > > io_check_error()
> > > panic()
> > > crash_kexec()
> > > mutex_trylock() // failed to acquire
> > > infinite loop
> > >
> >
> > Yes, but what to we want to do there? It seems to me that is wrong, we
> > do not want to let a recursive crash_kexec() proceed.
> >
> > Whereas the condition you created explicitly allows this recursion by
> > virtue of the 'old_cpu != this_cpu' check.
>
> I understand your question. I don't intend to permit the recursive
> call of crash_kexec() as for 'old_cpu != this_cpu' check. That is
> needed for the case of panic() --> crash_kexec(). Since panic_cpu has
> already been set to this_cpu in panic() (please see PATCH 1/4), no one
> can run crash_kexec() without 'old_cpu != this_cpu' check.
>
> If you don't like this check, I would also be able to handle this case
> like below:
>
> crash_kexec()
> {
> old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> if (old_cpu != -1)
> return;
>
> __crash_kexec();
> }
>
> panic()
> {
> atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> __crash_kexec();
> ...
>

Is that OK?

Regards,

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group



2015-08-31 09:07:45

by Peter Zijlstra

Subject: Re: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

On Mon, Aug 31, 2015 at 08:53:11AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > I understand your question. I don't intend to permit the recursive
> > call of crash_kexec() as for 'old_cpu != this_cpu' check. That is
> > needed for the case of panic() --> crash_kexec(). Since panic_cpu has
> > already been set to this_cpu in panic() (please see PATCH 1/4), no one
> > can run crash_kexec() without 'old_cpu != this_cpu' check.
> >
> > If you don't like this check, I would also be able to handle this case
> > like below:
> >
> > crash_kexec()
> > {
> > old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > if (old_cpu != -1)
> > return;
> >
> > __crash_kexec();
> > }
> >
> > panic()
> > {
> > atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > __crash_kexec();
> > ...
> >
>
> Is that OK?

I suppose so, but I think me getting confused means comments can be
added/improved.

2015-08-31 09:57:18

by Hidehiro Kawai

Subject: RE: [V3 PATCH 3/4] kexec: Fix race between panic() and crash_kexec() called directly

> From: Peter Zijlstra [mailto:[email protected]]
> On Mon, Aug 31, 2015 at 08:53:11AM +0000, 河合英宏 / KAWAI,HIDEHIRO wrote:
> > > I understand your question. I don't intend to permit the recursive
> > > call of crash_kexec() as for 'old_cpu != this_cpu' check. That is
> > > needed for the case of panic() --> crash_kexec(). Since panic_cpu has
> > > already been set to this_cpu in panic() (please see PATCH 1/4), no one
> > > can run crash_kexec() without 'old_cpu != this_cpu' check.
> > >
> > > If you don't like this check, I would also be able to handle this case
> > > like below:
> > >
> > > crash_kexec()
> > > {
> > > old_cpu = atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > if (old_cpu != -1)
> > > return;
> > >
> > > __crash_kexec();
> > > }
> > >
> > > panic()
> > > {
> > > atomic_cmpxchg(&panic_cpu, -1, this_cpu);
> > > __crash_kexec();
> > > ...
> > >
> >
> > Is that OK?
>
> I suppose so, but I think me getting confused means comments can be
> added/improved.

OK, I'll improve comments and description in the next version.

Thanks!

Hidehiro Kawai
Hitachi, Ltd. Research & Development Group
