2008-11-05 20:03:38

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 00/15] x86: disable virt on kdump and emergency_restart (v2)

Hi,

This is an updated version of the reboot/kdump virtualization disable
series that I've sent previously.

In short, the x86 and kdump changes are the same as before, except for
EXPORT_SYMBOL_GPL, and the KVM parts are completely different.

Details of changes since the previous series:

- Style fixes suggested by checkpatch
- Added local_irq_disable() to nmi_shootdown_cpus() (patch 08)
- Use EXPORT_SYMBOL_GPL() on set_virt_disable_func() &
clear_virt_disable_func()
- Add comments to source code on places where emergency_virt_disable()
is called, explaining why.
- kvm: Move the set_virt_disable_func() call to vmx.c and svm.c.
This made the patch series shorter and removing one level
of abstraction.

This series is against linux-next-20081105.

Patches 01-07 simply move the non-kdump-specific parts
of nmi_shootdown_cpus() to reboot.c, so it can be used by
emergency_restart(). They should be a no-op in relation to existing code.

Patch 08 adds an additional local_irq_disable() to nmi_shootdown_cpus(),
in case it is called with IRQs enabled.

Patch 09 adds the virt_disable function registering interface, like
the previous series.

Patch 10 hooks emergency_virt_disable() into kdump crash_shutdown code.

Patch 11 hooks emergency_virt_disable() into emergency_restart() using
nmi_shootdown_cpus().

Patches 12-14 change KVM so that it registers a virt_disable function
when loading.

Finally, patch 15 restore the previous reboot=kbd default.

--
Eduardo


2008-11-05 19:58:22

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 03/15] x86 kdump: Create kdump_nmi_shootdown_cpus()

For the kdump-specific code that was living on nmi_shootdown_cpus().

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index a70c1c6..d6a8085 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -119,10 +119,17 @@ static void nmi_shootdown_cpus(void)
}

/* Leave the nmi callback set */
+}
+
+static void kdump_nmi_shootdown_cpus(void)
+{
+ nmi_shootdown_cpus();
+
disable_local_APIC();
}
+
#else
-static void nmi_shootdown_cpus(void)
+static void kdump_nmi_shootdown_cpus(void)
{
/* There are no cpus to shootdown */
}
@@ -141,7 +148,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();

- nmi_shootdown_cpus();
+ kdump_nmi_shootdown_cpus();
lapic_shutdown();
#if defined(CONFIG_X86_IO_APIC)
disable_IO_APIC();
--
1.5.5.GIT

2008-11-05 19:58:48

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 01/15] x86 kdump: Extract kdump-specific code from crash_nmi_callback()

The NMI CPU-halting code will be used on non-kdump cases, also
(e.g. emergency_reboot when virtualization is enabled).

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 31 ++++++++++++++++++++-----------
1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 81e01f7..d82ac72 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -35,19 +35,34 @@ static int crashing_cpu;
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static atomic_t waiting_for_crash_ipi;

-static int crash_nmi_callback(struct notifier_block *self,
- unsigned long val, void *data)
+static void kdump_nmi_callback(int cpu, struct die_args *args)
{
struct pt_regs *regs;
#ifdef CONFIG_X86_32
struct pt_regs fixed_regs;
#endif
+
+ regs = args->regs;
+
+#ifdef CONFIG_X86_32
+ if (!user_mode_vm(regs)) {
+ crash_fixup_ss_esp(&fixed_regs, regs);
+ regs = &fixed_regs;
+ }
+#endif
+ crash_save_cpu(regs, cpu);
+
+ disable_local_APIC();
+}
+
+static int crash_nmi_callback(struct notifier_block *self,
+ unsigned long val, void *data)
+{
int cpu;

if (val != DIE_NMI_IPI)
return NOTIFY_OK;

- regs = ((struct die_args *)data)->regs;
cpu = raw_smp_processor_id();

/* Don't do anything if this handler is invoked on crashing cpu.
@@ -58,14 +73,8 @@ static int crash_nmi_callback(struct notifier_block *self,
return NOTIFY_STOP;
local_irq_disable();

-#ifdef CONFIG_X86_32
- if (!user_mode_vm(regs)) {
- crash_fixup_ss_esp(&fixed_regs, regs);
- regs = &fixed_regs;
- }
-#endif
- crash_save_cpu(regs, cpu);
- disable_local_APIC();
+ kdump_nmi_callback(cpu, (struct die_args *)data);
+
atomic_dec(&waiting_for_crash_ipi);
/* Assume hlt works */
halt();
--
1.5.5.GIT

2008-11-05 19:59:08

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.

Now that we have the hooks to disable virtualization on
emergency_restart(), we can get back to the BOOT_KBD reboot_type default.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 3240491..260c131 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -32,11 +32,7 @@ EXPORT_SYMBOL(pm_power_off);

static const struct desc_ptr no_idt = {};
static int reboot_mode;
-/*
- * Keyboard reset and triple fault may result in INIT, not RESET, which
- * doesn't work when we're in vmx root mode. Try ACPI first.
- */
-enum reboot_type reboot_type = BOOT_ACPI;
+enum reboot_type reboot_type = BOOT_KBD;
int reboot_force;

#if defined(CONFIG_X86_32) && defined(CONFIG_SMP)
--
1.5.5.GIT

2008-11-05 19:59:35

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 09/15] x86: Emergency virtualization disable function

This patch adds an interface to set a function that can be used to
disable virtualization extensions on the CPU on emergency cases, such
as on kdump or emergency reboot.

The function will be set by code that enables virtualization extensions
on the CPUs (i.e. KVM), and should do the very least to disable virt
extensions on the CPU where it is being called.

The functions that set the function pointer uses RCU synchronization,
just in case the crash NMI is triggered while KVM is unloading.

We can't just use the same notifiers used at reboot time (that are used
by non-crash-dump kexec), because at crash time some CPUs may have IRQs
disabled, so we can't use IPIs. The crash shutdown code use NMIs to tell
the other CPUs to be halted, so the notifier call will be hooked into
the CPU halting code that is on the crash shutdown NMI handler.

[v2: drop 'unsigned int cpu' arg from function]
[v3: make emergency_virt_disable() non-static]
[v4: add a config option for it: CPU_VIRT_EXTENSIONS]
[v5: add has_virt_extensions() function]
[v6: don't define the registering functions if CPU_VIRT_EXTENSIONS is
not enabled]
[v7: use EXPORT_SYMBOL_GPL]

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/include/asm/virtext.h | 29 +++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/virtext.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/Kconfig | 5 ++
4 files changed, 124 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/virtext.h
create mode 100644 arch/x86/kernel/virtext.c

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
new file mode 100644
index 0000000..72b7caa
--- /dev/null
+++ b/arch/x86/include/asm/virtext.h
@@ -0,0 +1,29 @@
+/* CPU virtualization extensions handling
+ */
+#ifndef _ASM_X86_VIRTEX_H
+#define _ASM_X86_VIRTEX_H
+
+#ifdef CONFIG_CPU_VIRT_EXTENSIONS
+
+int set_virt_disable_func(void (*fn)(void));
+void clear_virt_disable_func(void);
+void emergency_virt_disable(void);
+int has_virt_extensions(void);
+
+
+#else /* !CONFIG_CPU_VIRT_EXTENSIONS */
+
+static inline
+void emergency_virt_disable(void)
+{
+}
+
+static inline
+int has_virt_extensions(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_CPU_VIRT_EXTENSIONS */
+
+#endif /* _ASM_X86_VIRTEX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 58814cc..84a0e23 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups_32.o
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o
+obj-$(CONFIG_CPU_VIRT_EXTENSIONS) += virtext.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
obj-$(CONFIG_X86_NUMAQ) += numaq_32.o
obj-$(CONFIG_X86_ES7000) += es7000_32.o
diff --git a/arch/x86/kernel/virtext.c b/arch/x86/kernel/virtext.c
new file mode 100644
index 0000000..10d90f2
--- /dev/null
+++ b/arch/x86/kernel/virtext.c
@@ -0,0 +1,89 @@
+/* Core CPU virtualization extensions handling
+ *
+ * This should carry the code for handling CPU virtualization extensions
+ * that needs to live in the kernel core.
+ *
+ * Author: Eduardo Habkost <[email protected]>
+ *
+ * Copyright (C) 2008, Red Hat Inc.
+ */
+
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/rcupdate.h>
+
+static DEFINE_SPINLOCK(virt_disable_lock);
+static void (*virt_disable_fn)(void);
+
+/** set function to be called to disable virtualization on crash
+ *
+ * Registers a function to be called when CPUs are being halted at
+ * machine_crash_shutdown().
+ *
+ * There is only one function pointer, so the function
+ * is reserved to be set by the KVM module at load time, before
+ * enabling virtualization.
+ *
+ * The function is called once on each online CPU, possibly
+ * from a NMI handler. It should do the very least to allow the CPU
+ * to be halted before booting the kdump kernel, as the kernel has
+ * just crashed.
+ */
+int set_virt_disable_func(void (*fn)(void))
+{
+ int r = 0;
+
+ spin_lock(&virt_disable_lock);
+ if (!virt_disable_fn)
+ rcu_assign_pointer(virt_disable_fn, fn);
+ else
+ r = -EEXIST;
+ spin_unlock(&virt_disable_lock);
+
+ return r;
+}
+EXPORT_SYMBOL_GPL(set_virt_disable_func);
+
+/** clear the virt_disable function set by set_virt_disable_func()
+ *
+ * You must call this function only if you sucessfully set
+ * the virt_disable function on a previous set_virt_disable_func()
+ * call.
+ *
+ * This function will use synchronize_sched() to wait until it's safe
+ * to free any data or code related to the previous existing virt_disable
+ * func, before returning.
+ */
+void clear_virt_disable_func(void)
+{
+ spin_lock(&virt_disable_lock);
+ rcu_assign_pointer(virt_disable_fn, NULL);
+ spin_unlock(&virt_disable_lock);
+
+ synchronize_sched();
+}
+EXPORT_SYMBOL_GPL(clear_virt_disable_func);
+
+/* Disable virtualization extensions if needed
+ *
+ * Runs thefunction set by set_virt_disable_func()
+ *
+ * Must be called on the CPU that is being halted.
+ */
+void emergency_virt_disable(void)
+{
+ void (*fn)(void);
+ fn = rcu_dereference(virt_disable_fn);
+ if (fn)
+ fn();
+}
+
+/* Returns non-zero if a virt_disable function was set
+ *
+ * The function pointer may be set just after you've called this function.
+ * Use with care.
+ */
+int has_virt_extensions(void)
+{
+ return rcu_dereference(virt_disable_fn) != NULL;
+}
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ce3251c..69373cc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -15,6 +15,10 @@ menuconfig VIRTUALIZATION

If you say N, all options in this submenu will be skipped and disabled.

+# Core code for handling CPU virt extensions
+config CPU_VIRT_EXTENSIONS
+ bool
+
if VIRTUALIZATION

config KVM
@@ -23,6 +27,7 @@ config KVM
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select ANON_INODES
+ select CPU_VIRT_EXTENSIONS
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
--
1.5.5.GIT

2008-11-05 19:59:52

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 06/15] x86: Move nmi_shootdown_cpus() to reboot.c

Now nmi_shootdown_cpus() is ready to be used by non-kdump code also.
Move it to reboot.c.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 73 ------------------------------------------
arch/x86/kernel/reboot.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+), 73 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index bba0907..d84a852 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -32,12 +32,6 @@

#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
-static nmi_shootdown_cb shootdown_callback;
-
-static atomic_t waiting_for_crash_ipi;
-
static void kdump_nmi_callback(int cpu, struct die_args *args)
{
struct pt_regs *regs;
@@ -58,73 +52,6 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
disable_local_APIC();
}

-static int crash_nmi_callback(struct notifier_block *self,
- unsigned long val, void *data)
-{
- int cpu;
-
- if (val != DIE_NMI_IPI)
- return NOTIFY_OK;
-
- cpu = raw_smp_processor_id();
-
- /* Don't do anything if this handler is invoked on crashing cpu.
- * Otherwise, system will completely hang. Crashing cpu can get
- * an NMI if system was initially booted with nmi_watchdog parameter.
- */
- if (cpu == crashing_cpu)
- return NOTIFY_STOP;
- local_irq_disable();
-
- shootdown_callback(cpu, (struct die_args *)data);
-
- atomic_dec(&waiting_for_crash_ipi);
- /* Assume hlt works */
- halt();
- for (;;)
- cpu_relax();
-
- return 1;
-}
-
-static void smp_send_nmi_allbutself(void)
-{
- send_IPI_allbutself(NMI_VECTOR);
-}
-
-static struct notifier_block crash_nmi_nb = {
- .notifier_call = crash_nmi_callback,
-};
-
-void nmi_shootdown_cpus(nmi_shootdown_cb callback)
-{
- unsigned long msecs;
-
- /* Make a note of crashing cpu. Will be used in NMI callback.*/
- crashing_cpu = safe_smp_processor_id();
-
- shootdown_callback = callback;
-
- atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
- /* Would it be better to replace the trap vector here? */
- if (register_die_notifier(&crash_nmi_nb))
- return; /* return what? */
- /* Ensure the new callback function is set before sending
- * out the NMI
- */
- wmb();
-
- smp_send_nmi_allbutself();
-
- msecs = 1000; /* Wait at most a second for the other cpus to stop */
- while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
- mdelay(1);
- msecs--;
- }
-
- /* Leave the nmi callback set */
-}
-
static void kdump_nmi_shootdown_cpus(void)
{
nmi_shootdown_cpus(kdump_nmi_callback);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f4c93f1..be29987 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -21,6 +21,8 @@
# include <asm/iommu.h>
#endif

+#include <mach_ipi.h>
+
/*
* Power off function, if any
*/
@@ -518,3 +520,80 @@ void machine_crash_shutdown(struct pt_regs *regs)
machine_ops.crash_shutdown(regs);
}
#endif
+
+
+#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
+
+/* This keeps a track of which one is crashing cpu. */
+static int crashing_cpu;
+static nmi_shootdown_cb shootdown_callback;
+
+static atomic_t waiting_for_crash_ipi;
+
+static int crash_nmi_callback(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ int cpu;
+
+ if (val != DIE_NMI_IPI)
+ return NOTIFY_OK;
+
+ cpu = raw_smp_processor_id();
+
+ /* Don't do anything if this handler is invoked on crashing cpu.
+ * Otherwise, system will completely hang. Crashing cpu can get
+ * an NMI if system was initially booted with nmi_watchdog parameter.
+ */
+ if (cpu == crashing_cpu)
+ return NOTIFY_STOP;
+ local_irq_disable();
+
+ shootdown_callback(cpu, (struct die_args *)data);
+
+ atomic_dec(&waiting_for_crash_ipi);
+ /* Assume hlt works */
+ halt();
+ for (;;)
+ cpu_relax();
+
+ return 1;
+}
+
+static void smp_send_nmi_allbutself(void)
+{
+ send_IPI_allbutself(NMI_VECTOR);
+}
+
+static struct notifier_block crash_nmi_nb = {
+ .notifier_call = crash_nmi_callback,
+};
+
+void nmi_shootdown_cpus(nmi_shootdown_cb callback)
+{
+ unsigned long msecs;
+
+ /* Make a note of crashing cpu. Will be used in NMI callback.*/
+ crashing_cpu = safe_smp_processor_id();
+
+ shootdown_callback = callback;
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+ /* Would it be better to replace the trap vector here? */
+ if (register_die_notifier(&crash_nmi_nb))
+ return; /* return what? */
+ /* Ensure the new callback function is set before sending
+ * out the NMI
+ */
+ wmb();
+
+ smp_send_nmi_allbutself();
+
+ msecs = 1000; /* Wait at most a second for the other cpus to stop */
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+ mdelay(1);
+ msecs--;
+ }
+
+ /* Leave the nmi callback set */
+}
+#endif
--
1.5.5.GIT

2008-11-05 20:00:15

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 07/15] x86: Make nmi_shootdown_cpus() available on !SMP and !X86_LOCAL_APIC

The X86_LOCAL_APIC #ifdef was for kdump. For !SMP, the function simply
does nothing.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index be29987..23a9d78 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -522,7 +522,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
#endif


-#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
+#if defined(CONFIG_SMP)

/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;
@@ -568,6 +568,12 @@ static struct notifier_block crash_nmi_nb = {
.notifier_call = crash_nmi_callback,
};

+/* Halt all other CPUs, calling the specified function on each of them
+ *
+ * This function can be used to halt all other CPUs on crash
+ * or emergency reboot time. The function passed as parameter
+ * will be called inside a NMI handler on all CPUs.
+ */
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;
@@ -596,4 +602,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)

/* Leave the nmi callback set */
}
+#else /* !CONFIG_SMP */
+void nmi_shootdown_cpus(nmi_shootdown_cb callback)
+{
+ /* No other CPUs to shoot down */
+}
#endif
--
1.5.5.GIT

2008-11-05 20:00:42

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 11/15] x86: disable virtualization on all CPUs if needed, on emergency_restart

On emergency_restart, we may need to use an NMI to disable virtualization
on all CPUs. We do that by using nmi_shootdown_cpus(), but only if a
virt_disable function was set by KVM.

Finding a proper point to hook the nmi_shootdown_cpus() call isn't
trivial, as the non-emergency machine_restart() (that doesn't need the
NMI tricks) uses machine_emergency_restart() directly.

The solution to make this work without adding a new function or argument
to machine_ops was setting a 'reboot_emergency' flag that tells if
native_machine_emergency_restart() needs to do the virt cleanup or not.

[v2: additional source code comments explaining why we do that]

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 55 ++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 407106e..3240491 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -12,6 +12,7 @@
#include <asm/proto.h>
#include <asm/reboot_fixups.h>
#include <asm/reboot.h>
+#include <asm/virtext.h>

#ifdef CONFIG_X86_32
# include <linux/dmi.h>
@@ -42,6 +43,12 @@ int reboot_force;
static int reboot_cpu = -1;
#endif

+/* This is set if we need to go through the 'emergency' path.
+ * When machine_emergency_restart() is called, we may be on
+ * an inconsistent state and won't be able to do a clean cleanup
+ */
+static int reboot_emergency;
+
/* reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
warm Don't set the cold reboot flag
cold Set the cold reboot flag
@@ -357,6 +364,40 @@ static inline void kb_wait(void)
}
}

+static void disable_virt_nmi(int cpu, struct die_args *args)
+{
+ emergency_virt_disable();
+}
+
+/* Use NMIs as IPIs to tell all CPUs to disable virtualization
+ */
+static void emergency_virt_disable_all(void)
+{
+ /* We need to disable virtualization extensions on all CPUs
+ * before rebooting, otherwise we risk hanging up the machine,
+ * because the CPU ignore INIT signals when VMX is enabled.
+ *
+ * We can't take any locks and we may be on an inconsistent
+ * state, so we use NMIs as IPIs to tell the other CPUs to halt.
+ *
+ * For safety, we will try to do this only when we may have
+ * virtualization enabled.
+ */
+ if (has_virt_extensions()) {
+ /* Kill the other CPUs */
+ nmi_shootdown_cpus(disable_virt_nmi);
+ /* Disable virt on this CPU */
+ emergency_virt_disable();
+ }
+
+ /* If another CPU call set_virt_disable_func()
+ * here, we will be lost. Unless we want to add the NMI
+ * stuff to the reboot path for non-virt users, we will
+ * have to live with this (unlikely) possibility.
+ */
+
+}
+
void __attribute__((weak)) mach_reboot_fixups(void)
{
}
@@ -365,9 +406,13 @@ static void native_machine_emergency_restart(void)
{
int i;

+ if (reboot_emergency)
+ emergency_virt_disable_all();
+
/* Tell the BIOS if we want cold or warm reboot */
*((unsigned short *)__va(0x472)) = reboot_mode;

+
for (;;) {
/* Could also try the reset bit in the Hammer NB */
switch (reboot_type) {
@@ -456,13 +501,19 @@ void native_machine_shutdown(void)
#endif
}

+static void __machine_emergency_restart(int emergency)
+{
+ reboot_emergency = emergency;
+ machine_ops.emergency_restart();
+}
+
static void native_machine_restart(char *__unused)
{
printk("machine restart\n");

if (!reboot_force)
machine_shutdown();
- machine_emergency_restart();
+ __machine_emergency_restart(0);
}

static void native_machine_halt(void)
@@ -501,7 +552,7 @@ void machine_shutdown(void)

void machine_emergency_restart(void)
{
- machine_ops.emergency_restart();
+ __machine_emergency_restart(1);
}

void machine_restart(char *cmd)
--
1.5.5.GIT

2008-11-05 20:00:58

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 08/15] x86: Disable IRQs before doing anything on nmi_shootdown_cpus()

We need to know on which CPU we are running on, and we don't want to be
preempted while doing this.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 23a9d78..407106e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -577,6 +577,7 @@ static struct notifier_block crash_nmi_nb = {
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;
+ local_irq_disable();

/* Make a note of crashing cpu. Will be used in NMI callback.*/
crashing_cpu = safe_smp_processor_id();
--
1.5.5.GIT

2008-11-05 20:01:22

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 04/15] x86 kdump: Make kdump_nmi_callback() a function ptr on crash_nmi_callback()

The reboot code will use a different function on crash_nmi_callback().
Adding a function pointer parameter to nmi_shootdown_cpus() for that.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index d6a8085..4201fb3 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -29,10 +29,13 @@

#include <mach_ipi.h>

+typedef void (*nmi_shootdown_cb)(int, struct die_args*);
+
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;
+static nmi_shootdown_cb shootdown_callback;

static atomic_t waiting_for_crash_ipi;

@@ -74,7 +77,7 @@ static int crash_nmi_callback(struct notifier_block *self,
return NOTIFY_STOP;
local_irq_disable();

- kdump_nmi_callback(cpu, (struct die_args *)data);
+ shootdown_callback(cpu, (struct die_args *)data);

atomic_dec(&waiting_for_crash_ipi);
/* Assume hlt works */
@@ -94,13 +97,15 @@ static struct notifier_block crash_nmi_nb = {
.notifier_call = crash_nmi_callback,
};

-static void nmi_shootdown_cpus(void)
+static void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;

/* Make a note of crashing cpu. Will be used in NMI callback.*/
crashing_cpu = safe_smp_processor_id();

+ shootdown_callback = callback;
+
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
/* Would it be better to replace the trap vector here? */
if (register_die_notifier(&crash_nmi_nb))
@@ -123,7 +128,7 @@ static void nmi_shootdown_cpus(void)

static void kdump_nmi_shootdown_cpus(void)
{
- nmi_shootdown_cpus();
+ nmi_shootdown_cpus(kdump_nmi_callback);

disable_local_APIC();
}
--
1.5.5.GIT

2008-11-05 20:01:56

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 10/15] kdump: Hook emergency_virt_disable() on crash shutdown code

[v2: add comments on source code, explaining why it is needed]

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index d84a852..040f3ca 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -26,6 +26,7 @@
#include <linux/kdebug.h>
#include <asm/smp.h>
#include <asm/reboot.h>
+#include <asm/virtext.h>

#include <mach_ipi.h>

@@ -49,6 +50,11 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
#endif
crash_save_cpu(regs, cpu);

+ /* We need to disable virtualization on all CPUs.
+ * Having VMX or SVM enabled on any CPU may break rebooting
+ * after the kdump kernel has finished its task.
+ */
+ emergency_virt_disable();
disable_local_APIC();
}

@@ -80,6 +86,13 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
local_irq_disable();

kdump_nmi_shootdown_cpus();
+
+ /* Booting kdump kernel with VMX or SVM enabled won't work,
+ * because (among other limitations) we can't disable paging
+ * with the virt flags.
+ */
+ emergency_virt_disable();
+
lapic_shutdown();
#if defined(CONFIG_X86_IO_APIC)
disable_IO_APIC();
--
1.5.5.GIT

2008-11-05 20:01:39

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 13/15] kvm: svm: register virt_disable function on hardware_setup

Call set_virt_disable_func() on hardware_setup, and clear it on
hardware_unsetup.

This way kdump and reboot code will be able to disable svm when needed.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/svm.c | 19 +++++++++++++++----
1 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 3d330a2..8f9256d 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>

#include <asm/desc.h>
+#include <asm/virtext.h>

#define __ex(x) __kvm_handle_fault_on_reboot(x)

@@ -422,10 +423,16 @@ static __init int svm_hardware_setup(void)
void *iopm_va;
int r;

+ r = set_virt_disable_func(__svm_hardware_disable);
+ if (r)
+ goto err;
+
iopm_pages = alloc_pages(GFP_KERNEL, IOPM_ALLOC_ORDER);

- if (!iopm_pages)
- return -ENOMEM;
+ if (!iopm_pages) {
+ r = -ENOMEM;
+ goto err_clear_virt_disable;
+ }

iopm_va = page_address(iopm_pages);
memset(iopm_va, 0xff, PAGE_SIZE * (1 << IOPM_ALLOC_ORDER));
@@ -438,7 +445,7 @@ static __init int svm_hardware_setup(void)
for_each_online_cpu(cpu) {
r = svm_cpu_init(cpu);
if (r)
- goto err;
+ goto err_free_iopm;
}

svm_features = cpuid_edx(SVM_CPUID_FUNC);
@@ -459,9 +466,12 @@ static __init int svm_hardware_setup(void)

return 0;

-err:
+err_free_iopm:
__free_pages(iopm_pages, IOPM_ALLOC_ORDER);
iopm_base = 0;
+err_clear_virt_disable:
+ clear_virt_disable_func();
+err:
return r;
}

@@ -474,6 +484,7 @@ static __exit void svm_hardware_unsetup(void)

__free_pages(pfn_to_page(iopm_base >> PAGE_SHIFT), IOPM_ALLOC_ORDER);
iopm_base = 0;
+ clear_virt_disable_func();
}

static void init_seg(struct vmcb_seg *seg)
--
1.5.5.GIT

2008-11-05 20:02:22

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 14/15] kvm: vmx: crash_hardware_disable function

We need to first check if virtualization was enabled. We do this by
checking CR4.VMXE. If it is set, run vmxoff and clear CR4.VMXE.

Register it using set_virt_disable_func() on hardware_setup, and
unregister it on hardware_unsetup.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/vmx.c | 40 +++++++++++++++++++++++++++++++++++-----
1 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ece9cb7..4b10768 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -31,6 +31,7 @@

#include <asm/io.h>
#include <asm/desc.h>
+#include <asm/virtext.h>

#define __ex(x) __kvm_handle_fault_on_reboot(x)

@@ -1091,13 +1092,24 @@ static void vmclear_local_vcpus(void)
__vcpu_clear(vmx);
}

-static void hardware_disable(void *garbage)
+static void __hardware_disable(void)
{
- vmclear_local_vcpus();
asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
write_cr4(read_cr4() & ~X86_CR4_VMXE);
}

+static void hardware_disable(void *garbage)
+{
+ vmclear_local_vcpus();
+ __hardware_disable();
+}
+
+static void crash_hardware_disable(void)
+{
+ if (read_cr4() & X86_CR4_VMXE)
+ __hardware_disable();
+}
+
static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
u32 msr, u32 *result)
{
@@ -1281,18 +1293,36 @@ static __init int alloc_kvm_area(void)

static __init int hardware_setup(void)
{
- if (setup_vmcs_config(&vmcs_config) < 0)
- return -EIO;
+ int r;
+
+ r = set_virt_disable_func(crash_hardware_disable);
+ if (r)
+ goto err;
+
+ if (setup_vmcs_config(&vmcs_config) < 0) {
+ r = -EIO;
+ goto err_clear_virt_disable;
+ }

if (boot_cpu_has(X86_FEATURE_NX))
kvm_enable_efer_bits(EFER_NX);

- return alloc_kvm_area();
+ r = alloc_kvm_area();
+ if (r)
+ goto err_clear_virt_disable;
+
+ return 0;
+
+err_clear_virt_disable:
+ clear_virt_disable_func();
+err:
+ return r;
}

static __exit void hardware_unsetup(void)
{
free_kvm_area();
+ clear_virt_disable_func();
}

static void fix_pmode_dataseg(int seg, struct kvm_save_segment *save)
--
1.5.5.GIT

2008-11-05 20:02:58

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 02/15] x86 kdump: Move crashing_cpu assignment to nmi_shootdown_cpus()

This variable will be moved to non-kdump-specific code.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index d82ac72..a70c1c6 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -29,10 +29,11 @@

#include <mach_ipi.h>

+#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
+
/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;

-#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static atomic_t waiting_for_crash_ipi;

static void kdump_nmi_callback(int cpu, struct die_args *args)
@@ -97,6 +98,9 @@ static void nmi_shootdown_cpus(void)
{
unsigned long msecs;

+ /* Make a note of crashing cpu. Will be used in NMI callback.*/
+ crashing_cpu = safe_smp_processor_id();
+
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
/* Would it be better to replace the trap vector here? */
if (register_die_notifier(&crash_nmi_nb))
@@ -137,8 +141,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();

- /* Make a note of crashing cpu. Will be used in NMI callback.*/
- crashing_cpu = safe_smp_processor_id();
nmi_shootdown_cpus();
lapic_shutdown();
#if defined(CONFIG_X86_IO_APIC)
--
1.5.5.GIT

2008-11-05 20:02:40

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 05/15] x86 kdump: Make nmi_shootdown_cpus() non-static

Add prototype to asm/reboot.h.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/include/asm/reboot.h | 5 +++++
arch/x86/kernel/crash.c | 3 +--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index df77103..562d4fd 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_REBOOT_H
#define _ASM_X86_REBOOT_H

+#include <linux/kdebug.h>
+
struct pt_regs;

struct machine_ops {
@@ -18,4 +20,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs);
void native_machine_shutdown(void);
void machine_real_restart(const unsigned char *code, int length);

+typedef void (*nmi_shootdown_cb)(int, struct die_args*);
+void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+
#endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 4201fb3..bba0907 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -29,7 +29,6 @@

#include <mach_ipi.h>

-typedef void (*nmi_shootdown_cb)(int, struct die_args*);

#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

@@ -97,7 +96,7 @@ static struct notifier_block crash_nmi_nb = {
.notifier_call = crash_nmi_callback,
};

-static void nmi_shootdown_cpus(nmi_shootdown_cb callback)
+void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;

--
1.5.5.GIT

2008-11-05 20:03:21

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 12/15] kvm: svm: no-parameters version of svm_hardware_disable()

Create __svm_hardware_disable(), a function we can use for
set_virt_disable_func().

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/svm.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f0ad4d4..3d330a2 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -266,7 +266,7 @@ static int has_svm(void)
return 1;
}

-static void svm_hardware_disable(void *garbage)
+static void __svm_hardware_disable(void)
{
uint64_t efer;

@@ -275,6 +275,11 @@ static void svm_hardware_disable(void *garbage)
wrmsrl(MSR_EFER, efer & ~MSR_EFER_SVME_MASK);
}

+static void svm_hardware_disable(void *garbage)
+{
+ __svm_hardware_disable();
+}
+
static void svm_hardware_enable(void *garbage)
{

--
1.5.5.GIT

2008-11-05 22:26:00

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 09/15] x86: Emergency virtualization disable function

On Wed 2008-11-05 17:56:52, Eduardo Habkost wrote:
> This patch adds an interface to set a function that can be used to
> disable virtualization extensions on the CPU on emergency cases, such
> as on kdump or emergency reboot.
>
> The function will be set by code that enables virtualization extensions
> on the CPUs (i.e. KVM), and should do the very least to disable virt
> extensions on the CPU where it is being called.
>
> The functions that set the function pointer uses RCU synchronization,
> just in case the crash NMI is triggered while KVM is unloading.
>
> We can't just use the same notifiers used at reboot time (that are used
> by non-crash-dump kexec), because at crash time some CPUs may have IRQs
> disabled, so we can't use IPIs. The crash shutdown code use NMIs to tell
> the other CPUs to be halted, so the notifier call will be hooked into
> the CPU halting code that is on the crash shutdown NMI handler.
>
> [v2: drop 'unsigned int cpu' arg from function]
> [v3: make emergency_virt_disable() non-static]
> [v4: add a config option for it: CPU_VIRT_EXTENSIONS]
> [v5: add has_virt_extensions() function]
> [v6: don't define the registering functions if CPU_VIRT_EXTENSIONS is
> not enabled]
> [v7: use EXPORT_SYMBOL_GPL]
>

> --- /dev/null
> +++ b/arch/x86/kernel/virtext.c
> @@ -0,0 +1,89 @@
> +/* Core CPU virtualization extensions handling
> + *
> + * This should carry the code for handling CPU virtualization extensions
> + * that needs to live in the kernel core.
> + *
> + * Author: Eduardo Habkost <[email protected]>
> + *
> + * Copyright (C) 2008, Red Hat Inc.
> + */

GPL?

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-11-06 07:16:14

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"


* Eduardo Habkost <[email protected]> wrote:

> This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.
>
> Now that we have the hooks to disable virtualization on
> emergency_restart(), we can get back to the BOOT_KBD reboot_type default.
>
> Signed-off-by: Eduardo Habkost <[email protected]>

hm, why revert this? There's nothing wrong with the ACPI reboot
method, it's just that we surprise the BIOS by exiting with an unclean
VMX state in certain circumstances.

if the ACPI reboot method does not work we do the KBD method as the
next thing in the reboot chain.

Ingo

2008-11-06 12:41:39

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Thu, Nov 06, 2008 at 08:14:45AM +0100, Ingo Molnar wrote:
>
> * Eduardo Habkost <[email protected]> wrote:
>
> > This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.
> >
> > Now that we have the hooks to disable virtualization on
> > emergency_restart(), we can get back to the BOOT_KBD reboot_type default.
> >
> > Signed-off-by: Eduardo Habkost <[email protected]>
>
> hm, why revert this? There's nothing wrong with the ACPI reboot
> method, it's just that we surprise the BIOS by exiting with an unclean
> VMX state in certain circumstances.
>
> if the ACPI reboot method does not work we do the KBD method as the
> next thing in the reboot chain.

I suppose there are cases where the new default broke without KVM, and
the suggestion was to disable VMX before rebooting instead of changing
the default.

Avi changed the default because on some machines reboot=kbd breaks when
VMX is enabled, but the regressions caused by the new default doesn't
necessarily involve VMX.

Andrey Borzenkov's patch, for example, adds a new DMI entry because
reboot=acpi breaks his keyboard (even without KVM, I guess). Andrey,
was that the case?

--
Eduardo

2008-11-06 14:31:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"


* Eduardo Habkost <[email protected]> wrote:

> On Thu, Nov 06, 2008 at 08:14:45AM +0100, Ingo Molnar wrote:
> >
> > * Eduardo Habkost <[email protected]> wrote:
> >
> > > This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.
> > >
> > > Now that we have the hooks to disable virtualization on
> > > emergency_restart(), we can get back to the BOOT_KBD reboot_type default.
> > >
> > > Signed-off-by: Eduardo Habkost <[email protected]>
> >
> > hm, why revert this? There's nothing wrong with the ACPI reboot
> > method, it's just that we surprise the BIOS by exiting with an unclean
> > VMX state in certain circumstances.
> >
> > if the ACPI reboot method does not work we do the KBD method as the
> > next thing in the reboot chain.
>
> I suppose there are cases where the new default broke without KVM,
> and the suggestion was to disable VMX before rebooting instead of
> changing the default.
>
> Avi changed the default because on some machines reboot=kbd breaks
> when VMX is enabled, but the regressions caused by the new default
> doesn't necessarily involve VMX.

there's another reason as well: a growing quirk-list of machines where
ACPI is the best (sometimes only) method to reboot.

> Andrey Borzenkov's patch, for example, adds a new DMI entry because
> reboot=acpi breaks his keyboard (even without KVM, I guess). Andrey,
> was that the case?

hm, IIRC the problem was KVM in his case too.

Ingo

2008-11-06 15:06:55

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"


* Ingo Molnar <[email protected]> wrote:

> > Andrey Borzenkov's patch, for example, adds a new DMI entry
> > because reboot=acpi breaks his keyboard (even without KVM, I
> > guess). Andrey, was that the case?
>
> hm, IIRC the problem was KVM in his case too.

actually, Andrey's problem seems to be unrelated. So i've queued up
the revert below in tip/x86/urgent for v2.6.28. Thanks guys!

Ingo

---------------->
>From 8d00450d296dedec9ada38d43b83e79cca6fd5a3 Mon Sep 17 00:00:00 2001
From: Eduardo Habkost <[email protected]>
Date: Tue, 4 Nov 2008 12:52:44 -0200
Subject: [PATCH] Revert "x86: default to reboot via ACPI"

This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.

the assumptio of this change was that this would not break
any existing machine. Andrey Borzenkov reported troubles with
the ACPI reboot method: the system would hang on reboot, necessiating
a power cycle. Probably more systems are affected as well.

Also, there are patches queued up for v2.6.29 to disable virtualization
on emergency_restart() - which was the original motivation of
this change.

Reported-by: Andrey Borzenkov <[email protected]>
Bisected-by: Andrey Borzenkov <[email protected]>
Signed-off-by: Eduardo Habkost <[email protected]>
Acked-by: Avi Kivity <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
---
arch/x86/kernel/reboot.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f4c93f1..724adfc 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -29,11 +29,7 @@ EXPORT_SYMBOL(pm_power_off);

static const struct desc_ptr no_idt = {};
static int reboot_mode;
-/*
- * Keyboard reset and triple fault may result in INIT, not RESET, which
- * doesn't work when we're in vmx root mode. Try ACPI first.
- */
-enum reboot_type reboot_type = BOOT_ACPI;
+enum reboot_type reboot_type = BOOT_KBD;
int reboot_force;

#if defined(CONFIG_X86_32) && defined(CONFIG_SMP)

2008-11-06 15:36:29

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 09/15] x86: Emergency virtualization disable function

On Wed, Nov 05, 2008 at 11:27:32PM +0100, Pavel Machek wrote:
> On Wed 2008-11-05 17:56:52, Eduardo Habkost wrote:
> > This patch adds an interface to set a function that can be used to
> > disable virtualization extensions on the CPU on emergency cases, such
> > as on kdump or emergency reboot.
> >
> > The function will be set by code that enables virtualization extensions
> > on the CPUs (i.e. KVM), and should do the very least to disable virt
> > extensions on the CPU where it is being called.
> >
> > The functions that set the function pointer uses RCU synchronization,
> > just in case the crash NMI is triggered while KVM is unloading.
> >
> > We can't just use the same notifiers used at reboot time (that are used
> > by non-crash-dump kexec), because at crash time some CPUs may have IRQs
> > disabled, so we can't use IPIs. The crash shutdown code use NMIs to tell
> > the other CPUs to be halted, so the notifier call will be hooked into
> > the CPU halting code that is on the crash shutdown NMI handler.
> >
> > [v2: drop 'unsigned int cpu' arg from function]
> > [v3: make emergency_virt_disable() non-static]
> > [v4: add a config option for it: CPU_VIRT_EXTENSIONS]
> > [v5: add has_virt_extensions() function]
> > [v6: don't define the registering functions if CPU_VIRT_EXTENSIONS is
> > not enabled]
> > [v7: use EXPORT_SYMBOL_GPL]
> >
>
> > --- /dev/null
> > +++ b/arch/x86/kernel/virtext.c
> > @@ -0,0 +1,89 @@
> > +/* Core CPU virtualization extensions handling
> > + *
> > + * This should carry the code for handling CPU virtualization extensions
> > + * that needs to live in the kernel core.
> > + *
> > + * Author: Eduardo Habkost <[email protected]>
> > + *
> > + * Copyright (C) 2008, Red Hat Inc.
> > + */
>
> GPL?

Yes, of course.

As IANAL, I used a similar .c file as reference for the author/copyright
info header, and used arch/x86/kernel/crash.c, that doesn't refer to
the GPL also. I think I chose a bad example as reference.

arch/x86/kernel/crash.c even has "All rights reserved". Ouch.

--
Eduardo

2008-11-06 15:45:39

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

Ingo Molnar <[email protected]> writes:

> * Ingo Molnar <[email protected]> wrote:
>
>> > Andrey Borzenkov's patch, for example, adds a new DMI entry
>> > because reboot=acpi breaks his keyboard (even without KVM, I
>> > guess). Andrey, was that the case?
>>
>> hm, IIRC the problem was KVM in his case too.
>
> actually, Andrey's problem seems to be unrelated. So i've queued up
> the revert below in tip/x86/urgent for v2.6.28. Thanks guys!

If there are a number of problems with reboot and reset
I'm wondering if we should investigate using an
outb to port 0x92. With the right bit set you can trigger
a toggle of the reset line on many motherboards.

Eric

2008-11-06 15:53:29

by Andrey Borzenkov

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

[I had to trim direct recipients as my provider would refuse deliver
claiming it is spam]

On Thursday 06 November 2008, Ingo Molnar wrote:
>
> * Ingo Molnar <[email protected]> wrote:
>
> > > Andrey Borzenkov's patch, for example, adds a new DMI entry
> > > because reboot=acpi breaks his keyboard (even without KVM, I
> > > guess). Andrey, was that the case?
> >
> > hm, IIRC the problem was KVM in his case too.
>
> actually, Andrey's problem seems to be unrelated. So i've queued up
> the revert below in tip/x86/urgent for v2.6.28. Thanks guys!
>

Yes, I do not use KVM. Actually my CPU (PIII) does not even support
virtualization.

> Ingo
>
> ---------------->
> From 8d00450d296dedec9ada38d43b83e79cca6fd5a3 Mon Sep 17 00:00:00 2001
> From: Eduardo Habkost <[email protected]>
> Date: Tue, 4 Nov 2008 12:52:44 -0200
> Subject: [PATCH] Revert "x86: default to reboot via ACPI"
>
> This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.
>
> the assumptio of this change was that this would not break
> any existing machine. Andrey Borzenkov reported troubles with
> the ACPI reboot method: the system would hang on reboot, necessiating
> a power cycle. Probably more systems are affected as well.
>

To be precise - system reboots but keyboard is non-functional after that.
Power off is required to clear this condition.

I am fine with either way (revert or DMI). But if problem which ACPI
reboot fixed (or worked around) is not solved differently I think
reverting to old way is better.

> Also, there are patches queued up for v2.6.29 to disable virtualization
> on emergency_restart() - which was the original motivation of
> this change.
>
> Reported-by: Andrey Borzenkov <[email protected]>
> Bisected-by: Andrey Borzenkov <[email protected]>
> Signed-off-by: Eduardo Habkost <[email protected]>
> Acked-by: Avi Kivity <[email protected]>
> Signed-off-by: Ingo Molnar <[email protected]>
> ---
> arch/x86/kernel/reboot.c | 6 +-----
> 1 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
> index f4c93f1..724adfc 100644
> --- a/arch/x86/kernel/reboot.c
> +++ b/arch/x86/kernel/reboot.c
> @@ -29,11 +29,7 @@ EXPORT_SYMBOL(pm_power_off);
>
> static const struct desc_ptr no_idt = {};
> static int reboot_mode;
> -/*
> - * Keyboard reset and triple fault may result in INIT, not RESET, which
> - * doesn't work when we're in vmx root mode. Try ACPI first.
> - */
> -enum reboot_type reboot_type = BOOT_ACPI;
> +enum reboot_type reboot_type = BOOT_KBD;
> int reboot_force;
>
> #if defined(CONFIG_X86_32) && defined(CONFIG_SMP)
>
>



Attachments:
(No filename) (2.57 kB)
signature.asc (197.00 B)
This is a digitally signed message part.
Download all attachments

2008-11-06 15:53:53

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

Eric W. Biederman wrote:
> If there are a number of problems with reboot and reset
> I'm wondering if we should investigate using an
> outb to port 0x92. With the right bit set you can trigger
> a toggle of the reset line on many motherboards.
>

Most likely that's what the ACPI reset describes.

--
error compiling committee.c: too many arguments to function

2008-11-06 18:12:21

by Pavel Machek

[permalink] [raw]
Subject: Re: [PATCH 09/15] x86: Emergency virtualization disable function

Hi!

> > > --- /dev/null
> > > +++ b/arch/x86/kernel/virtext.c
> > > @@ -0,0 +1,89 @@
> > > +/* Core CPU virtualization extensions handling
> > > + *
> > > + * This should carry the code for handling CPU virtualization extensions
> > > + * that needs to live in the kernel core.
> > > + *
> > > + * Author: Eduardo Habkost <[email protected]>
> > > + *
> > > + * Copyright (C) 2008, Red Hat Inc.
> > > + */
> >
> > GPL?
>
> Yes, of course.
>
> As IANAL, I used a similar .c file as reference for the author/copyright
> info header, and used arch/x86/kernel/crash.c, that doesn't refer to
> the GPL also. I think I chose a bad example as reference.
>
> arch/x86/kernel/crash.c even has "All rights reserved". Ouch.

Yep, that should be fixed.

--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

2008-11-06 19:54:33

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"



> > This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.

I agree that the 2.6.27 default was changed to ACPI for the wrong reason.

However, I think it was the right thing to do,
and if you didn't propose it, I would.

My expectation is that with the ACPI default, our problem
is working around a finite list of old machines that don't work;
while with the default KBD, our problem is working around
a potentially unbounded list of yet to be shipped machines
who may only be tested and work using the ACPI method.

So I recommend leaving the default as ACPI for a while to
see how it goes.

thanks,
-Len

ps. please cc: [email protected] on ACPI related changes.

2008-11-06 21:51:39

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Thu, Nov 06, 2008 at 02:50:06PM -0500, Len Brown wrote:

> My expectation is that with the ACPI default, our problem
> is working around a finite list of old machines that don't work;
> while with the default KBD, our problem is working around
> a potentially unbounded list of yet to be shipped machines
> who may only be tested and work using the ACPI method.

Does Windows default to using the ACPI method now?

--
Matthew Garrett | [email protected]

2008-11-06 22:58:36

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"



> > My expectation is that with the ACPI default, our problem
> > is working around a finite list of old machines that don't work;
> > while with the default KBD, our problem is working around
> > a potentially unbounded list of yet to be shipped machines
> > who may only be tested and work using the ACPI method.
>
> Does Windows default to using the ACPI method now?

That is my guess, based on the fact that we've seen
newer machines that don't reboot w/o using the ACPI reset reg.

-Len

2008-11-06 23:24:53

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Thu, Nov 06, 2008 at 05:17:25PM -0500, Len Brown wrote:
> > Does Windows default to using the ACPI method now?
>
> That is my guess, based on the fact that we've seen
> newer machines that don't reboot w/o using the ACPI reset reg.

We've seen machines that won't reboot for a variety of reasons in the
past. In some cases this has seemed to be due to hardware being in a
state that the BIOS didn't expect. Hitting the SMI trap behind the ACPI
register might work around this, but I suspect that in many cases we
could achieve the same effect by spending more time trying to work out
whether there's any common themes in the failures.

--
Matthew Garrett | [email protected]

2008-11-07 00:54:22

by Zhao, Yakui

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Fri, 2008-11-07 at 06:17 +0800, Len Brown wrote:
>
> > > My expectation is that with the ACPI default, our problem
> > > is working around a finite list of old machines that don't work;
> > > while with the default KBD, our problem is working around
> > > a potentially unbounded list of yet to be shipped machines
> > > who may only be tested and work using the ACPI method.
> >
> > Does Windows default to using the ACPI method now?
>
> That is my guess, based on the fact that we've seen
> newer machines that don't reboot w/o using the ACPI reset reg.
With the help of KVM I find that the windows will be rebooted by writing
RESET_VALUE to RESET_REG I/O port if the RESET_REG_SUP bit is not
zero(It indicates whether ACPI reboot is supported).
IMO maybe the ACPI reboot is the first choice. If it can't, then it will
fall back to other mode.

Thanks.
>
> -Len
> --
> To unsubscribe from this list: send the line "unsubscribe linux-acpi" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html

2008-11-07 01:00:31

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Fri, Nov 07, 2008 at 09:01:19AM +0800, Zhao Yakui wrote:

> With the help of KVM I find that the windows will be rebooted by writing
> RESET_VALUE to RESET_REG I/O port if the RESET_REG_SUP bit is not
> zero(It indicates whether ACPI reboot is supported).
> IMO maybe the ACPI reboot is the first choice. If it can't, then it will
> fall back to other mode.

Hmm. But we're seeing some machines that end up very confused if
rebooted via ACPI. I guess we need to run Vista on them to find out how
they behave. What OSI strings did your KVM setup expose? We know that
Windows changes behaviour under various circumstances depending on which
OS the firmware requests, so it's almost possible that this is another
of those cases.

--
Matthew Garrett | [email protected]

2008-11-07 01:30:59

by Robert Hancock

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

Matthew Garrett wrote:
> On Fri, Nov 07, 2008 at 09:01:19AM +0800, Zhao Yakui wrote:
>
>> With the help of KVM I find that the windows will be rebooted by writing
>> RESET_VALUE to RESET_REG I/O port if the RESET_REG_SUP bit is not
>> zero(It indicates whether ACPI reboot is supported).
>> IMO maybe the ACPI reboot is the first choice. If it can't, then it will
>> fall back to other mode.
>
> Hmm. But we're seeing some machines that end up very confused if
> rebooted via ACPI. I guess we need to run Vista on them to find out how
> they behave. What OSI strings did your KVM setup expose? We know that
> Windows changes behaviour under various circumstances depending on which
> OS the firmware requests, so it's almost possible that this is another
> of those cases.

Given that Windows behavior, this patch seems suspicious:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8fd145917fb62368a9b80db59562c20576238f5a

This patch ignores the RESET_REG_SUP flag and just tries using the reset
register anyway if it thinks it's valid. So we may attempt ACPI reset on
machines which don't indicate it's supported.

The patch description mentioned that some machines didn't reboot after
S3 suspend without this patch. However, we recently had this patch merged:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a68823ee5285e65b51ceb96f8b13a5b4f99a6888

Is it possible that the problem fixed there is the true cause of this
reboot after S3 problem?

2008-11-07 01:44:20

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Thu, Nov 06, 2008 at 07:30:43PM -0600, Robert Hancock wrote:

> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8fd145917fb62368a9b80db59562c20576238f5a
>
> This patch ignores the RESET_REG_SUP flag and just tries using the reset
> register anyway if it thinks it's valid. So we may attempt ACPI reset on
> machines which don't indicate it's supported.

Yeah, that sounds very wrong.

> The patch description mentioned that some machines didn't reboot after
> S3 suspend without this patch. However, we recently had this patch merged:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a68823ee5285e65b51ceb96f8b13a5b4f99a6888
>
> Is it possible that the problem fixed there is the true cause of this
> reboot after S3 problem?

Oh, yeah, could be. Given the From:, I should really have thought of
that :)

--
Matthew Garrett | [email protected]

2008-11-07 01:57:22

by Len Brown

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"



On Fri, 7 Nov 2008, Matthew Garrett wrote:

> On Thu, Nov 06, 2008 at 07:30:43PM -0600, Robert Hancock wrote:
>
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8fd145917fb62368a9b80db59562c20576238f5a
> >
> > This patch ignores the RESET_REG_SUP flag and just tries using the reset
> > register anyway if it thinks it's valid. So we may attempt ACPI reset on
> > machines which don't indicate it's supported.
>
> Yeah, that sounds very wrong.

As it turns out, it was an incorrect guess on our part on how to be "bug
compatible" and I'm reverting it per the regression report here:

http://bugzilla.kernel.org/show_bug.cgi?id=11942

-Len

> > The patch description mentioned that some machines didn't reboot after
> > S3 suspend without this patch. However, we recently had this patch merged:
> >
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a68823ee5285e65b51ceb96f8b13a5b4f99a6888
> >
> > Is it possible that the problem fixed there is the true cause of this
> > reboot after S3 problem?
>
> Oh, yeah, could be. Given the From:, I should really have thought of
> that :)
>
> --
> Matthew Garrett | [email protected]
>

2008-11-08 22:28:10

by Rafael J. Wysocki

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Friday, 7 of November 2008, Robert Hancock wrote:
> Matthew Garrett wrote:
> > On Fri, Nov 07, 2008 at 09:01:19AM +0800, Zhao Yakui wrote:
> >
> >> With the help of KVM I find that the windows will be rebooted by writing
> >> RESET_VALUE to RESET_REG I/O port if the RESET_REG_SUP bit is not
> >> zero(It indicates whether ACPI reboot is supported).
> >> IMO maybe the ACPI reboot is the first choice. If it can't, then it will
> >> fall back to other mode.
> >
> > Hmm. But we're seeing some machines that end up very confused if
> > rebooted via ACPI. I guess we need to run Vista on them to find out how
> > they behave. What OSI strings did your KVM setup expose? We know that
> > Windows changes behaviour under various circumstances depending on which
> > OS the firmware requests, so it's almost possible that this is another
> > of those cases.
>
> Given that Windows behavior, this patch seems suspicious:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=8fd145917fb62368a9b80db59562c20576238f5a
>
> This patch ignores the RESET_REG_SUP flag and just tries using the reset
> register anyway if it thinks it's valid. So we may attempt ACPI reset on
> machines which don't indicate it's supported.
>
> The patch description mentioned that some machines didn't reboot after
> S3 suspend without this patch. However, we recently had this patch merged:
>
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a68823ee5285e65b51ceb96f8b13a5b4f99a6888
>
> Is it possible that the problem fixed there is the true cause of this
> reboot after S3 problem?

Generally, it is.

Should it regarded as -stable material, BTW, or is it already in -stable?

Matthew?

Thanks,
Rafael

2008-11-09 10:12:52

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

Matthew Garrett wrote:
> Hmm. But we're seeing some machines that end up very confused if
> rebooted via ACPI. I guess we need to run Vista on them to find out how
> they behave. What OSI strings did your KVM setup expose? We know that
> Windows changes behaviour under various circumstances depending on which
> OS the firmware requests, so it's almost possible that this is another
> of those cases.
>

Isn't it the other way around? The firmware changes behavior depending
on how the OS identifies itself?

Reboot is a fixed feature IIRC, so it cannot change depending on
identification strings.

--
error compiling committee.c: too many arguments to function

2008-11-09 10:25:57

by Matthew Garrett

[permalink] [raw]
Subject: Re: [PATCH 15/15] Revert "x86: default to reboot via ACPI"

On Sun, Nov 09, 2008 at 12:11:03PM +0200, Avi Kivity wrote:
> Matthew Garrett wrote:
> >Hmm. But we're seeing some machines that end up very confused if
> >rebooted via ACPI. I guess we need to run Vista on them to find out how
> >they behave. What OSI strings did your KVM setup expose? We know that
> >Windows changes behaviour under various circumstances depending on which
> >OS the firmware requests, so it's almost possible that this is another
> >of those cases.
> >
>
> Isn't it the other way around? The firmware changes behavior depending
> on how the OS identifies itself?

That also happens, yes.

> Reboot is a fixed feature IIRC, so it cannot change depending on
> identification strings.

The ID strings that the firmware requests give a good idea about which
operating systems the machine has been tested with. If Vista uses the
ACPI method then having the firmware request the OSI string for Vista
gives us a good indication that it's safe to use the ACPI method.

--
Matthew Garrett | [email protected]