2008-11-04 14:55:46

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

Hi,

This is a new version of the series to disabling virtualization on kdump,
now extended to do the same tricks on emergency_restart() if needed.

This series is against linux-next-20081103.

Patches 01-07 simply move the non-kdump-specific parts
of nmi_shootdown_cpus() to reboot.c, so it can be used by
emergency_restart(). They should be a no-op in relation to existing code.

Patches 08 adds the virt_disable function registering interface, like
the previous series.

Patch 09 hooks emergency_virt_disable() into kdump crash_shutdown code.

Patch 10 hooks emergency_virt_disable() into emergency_restart() using
nmi_shootdown_cpus().

Patches 11-15 change KVM so that it registers a virt_disable function
when loading.

Finally, patch 16 restore the previous reboot=kbd default.

--
Eduardo


2008-11-04 14:53:54

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 11/16] kvm_x86_ops: crash_hardware_disable() operation

The vmx hardware_disable() function does too much for the virt_disable
crash handler, so we will have a new operation for the crash-time
virt_disable case.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 09e6c56..002023b 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -426,6 +426,7 @@ struct kvm_x86_ops {
int (*disabled_by_bios)(void); /* __init */
void (*hardware_enable)(void *dummy); /* __init */
void (*hardware_disable)(void *dummy);
+ void (*crash_hardware_disable)(void *dummy);
void (*check_processor_compatibility)(void *rtn);
int (*hardware_setup)(void); /* __init */
void (*hardware_unsetup)(void); /* __exit */
--
1.5.5.GIT

2008-11-04 14:54:23

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 10/16] x86: disable virtualization on all CPUs if needed, on emergency_restart

On emergency_restart, we may need to use NMIs to disable virtualization
on all CPUs. We do that by using nmi_shootdown_cpus(), but only if a
virt_disable function was set by KVM.

Finding a proper point to hook the nmi_shootdown_cpus() call wasn't
really easy, as the non-emergency machine_restart() (that doesn't
need the NMI tricks) uses machine_emergency_restart() directly.

The solution to make this work without adding a new function or argument
to machine_ops was setting a 'reboot_emergency' flag that tells if
native_machine_emergency_restart() needs to do the virt cleanup or not.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 48 ++++++++++++++++++++++++++++++++++++++++++++-
1 files changed, 46 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 80d5e9c..932763f 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -12,6 +12,7 @@
#include <asm/proto.h>
#include <asm/reboot_fixups.h>
#include <asm/reboot.h>
+#include <asm/virtext.h>

#ifdef CONFIG_X86_32
# include <linux/dmi.h>
@@ -42,6 +43,12 @@ int reboot_force;
static int reboot_cpu = -1;
#endif

+/* This is set if we need to go through the 'emergency' path.
+ * When machine_emergency_restart() is called, we may be on
+ * an inconsistent state and won't be able to do a clean cleanup
+ */
+static int reboot_emergency;
+
/* reboot=b[ios] | s[mp] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
warm Don't set the cold reboot flag
cold Set the cold reboot flag
@@ -357,6 +364,33 @@ static inline void kb_wait(void)
}
}

+static void disable_virt_nmi(int cpu, struct die_args *args)
+{
+ emergency_virt_disable();
+}
+
+/* Use NMIs as IPIs to tell all CPUs to disable virtualization
+ */
+static void emergency_virt_disable_all(void)
+{
+ /* We will do this NMI stuff only when we may have virtualization
+ * enabled.
+ */
+ if (has_virt_extensions()) {
+ /* Kill the other CPUs */
+ nmi_shootdown_cpus(disable_virt_nmi);
+ /* Disable virt on this CPU */
+ emergency_virt_disable();
+ }
+
+ /* If another CPU call set_virt_disable_func()
+ * here, we will be lost. Unless we want to add the NMI
+ * stuff to the reboot path for non-virt users, we will
+ * have to live with this (unlikely) possibility.
+ */
+
+}
+
void __attribute__((weak)) mach_reboot_fixups(void)
{
}
@@ -365,9 +399,13 @@ static void native_machine_emergency_restart(void)
{
int i;

+ if (reboot_emergency)
+ emergency_virt_disable_all();
+
/* Tell the BIOS if we want cold or warm reboot */
*((unsigned short *)__va(0x472)) = reboot_mode;

+
for (;;) {
/* Could also try the reset bit in the Hammer NB */
switch (reboot_type) {
@@ -456,13 +494,19 @@ void native_machine_shutdown(void)
#endif
}

+static void __machine_emergency_restart(int emergency)
+{
+ reboot_emergency = emergency;
+ machine_ops.emergency_restart();
+}
+
static void native_machine_restart(char *__unused)
{
printk("machine restart\n");

if (!reboot_force)
machine_shutdown();
- machine_emergency_restart();
+ __machine_emergency_restart(0);
}

static void native_machine_halt(void)
@@ -501,7 +545,7 @@ void machine_shutdown(void)

void machine_emergency_restart(void)
{
- machine_ops.emergency_restart();
+ __machine_emergency_restart(1);
}

void machine_restart(char *cmd)
--
1.5.5.GIT

2008-11-04 14:54:37

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 06/16] x86: Move nmi_shootdown_cpus() to reboot.c

Now nmi_shootdown_cpus() is ready to be used by non-kdump code also.
Move it to reboot.c.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 73 ------------------------------------------
arch/x86/kernel/reboot.c | 79 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 79 insertions(+), 73 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 84d6338..d84a852 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -32,12 +32,6 @@

#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

-/* This keeps a track of which one is crashing cpu. */
-static int crashing_cpu;
-static nmi_shootdown_cb shootdown_callback;
-
-static atomic_t waiting_for_crash_ipi;
-
static void kdump_nmi_callback(int cpu, struct die_args *args)
{
struct pt_regs *regs;
@@ -58,73 +52,6 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
disable_local_APIC();
}

-static int crash_nmi_callback(struct notifier_block *self,
- unsigned long val, void *data)
-{
- int cpu;
-
- if (val != DIE_NMI_IPI)
- return NOTIFY_OK;
-
- cpu = raw_smp_processor_id();
-
- /* Don't do anything if this handler is invoked on crashing cpu.
- * Otherwise, system will completely hang. Crashing cpu can get
- * an NMI if system was initially booted with nmi_watchdog parameter.
- */
- if (cpu == crashing_cpu)
- return NOTIFY_STOP;
- local_irq_disable();
-
- shootdown_callback(cpu, (struct die_args*)data);
-
- atomic_dec(&waiting_for_crash_ipi);
- /* Assume hlt works */
- halt();
- for (;;)
- cpu_relax();
-
- return 1;
-}
-
-static void smp_send_nmi_allbutself(void)
-{
- send_IPI_allbutself(NMI_VECTOR);
-}
-
-static struct notifier_block crash_nmi_nb = {
- .notifier_call = crash_nmi_callback,
-};
-
-void nmi_shootdown_cpus(nmi_shootdown_cb callback)
-{
- unsigned long msecs;
-
- /* Make a note of crashing cpu. Will be used in NMI callback.*/
- crashing_cpu = safe_smp_processor_id();
-
- shootdown_callback = callback;
-
- atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
- /* Would it be better to replace the trap vector here? */
- if (register_die_notifier(&crash_nmi_nb))
- return; /* return what? */
- /* Ensure the new callback function is set before sending
- * out the NMI
- */
- wmb();
-
- smp_send_nmi_allbutself();
-
- msecs = 1000; /* Wait at most a second for the other cpus to stop */
- while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
- mdelay(1);
- msecs--;
- }
-
- /* Leave the nmi callback set */
-}
-
static void kdump_nmi_shootdown_cpus(void)
{
nmi_shootdown_cpus(kdump_nmi_callback);
diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index f4c93f1..b65fc86 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -21,6 +21,8 @@
# include <asm/iommu.h>
#endif

+#include <mach_ipi.h>
+
/*
* Power off function, if any
*/
@@ -518,3 +520,80 @@ void machine_crash_shutdown(struct pt_regs *regs)
machine_ops.crash_shutdown(regs);
}
#endif
+
+
+#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
+
+/* This keeps a track of which one is crashing cpu. */
+static int crashing_cpu;
+static nmi_shootdown_cb shootdown_callback;
+
+static atomic_t waiting_for_crash_ipi;
+
+static int crash_nmi_callback(struct notifier_block *self,
+ unsigned long val, void *data)
+{
+ int cpu;
+
+ if (val != DIE_NMI_IPI)
+ return NOTIFY_OK;
+
+ cpu = raw_smp_processor_id();
+
+ /* Don't do anything if this handler is invoked on crashing cpu.
+ * Otherwise, system will completely hang. Crashing cpu can get
+ * an NMI if system was initially booted with nmi_watchdog parameter.
+ */
+ if (cpu == crashing_cpu)
+ return NOTIFY_STOP;
+ local_irq_disable();
+
+ shootdown_callback(cpu, (struct die_args*)data);
+
+ atomic_dec(&waiting_for_crash_ipi);
+ /* Assume hlt works */
+ halt();
+ for (;;)
+ cpu_relax();
+
+ return 1;
+}
+
+static void smp_send_nmi_allbutself(void)
+{
+ send_IPI_allbutself(NMI_VECTOR);
+}
+
+static struct notifier_block crash_nmi_nb = {
+ .notifier_call = crash_nmi_callback,
+};
+
+void nmi_shootdown_cpus(nmi_shootdown_cb callback)
+{
+ unsigned long msecs;
+
+ /* Make a note of crashing cpu. Will be used in NMI callback.*/
+ crashing_cpu = safe_smp_processor_id();
+
+ shootdown_callback = callback;
+
+ atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
+ /* Would it be better to replace the trap vector here? */
+ if (register_die_notifier(&crash_nmi_nb))
+ return; /* return what? */
+ /* Ensure the new callback function is set before sending
+ * out the NMI
+ */
+ wmb();
+
+ smp_send_nmi_allbutself();
+
+ msecs = 1000; /* Wait at most a second for the other cpus to stop */
+ while ((atomic_read(&waiting_for_crash_ipi) > 0) && msecs) {
+ mdelay(1);
+ msecs--;
+ }
+
+ /* Leave the nmi callback set */
+}
+#endif
--
1.5.5.GIT

2008-11-04 14:54:53

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 01/16] x86 kdump: Extract kdump-specific code from crash_nmi_callback()

The NMI CPU-halting code will be used on non-kdump cases, also
(e.g. emergency_reboot when virtualization is enabled).

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 31 ++++++++++++++++++++-----------
1 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 81e01f7..e8c58aa 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -35,19 +35,34 @@ static int crashing_cpu;
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static atomic_t waiting_for_crash_ipi;

-static int crash_nmi_callback(struct notifier_block *self,
- unsigned long val, void *data)
+static void kdump_nmi_callback(int cpu, struct die_args *args)
{
struct pt_regs *regs;
#ifdef CONFIG_X86_32
struct pt_regs fixed_regs;
#endif
+
+ regs = args->regs;
+
+#ifdef CONFIG_X86_32
+ if (!user_mode_vm(regs)) {
+ crash_fixup_ss_esp(&fixed_regs, regs);
+ regs = &fixed_regs;
+ }
+#endif
+ crash_save_cpu(regs, cpu);
+
+ disable_local_APIC();
+}
+
+static int crash_nmi_callback(struct notifier_block *self,
+ unsigned long val, void *data)
+{
int cpu;

if (val != DIE_NMI_IPI)
return NOTIFY_OK;

- regs = ((struct die_args *)data)->regs;
cpu = raw_smp_processor_id();

/* Don't do anything if this handler is invoked on crashing cpu.
@@ -58,14 +73,8 @@ static int crash_nmi_callback(struct notifier_block *self,
return NOTIFY_STOP;
local_irq_disable();

-#ifdef CONFIG_X86_32
- if (!user_mode_vm(regs)) {
- crash_fixup_ss_esp(&fixed_regs, regs);
- regs = &fixed_regs;
- }
-#endif
- crash_save_cpu(regs, cpu);
- disable_local_APIC();
+ kdump_nmi_callback(cpu, (struct die_args*)data);
+
atomic_dec(&waiting_for_crash_ipi);
/* Assume hlt works */
halt();
--
1.5.5.GIT

2008-11-04 14:55:16

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 12/16] kvm: svm: set crash_hardware_disable to svm_hardware_disable

We can use it as is, as it is a simple function.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/svm.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index f0ad4d4..6b3a660 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1942,6 +1942,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.check_processor_compatibility = svm_check_processor_compat,
.hardware_enable = svm_hardware_enable,
.hardware_disable = svm_hardware_disable,
+ .crash_hardware_disable = svm_hardware_disable,
.cpu_has_accelerated_tpr = svm_cpu_has_accelerated_tpr,

.vcpu_create = svm_create_vcpu,
--
1.5.5.GIT

2008-11-04 14:56:04

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 16/16] Revert "x86: default to reboot via ACPI"

This reverts commit c7ffa6c26277b403920e2255d10df849bd613380.

Now that we have the hooks to disable virtualization on
emergency_restart(), we can get back to the BOOT_KBD reboot_type default.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 932763f..f05e819 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -32,11 +32,7 @@ EXPORT_SYMBOL(pm_power_off);

static const struct desc_ptr no_idt = {};
static int reboot_mode;
-/*
- * Keyboard reset and triple fault may result in INIT, not RESET, which
- * doesn't work when we're in vmx root mode. Try ACPI first.
- */
-enum reboot_type reboot_type = BOOT_ACPI;
+enum reboot_type reboot_type = BOOT_KBD;
int reboot_force;

#if defined(CONFIG_X86_32) && defined(CONFIG_SMP)
--
1.5.5.GIT

2008-11-04 14:56:28

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 14/16] kvmx: x86: set kvm_x86_ops earlier on kvm_arch_init()

Small change that will be needed when we use set_virt_disable_func()
on kvm_arch_init().

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/x86.c | 7 +++++--
1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9a4a39c..049c6a0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2603,19 +2603,22 @@ int kvm_arch_init(void *opaque)
goto out;
}

+ kvm_x86_ops = ops;
+
r = kvm_mmu_module_init();
if (r)
- goto out;
+ goto out_clear_ops;

kvm_init_msr_list();

- kvm_x86_ops = ops;
kvm_mmu_set_nonpresent_ptes(0ull, 0ull);
kvm_mmu_set_base_ptes(PT_PRESENT_MASK);
kvm_mmu_set_mask_ptes(PT_USER_MASK, PT_ACCESSED_MASK,
PT_DIRTY_MASK, PT64_NX_MASK, 0, 0);
return 0;

+out_clear_ops:
+ kvm_x86_ops = NULL;
out:
return r;
}
--
1.5.5.GIT

2008-11-04 14:57:00

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 09/16] kdump: Hook emergency_virt_disable() on crash shutdown code

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index d84a852..87780ba 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -26,6 +26,7 @@
#include <linux/kdebug.h>
#include <asm/smp.h>
#include <asm/reboot.h>
+#include <asm/virtext.h>

#include <mach_ipi.h>

@@ -49,6 +50,7 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
#endif
crash_save_cpu(regs, cpu);

+ emergency_virt_disable();
disable_local_APIC();
}

@@ -80,6 +82,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
local_irq_disable();

kdump_nmi_shootdown_cpus();
+ emergency_virt_disable();
lapic_shutdown();
#if defined(CONFIG_X86_IO_APIC)
disable_IO_APIC();
--
1.5.5.GIT

2008-11-04 14:56:44

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 03/16] x86 kdump: Create kdump_nmi_shootdown_cpus()

For the kdump-specific code that was living on nmi_shootdown_cpus().

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 11 +++++++++--
1 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 0e77fe9..7f6e2b2 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -119,10 +119,17 @@ static void nmi_shootdown_cpus(void)
}

/* Leave the nmi callback set */
+}
+
+static void kdump_nmi_shootdown_cpus(void)
+{
+ nmi_shootdown_cpus();
+
disable_local_APIC();
}
+
#else
-static void nmi_shootdown_cpus(void)
+static void kdump_nmi_shootdown_cpus(void)
{
/* There are no cpus to shootdown */
}
@@ -141,7 +148,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();

- nmi_shootdown_cpus();
+ kdump_nmi_shootdown_cpus();
lapic_shutdown();
#if defined(CONFIG_X86_IO_APIC)
disable_IO_APIC();
--
1.5.5.GIT

2008-11-04 14:57:29

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 15/16] kvm: x86: set kdump virt_disable function on initialization

Finally implement the virt_disable function for kdump. It will call
kvm_x86_ops->crash_hardware_disable(), that will disable virtualization
extensions on the CPU if it is not disabled yet.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
1 files changed, 18 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 049c6a0..9e61baf 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -40,6 +40,7 @@
#include <asm/msr.h>
#include <asm/desc.h>
#include <asm/mtrr.h>
+#include <asm/virtext.h>

#define MAX_IO_MSRS 256
#define CR0_RESERVED_BITS \
@@ -2581,6 +2582,13 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct kvm_run *run, int in,
}
EXPORT_SYMBOL_GPL(kvm_emulate_pio_string);

+/* Called at crash time, so we can disable virtualization if needed
+ */
+static void crash_hardware_disable(void)
+{
+ kvm_x86_ops->crash_hardware_disable(NULL);
+}
+
int kvm_arch_init(void *opaque)
{
int r;
@@ -2605,9 +2613,15 @@ int kvm_arch_init(void *opaque)

kvm_x86_ops = ops;

+ r = set_virt_disable_func(crash_hardware_disable);
+ if (r) {
+ printk(KERN_ERR "kvm: virt_disable function already set?\n");
+ goto out_clear_ops;
+ }
+
r = kvm_mmu_module_init();
if (r)
- goto out_clear_ops;
+ goto out_clear_crash;

kvm_init_msr_list();

@@ -2617,6 +2631,8 @@ int kvm_arch_init(void *opaque)
PT_DIRTY_MASK, PT64_NX_MASK, 0, 0);
return 0;

+out_clear_crash:
+ clear_virt_disable_func();
out_clear_ops:
kvm_x86_ops = NULL;
out:
@@ -2625,6 +2641,7 @@ out:

void kvm_arch_exit(void)
{
+ clear_virt_disable_func();
kvm_x86_ops = NULL;
kvm_mmu_module_exit();
}
--
1.5.5.GIT

2008-11-04 14:57:43

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 08/16] x86: Emergency virtualization disable function

This patch adds an interface to set a function that can be used to
disable virtualization extensions on the CPU on emergency cases, such
as on kdump or emergency reboot.

The function will be set by code that enables virtualization extensions
on the CPUs (i.e. KVM), and should do the very least to disable virt
extensions on the CPU where it is being called.

The functions that set the function pointer uses RCU synchronization,
just in case the crash NMI is triggered while KVM is unloading.

We can't just use the same notifiers used at reboot time (that are used
by non-crash-dump kexec), because at crash time some CPUs may have IRQs
disabled, so we can't use IPIs. The crash shutdown code use NMIs to tell
the other CPUs to be halted, so the notifier call will be hooked into
the CPU halting code that is on the crash shutdown NMI handler.

[v2: drop 'unsigned int cpu' arg from function]
[v3: make emergency_virt_disable() non-static]
[v4: add a config option for it: CPU_VIRT_EXTENSIONS]
[v5: add has_virt_extensions() function]
[v6: don't define the registering functions if CPU_VIRT_EXTENSIONS is
not enabled]

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/include/asm/virtext.h | 29 +++++++++++++
arch/x86/kernel/Makefile | 1 +
arch/x86/kernel/virtext.c | 89 ++++++++++++++++++++++++++++++++++++++++
arch/x86/kvm/Kconfig | 5 ++
4 files changed, 124 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/virtext.h
create mode 100644 arch/x86/kernel/virtext.c

diff --git a/arch/x86/include/asm/virtext.h b/arch/x86/include/asm/virtext.h
new file mode 100644
index 0000000..72b7caa
--- /dev/null
+++ b/arch/x86/include/asm/virtext.h
@@ -0,0 +1,29 @@
+/* CPU virtualization extensions handling
+ */
+#ifndef _ASM_X86_VIRTEX_H
+#define _ASM_X86_VIRTEX_H
+
+#ifdef CONFIG_CPU_VIRT_EXTENSIONS
+
+int set_virt_disable_func(void (*fn)(void));
+void clear_virt_disable_func(void);
+void emergency_virt_disable(void);
+int has_virt_extensions(void);
+
+
+#else /* !CONFIG_CPU_VIRT_EXTENSIONS */
+
+static inline
+void emergency_virt_disable(void)
+{
+}
+
+static inline
+int has_virt_extensions(void)
+{
+ return 0;
+}
+
+#endif /* CONFIG_CPU_VIRT_EXTENSIONS */
+
+#endif /* _ASM_X86_VIRTEX_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 58814cc..84a0e23 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -68,6 +68,7 @@ obj-$(CONFIG_X86_REBOOTFIXUPS) += reboot_fixups_32.o
obj-$(CONFIG_DYNAMIC_FTRACE) += ftrace.o
obj-$(CONFIG_KEXEC) += machine_kexec_$(BITS).o
obj-$(CONFIG_KEXEC) += relocate_kernel_$(BITS).o crash.o
+obj-$(CONFIG_CPU_VIRT_EXTENSIONS) += virtext.o
obj-$(CONFIG_CRASH_DUMP) += crash_dump_$(BITS).o
obj-$(CONFIG_X86_NUMAQ) += numaq_32.o
obj-$(CONFIG_X86_ES7000) += es7000_32.o
diff --git a/arch/x86/kernel/virtext.c b/arch/x86/kernel/virtext.c
new file mode 100644
index 0000000..4876253
--- /dev/null
+++ b/arch/x86/kernel/virtext.c
@@ -0,0 +1,89 @@
+/* Core CPU virtualization extensions handling
+ *
+ * This should carry the code for handling CPU virtualization extensions
+ * that needs to live in the kernel core.
+ *
+ * Author: Eduardo Habkost <[email protected]>
+ *
+ * Copyright (C) 2008, Red Hat Inc.
+ */
+
+#include <linux/module.h>
+#include <linux/spinlock.h>
+#include <linux/rcupdate.h>
+
+static DEFINE_SPINLOCK(virt_disable_lock);
+static void (*virt_disable_fn)(void);
+
+/** set function to be called to disable virtualization on crash
+ *
+ * Registers a function to be called when CPUs are being halted at
+ * machine_crash_shutdown().
+ *
+ * There is only one function pointer, so the function
+ * is reserved to be set by the KVM module at load time, before
+ * enabling virtualization.
+ *
+ * The function is called once on each online CPU, possibly
+ * from a NMI handler. It should do the very least to allow the CPU
+ * to be halted before booting the kdump kernel, as the kernel has
+ * just crashed.
+ */
+int set_virt_disable_func(void (*fn)(void))
+{
+ int r = 0;
+
+ spin_lock(&virt_disable_lock);
+ if (!virt_disable_fn)
+ rcu_assign_pointer(virt_disable_fn, fn);
+ else
+ r = -EEXIST;
+ spin_unlock(&virt_disable_lock);
+
+ return r;
+}
+EXPORT_SYMBOL(set_virt_disable_func);
+
+/** clear the virt_disable function set by set_virt_disable_func()
+ *
+ * You must call this function only if you sucessfully set
+ * the virt_disable function on a previous set_virt_disable_func()
+ * call.
+ *
+ * This function will use synchronize_sched() to wait until it's safe
+ * to free any data or code related to the previous existing virt_disable
+ * func, before returning.
+ */
+void clear_virt_disable_func(void)
+{
+ spin_lock(&virt_disable_lock);
+ rcu_assign_pointer(virt_disable_fn, NULL);
+ spin_unlock(&virt_disable_lock);
+
+ synchronize_sched();
+}
+EXPORT_SYMBOL(clear_virt_disable_func);
+
+/* Disable virtualization extensions if needed
+ *
+ * Runs thefunction set by set_virt_disable_func()
+ *
+ * Must be called on the CPU that is being halted.
+ */
+void emergency_virt_disable(void)
+{
+ void (*fn)(void);
+ fn = rcu_dereference(virt_disable_fn);
+ if (fn)
+ fn();
+}
+
+/* Returns non-zero if a virt_disable function was set
+ *
+ * The function pointer may be set just after you've called this function.
+ * Use with care.
+ */
+int has_virt_extensions(void)
+{
+ return (rcu_dereference(virt_disable_fn) != NULL);
+}
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index ce3251c..69373cc 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -15,6 +15,10 @@ menuconfig VIRTUALIZATION

If you say N, all options in this submenu will be skipped and disabled.

+# Core code for handling CPU virt extensions
+config CPU_VIRT_EXTENSIONS
+ bool
+
if VIRTUALIZATION

config KVM
@@ -23,6 +27,7 @@ config KVM
select PREEMPT_NOTIFIERS
select MMU_NOTIFIER
select ANON_INODES
+ select CPU_VIRT_EXTENSIONS
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
--
1.5.5.GIT

2008-11-04 14:58:00

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 07/16] x86: Make nmi_shootdown_cpus() available on !SMP and !X86_LOCAL_APIC

The X86_LOCAL_APIC #ifdef was for kdump. For !SMP, the function simply
does nothing.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index b65fc86..80d5e9c 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -522,7 +522,7 @@ void machine_crash_shutdown(struct pt_regs *regs)
#endif


-#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
+#if defined(CONFIG_SMP)

/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;
@@ -568,6 +568,12 @@ static struct notifier_block crash_nmi_nb = {
.notifier_call = crash_nmi_callback,
};

+/* Halt all other CPUs, calling the specified function on each of them
+ *
+ * This function can be used to halt all other CPUs on crash
+ * or emergency reboot time. The function passed as parameter
+ * will be called inside a NMI handler on all CPUs.
+ */
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;
@@ -596,4 +602,9 @@ void nmi_shootdown_cpus(nmi_shootdown_cb callback)

/* Leave the nmi callback set */
}
+#else /* !CONFIG_SMP */
+void nmi_shootdown_cpus(nmi_shootdown_cb callback)
+{
+ /* No other CPUs to shoot down */
+}
#endif
--
1.5.5.GIT

2008-11-04 14:58:56

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 04/16] x86 kdump: Make kdump_nmi_callback() a function ptr on crash_nmi_callback()

The reboot code will use a different function on crash_nmi_callback().
Adding a function pointer parameter to nmi_shootdown_cpus() for that.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 11 ++++++++---
1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 7f6e2b2..ffbe54b 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -29,10 +29,13 @@

#include <mach_ipi.h>

+typedef void (*nmi_shootdown_cb)(int, struct die_args*);
+
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;
+static nmi_shootdown_cb shootdown_callback;

static atomic_t waiting_for_crash_ipi;

@@ -74,7 +77,7 @@ static int crash_nmi_callback(struct notifier_block *self,
return NOTIFY_STOP;
local_irq_disable();

- kdump_nmi_callback(cpu, (struct die_args*)data);
+ shootdown_callback(cpu, (struct die_args*)data);

atomic_dec(&waiting_for_crash_ipi);
/* Assume hlt works */
@@ -94,13 +97,15 @@ static struct notifier_block crash_nmi_nb = {
.notifier_call = crash_nmi_callback,
};

-static void nmi_shootdown_cpus(void)
+static void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;

/* Make a note of crashing cpu. Will be used in NMI callback.*/
crashing_cpu = safe_smp_processor_id();

+ shootdown_callback = callback;
+
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
/* Would it be better to replace the trap vector here? */
if (register_die_notifier(&crash_nmi_nb))
@@ -123,7 +128,7 @@ static void nmi_shootdown_cpus(void)

static void kdump_nmi_shootdown_cpus(void)
{
- nmi_shootdown_cpus();
+ nmi_shootdown_cpus(kdump_nmi_callback);

disable_local_APIC();
}
--
1.5.5.GIT

2008-11-04 14:59:44

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 02/16] x86 kdump: Move crashing_cpu assignment to nmi_shootdown_cpus()

This variable will be moved to non-kdump-specific code.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/crash.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index e8c58aa..0e77fe9 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -29,10 +29,11 @@

#include <mach_ipi.h>

+#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
+
/* This keeps a track of which one is crashing cpu. */
static int crashing_cpu;

-#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static atomic_t waiting_for_crash_ipi;

static void kdump_nmi_callback(int cpu, struct die_args *args)
@@ -97,6 +98,9 @@ static void nmi_shootdown_cpus(void)
{
unsigned long msecs;

+ /* Make a note of crashing cpu. Will be used in NMI callback.*/
+ crashing_cpu = safe_smp_processor_id();
+
atomic_set(&waiting_for_crash_ipi, num_online_cpus() - 1);
/* Would it be better to replace the trap vector here? */
if (register_die_notifier(&crash_nmi_nb))
@@ -137,8 +141,6 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
/* The kernel is broken so disable interrupts */
local_irq_disable();

- /* Make a note of crashing cpu. Will be used in NMI callback.*/
- crashing_cpu = safe_smp_processor_id();
nmi_shootdown_cpus();
lapic_shutdown();
#if defined(CONFIG_X86_IO_APIC)
--
1.5.5.GIT

2008-11-04 14:59:24

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 05/16] x86 kdump: Make nmi_shootdown_cpus() non-static

Add prototype to asm/reboot.h.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/include/asm/reboot.h | 5 +++++
arch/x86/kernel/crash.c | 3 +--
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/reboot.h b/arch/x86/include/asm/reboot.h
index df77103..562d4fd 100644
--- a/arch/x86/include/asm/reboot.h
+++ b/arch/x86/include/asm/reboot.h
@@ -1,6 +1,8 @@
#ifndef _ASM_X86_REBOOT_H
#define _ASM_X86_REBOOT_H

+#include <linux/kdebug.h>
+
struct pt_regs;

struct machine_ops {
@@ -18,4 +20,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs);
void native_machine_shutdown(void);
void machine_real_restart(const unsigned char *code, int length);

+typedef void (*nmi_shootdown_cb)(int, struct die_args*);
+void nmi_shootdown_cpus(nmi_shootdown_cb callback);
+
#endif /* _ASM_X86_REBOOT_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index ffbe54b..84d6338 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -29,7 +29,6 @@

#include <mach_ipi.h>

-typedef void (*nmi_shootdown_cb)(int, struct die_args*);

#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

@@ -97,7 +96,7 @@ static struct notifier_block crash_nmi_nb = {
.notifier_call = crash_nmi_callback,
};

-static void nmi_shootdown_cpus(nmi_shootdown_cb callback)
+void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;

--
1.5.5.GIT

2008-11-04 15:00:01

by Eduardo Habkost

[permalink] [raw]
Subject: [PATCH 13/16] kvm: vmx: crash_hardware_disable function

We need to first check if virtualization was enabled. We do this by
checking CR4.VMXE. If it is set, run vmxoff and clear CR4.VMXE.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kvm/vmx.c | 16 ++++++++++++++--
1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index ece9cb7..0f4e191 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1091,13 +1091,24 @@ static void vmclear_local_vcpus(void)
__vcpu_clear(vmx);
}

-static void hardware_disable(void *garbage)
+static void __hardware_disable(void)
{
- vmclear_local_vcpus();
asm volatile (__ex(ASM_VMX_VMXOFF) : : : "cc");
write_cr4(read_cr4() & ~X86_CR4_VMXE);
}

+static void hardware_disable(void *garbage)
+{
+ vmclear_local_vcpus();
+ __hardware_disable();
+}
+
+static void crash_hardware_disable(void *garbage)
+{
+ if (read_cr4() & X86_CR4_VMXE)
+ __hardware_disable();
+}
+
static __init int adjust_vmx_controls(u32 ctl_min, u32 ctl_opt,
u32 msr, u32 *result)
{
@@ -3589,6 +3600,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.check_processor_compatibility = vmx_check_processor_compat,
.hardware_enable = hardware_enable,
.hardware_disable = hardware_disable,
+ .crash_hardware_disable = crash_hardware_disable,
.cpu_has_accelerated_tpr = cpu_has_vmx_virtualize_apic_accesses,

.vcpu_create = vmx_create_vcpu,
--
1.5.5.GIT

2008-11-04 16:55:50

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

Eduardo Habkost wrote:
> Hi,
>
> This is a new version of the series to disabling virtualization on kdump,
> now extended to do the same tricks on emergency_restart() if needed.
>

Looks good. If you me to push it upstream, I'll need kexec/kdump acks.
Otherwise, ack for the kvm bits.

--
error compiling committee.c: too many arguments to function

2008-11-04 18:16:44

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 07/16] x86: Make nmi_shootdown_cpus() available on !SMP and !X86_LOCAL_APIC



On Tue, Nov 04, 2008 at 12:52:35PM -0200, Eduardo Habkost wrote:
> The X86_LOCAL_APIC #ifdef was for kdump. For !SMP, the function simply
> does nothing.
>

The bit below is also needed. It can be added to the series right before
patch 07/16.

After getting some review, I will resubmit the series with it, and with
a few style fixes suggested by checkpatch.

---
From: Eduardo Habkost <[email protected]>
Subject: [PATCH 06.1/16] x86: Disable IRQs before doing anything on nmi_shootdown_cpus()

We need to know on which CPU we are running on, and we don't want to be
preempted while doing this.

Signed-off-by: Eduardo Habkost <[email protected]>
---
arch/x86/kernel/reboot.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index 23a9d78..407106e 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -577,6 +577,7 @@ static struct notifier_block crash_nmi_nb = {
void nmi_shootdown_cpus(nmi_shootdown_cb callback)
{
unsigned long msecs;
+ local_irq_disable();

/* Make a note of crashing cpu. Will be used in NMI callback.*/
crashing_cpu = safe_smp_processor_id();
--
1.5.5.GIT


--
Eduardo

2008-11-05 14:42:51

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 09/16] kdump: Hook emergency_virt_disable() on crash shutdown code

On Tue, Nov 04, 2008 at 12:52:37PM -0200, Eduardo Habkost wrote:
> Signed-off-by: Eduardo Habkost <[email protected]>
> ---
> arch/x86/kernel/crash.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index d84a852..87780ba 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -26,6 +26,7 @@
> #include <linux/kdebug.h>
> #include <asm/smp.h>
> #include <asm/reboot.h>
> +#include <asm/virtext.h>
>
> #include <mach_ipi.h>
>
> @@ -49,6 +50,7 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
> #endif
> crash_save_cpu(regs, cpu);
>
> + emergency_virt_disable();
> disable_local_APIC();
> }
>
> @@ -80,6 +82,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> local_irq_disable();
>
> kdump_nmi_shootdown_cpus();
> + emergency_virt_disable();

Hi Eduardo,

It might be a good idea to put a 2-3 line comment saying why it is
imporatant to disable virtualization here before we continue.

Thanks
Vivek

2008-11-05 14:44:47

by Vivek Goyal

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

On Tue, Nov 04, 2008 at 06:54:04PM +0200, Avi Kivity wrote:
> Eduardo Habkost wrote:
>> Hi,
>>
>> This is a new version of the series to disabling virtualization on kdump,
>> now extended to do the same tricks on emergency_restart() if needed.
>>
>
> Looks good. If you me to push it upstream, I'll need kexec/kdump acks.
> Otherwise, ack for the kvm bits.

Looks good to me from kdump perspective.

Is there a way we can prevent any other module from using virt disable
callback incase kvm is not loaded?

Thanks
Vivek

2008-11-05 15:05:54

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

On Wed, Nov 05, 2008 at 09:44:12AM -0500, Vivek Goyal wrote:
> On Tue, Nov 04, 2008 at 06:54:04PM +0200, Avi Kivity wrote:
> > Eduardo Habkost wrote:
> >> Hi,
> >>
> >> This is a new version of the series to disabling virtualization on kdump,
> >> now extended to do the same tricks on emergency_restart() if needed.
> >>
> >
> > Looks good. If you me to push it upstream, I'll need kexec/kdump acks.
> > Otherwise, ack for the kvm bits.
>
> Looks good to me from kdump perspective.
>
> Is there a way we can prevent any other module from using virt disable
> callback incase kvm is not loaded?

I think we can't prevent other modules from using the API. But 3rd-party
modules deal with CPU virtualization extensions today won't live with
KVM today, anyway. And between not living together with KVM and breaking
kdump/reboot, and not living together with KVM and making kdump and
reboot working, I the the latter looks better.

But I see this as an orthogonal issue. If we include a "CPU virtualization
extensions core API" in the future, it could simply keep the same
emergency_virt_disable() API available for emergency_restart and kdump.

--
Eduardo

2008-11-05 15:08:44

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 09/16] kdump: Hook emergency_virt_disable() on crash shutdown code

On Wed, Nov 05, 2008 at 09:41:44AM -0500, Vivek Goyal wrote:
> On Tue, Nov 04, 2008 at 12:52:37PM -0200, Eduardo Habkost wrote:
> > Signed-off-by: Eduardo Habkost <[email protected]>
> > ---
> > arch/x86/kernel/crash.c | 3 +++
> > 1 files changed, 3 insertions(+), 0 deletions(-)
> >
> > diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> > index d84a852..87780ba 100644
> > --- a/arch/x86/kernel/crash.c
> > +++ b/arch/x86/kernel/crash.c
> > @@ -26,6 +26,7 @@
> > #include <linux/kdebug.h>
> > #include <asm/smp.h>
> > #include <asm/reboot.h>
> > +#include <asm/virtext.h>
> >
> > #include <mach_ipi.h>
> >
> > @@ -49,6 +50,7 @@ static void kdump_nmi_callback(int cpu, struct die_args *args)
> > #endif
> > crash_save_cpu(regs, cpu);
> >
> > + emergency_virt_disable();
> > disable_local_APIC();
> > }
> >
> > @@ -80,6 +82,7 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
> > local_irq_disable();
> >
> > kdump_nmi_shootdown_cpus();
> > + emergency_virt_disable();
>
> Hi Eduardo,
>
> It might be a good idea to put a 2-3 line comment saying why it is
> imporatant to disable virtualization here before we continue.

True. Sometimes I forget that detailed commit messages don't replace
good documentation on source code. I will do that.

--
Eduardo

2008-11-05 16:34:19

by Ingo Molnar

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart


* Avi Kivity <[email protected]> wrote:

> Eduardo Habkost wrote:
>> Hi,
>>
>> This is a new version of the series to disabling virtualization on kdump,
>> now extended to do the same tricks on emergency_restart() if needed.
>>
>
> Looks good. If you me to push it upstream, I'll need kexec/kdump
> acks. Otherwise, ack for the kvm bits.

general ack for the x86 bits, but i'm not sure whether we should be
pushing this upstream so late in the cycle. If we do it in the next
cycle then it's best we do it in the x86 tree, the KVM impact seems
much smaller than the general x86 impact.

Ingo

2008-11-05 17:35:30

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

Eduardo Habkost <[email protected]> writes:

> +int set_virt_disable_func(void (*fn)(void))
> +{
> + int r = 0;
> +
> + spin_lock(&virt_disable_lock);
> + if (!virt_disable_fn)
> + rcu_assign_pointer(virt_disable_fn, fn);
> + else
> + r = -EEXIST;
> + spin_unlock(&virt_disable_lock);
> +
> + return r;
> +}
> +EXPORT_SYMBOL(set_virt_disable_func);

EXPORT_SYMBOL_GPL?

> +EXPORT_SYMBOL(clear_virt_disable_func);
EXPORT_SYMBOL_GPL?

We are talking a core internal api that should not even
be exported if KVM is compiled into the kernel.

I have had to tell people NO too many times by that
wanted to shove code on the kexec on panic path that
had no business there. I do not want to give
the least little impression that this is an ok hook
for the to use. The very specific name helps in
that regard thank you for that. Having the symbol
exported GPL would help even more.

Overall I think the code is just barely ok.

I don't like the fact that to run 2-3 instructions per cpu we are two
function pointers deep. It feels like we have an excess of
abstraction here on the kvm side.

Is it possible to have the individual kvm modules call
set_virt_disable_func and clear_virt_disable_func? Instead
of going through the x86_kvm_ops?

It really feels like we have an excess of abstraction here.

Eric

2008-11-05 17:35:44

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 15/16] kvm: x86: set kdump virt_disable function on initialization

Eduardo Habkost <[email protected]> writes:

> Finally implement the virt_disable function for kdump. It will call
> kvm_x86_ops->crash_hardware_disable(), that will disable virtualization
> extensions on the CPU if it is not disabled yet.
>
> Signed-off-by: Eduardo Habkost <[email protected]>
> ---
> arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
> 1 files changed, 18 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 049c6a0..9e61baf 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -40,6 +40,7 @@
> #include <asm/msr.h>
> #include <asm/desc.h>
> #include <asm/mtrr.h>
> +#include <asm/virtext.h>
>
> #define MAX_IO_MSRS 256
> #define CR0_RESERVED_BITS \
> @@ -2581,6 +2582,13 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct
> kvm_run *run, int in,
> }
> EXPORT_SYMBOL_GPL(kvm_emulate_pio_string);
>
> +/* Called at crash time, so we can disable virtualization if needed
> + */
> +static void crash_hardware_disable(void)
> +{
> + kvm_x86_ops->crash_hardware_disable(NULL);
> +}
> +
> int kvm_arch_init(void *opaque)
> {
> int r;
> @@ -2605,9 +2613,15 @@ int kvm_arch_init(void *opaque)
>
> kvm_x86_ops = ops;
>
> + r = set_virt_disable_func(crash_hardware_disable);

Can we make this say:
set_virt_disable_func(kvm_x86_ops->crash_hardware_disable);

So we can avoid going through 2 levels of function pointers?
I find that a little scary in code that might be running
at the edge of stack overflow.

> + if (r) {
> + printk(KERN_ERR "kvm: virt_disable function already set?\n");
> + goto out_clear_ops;
> + }
> +
> r = kvm_mmu_module_init();
> if (r)
> - goto out_clear_ops;
> + goto out_clear_crash;
>
> kvm_init_msr_list();
>
> @@ -2617,6 +2631,8 @@ int kvm_arch_init(void *opaque)
> PT_DIRTY_MASK, PT64_NX_MASK, 0, 0);
> return 0;
>
> +out_clear_crash:
> + clear_virt_disable_func();
> out_clear_ops:
> kvm_x86_ops = NULL;
> out:
> @@ -2625,6 +2641,7 @@ out:
>
> void kvm_arch_exit(void)
> {
> + clear_virt_disable_func();
> kvm_x86_ops = NULL;
> kvm_mmu_module_exit();
> }
> --
> 1.5.5.GIT

2008-11-05 17:52:03

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 15/16] kvm: x86: set kdump virt_disable function on initialization

On Wed, Nov 05, 2008 at 09:26:53AM -0800, Eric W. Biederman wrote:
> Eduardo Habkost <[email protected]> writes:
>
> > Finally implement the virt_disable function for kdump. It will call
> > kvm_x86_ops->crash_hardware_disable(), that will disable virtualization
> > extensions on the CPU if it is not disabled yet.
> >
> > Signed-off-by: Eduardo Habkost <[email protected]>
> > ---
> > arch/x86/kvm/x86.c | 19 ++++++++++++++++++-
> > 1 files changed, 18 insertions(+), 1 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 049c6a0..9e61baf 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -40,6 +40,7 @@
> > #include <asm/msr.h>
> > #include <asm/desc.h>
> > #include <asm/mtrr.h>
> > +#include <asm/virtext.h>
> >
> > #define MAX_IO_MSRS 256
> > #define CR0_RESERVED_BITS \
> > @@ -2581,6 +2582,13 @@ int kvm_emulate_pio_string(struct kvm_vcpu *vcpu, struct
> > kvm_run *run, int in,
> > }
> > EXPORT_SYMBOL_GPL(kvm_emulate_pio_string);
> >
> > +/* Called at crash time, so we can disable virtualization if needed
> > + */
> > +static void crash_hardware_disable(void)
> > +{
> > + kvm_x86_ops->crash_hardware_disable(NULL);
> > +}
> > +
> > int kvm_arch_init(void *opaque)
> > {
> > int r;
> > @@ -2605,9 +2613,15 @@ int kvm_arch_init(void *opaque)
> >
> > kvm_x86_ops = ops;
> >
> > + r = set_virt_disable_func(crash_hardware_disable);
>
> Can we make this say:
> set_virt_disable_func(kvm_x86_ops->crash_hardware_disable);
>
> So we can avoid going through 2 levels of function pointers?
> I find that a little scary in code that might be running
> at the edge of stack overflow.

When I've checked this (on x86_64), gcc did tail recursion optimization
and the call was just a jump to the function, so stack usage shouldn't
be a problem.

But I am inclined to agree with you about the excess of abstraction.

--
Eduardo

2008-11-05 17:53:23

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

On Wed, Nov 05, 2008 at 09:33:06AM -0800, Eric W. Biederman wrote:
> Eduardo Habkost <[email protected]> writes:
>
> > +int set_virt_disable_func(void (*fn)(void))
> > +{
> > + int r = 0;
> > +
> > + spin_lock(&virt_disable_lock);
> > + if (!virt_disable_fn)
> > + rcu_assign_pointer(virt_disable_fn, fn);
> > + else
> > + r = -EEXIST;
> > + spin_unlock(&virt_disable_lock);
> > +
> > + return r;
> > +}
> > +EXPORT_SYMBOL(set_virt_disable_func);
>
> EXPORT_SYMBOL_GPL?
>
> > +EXPORT_SYMBOL(clear_virt_disable_func);
> EXPORT_SYMBOL_GPL?
>
> We are talking a core internal api that should not even
> be exported if KVM is compiled into the kernel.
>
> I have had to tell people NO too many times by that
> wanted to shove code on the kexec on panic path that
> had no business there. I do not want to give
> the least little impression that this is an ok hook
> for the to use. The very specific name helps in
> that regard thank you for that. Having the symbol
> exported GPL would help even more.

Agreed. I will change that if nobody else objects.

>
> Overall I think the code is just barely ok.
>
> I don't like the fact that to run 2-3 instructions per cpu we are two
> function pointers deep. It feels like we have an excess of
> abstraction here on the kvm side.
>
> Is it possible to have the individual kvm modules call
> set_virt_disable_func and clear_virt_disable_func? Instead
> of going through the x86_kvm_ops?
>
> It really feels like we have an excess of abstraction here.

We could move the set_virt_disable_func() calls to vmx.c and svm.c (on
hardware_setup/hardware_unsetup). One could argue that it is sort of a
coincidence that we need the code for both vmx and svm.

Avi, what do you think?

--
Eduardo

2008-11-06 09:45:36

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 15/16] kvm: x86: set kdump virt_disable function on initialization

Eric W. Biederman wrote:
>>
>> + r = set_virt_disable_func(crash_hardware_disable);
>>
>
> Can we make this say:
> set_virt_disable_func(kvm_x86_ops->crash_hardware_disable);
>
> So we can avoid going through 2 levels of function pointers?
> I find that a little scary in code that might be running
> at the edge of stack overflow.
>

Actually, with scheduling disabled we can overflow the stack as much as
we like. It will reduce the quality of the dump, but everything ought
to work.

--
error compiling committee.c: too many arguments to function

2008-11-06 09:49:22

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

Eduardo Habkost wrote:
> We could move the set_virt_disable_func() calls to vmx.c and svm.c (on
> hardware_setup/hardware_unsetup). One could argue that it is sort of a
> coincidence that we need the code for both vmx and svm.
>

I don't share this fear of function calls, but perhaps that's due to
lack of experience with kdump.

If you want to be extra simple and safe, remove kvm from the equation.
Make the disabling code part of kdump/emergency_restart and only rely on
the convention that cr3.vmxe == vmxon.

That has the advantage of working with other virtualization systems as well.

--
error compiling committee.c: too many arguments to function

2008-11-06 09:50:29

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

Ingo Molnar wrote:
> general ack for the x86 bits, but i'm not sure whether we should be
> pushing this upstream so late in the cycle. If we do it in the next
> cycle then it's best we do it in the x86 tree, the KVM impact seems
> much smaller than the general x86 impact.
>

It certainly doesn't fall under the recent regression rule, and there's
a simple workaround (rmmod -r kvm) so I agree it's best to defer for the
next cycle.

--
error compiling committee.c: too many arguments to function

2008-11-06 09:52:40

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

Vivek Goyal wrote:
> Is there a way we can prevent any other module from using virt disable
> callback incase kvm is not loaded?
>

We can inline the kvm specific code (which will make it useful for other
hypervisors, and remove the hook).


--
error compiling committee.c: too many arguments to function

2008-11-06 10:25:35

by Eric W. Biederman

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

Avi Kivity <[email protected]> writes:

> Eduardo Habkost wrote:
>> We could move the set_virt_disable_func() calls to vmx.c and svm.c (on
>> hardware_setup/hardware_unsetup). One could argue that it is sort of a
>> coincidence that we need the code for both vmx and svm.
>>
>
> I don't share this fear of function calls, but perhaps that's due to lack of
> experience with kdump.

Mostly I'm just afraid of extra complexity.

> If you want to be extra simple and safe, remove kvm from the equation. Make the
> disabling code part of kdump/emergency_restart and only rely on the convention
> that cr3.vmxe == vmxon.
Convention?

> That has the advantage of working with other virtualization systems as well.

Sounds good.

Eric

2008-11-06 10:32:27

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

Eric W. Biederman wrote:
>
>> If you want to be extra simple and safe, remove kvm from the equation. Make the
>> disabling code part of kdump/emergency_restart and only rely on the convention
>> that cr3.vmxe == vmxon.
>>
> Convention?
>

There is a de-facto convention supported by at least vmware and kvm. If
cr4.vmxe is 1, then we are in vmx operation. If cr4.vmxe is 0, then we
are not in vmx operation. This allows us to determine whether we need
to execute vmxoff without any APIs.


--
error compiling committee.c: too many arguments to function

2008-11-06 12:29:31

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 00/14] x86: disable virt on kdump and emergency_restart

On Thu, Nov 06, 2008 at 11:49:57AM +0200, Avi Kivity wrote:
> Ingo Molnar wrote:
>> general ack for the x86 bits, but i'm not sure whether we should be
>> pushing this upstream so late in the cycle. If we do it in the next
>> cycle then it's best we do it in the x86 tree, the KVM impact seems
>> much smaller than the general x86 impact.
>>
>
> It certainly doesn't fall under the recent regression rule, and there's
> a simple workaround (rmmod -r kvm) so I agree it's best to defer for the
> next cycle.

Shouldn't reboot=kbd be the default again still during this cycle, then?

The cases where the new reboot=acpi default broke didn't necessarily
involve KVM.

--
Eduardo

2008-11-06 17:47:31

by Eduardo Habkost

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

On Thu, Nov 06, 2008 at 12:30:51PM +0200, Avi Kivity wrote:
> Eric W. Biederman wrote:
>>
>>> If you want to be extra simple and safe, remove kvm from the equation. Make the
>>> disabling code part of kdump/emergency_restart and only rely on the convention
>>> that cr3.vmxe == vmxon.
>>>
>> Convention?
>>
>
> There is a de-facto convention supported by at least vmware and kvm. If
> cr4.vmxe is 1, then we are in vmx operation. If cr4.vmxe is 0, then we
> are not in vmx operation. This allows us to determine whether we need
> to execute vmxoff without any APIs.

I am just worried about the probing needed to make sure CR4 is available
(and that the CR4.VMXE bit means what we expect it to mean), before we
try to read it and check VMXE. The same for SVM and the MSRs we need to
touch to disable SVM.

I prefer to reuse code that already exists on KVM and is working than
adding new probing code that I won't be able to test on all hardware
configurations.

--
Eduardo

2008-11-09 11:24:13

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 08/16] x86: Emergency virtualization disable function

Eduardo Habkost wrote:
> On Thu, Nov 06, 2008 at 12:30:51PM +0200, Avi Kivity wrote:
>
>> Eric W. Biederman wrote:
>>
>>>> If you want to be extra simple and safe, remove kvm from the equation. Make the
>>>> disabling code part of kdump/emergency_restart and only rely on the convention
>>>> that cr3.vmxe == vmxon.
>>>>
>>>>
>>> Convention?
>>>
>>>
>> There is a de-facto convention supported by at least vmware and kvm. If
>> cr4.vmxe is 1, then we are in vmx operation. If cr4.vmxe is 0, then we
>> are not in vmx operation. This allows us to determine whether we need
>> to execute vmxoff without any APIs.
>>
>
> I am just worried about the probing needed to make sure CR4 is available
> (and that the CR4.VMXE bit means what we expect it to mean),

That's cpu_has_kvm_support() in vmx.c. Should compile to three
instructions.

> before we
> try to read it and check VMXE. The same for SVM and the MSRs we need to
> touch to disable SVM.
>

That's has_svm() in svm.c; slightly longer but not any more dangerous.

> I prefer to reuse code that already exists on KVM and is working than
> adding new probing code that I won't be able to test on all hardware
> configurations.
>

We could move the code to a header file, and so compile it both for kvm
and for emergency_restart/kdump.

--
error compiling committee.c: too many arguments to function