2012-10-12 06:42:11

by Zhang Yanfei

Subject: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

Currently, kdump just makes all the logical processors leave VMX operation by
executing the VMXOFF instruction, so any VMCSs active on the logical processors
may be corrupted. However, we sometimes need the VMCSs to debug guest images
contained in the host vmcore. To prevent the corruption, we should VMCLEAR the
VMCSs before executing the VMXOFF instruction.

This patch set provides an alternative way to clear the VMCSs related to
guests on all CPUs when the host is doing kdump.

zhangyanfei (3):
  x86/kexec: clear vmcss on all cpus if necessary
  KVM: make crash_clear_loaded_vmcss valid when kvm_intel is loaded
  sysctl: introduce a new interface to control kdump-vmcs-clear
    behaviour

Documentation/sysctl/kernel.txt | 8 ++++++++
arch/x86/include/asm/kexec.h | 3 +++
arch/x86/kernel/crash.c | 23 +++++++++++++++++++++++
arch/x86/kvm/vmx.c | 9 +++++++++
kernel/sysctl.c | 10 ++++++++++
5 files changed, 53 insertions(+), 0 deletions(-)


2012-10-12 06:44:52

by Zhang Yanfei

Subject: [PATCH 1/3] x86/kexec: clear vmcss on all cpus if necessary

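This patch provides an alternative way to clear the VMCSs related to guests
on all CPUs when doing kdump. The VMCSs are cleared through the
crash_clear_loaded_vmcss callback, and only when clear_loaded_vmcs_enabled is
set, a callback has been registered, and the CPU is in VMX operation.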

Signed-off-by: zhangyanfei <[email protected]>
---
arch/x86/include/asm/kexec.h | 3 +++
arch/x86/kernel/crash.c | 23 +++++++++++++++++++++++
2 files changed, 26 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..0692921 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -163,6 +163,9 @@ struct kimage_arch {
};
#endif

+extern int clear_loaded_vmcs_enabled;
+extern void (*crash_clear_loaded_vmcss)(void);
+
#endif /* __ASSEMBLY__ */

#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..947550e 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -16,6 +16,7 @@
#include <linux/delay.h>
#include <linux/elf.h>
#include <linux/elfcore.h>
+#include <linux/module.h>

#include <asm/processor.h>
#include <asm/hardirq.h>
@@ -30,6 +31,24 @@

int in_crash_kexec;

+/*
+ * If clear_loaded_vmcs_enabled is set, vmcss
+ * that are loaded on all cpus will be cleared
+ * via crash_clear_loaded_vmcss.
+ */
+int clear_loaded_vmcs_enabled;
+void (*crash_clear_loaded_vmcss)(void) = NULL;
+EXPORT_SYMBOL_GPL(crash_clear_loaded_vmcss);
+
+static void cpu_emergency_clear_loaded_vmcss(void)
+{
+ if (clear_loaded_vmcs_enabled &&
+ crash_clear_loaded_vmcss &&
+ cpu_has_vmx() && cpu_vmx_enabled()) {
+ crash_clear_loaded_vmcss();
+ }
+}
+
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)

static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
@@ -46,6 +65,8 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
#endif
crash_save_cpu(regs, cpu);

+ cpu_emergency_clear_loaded_vmcss();
+
/* Disable VMX or SVM if needed.
*
* We need to disable virtualization on all CPUs.
@@ -88,6 +109,8 @@ void native_machine_crash_shutdown(struct pt_regs *regs)

kdump_nmi_shootdown_cpus();

+ cpu_emergency_clear_loaded_vmcss();
+
/* Booting kdump kernel with VMX or SVM enabled won't work,
* because (among other limitations) we can't disable paging
* with the virt flags.
--
1.7.1
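The gating in cpu_emergency_clear_loaded_vmcss() can be exercised outside the kernel with a small userspace model. This is a sketch under stated assumptions, not kernel code: model_cpu_has_vmx and model_cpu_vmx_enabled are hypothetical booleans standing in for cpu_has_vmx() and cpu_vmx_enabled(), and counting_hook() is a hypothetical test double for the callback that kvm_intel registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the sysctl-controlled flag from the patch (0 = disabled). */
int clear_loaded_vmcs_enabled;

/* Hook a hypervisor module may register; NULL while none is loaded. */
void (*crash_clear_loaded_vmcss)(void) = NULL;

/* Hypothetical stand-ins for cpu_has_vmx() and cpu_vmx_enabled(). */
bool model_cpu_has_vmx = true;
bool model_cpu_vmx_enabled = true;

/* Same condition as cpu_emergency_clear_loaded_vmcss() in the patch:
 * run the hook only if the knob is set, a hook is registered and the
 * CPU is currently in VMX operation. */
static void cpu_emergency_clear_loaded_vmcss(void)
{
	if (clear_loaded_vmcs_enabled &&
	    crash_clear_loaded_vmcss &&
	    model_cpu_has_vmx && model_cpu_vmx_enabled)
		crash_clear_loaded_vmcss();
}

/* Test double for the registered hook: counts its invocations and
 * checks that it is never reached outside VMX operation. */
static int hook_calls;

static void counting_hook(void)
{
	assert(model_cpu_vmx_enabled);
	hook_calls++;
}
```

On the real crash path this check runs in kdump_nmi_callback() on each CPU that receives the shootdown NMI, and in native_machine_crash_shutdown() on the crashing CPU, in both cases before VMX is disabled.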

2012-10-12 06:45:42

by Zhang Yanfei

Subject: [PATCH 2/3] KVM: make crash_clear_loaded_vmcss valid when kvm_intel is loaded

Signed-off-by: zhangyanfei <[email protected]>
---
arch/x86/kvm/vmx.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ff0ab9..f6a16b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -41,6 +41,7 @@
#include <asm/i387.h>
#include <asm/xcr.h>
#include <asm/perf_event.h>
+#include <asm/kexec.h>

#include "trace.h"

@@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
if (r)
goto out3;

+#ifdef CONFIG_KEXEC
+ crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
+#endif
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
@@ -7265,6 +7270,10 @@ static void __exit vmx_exit(void)
free_page((unsigned long)vmx_io_bitmap_b);
free_page((unsigned long)vmx_io_bitmap_a);

+#ifdef CONFIG_KEXEC
+ crash_clear_loaded_vmcss = NULL;
+#endif
+
kvm_exit();
}

--
1.7.1
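Patch 2/3 registers kvm_intel's existing vmclear_local_loaded_vmcss() as the hook when the module loads and resets the pointer to NULL on unload, so the built-in crash code only calls into kvm_intel while it is present. Below is a minimal userspace model of that lifecycle. model_vmx_init(), model_vmx_exit() and model_crash_path() are hypothetical stand-ins for vmx_init(), vmx_exit() and the crash path, and the walk over the per-CPU list of loaded VMCSs that the real vmclear_local_loaded_vmcss() performs is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Pointer owned by the built-in crash code (arch/x86/kernel/crash.c). */
void (*crash_clear_loaded_vmcss)(void) = NULL;

/* Stand-in for vmclear_local_loaded_vmcss(): counts calls instead of
 * executing VMCLEAR on each VMCS loaded on this CPU. */
static int vmclear_calls;

static void model_vmclear_local_loaded_vmcss(void)
{
	vmclear_calls++;
}

/* Module load: register the hook, as vmx_init() does under CONFIG_KEXEC.
 * Only one module is expected to own the hook at a time. */
static void model_vmx_init(void)
{
	assert(crash_clear_loaded_vmcss == NULL);
	crash_clear_loaded_vmcss = model_vmclear_local_loaded_vmcss;
}

/* Module unload: unregister, so the crash path cannot jump into code
 * that is no longer loaded. */
static void model_vmx_exit(void)
{
	crash_clear_loaded_vmcss = NULL;
}

/* Crash path: call the hook only when a module has registered one. */
static void model_crash_path(void)
{
	if (crash_clear_loaded_vmcss)
		crash_clear_loaded_vmcss();
}
```

The model is single-threaded, so it does not address the case of a CPU crashing while the module is being unloaded.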

2012-10-12 06:46:35

by Zhang Yanfei

Subject: [PATCH 3/3] sysctl: introduce a new interface to control kdump-vmcs-clear behaviour

This patch exports the variable clear_loaded_vmcs_enabled to userspace as
/proc/sys/kernel/clear_loaded_vmcs.

Signed-off-by: zhangyanfei <[email protected]>
---
Documentation/sysctl/kernel.txt | 8 ++++++++
kernel/sysctl.c | 10 ++++++++++
2 files changed, 18 insertions(+), 0 deletions(-)

diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index 6d78841..038148b 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -25,6 +25,7 @@ show up in /proc/sys/kernel:
- bootloader_version [ X86 only ]
- callhome [ S390 only ]
- cap_last_cap
+- clear_loaded_vmcs [ X86 only ]
- core_pattern
- core_pipe_limit
- core_uses_pid
@@ -164,6 +165,13 @@ CAP_LAST_CAP from the kernel.

==============================================================

+clear_loaded_vmcs
+
+Controls whether VMCSs should be cleared when the host is doing kdump.
+Exports clear_loaded_vmcs_enabled from the kernel.
+
+==============================================================
+
core_pattern:

core_pattern is used to specify a core dumpfile pattern name.
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4ab1187..3ab7d9c 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -63,6 +63,7 @@

#include <asm/uaccess.h>
#include <asm/processor.h>
+#include <asm/kexec.h>

#ifdef CONFIG_X86
#include <asm/nmi.h>
@@ -994,6 +995,15 @@ static struct ctl_table kern_table[] = {
.proc_handler = proc_dointvec,
},
#endif
+#ifdef CONFIG_KEXEC
+ {
+ .procname = "clear_loaded_vmcs",
+ .data = &clear_loaded_vmcs_enabled,
+ .maxlen = sizeof(int),
+ .mode = 0644,
+ .proc_handler = proc_dointvec,
+ },
+#endif
{ }
};

--
1.7.1
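From userspace, the knob added here would be set like any other integer sysctl, e.g. sysctl -w kernel.clear_loaded_vmcs=1, or by writing 1 to /proc/sys/kernel/clear_loaded_vmcs as root (the entry is mode 0644). For a program that wants to do the same, a minimal C sketch follows. write_int_sysctl() and read_int_sysctl() are hypothetical helper names, and the /proc path exists only on a kernel carrying this patch, so the helpers are best tried against an ordinary file first.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a decimal integer to a sysctl file such as
 * /proc/sys/kernel/clear_loaded_vmcs (proc_dointvec parses it back).
 * Returns 0 on success and -1 on any I/O error. */
static int write_int_sysctl(const char *path, int value)
{
	FILE *f;

	assert(path != NULL);
	f = fopen(path, "w");
	if (!f)
		return -1;
	if (fprintf(f, "%d\n", value) < 0) {
		fclose(f);
		return -1;
	}
	/* fclose() flushes the buffer, so a rejected write may only show up here. */
	return fclose(f) == 0 ? 0 : -1;
}

/* Hypothetical helper: read the current value back. */
static int read_int_sysctl(const char *path, int *value)
{
	FILE *f;
	int ok;

	assert(path != NULL && value != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	ok = fscanf(f, "%d", value) == 1;
	fclose(f);
	return ok ? 0 : -1;
}
```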

2012-10-15 15:43:53

by Avi Kivity

Subject: Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
> Currently, kdump just makes all the logical processors leave VMX operation by
> executing the VMXOFF instruction, so any VMCSs active on the logical processors
> may be corrupted. However, we sometimes need the VMCSs to debug guest images
> contained in the host vmcore. To prevent the corruption, we should VMCLEAR the
> VMCSs before executing the VMXOFF instruction.

How have you verified that VMXOFF doesn't flush cached VMCSs already?

>
> This patch set provides an alternative way to clear the VMCSs related to
> guests on all CPUs when the host is doing kdump.
>

I'm not sure the sysctl is really necessary. The only reason to turn it
off is if the corruption is so severe that the loaded vmcs list itself
causes a crash. I think it should be rare enough that we can do it
unconditionally.

--
error compiling committee.c: too many arguments to function

2012-10-17 02:29:37

by Zhang Yanfei

Subject: Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

On 2012-10-15 23:43, Avi Kivity wrote:
> On 10/12/2012 08:40 AM, Zhang Yanfei wrote:
>> Currently, kdump just makes all the logical processors leave VMX operation by
>> executing the VMXOFF instruction, so any VMCSs active on the logical processors
>> may be corrupted. However, we sometimes need the VMCSs to debug guest images
>> contained in the host vmcore. To prevent the corruption, we should VMCLEAR the
>> VMCSs before executing the VMXOFF instruction.
>
> How have you verified that VMXOFF doesn't flush cached VMCSs already?
>

I ran some tests. For example, I made a copy of every VMCS, and in the kdump
path I backed up all the loaded VMCSs into the copies before VMXOFF. After
generating the vmcore, I retrieved the VMCSs and their copies and compared
them: there were no differences.

In another test, I used VMCLEAR to clear all the loaded VMCSs before VMXOFF
and compared the VMCSs with their copies: this time there were indeed
differences between each VMCS and its copy.

I know these tests may not be very convincing; for example, I used memcpy to
back up the VMCSs, which is an ordinary memory operation. But to ensure that
the VMCSs in the vmcore are not corrupted, I think we should VMCLEAR them
before VMXOFF, just as the Intel spec says.
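The reasoning behind both tests can be illustrated with a toy model. It assumes, purely for illustration, that the processor holds newer VMCS state on chip, that VMCLEAR writes that state back to memory, and that VMXOFF simply drops it. The SDM only says such VMCSs "may be" corrupted, so this is one possible behaviour, not a description of real hardware; struct vmcs_model, model_vmclear() and model_vmxoff() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define VMCS_WORDS 4

/* Toy VMCS: the region in memory (what would end up in the vmcore)
 * plus a possibly newer copy held by the processor. */
struct vmcs_model {
	unsigned long mem[VMCS_WORDS];   /* VMCS region in RAM */
	unsigned long chip[VMCS_WORDS];  /* on-chip state, model only */
	bool active;                     /* currently loaded on the CPU */
};

/* Running the guest updates only the on-chip state. */
static void model_guest_runs(struct vmcs_model *v)
{
	assert(v->active);
	for (int i = 0; i < VMCS_WORDS; i++)
		v->chip[i]++;
}

/* VMCLEAR: write the on-chip state back to memory, then deactivate. */
static void model_vmclear(struct vmcs_model *v)
{
	if (v->active) {
		memcpy(v->mem, v->chip, sizeof(v->mem));
		v->active = false;
	}
}

/* VMXOFF, under this model's assumption: the on-chip state is dropped
 * without being written back. */
static void model_vmxoff(struct vmcs_model *v)
{
	v->active = false;
}
```

Under these assumptions the model reproduces both observations reported above: VMXOFF alone leaves the memory image unchanged, while VMCLEAR makes the newer state visible. It also shows the limitation Zhang points out: a memcpy snapshot only sees memory, so such a test can reveal how one machine behaves but cannot establish what the architecture guarantees.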

>>
>> This patch set provides an alternative way to clear the VMCSs related to
>> guests on all CPUs when the host is doing kdump.
>>
>
> I'm not sure the sysctl is really necessary. The only reason to turn it
> off is if the corruption is so severe that the loaded vmcs list itself
> causes a crash. I think it should be rare enough that we can do it
> unconditionally.
>

Do you mean dropping the sysctl and making VMCLEARing the VMCSs the default
behaviour? If so, I agree with you.

Thanks
Zhang Yanfei

2012-10-17 10:16:37

by Avi Kivity

Subject: Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
> On 2012-10-15 23:43, Avi Kivity wrote:
>> How have you verified that VMXOFF doesn't flush cached VMCSs already?
>>
>
> I ran some tests. For example, I made a copy of every VMCS, and in the kdump
> path I backed up all the loaded VMCSs into the copies before VMXOFF. After
> generating the vmcore, I retrieved the VMCSs and their copies and compared
> them: there were no differences.
>
> In another test, I used VMCLEAR to clear all the loaded VMCSs before VMXOFF
> and compared the VMCSs with their copies: this time there were indeed
> differences between each VMCS and its copy.
>
> I know these tests may not be very convincing; for example, I used memcpy to
> back up the VMCSs, which is an ordinary memory operation. But to ensure that
> the VMCSs in the vmcore are not corrupted, I think we should VMCLEAR them
> before VMXOFF, just as the Intel spec says.

Sorry, I was unclear -- I was referring to the spec. I wasn't sure
whether VMXOFF is defined to flush VMCSes or whether it just invalidates
on-chip caches so that it won't flush them out in the future, corrupting
memory. We don't want to depend on actual behaviour, as it may change
with future versions.

Copying some Intel folk, maybe they can clarify it.

>
>>
>> I'm not sure the sysctl is really necessary. The only reason to turn it
>> off is if the corruption is so severe that the loaded vmcs list itself
>> causes a crash. I think it should be rare enough that we can do it
>> unconditionally.
>>
>
> Do you mean dropping the sysctl and making VMCLEARing the VMCSs the default
> behaviour? If so, I agree with you.

Yes, that's what I meant.



2012-10-18 01:13:20

by Zhang Yanfei

Subject: Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

On 2012-10-17 18:16, Avi Kivity wrote:
> On 10/17/2012 04:28 AM, Zhang Yanfei wrote:
>> I ran some tests. For example, I made a copy of every VMCS, and in the kdump
>> path I backed up all the loaded VMCSs into the copies before VMXOFF. After
>> generating the vmcore, I retrieved the VMCSs and their copies and compared
>> them: there were no differences.
>>
>> In another test, I used VMCLEAR to clear all the loaded VMCSs before VMXOFF
>> and compared the VMCSs with their copies: this time there were indeed
>> differences between each VMCS and its copy.
>>
>> I know these tests may not be very convincing; for example, I used memcpy to
>> back up the VMCSs, which is an ordinary memory operation. But to ensure that
>> the VMCSs in the vmcore are not corrupted, I think we should VMCLEAR them
>> before VMXOFF, just as the Intel spec says.
>
> Sorry, I was unclear -- I was referring to the spec. I wasn't sure
> whether VMXOFF is defined to flush VMCSes or whether it just invalidates
> on-chip caches so that it won't flush them out in the future, corrupting
> memory. We don't want to depend on actual behaviour, as it may change
> with future versions.
>
> Copying some Intel folk, maybe they can clarify it.
>

Yes, the Intel spec only says the VMCSs "may be" corrupted. Section 24.10.1
of the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
Volume 3C: System Programming Guide, Part 3, says:

"If a logical processor leaves VMX operation, any VMCSs active on that logical
processor may be corrupted (see below). To prevent such corruption of a VMCS that
may be used either after a return to VMX operation or on another logical processor,
software should VMCLEAR that VMCS before executing the VMXOFF instruction or
removing power from the processor (e.g., as part of a transition to the S3 and S4
power states)."

Our goal is to make sure the VMCSs in the vmcore are up to date and not
corrupted. So, according to the description above, whether VMXOFF is defined
to flush VMCSs or just to invalidate on-chip caches, we should VMCLEAR the
VMCSs before executing VMXOFF.
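The rule quoted from the SDM amounts to a fixed order on each CPU: VMCLEAR every VMCS active on that CPU, then execute VMXOFF. In the patch this happens per CPU, from kdump_nmi_callback() on the other CPUs and from native_machine_crash_shutdown() on the crashing one. The toy model below, loosely based on KVM's per-CPU list of loaded VMCSs, only illustrates that ordering; struct cpu_model, model_crash_cpu() and the fixed-size arrays are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4
#define MAX_LOADED    8
#define NR_VMCS       (NR_CPUS_MODEL * MAX_LOADED)

/* Hypothetical per-CPU state: which VMCSs are active on this CPU. */
struct cpu_model {
	int loaded[MAX_LOADED];  /* ids of the VMCSs active on this CPU */
	int nr_loaded;
	bool vmx_on;             /* true between VMXON and VMXOFF */
};

static struct cpu_model cpus[NR_CPUS_MODEL];
static bool vmcs_written_back[NR_VMCS];

/* VMCLEAR is only valid in VMX operation, so check that first. */
static void model_vmclear(struct cpu_model *c, int id)
{
	assert(c->vmx_on);
	assert(id >= 0 && id < NR_VMCS);
	vmcs_written_back[id] = true;
}

/* Crash-path work for one CPU, in the order the SDM asks for:
 * VMCLEAR every VMCS active here, then leave VMX operation. */
static void model_crash_cpu(struct cpu_model *c)
{
	if (!c->vmx_on)
		return;
	for (int i = 0; i < c->nr_loaded; i++)
		model_vmclear(c, c->loaded[i]);
	c->nr_loaded = 0;
	c->vmx_on = false;  /* stands in for VMXOFF */
}
```

Doing the clear on each CPU, rather than from the crashing CPU alone, keeps each VMCLEAR on the processor where that VMCS is active, which is how the patch places the call.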

Thanks
Zhang Yanfei

2012-10-18 10:55:45

by Avi Kivity

Subject: Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

On 10/18/2012 03:12 AM, Zhang Yanfei wrote:
> On 2012-10-17 18:16, Avi Kivity wrote:
>> Sorry, I was unclear -- I was referring to the spec. I wasn't sure
>> whether VMXOFF is defined to flush VMCSes or whether it just invalidates
>> on-chip caches so that it won't flush them out in the future, corrupting
>> memory. We don't want to depend on actual behaviour, as it may change
>> with future versions.
>>
>> Copying some Intel folk, maybe they can clarify it.
>>
>
> Yes, the Intel spec only says the VMCSs "may be" corrupted. Section 24.10.1
> of the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
> Volume 3C: System Programming Guide, Part 3, says:
>
> "If a logical processor leaves VMX operation, any VMCSs active on that logical
> processor may be corrupted (see below). To prevent such corruption of a VMCS that
> may be used either after a return to VMX operation or on another logical processor,
> software should VMCLEAR that VMCS before executing the VMXOFF instruction or
> removing power from the processor (e.g., as part of a transition to the S3 and S4
> power states)."
>
> Our goal is to make sure the VMCSs in the vmcore are up to date and not
> corrupted. So, according to the description above, whether VMXOFF is defined
> to flush VMCSs or just to invalidate on-chip caches, we should VMCLEAR the
> VMCSs before executing VMXOFF.

Ok, that's clear then. So all we need is to remove the sysctl and clear
VMCSs unconditionally.




2012-10-19 04:52:21

by Zhang Yanfei

Subject: Re: [PATCH 0/3] x86: clear vmcss on all cpus when doing kdump if necessary

On 2012-10-18 18:55, Avi Kivity wrote:
> On 10/18/2012 03:12 AM, Zhang Yanfei wrote:
>> Yes, the Intel spec only says the VMCSs "may be" corrupted. Section 24.10.1
>> of the Intel® 64 and IA-32 Architectures Software Developer’s Manual,
>> Volume 3C: System Programming Guide, Part 3, says:
>>
>> "If a logical processor leaves VMX operation, any VMCSs active on that logical
>> processor may be corrupted (see below). To prevent such corruption of a VMCS that
>> may be used either after a return to VMX operation or on another logical processor,
>> software should VMCLEAR that VMCS before executing the VMXOFF instruction or
>> removing power from the processor (e.g., as part of a transition to the S3 and S4
>> power states)."
>>
>> Our goal is to make sure the VMCSs in the vmcore are up to date and not
>> corrupted. So, according to the description above, whether VMXOFF is defined
>> to flush VMCSs or just to invalidate on-chip caches, we should VMCLEAR the
>> VMCSs before executing VMXOFF.
>
> Ok, that's clear then. So all we need is to remove the sysctl and clear
> VMCSs unconditionally.
>

OK, I'll make a new version of the patch set and resend it.

Thanks
Zhang Yanfei