Currently, kdump makes all logical processors leave VMX operation by
executing the VMXOFF instruction, so any VMCSs active on those processors may
be corrupted. Sometimes, however, we need the VMCSs to debug guest images
contained in the host vmcore. To prevent the corruption, we should VMCLEAR
the VMCSs before executing VMXOFF.

This patch set provides a way to VMCLEAR the VMCSs related to guests on all
CPUs before executing VMXOFF during kdump. This ensures that the VMCSs in the
vmcore are up to date and not corrupted.
Changelog from v2 to v3:
1. remove unnecessary conditions in function
cpu_emergency_clear_loaded_vmcss as Marcelo suggested.
Changelog from v1 to v2:
1. remove the sysctl and clear VMCSs unconditionally.
Zhang Yanfei (2):
x86/kexec: VMCLEAR vmcss on all cpus if necessary
KVM: make crash_clear_loaded_vmcss valid when loading kvm_intel
module
arch/x86/include/asm/kexec.h | 2 ++
arch/x86/kernel/crash.c | 25 +++++++++++++++++++++++++
arch/x86/kvm/vmx.c | 9 +++++++++
3 files changed, 36 insertions(+), 0 deletions(-)
This patch provides a way to VMCLEAR the VMCSs related to guests
on all CPUs before executing VMXOFF during kdump. This ensures
that the VMCSs in the vmcore are up to date and not corrupted.
Signed-off-by: Zhang Yanfei <[email protected]>
---
arch/x86/include/asm/kexec.h | 2 ++
arch/x86/kernel/crash.c | 25 +++++++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index 317ff17..fc05440 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -163,6 +163,8 @@ struct kimage_arch {
};
#endif
+extern void (*crash_clear_loaded_vmcss)(void);
+
#endif /* __ASSEMBLY__ */
#endif /* _ASM_X86_KEXEC_H */
diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 13ad899..9ed65c1 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -16,6 +16,7 @@
#include <linux/delay.h>
#include <linux/elf.h>
#include <linux/elfcore.h>
+#include <linux/module.h>
#include <asm/processor.h>
#include <asm/hardirq.h>
@@ -30,6 +31,20 @@
int in_crash_kexec;
+/*
+ * This is used to VMCLEAR vmcss loaded on all
+ * cpus. And when loading kvm_intel module, the
+ * function pointer will be made valid.
+ */
+void (*crash_clear_loaded_vmcss)(void) = NULL;
+EXPORT_SYMBOL_GPL(crash_clear_loaded_vmcss);
+
+static void cpu_emergency_clear_loaded_vmcss(void)
+{
+ if (crash_clear_loaded_vmcss)
+ crash_clear_loaded_vmcss();
+}
+
#if defined(CONFIG_SMP) && defined(CONFIG_X86_LOCAL_APIC)
static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
@@ -46,6 +61,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
#endif
crash_save_cpu(regs, cpu);
+ /*
+ * VMCLEAR vmcss loaded on all cpus if needed.
+ */
+ cpu_emergency_clear_loaded_vmcss();
+
/* Disable VMX or SVM if needed.
*
* We need to disable virtualization on all CPUs.
@@ -88,6 +108,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
kdump_nmi_shootdown_cpus();
+ /*
+ * VMCLEAR vmcss loaded on this cpu if needed.
+ */
+ cpu_emergency_clear_loaded_vmcss();
+
/* Booting kdump kernel with VMX or SVM enabled won't work,
* because (among other limitations) we can't disable paging
* with the virt flags.
--
1.7.1
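
A note on the hook used in this patch: the core kernel exports a function pointer that stays NULL until kvm_intel fills it in, and the crash path calls it only when it is set. The following is a minimal user-space sketch of that pattern, not the kernel code itself. Every name ending in _model is hypothetical, and the handler only counts calls instead of executing VMCLEAR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for crash_clear_loaded_vmcss: NULL until registered. */
static void (*crash_hook_model)(void) = NULL;

/* Counts how often the handler ran, so the effect can be observed. */
static int cleared_count_model;

/* Stand-in for vmclear_local_loaded_vmcss(); it only records the call. */
static void vmclear_handler_model(void)
{
	cleared_count_model++;
}

/* Mirrors cpu_emergency_clear_loaded_vmcss(): call the hook only if set. */
static void emergency_clear_model(void)
{
	if (crash_hook_model)
		crash_hook_model();
}

/* Mirrors the assignment in vmx_init(). */
static void module_init_model(void)
{
	assert(crash_hook_model == NULL);	/* catch double registration */
	crash_hook_model = vmclear_handler_model;
}

/* Mirrors the assignment in vmx_exit(). */
static void module_exit_model(void)
{
	crash_hook_model = NULL;
}
```

Calling emergency_clear_model() before module_init_model() or after module_exit_model() does nothing, which is the behavior the NULL check in the patch is meant to provide when kvm_intel is not loaded.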
Signed-off-by: Zhang Yanfei <[email protected]>
---
arch/x86/kvm/vmx.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 4ff0ab9..f6a16b2 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -41,6 +41,7 @@
#include <asm/i387.h>
#include <asm/xcr.h>
#include <asm/perf_event.h>
+#include <asm/kexec.h>
#include "trace.h"
@@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
if (r)
goto out3;
+#ifdef CONFIG_KEXEC
+ crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
+#endif
+
vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
@@ -7265,6 +7270,10 @@ static void __exit vmx_exit(void)
free_page((unsigned long)vmx_io_bitmap_b);
free_page((unsigned long)vmx_io_bitmap_a);
+#ifdef CONFIG_KEXEC
+ crash_clear_loaded_vmcss = NULL;
+#endif
+
kvm_exit();
}
--
1.7.1
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of zhangyanfei
> Sent: Wednesday, October 31, 2012 12:34 PM
> To: [email protected]; [email protected]; Avi Kivity; Marcelo
> Tosatti
> Cc: [email protected]; [email protected]
> Subject: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when
> loading kvm_intel module
>
> Signed-off-by: Zhang Yanfei <[email protected]>
[...]
> @@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
> if (r)
> goto out3;
>
> +#ifdef CONFIG_KEXEC
> + crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
> +#endif
> +
Assigning the pointer here does not cover the case where an NMI arrives after VMX is turned on in kvm_init but before vmclear_local_loaded_vmcss is assigned. This is rare, but it can happen.
What happens if vmclear_local_loaded_vmcss is called before kvm_init? I think there is no problem, since the list is initially empty.
> vmx_disable_intercept_for_msr(MSR_FS_BASE, false);
> vmx_disable_intercept_for_msr(MSR_GS_BASE, false);
> vmx_disable_intercept_for_msr(MSR_KERNEL_GS_BASE, true);
> @@ -7265,6 +7270,10 @@ static void __exit vmx_exit(void)
> free_page((unsigned long)vmx_io_bitmap_b);
> free_page((unsigned long)vmx_io_bitmap_a);
>
> +#ifdef CONFIG_KEXEC
> + crash_clear_loaded_vmcss = NULL;
> +#endif
> +
> kvm_exit();
> }
Also, the converse of the above applies here.
Thanks.
HATAYAMA, Daisuke
On 2012-10-31 17:01, Hatayama, Daisuke wrote:
>
>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of zhangyanfei
>> Sent: Wednesday, October 31, 2012 12:34 PM
>> To: [email protected]; [email protected]; Avi Kivity; Marcelo
>> Tosatti
>> Cc: [email protected]; [email protected]
>> Subject: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when
>> loading kvm_intel module
>>
>> Signed-off-by: Zhang Yanfei <[email protected]>
>
> [...]
>
>> @@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
>> if (r)
>> goto out3;
>>
>> +#ifdef CONFIG_KEXEC
>> + crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
>> +#endif
>> +
>
> Assignment here cannot cover the case where NMI is initiated after VMX is on in kvm_init and before vmclear_local_loaded_vmcss is assigned, though rare but can happen.
>
By "VMX is on in kvm_init", do you mean that kvm_init enables the VMX feature on the logical processor?
It does not: KVM enables the VMX feature only when a vcpu is about to be created.
I think it makes no difference whether this assignment comes before or after kvm_init, because the VMCS linked
list must be empty until vmx_init has finished.
Thanks
Zhang Yanfei
Hello Marcelo,
Do you have any comments about this version?
Thanks
Zhang
On 2012-10-31 11:30, zhangyanfei wrote:
> Currently, kdump just makes all the logical processors leave VMX operation by
> executing VMXOFF instruction, so any VMCSs active on the logical processors may
> be corrupted. But, sometimes, we need the VMCSs to debug guest images contained
> in the host vmcore. To prevent the corruption, we should VMCLEAR the VMCSs before
> executing the VMXOFF instruction.
>
> The patch set provides a way to VMCLEAR vmcss related to guests on all cpus before
> executing the VMXOFF when doing kdump. This is used to ensure the VMCSs in the
> vmcore updated and non-corrupted.
>
> Changelog from v2 to v3:
> 1. remove unnecessary conditions in function
> cpu_emergency_clear_loaded_vmcss as Marcelo suggested.
>
> Changelog from v1 to v2:
> 1. remove the sysctl and clear VMCSs unconditionally.
>
> Zhang Yanfei (2):
> x86/kexec: VMCLEAR vmcss on all cpus if necessary
> KVM: make crash_clear_loaded_vmcss valid when loading kvm_intel
> module
>
> arch/x86/include/asm/kexec.h | 2 ++
> arch/x86/kernel/crash.c | 25 +++++++++++++++++++++++++
> arch/x86/kvm/vmx.c | 9 +++++++++
> 3 files changed, 36 insertions(+), 0 deletions(-)
On Thu, Nov 01, 2012 at 01:55:04PM +0800, zhangyanfei wrote:
> On 2012-10-31 17:01, Hatayama, Daisuke wrote:
> >
> >
> >> -----Original Message-----
> >> From: [email protected]
> >> [mailto:[email protected]] On Behalf Of zhangyanfei
> >> Sent: Wednesday, October 31, 2012 12:34 PM
> >> To: [email protected]; [email protected]; Avi Kivity; Marcelo
> >> Tosatti
> >> Cc: [email protected]; [email protected]
> >> Subject: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when
> >> loading kvm_intel module
> >>
> >> Signed-off-by: Zhang Yanfei <[email protected]>
> >
> > [...]
> >
> >> @@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
> >> if (r)
> >> goto out3;
> >>
> >> +#ifdef CONFIG_KEXEC
> >> + crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
> >> +#endif
> >> +
> >
> > Assignment here cannot cover the case where NMI is initiated after VMX is on in kvm_init and before vmclear_local_loaded_vmcss is assigned, though rare but can happen.
> >
>
> By saying "VMX is on in kvm init", you mean kvm_init enables the VMX feature in the logical processor?
> No, only there is a vcpu to be created, kvm will enable the VMX feature.
>
> I think there is no difference with this assignment before or after kvm_init because the vmcs linked
> list must be empty before vmx_init is finished.
The list is not initialized before hardware_enable(), though. Should
move the assignment after that.
Also, it is possible that the loaded_vmcss_on_cpu list is being modified
_while_ crash executes say via NMI, correct? If that is the case, better
flag that the list is under manipulation so the vmclear can be skipped.
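
To illustrate the first point: a list head that has never been initialized is not the same as an empty list. The sketch below is a user-space model with hypothetical _model names. It assumes the per-CPU list head starts out zero-filled, and the initializer has the same effect as the kernel's INIT_LIST_HEAD().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal model of a circular doubly linked list head. */
struct list_head_model {
	struct list_head_model *next, *prev;
};

/* An empty list points at itself in both directions. */
static void init_list_head_model(struct list_head_model *head)
{
	head->next = head;
	head->prev = head;
}

/* A zero-filled head has NULL links and must not be walked. */
static bool list_is_walkable_model(const struct list_head_model *head)
{
	return head->next != NULL && head->prev != NULL;
}

/* Insert an entry right after the head. */
static void list_add_model(struct list_head_model *entry,
			   struct list_head_model *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

/* Count entries; only valid on an initialized head. */
static int list_count_model(const struct list_head_model *head)
{
	const struct list_head_model *pos;
	int n = 0;

	assert(list_is_walkable_model(head));
	for (pos = head->next; pos != head; pos = pos->next)
		n++;
	return n;
}
```

Walking a zero-filled head would dereference a NULL next pointer on the first step, which is why the clear function should not become reachable before the list is initialized.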
On 2012-11-14 05:22, Marcelo Tosatti wrote:
> On Thu, Nov 01, 2012 at 01:55:04PM +0800, zhangyanfei wrote:
>> On 2012-10-31 17:01, Hatayama, Daisuke wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: [email protected]
>>>> [mailto:[email protected]] On Behalf Of zhangyanfei
>>>> Sent: Wednesday, October 31, 2012 12:34 PM
>>>> To: [email protected]; [email protected]; Avi Kivity; Marcelo
>>>> Tosatti
>>>> Cc: [email protected]; [email protected]
>>>> Subject: [PATCH v3 2/2] KVM: make crash_clear_loaded_vmcss valid when
>>>> loading kvm_intel module
>>>>
>>>> Signed-off-by: Zhang Yanfei <[email protected]>
>>>
>>> [...]
>>>
>>>> @@ -7230,6 +7231,10 @@ static int __init vmx_init(void)
>>>> if (r)
>>>> goto out3;
>>>>
>>>> +#ifdef CONFIG_KEXEC
>>>> + crash_clear_loaded_vmcss = vmclear_local_loaded_vmcss;
>>>> +#endif
>>>> +
>>>
>>> Assignment here cannot cover the case where NMI is initiated after VMX is on in kvm_init and before vmclear_local_loaded_vmcss is assigned, though rare but can happen.
>>>
>>
>> By saying "VMX is on in kvm init", you mean kvm_init enables the VMX feature in the logical processor?
>> No, only there is a vcpu to be created, kvm will enable the VMX feature.
>>
>> I think there is no difference with this assignment before or after kvm_init because the vmcs linked
>> list must be empty before vmx_init is finished.
>
> The list is not initialized before hardware_enable(), though. Should
> move the assignment after that.
>
> Also, it is possible that the loaded_vmcss_on_cpu list is being modified
> _while_ crash executes say via NMI, correct? If that is the case, better
> flag that the list is under manipulation so the vmclear can be skipped.
>
Thanks for your comments.
In the new patch set, I did not move the crash_clear_loaded_vmcss assignment.
Instead, I added a new per-CPU variable, vmclear_skipped, which covers all of these cases:
1. Before the loaded_vmcss_on_cpu list is initialized, vmclear_skipped is 1, so
if the machine crashes and kdump runs, crash_clear_loaded_vmcss
will not be called.
2. While the loaded_vmcss_on_cpu list is being manipulated, vmclear_skipped is
set to 1; once the manipulation is finished, it is set back to 0.
3. After all loaded VMCSs have been VMCLEARed, vmclear_skipped is set to 1, so we
do not VMCLEAR the loaded VMCSs again in the kdump path.
Please refer to the new version of the patch set that I sent. Any suggestions would be helpful.
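
The three cases above can be modeled roughly as follows. This is a single-CPU user-space sketch with hypothetical _model names, not the code from the new patch set. The real variable is per-CPU, and real code would also have to order the flag updates against an incoming NMI, which this model does not attempt.

```c
#include <assert.h>
#include <stdbool.h>

/* Case 1: the flag starts set because the list is not yet initialized. */
static bool vmclear_skipped_model = true;

static int loaded_vmcss_model;	/* number of VMCSs on the modeled list */
static int vmclear_calls_model;	/* number of modeled VMCLEAR operations */

/* Models list initialization in hardware_enable(). */
static void hardware_enable_model(void)
{
	loaded_vmcss_model = 0;
	vmclear_skipped_model = false;
}

/* Models adding a VMCS to the list. */
static void load_vmcs_model(void)
{
	assert(!vmclear_skipped_model);	/* list must be initialized and idle */
	vmclear_skipped_model = true;	/* case 2: list under manipulation */
	loaded_vmcss_model++;
	vmclear_skipped_model = false;	/* manipulation finished */
}

/* Models the crash-path hook. */
static void crash_clear_model(void)
{
	if (vmclear_skipped_model)
		return;
	vmclear_calls_model += loaded_vmcss_model;
	loaded_vmcss_model = 0;
	vmclear_skipped_model = true;	/* case 3: nothing left to clear */
}
```

A crash before hardware_enable_model() clears nothing, a crash after two loads clears both entries, and a second call in the crash path is a no-op.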
Thanks
Zhang