LinuxLists.cc - [PATCH v8 0/4] KVM: async_pf: Fix async pf exception injection

2017-07-14 01:30:51

Subject: [PATCH v8 0/4] KVM: async_pf: Fix async pf exception injection

INFO: task gnome-terminal-:1734 blocked for more than 120 seconds.
Not tainted 4.12.0-rc4+ #8
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
gnome-terminal- D 0 1734 1015 0x00000000
Call Trace:
__schedule+0x3cd/0xb30
schedule+0x40/0x90
kvm_async_pf_task_wait+0x1cc/0x270
? __vfs_read+0x37/0x150
? prepare_to_swait+0x22/0x70
do_async_page_fault+0x77/0xb0
? do_async_page_fault+0x77/0xb0
async_page_fault+0x28/0x30

This is triggered by running both win7 and win2016 on L1 KVM simultaneously,
and then gives stress to memory on L1, I can observed this hang on L1 when
at least ~70% swap area is occupied on L0.

This is due to async pf was injected to L2 which should be injected to L1,
L2 guest starts receiving pagefault w/ bogus %cr2(apf token from the host
actually), and L1 guest starts accumulating tasks stuck in D state in
kvm_async_pf_task_wait() since missing PAGE_READY async_pfs.

This patchset fixes it according to Radim's proposal "force a nested VM exit
from nested_vmx_check_exception if the injected #PF is async_pf and handle
the #PF VM exit in L1". https://www.spinics.net/lists/kvm/msg142498.html

v7 -> v8:
* fold Radim's fix for SVM
* fold Paolo's patch subject/description cleanup when he last applied

v6 -> v7:
* drop KVM_GET/PUT_VCPU_EVENTS stuff for nested_apf

v5 -> v6:
* move vcpu_svm's apf_reason to vcpu->arch.apf.host_apf_reason
* introduce function kvm_handle_page_fault() to be used by both VMX/SVM
* introduce svm's codes posted by Paolo
* introduce nested_apf
* better set MSR_KVM_ASYNC_PF_EN

v4 -> v5:
* utilize wrmsr_safe for MSR_KVM_ASYNC_PF_EN

v3 -> v4:
* reuse pad field in kvm_vcpu_events for async_page_fault
* update kvm_vcpu_events API documentations
* change async_page_fault type in vcpu->arch.exception from bool to u8

v2 -> v3:
* add the flag to the userspace interface(KVM_GET/PUT_VCPU_EVENTS)

v1 -> v2:
* remove nested_vmx_check_exception nr parameter
* construct a simple special vm-exit information field for async pf
* introduce nested_apf_token to vcpu->arch.apf to avoid change the CR2
visible in L2 guest
* avoid pass the apf directed towards it (L1) into L2 if there is L3
at the moment

Wanpeng Li (4):
KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list
KVM: async_pf: Add L1 guest async_pf #PF vmexit handler
KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf
KVM: async_pf: Let guest support delivery of async_pf from guest mode

Documentation/virtual/kvm/msr.txt | 5 ++--
arch/x86/include/asm/kvm_emulate.h | 1 +
arch/x86/include/asm/kvm_host.h | 8 +++--
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/kvm.c | 7 ++++-
arch/x86/kvm/mmu.c | 34 ++++++++++++++++++++-
arch/x86/kvm/mmu.h | 3 ++
arch/x86/kvm/svm.c | 58 +++++++++++++-----------------------
arch/x86/kvm/vmx.c | 42 +++++++++++++++++---------
arch/x86/kvm/x86.c | 19 +++++++-----
10 files changed, 112 insertions(+), 66 deletions(-)

--
2.7.4

2017-07-14 01:31:02

by Wanpeng Li

[permalink] [raw]

Subject: [PATCH v8 3/4] KVM: async_pf: Force a nested vmexit if the injected #PF is async_pf

From: Wanpeng Li <[email protected]>

Add an nested_apf field to vcpu->arch.exception to identify an async page
fault, and constructs the expected vm-exit information fields. Force a
nested VM exit from nested_vmx_check_exception() if the injected #PF is
async page fault.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/include/asm/kvm_emulate.h | 1 +
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/svm.c | 16 ++++++++++------
arch/x86/kvm/vmx.c | 17 ++++++++++++++---
arch/x86/kvm/x86.c | 9 ++++++++-
5 files changed, 35 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 722d0e5..fde36f1 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -23,6 +23,7 @@ struct x86_exception {
u16 error_code;
bool nested_page_fault;
u64 address; /* cr2 or nested page fault gpa */
+ u8 async_page_fault;
};

/*
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f20ee6..5e9ac50 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -550,6 +550,7 @@ struct kvm_vcpu_arch {
bool reinject;
u8 nr;
u32 error_code;
+ u8 nested_apf;
} exception;

struct kvm_queued_interrupt {
@@ -651,6 +652,7 @@ struct kvm_vcpu_arch {
u32 id;
bool send_user_only;
u32 host_apf_reason;
+ unsigned long nested_apf_token;
} apf;

/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index fb23497..4d8141e 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2423,15 +2423,19 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr,
if (!is_guest_mode(&svm->vcpu))
return 0;

+ vmexit = nested_svm_intercept(svm);
+ if (vmexit != NESTED_EXIT_DONE)
+ return 0;
+
svm->vmcb->control.exit_code = SVM_EXIT_EXCP_BASE + nr;
svm->vmcb->control.exit_code_hi = 0;
svm->vmcb->control.exit_info_1 = error_code;
- svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;
-
- vmexit = nested_svm_intercept(svm);
- if (vmexit == NESTED_EXIT_DONE)
- svm->nested.exit_required = true;
+ if (svm->vcpu.arch.exception.nested_apf)
+ svm->vmcb->control.exit_info_2 = svm->vcpu.arch.apf.nested_apf_token;
+ else
+ svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2;

+ svm->nested.exit_required = true;
return vmexit;
}

@@ -2653,7 +2657,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
}
/* async page fault always cause vmexit */
else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
- svm->vcpu.arch.apf.host_apf_reason != 0)
+ svm->vcpu.arch.exception.nested_apf != 0)
vmexit = NESTED_EXIT_DONE;
break;
}
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c9c46e6..5a3bb1a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2422,13 +2422,24 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
* KVM wants to inject page-faults which it got to the guest. This function
* checks whether in a nested guest, we need to inject them to L1 or L2.
*/
-static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
+static int nested_vmx_check_exception(struct kvm_vcpu *vcpu)
{
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
+ unsigned int nr = vcpu->arch.exception.nr;

- if (!(vmcs12->exception_bitmap & (1u << nr)))
+ if (!((vmcs12->exception_bitmap & (1u << nr)) ||
+ (nr == PF_VECTOR && vcpu->arch.exception.nested_apf)))
return 0;

+ if (vcpu->arch.exception.nested_apf) {
+ vmcs_write32(VM_EXIT_INTR_ERROR_CODE, vcpu->arch.exception.error_code);
+ nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
+ PF_VECTOR | INTR_TYPE_HARD_EXCEPTION |
+ INTR_INFO_DELIVER_CODE_MASK | INTR_INFO_VALID_MASK,
+ vcpu->arch.apf.nested_apf_token);
+ return 1;
+ }
+
nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI,
vmcs_read32(VM_EXIT_INTR_INFO),
vmcs_readl(EXIT_QUALIFICATION));
@@ -2445,7 +2456,7 @@ static void vmx_queue_exception(struct kvm_vcpu *vcpu)
u32 intr_info = nr | INTR_INFO_VALID_MASK;

if (!reinject && is_guest_mode(vcpu) &&
- nested_vmx_check_exception(vcpu, nr))
+ nested_vmx_check_exception(vcpu))
return;

if (has_error_code) {
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e149c92..f3f1015 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -450,7 +450,12 @@ EXPORT_SYMBOL_GPL(kvm_complete_insn_gp);
void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault)
{
++vcpu->stat.pf_guest;
- vcpu->arch.cr2 = fault->address;
+ vcpu->arch.exception.nested_apf =
+ is_guest_mode(vcpu) && fault->async_page_fault;
+ if (vcpu->arch.exception.nested_apf)
+ vcpu->arch.apf.nested_apf_token = fault->address;
+ else
+ vcpu->arch.cr2 = fault->address;
kvm_queue_exception_e(vcpu, PF_VECTOR, fault->error_code);
}
EXPORT_SYMBOL_GPL(kvm_inject_page_fault);
@@ -8582,6 +8587,7 @@ void kvm_arch_async_page_not_present(struct kvm_vcpu *vcpu,
fault.error_code = 0;
fault.nested_page_fault = false;
fault.address = work->arch.token;
+ fault.async_page_fault = true;
kvm_inject_page_fault(vcpu, &fault);
}
}
@@ -8604,6 +8610,7 @@ void kvm_arch_async_page_present(struct kvm_vcpu *vcpu,
fault.error_code = 0;
fault.nested_page_fault = false;
fault.address = work->arch.token;
+ fault.async_page_fault = true;
kvm_inject_page_fault(vcpu, &fault);
}
vcpu->arch.apf.halted = false;
--
2.7.4

2017-07-14 01:30:57

by Wanpeng Li

[permalink] [raw]

Subject: [PATCH v8 1/4] KVM: x86: Simplify kvm_x86_ops->queue_exception parameter list

From: Wanpeng Li <[email protected]>

This patch removes all arguments except the first in
kvm_x86_ops->queue_exception since they can extract the arguments from
vcpu->arch.exception themselves.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 4 +---
arch/x86/kvm/svm.c | 8 +++++---
arch/x86/kvm/vmx.c | 8 +++++---
arch/x86/kvm/x86.c | 5 +----
4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9d8de5d..8d11ddc 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -954,9 +954,7 @@ struct kvm_x86_ops {
unsigned char *hypercall_addr);
void (*set_irq)(struct kvm_vcpu *vcpu);
void (*set_nmi)(struct kvm_vcpu *vcpu);
- void (*queue_exception)(struct kvm_vcpu *vcpu, unsigned nr,
- bool has_error_code, u32 error_code,
- bool reinject);
+ void (*queue_exception)(struct kvm_vcpu *vcpu);
void (*cancel_injection)(struct kvm_vcpu *vcpu);
int (*interrupt_allowed)(struct kvm_vcpu *vcpu);
int (*nmi_allowed)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 4c98d36..cde756a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -637,11 +637,13 @@ static void skip_emulated_instruction(struct kvm_vcpu *vcpu)
svm_set_interrupt_shadow(vcpu, 0);
}

-static void svm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
- bool has_error_code, u32 error_code,
- bool reinject)
+static void svm_queue_exception(struct kvm_vcpu *vcpu)
{
struct vcpu_svm *svm = to_svm(vcpu);
+ unsigned nr = vcpu->arch.exception.nr;
+ bool has_error_code = vcpu->arch.exception.has_error_code;
+ bool reinject = vcpu->arch.exception.reinject;
+ u32 error_code = vcpu->arch.exception.error_code;

/*
* If we are within a nested VM we'd better #VMEXIT and let the guest
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 25f2fdc..69cc228 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2435,11 +2435,13 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned nr)
return 1;
}

-static void vmx_queue_exception(struct kvm_vcpu *vcpu, unsigned nr,
- bool has_error_code, u32 error_code,
- bool reinject)
+static void vmx_queue_exception(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ unsigned nr = vcpu->arch.exception.nr;
+ bool has_error_code = vcpu->arch.exception.has_error_code;
+ bool reinject = vcpu->arch.exception.reinject;
+ u32 error_code = vcpu->arch.exception.error_code;
u32 intr_info = nr | INTR_INFO_VALID_MASK;

if (!reinject && is_guest_mode(vcpu) &&
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f41c522..e149c92 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6356,10 +6356,7 @@ static int inject_pending_event(struct kvm_vcpu *vcpu, bool req_int_win)
kvm_update_dr7(vcpu);
}

- kvm_x86_ops->queue_exception(vcpu, vcpu->arch.exception.nr,
- vcpu->arch.exception.has_error_code,
- vcpu->arch.exception.error_code,
- vcpu->arch.exception.reinject);
+ kvm_x86_ops->queue_exception(vcpu);
return 0;
}

--
2.7.4

2017-07-14 01:31:17

by Wanpeng Li

[permalink] [raw]

Subject: [PATCH v8 4/4] KVM: async_pf: Let guest support delivery of async_pf from guest mode

From: Wanpeng Li <[email protected]>

Adds another flag bit (bit 2) to MSR_KVM_ASYNC_PF_EN. If bit 2 is 1,
async page faults are delivered to L1 as #PF vmexits; if bit 2 is 0,
kvm_can_do_async_pf returns 0 if in guest mode.

This is similar to what svm.c wanted to do all along, but it is only
enabled for Linux as L1 hypervisor. Foreign hypervisors must never
receive async page faults as vmexits, because they'd probably be very
confused about that.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
Documentation/virtual/kvm/msr.txt | 5 +++--
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/uapi/asm/kvm_para.h | 1 +
arch/x86/kernel/kvm.c | 7 ++++++-
arch/x86/kvm/mmu.c | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 5 +++--
7 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt
index 0a9ea51..1ebecc1 100644
--- a/Documentation/virtual/kvm/msr.txt
+++ b/Documentation/virtual/kvm/msr.txt
@@ -166,10 +166,11 @@ MSR_KVM_SYSTEM_TIME: 0x12
MSR_KVM_ASYNC_PF_EN: 0x4b564d02
data: Bits 63-6 hold 64-byte aligned physical address of a
64 byte memory area which must be in guest RAM and must be
- zeroed. Bits 5-2 are reserved and should be zero. Bit 0 is 1
+ zeroed. Bits 5-3 are reserved and should be zero. Bit 0 is 1
when asynchronous page faults are enabled on the vcpu 0 when
disabled. Bit 1 is 1 if asynchronous page faults can be injected
- when vcpu is in cpl == 0.
+ when vcpu is in cpl == 0. Bit 2 is 1 if asynchronous page faults
+ are delivered to L1 as #PF vmexits.

First 4 byte of 64 byte memory location will be written to by
the hypervisor at the time of asynchronous page fault (APF)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5e9ac50..da3261e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -653,6 +653,7 @@ struct kvm_vcpu_arch {
bool send_user_only;
u32 host_apf_reason;
unsigned long nested_apf_token;
+ bool delivery_as_pf_vmexit;
} apf;

/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index cff0bb6..a965e5b 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -67,6 +67,7 @@ struct kvm_clock_pairing {

#define KVM_ASYNC_PF_ENABLED (1 << 0)
#define KVM_ASYNC_PF_SEND_ALWAYS (1 << 1)
+#define KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT (1 << 2)

/* Operations for KVM_HC_MMU_OP */
#define KVM_MMU_OP_WRITE_PTE 1
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 43e10d6..71c17a5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -330,7 +330,12 @@ static void kvm_guest_cpu_init(void)
#ifdef CONFIG_PREEMPT
pa |= KVM_ASYNC_PF_SEND_ALWAYS;
#endif
- wrmsrl(MSR_KVM_ASYNC_PF_EN, pa | KVM_ASYNC_PF_ENABLED);
+ pa |= KVM_ASYNC_PF_ENABLED;
+
+ /* Async page fault support for L1 hypervisor is optional */
+ if (wrmsr_safe(MSR_KVM_ASYNC_PF_EN,
+ (pa | KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT) & 0xffffffff, pa >> 32) < 0)
+ wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
__this_cpu_write(apf_reason.enabled, 1);
printk(KERN_INFO"KVM setup async PF for cpu %d\n",
smp_processor_id());
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 338cb4c..02449ef 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3749,7 +3749,7 @@ bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu)
kvm_event_needs_reinjection(vcpu)))
return false;

- if (is_guest_mode(vcpu))
+ if (!vcpu->arch.apf.delivery_as_pf_vmexit && is_guest_mode(vcpu))
return false;

return kvm_x86_ops->interrupt_allowed(vcpu);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5a3bb1a..84e62ac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -8037,7 +8037,7 @@ static bool nested_vmx_exit_handled(struct kvm_vcpu *vcpu)
if (is_nmi(intr_info))
return false;
else if (is_page_fault(intr_info))
- return enable_ept;
+ return !vmx->vcpu.arch.apf.host_apf_reason && enable_ept;
else if (is_no_device(intr_info) &&
!(vmcs12->guest_cr0 & X86_CR0_TS))
return false;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f3f1015..6753f09 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2063,8 +2063,8 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
{
gpa_t gpa = data & ~0x3f;

- /* Bits 2:5 are reserved, Should be zero */
- if (data & 0x3c)
+ /* Bits 3:5 are reserved, Should be zero */
+ if (data & 0x38)
return 1;

vcpu->arch.apf.msr_val = data;
@@ -2080,6 +2080,7 @@ static int kvm_pv_enable_async_pf(struct kvm_vcpu *vcpu, u64 data)
return 1;

vcpu->arch.apf.send_user_only = !(data & KVM_ASYNC_PF_SEND_ALWAYS);
+ vcpu->arch.apf.delivery_as_pf_vmexit = data & KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT;
kvm_async_pf_wakeup_all(vcpu);
return 0;
}
--
2.7.4

2017-07-14 01:31:48

by Wanpeng Li

[permalink] [raw]

Subject: [PATCH v8 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler

From: Wanpeng Li <[email protected]>

This patch adds the L1 guest async page fault #PF vmexit handler, such
by L1 similar to ordinary async page fault.

Cc: Paolo Bonzini <[email protected]>
Cc: Radim Krčmář <[email protected]>
Signed-off-by: Wanpeng Li <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 32 ++++++++++++++++++++++++++++++++
arch/x86/kvm/mmu.h | 3 +++
arch/x86/kvm/svm.c | 36 ++++++------------------------------
arch/x86/kvm/vmx.c | 15 ++++++++-------
5 files changed, 50 insertions(+), 37 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 8d11ddc..4f20ee6 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -650,6 +650,7 @@ struct kvm_vcpu_arch {
u64 msr_val;
u32 id;
bool send_user_only;
+ u32 host_apf_reason;
} apf;

/* OSVW MSRs (AMD only) */
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index aafd399..338cb4c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -46,6 +46,7 @@
#include <asm/io.h>
#include <asm/vmx.h>
#include <asm/kvm_page_track.h>
+#include "trace.h"

/*
* When setting this variable to true it enables Two-Dimensional-Paging
@@ -3780,6 +3781,37 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
return false;
}

+int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
+ u64 fault_address, char *insn, int insn_len,
+ bool need_unprotect)
+{
+ int r = 1;
+
+ switch (vcpu->arch.apf.host_apf_reason) {
+ default:
+ trace_kvm_page_fault(fault_address, error_code);
+
+ if (need_unprotect && kvm_event_needs_reinjection(vcpu))
+ kvm_mmu_unprotect_page_virt(vcpu, fault_address);
+ r = kvm_mmu_page_fault(vcpu, fault_address, error_code, NULL, 0);
+ break;
+ case KVM_PV_REASON_PAGE_NOT_PRESENT:
+ vcpu->arch.apf.host_apf_reason = 0;
+ local_irq_disable();
+ kvm_async_pf_task_wait(fault_address);
+ local_irq_enable();
+ break;
+ case KVM_PV_REASON_PAGE_READY:
+ vcpu->arch.apf.host_apf_reason = 0;
+ local_irq_disable();
+ kvm_async_pf_task_wake(fault_address);
+ local_irq_enable();
+ break;
+ }
+ return r;
+}
+EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
+
static bool
check_hugepage_cache_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
{
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index a276834..d7d248a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -77,6 +77,9 @@ void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu);
void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly,
bool accessed_dirty);
bool kvm_can_do_async_pf(struct kvm_vcpu *vcpu);
+int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
+ u64 fault_address, char *insn, int insn_len,
+ bool need_unprotect);

static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
{
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index cde756a..fb23497 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -194,7 +194,6 @@ struct vcpu_svm {

unsigned int3_injected;
unsigned long int3_rip;
- u32 apf_reason;

/* cached guest cpuid flags for faster access */
bool nrips_enabled : 1;
@@ -2122,34 +2121,11 @@ static void svm_set_dr7(struct kvm_vcpu *vcpu, unsigned long value)
static int pf_interception(struct vcpu_svm *svm)
{
u64 fault_address = svm->vmcb->control.exit_info_2;
- u64 error_code;
- int r = 1;
+ u64 error_code = svm->vmcb->control.exit_info_1;

- switch (svm->apf_reason) {
- default:
- error_code = svm->vmcb->control.exit_info_1;
-
- trace_kvm_page_fault(fault_address, error_code);
- if (!npt_enabled && kvm_event_needs_reinjection(&svm->vcpu))
- kvm_mmu_unprotect_page_virt(&svm->vcpu, fault_address);
- r = kvm_mmu_page_fault(&svm->vcpu, fault_address, error_code,
+ return kvm_handle_page_fault(&svm->vcpu, error_code, fault_address,
svm->vmcb->control.insn_bytes,
- svm->vmcb->control.insn_len);
- break;
- case KVM_PV_REASON_PAGE_NOT_PRESENT:
- svm->apf_reason = 0;
- local_irq_disable();
- kvm_async_pf_task_wait(fault_address);
- local_irq_enable();
- break;
- case KVM_PV_REASON_PAGE_READY:
- svm->apf_reason = 0;
- local_irq_disable();
- kvm_async_pf_task_wake(fault_address);
- local_irq_enable();
- break;
- }
- return r;
+ svm->vmcb->control.insn_len, !npt_enabled);
}

static int db_interception(struct vcpu_svm *svm)
@@ -2630,7 +2606,7 @@ static int nested_svm_exit_special(struct vcpu_svm *svm)
break;
case SVM_EXIT_EXCP_BASE + PF_VECTOR:
/* When we're shadowing, trap PFs, but not async PF */
- if (!npt_enabled && svm->apf_reason == 0)
+ if (!npt_enabled && svm->vcpu.arch.apf.host_apf_reason == 0)
return NESTED_EXIT_HOST;
break;
default:
@@ -2677,7 +2653,7 @@ static int nested_svm_intercept(struct vcpu_svm *svm)
}
/* async page fault always cause vmexit */
else if ((exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR) &&
- svm->apf_reason != 0)
+ svm->vcpu.arch.apf.host_apf_reason != 0)
vmexit = NESTED_EXIT_DONE;
break;
}
@@ -4998,7 +4974,7 @@ static void svm_vcpu_run(struct kvm_vcpu *vcpu)

/* if exit due to PF check for async PF */
if (svm->vmcb->control.exit_code == SVM_EXIT_EXCP_BASE + PF_VECTOR)
- svm->apf_reason = kvm_read_and_reset_pf_reason();
+ svm->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();

if (npt_enabled) {
vcpu->arch.regs_avail &= ~(1 << VCPU_EXREG_PDPTR);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 69cc228..c9c46e6 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5698,14 +5698,11 @@ static int handle_exception(struct kvm_vcpu *vcpu)
}

if (is_page_fault(intr_info)) {
- /* EPT won't cause page fault directly */
- BUG_ON(enable_ept);
cr2 = vmcs_readl(EXIT_QUALIFICATION);
- trace_kvm_page_fault(cr2, error_code);
-
- if (kvm_event_needs_reinjection(vcpu))
- kvm_mmu_unprotect_page_virt(vcpu, cr2);
- return kvm_mmu_page_fault(vcpu, cr2, error_code, NULL, 0);
+ /* EPT won't cause page fault directly */
+ WARN_ON_ONCE(!vcpu->arch.apf.host_apf_reason && enable_ept);
+ return kvm_handle_page_fault(vcpu, error_code, cr2, NULL, 0,
+ true);
}

ex_no = intr_info & INTR_INFO_VECTOR_MASK;
@@ -8643,6 +8640,10 @@ static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
vmx->exit_intr_info = exit_intr_info;

+ /* if exit due to PF check for async PF */
+ if (is_page_fault(exit_intr_info))
+ vmx->vcpu.arch.apf.host_apf_reason = kvm_read_and_reset_pf_reason();
+
/* Handle machine checks before interrupts are enabled */
if (basic_exit_reason == EXIT_REASON_MCE_DURING_VMENTRY ||
is_machine_check(exit_intr_info))
--
2.7.4

2017-07-14 12:39:39

by Radim Krčmář

[permalink] [raw]

Subject: Re: [PATCH v8 2/4] KVM: async_pf: Add L1 guest async_pf #PF vmexit handler

2017-07-13 18:30-0700, Wanpeng Li:
> From: Wanpeng Li <[email protected]>
>
> This patch adds the L1 guest async page fault #PF vmexit handler, such
> by L1 similar to ordinary async page fault.
>
> Cc: Paolo Bonzini <[email protected]>
> Cc: Radim Krčmář <[email protected]>
> Signed-off-by: Wanpeng Li <[email protected]>
> ---
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> @@ -3780,6 +3781,37 @@ static bool try_async_pf(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
> return false;
> }
>
> +int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
> + u64 fault_address, char *insn, int insn_len,
> + bool need_unprotect)
> +{
> + int r = 1;
> +
> + switch (vcpu->arch.apf.host_apf_reason) {
> + default:
> + trace_kvm_page_fault(fault_address, error_code);
> +
> + if (need_unprotect && kvm_event_needs_reinjection(vcpu))
> + kvm_mmu_unprotect_page_virt(vcpu, fault_address);
> + r = kvm_mmu_page_fault(vcpu, fault_address, error_code, NULL, 0);

I changed this when applying (my patch was crappy), the arguments
shouldn't be lost:

kvm_mmu_page_fault(vcpu, fault_address, error_code, insn, insn_len);

It will be in the second merge window pull request if nothing goes bad,
thanks.