The ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated or hot-removed. But actually, neither of
them needs to be pinned.
[For ept identity page]
Simply don't pin it. When the page is migrated, the guest will find the new
page on the next ept violation.
[For apic access page]
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, we additionally update the
APIC_ACCESS_ADDR pointer in each vcpu's VMCS.
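Roughly, the reload flow introduced by this series looks like this (a
simplified sketch; the functions are the ones added in the patches below):

    kvm_mmu_notifier_invalidate_page()              /* page is migrated */
        -> kvm_reload_apic_access_page()            /* patch 5 */
            -> make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD)

    vcpu_enter_guest()                              /* on each vcpu */
        -> vcpu_reload_apic_access_page()           /* patch 5 */
            -> gfn_to_page_no_pin()                 /* patch 1 */
            -> kvm_x86_ops->set_apic_access_page_addr()  /* new hpa -> VMCS */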
NOTE: Patches 1~5 have been tested with the -cpu xxx,-x2apic option and work
well. Patch 6 has not been tested yet, so I am not sure whether it is correct.
Change log v2 -> v3:
1. Remove original [PATCH 3/6] since ept_identity_pagetable has been removed
in new [PATCH 3/6].
2. In [PATCH 3/6], fix the problem that the check of
kvm->arch.ept_identity_pagetable_done was not protected by kvm->slots_lock.
3. In [PATCH 3/6], drop gfn_to_page() since ept_identity_pagetable has been
removed.
4. Add new [PATCH 4/6], remove redundant variable in init_rmode_identity_map(),
and make it return 0 on success.
5. In [PATCH 5/6], drop put_page(kvm->arch.apic_access_page) from x86.c.
6. In [PATCH 5/6], update kvm->arch.apic_access_page in vcpu_reload_apic_access_page().
7. Add new [PATCH 6/6], reload apic access page in L2->L1 exit.
Change log v1 -> v2:
1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
2. In [PATCH 5/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.
3. In [PATCH 5/5], add set_apic_access_page_addr() for svm.
Tang Chen (6):
kvm: Add gfn_to_page_no_pin() to translate gfn to page without
pinning.
kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm: Make init_rmode_identity_map() return 0 on success.
kvm, mem-hotplug: Do not pin apic access page in memory.
kvm, mem-hotplug: Reload L1's apic access page if it is migrated when
L2 is running.
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kvm/svm.c | 15 +++++-
arch/x86/kvm/vmx.c | 108 +++++++++++++++++++++++++++-------------
arch/x86/kvm/x86.c | 22 ++++++--
include/linux/kvm_host.h | 3 ++
virt/kvm/kvm_main.c | 29 ++++++++++-
6 files changed, 139 insertions(+), 41 deletions(-)
--
1.8.3.1
gfn_to_page() eventually calls hva_to_pfn() to get the pfn, and pins the page
in memory by calling GUP functions. The new gfn_to_page_no_pin() drops that
pin again before returning the page.
It will be used by the following patches.
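For illustration, a minimal caller sketch (hypothetical; actual users are
added in the following patches). Since no reference is held on the returned
page, the caller must tolerate the page being migrated and re-resolve the gfn
when that happens:

    struct page *page;
    hpa_t hpa;

    page = gfn_to_page_no_pin(kvm, gfn);
    if (is_error_page(page))
        return -EFAULT;

    /*
     * No reference is held: the page may be migrated at any time, so
     * this hpa is only valid until the next mmu_notifier invalidation.
     */
    hpa = page_to_phys(page);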
Signed-off-by: Tang Chen <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 17 ++++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec4e3bd..7c58d9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -541,6 +541,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
int nr_pages);
struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..6091849 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1371,9 +1371,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)
return kvm_pfn_to_page(pfn);
}
-
EXPORT_SYMBOL_GPL(gfn_to_page);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+ struct page *page = gfn_to_page(kvm, gfn);
+
+ /*
+ * gfn_to_page() eventually calls hva_to_pfn() to get the pfn, and pins
+ * the page in memory by calling GUP functions. This function drops
+ * that pin again before returning the page.
+ */
+ if (!is_error_page(page))
+ put_page(page);
+
+ return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
void kvm_release_page_clean(struct page *page)
{
WARN_ON(is_error_page(page));
--
1.8.3.1
We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page. So use this macro instead of the open-coded
constant.
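For reference, the macro is defined in arch/x86/include/asm/apicdef.h:

    #define APIC_DEFAULT_PHYS_BASE 0xfee00000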
Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/kvm/svm.c | 3 ++-
arch/x86/kvm/vmx.c | 6 +++---
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..576b525 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);
- svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+ svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+ MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..0e1117c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3982,13 +3982,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
- kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+ kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;
- page = gfn_to_page(kvm, 0xfee00);
+ page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4460,7 +4460,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)
vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
- apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+ apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
--
1.8.3.1
The apic access page is pinned in memory, so it cannot be migrated or
hot-removed. Actually, it does not need to be pinned.
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page() will
invalidate the corresponding ept entry. At that point, this patch makes a new
vcpu request, KVM_REQ_APIC_PAGE_RELOAD, to all vcpus, forcing them to exit
the guest. Before re-entering, each vcpu updates its VMCS APIC_ACCESS_ADDR
pointer to the new apic access page address and updates
kvm->arch.apic_access_page to the new page.
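The new request follows the existing KVM_REQ_* pattern. Roughly (a sketch
mirroring the hunks below):

    /* Producer, called from kvm_mmu_notifier_invalidate_page(): */
    kvm_reload_apic_access_page(kvm);   /* -> make_all_cpus_request() */

    /* Consumer, in vcpu_enter_guest(), before re-entering the guest: */
    if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
        vcpu_reload_apic_access_page(vcpu);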
Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/vmx.c | 8 +++++++-
arch/x86/kvm/x86.c | 17 +++++++++++++++--
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 12 ++++++++++++
6 files changed, 43 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62f973e..9ce6bfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -737,6 +737,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+ void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 576b525..dc76f29 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
}
+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+ return;
+}
+
static int svm_vm_has_apicv(struct kvm *kvm)
{
return 0;
@@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+ .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 6ab4f87..c123c1d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3995,7 +3995,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;
- page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+ page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7072,6 +7072,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
}
+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+ vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
{
u16 status;
@@ -8841,6 +8846,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+ .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffbe557..7541a66 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5929,6 +5929,19 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
}
+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+ /*
+ * The apic access page could be migrated. While the page is being
+ * migrated, GUP will wait until the migration entry is replaced with
+ * the new pte entry pointing to the new page.
+ */
+ vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
+ APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+ kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+ page_to_phys(vcpu->kvm->arch.apic_access_page));
+}
+
/*
* Returns 1 to let __vcpu_run() continue the guest execution loop without
* exiting to the userspace. Otherwise, the value will be returned to the
@@ -5989,6 +6002,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+ if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+ vcpu_reload_apic_access_page(vcpu);
}
if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
@@ -7175,8 +7190,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kfree(kvm->arch.vpic);
kfree(kvm->arch.vioapic);
kvm_free_vcpus(kvm);
- if (kvm->arch.apic_access_page)
- put_page(kvm->arch.apic_access_page);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7c58d9d..f49be86 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page)
#define KVM_REQ_GLOBAL_CLOCK_UPDATE 22
#define KVM_REQ_ENABLE_IBS 23
#define KVM_REQ_DISABLE_IBS 24
+#define KVM_REQ_APIC_PAGE_RELOAD 25
#define KVM_USERSPACE_IRQ_SOURCE_ID 0
#define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
@@ -596,6 +597,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
void kvm_reload_remote_mmus(struct kvm *kvm);
void kvm_make_mclock_inprogress_request(struct kvm *kvm);
void kvm_make_scan_ioapic_request(struct kvm *kvm);
+void kvm_reload_apic_access_page(struct kvm *kvm);
long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6091849..965b702 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
}
+void kvm_reload_apic_access_page(struct kvm *kvm)
+{
+ make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
{
struct page *page;
@@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);
+ /*
+ * The physical address of the apic access page is stored in the VMCS,
+ * so it needs to be updated when the page becomes invalid.
+ */
+ if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
+ kvm_reload_apic_access_page(kvm);
+
spin_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
}
--
1.8.3.1
This patch only handles the situation where L1 and L2 share one apic access
page.
When L1 is running and the shared apic access page is migrated, mmu_notifier
will request all vcpus to exit to L0 and reload the apic access page physical
address in all the vcpus' vmcs (which is done by patch 5/6). When L1 then
enters L2, L2's vmcs will be updated in prepare_vmcs02(), called by
nested_vmx_run(). So nothing more needs to be done.
When L2 is running and the shared apic access page is migrated, mmu_notifier
will request all vcpus to exit to L0 and reload the apic access page physical
address in all L2 vmcs. This patch additionally requests an apic access page
reload on the L2->L1 vmexit.
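In short, the nested flow is roughly (a simplified sketch of what this patch
adds, assuming L1 and L2 share the apic access page):

    /* Page migrated while L2 runs: the reload request updates vmcs02. */
    vcpu_reload_apic_access_page(vcpu);
        -> kvm_x86_ops->set_apic_access_page_addr()           /* L2 vmcs */
        -> kvm_x86_ops->set_nested_apic_page_migrated(vcpu, true);

    /* Later, on the L2->L1 vmexit: */
    nested_vmx_vmexit()
        -> kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu)   /* each vcpu */
        /* the next vcpu_enter_guest() then fixes up the L1 vmcs */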
Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/vmx.c | 37 +++++++++++++++++++++++++++++++++++++
arch/x86/kvm/x86.c | 3 +++
4 files changed, 47 insertions(+)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9ce6bfd..613ee7f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -738,6 +738,7 @@ struct kvm_x86_ops {
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
+ void (*set_nested_apic_page_migrated)(struct kvm_vcpu *vcpu, bool set);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index dc76f29..87273ef 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3617,6 +3617,11 @@ static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
return;
}
+static void svm_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+ return;
+}
+
static int svm_vm_has_apicv(struct kvm *kvm)
{
return 0;
@@ -4371,6 +4376,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
.set_apic_access_page_addr = svm_set_apic_access_page_addr,
+ .set_nested_apic_page_migrated = svm_set_nested_apic_page_migrated,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c123c1d..9231afe 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -379,6 +379,16 @@ struct nested_vmx {
* we must keep them pinned while L2 runs.
*/
struct page *apic_access_page;
+ /*
+ * L1's apic access page can be migrated. When L1 and L2 share the
+ * apic access page and the page is migrated while L2 is running,
+ * we have to reload it into L1's vmcs before we enter L1.
+ *
+ * When the shared apic access page is migrated in L1 mode, nothing
+ * else needs to be done, because the apic access page is reloaded
+ * each time we enter L2 in prepare_vmcs02().
+ */
+ bool apic_access_page_migrated;
u64 msr_ia32_feature_control;
struct hrtimer preemption_timer;
@@ -7077,6 +7087,12 @@ static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
vmcs_write64(APIC_ACCESS_ADDR, hpa);
}
+static void vmx_set_nested_apic_page_migrated(struct kvm_vcpu *vcpu, bool set)
+{
+ struct vcpu_vmx *vmx = to_vmx(vcpu);
+ vmx->nested.apic_access_page_migrated = set;
+}
+
static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
{
u16 status;
@@ -8727,6 +8743,26 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
}
/*
+ * When the shared (L1 & L2) apic access page is migrated while L2 is
+ * running, mmu_notifier will force a reload of the page's hpa for the
+ * L2 vmcs. We need to reload it for L1 before entering L1.
+ */
+ if (vmx->nested.apic_access_page_migrated) {
+ /*
+ * Do not call kvm_reload_apic_access_page() because we are now
+ * in L2. We should not call make_all_cpus_request() to exit to
+ * L0; otherwise we would reload the L2 vmcs again.
+ */
+ int i;
+
+ for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+ kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD,
+ vcpu->kvm->vcpus[i]);
+
+ vmx->nested.apic_access_page_migrated = false;
+ }
+
+ /*
* Exiting from L2 to L1, we're now back to L1 which thinks it just
* finished a VMLAUNCH or VMRESUME instruction, so we need to set the
* success or failure flag accordingly.
@@ -8847,6 +8883,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
.set_apic_access_page_addr = vmx_set_apic_access_page_addr,
+ .set_nested_apic_page_migrated = vmx_set_nested_apic_page_migrated,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7541a66..0c11e12 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5940,6 +5940,9 @@ static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
page_to_phys(vcpu->kvm->arch.apic_access_page));
+
+ if (is_guest_mode(vcpu))
+ kvm_x86_ops->set_nested_apic_page_migrated(vcpu, true);
}
/*
--
1.8.3.1
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page, but
the pointer is never actually used to access the page.
In vcpu initialization, it only indicates two things:
1. whether the ept identity page has been allocated
2. whether a memory slot for the identity page has been initialized
Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether the
ept identity pagetable has been initialized, so we can remove
ept_identity_pagetable.
NOTE: In the original code, the ept identity pagetable page is pinned in
memory. As a result, it cannot be migrated or hot-removed. After this patch,
since kvm_arch->ept_identity_pagetable is removed, the ept identity pagetable
page is no longer pinned in memory, and it can be migrated or hot-removed.
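After this change, the identity pagetable is set up lazily, under
kvm->slots_lock, the first time a vcpu is created. Roughly (a simplified
sketch of the new flow in the hunks below):

    init_rmode_identity_map(kvm)
        mutex_lock(&kvm->slots_lock);
        if (kvm->arch.ept_identity_pagetable_done)
            goto out2;                      /* already initialized */
        alloc_identity_pagetable(kvm);      /* memslot only, no gfn_to_page() */
        /* ... write the identity-mapped pagetable entries ... */
        kvm->arch.ept_identity_pagetable_done = true;
        mutex_unlock(&kvm->slots_lock);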
Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/vmx.c | 50 ++++++++++++++++++++---------------------
arch/x86/kvm/x86.c | 2 --
3 files changed, 25 insertions(+), 28 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {
gpa_t wall_clock;
- struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e1117c..b8bf47d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);
static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,27 @@ out:
static int init_rmode_identity_map(struct kvm *kvm)
{
- int i, idx, r, ret;
+ int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
if (!enable_ept)
return 1;
- if (unlikely(!kvm->arch.ept_identity_pagetable)) {
- printk(KERN_ERR "EPT: identity-mapping pagetable "
- "haven't been allocated!\n");
- return 0;
+
+ /* Protect kvm->arch.ept_identity_pagetable_done. */
+ mutex_lock(&kvm->slots_lock);
+
+ if (likely(kvm->arch.ept_identity_pagetable_done)) {
+ ret = 1;
+ goto out2;
}
- if (likely(kvm->arch.ept_identity_pagetable_done))
- return 1;
- ret = 0;
+
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+ r = alloc_identity_pagetable(kvm);
+ if (r)
+ goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3953,6 +3960,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+ mutex_unlock(&kvm->slots_lock);
return ret;
}
@@ -4002,31 +4012,23 @@ out:
static int alloc_identity_pagetable(struct kvm *kvm)
{
- struct page *page;
+ /*
+ * In init_rmode_identity_map(), kvm->arch.ept_identity_pagetable_done
+ * is checked before this function is called and set to true after it
+ * returns. Access to kvm->arch.ept_identity_pagetable_done must be
+ * protected by kvm->slots_lock.
+ */
+
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;
- mutex_lock(&kvm->slots_lock);
- if (kvm->arch.ept_identity_pagetable)
- goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
kvm->arch.ept_identity_map_addr;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
- if (r)
- goto out;
- page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
- if (is_error_page(page)) {
- r = -EFAULT;
- goto out;
- }
-
- kvm->arch.ept_identity_pagetable = page;
-out:
- mutex_unlock(&kvm->slots_lock);
return r;
}
@@ -7582,8 +7584,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
- if (alloc_identity_pagetable(kvm) != 0)
- goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..ffbe557 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7177,8 +7177,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if (kvm->arch.apic_access_page)
put_page(kvm->arch.apic_access_page);
- if (kvm->arch.ept_identity_pagetable)
- put_page(kvm->arch.ept_identity_pagetable);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
}
--
1.8.3.1
In init_rmode_identity_map(), there are two variables indicating the return
value, r and ret, and the function returns 0 on error and 1 on success. It is
only called by vmx_create_vcpu(), so r is redundant.
This patch removes the redundant variable r and makes
init_rmode_identity_map() return 0 on success and -errno on failure.
Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/kvm/vmx.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index b8bf47d..6ab4f87 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3922,45 +3922,42 @@ out:
static int init_rmode_identity_map(struct kvm *kvm)
{
- int i, idx, r, ret = 0;
+ int i, idx, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;
if (!enable_ept)
- return 1;
+ return 0;
/* Protect kvm->arch.ept_identity_pagetable_done. */
mutex_lock(&kvm->slots_lock);
- if (likely(kvm->arch.ept_identity_pagetable_done)) {
- ret = 1;
+ if (likely(kvm->arch.ept_identity_pagetable_done))
goto out2;
- }
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
- r = alloc_identity_pagetable(kvm);
- if (r)
+ ret = alloc_identity_pagetable(kvm);
+ if (ret)
goto out2;
idx = srcu_read_lock(&kvm->srcu);
- r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
- if (r < 0)
+ ret = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
+ if (ret)
goto out;
/* Set up identity-mapping pagetable for EPT in real mode */
for (i = 0; i < PT32_ENT_PER_PAGE; i++) {
tmp = (i << 22) + (_PAGE_PRESENT | _PAGE_RW | _PAGE_USER |
_PAGE_ACCESSED | _PAGE_DIRTY | _PAGE_PSE);
- r = kvm_write_guest_page(kvm, identity_map_pfn,
+ ret = kvm_write_guest_page(kvm, identity_map_pfn,
&tmp, i * sizeof(tmp), sizeof(tmp));
- if (r < 0)
+ if (ret)
goto out;
}
kvm->arch.ept_identity_pagetable_done = true;
- ret = 1;
+
out:
srcu_read_unlock(&kvm->srcu, idx);
-
out2:
mutex_unlock(&kvm->slots_lock);
return ret;
@@ -7584,7 +7581,7 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
- if (!init_rmode_identity_map(kvm))
+ if (init_rmode_identity_map(kvm))
goto free_vmcs;
}
--
1.8.3.1
On 2014-07-23 21:42, Tang Chen wrote:
> This patch only handles the situation where L1 and L2 share one apic access page.
>
> When L1 is running and the shared apic access page is migrated, mmu_notifier will
> request all vcpus to exit to L0 and reload the apic access page physical address in
> all the vcpus' vmcs (which is done by patch 5/6). When L1 then enters L2, L2's vmcs
> will be updated in prepare_vmcs02(), called by nested_vmx_run(). So nothing more
> needs to be done.
>
> When L2 is running and the shared apic access page is migrated, mmu_notifier will
> request all vcpus to exit to L0 and reload the apic access page physical address in
> all L2 vmcs. This patch additionally requests an apic access page reload on the
> L2->L1 vmexit.
Shouldn't this patch come before we allow apic access page migration?
Jan
On 07/26/2014 04:44 AM, Jan Kiszka wrote:
> On 2014-07-23 21:42, Tang Chen wrote:
>> This patch only handles the situation where L1 and L2 share one apic access page.
>>
>> When L1 is running and the shared apic access page is migrated, mmu_notifier will
>> request all vcpus to exit to L0 and reload the apic access page physical address in
>> all the vcpus' vmcs (which is done by patch 5/6). When L1 then enters L2, L2's vmcs
>> will be updated in prepare_vmcs02(), called by nested_vmx_run(). So nothing more
>> needs to be done.
>>
>> When L2 is running and the shared apic access page is migrated, mmu_notifier will
>> request all vcpus to exit to L0 and reload the apic access page physical address in
>> all L2 vmcs. This patch additionally requests an apic access page reload on the
>> L2->L1 vmexit.
> Shouldn't this patch come before we allow apic access page migration?
Yes, it should come before patch 5.
Thanks.