2014-07-08 13:00:39

by Tang Chen

Subject: [PATCH v2 0/5] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

The ept identity pagetable and the apic access page in kvm are pinned in
memory. As a result, they cannot be migrated or hot-removed.

But they do not actually need to be pinned in memory.

[For the ept identity page]
Simply do not pin it. When it is migrated, the guest will find the new page
on the next ept violation.

[For the apic access page]
The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the apic access page is migrated, we additionally update the
VMCS APIC_ACCESS_ADDR pointer for each vcpu.

Change log v1 -> v2:
1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
2. In [PATCH 3/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.
3. In [PATCH 3/5], add set_apic_access_page_addr() for svm.


Tang Chen (5):
kvm: Add gfn_to_page_no_pin() to translate gfn to page without
pinning.
kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
kvm, mem-hotplug: Do not pin ept identity pagetable in memory.
kvm: Remove ept_identity_pagetable from struct kvm_arch.
kvm, mem-hotplug: Do not pin apic access page in memory.

arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 11 +++++++++++
arch/x86/kvm/svm.c | 9 ++++++++-
arch/x86/kvm/vmx.c | 40 ++++++++++++++++++++++------------------
arch/x86/kvm/x86.c | 16 ++++++++++++++--
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 29 ++++++++++++++++++++++++++++-
7 files changed, 87 insertions(+), 23 deletions(-)

--
1.8.3.1


2014-07-08 13:00:43

by Tang Chen

Subject: [PATCH v2 1/5] kvm: Add gfn_to_page_no_pin() to translate gfn to page without pinning.

gfn_to_page() eventually calls hva_to_pfn() to get the pfn, pinning the page
in memory via the GUP functions. The new gfn_to_page_no_pin() unpins the page
again before returning it.

This will be used by the following patches.

Signed-off-by: Tang Chen <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 17 ++++++++++++++++-
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ec4e3bd..7c58d9d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -541,6 +541,7 @@ int gfn_to_page_many_atomic(struct kvm *kvm, gfn_t gfn, struct page **pages,
int nr_pages);

struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn);
+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva(struct kvm *kvm, gfn_t gfn);
unsigned long gfn_to_hva_prot(struct kvm *kvm, gfn_t gfn, bool *writable);
unsigned long gfn_to_hva_memslot(struct kvm_memory_slot *slot, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..6091849 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1371,9 +1371,24 @@ struct page *gfn_to_page(struct kvm *kvm, gfn_t gfn)

return kvm_pfn_to_page(pfn);
}
-
EXPORT_SYMBOL_GPL(gfn_to_page);

+struct page *gfn_to_page_no_pin(struct kvm *kvm, gfn_t gfn)
+{
+ struct page *page = gfn_to_page(kvm, gfn);
+
+ /*
+ * gfn_to_page() will finally call hva_to_pfn() to get the pfn, and pin
+ * the page in memory by calling GUP functions. This function unpins
+ * the page.
+ */
+ if (!is_error_page(page))
+ put_page(page);
+
+ return page;
+}
+EXPORT_SYMBOL_GPL(gfn_to_page_no_pin);
+
void kvm_release_page_clean(struct page *page)
{
WARN_ON(is_error_page(page));
--
1.8.3.1

2014-07-08 13:00:48

by Tang Chen

Subject: [PATCH v2 3/5] kvm, mem-hotplug: Do not pin ept identity pagetable in memory.

The ept identity page is pinned in memory. As a result, it cannot be migrated
or hot-removed.

Actually, this page does not need to be pinned. When it is migrated,
mmu_notifier_invalidate_page() in try_to_unmap_one() will invalidate the ept
entry so that the guest can no longer access the page. On the next ept
violation, the ept violation handler will find the new page.

This patch simply stops pinning the ept identity page, since pinning it is
unnecessary.

Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/kvm/vmx.c | 3 ++-
arch/x86/kvm/x86.c | 2 --
2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0e1117c..0918635e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4018,7 +4018,8 @@ static int alloc_identity_pagetable(struct kvm *kvm)
if (r)
goto out;

- page = gfn_to_page(kvm, kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
+ page = gfn_to_page_no_pin(kvm,
+ kvm->arch.ept_identity_map_addr >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f32a025..ffbe557 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7177,8 +7177,6 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_free_vcpus(kvm);
if (kvm->arch.apic_access_page)
put_page(kvm->arch.apic_access_page);
- if (kvm->arch.ept_identity_pagetable)
- put_page(kvm->arch.ept_identity_pagetable);
kfree(rcu_dereference_check(kvm->arch.apic_map, 1));
}

--
1.8.3.1

2014-07-08 13:00:58

by Tang Chen

Subject: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

The apic access page is pinned in memory. As a result, it cannot be migrated
or hot-removed. Actually, it does not need to be pinned.

The hpa of the apic access page is stored in the VMCS APIC_ACCESS_ADDR
pointer. When the page is migrated, kvm_mmu_notifier_invalidate_page() will
invalidate the corresponding ept entry. This patch introduces a new vcpu
request named KVM_REQ_APIC_PAGE_RELOAD, makes this request to all vcpus at
that point, and forces every vcpu to exit the guest and not re-enter until it
has updated the VMCS APIC_ACCESS_ADDR pointer to the new apic access page
address and updated kvm->arch.apic_access_page to the new page.

Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 11 +++++++++++
arch/x86/kvm/svm.c | 6 ++++++
arch/x86/kvm/vmx.c | 8 +++++++-
arch/x86/kvm/x86.c | 14 ++++++++++++++
include/linux/kvm_host.h | 2 ++
virt/kvm/kvm_main.c | 12 ++++++++++++
7 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 62f973e..9ce6bfd 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -737,6 +737,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+ void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..551693d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3427,6 +3427,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
level, gfn, pfn, prefault);
spin_unlock(&vcpu->kvm->mmu_lock);

+ /*
+ * apic access page could be migrated. When the guest tries to access
+ * the apic access page, ept violation will occur, and we can use GUP
+ * to find the new page.
+ *
+ * GUP will wait until the migration entry is replaced with the new page.
+ */
+ if (gpa == APIC_DEFAULT_PHYS_BASE)
+ vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
+ APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+
return r;

out_unlock:
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 576b525..dc76f29 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
return;
}

+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+ return;
+}
+
static int svm_vm_has_apicv(struct kvm *kvm)
{
return 0;
@@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
+ .set_apic_access_page_addr = svm_set_apic_access_page_addr,
.vm_has_apicv = svm_vm_has_apicv,
.load_eoi_exitmap = svm_load_eoi_exitmap,
.hwapic_isr_update = svm_hwapic_isr_update,
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 5532ac8..f7c6313 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3992,7 +3992,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;

- page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+ page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7073,6 +7073,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
}

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+ vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
{
u16 status;
@@ -8842,6 +8847,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+ .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ffbe557..7080eda 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
kvm_apic_update_tmr(vcpu, tmr);
}

+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
+{
+ /*
+ * When the page is being migrated, GUP will wait until the migration
+ * entry is replaced with the new pte pointing to the new page.
+ */
+ struct page *page = gfn_to_page_no_pin(vcpu->kvm,
+ APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
+ kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
+ page_to_phys(page));
+}
+
/*
* Returns 1 to let __vcpu_run() continue the guest execution loop without
* exiting to the userspace. Otherwise, the value will be returned to the
@@ -5989,6 +6001,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
kvm_deliver_pmi(vcpu);
if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
vcpu_scan_ioapic(vcpu);
+ if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
+ vcpu_reload_apic_access_page(vcpu);
}

if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 7c58d9d..f49be86 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page)
#define KVM_REQ_GLOBAL_CLOCK_UPDATE 22
#define KVM_REQ_ENABLE_IBS 23
#define KVM_REQ_DISABLE_IBS 24
+#define KVM_REQ_APIC_PAGE_RELOAD 25

#define KVM_USERSPACE_IRQ_SOURCE_ID 0
#define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
@@ -596,6 +597,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
void kvm_reload_remote_mmus(struct kvm *kvm);
void kvm_make_mclock_inprogress_request(struct kvm *kvm);
void kvm_make_scan_ioapic_request(struct kvm *kvm);
+void kvm_reload_apic_access_page(struct kvm *kvm);

long kvm_arch_dev_ioctl(struct file *filp,
unsigned int ioctl, unsigned long arg);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6091849..965b702 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
}

+void kvm_reload_apic_access_page(struct kvm *kvm)
+{
+ make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
+}
+
int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
{
struct page *page;
@@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
if (need_tlb_flush)
kvm_flush_remote_tlbs(kvm);

+ /*
+ * The physical address of the apic access page is stored in the VMCS,
+ * so we need to update it when the page becomes invalid.
+ */
+ if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
+ kvm_reload_apic_access_page(kvm);
+
spin_unlock(&kvm->mmu_lock);
srcu_read_unlock(&kvm->srcu, idx);
}
--
1.8.3.1

2014-07-08 13:00:51

by Tang Chen

Subject: [PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page, but
it is never actually used to refer to the page.

During vcpu initialization, it indicates two things:
1. whether the ept page is allocated
2. whether a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether the
ept identity pagetable is initialized, so we can remove ept_identity_pagetable.

Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/vmx.c | 25 +++++++++++--------------
2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {

gpa_t wall_clock;

- struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0918635e..5532ac8 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,21 @@ out:

static int init_rmode_identity_map(struct kvm *kvm)
{
- int i, idx, r, ret;
+ int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;

if (!enable_ept)
return 1;
- if (unlikely(!kvm->arch.ept_identity_pagetable)) {
- printk(KERN_ERR "EPT: identity-mapping pagetable "
- "haven't been allocated!\n");
- return 0;
- }
if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
- ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+ mutex_lock(&kvm->slots_lock);
+ r = alloc_identity_pagetable(kvm);
+ if (r)
+ goto out;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+ mutex_unlock(&kvm->slots_lock);
return ret;
}

@@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;

- mutex_lock(&kvm->slots_lock);
- if (kvm->arch.ept_identity_pagetable)
- goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
@@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
goto out;
}

- kvm->arch.ept_identity_pagetable = page;
out:
- mutex_unlock(&kvm->slots_lock);
return r;
}

@@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
- if (alloc_identity_pagetable(kvm) != 0)
- goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
--
1.8.3.1

2014-07-08 13:02:29

by Tang Chen

Subject: [PATCH v2 2/5] kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.

We have APIC_DEFAULT_PHYS_BASE defined as 0xfee00000, which is also the
address of the apic access page, so use this macro instead of the hard-coded
value.

Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/kvm/svm.c | 3 ++-
arch/x86/kvm/vmx.c | 6 +++---
2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ec8366c..576b525 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1257,7 +1257,8 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
svm->asid_generation = 0;
init_vmcb(svm);

- svm->vcpu.arch.apic_base = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+ svm->vcpu.arch.apic_base = APIC_DEFAULT_PHYS_BASE |
+ MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&svm->vcpu))
svm->vcpu.arch.apic_base |= MSR_IA32_APICBASE_BSP;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..0e1117c 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3982,13 +3982,13 @@ static int alloc_apic_access_page(struct kvm *kvm)
goto out;
kvm_userspace_mem.slot = APIC_ACCESS_PAGE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
- kvm_userspace_mem.guest_phys_addr = 0xfee00000ULL;
+ kvm_userspace_mem.guest_phys_addr = APIC_DEFAULT_PHYS_BASE;
kvm_userspace_mem.memory_size = PAGE_SIZE;
r = __kvm_set_memory_region(kvm, &kvm_userspace_mem);
if (r)
goto out;

- page = gfn_to_page(kvm, 0xfee00);
+ page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -4460,7 +4460,7 @@ static void vmx_vcpu_reset(struct kvm_vcpu *vcpu)

vmx->vcpu.arch.regs[VCPU_REGS_RDX] = get_rdx_init_val();
kvm_set_cr8(&vmx->vcpu, 0);
- apic_base_msr.data = 0xfee00000 | MSR_IA32_APICBASE_ENABLE;
+ apic_base_msr.data = APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE;
if (kvm_vcpu_is_bsp(&vmx->vcpu))
apic_base_msr.data |= MSR_IA32_APICBASE_BSP;
apic_base_msr.host_initiated = true;
--
1.8.3.1

2014-07-09 01:19:43

by Tang Chen

Subject: Re: [PATCH v2 0/5] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

On 07/08/2014 09:01 PM, Tang Chen wrote:
> ept identity pagetable and apic access page in kvm are pinned in memory.
> As a result, they cannot be migrated/hot-removed.
>
> But actually they don't need to be pinned in memory.
>
> [For ept identity page]
> Just do not pin it. When it is migrated, guest will be able to find the
> new page in the next ept violation.
>
> [For apic access page]
> The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
> When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
> for each vcpu in addition.
>
> Change log v1 -> v2:
> 1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
> 2. In [PATCH 3/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.

s/[PATCH 3/5]/[PATCH 5/5]

> 3. In [PATCH 3/5], add set_apic_access_page_addr() for svm.

s/[PATCH 3/5]/[PATCH 5/5]

>
>
> Tang Chen (5):
> kvm: Add gfn_to_page_no_pin() to translate gfn to page without
> pinning.
> kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
> kvm, mem-hotplug: Do not pin ept identity pagetable in memory.
> kvm: Remove ept_identity_pagetable from struct kvm_arch.
> kvm, mem-hotplug: Do not pin apic access page in memory.
>
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 11 +++++++++++
> arch/x86/kvm/svm.c | 9 ++++++++-
> arch/x86/kvm/vmx.c | 40 ++++++++++++++++++++++------------------
> arch/x86/kvm/x86.c | 16 ++++++++++++++--
> include/linux/kvm_host.h | 3 +++
> virt/kvm/kvm_main.c | 29 ++++++++++++++++++++++++++++-
> 7 files changed, 87 insertions(+), 23 deletions(-)
>

2014-07-09 02:05:47

by Tang Chen

Subject: Re: [PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

On 07/08/2014 09:01 PM, Tang Chen wrote:
> kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
> it is never used to refer to the page at all.
>
> In vcpu initialization, it indicates two things:
> 1. indicates if ept page is allocated
> 2. indicates if a memory slot for identity page is initialized
>
> Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
> identity pagetable is initialized. So we can remove ept_identity_pagetable.
>
> Signed-off-by: Tang Chen<[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/vmx.c | 25 +++++++++++--------------
> 2 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4931415..62f973e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -578,7 +578,6 @@ struct kvm_arch {
>
> gpa_t wall_clock;
>
> - struct page *ept_identity_pagetable;
> bool ept_identity_pagetable_done;
> gpa_t ept_identity_map_addr;
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0918635e..5532ac8 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
> static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
> static bool vmx_mpx_supported(void);
> +static int alloc_identity_pagetable(struct kvm *kvm);
>
> static DEFINE_PER_CPU(struct vmcs *, vmxarea);
> static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -3921,21 +3922,21 @@ out:
>
> static int init_rmode_identity_map(struct kvm *kvm)
> {
> - int i, idx, r, ret;
> + int i, idx, r, ret = 0;
> pfn_t identity_map_pfn;
> u32 tmp;
>
> if (!enable_ept)
> return 1;
> - if (unlikely(!kvm->arch.ept_identity_pagetable)) {
> - printk(KERN_ERR "EPT: identity-mapping pagetable "
> - "haven't been allocated!\n");
> - return 0;
> - }
> if (likely(kvm->arch.ept_identity_pagetable_done))
> return 1;
> - ret = 0;
> identity_map_pfn = kvm->arch.ept_identity_map_addr>> PAGE_SHIFT;
> +
> + mutex_lock(&kvm->slots_lock);
> + r = alloc_identity_pagetable(kvm);
> + if (r)
> + goto out;

s/goto out/goto out2

Will resend the patch soon.

> +
> idx = srcu_read_lock(&kvm->srcu);
> r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
> if (r< 0)
> @@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
> ret = 1;
> out:
> srcu_read_unlock(&kvm->srcu, idx);
> +
> +out2:
> + mutex_unlock(&kvm->slots_lock);
> return ret;
> }
>
> @@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
> struct kvm_userspace_memory_region kvm_userspace_mem;
> int r = 0;
>
> - mutex_lock(&kvm->slots_lock);
> - if (kvm->arch.ept_identity_pagetable)
> - goto out;
> kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
> kvm_userspace_mem.flags = 0;
> kvm_userspace_mem.guest_phys_addr =
> @@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
> goto out;
> }
>
> - kvm->arch.ept_identity_pagetable = page;
> out:
> - mutex_unlock(&kvm->slots_lock);
> return r;
> }
>
> @@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> kvm->arch.ept_identity_map_addr =
> VMX_EPT_IDENTITY_PAGETABLE_ADDR;
> err = -ENOMEM;
> - if (alloc_identity_pagetable(kvm) != 0)
> - goto free_vmcs;
> if (!init_rmode_identity_map(kvm))
> goto free_vmcs;
> }

2014-07-09 02:07:10

by Tang Chen

Subject: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

kvm_arch->ept_identity_pagetable holds the ept identity pagetable page, but
it is never actually used to refer to the page.

During vcpu initialization, it indicates two things:
1. whether the ept page is allocated
2. whether a memory slot for the identity page is initialized

Actually, kvm_arch->ept_identity_pagetable_done is enough to tell whether the
ept identity pagetable is initialized, so we can remove ept_identity_pagetable.

Signed-off-by: Tang Chen <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/vmx.c | 25 +++++++++++--------------
2 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4931415..62f973e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -578,7 +578,6 @@ struct kvm_arch {

gpa_t wall_clock;

- struct page *ept_identity_pagetable;
bool ept_identity_pagetable_done;
gpa_t ept_identity_map_addr;

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 0918635e..fe2e5f4 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
static bool vmx_mpx_supported(void);
+static int alloc_identity_pagetable(struct kvm *kvm);

static DEFINE_PER_CPU(struct vmcs *, vmxarea);
static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
@@ -3921,21 +3922,21 @@ out:

static int init_rmode_identity_map(struct kvm *kvm)
{
- int i, idx, r, ret;
+ int i, idx, r, ret = 0;
pfn_t identity_map_pfn;
u32 tmp;

if (!enable_ept)
return 1;
- if (unlikely(!kvm->arch.ept_identity_pagetable)) {
- printk(KERN_ERR "EPT: identity-mapping pagetable "
- "haven't been allocated!\n");
- return 0;
- }
if (likely(kvm->arch.ept_identity_pagetable_done))
return 1;
- ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+
+ mutex_lock(&kvm->slots_lock);
+ r = alloc_identity_pagetable(kvm);
+ if (r)
+ goto out2;
+
idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
@@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
ret = 1;
out:
srcu_read_unlock(&kvm->srcu, idx);
+
+out2:
+ mutex_unlock(&kvm->slots_lock);
return ret;
}

@@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
struct kvm_userspace_memory_region kvm_userspace_mem;
int r = 0;

- mutex_lock(&kvm->slots_lock);
- if (kvm->arch.ept_identity_pagetable)
- goto out;
kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
kvm_userspace_mem.flags = 0;
kvm_userspace_mem.guest_phys_addr =
@@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
goto out;
}

- kvm->arch.ept_identity_pagetable = page;
out:
- mutex_unlock(&kvm->slots_lock);
return r;
}

@@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
kvm->arch.ept_identity_map_addr =
VMX_EPT_IDENTITY_PAGETABLE_ADDR;
err = -ENOMEM;
- if (alloc_identity_pagetable(kvm) != 0)
- goto free_vmcs;
if (!init_rmode_identity_map(kvm))
goto free_vmcs;
}
--
1.8.3.1

2014-07-11 06:23:03

by Tang Chen

Subject: Re: [PATCH v2 0/5] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

Hi Gleb, Marcelo, Nadav,

Would you please help review these patches?

Thanks. :)

On 07/08/2014 09:01 PM, Tang Chen wrote:
> ept identity pagetable and apic access page in kvm are pinned in memory.
> As a result, they cannot be migrated/hot-removed.
>
> But actually they don't need to be pinned in memory.
>
> [For ept identity page]
> Just do not pin it. When it is migrated, guest will be able to find the
> new page in the next ept violation.
>
> [For apic access page]
> The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer.
> When apic access page is migrated, we update VMCS APIC_ACCESS_ADDR pointer
> for each vcpu in addition.
>
> Change log v1 -> v2:
> 1. Add [PATCH 4/5] to remove unnecessary kvm_arch->ept_identity_pagetable.
> 2. In [PATCH 3/5], only introduce KVM_REQ_APIC_PAGE_RELOAD request.
> 3. In [PATCH 3/5], add set_apic_access_page_addr() for svm.
>
>
> Tang Chen (5):
> kvm: Add gfn_to_page_no_pin() to translate gfn to page without
> pinning.
> kvm: Use APIC_DEFAULT_PHYS_BASE macro as the apic access page address.
> kvm, mem-hotplug: Do not pin ept identity pagetable in memory.
> kvm: Remove ept_identity_pagetable from struct kvm_arch.
> kvm, mem-hotplug: Do not pin apic access page in memory.
>
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 11 +++++++++++
> arch/x86/kvm/svm.c | 9 ++++++++-
> arch/x86/kvm/vmx.c | 40 ++++++++++++++++++++++------------------
> arch/x86/kvm/x86.c | 16 ++++++++++++++--
> include/linux/kvm_host.h | 3 +++
> virt/kvm/kvm_main.c | 29 ++++++++++++++++++++++++++++-
> 7 files changed, 87 insertions(+), 23 deletions(-)
>

2014-07-12 07:44:34

by Gleb Natapov

Subject: Re: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

On Wed, Jul 09, 2014 at 10:08:03AM +0800, Tang Chen wrote:
> kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
> it is never used to refer to the page at all.
>
> In vcpu initialization, it indicates two things:
> 1. indicates if ept page is allocated
> 2. indicates if a memory slot for identity page is initialized
>
> Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
> identity pagetable is initialized. So we can remove ept_identity_pagetable.
>
> Signed-off-by: Tang Chen <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/vmx.c | 25 +++++++++++--------------
> 2 files changed, 11 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 4931415..62f973e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -578,7 +578,6 @@ struct kvm_arch {
>
> gpa_t wall_clock;
>
> - struct page *ept_identity_pagetable;
> bool ept_identity_pagetable_done;
> gpa_t ept_identity_map_addr;
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 0918635e..fe2e5f4 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
> static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
> static bool vmx_mpx_supported(void);
> +static int alloc_identity_pagetable(struct kvm *kvm);
>
> static DEFINE_PER_CPU(struct vmcs *, vmxarea);
> static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -3921,21 +3922,21 @@ out:
>
> static int init_rmode_identity_map(struct kvm *kvm)
> {
> - int i, idx, r, ret;
> + int i, idx, r, ret = 0;
> pfn_t identity_map_pfn;
> u32 tmp;
>
> if (!enable_ept)
> return 1;
> - if (unlikely(!kvm->arch.ept_identity_pagetable)) {
> - printk(KERN_ERR "EPT: identity-mapping pagetable "
> - "haven't been allocated!\n");
> - return 0;
> - }
> if (likely(kvm->arch.ept_identity_pagetable_done))
> return 1;
> - ret = 0;
> identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
> +
> + mutex_lock(&kvm->slots_lock);
Why move this out of alloc_identity_pagetable()?

> + r = alloc_identity_pagetable(kvm);
> + if (r)
> + goto out2;
> +
> idx = srcu_read_lock(&kvm->srcu);
> r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
> if (r < 0)
> @@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
> ret = 1;
> out:
> srcu_read_unlock(&kvm->srcu, idx);
> +
> +out2:
> + mutex_unlock(&kvm->slots_lock);
> return ret;
> }
>
> @@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
> struct kvm_userspace_memory_region kvm_userspace_mem;
> int r = 0;
>
> - mutex_lock(&kvm->slots_lock);
> - if (kvm->arch.ept_identity_pagetable)
> - goto out;
> kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
> kvm_userspace_mem.flags = 0;
> kvm_userspace_mem.guest_phys_addr =
> @@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
> goto out;
> }
>
> - kvm->arch.ept_identity_pagetable = page;
I think we can drop gfn_to_page() above too now. Why would we need it?

> out:
> - mutex_unlock(&kvm->slots_lock);
> return r;
> }
>
> @@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
> kvm->arch.ept_identity_map_addr =
> VMX_EPT_IDENTITY_PAGETABLE_ADDR;
> err = -ENOMEM;
> - if (alloc_identity_pagetable(kvm) != 0)
> - goto free_vmcs;
> if (!init_rmode_identity_map(kvm))
> goto free_vmcs;
> }
> --
> 1.8.3.1
>

--
Gleb.

2014-07-12 08:04:56

by Gleb Natapov

Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Tue, Jul 08, 2014 at 09:01:32PM +0800, Tang Chen wrote:
> apic access page is pinned in memory. As a result, it cannot be migrated/hot-removed.
> Actually, it is not necessary to be pinned.
>
> The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
> the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
> corresponding ept entry. This patch introduces a new vcpu request named
> KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
> and force all the vcpus exit guest, and re-enter guest till they updates the VMCS
> APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
> kvm->arch.apic_access_page to the new page.
>
By default kvm Linux guest uses x2apic, so APIC_ACCESS_ADDR mechanism
is not used since no MMIO access to APIC is ever done. Have you tested
this with "-cpu modelname,-x2apic" qemu flag?

> Signed-off-by: Tang Chen <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 +
> arch/x86/kvm/mmu.c | 11 +++++++++++
> arch/x86/kvm/svm.c | 6 ++++++
> arch/x86/kvm/vmx.c | 8 +++++++-
> arch/x86/kvm/x86.c | 14 ++++++++++++++
> include/linux/kvm_host.h | 2 ++
> virt/kvm/kvm_main.c | 12 ++++++++++++
> 7 files changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 62f973e..9ce6bfd 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -737,6 +737,7 @@ struct kvm_x86_ops {
> void (*hwapic_isr_update)(struct kvm *kvm, int isr);
> void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
> + void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
> void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9314678..551693d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3427,6 +3427,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> level, gfn, pfn, prefault);
> spin_unlock(&vcpu->kvm->mmu_lock);
>
> + /*
> + * apic access page could be migrated. When the guest tries to access
> + * the apic access page, ept violation will occur, and we can use GUP
> + * to find the new page.
> + *
> + * GUP will wait till the migrate entry be replaced with the new page.
> + */
> + if (gpa == APIC_DEFAULT_PHYS_BASE)
> + vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
> + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
Shouldn't you make KVM_REQ_APIC_PAGE_RELOAD request here?

> +
> return r;
>
> out_unlock:
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 576b525..dc76f29 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> return;
> }
>
> +static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
> +{
> + return;
> +}
> +
> static int svm_vm_has_apicv(struct kvm *kvm)
> {
> return 0;
> @@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
> .enable_irq_window = enable_irq_window,
> .update_cr8_intercept = update_cr8_intercept,
> .set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
> + .set_apic_access_page_addr = svm_set_apic_access_page_addr,
> .vm_has_apicv = svm_vm_has_apicv,
> .load_eoi_exitmap = svm_load_eoi_exitmap,
> .hwapic_isr_update = svm_hwapic_isr_update,
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 5532ac8..f7c6313 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3992,7 +3992,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
> if (r)
> goto out;
>
> - page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> + page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> if (is_error_page(page)) {
> r = -EFAULT;
> goto out;
> @@ -7073,6 +7073,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> vmx_set_msr_bitmap(vcpu);
> }
>
> +static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
> +{
> + vmcs_write64(APIC_ACCESS_ADDR, hpa);
> +}
> +
> static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
> {
> u16 status;
> @@ -8842,6 +8847,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
> .enable_irq_window = enable_irq_window,
> .update_cr8_intercept = update_cr8_intercept,
> .set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
> + .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
> .vm_has_apicv = vmx_vm_has_apicv,
> .load_eoi_exitmap = vmx_load_eoi_exitmap,
> .hwapic_irr_update = vmx_hwapic_irr_update,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index ffbe557..7080eda 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
> kvm_apic_update_tmr(vcpu, tmr);
> }
>
> +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
> +{
> + /*
> + * When the page is being migrated, GUP will wait till the migrate
> + * entry is replaced with the new pte entry pointing to the new page.
> + */
> + struct page *page = gfn_to_page_no_pin(vcpu->kvm,
> + APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
If you do not use kvm->arch.apic_access_page to get current address why not drop it entirely?

> + kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
> + page_to_phys(page));
> +}
> +
> /*
> * Returns 1 to let __vcpu_run() continue the guest execution loop without
> * exiting to the userspace. Otherwise, the value will be returned to the
> @@ -5989,6 +6001,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
> kvm_deliver_pmi(vcpu);
> if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
> vcpu_scan_ioapic(vcpu);
> + if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
> + vcpu_reload_apic_access_page(vcpu);
> }
>
> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 7c58d9d..f49be86 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page)
> #define KVM_REQ_GLOBAL_CLOCK_UPDATE 22
> #define KVM_REQ_ENABLE_IBS 23
> #define KVM_REQ_DISABLE_IBS 24
> +#define KVM_REQ_APIC_PAGE_RELOAD 25
>
> #define KVM_USERSPACE_IRQ_SOURCE_ID 0
> #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
> @@ -596,6 +597,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
> void kvm_reload_remote_mmus(struct kvm *kvm);
> void kvm_make_mclock_inprogress_request(struct kvm *kvm);
> void kvm_make_scan_ioapic_request(struct kvm *kvm);
> +void kvm_reload_apic_access_page(struct kvm *kvm);
>
> long kvm_arch_dev_ioctl(struct file *filp,
> unsigned int ioctl, unsigned long arg);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 6091849..965b702 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
> make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
> }
>
> +void kvm_reload_apic_access_page(struct kvm *kvm)
> +{
> + make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
> +}
> +
> int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
> {
> struct page *page;
> @@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
> if (need_tlb_flush)
> kvm_flush_remote_tlbs(kvm);
>
> + /*
> + * The physical address of apic access page is stored in VMCS.
> + * So need to update it when it becomes invalid.
> + */
> + if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT))
> + kvm_reload_apic_access_page(kvm);
> +
> spin_unlock(&kvm->mmu_lock);
> srcu_read_unlock(&kvm->srcu, idx);
> }

You forgot to drop put_page(kvm->arch.apic_access_page); from x86.c again.

--
Gleb.

2014-07-14 07:56:19

by Tang Chen

Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

Hi Gleb,

Thanks for the reply. Please see below.

On 07/12/2014 04:04 PM, Gleb Natapov wrote:
> On Tue, Jul 08, 2014 at 09:01:32PM +0800, Tang Chen wrote:
>> apic access page is pinned in memory. As a result, it cannot be migrated/hot-removed.
>> Actually, it is not necessary to be pinned.
>>
>> The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
>> the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
>> corresponding ept entry. This patch introduces a new vcpu request named
>> KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
>> and force all the vcpus exit guest, and re-enter guest till they updates the VMCS
>> APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
>> kvm->arch.apic_access_page to the new page.
>>
> By default kvm Linux guest uses x2apic, so APIC_ACCESS_ADDR mechanism
> is not used since no MMIO access to APIC is ever done. Have you tested
> this with "-cpu modelname,-x2apic" qemu flag?

I used the following command line to test the patches:

# /usr/libexec/qemu-kvm -m 512M -hda /home/tangchen/xxx.img -enable-kvm -smp 2

And I think the guest used the APIC_ACCESS_ADDR mechanism, because the previous
patch-set had a problem that occurred when the apic access page was accessed.
And it did happen.

I'll test this patch-set with "-cpu modelname,-x2apic" flag.

>
>> Signed-off-by: Tang Chen<[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 +
>> arch/x86/kvm/mmu.c | 11 +++++++++++
>> arch/x86/kvm/svm.c | 6 ++++++
>> arch/x86/kvm/vmx.c | 8 +++++++-
>> arch/x86/kvm/x86.c | 14 ++++++++++++++
>> include/linux/kvm_host.h | 2 ++
>> virt/kvm/kvm_main.c | 12 ++++++++++++
>> 7 files changed, 53 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 62f973e..9ce6bfd 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -737,6 +737,7 @@ struct kvm_x86_ops {
>> void (*hwapic_isr_update)(struct kvm *kvm, int isr);
>> void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
>> void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
>> + void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
>> void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
>> void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
>> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9314678..551693d 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3427,6 +3427,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
>> level, gfn, pfn, prefault);
>> spin_unlock(&vcpu->kvm->mmu_lock);
>>
>> + /*
>> + * apic access page could be migrated. When the guest tries to access
>> + * the apic access page, ept violation will occur, and we can use GUP
>> + * to find the new page.
>> + *
>> + * GUP will wait till the migrate entry be replaced with the new page.
>> + */
>> + if (gpa == APIC_DEFAULT_PHYS_BASE)
>> + vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
>> + APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
> Shouldn't you make KVM_REQ_APIC_PAGE_RELOAD request here?

I don't think we need to make the KVM_REQ_APIC_PAGE_RELOAD request here.

In kvm_mmu_notifier_invalidate_page() I made the request. And the handler
called gfn_to_page_no_pin() to get the new page, which will wait till the
migration is finished, and then updated the VMCS APIC_ACCESS_ADDR pointer.
So, when the vcpus were forced to exit guest mode, they would wait till the
VMCS APIC_ACCESS_ADDR pointer was updated.

As a result, we don't need to make the request here.


>
>> +
>> return r;
>>
>> out_unlock:
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 576b525..dc76f29 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>> return;
>> }
>>
>> +static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
>> +{
>> + return;
>> +}
>> +
>> static int svm_vm_has_apicv(struct kvm *kvm)
>> {
>> return 0;
>> @@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
>> .enable_irq_window = enable_irq_window,
>> .update_cr8_intercept = update_cr8_intercept,
>> .set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
>> + .set_apic_access_page_addr = svm_set_apic_access_page_addr,
>> .vm_has_apicv = svm_vm_has_apicv,
>> .load_eoi_exitmap = svm_load_eoi_exitmap,
>> .hwapic_isr_update = svm_hwapic_isr_update,
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 5532ac8..f7c6313 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -3992,7 +3992,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
>> if (r)
>> goto out;
>>
>> - page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
>> + page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
>> if (is_error_page(page)) {
>> r = -EFAULT;
>> goto out;
>> @@ -7073,6 +7073,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
>> vmx_set_msr_bitmap(vcpu);
>> }
>>
>> +static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
>> +{
>> + vmcs_write64(APIC_ACCESS_ADDR, hpa);
>> +}
>> +
>> static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
>> {
>> u16 status;
>> @@ -8842,6 +8847,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>> .enable_irq_window = enable_irq_window,
>> .update_cr8_intercept = update_cr8_intercept,
>> .set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
>> + .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
>> .vm_has_apicv = vmx_vm_has_apicv,
>> .load_eoi_exitmap = vmx_load_eoi_exitmap,
>> .hwapic_irr_update = vmx_hwapic_irr_update,
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index ffbe557..7080eda 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>> kvm_apic_update_tmr(vcpu, tmr);
>> }
>>
>> +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>> +{
>> + /*
>> + * When the page is being migrated, GUP will wait till the migrate
>> + * entry is replaced with the new pte entry pointing to the new page.
>> + */
>> + struct page *page = gfn_to_page_no_pin(vcpu->kvm,
>> + APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
> If you do not use kvm->arch.apic_access_page to get current address why not drop it entirely?
>

I should also update kvm->arch.apic_access_page here. It is used in other
places in kvm, so I don't think we should drop it. Will update the patch.

>> + kvm_x86_ops->set_apic_access_page_addr(vcpu->kvm,
>> + page_to_phys(page));
>> +}
>> +
>> /*
>> * Returns 1 to let __vcpu_run() continue the guest execution loop without
>> * exiting to the userspace. Otherwise, the value will be returned to the
>> @@ -5989,6 +6001,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>> kvm_deliver_pmi(vcpu);
>> if (kvm_check_request(KVM_REQ_SCAN_IOAPIC, vcpu))
>> vcpu_scan_ioapic(vcpu);
>> + if (kvm_check_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu))
>> + vcpu_reload_apic_access_page(vcpu);
>> }
>>
>> if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 7c58d9d..f49be86 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -136,6 +136,7 @@ static inline bool is_error_page(struct page *page)
>> #define KVM_REQ_GLOBAL_CLOCK_UPDATE 22
>> #define KVM_REQ_ENABLE_IBS 23
>> #define KVM_REQ_DISABLE_IBS 24
>> +#define KVM_REQ_APIC_PAGE_RELOAD 25
>>
>> #define KVM_USERSPACE_IRQ_SOURCE_ID 0
>> #define KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID 1
>> @@ -596,6 +597,7 @@ void kvm_flush_remote_tlbs(struct kvm *kvm);
>> void kvm_reload_remote_mmus(struct kvm *kvm);
>> void kvm_make_mclock_inprogress_request(struct kvm *kvm);
>> void kvm_make_scan_ioapic_request(struct kvm *kvm);
>> +void kvm_reload_apic_access_page(struct kvm *kvm);
>>
>> long kvm_arch_dev_ioctl(struct file *filp,
>> unsigned int ioctl, unsigned long arg);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 6091849..965b702 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -210,6 +210,11 @@ void kvm_make_scan_ioapic_request(struct kvm *kvm)
>> make_all_cpus_request(kvm, KVM_REQ_SCAN_IOAPIC);
>> }
>>
>> +void kvm_reload_apic_access_page(struct kvm *kvm)
>> +{
>> + make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD);
>> +}
>> +
>> int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
>> {
>> struct page *page;
>> @@ -294,6 +299,13 @@ static void kvm_mmu_notifier_invalidate_page(struct mmu_notifier *mn,
>> if (need_tlb_flush)
>> kvm_flush_remote_tlbs(kvm);
>>
>> + /*
>> + * The physical address of apic access page is stored in VMCS.
>> + * So need to update it when it becomes invalid.
>> + */
>> + if (address == gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT))
>> + kvm_reload_apic_access_page(kvm);
>> +
>> spin_unlock(&kvm->mmu_lock);
>> srcu_read_unlock(&kvm->srcu, idx);
>> }
>
> You forgot to drop put_page(kvm->arch.apic_access_page); from x86.c again.
>

Yes...will update the patch.

Thanks.

2014-07-14 09:16:12

by Tang Chen

Subject: Re: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

Hi Gleb,

Please see below.

On 07/12/2014 03:44 PM, Gleb Natapov wrote:
> On Wed, Jul 09, 2014 at 10:08:03AM +0800, Tang Chen wrote:
>> kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
>> it is never used to refer to the page at all.
>>
>> In vcpu initialization, it indicates two things:
>> 1. indicates if ept page is allocated
>> 2. indicates if a memory slot for identity page is initialized
>>
>> Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
>> identity pagetable is initialized. So we can remove ept_identity_pagetable.
>>
>> Signed-off-by: Tang Chen<[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 1 -
>> arch/x86/kvm/vmx.c | 25 +++++++++++--------------
>> 2 files changed, 11 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index 4931415..62f973e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -578,7 +578,6 @@ struct kvm_arch {
>>
>> gpa_t wall_clock;
>>
>> - struct page *ept_identity_pagetable;
>> bool ept_identity_pagetable_done;
>> gpa_t ept_identity_map_addr;
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 0918635e..fe2e5f4 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
>> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
>> static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
>> static bool vmx_mpx_supported(void);
>> +static int alloc_identity_pagetable(struct kvm *kvm);
>>
>> static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>> static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
>> @@ -3921,21 +3922,21 @@ out:
>>
>> static int init_rmode_identity_map(struct kvm *kvm)
>> {
>> - int i, idx, r, ret;
>> + int i, idx, r, ret = 0;
>> pfn_t identity_map_pfn;
>> u32 tmp;
>>
>> if (!enable_ept)
>> return 1;
>> - if (unlikely(!kvm->arch.ept_identity_pagetable)) {
>> - printk(KERN_ERR "EPT: identity-mapping pagetable "
>> - "haven't been allocated!\n");
>> - return 0;
>> - }
>> if (likely(kvm->arch.ept_identity_pagetable_done))
>> return 1;
>> - ret = 0;
>> identity_map_pfn = kvm->arch.ept_identity_map_addr>> PAGE_SHIFT;
>> +
>> + mutex_lock(&kvm->slots_lock);
> Why move this out of alloc_identity_pagetable()?
>

Referring to the original code, I think mutex_lock(&kvm->slots_lock) is used
to protect kvm->arch.ept_identity_pagetable. If two or more threads try to
modify it at the same time, the mutex ensures that the identity table is only
allocated once.

Now we have dropped kvm->arch.ept_identity_pagetable, and use
kvm->arch.ept_identity_pagetable_done to check whether the identity table is
allocated and initialized. So we should protect the memory slot operation in
alloc_identity_pagetable() and kvm->arch.ept_identity_pagetable_done with
this mutex.

Of course, I can see that the name "slots_lock" indicates that it may be
meant to protect memory slot operations only, so maybe moving the lock out
here is not suitable.

If I'm wrong, please tell me.

>> + r = alloc_identity_pagetable(kvm);
>> + if (r)
>> + goto out2;
>> +
>> idx = srcu_read_lock(&kvm->srcu);
>> r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
>> if (r< 0)
>> @@ -3953,6 +3954,9 @@ static int init_rmode_identity_map(struct kvm *kvm)
>> ret = 1;
>> out:
>> srcu_read_unlock(&kvm->srcu, idx);
>> +
>> +out2:
>> + mutex_unlock(&kvm->slots_lock);
>> return ret;
>> }
>>
>> @@ -4006,9 +4010,6 @@ static int alloc_identity_pagetable(struct kvm *kvm)
>> struct kvm_userspace_memory_region kvm_userspace_mem;
>> int r = 0;
>>
>> - mutex_lock(&kvm->slots_lock);
>> - if (kvm->arch.ept_identity_pagetable)
>> - goto out;
>> kvm_userspace_mem.slot = IDENTITY_PAGETABLE_PRIVATE_MEMSLOT;
>> kvm_userspace_mem.flags = 0;
>> kvm_userspace_mem.guest_phys_addr =
>> @@ -4025,9 +4026,7 @@ static int alloc_identity_pagetable(struct kvm *kvm)
>> goto out;
>> }
>>
>> - kvm->arch.ept_identity_pagetable = page;
> I think we can drop gfn_to_page() above too now. Why would we need it?
>

Yes, will remove it in the next version.

Thanks.

>> out:
>> - mutex_unlock(&kvm->slots_lock);
>> return r;
>> }
>>
>> @@ -7583,8 +7582,6 @@ static struct kvm_vcpu *vmx_create_vcpu(struct kvm *kvm, unsigned int id)
>> kvm->arch.ept_identity_map_addr =
>> VMX_EPT_IDENTITY_PAGETABLE_ADDR;
>> err = -ENOMEM;
>> - if (alloc_identity_pagetable(kvm) != 0)
>> - goto free_vmcs;
>> if (!init_rmode_identity_map(kvm))
>> goto free_vmcs;
>> }
>> --
>> 1.8.3.1
>>
>
> --
> Gleb.
> .
>

2014-07-14 14:27:56

by Gleb Natapov

Subject: Re: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

On Mon, Jul 14, 2014 at 05:17:04PM +0800, Tang Chen wrote:
> On 07/12/2014 03:44 PM, Gleb Natapov wrote:
> >On Wed, Jul 09, 2014 at 10:08:03AM +0800, Tang Chen wrote:
> >>kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But
> >>it is never used to refer to the page at all.
> >>
> >>In vcpu initialization, it indicates two things:
> >>1. indicates if ept page is allocated
> >>2. indicates if a memory slot for identity page is initialized
> >>
> >>Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept
> >>identity pagetable is initialized. So we can remove ept_identity_pagetable.
> >>
> >>Signed-off-by: Tang Chen<[email protected]>
> >>---
> >> arch/x86/include/asm/kvm_host.h | 1 -
> >> arch/x86/kvm/vmx.c | 25 +++++++++++--------------
> >> 2 files changed, 11 insertions(+), 15 deletions(-)
> >>
> >>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>index 4931415..62f973e 100644
> >>--- a/arch/x86/include/asm/kvm_host.h
> >>+++ b/arch/x86/include/asm/kvm_host.h
> >>@@ -578,7 +578,6 @@ struct kvm_arch {
> >>
> >> gpa_t wall_clock;
> >>
> >>- struct page *ept_identity_pagetable;
> >> bool ept_identity_pagetable_done;
> >> gpa_t ept_identity_map_addr;
> >>
> >>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>index 0918635e..fe2e5f4 100644
> >>--- a/arch/x86/kvm/vmx.c
> >>+++ b/arch/x86/kvm/vmx.c
> >>@@ -741,6 +741,7 @@ static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
> >> static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
> >> static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
> >> static bool vmx_mpx_supported(void);
> >>+static int alloc_identity_pagetable(struct kvm *kvm);
> >>
> >> static DEFINE_PER_CPU(struct vmcs *, vmxarea);
> >> static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> >>@@ -3921,21 +3922,21 @@ out:
> >>
> >> static int init_rmode_identity_map(struct kvm *kvm)
> >> {
> >>- int i, idx, r, ret;
> >>+ int i, idx, r, ret = 0;
> >> pfn_t identity_map_pfn;
> >> u32 tmp;
> >>
> >> if (!enable_ept)
> >> return 1;
> >>- if (unlikely(!kvm->arch.ept_identity_pagetable)) {
> >>- printk(KERN_ERR "EPT: identity-mapping pagetable "
> >>- "haven't been allocated!\n");
> >>- return 0;
> >>- }
> >> if (likely(kvm->arch.ept_identity_pagetable_done))
> >> return 1;
> >>- ret = 0;
> >> identity_map_pfn = kvm->arch.ept_identity_map_addr>> PAGE_SHIFT;
> >>+
> >>+ mutex_lock(&kvm->slots_lock);
> >Why move this out of alloc_identity_pagetable()?
> >
>
> Referring to the original code, I think mutex_lock(&kvm->slots_lock) is used
> to protect kvm->arch.ept_identity_pagetable. If two or more threads try to
> modify it at the same time, the mutex ensures that the identity table is
> only
> allocated once.
>
> Now, we dropped kvm->arch.ept_identity_pagetable. And use
> kvm->arch.ept_identity_pagetable_done
> to check if the identity table is allocated and initialized. So we should
> protect
> memory slot operation in alloc_identity_pagetable() and
> kvm->arch.ept_identity_pagetable_done
> with this mutex.
>
> Of course, I can see that the name "slots_lock" indicates that it may be
> used
> to protect the memory slot operation only. Maybe move it out here is not
> suitable.
>
> If I'm wrong, please tell me.
>
No, you are right that besides memory slot creation, slots_lock protects the
checking of ept_identity_pagetable here. But after your patch,
ept_identity_pagetable_done is tested outside of slots_lock, so the
allocation can happen twice, no?

--
Gleb.

2014-07-14 14:58:45

by Gleb Natapov

Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

CCing Jan to check my nested kvm findings below.

On Mon, Jul 14, 2014 at 03:57:09PM +0800, Tang Chen wrote:
> Hi Gleb,
>
> Thanks for the reply. Please see below.
>
> On 07/12/2014 04:04 PM, Gleb Natapov wrote:
> >On Tue, Jul 08, 2014 at 09:01:32PM +0800, Tang Chen wrote:
> >>apic access page is pinned in memory. As a result, it cannot be migrated/hot-removed.
> >>Actually, it is not necessary to be pinned.
> >>
> >>The hpa of apic access page is stored in VMCS APIC_ACCESS_ADDR pointer. When
> >>the page is migrated, kvm_mmu_notifier_invalidate_page() will invalidate the
> >>corresponding ept entry. This patch introduces a new vcpu request named
> >>KVM_REQ_APIC_PAGE_RELOAD, and makes this request to all the vcpus at this time,
> >>and force all the vcpus exit guest, and re-enter guest till they updates the VMCS
> >>APIC_ACCESS_ADDR pointer to the new apic access page address, and updates
> >>kvm->arch.apic_access_page to the new page.
> >>
> >By default kvm Linux guest uses x2apic, so APIC_ACCESS_ADDR mechanism
> >is not used since no MMIO access to APIC is ever done. Have you tested
> >this with "-cpu modelname,-x2apic" qemu flag?
>
> I used the following commandline to test the patches.
>
> # /usr/libexec/qemu-kvm -m 512M -hda /home/tangchen/xxx.img -enable-kvm -smp
> 2
>
That most likely uses x2apic.

> And I think the guest used APIC_ACCESS_ADDR mechanism because the previous
> patch-set has some problem which will happen when the apic page is accessed.
> And it did happen.
>
> I'll test this patch-set with "-cpu modelname,-x2apic" flag.
>
Replace "modelname" with one of the supported model names, such as nehalem, of course :)

> >
> >>Signed-off-by: Tang Chen<[email protected]>
> >>---
> >> arch/x86/include/asm/kvm_host.h | 1 +
> >> arch/x86/kvm/mmu.c | 11 +++++++++++
> >> arch/x86/kvm/svm.c | 6 ++++++
> >> arch/x86/kvm/vmx.c | 8 +++++++-
> >> arch/x86/kvm/x86.c | 14 ++++++++++++++
> >> include/linux/kvm_host.h | 2 ++
> >> virt/kvm/kvm_main.c | 12 ++++++++++++
> >> 7 files changed, 53 insertions(+), 1 deletion(-)
> >>
> >>diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>index 62f973e..9ce6bfd 100644
> >>--- a/arch/x86/include/asm/kvm_host.h
> >>+++ b/arch/x86/include/asm/kvm_host.h
> >>@@ -737,6 +737,7 @@ struct kvm_x86_ops {
> >> void (*hwapic_isr_update)(struct kvm *kvm, int isr);
> >> void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
> >> void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
> >>+ void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
> >> void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
> >> void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
> >> int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> >>diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>index 9314678..551693d 100644
> >>--- a/arch/x86/kvm/mmu.c
> >>+++ b/arch/x86/kvm/mmu.c
> >>@@ -3427,6 +3427,17 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> >> level, gfn, pfn, prefault);
> >> spin_unlock(&vcpu->kvm->mmu_lock);
> >>
> >>+ /*
> >>+ * apic access page could be migrated. When the guest tries to access
> >>+ * the apic access page, ept violation will occur, and we can use GUP
> >>+ * to find the new page.
> >>+ *
> >>+ * GUP will wait till the migrate entry be replaced with the new page.
> >>+ */
> >>+ if (gpa == APIC_DEFAULT_PHYS_BASE)
> >>+ vcpu->kvm->arch.apic_access_page = gfn_to_page_no_pin(vcpu->kvm,
> >>+ APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
> >Shouldn't you make KVM_REQ_APIC_PAGE_RELOAD request here?
>
> I don't think we need to make KVM_REQ_APIC_PAGE_RELOAD request here.
>
> In kvm_mmu_notifier_invalidate_page() I made the request. And the handler
> called gfn_to_page_no_pin() to get the new page, which will wait till the
> migration finished. And then updated the VMCS APIC_ACCESS_ADDR pointer. So,
> when the vcpus were forced to exit the guest mode, they would wait till the
> VMCS APIC_ACCESS_ADDR pointer was updated.
>
> As a result, we don't need to make the request here.
OK, then I do not see what the purpose of the code here is.

>
>
> >
> >>+
> >> return r;
> >>
> >> out_unlock:
> >>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>index 576b525..dc76f29 100644
> >>--- a/arch/x86/kvm/svm.c
> >>+++ b/arch/x86/kvm/svm.c
> >>@@ -3612,6 +3612,11 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> >> return;
> >> }
> >>
> >>+static void svm_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
> >>+{
> >>+ return;
> >>+}
> >>+
> >> static int svm_vm_has_apicv(struct kvm *kvm)
> >> {
> >> return 0;
> >>@@ -4365,6 +4370,7 @@ static struct kvm_x86_ops svm_x86_ops = {
> >> .enable_irq_window = enable_irq_window,
> >> .update_cr8_intercept = update_cr8_intercept,
> >> .set_virtual_x2apic_mode = svm_set_virtual_x2apic_mode,
> >>+ .set_apic_access_page_addr = svm_set_apic_access_page_addr,
> >> .vm_has_apicv = svm_vm_has_apicv,
> >> .load_eoi_exitmap = svm_load_eoi_exitmap,
> >> .hwapic_isr_update = svm_hwapic_isr_update,
> >>diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> >>index 5532ac8..f7c6313 100644
> >>--- a/arch/x86/kvm/vmx.c
> >>+++ b/arch/x86/kvm/vmx.c
> >>@@ -3992,7 +3992,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
> >> if (r)
> >> goto out;
> >>
> >>- page = gfn_to_page(kvm, APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
> >>+ page = gfn_to_page_no_pin(kvm, APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
> >> if (is_error_page(page)) {
> >> r = -EFAULT;
> >> goto out;
> >>@@ -7073,6 +7073,11 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
> >> vmx_set_msr_bitmap(vcpu);
> >> }
> >>
> >>+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
> >>+{
> >>+ vmcs_write64(APIC_ACCESS_ADDR, hpa);
> >>+}
> >>+
> >> static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
> >> {
> >> u16 status;
> >>@@ -8842,6 +8847,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
> >> .enable_irq_window = enable_irq_window,
> >> .update_cr8_intercept = update_cr8_intercept,
> >> .set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
> >>+ .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
> >> .vm_has_apicv = vmx_vm_has_apicv,
> >> .load_eoi_exitmap = vmx_load_eoi_exitmap,
> >> .hwapic_irr_update = vmx_hwapic_irr_update,
> >>diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>index ffbe557..7080eda 100644
> >>--- a/arch/x86/kvm/x86.c
> >>+++ b/arch/x86/kvm/x86.c
> >>@@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
> >> kvm_apic_update_tmr(vcpu, tmr);
> >> }
> >>
> >>+static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
> >>+{
> >>+ /*
> >>+ * When the page is being migrated, GUP will wait till the migrate
> >>+ * entry is replaced with the new pte entry pointing to the new page.
> >>+ */
> >>+ struct page *page = gfn_to_page_no_pin(vcpu->kvm,
> >>+ APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT);
> >If you do not use kvm->arch.apic_access_page to get current address why not drop it entirely?
> >
>
> I should also update kvm->arch.apic_access_page here. It is used in other
> places in kvm, so I don't think we should drop it. Will update the patch.
What other places? The only other place I see is in nested kvm code and you can call
gfn_to_page_no_pin() there instead of using kvm->arch.apic_access_page directly. But
as far as I see nested kvm code cannot handle change of APIC_ACCESS_ADDR phys address.
If APIC_ACCESS_ADDR changes during nested guest run, non nested vmcs will still have old
physical address. One way to fix that is to set KVM_REQ_APIC_PAGE_RELOAD during nested exit.

--
Gleb.

2014-07-15 10:38:31

by Tang Chen

[permalink] [raw]
Subject: Re: [RESEND PATCH v2 4/5] kvm: Remove ept_identity_pagetable from struct kvm_arch.

On 07/14/2014 10:27 PM, Gleb Natapov wrote:
......
>>>> if (likely(kvm->arch.ept_identity_pagetable_done))
>>>> return 1;
>>>> - ret = 0;
>>>> identity_map_pfn = kvm->arch.ept_identity_map_addr>> PAGE_SHIFT;
>>>> +
>>>> + mutex_lock(&kvm->slots_lock);
>>> Why move this out of alloc_identity_pagetable()?
>>>
>>
>> Referring to the original code, I think mutex_lock(&kvm->slots_lock) is used
>> to protect kvm->arch.ept_identity_pagetable. If two or more threads try to
>> modify it at the same time, the mutex ensures that the identity table is
>> only
>> allocated once.
>>
>> Now, we dropped kvm->arch.ept_identity_pagetable. And use
>> kvm->arch.ept_identity_pagetable_done
>> to check if the identity table is allocated and initialized. So we should
>> protect
>> memory slot operation in alloc_identity_pagetable() and
>> kvm->arch.ept_identity_pagetable_done
>> with this mutex.
>>
>> Of course, I can see that the name "slots_lock" indicates that it may be
>> used
>> to protect the memory slot operation only. Maybe move it out here is not
>> suitable.
>>
>> If I'm wrong, please tell me.
>>
> No, you are right that besides memory slot creation slots_lock protects checking
> of ept_identity_pagetable here, but after you patch ept_identity_pagetable_done is
> tested outside of slots_lock so the allocation can happen twice, no?

Oh, yes. Will fix it in the next version.

Thanks.

2014-07-15 11:53:35

by Jan Kiszka

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On 2014-07-14 16:58, Gleb Natapov wrote:
>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>> index ffbe557..7080eda 100644
>>>> --- a/arch/x86/kvm/x86.c
>>>> +++ b/arch/x86/kvm/x86.c
>>>> @@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
>>>> kvm_apic_update_tmr(vcpu, tmr);
>>>> }
>>>>
>>>> +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
>>>> +{
>>>> + /*
>>>> + * When the page is being migrated, GUP will wait till the migrate
>>>> + * entry is replaced with the new pte entry pointing to the new page.
>>>> + */
>>>> + struct page *page = gfn_to_page_no_pin(vcpu->kvm,
>>>> + APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
>>> If you do not use kvm->arch.apic_access_page to get current address why not drop it entirely?
>>>
>>
>> I should also update kvm->arch.apic_access_page here. It is used in other
>> places
>> in kvm, so I don't think we should drop it. Will update the patch.
> What other places? The only other place I see is in nested kvm code and you can call
> gfn_to_page_no_pin() there instead of using kvm->arch.apic_access_page directly. But
> as far as I see nested kvm code cannot handle change of APIC_ACCESS_ADDR phys address.
> If APIC_ACCESS_ADDR changes during nested guest run, non nested vmcs will still have old
> physical address. One way to fix that is to set KVM_REQ_APIC_PAGE_RELOAD during nested exit.

I cannot follow your concerns yet. Specifically, how should
APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?

Jan




2014-07-15 12:09:30

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:
> On 2014-07-14 16:58, Gleb Natapov wrote:
> >>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> >>>> index ffbe557..7080eda 100644
> >>>> --- a/arch/x86/kvm/x86.c
> >>>> +++ b/arch/x86/kvm/x86.c
> >>>> @@ -5929,6 +5929,18 @@ static void vcpu_scan_ioapic(struct kvm_vcpu *vcpu)
> >>>> kvm_apic_update_tmr(vcpu, tmr);
> >>>> }
> >>>>
> >>>> +static void vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu)
> >>>> +{
> >>>> + /*
> >>>> + * When the page is being migrated, GUP will wait till the migrate
> >>>> + * entry is replaced with the new pte entry pointing to the new page.
> >>>> + */
> >>>> + struct page *page = gfn_to_page_no_pin(vcpu->kvm,
> >>>> + APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
> >>> If you do not use kvm->arch.apic_access_page to get current address why not drop it entirely?
> >>>
> >>
> >> I should also update kvm->arch.apic_access_page here. It is used in other
> >> places
> >> in kvm, so I don't think we should drop it. Will update the patch.
> > What other places? The only other place I see is in nested kvm code and you can call
> > gfn_to_page_no_pin() there instead of using kvm->arch.apic_access_page directly. But
> > as far as I see nested kvm code cannot handle change of APIC_ACCESS_ADDR phys address.
> > If APIC_ACCESS_ADDR changes during nested guest run, non nested vmcs will still have old
> > physical address. One way to fix that is to set KVM_REQ_APIC_PAGE_RELOAD during nested exit.
>
> I cannot follow your concerns yet. Specifically, how should
> APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
> currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?
>
I am talking about this case:
	if (cpu_has_secondary_exec_ctrls()) {
	} else {
		exec_control |=
			SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
		vmcs_write64(APIC_ACCESS_ADDR,
			     page_to_phys(vcpu->kvm->arch.apic_access_page));
	}
We do not pin here.

--
Gleb.

2014-07-15 12:10:46

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On 07/15/2014 07:52 PM, Jan Kiszka wrote:
> On 2014-07-14 16:58, Gleb Natapov wrote:
......
>>>>> + struct page *page = gfn_to_page_no_pin(vcpu->kvm,
>>>>> + APIC_DEFAULT_PHYS_BASE>> PAGE_SHIFT);
>>>> If you do not use kvm->arch.apic_access_page to get current address why not drop it entirely?
>>>>
>>>
>>> I should also update kvm->arch.apic_access_page here. It is used in other
>>> places
>>> in kvm, so I don't think we should drop it. Will update the patch.
>> What other places? The only other place I see is in nested kvm code and you can call
>> gfn_to_page_no_pin() there instead of using kvm->arch.apic_access_page directly. But
>> as far as I see nested kvm code cannot handle change of APIC_ACCESS_ADDR phys address.
>> If APIC_ACCESS_ADDR changes during nested guest run, non nested vmcs will still have old
>> physical address. One way to fix that is to set KVM_REQ_APIC_PAGE_RELOAD during nested exit.
>

Hi Jan,

Thanks for the reply. Please see below.

> I cannot follow your concerns yet. Specifically, how should
> APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
> currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?
>

Currently, we pin the nested apic page in memory. As a result, the page
cannot be migrated or hot-removed, just like the apic page for the L1 vm.

What we want to do here is NOT pin the page in memory. When it is migrated,
we track the hpa of the page and update the VMCS field at the proper time.

Please refer to patch 5/5, where I have done this for the L1 vm. The solution is:
1. When the apic page is migrated, invalidate the ept entry of the apic page
   in the mmu_notifier registered by kvm, which is
   kvm_mmu_notifier_invalidate_page() here.
2. Introduce a new vcpu request named KVM_REQ_APIC_PAGE_RELOAD, force all the
   vcpus to exit from guest mode, and make this request to all the vcpus.
3. In the request handler, use a GUP function to find the new apic page, and
   update the VMCS field.
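The three steps above can be sketched as a tiny state machine. This is a
minimal, self-contained C model with made-up stand-ins (ept_entry,
vmcs_apic_addr, reload_requested and apic_page_hpa are hypothetical
variables, not the real KVM structures or APIs); it only illustrates the
ordering: invalidate, request, then re-resolve and rewrite on the next entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel state -- not the real KVM APIs. */
static uint64_t ept_entry;              /* models the EPT mapping of the apic page */
static uint64_t vmcs_apic_addr;         /* models the VMCS APIC_ACCESS_ADDR field  */
static bool     reload_requested;       /* models KVM_REQ_APIC_PAGE_RELOAD         */
static uint64_t apic_page_hpa = 0x1000; /* where the apic page currently lives     */

/* Steps 1 and 2: the mmu_notifier invalidates the stale EPT entry when the
 * page is migrated, and the reload request kicks every vcpu out of guest mode. */
void mmu_notifier_invalidate_page(uint64_t new_hpa)
{
	ept_entry = 0;           /* step 1: drop the stale mapping   */
	apic_page_hpa = new_hpa; /* migration moved the page here    */
	reload_requested = true; /* step 2: request reload, kick out */
}

/* Step 3: before re-entering guest mode, each vcpu handles the request,
 * looks the page up again (GUP would wait for migration to complete) and
 * rewrites the VMCS field, i.e. vmcs_write64(APIC_ACCESS_ADDR, hpa). */
void vcpu_enter_guest(void)
{
	if (reload_requested) {
		vmcs_apic_addr = apic_page_hpa;
		reload_requested = false;
	}
}
```

The point of the model is that the VMCS is never rewritten from the notifier
itself; it is refreshed lazily by each vcpu on its next entry.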

I think Gleb is trying to say that we have to face the same problem in the
nested vm.

Thanks.

2014-07-15 12:32:04

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On 07/15/2014 08:09 PM, Gleb Natapov wrote:
> On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:
......
>>
>> I cannot follow your concerns yet. Specifically, how should
>> APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
>> currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?
>>
> I am talking about this case:
> if (cpu_has_secondary_exec_ctrls()) {a
> } else {
> exec_control |=
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> vmcs_write64(APIC_ACCESS_ADDR,
> page_to_phys(vcpu->kvm->arch.apic_access_page));
> }
> We do not pin here.
>

Hi Gleb,


7905                 if (exec_control & SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
......
7912                         if (vmx->nested.apic_access_page) /* shouldn't happen */
7913                                 nested_release_page(vmx->nested.apic_access_page);
7914                         vmx->nested.apic_access_page =
7915                                 nested_get_page(vcpu, vmcs12->apic_access_addr);

I thought you were talking about the problem here. We pin
vmcs12->apic_access_addr in memory. And I think we should do the same thing
to this page as to the L1 vm. Right ?

......
7922                         if (!vmx->nested.apic_access_page)
7923                                 exec_control &=
7924                                         ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
7925                         else
7926                                 vmcs_write64(APIC_ACCESS_ADDR,
7927                                         page_to_phys(vmx->nested.apic_access_page));
7928                 } else if (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
7929                         exec_control |=
7930                                 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
7931                         vmcs_write64(APIC_ACCESS_ADDR,
7932                                 page_to_phys(vcpu->kvm->arch.apic_access_page));
7933                 }

And yes, we have the problem you said here. We can migrate the page while
the L2 vm is running. So I think we should force the L2 vm to exit to L1.
Right ?

Thanks.

2014-07-15 12:40:56

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Tue, Jul 15, 2014 at 08:28:22PM +0800, Tang Chen wrote:
> On 07/15/2014 08:09 PM, Gleb Natapov wrote:
> >On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:
> ......
> >>
> >>I cannot follow your concerns yet. Specifically, how should
> >>APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
> >>currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?
> >>
> >I am talking about this case:
> > if (cpu_has_secondary_exec_ctrls()) {a
> > } else {
> > exec_control |=
> > SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> > vmcs_write64(APIC_ACCESS_ADDR,
> > page_to_phys(vcpu->kvm->arch.apic_access_page));
> > }
> >We do not pin here.
> >
>
> Hi Gleb,
>
>
> 7905 if (exec_control &
> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
> ......
> 7912 if (vmx->nested.apic_access_page) /* shouldn't
> happen */
> 7913 nested_release_page(vmx->nested.apic_access_page);
> 7914 vmx->nested.apic_access_page =
> 7915 nested_get_page(vcpu,
> vmcs12->apic_access_addr);
>
> I thought you were talking about the problem here. We pin
> vmcs12->apic_access_addr
> in memory. And I think we should do the same thing to this page as to L1 vm.
> Right ?
Nested kvm pins a lot of pages, it will probably be not easy to handle all of them,
so for now I am concerned with non nested case only (but nested should continue to
work obviously, just pin pages like it does now).

>
> ......
> 7922 if (!vmx->nested.apic_access_page)
> 7923 exec_control &=
> 7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> 7925 else
> 7926 vmcs_write64(APIC_ACCESS_ADDR,
> 7927 page_to_phys(vmx->nested.apic_access_page));
> 7928 } else if
> (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
> 7929 exec_control |=
> 7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> 7931 vmcs_write64(APIC_ACCESS_ADDR,
> 7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
> 7933 }
>
> And yes, we have the problem you said here. We can migrate the page while L2
> vm is running.
> So I think we should enforce L2 vm to exit to L1. Right ?
>
We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.

--
Gleb.

2014-07-15 12:53:05

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On 07/15/2014 08:40 PM, Gleb Natapov wrote:
> On Tue, Jul 15, 2014 at 08:28:22PM +0800, Tang Chen wrote:
>> On 07/15/2014 08:09 PM, Gleb Natapov wrote:
>>> On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:
>> ......
>>>>
>>>> I cannot follow your concerns yet. Specifically, how should
>>>> APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
>>>> currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?
>>>>
>>> I am talking about this case:
>>> if (cpu_has_secondary_exec_ctrls()) {a
>>> } else {
>>> exec_control |=
>>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>>> vmcs_write64(APIC_ACCESS_ADDR,
>>> page_to_phys(vcpu->kvm->arch.apic_access_page));
>>> }
>>> We do not pin here.
>>>
>>
>> Hi Gleb,
>>
>>
>> 7905 if (exec_control&
>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
>> ......
>> 7912 if (vmx->nested.apic_access_page) /* shouldn't
>> happen */
>> 7913 nested_release_page(vmx->nested.apic_access_page);
>> 7914 vmx->nested.apic_access_page =
>> 7915 nested_get_page(vcpu,
>> vmcs12->apic_access_addr);
>>
>> I thought you were talking about the problem here. We pin
>> vmcs12->apic_access_addr
>> in memory. And I think we should do the same thing to this page as to L1 vm.
>> Right ?
> Nested kvm pins a lot of pages, it will probably be not easy to handle all of them,
> so for now I am concerned with non nested case only (but nested should continue to
> work obviously, just pin pages like it does now).

True. I will work on it.

And also, when using PCI passthrough, kvm_pin_pages() also pins some pages.
This is also on my todo list.

But sorry, one thing seems a little strange: I didn't find where
vmcs12->apic_access_addr is allocated or initialized... Would you please
tell me ?

>
>>
>> ......
>> 7922 if (!vmx->nested.apic_access_page)
>> 7923 exec_control&=
>> 7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> 7925 else
>> 7926 vmcs_write64(APIC_ACCESS_ADDR,
>> 7927 page_to_phys(vmx->nested.apic_access_page));
>> 7928 } else if
>> (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
>> 7929 exec_control |=
>> 7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> 7931 vmcs_write64(APIC_ACCESS_ADDR,
>> 7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
>> 7933 }
>>
>> And yes, we have the problem you said here. We can migrate the page while L2
>> vm is running.
>> So I think we should enforce L2 vm to exit to L1. Right ?
>>
> We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
>

The apic pages for L2 and L1 are not the same page, right ?

I think, just like we are doing in patch 5/5, we cannot wait for the next
L2->L1 vmexit. We should force an L2->L1 vmexit in the mmu_notifier, just
like make_all_cpus_request() does.

Am I right ?

Thanks.

2014-07-15 13:10:41

by Jan Kiszka

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On 2014-07-15 14:40, Gleb Natapov wrote:
>>
>> ......
>> 7922 if (!vmx->nested.apic_access_page)
>> 7923 exec_control &=
>> 7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> 7925 else
>> 7926 vmcs_write64(APIC_ACCESS_ADDR,
>> 7927 page_to_phys(vmx->nested.apic_access_page));
>> 7928 } else if
>> (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
>> 7929 exec_control |=
>> 7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
>> 7931 vmcs_write64(APIC_ACCESS_ADDR,
>> 7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
>> 7933 }
>>
>> And yes, we have the problem you said here. We can migrate the page while L2
>> vm is running.
>> So I think we should enforce L2 vm to exit to L1. Right ?
>>
> We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.

How should this host-managed VMCS field possibly change while L2 is running?

Jan




2014-07-15 14:04:23

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Tue, Jul 15, 2014 at 03:10:15PM +0200, Jan Kiszka wrote:
> On 2014-07-15 14:40, Gleb Natapov wrote:
> >>
> >> ......
> >> 7922 if (!vmx->nested.apic_access_page)
> >> 7923 exec_control &=
> >> 7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> >> 7925 else
> >> 7926 vmcs_write64(APIC_ACCESS_ADDR,
> >> 7927 page_to_phys(vmx->nested.apic_access_page));
> >> 7928 } else if
> >> (vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
> >> 7929 exec_control |=
> >> 7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> >> 7931 vmcs_write64(APIC_ACCESS_ADDR,
> >> 7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
> >> 7933 }
> >>
> >> And yes, we have the problem you said here. We can migrate the page while L2
> >> vm is running.
> >> So I think we should enforce L2 vm to exit to L1. Right ?
> >>
> > We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> > if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
>
> How should this host-managed VMCS field possibly change while L2 is running?
>
That's what the "Do not pin apic access page in memory" patch is doing. It
changes the APIC_ACCESS_ADDR of the current vmcs. It kicks the vcpu out of
guest mode of course, but the vcpu may still be in L2 at this point, so only
the L2 vmcs will get the new APIC_ACCESS_ADDR pointer; the L1 vmcs will
still have an old one.

--
Gleb.

2014-07-15 14:40:42

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Tue, Jul 15, 2014 at 08:54:01PM +0800, Tang Chen wrote:
> On 07/15/2014 08:40 PM, Gleb Natapov wrote:
> >On Tue, Jul 15, 2014 at 08:28:22PM +0800, Tang Chen wrote:
> >>On 07/15/2014 08:09 PM, Gleb Natapov wrote:
> >>>On Tue, Jul 15, 2014 at 01:52:40PM +0200, Jan Kiszka wrote:
> >>......
> >>>>
> >>>>I cannot follow your concerns yet. Specifically, how should
> >>>>APIC_ACCESS_ADDR (the VMCS field, right?) change while L2 is running? We
> >>>>currently pin/unpin on L1->L2/L2->L1, respectively. Or what do you mean?
> >>>>
> >>>I am talking about this case:
> >>> if (cpu_has_secondary_exec_ctrls()) {a
> >>> } else {
> >>> exec_control |=
> >>> SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> >>> vmcs_write64(APIC_ACCESS_ADDR,
> >>> page_to_phys(vcpu->kvm->arch.apic_access_page));
> >>> }
> >>>We do not pin here.
> >>>
> >>
> >>Hi Gleb,
> >>
> >>
> >>7905 if (exec_control&
> >>SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES) {
> >>......
> >>7912 if (vmx->nested.apic_access_page) /* shouldn't
> >>happen */
> >>7913 nested_release_page(vmx->nested.apic_access_page);
> >>7914 vmx->nested.apic_access_page =
> >>7915 nested_get_page(vcpu,
> >>vmcs12->apic_access_addr);
> >>
> >>I thought you were talking about the problem here. We pin
> >>vmcs12->apic_access_addr
> >>in memory. And I think we should do the same thing to this page as to L1 vm.
> >>Right ?
> >Nested kvm pins a lot of pages, it will probably be not easy to handle all of them,
> >so for now I am concerned with non nested case only (but nested should continue to
> >work obviously, just pin pages like it does now).
>
> True. I will work on it.
>
> And also, when using PCI passthrough, kvm_pin_pages() also pins some pages.
> This is
> also in my todo list.
Those pages are (almost) directly accessible by assigned PCI devices,
I am not sure this is even doable.

>
> But sorry, a little strange. I didn't find where vmcs12->apic_access_addr is
> allocated
> or initialized... Would you please tell me ?
handle_vmwrite() writes it when guest is executing vmwrite(APIC_ACCESS_ADDR);

>
> >
> >>
> >>......
> >>7922 if (!vmx->nested.apic_access_page)
> >>7923 exec_control&=
> >>7924 ~SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> >>7925 else
> >>7926 vmcs_write64(APIC_ACCESS_ADDR,
> >>7927 page_to_phys(vmx->nested.apic_access_page));
> >>7928 } else if
> >>(vm_need_virtualize_apic_accesses(vmx->vcpu.kvm)) {
> >>7929 exec_control |=
> >>7930 SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES;
> >>7931 vmcs_write64(APIC_ACCESS_ADDR,
> >>7932 page_to_phys(vcpu->kvm->arch.apic_access_page));
> >>7933 }
> >>
> >>And yes, we have the problem you said here. We can migrate the page while L2
> >>vm is running.
> >>So I think we should enforce L2 vm to exit to L1. Right ?
> >>
> >We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> >if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
> >
>
> apic pages for L2 and L1 are not the same page, right ?
>
If the L2 guest enables the apic access page then they are different;
otherwise they are the same.

> I think, just like we are doing in patch 5/5, we cannot wait for the next
> L2->L1 vmexit.
> We should enforce a L2->L1 vmexit in mmu_notifier, just like
> make_all_cpus_request() does.
>
> Am I right ?
>
I do not see why forcing APIC_ACCESS_ADDR reload during L2->L1 exit is not enough.

--
Gleb.

2014-07-17 09:21:12

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

Hi Gleb,

Sorry for the delay. Please see below.

On 07/15/2014 10:40 PM, Gleb Natapov wrote:
......
>>>>
>>> We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
>>> if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
>>>
>>
>> apic pages for L2 and L1 are not the same page, right ?
>>
> If L2 guest enable apic access page then they are different, otherwise
> they are the same.
>
>> I think, just like we are doing in patch 5/5, we cannot wait for the next
>> L2->L1 vmexit.
>> We should enforce a L2->L1 vmexit in mmu_notifier, just like
>> make_all_cpus_request() does.
>>
>> Am I right ?
>>
> I do not see why forcing APIC_ACCESS_ADDR reload during L2->L1 exit is not enough.

Yes, you are right. The APIC_ACCESS_ADDR reload should be done during the
L2->L1 vmexit.

I mean, before the page is moved elsewhere, we have to force an L2->L1
vmexit rather than wait for the next one. While the page is being moved,
if the L2 vm is still running, it could access the apic page directly,
and the vm may be corrupted.

In the mmu_notifier called before the page is moved, we have to force an
L2->L1 vmexit and ask the vcpus to reload APIC_ACCESS_ADDR for the L2 vm.
The process will wait till the page migration is completed, update the
APIC_ACCESS_ADDR, and re-enter guest mode.

Thanks.

2014-07-17 13:33:28

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

Hi Gleb,

On 07/15/2014 08:40 PM, Gleb Natapov wrote:
......
>>
>> And yes, we have the problem you said here. We can migrate the page while L2
>> vm is running.
>> So I think we should enforce L2 vm to exit to L1. Right ?
>>
> We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
>

Sorry, I think I don't quite understand the procedure you are talking
about here.

Referring to the code, I think we have three machines: L0 (host), L1 and L2.
And we have two types of vmexit: L2->L1 and L2->L0. Right ?

We are now talking about this case: L2 and L1 share the apic page.

Using patch 5/5, when the apic page is migrated on L0, the mmu_notifier
will notify L1 and update L1's VMCS. At this time, we are in L0, not L2.
Why can't we update L2's VMCS at the same time ? Is it because we don't
know how many L2 vms there are in L1 ?

And when will the L2->L1 vmexit happen ? When we force L1 to exit to L0 by
calling make_all_cpus_request(), is the L2->L1 vmexit triggered automatically ?

Thanks.

2014-07-17 13:57:16

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Thu, Jul 17, 2014 at 09:34:20PM +0800, Tang Chen wrote:
> Hi Gleb,
>
> On 07/15/2014 08:40 PM, Gleb Natapov wrote:
> ......
> >>
> >>And yes, we have the problem you said here. We can migrate the page while L2
> >>vm is running.
> >>So I think we should enforce L2 vm to exit to L1. Right ?
> >>
> >We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> >if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
> >
>
> Sorry, I think I don't quite understand the procedure you are talking about
> here.
>
> Referring to the code, I think we have three machines: L0(host), L1 and L2.
> And we have two types of vmexit: L2->L1 and L2->L0. Right ?
>
> We are now talking about this case: L2 and L1 shares the apic page.
>
> Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
> L1,
> and update L1's VMCS. At this time, we are in L0, not L2. Why cannot we
Using patch 5/5, when the apic page is migrated on L0, the mmu_notifier will
update the L1 or L2 VMCS depending on which one happens to be running right
now. If it is L1, then L2's VMCS will be updated during vmentry emulation;
if it is L2, we need to request a reload during vmexit emulation to make
sure L1's VMCS is updated.

--
Gleb.
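The two cases Gleb describes (update whichever VMCS is live, then propagate
the pending reload to the L1 VMCS on the emulated L2->L1 vmexit) can be
modeled as follows. This is a hypothetical, self-contained C sketch -- the
variables and functions (l1_vmcs_apic, l2_vmcs_apic, apic_page_moved,
nested_vmexit) are stand-ins for illustration, not KVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the two VMCS copies -- not real KVM state. */
static uint64_t l1_vmcs_apic, l2_vmcs_apic;
static bool in_l2;                 /* which guest level is active          */
static bool reload_on_vmexit;      /* pending reload for the L1 VMCS       */

/* The mmu_notifier path updates whichever VMCS is currently loaded.
 * If L2 is running, L1's copy becomes stale and a reload is flagged. */
void apic_page_moved(uint64_t new_hpa)
{
	if (in_l2) {
		l2_vmcs_apic = new_hpa;
		reload_on_vmexit = true;  /* fix up L1 on the next vmexit */
	} else {
		l1_vmcs_apic = new_hpa;   /* L2's copy is rebuilt on vmentry */
	}
}

/* Emulated L2->L1 vmexit: propagate the pending reload to the L1 VMCS. */
void nested_vmexit(void)
{
	in_l2 = false;
	if (reload_on_vmexit) {
		l1_vmcs_apic = l2_vmcs_apic;
		reload_on_vmexit = false;
	}
}
```

The design point is that no VMCS is ever left pointing at the old page once
its level runs again: the running level is fixed immediately, and the other
level is fixed at the next emulated transition.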

2014-07-18 09:04:25

by Tang Chen

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

Hi Gleb,

On 07/17/2014 09:57 PM, Gleb Natapov wrote:
> On Thu, Jul 17, 2014 at 09:34:20PM +0800, Tang Chen wrote:
>> Hi Gleb,
>>
>> On 07/15/2014 08:40 PM, Gleb Natapov wrote:
>> ......
>>>>
>>>> And yes, we have the problem you said here. We can migrate the page while L2
>>>> vm is running.
>>>> So I think we should enforce L2 vm to exit to L1. Right ?
>>>>
>>> We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
>>> if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
>>>
>>
>> Sorry, I think I don't quite understand the procedure you are talking about
>> here.
>>
>> Referring to the code, I think we have three machines: L0(host), L1 and L2.
>> And we have two types of vmexit: L2->L1 and L2->L0. Right ?
>>
>> We are now talking about this case: L2 and L1 shares the apic page.
>>
>> Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
>> L1,
>> and update L1's VMCS. At this time, we are in L0, not L2. Why cannot we
> Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
> L1 or L2 VMCS depending on which one happens to be running right now.
> If it is L1 then L2's VMCS will be updated during vmentry emulation,

OK, this is easy to understand.

>if it is
> L2 we need to request reload during vmexit emulation to make sure L1's VMCS is
> updated.
>

I'm a little confused here. In patch 5/5, I called make_all_cpus_request()
to force all vcpus to exit to the host. If we are in L2, where will the
vcpus exit to ? L1 or L0 ?

Thanks.

2014-07-18 11:21:51

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v2 5/5] kvm, mem-hotplug: Do not pin apic access page in memory.

On Fri, Jul 18, 2014 at 05:05:20PM +0800, Tang Chen wrote:
> Hi Gleb,
>
> On 07/17/2014 09:57 PM, Gleb Natapov wrote:
> >On Thu, Jul 17, 2014 at 09:34:20PM +0800, Tang Chen wrote:
> >>Hi Gleb,
> >>
> >>On 07/15/2014 08:40 PM, Gleb Natapov wrote:
> >>......
> >>>>
> >>>>And yes, we have the problem you said here. We can migrate the page while L2
> >>>>vm is running.
> >>>>So I think we should enforce L2 vm to exit to L1. Right ?
> >>>>
> >>>We can request APIC_ACCESS_ADDR reload during L2->L1 vmexit emulation, so
> >>>if APIC_ACCESS_ADDR changes while L2 is running it will be reloaded for L1 too.
> >>>
> >>
> >>Sorry, I think I don't quite understand the procedure you are talking about
> >>here.
> >>
> >>Referring to the code, I think we have three machines: L0(host), L1 and L2.
> >>And we have two types of vmexit: L2->L1 and L2->L0. Right ?
> >>
> >>We are now talking about this case: L2 and L1 shares the apic page.
> >>
> >>Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
> >>L1,
> >>and update L1's VMCS. At this time, we are in L0, not L2. Why cannot we
> >Using patch 5/5, when apic page is migrated on L0, mmu_notifier will notify
> >L1 or L2 VMCS depending on which one happens to be running right now.
> >If it is L1 then L2's VMCS will be updated during vmentry emulation,
>
> OK, this is easy to understand.
>
> >if it is
> >L2 we need to request reload during vmexit emulation to make sure L1's VMCS is
> >updated.
> >
>
> I'm a little confused here. In patch 5/5, I called make_all_cpus_request()
> to
> force all vcpus exit to host. If we are in L2, where will the vcpus exit to
> ?
> L1 or L0 ?
>
L0. The CPU always exits to L0. The L2->L1 exit is emulated by nested_vmx_vmexit().

--
Gleb.