2011-05-15 15:18:42

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 1/7] KVM: MMU: optimize pte write path if don't have protected sp

Simply return from kvm_mmu_pte_write path if no shadow page is
write-protected, then we can avoid to walk all shadow pages and hold
mmu-lock

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 9 +++++++++
2 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index d2ac8e2..152601a 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -441,6 +441,7 @@ struct kvm_arch {
unsigned int n_used_mmu_pages;
unsigned int n_requested_mmu_pages;
unsigned int n_max_mmu_pages;
+ unsigned int indirect_shadow_pages;
atomic_t invlpg_counter;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2841805..ad520d4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -498,6 +498,7 @@ static void account_shadowed(struct kvm *kvm, gfn_t gfn)
linfo = lpage_info_slot(gfn, slot, i);
linfo->write_count += 1;
}
+ kvm->arch.indirect_shadow_pages++;
}

static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
@@ -513,6 +514,7 @@ static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
linfo->write_count -= 1;
WARN_ON(linfo->write_count < 0);
}
+ kvm->arch.indirect_shadow_pages--;
}

static int has_wrprotected_page(struct kvm *kvm,
@@ -3233,6 +3235,13 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
int level, npte, invlpg_counter, r, flooded = 0;
bool remote_flush, local_flush, zap_page;

+ /*
+ * If we don't have indirect shadow pages, it means no page is
+ * write-protected, so we can exit simply.
+ */
+ if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
+ return;
+
zap_page = remote_flush = local_flush = false;
offset = offset_in_page(gpa);

--
1.7.4.4


2011-05-15 15:20:23

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 2/7] KVM: use __copy_to_user/__clear_user to write guest page

Simply use __copy_to_user/__clear_user to write guest page since we have
already verified the user address when the memslot is set

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 4 ++--
virt/kvm/kvm_main.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 77c9d86..a419bd1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1388,7 +1388,7 @@ static int set_msr_hyperv_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data)
return 1;
kvm_x86_ops->patch_hypercall(vcpu, instructions);
((unsigned char *)instructions)[3] = 0xc3; /* ret */
- if (copy_to_user((void __user *)addr, instructions, 4))
+ if (__copy_to_user((void __user *)addr, instructions, 4))
return 1;
kvm->arch.hv_hypercall = data;
break;
@@ -1415,7 +1415,7 @@ static int set_msr_hyperv(struct kvm_vcpu *vcpu, u32 msr, u64 data)
HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT);
if (kvm_is_error_hva(addr))
return 1;
- if (clear_user((void __user *)addr, PAGE_SIZE))
+ if (__clear_user((void __user *)addr, PAGE_SIZE))
return 1;
vcpu->arch.hv_vapic = data;
break;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 22cdb96..3962899 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1342,7 +1342,7 @@ int kvm_write_guest_page(struct kvm *kvm, gfn_t gfn, const void *data,
addr = gfn_to_hva(kvm, gfn);
if (kvm_is_error_hva(addr))
return -EFAULT;
- r = copy_to_user((void __user *)addr + offset, data, len);
+ r = __copy_to_user((void __user *)addr + offset, data, len);
if (r)
return -EFAULT;
mark_page_dirty(kvm, gfn);
@@ -1402,7 +1402,7 @@ int kvm_write_guest_cached(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
if (kvm_is_error_hva(ghc->hva))
return -EFAULT;

- r = copy_to_user((void __user *)ghc->hva, data, len);
+ r = __copy_to_user((void __user *)ghc->hva, data, len);
if (r)
return -EFAULT;
mark_page_dirty_in_slot(kvm, ghc->memslot, ghc->gpa >> PAGE_SHIFT);
--
1.7.4.4

2011-05-15 15:23:24

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 3/7] KVM: fix uninitialized warning

Fix:

warning: ‘cs_sel’ may be used uninitialized in this function
warning: ‘ss_sel’ may be used uninitialized in this function

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/emulate.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 291c872..ea32340 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -2015,7 +2015,7 @@ emulate_sysexit(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
struct desc_struct cs, ss;
u64 msr_data;
int usermode;
- u16 cs_sel, ss_sel;
+ u16 cs_sel = 0, ss_sel = 0;

/* inject #GP if in real mode or Virtual 8086 mode */
if (ctxt->mode == X86EMUL_MODE_REAL ||
--
1.7.4.4

2011-05-15 15:24:34

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 4/7] KVM: MMU: abstract the operation of rmap

Abstract the operation of rmap to spte_list, then we can use it for the
reverse mapping of parent pte in the later patch

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 260 +++++++++++++++++++++------------------
2 files changed, 140 insertions(+), 122 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 152601a..0d824e4 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -347,7 +347,7 @@ struct kvm_vcpu_arch {
struct kvm_pv_mmu_op_buffer mmu_op_buffer;

struct kvm_mmu_memory_cache mmu_pte_chain_cache;
- struct kvm_mmu_memory_cache mmu_rmap_desc_cache;
+ struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ad520d4..5ba347b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -148,7 +148,7 @@ module_param(oos_shadow, bool, 0644);
#define PT64_PERM_MASK (PT_PRESENT_MASK | PT_WRITABLE_MASK | PT_USER_MASK \
| PT64_NX_MASK)

-#define RMAP_EXT 4
+#define PTE_LIST_EXT 4

#define ACC_EXEC_MASK 1
#define ACC_WRITE_MASK PT_WRITABLE_MASK
@@ -164,9 +164,9 @@ module_param(oos_shadow, bool, 0644);

#define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level)

-struct kvm_rmap_desc {
- u64 *sptes[RMAP_EXT];
- struct kvm_rmap_desc *more;
+struct pte_list_desc {
+ u64 *sptes[PTE_LIST_EXT];
+ struct pte_list_desc *more;
};

struct kvm_shadow_walk_iterator {
@@ -185,7 +185,7 @@ struct kvm_shadow_walk_iterator {
typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte);

static struct kmem_cache *pte_chain_cache;
-static struct kmem_cache *rmap_desc_cache;
+static struct kmem_cache *pte_list_desc_cache;
static struct kmem_cache *mmu_page_header_cache;
static struct percpu_counter kvm_total_used_mmu_pages;

@@ -401,8 +401,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
pte_chain_cache, 4);
if (r)
goto out;
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_rmap_desc_cache,
- rmap_desc_cache, 4 + PTE_PREFETCH_NUM);
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+ pte_list_desc_cache, 4 + PTE_PREFETCH_NUM);
if (r)
goto out;
r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
@@ -416,8 +416,10 @@ out:

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
- mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache, pte_chain_cache);
- mmu_free_memory_cache(&vcpu->arch.mmu_rmap_desc_cache, rmap_desc_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
+ pte_chain_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+ pte_list_desc_cache);
mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
mmu_page_header_cache);
@@ -444,15 +446,15 @@ static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
kmem_cache_free(pte_chain_cache, pc);
}

-static struct kvm_rmap_desc *mmu_alloc_rmap_desc(struct kvm_vcpu *vcpu)
+static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
{
- return mmu_memory_cache_alloc(&vcpu->arch.mmu_rmap_desc_cache,
- sizeof(struct kvm_rmap_desc));
+ return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache,
+ sizeof(struct pte_list_desc));
}

-static void mmu_free_rmap_desc(struct kvm_rmap_desc *rd)
+static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
{
- kmem_cache_free(rmap_desc_cache, rd);
+ kmem_cache_free(pte_list_desc_cache, pte_list_desc);
}

static gfn_t kvm_mmu_page_get_gfn(struct kvm_mmu_page *sp, int index)
@@ -590,67 +592,42 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
}

/*
- * Take gfn and return the reverse mapping to it.
- */
-
-static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
-{
- struct kvm_memory_slot *slot;
- struct kvm_lpage_info *linfo;
-
- slot = gfn_to_memslot(kvm, gfn);
- if (likely(level == PT_PAGE_TABLE_LEVEL))
- return &slot->rmap[gfn - slot->base_gfn];
-
- linfo = lpage_info_slot(gfn, slot, level);
-
- return &linfo->rmap_pde;
-}
-
-/*
- * Reverse mapping data structures:
+ * Pte mapping structures:
*
- * If rmapp bit zero is zero, then rmapp point to the shadw page table entry
- * that points to page_address(page).
+ * If pte_list bit zero is zero, then pte_list point to the spte.
*
- * If rmapp bit zero is one, (then rmap & ~1) points to a struct kvm_rmap_desc
- * containing more mappings.
+ * If pte_list bit zero is one, (then pte_list & ~1) points to a struct
+ * pte_list_desc containing more mappings.
*
- * Returns the number of rmap entries before the spte was added or zero if
+ * Returns the number of pte entries before the spte was added or zero if
* the spte was not added.
*
*/
-static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
+static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
+ unsigned long *pte_list)
{
- struct kvm_mmu_page *sp;
- struct kvm_rmap_desc *desc;
- unsigned long *rmapp;
+ struct pte_list_desc *desc;
int i, count = 0;

- if (!is_rmap_spte(*spte))
- return count;
- sp = page_header(__pa(spte));
- kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
- rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
- if (!*rmapp) {
- rmap_printk("rmap_add: %p %llx 0->1\n", spte, *spte);
- *rmapp = (unsigned long)spte;
- } else if (!(*rmapp & 1)) {
- rmap_printk("rmap_add: %p %llx 1->many\n", spte, *spte);
- desc = mmu_alloc_rmap_desc(vcpu);
- desc->sptes[0] = (u64 *)*rmapp;
+ if (!*pte_list) {
+ rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
+ *pte_list = (unsigned long)spte;
+ } else if (!(*pte_list & 1)) {
+ rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
+ desc = mmu_alloc_pte_list_desc(vcpu);
+ desc->sptes[0] = (u64 *)*pte_list;
desc->sptes[1] = spte;
- *rmapp = (unsigned long)desc | 1;
+ *pte_list = (unsigned long)desc | 1;
++count;
} else {
- rmap_printk("rmap_add: %p %llx many->many\n", spte, *spte);
- desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
- while (desc->sptes[RMAP_EXT-1] && desc->more) {
+ rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
+ desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+ while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
desc = desc->more;
- count += RMAP_EXT;
+ count += PTE_LIST_EXT;
}
- if (desc->sptes[RMAP_EXT-1]) {
- desc->more = mmu_alloc_rmap_desc(vcpu);
+ if (desc->sptes[PTE_LIST_EXT-1]) {
+ desc->more = mmu_alloc_pte_list_desc(vcpu);
desc = desc->more;
}
for (i = 0; desc->sptes[i]; ++i)
@@ -660,59 +637,78 @@ static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
return count;
}

-static void rmap_desc_remove_entry(unsigned long *rmapp,
- struct kvm_rmap_desc *desc,
- int i,
- struct kvm_rmap_desc *prev_desc)
+static u64 *pte_list_next(unsigned long *pte_list, u64 *spte)
+{
+ struct pte_list_desc *desc;
+ u64 *prev_spte;
+ int i;
+
+ if (!*pte_list)
+ return NULL;
+ else if (!(*pte_list & 1)) {
+ if (!spte)
+ return (u64 *)*pte_list;
+ return NULL;
+ }
+ desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+ prev_spte = NULL;
+ while (desc) {
+ for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i) {
+ if (prev_spte == spte)
+ return desc->sptes[i];
+ prev_spte = desc->sptes[i];
+ }
+ desc = desc->more;
+ }
+ return NULL;
+}
+
+static void
+pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
+ int i, struct pte_list_desc *prev_desc)
{
int j;

- for (j = RMAP_EXT - 1; !desc->sptes[j] && j > i; --j)
+ for (j = PTE_LIST_EXT - 1; !desc->sptes[j] && j > i; --j)
;
desc->sptes[i] = desc->sptes[j];
desc->sptes[j] = NULL;
if (j != 0)
return;
if (!prev_desc && !desc->more)
- *rmapp = (unsigned long)desc->sptes[0];
+ *pte_list = (unsigned long)desc->sptes[0];
else
if (prev_desc)
prev_desc->more = desc->more;
else
- *rmapp = (unsigned long)desc->more | 1;
- mmu_free_rmap_desc(desc);
+ *pte_list = (unsigned long)desc->more | 1;
+ mmu_free_pte_list_desc(desc);
}

-static void rmap_remove(struct kvm *kvm, u64 *spte)
+static void pte_list_remove(u64 *spte, unsigned long *pte_list)
{
- struct kvm_rmap_desc *desc;
- struct kvm_rmap_desc *prev_desc;
- struct kvm_mmu_page *sp;
- gfn_t gfn;
- unsigned long *rmapp;
+ struct pte_list_desc *desc;
+ struct pte_list_desc *prev_desc;
int i;

- sp = page_header(__pa(spte));
- gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
- rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
- if (!*rmapp) {
- printk(KERN_ERR "rmap_remove: %p 0->BUG\n", spte);
+ if (!*pte_list) {
+ printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte);
BUG();
- } else if (!(*rmapp & 1)) {
- rmap_printk("rmap_remove: %p 1->0\n", spte);
- if ((u64 *)*rmapp != spte) {
- printk(KERN_ERR "rmap_remove: %p 1->BUG\n", spte);
+ } else if (!(*pte_list & 1)) {
+ rmap_printk("pte_list_remove: %p 1->0\n", spte);
+ if ((u64 *)*pte_list != spte) {
+ printk(KERN_ERR "pte_list_remove: %p 1->BUG\n", spte);
BUG();
}
- *rmapp = 0;
+ *pte_list = 0;
} else {
- rmap_printk("rmap_remove: %p many->many\n", spte);
- desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
+ rmap_printk("pte_list_remove: %p many->many\n", spte);
+ desc = (struct pte_list_desc *)(*pte_list & ~1ul);
prev_desc = NULL;
while (desc) {
- for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i)
+ for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
if (desc->sptes[i] == spte) {
- rmap_desc_remove_entry(rmapp,
+ pte_list_desc_remove_entry(pte_list,
desc, i,
prev_desc);
return;
@@ -720,11 +716,59 @@ static void rmap_remove(struct kvm *kvm, u64 *spte)
prev_desc = desc;
desc = desc->more;
}
- pr_err("rmap_remove: %p many->many\n", spte);
+ pr_err("pte_list_remove: %p many->many\n", spte);
BUG();
}
}

+/*
+ * Take gfn and return the reverse mapping to it.
+ */
+static unsigned long *gfn_to_rmap(struct kvm *kvm, gfn_t gfn, int level)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm_lpage_info *linfo;
+
+ slot = gfn_to_memslot(kvm, gfn);
+ if (likely(level == PT_PAGE_TABLE_LEVEL))
+ return &slot->rmap[gfn - slot->base_gfn];
+
+ linfo = lpage_info_slot(gfn, slot, level);
+
+ return &linfo->rmap_pde;
+}
+
+static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
+{
+ struct kvm_mmu_page *sp;
+ unsigned long *rmapp;
+
+ if (!is_rmap_spte(*spte))
+ return 0;
+
+ sp = page_header(__pa(spte));
+ kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
+ rmapp = gfn_to_rmap(vcpu->kvm, gfn, sp->role.level);
+ return pte_list_add(vcpu, spte, rmapp);
+}
+
+static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
+{
+ return pte_list_next(rmapp, spte);
+}
+
+static void rmap_remove(struct kvm *kvm, u64 *spte)
+{
+ struct kvm_mmu_page *sp;
+ gfn_t gfn;
+ unsigned long *rmapp;
+
+ sp = page_header(__pa(spte));
+ gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);
+ rmapp = gfn_to_rmap(kvm, gfn, sp->role.level);
+ pte_list_remove(spte, rmapp);
+}
+
static int set_spte_track_bits(u64 *sptep, u64 new_spte)
{
pfn_t pfn;
@@ -752,32 +796,6 @@ static void drop_spte(struct kvm *kvm, u64 *sptep, u64 new_spte)
rmap_remove(kvm, sptep);
}

-static u64 *rmap_next(struct kvm *kvm, unsigned long *rmapp, u64 *spte)
-{
- struct kvm_rmap_desc *desc;
- u64 *prev_spte;
- int i;
-
- if (!*rmapp)
- return NULL;
- else if (!(*rmapp & 1)) {
- if (!spte)
- return (u64 *)*rmapp;
- return NULL;
- }
- desc = (struct kvm_rmap_desc *)(*rmapp & ~1ul);
- prev_spte = NULL;
- while (desc) {
- for (i = 0; i < RMAP_EXT && desc->sptes[i]; ++i) {
- if (prev_spte == spte)
- return desc->sptes[i];
- prev_spte = desc->sptes[i];
- }
- desc = desc->more;
- }
- return NULL;
-}
-
static int rmap_write_protect(struct kvm *kvm, u64 gfn)
{
unsigned long *rmapp;
@@ -3600,8 +3618,8 @@ static void mmu_destroy_caches(void)
{
if (pte_chain_cache)
kmem_cache_destroy(pte_chain_cache);
- if (rmap_desc_cache)
- kmem_cache_destroy(rmap_desc_cache);
+ if (pte_list_desc_cache)
+ kmem_cache_destroy(pte_list_desc_cache);
if (mmu_page_header_cache)
kmem_cache_destroy(mmu_page_header_cache);
}
@@ -3613,10 +3631,10 @@ int kvm_mmu_module_init(void)
0, 0, NULL);
if (!pte_chain_cache)
goto nomem;
- rmap_desc_cache = kmem_cache_create("kvm_rmap_desc",
- sizeof(struct kvm_rmap_desc),
+ pte_list_desc_cache = kmem_cache_create("pte_list_desc",
+ sizeof(struct pte_list_desc),
0, 0, NULL);
- if (!rmap_desc_cache)
+ if (!pte_list_desc_cache)
goto nomem;

mmu_page_header_cache = kmem_cache_create("kvm_mmu_page_header",
--
1.7.4.4

2011-05-15 15:25:22

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 5/7] KVM: MMU: remove the arithmetic of parent pte rmap

Parent pte rmap and page rmap are very similar, so use the same arithmetic
for them

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 7 +--
arch/x86/kvm/mmu.c | 189 +++++++++-----------------------------
2 files changed, 46 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 0d824e4..e8a68f8 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -227,14 +227,10 @@ struct kvm_mmu_page {
* in this shadow page.
*/
DECLARE_BITMAP(slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
- bool multimapped; /* More than one parent_pte? */
bool unsync;
int root_count; /* Currently serving as active root */
unsigned int unsync_children;
- union {
- u64 *parent_pte; /* !multimapped */
- struct hlist_head parent_ptes; /* multimapped, kvm_pte_chain */
- };
+ unsigned long parent_ptes; /* Reverse mapping for parent_pte */
DECLARE_BITMAP(unsync_child_bitmap, 512);
};

@@ -346,7 +342,6 @@ struct kvm_vcpu_arch {
* put it here to avoid allocation */
struct kvm_pv_mmu_op_buffer mmu_op_buffer;

- struct kvm_mmu_memory_cache mmu_pte_chain_cache;
struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
struct kvm_mmu_memory_cache mmu_page_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5ba347b..c0b16dd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -182,9 +182,6 @@ struct kvm_shadow_walk_iterator {
shadow_walk_okay(&(_walker)); \
shadow_walk_next(&(_walker)))

-typedef void (*mmu_parent_walk_fn) (struct kvm_mmu_page *sp, u64 *spte);
-
-static struct kmem_cache *pte_chain_cache;
static struct kmem_cache *pte_list_desc_cache;
static struct kmem_cache *mmu_page_header_cache;
static struct percpu_counter kvm_total_used_mmu_pages;
@@ -397,12 +394,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
{
int r;

- r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
- pte_chain_cache, 4);
- if (r)
- goto out;
r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
- pte_list_desc_cache, 4 + PTE_PREFETCH_NUM);
+ pte_list_desc_cache, 8 + PTE_PREFETCH_NUM);
if (r)
goto out;
r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
@@ -416,8 +409,6 @@ out:

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
- mmu_free_memory_cache(&vcpu->arch.mmu_pte_chain_cache,
- pte_chain_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
pte_list_desc_cache);
mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
@@ -435,17 +426,6 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc,
return p;
}

-static struct kvm_pte_chain *mmu_alloc_pte_chain(struct kvm_vcpu *vcpu)
-{
- return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_chain_cache,
- sizeof(struct kvm_pte_chain));
-}
-
-static void mmu_free_pte_chain(struct kvm_pte_chain *pc)
-{
- kmem_cache_free(pte_chain_cache, pc);
-}
-
static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
{
return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache,
@@ -721,6 +701,26 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
}
}

+typedef void (*pte_list_walk_fn) (u64 *spte);
+static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
+{
+ struct pte_list_desc *desc;
+ int i;
+
+ if (!*pte_list)
+ return;
+
+ if (!(*pte_list & 1))
+ return fn((u64 *)*pte_list);
+
+ desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+ while (desc) {
+ for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
+ fn(desc->sptes[i]);
+ desc = desc->more;
+ }
+}
+
/*
* Take gfn and return the reverse mapping to it.
*/
@@ -1069,134 +1069,52 @@ static unsigned kvm_page_table_hashfn(gfn_t gfn)
return gfn & ((1 << KVM_MMU_HASH_SHIFT) - 1);
}

-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
- u64 *parent_pte, int direct)
-{
- struct kvm_mmu_page *sp;
-
- sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache, sizeof *sp);
- sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
- if (!direct)
- sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
- PAGE_SIZE);
- set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
- list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
- bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
- sp->multimapped = 0;
- sp->parent_pte = parent_pte;
- kvm_mod_used_mmu_pages(vcpu->kvm, +1);
- return sp;
-}
-
static void mmu_page_add_parent_pte(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *parent_pte)
{
- struct kvm_pte_chain *pte_chain;
- struct hlist_node *node;
- int i;
-
if (!parent_pte)
return;
- if (!sp->multimapped) {
- u64 *old = sp->parent_pte;

- if (!old) {
- sp->parent_pte = parent_pte;
- return;
- }
- sp->multimapped = 1;
- pte_chain = mmu_alloc_pte_chain(vcpu);
- INIT_HLIST_HEAD(&sp->parent_ptes);
- hlist_add_head(&pte_chain->link, &sp->parent_ptes);
- pte_chain->parent_ptes[0] = old;
- }
- hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link) {
- if (pte_chain->parent_ptes[NR_PTE_CHAIN_ENTRIES-1])
- continue;
- for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i)
- if (!pte_chain->parent_ptes[i]) {
- pte_chain->parent_ptes[i] = parent_pte;
- return;
- }
- }
- pte_chain = mmu_alloc_pte_chain(vcpu);
- BUG_ON(!pte_chain);
- hlist_add_head(&pte_chain->link, &sp->parent_ptes);
- pte_chain->parent_ptes[0] = parent_pte;
+ pte_list_add(vcpu, parent_pte, &sp->parent_ptes);
}

static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
u64 *parent_pte)
{
- struct kvm_pte_chain *pte_chain;
- struct hlist_node *node;
- int i;
-
- if (!sp->multimapped) {
- BUG_ON(sp->parent_pte != parent_pte);
- sp->parent_pte = NULL;
- return;
- }
- hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
- for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
- if (!pte_chain->parent_ptes[i])
- break;
- if (pte_chain->parent_ptes[i] != parent_pte)
- continue;
- while (i + 1 < NR_PTE_CHAIN_ENTRIES
- && pte_chain->parent_ptes[i + 1]) {
- pte_chain->parent_ptes[i]
- = pte_chain->parent_ptes[i + 1];
- ++i;
- }
- pte_chain->parent_ptes[i] = NULL;
- if (i == 0) {
- hlist_del(&pte_chain->link);
- mmu_free_pte_chain(pte_chain);
- if (hlist_empty(&sp->parent_ptes)) {
- sp->multimapped = 0;
- sp->parent_pte = NULL;
- }
- }
- return;
- }
- BUG();
+ pte_list_remove(parent_pte, &sp->parent_ptes);
}

-static void mmu_parent_walk(struct kvm_mmu_page *sp, mmu_parent_walk_fn fn)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
+ u64 *parent_pte, int direct)
{
- struct kvm_pte_chain *pte_chain;
- struct hlist_node *node;
- struct kvm_mmu_page *parent_sp;
- int i;
-
- if (!sp->multimapped && sp->parent_pte) {
- parent_sp = page_header(__pa(sp->parent_pte));
- fn(parent_sp, sp->parent_pte);
- return;
- }
-
- hlist_for_each_entry(pte_chain, node, &sp->parent_ptes, link)
- for (i = 0; i < NR_PTE_CHAIN_ENTRIES; ++i) {
- u64 *spte = pte_chain->parent_ptes[i];
-
- if (!spte)
- break;
- parent_sp = page_header(__pa(spte));
- fn(parent_sp, spte);
- }
+ struct kvm_mmu_page *sp;
+ sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache,
+ sizeof *sp);
+ sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache, PAGE_SIZE);
+ if (!direct)
+ sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache,
+ PAGE_SIZE);
+ set_page_private(virt_to_page(sp->spt), (unsigned long)sp);
+ list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
+ bitmap_zero(sp->slot_bitmap, KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS);
+ sp->parent_ptes = 0;
+ mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+ kvm_mod_used_mmu_pages(vcpu->kvm, +1);
+ return sp;
}

-static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte);
+static void mark_unsync(u64 *spte);
static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
{
- mmu_parent_walk(sp, mark_unsync);
+ pte_list_walk(&sp->parent_ptes, mark_unsync);
}

-static void mark_unsync(struct kvm_mmu_page *sp, u64 *spte)
+static void mark_unsync(u64 *spte)
{
+ struct kvm_mmu_page *sp;
unsigned int index;

+ sp = page_header(__pa(spte));
index = spte - sp->spt;
if (__test_and_set_bit(index, sp->unsync_child_bitmap))
return;
@@ -1694,17 +1612,7 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
{
u64 *parent_pte;

- while (sp->multimapped || sp->parent_pte) {
- if (!sp->multimapped)
- parent_pte = sp->parent_pte;
- else {
- struct kvm_pte_chain *chain;
-
- chain = container_of(sp->parent_ptes.first,
- struct kvm_pte_chain, link);
- parent_pte = chain->parent_ptes[0];
- }
- BUG_ON(!parent_pte);
+ while ((parent_pte = pte_list_next(&sp->parent_ptes, NULL))) {
kvm_mmu_put_page(sp, parent_pte);
__set_spte(parent_pte, shadow_trap_nonpresent_pte);
}
@@ -3616,8 +3524,6 @@ static struct shrinker mmu_shrinker = {

static void mmu_destroy_caches(void)
{
- if (pte_chain_cache)
- kmem_cache_destroy(pte_chain_cache);
if (pte_list_desc_cache)
kmem_cache_destroy(pte_list_desc_cache);
if (mmu_page_header_cache)
@@ -3626,11 +3532,6 @@ static void mmu_destroy_caches(void)

int kvm_mmu_module_init(void)
{
- pte_chain_cache = kmem_cache_create("kvm_pte_chain",
- sizeof(struct kvm_pte_chain),
- 0, 0, NULL);
- if (!pte_chain_cache)
- goto nomem;
pte_list_desc_cache = kmem_cache_create("pte_list_desc",
sizeof(struct pte_list_desc),
0, 0, NULL);
--
1.7.4.4

2011-05-15 15:26:08

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 6/7] KVM: MMU: cleanup for kvm_mmu_page_unlink_children

Cleanup the same operation between kvm_mmu_page_unlink_children and
mmu_pte_write_zap_pte

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 66 ++++++++++++++++++---------------------------------
1 files changed, 23 insertions(+), 43 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0b16dd..3fab3c2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1566,32 +1566,33 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
}
}

+static void mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
+ u64 *spte)
+{
+ u64 pte;
+ struct kvm_mmu_page *child;
+
+ pte = *spte;
+ if (is_shadow_present_pte(pte)) {
+ if (is_last_spte(pte, sp->role.level))
+ drop_spte(kvm, spte, shadow_trap_nonpresent_pte);
+ else {
+ child = page_header(pte & PT64_BASE_ADDR_MASK);
+ mmu_page_remove_parent_pte(child, spte);
+ }
+ }
+ __set_spte(spte, shadow_trap_nonpresent_pte);
+ if (is_large_pte(pte))
+ --kvm->stat.lpages;
+}
+
static void kvm_mmu_page_unlink_children(struct kvm *kvm,
struct kvm_mmu_page *sp)
{
unsigned i;
- u64 *pt;
- u64 ent;
-
- pt = sp->spt;

- for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
- ent = pt[i];
-
- if (is_shadow_present_pte(ent)) {
- if (!is_last_spte(ent, sp->role.level)) {
- ent &= PT64_BASE_ADDR_MASK;
- mmu_page_remove_parent_pte(page_header(ent),
- &pt[i]);
- } else {
- if (is_large_pte(ent))
- --kvm->stat.lpages;
- drop_spte(kvm, &pt[i],
- shadow_trap_nonpresent_pte);
- }
- }
- pt[i] = shadow_trap_nonpresent_pte;
- }
+ for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
+ mmu_page_zap_pte(kvm, sp, sp->spt + i);
}

static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
@@ -3069,27 +3070,6 @@ void kvm_mmu_unload(struct kvm_vcpu *vcpu)
}
EXPORT_SYMBOL_GPL(kvm_mmu_unload);

-static void mmu_pte_write_zap_pte(struct kvm_vcpu *vcpu,
- struct kvm_mmu_page *sp,
- u64 *spte)
-{
- u64 pte;
- struct kvm_mmu_page *child;
-
- pte = *spte;
- if (is_shadow_present_pte(pte)) {
- if (is_last_spte(pte, sp->role.level))
- drop_spte(vcpu->kvm, spte, shadow_trap_nonpresent_pte);
- else {
- child = page_header(pte & PT64_BASE_ADDR_MASK);
- mmu_page_remove_parent_pte(child, spte);
- }
- }
- __set_spte(spte, shadow_trap_nonpresent_pte);
- if (is_large_pte(pte))
- --vcpu->kvm->stat.lpages;
-}
-
static void mmu_pte_write_new_pte(struct kvm_vcpu *vcpu,
struct kvm_mmu_page *sp, u64 *spte,
const void *new)
@@ -3271,7 +3251,7 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
spte = &sp->spt[page_offset / sizeof(*spte)];
while (npte--) {
entry = *spte;
- mmu_pte_write_zap_pte(vcpu, sp, spte);
+ mmu_page_zap_pte(vcpu->kvm, sp, spte);
if (gentry &&
!((sp->role.word ^ vcpu->arch.mmu.base_role.word)
& mask.word))
--
1.7.4.4

2011-05-15 15:26:42

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH v2 7/7] KVM: MMU: cleanup for dropping parent pte

Introduce drop_parent_pte to remove the rmap of parent pte and
clear parent pte

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 21 ++++++++++++---------
1 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3fab3c2..2d14434 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1084,6 +1084,13 @@ static void mmu_page_remove_parent_pte(struct kvm_mmu_page *sp,
pte_list_remove(parent_pte, &sp->parent_ptes);
}

+static void drop_parent_pte(struct kvm_mmu_page *sp,
+ u64 *parent_pte)
+{
+ mmu_page_remove_parent_pte(sp, parent_pte);
+ __set_spte(parent_pte, shadow_trap_nonpresent_pte);
+}
+
static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
u64 *parent_pte, int direct)
{
@@ -1560,8 +1567,7 @@ static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
if (child->role.access == direct_access)
return;

- mmu_page_remove_parent_pte(child, sptep);
- __set_spte(sptep, shadow_trap_nonpresent_pte);
+ drop_parent_pte(child, sptep);
kvm_flush_remote_tlbs(vcpu->kvm);
}
}
@@ -1578,7 +1584,7 @@ static void mmu_page_zap_pte(struct kvm *kvm, struct kvm_mmu_page *sp,
drop_spte(kvm, spte, shadow_trap_nonpresent_pte);
else {
child = page_header(pte & PT64_BASE_ADDR_MASK);
- mmu_page_remove_parent_pte(child, spte);
+ drop_parent_pte(child, spte);
}
}
__set_spte(spte, shadow_trap_nonpresent_pte);
@@ -1613,10 +1619,8 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
{
u64 *parent_pte;

- while ((parent_pte = pte_list_next(&sp->parent_ptes, NULL))) {
- kvm_mmu_put_page(sp, parent_pte);
- __set_spte(parent_pte, shadow_trap_nonpresent_pte);
- }
+ while ((parent_pte = pte_list_next(&sp->parent_ptes, NULL)))
+ drop_parent_pte(sp, parent_pte);
}

static int mmu_zap_unsync_children(struct kvm *kvm,
@@ -2046,8 +2050,7 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
u64 pte = *sptep;

child = page_header(pte & PT64_BASE_ADDR_MASK);
- mmu_page_remove_parent_pte(child, sptep);
- __set_spte(sptep, shadow_trap_nonpresent_pte);
+ drop_parent_pte(child, sptep);
kvm_flush_remote_tlbs(vcpu->kvm);
} else if (pfn != spte_to_pfn(*sptep)) {
pgprintk("hfn old %llx new %llx\n",
--
1.7.4.4

2011-05-16 11:25:44

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] KVM: MMU: optimize pte write path if don't have protected sp

On 05/15/2011 06:20 PM, Xiao Guangrong wrote:
> Simply return from kvm_mmu_pte_write path if no shadow page is
> write-protected, then we can avoid to walk all shadow pages and hold
> mmu-lock

Patchset looks like a very good cleanup (plus the nice optimization in
patch 1).

--
error compiling committee.c: too many arguments to function

2011-05-18 13:18:28

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] KVM: MMU: optimize pte write path if don't have protected sp

On Mon, May 16, 2011 at 02:25:02PM +0300, Avi Kivity wrote:
> On 05/15/2011 06:20 PM, Xiao Guangrong wrote:
> >Simply return from kvm_mmu_pte_write path if no shadow page is
> >write-protected, then we can avoid to walk all shadow pages and hold
> >mmu-lock
>
> Patchset looks like a very good cleanup (plus the nice optimization
> in patch 1).

What case is patch 1 optimizing for?

2011-05-18 13:20:11

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] KVM: MMU: optimize pte write path if don't have protected sp

On 05/18/2011 04:12 PM, Marcelo Tosatti wrote:
> On Mon, May 16, 2011 at 02:25:02PM +0300, Avi Kivity wrote:
> > On 05/15/2011 06:20 PM, Xiao Guangrong wrote:
> > >Simply return from kvm_mmu_pte_write path if no shadow page is
> > >write-protected, then we can avoid to walk all shadow pages and hold
> > >mmu-lock
> >
> > Patchset looks like a very good cleanup (plus the nice optimization
> > in patch 1).
>
> What case is patch 1 optimizing for?
>

Say, kvmclock updates.

--
error compiling committee.c: too many arguments to function

2011-05-20 21:31:59

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH v2 1/7] KVM: MMU: optimize pte write path if don't have protected sp

On Sun, May 15, 2011 at 11:20:27PM +0800, Xiao Guangrong wrote:
> Simply return from kvm_mmu_pte_write path if no shadow page is
> write-protected, then we can avoid to walk all shadow pages and hold
> mmu-lock
>
> Signed-off-by: Xiao Guangrong <[email protected]>

Applied, thanks.