2011-03-04 10:55:48

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 1/10] KVM: MMU: fix kvm_mmu_slot_remove_write_access

Only remove write access in the last sptes

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 9 +++++++--
1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b6a9963..b9bf016 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3540,12 +3540,17 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot)

pt = sp->spt;
for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
- if (sp->role.level != PT_PAGE_TABLE_LEVEL
- && is_large_pte(pt[i])) {
+ if (!is_shadow_present_pte(pt[i]) ||
+ !is_last_spte(pt[i], sp->role.level))
+ continue;
+
+ if (is_large_pte(pt[i])) {
drop_spte(kvm, &pt[i],
shadow_trap_nonpresent_pte);
--kvm->stat.lpages;
+ continue;
}
+
/* avoid RMW */
if (is_writable_pte(pt[i]))
update_spte(&pt[i], pt[i] & ~PT_WRITABLE_MASK);
--
1.7.4


2011-03-04 10:56:38

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 2/10] KVM: MMU: fix kvm_mmu_get_spte_hierarchy

Don not walk to the next level if spte is mapping to the large page

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b9bf016..10e0982 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3819,7 +3819,8 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
for_each_shadow_entry(vcpu, addr, iterator) {
sptes[iterator.level-1] = *iterator.sptep;
nr_sptes++;
- if (!is_shadow_present_pte(*iterator.sptep))
+ if (!is_shadow_present_pte(*iterator.sptep) ||
+ is_last_spte(*iterator.sptep, iterator.level))
break;
}
spin_unlock(&vcpu->kvm->mmu_lock);
--
1.7.4

2011-03-04 10:57:14

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 3/10] KVM: MMU: set spte accessed bit properly

Set spte accessed bit only if guest_initiated == 1 that means the really
accessed

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10e0982..6e9cb35 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3307,11 +3307,11 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
spin_lock(&vcpu->kvm->mmu_lock);
if (atomic_read(&vcpu->kvm->arch.invlpg_counter) != invlpg_counter)
gentry = 0;
- kvm_mmu_access_page(vcpu, gfn);
kvm_mmu_free_some_pages(vcpu);
++vcpu->kvm->stat.mmu_pte_write;
trace_kvm_mmu_audit(vcpu, AUDIT_PRE_PTE_WRITE);
if (guest_initiated) {
+ kvm_mmu_access_page(vcpu, gfn);
if (gfn == vcpu->arch.last_pt_write_gfn
&& !last_updated_pte_accessed(vcpu)) {
++vcpu->arch.last_pt_write_count;
--
1.7.4

2011-03-04 10:57:47

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 4/10] KVM: fix rcu usage in init_rmode_* functions

fix:
[ 3494.671786] stack backtrace:
[ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
[ 3494.671790] Call Trace:
[ 3494.671796] [<ffffffff8107d15f>] ? lockdep_rcu_dereference+0x9d/0xa5
[ 3494.671826] [<ffffffffa03498be>] ? kvm_memslots+0x6b/0x73 [kvm]
[ 3494.671834] [<ffffffffa03499be>] ? gfn_to_memslot+0x16/0x4f [kvm]
[ 3494.671843] [<ffffffffa0349a0d>] ? gfn_to_hva+0x16/0x27 [kvm]
[ 3494.671851] [<ffffffffa034c1c4>] ? kvm_write_guest_page+0x31/0x83 [kvm]
[ 3494.671861] [<ffffffffa034c230>] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
[ 3494.671867] [<ffffffffa015e321>] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel]

and:
[ 8328.789599] stack backtrace:
[ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
[ 8328.789603] Call Trace:
[ 8328.789609] [<ffffffff8107d15f>] ? lockdep_rcu_dereference+0x9d/0xa5
[ 8328.789621] [<ffffffffa03988be>] ? kvm_memslots+0x6b/0x73 [kvm]
[ 8328.789628] [<ffffffffa03989be>] ? gfn_to_memslot+0x16/0x4f [kvm]
[ 8328.789635] [<ffffffffa0398a0d>] ? gfn_to_hva+0x16/0x27 [kvm]
[ 8328.789643] [<ffffffffa039b1c4>] ? kvm_write_guest_page+0x31/0x83 [kvm]
[ 8328.789699] [<ffffffffa039b230>] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
[ 8328.789713] [<ffffffffa0206d0e>] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel]

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/vmx.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index e2b8c6b..d871ced 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2399,9 +2399,10 @@ static int init_rmode_tss(struct kvm *kvm)
{
gfn_t fn = rmode_tss_base(kvm) >> PAGE_SHIFT;
u16 data = 0;
- int ret = 0;
- int r;
+ int r, idx, ret = 0;

+ idx = srcu_read_lock(&kvm->srcu);
+ fn = rmode_tss_base(kvm) >> PAGE_SHIFT;
r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE);
if (r < 0)
goto out;
@@ -2425,12 +2426,13 @@ static int init_rmode_tss(struct kvm *kvm)

ret = 1;
out:
+ srcu_read_unlock(&kvm->srcu, idx);
return ret;
}

static int init_rmode_identity_map(struct kvm *kvm)
{
- int i, r, ret;
+ int i, idx, r, ret;
pfn_t identity_map_pfn;
u32 tmp;

@@ -2445,6 +2447,7 @@ static int init_rmode_identity_map(struct kvm *kvm)
return 1;
ret = 0;
identity_map_pfn = kvm->arch.ept_identity_map_addr >> PAGE_SHIFT;
+ idx = srcu_read_lock(&kvm->srcu);
r = kvm_clear_guest_page(kvm, identity_map_pfn, 0, PAGE_SIZE);
if (r < 0)
goto out;
@@ -2460,6 +2463,7 @@ static int init_rmode_identity_map(struct kvm *kvm)
kvm->arch.ept_identity_pagetable_done = true;
ret = 1;
out:
+ srcu_read_unlock(&kvm->srcu, idx);
return ret;
}

--
1.7.4

2011-03-04 10:58:28

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 5/10] KVM: MMU: move mmu pages calculated out of mmu lock

kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by
kvm->slots_lock, so move it out of mmu spinlock

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 785ae0c..b9c2b8e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6105,7 +6105,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
int user_alloc)
{

- int npages = mem->memory_size >> PAGE_SHIFT;
+ int nr_mmu_pages = 0, npages = mem->memory_size >> PAGE_SHIFT;

if (!user_alloc && !old.user_alloc && old.rmap && !npages) {
int ret;
@@ -6120,12 +6120,12 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
"failed to munmap memory\n");
}

+ if (!kvm->arch.n_requested_mmu_pages)
+ nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
+
spin_lock(&kvm->mmu_lock);
- if (!kvm->arch.n_requested_mmu_pages) {
- unsigned int nr_mmu_pages = kvm_mmu_calculate_mmu_pages(kvm);
+ if (nr_mmu_pages)
kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
- }
-
kvm_mmu_slot_remove_write_access(kvm, mem->slot);
spin_unlock(&kvm->mmu_lock);
}
--
1.7.4

2011-03-04 10:59:07

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 6/10] KVM: MMU: don not record gfn in kvm_mmu_pte_write

No need to record the gfn to verifier the pte has the same mode as
current vcpu, it's because we only speculatively update the pte only
if the pte and vcpu have the same mode

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu.c | 6 ++----
arch/x86/kvm/paging_tmpl.h | 2 --
3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 37bd730..f08314f 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -336,7 +336,6 @@ struct kvm_vcpu_arch {
gfn_t last_pte_gfn;

struct {
- gfn_t gfn; /* presumed gfn during guest pte update */
pfn_t pfn; /* pfn corresponding to that gfn */
unsigned long mmu_seq;
} update_pte;
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6e9cb35..cd35e71 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3228,7 +3228,6 @@ static void mmu_guess_page_from_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
kvm_release_pfn_clean(pfn);
return;
}
- vcpu->arch.update_pte.gfn = gfn;
vcpu->arch.update_pte.pfn = pfn;
}

@@ -3275,9 +3274,8 @@ void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,

/*
* Assume that the pte write on a page table of the same type
- * as the current vcpu paging mode. This is nearly always true
- * (might be false while changing modes). Note it is verified later
- * by update_pte().
+ * as the current vcpu paging mode since we update the sptes only
+ * when they have the same mode.
*/
if ((is_pae(vcpu) && bytes == 4) || !new) {
/* Handle a 32-bit guest writing two halves of a 64-bit gpte */
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6bccc24..b3862ee 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -339,8 +339,6 @@ static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,

pgprintk("%s: gpte %llx spte %p\n", __func__, (u64)gpte, spte);
pte_access = sp->role.access & FNAME(gpte_access)(vcpu, gpte);
- if (gpte_to_gfn(gpte) != vcpu->arch.update_pte.gfn)
- return;
pfn = vcpu->arch.update_pte.pfn;
if (is_error_pfn(pfn))
return;
--
1.7.4

2011-03-04 10:59:50

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 7/10] KVM: MMU: introduce a common function to get no-dirty-logged slot

pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap have the same
operation, so integrated

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 26 ++++++++++++++++++--------
1 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cd35e71..8acb8e3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -554,13 +554,24 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
return ret;
}

-static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
+static struct kvm_memory_slot *
+gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, bool no_dirty_log)
{
struct kvm_memory_slot *slot;
- slot = gfn_to_memslot(vcpu->kvm, large_gfn);
- if (slot && slot->dirty_bitmap)
- return true;
- return false;
+
+ slot = gfn_to_memslot(vcpu->kvm, gfn);
+ if (!slot)
+ return NULL;
+
+ if (no_dirty_log && slot->dirty_bitmap)
+ return NULL;
+
+ return slot;
+}
+
+static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
+{
+ return !!gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
}

static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
@@ -2155,9 +2166,8 @@ pte_prefetch_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn, bool no_dirty_log)
{
struct kvm_memory_slot *slot;

- slot = gfn_to_memslot(vcpu->kvm, gfn);
- if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
- (no_dirty_log && slot->dirty_bitmap))
+ slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
+ if (slot && slot->flags & KVM_MEMSLOT_INVALID)
slot = NULL;

return slot;
--
1.7.4

2011-03-04 11:00:17

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 8/10] KVM: MMU: cleanup page alloc and free

Using __get_free_page instead of alloc_page and page_address,
using free_page instead of __free_page and virt_to_page

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 10 +++++-----
1 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8acb8e3..7d558b8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -379,15 +379,15 @@ static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
int min)
{
- struct page *page;
+ void *page;

if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = alloc_page(GFP_KERNEL);
+ page = (void *)__get_free_page(GFP_KERNEL);
if (!page)
return -ENOMEM;
- cache->objects[cache->nobjs++] = page_address(page);
+ cache->objects[cache->nobjs++] = page;
}
return 0;
}
@@ -1043,9 +1043,9 @@ static void kvm_mmu_free_page(struct kvm *kvm, struct kvm_mmu_page *sp)
ASSERT(is_empty_shadow_page(sp->spt));
hlist_del(&sp->hash_link);
list_del(&sp->link);
- __free_page(virt_to_page(sp->spt));
+ free_page((unsigned long)sp->spt);
if (!sp->role.direct)
- __free_page(virt_to_page(sp->gfns));
+ free_page((unsigned long)sp->gfns);
kmem_cache_free(mmu_page_header_cache, sp);
kvm_mod_used_mmu_pages(kvm, -1);
}
--
1.7.4

2011-03-04 11:00:46

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 9/10] KVM: MMU: remove unused macros

These macros are not used, so removed

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 5 -----
arch/x86/kvm/paging_tmpl.h | 3 ---
2 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d558b8..54a2ead 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -111,9 +111,6 @@ module_param(oos_shadow, bool, 0644);
#define PT64_LEVEL_SHIFT(level) \
(PAGE_SHIFT + (level - 1) * PT64_LEVEL_BITS)

-#define PT64_LEVEL_MASK(level) \
- (((1ULL << PT64_LEVEL_BITS) - 1) << PT64_LEVEL_SHIFT(level))
-
#define PT64_INDEX(address, level)\
(((address) >> PT64_LEVEL_SHIFT(level)) & ((1 << PT64_LEVEL_BITS) - 1))

@@ -123,8 +120,6 @@ module_param(oos_shadow, bool, 0644);
#define PT32_LEVEL_SHIFT(level) \
(PAGE_SHIFT + (level - 1) * PT32_LEVEL_BITS)

-#define PT32_LEVEL_MASK(level) \
- (((1ULL << PT32_LEVEL_BITS) - 1) << PT32_LEVEL_SHIFT(level))
#define PT32_LVL_OFFSET_MASK(level) \
(PT32_BASE_ADDR_MASK & ((1ULL << (PAGE_SHIFT + (((level) - 1) \
* PT32_LEVEL_BITS))) - 1))
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b3862ee..86eb816 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -31,7 +31,6 @@
#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
- #define PT_LEVEL_MASK(level) PT64_LEVEL_MASK(level)
#define PT_LEVEL_BITS PT64_LEVEL_BITS
#ifdef CONFIG_X86_64
#define PT_MAX_FULL_LEVELS 4
@@ -48,7 +47,6 @@
#define PT_LVL_ADDR_MASK(lvl) PT32_LVL_ADDR_MASK(lvl)
#define PT_LVL_OFFSET_MASK(lvl) PT32_LVL_OFFSET_MASK(lvl)
#define PT_INDEX(addr, level) PT32_INDEX(addr, level)
- #define PT_LEVEL_MASK(level) PT32_LEVEL_MASK(level)
#define PT_LEVEL_BITS PT32_LEVEL_BITS
#define PT_MAX_FULL_LEVELS 2
#define CMPXCHG cmpxchg
@@ -827,7 +825,6 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
#undef FNAME
#undef PT_BASE_ADDR_MASK
#undef PT_INDEX
-#undef PT_LEVEL_MASK
#undef PT_LVL_ADDR_MASK
#undef PT_LVL_OFFSET_MASK
#undef PT_LEVEL_BITS
--
1.7.4

2011-03-04 11:01:13

by Xiao Guangrong

[permalink] [raw]
Subject: [PATCH 10/10] KVM: cleanup memslot_id function

We can get memslot id from memslot->id directly

Signed-off-by: Xiao Guangrong <[email protected]>
---
include/linux/kvm_host.h | 7 ++++++-
virt/kvm/kvm_main.c | 17 -----------------
2 files changed, 6 insertions(+), 18 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ab42855..b1a7430 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -365,7 +365,6 @@ pfn_t gfn_to_pfn_prot(struct kvm *kvm, gfn_t gfn, bool write_fault,
bool *writable);
pfn_t gfn_to_pfn_memslot(struct kvm *kvm,
struct kvm_memory_slot *slot, gfn_t gfn);
-int memslot_id(struct kvm *kvm, gfn_t gfn);
void kvm_release_pfn_dirty(pfn_t);
void kvm_release_pfn_clean(pfn_t pfn);
void kvm_set_pfn_dirty(pfn_t pfn);
@@ -388,6 +387,12 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
+
+static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
+{
+ return gfn_to_memslot(kvm, gfn)->id;
+}
+
int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
void mark_page_dirty(struct kvm *kvm, gfn_t gfn);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1fa0d29..cf90f28 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -997,23 +997,6 @@ out:
return size;
}

-int memslot_id(struct kvm *kvm, gfn_t gfn)
-{
- int i;
- struct kvm_memslots *slots = kvm_memslots(kvm);
- struct kvm_memory_slot *memslot = NULL;
-
- for (i = 0; i < slots->nmemslots; ++i) {
- memslot = &slots->memslots[i];
-
- if (gfn >= memslot->base_gfn
- && gfn < memslot->base_gfn + memslot->npages)
- break;
- }
-
- return memslot - slots->memslots;
-}
-
static unsigned long gfn_to_hva_many(struct kvm_memory_slot *slot, gfn_t gfn,
gfn_t *nr_pages)
{
--
1.7.4

2011-03-04 11:44:16

by Jan Kiszka

[permalink] [raw]
Subject: Re: [PATCH 4/10] KVM: fix rcu usage in init_rmode_* functions

On 2011-03-04 11:58, Xiao Guangrong wrote:
> fix:
> [ 3494.671786] stack backtrace:
> [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
> [ 3494.671790] Call Trace:
> [ 3494.671796] [<ffffffff8107d15f>] ? lockdep_rcu_dereference+0x9d/0xa5
> [ 3494.671826] [<ffffffffa03498be>] ? kvm_memslots+0x6b/0x73 [kvm]
> [ 3494.671834] [<ffffffffa03499be>] ? gfn_to_memslot+0x16/0x4f [kvm]
> [ 3494.671843] [<ffffffffa0349a0d>] ? gfn_to_hva+0x16/0x27 [kvm]
> [ 3494.671851] [<ffffffffa034c1c4>] ? kvm_write_guest_page+0x31/0x83 [kvm]
> [ 3494.671861] [<ffffffffa034c230>] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
> [ 3494.671867] [<ffffffffa015e321>] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel]
>
> and:
> [ 8328.789599] stack backtrace:
> [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
> [ 8328.789603] Call Trace:
> [ 8328.789609] [<ffffffff8107d15f>] ? lockdep_rcu_dereference+0x9d/0xa5
> [ 8328.789621] [<ffffffffa03988be>] ? kvm_memslots+0x6b/0x73 [kvm]
> [ 8328.789628] [<ffffffffa03989be>] ? gfn_to_memslot+0x16/0x4f [kvm]
> [ 8328.789635] [<ffffffffa0398a0d>] ? gfn_to_hva+0x16/0x27 [kvm]
> [ 8328.789643] [<ffffffffa039b1c4>] ? kvm_write_guest_page+0x31/0x83 [kvm]
> [ 8328.789699] [<ffffffffa039b230>] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
> [ 8328.789713] [<ffffffffa0206d0e>] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel]
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/vmx.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index e2b8c6b..d871ced 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2399,9 +2399,10 @@ static int init_rmode_tss(struct kvm *kvm)
> {
> gfn_t fn = rmode_tss_base(kvm) >> PAGE_SHIFT;

I bet you also wanted to remove this initialization.

> u16 data = 0;
> - int ret = 0;
> - int r;
> + int r, idx, ret = 0;
>
> + idx = srcu_read_lock(&kvm->srcu);
> + fn = rmode_tss_base(kvm) >> PAGE_SHIFT;
> r = kvm_clear_guest_page(kvm, fn, 0, PAGE_SIZE);
> if (r < 0)
> goto out;

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

2011-03-05 17:03:39

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH 2/10] KVM: MMU: fix kvm_mmu_get_spte_hierarchy

On Fri, Mar 04, 2011 at 06:57:27PM +0800, Xiao Guangrong wrote:
> Don not walk to the next level if spte is mapping to the large page
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 3 ++-
> 1 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b9bf016..10e0982 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3819,7 +3819,8 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
> for_each_shadow_entry(vcpu, addr, iterator) {
> sptes[iterator.level-1] = *iterator.sptep;
> nr_sptes++;
> - if (!is_shadow_present_pte(*iterator.sptep))
> + if (!is_shadow_present_pte(*iterator.sptep) ||
> + is_last_spte(*iterator.sptep, iterator.level))
> break;
> }
> spin_unlock(&vcpu->kvm->mmu_lock);

shadow_walk_okay covers that case.

2011-03-06 12:32:54

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 10/10] KVM: cleanup memslot_id function

On 03/04/2011 01:02 PM, Xiao Guangrong wrote:
> We can get memslot id from memslot->id directly
>
>
> @@ -388,6 +387,12 @@ int kvm_gfn_to_hva_cache_init(struct kvm *kvm, struct gfn_to_hva_cache *ghc,
> int kvm_clear_guest_page(struct kvm *kvm, gfn_t gfn, int offset, int len);
> int kvm_clear_guest(struct kvm *kvm, gpa_t gpa, unsigned long len);
> struct kvm_memory_slot *gfn_to_memslot(struct kvm *kvm, gfn_t gfn);
> +
> +static inline int memslot_id(struct kvm *kvm, gfn_t gfn)
> +{
> + return gfn_to_memslot(kvm, gfn)->id;
> +}
> +
> int kvm_is_visible_gfn(struct kvm *kvm, gfn_t gfn);
> unsigned long kvm_host_page_size(struct kvm *kvm, gfn_t gfn);
> void mark_page_dirty(struct kvm *kvm, gfn_t gfn);

Please put the function near the other inlines.

--
error compiling committee.c: too many arguments to function

2011-03-06 12:28:38

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 7/10] KVM: MMU: introduce a common function to get no-dirty-logged slot

On 03/04/2011 01:00 PM, Xiao Guangrong wrote:
> pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap have the same
> operation, so integrated
>
> Signed-off-by: Xiao Guangrong<[email protected]>
> ---
> arch/x86/kvm/mmu.c | 26 ++++++++++++++++++--------
> 1 files changed, 18 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index cd35e71..8acb8e3 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -554,13 +554,24 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
> return ret;
> }
>
> -static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> +static struct kvm_memory_slot *
> +gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn, bool no_dirty_log)
> {
> struct kvm_memory_slot *slot;
> - slot = gfn_to_memslot(vcpu->kvm, large_gfn);
> - if (slot&& slot->dirty_bitmap)
> - return true;
> - return false;
> +
> + slot = gfn_to_memslot(vcpu->kvm, gfn);
> + if (!slot)
> + return NULL;
> +
> + if (no_dirty_log&& slot->dirty_bitmap)
> + return NULL;
> +
> + return slot;
> +}
> +
> +static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> +{
> + return !!gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
> }

!! !needed.

>
> static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
> @@ -2155,9 +2166,8 @@ pte_prefetch_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn, bool no_dirty_log)
> {
> struct kvm_memory_slot *slot;
>
> - slot = gfn_to_memslot(vcpu->kvm, gfn);
> - if (!slot || slot->flags& KVM_MEMSLOT_INVALID ||
> - (no_dirty_log&& slot->dirty_bitmap))
> + slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
> + if (slot&& slot->flags& KVM_MEMSLOT_INVALID)
> slot = NULL;
>
> return slot;

For a unification this adds a lot of code... I think the result is more
complicated than the starting point.

--
error compiling committee.c: too many arguments to function

2011-03-06 12:35:10

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 1/10] KVM: MMU: fix kvm_mmu_slot_remove_write_access

On 03/04/2011 12:56 PM, Xiao Guangrong wrote:
> Only remove write access in the last sptes

Thanks, applied patches 1, 3, 5, 6, 8, and 9.

--
error compiling committee.c: too many arguments to function

2011-03-07 03:05:18

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 2/10] KVM: MMU: fix kvm_mmu_get_spte_hierarchy

On 03/06/2011 12:48 AM, Marcelo Tosatti wrote:
> On Fri, Mar 04, 2011 at 06:57:27PM +0800, Xiao Guangrong wrote:
>> Don not walk to the next level if spte is mapping to the large page
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 3 ++-
>> 1 files changed, 2 insertions(+), 1 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index b9bf016..10e0982 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3819,7 +3819,8 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4])
>> for_each_shadow_entry(vcpu, addr, iterator) {
>> sptes[iterator.level-1] = *iterator.sptep;
>> nr_sptes++;
>> - if (!is_shadow_present_pte(*iterator.sptep))
>> + if (!is_shadow_present_pte(*iterator.sptep) ||
>> + is_last_spte(*iterator.sptep, iterator.level))
>> break;
>> }
>> spin_unlock(&vcpu->kvm->mmu_lock);
>
> shadow_walk_okay covers that case.
>

We can get the large mapping pte in the loop:

static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
{
if (iterator->level < PT_PAGE_TABLE_LEVEL)
return false;

if (iterator->level == PT_PAGE_TABLE_LEVEL)
if (is_large_pte(*iterator->sptep))
return false;

......
}

if level > 1 and pte.pse is set, it will return true.

And, i think this judgment is useless:
if (iterator->level == PT_PAGE_TABLE_LEVEL)
if (is_large_pte(*iterator->sptep))
return false;
since if level = 1, the pte bit7 is PAT.

2011-03-07 02:03:38

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 4/10] KVM: fix rcu usage in init_rmode_* functions

On 03/04/2011 07:43 PM, Jan Kiszka wrote:
> On 2011-03-04 11:58, Xiao Guangrong wrote:
>> fix:
>> [ 3494.671786] stack backtrace:
>> [ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
>> [ 3494.671790] Call Trace:
>> [ 3494.671796] [<ffffffff8107d15f>] ? lockdep_rcu_dereference+0x9d/0xa5
>> [ 3494.671826] [<ffffffffa03498be>] ? kvm_memslots+0x6b/0x73 [kvm]
>> [ 3494.671834] [<ffffffffa03499be>] ? gfn_to_memslot+0x16/0x4f [kvm]
>> [ 3494.671843] [<ffffffffa0349a0d>] ? gfn_to_hva+0x16/0x27 [kvm]
>> [ 3494.671851] [<ffffffffa034c1c4>] ? kvm_write_guest_page+0x31/0x83 [kvm]
>> [ 3494.671861] [<ffffffffa034c230>] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
>> [ 3494.671867] [<ffffffffa015e321>] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel]
>>
>> and:
>> [ 8328.789599] stack backtrace:
>> [ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
>> [ 8328.789603] Call Trace:
>> [ 8328.789609] [<ffffffff8107d15f>] ? lockdep_rcu_dereference+0x9d/0xa5
>> [ 8328.789621] [<ffffffffa03988be>] ? kvm_memslots+0x6b/0x73 [kvm]
>> [ 8328.789628] [<ffffffffa03989be>] ? gfn_to_memslot+0x16/0x4f [kvm]
>> [ 8328.789635] [<ffffffffa0398a0d>] ? gfn_to_hva+0x16/0x27 [kvm]
>> [ 8328.789643] [<ffffffffa039b1c4>] ? kvm_write_guest_page+0x31/0x83 [kvm]
>> [ 8328.789699] [<ffffffffa039b230>] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
>> [ 8328.789713] [<ffffffffa0206d0e>] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel]
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/vmx.c | 10 +++++++---
>> 1 files changed, 7 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index e2b8c6b..d871ced 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2399,9 +2399,10 @@ static int init_rmode_tss(struct kvm *kvm)
>> {
>> gfn_t fn = rmode_tss_base(kvm) >> PAGE_SHIFT;
>
> I bet you also wanted to remove this initialization.
>

Ouch, thank you for reminding me of that!

2011-03-07 03:49:41

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 7/10] KVM: MMU: introduce a common function to get no-dirty-logged slot

On 03/06/2011 08:28 PM, Avi Kivity wrote:

>> static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn)
>> @@ -2155,9 +2166,8 @@ pte_prefetch_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn, bool no_dirty_log)
>> {
>> struct kvm_memory_slot *slot;
>>
>> - slot = gfn_to_memslot(vcpu->kvm, gfn);
>> - if (!slot || slot->flags& KVM_MEMSLOT_INVALID ||
>> - (no_dirty_log&& slot->dirty_bitmap))
>> + slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
>> + if (slot&& slot->flags& KVM_MEMSLOT_INVALID)
>> slot = NULL;
>>
>> return slot;
>
> For a unification this adds a lot of code... I think the result is more complicated than the starting point.
>

After more thinking, i think we can move 'slot->flags& KVM_MEMSLOT_INVALID'
into the common function, since the later gva_to_pfn also can filter this out.

2011-03-07 03:23:08

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 6/10] KVM: MMU: don not record gfn in kvm_mmu_pte_write

On 03/06/2011 08:16 PM, Avi Kivity wrote:
> On 03/06/2011 02:14 PM, Avi Kivity wrote:
>> On 03/04/2011 01:00 PM, Xiao Guangrong wrote:
>>> No need to record the gfn to verifier the pte has the same mode as
>>> current vcpu, it's because we only speculatively update the pte only
>>> if the pte and vcpu have the same mode
>>
>> True. We can, as an additional cleanup, change mmu_pte_write_new_pte() to just do vcpu->arch.mmu.update_pte(...) instead of the if ().
>>
>
> Also, we can remove the pfn guessing and to it in FNAME(update_pte), using gfn_to_pfn_atomic().
>

OK, i'll do these in the next version, thanks!

2011-03-06 12:14:25

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 6/10] KVM: MMU: don not record gfn in kvm_mmu_pte_write

On 03/04/2011 01:00 PM, Xiao Guangrong wrote:
> No need to record the gfn to verifier the pte has the same mode as
> current vcpu, it's because we only speculatively update the pte only
> if the pte and vcpu have the same mode

True. We can, as an additional cleanup, change mmu_pte_write_new_pte()
to just do vcpu->arch.mmu.update_spte(...) instead of the if ().

--
error compiling committee.c: too many arguments to function

2011-03-06 12:16:47

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 6/10] KVM: MMU: don not record gfn in kvm_mmu_pte_write

On 03/06/2011 02:14 PM, Avi Kivity wrote:
> On 03/04/2011 01:00 PM, Xiao Guangrong wrote:
>> No need to record the gfn to verifier the pte has the same mode as
>> current vcpu, it's because we only speculatively update the pte only
>> if the pte and vcpu have the same mode
>
> True. We can, as an additional cleanup, change
> mmu_pte_write_new_pte() to just do vcpu->arch.mmu.update_pte(...)
> instead of the if ().
>

Also, we can remove the pfn guessing and to it in FNAME(update_pte),
using gfn_to_pfn_atomic().

--
error compiling committee.c: too many arguments to function

2011-03-07 08:55:23

by Avi Kivity

[permalink] [raw]
Subject: Re: [PATCH 2/10] KVM: MMU: fix kvm_mmu_get_spte_hierarchy

On 03/07/2011 05:06 AM, Xiao Guangrong wrote:
> We can get the large mapping pte in the loop:
>
> static bool shadow_walk_okay(struct kvm_shadow_walk_iterator *iterator)
> {
> if (iterator->level< PT_PAGE_TABLE_LEVEL)
> return false;
>
> if (iterator->level == PT_PAGE_TABLE_LEVEL)


> if (is_large_pte(*iterator->sptep))
> return false;
>
> ......
> }
>
> if level> 1 and pte.pse is set, it will return true.
>
> And, i think this judgment is useless:
> if (iterator->level == PT_PAGE_TABLE_LEVEL)
> if (is_large_pte(*iterator->sptep))
> return false;
> since if level = 1, the pte bit7 is PAT.

Right, the logic is inverted. I'll fix it up.

--
error compiling committee.c: too many arguments to function