2015-05-12 02:36:07

by Xiao Guangrong

Subject: [PATCH v2 0/9] KVM: MTRR fixes and some cleanups

Changelog:
- fix the bit description in the changelog of the first patch; thanks to
David Matlack for pointing it out

All the following changes come from Paolo's comments, which are much
appreciated:
- reorder the whole patchset to make it more readable
- redesign the iterator APIs
- flush the TLB if @lock_flush_tlb is true in slot_handle_level()
- make the MTRR update generic

There are some MTRR bugs when a legacy IOMMU device is used on Intel CPUs:
- In the current code, whenever the guest MTRR registers are changed,
kvm_mmu_reset_context is called to switch to a new root shadow page
table. However, this is useless since:
1) the cache type is not stored in the shadow page's attributes, so the
original root shadow page will be reused

2) the cache type is set on the last-level spte, which means the last-level
sptes should be synced when the MTRR is changed

We can fix this by dropping all the sptes in the gfn range that is
being updated by the MTRR

- there are some bugs in get_mtrr_type() (a small decoding sketch follows
this list):
1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the IA32_MTRR_DEF_TYPE
MSR, which completely controls MTRR enablement; all other bits are ignored
if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of mtrr_state->enabled (bit
10 of IA32_MTRR_DEF_TYPE)

3: if MTRRs are disabled, UC is applied to all of physical memory rather than
mtrr_state->def_type

- we need not reset the mmu when the cache policy is changed, since the
shadow page table does not virtualize any cache policy
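
For reference, a minimal self-contained sketch of the IA32_MTRR_DEF_TYPE
decoding described above; mtrr_state->enabled holds MSR bits 10-11 in its low
two bits, which is why the patches test 0x2 (E) and 0x1 (FE). All names below
are illustrative, not the kernel's:

#include <stdint.h>
#include <stdio.h>

#define MTRR_DEF_TYPE_FE (1u << 10)	/* fixed-range MTRRs enabled */
#define MTRR_DEF_TYPE_E  (1u << 11)	/* all MTRRs enabled (master switch) */
#define MTRR_TYPE_UC     0x00		/* uncacheable */

/* Effective default type for an access that hits no fixed/variable range. */
static uint8_t mtrr_effective_default(uint64_t def_type_msr)
{
	/* Bit 11 (E) clear: MTRRs are off and everything is UC,
	 * regardless of the default type field. */
	if (!(def_type_msr & MTRR_DEF_TYPE_E))
		return MTRR_TYPE_UC;

	/* Bit 11 set: the low 8 bits give the default type; whether the
	 * fixed ranges apply is controlled separately by bit 10 (FE). */
	return (uint8_t)(def_type_msr & 0xff);
}

int main(void)
{
	printf("%u\n", mtrr_effective_default(0));               /* 0 (UC) */
	printf("%u\n", mtrr_effective_default((1u << 11) | 6));  /* 6 (WB) */
	return 0;
}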

Also, there are some cleanups to make the current MMU code cleaner and the
bugs easier to fix.

Xiao Guangrong (9):
KVM: MMU: fix decoding cache type from MTRR
KVM: MMU: introduce for_each_rmap_spte()
KVM: MMU: introduce for_each_slot_rmap_range
KVM: MMU: introduce slot_handle_level_range() and its helpers
KVM: MMU: use slot_handle_level and its helper to clean up the code
KVM: MMU: introduce kvm_zap_rmapp
KVM: MMU: introduce kvm_zap_gfn_range
KVM: MMU: fix MTRR update
KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed

arch/x86/kvm/mmu.c | 408 ++++++++++++++++++++++++++---------------------
arch/x86/kvm/mmu.h | 1 +
arch/x86/kvm/mmu_audit.c | 4 +-
arch/x86/kvm/x86.c | 62 ++++++-
4 files changed, 284 insertions(+), 191 deletions(-)

--
2.1.0


2015-05-12 02:38:44

by Xiao Guangrong

Subject: [PATCH v2 1/9] KVM: MMU: fix decoding cache type from MTRR

There are some bugs in the current get_mtrr_type() (a sketch of the
fixed-range layout follows this list):
1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
all other bits are ignored if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of
mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)

3: if MTRRs are disabled, UC is applied to all of physical memory rather
than mtrr_state->def_type
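
For reference, a self-contained sketch of the standard fixed-range layout
that the hunk below indexes into (the function name is illustrative):

#include <stdint.h>
#include <assert.h>

/*
 * Fixed MTRR layout: 8 x 64KB ranges cover 0x00000-0x7FFFF,
 * 16 x 16KB ranges cover 0x80000-0xBFFFF, and 64 x 4KB ranges
 * cover 0xC0000-0xFFFFF, for 88 fixed ranges in total.
 */
static int fixed_mtrr_index(uint64_t start)
{
	assert(start < 0x100000);

	if (start < 0x80000)
		return (int)(start >> 16);                  /* 0 .. 7   */
	if (start < 0xC0000)
		return 8 + (int)((start - 0x80000) >> 14);  /* 8 .. 23  */
	return 24 + (int)((start - 0xC0000) >> 12);         /* 24 .. 87 */
}

int main(void)
{
	assert(fixed_mtrr_index(0x00000) == 0);
	assert(fixed_mtrr_index(0x9C000) == 15);
	assert(fixed_mtrr_index(0xFF000) == 87);
	return 0;
}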

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3711095..f5fcfc1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
u64 start, u64 end)
{
- int i;
u64 base, mask;
u8 prev_match, curr_match;
- int num_var_ranges = KVM_NR_VAR_MTRR;
+ int i, num_var_ranges = KVM_NR_VAR_MTRR;

- if (!mtrr_state->enabled)
- return 0xFF;
+ /* MTRR is completely disabled, use UC for all of physical memory. */
+ if (!(mtrr_state->enabled & 0x2))
+ return MTRR_TYPE_UNCACHABLE;

/* Make end inclusive end, instead of exclusive */
end--;

/* Look in fixed ranges. Just return the type as per start */
- if (mtrr_state->have_fixed && (start < 0x100000)) {
+ if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
+ (start < 0x100000)) {
int idx;

if (start < 0x80000) {
@@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
* Look of multiple ranges matching this address and pick type
* as per MTRR precedence
*/
- if (!(mtrr_state->enabled & 2))
- return mtrr_state->def_type;
-
prev_match = 0xFF;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state;
--
2.1.0

2015-05-12 02:36:13

by Xiao Guangrong

Subject: [PATCH v2 2/9] KVM: MMU: introduce for_each_rmap_spte()

Introduce for_each_rmap_spte() to walk all the sptes on a rmap chain and
use it to clean up the code
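
The conversions in this patch also use a "goto restart" idiom whenever the
loop body drops the spte it is visiting, because the iterator cannot safely
step past an entry that has just been removed. Below is a self-contained
analogue of that idiom on a toy list; all names are illustrative:

#include <stdio.h>
#include <stdlib.h>

struct node { int val; struct node *next; };

#define for_each_node(_head_, _n_) \
	for ((_n_) = (_head_); (_n_) != NULL; (_n_) = (_n_)->next)

/* Remove every even value.  Restart after each removal: the iterator
 * would otherwise advance through a node that was just freed. */
static void drop_even(struct node **head)
{
	struct node *n, **link;

restart:
	for_each_node(*head, n) {
		if (n->val % 2 == 0) {
			for (link = head; *link != n; link = &(*link)->next)
				;
			*link = n->next;
			free(n);
			goto restart;
		}
	}
}

int main(void)
{
	struct node *head = NULL, *n;
	int i;

	for (i = 5; i >= 1; i--) {
		n = malloc(sizeof(*n));
		n->val = i;
		n->next = head;
		head = n;
	}

	drop_even(&head);
	for_each_node(head, n)
		printf("%d ", n->val);	/* prints: 1 3 5 */
	printf("\n");
	return 0;
}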

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 63 +++++++++++++++++++-----------------------------
arch/x86/kvm/mmu_audit.c | 4 +--
2 files changed, 26 insertions(+), 41 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f5fcfc1..0da9cf0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1142,6 +1142,11 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
return NULL;
}

+#define for_each_rmap_spte(_rmap_, _iter_, _spte_) \
+ for (_spte_ = rmap_get_first(*_rmap_, _iter_); \
+ _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \
+ _spte_ = rmap_get_next(_iter_))
+
static void drop_spte(struct kvm *kvm, u64 *sptep)
{
if (mmu_spte_clear_track_bits(sptep))
@@ -1205,12 +1210,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
struct rmap_iterator iter;
bool flush = false;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
flush |= spte_write_protect(kvm, sptep, pt_protect);
- sptep = rmap_get_next(&iter);
- }

return flush;
}
@@ -1232,12 +1233,8 @@ static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
struct rmap_iterator iter;
bool flush = false;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
flush |= spte_clear_dirty(kvm, sptep);
- sptep = rmap_get_next(&iter);
- }

return flush;
}
@@ -1259,12 +1256,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
struct rmap_iterator iter;
bool flush = false;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
flush |= spte_set_dirty(kvm, sptep);
- sptep = rmap_get_next(&iter);
- }

return flush;
}
@@ -1368,13 +1361,14 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
struct rmap_iterator iter;
int need_tlb_flush = 0;

- while ((sptep = rmap_get_first(*rmapp, &iter))) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
+restart:
+ for_each_rmap_spte(rmapp, &iter, sptep) {
rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
sptep, *sptep, gfn, level);

drop_spte(kvm, sptep);
need_tlb_flush = 1;
+ goto restart;
}

return need_tlb_flush;
@@ -1394,8 +1388,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
WARN_ON(pte_huge(*ptep));
new_pfn = pte_pfn(*ptep);

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!is_shadow_present_pte(*sptep));
+restart:
+ for_each_rmap_spte(rmapp, &iter, sptep) {
rmap_printk("kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n",
sptep, *sptep, gfn, level);

@@ -1403,7 +1397,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

if (pte_write(*ptep)) {
drop_spte(kvm, sptep);
- sptep = rmap_get_first(*rmapp, &iter);
+ goto restart;
} else {
new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
new_spte |= (u64)new_pfn << PAGE_SHIFT;
@@ -1414,7 +1408,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

mmu_spte_clear_track_bits(sptep);
mmu_spte_set(sptep, new_spte);
- sptep = rmap_get_next(&iter);
}
}

@@ -1518,16 +1511,13 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,

BUG_ON(!shadow_accessed_mask);

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;
- sptep = rmap_get_next(&iter)) {
- BUG_ON(!is_shadow_present_pte(*sptep));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
if (*sptep & shadow_accessed_mask) {
young = 1;
clear_bit((ffs(shadow_accessed_mask) - 1),
(unsigned long *)sptep);
}
- }
+
trace_kvm_age_page(gfn, level, slot, young);
return young;
}
@@ -1548,15 +1538,11 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
if (!shadow_accessed_mask)
goto out;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;
- sptep = rmap_get_next(&iter)) {
- BUG_ON(!is_shadow_present_pte(*sptep));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
if (*sptep & shadow_accessed_mask) {
young = 1;
break;
}
- }
out:
return young;
}
@@ -2232,8 +2218,11 @@ static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
u64 *sptep;
struct rmap_iterator iter;

- while ((sptep = rmap_get_first(sp->parent_ptes, &iter)))
+restart:
+ for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
drop_parent_pte(sp, sptep);
+ goto restart;
+ }
}

static int mmu_zap_unsync_children(struct kvm *kvm,
@@ -4473,9 +4462,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
pfn_t pfn;
struct kvm_mmu_page *sp;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+restart:
+ for_each_rmap_spte(rmapp, &iter, sptep) {
sp = page_header(__pa(sptep));
pfn = spte_to_pfn(*sptep);

@@ -4490,10 +4478,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
!kvm_is_reserved_pfn(pfn) &&
PageTransCompound(pfn_to_page(pfn))) {
drop_spte(kvm, sptep);
- sptep = rmap_get_first(*rmapp, &iter);
need_tlb_flush = 1;
- } else
- sptep = rmap_get_next(&iter);
+ goto restart;
+ }
}

return need_tlb_flush;
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 9ade5cf..368d534 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -197,13 +197,11 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)

rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;
- sptep = rmap_get_next(&iter)) {
+ for_each_rmap_spte(rmapp, &iter, sptep)
if (is_writable_pte(*sptep))
audit_printk(kvm, "shadow page has writable "
"mappings: gfn %llx role %x\n",
sp->gfn, sp->role.word);
- }
}

static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
--
2.1.0

2015-05-12 02:36:18

by Xiao Guangrong

Subject: [PATCH v2 3/9] KVM: MMU: introduce for_each_slot_rmap_range

Abstract the rmap-range walk out of kvm_handle_hva_range(); it will also be
used by a later patch
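
Below is a self-contained analogue of the iterator introduced here: an
init/okay/next triple wrapped by a for_each macro, walking an inner index
range once per level. The names are simplified stand-ins for the
slot_rmap_walk_* helpers:

#include <stdio.h>
#include <stdbool.h>

struct range_walk_iterator {
	/* input fields */
	int start_level, end_level;
	int start_idx, end_idx;
	/* output fields */
	int level, idx;
};

static void range_walk_init(struct range_walk_iterator *it,
			    int start_level, int end_level,
			    int start_idx, int end_idx)
{
	it->start_level = start_level;
	it->end_level = end_level;
	it->start_idx = start_idx;
	it->end_idx = end_idx;
	it->level = start_level;
	it->idx = start_idx;
}

static bool range_walk_okay(struct range_walk_iterator *it)
{
	return it->level <= it->end_level;
}

static void range_walk_next(struct range_walk_iterator *it)
{
	if (++it->idx <= it->end_idx)
		return;
	/* inner range exhausted: move to the next level and rewind */
	it->idx = it->start_idx;
	it->level++;
}

#define for_each_range(_it_, _sl_, _el_, _si_, _ei_)		\
	for (range_walk_init(_it_, _sl_, _el_, _si_, _ei_);	\
	     range_walk_okay(_it_);				\
	     range_walk_next(_it_))

int main(void)
{
	struct range_walk_iterator it;

	for_each_range(&it, 1, 3, 0, 1)
		printf("level %d idx %d\n", it.level, it.idx);
	return 0;
}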

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 97 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 75 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 0da9cf0..98b7a6a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1417,6 +1417,74 @@ restart:
return 0;
}

+struct slot_rmap_walk_iterator {
+ /* input fields. */
+ struct kvm_memory_slot *slot;
+ gfn_t start_gfn;
+ gfn_t end_gfn;
+ int start_level;
+ int end_level;
+
+ /* output fields. */
+ gfn_t gfn;
+ unsigned long *rmap;
+ int level;
+
+ /* private field. */
+ unsigned long *end_rmap;
+};
+
+static void
+rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+{
+ iterator->level = level;
+ iterator->gfn = iterator->start_gfn;
+ iterator->rmap = __gfn_to_rmap(iterator->gfn, level, iterator->slot);
+ iterator->end_rmap = __gfn_to_rmap(iterator->end_gfn, level,
+ iterator->slot);
+}
+
+static void
+slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+ struct kvm_memory_slot *slot, int start_level,
+ int end_level, gfn_t start_gfn, gfn_t end_gfn)
+{
+ iterator->slot = slot;
+ iterator->start_level = start_level;
+ iterator->end_level = end_level;
+ iterator->start_gfn = start_gfn;
+ iterator->end_gfn = end_gfn;
+
+ rmap_walk_init_level(iterator, iterator->start_level);
+}
+
+static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
+{
+ return !!iterator->rmap;
+}
+
+static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+{
+ if (++iterator->rmap <= iterator->end_rmap) {
+ iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
+ return;
+ }
+
+ if (++iterator->level > iterator->end_level) {
+ iterator->rmap = NULL;
+ return;
+ }
+
+ rmap_walk_init_level(iterator, iterator->level);
+}
+
+#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
+ _start_gfn, _end_gfn, _iter_) \
+ for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
+ _end_level_, _start_gfn, _end_gfn); \
+ slot_rmap_walk_okay(_iter_); \
+ slot_rmap_walk_next(_iter_))
+
static int kvm_handle_hva_range(struct kvm *kvm,
unsigned long start,
unsigned long end,
@@ -1428,10 +1496,10 @@ static int kvm_handle_hva_range(struct kvm *kvm,
int level,
unsigned long data))
{
- int j;
- int ret = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
+ struct slot_rmap_walk_iterator iterator;
+ int ret = 0;

slots = kvm_memslots(kvm);

@@ -1451,26 +1519,11 @@ static int kvm_handle_hva_range(struct kvm *kvm,
gfn_start = hva_to_gfn_memslot(hva_start, memslot);
gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);

- for (j = PT_PAGE_TABLE_LEVEL;
- j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
- unsigned long idx, idx_end;
- unsigned long *rmapp;
- gfn_t gfn = gfn_start;
-
- /*
- * {idx(page_j) | page_j intersects with
- * [hva_start, hva_end)} = {idx, idx+1, ..., idx_end}.
- */
- idx = gfn_to_index(gfn_start, memslot->base_gfn, j);
- idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, j);
-
- rmapp = __gfn_to_rmap(gfn_start, j, memslot);
-
- for (; idx <= idx_end;
- ++idx, gfn += (1UL << KVM_HPAGE_GFN_SHIFT(j)))
- ret |= handler(kvm, rmapp++, memslot,
- gfn, j, data);
- }
+ for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
+ PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1,
+ gfn_start, gfn_end - 1, &iterator)
+ ret |= handler(kvm, iterator.rmap, memslot,
+ iterator.gfn, iterator.level, data);
}

return ret;
--
2.1.0

2015-05-12 02:38:23

by Xiao Guangrong

Subject: [PATCH v2 4/9] KVM: MMU: introduce slot_handle_level_range() and its helpers

There are several places that walk all the rmaps of a memslot, so introduce
common functions to clean up the code
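
Below is a self-contained sketch of the control flow these helpers factor
out: run a handler across a range, accumulate whether a TLB flush is needed,
and flush before every voluntary reschedule when @lock_flush_tlb is set. The
flush and yield hooks are illustrative stand-ins for kvm_flush_remote_tlbs()
and cond_resched_lock():

#include <stdbool.h>
#include <stdio.h>

typedef bool (*chunk_handler)(int chunk);

/* Stand-ins for kvm_flush_remote_tlbs() and cond_resched_lock(). */
static void flush_tlbs(void)    { printf("flush\n"); }
static void yield_lock(void)    { printf("yield\n"); }
static bool should_yield(int i) { return (i % 4) == 3; }

static bool handle_range(int start, int end, chunk_handler fn,
			 bool lock_flush_tlb)
{
	bool flush = false;
	int i;

	for (i = start; i <= end; i++) {
		flush |= fn(i);

		if (should_yield(i)) {
			/* Flush before dropping the lock so that nothing
			 * already handled can be observed stale. */
			if (flush && lock_flush_tlb) {
				flush_tlbs();
				flush = false;
			}
			yield_lock();
		}
	}

	if (flush && lock_flush_tlb) {
		flush_tlbs();
		flush = false;
	}

	/* If lock_flush_tlb was false, the caller decides when to flush. */
	return flush;
}

static bool touch_chunk(int chunk) { return (chunk % 2) == 0; }

int main(void)
{
	if (handle_range(0, 9, touch_chunk, false))
		flush_tlbs();
	return 0;
}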

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 98b7a6a..55ed4f6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4453,6 +4453,75 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
init_kvm_mmu(vcpu);
}

+/* The return value indicates if tlb flush on all vcpus is needed. */
+typedef bool (*slot_level_handler) (struct kvm *kvm, unsigned long *rmap);
+
+/* The caller should hold mmu-lock before calling this function. */
+static bool
+slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, int start_level, int end_level,
+ gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
+{
+ struct slot_rmap_walk_iterator iterator;
+ bool flush = false;
+
+ for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
+ end_gfn, &iterator) {
+ if (iterator.rmap)
+ flush |= fn(kvm, iterator.rmap);
+
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+ if (flush & lock_flush_tlb) {
+ kvm_flush_remote_tlbs(kvm);
+ flush = false;
+ }
+ cond_resched_lock(&kvm->mmu_lock);
+ }
+ }
+
+ if (flush && lock_flush_tlb) {
+ kvm_flush_remote_tlbs(kvm);
+ flush = false;
+ }
+
+ return flush;
+}
+
+static bool
+slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, int start_level, int end_level,
+ bool lock_flush_tlb)
+{
+ return slot_handle_level_range(kvm, memslot, fn, start_level,
+ end_level, memslot->base_gfn,
+ memslot->base_gfn + memslot->npages - 1,
+ lock_flush_tlb);
+}
+
+static bool
+slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, bool lock_flush_tlb)
+{
+ return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
+ PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1, lock_flush_tlb);
+}
+
+static bool
+slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, bool lock_flush_tlb)
+{
+ return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL + 1,
+ PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1, lock_flush_tlb);
+}
+
+static bool
+slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, bool lock_flush_tlb)
+{
+ return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
+ PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
+}
+
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
--
2.1.0

2015-05-12 02:37:52

by Xiao Guangrong

Subject: [PATCH v2 5/9] KVM: MMU: use slot_handle_level and its helper to clean up the code

slot_handle_level() and its helper functions are ready now; use them to
clean up the code

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 130 +++++++----------------------------------------------
1 file changed, 16 insertions(+), 114 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 55ed4f6..fae349a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4522,35 +4522,19 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
}

+static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
+{
+ return __rmap_write_protect(kvm, rmapp, false);
+}
+
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- int i;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- for (i = PT_PAGE_TABLE_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
- unsigned long *rmapp;
- unsigned long last_index, index;
-
- rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_write_protect(kvm, rmapp,
- false);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
-
+ flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
+ false);
spin_unlock(&kvm->mmu_lock);

/*
@@ -4611,59 +4595,18 @@ restart:
void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- bool flush = false;
- unsigned long *rmapp;
- unsigned long last_index, index;
-
spin_lock(&kvm->mmu_lock);
-
- rmapp = memslot->arch.rmap[0];
- last_index = gfn_to_index(memslot->base_gfn + memslot->npages - 1,
- memslot->base_gfn, PT_PAGE_TABLE_LEVEL);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= kvm_mmu_zap_collapsible_spte(kvm, rmapp);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
- if (flush) {
- kvm_flush_remote_tlbs(kvm);
- flush = false;
- }
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
-
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
+ slot_handle_leaf(kvm, memslot, kvm_mmu_zap_collapsible_spte, true);
spin_unlock(&kvm->mmu_lock);
}

void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- unsigned long *rmapp;
- unsigned long last_index, index;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- rmapp = memslot->arch.rmap[PT_PAGE_TABLE_LEVEL - 1];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn,
- PT_PAGE_TABLE_LEVEL);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_clear_dirty(kvm, rmapp);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
-
+ flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
spin_unlock(&kvm->mmu_lock);

lockdep_assert_held(&kvm->slots_lock);
@@ -4682,31 +4625,11 @@ EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- int i;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- for (i = PT_PAGE_TABLE_LEVEL + 1; /* skip rmap for 4K page */
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
- unsigned long *rmapp;
- unsigned long last_index, index;
-
- rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_write_protect(kvm, rmapp,
- false);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
+ flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
+ false);
spin_unlock(&kvm->mmu_lock);

/* see kvm_mmu_slot_remove_write_access */
@@ -4720,31 +4643,10 @@ EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);
void kvm_mmu_slot_set_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- int i;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- for (i = PT_PAGE_TABLE_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
- unsigned long *rmapp;
- unsigned long last_index, index;
-
- rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_set_dirty(kvm, rmapp);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
-
+ flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
spin_unlock(&kvm->mmu_lock);

lockdep_assert_held(&kvm->slots_lock);
--
2.1.0

2015-05-12 02:37:30

by Xiao Guangrong

Subject: [PATCH v2 6/9] KVM: MMU: introduce kvm_zap_rmapp

Split kvm_unmap_rmapp() and introduce kvm_zap_rmapp(), which will be used
in a later patch

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fae349a..10d5e03 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1353,25 +1353,29 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
return write_protected;
}

-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
- struct kvm_memory_slot *slot, gfn_t gfn, int level,
- unsigned long data)
+static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
{
u64 *sptep;
struct rmap_iterator iter;
- int need_tlb_flush = 0;
+ bool flush = false;

restart:
for_each_rmap_spte(rmapp, &iter, sptep) {
- rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
- sptep, *sptep, gfn, level);
+ rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);

drop_spte(kvm, sptep);
- need_tlb_flush = 1;
+ flush = true;
goto restart;
}

- return need_tlb_flush;
+ return flush;
+}
+
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+ struct kvm_memory_slot *slot, gfn_t gfn, int level,
+ unsigned long data)
+{
+ return kvm_zap_rmapp(kvm, rmapp);
}

static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
--
2.1.0

2015-05-12 02:36:52

by Xiao Guangrong

Subject: [PATCH v2 7/9] KVM: MMU: introduce kvm_zap_gfn_range

Introduce kvm_zap_gfn_range() to zap all the rmaps of the specified gfn
range; it will be used by a later patch
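
Below is a self-contained sketch of how the requested gfn range is
intersected with each memslot before its rmaps are walked; the types and
names are simplified stand-ins:

#include <stdint.h>
#include <stdio.h>

typedef uint64_t gfn_t;

struct slot { gfn_t base_gfn; uint64_t npages; };

#define MAX(a, b) ((a) > (b) ? (a) : (b))
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Clamp a [gfn_start, gfn_end) zap request to each slot, as the new
 * function does before handing the range to the rmap walker. */
static void zap_range(const struct slot *slots, int n,
		      gfn_t gfn_start, gfn_t gfn_end)
{
	int i;

	for (i = 0; i < n; i++) {
		gfn_t start = MAX(gfn_start, slots[i].base_gfn);
		gfn_t end = MIN(gfn_end, slots[i].base_gfn + slots[i].npages);

		if (start >= end)
			continue;	/* no overlap with this slot */

		printf("slot %d: zap [%llu, %llu)\n", i,
		       (unsigned long long)start, (unsigned long long)end);
	}
}

int main(void)
{
	struct slot slots[] = { { 0, 256 }, { 1024, 512 } };

	zap_range(slots, 2, 100, 1100);	/* overlaps both slots */
	return 0;
}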

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 25 +++++++++++++++++++++++++
arch/x86/kvm/mmu.h | 1 +
2 files changed, 26 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 10d5e03..8c400dc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4526,6 +4526,31 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
}

+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *memslot;
+
+ slots = kvm_memslots(kvm);
+
+ spin_lock(&kvm->mmu_lock);
+ kvm_for_each_memslot(memslot, slots) {
+ gfn_t start, end;
+
+ start = max(gfn_start, memslot->base_gfn);
+ end = min(gfn_end, memslot->base_gfn + memslot->npages);
+ if (start >= end)
+ continue;
+
+ slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+ PT_PAGE_TABLE_LEVEL,
+ PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1,
+ start, end - 1, true);
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+}
+
static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
{
return __rmap_write_protect(kvm, rmapp, false);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 06eb2fc..deec5a8 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -172,4 +172,5 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
}

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
#endif
--
2.1.0

2015-05-12 02:36:49

by Xiao Guangrong

Subject: [PATCH v2 8/9] KVM: MMU: fix MTRR update

Currently, whenever the guest MTRR registers are changed,
kvm_mmu_reset_context is called to switch to a new root shadow page
table. However, this is useless since:
1) the cache type is not stored in the shadow page's attributes, so
the original root shadow page will be reused

2) the cache type is set on the last-level spte, which means the
last-level sptes should be synced when the MTRR is changed

This patch fixes the issue by dropping all the sptes in the gfn range
that is being updated by the MTRR
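
For the variable-range case handled below, this self-contained sketch shows
how the zap window is derived from a base/mask pair (maxphyaddr is fixed at
36 bits just for the example; the function name is illustrative):

#include <stdint.h>
#include <stdio.h>
#include <inttypes.h>

/*
 * A variable MTRR matches an address A when (A & mask) == (base & mask),
 * so the covered window is [base & mask, (base & mask) | ~mask] and the
 * first byte after it is ((base & mask) | ~mask) + 1.
 */
static void mtrr_var_range(uint64_t base, uint64_t mask, int maxphyaddr,
			   uint64_t *start, uint64_t *end)
{
	/* Bits above maxphyaddr behave as if they were set in the mask. */
	mask |= ~0ULL << maxphyaddr;

	*start = base & mask;
	*end = (*start | ~mask) + 1;	/* exclusive upper bound */
}

int main(void)
{
	uint64_t start, end;

	/* A 256MB range based at 4GB, with a 36-bit physical address space. */
	mtrr_var_range(0x100000000ULL, ~0ULL << 28, 36, &start, &end);
	printf("[%#" PRIx64 ", %#" PRIx64 ")\n", start, end);
	/* prints [0x100000000, 0x110000000) */
	return 0;
}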

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cdccbe1..a527dd0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1854,6 +1854,63 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
}
EXPORT_SYMBOL_GPL(kvm_mtrr_valid);

+static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ struct mtrr_state_type *mtrr_state = &vcpu->arch.mtrr_state;
+ unsigned char mtrr_enabled = mtrr_state->enabled;
+ gfn_t start, end, mask;
+ int index;
+ bool is_fixed = true;
+
+ if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
+ !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+ return;
+
+ if (!(mtrr_enabled & 0x2) && msr != MSR_MTRRdefType)
+ return;
+
+ switch (msr) {
+ case MSR_MTRRfix64K_00000:
+ start = 0x0;
+ end = 0x80000;
+ break;
+ case MSR_MTRRfix16K_80000:
+ start = 0x80000;
+ end = 0xa0000;
+ break;
+ case MSR_MTRRfix16K_A0000:
+ start = 0xa0000;
+ end = 0xc0000;
+ break;
+ case MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000:
+ index = msr - MSR_MTRRfix4K_C0000;
+ start = 0xc0000 + index * (32 << 10);
+ end = start + (32 << 10);
+ break;
+ case MSR_MTRRdefType:
+ is_fixed = false;
+ start = 0x0;
+ end = ~0ULL;
+ break;
+ default:
+ /* variable range MTRRs. */
+ is_fixed = false;
+ index = (msr - 0x200) / 2;
+ start = (((u64)mtrr_state->var_ranges[index].base_hi) << 32) +
+ (mtrr_state->var_ranges[index].base_lo & PAGE_MASK);
+ mask = (((u64)mtrr_state->var_ranges[index].mask_hi) << 32) +
+ (mtrr_state->var_ranges[index].mask_lo & PAGE_MASK);
+ mask |= ~0ULL << cpuid_maxphyaddr(vcpu);
+
+ end = ((start & mask) | ~mask) + 1;
+ }
+
+ if (is_fixed && !(mtrr_enabled & 0x1))
+ return;
+
+ kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
+}
+
static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
@@ -1887,7 +1944,7 @@ static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
*pt = data;
}

- kvm_mmu_reset_context(vcpu);
+ update_mtrr(vcpu, msr);
return 0;
}

--
2.1.0

2015-05-12 02:36:23

by Xiao Guangrong

Subject: [PATCH v2 9/9] KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed

CR0.CD and CR0.NW are not used by the shadow page table, so there is no
need to reset the mmu when these two bits are changed

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a527dd0..a82d26f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -572,8 +572,7 @@ out:
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
{
unsigned long old_cr0 = kvm_read_cr0(vcpu);
- unsigned long update_bits = X86_CR0_PG | X86_CR0_WP |
- X86_CR0_CD | X86_CR0_NW;
+ unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;

cr0 |= X86_CR0_ET;

--
2.1.0

2015-05-12 09:24:37

by Paolo Bonzini

Subject: Re: [PATCH v2 2/9] KVM: MMU: introduce for_each_rmap_spte()



On 12/05/2015 04:32, Xiao Guangrong wrote:
> - while ((sptep = rmap_get_first(*rmapp, &iter))) {
> - BUG_ON(!(*sptep & PT_PRESENT_MASK));
> +restart:
> + for_each_rmap_spte(rmapp, &iter, sptep) {
> rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
> sptep, *sptep, gfn, level);
>
> drop_spte(kvm, sptep);
> need_tlb_flush = 1;
> + goto restart;
> }
>

For this one, I would keep using rmap_get_first. Otherwise looks good.

Paolo

2015-05-12 09:28:51

by Paolo Bonzini

Subject: Re: [PATCH v2 4/9] KVM: MMU: introduce slot_handle_level_range() and its helpers



On 12/05/2015 04:32, Xiao Guangrong wrote:
> + if (iterator.rmap)
> + flush |= fn(kvm, iterator.rmap);
> +
> + if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
> + if (flush & lock_flush_tlb) {

&&, not &. (Cosmetic only).

Paolo

2015-05-12 09:29:38

by Paolo Bonzini

Subject: Re: [PATCH v2 3/9] KVM: MMU: introduce for_each_slot_rmap_range



On 12/05/2015 04:32, Xiao Guangrong wrote:
> +#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
> + _start_gfn, _end_gfn, _iter_) \
> + for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
> + _end_level_, _start_gfn, _end_gfn); \
> + slot_rmap_walk_okay(_iter_); \
> + slot_rmap_walk_next(_iter_))
> +


> + for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
> + PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1,


What about adding a

#define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)

?

> + gfn_start, gfn_end - 1, &iterator)
> + ret |= handler(kvm, iterator.rmap, memslot,
> + iterator.gfn, iterator.level, data);

I prefer to indent by two tabs:

for_each_slot_rmap_range(memslot,
PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
gfn_start, gfn_end - 1, &iterator)
ret |= handler(kvm, iterator.rmap, memslot,
iterator.gfn, iterator.level, data);


Same in the next patch:

for_each_slot_rmap_range(memslot,
start_level, end_level,
start_gfn, end_gfn, &iterator) {
if (iterator.rmap)
flush |= fn(kvm, iterator.rmap);
...
}
Thanks,

Paolo

2015-05-12 09:39:50

by Xiao Guangrong

Subject: Re: [PATCH v2 2/9] KVM: MMU: introduce for_each_rmap_spte()



On 05/12/2015 04:08 PM, Paolo Bonzini wrote:
>
>
> On 12/05/2015 04:32, Xiao Guangrong wrote:
>> - while ((sptep = rmap_get_first(*rmapp, &iter))) {
>> - BUG_ON(!(*sptep & PT_PRESENT_MASK));
>> +restart:
>> + for_each_rmap_spte(rmapp, &iter, sptep) {
>> rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
>> sptep, *sptep, gfn, level);
>>
>> drop_spte(kvm, sptep);
>> need_tlb_flush = 1;
>> + goto restart;
>> }
>>
>
> For this one, I would keep using rmap_get_first. Otherwise looks good.
>
> Paolo
>

Okay, will do.

2015-05-12 09:40:25

by Xiao Guangrong

Subject: Re: [PATCH v2 4/9] KVM: MMU: introduce slot_handle_level_range() and its helpers



On 05/12/2015 04:22 PM, Paolo Bonzini wrote:
>
>
> On 12/05/2015 04:32, Xiao Guangrong wrote:
>> + if (iterator.rmap)
>> + flush |= fn(kvm, iterator.rmap);
>> +
>> + if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
>> + if (flush & lock_flush_tlb) {
>
> &&, not &. (Cosmetic only).

Will fix.

2015-05-12 09:42:09

by Xiao Guangrong

Subject: Re: [PATCH v2 3/9] KVM: MMU: introduce for_each_slot_rmap_range



On 05/12/2015 04:22 PM, Paolo Bonzini wrote:
>
>
> On 12/05/2015 04:32, Xiao Guangrong wrote:
>> +#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
>> + _start_gfn, _end_gfn, _iter_) \
>> + for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
>> + _end_level_, _start_gfn, _end_gfn); \
>> + slot_rmap_walk_okay(_iter_); \
>> + slot_rmap_walk_next(_iter_))
>> +
>
>
>> + for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
>> + PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1,
>
>
> What about adding a
>
> #define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)

Good to me.

>
> ?
>
>> + gfn_start, gfn_end - 1, &iterator)
>> + ret |= handler(kvm, iterator.rmap, memslot,
>> + iterator.gfn, iterator.level, data);
>
> I prefer to indent by two tabs:
>
> for_each_slot_rmap_range(memslot,
> PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
> gfn_start, gfn_end - 1, &iterator)
> ret |= handler(kvm, iterator.rmap, memslot,
> iterator.gfn, iterator.level, data);
>
>
> Same in the next patch:
>
> for_each_slot_rmap_range(memslot,
> start_level, end_level,
> start_gfn, end_gfn, &iterator) {
> if (iterator.rmap)
> flush |= fn(kvm, iterator.rmap);
> ...
> }

looks nice, will follow your style. :)