2015-05-13 06:46:07

by Xiao Guangrong

Subject: [PATCH v3 00/10] KVM: MTRR fixes and some cleanups

Changelog in v3 (thanks to Paolo for the comments):
- do not apply for_each_rmap_spte to kvm_zap_rmapp and kvm_mmu_unlink_parents
- fix a cosmetic issue in slot_handle_level_range
- introduce PT_MAX_HUGEPAGE_LEVEL to clean up the code
- improve code indentation

Changelog in v2:
- fix the bit description in the changelog of the first patch; thanks to
David Matlack for pointing it out

all the following changes come from Paolo's comments, which are much
appreciated:
- reorder the whole patchset to make it more readable
- redesign the iterator APIs
- flush the TLB if @lock_flush_tlb is true in slot_handle_level()
- make the MTRR update generic

These are some MTRR bugs that show up when a legacy IOMMU device is used on
Intel CPUs:
- In the current code, whenever guest MTRR registers are changed,
kvm_mmu_reset_context is called to switch to a new root shadow page
table; however, this is useless since:
1) the cache type is not cached in the shadow page's attributes, so the
original root shadow page will be reused

2) the cache type is set on the last spte, which means we should sync the
last sptes when the MTRR is changed

We can fix this by dropping all the sptes in the gfn range which is
being updated by the MTRR

- there are some bugs in get_mtrr_type() (see the layout sketch after this
list):
1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
that means the other bits are ignored if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of mtrr_state->enabled (bit
10 of IA32_MTRR_DEF_TYPE)

3: if MTRRs are disabled, UC is applied to all of physical memory rather
than mtrr_state->def_type

- we need not reset the mmu when the cache policy is changed, since the
shadow page table does not virtualize any cache policy
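
For reference, the relevant IA32_MTRR_DEF_TYPE layout per the Intel SDM,
which mtrr_state->enabled caches (explanatory sketch only, not part of the
patches):

/*
 * IA32_MTRR_DEF_TYPE MSR layout:
 *   bits 7:0 - default memory type
 *   bit  10  - FE: fixed-range MTRRs enable
 *   bit  11  - E:  MTRR enable; if clear, the other bits are ignored
 *              and UC applies to all of physical memory
 *
 * mtrr_state->enabled holds bits 11:10, hence the tests:
 *   mtrr_state->enabled & 0x2 - MTRRs enabled at all
 *   mtrr_state->enabled & 0x1 - fixed-range MTRRs enabled
 */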

Also included are some cleanups that make the current MMU code cleaner and
the bugs easier to fix.

Xiao Guangrong (10):
KVM: MMU: fix decoding cache type from MTRR
KVM: MMU: introduce for_each_rmap_spte()
KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL
KVM: MMU: introduce for_each_slot_rmap_range
KVM: MMU: introduce slot_handle_level_range() and its helpers
KVM: MMU: use slot_handle_level and its helper to clean up the code
KVM: MMU: introduce kvm_zap_rmapp
KVM: MMU: introduce kvm_zap_gfn_range
KVM: MMU: fix MTRR update
KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed

arch/x86/kvm/mmu.c | 409 +++++++++++++++++++++++++----------------------
arch/x86/kvm/mmu.h | 2 +
arch/x86/kvm/mmu_audit.c | 4 +-
arch/x86/kvm/x86.c | 62 ++++++-
4 files changed, 281 insertions(+), 196 deletions(-)

--
2.1.0


2015-05-13 06:50:12

by Xiao Guangrong

Subject: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR

There are some bugs in the current get_mtrr_type():
1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
that means the other bits are ignored if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of
mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)

3: if MTRRs are disabled, UC is applied to all of physical memory rather
than mtrr_state->def_type

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 14 ++++++--------
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b78e83f..d00cebd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
u64 start, u64 end)
{
- int i;
u64 base, mask;
u8 prev_match, curr_match;
- int num_var_ranges = KVM_NR_VAR_MTRR;
+ int i, num_var_ranges = KVM_NR_VAR_MTRR;

- if (!mtrr_state->enabled)
- return 0xFF;
+ /* MTRR is completely disabled, use UC for all of physical memory. */
+ if (!(mtrr_state->enabled & 0x2))
+ return MTRR_TYPE_UNCACHABLE;

/* Make end inclusive end, instead of exclusive */
end--;

/* Look in fixed ranges. Just return the type as per start */
- if (mtrr_state->have_fixed && (start < 0x100000)) {
+ if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
+ (start < 0x100000)) {
int idx;

if (start < 0x80000) {
@@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
* Look of multiple ranges matching this address and pick type
* as per MTRR precedence
*/
- if (!(mtrr_state->enabled & 2))
- return mtrr_state->def_type;
-
prev_match = 0xFF;
for (i = 0; i < num_var_ranges; ++i) {
unsigned short start_state, end_state;
--
2.1.0

2015-05-13 06:46:11

by Xiao Guangrong

Subject: [PATCH v3 02/10] KVM: MMU: introduce for_each_rmap_spte()

It's used to walk all the sptes on an rmap chain, so that the open-coded
loops can be cleaned up.
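
One subtlety, visible in the conversions below: dropping an spte while
walking unlinks it from the rmap chain and leaves the iterator stale, so
such sites restart the walk instead of continuing. A minimal sketch of the
pattern (spte_should_drop is a placeholder predicate, not a real function):

restart:
	for_each_rmap_spte(rmapp, &iter, sptep) {
		if (spte_should_drop(*sptep)) {
			drop_spte(kvm, sptep);
			/* the iterator is stale once the spte is unlinked */
			goto restart;
		}
	}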

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 53 ++++++++++++++++--------------------------------
arch/x86/kvm/mmu_audit.c | 4 +---
2 files changed, 19 insertions(+), 38 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d00cebd..afc8d15 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1142,6 +1142,11 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
return NULL;
}

+#define for_each_rmap_spte(_rmap_, _iter_, _spte_) \
+ for (_spte_ = rmap_get_first(*_rmap_, _iter_); \
+ _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \
+ _spte_ = rmap_get_next(_iter_))
+
static void drop_spte(struct kvm *kvm, u64 *sptep)
{
if (mmu_spte_clear_track_bits(sptep))
@@ -1205,12 +1210,8 @@ static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
struct rmap_iterator iter;
bool flush = false;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
flush |= spte_write_protect(kvm, sptep, pt_protect);
- sptep = rmap_get_next(&iter);
- }

return flush;
}
@@ -1232,12 +1233,8 @@ static bool __rmap_clear_dirty(struct kvm *kvm, unsigned long *rmapp)
struct rmap_iterator iter;
bool flush = false;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
flush |= spte_clear_dirty(kvm, sptep);
- sptep = rmap_get_next(&iter);
- }

return flush;
}
@@ -1259,12 +1256,8 @@ static bool __rmap_set_dirty(struct kvm *kvm, unsigned long *rmapp)
struct rmap_iterator iter;
bool flush = false;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
flush |= spte_set_dirty(kvm, sptep);
- sptep = rmap_get_next(&iter);
- }

return flush;
}
@@ -1394,8 +1387,8 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
WARN_ON(pte_huge(*ptep));
new_pfn = pte_pfn(*ptep);

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!is_shadow_present_pte(*sptep));
+restart:
+ for_each_rmap_spte(rmapp, &iter, sptep) {
rmap_printk("kvm_set_pte_rmapp: spte %p %llx gfn %llx (%d)\n",
sptep, *sptep, gfn, level);

@@ -1403,7 +1396,7 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

if (pte_write(*ptep)) {
drop_spte(kvm, sptep);
- sptep = rmap_get_first(*rmapp, &iter);
+ goto restart;
} else {
new_spte = *sptep & ~PT64_BASE_ADDR_MASK;
new_spte |= (u64)new_pfn << PAGE_SHIFT;
@@ -1414,7 +1407,6 @@ static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,

mmu_spte_clear_track_bits(sptep);
mmu_spte_set(sptep, new_spte);
- sptep = rmap_get_next(&iter);
}
}

@@ -1518,16 +1510,13 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,

BUG_ON(!shadow_accessed_mask);

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;
- sptep = rmap_get_next(&iter)) {
- BUG_ON(!is_shadow_present_pte(*sptep));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
if (*sptep & shadow_accessed_mask) {
young = 1;
clear_bit((ffs(shadow_accessed_mask) - 1),
(unsigned long *)sptep);
}
- }
+
trace_kvm_age_page(gfn, level, slot, young);
return young;
}
@@ -1548,15 +1537,11 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
if (!shadow_accessed_mask)
goto out;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;
- sptep = rmap_get_next(&iter)) {
- BUG_ON(!is_shadow_present_pte(*sptep));
-
+ for_each_rmap_spte(rmapp, &iter, sptep)
if (*sptep & shadow_accessed_mask) {
young = 1;
break;
}
- }
out:
return young;
}
@@ -4481,9 +4466,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
pfn_t pfn;
struct kvm_mmu_page *sp;

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
- BUG_ON(!(*sptep & PT_PRESENT_MASK));
-
+restart:
+ for_each_rmap_spte(rmapp, &iter, sptep) {
sp = page_header(__pa(sptep));
pfn = spte_to_pfn(*sptep);

@@ -4498,10 +4482,9 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
!kvm_is_reserved_pfn(pfn) &&
PageTransCompound(pfn_to_page(pfn))) {
drop_spte(kvm, sptep);
- sptep = rmap_get_first(*rmapp, &iter);
need_tlb_flush = 1;
- } else
- sptep = rmap_get_next(&iter);
+ goto restart;
+ }
}

return need_tlb_flush;
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 9ade5cf..368d534 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -197,13 +197,11 @@ static void audit_write_protection(struct kvm *kvm, struct kvm_mmu_page *sp)

rmapp = gfn_to_rmap(kvm, sp->gfn, PT_PAGE_TABLE_LEVEL);

- for (sptep = rmap_get_first(*rmapp, &iter); sptep;
- sptep = rmap_get_next(&iter)) {
+ for_each_rmap_spte(rmapp, &iter, sptep)
if (is_writable_pte(*sptep))
audit_printk(kvm, "shadow page has writable "
"mappings: gfn %llx role %x\n",
sp->gfn, sp->role.word);
- }
}

static void audit_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
--
2.1.0

2015-05-13 06:49:53

by Xiao Guangrong

Subject: [PATCH v3 03/10] KVM: MMU: introduce PT_MAX_HUGEPAGE_LEVEL

It's used to clean up the code. Thanks to Paolo Bonzini for the
suggestion.
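
For concreteness: x86 KVM has KVM_NR_PAGE_SIZES == 3 (4K, 2M and 1G pages),
so the new macro works out as:

/* PT_MAX_HUGEPAGE_LEVEL = PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1
 *                       = 1 + 3 - 1 = 3 (i.e. PT_PDPE_LEVEL),
 * so "i <= PT_MAX_HUGEPAGE_LEVEL" is equivalent to the old
 * "i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES" loop bound. */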

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 24 +++++++++---------------
arch/x86/kvm/mmu.h | 1 +
2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index afc8d15..e0e15b8 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -811,8 +811,7 @@ static void account_shadowed(struct kvm *kvm, gfn_t gfn)
int i;

slot = gfn_to_memslot(kvm, gfn);
- for (i = PT_DIRECTORY_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
linfo = lpage_info_slot(gfn, slot, i);
linfo->write_count += 1;
}
@@ -826,8 +825,7 @@ static void unaccount_shadowed(struct kvm *kvm, gfn_t gfn)
int i;

slot = gfn_to_memslot(kvm, gfn);
- for (i = PT_DIRECTORY_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
linfo = lpage_info_slot(gfn, slot, i);
linfo->write_count -= 1;
WARN_ON(linfo->write_count < 0);
@@ -858,8 +856,7 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)

page_size = kvm_host_page_size(kvm, gfn);

- for (i = PT_PAGE_TABLE_LEVEL;
- i < (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES); ++i) {
+ for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
if (page_size >= KVM_HPAGE_SIZE(i))
ret = i;
else
@@ -1344,8 +1341,7 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)

slot = gfn_to_memslot(kvm, gfn);

- for (i = PT_PAGE_TABLE_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
rmapp = __gfn_to_rmap(gfn, i, slot);
write_protected |= __rmap_write_protect(kvm, rmapp, true);
}
@@ -1451,7 +1447,7 @@ static int kvm_handle_hva_range(struct kvm *kvm,
gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);

for (j = PT_PAGE_TABLE_LEVEL;
- j < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++j) {
+ j <= PT_MAX_HUGEPAGE_LEVEL; ++j) {
unsigned long idx, idx_end;
unsigned long *rmapp;
gfn_t gfn = gfn_start;
@@ -4415,8 +4411,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,

spin_lock(&kvm->mmu_lock);

- for (i = PT_PAGE_TABLE_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
unsigned long *rmapp;
unsigned long last_index, index;

@@ -4572,8 +4567,8 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,

spin_lock(&kvm->mmu_lock);

- for (i = PT_PAGE_TABLE_LEVEL + 1; /* skip rmap for 4K page */
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ /* skip rmap for 4K page */
+ for (i = PT_PAGE_TABLE_LEVEL + 1; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
unsigned long *rmapp;
unsigned long last_index, index;

@@ -4610,8 +4605,7 @@ void kvm_mmu_slot_set_dirty(struct kvm *kvm,

spin_lock(&kvm->mmu_lock);

- for (i = PT_PAGE_TABLE_LEVEL;
- i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
+ for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
unsigned long *rmapp;
unsigned long last_index, index;

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 0ada65e..72bb33f 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -43,6 +43,7 @@
#define PT_PDPE_LEVEL 3
#define PT_DIRECTORY_LEVEL 2
#define PT_PAGE_TABLE_LEVEL 1
+#define PT_MAX_HUGEPAGE_LEVEL (PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES - 1)

static inline u64 rsvd_bits(int s, int e)
{
--
2.1.0

2015-05-13 06:46:14

by Xiao Guangrong

Subject: [PATCH v3 04/10] KVM: MMU: introduce for_each_slot_rmap_range

It's used to abstract the rmap-walking code out of kvm_handle_hva_range,
and it will also be used by a later patch.
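
In rough pseudocode, the walk performed by the new iterator is (an
explanatory sketch, not code from the patch):

/*
 * for (level = start_level; level <= end_level; level++)
 *     for (gfn = start_gfn; gfn <= end_gfn;
 *          gfn += 1UL << KVM_HPAGE_GFN_SHIFT(level))
 *         yield (gfn, level, rmap of this gfn at this level);
 */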

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 97 +++++++++++++++++++++++++++++++++++++++++-------------
1 file changed, 75 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e0e15b8..df403b3 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1412,6 +1412,74 @@ restart:
return 0;
}

+struct slot_rmap_walk_iterator {
+ /* input fields. */
+ struct kvm_memory_slot *slot;
+ gfn_t start_gfn;
+ gfn_t end_gfn;
+ int start_level;
+ int end_level;
+
+ /* output fields. */
+ gfn_t gfn;
+ unsigned long *rmap;
+ int level;
+
+ /* private field. */
+ unsigned long *end_rmap;
+};
+
+static void
+rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+{
+ iterator->level = level;
+ iterator->gfn = iterator->start_gfn;
+ iterator->rmap = __gfn_to_rmap(iterator->gfn, level, iterator->slot);
+ iterator->end_rmap = __gfn_to_rmap(iterator->end_gfn, level,
+ iterator->slot);
+}
+
+static void
+slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+ struct kvm_memory_slot *slot, int start_level,
+ int end_level, gfn_t start_gfn, gfn_t end_gfn)
+{
+ iterator->slot = slot;
+ iterator->start_level = start_level;
+ iterator->end_level = end_level;
+ iterator->start_gfn = start_gfn;
+ iterator->end_gfn = end_gfn;
+
+ rmap_walk_init_level(iterator, iterator->start_level);
+}
+
+static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
+{
+ return !!iterator->rmap;
+}
+
+static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+{
+ if (++iterator->rmap <= iterator->end_rmap) {
+ iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
+ return;
+ }
+
+ if (++iterator->level > iterator->end_level) {
+ iterator->rmap = NULL;
+ return;
+ }
+
+ rmap_walk_init_level(iterator, iterator->level);
+}
+
+#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
+ _start_gfn, _end_gfn, _iter_) \
+ for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
+ _end_level_, _start_gfn, _end_gfn); \
+ slot_rmap_walk_okay(_iter_); \
+ slot_rmap_walk_next(_iter_))
+
static int kvm_handle_hva_range(struct kvm *kvm,
unsigned long start,
unsigned long end,
@@ -1423,10 +1491,10 @@ static int kvm_handle_hva_range(struct kvm *kvm,
int level,
unsigned long data))
{
- int j;
- int ret = 0;
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
+ struct slot_rmap_walk_iterator iterator;
+ int ret = 0;

slots = kvm_memslots(kvm);

@@ -1446,26 +1514,11 @@ static int kvm_handle_hva_range(struct kvm *kvm,
gfn_start = hva_to_gfn_memslot(hva_start, memslot);
gfn_end = hva_to_gfn_memslot(hva_end + PAGE_SIZE - 1, memslot);

- for (j = PT_PAGE_TABLE_LEVEL;
- j <= PT_MAX_HUGEPAGE_LEVEL; ++j) {
- unsigned long idx, idx_end;
- unsigned long *rmapp;
- gfn_t gfn = gfn_start;
-
- /*
- * {idx(page_j) | page_j intersects with
- * [hva_start, hva_end)} = {idx, idx+1, ..., idx_end}.
- */
- idx = gfn_to_index(gfn_start, memslot->base_gfn, j);
- idx_end = gfn_to_index(gfn_end - 1, memslot->base_gfn, j);
-
- rmapp = __gfn_to_rmap(gfn_start, j, memslot);
-
- for (; idx <= idx_end;
- ++idx, gfn += (1UL << KVM_HPAGE_GFN_SHIFT(j)))
- ret |= handler(kvm, rmapp++, memslot,
- gfn, j, data);
- }
+ for_each_slot_rmap_range(memslot, PT_PAGE_TABLE_LEVEL,
+ PT_MAX_HUGEPAGE_LEVEL, gfn_start, gfn_end - 1,
+ &iterator)
+ ret |= handler(kvm, iterator.rmap, memslot,
+ iterator.gfn, iterator.level, data);
}

return ret;
--
2.1.0

2015-05-13 06:46:18

by Xiao Guangrong

Subject: [PATCH v3 05/10] KVM: MMU: introduce slot_handle_level_range() and its helpers

There are several places that walk all the rmaps of a memslot, so
introduce common functions to clean up the code.
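
For example, a typical caller (matching the conversions in the next patch,
which introduces slot_rmap_write_protect) looks like:

	spin_lock(&kvm->mmu_lock);
	flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
				      false);
	spin_unlock(&kvm->mmu_lock);
	if (flush)
		kvm_flush_remote_tlbs(kvm);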

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 69 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index df403b3..62ac4e0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4453,6 +4453,75 @@ void kvm_mmu_setup(struct kvm_vcpu *vcpu)
init_kvm_mmu(vcpu);
}

+/* The return value indicates if tlb flush on all vcpus is needed. */
+typedef bool (*slot_level_handler) (struct kvm *kvm, unsigned long *rmap);
+
+/* The caller should hold mmu-lock before calling this function. */
+static bool
+slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, int start_level, int end_level,
+ gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
+{
+ struct slot_rmap_walk_iterator iterator;
+ bool flush = false;
+
+ for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
+ end_gfn, &iterator) {
+ if (iterator.rmap)
+ flush |= fn(kvm, iterator.rmap);
+
+ if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
+ if (flush && lock_flush_tlb) {
+ kvm_flush_remote_tlbs(kvm);
+ flush = false;
+ }
+ cond_resched_lock(&kvm->mmu_lock);
+ }
+ }
+
+ if (flush && lock_flush_tlb) {
+ kvm_flush_remote_tlbs(kvm);
+ flush = false;
+ }
+
+ return flush;
+}
+
+static bool
+slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, int start_level, int end_level,
+ bool lock_flush_tlb)
+{
+ return slot_handle_level_range(kvm, memslot, fn, start_level,
+ end_level, memslot->base_gfn,
+ memslot->base_gfn + memslot->npages - 1,
+ lock_flush_tlb);
+}
+
+static bool
+slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, bool lock_flush_tlb)
+{
+ return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
+ PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
+}
+
+static bool
+slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, bool lock_flush_tlb)
+{
+ return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL + 1,
+ PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
+}
+
+static bool
+slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
+ slot_level_handler fn, bool lock_flush_tlb)
+{
+ return slot_handle_level(kvm, memslot, fn, PT_PAGE_TABLE_LEVEL,
+ PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
+}
+
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
--
2.1.0

2015-05-13 06:49:14

by Xiao Guangrong

Subject: [PATCH v3 06/10] KVM: MMU: use slot_handle_level and its helper to clean up the code

slot_handle_level and its helper functions are ready now, use them to
clean up the code

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 128 +++++++----------------------------------------------
1 file changed, 16 insertions(+), 112 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 62ac4e0..c059822 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4522,34 +4522,19 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
}

+static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
+{
+ return __rmap_write_protect(kvm, rmapp, false);
+}
+
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- int i;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
- unsigned long *rmapp;
- unsigned long last_index, index;
-
- rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_write_protect(kvm, rmapp,
- false);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
-
+ flush = slot_handle_all_level(kvm, memslot, slot_rmap_write_protect,
+ false);
spin_unlock(&kvm->mmu_lock);

/*
@@ -4610,59 +4595,18 @@ restart:
void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- bool flush = false;
- unsigned long *rmapp;
- unsigned long last_index, index;
-
spin_lock(&kvm->mmu_lock);
-
- rmapp = memslot->arch.rmap[0];
- last_index = gfn_to_index(memslot->base_gfn + memslot->npages - 1,
- memslot->base_gfn, PT_PAGE_TABLE_LEVEL);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= kvm_mmu_zap_collapsible_spte(kvm, rmapp);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock)) {
- if (flush) {
- kvm_flush_remote_tlbs(kvm);
- flush = false;
- }
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
-
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
+ slot_handle_leaf(kvm, memslot, kvm_mmu_zap_collapsible_spte, true);
spin_unlock(&kvm->mmu_lock);
}

void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- unsigned long *rmapp;
- unsigned long last_index, index;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- rmapp = memslot->arch.rmap[PT_PAGE_TABLE_LEVEL - 1];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn,
- PT_PAGE_TABLE_LEVEL);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_clear_dirty(kvm, rmapp);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
-
+ flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
spin_unlock(&kvm->mmu_lock);

lockdep_assert_held(&kvm->slots_lock);
@@ -4681,31 +4625,11 @@ EXPORT_SYMBOL_GPL(kvm_mmu_slot_leaf_clear_dirty);
void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- int i;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- /* skip rmap for 4K page */
- for (i = PT_PAGE_TABLE_LEVEL + 1; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
- unsigned long *rmapp;
- unsigned long last_index, index;
-
- rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_write_protect(kvm, rmapp,
- false);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
+ flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
+ false);
spin_unlock(&kvm->mmu_lock);

/* see kvm_mmu_slot_remove_write_access */
@@ -4719,30 +4643,10 @@ EXPORT_SYMBOL_GPL(kvm_mmu_slot_largepage_remove_write_access);
void kvm_mmu_slot_set_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- gfn_t last_gfn;
- int i;
- bool flush = false;
-
- last_gfn = memslot->base_gfn + memslot->npages - 1;
+ bool flush;

spin_lock(&kvm->mmu_lock);
-
- for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) {
- unsigned long *rmapp;
- unsigned long last_index, index;
-
- rmapp = memslot->arch.rmap[i - PT_PAGE_TABLE_LEVEL];
- last_index = gfn_to_index(last_gfn, memslot->base_gfn, i);
-
- for (index = 0; index <= last_index; ++index, ++rmapp) {
- if (*rmapp)
- flush |= __rmap_set_dirty(kvm, rmapp);
-
- if (need_resched() || spin_needbreak(&kvm->mmu_lock))
- cond_resched_lock(&kvm->mmu_lock);
- }
- }
-
+ flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
spin_unlock(&kvm->mmu_lock);

lockdep_assert_held(&kvm->slots_lock);
--
2.1.0

2015-05-13 06:47:29

by Xiao Guangrong

Subject: [PATCH v3 07/10] KVM: MMU: introduce kvm_zap_rmapp

Split kvm_unmap_rmapp and introduce kvm_zap_rmapp, which will be used in a
later patch.
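
The zap loop deliberately keeps re-fetching the first spte instead of using
for_each_rmap_spte (as the v3 cover letter notes), since drop_spte() unlinks
the entry from the chain under the iterator; distilled:

	u64 *sptep;
	struct rmap_iterator iter;
	bool flush = false;

	/* drop_spte() unlinks the spte from the rmap chain, so always
	 * restart from the head until the chain is empty */
	while ((sptep = rmap_get_first(*rmapp, &iter))) {
		drop_spte(kvm, sptep);
		flush = true;
	}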

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c059822..a990ad9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1349,24 +1349,28 @@ static bool rmap_write_protect(struct kvm *kvm, u64 gfn)
return write_protected;
}

-static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
- struct kvm_memory_slot *slot, gfn_t gfn, int level,
- unsigned long data)
+static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
{
u64 *sptep;
struct rmap_iterator iter;
- int need_tlb_flush = 0;
+ bool flush = false;

while ((sptep = rmap_get_first(*rmapp, &iter))) {
BUG_ON(!(*sptep & PT_PRESENT_MASK));
- rmap_printk("kvm_rmap_unmap_hva: spte %p %llx gfn %llx (%d)\n",
- sptep, *sptep, gfn, level);
+ rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);

drop_spte(kvm, sptep);
- need_tlb_flush = 1;
+ flush = true;
}

- return need_tlb_flush;
+ return flush;
+}
+
+static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
+ struct kvm_memory_slot *slot, gfn_t gfn, int level,
+ unsigned long data)
+{
+ return kvm_zap_rmapp(kvm, rmapp);
}

static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp,
--
2.1.0

2015-05-13 06:47:26

by Xiao Guangrong

Subject: [PATCH v3 08/10] KVM: MMU: introduce kvm_zap_gfn_range

It is used to zap all the rmaps in the specified gfn range and will
be used by a later patch.
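
A sketch of the intended call site (patch 9 uses it this way when an MTRR
is written):

	/* drop every spte mapping [start, end) so the mapping is rebuilt
	 * with the new cache type on the next fault */
	kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));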

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 24 ++++++++++++++++++++++++
arch/x86/kvm/mmu.h | 1 +
2 files changed, 25 insertions(+)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a990ad9..49c34e6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4526,6 +4526,30 @@ slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
PT_PAGE_TABLE_LEVEL, lock_flush_tlb);
}

+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *memslot;
+
+ slots = kvm_memslots(kvm);
+
+ spin_lock(&kvm->mmu_lock);
+ kvm_for_each_memslot(memslot, slots) {
+ gfn_t start, end;
+
+ start = max(gfn_start, memslot->base_gfn);
+ end = min(gfn_end, memslot->base_gfn + memslot->npages);
+ if (start >= end)
+ continue;
+
+ slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+ PT_PAGE_TABLE_LEVEL, PT_MAX_HUGEPAGE_LEVEL,
+ start, end - 1, true);
+ }
+
+ spin_unlock(&kvm->mmu_lock);
+}
+
static bool slot_rmap_write_protect(struct kvm *kvm, unsigned long *rmapp)
{
return __rmap_write_protect(kvm, rmapp, false);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 72bb33f..398d21c 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -171,4 +171,5 @@ static inline bool permission_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
}

void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
+void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
#endif
--
2.1.0

2015-05-13 06:46:20

by Xiao Guangrong

Subject: [PATCH v3 09/10] KVM: MMU: fix MTRR update

Currently, whenever guest MTRR registers are changed,
kvm_mmu_reset_context is called to switch to a new root shadow page
table; however, this is useless since:
1) the cache type is not cached in the shadow page's attributes, so
the original root shadow page will be reused

2) the cache type is set on the last spte, which means we should sync
the last sptes when the MTRR is changed

This patch fixes the issue by dropping all the sptes in the gfn range
which is being updated by the MTRR.
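
Two worked examples of the range computation done below (explanatory only):

/* Fixed range: MSR_MTRRfix16K_80000 covers physical 0x80000..0x9ffff,
 * so start = 0x80000, end = 0xa0000, and gfns 0x80..0x9f are zapped.
 *
 * Variable range (assuming maxphyaddr = 36): base = 0xc0000000,
 * decoded mask = 0xfc0000000; after mask |= ~0ULL << 36,
 * end = ((start & mask) | ~mask) + 1 = 0xffffffff + 1 = 0x100000000,
 * i.e. the zapped range is [3GB, 4GB). */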

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 58 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cde5d61..bbe184f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1852,6 +1852,63 @@ bool kvm_mtrr_valid(struct kvm_vcpu *vcpu, u32 msr, u64 data)
}
EXPORT_SYMBOL_GPL(kvm_mtrr_valid);

+static void update_mtrr(struct kvm_vcpu *vcpu, u32 msr)
+{
+ struct mtrr_state_type *mtrr_state = &vcpu->arch.mtrr_state;
+ unsigned char mtrr_enabled = mtrr_state->enabled;
+ gfn_t start, end, mask;
+ int index;
+ bool is_fixed = true;
+
+ if (msr == MSR_IA32_CR_PAT || !tdp_enabled ||
+ !kvm_arch_has_noncoherent_dma(vcpu->kvm))
+ return;
+
+ if (!(mtrr_enabled & 0x2) && msr != MSR_MTRRdefType)
+ return;
+
+ switch (msr) {
+ case MSR_MTRRfix64K_00000:
+ start = 0x0;
+ end = 0x80000;
+ break;
+ case MSR_MTRRfix16K_80000:
+ start = 0x80000;
+ end = 0xa0000;
+ break;
+ case MSR_MTRRfix16K_A0000:
+ start = 0xa0000;
+ end = 0xc0000;
+ break;
+ case MSR_MTRRfix4K_C0000 ... MSR_MTRRfix4K_F8000:
+ index = msr - MSR_MTRRfix4K_C0000;
+ start = 0xc0000 + index * (32 << 10);
+ end = start + (32 << 10);
+ break;
+ case MSR_MTRRdefType:
+ is_fixed = false;
+ start = 0x0;
+ end = ~0ULL;
+ break;
+ default:
+ /* variable range MTRRs. */
+ is_fixed = false;
+ index = (msr - 0x200) / 2;
+ start = (((u64)mtrr_state->var_ranges[index].base_hi) << 32) +
+ (mtrr_state->var_ranges[index].base_lo & PAGE_MASK);
+ mask = (((u64)mtrr_state->var_ranges[index].mask_hi) << 32) +
+ (mtrr_state->var_ranges[index].mask_lo & PAGE_MASK);
+ mask |= ~0ULL << cpuid_maxphyaddr(vcpu);
+
+ end = ((start & mask) | ~mask) + 1;
+ }
+
+ if (is_fixed && !(mtrr_enabled & 0x1))
+ return;
+
+ kvm_zap_gfn_range(vcpu->kvm, gpa_to_gfn(start), gpa_to_gfn(end));
+}
+
static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
{
u64 *p = (u64 *)&vcpu->arch.mtrr_state.fixed_ranges;
@@ -1885,7 +1942,7 @@ static int set_msr_mtrr(struct kvm_vcpu *vcpu, u32 msr, u64 data)
*pt = data;
}

- kvm_mmu_reset_context(vcpu);
+ update_mtrr(vcpu, msr);
return 0;
}

--
2.1.0

2015-05-13 06:46:23

by Xiao Guangrong

Subject: [PATCH v3 10/10] KVM: x86: do not reset mmu if CR0.CD and CR0.NW are changed

CR0.CD and CR0.NW are not used by the shadow page table, so there is no
need to adjust the mmu when these two bits are changed.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/x86.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bbe184f..457b908 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -572,8 +572,7 @@ out:
int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
{
unsigned long old_cr0 = kvm_read_cr0(vcpu);
- unsigned long update_bits = X86_CR0_PG | X86_CR0_WP |
- X86_CR0_CD | X86_CR0_NW;
+ unsigned long update_bits = X86_CR0_PG | X86_CR0_WP;

cr0 |= X86_CR0_ET;

--
2.1.0

2015-05-13 08:27:17

by Wanpeng Li

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR

On Wed, May 13, 2015 at 02:42:19PM +0800, Xiao Guangrong wrote:
>There are some bugs in the current get_mtrr_type():
>1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
> IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
> that means the other bits are ignored if it is cleared
>
>2: the fixed MTRR ranges are controlled by bit 0 of
> mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>
>3: if MTRRs are disabled, UC is applied to all of physical memory rather
> than mtrr_state->def_type
>
>Signed-off-by: Xiao Guangrong <[email protected]>

Reviewed-by: Wanpeng Li <[email protected]>


2015-05-13 09:02:01

by Wanpeng Li

Subject: Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update

Hi Xiao,
On Wed, May 13, 2015 at 02:42:27PM +0800, Xiao Guangrong wrote:
>Currently, whenever guest MTRR registers are changed,
>kvm_mmu_reset_context is called to switch to a new root shadow page
>table; however, this is useless since:
>1) the cache type is not cached in the shadow page's attributes, so
> the original root shadow page will be reused

kvm_mmu_reset_context
kvm_mmu_unload
mmu_free_roots

The original root shadow page will be freed in mmu_free_roots; what am I
missing?

Another question, maybe not related to this patch:

Is kvm_mmu_reset_context just called to destroy the original root
shadow page, while all the sptes remain valid?

Regards,
Wanpeng Li


2015-05-13 14:10:24

by Paolo Bonzini

Subject: Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update



On 13/05/2015 10:43, Wanpeng Li wrote:
> kvm_mmu_reset_context
> kvm_mmu_unload
> mmu_free_roots
>
> The original root shadow page will be freed in mmu_free_roots; what am I
> missing?
>
> Another question, maybe not related to this patch:
>
> Is kvm_mmu_reset_context just called to destroy the original root
> shadow page, while all the sptes remain valid?

SPTEs are kept around and cached. The "role" field is used as the hash
key; if the role doesn't change, SPTEs are reused, so you have to zap
the SPTEs explicitly.
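
Simplified from memory, the lookup in kvm_mmu_get_page is basically:

	/* role is computed from the current mmu context */
	for_each_gfn_sp(vcpu->kvm, sp, gfn) {
		if (sp->role.word != role.word)
			continue;
		/* cache hit: the page and all its sptes are reused */
		return sp;
	}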

Paolo

2015-05-13 14:14:20

by Paolo Bonzini

Subject: Re: [PATCH v3 00/10] KVM: MTRR fixes and some cleanups



On 13/05/2015 08:42, Xiao Guangrong wrote:
> Changelog in v3 (thanks to Paolo for the comments):
> - do not apply for_each_rmap_spte to kvm_zap_rmapp and kvm_mmu_unlink_parents

Good idea applying my comment to kvm_mmu_unlink_parents as well. :)

> - fix a cosmetic issue in slot_handle_level_range
> - introduce PT_MAX_HUGEPAGE_LEVEL to clean up the code
> - improve code indentation

I've squashed together patches 8 and 9, and will apply to kvm/queue soon.

Paolo

2015-05-14 00:34:45

by Wanpeng Li

Subject: Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update

On Wed, May 13, 2015 at 04:10:08PM +0200, Paolo Bonzini wrote:
>
>
>On 13/05/2015 10:43, Wanpeng Li wrote:
>> kvm_mmu_reset_context
>> kvm_mmu_unload
>> mmu_free_roots
>>
>> The original root shadow page will be freed in mmu_free_roots; what am I
>> missing?
>>
>> Another question, maybe not related to this patch:
>>
>> Is kvm_mmu_reset_context just called to destroy the original root
>> shadow page, while all the sptes remain valid?
>
>SPTEs are kept around and cached. The "role" field is used as the hash
>key; if the role doesn't change, SPTEs are reused, so you have to zap
>the SPTEs explicitly.

Thanks for your explanation. :)

Btw, why does the patch changelog mention that the root shadow page will be
reused? I think it will be zapped in mmu_free_roots.

Regards,
Wanpeng Li

>
>Paolo

2015-05-14 08:43:56

by Paolo Bonzini

Subject: Re: [PATCH v3 09/10] KVM: MMU: fix MTRR update



On 14/05/2015 02:16, Wanpeng Li wrote:
> >SPTEs are kept around and cached. The "role" field is used as the hash
> >key; if the role doesn't change, SPTEs are reused, so you have to zap
> >the SPTEs explicitly.
>
> Btw, why does the patch changelog mention that the root shadow page will be
> reused? I think it will be zapped in mmu_free_roots.

Who has set sp->role.invalid on it? If no one has, it's not zapped and
it will be found again in the hash table.

Paolo

2015-07-12 17:33:56

by Alex Williamson

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR

On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
> There are some bugs in the current get_mtrr_type():
> 1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
> IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
> that means the other bits are ignored if it is cleared
>
> 2: the fixed MTRR ranges are controlled by bit 0 of
> mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>
> 3: if MTRRs are disabled, UC is applied to all of physical memory rather
> than mtrr_state->def_type
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)


I'm seeing a significant regression in boot performance on Intel
hardware with assigned devices that bisects back to this patch. There's
a long delay with Seabios between the version splash and execution of
option ROMs, and a _very_ long delay with OVMF before the display is
initialized. The delay is long enough that users are reporting their
previously working VM is hung with 100% CPU usage on v4.2-rc1. Thanks,

Alex


2015-07-12 19:04:04

by Xiao Guangrong

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR



On 07/13/2015 01:33 AM, Alex Williamson wrote:
> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>> There are some bugs in the current get_mtrr_type():
>> 1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
>> IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
>> that means the other bits are ignored if it is cleared
>>
>> 2: the fixed MTRR ranges are controlled by bit 0 of
>> mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>
>> 3: if MTRRs are disabled, UC is applied to all of physical memory rather
>> than mtrr_state->def_type
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 14 ++++++--------
>> 1 file changed, 6 insertions(+), 8 deletions(-)
>
>
> I'm seeing a significant regression in boot performance on Intel
> hardware with assigned devices that bisects back to this patch. There's
> a long delay with Seabios between the version splash and execution of
> option ROMs, and a _very_ long delay with OVMF before the display is
> initialized. The delay is long enough that users are reporting their
> previously working VM is hung with 100% CPU usage on v4.2-rc1. Thanks,
>

Alex, thanks for your report. I will try to reproduce and fix it asap.

2015-07-12 19:12:19

by Bandan Das

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR

Alex Williamson <[email protected]> writes:

> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>> There are some bugs in the current get_mtrr_type():
>> 1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
>> IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
>> that means the other bits are ignored if it is cleared
>>
>> 2: the fixed MTRR ranges are controlled by bit 0 of
>> mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>
>> 3: if MTRRs are disabled, UC is applied to all of physical memory rather
>> than mtrr_state->def_type
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 14 ++++++--------
>> 1 file changed, 6 insertions(+), 8 deletions(-)
>
>
> I'm seeing a significant regression in boot performance on Intel
> hardware with assigned devices that bisects back to this patch. There's
> a long delay with Seabios between the version splash and execution of
> option ROMs, and a _very_ long delay with OVMF before the display is
> initialized. The delay is long enough that users are reporting their
> previously working VM is hung with 100% CPU usage on v4.2-rc1. Thanks,
>
> Alex
>
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index b78e83f..d00cebd 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
>> static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>> u64 start, u64 end)
>> {
>> - int i;
>> u64 base, mask;
>> u8 prev_match, curr_match;
>> - int num_var_ranges = KVM_NR_VAR_MTRR;
>> + int i, num_var_ranges = KVM_NR_VAR_MTRR;
>>
>> - if (!mtrr_state->enabled)
>> - return 0xFF;
>> + /* MTRR is completely disabled, use UC for all of physical memory. */
>> + if (!(mtrr_state->enabled & 0x2))
>> + return MTRR_TYPE_UNCACHABLE;

Setting this to UC if MTRR is disabled is maybe too restrictive...

Bandan


2015-07-13 07:32:13

by Paolo Bonzini

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR



On 12/07/2015 20:59, Xiao Guangrong wrote:
>
>
> On 07/13/2015 01:33 AM, Alex Williamson wrote:
>> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>>> There are some bugs in the current get_mtrr_type():
>>> 1: bit 1 of mtrr_state->enabled corresponds to bit 11 of the
>>> IA32_MTRR_DEF_TYPE MSR, which completely controls MTRR enablement;
>>> that means the other bits are ignored if it is cleared
>>>
>>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>> mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>>
>>> 3: if MTRRs are disabled, UC is applied to all of physical memory rather
>>> than mtrr_state->def_type
>>>
>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>> ---
>>> arch/x86/kvm/mmu.c | 14 ++++++--------
>>> 1 file changed, 6 insertions(+), 8 deletions(-)
>>
>>
>> I'm seeing a significant regression in boot performance on Intel
>> hardware with assigned devices that bisects back to this patch. There's
>> a long delay with Seabios between the version splash and execution of
>> option ROMs, and a _very_ long delay with OVMF before the display is
>> initialized. The delay is long enough that users are reporting their
>> previously working VM is hung with 100% CPU usage on v4.2-rc1. Thanks,
>>
>
> Alex, thanks for your report. I will try to reproduce and fix it asap.

The code that Bandan pointed out

+ /* MTRR is completely disabled, use UC for all of physical memory. */
+ if (!(mtrr_state->enabled & 0x2))
+ return MTRR_TYPE_UNCACHABLE;

actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
kvm_mtrr_get_guest_memory_type, 2015-06-15). Should mtrr_default_type
actually be something like this:

static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
{
	if (mtrr_is_enabled(mtrr_state))
		return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
	else
		return MTRR_TYPE_UNCACHABLE;
}

? Then it's easy to add a quirk that makes the default WRITEBACK until
MTRRs are enabled.
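
i.e. something along these lines, untested, with the quirk condition left
abstract (no such flag exists today):

static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
{
	if (mtrr_is_enabled(mtrr_state))
		return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;

	/* hypothetical quirk: legacy firmware assumes WB before it has
	 * programmed the MTRRs */
	if (quirk_wb_when_mtrr_disabled)
		return MTRR_TYPE_WRBACK;

	return MTRR_TYPE_UNCACHABLE;
}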

Paolo

2015-07-13 14:50:09

by Xiao Guangrong

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR


On 07/13/2015 03:32 PM, Paolo Bonzini wrote:

>>> I'm seeing a significant regression in boot performance on Intel
>>> hardware with assigned devices that bisects back to this patch. There's
>>> a long delay with Seabios between the version splash and execution of
>>> option ROMs, and a _very_ long delay with OVMF before the display is
>>> initialized. The delay is long enough that users are reporting their
>>> previously working VM is hung with 100% CPU usage on v4.2-rc1. Thanks,
>>>
>>
>> Alex, thanks for your report. I will try to reproduce and fix it asap.
>

I have reproduced the issue, and I think Bandan and Paolo are correct.

> The code that Bandan pointed out
>
> + /* MTRR is completely disabled, use UC for all of physical memory. */
> + if (!(mtrr_state->enabled & 0x2))
> + return MTRR_TYPE_UNCACHABLE;
>
> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
> kvm_mtrr_get_guest_memory_type, 2015-06-15). Should mtrr_default_type
> actually be something like this:

:(

Based on the SDM, UC is applied to all memory rather than default-type
if MTRR is disabled.

If I change the code to:

	if (!(mtrr_state->enabled & 0x2))
		return mtrr_state->def_type;

the result is the same as before.

However, fast boot comes back with "return 0xFF" here. So fast boot expects
the memory type to be WB.

>
> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
> {
> 	if (mtrr_is_enabled(mtrr_state))
> 		return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
> 	else
> 		return MTRR_TYPE_UNCACHABLE;
> }
>
> ? Then it's easy to add a quirk that makes the default WRITEBACK until
> MTRRs are enabled.

It is a wrong configuration in OVMF... do we need to adjust KVM to satisfy
OVMF?

2015-07-13 15:13:08

by Paolo Bonzini

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR

On 13/07/2015 16:45, Xiao Guangrong wrote:
>> + /* MTRR is completely disabled, use UC for all of physical
>> memory. */
>> + if (!(mtrr_state->enabled & 0x2))
>> + return MTRR_TYPE_UNCACHABLE;
>>
>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>
> :(
>
> Based on the SDM, UC is applied to all memory rather than default-type
> if MTRR is disabled.

There are two issues I think. One is that I cannot find in the current
code that "UC is applied to all memory rather than default-type if MTRR
is disabled". mtrr_default_type unconditionally looks at
mtrr_state->deftype.

> However, fast boot comes back with "return 0xFF" here. So fast boot expects
> the memory type to be WB.

Yes.

>>
>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>> {
>> 	if (mtrr_is_enabled(mtrr_state))
>> 		return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
>> 	else
>> 		return MTRR_TYPE_UNCACHABLE;
>> }
>>
>> ? Then it's easy to add a quirk that makes the default WRITEBACK until
>> MTRRs are enabled.
>
> It is a wrong configuration in OVMF... do we need to adjust KVM to satisfy
> OVMF?

That's what quirks are for... The firmware should still be fixed of course.

Paolo

2015-07-13 15:20:54

by Xiao Guangrong

Subject: Re: [PATCH v3 01/10] KVM: MMU: fix decoding cache type from MTRR



On 07/13/2015 11:13 PM, Paolo Bonzini wrote:
> On 13/07/2015 16:45, Xiao Guangrong wrote:
>>> + /* MTRR is completely disabled, use UC for all of physical
>>> memory. */
>>> + if (!(mtrr_state->enabled & 0x2))
>>> + return MTRR_TYPE_UNCACHABLE;
>>>
>>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>>
>> :(
>>
>> Based on the SDM, UC is applied to all memory rather than default-type
>> if MTRR is disabled.
>
> There are two issues I think. One is that I cannot find in the current
> code that "UC is applied to all memory rather than default-type if MTRR
> is disabled". mtrr_default_type unconditionally looks at
> mtrr_state->deftype.

Yes... Will fix.

>
>> However, fast boot comes back with "return 0xFF" here. So fast boot expects
>> the memory type to be WB.
>
> Yes.
>
>>>
>>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>>> {
>>> 	if (mtrr_is_enabled(mtrr_state))
>>> 		return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>> 	else
>>> 		return MTRR_TYPE_UNCACHABLE;
>>> }
>>>
>>> ? Then it's easy to add a quirk that makes the default WRITEBACK until
>>> MTRRs are enabled.
>>
>> It is a wrong configuration in OVMF... do we need to adjust KVM to satisfy
>> OVMF?
>
> That's what quirks are for... The firmware should still be fixed of course.

I see, will do it.

2015-07-14 21:12:59

by Laszlo Ersek

Subject: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

Cross-posting to edk2-devel.

Original sub-thread starts here:
http://thread.gmane.org/gmane.linux.kernel/1952205/focus=1994315

On 07/13/15 17:15, Xiao Guangrong wrote:
>
>
> On 07/13/2015 11:13 PM, Paolo Bonzini wrote:
>> On 13/07/2015 16:45, Xiao Guangrong wrote:
>>>> + /* MTRR is completely disabled, use UC for all of physical
>>>> memory. */
>>>> + if (!(mtrr_state->enabled & 0x2))
>>>> + return MTRR_TYPE_UNCACHABLE;
>>>>
>>>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>>>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>>>
>>> :(
>>>
>>> Based on the SDM, UC is applied to all memory rather than default-type
>>> if MTRR is disabled.
>>
>> There are two issues I think. One is that I cannot find in the current
>> code that "UC is applied to all memory rather than default-type if MTRR
>> is disabled". mtrr_default_type unconditionally looks at
>> mtrr_state->deftype.
>
> Yes... Will fix.
>
>>
>>> However, fast boot comes back with "return 0xFF" here. So fast boot expects
>>> the memory type to be WB.
>>
>> Yes.
>>
>>>>
>>>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>>>> {
>>>> 	if (mtrr_is_enabled(mtrr_state))
>>>> 		return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>>> 	else
>>>> 		return MTRR_TYPE_UNCACHABLE;
>>>> }
>>>>
>>>> ? Then it's easy to add a quirk that makes the default WRITEBACK until
>>>> MTRRs are enabled.
>>>
>>> It is a wrong configuration in OVMF... do we need to adjust KVM to
>>> satisfy OVMF?
>>
>> That's what quirks are for... The firmware should still be fixed of
>> course.
>
> I see, will do it.

The long delay that Alex reported (for the case when all guest memory
was set to UC up-front) is due to the fact that the SEC phase of OVMF
decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
drivers -- and this decompression is extremely memory-intensive.

(Refer to "Firmware image structure" in the OVMF whitepaper,
<http://www.linux-kvm.org/page/OVMF>.)

Perhaps also significant, with this initial MTRR change: the x86_64
reset vector builds some page tables too. (Refer to "Select features |
X64-specific reset vector for OVMF" in the OVMF whitepaper.)

(When Jordan implemented that reset vector first, we saw similar
performance degradation on AMD hosts (albeit not due to MTRR but due to
page attributes). See
<https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
it here because it makes me appreciate the current problem report.)

Anyway, the reset vector's page table building is implemented in
"OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
(It is recommended to refer heavily to the OVMF whitepaper, or at least
to drink heavily.)

In the PEI phase, we do set up MTRRs sensibly, see
"OvmfPkg/PlatformPei/MemDetect.c", function QemuInitializeRam().
However, that's too late with regard to this problem report, because PEI
modules run *from* one of the firmware volumes that SEC expands with LZMA.

Anyway, the logic in QemuInitializeRam() relies on the MtrrLib library
class, which OVMF resolves to the
"UefiCpuPkg/Library/MtrrLib/MtrrLib.inf" instance. The latter has no
client module type restriction (it's BASE), so it could be used in SEC
too, if someone were to write patches.

I'm sorry but I really can't take this on now. So, respectfully:
- please quirk it in KVM for now (SeaBIOS is also affected),
- "patches welcome" for OVMF, as always.

Thanks
Laszlo

2015-07-14 21:15:51

by Paolo Bonzini

[permalink] [raw]
Subject: Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

> The long delay that Alex reported (for the case when all guest memory
> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> drivers -- and this decompression is extremely memory-intensive.
>
> (When Jordan implemented that reset vector first, we saw similar
> performance degradation on AMD hosts (albeit not due to MTRR but due to
> page attributes). See
> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
> it here because it makes me appreciate the current problem report.)
>
> Anyway, the reset vector's page table building is implemented in
> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().

Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
and set the default type to writeback.
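
In C, a minimal sketch of that MSR write could look like the following,
assuming BaseLib's AsmWriteMsr64() is callable that early; the wrapper
name is made up:

#include <Base.h>
#include <Library/BaseLib.h>

#define IA32_MTRR_DEF_TYPE  0x2FF

VOID
SecEnableMtrrsDefaultWb (
  VOID
  )
{
  //
  // Set E (bit 11) to enable MTRRs and make the default memory type
  // WB (06h). No fixed or variable ranges are programmed, so all of
  // physical memory is WB until PEI sets up the real MTRR layout.
  //
  AsmWriteMsr64 (IA32_MTRR_DEF_TYPE, BIT11 | 0x06);
}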

In any case we're going to have to quirk it, because of the broken
guests in the wild.

Paolo

2015-07-14 21:29:17

by Laszlo Ersek

[permalink] [raw]
Subject: Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

On 07/14/15 23:15, Paolo Bonzini wrote:
>> The long delay that Alex reported (for the case when all guest memory
>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>> drivers -- and this decompression is extremely memory-intensive.
>>
>> (When Jordan implemented that reset vector first, we saw similar
>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>> page attributes). See
>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>> it here because it makes me appreciate the current problem report.)
>>
>> Anyway, the reset vector's page table building is implemented in
>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>
> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?

That's an idea, yes, if someone feels sufficiently drawn to writing
assembly. Complications:
- the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
build
- it needs to be determined what memory to cover.

> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> and set the default type to writeback.

Seems safe to me, off the top of my head (and testing could confirm /
disprove quickly).

> In any case we're going to have to quirk it, because of the broken
> guests in the wild.

Thanks.
Laszlo

2015-07-14 22:37:09

by Jordan Justen

[permalink] [raw]
Subject: Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

On 2015-07-14 14:29:11, Laszlo Ersek wrote:
> On 07/14/15 23:15, Paolo Bonzini wrote:
> >> The long delay that Alex reported (for the case when all guest memory
> >> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> >> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> >> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> >> drivers -- and this decompression is extremely memory-intensive.
> >>
> >> (When Jordan implemented that reset vector first, we saw similar
> >> performance degradation on AMD hosts (albeit not due to MTRR but due to
> >> page attributes). See
> >> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
> >> it here because it makes me appreciate the current problem report.)
> >>
> >> Anyway, the reset vector's page table building is implemented in
> >> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> >> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
> >
> > Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>
> That's an idea, yes, if someone feels sufficiently drawn to writing
> assembly.

Maybe we can use MtrrLib in the SEC C code?

-Jordan

> Complications:
> - the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
> build
> - it needs to be determined what memory to cover.
>
> > I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> > and set the default type to writeback.
>
> Seems safe to me, off the top of my head (and testing could confirm /
> disprove quickly).
>
> > In any case we're going to have to quirk it, because of the broken
> > guests in the wild.
>
> Thanks.
> Laszlo

2015-07-15 00:19:57

by Fan, Jeff

[permalink] [raw]
Subject: RE: [edk2] MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

Actually, MMIO will be used in the OVMF SEC phase if the local APIC is consumed. (SecPeiDebugAgentLib consumes the local APIC timer for communication between the debugger TARGET and HOST.)

So, I suggest either keeping the MTRR default type UC and setting the code range to WB, or setting the default type to WB and the local APIC space range to UC. (gUefiCpuPkgTokenSpaceGuid.PcdCpuLocalApicBaseAddress|0xfee00000)

As Jordan suggested, you could make use of the MTRR lib in UefiCpuPkg.
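
For illustration, a rough sketch of the second option on top of that
lib -- the wrapper name is made up; MtrrSetMemoryAttribute and the PCD
are the existing UefiCpuPkg interfaces named above:

#include <Base.h>
#include <Library/MtrrLib.h>
#include <Library/PcdLib.h>

VOID
SecConfigureMtrrs (
  VOID
  )
{
  //
  // With the default type set to WB, carve the local APIC MMIO page
  // out as UC so the debug agent's local APIC timer accesses are not
  // cached.
  //
  MtrrSetMemoryAttribute (
    PcdGet32 (PcdCpuLocalApicBaseAddress),  // 0xfee00000 by default
    SIZE_4KB,
    CacheUncacheable
    );
}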

Jeff

-----Original Message-----
From: Paolo Bonzini [mailto:[email protected]]
Sent: Wednesday, July 15, 2015 5:16 AM
To: Laszlo Ersek
Cc: edk2-devel list; Xiao Guangrong; [email protected]; [email protected]; [email protected]; [email protected]
Subject: Re: [edk2] MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

> The long delay that Alex reported (for the case when all guest memory
> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> drivers -- and this decompression is extremely memory-intensive.
>
> (When Jordan implemented that reset vector first, we saw similar
> performance degradation on AMD hosts (albeit not due to MTRR but due
> to page attributes). See
> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only
> mentioning it here because it makes me appreciate the current problem
> report.)
>
> Anyway, the reset vector's page table building is implemented in
> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().

Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs and set the default type to writeback.

In any case we're going to have to quirk it, because of the broken guests in the wild.

Paolo


2015-07-15 09:57:38

by Laszlo Ersek

[permalink] [raw]
Subject: Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

On 07/15/15 00:37, Jordan Justen wrote:
> On 2015-07-14 14:29:11, Laszlo Ersek wrote:
>> On 07/14/15 23:15, Paolo Bonzini wrote:
>>>> The long delay that Alex reported (for the case when all guest memory
>>>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>>>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>>>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>>>> drivers -- and this decompression is extremely memory-intensive.
>>>>
>>>> (When Jordan implemented that reset vector first, we saw similar
>>>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>>>> page attributes). See
>>>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>>>> it here because it makes me appreciate the current problem report.)
>>>>
>>>> Anyway, the reset vector's page table building is implemented in
>>>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>>>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>>>
>>> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>>
>> That's an idea, yes, if someone feels sufficiently drawn to writing
>> assembly.
>
> Maybe we can use MtrrLib in the SEC C code?

If the page table building in the reset vector is not too costly with
UC, then yes, it should suffice if MtrrLib were put to use just before
the decompression in SEC.

Thanks
Laszlo

> -Jordan
>
>> Complications:
>> - the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
>> build
>> - it needs to be determined what memory to cover.
>>
>>> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
>>> and set the default type to writeback.
>>
>> Seems safe to me, off the top of my head (and testing could confirm /
>> disprove quickly).
>>
>>> In any case we're going to have to quirk it, because of the broken
>>> guests in the wild.
>>
>> Thanks.
>> Laszlo

2015-07-15 19:35:43

by Xiao Guangrong

[permalink] [raw]
Subject: Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]


Hi,

I have posted the patchset to make OVMF happy and have CCed you guys;
could you please check whether it works for you?


On 07/15/2015 05:15 AM, Paolo Bonzini wrote:
>> The long delay that Alex reported (for the case when all guest memory
>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>> drivers -- and this decompression is extremely memory-intensive.
>>
>> (When Jordan implemented that reset vector first, we saw similar
>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>> page attributes). See
>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>> it here because it makes me appreciate the current problem report.)
>>
>> Anyway, the reset vector's page table building is implemented in
>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>
> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> and set the default type to writeback.
>
> In any case we're going to have to quirk it, because of the broken
> guests in the wild.
>
> Paolo
>

2015-07-15 19:41:15

by Laszlo Ersek

[permalink] [raw]
Subject: Re: MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

On 07/15/15 21:30, Xiao Guangrong wrote:
>
> Hi,
>
> I have posted the patchset to make OVMF happy and have CCed you guys;
> could you please check whether it works for you?

Sorry, I can't check it; I don't have an environment that exposes the
issue in the first place. Perhaps Alex can check it, or point some of
the users who reported the problem to him at this patchset? (Those
users were using bleeding edge v4.2-rc1 anyway, unlike conservative me.)

Thanks!
Laszlo

> On 07/15/2015 05:15 AM, Paolo Bonzini wrote:
>>> The long delay that Alex reported (for the case when all guest memory
>>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>>> drivers -- and this decompression is extremely memory-intensive.
>>>
>>> (When Jordan implemented that reset vector first, we saw similar
>>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>>> page attributes). See
>>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>>> it here because it makes me appreciate the current problem report.)
>>>
>>> Anyway, the reset vector's page table building is implemented in
>>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>>
>> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
>> and set the default type to writeback.
>>
>> In any case we're going to have to quirk it, because of the broken
>> guests in the wild.
>>
>> Paolo
>>