2013-03-15 15:26:46

by Xiao Guangrong

Subject: [PATCH 0/5] KVM: MMU: fast invalid all mmio sptes

The current way holds the hot mmu-lock and walks all shadow pages; this
does not scale. This patchset introduces a very simple and scalable way
to quickly invalidate all mmio sptes - it does not need to walk any
shadow pages or hold any locks.

The idea is simple:
KVM maintains a global mmio invalid generation-number which is stored in
kvm->arch.mmio_invalid_gen, and every mmio spte stores the current global
generation-number in its available bits when it is created.

When KVM needs to zap all mmio sptes, it simply increases the global
generation-number. When a guest does an mmio access, KVM intercepts the
MMIO #PF, then walks the shadow page table and gets the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it goes to the normal #PF handler to update the mmio spte.

Since 19 bits are used to store the generation-number in the mmio spte,
the generation-number can wrap around after 33554432 increments. That is
large enough for nearly all cases, but to make the code more robust we
zap all shadow pages when the number wraps around.

Note: after my patchset that fast-zaps all shadow pages, kvm_mmu_zap_all is
no longer a problem. Its scalability is the same as zapping only the mmio
shadow pages.
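
For illustration, the whole scheme can be modeled in a few lines of
standalone C (this is only a sketch of the idea above; the type and
function names here are illustrative, not the ones used in the patches):

#include <stdio.h>

static unsigned int global_gen;		/* stands in for kvm->arch.mmio_invalid_gen */

struct mmio_spte {
	unsigned int gen;		/* generation recorded when the spte was created */
	unsigned long gfn;
};

static void mark_mmio(struct mmio_spte *s, unsigned long gfn)
{
	s->gen = global_gen;
	s->gfn = gfn;
}

/* 1: valid mmio spte, emulate directly; 2: stale, rebuild it in the page fault path */
static int handle_mmio_fault(struct mmio_spte *s)
{
	return s->gen == global_gen ? 1 : 2;
}

int main(void)
{
	struct mmio_spte s;

	mark_mmio(&s, 0x100);
	printf("before invalidation: %d\n", handle_mmio_fault(&s));	/* prints 1 */

	global_gen++;			/* "zap" every mmio spte with a single increment */
	printf("after invalidation:  %d\n", handle_mmio_fault(&s));	/* prints 2 */
	return 0;
}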


2013-03-15 15:27:10

by Xiao Guangrong

Subject: [PATCH 1/5] Revert "KVM: x86: Optimize mmio spte zapping when creating/moving memslot"

This reverts commit 982b3394dd23eec6e5a2f7871238435a167b63cc.
This way does not scale; a simple and scalable way to zap all
mmio sptes will be used later.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu.c | 18 ------------------
arch/x86/kvm/x86.c | 2 +-
3 files changed, 1 insertions(+), 20 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 3f205c6c..9b75cae 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -767,7 +767,6 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
-void kvm_mmu_zap_mmio_sptes(struct kvm *kvm);
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c1a9b7b..de45ec1 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -4189,24 +4189,6 @@ restart:
spin_unlock(&kvm->mmu_lock);
}

-void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
-{
- struct kvm_mmu_page *sp, *node;
- LIST_HEAD(invalid_list);
-
- spin_lock(&kvm->mmu_lock);
-restart:
- list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
- if (!sp->mmio_cached)
- continue;
- if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
- goto restart;
- }
-
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
- spin_unlock(&kvm->mmu_lock);
-}
-
static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
{
struct kvm *kvm;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d3c4787..61a5bb6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6991,7 +6991,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
* mmio sptes.
*/
if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
- kvm_mmu_zap_mmio_sptes(kvm);

2013-03-15 15:27:49

by Xiao Guangrong

Subject: [PATCH 2/5] Revert "KVM: MMU: Mark sp mmio cached when creating mmio spte"

This reverts commit 95b0430d1a53541076ffbaf453f8b49a547cceba.
A better way to zap all mmio shadow pages will be used later.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu.c | 3 ---
2 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 9b75cae..ef7f4a5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -230,7 +230,6 @@ struct kvm_mmu_page {
#endif

int write_flooding_count;
- bool mmio_cached;
};

struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index de45ec1..fdacabb 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -199,11 +199,8 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);

static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
{
- struct kvm_mmu_page *sp = page_header(__pa(sptep));
-
access &= ACC_WRITE_MASK | ACC_USER_MASK;

- sp->mmio_cached = true;
trace_mark_mmio_spte(sptep, gfn, access);
mmu_spte_set(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
}
--
1.7.7.6

2013-03-15 15:28:31

by Xiao Guangrong

Subject: [PATCH 3/5] KVM: MMU: retain more available bits on mmio spte

Let the mmio spte use only bit 62 and bit 63 of the upper 32 bits, so that
bit 52 ~ bit 61 can be used for other purposes.
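
As a standalone illustration of the resulting layout (assuming a maxphyaddr
of 46; this sketch only mirrors the mask arithmetic in the patch below and
is not part of it):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	int maxphyaddr = 46;	/* assumed example value */
	uint64_t mask;

	/* reserved physical address bits: maxphyaddr .. 51 */
	mask = ((1ull << (51 - maxphyaddr + 1)) - 1) << maxphyaddr;
	/* bits 62 and 63, reserved even on a 32-bit host */
	mask |= 0x3ull << 62;
	/* the present bit */
	mask |= 1ull;

	printf("mmio spte mask: 0x%llx\n", (unsigned long long)mask);
	/* bits 52 ~ 61 stay clear, free for the generation number */
	printf("bits 52-61 free: %s\n",
	       (mask & (((1ull << 10) - 1) << 52)) ? "no" : "yes");
	return 0;
}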

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/vmx.c | 4 ++--
arch/x86/kvm/x86.c | 8 +++++++-
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 17a6938..54fdb76 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4003,10 +4003,10 @@ static void ept_set_mmio_spte_mask(void)
/*
* EPT Misconfigurations can be generated if the value of bits 2:0
* of an EPT paging-structure entry is 110b (write/execute).
- * Also, magic bits (0xffull << 49) is set to quickly identify mmio
+ * Also, magic bits (0x3ull << 62) is set to quickly identify mmio
* spte.
*/
- kvm_mmu_set_mmio_spte_mask(0xffull << 49 | 0x6ull);
+ kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
}

/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 61a5bb6..81fb3c3 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5197,7 +5197,13 @@ static void kvm_set_mmio_spte_mask(void)
* Set the reserved bits and the present bit of an paging-structure
* entry to generate page fault with PFER.RSV = 1.
*/
- mask = ((1ull << (62 - maxphyaddr + 1)) - 1) << maxphyaddr;
+ /* Mask the reserved physical address bits. */
+ mask = ((1ull << (51 - maxphyaddr + 1)) - 1) << maxphyaddr;
+
+ /* Bit 62 is always reserved for 32bit host. */
+ mask |= 0x3ull << 62;
+
+ /* Set the present bit. */
mask |= 1ull;

#ifdef CONFIG_X86_64
--
1.7.7.6

2013-03-15 15:30:04

by Xiao Guangrong

Subject: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

Store the generation-number in bit 3 ~ bit 11 and bit 52 ~ bit 61; in total
19 bits can be used, which should be enough for nearly all common cases.

In this patch the generation-number is always 0; it will be changed in a
later patch.
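
A standalone round-trip check of this bit layout (only an illustration of
the packing described above; the constants mirror the patch but the test
harness does not come from it):

#include <assert.h>
#include <stdint.h>

#define GEN_LOW_SHIFT	3	/* low 9 bits land in spte bits 3 ~ 11 */
#define GEN_HIGH_SHIFT	52	/* high 10 bits land in spte bits 52 ~ 61 */
#define GEN_LOW_BITS	9
#define GEN_MAX		((1u << 19) - 1)

static uint64_t gen_to_mask(unsigned int gen)
{
	return ((uint64_t)(gen & ((1u << GEN_LOW_BITS) - 1)) << GEN_LOW_SHIFT) |
	       ((uint64_t)(gen >> GEN_LOW_BITS) << GEN_HIGH_SHIFT);
}

static unsigned int mask_to_gen(uint64_t spte)
{
	unsigned int low = (spte >> GEN_LOW_SHIFT) & ((1u << GEN_LOW_BITS) - 1);
	unsigned int high = ((spte >> GEN_HIGH_SHIFT) & ((1u << 10) - 1)) << GEN_LOW_BITS;

	return low | high;
}

int main(void)
{
	unsigned int gen;

	/* every 19-bit generation must survive the encode/decode round trip */
	for (gen = 0; gen <= GEN_MAX; gen++)
		assert(mask_to_gen(gen_to_mask(gen)) == gen);
	return 0;
}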

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 57 +++++++++++++++++++++++++++++++++++++-------
arch/x86/kvm/mmutrace.h | 10 ++++---
arch/x86/kvm/paging_tmpl.h | 3 +-
3 files changed, 56 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fdacabb..13626f4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -197,12 +197,50 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);

-static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
+/*
+ * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
+ * generation, the bits of bits 52 ~ bit 61 are used as
+ * high 12 bits of generation.
+ */
+#define MMIO_SPTE_GEN_LOW_SHIFT 3
+#define MMO_SPTE_GEN_HIGH_SHIFT 52
+
+#define GEN_LOW_SHIFT 9
+#define GEN_LOW_MASK ((1 << GEN_LOW_SHIFT) - 1)
+#define MAX_GEN ((1 << 19) - 1)
+
+static u64 generation_mmio_spte_mask(unsigned int gen)
{
+ u64 mask;
+
+ WARN_ON(gen > MAX_GEN);
+
+ mask = (gen & GEN_LOW_MASK) << MMIO_SPTE_GEN_LOW_SHIFT;
+ mask |= ((u64)gen >> GEN_LOW_SHIFT) << MMO_SPTE_GEN_HIGH_SHIFT;
+ return mask;
+}
+
+static unsigned int get_mmio_spte_generation(u64 spte)
+{
+ unsigned int gen;
+
+ spte &= ~shadow_mmio_mask;
+
+ gen = (spte >> MMIO_SPTE_GEN_LOW_SHIFT) & GEN_LOW_MASK;
+ gen |= (spte >> MMO_SPTE_GEN_HIGH_SHIFT) << GEN_LOW_SHIFT;
+ return gen;
+}
+
+static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
+ unsigned access)
+{
+ u64 mask = generation_mmio_spte_mask(0);
+
access &= ACC_WRITE_MASK | ACC_USER_MASK;
+ mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;

- trace_mark_mmio_spte(sptep, gfn, access);
- mmu_spte_set(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
+ trace_mark_mmio_spte(sptep, gfn, access, 0);
+ mmu_spte_set(sptep, mask);
}

static bool is_mmio_spte(u64 spte)
@@ -220,10 +258,11 @@ static unsigned get_mmio_spte_access(u64 spte)
return (spte & ~shadow_mmio_mask) & ~PAGE_MASK;
}

-static bool set_mmio_spte(u64 *sptep, gfn_t gfn, pfn_t pfn, unsigned access)
+static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
+ pfn_t pfn, unsigned access)
{
if (unlikely(is_noslot_pfn(pfn))) {
- mark_mmio_spte(sptep, gfn, access);
+ mark_mmio_spte(kvm, sptep, gfn, access);
return true;
}

@@ -2327,7 +2366,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
u64 spte;
int ret = 0;

- if (set_mmio_spte(sptep, gfn, pfn, pte_access))
+ if (set_mmio_spte(vcpu->kvm, sptep, gfn, pfn, pte_access))
return 0;

spte = PT_PRESENT_MASK;
@@ -3386,8 +3425,8 @@ static inline void protect_clean_gpte(unsigned *access, unsigned gpte)
*access &= mask;
}

-static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
- int *nr_present)
+static bool sync_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
+ unsigned access, int *nr_present)
{
if (unlikely(is_mmio_spte(*sptep))) {
if (gfn != get_mmio_spte_gfn(*sptep)) {
@@ -3396,7 +3435,7 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
}

(*nr_present)++;
- mark_mmio_spte(sptep, gfn, access);
+ mark_mmio_spte(kvm, sptep, gfn, access);
return true;
}

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index b8f6172..f5b62a7 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -197,23 +197,25 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,

TRACE_EVENT(
mark_mmio_spte,
- TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
- TP_ARGS(sptep, gfn, access),
+ TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access, unsigned int gen),
+ TP_ARGS(sptep, gfn, access, gen),

TP_STRUCT__entry(
__field(void *, sptep)
__field(gfn_t, gfn)
__field(unsigned, access)
+ __field(unsigned int, gen)
),

TP_fast_assign(
__entry->sptep = sptep;
__entry->gfn = gfn;
__entry->access = access;
+ __entry->gen = gen;
),

- TP_printk("sptep:%p gfn %llx access %x", __entry->sptep, __entry->gfn,
- __entry->access)
+ TP_printk("sptep:%p gfn %llx access %x gen %x", __entry->sptep,
+ __entry->gfn, __entry->access, __entry->gen)
);

TRACE_EVENT(
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 105dd5b..2c48e5f 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -792,7 +792,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
pte_access &= gpte_access(vcpu, gpte);
protect_clean_gpte(&pte_access, gpte);

- if (sync_mmio_spte(&sp->spt[i], gfn, pte_access, &nr_present))
+ if (sync_mmio_spte(vcpu->kvm, &sp->spt[i], gfn, pte_access,
+ &nr_present))
continue;

if (gfn != sp->gfns[i]) {
--
1.7.7.6

2013-03-15 15:30:26

by Xiao Guangrong

Subject: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

This patch introduces a very simple and scalable way to invalidate all
mmio sptes - it does not need to walk any shadow pages or hold the mmu-lock.

KVM maintains a global mmio invalid generation-number which is stored in
kvm->arch.mmio_invalid_gen, and every mmio spte stores the current global
generation-number in its available bits when it is created.

When KVM needs to zap all mmio sptes, it simply increases the global
generation-number. When a guest does an mmio access, KVM intercepts the
MMIO #PF, then walks the shadow page table and gets the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it goes to the normal #PF handler to update the mmio spte.

Since 19 bits are used to store the generation-number in the mmio spte,
the generation-number can wrap around after 33554432 increments. That is
large enough for nearly all cases, but to make the code more robust we
zap all shadow pages when the number wraps around.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +
arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
arch/x86/kvm/mmutrace.h | 17 +++++++++++
arch/x86/kvm/paging_tmpl.h | 7 +++-
arch/x86/kvm/vmx.c | 4 ++
arch/x86/kvm/x86.c | 6 +--
6 files changed, 82 insertions(+), 15 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ef7f4a5..572398e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -529,6 +529,7 @@ struct kvm_arch {
unsigned int n_requested_mmu_pages;
unsigned int n_max_mmu_pages;
unsigned int indirect_shadow_pages;
+ unsigned int mmio_invalid_gen;
struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
/*
* Hash table of struct kvm_mmu_page.
@@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask);
+void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
void kvm_mmu_zap_all(struct kvm *kvm);
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 13626f4..7093a92 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
unsigned access)
{
- u64 mask = generation_mmio_spte_mask(0);
+ unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
+ u64 mask = generation_mmio_spte_mask(gen);

access &= ACC_WRITE_MASK | ACC_USER_MASK;
mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;

- trace_mark_mmio_spte(sptep, gfn, access, 0);
+ trace_mark_mmio_spte(sptep, gfn, access, gen);
mmu_spte_set(sptep, mask);
}

@@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
return false;
}

+static bool check_mmio_spte(struct kvm *kvm, u64 spte)
+{
+ return get_mmio_spte_generation(spte) ==
+ ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
+}
+
+/*
+ * The caller should protect concurrent access on
+ * kvm->arch.mmio_invalid_gen. Currently, it is used by
+ * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
+ */
+void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
+{
+ /* Ensure update memslot has been completed. */
+ smp_mb();
+
+ trace_kvm_mmu_invalid_mmio_spte(kvm);
+
+ /*
+ * The very rare case: if the generation-number is round,
+ * zap all shadow pages.
+ */
+ if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
+ kvm->arch.mmio_invalid_gen = 0;
+ return kvm_mmu_zap_all(kvm);
+ }
+}
+
static inline u64 rsvd_bits(int s, int e)
{
return ((1ULL << (e - s + 1)) - 1) << s;
@@ -3183,9 +3212,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
}

/*
- * If it is a real mmio page fault, return 1 and emulat the instruction
- * directly, return 0 to let CPU fault again on the address, -1 is
- * returned if bug is detected.
+ * Return value:
+ * 2: invalid spte is detected then let the real page fault path
+ * update the mmio spte.
+ * 1: it is a real mmio page fault, emulate the instruction directly.
+ * 0: let CPU fault again on the address.
+ * -1: bug is detected.
*/
int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
{
@@ -3200,6 +3232,9 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
gfn_t gfn = get_mmio_spte_gfn(spte);
unsigned access = get_mmio_spte_access(spte);

+ if (unlikely(!check_mmio_spte(vcpu->kvm, spte)))
+ return 2;
+
if (direct)
addr = 0;

@@ -3241,8 +3276,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,

pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);

- if (unlikely(error_code & PFERR_RSVD_MASK))
- return handle_mmio_page_fault(vcpu, gva, error_code, true);
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ r = handle_mmio_page_fault(vcpu, gva, error_code, true);
+
+ if (likely(r != 2))
+ return r;
+ }

r = mmu_topup_memory_caches(vcpu);
if (r)
@@ -3318,8 +3357,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
ASSERT(vcpu);
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));

- if (unlikely(error_code & PFERR_RSVD_MASK))
- return handle_mmio_page_fault(vcpu, gpa, error_code, true);
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
+
+ if (likely(r != 2))
+ return r;
+ }

r = mmu_topup_memory_caches(vcpu);
if (r)
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index f5b62a7..811254c 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -276,6 +276,23 @@ TRACE_EVENT(
__spte_satisfied(old_spte), __spte_satisfied(new_spte)
)
);
+
+TRACE_EVENT(
+ kvm_mmu_invalid_mmio_spte,
+ TP_PROTO(struct kvm *kvm),
+ TP_ARGS(kvm),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, mmio_invalid_gen)
+ ),
+
+ TP_fast_assign(
+ __entry->mmio_invalid_gen = kvm->arch.mmio_invalid_gen;
+ ),
+
+ TP_printk("mmio_invalid_gen %x", __entry->mmio_invalid_gen
+ )
+);
#endif /* _TRACE_KVMMMU_H */

#undef TRACE_INCLUDE_PATH
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 2c48e5f..c81f949 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -552,9 +552,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

- if (unlikely(error_code & PFERR_RSVD_MASK))
- return handle_mmio_page_fault(vcpu, addr, error_code,
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ r = handle_mmio_page_fault(vcpu, addr, error_code,
mmu_is_nested(vcpu));
+ if (likely(r != 2))
+ return r;
+ };

r = mmu_topup_memory_caches(vcpu);
if (r)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 54fdb76..ca8e9a3 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5159,6 +5159,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
if (likely(ret == 1))
return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
EMULATE_DONE;
+
+ if (unlikely(ret == 2))
+ return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
+
if (unlikely(!ret))
return 1;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 81fb3c3..5b6d2a0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -6996,10 +6996,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
* If memory slot is created, or moved, we need to clear all
* mmio sptes.
*/
- if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
- kvm_mmu_zap_all(kvm);
- kvm_reload_remote_mmus(kvm);
- }
+ if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
+ kvm_mmu_invalid_mmio_spte(kvm);
}

void kvm_arch_flush_shadow_all(struct kvm *kvm)
--
1.7.7.6

2013-03-16 02:06:21

by Takuya Yoshikawa

Subject: Re: [PATCH 0/5] KVM: MMU: fast invalid all mmio sptes

Still reading, but sounds great if this works!

I did not like the idea of the mmio-rmap-based approach so much, but this
would be really/perfectly scalable.

Thanks,
Takuya

On Fri, 15 Mar 2013 23:26:16 +0800
Xiao Guangrong <[email protected]> wrote:

> The current way is holding hot mmu-lock and walking all shadow pages, this
> is not scale. This patchset tries to introduce a very simple and scale way
> to fast invalid all mmio sptes - it need not walk any shadow pages and hold
> any locks.
>
> The idea is simple:
> KVM maintains a global mmio invalid generation-number which is stored in
> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> generation-number into his available bits when it is created.
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte.
>
> Since 19 bits are used to store generation-number on mmio spte, the
> generation-number can be round after 33554432 times. It is large enough
> for nearly all most cases, but making the code be more strong, we zap all
> shadow pages when the number is round.
>
> Note: after my patchset that fast zap all shadow pages, kvm_mmu_zap_all is
> not a problem any more. The scalability is the same as zap mmio shadow page
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Takuya Yoshikawa <[email protected]>

2013-03-16 02:07:53

by Takuya Yoshikawa

Subject: Re: [PATCH 1/5] Revert "KVM: x86: Optimize mmio spte zapping when creating/moving memslot"

On Fri, 15 Mar 2013 23:26:59 +0800
Xiao Guangrong <[email protected]> wrote:

> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index d3c4787..61a5bb6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6991,7 +6991,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> * mmio sptes.
> */
> if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> - kvm_mmu_zap_mmio_sptes(kvm);
>

???
+ kvm_mmu_zap_all()


Takuya

2013-03-16 02:13:36

by Takuya Yoshikawa

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Fri, 15 Mar 2013 23:29:53 +0800
Xiao Guangrong <[email protected]> wrote:

> +/*
> + * The caller should protect concurrent access on
> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
> + */
> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)

kvm_mmu_invalidate_mmio_sptes() may be a better name.

Thanks,
Takuya

> +{
> + /* Ensure update memslot has been completed. */
> + smp_mb();
> +
> + trace_kvm_mmu_invalid_mmio_spte(kvm);
> +
> + /*
> + * The very rare case: if the generation-number is round,
> + * zap all shadow pages.
> + */
> + if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
> + kvm->arch.mmio_invalid_gen = 0;
> + return kvm_mmu_zap_all(kvm);
> + }
> +}
> +

2013-03-17 15:02:47

by Gleb Natapov

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
> This patch tries to introduce a very simple and scale way to invalid all
> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>
> KVM maintains a global mmio invalid generation-number which is stored in
> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> generation-number into his available bits when it is created
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte
>
> Since 19 bits are used to store generation-number on mmio spte, the
> generation-number can be round after 33554432 times. It is large enough
> for nearly all most cases, but making the code be more strong, we zap all
> shadow pages when the number is round
>
Very nice idea, but why drop Takuya's patches instead of using
kvm_mmu_zap_mmio_sptes() when the generation number overflows?


> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
> arch/x86/kvm/mmutrace.h | 17 +++++++++++
> arch/x86/kvm/paging_tmpl.h | 7 +++-
> arch/x86/kvm/vmx.c | 4 ++
> arch/x86/kvm/x86.c | 6 +--
> 6 files changed, 82 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ef7f4a5..572398e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -529,6 +529,7 @@ struct kvm_arch {
> unsigned int n_requested_mmu_pages;
> unsigned int n_max_mmu_pages;
> unsigned int indirect_shadow_pages;
> + unsigned int mmio_invalid_gen;
Why invalid? Should be mmio_valid_gen or mmio_current_get.

> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> /*
> * Hash table of struct kvm_mmu_page.
> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> gfn_t gfn_offset, unsigned long mask);
> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.

> void kvm_mmu_zap_all(struct kvm *kvm);
> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 13626f4..7093a92 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> unsigned access)
> {
> - u64 mask = generation_mmio_spte_mask(0);
> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> + u64 mask = generation_mmio_spte_mask(gen);
>
> access &= ACC_WRITE_MASK | ACC_USER_MASK;
> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
>
> - trace_mark_mmio_spte(sptep, gfn, access, 0);
> + trace_mark_mmio_spte(sptep, gfn, access, gen);
> mmu_spte_set(sptep, mask);
> }
>
> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
> return false;
> }
>
> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
> +{
> + return get_mmio_spte_generation(spte) ==
> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> +}
> +
> +/*
> + * The caller should protect concurrent access on
> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
> + */
> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
> +{
> + /* Ensure update memslot has been completed. */
> + smp_mb();
What barrier is this one paired with?

> +
> + trace_kvm_mmu_invalid_mmio_spte(kvm);
Something wrong with indentation.

> +
> + /*
> + * The very rare case: if the generation-number is round,
> + * zap all shadow pages.
> + */
> + if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
> + kvm->arch.mmio_invalid_gen = 0;
> + return kvm_mmu_zap_all(kvm);
> + }
> +}
> +
> static inline u64 rsvd_bits(int s, int e)
> {
> return ((1ULL << (e - s + 1)) - 1) << s;
> @@ -3183,9 +3212,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
> }
>
> /*
> - * If it is a real mmio page fault, return 1 and emulat the instruction
> - * directly, return 0 to let CPU fault again on the address, -1 is
> - * returned if bug is detected.
> + * Return value:
> + * 2: invalid spte is detected then let the real page fault path
> + * update the mmio spte.
> + * 1: it is a real mmio page fault, emulate the instruction directly.
> + * 0: let CPU fault again on the address.
> + * -1: bug is detected.
> */
What about dropping the spte and letting the guest re-fault instead of
propagating a new return value up the call chain? If this is slow, let's at
least define an enum with descriptive names.
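
One possible shape for such an enum (purely a suggestion sketch; these names
are not taken from any posted patch):

enum mmio_pf_result {
	RET_MMIO_PF_BUG		= -1,	/* a bug was detected */
	RET_MMIO_PF_RETRY	= 0,	/* let the CPU fault again */
	RET_MMIO_PF_EMULATE	= 1,	/* real mmio access, emulate the instruction */
	RET_MMIO_PF_INVALID	= 2,	/* stale generation, take the normal #PF path */
};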

> int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> {
> @@ -3200,6 +3232,9 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> gfn_t gfn = get_mmio_spte_gfn(spte);
> unsigned access = get_mmio_spte_access(spte);
>
> + if (unlikely(!check_mmio_spte(vcpu->kvm, spte)))
> + return 2;
> +
> if (direct)
> addr = 0;
>
> @@ -3241,8 +3276,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
>
> pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, gva, error_code, true);
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, gva, error_code, true);
> +
> + if (likely(r != 2))
> + return r;
> + }
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> @@ -3318,8 +3357,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> ASSERT(vcpu);
> ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, gpa, error_code, true);
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
> +
> + if (likely(r != 2))
> + return r;
> + }
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> index f5b62a7..811254c 100644
> --- a/arch/x86/kvm/mmutrace.h
> +++ b/arch/x86/kvm/mmutrace.h
> @@ -276,6 +276,23 @@ TRACE_EVENT(
> __spte_satisfied(old_spte), __spte_satisfied(new_spte)
> )
> );
> +
> +TRACE_EVENT(
> + kvm_mmu_invalid_mmio_spte,
> + TP_PROTO(struct kvm *kvm),
> + TP_ARGS(kvm),
> +
> + TP_STRUCT__entry(
> + __field(unsigned int, mmio_invalid_gen)
> + ),
> +
> + TP_fast_assign(
> + __entry->mmio_invalid_gen = kvm->arch.mmio_invalid_gen;
> + ),
> +
> + TP_printk("mmio_invalid_gen %x", __entry->mmio_invalid_gen
> + )
> +);
> #endif /* _TRACE_KVMMMU_H */
>
> #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 2c48e5f..c81f949 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -552,9 +552,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>
> pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, addr, error_code,
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, addr, error_code,
> mmu_is_nested(vcpu));
> + if (likely(r != 2))
> + return r;
> + };
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 54fdb76..ca8e9a3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5159,6 +5159,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> if (likely(ret == 1))
> return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
> EMULATE_DONE;
> +
> + if (unlikely(ret == 2))
> + return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
> +
> if (unlikely(!ret))
> return 1;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 81fb3c3..5b6d2a0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6996,10 +6996,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> * If memory slot is created, or moved, we need to clear all
> * mmio sptes.
> */
> - if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> - kvm_mmu_zap_all(kvm);
> - kvm_reload_remote_mmus(kvm);
> - }
> + if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
> + kvm_mmu_invalid_mmio_spte(kvm);
> }
>
> void kvm_arch_flush_shadow_all(struct kvm *kvm)
> --
> 1.7.7.6

--
Gleb.

2013-03-18 07:37:05

by Xiao Guangrong

Subject: Re: [PATCH 1/5] Revert "KVM: x86: Optimize mmio spte zapping when creating/moving memslot"

On 03/16/2013 10:07 AM, Takuya Yoshikawa wrote:
> On Fri, 15 Mar 2013 23:26:59 +0800
> Xiao Guangrong <[email protected]> wrote:
>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index d3c4787..61a5bb6 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6991,7 +6991,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
>> * mmio sptes.
>> */
>> if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
>> - kvm_mmu_zap_mmio_sptes(kvm);
>>
>
> ???
> + kvm_mmu_zap_all()

Ouch, this is a stupid copy-paste error. It looks like I should
use git send-email.

Thank you for pointing this out, Takuya!

2013-03-18 07:38:17

by Xiao Guangrong

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/16/2013 10:13 AM, Takuya Yoshikawa wrote:
> On Fri, 15 Mar 2013 23:29:53 +0800
> Xiao Guangrong <[email protected]> wrote:
>
>> +/*
>> + * The caller should protect concurrent access on
>> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
>> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
>> + */
>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
>
> kvm_mmu_invalidate_mmio_sptes() may be a better name.

Yes, I am not good at naming; this is a better name, thank you!

2013-03-18 08:09:09

by Xiao Guangrong

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/17/2013 11:02 PM, Gleb Natapov wrote:
> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
>> This patch tries to introduce a very simple and scale way to invalid all
>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>
>> KVM maintains a global mmio invalid generation-number which is stored in
>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>> generation-number into his available bits when it is created
>>
>> When KVM need zap all mmio sptes, it just simply increase the global
>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>> then it walks the shadow page table and get the mmio spte. If the
>> generation-number on the spte does not equal the global generation-number,
>> it will go to the normal #PF handler to update the mmio spte
>>
>> Since 19 bits are used to store generation-number on mmio spte, the
>> generation-number can be round after 33554432 times. It is large enough
>> for nearly all most cases, but making the code be more strong, we zap all
>> shadow pages when the number is round
>>
> Very nice idea, but why drop Takuya patches instead of using
> kvm_mmu_zap_mmio_sptes() when generation number overflows.

I am not sure whether it is still needed. Requesting to zap all mmio sptes
more than 500000 times is really, really rare; it almost never happens.
(By the way, 33554432 is wrong in the changelog; I just copied that from my
original implementation.) And, after my patch optimizing zapping of all
shadow pages, zap-all-sps should not be a problem anymore since it does not
hold the lock for too long.

Your idea?

>
>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 2 +
>> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
>> arch/x86/kvm/mmutrace.h | 17 +++++++++++
>> arch/x86/kvm/paging_tmpl.h | 7 +++-
>> arch/x86/kvm/vmx.c | 4 ++
>> arch/x86/kvm/x86.c | 6 +--
>> 6 files changed, 82 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ef7f4a5..572398e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -529,6 +529,7 @@ struct kvm_arch {
>> unsigned int n_requested_mmu_pages;
>> unsigned int n_max_mmu_pages;
>> unsigned int indirect_shadow_pages;
>> + unsigned int mmio_invalid_gen;
> Why invalid? Should be mmio_valid_gen or mmio_current_get.

mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
so I named it _invalid_. But mmio_valid_gen is good for me.

>
>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>> /*
>> * Hash table of struct kvm_mmu_page.
>> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>> struct kvm_memory_slot *slot,
>> gfn_t gfn_offset, unsigned long mask);
>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
> Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.

Me too.

>
>> void kvm_mmu_zap_all(struct kvm *kvm);
>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 13626f4..7093a92 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
>> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
>> unsigned access)
>> {
>> - u64 mask = generation_mmio_spte_mask(0);
>> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
>> + u64 mask = generation_mmio_spte_mask(gen);
>>
>> access &= ACC_WRITE_MASK | ACC_USER_MASK;
>> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
>>
>> - trace_mark_mmio_spte(sptep, gfn, access, 0);
>> + trace_mark_mmio_spte(sptep, gfn, access, gen);
>> mmu_spte_set(sptep, mask);
>> }
>>
>> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>> return false;
>> }
>>
>> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>> +{
>> + return get_mmio_spte_generation(spte) ==
>> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
>> +}
>> +
>> +/*
>> + * The caller should protect concurrent access on
>> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
>> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
>> + */
>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
>> +{
>> + /* Ensure update memslot has been completed. */
>> + smp_mb();
> What barrier this one is paired with?

It is paired with nothing. :)

I used the mb here just to avoid increasing the generation-number before
updating the memslot. But on the other sides (storing the gen and checking
the gen), we do not need to care about it - the worst case is that we
emulate a memory-access instruction.

>
>> +
>> + trace_kvm_mmu_invalid_mmio_spte(kvm);
> Something wrong with indentation.

My mistake. No idea why checkpatch did not warn me.

>
>> +
>> + /*
>> + * The very rare case: if the generation-number is round,
>> + * zap all shadow pages.
>> + */
>> + if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
>> + kvm->arch.mmio_invalid_gen = 0;
>> + return kvm_mmu_zap_all(kvm);
>> + }
>> +}
>> +
>> static inline u64 rsvd_bits(int s, int e)
>> {
>> return ((1ULL << (e - s + 1)) - 1) << s;
>> @@ -3183,9 +3212,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
>> }
>>
>> /*
>> - * If it is a real mmio page fault, return 1 and emulat the instruction
>> - * directly, return 0 to let CPU fault again on the address, -1 is
>> - * returned if bug is detected.
>> + * Return value:
>> + * 2: invalid spte is detected then let the real page fault path
>> + * update the mmio spte.
>> + * 1: it is a real mmio page fault, emulate the instruction directly.
>> + * 0: let CPU fault again on the address.
>> + * -1: bug is detected.
>> */
> What about dropping spte and let guest re-fault instead of propagating
> new return value up the call chain. If this is slow lets at least define
> a enum with descriptive names.

Checking the mmio spte is done outside the mmu-lock, so we cannot clear it
here. It should be updated in the real page fault path, so it is not worth
atomically clearing it outside the mmu-lock and refaulting again. (As you
said, that is slow.)

I will define the return values as you suggest in the next version.

Thanks for your review and valuable suggestions, Gleb!

2013-03-18 09:13:12

by Gleb Natapov

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
> > On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
> >> This patch tries to introduce a very simple and scale way to invalid all
> >> mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>
> >> KVM maintains a global mmio invalid generation-number which is stored in
> >> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> >> generation-number into his available bits when it is created
> >>
> >> When KVM need zap all mmio sptes, it just simply increase the global
> >> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >> then it walks the shadow page table and get the mmio spte. If the
> >> generation-number on the spte does not equal the global generation-number,
> >> it will go to the normal #PF handler to update the mmio spte
> >>
> >> Since 19 bits are used to store generation-number on mmio spte, the
> >> generation-number can be round after 33554432 times. It is large enough
> >> for nearly all most cases, but making the code be more strong, we zap all
> >> shadow pages when the number is round
> >>
> > Very nice idea, but why drop Takuya patches instead of using
> > kvm_mmu_zap_mmio_sptes() when generation number overflows.
>
> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
> more than 500000 times is really really rare, it nearly does not happen.
> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
> implantation.) And, after my patch optimizing zapping all shadow pages,
> zap-all-sps should not be a problem anymore since it does not take too much lock
> time.
>
> Your idea?
>
I expect 500000 to become smaller since I already have plans to store some
information in the mmio spte. Even if zap-all-sptes becomes faster, we
still needlessly zap all sptes when we could zap only the mmio ones.

> >
> >
> >> Signed-off-by: Xiao Guangrong <[email protected]>
> >> ---
> >> arch/x86/include/asm/kvm_host.h | 2 +
> >> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
> >> arch/x86/kvm/mmutrace.h | 17 +++++++++++
> >> arch/x86/kvm/paging_tmpl.h | 7 +++-
> >> arch/x86/kvm/vmx.c | 4 ++
> >> arch/x86/kvm/x86.c | 6 +--
> >> 6 files changed, 82 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index ef7f4a5..572398e 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -529,6 +529,7 @@ struct kvm_arch {
> >> unsigned int n_requested_mmu_pages;
> >> unsigned int n_max_mmu_pages;
> >> unsigned int indirect_shadow_pages;
> >> + unsigned int mmio_invalid_gen;
> > Why invalid? Should be mmio_valid_gen or mmio_current_get.
>
> mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
> so i named it as _invalid_. But mmio_valid_gen is good for me.
>
It holds the currently valid value though, so calling it "invalid" is
confusing.

> >
> >> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> >> /*
> >> * Hash table of struct kvm_mmu_page.
> >> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> >> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >> struct kvm_memory_slot *slot,
> >> gfn_t gfn_offset, unsigned long mask);
> >> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
> > Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.
>
> Me too.
>
> >
> >> void kvm_mmu_zap_all(struct kvm *kvm);
> >> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> >> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> >> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >> index 13626f4..7093a92 100644
> >> --- a/arch/x86/kvm/mmu.c
> >> +++ b/arch/x86/kvm/mmu.c
> >> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> >> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> >> unsigned access)
> >> {
> >> - u64 mask = generation_mmio_spte_mask(0);
> >> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> >> + u64 mask = generation_mmio_spte_mask(gen);
> >>
> >> access &= ACC_WRITE_MASK | ACC_USER_MASK;
> >> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
> >>
> >> - trace_mark_mmio_spte(sptep, gfn, access, 0);
> >> + trace_mark_mmio_spte(sptep, gfn, access, gen);
> >> mmu_spte_set(sptep, mask);
> >> }
> >>
> >> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
> >> return false;
> >> }
> >>
> >> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
> >> +{
> >> + return get_mmio_spte_generation(spte) ==
> >> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> >> +}
> >> +
> >> +/*
> >> + * The caller should protect concurrent access on
> >> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
> >> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
> >> + */
> >> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
> >> +{
> >> + /* Ensure update memslot has been completed. */
> >> + smp_mb();
> > What barrier this one is paired with?
>
> It is paired with nothing. :)
>
> I used mb here just for avoid increasing the generation-number before updating
> the memslot. But on other sides (storing gen and checking gen), we do not need
> to care it - the worse case is that we emulate a memory-accessed instruction.
>
Are you worrying that the compiler can reorder instructions and put the
instruction that increases the generation number before the memslot update?
If yes, then you need to use barrier() here. Or are you worrying that the
update may be seen in a different order by another cpu? Then you need to
put another barrier in the code that accesses the memslot/generation number
and cares about the order.

> >
> >> +
> >> + trace_kvm_mmu_invalid_mmio_spte(kvm);
> > Something wrong with indentation.
>
> My mistake. No idea why checkpatch did not warn me.
>
> >
> >> +
> >> + /*
> >> + * The very rare case: if the generation-number is round,
> >> + * zap all shadow pages.
> >> + */
> >> + if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
> >> + kvm->arch.mmio_invalid_gen = 0;
> >> + return kvm_mmu_zap_all(kvm);
> >> + }
> >> +}
> >> +
> >> static inline u64 rsvd_bits(int s, int e)
> >> {
> >> return ((1ULL << (e - s + 1)) - 1) << s;
> >> @@ -3183,9 +3212,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
> >> }
> >>
> >> /*
> >> - * If it is a real mmio page fault, return 1 and emulat the instruction
> >> - * directly, return 0 to let CPU fault again on the address, -1 is
> >> - * returned if bug is detected.
> >> + * Return value:
> >> + * 2: invalid spte is detected then let the real page fault path
> >> + * update the mmio spte.
> >> + * 1: it is a real mmio page fault, emulate the instruction directly.
> >> + * 0: let CPU fault again on the address.
> >> + * -1: bug is detected.
> >> */
> > What about dropping spte and let guest re-fault instead of propagating
> > new return value up the call chain. If this is slow lets at least define
> > a enum with descriptive names.
>
> Checking mmio spte is out of mmu-lock, we can not clear it here.
> And should be updated in the real page fault path, so it is no worth to
> atomic-ly clear it out of mmu-lock and refault again. (As you said, it is slow).
>
> I will define the return value as your suggestion in the next version.
>
> Thanks for your review and valuable suggestions, Gleb!

--
Gleb.

2013-03-18 11:19:20

by Paolo Bonzini

Subject: Re: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

Il 15/03/2013 16:29, Xiao Guangrong ha scritto:
> +/*
> + * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
> + * generation, the bits of bits 52 ~ bit 61 are used as
> + * high 12 bits of generation.
> + */

High 10 bits.

How often does the generation number change? Especially with Takuya's
patches, using just 10 bits for the generation might be enough and would
simplify the code.

Paolo

2013-03-18 12:30:03

by Xiao Guangrong

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/18/2013 05:13 PM, Gleb Natapov wrote:
> On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
>> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
>>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
>>>> This patch tries to introduce a very simple and scale way to invalid all
>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>
>>>> KVM maintains a global mmio invalid generation-number which is stored in
>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>>>> generation-number into his available bits when it is created
>>>>
>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>> then it walks the shadow page table and get the mmio spte. If the
>>>> generation-number on the spte does not equal the global generation-number,
>>>> it will go to the normal #PF handler to update the mmio spte
>>>>
>>>> Since 19 bits are used to store generation-number on mmio spte, the
>>>> generation-number can be round after 33554432 times. It is large enough
>>>> for nearly all most cases, but making the code be more strong, we zap all
>>>> shadow pages when the number is round
>>>>
>>> Very nice idea, but why drop Takuya patches instead of using
>>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
>>
>> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
>> more than 500000 times is really really rare, it nearly does not happen.
>> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
>> implantation.) And, after my patch optimizing zapping all shadow pages,
>> zap-all-sps should not be a problem anymore since it does not take too much lock
>> time.
>>
>> Your idea?
>>
> I expect 500000 to become less since I already had plans to store some

Interesting, just curious, what are the plans? ;)

> information in mmio spte. Even if all zap-all-sptes becomes faster we
> still needlessly zap all sptes while we can zap only mmio.

Okay.

>
>>>
>>>
>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>> ---
>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
>>>> arch/x86/kvm/mmutrace.h | 17 +++++++++++
>>>> arch/x86/kvm/paging_tmpl.h | 7 +++-
>>>> arch/x86/kvm/vmx.c | 4 ++
>>>> arch/x86/kvm/x86.c | 6 +--
>>>> 6 files changed, 82 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index ef7f4a5..572398e 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -529,6 +529,7 @@ struct kvm_arch {
>>>> unsigned int n_requested_mmu_pages;
>>>> unsigned int n_max_mmu_pages;
>>>> unsigned int indirect_shadow_pages;
>>>> + unsigned int mmio_invalid_gen;
>>> Why invalid? Should be mmio_valid_gen or mmio_current_get.
>>
>> mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
>> so i named it as _invalid_. But mmio_valid_gen is good for me.
>>
> It holds currently valid value though, so calling it "invalid" is
> confusing.

I agree.

>
>>>
>>>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>>>> /*
>>>> * Hash table of struct kvm_mmu_page.
>>>> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>>>> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>>>> struct kvm_memory_slot *slot,
>>>> gfn_t gfn_offset, unsigned long mask);
>>>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
>>> Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.
>>
>> Me too.
>>
>>>
>>>> void kvm_mmu_zap_all(struct kvm *kvm);
>>>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>>>> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>> index 13626f4..7093a92 100644
>>>> --- a/arch/x86/kvm/mmu.c
>>>> +++ b/arch/x86/kvm/mmu.c
>>>> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
>>>> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
>>>> unsigned access)
>>>> {
>>>> - u64 mask = generation_mmio_spte_mask(0);
>>>> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
>>>> + u64 mask = generation_mmio_spte_mask(gen);
>>>>
>>>> access &= ACC_WRITE_MASK | ACC_USER_MASK;
>>>> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
>>>>
>>>> - trace_mark_mmio_spte(sptep, gfn, access, 0);
>>>> + trace_mark_mmio_spte(sptep, gfn, access, gen);
>>>> mmu_spte_set(sptep, mask);
>>>> }
>>>>
>>>> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>>> return false;
>>>> }
>>>>
>>>> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>> +{
>>>> + return get_mmio_spte_generation(spte) ==
>>>> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
>>>> +}
>>>> +
>>>> +/*
>>>> + * The caller should protect concurrent access on
>>>> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
>>>> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
>>>> + */
>>>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
>>>> +{
>>>> + /* Ensure update memslot has been completed. */
>>>> + smp_mb();
>>> What barrier this one is paired with?
>>
>> It is paired with nothing. :)
>>
>> I used mb here just for avoid increasing the generation-number before updating
>> the memslot. But on other sides (storing gen and checking gen), we do not need
>> to care it - the worse case is that we emulate a memory-accessed instruction.
>>
> Are you warring that compiler can reorder instructions and put
> instruction that increase generation number before updating memslot?
> If yes then you need to use barrier() here. Or are you warring that
> update may be seen in different order by another cpu? Then you need to
> put another barring in the code that access memslot/generation number
> and cares about the order.

After more thinking, maybe I missed something. The correct order should be:

The write side:
update kvm->memslots
smp_wmb()
kvm->mmio_invalid_gen++

The read side:
read kvm->mmio_invalid_gen
smp_rmb()
search gfn in memslots (read all memslots)

Otherwise, an mmio spte could cache the newest generation-number together
with obsolete memslot info.

But we read memslots outside the mmu-lock on the page fault path, so we
would have to pass mmio_invalid_gen to the page fault handler. To simplify
the code, let's save the generation-number in kvm_memslots; then both can
be protected by SRCU. How about this?
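
A userspace model of that pairing, using C11 fences in place of
smp_wmb()/smp_rmb() (only a sketch of the intended ordering; the two
variables merely stand in for kvm->memslots and kvm->arch.mmio_invalid_gen):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned long memslots;	/* stands in for kvm->memslots */
static _Atomic unsigned int mmio_gen;	/* stands in for kvm->arch.mmio_invalid_gen */

static void write_side(void)
{
	atomic_store_explicit(&memslots, 1, memory_order_relaxed);	/* update kvm->memslots */
	atomic_thread_fence(memory_order_release);			/* smp_wmb() */
	atomic_fetch_add_explicit(&mmio_gen, 1, memory_order_relaxed);	/* kvm->mmio_invalid_gen++ */
}

static void read_side(void)
{
	unsigned int gen = atomic_load_explicit(&mmio_gen, memory_order_relaxed);

	atomic_thread_fence(memory_order_acquire);			/* smp_rmb() */
	/* if the new gen was observed, the new memslots must be visible too */
	printf("gen=%u memslots=%lu\n", gen,
	       atomic_load_explicit(&memslots, memory_order_relaxed));
}

int main(void)
{
	write_side();
	read_side();
	return 0;
}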

2013-03-18 12:43:09

by Xiao Guangrong

Subject: Re: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

On 03/18/2013 07:19 PM, Paolo Bonzini wrote:
> Il 15/03/2013 16:29, Xiao Guangrong ha scritto:
>> +/*
>> + * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
>> + * generation, the bits of bits 52 ~ bit 61 are used as
>> + * high 12 bits of generation.
>> + */
>
> High 10 bits.

Yes, I forgot to update the comment! Thank you, Paolo!

>
> How often does the generation number change? Especially with Takuya's
> patches, using just 10 bits for the generation might be enough and
> simplifies the code.

I guess it is only updated when the guest is initializing and doing hotplug.
My original thought was that if the number is big enough that it cannot
wrap around in almost any hotplug case, the things introduced by Takuya are
not necessary. But if you guys want to keep them, 10 bits should be
enough. :)

2013-03-18 12:46:22

by Gleb Natapov

Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
> On 03/18/2013 05:13 PM, Gleb Natapov wrote:
> > On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
> >> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
> >>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
> >>>> This patch tries to introduce a very simple and scale way to invalid all
> >>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>>>
> >>>> KVM maintains a global mmio invalid generation-number which is stored in
> >>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> >>>> generation-number into his available bits when it is created
> >>>>
> >>>> When KVM need zap all mmio sptes, it just simply increase the global
> >>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >>>> then it walks the shadow page table and get the mmio spte. If the
> >>>> generation-number on the spte does not equal the global generation-number,
> >>>> it will go to the normal #PF handler to update the mmio spte
> >>>>
> >>>> Since 19 bits are used to store generation-number on mmio spte, the
> >>>> generation-number can be round after 33554432 times. It is large enough
> >>>> for nearly all most cases, but making the code be more strong, we zap all
> >>>> shadow pages when the number is round
> >>>>
> >>> Very nice idea, but why drop Takuya patches instead of using
> >>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
> >>
> >> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
> >> more than 500000 times is really really rare, it nearly does not happen.
> >> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
> >> implantation.) And, after my patch optimizing zapping all shadow pages,
> >> zap-all-sps should not be a problem anymore since it does not take too much lock
> >> time.
> >>
> >> Your idea?
> >>
> > I expect 500000 to become less since I already had plans to store some
>
> Interesting, just curious, what are the plans? ;)
>
Currently we use PIO to signal to virtio devices that work is pending. The
requirement is that signaling should be fast, and PIO is fast since there
is no need to emulate an instruction. PCIe, though, is not really designed
with PIO in mind, so we will have to use MMIO to do the signaling. To avoid
instruction emulation I thought about making the guest access these devices
with a predefined variety of the MOV instruction so that emulation can be
skipped. The idea is to mark the mmio spte so we know that emulation is not
needed.

> > information in mmio spte. Even if all zap-all-sptes becomes faster we
> > still needlessly zap all sptes while we can zap only mmio.
>
> Okay.
>
> >
> >>>
> >>>
> >>>> Signed-off-by: Xiao Guangrong <[email protected]>
> >>>> ---
> >>>> arch/x86/include/asm/kvm_host.h | 2 +
> >>>> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
> >>>> arch/x86/kvm/mmutrace.h | 17 +++++++++++
> >>>> arch/x86/kvm/paging_tmpl.h | 7 +++-
> >>>> arch/x86/kvm/vmx.c | 4 ++
> >>>> arch/x86/kvm/x86.c | 6 +--
> >>>> 6 files changed, 82 insertions(+), 15 deletions(-)
> >>>>
> >>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >>>> index ef7f4a5..572398e 100644
> >>>> --- a/arch/x86/include/asm/kvm_host.h
> >>>> +++ b/arch/x86/include/asm/kvm_host.h
> >>>> @@ -529,6 +529,7 @@ struct kvm_arch {
> >>>> unsigned int n_requested_mmu_pages;
> >>>> unsigned int n_max_mmu_pages;
> >>>> unsigned int indirect_shadow_pages;
> >>>> + unsigned int mmio_invalid_gen;
> >>> Why invalid? Should be mmio_valid_gen or mmio_current_get.
> >>
> >> mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
> >> so i named it as _invalid_. But mmio_valid_gen is good for me.
> >>
> > It holds currently valid value though, so calling it "invalid" is
> > confusing.
>
> I agree.
>
> >
> >>>
> >>>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> >>>> /*
> >>>> * Hash table of struct kvm_mmu_page.
> >>>> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> >>>> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >>>> struct kvm_memory_slot *slot,
> >>>> gfn_t gfn_offset, unsigned long mask);
> >>>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
> >>> Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.
> >>
> >> Me too.
> >>
> >>>
> >>>> void kvm_mmu_zap_all(struct kvm *kvm);
> >>>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> >>>> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> >>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>>> index 13626f4..7093a92 100644
> >>>> --- a/arch/x86/kvm/mmu.c
> >>>> +++ b/arch/x86/kvm/mmu.c
> >>>> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> >>>> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> >>>> unsigned access)
> >>>> {
> >>>> - u64 mask = generation_mmio_spte_mask(0);
> >>>> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> >>>> + u64 mask = generation_mmio_spte_mask(gen);
> >>>>
> >>>> access &= ACC_WRITE_MASK | ACC_USER_MASK;
> >>>> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
> >>>>
> >>>> - trace_mark_mmio_spte(sptep, gfn, access, 0);
> >>>> + trace_mark_mmio_spte(sptep, gfn, access, gen);
> >>>> mmu_spte_set(sptep, mask);
> >>>> }
> >>>>
> >>>> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
> >>>> return false;
> >>>> }
> >>>>
> >>>> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
> >>>> +{
> >>>> + return get_mmio_spte_generation(spte) ==
> >>>> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> >>>> +}
> >>>> +
> >>>> +/*
> >>>> + * The caller should protect concurrent access on
> >>>> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
> >>>> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
> >>>> + */
> >>>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
> >>>> +{
> >>>> + /* Ensure update memslot has been completed. */
> >>>> + smp_mb();
> >>> What barrier this one is paired with?
> >>
> >> It is paired with nothing. :)
> >>
> >> I used mb here just for avoid increasing the generation-number before updating
> >> the memslot. But on other sides (storing gen and checking gen), we do not need
> >> to care it - the worse case is that we emulate a memory-accessed instruction.
> >>
> > Are you warring that compiler can reorder instructions and put
> > instruction that increase generation number before updating memslot?
> > If yes then you need to use barrier() here. Or are you warring that
> > update may be seen in different order by another cpu? Then you need to
> > put another barring in the code that access memslot/generation number
> > and cares about the order.
>
> After more thinking, maybe i missed something. The correct order should be:
>
> The write side:
> update kvm->memslots
> smp_wmb()
> kvm->mmio_invalid_gen++
>
> The read side:
> read kvm->mmio_invalid_gen++
> smp_rmb();
> search gfn in memslots (read all memslots)
>
> Otherwise, mmio spte would cache a newest generation-number and obsolete
> memslot info.
>
> But we read memslots out of mmu-lock on page fault path, we should pass
> mmio_invalid_gen to the page fault hander. In order to simplify the code,
> let's save the generation-number into kvm_memslots, then they can protected
> by SRCU. How about this?
>
Makes sense, and in fact we already have a generation number there, which is
used for gfn_to_hva_cache. The problem is that the gfn_to_hva cache does not
expect the generation number to wrap, but with modulo arithmetic we can make
it wrap only for mmio sptes.

--
Gleb.
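
A minimal sketch of the reuse described above, assuming the mmio code simply
truncates the kvm_memslots generation (the one gfn_to_hva_cache already uses)
to the bits that fit in an spte; kvm_current_mmio_generation() is an invented
name, and MMIO_MAX_GEN and get_mmio_spte_generation() are taken from the
earlier sketch:

/*
 * Only the mmio view of the slots generation wraps; the full value used
 * by gfn_to_hva_cache keeps counting up.  Callers must hold the SRCU
 * read lock that protects kvm->memslots.
 */
static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
{
        return kvm_memslots(kvm)->generation & MMIO_MAX_GEN;
}

static bool check_mmio_spte(struct kvm *kvm, u64 spte)
{
        return get_mmio_spte_generation(spte) ==
               kvm_current_mmio_generation(kvm);
}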

2013-03-18 12:49:04

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

On Mon, Mar 18, 2013 at 08:42:09PM +0800, Xiao Guangrong wrote:
> On 03/18/2013 07:19 PM, Paolo Bonzini wrote:
> > Il 15/03/2013 16:29, Xiao Guangrong ha scritto:
> >> +/*
> >> + * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
> >> + * generation, the bits of bits 52 ~ bit 61 are used as
> >> + * high 12 bits of generation.
> >> + */
> >
> > High 10 bits.
>
> Yes, i forgot to update the comments! Thank you, Paolo!
>
> >
> > How often does the generation number change? Especially with Takuya's
> > patches, using just 10 bits for the generation might be enough and
> > simplifies the code.
>
> I guess it's only updated when guest is initializing and doing hotplug.
> In my origin thought, if the number is big enough that it could not
> round on all most case of hotplug, the things introduced by Takuya is
> not necessary. But, if you guys want to keep the things, 10 bits should
> be enough. :)
I think you encapsulated the 19-bit generation number handling well enough, so
I personally do not mind making it as big as we can.

--
Gleb.

2013-03-18 13:09:52

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/18/2013 08:46 PM, Gleb Natapov wrote:
> On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
>> On 03/18/2013 05:13 PM, Gleb Natapov wrote:
>>> On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
>>>> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
>>>>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
>>>>>> This patch tries to introduce a very simple and scale way to invalid all
>>>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>>>
>>>>>> KVM maintains a global mmio invalid generation-number which is stored in
>>>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>>>>>> generation-number into his available bits when it is created
>>>>>>
>>>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>>>> then it walks the shadow page table and get the mmio spte. If the
>>>>>> generation-number on the spte does not equal the global generation-number,
>>>>>> it will go to the normal #PF handler to update the mmio spte
>>>>>>
>>>>>> Since 19 bits are used to store generation-number on mmio spte, the
>>>>>> generation-number can be round after 33554432 times. It is large enough
>>>>>> for nearly all most cases, but making the code be more strong, we zap all
>>>>>> shadow pages when the number is round
>>>>>>
>>>>> Very nice idea, but why drop Takuya patches instead of using
>>>>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
>>>>
>>>> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
>>>> more than 500000 times is really really rare, it nearly does not happen.
>>>> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
>>>> implantation.) And, after my patch optimizing zapping all shadow pages,
>>>> zap-all-sps should not be a problem anymore since it does not take too much lock
>>>> time.
>>>>
>>>> Your idea?
>>>>
>>> I expect 500000 to become less since I already had plans to store some
>>
>> Interesting, just curious, what are the plans? ;)
>>
> Currently we uses pio to signal that work is pending to virtio devices. The
> requirement is that signaling should be fast and PIO is fast since there
> is not need to emulate instruction. PCIE though is not really designed
> with PIO in mind, so we will have to use MMIO to do signaling. To avoid
> instruction emulation I thought about making guest access these devices using
> predefined variety of MOV instruction so that emulation can be skipped.
> The idea is to mark mmio spte to know that emulation is not needed.

How do we know that a page fault was caused by the predefined instruction?

>
>>> information in mmio spte. Even if all zap-all-sptes becomes faster we
>>> still needlessly zap all sptes while we can zap only mmio.
>>
>> Okay.
>>
>>>
>>>>>
>>>>>
>>>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>>>> ---
>>>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>>>> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
>>>>>> arch/x86/kvm/mmutrace.h | 17 +++++++++++
>>>>>> arch/x86/kvm/paging_tmpl.h | 7 +++-
>>>>>> arch/x86/kvm/vmx.c | 4 ++
>>>>>> arch/x86/kvm/x86.c | 6 +--
>>>>>> 6 files changed, 82 insertions(+), 15 deletions(-)
>>>>>>
>>>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>>>> index ef7f4a5..572398e 100644
>>>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>>>> @@ -529,6 +529,7 @@ struct kvm_arch {
>>>>>> unsigned int n_requested_mmu_pages;
>>>>>> unsigned int n_max_mmu_pages;
>>>>>> unsigned int indirect_shadow_pages;
>>>>>> + unsigned int mmio_invalid_gen;
>>>>> Why invalid? Should be mmio_valid_gen or mmio_current_get.
>>>>
>>>> mmio_invalid_gen is only updated in kvm_mmu_invalidate_mmio_sptes,
>>>> so i named it as _invalid_. But mmio_valid_gen is good for me.
>>>>
>>> It holds currently valid value though, so calling it "invalid" is
>>> confusing.
>>
>> I agree.
>>
>>>
>>>>>
>>>>>> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
>>>>>> /*
>>>>>> * Hash table of struct kvm_mmu_page.
>>>>>> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
>>>>>> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
>>>>>> struct kvm_memory_slot *slot,
>>>>>> gfn_t gfn_offset, unsigned long mask);
>>>>>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
>>>>> Agree with Takuya that kvm_mmu_invalidate_mmio_sptes() is a better name.
>>>>
>>>> Me too.
>>>>
>>>>>
>>>>>> void kvm_mmu_zap_all(struct kvm *kvm);
>>>>>> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
>>>>>> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
>>>>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>>>>> index 13626f4..7093a92 100644
>>>>>> --- a/arch/x86/kvm/mmu.c
>>>>>> +++ b/arch/x86/kvm/mmu.c
>>>>>> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
>>>>>> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
>>>>>> unsigned access)
>>>>>> {
>>>>>> - u64 mask = generation_mmio_spte_mask(0);
>>>>>> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
>>>>>> + u64 mask = generation_mmio_spte_mask(gen);
>>>>>>
>>>>>> access &= ACC_WRITE_MASK | ACC_USER_MASK;
>>>>>> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
>>>>>>
>>>>>> - trace_mark_mmio_spte(sptep, gfn, access, 0);
>>>>>> + trace_mark_mmio_spte(sptep, gfn, access, gen);
>>>>>> mmu_spte_set(sptep, mask);
>>>>>> }
>>>>>>
>>>>>> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
>>>>>> return false;
>>>>>> }
>>>>>>
>>>>>> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>>>> +{
>>>>>> + return get_mmio_spte_generation(spte) ==
>>>>>> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
>>>>>> +}
>>>>>> +
>>>>>> +/*
>>>>>> + * The caller should protect concurrent access on
>>>>>> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
>>>>>> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
>>>>>> + */
>>>>>> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
>>>>>> +{
>>>>>> + /* Ensure update memslot has been completed. */
>>>>>> + smp_mb();
>>>>> What barrier this one is paired with?
>>>>
>>>> It is paired with nothing. :)
>>>>
>>>> I used mb here just for avoid increasing the generation-number before updating
>>>> the memslot. But on other sides (storing gen and checking gen), we do not need
>>>> to care it - the worse case is that we emulate a memory-accessed instruction.
>>>>
>>> Are you warring that compiler can reorder instructions and put
>>> instruction that increase generation number before updating memslot?
>>> If yes then you need to use barrier() here. Or are you warring that
>>> update may be seen in different order by another cpu? Then you need to
>>> put another barring in the code that access memslot/generation number
>>> and cares about the order.
>>
>> After more thinking, maybe i missed something. The correct order should be:
>>
>> The write side:
>> update kvm->memslots
>> smp_wmb()
>> kvm->mmio_invalid_gen++
>>
>> The read side:
>> read kvm->mmio_invalid_gen++
>> smp_rmb();
>> search gfn in memslots (read all memslots)
>>
>> Otherwise, mmio spte would cache a newest generation-number and obsolete
>> memslot info.
>>
>> But we read memslots out of mmu-lock on page fault path, we should pass
>> mmio_invalid_gen to the page fault hander. In order to simplify the code,
>> let's save the generation-number into kvm_memslots, then they can protected
>> by SRCU. How about this?
>>
> Make sense and in fact we already have generation number there which is
> used for gfn_to_hva_cache. The problem is that gfn_to_hva cache does not
> expect generation number to wrap, but with modulo arithmetic we can make
> it wrap only for mmio sptes.

Reusing the existing generation number means mmio sptes get invalidated even
when a memslot is only deleted, but I guess that is not too bad.

2013-03-18 13:19:13

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
> On 03/18/2013 08:46 PM, Gleb Natapov wrote:
> > On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
> >> On 03/18/2013 05:13 PM, Gleb Natapov wrote:
> >>> On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
> >>>> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
> >>>>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
> >>>>>> This patch tries to introduce a very simple and scale way to invalid all
> >>>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>>>>>
> >>>>>> KVM maintains a global mmio invalid generation-number which is stored in
> >>>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> >>>>>> generation-number into his available bits when it is created
> >>>>>>
> >>>>>> When KVM need zap all mmio sptes, it just simply increase the global
> >>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >>>>>> then it walks the shadow page table and get the mmio spte. If the
> >>>>>> generation-number on the spte does not equal the global generation-number,
> >>>>>> it will go to the normal #PF handler to update the mmio spte
> >>>>>>
> >>>>>> Since 19 bits are used to store generation-number on mmio spte, the
> >>>>>> generation-number can be round after 33554432 times. It is large enough
> >>>>>> for nearly all most cases, but making the code be more strong, we zap all
> >>>>>> shadow pages when the number is round
> >>>>>>
> >>>>> Very nice idea, but why drop Takuya patches instead of using
> >>>>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
> >>>>
> >>>> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
> >>>> more than 500000 times is really really rare, it nearly does not happen.
> >>>> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
> >>>> implantation.) And, after my patch optimizing zapping all shadow pages,
> >>>> zap-all-sps should not be a problem anymore since it does not take too much lock
> >>>> time.
> >>>>
> >>>> Your idea?
> >>>>
> >>> I expect 500000 to become less since I already had plans to store some
> >>
> >> Interesting, just curious, what are the plans? ;)
> >>
> > Currently we uses pio to signal that work is pending to virtio devices. The
> > requirement is that signaling should be fast and PIO is fast since there
> > is not need to emulate instruction. PCIE though is not really designed
> > with PIO in mind, so we will have to use MMIO to do signaling. To avoid
> > instruction emulation I thought about making guest access these devices using
> > predefined variety of MOV instruction so that emulation can be skipped.
> > The idea is to mark mmio spte to know that emulation is not needed.
>
> How to know page-fault is caused by the predefined instruction?
>
Only predefined phys address ranges will be accessed that way. If the page
fault is in such a range, we assume the known instruction is used.

--
Gleb.

2013-03-18 13:25:22

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/18/2013 09:19 PM, Gleb Natapov wrote:
> On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
>> On 03/18/2013 08:46 PM, Gleb Natapov wrote:
>>> On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
>>>> On 03/18/2013 05:13 PM, Gleb Natapov wrote:
>>>>> On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
>>>>>> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
>>>>>>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
>>>>>>>> This patch tries to introduce a very simple and scale way to invalid all
>>>>>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>>>>>
>>>>>>>> KVM maintains a global mmio invalid generation-number which is stored in
>>>>>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>>>>>>>> generation-number into his available bits when it is created
>>>>>>>>
>>>>>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>>>>>> then it walks the shadow page table and get the mmio spte. If the
>>>>>>>> generation-number on the spte does not equal the global generation-number,
>>>>>>>> it will go to the normal #PF handler to update the mmio spte
>>>>>>>>
>>>>>>>> Since 19 bits are used to store generation-number on mmio spte, the
>>>>>>>> generation-number can be round after 33554432 times. It is large enough
>>>>>>>> for nearly all most cases, but making the code be more strong, we zap all
>>>>>>>> shadow pages when the number is round
>>>>>>>>
>>>>>>> Very nice idea, but why drop Takuya patches instead of using
>>>>>>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
>>>>>>
>>>>>> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
>>>>>> more than 500000 times is really really rare, it nearly does not happen.
>>>>>> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
>>>>>> implantation.) And, after my patch optimizing zapping all shadow pages,
>>>>>> zap-all-sps should not be a problem anymore since it does not take too much lock
>>>>>> time.
>>>>>>
>>>>>> Your idea?
>>>>>>
>>>>> I expect 500000 to become less since I already had plans to store some
>>>>
>>>> Interesting, just curious, what are the plans? ;)
>>>>
>>> Currently we uses pio to signal that work is pending to virtio devices. The
>>> requirement is that signaling should be fast and PIO is fast since there
>>> is not need to emulate instruction. PCIE though is not really designed
>>> with PIO in mind, so we will have to use MMIO to do signaling. To avoid
>>> instruction emulation I thought about making guest access these devices using
>>> predefined variety of MOV instruction so that emulation can be skipped.
>>> The idea is to mark mmio spte to know that emulation is not needed.
>>
>> How to know page-fault is caused by the predefined instruction?
>>
> Only predefined phys address rages will be accessed that way. If page
> fault is in a range we assume the knows instruction is used.

That means the access can be identified by the gfn, so why do we need to
cache other things in the mmio spte?

2013-03-18 13:27:57

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Mon, Mar 18, 2013 at 09:25:10PM +0800, Xiao Guangrong wrote:
> On 03/18/2013 09:19 PM, Gleb Natapov wrote:
> > On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
> >> On 03/18/2013 08:46 PM, Gleb Natapov wrote:
> >>> On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
> >>>> On 03/18/2013 05:13 PM, Gleb Natapov wrote:
> >>>>> On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
> >>>>>> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
> >>>>>>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
> >>>>>>>> This patch tries to introduce a very simple and scale way to invalid all
> >>>>>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>>>>>>>
> >>>>>>>> KVM maintains a global mmio invalid generation-number which is stored in
> >>>>>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> >>>>>>>> generation-number into his available bits when it is created
> >>>>>>>>
> >>>>>>>> When KVM need zap all mmio sptes, it just simply increase the global
> >>>>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >>>>>>>> then it walks the shadow page table and get the mmio spte. If the
> >>>>>>>> generation-number on the spte does not equal the global generation-number,
> >>>>>>>> it will go to the normal #PF handler to update the mmio spte
> >>>>>>>>
> >>>>>>>> Since 19 bits are used to store generation-number on mmio spte, the
> >>>>>>>> generation-number can be round after 33554432 times. It is large enough
> >>>>>>>> for nearly all most cases, but making the code be more strong, we zap all
> >>>>>>>> shadow pages when the number is round
> >>>>>>>>
> >>>>>>> Very nice idea, but why drop Takuya patches instead of using
> >>>>>>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
> >>>>>>
> >>>>>> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
> >>>>>> more than 500000 times is really really rare, it nearly does not happen.
> >>>>>> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
> >>>>>> implantation.) And, after my patch optimizing zapping all shadow pages,
> >>>>>> zap-all-sps should not be a problem anymore since it does not take too much lock
> >>>>>> time.
> >>>>>>
> >>>>>> Your idea?
> >>>>>>
> >>>>> I expect 500000 to become less since I already had plans to store some
> >>>>
> >>>> Interesting, just curious, what are the plans? ;)
> >>>>
> >>> Currently we uses pio to signal that work is pending to virtio devices. The
> >>> requirement is that signaling should be fast and PIO is fast since there
> >>> is not need to emulate instruction. PCIE though is not really designed
> >>> with PIO in mind, so we will have to use MMIO to do signaling. To avoid
> >>> instruction emulation I thought about making guest access these devices using
> >>> predefined variety of MOV instruction so that emulation can be skipped.
> >>> The idea is to mark mmio spte to know that emulation is not needed.
> >>
> >> How to know page-fault is caused by the predefined instruction?
> >>
> > Only predefined phys address rages will be accessed that way. If page
> > fault is in a range we assume the knows instruction is used.
>
> That means the access can be identified by the gfn, why need cache
> other things into mmio spte?
To avoid searching through the memory ranges on each access.

--
Gleb.

2013-03-18 13:32:56

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/18/2013 09:27 PM, Gleb Natapov wrote:
> On Mon, Mar 18, 2013 at 09:25:10PM +0800, Xiao Guangrong wrote:
>> On 03/18/2013 09:19 PM, Gleb Natapov wrote:
>>> On Mon, Mar 18, 2013 at 09:09:43PM +0800, Xiao Guangrong wrote:
>>>> On 03/18/2013 08:46 PM, Gleb Natapov wrote:
>>>>> On Mon, Mar 18, 2013 at 08:29:29PM +0800, Xiao Guangrong wrote:
>>>>>> On 03/18/2013 05:13 PM, Gleb Natapov wrote:
>>>>>>> On Mon, Mar 18, 2013 at 04:08:50PM +0800, Xiao Guangrong wrote:
>>>>>>>> On 03/17/2013 11:02 PM, Gleb Natapov wrote:
>>>>>>>>> On Fri, Mar 15, 2013 at 11:29:53PM +0800, Xiao Guangrong wrote:
>>>>>>>>>> This patch tries to introduce a very simple and scale way to invalid all
>>>>>>>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>>>>>>>
>>>>>>>>>> KVM maintains a global mmio invalid generation-number which is stored in
>>>>>>>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>>>>>>>>>> generation-number into his available bits when it is created
>>>>>>>>>>
>>>>>>>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>>>>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>>>>>>>> then it walks the shadow page table and get the mmio spte. If the
>>>>>>>>>> generation-number on the spte does not equal the global generation-number,
>>>>>>>>>> it will go to the normal #PF handler to update the mmio spte
>>>>>>>>>>
>>>>>>>>>> Since 19 bits are used to store generation-number on mmio spte, the
>>>>>>>>>> generation-number can be round after 33554432 times. It is large enough
>>>>>>>>>> for nearly all most cases, but making the code be more strong, we zap all
>>>>>>>>>> shadow pages when the number is round
>>>>>>>>>>
>>>>>>>>> Very nice idea, but why drop Takuya patches instead of using
>>>>>>>>> kvm_mmu_zap_mmio_sptes() when generation number overflows.
>>>>>>>>
>>>>>>>> I am not sure whether it is still needed. Requesting to zap all mmio sptes for
>>>>>>>> more than 500000 times is really really rare, it nearly does not happen.
>>>>>>>> (By the way, 33554432 is wrong in the changelog, i just copy that for my origin
>>>>>>>> implantation.) And, after my patch optimizing zapping all shadow pages,
>>>>>>>> zap-all-sps should not be a problem anymore since it does not take too much lock
>>>>>>>> time.
>>>>>>>>
>>>>>>>> Your idea?
>>>>>>>>
>>>>>>> I expect 500000 to become less since I already had plans to store some
>>>>>>
>>>>>> Interesting, just curious, what are the plans? ;)
>>>>>>
>>>>> Currently we uses pio to signal that work is pending to virtio devices. The
>>>>> requirement is that signaling should be fast and PIO is fast since there
>>>>> is not need to emulate instruction. PCIE though is not really designed
>>>>> with PIO in mind, so we will have to use MMIO to do signaling. To avoid
>>>>> instruction emulation I thought about making guest access these devices using
>>>>> predefined variety of MOV instruction so that emulation can be skipped.
>>>>> The idea is to mark mmio spte to know that emulation is not needed.
>>>>
>>>> How to know page-fault is caused by the predefined instruction?
>>>>
>>> Only predefined phys address rages will be accessed that way. If page
>>> fault is in a range we assume the knows instruction is used.
>>
>> That means the access can be identified by the gfn, why need cache
>> other things into mmio spte?
> Two not search through memory ranges on each access.

Aha... got it. Thank you! ;)

2013-03-18 22:16:07

by Eric Northup

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Fri, Mar 15, 2013 at 8:29 AM, Xiao Guangrong
<[email protected]> wrote:
> This patch tries to introduce a very simple and scale way to invalid all
> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>
> KVM maintains a global mmio invalid generation-number which is stored in
> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> generation-number into his available bits when it is created
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte
>
> Since 19 bits are used to store generation-number on mmio spte, the
> generation-number can be round after 33554432 times. It is large enough
> for nearly all most cases, but making the code be more strong, we zap all
> shadow pages when the number is round
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +
> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
> arch/x86/kvm/mmutrace.h | 17 +++++++++++
> arch/x86/kvm/paging_tmpl.h | 7 +++-
> arch/x86/kvm/vmx.c | 4 ++
> arch/x86/kvm/x86.c | 6 +--
> 6 files changed, 82 insertions(+), 15 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ef7f4a5..572398e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -529,6 +529,7 @@ struct kvm_arch {
> unsigned int n_requested_mmu_pages;
> unsigned int n_max_mmu_pages;
> unsigned int indirect_shadow_pages;
> + unsigned int mmio_invalid_gen;

Could this get initialized to something close to the wrap-around
value, so that the wrap-around case gets more real-world coverage?

> struct hlist_head mmu_page_hash[KVM_NUM_MMU_PAGES];
> /*
> * Hash table of struct kvm_mmu_page.
> @@ -765,6 +766,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm, int slot);
> void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> gfn_t gfn_offset, unsigned long mask);
> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm);
> void kvm_mmu_zap_all(struct kvm *kvm);
> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 13626f4..7093a92 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -234,12 +234,13 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> unsigned access)
> {
> - u64 mask = generation_mmio_spte_mask(0);
> + unsigned int gen = ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> + u64 mask = generation_mmio_spte_mask(gen);
>
> access &= ACC_WRITE_MASK | ACC_USER_MASK;
> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
>
> - trace_mark_mmio_spte(sptep, gfn, access, 0);
> + trace_mark_mmio_spte(sptep, gfn, access, gen);
> mmu_spte_set(sptep, mask);
> }
>
> @@ -269,6 +270,34 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
> return false;
> }
>
> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
> +{
> + return get_mmio_spte_generation(spte) ==
> + ACCESS_ONCE(kvm->arch.mmio_invalid_gen);
> +}
> +
> +/*
> + * The caller should protect concurrent access on
> + * kvm->arch.mmio_invalid_gen. Currently, it is used by
> + * kvm_arch_commit_memory_region and protected by kvm->slots_lock.
> + */
> +void kvm_mmu_invalid_mmio_spte(struct kvm *kvm)
> +{
> + /* Ensure update memslot has been completed. */
> + smp_mb();
> +
> + trace_kvm_mmu_invalid_mmio_spte(kvm);
> +
> + /*
> + * The very rare case: if the generation-number is round,
> + * zap all shadow pages.
> + */
> + if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
> + kvm->arch.mmio_invalid_gen = 0;
> + return kvm_mmu_zap_all(kvm);
> + }
> +}
> +
> static inline u64 rsvd_bits(int s, int e)
> {
> return ((1ULL << (e - s + 1)) - 1) << s;
> @@ -3183,9 +3212,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
> }
>
> /*
> - * If it is a real mmio page fault, return 1 and emulat the instruction
> - * directly, return 0 to let CPU fault again on the address, -1 is
> - * returned if bug is detected.
> + * Return value:
> + * 2: invalid spte is detected then let the real page fault path
> + * update the mmio spte.
> + * 1: it is a real mmio page fault, emulate the instruction directly.
> + * 0: let CPU fault again on the address.
> + * -1: bug is detected.
> */
> int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> {
> @@ -3200,6 +3232,9 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> gfn_t gfn = get_mmio_spte_gfn(spte);
> unsigned access = get_mmio_spte_access(spte);
>
> + if (unlikely(!check_mmio_spte(vcpu->kvm, spte)))
> + return 2;
> +
> if (direct)
> addr = 0;
>
> @@ -3241,8 +3276,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
>
> pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, gva, error_code, true);
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, gva, error_code, true);
> +
> + if (likely(r != 2))
> + return r;
> + }
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> @@ -3318,8 +3357,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> ASSERT(vcpu);
> ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, gpa, error_code, true);
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
> +
> + if (likely(r != 2))
> + return r;
> + }
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
> index f5b62a7..811254c 100644
> --- a/arch/x86/kvm/mmutrace.h
> +++ b/arch/x86/kvm/mmutrace.h
> @@ -276,6 +276,23 @@ TRACE_EVENT(
> __spte_satisfied(old_spte), __spte_satisfied(new_spte)
> )
> );
> +
> +TRACE_EVENT(
> + kvm_mmu_invalid_mmio_spte,
> + TP_PROTO(struct kvm *kvm),
> + TP_ARGS(kvm),
> +
> + TP_STRUCT__entry(
> + __field(unsigned int, mmio_invalid_gen)
> + ),
> +
> + TP_fast_assign(
> + __entry->mmio_invalid_gen = kvm->arch.mmio_invalid_gen;
> + ),
> +
> + TP_printk("mmio_invalid_gen %x", __entry->mmio_invalid_gen
> + )
> +);
> #endif /* _TRACE_KVMMMU_H */
>
> #undef TRACE_INCLUDE_PATH
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 2c48e5f..c81f949 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -552,9 +552,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>
> pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, addr, error_code,
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, addr, error_code,
> mmu_is_nested(vcpu));
> + if (likely(r != 2))
> + return r;
> + };
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 54fdb76..ca8e9a3 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5159,6 +5159,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> if (likely(ret == 1))
> return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
> EMULATE_DONE;
> +
> + if (unlikely(ret == 2))
> + return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
> +
> if (unlikely(!ret))
> return 1;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 81fb3c3..5b6d2a0 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -6996,10 +6996,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> * If memory slot is created, or moved, we need to clear all
> * mmio sptes.
> */
> - if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> - kvm_mmu_zap_all(kvm);
> - kvm_reload_remote_mmus(kvm);
> - }
> + if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
> + kvm_mmu_invalid_mmio_spte(kvm);
> }
>
> void kvm_arch_flush_shadow_all(struct kvm *kvm)
> --
> 1.7.7.6
>

2013-03-19 03:15:53

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/19/2013 06:16 AM, Eric Northup wrote:
> On Fri, Mar 15, 2013 at 8:29 AM, Xiao Guangrong
> <[email protected]> wrote:
>> This patch tries to introduce a very simple and scale way to invalid all
>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>
>> KVM maintains a global mmio invalid generation-number which is stored in
>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>> generation-number into his available bits when it is created
>>
>> When KVM need zap all mmio sptes, it just simply increase the global
>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>> then it walks the shadow page table and get the mmio spte. If the
>> generation-number on the spte does not equal the global generation-number,
>> it will go to the normal #PF handler to update the mmio spte
>>
>> Since 19 bits are used to store generation-number on mmio spte, the
>> generation-number can be round after 33554432 times. It is large enough
>> for nearly all most cases, but making the code be more strong, we zap all
>> shadow pages when the number is round
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 2 +
>> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
>> arch/x86/kvm/mmutrace.h | 17 +++++++++++
>> arch/x86/kvm/paging_tmpl.h | 7 +++-
>> arch/x86/kvm/vmx.c | 4 ++
>> arch/x86/kvm/x86.c | 6 +--
>> 6 files changed, 82 insertions(+), 15 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ef7f4a5..572398e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -529,6 +529,7 @@ struct kvm_arch {
>> unsigned int n_requested_mmu_pages;
>> unsigned int n_max_mmu_pages;
>> unsigned int indirect_shadow_pages;
>> + unsigned int mmio_invalid_gen;
>
> Could this get initialized to something close to the wrap-around
> value, so that the wrap-around case gets more real-world coverage?

I am afraid we cannot. We cache the current mmio_invalid_gen into the mmio spte
when it is created, no matter what the initial value is.

If you have a better way, please show me. ;)


2013-03-19 07:36:54

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On Tue, Mar 19, 2013 at 11:15:36AM +0800, Xiao Guangrong wrote:
> On 03/19/2013 06:16 AM, Eric Northup wrote:
> > On Fri, Mar 15, 2013 at 8:29 AM, Xiao Guangrong
> > <[email protected]> wrote:
> >> This patch tries to introduce a very simple and scale way to invalid all
> >> mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>
> >> KVM maintains a global mmio invalid generation-number which is stored in
> >> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
> >> generation-number into his available bits when it is created
> >>
> >> When KVM need zap all mmio sptes, it just simply increase the global
> >> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >> then it walks the shadow page table and get the mmio spte. If the
> >> generation-number on the spte does not equal the global generation-number,
> >> it will go to the normal #PF handler to update the mmio spte
> >>
> >> Since 19 bits are used to store generation-number on mmio spte, the
> >> generation-number can be round after 33554432 times. It is large enough
> >> for nearly all most cases, but making the code be more strong, we zap all
> >> shadow pages when the number is round
> >>
> >> Signed-off-by: Xiao Guangrong <[email protected]>
> >> ---
> >> arch/x86/include/asm/kvm_host.h | 2 +
> >> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
> >> arch/x86/kvm/mmutrace.h | 17 +++++++++++
> >> arch/x86/kvm/paging_tmpl.h | 7 +++-
> >> arch/x86/kvm/vmx.c | 4 ++
> >> arch/x86/kvm/x86.c | 6 +--
> >> 6 files changed, 82 insertions(+), 15 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> >> index ef7f4a5..572398e 100644
> >> --- a/arch/x86/include/asm/kvm_host.h
> >> +++ b/arch/x86/include/asm/kvm_host.h
> >> @@ -529,6 +529,7 @@ struct kvm_arch {
> >> unsigned int n_requested_mmu_pages;
> >> unsigned int n_max_mmu_pages;
> >> unsigned int indirect_shadow_pages;
> >> + unsigned int mmio_invalid_gen;
> >
> > Could this get initialized to something close to the wrap-around
> > value, so that the wrap-around case gets more real-world coverage?
>
> I am afraid we can not. We cache the current mmio_invalid_gen into mmio spte when
> it is created no matter what the initiation value is.
>
> If you have a better way, please show me. ;)
>
The idea is to initialize mmio_invalid_gen to a value close to MAX_GEN in
order to exercise

+ if (unlikely(kvm->arch.mmio_invalid_gen++ == MAX_GEN)) {
+ kvm->arch.mmio_invalid_gen = 0;
+ return kvm_mmu_zap_all(kvm);
+ }

path more often.

--
Gleb.

2013-03-19 07:52:32

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH 5/5] KVM: MMU: fast invalid all mmio sptes

On 03/19/2013 03:36 PM, Gleb Natapov wrote:
> On Tue, Mar 19, 2013 at 11:15:36AM +0800, Xiao Guangrong wrote:
>> On 03/19/2013 06:16 AM, Eric Northup wrote:
>>> On Fri, Mar 15, 2013 at 8:29 AM, Xiao Guangrong
>>> <[email protected]> wrote:
>>>> This patch tries to introduce a very simple and scale way to invalid all
>>>> mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>
>>>> KVM maintains a global mmio invalid generation-number which is stored in
>>>> kvm->arch.mmio_invalid_gen and every mmio spte stores the current global
>>>> generation-number into his available bits when it is created
>>>>
>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>> then it walks the shadow page table and get the mmio spte. If the
>>>> generation-number on the spte does not equal the global generation-number,
>>>> it will go to the normal #PF handler to update the mmio spte
>>>>
>>>> Since 19 bits are used to store generation-number on mmio spte, the
>>>> generation-number can be round after 33554432 times. It is large enough
>>>> for nearly all most cases, but making the code be more strong, we zap all
>>>> shadow pages when the number is round
>>>>
>>>> Signed-off-by: Xiao Guangrong <[email protected]>
>>>> ---
>>>> arch/x86/include/asm/kvm_host.h | 2 +
>>>> arch/x86/kvm/mmu.c | 61 +++++++++++++++++++++++++++++++++------
>>>> arch/x86/kvm/mmutrace.h | 17 +++++++++++
>>>> arch/x86/kvm/paging_tmpl.h | 7 +++-
>>>> arch/x86/kvm/vmx.c | 4 ++
>>>> arch/x86/kvm/x86.c | 6 +--
>>>> 6 files changed, 82 insertions(+), 15 deletions(-)
>>>>
>>>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>>>> index ef7f4a5..572398e 100644
>>>> --- a/arch/x86/include/asm/kvm_host.h
>>>> +++ b/arch/x86/include/asm/kvm_host.h
>>>> @@ -529,6 +529,7 @@ struct kvm_arch {
>>>> unsigned int n_requested_mmu_pages;
>>>> unsigned int n_max_mmu_pages;
>>>> unsigned int indirect_shadow_pages;
>>>> + unsigned int mmio_invalid_gen;
>>>
>>> Could this get initialized to something close to the wrap-around
>>> value, so that the wrap-around case gets more real-world coverage?
>>
>> I am afraid we can not. We cache the current mmio_invalid_gen into mmio spte when
>> it is created no matter what the initiation value is.
>>
>> If you have a better way, please show me. ;)
>>
> The idea is to initialize mmio_invalid_gen to value close to MAX_GEN in
> order to exercise

Oh, got it. I understood it as "initialize mmio_invalid_gen properly to _avoid_
more wrap-arounds"... Sorry for my carelessness! :(

The idea lets us exercise the rarely-run code at runtime; I am fine with it.

Thank you, Eric and Gleb!
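
A sketch of the initialization being suggested; the function name and the
offset are arbitrary, and MAX_GEN is the wrap limit from the patch:

/*
 * Start a little short of the wrap limit so the rarely-run
 * kvm_mmu_zap_all() fallback is exercised during ordinary testing
 * instead of only after hundreds of thousands of memslot updates.
 */
static void kvm_mmu_init_mmio_gen(struct kvm *kvm)
{
        kvm->arch.mmio_invalid_gen = MAX_GEN - 16;
}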

2013-03-20 18:44:00

by Marcelo Tosatti

[permalink] [raw]
Subject: Re: [PATCH 4/5] KVM: MMU: store generation-number into mmio spte

On Mon, Mar 18, 2013 at 08:42:09PM +0800, Xiao Guangrong wrote:
> On 03/18/2013 07:19 PM, Paolo Bonzini wrote:
> > Il 15/03/2013 16:29, Xiao Guangrong ha scritto:
> >> +/*
> >> + * spte bits of bit 3 ~ bit 11 are used as low 9 bits of
> >> + * generation, the bits of bits 52 ~ bit 61 are used as
> >> + * high 12 bits of generation.
> >> + */
> >
> > High 10 bits.
>
> Yes, i forgot to update the comments! Thank you, Paolo!
>
> >
> > How often does the generation number change? Especially with Takuya's
> > patches, using just 10 bits for the generation might be enough and
> > simplifies the code.
>
> I guess it's only updated when guest is initializing and doing hotplug.
> In my origin thought, if the number is big enough that it could not
> round on all most case of hotplug, the things introduced by Takuya is
> not necessary. But, if you guys want to keep the things, 10 bits should
> be enough. :)

Since an overflow can happen in normal scenarios (Cirrus emulation can
delete/add a memory slot in legacy mode, with syslinux, for example), it
would be good to handle the overflow scenario. It also means the number of
bits used for the generation number can be reduced.

You can walk the GPA range being added/moved instead of walking
active_mmu_pages (that should be much faster, since usually it is a small
slot being added/deleted).
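
One possible shape of that suggestion at the kvm_arch_commit_memory_region()
call site; kvm_zap_gfn_range() is hypothetical (no such range-based helper
exists at this point) and only illustrates the interface being described:

/*
 * Sketch only: drop just the sptes that can map the slot being created
 * or moved, instead of zapping every shadow page.  A real
 * kvm_zap_gfn_range() would walk the rmaps/page tables covering
 * [start, end) under mmu_lock and drop the sptes found there.
 */
if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
        kvm_zap_gfn_range(kvm,
                          mem->guest_phys_addr >> PAGE_SHIFT,
                          (mem->guest_phys_addr + mem->memory_size) >> PAGE_SHIFT);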