2013-06-07 08:51:51

by Xiao Guangrong

Subject: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

Changelog:
V3:
All of these changes are from Gleb's review:
1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
2) smartly adjust the kvm generation number in kvm_current_mmio_generation()
to avoid kvm_memslots->generation overflow.

V2:
- rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
- use kvm->memslots->generation as kvm global generation-number
- fix comment and codestyle
- init kvm generation close to mmio wrap-around value
- keep kvm_mmu_zap_mmio_sptes

The current way is to hold the hot mmu-lock and walk all shadow pages; this
does not scale. This patchset introduces a very simple and scalable way to
quickly invalidate all mmio sptes - it does not need to walk any shadow pages
or hold any locks.

The idea is simple:
KVM maintains a global mmio valid generation-number which is stored in
kvm->memslots.generation, and every mmio spte stores the current global
generation-number into its available bits when it is created.

When KVM needs to zap all mmio sptes, it simply increases the global
generation-number. When a guest does an mmio access, KVM intercepts the
MMIO #PF, walks the shadow page table and gets the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it goes to the normal #PF handler to update the mmio spte.

Since 19 bits are used to store the generation-number on the mmio spte, we
zap all mmio sptes when the number wraps around.
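
As an illustration, below is a minimal user-space sketch of the bit
packing described above. It is only a sketch: the constants mirror the
ones introduced in patch 2, and the handling of shadow_mmio_mask and the
other spte bits done by the real kernel code is omitted.

#include <assert.h>
#include <stdint.h>

#define MMIO_SPTE_GEN_LOW_SHIFT   3
#define MMIO_SPTE_GEN_HIGH_SHIFT  52
#define MMIO_GEN_LOW_SHIFT        9
#define MMIO_GEN_LOW_MASK         ((1u << MMIO_GEN_LOW_SHIFT) - 1)
#define MMIO_GEN_MASK             ((1u << 19) - 1)

/* Scatter the 19-bit generation into spte bits 3..11 (low 9 bits) and
 * bits 52..61 (high 10 bits). */
static uint64_t generation_mmio_spte_mask(unsigned int gen)
{
	uint64_t mask;

	mask  = (uint64_t)(gen & MMIO_GEN_LOW_MASK) << MMIO_SPTE_GEN_LOW_SHIFT;
	mask |= (uint64_t)(gen >> MMIO_GEN_LOW_SHIFT) << MMIO_SPTE_GEN_HIGH_SHIFT;
	return mask;
}

/* Recover the generation; unlike the kernel code, this sketch assumes
 * all non-generation bits of the spte are already clear. */
static unsigned int get_mmio_spte_generation(uint64_t spte)
{
	unsigned int gen;

	gen  = (spte >> MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_GEN_LOW_MASK;
	gen |= (unsigned int)(spte >> MMIO_SPTE_GEN_HIGH_SHIFT) << MMIO_GEN_LOW_SHIFT;
	return gen;
}

int main(void)
{
	unsigned int gen = 0x7fff3 & MMIO_GEN_MASK;	/* any 19-bit value */

	assert(get_mmio_spte_generation(generation_mmio_spte_mask(gen)) == gen);
	return 0;
}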

Xiao Guangrong (6):
KVM: MMU: retain more available bits on mmio spte
KVM: MMU: store generation-number into mmio spte
KVM: MMU: make return value of mmio page fault handler more readable
KVM: MMU: fast invalidate all mmio sptes
KVM: MMU: add tracepoint for check_mmio_spte
KVM: MMU: init kvm generation close to mmio wrap-around value

arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 131 ++++++++++++++++++++++++++++++++--------
arch/x86/kvm/mmu.h | 17 ++++++
arch/x86/kvm/mmutrace.h | 34 +++++++++--
arch/x86/kvm/paging_tmpl.h | 10 ++-
arch/x86/kvm/vmx.c | 12 ++--
arch/x86/kvm/x86.c | 11 +++-
7 files changed, 177 insertions(+), 40 deletions(-)

--
1.8.1.4


2013-06-07 08:52:02

by Xiao Guangrong

Subject: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

Define some meaningful names instead of raw return codes

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 15 +++++----------
arch/x86/kvm/mmu.h | 14 ++++++++++++++
arch/x86/kvm/vmx.c | 4 ++--
3 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index eca91bd..044d8c0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3222,17 +3222,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
return spte;
}

-/*
- * If it is a real mmio page fault, return 1 and emulat the instruction
- * directly, return 0 to let CPU fault again on the address, -1 is
- * returned if bug is detected.
- */
int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
{
u64 spte;

if (quickly_check_mmio_pf(vcpu, addr, direct))
- return 1;
+ return RET_MMIO_PF_EMULATE;

spte = walk_shadow_page_get_mmio_spte(vcpu, addr);

@@ -3245,7 +3240,7 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)

trace_handle_mmio_page_fault(addr, gfn, access);
vcpu_cache_mmio_info(vcpu, addr, gfn, access);
- return 1;
+ return RET_MMIO_PF_EMULATE;
}

/*
@@ -3253,13 +3248,13 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
* it's a BUG if the gfn is not a mmio page.
*/
if (direct && !check_direct_spte_mmio_pf(spte))
- return -1;
+ return RET_MMIO_PF_BUG;

/*
* If the page table is zapped by other cpus, let CPU fault again on
* the address.
*/
- return 0;
+ return RET_MMIO_PF_RETRY;
}
EXPORT_SYMBOL_GPL(handle_mmio_page_fault_common);

@@ -3269,7 +3264,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr,
int ret;

ret = handle_mmio_page_fault_common(vcpu, addr, direct);
- WARN_ON(ret < 0);
+ WARN_ON(ret == RET_MMIO_PF_BUG);
return ret;
}

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 922bfae..ba6a19c 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -52,6 +52,20 @@

int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
+
+/*
+ * Return values of handle_mmio_page_fault_common:
+ * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
+ * directly.
+ * RET_MMIO_PF_RETRY: let CPU fault again on the address.
+ * RET_MMIO_PF_BUG: bug is detected.
+ */
+enum {
+ RET_MMIO_PF_EMULATE = 1,
+ RET_MMIO_PF_RETRY = 0,
+ RET_MMIO_PF_BUG = -1
+};
+
int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 78ee123..85c8d51 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5366,10 +5366,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);

ret = handle_mmio_page_fault_common(vcpu, gpa, true);
- if (likely(ret == 1))
+ if (likely(ret == RET_MMIO_PF_EMULATE))
return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
EMULATE_DONE;
- if (unlikely(!ret))
+ if (unlikely(ret == RET_MMIO_PF_RETRY))
return 1;

/* It is the real ept misconfig */
--
1.8.1.4

2013-06-07 08:52:00

by Xiao Guangrong

Subject: [PATCH v3 1/6] KVM: MMU: retain more available bits on mmio spte

Let the mmio spte only use bit 62 and bit 63 of the upper 32 bits, so that
bit 52 ~ bit 61 can be used for other purposes.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/vmx.c | 4 ++--
arch/x86/kvm/x86.c | 8 +++++++-
2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 260a919..78ee123 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4176,10 +4176,10 @@ static void ept_set_mmio_spte_mask(void)
/*
* EPT Misconfigurations can be generated if the value of bits 2:0
* of an EPT paging-structure entry is 110b (write/execute).
- * Also, magic bits (0xffull << 49) is set to quickly identify mmio
+ * Also, magic bits (0x3ull << 62) is set to quickly identify mmio
* spte.
*/
- kvm_mmu_set_mmio_spte_mask(0xffull << 49 | 0x6ull);
+ kvm_mmu_set_mmio_spte_mask((0x3ull << 62) | 0x6ull);
}

/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 6402951..54059ba 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5263,7 +5263,13 @@ static void kvm_set_mmio_spte_mask(void)
* Set the reserved bits and the present bit of an paging-structure
* entry to generate page fault with PFER.RSV = 1.
*/
- mask = ((1ull << (62 - maxphyaddr + 1)) - 1) << maxphyaddr;
+ /* Mask the reserved physical address bits. */
+ mask = ((1ull << (51 - maxphyaddr + 1)) - 1) << maxphyaddr;
+
+ /* Bit 62 is always reserved for 32bit host. */
+ mask |= 0x3ull << 62;
+
+ /* Set the present bit. */
mask |= 1ull;

#ifdef CONFIG_X86_64
--
1.8.1.4

2013-06-07 08:51:58

by Xiao Guangrong

Subject: [PATCH v3 5/6] KVM: MMU: add tracepoint for check_mmio_spte

It is useful for debugging mmio spte invalidation.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 9 +++++++--
arch/x86/kvm/mmutrace.h | 24 ++++++++++++++++++++++++
2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index bdc95bc..1fd2c05 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -281,8 +281,13 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,

static bool check_mmio_spte(struct kvm *kvm, u64 spte)
{
- return likely(get_mmio_spte_generation(spte) ==
- kvm_current_mmio_generation(kvm));
+ unsigned int kvm_gen, spte_gen;
+
+ kvm_gen = kvm_current_mmio_generation(kvm);
+ spte_gen = get_mmio_spte_generation(spte);
+
+ trace_check_mmio_spte(spte, kvm_gen, spte_gen);
+ return likely(kvm_gen == spte_gen);
}

static inline u64 rsvd_bits(int s, int e)
diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index ad24757..9d2e0ff 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -298,6 +298,30 @@ TRACE_EVENT(
__entry->mmu_valid_gen, __entry->mmu_used_pages
)
);
+
+
+TRACE_EVENT(
+ check_mmio_spte,
+ TP_PROTO(u64 spte, unsigned int kvm_gen, unsigned int spte_gen),
+ TP_ARGS(spte, kvm_gen, spte_gen),
+
+ TP_STRUCT__entry(
+ __field(unsigned int, kvm_gen)
+ __field(unsigned int, spte_gen)
+ __field(u64, spte)
+ ),
+
+ TP_fast_assign(
+ __entry->kvm_gen = kvm_gen;
+ __entry->spte_gen = spte_gen;
+ __entry->spte = spte;
+ ),
+
+ TP_printk("spte %llx kvm_gen %x spte-gen %x valid %d", __entry->spte,
+ __entry->kvm_gen, __entry->spte_gen,
+ __entry->kvm_gen == __entry->spte_gen
+ )
+);
#endif /* _TRACE_KVMMMU_H */

#undef TRACE_INCLUDE_PATH
--
1.8.1.4

2013-06-07 08:53:54

by Xiao Guangrong

Subject: [PATCH v3 6/6] KVM: MMU: init kvm generation close to mmio wrap-around value

This gives us a chance to trigger and test mmio generation-number wrap-around.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1fd2c05..7d50a2d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -235,7 +235,12 @@ static unsigned int get_mmio_spte_generation(u64 spte)

static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
{
- return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
+ /*
+ * Init kvm generation close to MMIO_MAX_GEN to easily test the
+ * code of handling generation number wrap-around.
+ */
+ return (kvm_memslots(kvm)->generation +
+ MMIO_MAX_GEN - 13) & MMIO_GEN_MASK;
}

static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
--
1.8.1.4

2013-06-07 08:57:50

by Xiao Guangrong

Subject: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

This patch introduces a very simple and scalable way to invalidate all mmio
sptes - it does not need to walk any shadow pages or hold the mmu-lock.

KVM maintains a global mmio valid generation-number which is stored in
kvm->memslots.generation, and every mmio spte stores the current global
generation-number into its available bits when it is created.

When KVM needs to zap all mmio sptes, it simply increases the global
generation-number. When a guest does an mmio access, KVM intercepts the
MMIO #PF, walks the shadow page table and gets the mmio spte. If the
generation-number on the spte does not equal the global generation-number,
it goes to the normal #PF handler to update the mmio spte.

Since 19 bits are used to store the generation-number on the mmio spte, we
zap all mmio sptes when the number wraps around.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/mmu.c | 54 +++++++++++++++++++++++++++++++++++------
arch/x86/kvm/mmu.h | 5 +++-
arch/x86/kvm/paging_tmpl.h | 7 ++++--
arch/x86/kvm/vmx.c | 4 +++
arch/x86/kvm/x86.c | 3 +--
6 files changed, 61 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1f98c1b..90d05ed 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -773,7 +773,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset, unsigned long mask);
void kvm_mmu_zap_all(struct kvm *kvm);
-void kvm_mmu_zap_mmio_sptes(struct kvm *kvm);
+void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 044d8c0..bdc95bc 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -205,9 +205,11 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
#define MMIO_SPTE_GEN_LOW_SHIFT 3
#define MMIO_SPTE_GEN_HIGH_SHIFT 52

+#define MMIO_GEN_SHIFT 19
#define MMIO_GEN_LOW_SHIFT 9
#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 1)
-#define MMIO_MAX_GEN ((1 << 19) - 1)
+#define MMIO_GEN_MASK ((1 << MMIO_GEN_SHIFT) - 1)
+#define MMIO_MAX_GEN ((1 << MMIO_GEN_SHIFT) - 1)

static u64 generation_mmio_spte_mask(unsigned int gen)
{
@@ -231,17 +233,23 @@ static unsigned int get_mmio_spte_generation(u64 spte)
return gen;
}

+static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+{
+ return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
+}
+
static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
unsigned access)
{
struct kvm_mmu_page *sp = page_header(__pa(sptep));
- u64 mask = generation_mmio_spte_mask(0);
+ unsigned int gen = kvm_current_mmio_generation(kvm);
+ u64 mask = generation_mmio_spte_mask(gen);

access &= ACC_WRITE_MASK | ACC_USER_MASK;
mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
sp->mmio_cached = true;

- trace_mark_mmio_spte(sptep, gfn, access, 0);
+ trace_mark_mmio_spte(sptep, gfn, access, gen);
mmu_spte_set(sptep, mask);
}

@@ -271,6 +279,12 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
return false;
}

+static bool check_mmio_spte(struct kvm *kvm, u64 spte)
+{
+ return likely(get_mmio_spte_generation(spte) ==
+ kvm_current_mmio_generation(kvm));
+}
+
static inline u64 rsvd_bits(int s, int e)
{
return ((1ULL << (e - s + 1)) - 1) << s;
@@ -3235,6 +3249,9 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
gfn_t gfn = get_mmio_spte_gfn(spte);
unsigned access = get_mmio_spte_access(spte);

+ if (!check_mmio_spte(vcpu->kvm, spte))
+ return RET_MMIO_PF_INVALID;
+
if (direct)
addr = 0;

@@ -3276,8 +3293,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,

pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);

- if (unlikely(error_code & PFERR_RSVD_MASK))
- return handle_mmio_page_fault(vcpu, gva, error_code, true);
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ r = handle_mmio_page_fault(vcpu, gva, error_code, true);
+
+ if (likely(r != RET_MMIO_PF_INVALID))
+ return r;
+ }

r = mmu_topup_memory_caches(vcpu);
if (r)
@@ -3353,8 +3374,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
ASSERT(vcpu);
ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));

- if (unlikely(error_code & PFERR_RSVD_MASK))
- return handle_mmio_page_fault(vcpu, gpa, error_code, true);
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
+
+ if (likely(r != RET_MMIO_PF_INVALID))
+ return r;
+ }

r = mmu_topup_memory_caches(vcpu);
if (r)
@@ -4327,7 +4352,7 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
spin_unlock(&kvm->mmu_lock);
}

-void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
+static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
{
struct kvm_mmu_page *sp, *node;
LIST_HEAD(invalid_list);
@@ -4350,6 +4375,19 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
}

+void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
+{
+ /*
+ * The very rare case: if the generation-number is round,
+ * zap all shadow pages.
+ *
+ * The max value is MMIO_MAX_GEN - 1 since it is not called
+ * when mark memslot invalid.
+ */
+ if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
+ kvm_mmu_zap_mmio_sptes(kvm);
+}
+
static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
{
struct kvm *kvm;
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index ba6a19c..5b59c57 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -56,12 +56,15 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
/*
* Return values of handle_mmio_page_fault_common:
* RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
- * directly.
+ * directly.
+ * RET_MMIO_PF_INVALID: invalid spte is detected then let the real page
+ * fault path update the mmio spte.
* RET_MMIO_PF_RETRY: let CPU fault again on the address.
* RET_MMIO_PF_BUG: bug is detected.
*/
enum {
RET_MMIO_PF_EMULATE = 1,
+ RET_MMIO_PF_INVALID = 2,
RET_MMIO_PF_RETRY = 0,
RET_MMIO_PF_BUG = -1
};
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index fb50fa6..7769699 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -552,9 +552,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

- if (unlikely(error_code & PFERR_RSVD_MASK))
- return handle_mmio_page_fault(vcpu, addr, error_code,
+ if (unlikely(error_code & PFERR_RSVD_MASK)) {
+ r = handle_mmio_page_fault(vcpu, addr, error_code,
mmu_is_nested(vcpu));
+ if (likely(r != RET_MMIO_PF_INVALID))
+ return r;
+ };

r = mmu_topup_memory_caches(vcpu);
if (r)
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 85c8d51..f4a5b3f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5369,6 +5369,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
if (likely(ret == RET_MMIO_PF_EMULATE))
return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
EMULATE_DONE;
+
+ if (unlikely(ret == RET_MMIO_PF_INVALID))
+ return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
+
if (unlikely(ret == RET_MMIO_PF_RETRY))
return 1;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 54059ba..9e4afa7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7067,8 +7067,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
* If memory slot is created, or moved, we need to clear all
* mmio sptes.
*/
- if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
- kvm_mmu_zap_mmio_sptes(kvm);
+ kvm_mmu_invalidate_mmio_sptes(kvm);
}

void kvm_arch_flush_shadow_all(struct kvm *kvm)
--
1.8.1.4

2013-06-07 08:58:46

by Xiao Guangrong

Subject: [PATCH v3 2/6] KVM: MMU: store generation-number into mmio spte

Store the generation-number into bit 3 ~ bit 11 and bit 52 ~ bit 61; in
total 19 bits can be used, which should be enough for nearly all common cases.

In this patch the generation-number is always 0; it will be changed in a
later patch.

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/kvm/mmu.c | 58 ++++++++++++++++++++++++++++++++++++++--------
arch/x86/kvm/mmutrace.h | 10 ++++----
arch/x86/kvm/paging_tmpl.h | 3 ++-
3 files changed, 56 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6941fa7..eca91bd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -197,15 +197,52 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
}
EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);

-static void mark_mmio_spte(u64 *sptep, u64 gfn, unsigned access)
+/*
+ * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number,
+ * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation
+ * number.
+ */
+#define MMIO_SPTE_GEN_LOW_SHIFT 3
+#define MMIO_SPTE_GEN_HIGH_SHIFT 52
+
+#define MMIO_GEN_LOW_SHIFT 9
+#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 1)
+#define MMIO_MAX_GEN ((1 << 19) - 1)
+
+static u64 generation_mmio_spte_mask(unsigned int gen)
+{
+ u64 mask;
+
+ WARN_ON(gen > MMIO_MAX_GEN);
+
+ mask = (gen & MMIO_GEN_LOW_MASK) << MMIO_SPTE_GEN_LOW_SHIFT;
+ mask |= ((u64)gen >> MMIO_GEN_LOW_SHIFT) << MMIO_SPTE_GEN_HIGH_SHIFT;
+ return mask;
+}
+
+static unsigned int get_mmio_spte_generation(u64 spte)
+{
+ unsigned int gen;
+
+ spte &= ~shadow_mmio_mask;
+
+ gen = (spte >> MMIO_SPTE_GEN_LOW_SHIFT) & MMIO_GEN_LOW_MASK;
+ gen |= (spte >> MMIO_SPTE_GEN_HIGH_SHIFT) << MMIO_GEN_LOW_SHIFT;
+ return gen;
+}
+
+static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
+ unsigned access)
{
struct kvm_mmu_page *sp = page_header(__pa(sptep));
+ u64 mask = generation_mmio_spte_mask(0);

access &= ACC_WRITE_MASK | ACC_USER_MASK;
-
+ mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
sp->mmio_cached = true;
- trace_mark_mmio_spte(sptep, gfn, access);
- mmu_spte_set(sptep, shadow_mmio_mask | access | gfn << PAGE_SHIFT);
+
+ trace_mark_mmio_spte(sptep, gfn, access, 0);
+ mmu_spte_set(sptep, mask);
}

static bool is_mmio_spte(u64 spte)
@@ -223,10 +260,11 @@ static unsigned get_mmio_spte_access(u64 spte)
return (spte & ~shadow_mmio_mask) & ~PAGE_MASK;
}

-static bool set_mmio_spte(u64 *sptep, gfn_t gfn, pfn_t pfn, unsigned access)
+static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
+ pfn_t pfn, unsigned access)
{
if (unlikely(is_noslot_pfn(pfn))) {
- mark_mmio_spte(sptep, gfn, access);
+ mark_mmio_spte(kvm, sptep, gfn, access);
return true;
}

@@ -2364,7 +2402,7 @@ static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
u64 spte;
int ret = 0;

- if (set_mmio_spte(sptep, gfn, pfn, pte_access))
+ if (set_mmio_spte(vcpu->kvm, sptep, gfn, pfn, pte_access))
return 0;

spte = PT_PRESENT_MASK;
@@ -3427,8 +3465,8 @@ static inline void protect_clean_gpte(unsigned *access, unsigned gpte)
*access &= mask;
}

-static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
- int *nr_present)
+static bool sync_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
+ unsigned access, int *nr_present)
{
if (unlikely(is_mmio_spte(*sptep))) {
if (gfn != get_mmio_spte_gfn(*sptep)) {
@@ -3437,7 +3475,7 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, unsigned access,
}

(*nr_present)++;
- mark_mmio_spte(sptep, gfn, access);
+ mark_mmio_spte(kvm, sptep, gfn, access);
return true;
}

diff --git a/arch/x86/kvm/mmutrace.h b/arch/x86/kvm/mmutrace.h
index eb444dd..ad24757 100644
--- a/arch/x86/kvm/mmutrace.h
+++ b/arch/x86/kvm/mmutrace.h
@@ -199,23 +199,25 @@ DEFINE_EVENT(kvm_mmu_page_class, kvm_mmu_prepare_zap_page,

TRACE_EVENT(
mark_mmio_spte,
- TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access),
- TP_ARGS(sptep, gfn, access),
+ TP_PROTO(u64 *sptep, gfn_t gfn, unsigned access, unsigned int gen),
+ TP_ARGS(sptep, gfn, access, gen),

TP_STRUCT__entry(
__field(void *, sptep)
__field(gfn_t, gfn)
__field(unsigned, access)
+ __field(unsigned int, gen)
),

TP_fast_assign(
__entry->sptep = sptep;
__entry->gfn = gfn;
__entry->access = access;
+ __entry->gen = gen;
),

- TP_printk("sptep:%p gfn %llx access %x", __entry->sptep, __entry->gfn,
- __entry->access)
+ TP_printk("sptep:%p gfn %llx access %x gen %x", __entry->sptep,
+ __entry->gfn, __entry->access, __entry->gen)
);

TRACE_EVENT(
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index da20860..fb50fa6 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -792,7 +792,8 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
pte_access &= gpte_access(vcpu, gpte);
protect_clean_gpte(&pte_access, gpte);

- if (sync_mmio_spte(&sp->spt[i], gfn, pte_access, &nr_present))
+ if (sync_mmio_spte(vcpu->kvm, &sp->spt[i], gfn, pte_access,
+ &nr_present))
continue;

if (gfn != sp->gfns[i]) {
--
1.8.1.4

2013-06-10 07:57:02

by Gleb Natapov

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
> Changelog:
> V3:
> All of these changes are from Gleb's review:
> 1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
> 2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
> to avoid kvm_memslots->generation overflow.
>
> V2:
> - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
> - use kvm->memslots->generation as kvm global generation-number
> - fix comment and codestyle
> - init kvm generation close to mmio wrap-around value
> - keep kvm_mmu_zap_mmio_sptes
>
> The current way is holding hot mmu-lock and walking all shadow pages, this
> is not scale. This patchset tries to introduce a very simple and scale way
> to fast invalidate all mmio sptes - it need not walk any shadow pages and hold
> any locks.
>
> The idea is simple:
> KVM maintains a global mmio valid generation-number which is stored in
> kvm->memslots.generation and every mmio spte stores the current global
> generation-number into his available bits when it is created
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte
>
> Since 19 bits are used to store generation-number on mmio spte, we zap all
> mmio sptes when the number is round
>
Looks good to me, but doesn't this obsolete kvm_mmu_zap_mmio_sptes() and
sp->mmio_cached, so they should be removed as part of the patch series?

> Xiao Guangrong (6):
> KVM: MMU: retain more available bits on mmio spte
> KVM: MMU: store generation-number into mmio spte
> KVM: MMU: make return value of mmio page fault handler more readable
> KVM: MMU: fast invalidate all mmio sptes
> KVM: MMU: add tracepoint for check_mmio_spte
> KVM: MMU: init kvm generation close to mmio wrap-around value
>
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 131 ++++++++++++++++++++++++++++++++--------
> arch/x86/kvm/mmu.h | 17 ++++++
> arch/x86/kvm/mmutrace.h | 34 +++++++++--
> arch/x86/kvm/paging_tmpl.h | 10 ++-
> arch/x86/kvm/vmx.c | 12 ++--
> arch/x86/kvm/x86.c | 11 +++-
> 7 files changed, 177 insertions(+), 40 deletions(-)
>
> --
> 1.8.1.4

--
Gleb.

2013-06-10 07:57:55

by Gleb Natapov

Subject: Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:
> Define some meaningful names instead of raw code
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/kvm/mmu.c | 15 +++++----------
> arch/x86/kvm/mmu.h | 14 ++++++++++++++
> arch/x86/kvm/vmx.c | 4 ++--
> 3 files changed, 21 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index eca91bd..044d8c0 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3222,17 +3222,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
> return spte;
> }
>
> -/*
> - * If it is a real mmio page fault, return 1 and emulat the instruction
> - * directly, return 0 to let CPU fault again on the address, -1 is
> - * returned if bug is detected.
> - */
> int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> {
> u64 spte;
>
> if (quickly_check_mmio_pf(vcpu, addr, direct))
> - return 1;
> + return RET_MMIO_PF_EMULATE;
>
> spte = walk_shadow_page_get_mmio_spte(vcpu, addr);
>
> @@ -3245,7 +3240,7 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>
> trace_handle_mmio_page_fault(addr, gfn, access);
> vcpu_cache_mmio_info(vcpu, addr, gfn, access);
> - return 1;
> + return RET_MMIO_PF_EMULATE;
> }
>
> /*
> @@ -3253,13 +3248,13 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> * it's a BUG if the gfn is not a mmio page.
> */
> if (direct && !check_direct_spte_mmio_pf(spte))
> - return -1;
> + return RET_MMIO_PF_BUG;
>
> /*
> * If the page table is zapped by other cpus, let CPU fault again on
> * the address.
> */
> - return 0;
> + return RET_MMIO_PF_RETRY;
> }
> EXPORT_SYMBOL_GPL(handle_mmio_page_fault_common);
>
> @@ -3269,7 +3264,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr,
> int ret;
>
> ret = handle_mmio_page_fault_common(vcpu, addr, direct);
> - WARN_ON(ret < 0);
> + WARN_ON(ret == RET_MMIO_PF_BUG);
> return ret;
> }
>
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 922bfae..ba6a19c 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -52,6 +52,20 @@
>
> int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
> void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
> +
> +/*
> + * Return values of handle_mmio_page_fault_common:
> + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
> + * directly.
> + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
> + * RET_MMIO_PF_BUG: bug is detected.
> + */
> +enum {
> + RET_MMIO_PF_EMULATE = 1,
> + RET_MMIO_PF_RETRY = 0,
> + RET_MMIO_PF_BUG = -1
> +};
I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to
RET_MMIO_PF_ERROR, but no need to resend just for that.

> +
> int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct);
> int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 78ee123..85c8d51 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5366,10 +5366,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> gpa = vmcs_read64(GUEST_PHYSICAL_ADDRESS);
>
> ret = handle_mmio_page_fault_common(vcpu, gpa, true);
> - if (likely(ret == 1))
> + if (likely(ret == RET_MMIO_PF_EMULATE))
> return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
> EMULATE_DONE;
> - if (unlikely(!ret))
> + if (unlikely(ret == RET_MMIO_PF_RETRY))
> return 1;
>
> /* It is the real ept misconfig */
> --
> 1.8.1.4

--
Gleb.

2013-06-10 08:41:56

by Xiao Guangrong

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On 06/10/2013 03:56 PM, Gleb Natapov wrote:
> On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
>> Changelog:
>> V3:
>> All of these changes are from Gleb's review:
>> 1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
>> 2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
>> to avoid kvm_memslots->generation overflow.
>>
>> V2:
>> - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
>> - use kvm->memslots->generation as kvm global generation-number
>> - fix comment and codestyle
>> - init kvm generation close to mmio wrap-around value
>> - keep kvm_mmu_zap_mmio_sptes
>>
>> The current way is holding hot mmu-lock and walking all shadow pages, this
>> is not scale. This patchset tries to introduce a very simple and scale way
>> to fast invalidate all mmio sptes - it need not walk any shadow pages and hold
>> any locks.
>>
>> The idea is simple:
>> KVM maintains a global mmio valid generation-number which is stored in
>> kvm->memslots.generation and every mmio spte stores the current global
>> generation-number into his available bits when it is created
>>
>> When KVM need zap all mmio sptes, it just simply increase the global
>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>> then it walks the shadow page table and get the mmio spte. If the
>> generation-number on the spte does not equal the global generation-number,
>> it will go to the normal #PF handler to update the mmio spte
>>
>> Since 19 bits are used to store generation-number on mmio spte, we zap all
>> mmio sptes when the number is round
>>
> Looks good to me, but doesn't tis obsolete kvm_mmu_zap_mmio_sptes() and
> sp->mmio_cached, so they should be removed as part of the patch series?

Yes, I agree, they should be removed. :)

Here is the patch to do these things:

From bc1bc36e2640059f06c4860af802ecc74e1f3d2d Mon Sep 17 00:00:00 2001
From: Xiao Guangrong <[email protected]>
Date: Mon, 10 Jun 2013 16:28:55 +0800
Subject: [PATCH 7/6] KVM: MMU: drop kvm_mmu_zap_mmio_sptes

Drop kvm_mmu_zap_mmio_sptes and use kvm_mmu_invalidate_zap_all_pages
instead to handle mmio generation number overflow

Signed-off-by: Xiao Guangrong <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 -
arch/x86/kvm/mmu.c | 22 +---------------------
2 files changed, 1 insertion(+), 22 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 90d05ed..966f265 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -230,7 +230,6 @@ struct kvm_mmu_page {
#endif

int write_flooding_count;
- bool mmio_cached;
};

struct kvm_pio_request {
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 35cd0b6..c87b19d 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -246,13 +246,11 @@ static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
unsigned access)
{
- struct kvm_mmu_page *sp = page_header(__pa(sptep));
unsigned int gen = kvm_current_mmio_generation(kvm);
u64 mask = generation_mmio_spte_mask(gen);

access &= ACC_WRITE_MASK | ACC_USER_MASK;
mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
- sp->mmio_cached = true;

trace_mark_mmio_spte(sptep, gfn, access, gen);
mmu_spte_set(sptep, mask);
@@ -4362,24 +4360,6 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
spin_unlock(&kvm->mmu_lock);
}

-static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
-{
- struct kvm_mmu_page *sp, *node;
- LIST_HEAD(invalid_list);
-
- spin_lock(&kvm->mmu_lock);
-restart:
- list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
- if (!sp->mmio_cached)
- continue;
- if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
- goto restart;
- }
-
- kvm_mmu_commit_zap_page(kvm, &invalid_list);
- spin_unlock(&kvm->mmu_lock);
-}
-
static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
{
return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
@@ -4395,7 +4375,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
* when mark memslot invalid.
*/
if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
- kvm_mmu_zap_mmio_sptes(kvm);
+ kvm_mmu_invalidate_zap_all_pages(kvm);
}

static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
--
1.8.1.4

2013-06-10 08:46:00

by Xiao Guangrong

Subject: Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

On 06/10/2013 03:57 PM, Gleb Natapov wrote:
> On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:
>> Define some meaningful names instead of raw code
>>
>> Signed-off-by: Xiao Guangrong <[email protected]>
>> ---
>> arch/x86/kvm/mmu.c | 15 +++++----------
>> arch/x86/kvm/mmu.h | 14 ++++++++++++++
>> arch/x86/kvm/vmx.c | 4 ++--
>> 3 files changed, 21 insertions(+), 12 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index eca91bd..044d8c0 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3222,17 +3222,12 @@ static u64 walk_shadow_page_get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr)
>> return spte;
>> }
>>
>> -/*
>> - * If it is a real mmio page fault, return 1 and emulat the instruction
>> - * directly, return 0 to let CPU fault again on the address, -1 is
>> - * returned if bug is detected.
>> - */
>> int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>> {
>> u64 spte;
>>
>> if (quickly_check_mmio_pf(vcpu, addr, direct))
>> - return 1;
>> + return RET_MMIO_PF_EMULATE;
>>
>> spte = walk_shadow_page_get_mmio_spte(vcpu, addr);
>>
>> @@ -3245,7 +3240,7 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>>
>> trace_handle_mmio_page_fault(addr, gfn, access);
>> vcpu_cache_mmio_info(vcpu, addr, gfn, access);
>> - return 1;
>> + return RET_MMIO_PF_EMULATE;
>> }
>>
>> /*
>> @@ -3253,13 +3248,13 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
>> * it's a BUG if the gfn is not a mmio page.
>> */
>> if (direct && !check_direct_spte_mmio_pf(spte))
>> - return -1;
>> + return RET_MMIO_PF_BUG;
>>
>> /*
>> * If the page table is zapped by other cpus, let CPU fault again on
>> * the address.
>> */
>> - return 0;
>> + return RET_MMIO_PF_RETRY;
>> }
>> EXPORT_SYMBOL_GPL(handle_mmio_page_fault_common);
>>
>> @@ -3269,7 +3264,7 @@ static int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr,
>> int ret;
>>
>> ret = handle_mmio_page_fault_common(vcpu, addr, direct);
>> - WARN_ON(ret < 0);
>> + WARN_ON(ret == RET_MMIO_PF_BUG);
>> return ret;
>> }
>>
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 922bfae..ba6a19c 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -52,6 +52,20 @@
>>
>> int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 sptes[4]);
>> void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
>> +
>> +/*
>> + * Return values of handle_mmio_page_fault_common:
>> + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
>> + * directly.
>> + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
>> + * RET_MMIO_PF_BUG: bug is detected.
>> + */
>> +enum {
>> + RET_MMIO_PF_EMULATE = 1,
>> + RET_MMIO_PF_RETRY = 0,
>> + RET_MMIO_PF_BUG = -1
>> +};
> I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to
> RET_MMIO_PF_ERROR, but no need to resend just for that.

Fine with me. I will make a separate patch to update it later. Thanks for your review!

2013-06-10 13:16:13

by Takuya Yoshikawa

Subject: Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

On Mon, 10 Jun 2013 10:57:50 +0300
Gleb Natapov <[email protected]> wrote:

> On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:

> > +
> > +/*
> > + * Return values of handle_mmio_page_fault_common:
> > + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
> > + * directly.
> > + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
> > + * RET_MMIO_PF_BUG: bug is detected.
> > + */
> > +enum {
> > + RET_MMIO_PF_EMULATE = 1,
> > + RET_MMIO_PF_RETRY = 0,
> > + RET_MMIO_PF_BUG = -1
> > +};
> I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to
> RET_MMIO_PF_ERROR, but no need to resend just for that.

Why not just let compilers select the values? -- It's an enum.
Any reason to make it start from -1?

Takuya

2013-06-10 13:44:00

by Takuya Yoshikawa

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On Mon, 10 Jun 2013 16:39:37 +0800
Xiao Guangrong <[email protected]> wrote:

> On 06/10/2013 03:56 PM, Gleb Natapov wrote:
> > On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:

> > Looks good to me, but doesn't tis obsolete kvm_mmu_zap_mmio_sptes() and
> > sp->mmio_cached, so they should be removed as part of the patch series?
>
> Yes, i agree, they should be removed. :)

I'm fine with removing it but please make it clear that you all agree
on the same basis.

Last time, Paolo mentioned the possibility to use some bits of spte for
other things. The suggestion there was to keep sp->mmio_cached code
for the time we would need to reduce the bits for generation numbers.

Do you think that zap_all() is now preemptible and can handle the
situation as well as the current kvm_mmu_zap_mmio_sptes() does?

One downside is the need to zap unrelated shadow pages, but if this case
is really very rare, yes I agree, it should not be a problem: it depends
on how many bits we can use.

Just please reconfirm.

Takuya

>
> There is the patch to do these things:
>
> From bc1bc36e2640059f06c4860af802ecc74e1f3d2d Mon Sep 17 00:00:00 2001
> From: Xiao Guangrong <[email protected]>
> Date: Mon, 10 Jun 2013 16:28:55 +0800
> Subject: [PATCH 7/6] KVM: MMU: drop kvm_mmu_zap_mmio_sptes
>
> Drop kvm_mmu_zap_mmio_sptes and use kvm_mmu_invalidate_zap_all_pages
> instead to handle mmio generation number overflow
>
> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 1 -
> arch/x86/kvm/mmu.c | 22 +---------------------
> 2 files changed, 1 insertion(+), 22 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 90d05ed..966f265 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -230,7 +230,6 @@ struct kvm_mmu_page {
> #endif
>
> int write_flooding_count;
> - bool mmio_cached;
> };
>
> struct kvm_pio_request {
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 35cd0b6..c87b19d 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -246,13 +246,11 @@ static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> unsigned access)
> {
> - struct kvm_mmu_page *sp = page_header(__pa(sptep));
> unsigned int gen = kvm_current_mmio_generation(kvm);
> u64 mask = generation_mmio_spte_mask(gen);
>
> access &= ACC_WRITE_MASK | ACC_USER_MASK;
> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
> - sp->mmio_cached = true;
>
> trace_mark_mmio_spte(sptep, gfn, access, gen);
> mmu_spte_set(sptep, mask);
> @@ -4362,24 +4360,6 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
> spin_unlock(&kvm->mmu_lock);
> }
>
> -static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
> -{
> - struct kvm_mmu_page *sp, *node;
> - LIST_HEAD(invalid_list);
> -
> - spin_lock(&kvm->mmu_lock);
> -restart:
> - list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
> - if (!sp->mmio_cached)
> - continue;
> - if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> - goto restart;
> - }
> -
> - kvm_mmu_commit_zap_page(kvm, &invalid_list);
> - spin_unlock(&kvm->mmu_lock);
> -}
> -
> static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> {
> return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> @@ -4395,7 +4375,7 @@ void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
> * when mark memslot invalid.
> */
> if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
> - kvm_mmu_zap_mmio_sptes(kvm);
> + kvm_mmu_invalidate_zap_all_pages(kvm);
> }
>
> static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
> --
> 1.8.1.4
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html


--
Takuya Yoshikawa <[email protected]>

2013-06-10 17:03:48

by Gleb Natapov

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On Mon, Jun 10, 2013 at 10:43:52PM +0900, Takuya Yoshikawa wrote:
> On Mon, 10 Jun 2013 16:39:37 +0800
> Xiao Guangrong <[email protected]> wrote:
>
> > On 06/10/2013 03:56 PM, Gleb Natapov wrote:
> > > On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
>
> > > Looks good to me, but doesn't tis obsolete kvm_mmu_zap_mmio_sptes() and
> > > sp->mmio_cached, so they should be removed as part of the patch series?
> >
> > Yes, i agree, they should be removed. :)
>
> I'm fine with removing it but please make it clear that you all agree
> on the same basis.
>
> Last time, Paolo mentioned the possibility to use some bits of spte for
> other things. The suggestion there was to keep sp->mmio_cached code
> for the time we would need to reduce the bits for generation numbers.
>
> Do you think that zap_all() is now preemptible and can treat the
> situation reasonably well as the current kvm_mmu_zap_mmio_sptes()?
>
> One downside is the need to zap unrelated shadow pages, but if this case
> is really very rare, yes I agree, it should not be a problem: it depends
> on how many bits we can use.
>
> Just please reconfirm.
>
That was me who mentioned the possibility of using some bits of the spte
for other things. But for now I have a use for one bit only. Now that you
have reminded me about that discussion I am not so sure we want to
remove kvm_mmu_zap_mmio_sptes(), but on the other hand it is not
preemptable, so a large number of mmio sptes can cause soft lockups.
zap_all() is better in this regard now.

--
Gleb.

2013-06-11 09:19:16

by Gleb Natapov

Subject: Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable

On Mon, Jun 10, 2013 at 10:16:04PM +0900, Takuya Yoshikawa wrote:
> On Mon, 10 Jun 2013 10:57:50 +0300
> Gleb Natapov <[email protected]> wrote:
>
> > On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:
>
> > > +
> > > +/*
> > > + * Return values of handle_mmio_page_fault_common:
> > > + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
> > > + * directly.
> > > + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
> > > + * RET_MMIO_PF_BUG: bug is detected.
> > > + */
> > > +enum {
> > > + RET_MMIO_PF_EMULATE = 1,
> > > + RET_MMIO_PF_RETRY = 0,
> > > + RET_MMIO_PF_BUG = -1
> > > +};
> > I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to
> > RET_MMIO_PF_ERROR, but no need to resend just for that.
>
> Why not just let compilers select the values? -- It's an enum.
> Any reason to make it start from -1?
>
I am fine with this too as an additional patch. It makes sense to preserve
the original values like Xiao did for the initial patch, since it is easier
to verify that the patch is just a mechanical change.

--
Gleb.

2013-06-14 01:59:31

by Marcelo Tosatti

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
> Changelog:
> V3:
> All of these changes are from Gleb's review:
> 1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
> 2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
> to avoid kvm_memslots->generation overflow.
>
> V2:
> - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
> - use kvm->memslots->generation as kvm global generation-number
> - fix comment and codestyle
> - init kvm generation close to mmio wrap-around value
> - keep kvm_mmu_zap_mmio_sptes
>
> The current way is holding hot mmu-lock and walking all shadow pages, this
> is not scale. This patchset tries to introduce a very simple and scale way
> to fast invalidate all mmio sptes - it need not walk any shadow pages and hold
> any locks.

Hi Xiao,

- Where is the generation number increased?
- Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
(picture a guest with 512GB of RAM, even walking all those pages is
expensive) (ah, the patch to remove kvm_mmu_zap_mmio_sptes does that).
- Is -13 enough to test wraparound? It's highly likely the guest has
not begun executing by the time 13 kvm_set_memory calls are made
(so no sptes around). Perhaps -2000 is more sensible (should confirm
though).
- Why remove "if (change == KVM_MR_CREATE) || (change
== KVM_MR_MOVE)" from kvm_arch_commit_memory_region?
It's instructive.

Otherwise looks good.

2013-06-15 02:22:19

by Takuya Yoshikawa

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On Thu, 13 Jun 2013 21:08:21 -0300
Marcelo Tosatti <[email protected]> wrote:

> On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:

> - Where is the generation number increased?

It looks like it is increased when a new slot is installed, in
update_memslots(), since it is based on slots->generation. This is not
restricted to "create" and "move".

> - Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
> (picture guest with 512GB of RAM, even walking all those pages is
> expensive) (ah, patch to remove kvm_mmu_zap_mmio_sptes does that).
> - Is -13 enough to test wraparound? Its highly likely the guest has
> not began executing by the time 13 kvm_set_memory_calls are made
> (so no sptes around). Perhaps -2000 is more sensible (should confirm
> though).

In the future, after we've tested enough, we should change the testing
code to be executed only for some debugging configs. Especially, if we
change zap_mmio_sptes() to zap_all_shadows(), very common guests, even
without huge memory like 512GB, can see the effect induced by sudden page
faults unnecessarily.

If necessary, developers can test the wraparound code by lowering the
max_gen itself anyway.

> - Why remove "if (change == KVM_MR_CREATE) || (change
> == KVM_MR_MOVE)" from kvm_arch_commit_memory_region?
> Its instructive.

There may be a chance that we miss generation wraparounds if we don't
check the other cases: it seems unlikely, but it is theoretically possible.

In short, in the new way all memory slot changes make the mmio sptes stored
in shadow pages obsolete, or zap them on wraparounds -- am I right?

Takuya

2013-06-17 11:59:28

by Xiao Guangrong

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

Sorry for the delayed reply; I was on vacation.

On 06/15/2013 10:22 AM, Takuya Yoshikawa wrote:
> On Thu, 13 Jun 2013 21:08:21 -0300
> Marcelo Tosatti <[email protected]> wrote:
>
>> On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
>
>> - Where is the generation number increased?
>
> Looks like when a new slot is installed in update_memslots() because
> it's based on slots->generation. This is not restricted to "create"
> and "move".

Yes. It reuses slots->generation to avoid unnecessary synchronizations
(RCU, memory barriers).

Increasing the mmio generation number in the case of "create" and "move"
is ok - it is no additional work unless the mmio generation number
overflows, which is hardly ever triggered (since the valid mmio
generation-number space is large enough and zap_all scales well now), and
the mmio spte is updated only when it is used in the future.

>
>> - Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
>> (picture guest with 512GB of RAM, even walking all those pages is
>> expensive) (ah, patch to remove kvm_mmu_zap_mmio_sptes does that).
>> - Is -13 enough to test wraparound? Its highly likely the guest has
>> not began executing by the time 13 kvm_set_memory_calls are made
>> (so no sptes around). Perhaps -2000 is more sensible (should confirm
>> though).
>
> In the future, after we've tested enough, we should change the testing
> code to be executed only for some debugging configs. Especially, if we
> change zap_mmio_sptes() to zap_all_shadows(), very common guests, even
> without huge memory like 512GB, can see the effect induced by sudden page
> faults unnecessarily.
>
> If necessary, developers can test the wraparound code by lowering the
> max_gen itself anyway.

I agree.

>
>> - Why remove "if (change == KVM_MR_CREATE) || (change
>> == KVM_MR_MOVE)" from kvm_arch_commit_memory_region?
>> Its instructive.
>
> There may be a chance that we miss generation wraparounds if we don't
> check other cases: seems unlikely, but theoretically possible.
>
> In short, all memory slot changes make mmio sptes stored in shadow pages
> obsolete, or zapped for wraparounds, in the new way -- am I right?

Yes. You are definitely right. :)

Takuya-san, thank you very much for answering the questions for me, and
thanks to all of you for patiently reviewing my patches.

Marcelo, your points?

2013-06-18 14:27:07

by Paolo Bonzini

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On 07/06/2013 10:51, Xiao Guangrong wrote:
> Changelog:
> V3:
> All of these changes are from Gleb's review:
> 1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
> 2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
> to avoid kvm_memslots->generation overflow.
>
> V2:
> - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
> - use kvm->memslots->generation as kvm global generation-number
> - fix comment and codestyle
> - init kvm generation close to mmio wrap-around value
> - keep kvm_mmu_zap_mmio_sptes
>
> The current way is holding hot mmu-lock and walking all shadow pages, this
> is not scale. This patchset tries to introduce a very simple and scale way
> to fast invalidate all mmio sptes - it need not walk any shadow pages and hold
> any locks.
>
> The idea is simple:
> KVM maintains a global mmio valid generation-number which is stored in
> kvm->memslots.generation and every mmio spte stores the current global
> generation-number into his available bits when it is created
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte
>
> Since 19 bits are used to store generation-number on mmio spte, we zap all
> mmio sptes when the number is round
>
> Xiao Guangrong (6):
> KVM: MMU: retain more available bits on mmio spte
> KVM: MMU: store generation-number into mmio spte
> KVM: MMU: make return value of mmio page fault handler more readable
> KVM: MMU: fast invalidate all mmio sptes
> KVM: MMU: add tracepoint for check_mmio_spte
> KVM: MMU: init kvm generation close to mmio wrap-around value
>
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 131 ++++++++++++++++++++++++++++++++--------
> arch/x86/kvm/mmu.h | 17 ++++++
> arch/x86/kvm/mmutrace.h | 34 +++++++++--
> arch/x86/kvm/paging_tmpl.h | 10 ++-
> arch/x86/kvm/vmx.c | 12 ++--
> arch/x86/kvm/x86.c | 11 +++-
> 7 files changed, 177 insertions(+), 40 deletions(-)
>

Xiao, is it time to add more comments to the code or update
Documentation/virtual/kvm/mmu.txt? Don't worry about the English, it is
more than understandable and I can help with the editing.

Paolo

2013-06-18 22:21:52

by Marcelo Tosatti

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On Mon, Jun 17, 2013 at 07:59:15PM +0800, Xiao Guangrong wrote:
> Sorry for the delay reply since i was on vacation.
>
> On 06/15/2013 10:22 AM, Takuya Yoshikawa wrote:
> > On Thu, 13 Jun 2013 21:08:21 -0300
> > Marcelo Tosatti <[email protected]> wrote:
> >
> >> On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
> >
> >> - Where is the generation number increased?
> >
> > Looks like when a new slot is installed in update_memslots() because
> > it's based on slots->generation. This is not restricted to "create"
> > and "move".
>
> Yes. It reuses slots->generation to avoid unnecessary synchronizations
> (RCU, memory barrier).
>
> Increasing mmio generation number in the case of "create" and "move"
> is ok - it is no addition work unless mmio generation number is overflow
> which is hardly triggered (since the valid mmio generation number is
> large enough and zap_all is scale well now.) and the mmio spte is updated
> only when it is used in the future.
>
> >
> >> - Should use spinlock breakable code in kvm_mmu_zap_mmio_sptes()
> >> (picture guest with 512GB of RAM, even walking all those pages is
> >> expensive) (ah, patch to remove kvm_mmu_zap_mmio_sptes does that).
> >> - Is -13 enough to test wraparound? Its highly likely the guest has
> >> not began executing by the time 13 kvm_set_memory_calls are made
> >> (so no sptes around). Perhaps -2000 is more sensible (should confirm
> >> though).
> >
> > In the future, after we've tested enough, we should change the testing
> > code to be executed only for some debugging configs. Especially, if we
> > change zap_mmio_sptes() to zap_all_shadows(), very common guests, even
> > without huge memory like 512GB, can see the effect induced by sudden page
> > faults unnecessarily.
> >
> > If necessary, developers can test the wraparound code by lowering the
> > max_gen itself anyway.
>
> I agree.
>
> >
> >> - Why remove "if (change == KVM_MR_CREATE) || (change
> >> == KVM_MR_MOVE)" from kvm_arch_commit_memory_region?
> >> Its instructive.
> >
> > There may be a chance that we miss generation wraparounds if we don't
> > check other cases: seems unlikely, but theoretically possible.
> >
> > In short, all memory slot changes make mmio sptes stored in shadow pages
> > obsolete, or zapped for wraparounds, in the new way -- am I right?
>
> Yes. You are definitely right. :)
>
> Takuya-san, thank you very much for you answering the questions for me and thanks
> all of you for patiently reviewing my patches.
>
> Marcelo, your points?

Agreed - points are clear. Patchset looks good.

2013-06-19 02:47:18

by Xiao Guangrong

Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On 06/18/2013 10:26 PM, Paolo Bonzini wrote:
> Il 07/06/2013 10:51, Xiao Guangrong ha scritto:
>> Changelog:
>> V3:
>> All of these changes are from Gleb's review:
>> 1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
>> 2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
>> to avoid kvm_memslots->generation overflow.
>>
>> V2:
>> - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
>> - use kvm->memslots->generation as kvm global generation-number
>> - fix comment and codestyle
>> - init kvm generation close to mmio wrap-around value
>> - keep kvm_mmu_zap_mmio_sptes
>>
>> The current way is holding hot mmu-lock and walking all shadow pages, this
>> is not scale. This patchset tries to introduce a very simple and scale way
>> to fast invalidate all mmio sptes - it need not walk any shadow pages and hold
>> any locks.
>>
>> The idea is simple:
>> KVM maintains a global mmio valid generation-number which is stored in
>> kvm->memslots.generation and every mmio spte stores the current global
>> generation-number into his available bits when it is created
>>
>> When KVM need zap all mmio sptes, it just simply increase the global
>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>> then it walks the shadow page table and get the mmio spte. If the
>> generation-number on the spte does not equal the global generation-number,
>> it will go to the normal #PF handler to update the mmio spte
>>
>> Since 19 bits are used to store generation-number on mmio spte, we zap all
>> mmio sptes when the number is round
>>
>> Xiao Guangrong (6):
>> KVM: MMU: retain more available bits on mmio spte
>> KVM: MMU: store generation-number into mmio spte
>> KVM: MMU: make return value of mmio page fault handler more readable
>> KVM: MMU: fast invalidate all mmio sptes
>> KVM: MMU: add tracepoint for check_mmio_spte
>> KVM: MMU: init kvm generation close to mmio wrap-around value
>>
>> arch/x86/include/asm/kvm_host.h | 2 +-
>> arch/x86/kvm/mmu.c | 131 ++++++++++++++++++++++++++++++++--------
>> arch/x86/kvm/mmu.h | 17 ++++++
>> arch/x86/kvm/mmutrace.h | 34 +++++++++--
>> arch/x86/kvm/paging_tmpl.h | 10 ++-
>> arch/x86/kvm/vmx.c | 12 ++--
>> arch/x86/kvm/x86.c | 11 +++-
>> 7 files changed, 177 insertions(+), 40 deletions(-)
>>
>
> Xiao, is it time to add more comments to the code or update
> Documentation/virtual/kvm/mmu.txt?

Yes, it is.

We have not updated mmu.txt for a long time. I will post a separate patchset
to bring it up to date with the current mmu code.

> Don't worry about the English, it is
> more than understandable and I can help with the editing.

Okay. Thank you in advance, Paolo! :)

2013-06-19 11:08:21

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

Il 10/06/2013 19:03, Gleb Natapov ha scritto:
> On Mon, Jun 10, 2013 at 10:43:52PM +0900, Takuya Yoshikawa wrote:
>> On Mon, 10 Jun 2013 16:39:37 +0800
>> Xiao Guangrong <[email protected]> wrote:
>>
>>> On 06/10/2013 03:56 PM, Gleb Natapov wrote:
>>>> On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
>>
>>>> Looks good to me, but doesn't this obsolete kvm_mmu_zap_mmio_sptes() and
>>>> sp->mmio_cached, so they should be removed as part of the patch series?
>>>
>>> Yes, i agree, they should be removed. :)
>>
>> I'm fine with removing it but please make it clear that you all agree
>> on the same basis.
>>
>> Last time, Paolo mentioned the possibility to use some bits of spte for
>> other things. The suggestion there was to keep sp->mmio_cached code
>> for the time we would need to reduce the bits for generation numbers.
>>
>> Do you think that zap_all() is now preemptible and can treat the
>> situation reasonably well as the current kvm_mmu_zap_mmio_sptes()?
>>
>> One downside is the need to zap unrelated shadow pages, but if this case
>> is really very rare, yes I agree, it should not be a problem: it depends
>> on how many bits we can use.
>>
>> Just please reconfirm.
>>
> That was me who mention the possibility to use some bits of spte for
> other things. But for now I have a use for one bit only. Now that you
> have reminded me about that discussion I am not so sure we want to
> remove kvm_mmu_zap_mmio_sptes(), but on the other hand it is non
> preemptable, so large number of mmio sptes can cause soft lockups.
> zap_all() is better in this regards now.

I asked Gleb on IRC, and he's fine with applying patch 7 too (otherwise
there's hardly any benefit, because kvm_mmu_zap_mmio_sptes is
non-preemptable).

I'm also changing the -13 to -150 since it's quite easy to generate 150
calls to KVM_SET_USER_MEMORY_REGION. Using QEMU, and for a pretty basic
guest with virtio-net, IDE controller and VGA you get:

- 9-10 calls before starting the guest, depending on the guest memory size

- around 25 during the BIOS

- around 20 during kernel boot

- 34 during a single dump of the 64 KB ROM from a virtio-net device.

Paolo
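
Those counts add up to roughly 90 KVM_SET_USER_MEMORY_REGION calls for a plain
boot, which is why -150 comfortably covers it. Below is a small standalone sketch
of the arithmetic; the init expression is a hypothetical illustration of the "init
close to wrap-around" idea from patch 6/6, not the actual hunk.

#include <stdio.h>

#define MMIO_GEN_SHIFT	19
#define MMIO_GEN_MASK	((1 << MMIO_GEN_SHIFT) - 1)
#define MMIO_MAX_GEN	((1 << MMIO_GEN_SHIFT) - 1)

int main(void)
{
	/* rough per-boot KVM_SET_USER_MEMORY_REGION counts quoted above */
	unsigned int boot_calls = 10 + 25 + 20 + 34;

	/* hypothetical init value: start close enough to wrap-around that an
	 * ordinary boot reaches the 19-bit wrap and exercises the zap-all path */
	unsigned int start_gen = (MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;

	printf("~%u slot updates during boot; starting at generation %u leaves %u\n"
	       "updates before the mmio generation wraps\n",
	       boot_calls, start_gen, MMIO_MAX_GEN - start_gen);
	return 0;
}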

2013-06-19 11:27:49

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

On 06/19/2013 07:08 PM, Paolo Bonzini wrote:
> Il 10/06/2013 19:03, Gleb Natapov ha scritto:
>> On Mon, Jun 10, 2013 at 10:43:52PM +0900, Takuya Yoshikawa wrote:
>>> On Mon, 10 Jun 2013 16:39:37 +0800
>>> Xiao Guangrong <[email protected]> wrote:
>>>
>>>> On 06/10/2013 03:56 PM, Gleb Natapov wrote:
>>>>> On Fri, Jun 07, 2013 at 04:51:22PM +0800, Xiao Guangrong wrote:
>>>
>>>>> Looks good to me, but doesn't this obsolete kvm_mmu_zap_mmio_sptes() and
>>>>> sp->mmio_cached, so they should be removed as part of the patch series?
>>>>
>>>> Yes, i agree, they should be removed. :)
>>>
>>> I'm fine with removing it but please make it clear that you all agree
>>> on the same basis.
>>>
>>> Last time, Paolo mentioned the possibility to use some bits of spte for
>>> other things. The suggestion there was to keep sp->mmio_cached code
>>> for the time we would need to reduce the bits for generation numbers.
>>>
>>> Do you think that zap_all() is now preemptible and can treat the
>>> situation reasonably well as the current kvm_mmu_zap_mmio_sptes()?
>>>
>>> One downside is the need to zap unrelated shadow pages, but if this case
>>> is really very rare, yes I agree, it should not be a problem: it depends
>>> on how many bits we can use.
>>>
>>> Just please reconfirm.
>>>
>> That was me who mention the possibility to use some bits of spte for
>> other things. But for now I have a use for one bit only. Now that you
>> have reminded me about that discussion I am not so sure we want to
>> remove kvm_mmu_zap_mmio_sptes(), but on the other hand it is non
>> preemptable, so large number of mmio sptes can cause soft lockups.
>> zap_all() is better in this regards now.
>
> I asked Gleb on IRC, and he's fine with applying patch 7 too (otherwise
> there's hardly any benefit, because kvm_mmu_zap_mmio_sptes is
> non-preemptable).
>
> I'm also changing the -13 to -150 since it's quite easy to generate 150
> calls to KVM_SET_USER_MEMORY_REGION. Using QEMU, and for a pretty basic
> guest with virtio-net, IDE controller and VGA you get:
>
> - 9-10 calls before starting the guest, depending on the guest memory size
>
> - around 25 during the BIOS
>
> - around 20 during kernel boot
>
> - 34 during a single dump of the 64 KB ROM from a virtio-net device.

Okay. The change is fine by me. :)

2013-06-19 17:40:32

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 0/6] KVM: MMU: fast invalidate all mmio sptes

Il 07/06/2013 10:51, Xiao Guangrong ha scritto:
> Changelog:
> V3:
> All of these changes are from Gleb's review:
> 1) rename RET_MMIO_PF_EMU to RET_MMIO_PF_EMULATE.
> 2) smartly adjust kvm generation number in kvm_current_mmio_generatio()
> to avoid kvm_memslots->generation overflow.
>
> V2:
> - rename kvm_mmu_invalid_mmio_spte to kvm_mmu_invalid_mmio_sptes
> - use kvm->memslots->generation as kvm global generation-number
> - fix comment and codestyle
> - init kvm generation close to mmio wrap-around value
> - keep kvm_mmu_zap_mmio_sptes
>
> The current way is holding hot mmu-lock and walking all shadow pages, this
> is not scale. This patchset tries to introduce a very simple and scale way
> to fast invalidate all mmio sptes - it need not walk any shadow pages and hold
> any locks.
>
> The idea is simple:
> KVM maintains a global mmio valid generation-number which is stored in
> kvm->memslots.generation and every mmio spte stores the current global
> generation-number into his available bits when it is created
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte
>
> Since 19 bits are used to store generation-number on mmio spte, we zap all
> mmio sptes when the number is round
>
> Xiao Guangrong (6):
> KVM: MMU: retain more available bits on mmio spte
> KVM: MMU: store generation-number into mmio spte
> KVM: MMU: make return value of mmio page fault handler more readable
> KVM: MMU: fast invalidate all mmio sptes
> KVM: MMU: add tracepoint for check_mmio_spte
> KVM: MMU: init kvm generation close to mmio wrap-around value
>
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 131 ++++++++++++++++++++++++++++++++--------
> arch/x86/kvm/mmu.h | 17 ++++++
> arch/x86/kvm/mmutrace.h | 34 +++++++++--
> arch/x86/kvm/paging_tmpl.h | 10 ++-
> arch/x86/kvm/vmx.c | 12 ++--
> arch/x86/kvm/x86.c | 11 +++-
> 7 files changed, 177 insertions(+), 40 deletions(-)
>

Applied all to queue for now. Will graduate to next after doing some
more tests.

Thanks!

Paolo

2013-06-27 08:29:29

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
> This patch tries to introduce a very simple and scale way to invalidate
> all mmio sptes - it need not walk any shadow pages and hold mmu-lock
>
> KVM maintains a global mmio valid generation-number which is stored in
> kvm->memslots.generation and every mmio spte stores the current global
> generation-number into his available bits when it is created
>
> When KVM need zap all mmio sptes, it just simply increase the global
> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> then it walks the shadow page table and get the mmio spte. If the
> generation-number on the spte does not equal the global generation-number,
> it will go to the normal #PF handler to update the mmio spte
>
> Since 19 bits are used to store generation-number on mmio spte, we zap all
> mmio sptes when the number is round
>
So this commit makes a Fedora 9 32-bit guest reboot during boot; Fedora 9 64-bit
fails too, but I haven't checked what happens exactly.

> Signed-off-by: Xiao Guangrong <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 +-
> arch/x86/kvm/mmu.c | 54 +++++++++++++++++++++++++++++++++++------
> arch/x86/kvm/mmu.h | 5 +++-
> arch/x86/kvm/paging_tmpl.h | 7 ++++--
> arch/x86/kvm/vmx.c | 4 +++
> arch/x86/kvm/x86.c | 3 +--
> 6 files changed, 61 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1f98c1b..90d05ed 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -773,7 +773,7 @@ void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> struct kvm_memory_slot *slot,
> gfn_t gfn_offset, unsigned long mask);
> void kvm_mmu_zap_all(struct kvm *kvm);
> -void kvm_mmu_zap_mmio_sptes(struct kvm *kvm);
> +void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm);
> unsigned int kvm_mmu_calculate_mmu_pages(struct kvm *kvm);
> void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned int kvm_nr_mmu_pages);
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 044d8c0..bdc95bc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -205,9 +205,11 @@ EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
> #define MMIO_SPTE_GEN_LOW_SHIFT 3
> #define MMIO_SPTE_GEN_HIGH_SHIFT 52
>
> +#define MMIO_GEN_SHIFT 19
> #define MMIO_GEN_LOW_SHIFT 9
> #define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 1)
> -#define MMIO_MAX_GEN ((1 << 19) - 1)
> +#define MMIO_GEN_MASK ((1 << MMIO_GEN_SHIFT) - 1)
> +#define MMIO_MAX_GEN ((1 << MMIO_GEN_SHIFT) - 1)
>
> static u64 generation_mmio_spte_mask(unsigned int gen)
> {
> @@ -231,17 +233,23 @@ static unsigned int get_mmio_spte_generation(u64 spte)
> return gen;
> }
>
> +static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
> +{
> + return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
> +}
> +
> static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
> unsigned access)
> {
> struct kvm_mmu_page *sp = page_header(__pa(sptep));
> - u64 mask = generation_mmio_spte_mask(0);
> + unsigned int gen = kvm_current_mmio_generation(kvm);
> + u64 mask = generation_mmio_spte_mask(gen);
>
> access &= ACC_WRITE_MASK | ACC_USER_MASK;
> mask |= shadow_mmio_mask | access | gfn << PAGE_SHIFT;
> sp->mmio_cached = true;
>
> - trace_mark_mmio_spte(sptep, gfn, access, 0);
> + trace_mark_mmio_spte(sptep, gfn, access, gen);
> mmu_spte_set(sptep, mask);
> }
>
> @@ -271,6 +279,12 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn,
> return false;
> }
>
> +static bool check_mmio_spte(struct kvm *kvm, u64 spte)
> +{
> + return likely(get_mmio_spte_generation(spte) ==
> + kvm_current_mmio_generation(kvm));
> +}
> +
> static inline u64 rsvd_bits(int s, int e)
> {
> return ((1ULL << (e - s + 1)) - 1) << s;
> @@ -3235,6 +3249,9 @@ int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool direct)
> gfn_t gfn = get_mmio_spte_gfn(spte);
> unsigned access = get_mmio_spte_access(spte);
>
> + if (!check_mmio_spte(vcpu->kvm, spte))
> + return RET_MMIO_PF_INVALID;
> +
> if (direct)
> addr = 0;
>
> @@ -3276,8 +3293,12 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
>
> pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, gva, error_code, true);
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, gva, error_code, true);
> +
> + if (likely(r != RET_MMIO_PF_INVALID))
> + return r;
> + }
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> @@ -3353,8 +3374,12 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
> ASSERT(vcpu);
> ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, gpa, error_code, true);
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, gpa, error_code, true);
> +
> + if (likely(r != RET_MMIO_PF_INVALID))
> + return r;
> + }
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> @@ -4327,7 +4352,7 @@ void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm)
> spin_unlock(&kvm->mmu_lock);
> }
>
> -void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
> +static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
> {
> struct kvm_mmu_page *sp, *node;
> LIST_HEAD(invalid_list);
> @@ -4350,6 +4375,19 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> }
>
> +void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm)
> +{
> + /*
> + * The very rare case: if the generation-number is round,
> + * zap all shadow pages.
> + *
> + * The max value is MMIO_MAX_GEN - 1 since it is not called
> + * when mark memslot invalid.
> + */
> + if (unlikely(kvm_current_mmio_generation(kvm) >= (MMIO_MAX_GEN - 1)))
> + kvm_mmu_zap_mmio_sptes(kvm);
> +}
> +
> static int mmu_shrink(struct shrinker *shrink, struct shrink_control *sc)
> {
> struct kvm *kvm;
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index ba6a19c..5b59c57 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -56,12 +56,15 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
> /*
> * Return values of handle_mmio_page_fault_common:
> * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
> - * directly.
> + * directly.
> + * RET_MMIO_PF_INVALID: invalid spte is detected then let the real page
> + * fault path update the mmio spte.
> * RET_MMIO_PF_RETRY: let CPU fault again on the address.
> * RET_MMIO_PF_BUG: bug is detected.
> */
> enum {
> RET_MMIO_PF_EMULATE = 1,
> + RET_MMIO_PF_INVALID = 2,
> RET_MMIO_PF_RETRY = 0,
> RET_MMIO_PF_BUG = -1
> };
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index fb50fa6..7769699 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -552,9 +552,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
>
> pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
>
> - if (unlikely(error_code & PFERR_RSVD_MASK))
> - return handle_mmio_page_fault(vcpu, addr, error_code,
> + if (unlikely(error_code & PFERR_RSVD_MASK)) {
> + r = handle_mmio_page_fault(vcpu, addr, error_code,
> mmu_is_nested(vcpu));
> + if (likely(r != RET_MMIO_PF_INVALID))
> + return r;
> + };
>
> r = mmu_topup_memory_caches(vcpu);
> if (r)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 85c8d51..f4a5b3f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -5369,6 +5369,10 @@ static int handle_ept_misconfig(struct kvm_vcpu *vcpu)
> if (likely(ret == RET_MMIO_PF_EMULATE))
> return x86_emulate_instruction(vcpu, gpa, 0, NULL, 0) ==
> EMULATE_DONE;
> +
> + if (unlikely(ret == RET_MMIO_PF_INVALID))
> + return kvm_mmu_page_fault(vcpu, gpa, 0, NULL, 0);
> +
> if (unlikely(ret == RET_MMIO_PF_RETRY))
> return 1;
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 54059ba..9e4afa7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7067,8 +7067,7 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
> * If memory slot is created, or moved, we need to clear all
> * mmio sptes.
> */
> - if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE))
> - kvm_mmu_zap_mmio_sptes(kvm);
> + kvm_mmu_invalidate_mmio_sptes(kvm);
> }
>
> void kvm_arch_flush_shadow_all(struct kvm *kvm)
> --
> 1.8.1.4

--
Gleb.

2013-06-27 09:01:20

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
> On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
> > This patch tries to introduce a very simple and scale way to invalidate
> > all mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >
> > KVM maintains a global mmio valid generation-number which is stored in
> > kvm->memslots.generation and every mmio spte stores the current global
> > generation-number into his available bits when it is created
> >
> > When KVM need zap all mmio sptes, it just simply increase the global
> > generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> > then it walks the shadow page table and get the mmio spte. If the
> > generation-number on the spte does not equal the global generation-number,
> > it will go to the normal #PF handler to update the mmio spte
> >
> > Since 19 bits are used to store generation-number on mmio spte, we zap all
> > mmio sptes when the number is round
> >
> So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
> fails too, but I haven't checked what happens exactly.
>
Something wrong with gfn calculation during mmio:

qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92

--
Gleb.

2013-06-27 09:14:30

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
> On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
> > On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
> > > This patch tries to introduce a very simple and scale way to invalidate
> > > all mmio sptes - it need not walk any shadow pages and hold mmu-lock
> > >
> > > KVM maintains a global mmio valid generation-number which is stored in
> > > kvm->memslots.generation and every mmio spte stores the current global
> > > generation-number into his available bits when it is created
> > >
> > > When KVM need zap all mmio sptes, it just simply increase the global
> > > generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> > > then it walks the shadow page table and get the mmio spte. If the
> > > generation-number on the spte does not equal the global generation-number,
> > > it will go to the normal #PF handler to update the mmio spte
> > >
> > > Since 19 bits are used to store generation-number on mmio spte, we zap all
> > > mmio sptes when the number is round
> > >
> > So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
> > fails too, but I haven't checked what happens exactly.
> >
> Something wrong with gfn calculation during mmio:
>
> qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
> qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92
>
Hmm, so I wonder why get_mmio_spte_gfn() does not clear gen bits.

--
Gleb.

2013-06-27 09:21:57

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
> On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
> > On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
> > > On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
> > > > This patch tries to introduce a very simple and scale way to invalidate
> > > > all mmio sptes - it need not walk any shadow pages and hold mmu-lock
> > > >
> > > > KVM maintains a global mmio valid generation-number which is stored in
> > > > kvm->memslots.generation and every mmio spte stores the current global
> > > > generation-number into his available bits when it is created
> > > >
> > > > When KVM need zap all mmio sptes, it just simply increase the global
> > > > generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> > > > then it walks the shadow page table and get the mmio spte. If the
> > > > generation-number on the spte does not equal the global generation-number,
> > > > it will go to the normal #PF handler to update the mmio spte
> > > >
> > > > Since 19 bits are used to store generation-number on mmio spte, we zap all
> > > > mmio sptes when the number is round
> > > >
> > > So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
> > > fails too, but I haven't checked what happens exactly.
> > >
> > Something wrong with gfn calculation during mmio:
> >
> > qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
> > qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92
> >
> Hmm, so I wounder why get_mmio_spte_gfn() does not clear gen bits.
>
Hmm, something like the patch below fixes it. Will test more.


diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 1fd2c05..aec9c05 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)

static gfn_t get_mmio_spte_gfn(u64 spte)
{
- return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
+ u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
+ return (spte & ~mask) >> PAGE_SHIFT;
}

static unsigned get_mmio_spte_access(u64 spte)
--
Gleb.
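
To make the failure mode concrete, here is a small standalone sketch that contrasts
the old and the fixed masking. The generation bit layout (9 low bits at spte bits
3..11, the remaining bits at bit 52 and up) is inferred from the MMIO_SPTE_GEN_*_SHIFT
constants in patch 4/6; shadow_mmio_mask below is a placeholder, not the real arch
mask, and the generation, gfn and access values are made up for illustration.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT               12
#define MMIO_SPTE_GEN_LOW_SHIFT  3
#define MMIO_SPTE_GEN_HIGH_SHIFT 52
#define MMIO_GEN_SHIFT           19
#define MMIO_GEN_LOW_SHIFT       9
#define MMIO_GEN_LOW_MASK        ((1 << MMIO_GEN_LOW_SHIFT) - 1)
#define MMIO_MAX_GEN             ((1 << MMIO_GEN_SHIFT) - 1)

/* assumed split of the 19 generation bits, matching the shift constants above */
static uint64_t generation_mmio_spte_mask(unsigned int gen)
{
	uint64_t mask;

	mask  = (uint64_t)(gen & MMIO_GEN_LOW_MASK) << MMIO_SPTE_GEN_LOW_SHIFT;
	mask |= ((uint64_t)gen >> MMIO_GEN_LOW_SHIFT) << MMIO_SPTE_GEN_HIGH_SHIFT;
	return mask;
}

int main(void)
{
	uint64_t shadow_mmio_mask = 1ull << 51;	/* placeholder, not the real mask */
	uint64_t gfn = 0xfee00, access = 0x6;	/* illustrative mmio gfn and access */
	unsigned int gen = 600;			/* any generation >= 512 sets bit 52 */

	uint64_t spte = generation_mmio_spte_mask(gen) | shadow_mmio_mask |
			access | (gfn << PAGE_SHIFT);

	/* old helper: strips only shadow_mmio_mask, so the high generation bit
	 * at spte bit 52 survives the shift and shows up at gfn bit 40 */
	uint64_t buggy = (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;

	/* fixed helper: also strips every bit a generation number can occupy */
	uint64_t mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
	uint64_t fixed = (spte & ~mask) >> PAGE_SHIFT;

	printf("buggy gfn %llx, fixed gfn %llx\n",
	       (unsigned long long)buggy, (unsigned long long)fixed);
	return 0;
}

With these inputs the old expression prints 100000fee00, the same shape as the
corrupted gfn in the trace above, while the fixed one prints fee00.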

2013-06-27 09:50:20

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On 06/27/2013 05:21 PM, Gleb Natapov wrote:
> On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
>> On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
>>> On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
>>>> On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
>>>>> This patch tries to introduce a very simple and scale way to invalidate
>>>>> all mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>>
>>>>> KVM maintains a global mmio valid generation-number which is stored in
>>>>> kvm->memslots.generation and every mmio spte stores the current global
>>>>> generation-number into his available bits when it is created
>>>>>
>>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>>> then it walks the shadow page table and get the mmio spte. If the
>>>>> generation-number on the spte does not equal the global generation-number,
>>>>> it will go to the normal #PF handler to update the mmio spte
>>>>>
>>>>> Since 19 bits are used to store generation-number on mmio spte, we zap all
>>>>> mmio sptes when the number is round
>>>>>
>>>> So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
>>>> fails too, but I haven't checked what happens exactly.
>>>>
>>> Something wrong with gfn calculation during mmio:
>>>
>>> qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
>>> qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92
>>>
>> Hmm, so I wounder why get_mmio_spte_gfn() does not clear gen bits.
>>
> Hmm, something like patch below fixes it. Will test more.
>
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 1fd2c05..aec9c05 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
>
> static gfn_t get_mmio_spte_gfn(u64 spte)
> {
> - return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
> + u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
> + return (spte & ~mask) >> PAGE_SHIFT;
> }

Looks nice.

Gleb, thank you very much for investigating the bug and fixing my mistake.
I will be more careful in future development.

2013-06-27 10:19:37

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On Thu, Jun 27, 2013 at 05:50:08PM +0800, Xiao Guangrong wrote:
> On 06/27/2013 05:21 PM, Gleb Natapov wrote:
> > On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
> >> On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
> >>> On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
> >>>> On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
> >>>>> This patch tries to introduce a very simple and scale way to invalidate
> >>>>> all mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>>>>
> >>>>> KVM maintains a global mmio valid generation-number which is stored in
> >>>>> kvm->memslots.generation and every mmio spte stores the current global
> >>>>> generation-number into his available bits when it is created
> >>>>>
> >>>>> When KVM need zap all mmio sptes, it just simply increase the global
> >>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >>>>> then it walks the shadow page table and get the mmio spte. If the
> >>>>> generation-number on the spte does not equal the global generation-number,
> >>>>> it will go to the normal #PF handler to update the mmio spte
> >>>>>
> >>>>> Since 19 bits are used to store generation-number on mmio spte, we zap all
> >>>>> mmio sptes when the number is round
> >>>>>
> >>>> So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
> >>>> fails too, but I haven't checked what happens exactly.
> >>>>
> >>> Something wrong with gfn calculation during mmio:
> >>>
> >>> qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
> >>> qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92
> >>>
> >> Hmm, so I wounder why get_mmio_spte_gfn() does not clear gen bits.
> >>
> > Hmm, something like patch below fixes it. Will test more.
> >
> >
> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index 1fd2c05..aec9c05 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
> >
> > static gfn_t get_mmio_spte_gfn(u64 spte)
> > {
> > - return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
> > + u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
> > + return (spte & ~mask) >> PAGE_SHIFT;
> > }
>
> Looks nice.
>
The question is whether get_mmio_spte_access() needs the same treatment?

--
Gleb.

2013-06-27 11:05:31

by Xiao Guangrong

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On 06/27/2013 06:19 PM, Gleb Natapov wrote:
> On Thu, Jun 27, 2013 at 05:50:08PM +0800, Xiao Guangrong wrote:
>> On 06/27/2013 05:21 PM, Gleb Natapov wrote:
>>> On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
>>>> On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
>>>>> On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
>>>>>> On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
>>>>>>> This patch tries to introduce a very simple and scale way to invalidate
>>>>>>> all mmio sptes - it need not walk any shadow pages and hold mmu-lock
>>>>>>>
>>>>>>> KVM maintains a global mmio valid generation-number which is stored in
>>>>>>> kvm->memslots.generation and every mmio spte stores the current global
>>>>>>> generation-number into his available bits when it is created
>>>>>>>
>>>>>>> When KVM need zap all mmio sptes, it just simply increase the global
>>>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
>>>>>>> then it walks the shadow page table and get the mmio spte. If the
>>>>>>> generation-number on the spte does not equal the global generation-number,
>>>>>>> it will go to the normal #PF handler to update the mmio spte
>>>>>>>
>>>>>>> Since 19 bits are used to store generation-number on mmio spte, we zap all
>>>>>>> mmio sptes when the number is round
>>>>>>>
>>>>>> So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
>>>>>> fails too, but I haven't checked what happens exactly.
>>>>>>
>>>>> Something wrong with gfn calculation during mmio:
>>>>>
>>>>> qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
>>>>> qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92
>>>>>
>>>> Hmm, so I wounder why get_mmio_spte_gfn() does not clear gen bits.
>>>>
>>> Hmm, something like patch below fixes it. Will test more.
>>>
>>>
>>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>>> index 1fd2c05..aec9c05 100644
>>> --- a/arch/x86/kvm/mmu.c
>>> +++ b/arch/x86/kvm/mmu.c
>>> @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
>>>
>>> static gfn_t get_mmio_spte_gfn(u64 spte)
>>> {
>>> - return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
>>> + u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
>>> + return (spte & ~mask) >> PAGE_SHIFT;
>>> }
>>
>> Looks nice.
>>
> The question is if get_mmio_spte_access() need the same treatment?

It works okay since the access only uses bit 1 and bit 2 (and in the direct mmu
case, only the gfn is used). But I am happy to make the same change in
get_mmio_spte_access() to make the code clearer.

2013-06-27 11:10:25

by Gleb Natapov

[permalink] [raw]
Subject: Re: [PATCH v3 4/6] KVM: MMU: fast invalidate all mmio sptes

On Thu, Jun 27, 2013 at 07:05:20PM +0800, Xiao Guangrong wrote:
> On 06/27/2013 06:19 PM, Gleb Natapov wrote:
> > On Thu, Jun 27, 2013 at 05:50:08PM +0800, Xiao Guangrong wrote:
> >> On 06/27/2013 05:21 PM, Gleb Natapov wrote:
> >>> On Thu, Jun 27, 2013 at 12:14:24PM +0300, Gleb Natapov wrote:
> >>>> On Thu, Jun 27, 2013 at 12:01:10PM +0300, Gleb Natapov wrote:
> >>>>> On Thu, Jun 27, 2013 at 11:29:00AM +0300, Gleb Natapov wrote:
> >>>>>> On Fri, Jun 07, 2013 at 04:51:26PM +0800, Xiao Guangrong wrote:
> >>>>>>> This patch tries to introduce a very simple and scale way to invalidate
> >>>>>>> all mmio sptes - it need not walk any shadow pages and hold mmu-lock
> >>>>>>>
> >>>>>>> KVM maintains a global mmio valid generation-number which is stored in
> >>>>>>> kvm->memslots.generation and every mmio spte stores the current global
> >>>>>>> generation-number into his available bits when it is created
> >>>>>>>
> >>>>>>> When KVM need zap all mmio sptes, it just simply increase the global
> >>>>>>> generation-number. When guests do mmio access, KVM intercepts a MMIO #PF
> >>>>>>> then it walks the shadow page table and get the mmio spte. If the
> >>>>>>> generation-number on the spte does not equal the global generation-number,
> >>>>>>> it will go to the normal #PF handler to update the mmio spte
> >>>>>>>
> >>>>>>> Since 19 bits are used to store generation-number on mmio spte, we zap all
> >>>>>>> mmio sptes when the number is round
> >>>>>>>
> >>>>>> So this commit makes Fedora 9 32 bit reboot during boot, Fedora 9 64
> >>>>>> fails too, but I haven't checked what happens exactly.
> >>>>>>
> >>>>> Something wrong with gfn calculation during mmio:
> >>>>>
> >>>>> qemu-system-x86-17003 [000] 3962.625103: handle_mmio_page_fault: addr:c00ba6c0 gfn 100000000ba access a92
> >>>>> qemu-system-x86-17003 [000] 3962.774862: handle_mmio_page_fault: addr:ffffb170 gfn 100000fee00 access a92
> >>>>>
> >>>> Hmm, so I wounder why get_mmio_spte_gfn() does not clear gen bits.
> >>>>
> >>> Hmm, something like patch below fixes it. Will test more.
> >>>
> >>>
> >>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> >>> index 1fd2c05..aec9c05 100644
> >>> --- a/arch/x86/kvm/mmu.c
> >>> +++ b/arch/x86/kvm/mmu.c
> >>> @@ -260,7 +260,8 @@ static bool is_mmio_spte(u64 spte)
> >>>
> >>> static gfn_t get_mmio_spte_gfn(u64 spte)
> >>> {
> >>> - return (spte & ~shadow_mmio_mask) >> PAGE_SHIFT;
> >>> + u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;
> >>> + return (spte & ~mask) >> PAGE_SHIFT;
> >>> }
> >>
> >> Looks nice.
> >>
> > The question is if get_mmio_spte_access() need the same treatment?
>
> It works okay since the Access only uses bit1 and bit2 (and in the direct mmu
> case, only use gfn). But i am happy to do the same change in get_mmio_spte_access()
> to make the code more clear.
>
It will fix the output of handle_mmio_page_fault at least. Currently we have "access a92" there.

--
Gleb.
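
For reference, a sketch of the analogous change being discussed here. The current
body of get_mmio_spte_access() is not quoted in this thread, so the pre-change
expression is assumed to mask only shadow_mmio_mask and keep the low in-page bits;
the stray bits in "access a92" all sit in the 3..11 range where the low generation
bits live, which is consistent with that assumption and is exactly what the wider
mask would strip.

static unsigned get_mmio_spte_access(u64 spte)
{
	/* mirror the get_mmio_spte_gfn() fix: strip the mmio marker and every
	 * bit the generation number can occupy before reading the access bits */
	u64 mask = generation_mmio_spte_mask(MMIO_MAX_GEN) | shadow_mmio_mask;

	return (spte & ~mask) & ~PAGE_MASK;
}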