2020-06-22 20:12:55

by Sean Christopherson

Subject: [PATCH v2 00/21] KVM: Cleanup and unify kvm_mmu_memory_cache usage

Note, patch 18 will conflict with the p4d rework in 5.8. I originally
stated I would send v2 only after that got pulled into Paolo's tree, but
I got my timing wrong, i.e. I was thinking that would have already
happened. I'll send v3 if necessary. I wanted to get v2 out there now
that I actually compile tested other architectures.

Marc, I interpreted your "nothing caught fire" as Tested-by for the arm64
patches, let me know if that's not what you intended.


This series resurrects Christoffer Dall's series[1] to provide a common
MMU memory cache implementation that can be shared by x86, arm64 and MIPS.

It also picks up a suggested change from Ben Gardon[2] to clear shadow
page tables during initial allocation so as to avoid clearing entire
pages while holding mmu_lock.

The front half of the series does housecleaning on x86's memory cache
implementation in preparation for moving it to common code, along with a
fair bit of cleanup on the usage. The middle chunk moves the helpers to
common KVM code, and the last two chunks convert arm64 and MIPS to the
common implementation.

Fully tested on x86 only. Compile tested patches 14-21 on arm64, MIPS,
s390 and PowerPC.

v2:
- Rebase to kvm-5.8-2, commit 49b3deaad345 ("Merge tag ...").
- Use an asm-generic kvm_types.h for s390 and PowerPC instead of an
empty arch-specific file. [Marc]
- Explicitly document "GFP_PGTABLE_USER == GFP_KERNEL_ACCOUNT | GFP_ZERO"
in the arm64 conversion patch. [Marc]
- Collect review tags. [Ben]

[1] https://lkml.kernel.org/r/20191105110357.8607-1-christoffer.dall@arm
[2] https://lkml.kernel.org/r/[email protected]

Sean Christopherson (21):
KVM: x86/mmu: Track the associated kmem_cache in the MMU caches
KVM: x86/mmu: Consolidate "page" variant of memory cache helpers
KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals
KVM: x86/mmu: Remove superfluous gotos from mmu_topup_memory_caches()
KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty
KVM: x86/mmu: Move fast_page_fault() call above
mmu_topup_memory_caches()
KVM: x86/mmu: Topup memory caches after walking GVA->GPA
KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()
KVM: x86/mmu: Separate the memory caches for shadow pages and gfn
arrays
KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache
KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)
KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU
topups
KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be
global
KVM: Move x86's version of struct kvm_mmu_memory_cache to common code
KVM: Move x86's MMU memory cache helpers to common KVM code
KVM: arm64: Drop @max param from mmu_topup_memory_cache()
KVM: arm64: Use common code's approach for __GFP_ZERO with memory
caches
KVM: arm64: Use common KVM implementation of MMU memory caches
KVM: MIPS: Drop @max param from mmu_topup_memory_cache()
KVM: MIPS: Account pages used for GPA page tables
KVM: MIPS: Use common KVM implementation of MMU memory caches

arch/arm64/include/asm/kvm_host.h | 11 ---
arch/arm64/include/asm/kvm_types.h | 8 ++
arch/arm64/kvm/arm.c | 2 +
arch/arm64/kvm/mmu.c | 54 +++---------
arch/mips/include/asm/kvm_host.h | 11 ---
arch/mips/include/asm/kvm_types.h | 7 ++
arch/mips/kvm/mmu.c | 44 ++--------
arch/powerpc/include/asm/Kbuild | 1 +
arch/s390/include/asm/Kbuild | 1 +
arch/x86/include/asm/kvm_host.h | 14 +---
arch/x86/include/asm/kvm_types.h | 7 ++
arch/x86/kvm/mmu/mmu.c | 129 +++++++++--------------------
arch/x86/kvm/mmu/paging_tmpl.h | 10 +--
include/asm-generic/kvm_types.h | 5 ++
include/linux/kvm_host.h | 7 ++
include/linux/kvm_types.h | 19 +++++
virt/kvm/kvm_main.c | 55 ++++++++++++
17 files changed, 175 insertions(+), 210 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_types.h
create mode 100644 arch/mips/include/asm/kvm_types.h
create mode 100644 arch/x86/include/asm/kvm_types.h
create mode 100644 include/asm-generic/kvm_types.h

--
2.26.0


2020-06-22 20:13:06

by Sean Christopherson

Subject: [PATCH v2 15/21] KVM: Move x86's MMU memory cache helpers to common KVM code

Move x86's memory cache helpers to common KVM code so that they can be
reused by arm64 and MIPS in future patches.
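
As a rough illustration of the semantics being moved, here is a minimal
userspace sketch of the topup/alloc/free life cycle. It is not the kernel
code: calloc()/free() stand in for the kernel allocators, NR_OBJS and
OBJ_SIZE are made-up values, and alloc_budget is a hypothetical failure
injection knob. The point it shows is that a topup fills the cache to
capacity but only reports -ENOMEM if fewer than @min objects are
available.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NR_OBJS  4               /* stand-in for the per-arch cache capacity */
#define OBJ_SIZE 64              /* hypothetical object size */

struct memory_cache {
	int nobjs;
	void *objects[NR_OBJS];
};

/* Hypothetical failure injection: allocations still allowed to succeed. */
int alloc_budget = NR_OBJS;

void *alloc_obj(void)
{
	if (alloc_budget <= 0)
		return NULL;
	alloc_budget--;
	return calloc(1, OBJ_SIZE);
}

/* Try to fill the cache; fail only if fewer than @min objects are cached. */
int cache_topup(struct memory_cache *mc, int min)
{
	void *obj;

	if (mc->nobjs >= min)
		return 0;
	while (mc->nobjs < NR_OBJS) {
		obj = alloc_obj();
		if (!obj)
			return mc->nobjs >= min ? 0 : -ENOMEM;
		mc->objects[mc->nobjs++] = obj;
	}
	return 0;
}

int cache_nr_free_objects(const struct memory_cache *mc)
{
	return mc->nobjs;
}

/*
 * Pop one object. The caller must have topped up beforehand; the kernel
 * helper WARNs and falls back to an atomic allocation instead of asserting.
 */
void *cache_alloc(struct memory_cache *mc)
{
	assert(mc->nobjs > 0);
	return mc->objects[--mc->nobjs];
}

/* Return every object still held by the cache to the allocator. */
void cache_free(struct memory_cache *mc)
{
	while (mc->nobjs)
		free(mc->objects[--mc->nobjs]);
}
```

A caller would top up before entering its critical section, pop objects
inside it, and free whatever is left over when the owner is torn down.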

Suggested-by: Christoffer Dall <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 53 --------------------------------------
include/linux/kvm_host.h | 7 +++++
virt/kvm/kvm_main.c | 55 ++++++++++++++++++++++++++++++++++++++++
3 files changed, 62 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b85d3e8e8403..a627437f73fd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,47 +1060,6 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
}

-static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
- gfp_t gfp_flags)
-{
- gfp_flags |= mc->gfp_zero;
-
- if (mc->kmem_cache)
- return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
- else
- return (void *)__get_free_page(gfp_flags);
-}
-
-static int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
-{
- void *obj;
-
- if (mc->nobjs >= min)
- return 0;
- while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
- obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
- if (!obj)
- return mc->nobjs >= min ? 0 : -ENOMEM;
- mc->objects[mc->nobjs++] = obj;
- }
- return 0;
-}
-
-static int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
-{
- return mc->nobjs;
-}
-
-static void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
- while (mc->nobjs) {
- if (mc->kmem_cache)
- kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
- else
- free_page((unsigned long)mc->objects[--mc->nobjs]);
- }
-}
-
static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
{
int r;
@@ -1132,18 +1091,6 @@ static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}

-static void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
- void *p;
-
- if (WARN_ON(!mc->nobjs))
- p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
- else
- p = mc->objects[--mc->nobjs];
- BUG_ON(!p);
- return p;
-}
-
static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
{
return kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 62ec926c78a0..d35e397dad6a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -816,6 +816,13 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu, bool usermode_vcpu_not_eligible);
void kvm_flush_remote_tlbs(struct kvm *kvm);
void kvm_reload_remote_mmus(struct kvm *kvm);

+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min);
+int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc);
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc);
+void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
+#endif
+
bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
struct kvm_vcpu *except,
unsigned long *vcpu_bitmap, cpumask_var_t tmp);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 7b6013f2ba19..9f019b552dcf 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -341,6 +341,61 @@ void kvm_reload_remote_mmus(struct kvm *kvm)
kvm_make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD);
}

+#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
+static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
+ gfp_t gfp_flags)
+{
+ gfp_flags |= mc->gfp_zero;
+
+ if (mc->kmem_cache)
+ return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
+ else
+ return (void *)__get_free_page(gfp_flags);
+}
+
+int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
+{
+ void *obj;
+
+ if (mc->nobjs >= min)
+ return 0;
+ while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
+ obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
+ if (!obj)
+ return mc->nobjs >= min ? 0 : -ENOMEM;
+ mc->objects[mc->nobjs++] = obj;
+ }
+ return 0;
+}
+
+int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
+{
+ return mc->nobjs;
+}
+
+void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+{
+ while (mc->nobjs) {
+ if (mc->kmem_cache)
+ kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
+ else
+ free_page((unsigned long)mc->objects[--mc->nobjs]);
+ }
+}
+
+void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+{
+ void *p;
+
+ if (WARN_ON(!mc->nobjs))
+ p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
+ else
+ p = mc->objects[--mc->nobjs];
+ BUG_ON(!p);
+ return p;
+}
+#endif
+
static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
{
mutex_init(&vcpu->mutex);
--
2.26.0

2020-06-22 20:13:15

by Sean Christopherson

Subject: [PATCH v2 13/21] KVM: x86/mmu: Prepend "kvm_" to memory cache helpers that will be global

Rename the memory cache helpers that will soon be moved to common code
and be made globally available via linux/kvm_host.h. "mmu" alone is not
a sufficient namespace for globally available KVM symbols.

Opportunistically add "nr_" in mmu_memory_cache_free_objects() to make
it clear the function returns the number of free objects, as opposed to
freeing existing objects.

Suggested-by: Christoffer Dall <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 42 +++++++++++++++++++++---------------------
1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8d66cf558f1b..b85d3e8e8403 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1071,7 +1071,7 @@ static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
return (void *)__get_free_page(gfp_flags);
}

-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
+static int kvm_mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
{
void *obj;

@@ -1086,12 +1086,12 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
return 0;
}

-static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *mc)
+static int kvm_mmu_memory_cache_nr_free_objects(struct kvm_mmu_memory_cache *mc)
{
return mc->nobjs;
}

-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
+static void kvm_mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
{
while (mc->nobjs) {
if (mc->kmem_cache)
@@ -1106,33 +1106,33 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
int r;

/* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
- 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
+ r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
+ 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
- PT64_ROOT_MAX_LEVEL);
+ r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+ PT64_ROOT_MAX_LEVEL);
if (r)
return r;
if (maybe_indirect) {
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
- PT64_ROOT_MAX_LEVEL);
+ r = kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+ PT64_ROOT_MAX_LEVEL);
if (r)
return r;
}
- return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
- PT64_ROOT_MAX_LEVEL);
+ return kvm_mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+ PT64_ROOT_MAX_LEVEL);
}

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
- mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
- mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
- mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
- mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
+ kvm_mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
+ kvm_mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+ kvm_mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
+ kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}

-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
+static void *kvm_mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
{
void *p;

@@ -1146,7 +1146,7 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)

static struct pte_list_desc *mmu_alloc_pte_list_desc(struct kvm_vcpu *vcpu)
{
- return mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
+ return kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_pte_list_desc_cache);
}

static void mmu_free_pte_list_desc(struct pte_list_desc *pte_list_desc)
@@ -1417,7 +1417,7 @@ static bool rmap_can_add(struct kvm_vcpu *vcpu)
struct kvm_mmu_memory_cache *mc;

mc = &vcpu->arch.mmu_pte_list_desc_cache;
- return mmu_memory_cache_free_objects(mc);
+ return kvm_mmu_memory_cache_nr_free_objects(mc);
}

static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
@@ -2104,10 +2104,10 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
{
struct kvm_mmu_page *sp;

- sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
- sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
+ sp = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
+ sp->spt = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
if (!direct)
- sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
+ sp->gfns = kvm_mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);

/*
--
2.26.0

2020-06-22 20:13:28

by Sean Christopherson

Subject: [PATCH v2 12/21] KVM: x86/mmu: Skip filling the gfn cache for guaranteed direct MMU topups

Don't bother filling the gfn array cache when the caller is a fully
direct MMU, i.e. won't need a gfn array for shadow pages.
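
The effect of the new parameter can be sketched in userspace as follows.
This is a simplified model, not the kernel code: the caches only track an
object count, CACHE_CAPACITY is a made-up value, and the minimums passed
to topup() are placeholders rather than the PT64_ROOT_MAX_LEVEL based
values used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define CACHE_CAPACITY 8         /* hypothetical capacity */

/* Only the object count is modeled; the real caches hold pointers. */
struct counter_cache {
	int nobjs;
};

struct vcpu_caches {
	struct counter_cache pte_list_desc;
	struct counter_cache shadow_page;
	struct counter_cache gfn_array;
	struct counter_cache page_header;
};

/* Refill to capacity when below @min. Cannot fail in this model. */
void topup(struct counter_cache *mc, int min)
{
	assert(min <= CACHE_CAPACITY);
	if (mc->nobjs < min)
		mc->nobjs = CACHE_CAPACITY;
}

/* The gfn array cache is filled only if an indirect shadow page may be needed. */
void topup_caches(struct vcpu_caches *c, bool maybe_indirect)
{
	topup(&c->pte_list_desc, 1);
	topup(&c->shadow_page, 1);
	if (maybe_indirect)
		topup(&c->gfn_array, 1);
	topup(&c->page_header, 1);
}
```

Callers that cannot rule out indirect shadow pages pass true, which
preserves the previous behavior for them.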

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 18 ++++++++++--------
arch/x86/kvm/mmu/paging_tmpl.h | 4 ++--
2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index a8f8eebf67df..8d66cf558f1b 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1101,7 +1101,7 @@ static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
}
}

-static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
+static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu, bool maybe_indirect)
{
int r;

@@ -1114,10 +1114,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
PT64_ROOT_MAX_LEVEL);
if (r)
return r;
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
- PT64_ROOT_MAX_LEVEL);
- if (r)
- return r;
+ if (maybe_indirect) {
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+ PT64_ROOT_MAX_LEVEL);
+ if (r)
+ return r;
+ }
return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
PT64_ROOT_MAX_LEVEL);
}
@@ -4107,7 +4109,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (fast_page_fault(vcpu, gpa, error_code))
return RET_PF_RETRY;

- r = mmu_topup_memory_caches(vcpu);
+ r = mmu_topup_memory_caches(vcpu, false);
if (r)
return r;

@@ -5147,7 +5149,7 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu)
{
int r;

- r = mmu_topup_memory_caches(vcpu);
+ r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map);
if (r)
goto out;
r = mmu_alloc_roots(vcpu);
@@ -5341,7 +5343,7 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
* or not since pte prefetch is skiped if it does not have
* enough objects in the cache.
*/
- mmu_topup_memory_caches(vcpu);
+ mmu_topup_memory_caches(vcpu, true);

spin_lock(&vcpu->kvm->mmu_lock);

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 3de32122f601..ac39710d0594 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -818,7 +818,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
return RET_PF_EMULATE;
}

- r = mmu_topup_memory_caches(vcpu);
+ r = mmu_topup_memory_caches(vcpu, true);
if (r)
return r;

@@ -905,7 +905,7 @@ static void FNAME(invlpg)(struct kvm_vcpu *vcpu, gva_t gva, hpa_t root_hpa)
* No need to check return value here, rmap_can_add() can
* help us to skip pte prefetch later.
*/
- mmu_topup_memory_caches(vcpu);
+ mmu_topup_memory_caches(vcpu, true);

if (!VALID_PAGE(root_hpa)) {
WARN_ON(1);
--
2.26.0

2020-06-22 20:13:31

by Sean Christopherson

Subject: [PATCH v2 01/21] KVM: x86/mmu: Track the associated kmem_cache in the MMU caches

Track the kmem_cache used for non-page KVM MMU memory caches instead of
passing in the associated kmem_cache when filling the cache. This will
allow consolidating code and other cleanups.

No functional change intended.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 24 +++++++++++-------------
2 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f8998e97457f..7b6ac8fad9c2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -251,6 +251,7 @@ struct kvm_kernel_irq_routing_entry;
*/
struct kvm_mmu_memory_cache {
int nobjs;
+ struct kmem_cache *kmem_cache;
void *objects[KVM_NR_MEM_OBJS];
};

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fdd05c233308..0830c195c9ed 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,15 +1060,14 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
}

-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- struct kmem_cache *base_cache, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
{
void *obj;

if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- obj = kmem_cache_zalloc(base_cache, GFP_KERNEL_ACCOUNT);
+ obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
if (!obj)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = obj;
@@ -1081,11 +1080,10 @@ static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
return cache->nobjs;
}

-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc,
- struct kmem_cache *cache)
+static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
{
while (mc->nobjs)
- kmem_cache_free(cache, mc->objects[--mc->nobjs]);
+ kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
}

static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
@@ -1115,25 +1113,22 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
int r;

r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
- pte_list_desc_cache, 8 + PTE_PREFETCH_NUM);
+ 8 + PTE_PREFETCH_NUM);
if (r)
goto out;
r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
- mmu_page_header_cache, 4);
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
out:
return r;
}

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
- mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
- pte_list_desc_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
- mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache,
- mmu_page_header_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}

static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
@@ -5684,6 +5679,9 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
uint i;
int ret;

+ vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
+ vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+
vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;

--
2.26.0

2020-06-22 20:13:43

by Sean Christopherson

Subject: [PATCH v2 11/21] KVM: x86/mmu: Zero allocate shadow pages (outside of mmu_lock)

Set __GFP_ZERO for the shadow page memory cache and drop the explicit
clear_page() from kvm_mmu_get_page(). This moves the cost of zeroing a
page to the allocation time of the physical page, i.e. when topping up
the memory caches, and thus avoids having to zero out an entire page
while holding mmu_lock.
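
A minimal userspace sketch of the idea, under stated assumptions:
calloc()/malloc() stand in for page allocation with and without
__GFP_ZERO, the boolean "zero" field stands in for gfp_zero, and the lock
is only indicated in comments. The names page_cache_topup(),
page_cache_pop() and page_is_zero() are illustrative, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

#define PAGE_BYTES 4096
#define NR_OBJS    4

struct page_cache {
	int nobjs;
	bool zero;               /* stand-in for gfp_zero = __GFP_ZERO */
	void *objects[NR_OBJS];
};

/* Runs before the lock is taken, so zeroing here costs no lock hold time. */
int page_cache_topup(struct page_cache *mc, int min)
{
	if (mc->nobjs >= min)
		return 0;
	while (mc->nobjs < NR_OBJS) {
		void *page = mc->zero ? calloc(1, PAGE_BYTES)
				      : malloc(PAGE_BYTES);

		if (!page)
			return mc->nobjs >= min ? 0 : -ENOMEM;
		mc->objects[mc->nobjs++] = page;
	}
	return 0;
}

/* Runs with the lock held: a pointer pop, with no clearing of the page. */
void *page_cache_pop(struct page_cache *mc)
{
	assert(mc->nobjs > 0);
	return mc->objects[--mc->nobjs];
}

/* Checking helper for the example; not part of the pattern itself. */
bool page_is_zero(const void *page)
{
	const unsigned char *p = page;

	for (size_t i = 0; i < PAGE_BYTES; i++)
		if (p[i])
			return false;
	return true;
}
```

The consumer gets an already zeroed page without doing the clearing work
inside its critical section.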

Cc: Peter Feiner <[email protected]>
Cc: Peter Shier <[email protected]>
Cc: Junaid Shahid <[email protected]>
Cc: Jim Mattson <[email protected]>
Suggested-by: Ben Gardon <[email protected]>
Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6b0ec9060786..a8f8eebf67df 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2545,7 +2545,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
if (level > PG_LEVEL_4K && need_sync)
flush |= kvm_sync_pages(vcpu, gfn, &invalid_list);
}
- clear_page(sp->spt);
trace_kvm_mmu_get_page(sp, true);

kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
@@ -5687,6 +5686,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;

+ vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO;
+
vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;

--
2.26.0

2020-06-22 20:14:22

by Sean Christopherson

Subject: [PATCH v2 07/21] KVM: x86/mmu: Topup memory caches after walking GVA->GPA

Topup memory caches after walking the GVA->GPA translation during a
shadow page fault; there is no need to ensure the caches are full when
walking the GVA. As of commit f5a1e9f89504f ("KVM: MMU: remove call
to kvm_mmu_pte_write from walk_addr"), the FNAME(walk_addr) flow no
longer adds rmaps via kvm_mmu_pte_write().

This avoids allocating memory in the case that the GVA is unmapped in
the guest, and also provides a paper trail of why/when the memory caches
need to be filled.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/paging_tmpl.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 38c576495048..3de32122f601 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -791,10 +791,6 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,

pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

- r = mmu_topup_memory_caches(vcpu);
- if (r)
- return r;
-
/*
* If PFEC.RSVD is set, this is a shadow page fault.
* The bit needs to be cleared before walking guest page tables.
@@ -822,6 +818,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
return RET_PF_EMULATE;
}

+ r = mmu_topup_memory_caches(vcpu);
+ if (r)
+ return r;
+
vcpu->arch.write_fault_to_shadow_pgtable = false;

is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
--
2.26.0

2020-06-22 20:14:27

by Sean Christopherson

Subject: [PATCH v2 06/21] KVM: x86/mmu: Move fast_page_fault() call above mmu_topup_memory_caches()

Avoid refilling the memory caches and potentially slow reclaim/swap when
handling a fast page fault, which does not need to allocate any new
objects.
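
The reordering can be sketched in userspace as below. All names here
(try_fast_fault(), topup_caches(), handle_fault(), the fault_result
values) are hypothetical stand-ins chosen for the example; the only thing
the sketch is meant to show is that the allocation step is skipped
entirely when the fast path resolves the fault.

```c
#include <assert.h>
#include <stdbool.h>

enum fault_result { FAULT_RETRY = 0, FAULT_HANDLED = 1 };

struct fault_stats {
	int topups;              /* how many times the caches were refilled */
};

/* Hypothetical fast path: succeeds when the fault needs no new objects. */
bool try_fast_fault(bool fixable_without_alloc)
{
	return fixable_without_alloc;
}

/* Stand-in for the potentially slow cache refill. Always succeeds here. */
int topup_caches(struct fault_stats *st)
{
	st->topups++;
	return 0;
}

int handle_fault(struct fault_stats *st, bool fixable_without_alloc)
{
	int r;

	assert(st);

	/* Fast path first: it allocates nothing, so no refill is needed. */
	if (try_fast_fault(fixable_without_alloc))
		return FAULT_RETRY;

	/* Slow path: only now pay for filling the caches. */
	r = topup_caches(st);
	if (r)
		return r;

	return FAULT_HANDLED;
}
```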

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 5e773564ab20..4b4c3234d623 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4095,6 +4095,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (page_fault_handle_page_track(vcpu, error_code, gfn))
return RET_PF_EMULATE;

+ if (fast_page_fault(vcpu, gpa, error_code))
+ return RET_PF_RETRY;
+
r = mmu_topup_memory_caches(vcpu);
if (r)
return r;
@@ -4102,9 +4105,6 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
if (lpage_disallowed)
max_level = PG_LEVEL_4K;

- if (fast_page_fault(vcpu, gpa, error_code))
- return RET_PF_RETRY;
-
mmu_seq = vcpu->kvm->mmu_notifier_seq;
smp_rmb();

--
2.26.0

2020-06-22 20:14:32

by Sean Christopherson

Subject: [PATCH v2 18/21] KVM: arm64: Use common KVM implementation of MMU memory caches

Move to the common MMU memory cache implementation now that the common
code and arm64's existing code are semantically compatible.
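
For orientation, the opt-in mechanism can be sketched in userspace
roughly as follows. The capacity value 40 comes from the arm64 header in
this patch, and the field order is inferred from the positional
initializer "{ 0, __GFP_ZERO, NULL, }" used in kvm_phys_addr_ioremap();
the common struct definition itself is not shown in this mail, so treat
the layout as an assumption. gfp_t, FAKE_GFP_ZERO, cache_capacity() and
cache_push() are stand-ins invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Would live in the arch's asm/kvm_types.h; 40 is arm64's value here. */
#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40

/* Userspace stand-ins so the sketch compiles outside the kernel. */
typedef unsigned int gfp_t;
struct kmem_cache;
#define FAKE_GFP_ZERO 0x1u       /* hypothetical, not the kernel's __GFP_ZERO */

/* Would live in common code, visible only to architectures that opt in. */
#ifdef KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE
struct kvm_mmu_memory_cache {
	int nobjs;
	gfp_t gfp_zero;
	struct kmem_cache *kmem_cache;
	void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE];
};

/* Capacity follows from the arch-provided define. */
size_t cache_capacity(const struct kvm_mmu_memory_cache *mc)
{
	return sizeof(mc->objects) / sizeof(mc->objects[0]);
}

/* Add one object; the caller must not exceed the capacity. */
void cache_push(struct kvm_mmu_memory_cache *mc, void *obj)
{
	assert((size_t)mc->nobjs < cache_capacity(mc));
	mc->objects[mc->nobjs++] = obj;
}
#endif
```

An architecture that does not define the capacity macro never sees the
struct or the helpers, which matches the #ifdef guards added around the
prototypes in linux/kvm_host.h.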

No functional change intended.

Suggested-by: Christoffer Dall <[email protected]>
Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/include/asm/Kbuild | 1 -
arch/arm64/include/asm/kvm_host.h | 12 -------
arch/arm64/include/asm/kvm_types.h | 8 +++++
arch/arm64/kvm/mmu.c | 51 ++++++------------------------
4 files changed, 18 insertions(+), 54 deletions(-)
create mode 100644 arch/arm64/include/asm/kvm_types.h

diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 35a68155cd0e..ff9cbb631212 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -1,6 +1,5 @@
# SPDX-License-Identifier: GPL-2.0
generic-y += early_ioremap.h
-generic-y += kvm_types.h
generic-y += local64.h
generic-y += mcs_spinlock.h
generic-y += qrwlock.h
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 335170b59899..23d1f41548f5 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -97,18 +97,6 @@ struct kvm_arch {
bool return_nisv_io_abort_to_user;
};

-#define KVM_NR_MEM_OBJS 40
-
-/*
- * We don't want allocation failures within the mmu code, so we preallocate
- * enough memory for a single page fault in a cache.
- */
-struct kvm_mmu_memory_cache {
- int nobjs;
- gfp_t gfp_zero;
- void *objects[KVM_NR_MEM_OBJS];
-};
-
struct kvm_vcpu_fault_info {
u32 esr_el2; /* Hyp Syndrom Register */
u64 far_el2; /* Hyp Fault Address Register */
diff --git a/arch/arm64/include/asm/kvm_types.h b/arch/arm64/include/asm/kvm_types.h
new file mode 100644
index 000000000000..9a126b9e2d7c
--- /dev/null
+++ b/arch/arm64/include/asm/kvm_types.h
@@ -0,0 +1,8 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_ARM64_KVM_TYPES_H
+#define _ASM_ARM64_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE 40
+
+#endif /* _ASM_ARM64_KVM_TYPES_H */
+
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 688213ef34f0..976405e2fbb2 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -124,37 +124,6 @@ static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
put_page(virt_to_page(pudp));
}

-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
-{
- void *page;
-
- if (cache->nobjs >= min)
- return 0;
- while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT |
- cache->gfp_zero);
- if (!page)
- return -ENOMEM;
- cache->objects[cache->nobjs++] = page;
- }
- return 0;
-}
-
-static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
-{
- while (mc->nobjs)
- free_page((unsigned long)mc->objects[--mc->nobjs]);
-}
-
-static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
-{
- void *p;
-
- BUG_ON(!mc || !mc->nobjs);
- p = mc->objects[--mc->nobjs];
- return p;
-}
-
static void clear_stage2_pgd_entry(struct kvm *kvm, pgd_t *pgd, phys_addr_t addr)
{
pud_t *pud_table __maybe_unused = stage2_pud_offset(kvm, pgd, 0UL);
@@ -1024,7 +993,7 @@ static pud_t *stage2_get_pud(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
if (stage2_pgd_none(kvm, *pgd)) {
if (!cache)
return NULL;
- pud = mmu_memory_cache_alloc(cache);
+ pud = kvm_mmu_memory_cache_alloc(cache);
stage2_pgd_populate(kvm, pgd, pud);
get_page(virt_to_page(pgd));
}
@@ -1045,7 +1014,7 @@ static pmd_t *stage2_get_pmd(struct kvm *kvm, struct kvm_mmu_memory_cache *cache
if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return NULL;
- pmd = mmu_memory_cache_alloc(cache);
+ pmd = kvm_mmu_memory_cache_alloc(cache);
stage2_pud_populate(kvm, pud, pmd);
get_page(virt_to_page(pud));
}
@@ -1251,7 +1220,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
if (stage2_pud_none(kvm, *pud)) {
if (!cache)
return 0; /* ignore calls from kvm_set_spte_hva */
- pmd = mmu_memory_cache_alloc(cache);
+ pmd = kvm_mmu_memory_cache_alloc(cache);
stage2_pud_populate(kvm, pud, pmd);
get_page(virt_to_page(pud));
}
@@ -1276,7 +1245,7 @@ static int stage2_set_pte(struct kvm *kvm, struct kvm_mmu_memory_cache *cache,
if (pmd_none(*pmd)) {
if (!cache)
return 0; /* ignore calls from kvm_set_spte_hva */
- pte = mmu_memory_cache_alloc(cache);
+ pte = kvm_mmu_memory_cache_alloc(cache);
kvm_pmd_populate(pmd, pte);
get_page(virt_to_page(pmd));
}
@@ -1343,7 +1312,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
phys_addr_t addr, end;
int ret = 0;
unsigned long pfn;
- struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, };
+ struct kvm_mmu_memory_cache cache = { 0, __GFP_ZERO, NULL, };

end = (guest_ipa + size + PAGE_SIZE - 1) & PAGE_MASK;
pfn = __phys_to_pfn(pa);
@@ -1354,8 +1323,8 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
if (writable)
pte = kvm_s2pte_mkwrite(pte);

- ret = mmu_topup_memory_cache(&cache,
- kvm_mmu_cache_min_pages(kvm));
+ ret = kvm_mmu_topup_memory_cache(&cache,
+ kvm_mmu_cache_min_pages(kvm));
if (ret)
goto out;
spin_lock(&kvm->mmu_lock);
@@ -1369,7 +1338,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
}

out:
- mmu_free_memory_cache(&cache);
+ kvm_mmu_free_memory_cache(&cache);
return ret;
}

@@ -1735,7 +1704,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
up_read(&current->mm->mmap_sem);

/* We need minimum second+third level pages */
- ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
+ ret = kvm_mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
if (ret)
return ret;

@@ -2158,7 +2127,7 @@ int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)

void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
- mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+ kvm_mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
}

phys_addr_t kvm_mmu_get_httbr(void)
--
2.26.0

2020-06-22 20:14:36

by Sean Christopherson

Subject: [PATCH v2 02/21] KVM: x86/mmu: Consolidate "page" variant of memory cache helpers

Drop the "page" variants of the topup/free memory cache helpers, using
the existence of an associated kmem_cache to select the correct alloc
or free routine.
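
The dispatch can be illustrated with a small userspace sketch. It is an
approximation under stated assumptions: "struct slab" is a hypothetical
stand-in for struct kmem_cache, calloc() plays the role of
kmem_cache_zalloc(), malloc() plays the role of __get_free_page(), and
the live counters exist only so the example can check that each object
went through the matching routine.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define PAGE_BYTES 4096
#define NR_OBJS    4

/* Hypothetical stand-in for struct kmem_cache: object size and live count. */
struct slab {
	size_t obj_size;
	int live;
};

struct memory_cache {
	int nobjs;
	struct slab *slab;       /* NULL selects whole-page allocation */
	void *objects[NR_OBJS];
};

int pages_live;                  /* outstanding "page" allocations */

/* Pick the allocation routine based on whether a slab is attached. */
void *cache_alloc_obj(struct memory_cache *mc)
{
	void *obj;

	if (mc->slab) {
		obj = calloc(1, mc->slab->obj_size);
		if (obj)
			mc->slab->live++;
	} else {
		obj = malloc(PAGE_BYTES);
		if (obj)
			pages_live++;
	}
	return obj;
}

int cache_topup(struct memory_cache *mc, int min)
{
	void *obj;

	if (mc->nobjs >= min)
		return 0;
	while (mc->nobjs < NR_OBJS) {
		obj = cache_alloc_obj(mc);
		if (!obj)
			return mc->nobjs >= min ? 0 : -ENOMEM;
		mc->objects[mc->nobjs++] = obj;
	}
	return 0;
}

/* Free with the routine that matches how the objects were allocated. */
void cache_free(struct memory_cache *mc)
{
	while (mc->nobjs) {
		free(mc->objects[--mc->nobjs]);
		if (mc->slab)
			mc->slab->live--;
		else
			pages_live--;
	}
}
```

With this shape a single topup/free pair serves both kinds of cache, and
the choice is made once, when the cache is set up.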

No functional change intended.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 37 +++++++++++--------------------------
1 file changed, 11 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0830c195c9ed..cbc101663a89 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1067,7 +1067,10 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
if (cache->nobjs >= min)
return 0;
while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
+ if (cache->kmem_cache)
+ obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
+ else
+ obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!obj)
return cache->nobjs >= min ? 0 : -ENOMEM;
cache->objects[cache->nobjs++] = obj;
@@ -1082,30 +1085,12 @@ static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)

static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
{
- while (mc->nobjs)
- kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
-}
-
-static int mmu_topup_memory_cache_page(struct kvm_mmu_memory_cache *cache,
- int min)
-{
- void *page;
-
- if (cache->nobjs >= min)
- return 0;
- while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- page = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
- if (!page)
- return cache->nobjs >= min ? 0 : -ENOMEM;
- cache->objects[cache->nobjs++] = page;
+ while (mc->nobjs) {
+ if (mc->kmem_cache)
+ kmem_cache_free(mc->kmem_cache, mc->objects[--mc->nobjs]);
+ else
+ free_page((unsigned long)mc->objects[--mc->nobjs]);
}
- return 0;
-}
-
-static void mmu_free_memory_cache_page(struct kvm_mmu_memory_cache *mc)
-{
- while (mc->nobjs)
- free_page((unsigned long)mc->objects[--mc->nobjs]);
}

static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
@@ -1116,7 +1101,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
8 + PTE_PREFETCH_NUM);
if (r)
goto out;
- r = mmu_topup_memory_cache_page(&vcpu->arch.mmu_page_cache, 8);
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
if (r)
goto out;
r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
@@ -1127,7 +1112,7 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
- mmu_free_memory_cache_page(&vcpu->arch.mmu_page_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}

--
2.26.0
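
For readers outside the kernel tree, the consolidation in this patch can be approximated in userspace. The following is only a sketch under stated assumptions, not the KVM code: struct obj_pool, NR_MEM_OBJS, PAGE_BYTES and the two counters are hypothetical stand-ins, calloc()/malloc() stand in for kmem_cache_zalloc()/__get_free_page(), and free() stands in for both kernel free routines.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_MEM_OBJS 8	/* hypothetical capacity, stand-in for KVM_NR_MEM_OBJS */
#define PAGE_BYTES 4096	/* hypothetical page size */
#define ARRAY_SIZE(a) ((int)(sizeof(a) / sizeof((a)[0])))

/* Hypothetical stand-in for struct kmem_cache: it only records an object size. */
struct obj_pool {
	size_t obj_size;
};

struct mem_cache {
	int nobjs;
	struct obj_pool *pool;	/* NULL selects the "whole page" routines */
	void *objects[NR_MEM_OBJS];
};

/* Counters exist only so the selected free path is observable in this sketch. */
static int pool_frees, page_frees;

static int topup_cache(struct mem_cache *mc, int min)
{
	void *obj;

	if (mc->nobjs >= min)
		return 0;
	while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
		if (mc->pool)
			obj = calloc(1, mc->pool->obj_size);	/* kmem_cache_zalloc() stand-in */
		else
			obj = malloc(PAGE_BYTES);		/* __get_free_page() stand-in */
		if (!obj)
			return mc->nobjs >= min ? 0 : -ENOMEM;
		mc->objects[mc->nobjs++] = obj;
	}
	return 0;
}

static void free_cache(struct mem_cache *mc)
{
	while (mc->nobjs) {
		void *obj = mc->objects[--mc->nobjs];

		if (mc->pool)
			pool_frees++;	/* kmem_cache_free() would be called here */
		else
			page_frees++;	/* free_page() would be called here */
		free(obj);
	}
}

/* Usage: a pool-backed cache and a page-backed cache share the same two helpers. */
static void example_usage(void)
{
	struct obj_pool pool = { .obj_size = 32 };
	struct mem_cache headers = { .pool = &pool };
	struct mem_cache pages = { .pool = NULL };

	assert(topup_cache(&headers, 4) == 0);
	assert(topup_cache(&pages, 2) == 0);
	free_cache(&headers);
	free_cache(&pages);
}
```

The point of the design is that callers no longer have to know which kind of cache they hold: the presence or absence of the pool pointer carries that information.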

2020-06-22 20:14:54

by Sean Christopherson

Subject: [PATCH v2 09/21] KVM: x86/mmu: Separate the memory caches for shadow pages and gfn arrays

Use separate caches for allocating shadow pages versus gfn arrays. This
sets the stage for specifying __GFP_ZERO when allocating shadow pages
without incurring extra cost for gfn arrays.

No functional change intended.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 ++-
arch/x86/kvm/mmu/mmu.c | 15 ++++++++++-----
2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 7b6ac8fad9c2..376e1653ac41 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -636,7 +636,8 @@ struct kvm_vcpu_arch {
struct kvm_mmu *walk_mmu;

struct kvm_mmu_memory_cache mmu_pte_list_desc_cache;
- struct kvm_mmu_memory_cache mmu_page_cache;
+ struct kvm_mmu_memory_cache mmu_shadow_page_cache;
+ struct kvm_mmu_memory_cache mmu_gfn_array_cache;
struct kvm_mmu_memory_cache mmu_page_header_cache;

/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 451e0365e5dd..d245acece3cd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1108,8 +1108,12 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
- 2 * PT64_ROOT_MAX_LEVEL);
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_shadow_page_cache,
+ PT64_ROOT_MAX_LEVEL);
+ if (r)
+ return r;
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_gfn_array_cache,
+ PT64_ROOT_MAX_LEVEL);
if (r)
return r;
return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
@@ -1119,7 +1123,8 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
{
mmu_free_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache);
- mmu_free_memory_cache(&vcpu->arch.mmu_page_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_shadow_page_cache);
+ mmu_free_memory_cache(&vcpu->arch.mmu_gfn_array_cache);
mmu_free_memory_cache(&vcpu->arch.mmu_page_header_cache);
}

@@ -2096,9 +2101,9 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
struct kvm_mmu_page *sp;

sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);
- sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+ sp->spt = mmu_memory_cache_alloc(&vcpu->arch.mmu_shadow_page_cache);
if (!direct)
- sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_cache);
+ sp->gfns = mmu_memory_cache_alloc(&vcpu->arch.mmu_gfn_array_cache);
set_page_private(virt_to_page(sp->spt), (unsigned long)sp);

/*
--
2.26.0

2020-06-22 20:14:55

by Sean Christopherson

Subject: [PATCH v2 16/21] KVM: arm64: Drop @max param from mmu_topup_memory_cache()

Drop the @max param from mmu_topup_memory_cache() and instead use
ARRAY_SIZE() to terminate the loop that fills the cache. This removes a
BUG_ON() and sets the stage for moving arm64 to the common memory cache
implementation.

No functional change intended.

Tested-by: Marc Zyngier <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/mmu.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index a1f6bc70c4e4..9398b66f8a87 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -124,15 +124,13 @@ static void stage2_dissolve_pud(struct kvm *kvm, phys_addr_t addr, pud_t *pudp)
put_page(virt_to_page(pudp));
}

-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache,
- int min, int max)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
{
void *page;

- BUG_ON(max > KVM_NR_MEM_OBJS);
if (cache->nobjs >= min)
return 0;
- while (cache->nobjs < max) {
+ while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
page = (void *)__get_free_page(GFP_PGTABLE_USER);
if (!page)
return -ENOMEM;
@@ -1356,8 +1354,7 @@ int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
pte = kvm_s2pte_mkwrite(pte);

ret = mmu_topup_memory_cache(&cache,
- kvm_mmu_cache_min_pages(kvm),
- KVM_NR_MEM_OBJS);
+ kvm_mmu_cache_min_pages(kvm));
if (ret)
goto out;
spin_lock(&kvm->mmu_lock);
@@ -1737,8 +1734,7 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
up_read(&current->mm->mmap_sem);

/* We need minimum second+third level pages */
- ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm),
- KVM_NR_MEM_OBJS);
+ ret = mmu_topup_memory_cache(memcache, kvm_mmu_cache_min_pages(kvm));
if (ret)
return ret;

--
2.26.0

2020-06-22 20:16:03

by Sean Christopherson

Subject: [PATCH v2 08/21] KVM: x86/mmu: Clean up the gorilla math in mmu_topup_memory_caches()

Clean up the minimums in mmu_topup_memory_caches() to document the
driving mechanisms behind the minimums. Now that encountering an empty
cache is unlikely to trigger BUG_ON(), it is less dangerous to be more
precise when defining the minimums.

For rmaps, the logic is 1 parent PTE per level, plus a single rmap, and
prefetched rmaps. The extra objects in the current '8 + PREFETCH'
minimum came about due to an abundance of paranoia in commit
c41ef344de212 ("KVM: MMU: increase per-vcpu rmap cache alloc size"),
i.e. it could have increased the minimum to 2 rmaps. Furthermore, the
unexpected extra rmap case was killed off entirely by commits
f759e2b4c728c ("KVM: MMU: avoid pte_list_desc running out in
kvm_mmu_pte_write") and f5a1e9f89504f ("KVM: MMU: remove call to
kvm_mmu_pte_write from walk_addr").

For the so-called page cache, replace '8' with 2*PT64_ROOT_MAX_LEVEL.
The 2x multiplier is needed because the cache is used for both shadow
pages and gfn arrays for indirect MMUs.

And finally, for page headers, replace '4' with PT64_ROOT_MAX_LEVEL.

Note, KVM now supports 5-level paging, i.e. the old minimums that used a
baseline derived from 4-level paging were technically wrong. But, KVM
always allocates roots in a separate flow, e.g. it's impossible in the
current implementation to actually need 5 new shadow pages in a single
flow. Use PT64_ROOT_MAX_LEVEL unmodified instead of subtracting 1, as
the direct usage is likely more intuitive to uninformed readers, and the
inflated minimum is unlikely to affect functionality in practice.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4b4c3234d623..451e0365e5dd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1103,14 +1103,17 @@ static int mmu_topup_memory_caches(struct kvm_vcpu *vcpu)
{
int r;

+ /* 1 rmap, 1 parent PTE per level, and the prefetched rmaps. */
r = mmu_topup_memory_cache(&vcpu->arch.mmu_pte_list_desc_cache,
- 8 + PTE_PREFETCH_NUM);
+ 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM);
if (r)
return r;
- r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache, 8);
+ r = mmu_topup_memory_cache(&vcpu->arch.mmu_page_cache,
+ 2 * PT64_ROOT_MAX_LEVEL);
if (r)
return r;
- return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache, 4);
+ return mmu_topup_memory_cache(&vcpu->arch.mmu_page_header_cache,
+ PT64_ROOT_MAX_LEVEL);
}

static void mmu_free_memory_caches(struct kvm_vcpu *vcpu)
--
2.26.0
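
To make the arithmetic in this patch concrete, here is a small worked example. It assumes PT64_ROOT_MAX_LEVEL is 5 (5-level paging) and PTE_PREFETCH_NUM is 8. Neither value is stated in the patch itself, so treat both as assumptions. The enumerator names are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Assumed values, not taken from this patch. */
#define PT64_ROOT_MAX_LEVEL 5
#define PTE_PREFETCH_NUM    8

enum {
	/* Old minimums, as removed by the diff above. */
	OLD_PTE_LIST_DESC_MIN = 8 + PTE_PREFETCH_NUM,
	OLD_PAGE_MIN          = 8,
	OLD_PAGE_HEADER_MIN   = 4,

	/* New minimums: 1 rmap + 1 parent PTE per level + prefetched rmaps. */
	NEW_PTE_LIST_DESC_MIN = 1 + PT64_ROOT_MAX_LEVEL + PTE_PREFETCH_NUM,
	/* One shadow page and one gfn array per level. */
	NEW_PAGE_MIN          = 2 * PT64_ROOT_MAX_LEVEL,
	/* One page header per level. */
	NEW_PAGE_HEADER_MIN   = PT64_ROOT_MAX_LEVEL,
};

/* Compile-time checks of the sums under the assumed values. */
static_assert(OLD_PTE_LIST_DESC_MIN == 16, "old rmap minimum");
static_assert(NEW_PTE_LIST_DESC_MIN == 14, "new rmap minimum");
static_assert(NEW_PAGE_MIN == 10, "new page minimum");
static_assert(NEW_PAGE_HEADER_MIN == 5, "new page header minimum");
```

Under those assumptions the rmap minimum shrinks from 16 to 14, while the page and page header minimums grow from 8 to 10 and from 4 to 5, which is the "inflated minimum" the changelog mentions.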

2020-06-22 20:16:07

by Sean Christopherson

Subject: [PATCH v2 10/21] KVM: x86/mmu: Make __GFP_ZERO a property of the memory cache

Add a gfp_zero flag to 'struct kvm_mmu_memory_cache' and use it to
control __GFP_ZERO instead of hardcoding a call to kmem_cache_zalloc().
A future patch needs such a flag for the __get_free_page() path, as
gfn arrays do not need/want the allocator to zero the memory. Convert
the kmem_cache paths to __GFP_ZERO now so as to avoid a weird and
inconsistent API in the future.

No functional change intended.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 7 ++++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 376e1653ac41..67b84aa2984e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -251,6 +251,7 @@ struct kvm_kernel_irq_routing_entry;
*/
struct kvm_mmu_memory_cache {
int nobjs;
+ gfp_t gfp_zero;
struct kmem_cache *kmem_cache;
void *objects[KVM_NR_MEM_OBJS];
};
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d245acece3cd..6b0ec9060786 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1063,8 +1063,10 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
gfp_t gfp_flags)
{
+ gfp_flags |= mc->gfp_zero;
+
if (mc->kmem_cache)
- return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
+ return kmem_cache_alloc(mc->kmem_cache, gfp_flags);
else
return (void *)__get_free_page(gfp_flags);
}
@@ -5680,7 +5682,10 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu)
int ret;

vcpu->arch.mmu_pte_list_desc_cache.kmem_cache = pte_list_desc_cache;
+ vcpu->arch.mmu_pte_list_desc_cache.gfp_zero = __GFP_ZERO;
+
vcpu->arch.mmu_page_header_cache.kmem_cache = mmu_page_header_cache;
+ vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO;

vcpu->arch.mmu = &vcpu->arch.root_mmu;
vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
--
2.26.0
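
The idea of making zeroing a per-cache property can be sketched in userspace as follows. This is an illustration under assumptions rather than the kernel code: GFP_BASE and GFP_ZERO are made-up flag values, alloc_with_flags() is a hypothetical allocator built on malloc(), and the 0xaa poison pattern exists only so that a non-zeroed allocation is observable.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int gfp_t;	/* userspace stand-in for the kernel type */
#define GFP_BASE 0x1u		/* hypothetical "ordinary allocation" flag */
#define GFP_ZERO 0x2u		/* hypothetical stand-in for __GFP_ZERO */

struct mem_cache {
	gfp_t gfp_zero;		/* either 0 or GFP_ZERO, set once at init */
	size_t obj_size;
};

/* Hypothetical allocator: zeroes only when asked, otherwise poisons. */
static void *alloc_with_flags(size_t size, gfp_t flags)
{
	void *p = malloc(size);

	if (p)
		memset(p, (flags & GFP_ZERO) ? 0 : 0xaa, size);
	return p;
}

/* Callers pass the same flags for every cache; the cache adds zeroing. */
static void *cache_alloc_obj(struct mem_cache *mc, gfp_t gfp_flags)
{
	gfp_flags |= mc->gfp_zero;
	return alloc_with_flags(mc->obj_size, gfp_flags);
}
```

With this shape, a cache that wants zeroed objects sets gfp_zero to GFP_ZERO at creation time, and a cache that does not (gfn arrays, per the changelog) leaves it at 0 and skips the clearing cost.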

2020-06-22 20:16:08

by Sean Christopherson

Subject: [PATCH v2 05/21] KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty

Attempt to allocate a new object instead of crashing KVM (and likely the
kernel) if a memory cache is unexpectedly empty. Use GFP_ATOMIC for the
allocation as the caches are used while holding mmu_lock. The immediate
BUG_ON() makes the code unnecessarily explosive and led to confusing
minimums being used in the past, e.g. allocating 4 objects where 1 would
suffice.

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index ba70de24a5b0..5e773564ab20 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,6 +1060,15 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
}

+static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
+ gfp_t gfp_flags)
+{
+ if (mc->kmem_cache)
+ return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
+ else
+ return (void *)__get_free_page(gfp_flags);
+}
+
static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
{
void *obj;
@@ -1067,10 +1076,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
if (mc->nobjs >= min)
return 0;
while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
- if (mc->kmem_cache)
- obj = kmem_cache_zalloc(mc->kmem_cache, GFP_KERNEL_ACCOUNT);
- else
- obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
+ obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
if (!obj)
return mc->nobjs >= min ? 0 : -ENOMEM;
mc->objects[mc->nobjs++] = obj;
@@ -1118,8 +1124,11 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
{
void *p;

- BUG_ON(!mc->nobjs);
- p = mc->objects[--mc->nobjs];
+ if (WARN_ON(!mc->nobjs))
+ p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
+ else
+ p = mc->objects[--mc->nobjs];
+ BUG_ON(!p);
return p;
}

--
2.26.0
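
The "warn and fall back" behavior introduced here can be modeled in userspace roughly as below. This is a sketch under assumptions, not the kernel implementation: fprintf() plus a counter stands in for WARN_ON(), malloc() stands in for the GFP_ATOMIC allocation, assert() stands in for the remaining BUG_ON(), and NR_MEM_OBJS and FALLBACK_BYTES are hypothetical sizes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define NR_MEM_OBJS 4		/* hypothetical capacity */
#define FALLBACK_BYTES 64	/* hypothetical object size for the fallback path */

struct mem_cache {
	int nobjs;
	void *objects[NR_MEM_OBJS];
};

static int warn_count;	/* makes the warning observable in this sketch */

static void *cache_alloc(struct mem_cache *mc)
{
	void *p;

	if (!mc->nobjs) {
		/* WARN_ON() stand-in: complain, but keep going. */
		fprintf(stderr, "warning: memory cache unexpectedly empty\n");
		warn_count++;
		p = malloc(FALLBACK_BYTES);	/* last-ditch allocation */
	} else {
		p = mc->objects[--mc->nobjs];	/* normal path: pop a cached object */
	}
	assert(p);	/* BUG_ON(!p) stand-in: only fatal if the fallback fails too */
	return p;
}
```

The fatal check moves from "cache is empty" to "cache is empty and the fallback allocation failed", so an undersized minimum produces a warning instead of a crash.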

2020-06-22 20:17:02

by Sean Christopherson

Subject: [PATCH v2 03/21] KVM: x86/mmu: Use consistent "mc" name for kvm_mmu_memory_cache locals

Use "mc" for local variables to shorten line lengths and provide
consistent names, which will be especially helpful when some of the
helpers are moved to common KVM code in future patches.

No functional change intended.

Reviewed-by: Ben Gardon <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index cbc101663a89..36c90f004ef4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1060,27 +1060,27 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
local_irq_enable();
}

-static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *cache, int min)
+static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
{
void *obj;

- if (cache->nobjs >= min)
+ if (mc->nobjs >= min)
return 0;
- while (cache->nobjs < ARRAY_SIZE(cache->objects)) {
- if (cache->kmem_cache)
- obj = kmem_cache_zalloc(cache->kmem_cache, GFP_KERNEL_ACCOUNT);
+ while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
+ if (mc->kmem_cache)
+ obj = kmem_cache_zalloc(mc->kmem_cache, GFP_KERNEL_ACCOUNT);
else
obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
if (!obj)
- return cache->nobjs >= min ? 0 : -ENOMEM;
- cache->objects[cache->nobjs++] = obj;
+ return mc->nobjs >= min ? 0 : -ENOMEM;
+ mc->objects[mc->nobjs++] = obj;
}
return 0;
}

-static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *cache)
+static int mmu_memory_cache_free_objects(struct kvm_mmu_memory_cache *mc)
{
- return cache->nobjs;
+ return mc->nobjs;
}

static void mmu_free_memory_cache(struct kvm_mmu_memory_cache *mc)
@@ -1395,10 +1395,10 @@ static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,

static bool rmap_can_add(struct kvm_vcpu *vcpu)
{
- struct kvm_mmu_memory_cache *cache;
+ struct kvm_mmu_memory_cache *mc;

- cache = &vcpu->arch.mmu_pte_list_desc_cache;
- return mmu_memory_cache_free_objects(cache);
+ mc = &vcpu->arch.mmu_pte_list_desc_cache;
+ return mmu_memory_cache_free_objects(mc);
}

static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
--
2.26.0

2020-06-23 17:27:42

by Sean Christopherson

Subject: Re: [PATCH v2 00/21] KVM: Cleanup and unify kvm_mmu_memory_cache usage

On Mon, Jun 22, 2020 at 01:08:01PM -0700, Sean Christopherson wrote:
> Note, patch 18 will conflict with the p4d rework in 5.8. I originally
> stated I would send v2 only after that got pulled into Paolo's tree, but
> I got my timing wrong, i.e. I was thinking that would have already
> happened. I'll send v3 if necessary. I wanted to get v2 out there now
> that I actually compile tested other architectures.

Gah, too impatient by one day :-) I'll spin v3 later in the week.

2020-06-24 18:07:04

by Ben Gardon

Subject: Re: [PATCH v2 05/21] KVM: x86/mmu: Try to avoid crashing KVM if a MMU memory cache is empty

On Mon, Jun 22, 2020 at 1:09 PM Sean Christopherson
<[email protected]> wrote:
>
> Attempt to allocate a new object instead of crashing KVM (and likely the
> kernel) if a memory cache is unexpectedly empty. Use GFP_ATOMIC for the
> allocation as the caches are used while holding mmu_lock. The immediate
> BUG_ON() makes the code unnecessarily explosive and led to confusing
> minimums being used in the past, e.g. allocating 4 objects where 1 would
> suffice.
>
Reviewed-by: Ben Gardon <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 21 +++++++++++++++------
> 1 file changed, 15 insertions(+), 6 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index ba70de24a5b0..5e773564ab20 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -1060,6 +1060,15 @@ static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu)
> local_irq_enable();
> }
>
> +static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc,
> + gfp_t gfp_flags)
> +{
> + if (mc->kmem_cache)
> + return kmem_cache_zalloc(mc->kmem_cache, gfp_flags);
> + else
> + return (void *)__get_free_page(gfp_flags);
> +}
> +
> static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
> {
> void *obj;
> @@ -1067,10 +1076,7 @@ static int mmu_topup_memory_cache(struct kvm_mmu_memory_cache *mc, int min)
> if (mc->nobjs >= min)
> return 0;
> while (mc->nobjs < ARRAY_SIZE(mc->objects)) {
> - if (mc->kmem_cache)
> - obj = kmem_cache_zalloc(mc->kmem_cache, GFP_KERNEL_ACCOUNT);
> - else
> - obj = (void *)__get_free_page(GFP_KERNEL_ACCOUNT);
> + obj = mmu_memory_cache_alloc_obj(mc, GFP_KERNEL_ACCOUNT);
> if (!obj)
> return mc->nobjs >= min ? 0 : -ENOMEM;
> mc->objects[mc->nobjs++] = obj;
> @@ -1118,8 +1124,11 @@ static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
> {
> void *p;
>
> - BUG_ON(!mc->nobjs);
> - p = mc->objects[--mc->nobjs];
> + if (WARN_ON(!mc->nobjs))
> + p = mmu_memory_cache_alloc_obj(mc, GFP_ATOMIC | __GFP_ACCOUNT);
> + else
> + p = mc->objects[--mc->nobjs];
> + BUG_ON(!p);
> return p;
> }
>
> --
> 2.26.0
>