This series enables KVM to save memory when using the TDP MMU by waiting
to allocate memslot rmaps until they are needed. To do this, KVM tracks
whether or not a shadow root has been allocated. In order to get away
with not allocating the rmaps, KVM must also be sure to skip operations
which iterate over the rmaps. If the TDP MMU is in use and we have not
allocated a shadow root, these operations would essentially be op-ops
anyway. Skipping the rmap operations has a secondary benefit of avoiding
acquiring the MMU lock in write mode in many cases, substantially
reducing MMU lock contention.
This series was tested on an Intel Skylake machine. With the TDP MMU off
and on, this introduced no new failures on kvm-unit-tests or KVM selftests.
Changelog:
v2:
Incorporated feedback from Paolo and Sean
Replaced the memslot_assignment_lock with slots_arch_lock, which
has a larger critical section.
v3:
Removed shadow_mmu_active as suggested by Sean
Removed everything except adding a return value to
kvm_mmu_init_tdp_mmu from patch 1 of v2
Added RCU protection and better memory ordering for installing the
memslot rmaps as suggested by Paolo
Reordered most of the patches
v4:
Renamed functions to allocate and free memslots based on feedback
from David.
Eliminated the goto in memslot_rmap_alloc, as David suggested.
Eliminated kvm_memslots_have_rmaps and updated comments on uses of
memslots_have_rmaps. Suggested by Paolo.
Changed the description on patch 7 to one Paolo suggested.
Collected Reviewed-by tags from David.
Dropped the patch to add RCU notations to rmap accesses.
v5:
Responding to comments from Sean.
Improved comments
Swapped args to kvm_copy_memslots to match memcpy
Fixed some line wrap and declaration style issues
No longer check if memslots have rmaps before operations which
iterate through active_mmu_pages
Re-added the kvm_memslots_have_rmaps helper
Fixed a couple missing unlocks for the slots_arch_lock
Ben Gardon (7):
KVM: x86/mmu: Deduplicate rmap freeing
KVM: x86/mmu: Factor out allocating memslot rmap
KVM: mmu: Refactor memslot copy
KVM: mmu: Add slots_arch_lock for memslot arch fields
KVM: x86/mmu: Add a field to control memslot rmap allocation
KVM: x86/mmu: Skip rmap operations if rmaps not allocated
KVM: x86/mmu: Lazily allocate memslot rmaps
arch/x86/include/asm/kvm_host.h | 8 ++
arch/x86/kvm/mmu.h | 10 +++
arch/x86/kvm/mmu/mmu.c | 125 ++++++++++++++++++++------------
arch/x86/kvm/mmu/tdp_mmu.c | 6 +-
arch/x86/kvm/mmu/tdp_mmu.h | 4 +-
arch/x86/kvm/x86.c | 109 +++++++++++++++++++++++-----
include/linux/kvm_host.h | 9 +++
virt/kvm/kvm_main.c | 77 ++++++++++++++++----
8 files changed, 263 insertions(+), 85 deletions(-)
--
2.31.1.751.gd2f1c929bd-goog
Factor out copying kvm_memslots from allocating the memory for new ones
in preparation for adding a new lock to protect the arch-specific fields
of the memslots.
No functional change intended.
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
virt/kvm/kvm_main.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 6b4feb92dc79..4acd4722d729 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1306,6 +1306,18 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
return old_memslots;
}
+static size_t kvm_memslots_size(int slots)
+{
+ return sizeof(struct kvm_memslots) +
+ (sizeof(struct kvm_memory_slot) * slots);
+}
+
+static void kvm_copy_memslots(struct kvm_memslots *to,
+ struct kvm_memslots *from)
+{
+ memcpy(to, from, kvm_memslots_size(from->used_slots));
+}
+
/*
* Note, at a minimum, the current number of used slots must be allocated, even
* when deleting a memslot, as we need a complete duplicate of the memslots for
@@ -1315,19 +1327,16 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
enum kvm_mr_change change)
{
struct kvm_memslots *slots;
- size_t old_size, new_size;
-
- old_size = sizeof(struct kvm_memslots) +
- (sizeof(struct kvm_memory_slot) * old->used_slots);
+ size_t new_size;
if (change == KVM_MR_CREATE)
- new_size = old_size + sizeof(struct kvm_memory_slot);
+ new_size = kvm_memslots_size(old->used_slots + 1);
else
- new_size = old_size;
+ new_size = kvm_memslots_size(old->used_slots);
slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
if (likely(slots))
- memcpy(slots, old, old_size);
+ kvm_copy_memslots(slots, old);
return slots;
}
--
2.31.1.751.gd2f1c929bd-goog
Small code deduplication. No functional change expected.
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/x86.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9b6bca616929..11908beae58b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10896,17 +10896,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_hv_destroy_vm(kvm);
}
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+static void memslot_rmap_free(struct kvm_memory_slot *slot)
{
int i;
for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.rmap[i]);
slot->arch.rmap[i] = NULL;
+ }
+}
- if (i == 0)
- continue;
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ int i;
+
+ memslot_rmap_free(slot);
+ for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.lpage_info[i - 1]);
slot->arch.lpage_info[i - 1] = NULL;
}
@@ -10972,12 +10978,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
return 0;
out_free:
- for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
- kvfree(slot->arch.rmap[i]);
- slot->arch.rmap[i] = NULL;
- if (i == 0)
- continue;
+ memslot_rmap_free(slot);
+ for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.lpage_info[i - 1]);
slot->arch.lpage_info[i - 1] = NULL;
}
--
2.31.1.751.gd2f1c929bd-goog
Add a new lock to protect the arch-specific fields of memslots if they
need to be modified in a kvm->srcu read critical section. A future
commit will use this lock to lazily allocate memslot rmaps for x86.
Signed-off-by: Ben Gardon <[email protected]>
---
include/linux/kvm_host.h | 9 +++++++
virt/kvm/kvm_main.c | 54 +++++++++++++++++++++++++++++++++++-----
2 files changed, 57 insertions(+), 6 deletions(-)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 2f34487e21f2..817aa5e8dbd5 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -517,6 +517,15 @@ struct kvm {
#endif /* KVM_HAVE_MMU_RWLOCK */
struct mutex slots_lock;
+
+ /*
+ * Protects the arch-specific fields of struct kvm_memory_slots in
+ * use by the VM. To be used under the slots_lock (above) or in a
+ * kvm->srcu critical section where acquiring the slots_lock would
+ * lead to deadlock with the synchronize_srcu in
+ * install_new_memslots.
+ */
+ struct mutex slots_arch_lock;
struct mm_struct *mm; /* userspace tied to this vm */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4acd4722d729..41dfebde4680 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -908,6 +908,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
mutex_init(&kvm->lock);
mutex_init(&kvm->irq_lock);
mutex_init(&kvm->slots_lock);
+ mutex_init(&kvm->slots_arch_lock);
INIT_LIST_HEAD(&kvm->devices);
BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
@@ -1280,6 +1281,14 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
rcu_assign_pointer(kvm->memslots[as_id], slots);
+
+ /*
+ * Acquired in kvm_set_memslot. Must be released before synchronize
+ * SRCU below in order to avoid deadlock with another thread
+ * acquiring the slots_arch_lock in an srcu critical section.
+ */
+ mutex_unlock(&kvm->slots_arch_lock);
+
synchronize_srcu_expedited(&kvm->srcu);
/*
@@ -1351,9 +1360,27 @@ static int kvm_set_memslot(struct kvm *kvm,
struct kvm_memslots *slots;
int r;
+ /*
+ * Released in install_new_memslots.
+ *
+ * Must be held from before the current memslots are copied until
+ * after the new memslots are installed with rcu_assign_pointer,
+ * then released before the synchronize srcu in install_new_memslots.
+ *
+ * When modifying memslots outside of the slots_lock, must be held
+ * before reading the pointer to the current memslots until after all
+ * changes to those memslots are complete.
+ *
+ * These rules ensure that installing new memslots does not lose
+ * changes made to the previous memslots.
+ */
+ mutex_lock(&kvm->slots_arch_lock);
+
slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
- if (!slots)
+ if (!slots) {
+ mutex_unlock(&kvm->slots_arch_lock);
return -ENOMEM;
+ }
if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
/*
@@ -1364,10 +1391,9 @@ static int kvm_set_memslot(struct kvm *kvm,
slot->flags |= KVM_MEMSLOT_INVALID;
/*
- * We can re-use the old memslots, the only difference from the
- * newly installed memslots is the invalid flag, which will get
- * dropped by update_memslots anyway. We'll also revert to the
- * old memslots if preparing the new memory region fails.
+ * We can re-use the memory from the old memslots.
+ * It will be overwritten with a copy of the new memslots
+ * after reacquiring the slots_arch_lock below.
*/
slots = install_new_memslots(kvm, as_id, slots);
@@ -1379,6 +1405,17 @@ static int kvm_set_memslot(struct kvm *kvm,
* - kvm_is_visible_gfn (mmu_check_root)
*/
kvm_arch_flush_shadow_memslot(kvm, slot);
+
+ /* Released in install_new_memslots. */
+ mutex_lock(&kvm->slots_arch_lock);
+
+ /*
+ * The arch-specific fields of the memslots could have changed
+ * between releasing the slots_arch_lock in
+ * install_new_memslots and here, so get a fresh copy of the
+ * slots.
+ */
+ kvm_copy_memslots(slots, __kvm_memslots(kvm, as_id));
}
r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
@@ -1394,8 +1431,13 @@ static int kvm_set_memslot(struct kvm *kvm,
return 0;
out_slots:
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+ slot = id_to_memslot(slots, old->id);
+ slot->flags &= ~KVM_MEMSLOT_INVALID;
slots = install_new_memslots(kvm, as_id, slots);
+ } else {
+ mutex_unlock(&kvm->slots_arch_lock);
+ }
kvfree(slots);
return r;
}
--
2.31.1.751.gd2f1c929bd-goog
Small refactor to facilitate allocating rmaps for all memslots at once.
No functional change expected.
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/x86.c | 37 +++++++++++++++++++++++++++----------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 11908beae58b..4b3d53c5fc76 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10920,10 +10920,31 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
kvm_page_track_free_memslot(slot);
}
+static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+ const int sz = sizeof(*slot->arch.rmap[0]);
+ int i;
+
+ for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+ int level = i + 1;
+ int lpages = gfn_to_index(slot->base_gfn + npages - 1,
+ slot->base_gfn, level) + 1;
+
+ slot->arch.rmap[i] = kvcalloc(lpages, sz, GFP_KERNEL_ACCOUNT);
+ if (!slot->arch.rmap[i]) {
+ memslot_rmap_free(slot);
+ return -ENOMEM;
+ }
+ }
+
+ return 0;
+}
+
static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
unsigned long npages)
{
- int i;
+ int i, r;
/*
* Clear out the previous array pointers for the KVM_MR_MOVE case. The
@@ -10932,7 +10953,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
*/
memset(&slot->arch, 0, sizeof(slot->arch));
- for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+ r = memslot_rmap_alloc(slot, npages);
+ if (r)
+ return r;
+
+ for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
struct kvm_lpage_info *linfo;
unsigned long ugfn;
int lpages;
@@ -10941,14 +10966,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
lpages = gfn_to_index(slot->base_gfn + npages - 1,
slot->base_gfn, level) + 1;
- slot->arch.rmap[i] =
- kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
- GFP_KERNEL_ACCOUNT);
- if (!slot->arch.rmap[i])
- goto out_free;
- if (i == 0)
- continue;
-
linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
if (!linfo)
goto out_free;
--
2.31.1.751.gd2f1c929bd-goog
If only the TDP MMU is being used to manage the memory mappings for a VM,
then many rmap operations can be skipped as they are guaranteed to be
no-ops. This saves some time which would be spent on the rmap operation.
It also avoids acquiring the MMU lock in write mode for many operations.
This makes it safe to run the VM without rmaps allocated, when only
using the TDP MMU and sets the stage for waiting to allocate the rmaps
until they're needed.
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/mmu.h | 5 ++
arch/x86/kvm/mmu/mmu.c | 113 ++++++++++++++++++++++++-----------------
arch/x86/kvm/x86.c | 2 +-
3 files changed, 72 insertions(+), 48 deletions(-)
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 88d0ed5225a4..af09c47b1aa2 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -232,4 +232,9 @@ int kvm_arch_write_log_dirty(struct kvm_vcpu *vcpu);
int kvm_mmu_post_init_vm(struct kvm *kvm);
void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
+static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
+{
+ return kvm->arch.memslots_have_rmaps;
+}
+
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f059f2e8c6fe..1e0daabc83ca 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1189,6 +1189,10 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
if (is_tdp_mmu_enabled(kvm))
kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
slot->base_gfn + gfn_offset, mask, true);
+
+ if (!kvm_memslots_have_rmaps(kvm))
+ return;
+
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
PG_LEVEL_4K, slot);
@@ -1218,6 +1222,10 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
if (is_tdp_mmu_enabled(kvm))
kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
slot->base_gfn + gfn_offset, mask, false);
+
+ if (!kvm_memslots_have_rmaps(kvm))
+ return;
+
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
PG_LEVEL_4K, slot);
@@ -1260,9 +1268,11 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
int i;
bool write_protected = false;
- for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
- rmap_head = __gfn_to_rmap(gfn, i, slot);
- write_protected |= __rmap_write_protect(kvm, rmap_head, true);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+ rmap_head = __gfn_to_rmap(gfn, i, slot);
+ write_protected |= __rmap_write_protect(kvm, rmap_head, true);
+ }
}
if (is_tdp_mmu_enabled(kvm))
@@ -1433,9 +1443,10 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool flush;
+ bool flush = false;
- flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
if (is_tdp_mmu_enabled(kvm))
flush |= kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
@@ -1445,9 +1456,10 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool flush;
+ bool flush = false;
- flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
if (is_tdp_mmu_enabled(kvm))
flush |= kvm_tdp_mmu_set_spte_gfn(kvm, range);
@@ -1500,9 +1512,10 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool young;
+ bool young = false;
- young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
if (is_tdp_mmu_enabled(kvm))
young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1512,9 +1525,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool young;
+ bool young = false;
- young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
if (is_tdp_mmu_enabled(kvm))
young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
@@ -5492,29 +5506,29 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
int i;
bool flush = false;
- write_lock(&kvm->mmu_lock);
- for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
- slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(memslot, slots) {
- gfn_t start, end;
-
- start = max(gfn_start, memslot->base_gfn);
- end = min(gfn_end, memslot->base_gfn + memslot->npages);
- if (start >= end)
- continue;
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot(memslot, slots) {
+ gfn_t start, end;
+
+ start = max(gfn_start, memslot->base_gfn);
+ end = min(gfn_end, memslot->base_gfn + memslot->npages);
+ if (start >= end)
+ continue;
- flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
- PG_LEVEL_4K,
- KVM_MAX_HUGEPAGE_LEVEL,
- start, end - 1, true, flush);
+ flush = slot_handle_level_range(kvm, memslot,
+ kvm_zap_rmapp, PG_LEVEL_4K,
+ KVM_MAX_HUGEPAGE_LEVEL, start,
+ end - 1, true, flush);
+ }
}
+ if (flush)
+ kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+ write_unlock(&kvm->mmu_lock);
}
- if (flush)
- kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
-
- write_unlock(&kvm->mmu_lock);
-
if (is_tdp_mmu_enabled(kvm)) {
flush = false;
@@ -5541,12 +5555,15 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot,
int start_level)
{
- bool flush;
+ bool flush = false;
- write_lock(&kvm->mmu_lock);
- flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
- start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
- write_unlock(&kvm->mmu_lock);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
+ start_level, KVM_MAX_HUGEPAGE_LEVEL,
+ false);
+ write_unlock(&kvm->mmu_lock);
+ }
if (is_tdp_mmu_enabled(kvm)) {
read_lock(&kvm->mmu_lock);
@@ -5616,16 +5633,15 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
struct kvm_memory_slot *slot = (struct kvm_memory_slot *)memslot;
bool flush;
- write_lock(&kvm->mmu_lock);
- flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
-
- if (flush)
- kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
- write_unlock(&kvm->mmu_lock);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
+ if (flush)
+ kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+ write_unlock(&kvm->mmu_lock);
+ }
if (is_tdp_mmu_enabled(kvm)) {
- flush = false;
-
read_lock(&kvm->mmu_lock);
flush = kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot, flush);
if (flush)
@@ -5652,11 +5668,14 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- bool flush;
+ bool flush = false;
- write_lock(&kvm->mmu_lock);
- flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
- write_unlock(&kvm->mmu_lock);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
+ false);
+ write_unlock(&kvm->mmu_lock);
+ }
if (is_tdp_mmu_enabled(kvm)) {
read_lock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ae8e3179d483..7cbaa92687f7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10954,7 +10954,7 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm,
*/
memset(&slot->arch, 0, sizeof(slot->arch));
- if (kvm->arch.memslots_have_rmaps) {
+ if (kvm_memslots_have_rmaps(kvm)) {
r = memslot_rmap_alloc(slot, npages);
if (r)
return r;
--
2.31.1.751.gd2f1c929bd-goog
If the TDP MMU is in use, wait to allocate the rmaps until the shadow
MMU is actually used. (i.e. a nested VM is launched.) This saves memory
equal to 0.2% of guest memory in cases where the TDP MMU is used and
there are no nested guests involved.
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/mmu.h | 7 ++++-
arch/x86/kvm/mmu/mmu.c | 14 +++++++---
arch/x86/kvm/mmu/tdp_mmu.c | 6 +++--
arch/x86/kvm/mmu/tdp_mmu.h | 4 +--
arch/x86/kvm/x86.c | 46 +++++++++++++++++++++++++++++++++
6 files changed, 71 insertions(+), 8 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fc75ed49bfee..7b65f82ade1c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1868,4 +1868,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
int kvm_cpu_dirty_log_size(void);
+int alloc_all_memslots_rmaps(struct kvm *kvm);
+
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index af09c47b1aa2..e987c9af82b6 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -234,7 +234,12 @@ void kvm_mmu_pre_destroy_vm(struct kvm *kvm);
static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
{
- return kvm->arch.memslots_have_rmaps;
+ /*
+ * Ensure that threads reading memslots_have_rmaps in various
+ * lock contexts see the value before trying to dereference
+ * the memslot rmap pointers.
+ */
+ return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
}
#endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 1e0daabc83ca..2ac7bec515a1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3294,6 +3294,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
}
+ r = alloc_all_memslots_rmaps(vcpu->kvm);
+ if (r)
+ return r;
+
write_lock(&vcpu->kvm->mmu_lock);
r = make_mmu_pages_available(vcpu);
if (r < 0)
@@ -5481,9 +5485,13 @@ void kvm_mmu_init_vm(struct kvm *kvm)
{
struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
- kvm_mmu_init_tdp_mmu(kvm);
-
- kvm->arch.memslots_have_rmaps = true;
+ if (!kvm_mmu_init_tdp_mmu(kvm))
+ /*
+ * No smp_load/store wrappers needed here as we are in
+ * VM init and there cannot be any memslots / other threads
+ * accessing this struct kvm yet.
+ */
+ kvm->arch.memslots_have_rmaps = true;
node->track_write = kvm_mmu_pte_write;
node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 95eeb5ac6a8a..ea00c9502ba1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -14,10 +14,10 @@ static bool __read_mostly tdp_mmu_enabled = false;
module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
/* Initializes the TDP MMU for the VM, if enabled. */
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
{
if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
- return;
+ return false;
/* This should not be changed for the lifetime of the VM. */
kvm->arch.tdp_mmu_enabled = true;
@@ -25,6 +25,8 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
+
+ return true;
}
static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..b046ab5137a1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -80,12 +80,12 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
int *root_level);
#ifdef CONFIG_X86_64
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm);
void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
#else
-static inline void kvm_mmu_init_tdp_mmu(struct kvm *kvm) {}
+static inline bool kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return false; }
static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7cbaa92687f7..28dc8bdd0c8a 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10931,6 +10931,8 @@ static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
int lpages = gfn_to_index(slot->base_gfn + npages - 1,
slot->base_gfn, level) + 1;
+ WARN_ON(slot->arch.rmap[i]);
+
slot->arch.rmap[i] = kvcalloc(lpages, sz, GFP_KERNEL_ACCOUNT);
if (!slot->arch.rmap[i]) {
memslot_rmap_free(slot);
@@ -10941,6 +10943,50 @@ static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
return 0;
}
+int alloc_all_memslots_rmaps(struct kvm *kvm)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *slot;
+ int r, i;
+
+ /*
+ * Check if memslots alreday have rmaps early before acquiring
+ * the slots_arch_lock below.
+ */
+ if (kvm_memslots_have_rmaps(kvm))
+ return 0;
+
+ mutex_lock(&kvm->slots_arch_lock);
+
+ /*
+ * Read memslots_have_rmaps again, under the slots arch lock,
+ * before allocating the rmaps
+ */
+ if (kvm_memslots_have_rmaps(kvm)) {
+ mutex_unlock(&kvm->slots_arch_lock);
+ return 0;
+ }
+
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot(slot, slots) {
+ r = memslot_rmap_alloc(slot, slot->npages);
+ if (r) {
+ mutex_unlock(&kvm->slots_arch_lock);
+ return r;
+ }
+ }
+ }
+
+ /*
+ * Ensure that memslots_have_rmaps becomes true strictly after
+ * all the rmap pointers are set.
+ */
+ smp_store_release(&kvm->arch.memslots_have_rmaps, true);
+ mutex_unlock(&kvm->slots_arch_lock);
+ return 0;
+}
+
static int kvm_alloc_memslot_metadata(struct kvm *kvm,
struct kvm_memory_slot *slot,
unsigned long npages)
--
2.31.1.751.gd2f1c929bd-goog
Add a field to control whether new memslots should have rmaps allocated
for them. As of this change, it's not safe to skip allocating rmaps, so
the field is always set to allocate rmaps. Future changes will make it
safe to operate without rmaps, using the TDP MMU. Then further changes
will allow the rmaps to be allocated lazily when needed for nested
oprtation.
No functional change expected.
Reviewed-by: David Hildenbrand <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 6 ++++++
arch/x86/kvm/mmu/mmu.c | 2 ++
arch/x86/kvm/x86.c | 13 ++++++++-----
3 files changed, 16 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 55efbacfc244..fc75ed49bfee 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1124,6 +1124,12 @@ struct kvm_arch {
*/
spinlock_t tdp_mmu_pages_lock;
#endif /* CONFIG_X86_64 */
+
+ /*
+ * If set, rmaps have been allocated for all memslots and should be
+ * allocated for any newly created or modified memslots.
+ */
+ bool memslots_have_rmaps;
};
struct kvm_vm_stat {
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 0144c40d09c7..f059f2e8c6fe 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5469,6 +5469,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
kvm_mmu_init_tdp_mmu(kvm);
+ kvm->arch.memslots_have_rmaps = true;
+
node->track_write = kvm_mmu_pte_write;
node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
kvm_page_track_register_notifier(kvm, node);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4b3d53c5fc76..ae8e3179d483 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10941,7 +10941,8 @@ static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
return 0;
}
-static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
+static int kvm_alloc_memslot_metadata(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
unsigned long npages)
{
int i, r;
@@ -10953,9 +10954,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
*/
memset(&slot->arch, 0, sizeof(slot->arch));
- r = memslot_rmap_alloc(slot, npages);
- if (r)
- return r;
+ if (kvm->arch.memslots_have_rmaps) {
+ r = memslot_rmap_alloc(slot, npages);
+ if (r)
+ return r;
+ }
for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
struct kvm_lpage_info *linfo;
@@ -11026,7 +11029,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
enum kvm_mr_change change)
{
if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
- return kvm_alloc_memslot_metadata(memslot,
+ return kvm_alloc_memslot_metadata(kvm, memslot,
mem->memory_size >> PAGE_SHIFT);
return 0;
}
--
2.31.1.751.gd2f1c929bd-goog
On 18/05/21 19:34, Ben Gardon wrote:
> This series enables KVM to save memory when using the TDP MMU by waiting
> to allocate memslot rmaps until they are needed. To do this, KVM tracks
> whether or not a shadow root has been allocated. In order to get away
> with not allocating the rmaps, KVM must also be sure to skip operations
> which iterate over the rmaps. If the TDP MMU is in use and we have not
> allocated a shadow root, these operations would essentially be op-ops
> anyway. Skipping the rmap operations has a secondary benefit of avoiding
> acquiring the MMU lock in write mode in many cases, substantially
> reducing MMU lock contention.
>
> This series was tested on an Intel Skylake machine. With the TDP MMU off
> and on, this introduced no new failures on kvm-unit-tests or KVM selftests.
>
> Changelog:
> v2:
> Incorporated feedback from Paolo and Sean
> Replaced the memslot_assignment_lock with slots_arch_lock, which
> has a larger critical section.
>
> v3:
> Removed shadow_mmu_active as suggested by Sean
> Removed everything except adding a return value to
> kvm_mmu_init_tdp_mmu from patch 1 of v2
> Added RCU protection and better memory ordering for installing the
> memslot rmaps as suggested by Paolo
> Reordered most of the patches
>
> v4:
> Renamed functions to allocate and free memslots based on feedback
> from David.
> Eliminated the goto in memslot_rmap_alloc, as David suggested.
> Eliminated kvm_memslots_have_rmaps and updated comments on uses of
> memslots_have_rmaps. Suggested by Paolo.
> Changed the description on patch 7 to one Paolo suggested.
> Collected Reviewed-by tags from David.
> Dropped the patch to add RCU notations to rmap accesses.
>
> v5:
> Responding to comments from Sean.
> Improved comments
> Swapped args to kvm_copy_memslots to match memcpy
> Fixed some line wrap and declaration style issues
> No longer check if memslots have rmaps before operations which
> iterate through active_mmu_pages
> Re-added the kvm_memslots_have_rmaps helper
> Fixed a couple missing unlocks for the slots_arch_lock
Queued (with minor conflicts), thanks!
Paolo