2021-05-06 19:51:45

by Ben Gardon

Subject: [PATCH v3 0/8] Lazily allocate memslot rmaps

This series enables KVM to save memory when using the TDP MMU by waiting
to allocate memslot rmaps until they are needed. To do this, KVM tracks
whether or not a shadow root has been allocated. In order to get away
with not allocating the rmaps, KVM must also be sure to skip operations
which iterate over the rmaps. If the TDP MMU is in use and no shadow root
has been allocated, these operations would essentially be no-ops anyway.
Skipping the rmap operations has the secondary benefit of avoiding the
acquisition of the MMU lock in write mode in many cases, substantially
reducing MMU lock contention.
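
In rough outline, the resulting flow looks like the following. This is an
illustrative sketch using names introduced later in the series, not verbatim
code:

/* VM creation: only VMs using the shadow MMU get rmaps up front. */
if (!kvm_mmu_init_tdp_mmu(kvm))
	kvm->arch.memslots_have_rmaps = true;

/* First shadow root allocation, e.g. when a nested VM is launched: */
r = alloc_all_memslots_rmaps(kvm);

/* Every rmap-walking path: */
if (!kvm_memslots_have_rmaps(kvm))
	return;	/* nothing to walk, and no need to take mmu_lock for write */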

This series was tested on an Intel Skylake machine. With the TDP MMU off
and on, this introduced no new failures on kvm-unit-tests or KVM selftests.

Changelog:
v2:
Incorporated feedback from Paolo and Sean
Replaced the memslot_assignment_lock with slots_arch_lock, which
has a larger critical section.

v3:
Removed shadow_mmu_active as suggested by Sean
Removed everything except adding a return value to
kvm_mmu_init_tdp_mmu from patch 1 of v2
Added RCU protection and better memory ordering for installing the
memslot rmaps as suggested by Paolo
Reordered most of the patches

Ben Gardon (8):
KVM: x86/mmu: Deduplicate rmap freeing
KVM: x86/mmu: Factor out allocating memslot rmap
KVM: mmu: Refactor memslot copy
KVM: mmu: Add slots_arch_lock for memslot arch fields
KVM: x86/mmu: Add a field to control memslot rmap allocation
KVM: x86/mmu: Skip rmap operations if rmaps not allocated
KVM: x86/mmu: Protect rmaps independently with SRCU
KVM: x86/mmu: Lazily allocate memslot rmaps

arch/x86/include/asm/kvm_host.h | 9 ++
arch/x86/kvm/mmu/mmu.c | 195 ++++++++++++++++++++------------
arch/x86/kvm/mmu/tdp_mmu.c | 6 +-
arch/x86/kvm/mmu/tdp_mmu.h | 4 +-
arch/x86/kvm/x86.c | 107 ++++++++++++++----
include/linux/kvm_host.h | 9 ++
virt/kvm/kvm_main.c | 54 +++++++--
7 files changed, 275 insertions(+), 109 deletions(-)

--
2.31.1.607.g51e8a6a459-goog


2021-05-06 19:52:09

by Ben Gardon

Subject: [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing

Small code deduplication. No functional change expected.

Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/x86.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index cf3b67679cf0..5bcf07465c47 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_hv_destroy_vm(kvm);
}

-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+static void free_memslot_rmap(struct kvm_memory_slot *slot)
{
int i;

for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.rmap[i]);
slot->arch.rmap[i] = NULL;
+ }
+}

- if (i == 0)
- continue;
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
+{
+ int i;
+
+ free_memslot_rmap(slot);

+ for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.lpage_info[i - 1]);
slot->arch.lpage_info[i - 1] = NULL;
}
@@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
return 0;

out_free:
- for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
- kvfree(slot->arch.rmap[i]);
- slot->arch.rmap[i] = NULL;
- if (i == 0)
- continue;
+ free_memslot_rmap(slot);

+ for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
kvfree(slot->arch.lpage_info[i - 1]);
slot->arch.lpage_info[i - 1] = NULL;
}
--
2.31.1.607.g51e8a6a459-goog

2021-05-06 19:52:14

by Ben Gardon

Subject: [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap

Small refactor to facilitate allocating rmaps for all memslots at once.

No functional change expected.

Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 5bcf07465c47..fc32a7dbe4c4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
kvm_page_track_free_memslot(slot);
}

+static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
+ unsigned long npages)
+{
+ int i;
+
+ for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+ int lpages;
+ int level = i + 1;
+
+ lpages = gfn_to_index(slot->base_gfn + npages - 1,
+ slot->base_gfn, level) + 1;
+
+ slot->arch.rmap[i] =
+ kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
+ GFP_KERNEL_ACCOUNT);
+ if (!slot->arch.rmap[i])
+ goto out_free;
+ }
+
+ return 0;
+
+out_free:
+ free_memslot_rmap(slot);
+ return -ENOMEM;
+}
+
static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
unsigned long npages)
{
int i;
+ int r;

/*
* Clear out the previous array pointers for the KVM_MR_MOVE case. The
@@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
*/
memset(&slot->arch, 0, sizeof(slot->arch));

- for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
+ r = alloc_memslot_rmap(slot, npages);
+ if (r)
+ return r;
+
+ for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
struct kvm_lpage_info *linfo;
unsigned long ugfn;
int lpages;
@@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
lpages = gfn_to_index(slot->base_gfn + npages - 1,
slot->base_gfn, level) + 1;

- slot->arch.rmap[i] =
- kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
- GFP_KERNEL_ACCOUNT);
- if (!slot->arch.rmap[i])
- goto out_free;
- if (i == 0)
- continue;
-
linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
if (!linfo)
goto out_free;
--
2.31.1.607.g51e8a6a459-goog

2021-05-06 19:52:57

by Ben Gardon

Subject: [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated

If only the TDP MMU is being used to manage the memory mappings for a VM,
then many rmap operations can be skipped, as they are guaranteed to be
no-ops. This saves the time that would otherwise be spent on the rmap
operations and avoids acquiring the MMU lock in write mode for many of
them.

This makes it safe to run the VM without rmaps allocated when only the
TDP MMU is in use, and sets the stage for waiting to allocate the rmaps
until they are needed.
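
The converted call sites all end up with roughly the following shape. This is
an illustrative sketch; the exact handlers, page levels, and flush logic vary
per call site:

if (kvm_memslots_have_rmaps(kvm)) {
	write_lock(&kvm->mmu_lock);
	/* ... walk the rmaps for the shadow MMU ... */
	write_unlock(&kvm->mmu_lock);
}

if (is_tdp_mmu_enabled(kvm)) {
	read_lock(&kvm->mmu_lock);
	/* ... handle the TDP MMU, which does not use rmaps ... */
	read_unlock(&kvm->mmu_lock);
}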

Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 128 +++++++++++++++++++++++++----------------
1 file changed, 77 insertions(+), 51 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 8761b4925755..730ea84bf7e7 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1189,6 +1189,10 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
if (is_tdp_mmu_enabled(kvm))
kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
slot->base_gfn + gfn_offset, mask, true);
+
+ if (!kvm_memslots_have_rmaps(kvm))
+ return;
+
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
PG_LEVEL_4K, slot);
@@ -1218,6 +1222,10 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
if (is_tdp_mmu_enabled(kvm))
kvm_tdp_mmu_clear_dirty_pt_masked(kvm, slot,
slot->base_gfn + gfn_offset, mask, false);
+
+ if (!kvm_memslots_have_rmaps(kvm))
+ return;
+
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
PG_LEVEL_4K, slot);
@@ -1260,9 +1268,12 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
int i;
bool write_protected = false;

- for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
- rmap_head = __gfn_to_rmap(gfn, i, slot);
- write_protected |= __rmap_write_protect(kvm, rmap_head, true);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
+ rmap_head = __gfn_to_rmap(gfn, i, slot);
+ write_protected |= __rmap_write_protect(kvm, rmap_head,
+ true);
+ }
}

if (is_tdp_mmu_enabled(kvm))
@@ -1433,9 +1444,10 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,

bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool flush;
+ bool flush = false;

- flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ flush = kvm_handle_gfn_range(kvm, range, kvm_unmap_rmapp);

if (is_tdp_mmu_enabled(kvm))
flush |= kvm_tdp_mmu_unmap_gfn_range(kvm, range, flush);
@@ -1445,9 +1457,10 @@ bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)

bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool flush;
+ bool flush = false;

- flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ flush = kvm_handle_gfn_range(kvm, range, kvm_set_pte_rmapp);

if (is_tdp_mmu_enabled(kvm))
flush |= kvm_tdp_mmu_set_spte_gfn(kvm, range);
@@ -1500,9 +1513,10 @@ static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)

bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool young;
+ bool young = false;

- young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ young = kvm_handle_gfn_range(kvm, range, kvm_age_rmapp);

if (is_tdp_mmu_enabled(kvm))
young |= kvm_tdp_mmu_age_gfn_range(kvm, range);
@@ -1512,9 +1526,10 @@ bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)

bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
- bool young;
+ bool young = false;

- young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);
+ if (kvm_memslots_have_rmaps(kvm))
+ young = kvm_handle_gfn_range(kvm, range, kvm_test_age_rmapp);

if (is_tdp_mmu_enabled(kvm))
young |= kvm_tdp_mmu_test_age_gfn(kvm, range);
@@ -5440,7 +5455,8 @@ static void kvm_mmu_zap_all_fast(struct kvm *kvm)
*/
kvm_reload_remote_mmus(kvm);

- kvm_zap_obsolete_pages(kvm);
+ if (kvm_memslots_have_rmaps(kvm))
+ kvm_zap_obsolete_pages(kvm);

write_unlock(&kvm->mmu_lock);

@@ -5492,29 +5508,29 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
int i;
bool flush = false;

- write_lock(&kvm->mmu_lock);
- for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
- slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(memslot, slots) {
- gfn_t start, end;
-
- start = max(gfn_start, memslot->base_gfn);
- end = min(gfn_end, memslot->base_gfn + memslot->npages);
- if (start >= end)
- continue;
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot(memslot, slots) {
+ gfn_t start, end;
+
+ start = max(gfn_start, memslot->base_gfn);
+ end = min(gfn_end, memslot->base_gfn + memslot->npages);
+ if (start >= end)
+ continue;

- flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
- PG_LEVEL_4K,
- KVM_MAX_HUGEPAGE_LEVEL,
- start, end - 1, true, flush);
+ flush = slot_handle_level_range(kvm, memslot,
+ kvm_zap_rmapp, PG_LEVEL_4K,
+ KVM_MAX_HUGEPAGE_LEVEL, start,
+ end - 1, true, flush);
+ }
}
+ if (flush)
+ kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+ write_unlock(&kvm->mmu_lock);
}

- if (flush)
- kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
-
- write_unlock(&kvm->mmu_lock);
-
if (is_tdp_mmu_enabled(kvm)) {
flush = false;

@@ -5541,12 +5557,15 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot,
int start_level)
{
- bool flush;
+ bool flush = false;

- write_lock(&kvm->mmu_lock);
- flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
- start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
- write_unlock(&kvm->mmu_lock);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
+ start_level, KVM_MAX_HUGEPAGE_LEVEL,
+ false);
+ write_unlock(&kvm->mmu_lock);
+ }

if (is_tdp_mmu_enabled(kvm)) {
read_lock(&kvm->mmu_lock);
@@ -5616,16 +5635,15 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
struct kvm_memory_slot *slot = (struct kvm_memory_slot *)memslot;
bool flush;

- write_lock(&kvm->mmu_lock);
- flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
-
- if (flush)
- kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
- write_unlock(&kvm->mmu_lock);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
+ if (flush)
+ kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
+ write_unlock(&kvm->mmu_lock);
+ }

if (is_tdp_mmu_enabled(kvm)) {
- flush = false;
-
read_lock(&kvm->mmu_lock);
flush = kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot, flush);
if (flush)
@@ -5652,11 +5670,14 @@ void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot)
{
- bool flush;
+ bool flush = false;

- write_lock(&kvm->mmu_lock);
- flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty, false);
- write_unlock(&kvm->mmu_lock);
+ if (kvm_memslots_have_rmaps(kvm)) {
+ write_lock(&kvm->mmu_lock);
+ flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
+ false);
+ write_unlock(&kvm->mmu_lock);
+ }

if (is_tdp_mmu_enabled(kvm)) {
read_lock(&kvm->mmu_lock);
@@ -5681,6 +5702,14 @@ void kvm_mmu_zap_all(struct kvm *kvm)
int ign;

write_lock(&kvm->mmu_lock);
+ if (is_tdp_mmu_enabled(kvm))
+ kvm_tdp_mmu_zap_all(kvm);
+
+ if (!kvm_memslots_have_rmaps(kvm)) {
+ write_unlock(&kvm->mmu_lock);
+ return;
+ }
+
restart:
list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
if (WARN_ON(sp->role.invalid))
@@ -5693,9 +5722,6 @@ void kvm_mmu_zap_all(struct kvm *kvm)

kvm_mmu_commit_zap_page(kvm, &invalid_list);

- if (is_tdp_mmu_enabled(kvm))
- kvm_tdp_mmu_zap_all(kvm);
-
write_unlock(&kvm->mmu_lock);
}

--
2.31.1.607.g51e8a6a459-goog

2021-05-06 19:52:59

by Ben Gardon

Subject: [PATCH v3 4/8] KVM: mmu: Add slots_arch_lock for memslot arch fields

Add a new lock to protect the arch-specific fields of memslots if they
need to be modified in a kvm->srcu read critical section. A future
commit will use this lock to lazily allocate memslot rmaps for x86.
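
Condensed from the diff below, the resulting ordering is roughly as follows
(illustrative only; error paths and the re-lock on the KVM_MR_DELETE/MOVE
path are omitted):

mutex_lock(&kvm->slots_arch_lock);		/* in kvm_set_memslot() */
slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
/* ... arch code may modify the arch fields of the copied memslots ... */
rcu_assign_pointer(kvm->memslots[as_id], slots);	/* in install_new_memslots() */
mutex_unlock(&kvm->slots_arch_lock);		/* dropped before the SRCU sync */
synchronize_srcu_expedited(&kvm->srcu);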

Signed-off-by: Ben Gardon <[email protected]>
---
include/linux/kvm_host.h | 9 +++++++++
virt/kvm/kvm_main.c | 31 ++++++++++++++++++++++++++-----
2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 8895b95b6a22..2d5e797fbb08 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -472,6 +472,15 @@ struct kvm {
#endif /* KVM_HAVE_MMU_RWLOCK */

struct mutex slots_lock;
+
+ /*
+ * Protects the arch-specific fields of struct kvm_memory_slots in
+ * use by the VM. To be used under the slots_lock (above) or in a
+ * kvm->srcu read critical section where acquiring the slots_lock
+ * would lead to deadlock with the synchronize_srcu in
+ * install_new_memslots.
+ */
+ struct mutex slots_arch_lock;
struct mm_struct *mm; /* userspace tied to this vm */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
struct kvm_vcpu *vcpus[KVM_MAX_VCPUS];
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c8010f55e368..97b03fa2d0c8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -908,6 +908,7 @@ static struct kvm *kvm_create_vm(unsigned long type)
mutex_init(&kvm->lock);
mutex_init(&kvm->irq_lock);
mutex_init(&kvm->slots_lock);
+ mutex_init(&kvm->slots_arch_lock);
INIT_LIST_HEAD(&kvm->devices);

BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX);
@@ -1280,6 +1281,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;

rcu_assign_pointer(kvm->memslots[as_id], slots);
+
+ /* Acquired in kvm_set_memslot. */
+ mutex_unlock(&kvm->slots_arch_lock);
+
synchronize_srcu_expedited(&kvm->srcu);

/*
@@ -1351,6 +1356,9 @@ static int kvm_set_memslot(struct kvm *kvm,
struct kvm_memslots *slots;
int r;

+ /* Released in install_new_memslots. */
+ mutex_lock(&kvm->slots_arch_lock);
+
slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
if (!slots)
return -ENOMEM;
@@ -1364,10 +1372,9 @@ static int kvm_set_memslot(struct kvm *kvm,
slot->flags |= KVM_MEMSLOT_INVALID;

/*
- * We can re-use the old memslots, the only difference from the
- * newly installed memslots is the invalid flag, which will get
- * dropped by update_memslots anyway. We'll also revert to the
- * old memslots if preparing the new memory region fails.
+ * We can re-use the memory from the old memslots.
+ * It will be overwritten with a copy of the new memslots
+ * after reacquiring the slots_arch_lock below.
*/
slots = install_new_memslots(kvm, as_id, slots);

@@ -1379,6 +1386,17 @@ static int kvm_set_memslot(struct kvm *kvm,
* - kvm_is_visible_gfn (mmu_check_root)
*/
kvm_arch_flush_shadow_memslot(kvm, slot);
+
+ /* Released in install_new_memslots. */
+ mutex_lock(&kvm->slots_arch_lock);
+
+ /*
+ * The arch-specific fields of the memslots could have changed
+ * between releasing the slots_arch_lock in
+ * install_new_memslots and here, so get a fresh copy of the
+ * slots.
+ */
+ kvm_copy_memslots(__kvm_memslots(kvm, as_id), slots);
}

r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
@@ -1394,8 +1412,11 @@ static int kvm_set_memslot(struct kvm *kvm,
return 0;

out_slots:
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+ slot = id_to_memslot(slots, old->id);
+ slot->flags &= ~KVM_MEMSLOT_INVALID;
slots = install_new_memslots(kvm, as_id, slots);
+ }
kvfree(slots);
return r;
}
--
2.31.1.607.g51e8a6a459-goog

2021-05-06 19:53:02

by Ben Gardon

Subject: [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps

If the TDP MMU is in use, wait to allocate the rmaps until the shadow
MMU is actually used (i.e. a nested VM is launched). This saves memory
equal to 0.2% of guest memory in cases where the TDP MMU is used and
there are no nested guests involved.
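
In condensed form, the allocation path added below looks roughly like this
(illustrative; per-slot error handling omitted):

int alloc_all_memslots_rmaps(struct kvm *kvm)
{
	if (kvm_memslots_have_rmaps(kvm))
		return 0;	/* already allocated */

	mutex_lock(&kvm->slots_arch_lock);
	/* ... allocate rmaps for every memslot in every address space ... */
	smp_store_release(&kvm->arch.memslots_have_rmaps, true);
	mutex_unlock(&kvm->slots_arch_lock);
	return 0;
}

The release store pairs with the smp_load_acquire() in
kvm_memslots_have_rmaps(), so a reader that sees the flag set also sees the
rmap pointers it guards.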

Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu/mmu.c | 14 ++++++++++---
arch/x86/kvm/mmu/tdp_mmu.c | 6 ++++--
arch/x86/kvm/mmu/tdp_mmu.h | 4 ++--
arch/x86/kvm/x86.c | 37 ++++++++++++++++++++++++++++++++-
5 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 00065f9bbc5e..7b8e1532fb55 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1860,5 +1860,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
int kvm_cpu_dirty_log_size(void);

inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
+int alloc_all_memslots_rmaps(struct kvm *kvm);

#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 48067c572c02..e3a3b65829c5 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3306,6 +3306,10 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
}
}

+ r = alloc_all_memslots_rmaps(vcpu->kvm);
+ if (r)
+ return r;
+
write_lock(&vcpu->kvm->mmu_lock);
r = make_mmu_pages_available(vcpu);
if (r < 0)
@@ -5494,9 +5498,13 @@ void kvm_mmu_init_vm(struct kvm *kvm)
{
struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;

- kvm_mmu_init_tdp_mmu(kvm);
-
- kvm->arch.memslots_have_rmaps = true;
+ if (!kvm_mmu_init_tdp_mmu(kvm))
+ /*
+ * No smp_load/store wrappers needed here as we are in
+ * VM init and there cannot be any memslots / other threads
+ * accessing this struct kvm yet.
+ */
+ kvm->arch.memslots_have_rmaps = true;

node->track_write = kvm_mmu_pte_write;
node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 83cbdbe5de5a..5342aca2c8e0 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -14,10 +14,10 @@ static bool __read_mostly tdp_mmu_enabled = false;
module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);

/* Initializes the TDP MMU for the VM, if enabled. */
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm)
{
if (!tdp_enabled || !READ_ONCE(tdp_mmu_enabled))
- return;
+ return false;

/* This should not be changed for the lifetime of the VM. */
kvm->arch.tdp_mmu_enabled = true;
@@ -25,6 +25,8 @@ void kvm_mmu_init_tdp_mmu(struct kvm *kvm)
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_roots);
spin_lock_init(&kvm->arch.tdp_mmu_pages_lock);
INIT_LIST_HEAD(&kvm->arch.tdp_mmu_pages);
+
+ return true;
}

static __always_inline void kvm_lockdep_assert_mmu_lock_held(struct kvm *kvm,
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index 5fdf63090451..b046ab5137a1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -80,12 +80,12 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes,
int *root_level);

#ifdef CONFIG_X86_64
-void kvm_mmu_init_tdp_mmu(struct kvm *kvm);
+bool kvm_mmu_init_tdp_mmu(struct kvm *kvm);
void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
#else
-static inline void kvm_mmu_init_tdp_mmu(struct kvm *kvm) {}
+static inline bool kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return false; }
static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1098ab73a704..95e74fb9fc20 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10868,9 +10868,44 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
return -ENOMEM;
}

+int alloc_all_memslots_rmaps(struct kvm *kvm)
+{
+ struct kvm_memslots *slots;
+ struct kvm_memory_slot *slot;
+ int r = 0;
+ int i;
+
+ if (kvm_memslots_have_rmaps(kvm))
+ return 0;
+
+ mutex_lock(&kvm->slots_arch_lock);
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ slots = __kvm_memslots(kvm, i);
+ kvm_for_each_memslot(slot, slots) {
+ r = alloc_memslot_rmap(slot, slot->npages);
+ if (r) {
+ mutex_unlock(&kvm->slots_arch_lock);
+ return r;
+ }
+ }
+ }
+
+ /*
+ * memslots_have_rmaps is set and read in different lock contexts,
+ * so protect it with smp_load/store.
+ */
+ smp_store_release(&kvm->arch.memslots_have_rmaps, true);
+ mutex_unlock(&kvm->slots_arch_lock);
+ return 0;
+}
+
bool kvm_memslots_have_rmaps(struct kvm *kvm)
{
- return kvm->arch.memslots_have_rmaps;
+ /*
+ * memslots_have_rmaps is set and read in different lock contexts,
+ * so protect it with smp_load/store.
+ */
+ return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
}

static int kvm_alloc_memslot_metadata(struct kvm *kvm,
--
2.31.1.607.g51e8a6a459-goog

2021-05-06 19:53:04

by Ben Gardon

Subject: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

Add a field to control whether new memslots should have rmaps allocated
for them. As of this change, it's not safe to skip allocating rmaps, so
the field is always set to allocate rmaps. Future changes will make it
safe to operate without rmaps, using the TDP MMU. Then further changes
will allow the rmaps to be allocated lazily when needed for nested
operation.

No functional change expected.

Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 8 ++++++++
arch/x86/kvm/mmu/mmu.c | 2 ++
arch/x86/kvm/x86.c | 18 +++++++++++++-----
3 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ad22d4839bcc..00065f9bbc5e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1122,6 +1122,12 @@ struct kvm_arch {
*/
spinlock_t tdp_mmu_pages_lock;
#endif /* CONFIG_X86_64 */
+
+ /*
+ * If set, rmaps have been allocated for all memslots and should be
+ * allocated for any newly created or modified memslots.
+ */
+ bool memslots_have_rmaps;
};

struct kvm_vm_stat {
@@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)

int kvm_cpu_dirty_log_size(void);

+inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
+
#endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 930ac8a7e7c9..8761b4925755 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5469,6 +5469,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)

kvm_mmu_init_tdp_mmu(kvm);

+ kvm->arch.memslots_have_rmaps = true;
+
node->track_write = kvm_mmu_pte_write;
node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
kvm_page_track_register_notifier(kvm, node);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index fc32a7dbe4c4..d7a40ce342cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10868,7 +10868,13 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
return -ENOMEM;
}

-static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
+bool kvm_memslots_have_rmaps(struct kvm *kvm)
+{
+ return kvm->arch.memslots_have_rmaps;
+}
+
+static int kvm_alloc_memslot_metadata(struct kvm *kvm,
+ struct kvm_memory_slot *slot,
unsigned long npages)
{
int i;
@@ -10881,9 +10887,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
*/
memset(&slot->arch, 0, sizeof(slot->arch));

- r = alloc_memslot_rmap(slot, npages);
- if (r)
- return r;
+ if (kvm_memslots_have_rmaps(kvm)) {
+ r = alloc_memslot_rmap(slot, npages);
+ if (r)
+ return r;
+ }

for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
struct kvm_lpage_info *linfo;
@@ -10954,7 +10962,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
enum kvm_mr_change change)
{
if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
- return kvm_alloc_memslot_metadata(memslot,
+ return kvm_alloc_memslot_metadata(kvm, memslot,
mem->memory_size >> PAGE_SHIFT);
return 0;
}
--
2.31.1.607.g51e8a6a459-goog

2021-05-07 00:01:04

by Ben Gardon

Subject: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

In preparation for lazily allocating the rmaps when the TDP MMU is in
use, protect the rmaps with SRCU. Unfortunately, this requires
propagating a pointer to struct kvm around to several functions.
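
Pulled out of the diff below for clarity, the publish/consume pattern is
roughly:

/* Writer, alloc_memslot_rmap(): publish the newly allocated array. */
rcu_assign_pointer(slot->arch.rmap[i],
		   kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
			    GFP_KERNEL_ACCOUNT));

/* Reader, __gfn_to_rmap(): dereference under kvm->srcu or slots_arch_lock. */
head = srcu_dereference_check(slot->arch.rmap[level - PG_LEVEL_4K],
			      &kvm->srcu,
			      lockdep_is_held(&kvm->slots_arch_lock));
return &head[idx];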

Suggested-by: Paolo Bonzini <[email protected]>
Signed-off-by: Ben Gardon <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 57 +++++++++++++++++++++++++-----------------
arch/x86/kvm/x86.c | 6 ++---
2 files changed, 37 insertions(+), 26 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 730ea84bf7e7..48067c572c02 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -927,13 +927,18 @@ static void pte_list_remove(struct kvm_rmap_head *rmap_head, u64 *sptep)
__pte_list_remove(sptep, rmap_head);
}

-static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
+static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
+ int level,
struct kvm_memory_slot *slot)
{
+ struct kvm_rmap_head *head;
unsigned long idx;

idx = gfn_to_index(gfn, slot->base_gfn, level);
- return &slot->arch.rmap[level - PG_LEVEL_4K][idx];
+ head = srcu_dereference_check(slot->arch.rmap[level - PG_LEVEL_4K],
+ &kvm->srcu,
+ lockdep_is_held(&kvm->slots_arch_lock));
+ return &head[idx];
}

static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
@@ -944,7 +949,7 @@ static struct kvm_rmap_head *gfn_to_rmap(struct kvm *kvm, gfn_t gfn,

slots = kvm_memslots_for_spte_role(kvm, sp->role);
slot = __gfn_to_memslot(slots, gfn);
- return __gfn_to_rmap(gfn, sp->role.level, slot);
+ return __gfn_to_rmap(kvm, gfn, sp->role.level, slot);
}

static bool rmap_can_add(struct kvm_vcpu *vcpu)
@@ -1194,7 +1199,8 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
return;

while (mask) {
- rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+ rmap_head = __gfn_to_rmap(kvm,
+ slot->base_gfn + gfn_offset + __ffs(mask),
PG_LEVEL_4K, slot);
__rmap_write_protect(kvm, rmap_head, false);

@@ -1227,7 +1233,8 @@ static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
return;

while (mask) {
- rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
+ rmap_head = __gfn_to_rmap(kvm,
+ slot->base_gfn + gfn_offset + __ffs(mask),
PG_LEVEL_4K, slot);
__rmap_clear_dirty(kvm, rmap_head, slot);

@@ -1270,7 +1277,7 @@ bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,

if (kvm_memslots_have_rmaps(kvm)) {
for (i = PG_LEVEL_4K; i <= KVM_MAX_HUGEPAGE_LEVEL; ++i) {
- rmap_head = __gfn_to_rmap(gfn, i, slot);
+ rmap_head = __gfn_to_rmap(kvm, gfn, i, slot);
write_protected |= __rmap_write_protect(kvm, rmap_head,
true);
}
@@ -1373,17 +1380,19 @@ struct slot_rmap_walk_iterator {
};

static void
-rmap_walk_init_level(struct slot_rmap_walk_iterator *iterator, int level)
+rmap_walk_init_level(struct kvm *kvm, struct slot_rmap_walk_iterator *iterator,
+ int level)
{
iterator->level = level;
iterator->gfn = iterator->start_gfn;
- iterator->rmap = __gfn_to_rmap(iterator->gfn, level, iterator->slot);
- iterator->end_rmap = __gfn_to_rmap(iterator->end_gfn, level,
+ iterator->rmap = __gfn_to_rmap(kvm, iterator->gfn, level,
+ iterator->slot);
+ iterator->end_rmap = __gfn_to_rmap(kvm, iterator->end_gfn, level,
iterator->slot);
}

static void
-slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
+slot_rmap_walk_init(struct kvm *kvm, struct slot_rmap_walk_iterator *iterator,
struct kvm_memory_slot *slot, int start_level,
int end_level, gfn_t start_gfn, gfn_t end_gfn)
{
@@ -1393,7 +1402,7 @@ slot_rmap_walk_init(struct slot_rmap_walk_iterator *iterator,
iterator->start_gfn = start_gfn;
iterator->end_gfn = end_gfn;

- rmap_walk_init_level(iterator, iterator->start_level);
+ rmap_walk_init_level(kvm, iterator, iterator->start_level);
}

static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
@@ -1401,7 +1410,8 @@ static bool slot_rmap_walk_okay(struct slot_rmap_walk_iterator *iterator)
return !!iterator->rmap;
}

-static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
+static void slot_rmap_walk_next(struct kvm *kvm,
+ struct slot_rmap_walk_iterator *iterator)
{
if (++iterator->rmap <= iterator->end_rmap) {
iterator->gfn += (1UL << KVM_HPAGE_GFN_SHIFT(iterator->level));
@@ -1413,15 +1423,15 @@ static void slot_rmap_walk_next(struct slot_rmap_walk_iterator *iterator)
return;
}

- rmap_walk_init_level(iterator, iterator->level);
+ rmap_walk_init_level(kvm, iterator, iterator->level);
}

-#define for_each_slot_rmap_range(_slot_, _start_level_, _end_level_, \
- _start_gfn, _end_gfn, _iter_) \
- for (slot_rmap_walk_init(_iter_, _slot_, _start_level_, \
- _end_level_, _start_gfn, _end_gfn); \
- slot_rmap_walk_okay(_iter_); \
- slot_rmap_walk_next(_iter_))
+#define for_each_slot_rmap_range(_kvm_, _slot_, _start_level_, _end_level_, \
+ _start_gfn, _end_gfn, _iter_) \
+ for (slot_rmap_walk_init(_kvm_, _iter_, _slot_, _start_level_, \
+ _end_level_, _start_gfn, _end_gfn); \
+ slot_rmap_walk_okay(_iter_); \
+ slot_rmap_walk_next(_kvm_, _iter_))

typedef bool (*rmap_handler_t)(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
struct kvm_memory_slot *slot, gfn_t gfn,
@@ -1434,8 +1444,9 @@ static __always_inline bool kvm_handle_gfn_range(struct kvm *kvm,
struct slot_rmap_walk_iterator iterator;
bool ret = false;

- for_each_slot_rmap_range(range->slot, PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
- range->start, range->end - 1, &iterator)
+ for_each_slot_rmap_range(kvm, range->slot, PG_LEVEL_4K,
+ KVM_MAX_HUGEPAGE_LEVEL, range->start,
+ range->end - 1, &iterator)
ret |= handler(kvm, iterator.rmap, range->slot, iterator.gfn,
iterator.level, range->pte);

@@ -5233,8 +5244,8 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
{
struct slot_rmap_walk_iterator iterator;

- for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
- end_gfn, &iterator) {
+ for_each_slot_rmap_range(kvm, memslot, start_level, end_level,
+ start_gfn, end_gfn, &iterator) {
if (iterator.rmap)
flush |= fn(kvm, iterator.rmap, memslot);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d7a40ce342cc..1098ab73a704 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10854,9 +10854,9 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
lpages = gfn_to_index(slot->base_gfn + npages - 1,
slot->base_gfn, level) + 1;

- slot->arch.rmap[i] =
- kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
- GFP_KERNEL_ACCOUNT);
+ rcu_assign_pointer(slot->arch.rmap[i],
+ kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
+ GFP_KERNEL_ACCOUNT));
if (!slot->arch.rmap[i])
goto out_free;
}
--
2.31.1.607.g51e8a6a459-goog

2021-05-07 00:01:18

by Ben Gardon

Subject: [PATCH v3 3/8] KVM: mmu: Refactor memslot copy

Factor out copying kvm_memslots from allocating the memory for new ones
in preparation for adding a new lock to protect the arch-specific fields
of the memslots.

No functional change intended.

Signed-off-by: Ben Gardon <[email protected]>
---
virt/kvm/kvm_main.c | 23 ++++++++++++++++-------
1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 2799c6660cce..c8010f55e368 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1306,6 +1306,18 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
return old_memslots;
}

+static size_t kvm_memslots_size(int slots)
+{
+ return sizeof(struct kvm_memslots) +
+ (sizeof(struct kvm_memory_slot) * slots);
+}
+
+static void kvm_copy_memslots(struct kvm_memslots *from,
+ struct kvm_memslots *to)
+{
+ memcpy(to, from, kvm_memslots_size(from->used_slots));
+}
+
/*
* Note, at a minimum, the current number of used slots must be allocated, even
* when deleting a memslot, as we need a complete duplicate of the memslots for
@@ -1315,19 +1327,16 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
enum kvm_mr_change change)
{
struct kvm_memslots *slots;
- size_t old_size, new_size;
-
- old_size = sizeof(struct kvm_memslots) +
- (sizeof(struct kvm_memory_slot) * old->used_slots);
+ size_t new_size;

if (change == KVM_MR_CREATE)
- new_size = old_size + sizeof(struct kvm_memory_slot);
+ new_size = kvm_memslots_size(old->used_slots + 1);
else
- new_size = old_size;
+ new_size = kvm_memslots_size(old->used_slots);

slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
if (likely(slots))
- memcpy(slots, old, old_size);
+ kvm_copy_memslots(old, slots);

return slots;
}
--
2.31.1.607.g51e8a6a459-goog

2021-05-07 01:13:54

by Ben Gardon

Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

On Thu, May 6, 2021 at 11:43 AM Ben Gardon <[email protected]> wrote:
>
> Add a field to control whether new memslots should have rmaps allocated
> for them. As of this change, it's not safe to skip allocating rmaps, so
> the field is always set to allocate rmaps. Future changes will make it
> safe to operate without rmaps, using the TDP MMU. Then further changes
> will allow the rmaps to be allocated lazily when needed for nested
> operation.
>
> No functional change expected.
>
> Signed-off-by: Ben Gardon <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 8 ++++++++
> arch/x86/kvm/mmu/mmu.c | 2 ++
> arch/x86/kvm/x86.c | 18 +++++++++++++-----
> 3 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index ad22d4839bcc..00065f9bbc5e 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -1122,6 +1122,12 @@ struct kvm_arch {
> */
> spinlock_t tdp_mmu_pages_lock;
> #endif /* CONFIG_X86_64 */
> +
> + /*
> + * If set, rmaps have been allocated for all memslots and should be
> + * allocated for any newly created or modified memslots.
> + */
> + bool memslots_have_rmaps;
> };
>
> struct kvm_vm_stat {
> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
>
> int kvm_cpu_dirty_log_size(void);
>
> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);

Woops, this shouldn't be marked inline as it creates build problems
for the next patch with some configs.
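
Presumably the fix is simply to drop the qualifier from the declaration,
i.e. something like:

bool kvm_memslots_have_rmaps(struct kvm *kvm);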

> +
> #endif /* _ASM_X86_KVM_HOST_H */
> diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
> index 930ac8a7e7c9..8761b4925755 100644
> --- a/arch/x86/kvm/mmu/mmu.c
> +++ b/arch/x86/kvm/mmu/mmu.c
> @@ -5469,6 +5469,8 @@ void kvm_mmu_init_vm(struct kvm *kvm)
>
> kvm_mmu_init_tdp_mmu(kvm);
>
> + kvm->arch.memslots_have_rmaps = true;
> +
> node->track_write = kvm_mmu_pte_write;
> node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
> kvm_page_track_register_notifier(kvm, node);
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index fc32a7dbe4c4..d7a40ce342cc 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10868,7 +10868,13 @@ static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
> return -ENOMEM;
> }
>
> -static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> +bool kvm_memslots_have_rmaps(struct kvm *kvm)
> +{
> + return kvm->arch.memslots_have_rmaps;
> +}
> +
> +static int kvm_alloc_memslot_metadata(struct kvm *kvm,
> + struct kvm_memory_slot *slot,
> unsigned long npages)
> {
> int i;
> @@ -10881,9 +10887,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> */
> memset(&slot->arch, 0, sizeof(slot->arch));
>
> - r = alloc_memslot_rmap(slot, npages);
> - if (r)
> - return r;
> + if (kvm_memslots_have_rmaps(kvm)) {
> + r = alloc_memslot_rmap(slot, npages);
> + if (r)
> + return r;
> + }
>
> for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> struct kvm_lpage_info *linfo;
> @@ -10954,7 +10962,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
> enum kvm_mr_change change)
> {
> if (change == KVM_MR_CREATE || change == KVM_MR_MOVE)
> - return kvm_alloc_memslot_metadata(memslot,
> + return kvm_alloc_memslot_metadata(kvm, memslot,
> mem->memory_size >> PAGE_SHIFT);
> return 0;
> }
> --
> 2.31.1.607.g51e8a6a459-goog
>

2021-05-07 01:20:17

by kernel test robot

Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-s021-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
git checkout dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
>> arch/x86/kvm/mmu/mmu.c:938:16: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> arch/x86/kvm/mmu/mmu.c:938:16: sparse: struct kvm_rmap_head [noderef] __rcu *
>> arch/x86/kvm/mmu/mmu.c:938:16: sparse: struct kvm_rmap_head *
arch/x86/kvm/mmu/mmu.c: note: in included file (through include/linux/kvm_host.h, arch/x86/kvm/irq.h):
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
--
arch/x86/kvm/x86.c:10926:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
>> arch/x86/kvm/x86.c:10926:17: sparse: struct kvm_rmap_head [noderef] __rcu *
>> arch/x86/kvm/x86.c:10926:17: sparse: struct kvm_rmap_head *
arch/x86/kvm/x86.c: note: in included file (through include/linux/notifier.h, include/linux/memory_hotplug.h, include/linux/mmzone.h, ...):
include/linux/srcu.h:182:9: sparse: sparse: context imbalance in 'vcpu_enter_guest' - unexpected unlock

vim +938 arch/x86/kvm/mmu/mmu.c

929
930 static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
931 int level,
932 struct kvm_memory_slot *slot)
933 {
934 struct kvm_rmap_head *head;
935 unsigned long idx;
936
937 idx = gfn_to_index(gfn, slot->base_gfn, level);
> 938 head = srcu_dereference_check(slot->arch.rmap[level - PG_LEVEL_4K],
939 &kvm->srcu,
940 lockdep_is_held(&kvm->slots_arch_lock));
941 return &head[idx];
942 }
943

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-05-07 01:41:22

by kernel test robot

Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

Hi Ben,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kvm/queue]
[also build test ERROR on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: i386-randconfig-r032-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
git checkout dd56af97c1d2b22f9acd46d33c83ac5e7bf67acc
# save the attached .config to linux build tree
make W=1 W=1 ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All error/warnings (new ones prefixed by >>):

In file included from arch/x86/kvm/mmu/mmu.c:1821:
arch/x86/kvm/mmu/mmu_audit.c: In function 'inspect_spte_has_rmap':
>> arch/x86/kvm/mmu/mmu_audit.c:150:28: warning: passing argument 1 of '__gfn_to_rmap' makes pointer from integer without a cast [-Wint-conversion]
150 | rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
| ^~~
| |
| gfn_t {aka long long unsigned int}
arch/x86/kvm/mmu/mmu.c:930:56: note: expected 'struct kvm *' but argument is of type 'gfn_t' {aka 'long long unsigned int'}
930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
| ~~~~~~~~~~~~^~~
In file included from arch/x86/kvm/mmu/mmu.c:1821:
>> arch/x86/kvm/mmu/mmu_audit.c:150:53: warning: passing argument 3 of '__gfn_to_rmap' makes integer from pointer without a cast [-Wint-conversion]
150 | rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
| ^~~~
| |
| struct kvm_memory_slot *
arch/x86/kvm/mmu/mmu.c:931:13: note: expected 'int' but argument is of type 'struct kvm_memory_slot *'
931 | int level,
| ~~~~^~~~~
In file included from arch/x86/kvm/mmu/mmu.c:1821:
>> arch/x86/kvm/mmu/mmu_audit.c:150:14: error: too few arguments to function '__gfn_to_rmap'
150 | rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
| ^~~~~~~~~~~~~
arch/x86/kvm/mmu/mmu.c:930:30: note: declared here
930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
| ^~~~~~~~~~~~~
In file included from arch/x86/kvm/mmu/mmu.c:1821:
arch/x86/kvm/mmu/mmu_audit.c: In function 'audit_write_protection':
arch/x86/kvm/mmu/mmu_audit.c:203:30: warning: passing argument 1 of '__gfn_to_rmap' makes pointer from integer without a cast [-Wint-conversion]
203 | rmap_head = __gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
| ~~^~~~~
| |
| gfn_t {aka long long unsigned int}
arch/x86/kvm/mmu/mmu.c:930:56: note: expected 'struct kvm *' but argument is of type 'gfn_t' {aka 'long long unsigned int'}
930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
| ~~~~~~~~~~~~^~~
In file included from arch/x86/kvm/mmu/mmu.c:1821:
arch/x86/kvm/mmu/mmu_audit.c:203:50: warning: passing argument 3 of '__gfn_to_rmap' makes integer from pointer without a cast [-Wint-conversion]
203 | rmap_head = __gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
| ^~~~
| |
| struct kvm_memory_slot *
arch/x86/kvm/mmu/mmu.c:931:13: note: expected 'int' but argument is of type 'struct kvm_memory_slot *'
931 | int level,
| ~~~~^~~~~
In file included from arch/x86/kvm/mmu/mmu.c:1821:
arch/x86/kvm/mmu/mmu_audit.c:203:14: error: too few arguments to function '__gfn_to_rmap'
203 | rmap_head = __gfn_to_rmap(sp->gfn, PG_LEVEL_4K, slot);
| ^~~~~~~~~~~~~
arch/x86/kvm/mmu/mmu.c:930:30: note: declared here
930 | static struct kvm_rmap_head *__gfn_to_rmap(struct kvm *kvm, gfn_t gfn,
| ^~~~~~~~~~~~~


vim +/__gfn_to_rmap +150 arch/x86/kvm/mmu/mmu_audit.c

2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 125
eb2591865a234c arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 126 static void inspect_spte_has_rmap(struct kvm *kvm, u64 *sptep)
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 127 {
bd80158aff71a8 arch/x86/kvm/mmu_audit.c Jan Kiszka 2011-09-12 128 static DEFINE_RATELIMIT_STATE(ratelimit_state, 5 * HZ, 10);
018aabb56d6109 arch/x86/kvm/mmu_audit.c Takuya Yoshikawa 2015-11-20 129 struct kvm_rmap_head *rmap_head;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 130 struct kvm_mmu_page *rev_sp;
699023e239658e arch/x86/kvm/mmu_audit.c Paolo Bonzini 2015-05-18 131 struct kvm_memslots *slots;
699023e239658e arch/x86/kvm/mmu_audit.c Paolo Bonzini 2015-05-18 132 struct kvm_memory_slot *slot;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 133 gfn_t gfn;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 134
573546820b792e arch/x86/kvm/mmu/mmu_audit.c Sean Christopherson 2020-06-22 135 rev_sp = sptep_to_sp(sptep);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 136 gfn = kvm_mmu_page_get_gfn(rev_sp, sptep - rev_sp->spt);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 137
699023e239658e arch/x86/kvm/mmu_audit.c Paolo Bonzini 2015-05-18 138 slots = kvm_memslots_for_spte_role(kvm, rev_sp->role);
699023e239658e arch/x86/kvm/mmu_audit.c Paolo Bonzini 2015-05-18 139 slot = __gfn_to_memslot(slots, gfn);
699023e239658e arch/x86/kvm/mmu_audit.c Paolo Bonzini 2015-05-18 140 if (!slot) {
bd80158aff71a8 arch/x86/kvm/mmu_audit.c Jan Kiszka 2011-09-12 141 if (!__ratelimit(&ratelimit_state))
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 142 return;
b034cf0105235e arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-12-23 143 audit_printk(kvm, "no memslot for gfn %llx\n", gfn);
b034cf0105235e arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-12-23 144 audit_printk(kvm, "index %ld of sp (gfn=%llx)\n",
38904e128778c3 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-09-27 145 (long int)(sptep - rev_sp->spt), rev_sp->gfn);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 146 dump_stack();
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 147 return;
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 148 }
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 149
018aabb56d6109 arch/x86/kvm/mmu_audit.c Takuya Yoshikawa 2015-11-20 @150 rmap_head = __gfn_to_rmap(gfn, rev_sp->role.level, slot);
018aabb56d6109 arch/x86/kvm/mmu_audit.c Takuya Yoshikawa 2015-11-20 151 if (!rmap_head->val) {
bd80158aff71a8 arch/x86/kvm/mmu_audit.c Jan Kiszka 2011-09-12 152 if (!__ratelimit(&ratelimit_state))
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 153 return;
b034cf0105235e arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-12-23 154 audit_printk(kvm, "no rmap for writable spte %llx\n",
b034cf0105235e arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-12-23 155 *sptep);
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 156 dump_stack();
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 157 }
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 158 }
2f4f337248cd56 arch/x86/kvm/mmu_audit.c Xiao Guangrong 2010-08-30 159

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-05-07 01:56:06

by kernel test robot

Subject: Re: [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-s021-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/43798461d3f4a38ef487d9c626dd0fb13e403328
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
git checkout 43798461d3f4a38ef487d9c626dd0fb13e403328
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
arch/x86/kvm/x86.c:10926:17: sparse: sparse: incompatible types in comparison expression (different address spaces):
arch/x86/kvm/x86.c:10926:17: sparse: struct kvm_rmap_head [noderef] __rcu *
arch/x86/kvm/x86.c:10926:17: sparse: struct kvm_rmap_head *
arch/x86/kvm/x86.c: note: in included file (through include/linux/notifier.h, include/linux/memory_hotplug.h, include/linux/mmzone.h, ...):
include/linux/srcu.h:182:9: sparse: sparse: context imbalance in 'vcpu_enter_guest' - unexpected unlock
>> arch/x86/kvm/x86.c:10971:29: sparse: sparse: marked inline, but without a definition

vim +10971 arch/x86/kvm/x86.c

db3fe4eb45f355 Takuya Yoshikawa 2012-02-08 10913
11bb59d1c3e83b Ben Gardon 2021-05-06 10914 static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
11bb59d1c3e83b Ben Gardon 2021-05-06 10915 unsigned long npages)
11bb59d1c3e83b Ben Gardon 2021-05-06 10916 {
11bb59d1c3e83b Ben Gardon 2021-05-06 10917 int i;
11bb59d1c3e83b Ben Gardon 2021-05-06 10918
11bb59d1c3e83b Ben Gardon 2021-05-06 10919 for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
11bb59d1c3e83b Ben Gardon 2021-05-06 10920 int lpages;
11bb59d1c3e83b Ben Gardon 2021-05-06 10921 int level = i + 1;
11bb59d1c3e83b Ben Gardon 2021-05-06 10922
11bb59d1c3e83b Ben Gardon 2021-05-06 10923 lpages = gfn_to_index(slot->base_gfn + npages - 1,
11bb59d1c3e83b Ben Gardon 2021-05-06 10924 slot->base_gfn, level) + 1;
11bb59d1c3e83b Ben Gardon 2021-05-06 10925
dd56af97c1d2b2 Ben Gardon 2021-05-06 @10926 rcu_assign_pointer(slot->arch.rmap[i],
11bb59d1c3e83b Ben Gardon 2021-05-06 10927 kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
dd56af97c1d2b2 Ben Gardon 2021-05-06 10928 GFP_KERNEL_ACCOUNT));
11bb59d1c3e83b Ben Gardon 2021-05-06 10929 if (!slot->arch.rmap[i])
11bb59d1c3e83b Ben Gardon 2021-05-06 10930 goto out_free;
11bb59d1c3e83b Ben Gardon 2021-05-06 10931 }
11bb59d1c3e83b Ben Gardon 2021-05-06 10932
11bb59d1c3e83b Ben Gardon 2021-05-06 10933 return 0;
11bb59d1c3e83b Ben Gardon 2021-05-06 10934
11bb59d1c3e83b Ben Gardon 2021-05-06 10935 out_free:
11bb59d1c3e83b Ben Gardon 2021-05-06 10936 free_memslot_rmap(slot);
11bb59d1c3e83b Ben Gardon 2021-05-06 10937 return -ENOMEM;
11bb59d1c3e83b Ben Gardon 2021-05-06 10938 }
11bb59d1c3e83b Ben Gardon 2021-05-06 10939
43798461d3f4a3 Ben Gardon 2021-05-06 10940 int alloc_all_memslots_rmaps(struct kvm *kvm)
43798461d3f4a3 Ben Gardon 2021-05-06 10941 {
43798461d3f4a3 Ben Gardon 2021-05-06 10942 struct kvm_memslots *slots;
43798461d3f4a3 Ben Gardon 2021-05-06 10943 struct kvm_memory_slot *slot;
43798461d3f4a3 Ben Gardon 2021-05-06 10944 int r = 0;
43798461d3f4a3 Ben Gardon 2021-05-06 10945 int i;
43798461d3f4a3 Ben Gardon 2021-05-06 10946
43798461d3f4a3 Ben Gardon 2021-05-06 10947 if (kvm_memslots_have_rmaps(kvm))
43798461d3f4a3 Ben Gardon 2021-05-06 10948 return 0;
43798461d3f4a3 Ben Gardon 2021-05-06 10949
43798461d3f4a3 Ben Gardon 2021-05-06 10950 mutex_lock(&kvm->slots_arch_lock);
43798461d3f4a3 Ben Gardon 2021-05-06 10951 for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
43798461d3f4a3 Ben Gardon 2021-05-06 10952 slots = __kvm_memslots(kvm, i);
43798461d3f4a3 Ben Gardon 2021-05-06 10953 kvm_for_each_memslot(slot, slots) {
43798461d3f4a3 Ben Gardon 2021-05-06 10954 r = alloc_memslot_rmap(slot, slot->npages);
43798461d3f4a3 Ben Gardon 2021-05-06 10955 if (r) {
43798461d3f4a3 Ben Gardon 2021-05-06 10956 mutex_unlock(&kvm->slots_arch_lock);
43798461d3f4a3 Ben Gardon 2021-05-06 10957 return r;
43798461d3f4a3 Ben Gardon 2021-05-06 10958 }
43798461d3f4a3 Ben Gardon 2021-05-06 10959 }
43798461d3f4a3 Ben Gardon 2021-05-06 10960 }
43798461d3f4a3 Ben Gardon 2021-05-06 10961
43798461d3f4a3 Ben Gardon 2021-05-06 10962 /*
43798461d3f4a3 Ben Gardon 2021-05-06 10963 * memslots_have_rmaps is set and read in different lock contexts,
43798461d3f4a3 Ben Gardon 2021-05-06 10964 * so protect it with smp_load/store.
43798461d3f4a3 Ben Gardon 2021-05-06 10965 */
43798461d3f4a3 Ben Gardon 2021-05-06 10966 smp_store_release(&kvm->arch.memslots_have_rmaps, true);
43798461d3f4a3 Ben Gardon 2021-05-06 10967 mutex_unlock(&kvm->slots_arch_lock);
43798461d3f4a3 Ben Gardon 2021-05-06 10968 return 0;
43798461d3f4a3 Ben Gardon 2021-05-06 10969 }
43798461d3f4a3 Ben Gardon 2021-05-06 10970
fac0d516d29b67 Ben Gardon 2021-05-06 @10971 bool kvm_memslots_have_rmaps(struct kvm *kvm)
fac0d516d29b67 Ben Gardon 2021-05-06 10972 {
43798461d3f4a3 Ben Gardon 2021-05-06 10973 /*
43798461d3f4a3 Ben Gardon 2021-05-06 10974 * memslots_have_rmaps is set and read in different lock contexts,
43798461d3f4a3 Ben Gardon 2021-05-06 10975 * so protect it with smp_load/store.
43798461d3f4a3 Ben Gardon 2021-05-06 10976 */
43798461d3f4a3 Ben Gardon 2021-05-06 10977 return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
fac0d516d29b67 Ben Gardon 2021-05-06 10978 }
fac0d516d29b67 Ben Gardon 2021-05-06 10979

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2021-05-07 03:41:47

by kernel test robot

Subject: Re: [PATCH v3 6/8] KVM: x86/mmu: Skip rmap operations if rmaps not allocated

Hi Ben,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on kvm/queue]
[also build test WARNING on tip/master linus/master next-20210506]
[cannot apply to linux/master v5.12]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git queue
config: x86_64-randconfig-s021-20210506 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-22) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-341-g8af24329-dirty
# https://github.com/0day-ci/linux/commit/c7c4f34907c708172e52fb04dc753a917a5eebb6
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Ben-Gardon/Lazily-allocate-memslot-rmaps/20210507-024533
git checkout c7c4f34907c708172e52fb04dc753a917a5eebb6
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>


sparse warnings: (new ones prefixed by >>)
arch/x86/kvm/mmu/mmu.c: note: in included file (through include/linux/kvm_host.h, arch/x86/kvm/irq.h):
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition
>> arch/x86/include/asm/kvm_host.h:1871:36: sparse: sparse: marked inline, but without a definition

vim +1871 arch/x86/include/asm/kvm_host.h

fb04a1eddb1a65 Peter Xu 2020-09-30 1870
fac0d516d29b67 Ben Gardon 2021-05-06 @1871 inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
fac0d516d29b67 Ben Gardon 2021-05-06 1872

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]

2021-05-07 08:53:22

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 0/8] Lazily allocate memslot rmaps

On 06.05.21 20:42, Ben Gardon wrote:
> This series enables KVM to save memory when using the TDP MMU by waiting
> to allocate memslot rmaps until they are needed. To do this, KVM tracks
> whether or not a shadow root has been allocated. In order to get away
> with not allocating the rmaps, KVM must also be sure to skip operations
> which iterate over the rmaps. If the TDP MMU is in use and we have not
> allocated a shadow root, these operations would essentially be op-ops
> anyway. Skipping the rmap operations has a secondary benefit of avoiding
> acquiring the MMU lock in write mode in many cases, substantially
> reducing MMU lock contention.
>
> This series was tested on an Intel Skylake machine. With the TDP MMU off
> and on, this introduced no new failures on kvm-unit-tests or KVM selftests.
>

Happy to see this change pop up, I remember discussing this with Paolo
recently.

Another step to reduce the rmap overhead could be looking into using a
dynamic data structure to manage the rmap, instead of allocating a
fixed-sized array. That could also significantly reduce memory overhead
in some setups and give us more flexibility, for example, for resizing
or splitting slots atomically.

--
Thanks,

David / dhildenb

2021-05-07 09:04:10

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

On 07.05.21 01:44, Ben Gardon wrote:
> On Thu, May 6, 2021 at 11:43 AM Ben Gardon <[email protected]> wrote:
>>
>> Add a field to control whether new memslots should have rmaps allocated
>> for them. As of this change, it's not safe to skip allocating rmaps, so
>> the field is always set to allocate rmaps. Future changes will make it
>> safe to operate without rmaps, using the TDP MMU. Then further changes
>> will allow the rmaps to be allocated lazily when needed for nested
>> operation.
>>
>> No functional change expected.
>>
>> Signed-off-by: Ben Gardon <[email protected]>
>> ---
>> arch/x86/include/asm/kvm_host.h | 8 ++++++++
>> arch/x86/kvm/mmu/mmu.c | 2 ++
>> arch/x86/kvm/x86.c | 18 +++++++++++++-----
>> 3 files changed, 23 insertions(+), 5 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
>> index ad22d4839bcc..00065f9bbc5e 100644
>> --- a/arch/x86/include/asm/kvm_host.h
>> +++ b/arch/x86/include/asm/kvm_host.h
>> @@ -1122,6 +1122,12 @@ struct kvm_arch {
>> */
>> spinlock_t tdp_mmu_pages_lock;
>> #endif /* CONFIG_X86_64 */
>> +
>> + /*
>> + * If set, rmaps have been allocated for all memslots and should be
>> + * allocated for any newly created or modified memslots.
>> + */
>> + bool memslots_have_rmaps;
>> };
>>
>> struct kvm_vm_stat {
>> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
>>
>> int kvm_cpu_dirty_log_size(void);
>>
>> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
>
> Woops, this shouldn't be marked inline as it creates build problems
> for the next patch with some configs.

With that fixed

Reviewed-by: David Hildenbrand <[email protected]>


--
Thanks,

David / dhildenb

2021-05-07 09:04:10

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 3/8] KVM: mmu: Refactor memslot copy

On 06.05.21 20:42, Ben Gardon wrote:
> Factor out copying kvm_memslots from allocating the memory for new ones
> in preparation for adding a new lock to protect the arch-specific fields
> of the memslots.
>
> No functional change intended.
>
> Signed-off-by: Ben Gardon <[email protected]>
> ---
> virt/kvm/kvm_main.c | 23 ++++++++++++++++-------
> 1 file changed, 16 insertions(+), 7 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 2799c6660cce..c8010f55e368 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1306,6 +1306,18 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
> return old_memslots;
> }
>
> +static size_t kvm_memslots_size(int slots)
> +{
> + return sizeof(struct kvm_memslots) +
> + (sizeof(struct kvm_memory_slot) * slots);

no need for the extra set of parentheses

> +}
> +
> +static void kvm_copy_memslots(struct kvm_memslots *from,
> + struct kvm_memslots *to)
> +{
> + memcpy(to, from, kvm_memslots_size(from->used_slots));
> +}
> +
> /*
> * Note, at a minimum, the current number of used slots must be allocated, even
> * when deleting a memslot, as we need a complete duplicate of the memslots for
> @@ -1315,19 +1327,16 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
> enum kvm_mr_change change)
> {
> struct kvm_memslots *slots;
> - size_t old_size, new_size;
> -
> - old_size = sizeof(struct kvm_memslots) +
> - (sizeof(struct kvm_memory_slot) * old->used_slots);
> + size_t new_size;
>
> if (change == KVM_MR_CREATE)
> - new_size = old_size + sizeof(struct kvm_memory_slot);
> + new_size = kvm_memslots_size(old->used_slots + 1);
> else
> - new_size = old_size;
> + new_size = kvm_memslots_size(old->used_slots);
>
> slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
> if (likely(slots))
> - memcpy(slots, old, old_size);
> + kvm_copy_memslots(old, slots);
>
> return slots;
> }
>

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb

2021-05-07 09:27:44

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 8/8] KVM: x86/mmu: Lazily allocate memslot rmaps

On 06/05/21 20:42, Ben Gardon wrote:
> + /*
> + * memslots_have_rmaps is set and read in different lock contexts,
> + * so protect it with smp_load/store.
> + */
> + smp_store_release(&kvm->arch.memslots_have_rmaps, true);

Shorter and better: /* write rmap pointers before memslots_have_rmaps */

> + mutex_unlock(&kvm->slots_arch_lock);
> + return 0;
> +}
> +
> bool kvm_memslots_have_rmaps(struct kvm *kvm)
> {
> - return kvm->arch.memslots_have_rmaps;
> + /*
> + * memslots_have_rmaps is set and read in different lock contexts,
> + * so protect it with smp_load/store.
> + */
> + return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
> }
>

Likewise, /* read memslots_have_rmaps before the rmaps themselves */

Paolo
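
Applied, the suggested comments would make the tail of alloc_all_memslots_rmaps()
and the helper read roughly like this (a sketch based on the code quoted earlier
in the thread, not the committed form):

    /* write rmap pointers before memslots_have_rmaps */
    smp_store_release(&kvm->arch.memslots_have_rmaps, true);
    mutex_unlock(&kvm->slots_arch_lock);
    return 0;
    }

    bool kvm_memslots_have_rmaps(struct kvm *kvm)
    {
        /* read memslots_have_rmaps before the rmaps themselves */
        return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
    }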

2021-05-07 10:25:07

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 1/8] KVM: x86/mmu: Deduplicate rmap freeing

On 06.05.21 20:42, Ben Gardon wrote:
> Small code deduplication. No functional change expected.
>
> Signed-off-by: Ben Gardon <[email protected]>
> ---
> arch/x86/kvm/x86.c | 19 +++++++++++--------
> 1 file changed, 11 insertions(+), 8 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index cf3b67679cf0..5bcf07465c47 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10818,17 +10818,23 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
> kvm_hv_destroy_vm(kvm);
> }
>
> -void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +static void free_memslot_rmap(struct kvm_memory_slot *slot)
> {
> int i;
>
> for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> kvfree(slot->arch.rmap[i]);
> slot->arch.rmap[i] = NULL;
> + }
> +}
>
> - if (i == 0)
> - continue;
> +void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> +{
> + int i;
> +
> + free_memslot_rmap(slot);
>
> + for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> kvfree(slot->arch.lpage_info[i - 1]);
> slot->arch.lpage_info[i - 1] = NULL;
> }
> @@ -10894,12 +10900,9 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> return 0;
>
> out_free:
> - for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> - kvfree(slot->arch.rmap[i]);
> - slot->arch.rmap[i] = NULL;
> - if (i == 0)
> - continue;
> + free_memslot_rmap(slot);
>
> + for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> kvfree(slot->arch.lpage_info[i - 1]);
> slot->arch.lpage_info[i - 1] = NULL;
> }
>

Reviewed-by: David Hildenbrand <[email protected]>

--
Thanks,

David / dhildenb

2021-05-07 11:03:38

by David Hildenbrand

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap

On 06.05.21 20:42, Ben Gardon wrote:
> Small refactor to facilitate allocating rmaps for all memslots at once.
>
> No functional change expected.
>
> Signed-off-by: Ben Gardon <[email protected]>
> ---
> arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
> 1 file changed, 32 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bcf07465c47..fc32a7dbe4c4 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> kvm_page_track_free_memslot(slot);
> }
>
> +static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
> + unsigned long npages)

I'd have called the functions memslot_rmap_alloc() and memslot_rmap_free()


> +{
> + int i;
> +
> + for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> + int lpages;
> + int level = i + 1;
> +
> + lpages = gfn_to_index(slot->base_gfn + npages - 1,
> + slot->base_gfn, level) + 1;
> +
> + slot->arch.rmap[i] =
> + kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> + GFP_KERNEL_ACCOUNT);
> + if (!slot->arch.rmap[i])
> + goto out_free;

you can just avoid the goto here and do the free_memslot_rmap() right away.

> + }
> +
> + return 0;
> +
> +out_free:
> + free_memslot_rmap(slot);
> + return -ENOMEM;
> +}
> +
> static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> unsigned long npages)
> {
> int i;
> + int r;
>
> /*
> * Clear out the previous array pointers for the KVM_MR_MOVE case. The
> @@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> */
> memset(&slot->arch, 0, sizeof(slot->arch));
>
> - for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> + r = alloc_memslot_rmap(slot, npages);
> + if (r)
> + return r;
> +
> + for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> struct kvm_lpage_info *linfo;
> unsigned long ugfn;
> int lpages;
> @@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> lpages = gfn_to_index(slot->base_gfn + npages - 1,
> slot->base_gfn, level) + 1;
>
> - slot->arch.rmap[i] =
> - kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> - GFP_KERNEL_ACCOUNT);
> - if (!slot->arch.rmap[i])
> - goto out_free;
> - if (i == 0)
> - continue;
> -
> linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
> if (!linfo)
> goto out_free;
>

apart from that LGTM

--
Thanks,

David / dhildenb

2021-05-07 11:32:24

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

On 07/05/21 01:44, Ben Gardon wrote:
>> struct kvm_vm_stat {
>> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
>>
>> int kvm_cpu_dirty_log_size(void);
>>
>> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
> Woops, this shouldn't be marked inline as it creates build problems
> for the next patch with some configs.
>

Possibly stupid (or at least lazy) question: why can't it be a "normal"
static inline function?

Paolo

2021-05-07 11:55:42

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

On 06/05/21 20:42, Ben Gardon wrote:
> In preparation for lazily allocating the rmaps when the TDP MMU is in
> use, protect the rmaps with SRCU. Unfortunately, this requires
> propagating a pointer to struct kvm around to several functions.

Thinking more about it, this is not needed because all reads of the rmap
array are guarded by the load-acquire of kvm->arch.memslots_have_rmaps.
That is, the pattern is always

if (!load-acquire(memslot_have_rmaps))
return;
... = __gfn_to_rmap(...)

slots->arch.rmap[x] = ...
store-release(memslot_have_rmaps, true)

where the load-acquire/store-release have the same role that
srcu_dereference/rcu_assign_pointer had before this patch.

We also know that any read that misses the check has the potential for a
NULL pointer dereference, so it *has* to be like that.
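
A minimal C sketch of that pairing (illustrative only; the reader side stands
in for any rmap walk, and the __gfn_to_rmap() arguments are shown just for
shape):

    /* writer, alloc_all_memslots_rmaps(): fill every slot->arch.rmap[i]
     * first, then publish the flag
     */
    smp_store_release(&kvm->arch.memslots_have_rmaps, true);

    /* reader: observe the flag before dereferencing any rmap */
    if (!smp_load_acquire(&kvm->arch.memslots_have_rmaps))
        return;
    rmap_head = __gfn_to_rmap(gfn, level, slot);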

That said, srcu_dereference has zero cost unless debugging options are
enabled, and it *is* true that the rmap can disappear if kvm->srcu is
not held, so I lean towards keeping this change and just changing the
commit message like this:

---------
Currently, rmaps are always allocated and published together with a new
memslot, so the srcu_dereference for the memslots array already ensures
that the memory pointed to by slots->arch.rmap is zero at the time
slots->arch.rmap. However, they still need to be accessed in an SRCU
read-side critical section, as the whole memslot can be deleted outside
SRCU.
--------

Thanks,

Paolo

>
> Suggested-by: Paolo Bonzini<[email protected]>
> Signed-off-by: Ben Gardon<[email protected]>
> ---
> arch/x86/kvm/mmu/mmu.c | 57 +++++++++++++++++++++++++-----------------
> arch/x86/kvm/x86.c | 6 ++---
> 2 files changed, 37 insertions(+), 26 deletions(-)

2021-05-10 16:15:25

by Ben Gardon

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

On Fri, May 7, 2021 at 1:28 AM Paolo Bonzini <[email protected]> wrote:
>
> On 07/05/21 01:44, Ben Gardon wrote:
> >> struct kvm_vm_stat {
> >> @@ -1853,4 +1859,6 @@ static inline int kvm_cpu_get_apicid(int mps_cpu)
> >>
> >> int kvm_cpu_dirty_log_size(void);
> >>
> >> +inline bool kvm_memslots_have_rmaps(struct kvm *kvm);
> > Woops, this shouldn't be marked inline as it creates build problems
> > for the next patch with some configs.
> >
>
> Possibly stupid (or at least lazy) question: why can't it be a "normal"
> static inline function?

That was my initial approach (hence the leftover inline) but I got
some warnings about a forward declaration of struct kvm because
arch/x86/include/asm/kvm_host.h doesn't include virt/kvm/kvm_host.h.
Maybe there's a way to fix that, but I didn't want to mess with it.
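
To illustrate the problem (a simplified sketch, not the actual header): struct
kvm is only fully defined after the arch header has been included, so a static
inline there sees nothing but a forward declaration and cannot touch kvm->arch:

    /* arch/x86/include/asm/kvm_host.h, simplified */
    struct kvm;    /* only a forward declaration is visible here */

    static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
    {
        /* error: dereferencing pointer to incomplete type 'struct kvm' */
        return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
    }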

>
> Paolo
>

2021-05-10 16:33:11

by Ben Gardon

[permalink] [raw]
Subject: Re: [PATCH v3 2/8] KVM: x86/mmu: Factor out allocating memslot rmap

On Fri, May 7, 2021 at 12:47 AM David Hildenbrand <[email protected]> wrote:
>
> On 06.05.21 20:42, Ben Gardon wrote:
> > Small refactor to facilitate allocating rmaps for all memslots at once.
> >
> > No functional change expected.
> >
> > Signed-off-by: Ben Gardon <[email protected]>
> > ---
> > arch/x86/kvm/x86.c | 41 ++++++++++++++++++++++++++++++++---------
> > 1 file changed, 32 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 5bcf07465c47..fc32a7dbe4c4 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10842,10 +10842,37 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
> > kvm_page_track_free_memslot(slot);
> > }
> >
> > +static int alloc_memslot_rmap(struct kvm_memory_slot *slot,
> > + unsigned long npages)
>
> I'd have called the functions memslot_rmap_alloc() and memslot_rmap_free()
>
>
> > +{
> > + int i;
> > +
> > + for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > + int lpages;
> > + int level = i + 1;
> > +
> > + lpages = gfn_to_index(slot->base_gfn + npages - 1,
> > + slot->base_gfn, level) + 1;
> > +
> > + slot->arch.rmap[i] =
> > + kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> > + GFP_KERNEL_ACCOUNT);
> > + if (!slot->arch.rmap[i])
> > + goto out_free;
>
> you can just avoid the goto here and do the free_memslot_rmap() right away.

Oh good catch. I'll incorporate your suggestions in v4.

>
> > + }
> > +
> > + return 0;
> > +
> > +out_free:
> > + free_memslot_rmap(slot);
> > + return -ENOMEM;
> > +}
> > +
> > static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> > unsigned long npages)
> > {
> > int i;
> > + int r;
> >
> > /*
> > * Clear out the previous array pointers for the KVM_MR_MOVE case. The
> > @@ -10854,7 +10881,11 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> > */
> > memset(&slot->arch, 0, sizeof(slot->arch));
> >
> > - for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> > + r = alloc_memslot_rmap(slot, npages);
> > + if (r)
> > + return r;
> > +
> > + for (i = 1; i < KVM_NR_PAGE_SIZES; ++i) {
> > struct kvm_lpage_info *linfo;
> > unsigned long ugfn;
> > int lpages;
> > @@ -10863,14 +10894,6 @@ static int kvm_alloc_memslot_metadata(struct kvm_memory_slot *slot,
> > lpages = gfn_to_index(slot->base_gfn + npages - 1,
> > slot->base_gfn, level) + 1;
> >
> > - slot->arch.rmap[i] =
> > - kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
> > - GFP_KERNEL_ACCOUNT);
> > - if (!slot->arch.rmap[i])
> > - goto out_free;
> > - if (i == 0)
> > - continue;
> > -
> > linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT);
> > if (!linfo)
> > goto out_free;
> >
>
> apart from that LGTM
>
> --
> Thanks,
>
> David / dhildenb
>
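
For reference, a rough sketch of the allocation helper with both suggestions
folded in (the memslot_rmap_* naming and dropping the goto); illustrative
only, not necessarily what v4 will look like:

    static int memslot_rmap_alloc(struct kvm_memory_slot *slot,
                                  unsigned long npages)
    {
        int i;

        for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
            int level = i + 1;
            int lpages = gfn_to_index(slot->base_gfn + npages - 1,
                                      slot->base_gfn, level) + 1;

            slot->arch.rmap[i] = kvcalloc(lpages, sizeof(*slot->arch.rmap[i]),
                                          GFP_KERNEL_ACCOUNT);
            if (!slot->arch.rmap[i]) {
                /* free the arrays already allocated for lower levels */
                memslot_rmap_free(slot);
                return -ENOMEM;
            }
        }

        return 0;
    }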

2021-05-10 16:35:01

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

On 10/05/21 18:14, Ben Gardon wrote:
>> Possibly stupid (or at least lazy) question: why can't it be a "normal"
>> static inline function?
> That was my initial approach (hence the leftover inline) but I got
> some warnings about a forward declaration of struct kvm because
> arch/x86/include/asm/kvm_host.h doesn't include virt/kvm/kvm_host.h.
> Maybe there's a way to fix that, but I didn't want to mess with it.
>

Let's just use the field directly.

Paolo

2021-05-10 16:39:21

by Ben Gardon

[permalink] [raw]
Subject: Re: [PATCH v3 5/8] KVM: x86/mmu: Add a field to control memslot rmap allocation

On Mon, May 10, 2021 at 9:33 AM Paolo Bonzini <[email protected]> wrote:
>
> On 10/05/21 18:14, Ben Gardon wrote:
> >> Possibly stupid (or at least lazy) question: why can't it be a "normal"
> >> static inline function?
> > That was my initial approach (hence the leftover inline) but I got
> > some warnings about a forward declaration of struct kvm because
> > arch/x86/include/asm/kvm_host.h doesn't include virt/kvm/kvm_host.h.
> > Maybe there's a way to fix that, but I didn't want to mess with it.
> >
>
> Let's just use the field directly.

That works for me too. I moved to the wrapper because adding the
smp_load_acquire and a comment explaining why we were doing that
looked bloated and I thought it would be easier to document in one
place, but it's not that much bloat, and having the subtleties
documented directly in the function is probably clearer for readers
anyway.

>
> Paolo
>

2021-05-10 17:48:54

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

On Fri, May 07, 2021, Paolo Bonzini wrote:
> On 06/05/21 20:42, Ben Gardon wrote:
> > In preparation for lazily allocating the rmaps when the TDP MMU is in
> > use, protect the rmaps with SRCU. Unfortunately, this requires
> > propagating a pointer to struct kvm around to several functions.
>
> Thinking more about it, this is not needed because all reads of the rmap
> array are guarded by the load-acquire of kvm->arch.memslots_have_rmaps.
> That is, the pattern is always
>
> if (!load-acquire(memslot_have_rmaps))
> return;
> ... = __gfn_to_rmap(...)
>
> slots->arch.rmap[x] = ...
> store-release(memslot_have_rmaps, true)
>
> where the load-acquire/store-release have the same role that
> srcu_dereference/rcu_assign_pointer had before this patch.
>
> We also know that any read that misses the check has the potential for a
> NULL pointer dereference, so it *has* to be like that.
>
> That said, srcu_dereference has zero cost unless debugging options are
> enabled, and it *is* true that the rmap can disappear if kvm->srcu is not
> held, so I lean towards keeping this change and just changing the commit
> message like this:
>
> ---------
> Currently, rmaps are always allocated and published together with a new
> memslot, so the srcu_dereference for the memslots array already ensures that
> the memory pointed to by slots->arch.rmap is zero at the time
> slots->arch.rmap. However, they still need to be accessed in an SRCU
> read-side critical section, as the whole memslot can be deleted outside
> SRCU.
> --------

I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
harm than good. Adding the unnecessary tag could be quite misleading as it
would imply the rmap pointers can _change_ independent of the memslots.

Similarly, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that it's
safe to access the rmap after its pointer is assigned, and that's simply not
true since an rmap array can be freed if rmap allocation for a different memslot
fails. Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
if arch.memslots_have_rmaps is true, as you pointed out.

Furthermore, to actually gain any protection from SRCU, there would have to be
an synchronize_srcu() call after assigning the pointers, and that _does_ have an
associated. Not to mention that to truly respect the __rcu annotation, deleting
the rmaps would also have to be done "independently" with the correct
rcu_assign_pointer() and synchronize_srcu() logic.
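
(For context, the annotation being debated would make rmap accesses look
roughly like the sketch below; this is only an illustration of the general
__rcu publish/consume pattern, not a quote of patch 7:)

    /* publish side, if the rmap pointers were individually RCU-annotated */
    rcu_assign_pointer(slot->arch.rmap[i], rmap_array);

    /* access side, inside an SRCU read-side critical section */
    rmap_array = srcu_dereference(slot->arch.rmap[i], &kvm->srcu);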

2021-05-10 17:54:56

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

On 10/05/21 19:45, Sean Christopherson wrote:
>>
>> ---------
>> Currently, rmaps are always allocated and published together with a new
>> memslot, so the srcu_dereference for the memslots array already ensures that
>> the memory pointed to by slots->arch.rmap is zero at the time
>> slots->arch.rmap. However, they still need to be accessed in an SRCU
>> read-side critical section, as the whole memslot can be deleted outside
>> SRCU.
>> --------
> I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
> harm than good. Adding the unnecessary tag could be quite misleading as it
> would imply the rmap pointers can_change_ independent of the memslots.
>
> Similary, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that its
> safe to access the rmap after its pointer is assigned, and that's simply not
> true since an rmap array can be freed if rmap allocation for a different memslot
> fails. Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
> if arch.memslots_have_rmaps is true, as you pointed out.

This about freeing is a very good point.

> Furthermore, to actually gain any protection from SRCU, there would have to be
> an synchronize_srcu() call after assigning the pointers, and that _does_ have an
> associated.

... but this is incorrect (I was almost going to point out the below in
my reply to Ben, then decided I was pointing out the obvious; lesson
learned).

synchronize_srcu() is only needed after *deleting* something, which in
this case is done as part of deleting the memslots---it's perfectly fine
to batch multiple synchronize_*() calls given how expensive some of them
are.

(BTW an associated what?)

So they still count as RCU-protected in my opinion, just because reading
them outside SRCU is a big no and ought to warn (it's unlikely that it
happens with rmaps, but then we just had 2-3 bugs like this being
reported in a short time for memslots so never say never). However,
rcu_assign_pointer is not needed because the visibility of the rmaps is
further protected by the have-rmaps flag (to be accessed with
load-acquire/store-release) and not just by the pointer being there and
non-NULL.

Paolo

> Not to mention that to truly respect the __rcu annotation, deleting
> the rmaps would also have to be done "independently" with the correct
> rcu_assign_pointer() and synchronize_srcu() logic.
>

2021-05-10 18:31:00

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

On Mon, May 10, 2021, Paolo Bonzini wrote:
> On 10/05/21 19:45, Sean Christopherson wrote:
> > >
> > > ---------
> > > Currently, rmaps are always allocated and published together with a new
> > > memslot, so the srcu_dereference for the memslots array already ensures that
> > > the memory pointed to by slots->arch.rmap is zero at the time
> > > slots->arch.rmap. However, they still need to be accessed in an SRCU
> > > read-side critical section, as the whole memslot can be deleted outside
> > > SRCU.
> > > --------
> > I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
> > harm than good. Adding the unnecessary tag could be quite misleading as it
> > would imply the rmap pointers can_change_ independent of the memslots.
> >
> > Similary, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that its
> > safe to access the rmap after its pointer is assigned, and that's simply not
> > true since an rmap array can be freed if rmap allocation for a different memslot
> > fails. Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
> > if arch.memslots_have_rmaps is true, as you pointed out.
>
> This about freeing is a very good point.
>
> > Furthermore, to actually gain any protection from SRCU, there would have to be
> > an synchronize_srcu() call after assigning the pointers, and that _does_ have an
> > associated.
>
> ... but this is incorrect (I was almost going to point out the below in my
> reply to Ben, then decided I was pointing out the obvious; lesson learned).
>
> synchronize_srcu() is only needed after *deleting* something, which in this

No, synchronization is required any time the writer needs to ensure readers have
recognized the change. E.g. making a memslot RO, moving a memslot's gfn base,
adding an MSR to the filter list. I suppose you could frame any modification as
"deleting" something, but IMO that's cheating :-)

> case is done as part of deleting the memslots---it's perfectly fine to batch
> multiple synchronize_*() calls given how expensive some of them are.

Yes, but the shortlog says "Protect rmaps _independently_ with SRCU", emphasis
mine. If the rmaps are truly protected independently, then they need to have
their own synchronization. Setting all rmaps could be batched under a single
synchronize_srcu(), but IMO batching the rmaps with the memslot itself would be
in direct contradiction with the shortlog.

> (BTW an associated what?)

Doh. "associated memslot."

> So they still count as RCU-protected in my opinion, just because reading
> them outside SRCU is a big no and ought to warn (it's unlikely that it
> happens with rmaps, but then we just had 2-3 bugs like this being reported
> in a short time for memslots so never say never).

Yes, but that interpretation holds true for literally everything that is hidden
behind an SRCU-protected pointer. E.g. this would also be wrong, it's just much
more obviously broken:

bool kvm_is_gfn_writable(struct kvm* kvm, gfn_t gfn)
{
struct kvm_memory_slot *slot;
int idx;

idx = srcu_read_lock(&kvm->srcu);
slot = gfn_to_memslot(kvm, gfn);
srcu_read_unlock(&kvm->srcu);

return slot && !(slot->flags & KVM_MEMSLOT_INVALID) &&
!(slot->flags & KVM_MEM_READONLY);
}


> However, rcu_assign_pointer is not needed because the visibility of the rmaps
> is further protected by the have-rmaps flag (to be accessed with
> load-acquire/store-release) and not just by the pointer being there and
> non-NULL.

Yes, and I'm arguing that annotating the rmaps as __rcu is wrong because they
themselves are not protected by SRCU. The memslot that contains the rmaps is
protected by SRCU, and because of that asserting SRCU is held for read will hold
true. But, if the memslot code were changed to use a different protection scheme,
e.g. a rwlock for argument's sake, then the SRCU assertion would fail even though
the rmap logic itself didn't change.

2021-05-11 16:27:05

by Ben Gardon

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

On Mon, May 10, 2021 at 11:28 AM Sean Christopherson <[email protected]> wrote:
>
> On Mon, May 10, 2021, Paolo Bonzini wrote:
> > On 10/05/21 19:45, Sean Christopherson wrote:
> > > >
> > > > ---------
> > > > Currently, rmaps are always allocated and published together with a new
> > > > memslot, so the srcu_dereference for the memslots array already ensures that
> > > > the memory pointed to by slots->arch.rmap is zero at the time
> > > > slots->arch.rmap. However, they still need to be accessed in an SRCU
> > > > read-side critical section, as the whole memslot can be deleted outside
> > > > SRCU.
> > > > --------
> > > I disagree, sprinkling random and unnecessary __rcu/SRCU annotations does more
> > > harm than good. Adding the unnecessary tag could be quite misleading as it
> > > would imply the rmap pointers can_change_ independent of the memslots.
> > >
> > > Similary, adding rcu_assign_pointer() in alloc_memslot_rmap() implies that its
> > > safe to access the rmap after its pointer is assigned, and that's simply not
> > > true since an rmap array can be freed if rmap allocation for a different memslot
> > > fails. Accessing the rmap is safe if and only if all rmaps are allocated, i.e.
> > > if arch.memslots_have_rmaps is true, as you pointed out.
> >
> > This about freeing is a very good point.
> >
> > > Furthermore, to actually gain any protection from SRCU, there would have to be
> > > an synchronize_srcu() call after assigning the pointers, and that _does_ have an
> > > associated.
> >
> > ... but this is incorrect (I was almost going to point out the below in my
> > reply to Ben, then decided I was pointing out the obvious; lesson learned).
> >
> > synchronize_srcu() is only needed after *deleting* something, which in this
>
> No, synchronization is required any time the writer needs to ensure readers have
> recognized the change. E.g. making a memslot RO, moving a memslot's gfn base,
> adding an MSR to the filter list. I suppose you could frame any modification as
> "deleting" something, but IMO that's cheating :-)
>
> > case is done as part of deleting the memslots---it's perfectly fine to batch
> > multiple synchronize_*() calls given how expensive some of them are.
>
> Yes, but the shortlog says "Protect rmaps _independently_ with SRCU", emphasis
> mine. If the rmaps are truly protected independently, then they need to have
> their own synchronization. Setting all rmaps could be batched under a single
> synchronize_srcu(), but IMO batching the rmaps with the memslot itself would be
> in direct contradiction with the shortlog.
>
> > (BTW an associated what?)
>
> Doh. "associated memslot."
>
> > So they still count as RCU-protected in my opinion, just because reading
> > them outside SRCU is a big no and ought to warn (it's unlikely that it
> > happens with rmaps, but then we just had 2-3 bugs like this being reported
> > in a short time for memslots so never say never).
>
> Yes, but that interpretation holds true for literally everything that is hidden
> behind an SRCU-protected pointer. E.g. this would also be wrong, it's just much
> more obviously broken:
>
> bool kvm_is_gfn_writable(struct kvm* kvm, gfn_t gfn)
> {
> struct kvm_memory_slot *slot;
> int idx;
>
> idx = srcu_read_lock(&kvm->srcu);
> slot = gfn_to_memslot(kvm, gfn);
> srcu_read_unlock(&kvm->srcu);
>
> return slot && !(slot->flags & KVM_MEMSLOT_INVALID) &&
> !(slot->flags & KVM_MEM_READONLY);
> }
>
>
> > However, rcu_assign_pointer is not needed because the visibility of the rmaps
> > is further protected by the have-rmaps flag (to be accessed with
> > load-acquire/store-release) and not just by the pointer being there and
> > non-NULL.
>
> Yes, and I'm arguing that annotating the rmaps as __rcu is wrong because they
> themselves are not protected by SRCU. The memslot that contains the rmaps is
> protected by SRCU, and because of that asserting SRCU is held for read will hold
> true. But, if the memslot code were changed to use a different protection scheme,
> e.g. a rwlock for argument's sake, then the SRCU assertion would fail even though
> the rmap logic itself didn't change.

I'm inclined to agree with Sean that the extra RCU annotations are
probably unnecessary since we're already doing the srcu dereference
for all the slots. I'll move all these RCU annotations to their own
patch and put it at the end of the series when I send v4.

2021-05-11 16:50:18

by Paolo Bonzini

[permalink] [raw]
Subject: Re: [PATCH v3 7/8] KVM: x86/mmu: Protect rmaps independently with SRCU

On 11/05/21 18:22, Ben Gardon wrote:
>> Yes, and I'm arguing that annotating the rmaps as __rcu is wrong because they
>> themselves are not protected by SRCU. The memslot that contains the rmaps is
>> protected by SRCU, and because of that asserting SRCU is held for read will hold
>> true. But, if the memslot code were changed to use a different protection scheme,
>> e.g. a rwlock for argument's sake, then the SRCU assertion would fail even though
>> the rmap logic itself didn't change.
>
> I'm inclined to agree with Sean that the extra RCU annotations are
> probably unnecessary since we're already doing the srcu dereference
> for all the slots. I'll move all these RCU annotations to their own
> patch and put it at the end of the series when I send v4.
>

Fair enough, you can even remove them then.

Paolo