2021-11-30 21:44:58

by Maciej S. Szmigiero

Subject: [PATCH v6 00/29] KVM: Scalable memslots implementation

From: "Maciej S. Szmigiero" <[email protected]>

This series contains the sixth iteration of the scalable memslots patch set.
It is based on Sean's version "5.5", but with integrated patches for issues
that arose during its review round.

In addition to that, the kvm_for_each_memslot_in_gfn_range() implementation
was reworked to return only strictly overlapping memslots and to use
iterators.

However, I've dropped a similar kvm_for_each_memslot_in_hva_range() rework,
since the existing implementation was already returning only strictly
overlapping memslots and was already based on interval tree iterators,
so wrapping them in another layer of iterators would only add unnecessary
complexity.
The code in this "for"-like macro is also self-contained and very simple,
so let's keep it this way.
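
For reference, that existing walker is essentially the generic interval tree
iterator in a "for" disguise; a rough sketch (not a verbatim copy of the
in-tree macro):

#define kvm_for_each_memslot_in_hva_range(node, slots, start, last)	       \
	for (node = interval_tree_iter_first(&(slots)->hva_tree, start, last); \
	     node;							       \
	     node = interval_tree_iter_next(node, start, last))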

The series was tested on x86 with KASAN enabled and successfully booted
various guests (including nested ones, with the TDP MMU both enabled and
disabled).

Sean's previous version (5.5) is available here:
[1]: https://lore.kernel.org/kvm/[email protected]/


Changes from v5.5:
* Note the open-coding of kvm_copy_memslots() in the commit message of patch 3,

* When changing the kvm_arch_prepare_memory_region() signature, change it also
in the RISC-V implementation of this function in patch 5,

* Avoid overflowing "nr_mmu_pages" in patch 18,

* Add a comment to the "slot id to memslot" hash table declaration
explaining its bucket count in patch 21,

* Don't check twice for "old" being != NULL in kvm_replace_memslot()
in patch 21,

* Remove unnecessary "new" NULL check in kvm_replace_memslot() in patch 22,

* Split out changing kvm_invalidate_memslot() to call
kvm_arch_flush_shadow_memslot() on an old slot from patch 24 to a separate
patch 25,

* Make kvm_for_each_memslot_in_gfn_range() return only strictly overlapping
memslots and rewrite its implementation to use iterators in patch 26.


Maciej S. Szmigiero (12):
KVM: Resync only arch fields when slots_arch_lock gets reacquired
KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't
changed
KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array
KVM: Integrate gfn_to_memslot_approx() into search_memslots()
KVM: Move WARN on invalid memslot index to update_memslots()
KVM: Resolve memslot ID via a hash table instead of via a static array
KVM: Use interval tree to do fast hva lookup in memslots
KVM: s390: Introduce kvm_s390_get_gfn_end()
KVM: Keep memslots in tree-based structures instead of array-based
ones
KVM: Call kvm_arch_flush_shadow_memslot() on the old slot in
kvm_invalidate_memslot()
KVM: Optimize gfn lookup in kvm_zap_gfn_range()
KVM: Optimize overlapping memslots check

Sean Christopherson (17):
KVM: Require total number of memslot pages to fit in an unsigned long
KVM: Open code kvm_delete_memslot() into its only caller
KVM: Use "new" memslot's address space ID instead of dedicated param
KVM: Let/force architectures to deal with arch specific memslot data
KVM: arm64: Use "new" memslot instead of userspace memory region
KVM: MIPS: Drop pr_debug from memslot commit to avoid using "mem"
KVM: PPC: Avoid referencing userspace memory region in memslot updates
KVM: s390: Use "new" memslot instead of userspace memory region
KVM: x86: Use "new" memslot instead of userspace memory region
KVM: RISC-V: Use "new" memslot instead of userspace memory region
KVM: Stop passing kvm_userspace_memory_region to arch memslot hooks
KVM: Use prepare/commit hooks to handle generic memslot metadata
updates
KVM: x86: Don't assume old/new memslots are non-NULL at memslot commit
KVM: s390: Skip gfn/size sanity checks on memslot DELETE or FLAGS_ONLY
KVM: Don't make a full copy of the old memslot in
__kvm_set_memory_region()
KVM: Wait 'til the bitter end to initialize the "new" memslot
KVM: Dynamically allocate "new" memslots from the get-go

arch/arm64/kvm/Kconfig | 1 +
arch/arm64/kvm/mmu.c | 27 +-
arch/mips/kvm/Kconfig | 1 +
arch/mips/kvm/mips.c | 9 +-
arch/powerpc/include/asm/kvm_ppc.h | 14 +-
arch/powerpc/kvm/Kconfig | 1 +
arch/powerpc/kvm/book3s.c | 14 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +-
arch/powerpc/kvm/book3s_hv.c | 28 +-
arch/powerpc/kvm/book3s_hv_nested.c | 4 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 14 +-
arch/powerpc/kvm/book3s_pr.c | 9 +-
arch/powerpc/kvm/booke.c | 7 +-
arch/powerpc/kvm/powerpc.c | 9 +-
arch/riscv/kvm/mmu.c | 31 +-
arch/s390/kvm/Kconfig | 1 +
arch/s390/kvm/kvm-s390.c | 98 ++--
arch/s390/kvm/kvm-s390.h | 14 +
arch/s390/kvm/pv.c | 4 +-
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kvm/Kconfig | 1 +
arch/x86/kvm/debugfs.c | 6 +-
arch/x86/kvm/mmu/mmu.c | 38 +-
arch/x86/kvm/x86.c | 41 +-
include/linux/kvm_host.h | 277 ++++++---
virt/kvm/kvm_main.c | 835 ++++++++++++++++------------
26 files changed, 854 insertions(+), 637 deletions(-)



2021-11-30 21:45:39

by Maciej S. Szmigiero

Subject: [PATCH v6 01/29] KVM: Require total number of memslot pages to fit in an unsigned long

From: Sean Christopherson <[email protected]>

Explicitly disallow creating more memslot pages than can fit in an
unsigned long; KVM doesn't correctly handle a total number of memslot
pages that doesn't fit in an unsigned long, and remedying that would be a
waste of time.
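
The guard itself is a standard unsigned wrap check on the running page
count; roughly (see the __kvm_set_memory_region() hunk below):

	if ((kvm->nr_memslot_pages + new.npages) < kvm->nr_memslot_pages)
		return -EINVAL;	/* total page count would wrap an unsigned long */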

For a 64-bit kernel, this is a nop as memslots are not allowed to overlap
in the gfn address space.

With a 32-bit kernel, userspace can at most address 3gb of virtual memory,
whereas wrapping the total number of pages would require 4tb+ of guest
physical memory. Even with x86's second address space for SMM, userspace
would need to alias all of guest memory more than one _thousand_ times.
And on older x86 hardware with MAXPHYADDR < 43, the guest couldn't
actually access any of those aliases even if userspace lied about
guest.MAXPHYADDR.

On s390 and arm64, this is a nop as they don't support 32-bit hosts.

On x86, practically speaking this is simply acknowledging reality as the
existing kvm_mmu_calculate_default_mmu_pages() assumes the total number
of pages fits in an "unsigned long".

On PPC, this is likely a nop as every flavor of PPC KVM assumes gfns (and
gpas!) fit in unsigned long. arch/powerpc/kvm/book3s_32_mmu_host.c goes
a step further and fails the build if CONFIG_PTE_64BIT=y, which
presumably means that it doesn't support 64-bit physical addresses.

On MIPS, this is also likely a nop as the core MMU helpers assume gpas
fit in unsigned long, e.g. see kvm_mips_##name##_pte.

And finally, RISC-V is a "don't care" as it doesn't exist in any release,
i.e. there is no established ABI to break.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
include/linux/kvm_host.h | 1 +
virt/kvm/kvm_main.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a745efe389ab..a6830966f8eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -554,6 +554,7 @@ struct kvm {
*/
struct mutex slots_arch_lock;
struct mm_struct *mm; /* userspace tied to this vm */
+ unsigned long nr_memslot_pages;
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
struct xarray vcpu_array;

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 863112783ed9..4a1b484518a9 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1640,6 +1640,15 @@ static int kvm_set_memslot(struct kvm *kvm,
update_memslots(slots, new, change);
slots = install_new_memslots(kvm, as_id, slots);

+ /*
+ * Update the total number of memslot pages before calling the arch
+ * hook so that architectures can consume the result directly.
+ */
+ if (change == KVM_MR_DELETE)
+ kvm->nr_memslot_pages -= old.npages;
+ else if (change == KVM_MR_CREATE)
+ kvm->nr_memslot_pages += new->npages;
+
kvm_arch_commit_memory_region(kvm, mem, &old, new, change);

/* Free the old memslot's metadata. Note, this is the full copy!!! */
@@ -1670,6 +1679,9 @@ static int kvm_delete_memslot(struct kvm *kvm,
if (!old->npages)
return -EINVAL;

+ if (WARN_ON_ONCE(kvm->nr_memslot_pages < old->npages))
+ return -EIO;
+
memset(&new, 0, sizeof(new));
new.id = old->id;
/*
@@ -1753,6 +1765,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (!old.npages) {
change = KVM_MR_CREATE;
new.dirty_bitmap = NULL;
+
+ /*
+ * To simplify KVM internals, the total number of pages across
+ * all memslots must fit in an unsigned long.
+ */
+ if ((kvm->nr_memslot_pages + new.npages) < kvm->nr_memslot_pages)
+ return -EINVAL;
} else { /* Modify an existing slot. */
if ((new.userspace_addr != old.userspace_addr) ||
(new.npages != old.npages) ||

2021-11-30 21:46:08

by Maciej S. Szmigiero

Subject: [PATCH v6 04/29] KVM: Use "new" memslot's address space ID instead of dedicated param

From: Sean Christopherson <[email protected]>

Now that the address space ID is stored in every slot, including fake
slots used for deletion, use the slot's as_id instead of passing in the
redundant information as a param to kvm_set_memslot(). This will greatly
simplify future memslot work by avoiding passing a large number of
variables around purely to honor @as_id.

Drop a comment in the DELETE path about new->as_id being provided purely
for debug, as that's now a lie.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
virt/kvm/kvm_main.c | 22 +++++++++-------------
1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 95616392ff91..683a5ef6f71e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1553,7 +1553,7 @@ static void kvm_copy_memslots_arch(struct kvm_memslots *to,

static int kvm_set_memslot(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
- struct kvm_memory_slot *new, int as_id,
+ struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
struct kvm_memory_slot *slot, old;
@@ -1576,7 +1576,7 @@ static int kvm_set_memslot(struct kvm *kvm,
*/
mutex_lock(&kvm->slots_arch_lock);

- slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
+ slots = kvm_dup_memslots(__kvm_memslots(kvm, new->as_id), change);
if (!slots) {
mutex_unlock(&kvm->slots_arch_lock);
return -ENOMEM;
@@ -1596,7 +1596,7 @@ static int kvm_set_memslot(struct kvm *kvm,
* dropped by update_memslots anyway. We'll also revert to the
* old memslots if preparing the new memory region fails.
*/
- slots = install_new_memslots(kvm, as_id, slots);
+ slots = install_new_memslots(kvm, new->as_id, slots);

/* From this point no new shadow pages pointing to a deleted,
* or moved, memslot will be created.
@@ -1618,7 +1618,7 @@ static int kvm_set_memslot(struct kvm *kvm,
* to retrieve memslots *after* acquiring slots_arch_lock, thus
* the active memslots are guaranteed to be fresh.
*/
- kvm_copy_memslots_arch(slots, __kvm_memslots(kvm, as_id));
+ kvm_copy_memslots_arch(slots, __kvm_memslots(kvm, new->as_id));
}

/*
@@ -1635,7 +1635,7 @@ static int kvm_set_memslot(struct kvm *kvm,
WARN_ON_ONCE(change != KVM_MR_CREATE);
memset(&old, 0, sizeof(old));
old.id = new->id;
- old.as_id = as_id;
+ old.as_id = new->as_id;
}

/* Copy the arch-specific data, again after (re)acquiring slots_arch_lock. */
@@ -1646,7 +1646,7 @@ static int kvm_set_memslot(struct kvm *kvm,
goto out_slots;

update_memslots(slots, new, change);
- slots = install_new_memslots(kvm, as_id, slots);
+ slots = install_new_memslots(kvm, new->as_id, slots);

/*
* Update the total number of memslot pages before calling the arch
@@ -1668,7 +1668,7 @@ static int kvm_set_memslot(struct kvm *kvm,

out_slots:
if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
- slots = install_new_memslots(kvm, as_id, slots);
+ slots = install_new_memslots(kvm, new->as_id, slots);
else
mutex_unlock(&kvm->slots_arch_lock);
kvfree(slots);
@@ -1740,13 +1740,9 @@ int __kvm_set_memory_region(struct kvm *kvm,

memset(&new, 0, sizeof(new));
new.id = id;
- /*
- * This is only for debugging purpose; it should never be
- * referenced for a removed memslot.
- */
new.as_id = as_id;

- return kvm_set_memslot(kvm, mem, &new, as_id, KVM_MR_DELETE);
+ return kvm_set_memslot(kvm, mem, &new, KVM_MR_DELETE);
}

new.as_id = as_id;
@@ -1809,7 +1805,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
bitmap_set(new.dirty_bitmap, 0, new.npages);
}

- r = kvm_set_memslot(kvm, mem, &new, as_id, change);
+ r = kvm_set_memslot(kvm, mem, &new, change);
if (r)
goto out_bitmap;


2021-11-30 21:46:20

by Maciej S. Szmigiero

Subject: [PATCH v6 16/29] KVM: Don't make a full copy of the old memslot in __kvm_set_memory_region()

From: Sean Christopherson <[email protected]>

Stop making a full copy of the old memslot in __kvm_set_memory_region()
now that metadata updates are handled by kvm_set_memslot(), i.e. now that
the old memslot's dirty bitmap doesn't need to be referenced after the
memslot and its pointer are modified/invalidated by kvm_set_memslot().

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
virt/kvm/kvm_main.c | 35 +++++++++++++----------------------
1 file changed, 13 insertions(+), 22 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1689f598fe9e..1f37c4ce5f97 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1732,8 +1732,8 @@ static int kvm_set_memslot(struct kvm *kvm,
int __kvm_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem)
{
- struct kvm_memory_slot old, new;
- struct kvm_memory_slot *tmp;
+ struct kvm_memory_slot *old, *tmp;
+ struct kvm_memory_slot new;
enum kvm_mr_change change;
int as_id, id;
int r;
@@ -1763,25 +1763,16 @@ int __kvm_set_memory_region(struct kvm *kvm,
return -EINVAL;

/*
- * Make a full copy of the old memslot, the pointer will become stale
- * when the memslots are re-sorted by update_memslots(), and the old
- * memslot needs to be referenced after calling update_memslots(), e.g.
- * to free its resources and for arch specific behavior.
+ * Note, the old memslot (and the pointer itself!) may be invalidated
+ * and/or destroyed by kvm_set_memslot().
*/
- tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
- if (tmp) {
- old = *tmp;
- tmp = NULL;
- } else {
- memset(&old, 0, sizeof(old));
- old.id = id;
- }
+ old = id_to_memslot(__kvm_memslots(kvm, as_id), id);

if (!mem->memory_size) {
- if (!old.npages)
+ if (!old || !old->npages)
return -EINVAL;

- if (WARN_ON_ONCE(kvm->nr_memslot_pages < old.npages))
+ if (WARN_ON_ONCE(kvm->nr_memslot_pages < old->npages))
return -EIO;

memset(&new, 0, sizeof(new));
@@ -1801,7 +1792,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (new.npages > KVM_MEM_MAX_NR_PAGES)
return -EINVAL;

- if (!old.npages) {
+ if (!old || !old->npages) {
change = KVM_MR_CREATE;

/*
@@ -1811,14 +1802,14 @@ int __kvm_set_memory_region(struct kvm *kvm,
if ((kvm->nr_memslot_pages + new.npages) < kvm->nr_memslot_pages)
return -EINVAL;
} else { /* Modify an existing slot. */
- if ((new.userspace_addr != old.userspace_addr) ||
- (new.npages != old.npages) ||
- ((new.flags ^ old.flags) & KVM_MEM_READONLY))
+ if ((new.userspace_addr != old->userspace_addr) ||
+ (new.npages != old->npages) ||
+ ((new.flags ^ old->flags) & KVM_MEM_READONLY))
return -EINVAL;

- if (new.base_gfn != old.base_gfn)
+ if (new.base_gfn != old->base_gfn)
change = KVM_MR_MOVE;
- else if (new.flags != old.flags)
+ else if (new.flags != old->flags)
change = KVM_MR_FLAGS_ONLY;
else /* Nothing to change. */
return 0;

2021-11-30 21:46:34

by Maciej S. Szmigiero

Subject: [PATCH v6 15/29] KVM: s390: Skip gfn/size sanity checks on memslot DELETE or FLAGS_ONLY

From: Sean Christopherson <[email protected]>

Sanity check the hva, gfn, and size of a userspace memory region only if
any of those properties can change, i.e. skip the checks for DELETE and
FLAGS_ONLY. KVM doesn't allow moving the hva or changing the size, a gfn
change shows up as a MOVE even if flags are being modified, and the
checks are pointless for the DELETE case as userspace_addr and gfn_base
are zeroed by common KVM.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 481789873c81..a92e36d8a827 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -5011,7 +5011,14 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- gpa_t size = new->npages * PAGE_SIZE;
+ gpa_t size;
+
+ /* When we are protected, we should not change the memory slots */
+ if (kvm_s390_pv_get_handle(kvm))
+ return -EINVAL;
+
+ if (change == KVM_MR_DELETE || change == KVM_MR_FLAGS_ONLY)
+ return 0;

/* A few sanity checks. We can have memory slots which have to be
located/ended at a segment boundary (1MB). The memory in userland is
@@ -5021,15 +5028,13 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
if (new->userspace_addr & 0xffffful)
return -EINVAL;

+ size = new->npages * PAGE_SIZE;
if (size & 0xffffful)
return -EINVAL;

if ((new->base_gfn * PAGE_SIZE) + size > kvm->arch.mem_limit)
return -EINVAL;

- /* When we are protected, we should not change the memory slots */
- if (kvm_s390_pv_get_handle(kvm))
- return -EINVAL;
return 0;
}


2021-11-30 21:46:55

by Maciej S. Szmigiero

Subject: [PATCH v6 19/29] KVM: Integrate gfn_to_memslot_approx() into search_memslots()

From: "Maciej S. Szmigiero" <[email protected]>

The s390 arch has gfn_to_memslot_approx(), which is almost identical to
search_memslots(), differing only in that when the gfn falls in a hole
one of the memslots bordering the hole is returned.

Add this lookup mode as an option to search_memslots() so we don't have two
almost identical functions for looking up a memslot by its gfn.
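
For callers the difference is only in how a gfn that falls into a hole is
handled; illustrative usage (helper names as in the hunks below):

	slot = ____gfn_to_memslot(slots, gfn, false); /* NULL if gfn is in a hole */
	slot = ____gfn_to_memslot(slots, gfn, true);  /* a memslot bordering the hole */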

Signed-off-by: Maciej S. Szmigiero <[email protected]>
[sean: tweaked helper names to keep gfn_to_memslot_approx() in s390]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 45 +++++++---------------------------------
include/linux/kvm_host.h | 35 ++++++++++++++++++++++++-------
virt/kvm/kvm_main.c | 2 +-
3 files changed, 36 insertions(+), 46 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a92e36d8a827..b0fa2592b4ef 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1943,41 +1943,6 @@ static long kvm_s390_set_skeys(struct kvm *kvm, struct kvm_s390_skeys *args)
/* for consistency */
#define KVM_S390_CMMA_SIZE_MAX ((u32)KVM_S390_SKEYS_MAX)

-/*
- * Similar to gfn_to_memslot, but returns the index of a memslot also when the
- * address falls in a hole. In that case the index of one of the memslots
- * bordering the hole is returned.
- */
-static int gfn_to_memslot_approx(struct kvm_memslots *slots, gfn_t gfn)
-{
- int start = 0, end = slots->used_slots;
- int slot = atomic_read(&slots->last_used_slot);
- struct kvm_memory_slot *memslots = slots->memslots;
-
- if (gfn >= memslots[slot].base_gfn &&
- gfn < memslots[slot].base_gfn + memslots[slot].npages)
- return slot;
-
- while (start < end) {
- slot = start + (end - start) / 2;
-
- if (gfn >= memslots[slot].base_gfn)
- end = slot;
- else
- start = slot + 1;
- }
-
- if (start >= slots->used_slots)
- return slots->used_slots - 1;
-
- if (gfn >= memslots[start].base_gfn &&
- gfn < memslots[start].base_gfn + memslots[start].npages) {
- atomic_set(&slots->last_used_slot, start);
- }
-
- return start;
-}
-
static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
u8 *res, unsigned long bufsize)
{
@@ -2001,11 +1966,17 @@ static int kvm_s390_peek_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
return 0;
}

+static struct kvm_memory_slot *gfn_to_memslot_approx(struct kvm_memslots *slots,
+ gfn_t gfn)
+{
+ return ____gfn_to_memslot(slots, gfn, true);
+}
+
static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots,
unsigned long cur_gfn)
{
- int slotidx = gfn_to_memslot_approx(slots, cur_gfn);
- struct kvm_memory_slot *ms = slots->memslots + slotidx;
+ struct kvm_memory_slot *ms = gfn_to_memslot_approx(slots, cur_gfn);
+ int slotidx = ms - slots->memslots;
unsigned long ofs = cur_gfn - ms->base_gfn;

if (ms->base_gfn + ms->npages <= cur_gfn) {
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9f984867c297..86562ffb6ea4 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1250,10 +1250,14 @@ try_get_memslot(struct kvm_memslots *slots, int slot_index, gfn_t gfn)
* Returns a pointer to the memslot that contains gfn and records the index of
* the slot in index. Otherwise returns NULL.
*
+ * With "approx" set returns the memslot also when the address falls
+ * in a hole. In that case one of the memslots bordering the hole is
+ * returned.
+ *
* IMPORTANT: Slots are sorted from highest GFN to lowest GFN!
*/
static inline struct kvm_memory_slot *
-search_memslots(struct kvm_memslots *slots, gfn_t gfn, int *index)
+search_memslots(struct kvm_memslots *slots, gfn_t gfn, int *index, bool approx)
{
int start = 0, end = slots->used_slots;
struct kvm_memory_slot *memslots = slots->memslots;
@@ -1271,22 +1275,26 @@ search_memslots(struct kvm_memslots *slots, gfn_t gfn, int *index)
start = slot + 1;
}

+ if (approx && start >= slots->used_slots) {
+ *index = slots->used_slots - 1;
+ return &memslots[slots->used_slots - 1];
+ }
+
slot = try_get_memslot(slots, start, gfn);
if (slot) {
*index = start;
return slot;
}
+ if (approx) {
+ *index = start;
+ return &memslots[start];
+ }

return NULL;
}

-/*
- * __gfn_to_memslot() and its descendants are here because it is called from
- * non-modular code in arch/powerpc/kvm/book3s_64_vio{,_hv}.c. gfn_to_memslot()
- * itself isn't here as an inline because that would bloat other code too much.
- */
static inline struct kvm_memory_slot *
-__gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
+____gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn, bool approx)
{
struct kvm_memory_slot *slot;
int slot_index = atomic_read(&slots->last_used_slot);
@@ -1295,7 +1303,7 @@ __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
if (slot)
return slot;

- slot = search_memslots(slots, gfn, &slot_index);
+ slot = search_memslots(slots, gfn, &slot_index, approx);
if (slot) {
atomic_set(&slots->last_used_slot, slot_index);
return slot;
@@ -1304,6 +1312,17 @@ __gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
return NULL;
}

+/*
+ * __gfn_to_memslot() and its descendants are here to allow arch code to inline
+ * the lookups in hot paths. gfn_to_memslot() itself isn't here as an inline
+ * because that would bloat other code too much.
+ */
+static inline struct kvm_memory_slot *
+__gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn)
+{
+ return ____gfn_to_memslot(slots, gfn, false);
+}
+
static inline unsigned long
__gfn_to_hva_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
{
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1f37c4ce5f97..aca39b587cdb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2143,7 +2143,7 @@ struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn
* search_memslots() instead of __gfn_to_memslot() to avoid
* thrashing the VM-wide last_used_index in kvm_memslots.
*/
- slot = search_memslots(slots, gfn, &slot_index);
+ slot = search_memslots(slots, gfn, &slot_index, false);
if (slot) {
vcpu->last_used_slot = slot_index;
return slot;

2021-11-30 21:46:56

by Maciej S. Szmigiero

Subject: [PATCH v6 24/29] KVM: Keep memslots in tree-based structures instead of array-based ones

From: "Maciej S. Szmigiero" <[email protected]>

The current memslot code uses a (reverse gfn-ordered) memslot array for
keeping track of memslots.

Because the memslot array that is currently in use cannot be modified,
every memslot management operation (create, delete, move, change flags)
has to make a copy of the whole array so it has a scratch copy to work on.

Strictly speaking, however, it is only necessary to make a copy of the
memslot that is being modified; copying all the memslots currently present
is just a limitation of the array-based memslot implementation.

Two memslot sets, however, are still needed so the VM continues to run
on the currently active set while the requested operation is being
performed on the second, currently inactive one.

In order to have two memslot sets, but only one copy of the actual memslots,
it is necessary to split out the memslot data from the memslot sets.

The memslots themselves should also be kept independent of each other
so they can be individually added or deleted.

These two memslot sets should normally point to the same set of
memslots. They can, however, be desynchronized when performing a
memslot management operation by replacing the memslot to be modified
by its copy. After the operation is complete, both memslot sets once
again point to the same, common set of memslot data.

This commit implements the aforementioned idea.

For tracking of gfns an ordinary rbtree is used, since memslots cannot
overlap in the guest address space and so this data structure is
sufficient to ensure that lookups are done quickly.

The "last used slot" mini-caches (both the per-slot-set one and the
per-vCPU one), which keep track of the last found-by-gfn memslot, are
still present in the new code.
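
The gfn lookup then becomes an ordinary rbtree descent keyed on base_gfn;
a simplified sketch of the search_memslots() flow from the hunks below
(function body fragment, not verbatim from the patch):

	struct rb_node *node = slots->gfn_tree.rb_node;

	while (node) {
		struct kvm_memory_slot *slot =
			container_of(node, struct kvm_memory_slot,
				     gfn_node[slots->node_idx]);

		if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
			return slot;	/* gfn falls inside this memslot */

		node = gfn >= slot->base_gfn ? node->rb_right : node->rb_left;
	}
	return NULL;	/* gfn lies in a hole (non-"approx" lookup) */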

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/mmu.c | 8 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 4 +-
arch/powerpc/kvm/book3s_hv.c | 3 +-
arch/powerpc/kvm/book3s_hv_nested.c | 4 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 14 +-
arch/s390/kvm/kvm-s390.c | 24 +-
arch/s390/kvm/kvm-s390.h | 6 +-
arch/x86/kvm/debugfs.c | 6 +-
arch/x86/kvm/mmu/mmu.c | 8 +-
include/linux/kvm_host.h | 143 +++---
virt/kvm/kvm_main.c | 761 ++++++++++++++--------------
11 files changed, 503 insertions(+), 478 deletions(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 9b2d881ccf49..e65acf35cee3 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -210,13 +210,13 @@ static void stage2_flush_vm(struct kvm *kvm)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
- int idx;
+ int idx, bkt;

idx = srcu_read_lock(&kvm->srcu);
spin_lock(&kvm->mmu_lock);

slots = kvm_memslots(kvm);
- kvm_for_each_memslot(memslot, slots)
+ kvm_for_each_memslot(memslot, bkt, slots)
stage2_flush_memslot(kvm, memslot);

spin_unlock(&kvm->mmu_lock);
@@ -595,14 +595,14 @@ void stage2_unmap_vm(struct kvm *kvm)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
- int idx;
+ int idx, bkt;

idx = srcu_read_lock(&kvm->srcu);
mmap_read_lock(current->mm);
spin_lock(&kvm->mmu_lock);

slots = kvm_memslots(kvm);
- kvm_for_each_memslot(memslot, slots)
+ kvm_for_each_memslot(memslot, bkt, slots)
stage2_unmap_memslot(kvm, memslot);

spin_unlock(&kvm->mmu_lock);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index c63e263312a4..213232914367 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -734,11 +734,11 @@ void kvmppc_rmap_reset(struct kvm *kvm)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
- int srcu_idx;
+ int srcu_idx, bkt;

srcu_idx = srcu_read_lock(&kvm->srcu);
slots = kvm_memslots(kvm);
- kvm_for_each_memslot(memslot, slots) {
+ kvm_for_each_memslot(memslot, bkt, slots) {
/* Mutual exclusion with kvm_unmap_hva_range etc. */
spin_lock(&kvm->mmu_lock);
/*
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2b59ecc5f8c6..51e1c29a6fa0 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -5880,11 +5880,12 @@ static int kvmhv_svm_off(struct kvm *kvm)
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
struct kvm_memory_slot *memslot;
struct kvm_memslots *slots = __kvm_memslots(kvm, i);
+ int bkt;

if (!slots)
continue;

- kvm_for_each_memslot(memslot, slots) {
+ kvm_for_each_memslot(memslot, bkt, slots) {
kvmppc_uvmem_drop_pages(memslot, kvm, true);
uv_unregister_mem_slot(kvm->arch.lpid, memslot->id);
}
diff --git a/arch/powerpc/kvm/book3s_hv_nested.c b/arch/powerpc/kvm/book3s_hv_nested.c
index ed8a2c9f5629..9435e482d514 100644
--- a/arch/powerpc/kvm/book3s_hv_nested.c
+++ b/arch/powerpc/kvm/book3s_hv_nested.c
@@ -749,7 +749,7 @@ void kvmhv_release_all_nested(struct kvm *kvm)
struct kvm_nested_guest *gp;
struct kvm_nested_guest *freelist = NULL;
struct kvm_memory_slot *memslot;
- int srcu_idx;
+ int srcu_idx, bkt;

spin_lock(&kvm->mmu_lock);
for (i = 0; i <= kvm->arch.max_nested_lpid; i++) {
@@ -770,7 +770,7 @@ void kvmhv_release_all_nested(struct kvm *kvm)
}

srcu_idx = srcu_read_lock(&kvm->srcu);
- kvm_for_each_memslot(memslot, kvm_memslots(kvm))
+ kvm_for_each_memslot(memslot, bkt, kvm_memslots(kvm))
kvmhv_free_memslot_nest_rmap(memslot);
srcu_read_unlock(&kvm->srcu, srcu_idx);
}
diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
index 28c436df9935..e414ca44839f 100644
--- a/arch/powerpc/kvm/book3s_hv_uvmem.c
+++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
@@ -459,7 +459,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot, *m;
int ret = H_SUCCESS;
- int srcu_idx;
+ int srcu_idx, bkt;

kvm->arch.secure_guest = KVMPPC_SECURE_INIT_START;

@@ -478,7 +478,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)

/* register the memslot */
slots = kvm_memslots(kvm);
- kvm_for_each_memslot(memslot, slots) {
+ kvm_for_each_memslot(memslot, bkt, slots) {
ret = __kvmppc_uvmem_memslot_create(kvm, memslot);
if (ret)
break;
@@ -486,7 +486,7 @@ unsigned long kvmppc_h_svm_init_start(struct kvm *kvm)

if (ret) {
slots = kvm_memslots(kvm);
- kvm_for_each_memslot(m, slots) {
+ kvm_for_each_memslot(m, bkt, slots) {
if (m == memslot)
break;
__kvmppc_uvmem_memslot_delete(kvm, memslot);
@@ -647,7 +647,7 @@ void kvmppc_uvmem_drop_pages(const struct kvm_memory_slot *slot,

unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)
{
- int srcu_idx;
+ int srcu_idx, bkt;
struct kvm_memory_slot *memslot;

/*
@@ -662,7 +662,7 @@ unsigned long kvmppc_h_svm_init_abort(struct kvm *kvm)

srcu_idx = srcu_read_lock(&kvm->srcu);

- kvm_for_each_memslot(memslot, kvm_memslots(kvm))
+ kvm_for_each_memslot(memslot, bkt, kvm_memslots(kvm))
kvmppc_uvmem_drop_pages(memslot, kvm, false);

srcu_read_unlock(&kvm->srcu, srcu_idx);
@@ -821,7 +821,7 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
- int srcu_idx;
+ int srcu_idx, bkt;
long ret = H_SUCCESS;

if (!(kvm->arch.secure_guest & KVMPPC_SECURE_INIT_START))
@@ -830,7 +830,7 @@ unsigned long kvmppc_h_svm_init_done(struct kvm *kvm)
/* migrate any unmoved normal pfn to device pfns*/
srcu_idx = srcu_read_lock(&kvm->srcu);
slots = kvm_memslots(kvm);
- kvm_for_each_memslot(memslot, slots) {
+ kvm_for_each_memslot(memslot, bkt, slots) {
ret = kvmppc_uv_migrate_mem_slot(kvm, memslot);
if (ret) {
/*
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 5054fab23226..dd099d352753 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1037,13 +1037,13 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
struct kvm_memory_slot *ms;
struct kvm_memslots *slots;
unsigned long ram_pages = 0;
- int slotnr;
+ int bkt;

/* migration mode already enabled */
if (kvm->arch.migration_mode)
return 0;
slots = kvm_memslots(kvm);
- if (!slots || !slots->used_slots)
+ if (!slots || kvm_memslots_empty(slots))
return -EINVAL;

if (!kvm->arch.use_cmma) {
@@ -1051,8 +1051,7 @@ static int kvm_s390_vm_start_migration(struct kvm *kvm)
return 0;
}
/* mark all the pages in active slots as dirty */
- for (slotnr = 0; slotnr < slots->used_slots; slotnr++) {
- ms = slots->memslots + slotnr;
+ kvm_for_each_memslot(ms, bkt, slots) {
if (!ms->dirty_bitmap)
return -EINVAL;
/*
@@ -1976,22 +1975,21 @@ static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots,
unsigned long cur_gfn)
{
struct kvm_memory_slot *ms = gfn_to_memslot_approx(slots, cur_gfn);
- int slotidx = ms - slots->memslots;
unsigned long ofs = cur_gfn - ms->base_gfn;
+ struct rb_node *mnode = &ms->gfn_node[slots->node_idx];

if (ms->base_gfn + ms->npages <= cur_gfn) {
- slotidx--;
+ mnode = rb_next(mnode);
/* If we are above the highest slot, wrap around */
- if (slotidx < 0)
- slotidx = slots->used_slots - 1;
+ if (!mnode)
+ mnode = rb_first(&slots->gfn_tree);

- ms = slots->memslots + slotidx;
+ ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);
ofs = 0;
}
ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, ofs);
- while ((slotidx > 0) && (ofs >= ms->npages)) {
- slotidx--;
- ms = slots->memslots + slotidx;
+ while (ofs >= ms->npages && (mnode = rb_next(mnode))) {
+ ms = container_of(mnode, struct kvm_memory_slot, gfn_node[slots->node_idx]);
ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, 0);
}
return ms->base_gfn + ofs;
@@ -2004,7 +2002,7 @@ static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
struct kvm_memslots *slots = kvm_memslots(kvm);
struct kvm_memory_slot *ms;

- if (unlikely(!slots->used_slots))
+ if (unlikely(kvm_memslots_empty(slots)))
return 0;

cur_gfn = kvm_s390_next_dirty_cmma(slots, args->start_gfn);
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index cc309cc37e96..60f0effcce99 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -220,12 +220,14 @@ static inline void kvm_s390_set_user_cpu_state_ctrl(struct kvm *kvm)
/* get the end gfn of the last (highest gfn) memslot */
static inline unsigned long kvm_s390_get_gfn_end(struct kvm_memslots *slots)
{
+ struct rb_node *node;
struct kvm_memory_slot *ms;

- if (WARN_ON(!slots->used_slots))
+ if (WARN_ON(kvm_memslots_empty(slots)))
return 0;

- ms = slots->memslots;
+ node = rb_last(&slots->gfn_tree);
+ ms = container_of(node, struct kvm_memory_slot, gfn_node[slots->node_idx]);
return ms->base_gfn + ms->npages;
}

diff --git a/arch/x86/kvm/debugfs.c b/arch/x86/kvm/debugfs.c
index 54a83a744538..543a8c04025c 100644
--- a/arch/x86/kvm/debugfs.c
+++ b/arch/x86/kvm/debugfs.c
@@ -107,9 +107,10 @@ static int kvm_mmu_rmaps_stat_show(struct seq_file *m, void *v)
write_lock(&kvm->mmu_lock);

for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ int bkt;
+
slots = __kvm_memslots(kvm, i);
- for (j = 0; j < slots->used_slots; j++) {
- slot = &slots->memslots[j];
+ kvm_for_each_memslot(slot, bkt, slots)
for (k = 0; k < KVM_NR_PAGE_SIZES; k++) {
rmap = slot->arch.rmap[k];
lpage_size = kvm_mmu_slot_lpages(slot, k + 1);
@@ -121,7 +122,6 @@ static int kvm_mmu_rmaps_stat_show(struct seq_file *m, void *v)
cur[index]++;
}
}
- }
}

write_unlock(&kvm->mmu_lock);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 64b6181f7757..f5a89942f054 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3397,7 +3397,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *slot;
- int r = 0, i;
+ int r = 0, i, bkt;

/*
* Check if this is the first shadow root being allocated before
@@ -3422,7 +3422,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm)

for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(slot, slots) {
+ kvm_for_each_memslot(slot, bkt, slots) {
/*
* Both of these functions are no-ops if the target is
* already allocated, so unconditionally calling both
@@ -5706,14 +5706,14 @@ static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
struct kvm_memslots *slots;
bool flush = false;
gfn_t start, end;
- int i;
+ int i, bkt;

if (!kvm_memslots_have_rmaps(kvm))
return flush;

for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(memslot, slots) {
+ kvm_for_each_memslot(memslot, bkt, slots) {
start = max(gfn_start, memslot->base_gfn);
end = min(gfn_end, memslot->base_gfn + memslot->npages);
if (start >= end)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9f87adda1764..41efe53cf150 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -31,6 +31,7 @@
#include <linux/notifier.h>
#include <linux/hashtable.h>
#include <linux/interval_tree.h>
+#include <linux/rbtree.h>
#include <linux/xarray.h>
#include <asm/signal.h>

@@ -360,11 +361,13 @@ struct kvm_vcpu {
struct kvm_dirty_ring dirty_ring;

/*
- * The index of the most recently used memslot by this vCPU. It's ok
- * if this becomes stale due to memslot changes since we always check
- * it is a valid slot.
+ * The most recently used memslot by this vCPU and the slots generation
+ * for which it is valid.
+ * No wraparound protection is needed since generations won't overflow in
+ * thousands of years, even assuming 1M memslot operations per second.
*/
- int last_used_slot;
+ struct kvm_memory_slot *last_used_slot;
+ u64 last_used_slot_gen;
};

/* must be called with irqs disabled */
@@ -429,9 +432,26 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
*/
#define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)

+/*
+ * Since at idle each memslot belongs to two memslot sets it has to contain
+ * two embedded nodes for each data structure that it forms a part of.
+ *
+ * Two memslot sets (one active and one inactive) are necessary so the VM
+ * continues to run on one memslot set while the other is being modified.
+ *
+ * These two memslot sets normally point to the same set of memslots.
+ * They can, however, be desynchronized when performing a memslot management
+ * operation by replacing the memslot to be modified by its copy.
+ * After the operation is complete, both memslot sets once again point to
+ * the same, common set of memslot data.
+ *
+ * The memslots themselves are independent of each other so they can be
+ * individually added or deleted.
+ */
struct kvm_memory_slot {
- struct hlist_node id_node;
- struct interval_tree_node hva_node;
+ struct hlist_node id_node[2];
+ struct interval_tree_node hva_node[2];
+ struct rb_node gfn_node[2];
gfn_t base_gfn;
unsigned long npages;
unsigned long *dirty_bitmap;
@@ -526,16 +546,13 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
}
#endif

-/*
- * Note:
- * memslots are not sorted by id anymore, please use id_to_memslot()
- * to get the memslot by its id.
- */
struct kvm_memslots {
u64 generation;
+ atomic_long_t last_used_slot;
struct rb_root_cached hva_tree;
+ struct rb_root gfn_tree;
/*
- * The mapping table from slot id to the index in memslots[].
+ * The mapping table from slot id to memslot.
*
* 7-bit bucket count matches the size of the old id to index array for
* 512 slots, while giving good performance with this slot count.
@@ -543,9 +560,7 @@ struct kvm_memslots {
* always result in higher memory usage (even for lower memslot counts).
*/
DECLARE_HASHTABLE(id_hash, 7);
- atomic_t last_used_slot;
- int used_slots;
- struct kvm_memory_slot memslots[];
+ int node_idx;
};

struct kvm {
@@ -567,6 +582,9 @@ struct kvm {
struct mutex slots_arch_lock;
struct mm_struct *mm; /* userspace tied to this vm */
unsigned long nr_memslot_pages;
+ /* The two memslot sets - active and inactive (per address space) */
+ struct kvm_memslots __memslots[KVM_ADDRESS_SPACE_NUM][2];
+ /* The current active memslot set for each address space */
struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM];
struct xarray vcpu_array;

@@ -741,11 +759,10 @@ static inline struct kvm_vcpu *kvm_get_vcpu_by_id(struct kvm *kvm, int id)
return NULL;
}

-#define kvm_for_each_memslot(memslot, slots) \
- for (memslot = &slots->memslots[0]; \
- memslot < slots->memslots + slots->used_slots; memslot++) \
- if (WARN_ON_ONCE(!memslot->npages)) { \
- } else
+static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
+{
+ return vcpu->vcpu_idx;
+}

void kvm_destroy_vcpus(struct kvm *kvm);

@@ -807,12 +824,23 @@ static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
return __kvm_memslots(vcpu->kvm, as_id);
}

+static inline bool kvm_memslots_empty(struct kvm_memslots *slots)
+{
+ return RB_EMPTY_ROOT(&slots->gfn_tree);
+}
+
+#define kvm_for_each_memslot(memslot, bkt, slots) \
+ hash_for_each(slots->id_hash, bkt, memslot, id_node[slots->node_idx]) \
+ if (WARN_ON_ONCE(!memslot->npages)) { \
+ } else
+
static inline
struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
{
struct kvm_memory_slot *slot;
+ int idx = slots->node_idx;

- hash_for_each_possible(slots->id_hash, slot, id_node, id) {
+ hash_for_each_possible(slots->id_hash, slot, id_node[idx], id) {
if (slot->id == id)
return slot;
}
@@ -1231,25 +1259,15 @@ void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);
bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);

/*
- * Returns a pointer to the memslot at slot_index if it contains gfn.
+ * Returns a pointer to the memslot if it contains gfn.
* Otherwise returns NULL.
*/
static inline struct kvm_memory_slot *
-try_get_memslot(struct kvm_memslots *slots, int slot_index, gfn_t gfn)
+try_get_memslot(struct kvm_memory_slot *slot, gfn_t gfn)
{
- struct kvm_memory_slot *slot;
-
- if (slot_index < 0 || slot_index >= slots->used_slots)
+ if (!slot)
return NULL;

- /*
- * slot_index can come from vcpu->last_used_slot which is not kept
- * in sync with userspace-controllable memslot deletion. So use nospec
- * to prevent the CPU from speculating past the end of memslots[].
- */
- slot_index = array_index_nospec(slot_index, slots->used_slots);
- slot = &slots->memslots[slot_index];
-
if (gfn >= slot->base_gfn && gfn < slot->base_gfn + slot->npages)
return slot;
else
@@ -1257,65 +1275,46 @@ try_get_memslot(struct kvm_memslots *slots, int slot_index, gfn_t gfn)
}

/*
- * Returns a pointer to the memslot that contains gfn and records the index of
- * the slot in index. Otherwise returns NULL.
+ * Returns a pointer to the memslot that contains gfn. Otherwise returns NULL.
*
* With "approx" set returns the memslot also when the address falls
* in a hole. In that case one of the memslots bordering the hole is
* returned.
- *
- * IMPORTANT: Slots are sorted from highest GFN to lowest GFN!
*/
static inline struct kvm_memory_slot *
-search_memslots(struct kvm_memslots *slots, gfn_t gfn, int *index, bool approx)
+search_memslots(struct kvm_memslots *slots, gfn_t gfn, bool approx)
{
- int start = 0, end = slots->used_slots;
- struct kvm_memory_slot *memslots = slots->memslots;
struct kvm_memory_slot *slot;
-
- if (unlikely(!slots->used_slots))
- return NULL;
-
- while (start < end) {
- int slot = start + (end - start) / 2;
-
- if (gfn >= memslots[slot].base_gfn)
- end = slot;
- else
- start = slot + 1;
- }
-
- if (approx && start >= slots->used_slots) {
- *index = slots->used_slots - 1;
- return &memslots[slots->used_slots - 1];
- }
-
- slot = try_get_memslot(slots, start, gfn);
- if (slot) {
- *index = start;
- return slot;
- }
- if (approx) {
- *index = start;
- return &memslots[start];
+ struct rb_node *node;
+ int idx = slots->node_idx;
+
+ slot = NULL;
+ for (node = slots->gfn_tree.rb_node; node; ) {
+ slot = container_of(node, struct kvm_memory_slot, gfn_node[idx]);
+ if (gfn >= slot->base_gfn) {
+ if (gfn < slot->base_gfn + slot->npages)
+ return slot;
+ node = node->rb_right;
+ } else
+ node = node->rb_left;
}

- return NULL;
+ return approx ? slot : NULL;
}

static inline struct kvm_memory_slot *
____gfn_to_memslot(struct kvm_memslots *slots, gfn_t gfn, bool approx)
{
struct kvm_memory_slot *slot;
- int slot_index = atomic_read(&slots->last_used_slot);

- slot = try_get_memslot(slots, slot_index, gfn);
+ slot = (struct kvm_memory_slot *)atomic_long_read(&slots->last_used_slot);
+ slot = try_get_memslot(slot, gfn);
if (slot)
return slot;

- slot = search_memslots(slots, gfn, &slot_index, approx);
+ slot = search_memslots(slots, gfn, approx);
if (slot) {
- atomic_set(&slots->last_used_slot, slot_index);
+ atomic_long_set(&slots->last_used_slot, (unsigned long)slot);
return slot;
}

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 502dbdf8ff96..c57748ee41e8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -435,7 +435,7 @@ static void kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id)
vcpu->preempted = false;
vcpu->ready = false;
preempt_notifier_init(&vcpu->preempt_notifier, &kvm_preempt_ops);
- vcpu->last_used_slot = 0;
+ vcpu->last_used_slot = NULL;
}

static void kvm_vcpu_destroy(struct kvm_vcpu *vcpu)
@@ -547,7 +547,7 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
range->start, range->end - 1) {
unsigned long hva_start, hva_end;

- slot = container_of(node, struct kvm_memory_slot, hva_node);
+ slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
hva_start = max(range->start, slot->userspace_addr);
hva_end = min(range->end, slot->userspace_addr +
(slot->npages << PAGE_SHIFT));
@@ -878,20 +878,6 @@ static void kvm_destroy_pm_notifier(struct kvm *kvm)
}
#endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */

-static struct kvm_memslots *kvm_alloc_memslots(void)
-{
- struct kvm_memslots *slots;
-
- slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
- if (!slots)
- return NULL;
-
- slots->hva_tree = RB_ROOT_CACHED;
- hash_init(slots->id_hash);
-
- return slots;
-}
-
static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
{
if (!memslot->dirty_bitmap)
@@ -901,27 +887,33 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
memslot->dirty_bitmap = NULL;
}

+/* This does not remove the slot from struct kvm_memslots data structures */
static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
kvm_destroy_dirty_bitmap(slot);

kvm_arch_free_memslot(kvm, slot);

- slot->flags = 0;
- slot->npages = 0;
+ kfree(slot);
}

static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
{
+ struct hlist_node *idnode;
struct kvm_memory_slot *memslot;
+ int bkt;

- if (!slots)
+ /*
+ * The same memslot objects live in both active and inactive sets,
+ * arbitrarily free using index '1' so the second invocation of this
+ * function isn't operating over a structure with dangling pointers
+ * (even though this function isn't actually touching them).
+ */
+ if (!slots->node_idx)
return;

- kvm_for_each_memslot(memslot, slots)
+ hash_for_each_safe(slots->id_hash, bkt, idnode, memslot, id_node[1])
kvm_free_memslot(kvm, memslot);
-
- kvfree(slots);
}

static umode_t kvm_stats_debugfs_mode(const struct _kvm_stats_desc *pdesc)
@@ -1060,8 +1052,9 @@ int __weak kvm_arch_create_vm_debugfs(struct kvm *kvm)
static struct kvm *kvm_create_vm(unsigned long type)
{
struct kvm *kvm = kvm_arch_alloc_vm();
+ struct kvm_memslots *slots;
int r = -ENOMEM;
- int i;
+ int i, j;

if (!kvm)
return ERR_PTR(-ENOMEM);
@@ -1089,13 +1082,20 @@ static struct kvm *kvm_create_vm(unsigned long type)

refcount_set(&kvm->users_count, 1);
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
- struct kvm_memslots *slots = kvm_alloc_memslots();
+ for (j = 0; j < 2; j++) {
+ slots = &kvm->__memslots[i][j];

- if (!slots)
- goto out_err_no_arch_destroy_vm;
- /* Generations must be different for each address space. */
- slots->generation = i;
- rcu_assign_pointer(kvm->memslots[i], slots);
+ atomic_long_set(&slots->last_used_slot, (unsigned long)NULL);
+ slots->hva_tree = RB_ROOT_CACHED;
+ slots->gfn_tree = RB_ROOT;
+ hash_init(slots->id_hash);
+ slots->node_idx = j;
+
+ /* Generations must be different for each address space. */
+ slots->generation = i;
+ }
+
+ rcu_assign_pointer(kvm->memslots[i], &kvm->__memslots[i][0]);
}

for (i = 0; i < KVM_NR_BUSES; i++) {
@@ -1149,8 +1149,6 @@ static struct kvm *kvm_create_vm(unsigned long type)
WARN_ON_ONCE(!refcount_dec_and_test(&kvm->users_count));
for (i = 0; i < KVM_NR_BUSES; i++)
kfree(kvm_get_bus(kvm, i));
- for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
- kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
cleanup_srcu_struct(&kvm->irq_srcu);
out_err_no_irq_srcu:
cleanup_srcu_struct(&kvm->srcu);
@@ -1215,8 +1213,10 @@ static void kvm_destroy_vm(struct kvm *kvm)
#endif
kvm_arch_destroy_vm(kvm);
kvm_destroy_devices(kvm);
- for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
- kvm_free_memslots(kvm, __kvm_memslots(kvm, i));
+ for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ kvm_free_memslots(kvm, &kvm->__memslots[i][0]);
+ kvm_free_memslots(kvm, &kvm->__memslots[i][1]);
+ }
cleanup_srcu_struct(&kvm->irq_srcu);
cleanup_srcu_struct(&kvm->srcu);
kvm_arch_free_vm(kvm);
@@ -1286,227 +1286,136 @@ static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
return 0;
}

-static void kvm_replace_memslot(struct kvm_memslots *slots,
- struct kvm_memory_slot *old,
- struct kvm_memory_slot *new)
-{
- /*
- * Remove the old memslot from the hash list and interval tree, copying
- * the node data would corrupt the structures.
- */
- if (old) {
- hash_del(&old->id_node);
- interval_tree_remove(&old->hva_node, &slots->hva_tree);
-
- if (!new)
- return;
-
- /* Copy the source *data*, not the pointer, to the destination. */
- *new = *old;
- } else {
- /* If @old is NULL, initialize @new's hva range. */
- new->hva_node.start = new->userspace_addr;
- new->hva_node.last = new->userspace_addr +
- (new->npages << PAGE_SHIFT) - 1;
- }
-
- /* (Re)Add the new memslot. */
- hash_add(slots->id_hash, &new->id_node, new->id);
- interval_tree_insert(&new->hva_node, &slots->hva_tree);
-}
-
-static void kvm_shift_memslot(struct kvm_memslots *slots, int dst, int src)
+static struct kvm_memslots *kvm_get_inactive_memslots(struct kvm *kvm, int as_id)
{
- struct kvm_memory_slot *mslots = slots->memslots;
+ struct kvm_memslots *active = __kvm_memslots(kvm, as_id);
+ int node_idx_inactive = active->node_idx ^ 1;

- kvm_replace_memslot(slots, &mslots[src], &mslots[dst]);
+ return &kvm->__memslots[as_id][node_idx_inactive];
}

/*
- * Delete a memslot by decrementing the number of used slots and shifting all
- * other entries in the array forward one spot.
- * @memslot is a detached dummy struct with just .id and .as_id filled.
+ * Helper to get the address space ID when one of memslot pointers may be NULL.
+ * This also serves as a sanity that at least one of the pointers is non-NULL,
+ * and that their address space IDs don't diverge.
*/
-static inline void kvm_memslot_delete(struct kvm_memslots *slots,
- struct kvm_memory_slot *memslot)
+static int kvm_memslots_get_as_id(struct kvm_memory_slot *a,
+ struct kvm_memory_slot *b)
{
- struct kvm_memory_slot *mslots = slots->memslots;
- struct kvm_memory_slot *oldslot = id_to_memslot(slots, memslot->id);
- int i;
-
- if (WARN_ON(!oldslot))
- return;
-
- slots->used_slots--;
+ if (WARN_ON_ONCE(!a && !b))
+ return 0;

- if (atomic_read(&slots->last_used_slot) >= slots->used_slots)
- atomic_set(&slots->last_used_slot, 0);
+ if (!a)
+ return b->as_id;
+ if (!b)
+ return a->as_id;

- /*
- * Remove the to-be-deleted memslot from the list/tree _before_ shifting
- * the trailing memslots forward, its data will be overwritten.
- * Defer the (somewhat pointless) copying of the memslot until after
- * the last slot has been shifted to avoid overwriting said last slot.
- */
- kvm_replace_memslot(slots, oldslot, NULL);
-
- for (i = oldslot - mslots; i < slots->used_slots; i++)
- kvm_shift_memslot(slots, i, i + 1);
- mslots[i] = *memslot;
+ WARN_ON_ONCE(a->as_id != b->as_id);
+ return a->as_id;
}

-/*
- * "Insert" a new memslot by incrementing the number of used slots. Returns
- * the new slot's initial index into the memslots array.
- */
-static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
+static void kvm_insert_gfn_node(struct kvm_memslots *slots,
+ struct kvm_memory_slot *slot)
{
- return slots->used_slots++;
-}
-
-/*
- * Move a changed memslot backwards in the array by shifting existing slots
- * with a higher GFN toward the front of the array. Note, the changed memslot
- * itself is not preserved in the array, i.e. not swapped at this time, only
- * its new index into the array is tracked. Returns the changed memslot's
- * current index into the memslots array.
- * The memslot at the returned index will not be in @slots->hva_tree or
- * @slots->id_hash by then.
- * @memslot is a detached struct with desired final data of the changed slot.
- */
-static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
- struct kvm_memory_slot *memslot)
-{
- struct kvm_memory_slot *mslots = slots->memslots;
- struct kvm_memory_slot *oldslot = id_to_memslot(slots, memslot->id);
- int i;
-
- if (!oldslot || !slots->used_slots)
- return -1;
-
- /*
- * Delete the slot from the hash table and interval tree before sorting
- * the remaining slots, the slot's data may be overwritten when copying
- * slots as part of the sorting proccess. update_memslots() will
- * unconditionally rewrite and re-add the entire slot.
- */
- kvm_replace_memslot(slots, oldslot, NULL);
-
- /*
- * Move the target memslot backward in the array by shifting existing
- * memslots with a higher GFN (than the target memslot) towards the
- * front of the array.
- */
- for (i = oldslot - mslots; i < slots->used_slots - 1; i++) {
- if (memslot->base_gfn > mslots[i + 1].base_gfn)
- break;
+ struct rb_root *gfn_tree = &slots->gfn_tree;
+ struct rb_node **node, *parent;
+ int idx = slots->node_idx;

- WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
+ parent = NULL;
+ for (node = &gfn_tree->rb_node; *node; ) {
+ struct kvm_memory_slot *tmp;

- kvm_shift_memslot(slots, i, i + 1);
+ tmp = container_of(*node, struct kvm_memory_slot, gfn_node[idx]);
+ parent = *node;
+ if (slot->base_gfn < tmp->base_gfn)
+ node = &(*node)->rb_left;
+ else if (slot->base_gfn > tmp->base_gfn)
+ node = &(*node)->rb_right;
+ else
+ BUG();
}
- return i;
+
+ rb_link_node(&slot->gfn_node[idx], parent, node);
+ rb_insert_color(&slot->gfn_node[idx], gfn_tree);
}

-/*
- * Move a changed memslot forwards in the array by shifting existing slots with
- * a lower GFN toward the back of the array. Note, the changed memslot itself
- * is not preserved in the array, i.e. not swapped at this time, only its new
- * index into the array is tracked. Returns the changed memslot's final index
- * into the memslots array.
- * The memslot at the returned index will not be in @slots->hva_tree or
- * @slots->id_hash by then.
- * @memslot is a detached struct with desired final data of the new or
- * changed slot.
- * Assumes that the memslot at @start index is not in @slots->hva_tree or
- * @slots->id_hash.
- */
-static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
- struct kvm_memory_slot *memslot,
- int start)
+static void kvm_erase_gfn_node(struct kvm_memslots *slots,
+ struct kvm_memory_slot *slot)
{
- struct kvm_memory_slot *mslots = slots->memslots;
- int i;
+ rb_erase(&slot->gfn_node[slots->node_idx], &slots->gfn_tree);
+}

- for (i = start; i > 0; i--) {
- if (memslot->base_gfn < mslots[i - 1].base_gfn)
- break;
+static void kvm_replace_gfn_node(struct kvm_memslots *slots,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *new)
+{
+ int idx = slots->node_idx;

- WARN_ON_ONCE(memslot->base_gfn == mslots[i - 1].base_gfn);
+ WARN_ON_ONCE(old->base_gfn != new->base_gfn);

- kvm_shift_memslot(slots, i, i - 1);
- }
- return i;
+ rb_replace_node(&old->gfn_node[idx], &new->gfn_node[idx],
+ &slots->gfn_tree);
}

/*
- * Re-sort memslots based on their GFN to account for an added, deleted, or
- * moved memslot. Sorting memslots by GFN allows using a binary search during
- * memslot lookup.
- *
- * IMPORTANT: Slots are sorted from highest GFN to lowest GFN! I.e. the entry
- * at memslots[0] has the highest GFN.
- *
- * The sorting algorithm takes advantage of having initially sorted memslots
- * and knowing the position of the changed memslot. Sorting is also optimized
- * by not swapping the updated memslot and instead only shifting other memslots
- * and tracking the new index for the update memslot. Only once its final
- * index is known is the updated memslot copied into its position in the array.
- *
- * - When deleting a memslot, the deleted memslot simply needs to be moved to
- * the end of the array.
- *
- * - When creating a memslot, the algorithm "inserts" the new memslot at the
- * end of the array and then it forward to its correct location.
- *
- * - When moving a memslot, the algorithm first moves the updated memslot
- * backward to handle the scenario where the memslot's GFN was changed to a
- * lower value. update_memslots() then falls through and runs the same flow
- * as creating a memslot to move the memslot forward to handle the scenario
- * where its GFN was changed to a higher value.
+ * Replace @old with @new in the inactive memslots.
*
- * Note, slots are sorted from highest->lowest instead of lowest->highest for
- * historical reasons. Originally, invalid memslots where denoted by having
- * GFN=0, thus sorting from highest->lowest naturally sorted invalid memslots
- * to the end of the array. The current algorithm uses dedicated logic to
- * delete a memslot and thus does not rely on invalid memslots having GFN=0.
+ * With NULL @old this simply adds @new.
+ * With NULL @new this simply removes @old.
*
- * The other historical motiviation for highest->lowest was to improve the
- * performance of memslot lookup. KVM originally used a linear search starting
- * at memslots[0]. On x86, the largest memslot usually has one of the highest,
- * if not *the* highest, GFN, as the bulk of the guest's RAM is located in a
- * single memslot above the 4gb boundary. As the largest memslot is also the
- * most likely to be referenced, sorting it to the front of the array was
- * advantageous. The current binary search starts from the middle of the array
- * and uses an LRU pointer to improve performance for all memslots and GFNs.
- *
- * @memslot is a detached struct, not a part of the current or new memslot
- * array.
+ * If @new is non-NULL its hva_node[slots_idx] range has to be set
+ * appropriately.
*/
-static void update_memslots(struct kvm_memslots *slots,
- struct kvm_memory_slot *memslot,
- enum kvm_mr_change change)
+static void kvm_replace_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *new)
{
- int i;
+ int as_id = kvm_memslots_get_as_id(old, new);
+ struct kvm_memslots *slots = kvm_get_inactive_memslots(kvm, as_id);
+ int idx = slots->node_idx;

- if (change == KVM_MR_DELETE) {
- kvm_memslot_delete(slots, memslot);
- } else {
- if (change == KVM_MR_CREATE)
- i = kvm_memslot_insert_back(slots);
- else
- i = kvm_memslot_move_backward(slots, memslot);
- i = kvm_memslot_move_forward(slots, memslot, i);
+ if (old) {
+ hash_del(&old->id_node[idx]);
+ interval_tree_remove(&old->hva_node[idx], &slots->hva_tree);

- if (WARN_ON_ONCE(i < 0))
+ if ((long)old == atomic_long_read(&slots->last_used_slot))
+ atomic_long_set(&slots->last_used_slot, (long)new);
+
+ if (!new) {
+ kvm_erase_gfn_node(slots, old);
return;
+ }
+ }

- /*
- * Copy the memslot to its new position in memslots and update
- * its index accordingly.
- */
- slots->memslots[i] = *memslot;
- kvm_replace_memslot(slots, NULL, &slots->memslots[i]);
+ /*
+ * Initialize @new's hva range. Do this even when replacing an @old
+ * slot; kvm_copy_memslot() deliberately does not touch node data.
+ */
+ new->hva_node[idx].start = new->userspace_addr;
+ new->hva_node[idx].last = new->userspace_addr +
+ (new->npages << PAGE_SHIFT) - 1;
+
+ /*
+ * (Re)Add the new memslot. There is no O(1) interval_tree_replace(),
+ * so hva_node needs to be swapped with remove+insert even though hva can't
+ * change when replacing an existing slot.
+ */
+ hash_add(slots->id_hash, &new->id_node[idx], new->id);
+ interval_tree_insert(&new->hva_node[idx], &slots->hva_tree);
+
+ /*
+ * If the memslot gfn is unchanged, rb_replace_node() can be used to
+ * switch the node in the gfn tree instead of removing the old and
+ * inserting the new as two separate operations. Replacement is a
+ * single O(1) operation versus two O(log(n)) operations for
+ * remove+insert.
+ */
+ if (old && old->base_gfn == new->base_gfn) {
+ kvm_replace_gfn_node(slots, old, new);
+ } else {
+ if (old)
+ kvm_erase_gfn_node(slots, old);
+ kvm_insert_gfn_node(slots, new);
}
}

@@ -1524,11 +1433,12 @@ static int check_memory_region_flags(const struct kvm_userspace_memory_region *m
return 0;
}

-static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
- int as_id, struct kvm_memslots *slots)
+static void kvm_swap_active_memslots(struct kvm *kvm, int as_id)
{
- struct kvm_memslots *old_memslots = __kvm_memslots(kvm, as_id);
- u64 gen = old_memslots->generation;
+ struct kvm_memslots *slots = kvm_get_inactive_memslots(kvm, as_id);
+
+ /* Grab the generation from the active memslots. */
+ u64 gen = __kvm_memslots(kvm, as_id)->generation;

WARN_ON(gen & KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS);
slots->generation = gen | KVM_MEMSLOT_GEN_UPDATE_IN_PROGRESS;
@@ -1579,58 +1489,6 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
kvm_arch_memslots_updated(kvm, gen);

slots->generation = gen;
-
- return old_memslots;
-}
-
-static size_t kvm_memslots_size(int slots)
-{
- return sizeof(struct kvm_memslots) +
- (sizeof(struct kvm_memory_slot) * slots);
-}
-
-/*
- * Note, at a minimum, the current number of used slots must be allocated, even
- * when deleting a memslot, as we need a complete duplicate of the memslots for
- * use when invalidating a memslot prior to deleting/moving the memslot.
- */
-static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
- enum kvm_mr_change change)
-{
- struct kvm_memslots *slots;
- size_t new_size;
- struct kvm_memory_slot *memslot;
-
- if (change == KVM_MR_CREATE)
- new_size = kvm_memslots_size(old->used_slots + 1);
- else
- new_size = kvm_memslots_size(old->used_slots);
-
- slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
- if (unlikely(!slots))
- return NULL;
-
- memcpy(slots, old, kvm_memslots_size(old->used_slots));
-
- slots->hva_tree = RB_ROOT_CACHED;
- hash_init(slots->id_hash);
- kvm_for_each_memslot(memslot, slots) {
- interval_tree_insert(&memslot->hva_node, &slots->hva_tree);
- hash_add(slots->id_hash, &memslot->id_node, memslot->id);
- }
-
- return slots;
-}
-
-static void kvm_copy_memslots_arch(struct kvm_memslots *to,
- struct kvm_memslots *from)
-{
- int i;
-
- WARN_ON_ONCE(to->used_slots != from->used_slots);
-
- for (i = 0; i < from->used_slots; i++)
- to->memslots[i].arch = from->memslots[i].arch;
}

static int kvm_prepare_memory_region(struct kvm *kvm,
@@ -1685,31 +1543,214 @@ static void kvm_commit_memory_region(struct kvm *kvm,

kvm_arch_commit_memory_region(kvm, old, new, change);

+ switch (change) {
+ case KVM_MR_CREATE:
+ /* Nothing more to do. */
+ break;
+ case KVM_MR_DELETE:
+ /* Free the old memslot and all its metadata. */
+ kvm_free_memslot(kvm, old);
+ break;
+ case KVM_MR_MOVE:
+ case KVM_MR_FLAGS_ONLY:
+ /*
+ * Free the dirty bitmap as needed; the below check encompasses
+ * both the flags and whether a ring buffer is being used.
+ */
+ if (old->dirty_bitmap && !new->dirty_bitmap)
+ kvm_destroy_dirty_bitmap(old);
+
+ /*
+ * The final quirk. Free the detached, old slot, but only its
+ * memory, not any metadata. Metadata, including arch specific
+ * data, may be reused by @new.
+ */
+ kfree(old);
+ break;
+ default:
+ BUG();
+ }
+}
+
+/*
+ * Activate @new, which must be installed in the inactive slots by the caller,
+ * by swapping the active slots and then propagating @new to @old once @old is
+ * unreachable and can be safely modified.
+ *
+ * With NULL @old this simply adds @new to @active (while swapping the sets).
+ * With NULL @new this simply removes @old from @active and frees it
+ * (while also swapping the sets).
+ */
+static void kvm_activate_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *new)
+{
+ int as_id = kvm_memslots_get_as_id(old, new);
+
+ kvm_swap_active_memslots(kvm, as_id);
+
+ /* Propagate the new memslot to the now inactive memslots. */
+ kvm_replace_memslot(kvm, old, new);
+}
+
+static void kvm_copy_memslot(struct kvm_memory_slot *dest,
+ const struct kvm_memory_slot *src)
+{
+ dest->base_gfn = src->base_gfn;
+ dest->npages = src->npages;
+ dest->dirty_bitmap = src->dirty_bitmap;
+ dest->arch = src->arch;
+ dest->userspace_addr = src->userspace_addr;
+ dest->flags = src->flags;
+ dest->id = src->id;
+ dest->as_id = src->as_id;
+}
+
+static void kvm_invalidate_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *working_slot)
+{
/*
- * Free the old memslot's metadata. On DELETE, free the whole thing,
- * otherwise free the dirty bitmap as needed (the below effectively
- * checks both the flags and whether a ring buffer is being used).
+ * Mark the current slot INVALID. As with all memslot modifications,
+ * this must be done on an unreachable slot to avoid modifying the
+ * current slot in the active tree.
*/
- if (change == KVM_MR_DELETE)
- kvm_free_memslot(kvm, old);
- else if (old->dirty_bitmap && !new->dirty_bitmap)
- kvm_destroy_dirty_bitmap(old);
+ kvm_copy_memslot(working_slot, old);
+ working_slot->flags |= KVM_MEMSLOT_INVALID;
+ kvm_replace_memslot(kvm, old, working_slot);
+
+ /*
+ * Activate the slot that is now marked INVALID, but don't propagate
+ * the slot to the now inactive slots. The slot is either going to be
+ * deleted or recreated as a new slot.
+ */
+ kvm_swap_active_memslots(kvm, old->as_id);
+
+ /*
+ * From this point no new shadow pages pointing to a deleted, or moved,
+ * memslot will be created. Validation of sp->gfn happens in:
+ * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
+ * - kvm_is_visible_gfn (mmu_check_root)
+ */
+ kvm_arch_flush_shadow_memslot(kvm, working_slot);
+
+ /* Was released by kvm_swap_active_memslots, reacquire. */
+ mutex_lock(&kvm->slots_arch_lock);
+
+ /*
+ * Copy the arch-specific field of the newly-installed slot back to the
+ * old slot as the arch data could have changed between releasing
+ * slots_arch_lock in kvm_swap_active_memslots() and re-acquiring the lock
+ * above. Writers are required to retrieve memslots *after* acquiring
+ * slots_arch_lock, thus the active slot's data is guaranteed to be fresh.
+ */
+ old->arch = working_slot->arch;
+}
+
+static void kvm_create_memslot(struct kvm *kvm,
+ const struct kvm_memory_slot *new,
+ struct kvm_memory_slot *working)
+{
+ /*
+ * Add the new memslot to the inactive set as a copy of the
+ * new memslot data provided by userspace.
+ */
+ kvm_copy_memslot(working, new);
+ kvm_replace_memslot(kvm, NULL, working);
+ kvm_activate_memslot(kvm, NULL, working);
+}
+
+static void kvm_delete_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *invalid_slot)
+{
+ /*
+ * Remove the old memslot (in the inactive memslots) by passing NULL as
+ * the "new" slot.
+ */
+ kvm_replace_memslot(kvm, old, NULL);
+
+ /* And do the same for the invalid version in the active slot. */
+ kvm_activate_memslot(kvm, invalid_slot, NULL);
+
+ /* Free the invalid slot, the caller will clean up the old slot. */
+ kfree(invalid_slot);
+}
+
+static struct kvm_memory_slot *kvm_move_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ const struct kvm_memory_slot *new,
+ struct kvm_memory_slot *invalid_slot)
+{
+ struct kvm_memslots *slots = kvm_get_inactive_memslots(kvm, old->as_id);
+
+ /*
+ * The memslot's gfn is changing, remove it from the inactive tree, it
+ * will be re-added with its updated gfn. Because its range is
+ * changing, an in-place replace is not possible.
+ */
+ kvm_erase_gfn_node(slots, old);
+
+ /*
+ * The old slot is now fully disconnected, reuse its memory for the
+ * persistent copy of "new".
+ */
+ kvm_copy_memslot(old, new);
+
+ /* Re-add to the gfn tree with the updated gfn */
+ kvm_insert_gfn_node(slots, old);
+
+ /* Replace the current INVALID slot with the updated memslot. */
+ kvm_activate_memslot(kvm, invalid_slot, old);
+
+ /*
+ * Clear the INVALID flag so that the invalid_slot is now a perfect
+ * copy of the old slot. Return it for cleanup in the caller.
+ */
+ WARN_ON_ONCE(!(invalid_slot->flags & KVM_MEMSLOT_INVALID));
+ invalid_slot->flags &= ~KVM_MEMSLOT_INVALID;
+ return invalid_slot;
+}
+
+static void kvm_update_flags_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ const struct kvm_memory_slot *new,
+ struct kvm_memory_slot *working_slot)
+{
+ /*
+ * Similar to the MOVE case, but the slot doesn't need to be zapped as
+ * an intermediate step. Instead, the old memslot is simply replaced
+ * with a new, updated copy in both memslot sets.
+ */
+ kvm_copy_memslot(working_slot, new);
+ kvm_replace_memslot(kvm, old, working_slot);
+ kvm_activate_memslot(kvm, old, working_slot);
}

static int kvm_set_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- struct kvm_memory_slot *slot, old;
- struct kvm_memslots *slots;
+ struct kvm_memory_slot *working;
int r;

/*
- * Released in install_new_memslots.
+ * Modifications are done on an unreachable slot. Any changes are then
+ * (eventually) propagated to both the active and inactive slots. This
+ * allocation would ideally be on-demand (in helpers), but is done here
+ * to avoid having to handle failure after kvm_prepare_memory_region().
+ */
+ working = kzalloc(sizeof(*working), GFP_KERNEL_ACCOUNT);
+ if (!working)
+ return -ENOMEM;
+
+ /*
+ * Released in kvm_swap_active_memslots.
*
* Must be held from before the current memslots are copied until
* after the new memslots are installed with rcu_assign_pointer,
- * then released before the synchronize srcu in install_new_memslots.
+ * then released before the synchronize srcu in kvm_swap_active_memslots.
*
* When modifying memslots outside of the slots_lock, must be held
* before reading the pointer to the current memslots until after all
@@ -1720,87 +1761,60 @@ static int kvm_set_memslot(struct kvm *kvm,
*/
mutex_lock(&kvm->slots_arch_lock);

- slots = kvm_dup_memslots(__kvm_memslots(kvm, new->as_id), change);
- if (!slots) {
- mutex_unlock(&kvm->slots_arch_lock);
- return -ENOMEM;
- }
-
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
- /*
- * Note, the INVALID flag needs to be in the appropriate entry
- * in the freshly allocated memslots, not in @old or @new.
- */
- slot = id_to_memslot(slots, new->id);
- slot->flags |= KVM_MEMSLOT_INVALID;
-
- /*
- * We can re-use the old memslots, the only difference from the
- * newly installed memslots is the invalid flag, which will get
- * dropped by update_memslots anyway. We'll also revert to the
- * old memslots if preparing the new memory region fails.
- */
- slots = install_new_memslots(kvm, new->as_id, slots);
-
- /* From this point no new shadow pages pointing to a deleted,
- * or moved, memslot will be created.
- *
- * validation of sp->gfn happens in:
- * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
- * - kvm_is_visible_gfn (mmu_check_root)
- */
- kvm_arch_flush_shadow_memslot(kvm, slot);
-
- /* Released in install_new_memslots. */
- mutex_lock(&kvm->slots_arch_lock);
+ /*
+ * Invalidate the old slot if it's being deleted or moved. This is
+ * done prior to actually deleting/moving the memslot to allow vCPUs to
+ * continue running by ensuring there are no mappings or shadow pages
+ * for the memslot when it is deleted/moved. Without pre-invalidation
+ * (and without a lock), a window would exist between effecting the
+ * delete/move and committing the changes in arch code where KVM or a
+ * guest could access a non-existent memslot.
+ */
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ kvm_invalidate_memslot(kvm, old, working);

+ r = kvm_prepare_memory_region(kvm, old, new, change);
+ if (r) {
/*
- * The arch-specific fields of the now-active memslots could
- * have been modified between releasing slots_arch_lock in
- * install_new_memslots and re-acquiring slots_arch_lock above.
- * Copy them to the inactive memslots. Arch code is required
- * to retrieve memslots *after* acquiring slots_arch_lock, thus
- * the active memslots are guaranteed to be fresh.
+ * For DELETE/MOVE, revert the above INVALID change. No
+ * modifications required since the original slot was preserved
+ * in the inactive slots. Changing the active memslots also
+ * releases slots_arch_lock.
*/
- kvm_copy_memslots_arch(slots, __kvm_memslots(kvm, new->as_id));
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ kvm_activate_memslot(kvm, working, old);
+ else
+ mutex_unlock(&kvm->slots_arch_lock);
+ kfree(working);
+ return r;
}

/*
- * Make a full copy of the old memslot, the pointer will become stale
- * when the memslots are re-sorted by update_memslots(), and the old
- * memslot needs to be referenced after calling update_memslots(), e.g.
- * to free its resources and for arch specific behavior. This needs to
- * happen *after* (re)acquiring slots_arch_lock.
+ * For DELETE and MOVE, the working slot is now active as the INVALID
+ * version of the old slot. MOVE is particularly special as it reuses
+ * the old slot and returns a copy of the old slot (in working_slot).
+ * For CREATE, there is no old slot. For DELETE and FLAGS_ONLY, the
+ * old slot is detached but otherwise preserved.
*/
- slot = id_to_memslot(slots, new->id);
- if (slot) {
- old = *slot;
- } else {
- WARN_ON_ONCE(change != KVM_MR_CREATE);
- memset(&old, 0, sizeof(old));
- old.id = new->id;
- old.as_id = new->as_id;
- }
-
- r = kvm_prepare_memory_region(kvm, &old, new, change);
- if (r)
- goto out_slots;
-
- update_memslots(slots, new, change);
- slots = install_new_memslots(kvm, new->as_id, slots);
+ if (change == KVM_MR_CREATE)
+ kvm_create_memslot(kvm, new, working);
+ else if (change == KVM_MR_DELETE)
+ kvm_delete_memslot(kvm, old, working);
+ else if (change == KVM_MR_MOVE)
+ old = kvm_move_memslot(kvm, old, new, working);
+ else if (change == KVM_MR_FLAGS_ONLY)
+ kvm_update_flags_memslot(kvm, old, new, working);
+ else
+ BUG();

- kvm_commit_memory_region(kvm, &old, new, change);
+ /*
+ * No need to refresh new->arch, changes after dropping slots_arch_lock
+ * will directly hit the final, active memslot. Architectures are
+ * responsible for knowing that new->arch may be stale.
+ */
+ kvm_commit_memory_region(kvm, old, new, change);

- kvfree(slots);
return 0;
-
-out_slots:
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
- slots = install_new_memslots(kvm, new->as_id, slots);
- else
- mutex_unlock(&kvm->slots_arch_lock);
- kvfree(slots);
- return r;
}

/*
@@ -1861,7 +1875,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.id = id;
new.as_id = as_id;

- return kvm_set_memslot(kvm, &new, KVM_MR_DELETE);
+ return kvm_set_memslot(kvm, old, &new, KVM_MR_DELETE);
}

new.as_id = as_id;
@@ -1898,8 +1912,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
}

if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
+ int bkt;
+
/* Check for overlaps */
- kvm_for_each_memslot(tmp, __kvm_memslots(kvm, as_id)) {
+ kvm_for_each_memslot(tmp, bkt, __kvm_memslots(kvm, as_id)) {
if (tmp->id == id)
continue;
if (!((new.base_gfn + new.npages <= tmp->base_gfn) ||
@@ -1908,7 +1924,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
}
}

- return kvm_set_memslot(kvm, &new, change);
+ return kvm_set_memslot(kvm, old, &new, change);
}
EXPORT_SYMBOL_GPL(__kvm_set_memory_region);

@@ -2213,21 +2229,30 @@ EXPORT_SYMBOL_GPL(gfn_to_memslot);
struct kvm_memory_slot *kvm_vcpu_gfn_to_memslot(struct kvm_vcpu *vcpu, gfn_t gfn)
{
struct kvm_memslots *slots = kvm_vcpu_memslots(vcpu);
+ u64 gen = slots->generation;
struct kvm_memory_slot *slot;
- int slot_index;

- slot = try_get_memslot(slots, vcpu->last_used_slot, gfn);
+ /*
+ * This also protects against using a memslot from a different address space,
+ * since different address spaces have different generation numbers.
+ */
+ if (unlikely(gen != vcpu->last_used_slot_gen)) {
+ vcpu->last_used_slot = NULL;
+ vcpu->last_used_slot_gen = gen;
+ }
+
+ slot = try_get_memslot(vcpu->last_used_slot, gfn);
if (slot)
return slot;

/*
* Fall back to searching all memslots. We purposely use
* search_memslots() instead of __gfn_to_memslot() to avoid
- * thrashing the VM-wide last_used_index in kvm_memslots.
+ * thrashing the VM-wide last_used_slot in kvm_memslots.
*/
- slot = search_memslots(slots, gfn, &slot_index, false);
+ slot = search_memslots(slots, gfn, false);
if (slot) {
- vcpu->last_used_slot = slot_index;
+ vcpu->last_used_slot = slot;
return slot;
}
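
For readers skimming the hunks above, the ordering that the rewritten
kvm_set_memslot() path follows can be summarized with a small, self-contained
userspace sketch. This is not kernel code - the stub names and messages are
made up - it only mirrors the sequence visible in the diff: invalidate on
DELETE/MOVE, arch prepare with rollback of the invalidation on failure, apply
the change against the inactive set, then commit.

#include <stdio.h>

enum mr_change { MR_CREATE, MR_DELETE, MR_MOVE, MR_FLAGS_ONLY };

static void invalidate_old(void)  { puts("mark old slot INVALID, flush shadow pages"); }
static int  prepare_region(void)  { puts("arch prepare (may fail)"); return 0; }
static void revert_invalid(void)  { puts("re-activate the preserved old slot"); }
static void apply_change(enum mr_change c) { printf("apply change %d in the inactive set, then swap\n", (int)c); }
static void commit_region(void)   { puts("arch commit, free old metadata"); }

static int set_memslot(enum mr_change change)
{
	if (change == MR_DELETE || change == MR_MOVE)
		invalidate_old();		/* step 1: no new mappings can appear */

	if (prepare_region()) {			/* step 2: arch-specific preparation */
		if (change == MR_DELETE || change == MR_MOVE)
			revert_invalid();	/* roll back the invalidation */
		return -1;
	}

	apply_change(change);			/* step 3: update trees, activate the change */
	commit_region();			/* step 4: arch commit and cleanup */
	return 0;
}

int main(void)
{
	return set_memslot(MR_MOVE);
}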


2021-11-30 21:47:16

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 28/29] KVM: Wait 'til the bitter end to initialize the "new" memslot

From: Sean Christopherson <[email protected]>

Initialize the "new" memslot in the !DELETE path only after the various
sanity checks have passed. This will allow a future commit to allocate
@new dynamically without having to copy a memslot, and without having to
deal with freeing @new in error paths and in the "nothing to change" path
that's hiding in the sanity checks.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
virt/kvm/kvm_main.c | 37 ++++++++++++++++++++-----------------
1 file changed, 20 insertions(+), 17 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 52117f65bc5b..8295d87c07b5 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1844,6 +1844,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
struct kvm_memory_slot new;
struct kvm_memslots *slots;
enum kvm_mr_change change;
+ unsigned long npages;
+ gfn_t base_gfn;
int as_id, id;
int r;

@@ -1870,6 +1872,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
return -EINVAL;
if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
return -EINVAL;
+ if ((mem->memory_size >> PAGE_SHIFT) > KVM_MEM_MAX_NR_PAGES)
+ return -EINVAL;

slots = __kvm_memslots(kvm, as_id);

@@ -1893,15 +1897,8 @@ int __kvm_set_memory_region(struct kvm *kvm,
return kvm_set_memslot(kvm, old, &new, KVM_MR_DELETE);
}

- new.as_id = as_id;
- new.id = id;
- new.base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
- new.npages = mem->memory_size >> PAGE_SHIFT;
- new.flags = mem->flags;
- new.userspace_addr = mem->userspace_addr;
-
- if (new.npages > KVM_MEM_MAX_NR_PAGES)
- return -EINVAL;
+ base_gfn = (mem->guest_phys_addr >> PAGE_SHIFT);
+ npages = (mem->memory_size >> PAGE_SHIFT);

if (!old || !old->npages) {
change = KVM_MR_CREATE;
@@ -1910,27 +1907,33 @@ int __kvm_set_memory_region(struct kvm *kvm,
* To simplify KVM internals, the total number of pages across
* all memslots must fit in an unsigned long.
*/
- if ((kvm->nr_memslot_pages + new.npages) < kvm->nr_memslot_pages)
+ if ((kvm->nr_memslot_pages + npages) < kvm->nr_memslot_pages)
return -EINVAL;
} else { /* Modify an existing slot. */
- if ((new.userspace_addr != old->userspace_addr) ||
- (new.npages != old->npages) ||
- ((new.flags ^ old->flags) & KVM_MEM_READONLY))
+ if ((mem->userspace_addr != old->userspace_addr) ||
+ (npages != old->npages) ||
+ ((mem->flags ^ old->flags) & KVM_MEM_READONLY))
return -EINVAL;

- if (new.base_gfn != old->base_gfn)
+ if (base_gfn != old->base_gfn)
change = KVM_MR_MOVE;
- else if (new.flags != old->flags)
+ else if (mem->flags != old->flags)
change = KVM_MR_FLAGS_ONLY;
else /* Nothing to change. */
return 0;
}

if ((change == KVM_MR_CREATE || change == KVM_MR_MOVE) &&
- kvm_check_memslot_overlap(slots, id, new.base_gfn,
- new.base_gfn + new.npages))
+ kvm_check_memslot_overlap(slots, id, base_gfn, base_gfn + npages))
return -EEXIST;

+ new.as_id = as_id;
+ new.id = id;
+ new.base_gfn = base_gfn;
+ new.npages = npages;
+ new.flags = mem->flags;
+ new.userspace_addr = mem->userspace_addr;
+
return kvm_set_memslot(kvm, old, &new, change);
}
EXPORT_SYMBOL_GPL(__kvm_set_memory_region);

2021-11-30 21:47:22

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 27/29] KVM: Optimize overlapping memslots check

From: "Maciej S. Szmigiero" <[email protected]>

Do a quick lookup for possibly overlapping gfns when creating or moving
a memslot instead of performing a linear scan of the whole memslot set.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
[sean: tweaked params to avoid churn in future cleanup]
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 086f18969bc3..52117f65bc5b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1817,6 +1817,18 @@ static int kvm_set_memslot(struct kvm *kvm,
return 0;
}

+static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
+ gfn_t start, gfn_t end)
+{
+ struct kvm_memslot_iter iter;
+
+ kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end)
+ if (kvm_memslot_iter_slot(&iter)->id != id)
+ return true;
+
+ return false;
+}
+
/*
* Allocate some memory and give it an address in the guest physical address
* space.
@@ -1828,8 +1840,9 @@ static int kvm_set_memslot(struct kvm *kvm,
int __kvm_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem)
{
- struct kvm_memory_slot *old, *tmp;
+ struct kvm_memory_slot *old;
struct kvm_memory_slot new;
+ struct kvm_memslots *slots;
enum kvm_mr_change change;
int as_id, id;
int r;
@@ -1858,11 +1871,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
return -EINVAL;

+ slots = __kvm_memslots(kvm, as_id);
+
/*
* Note, the old memslot (and the pointer itself!) may be invalidated
* and/or destroyed by kvm_set_memslot().
*/
- old = id_to_memslot(__kvm_memslots(kvm, as_id), id);
+ old = id_to_memslot(slots, id);

if (!mem->memory_size) {
if (!old || !old->npages)
@@ -1911,18 +1926,10 @@ int __kvm_set_memory_region(struct kvm *kvm,
return 0;
}

- if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
- int bkt;
-
- /* Check for overlaps */
- kvm_for_each_memslot(tmp, bkt, __kvm_memslots(kvm, as_id)) {
- if (tmp->id == id)
- continue;
- if (!((new.base_gfn + new.npages <= tmp->base_gfn) ||
- (new.base_gfn >= tmp->base_gfn + tmp->npages)))
- return -EEXIST;
- }
- }
+ if ((change == KVM_MR_CREATE || change == KVM_MR_MOVE) &&
+ kvm_check_memslot_overlap(slots, id, new.base_gfn,
+ new.base_gfn + new.npages))
+ return -EEXIST;

return kvm_set_memslot(kvm, old, &new, change);
}

2021-11-30 21:47:26

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 29/29] KVM: Dynamically allocate "new" memslots from the get-go

From: Sean Christopherson <[email protected]>

Allocate the "new" memslot for !DELETE memslot updates straight away
instead of filling an intermediate on-stack object and forcing
kvm_set_memslot() to juggle the allocation and do weird things like reuse
the old memslot object in MOVE.

In the MOVE case, this results in an "extra" memslot allocation due to
allocating both the "new" slot and the "invalid" slot, but that's a
temporary and not-huge allocation, and MOVE is a relatively rare memslot
operation.

Regarding MOVE, drop the open-coded management of the gfn tree with a
call to kvm_replace_memslot(), which already handles the case where
new->base_gfn != old->base_gfn. This is made possible by virtue of not
having to copy the "new" memslot data after erasing the old memslot from
the gfn tree. Using kvm_replace_memslot(), and more specifically not
reusing the old memslot, means the MOVE case now does hva tree and hash
list updates, but that's a small price to pay for simplifying the code
and making MOVE align with all the other flavors of updates. The "extra"
updates are firmly in the noise from a performance perspective, e.g. the
"move (in)active area" selfttests show a (very, very) slight improvement.

Signed-off-by: Sean Christopherson <[email protected]>
Reviewed-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
virt/kvm/kvm_main.c | 178 +++++++++++++++++++-------------------------
1 file changed, 77 insertions(+), 101 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8295d87c07b5..b0360afb127b 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1505,23 +1505,25 @@ static int kvm_prepare_memory_region(struct kvm *kvm,
* new and KVM isn't using a ring buffer, allocate and initialize a
* new bitmap.
*/
- if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
- new->dirty_bitmap = NULL;
- else if (old->dirty_bitmap)
- new->dirty_bitmap = old->dirty_bitmap;
- else if (!kvm->dirty_ring_size) {
- r = kvm_alloc_dirty_bitmap(new);
- if (r)
- return r;
+ if (change != KVM_MR_DELETE) {
+ if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+ new->dirty_bitmap = NULL;
+ else if (old && old->dirty_bitmap)
+ new->dirty_bitmap = old->dirty_bitmap;
+ else if (!kvm->dirty_ring_size) {
+ r = kvm_alloc_dirty_bitmap(new);
+ if (r)
+ return r;

- if (kvm_dirty_log_manual_protect_and_init_set(kvm))
- bitmap_set(new->dirty_bitmap, 0, new->npages);
+ if (kvm_dirty_log_manual_protect_and_init_set(kvm))
+ bitmap_set(new->dirty_bitmap, 0, new->npages);
+ }
}

r = kvm_arch_prepare_memory_region(kvm, old, new, change);

/* Free the bitmap on failure if it was allocated above. */
- if (r && new->dirty_bitmap && !old->dirty_bitmap)
+ if (r && new && new->dirty_bitmap && old && !old->dirty_bitmap)
kvm_destroy_dirty_bitmap(new);

return r;
@@ -1608,16 +1610,16 @@ static void kvm_copy_memslot(struct kvm_memory_slot *dest,

static void kvm_invalidate_memslot(struct kvm *kvm,
struct kvm_memory_slot *old,
- struct kvm_memory_slot *working_slot)
+ struct kvm_memory_slot *invalid_slot)
{
/*
* Mark the current slot INVALID. As with all memslot modifications,
* this must be done on an unreachable slot to avoid modifying the
* current slot in the active tree.
*/
- kvm_copy_memslot(working_slot, old);
- working_slot->flags |= KVM_MEMSLOT_INVALID;
- kvm_replace_memslot(kvm, old, working_slot);
+ kvm_copy_memslot(invalid_slot, old);
+ invalid_slot->flags |= KVM_MEMSLOT_INVALID;
+ kvm_replace_memslot(kvm, old, invalid_slot);

/*
* Activate the slot that is now marked INVALID, but don't propagate
@@ -1644,20 +1646,15 @@ static void kvm_invalidate_memslot(struct kvm *kvm,
* above. Writers are required to retrieve memslots *after* acquiring
* slots_arch_lock, thus the active slot's data is guaranteed to be fresh.
*/
- old->arch = working_slot->arch;
+ old->arch = invalid_slot->arch;
}

static void kvm_create_memslot(struct kvm *kvm,
- const struct kvm_memory_slot *new,
- struct kvm_memory_slot *working)
+ struct kvm_memory_slot *new)
{
- /*
- * Add the new memslot to the inactive set as a copy of the
- * new memslot data provided by userspace.
- */
- kvm_copy_memslot(working, new);
- kvm_replace_memslot(kvm, NULL, working);
- kvm_activate_memslot(kvm, NULL, working);
+ /* Add the new memslot to the inactive set and activate. */
+ kvm_replace_memslot(kvm, NULL, new);
+ kvm_activate_memslot(kvm, NULL, new);
}

static void kvm_delete_memslot(struct kvm *kvm,
@@ -1666,65 +1663,36 @@ static void kvm_delete_memslot(struct kvm *kvm,
{
/*
* Remove the old memslot (in the inactive memslots) by passing NULL as
- * the "new" slot.
+ * the "new" slot, and for the invalid version in the active slots.
*/
kvm_replace_memslot(kvm, old, NULL);
-
- /* And do the same for the invalid version in the active slot. */
kvm_activate_memslot(kvm, invalid_slot, NULL);
-
- /* Free the invalid slot, the caller will clean up the old slot. */
- kfree(invalid_slot);
}

-static struct kvm_memory_slot *kvm_move_memslot(struct kvm *kvm,
- struct kvm_memory_slot *old,
- const struct kvm_memory_slot *new,
- struct kvm_memory_slot *invalid_slot)
+static void kvm_move_memslot(struct kvm *kvm,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *new,
+ struct kvm_memory_slot *invalid_slot)
{
- struct kvm_memslots *slots = kvm_get_inactive_memslots(kvm, old->as_id);
-
- /*
- * The memslot's gfn is changing, remove it from the inactive tree, it
- * will be re-added with its updated gfn. Because its range is
- * changing, an in-place replace is not possible.
- */
- kvm_erase_gfn_node(slots, old);
-
- /*
- * The old slot is now fully disconnected, reuse its memory for the
- * persistent copy of "new".
- */
- kvm_copy_memslot(old, new);
-
- /* Re-add to the gfn tree with the updated gfn */
- kvm_insert_gfn_node(slots, old);
-
- /* Replace the current INVALID slot with the updated memslot. */
- kvm_activate_memslot(kvm, invalid_slot, old);
-
/*
- * Clear the INVALID flag so that the invalid_slot is now a perfect
- * copy of the old slot. Return it for cleanup in the caller.
+ * Replace the old memslot in the inactive slots, and then swap slots
+ * and replace the current INVALID with the new as well.
*/
- WARN_ON_ONCE(!(invalid_slot->flags & KVM_MEMSLOT_INVALID));
- invalid_slot->flags &= ~KVM_MEMSLOT_INVALID;
- return invalid_slot;
+ kvm_replace_memslot(kvm, old, new);
+ kvm_activate_memslot(kvm, invalid_slot, new);
}

static void kvm_update_flags_memslot(struct kvm *kvm,
struct kvm_memory_slot *old,
- const struct kvm_memory_slot *new,
- struct kvm_memory_slot *working_slot)
+ struct kvm_memory_slot *new)
{
/*
* Similar to the MOVE case, but the slot doesn't need to be zapped as
* an intermediate step. Instead, the old memslot is simply replaced
* with a new, updated copy in both memslot sets.
*/
- kvm_copy_memslot(working_slot, new);
- kvm_replace_memslot(kvm, old, working_slot);
- kvm_activate_memslot(kvm, old, working_slot);
+ kvm_replace_memslot(kvm, old, new);
+ kvm_activate_memslot(kvm, old, new);
}

static int kvm_set_memslot(struct kvm *kvm,
@@ -1732,19 +1700,9 @@ static int kvm_set_memslot(struct kvm *kvm,
struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- struct kvm_memory_slot *working;
+ struct kvm_memory_slot *invalid_slot;
int r;

- /*
- * Modifications are done on an unreachable slot. Any changes are then
- * (eventually) propagated to both the active and inactive slots. This
- * allocation would ideally be on-demand (in helpers), but is done here
- * to avoid having to handle failure after kvm_prepare_memory_region().
- */
- working = kzalloc(sizeof(*working), GFP_KERNEL_ACCOUNT);
- if (!working)
- return -ENOMEM;
-
/*
* Released in kvm_swap_active_memslots.
*
@@ -1769,9 +1727,19 @@ static int kvm_set_memslot(struct kvm *kvm,
* (and without a lock), a window would exist between effecting the
* delete/move and committing the changes in arch code where KVM or a
* guest could access a non-existent memslot.
+ *
+ * Modifications are done on a temporary, unreachable slot. The old
+ * slot needs to be preserved in case a later step fails and the
+ * invalidation needs to be reverted.
*/
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
- kvm_invalidate_memslot(kvm, old, working);
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+ invalid_slot = kzalloc(sizeof(*invalid_slot), GFP_KERNEL_ACCOUNT);
+ if (!invalid_slot) {
+ mutex_unlock(&kvm->slots_arch_lock);
+ return -ENOMEM;
+ }
+ kvm_invalidate_memslot(kvm, old, invalid_slot);
+ }

r = kvm_prepare_memory_region(kvm, old, new, change);
if (r) {
@@ -1781,11 +1749,12 @@ static int kvm_set_memslot(struct kvm *kvm,
* in the inactive slots. Changing the active memslots also
* releases slots_arch_lock.
*/
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
- kvm_activate_memslot(kvm, working, old);
- else
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+ kvm_activate_memslot(kvm, invalid_slot, old);
+ kfree(invalid_slot);
+ } else {
mutex_unlock(&kvm->slots_arch_lock);
- kfree(working);
+ }
return r;
}

@@ -1797,16 +1766,20 @@ static int kvm_set_memslot(struct kvm *kvm,
* old slot is detached but otherwise preserved.
*/
if (change == KVM_MR_CREATE)
- kvm_create_memslot(kvm, new, working);
+ kvm_create_memslot(kvm, new);
else if (change == KVM_MR_DELETE)
- kvm_delete_memslot(kvm, old, working);
+ kvm_delete_memslot(kvm, old, invalid_slot);
else if (change == KVM_MR_MOVE)
- old = kvm_move_memslot(kvm, old, new, working);
+ kvm_move_memslot(kvm, old, new, invalid_slot);
else if (change == KVM_MR_FLAGS_ONLY)
- kvm_update_flags_memslot(kvm, old, new, working);
+ kvm_update_flags_memslot(kvm, old, new);
else
BUG();

+ /* Free the temporary INVALID slot used for DELETE and MOVE. */
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ kfree(invalid_slot);
+
/*
* No need to refresh new->arch, changes after dropping slots_arch_lock
* will directly hit the final, active memslot. Architectures are
@@ -1840,8 +1813,7 @@ static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
int __kvm_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem)
{
- struct kvm_memory_slot *old;
- struct kvm_memory_slot new;
+ struct kvm_memory_slot *old, *new;
struct kvm_memslots *slots;
enum kvm_mr_change change;
unsigned long npages;
@@ -1890,11 +1862,7 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (WARN_ON_ONCE(kvm->nr_memslot_pages < old->npages))
return -EIO;

- memset(&new, 0, sizeof(new));
- new.id = id;
- new.as_id = as_id;
-
- return kvm_set_memslot(kvm, old, &new, KVM_MR_DELETE);
+ return kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE);
}

base_gfn = (mem->guest_phys_addr >> PAGE_SHIFT);
@@ -1927,14 +1895,22 @@ int __kvm_set_memory_region(struct kvm *kvm,
kvm_check_memslot_overlap(slots, id, base_gfn, base_gfn + npages))
return -EEXIST;

- new.as_id = as_id;
- new.id = id;
- new.base_gfn = base_gfn;
- new.npages = npages;
- new.flags = mem->flags;
- new.userspace_addr = mem->userspace_addr;
+ /* Allocate a slot that will persist in the memslot. */
+ new = kzalloc(sizeof(*new), GFP_KERNEL_ACCOUNT);
+ if (!new)
+ return -ENOMEM;
+
+ new->as_id = as_id;
+ new->id = id;
+ new->base_gfn = base_gfn;
+ new->npages = npages;
+ new->flags = mem->flags;
+ new->userspace_addr = mem->userspace_addr;

- return kvm_set_memslot(kvm, old, &new, change);
+ r = kvm_set_memslot(kvm, old, new, change);
+ if (r)
+ kfree(new);
+ return r;
}
EXPORT_SYMBOL_GPL(__kvm_set_memory_region);


2021-11-30 21:47:35

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 17/29] KVM: x86: Don't call kvm_mmu_change_mmu_pages() if the count hasn't changed

From: "Maciej S. Szmigiero" <[email protected]>

There is no point in calling kvm_mmu_change_mmu_pages() for memslot
operations that don't change the total page count, so do it just for
KVM_MR_CREATE and KVM_MR_DELETE.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f25d224d21d9..13048683db60 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11832,7 +11832,8 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
const struct kvm_memory_slot *new,
enum kvm_mr_change change)
{
- if (!kvm->arch.n_requested_mmu_pages)
+ if (!kvm->arch.n_requested_mmu_pages &&
+ (change == KVM_MR_CREATE || change == KVM_MR_DELETE))
kvm_mmu_change_mmu_pages(kvm,
kvm_mmu_calculate_default_mmu_pages(kvm));


2021-11-30 21:47:45

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 26/29] KVM: Optimize gfn lookup in kvm_zap_gfn_range()

From: "Maciej S. Szmigiero" <[email protected]>

Introduce a memslots gfn upper bound operation and use it to optimize
kvm_zap_gfn_range().
This way the handler can do a quick lookup for intersecting gfns and won't
have to do a linear scan of the whole memslot set.
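
The start-node selection done by kvm_memslot_iter_start() (in the hunk below)
can be illustrated with a self-contained sketch. It uses a plain sorted array
instead of the kernel's rb-tree and made-up slot values, but the idea is the
same: binary-search for the first slot whose base gfn is strictly greater than
the range start (the "upper bound"), then step back one slot so that a slot
beginning before the range but overlapping it is not missed.

#include <stdio.h>

struct slot { unsigned long base_gfn, npages; };

/*
 * Return the index of the first slot that can intersect [start, ...),
 * or n if no slot can.
 */
static int iter_start(const struct slot *s, int n, unsigned long start)
{
	int lo = 0, hi = n;

	/* "Upper bound": first index whose base_gfn is strictly above start. */
	while (lo < hi) {
		int mid = lo + (hi - lo) / 2;

		if (s[mid].base_gfn > start)
			hi = mid;
		else
			lo = mid + 1;
	}

	/* Step back one slot: it may begin before start yet still overlap it. */
	if (lo > 0)
		lo--;

	/* If that candidate ends at or before start it cannot overlap; skip it. */
	if (lo < n && s[lo].base_gfn + s[lo].npages <= start)
		lo++;

	return lo;
}

int main(void)
{
	const struct slot s[] = { { 0, 16 }, { 32, 16 }, { 100, 50 } };

	/* Prints 1: the slot covering [32, 48) is the first one overlapping gfn 40. */
	printf("%d\n", iter_start(s, 3, 40));
	return 0;
}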

Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
arch/x86/kvm/mmu/mmu.c | 12 +++--
include/linux/kvm_host.h | 99 ++++++++++++++++++++++++++++++++++++++++
2 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index f5a89942f054..89d34bcfa9c3 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5704,19 +5704,22 @@ static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
{
const struct kvm_memory_slot *memslot;
struct kvm_memslots *slots;
+ struct kvm_memslot_iter iter;
bool flush = false;
gfn_t start, end;
- int i, bkt;
+ int i;

if (!kvm_memslots_have_rmaps(kvm))
return flush;

for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(memslot, bkt, slots) {
+
+ kvm_for_each_memslot_in_gfn_range(&iter, slots, gfn_start, gfn_end) {
+ memslot = kvm_memslot_iter_slot(&iter);
start = max(gfn_start, memslot->base_gfn);
end = min(gfn_end, memslot->base_gfn + memslot->npages);
- if (start >= end)
+ if (WARN_ON_ONCE(start >= end))
continue;

flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
@@ -5737,6 +5740,9 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
bool flush;
int i;

+ if (WARN_ON_ONCE(gfn_end <= gfn_start))
+ return;
+
write_lock(&kvm->mmu_lock);

kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 41efe53cf150..6fce6eb797a7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -848,6 +848,105 @@ struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
return NULL;
}

+/* Iterator used for walking memslots that overlap a gfn range. */
+struct kvm_memslot_iter {
+ struct kvm_memslots *slots;
+ gfn_t end;
+ struct rb_node *node;
+};
+
+static inline void kvm_memslot_iter_start(struct kvm_memslot_iter *iter,
+ struct kvm_memslots *slots,
+ gfn_t start, gfn_t end)
+{
+ int idx = slots->node_idx;
+ struct rb_node *tmp;
+ struct kvm_memory_slot *slot;
+
+ iter->slots = slots;
+ iter->end = end;
+
+ /*
+ * Find the so-called "upper bound" of a key - the first node that has
+ * its key strictly greater than the searched one (the start gfn in our case).
+ */
+ iter->node = NULL;
+ for (tmp = slots->gfn_tree.rb_node; tmp; ) {
+ slot = container_of(tmp, struct kvm_memory_slot, gfn_node[idx]);
+ if (start < slot->base_gfn) {
+ iter->node = tmp;
+ tmp = tmp->rb_left;
+ } else {
+ tmp = tmp->rb_right;
+ }
+ }
+
+ /*
+ * Find the slot with the lowest gfn that can possibly intersect with
+ * the range, so we'll ideally have slot start <= range start
+ */
+ if (iter->node) {
+ /*
+ * A NULL previous node means that the very first slot
+ * already has a higher start gfn.
+ * In this case slot start > range start.
+ */
+ tmp = rb_prev(iter->node);
+ if (tmp)
+ iter->node = tmp;
+ } else {
+ /* a NULL node below means no slots */
+ iter->node = rb_last(&slots->gfn_tree);
+ }
+
+ if (iter->node) {
+ /*
+ * It is possible in the slot start < range start case that the
+ * found slot ends before or at range start (slot end <= range start)
+ * and so it does not overlap the requested range.
+ *
+ * In such non-overlapping case the next slot (if it exists) will
+ * already have slot start > range start, otherwise the logic above
+ * would have found it instead of the current slot.
+ */
+ slot = container_of(iter->node, struct kvm_memory_slot, gfn_node[idx]);
+ if (slot->base_gfn + slot->npages <= start)
+ iter->node = rb_next(iter->node);
+ }
+}
+
+static inline struct kvm_memory_slot *kvm_memslot_iter_slot(struct kvm_memslot_iter *iter)
+{
+ return container_of(iter->node, struct kvm_memory_slot, gfn_node[iter->slots->node_idx]);
+}
+
+static inline bool kvm_memslot_iter_is_valid(struct kvm_memslot_iter *iter)
+{
+ struct kvm_memory_slot *memslot;
+
+ if (!iter->node)
+ return false;
+
+ memslot = kvm_memslot_iter_slot(iter);
+
+ /*
+ * If this slot starts beyond or at the end of the range, so does
+ * every next one
+ */
+ return memslot->base_gfn < iter->end;
+}
+
+static inline void kvm_memslot_iter_next(struct kvm_memslot_iter *iter)
+{
+ iter->node = rb_next(iter->node);
+}
+
+/* Iterate over each memslot at least partially intersecting [start, end) range */
+#define kvm_for_each_memslot_in_gfn_range(iter, slots, start, end) \
+ for (kvm_memslot_iter_start(iter, slots, start, end); \
+ kvm_memslot_iter_is_valid(iter); \
+ kvm_memslot_iter_next(iter))
+
/*
* KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
* - create a new memory slot

2021-11-30 21:47:52

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 25/29] KVM: Call kvm_arch_flush_shadow_memslot() on the old slot in kvm_invalidate_memslot()

From: "Maciej S. Szmigiero" <[email protected]>

kvm_invalidate_memslot() calls kvm_arch_flush_shadow_memslot() on the
active, but KVM_MEMSLOT_INVALID slot.
Do it on the inactive (but valid) old slot instead since arch code really
should not get passed such an invalid slot.

Suggested-by: Sean Christopherson <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>
---
virt/kvm/kvm_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index c57748ee41e8..086f18969bc3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1632,7 +1632,7 @@ static void kvm_invalidate_memslot(struct kvm *kvm,
* - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
* - kvm_is_visible_gfn (mmu_check_root)
*/
- kvm_arch_flush_shadow_memslot(kvm, working_slot);
+ kvm_arch_flush_shadow_memslot(kvm, old);

/* Was released by kvm_swap_active_memslots, reacquire. */
mutex_lock(&kvm->slots_arch_lock);

2021-11-30 21:48:16

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 23/29] KVM: s390: Introduce kvm_s390_get_gfn_end()

From: "Maciej S. Szmigiero" <[email protected]>

And use it where s390 code would just access the memslot with the highest
gfn directly.

No functional change intended.
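
For context: at this point in the series the memslots array is still sorted
from highest to lowest base gfn, so slots->memslots[0] - what the new helper
reads - is the slot covering the highest gfn. Wrapping that access in
kvm_s390_get_gfn_end() means only the helper has to change when the
array-based layout is replaced later in the series.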

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/s390/kvm/kvm-s390.c | 2 +-
arch/s390/kvm/kvm-s390.h | 12 ++++++++++++
arch/s390/kvm/pv.c | 4 +---
3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index b0fa2592b4ef..5054fab23226 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -2014,7 +2014,7 @@ static int kvm_s390_get_cmma(struct kvm *kvm, struct kvm_s390_cmma_log *args,
if (!ms)
return 0;
next_gfn = kvm_s390_next_dirty_cmma(slots, cur_gfn + 1);
- mem_end = slots->memslots[0].base_gfn + slots->memslots[0].npages;
+ mem_end = kvm_s390_get_gfn_end(slots);

while (args->count < bufsize) {
hva = gfn_to_hva(kvm, cur_gfn);
diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h
index b887fe7a7064..cc309cc37e96 100644
--- a/arch/s390/kvm/kvm-s390.h
+++ b/arch/s390/kvm/kvm-s390.h
@@ -217,6 +217,18 @@ static inline void kvm_s390_set_user_cpu_state_ctrl(struct kvm *kvm)
kvm->arch.user_cpu_state_ctrl = 1;
}

+/* get the end gfn of the last (highest gfn) memslot */
+static inline unsigned long kvm_s390_get_gfn_end(struct kvm_memslots *slots)
+{
+ struct kvm_memory_slot *ms;
+
+ if (WARN_ON(!slots->used_slots))
+ return 0;
+
+ ms = slots->memslots;
+ return ms->base_gfn + ms->npages;
+}
+
/* implemented in pv.c */
int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
diff --git a/arch/s390/kvm/pv.c b/arch/s390/kvm/pv.c
index 00d272d134c2..7f7c0d6af2ce 100644
--- a/arch/s390/kvm/pv.c
+++ b/arch/s390/kvm/pv.c
@@ -116,7 +116,6 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
unsigned long base = uv_info.guest_base_stor_len;
unsigned long virt = uv_info.guest_virt_var_stor_len;
unsigned long npages = 0, vlen = 0;
- struct kvm_memory_slot *memslot;

kvm->arch.pv.stor_var = NULL;
kvm->arch.pv.stor_base = __get_free_pages(GFP_KERNEL_ACCOUNT, get_order(base));
@@ -130,8 +129,7 @@ static int kvm_s390_pv_alloc_vm(struct kvm *kvm)
* Slots are sorted by GFN
*/
mutex_lock(&kvm->slots_lock);
- memslot = kvm_memslots(kvm)->memslots;
- npages = memslot->base_gfn + memslot->npages;
+ npages = kvm_s390_get_gfn_end(kvm_memslots(kvm));
mutex_unlock(&kvm->slots_lock);

kvm->arch.pv.guest_len = npages * PAGE_SIZE;

2021-11-30 21:48:38

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 22/29] KVM: Use interval tree to do fast hva lookup in memslots

From: "Maciej S. Szmigiero" <[email protected]>

The current memslots implementation only allows a quick binary search by gfn;
a quick lookup by hva is not possible - the implementation has to do a linear
scan of the whole memslots array, even though the operation being performed
might apply just to a single memslot.

This significantly hurts performance of per-hva operations with higher
memslot counts.

Since hva ranges can overlap between memslots, an interval tree is needed
for tracking them.
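
As a concrete illustration of the query the hva tree answers, the sketch below
uses a brute-force scan purely for clarity (the kernel uses lib/interval_tree
to get fast stabbing lookups) and made-up addresses: a lookup must report
every memslot whose inclusive [start, last] hva range intersects the requested
range, and since userspace may map the same hva range into several slots, more
than one slot can match.

#include <stdio.h>

struct slot { unsigned long start, last; };	/* inclusive hva range */

static int intersects(const struct slot *s, unsigned long start, unsigned long last)
{
	return s->start <= last && start <= s->last;
}

int main(void)
{
	/* Two slots backed by overlapping hva ranges (aliasing is allowed). */
	const struct slot slots[] = { { 0x1000, 0x4fff }, { 0x3000, 0x5fff } };
	unsigned long q_start = 0x3500, q_last = 0x37ff;
	unsigned int i;

	for (i = 0; i < sizeof(slots) / sizeof(slots[0]); i++)
		if (intersects(&slots[i], q_start, q_last))
			printf("slot %u overlaps the queried hva range\n", i);

	/* Both slots are reported for this query. */
	return 0;
}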

Signed-off-by: Maciej S. Szmigiero <[email protected]>
[sean: handle interval tree updates in kvm_replace_memslot()]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/arm64/kvm/Kconfig | 1 +
arch/mips/kvm/Kconfig | 1 +
arch/powerpc/kvm/Kconfig | 1 +
arch/s390/kvm/Kconfig | 1 +
arch/x86/kvm/Kconfig | 1 +
include/linux/kvm_host.h | 3 +++
virt/kvm/kvm_main.c | 53 +++++++++++++++++++++++++++++-----------
7 files changed, 47 insertions(+), 14 deletions(-)

diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 8ffcbe29395e..f1f8fc069a97 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -39,6 +39,7 @@ menuconfig KVM
select HAVE_KVM_IRQ_BYPASS
select HAVE_KVM_VCPU_RUN_PID_CHANGE
select SCHED_INFO
+ select INTERVAL_TREE
help
Support hosting virtualized guest machines.

diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
index a77297480f56..91d197bee9c0 100644
--- a/arch/mips/kvm/Kconfig
+++ b/arch/mips/kvm/Kconfig
@@ -27,6 +27,7 @@ config KVM
select KVM_MMIO
select MMU_NOTIFIER
select SRCU
+ select INTERVAL_TREE
help
Support for hosting Guest kernels.

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index ff581d70f20c..e4c24f524ba8 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -26,6 +26,7 @@ config KVM
select KVM_VFIO
select IRQ_BYPASS_MANAGER
select HAVE_KVM_IRQ_BYPASS
+ select INTERVAL_TREE

config KVM_BOOK3S_HANDLER
bool
diff --git a/arch/s390/kvm/Kconfig b/arch/s390/kvm/Kconfig
index 67a8e770e369..2e84d3922f7c 100644
--- a/arch/s390/kvm/Kconfig
+++ b/arch/s390/kvm/Kconfig
@@ -33,6 +33,7 @@ config KVM
select HAVE_KVM_NO_POLL
select SRCU
select KVM_VFIO
+ select INTERVAL_TREE
help
Support hosting paravirtualized guest machines using the SIE
virtualization capability on the mainframe. This should work
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 619186138176..7618bef0a4a9 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -43,6 +43,7 @@ config KVM
select KVM_GENERIC_DIRTYLOG_READ_PROTECT
select KVM_VFIO
select SRCU
+ select INTERVAL_TREE
select HAVE_KVM_PM_NOTIFIER if PM
help
Support hosting fully virtualized guest machines using hardware
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f3be79fb7d74..9f87adda1764 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -30,6 +30,7 @@
#include <linux/nospec.h>
#include <linux/notifier.h>
#include <linux/hashtable.h>
+#include <linux/interval_tree.h>
#include <linux/xarray.h>
#include <asm/signal.h>

@@ -430,6 +431,7 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)

struct kvm_memory_slot {
struct hlist_node id_node;
+ struct interval_tree_node hva_node;
gfn_t base_gfn;
unsigned long npages;
unsigned long *dirty_bitmap;
@@ -531,6 +533,7 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
*/
struct kvm_memslots {
u64 generation;
+ struct rb_root_cached hva_tree;
/*
* The mapping table from slot id to the index in memslots[].
*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f2eca32fbca8..502dbdf8ff96 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -514,6 +514,12 @@ static void kvm_null_fn(void)
}
#define IS_KVM_NULL_FN(fn) ((fn) == (void *)kvm_null_fn)

+/* Iterate over each memslot intersecting [start, last] (inclusive) range */
+#define kvm_for_each_memslot_in_hva_range(node, slots, start, last) \
+ for (node = interval_tree_iter_first(&slots->hva_tree, start, last); \
+ node; \
+ node = interval_tree_iter_next(node, start, last)) \
+
static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
const struct kvm_hva_range *range)
{
@@ -523,6 +529,9 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
struct kvm_memslots *slots;
int i, idx;

+ if (WARN_ON_ONCE(range->end <= range->start))
+ return 0;
+
/* A null handler is allowed if and only if on_lock() is provided. */
if (WARN_ON_ONCE(IS_KVM_NULL_FN(range->on_lock) &&
IS_KVM_NULL_FN(range->handler)))
@@ -531,15 +540,17 @@ static __always_inline int __kvm_handle_hva_range(struct kvm *kvm,
idx = srcu_read_lock(&kvm->srcu);

for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+ struct interval_tree_node *node;
+
slots = __kvm_memslots(kvm, i);
- kvm_for_each_memslot(slot, slots) {
+ kvm_for_each_memslot_in_hva_range(node, slots,
+ range->start, range->end - 1) {
unsigned long hva_start, hva_end;

+ slot = container_of(node, struct kvm_memory_slot, hva_node);
hva_start = max(range->start, slot->userspace_addr);
hva_end = min(range->end, slot->userspace_addr +
(slot->npages << PAGE_SHIFT));
- if (hva_start >= hva_end)
- continue;

/*
* To optimize for the likely case where the address
@@ -875,6 +886,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
if (!slots)
return NULL;

+ slots->hva_tree = RB_ROOT_CACHED;
hash_init(slots->id_hash);

return slots;
@@ -1279,21 +1291,28 @@ static void kvm_replace_memslot(struct kvm_memslots *slots,
struct kvm_memory_slot *new)
{
/*
- * Remove the old memslot from the hash list, copying the node data
- * would corrupt the list.
+ * Remove the old memslot from the hash list and interval tree, copying
+ * the node data would corrupt the structures.
*/
if (old) {
hash_del(&old->id_node);
+ interval_tree_remove(&old->hva_node, &slots->hva_tree);

if (!new)
return;

/* Copy the source *data*, not the pointer, to the destination. */
*new = *old;
+ } else {
+ /* If @old is NULL, initialize @new's hva range. */
+ new->hva_node.start = new->userspace_addr;
+ new->hva_node.last = new->userspace_addr +
+ (new->npages << PAGE_SHIFT) - 1;
}

/* (Re)Add the new memslot. */
hash_add(slots->id_hash, &new->id_node, new->id);
+ interval_tree_insert(&new->hva_node, &slots->hva_tree);
}

static void kvm_shift_memslot(struct kvm_memslots *slots, int dst, int src)
@@ -1324,7 +1343,7 @@ static inline void kvm_memslot_delete(struct kvm_memslots *slots,
atomic_set(&slots->last_used_slot, 0);

/*
- * Remove the to-be-deleted memslot from the list _before_ shifting
+ * Remove the to-be-deleted memslot from the list/tree _before_ shifting
* the trailing memslots forward, its data will be overwritten.
* Defer the (somewhat pointless) copying of the memslot until after
* the last slot has been shifted to avoid overwriting said last slot.
@@ -1351,7 +1370,8 @@ static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
* itself is not preserved in the array, i.e. not swapped at this time, only
* its new index into the array is tracked. Returns the changed memslot's
* current index into the memslots array.
- * The memslot at the returned index will not be in @slots->id_hash by then.
+ * The memslot at the returned index will not be in @slots->hva_tree or
+ * @slots->id_hash by then.
* @memslot is a detached struct with desired final data of the changed slot.
*/
static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
@@ -1365,10 +1385,10 @@ static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
return -1;

/*
- * Delete the slot from the hash table before sorting the remaining
- * slots, the slot's data may be overwritten when copying slots as part
- * of the sorting proccess. update_memslots() will unconditionally
- * rewrite the entire slot and re-add it to the hash table.
+ * Delete the slot from the hash table and interval tree before sorting
+ * the remaining slots, the slot's data may be overwritten when copying
+ * slots as part of the sorting proccess. update_memslots() will
+ * unconditionally rewrite and re-add the entire slot.
*/
kvm_replace_memslot(slots, oldslot, NULL);

@@ -1394,10 +1414,12 @@ static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
* is not preserved in the array, i.e. not swapped at this time, only its new
* index into the array is tracked. Returns the changed memslot's final index
* into the memslots array.
- * The memslot at the returned index will not be in @slots->id_hash by then.
+ * The memslot at the returned index will not be in @slots->hva_tree or
+ * @slots->id_hash by then.
* @memslot is a detached struct with desired final data of the new or
* changed slot.
- * Assumes that the memslot at @start index is not in @slots->id_hash.
+ * Assumes that the memslot at @start index is not in @slots->hva_tree or
+ * @slots->id_hash.
*/
static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
struct kvm_memory_slot *memslot,
@@ -1590,9 +1612,12 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,

memcpy(slots, old, kvm_memslots_size(old->used_slots));

+ slots->hva_tree = RB_ROOT_CACHED;
hash_init(slots->id_hash);
- kvm_for_each_memslot(memslot, slots)
+ kvm_for_each_memslot(memslot, slots) {
+ interval_tree_insert(&memslot->hva_node, &slots->hva_tree);
hash_add(slots->id_hash, &memslot->id_node, memslot->id);
+ }

return slots;
}
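
For illustration, a minimal sketch of how a walk over the new hva_tree can be
written with the stock interval tree helpers; per the cover letter, the series'
existing kvm_for_each_memslot_in_hva_range() is already built on these
iterators, so the standalone function below is only an example (its name and
the pr_debug() are made up, while hva_tree and hva_node come from the diff
above):

/*
 * Illustrative sketch only (assumes <linux/interval_tree.h>): visit every
 * memslot whose host virtual address range intersects [start, last].
 */
static void example_walk_hva_range(struct kvm_memslots *slots,
				   unsigned long start, unsigned long last)
{
	struct interval_tree_node *node;

	for (node = interval_tree_iter_first(&slots->hva_tree, start, last);
	     node;
	     node = interval_tree_iter_next(node, start, last)) {
		struct kvm_memory_slot *slot =
			container_of(node, struct kvm_memory_slot, hva_node);

		/* [userspace_addr, userspace_addr + npages * PAGE_SIZE) overlaps */
		pr_debug("slot %d intersects the hva range\n", slot->id);
	}
}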

2021-11-30 21:48:47

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 21/29] KVM: Resolve memslot ID via a hash table instead of via a static array

From: "Maciej S. Szmigiero" <[email protected]>

Memslot ID to the corresponding memslot mappings are currently kept as
indices in static id_to_index array.
The size of this array depends on the maximum allowed memslot count
(regardless of the number of memslots actually in use).

This has become especially problematic recently, when memslot count cap was
removed, so the maximum count is now full 32k memslots - the maximum
allowed by the current KVM API.

Keeping these IDs in a hash table (instead of an array) avoids this
problem.

Resolving a memslot ID to the actual memslot (instead of its index) will
also enable transitioning away from an array-based implementation of the
whole memslots structure in a later commit.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Co-developed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
include/linux/kvm_host.h | 25 +++++++----
virt/kvm/kvm_main.c | 95 +++++++++++++++++++++++++++++++---------
2 files changed, 91 insertions(+), 29 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 86562ffb6ea4..f3be79fb7d74 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -29,6 +29,7 @@
#include <linux/refcount.h>
#include <linux/nospec.h>
#include <linux/notifier.h>
+#include <linux/hashtable.h>
#include <linux/xarray.h>
#include <asm/signal.h>

@@ -428,6 +429,7 @@ static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
#define KVM_MEM_MAX_NR_PAGES ((1UL << 31) - 1)

struct kvm_memory_slot {
+ struct hlist_node id_node;
gfn_t base_gfn;
unsigned long npages;
unsigned long *dirty_bitmap;
@@ -529,8 +531,15 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
*/
struct kvm_memslots {
u64 generation;
- /* The mapping table from slot id to the index in memslots[]. */
- short id_to_index[KVM_MEM_SLOTS_NUM];
+ /*
+ * The mapping table from slot id to the index in memslots[].
+ *
+ * 7-bit bucket count matches the size of the old id to index array for
+ * 512 slots, while giving good performance with this slot count.
+ * Higher bucket counts bring only small performance improvements but
+ * always result in higher memory usage (even for lower memslot counts).
+ */
+ DECLARE_HASHTABLE(id_hash, 7);
atomic_t last_used_slot;
int used_slots;
struct kvm_memory_slot memslots[];
@@ -798,16 +807,14 @@ static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
static inline
struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
{
- int index = slots->id_to_index[id];
struct kvm_memory_slot *slot;

- if (index < 0)
- return NULL;
-
- slot = &slots->memslots[index];
+ hash_for_each_possible(slots->id_hash, slot, id_node, id) {
+ if (slot->id == id)
+ return slot;
+ }

- WARN_ON(slot->id != id);
- return slot;
+ return NULL;
}

/*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bbc0110224f3..f2eca32fbca8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -869,15 +869,13 @@ static void kvm_destroy_pm_notifier(struct kvm *kvm)

static struct kvm_memslots *kvm_alloc_memslots(void)
{
- int i;
struct kvm_memslots *slots;

slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
if (!slots)
return NULL;

- for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
- slots->id_to_index[i] = -1;
+ hash_init(slots->id_hash);

return slots;
}
@@ -1276,17 +1274,48 @@ static int kvm_alloc_dirty_bitmap(struct kvm_memory_slot *memslot)
return 0;
}

+static void kvm_replace_memslot(struct kvm_memslots *slots,
+ struct kvm_memory_slot *old,
+ struct kvm_memory_slot *new)
+{
+ /*
+ * Remove the old memslot from the hash list, copying the node data
+ * would corrupt the list.
+ */
+ if (old) {
+ hash_del(&old->id_node);
+
+ if (!new)
+ return;
+
+ /* Copy the source *data*, not the pointer, to the destination. */
+ *new = *old;
+ }
+
+ /* (Re)Add the new memslot. */
+ hash_add(slots->id_hash, &new->id_node, new->id);
+}
+
+static void kvm_shift_memslot(struct kvm_memslots *slots, int dst, int src)
+{
+ struct kvm_memory_slot *mslots = slots->memslots;
+
+ kvm_replace_memslot(slots, &mslots[src], &mslots[dst]);
+}
+
/*
* Delete a memslot by decrementing the number of used slots and shifting all
* other entries in the array forward one spot.
+ * @memslot is a detached dummy struct with just .id and .as_id filled.
*/
static inline void kvm_memslot_delete(struct kvm_memslots *slots,
struct kvm_memory_slot *memslot)
{
struct kvm_memory_slot *mslots = slots->memslots;
+ struct kvm_memory_slot *oldslot = id_to_memslot(slots, memslot->id);
int i;

- if (WARN_ON(slots->id_to_index[memslot->id] == -1))
+ if (WARN_ON(!oldslot))
return;

slots->used_slots--;
@@ -1294,12 +1323,17 @@ static inline void kvm_memslot_delete(struct kvm_memslots *slots,
if (atomic_read(&slots->last_used_slot) >= slots->used_slots)
atomic_set(&slots->last_used_slot, 0);

- for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
- mslots[i] = mslots[i + 1];
- slots->id_to_index[mslots[i].id] = i;
- }
+ /*
+ * Remove the to-be-deleted memslot from the list _before_ shifting
+ * the trailing memslots forward, its data will be overwritten.
+ * Defer the (somewhat pointless) copying of the memslot until after
+ * the last slot has been shifted to avoid overwriting said last slot.
+ */
+ kvm_replace_memslot(slots, oldslot, NULL);
+
+ for (i = oldslot - mslots; i < slots->used_slots; i++)
+ kvm_shift_memslot(slots, i, i + 1);
mslots[i] = *memslot;
- slots->id_to_index[memslot->id] = -1;
}

/*
@@ -1317,30 +1351,39 @@ static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
* itself is not preserved in the array, i.e. not swapped at this time, only
* its new index into the array is tracked. Returns the changed memslot's
* current index into the memslots array.
+ * The memslot at the returned index will not be in @slots->id_hash by then.
+ * @memslot is a detached struct with desired final data of the changed slot.
*/
static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
struct kvm_memory_slot *memslot)
{
struct kvm_memory_slot *mslots = slots->memslots;
+ struct kvm_memory_slot *oldslot = id_to_memslot(slots, memslot->id);
int i;

- if (slots->id_to_index[memslot->id] == -1 || !slots->used_slots)
+ if (!oldslot || !slots->used_slots)
return -1;

+ /*
+ * Delete the slot from the hash table before sorting the remaining
+ * slots, the slot's data may be overwritten when copying slots as part
+ * of the sorting process. update_memslots() will unconditionally
+ * rewrite the entire slot and re-add it to the hash table.
+ */
+ kvm_replace_memslot(slots, oldslot, NULL);
+
/*
* Move the target memslot backward in the array by shifting existing
* memslots with a higher GFN (than the target memslot) towards the
* front of the array.
*/
- for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
+ for (i = oldslot - mslots; i < slots->used_slots - 1; i++) {
if (memslot->base_gfn > mslots[i + 1].base_gfn)
break;

WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);

- /* Shift the next memslot forward one and update its index. */
- mslots[i] = mslots[i + 1];
- slots->id_to_index[mslots[i].id] = i;
+ kvm_shift_memslot(slots, i, i + 1);
}
return i;
}
@@ -1351,6 +1394,10 @@ static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
* is not preserved in the array, i.e. not swapped at this time, only its new
* index into the array is tracked. Returns the changed memslot's final index
* into the memslots array.
+ * The memslot at the returned index will not be in @slots->id_hash by then.
+ * @memslot is a detached struct with desired final data of the new or
+ * changed slot.
+ * Assumes that the memslot at @start index is not in @slots->id_hash.
*/
static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
struct kvm_memory_slot *memslot,
@@ -1365,9 +1412,7 @@ static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,

WARN_ON_ONCE(memslot->base_gfn == mslots[i - 1].base_gfn);

- /* Shift the next memslot back one and update its index. */
- mslots[i] = mslots[i - 1];
- slots->id_to_index[mslots[i].id] = i;
+ kvm_shift_memslot(slots, i, i - 1);
}
return i;
}
@@ -1412,6 +1457,9 @@ static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
* most likely to be referenced, sorting it to the front of the array was
* advantageous. The current binary search starts from the middle of the array
* and uses an LRU pointer to improve performance for all memslots and GFNs.
+ *
+ * @memslot is a detached struct, not a part of the current or new memslot
+ * array.
*/
static void update_memslots(struct kvm_memslots *slots,
struct kvm_memory_slot *memslot,
@@ -1436,7 +1484,7 @@ static void update_memslots(struct kvm_memslots *slots,
* its index accordingly.
*/
slots->memslots[i] = *memslot;
- slots->id_to_index[memslot->id] = i;
+ kvm_replace_memslot(slots, NULL, &slots->memslots[i]);
}
}

@@ -1529,6 +1577,7 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
{
struct kvm_memslots *slots;
size_t new_size;
+ struct kvm_memory_slot *memslot;

if (change == KVM_MR_CREATE)
new_size = kvm_memslots_size(old->used_slots + 1);
@@ -1536,8 +1585,14 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
new_size = kvm_memslots_size(old->used_slots);

slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
- if (likely(slots))
- memcpy(slots, old, kvm_memslots_size(old->used_slots));
+ if (unlikely(!slots))
+ return NULL;
+
+ memcpy(slots, old, kvm_memslots_size(old->used_slots));
+
+ hash_init(slots->id_hash);
+ kvm_for_each_memslot(memslot, slots)
+ hash_add(slots->id_hash, &memslot->id_node, memslot->id);

return slots;
}
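
A caller-side sketch of the new lookup (the helper below is hypothetical; only
id_to_memslot() comes from the patch). As a sizing note on the comment in the
diff: on 64-bit, the 2^7 hlist_head buckets occupy 128 * 8 = 1024 bytes, the
same as the old 512-entry short id_to_index array.

/* Hypothetical caller: how many pages does the slot with this id cover? */
static unsigned long example_slot_id_npages(struct kvm_memslots *slots, int id)
{
	struct kvm_memory_slot *slot = id_to_memslot(slots, id);

	/* id_to_memslot() returns NULL when no slot with this id is installed. */
	if (!slot)
		return 0;

	return slot->npages;
}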

2021-11-30 21:50:01

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 20/29] KVM: Move WARN on invalid memslot index to update_memslots()

From: "Maciej S. Szmigiero" <[email protected]>

Since kvm_memslot_move_forward() can theoretically return a negative
memslot index even when kvm_memslot_move_backward() returned a positive one
(and so did not WARN), let's just move the warning to the common code.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Reviewed-by: Claudio Imbrenda <[email protected]>
Reviewed-by: Sean Christopherson <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index aca39b587cdb..bbc0110224f3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1324,8 +1324,7 @@ static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
struct kvm_memory_slot *mslots = slots->memslots;
int i;

- if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
- WARN_ON_ONCE(!slots->used_slots))
+ if (slots->id_to_index[memslot->id] == -1 || !slots->used_slots)
return -1;

/*
@@ -1429,6 +1428,9 @@ static void update_memslots(struct kvm_memslots *slots,
i = kvm_memslot_move_backward(slots, memslot);
i = kvm_memslot_move_forward(slots, memslot, i);

+ if (WARN_ON_ONCE(i < 0))
+ return;
+
/*
* Copy the memslot to its new position in memslots and update
* its index accordingly.

2021-11-30 21:50:08

by Maciej S. Szmigiero

[permalink] [raw]
Subject: [PATCH v6 18/29] KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array

From: "Maciej S. Szmigiero" <[email protected]>

There is no point in recalculating from scratch the total number of pages
in all memslots each time a memslot is created or deleted. Use KVM's
cached nr_memslot_pages to compute the default max number of MMU pages.

Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely
multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can
possibly overflow an unsigned long variable.

Write this "* 20 / 1000" operation as "/ 50" instead to avoid such
overflow.

Signed-off-by: Maciej S. Szmigiero <[email protected]>
[sean: use common KVM field and rework changelog accordingly]
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 3 +--
arch/x86/kvm/mmu/mmu.c | 24 ------------------------
arch/x86/kvm/x86.c | 10 +++++++---
3 files changed, 8 insertions(+), 29 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index e41ad1ead721..a5daf399e033 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -135,7 +135,7 @@
#define KVM_HPAGE_MASK(x) (~(KVM_HPAGE_SIZE(x) - 1))
#define KVM_PAGES_PER_HPAGE(x) (KVM_HPAGE_SIZE(x) / PAGE_SIZE)

-#define KVM_PERMILLE_MMU_PAGES 20
+#define KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO 50
#define KVM_MIN_ALLOC_MMU_PAGES 64UL
#define KVM_MMU_HASH_SHIFT 12
#define KVM_NUM_MMU_PAGES (1 << KVM_MMU_HASH_SHIFT)
@@ -1590,7 +1590,6 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
const struct kvm_memory_slot *memslot);
void kvm_mmu_zap_all(struct kvm *kvm);
void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
-unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
void kvm_mmu_change_mmu_pages(struct kvm *kvm, unsigned long kvm_nr_mmu_pages);

int load_pdptrs(struct kvm_vcpu *vcpu, unsigned long cr3);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 63c04c25fd66..64b6181f7757 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -6131,30 +6131,6 @@ int kvm_mmu_module_init(void)
return ret;
}

-/*
- * Calculate mmu pages needed for kvm.
- */
-unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm)
-{
- unsigned long nr_mmu_pages;
- unsigned long nr_pages = 0;
- struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
- int i;
-
- for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
- slots = __kvm_memslots(kvm, i);
-
- kvm_for_each_memslot(memslot, slots)
- nr_pages += memslot->npages;
- }
-
- nr_mmu_pages = nr_pages * KVM_PERMILLE_MMU_PAGES / 1000;
- nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
-
- return nr_mmu_pages;
-}
-
void kvm_mmu_destroy(struct kvm_vcpu *vcpu)
{
kvm_mmu_unload(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 13048683db60..5c770447f8c5 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11833,9 +11833,13 @@ void kvm_arch_commit_memory_region(struct kvm *kvm,
enum kvm_mr_change change)
{
if (!kvm->arch.n_requested_mmu_pages &&
- (change == KVM_MR_CREATE || change == KVM_MR_DELETE))
- kvm_mmu_change_mmu_pages(kvm,
- kvm_mmu_calculate_default_mmu_pages(kvm));
+ (change == KVM_MR_CREATE || change == KVM_MR_DELETE)) {
+ unsigned long nr_mmu_pages;
+
+ nr_mmu_pages = kvm->nr_memslot_pages / KVM_MEMSLOT_PAGES_TO_MMU_PAGES_RATIO;
+ nr_mmu_pages = max(nr_mmu_pages, KVM_MIN_ALLOC_MMU_PAGES);
+ kvm_mmu_change_mmu_pages(kvm, nr_mmu_pages);
+ }

kvm_mmu_slot_apply_flags(kvm, old, new, change);
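
To make the overflow argument in the changelog concrete, a small standalone
userspace C example (not part of the patch), with nr_memslot_pages at its
ULONG_MAX cap:

#include <limits.h>
#include <stdio.h>

int main(void)
{
	unsigned long pages = ULONG_MAX;	/* worst case per the changelog */

	/* "* 20 / 1000": the multiplication wraps, giving a result ~20x too small. */
	unsigned long wrapped = pages * 20 / 1000;

	/* "/ 50" expresses the same ratio but cannot overflow. */
	unsigned long safe = pages / 50;

	printf("wrapped=%lu safe=%lu\n", wrapped, safe);
	return 0;
}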


2021-12-01 02:21:48

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 18/29] KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array

On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <[email protected]>
>
> There is no point in recalculating from scratch the total number of pages
> in all memslots each time a memslot is created or deleted. Use KVM's
> cached nr_memslot_pages to compute the default max number of MMU pages.
>
> Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely
> multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can
> possibly overflow an unsigned long variable.
>
> Write this "* 20 / 1000" operation as "/ 50" instead to avoid such
> overflow.
>
> Signed-off-by: Maciej S. Szmigiero <[email protected]>
> [sean: use common KVM field and rework changelog accordingly]
> Signed-off-by: Sean Christopherson <[email protected]>

My SoB can definitely be dropped for this one, just consider it review feedback
that happened to have an SoB attached.

Reviewed-by: Sean Christopherson <[email protected]>

2021-12-01 02:54:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 21/29] KVM: Resolve memslot ID via a hash table instead of via a static array

On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <[email protected]>
>
> Memslot ID to the corresponding memslot mappings are currently kept as
> indices in static id_to_index array.
> The size of this array depends on the maximum allowed memslot count
> (regardless of the number of memslots actually in use).
>
> This has become especially problematic recently, when memslot count cap was
> removed, so the maximum count is now full 32k memslots - the maximum
> allowed by the current KVM API.
>
> Keeping these IDs in a hash table (instead of an array) avoids this
> problem.
>
> Resolving a memslot ID to the actual memslot (instead of its index) will
> also enable transitioning away from an array-based implementation of the
> whole memslots structure in a later commit.
>
> Signed-off-by: Maciej S. Szmigiero <[email protected]>
> Co-developed-by: Sean Christopherson <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>

Nit, your SoB should come last since you were the last person to handle the patch.

2021-12-01 03:08:38

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 27/29] KVM: Optimize overlapping memslots check

On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <[email protected]>
>
> Do a quick lookup for possibly overlapping gfns when creating or moving
> a memslot instead of performing a linear scan of the whole memslot set.
>
> Signed-off-by: Maciej S. Szmigiero <[email protected]>
> [sean: tweaked params to avoid churn in future cleanup]
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> virt/kvm/kvm_main.c | 35 +++++++++++++++++++++--------------
> 1 file changed, 21 insertions(+), 14 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 086f18969bc3..52117f65bc5b 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1817,6 +1817,18 @@ static int kvm_set_memslot(struct kvm *kvm,
> return 0;
> }
>
> +static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
> + gfn_t start, gfn_t end)
> +{
> + struct kvm_memslot_iter iter;
> +
> + kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end)

The for loop needs curly braces, per coding-style.rst:

Also, use braces when a loop contains more than a single simple statement:

.. code-block:: c

while (condition) {
if (test)
do_something();
}

With that,

Reviewed-by: Sean Christopherson <[email protected]>
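
In other words, the style-compliant form of the loop in
kvm_check_memslot_overlap() would be roughly (sketch only; the loop body
follows the full snippet quoted later in the thread, still spelled with the
v6 kvm_memslot_iter_slot() helper):

	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
		if (kvm_memslot_iter_slot(&iter)->id != id)
			return true;
	}

	return false;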

2021-12-01 03:27:51

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 25/29] KVM: Call kvm_arch_flush_shadow_memslot() on the old slot in kvm_invalidate_memslot()

On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <[email protected]>
>
> kvm_invalidate_memslot() calls kvm_arch_flush_shadow_memslot() on the
> active, but KVM_MEMSLOT_INVALID slot.
> Do it on the inactive (but valid) old slot instead since arch code really
> should not get passed such invalid slot.

One other thing that's worth noting in the changelog is that "old->arch" may have
stale data. IMO that's perfectly ok, but it's definitely a quirk. Ideally KVM
would disallow touching "arch" for an INVALID slot, but that would require another
arch hook if kvm_prepare_memory_region() failed to refresh old->arch if necessary
before restoring it. :-/

Paolo, thoughts on this goofy case? I don't love it, but I dislike having

kvm_arch_flush_shadow_memslot(kvm, invalid_slot);

in the final code even more.

Reviewed-by: Sean Christopherson <[email protected]>

> Suggested-by: Sean Christopherson <[email protected]>
> Signed-off-by: Maciej S. Szmigiero <[email protected]>
> ---
> virt/kvm/kvm_main.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index c57748ee41e8..086f18969bc3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -1632,7 +1632,7 @@ static void kvm_invalidate_memslot(struct kvm *kvm,
> * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> * - kvm_is_visible_gfn (mmu_check_root)
> */
> - kvm_arch_flush_shadow_memslot(kvm, working_slot);
> + kvm_arch_flush_shadow_memslot(kvm, old);
>
> /* Was released by kvm_swap_active_memslots, reacquire. */
> mutex_lock(&kvm->slots_arch_lock);

2021-12-01 03:43:41

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 26/29] KVM: Optimize gfn lookup in kvm_zap_gfn_range()

On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 41efe53cf150..6fce6eb797a7 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -848,6 +848,105 @@ struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
> return NULL;
> }
>
> +/* Iterator used for walking memslots that overlap a gfn range. */
> +struct kvm_memslot_iter {
> + struct kvm_memslots *slots;
> + gfn_t end;
> + struct rb_node *node;
> +};

...

> +static inline struct kvm_memory_slot *kvm_memslot_iter_slot(struct kvm_memslot_iter *iter)
> +{
> + return container_of(iter->node, struct kvm_memory_slot, gfn_node[iter->slots->node_idx]);

Having to use a helper in callers of kvm_for_each_memslot_in_gfn_range() is a bit
ugly, any reason not to grab @slot as well? Then the callers just do iter.slot,
which IMO is much more readable.

And if we do that, I'd also vote to omit slots and end from the iterator. It would
mean passing in slots and end to kvm_memslot_iter_is_valid() and kvm_memslot_iter_next(),
but that's more idiomatic in a for-loop if iter is considered to be _just_ the iterator
part. "slots" is arguable, but "end" really shouldn't be part of the iterator.

> +}
> +
> +static inline bool kvm_memslot_iter_is_valid(struct kvm_memslot_iter *iter)
> +{
> + struct kvm_memory_slot *memslot;
> +
> + if (!iter->node)
> + return false;
> +
> + memslot = kvm_memslot_iter_slot(iter);
> +
> + /*
> + * If this slot starts beyond or at the end of the range so does
> + * every next one
> + */
> + return memslot->base_gfn < iter->end;
> +}
> +
> +static inline void kvm_memslot_iter_next(struct kvm_memslot_iter *iter)
> +{
> + iter->node = rb_next(iter->node);
> +}
> +
> +/* Iterate over each memslot at least partially intersecting [start, end) range */
> +#define kvm_for_each_memslot_in_gfn_range(iter, slots, start, end) \
> + for (kvm_memslot_iter_start(iter, slots, start, end); \
> + kvm_memslot_iter_is_valid(iter); \
> + kvm_memslot_iter_next(iter))
> +
> /*
> * KVM_SET_USER_MEMORY_REGION ioctl allows the following operations:
> * - create a new memory slot
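
For context on the helper under discussion, a minimal usage sketch of the v6
iterator as quoted above (the function name and the pr_debug() are
illustrative; Sean's follow-up later in the thread contrasts this with an
iter.slot spelling):

/* Illustrative only: visit each memslot intersecting the gfn range [start, end). */
static void example_walk_gfn_range(struct kvm_memslots *slots,
				   gfn_t start, gfn_t end)
{
	struct kvm_memslot_iter iter;

	kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
		struct kvm_memory_slot *slot = kvm_memslot_iter_slot(&iter);

		pr_debug("slot %d overlaps the gfn range\n", slot->id);
	}
}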

2021-12-01 03:45:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 00/29] KVM: Scalable memslots implementation

On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> From: "Maciej S. Szmigiero" <[email protected]>
>
> This series contains the sixth iteration of the scalable memslots patch set.
> It is based on Sean's version "5.5", but with integrated patches for issues
> that arose during its review round.
>
> In addition to that, the kvm_for_each_memslot_in_gfn_range() implementation
> was reworked to return only strictly overlapping memslots and to use
> iterators.
>
> However, I've dropped a similar kvm_for_each_memslot_in_hva_range() rework
> since the existing implementation was already returning only strictly
> overlapping memslots and it was already based on interval tree iterators,
> so wrapping them in another layer of iterators would only add unnecessary
> complexity.
> The code in this "for"-like macro is also self-contained and very simple,
> so let's keep it this way.

If kvm_for_each_memslot_in_hva_range() ever gains a user outside of kvm_main.c
it should definitely get an iterator container so that callers don't need to do
the container_of() stuff. I'd still prefer a container even now, but it's not a
sticking point.

2021-12-01 15:47:09

by Maciej S. Szmigiero

[permalink] [raw]
Subject: Re: [PATCH v6 18/29] KVM: x86: Use nr_memslot_pages to avoid traversing the memslots array

On 01.12.2021 03:21, Sean Christopherson wrote:
> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <[email protected]>
>>
>> There is no point in recalculating from scratch the total number of pages
>> in all memslots each time a memslot is created or deleted. Use KVM's
>> cached nr_memslot_pages to compute the default max number of MMU pages.
>>
>> Note that even with nr_memslot_pages capped at ULONG_MAX we can't safely
>> multiply it by KVM_PERMILLE_MMU_PAGES (20) since this operation can
>> possibly overflow an unsigned long variable.
>>
>> Write this "* 20 / 1000" operation as "/ 50" instead to avoid such
>> overflow.
>>
>> Signed-off-by: Maciej S. Szmigiero <[email protected]>
>> [sean: use common KVM field and rework changelog accordingly]
>> Signed-off-by: Sean Christopherson <[email protected]>
>
> My SoB can definitely be dropped for this one, just consider it review feedback
> that happened to have an SoB attached.
>
> Reviewed-by: Sean Christopherson <[email protected]>
>

...and on this one, too.

Thanks,
Maciej

2021-12-01 15:47:37

by Maciej S. Szmigiero

[permalink] [raw]
Subject: Re: [PATCH v6 27/29] KVM: Optimize overlapping memslots check

On 01.12.2021 04:08, Sean Christopherson wrote:
> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <[email protected]>
>>
>> Do a quick lookup for possibly overlapping gfns when creating or moving
>> a memslot instead of performing a linear scan of the whole memslot set.
>>
>> Signed-off-by: Maciej S. Szmigiero <[email protected]>
>> [sean: tweaked params to avoid churn in future cleanup]
>> Signed-off-by: Sean Christopherson <[email protected]>
>> ---
>> virt/kvm/kvm_main.c | 35 +++++++++++++++++++++--------------
>> 1 file changed, 21 insertions(+), 14 deletions(-)
>>
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 086f18969bc3..52117f65bc5b 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -1817,6 +1817,18 @@ static int kvm_set_memslot(struct kvm *kvm,
>> return 0;
>> }
>>
>> +static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
>> + gfn_t start, gfn_t end)
>> +{
>> + struct kvm_memslot_iter iter;
>> +
>> + kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end)
>
> The for loop needs curly braces, per coding-style.rst:
>
> Also, use braces when a loop contains more than a single simple statement:
>
> .. code-block:: c
>
> while (condition) {
> if (test)
> do_something();
> }
>
> With that,
>
> Reviewed-by: Sean Christopherson <[email protected]>
>

Will add these braces (and your R-b, too).

Thanks,
Maciej

2021-12-01 15:47:51

by Maciej S. Szmigiero

[permalink] [raw]
Subject: Re: [PATCH v6 21/29] KVM: Resolve memslot ID via a hash table instead of via a static array

On 01.12.2021 03:54, Sean Christopherson wrote:
> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <[email protected]>
>>
>> Memslot ID to the corresponding memslot mappings are currently kept as
>> indices in static id_to_index array.
>> The size of this array depends on the maximum allowed memslot count
>> (regardless of the number of memslots actually in use).
>>
>> This has become especially problematic recently, when memslot count cap was
>> removed, so the maximum count is now full 32k memslots - the maximum
>> allowed by the current KVM API.
>>
>> Keeping these IDs in a hash table (instead of an array) avoids this
>> problem.
>>
>> Resolving a memslot ID to the actual memslot (instead of its index) will
>> also enable transitioning away from an array-based implementation of the
>> whole memslots structure in a later commit.
>>
>> Signed-off-by: Maciej S. Szmigiero <[email protected]>
>> Co-developed-by: Sean Christopherson <[email protected]>
>> Signed-off-by: Sean Christopherson <[email protected]>
>
> Nit, your SoB should come last since you were the last person to handle the patch.
>

I thought that my SoB should come first since it comes from the author of this
patch.

Documentation/process/submitting-patches.rst says that:
> Any further SoBs (Signed-off-by:'s) following the author's SoB are from
> people handling and transporting the patch, but were not involved in its
> development. SoB chains should reflect the **real** route a patch took
> as it was propagated to the maintainers and ultimately to Linus, with
> the first SoB entry signalling primary authorship of a single author.

So "further SoBs follow[] the author's SoB" and "the first SoB entry
signal[s] primary authorship".
But at the same time "SoB chains should reflect the **real** route a
patch took" - these rules contradict each other in our case.

Thanks,
Maciej

2021-12-01 15:48:18

by Maciej S. Szmigiero

[permalink] [raw]
Subject: Re: [PATCH v6 25/29] KVM: Call kvm_arch_flush_shadow_memslot() on the old slot in kvm_invalidate_memslot()

On 01.12.2021 04:27, Sean Christopherson wrote:
> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>> From: "Maciej S. Szmigiero" <[email protected]>
>>
>> kvm_invalidate_memslot() calls kvm_arch_flush_shadow_memslot() on the
>> active, but KVM_MEMSLOT_INVALID slot.
>> Do it on the inactive (but valid) old slot instead since arch code really
>> should not get passed such invalid slot.
>
> One other thing that's worth noting in the changelog is that "old->arch" may have
> stale data. IMO that's perfectly ok, but it's definitely a quirk.

Will add this caveat to this commit message.

> Ideally KVM
> would disallow touching "arch" for an INVALID slot, but that would require another
> arch hook if kvm_prepare_memory_region() failed to refresh old->arch if necessary
> before restoring it. :-/

It looks to me that only PPC book3s_hv actually uses the "arch" field in
its kvm_arch_flush_shadow_memslot() implementation.

But at the same time it does not seem to do lazy allocation for rmaps or
similar stuff.

However, the code there is complex enough for this change to be of
higher-than-usual risk.
That's why I have split it out from the main memslots rework.

Thanks,
Maciej

2021-12-01 15:49:27

by Maciej S. Szmigiero

[permalink] [raw]
Subject: Re: [PATCH v6 26/29] KVM: Optimize gfn lookup in kvm_zap_gfn_range()

On 01.12.2021 04:41, Sean Christopherson wrote:
> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index 41efe53cf150..6fce6eb797a7 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -848,6 +848,105 @@ struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
>> return NULL;
>> }
>>
>> +/* Iterator used for walking memslots that overlap a gfn range. */
>> +struct kvm_memslot_iter {
>> + struct kvm_memslots *slots;
>> + gfn_t end;
>> + struct rb_node *node;
>> +};
>
> ...
>
>> +static inline struct kvm_memory_slot *kvm_memslot_iter_slot(struct kvm_memslot_iter *iter)
>> +{
>> + return container_of(iter->node, struct kvm_memory_slot, gfn_node[iter->slots->node_idx]);
>
> Having to use a helper in callers of kvm_for_each_memslot_in_gfn_range() is a bit
> ugly, any reason not to grab @slot as well? Then the callers just do iter.slot,
> which IMO is much more readable.

"slot" can be easily calculated from "node" together with either "slots" or
"node_idx" (the code above just adjusts a pointer) so storing it in the
iterator makes little sense if the latter are already stored there.

> And if we do that, I'd also vote to omit slots and end from the iterator. It would
> mean passing in slots and end to kvm_memslot_iter_is_valid() and kvm_memslot_iter_next(),
> but that's more idiomatic in a for-loop if iter is considered to be _just_ the iterator
> part. "slots" is arguable, but "end" really shouldn't be part of the iterator.

You're right that we can get away with not storing "end", will remove it.

Thanks,
Maciej

2021-12-01 16:23:42

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 21/29] KVM: Resolve memslot ID via a hash table instead of via a static array

On Wed, Dec 01, 2021, Maciej S. Szmigiero wrote:
> On 01.12.2021 03:54, Sean Christopherson wrote:
> > On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> > > From: "Maciej S. Szmigiero" <[email protected]>
> > >
> > > Memslot ID to the corresponding memslot mappings are currently kept as
> > > indices in static id_to_index array.
> > > The size of this array depends on the maximum allowed memslot count
> > > (regardless of the number of memslots actually in use).
> > >
> > > This has become especially problematic recently, when memslot count cap was
> > > removed, so the maximum count is now full 32k memslots - the maximum
> > > allowed by the current KVM API.
> > >
> > > Keeping these IDs in a hash table (instead of an array) avoids this
> > > problem.
> > >
> > > Resolving a memslot ID to the actual memslot (instead of its index) will
> > > also enable transitioning away from an array-based implementation of the
> > > whole memslots structure in a later commit.
> > >
> > > Signed-off-by: Maciej S. Szmigiero <[email protected]>
> > > Co-developed-by: Sean Christopherson <[email protected]>
> > > Signed-off-by: Sean Christopherson <[email protected]>
> >
> > Nit, your SoB should come last since you were the last person to handle the patch.
> >
>
> I thought that my SoB should come first since it comes from the author of this
> patch.
>
> Documentation/process/submitting-patches.rst says that:
> > Any further SoBs (Signed-off-by:'s) following the author's SoB are from
> > people handling and transporting the patch, but were not involved in its
> > development. SoB chains should reflect the **real** route a patch took
> > as it was propagated to the maintainers and ultimately to Linus, with
> > the first SoB entry signalling primary authorship of a single author.
>
> So "further SoBs follow[] the author's SoB" and "the first SoB entry
> signal[s] primary authorship".
> But at the same time "SoB chains should reflect the **real** route a
> patch took" - these rules contradict each other in our case.

Yeah, this is an unusual case. If we wanted to be super strict, for patches written
by you without a Co-developed-by, the SoB chain should be:

Signed-off-by: Maciej S. Szmigiero <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
Signed-off-by: Maciej S. Szmigiero <[email protected]>

but that's a bit ridiculous and probably unnecessary since my changes were little
more than code review feedback, which is why I think it's ok to just drop my SoB
for patches authored solely by you.

Co-developed-by is a slightly different case. Because patches with multiple
authors are likely handed back and forth multiple times, as was the case here,
and because each author needs a SoB anyways, the normal rules are tweaked slightly
to require that the person submitting the patch is always last to capture that they
were the person that did the actual submission.

There's another "When to use Acked-by:, Cc:, and Co-developed-by:" section in
submitting-patches.rst that covers this:

Co-developed-by: states that the patch was co-created by multiple developers;
it is used to give attribution to co-authors (in addition to the author
attributed by the From: tag) when several people work on a single patch. Since
Co-developed-by: denotes authorship, every Co-developed-by: must be immediately
followed by a Signed-off-by: of the associated co-author. Standard sign-off
procedure applies, i.e. the ordering of Signed-off-by: tags should reflect the
chronological history of the patch insofar as possible, regardless of whether
the author is attributed via From: or Co-developed-by:. Notably, the last
Signed-off-by: must always be that of the developer submitting the patch.

Note, the From: tag is optional when the From: author is also the person (and
email) listed in the From: line of the email header.

Example of a patch submitted by the From: author::

<changelog>

Co-developed-by: First Co-Author <[email protected]>
Signed-off-by: First Co-Author <[email protected]>
Co-developed-by: Second Co-Author <[email protected]>
Signed-off-by: Second Co-Author <[email protected]>
Signed-off-by: From Author <[email protected]>

Example of a patch submitted by a Co-developed-by: author::

From: From Author <[email protected]>

<changelog>

Co-developed-by: Random Co-Author <[email protected]>
Signed-off-by: Random Co-Author <[email protected]>
Signed-off-by: From Author <[email protected]>
Co-developed-by: Submitting Co-Author <[email protected]>
Signed-off-by: Submitting Co-Author <[email protected]>


2021-12-01 16:36:40

by Sean Christopherson

[permalink] [raw]
Subject: Re: [PATCH v6 26/29] KVM: Optimize gfn lookup in kvm_zap_gfn_range()

On Wed, Dec 01, 2021, Maciej S. Szmigiero wrote:
> On 01.12.2021 04:41, Sean Christopherson wrote:
> > On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
> > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > > index 41efe53cf150..6fce6eb797a7 100644
> > > --- a/include/linux/kvm_host.h
> > > +++ b/include/linux/kvm_host.h
> > > @@ -848,6 +848,105 @@ struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
> > > return NULL;
> > > }
> > > +/* Iterator used for walking memslots that overlap a gfn range. */
> > > +struct kvm_memslot_iter {
> > > + struct kvm_memslots *slots;
> > > + gfn_t end;
> > > + struct rb_node *node;
> > > +};
> >
> > ...
> >
> > > +static inline struct kvm_memory_slot *kvm_memslot_iter_slot(struct kvm_memslot_iter *iter)
> > > +{
> > > + return container_of(iter->node, struct kvm_memory_slot, gfn_node[iter->slots->node_idx]);
> >
> > Having to use a helper in callers of kvm_for_each_memslot_in_gfn_range() is a bit
> > ugly, any reason not to grab @slot as well? Then the callers just do iter.slot,
> > which IMO is much more readable.
>
> "slot" can be easily calculated from "node" together with either "slots" or
> "node_idx" (the code above just adjusts a pointer) so storing it in the
> iterator makes little sense if the latter are already stored there.

I don't want the callers to have to calculate the slot. It's mostly syntactic
sugar, but I really do think it improves readability. And since the first thing
every caller will do is retrieve the slot, I see no benefit in forcing the caller
to do the work.

E.g. in the simple kvm_check_memslot_overlap() usage, iter.slot->id is intuitive
and easy to parse, whereas kvm_memslot_iter_slot(&iter)->id is slightly more
difficult to parse and raises questions about why a function call is necessary
and what the function might be doing.

static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
gfn_t start, gfn_t end)
{
struct kvm_memslot_iter iter;

kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
if (iter.slot->id != id)
return true;
}

return false;
}

vs.

static bool kvm_check_memslot_overlap(struct kvm_memslots *slots, int id,
gfn_t start, gfn_t end)
{
struct kvm_memslot_iter iter;

kvm_for_each_memslot_in_gfn_range(&iter, slots, start, end) {
if (kvm_memslot_iter_slot(&iter)->id != id)
return true;
}

return false;
}


2021-12-01 23:08:55

by Maciej S. Szmigiero

[permalink] [raw]
Subject: Re: [PATCH v6 26/29] KVM: Optimize gfn lookup in kvm_zap_gfn_range()

On 01.12.2021 17:36, Sean Christopherson wrote:
> On Wed, Dec 01, 2021, Maciej S. Szmigiero wrote:
>> On 01.12.2021 04:41, Sean Christopherson wrote:
>>> On Tue, Nov 30, 2021, Maciej S. Szmigiero wrote:
>>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>>> index 41efe53cf150..6fce6eb797a7 100644
>>>> --- a/include/linux/kvm_host.h
>>>> +++ b/include/linux/kvm_host.h
>>>> @@ -848,6 +848,105 @@ struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
>>>> return NULL;
>>>> }
>>>> +/* Iterator used for walking memslots that overlap a gfn range. */
>>>> +struct kvm_memslot_iter {
>>>> + struct kvm_memslots *slots;
>>>> + gfn_t end;
>>>> + struct rb_node *node;
>>>> +};
>>>
>>> ...
>>>
>>>> +static inline struct kvm_memory_slot *kvm_memslot_iter_slot(struct kvm_memslot_iter *iter)
>>>> +{
>>>> + return container_of(iter->node, struct kvm_memory_slot, gfn_node[iter->slots->node_idx]);
>>>
>>> Having to use a helper in callers of kvm_for_each_memslot_in_gfn_range() is a bit
>>> ugly, any reason not to grab @slot as well? Then the callers just do iter.slot,
>>> which IMO is much more readable.
>>
>> "slot" can be easily calculated from "node" together with either "slots" or
>> "node_idx" (the code above just adjusts a pointer) so storing it in the
>> iterator makes little sense if the latter are already stored there.
>
> I don't want the callers to have to calculate the slot. It's mostly syntactic
> sugar, but I really do think it improves readability. And since the first thing
> every caller will do is retrieve the slot, I see no benefit in forcing the caller
> to do the work.
>
> E.g. in the simple kvm_check_memslot_overlap() usage, iter.slot->id is intuitive
> and easy to parse, whereas kvm_memslot_iter_slot(&iter)->id is slightly more
> difficult to parse and raises questions about why a function call is necessary
> and what the function might be doing.

Personally, I don't think it's that much less readable, but I will change
the code to store "slots" instead (as you wish) since it's the last remaining
change - other than Paolo's call whether we should keep or drop the
kvm_arch_flush_shadow_memslot()-related patch 25.

Thanks,
Maciej