2019-10-25 19:16:53

by Sean Christopherson

Subject: [PATCH v3 00/15] KVM: Dynamically size memslot arrays

The end goal of this series is to dynamically size the memslot array so
that KVM allocates memory based on the number of memslots in use, as
opposed to unconditionally allocating memory for the maximum number of
memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
the memory footprint from 90k to ~2.6k bytes.
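
The arithmetic: 2 address spaces * 512 slots * 88 bytes = 90,112 bytes,
versus 30 slots * 88 bytes ~= 2.6k once the array is sized to fit. As a
reader aid (not part of the series), a minimal userspace sketch of the
flexible-array idea, using a stand-in slot struct; the series applies
the same idea to struct kvm_memslots:

#include <stdlib.h>

/* Stand-in for struct kvm_memory_slot (~88 bytes on x86). */
struct slot {
        unsigned long base_gfn;
        unsigned long npages;
        char other_fields[72];
};

struct slots {
        int used_slots;
        struct slot memslots[];  /* flexible array, sized at allocation */
};

/* Allocate only as many slot entries as are actually in use. */
static struct slots *alloc_slots(int nr_slots)
{
        struct slots *s;

        s = calloc(1, sizeof(*s) + nr_slots * sizeof(struct slot));
        if (s)
                s->used_slots = nr_slots;
        return s;
}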

The changes required to support dynamic sizing are relatively small,
e.g. are essentially contained in patches 14/15 and 15/15. Patches 1-13
clean up the memslot code, which has gotten quite crusty, especially
__kvm_set_memory_region(). The clean up is likely not strictly necessary
to switch to dynamic sizing, but I didn't have a remotely reasonable
level of confidence in the correctness of the dynamic sizing without first
doing the clean up.

Christoffer, I added your Tested-by to the patches that I was confident
would be fully tested based on the description of what you tested. Let me
know if you disagree with any of 'em.

v3:
- Fix build errors on PPC and MIPS due to missed params during
refactoring [kbuild test robot].
- Rename the helpers for update_memslots() and add comments describing
the new algorithm and how it interacts with searching [Paolo].
- Remove the unnecessary and obnoxious warning regarding memslots being
a flexible array [Paolo].
- Fix typos in the changelog of patch 09/15 [Christoffer].
- Collect tags [Christoffer].

v2:
- Split "Drop kvm_arch_create_memslot()" into three patches to move
minor functional changes to standalone patches [Janosch].
- Rebase to latest kvm/queue (f0574a1cea5b, "KVM: x86: fix ...")
- Collect an Acked-by and a Reviewed-by


Sean Christopherson (15):
KVM: Reinstall old memslots if arch preparation fails
KVM: Don't free new memslot if allocation of said memslot fails
KVM: PPC: Move memslot memory allocation into prepare_memory_region()
KVM: x86: Allocate memslot resources during prepare_memory_region()
KVM: Drop kvm_arch_create_memslot()
KVM: Explicitly free allocated-but-unused dirty bitmap
KVM: Refactor error handling for setting memory region
KVM: Move setting of memslot into helper routine
KVM: Move memslot deletion to helper function
KVM: Simplify kvm_free_memslot() and all its descendents
KVM: Clean up local variable usage in __kvm_set_memory_region()
KVM: Provide common implementation for generic dirty log functions
KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()
KVM: Terminate memslot walks via used_slots
KVM: Dynamically size memslot array based on number of used slots

arch/mips/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/mips.c | 69 +--
arch/powerpc/include/asm/kvm_ppc.h | 17 +-
arch/powerpc/kvm/book3s.c | 22 +-
arch/powerpc/kvm/book3s_hv.c | 36 +-
arch/powerpc/kvm/book3s_pr.c | 20 +-
arch/powerpc/kvm/booke.c | 17 +-
arch/powerpc/kvm/powerpc.c | 13 +-
arch/s390/include/asm/kvm_host.h | 2 +-
arch/s390/kvm/kvm-s390.c | 21 +-
arch/x86/include/asm/kvm_page_track.h | 3 +-
arch/x86/kvm/page_track.c | 15 +-
arch/x86/kvm/x86.c | 101 +----
include/linux/kvm_host.h | 46 +-
virt/kvm/arm/arm.c | 48 +-
virt/kvm/arm/mmu.c | 18 +-
virt/kvm/kvm_main.c | 621 +++++++++++++++++---------
17 files changed, 542 insertions(+), 529 deletions(-)

--
2.22.0


2019-10-25 19:17:01

by Sean Christopherson

Subject: [PATCH v3 14/15] KVM: Terminate memslot walks via used_slots

Refactor memslot handling to treat the number of used slots as the de
facto size of the memslot array, e.g. return NULL from id_to_memslot()
when an invalid index is provided instead of relying on npages==0 to
detect an invalid memslot. Rework the sorting and walking of memslots
in advance of dynamically sizing memslots to aid bisection and debug,
e.g. with luck, a bug in the refactoring will bisect here and/or hit a
WARN instead of randomly corrupting memory.

Alternatively, a global null/invalid memslot could be returned, i.e. so
callers of id_to_memslot() don't have to explicitly check for a NULL
memslot, but that approach runs the risk of introducing difficult-to-
debug issues, e.g. if the global null slot is modified. Constifying
the return from id_to_memslot() to combat such issues is possible, but
would require a massive refactoring of arch specific code and would
still be susceptible to casting shenanigans.

Add function comments to update_memslots() and search_memslots() to
explicitly (and loudly) state how memslots are sorted.

No functional change intended.
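
As a reader aid (not part of the patch), a simplified sketch of a lookup
against the highest-to-lowest GFN sort that the new comments document.
The types and names below are stand-ins for struct kvm_memory_slot and
search_memslots(); the kernel version additionally keeps an LRU index as
a fast path before falling back to the binary search:

struct slot {
        unsigned long base_gfn;
        unsigned long npages;
};

/* Find the slot containing @gfn in an array sorted so that index 0
 * holds the highest base_gfn. */
static struct slot *find_slot(struct slot *slots, int used_slots,
                              unsigned long gfn)
{
        int start = 0, end = used_slots;

        while (start < end) {
                int mid = start + (end - start) / 2;

                if (gfn >= slots[mid].base_gfn)
                        end = mid;      /* answer is slots[mid] or earlier */
                else
                        start = mid + 1;
        }

        if (start < used_slots &&
            gfn >= slots[start].base_gfn &&
            gfn < slots[start].base_gfn + slots[start].npages)
                return &slots[start];
        return NULL;
}

E.g. for base_gfns {0x100000, 0x80000, 0x0} with npages {0x1000, 0x20000,
0x1000}, looking up gfn 0x90000 converges on index 1.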

Tested-by: Christoffer Dall <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/powerpc/kvm/book3s_hv.c | 2 +-
arch/x86/kvm/x86.c | 14 +--
include/linux/kvm_host.h | 18 ++-
virt/kvm/arm/mmu.c | 9 +-
virt/kvm/kvm_main.c | 220 ++++++++++++++++++++++++++---------
5 files changed, 189 insertions(+), 74 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 14906f7c12c5..444c76091f17 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4405,7 +4405,7 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
slots = kvm_memslots(kvm);
memslot = id_to_memslot(slots, log->slot);
r = -ENOENT;
- if (!memslot->dirty_bitmap)
+ if (!memslot || !memslot->dirty_bitmap)
goto out;

/*
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 71f579f63951..6d1cb2211fe1 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9465,9 +9465,9 @@ void kvm_arch_sync_events(struct kvm *kvm)
int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
{
int i, r;
- unsigned long hva;
+ unsigned long hva, uninitialized_var(old_npages);
struct kvm_memslots *slots = kvm_memslots(kvm);
- struct kvm_memory_slot *slot, old;
+ struct kvm_memory_slot *slot;

/* Called with kvm->slots_lock held. */
if (WARN_ON(id >= KVM_MEM_SLOTS_NUM))
@@ -9475,7 +9475,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)

slot = id_to_memslot(slots, id);
if (size) {
- if (slot->npages)
+ if (slot && slot->npages)
return -EEXIST;

/*
@@ -9487,13 +9487,13 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
if (IS_ERR((void *)hva))
return PTR_ERR((void *)hva);
} else {
- if (!slot->npages)
+ if (!slot || !slot->npages)
return 0;

- hva = 0;
+ hva = slot->userspace_addr;
+ old_npages = slot->npages;
}

- old = *slot;
for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
struct kvm_userspace_memory_region m;

@@ -9508,7 +9508,7 @@ int __x86_set_memory_region(struct kvm *kvm, int id, gpa_t gpa, u32 size)
}

if (!size)
- vm_munmap(old.userspace_addr, old.npages * PAGE_SIZE);
+ vm_munmap(hva, old_npages * PAGE_SIZE);

return 0;
}
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a19807a4a5ad..b5d4133a11e1 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -579,10 +579,11 @@ static inline int kvm_vcpu_get_idx(struct kvm_vcpu *vcpu)
BUG();
}

-#define kvm_for_each_memslot(memslot, slots) \
- for (memslot = &slots->memslots[0]; \
- memslot < slots->memslots + KVM_MEM_SLOTS_NUM && memslot->npages;\
- memslot++)
+#define kvm_for_each_memslot(memslot, slots) \
+ for (memslot = &slots->memslots[0]; \
+ memslot < slots->memslots + slots->used_slots; memslot++) \
+ if (WARN_ON_ONCE(!memslot->npages)) { \
+ } else

int kvm_vcpu_init(struct kvm_vcpu *vcpu, struct kvm *kvm, unsigned id);
void kvm_vcpu_uninit(struct kvm_vcpu *vcpu);
@@ -643,12 +644,15 @@ static inline struct kvm_memslots *kvm_vcpu_memslots(struct kvm_vcpu *vcpu)
return __kvm_memslots(vcpu->kvm, as_id);
}

-static inline struct kvm_memory_slot *
-id_to_memslot(struct kvm_memslots *slots, int id)
+static inline
+struct kvm_memory_slot *id_to_memslot(struct kvm_memslots *slots, int id)
{
int index = slots->id_to_index[id];
struct kvm_memory_slot *slot;

+ if (index < 0)
+ return NULL;
+
slot = &slots->memslots[index];

WARN_ON(slot->id != id);
@@ -994,6 +998,8 @@ bool kvm_arch_irqfd_allowed(struct kvm *kvm, struct kvm_irqfd *args);
* used in non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c.
* gfn_to_memslot() itself isn't here as an inline because that would
* bloat other code too much.
+ *
+ * IMPORTANT: Slots are sorted from highest GFN to lowest GFN!
*/
static inline struct kvm_memory_slot *
search_memslots(struct kvm_memslots *slots, gfn_t gfn)
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f3241b268d49..7ea3321cabb8 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -1536,8 +1536,13 @@ void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot)
{
struct kvm_memslots *slots = kvm_memslots(kvm);
struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
- phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
- phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;
+ phys_addr_t start, end;
+
+ if (WARN_ON_ONCE(!memslot))
+ return;
+
+ start = memslot->base_gfn << PAGE_SHIFT;
+ end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;

spin_lock(&kvm->mmu_lock);
stage2_wp_range(kvm, start, end);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0192dccfcec1..7bc88375cf53 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -535,7 +535,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
return NULL;

for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
- slots->id_to_index[i] = slots->memslots[i].id = i;
+ slots->id_to_index[i] = slots->memslots[i].id = -1;

return slots;
}
@@ -807,63 +807,162 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
}

/*
- * Insert memslot and re-sort memslots based on their GFN,
- * so binary search could be used to lookup GFN.
- * Sorting algorithm takes advantage of having initially
- * sorted array and known changed memslot position.
+ * Delete a memslot by decrementing the number of used slots and shifting all
+ * other entries in the array forward one spot.
+ */
+static inline void kvm_memslot_delete(struct kvm_memslots *slots,
+ struct kvm_memory_slot *memslot)
+{
+ struct kvm_memory_slot *mslots = slots->memslots;
+ int i;
+
+ if (WARN_ON(slots->id_to_index[memslot->id] == -1))
+ return;
+
+ slots->used_slots--;
+
+ for (i = slots->id_to_index[memslot->id]; i < slots->used_slots; i++) {
+ mslots[i] = mslots[i + 1];
+ slots->id_to_index[mslots[i].id] = i;
+ }
+ mslots[i] = *memslot;
+ slots->id_to_index[memslot->id] = -1;
+}
+
+/*
+ * "Insert" a new memslot by incrementing the number of used slots. Returns
+ * the new slot's initial index into the memslots array.
+ */
+static inline int kvm_memslot_insert_back(struct kvm_memslots *slots)
+{
+ return slots->used_slots++;
+}
+
+/*
+ * Move a changed memslot backwards in the array by shifting existing slots
+ * with a higher GFN toward the front of the array. Note, the changed memslot
+ * itself is not preserved in the array, i.e. not swapped at this time, only
+ * its new index into the array is tracked. Returns the changed memslot's
+ * current index into the memslots array.
+ */
+static inline int kvm_memslot_move_backward(struct kvm_memslots *slots,
+ struct kvm_memory_slot *memslot)
+{
+ struct kvm_memory_slot *mslots = slots->memslots;
+ int i;
+
+ if (WARN_ON_ONCE(slots->id_to_index[memslot->id] == -1) ||
+ WARN_ON_ONCE(!slots->used_slots))
+ return -1;
+
+ /*
+ * Move the target memslot backward in the array by shifting existing
+ * memslots with a higher GFN (than the target memslot) towards the
+ * front of the array.
+ */
+ for (i = slots->id_to_index[memslot->id]; i < slots->used_slots - 1; i++) {
+ if (memslot->base_gfn > mslots[i + 1].base_gfn)
+ break;
+
+ WARN_ON_ONCE(memslot->base_gfn == mslots[i + 1].base_gfn);
+
+ /* Shift the next memslot forward one and update its index. */
+ mslots[i] = mslots[i + 1];
+ slots->id_to_index[mslots[i].id] = i;
+ }
+ return i;
+}
+
+/*
+ * Move a changed memslot forwards in the array by shifting existing slots with
+ * a lower GFN toward the back of the array. Note, the changed memslot itself
+ * is not preserved in the array, i.e. not swapped at this time, only its new
+ * index into the array is tracked. Returns the changed memslot's final index
+ * into the memslots array.
+ */
+static inline int kvm_memslot_move_forward(struct kvm_memslots *slots,
+ struct kvm_memory_slot *memslot,
+ int start)
+{
+ struct kvm_memory_slot *mslots = slots->memslots;
+ int i;
+
+ for (i = start; i > 0; i--) {
+ if (memslot->base_gfn < mslots[i - 1].base_gfn)
+ break;
+
+ WARN_ON_ONCE(memslot->base_gfn == mslots[i - 1].base_gfn);
+
+ /* Shift the next memslot back one and update its index. */
+ mslots[i] = mslots[i - 1];
+ slots->id_to_index[mslots[i].id] = i;
+ }
+ return i;
+}
+
+/*
+ * Re-sort memslots based on their GFN to account for an added, deleted, or
+ * moved memslot. Sorting memslots by GFN allows using a binary search during
+ * memslot lookup.
+ *
+ * IMPORTANT: Slots are sorted from highest GFN to lowest GFN! I.e. the entry
+ * at memslots[0] has the highest GFN.
+ *
+ * The sorting algorithm takes advantage of having initially sorted memslots
+ * and knowing the position of the changed memslot. Sorting is also optimized
+ * by not swapping the changed memslot and instead only shifting other memslots
+ * and tracking the new index for the changed memslot. Only once its final
+ * index is known is the updated memslot copied into its position in the array.
+ *
+ * - When deleting a memslot, the deleted memslot simply needs to be moved to
+ * the end of the array.
+ *
+ * - When creating a memslot, the algorithm "inserts" the new memslot at the
+ * end of the array and then moves it forward to its correct location.
+ *
+ * - When moving a memslot, the algorithm first moves the updated memslot
+ * backward to handle the scenario where the memslot's GFN was changed to a
+ * lower value. update_memslots() then falls through and runs the same flow
+ * as creating a memslot to move the memslot forward to handle the scenario
+ * where its GFN was changed to a higher value.
+ *
+ * Note, slots are sorted from highest->lowest instead of lowest->highest for
+ * historical reasons. Originally, invalid memslots were denoted by having
+ * GFN=0, thus sorting from highest->lowest naturally sorted invalid memslots
+ * to the end of the array. The current algorithm uses dedicated logic to
+ * delete a memslot and thus does not rely on invalid memslots having GFN=0.
+ *
+ * The other historical motivation for highest->lowest was to improve the
+ * performance of memslot lookup. KVM originally used a linear search starting
+ * at memslots[0]. On x86, the largest memslot usually has one of the highest,
+ * if not *the* highest, GFN, as the bulk of the guest's RAM is located in a
+ * single memslot above the 4gb boundary. As the largest memslot is also the
+ * most likely to be referenced, sorting it to the front of the array was
+ * advantageous. The current binary search starts from the middle of the array
+ * and uses an LRU pointer to improve performance for all memslots and GFNs.
*/
static void update_memslots(struct kvm_memslots *slots,
- struct kvm_memory_slot *new,
+ struct kvm_memory_slot *memslot,
enum kvm_mr_change change)
{
- int id = new->id;
- int i = slots->id_to_index[id];
- struct kvm_memory_slot *mslots = slots->memslots;
+ int i;

- WARN_ON(mslots[i].id != id);
- switch (change) {
- case KVM_MR_CREATE:
- slots->used_slots++;
- WARN_ON(mslots[i].npages || !new->npages);
- break;
- case KVM_MR_DELETE:
- slots->used_slots--;
- WARN_ON(new->npages || !mslots[i].npages);
- break;
- default:
- break;
- }
+ if (change == KVM_MR_DELETE) {
+ kvm_memslot_delete(slots, memslot);
+ } else {
+ if (change == KVM_MR_CREATE)
+ i = kvm_memslot_insert_back(slots);
+ else
+ i = kvm_memslot_move_backward(slots, memslot);
+ i = kvm_memslot_move_forward(slots, memslot, i);

- while (i < KVM_MEM_SLOTS_NUM - 1 &&
- new->base_gfn <= mslots[i + 1].base_gfn) {
- if (!mslots[i + 1].npages)
- break;
- mslots[i] = mslots[i + 1];
- slots->id_to_index[mslots[i].id] = i;
- i++;
+ /*
+ * Copy the memslot to its new position in memslots and update
+ * its index accordingly.
+ */
+ slots->memslots[i] = *memslot;
+ slots->id_to_index[memslot->id] = i;
}
-
- /*
- * The ">=" is needed when creating a slot with base_gfn == 0,
- * so that it moves before all those with base_gfn == npages == 0.
- *
- * On the other hand, if new->npages is zero, the above loop has
- * already left i pointing to the beginning of the empty part of
- * mslots, and the ">=" would move the hole backwards in this
- * case---which is wrong. So skip the loop when deleting a slot.
- */
- if (new->npages) {
- while (i > 0 &&
- new->base_gfn >= mslots[i - 1].base_gfn) {
- mslots[i] = mslots[i - 1];
- slots->id_to_index[mslots[i].id] = i;
- i--;
- }
- } else
- WARN_ON_ONCE(i != slots->used_slots);
-
- mslots[i] = *new;
- slots->id_to_index[mslots[i].id] = i;
}

static int check_memory_region_flags(const struct kvm_userspace_memory_region *mem)
@@ -1042,8 +1141,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
* when the memslots are re-sorted by update_memslots().
*/
tmp = id_to_memslot(__kvm_memslots(kvm, as_id), id);
- old = *tmp;
- tmp = NULL;
+ if (tmp) {
+ old = *tmp;
+ tmp = NULL;
+ } else {
+ memset(&old, 0, sizeof(old));
+ old.id = id;
+ }

if (!mem->memory_size)
return kvm_delete_memslot(kvm, mem, &old, as_id);
@@ -1161,7 +1265,7 @@ int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,

slots = __kvm_memslots(kvm, as_id);
*memslot = id_to_memslot(slots, id);
- if (!(*memslot)->dirty_bitmap)
+ if (!(*memslot) || !(*memslot)->dirty_bitmap)
return -ENOENT;

kvm_arch_sync_dirty_log(kvm, *memslot);
@@ -1219,10 +1323,10 @@ static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)

slots = __kvm_memslots(kvm, as_id);
memslot = id_to_memslot(slots, id);
+ if (!memslot || !memslot->dirty_bitmap)
+ return -ENOENT;

dirty_bitmap = memslot->dirty_bitmap;
- if (!dirty_bitmap)
- return -ENOENT;

kvm_arch_sync_dirty_log(kvm, memslot);

@@ -1330,10 +1434,10 @@ static int kvm_clear_dirty_log_protect(struct kvm *kvm,

slots = __kvm_memslots(kvm, as_id);
memslot = id_to_memslot(slots, id);
+ if (!memslot || !memslot->dirty_bitmap)
+ return -ENOENT;

dirty_bitmap = memslot->dirty_bitmap;
- if (!dirty_bitmap)
- return -ENOENT;

n = ALIGN(log->num_pages, BITS_PER_LONG) / 8;

--
2.22.0

2019-10-25 19:17:08

by Sean Christopherson

Subject: [PATCH v3 13/15] KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()

Rework kvm_get_dirty_log() so that it "returns" the associated memslot
on success. A future patch will rework memslot handling such that
id_to_memslot() can return NULL; returning the memslot makes it more
obvious that the validity of the memslot has been verified, i.e.
precludes the need to add validity checks in the arch code that are
technically unnecessary.
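
A sketch of the resulting caller pattern (called with slots_lock held);
flush_dirty_range() is a hypothetical stand-in for the arch-specific
work that follows, as in the book3s_pr hunk below:

        struct kvm_memory_slot *memslot;
        int is_dirty, r;

        r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot);
        if (r)
                goto out;

        /* On success, memslot is guaranteed valid: no id_to_memslot()
         * call, and thus no NULL check, is needed here. */
        if (is_dirty)
                flush_dirty_range(kvm, memslot->base_gfn,
                                  memslot->base_gfn + memslot->npages);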

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/powerpc/kvm/book3s_pr.c | 6 +-----
arch/s390/kvm/kvm-s390.c | 12 ++----------
include/linux/kvm_host.h | 2 +-
virt/kvm/kvm_main.c | 27 +++++++++++++++++++--------
4 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 5368a5dbac22..f41a136d247f 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1860,7 +1860,6 @@ static int kvmppc_vcpu_run_pr(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
static int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm,
struct kvm_dirty_log *log)
{
- struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
struct kvm_vcpu *vcpu;
ulong ga, ga_end;
@@ -1870,15 +1869,12 @@ static int kvm_vm_ioctl_get_dirty_log_pr(struct kvm *kvm,

mutex_lock(&kvm->slots_lock);

- r = kvm_get_dirty_log(kvm, log, &is_dirty);
+ r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot);
if (r)
goto out;

/* If nothing is dirty, don't bother messing with page tables. */
if (is_dirty) {
- slots = kvm_memslots(kvm);
- memslot = id_to_memslot(slots, log->slot);
-
ga = memslot->base_gfn << PAGE_SHIFT;
ga_end = ga + (memslot->npages << PAGE_SHIFT);

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index c0e9929bdb34..a66eb2b9bf71 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -613,9 +613,8 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
{
int r;
unsigned long n;
- struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
- int is_dirty = 0;
+ int is_dirty;

if (kvm_is_ucontrol(kvm))
return -EINVAL;
@@ -626,14 +625,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (log->slot >= KVM_USER_MEM_SLOTS)
goto out;

- slots = kvm_memslots(kvm);
- memslot = id_to_memslot(slots, log->slot);
- r = -ENOENT;
- if (!memslot->dirty_bitmap)
- goto out;
-
- kvm_arch_sync_dirty_log(kvm, memslot);
- r = kvm_get_dirty_log(kvm, log, &is_dirty);
+ r = kvm_get_dirty_log(kvm, log, &is_dirty, &memslot);
if (r)
goto out;

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cd204b7d2513..a19807a4a5ad 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -809,7 +809,7 @@ void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
- int *is_dirty);
+ int *is_dirty, struct kvm_memory_slot **memslot);
#endif

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9da9c801a237..0192dccfcec1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1136,31 +1136,42 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
}

#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
-int kvm_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log, int *is_dirty)
+/**
+ * kvm_get_dirty_log - get a snapshot of dirty pages
+ * @kvm: pointer to kvm instance
+ * @log: slot id and address to which we copy the log
+ * @is_dirty: set to '1' if any dirty pages were found
+ * @memslot: set to the associated memslot, always valid on success
+ */
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
+ int *is_dirty, struct kvm_memory_slot **memslot)
{
struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
int i, as_id, id;
unsigned long n;
unsigned long any = 0;

+ *memslot = NULL;
+ *is_dirty = 0;
+
as_id = log->slot >> 16;
id = (u16)log->slot;
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_USER_MEM_SLOTS)
return -EINVAL;

slots = __kvm_memslots(kvm, as_id);
- memslot = id_to_memslot(slots, id);
- if (!memslot->dirty_bitmap)
+ *memslot = id_to_memslot(slots, id);
+ if (!(*memslot)->dirty_bitmap)
return -ENOENT;

- n = kvm_dirty_bitmap_bytes(memslot);
+ kvm_arch_sync_dirty_log(kvm, *memslot);
+
+ n = kvm_dirty_bitmap_bytes(*memslot);

for (i = 0; !any && i < n/sizeof(long); ++i)
- any = memslot->dirty_bitmap[i];
+ any = (*memslot)->dirty_bitmap[i];

- if (copy_to_user(log->dirty_bitmap, memslot->dirty_bitmap, n))
+ if (copy_to_user(log->dirty_bitmap, (*memslot)->dirty_bitmap, n))
return -EFAULT;

if (any)
--
2.22.0

2019-10-25 19:17:18

by Sean Christopherson

Subject: [PATCH v3 12/15] KVM: Provide common implementation for generic dirty log functions

Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
The arch specific implemenations are extremely similar, differing
only in whether the dirty log needs to be sync'd from hardware (x86)
and how the TLBs are flushed. Add new arch hooks to handle sync
and TLB flush; the sync will also be used for non-generic dirty log
support in a future patch (s390).

The ulterior motive for providing a common implementation is to
eliminate the dependency between arch and common code with respect to
the memslot referenced by the dirty log, i.e. to make it obvious in the
code that the validity of the memslot is guaranteed, as a future patch
will rework memslot handling such that id_to_memslot() can return NULL.
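
The contract of the two new hooks, sketched with bodies drawn from the
arm and x86 hunks below; an arch with no hardware-cached dirty state
(e.g. the PPC stubs) simply leaves the sync hook empty:

/* Pull any hardware-cached dirty bits into memslot->dirty_bitmap before
 * the common code snapshots it. */
void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
}

/* Flush TLBs after dirty bits have been cleared so that subsequent
 * writes fault and are logged for the next read of the dirty log. */
void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
                                  struct kvm_memory_slot *memslot)
{
        kvm_flush_remote_tlbs(kvm);
}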

Acked-by: Christoffer Dall <[email protected]>
Tested-by: Christoffer Dall <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/kvm/mips.c | 63 +++--------------------------
arch/powerpc/kvm/book3s.c | 5 +++
arch/powerpc/kvm/booke.c | 5 +++
arch/s390/kvm/kvm-s390.c | 5 +--
arch/x86/kvm/x86.c | 61 ++--------------------------
include/linux/kvm_host.h | 21 +++++-----
virt/kvm/arm/arm.c | 48 ++--------------------
virt/kvm/kvm_main.c | 84 ++++++++++++++++++++++++++++++++-------
8 files changed, 103 insertions(+), 189 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 713e5465edb0..6bb367a4e51c 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -965,69 +965,16 @@ long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl,
return r;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Copy the snapshot to the userspace.
- * 4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
- struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
- bool flush = false;
- int r;

- mutex_lock(&kvm->slots_lock);
-
- r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
- if (flush) {
- slots = kvm_memslots(kvm);
- memslot = id_to_memslot(slots, log->slot);
-
- /* Let implementation handle TLB/GVA invalidation */
- kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
- }
-
- mutex_unlock(&kvm->slots_lock);
- return r;
}

-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
{
- struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
- r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
- if (flush) {
- slots = kvm_memslots(kvm);
- memslot = id_to_memslot(slots, log->slot);
-
- /* Let implementation handle TLB/GVA invalidation */
- kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
- }
-
- mutex_unlock(&kvm->slots_lock);
- return r;
+ /* Let implementation handle TLB/GVA invalidation */
+ kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
}

long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index a5d4a1014fdf..84f66c1592a3 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -829,6 +829,11 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
}

+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+
+}
+
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index a22ff567724a..35a4ef89a1db 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1796,6 +1796,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return r;
}

+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+
+}
+
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
return -ENOTSUPP;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 21f61dae7db1..c0e9929bdb34 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -572,8 +572,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
return r;
}

-static void kvm_s390_sync_dirty_log(struct kvm *kvm,
- struct kvm_memory_slot *memslot)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
int i;
gfn_t cur_gfn, last_gfn;
@@ -633,7 +632,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (!memslot->dirty_bitmap)
goto out;

- kvm_s390_sync_dirty_log(kvm, memslot);
+ kvm_arch_sync_dirty_log(kvm, memslot);
r = kvm_get_dirty_log(kvm, log, &is_dirty);
if (r)
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b2a66c0143f9..71f579f63951 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4642,77 +4642,24 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
return 0;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Copy the snapshot to the userspace.
- * 4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
/*
* Flush potentially hardware-cached dirty pages to dirty_bitmap.
*/
if (kvm_x86_ops->flush_log_dirty)
kvm_x86_ops->flush_log_dirty(kvm);
-
- r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
- /*
- * All the TLBs can be flushed out of mmu lock, see the comments in
- * kvm_mmu_slot_remove_write_access().
- */
- lockdep_assert_held(&kvm->slots_lock);
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
}

-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
{
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
- /*
- * Flush potentially hardware-cached dirty pages to dirty_bitmap.
- */
- if (kvm_x86_ops->flush_log_dirty)
- kvm_x86_ops->flush_log_dirty(kvm);
-
- r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
/*
* All the TLBs can be flushed out of mmu lock, see the comments in
* kvm_mmu_slot_remove_write_access().
*/
lockdep_assert_held(&kvm->slots_lock);
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
+ kvm_flush_remote_tlbs(kvm);
}

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9ee4eabaf457..cd204b7d2513 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -797,23 +797,20 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);

-int kvm_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log, int *is_dirty);
-
-int kvm_get_dirty_log_protect(struct kvm *kvm,
- struct kvm_dirty_log *log, bool *flush);
-int kvm_clear_dirty_log_protect(struct kvm *kvm,
- struct kvm_clear_dirty_log *log, bool *flush);
-
void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset,
unsigned long mask);
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);

-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log);
-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
- struct kvm_clear_dirty_log *log);
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+ struct kvm_memory_slot *memslot);
+#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
+ int *is_dirty);
+#endif

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
bool line_status);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 86c6aa1cb58e..8b4c06a33842 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1203,55 +1203,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return r;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Copy the snapshot to the userspace.
- * 4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
- bool flush = false;
- int r;

- mutex_lock(&kvm->slots_lock);
-
- r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
}

-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm,
+ struct kvm_memory_slot *memslot)
{
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
- r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
+ kvm_flush_remote_tlbs(kvm);
}

static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 04c4a4b01ae8..9da9c801a237 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -793,7 +793,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)

/*
* Allocation size is twice as large as the actual dirty bitmap size.
- * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
+ * See kvm_vm_ioctl_get_dirty_log() why this is needed.
*/
static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
{
@@ -1135,6 +1135,7 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
return kvm_set_memory_region(kvm, mem);
}

+#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
int kvm_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log, int *is_dirty)
{
@@ -1168,13 +1169,12 @@ int kvm_get_dirty_log(struct kvm *kvm,
}
EXPORT_SYMBOL_GPL(kvm_get_dirty_log);

-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+#else /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
/**
* kvm_get_dirty_log_protect - get a snapshot of dirty pages
* and reenable dirty page tracking for the corresponding pages.
* @kvm: pointer to kvm instance
* @log: slot id and address to which we copy the log
- * @flush: true if TLB flush is needed by caller
*
* We need to keep it in mind that VCPU threads can write to the bitmap
* concurrently. So, to avoid losing track of dirty pages we keep the
@@ -1191,8 +1191,7 @@ EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
* exiting to userspace will be logged for the next call.
*
*/
-int kvm_get_dirty_log_protect(struct kvm *kvm,
- struct kvm_dirty_log *log, bool *flush)
+static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
@@ -1200,6 +1199,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
unsigned long n;
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_buffer;
+ bool flush;

as_id = log->slot >> 16;
id = (u16)log->slot;
@@ -1213,8 +1213,10 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
if (!dirty_bitmap)
return -ENOENT;

+ kvm_arch_sync_dirty_log(kvm, memslot);
+
n = kvm_dirty_bitmap_bytes(memslot);
- *flush = false;
+ flush = false;
if (kvm->manual_dirty_log_protect) {
/*
* Unlike kvm_get_dirty_log, we always return false in *flush,
@@ -1237,7 +1239,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
if (!dirty_bitmap[i])
continue;

- *flush = true;
+ flush = true;
mask = xchg(&dirty_bitmap[i], 0);
dirty_bitmap_buffer[i] = mask;

@@ -1248,21 +1250,55 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
spin_unlock(&kvm->mmu_lock);
}

+ if (flush)
+ kvm_arch_dirty_log_tlb_flush(kvm, memslot);
+
if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
return -EFAULT;
return 0;
}
-EXPORT_SYMBOL_GPL(kvm_get_dirty_log_protect);
+
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * Steps 1-4 below provide general overview of dirty page logging. See
+ * kvm_get_dirty_log_protect() function description for additional details.
+ *
+ * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
+ * always flush the TLB (step 4) even if previous step failed and the dirty
+ * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
+ * does not preclude user space subsequent dirty log read. Flushing TLB ensures
+ * writes will be marked dirty for next log read.
+ *
+ * 1. Take a snapshot of the bit and clear it if needed.
+ * 2. Write protect the corresponding page.
+ * 3. Copy the snapshot to the userspace.
+ * 4. Flush TLB's if needed.
+ */
+static int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+ struct kvm_dirty_log *log)
+{
+ int r;
+
+ mutex_lock(&kvm->slots_lock);
+
+ r = kvm_get_dirty_log_protect(kvm, log);
+
+ mutex_unlock(&kvm->slots_lock);
+ return r;
+}

/**
* kvm_clear_dirty_log_protect - clear dirty bits in the bitmap
* and reenable dirty page tracking for the corresponding pages.
* @kvm: pointer to kvm instance
* @log: slot id and address from which to fetch the bitmap of dirty pages
- * @flush: true if TLB flush is needed by caller
*/
-int kvm_clear_dirty_log_protect(struct kvm *kvm,
- struct kvm_clear_dirty_log *log, bool *flush)
+static int kvm_clear_dirty_log_protect(struct kvm *kvm,
+ struct kvm_clear_dirty_log *log)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
@@ -1271,6 +1307,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
unsigned long i, n;
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_buffer;
+ bool flush;

as_id = log->slot >> 16;
id = (u16)log->slot;
@@ -1294,7 +1331,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
(log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
return -EINVAL;

- *flush = false;
+ kvm_arch_sync_dirty_log(kvm, memslot);
+
+ flush = false;
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
return -EFAULT;
@@ -1317,17 +1356,32 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
* a problem if userspace sets them in log->dirty_bitmap.
*/
if (mask) {
- *flush = true;
+ flush = true;
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
offset, mask);
}
}
spin_unlock(&kvm->mmu_lock);

+ if (flush)
+ kvm_arch_dirty_log_tlb_flush(kvm, memslot);
+
return 0;
}
-EXPORT_SYMBOL_GPL(kvm_clear_dirty_log_protect);
-#endif
+
+static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
+ struct kvm_clear_dirty_log *log)
+{
+ int r;
+
+ mutex_lock(&kvm->slots_lock);
+
+ r = kvm_clear_dirty_log_protect(kvm, log);
+
+ mutex_unlock(&kvm->slots_lock);
+ return r;
+}
+#endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */

bool kvm_largepages_enabled(void)
{
--
2.22.0

2019-10-25 19:17:39

by Sean Christopherson

Subject: [PATCH v3 07/15] KVM: Refactor error handling for setting memory region

Replace a big pile o' gotos with returns to make it more obvious what
error code is being returned, and to prepare for refactoring the
functional, i.e. post-checks, portion of __kvm_set_memory_region().
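
The transformation in miniature, both forms taken from the hunks below:

        /* Before: 'r' is loaded well before the check that uses it. */
        r = -EINVAL;
        if (mem->memory_size & (PAGE_SIZE - 1))
                goto out;
        ...
out:
        return r;

        /* After: the error code is visible at the point of failure. */
        if (mem->memory_size & (PAGE_SIZE - 1))
                return -EINVAL;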

Reviewed-by: Janosch Frank <[email protected]>
Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 40 ++++++++++++++++++----------------------
1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a43902d9036d..e2f47d60f696 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -942,34 +942,33 @@ int __kvm_set_memory_region(struct kvm *kvm,

r = check_memory_region_flags(mem);
if (r)
- goto out;
+ return r;

- r = -EINVAL;
as_id = mem->slot >> 16;
id = (u16)mem->slot;

/* General sanity checks */
if (mem->memory_size & (PAGE_SIZE - 1))
- goto out;
+ return -EINVAL;
if (mem->guest_phys_addr & (PAGE_SIZE - 1))
- goto out;
+ return -EINVAL;
/* We can read the guest memory with __xxx_user() later on. */
if ((id < KVM_USER_MEM_SLOTS) &&
((mem->userspace_addr & (PAGE_SIZE - 1)) ||
!access_ok((void __user *)(unsigned long)mem->userspace_addr,
mem->memory_size)))
- goto out;
+ return -EINVAL;
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
- goto out;
+ return -EINVAL;
if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
- goto out;
+ return -EINVAL;

slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
npages = mem->memory_size >> PAGE_SHIFT;

if (npages > KVM_MEM_MAX_NR_PAGES)
- goto out;
+ return -EINVAL;

new = old = *slot;

@@ -986,20 +985,18 @@ int __kvm_set_memory_region(struct kvm *kvm,
if ((new.userspace_addr != old.userspace_addr) ||
(npages != old.npages) ||
((new.flags ^ old.flags) & KVM_MEM_READONLY))
- goto out;
+ return -EINVAL;

if (base_gfn != old.base_gfn)
change = KVM_MR_MOVE;
else if (new.flags != old.flags)
change = KVM_MR_FLAGS_ONLY;
- else { /* Nothing to change. */
- r = 0;
- goto out;
- }
+ else /* Nothing to change. */
+ return 0;
}
} else {
if (!old.npages)
- goto out;
+ return -EINVAL;

change = KVM_MR_DELETE;
new.base_gfn = 0;
@@ -1008,29 +1005,29 @@ int __kvm_set_memory_region(struct kvm *kvm,

if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
/* Check for overlaps */
- r = -EEXIST;
kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
if (slot->id == id)
continue;
if (!((base_gfn + npages <= slot->base_gfn) ||
(base_gfn >= slot->base_gfn + slot->npages)))
- goto out;
+ return -EEXIST;
}
}

- r = -ENOMEM;
-
/* Allocate/free page dirty bitmap as needed */
if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
new.dirty_bitmap = NULL;
else if (!new.dirty_bitmap) {
- if (kvm_create_dirty_bitmap(&new) < 0)
- goto out;
+ r = kvm_create_dirty_bitmap(&new);
+ if (r)
+ return r;
}

slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
- if (!slots)
+ if (!slots) {
+ r = -ENOMEM;
goto out_bitmap;
+ }
memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));

if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1081,7 +1078,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
out_bitmap:
if (new.dirty_bitmap && !old.dirty_bitmap)
kvm_destroy_dirty_bitmap(&new);
-out:
return r;
}
EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
--
2.22.0

2019-10-25 19:17:43

by Sean Christopherson

Subject: [PATCH v3 06/15] KVM: Explicitly free allocated-but-unused dirty bitmap

Explicitly free an allocated-but-unused dirty bitmap instead of relying
on kvm_free_memslot() if an error occurs in __kvm_set_memory_region().
There is no longer a need to abuse kvm_free_memslot() to free arch
specific resources as arch specific code is now called only after the
common flow is guaranteed to succeed. Arch code can still fail, but
it's responsible for its own cleanup in that case.

Eliminating the error path's abuse of kvm_free_memslot() paves the way
for simplifying kvm_free_memslot(), i.e. dropping its @dont param.
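
The resulting error-path rule, in brief: free new.dirty_bitmap only when
this call allocated it, i.e. when the old memslot had none (see the
out_bitmap label below):

        if (new.dirty_bitmap && !old.dirty_bitmap)
                kvm_destroy_dirty_bitmap(&new);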

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 9d6af9044304..a43902d9036d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1030,7 +1030,7 @@ int __kvm_set_memory_region(struct kvm *kvm,

slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
if (!slots)
- goto out_free;
+ goto out_bitmap;
memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));

if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1078,8 +1078,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
slots = install_new_memslots(kvm, as_id, slots);
kvfree(slots);
-out_free:
- kvm_free_memslot(kvm, &new, &old);
+out_bitmap:
+ if (new.dirty_bitmap && !old.dirty_bitmap)
+ kvm_destroy_dirty_bitmap(&new);
out:
return r;
}
--
2.22.0

2019-10-25 19:17:56

by Sean Christopherson

Subject: [PATCH v3 04/15] KVM: x86: Allocate memslot resources during prepare_memory_region()

Allocate the various metadata structures associated with a memslot
during kvm_arch_prepare_memory_region(), which paves the way for
removing kvm_arch_create_memslot() altogether. Moving x86's memory
allocation only changes the order of kernel memory allocations between
x86 and common KVM code.

No functional change intended.
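
For context, a rough sketch of the ordering in the common flow after
this change (heavily simplified from __kvm_set_memory_region()); the
point is that x86's allocations now happen in the "prepare" step, before
any memslot state is committed:

        r = kvm_arch_prepare_memory_region(kvm, &new, mem, change);
        if (r)
                goto out_slots; /* nothing committed, nothing to unwind */

        update_memslots(slots, &new, change);
        install_new_memslots(kvm, as_id, slots);
        kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);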

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 19a0dc96beca..fc63b1f07ba9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9628,6 +9628,12 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,

int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long npages)
+{
+ return 0;
+}
+
+static int kvm_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
+ unsigned long npages)
{
int i;

@@ -9705,6 +9711,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
enum kvm_mr_change change)
{
+ if (change == KVM_MR_CREATE)
+ return kvm_create_memslot(kvm, memslot,
+ mem->memory_size >> PAGE_SHIFT);
return 0;
}

--
2.22.0

2019-10-25 20:16:22

by Christoffer Dall

Subject: Re: [PATCH v3 00/15] KVM: Dynamically size memslot arrays

On Thu, Oct 24, 2019 at 04:07:29PM -0700, Sean Christopherson wrote:
> The end goal of this series is to dynamically size the memslot array so
> that KVM allocates memory based on the number of memslots in use, as
> opposed to unconditionally allocating memory for the maximum number of
> memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
> spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
> E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
> the memory footprint from 90k to ~2.6k bytes.
>
> The changes required to support dynamic sizing are relatively small,
> e.g. are essentially contained in patches 14/15 and 15/15. Patches 1-13
> clean up the memslot code, which has gotten quite crusty, especially
> __kvm_set_memory_region(). The clean up is likely not strictly necessary
> to switch to dynamic sizing, but I didn't have a remotely reasonable
> level of confidence in the correctness of the dynamic sizing without first
> doing the clean up.
>
> Christoffer, I added your Tested-by to the patches that I was confident
> would be fully tested based on the description of what you tested. Let me
> know if you disagree with any of 'em.
>
The only testing I've done of patch 9 would be via the vm_free part of
kvm selftest, so not sure how valid that is, but sure.

Looks fine otherwise.


Thanks,

Christoffer

2019-10-25 20:46:48

by Marc Zyngier

Subject: Re: [PATCH v3 00/15] KVM: Dynamically size memslot arrays

On 2019-10-25 00:07, Sean Christopherson wrote:
> The end goal of this series is to dynamically size the memslot array so
> that KVM allocates memory based on the number of memslots in use, as
> opposed to unconditionally allocating memory for the maximum number of
> memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
> spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
> E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
> the memory footprint from 90k to ~2.6k bytes.
>
> The changes required to support dynamic sizing are relatively small,
> e.g. are essentially contained in patches 14/15 and 15/15. Patches 1-13
> clean up the memslot code, which has gotten quite crusty, especially
> __kvm_set_memory_region(). The clean up is likely not strictly necessary
> to switch to dynamic sizing, but I didn't have a remotely reasonable
> level of confidence in the correctness of the dynamic sizing without first
> doing the clean up.

I've finally found time to test this on a garden variety of arm64 boxes,
and nothing caught fire. It surely must be doing something right!

FWIW:

Tested-by: Marc Zyngier <[email protected]>

M.
--
Jazz is not dead. It just smells funny...

2019-12-03 22:15:16

by Sean Christopherson

Subject: Re: [PATCH v3 00/15] KVM: Dynamically size memslot arrays

On Thu, Oct 24, 2019 at 04:07:29PM -0700, Sean Christopherson wrote:
> The end goal of this series is to dynamically size the memslot array so
> that KVM allocates memory based on the number of memslots in use, as
> opposed to unconditionally allocating memory for the maximum number of
> memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
> spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
> E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
> the memory footprint from 90k to ~2.6k bytes.
>
> The changes required to support dynamic sizing are relatively small,
> e.g. are essentially contained in patches 14/15 and 15/15. Patches 1-13
> clean up the memslot code, which has gotten quite crusty, especially
> __kvm_set_memory_region(). The clean up is likely not strictly necessary
> to switch to dynamic sizing, but I didn't have a remotely reasonable
> level of confidence in the correctness of the dynamic sizing without first
> doing the clean up.
>
> Christoffer, I added your Tested-by to the patches that I was confident
> would be fully tested based on the description of what you tested. Let me
> know if you disagree with any of 'em.
>
> v3:
> - Fix build errors on PPC and MIPS due to missed params during
> refactoring [kbuild test robot].
> - Rename the helpers for update_memslots() and add comments describing
> the new algorithm and how it interacts with searching [Paolo].
> - Remove the unnecessary and obnoxious warning regarding memslots being
> a flexible array [Paolo].
> - Fix typos in the changelog of patch 09/15 [Christoffer].
> - Collect tags [Christoffer].
>
> v2:
> - Split "Drop kvm_arch_create_memslot()" into three patches to move
> minor functional changes to standalone patches [Janosch].
> - Rebase to latest kvm/queue (f0574a1cea5b, "KVM: x86: fix ...")
> - Collect an Acked-by and a Reviewed-by

Paolo, do you want me to rebase this to the latest kvm/queue?

2019-12-05 09:51:00

by Philippe Mathieu-Daudé

[permalink] [raw]
Subject: Re: [PATCH v3 07/15] KVM: Refactor error handling for setting memory region

On 10/25/19 1:07 AM, Sean Christopherson wrote:
> Replace a big pile o' gotos with returns to make it more obvious what
> error code is being returned, and to prepare for refactoring the
> functional, i.e. post-checks, portion of __kvm_set_memory_region().
>
> Reviewed-by: Janosch Frank <[email protected]>
> Signed-off-by: Sean Christopherson <[email protected]>
> ---
> virt/kvm/kvm_main.c | 40 ++++++++++++++++++----------------------
> 1 file changed, 18 insertions(+), 22 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a43902d9036d..e2f47d60f696 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -942,34 +942,33 @@ int __kvm_set_memory_region(struct kvm *kvm,
>
> r = check_memory_region_flags(mem);
> if (r)
> - goto out;
> + return r;
>
> - r = -EINVAL;
> as_id = mem->slot >> 16;
> id = (u16)mem->slot;
>
> /* General sanity checks */
> if (mem->memory_size & (PAGE_SIZE - 1))
> - goto out;
> + return -EINVAL;
> if (mem->guest_phys_addr & (PAGE_SIZE - 1))
> - goto out;
> + return -EINVAL;
> /* We can read the guest memory with __xxx_user() later on. */
> if ((id < KVM_USER_MEM_SLOTS) &&
> ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
> !access_ok((void __user *)(unsigned long)mem->userspace_addr,
> mem->memory_size)))
> - goto out;
> + return -EINVAL;
> if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
> - goto out;
> + return -EINVAL;
> if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
> - goto out;
> + return -EINVAL;
>
> slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
> base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
> npages = mem->memory_size >> PAGE_SHIFT;
>
> if (npages > KVM_MEM_MAX_NR_PAGES)
> - goto out;
> + return -EINVAL;
>
> new = old = *slot;
>
> @@ -986,20 +985,18 @@ int __kvm_set_memory_region(struct kvm *kvm,
> if ((new.userspace_addr != old.userspace_addr) ||
> (npages != old.npages) ||
> ((new.flags ^ old.flags) & KVM_MEM_READONLY))
> - goto out;
> + return -EINVAL;
>
> if (base_gfn != old.base_gfn)
> change = KVM_MR_MOVE;
> else if (new.flags != old.flags)
> change = KVM_MR_FLAGS_ONLY;
> - else { /* Nothing to change. */
> - r = 0;
> - goto out;
> - }
> + else /* Nothing to change. */
> + return 0;
> }
> } else {
> if (!old.npages)
> - goto out;
> + return -EINVAL;
>
> change = KVM_MR_DELETE;
> new.base_gfn = 0;
> @@ -1008,29 +1005,29 @@ int __kvm_set_memory_region(struct kvm *kvm,
>
> if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
> /* Check for overlaps */
> - r = -EEXIST;
> kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
> if (slot->id == id)
> continue;
> if (!((base_gfn + npages <= slot->base_gfn) ||
> (base_gfn >= slot->base_gfn + slot->npages)))
> - goto out;
> + return -EEXIST;
> }
> }
>
> - r = -ENOMEM;
> -
> /* Allocate/free page dirty bitmap as needed */
> if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
> new.dirty_bitmap = NULL;
> else if (!new.dirty_bitmap) {
> - if (kvm_create_dirty_bitmap(&new) < 0)
> - goto out;
> + r = kvm_create_dirty_bitmap(&new);
> + if (r)
> + return r;

Reviewed-by: Philippe Mathieu-Daudé <[email protected]>

> }
>
> slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> - if (!slots)
> + if (!slots) {
> + r = -ENOMEM;
> goto out_bitmap;
> + }
> memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
>
> if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
> @@ -1081,7 +1078,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
> out_bitmap:
> if (new.dirty_bitmap && !old.dirty_bitmap)
> kvm_destroy_dirty_bitmap(&new);
> -out:
> return r;
> }
> EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
>

2019-12-13 20:02:52

by Sean Christopherson

Subject: Re: [PATCH v3 00/15] KVM: Dynamically size memslot arrays

On Tue, Dec 03, 2019 at 02:14:33PM -0800, Sean Christopherson wrote:
> On Thu, Oct 24, 2019 at 04:07:29PM -0700, Sean Christopherson wrote:
> > The end goal of this series is to dynamically size the memslot array so
> > that KVM allocates memory based on the number of memslots in use, as
> > opposed to unconditionally allocating memory for the maximum number of
> > memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
> > spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
> > E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
> > the memory footprint from 90k to ~2.6k bytes.
> >
> > The changes required to support dynamic sizing are relatively small,
> > e.g. are essentially contained in patches 14/15 and 15/15. Patches 1-13
> > clean up the memslot code, which has gotten quite crusty, especially
> > __kvm_set_memory_region(). The clean up is likely not strictly necessary
> > to switch to dynamic sizing, but I didn't have a remotely reasonable
> > level of confidence in the correctness of the dynamic sizing without first
> > doing the clean up.
> >
> > Christoffer, I added your Tested-by to the patches that I was confident
> > would be fully tested based on the description of what you tested. Let me
> > know if you disagree with any of 'em.
> >
> > v3:
> > - Fix build errors on PPC and MIPS due to missed params during
> > refactoring [kbuild test robot].
> > - Rename the helpers for update_memslots() and add comments describing
> > the new algorithm and how it interacts with searching [Paolo].
> > - Remove the unnecessary and obnoxious warning regarding memslots being
> > a flexible array [Paolo].
> > - Fix typos in the changelog of patch 09/15 [Christoffer].
> > - Collect tags [Christoffer].
> >
> > v2:
> > - Split "Drop kvm_arch_create_memslot()" into three patches to move
> > minor functional changes to standalone patches [Janosch].
> > - Rebase to latest kvm/queue (f0574a1cea5b, "KVM: x86: fix ...")
> > - Collect an Acked-by and a Reviewed-by
>
> Paolo, do you want me to rebase this to the latest kvm/queue?

Ping.

Applies cleanly on the current kvm/queue and nothing caught fire in
testing (though I only re-tested the series as a whole).