2019-09-11 18:54:02

by Sean Christopherson

Subject: [PATCH 00/13] KVM: Dynamically size memslot arrays

The end goal of this series is to dynamically size the memslot array so
that KVM allocates memory based on the number of memslots in use, as
opposed to unconditionally allocating memory for the maximum number of
memslots. On x86, each memslot consumes 88 bytes, and so with 2 address
spaces of 512 memslots, each VM consumes ~90k bytes for the memslots.
E.g. given a VM that uses a total of 30 memslots, dynamic sizing reduces
the memory footprint from 90k to ~2.6k bytes.
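
The arithmetic behind those figures can be checked with a small userspace
sketch. The constants come from the paragraph above; the enum and function
names are illustrative assumptions, not kernel identifiers, and the sketch
ignores any fixed overhead of the structure that contains the array.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the cover letter for x86; names are hypothetical. */
enum {
	MEMSLOT_BYTES  = 88,	/* bytes consumed by one memslot */
	ADDRESS_SPACES = 2,	/* address spaces per VM */
	SLOTS_PER_AS   = 512,	/* maximum memslots per address space */
};

/* Static sizing: every address space always pays for the maximum. */
static size_t static_footprint(void)
{
	return (size_t)MEMSLOT_BYTES * SLOTS_PER_AS * ADDRESS_SPACES;
}

/* Dynamic sizing: pay only for the slots actually in use. */
static size_t dynamic_footprint(size_t used_slots)
{
	assert(used_slots <= (size_t)SLOTS_PER_AS * ADDRESS_SPACES);
	return (size_t)MEMSLOT_BYTES * used_slots;
}
```

With these inputs, static_footprint() gives 90112 bytes (the "~90k" above)
and dynamic_footprint(30) gives 2640 bytes (the "~2.6k").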

The changes required to support dynamic sizing are relatively small,
and are essentially contained in patches 12/13 and 13/13. Patches 1-11
clean up the memslot code, which has gotten quite crusty, especially
__kvm_set_memory_region(). The cleanup is likely not strictly necessary
to switch to dynamic sizing, but I didn't have a remotely reasonable
level of confidence in the correctness of the dynamic sizing without first
doing the cleanup.

Testing, especially on non-x86 platforms, would be greatly appreciated. I'd
really like to get at least one Tested-by from each architecture. The
non-x86 changes are for all intents and purposes untested, i.e. I
compile-tested pieces of the code by copying them into x86, but that's it. In
theory, the vast majority of the functional changes are arch agnostic, in
theory...

Sean Christopherson (13):
KVM: Reinstall old memslots if arch preparation fails
KVM: PPC: Move memslot memory allocation into prepare_memory_region()
KVM: x86: Allocate memslot resources during prepare_memory_region()
KVM: Drop kvm_arch_create_memslot()
KVM: Refactor error handling for setting memory region
KVM: Move setting of memslot into helper routine
KVM: Move memslot deletion to helper function
KVM: Simplify kvm_free_memslot() and all its descendents
KVM: Clean up local variable usage in __kvm_set_memory_region()
KVM: Provide common implementation for generic dirty log functions
KVM: Ensure validity of memslot with respect to kvm_get_dirty_log()
KVM: Terminate memslot walks via used_slots
KVM: Dynamically size memslot array based on number of used slots

arch/mips/include/asm/kvm_host.h | 2 +-
arch/mips/kvm/mips.c | 68 +---
arch/powerpc/include/asm/kvm_ppc.h | 14 +-
arch/powerpc/kvm/book3s.c | 22 +-
arch/powerpc/kvm/book3s_hv.c | 36 +-
arch/powerpc/kvm/book3s_pr.c | 20 +-
arch/powerpc/kvm/booke.c | 17 +-
arch/powerpc/kvm/powerpc.c | 13 +-
arch/s390/include/asm/kvm_host.h | 2 +-
arch/s390/kvm/kvm-s390.c | 21 +-
arch/x86/include/asm/kvm_page_track.h | 3 +-
arch/x86/kvm/page_track.c | 15 +-
arch/x86/kvm/x86.c | 100 ++---
include/linux/kvm_host.h | 48 +--
virt/kvm/arm/arm.c | 47 +--
virt/kvm/arm/mmu.c | 18 +-
virt/kvm/kvm_main.c | 546 ++++++++++++++++----------
17 files changed, 467 insertions(+), 525 deletions(-)

--
2.22.0


2019-09-11 18:54:21

by Sean Christopherson

Subject: [PATCH 06/13] KVM: Move setting of memslot into helper routine

Split out the core functionality of setting a memslot into a separate
helper in preparation for moving memslot deletion into its own routine.

Signed-off-by: Sean Christopherson <[email protected]>
---
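The flow that the new helper encapsulates (copy the table, optionally
publish an invalidated copy, then either commit or roll back) can be modelled
in userspace roughly as follows. This is only a sketch: struct slot_table,
install() and set_slot() are hypothetical stand-ins, there is no RCU or
locking, and prepare_result simply simulates the return value of the arch
prepare hook.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NR_SLOTS	4	/* hypothetical table size */
#define SLOT_INVALID	0x1u	/* stand-in for KVM_MEMSLOT_INVALID */

struct slot { unsigned int flags; unsigned long npages; };
struct slot_table { struct slot slots[NR_SLOTS]; };

/* Make new_table the active one and hand back the table it replaced. */
static struct slot_table *install(struct slot_table **active,
				  struct slot_table *new_table)
{
	struct slot_table *old = *active;

	*active = new_table;
	return old;
}

static int set_slot(struct slot_table **active, int id,
		    const struct slot *new_slot, int delete_or_move,
		    int prepare_result)
{
	struct slot_table *work;

	assert(id >= 0 && id < NR_SLOTS);

	work = malloc(sizeof(*work));
	if (!work)
		return -1;
	memcpy(work, *active, sizeof(*work));

	if (delete_or_move) {
		/* The flag goes into the copy, never into the original. */
		work->slots[id].flags |= SLOT_INVALID;
		/* Publish the copy; work now holds the original table. */
		work = install(active, work);
	}

	if (prepare_result) {
		/* Roll back: put the untouched original table back. */
		if (delete_or_move)
			work = install(active, work);
		free(work);
		return prepare_result;
	}

	/* Commit: update the working table and publish it. */
	work->slots[id] = *new_slot;
	work = install(active, work);
	free(work);
	return 0;
}
```

The table passed in via *active must be heap-allocated, because whichever
table ends up unused is released with free(). After a simulated prepare
failure the active table is the original, with the invalid flag never
having been set in it.
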
virt/kvm/kvm_main.c | 106 ++++++++++++++++++++++++++------------------
1 file changed, 63 insertions(+), 43 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8306ce3345a6..693f3d20e710 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -907,6 +907,66 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
return old_memslots;
}

+static int kvm_set_memslot(struct kvm *kvm,
+ const struct kvm_userspace_memory_region *mem,
+ const struct kvm_memory_slot *old,
+ struct kvm_memory_slot *new, int as_id,
+ enum kvm_mr_change change)
+{
+ struct kvm_memory_slot *slot;
+ struct kvm_memslots *slots;
+ int r;
+
+ slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
+ if (!slots)
+ return -ENOMEM;
+ memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
+
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
+ /*
+ * Note, the INVALID flag needs to be in the appropriate entry
+ * in the freshly allocated memslots, not in @old or @new.
+ */
+ slot = id_to_memslot(slots, old->id);
+ slot->flags |= KVM_MEMSLOT_INVALID;
+
+ /*
+ * We can re-use the old memslots, the only difference from the
+ * newly installed memslots is the invalid flag, which will get
+ * dropped by update_memslots anyway. We'll also revert to the
+ * old memslots if preparing the new memory region fails.
+ */
+ slots = install_new_memslots(kvm, as_id, slots);
+
+ /* From this point no new shadow pages pointing to a deleted,
+ * or moved, memslot will be created.
+ *
+ * validation of sp->gfn happens in:
+ * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
+ * - kvm_is_visible_gfn (mmu_check_roots)
+ */
+ kvm_arch_flush_shadow_memslot(kvm, slot);
+ }
+
+ r = kvm_arch_prepare_memory_region(kvm, new, mem, change);
+ if (r)
+ goto out_slots;
+
+ update_memslots(slots, new, change);
+ slots = install_new_memslots(kvm, as_id, slots);
+
+ kvm_arch_commit_memory_region(kvm, mem, old, new, change);
+
+ kvfree(slots);
+ return 0;
+
+out_slots:
+ if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
+ slots = install_new_memslots(kvm, as_id, slots);
+ kvfree(slots);
+ return r;
+}
+
/*
* Allocate some memory and give it an address in the guest physical address
* space.
@@ -923,7 +983,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
unsigned long npages;
struct kvm_memory_slot *slot;
struct kvm_memory_slot old, new;
- struct kvm_memslots *slots;
int as_id, id;
enum kvm_mr_change change;

@@ -1010,58 +1069,19 @@ int __kvm_set_memory_region(struct kvm *kvm,
return r;
}

- slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
- if (!slots) {
- r = -ENOMEM;
- goto out_bitmap;
- }
- memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
-
- if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
- slot = id_to_memslot(slots, id);
- slot->flags |= KVM_MEMSLOT_INVALID;
-
- /*
- * We can re-use the old memslots, the only difference from the
- * newly installed memslots is the invalid flag, which will get
- * dropped by update_memslots anyway. We'll also revert to the
- * old memslots if preparing the new memory region fails.
- */
- slots = install_new_memslots(kvm, as_id, slots);
-
- /* From this point no new shadow pages pointing to a deleted,
- * or moved, memslot will be created.
- *
- * validation of sp->gfn happens in:
- * - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
- * - kvm_is_visible_gfn (mmu_check_roots)
- */
- kvm_arch_flush_shadow_memslot(kvm, slot);
- }
-
- r = kvm_arch_prepare_memory_region(kvm, &new, mem, change);
- if (r)
- goto out_slots;
-
/* actual memory is freed via old in kvm_free_memslot below */
if (change == KVM_MR_DELETE) {
new.dirty_bitmap = NULL;
memset(&new.arch, 0, sizeof(new.arch));
}

- update_memslots(slots, &new, change);
- slots = install_new_memslots(kvm, as_id, slots);
-
- kvm_arch_commit_memory_region(kvm, mem, &old, &new, change);
+ r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change);
+ if (r)
+ goto out_bitmap;

kvm_free_memslot(kvm, &old, &new);
- kvfree(slots);
return 0;

-out_slots:
- if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
- slots = install_new_memslots(kvm, as_id, slots);
- kvfree(slots);
out_bitmap:
if (new.dirty_bitmap && !old.dirty_bitmap)
kvm_destroy_dirty_bitmap(&new);
--
2.22.0

2019-09-11 18:54:38

by Sean Christopherson

Subject: [PATCH 08/13] KVM: Simplify kvm_free_memslot() and all its descendents

Now that all callers of kvm_free_memslot() pass NULL for @dont, remove
the param from the top-level routine and all arch's implementations.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
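As a rough userspace illustration of the simplified contract, a free routine
without the @dont parameter releases everything the slot owns and clears the
pointers unconditionally. The struct layout, NR_TRACK and model_free_slot()
below are hypothetical and only loosely mirror the kernel structures.

```c
#include <assert.h>
#include <stdlib.h>

#define NR_TRACK 2	/* hypothetical stand-in for KVM_PAGE_TRACK_MAX */

struct model_slot {
	unsigned long npages;
	unsigned int flags;
	unsigned long *dirty_bitmap;
	unsigned short *gfn_track[NR_TRACK];
};

/* Free everything owned by the slot; no "don't free this" comparison. */
static void model_free_slot(struct model_slot *slot)
{
	int i;

	assert(slot != NULL);

	free(slot->dirty_bitmap);
	slot->dirty_bitmap = NULL;

	for (i = 0; i < NR_TRACK; i++) {
		free(slot->gfn_track[i]);
		slot->gfn_track[i] = NULL;
	}

	slot->flags = 0;
	slot->npages = 0;
}
```

Because every pointer is set to NULL after being freed, calling
model_free_slot() a second time on the same slot is harmless in this model.
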
arch/mips/include/asm/kvm_host.h | 2 +-
arch/powerpc/include/asm/kvm_ppc.h | 6 ++----
arch/powerpc/kvm/book3s.c | 5 ++---
arch/powerpc/kvm/book3s_hv.c | 9 +++------
arch/powerpc/kvm/book3s_pr.c | 3 +--
arch/powerpc/kvm/booke.c | 3 +--
arch/powerpc/kvm/powerpc.c | 5 ++---
arch/s390/include/asm/kvm_host.h | 2 +-
arch/x86/include/asm/kvm_page_track.h | 3 +--
arch/x86/kvm/page_track.c | 15 ++++++---------
arch/x86/kvm/x86.c | 19 +++++++------------
include/linux/kvm_host.h | 3 +--
virt/kvm/arm/mmu.c | 3 +--
virt/kvm/kvm_main.c | 18 +++++++-----------
14 files changed, 36 insertions(+), 60 deletions(-)

diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
index 41204a49cf95..2c343c346b79 100644
--- a/arch/mips/include/asm/kvm_host.h
+++ b/arch/mips/include/asm/kvm_host.h
@@ -1133,7 +1133,7 @@ extern unsigned long kvm_mips_get_ramsize(struct kvm *kvm);
static inline void kvm_arch_hardware_unsetup(void) {}
static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
- struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}
+ struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu) {}
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index cfe19560da1b..a4a5f4c5994e 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -201,8 +201,7 @@ extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
extern int kvmppc_core_init_vm(struct kvm *kvm);
extern void kvmppc_core_destroy_vm(struct kvm *kvm);
extern void kvmppc_core_free_memslot(struct kvm *kvm,
- struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont);
+ struct kvm_memory_slot *slot);
extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
const struct kvm_userspace_memory_region *mem);
@@ -290,8 +289,7 @@ struct kvmppc_ops {
int (*test_age_hva)(struct kvm *kvm, unsigned long hva);
void (*set_spte_hva)(struct kvm *kvm, unsigned long hva, pte_t pte);
void (*mmu_destroy)(struct kvm_vcpu *vcpu);
- void (*free_memslot)(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont);
+ void (*free_memslot)(struct kvm_memory_slot *slot);
int (*init_vm)(struct kvm *kvm);
void (*destroy_vm)(struct kvm *kvm);
int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c21acd9a7ea1..65214d5b0be0 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -834,10 +834,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
}

-void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
- kvm->arch.kvm_ops->free_memslot(free, dont);
+ kvm->arch.kvm_ops->free_memslot(slot);
}

void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index a28e2fb185d3..931091fac52c 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4424,13 +4424,10 @@ static int kvm_vm_ioctl_get_dirty_log_hv(struct kvm *kvm,
return r;
}

-static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *slot)
{
- if (!dont || free->arch.rmap != dont->arch.rmap) {
- vfree(free->arch.rmap);
- free->arch.rmap = NULL;
- }
+ vfree(slot->arch.rmap);
+ slot->arch.rmap = NULL;
}

static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 5fceb1da5fde..5368a5dbac22 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1918,8 +1918,7 @@ static void kvmppc_core_commit_memory_region_pr(struct kvm *kvm,
return;
}

-static void kvmppc_core_free_memslot_pr(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+static void kvmppc_core_free_memslot_pr(struct kvm_memory_slot *slot)
{
return;
}
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index cf2845e147c5..a22ff567724a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1801,8 +1801,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
return -ENOTSUPP;
}

-void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
}

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index cf74bf8f921a..91bde60a6de1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -680,10 +680,9 @@ long kvm_arch_dev_ioctl(struct file *filp,
return -EINVAL;
}

-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
- kvmppc_core_free_memslot(kvm, free, dont);
+ kvmppc_core_free_memslot(kvm, slot);
}

int kvm_arch_prepare_memory_region(struct kvm *kvm,
diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h
index abe60268335d..43301e8a5cbd 100644
--- a/arch/s390/include/asm/kvm_host.h
+++ b/arch/s390/include/asm/kvm_host.h
@@ -916,7 +916,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
static inline void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
static inline void kvm_arch_free_memslot(struct kvm *kvm,
- struct kvm_memory_slot *free, struct kvm_memory_slot *dont) {}
+ struct kvm_memory_slot *slot) {}
static inline void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen) {}
static inline void kvm_arch_flush_shadow_all(struct kvm *kvm) {}
static inline void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
diff --git a/arch/x86/include/asm/kvm_page_track.h b/arch/x86/include/asm/kvm_page_track.h
index 172f9749dbb2..87bd6025d91d 100644
--- a/arch/x86/include/asm/kvm_page_track.h
+++ b/arch/x86/include/asm/kvm_page_track.h
@@ -49,8 +49,7 @@ struct kvm_page_track_notifier_node {
void kvm_page_track_init(struct kvm *kvm);
void kvm_page_track_cleanup(struct kvm *kvm);

-void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont);
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
unsigned long npages);

diff --git a/arch/x86/kvm/page_track.c b/arch/x86/kvm/page_track.c
index 3521e2d176f2..d125ec379c79 100644
--- a/arch/x86/kvm/page_track.c
+++ b/arch/x86/kvm/page_track.c
@@ -19,17 +19,14 @@

#include "mmu.h"

-void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
{
int i;

- for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
- if (!dont || free->arch.gfn_track[i] !=
- dont->arch.gfn_track[i]) {
- kvfree(free->arch.gfn_track[i]);
- free->arch.gfn_track[i] = NULL;
- }
+ for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+ kvfree(slot->arch.gfn_track[i]);
+ slot->arch.gfn_track[i] = NULL;
+ }
}

int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
@@ -48,7 +45,7 @@ int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
return 0;

track_free:
- kvm_page_track_free_memslot(slot, NULL);
+ kvm_page_track_free_memslot(slot);
return -ENOMEM;
}

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7adde30c1305..f027665c8a6c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9459,27 +9459,22 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
kvm_hv_destroy_vm(kvm);
}

-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
int i;

for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
- if (!dont || free->arch.rmap[i] != dont->arch.rmap[i]) {
- kvfree(free->arch.rmap[i]);
- free->arch.rmap[i] = NULL;
- }
+ kvfree(slot->arch.rmap[i]);
+ slot->arch.rmap[i] = NULL;
+
if (i == 0)
continue;

- if (!dont || free->arch.lpage_info[i - 1] !=
- dont->arch.lpage_info[i - 1]) {
- kvfree(free->arch.lpage_info[i - 1]);
- free->arch.lpage_info[i - 1] = NULL;
- }
+ kvfree(slot->arch.lpage_info[i - 1]);
+ slot->arch.lpage_info[i - 1] = NULL;
}

- kvm_page_track_free_memslot(free, dont);
+ kvm_page_track_free_memslot(slot);
}

static int kvm_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index cce72f55ab76..f268c97c6cba 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -677,8 +677,7 @@ int kvm_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem);
int __kvm_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem);
-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont);
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot);
void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index f264de85f648..f3241b268d49 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2353,8 +2353,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
return ret;
}

-void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
}

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 1dc9db9bf9eb..8f2f4ee32f3e 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -549,18 +549,14 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
memslot->dirty_bitmap = NULL;
}

-/*
- * Free any memory in @free but not in @dont.
- */
-static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
- struct kvm_memory_slot *dont)
+static void kvm_free_memslot(struct kvm *kvm, struct kvm_memory_slot *slot)
{
- if (!dont || free->dirty_bitmap != dont->dirty_bitmap)
- kvm_destroy_dirty_bitmap(free);
+ kvm_destroy_dirty_bitmap(slot);

- kvm_arch_free_memslot(kvm, free, dont);
+ kvm_arch_free_memslot(kvm, slot);

- free->npages = 0;
+ slot->flags = 0;
+ slot->npages = 0;
}

static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
@@ -571,7 +567,7 @@ static void kvm_free_memslots(struct kvm *kvm, struct kvm_memslots *slots)
return;

kvm_for_each_memslot(memslot, slots)
- kvm_free_memslot(kvm, memslot, NULL);
+ kvm_free_memslot(kvm, memslot);

kvfree(slots);
}
@@ -984,7 +980,7 @@ static int kvm_delete_memslot(struct kvm *kvm,
if (r)
return r;

- kvm_free_memslot(kvm, old, NULL);
+ kvm_free_memslot(kvm, old);
return 0;
}

--
2.22.0

2019-09-11 18:54:58

by Sean Christopherson

Subject: [PATCH 04/13] KVM: Drop kvm_arch_create_memslot()

Remove kvm_arch_create_memslot() now that all arch implementations are
effectively nops. Explicitly free an allocated-but-unused dirty bitmap
instead of relying on kvm_free_memslot() now that setting a memslot can
no longer fail after arch code has allocated memory. In practice
this was already true, e.g. architectures that allocated memory via
kvm_arch_create_memslot() never failed kvm_arch_prepare_memory_region()
and vice versa, but removing kvm_arch_create_memslot() eliminates the
potential for future code to stealthily change behavior.

Eliminating the error path's reliance on kvm_free_memslot() paves the
way for simplifying kvm_free_memslot(), i.e. dropping its @dont param.

Signed-off-by: Sean Christopherson <[email protected]>
---
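The ownership rule used by the new out_bitmap path can be sketched in
userspace as below: a bitmap is released on failure only if it was allocated
by the current call, i.e. the new slot has one and the old slot did not.
LOG_DIRTY, struct model_slot, setup_bitmap() and undo_bitmap() are
hypothetical names for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

#define LOG_DIRTY 0x1u	/* stand-in for KVM_MEM_LOG_DIRTY_PAGES */

struct model_slot {
	unsigned int flags;
	unsigned long *dirty_bitmap;
};

/* "Allocate/free page dirty bitmap as needed"; new_slot starts as a copy of old. */
static int setup_bitmap(struct model_slot *new_slot)
{
	assert(new_slot != NULL);

	if (!(new_slot->flags & LOG_DIRTY)) {
		/* Logging off: drop the reference, old still owns the memory. */
		new_slot->dirty_bitmap = NULL;
	} else if (!new_slot->dirty_bitmap) {
		new_slot->dirty_bitmap = calloc(1, sizeof(unsigned long));
		if (!new_slot->dirty_bitmap)
			return -1;
	}
	return 0;
}

/* Error path: free the bitmap only if this call allocated it. */
static void undo_bitmap(const struct model_slot *old_slot,
			struct model_slot *new_slot)
{
	assert(old_slot != NULL && new_slot != NULL);

	if (new_slot->dirty_bitmap && !old_slot->dirty_bitmap) {
		free(new_slot->dirty_bitmap);
		new_slot->dirty_bitmap = NULL;
	}
}
```

When the old slot already had a bitmap, the new slot shares that pointer and
undo_bitmap() leaves it alone, which is why the error path no longer needs a
general-purpose "free @new but not @old" helper.
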
arch/mips/kvm/mips.c | 6 ------
arch/powerpc/kvm/powerpc.c | 6 ------
arch/s390/kvm/kvm-s390.c | 6 ------
arch/x86/kvm/x86.c | 6 ------
include/linux/kvm_host.h | 2 --
virt/kvm/arm/mmu.c | 6 ------
virt/kvm/kvm_main.c | 28 +++++++++++-----------------
7 files changed, 11 insertions(+), 49 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 1109924560d8..713e5465edb0 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -188,12 +188,6 @@ long kvm_arch_dev_ioctl(struct file *filp, unsigned int ioctl,
return -ENOIOCTLCMD;
}

-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
/* Flush whole GPA */
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 8b723b164fe1..cf74bf8f921a 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -686,12 +686,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
kvmppc_core_free_memslot(kvm, free, dont);
}

-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
const struct kvm_userspace_memory_region *mem,
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index f329dcb3f44c..e651ed80dc2c 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -4488,12 +4488,6 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
return VM_FAULT_SIGBUS;
}

-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
/* Section: memory related */
int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 72ec6272d7cb..7adde30c1305 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9482,12 +9482,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
kvm_page_track_free_memslot(free, dont);
}

-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
static int kvm_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long npages)
{
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index fcb46b3374c6..cce72f55ab76 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -679,8 +679,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem);
void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
struct kvm_memory_slot *dont);
-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages);
void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen);
int kvm_arch_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
diff --git a/virt/kvm/arm/mmu.c b/virt/kvm/arm/mmu.c
index 38b4c910b6c3..f264de85f648 100644
--- a/virt/kvm/arm/mmu.c
+++ b/virt/kvm/arm/mmu.c
@@ -2358,12 +2358,6 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
{
}

-int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
{
}
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index daa5de5b3f88..ea8f2f37096f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -964,12 +964,13 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.base_gfn = base_gfn;
new.npages = npages;
new.flags = mem->flags;
+ new.userspace_addr = mem->userspace_addr;

if (npages) {
if (!old.npages)
change = KVM_MR_CREATE;
else { /* Modify an existing slot. */
- if ((mem->userspace_addr != old.userspace_addr) ||
+ if ((new.userspace_addr != old.userspace_addr) ||
(npages != old.npages) ||
((new.flags ^ old.flags) & KVM_MEM_READONLY))
goto out;
@@ -1004,27 +1005,19 @@ int __kvm_set_memory_region(struct kvm *kvm,
}
}

- /* Free page dirty bitmap if unneeded */
+ r = -ENOMEM;
+
+ /* Allocate/free page dirty bitmap as needed */
if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
new.dirty_bitmap = NULL;
-
- r = -ENOMEM;
- if (change == KVM_MR_CREATE) {
- new.userspace_addr = mem->userspace_addr;
-
- if (kvm_arch_create_memslot(kvm, &new, npages))
- goto out_free;
- }
-
- /* Allocate page dirty bitmap if needed */
- if ((new.flags & KVM_MEM_LOG_DIRTY_PAGES) && !new.dirty_bitmap) {
+ else if (!new.dirty_bitmap) {
if (kvm_create_dirty_bitmap(&new) < 0)
- goto out_free;
+ goto out;
}

slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
if (!slots)
- goto out_free;
+ goto out_bitmap;
memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));

if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1072,8 +1065,9 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (change == KVM_MR_DELETE || change == KVM_MR_MOVE)
slots = install_new_memslots(kvm, as_id, slots);
kvfree(slots);
-out_free:
- kvm_free_memslot(kvm, &new, &old);
+out_bitmap:
+ if (new.dirty_bitmap && !old.dirty_bitmap)
+ kvm_destroy_dirty_bitmap(&new);
out:
return r;
}
--
2.22.0

2019-09-11 18:55:06

by Sean Christopherson

Subject: [PATCH 05/13] KVM: Refactor error handling for setting memory region

Replace a big pile o' gotos with returns to make it more obvious what
error code is being returned, and to prepare for refactoring the
functional, i.e. post-checks, portion of __kvm_set_memory_region().

Signed-off-by: Sean Christopherson <[email protected]>
---
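The early-return style can be illustrated with a standalone model of a few
of the sanity checks. This is a sketch under assumptions: MODEL_PAGE_SIZE
assumes 4 KiB pages, and check_region() and check_overlap() are hypothetical
helpers that cover only the alignment, wrap-around and overlap tests.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096ULL	/* assumption: 4 KiB pages */

/* Each failed check returns its error code directly, no shared "out" label. */
static int check_region(uint64_t guest_phys_addr, uint64_t memory_size)
{
	if (memory_size & (MODEL_PAGE_SIZE - 1))
		return -EINVAL;
	if (guest_phys_addr & (MODEL_PAGE_SIZE - 1))
		return -EINVAL;
	/* Reject ranges that wrap around the end of the address space. */
	if (guest_phys_addr + memory_size < guest_phys_addr)
		return -EINVAL;
	return 0;
}

/* Same overlap predicate as the loop in the patch, for a single other slot. */
static int check_overlap(uint64_t base_gfn, uint64_t npages,
			 uint64_t other_base, uint64_t other_npages)
{
	assert(npages > 0 && other_npages > 0);

	if (!((base_gfn + npages <= other_base) ||
	      (base_gfn >= other_base + other_npages)))
		return -EEXIST;
	return 0;
}
```

With returns in place of gotos, the error code for each condition is visible
at the point of the check rather than depending on the last value assigned
to a shared variable.
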
virt/kvm/kvm_main.c | 40 ++++++++++++++++++----------------------
1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ea8f2f37096f..8306ce3345a6 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -929,34 +929,33 @@ int __kvm_set_memory_region(struct kvm *kvm,

r = check_memory_region_flags(mem);
if (r)
- goto out;
+ return r;

- r = -EINVAL;
as_id = mem->slot >> 16;
id = (u16)mem->slot;

/* General sanity checks */
if (mem->memory_size & (PAGE_SIZE - 1))
- goto out;
+ return -EINVAL;
if (mem->guest_phys_addr & (PAGE_SIZE - 1))
- goto out;
+ return -EINVAL;
/* We can read the guest memory with __xxx_user() later on. */
if ((id < KVM_USER_MEM_SLOTS) &&
((mem->userspace_addr & (PAGE_SIZE - 1)) ||
!access_ok((void __user *)(unsigned long)mem->userspace_addr,
mem->memory_size)))
- goto out;
+ return -EINVAL;
if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
- goto out;
+ return -EINVAL;
if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
- goto out;
+ return -EINVAL;

slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
npages = mem->memory_size >> PAGE_SHIFT;

if (npages > KVM_MEM_MAX_NR_PAGES)
- goto out;
+ return -EINVAL;

new = old = *slot;

@@ -973,20 +972,18 @@ int __kvm_set_memory_region(struct kvm *kvm,
if ((new.userspace_addr != old.userspace_addr) ||
(npages != old.npages) ||
((new.flags ^ old.flags) & KVM_MEM_READONLY))
- goto out;
+ return -EINVAL;

if (base_gfn != old.base_gfn)
change = KVM_MR_MOVE;
else if (new.flags != old.flags)
change = KVM_MR_FLAGS_ONLY;
- else { /* Nothing to change. */
- r = 0;
- goto out;
- }
+ else /* Nothing to change. */
+ return 0;
}
} else {
if (!old.npages)
- goto out;
+ return -EINVAL;

change = KVM_MR_DELETE;
new.base_gfn = 0;
@@ -995,29 +992,29 @@ int __kvm_set_memory_region(struct kvm *kvm,

if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
/* Check for overlaps */
- r = -EEXIST;
kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
if (slot->id == id)
continue;
if (!((base_gfn + npages <= slot->base_gfn) ||
(base_gfn >= slot->base_gfn + slot->npages)))
- goto out;
+ return -EEXIST;
}
}

- r = -ENOMEM;
-
/* Allocate/free page dirty bitmap as needed */
if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
new.dirty_bitmap = NULL;
else if (!new.dirty_bitmap) {
- if (kvm_create_dirty_bitmap(&new) < 0)
- goto out;
+ r = kvm_create_dirty_bitmap(&new);
+ if (r)
+ return r;
}

slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
- if (!slots)
+ if (!slots) {
+ r = -ENOMEM;
goto out_bitmap;
+ }
memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));

if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
@@ -1068,7 +1065,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
out_bitmap:
if (new.dirty_bitmap && !old.dirty_bitmap)
kvm_destroy_dirty_bitmap(&new);
-out:
return r;
}
EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
--
2.22.0

2019-09-11 18:55:07

by Sean Christopherson

Subject: [PATCH 02/13] KVM: PPC: Move memslot memory allocation into prepare_memory_region()

Allocate the rmap array during kvm_arch_prepare_memory_region() to pave
the way for removing kvm_arch_create_memslot() altogether. Moving PPC's
memory allocation only changes the order of kernel memory allocations
between PPC and common KVM code.

No functional change intended.

Signed-off-by: Sean Christopherson <[email protected]>
---
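A minimal userspace model of the resulting shape: the prepare hook receives
the kind of change and allocates the per-page array only when a slot is being
created. enum model_change, struct model_slot, MODEL_PAGE_SHIFT and
model_prepare() are hypothetical, and calloc() stands in for the zeroing,
overflow-checked allocation used in the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

enum model_change {
	MODEL_CREATE,
	MODEL_DELETE,
	MODEL_MOVE,
	MODEL_FLAGS_ONLY,
};

struct model_slot {
	unsigned long *rmap;	/* one entry per guest page */
};

static int model_prepare(struct model_slot *slot, uint64_t memory_size,
			 enum model_change change)
{
	size_t npages = (size_t)(memory_size >> MODEL_PAGE_SHIFT);

	if (change == MODEL_CREATE) {
		/* Creating a slot implies a non-empty region. */
		assert(npages > 0);

		slot->rmap = calloc(npages, sizeof(*slot->rmap));
		if (!slot->rmap)
			return -ENOMEM;
	}

	/* Every other kind of change leaves the existing array alone. */
	return 0;
}
```

In this model a later MODEL_FLAGS_ONLY or MODEL_MOVE call returns 0 without
touching slot->rmap, so the array allocated at creation stays in place.
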
arch/powerpc/include/asm/kvm_ppc.h | 8 ++------
arch/powerpc/kvm/book3s.c | 12 ++++--------
arch/powerpc/kvm/book3s_hv.c | 25 ++++++++++++-------------
arch/powerpc/kvm/book3s_pr.c | 11 ++---------
arch/powerpc/kvm/booke.c | 9 ++-------
arch/powerpc/kvm/powerpc.c | 4 ++--
6 files changed, 24 insertions(+), 45 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2484e6a8f5ca..cfe19560da1b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -203,9 +203,6 @@ extern void kvmppc_core_destroy_vm(struct kvm *kvm);
extern void kvmppc_core_free_memslot(struct kvm *kvm,
struct kvm_memory_slot *free,
struct kvm_memory_slot *dont);
-extern int kvmppc_core_create_memslot(struct kvm *kvm,
- struct kvm_memory_slot *slot,
- unsigned long npages);
extern int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
const struct kvm_userspace_memory_region *mem);
@@ -280,7 +277,8 @@ struct kvmppc_ops {
void (*flush_memslot)(struct kvm *kvm, struct kvm_memory_slot *memslot);
int (*prepare_memory_region)(struct kvm *kvm,
struct kvm_memory_slot *memslot,
- const struct kvm_userspace_memory_region *mem);
+ const struct kvm_userspace_memory_region *mem,
+ enum kvm_mr_change change);
void (*commit_memory_region)(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
const struct kvm_memory_slot *old,
@@ -294,8 +292,6 @@ struct kvmppc_ops {
void (*mmu_destroy)(struct kvm_vcpu *vcpu);
void (*free_memslot)(struct kvm_memory_slot *free,
struct kvm_memory_slot *dont);
- int (*create_memslot)(struct kvm_memory_slot *slot,
- unsigned long npages);
int (*init_vm)(struct kvm *kvm);
void (*destroy_vm)(struct kvm *kvm);
int (*get_smmu_info)(struct kvm *kvm, struct kvm_ppc_smmu_info *info);
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 9524d92bc45d..c21acd9a7ea1 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -840,12 +840,6 @@ void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
kvm->arch.kvm_ops->free_memslot(free, dont);
}

-int kvmppc_core_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return kvm->arch.kvm_ops->create_memslot(slot, npages);
-}
-
void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
kvm->arch.kvm_ops->flush_memslot(kvm, memslot);
@@ -853,9 +847,11 @@ void kvmppc_core_flush_memslot(struct kvm *kvm, struct kvm_memory_slot *memslot)

int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
- const struct kvm_userspace_memory_region *mem)
+ const struct kvm_userspace_memory_region *mem,
+ enum kvm_mr_change change)
{
- return kvm->arch.kvm_ops->prepare_memory_region(kvm, memslot, mem);
+ return kvm->arch.kvm_ops->prepare_memory_region(kvm, memslot, mem,
+ change);
}

void kvmppc_core_commit_memory_region(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index ec1804f822af..a28e2fb185d3 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4433,20 +4433,20 @@ static void kvmppc_core_free_memslot_hv(struct kvm_memory_slot *free,
}
}

-static int kvmppc_core_create_memslot_hv(struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- slot->arch.rmap = vzalloc(array_size(npages, sizeof(*slot->arch.rmap)));
- if (!slot->arch.rmap)
- return -ENOMEM;
-
- return 0;
-}
-
static int kvmppc_core_prepare_memory_region_hv(struct kvm *kvm,
- struct kvm_memory_slot *memslot,
- const struct kvm_userspace_memory_region *mem)
+ struct kvm_memory_slot *slot,
+ const struct kvm_userspace_memory_region *mem,
+ enum kvm_mr_change change)
{
+ unsigned long npages = mem->memory_size >> PAGE_SHIFT;
+
+ if (change == KVM_MR_CREATE) {
+ slot->arch.rmap = vzalloc(array_size(npages,
+ sizeof(*slot->arch.rmap)));
+ if (!slot->arch.rmap)
+ return -ENOMEM;
+ }
+
return 0;
}

@@ -5388,7 +5388,6 @@ static struct kvmppc_ops kvm_ops_hv = {
.set_spte_hva = kvm_set_spte_hva_hv,
.mmu_destroy = kvmppc_mmu_destroy_hv,
.free_memslot = kvmppc_core_free_memslot_hv,
- .create_memslot = kvmppc_core_create_memslot_hv,
.init_vm = kvmppc_core_init_vm_hv,
.destroy_vm = kvmppc_core_destroy_vm_hv,
.get_smmu_info = kvm_vm_ioctl_get_smmu_info_hv,
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index cc65af8fe6f7..5fceb1da5fde 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -1903,7 +1903,8 @@ static void kvmppc_core_flush_memslot_pr(struct kvm *kvm,

static int kvmppc_core_prepare_memory_region_pr(struct kvm *kvm,
struct kvm_memory_slot *memslot,
- const struct kvm_userspace_memory_region *mem)
+ const struct kvm_userspace_memory_region *mem,
+ enum kvm_mr_change change)
{
return 0;
}
@@ -1923,13 +1924,6 @@ static void kvmppc_core_free_memslot_pr(struct kvm_memory_slot *free,
return;
}

-static int kvmppc_core_create_memslot_pr(struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
-
#ifdef CONFIG_PPC64
static int kvm_vm_ioctl_get_smmu_info_pr(struct kvm *kvm,
struct kvm_ppc_smmu_info *info)
@@ -2073,7 +2067,6 @@ static struct kvmppc_ops kvm_ops_pr = {
.set_spte_hva = kvm_set_spte_hva_pr,
.mmu_destroy = kvmppc_mmu_destroy_pr,
.free_memslot = kvmppc_core_free_memslot_pr,
- .create_memslot = kvmppc_core_create_memslot_pr,
.init_vm = kvmppc_core_init_vm_pr,
.destroy_vm = kvmppc_core_destroy_vm_pr,
.get_smmu_info = kvm_vm_ioctl_get_smmu_info_pr,
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index be9a45874194..cf2845e147c5 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1806,15 +1806,10 @@ void kvmppc_core_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
{
}

-int kvmppc_core_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
- unsigned long npages)
-{
- return 0;
-}
-
int kvmppc_core_prepare_memory_region(struct kvm *kvm,
struct kvm_memory_slot *memslot,
- const struct kvm_userspace_memory_region *mem)
+ const struct kvm_userspace_memory_region *mem,
+ enum kvm_mr_change change)
{
return 0;
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 3e566c2e6066..8b723b164fe1 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -689,7 +689,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,
int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long npages)
{
- return kvmppc_core_create_memslot(kvm, slot, npages);
+ return 0;
}

int kvm_arch_prepare_memory_region(struct kvm *kvm,
@@ -697,7 +697,7 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
enum kvm_mr_change change)
{
- return kvmppc_core_prepare_memory_region(kvm, memslot, mem);
+ return kvmppc_core_prepare_memory_region(kvm, memslot, mem, change);
}

void kvm_arch_commit_memory_region(struct kvm *kvm,
--
2.22.0

2019-09-11 18:55:09

by Sean Christopherson

Subject: [PATCH 10/13] KVM: Provide common implementation for generic dirty log functions

Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
The arch-specific implementations are extremely similar, differing
only in whether the dirty log needs to be sync'd from hardware (x86)
and how the TLBs are flushed. Add new arch hooks to handle sync
and TLB flush; the sync will also be used for non-generic dirty log
support in a future patch (s390).

The ulterior motive for providing a common implementation is to
eliminate the dependency between arch and common code with respect to
the memslot referenced by the dirty log, i.e. to make it obvious in the
code that the validity of the memslot is guaranteed, as a future patch
will rework memslot handling such that id_to_memslot() can return NULL.
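
A minimal userspace sketch of the resulting control flow, assuming
simplified stand-in types: the common routine looks up the memslot once,
calls the arch sync hook, harvests the dirty bits, and calls the arch TLB
flush hook only if something was dirty. Every sketch_* name is
hypothetical and exists only for illustration; locking, the second bitmap
buffer and copy_to_user() are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_BITMAP_WORDS 4

/* Hypothetical stand-in for struct kvm_memory_slot. */
struct sketch_memslot {
    unsigned long dirty_bitmap[SKETCH_BITMAP_WORDS];
};

/* Hypothetical arch hooks, loosely modeled on the two new kvm_arch_*() hooks. */
struct sketch_arch_ops {
    void (*sync_dirty_log)(struct sketch_memslot *slot);
    void (*dirty_log_tlb_flush)(struct sketch_memslot *slot);
};

/* Counters so a caller can observe which hooks ran. */
static int sketch_sync_calls;
static int sketch_flush_calls;

static void sketch_count_sync(struct sketch_memslot *slot)
{
    (void)slot;
    sketch_sync_calls++;
}

static void sketch_count_flush(struct sketch_memslot *slot)
{
    (void)slot;
    sketch_flush_calls++;
}

static const struct sketch_arch_ops sketch_ops = {
    .sync_dirty_log = sketch_count_sync,
    .dirty_log_tlb_flush = sketch_count_flush,
};

/*
 * Common code owns the memslot pointer for the whole operation, so the
 * arch hooks never have to look it up (or handle a NULL lookup) themselves.
 */
int sketch_get_dirty_log(const struct sketch_arch_ops *ops,
                         struct sketch_memslot *slot,
                         unsigned long out[SKETCH_BITMAP_WORDS])
{
    bool flush = false;
    size_t i;

    if (!slot)
        return -ENOENT;

    /* Pull any hardware-cached dirty state into the bitmap first. */
    ops->sync_dirty_log(slot);

    for (i = 0; i < SKETCH_BITMAP_WORDS; i++) {
        unsigned long mask = slot->dirty_bitmap[i];

        out[i] = mask;
        if (!mask)
            continue;

        flush = true;
        slot->dirty_bitmap[i] = 0;
    }

    /* Flush only when at least one page was dirty. */
    if (flush)
        ops->dirty_log_tlb_flush(slot);

    return 0;
}
```

In this sketch the only per-arch difference is the content of the two
hooks, which is the property the common implementation relies on.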

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/mips/kvm/mips.c | 62 ++---------------------------
arch/powerpc/kvm/book3s.c | 5 +++
arch/powerpc/kvm/booke.c | 5 +++
arch/s390/kvm/kvm-s390.c | 5 +--
arch/x86/kvm/x86.c | 60 ++--------------------------
include/linux/kvm_host.h | 20 ++++------
virt/kvm/arm/arm.c | 47 ++--------------------
virt/kvm/kvm_main.c | 84 ++++++++++++++++++++++++++++++++-------
8 files changed, 99 insertions(+), 189 deletions(-)

diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
index 713e5465edb0..5c4feeeee447 100644
--- a/arch/mips/kvm/mips.c
+++ b/arch/mips/kvm/mips.c
@@ -965,69 +965,15 @@ long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl,
return r;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Copy the snapshot to the userspace.
- * 4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
- struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
- bool flush = false;
- int r;

- mutex_lock(&kvm->slots_lock);
-
- r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
- if (flush) {
- slots = kvm_memslots(kvm);
- memslot = id_to_memslot(slots, log->slot);
-
- /* Let implementation handle TLB/GVA invalidation */
- kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
- }
-
- mutex_unlock(&kvm->slots_lock);
- return r;
}

-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm, struct kvm_memory_slot *slot)
{
- struct kvm_memslots *slots;
- struct kvm_memory_slot *memslot;
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
- r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
- if (flush) {
- slots = kvm_memslots(kvm);
- memslot = id_to_memslot(slots, log->slot);
-
- /* Let implementation handle TLB/GVA invalidation */
- kvm_mips_callbacks->flush_shadow_memslot(kvm, memslot);
- }
-
- mutex_unlock(&kvm->slots_lock);
- return r;
+ /* Let implementation handle TLB/GVA invalidation */
+ kvm_mips_callbacks->flush_shadow_memslot(kvm, slot);
}

long kvm_arch_vm_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg)
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index 65214d5b0be0..cfdaf7be5bd6 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -829,6 +829,11 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
return vcpu->kvm->arch.kvm_ops->check_requests(vcpu);
}

+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+
+}
+
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
return kvm->arch.kvm_ops->get_dirty_log(kvm, log);
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index a22ff567724a..35a4ef89a1db 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -1796,6 +1796,11 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
return r;
}

+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
+{
+
+}
+
int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
{
return -ENOTSUPP;
diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index e651ed80dc2c..36d4de1b1409 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -572,8 +572,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
return r;
}

-static void kvm_s390_sync_dirty_log(struct kvm *kvm,
- struct kvm_memory_slot *memslot)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
int i;
gfn_t cur_gfn, last_gfn;
@@ -633,7 +632,7 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
if (!memslot->dirty_bitmap)
goto out;

- kvm_s390_sync_dirty_log(kvm, memslot);
+ kvm_arch_sync_dirty_log(kvm, memslot);
r = kvm_get_dirty_log(kvm, log, &is_dirty);
if (r)
goto out;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f027665c8a6c..7bf54eccb6d4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4522,77 +4522,23 @@ static int kvm_vm_ioctl_reinject(struct kvm *kvm,
return 0;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Copy the snapshot to the userspace.
- * 4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
/*
* Flush potentially hardware-cached dirty pages to dirty_bitmap.
*/
if (kvm_x86_ops->flush_log_dirty)
kvm_x86_ops->flush_log_dirty(kvm);
-
- r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
- /*
- * All the TLBs can be flushed out of mmu lock, see the comments in
- * kvm_mmu_slot_remove_write_access().
- */
- lockdep_assert_held(&kvm->slots_lock);
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
}

-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm, struct kvm_memory_slot *slot)
{
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
- /*
- * Flush potentially hardware-cached dirty pages to dirty_bitmap.
- */
- if (kvm_x86_ops->flush_log_dirty)
- kvm_x86_ops->flush_log_dirty(kvm);
-
- r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
/*
* All the TLBs can be flushed out of mmu lock, see the comments in
* kvm_mmu_slot_remove_write_access().
*/
lockdep_assert_held(&kvm->slots_lock);
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
+ kvm_flush_remote_tlbs(kvm);
}

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_event,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f268c97c6cba..feb40a184847 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -797,23 +797,19 @@ vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf);

int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext);

-int kvm_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log, int *is_dirty);
-
-int kvm_get_dirty_log_protect(struct kvm *kvm,
- struct kvm_dirty_log *log, bool *flush);
-int kvm_clear_dirty_log_protect(struct kvm *kvm,
- struct kvm_clear_dirty_log *log, bool *flush);
-
void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
struct kvm_memory_slot *slot,
gfn_t gfn_offset,
unsigned long mask);
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot);

-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
- struct kvm_dirty_log *log);
-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
- struct kvm_clear_dirty_log *log);
+#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm, struct kvm_memory_slot *slot);
+#else /* !CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
+int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log);
+int kvm_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log,
+ int *is_dirty);
+#endif

int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level,
bool line_status);
diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 35a069815baf..243a9cadc284 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -1201,55 +1201,14 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return r;
}

-/**
- * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
- * @kvm: kvm instance
- * @log: slot id and address to which we copy the log
- *
- * Steps 1-4 below provide general overview of dirty page logging. See
- * kvm_get_dirty_log_protect() function description for additional details.
- *
- * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
- * always flush the TLB (step 4) even if previous step failed and the dirty
- * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
- * does not preclude user space subsequent dirty log read. Flushing TLB ensures
- * writes will be marked dirty for next log read.
- *
- * 1. Take a snapshot of the bit and clear it if needed.
- * 2. Write protect the corresponding page.
- * 3. Copy the snapshot to the userspace.
- * 4. Flush TLB's if needed.
- */
-int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
+void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
- bool flush = false;
- int r;

- mutex_lock(&kvm->slots_lock);
-
- r = kvm_get_dirty_log_protect(kvm, log, &flush);
-
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
}

-int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm, struct kvm_clear_dirty_log *log)
+void kvm_arch_dirty_log_tlb_flush(struct kvm *kvm, struct kvm_memory_slot *slot)
{
- bool flush = false;
- int r;
-
- mutex_lock(&kvm->slots_lock);
-
- r = kvm_clear_dirty_log_protect(kvm, log, &flush);
-
- if (flush)
- kvm_flush_remote_tlbs(kvm);
-
- mutex_unlock(&kvm->slots_lock);
- return r;
+ kvm_flush_remote_tlbs(kvm);
}

static int kvm_vm_ioctl_set_device_addr(struct kvm *kvm,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0e46524487cc..85062477be90 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -780,7 +780,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp)

/*
* Allocation size is twice as large as the actual dirty bitmap size.
- * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed.
+ * See kvm_vm_ioctl_get_dirty_log() why this is needed.
*/
static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot)
{
@@ -1122,6 +1122,7 @@ static int kvm_vm_ioctl_set_memory_region(struct kvm *kvm,
return kvm_set_memory_region(kvm, mem);
}

+#ifndef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
int kvm_get_dirty_log(struct kvm *kvm,
struct kvm_dirty_log *log, int *is_dirty)
{
@@ -1155,13 +1156,12 @@ int kvm_get_dirty_log(struct kvm *kvm,
}
EXPORT_SYMBOL_GPL(kvm_get_dirty_log);

-#ifdef CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT
+#else /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */
/**
* kvm_get_dirty_log_protect - get a snapshot of dirty pages
* and reenable dirty page tracking for the corresponding pages.
* @kvm: pointer to kvm instance
* @log: slot id and address to which we copy the log
- * @flush: true if TLB flush is needed by caller
*
* We need to keep it in mind that VCPU threads can write to the bitmap
* concurrently. So, to avoid losing track of dirty pages we keep the
@@ -1178,8 +1178,7 @@ EXPORT_SYMBOL_GPL(kvm_get_dirty_log);
* exiting to userspace will be logged for the next call.
*
*/
-int kvm_get_dirty_log_protect(struct kvm *kvm,
- struct kvm_dirty_log *log, bool *flush)
+static int kvm_get_dirty_log_protect(struct kvm *kvm, struct kvm_dirty_log *log)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
@@ -1187,6 +1186,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
unsigned long n;
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_buffer;
+ bool flush;

as_id = log->slot >> 16;
id = (u16)log->slot;
@@ -1200,8 +1200,10 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
if (!dirty_bitmap)
return -ENOENT;

+ kvm_arch_sync_dirty_log(kvm, memslot);
+
n = kvm_dirty_bitmap_bytes(memslot);
- *flush = false;
+ flush = false;
if (kvm->manual_dirty_log_protect) {
/*
* Unlike kvm_get_dirty_log, we always return false in *flush,
@@ -1224,7 +1226,7 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
if (!dirty_bitmap[i])
continue;

- *flush = true;
+ flush = true;
mask = xchg(&dirty_bitmap[i], 0);
dirty_bitmap_buffer[i] = mask;

@@ -1235,21 +1237,55 @@ int kvm_get_dirty_log_protect(struct kvm *kvm,
spin_unlock(&kvm->mmu_lock);
}

+ if (flush)
+ kvm_arch_dirty_log_tlb_flush(kvm, memslot);
+
if (copy_to_user(log->dirty_bitmap, dirty_bitmap_buffer, n))
return -EFAULT;
return 0;
}
-EXPORT_SYMBOL_GPL(kvm_get_dirty_log_protect);
+
+
+/**
+ * kvm_vm_ioctl_get_dirty_log - get and clear the log of dirty pages in a slot
+ * @kvm: kvm instance
+ * @log: slot id and address to which we copy the log
+ *
+ * Steps 1-4 below provide general overview of dirty page logging. See
+ * kvm_get_dirty_log_protect() function description for additional details.
+ *
+ * We call kvm_get_dirty_log_protect() to handle steps 1-3, upon return we
+ * always flush the TLB (step 4) even if previous step failed and the dirty
+ * bitmap may be corrupt. Regardless of previous outcome the KVM logging API
+ * does not preclude user space subsequent dirty log read. Flushing TLB ensures
+ * writes will be marked dirty for next log read.
+ *
+ * 1. Take a snapshot of the bit and clear it if needed.
+ * 2. Write protect the corresponding page.
+ * 3. Copy the snapshot to the userspace.
+ * 4. Flush TLB's if needed.
+ */
+static int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
+ struct kvm_dirty_log *log)
+{
+ int r;
+
+ mutex_lock(&kvm->slots_lock);
+
+ r = kvm_get_dirty_log_protect(kvm, log);
+
+ mutex_unlock(&kvm->slots_lock);
+ return r;
+}

/**
* kvm_clear_dirty_log_protect - clear dirty bits in the bitmap
* and reenable dirty page tracking for the corresponding pages.
* @kvm: pointer to kvm instance
* @log: slot id and address from which to fetch the bitmap of dirty pages
- * @flush: true if TLB flush is needed by caller
*/
-int kvm_clear_dirty_log_protect(struct kvm *kvm,
- struct kvm_clear_dirty_log *log, bool *flush)
+static int kvm_clear_dirty_log_protect(struct kvm *kvm,
+ struct kvm_clear_dirty_log *log)
{
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;
@@ -1258,6 +1294,7 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
unsigned long i, n;
unsigned long *dirty_bitmap;
unsigned long *dirty_bitmap_buffer;
+ bool flush;

as_id = log->slot >> 16;
id = (u16)log->slot;
@@ -1281,7 +1318,9 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
(log->num_pages < memslot->npages - log->first_page && (log->num_pages & 63)))
return -EINVAL;

- *flush = false;
+ kvm_arch_sync_dirty_log(kvm, memslot);
+
+ flush = false;
dirty_bitmap_buffer = kvm_second_dirty_bitmap(memslot);
if (copy_from_user(dirty_bitmap_buffer, log->dirty_bitmap, n))
return -EFAULT;
@@ -1304,17 +1343,32 @@ int kvm_clear_dirty_log_protect(struct kvm *kvm,
* a problem if userspace sets them in log->dirty_bitmap.
*/
if (mask) {
- *flush = true;
+ flush = true;
kvm_arch_mmu_enable_log_dirty_pt_masked(kvm, memslot,
offset, mask);
}
}
spin_unlock(&kvm->mmu_lock);

+ if (flush)
+ kvm_arch_dirty_log_tlb_flush(kvm, memslot);
+
return 0;
}
-EXPORT_SYMBOL_GPL(kvm_clear_dirty_log_protect);
-#endif
+
+static int kvm_vm_ioctl_clear_dirty_log(struct kvm *kvm,
+ struct kvm_clear_dirty_log *log)
+{
+ int r;
+
+ mutex_lock(&kvm->slots_lock);
+
+ r = kvm_clear_dirty_log_protect(kvm, log);
+
+ mutex_unlock(&kvm->slots_lock);
+ return r;
+}
+#endif /* CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT */

bool kvm_largepages_enabled(void)
{
--
2.22.0

2019-09-11 18:55:11

by Sean Christopherson

Subject: [PATCH 07/13] KVM: Move memslot deletion to helper function

Move memslot deletion into its own routine so that the success path for
other memslot updates does not need to use kvm_free_memslot(), i.e. can
explicitly destroy the dirty bitmap when necessary. This paves the way
for dropping @dont from kvm_free_memslot(), i.e. all callers now pass
NULL for @dont.

Add a comment above the code that makes a copy of the existing memslot
prior to deletion, as it is not at all obvious that the pointer will
become stale due to sorting and/or installation of new memslots.

Note, kvm_arch_commit_memory_region() allows an architecture to free
resources when moving a memslot or changing its flags, i.e. implement
logic similar to the dirty bitmap handling, if such functionality is
needed in the future.
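
The stale pointer hazard can be reproduced in a small userspace sketch,
assuming an array of slots kept sorted by descending base_gfn. The
sketch_* names are hypothetical, and qsort() merely stands in for the
re-sort done when memslots are updated.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, heavily trimmed stand-in for struct kvm_memory_slot. */
struct sketch_slot {
    int id;
    unsigned long base_gfn;
    unsigned long npages;
};

/* Order slots by descending base_gfn. */
static int sketch_cmp_desc(const void *a, const void *b)
{
    const struct sketch_slot *x = a;
    const struct sketch_slot *y = b;

    if (x->base_gfn < y->base_gfn)
        return 1;
    if (x->base_gfn > y->base_gfn)
        return -1;
    return 0;
}

/*
 * Delete the slot with the given id. The pre-deletion contents are
 * returned through @old_copy, because the pointer found by the lookup
 * refers to a different slot once the array has been re-sorted.
 */
int sketch_delete_slot(struct sketch_slot *slots, size_t n, int id,
                       struct sketch_slot *old_copy)
{
    struct sketch_slot *slot = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        if (slots[i].id == id) {
            slot = &slots[i];
            break;
        }
    }

    /* Deleting a slot that does not exist is an error. */
    if (!slot || !slot->npages)
        return -EINVAL;

    /* Full copy first: 'slot' must not be trusted after the sort. */
    *old_copy = *slot;

    slot->base_gfn = 0;
    slot->npages = 0;
    qsort(slots, n, sizeof(*slots), sketch_cmp_desc);

    return 0;
}
```

After the call, only *old_copy reliably describes the deleted slot; the
array position that held it may now contain an unrelated slot.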

Signed-off-by: Sean Christopherson <[email protected]>
---
virt/kvm/kvm_main.c | 73 +++++++++++++++++++++++++++------------------
1 file changed, 44 insertions(+), 29 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 693f3d20e710..1dc9db9bf9eb 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -967,6 +967,27 @@ static int kvm_set_memslot(struct kvm *kvm,
return r;
}

+static int kvm_delete_memslot(struct kvm *kvm,
+ const struct kvm_userspace_memory_region *mem,
+ struct kvm_memory_slot *old, int as_id)
+{
+ struct kvm_memory_slot new;
+ int r;
+
+ if (!old->npages)
+ return -EINVAL;
+
+ memset(&new, 0, sizeof(new));
+ new.id = old->id;
+
+ r = kvm_set_memslot(kvm, mem, old, &new, as_id, KVM_MR_DELETE);
+ if (r)
+ return r;
+
+ kvm_free_memslot(kvm, old, NULL);
+ return 0;
+}
+
/*
* Allocate some memory and give it an address in the guest physical address
* space.
@@ -1016,7 +1037,15 @@ int __kvm_set_memory_region(struct kvm *kvm,
if (npages > KVM_MEM_MAX_NR_PAGES)
return -EINVAL;

- new = old = *slot;
+ /*
+ * Make a full copy of the old memslot, the pointer will become stale
+ * when the memslots are re-sorted by update_memslots().
+ */
+ old = *slot;
+ if (!mem->memory_size)
+ return kvm_delete_memslot(kvm, mem, &old, as_id);
+
+ new = old;

new.id = id;
new.base_gfn = base_gfn;
@@ -1024,29 +1053,20 @@ int __kvm_set_memory_region(struct kvm *kvm,
new.flags = mem->flags;
new.userspace_addr = mem->userspace_addr;

- if (npages) {
- if (!old.npages)
- change = KVM_MR_CREATE;
- else { /* Modify an existing slot. */
- if ((new.userspace_addr != old.userspace_addr) ||
- (npages != old.npages) ||
- ((new.flags ^ old.flags) & KVM_MEM_READONLY))
- return -EINVAL;
-
- if (base_gfn != old.base_gfn)
- change = KVM_MR_MOVE;
- else if (new.flags != old.flags)
- change = KVM_MR_FLAGS_ONLY;
- else /* Nothing to change. */
- return 0;
- }
- } else {
- if (!old.npages)
+ if (!old.npages) {
+ change = KVM_MR_CREATE;
+ } else { /* Modify an existing slot. */
+ if ((new.userspace_addr != old.userspace_addr) ||
+ (npages != old.npages) ||
+ ((new.flags ^ old.flags) & KVM_MEM_READONLY))
return -EINVAL;

- change = KVM_MR_DELETE;
- new.base_gfn = 0;
- new.flags = 0;
+ if (base_gfn != old.base_gfn)
+ change = KVM_MR_MOVE;
+ else if (new.flags != old.flags)
+ change = KVM_MR_FLAGS_ONLY;
+ else /* Nothing to change. */
+ return 0;
}

if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
@@ -1069,17 +1089,12 @@ int __kvm_set_memory_region(struct kvm *kvm,
return r;
}

- /* actual memory is freed via old in kvm_free_memslot below */
- if (change == KVM_MR_DELETE) {
- new.dirty_bitmap = NULL;
- memset(&new.arch, 0, sizeof(new.arch));
- }
-
r = kvm_set_memslot(kvm, mem, &old, &new, as_id, change);
if (r)
goto out_bitmap;

- kvm_free_memslot(kvm, &old, &new);
+ if (old.dirty_bitmap && !new.dirty_bitmap)
+ kvm_destroy_dirty_bitmap(&old);
return 0;

out_bitmap:
--
2.22.0

2019-09-11 18:57:20

by Sean Christopherson

Subject: [PATCH 03/13] KVM: x86: Allocate memslot resources during prepare_memory_region()

Allocate the various metadata structures associated with a memslot
during kvm_arch_prepare_memory_region(), which paves the way for
removing kvm_arch_create_memslot() altogether. Moving x86's memory
allocation only changes the order of kernel memory allocations between
x86 and common KVM code.

No functional change intended.
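
The pattern of allocating only for newly created memslots can be
sketched in userspace as follows. This is an illustration under
simplifying assumptions, not the x86 code: the sketch_* names are
hypothetical, calloc() stands in for the kernel allocators, and a single
array stands in for the various per-slot metadata.

```c
#include <errno.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12

/* Hypothetical mirror of enum kvm_mr_change. */
enum sketch_mr_change {
    SKETCH_MR_CREATE,
    SKETCH_MR_DELETE,
    SKETCH_MR_MOVE,
    SKETCH_MR_FLAGS_ONLY,
};

/* Hypothetical slot with one piece of per-page metadata. */
struct sketch_slot {
    unsigned long npages;
    unsigned long *rmap;
};

/*
 * Allocate per-slot metadata only when the slot is being created.
 * Moves, flag changes and deletions reuse or free what already exists.
 */
int sketch_prepare_memory_region(struct sketch_slot *slot,
                                 unsigned long long memory_size,
                                 enum sketch_mr_change change)
{
    unsigned long npages = (unsigned long)(memory_size >> SKETCH_PAGE_SHIFT);

    if (change != SKETCH_MR_CREATE)
        return 0;

    /* A created slot must cover at least one page. */
    if (!npages)
        return -EINVAL;

    slot->rmap = calloc(npages, sizeof(*slot->rmap));
    if (!slot->rmap)
        return -ENOMEM;

    slot->npages = npages;
    return 0;
}
```

Because the allocation is keyed off the change type, a separate "create"
hook that runs before the prepare hook is no longer needed.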

Signed-off-by: Sean Christopherson <[email protected]>
---
arch/x86/kvm/x86.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b4cfd786d0b6..72ec6272d7cb 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9484,6 +9484,12 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free,

int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
unsigned long npages)
+{
+ return 0;
+}
+
+static int kvm_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot,
+ unsigned long npages)
{
int i;

@@ -9561,6 +9567,9 @@ int kvm_arch_prepare_memory_region(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
enum kvm_mr_change change)
{
+ if (change == KVM_MR_CREATE)
+ return kvm_create_memslot(kvm, memslot,
+ mem->memory_size >> PAGE_SHIFT);
return 0;
}

--
2.22.0

2019-09-11 21:30:22

by Sean Christopherson

Subject: [PATCH 13/13] KVM: Dynamically size memslot array based on number of used slots

Now that the memslot logic doesn't assume memslots are always non-NULL,
dynamically size the array of memslots instead of unconditionally
allocating memory for the maximum number of memslots.

Note, because a to-be-deleted memslot must first be invalidated, the
array size cannot be immediately reduced when deleting a memslot.
However, consecutive deletions will realize the memory savings, i.e.
a second deletion will trim the entry.
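
The sizing rule can be sketched in userspace with a C99 flexible array
member. The sketch_* names are hypothetical and calloc() stands in for
kvzalloc(); the point is only that the duplicate holds every used slot,
plus one extra entry when a slot is being created.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of enum kvm_mr_change. */
enum sketch_mr_change {
    SKETCH_MR_CREATE,
    SKETCH_MR_DELETE,
    SKETCH_MR_MOVE,
    SKETCH_MR_FLAGS_ONLY,
};

/* Hypothetical, trimmed stand-in for struct kvm_memory_slot. */
struct sketch_memslot {
    int id;
    unsigned long npages;
};

struct sketch_memslots {
    int used_slots;
    /* Flexible array member: it must be the last field. */
    struct sketch_memslot memslots[];
};

/* Bytes needed for the header plus @nr slots. */
static size_t sketch_memslots_size(int nr)
{
    return sizeof(struct sketch_memslots) +
           sizeof(struct sketch_memslot) * (size_t)nr;
}

/*
 * Duplicate @old, reserving room for one extra slot on creation. Every
 * other change, deletion included, keeps the current size, since a full
 * copy is still needed while the slot is being invalidated.
 */
struct sketch_memslots *sketch_dup_memslots(const struct sketch_memslots *old,
                                            enum sketch_mr_change change)
{
    size_t old_size = sketch_memslots_size(old->used_slots);
    size_t new_size = old_size;
    struct sketch_memslots *slots;

    if (change == SKETCH_MR_CREATE)
        new_size += sizeof(struct sketch_memslot);

    slots = calloc(1, new_size);
    if (slots)
        memcpy(slots, old, old_size);

    return slots;
}
```

Because the copy is zero-initialized before the memcpy(), the extra
entry reserved for a new slot starts out empty.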

Signed-off-by: Sean Christopherson <[email protected]>
---
include/linux/kvm_host.h | 5 ++++-
virt/kvm/kvm_main.c | 31 ++++++++++++++++++++++++++++---
2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 40ea5df50faa..8feb61bfbd1a 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -433,11 +433,14 @@ static inline int kvm_arch_vcpu_memslots_id(struct kvm_vcpu *vcpu)
*/
struct kvm_memslots {
u64 generation;
- struct kvm_memory_slot memslots[KVM_MEM_SLOTS_NUM];
/* The mapping table from slot id to the index in memslots[]. */
short id_to_index[KVM_MEM_SLOTS_NUM];
atomic_t lru_slot;
int used_slots;
+ struct kvm_memory_slot memslots[];
+ /*
+ * WARNING: 'memslots' is dynamically-sized. It *MUST* be at the end.
+ */
};

struct kvm {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index e2571a9ccfc4..f952a0bec67a 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -535,7 +535,7 @@ static struct kvm_memslots *kvm_alloc_memslots(void)
return NULL;

for (i = 0; i < KVM_MEM_SLOTS_NUM; i++)
- slots->id_to_index[i] = slots->memslots[i].id = -1;
+ slots->id_to_index[i] = -1;

return slots;
}
@@ -933,6 +933,32 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm,
return old_memslots;
}

+/*
+ * Note, at a minimum, the current number of used slots must be allocated, even
+ * when deleting a memslot, as we need a complete duplicate of the memslots for
+ * use when invalidating a memslot prior to deleting/moving the memslot.
+ */
+static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old,
+ enum kvm_mr_change change)
+{
+ struct kvm_memslots *slots;
+ size_t old_size, new_size;
+
+ old_size = sizeof(struct kvm_memslots) +
+ (sizeof(struct kvm_memory_slot) * old->used_slots);
+
+ if (change == KVM_MR_CREATE)
+ new_size = old_size + sizeof(struct kvm_memory_slot);
+ else
+ new_size = old_size;
+
+ slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT);
+ if (likely(slots))
+ memcpy(slots, old, old_size);
+
+ return slots;
+}
+
static int kvm_set_memslot(struct kvm *kvm,
const struct kvm_userspace_memory_region *mem,
const struct kvm_memory_slot *old,
@@ -943,10 +969,9 @@ static int kvm_set_memslot(struct kvm *kvm,
struct kvm_memslots *slots;
int r;

- slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
+ slots = kvm_dup_memslots(__kvm_memslots(kvm, as_id), change);
if (!slots)
return -ENOMEM;
- memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));

if (change == KVM_MR_DELETE || change == KVM_MR_MOVE) {
/*
--
2.22.0

2019-09-12 11:34:32

by Janosch Frank

Subject: Re: [PATCH 04/13] KVM: Drop kvm_arch_create_memslot()

On 9/11/19 8:50 PM, Sean Christopherson wrote:
> Remove kvm_arch_create_memslot() now that all arch implementations are
> effectively nops. Explicitly free an allocated-but-unused dirty bitmap
> instead of relying on kvm_free_memslot() now that setting a memslot can
> no longer fail after arch code has allocated memory. In practice
> this was already true, e.g. architectures that allocated memory via
> kvm_arch_create_memslot() never failed kvm_arch_prepare_memory_region()
> and vice versa, but removing kvm_arch_create_memslot() eliminates the
> potential for future code to stealthily change behavior.
>
> Eliminating the error path's reliance on kvm_free_memslot() paves the
> way for simplifying kvm_free_memslot(), i.e. dropping its @dont param.
>
> Signed-off-by: Sean Christopherson <[email protected]>

Please either split the patch or adapt its title to include the freeing.
I'd go for splitting.



2019-09-12 20:27:34

by Janosch Frank

Subject: Re: [PATCH 05/13] KVM: Refactor error handling for setting memory region

On 9/11/19 8:50 PM, Sean Christopherson wrote:
> Replace a big pile o' gotos with returns to make it more obvious what
> error code is being returned, and to prepare for refactoring the
> functional, i.e. post-checks, portion of __kvm_set_memory_region().
>
> Signed-off-by: Sean Christopherson <[email protected]>

Definitely necessary
Reviewed-by: Janosch Frank <[email protected]>

> ---
> virt/kvm/kvm_main.c | 40 ++++++++++++++++++----------------------
> 1 file changed, 18 insertions(+), 22 deletions(-)
>
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index ea8f2f37096f..8306ce3345a6 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -929,34 +929,33 @@ int __kvm_set_memory_region(struct kvm *kvm,
>
>  	r = check_memory_region_flags(mem);
>  	if (r)
> -		goto out;
> +		return r;
>
> -	r = -EINVAL;
>  	as_id = mem->slot >> 16;
>  	id = (u16)mem->slot;
>
>  	/* General sanity checks */
>  	if (mem->memory_size & (PAGE_SIZE - 1))
> -		goto out;
> +		return -EINVAL;
>  	if (mem->guest_phys_addr & (PAGE_SIZE - 1))
> -		goto out;
> +		return -EINVAL;
>  	/* We can read the guest memory with __xxx_user() later on. */
>  	if ((id < KVM_USER_MEM_SLOTS) &&
>  	    ((mem->userspace_addr & (PAGE_SIZE - 1)) ||
>  	     !access_ok((void __user *)(unsigned long)mem->userspace_addr,
>  			mem->memory_size)))
> -		goto out;
> +		return -EINVAL;
>  	if (as_id >= KVM_ADDRESS_SPACE_NUM || id >= KVM_MEM_SLOTS_NUM)
> -		goto out;
> +		return -EINVAL;
>  	if (mem->guest_phys_addr + mem->memory_size < mem->guest_phys_addr)
> -		goto out;
> +		return -EINVAL;
>
>  	slot = id_to_memslot(__kvm_memslots(kvm, as_id), id);
>  	base_gfn = mem->guest_phys_addr >> PAGE_SHIFT;
>  	npages = mem->memory_size >> PAGE_SHIFT;
>
>  	if (npages > KVM_MEM_MAX_NR_PAGES)
> -		goto out;
> +		return -EINVAL;
>
>  	new = old = *slot;
>
> @@ -973,20 +972,18 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  			if ((new.userspace_addr != old.userspace_addr) ||
>  			    (npages != old.npages) ||
>  			    ((new.flags ^ old.flags) & KVM_MEM_READONLY))
> -				goto out;
> +				return -EINVAL;
>
>  			if (base_gfn != old.base_gfn)
>  				change = KVM_MR_MOVE;
>  			else if (new.flags != old.flags)
>  				change = KVM_MR_FLAGS_ONLY;
> -			else { /* Nothing to change. */
> -				r = 0;
> -				goto out;
> -			}
> +			else /* Nothing to change. */
> +				return 0;
>  		}
>  	} else {
>  		if (!old.npages)
> -			goto out;
> +			return -EINVAL;
>
>  		change = KVM_MR_DELETE;
>  		new.base_gfn = 0;
> @@ -995,29 +992,29 @@ int __kvm_set_memory_region(struct kvm *kvm,
>
>  	if ((change == KVM_MR_CREATE) || (change == KVM_MR_MOVE)) {
>  		/* Check for overlaps */
> -		r = -EEXIST;
>  		kvm_for_each_memslot(slot, __kvm_memslots(kvm, as_id)) {
>  			if (slot->id == id)
>  				continue;
>  			if (!((base_gfn + npages <= slot->base_gfn) ||
>  			      (base_gfn >= slot->base_gfn + slot->npages)))
> -				goto out;
> +				return -EEXIST;
>  		}
>  	}
>
> -	r = -ENOMEM;
> -
>  	/* Allocate/free page dirty bitmap as needed */
>  	if (!(new.flags & KVM_MEM_LOG_DIRTY_PAGES))
>  		new.dirty_bitmap = NULL;
>  	else if (!new.dirty_bitmap) {
> -		if (kvm_create_dirty_bitmap(&new) < 0)
> -			goto out;
> +		r = kvm_create_dirty_bitmap(&new);
> +		if (r)
> +			return r;
>  	}
>
>  	slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT);
> -	if (!slots)
> +	if (!slots) {
> +		r = -ENOMEM;
>  		goto out_bitmap;
> +	}
>  	memcpy(slots, __kvm_memslots(kvm, as_id), sizeof(struct kvm_memslots));
>
>  	if ((change == KVM_MR_DELETE) || (change == KVM_MR_MOVE)) {
> @@ -1068,7 +1065,6 @@ int __kvm_set_memory_region(struct kvm *kvm,
>  out_bitmap:
>  	if (new.dirty_bitmap && !old.dirty_bitmap)
>  		kvm_destroy_dirty_bitmap(&new);
> -out:
>  	return r;
>  }
>  EXPORT_SYMBOL_GPL(__kvm_set_memory_region);
>
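
For anyone following the sanity checks in the quoted hunks, here is a minimal standalone sketch of two pieces of that logic: splitting the userspace slot value into an address space id and a memslot id, and the overlap test used for KVM_MR_CREATE and KVM_MR_MOVE. It is not code from the patch. struct range, slot_as_id(), slot_id() and ranges_overlap() are hypothetical names made up for illustration, and the sketch assumes a 32-bit slot value as implied by the shift and the (u16) cast in the diff.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two memslot fields the overlap check reads. */
struct range {
	uint64_t base_gfn;	/* first guest frame number covered */
	uint64_t npages;	/* number of pages covered */
};

/* Upper 16 bits of the slot value select the address space. */
static inline uint16_t slot_as_id(uint32_t slot)
{
	return slot >> 16;
}

/* Lower 16 bits of the slot value are the memslot id. */
static inline uint16_t slot_id(uint32_t slot)
{
	return (uint16_t)slot;
}

/*
 * Two half-open ranges [base, base + npages) overlap unless one of them
 * ends at or before the point where the other begins. Same shape as the
 * condition in the quoted diff.
 */
static bool ranges_overlap(const struct range *a, const struct range *b)
{
	return !((a->base_gfn + a->npages <= b->base_gfn) ||
		 (a->base_gfn >= b->base_gfn + b->npages));
}
```

With this shape, two slots that merely touch (one ends exactly where the next begins) are not treated as overlapping, which is why the quoted check only returns -EEXIST for a genuine intersection.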




2019-09-19 00:25:47

by Paul Mackerras

Subject: Re: [PATCH 10/13] KVM: Provide common implementation for generic dirty log functions

On Wed, Sep 11, 2019 at 11:50:35AM -0700, Sean Christopherson wrote:
> Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
> for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
> The arch specific implementations are extremely similar, differing
> only in whether the dirty log needs to be sync'd from hardware (x86)
> and how the TLBs are flushed. Add new arch hooks to handle sync
> and TLB flush; the sync will also be used for non-generic dirty log
> support in a future patch (s390).
>
> The ulterior motive for providing a common implementation is to
> eliminate the dependency between arch and common code with respect to
> the memslot referenced by the dirty log, i.e. to make it obvious in the
> code that the validity of the memslot is guaranteed, as a future patch
> will rework memslot handling such that id_to_memslot() can return NULL.

I notice you add empty definitions of kvm_arch_sync_dirty_log() for
PPC, both Book E and Book 3S. Given that PPC doesn't select
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT, why is this necessary?

Paul.

2019-09-19 03:56:32

by Paul Mackerras

Subject: Re: [PATCH 02/13] KVM: PPC: Move memslot memory allocation into prepare_memory_region()

On Wed, Sep 11, 2019 at 11:50:27AM -0700, Sean Christopherson wrote:
> Allocate the rmap array during kvm_arch_prepare_memory_region() to pave
> the way for removing kvm_arch_create_memslot() altogether. Moving PPC's
> memory allocation only changes the order of kernel memory allocations
> between PPC and common KVM code.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <[email protected]>

Seems OK.

Acked-by: Paul Mackerras <[email protected]>

2019-09-19 22:17:44

by Sean Christopherson

Subject: Re: [PATCH 10/13] KVM: Provide common implementation for generic dirty log functions

On Thu, Sep 19, 2019 at 10:22:42AM +1000, Paul Mackerras wrote:
> On Wed, Sep 11, 2019 at 11:50:35AM -0700, Sean Christopherson wrote:
> > Move the implementations of KVM_GET_DIRTY_LOG and KVM_CLEAR_DIRTY_LOG
> > for CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT into common KVM code.
> > The arch specific implementations are extremely similar, differing
> > only in whether the dirty log needs to be sync'd from hardware (x86)
> > and how the TLBs are flushed. Add new arch hooks to handle sync
> > and TLB flush; the sync will also be used for non-generic dirty log
> > support in a future patch (s390).
> >
> > The ulterior motive for providing a common implementation is to
> > eliminate the dependency between arch and common code with respect to
> > the memslot referenced by the dirty log, i.e. to make it obvious in the
> > code that the validity of the memslot is guaranteed, as a future patch
> > will rework memslot handling such that id_to_memslot() can return NULL.
>
> I notice you add empty definitions of kvm_arch_sync_dirty_log() for
> PPC, both Book E and Book 3S. Given that PPC doesn't select
> CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT, why is this necessary?

s390 has a non-empty kvm_arch_sync_dirty_log() but doesn't select
CONFIG_KVM_GENERIC_DIRTYLOG_READ_PROTECT. Patch 11/13 moves s390's call
of kvm_arch_sync_dirty_log() from s390's kvm_vm_ioctl_get_dirty_log() into
the common (but not "generic") kvm_get_dirty_log() so that it's obvious
that kvm_vm_ioctl_get_dirty_log() and kvm_get_dirty_log() are operating on
the same memslot, i.e. aren't independently querying id_to_memslot().

I originally made kvm_arch_sync_dirty_log() opt-in with a __KVM_HAVE_ARCH
macro, but the resulting #ifdeffery felt uglier than having PPC and ARM
provide empty functions.
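
As a rough illustration of the trade-off described above, the sketch below contrasts the two ways of making an arch hook optional. It is not code from the series. struct vm, HAVE_ARCH_SYNC_DIRTY_LOG, arch_sync_dirty_log_optin(), arch_sync_dirty_log() and common_get_dirty_log() are all hypothetical names invented for the example, and the "arch" modelled here is one with nothing to sync.

```c
/* Hypothetical stand-in for struct kvm; only records whether a sync ran. */
struct vm {
	int synced;
};

/*
 * Approach 1 (opt-in): an arch that needs the hook defines a macro, and
 * common code carries an #ifdef that supplies a no-op fallback for
 * everyone else. HAVE_ARCH_SYNC_DIRTY_LOG is a made-up macro name.
 */
#ifdef HAVE_ARCH_SYNC_DIRTY_LOG
void arch_sync_dirty_log_optin(struct vm *vm);
#else
static inline void arch_sync_dirty_log_optin(struct vm *vm)
{
	(void)vm;	/* nothing to sync on this arch */
}
#endif

/*
 * Approach 2 (always defined): every arch provides the hook, and an arch
 * with nothing to sync leaves the body empty. This models such an arch.
 */
void arch_sync_dirty_log(struct vm *vm)
{
	(void)vm;	/* intentionally empty */
}

/* Common code calls the hook unconditionally; no macro is involved. */
int common_get_dirty_log(struct vm *vm)
{
	arch_sync_dirty_log(vm);
	return 0;
}
```

The second approach costs each arch a few lines of empty function, but keeps the conditional compilation out of common code.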