2016-12-20 05:49:22

by David Gibson

Subject: [PATCHv2 00/11] KVM implementation of PAPR HPT resizing extension

Here is the KVM implementation for the proposed PAPR extension which
allows the runtime resizing of a PAPR guest's Hashed Page Table (HPT).

Using this requires a guest kernel with support for the extension.
Patches for guest side support in Linux were posted earlier:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-December/152164.html

It also requires userspace (i.e. qemu) to intercept the HPT resizing
hypercalls and invoke the KVM ioctl()s to implement them. This is
done instead of having KVM directly intercept the hypercalls, so that
userspace can, if useful, impose additional restrictions on resizes:
for example, it could refuse them entirely if policy for the VM
precludes resizing, or it could limit the size of HPT the guest can
request in order to meet resource limits.
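
As a rough illustration of that division of labour, a userspace handler
for the guest's prepare hypercall might look like the sketch below. This
is only a sketch: the helper name and the policy check are invented for
illustration, and it assumes the struct kvm_ppc_resize_hpt and ioctl
definitions added to include/uapi/linux/kvm.h later in this series.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: forward the guest's H_RESIZE_HPT_PREPARE request
 * to KVM, applying a per-VM policy limit first. */
static long handle_h_resize_hpt_prepare(int vm_fd, uint64_t flags,
                                        uint32_t shift, uint32_t max_shift)
{
    struct kvm_ppc_resize_hpt rhpt = {
        .flags = flags,
        .shift = shift,
    };

    if (shift && shift > max_shift)
        return -1;      /* policy refuses; report failure to the guest */

    return ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt);
}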

Patches to implement the userspace part of HPT resizing are proposed
for qemu-2.9, and can be found at:
https://github.com/dgibson/qemu/tree/hpt-resize

I'm posting these now, in the hopes that both these and the
corresponding guest side patches can be staged and merged for the 4.11
window.

Changes in v2:
* Use a more normal and straightforward encoding of the capability
flags
* Add information to Documentation/virtual/kvm/api.txt
* Assorted minor cleanups based on review comments

David Gibson (11):
Documentation: Correct duplicate section number in kvm/api.txt
powerpc/kvm: HPT resizing documentation and reserved numbers
powerpc/kvm: Rename kvm_alloc_hpt() for clarity
powerpc/kvm: Gather HPT related variables into sub-structure
powerpc/kvm: Don't store values derivable from HPT order
powerpc/kvm: Split HPT allocation from activation
powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size
powerpc/kvm: Create kvmppc_unmap_hpte_helper()
powerpc/kvm: Outline of KVM-HV HPT resizing implementation
powerpc/kvm: KVM-HV HPT resizing implementation
powerpc/kvm: Advertise availability of HPT resizing on KVM HV

Documentation/virtual/kvm/api.txt | 111 +++++-
arch/powerpc/include/asm/kvm_book3s_64.h | 16 +
arch/powerpc/include/asm/kvm_host.h | 21 +-
arch/powerpc/include/asm/kvm_ppc.h | 15 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 625 +++++++++++++++++++++++++------
arch/powerpc/kvm/book3s_hv.c | 50 ++-
arch/powerpc/kvm/book3s_hv_builtin.c | 8 +-
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 +--
arch/powerpc/kvm/powerpc.c | 3 +
include/uapi/linux/kvm.h | 11 +
10 files changed, 741 insertions(+), 181 deletions(-)

--
2.9.3


2016-12-20 05:49:23

by David Gibson

Subject: [PATCHv2 07/11] powerpc/kvm: Allow KVM_PPC_ALLOCATE_HTAB ioctl() to change HPT size

The KVM_PPC_ALLOCATE_HTAB ioctl() is used to set the size of the hashed page
table (HPT) that userspace expects a guest VM to have, and is also used to
clear that HPT when necessary (e.g. on guest reboot).

At present, once the ioctl() is called for the first time, the HPT size can
never be changed thereafter - it can be cleared, but it always keeps the
size set by the first call.

With the upcoming HPT resize implementation, we're going to need to allow
userspace to resize the HPT at reset (to change it back to the default size
if the guest changed it).

So, we need to allow this ioctl() to change the HPT size.

This patch also updates Documentation/virtual/kvm/api.txt to reflect
the new behaviour. In fact, the documentation was already slightly
incorrect since commit 572abd5 ("KVM: PPC: Book3S HV: Don't fall back
to smaller HPT size in allocation ioctl").
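
From userspace the change is invisible apart from the parameter no longer
being rewritten on return; a minimal sketch of a caller (helper name
invented for illustration), assuming the existing KVM_PPC_ALLOCATE_HTAB
definition from <linux/kvm.h>:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Reset (and, with this patch, possibly resize) the guest HPT.  Passing
 * a different order now frees the old table and allocates a new one;
 * passing the same order just zeroes the existing HPTEs. */
static int reset_guest_hpt(int vm_fd, uint32_t order)
{
    return ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, &order);
}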

Signed-off-by: David Gibson <[email protected]>
---
Documentation/virtual/kvm/api.txt | 14 ++++++++------
arch/powerpc/include/asm/kvm_ppc.h | 2 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 29 ++++++++++++++++-------------
arch/powerpc/kvm/book3s_hv.c | 5 +----
4 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index f452c72..c12da23 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2443,18 +2443,20 @@ are, it will do nothing and return an EBUSY error.
The parameter is a pointer to a 32-bit unsigned integer variable
containing the order (log base 2) of the desired size of the hash
table, which must be between 18 and 46. On successful return from the
-ioctl, it will have been updated with the order of the hash table that
-was allocated.
+ioctl, the value will not be changed by the kernel.

If no hash table has been allocated when any vcpu is asked to run
(with the KVM_RUN ioctl), the host kernel will allocate a
default-sized hash table (16 MB).

If this ioctl is called when a hash table has already been allocated,
-the kernel will clear out the existing hash table (zero all HPTEs) and
-return the hash table order in the parameter. (If the guest is using
-the virtualized real-mode area (VRMA) facility, the kernel will
-re-create the VMRA HPTEs on the next KVM_RUN of any vcpu.)
+with a different order from the existing hash table, the existing hash
+table will be freed and a new one allocated. If this ioctl is called
+when a hash table of the same order as specified has already been
+allocated, the kernel will clear out the existing hash table (zero
+all HPTEs). In either case, if the guest is using the virtualized
+real-mode area (VRMA) facility, the kernel will re-create the VRMA
+HPTEs on the next KVM_RUN of any vcpu.

4.77 KVM_S390_INTERRUPT

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 41575b8..3b837bc 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -157,7 +157,7 @@ extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);

extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
-extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order);
extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 4683d87..2765d04 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -102,10 +102,10 @@ void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
info->virt, (long)info->order, kvm->arch.lpid);
}

-long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
+long kvmppc_alloc_reset_hpt(struct kvm *kvm, int order)
{
long err = -EBUSY;
- long order;
+ struct kvm_hpt_info info;

mutex_lock(&kvm->lock);
if (kvm->arch.hpte_setup_done) {
@@ -117,8 +117,9 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
goto out;
}
}
- if (kvm->arch.hpt.virt) {
- order = kvm->arch.hpt.order;
+ if (kvm->arch.hpt.order == order) {
+ /* We already have a suitable HPT */
+
/* Set the entire HPT to 0, i.e. invalid HPTEs */
memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
/*
@@ -127,17 +128,19 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
kvmppc_rmap_reset(kvm);
/* Ensure that each vcpu will flush its TLB on next entry. */
cpumask_setall(&kvm->arch.need_tlb_flush);
- *htab_orderp = order;
err = 0;
- } else {
- struct kvm_hpt_info info;
-
- err = kvmppc_allocate_hpt(&info, *htab_orderp);
- if (err < 0)
- goto out;
- kvmppc_set_hpt(kvm, &info);
+ goto out;
}
- out:
+
+ if (kvm->arch.hpt.virt)
+ kvmppc_free_hpt(&kvm->arch.hpt);
+
+ err = kvmppc_allocate_hpt(&info, order);
+ if (err < 0)
+ goto out;
+ kvmppc_set_hpt(kvm, &info);
+
+out:
mutex_unlock(&kvm->lock);
return err;
}
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 436508e..9ebb9e6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3600,12 +3600,9 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
r = -EFAULT;
if (get_user(htab_order, (u32 __user *)argp))
break;
- r = kvmppc_alloc_reset_hpt(kvm, &htab_order);
+ r = kvmppc_alloc_reset_hpt(kvm, htab_order);
if (r)
break;
- r = -EFAULT;
- if (put_user(htab_order, (u32 __user *)argp))
- break;
r = 0;
break;
}
--
2.9.3

2016-12-20 05:49:20

by David Gibson

Subject: [PATCHv2 01/11] Documentation: Correct duplicate section number in kvm/api.txt

Both KVM_CREATE_SPAPR_TCE_64 and KVM_REINJECT_CONTROL have section number
4.98 in Documentation/virtual/kvm/api.txt, presumably due to a naive merge.
This corrects the duplication.

Signed-off-by: David Gibson <[email protected]>
---
Documentation/virtual/kvm/api.txt | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 03145b7..42be7b1 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3177,7 +3177,7 @@ of IOMMU pages.

The rest of functionality is identical to KVM_CREATE_SPAPR_TCE.

-4.98 KVM_REINJECT_CONTROL
+4.99 KVM_REINJECT_CONTROL

Capability: KVM_CAP_REINJECT_CONTROL
Architectures: x86
--
2.9.3

2016-12-20 05:49:19

by David Gibson

Subject: [PATCHv2 03/11] powerpc/kvm: Rename kvm_alloc_hpt() for clarity

The difference between kvm_alloc_hpt() and kvmppc_alloc_hpt() is not at
all obvious from the name. In practice, kvmppc_alloc_hpt() allocates an HPT
by whatever means, and calls kvm_alloc_hpt(), which will attempt to allocate
it with CMA only.

To make this less confusing, rename kvm_alloc_hpt() to kvm_alloc_hpt_cma().
Similarly, kvm_release_hpt() is renamed kvm_free_hpt_cma().

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
---
arch/powerpc/include/asm/kvm_ppc.h | 4 ++--
arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 ++++----
arch/powerpc/kvm/book3s_hv_builtin.c | 8 ++++----
3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 2da67bf..3db6be9 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -186,8 +186,8 @@ extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
unsigned long tce_value, unsigned long npages);
extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba);
-extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
-extern void kvm_release_hpt(struct page *page, unsigned long nr_pages);
+extern struct page *kvm_alloc_hpt_cma(unsigned long nr_pages);
+extern void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages);
extern int kvmppc_core_init_vm(struct kvm *kvm);
extern void kvmppc_core_destroy_vm(struct kvm *kvm);
extern void kvmppc_core_free_memslot(struct kvm *kvm,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b795dd1..ae17cdd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -62,7 +62,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
}

kvm->arch.hpt_cma_alloc = 0;
- page = kvm_alloc_hpt(1ul << (order - PAGE_SHIFT));
+ page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
@@ -108,7 +108,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)

out_freehpt:
if (kvm->arch.hpt_cma_alloc)
- kvm_release_hpt(page, 1 << (order - PAGE_SHIFT));
+ kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
return -ENOMEM;
@@ -155,8 +155,8 @@ void kvmppc_free_hpt(struct kvm *kvm)
kvmppc_free_lpid(kvm->arch.lpid);
vfree(kvm->arch.revmap);
if (kvm->arch.hpt_cma_alloc)
- kvm_release_hpt(virt_to_page(kvm->arch.hpt_virt),
- 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+ kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
+ 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
else
free_pages(kvm->arch.hpt_virt,
kvm->arch.hpt_order - PAGE_SHIFT);
diff --git a/arch/powerpc/kvm/book3s_hv_builtin.c b/arch/powerpc/kvm/book3s_hv_builtin.c
index 5bb24be..4c4aa47 100644
--- a/arch/powerpc/kvm/book3s_hv_builtin.c
+++ b/arch/powerpc/kvm/book3s_hv_builtin.c
@@ -52,19 +52,19 @@ static int __init early_parse_kvm_cma_resv(char *p)
}
early_param("kvm_cma_resv_ratio", early_parse_kvm_cma_resv);

-struct page *kvm_alloc_hpt(unsigned long nr_pages)
+struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
{
VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);

return cma_alloc(kvm_cma, nr_pages, order_base_2(HPT_ALIGN_PAGES));
}
-EXPORT_SYMBOL_GPL(kvm_alloc_hpt);
+EXPORT_SYMBOL_GPL(kvm_alloc_hpt_cma);

-void kvm_release_hpt(struct page *page, unsigned long nr_pages)
+void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
{
cma_release(kvm_cma, page, nr_pages);
}
-EXPORT_SYMBOL_GPL(kvm_release_hpt);
+EXPORT_SYMBOL_GPL(kvm_free_hpt_cma);

/**
* kvm_cma_reserve() - reserve area for kvm hash pagetable
--
2.9.3

2016-12-20 05:51:06

by David Gibson

Subject: [PATCHv2 09/11] powerpc/kvm: Outline of KVM-HV HPT resizing implementation

This adds a not-yet-working outline of the HPT resizing PAPR
extension. Specifically, it adds the necessary ioctl() functions,
their basic steps, the work function which will handle preparation for
the resize, and the synchronization between these, the guest page fault
path and the guest HPT update path.

The actual guts of the implementation aren't here yet, so for now the
calls will always fail.
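
The intended calling sequence can be read off the return values used
below: PREPARE returns a positive estimate (in milliseconds) while the
background work is still running, 0 once the tentative HPT is ready, and
an error otherwise; COMMIT is then issued with the same shift. The
following is only an illustration from userspace's point of view (in
practice the guest drives the retries through the hypercalls), assuming
the uapi definitions added by this series:

#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static long resize_hpt(int vm_fd, uint32_t shift)
{
    struct kvm_ppc_resize_hpt rhpt = { .flags = 0, .shift = shift };
    long ret;

    do {
        ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt);
        if (ret > 0)
            usleep(ret * 1000);   /* positive return is an estimate in ms */
    } while (ret > 0);

    if (ret < 0)
        return ret;               /* ioctl failed, errno holds the reason */

    return ioctl(vm_fd, KVM_PPC_RESIZE_HPT_COMMIT, &rhpt);
}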

Signed-off-by: David Gibson <[email protected]>
---
arch/powerpc/include/asm/kvm_host.h | 3 +
arch/powerpc/include/asm/kvm_ppc.h | 4 +
arch/powerpc/kvm/book3s_64_mmu_hv.c | 191 ++++++++++++++++++++++++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 25 +++++
4 files changed, 223 insertions(+)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 7576e0e..fb8e508 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -252,6 +252,8 @@ struct kvm_hpt_info {
int cma;
};

+struct kvm_resize_hpt;
+
struct kvm_arch {
unsigned int lpid;
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
@@ -272,6 +274,7 @@ struct kvm_arch {
cpumask_t need_tlb_flush;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
+ struct kvm_resize_hpt *resize_hpt; /* protected by kvm->lock */
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
struct mutex hpt_mutex;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 3b837bc..f8eaed0 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -215,6 +215,10 @@ extern void kvmppc_bookehv_exit(void);
extern int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu);

extern int kvm_vm_ioctl_get_htab_fd(struct kvm *kvm, struct kvm_get_htab_fd *);
+extern long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
+ struct kvm_ppc_resize_hpt *rhpt);
+extern long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
+ struct kvm_ppc_resize_hpt *rhpt);

int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq);

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 74e909a..86a1bcb 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,9 +40,34 @@

#include "trace_hv.h"

+//#define DEBUG_RESIZE_HPT 1
+
+#ifdef DEBUG_RESIZE_HPT
+#define resize_hpt_debug(resize, ...) \
+ do { \
+ printk(KERN_DEBUG "RESIZE HPT %p: ", resize); \
+ printk(__VA_ARGS__); \
+ } while (0)
+#else
+#define resize_hpt_debug(resize, ...) \
+ do { } while (0)
+#endif
+
static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
+
+struct kvm_resize_hpt {
+ /* These fields read-only after init */
+ struct kvm *kvm;
+ struct work_struct work;
+ u32 order;
+
+ /* These fields protected by kvm->lock */
+ int error;
+ bool prepare_done;
+};
+
static void kvmppc_rmap_reset(struct kvm *kvm);

int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
@@ -1158,6 +1183,172 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa,
}

/*
+ * HPT resizing
+ */
+static int resize_hpt_allocate(struct kvm_resize_hpt *resize)
+{
+ return 0;
+}
+
+static int resize_hpt_rehash(struct kvm_resize_hpt *resize)
+{
+ return -EIO;
+}
+
+static void resize_hpt_pivot(struct kvm_resize_hpt *resize)
+{
+}
+
+static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize)
+{
+ BUG_ON(kvm->arch.resize_hpt != resize);
+ kvm->arch.resize_hpt = NULL;
+ kfree(resize);
+}
+
+static void resize_hpt_prepare_work(struct work_struct *work)
+{
+ struct kvm_resize_hpt *resize = container_of(work,
+ struct kvm_resize_hpt,
+ work);
+ struct kvm *kvm = resize->kvm;
+ int err;
+
+ resize_hpt_debug(resize, "resize_hpt_prepare_work(): order = %d\n",
+ resize->order);
+
+ err = resize_hpt_allocate(resize);
+
+ mutex_lock(&kvm->lock);
+
+ resize->error = err;
+ resize->prepare_done = true;
+
+ mutex_unlock(&kvm->lock);
+}
+
+long kvm_vm_ioctl_resize_hpt_prepare(struct kvm *kvm,
+ struct kvm_ppc_resize_hpt *rhpt)
+{
+ unsigned long flags = rhpt->flags;
+ unsigned long shift = rhpt->shift;
+ struct kvm_resize_hpt *resize;
+ int ret;
+
+ if (flags != 0)
+ return -EINVAL;
+
+ if (shift && ((shift < 18) || (shift > 46)))
+ return -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ resize = kvm->arch.resize_hpt;
+
+ if (resize) {
+ if (resize->order == shift) {
+ /* Suitable resize in progress */
+ if (resize->prepare_done) {
+ ret = resize->error;
+ if (ret != 0)
+ resize_hpt_release(kvm, resize);
+ } else {
+ ret = 100; /* estimated time in ms */
+ }
+
+ goto out;
+ }
+
+ /* not suitable, cancel it */
+ resize_hpt_release(kvm, resize);
+ }
+
+ ret = 0;
+ if (!shift)
+ goto out; /* nothing to do */
+
+ /* start new resize */
+
+ resize = kzalloc(sizeof(*resize), GFP_KERNEL);
+ resize->order = shift;
+ resize->kvm = kvm;
+ INIT_WORK(&resize->work, resize_hpt_prepare_work);
+ kvm->arch.resize_hpt = resize;
+
+ schedule_work(&resize->work);
+
+ ret = 100; /* estimated time in ms */
+
+out:
+ mutex_unlock(&kvm->lock);
+ return ret;
+}
+
+static void resize_hpt_boot_vcpu(void *opaque)
+{
+ /* Nothing to do, just force a KVM exit */
+}
+
+long kvm_vm_ioctl_resize_hpt_commit(struct kvm *kvm,
+ struct kvm_ppc_resize_hpt *rhpt)
+{
+ unsigned long flags = rhpt->flags;
+ unsigned long shift = rhpt->shift;
+ struct kvm_resize_hpt *resize;
+ long ret;
+
+ if (flags != 0)
+ return -EINVAL;
+
+ if (shift && ((shift < 18) || (shift > 46)))
+ return -EINVAL;
+
+ mutex_lock(&kvm->lock);
+
+ resize = kvm->arch.resize_hpt;
+
+ /* This shouldn't be possible */
+ ret = -EIO;
+ if (WARN_ON(!kvm->arch.hpte_setup_done))
+ goto out_no_hpt;
+
+ /* Stop VCPUs from running while we mess with the HPT */
+ kvm->arch.hpte_setup_done = 0;
+ smp_mb();
+
+ /* Boot all CPUs out of the guest so they re-read
+ * hpte_setup_done */
+ on_each_cpu(resize_hpt_boot_vcpu, NULL, 1);
+
+ ret = -ENXIO;
+ if (!resize || (resize->order != shift))
+ goto out;
+
+ ret = -EBUSY;
+ if (!resize->prepare_done)
+ goto out;
+
+ ret = resize->error;
+ if (ret != 0)
+ goto out;
+
+ ret = resize_hpt_rehash(resize);
+ if (ret != 0)
+ goto out;
+
+ resize_hpt_pivot(resize);
+
+out:
+ /* Let VCPUs run again */
+ kvm->arch.hpte_setup_done = 1;
+ smp_mb();
+out_no_hpt:
+ resize_hpt_release(kvm, resize);
+ mutex_unlock(&kvm->lock);
+ return ret;
+}
+
+/*
* Functions for reading and writing the hash table via reads and
* writes on a file descriptor.
*
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 9ebb9e6..7fd1dba 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3323,6 +3323,9 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)

kvm->arch.lpcr = lpcr;

+ /* Initialization for future HPT resizes */
+ kvm->arch.resize_hpt = NULL;
+
/*
* Work out how many sets the TLB has, for the use of
* the TLB invalidation loop in book3s_hv_rmhandlers.S.
@@ -3617,6 +3620,28 @@ static long kvm_arch_vm_ioctl_hv(struct file *filp,
break;
}

+ case KVM_PPC_RESIZE_HPT_PREPARE: {
+ struct kvm_ppc_resize_hpt rhpt;
+
+ r = -EFAULT;
+ if (copy_from_user(&rhpt, argp, sizeof(rhpt)))
+ break;
+
+ r = kvm_vm_ioctl_resize_hpt_prepare(kvm, &rhpt);
+ break;
+ }
+
+ case KVM_PPC_RESIZE_HPT_COMMIT: {
+ struct kvm_ppc_resize_hpt rhpt;
+
+ r = -EFAULT;
+ if (copy_from_user(&rhpt, argp, sizeof(rhpt)))
+ break;
+
+ r = kvm_vm_ioctl_resize_hpt_commit(kvm, &rhpt);
+ break;
+ }
+
default:
r = -ENOTTY;
}
--
2.9.3

2016-12-20 05:51:37

by David Gibson

Subject: [PATCHv2 10/11] powerpc/kvm: KVM-HV HPT resizing implementation

This adds the "guts" of the implementation for the HPT resizing PAPR
extension. It has the code to allocate and clear a new HPT, rehash an
existing HPT's entries into it, and accomplish the switchover for a
KVM guest from the old HPT to the new one.
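
The central property the rehash relies on is that only the size of the
hash mask changes between the two tables: an entry's destination group in
the new table is its full hash truncated to the new mask, so when
shrinking, several old groups fold onto each new one (which is why a
bolted-on-bolted collision makes the resize fail with -ENOSPC). A small
standalone illustration of the folding, with arbitrary numbers:

#include <stdio.h>

int main(void)
{
    unsigned long old_order = 25, new_order = 24;          /* shrink by half */
    unsigned long old_mask = (1UL << (old_order - 7)) - 1; /* HPTEG masks */
    unsigned long new_mask = (1UL << (new_order - 7)) - 1;
    unsigned long hash = 0x2345aUL;                        /* some full hash */

    printf("old group %lu -> new group %lu (of %lu)\n",
           hash & old_mask, hash & new_mask, new_mask + 1);
    return 0;
}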

Signed-off-by: David Gibson <[email protected]>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 188 +++++++++++++++++++++++++++++++++++-
1 file changed, 187 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 86a1bcb..e22776e 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -66,6 +66,10 @@ struct kvm_resize_hpt {
/* These fields protected by kvm->lock */
int error;
bool prepare_done;
+
+ /* Private to the work thread, until prepare_done is true,
+ * then protected by kvm->resize_hpt_sem */
+ struct kvm_hpt_info hpt;
};

static void kvmppc_rmap_reset(struct kvm *kvm);
@@ -1187,21 +1191,203 @@ void kvmppc_unpin_guest_page(struct kvm *kvm, void *va, unsigned long gpa,
*/
static int resize_hpt_allocate(struct kvm_resize_hpt *resize)
{
+ int rc;
+
+ rc = kvmppc_allocate_hpt(&resize->hpt, resize->order);
+ if (rc < 0)
+ return rc;
+
+ resize_hpt_debug(resize, "resize_hpt_allocate(): HPT @ 0x%lx\n",
+ resize->hpt.virt);
+
return 0;
}

+static unsigned long resize_hpt_rehash_hpte(struct kvm_resize_hpt *resize,
+ unsigned long idx)
+{
+ struct kvm *kvm = resize->kvm;
+ struct kvm_hpt_info *old = &kvm->arch.hpt;
+ struct kvm_hpt_info *new = &resize->hpt;
+ unsigned long old_hash_mask = (1ULL << (old->order - 7)) - 1;
+ unsigned long new_hash_mask = (1ULL << (new->order - 7)) - 1;
+ __be64 *hptep, *new_hptep;
+ unsigned long vpte, rpte, guest_rpte;
+ int ret;
+ struct revmap_entry *rev;
+ unsigned long apsize, psize, avpn, pteg, hash;
+ unsigned long new_idx, new_pteg, replace_vpte;
+
+ hptep = (__be64 *)(old->virt + (idx << 4));
+
+ /* Guest is stopped, so new HPTEs can't be added or faulted
+ * in, only unmapped or altered by host actions. So, it's
+ * safe to check this before we take the HPTE lock */
+ vpte = be64_to_cpu(hptep[0]);
+ if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT))
+ return 0; /* nothing to do */
+
+ while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
+ cpu_relax();
+
+ vpte = be64_to_cpu(hptep[0]);
+
+ ret = 0;
+ if (!(vpte & HPTE_V_VALID) && !(vpte & HPTE_V_ABSENT))
+ /* Nothing to do */
+ goto out;
+
+ /* Unmap */
+ rev = &old->rev[idx];
+ guest_rpte = rev->guest_rpte;
+
+ ret = -EIO;
+ apsize = hpte_page_size(vpte, guest_rpte);
+ if (!apsize)
+ goto out;
+
+ if (vpte & HPTE_V_VALID) {
+ unsigned long gfn = hpte_rpn(guest_rpte, apsize);
+ int srcu_idx = srcu_read_lock(&kvm->srcu);
+ struct kvm_memory_slot *memslot =
+ __gfn_to_memslot(kvm_memslots(kvm), gfn);
+
+ if (memslot) {
+ unsigned long *rmapp;
+ rmapp = &memslot->arch.rmap[gfn - memslot->base_gfn];
+
+ lock_rmap(rmapp);
+ kvmppc_unmap_hpte(kvm, idx, rmapp, gfn);
+ unlock_rmap(rmapp);
+ }
+
+ srcu_read_unlock(&kvm->srcu, srcu_idx);
+ }
+
+ /* Reload PTE after unmap */
+ vpte = be64_to_cpu(hptep[0]);
+
+ BUG_ON(vpte & HPTE_V_VALID);
+ BUG_ON(!(vpte & HPTE_V_ABSENT));
+
+ ret = 0;
+ if (!(vpte & HPTE_V_BOLTED))
+ goto out;
+
+ rpte = be64_to_cpu(hptep[1]);
+ psize = hpte_base_page_size(vpte, rpte);
+ avpn = HPTE_V_AVPN_VAL(vpte) & ~((psize - 1) >> 23);
+ pteg = idx / HPTES_PER_GROUP;
+ if (vpte & HPTE_V_SECONDARY)
+ pteg = ~pteg;
+
+ if (!(vpte & HPTE_V_1TB_SEG)) {
+ unsigned long offset, vsid;
+
+ /* We only have 28 - 23 bits of offset in avpn */
+ offset = (avpn & 0x1f) << 23;
+ vsid = avpn >> 5;
+ /* We can find more bits from the pteg value */
+ if (psize < (1ULL << 23))
+ offset |= ((vsid ^ pteg) & old_hash_mask) * psize;
+
+ hash = vsid ^ (offset / psize);
+ } else {
+ unsigned long offset, vsid;
+
+ /* We only have 40 - 23 bits of seg_off in avpn */
+ offset = (avpn & 0x1ffff) << 23;
+ vsid = avpn >> 17;
+ if (psize < (1ULL << 23))
+ offset |= ((vsid ^ (vsid << 25) ^ pteg) & old_hash_mask) * psize;
+
+ hash = vsid ^ (vsid << 25) ^ (offset / psize);
+ }
+
+ new_pteg = hash & new_hash_mask;
+ if (vpte & HPTE_V_SECONDARY) {
+ BUG_ON(~pteg != (hash & old_hash_mask));
+ new_pteg = ~new_pteg;
+ } else {
+ BUG_ON(pteg != (hash & old_hash_mask));
+ }
+
+ new_idx = new_pteg * HPTES_PER_GROUP + (idx % HPTES_PER_GROUP);
+ new_hptep = (__be64 *)(new->virt + (new_idx << 4));
+
+ replace_vpte = be64_to_cpu(new_hptep[0]);
+
+ if (replace_vpte & (HPTE_V_VALID | HPTE_V_ABSENT)) {
+ BUG_ON(new->order >= old->order);
+
+ if (replace_vpte & HPTE_V_BOLTED) {
+ if (vpte & HPTE_V_BOLTED)
+ /* Bolted collision, nothing we can do */
+ ret = -ENOSPC;
+ /* Discard the new HPTE */
+ goto out;
+ }
+
+ /* Discard the previous HPTE */
+ }
+
+ new_hptep[1] = cpu_to_be64(rpte);
+ new->rev[new_idx].guest_rpte = guest_rpte;
+ /* No need for a barrier, since new HPT isn't active */
+ new_hptep[0] = cpu_to_be64(vpte);
+ unlock_hpte(new_hptep, vpte);
+
+out:
+ unlock_hpte(hptep, vpte);
+ return ret;
+}
+
static int resize_hpt_rehash(struct kvm_resize_hpt *resize)
{
- return -EIO;
+ struct kvm *kvm = resize->kvm;
+ unsigned long i;
+ int rc;
+
+ for (i = 0; i < kvmppc_hpt_npte(&kvm->arch.hpt); i++) {
+ rc = resize_hpt_rehash_hpte(resize, i);
+ if (rc != 0)
+ return rc;
+ }
+
+ return 0;
}

static void resize_hpt_pivot(struct kvm_resize_hpt *resize)
{
+ struct kvm *kvm = resize->kvm;
+ struct kvm_hpt_info hpt_tmp;
+
+ /* Exchange the pending tables in the resize structure with
+ * the active tables */
+
+ resize_hpt_debug(resize, "resize_hpt_pivot()\n");
+
+ spin_lock(&kvm->mmu_lock);
+ asm volatile("ptesync" : : : "memory");
+
+ hpt_tmp = kvm->arch.hpt;
+ kvmppc_set_hpt(kvm, &resize->hpt);
+ resize->hpt = hpt_tmp;
+
+ spin_unlock(&kvm->mmu_lock);
+
+ synchronize_srcu_expedited(&kvm->srcu);
+
+ resize_hpt_debug(resize, "resize_hpt_pivot() done\n");
}

static void resize_hpt_release(struct kvm *kvm, struct kvm_resize_hpt *resize)
{
BUG_ON(kvm->arch.resize_hpt != resize);
+
+ if (resize->hpt.virt)
+ kvmppc_free_hpt(&resize->hpt);
+
kvm->arch.resize_hpt = NULL;
kfree(resize);
}
--
2.9.3

2016-12-20 05:51:40

by David Gibson

Subject: [PATCHv2 05/11] powerpc/kvm: Don't store values derivable from HPT order

Currently the kvm_hpt_info structure stores the hashed page table's order,
and also the number of HPTEs it contains and a mask for its size. The
last two can be easily derived from the order, so remove them and just
calculate them as necessary with a couple of helper inlines.
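
As a concrete, standalone check of the arithmetic the new helpers encode
(illustration only, not kernel code): each HPTE is 16 bytes and each
128-byte HPTEG holds 8 of them, so for the default 16 MiB table (order 24):

#include <stdio.h>

int main(void)
{
    unsigned int order = 24;                        /* 2^24 bytes = 16 MiB */
    unsigned long npte = 1UL << (order - 4);        /* number of HPTEs */
    unsigned long mask = (1UL << (order - 7)) - 1;  /* HPTEG hash mask */

    printf("npte = %lu, mask = 0x%lx\n", npte, mask);  /* 1048576, 0x1ffff */
    return 0;
}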

Signed-off-by: David Gibson <[email protected]>
Reviewed-by: Thomas Huth <[email protected]>
---
arch/powerpc/include/asm/kvm_book3s_64.h | 12 ++++++++++++
arch/powerpc/include/asm/kvm_host.h | 2 --
arch/powerpc/kvm/book3s_64_mmu_hv.c | 28 +++++++++++++---------------
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 18 +++++++++---------
4 files changed, 34 insertions(+), 26 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8482921..8810327 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -350,6 +350,18 @@ extern void kvmppc_mmu_debugfs_init(struct kvm *kvm);

extern void kvmhv_rm_send_ipi(int cpu);

+static inline unsigned long kvmppc_hpt_npte(struct kvm_hpt_info *hpt)
+{
+ /* HPTEs are 2**4 bytes long */
+ return 1UL << (hpt->order - 4);
+}
+
+static inline unsigned long kvmppc_hpt_mask(struct kvm_hpt_info *hpt)
+{
+ /* 128 (2**7) bytes in each HPTEG */
+ return (1UL << (hpt->order - 7)) - 1;
+}
+
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */

#endif /* __ASM_KVM_BOOK3S_64_H__ */
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 074b17f..7576e0e 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -246,8 +246,6 @@ struct kvm_hpt_info {
unsigned long virt;
/* Array of reverse mapping entries for each guest HPTE */
struct revmap_entry *rev;
- unsigned long npte;
- unsigned long mask;
/* Guest HPT size is 2**(order) bytes */
u32 order;
/* 1 if HPT allocated with CMA, 0 otherwise */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index b5799d1..fe88132 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -83,15 +83,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)

kvm->arch.hpt.virt = hpt;
kvm->arch.hpt.order = order;
- /* HPTEs are 2**4 bytes long */
- kvm->arch.hpt.npte = 1ul << (order - 4);
- /* 128 (2**7) bytes in each HPTEG */
- kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;

atomic64_set(&kvm->arch.mmio_update, 0);

/* Allocate reverse map array */
- rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
+ rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
@@ -194,8 +190,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
if (npages > 1ul << (40 - porder))
npages = 1ul << (40 - porder);
/* Can't use more than 1 HPTE per HPTEG */
- if (npages > kvm->arch.hpt.mask + 1)
- npages = kvm->arch.hpt.mask + 1;
+ if (npages > kvmppc_hpt_mask(&kvm->arch.hpt) + 1)
+ npages = kvmppc_hpt_mask(&kvm->arch.hpt) + 1;

hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -205,7 +201,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
for (i = 0; i < npages; ++i) {
addr = i << porder;
/* can't use hpt_hash since va > 64 bits */
- hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask;
+ hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25)))
+ & kvmppc_hpt_mask(&kvm->arch.hpt);
/*
* We assume that the hash table is empty and no
* vcpus are using it at this stage. Since we create
@@ -1306,7 +1303,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,

/* Skip uninteresting entries, i.e. clean on not-first pass */
if (!first_pass) {
- while (i < kvm->arch.hpt.npte &&
+ while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
!hpte_dirty(revp, hptp)) {
++i;
hptp += 2;
@@ -1316,7 +1313,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
hdr.index = i;

/* Grab a series of valid entries */
- while (i < kvm->arch.hpt.npte &&
+ while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
hdr.n_valid < 0xffff &&
nb + HPTE_SIZE < count &&
record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
@@ -1332,7 +1329,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
++revp;
}
/* Now skip invalid entries while we can */
- while (i < kvm->arch.hpt.npte &&
+ while (i < kvmppc_hpt_npte(&kvm->arch.hpt) &&
hdr.n_invalid < 0xffff &&
record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
/* found an invalid entry */
@@ -1353,7 +1350,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
}

/* Check if we've wrapped around the hash table */
- if (i >= kvm->arch.hpt.npte) {
+ if (i >= kvmppc_hpt_npte(&kvm->arch.hpt)) {
i = 0;
ctx->first_pass = 0;
break;
@@ -1412,8 +1409,8 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,

err = -EINVAL;
i = hdr.index;
- if (i >= kvm->arch.hpt.npte ||
- i + hdr.n_valid + hdr.n_invalid > kvm->arch.hpt.npte)
+ if (i >= kvmppc_hpt_npte(&kvm->arch.hpt) ||
+ i + hdr.n_valid + hdr.n_invalid > kvmppc_hpt_npte(&kvm->arch.hpt))
break;

hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
@@ -1604,7 +1601,8 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
kvm = p->kvm;
i = p->hpt_index;
hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
- for (; len != 0 && i < kvm->arch.hpt.npte; ++i, hptp += 2) {
+ for (; len != 0 && i < kvmppc_hpt_npte(&kvm->arch.hpt);
+ ++i, hptp += 2) {
if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
continue;

diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 7e2b048..4ef3561 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -283,7 +283,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,

/* Find and lock the HPTEG slot to use */
do_insert:
- if (pte_index >= kvm->arch.hpt.npte)
+ if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
return H_PARAMETER;
if (likely((flags & H_EXACT) == 0)) {
pte_index &= ~7UL;
@@ -458,7 +458,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
struct revmap_entry *rev;
u64 pte, orig_pte, pte_r;

- if (pte_index >= kvm->arch.hpt.npte)
+ if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
return H_PARAMETER;
hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
@@ -544,7 +544,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
break;
}
if (req != 1 || flags == 3 ||
- pte_index >= kvm->arch.hpt.npte) {
+ pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt)) {
/* parameter error */
args[j] = ((0xa0 | flags) << 56) + pte_index;
ret = H_PARAMETER;
@@ -642,7 +642,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long v, r, rb, mask, bits;
u64 pte_v, pte_r;

- if (pte_index >= kvm->arch.hpt.npte)
+ if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
return H_PARAMETER;

hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
@@ -711,7 +711,7 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
int i, n = 1;
struct revmap_entry *rev = NULL;

- if (pte_index >= kvm->arch.hpt.npte)
+ if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
return H_PARAMETER;
if (flags & H_READ_4) {
pte_index &= ~3;
@@ -750,7 +750,7 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long *rmap;
long ret = H_NOT_FOUND;

- if (pte_index >= kvm->arch.hpt.npte)
+ if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
return H_PARAMETER;

rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
@@ -796,7 +796,7 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long *rmap;
long ret = H_NOT_FOUND;

- if (pte_index >= kvm->arch.hpt.npte)
+ if (pte_index >= kvmppc_hpt_npte(&kvm->arch.hpt))
return H_PARAMETER;

rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
@@ -949,7 +949,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
somask = (1UL << 28) - 1;
vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
}
- hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt.mask;
+ hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvmppc_hpt_mask(&kvm->arch.hpt);
avpn = slb_v & ~(somask >> 16); /* also includes B */
avpn |= (eaddr & somask) >> 16;

@@ -996,7 +996,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
if (val & HPTE_V_SECONDARY)
break;
val |= HPTE_V_SECONDARY;
- hash = hash ^ kvm->arch.hpt.mask;
+ hash = hash ^ kvmppc_hpt_mask(&kvm->arch.hpt);
}
return -1;
}
--
2.9.3

2016-12-20 05:51:42

by David Gibson

Subject: [PATCHv2 11/11] powerpc/kvm: Advertise availability of HPT resizing on KVM HV

This updates the KVM_CAP_SPAPR_RESIZE_HPT capability to advertise the
presence of in-kernel HPT resizing on KVM HV.
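
Userspace can then probe for the feature in the usual way before
attempting a resize; a minimal sketch, assuming the
KVM_CAP_SPAPR_RESIZE_HPT definition added earlier in this series:

#include <stdbool.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

static bool have_hpt_resize(int kvm_fd)
{
    return ioctl(kvm_fd, KVM_CHECK_EXTENSION, KVM_CAP_SPAPR_RESIZE_HPT) > 0;
}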

Signed-off-by: David Gibson <[email protected]>
---
arch/powerpc/kvm/powerpc.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index efd1183..1f32235 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -605,6 +605,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_SPAPR_MULTITCE:
r = 1;
break;
+ case KVM_CAP_SPAPR_RESIZE_HPT:
+ r = !!hv_enabled;
+ break;
#endif
case KVM_CAP_PPC_HTM:
r = cpu_has_feature(CPU_FTR_TM_COMP) &&
--
2.9.3

2016-12-20 05:51:35

by David Gibson

Subject: [PATCHv2 08/11] powerpc/kvm: Create kvmppc_unmap_hpte_helper()

The kvm_unmap_rmapp() function, called from certain MMU notifiers, is used
to force all guest mappings of a particular host page to be set ABSENT, and
removed from the reverse mappings.

For HPT resizing, we will have some cases where we want to set just a
single guest HPTE ABSENT and remove its reverse mappings. To prepare for
this, we split out the logic from kvm_unmap_rmapp() to evict a single
HPTE, moving it to a new helper function.

Signed-off-by: David Gibson <[email protected]>
---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 77 +++++++++++++++++++++----------------
1 file changed, 44 insertions(+), 33 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index 2765d04..74e909a 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -738,13 +738,53 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
return kvm_handle_hva_range(kvm, hva, hva + 1, handler);
}

+/* Must be called with both HPTE and rmap locked */
+static void kvmppc_unmap_hpte(struct kvm *kvm, unsigned long i,
+ unsigned long *rmapp, unsigned long gfn)
+{
+ __be64 *hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
+ struct revmap_entry *rev = kvm->arch.hpt.rev;
+ unsigned long j, h;
+ unsigned long ptel, psize, rcbits;
+
+ j = rev[i].forw;
+ if (j == i) {
+ /* chain is now empty */
+ *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
+ } else {
+ /* remove i from chain */
+ h = rev[i].back;
+ rev[h].forw = j;
+ rev[j].back = h;
+ rev[i].forw = rev[i].back = i;
+ *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
+ }
+
+ /* Now check and modify the HPTE */
+ ptel = rev[i].guest_rpte;
+ psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
+ if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
+ hpte_rpn(ptel, psize) == gfn) {
+ hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
+ kvmppc_invalidate_hpte(kvm, hptep, i);
+ hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
+ /* Harvest R and C */
+ rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
+ *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
+ if (rcbits & HPTE_R_C)
+ kvmppc_update_rmap_change(rmapp, psize);
+ if (rcbits & ~rev[i].guest_rpte) {
+ rev[i].guest_rpte = ptel | rcbits;
+ note_hpte_modification(kvm, &rev[i]);
+ }
+ }
+}
+
static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
unsigned long gfn)
{
- struct revmap_entry *rev = kvm->arch.hpt.rev;
- unsigned long h, i, j;
+ unsigned long i;
__be64 *hptep;
- unsigned long ptel, psize, rcbits;

for (;;) {
lock_rmap(rmapp);
@@ -767,37 +807,8 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
cpu_relax();
continue;
}
- j = rev[i].forw;
- if (j == i) {
- /* chain is now empty */
- *rmapp &= ~(KVMPPC_RMAP_PRESENT | KVMPPC_RMAP_INDEX);
- } else {
- /* remove i from chain */
- h = rev[i].back;
- rev[h].forw = j;
- rev[j].back = h;
- rev[i].forw = rev[i].back = i;
- *rmapp = (*rmapp & ~KVMPPC_RMAP_INDEX) | j;
- }

- /* Now check and modify the HPTE */
- ptel = rev[i].guest_rpte;
- psize = hpte_page_size(be64_to_cpu(hptep[0]), ptel);
- if ((be64_to_cpu(hptep[0]) & HPTE_V_VALID) &&
- hpte_rpn(ptel, psize) == gfn) {
- hptep[0] |= cpu_to_be64(HPTE_V_ABSENT);
- kvmppc_invalidate_hpte(kvm, hptep, i);
- hptep[1] &= ~cpu_to_be64(HPTE_R_KEY_HI | HPTE_R_KEY_LO);
- /* Harvest R and C */
- rcbits = be64_to_cpu(hptep[1]) & (HPTE_R_R | HPTE_R_C);
- *rmapp |= rcbits << KVMPPC_RMAP_RC_SHIFT;
- if (rcbits & HPTE_R_C)
- kvmppc_update_rmap_change(rmapp, psize);
- if (rcbits & ~rev[i].guest_rpte) {
- rev[i].guest_rpte = ptel | rcbits;
- note_hpte_modification(kvm, &rev[i]);
- }
- }
+ kvmppc_unmap_hpte(kvm, i, rmapp, gfn);
unlock_rmap(rmapp);
__unlock_hpte(hptep, be64_to_cpu(hptep[0]));
}
--
2.9.3

2016-12-20 05:51:32

by David Gibson

Subject: [PATCHv2 06/11] powerpc/kvm: Split HPT allocation from activation

Currently, kvmppc_alloc_hpt() both allocates a new hashed page table (HPT)
and sets it up as the active page table for a VM. For the upcoming HPT
resize implementation we're going to want to allocate HPTs separately from
activating them.

So, split the allocation itself out into kvmppc_allocate_hpt() and perform
the activation with a new kvmppc_set_hpt() function. Likewise, we split
kvmppc_free_hpt(), which just frees the HPT, from kvmppc_release_hpt(),
which unsets it as an active HPT and then frees it.

We also move the logic that falls back to smaller HPT sizes if the first
try fails into the single caller which used that behaviour,
kvmppc_hv_setup_htab_rma(). This introduces a slight semantic change:
previously, if the initial attempt at CMA allocation failed, we would
fall back to attempting smaller sizes with the page allocator. Now, we
try CMA first and then the page allocator at each size. As far as I can
tell this change should be harmless.

To match, we make kvmppc_free_hpt() just free the actual HPT itself; the
call to kvmppc_free_lpid() that was there is moved to its single caller.

Signed-off-by: David Gibson <[email protected]>
---
arch/powerpc/include/asm/kvm_book3s_64.h | 4 ++
arch/powerpc/include/asm/kvm_ppc.h | 5 +-
arch/powerpc/kvm/book3s_64_mmu_hv.c | 96 +++++++++++++++-----------------
arch/powerpc/kvm/book3s_hv.c | 18 +++++-
4 files changed, 68 insertions(+), 55 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s_64.h b/arch/powerpc/include/asm/kvm_book3s_64.h
index 8810327..bcb3e0a 100644
--- a/arch/powerpc/include/asm/kvm_book3s_64.h
+++ b/arch/powerpc/include/asm/kvm_book3s_64.h
@@ -22,6 +22,10 @@

#include <asm/book3s/64/mmu-hash.h>

+/* Power architecture requires HPT is at least 256kiB, at most 64TiB */
+#define PPC_MIN_HPT_ORDER 18
+#define PPC_MAX_HPT_ORDER 46
+
#ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
{
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 3db6be9..41575b8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -155,9 +155,10 @@ extern void kvmppc_core_destroy_mmu(struct kvm_vcpu *vcpu);
extern int kvmppc_kvm_pv(struct kvm_vcpu *vcpu);
extern void kvmppc_map_magic(struct kvm_vcpu *vcpu);

-extern long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp);
+extern int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order);
+extern void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info);
extern long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp);
-extern void kvmppc_free_hpt(struct kvm *kvm);
+extern void kvmppc_free_hpt(struct kvm_hpt_info *info);
extern long kvmppc_prepare_vrma(struct kvm *kvm,
struct kvm_userspace_memory_region *mem);
extern void kvmppc_map_vrma(struct kvm_vcpu *vcpu,
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index fe88132..4683d87 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -40,74 +40,66 @@

#include "trace_hv.h"

-/* Power architecture requires HPT is at least 256kB */
-#define PPC_MIN_HPT_ORDER 18
-
static long kvmppc_virtmode_do_h_enter(struct kvm *kvm, unsigned long flags,
long pte_index, unsigned long pteh,
unsigned long ptel, unsigned long *pte_idx_ret);
static void kvmppc_rmap_reset(struct kvm *kvm);

-long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
+int kvmppc_allocate_hpt(struct kvm_hpt_info *info, u32 order)
{
unsigned long hpt = 0;
- struct revmap_entry *rev;
+ int cma = 0;
struct page *page = NULL;
- long order = KVM_DEFAULT_HPT_ORDER;
+ struct revmap_entry *rev;
+ unsigned long npte;

- if (htab_orderp) {
- order = *htab_orderp;
- if (order < PPC_MIN_HPT_ORDER)
- order = PPC_MIN_HPT_ORDER;
- }
+ if ((order < PPC_MIN_HPT_ORDER) || (order > PPC_MAX_HPT_ORDER))
+ return -EINVAL;

- kvm->arch.hpt.cma = 0;
page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
- kvm->arch.hpt.cma = 1;
+ cma = 1;
}

- /* Lastly try successively smaller sizes from the page allocator */
- /* Only do this if userspace didn't specify a size via ioctl */
- while (!hpt && order > PPC_MIN_HPT_ORDER && !htab_orderp) {
- hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
- __GFP_NOWARN, order - PAGE_SHIFT);
- if (!hpt)
- --order;
- }
+ if (!hpt)
+ hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT
+ |__GFP_NOWARN, order - PAGE_SHIFT);

if (!hpt)
return -ENOMEM;

- kvm->arch.hpt.virt = hpt;
- kvm->arch.hpt.order = order;
-
- atomic64_set(&kvm->arch.mmio_update, 0);
+ /* HPTEs are 2**4 bytes long */
+ npte = 1ul << (order - 4);

/* Allocate reverse map array */
- rev = vmalloc(sizeof(struct revmap_entry) * kvmppc_hpt_npte(&kvm->arch.hpt));
+ rev = vmalloc(sizeof(struct revmap_entry) * npte);
if (!rev) {
- pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
- goto out_freehpt;
+ pr_err("kvmppc_allocate_hpt: Couldn't alloc reverse map array\n");
+ if (cma)
+ kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
+ else
+ free_pages(hpt, order - PAGE_SHIFT);
+ return -ENOMEM;
}
- kvm->arch.hpt.rev = rev;
- kvm->arch.sdr1 = __pa(hpt) | (order - 18);

- pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
- hpt, order, kvm->arch.lpid);
+ info->order = order;
+ info->virt = hpt;
+ info->cma = cma;
+ info->rev = rev;

- if (htab_orderp)
- *htab_orderp = order;
return 0;
+}

- out_freehpt:
- if (kvm->arch.hpt.cma)
- kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
- else
- free_pages(hpt, order - PAGE_SHIFT);
- return -ENOMEM;
+void kvmppc_set_hpt(struct kvm *kvm, struct kvm_hpt_info *info)
+{
+ atomic64_set(&kvm->arch.mmio_update, 0);
+ kvm->arch.hpt = *info;
+ kvm->arch.sdr1 = __pa(info->virt) | (info->order - 18);
+
+ pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
+ info->virt, (long)info->order, kvm->arch.lpid);
}

long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
@@ -138,24 +130,28 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
*htab_orderp = order;
err = 0;
} else {
- err = kvmppc_alloc_hpt(kvm, htab_orderp);
- order = *htab_orderp;
+ struct kvm_hpt_info info;
+
+ err = kvmppc_allocate_hpt(&info, *htab_orderp);
+ if (err < 0)
+ goto out;
+ kvmppc_set_hpt(kvm, &info);
}
out:
mutex_unlock(&kvm->lock);
return err;
}

-void kvmppc_free_hpt(struct kvm *kvm)
+void kvmppc_free_hpt(struct kvm_hpt_info *info)
{
- kvmppc_free_lpid(kvm->arch.lpid);
- vfree(kvm->arch.hpt.rev);
- if (kvm->arch.hpt.cma)
- kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt.virt),
- 1 << (kvm->arch.hpt.order - PAGE_SHIFT));
+ vfree(info->rev);
+ if (info->cma)
+ kvm_free_hpt_cma(virt_to_page(info->virt),
+ 1 << (info->order - PAGE_SHIFT));
else
- free_pages(kvm->arch.hpt.virt,
- kvm->arch.hpt.order - PAGE_SHIFT);
+ free_pages(info->virt, info->order - PAGE_SHIFT);
+ info->virt = 0;
+ info->order = 0;
}

/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 78baa2b..436508e 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3115,11 +3115,22 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)

/* Allocate hashed page table (if not done already) and reset it */
if (!kvm->arch.hpt.virt) {
- err = kvmppc_alloc_hpt(kvm, NULL);
- if (err) {
+ int order = KVM_DEFAULT_HPT_ORDER;
+ struct kvm_hpt_info info;
+
+ err = kvmppc_allocate_hpt(&info, order);
+ /* If we get here, it means userspace didn't specify a
+ * size explicitly. So, try successively smaller
+ * sizes if the default failed. */
+ while ((err == -ENOMEM) && --order >= PPC_MIN_HPT_ORDER)
+ err = kvmppc_allocate_hpt(&info, order);
+
+ if (err < 0) {
pr_err("KVM: Couldn't alloc HPT\n");
goto out;
}
+
+ kvmppc_set_hpt(kvm, &info);
}

/* Look up the memslot for guest physical address 0 */
@@ -3357,7 +3368,8 @@ static void kvmppc_core_destroy_vm_hv(struct kvm *kvm)

kvmppc_free_vcores(kvm);

- kvmppc_free_hpt(kvm);
+ kvmppc_free_lpid(kvm->arch.lpid);
+ kvmppc_free_hpt(&kvm->arch.hpt);

kvmppc_free_pimap(kvm);
}
--
2.9.3

2016-12-20 05:52:47

by David Gibson

Subject: [PATCHv2 04/11] powerpc/kvm: Gather HPT related variables into sub-structure

Currently, the powerpc kvm_arch structure contains a number of variables
tracking the state of the guest's hashed page table (HPT) in KVM HV. This
patch gathers them all together into a single kvm_hpt_info substructure.
This makes life more convenient for the upcoming HPT resizing
implementation.

Signed-off-by: David Gibson <[email protected]>
---
arch/powerpc/include/asm/kvm_host.h | 20 ++++++---
arch/powerpc/kvm/book3s_64_mmu_hv.c | 90 ++++++++++++++++++-------------------
arch/powerpc/kvm/book3s_hv.c | 2 +-
arch/powerpc/kvm/book3s_hv_rm_mmu.c | 62 ++++++++++++-------------
4 files changed, 91 insertions(+), 83 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index e59b172..074b17f 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -241,12 +241,24 @@ struct kvm_arch_memory_slot {
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
};

+struct kvm_hpt_info {
+ /* Host virtual (linear mapping) address of guest HPT */
+ unsigned long virt;
+ /* Array of reverse mapping entries for each guest HPTE */
+ struct revmap_entry *rev;
+ unsigned long npte;
+ unsigned long mask;
+ /* Guest HPT size is 2**(order) bytes */
+ u32 order;
+ /* 1 if HPT allocated with CMA, 0 otherwise */
+ int cma;
+};
+
struct kvm_arch {
unsigned int lpid;
#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
unsigned int tlb_sets;
- unsigned long hpt_virt;
- struct revmap_entry *revmap;
+ struct kvm_hpt_info hpt;
atomic64_t mmio_update;
unsigned int host_lpid;
unsigned long host_lpcr;
@@ -256,14 +268,10 @@ struct kvm_arch {
unsigned long lpcr;
unsigned long vrma_slb_v;
int hpte_setup_done;
- u32 hpt_order;
atomic_t vcpus_running;
u32 online_vcores;
- unsigned long hpt_npte;
- unsigned long hpt_mask;
atomic_t hpte_mod_interest;
cpumask_t need_tlb_flush;
- int hpt_cma_alloc;
struct dentry *debugfs_dir;
struct dentry *htab_dentry;
#endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index ae17cdd..b5799d1 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
@@ -61,12 +61,12 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
order = PPC_MIN_HPT_ORDER;
}

- kvm->arch.hpt_cma_alloc = 0;
+ kvm->arch.hpt.cma = 0;
page = kvm_alloc_hpt_cma(1ul << (order - PAGE_SHIFT));
if (page) {
hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page));
memset((void *)hpt, 0, (1ul << order));
- kvm->arch.hpt_cma_alloc = 1;
+ kvm->arch.hpt.cma = 1;
}

/* Lastly try successively smaller sizes from the page allocator */
@@ -81,22 +81,22 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
if (!hpt)
return -ENOMEM;

- kvm->arch.hpt_virt = hpt;
- kvm->arch.hpt_order = order;
+ kvm->arch.hpt.virt = hpt;
+ kvm->arch.hpt.order = order;
/* HPTEs are 2**4 bytes long */
- kvm->arch.hpt_npte = 1ul << (order - 4);
+ kvm->arch.hpt.npte = 1ul << (order - 4);
/* 128 (2**7) bytes in each HPTEG */
- kvm->arch.hpt_mask = (1ul << (order - 7)) - 1;
+ kvm->arch.hpt.mask = (1ul << (order - 7)) - 1;

atomic64_set(&kvm->arch.mmio_update, 0);

/* Allocate reverse map array */
- rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt_npte);
+ rev = vmalloc(sizeof(struct revmap_entry) * kvm->arch.hpt.npte);
if (!rev) {
pr_err("kvmppc_alloc_hpt: Couldn't alloc reverse map array\n");
goto out_freehpt;
}
- kvm->arch.revmap = rev;
+ kvm->arch.hpt.rev = rev;
kvm->arch.sdr1 = __pa(hpt) | (order - 18);

pr_info("KVM guest htab at %lx (order %ld), LPID %x\n",
@@ -107,7 +107,7 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
return 0;

out_freehpt:
- if (kvm->arch.hpt_cma_alloc)
+ if (kvm->arch.hpt.cma)
kvm_free_hpt_cma(page, 1 << (order - PAGE_SHIFT));
else
free_pages(hpt, order - PAGE_SHIFT);
@@ -129,10 +129,10 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
goto out;
}
}
- if (kvm->arch.hpt_virt) {
- order = kvm->arch.hpt_order;
+ if (kvm->arch.hpt.virt) {
+ order = kvm->arch.hpt.order;
/* Set the entire HPT to 0, i.e. invalid HPTEs */
- memset((void *)kvm->arch.hpt_virt, 0, 1ul << order);
+ memset((void *)kvm->arch.hpt.virt, 0, 1ul << order);
/*
* Reset all the reverse-mapping chains for all memslots
*/
@@ -153,13 +153,13 @@ long kvmppc_alloc_reset_hpt(struct kvm *kvm, u32 *htab_orderp)
void kvmppc_free_hpt(struct kvm *kvm)
{
kvmppc_free_lpid(kvm->arch.lpid);
- vfree(kvm->arch.revmap);
- if (kvm->arch.hpt_cma_alloc)
- kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt_virt),
- 1 << (kvm->arch.hpt_order - PAGE_SHIFT));
+ vfree(kvm->arch.hpt.rev);
+ if (kvm->arch.hpt.cma)
+ kvm_free_hpt_cma(virt_to_page(kvm->arch.hpt.virt),
+ 1 << (kvm->arch.hpt.order - PAGE_SHIFT));
else
- free_pages(kvm->arch.hpt_virt,
- kvm->arch.hpt_order - PAGE_SHIFT);
+ free_pages(kvm->arch.hpt.virt,
+ kvm->arch.hpt.order - PAGE_SHIFT);
}

/* Bits in first HPTE dword for pagesize 4k, 64k or 16M */
@@ -194,8 +194,8 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
if (npages > 1ul << (40 - porder))
npages = 1ul << (40 - porder);
/* Can't use more than 1 HPTE per HPTEG */
- if (npages > kvm->arch.hpt_mask + 1)
- npages = kvm->arch.hpt_mask + 1;
+ if (npages > kvm->arch.hpt.mask + 1)
+ npages = kvm->arch.hpt.mask + 1;

hp0 = HPTE_V_1TB_SEG | (VRMA_VSID << (40 - 16)) |
HPTE_V_BOLTED | hpte0_pgsize_encoding(psize);
@@ -205,7 +205,7 @@ void kvmppc_map_vrma(struct kvm_vcpu *vcpu, struct kvm_memory_slot *memslot,
for (i = 0; i < npages; ++i) {
addr = i << porder;
/* can't use hpt_hash since va > 64 bits */
- hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt_mask;
+ hash = (i ^ (VRMA_VSID ^ (VRMA_VSID << 25))) & kvm->arch.hpt.mask;
/*
* We assume that the hash table is empty and no
* vcpus are using it at this stage. Since we create
@@ -338,11 +338,11 @@ static int kvmppc_mmu_book3s_64_hv_xlate(struct kvm_vcpu *vcpu, gva_t eaddr,
preempt_enable();
return -ENOENT;
}
- hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
+ hptep = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
v = orig_v = be64_to_cpu(hptep[0]) & ~HPTE_V_HVLOCK;
if (cpu_has_feature(CPU_FTR_ARCH_300))
v = hpte_new_to_old_v(v, be64_to_cpu(hptep[1]));
- gr = kvm->arch.revmap[index].guest_rpte;
+ gr = kvm->arch.hpt.rev[index].guest_rpte;

unlock_hpte(hptep, orig_v);
preempt_enable();
@@ -480,8 +480,8 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu,
}
}
index = vcpu->arch.pgfault_index;
- hptep = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
- rev = &kvm->arch.revmap[index];
+ hptep = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
+ rev = &kvm->arch.hpt.rev[index];
preempt_disable();
while (!try_lock_hpte(hptep, HPTE_V_HVLOCK))
cpu_relax();
@@ -745,7 +745,7 @@ static int kvm_handle_hva(struct kvm *kvm, unsigned long hva,
static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
unsigned long gfn)
{
- struct revmap_entry *rev = kvm->arch.revmap;
+ struct revmap_entry *rev = kvm->arch.hpt.rev;
unsigned long h, i, j;
__be64 *hptep;
unsigned long ptel, psize, rcbits;
@@ -763,7 +763,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp,
* rmap chain lock.
*/
i = *rmapp & KVMPPC_RMAP_INDEX;
- hptep = (__be64 *) (kvm->arch.hpt_virt + (i << 4));
+ hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
if (!try_lock_hpte(hptep, HPTE_V_HVLOCK)) {
/* unlock rmap before spinning on the HPTE lock */
unlock_rmap(rmapp);
@@ -846,7 +846,7 @@ void kvmppc_core_flush_memslot_hv(struct kvm *kvm,
static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
unsigned long gfn)
{
- struct revmap_entry *rev = kvm->arch.revmap;
+ struct revmap_entry *rev = kvm->arch.hpt.rev;
unsigned long head, i, j;
__be64 *hptep;
int ret = 0;
@@ -864,7 +864,7 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,

i = head = *rmapp & KVMPPC_RMAP_INDEX;
do {
- hptep = (__be64 *) (kvm->arch.hpt_virt + (i << 4));
+ hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
j = rev[i].forw;

/* If this HPTE isn't referenced, ignore it */
@@ -904,7 +904,7 @@ int kvm_age_hva_hv(struct kvm *kvm, unsigned long start, unsigned long end)
static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
unsigned long gfn)
{
- struct revmap_entry *rev = kvm->arch.revmap;
+ struct revmap_entry *rev = kvm->arch.hpt.rev;
unsigned long head, i, j;
unsigned long *hp;
int ret = 1;
@@ -919,7 +919,7 @@ static int kvm_test_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
if (*rmapp & KVMPPC_RMAP_PRESENT) {
i = head = *rmapp & KVMPPC_RMAP_INDEX;
do {
- hp = (unsigned long *)(kvm->arch.hpt_virt + (i << 4));
+ hp = (unsigned long *)(kvm->arch.hpt.virt + (i << 4));
j = rev[i].forw;
if (be64_to_cpu(hp[1]) & HPTE_R_R)
goto out;
@@ -953,7 +953,7 @@ static int vcpus_running(struct kvm *kvm)
*/
static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
{
- struct revmap_entry *rev = kvm->arch.revmap;
+ struct revmap_entry *rev = kvm->arch.hpt.rev;
unsigned long head, i, j;
unsigned long n;
unsigned long v, r;
@@ -978,7 +978,7 @@ static int kvm_test_clear_dirty_npages(struct kvm *kvm, unsigned long *rmapp)
i = head = *rmapp & KVMPPC_RMAP_INDEX;
do {
unsigned long hptep1;
- hptep = (__be64 *) (kvm->arch.hpt_virt + (i << 4));
+ hptep = (__be64 *) (kvm->arch.hpt.virt + (i << 4));
j = rev[i].forw;

/*
@@ -1290,8 +1290,8 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
flags = ctx->flags;

i = ctx->index;
- hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
- revp = kvm->arch.revmap + i;
+ hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
+ revp = kvm->arch.hpt.rev + i;
lbuf = (unsigned long __user *)buf;

nb = 0;
@@ -1306,7 +1306,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,

/* Skip uninteresting entries, i.e. clean on not-first pass */
if (!first_pass) {
- while (i < kvm->arch.hpt_npte &&
+ while (i < kvm->arch.hpt.npte &&
!hpte_dirty(revp, hptp)) {
++i;
hptp += 2;
@@ -1316,7 +1316,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
hdr.index = i;

/* Grab a series of valid entries */
- while (i < kvm->arch.hpt_npte &&
+ while (i < kvm->arch.hpt.npte &&
hdr.n_valid < 0xffff &&
nb + HPTE_SIZE < count &&
record_hpte(flags, hptp, hpte, revp, 1, first_pass)) {
@@ -1332,7 +1332,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
++revp;
}
/* Now skip invalid entries while we can */
- while (i < kvm->arch.hpt_npte &&
+ while (i < kvm->arch.hpt.npte &&
hdr.n_invalid < 0xffff &&
record_hpte(flags, hptp, hpte, revp, 0, first_pass)) {
/* found an invalid entry */
@@ -1353,7 +1353,7 @@ static ssize_t kvm_htab_read(struct file *file, char __user *buf,
}

/* Check if we've wrapped around the hash table */
- if (i >= kvm->arch.hpt_npte) {
+ if (i >= kvm->arch.hpt.npte) {
i = 0;
ctx->first_pass = 0;
break;
@@ -1412,11 +1412,11 @@ static ssize_t kvm_htab_write(struct file *file, const char __user *buf,

err = -EINVAL;
i = hdr.index;
- if (i >= kvm->arch.hpt_npte ||
- i + hdr.n_valid + hdr.n_invalid > kvm->arch.hpt_npte)
+ if (i >= kvm->arch.hpt.npte ||
+ i + hdr.n_valid + hdr.n_invalid > kvm->arch.hpt.npte)
break;

- hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
+ hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
lbuf = (unsigned long __user *)buf;
for (j = 0; j < hdr.n_valid; ++j) {
__be64 hpte_v;
@@ -1603,8 +1603,8 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,

kvm = p->kvm;
i = p->hpt_index;
- hptp = (__be64 *)(kvm->arch.hpt_virt + (i * HPTE_SIZE));
- for (; len != 0 && i < kvm->arch.hpt_npte; ++i, hptp += 2) {
+ hptp = (__be64 *)(kvm->arch.hpt.virt + (i * HPTE_SIZE));
+ for (; len != 0 && i < kvm->arch.hpt.npte; ++i, hptp += 2) {
if (!(be64_to_cpu(hptp[0]) & (HPTE_V_VALID | HPTE_V_ABSENT)))
continue;

@@ -1614,7 +1614,7 @@ static ssize_t debugfs_htab_read(struct file *file, char __user *buf,
cpu_relax();
v = be64_to_cpu(hptp[0]) & ~HPTE_V_HVLOCK;
hr = be64_to_cpu(hptp[1]);
- gr = kvm->arch.revmap[i].guest_rpte;
+ gr = kvm->arch.hpt.rev[i].guest_rpte;
unlock_hpte(hptp, v);
preempt_enable();

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 8dcbe37..78baa2b 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3114,7 +3114,7 @@ static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu)
goto out; /* another vcpu beat us to it */

/* Allocate hashed page table (if not done already) and reset it */
- if (!kvm->arch.hpt_virt) {
+ if (!kvm->arch.hpt.virt) {
err = kvmppc_alloc_hpt(kvm, NULL);
if (err) {
pr_err("KVM: Couldn't alloc HPT\n");
diff --git a/arch/powerpc/kvm/book3s_hv_rm_mmu.c b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
index 9ef3c4b..7e2b048 100644
--- a/arch/powerpc/kvm/book3s_hv_rm_mmu.c
+++ b/arch/powerpc/kvm/book3s_hv_rm_mmu.c
@@ -79,10 +79,10 @@ void kvmppc_add_revmap_chain(struct kvm *kvm, struct revmap_entry *rev,

if (*rmap & KVMPPC_RMAP_PRESENT) {
i = *rmap & KVMPPC_RMAP_INDEX;
- head = &kvm->arch.revmap[i];
+ head = &kvm->arch.hpt.rev[i];
if (realmode)
head = real_vmalloc_addr(head);
- tail = &kvm->arch.revmap[head->back];
+ tail = &kvm->arch.hpt.rev[head->back];
if (realmode)
tail = real_vmalloc_addr(tail);
rev->forw = i;
@@ -147,8 +147,8 @@ static void remove_revmap_chain(struct kvm *kvm, long pte_index,
lock_rmap(rmap);

head = *rmap & KVMPPC_RMAP_INDEX;
- next = real_vmalloc_addr(&kvm->arch.revmap[rev->forw]);
- prev = real_vmalloc_addr(&kvm->arch.revmap[rev->back]);
+ next = real_vmalloc_addr(&kvm->arch.hpt.rev[rev->forw]);
+ prev = real_vmalloc_addr(&kvm->arch.hpt.rev[rev->back]);
next->back = rev->back;
prev->forw = rev->forw;
if (head == pte_index) {
@@ -283,11 +283,11 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,

/* Find and lock the HPTEG slot to use */
do_insert:
- if (pte_index >= kvm->arch.hpt_npte)
+ if (pte_index >= kvm->arch.hpt.npte)
return H_PARAMETER;
if (likely((flags & H_EXACT) == 0)) {
pte_index &= ~7UL;
- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
for (i = 0; i < 8; ++i) {
if ((be64_to_cpu(*hpte) & HPTE_V_VALID) == 0 &&
try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
@@ -318,7 +318,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
}
pte_index += i;
} else {
- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
if (!try_lock_hpte(hpte, HPTE_V_HVLOCK | HPTE_V_VALID |
HPTE_V_ABSENT)) {
/* Lock the slot and check again */
@@ -335,7 +335,7 @@ long kvmppc_do_h_enter(struct kvm *kvm, unsigned long flags,
}

/* Save away the guest's idea of the second HPTE dword */
- rev = &kvm->arch.revmap[pte_index];
+ rev = &kvm->arch.hpt.rev[pte_index];
if (realmode)
rev = real_vmalloc_addr(rev);
if (rev) {
@@ -458,9 +458,9 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
struct revmap_entry *rev;
u64 pte, orig_pte, pte_r;

- if (pte_index >= kvm->arch.hpt_npte)
+ if (pte_index >= kvm->arch.hpt.npte)
return H_PARAMETER;
- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
pte = orig_pte = be64_to_cpu(hpte[0]);
@@ -476,7 +476,7 @@ long kvmppc_do_h_remove(struct kvm *kvm, unsigned long flags,
return H_NOT_FOUND;
}

- rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
v = pte & ~HPTE_V_HVLOCK;
if (v & HPTE_V_VALID) {
hpte[0] &= ~cpu_to_be64(HPTE_V_VALID);
@@ -544,13 +544,13 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
break;
}
if (req != 1 || flags == 3 ||
- pte_index >= kvm->arch.hpt_npte) {
+ pte_index >= kvm->arch.hpt.npte) {
/* parameter error */
args[j] = ((0xa0 | flags) << 56) + pte_index;
ret = H_PARAMETER;
break;
}
- hp = (__be64 *) (kvm->arch.hpt_virt + (pte_index << 4));
+ hp = (__be64 *) (kvm->arch.hpt.virt + (pte_index << 4));
/* to avoid deadlock, don't spin except for first */
if (!try_lock_hpte(hp, HPTE_V_HVLOCK)) {
if (n)
@@ -587,7 +587,7 @@ long kvmppc_h_bulk_remove(struct kvm_vcpu *vcpu)
}

args[j] = ((0x80 | flags) << 56) + pte_index;
- rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
note_hpte_modification(kvm, rev);

if (!(hp0 & HPTE_V_VALID)) {
@@ -642,10 +642,10 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long v, r, rb, mask, bits;
u64 pte_v, pte_r;

- if (pte_index >= kvm->arch.hpt_npte)
+ if (pte_index >= kvm->arch.hpt.npte)
return H_PARAMETER;

- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
v = pte_v = be64_to_cpu(hpte[0]);
@@ -665,7 +665,7 @@ long kvmppc_h_protect(struct kvm_vcpu *vcpu, unsigned long flags,
/* Update guest view of 2nd HPTE dword */
mask = HPTE_R_PP0 | HPTE_R_PP | HPTE_R_N |
HPTE_R_KEY_HI | HPTE_R_KEY_LO;
- rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
if (rev) {
r = (rev->guest_rpte & ~mask) | bits;
rev->guest_rpte = r;
@@ -711,15 +711,15 @@ long kvmppc_h_read(struct kvm_vcpu *vcpu, unsigned long flags,
int i, n = 1;
struct revmap_entry *rev = NULL;

- if (pte_index >= kvm->arch.hpt_npte)
+ if (pte_index >= kvm->arch.hpt.npte)
return H_PARAMETER;
if (flags & H_READ_4) {
pte_index &= ~3;
n = 4;
}
- rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
for (i = 0; i < n; ++i, ++pte_index) {
- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
r = be64_to_cpu(hpte[1]);
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
@@ -750,11 +750,11 @@ long kvmppc_h_clear_ref(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long *rmap;
long ret = H_NOT_FOUND;

- if (pte_index >= kvm->arch.hpt_npte)
+ if (pte_index >= kvm->arch.hpt.npte)
return H_PARAMETER;

- rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
v = be64_to_cpu(hpte[0]);
@@ -796,11 +796,11 @@ long kvmppc_h_clear_mod(struct kvm_vcpu *vcpu, unsigned long flags,
unsigned long *rmap;
long ret = H_NOT_FOUND;

- if (pte_index >= kvm->arch.hpt_npte)
+ if (pte_index >= kvm->arch.hpt.npte)
return H_PARAMETER;

- rev = real_vmalloc_addr(&kvm->arch.revmap[pte_index]);
- hpte = (__be64 *)(kvm->arch.hpt_virt + (pte_index << 4));
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[pte_index]);
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (pte_index << 4));
while (!try_lock_hpte(hpte, HPTE_V_HVLOCK))
cpu_relax();
v = be64_to_cpu(hpte[0]);
@@ -949,7 +949,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
somask = (1UL << 28) - 1;
vsid = (slb_v & ~SLB_VSID_B) >> SLB_VSID_SHIFT;
}
- hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt_mask;
+ hash = (vsid ^ ((eaddr & somask) >> pshift)) & kvm->arch.hpt.mask;
avpn = slb_v & ~(somask >> 16); /* also includes B */
avpn |= (eaddr & somask) >> 16;

@@ -960,7 +960,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
val |= avpn;

for (;;) {
- hpte = (__be64 *)(kvm->arch.hpt_virt + (hash << 7));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (hash << 7));

for (i = 0; i < 16; i += 2) {
/* Read the PTE racily */
@@ -996,7 +996,7 @@ long kvmppc_hv_find_lock_hpte(struct kvm *kvm, gva_t eaddr, unsigned long slb_v,
if (val & HPTE_V_SECONDARY)
break;
val |= HPTE_V_SECONDARY;
- hash = hash ^ kvm->arch.hpt_mask;
+ hash = hash ^ kvm->arch.hpt.mask;
}
return -1;
}
@@ -1045,14 +1045,14 @@ long kvmppc_hpte_hv_fault(struct kvm_vcpu *vcpu, unsigned long addr,
return status; /* there really was no HPTE */
return 0; /* for prot fault, HPTE disappeared */
}
- hpte = (__be64 *)(kvm->arch.hpt_virt + (index << 4));
+ hpte = (__be64 *)(kvm->arch.hpt.virt + (index << 4));
v = orig_v = be64_to_cpu(hpte[0]) & ~HPTE_V_HVLOCK;
r = be64_to_cpu(hpte[1]);
if (cpu_has_feature(CPU_FTR_ARCH_300)) {
v = hpte_new_to_old_v(v, r);
r = hpte_new_to_old_r(r);
}
- rev = real_vmalloc_addr(&kvm->arch.revmap[index]);
+ rev = real_vmalloc_addr(&kvm->arch.hpt.rev[index]);
gr = rev->guest_rpte;

unlock_hpte(hpte, orig_v);
--
2.9.3

2016-12-20 05:52:52

by David Gibson

[permalink] [raw]
Subject: [PATCHv2 02/11] powerpc/kvm: HPT resizing documentation and reserved numbers

This adds a new powerpc-specific KVM_CAP_SPAPR_RESIZE_HPT capability to
advertise whether KVM is capable of handling the PAPR extensions for
resizing the hashed page table during guest runtime. It also adds
definitions for two new VM ioctl()s to implement this extension, and
documentation of the same.

Note that HPT resizing is already possible with KVM PR without kernel
modification, since the HPT is managed within userspace (qemu). The
capability defined here will only be set where an in-kernel implementation
of resizing is necessary, i.e. for KVM HV. To determine if the userspace
resize implementation can be used, it's necessary to check
KVM_CAP_PPC_ALLOC_HTAB. Unfortunately older kernels incorrectly set
KVM_CAP_PPC_ALLOC_HTAB even with KVM PR. If userspace wants to support
resizing with KVM PR on such kernels, it will need a workaround.

Signed-off-by: David Gibson <[email protected]>
---
Documentation/virtual/kvm/api.txt | 95 +++++++++++++++++++++++++++++++++++++++
include/uapi/linux/kvm.h | 11 +++++
2 files changed, 106 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 42be7b1..f452c72 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -3201,6 +3201,101 @@ struct kvm_reinject_control {
pit_reinject = 0 (!reinject mode) is recommended, unless running an old
operating system that uses the PIT for timing (e.g. Linux 2.4.x).

+4.100 KVM_PPC_RESIZE_HPT_PREPARE
+
+Capability: KVM_CAP_SPAPR_RESIZE_HPT
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_ppc_resize_hpt (in)
+Returns: 0 on successful completion,
+ >0 if a new HPT is being prepared, the value is an estimated
+ number of milliseconds until preparation is complete
+ -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
+ -EINVAL if the supplied shift or flags are invalid
+ -ENOMEM if unable to allocate the new HPT
+ -ENOSPC if there was a hash collision when moving existing
+ HPT entries to the new HPT
+ -EIO on other error conditions
+
+Used to implement the PAPR extension for runtime resizing of a guest's
+Hashed Page Table (HPT). Specifically this starts, stops or monitors
+the preparation of a new potential HPT for the guest, essentially
+implementing the H_RESIZE_HPT_PREPARE hypercall.
+
+If called with shift > 0 when there is no pending HPT for the guest,
+this begins preparation of a new pending HPT of size 2^(shift) bytes.
+It then returns a positive integer with the estimated number of
+milliseconds until preparation is complete.
+
+If called when there is a pending HPT whose size does not match that
+requested in the parameters, discards the existing pending HPT and
+creates a new one as above.
+
+If called when there is a pending HPT of the size requested, will:
+ * If preparation of the pending HPT is already complete, return 0
+ * If preparation of the pending HPT has failed, return an error
+ code, then discard the pending HPT.
+ * If preparation of the pending HPT is still in progress, return an
+ estimated number of milliseconds until preparation is complete.
+
+If called with shift == 0, discards any currently pending HPT and
+returns 0 (i.e. cancels any in-progress preparation).
+
+flags is reserved for future expansion; currently, setting any bits in
+flags will result in -EINVAL.
+
+Normally this will be called repeatedly with the same parameters until
+it returns <= 0. The first call will initiate preparation, subsequent
+ones will monitor preparation until it completes or fails.
+
+struct kvm_ppc_resize_hpt {
+ __u64 flags;
+ __u32 shift;
+ __u32 pad;
+};
+
+4.101 KVM_PPC_RESIZE_HPT_COMMIT
+
+Capability: KVM_CAP_SPAPR_RESIZE_HPT
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_ppc_resize_hpt (in)
+Returns: 0 on successful completion,
+ -EFAULT if struct kvm_ppc_resize_hpt cannot be read,
+ -EINVAL if the supplied shift or flags are invalid
+ -ENXIO if there is no pending HPT, or the pending HPT doesn't
+ have the requested size
+ -EBUSY if the pending HPT is not fully prepared
+ -ENOSPC if there was a hash collision when moving existing
+ HPT entries to the new HPT
+ -EIO on other error conditions
+
+Used to implement the PAPR extension for runtime resizing of a guest's
+Hashed Page Table (HPT). Specifically this requests that the guest be
+transferred to working with the new HPT, essentially implementing the
+H_RESIZE_HPT_COMMIT hypercall.
+
+This should only be called after KVM_PPC_RESIZE_HPT_PREPARE has
+returned 0 with the same parameters. In other cases
+KVM_PPC_RESIZE_HPT_COMMIT will return an error (usually -ENXIO or
+-EBUSY, though others may be possible if the preparation was started,
+but failed).
+
+This will have undefined effects on the guest if it has not already
+placed itself in a quiescent state where no vcpu will make MMU enabled
+memory accesses.
+
+On successful completion, the pending HPT will become the guest's active
+HPT and the previous HPT will be discarded.
+
+On failure, the guest will still be operating on its previous HPT.
+
+struct kvm_ppc_resize_hpt {
+ __u64 flags;
+ __u32 shift;
+ __u32 pad;
+};
+
5. The kvm_run structure
------------------------

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index cac48ed..50ee1c7 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -685,6 +685,13 @@ struct kvm_ppc_smmu_info {
struct kvm_ppc_one_seg_page_size sps[KVM_PPC_PAGE_SIZES_MAX_SZ];
};

+/* for KVM_PPC_RESIZE_HPT_{PREPARE,COMMIT} */
+struct kvm_ppc_resize_hpt {
+ __u64 flags;
+ __u32 shift;
+ __u32 pad;
+};
+
#define KVMIO 0xAE

/* machine type bits, to be used as argument to KVM_CREATE_VM */
@@ -871,6 +878,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_S390_USER_INSTR0 130
#define KVM_CAP_MSI_DEVID 131
#define KVM_CAP_PPC_HTM 132
+#define KVM_CAP_SPAPR_RESIZE_HPT 133

#ifdef KVM_CAP_IRQ_ROUTING

@@ -1187,6 +1195,9 @@ struct kvm_s390_ucas_mapping {
#define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
/* Available with KVM_CAP_PPC_RTAS */
#define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
+/* Available with KVM_CAP_SPAPR_RESIZE_HPT */
+#define KVM_PPC_RESIZE_HPT_PREPARE _IOR(KVMIO, 0xad, struct kvm_ppc_resize_hpt)
+#define KVM_PPC_RESIZE_HPT_COMMIT _IOR(KVMIO, 0xae, struct kvm_ppc_resize_hpt)

/* ioctl for vm fd */
#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
--
2.9.3
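
For reference, a minimal userspace sketch of the PREPARE/COMMIT sequence
documented above could look like the following; vm_fd is an assumed KVM
VM file descriptor, the guest is assumed to already be quiesced before
the commit, and error handling is simplified:

#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical helper: resize the guest HPT to 2^shift bytes */
static int resize_hpt(int vm_fd, __u32 shift)
{
	struct kvm_ppc_resize_hpt rhpt = {
		.flags = 0,		/* must currently be zero */
		.shift = shift,
	};
	int ret;

	/* Poll PREPARE until the pending HPT is ready (0) or it fails (< 0) */
	do {
		ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_PREPARE, &rhpt);
		if (ret > 0)
			usleep(ret * 1000);	/* estimated milliseconds remaining */
	} while (ret > 0);
	if (ret < 0)
		return -errno;		/* e.g. -ENOMEM, -ENOSPC, -EIO */

	/* No vcpu may make MMU enabled memory accesses past this point */
	ret = ioctl(vm_fd, KVM_PPC_RESIZE_HPT_COMMIT, &rhpt);
	return (ret < 0) ? -errno : 0;
}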