This accelerates VFIO DMA operations on POWER by moving them
into the kernel.
The changes in this series are:
1. rebased on v3.11-rc3.
2. The VFIO external user API will go through the VFIO tree, so it is
excluded from this series.
3. As nobody ever responded to "hashtable: add hash_for_each_possible_rcu_notrace()",
Ben suggested pushing it via his tree, so I included it in this series.
4. realmode_(get|put)_page is reworked.
More details in the individual patch comments.
Alexey Kardashevskiy (10):
hashtable: add hash_for_each_possible_rcu_notrace()
KVM: PPC: reserve a capability number for multitce support
KVM: PPC: reserve a capability and ioctl numbers for realmode VFIO
powerpc: Prepare to support kernel handling of IOMMU map/unmap
powerpc: add real mode support for dma operations on powernv
KVM: PPC: enable IOMMU_API for KVM_BOOK3S_64 permanently
KVM: PPC: Add support for multiple-TCE hcalls
powerpc/iommu: rework to support realmode
KVM: PPC: Add support for IOMMU in-kernel handling
KVM: PPC: Add hugepage support for IOMMU in-kernel handling
Documentation/virtual/kvm/api.txt | 52 +++
arch/powerpc/include/asm/iommu.h | 9 +-
arch/powerpc/include/asm/kvm_host.h | 37 +++
arch/powerpc/include/asm/kvm_ppc.h | 18 +-
arch/powerpc/include/asm/machdep.h | 12 +
arch/powerpc/include/asm/pgtable-ppc64.h | 2 +
arch/powerpc/include/uapi/asm/kvm.h | 7 +
arch/powerpc/kernel/iommu.c | 202 +++++++----
arch/powerpc/kvm/Kconfig | 1 +
arch/powerpc/kvm/book3s_64_vio.c | 533 +++++++++++++++++++++++++++++-
arch/powerpc/kvm/book3s_64_vio_hv.c | 405 +++++++++++++++++++++--
arch/powerpc/kvm/book3s_hv.c | 41 ++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +-
arch/powerpc/kvm/book3s_pr_papr.c | 35 ++
arch/powerpc/kvm/powerpc.c | 15 +
arch/powerpc/mm/init_64.c | 50 ++-
arch/powerpc/platforms/powernv/pci-ioda.c | 47 ++-
arch/powerpc/platforms/powernv/pci.c | 38 ++-
arch/powerpc/platforms/powernv/pci.h | 3 +-
include/linux/hashtable.h | 15 +
include/linux/mm.h | 14 +
include/linux/page-flags.h | 4 +-
include/uapi/linux/kvm.h | 5 +
23 files changed, 1430 insertions(+), 123 deletions(-)
--
1.8.3.2
This adds hash_for_each_possible_rcu_notrace(), which is basically
a notrace clone of hash_for_each_possible_rcu(); the latter cannot be
used in real mode due to its tracing/debugging capability.
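For illustration only (struct foo, foo_hash and foo_lookup_rm are made-up
names, not part of this patch), a lookup that must run with the MMU off
could use the new iterator like this:

        #include <linux/hashtable.h>

        struct foo {
                unsigned long key;
                struct hlist_node hash_node;
        };

        static DEFINE_HASHTABLE(foo_hash, 6);

        /* Real mode safe: no RCU tracing/debugging code is executed */
        static struct foo *foo_lookup_rm(unsigned long key)
        {
                struct foo *f;

                hash_for_each_possible_rcu_notrace(foo_hash, f, hash_node, key) {
                        if (f->key == key)
                                return f;
                }
                return NULL;
        }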
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
include/linux/hashtable.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/linux/hashtable.h b/include/linux/hashtable.h
index a9df51f..af8b169 100644
--- a/include/linux/hashtable.h
+++ b/include/linux/hashtable.h
@@ -174,6 +174,21 @@ static inline void hash_del_rcu(struct hlist_node *node)
member)
/**
+ * hash_for_each_possible_rcu_notrace - iterate over all possible objects hashing
+ * to the same bucket in an rcu enabled hashtable
+ * @name: hashtable to iterate
+ * @obj: the type * to use as a loop cursor for each entry
+ * @member: the name of the hlist_node within the struct
+ * @key: the key of the objects to iterate over
+ *
+ * This is the same as hash_for_each_possible_rcu() except that it does
+ * not do any RCU debugging or tracing.
+ */
+#define hash_for_each_possible_rcu_notrace(name, obj, member, key) \
+ hlist_for_each_entry_rcu_notrace(obj, &name[hash_min(key, HASH_BITS(name))],\
+ member)
+
+/**
* hash_for_each_possible_safe - iterate over all possible objects hashing to the
* same bucket safe against removals
* @name: hashtable to iterate
--
1.8.3.2
This is to reserve a capability and an ioctl number for upcoming support
of VFIO-IOMMU DMA operations in real mode.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
2013/07/16:
* changed the number
2013/07/11:
* changed order in a file, added comment about a gap in ioctl number
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
include/uapi/linux/kvm.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 99c2533..53c3f1f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_IRQ_XICS 92
#define KVM_CAP_ARM_EL1_32BIT 93
#define KVM_CAP_SPAPR_MULTITCE 94
+#define KVM_CAP_SPAPR_TCE_IOMMU 95
#ifdef KVM_CAP_IRQ_ROUTING
@@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
#define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
/* Available with KVM_CAP_PPC_RTAS */
#define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
+/* 0xad and 0xae are already taken */
+/* Available with KVM_CAP_SPAPR_TCE_IOMMU */
+#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xaf, struct kvm_create_spapr_tce_iommu)
/* ioctl for vm fd */
#define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
--
1.8.3.2
This is to reserve a capability number for upcoming support
of the H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls
which support multiple DMA map/unmap operations in a single call.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
2013/07/16:
* changed the number
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
include/uapi/linux/kvm.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index acccd08..99c2533 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
#define KVM_CAP_PPC_RTAS 91
#define KVM_CAP_IRQ_XICS 92
#define KVM_CAP_ARM_EL1_32BIT 93
+#define KVM_CAP_SPAPR_MULTITCE 94
#ifdef KVM_CAP_IRQ_ROUTING
--
1.8.3.2
The current VFIO-on-POWER implementation supports only user mode
driven mapping, i.e. QEMU sends requests to map/unmap pages.
However, this approach is really slow, so we want to move that into KVM.
Since H_PUT_TCE can be extremely performance sensitive (especially with
network adapters where each packet needs to be mapped/unmapped) we chose
to implement that as a "fast" hypercall directly in "real
mode" (processor still in the guest context but MMU off).
To be able to do that, we need to provide some facilities to
access the struct page count within that real mode environment as things
like the sparsemem vmemmap mappings aren't accessible.
This adds an API function realmode_pfn_to_page() to get page struct when
MMU is off.
This adds to MM a new function, put_page_unless_one(), which drops a page
reference unless the reference counter is one. It is going to be used when
the MMU is off (for example, in real mode on PPC64) and we want to make sure
that the final page release will not happen in real mode as it may crash
the kernel in a horrible way.
CONFIG_SPARSEMEM_VMEMMAP and CONFIG_FLATMEM are supported.
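For illustration, a real mode user of these helpers (such as the TCE
handlers later in this series) is expected to follow roughly this pattern;
rm_access_page() is a made-up name and error handling is simplified:

        static long rm_access_page(unsigned long pfn)
        {
                struct page *pg = realmode_pfn_to_page(pfn);

                /* Huge pages and split page structs cannot be handled here */
                if (!pg || PageCompound(pg) || !get_page_unless_zero(pg))
                        return H_TOO_HARD;      /* redo in virtual mode */

                /* ... access the page via its physical address ... */

                /* Never drop the last reference with the MMU off */
                if (!put_page_unless_one(pg))
                        return H_TOO_HARD;      /* let virtual mode put it */

                return H_SUCCESS;
        }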
Cc: [email protected]
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Andrew Morton <[email protected]>
Reviewed-by: Paul Mackerras <[email protected]>
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
2013/07/25 (v7):
* removed realmode_put_page and added put_page_unless_one() instead.
The name has been chosen to conform to the already existing
get_page_unless_zero().
* removed realmode_get_page. Instead, get_page_unless_zero() should be used
2013/07/10:
* adjusted comment (removed sentence about virtual mode)
* get_page_unless_zero replaced with atomic_inc_not_zero to minimize
the effect of a possible get_page_unless_zero() rework (if it ever happens).
2013/06/27:
* realmode_get_page() fixed to use get_page_unless_zero(). If failed,
the call will be passed from real to virtual mode and safely handled.
* added comment to PageCompound() in include/linux/page-flags.h.
2013/05/20:
* PageTail() is replaced by PageCompound() in order to have the same checks
for whether the page is huge in realmode_get_page() and realmode_put_page()
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/pgtable-ppc64.h | 2 ++
arch/powerpc/mm/init_64.c | 50 +++++++++++++++++++++++++++++++-
include/linux/mm.h | 14 +++++++++
include/linux/page-flags.h | 4 ++-
4 files changed, 68 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index 46db094..4a191c4 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -394,6 +394,8 @@ static inline void mark_hpte_slot_valid(unsigned char *hpte_slot_array,
hpte_slot_array[index] = hidx << 4 | 0x1 << 3;
}
+struct page *realmode_pfn_to_page(unsigned long pfn);
+
static inline char *get_hpte_slot_array(pmd_t *pmdp)
{
/*
diff --git a/arch/powerpc/mm/init_64.c b/arch/powerpc/mm/init_64.c
index d0cd9e4..8cf345a 100644
--- a/arch/powerpc/mm/init_64.c
+++ b/arch/powerpc/mm/init_64.c
@@ -300,5 +300,53 @@ void vmemmap_free(unsigned long start, unsigned long end)
{
}
-#endif /* CONFIG_SPARSEMEM_VMEMMAP */
+/*
+ * We do not have access to the sparsemem vmemmap, so we fall back to
+ * walking the list of sparsemem blocks which we already maintain for
+ * the sake of crashdump. In the long run, we might want to maintain
+ * a tree if performance of that linear walk becomes a problem.
+ *
+ * realmode_pfn_to_page functions can fail due to:
+ * 1) As real sparsemem blocks do not lie in RAM contiguously (they
+ * are in virtual address space which is not available in the real mode),
+ * the requested page struct can be split between blocks so get_page/put_page
+ * may fail.
+ * 2) When huge pages are used, the get_page/put_page API will fail
+ * in real mode as the linked addresses in the page struct are virtual
+ * too.
+ */
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+ struct vmemmap_backing *vmem_back;
+ struct page *page;
+ unsigned long page_size = 1 << mmu_psize_defs[mmu_vmemmap_psize].shift;
+ unsigned long pg_va = (unsigned long) pfn_to_page(pfn);
+ for (vmem_back = vmemmap_list; vmem_back; vmem_back = vmem_back->list) {
+ if (pg_va < vmem_back->virt_addr)
+ continue;
+
+ /* Check that page struct is not split between real pages */
+ if ((pg_va + sizeof(struct page)) >
+ (vmem_back->virt_addr + page_size))
+ return NULL;
+
+ page = (struct page *) (vmem_back->phys + pg_va -
+ vmem_back->virt_addr);
+ return page;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#elif defined(CONFIG_FLATMEM)
+
+struct page *realmode_pfn_to_page(unsigned long pfn)
+{
+ struct page *page = pfn_to_page(pfn);
+ return page;
+}
+EXPORT_SYMBOL_GPL(realmode_pfn_to_page);
+
+#endif /* CONFIG_SPARSEMEM_VMEMMAP/CONFIG_FLATMEM */
diff --git a/include/linux/mm.h b/include/linux/mm.h
index f022460..dcc99b5 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -290,12 +290,26 @@ static inline int put_page_testzero(struct page *page)
/*
* Try to grab a ref unless the page has a refcount of zero, return false if
* that is the case.
+ * This can be called when MMU is off so it must not access
+ * any of the virtual mappings.
*/
static inline int get_page_unless_zero(struct page *page)
{
return atomic_inc_not_zero(&page->_count);
}
+/*
+ * Try to drop a ref unless the page has a refcount of one, return false if
+ * that is the case.
+ * This is to make sure that the refcount won't become zero after this drop.
+ * This can be called when MMU is off so it must not access
+ * any of the virtual mappings.
+ */
+static inline int put_page_unless_one(struct page *page)
+{
+ return atomic_add_unless(&page->_count, -1, 1);
+}
+
extern int page_is_ram(unsigned long pfn);
/* Support for virtually mapped pages */
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 6d53675..98ada58 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -329,7 +329,9 @@ static inline void set_page_writeback(struct page *page)
* System with lots of page flags available. This allows separate
* flags for PageHead() and PageTail() checks of compound pages so that bit
* tests can be used in performance sensitive paths. PageCompound is
- * generally not used in hot code paths.
+ * generally not used in hot code paths except arch/powerpc/mm/init_64.c
+ * and arch/powerpc/kvm/book3s_64_vio_hv.c which use it to detect huge pages
+ * and avoid handling those in real mode.
*/
__PAGEFLAG(Head, head) CLEARPAGEFLAG(Head, head)
__PAGEFLAG(Tail, tail)
--
1.8.3.2
It does not make much sense to have KVM on book3s 64-bit without
the IOMMU bits for PCI passthrough support, as they cost little
and allow VFIO to function on book3s KVM.
Having IOMMU_API always enabled makes it unnecessary to have a lot of
"#ifdef IOMMU_API" in arch/powerpc/kvm/book3s_64_vio*. With those
ifdefs we could only accelerate user space emulated devices
(but not VFIO), which does not seem to be very useful.
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/kvm/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index c55c538..3b2b761 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -59,6 +59,7 @@ config KVM_BOOK3S_64
depends on PPC_BOOK3S_64
select KVM_BOOK3S_64_HANDLER
select KVM
+ select SPAPR_TCE_IOMMU
---help---
Support running unmodified book3s_64 and book3s_32 guest kernels
in virtual machines on book3s_64 host processors.
--
1.8.3.2
This adds real mode handlers for the H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls for user space emulated devices such as IBMVIO
devices or emulated PCI. These calls allow adding multiple entries
(up to 512) into the TCE table in one call which saves time on
transition to/from real mode.
This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
(copied from user and verified) before writing the whole list into
the TCE table. This cache will be utilized more in the upcoming
VFIO/IOMMU support to continue TCE list processing in virtual
mode in case the real mode handler fails for some reason.
This adds a function to convert a guest physical address to a host
virtual address in order to parse a TCE list from H_PUT_TCE_INDIRECT.
This also implements the KVM_CAP_SPAPR_MULTITCE capability. When present,
the hypercalls mentioned above may or may not be processed successfully
in the kernel-based fast path. If they cannot be handled by the kernel,
they will get passed on to user space, so user space still has to have
an implementation for these despite the in-kernel acceleration.
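In short, the virtual mode H_PUT_TCE_INDIRECT path follows a two-pass
scheme: all TCEs are first copied from the guest and validated into the
tce_tmp cache, and only then written to the table, so a bad list never
leaves the table half updated. Simplified from the handler added below:

        for (i = 0; i < npages; ++i) {
                if (get_user(vcpu->arch.tce_tmp_hpas[i], tces + i))
                        return H_PARAMETER;
                if (kvmppc_tce_validate(vcpu->arch.tce_tmp_hpas[i]))
                        return H_PARAMETER;
        }
        for (i = 0; i < npages; ++i)
                kvmppc_tce_put(tt, ioba + (i << IOMMU_PAGE_SHIFT),
                                vcpu->arch.tce_tmp_hpas[i]);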
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changelog:
2013/08/01 (v7):
* realmode_get_page/realmode_put_page use was replaced with
get_page_unless_zero/put_page_unless_one
2013/07/11:
* addressed many, many comments from maintainers
2013/07/06:
* fixed a number of wrong get_page()/put_page() calls
2013/06/27:
* fixed clear of BUSY bit in kvmppc_lookup_pte()
* H_PUT_TCE_INDIRECT does realmode_get_page() now
* KVM_CAP_SPAPR_MULTITCE now depends on CONFIG_PPC_BOOK3S_64
* updated doc
2013/06/05:
* fixed a typo about IBMVIO in the commit message
* updated doc and moved it to another section
* changed capability number
2013/05/21:
* added kvm_vcpu_arch::tce_tmp
* removed cleanup if put_indirect failed, instead we do not even start
writing to TCE table if we cannot get TCEs from the user and they are
invalid
* kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
and kvmppc_emulated_validate_tce (for the previous item)
* fixed a bug with fallthrough for H_IPI
* removed all get_user() from real mode handlers
* kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Documentation/virtual/kvm/api.txt | 26 ++++
arch/powerpc/include/asm/kvm_host.h | 9 ++
arch/powerpc/include/asm/kvm_ppc.h | 16 +-
arch/powerpc/kvm/book3s_64_vio.c | 132 +++++++++++++++-
arch/powerpc/kvm/book3s_64_vio_hv.c | 267 ++++++++++++++++++++++++++++----
arch/powerpc/kvm/book3s_hv.c | 41 ++++-
arch/powerpc/kvm/book3s_hv_rmhandlers.S | 8 +-
arch/powerpc/kvm/book3s_pr_papr.c | 35 +++++
arch/powerpc/kvm/powerpc.c | 3 +
9 files changed, 503 insertions(+), 34 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index ef925ea..1c8942a 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2382,6 +2382,32 @@ calls by the guest for that service will be passed to userspace to be
handled.
+4.86 KVM_CAP_SPAPR_MULTITCE
+
+Capability: KVM_CAP_SPAPR_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling the hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing them to user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered the LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+an IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel-based fast path. If they cannot be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in-kernel acceleration.
+
+This capability is always enabled.
+
+
5. The kvm_run structure
------------------------
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index af326cd..b8fe3de 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -30,6 +30,7 @@
#include <linux/kvm_para.h>
#include <linux/list.h>
#include <linux/atomic.h>
+#include <linux/tracepoint.h>
#include <asm/kvm_asm.h>
#include <asm/processor.h>
#include <asm/page.h>
@@ -609,6 +610,14 @@ struct kvm_vcpu_arch {
spinlock_t tbacct_lock;
u64 busy_stolen;
u64 busy_preempt;
+
+ unsigned long *tce_tmp_hpas; /* TCE cache for TCE_PUT_INDIRECT hcall */
+ enum {
+ TCERM_NONE,
+ TCERM_GETPAGE,
+ TCERM_PUTTCE,
+ TCERM_PUTLIST,
+ } tce_rm_fail; /* failed stage of request processing */
#endif
};
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index a5287fe..0ce4691 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,8 +133,20 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
-extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
- unsigned long ioba, unsigned long tce);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
+ struct kvm_vcpu *vcpu, unsigned long liobn);
+extern long kvmppc_tce_validate(unsigned long tce);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+ unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_value, unsigned long npages);
extern long kvm_vm_ioctl_allocate_rma(struct kvm *kvm,
struct kvm_allocate_rma *rma);
extern struct kvmppc_linear_info *kvm_alloc_rma(void);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index b2d3f3b..2167feb 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -14,6 +14,7 @@
*
* Copyright 2010 Paul Mackerras, IBM Corp. <[email protected]>
* Copyright 2011 David Gibson, IBM Corporation <[email protected]>
+ * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <[email protected]>
*/
#include <linux/types.h>
@@ -36,8 +37,10 @@
#include <asm/ppc-opcode.h>
#include <asm/kvm_host.h>
#include <asm/udbg.h>
+#include <asm/iommu.h>
+#include <asm/tce.h>
-#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
+#define ERROR_ADDR ((void *)~(unsigned long)0x0)
static long kvmppc_stt_npages(unsigned long window_size)
{
@@ -148,3 +151,130 @@ fail:
}
return ret;
}
+
+/* Converts guest physical address to host virtual address */
+static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
+ unsigned long gpa, struct page **pg)
+{
+ unsigned long hva, gfn = gpa >> PAGE_SHIFT;
+ struct kvm_memory_slot *memslot;
+ static const int is_write = 0;
+
+ memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
+ if (!memslot)
+ return ERROR_ADDR;
+
+ hva = __gfn_to_hva_memslot(memslot, gfn) | (gpa & ~PAGE_MASK);
+
+ if (get_user_pages_fast(hva & PAGE_MASK, 1, is_write, pg) != 1)
+ return ERROR_ADDR;
+
+ return (void *) hva;
+}
+
+long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce)
+{
+ long ret;
+ struct kvmppc_spapr_tce_table *tt;
+
+ tt = kvmppc_find_tce_table(vcpu, liobn);
+ if (!tt)
+ return H_TOO_HARD;
+
+ if (ioba >= tt->window_size)
+ return H_PARAMETER;
+
+ ret = kvmppc_tce_validate(tce);
+ if (ret)
+ return ret;
+
+ kvmppc_tce_put(tt, ioba, tce);
+
+ return H_SUCCESS;
+}
+
+long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_list, unsigned long npages)
+{
+ struct kvmppc_spapr_tce_table *tt;
+ long i, ret = H_SUCCESS;
+ unsigned long __user *tces;
+ struct page *pg = NULL;
+
+ tt = kvmppc_find_tce_table(vcpu, liobn);
+ if (!tt)
+ return H_TOO_HARD;
+
+ /*
+ * The spec says that the maximum size of the list is 512 TCEs
+ * so the whole table fits within a single 4K page
+ */
+ if (npages > 512)
+ return H_PARAMETER;
+
+ if (tce_list & ~IOMMU_PAGE_MASK)
+ return H_PARAMETER;
+
+ if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
+ return H_PARAMETER;
+
+ tces = kvmppc_gpa_to_hva_and_get(vcpu, tce_list, &pg);
+ if (tces == ERROR_ADDR)
+ return H_TOO_HARD;
+
+ if (vcpu->arch.tce_rm_fail == TCERM_PUTLIST)
+ goto put_list_page_exit;
+
+ for (i = 0; i < npages; ++i) {
+ if (get_user(vcpu->arch.tce_tmp_hpas[i], tces + i)) {
+ ret = H_PARAMETER;
+ goto put_list_page_exit;
+ }
+
+ ret = kvmppc_tce_validate(vcpu->arch.tce_tmp_hpas[i]);
+ if (ret)
+ goto put_list_page_exit;
+ }
+
+ for (i = 0; i < npages; ++i)
+ kvmppc_tce_put(tt, ioba + (i << IOMMU_PAGE_SHIFT),
+ vcpu->arch.tce_tmp_hpas[i]);
+put_list_page_exit:
+ if (pg) {
+ put_page(pg);
+ if (vcpu->arch.tce_rm_fail != TCERM_NONE) {
+ vcpu->arch.tce_rm_fail = TCERM_NONE;
+ /* Finish pending put_page() from realmode */
+ put_page(pg);
+ }
+ }
+
+ return ret;
+}
+
+long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_value, unsigned long npages)
+{
+ struct kvmppc_spapr_tce_table *tt;
+ long i, ret;
+
+ tt = kvmppc_find_tce_table(vcpu, liobn);
+ if (!tt)
+ return H_TOO_HARD;
+
+ if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
+ return H_PARAMETER;
+
+ ret = kvmppc_tce_validate(tce_value);
+ if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
+ return H_PARAMETER;
+
+ for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
+ kvmppc_tce_put(tt, ioba, tce_value);
+
+ return H_SUCCESS;
+}
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 30c2f3b..c564bf5 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -14,6 +14,7 @@
*
* Copyright 2010 Paul Mackerras, IBM Corp. <[email protected]>
* Copyright 2011 David Gibson, IBM Corporation <[email protected]>
+ * Copyright 2013 Alexey Kardashevskiy, IBM Corporation <[email protected]>
*/
#include <linux/types.h>
@@ -35,42 +36,250 @@
#include <asm/ppc-opcode.h>
#include <asm/kvm_host.h>
#include <asm/udbg.h>
+#include <asm/iommu.h>
+#include <asm/tce.h>
#define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64))
+#define ERROR_ADDR (~(unsigned long)0x0)
-/* WARNING: This will be called in real-mode on HV KVM and virtual
+/* Finds a TCE table descriptor by LIOBN.
+ *
+ * WARNING: This will be called in real or virtual mode on HV KVM and virtual
* mode on PR KVM
*/
-long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
+struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(struct kvm_vcpu *vcpu,
+ unsigned long liobn)
+{
+ struct kvmppc_spapr_tce_table *tt;
+
+ list_for_each_entry(tt, &vcpu->kvm->arch.spapr_tce_tables, list) {
+ if (tt->liobn == liobn)
+ return tt;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(kvmppc_find_tce_table);
+
+/*
+ * Validates TCE address.
+ * At the moment only flags are validated.
+ * As the host kernel does not access those addresses (just puts them
+ * to the table and user space is supposed to process them), we can skip
+ * checking other things (such as TCE is a guest RAM address or the page
+ * was actually allocated).
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ * mode on PR KVM
+ */
+long kvmppc_tce_validate(unsigned long tce)
+{
+ if (tce & ~(IOMMU_PAGE_MASK | TCE_PCI_WRITE | TCE_PCI_READ))
+ return H_PARAMETER;
+
+ return H_SUCCESS;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_validate);
+
+/* Note on the use of page_address() in real mode,
+ *
+ * It is safe to use page_address() in real mode on ppc64 because
+ * page_address() is always defined as lowmem_page_address()
+ * which returns __va(PFN_PHYS(page_to_pfn(page))), which is an arithmetic
+ * operation and does not access the page struct.
+ *
+ * Theoretically page_address() could be defined differently
+ * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL
+ * should be enabled.
+ * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64,
+ * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only
+ * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP
+ * is not expected to be enabled on ppc32, page_address()
+ * is safe for ppc32 as well.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ * mode on PR KVM
+ */
+extern u64 *kvmppc_page_address(struct page *page)
+{
+#if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL)
+#error TODO: fix to avoid page_address() here
+#endif
+ return (u64 *) page_address(page);
+}
+
+/*
+ * Handles TCE requests for emulated devices.
+ * Puts guest TCE values to the table and expects user space to convert them.
+ * Called in both real and virtual modes.
+ * Cannot fail so kvmppc_tce_validate must be called before it.
+ *
+ * WARNING: This will be called in real-mode on HV KVM and virtual
+ * mode on PR KVM
+ */
+void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+ unsigned long ioba, unsigned long tce)
+{
+ unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
+ struct page *page;
+ u64 *tbl;
+
+ page = tt->pages[idx / TCES_PER_PAGE];
+ tbl = kvmppc_page_address(page);
+
+ tbl[idx % TCES_PER_PAGE] = tce;
+}
+EXPORT_SYMBOL_GPL(kvmppc_tce_put);
+
+#ifdef CONFIG_KVM_BOOK3S_64_HV
+/*
+ * Converts guest physical address to host physical address.
+ * Tries to increase page counter via get_page_unless_zero() and
+ * returns ERROR_ADDR if failed.
+ */
+static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu,
+ unsigned long gpa, struct page **pg)
+{
+ struct kvm_memory_slot *memslot;
+ pte_t *ptep, pte;
+ unsigned long hva, hpa = ERROR_ADDR;
+ unsigned long gfn = gpa >> PAGE_SHIFT;
+ unsigned shift = 0;
+
+ memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
+ if (!memslot)
+ return ERROR_ADDR;
+
+ hva = __gfn_to_hva_memslot(memslot, gfn);
+
+ ptep = find_linux_pte_or_hugepte(vcpu->arch.pgdir, hva, &shift);
+ if (!ptep || !pte_present(*ptep))
+ return ERROR_ADDR;
+ pte = *ptep;
+
+ if (!shift)
+ shift = PAGE_SHIFT;
+
+ /* Avoid handling anything potentially complicated in realmode */
+ if (shift > PAGE_SHIFT)
+ return ERROR_ADDR;
+
+ if (((gpa & TCE_PCI_WRITE) || pte_write(pte)) && !pte_dirty(pte))
+ return ERROR_ADDR;
+
+ if (!pte_young(pte))
+ return ERROR_ADDR;
+
+ /* Increase page counter */
+ *pg = realmode_pfn_to_page(pte_pfn(pte));
+ if (!*pg || PageCompound(*pg) || !get_page_unless_zero(*pg))
+ return ERROR_ADDR;
+
+ hpa = (pte_pfn(pte) << PAGE_SHIFT) + (gpa & ((1 << shift) - 1));
+
+ /* Page has gone since we got the pte, safer to pass the request to virt mode */
+ if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ hpa = ERROR_ADDR;
+ /* Try drop the page, if failed, let virtmode do that */
+ if (put_page_unless_one(*pg))
+ *pg = NULL;
+ }
+
+ return hpa;
+}
+
+long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba, unsigned long tce)
{
- struct kvm *kvm = vcpu->kvm;
- struct kvmppc_spapr_tce_table *stt;
-
- /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */
- /* liobn, ioba, tce); */
-
- list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) {
- if (stt->liobn == liobn) {
- unsigned long idx = ioba >> SPAPR_TCE_SHIFT;
- struct page *page;
- u64 *tbl;
-
- /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p window_size=0x%x\n", */
- /* liobn, stt, stt->window_size); */
- if (ioba >= stt->window_size)
- return H_PARAMETER;
-
- page = stt->pages[idx / TCES_PER_PAGE];
- tbl = (u64 *)page_address(page);
-
- /* FIXME: Need to validate the TCE itself */
- /* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */
- tbl[idx % TCES_PER_PAGE] = tce;
- return H_SUCCESS;
- }
+ long ret;
+ struct kvmppc_spapr_tce_table *tt;
+
+ tt = kvmppc_find_tce_table(vcpu, liobn);
+ if (!tt)
+ return H_TOO_HARD;
+
+ if (ioba >= tt->window_size)
+ return H_PARAMETER;
+
+ ret = kvmppc_tce_validate(tce);
+ if (!ret)
+ kvmppc_tce_put(tt, ioba, tce);
+
+ return ret;
+}
+
+long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_list, unsigned long npages)
+{
+ struct kvmppc_spapr_tce_table *tt;
+ long i, ret = H_SUCCESS;
+ unsigned long tces;
+ struct page *pg = NULL;
+
+ tt = kvmppc_find_tce_table(vcpu, liobn);
+ if (!tt)
+ return H_TOO_HARD;
+
+ /*
+ * The spec says that the maximum size of the list is 512 TCEs
+ * so the whole table fits within a single 4K page
+ */
+ if (npages > 512)
+ return H_PARAMETER;
+
+ if (tce_list & ~IOMMU_PAGE_MASK)
+ return H_PARAMETER;
+
+ if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
+ return H_PARAMETER;
+
+ tces = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tce_list, &pg);
+ if (tces == ERROR_ADDR) {
+ ret = H_TOO_HARD;
+ goto put_unlock_exit;
}
- /* Didn't find the liobn, punt it to userspace */
- return H_TOO_HARD;
+ for (i = 0; i < npages; ++i) {
+ ret = kvmppc_tce_validate(((unsigned long *)tces)[i]);
+ if (ret)
+ goto put_unlock_exit;
+ }
+
+ for (i = 0; i < npages; ++i)
+ kvmppc_tce_put(tt, ioba + (i << IOMMU_PAGE_SHIFT),
+ ((unsigned long *)tces)[i]);
+
+put_unlock_exit:
+ if (!ret && pg && !put_page_unless_one(pg)) {
+ vcpu->arch.tce_rm_fail = TCERM_PUTLIST;
+ ret = H_TOO_HARD;
+ }
+
+ return ret;
+}
+
+long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_value, unsigned long npages)
+{
+ struct kvmppc_spapr_tce_table *tt;
+ long i, ret;
+
+ tt = kvmppc_find_tce_table(vcpu, liobn);
+ if (!tt)
+ return H_TOO_HARD;
+
+ if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
+ return H_PARAMETER;
+
+ ret = kvmppc_tce_validate(tce_value);
+ if (ret || (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)))
+ return H_PARAMETER;
+
+ for (i = 0; i < npages; ++i, ioba += IOMMU_PAGE_SIZE)
+ kvmppc_tce_put(tt, ioba, tce_value);
+
+ return H_SUCCESS;
}
+#endif /* CONFIG_KVM_BOOK3S_64_HV */
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 2efa9dd..8da85e6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -567,7 +567,31 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
if (kvmppc_xics_enabled(vcpu)) {
ret = kvmppc_xics_hcall(vcpu, req);
break;
- } /* fallthrough */
+ }
+ return RESUME_HOST;
+ case H_PUT_TCE:
+ ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+ kvmppc_get_gpr(vcpu, 5),
+ kvmppc_get_gpr(vcpu, 6));
+ if (ret == H_TOO_HARD)
+ return RESUME_HOST;
+ break;
+ case H_PUT_TCE_INDIRECT:
+ ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4),
+ kvmppc_get_gpr(vcpu, 5),
+ kvmppc_get_gpr(vcpu, 6),
+ kvmppc_get_gpr(vcpu, 7));
+ if (ret == H_TOO_HARD)
+ return RESUME_HOST;
+ break;
+ case H_STUFF_TCE:
+ ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4),
+ kvmppc_get_gpr(vcpu, 5),
+ kvmppc_get_gpr(vcpu, 6),
+ kvmppc_get_gpr(vcpu, 7));
+ if (ret == H_TOO_HARD)
+ return RESUME_HOST;
+ break;
default:
return RESUME_HOST;
}
@@ -958,6 +982,20 @@ struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm, unsigned int id)
vcpu->arch.cpu_type = KVM_CPU_3S_64;
kvmppc_sanity_check(vcpu);
+ /*
+ * As we want to minimize the chance of having H_PUT_TCE_INDIRECT
+ * half executed, we first read TCEs from the user, check them and
+ * return error if something went wrong and only then put TCEs into
+ * the TCE table.
+ *
+ * tce_tmp_hpas is a cache for TCEs to avoid stack allocation or
+ * kmalloc as the whole TCE list can take up to 512 items 8 bytes
+ * each (4096 bytes).
+ */
+ vcpu->arch.tce_tmp_hpas = kmalloc(4096, GFP_KERNEL);
+ if (!vcpu->arch.tce_tmp_hpas)
+ goto free_vcpu;
+
return vcpu;
free_vcpu:
@@ -980,6 +1018,7 @@ void kvmppc_core_vcpu_free(struct kvm_vcpu *vcpu)
unpin_vpa(vcpu->kvm, &vcpu->arch.slb_shadow);
unpin_vpa(vcpu->kvm, &vcpu->arch.vpa);
spin_unlock(&vcpu->arch.vpa_update_lock);
+ kfree(vcpu->arch.tce_tmp_hpas);
kvm_vcpu_uninit(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
}
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index b02f91e..15942bc 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1416,7 +1416,7 @@ hcall_real_table:
.long 0 /* 0x14 - H_CLEAR_REF */
.long .kvmppc_h_protect - hcall_real_table
.long 0 /* 0x1c - H_GET_TCE */
- .long .kvmppc_h_put_tce - hcall_real_table
+ .long .kvmppc_rm_h_put_tce - hcall_real_table
.long 0 /* 0x24 - H_SET_SPRG0 */
.long .kvmppc_h_set_dabr - hcall_real_table
.long 0 /* 0x2c */
@@ -1490,6 +1490,12 @@ hcall_real_table:
.long 0 /* 0x11c */
.long 0 /* 0x120 */
.long .kvmppc_h_bulk_remove - hcall_real_table
+ .long 0 /* 0x128 */
+ .long 0 /* 0x12c */
+ .long 0 /* 0x130 */
+ .long 0 /* 0x134 */
+ .long .kvmppc_rm_h_stuff_tce - hcall_real_table
+ .long .kvmppc_rm_h_put_tce_indirect - hcall_real_table
hcall_real_table_end:
ignore_hdec:
diff --git a/arch/powerpc/kvm/book3s_pr_papr.c b/arch/powerpc/kvm/book3s_pr_papr.c
index da0e0bc..6bd0d4a 100644
--- a/arch/powerpc/kvm/book3s_pr_papr.c
+++ b/arch/powerpc/kvm/book3s_pr_papr.c
@@ -227,6 +227,37 @@ static int kvmppc_h_pr_put_tce(struct kvm_vcpu *vcpu)
return EMULATE_DONE;
}
+static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu)
+{
+ unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
+ unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
+ unsigned long tce = kvmppc_get_gpr(vcpu, 6);
+ unsigned long npages = kvmppc_get_gpr(vcpu, 7);
+ long rc;
+
+ rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba,
+ tce, npages);
+ if (rc == H_TOO_HARD)
+ return EMULATE_FAIL;
+ kvmppc_set_gpr(vcpu, 3, rc);
+ return EMULATE_DONE;
+}
+
+static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu)
+{
+ unsigned long liobn = kvmppc_get_gpr(vcpu, 4);
+ unsigned long ioba = kvmppc_get_gpr(vcpu, 5);
+ unsigned long tce_value = kvmppc_get_gpr(vcpu, 6);
+ unsigned long npages = kvmppc_get_gpr(vcpu, 7);
+ long rc;
+
+ rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages);
+ if (rc == H_TOO_HARD)
+ return EMULATE_FAIL;
+ kvmppc_set_gpr(vcpu, 3, rc);
+ return EMULATE_DONE;
+}
+
static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
{
long rc = kvmppc_xics_hcall(vcpu, cmd);
@@ -247,6 +278,10 @@ int kvmppc_h_pr(struct kvm_vcpu *vcpu, unsigned long cmd)
return kvmppc_h_pr_bulk_remove(vcpu);
case H_PUT_TCE:
return kvmppc_h_pr_put_tce(vcpu);
+ case H_PUT_TCE_INDIRECT:
+ return kvmppc_h_pr_put_tce_indirect(vcpu);
+ case H_STUFF_TCE:
+ return kvmppc_h_pr_stuff_tce(vcpu);
case H_CEDE:
vcpu->arch.shared->msr |= MSR_EE;
kvm_vcpu_block(vcpu);
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 6316ee3..ccb578b 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -394,6 +394,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_PPC_GET_SMMU_INFO:
r = 1;
break;
+ case KVM_CAP_SPAPR_MULTITCE:
+ r = 1;
+ break;
#endif
default:
r = 0;
--
1.8.3.2
This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
and H_STUFF_TCE requests targeted at an IOMMU TCE table without passing
them to user space, which saves time on switching to user space and back.
Both real and virtual modes are supported. The kernel first tries to
handle a TCE request in real mode; if that fails, it passes the request
to the virtual mode handler to complete the operation. If the virtual
mode handler fails as well, the request is passed to user space.
The first user of this is VFIO on POWER. The external user API in VFIO
is required for this patch. The patch adds a new KVM_CREATE_SPAPR_TCE_IOMMU
ioctl (advertised via the KVM_CAP_SPAPR_TCE_IOMMU capability) to associate
a logical I/O bus number (LIOBN) with a VFIO IOMMU group fd and enable
in-kernel handling of map/unmap requests.
Tests show that this patch increases transmission speed from 220MB/s
to 750..1020MB/s on a 10Gb network (Chelsio CXGB3 10Gb Ethernet card).
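For illustration only (not part of the patch), a user space client such as
QEMU would be expected to use the new ioctl roughly like this; vm_fd, liobn
and vfio_group_fd are assumed to already exist and error handling is
omitted:

        struct kvm_create_spapr_tce_iommu args = {
                .liobn = liobn,                 /* LIOBN the guest will use */
                .fd    = vfio_group_fd,         /* fd of /dev/vfio/<group> */
                .flags = 0,                     /* no flags defined yet */
        };
        int tablefd = ioctl(vm_fd, KVM_CREATE_SPAPR_TCE_IOMMU, &args);
        if (tablefd < 0)
                /* fall back to user space TCE handling */;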
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
2013/07/11:
* removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
for KVM_BOOK3S_64
* kvmppc_gpa_to_hva_and_get also returns the host physical address. There is
not much use for it here, but the next patch for hugepage support will use it more.
2013/07/06:
* added realmode arch_spin_lock to protect TCE table from races
in real and virtual modes
* POWERPC IOMMU API is changed to support real mode
* iommu_take_ownership and iommu_release_ownership are protected by
iommu_table's locks
* VFIO external user API use rewritten
* multiple small fixes
2013/06/27:
* tce_list page is referenced now in order to protect it from accidental
invalidation during H_PUT_TCE_INDIRECT execution
* added use of the external user VFIO API
2013/06/05:
* changed capability number
* changed ioctl number
* update the doc article number
2013/05/20:
* removed get_user() from real mode handlers
* kvm_vcpu_arch::tce_tmp usage extended. Now the real mode handler puts
translated TCEs there, tries realmode_get_page() on them and, if that fails,
passes control over to the virtual mode handler which tries to finish
the request handling
* kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
on a page
* The only reason to pass the request to user mode now is when user mode
did not register the TCE table in the kernel; in all other cases the virtual mode
handler is expected to do the job
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Documentation/virtual/kvm/api.txt | 26 ++++
arch/powerpc/include/asm/kvm_host.h | 3 +
arch/powerpc/include/asm/kvm_ppc.h | 2 +
arch/powerpc/include/uapi/asm/kvm.h | 7 +
arch/powerpc/kvm/book3s_64_vio.c | 296 +++++++++++++++++++++++++++++++++++-
arch/powerpc/kvm/book3s_64_vio_hv.c | 122 +++++++++++++++
arch/powerpc/kvm/powerpc.c | 12 ++
7 files changed, 463 insertions(+), 5 deletions(-)
diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 1c8942a..6ae65bd 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2408,6 +2408,32 @@ an implementation for these despite the in kernel acceleration.
This capability is always enabled.
+4.87 KVM_CREATE_SPAPR_TCE_IOMMU
+
+Capability: KVM_CAP_SPAPR_TCE_IOMMU
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_iommu (in)
+Returns: 0 on success, -1 on error
+
+struct kvm_create_spapr_tce_iommu {
+ __u64 liobn;
+ __u32 fd;
+ __u32 flags;
+};
+
+This creates a link between an IOMMU group and a hardware TCE (translation
+control entry) table. This link lets the host kernel know what IOMMU
+group (i.e. TCE table) to use for the LIOBN number passed with
+H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
+
+User space passes a VFIO group fd. Using the external user VFIO API,
+KVM tries to get the IOMMU id from the passed fd. If this succeeds,
+acceleration is enabled; if it fails, map/unmap requests are passed to user space.
+
+No flag is supported at the moment.
+
+
5. The kvm_run structure
------------------------
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index b8fe3de..4eeaf7d 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -181,6 +181,8 @@ struct kvmppc_spapr_tce_table {
struct kvm *kvm;
u64 liobn;
u32 window_size;
+ struct iommu_group *grp; /* used for IOMMU groups */
+ struct vfio_group *vfio_grp; /* used for IOMMU groups */
struct page *pages[0];
};
@@ -612,6 +614,7 @@ struct kvm_vcpu_arch {
u64 busy_preempt;
unsigned long *tce_tmp_hpas; /* TCE cache for TCE_PUT_INDIRECT hcall */
+ unsigned long tce_tmp_num; /* Number of handled TCEs in the cache */
enum {
TCERM_NONE,
TCERM_GETPAGE,
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index 0ce4691..297cab5 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -133,6 +133,8 @@ extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);
extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
struct kvm_create_spapr_tce *args);
+extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+ struct kvm_create_spapr_tce_iommu *args);
extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
struct kvm_vcpu *vcpu, unsigned long liobn);
extern long kvmppc_tce_validate(unsigned long tce);
diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
index 0fb1a6e..3da4aa3 100644
--- a/arch/powerpc/include/uapi/asm/kvm.h
+++ b/arch/powerpc/include/uapi/asm/kvm.h
@@ -319,6 +319,13 @@ struct kvm_create_spapr_tce {
__u32 window_size;
};
+/* for KVM_CAP_SPAPR_TCE_IOMMU */
+struct kvm_create_spapr_tce_iommu {
+ __u64 liobn;
+ __u32 fd;
+ __u32 flags;
+};
+
/* for KVM_ALLOCATE_RMA */
struct kvm_allocate_rma {
__u64 rma_size;
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 2167feb..1e86a8a 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -27,6 +27,10 @@
#include <linux/hugetlb.h>
#include <linux/list.h>
#include <linux/anon_inodes.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/file.h>
+#include <linux/vfio.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
@@ -42,6 +46,53 @@
#define ERROR_ADDR ((void *)~(unsigned long)0x0)
+/*
+ * Dynamically linked version of the external user VFIO API.
+ *
+ * As IOMMU group access control is implemented by VFIO,
+ * there is an API to verify that a specific process can own
+ * a group. As KVM may run when VFIO is not loaded, KVM is not
+ * linked statically to VFIO; instead, wrappers are used.
+ */
+struct vfio_group *kvmppc_vfio_group_get_external_user(struct file *filep)
+{
+ struct vfio_group *ret;
+ struct vfio_group * (*proc)(struct file *) =
+ symbol_get(vfio_group_get_external_user);
+ if (!proc)
+ return NULL;
+
+ ret = proc(filep);
+ symbol_put(vfio_group_get_external_user);
+
+ return ret;
+}
+
+void kvmppc_vfio_group_put_external_user(struct vfio_group *group)
+{
+ void (*proc)(struct vfio_group *) =
+ symbol_get(vfio_group_put_external_user);
+ if (!proc)
+ return;
+
+ proc(group);
+ symbol_put(vfio_group_put_external_user);
+}
+
+int kvmppc_vfio_external_user_iommu_id(struct vfio_group *group)
+{
+ int ret;
+ int (*proc)(struct vfio_group *) =
+ symbol_get(vfio_external_user_iommu_id);
+ if (!proc)
+ return -EINVAL;
+
+ ret = proc(group);
+ symbol_put(vfio_external_user_iommu_id);
+
+ return ret;
+}
+
static long kvmppc_stt_npages(unsigned long window_size)
{
return ALIGN((window_size >> SPAPR_TCE_SHIFT)
@@ -55,8 +106,15 @@ static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
mutex_lock(&kvm->lock);
list_del(&stt->list);
- for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
- __free_page(stt->pages[i]);
+
+ if (stt->grp) {
+ if (stt->vfio_grp)
+ kvmppc_vfio_group_put_external_user(stt->vfio_grp);
+ iommu_group_put(stt->grp);
+ } else
+ for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++)
+ __free_page(stt->pages[i]);
+
kfree(stt);
mutex_unlock(&kvm->lock);
@@ -152,9 +210,90 @@ fail:
return ret;
}
-/* Converts guest physical address to host virtual address */
+static const struct file_operations kvm_spapr_tce_iommu_fops = {
+ .release = kvm_spapr_tce_release,
+};
+
+long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
+ struct kvm_create_spapr_tce_iommu *args)
+{
+ struct kvmppc_spapr_tce_table *tt = NULL;
+ struct iommu_group *grp;
+ struct iommu_table *tbl;
+ struct file *vfio_filp;
+ struct vfio_group *vfio_grp;
+ int ret = 0, iommu_id;
+
+ /* Check this LIOBN hasn't been previously registered */
+ list_for_each_entry(tt, &kvm->arch.spapr_tce_tables, list) {
+ if (tt->liobn == args->liobn)
+ return -EBUSY;
+ }
+
+ vfio_filp = fget(args->fd);
+ if (!vfio_filp)
+ return -ENXIO;
+
+ /* Lock the group. Fails if group is not viable or does not have IOMMU set */
+ vfio_grp = kvmppc_vfio_group_get_external_user(vfio_filp);
+ if (IS_ERR_VALUE((unsigned long)vfio_grp))
+ goto fput_exit;
+
+ /* Get IOMMU ID, find iommu_group and iommu_table*/
+ iommu_id = kvmppc_vfio_external_user_iommu_id(vfio_grp);
+ if (iommu_id < 0)
+ goto grpput_fput_exit;
+
+ ret = -ENXIO;
+ grp = iommu_group_get_by_id(iommu_id);
+ if (!grp)
+ goto grpput_fput_exit;
+
+ tbl = iommu_group_get_iommudata(grp);
+ if (!tbl)
+ goto grpput_fput_exit;
+
+ /* Create a TCE table descriptor and add into the descriptor list */
+ tt = kzalloc(sizeof(*tt), GFP_KERNEL);
+ if (!tt)
+ goto grpput_fput_exit;
+
+ tt->liobn = args->liobn;
+ kvm_get_kvm(kvm);
+ tt->kvm = kvm;
+ tt->grp = grp;
+ tt->window_size = tbl->it_size << IOMMU_PAGE_SHIFT;
+ tt->vfio_grp = vfio_grp;
+
+ /* Create an inode to provide automatic cleanup upon exit */
+ ret = anon_inode_getfd("kvm-spapr-tce-iommu",
+ &kvm_spapr_tce_iommu_fops, tt, O_RDWR);
+ if (ret < 0)
+ goto free_grpput_fput_exit;
+
+ /* Add the TCE table descriptor to the descriptor list */
+ mutex_lock(&kvm->lock);
+ list_add(&tt->list, &kvm->arch.spapr_tce_tables);
+ mutex_unlock(&kvm->lock);
+
+ goto fput_exit;
+
+free_grpput_fput_exit:
+ kfree(tt);
+grpput_fput_exit:
+ kvmppc_vfio_group_put_external_user(vfio_grp);
+fput_exit:
+ fput(vfio_filp);
+
+ return ret;
+}
+
+/*
+ * Converts guest physical address to host virtual address.
+ * Also returns host physical address which is to put to TCE table.
+ */
static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
- unsigned long gpa, struct page **pg)
+ unsigned long gpa, struct page **pg, unsigned long *phpa)
{
unsigned long hva, gfn = gpa >> PAGE_SHIFT;
struct kvm_memory_slot *memslot;
@@ -169,9 +308,140 @@ static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
if (get_user_pages_fast(hva & PAGE_MASK, 1, is_write, pg) != 1)
return ERROR_ADDR;
+ if (phpa)
+ *phpa = __pa((unsigned long) page_address(*pg)) |
+ (hva & ~PAGE_MASK);
+
return (void *) hva;
}
+long kvmppc_h_put_tce_iommu(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce)
+{
+ struct page *pg = NULL;
+ unsigned long hpa;
+ void __user *hva;
+ struct iommu_table *tbl = iommu_group_get_iommudata(tt->grp);
+
+ if (!tbl)
+ return H_RESCINDED;
+
+ /* Clear TCE */
+ if (!(tce & (TCE_PCI_READ | TCE_PCI_WRITE))) {
+ if (iommu_tce_clear_param_check(tbl, ioba, 0, 1))
+ return H_PARAMETER;
+
+ if (iommu_free_tces(tbl, ioba >> IOMMU_PAGE_SHIFT,
+ 1, false))
+ return H_HARDWARE;
+
+ return H_SUCCESS;
+ }
+
+ /* Put TCE */
+ if (vcpu->arch.tce_rm_fail != TCERM_NONE) {
+ /* Retry iommu_tce_build if it failed in real mode */
+ vcpu->arch.tce_rm_fail = TCERM_NONE;
+ hpa = vcpu->arch.tce_tmp_hpas[0];
+ } else {
+ if (iommu_tce_put_param_check(tbl, ioba, tce))
+ return H_PARAMETER;
+
+ hva = kvmppc_gpa_to_hva_and_get(vcpu, tce, &pg, &hpa);
+ if (hva == ERROR_ADDR)
+ return H_HARDWARE;
+ }
+
+ if (!iommu_tce_build(tbl, ioba >> IOMMU_PAGE_SHIFT, &hpa, 1, false))
+ return H_SUCCESS;
+
+ pg = pfn_to_page(hpa >> PAGE_SHIFT);
+ if (pg)
+ put_page(pg);
+
+ return H_HARDWARE;
+}
+
+static long kvmppc_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt, unsigned long ioba,
+ unsigned long __user *tces, unsigned long npages)
+{
+ long i = 0, start = 0;
+ struct iommu_table *tbl = iommu_group_get_iommudata(tt->grp);
+
+ if (!tbl)
+ return H_RESCINDED;
+
+ switch (vcpu->arch.tce_rm_fail) {
+ case TCERM_NONE:
+ break;
+ case TCERM_GETPAGE:
+ start = vcpu->arch.tce_tmp_num;
+ break;
+ case TCERM_PUTTCE:
+ goto put_tces;
+ case TCERM_PUTLIST:
+ default:
+ WARN_ON(1);
+ return H_HARDWARE;
+ }
+
+ for (i = start; i < npages; ++i) {
+ struct page *pg = NULL;
+ unsigned long gpa;
+ void __user *hva;
+
+ if (get_user(gpa, tces + i))
+ return H_HARDWARE;
+
+ if (iommu_tce_put_param_check(tbl, ioba +
+ (i << IOMMU_PAGE_SHIFT), gpa))
+ return H_PARAMETER;
+
+ hva = kvmppc_gpa_to_hva_and_get(vcpu, gpa, &pg,
+ &vcpu->arch.tce_tmp_hpas[i]);
+ if (hva == ERROR_ADDR)
+ goto putpages_flush_exit;
+ }
+
+put_tces:
+ if (!iommu_tce_build(tbl, ioba >> IOMMU_PAGE_SHIFT,
+ vcpu->arch.tce_tmp_hpas, npages, false))
+ return H_SUCCESS;
+
+putpages_flush_exit:
+ for ( --i; i >= 0; --i) {
+ struct page *pg;
+ pg = pfn_to_page(vcpu->arch.tce_tmp_hpas[i] >> PAGE_SHIFT);
+ if (pg)
+ put_page(pg);
+ }
+
+ return H_HARDWARE;
+}
+
+long kvmppc_h_stuff_tce_iommu(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_value, unsigned long npages)
+{
+ struct iommu_table *tbl = iommu_group_get_iommudata(tt->grp);
+ unsigned long entry = ioba >> IOMMU_PAGE_SHIFT;
+
+ if (!tbl)
+ return H_RESCINDED;
+
+ if (iommu_tce_clear_param_check(tbl, ioba, tce_value, npages))
+ return H_PARAMETER;
+
+ if (iommu_free_tces(tbl, entry, npages, false))
+ return H_HARDWARE;
+
+ return H_SUCCESS;
+}
+
long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
unsigned long liobn, unsigned long ioba,
unsigned long tce)
@@ -183,6 +453,10 @@ long kvmppc_h_put_tce(struct kvm_vcpu *vcpu,
if (!tt)
return H_TOO_HARD;
+ if (tt->grp)
+ return kvmppc_h_put_tce_iommu(vcpu, tt, liobn, ioba, tce);
+
+ /* Emulated IO */
if (ioba >= tt->window_size)
return H_PARAMETER;
@@ -221,13 +495,20 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
return H_PARAMETER;
- tces = kvmppc_gpa_to_hva_and_get(vcpu, tce_list, &pg);
+ tces = kvmppc_gpa_to_hva_and_get(vcpu, tce_list, &pg, NULL);
if (tces == ERROR_ADDR)
return H_TOO_HARD;
if (vcpu->arch.tce_rm_fail == TCERM_PUTLIST)
goto put_list_page_exit;
+ if (tt->grp) {
+ ret = kvmppc_h_put_tce_indirect_iommu(vcpu,
+ tt, ioba, tces, npages);
+ goto put_list_page_exit;
+ }
+
+ /* Emulated IO */
for (i = 0; i < npages; ++i) {
if (get_user(vcpu->arch.tce_tmp_hpas[i], tces + i)) {
ret = H_PARAMETER;
@@ -266,6 +547,11 @@ long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
if (!tt)
return H_TOO_HARD;
+ if (tt->grp)
+ return kvmppc_h_stuff_tce_iommu(vcpu, tt, liobn, ioba,
+ tce_value, npages);
+
+ /* Emulated IO */
if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
return H_PARAMETER;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index c564bf5..f81451b 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -26,6 +26,7 @@
#include <linux/slab.h>
#include <linux/hugetlb.h>
#include <linux/list.h>
+#include <linux/iommu.h>
#include <asm/tlbflush.h>
#include <asm/kvm_ppc.h>
@@ -188,6 +189,111 @@ static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu,
return hpa;
}
+static long kvmppc_rm_h_put_tce_iommu(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt, unsigned long liobn,
+ unsigned long ioba, unsigned long tce)
+{
+ int ret = 0;
+ struct iommu_table *tbl = iommu_group_get_iommudata(tt->grp);
+ unsigned long hpa;
+ struct page *pg = NULL;
+
+ if (!tbl)
+ return H_RESCINDED;
+
+ /* Clear TCE */
+ if (!(tce & (TCE_PCI_READ | TCE_PCI_WRITE))) {
+ if (iommu_tce_clear_param_check(tbl, ioba, 0, 1))
+ return H_PARAMETER;
+
+ if (iommu_free_tces(tbl, ioba >> IOMMU_PAGE_SHIFT, 1, true))
+ return H_TOO_HARD;
+
+ return H_SUCCESS;
+ }
+
+ /* Put TCE */
+ if (iommu_tce_put_param_check(tbl, ioba, tce))
+ return H_PARAMETER;
+
+ hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tce, &pg);
+ if (hpa != ERROR_ADDR) {
+ ret = iommu_tce_build(tbl, ioba >> IOMMU_PAGE_SHIFT,
+ &hpa, 1, true);
+ }
+
+ if (((hpa == ERROR_ADDR) && pg) || ret) {
+ vcpu->arch.tce_tmp_hpas[0] = hpa;
+ vcpu->arch.tce_tmp_num = 0;
+ vcpu->arch.tce_rm_fail = TCERM_PUTTCE;
+ return H_TOO_HARD;
+ }
+
+ return H_SUCCESS;
+}
+
+static long kvmppc_rm_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt, unsigned long ioba,
+ unsigned long *tces, unsigned long npages)
+{
+ int i, ret;
+ unsigned long hpa;
+ struct iommu_table *tbl = iommu_group_get_iommudata(tt->grp);
+ struct page *pg = NULL;
+
+ if (!tbl)
+ return H_RESCINDED;
+
+ /* Check all TCEs */
+ for (i = 0; i < npages; ++i) {
+ if (iommu_tce_put_param_check(tbl, ioba +
+ (i << IOMMU_PAGE_SHIFT), tces[i]))
+ return H_PARAMETER;
+ }
+
+ /* Translate TCEs and go get_page() */
+ for (i = 0; i < npages; ++i) {
+ hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tces[i], &pg);
+ if (hpa == ERROR_ADDR) {
+ vcpu->arch.tce_tmp_num = i;
+ vcpu->arch.tce_rm_fail = TCERM_GETPAGE;
+ return H_TOO_HARD;
+ }
+ vcpu->arch.tce_tmp_hpas[i] = hpa;
+ }
+
+ /* Put TCEs to the table */
+ ret = iommu_tce_build(tbl, (ioba >> IOMMU_PAGE_SHIFT),
+ vcpu->arch.tce_tmp_hpas, npages, true);
+ if (ret == -EAGAIN) {
+ vcpu->arch.tce_rm_fail = TCERM_PUTTCE;
+ return H_TOO_HARD;
+ } else if (ret) {
+ return H_HARDWARE;
+ }
+
+ return H_SUCCESS;
+}
+
+static long kvmppc_rm_h_stuff_tce_iommu(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
+ unsigned long liobn, unsigned long ioba,
+ unsigned long tce_value, unsigned long npages)
+{
+ struct iommu_table *tbl = iommu_group_get_iommudata(tt->grp);
+
+ if (!tbl)
+ return H_RESCINDED;
+
+ if (iommu_tce_clear_param_check(tbl, ioba, tce_value, npages))
+ return H_PARAMETER;
+
+ if (iommu_free_tces(tbl, ioba >> IOMMU_PAGE_SHIFT, npages, true))
+ return H_TOO_HARD;
+
+ return H_SUCCESS;
+}
+
long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
unsigned long ioba, unsigned long tce)
{
@@ -198,6 +304,10 @@ long kvmppc_rm_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
if (!tt)
return H_TOO_HARD;
+ if (tt->grp)
+ return kvmppc_rm_h_put_tce_iommu(vcpu, tt, liobn, ioba, tce);
+
+ /* Emulated IO */
if (ioba >= tt->window_size)
return H_PARAMETER;
@@ -240,6 +350,13 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
goto put_unlock_exit;
}
+ if (tt->grp) {
+ ret = kvmppc_rm_h_put_tce_indirect_iommu(vcpu,
+ tt, ioba, (unsigned long *)tces, npages);
+ goto put_unlock_exit;
+ }
+
+ /* Emulated IO */
for (i = 0; i < npages; ++i) {
ret = kvmppc_tce_validate(((unsigned long *)tces)[i]);
if (ret)
@@ -270,6 +387,11 @@ long kvmppc_rm_h_stuff_tce(struct kvm_vcpu *vcpu,
if (!tt)
return H_TOO_HARD;
+ if (tt->grp)
+ return kvmppc_rm_h_stuff_tce_iommu(vcpu, tt, liobn, ioba,
+ tce_value, npages);
+
+ /* Emulated IO */
if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
return H_PARAMETER;
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index ccb578b..2909cfa 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -395,6 +395,7 @@ int kvm_dev_ioctl_check_extension(long ext)
r = 1;
break;
case KVM_CAP_SPAPR_MULTITCE:
+ case KVM_CAP_SPAPR_TCE_IOMMU:
r = 1;
break;
#endif
@@ -1025,6 +1026,17 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce);
goto out;
}
+ case KVM_CREATE_SPAPR_TCE_IOMMU: {
+ struct kvm_create_spapr_tce_iommu create_tce_iommu;
+ struct kvm *kvm = filp->private_data;
+
+ r = -EFAULT;
+ if (copy_from_user(&create_tce_iommu, argp,
+ sizeof(create_tce_iommu)))
+ goto out;
+ r = kvm_vm_ioctl_create_spapr_tce_iommu(kvm, &create_tce_iommu);
+ goto out;
+ }
#endif /* CONFIG_PPC_BOOK3S_64 */
#ifdef CONFIG_KVM_BOOK3S_64_HV
--
1.8.3.2
This adds special support for huge pages (16MB) in real mode.
Reference counting cannot easily be done for such pages in real
mode (when the MMU is off), so we add a hash table of huge pages.
It is populated in virtual mode, and get_page() is called just once
per huge page. Real mode handlers check whether the requested page is
already in the hash table; if it is, no reference counting is done,
otherwise the handler exits to virtual mode. The hash table is
released at KVM exit.
At the moment the fastest card available for testing uses up to 9 huge
pages, so walking this hash table does not cost much. However this can
change and we may want to optimize it later.
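A minimal sketch of the real mode lookup described above (hypothetical helper name; the structures and the notrace iterator are the ones added by this patch, and ERROR_ADDR is the sentinel the handlers already use):
static unsigned long hugepage_lookup_sketch(struct kvmppc_spapr_tce_table *tt,
		unsigned long gpa)
{
	struct kvmppc_spapr_iommu_hugepage *hp;
	const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);

	/* The notrace iterator is safe to use with the MMU off */
	hash_for_each_possible_rcu_notrace(tt->hash_tab, hp, hash_node, key) {
		if ((gpa < hp->gpa) || (gpa >= hp->gpa + hp->size))
			continue;
		/* Hit: no get_page(), the page was referenced in virtual mode */
		return hp->hpa + (gpa & (hp->size - 1));
	}
	/* Miss: return the error sentinel so the caller exits to virtual mode */
	return ERROR_ADDR;
}
On a miss the handler returns H_TOO_HARD, the hypercall is retried in virtual mode and the page is added to the hash table there.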
Signed-off-by: Paul Mackerras <[email protected]>
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
2013/07/12:
* removed multiple #ifdef IOMMU_API as IOMMU_API is always enabled
for KVM_BOOK3S_64
2013/06/27:
* list of huge pages replaced with a hashtable for better performance
* spinlock removed from real mode; it now only protects insertion of new
huge page descriptors into the hashtable
2013/06/05:
* fixed compile error when CONFIG_IOMMU_API=n
2013/05/20:
* the real mode handler now searches for a huge page by gpa (used to be pte)
* the virtual mode handler prints a warning if it is called twice for the same
huge page, as the real mode handler is expected to fail just once - when a huge
page is not yet in the list.
* the huge page is refcounted twice - when added to the hugepage list and
when used in the virtual mode hcall handler (can be optimized but it will
make the patch less nice).
Conflicts:
arch/powerpc/kernel/iommu.c
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/kvm_host.h | 25 ++++++++
arch/powerpc/kernel/iommu.c | 6 +-
arch/powerpc/kvm/book3s_64_vio.c | 121 ++++++++++++++++++++++++++++++++++--
arch/powerpc/kvm/book3s_64_vio_hv.c | 32 +++++++++-
4 files changed, 175 insertions(+), 9 deletions(-)
diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
index 4eeaf7d..c57b25a 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -31,6 +31,7 @@
#include <linux/list.h>
#include <linux/atomic.h>
#include <linux/tracepoint.h>
+#include <linux/hashtable.h>
#include <asm/kvm_asm.h>
#include <asm/processor.h>
#include <asm/page.h>
@@ -183,9 +184,33 @@ struct kvmppc_spapr_tce_table {
u32 window_size;
struct iommu_group *grp; /* used for IOMMU groups */
struct vfio_group *vfio_grp; /* used for IOMMU groups */
+ DECLARE_HASHTABLE(hash_tab, ilog2(64)); /* used for IOMMU groups */
+ spinlock_t hugepages_write_lock; /* used for IOMMU groups */
struct page *pages[0];
};
+/*
+ * The KVM guest can be backed with 16MB pages.
+ * In this case, we cannot do page counting from the real mode
+ * as the compound pages are used - they are linked in a list
+ * with pointers as virtual addresses which are inaccessible
+ * in real mode.
+ *
+ * The code below keeps a 16MB pages list and uses page struct
+ * in real mode if it is already locked in RAM and inserted into
+ * the list or switches to the virtual mode where it can be
+ * handled in a usual manner.
+ */
+#define KVMPPC_SPAPR_HUGEPAGE_HASH(gpa) hash_32(gpa >> 24, 32)
+
+struct kvmppc_spapr_iommu_hugepage {
+ struct hlist_node hash_node;
+ unsigned long gpa; /* Guest physical address */
+ unsigned long hpa; /* Host physical address */
+ struct page *page; /* page struct of the very first subpage */
+ unsigned long size; /* Huge page size (always 16MB at the moment) */
+};
+
struct kvmppc_linear_info {
void *base_virt;
unsigned long base_pfn;
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 8314c80..e4a8135 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -999,7 +999,8 @@ int iommu_free_tces(struct iommu_table *tbl, unsigned long entry,
if (!pg) {
ret = -EAGAIN;
} else if (PageCompound(pg)) {
- ret = -EAGAIN;
+ /* Hugepages will be released at KVM exit */
+ ret = 0;
} else {
if (oldtce & TCE_PCI_WRITE)
SetPageDirty(pg);
@@ -1010,6 +1011,9 @@ int iommu_free_tces(struct iommu_table *tbl, unsigned long entry,
struct page *pg = pfn_to_page(oldtce >> PAGE_SHIFT);
if (!pg) {
ret = -EAGAIN;
+ } else if (PageCompound(pg)) {
+ /* Hugepages will be released at KVM exit */
+ ret = 0;
} else {
if (oldtce & TCE_PCI_WRITE)
SetPageDirty(pg);
diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
index 1e86a8a..0798a7b 100644
--- a/arch/powerpc/kvm/book3s_64_vio.c
+++ b/arch/powerpc/kvm/book3s_64_vio.c
@@ -93,6 +93,102 @@ int kvmppc_vfio_external_user_iommu_id(struct vfio_group *group)
return ret;
}
+/*
+ * API to support huge pages in real mode
+ */
+static void kvmppc_iommu_hugepages_init(struct kvmppc_spapr_tce_table *tt)
+{
+ spin_lock_init(&tt->hugepages_write_lock);
+ hash_init(tt->hash_tab);
+}
+
+static void kvmppc_iommu_hugepages_cleanup(struct kvmppc_spapr_tce_table *tt)
+{
+ int bkt;
+ struct kvmppc_spapr_iommu_hugepage *hp;
+ struct hlist_node *tmp;
+
+ spin_lock(&tt->hugepages_write_lock);
+ hash_for_each_safe(tt->hash_tab, bkt, tmp, hp, hash_node) {
+ pr_debug("Release HP liobn=%llx #%u gpa=%lx hpa=%lx size=%ld\n",
+ tt->liobn, bkt, hp->gpa, hp->hpa, hp->size);
+ hlist_del_rcu(&hp->hash_node);
+
+ put_page(hp->page);
+ kfree(hp);
+ }
+ spin_unlock(&tt->hugepages_write_lock);
+}
+
+/* Returns true if a page with GPA is already in the hash table */
+static bool kvmppc_iommu_hugepage_lookup_gpa(struct kvmppc_spapr_tce_table *tt,
+ unsigned long gpa)
+{
+ struct kvmppc_spapr_iommu_hugepage *hp;
+ const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);
+
+ hash_for_each_possible_rcu(tt->hash_tab, hp, hash_node, key) {
+ if ((gpa < hp->gpa) || (gpa >= hp->gpa + hp->size))
+ continue;
+
+ return true;
+ }
+
+ return false;
+}
+
+/* Returns true if a page with GPA has been added to the hash table */
+static bool kvmppc_iommu_hugepage_add(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
+ unsigned long hva, unsigned long gpa)
+{
+ struct kvmppc_spapr_iommu_hugepage *hp;
+ const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);
+ pte_t *ptep;
+ unsigned int shift = 0;
+ static const int is_write = 1;
+
+ ptep = find_linux_pte_or_hugepte(vcpu->arch.pgdir, hva, &shift);
+ WARN_ON(!ptep);
+
+ if (!ptep || (shift <= PAGE_SHIFT))
+ return false;
+
+ hp = kzalloc(sizeof(*hp), GFP_KERNEL);
+ if (!hp)
+ return false;
+
+ hp->gpa = gpa & ~((1 << shift) - 1);
+ hp->hpa = (pte_pfn(*ptep) << PAGE_SHIFT);
+ hp->size = 1 << shift;
+
+ if (get_user_pages_fast(hva & ~(hp->size - 1), 1,
+ is_write, &hp->page) != 1) {
+ kfree(hp);
+ return false;
+ }
+ hash_add_rcu(tt->hash_tab, &hp->hash_node, key);
+
+ return true;
+}
+
+/** Returns true if a page with GPA is in the hash table or
+ * has just been added.
+ */
+static bool kvmppc_iommu_hugepage_try_add(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
+ unsigned long hva, unsigned long gpa)
+{
+ bool ret;
+
+ spin_lock(&tt->hugepages_write_lock);
+ ret = kvmppc_iommu_hugepage_lookup_gpa(tt, gpa) ||
+ kvmppc_iommu_hugepage_add(vcpu, tt, hva, gpa);
+ spin_unlock(&tt->hugepages_write_lock);
+
+ return ret;
+}
+
static long kvmppc_stt_npages(unsigned long window_size)
{
return ALIGN((window_size >> SPAPR_TCE_SHIFT)
@@ -106,6 +202,7 @@ static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt)
mutex_lock(&kvm->lock);
list_del(&stt->list);
+ kvmppc_iommu_hugepages_cleanup(stt);
if (stt->grp) {
if (stt->vfio_grp)
@@ -192,6 +289,7 @@ long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
kvm_get_kvm(kvm);
mutex_lock(&kvm->lock);
+ kvmppc_iommu_hugepages_init(stt);
list_add(&stt->list, &kvm->arch.spapr_tce_tables);
mutex_unlock(&kvm->lock);
@@ -273,6 +371,7 @@ long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
/* Add the TCE table descriptor to the descriptor list */
mutex_lock(&kvm->lock);
+ kvmppc_iommu_hugepages_init(tt);
list_add(&tt->list, &kvm->arch.spapr_tce_tables);
mutex_unlock(&kvm->lock);
@@ -293,6 +392,7 @@ fput_exit:
* Also returns host physical address which is to put to TCE table.
*/
static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
unsigned long gpa, struct page **pg, unsigned long *phpa)
{
unsigned long hva, gfn = gpa >> PAGE_SHIFT;
@@ -312,6 +412,17 @@ static void __user *kvmppc_gpa_to_hva_and_get(struct kvm_vcpu *vcpu,
*phpa = __pa((unsigned long) page_address(*pg)) |
(hva & ~PAGE_MASK);
+ if (PageCompound(*pg)) {
+ /** Check if this GPA is taken care of by the hash table.
+ * If this is the case, do not show the caller page struct
+ * address as huge pages will be released at KVM exit.
+ */
+ if (kvmppc_iommu_hugepage_try_add(vcpu, tt, hva, gpa)) {
+ put_page(*pg);
+ *pg = NULL;
+ }
+ }
+
return (void *) hva;
}
@@ -349,7 +460,7 @@ long kvmppc_h_put_tce_iommu(struct kvm_vcpu *vcpu,
if (iommu_tce_put_param_check(tbl, ioba, tce))
return H_PARAMETER;
- hva = kvmppc_gpa_to_hva_and_get(vcpu, tce, &pg, &hpa);
+ hva = kvmppc_gpa_to_hva_and_get(vcpu, tt, tce, &pg, &hpa);
if (hva == ERROR_ADDR)
return H_HARDWARE;
}
@@ -358,7 +469,7 @@ long kvmppc_h_put_tce_iommu(struct kvm_vcpu *vcpu,
return H_SUCCESS;
pg = pfn_to_page(hpa >> PAGE_SHIFT);
- if (pg)
+ if (pg && !PageCompound(pg))
put_page(pg);
return H_HARDWARE;
@@ -400,7 +511,7 @@ static long kvmppc_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
(i << IOMMU_PAGE_SHIFT), gpa))
return H_PARAMETER;
- hva = kvmppc_gpa_to_hva_and_get(vcpu, gpa, &pg,
+ hva = kvmppc_gpa_to_hva_and_get(vcpu, tt, gpa, &pg,
&vcpu->arch.tce_tmp_hpas[i]);
if (hva == ERROR_ADDR)
goto putpages_flush_exit;
@@ -415,7 +526,7 @@ putpages_flush_exit:
for ( --i; i >= 0; --i) {
struct page *pg;
pg = pfn_to_page(vcpu->arch.tce_tmp_hpas[i] >> PAGE_SHIFT);
- if (pg)
+ if (pg && !PageCompound(pg))
put_page(pg);
}
@@ -495,7 +606,7 @@ long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
return H_PARAMETER;
- tces = kvmppc_gpa_to_hva_and_get(vcpu, tce_list, &pg, NULL);
+ tces = kvmppc_gpa_to_hva_and_get(vcpu, tt, tce_list, &pg, NULL);
if (tces == ERROR_ADDR)
return H_TOO_HARD;
diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index f81451b..382c9f0 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -133,12 +133,30 @@ void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
EXPORT_SYMBOL_GPL(kvmppc_tce_put);
#ifdef CONFIG_KVM_BOOK3S_64_HV
+
+static unsigned long kvmppc_rm_hugepage_gpa_to_hpa(
+ struct kvmppc_spapr_tce_table *tt,
+ unsigned long gpa)
+{
+ struct kvmppc_spapr_iommu_hugepage *hp;
+ const unsigned key = KVMPPC_SPAPR_HUGEPAGE_HASH(gpa);
+
+ hash_for_each_possible_rcu_notrace(tt->hash_tab, hp, hash_node, key) {
+ if ((gpa < hp->gpa) || (gpa >= hp->gpa + hp->size))
+ continue;
+ return hp->hpa + (gpa & (hp->size - 1));
+ }
+
+ return ERROR_ADDR;
+}
+
/*
* Converts guest physical address to host physical address.
* Tries to increase page counter via get_page_unless_zero() and
* returns ERROR_ADDR if failed.
*/
static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu,
+ struct kvmppc_spapr_tce_table *tt,
unsigned long gpa, struct page **pg)
{
struct kvm_memory_slot *memslot;
@@ -147,6 +165,14 @@ static unsigned long kvmppc_rm_gpa_to_hpa_and_get(struct kvm_vcpu *vcpu,
unsigned long gfn = gpa >> PAGE_SHIFT;
unsigned shift = 0;
+ /* Check if it is a hugepage */
+ hpa = kvmppc_rm_hugepage_gpa_to_hpa(tt, gpa);
+ if (hpa != ERROR_ADDR) {
+ *pg = NULL; /* Tell the caller not to put page */
+ return hpa;
+ }
+
+ /* System page size case */
memslot = search_memslots(kvm_memslots(vcpu->kvm), gfn);
if (!memslot)
return ERROR_ADDR;
@@ -216,7 +242,7 @@ static long kvmppc_rm_h_put_tce_iommu(struct kvm_vcpu *vcpu,
if (iommu_tce_put_param_check(tbl, ioba, tce))
return H_PARAMETER;
- hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tce, &pg);
+ hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tt, tce, &pg);
if (hpa != ERROR_ADDR) {
ret = iommu_tce_build(tbl, ioba >> IOMMU_PAGE_SHIFT,
&hpa, 1, true);
@@ -253,7 +279,7 @@ static long kvmppc_rm_h_put_tce_indirect_iommu(struct kvm_vcpu *vcpu,
/* Translate TCEs and go get_page() */
for (i = 0; i < npages; ++i) {
- hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tces[i], &pg);
+ hpa = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tt, tces[i], &pg);
if (hpa == ERROR_ADDR) {
vcpu->arch.tce_tmp_num = i;
vcpu->arch.tce_rm_fail = TCERM_GETPAGE;
@@ -344,7 +370,7 @@ long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu,
if ((ioba + (npages << IOMMU_PAGE_SHIFT)) > tt->window_size)
return H_PARAMETER;
- tces = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tce_list, &pg);
+ tces = kvmppc_rm_gpa_to_hpa_and_get(vcpu, tt, tce_list, &pg);
if (tces == ERROR_ADDR) {
ret = H_TOO_HARD;
goto put_unlock_exit;
--
1.8.3.2
The TCE tables handling may differ for real and virtual modes so
additional ppc_md.tce_build_rm/ppc_md.tce_free_rm/ppc_md.tce_flush_rm
handlers were introduced earlier.
So this adds the following:
1. support for the new ppc_md calls;
2. the ability for iommu_tce_build to process multiple entries per
call;
3. an arch_spin_lock to protect the TCE table from races in both real and
virtual modes;
4. proper TCE table protection from races with the existing IOMMU code
in iommu_take_ownership/iommu_release_ownership;
5. hwaddr variable renamed to hpa as it better describes what it
actually represents;
6. iommu_tce_direction is static now as it is not called from anywhere else.
This will be used by upcoming real mode support of VFIO on POWER.
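A hypothetical caller sketch (not part of the patch) showing the reworked interface, where a batch of host physical addresses is mapped in one call and the last argument selects the real mode backends:
/* Sketch only: map npages TCEs from virtual mode, then tear them down */
static int map_batch_sketch(struct iommu_table *tbl, unsigned long entry,
		unsigned long *hpas, unsigned long npages)
{
	/* false selects ppc_md.tce_build/tce_free; true would require
	 * the _rm variants to be present */
	int ret = iommu_tce_build(tbl, entry, hpas, npages, false);

	if (ret)
		return ret;	/* e.g. -EBUSY if an entry was already set */

	/* ... DMA happens here ... */

	return iommu_free_tces(tbl, entry, npages, false);
}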
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/iommu.h | 9 +-
arch/powerpc/kernel/iommu.c | 198 ++++++++++++++++++++++++++-------------
2 files changed, 136 insertions(+), 71 deletions(-)
diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index c34656a..b01bde1 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -78,6 +78,7 @@ struct iommu_table {
unsigned long *it_map; /* A simple allocation bitmap for now */
#ifdef CONFIG_IOMMU_API
struct iommu_group *it_group;
+ arch_spinlock_t it_rm_lock;
#endif
};
@@ -152,9 +153,9 @@ extern int iommu_tce_clear_param_check(struct iommu_table *tbl,
extern int iommu_tce_put_param_check(struct iommu_table *tbl,
unsigned long ioba, unsigned long tce);
extern int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
- unsigned long hwaddr, enum dma_data_direction direction);
-extern unsigned long iommu_clear_tce(struct iommu_table *tbl,
- unsigned long entry);
+ unsigned long *hpas, unsigned long npages, bool rm);
+extern int iommu_free_tces(struct iommu_table *tbl, unsigned long entry,
+ unsigned long npages, bool rm);
extern int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
unsigned long entry, unsigned long pages);
extern int iommu_put_tce_user_mode(struct iommu_table *tbl,
@@ -164,7 +165,5 @@ extern void iommu_flush_tce(struct iommu_table *tbl);
extern int iommu_take_ownership(struct iommu_table *tbl);
extern void iommu_release_ownership(struct iommu_table *tbl);
-extern enum dma_data_direction iommu_tce_direction(unsigned long tce);
-
#endif /* __KERNEL__ */
#endif /* _ASM_IOMMU_H */
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index b20ff17..8314c80 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -903,7 +903,7 @@ void iommu_register_group(struct iommu_table *tbl,
kfree(name);
}
-enum dma_data_direction iommu_tce_direction(unsigned long tce)
+static enum dma_data_direction iommu_tce_direction(unsigned long tce)
{
if ((tce & TCE_PCI_READ) && (tce & TCE_PCI_WRITE))
return DMA_BIDIRECTIONAL;
@@ -914,7 +914,6 @@ enum dma_data_direction iommu_tce_direction(unsigned long tce)
else
return DMA_NONE;
}
-EXPORT_SYMBOL_GPL(iommu_tce_direction);
void iommu_flush_tce(struct iommu_table *tbl)
{
@@ -972,73 +971,117 @@ int iommu_tce_put_param_check(struct iommu_table *tbl,
}
EXPORT_SYMBOL_GPL(iommu_tce_put_param_check);
-unsigned long iommu_clear_tce(struct iommu_table *tbl, unsigned long entry)
-{
- unsigned long oldtce;
- struct iommu_pool *pool = get_pool(tbl, entry);
-
- spin_lock(&(pool->lock));
-
- oldtce = ppc_md.tce_get(tbl, entry);
- if (oldtce & (TCE_PCI_WRITE | TCE_PCI_READ))
- ppc_md.tce_free(tbl, entry, 1);
- else
- oldtce = 0;
-
- spin_unlock(&(pool->lock));
-
- return oldtce;
-}
-EXPORT_SYMBOL_GPL(iommu_clear_tce);
-
int iommu_clear_tces_and_put_pages(struct iommu_table *tbl,
unsigned long entry, unsigned long pages)
{
- unsigned long oldtce;
- struct page *page;
-
- for ( ; pages; --pages, ++entry) {
- oldtce = iommu_clear_tce(tbl, entry);
- if (!oldtce)
- continue;
-
- page = pfn_to_page(oldtce >> PAGE_SHIFT);
- WARN_ON(!page);
- if (page) {
- if (oldtce & TCE_PCI_WRITE)
- SetPageDirty(page);
- put_page(page);
- }
- }
-
- return 0;
+ return iommu_free_tces(tbl, entry, pages, false);
}
EXPORT_SYMBOL_GPL(iommu_clear_tces_and_put_pages);
-/*
- * hwaddr is a kernel virtual address here (0xc... bazillion),
- * tce_build converts it to a physical address.
- */
+int iommu_free_tces(struct iommu_table *tbl, unsigned long entry,
+ unsigned long npages, bool rm)
+{
+ int i, ret = 0, to_free = 0;
+
+ if (rm && !ppc_md.tce_free_rm)
+ return -EAGAIN;
+
+ arch_spin_lock(&tbl->it_rm_lock);
+
+ for (i = 0; i < npages; ++i) {
+ unsigned long oldtce = ppc_md.tce_get(tbl, entry + i);
+ if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
+ continue;
+
+ if (rm) {
+ struct page *pg = realmode_pfn_to_page(
+ oldtce >> PAGE_SHIFT);
+ if (!pg) {
+ ret = -EAGAIN;
+ } else if (PageCompound(pg)) {
+ ret = -EAGAIN;
+ } else {
+ if (oldtce & TCE_PCI_WRITE)
+ SetPageDirty(pg);
+ if (!put_page_unless_one(pg))
+ ret = -EAGAIN;
+ }
+ } else {
+ struct page *pg = pfn_to_page(oldtce >> PAGE_SHIFT);
+ if (!pg) {
+ ret = -EAGAIN;
+ } else {
+ if (oldtce & TCE_PCI_WRITE)
+ SetPageDirty(pg);
+ put_page(pg);
+ }
+ }
+ if (ret)
+ break;
+ to_free = i + 1;
+ }
+
+ if (to_free) {
+ if (rm)
+ ppc_md.tce_free_rm(tbl, entry, to_free);
+ else
+ ppc_md.tce_free(tbl, entry, to_free);
+
+ if (rm && ppc_md.tce_flush_rm)
+ ppc_md.tce_flush_rm(tbl);
+ else if (!rm && ppc_md.tce_flush)
+ ppc_md.tce_flush(tbl);
+ }
+ arch_spin_unlock(&tbl->it_rm_lock);
+
+ /* Make sure updates are seen by hardware */
+ mb();
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(iommu_free_tces);
+
int iommu_tce_build(struct iommu_table *tbl, unsigned long entry,
- unsigned long hwaddr, enum dma_data_direction direction)
+ unsigned long *hpas, unsigned long npages, bool rm)
{
- int ret = -EBUSY;
- unsigned long oldtce;
- struct iommu_pool *pool = get_pool(tbl, entry);
+ int i, ret = 0;
- spin_lock(&(pool->lock));
+ if (rm && !ppc_md.tce_build_rm)
+ return -EAGAIN;
- oldtce = ppc_md.tce_get(tbl, entry);
- /* Add new entry if it is not busy */
- if (!(oldtce & (TCE_PCI_WRITE | TCE_PCI_READ)))
- ret = ppc_md.tce_build(tbl, entry, 1, hwaddr, direction, NULL);
+ arch_spin_lock(&tbl->it_rm_lock);
- spin_unlock(&(pool->lock));
+ for (i = 0; i < npages; ++i) {
+ if (ppc_md.tce_get(tbl, entry + i) &
+ (TCE_PCI_WRITE | TCE_PCI_READ)) {
+ arch_spin_unlock(&tbl->it_rm_lock);
+ return -EBUSY;
+ }
+ }
- /* if (unlikely(ret))
- pr_err("iommu_tce: %s failed on hwaddr=%lx ioba=%lx kva=%lx ret=%d\n",
- __func__, hwaddr, entry << IOMMU_PAGE_SHIFT,
- hwaddr, ret); */
+ for (i = 0; i < npages; ++i) {
+ unsigned long volatile hva = (unsigned long) __va(hpas[i]);
+ enum dma_data_direction dir = iommu_tce_direction(hva);
+
+ if (rm)
+ ret = ppc_md.tce_build_rm(tbl, entry + i, 1,
+ hva, dir, NULL);
+ else
+ ret = ppc_md.tce_build(tbl, entry + i, 1,
+ hva, dir, NULL);
+ if (ret)
+ break;
+ }
+
+ if (rm && ppc_md.tce_flush_rm)
+ ppc_md.tce_flush_rm(tbl);
+ else if (!rm && ppc_md.tce_flush)
+ ppc_md.tce_flush(tbl);
+
+ arch_spin_unlock(&tbl->it_rm_lock);
+
+ /* Make sure updates are seen by hardware */
+ mb();
return ret;
}
@@ -1049,7 +1092,7 @@ int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
{
int ret;
struct page *page = NULL;
- unsigned long hwaddr, offset = tce & IOMMU_PAGE_MASK & ~PAGE_MASK;
+ unsigned long hpa, offset = tce & IOMMU_PAGE_MASK & ~PAGE_MASK;
enum dma_data_direction direction = iommu_tce_direction(tce);
ret = get_user_pages_fast(tce & PAGE_MASK, 1,
@@ -1059,9 +1102,9 @@ int iommu_put_tce_user_mode(struct iommu_table *tbl, unsigned long entry,
tce, entry << IOMMU_PAGE_SHIFT, ret); */
return -EFAULT;
}
- hwaddr = (unsigned long) page_address(page) + offset;
+ hpa = __pa((unsigned long) page_address(page)) + offset;
- ret = iommu_tce_build(tbl, entry, hwaddr, direction);
+ ret = iommu_tce_build(tbl, entry, &hpa, 1, false);
if (ret)
put_page(page);
@@ -1075,18 +1118,32 @@ EXPORT_SYMBOL_GPL(iommu_put_tce_user_mode);
int iommu_take_ownership(struct iommu_table *tbl)
{
- unsigned long sz = (tbl->it_size + 7) >> 3;
+ unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+ int ret = 0;
+
+ spin_lock_irqsave(&tbl->large_pool.lock, flags);
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_lock(&tbl->pools[i].lock);
if (tbl->it_offset == 0)
clear_bit(0, tbl->it_map);
if (!bitmap_empty(tbl->it_map, tbl->it_size)) {
pr_err("iommu_tce: it_map is not empty");
- return -EBUSY;
+ ret = -EBUSY;
+ if (tbl->it_offset == 0)
+ clear_bit(1, tbl->it_map);
+
+ } else {
+ memset(tbl->it_map, 0xff, sz);
}
- memset(tbl->it_map, 0xff, sz);
- iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_unlock(&tbl->pools[i].lock);
+ spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
+
+ if (!ret)
+ iommu_free_tces(tbl, tbl->it_offset, tbl->it_size, false);
return 0;
}
@@ -1094,14 +1151,23 @@ EXPORT_SYMBOL_GPL(iommu_take_ownership);
void iommu_release_ownership(struct iommu_table *tbl)
{
- unsigned long sz = (tbl->it_size + 7) >> 3;
+ unsigned long flags, i, sz = (tbl->it_size + 7) >> 3;
+
+ iommu_free_tces(tbl, tbl->it_offset, tbl->it_size, false);
+
+ spin_lock_irqsave(&tbl->large_pool.lock, flags);
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_lock(&tbl->pools[i].lock);
- iommu_clear_tces_and_put_pages(tbl, tbl->it_offset, tbl->it_size);
memset(tbl->it_map, 0, sz);
/* Restore bit#0 set by iommu_init_table() */
if (tbl->it_offset == 0)
set_bit(0, tbl->it_map);
+
+ for (i = 0; i < tbl->nr_pools; i++)
+ spin_unlock(&tbl->pools[i].lock);
+ spin_unlock_irqrestore(&tbl->large_pool.lock, flags);
}
EXPORT_SYMBOL_GPL(iommu_release_ownership);
--
1.8.3.2
The existing TCE machine calls (tce_build and tce_free) only support
virtual mode as they call __raw_writeq for TCE invalidation, which
fails in real mode.
This introduces tce_build_rm and tce_free_rm real mode versions
which do mostly the same but use the "Store Doubleword Caching Inhibited
Indexed" (stdcix) instruction for TCE invalidation.
This new feature is going to be utilized by real mode support of VFIO.
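To sketch the idea (the actual helpers are in the patch below), the real mode path addresses the invalidation register by its physical address and writes it with stdcix, while virtual mode keeps the ioremap'ed address and __raw_writeq():
/* Sketch: cache inhibited store, hypervisor real mode only */
static inline void rm_writeq_sketch(u64 val, volatile void __iomem *paddr)
{
	__asm__ __volatile__("stdcix %0,0,%1"
		: : "r" (val), "r" (paddr) : "memory");
}

static void invalidate_sketch(struct pnv_ioda_pe *pe, struct iommu_table *tbl,
		u64 start, bool rm)
{
	u64 __iomem *reg = rm ? (u64 __iomem *) pe->tce_inval_reg_phys :
		(u64 __iomem *) tbl->it_index;

	if (rm)
		rm_writeq_sketch(start, reg);
	else
		__raw_writeq(start, reg);
}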
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
Changes:
2013/11/07:
* added comment why stdcix cannot be used in virtual mode
2013/08/07:
* tested on p7ioc and fixed a bug with realmode addresses
Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/include/asm/machdep.h | 12 ++++++++
arch/powerpc/platforms/powernv/pci-ioda.c | 47 +++++++++++++++++++++++--------
arch/powerpc/platforms/powernv/pci.c | 38 +++++++++++++++++++++----
arch/powerpc/platforms/powernv/pci.h | 3 +-
4 files changed, 81 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 8b48090..07dd3b1 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -78,6 +78,18 @@ struct machdep_calls {
long index);
void (*tce_flush)(struct iommu_table *tbl);
+ /* _rm versions are for real mode use only */
+ int (*tce_build_rm)(struct iommu_table *tbl,
+ long index,
+ long npages,
+ unsigned long uaddr,
+ enum dma_data_direction direction,
+ struct dma_attrs *attrs);
+ void (*tce_free_rm)(struct iommu_table *tbl,
+ long index,
+ long npages);
+ void (*tce_flush_rm)(struct iommu_table *tbl);
+
void __iomem * (*ioremap)(phys_addr_t addr, unsigned long size,
unsigned long flags, void *caller);
void (*iounmap)(volatile void __iomem *token);
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index d8140b1..5815f1d 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -70,6 +70,16 @@ define_pe_printk_level(pe_err, KERN_ERR);
define_pe_printk_level(pe_warn, KERN_WARNING);
define_pe_printk_level(pe_info, KERN_INFO);
+/*
+ * stdcix is only supposed to be used in hypervisor real mode as per
+ * the architecture spec
+ */
+static inline void __raw_rm_writeq(u64 val, volatile void __iomem *paddr)
+{
+ __asm__ __volatile__("stdcix %0,0,%1"
+ : : "r" (val), "r" (paddr) : "memory");
+}
+
static int pnv_ioda_alloc_pe(struct pnv_phb *phb)
{
unsigned long pe;
@@ -454,10 +464,13 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
}
}
-static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
- u64 *startp, u64 *endp)
+static void pnv_pci_ioda1_tce_invalidate(struct pnv_ioda_pe *pe,
+ struct iommu_table *tbl,
+ u64 *startp, u64 *endp, bool rm)
{
- u64 __iomem *invalidate = (u64 __iomem *)tbl->it_index;
+ u64 __iomem *invalidate = rm?
+ (u64 __iomem *)pe->tce_inval_reg_phys:
+ (u64 __iomem *)tbl->it_index;
unsigned long start, end, inc;
start = __pa(startp);
@@ -484,7 +497,10 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
mb(); /* Ensure above stores are visible */
while (start <= end) {
- __raw_writeq(start, invalidate);
+ if (rm)
+ __raw_rm_writeq(start, invalidate);
+ else
+ __raw_writeq(start, invalidate);
start += inc;
}
@@ -496,10 +512,12 @@ static void pnv_pci_ioda1_tce_invalidate(struct iommu_table *tbl,
static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
struct iommu_table *tbl,
- u64 *startp, u64 *endp)
+ u64 *startp, u64 *endp, bool rm)
{
unsigned long start, end, inc;
- u64 __iomem *invalidate = (u64 __iomem *)tbl->it_index;
+ u64 __iomem *invalidate = rm?
+ (u64 __iomem *)pe->tce_inval_reg_phys:
+ (u64 __iomem *)tbl->it_index;
/* We'll invalidate DMA address in PE scope */
start = 0x2ul << 60;
@@ -515,22 +533,25 @@ static void pnv_pci_ioda2_tce_invalidate(struct pnv_ioda_pe *pe,
mb();
while (start <= end) {
- __raw_writeq(start, invalidate);
+ if (rm)
+ __raw_rm_writeq(start, invalidate);
+ else
+ __raw_writeq(start, invalidate);
start += inc;
}
}
void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
- u64 *startp, u64 *endp)
+ u64 *startp, u64 *endp, bool rm)
{
struct pnv_ioda_pe *pe = container_of(tbl, struct pnv_ioda_pe,
tce32_table);
struct pnv_phb *phb = pe->phb;
if (phb->type == PNV_PHB_IODA1)
- pnv_pci_ioda1_tce_invalidate(tbl, startp, endp);
+ pnv_pci_ioda1_tce_invalidate(pe, tbl, startp, endp, rm);
else
- pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp);
+ pnv_pci_ioda2_tce_invalidate(pe, tbl, startp, endp, rm);
}
static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
@@ -603,7 +624,8 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
* bus number, print that out instead.
*/
tbl->it_busno = 0;
- tbl->it_index = (unsigned long)ioremap(be64_to_cpup(swinvp), 8);
+ pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
+ tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys, 8);
tbl->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE |
TCE_PCI_SWINV_PAIR;
}
@@ -681,7 +703,8 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
* bus number, print that out instead.
*/
tbl->it_busno = 0;
- tbl->it_index = (unsigned long)ioremap(be64_to_cpup(swinvp), 8);
+ pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
+ tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys, 8);
tbl->it_type = TCE_PCI_SWINV_CREATE | TCE_PCI_SWINV_FREE;
}
iommu_init_table(tbl, phb->hose->node);
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index a28d3b5..2b58549 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -401,7 +401,7 @@ struct pci_ops pnv_pci_ops = {
static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
unsigned long uaddr, enum dma_data_direction direction,
- struct dma_attrs *attrs)
+ struct dma_attrs *attrs, bool rm)
{
u64 proto_tce;
u64 *tcep, *tces;
@@ -423,12 +423,19 @@ static int pnv_tce_build(struct iommu_table *tbl, long index, long npages,
* of flags if that becomes the case
*/
if (tbl->it_type & TCE_PCI_SWINV_CREATE)
- pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1);
+ pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, rm);
return 0;
}
-static void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
+static int pnv_tce_build_vm(struct iommu_table *tbl, long index, long npages,
+ unsigned long uaddr, enum dma_data_direction direction,
+ struct dma_attrs *attrs)
+{
+ return pnv_tce_build(tbl, index, npages, uaddr, direction, attrs, false);
+}
+
+static void pnv_tce_free(struct iommu_table *tbl, long index, long npages, bool rm)
{
u64 *tcep, *tces;
@@ -438,7 +445,12 @@ static void pnv_tce_free(struct iommu_table *tbl, long index, long npages)
*(tcep++) = 0;
if (tbl->it_type & TCE_PCI_SWINV_FREE)
- pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1);
+ pnv_pci_ioda_tce_invalidate(tbl, tces, tcep - 1, rm);
+}
+
+static void pnv_tce_free_vm(struct iommu_table *tbl, long index, long npages)
+{
+ pnv_tce_free(tbl, index, npages, false);
}
static unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
@@ -446,6 +458,18 @@ static unsigned long pnv_tce_get(struct iommu_table *tbl, long index)
return ((u64 *)tbl->it_base)[index - tbl->it_offset];
}
+static int pnv_tce_build_rm(struct iommu_table *tbl, long index, long npages,
+ unsigned long uaddr, enum dma_data_direction direction,
+ struct dma_attrs *attrs)
+{
+ return pnv_tce_build(tbl, index, npages, uaddr, direction, attrs, true);
+}
+
+static void pnv_tce_free_rm(struct iommu_table *tbl, long index, long npages)
+{
+ pnv_tce_free(tbl, index, npages, true);
+}
+
void pnv_pci_setup_iommu_table(struct iommu_table *tbl,
void *tce_mem, u64 tce_size,
u64 dma_offset)
@@ -610,8 +634,10 @@ void __init pnv_pci_init(void)
/* Configure IOMMU DMA hooks */
ppc_md.pci_dma_dev_setup = pnv_pci_dma_dev_setup;
- ppc_md.tce_build = pnv_tce_build;
- ppc_md.tce_free = pnv_tce_free;
+ ppc_md.tce_build = pnv_tce_build_vm;
+ ppc_md.tce_free = pnv_tce_free_vm;
+ ppc_md.tce_build_rm = pnv_tce_build_rm;
+ ppc_md.tce_free_rm = pnv_tce_free_rm;
ppc_md.tce_get = pnv_tce_get;
ppc_md.pci_probe_mode = pnv_pci_probe_mode;
set_pci_dma_ops(&dma_iommu_ops);
diff --git a/arch/powerpc/platforms/powernv/pci.h b/arch/powerpc/platforms/powernv/pci.h
index d633c64..170dd98 100644
--- a/arch/powerpc/platforms/powernv/pci.h
+++ b/arch/powerpc/platforms/powernv/pci.h
@@ -52,6 +52,7 @@ struct pnv_ioda_pe {
int tce32_seg;
int tce32_segcount;
struct iommu_table tce32_table;
+ phys_addr_t tce_inval_reg_phys;
/* XXX TODO: Add support for additional 64-bit iommus */
@@ -193,6 +194,6 @@ extern void pnv_pci_init_p5ioc2_hub(struct device_node *np);
extern void pnv_pci_init_ioda_hub(struct device_node *np);
extern void pnv_pci_init_ioda2_phb(struct device_node *np);
extern void pnv_pci_ioda_tce_invalidate(struct iommu_table *tbl,
- u64 *startp, u64 *endp);
+ u64 *startp, u64 *endp, bool rm);
#endif /* __POWERNV_PCI_H */
--
1.8.3.2
On Thu, 2013-08-01 at 14:44 +1000, Alexey Kardashevskiy wrote:
> This is to reserve a capablity number for upcoming support
> of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls
> which support mulptiple DMA map/unmap operations per one call.
Gleb, any chance you can put this (and the next one) into a tree to
"lock in" the numbers ?
I've been wanting to apply the whole series to powerpc-next; that
stuff has been simmering for way too long and is in good enough shape
imho, but I need the capabilities and ioctl numbers locked in your tree
first.
Cheers,
Ben.
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> Changes:
> 2013/07/16:
> * changed the number
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> include/uapi/linux/kvm.h | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index acccd08..99c2533 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_PPC_RTAS 91
> #define KVM_CAP_IRQ_XICS 92
> #define KVM_CAP_ARM_EL1_32BIT 93
> +#define KVM_CAP_SPAPR_MULTITCE 94
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
> This is to reserve a capablity number for upcoming support
> of VFIO-IOMMU DMA operations in real mode.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>
> ---
> Changes:
> 2013/07/16:
> * changed the number
>
> 2013/07/11:
> * changed order in a file, added comment about a gap in ioctl number
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> include/uapi/linux/kvm.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 99c2533..53c3f1f 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
> #define KVM_CAP_IRQ_XICS 92
> #define KVM_CAP_ARM_EL1_32BIT 93
> #define KVM_CAP_SPAPR_MULTITCE 94
> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>
> #ifdef KVM_CAP_IRQ_ROUTING
>
> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
> /* Available with KVM_CAP_PPC_RTAS */
> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
> +/* 0xad and 0xaf are already taken */
so 0xad and 0xaf are already taken? where?
> +/* Available with KVM_CAP_SPAPR_TCE_IOMMU */
> +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xaf, struct kvm_create_spapr_tce_iommu)
and why is this one 0xaf then?
Alex
>
> /* ioctl for vm fd */
> #define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
> --
> 1.8.3.2
>
On 08/15/2013 05:16 PM, Alexander Graf wrote:
>
> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>
>> This is to reserve a capablity number for upcoming support
>> of VFIO-IOMMU DMA operations in real mode.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>
>> ---
>> Changes:
>> 2013/07/16:
>> * changed the number
>>
>> 2013/07/11:
>> * changed order in a file, added comment about a gap in ioctl number
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> include/uapi/linux/kvm.h | 4 ++++
>> 1 file changed, 4 insertions(+)
>>
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 99c2533..53c3f1f 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>> #define KVM_CAP_IRQ_XICS 92
>> #define KVM_CAP_ARM_EL1_32BIT 93
>> #define KVM_CAP_SPAPR_MULTITCE 94
>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>
>> #ifdef KVM_CAP_IRQ_ROUTING
>>
>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>> /* Available with KVM_CAP_PPC_RTAS */
>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>> +/* 0xad and 0xaf are already taken */
>
> so 0xad and 0xaf are already taken? where?
Mistype :( s/af/ae/
They are taken in this file:
1016 /* VM is being stopped by host */
1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
>
>> +/* Available with KVM_CAP_SPAPR_TCE_IOMMU */
>> +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xaf, struct kvm_create_spapr_tce_iommu)
>
> and why is this one 0xaf then?
Sorry, mistype. My bad. Sorry again.
>
> Alex
>
>>
>> /* ioctl for vm fd */
>> #define KVM_CREATE_DEVICE _IOWR(KVMIO, 0xe0, struct kvm_create_device)
>> --
>> 1.8.3.2
>>
>
--
Alexey
On 15.08.2013, at 09:22, Alexey Kardashevskiy wrote:
> On 08/15/2013 05:16 PM, Alexander Graf wrote:
>>
>> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>>
>>> This is to reserve a capablity number for upcoming support
>>> of VFIO-IOMMU DMA operations in real mode.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>
>>> ---
>>> Changes:
>>> 2013/07/16:
>>> * changed the number
>>>
>>> 2013/07/11:
>>> * changed order in a file, added comment about a gap in ioctl number
>>>
>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>> ---
>>> include/uapi/linux/kvm.h | 4 ++++
>>> 1 file changed, 4 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 99c2533..53c3f1f 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>> #define KVM_CAP_IRQ_XICS 92
>>> #define KVM_CAP_ARM_EL1_32BIT 93
>>> #define KVM_CAP_SPAPR_MULTITCE 94
>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>>
>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>
>>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>>> /* Available with KVM_CAP_PPC_RTAS */
>>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>>> +/* 0xad and 0xaf are already taken */
>>
>> so 0xad and 0xaf are already taken? where?
>
> Mistype :( s/af/ae/
>
> They are taken in this file:
>
> 1016 /* VM is being stopped by host */
> 1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
> 1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
Could you please make sure that whoever reads the comment in a year still knows where to look? :)
>
>>
>>> +/* Available with KVM_CAP_SPAPR_TCE_IOMMU */
>>> +#define KVM_CREATE_SPAPR_TCE_IOMMU _IOW(KVMIO, 0xaf, struct kvm_create_spapr_tce_iommu)
>>
>> and why is this one 0xaf then?
>
> Sorry, mistype. My bad. Sorry again.
No worries - just repost this single patch with the fixed comment.
Alex
On 15.08.2013, at 09:24, Alexander Graf wrote:
>
> On 15.08.2013, at 09:22, Alexey Kardashevskiy wrote:
>
>> On 08/15/2013 05:16 PM, Alexander Graf wrote:
>>>
>>> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>>>
>>>> This is to reserve a capablity number for upcoming support
>>>> of VFIO-IOMMU DMA operations in real mode.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>
>>>> ---
>>>> Changes:
>>>> 2013/07/16:
>>>> * changed the number
>>>>
>>>> 2013/07/11:
>>>> * changed order in a file, added comment about a gap in ioctl number
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>> ---
>>>> include/uapi/linux/kvm.h | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>> index 99c2533..53c3f1f 100644
>>>> --- a/include/uapi/linux/kvm.h
>>>> +++ b/include/uapi/linux/kvm.h
>>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>>> #define KVM_CAP_IRQ_XICS 92
>>>> #define KVM_CAP_ARM_EL1_32BIT 93
>>>> #define KVM_CAP_SPAPR_MULTITCE 94
>>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>>>
>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>
>>>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>>>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>>>> /* Available with KVM_CAP_PPC_RTAS */
>>>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>>>> +/* 0xad and 0xaf are already taken */
>>>
>>> so 0xad and 0xaf are already taken? where?
>>
>> Mistype :( s/af/ae/
>>
>> They are taken in this file:
>>
>> 1016 /* VM is being stopped by host */
>> 1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
>> 1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
>
> Could you please make sure that whoever reads the comment in a year still knows where to look? :)
Something like
/* 0xad is taken by KVM_KVMCLOCK_CTRL */
/* 0xaf is taken by KVM_ARM_VCPU_INIT */
Alex
On 08/15/2013 05:25 PM, Alexander Graf wrote:
>
> On 15.08.2013, at 09:24, Alexander Graf wrote:
>
>>
>> On 15.08.2013, at 09:22, Alexey Kardashevskiy wrote:
>>
>>> On 08/15/2013 05:16 PM, Alexander Graf wrote:
>>>>
>>>> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>>>>
>>>>> This is to reserve a capablity number for upcoming support
>>>>> of VFIO-IOMMU DMA operations in real mode.
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>
>>>>> ---
>>>>> Changes:
>>>>> 2013/07/16:
>>>>> * changed the number
>>>>>
>>>>> 2013/07/11:
>>>>> * changed order in a file, added comment about a gap in ioctl number
>>>>>
>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>> ---
>>>>> include/uapi/linux/kvm.h | 4 ++++
>>>>> 1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>> index 99c2533..53c3f1f 100644
>>>>> --- a/include/uapi/linux/kvm.h
>>>>> +++ b/include/uapi/linux/kvm.h
>>>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>>>> #define KVM_CAP_IRQ_XICS 92
>>>>> #define KVM_CAP_ARM_EL1_32BIT 93
>>>>> #define KVM_CAP_SPAPR_MULTITCE 94
>>>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>>>>
>>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>>
>>>>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>>>>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>>>>> /* Available with KVM_CAP_PPC_RTAS */
>>>>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>>>>> +/* 0xad and 0xaf are already taken */
>>>>
>>>> so 0xad and 0xaf are already taken? where?
>>>
>>> Mistype :( s/af/ae/
>>>
>>> They are taken in this file:
>>>
>>> 1016 /* VM is being stopped by host */
>>> 1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
>>> 1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
>>
>> Could you please make sure that whoever reads the comment in a year still knows where to look? :)
>
> Something like
>
> /* 0xad is taken by KVM_KVMCLOCK_CTRL */
> /* 0xaf is taken by KVM_ARM_VCPU_INIT */
Missed this mail and reposted without this comment, but updated the commit
message to say where they are used. Repost again?
--
Alexey
On 15.08.2013, at 09:42, Alexey Kardashevskiy wrote:
> On 08/15/2013 05:25 PM, Alexander Graf wrote:
>>
>> On 15.08.2013, at 09:24, Alexander Graf wrote:
>>
>>>
>>> On 15.08.2013, at 09:22, Alexey Kardashevskiy wrote:
>>>
>>>> On 08/15/2013 05:16 PM, Alexander Graf wrote:
>>>>>
>>>>> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>>>>>
>>>>>> This is to reserve a capablity number for upcoming support
>>>>>> of VFIO-IOMMU DMA operations in real mode.
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>>
>>>>>> ---
>>>>>> Changes:
>>>>>> 2013/07/16:
>>>>>> * changed the number
>>>>>>
>>>>>> 2013/07/11:
>>>>>> * changed order in a file, added comment about a gap in ioctl number
>>>>>>
>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>> ---
>>>>>> include/uapi/linux/kvm.h | 4 ++++
>>>>>> 1 file changed, 4 insertions(+)
>>>>>>
>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>>> index 99c2533..53c3f1f 100644
>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>>>>> #define KVM_CAP_IRQ_XICS 92
>>>>>> #define KVM_CAP_ARM_EL1_32BIT 93
>>>>>> #define KVM_CAP_SPAPR_MULTITCE 94
>>>>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>>>>>
>>>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>>>
>>>>>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>>>>>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>>>>>> /* Available with KVM_CAP_PPC_RTAS */
>>>>>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>>>>>> +/* 0xad and 0xaf are already taken */
>>>>>
>>>>> so 0xad and 0xaf are already taken? where?
>>>>
>>>> Mistype :( s/af/ae/
>>>>
>>>> They are taken in this file:
>>>>
>>>> 1016 /* VM is being stopped by host */
>>>> 1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
>>>> 1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
>>>
>>> Could you please make sure that whoever reads the comment in a year still knows where to look? :)
>>
>> Something like
>>
>> /* 0xad is taken by KVM_KVMCLOCK_CTRL */
>> /* 0xaf is taken by KVM_ARM_VCPU_INIT */
>
>
> Missed this mail and reposted without this comment but updated commit
> message saying where they are used. Repost again?
The commit message doesn't really help, since you don't see that one when you read the header file later. So yes, please.
Alex
On 08/15/2013 05:43 PM, Alexander Graf wrote:
>
> On 15.08.2013, at 09:42, Alexey Kardashevskiy wrote:
>
>> On 08/15/2013 05:25 PM, Alexander Graf wrote:
>>>
>>> On 15.08.2013, at 09:24, Alexander Graf wrote:
>>>
>>>>
>>>> On 15.08.2013, at 09:22, Alexey Kardashevskiy wrote:
>>>>
>>>>> On 08/15/2013 05:16 PM, Alexander Graf wrote:
>>>>>>
>>>>>> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>>>>>>
>>>>>>> This is to reserve a capablity number for upcoming support
>>>>>>> of VFIO-IOMMU DMA operations in real mode.
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>>>
>>>>>>> ---
>>>>>>> Changes:
>>>>>>> 2013/07/16:
>>>>>>> * changed the number
>>>>>>>
>>>>>>> 2013/07/11:
>>>>>>> * changed order in a file, added comment about a gap in ioctl number
>>>>>>>
>>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>>> ---
>>>>>>> include/uapi/linux/kvm.h | 4 ++++
>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>
>>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>>>> index 99c2533..53c3f1f 100644
>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>>>>>> #define KVM_CAP_IRQ_XICS 92
>>>>>>> #define KVM_CAP_ARM_EL1_32BIT 93
>>>>>>> #define KVM_CAP_SPAPR_MULTITCE 94
>>>>>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>>>>>>
>>>>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>>>>
>>>>>>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>>>>>>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>>>>>>> /* Available with KVM_CAP_PPC_RTAS */
>>>>>>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>>>>>>> +/* 0xad and 0xaf are already taken */
>>>>>>
>>>>>> so 0xad and 0xaf are already taken? where?
>>>>>
>>>>> Mistype :( s/af/ae/
>>>>>
>>>>> They are taken in this file:
>>>>>
>>>>> 1016 /* VM is being stopped by host */
>>>>> 1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
>>>>> 1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
>>>>
>>>> Could you please make sure that whoever reads the comment in a year still knows where to look? :)
>>>
>>> Something like
>>>
>>> /* 0xad is taken by KVM_KVMCLOCK_CTRL */
>>> /* 0xaf is taken by KVM_ARM_VCPU_INIT */
>>
>>
>> Missed this mail and reposted without this comment but updated commit
>> message saying where they are used. Repost again?
>
> The commit message doesn't really help, since you don't see that one when you read the header file later. So yes, please.
Ok. Reposted as "[PATCH v8] KVM: PPC: reserve a capability and ioctl
numbers for realmode VFIO". Thank you. Sorry for my disturbing ignorance.
--
Alexey
On 15.08.2013, at 09:54, Alexey Kardashevskiy wrote:
> On 08/15/2013 05:43 PM, Alexander Graf wrote:
>>
>> On 15.08.2013, at 09:42, Alexey Kardashevskiy wrote:
>>
>>> On 08/15/2013 05:25 PM, Alexander Graf wrote:
>>>>
>>>> On 15.08.2013, at 09:24, Alexander Graf wrote:
>>>>
>>>>>
>>>>> On 15.08.2013, at 09:22, Alexey Kardashevskiy wrote:
>>>>>
>>>>>> On 08/15/2013 05:16 PM, Alexander Graf wrote:
>>>>>>>
>>>>>>> On 01.08.2013, at 06:44, Alexey Kardashevskiy wrote:
>>>>>>>
>>>>>>>> This is to reserve a capablity number for upcoming support
>>>>>>>> of VFIO-IOMMU DMA operations in real mode.
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> Changes:
>>>>>>>> 2013/07/16:
>>>>>>>> * changed the number
>>>>>>>>
>>>>>>>> 2013/07/11:
>>>>>>>> * changed order in a file, added comment about a gap in ioctl number
>>>>>>>>
>>>>>>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>>>>>>> ---
>>>>>>>> include/uapi/linux/kvm.h | 4 ++++
>>>>>>>> 1 file changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>>>>>>> index 99c2533..53c3f1f 100644
>>>>>>>> --- a/include/uapi/linux/kvm.h
>>>>>>>> +++ b/include/uapi/linux/kvm.h
>>>>>>>> @@ -668,6 +668,7 @@ struct kvm_ppc_smmu_info {
>>>>>>>> #define KVM_CAP_IRQ_XICS 92
>>>>>>>> #define KVM_CAP_ARM_EL1_32BIT 93
>>>>>>>> #define KVM_CAP_SPAPR_MULTITCE 94
>>>>>>>> +#define KVM_CAP_SPAPR_TCE_IOMMU 95
>>>>>>>>
>>>>>>>> #ifdef KVM_CAP_IRQ_ROUTING
>>>>>>>>
>>>>>>>> @@ -933,6 +934,9 @@ struct kvm_s390_ucas_mapping {
>>>>>>>> #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
>>>>>>>> /* Available with KVM_CAP_PPC_RTAS */
>>>>>>>> #define KVM_PPC_RTAS_DEFINE_TOKEN _IOW(KVMIO, 0xac, struct kvm_rtas_token_args)
>>>>>>>> +/* 0xad and 0xaf are already taken */
>>>>>>>
>>>>>>> so 0xad and 0xaf are already taken? where?
>>>>>>
>>>>>> Mistype :( s/af/ae/
>>>>>>
>>>>>> They are taken in this file:
>>>>>>
>>>>>> 1016 /* VM is being stopped by host */
>>>>>> 1017 #define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad)
>>>>>> 1018 #define KVM_ARM_VCPU_INIT _IOW(KVMIO, 0xae, struct kvm_vcpu_init)
>>>>>
>>>>> Could you please make sure that whoever reads the comment in a year still knows where to look? :)
>>>>
>>>> Something like
>>>>
>>>> /* 0xad is taken by KVM_KVMCLOCK_CTRL */
>>>> /* 0xaf is taken by KVM_ARM_VCPU_INIT */
>>>
>>>
>>> Missed this mail and reposted without this comment but updated commit
>>> message saying where they are used. Repost again?
>>
>> The commit message doesn't really help, since you don't see that one when you read the header file later. So yes, please.
>
>
> Ok. Reposted as "[PATCH v8] KVM: PPC: reserve a capability and ioctl
> numbers for realmode VFIO". Thank you. Sorry for my disturbing ignorance.
In this case it's just sloppiness, which disturbs me a lot less, as I'm sloppy myself :).
Alex
On Wed, Aug 14, 2013 at 10:51:14AM +1000, Benjamin Herrenschmidt wrote:
> On Thu, 2013-08-01 at 14:44 +1000, Alexey Kardashevskiy wrote:
> > This is to reserve a capablity number for upcoming support
> > of H_PUT_TCE_INDIRECT and H_STUFF_TCE pseries hypercalls
> > which support mulptiple DMA map/unmap operations per one call.
>
> Gleb, any chance you can put this (and the next one) into a tree to
> "lock in" the numbers ?
>
Applied it. Sorry for the slow response, I was on vacation and am still
going through the email backlog.
> I've been wanting to apply the whole series to powerpc-next, that's
> stuff has been simmering for way too long and is in a good enough shape
> imho, but I need the capabilities and ioctl numbers locked in your tree
> first.
>
> Cheers,
> Ben.
>
> > Signed-off-by: Alexey Kardashevskiy <[email protected]>
> > ---
> > Changes:
> > 2013/07/16:
> > * changed the number
> >
> > Signed-off-by: Alexey Kardashevskiy <[email protected]>
> > ---
> > include/uapi/linux/kvm.h | 1 +
> > 1 file changed, 1 insertion(+)
> >
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index acccd08..99c2533 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -667,6 +667,7 @@ struct kvm_ppc_smmu_info {
> > #define KVM_CAP_PPC_RTAS 91
> > #define KVM_CAP_IRQ_XICS 92
> > #define KVM_CAP_ARM_EL1_32BIT 93
> > +#define KVM_CAP_SPAPR_MULTITCE 94
> >
> > #ifdef KVM_CAP_IRQ_ROUTING
> >
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
Gleb.
On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
> > Gleb, any chance you can put this (and the next one) into a tree to
> > "lock in" the numbers ?
> >
> Applied it. Sorry for slow response, was on vocation and still go
> through the email backlog.
Thanks. Since it's not in a topic branch that I can pull, I'm going to
just cherry-pick them. However, they are in your "queue" branch, not
"next" branch. Should I still assume this is a stable branch and that
the numbers aren't going to change ?
Cheers,
Ben.
On Tue, 2013-08-27 at 14:19 +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
> > > Gleb, any chance you can put this (and the next one) into a tree to
> > > "lock in" the numbers ?
> > >
> > Applied it. Sorry for slow response, was on vocation and still go
> > through the email backlog.
>
> Thanks. Since it's not in a topic branch that I can pull, I'm going to
> just cherry-pick them. However, they are in your "queue" branch, not
> "next" branch. Should I still assume this is a stable branch and that
> the numbers aren't going to change ?
Oh and Alexey mentions that there are two capabilities and you only
applied one :-)
Cheers,
Ben.
On Tue, Aug 27, 2013 at 02:19:58PM +1000, Benjamin Herrenschmidt wrote:
> On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
> > > Gleb, any chance you can put this (and the next one) into a tree to
> > > "lock in" the numbers ?
> > >
> > Applied it. Sorry for slow response, was on vocation and still go
> > through the email backlog.
>
> Thanks. Since it's not in a topic branch that I can pull, I'm going to
> just cherry-pick them. However, they are in your "queue" branch, not
> "next" branch. Should I still assume this is a stable branch and that
> the numbers aren't going to change ?
>
Queue will become next after I test it, and if a test fails the
commit hashes may change; but since you are going to cherry-pick, which
does not preserve commit hashes, it does not matter.
--
Gleb.
On Tue, Aug 27, 2013 at 02:22:42PM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2013-08-27 at 14:19 +1000, Benjamin Herrenschmidt wrote:
> > On Mon, 2013-08-26 at 15:37 +0300, Gleb Natapov wrote:
> > > > Gleb, any chance you can put this (and the next one) into a tree to
> > > > "lock in" the numbers ?
> > > >
> > > Applied it. Sorry for slow response, was on vocation and still go
> > > through the email backlog.
> >
> > Thanks. Since it's not in a topic branch that I can pull, I'm going to
> > just cherry-pick them. However, they are in your "queue" branch, not
> > "next" branch. Should I still assume this is a stable branch and that
> > the numbers aren't going to change ?
>
> Oh and Alexey mentions that there are two capabilities and you only
> applied one :-)
>
Another one is:
[PATCH v8] KVM: PPC: reserve a capability and ioctl numbers for realmode VFIO
?
--
Gleb.
On Tue, 2013-08-27 at 09:40 +0300, Gleb Natapov wrote:
> > Thanks. Since it's not in a topic branch that I can pull, I'm going to
> > just cherry-pick them. However, they are in your "queue" branch, not
> > "next" branch. Should I still assume this is a stable branch and that
> > the numbers aren't going to change ?
> >
> Queue will become next after I will test it and if test will fail the
> commit hash may change, but since you are going to cherry-pick and this
> does not preserve commit hash it does not matter.
Right, as long as the actual cap and ioctl numbers remain stable :-)
Cheers,
Ben.
On Tue, 2013-08-27 at 09:41 +0300, Gleb Natapov wrote:
> > Oh and Alexey mentions that there are two capabilities and you only
> > applied one :-)
> >
> Another one is:
> [PATCH v8] KVM: PPC: reserve a capability and ioctl numbers for
> realmode VFIO
> ?
Yes, thanks !
Cheers,
Ben.