2013-10-01 20:15:07

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 0/4] KVM noncoherent DMA registration and VFIO pseudo device

This is a follow-up to my previous RFC series on this topic. The goal
is to unify how KVM manages guests in the presence of non-coherent DMA
trafic and provide a way for QEMU to register VFIO groups to enable
that support. Since this changes the way KVM handles things like
WBINVD, we use the VFIO external user interface so that KVM can verify
if a user has physical hardware access. For now we assume VFIO domains
are always non-coherent, which is true if the IOMMU driver honors the
IOMMU_CACHE mapping attribute. Once this is in and we figure out what
exactly VFIO should do regarding support for coherent IOMMU domains,
we can create a new interface for KVM to call. This is noted with a
TODO in this patch. I've reworked the device interface to use
attributes and pass an address for the file descriptor since the RFC.
Alexey, I expect that a new attribute to associate an LIOBN to a group
and arch callbacks to make use of that information is all that you'll
need to add for POWER. Thanks,

Alex

---

Alex Williamson (4):
kvm: Destroy & free KVM devices on release
kvm/x86: Convert iommu_flags to iommu_noncoherent
kvm: Create non-coherent DMA registeration
kvm: Add VFIO device for handling IOMMU cache coherency


Documentation/virtual/kvm/devices/vfio.txt | 22 ++
arch/ia64/include/asm/kvm_host.h | 2
arch/powerpc/kvm/book3s_xics.c | 1
arch/x86/include/asm/kvm_host.h | 4
arch/x86/kvm/Makefile | 2
arch/x86/kvm/vmx.c | 3
arch/x86/kvm/x86.c | 21 ++
include/linux/kvm_host.h | 23 ++
include/uapi/linux/kvm.h | 4
virt/kvm/iommu.c | 22 +-
virt/kvm/kvm_main.c | 8 +
virt/kvm/vfio.c | 262 ++++++++++++++++++++++++++++
12 files changed, 355 insertions(+), 19 deletions(-)
create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
create mode 100644 virt/kvm/vfio.c


2013-10-01 20:15:13

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 1/4] kvm: Destroy & free KVM devices on release

The KVM device interface allocates a struct kvm_device and calls
kvm_device_ops.create on it from KVM VM ioctl KVM_CREATE_DEVICE.
This returns a file descriptor to the user for them to set/get/check
further attributes. On closing the file descriptor, one would assume
that kvm_device_ops.destroy is called and all traces of the device
would go away. One would be wrong, it actually does nothing more
than release the struct kvm reference, waiting until the VM is
destroyed before doing more. This leaves devices that only want a
single instance of themselves per VM in a tough spot.

To fix this, do full cleanup on release of the device file descriptor.
It's also non-symmetric that one of the existing devices frees the
struct kvm_device from it's .destroy function, while the other
doesn't. KVM-core allocates the structure and should therefore be
responsible for freeing it. Finally, add a missing kfree for the
device creation error path.

Signed-off-by: Alex Williamson <[email protected]>
---
arch/powerpc/kvm/book3s_xics.c | 1 -
virt/kvm/kvm_main.c | 5 +++++
2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/book3s_xics.c b/arch/powerpc/kvm/book3s_xics.c
index a3a5cb8..9a82426 100644
--- a/arch/powerpc/kvm/book3s_xics.c
+++ b/arch/powerpc/kvm/book3s_xics.c
@@ -1220,7 +1220,6 @@ static void kvmppc_xics_free(struct kvm_device *dev)
for (i = 0; i <= xics->max_icsid; i++)
kfree(xics->ics[i]);
kfree(xics);
- kfree(dev);
}

static int kvmppc_xics_create(struct kvm_device *dev, u32 type)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 979bff4..8727820 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -572,6 +572,7 @@ static void kvm_destroy_devices(struct kvm *kvm)

list_del(node);
dev->ops->destroy(dev);
+ kfree(dev);
}
}

@@ -2229,6 +2230,9 @@ static int kvm_device_release(struct inode *inode, struct file *filp)
struct kvm_device *dev = filp->private_data;
struct kvm *kvm = dev->kvm;

+ list_del(&dev->vm_node);
+ dev->ops->destroy(dev);
+ kfree(dev);
kvm_put_kvm(kvm);
return 0;
}
@@ -2292,6 +2296,7 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
ret = anon_inode_getfd(ops->name, &kvm_device_fops, dev, O_RDWR | O_CLOEXEC);
if (ret < 0) {
ops->destroy(dev);
+ kfree(dev);
return ret;
}

2013-10-01 20:15:19

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 2/4] kvm/x86: Convert iommu_flags to iommu_noncoherent

Default to operating in coherent mode. This simplifies the logic when
we switch to a model of registering and unregistering noncoherent I/O
with KVM.

Signed-off-by: Alex Williamson <[email protected]>
---
arch/ia64/include/asm/kvm_host.h | 2 +-
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/kvm/x86.c | 2 +-
include/linux/kvm_host.h | 3 ---
virt/kvm/iommu.c | 16 ++++++++--------
6 files changed, 12 insertions(+), 15 deletions(-)

diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h
index 989dd3f..1933b7a 100644
--- a/arch/ia64/include/asm/kvm_host.h
+++ b/arch/ia64/include/asm/kvm_host.h
@@ -480,7 +480,7 @@ struct kvm_arch {

struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
- int iommu_flags;
+ bool iommu_noncoherent;

unsigned long irq_sources_bitmap;
unsigned long irq_states[KVM_IOAPIC_NUM_PINS];
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c76ff74..1b6b5f9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -557,7 +557,7 @@ struct kvm_arch {

struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
- int iommu_flags;
+ bool iommu_noncoherent;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a1216de..8b2270a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7405,7 +7405,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
if (is_mmio)
ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
else if (vcpu->kvm->arch.iommu_domain &&
- !(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY))
+ vcpu->kvm->arch.iommu_noncoherent)
ret = kvm_get_guest_memory_type(vcpu, gfn) <<
VMX_EPT_MT_EPTE_SHIFT;
else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e5ca72a..b1231b0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2716,7 +2716,7 @@ static void wbinvd_ipi(void *garbage)
static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
{
return vcpu->kvm->arch.iommu_domain &&
- !(vcpu->kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY);
+ vcpu->kvm->arch.iommu_noncoherent;
}

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 0fbbc7a..f46da56 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -747,9 +747,6 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
int kvm_request_irq_source_id(struct kvm *kvm);
void kvm_free_irq_source_id(struct kvm *kvm, int irq_source_id);

-/* For vcpu->arch.iommu_flags */
-#define KVM_IOMMU_CACHE_COHERENCY 0x1
-
#ifdef CONFIG_KVM_DEVICE_ASSIGNMENT
int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
void kvm_iommu_unmap_pages(struct kvm *kvm, struct kvm_memory_slot *slot);
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 72a130b..9cde444 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -79,7 +79,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
flags = IOMMU_READ;
if (!(slot->flags & KVM_MEM_READONLY))
flags |= IOMMU_WRITE;
- if (kvm->arch.iommu_flags & KVM_IOMMU_CACHE_COHERENCY)
+ if (!kvm->arch.iommu_noncoherent)
flags |= IOMMU_CACHE;


@@ -158,7 +158,8 @@ int kvm_assign_device(struct kvm *kvm,
{
struct pci_dev *pdev = NULL;
struct iommu_domain *domain = kvm->arch.iommu_domain;
- int r, last_flags;
+ int r;
+ bool noncoherent;

/* check if iommu exists and in use */
if (!domain)
@@ -174,15 +175,13 @@ int kvm_assign_device(struct kvm *kvm,
return r;
}

- last_flags = kvm->arch.iommu_flags;
- if (iommu_domain_has_cap(kvm->arch.iommu_domain,
- IOMMU_CAP_CACHE_COHERENCY))
- kvm->arch.iommu_flags |= KVM_IOMMU_CACHE_COHERENCY;
+ noncoherent = !iommu_domain_has_cap(kvm->arch.iommu_domain,
+ IOMMU_CAP_CACHE_COHERENCY);

/* Check if need to update IOMMU page table for guest memory */
- if ((last_flags ^ kvm->arch.iommu_flags) ==
- KVM_IOMMU_CACHE_COHERENCY) {
+ if (noncoherent != kvm->arch.iommu_noncoherent) {
kvm_iommu_unmap_memslots(kvm);
+ kvm->arch.iommu_noncoherent = noncoherent;
r = kvm_iommu_map_memslots(kvm);
if (r)
goto out_unmap;
@@ -350,6 +349,7 @@ int kvm_iommu_unmap_guest(struct kvm *kvm)
mutex_lock(&kvm->slots_lock);
kvm_iommu_unmap_memslots(kvm);
kvm->arch.iommu_domain = NULL;
+ kvm->arch.iommu_noncoherent = false;
mutex_unlock(&kvm->slots_lock);

iommu_domain_free(domain);

2013-10-01 20:15:26

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 3/4] kvm: Create non-coherent DMA registeration

We currently use some ad-hoc arch variables tied to legacy KVM device
assignment to manage emulation of instructions that depend on whether
non-coherent DMA is present. Create an interface for this so that we
can register coherency for other devices, like vfio assigned devices.

Signed-off-by: Alex Williamson <[email protected]>
---
arch/x86/include/asm/kvm_host.h | 2 ++
arch/x86/kvm/vmx.c | 3 +--
arch/x86/kvm/x86.c | 21 +++++++++++++++++++--
include/linux/kvm_host.h | 19 +++++++++++++++++++
virt/kvm/iommu.c | 6 ++++++
5 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 1b6b5f9..50c1e9c1 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -558,6 +558,8 @@ struct kvm_arch {
struct list_head assigned_dev_head;
struct iommu_domain *iommu_domain;
bool iommu_noncoherent;
+#define __KVM_HAVE_ARCH_NONCOHERENT_DMA
+ atomic_t noncoherent_dma_count;
struct kvm_pic *vpic;
struct kvm_ioapic *vioapic;
struct kvm_pit *vpit;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 8b2270a..a982c9e 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -7404,8 +7404,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
*/
if (is_mmio)
ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
- else if (vcpu->kvm->arch.iommu_domain &&
- vcpu->kvm->arch.iommu_noncoherent)
+ else if (kvm_arch_has_noncoherent_dma(vcpu->kvm))
ret = kvm_get_guest_memory_type(vcpu, gfn) <<
VMX_EPT_MT_EPTE_SHIFT;
else
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b1231b0..feec86d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2715,8 +2715,7 @@ static void wbinvd_ipi(void *garbage)

static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
{
- return vcpu->kvm->arch.iommu_domain &&
- vcpu->kvm->arch.iommu_noncoherent;
+ return kvm_arch_has_noncoherent_dma(vcpu->kvm);
}

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
@@ -7420,6 +7419,24 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
kvm_x86_ops->interrupt_allowed(vcpu);
}

+void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
+{
+ atomic_inc(&kvm->arch.noncoherent_dma_count);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
+
+void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
+{
+ atomic_dec(&kvm->arch.noncoherent_dma_count);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
+
+bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
+{
+ return atomic_read(&kvm->arch.noncoherent_dma_count);
+}
+EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
+
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f46da56..e239c93 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -671,6 +671,25 @@ static inline void kvm_arch_free_vm(struct kvm *kvm)
}
#endif

+#ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
+void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
+void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
+bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
+#else
+static inline void kvm_arch_register_noncoherent_dma(void)
+{
+}
+
+static inline void kvm_arch_unregister_noncoherent_dma(void)
+{
+}
+
+static inline bool kvm_arch_has_noncoherent_dma(void)
+{
+ return false;
+}
+#endif
+
static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
{
#ifdef __KVM_HAVE_ARCH_WQP
diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index 9cde444..0a54456 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -140,6 +140,9 @@ static int kvm_iommu_map_memslots(struct kvm *kvm)
struct kvm_memslots *slots;
struct kvm_memory_slot *memslot;

+ if (kvm->arch.iommu_noncoherent)
+ kvm_arch_register_noncoherent_dma(kvm);
+
idx = srcu_read_lock(&kvm->srcu);
slots = kvm_memslots(kvm);

@@ -335,6 +338,9 @@ static int kvm_iommu_unmap_memslots(struct kvm *kvm)

srcu_read_unlock(&kvm->srcu, idx);

+ if (kvm->arch.iommu_noncoherent)
+ kvm_arch_unregister_noncoherent_dma(kvm);
+
return 0;
}

2013-10-01 20:15:33

by Alex Williamson

[permalink] [raw]
Subject: [PATCH 4/4] kvm: Add VFIO device for handling IOMMU cache coherency

So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but there's an important point where that breaks down. Intel
VT-d hardware may or may not support snoop control. When snoop
control is available, intel-iommu promotes No-Snoop transactions on
PCIe to be cache coherent. That allows KVM to handle things like the
x86 WBINVD opcode as a nop. When the hardware does not support this,
KVM must implement a hardware visible WBINVD for the guest.

We could simply let userspace tell KVM how to handle WBINVD, but it's
privileged for a reason. Allowing an arbitrary user to enable
physical WBINVD gives them more access to the hardware. Previously,
this has only been enabled for guests supporting legacy PCI device
assignment. In such cases it's necessary for proper guest execution.
We therefore create a new KVM-VFIO virtual device. The user can add
and remove VFIO groups to this device via file descriptors. KVM
makes use of the VFIO external user interface to validate that the
user has access to physical hardware and, for now, assumes the I/O
is noncoherent. Eventually we'll add an interface to allow KVM to
determine the conherency of the domain as noted in the TODO.

Signed-off-by: Alex Williamson <[email protected]>
---
Documentation/virtual/kvm/devices/vfio.txt | 22 ++
arch/x86/kvm/Makefile | 2
include/linux/kvm_host.h | 1
include/uapi/linux/kvm.h | 4
virt/kvm/kvm_main.c | 3
virt/kvm/vfio.c | 262 ++++++++++++++++++++++++++++
6 files changed, 293 insertions(+), 1 deletion(-)
create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
create mode 100644 virt/kvm/vfio.c

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
new file mode 100644
index 0000000..ef51740
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -0,0 +1,22 @@
+VFIO virtual device
+===================
+
+Device types supported:
+ KVM_DEV_TYPE_VFIO
+
+Only one VFIO instance may be created per VM. The created device
+tracks VFIO groups in use by the VM and features of those groups
+important to the correctness and acceleration of the VM. As groups
+are enabled and disabled for use by the VM, KVM should be updated
+about their presence. When registered with KVM, a reference to the
+VFIO-group is held by KVM.
+
+Groups:
+ KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_GROUP attributes:
+ KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
+ KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
+
+For each, kvm_device_attr.addr points to an int32_t file descriptor
+for the VFIO group.
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index bf4fb04..25d22b2 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,7 +9,7 @@ KVM := ../../../virt/kvm

kvm-y += $(KVM)/kvm_main.o $(KVM)/ioapic.o \
$(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
- $(KVM)/eventfd.o $(KVM)/irqchip.o
+ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += $(KVM)/assigned-dev.o $(KVM)/iommu.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e239c93..8a6bfa0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1082,6 +1082,7 @@ struct kvm_device *kvm_device_from_filp(struct file *filp);

extern struct kvm_device_ops kvm_mpic_ops;
extern struct kvm_device_ops kvm_xics_ops;
+extern struct kvm_device_ops kvm_vfio_ops;

#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 99c2533..7c1a349 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -843,6 +843,10 @@ struct kvm_device_attr {
#define KVM_DEV_TYPE_FSL_MPIC_20 1
#define KVM_DEV_TYPE_FSL_MPIC_42 2
#define KVM_DEV_TYPE_XICS 3
+#define KVM_DEV_TYPE_VFIO 4
+#define KVM_DEV_VFIO_GROUP 1
+#define KVM_DEV_VFIO_GROUP_ADD 1
+#define KVM_DEV_VFIO_GROUP_DEL 2

/*
* ioctls for VM fds
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8727820..85185af 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2273,6 +2273,9 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
ops = &kvm_xics_ops;
break;
#endif
+ case KVM_DEV_TYPE_VFIO:
+ ops = &kvm_vfio_ops;
+ break;
default:
return -ENODEV;
}
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
new file mode 100644
index 0000000..37a8261
--- /dev/null
+++ b/virt/kvm/vfio.c
@@ -0,0 +1,262 @@
+/*
+ * VFIO-KVM bridge pseudo device
+ *
+ * Copyright (C) 2013 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+
+struct kvm_vfio_group {
+ struct list_head node;
+ struct vfio_group *vfio_group;
+};
+
+struct kvm_vfio {
+ struct list_head group_list;
+ struct mutex lock;
+ bool noncoherent;
+};
+
+static struct vfio_group *kvm_vfio_group_get_external_user(struct file *filep)
+{
+ struct vfio_group *vfio_group;
+ struct vfio_group *(*fn)(struct file *);
+
+ fn = symbol_get(vfio_group_get_external_user);
+ if (!fn)
+ return ERR_PTR(-EINVAL);
+
+ vfio_group = fn(filep);
+
+ symbol_put(vfio_group_get_external_user);
+
+ return vfio_group;
+}
+
+static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
+{
+ void (*fn)(struct vfio_group *);
+
+ fn = symbol_get(vfio_group_put_external_user);
+ if (!fn)
+ return;
+
+ fn(vfio_group);
+
+ symbol_put(vfio_group_put_external_user);
+}
+
+/*
+ * Groups can use the same or different IOMMU domains. If the same then
+ * adding a new group may change the coherency of groups we've previously
+ * been told about. We don't want to care about any of that so we retest
+ * each group and bail as soon as we find one that's noncoherent. This
+ * means we only ever [un]register_noncoherent_dma once for the whole device.
+ */
+static void kvm_vfio_update_coherency(struct kvm_device *dev)
+{
+ struct kvm_vfio *kv = dev->private;
+ bool noncoherent = false;
+ struct kvm_vfio_group *kvg;
+
+ mutex_lock(&kv->lock);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ /*
+ * TODO: We need an interface to check the coherency of
+ * the IOMMU domain this group is using. For now, assume
+ * it's always noncoherent.
+ */
+ noncoherent = true;
+ break;
+ }
+
+ if (noncoherent != kv->noncoherent) {
+ kv->noncoherent = noncoherent;
+
+ if (kv->noncoherent)
+ kvm_arch_register_noncoherent_dma(dev->kvm);
+ else
+ kvm_arch_unregister_noncoherent_dma(dev->kvm);
+ }
+
+ mutex_unlock(&kv->lock);
+}
+
+static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
+{
+ struct kvm_vfio *kv = dev->private;
+ struct vfio_group *vfio_group;
+ struct kvm_vfio_group *kvg;
+ struct fd f;
+ int32_t fd;
+ int ret;
+
+ switch (attr) {
+ case KVM_DEV_VFIO_GROUP_ADD:
+ if (get_user(fd, (int32_t __user *)arg))
+ return -EFAULT;
+
+ f = fdget(fd);
+ if (!f.file)
+ return -EBADF;
+
+ vfio_group = kvm_vfio_group_get_external_user(f.file);
+ fdput(f);
+
+ if (IS_ERR(vfio_group))
+ return PTR_ERR(vfio_group);
+
+ mutex_lock(&kv->lock);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ if (kvg->vfio_group == vfio_group) {
+ mutex_unlock(&kv->lock);
+ kvm_vfio_group_put_external_user(vfio_group);
+ return -EEXIST;
+ }
+ }
+
+ kvg = kzalloc(sizeof(*kvg), GFP_KERNEL);
+ if (!kvg) {
+ mutex_unlock(&kv->lock);
+ kvm_vfio_group_put_external_user(vfio_group);
+ return -ENOMEM;
+ }
+
+ list_add_tail(&kvg->node, &kv->group_list);
+ kvg->vfio_group = vfio_group;
+
+ mutex_unlock(&kv->lock);
+
+ kvm_vfio_update_coherency(dev);
+
+ return 0;
+
+ case KVM_DEV_VFIO_GROUP_DEL:
+ if (get_user(fd, (int32_t __user *)arg))
+ return -EFAULT;
+
+ f = fdget(fd);
+ if (!f.file)
+ return -EBADF;
+
+ vfio_group = kvm_vfio_group_get_external_user(f.file);
+ fdput(f);
+
+ if (IS_ERR(vfio_group))
+ return PTR_ERR(vfio_group);
+
+ ret = -ENOENT;
+
+ mutex_lock(&kv->lock);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ if (kvg->vfio_group != vfio_group)
+ continue;
+
+ list_del(&kvg->node);
+ kvm_vfio_group_put_external_user(kvg->vfio_group);
+ kfree(kvg);
+ ret = 0;
+ break;
+ }
+
+ mutex_unlock(&kv->lock);
+
+ kvm_vfio_group_put_external_user(vfio_group);
+
+ kvm_vfio_update_coherency(dev);
+
+ return ret;
+ }
+
+ return -ENXIO;
+}
+
+static int kvm_vfio_set_attr(struct kvm_device *dev,
+ struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_VFIO_GROUP:
+ return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+ }
+
+ return -ENXIO;
+}
+
+static int kvm_vfio_has_attr(struct kvm_device *dev,
+ struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_VFIO_GROUP:
+ switch (attr->attr) {
+ case KVM_DEV_VFIO_GROUP_ADD:
+ case KVM_DEV_VFIO_GROUP_DEL:
+ return 0;
+ }
+
+ break;
+ }
+
+ return -ENXIO;
+}
+
+static void kvm_vfio_destroy(struct kvm_device *dev)
+{
+ struct kvm_vfio *kv = dev->private;
+ struct kvm_vfio_group *kvg, *tmp;
+
+ list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+ kvm_vfio_group_put_external_user(kvg->vfio_group);
+ list_del(&kvg->node);
+ kfree(kvg);
+ }
+
+ kvm_vfio_update_coherency(dev);
+
+ kfree(kv);
+}
+
+static int kvm_vfio_create(struct kvm_device *dev, u32 type)
+{
+ struct kvm_device *tmp;
+ struct kvm_vfio *kv;
+
+ /* Only one VFIO "device" per VM */
+ list_for_each_entry(tmp, &dev->kvm->devices, vm_node)
+ if (tmp->ops == &kvm_vfio_ops)
+ return -EBUSY;
+
+ kv = kzalloc(sizeof(*kv), GFP_KERNEL);
+ if (!kv)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&kv->group_list);
+ mutex_init(&kv->lock);
+
+ dev->private = kv;
+
+ return 0;
+}
+
+struct kvm_device_ops kvm_vfio_ops = {
+ .name = "kvm-vfio",
+ .create = kvm_vfio_create,
+ .destroy = kvm_vfio_destroy,
+ .set_attr = kvm_vfio_set_attr,
+ .has_attr = kvm_vfio_has_attr,
+};

2013-10-03 02:55:20

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v2 4/4] kvm: Add VFIO device for handling IOMMU cache coherency

So far we've succeeded at making KVM and VFIO mostly unaware of each
other, but there's an important point where that breaks down. Intel
VT-d hardware may or may not support snoop control. When snoop
control is available, intel-iommu promotes No-Snoop transactions on
PCIe to be cache coherent. That allows KVM to handle things like the
x86 WBINVD opcode as a nop. When the hardware does not support this,
KVM must implement a hardware visible WBINVD for the guest.

We could simply let userspace tell KVM how to handle WBINVD, but it's
privileged for a reason. Allowing an arbitrary user to enable
physical WBINVD gives them more access to the hardware. Previously,
this has only been enabled for guests supporting legacy PCI device
assignment. In such cases it's necessary for proper guest execution.
We therefore create a new KVM-VFIO virtual device. The user can add
and remove VFIO groups to this device via file descriptors. KVM
makes use of the VFIO external user interface to validate that the
user has access to physical hardware and, for now, assumes the I/O
is noncoherent. Eventually we'll add an interface to allow KVM to
determine the conherency of the domain as noted in the TODO.

Signed-off-by: Alex Williamson <[email protected]>
---

v2: Patches 1-3 of v1 series remain the same, not resent
- Fix cast warning from (int32_t *)u64 from get_user calls
- Add a Kconfig variable to protect kvm_vfio_ops for archs
not (yet) building virt/kvm/vfio.c

Both of these identified by the 0-day build bots (thanks!)

Documentation/virtual/kvm/devices/vfio.txt | 22 ++
arch/x86/kvm/Kconfig | 1
arch/x86/kvm/Makefile | 2
include/linux/kvm_host.h | 1
include/uapi/linux/kvm.h | 4
virt/kvm/Kconfig | 3
virt/kvm/kvm_main.c | 5 +
virt/kvm/vfio.c | 263 ++++++++++++++++++++++++++++
8 files changed, 300 insertions(+), 1 deletion(-)
create mode 100644 Documentation/virtual/kvm/devices/vfio.txt
create mode 100644 virt/kvm/vfio.c

diff --git a/Documentation/virtual/kvm/devices/vfio.txt b/Documentation/virtual/kvm/devices/vfio.txt
new file mode 100644
index 0000000..ef51740
--- /dev/null
+++ b/Documentation/virtual/kvm/devices/vfio.txt
@@ -0,0 +1,22 @@
+VFIO virtual device
+===================
+
+Device types supported:
+ KVM_DEV_TYPE_VFIO
+
+Only one VFIO instance may be created per VM. The created device
+tracks VFIO groups in use by the VM and features of those groups
+important to the correctness and acceleration of the VM. As groups
+are enabled and disabled for use by the VM, KVM should be updated
+about their presence. When registered with KVM, a reference to the
+VFIO-group is held by KVM.
+
+Groups:
+ KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_GROUP attributes:
+ KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
+ KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
+
+For each, kvm_device_attr.addr points to an int32_t file descriptor
+for the VFIO group.
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index a47a3e5..b89c5db 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -38,6 +38,7 @@ config KVM
select PERF_EVENTS
select HAVE_KVM_MSI
select HAVE_KVM_CPU_RELAX_INTERCEPT
+ select KVM_VFIO
---help---
Support hosting fully virtualized guest machines using hardware
virtualization extensions. You will need a fairly recent
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index bf4fb04..25d22b2 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -9,7 +9,7 @@ KVM := ../../../virt/kvm

kvm-y += $(KVM)/kvm_main.o $(KVM)/ioapic.o \
$(KVM)/coalesced_mmio.o $(KVM)/irq_comm.o \
- $(KVM)/eventfd.o $(KVM)/irqchip.o
+ $(KVM)/eventfd.o $(KVM)/irqchip.o $(KVM)/vfio.o
kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += $(KVM)/assigned-dev.o $(KVM)/iommu.o
kvm-$(CONFIG_KVM_ASYNC_PF) += $(KVM)/async_pf.o

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index e239c93..8a6bfa0 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -1082,6 +1082,7 @@ struct kvm_device *kvm_device_from_filp(struct file *filp);

extern struct kvm_device_ops kvm_mpic_ops;
extern struct kvm_device_ops kvm_xics_ops;
+extern struct kvm_device_ops kvm_vfio_ops;

#ifdef CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT

diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 99c2533..7c1a349 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -843,6 +843,10 @@ struct kvm_device_attr {
#define KVM_DEV_TYPE_FSL_MPIC_20 1
#define KVM_DEV_TYPE_FSL_MPIC_42 2
#define KVM_DEV_TYPE_XICS 3
+#define KVM_DEV_TYPE_VFIO 4
+#define KVM_DEV_VFIO_GROUP 1
+#define KVM_DEV_VFIO_GROUP_ADD 1
+#define KVM_DEV_VFIO_GROUP_DEL 2

/*
* ioctls for VM fds
diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
index 779262f..fbe1a48 100644
--- a/virt/kvm/Kconfig
+++ b/virt/kvm/Kconfig
@@ -27,3 +27,6 @@ config HAVE_KVM_MSI

config HAVE_KVM_CPU_RELAX_INTERCEPT
bool
+
+config KVM_VFIO
+ bool
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8727820..662ce0d 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2273,6 +2273,11 @@ static int kvm_ioctl_create_device(struct kvm *kvm,
ops = &kvm_xics_ops;
break;
#endif
+#ifdef CONFIG_KVM_VFIO
+ case KVM_DEV_TYPE_VFIO:
+ ops = &kvm_vfio_ops;
+ break;
+#endif
default:
return -ENODEV;
}
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
new file mode 100644
index 0000000..2e336a7
--- /dev/null
+++ b/virt/kvm/vfio.c
@@ -0,0 +1,263 @@
+/*
+ * VFIO-KVM bridge pseudo device
+ *
+ * Copyright (C) 2013 Red Hat, Inc. All rights reserved.
+ * Author: Alex Williamson <[email protected]>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+
+struct kvm_vfio_group {
+ struct list_head node;
+ struct vfio_group *vfio_group;
+};
+
+struct kvm_vfio {
+ struct list_head group_list;
+ struct mutex lock;
+ bool noncoherent;
+};
+
+static struct vfio_group *kvm_vfio_group_get_external_user(struct file *filep)
+{
+ struct vfio_group *vfio_group;
+ struct vfio_group *(*fn)(struct file *);
+
+ fn = symbol_get(vfio_group_get_external_user);
+ if (!fn)
+ return ERR_PTR(-EINVAL);
+
+ vfio_group = fn(filep);
+
+ symbol_put(vfio_group_get_external_user);
+
+ return vfio_group;
+}
+
+static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
+{
+ void (*fn)(struct vfio_group *);
+
+ fn = symbol_get(vfio_group_put_external_user);
+ if (!fn)
+ return;
+
+ fn(vfio_group);
+
+ symbol_put(vfio_group_put_external_user);
+}
+
+/*
+ * Groups can use the same or different IOMMU domains. If the same then
+ * adding a new group may change the coherency of groups we've previously
+ * been told about. We don't want to care about any of that so we retest
+ * each group and bail as soon as we find one that's noncoherent. This
+ * means we only ever [un]register_noncoherent_dma once for the whole device.
+ */
+static void kvm_vfio_update_coherency(struct kvm_device *dev)
+{
+ struct kvm_vfio *kv = dev->private;
+ bool noncoherent = false;
+ struct kvm_vfio_group *kvg;
+
+ mutex_lock(&kv->lock);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ /*
+ * TODO: We need an interface to check the coherency of
+ * the IOMMU domain this group is using. For now, assume
+ * it's always noncoherent.
+ */
+ noncoherent = true;
+ break;
+ }
+
+ if (noncoherent != kv->noncoherent) {
+ kv->noncoherent = noncoherent;
+
+ if (kv->noncoherent)
+ kvm_arch_register_noncoherent_dma(dev->kvm);
+ else
+ kvm_arch_unregister_noncoherent_dma(dev->kvm);
+ }
+
+ mutex_unlock(&kv->lock);
+}
+
+static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
+{
+ struct kvm_vfio *kv = dev->private;
+ struct vfio_group *vfio_group;
+ struct kvm_vfio_group *kvg;
+ void __user *argp = (void __user *)arg;
+ struct fd f;
+ int32_t fd;
+ int ret;
+
+ switch (attr) {
+ case KVM_DEV_VFIO_GROUP_ADD:
+ if (get_user(fd, (int32_t __user *)argp))
+ return -EFAULT;
+
+ f = fdget(fd);
+ if (!f.file)
+ return -EBADF;
+
+ vfio_group = kvm_vfio_group_get_external_user(f.file);
+ fdput(f);
+
+ if (IS_ERR(vfio_group))
+ return PTR_ERR(vfio_group);
+
+ mutex_lock(&kv->lock);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ if (kvg->vfio_group == vfio_group) {
+ mutex_unlock(&kv->lock);
+ kvm_vfio_group_put_external_user(vfio_group);
+ return -EEXIST;
+ }
+ }
+
+ kvg = kzalloc(sizeof(*kvg), GFP_KERNEL);
+ if (!kvg) {
+ mutex_unlock(&kv->lock);
+ kvm_vfio_group_put_external_user(vfio_group);
+ return -ENOMEM;
+ }
+
+ list_add_tail(&kvg->node, &kv->group_list);
+ kvg->vfio_group = vfio_group;
+
+ mutex_unlock(&kv->lock);
+
+ kvm_vfio_update_coherency(dev);
+
+ return 0;
+
+ case KVM_DEV_VFIO_GROUP_DEL:
+ if (get_user(fd, (int32_t __user *)argp))
+ return -EFAULT;
+
+ f = fdget(fd);
+ if (!f.file)
+ return -EBADF;
+
+ vfio_group = kvm_vfio_group_get_external_user(f.file);
+ fdput(f);
+
+ if (IS_ERR(vfio_group))
+ return PTR_ERR(vfio_group);
+
+ ret = -ENOENT;
+
+ mutex_lock(&kv->lock);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ if (kvg->vfio_group != vfio_group)
+ continue;
+
+ list_del(&kvg->node);
+ kvm_vfio_group_put_external_user(kvg->vfio_group);
+ kfree(kvg);
+ ret = 0;
+ break;
+ }
+
+ mutex_unlock(&kv->lock);
+
+ kvm_vfio_group_put_external_user(vfio_group);
+
+ kvm_vfio_update_coherency(dev);
+
+ return ret;
+ }
+
+ return -ENXIO;
+}
+
+static int kvm_vfio_set_attr(struct kvm_device *dev,
+ struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_VFIO_GROUP:
+ return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+ }
+
+ return -ENXIO;
+}
+
+static int kvm_vfio_has_attr(struct kvm_device *dev,
+ struct kvm_device_attr *attr)
+{
+ switch (attr->group) {
+ case KVM_DEV_VFIO_GROUP:
+ switch (attr->attr) {
+ case KVM_DEV_VFIO_GROUP_ADD:
+ case KVM_DEV_VFIO_GROUP_DEL:
+ return 0;
+ }
+
+ break;
+ }
+
+ return -ENXIO;
+}
+
+static void kvm_vfio_destroy(struct kvm_device *dev)
+{
+ struct kvm_vfio *kv = dev->private;
+ struct kvm_vfio_group *kvg, *tmp;
+
+ list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+ kvm_vfio_group_put_external_user(kvg->vfio_group);
+ list_del(&kvg->node);
+ kfree(kvg);
+ }
+
+ kvm_vfio_update_coherency(dev);
+
+ kfree(kv);
+}
+
+static int kvm_vfio_create(struct kvm_device *dev, u32 type)
+{
+ struct kvm_device *tmp;
+ struct kvm_vfio *kv;
+
+ /* Only one VFIO "device" per VM */
+ list_for_each_entry(tmp, &dev->kvm->devices, vm_node)
+ if (tmp->ops == &kvm_vfio_ops)
+ return -EBUSY;
+
+ kv = kzalloc(sizeof(*kv), GFP_KERNEL);
+ if (!kv)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&kv->group_list);
+ mutex_init(&kv->lock);
+
+ dev->private = kv;
+
+ return 0;
+}
+
+struct kvm_device_ops kvm_vfio_ops = {
+ .name = "kvm-vfio",
+ .create = kvm_vfio_create,
+ .destroy = kvm_vfio_destroy,
+ .set_attr = kvm_vfio_set_attr,
+ .has_attr = kvm_vfio_has_attr,
+};

2013-10-03 20:40:50

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v2 4/4] kvm: Add VFIO device for handling IOMMU cache coherency

On Wed, 2013-10-02 at 20:55 -0600, Alex Williamson wrote:
> So far we've succeeded at making KVM and VFIO mostly unaware of each
> other, but there's an important point where that breaks down. Intel
> VT-d hardware may or may not support snoop control. When snoop
> control is available, intel-iommu promotes No-Snoop transactions on
> PCIe to be cache coherent. That allows KVM to handle things like the
> x86 WBINVD opcode as a nop. When the hardware does not support this,
> KVM must implement a hardware visible WBINVD for the guest.
>
> We could simply let userspace tell KVM how to handle WBINVD, but it's
> privileged for a reason. Allowing an arbitrary user to enable
> physical WBINVD gives them more access to the hardware. Previously,
> this has only been enabled for guests supporting legacy PCI device
> assignment. In such cases it's necessary for proper guest execution.
> We therefore create a new KVM-VFIO virtual device. The user can add
> and remove VFIO groups to this device via file descriptors. KVM
> makes use of the VFIO external user interface to validate that the
> user has access to physical hardware and, for now, assumes the I/O
> is noncoherent. Eventually we'll add an interface to allow KVM to
> determine the conherency of the domain as noted in the TODO.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
>
> v2: Patches 1-3 of v1 series remain the same, not resent
> - Fix cast warning from (int32_t *)u64 from get_user calls
> - Add a Kconfig variable to protect kvm_vfio_ops for archs
> not (yet) building virt/kvm/vfio.c

There might be another option for my particular need of this. The
device PCIe capability has a bit in the Device Control register that
enables a device to do NoSnoop transactions. Therefore it seems like by
clearing this bit on the physical device and emulating it as read-only
in the guest, we can prevent the NoSnoop at the device rather than at
the IOMMU. If we can prevent NoSnoop, then I don't think we need to
worry about things like WBINVD emulation in KVM. Let me work on this a
bit more before applying. Thanks,

Alex

2013-10-04 10:03:09

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [PATCH 3/4] kvm: Create non-coherent DMA registeration

On 10/02/2013 06:15 AM, Alex Williamson wrote:
> We currently use some ad-hoc arch variables tied to legacy KVM device
> assignment to manage emulation of instructions that depend on whether
> non-coherent DMA is present. Create an interface for this so that we
> can register coherency for other devices, like vfio assigned devices.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> arch/x86/include/asm/kvm_host.h | 2 ++
> arch/x86/kvm/vmx.c | 3 +--
> arch/x86/kvm/x86.c | 21 +++++++++++++++++++--
> include/linux/kvm_host.h | 19 +++++++++++++++++++
> virt/kvm/iommu.c | 6 ++++++
> 5 files changed, 47 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 1b6b5f9..50c1e9c1 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -558,6 +558,8 @@ struct kvm_arch {
> struct list_head assigned_dev_head;
> struct iommu_domain *iommu_domain;
> bool iommu_noncoherent;
> +#define __KVM_HAVE_ARCH_NONCOHERENT_DMA
> + atomic_t noncoherent_dma_count;
> struct kvm_pic *vpic;
> struct kvm_ioapic *vioapic;
> struct kvm_pit *vpit;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 8b2270a..a982c9e 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7404,8 +7404,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> */
> if (is_mmio)
> ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
> - else if (vcpu->kvm->arch.iommu_domain &&
> - vcpu->kvm->arch.iommu_noncoherent)
> + else if (kvm_arch_has_noncoherent_dma(vcpu->kvm))
> ret = kvm_get_guest_memory_type(vcpu, gfn) <<
> VMX_EPT_MT_EPTE_SHIFT;
> else
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index b1231b0..feec86d 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -2715,8 +2715,7 @@ static void wbinvd_ipi(void *garbage)
>
> static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
> {
> - return vcpu->kvm->arch.iommu_domain &&
> - vcpu->kvm->arch.iommu_noncoherent;
> + return kvm_arch_has_noncoherent_dma(vcpu->kvm);
> }
>
> void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> @@ -7420,6 +7419,24 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
> kvm_x86_ops->interrupt_allowed(vcpu);
> }
>
> +void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> +{
> + atomic_inc(&kvm->arch.noncoherent_dma_count);
> +}
> +EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
> +
> +void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> +{
> + atomic_dec(&kvm->arch.noncoherent_dma_count);
> +}
> +EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
> +
> +bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
> +{
> + return atomic_read(&kvm->arch.noncoherent_dma_count);
> +}
> +EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
> +
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index f46da56..e239c93 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -671,6 +671,25 @@ static inline void kvm_arch_free_vm(struct kvm *kvm)
> }
> #endif
>
> +#ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
> +void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
> +void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
> +bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
> +#else
> +static inline void kvm_arch_register_noncoherent_dma(void)


Wrong prototype here and below, must include *kvm.


> +{
> +}
> +
> +static inline void kvm_arch_unregister_noncoherent_dma(void)
> +{
> +}
> +
> +static inline bool kvm_arch_has_noncoherent_dma(void)
> +{
> + return false;
> +}
> +#endif
> +
> static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
> {
> #ifdef __KVM_HAVE_ARCH_WQP
> diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> index 9cde444..0a54456 100644
> --- a/virt/kvm/iommu.c
> +++ b/virt/kvm/iommu.c
> @@ -140,6 +140,9 @@ static int kvm_iommu_map_memslots(struct kvm *kvm)
> struct kvm_memslots *slots;
> struct kvm_memory_slot *memslot;
>
> + if (kvm->arch.iommu_noncoherent)
> + kvm_arch_register_noncoherent_dma(kvm);
> +
> idx = srcu_read_lock(&kvm->srcu);
> slots = kvm_memslots(kvm);
>
> @@ -335,6 +338,9 @@ static int kvm_iommu_unmap_memslots(struct kvm *kvm)
>
> srcu_read_unlock(&kvm->srcu, idx);
>
> + if (kvm->arch.iommu_noncoherent)
> + kvm_arch_unregister_noncoherent_dma(kvm);
> +
> return 0;
> }
>
>


--
Alexey

2013-10-04 12:24:20

by Alexey Kardashevskiy

[permalink] [raw]
Subject: [RFC PATCH] PPC: KVM: vfio kvm device: support spapr tce

This is a very rough change set required for ppc64 to use this KVM device.

vfio_rm.c is a piece of code which is going to be called from the realmode (MMU off),
and I will put everything spapr-related under #ifdef CONFIG_SPAPR_TCE_IOMMU,
it is just friday and I have to run :)

This is an RFC but it works.

Signed-off-by: Alexey Kardashevskiy <[email protected]>
---
arch/powerpc/kvm/Kconfig | 1 +
arch/powerpc/kvm/Makefile | 4 ++++
include/linux/kvm_host.h | 8 ++++---
include/linux/vfio.h | 3 +++
include/uapi/linux/kvm.h | 1 +
virt/kvm/vfio.c | 46 ++++++++++++++++++++++++++++++++++++++++
virt/kvm/vfio_rm.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
7 files changed, 114 insertions(+), 3 deletions(-)
create mode 100644 virt/kvm/vfio_rm.c

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index 61b3535..d1b7f64 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -60,6 +60,7 @@ config KVM_BOOK3S_64
select KVM_BOOK3S_64_HANDLER
select KVM
select SPAPR_TCE_IOMMU
+ select KVM_VFIO
---help---
Support running unmodified book3s_64 and book3s_32 guest kernels
in virtual machines on book3s_64 host processors.
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index 6646c95..fc2878b 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -55,6 +55,8 @@ kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)

kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
$(KVM)/coalesced_mmio.o \
+ $(KVM)/vfio.o \
+ $(KVM)/vfio_rm.o \
fpu.o \
book3s_paired_singles.o \
book3s_pr.o \
@@ -76,6 +78,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
book3s_hv_rm_xics.o
kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
+ $(KVM)/vfio_rm.o \
book3s_hv_rmhandlers.o \
book3s_hv_rm_mmu.o \
book3s_64_vio_hv.o \
@@ -89,6 +92,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \

kvm-book3s_64-module-objs := \
$(KVM)/kvm_main.o \
+ $(KVM)/vfio.o \
$(KVM)/eventfd.o \
powerpc.o \
emulate.o \
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index ad2b581..43c0290 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -407,6 +407,8 @@ struct kvm {
#endif
long tlbs_dirty;
struct list_head devices;
+
+ struct kvm_vfio *vfio;
};

#define kvm_err(fmt, ...) \
@@ -677,15 +679,15 @@ void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
#else
-static inline void kvm_arch_register_noncoherent_dma(void)
+static inline void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
{
}

-static inline void kvm_arch_unregister_noncoherent_dma(void)
+static inline void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
{
}

-static inline bool kvm_arch_has_noncoherent_dma(void)
+static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
{
return false;
}
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 24579a0..681e19b 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -97,4 +97,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
extern void vfio_group_put_external_user(struct vfio_group *group);
extern int vfio_external_user_iommu_id(struct vfio_group *group);

+extern struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm,
+ unsigned long liobn);
+
#endif /* VFIO_H */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 7c1a349..a74ad16 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -847,6 +847,7 @@ struct kvm_device_attr {
#define KVM_DEV_VFIO_GROUP 1
#define KVM_DEV_VFIO_GROUP_ADD 1
#define KVM_DEV_VFIO_GROUP_DEL 2
+#define KVM_DEV_VFIO_SPAPR_TCE_LIOBN 2

/*
* ioctls for VM fds
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 2e336a7..39dea9f 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -22,6 +22,7 @@
struct kvm_vfio_group {
struct list_head node;
struct vfio_group *vfio_group;
+ uint64_t liobn; /* sPAPR */
};

struct kvm_vfio {
@@ -188,12 +189,52 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
return -ENXIO;
}

+static int kvm_vfio_set_spapr_tce_liobn(struct kvm_device *dev,
+ long attr, u64 arg)
+{
+ struct kvm_vfio *kv = dev->private;
+ struct vfio_group *vfio_group;
+ struct kvm_vfio_group *kvg;
+ void __user *argp = (void __user *)arg;
+ struct fd f;
+ int32_t fd;
+ uint64_t liobn = attr;
+
+ if (get_user(fd, (int32_t __user *)argp))
+ return -EFAULT;
+
+ f = fdget(fd);
+ if (!f.file)
+ return -EBADF;
+
+ vfio_group = kvm_vfio_group_get_external_user(f.file);
+ fdput(f);
+
+ list_for_each_entry(kvg, &kv->group_list, node) {
+ if (kvg->vfio_group == vfio_group) {
+ WARN_ON(kvg->liobn);
+ kvg->liobn = liobn;
+ kvm_vfio_group_put_external_user(vfio_group);
+ return 0;
+ }
+ }
+
+ kvm_vfio_group_put_external_user(vfio_group);
+
+ return -ENXIO;
+}
+
static int kvm_vfio_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
case KVM_DEV_VFIO_GROUP:
return kvm_vfio_set_group(dev, attr->attr, attr->addr);
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+ case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
+ return kvm_vfio_set_spapr_tce_liobn(dev, attr->attr,
+ attr->addr);
+#endif
}

return -ENXIO;
@@ -211,6 +252,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
}

break;
+#ifdef CONFIG_SPAPR_TCE_IOMMU
+ case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
+ return 0;
+#endif
}

return -ENXIO;
@@ -250,6 +295,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
mutex_init(&kv->lock);

dev->private = kv;
+ dev->kvm->vfio = kv;

return 0;
}
diff --git a/virt/kvm/vfio_rm.c b/virt/kvm/vfio_rm.c
new file mode 100644
index 0000000..ee9fd96
--- /dev/null
+++ b/virt/kvm/vfio_rm.c
@@ -0,0 +1,54 @@
+#include <linux/errno.h>
+#include <linux/file.h>
+#include <linux/kvm_host.h>
+#include <linux/list.h>
+#include <linux/module.h>
+#include <linux/mutex.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/vfio.h>
+
+struct kvm_vfio_group {
+ struct list_head node;
+ struct vfio_group *vfio_group;
+ uint64_t liobn; /* sPAPR */
+};
+
+struct kvm_vfio {
+ struct list_head group_list;
+ struct mutex lock;
+ bool noncoherent;
+};
+
+struct vfio_group {
+ struct kref kref;
+ int minor;
+ atomic_t container_users;
+ struct iommu_group *iommu_group;
+ struct vfio_container *container;
+ struct list_head device_list;
+ struct mutex device_lock;
+ struct device *dev;
+ struct notifier_block nb;
+ struct list_head vfio_next;
+ struct list_head container_next;
+ atomic_t opened;
+};
+
+struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm, unsigned long liobn)
+{
+ struct kvm_vfio_group *kvg;
+
+ if (!kvm->vfio)
+ return NULL;
+
+ list_for_each_entry(kvg, &kvm->vfio->group_list, node) {
+ if (kvg->liobn == liobn)
+ return kvg->vfio_group->iommu_group;
+ }
+
+ return NULL;
+}
+EXPORT_SYMBOL_GPL(vfio_find_group_by_liobn);
+
+
--
1.8.4.rc4

2013-10-04 14:53:44

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH 3/4] kvm: Create non-coherent DMA registeration

On Fri, 2013-10-04 at 20:02 +1000, Alexey Kardashevskiy wrote:
> On 10/02/2013 06:15 AM, Alex Williamson wrote:
> > We currently use some ad-hoc arch variables tied to legacy KVM device
> > assignment to manage emulation of instructions that depend on whether
> > non-coherent DMA is present. Create an interface for this so that we
> > can register coherency for other devices, like vfio assigned devices.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> > arch/x86/include/asm/kvm_host.h | 2 ++
> > arch/x86/kvm/vmx.c | 3 +--
> > arch/x86/kvm/x86.c | 21 +++++++++++++++++++--
> > include/linux/kvm_host.h | 19 +++++++++++++++++++
> > virt/kvm/iommu.c | 6 ++++++
> > 5 files changed, 47 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> > index 1b6b5f9..50c1e9c1 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -558,6 +558,8 @@ struct kvm_arch {
> > struct list_head assigned_dev_head;
> > struct iommu_domain *iommu_domain;
> > bool iommu_noncoherent;
> > +#define __KVM_HAVE_ARCH_NONCOHERENT_DMA
> > + atomic_t noncoherent_dma_count;
> > struct kvm_pic *vpic;
> > struct kvm_ioapic *vioapic;
> > struct kvm_pit *vpit;
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 8b2270a..a982c9e 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -7404,8 +7404,7 @@ static u64 vmx_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
> > */
> > if (is_mmio)
> > ret = MTRR_TYPE_UNCACHABLE << VMX_EPT_MT_EPTE_SHIFT;
> > - else if (vcpu->kvm->arch.iommu_domain &&
> > - vcpu->kvm->arch.iommu_noncoherent)
> > + else if (kvm_arch_has_noncoherent_dma(vcpu->kvm))
> > ret = kvm_get_guest_memory_type(vcpu, gfn) <<
> > VMX_EPT_MT_EPTE_SHIFT;
> > else
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index b1231b0..feec86d 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -2715,8 +2715,7 @@ static void wbinvd_ipi(void *garbage)
> >
> > static bool need_emulate_wbinvd(struct kvm_vcpu *vcpu)
> > {
> > - return vcpu->kvm->arch.iommu_domain &&
> > - vcpu->kvm->arch.iommu_noncoherent;
> > + return kvm_arch_has_noncoherent_dma(vcpu->kvm);
> > }
> >
> > void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
> > @@ -7420,6 +7419,24 @@ bool kvm_arch_can_inject_async_page_present(struct kvm_vcpu *vcpu)
> > kvm_x86_ops->interrupt_allowed(vcpu);
> > }
> >
> > +void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> > +{
> > + atomic_inc(&kvm->arch.noncoherent_dma_count);
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_arch_register_noncoherent_dma);
> > +
> > +void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> > +{
> > + atomic_dec(&kvm->arch.noncoherent_dma_count);
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_arch_unregister_noncoherent_dma);
> > +
> > +bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
> > +{
> > + return atomic_read(&kvm->arch.noncoherent_dma_count);
> > +}
> > +EXPORT_SYMBOL_GPL(kvm_arch_has_noncoherent_dma);
> > +
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_inj_virq);
> > EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_page_fault);
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index f46da56..e239c93 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -671,6 +671,25 @@ static inline void kvm_arch_free_vm(struct kvm *kvm)
> > }
> > #endif
> >
> > +#ifdef __KVM_HAVE_ARCH_NONCOHERENT_DMA
> > +void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
> > +void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
> > +bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
> > +#else
> > +static inline void kvm_arch_register_noncoherent_dma(void)
>
>
> Wrong prototype here and below, must include *kvm.

D'oh. Thanks

> > +{
> > +}
> > +
> > +static inline void kvm_arch_unregister_noncoherent_dma(void)
> > +{
> > +}
> > +
> > +static inline bool kvm_arch_has_noncoherent_dma(void)
> > +{
> > + return false;
> > +}
> > +#endif
> > +
> > static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu)
> > {
> > #ifdef __KVM_HAVE_ARCH_WQP
> > diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
> > index 9cde444..0a54456 100644
> > --- a/virt/kvm/iommu.c
> > +++ b/virt/kvm/iommu.c
> > @@ -140,6 +140,9 @@ static int kvm_iommu_map_memslots(struct kvm *kvm)
> > struct kvm_memslots *slots;
> > struct kvm_memory_slot *memslot;
> >
> > + if (kvm->arch.iommu_noncoherent)
> > + kvm_arch_register_noncoherent_dma(kvm);
> > +
> > idx = srcu_read_lock(&kvm->srcu);
> > slots = kvm_memslots(kvm);
> >
> > @@ -335,6 +338,9 @@ static int kvm_iommu_unmap_memslots(struct kvm *kvm)
> >
> > srcu_read_unlock(&kvm->srcu, idx);
> >
> > + if (kvm->arch.iommu_noncoherent)
> > + kvm_arch_unregister_noncoherent_dma(kvm);
> > +
> > return 0;
> > }
> >
> >
>
>


2013-10-04 16:05:09

by Alex Williamson

[permalink] [raw]
Subject: Re: [RFC PATCH] PPC: KVM: vfio kvm device: support spapr tce

On Fri, 2013-10-04 at 22:24 +1000, Alexey Kardashevskiy wrote:
> This is a very rough change set required for ppc64 to use this KVM device.
>
> vfio_rm.c is a piece of code which is going to be called from the realmode (MMU off),
> and I will put everything spapr-related under #ifdef CONFIG_SPAPR_TCE_IOMMU,
> it is just friday and I have to run :)
>
> This is an RFC but it works.
>
> Signed-off-by: Alexey Kardashevskiy <[email protected]>
> ---
> arch/powerpc/kvm/Kconfig | 1 +
> arch/powerpc/kvm/Makefile | 4 ++++
> include/linux/kvm_host.h | 8 ++++---
> include/linux/vfio.h | 3 +++
> include/uapi/linux/kvm.h | 1 +
> virt/kvm/vfio.c | 46 ++++++++++++++++++++++++++++++++++++++++
> virt/kvm/vfio_rm.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
> 7 files changed, 114 insertions(+), 3 deletions(-)
> create mode 100644 virt/kvm/vfio_rm.c
>
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index 61b3535..d1b7f64 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -60,6 +60,7 @@ config KVM_BOOK3S_64
> select KVM_BOOK3S_64_HANDLER
> select KVM
> select SPAPR_TCE_IOMMU
> + select KVM_VFIO
> ---help---
> Support running unmodified book3s_64 and book3s_32 guest kernels
> in virtual machines on book3s_64 host processors.
> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
> index 6646c95..fc2878b 100644
> --- a/arch/powerpc/kvm/Makefile
> +++ b/arch/powerpc/kvm/Makefile
> @@ -55,6 +55,8 @@ kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
>
> kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
> $(KVM)/coalesced_mmio.o \
> + $(KVM)/vfio.o \
> + $(KVM)/vfio_rm.o \
> fpu.o \
> book3s_paired_singles.o \
> book3s_pr.o \
> @@ -76,6 +78,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
> kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
> book3s_hv_rm_xics.o
> kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
> + $(KVM)/vfio_rm.o \
> book3s_hv_rmhandlers.o \
> book3s_hv_rm_mmu.o \
> book3s_64_vio_hv.o \
> @@ -89,6 +92,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
>
> kvm-book3s_64-module-objs := \
> $(KVM)/kvm_main.o \
> + $(KVM)/vfio.o \
> $(KVM)/eventfd.o \
> powerpc.o \
> emulate.o \
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ad2b581..43c0290 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -407,6 +407,8 @@ struct kvm {
> #endif
> long tlbs_dirty;
> struct list_head devices;
> +
> + struct kvm_vfio *vfio;


<cringe> can't this be on kvm->arch?

> };
>
> #define kvm_err(fmt, ...) \
> @@ -677,15 +679,15 @@ void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
> void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
> bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
> #else
> -static inline void kvm_arch_register_noncoherent_dma(void)
> +static inline void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
> {
> }
>
> -static inline void kvm_arch_unregister_noncoherent_dma(void)
> +static inline void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
> {
> }
>
> -static inline bool kvm_arch_has_noncoherent_dma(void)
> +static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
> {
> return false;
> }

Will fix in my series.

> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
> index 24579a0..681e19b 100644
> --- a/include/linux/vfio.h
> +++ b/include/linux/vfio.h
> @@ -97,4 +97,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
> extern void vfio_group_put_external_user(struct vfio_group *group);
> extern int vfio_external_user_iommu_id(struct vfio_group *group);
>
> +extern struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm,
> + unsigned long liobn);
> +

Wrong header file.

> #endif /* VFIO_H */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 7c1a349..a74ad16 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -847,6 +847,7 @@ struct kvm_device_attr {
> #define KVM_DEV_VFIO_GROUP 1
> #define KVM_DEV_VFIO_GROUP_ADD 1
> #define KVM_DEV_VFIO_GROUP_DEL 2
> +#define KVM_DEV_VFIO_SPAPR_TCE_LIOBN 2
>
> /*
> * ioctls for VM fds
> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
> index 2e336a7..39dea9f 100644
> --- a/virt/kvm/vfio.c
> +++ b/virt/kvm/vfio.c
> @@ -22,6 +22,7 @@
> struct kvm_vfio_group {
> struct list_head node;
> struct vfio_group *vfio_group;
> + uint64_t liobn; /* sPAPR */

Perhaps an arch pointer or at least a union.

> };
>
> struct kvm_vfio {
> @@ -188,12 +189,52 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
> return -ENXIO;
> }
>
> +static int kvm_vfio_set_spapr_tce_liobn(struct kvm_device *dev,
> + long attr, u64 arg)
> +{
> + struct kvm_vfio *kv = dev->private;
> + struct vfio_group *vfio_group;
> + struct kvm_vfio_group *kvg;
> + void __user *argp = (void __user *)arg;
> + struct fd f;
> + int32_t fd;
> + uint64_t liobn = attr;
> +
> + if (get_user(fd, (int32_t __user *)argp))
> + return -EFAULT;
> +
> + f = fdget(fd);
> + if (!f.file)
> + return -EBADF;
> +
> + vfio_group = kvm_vfio_group_get_external_user(f.file);
> + fdput(f);
> +
> + list_for_each_entry(kvg, &kv->group_list, node) {
> + if (kvg->vfio_group == vfio_group) {
> + WARN_ON(kvg->liobn);

Users shouldn't be able to trigger WARN_ON so easily, return -EBUSY,
allow it to be unset and re-set, or just allow the overwrite.

> + kvg->liobn = liobn;
> + kvm_vfio_group_put_external_user(vfio_group);
> + return 0;
> + }
> + }
> +
> + kvm_vfio_group_put_external_user(vfio_group);
> +
> + return -ENXIO;
> +}
> +
> static int kvm_vfio_set_attr(struct kvm_device *dev,
> struct kvm_device_attr *attr)
> {
> switch (attr->group) {
> case KVM_DEV_VFIO_GROUP:
> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> + case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
> + return kvm_vfio_set_spapr_tce_liobn(dev, attr->attr,
> + attr->addr);
> +#endif
> }
>
> return -ENXIO;
> @@ -211,6 +252,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
> }
>
> break;
> +#ifdef CONFIG_SPAPR_TCE_IOMMU
> + case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
> + return 0;
> +#endif
> }
>
> return -ENXIO;
> @@ -250,6 +295,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
> mutex_init(&kv->lock);
>
> dev->private = kv;
> + dev->kvm->vfio = kv;
>
> return 0;
> }
> diff --git a/virt/kvm/vfio_rm.c b/virt/kvm/vfio_rm.c
> new file mode 100644
> index 0000000..ee9fd96
> --- /dev/null
> +++ b/virt/kvm/vfio_rm.c
> @@ -0,0 +1,54 @@
> +#include <linux/errno.h>
> +#include <linux/file.h>
> +#include <linux/kvm_host.h>
> +#include <linux/list.h>
> +#include <linux/module.h>
> +#include <linux/mutex.h>
> +#include <linux/slab.h>
> +#include <linux/uaccess.h>
> +#include <linux/vfio.h>
> +
> +struct kvm_vfio_group {
> + struct list_head node;
> + struct vfio_group *vfio_group;
> + uint64_t liobn; /* sPAPR */
> +};
> +
> +struct kvm_vfio {
> + struct list_head group_list;
> + struct mutex lock;
> + bool noncoherent;
> +};
> +
> +struct vfio_group {
> + struct kref kref;
> + int minor;
> + atomic_t container_users;
> + struct iommu_group *iommu_group;
> + struct vfio_container *container;
> + struct list_head device_list;
> + struct mutex device_lock;
> + struct device *dev;
> + struct notifier_block nb;
> + struct list_head vfio_next;
> + struct list_head container_next;
> + atomic_t opened;
> +};
> +
> +struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm, unsigned long liobn)
> +{
> + struct kvm_vfio_group *kvg;
> +
> + if (!kvm->vfio)
> + return NULL;
> +
> + list_for_each_entry(kvg, &kvm->vfio->group_list, node) {
> + if (kvg->liobn == liobn)
> + return kvg->vfio_group->iommu_group;
> + }
> +
> + return NULL;
> +}
> +EXPORT_SYMBOL_GPL(vfio_find_group_by_liobn);
> +
> +

You're kidding, right? These are intentionally private data structures
that are blatantly copied so that you can extract what you want. NACK.
The iommu_group is available off struct device, do you even need vfio or
this kvm-vfio device to get from liobn to iommu_group? Thanks,

Alex

2013-10-05 01:52:19

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [RFC PATCH] PPC: KVM: vfio kvm device: support spapr tce

On 05.10.2013 2:05, Alex Williamson wrote:
> On Fri, 2013-10-04 at 22:24 +1000, Alexey Kardashevskiy wrote:
>> This is a very rough change set required for ppc64 to use this KVM device.
>>
>> vfio_rm.c is a piece of code which is going to be called from the realmode (MMU off),
>> and I will put everything spapr-related under #ifdef CONFIG_SPAPR_TCE_IOMMU,
>> it is just friday and I have to run :)
>>
>> This is an RFC but it works.
>>
>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>> ---
>> arch/powerpc/kvm/Kconfig | 1 +
>> arch/powerpc/kvm/Makefile | 4 ++++
>> include/linux/kvm_host.h | 8 ++++---
>> include/linux/vfio.h | 3 +++
>> include/uapi/linux/kvm.h | 1 +
>> virt/kvm/vfio.c | 46 ++++++++++++++++++++++++++++++++++++++++
>> virt/kvm/vfio_rm.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
>> 7 files changed, 114 insertions(+), 3 deletions(-)
>> create mode 100644 virt/kvm/vfio_rm.c
>>
>> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
>> index 61b3535..d1b7f64 100644
>> --- a/arch/powerpc/kvm/Kconfig
>> +++ b/arch/powerpc/kvm/Kconfig
>> @@ -60,6 +60,7 @@ config KVM_BOOK3S_64
>> select KVM_BOOK3S_64_HANDLER
>> select KVM
>> select SPAPR_TCE_IOMMU
>> + select KVM_VFIO
>> ---help---
>> Support running unmodified book3s_64 and book3s_32 guest kernels
>> in virtual machines on book3s_64 host processors.
>> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
>> index 6646c95..fc2878b 100644
>> --- a/arch/powerpc/kvm/Makefile
>> +++ b/arch/powerpc/kvm/Makefile
>> @@ -55,6 +55,8 @@ kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
>>
>> kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
>> $(KVM)/coalesced_mmio.o \
>> + $(KVM)/vfio.o \
>> + $(KVM)/vfio_rm.o \
>> fpu.o \
>> book3s_paired_singles.o \
>> book3s_pr.o \
>> @@ -76,6 +78,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
>> kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
>> book3s_hv_rm_xics.o
>> kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
>> + $(KVM)/vfio_rm.o \
>> book3s_hv_rmhandlers.o \
>> book3s_hv_rm_mmu.o \
>> book3s_64_vio_hv.o \
>> @@ -89,6 +92,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
>>
>> kvm-book3s_64-module-objs := \
>> $(KVM)/kvm_main.o \
>> + $(KVM)/vfio.o \
>> $(KVM)/eventfd.o \
>> powerpc.o \
>> emulate.o \
>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>> index ad2b581..43c0290 100644
>> --- a/include/linux/kvm_host.h
>> +++ b/include/linux/kvm_host.h
>> @@ -407,6 +407,8 @@ struct kvm {
>> #endif
>> long tlbs_dirty;
>> struct list_head devices;
>> +
>> + struct kvm_vfio *vfio;
>
>
> <cringe> can't this be on kvm->arch?

It can, I just thought since it is valid for more than just one
platform, it can go here.


>> };
>>
>> #define kvm_err(fmt, ...) \
>> @@ -677,15 +679,15 @@ void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
>> void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
>> bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
>> #else
>> -static inline void kvm_arch_register_noncoherent_dma(void)
>> +static inline void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
>> {
>> }
>>
>> -static inline void kvm_arch_unregister_noncoherent_dma(void)
>> +static inline void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
>> {
>> }
>>
>> -static inline bool kvm_arch_has_noncoherent_dma(void)
>> +static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
>> {
>> return false;
>> }
>
> Will fix in my series.
>
>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>> index 24579a0..681e19b 100644
>> --- a/include/linux/vfio.h
>> +++ b/include/linux/vfio.h
>> @@ -97,4 +97,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
>> extern void vfio_group_put_external_user(struct vfio_group *group);
>> extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>
>> +extern struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm,
>> + unsigned long liobn);
>> +
>
> Wrong header file.
>
>> #endif /* VFIO_H */
>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>> index 7c1a349..a74ad16 100644
>> --- a/include/uapi/linux/kvm.h
>> +++ b/include/uapi/linux/kvm.h
>> @@ -847,6 +847,7 @@ struct kvm_device_attr {
>> #define KVM_DEV_VFIO_GROUP 1
>> #define KVM_DEV_VFIO_GROUP_ADD 1
>> #define KVM_DEV_VFIO_GROUP_DEL 2
>> +#define KVM_DEV_VFIO_SPAPR_TCE_LIOBN 2
>>
>> /*
>> * ioctls for VM fds
>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>> index 2e336a7..39dea9f 100644
>> --- a/virt/kvm/vfio.c
>> +++ b/virt/kvm/vfio.c
>> @@ -22,6 +22,7 @@
>> struct kvm_vfio_group {
>> struct list_head node;
>> struct vfio_group *vfio_group;
>> + uint64_t liobn; /* sPAPR */
>
> Perhaps an arch pointer or at least a union.
>
>> };
>>
>> struct kvm_vfio {
>> @@ -188,12 +189,52 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>> return -ENXIO;
>> }
>>
>> +static int kvm_vfio_set_spapr_tce_liobn(struct kvm_device *dev,
>> + long attr, u64 arg)
>> +{
>> + struct kvm_vfio *kv = dev->private;
>> + struct vfio_group *vfio_group;
>> + struct kvm_vfio_group *kvg;
>> + void __user *argp = (void __user *)arg;
>> + struct fd f;
>> + int32_t fd;
>> + uint64_t liobn = attr;
>> +
>> + if (get_user(fd, (int32_t __user *)argp))
>> + return -EFAULT;
>> +
>> + f = fdget(fd);
>> + if (!f.file)
>> + return -EBADF;
>> +
>> + vfio_group = kvm_vfio_group_get_external_user(f.file);
>> + fdput(f);
>> +
>> + list_for_each_entry(kvg, &kv->group_list, node) {
>> + if (kvg->vfio_group == vfio_group) {
>> + WARN_ON(kvg->liobn);
>
> Users shouldn't be able to trigger WARN_ON so easily, return -EBUSY,
> allow it to be unset and re-set, or just allow the overwrite.
>
>> + kvg->liobn = liobn;
>> + kvm_vfio_group_put_external_user(vfio_group);
>> + return 0;
>> + }
>> + }
>> +
>> + kvm_vfio_group_put_external_user(vfio_group);
>> +
>> + return -ENXIO;
>> +}
>> +
>> static int kvm_vfio_set_attr(struct kvm_device *dev,
>> struct kvm_device_attr *attr)
>> {
>> switch (attr->group) {
>> case KVM_DEV_VFIO_GROUP:
>> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> + case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
>> + return kvm_vfio_set_spapr_tce_liobn(dev, attr->attr,
>> + attr->addr);
>> +#endif
>> }
>>
>> return -ENXIO;
>> @@ -211,6 +252,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>> }
>>
>> break;
>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>> + case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
>> + return 0;
>> +#endif
>> }
>>
>> return -ENXIO;
>> @@ -250,6 +295,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>> mutex_init(&kv->lock);
>>
>> dev->private = kv;
>> + dev->kvm->vfio = kv;
>>
>> return 0;
>> }
>> diff --git a/virt/kvm/vfio_rm.c b/virt/kvm/vfio_rm.c
>> new file mode 100644
>> index 0000000..ee9fd96
>> --- /dev/null
>> +++ b/virt/kvm/vfio_rm.c
>> @@ -0,0 +1,54 @@
>> +#include <linux/errno.h>
>> +#include <linux/file.h>
>> +#include <linux/kvm_host.h>
>> +#include <linux/list.h>
>> +#include <linux/module.h>
>> +#include <linux/mutex.h>
>> +#include <linux/slab.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/vfio.h>
>> +
>> +struct kvm_vfio_group {
>> + struct list_head node;
>> + struct vfio_group *vfio_group;
>> + uint64_t liobn; /* sPAPR */
>> +};
>> +
>> +struct kvm_vfio {
>> + struct list_head group_list;
>> + struct mutex lock;
>> + bool noncoherent;
>> +};
>> +
>> +struct vfio_group {
>> + struct kref kref;
>> + int minor;
>> + atomic_t container_users;
>> + struct iommu_group *iommu_group;
>> + struct vfio_container *container;
>> + struct list_head device_list;
>> + struct mutex device_lock;
>> + struct device *dev;
>> + struct notifier_block nb;
>> + struct list_head vfio_next;
>> + struct list_head container_next;
>> + atomic_t opened;
>> +};
>> +
>> +struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm, unsigned long liobn)
>> +{
>> + struct kvm_vfio_group *kvg;
>> +
>> + if (!kvm->vfio)
>> + return NULL;
>> +
>> + list_for_each_entry(kvg, &kvm->vfio->group_list, node) {
>> + if (kvg->liobn == liobn)
>> + return kvg->vfio_group->iommu_group;
>> + }
>> +
>> + return NULL;
>> +}
>> +EXPORT_SYMBOL_GPL(vfio_find_group_by_liobn);
>> +
>> +
>
> You're kidding, right? These are intentionally private data structures
> that are blatantly copied so that you can extract what you want. NACK.
> The iommu_group is available off struct device, do you even need vfio or
> this kvm-vfio device to get from liobn to iommu_group? Thanks,


This is an RFC. I am not saying this is what can go to upstream or
anything. I am not kidding (why everyone assumes that?), I am showing
what API I would like to have in the VFIO KVM device. I need the way to
get iommu_table (which is in a private data of iommu_group) by LIOBN and
the VFIO KVM device is the _only_ entity which will know about this
connection (LIOBN is made up by the qemu and told to the guest) and it
cannot go to the kvm.ko - and the patch like this is the best way to
show it as my english obviously sucks.



--
With best regards

Alexey Kardashevskiy -- icq: 52150396

2013-10-05 03:36:55

by Alexey Kardashevskiy

[permalink] [raw]
Subject: Re: [RFC PATCH] PPC: KVM: vfio kvm device: support spapr tce

On 10/05/2013 11:52 AM, Alexey Kardashevskiy wrote:
> On 05.10.2013 2:05, Alex Williamson wrote:
>> On Fri, 2013-10-04 at 22:24 +1000, Alexey Kardashevskiy wrote:
>>> This is a very rough change set required for ppc64 to use this KVM device.
>>>
>>> vfio_rm.c is a piece of code which is going to be called from the realmode (MMU off),
>>> and I will put everything spapr-related under #ifdef CONFIG_SPAPR_TCE_IOMMU,
>>> it is just friday and I have to run :)
>>>
>>> This is an RFC but it works.
>>>
>>> Signed-off-by: Alexey Kardashevskiy <[email protected]>
>>> ---
>>> arch/powerpc/kvm/Kconfig | 1 +
>>> arch/powerpc/kvm/Makefile | 4 ++++
>>> include/linux/kvm_host.h | 8 ++++---
>>> include/linux/vfio.h | 3 +++
>>> include/uapi/linux/kvm.h | 1 +
>>> virt/kvm/vfio.c | 46 ++++++++++++++++++++++++++++++++++++++++
>>> virt/kvm/vfio_rm.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++
>>> 7 files changed, 114 insertions(+), 3 deletions(-)
>>> create mode 100644 virt/kvm/vfio_rm.c
>>>
>>> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
>>> index 61b3535..d1b7f64 100644
>>> --- a/arch/powerpc/kvm/Kconfig
>>> +++ b/arch/powerpc/kvm/Kconfig
>>> @@ -60,6 +60,7 @@ config KVM_BOOK3S_64
>>> select KVM_BOOK3S_64_HANDLER
>>> select KVM
>>> select SPAPR_TCE_IOMMU
>>> + select KVM_VFIO
>>> ---help---
>>> Support running unmodified book3s_64 and book3s_32 guest kernels
>>> in virtual machines on book3s_64 host processors.
>>> diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
>>> index 6646c95..fc2878b 100644
>>> --- a/arch/powerpc/kvm/Makefile
>>> +++ b/arch/powerpc/kvm/Makefile
>>> @@ -55,6 +55,8 @@ kvm-objs-$(CONFIG_KVM_E500MC) := $(kvm-e500mc-objs)
>>>
>>> kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_PR) := \
>>> $(KVM)/coalesced_mmio.o \
>>> + $(KVM)/vfio.o \
>>> + $(KVM)/vfio_rm.o \
>>> fpu.o \
>>> book3s_paired_singles.o \
>>> book3s_pr.o \
>>> @@ -76,6 +78,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
>>> kvm-book3s_64-builtin-xics-objs-$(CONFIG_KVM_XICS) := \
>>> book3s_hv_rm_xics.o
>>> kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HV) := \
>>> + $(KVM)/vfio_rm.o \
>>> book3s_hv_rmhandlers.o \
>>> book3s_hv_rm_mmu.o \
>>> book3s_64_vio_hv.o \
>>> @@ -89,6 +92,7 @@ kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \
>>>
>>> kvm-book3s_64-module-objs := \
>>> $(KVM)/kvm_main.o \
>>> + $(KVM)/vfio.o \
>>> $(KVM)/eventfd.o \
>>> powerpc.o \
>>> emulate.o \
>>> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
>>> index ad2b581..43c0290 100644
>>> --- a/include/linux/kvm_host.h
>>> +++ b/include/linux/kvm_host.h
>>> @@ -407,6 +407,8 @@ struct kvm {
>>> #endif
>>> long tlbs_dirty;
>>> struct list_head devices;
>>> +
>>> + struct kvm_vfio *vfio;
>>
>>
>> <cringe> can't this be on kvm->arch?
>
> It can, I just thought since it is valid for more than just one
> platform, it can go here.
>
>
>>> };
>>>
>>> #define kvm_err(fmt, ...) \
>>> @@ -677,15 +679,15 @@ void kvm_arch_register_noncoherent_dma(struct kvm *kvm);
>>> void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm);
>>> bool kvm_arch_has_noncoherent_dma(struct kvm *kvm);
>>> #else
>>> -static inline void kvm_arch_register_noncoherent_dma(void)
>>> +static inline void kvm_arch_register_noncoherent_dma(struct kvm *kvm)
>>> {
>>> }
>>>
>>> -static inline void kvm_arch_unregister_noncoherent_dma(void)
>>> +static inline void kvm_arch_unregister_noncoherent_dma(struct kvm *kvm)
>>> {
>>> }
>>>
>>> -static inline bool kvm_arch_has_noncoherent_dma(void)
>>> +static inline bool kvm_arch_has_noncoherent_dma(struct kvm *kvm)
>>> {
>>> return false;
>>> }
>>
>> Will fix in my series.
>>
>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>>> index 24579a0..681e19b 100644
>>> --- a/include/linux/vfio.h
>>> +++ b/include/linux/vfio.h
>>> @@ -97,4 +97,7 @@ extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
>>> extern void vfio_group_put_external_user(struct vfio_group *group);
>>> extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>>
>>> +extern struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm,
>>> + unsigned long liobn);
>>> +
>>
>> Wrong header file.


btw there are two - uapi/linux/vfio.h and linux/vfio.h. The external user
API is in linux/vfio.h but my change should go to uapi/linux/vfio.h, is
that correct? Or we need a third header? Just asking, no kidding, no arguing :)


>>
>>> #endif /* VFIO_H */
>>> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
>>> index 7c1a349..a74ad16 100644
>>> --- a/include/uapi/linux/kvm.h
>>> +++ b/include/uapi/linux/kvm.h
>>> @@ -847,6 +847,7 @@ struct kvm_device_attr {
>>> #define KVM_DEV_VFIO_GROUP 1
>>> #define KVM_DEV_VFIO_GROUP_ADD 1
>>> #define KVM_DEV_VFIO_GROUP_DEL 2
>>> +#define KVM_DEV_VFIO_SPAPR_TCE_LIOBN 2
>>>
>>> /*
>>> * ioctls for VM fds
>>> diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
>>> index 2e336a7..39dea9f 100644
>>> --- a/virt/kvm/vfio.c
>>> +++ b/virt/kvm/vfio.c
>>> @@ -22,6 +22,7 @@
>>> struct kvm_vfio_group {
>>> struct list_head node;
>>> struct vfio_group *vfio_group;
>>> + uint64_t liobn; /* sPAPR */
>>
>> Perhaps an arch pointer or at least a union.
>>
>>> };
>>>
>>> struct kvm_vfio {
>>> @@ -188,12 +189,52 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr, u64 arg)
>>> return -ENXIO;
>>> }
>>>
>>> +static int kvm_vfio_set_spapr_tce_liobn(struct kvm_device *dev,
>>> + long attr, u64 arg)
>>> +{
>>> + struct kvm_vfio *kv = dev->private;
>>> + struct vfio_group *vfio_group;
>>> + struct kvm_vfio_group *kvg;
>>> + void __user *argp = (void __user *)arg;
>>> + struct fd f;
>>> + int32_t fd;
>>> + uint64_t liobn = attr;
>>> +
>>> + if (get_user(fd, (int32_t __user *)argp))
>>> + return -EFAULT;
>>> +
>>> + f = fdget(fd);
>>> + if (!f.file)
>>> + return -EBADF;
>>> +
>>> + vfio_group = kvm_vfio_group_get_external_user(f.file);
>>> + fdput(f);
>>> +
>>> + list_for_each_entry(kvg, &kv->group_list, node) {
>>> + if (kvg->vfio_group == vfio_group) {
>>> + WARN_ON(kvg->liobn);
>>
>> Users shouldn't be able to trigger WARN_ON so easily, return -EBUSY,
>> allow it to be unset and re-set, or just allow the overwrite.
>>
>>> + kvg->liobn = liobn;
>>> + kvm_vfio_group_put_external_user(vfio_group);
>>> + return 0;
>>> + }
>>> + }
>>> +
>>> + kvm_vfio_group_put_external_user(vfio_group);
>>> +
>>> + return -ENXIO;
>>> +}
>>> +
>>> static int kvm_vfio_set_attr(struct kvm_device *dev,
>>> struct kvm_device_attr *attr)
>>> {
>>> switch (attr->group) {
>>> case KVM_DEV_VFIO_GROUP:
>>> return kvm_vfio_set_group(dev, attr->attr, attr->addr);
>>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>>> + case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
>>> + return kvm_vfio_set_spapr_tce_liobn(dev, attr->attr,
>>> + attr->addr);
>>> +#endif
>>> }
>>>
>>> return -ENXIO;
>>> @@ -211,6 +252,10 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
>>> }
>>>
>>> break;
>>> +#ifdef CONFIG_SPAPR_TCE_IOMMU
>>> + case KVM_DEV_VFIO_SPAPR_TCE_LIOBN:
>>> + return 0;
>>> +#endif
>>> }
>>>
>>> return -ENXIO;
>>> @@ -250,6 +295,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
>>> mutex_init(&kv->lock);
>>>
>>> dev->private = kv;
>>> + dev->kvm->vfio = kv;
>>>
>>> return 0;
>>> }
>>> diff --git a/virt/kvm/vfio_rm.c b/virt/kvm/vfio_rm.c
>>> new file mode 100644
>>> index 0000000..ee9fd96
>>> --- /dev/null
>>> +++ b/virt/kvm/vfio_rm.c
>>> @@ -0,0 +1,54 @@
>>> +#include <linux/errno.h>
>>> +#include <linux/file.h>
>>> +#include <linux/kvm_host.h>
>>> +#include <linux/list.h>
>>> +#include <linux/module.h>
>>> +#include <linux/mutex.h>
>>> +#include <linux/slab.h>
>>> +#include <linux/uaccess.h>
>>> +#include <linux/vfio.h>
>>> +
>>> +struct kvm_vfio_group {
>>> + struct list_head node;
>>> + struct vfio_group *vfio_group;
>>> + uint64_t liobn; /* sPAPR */
>>> +};
>>> +
>>> +struct kvm_vfio {
>>> + struct list_head group_list;
>>> + struct mutex lock;
>>> + bool noncoherent;
>>> +};
>>> +
>>> +struct vfio_group {
>>> + struct kref kref;
>>> + int minor;
>>> + atomic_t container_users;
>>> + struct iommu_group *iommu_group;
>>> + struct vfio_container *container;
>>> + struct list_head device_list;
>>> + struct mutex device_lock;
>>> + struct device *dev;
>>> + struct notifier_block nb;
>>> + struct list_head vfio_next;
>>> + struct list_head container_next;
>>> + atomic_t opened;
>>> +};
>>> +
>>> +struct iommu_group *vfio_find_group_by_liobn(struct kvm *kvm, unsigned long liobn)
>>> +{
>>> + struct kvm_vfio_group *kvg;
>>> +
>>> + if (!kvm->vfio)
>>> + return NULL;
>>> +
>>> + list_for_each_entry(kvg, &kvm->vfio->group_list, node) {
>>> + if (kvg->liobn == liobn)
>>> + return kvg->vfio_group->iommu_group;
>>> + }
>>> +
>>> + return NULL;
>>> +}
>>> +EXPORT_SYMBOL_GPL(vfio_find_group_by_liobn);
>>> +
>>> +
>>
>> You're kidding, right? These are intentionally private data structures
>> that are blatantly copied so that you can extract what you want. NACK.
>> The iommu_group is available off struct device, do you even need vfio or
>> this kvm-vfio device to get from liobn to iommu_group? Thanks,
>
>
> This is an RFC. I am not saying this is what can go to upstream or
> anything. I am not kidding (why everyone assumes that?), I am showing
> what API I would like to have in the VFIO KVM device. I need the way to
> get iommu_table (which is in a private data of iommu_group) by LIOBN and
> the VFIO KVM device is the _only_ entity which will know about this
> connection (LIOBN is made up by the qemu and told to the guest) and it
> cannot go to the kvm.ko - and the patch like this is the best way to
> show it as my english obviously sucks.

Oh. I was confused by:
drivers/vfio/vfio.c|67| struct vfio_group {
drivers/vfio/vfio_iommu_type1.c|73| struct vfio_group {

which are two completely different types (confusing).

So. I either need an additional file to compile to the kernel for mmu-off
case (such as vfio_rm.c) and share vfio_group struct via some internal
header or compile vfio.c into the kernel image always, in this case I'll
need to export kvm_vfio_ops symbol. What would you suggest?



--
Alexey