2007-11-07 14:20:50

by Amit Shah

Subject: RFC: Paravirtualized DMA accesses for KVM


This patchset is a work in progress and is being sent out for comments.

Guests within KVM can have paravirtualized DMA access. I've tested
with the e1000 driver, and that works fine. A few problems/conditions
remain in getting things to work:

- The pv driver should only be used as a module. If built into the
kernel, it freezes during hard-disk bring-up
- Locks aren't taken on the host; multiple guests with passthrough
won't work
- Only 64-bit hosts and 64-bit guests are supported

There are also several FIXMEs in the code, but none as grave as the
problems listed above.

The bulk of the passthrough work is done in userspace (qemu). Patches
will be sent shortly to the kvm-devel and qemu lists.


2007-11-07 14:21:04

by Amit Shah

Subject: [PATCH 4/8] KVM: PVDMA: Introduce is_pv_device() dma operation

A guest can call dma_ops->is_pv_device() to find out
if a device is a passthrough device (a physical device
passed on to the guest by the host). If it is, a hypercall
will be made to translate DMA mapping operations.

This hook could be done away with in favour of a single
kvm_is_pv_device() call, which can be a no-op
on a non-pv guest (or on the host).
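
For illustration, a minimal sketch of that alternative (hypothetical,
not part of this patch; it assumes the CONFIG_KVM_PV_DMA option
introduced later in this series):

static inline int kvm_is_pv_device(struct device *hwdev, const char *name)
{
#ifdef CONFIG_KVM_PV_DMA
	if (dma_ops->is_pv_device)
		return dma_ops->is_pv_device(hwdev, name);
#endif
	return 0;	/* no-op on a non-pv guest or on the host */
}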

Signed-off-by: Amit Shah <[email protected]>
---
include/asm-x86/dma-mapping_64.h | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86/dma-mapping_64.h b/include/asm-x86/dma-mapping_64.h
index ecd0f61..3943edd 100644
--- a/include/asm-x86/dma-mapping_64.h
+++ b/include/asm-x86/dma-mapping_64.h
@@ -48,6 +48,8 @@ struct dma_mapping_ops {
int direction);
int (*dma_supported)(struct device *hwdev, u64 mask);
int is_phys;
+ /* Is this a physical device in a paravirtualized guest? */
+ int (*is_pv_device)(struct device *hwdev, const char *name);
};

extern dma_addr_t bad_dma_address;
--
1.5.3

2007-11-07 14:21:26

by Amit Shah

Subject: [PATCH 2/8] KVM: Move #include asm/kvm_para.h outside of __KERNEL__

We have some structures defined in asm/kvm_para.h which are going
to be used by userspace for ioctls, so the include can no longer
be hidden behind __KERNEL__.
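
For illustration (a hypothetical userspace fragment, not part of the
patch): once patch 1 makes linux/kvm.h pull in linux/kvm_para.h, a
userspace build like this needs asm/kvm_para.h to be visible outside
__KERNEL__, since the ioctl argument types are defined there:

#include <linux/kvm.h>	/* now pulls in linux/kvm_para.h */

struct kvm_pv_passthrough_dev pt_dev;	/* defined in asm/kvm_para.h */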

Signed-off-by: Amit Shah <[email protected]>
---
include/linux/kvm_para.h | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_para.h b/include/linux/kvm_para.h
index e4db25f..ff6ac27 100644
--- a/include/linux/kvm_para.h
+++ b/include/linux/kvm_para.h
@@ -12,12 +12,12 @@
/* Return values for hypercalls */
#define KVM_ENOSYS 1000

-#ifdef __KERNEL__
/*
* hypercalls use architecture specific
*/
#include <asm/kvm_para.h>

+#ifdef __KERNEL__
static inline int kvm_para_has_feature(unsigned int feature)
{
if (kvm_arch_para_features() & (1UL << feature))
@@ -26,4 +26,3 @@ static inline int kvm_para_has_feature(unsigned int feature)
}
#endif /* __KERNEL__ */
#endif /* __LINUX_KVM_PARA_H */
-
--
1.5.3

2007-11-07 14:21:38

by Amit Shah

Subject: [PATCH 1/8] KVM: PVDMA Host: Handle requests for guest DMA mappings

Introduce three hypercalls and one ioctl for enabling guest
DMA mappings.

An ioctl comes from userspace (qemu) to notify the host kernel of a
physical device being assigned to a guest. Guests make a hypercall
(once per device) to find out if the device is a passthrough device
and if any DMA translations are necessary.

Two other hypercalls map and unmap DMA regions, respectively, for
the guest. We basically look up the host page address and return it
in the case of a single-page request.

For a multi-page request, we do a dma_map_sg.

Since guest memory is pageable, we pin all the pages covered by the
DMA operation on the map request and unpin them on the unmap
operation.
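
For illustration, the shared-page protocol for the map hypercall
(inferred from pv_map_hypercall() below, so treat it as a sketch
rather than a spec):

/*
 * guest -> host: the page at 'page_gfn' holds an array of guest
 *		  physical addresses, one unsigned long per page to
 *		  map, at most MAX_PVDMA_PAGES entries;
 *
 *		  kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn);
 *
 * host -> guest: on success, the first word of the same page is
 *		  rewritten with the dma address of the first sg
 *		  entry and the hypercall returns npages; on failure
 *		  the first word holds bad_dma_address and the
 *		  hypercall returns 0.
 */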

Major tasks still to be done: implement proper locking (take a
per-VM lock); fix the paths on which some memory is never freed.
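
For illustration, a hypothetical userspace (qemu) sketch of how one
passthrough device could be registered via the new ioctl; 'vm_fd' is
assumed to be an open VM file descriptor:

#include <sys/ioctl.h>
#include <linux/kvm.h>

static int assign_pv_pci_dev(int vm_fd,
			     unsigned char g_busnr, unsigned int g_devfn,
			     unsigned char m_busnr, unsigned int m_devfn)
{
	struct kvm_pv_passthrough_dev pt_dev = {
		.guest = { .busnr = g_busnr, .devfn = g_devfn },
		.mach  = { .busnr = m_busnr, .devfn = m_devfn },
	};

	return ioctl(vm_fd, KVM_ASSIGN_PV_PCI_DEV, &pt_dev);
}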

Signed-off-by: Amit Shah <[email protected]>
---
drivers/kvm/x86.c | 273 ++++++++++++++++++++++++++++++++++++++++++++
include/asm-x86/kvm_para.h | 23 ++++-
include/linux/kvm.h | 3 +
3 files changed, 297 insertions(+), 2 deletions(-)

diff --git a/drivers/kvm/x86.c b/drivers/kvm/x86.c
index e905d46..60ea93a 100644
--- a/drivers/kvm/x86.c
+++ b/drivers/kvm/x86.c
@@ -21,8 +21,11 @@

#include <linux/kvm.h>
#include <linux/fs.h>
+#include <linux/list.h>
+#include <linux/pci.h>
#include <linux/vmalloc.h>
#include <linux/module.h>
+#include <linux/highmem.h>

#include <asm/uaccess.h>

@@ -61,6 +64,254 @@ struct kvm_stats_debugfs_item debugfs_entries[] = {
{ NULL }
};

+/* Paravirt DMA: We pin the host-side pages for the GPAs that we get
+ * for the DMA operation. We do a sg_map on the host pages for a DMA
+ * operation on the guest side. We un-pin the pages on the
+ * unmap_hypercall.
+ */
+struct dma_map {
+ struct list_head list;
+ int nents;
+ struct scatterlist *sg;
+};
+
+/* This list is to store the guest bus:device:function and host
+ * bus:device:function mapping for passthrough'ed devices.
+ */
+/* FIXME: make this per-vm */
+/* FIXME: delete this list at the end of a vm session */
+struct pv_pci_dev_list {
+ struct list_head list;
+ struct kvm_pv_passthrough_dev pt_dev;
+};
+
+/* FIXME: This should be a per-vm list */
+static LIST_HEAD(dmap_head);
+static LIST_HEAD(pt_dev_head);
+
+static struct dma_map*
+find_matching_dmap(struct list_head *head, dma_addr_t dma)
+{
+ struct list_head *ptr;
+ struct dma_map *match;
+
+ list_for_each(ptr, head) {
+ match = list_entry(ptr, struct dma_map, list);
+ if (match && match->sg[0].dma_address == dma)
+ return match;
+ }
+ return NULL;
+}
+
+static void
+prepare_sg_entry(struct scatterlist *sg, unsigned long addr)
+{
+ unsigned int offset, len;
+
+ offset = addr & ~PAGE_MASK;
+ len = PAGE_SIZE - offset;
+
+ /* FIXME: Use the sg chaining features */
+ sg_set_page(sg, pfn_to_page(addr >> PAGE_SHIFT),
+ len, offset);
+}
+
+static int pv_map_hypercall(struct kvm_vcpu *vcpu, int npages, gfn_t page_gfn)
+{
+ int i, r = 0;
+ gpa_t gpa;
+ hpa_t page_hpa, hpa;
+ struct dma_map *dmap;
+ struct page *host_page;
+ struct scatterlist *sg;
+ unsigned long *shared_addr, *hcall_page;
+
+ /* We currently don't support dma mappings which have more than
+ * PAGE_SIZE/sizeof(unsigned long *) pages
+ */
+ if (!npages || npages > MAX_PVDMA_PAGES) {
+ printk(KERN_INFO "%s: Illegal number of pages: %d\n",
+ __FUNCTION__, npages);
+ goto out;
+ }
+
+ page_hpa = gpa_to_hpa(vcpu->kvm, page_gfn << PAGE_SHIFT);
+ if (is_error_hpa(page_hpa)) {
+ printk(KERN_INFO "%s: page hpa %p not valid for page_gfn %p\n",
+ __FUNCTION__, (void *)page_hpa, (void *)page_gfn);
+ goto out;
+ }
+ host_page = pfn_to_page(page_hpa >> PAGE_SHIFT);
+ hcall_page = shared_addr = kmap(host_page);
+
+ /* scatterlist to map guest dma pages into host physical
+ * memory -- if they exceed the DMA map limit
+ */
+ sg = kcalloc(npages, sizeof(struct scatterlist), GFP_KERNEL);
+ if (sg == NULL) {
+ printk(KERN_INFO "%s: Couldn't allocate memory (sg)\n",
+ __FUNCTION__);
+ goto out_unmap;
+ }
+
+ /* List to store all guest pages mapped into host. This will
+ * be used later to free pages on the host. Think of this as a
+ * translation table from guest dma addresses into host dma
+ * addresses
+ */
+ dmap = kmalloc(sizeof(struct dma_map), GFP_KERNEL);
+ if (dmap == NULL) {
+ printk(KERN_INFO "%s: Couldn't allocate memory\n",
+ __FUNCTION__);
+ goto out_unmap_sg;
+ }
+
+ /* FIXME: consider the length of the last page. Guest should
+ * send this info.
+ */
+ for (i = 0; i < npages; i++) {
+ gpa = *shared_addr++;
+ hpa = gpa_to_hpa(vcpu->kvm, gpa);
+ if (is_error_hpa(hpa)) {
+ int j;
+ printk(KERN_INFO "kvm %s: hpa %p not valid "
+ "for gpa %p\n",
+ __FUNCTION__, (void *)gpa, (void *)hpa);
+
+ for (j = 0; j < i; j++)
+ put_page(sg_page(&sg[j]));
+ goto out_unmap_sg;
+ }
+ prepare_sg_entry(&sg[i], hpa);
+ get_page(sg_page(&sg[i]));
+ }
+
+ /* Put this on the dmap_head list, so that we can find it
+ * later for the 'free' operation
+ */
+ dmap->sg = sg;
+ dmap->nents = npages;
+ list_add(&dmap->list, &dmap_head);
+
+ /* FIXME: guest should send the direction */
+ r = dma_ops->map_sg(NULL, sg, npages, PCI_DMA_BIDIRECTIONAL);
+ if (r) {
+ r = npages;
+ *hcall_page = sg[0].dma_address;
+ }
+
+ out_unmap:
+ if (!r)
+ *hcall_page = bad_dma_address;
+ kunmap(host_page);
+ out:
+ return r;
+ out_unmap_sg:
+ kfree(dmap);
+ kfree(sg);
+ goto out_unmap;
+}
+
+/* FIXME: the argument passed from guest can be 32-bit. We need 64-bit for
+ * dma_addr_t. Send the dma address in a page.
+ */
+static int pv_unmap_hypercall(struct kvm_vcpu *vcpu, dma_addr_t dma)
+{
+ int i, r = 0;
+ struct dma_map *dmap;
+
+ /* dma is the address we have to 'unmap'. Check if it exists
+ * in the dma_map list. If yes, free it.
+ */
+ dmap = find_matching_dmap(&dmap_head, dma);
+ if (dmap) {
+ for (i = 0; i < dmap->nents; i++)
+ put_page(sg_page(&dmap->sg[i]));
+
+ dma_ops->unmap_sg(NULL, dmap->sg, dmap->nents,
+ PCI_DMA_BIDIRECTIONAL);
+ kfree(dmap->sg);
+ list_del(&dmap->list);
+ } else
+ r = 1;
+
+ return r;
+}
+
+static struct pv_pci_dev_list*
+find_matching_pt_dev(struct list_head *head,
+ struct kvm_pv_pci_info *pv_pci_info)
+{
+ struct list_head *ptr;
+ struct pv_pci_dev_list *match;
+
+ list_for_each(ptr, head) {
+ match = list_entry(ptr, struct pv_pci_dev_list, list);
+ /* We compare the guest-side bus/devfn numbers since we also
+ * use this function from the hypercall which the guest issues
+ * to find out if a device is a pv device
+ */
+ if (match &&
+ (match->pt_dev.guest.busnr == pv_pci_info->busnr) &&
+ (match->pt_dev.guest.devfn == pv_pci_info->devfn))
+ return match;
+ }
+ return NULL;
+}
+
+static int
+pv_mapped_pci_device_hypercall(struct kvm_vcpu *vcpu, gfn_t page_gfn)
+{
+ int r = -1;
+ hpa_t page_hpa;
+ unsigned long *shared_addr;
+ struct page *host_page;
+ struct kvm_pv_pci_info pv_pci_info;
+
+ page_hpa = gpa_to_hpa(vcpu->kvm, page_gfn << PAGE_SHIFT);
+ if (is_error_hpa(page_hpa)) {
+ printk(KERN_INFO "%s: page hpa %p not valid for page_gfn %p\n",
+ __FUNCTION__, (void *)page_hpa, (void *)page_gfn);
+ goto out;
+ }
+ host_page = pfn_to_page(page_hpa >> PAGE_SHIFT);
+ shared_addr = kmap(host_page);
+ memcpy(&pv_pci_info, shared_addr, sizeof(struct kvm_pv_pci_info));
+
+ if (find_matching_pt_dev(&pt_dev_head, &pv_pci_info))
+ r = 1;
+ else
+ r = 0;
+
+ kunmap(host_page);
+ out:
+ return r;
+}
+
+static int kvm_vm_ioctl_pv_pt_dev(struct kvm_pv_passthrough_dev *pv_pci_dev)
+{
+ int r = 0;
+ struct pv_pci_dev_list *match;
+
+ /* Has this been added already? */
+ if (find_matching_pt_dev(&pt_dev_head, &pv_pci_dev->guest))
+ goto out;
+
+ match = kmalloc(sizeof(struct pv_pci_dev_list), GFP_KERNEL);
+ if (match == NULL) {
+ printk(KERN_INFO "%s: Couldn't allocate memory\n",
+ __FUNCTION__);
+ r = -ENOMEM;
+ goto out;
+ }
+ match->pt_dev.guest.busnr = pv_pci_dev->guest.busnr;
+ match->pt_dev.guest.devfn = pv_pci_dev->guest.devfn;
+ match->pt_dev.mach.busnr = pv_pci_dev->mach.busnr;
+ match->pt_dev.mach.devfn = pv_pci_dev->mach.devfn;
+ list_add(&match->list, &pt_dev_head);
+ out:
+ return r;
+}

unsigned long segment_base(u16 selector)
{
@@ -983,6 +1234,19 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
break;
}
+ case KVM_ASSIGN_PV_PCI_DEV: {
+ struct kvm_pv_passthrough_dev pv_pci_dev;
+
+ r = -EFAULT;
+ if (copy_from_user(&pv_pci_dev, argp, sizeof pv_pci_dev)) {
+ printk("pv_register: failing copy from user\n");
+ goto out;
+ }
+ r = kvm_vm_ioctl_pv_pt_dev(&pv_pci_dev);
+ if (r)
+ goto out;
+ break;
+ }
default:
;
}
@@ -1649,6 +1913,15 @@ int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
}

switch (nr) {
+ case KVM_PV_DMA_MAP:
+ ret = pv_map_hypercall(vcpu, a0, a1);
+ break;
+ case KVM_PV_DMA_UNMAP:
+ ret = pv_unmap_hypercall(vcpu, a0);
+ break;
+ case KVM_PV_PCI_DEVICE:
+ ret = pv_mapped_pci_device_hypercall(vcpu, a0);
+ break;
default:
ret = -KVM_ENOSYS;
break;
diff --git a/include/asm-x86/kvm_para.h b/include/asm-x86/kvm_para.h
index c6f3fd8..c4b2be0 100644
--- a/include/asm-x86/kvm_para.h
+++ b/include/asm-x86/kvm_para.h
@@ -17,7 +17,13 @@
/* This instruction is vmcall. On non-VT architectures, it will generate a
* trap that we will then rewrite to the appropriate instruction.
*/
-#define KVM_HYPERCALL ".byte 0x0f,0x01,0xc1"
+#define KVM_HYPERCALL ".byte 0x0f,0x01,0xd9"
+
+/* Hypercall numbers */
+#define KVM_PV_UNUSED 0
+#define KVM_PV_DMA_MAP 1
+#define KVM_PV_DMA_UNMAP 2
+#define KVM_PV_PCI_DEVICE 3

/* For KVM hypercalls, a three-byte sequence of either the vmrun or the vmmrun
* instruction. The hypervisor may replace it with something else but only the
@@ -101,5 +107,18 @@ static inline unsigned int kvm_arch_para_features(void)
}

#endif
-
+/* Info stored for identifying paravirtualized PCI devices in the host kernel */
+struct kvm_pv_pci_info {
+ unsigned char busnr;
+ unsigned int devfn;
+};
+
+/* Mapping between host and guest PCI device */
+struct kvm_pv_passthrough_dev {
+ struct kvm_pv_pci_info guest;
+ struct kvm_pv_pci_info mach;
+};
+
+/* Max. DMA pages we send from guest to host for mapping */
+#define MAX_PVDMA_PAGES (PAGE_SIZE / sizeof(unsigned long *))
#endif
diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 71d33d6..38fbebb 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -9,6 +9,7 @@

#include <asm/types.h>
#include <linux/ioctl.h>
+#include <linux/kvm_para.h>

#define KVM_API_VERSION 12

@@ -381,6 +382,8 @@ struct kvm_signal_mask {
#define KVM_IRQ_LINE _IOW(KVMIO, 0x61, struct kvm_irq_level)
#define KVM_GET_IRQCHIP _IOWR(KVMIO, 0x62, struct kvm_irqchip)
#define KVM_SET_IRQCHIP _IOR(KVMIO, 0x63, struct kvm_irqchip)
+#define KVM_ASSIGN_PV_PCI_DEV _IOR(KVMIO, 0x64, \
+ struct kvm_pv_passthrough_dev)

/*
* ioctls for vcpu fds
--
1.5.3

2007-11-07 14:21:53

by Amit Shah

Subject: [PATCH 3/8] KVM: PVDMA Guest: Guest-side routines for paravirtualized DMA

We point dma_ops at our own dma_mapping_ops structure
so that every DMA access goes through us. (This is the reason this
only works for 64-bit guests; a 32-bit guest doesn't yet have a
dma_ops struct.)

For every device that does DMA, we make a hypercall to find out
whether it is a passthrough device -- so that we know whether to
make hypercalls on each DMA access. The result of this hypercall
is cached, so the discovery hypercall is made only once per device.

Right now, this only works as a module: compiling it in causes
a freeze during hard-disk bring-up.
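
For illustration, the call path once the module is loaded (names are
from this patch; e1000 is just an example guest driver):

/*
 * e1000: pci_map_single(pdev, skb->data, len, PCI_DMA_TODEVICE)
 *   -> dma_ops->map_single == kvm_dma_map_single()
 *	-> orig_dma_ops->map_single()	(guest-side mapping)
 *	-> kvm_is_pv_device()		(result cached per device)
 *	-> kvm_dma_map()
 *	   -> kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn)
 *   <- dma address usable by the physical device on the host
 */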

Signed-off-by: Amit Shah <[email protected]>
---
drivers/kvm/kvm_pv_dma.c | 398 ++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 398 insertions(+), 0 deletions(-)
create mode 100644 drivers/kvm/kvm_pv_dma.c

diff --git a/drivers/kvm/kvm_pv_dma.c b/drivers/kvm/kvm_pv_dma.c
new file mode 100644
index 0000000..8d98d98
--- /dev/null
+++ b/drivers/kvm/kvm_pv_dma.c
@@ -0,0 +1,398 @@
+/*
+ * KVM guest DMA para-virtualization driver
+ *
+ * Copyright (C) 2007, Qumranet, Inc., Amit Shah <[email protected]>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <asm/io.h>
+#include <asm/page.h>
+#include <linux/fs.h>
+#include <linux/pci.h>
+#include <linux/module.h>
+#include <linux/version.h>
+#include <linux/miscdevice.h>
+#include <linux/kvm_para.h>
+#include <linux/list.h>
+
+MODULE_AUTHOR("Amit Shah");
+MODULE_DESCRIPTION("Implements guest para-virtualized DMA");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+#define KVM_DMA_MINOR MISC_DYNAMIC_MINOR
+
+static struct page *page;
+static unsigned long page_gfn;
+
+const struct dma_mapping_ops *orig_dma_ops;
+
+struct pv_passthrough_dev_list {
+ struct list_head list;
+ struct kvm_pv_pci_info pv_pci_info;
+ int is_pv;
+};
+static LIST_HEAD(pt_devs_head);
+
+static struct pv_passthrough_dev_list*
+find_matching_pt_dev(struct list_head *head,
+ struct kvm_pv_pci_info *pv_pci_info)
+{
+ struct list_head *ptr;
+ struct pv_passthrough_dev_list *match;
+
+ list_for_each(ptr, head) {
+ match = list_entry(ptr, struct pv_passthrough_dev_list, list);
+ if (match &&
+ (match->pv_pci_info.busnr == pv_pci_info->busnr) &&
+ (match->pv_pci_info.devfn == pv_pci_info->devfn))
+ return match;
+ }
+ return NULL;
+}
+
+void
+empty_pt_dev_list(struct list_head *head)
+{
+ struct pv_passthrough_dev_list *match;
+
+ while (!list_empty(head)) {
+ match = list_entry(head->next,
+ struct pv_passthrough_dev_list, list);
+ list_del(&match->list);
+ kfree(match);
+ }
+}
+
+static int
+kvm_is_pv_device(struct device *dev, const char *name)
+{
+ int r;
+ struct pci_dev *pci_dev;
+ struct kvm_pv_pci_info pv_pci_info;
+ struct pv_passthrough_dev_list *match;
+
+ pci_dev = to_pci_dev(dev);
+ pv_pci_info.busnr = pci_dev->bus->number;
+ pv_pci_info.devfn = pci_dev->devfn;
+
+ match = find_matching_pt_dev(&pt_devs_head, &pv_pci_info);
+ if (match) {
+ r = match->is_pv;
+ goto out;
+ }
+
+ memcpy(page_address(page), &pv_pci_info, sizeof(pv_pci_info));
+ r = kvm_hypercall1(KVM_PV_PCI_DEVICE, page_gfn);
+ if (r < 0) {
+ printk(KERN_INFO "%s: Error doing hypercall!\n", __FUNCTION__);
+ r = 0;
+ goto out;
+ }
+
+ match = kmalloc(sizeof(struct pv_passthrough_dev_list), GFP_KERNEL);
+ if (match == NULL) {
+ printk(KERN_INFO "%s: Out of memory\n", __FUNCTION__);
+ r = 0;
+ goto out;
+ }
+ match->pv_pci_info.busnr = pv_pci_info.busnr;
+ match->pv_pci_info.devfn = pv_pci_info.devfn;
+ match->is_pv = r;
+ list_add(&match->list, &pt_devs_head);
+ out:
+ return r;
+}
+
+static void *
+kvm_dma_map(void *vaddr, size_t size, dma_addr_t *dma_handle)
+{
+ int npages, i;
+ unsigned long *dma_addr;
+ dma_addr_t host_addr = bad_dma_address;
+
+ if (page == NULL)
+ goto out;
+
+ npages = DIV_ROUND_UP(size, PAGE_SIZE); /* pages covered by 'size' */
+ dma_addr = page_address(page);
+
+ /* We have to take into consideration the offsets for the
+ * virtual address provided by the calling
+ * functions. Currently both, pci_alloc_consistent and
+ * pci_map_single call this function. We have to change it so
+ * that we can also pass to the host the offset of the addr in
+ * the page it is in.
+ */
+
+ if (*dma_handle == bad_dma_address)
+ goto out;
+
+ /* It's not really OK to use dma_handle here, as the IOMMU or
+ * swiotlb could have mapped it elsewhere. But what's a better
+ * solution?
+ */
+ *dma_addr++ = *dma_handle;
+ if (npages > 1) {
+ /* All of the pages will be contiguous in guest
+ * physical memory in both, pci_map_consistent and
+ * pci_map_single cases (see DMA-API.txt)
+ */
+ /* FIXME: we're currently not crossing over to
+ * multiple pages to be sent to host, in case
+ * we have a lot of pages that we can't
+ * accommodate in one page.
+ */
+ for (i = 1; i < min((unsigned long)npages, MAX_PVDMA_PAGES); i++)
+ *dma_addr++ = virt_to_phys(vaddr + PAGE_SIZE * i);
+ }
+
+ /* Maybe we need more arguments (we already have the first two):
+ * @npages: number of gpas pages in this hypercall
+ * @page: page we pass to host with all the gpas in them
+ * @more: are there any more pages coming?
+ * @offset: offset of the address in the first page
+ * @direction: direction for the mapping (only for pci_map_single)
+ */
+ npages = kvm_hypercall2(KVM_PV_DMA_MAP, npages, page_gfn);
+ if (!npages)
+ host_addr = bad_dma_address;
+ else
+ host_addr = *(unsigned long *)page_address(page);
+
+ out:
+ *dma_handle = host_addr;
+ if (host_addr == bad_dma_address)
+ vaddr = NULL;
+ return vaddr;
+}
+
+static void
+kvm_dma_unmap(dma_addr_t dma_handle)
+{
+ kvm_hypercall1(KVM_PV_DMA_UNMAP, dma_handle);
+ return;
+}
+
+static void *
+kvm_dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
+ gfp_t gfp)
+{
+ void *vaddr = NULL;
+
+ if ((*dma_handle == bad_dma_address)
+ || !dma_ops->is_pv_device(dev, dev->bus_id))
+ goto out;
+
+ /* *dma_handle holds the bus address; don't cast the pointer itself */
+ vaddr = bus_to_virt(*dma_handle);
+ vaddr = kvm_dma_map(vaddr, size, dma_handle);
+ out:
+ return vaddr;
+}
+
+static void
+kvm_dma_free_coherent(struct device *dev, size_t size, void *vaddr,
+ dma_addr_t dma_handle)
+{
+ kvm_dma_unmap(dma_handle);
+}
+
+static dma_addr_t
+kvm_dma_map_single(struct device *dev, void *ptr, size_t size, int direction)
+{
+ dma_addr_t r;
+
+ r = orig_dma_ops->map_single(dev, ptr, size, direction);
+
+ if (r != bad_dma_address && kvm_is_pv_device(dev, dev->bus_id))
+ kvm_dma_map(ptr, size, &r);
+ return r;
+}
+
+static inline void
+kvm_dma_unmap_single(struct device *dev, dma_addr_t addr, size_t size,
+ int direction)
+{
+ kvm_dma_unmap(addr);
+}
+
+int kvm_pv_dma_mapping_error(dma_addr_t dma_addr)
+{
+ if (orig_dma_ops->mapping_error)
+ return orig_dma_ops->mapping_error(dma_addr);
+
+ printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+ __FUNCTION__);
+ return dma_addr == bad_dma_address;
+}
+
+/* like map_single, but doesn't check the device mask */
+dma_addr_t kvm_pv_dma_map_simple(struct device *hwdev, char *ptr,
+ size_t size, int direction)
+{
+ return orig_dma_ops->map_simple(hwdev, ptr, size, direction);
+}
+
+void kvm_pv_dma_sync_single_for_cpu(struct device *hwdev,
+ dma_addr_t dma_handle, size_t size,
+ int direction)
+{
+ if (orig_dma_ops->sync_single_for_cpu)
+ orig_dma_ops->sync_single_for_cpu(hwdev, dma_handle,
+ size, direction);
+}
+
+void kvm_pv_dma_sync_single_for_device(struct device *hwdev,
+ dma_addr_t dma_handle, size_t size,
+ int direction)
+{
+ if (orig_dma_ops->sync_single_for_device)
+ orig_dma_ops->sync_single_for_device(hwdev, dma_handle,
+ size, direction);
+}
+
+void kvm_pv_dma_sync_single_range_for_cpu(struct device *hwdev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size, int direction)
+{
+ if (orig_dma_ops->sync_single_range_for_cpu)
+ orig_dma_ops->sync_single_range_for_cpu(hwdev, dma_handle,
+ offset, size,
+ direction);
+}
+
+void kvm_pv_dma_sync_single_range_for_device(struct device *hwdev,
+ dma_addr_t dma_handle,
+ unsigned long offset,
+ size_t size, int direction)
+{
+ if (orig_dma_ops->sync_single_range_for_device)
+ orig_dma_ops->sync_single_range_for_device(hwdev, dma_handle,
+ offset, size,
+ direction);
+}
+
+void kvm_pv_dma_sync_sg_for_cpu(struct device *hwdev,
+ struct scatterlist *sg, int nelems,
+ int direction)
+{
+ if (orig_dma_ops->sync_sg_for_cpu)
+ orig_dma_ops->sync_sg_for_cpu(hwdev, sg, nelems, direction);
+}
+
+void kvm_pv_dma_sync_sg_for_device(struct device *hwdev,
+ struct scatterlist *sg, int nelems,
+ int direction)
+{
+ if (orig_dma_ops->sync_sg_for_device)
+ orig_dma_ops->sync_sg_for_device(hwdev, sg, nelems, direction);
+}
+
+int kvm_pv_dma_map_sg(struct device *hwdev, struct scatterlist *sg,
+ int nents, int direction)
+{
+ if (orig_dma_ops->map_sg)
+ return orig_dma_ops->map_sg(hwdev, sg, nents, direction);
+ printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+ __FUNCTION__);
+ return 0;
+}
+
+void kvm_pv_dma_unmap_sg(struct device *hwdev,
+ struct scatterlist *sg, int nents,
+ int direction)
+{
+ if (orig_dma_ops->unmap_sg)
+ orig_dma_ops->unmap_sg(hwdev, sg, nents, direction);
+}
+
+int kvm_pv_dma_dma_supported(struct device *hwdev, u64 mask)
+{
+ if (orig_dma_ops->dma_supported)
+ return orig_dma_ops->dma_supported(hwdev, mask);
+ printk(KERN_ERR "%s: Unhandled PV DMA operation. Report this.\n",
+ __FUNCTION__);
+ return 0;
+}
+
+static const struct dma_mapping_ops kvm_dma_ops = {
+ .alloc_coherent = kvm_dma_alloc_coherent,
+ .free_coherent = kvm_dma_free_coherent,
+ .map_single = kvm_dma_map_single,
+ .unmap_single = kvm_dma_unmap_single,
+ .is_pv_device = kvm_is_pv_device,
+
+ .mapping_error = kvm_pv_dma_mapping_error,
+ .map_simple = kvm_pv_dma_map_simple,
+ .sync_single_for_cpu = kvm_pv_dma_sync_single_for_cpu,
+ .sync_single_for_device = kvm_pv_dma_sync_single_for_device,
+ .sync_single_range_for_cpu = kvm_pv_dma_sync_single_range_for_cpu,
+ .sync_single_range_for_device = kvm_pv_dma_sync_single_range_for_device,
+ .sync_sg_for_cpu = kvm_pv_dma_sync_sg_for_cpu,
+ .sync_sg_for_device = kvm_pv_dma_sync_sg_for_device,
+ .map_sg = kvm_pv_dma_map_sg,
+ .unmap_sg = kvm_pv_dma_unmap_sg,
+};
+
+static struct file_operations dma_chardev_ops;
+static struct miscdevice kvm_dma_dev = {
+ KVM_DMA_MINOR,
+ "kvm_dma",
+ &dma_chardev_ops,
+};
+
+int __init kvm_pv_dma_init(void)
+{
+ int r;
+
+ dma_chardev_ops.owner = THIS_MODULE;
+ if (misc_register(&kvm_dma_dev)) {
+ printk(KERN_ERR "%s: misc device register failed\n",
+ __FUNCTION__);
+ r = -EBUSY;
+ goto out;
+ }
+ if (!kvm_para_available()) {
+ printk(KERN_ERR "KVM paravirt support not available\n");
+ r = -ENODEV;
+ goto out_dereg;
+ }
+
+ /* FIXME: check for hypercall support */
+ page = alloc_page(GFP_ATOMIC);
+ if (page == NULL) {
+ printk(KERN_ERR "%s: Could not allocate page\n", __FUNCTION__);
+ r = -ENOMEM;
+ goto out_dereg;
+ }
+ page_gfn = page_to_pfn(page);
+
+ orig_dma_ops = dma_ops;
+ dma_ops = &kvm_dma_ops;
+
+ printk(KERN_INFO "KVM PV DMA engine registered\n");
+ return 0;
+
+ out_dereg:
+ misc_deregister(&kvm_dma_dev);
+ out:
+ return r;
+}
+
+static void __exit kvm_pv_dma_exit(void)
+{
+ dma_ops = orig_dma_ops;
+
+ __free_page(page);
+
+ empty_pt_dev_list(&pt_devs_head);
+
+ misc_deregister(&kvm_dma_dev);
+}
+
+module_init(kvm_pv_dma_init);
+module_exit(kvm_pv_dma_exit);
--
1.5.3

2007-11-07 14:22:12

by Amit Shah

Subject: [PATCH 6/8] KVM: PVDMA Guest: Add Makefile rule

Add a Makefile rule for compiling the new file
that we create.

Signed-off-by: Amit Shah <[email protected]>
---
drivers/kvm/Makefile | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Makefile b/drivers/kvm/Makefile
index cf18ad4..f492e3e 100644
--- a/drivers/kvm/Makefile
+++ b/drivers/kvm/Makefile
@@ -8,3 +8,5 @@ kvm-intel-objs = vmx.o
obj-$(CONFIG_KVM_INTEL) += kvm-intel.o
kvm-amd-objs = svm.o
obj-$(CONFIG_KVM_AMD) += kvm-amd.o
+kvm-pv-dma-objs = kvm_pv_dma.o
+obj-$(CONFIG_KVM_PV_DMA) += kvm-pv-dma.o
--
1.5.3

2007-11-07 14:22:31

by Amit Shah

Subject: [PATCH 5/8] KVM: PVDMA: Update dma_alloc_coherent to make it paravirt-aware

Of all the DMA calls, only dma_alloc_coherent() might not actually
call dma_ops->alloc_coherent. We make sure it does get called
if the device being worked on is a pv device.
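
For illustration, a hypothetical guest-driver fragment that exercises
the hooked path (with no guest IOMMU, this call used to return before
alloc_coherent was consulted; with the hunk below it also sets up the
host-side mapping when 'dev' is a pv device):

	void *buf;
	dma_addr_t dma;

	buf = dma_alloc_coherent(dev, 4096, &dma, GFP_KERNEL);
	if (buf == NULL)
		return -ENOMEM;	/* guest- or host-side mapping failed */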

Signed-off-by: Amit Shah <[email protected]>
---
arch/x86/kernel/pci-dma_64.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/pci-dma_64.c b/arch/x86/kernel/pci-dma_64.c
index aa805b1..d4b1713 100644
--- a/arch/x86/kernel/pci-dma_64.c
+++ b/arch/x86/kernel/pci-dma_64.c
@@ -11,6 +11,7 @@
#include <asm/io.h>
#include <asm/gart.h>
#include <asm/calgary.h>
+#include <linux/kvm_para.h>

int iommu_merge __read_mostly = 1;
EXPORT_SYMBOL(iommu_merge);
@@ -134,6 +135,18 @@ dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *dma_handle,
memset(memory, 0, size);
if (!mmu) {
*dma_handle = virt_to_bus(memory);
+ if (unlikely(dma_ops->is_pv_device)
+ && unlikely(dma_ops->is_pv_device(dev, dev->bus_id))) {
+ void *r;
+ r = dma_ops->alloc_coherent(dev, size,
+ dma_handle,
+ gfp);
+ if (r == NULL) {
+ free_pages((unsigned long)memory,
+ get_order(size));
+ memory = NULL;
+ }
+ }
return memory;
}
}
--
1.5.3

2007-11-07 14:22:45

by Amit Shah

Subject: [PATCH 8/8] KVM: Update drivers/Makefile to check for CONFIG_VIRTUALIZATION

Check for CONFIG_VIRTUALIZATION instead of CONFIG_KVM,
since the PV drivers won't depend on CONFIG_KVM and we
still want them to be selectable.

Signed-off-by: Amit Shah <[email protected]>
---
drivers/Makefile | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/Makefile b/drivers/Makefile
index 8cb37e3..6f1c287 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -47,7 +47,7 @@ obj-$(CONFIG_SPI) += spi/
obj-$(CONFIG_PCCARD) += pcmcia/
obj-$(CONFIG_DIO) += dio/
obj-$(CONFIG_SBUS) += sbus/
-obj-$(CONFIG_KVM) += kvm/
+obj-$(CONFIG_VIRTUALIZATION) += kvm/
obj-$(CONFIG_ZORRO) += zorro/
obj-$(CONFIG_MAC) += macintosh/
obj-$(CONFIG_ATA_OVER_ETH) += block/aoe/
--
1.5.3

2007-11-07 14:22:57

by Amit Shah

Subject: [PATCH 7/8] PVDMA: Guest: Add Kconfig options to select PVDMA

This is to be enabled on a guest. Currently, only building it as
a module works; compiling it in freezes at hard-disk bring-up.
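
For illustration, an example guest .config fragment (as noted above,
only the module build currently works):

CONFIG_VIRTUALIZATION=y
CONFIG_KVM_PV_DMA=m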

Signed-off-by: Amit Shah <[email protected]>
---
drivers/kvm/Kconfig | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/kvm/Kconfig b/drivers/kvm/Kconfig
index 6569206..3385c10 100644
--- a/drivers/kvm/Kconfig
+++ b/drivers/kvm/Kconfig
@@ -47,6 +47,14 @@ config KVM_AMD
Provides support for KVM on AMD processors equipped with the AMD-V
(SVM) extensions.

+config KVM_PV_DMA
+ tristate "Para-virtualized DMA access"
+ ---help---
+ Provides support for paravirtualized DMA operations in the
+ guest. A hypercall is made to the host so that devices assigned
+ to the guest can perform DMA. Select this if compiling a guest
+ kernel and you need paravirtualized DMA operations.
+
# OK, it's a little counter-intuitive to do this, but it puts it neatly under
# the virtualization menu.
source drivers/lguest/Kconfig
--
1.5.3