2021-03-08 21:49:36

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 00/14] vfio: Device memory DMA mapping improvements

The primary goal of this series is to better manage device memory
mappings, both with a much simplified scheme to zap CPU mappings of
device memory using unmap_mapping_range() and also to restrict IOMMU
mappings of PFNMAPs to vfio device memory and drop those mappings on
device release. This series updates vfio-pci to include the necessary
vma-to-pfn interface, allowing the type1 IOMMU backend to recognize
vfio device memory. If other bus drivers support peer-to-peer DMA,
they should be updated with a similar callback and trigger the device
notifier on release.

RFC->v1:

- Fix some incorrect ERR handling
- Fix use of vm_pgoff to be compatible with unmap_mapping_range()
- Add vma-to-pfn interfaces
- Generic device-from-vma handling
- pfnmap obj directly maps back to vfio_dma obj
- No bypass for strict MMIO handling
- Batch PFNMAP handling
- Follow-on patches to cleanup "extern" usage and bare unsigned

Works in my environment, further testing always appreciated. This
will need to be merged with a solution for concurrent fault handling.
Thanks especially to Jason Gunthorpe for previous reviews and
suggestions. Thanks,

Alex

RFC:https://lore.kernel.org/kvm/[email protected]/

---

Alex Williamson (14):
vfio: Create vfio_fs_type with inode per device
vfio: Update vfio_add_group_dev() API
vfio: Export unmap_mapping_range() wrapper
vfio/pci: Use vfio_device_unmap_mapping_range()
vfio: Create a vfio_device from vma lookup
vfio: Add vma to pfn callback
vfio: Add a device notifier interface
vfio/pci: Notify on device release
vfio/type1: Refactor pfn_list clearing
vfio/type1: Pass iommu and dma objects through to vaddr_get_pfn
vfio/type1: Register device notifier
vfio/type1: Support batching of device mappings
vfio: Remove extern from declarations across vfio
vfio: Cleanup use of bare unsigned


Documentation/driver-api/vfio-mediated-device.rst | 19 +-
Documentation/driver-api/vfio.rst | 8 -
drivers/s390/cio/vfio_ccw_cp.h | 13 +
drivers/s390/cio/vfio_ccw_private.h | 14 +
drivers/s390/crypto/vfio_ap_private.h | 2
drivers/vfio/Kconfig | 1
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 6 -
drivers/vfio/fsl-mc/vfio_fsl_mc_private.h | 7 -
drivers/vfio/mdev/vfio_mdev.c | 5
drivers/vfio/pci/vfio_pci.c | 229 ++++-----------------
drivers/vfio/pci/vfio_pci_intrs.c | 42 ++--
drivers/vfio/pci/vfio_pci_private.h | 69 +++---
drivers/vfio/platform/vfio_platform_common.c | 7 -
drivers/vfio/platform/vfio_platform_irq.c | 21 +-
drivers/vfio/platform/vfio_platform_private.h | 31 +--
drivers/vfio/vfio.c | 154 ++++++++++++--
drivers/vfio/vfio_iommu_type1.c | 234 +++++++++++++++------
include/linux/vfio.h | 129 ++++++------
18 files changed, 543 insertions(+), 448 deletions(-)


2021-03-08 21:49:37

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 04/14] vfio/pci: Use vfio_device_unmap_mapping_range()

With the vfio device fd tied to the address space of the pseudo fs
inode, we can use the mm to track all vmas that might be mmap'ing
device BARs, which removes our vma_list and all the complicated
lock ordering necessary to manually zap each related vma.

Note that we can no longer store the pfn in vm_pgoff if we want to
use unmap_mapping_range() to zap a selective portion of the device
fd corresponding to BAR mappings.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 228 +++++++----------------------------
drivers/vfio/pci/vfio_pci_private.h | 3
2 files changed, 46 insertions(+), 185 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index f0a1d05f0137..415b5109da9b 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -225,7 +225,7 @@ static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)

static void vfio_pci_try_bus_reset(struct vfio_pci_device *vdev);
static void vfio_pci_disable(struct vfio_pci_device *vdev);
-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data);
+static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data);

/*
* INTx masking requires the ability to disable INTx signaling via PCI_COMMAND
@@ -1168,7 +1168,7 @@ static long vfio_pci_ioctl(void *device_data,
struct vfio_pci_group_info info;
struct vfio_devices devs = { .cur_index = 0 };
bool slot = false;
- int i, group_idx, mem_idx = 0, count = 0, ret = 0;
+ int i, group_idx, count = 0, ret = 0;

minsz = offsetofend(struct vfio_pci_hot_reset, count);

@@ -1268,32 +1268,16 @@ static long vfio_pci_ioctl(void *device_data,
}

/*
- * We need to get memory_lock for each device, but devices
- * can share mmap_lock, therefore we need to zap and hold
- * the vma_lock for each device, and only then get each
- * memory_lock.
+ * Try to get the memory_lock write lock for all devices and
+ * zap all BAR mmaps.
*/
ret = vfio_pci_for_each_slot_or_bus(vdev->pdev,
- vfio_pci_try_zap_and_vma_lock_cb,
+ vfio_pci_mem_trylock_and_zap_cb,
&devs, slot);
- if (ret)
- goto hot_reset_release;
-
- for (; mem_idx < devs.cur_index; mem_idx++) {
- struct vfio_pci_device *tmp;
-
- tmp = vfio_device_data(devs.devices[mem_idx]);
-
- ret = down_write_trylock(&tmp->memory_lock);
- if (!ret) {
- ret = -EBUSY;
- goto hot_reset_release;
- }
- mutex_unlock(&tmp->vma_lock);
- }

/* User has access, do the reset */
- ret = pci_reset_bus(vdev->pdev);
+ if (!ret)
+ ret = pci_reset_bus(vdev->pdev);

hot_reset_release:
for (i = 0; i < devs.cur_index; i++) {
@@ -1303,10 +1287,7 @@ static long vfio_pci_ioctl(void *device_data,
device = devs.devices[i];
tmp = vfio_device_data(device);

- if (i < mem_idx)
- up_write(&tmp->memory_lock);
- else
- mutex_unlock(&tmp->vma_lock);
+ up_write(&tmp->memory_lock);
vfio_device_put(device);
}
kfree(devs.devices);
@@ -1452,100 +1433,18 @@ static ssize_t vfio_pci_write(void *device_data, const char __user *buf,
return vfio_pci_rw(device_data, (char __user *)buf, count, ppos, true);
}

-/* Return 1 on zap and vma_lock acquired, 0 on contention (only with @try) */
-static int vfio_pci_zap_and_vma_lock(struct vfio_pci_device *vdev, bool try)
+static void vfio_pci_zap_bars(struct vfio_pci_device *vdev)
{
- struct vfio_pci_mmap_vma *mmap_vma, *tmp;
-
- /*
- * Lock ordering:
- * vma_lock is nested under mmap_lock for vm_ops callback paths.
- * The memory_lock semaphore is used by both code paths calling
- * into this function to zap vmas and the vm_ops.fault callback
- * to protect the memory enable state of the device.
- *
- * When zapping vmas we need to maintain the mmap_lock => vma_lock
- * ordering, which requires using vma_lock to walk vma_list to
- * acquire an mm, then dropping vma_lock to get the mmap_lock and
- * reacquiring vma_lock. This logic is derived from similar
- * requirements in uverbs_user_mmap_disassociate().
- *
- * mmap_lock must always be the top-level lock when it is taken.
- * Therefore we can only hold the memory_lock write lock when
- * vma_list is empty, as we'd need to take mmap_lock to clear
- * entries. vma_list can only be guaranteed empty when holding
- * vma_lock, thus memory_lock is nested under vma_lock.
- *
- * This enables the vm_ops.fault callback to acquire vma_lock,
- * followed by memory_lock read lock, while already holding
- * mmap_lock without risk of deadlock.
- */
- while (1) {
- struct mm_struct *mm = NULL;
-
- if (try) {
- if (!mutex_trylock(&vdev->vma_lock))
- return 0;
- } else {
- mutex_lock(&vdev->vma_lock);
- }
- while (!list_empty(&vdev->vma_list)) {
- mmap_vma = list_first_entry(&vdev->vma_list,
- struct vfio_pci_mmap_vma,
- vma_next);
- mm = mmap_vma->vma->vm_mm;
- if (mmget_not_zero(mm))
- break;
-
- list_del(&mmap_vma->vma_next);
- kfree(mmap_vma);
- mm = NULL;
- }
- if (!mm)
- return 1;
- mutex_unlock(&vdev->vma_lock);
-
- if (try) {
- if (!mmap_read_trylock(mm)) {
- mmput(mm);
- return 0;
- }
- } else {
- mmap_read_lock(mm);
- }
- if (try) {
- if (!mutex_trylock(&vdev->vma_lock)) {
- mmap_read_unlock(mm);
- mmput(mm);
- return 0;
- }
- } else {
- mutex_lock(&vdev->vma_lock);
- }
- list_for_each_entry_safe(mmap_vma, tmp,
- &vdev->vma_list, vma_next) {
- struct vm_area_struct *vma = mmap_vma->vma;
-
- if (vma->vm_mm != mm)
- continue;
-
- list_del(&mmap_vma->vma_next);
- kfree(mmap_vma);
-
- zap_vma_ptes(vma, vma->vm_start,
- vma->vm_end - vma->vm_start);
- }
- mutex_unlock(&vdev->vma_lock);
- mmap_read_unlock(mm);
- mmput(mm);
- }
+ vfio_device_unmap_mapping_range(vdev->device,
+ VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX),
+ VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_ROM_REGION_INDEX) -
+ VFIO_PCI_INDEX_TO_OFFSET(VFIO_PCI_BAR0_REGION_INDEX));
}

void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev)
{
- vfio_pci_zap_and_vma_lock(vdev, false);
down_write(&vdev->memory_lock);
- mutex_unlock(&vdev->vma_lock);
+ vfio_pci_zap_bars(vdev);
}

u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev)
@@ -1567,82 +1466,51 @@ void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd)
up_write(&vdev->memory_lock);
}

-/* Caller holds vma_lock */
-static int __vfio_pci_add_vma(struct vfio_pci_device *vdev,
- struct vm_area_struct *vma)
+static int vfio_pci_bar_vma_to_pfn(struct vm_area_struct *vma,
+ unsigned long *pfn)
{
- struct vfio_pci_mmap_vma *mmap_vma;
-
- mmap_vma = kmalloc(sizeof(*mmap_vma), GFP_KERNEL);
- if (!mmap_vma)
- return -ENOMEM;
+ struct vfio_pci_device *vdev = vma->vm_private_data;
+ struct pci_dev *pdev = vdev->pdev;
+ int index;
+ u64 pgoff;

- mmap_vma->vma = vma;
- list_add(&mmap_vma->vma_next, &vdev->vma_list);
+ index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT);

- return 0;
-}
+ if (index >= VFIO_PCI_ROM_REGION_INDEX ||
+ !vdev->bar_mmap_supported[index] || !vdev->barmap[index])
+ return -EINVAL;

-/*
- * Zap mmaps on open so that we can fault them in on access and therefore
- * our vma_list only tracks mappings accessed since last zap.
- */
-static void vfio_pci_mmap_open(struct vm_area_struct *vma)
-{
- zap_vma_ptes(vma, vma->vm_start, vma->vm_end - vma->vm_start);
-}
+ pgoff = vma->vm_pgoff &
+ ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1);

-static void vfio_pci_mmap_close(struct vm_area_struct *vma)
-{
- struct vfio_pci_device *vdev = vma->vm_private_data;
- struct vfio_pci_mmap_vma *mmap_vma;
+ *pfn = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;

- mutex_lock(&vdev->vma_lock);
- list_for_each_entry(mmap_vma, &vdev->vma_list, vma_next) {
- if (mmap_vma->vma == vma) {
- list_del(&mmap_vma->vma_next);
- kfree(mmap_vma);
- break;
- }
- }
- mutex_unlock(&vdev->vma_lock);
+ return 0;
}

static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
{
struct vm_area_struct *vma = vmf->vma;
struct vfio_pci_device *vdev = vma->vm_private_data;
- vm_fault_t ret = VM_FAULT_NOPAGE;
-
- mutex_lock(&vdev->vma_lock);
- down_read(&vdev->memory_lock);
-
- if (!__vfio_pci_memory_enabled(vdev)) {
- ret = VM_FAULT_SIGBUS;
- mutex_unlock(&vdev->vma_lock);
- goto up_out;
- }
+ unsigned long pfn;
+ vm_fault_t ret = VM_FAULT_SIGBUS;

- if (__vfio_pci_add_vma(vdev, vma)) {
- ret = VM_FAULT_OOM;
- mutex_unlock(&vdev->vma_lock);
- goto up_out;
- }
+ if (vfio_pci_bar_vma_to_pfn(vma, &pfn))
+ return ret;

- mutex_unlock(&vdev->vma_lock);
+ down_read(&vdev->memory_lock);

- if (io_remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
- vma->vm_end - vma->vm_start, vma->vm_page_prot))
- ret = VM_FAULT_SIGBUS;
+ if (__vfio_pci_memory_enabled(vdev) &&
+ !io_remap_pfn_range(vma, vma->vm_start, pfn,
+ vma->vm_end - vma->vm_start, vma->vm_page_prot))
+ ret = VM_FAULT_NOPAGE;

-up_out:
up_read(&vdev->memory_lock);
+
return ret;
}

static const struct vm_operations_struct vfio_pci_mmap_ops = {
- .open = vfio_pci_mmap_open,
- .close = vfio_pci_mmap_close,
.fault = vfio_pci_mmap_fault,
};

@@ -1702,7 +1570,6 @@ static int vfio_pci_mmap(void *device_data, struct vm_area_struct *vma)

vma->vm_private_data = vdev;
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot);
- vma->vm_pgoff = (pci_resource_start(pdev, index) >> PAGE_SHIFT) + pgoff;

/*
* See remap_pfn_range(), called from vfio_pci_fault() but we can't
@@ -1926,7 +1793,6 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
struct vfio_pci_device *vdev;
struct iommu_group *group;
- struct vfio_device *device;
int ret;

if (vfio_pci_is_denylisted(pdev))
@@ -1965,13 +1831,11 @@ static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
mutex_init(&vdev->ioeventfds_lock);
INIT_LIST_HEAD(&vdev->dummy_resources_list);
INIT_LIST_HEAD(&vdev->ioeventfds_list);
- mutex_init(&vdev->vma_lock);
- INIT_LIST_HEAD(&vdev->vma_list);
init_rwsem(&vdev->memory_lock);

- device = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
- if (IS_ERR(device)) {
- ret = PTR_ERR(device);
+ vdev->device = vfio_add_group_dev(&pdev->dev, &vfio_pci_ops, vdev);
+ if (IS_ERR(vdev->device)) {
+ ret = PTR_ERR(vdev->device);
goto out_free;
}

@@ -2253,7 +2117,7 @@ static int vfio_pci_get_unused_devs(struct pci_dev *pdev, void *data)
return 0;
}

-static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)
+static int vfio_pci_mem_trylock_and_zap_cb(struct pci_dev *pdev, void *data)
{
struct vfio_devices *devs = data;
struct vfio_device *device;
@@ -2273,15 +2137,13 @@ static int vfio_pci_try_zap_and_vma_lock_cb(struct pci_dev *pdev, void *data)

vdev = vfio_device_data(device);

- /*
- * Locking multiple devices is prone to deadlock, runaway and
- * unwind if we hit contention.
- */
- if (!vfio_pci_zap_and_vma_lock(vdev, true)) {
+ if (!down_write_trylock(&vdev->memory_lock)) {
vfio_device_put(device);
return -EBUSY;
}

+ vfio_pci_zap_bars(vdev);
+
devs->devices[devs->cur_index++] = device;
return 0;
}
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 9cd1882a05af..ba37f4eeefd0 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -101,6 +101,7 @@ struct vfio_pci_mmap_vma {

struct vfio_pci_device {
struct pci_dev *pdev;
+ struct vfio_device *device;
void __iomem *barmap[PCI_STD_NUM_BARS];
bool bar_mmap_supported[PCI_STD_NUM_BARS];
u8 *pci_config_map;
@@ -139,8 +140,6 @@ struct vfio_pci_device {
struct list_head ioeventfds_list;
struct vfio_pci_vf_token *vf_token;
struct notifier_block nb;
- struct mutex vma_lock;
- struct list_head vma_list;
struct rw_semaphore memory_lock;
};


2021-03-08 21:50:07

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 05/14] vfio: Create a vfio_device from vma lookup

Creating an address space mapping onto our vfio pseudo fs for each
device file descriptor means that we can universally retrieve a
vfio_device from a vma mapping this file.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio.c | 19 +++++++++++++++++--
include/linux/vfio.h | 1 +
2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 3852e57b9e04..3a3e85a0dc3e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -927,6 +927,23 @@ struct vfio_device *vfio_device_get_from_dev(struct device *dev)
}
EXPORT_SYMBOL_GPL(vfio_device_get_from_dev);

+static const struct file_operations vfio_device_fops;
+
+struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma)
+{
+ struct vfio_device *device;
+
+ if (!vma->vm_file || vma->vm_file->f_op != &vfio_device_fops)
+ return ERR_PTR(-ENODEV);
+
+ device = vma->vm_file->private_data;
+
+ vfio_device_get(device);
+
+ return device;
+}
+EXPORT_SYMBOL_GPL(vfio_device_get_from_vma);
+
static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
char *buf)
{
@@ -1486,8 +1503,6 @@ static int vfio_group_add_container_user(struct vfio_group *group)
return 0;
}

-static const struct file_operations vfio_device_fops;
-
static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
{
struct vfio_device *device;
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index f435dfca15eb..660b8adf90a6 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -58,6 +58,7 @@ extern void vfio_device_put(struct vfio_device *device);
extern void *vfio_device_data(struct vfio_device *device);
extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
loff_t start, loff_t len);
+extern struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma);

/* events for the backend driver notify callback */
enum vfio_iommu_notify_type {

2021-03-08 21:50:13

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 07/14] vfio: Add a device notifier interface

Using a vfio device, a notifier block can be registered to receive
select device events. Notifiers can only be registered for contained
devices, ie. they are available through a user context. Registration
of a notifier increments the reference to that container context
therefore notifiers must minimally respond to the release event by
asynchronously removing notifiers.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/Kconfig | 1 +
drivers/vfio/vfio.c | 35 +++++++++++++++++++++++++++++++++++
include/linux/vfio.h | 9 +++++++++
3 files changed, 45 insertions(+)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 90c0525b1e0c..9a67675c9b6c 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -23,6 +23,7 @@ menuconfig VFIO
tristate "VFIO Non-Privileged userspace driver framework"
select IOMMU_API
select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
+ select SRCU
help
VFIO provides a framework for secure userspace device drivers.
See Documentation/driver-api/vfio.rst for more details.
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index c47895539a1a..7f6d00e54e83 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -105,6 +105,7 @@ struct vfio_device {
struct list_head group_next;
void *device_data;
struct inode *inode;
+ struct srcu_notifier_head notifier;
};

#ifdef CONFIG_VFIO_NOIOMMU
@@ -601,6 +602,7 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
device->ops = ops;
device->device_data = device_data;
dev_set_drvdata(dev, device);
+ srcu_init_notifier_head(&device->notifier);

/* No need to get group_lock, caller has group reference */
vfio_group_get(group);
@@ -1785,6 +1787,39 @@ static const struct file_operations vfio_device_fops = {
.mmap = vfio_device_fops_mmap,
};

+int vfio_device_register_notifier(struct vfio_device *device,
+ struct notifier_block *nb)
+{
+ int ret;
+
+ /* Container ref persists until unregister on success */
+ ret = vfio_group_add_container_user(device->group);
+ if (ret)
+ return ret;
+
+ ret = srcu_notifier_chain_register(&device->notifier, nb);
+ if (ret)
+ vfio_group_try_dissolve_container(device->group);
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(vfio_device_register_notifier);
+
+void vfio_device_unregister_notifier(struct vfio_device *device,
+ struct notifier_block *nb)
+{
+ if (!srcu_notifier_chain_unregister(&device->notifier, nb))
+ vfio_group_try_dissolve_container(device->group);
+}
+EXPORT_SYMBOL_GPL(vfio_device_unregister_notifier);
+
+int vfio_device_notifier_call(struct vfio_device *device,
+ enum vfio_device_notify_type event)
+{
+ return srcu_notifier_call_chain(&device->notifier, event, NULL);
+}
+EXPORT_SYMBOL_GPL(vfio_device_notifier_call);
+
/**
* External user API, exported by symbols to be linked dynamically.
*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index dbd90d0ba713..c3ff36a7fa6f 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -62,6 +62,15 @@ extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
loff_t start, loff_t len);
extern struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma);
extern int vfio_vma_to_pfn(struct vm_area_struct *vma, unsigned long *pfn);
+extern int vfio_device_register_notifier(struct vfio_device *device,
+ struct notifier_block *nb);
+extern void vfio_device_unregister_notifier(struct vfio_device *device,
+ struct notifier_block *nb);
+enum vfio_device_notify_type {
+ VFIO_DEVICE_RELEASE = 0,
+};
+int vfio_device_notifier_call(struct vfio_device *device,
+ enum vfio_device_notify_type event);

/* events for the backend driver notify callback */
enum vfio_iommu_notify_type {

2021-03-08 21:51:08

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device

By linking all the device fds we provide to userspace to an
address space through a new pseudo fs, we can use tools like
unmap_mapping_range() to zap all vmas associated with a device.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio.c | 54 +++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 54 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 38779e6fd80c..abdf8d52a911 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -32,11 +32,18 @@
#include <linux/vfio.h>
#include <linux/wait.h>
#include <linux/sched/signal.h>
+#include <linux/pseudo_fs.h>
+#include <linux/mount.h>

#define DRIVER_VERSION "0.3"
#define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
#define DRIVER_DESC "VFIO - User Level meta-driver"

+#define VFIO_MAGIC 0x5646494f /* "VFIO" */
+
+static int vfio_fs_cnt;
+static struct vfsmount *vfio_fs_mnt;
+
static struct vfio {
struct class *class;
struct list_head iommu_drivers_list;
@@ -97,6 +104,7 @@ struct vfio_device {
struct vfio_group *group;
struct list_head group_next;
void *device_data;
+ struct inode *inode;
};

#ifdef CONFIG_VFIO_NOIOMMU
@@ -529,6 +537,34 @@ static struct vfio_group *vfio_group_get_from_dev(struct device *dev)
return group;
}

+static int vfio_fs_init_fs_context(struct fs_context *fc)
+{
+ return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM;
+}
+
+static struct file_system_type vfio_fs_type = {
+ .name = "vfio",
+ .owner = THIS_MODULE,
+ .init_fs_context = vfio_fs_init_fs_context,
+ .kill_sb = kill_anon_super,
+};
+
+static struct inode *vfio_fs_inode_new(void)
+{
+ struct inode *inode;
+ int ret;
+
+ ret = simple_pin_fs(&vfio_fs_type, &vfio_fs_mnt, &vfio_fs_cnt);
+ if (ret)
+ return ERR_PTR(ret);
+
+ inode = alloc_anon_inode(vfio_fs_mnt->mnt_sb);
+ if (IS_ERR(inode))
+ simple_release_fs(&vfio_fs_mnt, &vfio_fs_cnt);
+
+ return inode;
+}
+
/**
* Device objects - create, release, get, put, search
*/
@@ -539,11 +575,19 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
void *device_data)
{
struct vfio_device *device;
+ struct inode *inode;

device = kzalloc(sizeof(*device), GFP_KERNEL);
if (!device)
return ERR_PTR(-ENOMEM);

+ inode = vfio_fs_inode_new();
+ if (IS_ERR(inode)) {
+ kfree(device);
+ return ERR_CAST(inode);
+ }
+ device->inode = inode;
+
kref_init(&device->kref);
device->dev = dev;
device->group = group;
@@ -574,6 +618,9 @@ static void vfio_device_release(struct kref *kref)

dev_set_drvdata(device->dev, NULL);

+ iput(device->inode);
+ simple_release_fs(&vfio_fs_mnt, &vfio_fs_cnt);
+
kfree(device);

/* vfio_del_group_dev may be waiting for this device */
@@ -1488,6 +1535,13 @@ static int vfio_group_get_device_fd(struct vfio_group *group, char *buf)
*/
filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);

+ /*
+ * Use the pseudo fs inode on the device to link all mmaps
+ * to the same address space, allowing us to unmap all vmas
+ * associated to this device using unmap_mapping_range().
+ */
+ filep->f_mapping = device->inode->i_mapping;
+
atomic_inc(&group->container_users);

fd_install(ret, filep);

2021-03-08 21:51:25

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 13/14] vfio: Remove extern from declarations across vfio

Cleanup disrecommended usage and docs.

Signed-off-by: Alex Williamson <[email protected]>
---
Documentation/driver-api/vfio-mediated-device.rst | 19 ++-
Documentation/driver-api/vfio.rst | 4 -
drivers/s390/cio/vfio_ccw_cp.h | 13 +-
drivers/s390/cio/vfio_ccw_private.h | 14 +-
drivers/s390/crypto/vfio_ap_private.h | 2
drivers/vfio/fsl-mc/vfio_fsl_mc_private.h | 7 +
drivers/vfio/pci/vfio_pci_private.h | 66 +++++------
drivers/vfio/platform/vfio_platform_private.h | 31 +++--
include/linux/vfio.h | 122 ++++++++++-----------
9 files changed, 130 insertions(+), 148 deletions(-)

diff --git a/Documentation/driver-api/vfio-mediated-device.rst b/Documentation/driver-api/vfio-mediated-device.rst
index 25eb7d5b834b..7685ef582f7a 100644
--- a/Documentation/driver-api/vfio-mediated-device.rst
+++ b/Documentation/driver-api/vfio-mediated-device.rst
@@ -115,12 +115,11 @@ to register and unregister itself with the core driver:

* Register::

- extern int mdev_register_driver(struct mdev_driver *drv,
- struct module *owner);
+ int mdev_register_driver(struct mdev_driver *drv, struct module *owner);

* Unregister::

- extern void mdev_unregister_driver(struct mdev_driver *drv);
+ void mdev_unregister_driver(struct mdev_driver *drv);

The mediated bus driver is responsible for adding mediated devices to the VFIO
group when devices are bound to the driver and removing mediated devices from
@@ -162,13 +161,13 @@ The callbacks in the mdev_parent_ops structure are as follows:
A driver should use the mdev_parent_ops structure in the function call to
register itself with the mdev core driver::

- extern int mdev_register_device(struct device *dev,
- const struct mdev_parent_ops *ops);
+ int mdev_register_device(struct device *dev,
+ const struct mdev_parent_ops *ops);

However, the mdev_parent_ops structure is not required in the function call
that a driver should use to unregister itself with the mdev core driver::

- extern void mdev_unregister_device(struct device *dev);
+ void mdev_unregister_device(struct device *dev);


Mediated Device Management Interface Through sysfs
@@ -293,11 +292,11 @@ Translation APIs for Mediated Devices
The following APIs are provided for translating user pfn to host pfn in a VFIO
driver::

- extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
- int npage, int prot, unsigned long *phys_pfn);
+ int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
+ int npage, int prot, unsigned long *phys_pfn);

- extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
- int npage);
+ int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
+ int npage);

These functions call back into the back-end IOMMU module by using the pin_pages
and unpin_pages callbacks of the struct vfio_iommu_driver_ops[4]. Currently
diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 03e978eb8ec7..e6ba42ca6346 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -252,11 +252,11 @@ into VFIO core. When devices are bound and unbound to the driver,
the driver should call vfio_add_group_dev() and vfio_del_group_dev()
respectively::

- extern struct vfio_device *vfio_add_group_dev(struct device *dev,
+ struct vfio_device *vfio_add_group_dev(struct device *dev,
const struct vfio_device_ops *ops,
void *device_data);

- extern void *vfio_del_group_dev(struct device *dev);
+ void *vfio_del_group_dev(struct device *dev);

vfio_add_group_dev() indicates to the core to begin tracking the
iommu_group of the specified dev and register the dev as owned by
diff --git a/drivers/s390/cio/vfio_ccw_cp.h b/drivers/s390/cio/vfio_ccw_cp.h
index ba31240ce965..1ea81c4fe630 100644
--- a/drivers/s390/cio/vfio_ccw_cp.h
+++ b/drivers/s390/cio/vfio_ccw_cp.h
@@ -42,12 +42,11 @@ struct channel_program {
struct ccw1 *guest_cp;
};

-extern int cp_init(struct channel_program *cp, struct device *mdev,
- union orb *orb);
-extern void cp_free(struct channel_program *cp);
-extern int cp_prefetch(struct channel_program *cp);
-extern union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm);
-extern void cp_update_scsw(struct channel_program *cp, union scsw *scsw);
-extern bool cp_iova_pinned(struct channel_program *cp, u64 iova);
+int cp_init(struct channel_program *cp, struct device *mdev, union orb *orb);
+void cp_free(struct channel_program *cp);
+int cp_prefetch(struct channel_program *cp);
+union orb *cp_get_orb(struct channel_program *cp, u32 intparm, u8 lpm);
+void cp_update_scsw(struct channel_program *cp, union scsw *scsw);
+bool cp_iova_pinned(struct channel_program *cp, u64 iova);

#endif
diff --git a/drivers/s390/cio/vfio_ccw_private.h b/drivers/s390/cio/vfio_ccw_private.h
index b2c762eb42b9..01dff317e063 100644
--- a/drivers/s390/cio/vfio_ccw_private.h
+++ b/drivers/s390/cio/vfio_ccw_private.h
@@ -116,10 +116,10 @@ struct vfio_ccw_private {
struct work_struct crw_work;
} __aligned(8);

-extern int vfio_ccw_mdev_reg(struct subchannel *sch);
-extern void vfio_ccw_mdev_unreg(struct subchannel *sch);
+int vfio_ccw_mdev_reg(struct subchannel *sch);
+void vfio_ccw_mdev_unreg(struct subchannel *sch);

-extern int vfio_ccw_sch_quiesce(struct subchannel *sch);
+int vfio_ccw_sch_quiesce(struct subchannel *sch);

/*
* States of the device statemachine.
@@ -150,7 +150,7 @@ enum vfio_ccw_event {
* Action called through jumptable.
*/
typedef void (fsm_func_t)(struct vfio_ccw_private *, enum vfio_ccw_event);
-extern fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS];
+fsm_func_t *vfio_ccw_jumptable[NR_VFIO_CCW_STATES][NR_VFIO_CCW_EVENTS];

static inline void vfio_ccw_fsm_event(struct vfio_ccw_private *private,
int event)
@@ -159,12 +159,12 @@ static inline void vfio_ccw_fsm_event(struct vfio_ccw_private *private,
vfio_ccw_jumptable[private->state][event](private, event);
}

-extern struct workqueue_struct *vfio_ccw_work_q;
+struct workqueue_struct *vfio_ccw_work_q;


/* s390 debug feature, similar to base cio */
-extern debug_info_t *vfio_ccw_debug_msg_id;
-extern debug_info_t *vfio_ccw_debug_trace_id;
+debug_info_t *vfio_ccw_debug_msg_id;
+debug_info_t *vfio_ccw_debug_trace_id;

#define VFIO_CCW_TRACE_EVENT(imp, txt) \
debug_text_event(vfio_ccw_debug_trace_id, imp, txt)
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 28e9d9989768..d71a38dd4300 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -45,7 +45,7 @@ struct ap_matrix_dev {
struct ap_driver *vfio_ap_drv;
};

-extern struct ap_matrix_dev *matrix_dev;
+struct ap_matrix_dev *matrix_dev;

/**
* The AP matrix is comprised of three bit masks identifying the adapters,
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h b/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h
index a97ee691ed47..1c6f93b849e2 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_private.h
@@ -45,10 +45,9 @@ struct vfio_fsl_mc_device {
struct vfio_fsl_mc_irq *mc_irqs;
};

-extern int vfio_fsl_mc_set_irqs_ioctl(struct vfio_fsl_mc_device *vdev,
- u32 flags, unsigned int index,
- unsigned int start, unsigned int count,
- void *data);
+int vfio_fsl_mc_set_irqs_ioctl(struct vfio_fsl_mc_device *vdev, u32 flags,
+ unsigned int index, unsigned int start,
+ unsigned int count, void *data);

void vfio_fsl_mc_irqs_cleanup(struct vfio_fsl_mc_device *vdev);

diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index ba37f4eeefd0..49a60585cf9c 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -149,49 +149,45 @@ struct vfio_pci_device {
#define is_irq_none(vdev) (!(is_intx(vdev) || is_msi(vdev) || is_msix(vdev)))
#define irq_is(vdev, type) (vdev->irq_type == type)

-extern void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
-extern void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);
+void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
+void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);

-extern int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev,
- uint32_t flags, unsigned index,
- unsigned start, unsigned count, void *data);
+int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev,
+ uint32_t flags, unsigned index,
+ unsigned start, unsigned count, void *data);

-extern ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev,
- char __user *buf, size_t count,
- loff_t *ppos, bool iswrite);
+ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev, char __user *buf,
+ size_t count, loff_t *ppos, bool iswrite);

-extern ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
- size_t count, loff_t *ppos, bool iswrite);
+ssize_t vfio_pci_bar_rw(struct vfio_pci_device *vdev, char __user *buf,
+ size_t count, loff_t *ppos, bool iswrite);

-extern ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
- size_t count, loff_t *ppos, bool iswrite);
+ssize_t vfio_pci_vga_rw(struct vfio_pci_device *vdev, char __user *buf,
+ size_t count, loff_t *ppos, bool iswrite);

-extern long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
- uint64_t data, int count, int fd);
+long vfio_pci_ioeventfd(struct vfio_pci_device *vdev, loff_t offset,
+ uint64_t data, int count, int fd);

-extern int vfio_pci_init_perm_bits(void);
-extern void vfio_pci_uninit_perm_bits(void);
+int vfio_pci_init_perm_bits(void);
+void vfio_pci_uninit_perm_bits(void);

-extern int vfio_config_init(struct vfio_pci_device *vdev);
-extern void vfio_config_free(struct vfio_pci_device *vdev);
+int vfio_config_init(struct vfio_pci_device *vdev);
+void vfio_config_free(struct vfio_pci_device *vdev);

-extern int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
- unsigned int type, unsigned int subtype,
- const struct vfio_pci_regops *ops,
- size_t size, u32 flags, void *data);
+int vfio_pci_register_dev_region(struct vfio_pci_device *vdev,
+ unsigned int type, unsigned int subtype,
+ const struct vfio_pci_regops *ops,
+ size_t size, u32 flags, void *data);

-extern int vfio_pci_set_power_state(struct vfio_pci_device *vdev,
- pci_power_t state);
+int vfio_pci_set_power_state(struct vfio_pci_device *vdev, pci_power_t state);

-extern bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
-extern void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device
- *vdev);
-extern u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
-extern void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev,
- u16 cmd);
+bool __vfio_pci_memory_enabled(struct vfio_pci_device *vdev);
+void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_device *vdev);
+u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_device *vdev);
+void vfio_pci_memory_unlock_and_restore(struct vfio_pci_device *vdev, u16 cmd);

#ifdef CONFIG_VFIO_PCI_IGD
-extern int vfio_pci_igd_init(struct vfio_pci_device *vdev);
+int vfio_pci_igd_init(struct vfio_pci_device *vdev);
#else
static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
{
@@ -199,8 +195,8 @@ static inline int vfio_pci_igd_init(struct vfio_pci_device *vdev)
}
#endif
#ifdef CONFIG_VFIO_PCI_NVLINK2
-extern int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev);
-extern int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev);
+int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev);
+int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev);
#else
static inline int vfio_pci_nvdia_v100_nvlink2_init(struct vfio_pci_device *vdev)
{
@@ -214,8 +210,8 @@ static inline int vfio_pci_ibm_npu2_init(struct vfio_pci_device *vdev)
#endif

#ifdef CONFIG_S390
-extern int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
- struct vfio_info_cap *caps);
+int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
+ struct vfio_info_cap *caps);
#else
static inline int vfio_pci_info_zdev_add_caps(struct vfio_pci_device *vdev,
struct vfio_info_cap *caps)
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 289089910643..2aa12d41f9c6 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -78,22 +78,21 @@ struct vfio_platform_reset_node {
vfio_platform_reset_fn_t of_reset;
};

-extern int vfio_platform_probe_common(struct vfio_platform_device *vdev,
- struct device *dev);
-extern struct vfio_platform_device *vfio_platform_remove_common
- (struct device *dev);
-
-extern int vfio_platform_irq_init(struct vfio_platform_device *vdev);
-extern void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev);
-
-extern int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
- uint32_t flags, unsigned index,
- unsigned start, unsigned count,
- void *data);
-
-extern void __vfio_platform_register_reset(struct vfio_platform_reset_node *n);
-extern void vfio_platform_unregister_reset(const char *compat,
- vfio_platform_reset_fn_t fn);
+int vfio_platform_probe_common(struct vfio_platform_device *vdev,
+ struct device *dev);
+struct vfio_platform_device *vfio_platform_remove_common(struct device *dev);
+
+int vfio_platform_irq_init(struct vfio_platform_device *vdev);
+void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev);
+
+int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
+ uint32_t flags, unsigned index,
+ unsigned start, unsigned count,
+ void *data);
+
+void __vfio_platform_register_reset(struct vfio_platform_reset_node *n);
+void vfio_platform_unregister_reset(const char *compat,
+ vfio_platform_reset_fn_t fn);
#define vfio_platform_register_reset(__compat, __reset) \
static struct vfio_platform_reset_node __reset ## _node = { \
.owner = THIS_MODULE, \
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index c3ff36a7fa6f..27a63c1ce219 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -47,25 +47,25 @@ struct vfio_device_ops {
int (*vma_to_pfn)(struct vm_area_struct *vma, unsigned long *pfn);
};

-extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
-extern void vfio_iommu_group_put(struct iommu_group *group, struct device *dev);
+struct iommu_group *vfio_iommu_group_get(struct device *dev);
+void vfio_iommu_group_put(struct iommu_group *group, struct device *dev);

-extern struct vfio_device *vfio_add_group_dev(struct device *dev,
+struct vfio_device *vfio_add_group_dev(struct device *dev,
const struct vfio_device_ops *ops,
void *device_data);

-extern void *vfio_del_group_dev(struct device *dev);
-extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
-extern void vfio_device_put(struct vfio_device *device);
-extern void *vfio_device_data(struct vfio_device *device);
-extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
- loff_t start, loff_t len);
-extern struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma);
-extern int vfio_vma_to_pfn(struct vm_area_struct *vma, unsigned long *pfn);
-extern int vfio_device_register_notifier(struct vfio_device *device,
- struct notifier_block *nb);
-extern void vfio_device_unregister_notifier(struct vfio_device *device,
- struct notifier_block *nb);
+void *vfio_del_group_dev(struct device *dev);
+struct vfio_device *vfio_device_get_from_dev(struct device *dev);
+void vfio_device_put(struct vfio_device *device);
+void *vfio_device_data(struct vfio_device *device);
+void vfio_device_unmap_mapping_range(struct vfio_device *device,
+ loff_t start, loff_t len);
+struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma);
+int vfio_vma_to_pfn(struct vm_area_struct *vma, unsigned long *pfn);
+int vfio_device_register_notifier(struct vfio_device *device,
+ struct notifier_block *nb);
+void vfio_device_unregister_notifier(struct vfio_device *device,
+ struct notifier_block *nb);
enum vfio_device_notify_type {
VFIO_DEVICE_RELEASE = 0,
};
@@ -116,41 +116,37 @@ struct vfio_iommu_driver_ops {
enum vfio_iommu_notify_type event);
};

-extern int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);
+int vfio_register_iommu_driver(const struct vfio_iommu_driver_ops *ops);

-extern void vfio_unregister_iommu_driver(
- const struct vfio_iommu_driver_ops *ops);
+void vfio_unregister_iommu_driver(const struct vfio_iommu_driver_ops *ops);

/*
* External user API
*/
-extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
-extern void vfio_group_put_external_user(struct vfio_group *group);
-extern struct vfio_group *vfio_group_get_external_user_from_dev(struct device
- *dev);
-extern bool vfio_external_group_match_file(struct vfio_group *group,
- struct file *filep);
-extern int vfio_external_user_iommu_id(struct vfio_group *group);
-extern long vfio_external_check_extension(struct vfio_group *group,
- unsigned long arg);
+struct vfio_group *vfio_group_get_external_user(struct file *filep);
+void vfio_group_put_external_user(struct vfio_group *group);
+struct vfio_group *vfio_group_get_external_user_from_dev(struct device *dev);
+bool vfio_external_group_match_file(struct vfio_group *group,
+ struct file *filep);
+int vfio_external_user_iommu_id(struct vfio_group *group);
+long vfio_external_check_extension(struct vfio_group *group, unsigned long arg);

#define VFIO_PIN_PAGES_MAX_ENTRIES (PAGE_SIZE/sizeof(unsigned long))

-extern int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
- int npage, int prot, unsigned long *phys_pfn);
-extern int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn,
- int npage);
+int vfio_pin_pages(struct device *dev, unsigned long *user_pfn,
+ int npage, int prot, unsigned long *phys_pfn);
+int vfio_unpin_pages(struct device *dev, unsigned long *user_pfn, int npage);

-extern int vfio_group_pin_pages(struct vfio_group *group,
- unsigned long *user_iova_pfn, int npage,
- int prot, unsigned long *phys_pfn);
-extern int vfio_group_unpin_pages(struct vfio_group *group,
- unsigned long *user_iova_pfn, int npage);
+int vfio_group_pin_pages(struct vfio_group *group,
+ unsigned long *user_iova_pfn, int npage,
+ int prot, unsigned long *phys_pfn);
+int vfio_group_unpin_pages(struct vfio_group *group,
+ unsigned long *user_iova_pfn, int npage);

-extern int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
- void *data, size_t len, bool write);
+int vfio_dma_rw(struct vfio_group *group, dma_addr_t user_iova,
+ void *data, size_t len, bool write);

-extern struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group);
+struct iommu_domain *vfio_group_iommu_domain(struct vfio_group *group);

/* each type has independent events */
enum vfio_notify_type {
@@ -164,16 +160,14 @@ enum vfio_notify_type {
/* events for VFIO_GROUP_NOTIFY */
#define VFIO_GROUP_NOTIFY_SET_KVM BIT(0)

-extern int vfio_register_notifier(struct device *dev,
- enum vfio_notify_type type,
- unsigned long *required_events,
- struct notifier_block *nb);
-extern int vfio_unregister_notifier(struct device *dev,
- enum vfio_notify_type type,
- struct notifier_block *nb);
+int vfio_register_notifier(struct device *dev, enum vfio_notify_type type,
+ unsigned long *required_events,
+ struct notifier_block *nb);
+int vfio_unregister_notifier(struct device *dev, enum vfio_notify_type type,
+ struct notifier_block *nb);

struct kvm;
-extern void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);

/*
* Sub-module helpers
@@ -182,25 +176,22 @@ struct vfio_info_cap {
struct vfio_info_cap_header *buf;
size_t size;
};
-extern struct vfio_info_cap_header *vfio_info_cap_add(
- struct vfio_info_cap *caps, size_t size, u16 id, u16 version);
-extern void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);
+struct vfio_info_cap_header *vfio_info_cap_add(struct vfio_info_cap *caps,
+ size_t size, u16 id, u16 version);
+void vfio_info_cap_shift(struct vfio_info_cap *caps, size_t offset);

-extern int vfio_info_add_capability(struct vfio_info_cap *caps,
- struct vfio_info_cap_header *cap,
- size_t size);
+int vfio_info_add_capability(struct vfio_info_cap *caps,
+ struct vfio_info_cap_header *cap, size_t size);

-extern int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr,
- int num_irqs, int max_irq_type,
- size_t *data_size);
+int vfio_set_irqs_validate_and_prepare(struct vfio_irq_set *hdr, int num_irqs,
+ int max_irq_type, size_t *data_size);

struct pci_dev;
#if IS_ENABLED(CONFIG_VFIO_SPAPR_EEH)
-extern void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
-extern void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
-extern long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
- unsigned int cmd,
- unsigned long arg);
+void vfio_spapr_pci_eeh_open(struct pci_dev *pdev);
+void vfio_spapr_pci_eeh_release(struct pci_dev *pdev);
+long vfio_spapr_iommu_eeh_ioctl(struct iommu_group *group,
+ unsigned int cmd, unsigned long arg);
#else
static inline void vfio_spapr_pci_eeh_open(struct pci_dev *pdev)
{
@@ -234,10 +225,9 @@ struct virqfd {
struct virqfd **pvirqfd;
};

-extern int vfio_virqfd_enable(void *opaque,
- int (*handler)(void *, void *),
- void (*thread)(void *, void *),
- void *data, struct virqfd **pvirqfd, int fd);
-extern void vfio_virqfd_disable(struct virqfd **pvirqfd);
+int vfio_virqfd_enable(void *opaque, int (*handler)(void *, void *),
+ void (*thread)(void *, void *), void *data,
+ struct virqfd **pvirqfd, int fd);
+void vfio_virqfd_disable(struct virqfd **pvirqfd);

#endif /* VFIO_H */

2021-03-08 21:51:30

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 03/14] vfio: Export unmap_mapping_range() wrapper

Allow bus drivers to use vfio pseudo fs mapping to zap all mmaps
across a range of their device files.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio.c | 7 +++++++
include/linux/vfio.h | 2 ++
2 files changed, 9 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 34d32f16246a..3852e57b9e04 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -565,6 +565,13 @@ static struct inode *vfio_fs_inode_new(void)
return inode;
}

+void vfio_device_unmap_mapping_range(struct vfio_device *device,
+ loff_t start, loff_t len)
+{
+ unmap_mapping_range(device->inode->i_mapping, start, len, true);
+}
+EXPORT_SYMBOL_GPL(vfio_device_unmap_mapping_range);
+
/**
* Device objects - create, release, get, put, search
*/
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index b784463000d4..f435dfca15eb 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -56,6 +56,8 @@ extern void *vfio_del_group_dev(struct device *dev);
extern struct vfio_device *vfio_device_get_from_dev(struct device *dev);
extern void vfio_device_put(struct vfio_device *device);
extern void *vfio_device_data(struct vfio_device *device);
+extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
+ loff_t start, loff_t len);

/* events for the backend driver notify callback */
enum vfio_iommu_notify_type {

2021-03-08 21:51:30

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 12/14] vfio/type1: Support batching of device mappings

Populate the page array to the extent available to enable batching.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index e89f11141dee..d499bccfbe3f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -628,6 +628,8 @@ static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,
vma = find_vma_intersection(mm, vaddr, vaddr + 1);

if (vma && vma->vm_flags & VM_PFNMAP) {
+ unsigned long count, i;
+
if ((dma->prot & IOMMU_WRITE && !(vma->vm_flags & VM_WRITE)) ||
(dma->prot & IOMMU_READ && !(vma->vm_flags & VM_READ))) {
ret = -EFAULT;
@@ -678,7 +680,13 @@ static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,

*pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) +
dma->pfnmap->base_pfn;
- ret = 1;
+ count = min_t(long,
+ (vma->vm_end - vaddr) >> PAGE_SHIFT, npages);
+
+ for (i = 0; i < count; i++)
+ pages[i] = pfn_to_page(*pfn + i);
+
+ ret = count;
}
done:
mmap_read_unlock(mm);

2021-03-08 21:51:46

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 06/14] vfio: Add vma to pfn callback

Add a new vfio_device_ops callback to allow the bus driver to
translate a vma mapping of a vfio device fd to a pfn. Plumb through
vfio-core. Implemented for vfio-pci.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/vfio.c | 16 ++++++++++++++++
include/linux/vfio.h | 3 +++
3 files changed, 20 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 415b5109da9b..585895970e9c 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -1756,6 +1756,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.mmap = vfio_pci_mmap,
.request = vfio_pci_request,
.match = vfio_pci_match,
+ .vma_to_pfn = vfio_pci_bar_vma_to_pfn,
};

static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 3a3e85a0dc3e..c47895539a1a 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -944,6 +944,22 @@ struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma)
}
EXPORT_SYMBOL_GPL(vfio_device_get_from_vma);

+int vfio_vma_to_pfn(struct vm_area_struct *vma, unsigned long *pfn)
+{
+ struct vfio_device *device;
+
+ if (!vma->vm_file || vma->vm_file->f_op != &vfio_device_fops)
+ return -EINVAL;
+
+ device = vma->vm_file->private_data;
+
+ if (unlikely(!device->ops->vma_to_pfn))
+ return -EINVAL;
+
+ return device->ops->vma_to_pfn(vma, pfn);
+}
+EXPORT_SYMBOL_GPL(vfio_vma_to_pfn);
+
static struct vfio_device *vfio_device_get_from_name(struct vfio_group *group,
char *buf)
{
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 660b8adf90a6..dbd90d0ba713 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -29,6 +29,7 @@
* @match: Optional device name match callback (return: 0 for no-match, >0 for
* match, -errno for abort (ex. match with insufficient or incorrect
* additional args)
+ * @vma_to_pfn: Optional pfn from vma lookup against vma mapping device fd
*/
struct vfio_device_ops {
char *name;
@@ -43,6 +44,7 @@ struct vfio_device_ops {
int (*mmap)(void *device_data, struct vm_area_struct *vma);
void (*request)(void *device_data, unsigned int count);
int (*match)(void *device_data, char *buf);
+ int (*vma_to_pfn)(struct vm_area_struct *vma, unsigned long *pfn);
};

extern struct iommu_group *vfio_iommu_group_get(struct device *dev);
@@ -59,6 +61,7 @@ extern void *vfio_device_data(struct vfio_device *device);
extern void vfio_device_unmap_mapping_range(struct vfio_device *device,
loff_t start, loff_t len);
extern struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma);
+extern int vfio_vma_to_pfn(struct vm_area_struct *vma, unsigned long *pfn);

/* events for the backend driver notify callback */
enum vfio_iommu_notify_type {

2021-03-08 21:52:24

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 09/14] vfio/type1: Refactor pfn_list clearing

Pull code out to a function for re-use.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 57 +++++++++++++++++++++++----------------
1 file changed, 34 insertions(+), 23 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 12d9905b429f..f7d35a114354 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -542,6 +542,39 @@ static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
return ret;
}

+/* Return 1 if iommu->lock dropped and notified, 0 if done */
+static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
+ struct vfio_dma **dma_last, int *retries)
+{
+ if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
+ struct vfio_iommu_type1_dma_unmap nb_unmap;
+
+ if (*dma_last == dma) {
+ BUG_ON(++(*retries) > 10);
+ } else {
+ *dma_last = dma;
+ *retries = 0;
+ }
+
+ nb_unmap.iova = dma->iova;
+ nb_unmap.size = dma->size;
+
+ /*
+ * Notify anyone (mdev vendor drivers) to invalidate and
+ * unmap iovas within the range we're about to unmap.
+ * Vendor drivers MUST unpin pages in response to an
+ * invalidation.
+ */
+ mutex_unlock(&iommu->lock);
+ blocking_notifier_call_chain(&iommu->notifier,
+ VFIO_IOMMU_NOTIFY_DMA_UNMAP,
+ &nb_unmap);
+ return 1;
+ }
+
+ return 0;
+}
+
/*
* Returns the positive number of pfns successfully obtained or a negative
* error code.
@@ -1397,29 +1430,7 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
continue;
}

- if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
- struct vfio_iommu_type1_dma_unmap nb_unmap;
-
- if (dma_last == dma) {
- BUG_ON(++retries > 10);
- } else {
- dma_last = dma;
- retries = 0;
- }
-
- nb_unmap.iova = dma->iova;
- nb_unmap.size = dma->size;
-
- /*
- * Notify anyone (mdev vendor drivers) to invalidate and
- * unmap iovas within the range we're about to unmap.
- * Vendor drivers MUST unpin pages in response to an
- * invalidation.
- */
- mutex_unlock(&iommu->lock);
- blocking_notifier_call_chain(&iommu->notifier,
- VFIO_IOMMU_NOTIFY_DMA_UNMAP,
- &nb_unmap);
+ if (unmap_dma_pfn_list(iommu, dma, &dma_last, &retries)) {
mutex_lock(&iommu->lock);
goto again;
}

2021-03-08 21:52:39

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 08/14] vfio/pci: Notify on device release

Trigger a release notifier call when open reference count is zero.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 585895970e9c..bee9318b46ed 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -560,6 +560,7 @@ static void vfio_pci_release(void *device_data)
mutex_lock(&vdev->reflck->lock);

if (!(--vdev->refcnt)) {
+ vfio_device_notifier_call(vdev->device, VFIO_DEVICE_RELEASE);
vfio_pci_vf_token_user_add(vdev, -1);
vfio_spapr_pci_eeh_release(vdev->pdev);
vfio_pci_disable(vdev);

2021-03-08 21:52:45

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 10/14] vfio/type1: Pass iommu and dma objects through to vaddr_get_pfn

We'll need these to track vfio device mappings.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 28 ++++++++++++++++------------
1 file changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f7d35a114354..f22c07a40521 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -579,15 +579,16 @@ static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
* Returns the positive number of pfns successfully obtained or a negative
* error code.
*/
-static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
- long npages, int prot, unsigned long *pfn,
+static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,
+ struct mm_struct *mm, unsigned long vaddr,
+ long npages, unsigned long *pfn,
struct page **pages)
{
struct vm_area_struct *vma;
unsigned int flags = 0;
int ret;

- if (prot & IOMMU_WRITE)
+ if (dma->prot & IOMMU_WRITE)
flags |= FOLL_WRITE;

mmap_read_lock(mm);
@@ -604,7 +605,8 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
vma = find_vma_intersection(mm, vaddr, vaddr + 1);

if (vma && vma->vm_flags & VM_PFNMAP) {
- ret = follow_fault_pfn(vma, mm, vaddr, pfn, prot & IOMMU_WRITE);
+ ret = follow_fault_pfn(vma, mm, vaddr, pfn,
+ dma->prot & IOMMU_WRITE);
if (ret == -EAGAIN)
goto retry;

@@ -680,7 +682,8 @@ static int vfio_wait_all_valid(struct vfio_iommu *iommu)
* the iommu can only map chunks of consecutive pfns anyway, so get the
* first page and all consecutive pages with the same locking.
*/
-static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
+static long vfio_pin_pages_remote(struct vfio_iommu *iommu,
+ struct vfio_dma *dma, unsigned long vaddr,
long npage, unsigned long *pfn_base,
unsigned long limit, struct vfio_batch *batch)
{
@@ -708,7 +711,7 @@ static long vfio_pin_pages_remote(struct vfio_dma *dma, unsigned long vaddr,
/* Empty batch, so refill it. */
long req_pages = min_t(long, npage, batch->capacity);

- ret = vaddr_get_pfns(mm, vaddr, req_pages, dma->prot,
+ ret = vaddr_get_pfns(iommu, dma, mm, vaddr, req_pages,
&pfn, batch->pages);
if (ret < 0)
goto unpin_out;
@@ -806,7 +809,8 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
return unlocked;
}

-static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
+static int vfio_pin_page_external(struct vfio_iommu *iommu,
+ struct vfio_dma *dma, unsigned long vaddr,
unsigned long *pfn_base, bool do_accounting)
{
struct page *pages[1];
@@ -817,7 +821,7 @@ static int vfio_pin_page_external(struct vfio_dma *dma, unsigned long vaddr,
if (!mm)
return -ENODEV;

- ret = vaddr_get_pfns(mm, vaddr, 1, dma->prot, pfn_base, pages);
+ ret = vaddr_get_pfns(iommu, dma, mm, vaddr, 1, pfn_base, pages);
if (ret == 1 && do_accounting && !is_invalid_reserved_pfn(*pfn_base)) {
ret = vfio_lock_acct(dma, 1, true);
if (ret) {
@@ -925,8 +929,8 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
}

remote_vaddr = dma->vaddr + (iova - dma->iova);
- ret = vfio_pin_page_external(dma, remote_vaddr, &phys_pfn[i],
- do_accounting);
+ ret = vfio_pin_page_external(iommu, dma, remote_vaddr,
+ &phys_pfn[i], do_accounting);
if (ret)
goto pin_unwind;

@@ -1497,7 +1501,7 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,

while (size) {
/* Pin a contiguous chunk of memory */
- npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
+ npage = vfio_pin_pages_remote(iommu, dma, vaddr + dma->size,
size >> PAGE_SHIFT, &pfn, limit,
&batch);
if (npage <= 0) {
@@ -1759,7 +1763,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
size_t n = dma->iova + dma->size - iova;
long npage;

- npage = vfio_pin_pages_remote(dma, vaddr,
+ npage = vfio_pin_pages_remote(iommu, dma, vaddr,
n >> PAGE_SHIFT,
&pfn, limit,
&batch);

2021-03-08 21:53:26

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 11/14] vfio/type1: Register device notifier

Impose a new default strict MMIO mapping mode where the vma for
a VM_PFNMAP mapping must be backed by a vfio device. This allows
holding a reference to the device and registering a notifier for the
device, which additionally keeps the device in an IOMMU context for
the extent of the DMA mapping. On notification of device release,
automatically drop the DMA mappings for it.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/vfio_iommu_type1.c | 163 ++++++++++++++++++++++++++++-----------
1 file changed, 116 insertions(+), 47 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index f22c07a40521..e89f11141dee 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -101,6 +101,20 @@ struct vfio_dma {
struct task_struct *task;
struct rb_root pfn_list; /* Ex-user pinned pfn list */
unsigned long *bitmap;
+ struct pfnmap_obj *pfnmap;
+};
+
+/*
+ * Separate object used for tracking pfnmaps to allow reference release and
+ * unregistering notifier outside of callback chain.
+ */
+struct pfnmap_obj {
+ struct notifier_block nb;
+ struct work_struct work;
+ struct vfio_iommu *iommu;
+ struct vfio_dma *dma;
+ struct vfio_device *device;
+ unsigned long base_pfn;
};

struct vfio_batch {
@@ -506,42 +520,6 @@ static void vfio_batch_fini(struct vfio_batch *batch)
free_page((unsigned long)batch->pages);
}

-static int follow_fault_pfn(struct vm_area_struct *vma, struct mm_struct *mm,
- unsigned long vaddr, unsigned long *pfn,
- bool write_fault)
-{
- pte_t *ptep;
- spinlock_t *ptl;
- int ret;
-
- ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
- if (ret) {
- bool unlocked = false;
-
- ret = fixup_user_fault(mm, vaddr,
- FAULT_FLAG_REMOTE |
- (write_fault ? FAULT_FLAG_WRITE : 0),
- &unlocked);
- if (unlocked)
- return -EAGAIN;
-
- if (ret)
- return ret;
-
- ret = follow_pte(vma->vm_mm, vaddr, &ptep, &ptl);
- if (ret)
- return ret;
- }
-
- if (write_fault && !pte_write(*ptep))
- ret = -EFAULT;
- else
- *pfn = pte_pfn(*ptep);
-
- pte_unmap_unlock(ptep, ptl);
- return ret;
-}
-
/* Return 1 if iommu->lock dropped and notified, 0 if done */
static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
struct vfio_dma **dma_last, int *retries)
@@ -575,6 +553,52 @@ static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
return 0;
}

+static void unregister_device_bg(struct work_struct *work)
+{
+ struct pfnmap_obj *pfnmap = container_of(work, struct pfnmap_obj, work);
+
+ vfio_device_unregister_notifier(pfnmap->device, &pfnmap->nb);
+ vfio_device_put(pfnmap->device);
+ kfree(pfnmap);
+}
+
+static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma);
+
+static int vfio_device_nb_cb(struct notifier_block *nb,
+ unsigned long action, void *unused)
+{
+ struct pfnmap_obj *pfnmap = container_of(nb, struct pfnmap_obj, nb);
+
+ switch (action) {
+ case VFIO_DEVICE_RELEASE:
+ {
+ struct vfio_dma *dma_last = NULL;
+ int retries = 0;
+again:
+ mutex_lock(&pfnmap->iommu->lock);
+ if (pfnmap->dma) {
+ struct vfio_dma *dma = pfnmap->dma;
+
+ if (unmap_dma_pfn_list(pfnmap->iommu, dma,
+ &dma_last, &retries))
+ goto again;
+
+ dma->pfnmap = NULL;
+ pfnmap->dma = NULL;
+ vfio_remove_dma(pfnmap->iommu, dma);
+ }
+ mutex_unlock(&pfnmap->iommu->lock);
+
+ /* Cannot unregister notifier from callback chain */
+ INIT_WORK(&pfnmap->work, unregister_device_bg);
+ schedule_work(&pfnmap->work);
+ break;
+ }
+ }
+
+ return NOTIFY_OK;
+}
+
/*
* Returns the positive number of pfns successfully obtained or a negative
* error code.
@@ -601,21 +625,60 @@ static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,

vaddr = untagged_addr(vaddr);

-retry:
vma = find_vma_intersection(mm, vaddr, vaddr + 1);

if (vma && vma->vm_flags & VM_PFNMAP) {
- ret = follow_fault_pfn(vma, mm, vaddr, pfn,
- dma->prot & IOMMU_WRITE);
- if (ret == -EAGAIN)
- goto retry;
-
- if (!ret) {
- if (is_invalid_reserved_pfn(*pfn))
- ret = 1;
- else
- ret = -EFAULT;
+ if ((dma->prot & IOMMU_WRITE && !(vma->vm_flags & VM_WRITE)) ||
+ (dma->prot & IOMMU_READ && !(vma->vm_flags & VM_READ))) {
+ ret = -EFAULT;
+ goto done;
+ }
+
+ if (!dma->pfnmap) {
+ struct vfio_device *device;
+ unsigned long base_pfn;
+ struct pfnmap_obj *pfnmap;
+
+ device = vfio_device_get_from_vma(vma);
+ if (IS_ERR(device)) {
+ ret = PTR_ERR(device);
+ goto done;
+ }
+
+ ret = vfio_vma_to_pfn(vma, &base_pfn);
+ if (ret) {
+ vfio_device_put(device);
+ goto done;
+ }
+
+ pfnmap = kzalloc(sizeof(*pfnmap), GFP_KERNEL);
+ if (!pfnmap) {
+ vfio_device_put(device);
+ ret = -ENOMEM;
+ goto done;
+ }
+
+ pfnmap->nb.notifier_call = vfio_device_nb_cb;
+ pfnmap->iommu = iommu;
+ pfnmap->dma = dma;
+ pfnmap->device = device;
+ pfnmap->base_pfn = base_pfn;
+
+ dma->pfnmap = pfnmap;
+
+ ret = vfio_device_register_notifier(device,
+ &pfnmap->nb);
+ if (ret) {
+ dma->pfnmap = NULL;
+ kfree(pfnmap);
+ vfio_device_put(device);
+ goto done;
+ }
}
+
+ *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) +
+ dma->pfnmap->base_pfn;
+ ret = 1;
}
done:
mmap_read_unlock(mm);
@@ -1189,6 +1252,12 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
{
WARN_ON(!RB_EMPTY_ROOT(&dma->pfn_list));
+ if (dma->pfnmap) {
+ vfio_device_unregister_notifier(dma->pfnmap->device,
+ &dma->pfnmap->nb);
+ vfio_device_put(dma->pfnmap->device);
+ kfree(dma->pfnmap);
+ }
vfio_unmap_unpin(iommu, dma, true);
vfio_unlink_dma(iommu, dma);
put_task_struct(dma->task);

2021-03-08 21:54:02

by Alex Williamson

[permalink] [raw]
Subject: [PATCH v1 14/14] vfio: Cleanup use of bare unsigned

Replace with 'unsigned int'.

Signed-off-by: Alex Williamson <[email protected]>
---
drivers/vfio/pci/vfio_pci_intrs.c | 42 ++++++++++++++-----------
drivers/vfio/pci/vfio_pci_private.h | 4 +-
drivers/vfio/platform/vfio_platform_irq.c | 21 +++++++------
drivers/vfio/platform/vfio_platform_private.h | 4 +-
4 files changed, 39 insertions(+), 32 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 869dce5f134d..67de58d67908 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -364,8 +364,8 @@ static int vfio_msi_set_vector_signal(struct vfio_pci_device *vdev,
return 0;
}

-static int vfio_msi_set_block(struct vfio_pci_device *vdev, unsigned start,
- unsigned count, int32_t *fds, bool msix)
+static int vfio_msi_set_block(struct vfio_pci_device *vdev, unsigned int start,
+ unsigned int count, int32_t *fds, bool msix)
{
int i, j, ret = 0;

@@ -418,8 +418,9 @@ static void vfio_msi_disable(struct vfio_pci_device *vdev, bool msix)
* IOCTL support
*/
static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags, void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
if (!is_intx(vdev) || start != 0 || count != 1)
return -EINVAL;
@@ -445,8 +446,9 @@ static int vfio_pci_set_intx_unmask(struct vfio_pci_device *vdev,
}

static int vfio_pci_set_intx_mask(struct vfio_pci_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags, void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
if (!is_intx(vdev) || start != 0 || count != 1)
return -EINVAL;
@@ -465,8 +467,9 @@ static int vfio_pci_set_intx_mask(struct vfio_pci_device *vdev,
}

static int vfio_pci_set_intx_trigger(struct vfio_pci_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags, void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
if (is_intx(vdev) && !count && (flags & VFIO_IRQ_SET_DATA_NONE)) {
vfio_intx_disable(vdev);
@@ -508,8 +511,9 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_device *vdev,
}

static int vfio_pci_set_msi_trigger(struct vfio_pci_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags, void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
int i;
bool msix = (index == VFIO_PCI_MSIX_IRQ_INDEX) ? true : false;
@@ -614,8 +618,9 @@ static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx,
}

static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags, void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
if (index != VFIO_PCI_ERR_IRQ_INDEX || start != 0 || count > 1)
return -EINVAL;
@@ -625,8 +630,9 @@ static int vfio_pci_set_err_trigger(struct vfio_pci_device *vdev,
}

static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags, void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
+ void *data)
{
if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1)
return -EINVAL;
@@ -636,11 +642,11 @@ static int vfio_pci_set_req_trigger(struct vfio_pci_device *vdev,
}

int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev, uint32_t flags,
- unsigned index, unsigned start, unsigned count,
- void *data)
+ unsigned int index, unsigned int start,
+ unsigned int count, void *data)
{
- int (*func)(struct vfio_pci_device *vdev, unsigned index,
- unsigned start, unsigned count, uint32_t flags,
+ int (*func)(struct vfio_pci_device *vdev, unsigned int index,
+ unsigned int start, unsigned int count, uint32_t flags,
void *data) = NULL;

switch (index) {
diff --git a/drivers/vfio/pci/vfio_pci_private.h b/drivers/vfio/pci/vfio_pci_private.h
index 49a60585cf9c..93f47044e2e5 100644
--- a/drivers/vfio/pci/vfio_pci_private.h
+++ b/drivers/vfio/pci/vfio_pci_private.h
@@ -153,8 +153,8 @@ void vfio_pci_intx_mask(struct vfio_pci_device *vdev);
void vfio_pci_intx_unmask(struct vfio_pci_device *vdev);

int vfio_pci_set_irqs_ioctl(struct vfio_pci_device *vdev,
- uint32_t flags, unsigned index,
- unsigned start, unsigned count, void *data);
+ uint32_t flags, unsigned int index,
+ unsigned int start, unsigned int count, void *data);

ssize_t vfio_pci_config_rw(struct vfio_pci_device *vdev, char __user *buf,
size_t count, loff_t *ppos, bool iswrite);
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index c5b09ec0a3c9..90d87ab44131 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -39,8 +39,8 @@ static int vfio_platform_mask_handler(void *opaque, void *unused)
}

static int vfio_platform_set_irq_mask(struct vfio_platform_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags,
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
void *data)
{
if (start != 0 || count != 1)
@@ -99,8 +99,8 @@ static int vfio_platform_unmask_handler(void *opaque, void *unused)
}

static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags,
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
void *data)
{
if (start != 0 || count != 1)
@@ -216,8 +216,8 @@ static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
}

static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
- unsigned index, unsigned start,
- unsigned count, uint32_t flags,
+ unsigned int index, unsigned int start,
+ unsigned int count, uint32_t flags,
void *data)
{
struct vfio_platform_irq *irq = &vdev->irqs[index];
@@ -254,11 +254,12 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
}

int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
- uint32_t flags, unsigned index, unsigned start,
- unsigned count, void *data)
+ uint32_t flags, unsigned int index,
+ unsigned int start, unsigned int count,
+ void *data)
{
- int (*func)(struct vfio_platform_device *vdev, unsigned index,
- unsigned start, unsigned count, uint32_t flags,
+ int (*func)(struct vfio_platform_device *vdev, unsigned int index,
+ unsigned int start, unsigned int count, uint32_t flags,
void *data) = NULL;

switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
diff --git a/drivers/vfio/platform/vfio_platform_private.h b/drivers/vfio/platform/vfio_platform_private.h
index 2aa12d41f9c6..09870bf07e95 100644
--- a/drivers/vfio/platform/vfio_platform_private.h
+++ b/drivers/vfio/platform/vfio_platform_private.h
@@ -86,8 +86,8 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev);
void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev);

int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
- uint32_t flags, unsigned index,
- unsigned start, unsigned count,
+ uint32_t flags, unsigned int index,
+ unsigned int start, unsigned int count,
void *data);

void __vfio_platform_register_reset(struct vfio_platform_reset_node *n);

2021-03-09 00:23:57

by Halil Pasic

[permalink] [raw]
Subject: Re: [PATCH v1 13/14] vfio: Remove extern from declarations across vfio

On Mon, 08 Mar 2021 14:49:42 -0700
Alex Williamson <[email protected]> wrote:

> Cleanup disrecommended usage and docs.
>
> Signed-off-by: Alex Williamson <[email protected]>

Acked-by: Halil Pasic <[email protected]>

2021-03-09 00:37:39

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 06/14] vfio: Add vma to pfn callback

On Mon, Mar 08, 2021 at 02:48:16PM -0700, Alex Williamson wrote:
> Add a new vfio_device_ops callback to allow the bus driver to
> translate a vma mapping of a vfio device fd to a pfn. Plumb through
> vfio-core. Implemented for vfio-pci.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Alex Williamson <[email protected]>
> drivers/vfio/pci/vfio_pci.c | 1 +
> drivers/vfio/vfio.c | 16 ++++++++++++++++
> include/linux/vfio.h | 3 +++
> 3 files changed, 20 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 415b5109da9b..585895970e9c 100644
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -1756,6 +1756,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
> .mmap = vfio_pci_mmap,
> .request = vfio_pci_request,
> .match = vfio_pci_match,
> + .vma_to_pfn = vfio_pci_bar_vma_to_pfn,
> };
>
> static int vfio_pci_reflck_attach(struct vfio_pci_device *vdev);
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index 3a3e85a0dc3e..c47895539a1a 100644
> +++ b/drivers/vfio/vfio.c
> @@ -944,6 +944,22 @@ struct vfio_device *vfio_device_get_from_vma(struct vm_area_struct *vma)
> }
> EXPORT_SYMBOL_GPL(vfio_device_get_from_vma);
>
> +int vfio_vma_to_pfn(struct vm_area_struct *vma, unsigned long *pfn)
> +{
> + struct vfio_device *device;
> +
> + if (!vma->vm_file || vma->vm_file->f_op != &vfio_device_fops)
> + return -EINVAL;
> +
> + device = vma->vm_file->private_data;

Since the caller has the vfio_device already I would pass in the
vfio_device here rather than look it up again.

If you are really worried about API mis-use then use a protective
assertion like this:

if (WARN_ON(vma->vm_file->private_data != device))
return -EINVAL;

Jason

2021-03-09 00:48:46

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 07/14] vfio: Add a device notifier interface

On Mon, Mar 08, 2021 at 02:48:30PM -0700, Alex Williamson wrote:
> Using a vfio device, a notifier block can be registered to receive
> select device events. Notifiers can only be registered for contained
> devices, ie. they are available through a user context. Registration
> of a notifier increments the reference to that container context
> therefore notifiers must minimally respond to the release event by
> asynchronously removing notifiers.
>
> Signed-off-by: Alex Williamson <[email protected]>
> drivers/vfio/Kconfig | 1 +
> drivers/vfio/vfio.c | 35 +++++++++++++++++++++++++++++++++++
> include/linux/vfio.h | 9 +++++++++
> 3 files changed, 45 insertions(+)
>
> diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> index 90c0525b1e0c..9a67675c9b6c 100644
> +++ b/drivers/vfio/Kconfig
> @@ -23,6 +23,7 @@ menuconfig VFIO
> tristate "VFIO Non-Privileged userspace driver framework"
> select IOMMU_API
> select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
> + select SRCU
> help
> VFIO provides a framework for secure userspace device drivers.
> See Documentation/driver-api/vfio.rst for more details.
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> index c47895539a1a..7f6d00e54e83 100644
> +++ b/drivers/vfio/vfio.c
> @@ -105,6 +105,7 @@ struct vfio_device {
> struct list_head group_next;
> void *device_data;
> struct inode *inode;
> + struct srcu_notifier_head notifier;
> };
>
> #ifdef CONFIG_VFIO_NOIOMMU
> @@ -601,6 +602,7 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
> device->ops = ops;
> device->device_data = device_data;
> dev_set_drvdata(dev, device);
> + srcu_init_notifier_head(&device->notifier);
>
> /* No need to get group_lock, caller has group reference */
> vfio_group_get(group);
> @@ -1785,6 +1787,39 @@ static const struct file_operations vfio_device_fops = {
> .mmap = vfio_device_fops_mmap,
> };
>
> +int vfio_device_register_notifier(struct vfio_device *device,
> + struct notifier_block *nb)
> +{
> + int ret;
> +
> + /* Container ref persists until unregister on success */
> + ret = vfio_group_add_container_user(device->group);

I'm having trouble guessing why we need to refcount the group to add a
notifier to the device's notifier chain?

I suppose it actually has to do with the MMIO mapping? But I don't
know what the relation is between MMIO mappings in the IOMMU and the
container? This could deserve a comment?

> +void vfio_device_unregister_notifier(struct vfio_device *device,
> + struct notifier_block *nb)
> +{
> + if (!srcu_notifier_chain_unregister(&device->notifier, nb))
> + vfio_group_try_dissolve_container(device->group);
> +}
> +EXPORT_SYMBOL_GPL(vfio_device_unregister_notifier);

Is the SRCU still needed with the new locking? With a cursory look I
only noticed this called under the reflck->lock ?

Jason

2021-03-09 01:09:20

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 12/14] vfio/type1: Support batching of device mappings

On Mon, Mar 08, 2021 at 02:49:31PM -0700, Alex Williamson wrote:
> Populate the page array to the extent available to enable batching.
>
> Signed-off-by: Alex Williamson <[email protected]>
> drivers/vfio/vfio_iommu_type1.c | 10 +++++++++-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index e89f11141dee..d499bccfbe3f 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -628,6 +628,8 @@ static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,
> vma = find_vma_intersection(mm, vaddr, vaddr + 1);
>
> if (vma && vma->vm_flags & VM_PFNMAP) {
> + unsigned long count, i;
> +
> if ((dma->prot & IOMMU_WRITE && !(vma->vm_flags & VM_WRITE)) ||
> (dma->prot & IOMMU_READ && !(vma->vm_flags & VM_READ))) {
> ret = -EFAULT;
> @@ -678,7 +680,13 @@ static int vaddr_get_pfns(struct vfio_iommu *iommu, struct vfio_dma *dma,
>
> *pfn = ((vaddr - vma->vm_start) >> PAGE_SHIFT) +
> dma->pfnmap->base_pfn;
> - ret = 1;
> + count = min_t(long,
> + (vma->vm_end - vaddr) >> PAGE_SHIFT, npages);
> +
> + for (i = 0; i < count; i++)
> + pages[i] = pfn_to_page(*pfn + i);

This isn't safe, we can't pass a VM_PFNMAP pfn into pfn_to_page(). The
whole api here with the batch should be using pfns not struct pages

Also.. this is not nice at all:

static int put_pfn(unsigned long pfn, int prot)
{
if (!is_invalid_reserved_pfn(pfn)) {
struct page *page = pfn_to_page(pfn);

unpin_user_pages_dirty_lock(&page, 1, prot & IOMMU_WRITE);

The manner in which the PFN was obtained should be tracked internally
to VFIO, not deduced externally by the pfn type. *only* pages returned
by pin_user_pages() should be used with unpin_user_pages() - the other
stuff must be kept distinct.

This is actually another bug with the way things are today, as if the
user gets a PFNMAP VMA that happens to point to a struct page (eg a
MIXEDMAP, these things exist in the kernel), the unpin will explode
when it gets here.

Something like what hmm_range_fault() does where the high bits of the
pfn encode information about it (there is always PAGE_SHIFT high bits
available for use) is much cleaner/safer.

Jason

2021-03-09 01:09:23

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 00/14] vfio: Device memory DMA mapping improvements

On Mon, Mar 08, 2021 at 02:47:16PM -0700, Alex Williamson wrote:
> The primary goal of this series is to better manage device memory
> mappings, both with a much simplified scheme to zap CPU mappings of
> device memory using unmap_mapping_range() and also to restrict IOMMU
> mappings of PFNMAPs to vfio device memory and drop those mappings on
> device release. This series updates vfio-pci to include the necessary
> vma-to-pfn interface, allowing the type1 IOMMU backend to recognize
> vfio device memory. If other bus drivers support peer-to-peer DMA,
> they should be updated with a similar callback and trigger the device
> notifier on release.

I only skimmed it for now, but it looks pretty good

I need to sit down and take a very careful look at all of this, my
rdma address_space patch included..

Jason

2021-03-09 01:09:38

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 13/14] vfio: Remove extern from declarations across vfio

On Mon, Mar 08, 2021 at 02:49:42PM -0700, Alex Williamson wrote:
> Cleanup disrecommended usage and docs.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> Documentation/driver-api/vfio-mediated-device.rst | 19 ++-
> Documentation/driver-api/vfio.rst | 4 -
> drivers/s390/cio/vfio_ccw_cp.h | 13 +-
> drivers/s390/cio/vfio_ccw_private.h | 14 +-
> drivers/s390/crypto/vfio_ap_private.h | 2
> drivers/vfio/fsl-mc/vfio_fsl_mc_private.h | 7 +
> drivers/vfio/pci/vfio_pci_private.h | 66 +++++------
> drivers/vfio/platform/vfio_platform_private.h | 31 +++--
> include/linux/vfio.h | 122 ++++++++++-----------
> 9 files changed, 130 insertions(+), 148 deletions(-)

I'm glad to see this, could you apply it instead of waiting for the
other patches?

Reviewed-by: Jason Gunthorpe <[email protected]>

Jason

2021-03-09 01:12:05

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 14/14] vfio: Cleanup use of bare unsigned

On Mon, Mar 08, 2021 at 02:49:49PM -0700, Alex Williamson wrote:
> Replace with 'unsigned int'.
>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 42 ++++++++++++++-----------
> drivers/vfio/pci/vfio_pci_private.h | 4 +-
> drivers/vfio/platform/vfio_platform_irq.c | 21 +++++++------
> drivers/vfio/platform/vfio_platform_private.h | 4 +-
> 4 files changed, 39 insertions(+), 32 deletions(-)

Indeed, checkpatch has been warning about this too

Reviewed-by: Jason Gunthorpe <[email protected]>

Again it would be nice to apply it

Jason

2021-03-09 08:32:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 14/14] vfio: Cleanup use of bare unsigned

On Mon, Mar 08, 2021 at 09:07:45PM -0400, Jason Gunthorpe wrote:
> Indeed, checkpatch has been warning about this too

And checkaptch as so often these days is completely full of shit.
There is ansolutely not objective reason against using bare unsigned
except for a weird anal preference of a checkpatch author.

2021-03-09 08:38:26

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device

On Mon, Mar 08, 2021 at 02:47:28PM -0700, Alex Williamson wrote:
> By linking all the device fds we provide to userspace to an
> address space through a new pseudo fs, we can use tools like
> unmap_mapping_range() to zap all vmas associated with a device.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Alex Williamson <[email protected]>

I'd much prefer to just piggy back on the anon_inode fs rather than
duplicating this logic all over. Something like the trivial patch below
should be all that is needed:

diff --git a/fs/anon_inodes.c b/fs/anon_inodes.c
index a280156138ed89..6fe404aab0f0dd 100644
--- a/fs/anon_inodes.c
+++ b/fs/anon_inodes.c
@@ -225,6 +225,12 @@ int anon_inode_getfd_secure(const char *name, const struct file_operations *fops
}
EXPORT_SYMBOL_GPL(anon_inode_getfd_secure);

+struct inode *alloc_anon_inode_simple(void)
+{
+ return alloc_anon_inode(anon_inode_mnt->mnt_sb);
+}
+EXPORT_SYMBOL_GPL(alloc_anon_inode_simple);
+
static int __init anon_inode_init(void)
{
anon_inode_mnt = kern_mount(&anon_inode_fs_type);
diff --git a/include/linux/anon_inodes.h b/include/linux/anon_inodes.h
index 71881a2b6f7860..6b2fb7d0abc57a 100644
--- a/include/linux/anon_inodes.h
+++ b/include/linux/anon_inodes.h
@@ -21,6 +21,7 @@ int anon_inode_getfd_secure(const char *name,
const struct file_operations *fops,
void *priv, int flags,
const struct inode *context_inode);
+struct inode *alloc_anon_inode_simple(void);

#endif /* _LINUX_ANON_INODES_H */

2021-03-09 15:47:19

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v1 07/14] vfio: Add a device notifier interface

On Mon, 8 Mar 2021 20:46:27 -0400
Jason Gunthorpe <[email protected]> wrote:

> On Mon, Mar 08, 2021 at 02:48:30PM -0700, Alex Williamson wrote:
> > Using a vfio device, a notifier block can be registered to receive
> > select device events. Notifiers can only be registered for contained
> > devices, ie. they are available through a user context. Registration
> > of a notifier increments the reference to that container context
> > therefore notifiers must minimally respond to the release event by
> > asynchronously removing notifiers.
> >
> > Signed-off-by: Alex Williamson <[email protected]>
> > drivers/vfio/Kconfig | 1 +
> > drivers/vfio/vfio.c | 35 +++++++++++++++++++++++++++++++++++
> > include/linux/vfio.h | 9 +++++++++
> > 3 files changed, 45 insertions(+)
> >
> > diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
> > index 90c0525b1e0c..9a67675c9b6c 100644
> > +++ b/drivers/vfio/Kconfig
> > @@ -23,6 +23,7 @@ menuconfig VFIO
> > tristate "VFIO Non-Privileged userspace driver framework"
> > select IOMMU_API
> > select VFIO_IOMMU_TYPE1 if (X86 || S390 || ARM || ARM64)
> > + select SRCU
> > help
> > VFIO provides a framework for secure userspace device drivers.
> > See Documentation/driver-api/vfio.rst for more details.
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
> > index c47895539a1a..7f6d00e54e83 100644
> > +++ b/drivers/vfio/vfio.c
> > @@ -105,6 +105,7 @@ struct vfio_device {
> > struct list_head group_next;
> > void *device_data;
> > struct inode *inode;
> > + struct srcu_notifier_head notifier;
> > };
> >
> > #ifdef CONFIG_VFIO_NOIOMMU
> > @@ -601,6 +602,7 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
> > device->ops = ops;
> > device->device_data = device_data;
> > dev_set_drvdata(dev, device);
> > + srcu_init_notifier_head(&device->notifier);
> >
> > /* No need to get group_lock, caller has group reference */
> > vfio_group_get(group);
> > @@ -1785,6 +1787,39 @@ static const struct file_operations vfio_device_fops = {
> > .mmap = vfio_device_fops_mmap,
> > };
> >
> > +int vfio_device_register_notifier(struct vfio_device *device,
> > + struct notifier_block *nb)
> > +{
> > + int ret;
> > +
> > + /* Container ref persists until unregister on success */
> > + ret = vfio_group_add_container_user(device->group);
>
> I'm having trouble guessing why we need to refcount the group to add a
> notifier to the device's notifier chain?
>
> I suppose it actually has to do with the MMIO mapping? But I don't
> know what the relation is between MMIO mappings in the IOMMU and the
> container? This could deserve a comment?

Sure, I can add a comment. We want to make sure the device remains
within an IOMMU context so long as we have a DMA mapping to the device
MMIO, which could potentially manipulate the device. IOMMU context is
managed a the group level.

> > +void vfio_device_unregister_notifier(struct vfio_device *device,
> > + struct notifier_block *nb)
> > +{
> > + if (!srcu_notifier_chain_unregister(&device->notifier, nb))
> > + vfio_group_try_dissolve_container(device->group);
> > +}
> > +EXPORT_SYMBOL_GPL(vfio_device_unregister_notifier);
>
> Is the SRCU still needed with the new locking? With a cursory look I
> only noticed this called under the reflck->lock ?

When registering the notifier, the iommu->lock is held. During the
callback, the same lock is acquired, so we'd have AB-BA otherwise.
Thanks,

Alex

2021-03-09 16:49:34

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 07/14] vfio: Add a device notifier interface

On Tue, Mar 09, 2021 at 08:45:13AM -0700, Alex Williamson wrote:

> > I'm having trouble guessing why we need to refcount the group to add a
> > notifier to the device's notifier chain?
> >
> > I suppose it actually has to do with the MMIO mapping? But I don't
> > know what the relation is between MMIO mappings in the IOMMU and the
> > container? This could deserve a comment?
>
> Sure, I can add a comment. We want to make sure the device remains
> within an IOMMU context so long as we have a DMA mapping to the device
> MMIO, which could potentially manipulate the device. IOMMU context is
> managed a the group level.

I find refcounting is easier to understand if the refcount inc/dec is
near the thing that is actually using the object - so I'd suggest to
move this to the iommu code.

A comment sounds like a good idea since this is security sensitive

Jason

2021-03-10 07:58:41

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 07/14] vfio: Add a device notifier interface

On Mon, Mar 08, 2021 at 02:48:30PM -0700, Alex Williamson wrote:
> Using a vfio device, a notifier block can be registered to receive
> select device events. Notifiers can only be registered for contained
> devices, ie. they are available through a user context. Registration
> of a notifier increments the reference to that container context
> therefore notifiers must minimally respond to the release event by
> asynchronously removing notifiers.

Notifiers generally are a horrible multiplexed API. Can't we just
add a proper method table for the intended communication channel?

2021-03-10 08:04:53

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 09/14] vfio/type1: Refactor pfn_list clearing

> +/* Return 1 if iommu->lock dropped and notified, 0 if done */

A bool would be more useful for that pattern.

> +static int unmap_dma_pfn_list(struct vfio_iommu *iommu, struct vfio_dma *dma,
> + struct vfio_dma **dma_last, int *retries)
> +{
> + if (!RB_EMPTY_ROOT(&dma->pfn_list)) {

Just return early when it is empty to remove a level of indentation for
the whole function?

> + struct vfio_iommu_type1_dma_unmap nb_unmap;
> +
> + if (*dma_last == dma) {
> + BUG_ON(++(*retries) > 10);
> + } else {
> + *dma_last = dma;
> + *retries = 0;
> + }
> +
> + nb_unmap.iova = dma->iova;
> + nb_unmap.size = dma->size;
> +
> + /*
> + * Notify anyone (mdev vendor drivers) to invalidate and
> + * unmap iovas within the range we're about to unmap.
> + * Vendor drivers MUST unpin pages in response to an
> + * invalidation.
> + */
> + mutex_unlock(&iommu->lock);
> + blocking_notifier_call_chain(&iommu->notifier,
> + VFIO_IOMMU_NOTIFY_DMA_UNMAP,
> + &nb_unmap);
> + return 1;

Conditional locking is a horrible pattern. I'd rather only factor the
code until before the unlock into the helper, and then leave the
unlock and notify to the caller to avoid that anti-pattern.

Also vendor driver isn't really Linux terminology for a subdriver,
so I'd suggest to switch to something else while you're at it.

2021-03-10 08:08:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 11/14] vfio/type1: Register device notifier

> +
> + if (!dma->pfnmap) {
> + struct vfio_device *device;
> + unsigned long base_pfn;
> + struct pfnmap_obj *pfnmap;

Please factor this whole block into a separate helper to keep it
readable.

2021-03-19 22:29:37

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v1 07/14] vfio: Add a device notifier interface

On Wed, 10 Mar 2021 07:56:39 +0000
Christoph Hellwig <[email protected]> wrote:

> On Mon, Mar 08, 2021 at 02:48:30PM -0700, Alex Williamson wrote:
> > Using a vfio device, a notifier block can be registered to receive
> > select device events. Notifiers can only be registered for contained
> > devices, ie. they are available through a user context. Registration
> > of a notifier increments the reference to that container context
> > therefore notifiers must minimally respond to the release event by
> > asynchronously removing notifiers.
>
> Notifiers generally are a horrible multiplexed API. Can't we just
> add a proper method table for the intended communication channel?

I've been trying to figure out how, but I think not. A user can have
multiple devices, each with entirely separate IOMMU contexts. For each
device, the user can create an mmap of memory to that device and add it
to every other IOMMU context. That enables peer to peer DMA between
all the devices, across all the IOMMU contexts. But each individual
device has no direct reference to any IOMMU context other than its own.
A callback on the IOMMU can't reach those other contexts either, there's
no guarantee those other contexts are necessarily managed via the same
vfio IOMMU backend driver. A notifier is the best I can come up with,
please suggest if you have other ideas. Thanks,

Alex

2021-03-22 15:18:52

by Christoph Hellwig

[permalink] [raw]
Subject: Re: [PATCH v1 07/14] vfio: Add a device notifier interface

On Fri, Mar 19, 2021 at 04:25:40PM -0600, Alex Williamson wrote:
> I've been trying to figure out how, but I think not. A user can have
> multiple devices, each with entirely separate IOMMU contexts. For each
> device, the user can create an mmap of memory to that device and add it
> to every other IOMMU context. That enables peer to peer DMA between
> all the devices, across all the IOMMU contexts. But each individual
> device has no direct reference to any IOMMU context other than its own.
> A callback on the IOMMU can't reach those other contexts either, there's
> no guarantee those other contexts are necessarily managed via the same
> vfio IOMMU backend driver. A notifier is the best I can come up with,
> please suggest if you have other ideas. Thanks,

Indeed, reviewing the set again it seems like we need the notifier
here.

2021-04-09 04:55:46

by Zengtao (B)

[permalink] [raw]
Subject: 答复: [PATCH v1 01/14] vfio: Create vfio_fs _type with inode per device

> -----邮件原件-----
> 发件人: Alex Williamson [mailto:[email protected]]
> 发送时间: 2021年3月9日 5:47
> 收件人: [email protected]
> 抄送: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> 主题: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device
>
> By linking all the device fds we provide to userspace to an address space
> through a new pseudo fs, we can use tools like
> unmap_mapping_range() to zap all vmas associated with a device.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Alex Williamson <[email protected]>
> ---
> drivers/vfio/vfio.c | 54
> +++++++++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 54 insertions(+)
>
> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> 38779e6fd80c..abdf8d52a911 100644
> --- a/drivers/vfio/vfio.c
> +++ b/drivers/vfio/vfio.c
> @@ -32,11 +32,18 @@
> #include <linux/vfio.h>
> #include <linux/wait.h>
> #include <linux/sched/signal.h>
> +#include <linux/pseudo_fs.h>
> +#include <linux/mount.h>
Minor: keep the headers in alphabetical order.

>
> #define DRIVER_VERSION "0.3"
> #define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
> #define DRIVER_DESC "VFIO - User Level meta-driver"
>
> +#define VFIO_MAGIC 0x5646494f /* "VFIO" */
Move to include/uapi/linux/magic.h ?

> +
> +static int vfio_fs_cnt;
> +static struct vfsmount *vfio_fs_mnt;
> +
> static struct vfio {
> struct class *class;
> struct list_head iommu_drivers_list;
> @@ -97,6 +104,7 @@ struct vfio_device {
> struct vfio_group *group;
> struct list_head group_next;
> void *device_data;
> + struct inode *inode;
> };
>
> #ifdef CONFIG_VFIO_NOIOMMU
> @@ -529,6 +537,34 @@ static struct vfio_group
> *vfio_group_get_from_dev(struct device *dev)
> return group;
> }
>
> +static int vfio_fs_init_fs_context(struct fs_context *fc) {
> + return init_pseudo(fc, VFIO_MAGIC) ? 0 : -ENOMEM; }
> +
> +static struct file_system_type vfio_fs_type = {
> + .name = "vfio",
> + .owner = THIS_MODULE,
> + .init_fs_context = vfio_fs_init_fs_context,
> + .kill_sb = kill_anon_super,
> +};
> +
> +static struct inode *vfio_fs_inode_new(void) {
> + struct inode *inode;
> + int ret;
> +
> + ret = simple_pin_fs(&vfio_fs_type, &vfio_fs_mnt, &vfio_fs_cnt);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + inode = alloc_anon_inode(vfio_fs_mnt->mnt_sb);
> + if (IS_ERR(inode))
> + simple_release_fs(&vfio_fs_mnt, &vfio_fs_cnt);
> +
> + return inode;
> +}
> +
> /**
> * Device objects - create, release, get, put, search
> */
> @@ -539,11 +575,19 @@ struct vfio_device
> *vfio_group_create_device(struct vfio_group *group,
> void *device_data)
> {
> struct vfio_device *device;
> + struct inode *inode;
>
> device = kzalloc(sizeof(*device), GFP_KERNEL);
> if (!device)
> return ERR_PTR(-ENOMEM);
>
> + inode = vfio_fs_inode_new();
> + if (IS_ERR(inode)) {
> + kfree(device);
> + return ERR_CAST(inode);
> + }
> + device->inode = inode;
> +
> kref_init(&device->kref);
> device->dev = dev;
> device->group = group;
> @@ -574,6 +618,9 @@ static void vfio_device_release(struct kref *kref)
>
> dev_set_drvdata(device->dev, NULL);
>
> + iput(device->inode);
> + simple_release_fs(&vfio_fs_mnt, &vfio_fs_cnt);
> +
> kfree(device);
>
> /* vfio_del_group_dev may be waiting for this device */ @@ -1488,6
> +1535,13 @@ static int vfio_group_get_device_fd(struct vfio_group *group,
> char *buf)
> */
> filep->f_mode |= (FMODE_LSEEK | FMODE_PREAD | FMODE_PWRITE);
>
> + /*
> + * Use the pseudo fs inode on the device to link all mmaps
> + * to the same address space, allowing us to unmap all vmas
> + * associated to this device using unmap_mapping_range().
> + */
> + filep->f_mapping = device->inode->i_mapping;
> +
> atomic_inc(&group->container_users);
>
> fd_install(ret, filep);

2021-04-09 14:25:29

by Alex Williamson

[permalink] [raw]
Subject: Re: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device

On Fri, 9 Apr 2021 04:54:23 +0000
"Zengtao (B)" <[email protected]> wrote:

> > -----邮件原件-----
> > 发件人: Alex Williamson [mailto:[email protected]]
> > 发送时间: 2021年3月9日 5:47
> > 收件人: [email protected]
> > 抄送: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected]
> > 主题: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device
> >
> > By linking all the device fds we provide to userspace to an address space
> > through a new pseudo fs, we can use tools like
> > unmap_mapping_range() to zap all vmas associated with a device.
> >
> > Suggested-by: Jason Gunthorpe <[email protected]>
> > Signed-off-by: Alex Williamson <[email protected]>
> > ---
> > drivers/vfio/vfio.c | 54
> > +++++++++++++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 54 insertions(+)
> >
> > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > 38779e6fd80c..abdf8d52a911 100644
> > --- a/drivers/vfio/vfio.c
> > +++ b/drivers/vfio/vfio.c
> > @@ -32,11 +32,18 @@
> > #include <linux/vfio.h>
> > #include <linux/wait.h>
> > #include <linux/sched/signal.h>
> > +#include <linux/pseudo_fs.h>
> > +#include <linux/mount.h>
> Minor: keep the headers in alphabetical order.

They started out that way, but various tree-wide changes ignoring that,
and likely oversights on my part as well, has left us with numerous
breaks in that rule already.

> >
> > #define DRIVER_VERSION "0.3"
> > #define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
> > #define DRIVER_DESC "VFIO - User Level meta-driver"
> >
> > +#define VFIO_MAGIC 0x5646494f /* "VFIO" */
> Move to include/uapi/linux/magic.h ?

Hmm, yeah, I suppose it probably should go there. Thanks.

FWIW, I'm still working on a next version of this series, currently
struggling how to handle an arbitrary number of vmas per user DMA
mapping. Thanks,

Alex

2021-04-09 17:33:13

by Jason Gunthorpe

[permalink] [raw]
Subject: Re: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device

On Fri, Apr 09, 2021 at 08:24:00AM -0600, Alex Williamson wrote:

> > > #define DRIVER_VERSION "0.3"
> > > #define DRIVER_AUTHOR "Alex Williamson <[email protected]>"
> > > #define DRIVER_DESC "VFIO - User Level meta-driver"
> > >
> > > +#define VFIO_MAGIC 0x5646494f /* "VFIO" */
> > Move to include/uapi/linux/magic.h ?
>
> Hmm, yeah, I suppose it probably should go there. Thanks.

From my review we haven't been doing that unless it is actually uapi
relavent for some reason (this is not)

In particular with CH having a patch series to eliminate all of these
cases and drop the constants..

Jason

2021-04-12 04:23:44

by Zengtao (B)

[permalink] [raw]
Subject: 答复: [PATCH v1 01/14] vfio: Create vfio_fs_t ype with inode per device

> -----?ʼ?ԭ??-----
> ??????: Jason Gunthorpe [mailto:[email protected]]
> ????ʱ??: 2021??4??10?? 1:32
> ?ռ???: Alex Williamson <[email protected]>
> ????: Zengtao (B) <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected]
> ????: Re: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device
>
> On Fri, Apr 09, 2021 at 08:24:00AM -0600, Alex Williamson wrote:
>
> > > > #define DRIVER_VERSION "0.3"
> > > > #define DRIVER_AUTHOR "Alex Williamson
> <[email protected]>"
> > > > #define DRIVER_DESC "VFIO - User Level meta-driver"
> > > >
> > > > +#define VFIO_MAGIC 0x5646494f /* "VFIO" */
> > > Move to include/uapi/linux/magic.h ?
> >
> > Hmm, yeah, I suppose it probably should go there. Thanks.
>
> From my review we haven't been doing that unless it is actually uapi relavent
> for some reason (this is not)
>
Yes, it's not uapi relavent, and I still think it's better to put magic
in a single header, and currently not all micros in magic.h are for
uapi, some work should be done for that, but no ideas now, :)

> In particular with CH having a patch series to eliminate all of these cases and
> drop the constants..
>
Interested, could you share the links for that?

Thanks
Zengtao

> Jason

2021-04-12 04:23:44

by Zengtao (B)

[permalink] [raw]
Subject: 答复: [PATCH v1 01/14] vfio: Create vfio_fs _type with inode per device

> -----邮件原件-----
> 发件人: Alex Williamson [mailto:[email protected]]
> 发送时间: 2021年4月9日 22:24
> 收件人: Zengtao (B) <[email protected]>
> 抄送: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected]
> 主题: Re: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device
>
> On Fri, 9 Apr 2021 04:54:23 +0000
> "Zengtao (B)" <[email protected]> wrote:
>
> > > -----邮件原件-----
> > > 发件人: Alex Williamson [mailto:[email protected]]
> > > 发送时间: 2021年3月9日 5:47
> > > 收件人: [email protected]
> > > 抄送: [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected]
> > > 主题: [PATCH v1 01/14] vfio: Create vfio_fs_type with inode per device
> > >
> > > By linking all the device fds we provide to userspace to an address
> > > space through a new pseudo fs, we can use tools like
> > > unmap_mapping_range() to zap all vmas associated with a device.
> > >
> > > Suggested-by: Jason Gunthorpe <[email protected]>
> > > Signed-off-by: Alex Williamson <[email protected]>
> > > ---
> > > drivers/vfio/vfio.c | 54
> > > +++++++++++++++++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 54 insertions(+)
> > >
> > > diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c index
> > > 38779e6fd80c..abdf8d52a911 100644
> > > --- a/drivers/vfio/vfio.c
> > > +++ b/drivers/vfio/vfio.c
> > > @@ -32,11 +32,18 @@
> > > #include <linux/vfio.h>
> > > #include <linux/wait.h>
> > > #include <linux/sched/signal.h>
> > > +#include <linux/pseudo_fs.h>
> > > +#include <linux/mount.h>
> > Minor: keep the headers in alphabetical order.
>
> They started out that way, but various tree-wide changes ignoring that, and
> likely oversights on my part as well, has left us with numerous breaks in that
> rule already.
>
> > >
> > > #define DRIVER_VERSION "0.3"
> > > #define DRIVER_AUTHOR "Alex Williamson
> <[email protected]>"
> > > #define DRIVER_DESC "VFIO - User Level meta-driver"
> > >
> > > +#define VFIO_MAGIC 0x5646494f /* "VFIO" */
> > Move to include/uapi/linux/magic.h ?
>
> Hmm, yeah, I suppose it probably should go there. Thanks.
>
> FWIW, I'm still working on a next version of this series, currently struggling
> how to handle an arbitrary number of vmas per user DMA mapping. Thanks,
>

Will do some testing on our platform, and want to make sure the issue
I reported is included :
https://patchwork.kernel.org/project/kvm/patch/[email protected]/

Thanks
zengtao


> Alex