2023-02-06 09:05:40

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 00/14] Add vfio_device cdev for iommufd support

Existing VFIO provides group-centric user APIs for userspace. Userspace
opens the /dev/vfio/$group_id first before getting device fd and hence
getting access to device. This is not the desired model for iommufd. Per
the conclusion of community discussion[1], iommufd provides device-centric
kAPIs and requires its consumer (like VFIO) to be device-centric user
APIs. Such user APIs are used to associate device with iommufd and also
the I/O address spaces managed by the iommufd.

This series first introduces a per device file structure to be prepared
for further enhancement and refactors the kvm-vfio code to be prepared
for accepting device file from userspace. Then refactors the vfio to be
able to handle iommufd binding. This refactor includes the mechanism of
blocking device access before iommufd bind, making vfio_device_open() be
exclusive between the group path and the cdev path. Eventually, adds the
cdev support for vfio device, and makes group infrastructure optional as
it is not needed when vfio device cdev is compiled.

This is also a base for further support iommu nesting for vfio device[2].

The complete code can be found in below branch, simple test done with the
legacy group path and the cdev path. Draft QEMU branch can be found at[3]

https://github.com/yiliu1765/iommufd/tree/vfio_device_cdev_v2
(config CONFIG_IOMMUFD=y CONFIG_VFIO_DEVICE_CDEV=y)

[1] https://lore.kernel.org/kvm/BN9PR11MB5433B1E4AE5B0480369F97178C189@BN9PR11MB5433.namprd11.prod.outlook.com/
[2] https://github.com/yiliu1765/iommufd/tree/wip/iommufd-v6.2-rc4-nesting
[3] https://github.com/yiliu1765/qemu/tree/iommufd_rfcv3 (it is based on Eric's
QEMU iommufd rfcv3 (https://lore.kernel.org/kvm/[email protected]/)
plus two commits to align with vfio_device_cdev v2)

Change log:

v2:
- Add r-b from Kevin and Eric on patch 01 02 04.
- "Split kvm/vfio: Provide struct kvm_device_ops::release() insted of ::destroy()"
from this series and got applied. (Alex, Kevin, Jason, Mathhew)
- Add kvm_ref_lock to protect vfio_device_file->kvm instead of reusing
dev_set->lock as dead-lock is observed with vfio-ap which would try to
acquire kvm_lock. This is opposite lock order with kvm_device_release()
which holds kvm_lock first and then hold dev_set->lock. (Kevin)
- Use a separate ioctl for detaching IOAS. (Alex)
- Rename vfio_device_file::single_open to be is_cdev_device (Kevin, Alex)
- Move the vfio device cdev code into device_cdev.c and add a VFIO_DEVICE_CDEV
kconfig for it. (Kevin, Jason)

v1: https://lore.kernel.org/kvm/[email protected]/
- Fix the circular refcount between kvm struct and device file reference. (JasonG)
- Address comments from KevinT
- Remained the ioctl for detach, needs to Alex's taste
(https://lore.kernel.org/kvm/BN9PR11MB5276BE9F4B0613EE859317028CFF9@BN9PR11MB5276.namprd11.prod.outlook.com/)

rfc: https://lore.kernel.org/kvm/[email protected]/

Thanks,
Yi Liu

Yi Liu (14):
vfio: Allocate per device file structure
vfio: Refine vfio file kAPIs
vfio: Accept vfio device file in the driver facing kAPI
kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device
fd
kvm/vfio: Accept vfio device file from userspace
vfio: Pass struct vfio_device_file * to vfio_device_open/close()
vfio: Block device access via device fd until device is opened
vfio: Add infrastructure for bind_iommufd from userspace
vfio-iommufd: Add detach_ioas support for physical VFIO devices
vfio-iommufd: Add detach_ioas for emulated VFIO devices
vfio: Make vfio_device_open() exclusive between group path and device
cdev path
vfio: Add cdev for vfio_device
vfio: Add ioctls for device cdev using iommufd
vfio: Compile group optionally

Documentation/driver-api/vfio.rst | 8 +-
Documentation/virt/kvm/devices/vfio.rst | 40 ++-
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 +
drivers/s390/cio/vfio_ccw_ops.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 1 +
drivers/vfio/Kconfig | 29 ++
drivers/vfio/Makefile | 3 +-
drivers/vfio/device_cdev.c | 240 ++++++++++++++
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 1 +
drivers/vfio/group.c | 117 +++----
drivers/vfio/iommufd.c | 81 +++--
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 2 +
drivers/vfio/pci/mlx5/main.c | 1 +
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/pci/vfio_pci_core.c | 4 +-
drivers/vfio/platform/vfio_amba.c | 1 +
drivers/vfio/platform/vfio_platform.c | 1 +
drivers/vfio/vfio.h | 156 ++++++++-
drivers/vfio/vfio_main.c | 312 ++++++++++++++++--
include/linux/iommufd.h | 6 +
include/linux/vfio.h | 24 +-
include/uapi/linux/kvm.h | 16 +-
include/uapi/linux/vfio.h | 86 +++++
virt/kvm/vfio.c | 141 ++++----
24 files changed, 1057 insertions(+), 216 deletions(-)
create mode 100644 drivers/vfio/device_cdev.c

--
2.34.1



2023-02-06 09:05:49

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 02/14] vfio: Refine vfio file kAPIs

This prepares for making the below kAPIs to accept both group file
and device file instead of only vfio group file.

bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device);

Besides above change, vfio_file_is_group() is renamed to be
vfio_file_is_valid().

Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
---
drivers/vfio/group.c | 74 ++++++++------------------------
drivers/vfio/pci/vfio_pci_core.c | 4 +-
drivers/vfio/vfio.h | 4 ++
drivers/vfio/vfio_main.c | 62 ++++++++++++++++++++++++++
include/linux/vfio.h | 2 +-
virt/kvm/vfio.c | 10 ++---
6 files changed, 92 insertions(+), 64 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index cf51e1a0fd96..cc0eded19a9f 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -751,6 +751,15 @@ bool vfio_device_has_container(struct vfio_device *device)
return device->group->container;
}

+struct vfio_group *vfio_group_from_file(struct file *file)
+{
+ struct vfio_group *group = file->private_data;
+
+ if (file->f_op != &vfio_group_fops)
+ return NULL;
+ return group;
+}
+
/**
* vfio_file_iommu_group - Return the struct iommu_group for the vfio group file
* @file: VFIO group file
@@ -761,13 +770,13 @@ bool vfio_device_has_container(struct vfio_device *device)
*/
struct iommu_group *vfio_file_iommu_group(struct file *file)
{
- struct vfio_group *group = file->private_data;
+ struct vfio_group *group = vfio_group_from_file(file);
struct iommu_group *iommu_group = NULL;

if (!IS_ENABLED(CONFIG_SPAPR_TCE_IOMMU))
return NULL;

- if (!vfio_file_is_group(file))
+ if (!group)
return NULL;

mutex_lock(&group->group_lock);
@@ -780,34 +789,11 @@ struct iommu_group *vfio_file_iommu_group(struct file *file)
}
EXPORT_SYMBOL_GPL(vfio_file_iommu_group);

-/**
- * vfio_file_is_group - True if the file is usable with VFIO aPIS
- * @file: VFIO group file
- */
-bool vfio_file_is_group(struct file *file)
-{
- return file->f_op == &vfio_group_fops;
-}
-EXPORT_SYMBOL_GPL(vfio_file_is_group);
-
-/**
- * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
- * is always CPU cache coherent
- * @file: VFIO group file
- *
- * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
- * bit in DMA transactions. A return of false indicates that the user has
- * rights to access additional instructions such as wbinvd on x86.
- */
-bool vfio_file_enforced_coherent(struct file *file)
+bool vfio_group_enforced_coherent(struct vfio_group *group)
{
- struct vfio_group *group = file->private_data;
struct vfio_device *device;
bool ret = true;

- if (!vfio_file_is_group(file))
- return true;
-
/*
* If the device does not have IOMMU_CAP_ENFORCE_CACHE_COHERENCY then
* any domain later attached to it will also not support it. If the cap
@@ -825,46 +811,22 @@ bool vfio_file_enforced_coherent(struct file *file)
mutex_unlock(&group->device_lock);
return ret;
}
-EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);

-/**
- * vfio_file_set_kvm - Link a kvm with VFIO drivers
- * @file: VFIO group file
- * @kvm: KVM to link
- *
- * When a VFIO device is first opened the KVM will be available in
- * device->kvm if one was associated with the group.
- */
-void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
{
- struct vfio_group *group = file->private_data;
-
- if (!vfio_file_is_group(file))
- return;
-
+ /*
+ * When a VFIO device is first opened the KVM will be available in
+ * device->kvm if one was associated with the group.
+ */
spin_lock(&group->kvm_ref_lock);
group->kvm = kvm;
spin_unlock(&group->kvm_ref_lock);
}
-EXPORT_SYMBOL_GPL(vfio_file_set_kvm);

-/**
- * vfio_file_has_dev - True if the VFIO file is a handle for device
- * @file: VFIO file to check
- * @device: Device that must be part of the file
- *
- * Returns true if given file has permission to manipulate the given device.
- */
-bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device)
{
- struct vfio_group *group = file->private_data;
-
- if (!vfio_file_is_group(file))
- return false;
-
return group == device->group;
}
-EXPORT_SYMBOL_GPL(vfio_file_has_dev);

static char *vfio_devnode(const struct device *dev, umode_t *mode)
{
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index a6492a25ff6a..4704c1babae3 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1320,8 +1320,8 @@ static int vfio_pci_ioctl_pci_hot_reset(struct vfio_pci_core_device *vdev,
break;
}

- /* Ensure the FD is a vfio group FD.*/
- if (!vfio_file_is_group(file)) {
+ /* Ensure the FD is a vfio FD.*/
+ if (!vfio_file_is_valid(file)) {
fput(file);
ret = -EINVAL;
break;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c05a6d3b7a73..54195b107b45 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -90,6 +90,10 @@ void vfio_device_group_unregister(struct vfio_device *device);
int vfio_device_group_use_iommu(struct vfio_device *device);
void vfio_device_group_unuse_iommu(struct vfio_device *device);
void vfio_device_group_close(struct vfio_device *device);
+struct vfio_group *vfio_group_from_file(struct file *file);
+bool vfio_group_enforced_coherent(struct vfio_group *group);
+void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
+bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index d99fa0cec18e..8612ba112e7f 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -1167,6 +1167,68 @@ const struct file_operations vfio_device_fops = {
.mmap = vfio_device_fops_mmap,
};

+/**
+ * vfio_file_is_valid - True if the file is usable with VFIO APIS
+ * @file: VFIO group file or VFIO device file
+ */
+bool vfio_file_is_valid(struct file *file)
+{
+ return vfio_group_from_file(file);
+}
+EXPORT_SYMBOL_GPL(vfio_file_is_valid);
+
+/**
+ * vfio_file_enforced_coherent - True if the DMA associated with the VFIO file
+ * is always CPU cache coherent
+ * @file: VFIO group file or VFIO device file
+ *
+ * Enforced coherency means that the IOMMU ignores things like the PCIe no-snoop
+ * bit in DMA transactions. A return of false indicates that the user has
+ * rights to access additional instructions such as wbinvd on x86.
+ */
+bool vfio_file_enforced_coherent(struct file *file)
+{
+ struct vfio_group *group = vfio_group_from_file(file);
+
+ if (group)
+ return vfio_group_enforced_coherent(group);
+
+ return true;
+}
+EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
+
+/**
+ * vfio_file_set_kvm - Link a kvm with VFIO drivers
+ * @file: VFIO group file or VFIO device file
+ * @kvm: KVM to link
+ *
+ */
+void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+ struct vfio_group *group = vfio_group_from_file(file);
+
+ if (group)
+ vfio_group_set_kvm(group, kvm);
+}
+EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
+
+/**
+ * vfio_file_has_dev - True if the VFIO file is a handle for device
+ * @file: VFIO file to check, VFIO group file or VFIO device file
+ * @device: Device that must be part of the file
+ *
+ * Returns true if given file has permission to manipulate the given device.
+ */
+bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
+{
+ struct vfio_group *group = vfio_group_from_file(file);
+
+ if (group)
+ return vfio_group_has_dev(group, device);
+ return false;
+}
+EXPORT_SYMBOL_GPL(vfio_file_has_dev);
+
/*
* Sub-module support
*/
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 93134b023968..6a07e1c6c38e 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -245,7 +245,7 @@ int vfio_mig_get_next_state(struct vfio_device *device,
* External user API
*/
struct iommu_group *vfio_file_iommu_group(struct file *file);
-bool vfio_file_is_group(struct file *file);
+bool vfio_file_is_valid(struct file *file);
bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
bool vfio_file_has_dev(struct file *file, struct vfio_device *device);
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 9584eb57e0ed..8bac308ba630 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -64,18 +64,18 @@ static bool kvm_vfio_file_enforced_coherent(struct file *file)
return ret;
}

-static bool kvm_vfio_file_is_group(struct file *file)
+static bool kvm_vfio_file_is_valid(struct file *file)
{
bool (*fn)(struct file *file);
bool ret;

- fn = symbol_get(vfio_file_is_group);
+ fn = symbol_get(vfio_file_is_valid);
if (!fn)
return false;

ret = fn(file);

- symbol_put(vfio_file_is_group);
+ symbol_put(vfio_file_is_valid);

return ret;
}
@@ -154,8 +154,8 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
if (!filp)
return -EBADF;

- /* Ensure the FD is a vfio group FD.*/
- if (!kvm_vfio_file_is_group(filp)) {
+ /* Ensure the FD is a vfio FD.*/
+ if (!kvm_vfio_file_is_valid(filp)) {
ret = -EINVAL;
goto err_fput;
}
--
2.34.1


2023-02-06 09:05:52

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 01/14] vfio: Allocate per device file structure

This is preparation for adding vfio device cdev support. vfio device
cdev requires:
1) a per device file memory to store the kvm pointer set by KVM. It will
be propagated to vfio_device:kvm after the device cdev file is bound
to an iommufd
2) a mechanism to block device access through device cdev fd before it
is bound to an iommufd

To address above requirements, this adds a per device file structure
named vfio_device_file. For now, it's only a wrapper of struct vfio_device
pointer. Other fields will be added to this per file structure in future
commits.

Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
---
drivers/vfio/group.c | 13 +++++++++++--
drivers/vfio/vfio.h | 6 ++++++
drivers/vfio/vfio_main.c | 31 ++++++++++++++++++++++++++-----
3 files changed, 43 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 0e9036e2b9c4..cf51e1a0fd96 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -215,19 +215,26 @@ void vfio_device_group_close(struct vfio_device *device)

static struct file *vfio_device_open_file(struct vfio_device *device)
{
+ struct vfio_device_file *df;
struct file *filep;
int ret;

+ df = vfio_allocate_device_file(device);
+ if (IS_ERR(df)) {
+ ret = PTR_ERR(df);
+ goto err_out;
+ }
+
ret = vfio_device_group_open(device);
if (ret)
- goto err_out;
+ goto err_free;

/*
* We can't use anon_inode_getfd() because we need to modify
* the f_mode flags directly to allow more than just ioctls
*/
filep = anon_inode_getfile("[vfio-device]", &vfio_device_fops,
- device, O_RDWR);
+ df, O_RDWR);
if (IS_ERR(filep)) {
ret = PTR_ERR(filep);
goto err_close_device;
@@ -251,6 +258,8 @@ static struct file *vfio_device_open_file(struct vfio_device *device)

err_close_device:
vfio_device_group_close(device);
+err_free:
+ kfree(df);
err_out:
return ERR_PTR(ret);
}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 4f39ab549a80..c05a6d3b7a73 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -16,11 +16,17 @@ struct iommu_group;
struct vfio_device;
struct vfio_container;

+struct vfio_device_file {
+ struct vfio_device *device;
+};
+
void vfio_device_put_registration(struct vfio_device *device);
bool vfio_device_try_get_registration(struct vfio_device *device);
int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
void vfio_device_close(struct vfio_device *device,
struct iommufd_ctx *iommufd);
+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device);

extern const struct file_operations vfio_device_fops;

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 3a597e799918..d99fa0cec18e 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -396,6 +396,20 @@ static bool vfio_assert_device_open(struct vfio_device *device)
return !WARN_ON_ONCE(!READ_ONCE(device->open_count));
}

+struct vfio_device_file *
+vfio_allocate_device_file(struct vfio_device *device)
+{
+ struct vfio_device_file *df;
+
+ df = kzalloc(sizeof(*df), GFP_KERNEL_ACCOUNT);
+ if (!df)
+ return ERR_PTR(-ENOMEM);
+
+ df->device = device;
+
+ return df;
+}
+
static int vfio_device_first_open(struct vfio_device *device,
struct iommufd_ctx *iommufd)
{
@@ -509,12 +523,15 @@ static inline void vfio_device_pm_runtime_put(struct vfio_device *device)
*/
static int vfio_device_fops_release(struct inode *inode, struct file *filep)
{
- struct vfio_device *device = filep->private_data;
+ struct vfio_device_file *df = filep->private_data;
+ struct vfio_device *device = df->device;

vfio_device_group_close(device);

vfio_device_put_registration(device);

+ kfree(df);
+
return 0;
}

@@ -1079,7 +1096,8 @@ static int vfio_ioctl_device_feature(struct vfio_device *device,
static long vfio_device_fops_unl_ioctl(struct file *filep,
unsigned int cmd, unsigned long arg)
{
- struct vfio_device *device = filep->private_data;
+ struct vfio_device_file *df = filep->private_data;
+ struct vfio_device *device = df->device;
int ret;

ret = vfio_device_pm_runtime_get(device);
@@ -1106,7 +1124,8 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
size_t count, loff_t *ppos)
{
- struct vfio_device *device = filep->private_data;
+ struct vfio_device_file *df = filep->private_data;
+ struct vfio_device *device = df->device;

if (unlikely(!device->ops->read))
return -EINVAL;
@@ -1118,7 +1137,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,
const char __user *buf,
size_t count, loff_t *ppos)
{
- struct vfio_device *device = filep->private_data;
+ struct vfio_device_file *df = filep->private_data;
+ struct vfio_device *device = df->device;

if (unlikely(!device->ops->write))
return -EINVAL;
@@ -1128,7 +1148,8 @@ static ssize_t vfio_device_fops_write(struct file *filep,

static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
{
- struct vfio_device *device = filep->private_data;
+ struct vfio_device_file *df = filep->private_data;
+ struct vfio_device *device = df->device;

if (unlikely(!device->ops->mmap))
return -EINVAL;
--
2.34.1


2023-02-06 09:05:55

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 03/14] vfio: Accept vfio device file in the driver facing kAPI

This makes the vfio file kAPIs to accepte vfio device files, also a
preparation for vfio device cdev support.

For the kvm set with vfio device file, kvm pointer is stored in struct
vfio_device_file, and use kvm_ref_lock to protect kvm set and kvm
pointer usage within VFIO. This kvm pointer will be set to vfio_device
after device file is bound to iommufd in the cdev path.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/vfio.h | 2 ++
drivers/vfio/vfio_main.c | 51 ++++++++++++++++++++++++++++++++++++----
2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 54195b107b45..00e6ce6ef4c9 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;

struct vfio_device_file {
struct vfio_device *device;
+ spinlock_t kvm_ref_lock; /* protect kvm field */
+ struct kvm *kvm;
};

void vfio_device_put_registration(struct vfio_device *device);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 8612ba112e7f..c529f609fecc 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -406,6 +406,7 @@ vfio_allocate_device_file(struct vfio_device *device)
return ERR_PTR(-ENOMEM);

df->device = device;
+ spin_lock_init(&df->kvm_ref_lock);

return df;
}
@@ -1167,13 +1168,23 @@ const struct file_operations vfio_device_fops = {
.mmap = vfio_device_fops_mmap,
};

+static struct vfio_device *vfio_device_from_file(struct file *file)
+{
+ struct vfio_device_file *df = file->private_data;
+
+ if (file->f_op != &vfio_device_fops)
+ return NULL;
+ return df->device;
+}
+
/**
* vfio_file_is_valid - True if the file is usable with VFIO APIS
* @file: VFIO group file or VFIO device file
*/
bool vfio_file_is_valid(struct file *file)
{
- return vfio_group_from_file(file);
+ return vfio_group_from_file(file) ||
+ vfio_device_from_file(file);
}
EXPORT_SYMBOL_GPL(vfio_file_is_valid);

@@ -1188,15 +1199,36 @@ EXPORT_SYMBOL_GPL(vfio_file_is_valid);
*/
bool vfio_file_enforced_coherent(struct file *file)
{
- struct vfio_group *group = vfio_group_from_file(file);
+ struct vfio_group *group;
+ struct vfio_device *device;

+ group = vfio_group_from_file(file);
if (group)
return vfio_group_enforced_coherent(group);

+ device = vfio_device_from_file(file);
+ if (device)
+ return device_iommu_capable(device->dev,
+ IOMMU_CAP_ENFORCE_CACHE_COHERENCY);
+
return true;
}
EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);

+static void vfio_device_file_set_kvm(struct file *file, struct kvm *kvm)
+{
+ struct vfio_device_file *df = file->private_data;
+
+ /*
+ * The kvm is first recorded in the vfio_device_file, and will
+ * be propagated to vfio_device::kvm when the file is bound to
+ * iommufd successfully in the vfio device cdev path.
+ */
+ spin_lock(&df->kvm_ref_lock);
+ df->kvm = kvm;
+ spin_unlock(&df->kvm_ref_lock);
+}
+
/**
* vfio_file_set_kvm - Link a kvm with VFIO drivers
* @file: VFIO group file or VFIO device file
@@ -1205,10 +1237,14 @@ EXPORT_SYMBOL_GPL(vfio_file_enforced_coherent);
*/
void vfio_file_set_kvm(struct file *file, struct kvm *kvm)
{
- struct vfio_group *group = vfio_group_from_file(file);
+ struct vfio_group *group;

+ group = vfio_group_from_file(file);
if (group)
vfio_group_set_kvm(group, kvm);
+
+ if (vfio_device_from_file(file))
+ vfio_device_file_set_kvm(file, kvm);
}
EXPORT_SYMBOL_GPL(vfio_file_set_kvm);

@@ -1221,10 +1257,17 @@ EXPORT_SYMBOL_GPL(vfio_file_set_kvm);
*/
bool vfio_file_has_dev(struct file *file, struct vfio_device *device)
{
- struct vfio_group *group = vfio_group_from_file(file);
+ struct vfio_group *group;
+ struct vfio_device *vdev;

+ group = vfio_group_from_file(file);
if (group)
return vfio_group_has_dev(group, device);
+
+ vdev = vfio_device_from_file(file);
+ if (device)
+ return vdev == device;
+
return false;
}
EXPORT_SYMBOL_GPL(vfio_file_has_dev);
--
2.34.1


2023-02-06 09:05:58

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 04/14] kvm/vfio: Rename kvm_vfio_group to prepare for accepting vfio device fd

Meanwhile, rename related helpers. No functional change is intended.

Signed-off-by: Yi Liu <[email protected]>
Reviewed-by: Kevin Tian <[email protected]>
Reviewed-by: Eric Auger <[email protected]>
---
virt/kvm/vfio.c | 115 ++++++++++++++++++++++++------------------------
1 file changed, 58 insertions(+), 57 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 8bac308ba630..857d6ba349e1 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -21,7 +21,7 @@
#include <asm/kvm_ppc.h>
#endif

-struct kvm_vfio_group {
+struct kvm_vfio_file {
struct list_head node;
struct file *file;
#ifdef CONFIG_SPAPR_TCE_IOMMU
@@ -30,7 +30,7 @@ struct kvm_vfio_group {
};

struct kvm_vfio {
- struct list_head group_list;
+ struct list_head file_list;
struct mutex lock;
bool noncoherent;
};
@@ -98,34 +98,35 @@ static struct iommu_group *kvm_vfio_file_iommu_group(struct file *file)
}

static void kvm_spapr_tce_release_vfio_group(struct kvm *kvm,
- struct kvm_vfio_group *kvg)
+ struct kvm_vfio_file *kvf)
{
- if (WARN_ON_ONCE(!kvg->iommu_group))
+ if (WARN_ON_ONCE(!kvf->iommu_group))
return;

- kvm_spapr_tce_release_iommu_group(kvm, kvg->iommu_group);
- iommu_group_put(kvg->iommu_group);
- kvg->iommu_group = NULL;
+ kvm_spapr_tce_release_iommu_group(kvm, kvf->iommu_group);
+ iommu_group_put(kvf->iommu_group);
+ kvf->iommu_group = NULL;
}
#endif

/*
- * Groups can use the same or different IOMMU domains. If the same then
- * adding a new group may change the coherency of groups we've previously
- * been told about. We don't want to care about any of that so we retest
- * each group and bail as soon as we find one that's noncoherent. This
- * means we only ever [un]register_noncoherent_dma once for the whole device.
+ * Groups/devices can use the same or different IOMMU domains. If the same
+ * then adding a new group/device may change the coherency of groups/devices
+ * we've previously been told about. We don't want to care about any of
+ * that so we retest each group/device and bail as soon as we find one that's
+ * noncoherent. This means we only ever [un]register_noncoherent_dma once
+ * for the whole device.
*/
static void kvm_vfio_update_coherency(struct kvm_device *dev)
{
struct kvm_vfio *kv = dev->private;
bool noncoherent = false;
- struct kvm_vfio_group *kvg;
+ struct kvm_vfio_file *kvf;

mutex_lock(&kv->lock);

- list_for_each_entry(kvg, &kv->group_list, node) {
- if (!kvm_vfio_file_enforced_coherent(kvg->file)) {
+ list_for_each_entry(kvf, &kv->file_list, node) {
+ if (!kvm_vfio_file_enforced_coherent(kvf->file)) {
noncoherent = true;
break;
}
@@ -143,10 +144,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
mutex_unlock(&kv->lock);
}

-static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_add(struct kvm_device *dev, unsigned int fd)
{
struct kvm_vfio *kv = dev->private;
- struct kvm_vfio_group *kvg;
+ struct kvm_vfio_file *kvf;
struct file *filp;
int ret;

@@ -162,27 +163,27 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)

mutex_lock(&kv->lock);

- list_for_each_entry(kvg, &kv->group_list, node) {
- if (kvg->file == filp) {
+ list_for_each_entry(kvf, &kv->file_list, node) {
+ if (kvf->file == filp) {
ret = -EEXIST;
goto err_unlock;
}
}

- kvg = kzalloc(sizeof(*kvg), GFP_KERNEL_ACCOUNT);
- if (!kvg) {
+ kvf = kzalloc(sizeof(*kvf), GFP_KERNEL_ACCOUNT);
+ if (!kvf) {
ret = -ENOMEM;
goto err_unlock;
}

- kvg->file = filp;
- list_add_tail(&kvg->node, &kv->group_list);
+ kvf->file = filp;
+ list_add_tail(&kvf->node, &kv->file_list);

kvm_arch_start_assignment(dev->kvm);

mutex_unlock(&kv->lock);

- kvm_vfio_file_set_kvm(kvg->file, dev->kvm);
+ kvm_vfio_file_set_kvm(kvf->file, dev->kvm);
kvm_vfio_update_coherency(dev);

return 0;
@@ -193,10 +194,10 @@ static int kvm_vfio_group_add(struct kvm_device *dev, unsigned int fd)
return ret;
}

-static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
+static int kvm_vfio_file_del(struct kvm_device *dev, unsigned int fd)
{
struct kvm_vfio *kv = dev->private;
- struct kvm_vfio_group *kvg;
+ struct kvm_vfio_file *kvf;
struct fd f;
int ret;

@@ -208,18 +209,18 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)

mutex_lock(&kv->lock);

- list_for_each_entry(kvg, &kv->group_list, node) {
- if (kvg->file != f.file)
+ list_for_each_entry(kvf, &kv->file_list, node) {
+ if (kvf->file != f.file)
continue;

- list_del(&kvg->node);
+ list_del(&kvf->node);
kvm_arch_end_assignment(dev->kvm);
#ifdef CONFIG_SPAPR_TCE_IOMMU
- kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+ kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
#endif
- kvm_vfio_file_set_kvm(kvg->file, NULL);
- fput(kvg->file);
- kfree(kvg);
+ kvm_vfio_file_set_kvm(kvf->file, NULL);
+ fput(kvf->file);
+ kfree(kvf);
ret = 0;
break;
}
@@ -234,12 +235,12 @@ static int kvm_vfio_group_del(struct kvm_device *dev, unsigned int fd)
}

#ifdef CONFIG_SPAPR_TCE_IOMMU
-static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
- void __user *arg)
+static int kvm_vfio_file_set_spapr_tce(struct kvm_device *dev,
+ void __user *arg)
{
struct kvm_vfio_spapr_tce param;
struct kvm_vfio *kv = dev->private;
- struct kvm_vfio_group *kvg;
+ struct kvm_vfio_file *kvf;
struct fd f;
int ret;

@@ -254,20 +255,20 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,

mutex_lock(&kv->lock);

- list_for_each_entry(kvg, &kv->group_list, node) {
- if (kvg->file != f.file)
+ list_for_each_entry(kvf, &kv->file_list, node) {
+ if (kvf->file != f.file)
continue;

- if (!kvg->iommu_group) {
- kvg->iommu_group = kvm_vfio_file_iommu_group(kvg->file);
- if (WARN_ON_ONCE(!kvg->iommu_group)) {
+ if (!kvf->iommu_group) {
+ kvf->iommu_group = kvm_vfio_file_iommu_group(kvf->file);
+ if (WARN_ON_ONCE(!kvf->iommu_group)) {
ret = -EIO;
goto err_fdput;
}
}

ret = kvm_spapr_tce_attach_iommu_group(dev->kvm, param.tablefd,
- kvg->iommu_group);
+ kvf->iommu_group);
break;
}

@@ -278,8 +279,8 @@ static int kvm_vfio_group_set_spapr_tce(struct kvm_device *dev,
}
#endif

-static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
- void __user *arg)
+static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
+ void __user *arg)
{
int32_t __user *argp = arg;
int32_t fd;
@@ -288,16 +289,16 @@ static int kvm_vfio_set_group(struct kvm_device *dev, long attr,
case KVM_DEV_VFIO_GROUP_ADD:
if (get_user(fd, argp))
return -EFAULT;
- return kvm_vfio_group_add(dev, fd);
+ return kvm_vfio_file_add(dev, fd);

case KVM_DEV_VFIO_GROUP_DEL:
if (get_user(fd, argp))
return -EFAULT;
- return kvm_vfio_group_del(dev, fd);
+ return kvm_vfio_file_del(dev, fd);

#ifdef CONFIG_SPAPR_TCE_IOMMU
case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
- return kvm_vfio_group_set_spapr_tce(dev, arg);
+ return kvm_vfio_file_set_spapr_tce(dev, arg);
#endif
}

@@ -309,8 +310,8 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
{
switch (attr->group) {
case KVM_DEV_VFIO_GROUP:
- return kvm_vfio_set_group(dev, attr->attr,
- u64_to_user_ptr(attr->addr));
+ return kvm_vfio_set_file(dev, attr->attr,
+ u64_to_user_ptr(attr->addr));
}

return -ENXIO;
@@ -339,16 +340,16 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
static void kvm_vfio_release(struct kvm_device *dev)
{
struct kvm_vfio *kv = dev->private;
- struct kvm_vfio_group *kvg, *tmp;
+ struct kvm_vfio_file *kvf, *tmp;

- list_for_each_entry_safe(kvg, tmp, &kv->group_list, node) {
+ list_for_each_entry_safe(kvf, tmp, &kv->file_list, node) {
#ifdef CONFIG_SPAPR_TCE_IOMMU
- kvm_spapr_tce_release_vfio_group(dev->kvm, kvg);
+ kvm_spapr_tce_release_vfio_group(dev->kvm, kvf);
#endif
- kvm_vfio_file_set_kvm(kvg->file, NULL);
- fput(kvg->file);
- list_del(&kvg->node);
- kfree(kvg);
+ kvm_vfio_file_set_kvm(kvf->file, NULL);
+ fput(kvf->file);
+ list_del(&kvf->node);
+ kfree(kvf);
kvm_arch_end_assignment(dev->kvm);
}

@@ -382,7 +383,7 @@ static int kvm_vfio_create(struct kvm_device *dev, u32 type)
if (!kv)
return -ENOMEM;

- INIT_LIST_HEAD(&kv->group_list);
+ INIT_LIST_HEAD(&kv->file_list);
mutex_init(&kv->lock);

dev->private = kv;
--
2.34.1


2023-02-06 09:06:10

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 05/14] kvm/vfio: Accept vfio device file from userspace

This defines KVM_DEV_VFIO_FILE* and make alias with KVM_DEV_VFIO_GROUP*.
Old userspace uses KVM_DEV_VFIO_GROUP* works as well.

Signed-off-by: Yi Liu <[email protected]>
---
Documentation/virt/kvm/devices/vfio.rst | 40 ++++++++++++++-----------
include/uapi/linux/kvm.h | 16 +++++++---
virt/kvm/vfio.c | 16 +++++-----
3 files changed, 42 insertions(+), 30 deletions(-)

diff --git a/Documentation/virt/kvm/devices/vfio.rst b/Documentation/virt/kvm/devices/vfio.rst
index 2d20dc561069..7f84ec26ca4a 100644
--- a/Documentation/virt/kvm/devices/vfio.rst
+++ b/Documentation/virt/kvm/devices/vfio.rst
@@ -9,23 +9,26 @@ Device types supported:
- KVM_DEV_TYPE_VFIO

Only one VFIO instance may be created per VM. The created device
-tracks VFIO groups in use by the VM and features of those groups
-important to the correctness and acceleration of the VM. As groups
-are enabled and disabled for use by the VM, KVM should be updated
-about their presence. When registered with KVM, a reference to the
-VFIO-group is held by KVM.
+tracks VFIO files (group or device) in use by the VM and features
+of those groups/devices important to the correctness and acceleration
+of the VM. As groups/devices are enabled and disabled for use by the
+VM, KVM should be updated about their presence. When registered with
+KVM, a reference to the VFIO file is held by KVM.

Groups:
- KVM_DEV_VFIO_GROUP
-
-KVM_DEV_VFIO_GROUP attributes:
- KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device tracking
- kvm_device_attr.addr points to an int32_t file descriptor
- for the VFIO group.
- KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM device tracking
- kvm_device_attr.addr points to an int32_t file descriptor
- for the VFIO group.
- KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE table
+ KVM_DEV_VFIO_FILE
+ - alias: KVM_DEV_VFIO_GROUP
+
+KVM_DEV_VFIO_FILE attributes:
+ KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM device
+ tracking kvm_device_attr.addr points to an int32_t file descriptor
+ for the VFIO file.
+ - alias: KVM_DEV_VFIO_GROUP_ADD
+ KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-KVM
+ device tracking kvm_device_attr.addr points to an int32_t file
+ descriptor for the VFIO file.
+ - alias: KVM_DEV_VFIO_GROUP_DEL
+ KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
allocated by sPAPR KVM.
kvm_device_attr.addr points to a struct::

@@ -36,6 +39,7 @@ KVM_DEV_VFIO_GROUP attributes:

where:

- - @groupfd is a file descriptor for a VFIO group;
- - @tablefd is a file descriptor for a TCE table allocated via
- KVM_CREATE_SPAPR_TCE.
+ *) @groupfd is a file descriptor for a VFIO group;
+ *) @tablefd is a file descriptor for a TCE table allocated via
+ KVM_CREATE_SPAPR_TCE.
+ - alias: KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 55155e262646..484a8133bc69 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1401,10 +1401,18 @@ struct kvm_device_attr {
__u64 addr; /* userspace address of attr data */
};

-#define KVM_DEV_VFIO_GROUP 1
-#define KVM_DEV_VFIO_GROUP_ADD 1
-#define KVM_DEV_VFIO_GROUP_DEL 2
-#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE 3
+#define KVM_DEV_VFIO_FILE 1
+
+#define KVM_DEV_VFIO_FILE_ADD 1
+#define KVM_DEV_VFIO_FILE_DEL 2
+#define KVM_DEV_VFIO_FILE_SET_SPAPR_TCE 3
+
+/* KVM_DEV_VFIO_GROUP aliases are for compile time uapi compatibility */
+#define KVM_DEV_VFIO_GROUP KVM_DEV_VFIO_FILE
+
+#define KVM_DEV_VFIO_GROUP_ADD KVM_DEV_VFIO_FILE_ADD
+#define KVM_DEV_VFIO_GROUP_DEL KVM_DEV_VFIO_FILE_DEL
+#define KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE KVM_DEV_VFIO_FILE_SET_SPAPR_TCE

enum kvm_device_type {
KVM_DEV_TYPE_FSL_MPIC_20 = 1,
diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index 857d6ba349e1..d869913baafd 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -286,18 +286,18 @@ static int kvm_vfio_set_file(struct kvm_device *dev, long attr,
int32_t fd;

switch (attr) {
- case KVM_DEV_VFIO_GROUP_ADD:
+ case KVM_DEV_VFIO_FILE_ADD:
if (get_user(fd, argp))
return -EFAULT;
return kvm_vfio_file_add(dev, fd);

- case KVM_DEV_VFIO_GROUP_DEL:
+ case KVM_DEV_VFIO_FILE_DEL:
if (get_user(fd, argp))
return -EFAULT;
return kvm_vfio_file_del(dev, fd);

#ifdef CONFIG_SPAPR_TCE_IOMMU
- case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+ case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
return kvm_vfio_file_set_spapr_tce(dev, arg);
#endif
}
@@ -309,7 +309,7 @@ static int kvm_vfio_set_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
- case KVM_DEV_VFIO_GROUP:
+ case KVM_DEV_VFIO_FILE:
return kvm_vfio_set_file(dev, attr->attr,
u64_to_user_ptr(attr->addr));
}
@@ -321,12 +321,12 @@ static int kvm_vfio_has_attr(struct kvm_device *dev,
struct kvm_device_attr *attr)
{
switch (attr->group) {
- case KVM_DEV_VFIO_GROUP:
+ case KVM_DEV_VFIO_FILE:
switch (attr->attr) {
- case KVM_DEV_VFIO_GROUP_ADD:
- case KVM_DEV_VFIO_GROUP_DEL:
+ case KVM_DEV_VFIO_FILE_ADD:
+ case KVM_DEV_VFIO_FILE_DEL:
#ifdef CONFIG_SPAPR_TCE_IOMMU
- case KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE:
+ case KVM_DEV_VFIO_FILE_SET_SPAPR_TCE:
#endif
return 0;
}
--
2.34.1


2023-02-06 09:06:12

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 06/14] vfio: Pass struct vfio_device_file * to vfio_device_open/close()

This avoids passing too much parameters in multiple functions.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/group.c | 19 +++++++++++++------
drivers/vfio/vfio.h | 8 ++++----
drivers/vfio/vfio_main.c | 25 +++++++++++++++----------
3 files changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index cc0eded19a9f..2abf55c69281 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -166,8 +166,9 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
spin_unlock(&device->group->kvm_ref_lock);
}

-static int vfio_device_group_open(struct vfio_device *device)
+static int vfio_device_group_open(struct vfio_device_file *df)
{
+ struct vfio_device *device = df->device;
int ret;

mutex_lock(&device->group->group_lock);
@@ -187,7 +188,11 @@ static int vfio_device_group_open(struct vfio_device *device)
if (device->open_count == 0)
vfio_device_group_get_kvm_safe(device);

- ret = vfio_device_open(device, device->group->iommufd);
+ df->iommufd = device->group->iommufd;
+
+ ret = vfio_device_open(df);
+ if (ret)
+ df->iommufd = NULL;

if (device->open_count == 0)
vfio_device_put_kvm(device);
@@ -199,12 +204,14 @@ static int vfio_device_group_open(struct vfio_device *device)
return ret;
}

-void vfio_device_group_close(struct vfio_device *device)
+void vfio_device_group_close(struct vfio_device_file *df)
{
+ struct vfio_device *device = df->device;
+
mutex_lock(&device->group->group_lock);
mutex_lock(&device->dev_set->lock);

- vfio_device_close(device, device->group->iommufd);
+ vfio_device_close(df);

if (device->open_count == 0)
vfio_device_put_kvm(device);
@@ -225,7 +232,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
goto err_out;
}

- ret = vfio_device_group_open(device);
+ ret = vfio_device_group_open(df);
if (ret)
goto err_free;

@@ -257,7 +264,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
return filep;

err_close_device:
- vfio_device_group_close(device);
+ vfio_device_group_close(df);
err_free:
kfree(df);
err_out:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 00e6ce6ef4c9..d8275881c1f1 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -20,13 +20,13 @@ struct vfio_device_file {
struct vfio_device *device;
spinlock_t kvm_ref_lock; /* protect kvm field */
struct kvm *kvm;
+ struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
};

void vfio_device_put_registration(struct vfio_device *device);
bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd);
-void vfio_device_close(struct vfio_device *device,
- struct iommufd_ctx *iommufd);
+int vfio_device_open(struct vfio_device_file *df);
+void vfio_device_close(struct vfio_device_file *df);
struct vfio_device_file *
vfio_allocate_device_file(struct vfio_device *device);

@@ -91,7 +91,7 @@ void vfio_device_group_register(struct vfio_device *device);
void vfio_device_group_unregister(struct vfio_device *device);
int vfio_device_group_use_iommu(struct vfio_device *device);
void vfio_device_group_unuse_iommu(struct vfio_device *device);
-void vfio_device_group_close(struct vfio_device *device);
+void vfio_device_group_close(struct vfio_device_file *df);
struct vfio_group *vfio_group_from_file(struct file *file);
bool vfio_group_enforced_coherent(struct vfio_group *group);
void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index c529f609fecc..c517252aba19 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -411,9 +411,10 @@ vfio_allocate_device_file(struct vfio_device *device)
return df;
}

-static int vfio_device_first_open(struct vfio_device *device,
- struct iommufd_ctx *iommufd)
+static int vfio_device_first_open(struct vfio_device_file *df)
{
+ struct vfio_device *device = df->device;
+ struct iommufd_ctx *iommufd = df->iommufd;
int ret;

lockdep_assert_held(&device->dev_set->lock);
@@ -445,9 +446,11 @@ static int vfio_device_first_open(struct vfio_device *device,
return ret;
}

-static void vfio_device_last_close(struct vfio_device *device,
- struct iommufd_ctx *iommufd)
+static void vfio_device_last_close(struct vfio_device_file *df)
{
+ struct vfio_device *device = df->device;
+ struct iommufd_ctx *iommufd = df->iommufd;
+
lockdep_assert_held(&device->dev_set->lock);

if (device->ops->close_device)
@@ -459,15 +462,16 @@ static void vfio_device_last_close(struct vfio_device *device,
module_put(device->dev->driver->owner);
}

-int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
+int vfio_device_open(struct vfio_device_file *df)
{
+ struct vfio_device *device = df->device;
int ret = 0;

lockdep_assert_held(&device->dev_set->lock);

device->open_count++;
if (device->open_count == 1) {
- ret = vfio_device_first_open(device, iommufd);
+ ret = vfio_device_first_open(df);
if (ret)
device->open_count--;
}
@@ -475,14 +479,15 @@ int vfio_device_open(struct vfio_device *device, struct iommufd_ctx *iommufd)
return ret;
}

-void vfio_device_close(struct vfio_device *device,
- struct iommufd_ctx *iommufd)
+void vfio_device_close(struct vfio_device_file *df)
{
+ struct vfio_device *device = df->device;
+
lockdep_assert_held(&device->dev_set->lock);

vfio_assert_device_open(device);
if (device->open_count == 1)
- vfio_device_last_close(device, iommufd);
+ vfio_device_last_close(df);
device->open_count--;
}

@@ -527,7 +532,7 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
struct vfio_device_file *df = filep->private_data;
struct vfio_device *device = df->device;

- vfio_device_group_close(device);
+ vfio_device_group_close(df);

vfio_device_put_registration(device);

--
2.34.1


2023-02-06 09:06:30

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 07/14] vfio: Block device access via device fd until device is opened

Allow the vfio_device file to be in a state where the device FD is
opened but the device cannot be used by userspace (i.e. its .open_device()
hasn't been called). This inbetween state is not used when the device
FD is spawned from the group FD, however when we create the device FD
directly by opening a cdev it will be opened in the blocked state.

The reason for the inbetween state is userspace only gets a FD but
doesn't have the secure until binding the FD to an iommufd. So in the
blocked state, only the bind operation is allowed, other device accesses
are not allowed. Completing bind will allow user to further access the
device.

This is implemented by adding a flag in struct vfio_device_file to mark
the blocked state and using a simple smp_load_acquire() to obtain the
flag value and serialize all the device setup with the thread accessing
this device.

Following this lockless scheme, it can safely handle the device FD
unbound->bound but it cannot handle bound->unbound. To allow this we'd
need to add a lock on all the vfio ioctls which seems costly. So once
device FD is bound, it remains bound until the FD is closed.

Suggested-by: Jason Gunthorpe <[email protected]>
Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/vfio.h | 1 +
drivers/vfio/vfio_main.c | 34 +++++++++++++++++++++++++++++++++-
2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index d8275881c1f1..802e13f1256e 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,7 @@ struct vfio_container;

struct vfio_device_file {
struct vfio_device *device;
+ bool access_granted;
spinlock_t kvm_ref_lock; /* protect kvm field */
struct kvm *kvm;
struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index c517252aba19..2267057240bd 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -476,7 +476,15 @@ int vfio_device_open(struct vfio_device_file *df)
device->open_count--;
}

- return ret;
+ if (ret)
+ return ret;
+
+ /*
+ * Paired with smp_load_acquire() in vfio_device_fops::ioctl/
+ * read/write/mmap
+ */
+ smp_store_release(&df->access_granted, true);
+ return 0;
}

void vfio_device_close(struct vfio_device_file *df)
@@ -1104,8 +1112,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
{
struct vfio_device_file *df = filep->private_data;
struct vfio_device *device = df->device;
+ bool access;
int ret;

+ /* Paired with smp_store_release() in vfio_device_open() */
+ access = smp_load_acquire(&df->access_granted);
+ if (!access)
+ return -EINVAL;
+
ret = vfio_device_pm_runtime_get(device);
if (ret)
return ret;
@@ -1132,6 +1146,12 @@ static ssize_t vfio_device_fops_read(struct file *filep, char __user *buf,
{
struct vfio_device_file *df = filep->private_data;
struct vfio_device *device = df->device;
+ bool access;
+
+ /* Paired with smp_store_release() in vfio_device_open() */
+ access = smp_load_acquire(&df->access_granted);
+ if (!access)
+ return -EINVAL;

if (unlikely(!device->ops->read))
return -EINVAL;
@@ -1145,6 +1165,12 @@ static ssize_t vfio_device_fops_write(struct file *filep,
{
struct vfio_device_file *df = filep->private_data;
struct vfio_device *device = df->device;
+ bool access;
+
+ /* Paired with smp_store_release() in vfio_device_open() */
+ access = smp_load_acquire(&df->access_granted);
+ if (!access)
+ return -EINVAL;

if (unlikely(!device->ops->write))
return -EINVAL;
@@ -1156,6 +1182,12 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)
{
struct vfio_device_file *df = filep->private_data;
struct vfio_device *device = df->device;
+ bool access;
+
+ /* Paired with smp_store_release() in vfio_device_open() */
+ access = smp_load_acquire(&df->access_granted);
+ if (!access)
+ return -EINVAL;

if (unlikely(!device->ops->mmap))
return -EINVAL;
--
2.34.1


2023-02-06 09:06:48

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 08/14] vfio: Add infrastructure for bind_iommufd from userspace

For the device fd opened from cdev, userspace needs to bind it to an
iommufd and attach it to IOAS managed by iommufd. With such operations,
userspace can set up a secure DMA context and hence access device.

This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
as an optional input, and also an dev_id pointer to selectively return
the dev_id to prepare for adding bind_iommufd ioctl, which does the bind
first and then attach IOAS.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/group.c | 17 ++++++++++++++---
drivers/vfio/iommufd.c | 21 +++++++++------------
drivers/vfio/vfio.h | 9 ++++++---
drivers/vfio/vfio_main.c | 10 ++++++----
4 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 2abf55c69281..9f3f6f0e4942 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -169,6 +169,7 @@ static void vfio_device_group_get_kvm_safe(struct vfio_device *device)
static int vfio_device_group_open(struct vfio_device_file *df)
{
struct vfio_device *device = df->device;
+ u32 ioas_id;
int ret;

mutex_lock(&device->group->group_lock);
@@ -177,6 +178,13 @@ static int vfio_device_group_open(struct vfio_device_file *df)
goto out_unlock;
}

+ if (device->group->iommufd) {
+ ret = iommufd_vfio_compat_ioas_id(device->group->iommufd,
+ &ioas_id);
+ if (ret)
+ goto out_unlock;
+ }
+
mutex_lock(&device->dev_set->lock);

/*
@@ -188,9 +196,12 @@ static int vfio_device_group_open(struct vfio_device_file *df)
if (device->open_count == 0)
vfio_device_group_get_kvm_safe(device);

- df->iommufd = device->group->iommufd;
-
- ret = vfio_device_open(df);
+ if (device->group->iommufd) {
+ df->iommufd = device->group->iommufd;
+ ret = vfio_device_open(df, NULL, &ioas_id);
+ } else {
+ ret = vfio_device_open(df, NULL, NULL);
+ }
if (ret)
df->iommufd = NULL;

diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 4f82a6fa7c6c..beef6ca21107 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -10,9 +10,9 @@
MODULE_IMPORT_NS(IOMMUFD);
MODULE_IMPORT_NS(IOMMUFD_VFIO);

-int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
+int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx,
+ u32 *dev_id, u32 *pt_id)
{
- u32 ioas_id;
u32 device_id;
int ret;

@@ -29,17 +29,14 @@ int vfio_iommufd_bind(struct vfio_device *vdev, struct iommufd_ctx *ictx)
if (ret)
return ret;

- ret = iommufd_vfio_compat_ioas_id(ictx, &ioas_id);
- if (ret)
- goto err_unbind;
- ret = vdev->ops->attach_ioas(vdev, &ioas_id);
- if (ret)
- goto err_unbind;
+ if (pt_id) {
+ ret = vdev->ops->attach_ioas(vdev, pt_id);
+ if (ret)
+ goto err_unbind;
+ }

- /*
- * The legacy path has no way to return the device id or the selected
- * pt_id
- */
+ if (dev_id)
+ *dev_id = device_id;
return 0;

err_unbind:
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 802e13f1256e..9126500381f5 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -26,7 +26,8 @@ struct vfio_device_file {

void vfio_device_put_registration(struct vfio_device *device);
bool vfio_device_try_get_registration(struct vfio_device *device);
-int vfio_device_open(struct vfio_device_file *df);
+int vfio_device_open(struct vfio_device_file *df,
+ u32 *dev_id, u32 *pt_id);
void vfio_device_close(struct vfio_device_file *df);
struct vfio_device_file *
vfio_allocate_device_file(struct vfio_device *device);
@@ -231,11 +232,13 @@ static inline void vfio_container_cleanup(void)
#endif

#if IS_ENABLED(CONFIG_IOMMUFD)
-int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx);
+int vfio_iommufd_bind(struct vfio_device *device, struct iommufd_ctx *ictx,
+ u32 *dev_id, u32 *pt_id);
void vfio_iommufd_unbind(struct vfio_device *device);
#else
static inline int vfio_iommufd_bind(struct vfio_device *device,
- struct iommufd_ctx *ictx)
+ struct iommufd_ctx *ictx,
+ u32 *dev_id, u32 *pt_id)
{
return -EOPNOTSUPP;
}
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 2267057240bd..b40c2d95f693 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -411,7 +411,8 @@ vfio_allocate_device_file(struct vfio_device *device)
return df;
}

-static int vfio_device_first_open(struct vfio_device_file *df)
+static int vfio_device_first_open(struct vfio_device_file *df,
+ u32 *dev_id, u32 *pt_id)
{
struct vfio_device *device = df->device;
struct iommufd_ctx *iommufd = df->iommufd;
@@ -423,7 +424,7 @@ static int vfio_device_first_open(struct vfio_device_file *df)
return -ENODEV;

if (iommufd)
- ret = vfio_iommufd_bind(device, iommufd);
+ ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
else
ret = vfio_device_group_use_iommu(device);
if (ret)
@@ -462,7 +463,8 @@ static void vfio_device_last_close(struct vfio_device_file *df)
module_put(device->dev->driver->owner);
}

-int vfio_device_open(struct vfio_device_file *df)
+int vfio_device_open(struct vfio_device_file *df,
+ u32 *dev_id, u32 *pt_id)
{
struct vfio_device *device = df->device;
int ret = 0;
@@ -471,7 +473,7 @@ int vfio_device_open(struct vfio_device_file *df)

device->open_count++;
if (device->open_count == 1) {
- ret = vfio_device_first_open(df);
+ ret = vfio_device_first_open(df, dev_id, pt_id);
if (ret)
device->open_count--;
}
--
2.34.1


2023-02-06 09:06:54

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 09/14] vfio-iommufd: Add detach_ioas support for physical VFIO devices

this prepares for adding DETACH ioctl for physical VFIO devices.

Signed-off-by: Yi Liu <[email protected]>
---
Documentation/driver-api/vfio.rst | 8 +++--
drivers/vfio/fsl-mc/vfio_fsl_mc.c | 1 +
drivers/vfio/iommufd.c | 31 ++++++++++++++++---
.../vfio/pci/hisilicon/hisi_acc_vfio_pci.c | 2 ++
drivers/vfio/pci/mlx5/main.c | 1 +
drivers/vfio/pci/vfio_pci.c | 1 +
drivers/vfio/platform/vfio_amba.c | 1 +
drivers/vfio/platform/vfio_platform.c | 1 +
drivers/vfio/vfio_main.c | 3 +-
include/linux/vfio.h | 5 +++
10 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/Documentation/driver-api/vfio.rst b/Documentation/driver-api/vfio.rst
index 0bfa7261f991..bbc7934fd55d 100644
--- a/Documentation/driver-api/vfio.rst
+++ b/Documentation/driver-api/vfio.rst
@@ -279,6 +279,7 @@ similar to a file operations structure::
struct iommufd_ctx *ictx, u32 *out_device_id);
void (*unbind_iommufd)(struct vfio_device *vdev);
int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+ void (*detach_ioas)(struct vfio_device *vdev);
int (*open_device)(struct vfio_device *vdev);
void (*close_device)(struct vfio_device *vdev);
ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
@@ -314,9 +315,10 @@ container_of().
- The [un]bind_iommufd callbacks are issued when the device is bound to
and unbound from iommufd.

- - The attach_ioas callback is issued when the device is attached to an
- IOAS managed by the bound iommufd. The attached IOAS is automatically
- detached when the device is unbound from iommufd.
+ - The [de]attach_ioas callback is issued when the device is attached to
+ and detached from an IOAS managed by the bound iommufd. However, the
+ attached IOAS can also be automatically detached when the device is
+ unbound from iommufd.

- The read/write/mmap callbacks implement the device region access defined
by the device's own VFIO_DEVICE_GET_REGION_INFO ioctl.
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc.c b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
index c89a047a4cd8..d540cf683d93 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc.c
@@ -594,6 +594,7 @@ static const struct vfio_device_ops vfio_fsl_mc_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static struct fsl_mc_driver vfio_fsl_mc_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index beef6ca21107..90df1f30b7fd 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -71,14 +71,18 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
}
EXPORT_SYMBOL_GPL(vfio_iommufd_physical_bind);

+static void __vfio_iommufd_detach(struct vfio_device *vdev)
+{
+ iommufd_device_detach(vdev->iommufd_device);
+ vdev->iommufd_attached = false;
+}
+
void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
{
lockdep_assert_held(&vdev->dev_set->lock);

- if (vdev->iommufd_attached) {
- iommufd_device_detach(vdev->iommufd_device);
- vdev->iommufd_attached = false;
- }
+ if (vdev->iommufd_attached)
+ __vfio_iommufd_detach(vdev);
iommufd_device_unbind(vdev->iommufd_device);
vdev->iommufd_device = NULL;
}
@@ -88,6 +92,14 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
{
int rc;

+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ if (!vdev->iommufd_device)
+ return -EINVAL;
+
+ if (vdev->iommufd_attached)
+ return -EBUSY;
+
rc = iommufd_device_attach(vdev->iommufd_device, pt_id);
if (rc)
return rc;
@@ -96,6 +108,17 @@ int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
}
EXPORT_SYMBOL_GPL(vfio_iommufd_physical_attach_ioas);

+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev)
+{
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ if (!vdev->iommufd_device || !vdev->iommufd_attached)
+ return;
+
+ __vfio_iommufd_detach(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_physical_detach_ioas);
+
/*
* The emulated standard ops mean that vfio_device is going to use the
* "mdev path" and will call vfio_pin_pages()/vfio_dma_rw(). Drivers using this
diff --git a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
index a117eaf21c14..b2f9778c8366 100644
--- a/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
+++ b/drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
@@ -1373,6 +1373,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_migrn_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
@@ -1391,6 +1392,7 @@ static const struct vfio_device_ops hisi_acc_vfio_pci_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static int hisi_acc_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/pci/mlx5/main.c b/drivers/vfio/pci/mlx5/main.c
index e897537a9e8a..6fc3410989eb 100644
--- a/drivers/vfio/pci/mlx5/main.c
+++ b/drivers/vfio/pci/mlx5/main.c
@@ -1326,6 +1326,7 @@ static const struct vfio_device_ops mlx5vf_pci_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static int mlx5vf_pci_probe(struct pci_dev *pdev,
diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index 29091ee2e984..cb5b7f865d58 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -141,6 +141,7 @@ static const struct vfio_device_ops vfio_pci_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
diff --git a/drivers/vfio/platform/vfio_amba.c b/drivers/vfio/platform/vfio_amba.c
index 83fe54015595..6464b3939ebc 100644
--- a/drivers/vfio/platform/vfio_amba.c
+++ b/drivers/vfio/platform/vfio_amba.c
@@ -119,6 +119,7 @@ static const struct vfio_device_ops vfio_amba_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static const struct amba_id pl330_ids[] = {
diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
index 22a1efca32a8..8cf22fa65baa 100644
--- a/drivers/vfio/platform/vfio_platform.c
+++ b/drivers/vfio/platform/vfio_platform.c
@@ -108,6 +108,7 @@ static const struct vfio_device_ops vfio_platform_ops = {
.bind_iommufd = vfio_iommufd_physical_bind,
.unbind_iommufd = vfio_iommufd_physical_unbind,
.attach_ioas = vfio_iommufd_physical_attach_ioas,
+ .detach_ioas = vfio_iommufd_physical_detach_ioas,
};

static struct platform_driver vfio_platform_driver = {
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index b40c2d95f693..05dd4b89e9d1 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -250,7 +250,8 @@ static int __vfio_register_dev(struct vfio_device *device,

if (WARN_ON(device->ops->bind_iommufd &&
(!device->ops->unbind_iommufd ||
- !device->ops->attach_ioas)))
+ !device->ops->attach_ioas ||
+ !device->ops->detach_ioas)))
return -EINVAL;

/*
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 6a07e1c6c38e..99a6a07e915c 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -74,6 +74,7 @@ struct vfio_device {
* @unbind_iommufd: Opposite of bind_iommufd
* @attach_ioas: Called when attaching device to an IOAS/HWPT managed by the
* bound iommufd. Undo in unbind_iommufd.
+ * @detach_ioas: Opposite of attach_ioas, this is for runtime undo.
* @open_device: Called when the first file descriptor is opened for this device
* @close_device: Opposite of open_device
* @read: Perform read(2) on device file descriptor
@@ -97,6 +98,7 @@ struct vfio_device_ops {
struct iommufd_ctx *ictx, u32 *out_device_id);
void (*unbind_iommufd)(struct vfio_device *vdev);
int (*attach_ioas)(struct vfio_device *vdev, u32 *pt_id);
+ void (*detach_ioas)(struct vfio_device *vdev);
int (*open_device)(struct vfio_device *vdev);
void (*close_device)(struct vfio_device *vdev);
ssize_t (*read)(struct vfio_device *vdev, char __user *buf,
@@ -118,6 +120,7 @@ int vfio_iommufd_physical_bind(struct vfio_device *vdev,
struct iommufd_ctx *ictx, u32 *out_device_id);
void vfio_iommufd_physical_unbind(struct vfio_device *vdev);
int vfio_iommufd_physical_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_physical_detach_ioas(struct vfio_device *vdev);
int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
struct iommufd_ctx *ictx, u32 *out_device_id);
void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
@@ -130,6 +133,8 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
((void (*)(struct vfio_device *vdev)) NULL)
#define vfio_iommufd_physical_attach_ioas \
((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_physical_detach_ioas \
+ ((void (*)(struct vfio_device *vdev)) NULL)
#define vfio_iommufd_emulated_bind \
((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
u32 *out_device_id)) NULL)
--
2.34.1


2023-02-06 09:07:06

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 10/14] vfio-iommufd: Add detach_ioas for emulated VFIO devices

this prepares for adding DETACH ioctl for emulated VFIO devices.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/gpu/drm/i915/gvt/kvmgt.c | 1 +
drivers/s390/cio/vfio_ccw_ops.c | 1 +
drivers/s390/crypto/vfio_ap_ops.c | 1 +
drivers/vfio/iommufd.c | 29 +++++++++++++++++++++++++----
include/linux/vfio.h | 3 +++
5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 8ae7039b3683..8a76a84bc3c1 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1474,6 +1474,7 @@ static const struct vfio_device_ops intel_vgpu_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+ .detach_ioas = vfio_iommufd_emulated_detach_ioas,
};

static int intel_vgpu_probe(struct mdev_device *mdev)
diff --git a/drivers/s390/cio/vfio_ccw_ops.c b/drivers/s390/cio/vfio_ccw_ops.c
index 5b53b94f13c7..cba4971618ff 100644
--- a/drivers/s390/cio/vfio_ccw_ops.c
+++ b/drivers/s390/cio/vfio_ccw_ops.c
@@ -632,6 +632,7 @@ static const struct vfio_device_ops vfio_ccw_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+ .detach_ioas = vfio_iommufd_emulated_detach_ioas,
};

struct mdev_driver vfio_ccw_mdev_driver = {
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 9c01957e56b3..f99c69d40982 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -1802,6 +1802,7 @@ static const struct vfio_device_ops vfio_ap_matrix_dev_ops = {
.bind_iommufd = vfio_iommufd_emulated_bind,
.unbind_iommufd = vfio_iommufd_emulated_unbind,
.attach_ioas = vfio_iommufd_emulated_attach_ioas,
+ .detach_ioas = vfio_iommufd_emulated_detach_ioas,
};

static struct mdev_driver vfio_ap_matrix_driver = {
diff --git a/drivers/vfio/iommufd.c b/drivers/vfio/iommufd.c
index 90df1f30b7fd..026f81a87dd7 100644
--- a/drivers/vfio/iommufd.c
+++ b/drivers/vfio/iommufd.c
@@ -149,14 +149,18 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
}
EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);

+static void __vfio_iommufd_access_destroy(struct vfio_device *vdev)
+{
+ iommufd_access_destroy(vdev->iommufd_access);
+ vdev->iommufd_access = NULL;
+}
+
void vfio_iommufd_emulated_unbind(struct vfio_device *vdev)
{
lockdep_assert_held(&vdev->dev_set->lock);

- if (vdev->iommufd_access) {
- iommufd_access_destroy(vdev->iommufd_access);
- vdev->iommufd_access = NULL;
- }
+ if (vdev->iommufd_access)
+ __vfio_iommufd_access_destroy(vdev);
iommufd_ctx_put(vdev->iommufd_ictx);
vdev->iommufd_ictx = NULL;
}
@@ -168,6 +172,12 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)

lockdep_assert_held(&vdev->dev_set->lock);

+ if (!vdev->iommufd_ictx)
+ return -EINVAL;
+
+ if (vdev->iommufd_access)
+ return -EBUSY;
+
user = iommufd_access_create(vdev->iommufd_ictx, *pt_id, &vfio_user_ops,
vdev);
if (IS_ERR(user))
@@ -176,3 +186,14 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id)
return 0;
}
EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_attach_ioas);
+
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev)
+{
+ lockdep_assert_held(&vdev->dev_set->lock);
+
+ if (!vdev->iommufd_ictx || !vdev->iommufd_access)
+ return;
+
+ __vfio_iommufd_access_destroy(vdev);
+}
+EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_detach_ioas);
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 99a6a07e915c..70380d4955e1 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -125,6 +125,7 @@ int vfio_iommufd_emulated_bind(struct vfio_device *vdev,
struct iommufd_ctx *ictx, u32 *out_device_id);
void vfio_iommufd_emulated_unbind(struct vfio_device *vdev);
int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
+void vfio_iommufd_emulated_detach_ioas(struct vfio_device *vdev);
#else
#define vfio_iommufd_physical_bind \
((int (*)(struct vfio_device *vdev, struct iommufd_ctx *ictx, \
@@ -142,6 +143,8 @@ int vfio_iommufd_emulated_attach_ioas(struct vfio_device *vdev, u32 *pt_id);
((void (*)(struct vfio_device *vdev)) NULL)
#define vfio_iommufd_emulated_attach_ioas \
((int (*)(struct vfio_device *vdev, u32 *pt_id)) NULL)
+#define vfio_iommufd_emulated_detach_ioas \
+ ((void (*)(struct vfio_device *vdev)) NULL)
#endif

/**
--
2.34.1


2023-02-06 09:07:16

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 11/14] vfio: Make vfio_device_open() exclusive between group path and device cdev path

With the introduction of vfio device cdev, userspace can get device
access by either the legacy group path or the cdev path. For VFIO devices,
it can only be opened by one of the group path and the cdev path at one
time. e.g. when the device is opened via cdev path, the group path should
be failed. Both paths will call into vfio_device_open(), so the exclusion
is done in it.

VFIO group has historically allowed multi-open of the device FD. This
was made secure because the "open" was executed via an ioctl to the
group FD which is itself only single open.

However, no known use of multiple device FDs today. It is kind of a
strange thing to do because new device FDs can naturally be created
via dup().

When we implement the new device uAPI (only used in cdev path) there is
no natural way to allow the device itself from being multi-opened in a
secure manner. Without the group FD we cannot prove the security context
of the opener.

Thus, when moving to the new uAPI we block the ability to multi-open
the device. Old group path still allows it. This requires vfio_device_open()
exclusive between the cdev path with the group path.

The main logic is in the vfio_device_open(). It needs to sustain both
the legacy behavior i.e. multi-open in the group path and the new
behavior i.e. single-open in the cdev path. This mixture leads to the
introduction of a new is_cdev_device flag in struct vfio_device_file,
and a single_open flag in struct vfio_device.
- vfio_device_file::is_cdev_device is set per the vfio_device_file
allocation.
- vfio_device::single_open is set after open_device op is called
successfully if vfio_device_file::is_cdev_device is set.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/group.c | 2 +-
drivers/vfio/vfio.h | 4 +++-
drivers/vfio/vfio_main.c | 26 +++++++++++++++++++++++---
include/linux/vfio.h | 1 +
4 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/group.c b/drivers/vfio/group.c
index 9f3f6f0e4942..a90273aa77ec 100644
--- a/drivers/vfio/group.c
+++ b/drivers/vfio/group.c
@@ -237,7 +237,7 @@ static struct file *vfio_device_open_file(struct vfio_device *device)
struct file *filep;
int ret;

- df = vfio_allocate_device_file(device);
+ df = vfio_allocate_device_file(device, false);
if (IS_ERR(df)) {
ret = PTR_ERR(df);
goto err_out;
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 9126500381f5..2debf0173861 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -18,6 +18,8 @@ struct vfio_container;

struct vfio_device_file {
struct vfio_device *device;
+ bool is_cdev_device;
+
bool access_granted;
spinlock_t kvm_ref_lock; /* protect kvm field */
struct kvm *kvm;
@@ -30,7 +32,7 @@ int vfio_device_open(struct vfio_device_file *df,
u32 *dev_id, u32 *pt_id);
void vfio_device_close(struct vfio_device_file *df);
struct vfio_device_file *
-vfio_allocate_device_file(struct vfio_device *device);
+vfio_allocate_device_file(struct vfio_device *device, bool is_cdev_device);

extern const struct file_operations vfio_device_fops;

diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 05dd4b89e9d1..e07b185f9820 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -398,7 +398,7 @@ static bool vfio_assert_device_open(struct vfio_device *device)
}

struct vfio_device_file *
-vfio_allocate_device_file(struct vfio_device *device)
+vfio_allocate_device_file(struct vfio_device *device, bool is_cdev_device)
{
struct vfio_device_file *df;

@@ -407,6 +407,7 @@ vfio_allocate_device_file(struct vfio_device *device)
return ERR_PTR(-ENOMEM);

df->device = device;
+ df->is_cdev_device = is_cdev_device;
spin_lock_init(&df->kvm_ref_lock);

return df;
@@ -472,11 +473,23 @@ int vfio_device_open(struct vfio_device_file *df,

lockdep_assert_held(&device->dev_set->lock);

+ /*
+ * Device cdev path cannot support multiple device open since
+ * it doesn't have a secure way for it. So a second device
+ * open attempt should be failed if the caller is from a cdev
+ * path or the device has already been opened by a cdev path.
+ */
+ if (device->open_count != 0 &&
+ (df->is_cdev_device || device->single_open))
+ return -EINVAL;
+
device->open_count++;
if (device->open_count == 1) {
ret = vfio_device_first_open(df, dev_id, pt_id);
if (ret)
device->open_count--;
+ else
+ device->single_open = df->is_cdev_device;
}

if (ret)
@@ -497,8 +510,10 @@ void vfio_device_close(struct vfio_device_file *df)
lockdep_assert_held(&device->dev_set->lock);

vfio_assert_device_open(device);
- if (device->open_count == 1)
+ if (device->open_count == 1) {
vfio_device_last_close(df);
+ device->single_open = false; // clear single_open flag
+ }
device->open_count--;
}

@@ -543,7 +558,12 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
struct vfio_device_file *df = filep->private_data;
struct vfio_device *device = df->device;

- vfio_device_group_close(df);
+ /*
+ * group path supports multiple device open, while cdev doesn't.
+ * So use vfio_device_group_close() for !singel_open case.
+ */
+ if (!df->is_cdev_device)
+ vfio_device_group_close(df);

vfio_device_put_registration(device);

diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 70380d4955e1..83d1e0af0a70 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -63,6 +63,7 @@ struct vfio_device {
struct iommufd_ctx *iommufd_ictx;
bool iommufd_attached;
#endif
+ bool single_open;
};

/**
--
2.34.1


2023-02-06 09:07:27

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 12/14] vfio: Add cdev for vfio_device

This allows user to directly open a vfio device w/o using the legacy
container/group interface, as a prerequisite for supporting new iommu
features like nested translation.

The device fd opened in this manner doesn't have the capability to access
the device as the fops open() doesn't open the device until the successful
BIND_IOMMUFD which be added in next patch.

With this patch, devices registered to vfio core have both group and device
interface created.

- group interface : /dev/vfio/$groupID
- device interface: /dev/vfio/devices/vfioX (X is the minor number and
is unique across devices)

Given a vfio device the user can identify the matching vfioX by checking
the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
/sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
major:minor of the matching vfioX.

Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
that the major:minor matches.

The vfio_device cdev logic in this patch:
*) __vfio_register_dev() path ends up doing cdev_device_add() for each
vfio_device;
*) vfio_unregister_group_dev() path does cdev_device_del();

Signed-off-by: Yi Liu <[email protected]>
Signed-off-by: Joao Martins <[email protected]>
---
drivers/vfio/Kconfig | 11 +++++++
drivers/vfio/Makefile | 1 +
drivers/vfio/device_cdev.c | 64 ++++++++++++++++++++++++++++++++++++++
drivers/vfio/vfio.h | 26 ++++++++++++++++
drivers/vfio/vfio_main.c | 41 +++++++++++++++++++++---
include/linux/vfio.h | 2 ++
6 files changed, 141 insertions(+), 4 deletions(-)
create mode 100644 drivers/vfio/device_cdev.c

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index a8f544629467..0476abf154f2 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -12,6 +12,17 @@ menuconfig VFIO
If you don't know what to do here, say N.

if VFIO
+config VFIO_DEVICE_CDEV
+ bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
+ depends on IOMMUFD
+ help
+ The VFIO device cdev is another way for userspace to get device
+ access. Userspace gets device fd by opening device cdev under
+ /dev/vfio/devices/vfioX, and then bind the device fd with an iommufd
+ to set up secure context for device access.
+
+ If you don't know what to do here, say N.
+
config VFIO_CONTAINER
bool "Support for the VFIO container /dev/vfio/vfio"
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 70e7dcb302ef..245394aeb94b 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -4,6 +4,7 @@ obj-$(CONFIG_VFIO) += vfio.o
vfio-y += vfio_main.o \
group.o \
iova_bitmap.o
+vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
vfio-$(CONFIG_IOMMUFD) += iommufd.o
vfio-$(CONFIG_VFIO_CONTAINER) += container.o
vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
new file mode 100644
index 000000000000..f024833c9e2c
--- /dev/null
+++ b/drivers/vfio/device_cdev.c
@@ -0,0 +1,64 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023 Intel Corporation.
+ */
+#include <linux/vfio.h>
+
+#include "vfio.h"
+
+static struct vfio {
+ dev_t device_devt;
+} vfio;
+
+void vfio_init_device_cdev(struct vfio_device *device)
+{
+ device->device.devt = MKDEV(MAJOR(vfio.device_devt), device->index);
+ cdev_init(&device->cdev, &vfio_device_fops);
+ device->cdev.owner = THIS_MODULE;
+}
+
+int vfio_device_fops_open(struct inode *inode, struct file *filep)
+{
+ struct vfio_device *device = container_of(inode->i_cdev,
+ struct vfio_device, cdev);
+ struct vfio_device_file *df;
+ int ret;
+
+ if (!vfio_device_try_get_registration(device))
+ return -ENODEV;
+
+ /*
+ * device access is blocked until .open_device() is called
+ * in BIND_IOMMUFD.
+ */
+ df = vfio_allocate_device_file(device, true);
+ if (IS_ERR(df)) {
+ ret = PTR_ERR(df);
+ goto err_put_registration;
+ }
+
+ filep->private_data = df;
+
+ return 0;
+
+err_put_registration:
+ vfio_device_put_registration(device);
+ return ret;
+}
+
+static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
+{
+ return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
+}
+
+int vfio_cdev_init(struct class *device_class)
+{
+ device_class->devnode = vfio_device_devnode;
+ return alloc_chrdev_region(&vfio.device_devt, 0,
+ MINORMASK + 1, "vfio-dev");
+}
+
+void vfio_cdev_cleanup(void)
+{
+ unregister_chrdev_region(vfio.device_devt, MINORMASK + 1);
+}
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 2debf0173861..c7c75865afec 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -250,6 +250,32 @@ static inline void vfio_iommufd_unbind(struct vfio_device *device)
}
#endif

+#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
+void vfio_init_device_cdev(struct vfio_device *device);
+int vfio_device_fops_open(struct inode *inode, struct file *filep);
+int vfio_cdev_init(struct class *device_class);
+void vfio_cdev_cleanup(void);
+#else
+static inline void vfio_init_device_cdev(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_fops_open(struct inode *inode,
+ struct file *filep)
+{
+ return 0;
+}
+
+static inline int vfio_cdev_init(struct class *device_class)
+{
+ return 0;
+}
+
+static inline void vfio_cdev_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_DEVICE_CDEV */
+
#if IS_ENABLED(CONFIG_VFIO_VIRQFD)
int __init vfio_virqfd_init(void);
void vfio_virqfd_exit(void);
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index e07b185f9820..035730dc6ad4 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -235,6 +235,7 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
device->device.release = vfio_device_release;
device->device.class = vfio.device_class;
device->device.parent = device->dev;
+ vfio_init_device_cdev(device);
return 0;

out_uninit:
@@ -243,6 +244,25 @@ static int vfio_init_device(struct vfio_device *device, struct device *dev,
return ret;
}

+static int vfio_device_add(struct vfio_device *device)
+{
+ int ret;
+
+ if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV))
+ ret = cdev_device_add(&device->cdev, &device->device);
+ else
+ ret = device_add(&device->device);
+ return ret;
+}
+
+static void vfio_device_del(struct vfio_device *device)
+{
+ if (IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV))
+ cdev_device_del(&device->cdev, &device->device);
+ else
+ device_del(&device->device);
+}
+
static int __vfio_register_dev(struct vfio_device *device,
enum vfio_group_type type)
{
@@ -269,7 +289,7 @@ static int __vfio_register_dev(struct vfio_device *device,
if (ret)
return ret;

- ret = device_add(&device->device);
+ ret = vfio_device_add(device);
if (ret)
goto err_out;

@@ -309,6 +329,13 @@ void vfio_unregister_group_dev(struct vfio_device *device)
bool interrupted = false;
long rc;

+ /*
+ * Balances vfio_device_add in register path. Putting it as the
+ * first operation in unregister to prevent registration refcount
+ * from incrementing per cdev open.
+ */
+ vfio_device_del(device);
+
vfio_device_put_registration(device);
rc = try_wait_for_completion(&device->comp);
while (rc <= 0) {
@@ -334,9 +361,6 @@ void vfio_unregister_group_dev(struct vfio_device *device)

vfio_device_group_unregister(device);

- /* Balances device_add in register path */
- device_del(&device->device);
-
/* Balances vfio_device_set_group in register path */
vfio_device_remove_group(device);
}
@@ -1220,6 +1244,7 @@ static int vfio_device_fops_mmap(struct file *filep, struct vm_area_struct *vma)

const struct file_operations vfio_device_fops = {
.owner = THIS_MODULE,
+ .open = vfio_device_fops_open,
.release = vfio_device_fops_release,
.read = vfio_device_fops_read,
.write = vfio_device_fops_write,
@@ -1593,9 +1618,16 @@ static int __init vfio_init(void)
goto err_dev_class;
}

+ ret = vfio_cdev_init(vfio.device_class);
+ if (ret)
+ goto err_alloc_dev_chrdev;
+
pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
return 0;

+err_alloc_dev_chrdev:
+ class_destroy(vfio.device_class);
+ vfio.device_class = NULL;
err_dev_class:
vfio_virqfd_exit();
err_virqfd:
@@ -1606,6 +1638,7 @@ static int __init vfio_init(void)
static void __exit vfio_cleanup(void)
{
ida_destroy(&vfio.device_ida);
+ vfio_cdev_cleanup();
class_destroy(vfio.device_class);
vfio.device_class = NULL;
vfio_virqfd_exit();
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 83d1e0af0a70..fd4bf9c21ffe 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -13,6 +13,7 @@
#include <linux/mm.h>
#include <linux/workqueue.h>
#include <linux/poll.h>
+#include <linux/cdev.h>
#include <uapi/linux/vfio.h>
#include <linux/iova_bitmap.h>

@@ -51,6 +52,7 @@ struct vfio_device {
/* Members below here are private, not for driver use */
unsigned int index;
struct device device; /* device.kref covers object life circle */
+ struct cdev cdev;
refcount_t refcount; /* user count on registered device*/
unsigned int open_count;
struct completion comp;
--
2.34.1


2023-02-06 09:07:29

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 13/14] vfio: Add ioctls for device cdev using iommufd

This adds three vfio device ioctls for userspace using iommufd to set up
secure DMA context for device access.

VFIO_DEVICE_BIND_IOMMUFD: bind device to an iommufd, hence gain DMA
control provided by the iommufd. VFIO no
iommu mode is indicated by passing a minus
fd value.
VFIO_DEVICE_ATTACH_IOMMUFD_PT: attach device to IOAS, hw_pagetable
managed by iommufd. Attach can be
undo by VFIO_DEVICE_DETACH_IOMMUFD_PT
or device fd close.
VFIO_DEVICE_DETACH_IOMMUFD_PT: detach device from the current attached
IOAS or hw_pagetable managed by iommufd.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/device_cdev.c | 176 +++++++++++++++++++++++++++++++++++++
drivers/vfio/vfio.h | 33 ++++++-
drivers/vfio/vfio_main.c | 47 +++++++++-
include/linux/iommufd.h | 6 ++
include/uapi/linux/vfio.h | 86 ++++++++++++++++++
5 files changed, 343 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/device_cdev.c b/drivers/vfio/device_cdev.c
index f024833c9e2c..4105cc939b7b 100644
--- a/drivers/vfio/device_cdev.c
+++ b/drivers/vfio/device_cdev.c
@@ -3,6 +3,7 @@
* Copyright (c) 2023 Intel Corporation.
*/
#include <linux/vfio.h>
+#include <linux/iommufd.h>

#include "vfio.h"

@@ -46,6 +47,181 @@ int vfio_device_fops_open(struct inode *inode, struct file *filep)
return ret;
}

+void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+ mutex_lock(&df->device->dev_set->lock);
+ vfio_device_close(df);
+ vfio_device_put_kvm(df->device);
+ mutex_unlock(&df->device->dev_set->lock);
+}
+
+static void vfio_device_get_kvm_safe(struct vfio_device_file *df)
+{
+ spin_lock(&df->kvm_ref_lock);
+ if (!df->kvm)
+ goto unlock;
+
+ _vfio_device_get_kvm_safe(df->device, df->kvm);
+
+unlock:
+ spin_unlock(&df->kvm_ref_lock);
+}
+
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+ unsigned long arg)
+{
+ struct vfio_device *device = df->device;
+ struct vfio_device_bind_iommufd bind;
+ struct iommufd_ctx *iommufd = NULL;
+ unsigned long minsz;
+ struct fd f;
+ int ret;
+
+ minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);
+
+ if (copy_from_user(&bind, (void __user *)arg, minsz))
+ return -EFAULT;
+
+ if (bind.argsz < minsz || bind.flags)
+ return -EINVAL;
+
+ if (!device->ops->bind_iommufd)
+ return -ENODEV;
+
+ mutex_lock(&device->dev_set->lock);
+ /*
+ * If already been bound to an iommufd, or already set noiommu
+ * then fail it.
+ */
+ if (df->iommufd || df->noiommu) {
+ ret = -EINVAL;
+ goto out_unlock;
+ }
+
+ /* iommufd < 0 means noiommu mode */
+ if (bind.iommufd < 0) {
+ if (!capable(CAP_SYS_RAWIO)) {
+ ret = -EPERM;
+ goto out_unlock;
+ }
+ df->noiommu = true;
+ } else {
+ f = fdget(bind.iommufd);
+ if (!f.file) {
+ ret = -EBADF;
+ goto out_unlock;
+ }
+ iommufd = iommufd_ctx_from_file(f.file);
+ if (IS_ERR(iommufd)) {
+ ret = PTR_ERR(iommufd);
+ goto out_put_file;
+ }
+ }
+
+ /*
+ * Before the first device open, get the KVM pointer currently
+ * associated with the device file (if there is) and obtain a
+ * reference. This reference is held until device closed. Save
+ * the pointer in the device for use by drivers.
+ */
+ vfio_device_get_kvm_safe(df);
+
+ df->iommufd = iommufd;
+ ret = vfio_device_open(df, &bind.out_devid, NULL);
+ if (ret)
+ goto out_put_kvm;
+
+ ret = copy_to_user((void __user *)arg + minsz,
+ &bind.out_devid,
+ sizeof(bind.out_devid)) ? -EFAULT : 0;
+ if (ret)
+ goto out_close_device;
+
+ if (iommufd)
+ fdput(f);
+ else if (df->noiommu)
+ dev_warn(device->dev, "vfio-noiommu device used by user "
+ "(%s:%d)\n", current->comm, task_pid_nr(current));
+ mutex_unlock(&device->dev_set->lock);
+ return 0;
+
+out_close_device:
+ vfio_device_close(df);
+out_put_kvm:
+ vfio_device_put_kvm(device);
+out_put_file:
+ if (iommufd)
+ fdput(f);
+out_unlock:
+ df->iommufd = NULL;
+ df->noiommu = false;
+ mutex_unlock(&device->dev_set->lock);
+ return ret;
+}
+
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+ void __user *arg)
+{
+ struct vfio_device *device = df->device;
+ struct vfio_device_attach_iommufd_pt attach;
+ int ret;
+
+ if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
+ return -EFAULT;
+
+ if (attach.flags || attach.pt_id == IOMMUFD_INVALID_ID)
+ return -EINVAL;
+
+ if (!device->ops->bind_iommufd)
+ return -ENODEV;
+
+ mutex_lock(&device->dev_set->lock);
+ if (df->noiommu)
+ return -ENODEV;
+
+ ret = device->ops->attach_ioas(device, &attach.pt_id);
+ if (ret)
+ goto out_unlock;
+
+ ret = copy_to_user((void __user *)arg + offsetofend(
+ struct vfio_device_attach_iommufd_pt, flags),
+ &attach.pt_id,
+ sizeof(attach.pt_id)) ? -EFAULT : 0;
+ if (ret)
+ goto out_detach;
+ mutex_unlock(&device->dev_set->lock);
+ return 0;
+
+out_detach:
+ device->ops->detach_ioas(device);
+out_unlock:
+ mutex_unlock(&device->dev_set->lock);
+ return ret;
+}
+
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+ void __user *arg)
+{
+ struct vfio_device *device = df->device;
+ struct vfio_device_detach_iommufd_pt detach;
+
+ if (copy_from_user(&detach, (void __user *)arg, sizeof(detach)))
+ return -EFAULT;
+
+ if (detach.flags)
+ return -EINVAL;
+
+ if (!device->ops->bind_iommufd)
+ return -ENODEV;
+
+ mutex_lock(&device->dev_set->lock);
+ if (df->noiommu)
+ return -ENODEV;
+ device->ops->detach_ioas(device);
+ mutex_unlock(&device->dev_set->lock);
+ return 0;
+}
+
static char *vfio_device_devnode(const struct device *dev, umode_t *mode)
{
return kasprintf(GFP_KERNEL, "vfio/devices/%s", dev_name(dev));
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index c7c75865afec..8a290c1455e1 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -23,7 +23,9 @@ struct vfio_device_file {
bool access_granted;
spinlock_t kvm_ref_lock; /* protect kvm field */
struct kvm *kvm;
- struct iommufd_ctx *iommufd; /* protected by struct vfio_device_set::lock */
+ /* protected by struct vfio_device_set::lock */
+ struct iommufd_ctx *iommufd;
+ bool noiommu;
};

void vfio_device_put_registration(struct vfio_device *device);
@@ -253,6 +255,13 @@ static inline void vfio_iommufd_unbind(struct vfio_device *device)
#if IS_ENABLED(CONFIG_VFIO_DEVICE_CDEV)
void vfio_init_device_cdev(struct vfio_device *device);
int vfio_device_fops_open(struct inode *inode, struct file *filep);
+void vfio_device_cdev_close(struct vfio_device_file *df);
+long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+ unsigned long arg);
+int vfio_ioctl_device_attach(struct vfio_device_file *df,
+ void __user *arg);
+int vfio_ioctl_device_detach(struct vfio_device_file *df,
+ void __user *arg);
int vfio_cdev_init(struct class *device_class);
void vfio_cdev_cleanup(void);
#else
@@ -266,6 +275,28 @@ static inline int vfio_device_fops_open(struct inode *inode,
return 0;
}

+static inline void vfio_device_cdev_close(struct vfio_device_file *df)
+{
+}
+
+static inline long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
+ unsigned long arg)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int vfio_ioctl_device_attach(struct vfio_device_file *df,
+ void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline int vfio_ioctl_device_detach(struct vfio_device_file *df,
+ void __user *arg)
+{
+ return -EOPNOTSUPP;
+}
+
static inline int vfio_cdev_init(struct class *device_class)
{
return 0;
diff --git a/drivers/vfio/vfio_main.c b/drivers/vfio/vfio_main.c
index 035730dc6ad4..8559c3dfb335 100644
--- a/drivers/vfio/vfio_main.c
+++ b/drivers/vfio/vfio_main.c
@@ -37,6 +37,7 @@
#include <linux/interval_tree.h>
#include <linux/iova_bitmap.h>
#include <linux/iommufd.h>
+#include <uapi/linux/iommufd.h>
#include "vfio.h"

#define DRIVER_VERSION "0.3"
@@ -442,16 +443,41 @@ static int vfio_device_first_open(struct vfio_device_file *df,
{
struct vfio_device *device = df->device;
struct iommufd_ctx *iommufd = df->iommufd;
- int ret;
+ int ret = 0;

lockdep_assert_held(&device->dev_set->lock);

+ /* df->iommufd and df->noiommu should be exclusive */
+ if (WARN_ON(iommufd && df->noiommu))
+ return -EINVAL;
+
if (!try_module_get(device->dev->driver->owner))
return -ENODEV;

+ /*
+ * For group path, iommufd pointer is NULL when comes into this
+ * helper. Its noiommu support is in container.c.
+ *
+ * For iommufd compat mode, iommufd pointer here is a valid value.
+ * Its noiommu support is supposed to be in vfio_iommufd_bind().
+ *
+ * For device cdev path, iommufd pointer here is a valid value for
+ * normal cases, but it is NULL if it's noiommu. The reason is
+ * that userspace uses iommufd==-1 to indicate noiommu mode in this
+ * path. So caller of this helper will pass in a NULL iommufd
+ * pointer. To differentiate it from the group path which also
+ * passes NULL iommufd pointer in, df->noiommu is used. For cdev
+ * noiommu, df->noiommu would be set to mark noiommu case for cdev
+ * path.
+ *
+ * So if df->noiommu is set then this helper just goes ahead to
+ * open device. If not, it depends on if iommufd pointer is NULL
+ * to handle the group path, iommufd compat mode, normal cases in
+ * the cdev path.
+ */
if (iommufd)
ret = vfio_iommufd_bind(device, iommufd, dev_id, pt_id);
- else
+ else if (!df->noiommu)
ret = vfio_device_group_use_iommu(device);
if (ret)
goto err_module_put;
@@ -466,7 +492,7 @@ static int vfio_device_first_open(struct vfio_device_file *df,
err_unuse_iommu:
if (iommufd)
vfio_iommufd_unbind(device);
- else
+ else if (!df->noiommu)
vfio_device_group_unuse_iommu(device);
err_module_put:
module_put(device->dev->driver->owner);
@@ -484,7 +510,7 @@ static void vfio_device_last_close(struct vfio_device_file *df)
device->ops->close_device(device);
if (iommufd)
vfio_iommufd_unbind(device);
- else
+ else if (!df->noiommu)
vfio_device_group_unuse_iommu(device);
module_put(device->dev->driver->owner);
}
@@ -588,6 +614,8 @@ static int vfio_device_fops_release(struct inode *inode, struct file *filep)
*/
if (!df->is_cdev_device)
vfio_device_group_close(df);
+ else
+ vfio_device_cdev_close(df);

vfio_device_put_registration(device);

@@ -1162,6 +1190,9 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
bool access;
int ret;

+ if (cmd == VFIO_DEVICE_BIND_IOMMUFD)
+ return vfio_device_ioctl_bind_iommufd(df, arg);
+
/* Paired with smp_store_release() in vfio_device_open() */
access = smp_load_acquire(&df->access_granted);
if (!access)
@@ -1176,6 +1207,14 @@ static long vfio_device_fops_unl_ioctl(struct file *filep,
ret = vfio_ioctl_device_feature(device, (void __user *)arg);
break;

+ case VFIO_DEVICE_ATTACH_IOMMUFD_PT:
+ ret = vfio_ioctl_device_attach(df, (void __user *)arg);
+ break;
+
+ case VFIO_DEVICE_DETACH_IOMMUFD_PT:
+ ret = vfio_ioctl_device_detach(df, (void __user *)arg);
+ break;
+
default:
if (unlikely(!device->ops->ioctl))
ret = -EINVAL;
diff --git a/include/linux/iommufd.h b/include/linux/iommufd.h
index 650d45629647..9672cf839687 100644
--- a/include/linux/iommufd.h
+++ b/include/linux/iommufd.h
@@ -17,6 +17,12 @@ struct iommufd_ctx;
struct iommufd_access;
struct file;

+/*
+ * iommufd core init xarray with flags==XA_FLAGS_ALLOC1, so valid
+ * ID starts from 1.
+ */
+#define IOMMUFD_INVALID_ID 0
+
struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
struct device *dev, u32 *id);
void iommufd_device_unbind(struct iommufd_device *idev);
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 23105eb036fa..c86cfe442884 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -190,6 +190,92 @@ struct vfio_group_status {

/* --------------- IOCTLs for DEVICE file descriptors --------------- */

+/*
+ * VFIO_DEVICE_BIND_IOMMUFD - _IOR(VFIO_TYPE, VFIO_BASE + 19,
+ * struct vfio_device_bind_iommufd)
+ *
+ * Bind a vfio_device to the specified iommufd.
+ *
+ * The user should provide a device cookie when calling this ioctl. The
+ * cookie is carried only in event e.g. I/O fault reported to userspace
+ * via iommufd. The user should use devid returned by this ioctl to mark
+ * the target device in other ioctls (e.g. capability query via iommufd).
+ *
+ * User is not allowed to access the device before the binding operation
+ * is completed.
+ *
+ * Unbind is automatically conducted when device fd is closed.
+ *
+ * @argsz: user filled size of this data.
+ * @flags: reserved for future extension.
+ * @dev_cookie: a per device cookie provided by userspace.
+ * @iommufd: iommufd to bind. iommufd < 0 means noiommu.
+ * @out_devid: the device id generated by this bind.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_bind_iommufd {
+ __u32 argsz;
+ __u32 flags;
+ __aligned_u64 dev_cookie;
+ __s32 iommufd;
+ __u32 out_devid;
+};
+
+#define VFIO_DEVICE_BIND_IOMMUFD _IO(VFIO_TYPE, VFIO_BASE + 19)
+
+/*
+ * VFIO_DEVICE_ATTACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 20,
+ * struct vfio_device_attach_iommufd_pt)
+ *
+ * Attach a vfio device to an iommufd address space specified by IOAS
+ * id or hw_pagetable (hwpt) id.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD
+ *
+ * Undo by VFIO_DEVICE_DETACH_IOMMUFD_PT or device fd close.
+ *
+ * @argsz: user filled size of this data.
+ * @flags: must be 0.
+ * @pt_id: Input the target id which can represent an ioas or a hwpt
+ * allocated via iommufd subsystem.
+ * Output the attached hwpt id which could be the specified
+ * hwpt itself or a hwpt automatically created for the
+ * specified ioas by kernel during the attachment.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_attach_iommufd_pt {
+ __u32 argsz;
+ __u32 flags;
+ __u32 pt_id;
+};
+
+#define VFIO_DEVICE_ATTACH_IOMMUFD_PT _IO(VFIO_TYPE, VFIO_BASE + 20)
+
+/*
+ * VFIO_DEVICE_DETACH_IOMMUFD_PT - _IOW(VFIO_TYPE, VFIO_BASE + 21,
+ * struct vfio_device_detach_iommufd_pt)
+ *
+ * Detach a vfio device from the iommufd address space it has been
+ * attached to. After it, device should be in a blocking DMA state.
+ *
+ * Available only after a device has been bound to iommufd via
+ * VFIO_DEVICE_BIND_IOMMUFD
+ *
+ * @argsz: user filled size of this data.
+ * @flags: must be 0.
+ *
+ * Return: 0 on success, -errno on failure.
+ */
+struct vfio_device_detach_iommufd_pt {
+ __u32 argsz;
+ __u32 flags;
+};
+
+#define VFIO_DEVICE_DETACH_IOMMUFD_PT _IO(VFIO_TYPE, VFIO_BASE + 21)
+
/**
* VFIO_DEVICE_GET_INFO - _IOR(VFIO_TYPE, VFIO_BASE + 7,
* struct vfio_device_info)
--
2.34.1


2023-02-06 09:07:46

by Yi Liu

[permalink] [raw]
Subject: [PATCH v2 14/14] vfio: Compile group optionally

group code is not needed for vfio device cdev, so with vfio device cdev
introduced, the group infrastructures can be compiled out if only cdev
is needed.

Signed-off-by: Yi Liu <[email protected]>
---
drivers/vfio/Kconfig | 18 +++++++++++
drivers/vfio/Makefile | 2 +-
drivers/vfio/vfio.h | 69 +++++++++++++++++++++++++++++++++++++++++++
include/linux/vfio.h | 11 +++++++
4 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/drivers/vfio/Kconfig b/drivers/vfio/Kconfig
index 0476abf154f2..de0fedcdf4d6 100644
--- a/drivers/vfio/Kconfig
+++ b/drivers/vfio/Kconfig
@@ -15,6 +15,7 @@ if VFIO
config VFIO_DEVICE_CDEV
bool "Support for the VFIO cdev /dev/vfio/devices/vfioX"
depends on IOMMUFD
+ default !VFIO_GROUP
help
The VFIO device cdev is another way for userspace to get device
access. Userspace gets device fd by opening device cdev under
@@ -23,9 +24,26 @@ config VFIO_DEVICE_CDEV

If you don't know what to do here, say N.

+config VFIO_ENABLE_GROUP
+ bool
+ default !VFIO_DEVICE_CDEV
+
+config VFIO_GROUP
+ bool "Support for the VFIO group /dev/vfio/$group_id"
+ select VFIO_ENABLE_GROUP
+ default y
+ help
+ VFIO group is legacy interface for userspace. For userspaces
+ adapted to iommufd and vfio device cdev, this can be N. For
+ now, before iommufd is ready and userspace applications fully
+ converted to iommufd and vfio device cdev, this should be Y.
+
+ If you don't know what to do here, say Y.
+
config VFIO_CONTAINER
bool "Support for the VFIO container /dev/vfio/vfio"
select VFIO_IOMMU_TYPE1 if MMU && (X86 || S390 || ARM || ARM64)
+ depends on VFIO_ENABLE_GROUP
default y
help
The VFIO container is the classic interface to VFIO for establishing
diff --git a/drivers/vfio/Makefile b/drivers/vfio/Makefile
index 245394aeb94b..4e81c3bbed30 100644
--- a/drivers/vfio/Makefile
+++ b/drivers/vfio/Makefile
@@ -2,9 +2,9 @@
obj-$(CONFIG_VFIO) += vfio.o

vfio-y += vfio_main.o \
- group.o \
iova_bitmap.o
vfio-$(CONFIG_VFIO_DEVICE_CDEV) += device_cdev.o
+vfio-$(CONFIG_VFIO_ENABLE_GROUP) += group.o
vfio-$(CONFIG_IOMMUFD) += iommufd.o
vfio-$(CONFIG_VFIO_CONTAINER) += container.o
vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
diff --git a/drivers/vfio/vfio.h b/drivers/vfio/vfio.h
index 8a290c1455e1..9f823cfff7be 100644
--- a/drivers/vfio/vfio.h
+++ b/drivers/vfio/vfio.h
@@ -62,6 +62,7 @@ enum vfio_group_type {
VFIO_NO_IOMMU,
};

+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
struct vfio_group {
struct device dev;
struct cdev cdev;
@@ -105,6 +106,74 @@ bool vfio_group_has_dev(struct vfio_group *group, struct vfio_device *device);
bool vfio_device_has_container(struct vfio_device *device);
int __init vfio_group_init(void);
void vfio_group_cleanup(void);
+#else
+struct vfio_group;
+
+static inline int vfio_device_set_group(struct vfio_device *device,
+ enum vfio_group_type type)
+{
+ return 0;
+}
+
+static inline void vfio_device_remove_group(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_register(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_unregister(struct vfio_device *device)
+{
+}
+
+static inline int vfio_device_group_use_iommu(struct vfio_device *device)
+{
+ return -EOPNOTSUPP;
+}
+
+static inline void vfio_device_group_unuse_iommu(struct vfio_device *device)
+{
+}
+
+static inline void vfio_device_group_close(struct vfio_device_file *df)
+{
+}
+
+static inline struct vfio_group *vfio_group_from_file(struct file *file)
+{
+ return NULL;
+}
+
+static inline bool vfio_group_enforced_coherent(struct vfio_group *group)
+{
+ return true;
+}
+
+static inline void vfio_group_set_kvm(struct vfio_group *group, struct kvm *kvm)
+{
+}
+
+static inline bool vfio_group_has_dev(struct vfio_group *group,
+ struct vfio_device *device)
+{
+ return false;
+}
+
+static inline bool vfio_device_has_container(struct vfio_device *device)
+{
+ return false;
+}
+
+static inline int __init vfio_group_init(void)
+{
+ return 0;
+}
+
+static inline void vfio_group_cleanup(void)
+{
+}
+#endif /* CONFIG_VFIO_ENABLE_GROUP */

#if IS_ENABLED(CONFIG_VFIO_CONTAINER)
/* events for the backend driver notify callback */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index fd4bf9c21ffe..4c7666f8c5da 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -43,7 +43,9 @@ struct vfio_device {
*/
const struct vfio_migration_ops *mig_ops;
const struct vfio_log_ops *log_ops;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
struct vfio_group *group;
+#endif
struct vfio_device_set *dev_set;
struct list_head dev_set_list;
unsigned int migration_flags;
@@ -56,8 +58,10 @@ struct vfio_device {
refcount_t refcount; /* user count on registered device*/
unsigned int open_count;
struct completion comp;
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
struct list_head group_next;
struct list_head iommu_entry;
+#endif
struct iommufd_access *iommufd_access;
void (*put_kvm)(struct kvm *kvm);
#if IS_ENABLED(CONFIG_IOMMUFD)
@@ -255,7 +259,14 @@ int vfio_mig_get_next_state(struct vfio_device *device,
/*
* External user API
*/
+#if IS_ENABLED(CONFIG_VFIO_ENABLE_GROUP)
struct iommu_group *vfio_file_iommu_group(struct file *file);
+#else
+static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
+{
+ return NULL;
+}
+#endif
bool vfio_file_is_valid(struct file *file);
bool vfio_file_enforced_coherent(struct file *file);
void vfio_file_set_kvm(struct file *file, struct kvm *kvm);
--
2.34.1


2023-02-07 03:25:30

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 03/14] vfio: Accept vfio device file in the driver facing kAPI

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> This makes the vfio file kAPIs to accepte vfio device files, also a
> preparation for vfio device cdev support.
>
> For the kvm set with vfio device file, kvm pointer is stored in struct
> vfio_device_file, and use kvm_ref_lock to protect kvm set and kvm
> pointer usage within VFIO. This kvm pointer will be set to vfio_device
> after device file is bound to iommufd in the cdev path.
>
> Signed-off-by: Yi Liu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2023-02-07 03:36:58

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 05/14] kvm/vfio: Accept vfio device file from userspace

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> This defines KVM_DEV_VFIO_FILE* and make alias with
> KVM_DEV_VFIO_GROUP*.
> Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
>
> Signed-off-by: Yi Liu <[email protected]>
> ---
> Documentation/virt/kvm/devices/vfio.rst | 40 ++++++++++++++-----------
> include/uapi/linux/kvm.h | 16 +++++++---
> virt/kvm/vfio.c | 16 +++++-----
> 3 files changed, 42 insertions(+), 30 deletions(-)
>
> diff --git a/Documentation/virt/kvm/devices/vfio.rst
> b/Documentation/virt/kvm/devices/vfio.rst
> index 2d20dc561069..7f84ec26ca4a 100644
> --- a/Documentation/virt/kvm/devices/vfio.rst
> +++ b/Documentation/virt/kvm/devices/vfio.rst
> @@ -9,23 +9,26 @@ Device types supported:
> - KVM_DEV_TYPE_VFIO
>
> Only one VFIO instance may be created per VM. The created device
> -tracks VFIO groups in use by the VM and features of those groups
> -important to the correctness and acceleration of the VM. As groups
> -are enabled and disabled for use by the VM, KVM should be updated
> -about their presence. When registered with KVM, a reference to the
> -VFIO-group is held by KVM.
> +tracks VFIO files (group or device) in use by the VM and features
> +of those groups/devices important to the correctness and acceleration
> +of the VM. As groups/devices are enabled and disabled for use by the
> +VM, KVM should be updated about their presence. When registered with
> +KVM, a reference to the VFIO file is held by KVM.
>
> Groups:

"Files"

> - KVM_DEV_VFIO_GROUP
> -
> -KVM_DEV_VFIO_GROUP attributes:
> - KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device
> tracking
> - kvm_device_attr.addr points to an int32_t file descriptor
> - for the VFIO group.
> - KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM
> device tracking
> - kvm_device_attr.addr points to an int32_t file descriptor
> - for the VFIO group.
> - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE
> table
> + KVM_DEV_VFIO_FILE
> + - alias: KVM_DEV_VFIO_GROUP
> +
> +KVM_DEV_VFIO_FILE attributes:
> + KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM
> device
> + tracking kvm_device_attr.addr points to an int32_t file descriptor

"... device tracking" and "kvm_device.attr.addr points to..." are two
sentences. They are deliberately arranged to be in different lines.

> + for the VFIO file.
> + - alias: KVM_DEV_VFIO_GROUP_ADD
> + KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from VFIO-
> KVM
> + device tracking kvm_device_attr.addr points to an int32_t file
> + descriptor for the VFIO file.

ditto

> + - alias: KVM_DEV_VFIO_GROUP_DEL
> + KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE table
> allocated by sPAPR KVM.

somehow here it should emphasize that the file can only be group

> kvm_device_attr.addr points to a struct::
>
> @@ -36,6 +39,7 @@ KVM_DEV_VFIO_GROUP attributes:
>
> where:
>
> - - @groupfd is a file descriptor for a VFIO group;
> - - @tablefd is a file descriptor for a TCE table allocated via
> - KVM_CREATE_SPAPR_TCE.
> + *) @groupfd is a file descriptor for a VFIO group;
> + *) @tablefd is a file descriptor for a TCE table allocated via

why changing above two lines?

> + KVM_CREATE_SPAPR_TCE.
> + - alias: KVM_DEV_VFIO_FILE_SET_SPAPR_TCE

GROUP


2023-02-07 03:42:33

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 06/14] vfio: Pass struct vfio_device_file * to vfio_device_open/close()

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> This avoids passing too much parameters in multiple functions.
>
> Signed-off-by: Yi Liu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2023-02-07 03:52:37

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 07/14] vfio: Block device access via device fd until device is opened

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> Allow the vfio_device file to be in a state where the device FD is
> opened but the device cannot be used by userspace (i.e. its .open_device()
> hasn't been called). This inbetween state is not used when the device
> FD is spawned from the group FD, however when we create the device FD
> directly by opening a cdev it will be opened in the blocked state.
>
> The reason for the inbetween state is userspace only gets a FD but

...is "that" userspace...

> doesn't have the secure until binding the FD to an iommufd. So in the

"doesn't gain access permission until..."

> blocked state, only the bind operation is allowed, other device accesses

remove the last part which duplicates with "only ... is allowed"

> are not allowed. Completing bind will allow user to further access the
> device.
>
> This is implemented by adding a flag in struct vfio_device_file to mark
> the blocked state and using a simple smp_load_acquire() to obtain the
> flag value and serialize all the device setup with the thread accessing
> this device.
>
> Following this lockless scheme, it can safely handle the device FD
> unbound->bound but it cannot handle bound->unbound. To allow this we'd
> need to add a lock on all the vfio ioctls which seems costly. So once
> device FD is bound, it remains bound until the FD is closed.
>
> Suggested-by: Jason Gunthorpe <[email protected]>
> Signed-off-by: Yi Liu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2023-02-07 03:57:34

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 08/14] vfio: Add infrastructure for bind_iommufd from userspace

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> For the device fd opened from cdev, userspace needs to bind it to an
> iommufd and attach it to IOAS managed by iommufd. With such operations,
> userspace can set up a secure DMA context and hence access device.
>
> This changes the existing vfio_iommufd_bind() to accept a pt_id pointer
> as an optional input, and also an dev_id pointer to selectively return
> the dev_id to prepare for adding bind_iommufd ioctl, which does the bind
> first and then attach IOAS.
>
> Signed-off-by: Yi Liu <[email protected]>

Reviewed-by: Kevin Tian <[email protected]>

2023-02-07 04:05:10

by Yan Zhao

[permalink] [raw]
Subject: Re: [PATCH v2 13/14] vfio: Add ioctls for device cdev using iommufd

On Mon, Feb 06, 2023 at 01:05:31AM -0800, Yi Liu wrote:
...

> +void vfio_device_cdev_close(struct vfio_device_file *df)
> +{
> + mutex_lock(&df->device->dev_set->lock);
> + vfio_device_close(df);
> + vfio_device_put_kvm(df->device);
> + mutex_unlock(&df->device->dev_set->lock);
> +}
> +

...

> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> + unsigned long arg)
> +{
> + struct vfio_device *device = df->device;
> + struct vfio_device_bind_iommufd bind;
> + struct iommufd_ctx *iommufd = NULL;
> + unsigned long minsz;
> + struct fd f;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);
> +
> + if (copy_from_user(&bind, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (bind.argsz < minsz || bind.flags)
> + return -EINVAL;
> +
> + if (!device->ops->bind_iommufd)
> + return -ENODEV;
> +
> + mutex_lock(&device->dev_set->lock);
> + /*
> + * If already been bound to an iommufd, or already set noiommu
> + * then fail it.
> + */
> + if (df->iommufd || df->noiommu) {
> + ret = -EINVAL;
> + goto out_unlock;
> + }
> +
> + /* iommufd < 0 means noiommu mode */
> + if (bind.iommufd < 0) {
> + if (!capable(CAP_SYS_RAWIO)) {
> + ret = -EPERM;
> + goto out_unlock;
> + }
> + df->noiommu = true;
> + } else {
> + f = fdget(bind.iommufd);
> + if (!f.file) {
> + ret = -EBADF;
> + goto out_unlock;
> + }
> + iommufd = iommufd_ctx_from_file(f.file);
> + if (IS_ERR(iommufd)) {
> + ret = PTR_ERR(iommufd);
> + goto out_put_file;
> + }
> + }
> +
> + /*
> + * Before the first device open, get the KVM pointer currently
> + * associated with the device file (if there is) and obtain a
> + * reference. This reference is held until device closed. Save
> + * the pointer in the device for use by drivers.
> + */
> + vfio_device_get_kvm_safe(df);
> +
> + df->iommufd = iommufd;
> + ret = vfio_device_open(df, &bind.out_devid, NULL);
> + if (ret)
> + goto out_put_kvm;
> +
> + ret = copy_to_user((void __user *)arg + minsz,
> + &bind.out_devid,
> + sizeof(bind.out_devid)) ? -EFAULT : 0;
> + if (ret)
> + goto out_close_device;
> +
> + if (iommufd)
> + fdput(f);
> + else if (df->noiommu)
> + dev_warn(device->dev, "vfio-noiommu device used by user "
> + "(%s:%d)\n", current->comm, task_pid_nr(current));
> + mutex_unlock(&device->dev_set->lock);
> + return 0;
> +
> +out_close_device:
> + vfio_device_close(df);
vfio_device_close() is called here if any error after vfio_device_open().
But it will also be called unconditionally in vfio_device_cdev_close() and
cause a wrong value of device->open_count.

df->access_granted in patch 07 can also be of wrong true value after
this vfio_device_close().


> +out_put_kvm:
> + vfio_device_put_kvm(device);
> +out_put_file:
> + if (iommufd)
> + fdput(f);
> +out_unlock:
> + df->iommufd = NULL;
> + df->noiommu = false;
> + mutex_unlock(&device->dev_set->lock);
> + return ret;
> +}
> +

2023-02-07 06:07:22

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 09/14] vfio-iommufd: Add detach_ioas support for physical VFIO devices

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> +static void __vfio_iommufd_detach(struct vfio_device *vdev)
> +{
> + iommufd_device_detach(vdev->iommufd_device);
> + vdev->iommufd_attached = false;
> +}
> +
> void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
> {
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - if (vdev->iommufd_attached) {
> - iommufd_device_detach(vdev->iommufd_device);
> - vdev->iommufd_attached = false;
> - }
> + if (vdev->iommufd_attached)
> + __vfio_iommufd_detach(vdev);

I'm not sure whether this abstraction really improves things.

Just two callers. and the old code reads clearer to me which
checks a flag, does something and then clear the flag.

> @@ -74,6 +74,7 @@ struct vfio_device {
> * @unbind_iommufd: Opposite of bind_iommufd
> * @attach_ioas: Called when attaching device to an IOAS/HWPT managed
> by the
> * bound iommufd. Undo in unbind_iommufd.

"Undo in unbind_iommufd if @detach_ioas is not called".

> + * @detach_ioas: Opposite of attach_ioas, this is for runtime undo.

remove "this is for runtime undo" which is confusing.

2023-02-07 06:08:42

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 10/14] vfio-iommufd: Add detach_ioas for emulated VFIO devices

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
> @@ -149,14 +149,18 @@ int vfio_iommufd_emulated_bind(struct
> vfio_device *vdev,
> }
> EXPORT_SYMBOL_GPL(vfio_iommufd_emulated_bind);
>
> +static void __vfio_iommufd_access_destroy(struct vfio_device *vdev)
> +{
> + iommufd_access_destroy(vdev->iommufd_access);
> + vdev->iommufd_access = NULL;
> +}
> +
> void vfio_iommufd_emulated_unbind(struct vfio_device *vdev)
> {
> lockdep_assert_held(&vdev->dev_set->lock);
>
> - if (vdev->iommufd_access) {
> - iommufd_access_destroy(vdev->iommufd_access);
> - vdev->iommufd_access = NULL;
> - }
> + if (vdev->iommufd_access)
> + __vfio_iommufd_access_destroy(vdev);

same comment as in last patch.

2023-02-07 06:25:07

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 11/14] vfio: Make vfio_device_open() exclusive between group path and device cdev path

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:05 PM
>
> struct vfio_device_file *
> -vfio_allocate_device_file(struct vfio_device *device)
> +vfio_allocate_device_file(struct vfio_device *device, bool is_cdev_device)
> {
> struct vfio_device_file *df;
>
> @@ -407,6 +407,7 @@ vfio_allocate_device_file(struct vfio_device *device)
> return ERR_PTR(-ENOMEM);
>
> df->device = device;
> + df->is_cdev_device = is_cdev_device;

You missed Alex's comment that the one caller can open code the
assignment instead of introducing an unmemorable Boolean arg here.

>
> + /*
> + * Device cdev path cannot support multiple device open since
> + * it doesn't have a secure way for it. So a second device
> + * open attempt should be failed if the caller is from a cdev
> + * path or the device has already been opened by a cdev path.
> + */
> + if (device->open_count != 0 &&
> + (df->is_cdev_device || device->single_open))
> + return -EINVAL;
> +

If we are gonna handle the exclusive open via dma ownership, then
here we don't need a new single_open flag inside vfio_device since
only one interface (cdev or group) could be used to open this device.

Just preventing multi-open in cdev is sufficient here.

2023-02-07 07:51:57

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v2 13/14] vfio: Add ioctls for device cdev using iommufd

> From: Yan Zhao <[email protected]>
> Sent: Tuesday, February 7, 2023 11:41 AM
>
> On Mon, Feb 06, 2023 at 01:05:31AM -0800, Yi Liu wrote:
> ...
>
> > +void vfio_device_cdev_close(struct vfio_device_file *df)
> > +{
> > + mutex_lock(&df->device->dev_set->lock);
> > + vfio_device_close(df);
> > + vfio_device_put_kvm(df->device);
> > + mutex_unlock(&df->device->dev_set->lock);
> > +}
> > +
>
> ...
>
> > +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> > + unsigned long arg)
> > +{
> > + struct vfio_device *device = df->device;
> > + struct vfio_device_bind_iommufd bind;
> > + struct iommufd_ctx *iommufd = NULL;
> > + unsigned long minsz;
> > + struct fd f;
> > + int ret;
> > +
> > + minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);
> > +
> > + if (copy_from_user(&bind, (void __user *)arg, minsz))
> > + return -EFAULT;
> > +
> > + if (bind.argsz < minsz || bind.flags)
> > + return -EINVAL;
> > +
> > + if (!device->ops->bind_iommufd)
> > + return -ENODEV;
> > +
> > + mutex_lock(&device->dev_set->lock);
> > + /*
> > + * If already been bound to an iommufd, or already set noiommu
> > + * then fail it.
> > + */
> > + if (df->iommufd || df->noiommu) {
> > + ret = -EINVAL;
> > + goto out_unlock;
> > + }
> > +
> > + /* iommufd < 0 means noiommu mode */
> > + if (bind.iommufd < 0) {
> > + if (!capable(CAP_SYS_RAWIO)) {
> > + ret = -EPERM;
> > + goto out_unlock;
> > + }
> > + df->noiommu = true;
> > + } else {
> > + f = fdget(bind.iommufd);
> > + if (!f.file) {
> > + ret = -EBADF;
> > + goto out_unlock;
> > + }
> > + iommufd = iommufd_ctx_from_file(f.file);
> > + if (IS_ERR(iommufd)) {
> > + ret = PTR_ERR(iommufd);
> > + goto out_put_file;
> > + }
> > + }
> > +
> > + /*
> > + * Before the first device open, get the KVM pointer currently
> > + * associated with the device file (if there is) and obtain a
> > + * reference. This reference is held until device closed. Save
> > + * the pointer in the device for use by drivers.
> > + */
> > + vfio_device_get_kvm_safe(df);
> > +
> > + df->iommufd = iommufd;
> > + ret = vfio_device_open(df, &bind.out_devid, NULL);
> > + if (ret)
> > + goto out_put_kvm;
> > +
> > + ret = copy_to_user((void __user *)arg + minsz,
> > + &bind.out_devid,
> > + sizeof(bind.out_devid)) ? -EFAULT : 0;
> > + if (ret)
> > + goto out_close_device;
> > +
> > + if (iommufd)
> > + fdput(f);
> > + else if (df->noiommu)
> > + dev_warn(device->dev, "vfio-noiommu device used by user
> "
> > + "(%s:%d)\n", current->comm,
> task_pid_nr(current));
> > + mutex_unlock(&device->dev_set->lock);
> > + return 0;
> > +
> > +out_close_device:
> > + vfio_device_close(df);
> vfio_device_close() is called here if any error after vfio_device_open().
> But it will also be called unconditionally in vfio_device_cdev_close() and
> cause a wrong value of device->open_count.

Oh, yes yes. Good catch. Vfio_device_cdev_close() should check either the
open_count or access_granted.

> df->access_granted in patch 07 can also be of wrong true value after
> this vfio_device_close().

access_granted will surely be wrong if open_count is not correctly
counted.

Regards,
Yi Liu

2023-02-07 07:54:16

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 12/14] vfio: Add cdev for vfio_device

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:06 PM
> +
> +static struct vfio {
> + dev_t device_devt;
> +} vfio;

the structure is meaningless.

> +int vfio_device_fops_open(struct inode *inode, struct file *filep)
> +{
> + struct vfio_device *device = container_of(inode->i_cdev,
> + struct vfio_device, cdev);
> + struct vfio_device_file *df;
> + int ret;
> +
> + if (!vfio_device_try_get_registration(device))
> + return -ENODEV;
> +
> + /*
> + * device access is blocked until .open_device() is called
> + * in BIND_IOMMUFD.
> + */

this comment is more related to the whole function instead of the
following allocation code. Move it to be the function comment.

> @@ -51,6 +52,7 @@ struct vfio_device {
> /* Members below here are private, not for driver use */
> unsigned int index;
> struct device device; /* device.kref covers object life circle */
> + struct cdev cdev;

only if CDEV is configured.

2023-02-07 08:39:37

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v2 05/14] kvm/vfio: Accept vfio device file from userspace

> From: Tian, Kevin <[email protected]>
> Sent: Tuesday, February 7, 2023 11:37 AM
>
> > From: Liu, Yi L <[email protected]>
> > Sent: Monday, February 6, 2023 5:05 PM
> >
> > This defines KVM_DEV_VFIO_FILE* and make alias with
> > KVM_DEV_VFIO_GROUP*.
> > Old userspace uses KVM_DEV_VFIO_GROUP* works as well.
> >
> > Signed-off-by: Yi Liu <[email protected]>
> > ---
> > Documentation/virt/kvm/devices/vfio.rst | 40 ++++++++++++++----------
> -
> > include/uapi/linux/kvm.h | 16 +++++++---
> > virt/kvm/vfio.c | 16 +++++-----
> > 3 files changed, 42 insertions(+), 30 deletions(-)
> >
> > diff --git a/Documentation/virt/kvm/devices/vfio.rst
> > b/Documentation/virt/kvm/devices/vfio.rst
> > index 2d20dc561069..7f84ec26ca4a 100644
> > --- a/Documentation/virt/kvm/devices/vfio.rst
> > +++ b/Documentation/virt/kvm/devices/vfio.rst
> > @@ -9,23 +9,26 @@ Device types supported:
> > - KVM_DEV_TYPE_VFIO
> >
> > Only one VFIO instance may be created per VM. The created device
> > -tracks VFIO groups in use by the VM and features of those groups
> > -important to the correctness and acceleration of the VM. As groups
> > -are enabled and disabled for use by the VM, KVM should be updated
> > -about their presence. When registered with KVM, a reference to the
> > -VFIO-group is held by KVM.
> > +tracks VFIO files (group or device) in use by the VM and features
> > +of those groups/devices important to the correctness and acceleration
> > +of the VM. As groups/devices are enabled and disabled for use by the
> > +VM, KVM should be updated about their presence. When registered
> with
> > +KVM, a reference to the VFIO file is held by KVM.
> >
> > Groups:
>
> "Files"

It should be "Groups" ???? Here "Groups" means subcmd groups.

>
> > - KVM_DEV_VFIO_GROUP
> > -
> > -KVM_DEV_VFIO_GROUP attributes:
> > - KVM_DEV_VFIO_GROUP_ADD: Add a VFIO group to VFIO-KVM device
> > tracking
> > - kvm_device_attr.addr points to an int32_t file descriptor
> > - for the VFIO group.
> > - KVM_DEV_VFIO_GROUP_DEL: Remove a VFIO group from VFIO-KVM
> > device tracking
> > - kvm_device_attr.addr points to an int32_t file descriptor
> > - for the VFIO group.
> > - KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE: attaches a guest visible TCE
> > table
> > + KVM_DEV_VFIO_FILE
> > + - alias: KVM_DEV_VFIO_GROUP
> > +
> > +KVM_DEV_VFIO_FILE attributes:
> > + KVM_DEV_VFIO_FILE_ADD: Add a VFIO file (group/device) to VFIO-KVM
> > device
> > + tracking kvm_device_attr.addr points to an int32_t file descriptor
>
> "... device tracking" and "kvm_device.attr.addr points to..." are two
> sentences. They are deliberately arranged to be in different lines.

Oh, yes.

> > + for the VFIO file.
> > + - alias: KVM_DEV_VFIO_GROUP_ADD
> > + KVM_DEV_VFIO_FILE_DEL: Remove a VFIO file (group/device) from
> VFIO-
> > KVM
> > + device tracking kvm_device_attr.addr points to an int32_t file
> > + descriptor for the VFIO file.
>
> ditto

Will convert.

>
> > + - alias: KVM_DEV_VFIO_GROUP_DEL
> > + KVM_DEV_VFIO_FILE_SET_SPAPR_TCE: attaches a guest visible TCE
> table
> > allocated by sPAPR KVM.
>
> somehow here it should emphasize that the file can only be group

Yes.

> > kvm_device_attr.addr points to a struct::
> >
> > @@ -36,6 +39,7 @@ KVM_DEV_VFIO_GROUP attributes:
> >
> > where:
> >
> > - - @groupfd is a file descriptor for a VFIO group;
> > - - @tablefd is a file descriptor for a TCE table allocated via
> > - KVM_CREATE_SPAPR_TCE.
> > + *) @groupfd is a file descriptor for a VFIO group;
> > + *) @tablefd is a file descriptor for a TCE table allocated via
>
> why changing above two lines?

this is due to changing "-" to be "*)" as subbullet as below need
to add alias.

> > + KVM_CREATE_SPAPR_TCE.
> > + - alias: KVM_DEV_VFIO_FILE_SET_SPAPR_TCE
>
> GROUP

Yes.

Regards,
Yi Liu

2023-02-07 08:54:45

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v2 11/14] vfio: Make vfio_device_open() exclusive between group path and device cdev path

> From: Tian, Kevin <[email protected]>
> Sent: Tuesday, February 7, 2023 2:25 PM
>
> > From: Liu, Yi L <[email protected]>
> > Sent: Monday, February 6, 2023 5:05 PM
> >
> > struct vfio_device_file *
> > -vfio_allocate_device_file(struct vfio_device *device)
> > +vfio_allocate_device_file(struct vfio_device *device, bool
> is_cdev_device)
> > {
> > struct vfio_device_file *df;
> >
> > @@ -407,6 +407,7 @@ vfio_allocate_device_file(struct vfio_device
> *device)
> > return ERR_PTR(-ENOMEM);
> >
> > df->device = device;
> > + df->is_cdev_device = is_cdev_device;
>
> You missed Alex's comment that the one caller can open code the
> assignment instead of introducing an unmemorable Boolean arg here.

Oh, yes. will open code to set it in cdev path.

> >
> > + /*
> > + * Device cdev path cannot support multiple device open since
> > + * it doesn't have a secure way for it. So a second device
> > + * open attempt should be failed if the caller is from a cdev
> > + * path or the device has already been opened by a cdev path.
> > + */
> > + if (device->open_count != 0 &&
> > + (df->is_cdev_device || device->single_open))
> > + return -EINVAL;
> > +
>
> If we are gonna handle the exclusive open via dma ownership, then
> here we don't need a new single_open flag inside vfio_device since
> only one interface (cdev or group) could be used to open this device.
>
> Just preventing multi-open in cdev is sufficient here.

I see. Per below discussion, just need to make group path always
use vfio_group pointer as DMA marker.

https://lore.kernel.org/kvm/BN9PR11MB5276FA68E8CAD6394A8887848CDB9@BN9PR11MB5276.namprd11.prod.outlook.com/

Regards,
Yi Liu

2023-02-07 08:56:57

by Yi Liu

[permalink] [raw]
Subject: RE: [PATCH v2 09/14] vfio-iommufd: Add detach_ioas support for physical VFIO devices

> From: Tian, Kevin <[email protected]>
> Sent: Tuesday, February 7, 2023 2:07 PM
>
> > From: Liu, Yi L <[email protected]>
> > Sent: Monday, February 6, 2023 5:05 PM
> >
> > +static void __vfio_iommufd_detach(struct vfio_device *vdev)
> > +{
> > + iommufd_device_detach(vdev->iommufd_device);
> > + vdev->iommufd_attached = false;
> > +}
> > +
> > void vfio_iommufd_physical_unbind(struct vfio_device *vdev)
> > {
> > lockdep_assert_held(&vdev->dev_set->lock);
> >
> > - if (vdev->iommufd_attached) {
> > - iommufd_device_detach(vdev->iommufd_device);
> > - vdev->iommufd_attached = false;
> > - }
> > + if (vdev->iommufd_attached)
> > + __vfio_iommufd_detach(vdev);
>
> I'm not sure whether this abstraction really improves things.
>
> Just two callers. and the old code reads clearer to me which
> checks a flag, does something and then clear the flag.

Ok. Will revert it.

>
> > @@ -74,6 +74,7 @@ struct vfio_device {
> > * @unbind_iommufd: Opposite of bind_iommufd
> > * @attach_ioas: Called when attaching device to an IOAS/HWPT
> managed
> > by the
> > * bound iommufd. Undo in unbind_iommufd.
>
> "Undo in unbind_iommufd if @detach_ioas is not called".
>
> > + * @detach_ioas: Opposite of attach_ioas, this is for runtime undo.
>
> remove "this is for runtime undo" which is confusing.

Ok, clear with your above suggestion.

Regards,
Yi Liu

2023-02-07 09:17:46

by Tian, Kevin

[permalink] [raw]
Subject: RE: [PATCH v2 13/14] vfio: Add ioctls for device cdev using iommufd

> From: Liu, Yi L <[email protected]>
> Sent: Monday, February 6, 2023 5:06 PM
>
> +long vfio_device_ioctl_bind_iommufd(struct vfio_device_file *df,
> + unsigned long arg)
> +{
> + struct vfio_device *device = df->device;
> + struct vfio_device_bind_iommufd bind;
> + struct iommufd_ctx *iommufd = NULL;
> + unsigned long minsz;
> + struct fd f;
> + int ret;
> +
> + minsz = offsetofend(struct vfio_device_bind_iommufd, iommufd);

shouldn't the minsz include out_devid?

> + /*
> + * Before the first device open, get the KVM pointer currently
> + * associated with the device file (if there is) and obtain a
> + * reference. This reference is held until device closed. Save
> + * the pointer in the device for use by drivers.
> + */
> + vfio_device_get_kvm_safe(df);

there is no multi-open for cdev so "before the first device open"
doesn't make sense here.

> +int vfio_ioctl_device_attach(struct vfio_device_file *df,
> + void __user *arg)
> +{
> + struct vfio_device *device = df->device;
> + struct vfio_device_attach_iommufd_pt attach;
> + int ret;
> +
> + if (copy_from_user(&attach, (void __user *)arg, sizeof(attach)))
> + return -EFAULT;

lack of a minsz check

> +
> + if (attach.flags || attach.pt_id == IOMMUFD_INVALID_ID)
> + return -EINVAL;
> +
> + if (!device->ops->bind_iommufd)
> + return -ENODEV;
> +
> + mutex_lock(&device->dev_set->lock);
> + if (df->noiommu)
> + return -ENODEV;

-EINVAL

> +
> +int vfio_ioctl_device_detach(struct vfio_device_file *df,
> + void __user *arg)
> +{
> + struct vfio_device *device = df->device;
> + struct vfio_device_detach_iommufd_pt detach;
> +
> + if (copy_from_user(&detach, (void __user *)arg, sizeof(detach)))
> + return -EFAULT;

lack of minsz check

> + mutex_lock(&device->dev_set->lock);
> + if (df->noiommu)
> + return -ENODEV;

-EINVAL

> @@ -442,16 +443,41 @@ static int vfio_device_first_open(struct
> vfio_device_file *df,
> {
> struct vfio_device *device = df->device;
> struct iommufd_ctx *iommufd = df->iommufd;
> - int ret;
> + int ret = 0;
>
> lockdep_assert_held(&device->dev_set->lock);
>
> + /* df->iommufd and df->noiommu should be exclusive */
> + if (WARN_ON(iommufd && df->noiommu))

pointless comment

> + return -EINVAL;
> +
> if (!try_module_get(device->dev->driver->owner))
> return -ENODEV;
>
> + /*
> + * For group path, iommufd pointer is NULL when comes into this

"For group/container path"

> + * helper. Its noiommu support is in container.c.
> + *
> + * For iommufd compat mode, iommufd pointer here is a valid value.
> + * Its noiommu support is supposed to be in vfio_iommufd_bind().

remove "supposed to be"

> + *
> + * For device cdev path, iommufd pointer here is a valid value for
> + * normal cases, but it is NULL if it's noiommu. The reason is
> + * that userspace uses iommufd==-1 to indicate noiommu mode in
> this

"iommufd < 0"

> + * path. So caller of this helper will pass in a NULL iommufd

I don't think that is the reason. Instead the caller is required to pass
NULL iommufd pointer to indicate noiommu. Just remove the reason part.

> + * pointer. To differentiate it from the group path which also
> + * passes NULL iommufd pointer in, df->noiommu is used. For cdev
> + * noiommu, df->noiommu would be set to mark noiommu case for
> cdev
> + * path.

"To differentiate ..., check df->noiommu which is set only in the cdev path"

> + *
> + * So if df->noiommu is set then this helper just goes ahead to
> + * open device. If not, it depends on if iommufd pointer is NULL
> + * to handle the group path, iommufd compat mode, normal cases in
> + * the cdev path.

this doesn't match the code order. Probably can be removed after earlier
explanation.

> + * @argsz: user filled size of this data.
> + * @flags: reserved for future extension.
> + * @dev_cookie: a per device cookie provided by userspace.
> + * @iommufd: iommufd to bind. iommufd < 0 means noiommu.

s/iommufd < 0/a negative value/


2023-02-10 11:32:56

by Joao Martins

[permalink] [raw]
Subject: Re: [PATCH v2 12/14] vfio: Add cdev for vfio_device

Hey Yi,

On 06/02/2023 09:05, Yi Liu wrote:
> This allows user to directly open a vfio device w/o using the legacy
> container/group interface, as a prerequisite for supporting new iommu
> features like nested translation.
>
> The device fd opened in this manner doesn't have the capability to access
> the device as the fops open() doesn't open the device until the successful
> BIND_IOMMUFD which be added in next patch.
>
> With this patch, devices registered to vfio core have both group and device
> interface created.
>
> - group interface : /dev/vfio/$groupID
> - device interface: /dev/vfio/devices/vfioX (X is the minor number and
> is unique across devices)
>
> Given a vfio device the user can identify the matching vfioX by checking
> the sysfs path of the device. Take PCI device (0000:6a:01.0) for example,
> /sys/bus/pci/devices/0000\:6a\:01.0/vfio-dev/vfio0/dev contains the
> major:minor of the matching vfioX.
>
> Userspace then opens the /dev/vfio/devices/vfioX and checks with fstat
> that the major:minor matches.
>
> The vfio_device cdev logic in this patch:
> *) __vfio_register_dev() path ends up doing cdev_device_add() for each
> vfio_device;
> *) vfio_unregister_group_dev() path does cdev_device_del();
>
> Signed-off-by: Yi Liu <[email protected]>
> Signed-off-by: Joao Martins <[email protected]>
Feel free to drop my SoB. The code I added/moved at the time was very very tiny
in the middle of rebasing various series ... solely to unblock folks that were
also testing when IOMMUFD going out of RFC. But they don't justify a second
author SoB (thanks regardless!)

Joao