2023-10-21 09:29:06

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

In order to reduce needlessly high setup and teardown cost
of iotlb mapping during live migration, it's crucial to
decouple the vhost-vdpa iotlb abstraction from the virtio
device life cycle, i.e. iotlb mappings should be left
intact across virtio device reset [1]. For it to work, the
on-chip IOMMU parent device could implement a separate
.reset_map() operation callback to restore 1:1 DMA mapping
without having to resort to the .reset() callback, the
latter of which is mainly used to reset virtio device state.
This new .reset_map() callback will be invoked only before
the vhost-vdpa driver is to be removed and detached from
the vdpa bus, such that other vdpa bus drivers, e.g.
virtio-vdpa, can start with 1:1 DMA mapping when they
are attached. For the context, those on-chip IOMMU parent
devices, create the 1:1 DMA mapping at vdpa device creation,
and they would implicitly destroy the 1:1 mapping when
the first .set_map or .dma_map callback is invoked.

This patchset is rebased on top of the latest vhost tree.

[1] Reducing vdpa migration downtime because of memory pin / maps
https://www.mail-archive.com/[email protected]/msg953755.html

---
v4:
- Rework compatibility using new .compat_reset driver op

v3:
- add .reset_map support to vdpa_sim
- introduce module parameter to provide bug-for-bug compatibility with older
userspace

v2:
- improved commit message to clarify the intended csope of .reset_map API
- improved commit messages to clarify no breakage on older userspace

v1:
- rewrote commit messages to include more detailed description and background
- reword to vendor specific IOMMU implementation from on-chip IOMMU
- include parent device backend feautres to persistent iotlb precondition
- reimplement mlx5_vdpa patch on top of descriptor group series

RFC v3:
- fix missing return due to merge error in patch #4

RFC v2:
- rebased on top of the "[PATCH RFC v2 0/3] vdpa: dedicated descriptor table group" series:
https://lore.kernel.org/virtualization/[email protected]/

---

Si-Wei Liu (7):
vdpa: introduce .reset_map operation callback
vhost-vdpa: reset vendor specific mapping to initial state in .release
vhost-vdpa: introduce IOTLB_PERSIST backend feature bit
vdpa: introduce .compat_reset operation callback
vhost-vdpa: clean iotlb map during reset for older userspace
vdpa/mlx5: implement .reset_map driver op
vdpa_sim: implement .reset_map support

drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
drivers/vdpa/mlx5/core/mr.c | 17 ++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 27 ++++++++++++++--
drivers/vdpa/vdpa_sim/vdpa_sim.c | 52 ++++++++++++++++++++++++------
drivers/vhost/vdpa.c | 49 +++++++++++++++++++++++++---
drivers/virtio/virtio_vdpa.c | 2 +-
include/linux/vdpa.h | 30 +++++++++++++++--
include/uapi/linux/vhost_types.h | 2 ++
8 files changed, 161 insertions(+), 19 deletions(-)

--
2.39.3


2023-10-21 09:29:10

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 3/7] vhost-vdpa: introduce IOTLB_PERSIST backend feature bit

Userspace needs this feature flag to distinguish if vhost-vdpa iotlb in
the kernel can be trusted to persist IOTLB mapping across vDPA reset.
Without it, userspace has no way to tell apart if it's running on an
older kernel, which could silently drop all iotlb mapping across vDPA
reset, especially with broken parent driver implementation for the
.reset driver op. The broken driver may incorrectly drop all mappings of
its own as part of .reset, which inadvertently ends up with corrupted
mapping state between vhost-vdpa userspace and the kernel. As a
workaround, to make the mapping behaviour predictable across reset,
userspace has to pro-actively remove all mappings before vDPA reset, and
then restore all the mappings afterwards. This workaround is done
unconditionally on top of all parent drivers today, due to the parent
driver implementation issue and no means to differentiate. This
workaround had been utilized in QEMU since day one when the
corresponding vhost-vdpa userspace backend came to the world.

There are 3 cases that backend may claim this feature bit on for:

- parent device that has to work with platform IOMMU
- parent device with on-chip IOMMU that has the expected
.reset_map support in driver
- parent device with vendor specific IOMMU implementation with
persistent IOTLB mapping already that has to specifically
declare this backend feature

The reason why .reset_map is being one of the pre-condition for
persistent iotlb is because without it, vhost-vdpa can't switch back
iotlb to the initial state later on, especially for the on-chip IOMMU
case which starts with identity mapping at device creation. virtio-vdpa
requires on-chip IOMMU to perform 1:1 passthrough translation from PA to
IOVA as-is to begin with, and .reset_map is the only means to turn back
iotlb to the identity mapping mode after vhost-vdpa is gone.

The difference in behavior did not matter as QEMU unmaps all the memory
unregistering the memory listener at vhost_vdpa_dev_start( started =
false), but the backend acknowledging this feature flag allows QEMU to
make sure it is safe to skip this unmap & map in the case of vhost stop
& start cycle.

In that sense, this feature flag is actually a signal for userspace to
know that the driver bug has been solved. Not offering it indicates that
userspace cannot trust the kernel will retain the maps.

Signed-off-by: Si-Wei Liu <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 15 +++++++++++++++
include/uapi/linux/vhost_types.h | 2 ++
2 files changed, 17 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index c6bfe9bdde42..acc7c74ba7d6 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -439,6 +439,15 @@ static u64 vhost_vdpa_get_backend_features(const struct vhost_vdpa *v)
return ops->get_backend_features(vdpa);
}

+static bool vhost_vdpa_has_persistent_map(const struct vhost_vdpa *v)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ return (!ops->set_map && !ops->dma_map) || ops->reset_map ||
+ vhost_vdpa_get_backend_features(v) & BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST);
+}
+
static long vhost_vdpa_set_features(struct vhost_vdpa *v, u64 __user *featurep)
{
struct vdpa_device *vdpa = v->vdpa;
@@ -726,6 +735,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
return -EFAULT;
if (features & ~(VHOST_VDPA_BACKEND_FEATURES |
BIT_ULL(VHOST_BACKEND_F_DESC_ASID) |
+ BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST) |
BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_RESUME) |
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
@@ -742,6 +752,9 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
if ((features & BIT_ULL(VHOST_BACKEND_F_DESC_ASID)) &&
!vhost_vdpa_has_desc_group(v))
return -EOPNOTSUPP;
+ if ((features & BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST)) &&
+ !vhost_vdpa_has_persistent_map(v))
+ return -EOPNOTSUPP;
vhost_set_backend_features(&v->vdev, features);
return 0;
}
@@ -797,6 +810,8 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
features |= BIT_ULL(VHOST_BACKEND_F_RESUME);
if (vhost_vdpa_has_desc_group(v))
features |= BIT_ULL(VHOST_BACKEND_F_DESC_ASID);
+ if (vhost_vdpa_has_persistent_map(v))
+ features |= BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST);
features |= vhost_vdpa_get_backend_features(v);
if (copy_to_user(featurep, &features, sizeof(features)))
r = -EFAULT;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 18ad6ae7ab5c..d7656908f730 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -190,5 +190,7 @@ struct vhost_vdpa_iova_range {
* buffers may reside. Requires VHOST_BACKEND_F_IOTLB_ASID.
*/
#define VHOST_BACKEND_F_DESC_ASID 0x7
+/* IOTLB don't flush memory mapping across device reset */
+#define VHOST_BACKEND_F_IOTLB_PERSIST 0x8

#endif
--
2.39.3

2023-10-21 09:29:13

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 4/7] vdpa: introduce .compat_reset operation callback

Some device specific IOMMU parent drivers have long standing bogus
behaviour that mistakenly clean up the maps during .reset. By
definition, this is violation to the on-chip IOMMU ops (i.e. .set_map,
or .dma_map & .dma_unmap) in those offending drivers, as the removal of
internal maps is completely agnostic to the upper layer, causing
inconsistent view between the userspace and the kernel. Some userspace
app like QEMU gets around of this brokenness by proactively removing and
adding back all the maps around vdpa device reset, but such workaround
actually penaltize other well-behaved driver setup, where vdpa reset
always comes with the associated mapping cost, especially for kernel
vDPA devices (use_va=false) that have high cost on pinning. It's
imperative to rectify this behaviour and remove the problematic code
from all those non-compliant parent drivers.

However, we cannot unconditionally remove the bogus map-cleaning code
from the buggy .reset implementation, as there might exist userspace
apps that already rely on the behaviour on some setup. Introduce a
.compat_reset driver op to keep compatibility with older userspace. New
and well behaved parent driver should not bother to implement such op,
but only those drivers that are doing or used to do non-compliant
map-cleaning reset will have to.

Signed-off-by: Si-Wei Liu <[email protected]>
---
include/linux/vdpa.h | 13 +++++++++++++
1 file changed, 13 insertions(+)

diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 26ae6ae1eac3..6b8cbf75712d 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -252,6 +252,17 @@ struct vdpa_map_file {
* @reset: Reset device
* @vdev: vdpa device
* Returns integer: success (0) or error (< 0)
+ * @compat_reset: Reset device with compatibility quirks to
+ * accommodate older userspace. Only needed by
+ * parent driver which used to have bogus reset
+ * behaviour, and has to maintain such behaviour
+ * for compatibility with older userspace.
+ * Historically compliant driver only has to
+ * implement .reset, Historically non-compliant
+ * driver should implement both.
+ * @vdev: vdpa device
+ * @flags: compatibility quirks for reset
+ * Returns integer: success (0) or error (< 0)
* @suspend: Suspend the device (optional)
* @vdev: vdpa device
* Returns integer: success (0) or error (< 0)
@@ -393,6 +404,8 @@ struct vdpa_config_ops {
u8 (*get_status)(struct vdpa_device *vdev);
void (*set_status)(struct vdpa_device *vdev, u8 status);
int (*reset)(struct vdpa_device *vdev);
+ int (*compat_reset)(struct vdpa_device *vdev, u32 flags);
+#define VDPA_RESET_F_CLEAN_MAP 1
int (*suspend)(struct vdpa_device *vdev);
int (*resume)(struct vdpa_device *vdev);
size_t (*get_config_size)(struct vdpa_device *vdev);
--
2.39.3

2023-10-21 09:29:19

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 5/7] vhost-vdpa: clean iotlb map during reset for older userspace

Using .compat_reset op from the previous patch, the buggy .reset
behaviour can be kept as-is on older userspace apps, which don't ack the
IOTLB_PERSIST backend feature. As this compatibility quirk is limited to
those drivers that used to be buggy in the past, it won't affect change
the behaviour or affect ABI on the setups with API compliant driver.

The separation of .compat_reset from the regular .reset allows
vhost-vdpa able to know which driver had broken behaviour before, so it
can apply the corresponding compatibility quirk to the individual driver
whenever needed. Compared to overloading the existing .reset with
flags, .compat_reset won't cause any extra burden to the implementation
of every compliant driver.

Signed-off-by: Si-Wei Liu <[email protected]>
---
drivers/vhost/vdpa.c | 17 +++++++++++++----
drivers/virtio/virtio_vdpa.c | 2 +-
include/linux/vdpa.h | 7 +++++--
3 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index acc7c74ba7d6..9ce40003793b 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -227,13 +227,22 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
irq_bypass_unregister_producer(&vq->call_ctx.producer);
}

-static int vhost_vdpa_reset(struct vhost_vdpa *v)
+static int _compat_vdpa_reset(struct vhost_vdpa *v)
{
struct vdpa_device *vdpa = v->vdpa;
+ u32 flags = 0;

- v->in_batch = 0;
+ flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
+ VHOST_BACKEND_F_IOTLB_PERSIST) ?
+ VDPA_RESET_F_CLEAN_MAP : 0;
+
+ return vdpa_reset(vdpa, flags);
+}

- return vdpa_reset(vdpa);
+static int vhost_vdpa_reset(struct vhost_vdpa *v)
+{
+ v->in_batch = 0;
+ return _compat_vdpa_reset(v);
}

static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
@@ -312,7 +321,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
vhost_vdpa_unsetup_vq_irq(v, i);

if (status == 0) {
- ret = vdpa_reset(vdpa);
+ ret = _compat_vdpa_reset(v);
if (ret)
return ret;
} else
diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
index 06ce6d8c2e00..8d63e5923d24 100644
--- a/drivers/virtio/virtio_vdpa.c
+++ b/drivers/virtio/virtio_vdpa.c
@@ -100,7 +100,7 @@ static void virtio_vdpa_reset(struct virtio_device *vdev)
{
struct vdpa_device *vdpa = vd_get_vdpa(vdev);

- vdpa_reset(vdpa);
+ vdpa_reset(vdpa, 0);
}

static bool virtio_vdpa_notify(struct virtqueue *vq)
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 6b8cbf75712d..db15ac07f8a6 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -519,14 +519,17 @@ static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev)
return vdev->dma_dev;
}

-static inline int vdpa_reset(struct vdpa_device *vdev)
+static inline int vdpa_reset(struct vdpa_device *vdev, u32 flags)
{
const struct vdpa_config_ops *ops = vdev->config;
int ret;

down_write(&vdev->cf_lock);
vdev->features_valid = false;
- ret = ops->reset(vdev);
+ if (ops->compat_reset && flags)
+ ret = ops->compat_reset(vdev, flags);
+ else
+ ret = ops->reset(vdev);
up_write(&vdev->cf_lock);
return ret;
}
--
2.39.3

2023-10-21 09:29:21

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 7/7] vdpa_sim: implement .reset_map support

In order to reduce excessive memory mapping cost in live migration and
VM reboot, it is desirable to decouple the vhost-vdpa IOTLB abstraction
from the virtio device life cycle, i.e. mappings can be kept intact
across virtio device reset. Leverage the .reset_map callback, which is
meant to destroy the iotlb on the given ASID and recreate the 1:1
passthrough/identity mapping. To be consistent, the mapping on device
creation is initiailized to passthrough/identity with PA 1:1 mapped as
IOVA. With this the device .reset op doesn't have to maintain and clean
up memory mappings by itself.

Additionally, implement .compat_reset to cater for older userspace,
which may wish to see mapping to be cleared during reset.

Signed-off-by: Si-Wei Liu <[email protected]>
Tested-by: Stefano Garzarella <[email protected]>
---
drivers/vdpa/vdpa_sim/vdpa_sim.c | 52 ++++++++++++++++++++++++++------
1 file changed, 43 insertions(+), 9 deletions(-)

diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 76d41058add9..be2925d0d283 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -139,7 +139,7 @@ static void vdpasim_vq_reset(struct vdpasim *vdpasim,
vq->vring.notify = NULL;
}

-static void vdpasim_do_reset(struct vdpasim *vdpasim)
+static void vdpasim_do_reset(struct vdpasim *vdpasim, u32 flags)
{
int i;

@@ -151,11 +151,13 @@ static void vdpasim_do_reset(struct vdpasim *vdpasim)
&vdpasim->iommu_lock);
}

- for (i = 0; i < vdpasim->dev_attr.nas; i++) {
- vhost_iotlb_reset(&vdpasim->iommu[i]);
- vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX,
- 0, VHOST_MAP_RW);
- vdpasim->iommu_pt[i] = true;
+ if (flags & VDPA_RESET_F_CLEAN_MAP) {
+ for (i = 0; i < vdpasim->dev_attr.nas; i++) {
+ vhost_iotlb_reset(&vdpasim->iommu[i]);
+ vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX,
+ 0, VHOST_MAP_RW);
+ vdpasim->iommu_pt[i] = true;
+ }
}

vdpasim->running = true;
@@ -259,8 +261,12 @@ struct vdpasim *vdpasim_create(struct vdpasim_dev_attr *dev_attr,
if (!vdpasim->iommu_pt)
goto err_iommu;

- for (i = 0; i < vdpasim->dev_attr.nas; i++)
+ for (i = 0; i < vdpasim->dev_attr.nas; i++) {
vhost_iotlb_init(&vdpasim->iommu[i], max_iotlb_entries, 0);
+ vhost_iotlb_add_range(&vdpasim->iommu[i], 0, ULONG_MAX, 0,
+ VHOST_MAP_RW);
+ vdpasim->iommu_pt[i] = true;
+ }

for (i = 0; i < dev_attr->nvqs; i++)
vringh_set_iotlb(&vdpasim->vqs[i].vring, &vdpasim->iommu[0],
@@ -480,18 +486,23 @@ static void vdpasim_set_status(struct vdpa_device *vdpa, u8 status)
mutex_unlock(&vdpasim->mutex);
}

-static int vdpasim_reset(struct vdpa_device *vdpa)
+static int vdpasim_compat_reset(struct vdpa_device *vdpa, u32 flags)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);

mutex_lock(&vdpasim->mutex);
vdpasim->status = 0;
- vdpasim_do_reset(vdpasim);
+ vdpasim_do_reset(vdpasim, flags);
mutex_unlock(&vdpasim->mutex);

return 0;
}

+static int vdpasim_reset(struct vdpa_device *vdpa)
+{
+ return vdpasim_compat_reset(vdpa, 0);
+}
+
static int vdpasim_suspend(struct vdpa_device *vdpa)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@@ -637,6 +648,25 @@ static int vdpasim_set_map(struct vdpa_device *vdpa, unsigned int asid,
return ret;
}

+static int vdpasim_reset_map(struct vdpa_device *vdpa, unsigned int asid)
+{
+ struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
+
+ if (asid >= vdpasim->dev_attr.nas)
+ return -EINVAL;
+
+ spin_lock(&vdpasim->iommu_lock);
+ if (vdpasim->iommu_pt[asid])
+ goto out;
+ vhost_iotlb_reset(&vdpasim->iommu[asid]);
+ vhost_iotlb_add_range(&vdpasim->iommu[asid], 0, ULONG_MAX,
+ 0, VHOST_MAP_RW);
+ vdpasim->iommu_pt[asid] = true;
+out:
+ spin_unlock(&vdpasim->iommu_lock);
+ return 0;
+}
+
static int vdpasim_bind_mm(struct vdpa_device *vdpa, struct mm_struct *mm)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
@@ -749,6 +779,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.get_status = vdpasim_get_status,
.set_status = vdpasim_set_status,
.reset = vdpasim_reset,
+ .compat_reset = vdpasim_compat_reset,
.suspend = vdpasim_suspend,
.resume = vdpasim_resume,
.get_config_size = vdpasim_get_config_size,
@@ -759,6 +790,7 @@ static const struct vdpa_config_ops vdpasim_config_ops = {
.set_group_asid = vdpasim_set_group_asid,
.dma_map = vdpasim_dma_map,
.dma_unmap = vdpasim_dma_unmap,
+ .reset_map = vdpasim_reset_map,
.bind_mm = vdpasim_bind_mm,
.unbind_mm = vdpasim_unbind_mm,
.free = vdpasim_free,
@@ -787,6 +819,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
.get_status = vdpasim_get_status,
.set_status = vdpasim_set_status,
.reset = vdpasim_reset,
+ .compat_reset = vdpasim_compat_reset,
.suspend = vdpasim_suspend,
.resume = vdpasim_resume,
.get_config_size = vdpasim_get_config_size,
@@ -796,6 +829,7 @@ static const struct vdpa_config_ops vdpasim_batch_config_ops = {
.get_iova_range = vdpasim_get_iova_range,
.set_group_asid = vdpasim_set_group_asid,
.set_map = vdpasim_set_map,
+ .reset_map = vdpasim_reset_map,
.bind_mm = vdpasim_bind_mm,
.unbind_mm = vdpasim_unbind_mm,
.free = vdpasim_free,
--
2.39.3

2023-10-21 09:29:27

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 2/7] vhost-vdpa: reset vendor specific mapping to initial state in .release

Devices with on-chip IOMMU or vendor specific IOTLB implementation may
need to restore iotlb mapping to the initial or default state using the
.reset_map op, as it's desirable for some parent devices to not work
with DMA ops and maintain a simple IOMMU model with .reset_map. In
particular, device reset should not cause mapping to go away on such
IOTLB model, so persistent mapping is implied across reset. Before the
userspace process using vhost-vdpa is gone, give it a chance to reset
iotlb back to the initial state in vhost_vdpa_cleanup().

Signed-off-by: Si-Wei Liu <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 851535f57b95..c6bfe9bdde42 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -131,6 +131,15 @@ static struct vhost_vdpa_as *vhost_vdpa_find_alloc_as(struct vhost_vdpa *v,
return vhost_vdpa_alloc_as(v, asid);
}

+static void vhost_vdpa_reset_map(struct vhost_vdpa *v, u32 asid)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+
+ if (ops->reset_map)
+ ops->reset_map(vdpa, asid);
+}
+
static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)
{
struct vhost_vdpa_as *as = asid_to_as(v, asid);
@@ -140,6 +149,14 @@ static int vhost_vdpa_remove_as(struct vhost_vdpa *v, u32 asid)

hlist_del(&as->hash_link);
vhost_vdpa_iotlb_unmap(v, &as->iotlb, 0ULL, 0ULL - 1, asid);
+ /*
+ * Devices with vendor specific IOMMU may need to restore
+ * iotlb to the initial or default state, which cannot be
+ * cleaned up in the all range unmap call above. Give them
+ * a chance to clean up or reset the map to the desired
+ * state.
+ */
+ vhost_vdpa_reset_map(v, asid);
kfree(as);

return 0;
--
2.39.3

2023-10-21 09:29:39

by Si-Wei Liu

[permalink] [raw]
Subject: [PATCH v4 6/7] vdpa/mlx5: implement .reset_map driver op

Since commit 6f5312f80183 ("vdpa/mlx5: Add support for running with
virtio_vdpa"), mlx5_vdpa starts with preallocate 1:1 DMA MR at device
creation time. This 1:1 DMA MR will be implicitly destroyed while the
first .set_map call is invoked, in which case callers like vhost-vdpa
will start to set up custom mappings. When the .reset callback is
invoked, the custom mappings will be cleared and the 1:1 DMA MR will be
re-created.

In order to reduce excessive memory mapping cost in live migration, it
is desirable to decouple the vhost-vdpa IOTLB abstraction from the
virtio device life cycle, i.e. mappings can be kept around intact across
virtio device reset. Leverage the .reset_map callback, which is meant to
destroy the regular MR (including cvq mapping) on the given ASID and
recreate the initial DMA mapping. That way, the device .reset op runs
free from having to maintain and clean up memory mappings by itself.

Additionally, implement .compat_reset to cater for older userspace,
which may wish to see mapping to be cleared during reset.

Co-developed-by: Dragos Tatulea <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
Signed-off-by: Si-Wei Liu <[email protected]>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 1 +
drivers/vdpa/mlx5/core/mr.c | 17 +++++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 27 ++++++++++++++++++++++++---
3 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index db988ced5a5d..84547d998bcf 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -127,6 +127,7 @@ int mlx5_vdpa_update_cvq_iotlb(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb,
unsigned int asid);
int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev);
+int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid);

#define mlx5_vdpa_warn(__dev, format, ...) \
dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 66530e28f327..2197c46e563a 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -645,3 +645,20 @@ int mlx5_vdpa_create_dma_mr(struct mlx5_vdpa_dev *mvdev)

return mlx5_vdpa_update_cvq_iotlb(mvdev, NULL, 0);
}
+
+int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
+{
+ if (asid >= MLX5_VDPA_NUM_AS)
+ return -EINVAL;
+
+ mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[asid]);
+
+ if (asid == 0 && MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
+ if (mlx5_vdpa_create_dma_mr(mvdev))
+ mlx5_vdpa_warn(mvdev, "create DMA MR failed\n");
+ } else {
+ mlx5_vdpa_update_cvq_iotlb(mvdev, NULL, asid);
+ }
+
+ return 0;
+}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f4516a2d5bb0..12ac3397f39b 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -2876,7 +2876,7 @@ static void init_group_to_asid_map(struct mlx5_vdpa_dev *mvdev)
mvdev->group2asid[i] = 0;
}

-static int mlx5_vdpa_reset(struct vdpa_device *vdev)
+static int mlx5_vdpa_compat_reset(struct vdpa_device *vdev, u32 flags)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
@@ -2888,7 +2888,8 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
unregister_link_notifier(ndev);
teardown_driver(ndev);
clear_vqs_ready(ndev);
- mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
+ if (flags & VDPA_RESET_F_CLEAN_MAP)
+ mlx5_vdpa_destroy_mr_resources(&ndev->mvdev);
ndev->mvdev.status = 0;
ndev->mvdev.suspended = false;
ndev->cur_num_vqs = 0;
@@ -2899,7 +2900,8 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
init_group_to_asid_map(mvdev);
++mvdev->generation;

- if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
+ if ((flags & VDPA_RESET_F_CLEAN_MAP) &&
+ MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
if (mlx5_vdpa_create_dma_mr(mvdev))
mlx5_vdpa_warn(mvdev, "create MR failed\n");
}
@@ -2908,6 +2910,11 @@ static int mlx5_vdpa_reset(struct vdpa_device *vdev)
return 0;
}

+static int mlx5_vdpa_reset(struct vdpa_device *vdev)
+{
+ return mlx5_vdpa_compat_reset(vdev, 0);
+}
+
static size_t mlx5_vdpa_get_config_size(struct vdpa_device *vdev)
{
return sizeof(struct virtio_net_config);
@@ -2987,6 +2994,18 @@ static int mlx5_vdpa_set_map(struct vdpa_device *vdev, unsigned int asid,
return err;
}

+static int mlx5_vdpa_reset_map(struct vdpa_device *vdev, unsigned int asid)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ int err;
+
+ down_write(&ndev->reslock);
+ err = mlx5_vdpa_reset_mr(mvdev, asid);
+ up_write(&ndev->reslock);
+ return err;
+}
+
static struct device *mlx5_get_vq_dma_dev(struct vdpa_device *vdev, u16 idx)
{
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
@@ -3250,11 +3269,13 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_status = mlx5_vdpa_get_status,
.set_status = mlx5_vdpa_set_status,
.reset = mlx5_vdpa_reset,
+ .compat_reset = mlx5_vdpa_compat_reset,
.get_config_size = mlx5_vdpa_get_config_size,
.get_config = mlx5_vdpa_get_config,
.set_config = mlx5_vdpa_set_config,
.get_generation = mlx5_vdpa_get_generation,
.set_map = mlx5_vdpa_set_map,
+ .reset_map = mlx5_vdpa_reset_map,
.set_group_asid = mlx5_set_group_asid,
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
--
2.39.3

2023-10-23 03:52:59

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

Hi Si-Wei:

On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
>
> In order to reduce needlessly high setup and teardown cost
> of iotlb mapping during live migration, it's crucial to
> decouple the vhost-vdpa iotlb abstraction from the virtio
> device life cycle, i.e. iotlb mappings should be left
> intact across virtio device reset [1]. For it to work, the
> on-chip IOMMU parent device could implement a separate
> .reset_map() operation callback to restore 1:1 DMA mapping
> without having to resort to the .reset() callback, the
> latter of which is mainly used to reset virtio device state.
> This new .reset_map() callback will be invoked only before
> the vhost-vdpa driver is to be removed and detached from
> the vdpa bus, such that other vdpa bus drivers, e.g.
> virtio-vdpa, can start with 1:1 DMA mapping when they
> are attached. For the context, those on-chip IOMMU parent
> devices, create the 1:1 DMA mapping at vdpa device creation,
> and they would implicitly destroy the 1:1 mapping when
> the first .set_map or .dma_map callback is invoked.
>
> This patchset is rebased on top of the latest vhost tree.
>
> [1] Reducing vdpa migration downtime because of memory pin / maps
> https://www.mail-archive.com/[email protected]/msg953755.html
>
> ---
> v4:
> - Rework compatibility using new .compat_reset driver op

I still think having a set_backend_feature() or reset_map(clean=true)
might be better. As it tries hard to not introduce new stuff on the
bus.

But we can listen to others for sure.

Thanks

2023-10-23 22:01:12

by Si-Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset



On 10/22/2023 8:51 PM, Jason Wang wrote:
> Hi Si-Wei:
>
> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
>> In order to reduce needlessly high setup and teardown cost
>> of iotlb mapping during live migration, it's crucial to
>> decouple the vhost-vdpa iotlb abstraction from the virtio
>> device life cycle, i.e. iotlb mappings should be left
>> intact across virtio device reset [1]. For it to work, the
>> on-chip IOMMU parent device could implement a separate
>> .reset_map() operation callback to restore 1:1 DMA mapping
>> without having to resort to the .reset() callback, the
>> latter of which is mainly used to reset virtio device state.
>> This new .reset_map() callback will be invoked only before
>> the vhost-vdpa driver is to be removed and detached from
>> the vdpa bus, such that other vdpa bus drivers, e.g.
>> virtio-vdpa, can start with 1:1 DMA mapping when they
>> are attached. For the context, those on-chip IOMMU parent
>> devices, create the 1:1 DMA mapping at vdpa device creation,
>> and they would implicitly destroy the 1:1 mapping when
>> the first .set_map or .dma_map callback is invoked.
>>
>> This patchset is rebased on top of the latest vhost tree.
>>
>> [1] Reducing vdpa migration downtime because of memory pin / maps
>> https://www.mail-archive.com/[email protected]/msg953755.html
>>
>> ---
>> v4:
>> - Rework compatibility using new .compat_reset driver op
> I still think having a set_backend_feature()
This will overload backend features with the role of carrying over
compatibility quirks, which I tried to avoid from. While I think the
.compat_reset from the v4 code just works with the backend features
acknowledgement (and maybe others as well) to determine, but not
directly tie it to backend features itself. These two have different
implications in terms of requirement, scope and maintaining/deprecation,
better to cope with compat quirks in explicit and driver visible way.

> or reset_map(clean=true) might be better.
An explicit op might be marginally better in driver writer's point of
view. Compliant driver doesn't have to bother asserting clean_map never
be true so their code would never bother dealing with this case, as
explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
during reset for older userspace":

"
    The separation of .compat_reset from the regular .reset allows
    vhost-vdpa able to know which driver had broken behavior before, so it
    can apply the corresponding compatibility quirk to the individual
driver
    whenever needed.  Compared to overloading the existing .reset with
    flags, .compat_reset won't cause any extra burden to the implementation
    of every compliant driver.
"

> As it tries hard to not introduce new stuff on the bus.
Honestly I don't see substantial difference between these other than the
color. There's no single best solution that stands out among the 3. And
I assume you already noticed it from all the above 3 approaches will
have to go with backend features negotiation, that the 1st vdpa reset
before backend feature negotiation will use the compliant version of
.reset that doesn't clean up the map. While I don't think this nuance
matters much to existing older userspace apps, as the maps should
already get cleaned by previous process in vhost_vdpa_cleanup(), but if
bug-for-bug behavioral compatibility is what you want, module parameter
will be the single best answer.

Regards,
-Siwei

> But we can listen to others for sure.
>
> Thanks
>

2023-10-24 05:46:46

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] vhost-vdpa: clean iotlb map during reset for older userspace

On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
>
> Using .compat_reset op from the previous patch, the buggy .reset
> behaviour can be kept as-is on older userspace apps, which don't ack the
> IOTLB_PERSIST backend feature. As this compatibility quirk is limited to
> those drivers that used to be buggy in the past, it won't affect change
> the behaviour or affect ABI on the setups with API compliant driver.
>
> The separation of .compat_reset from the regular .reset allows
> vhost-vdpa able to know which driver had broken behaviour before, so it
> can apply the corresponding compatibility quirk to the individual driver
> whenever needed. Compared to overloading the existing .reset with
> flags, .compat_reset won't cause any extra burden to the implementation
> of every compliant driver.
>
> Signed-off-by: Si-Wei Liu <[email protected]>
> ---
> drivers/vhost/vdpa.c | 17 +++++++++++++----
> drivers/virtio/virtio_vdpa.c | 2 +-
> include/linux/vdpa.h | 7 +++++--
> 3 files changed, 19 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index acc7c74ba7d6..9ce40003793b 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -227,13 +227,22 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
> irq_bypass_unregister_producer(&vq->call_ctx.producer);
> }
>
> -static int vhost_vdpa_reset(struct vhost_vdpa *v)
> +static int _compat_vdpa_reset(struct vhost_vdpa *v)
> {
> struct vdpa_device *vdpa = v->vdpa;
> + u32 flags = 0;
>
> - v->in_batch = 0;
> + flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
> + VHOST_BACKEND_F_IOTLB_PERSIST) ?
> + VDPA_RESET_F_CLEAN_MAP : 0;
> +
> + return vdpa_reset(vdpa, flags);
> +}
>
> - return vdpa_reset(vdpa);
> +static int vhost_vdpa_reset(struct vhost_vdpa *v)
> +{
> + v->in_batch = 0;
> + return _compat_vdpa_reset(v);
> }
>
> static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> @@ -312,7 +321,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
> vhost_vdpa_unsetup_vq_irq(v, i);
>
> if (status == 0) {
> - ret = vdpa_reset(vdpa);
> + ret = _compat_vdpa_reset(v);
> if (ret)
> return ret;
> } else
> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
> index 06ce6d8c2e00..8d63e5923d24 100644
> --- a/drivers/virtio/virtio_vdpa.c
> +++ b/drivers/virtio/virtio_vdpa.c
> @@ -100,7 +100,7 @@ static void virtio_vdpa_reset(struct virtio_device *vdev)
> {
> struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>
> - vdpa_reset(vdpa);
> + vdpa_reset(vdpa, 0);
> }
>
> static bool virtio_vdpa_notify(struct virtqueue *vq)
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 6b8cbf75712d..db15ac07f8a6 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -519,14 +519,17 @@ static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev)
> return vdev->dma_dev;
> }
>
> -static inline int vdpa_reset(struct vdpa_device *vdev)
> +static inline int vdpa_reset(struct vdpa_device *vdev, u32 flags)
> {
> const struct vdpa_config_ops *ops = vdev->config;
> int ret;
>
> down_write(&vdev->cf_lock);
> vdev->features_valid = false;
> - ret = ops->reset(vdev);
> + if (ops->compat_reset && flags)
> + ret = ops->compat_reset(vdev, flags);
> + else
> + ret = ops->reset(vdev);

Instead of inventing a new API that carries the flags. Tweak the
existing one seems to be simpler and better?

As compat_reset(vdev, 0) == reset(vdev)

Then you don't need the switch in the parent as well

+static int vdpasim_reset(struct vdpa_device *vdpa)
+{
+ return vdpasim_compat_reset(vdpa, 0);
+}

Thanks


> up_write(&vdev->cf_lock);
> return ret;
> }
> --
> 2.39.3
>

2023-10-24 06:53:43

by Lei Yang

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

QE tested this series v4 with regression testing on real nic, there is
no new regression bug.

Tested-by: Lei Yang <[email protected]>

On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 10/22/2023 8:51 PM, Jason Wang wrote:
> > Hi Si-Wei:
> >
> > On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
> >> In order to reduce needlessly high setup and teardown cost
> >> of iotlb mapping during live migration, it's crucial to
> >> decouple the vhost-vdpa iotlb abstraction from the virtio
> >> device life cycle, i.e. iotlb mappings should be left
> >> intact across virtio device reset [1]. For it to work, the
> >> on-chip IOMMU parent device could implement a separate
> >> .reset_map() operation callback to restore 1:1 DMA mapping
> >> without having to resort to the .reset() callback, the
> >> latter of which is mainly used to reset virtio device state.
> >> This new .reset_map() callback will be invoked only before
> >> the vhost-vdpa driver is to be removed and detached from
> >> the vdpa bus, such that other vdpa bus drivers, e.g.
> >> virtio-vdpa, can start with 1:1 DMA mapping when they
> >> are attached. For the context, those on-chip IOMMU parent
> >> devices, create the 1:1 DMA mapping at vdpa device creation,
> >> and they would implicitly destroy the 1:1 mapping when
> >> the first .set_map or .dma_map callback is invoked.
> >>
> >> This patchset is rebased on top of the latest vhost tree.
> >>
> >> [1] Reducing vdpa migration downtime because of memory pin / maps
> >> https://www.mail-archive.com/[email protected]/msg953755.html
> >>
> >> ---
> >> v4:
> >> - Rework compatibility using new .compat_reset driver op
> > I still think having a set_backend_feature()
> This will overload backend features with the role of carrying over
> compatibility quirks, which I tried to avoid from. While I think the
> .compat_reset from the v4 code just works with the backend features
> acknowledgement (and maybe others as well) to determine, but not
> directly tie it to backend features itself. These two have different
> implications in terms of requirement, scope and maintaining/deprecation,
> better to cope with compat quirks in explicit and driver visible way.
>
> > or reset_map(clean=true) might be better.
> An explicit op might be marginally better in driver writer's point of
> view. Compliant driver doesn't have to bother asserting clean_map never
> be true so their code would never bother dealing with this case, as
> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> during reset for older userspace":
>
> "
> The separation of .compat_reset from the regular .reset allows
> vhost-vdpa able to know which driver had broken behavior before, so it
> can apply the corresponding compatibility quirk to the individual
> driver
> whenever needed. Compared to overloading the existing .reset with
> flags, .compat_reset won't cause any extra burden to the implementation
> of every compliant driver.
> "
>
> > As it tries hard to not introduce new stuff on the bus.
> Honestly I don't see substantial difference between these other than the
> color. There's no single best solution that stands out among the 3. And
> I assume you already noticed it from all the above 3 approaches will
> have to go with backend features negotiation, that the 1st vdpa reset
> before backend feature negotiation will use the compliant version of
> .reset that doesn't clean up the map. While I don't think this nuance
> matters much to existing older userspace apps, as the maps should
> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> bug-for-bug behavioral compatibility is what you want, module parameter
> will be the single best answer.
>
> Regards,
> -Siwei
>
> > But we can listen to others for sure.
> >
> > Thanks
> >
>

2023-10-24 16:22:56

by Si-Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] vhost-vdpa: clean iotlb map during reset for older userspace



On 10/23/2023 10:45 PM, Jason Wang wrote:
> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
>> Using .compat_reset op from the previous patch, the buggy .reset
>> behaviour can be kept as-is on older userspace apps, which don't ack the
>> IOTLB_PERSIST backend feature. As this compatibility quirk is limited to
>> those drivers that used to be buggy in the past, it won't affect change
>> the behaviour or affect ABI on the setups with API compliant driver.
>>
>> The separation of .compat_reset from the regular .reset allows
>> vhost-vdpa able to know which driver had broken behaviour before, so it
>> can apply the corresponding compatibility quirk to the individual driver
>> whenever needed. Compared to overloading the existing .reset with
>> flags, .compat_reset won't cause any extra burden to the implementation
>> of every compliant driver.
>>
>> Signed-off-by: Si-Wei Liu <[email protected]>
>> ---
>> drivers/vhost/vdpa.c | 17 +++++++++++++----
>> drivers/virtio/virtio_vdpa.c | 2 +-
>> include/linux/vdpa.h | 7 +++++--
>> 3 files changed, 19 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>> index acc7c74ba7d6..9ce40003793b 100644
>> --- a/drivers/vhost/vdpa.c
>> +++ b/drivers/vhost/vdpa.c
>> @@ -227,13 +227,22 @@ static void vhost_vdpa_unsetup_vq_irq(struct vhost_vdpa *v, u16 qid)
>> irq_bypass_unregister_producer(&vq->call_ctx.producer);
>> }
>>
>> -static int vhost_vdpa_reset(struct vhost_vdpa *v)
>> +static int _compat_vdpa_reset(struct vhost_vdpa *v)
>> {
>> struct vdpa_device *vdpa = v->vdpa;
>> + u32 flags = 0;
>>
>> - v->in_batch = 0;
>> + flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
>> + VHOST_BACKEND_F_IOTLB_PERSIST) ?
>> + VDPA_RESET_F_CLEAN_MAP : 0;
>> +
>> + return vdpa_reset(vdpa, flags);
>> +}
>>
>> - return vdpa_reset(vdpa);
>> +static int vhost_vdpa_reset(struct vhost_vdpa *v)
>> +{
>> + v->in_batch = 0;
>> + return _compat_vdpa_reset(v);
>> }
>>
>> static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
>> @@ -312,7 +321,7 @@ static long vhost_vdpa_set_status(struct vhost_vdpa *v, u8 __user *statusp)
>> vhost_vdpa_unsetup_vq_irq(v, i);
>>
>> if (status == 0) {
>> - ret = vdpa_reset(vdpa);
>> + ret = _compat_vdpa_reset(v);
>> if (ret)
>> return ret;
>> } else
>> diff --git a/drivers/virtio/virtio_vdpa.c b/drivers/virtio/virtio_vdpa.c
>> index 06ce6d8c2e00..8d63e5923d24 100644
>> --- a/drivers/virtio/virtio_vdpa.c
>> +++ b/drivers/virtio/virtio_vdpa.c
>> @@ -100,7 +100,7 @@ static void virtio_vdpa_reset(struct virtio_device *vdev)
>> {
>> struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>>
>> - vdpa_reset(vdpa);
>> + vdpa_reset(vdpa, 0);
>> }
>>
>> static bool virtio_vdpa_notify(struct virtqueue *vq)
>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>> index 6b8cbf75712d..db15ac07f8a6 100644
>> --- a/include/linux/vdpa.h
>> +++ b/include/linux/vdpa.h
>> @@ -519,14 +519,17 @@ static inline struct device *vdpa_get_dma_dev(struct vdpa_device *vdev)
>> return vdev->dma_dev;
>> }
>>
>> -static inline int vdpa_reset(struct vdpa_device *vdev)
>> +static inline int vdpa_reset(struct vdpa_device *vdev, u32 flags)
>> {
>> const struct vdpa_config_ops *ops = vdev->config;
>> int ret;
>>
>> down_write(&vdev->cf_lock);
>> vdev->features_valid = false;
>> - ret = ops->reset(vdev);
>> + if (ops->compat_reset && flags)
>> + ret = ops->compat_reset(vdev, flags);
>> + else
>> + ret = ops->reset(vdev);
> Instead of inventing a new API that carries the flags. Tweak the
> existing one seems to be simpler and better?
Well, as indicated in the commit message, this allows vhost-vdpa be able
to know which driver had broken behavior before, so it
can apply the corresponding compatibility quirk to the individual driver
when it's really necessary. If sending all flags unconditionally down to
every driver, it's hard for driver writers to distinguish which are
compatibility quirks that they can safely ignore and which are feature
flags that are encouraged to implement. In that sense, gating features
from being polluted by compatibility quirks with an implicit op would be
better.

Regards,
-Siwei
>
> As compat_reset(vdev, 0) == reset(vdev)
>
> Then you don't need the switch in the parent as well
>
> +static int vdpasim_reset(struct vdpa_device *vdpa)
> +{
> + return vdpasim_compat_reset(vdpa, 0);
> +}
>
> Thanks
>
>
>> up_write(&vdev->cf_lock);
>> return ret;
>> }
>> --
>> 2.39.3
>>

2023-10-24 16:26:10

by Si-Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] vhost-vdpa: clean iotlb map during reset for older userspace



On 10/24/2023 9:21 AM, Si-Wei Liu wrote:
>
>
> On 10/23/2023 10:45 PM, Jason Wang wrote:
>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]>
>> wrote:
>>> Using .compat_reset op from the previous patch, the buggy .reset
>>> behaviour can be kept as-is on older userspace apps, which don't ack
>>> the
>>> IOTLB_PERSIST backend feature. As this compatibility quirk is
>>> limited to
>>> those drivers that used to be buggy in the past, it won't affect change
>>> the behaviour or affect ABI on the setups with API compliant driver.
>>>
>>> The separation of .compat_reset from the regular .reset allows
>>> vhost-vdpa able to know which driver had broken behaviour before, so it
>>> can apply the corresponding compatibility quirk to the individual
>>> driver
>>> whenever needed.  Compared to overloading the existing .reset with
>>> flags, .compat_reset won't cause any extra burden to the implementation
>>> of every compliant driver.
>>>
>>> Signed-off-by: Si-Wei Liu <[email protected]>
>>> ---
>>>   drivers/vhost/vdpa.c         | 17 +++++++++++++----
>>>   drivers/virtio/virtio_vdpa.c |  2 +-
>>>   include/linux/vdpa.h         |  7 +++++--
>>>   3 files changed, 19 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
>>> index acc7c74ba7d6..9ce40003793b 100644
>>> --- a/drivers/vhost/vdpa.c
>>> +++ b/drivers/vhost/vdpa.c
>>> @@ -227,13 +227,22 @@ static void vhost_vdpa_unsetup_vq_irq(struct
>>> vhost_vdpa *v, u16 qid)
>>> irq_bypass_unregister_producer(&vq->call_ctx.producer);
>>>   }
>>>
>>> -static int vhost_vdpa_reset(struct vhost_vdpa *v)
>>> +static int _compat_vdpa_reset(struct vhost_vdpa *v)
>>>   {
>>>          struct vdpa_device *vdpa = v->vdpa;
>>> +       u32 flags = 0;
>>>
>>> -       v->in_batch = 0;
>>> +       flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
>>> + VHOST_BACKEND_F_IOTLB_PERSIST) ?
>>> +                VDPA_RESET_F_CLEAN_MAP : 0;
>>> +
>>> +       return vdpa_reset(vdpa, flags);
>>> +}
>>>
>>> -       return vdpa_reset(vdpa);
>>> +static int vhost_vdpa_reset(struct vhost_vdpa *v)
>>> +{
>>> +       v->in_batch = 0;
>>> +       return _compat_vdpa_reset(v);
>>>   }
>>>
>>>   static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
>>> @@ -312,7 +321,7 @@ static long vhost_vdpa_set_status(struct
>>> vhost_vdpa *v, u8 __user *statusp)
>>>                          vhost_vdpa_unsetup_vq_irq(v, i);
>>>
>>>          if (status == 0) {
>>> -               ret = vdpa_reset(vdpa);
>>> +               ret = _compat_vdpa_reset(v);
>>>                  if (ret)
>>>                          return ret;
>>>          } else
>>> diff --git a/drivers/virtio/virtio_vdpa.c
>>> b/drivers/virtio/virtio_vdpa.c
>>> index 06ce6d8c2e00..8d63e5923d24 100644
>>> --- a/drivers/virtio/virtio_vdpa.c
>>> +++ b/drivers/virtio/virtio_vdpa.c
>>> @@ -100,7 +100,7 @@ static void virtio_vdpa_reset(struct
>>> virtio_device *vdev)
>>>   {
>>>          struct vdpa_device *vdpa = vd_get_vdpa(vdev);
>>>
>>> -       vdpa_reset(vdpa);
>>> +       vdpa_reset(vdpa, 0);
>>>   }
>>>
>>>   static bool virtio_vdpa_notify(struct virtqueue *vq)
>>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
>>> index 6b8cbf75712d..db15ac07f8a6 100644
>>> --- a/include/linux/vdpa.h
>>> +++ b/include/linux/vdpa.h
>>> @@ -519,14 +519,17 @@ static inline struct device
>>> *vdpa_get_dma_dev(struct vdpa_device *vdev)
>>>          return vdev->dma_dev;
>>>   }
>>>
>>> -static inline int vdpa_reset(struct vdpa_device *vdev)
>>> +static inline int vdpa_reset(struct vdpa_device *vdev, u32 flags)
>>>   {
>>>          const struct vdpa_config_ops *ops = vdev->config;
>>>          int ret;
>>>
>>>          down_write(&vdev->cf_lock);
>>>          vdev->features_valid = false;
>>> -       ret = ops->reset(vdev);
>>> +       if (ops->compat_reset && flags)
>>> +               ret = ops->compat_reset(vdev, flags);
>>> +       else
>>> +               ret = ops->reset(vdev);
>> Instead of inventing a new API that carries the flags. Tweak the
>> existing one seems to be simpler and better?
> Well, as indicated in the commit message, this allows vhost-vdpa be
> able to know which driver had broken behavior before, so it
> can apply the corresponding compatibility quirk to the individual
> driver when it's really necessary. If sending all flags
> unconditionally down to every driver, it's hard for driver writers to
> distinguish which are compatibility quirks that they can safely ignore
> and which are feature flags that are encouraged to implement. In that
> sense, gating features from being polluted by compatibility quirks
> with an implicit op
s/implicit/explicit/
> would be better.
>
> Regards,
> -Siwei
>>
>> As compat_reset(vdev, 0) == reset(vdev)
>>
>> Then you don't need the switch in the parent as well
>>
>> +static int vdpasim_reset(struct vdpa_device *vdpa)
>> +{
>> +       return vdpasim_compat_reset(vdpa, 0);
>> +}
>>
>> Thanks
>>
>>
>>> up_write(&vdev->cf_lock);
>>>          return ret;
>>>   }
>>> --
>>> 2.39.3
>>>
>

2023-10-24 17:28:01

by Si-Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

Thanks a lot for testing! Please be aware that there's a follow-up fix
for a potential oops in this v4 series:

https://lore.kernel.org/virtualization/[email protected]/

Would be nice to have it applied for any tests.

Thanks,
-Siwei

On 10/23/2023 11:51 PM, Lei Yang wrote:
> QE tested this series v4 with regression testing on real nic, there is
> no new regression bug.
>
> Tested-by: Lei Yang <[email protected]>
>
> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <[email protected]> wrote:
>>
>>
>> On 10/22/2023 8:51 PM, Jason Wang wrote:
>>> Hi Si-Wei:
>>>
>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
>>>> In order to reduce needlessly high setup and teardown cost
>>>> of iotlb mapping during live migration, it's crucial to
>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
>>>> device life cycle, i.e. iotlb mappings should be left
>>>> intact across virtio device reset [1]. For it to work, the
>>>> on-chip IOMMU parent device could implement a separate
>>>> .reset_map() operation callback to restore 1:1 DMA mapping
>>>> without having to resort to the .reset() callback, the
>>>> latter of which is mainly used to reset virtio device state.
>>>> This new .reset_map() callback will be invoked only before
>>>> the vhost-vdpa driver is to be removed and detached from
>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
>>>> are attached. For the context, those on-chip IOMMU parent
>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
>>>> and they would implicitly destroy the 1:1 mapping when
>>>> the first .set_map or .dma_map callback is invoked.
>>>>
>>>> This patchset is rebased on top of the latest vhost tree.
>>>>
>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
>>>> https://www.mail-archive.com/[email protected]/msg953755.html
>>>>
>>>> ---
>>>> v4:
>>>> - Rework compatibility using new .compat_reset driver op
>>> I still think having a set_backend_feature()
>> This will overload backend features with the role of carrying over
>> compatibility quirks, which I tried to avoid from. While I think the
>> .compat_reset from the v4 code just works with the backend features
>> acknowledgement (and maybe others as well) to determine, but not
>> directly tie it to backend features itself. These two have different
>> implications in terms of requirement, scope and maintaining/deprecation,
>> better to cope with compat quirks in explicit and driver visible way.
>>
>>> or reset_map(clean=true) might be better.
>> An explicit op might be marginally better in driver writer's point of
>> view. Compliant driver doesn't have to bother asserting clean_map never
>> be true so their code would never bother dealing with this case, as
>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
>> during reset for older userspace":
>>
>> "
>> The separation of .compat_reset from the regular .reset allows
>> vhost-vdpa able to know which driver had broken behavior before, so it
>> can apply the corresponding compatibility quirk to the individual
>> driver
>> whenever needed. Compared to overloading the existing .reset with
>> flags, .compat_reset won't cause any extra burden to the implementation
>> of every compliant driver.
>> "
>>
>>> As it tries hard to not introduce new stuff on the bus.
>> Honestly I don't see substantial difference between these other than the
>> color. There's no single best solution that stands out among the 3. And
>> I assume you already noticed it from all the above 3 approaches will
>> have to go with backend features negotiation, that the 1st vdpa reset
>> before backend feature negotiation will use the compliant version of
>> .reset that doesn't clean up the map. While I don't think this nuance
>> matters much to existing older userspace apps, as the maps should
>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
>> bug-for-bug behavioral compatibility is what you want, module parameter
>> will be the single best answer.
>>
>> Regards,
>> -Siwei
>>
>>> But we can listen to others for sure.
>>>
>>> Thanks
>>>

2023-10-25 01:17:22

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH v4 5/7] vhost-vdpa: clean iotlb map during reset for older userspace

On Wed, Oct 25, 2023 at 12:25 AM Si-Wei Liu <[email protected]> wrote:
>
>
>
> On 10/24/2023 9:21 AM, Si-Wei Liu wrote:
> >
> >
> > On 10/23/2023 10:45 PM, Jason Wang wrote:
> >> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]>
> >> wrote:
> >>> Using .compat_reset op from the previous patch, the buggy .reset
> >>> behaviour can be kept as-is on older userspace apps, which don't ack
> >>> the
> >>> IOTLB_PERSIST backend feature. As this compatibility quirk is
> >>> limited to
> >>> those drivers that used to be buggy in the past, it won't affect change
> >>> the behaviour or affect ABI on the setups with API compliant driver.
> >>>
> >>> The separation of .compat_reset from the regular .reset allows
> >>> vhost-vdpa able to know which driver had broken behaviour before, so it
> >>> can apply the corresponding compatibility quirk to the individual
> >>> driver
> >>> whenever needed. Compared to overloading the existing .reset with
> >>> flags, .compat_reset won't cause any extra burden to the implementation
> >>> of every compliant driver.
> >>>
> >>> Signed-off-by: Si-Wei Liu <[email protected]>
> >>> ---
> >>> drivers/vhost/vdpa.c | 17 +++++++++++++----
> >>> drivers/virtio/virtio_vdpa.c | 2 +-
> >>> include/linux/vdpa.h | 7 +++++--
> >>> 3 files changed, 19 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> >>> index acc7c74ba7d6..9ce40003793b 100644
> >>> --- a/drivers/vhost/vdpa.c
> >>> +++ b/drivers/vhost/vdpa.c
> >>> @@ -227,13 +227,22 @@ static void vhost_vdpa_unsetup_vq_irq(struct
> >>> vhost_vdpa *v, u16 qid)
> >>> irq_bypass_unregister_producer(&vq->call_ctx.producer);
> >>> }
> >>>
> >>> -static int vhost_vdpa_reset(struct vhost_vdpa *v)
> >>> +static int _compat_vdpa_reset(struct vhost_vdpa *v)
> >>> {
> >>> struct vdpa_device *vdpa = v->vdpa;
> >>> + u32 flags = 0;
> >>>
> >>> - v->in_batch = 0;
> >>> + flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
> >>> + VHOST_BACKEND_F_IOTLB_PERSIST) ?
> >>> + VDPA_RESET_F_CLEAN_MAP : 0;
> >>> +
> >>> + return vdpa_reset(vdpa, flags);
> >>> +}
> >>>
> >>> - return vdpa_reset(vdpa);
> >>> +static int vhost_vdpa_reset(struct vhost_vdpa *v)
> >>> +{
> >>> + v->in_batch = 0;
> >>> + return _compat_vdpa_reset(v);
> >>> }
> >>>
> >>> static long vhost_vdpa_bind_mm(struct vhost_vdpa *v)
> >>> @@ -312,7 +321,7 @@ static long vhost_vdpa_set_status(struct
> >>> vhost_vdpa *v, u8 __user *statusp)
> >>> vhost_vdpa_unsetup_vq_irq(v, i);
> >>>
> >>> if (status == 0) {
> >>> - ret = vdpa_reset(vdpa);
> >>> + ret = _compat_vdpa_reset(v);
> >>> if (ret)
> >>> return ret;
> >>> } else
> >>> diff --git a/drivers/virtio/virtio_vdpa.c
> >>> b/drivers/virtio/virtio_vdpa.c
> >>> index 06ce6d8c2e00..8d63e5923d24 100644
> >>> --- a/drivers/virtio/virtio_vdpa.c
> >>> +++ b/drivers/virtio/virtio_vdpa.c
> >>> @@ -100,7 +100,7 @@ static void virtio_vdpa_reset(struct
> >>> virtio_device *vdev)
> >>> {
> >>> struct vdpa_device *vdpa = vd_get_vdpa(vdev);
> >>>
> >>> - vdpa_reset(vdpa);
> >>> + vdpa_reset(vdpa, 0);
> >>> }
> >>>
> >>> static bool virtio_vdpa_notify(struct virtqueue *vq)
> >>> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> >>> index 6b8cbf75712d..db15ac07f8a6 100644
> >>> --- a/include/linux/vdpa.h
> >>> +++ b/include/linux/vdpa.h
> >>> @@ -519,14 +519,17 @@ static inline struct device
> >>> *vdpa_get_dma_dev(struct vdpa_device *vdev)
> >>> return vdev->dma_dev;
> >>> }
> >>>
> >>> -static inline int vdpa_reset(struct vdpa_device *vdev)
> >>> +static inline int vdpa_reset(struct vdpa_device *vdev, u32 flags)
> >>> {
> >>> const struct vdpa_config_ops *ops = vdev->config;
> >>> int ret;
> >>>
> >>> down_write(&vdev->cf_lock);
> >>> vdev->features_valid = false;
> >>> - ret = ops->reset(vdev);
> >>> + if (ops->compat_reset && flags)
> >>> + ret = ops->compat_reset(vdev, flags);
> >>> + else
> >>> + ret = ops->reset(vdev);
> >> Instead of inventing a new API that carries the flags. Tweak the
> >> existing one seems to be simpler and better?
> > Well, as indicated in the commit message, this allows vhost-vdpa be
> > able to know which driver had broken behavior before, so it
> > can apply the corresponding compatibility quirk to the individual
> > driver when it's really necessary. If sending all flags
> > unconditionally down to every driver,

It depends on whether IOTLB_PERSIST is set.

> it's hard for driver writers to
> > distinguish which are compatibility quirks that they can safely ignore
> > and which are feature flags that are encouraged to implement. In that
> > sense, gating features from being polluted by compatibility quirks
> > with an implicit op
> s/implicit/explicit/
> > would be better.

Both of us have the points, we can listen to Michael or Eugenio for sure.

Thanks

> >
> > Regards,
> > -Siwei
> >>
> >> As compat_reset(vdev, 0) == reset(vdev)
> >>
> >> Then you don't need the switch in the parent as well
> >>
> >> +static int vdpasim_reset(struct vdpa_device *vdpa)
> >> +{
> >> + return vdpasim_compat_reset(vdpa, 0);
> >> +}
> >>
> >> Thanks
> >>
> >>
> >>> up_write(&vdev->cf_lock);
> >>> return ret;
> >>> }
> >>> --
> >>> 2.39.3
> >>>
> >
>

2023-10-25 09:43:12

by Lei Yang

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <[email protected]> wrote:
>
Hello Si-Wei
> Thanks a lot for testing! Please be aware that there's a follow-up fix
> for a potential oops in this v4 series:
>
The first, when I did not apply this patch [1], I will also hit this
patch mentioned problem. After I applied this patch, this problem will
no longer to hit again. But I hit another issues, about the error
messages please review the attached file.
[1] https://lore.kernel.org/virtualization/[email protected]/

My test steps:
git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
cd linux/
b4 am [email protected]
b4 am [email protected]
b4 am [email protected]
git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
cp /boot/config-5.14.0-377.el9.x86_64 .config
make -j 32
make modules_install
make install

Thanks

Lei
> https://lore.kernel.org/virtualization/[email protected]/
>
> Would be nice to have it applied for any tests.
>
> Thanks,
> -Siwei
>
> On 10/23/2023 11:51 PM, Lei Yang wrote:
> > QE tested this series v4 with regression testing on real nic, there is
> > no new regression bug.
> >
> > Tested-by: Lei Yang <[email protected]>
> >
> > On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <[email protected]> wrote:
> >>
> >>
> >> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>> Hi Si-Wei:
> >>>
> >>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
> >>>> In order to reduce needlessly high setup and teardown cost
> >>>> of iotlb mapping during live migration, it's crucial to
> >>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>> device life cycle, i.e. iotlb mappings should be left
> >>>> intact across virtio device reset [1]. For it to work, the
> >>>> on-chip IOMMU parent device could implement a separate
> >>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>> without having to resort to the .reset() callback, the
> >>>> latter of which is mainly used to reset virtio device state.
> >>>> This new .reset_map() callback will be invoked only before
> >>>> the vhost-vdpa driver is to be removed and detached from
> >>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>> are attached. For the context, those on-chip IOMMU parent
> >>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>> and they would implicitly destroy the 1:1 mapping when
> >>>> the first .set_map or .dma_map callback is invoked.
> >>>>
> >>>> This patchset is rebased on top of the latest vhost tree.
> >>>>
> >>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>> https://www.mail-archive.com/[email protected]/msg953755.html
> >>>>
> >>>> ---
> >>>> v4:
> >>>> - Rework compatibility using new .compat_reset driver op
> >>> I still think having a set_backend_feature()
> >> This will overload backend features with the role of carrying over
> >> compatibility quirks, which I tried to avoid from. While I think the
> >> .compat_reset from the v4 code just works with the backend features
> >> acknowledgement (and maybe others as well) to determine, but not
> >> directly tie it to backend features itself. These two have different
> >> implications in terms of requirement, scope and maintaining/deprecation,
> >> better to cope with compat quirks in explicit and driver visible way.
> >>
> >>> or reset_map(clean=true) might be better.
> >> An explicit op might be marginally better in driver writer's point of
> >> view. Compliant driver doesn't have to bother asserting clean_map never
> >> be true so their code would never bother dealing with this case, as
> >> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> >> during reset for older userspace":
> >>
> >> "
> >> The separation of .compat_reset from the regular .reset allows
> >> vhost-vdpa able to know which driver had broken behavior before, so it
> >> can apply the corresponding compatibility quirk to the individual
> >> driver
> >> whenever needed. Compared to overloading the existing .reset with
> >> flags, .compat_reset won't cause any extra burden to the implementation
> >> of every compliant driver.
> >> "
> >>
> >>> As it tries hard to not introduce new stuff on the bus.
> >> Honestly I don't see substantial difference between these other than the
> >> color. There's no single best solution that stands out among the 3. And
> >> I assume you already noticed it from all the above 3 approaches will
> >> have to go with backend features negotiation, that the 1st vdpa reset
> >> before backend feature negotiation will use the compliant version of
> >> .reset that doesn't clean up the map. While I don't think this nuance
> >> matters much to existing older userspace apps, as the maps should
> >> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> >> bug-for-bug behavioral compatibility is what you want, module parameter
> >> will be the single best answer.
> >>
> >> Regards,
> >> -Siwei
> >>
> >>> But we can listen to others for sure.
> >>>
> >>> Thanks
> >>>
>


Attachments:
log (6.18 kB)

2023-10-25 23:32:34

by Si-Wei Liu

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

Hi Yang Lei,

Thanks for testing my patches and reporting! As for the issue, could you
please try what I posted in:

https://lore.kernel.org/virtualization/[email protected]/

and let me know how it goes? Thank you very much!

Thanks,
-Siwei

On 10/25/2023 2:41 AM, Lei Yang wrote:
> On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <[email protected]> wrote:
> Hello Si-Wei
>> Thanks a lot for testing! Please be aware that there's a follow-up fix
>> for a potential oops in this v4 series:
>>
> The first, when I did not apply this patch [1], I will also hit this
> patch mentioned problem. After I applied this patch, this problem will
> no longer to hit again. But I hit another issues, about the error
> messages please review the attached file.
> [1] https://lore.kernel.org/virtualization/[email protected]/
>
> My test steps:
> git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> cd linux/
> b4 am [email protected]
> b4 am [email protected]
> b4 am [email protected]
> git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
> git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
> git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
> cp /boot/config-5.14.0-377.el9.x86_64 .config
> make -j 32
> make modules_install
> make install
>
> Thanks
>
> Lei
>> https://lore.kernel.org/virtualization/[email protected]/
>>
>> Would be nice to have it applied for any tests.
>>
>> Thanks,
>> -Siwei
>>
>> On 10/23/2023 11:51 PM, Lei Yang wrote:
>>> QE tested this series v4 with regression testing on real nic, there is
>>> no new regression bug.
>>>
>>> Tested-by: Lei Yang <[email protected]>
>>>
>>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <[email protected]> wrote:
>>>>
>>>> On 10/22/2023 8:51 PM, Jason Wang wrote:
>>>>> Hi Si-Wei:
>>>>>
>>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
>>>>>> In order to reduce needlessly high setup and teardown cost
>>>>>> of iotlb mapping during live migration, it's crucial to
>>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
>>>>>> device life cycle, i.e. iotlb mappings should be left
>>>>>> intact across virtio device reset [1]. For it to work, the
>>>>>> on-chip IOMMU parent device could implement a separate
>>>>>> .reset_map() operation callback to restore 1:1 DMA mapping
>>>>>> without having to resort to the .reset() callback, the
>>>>>> latter of which is mainly used to reset virtio device state.
>>>>>> This new .reset_map() callback will be invoked only before
>>>>>> the vhost-vdpa driver is to be removed and detached from
>>>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
>>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
>>>>>> are attached. For the context, those on-chip IOMMU parent
>>>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
>>>>>> and they would implicitly destroy the 1:1 mapping when
>>>>>> the first .set_map or .dma_map callback is invoked.
>>>>>>
>>>>>> This patchset is rebased on top of the latest vhost tree.
>>>>>>
>>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
>>>>>> https://www.mail-archive.com/[email protected]/msg953755.html
>>>>>>
>>>>>> ---
>>>>>> v4:
>>>>>> - Rework compatibility using new .compat_reset driver op
>>>>> I still think having a set_backend_feature()
>>>> This will overload backend features with the role of carrying over
>>>> compatibility quirks, which I tried to avoid from. While I think the
>>>> .compat_reset from the v4 code just works with the backend features
>>>> acknowledgement (and maybe others as well) to determine, but not
>>>> directly tie it to backend features itself. These two have different
>>>> implications in terms of requirement, scope and maintaining/deprecation,
>>>> better to cope with compat quirks in explicit and driver visible way.
>>>>
>>>>> or reset_map(clean=true) might be better.
>>>> An explicit op might be marginally better in driver writer's point of
>>>> view. Compliant driver doesn't have to bother asserting clean_map never
>>>> be true so their code would never bother dealing with this case, as
>>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
>>>> during reset for older userspace":
>>>>
>>>> "
>>>> The separation of .compat_reset from the regular .reset allows
>>>> vhost-vdpa able to know which driver had broken behavior before, so it
>>>> can apply the corresponding compatibility quirk to the individual
>>>> driver
>>>> whenever needed. Compared to overloading the existing .reset with
>>>> flags, .compat_reset won't cause any extra burden to the implementation
>>>> of every compliant driver.
>>>> "
>>>>
>>>>> As it tries hard to not introduce new stuff on the bus.
>>>> Honestly I don't see substantial difference between these other than the
>>>> color. There's no single best solution that stands out among the 3. And
>>>> I assume you already noticed it from all the above 3 approaches will
>>>> have to go with backend features negotiation, that the 1st vdpa reset
>>>> before backend feature negotiation will use the compliant version of
>>>> .reset that doesn't clean up the map. While I don't think this nuance
>>>> matters much to existing older userspace apps, as the maps should
>>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
>>>> bug-for-bug behavioral compatibility is what you want, module parameter
>>>> will be the single best answer.
>>>>
>>>> Regards,
>>>> -Siwei
>>>>
>>>>> But we can listen to others for sure.
>>>>>
>>>>> Thanks
>>>>>

2023-10-26 06:18:07

by Lei Yang

[permalink] [raw]
Subject: Re: [PATCH v4 0/7] vdpa: decouple reset of iotlb mapping from device reset

On Thu, Oct 26, 2023 at 7:32 AM Si-Wei Liu <[email protected]> wrote:
>
> Hi Yang Lei,
>
> Thanks for testing my patches and reporting! As for the issue, could you
> please try what I posted in:
>
> https://lore.kernel.org/virtualization/[email protected]/
>
HI Si-Wei
> and let me know how it goes? Thank you very much!

This problem has gone after applying this patch [1].
[1] https://lore.kernel.org/virtualization/[email protected]/

Thanks
Lei
>
> Thanks,
> -Siwei
>
> On 10/25/2023 2:41 AM, Lei Yang wrote:
> > On Wed, Oct 25, 2023 at 1:27 AM Si-Wei Liu <[email protected]> wrote:
> > Hello Si-Wei
> >> Thanks a lot for testing! Please be aware that there's a follow-up fix
> >> for a potential oops in this v4 series:
> >>
> > The first, when I did not apply this patch [1], I will also hit this
> > patch mentioned problem. After I applied this patch, this problem will
> > no longer to hit again. But I hit another issues, about the error
> > messages please review the attached file.
> > [1] https://lore.kernel.org/virtualization/[email protected]/
> >
> > My test steps:
> > git clone https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> > cd linux/
> > b4 am [email protected]
> > b4 am [email protected]
> > b4 am [email protected]
> > git am ./v4_20231018_dtatulea_vdpa_add_support_for_vq_descriptor_mappings.mbx
> > git am ./v4_20231021_si_wei_liu_vdpa_decouple_reset_of_iotlb_mapping_from_device_reset.mbx
> > git am ./20231023_si_wei_liu_vhost_vdpa_fix_null_pointer_deref_in__compat_vdpa_reset.mbx
> > cp /boot/config-5.14.0-377.el9.x86_64 .config
> > make -j 32
> > make modules_install
> > make install
> >
> > Thanks
> >
> > Lei
> >> https://lore.kernel.org/virtualization/[email protected]/
> >>
> >> Would be nice to have it applied for any tests.
> >>
> >> Thanks,
> >> -Siwei
> >>
> >> On 10/23/2023 11:51 PM, Lei Yang wrote:
> >>> QE tested this series v4 with regression testing on real nic, there is
> >>> no new regression bug.
> >>>
> >>> Tested-by: Lei Yang <[email protected]>
> >>>
> >>> On Tue, Oct 24, 2023 at 6:02 AM Si-Wei Liu <[email protected]> wrote:
> >>>>
> >>>> On 10/22/2023 8:51 PM, Jason Wang wrote:
> >>>>> Hi Si-Wei:
> >>>>>
> >>>>> On Sat, Oct 21, 2023 at 5:28 PM Si-Wei Liu <[email protected]> wrote:
> >>>>>> In order to reduce needlessly high setup and teardown cost
> >>>>>> of iotlb mapping during live migration, it's crucial to
> >>>>>> decouple the vhost-vdpa iotlb abstraction from the virtio
> >>>>>> device life cycle, i.e. iotlb mappings should be left
> >>>>>> intact across virtio device reset [1]. For it to work, the
> >>>>>> on-chip IOMMU parent device could implement a separate
> >>>>>> .reset_map() operation callback to restore 1:1 DMA mapping
> >>>>>> without having to resort to the .reset() callback, the
> >>>>>> latter of which is mainly used to reset virtio device state.
> >>>>>> This new .reset_map() callback will be invoked only before
> >>>>>> the vhost-vdpa driver is to be removed and detached from
> >>>>>> the vdpa bus, such that other vdpa bus drivers, e.g.
> >>>>>> virtio-vdpa, can start with 1:1 DMA mapping when they
> >>>>>> are attached. For the context, those on-chip IOMMU parent
> >>>>>> devices, create the 1:1 DMA mapping at vdpa device creation,
> >>>>>> and they would implicitly destroy the 1:1 mapping when
> >>>>>> the first .set_map or .dma_map callback is invoked.
> >>>>>>
> >>>>>> This patchset is rebased on top of the latest vhost tree.
> >>>>>>
> >>>>>> [1] Reducing vdpa migration downtime because of memory pin / maps
> >>>>>> https://www.mail-archive.com/[email protected]/msg953755.html
> >>>>>>
> >>>>>> ---
> >>>>>> v4:
> >>>>>> - Rework compatibility using new .compat_reset driver op
> >>>>> I still think having a set_backend_feature()
> >>>> This will overload backend features with the role of carrying over
> >>>> compatibility quirks, which I tried to avoid from. While I think the
> >>>> .compat_reset from the v4 code just works with the backend features
> >>>> acknowledgement (and maybe others as well) to determine, but not
> >>>> directly tie it to backend features itself. These two have different
> >>>> implications in terms of requirement, scope and maintaining/deprecation,
> >>>> better to cope with compat quirks in explicit and driver visible way.
> >>>>
> >>>>> or reset_map(clean=true) might be better.
> >>>> An explicit op might be marginally better in driver writer's point of
> >>>> view. Compliant driver doesn't have to bother asserting clean_map never
> >>>> be true so their code would never bother dealing with this case, as
> >>>> explained in the commit log for patch 5 "vhost-vdpa: clean iotlb map
> >>>> during reset for older userspace":
> >>>>
> >>>> "
> >>>> The separation of .compat_reset from the regular .reset allows
> >>>> vhost-vdpa able to know which driver had broken behavior before, so it
> >>>> can apply the corresponding compatibility quirk to the individual
> >>>> driver
> >>>> whenever needed. Compared to overloading the existing .reset with
> >>>> flags, .compat_reset won't cause any extra burden to the implementation
> >>>> of every compliant driver.
> >>>> "
> >>>>
> >>>>> As it tries hard to not introduce new stuff on the bus.
> >>>> Honestly I don't see substantial difference between these other than the
> >>>> color. There's no single best solution that stands out among the 3. And
> >>>> I assume you already noticed it from all the above 3 approaches will
> >>>> have to go with backend features negotiation, that the 1st vdpa reset
> >>>> before backend feature negotiation will use the compliant version of
> >>>> .reset that doesn't clean up the map. While I don't think this nuance
> >>>> matters much to existing older userspace apps, as the maps should
> >>>> already get cleaned by previous process in vhost_vdpa_cleanup(), but if
> >>>> bug-for-bug behavioral compatibility is what you want, module parameter
> >>>> will be the single best answer.
> >>>>
> >>>> Regards,
> >>>> -Siwei
> >>>>
> >>>>> But we can listen to others for sure.
> >>>>>
> >>>>> Thanks
> >>>>>
>