2023-12-19 18:09:50

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs

Add support for resumable vqs in the mlx5_vdpa driver. This is a
firmware feature that can be used for the following benefits:
- Full device .suspend/.resume.
- .set_map doesn't need to destroy and create new vqs anymore just to
update the map. When resumable vqs are supported it is enough to
suspend the vqs, set the new maps, and then resume the vqs.

The first patch exposes relevant bits for the feature in mlx5_ifc.h.
That means it needs to be applied to the mlx5-vhost tree [0] first. Once
applied there, the change has to be pulled from mlx5-vhost into the
vhost tree and only then the remaining patches can be applied. Same flow
as the vq descriptor mappings patchset [1].

The second part implements the vdpa backend feature support to allow
vq state and address changes when the device is in DRIVER_OK state and
suspended.

The third part adds support for seletively modifying vq parameters. This
is needed to be able to use resumable vqs.

Then the actual support for resumable vqs is added.

The last part of the series introduces reference counting for mrs which
is necessary to avoid freeing mkeys too early or leaking them.

* Changes in v4:
- Added vdpa backend feature support for changing vq properties in
DRIVER_OK when device is suspended. Added support in the driver as
well.
- Dropped Acked-by for the patches that had the tag mistakenly
added.

* Changes in v3:
- Faulty version. Please ignore.

* Changes in v2:
- Added mr refcounting patches.
- Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
unlocked variants"
- Small print improvement in "Introduce per vq and device resume"
patch.
- Patch 1/7 has been applied to mlx5-vhost branch.


Dragos Tatulea (15):
vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend
feature
vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend
feature
vdpa: Track device suspended state
vdpa: Block vq address change in DRIVER_OK unless device supports it
vdpa: Block vq state change in DRIVER_OK unless device supports it
vdpa/mlx5: Expose resumable vq capability
vdpa/mlx5: Allow modifying multiple vq fields in one modify command
vdpa/mlx5: Introduce per vq and device resume
vdpa/mlx5: Mark vq addrs for modification in hw vq
vdpa/mlx5: Mark vq state for modification in hw vq
vdpa/mlx5: Use vq suspend/resume during .set_map
vdpa/mlx5: Introduce reference counting to mrs
vdpa/mlx5: Add mkey leak detection

drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 +-
drivers/vdpa/mlx5/core/mr.c | 69 +++++++--
drivers/vdpa/mlx5/net/mlx5_vnet.c | 218 ++++++++++++++++++++++++++---
drivers/vhost/vdpa.c | 51 ++++++-
include/linux/mlx5/mlx5_ifc.h | 3 +-
include/linux/mlx5/mlx5_ifc_vdpa.h | 4 +
include/uapi/linux/vhost_types.h | 8 ++
7 files changed, 322 insertions(+), 41 deletions(-)

--
2.43.0



2023-12-19 18:10:16

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability

Necessary for checking if resumable vqs are supported by the hardware.
Actual support will be added in a downstream patch.

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
include/linux/mlx5/mlx5_ifc.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6f3631425f38..9eaceaf6bcb0 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1236,7 +1236,8 @@ struct mlx5_ifc_virtio_emulation_cap_bits {

u8 reserved_at_c0[0x13];
u8 desc_group_mkey_supported[0x1];
- u8 reserved_at_d4[0xc];
+ u8 freeze_to_rdy_supported[0x1];
+ u8 reserved_at_d5[0xb];

u8 reserved_at_e0[0x20];

--
2.43.0


2023-12-19 18:11:09

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag

The virtio spec doesn't allow changing virtqueue state after
DRIVER_OK. Some devices do support this operation when the device is
suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
advertises this support as a backend features.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
include/uapi/linux/vhost_types.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index aacd067afc89..848dadc95ca1 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -196,5 +196,9 @@ struct vhost_vdpa_iova_range {
* and is in state DRIVER_OK.
*/
#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9
+/* Device supports changing virtqueue state when device is suspended
+ * and is in state DRIVER_OK.
+ */
+#define VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND 0x10

#endif
--
2.43.0


2023-12-19 18:12:20

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend feature

If userland sets this feature, allow it.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 2250fcd90e5b..b4e8ddf86485 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -750,6 +750,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_RESUME) |
BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
+ BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
--
2.43.0


2023-12-19 18:13:17

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command

Add a bitmask variable that tracks hw vq field changes that
are supposed to be modified on next hw vq change command.

This will be useful to set multiple vq fields when resuming the vq.

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +++++++++++++++++++++++++------
1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 26ba7da6b410..1e08a8805640 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -120,6 +120,9 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+
+ u64 modified_fields;
+
struct msi_map map;

/* keep last in the struct */
@@ -1181,7 +1184,19 @@ static bool is_valid_state_change(int oldstate, int newstate)
}
}

-static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
+static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
+{
+ /* Only state is always modifiable */
+ if (mvq->modified_fields & ~MLX5_VIRTQ_MODIFY_MASK_STATE)
+ return mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT ||
+ mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
+
+ return true;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ int state)
{
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
@@ -1193,6 +1208,9 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
return 0;

+ if (!modifiable_virtqueue_fields(mvq))
+ return -EINVAL;
+
if (!is_valid_state_change(mvq->fw_state, state))
return -EINVAL;

@@ -1208,17 +1226,28 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
- MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
- MLX5_VIRTQ_MODIFY_MASK_STATE);
- MLX5_SET(virtio_net_q_object, obj_context, state, state);
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+ MLX5_SET(virtio_net_q_object, obj_context, state, state);
+
+ MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
kfree(in);
if (!err)
mvq->fw_state = state;

+ mvq->modified_fields = 0;
+
return err;
}

+static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ unsigned int state)
+{
+ mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
+ return modify_virtqueue(ndev, mvq, state);
+}
+
static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
u32 in[MLX5_ST_SZ_DW(create_virtio_q_counters_in)] = {};
@@ -1347,7 +1376,7 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
goto err_vq;

if (mvq->ready) {
- err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
if (err) {
mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
idx, err);
@@ -1382,7 +1411,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
return;

- if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+ if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");

if (query_virtqueue(ndev, mvq, &attr)) {
@@ -1407,6 +1436,7 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *
return;

suspend_vq(ndev, mvq);
+ mvq->modified_fields = 0;
destroy_virtqueue(ndev, mvq);
dealloc_vector(ndev, mvq);
counter_set_dealloc(ndev, mvq);
@@ -2207,7 +2237,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
if (!ready) {
suspend_vq(ndev, mvq);
} else {
- err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
if (err) {
mlx5_vdpa_warn(mvdev, "modify VQ %d to ready failed (%d)\n", idx, err);
ready = false;
@@ -2804,8 +2834,10 @@ static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
{
int i;

- for (i = 0; i < ndev->mvdev.max_vqs; i++)
+ for (i = 0; i < ndev->mvdev.max_vqs; i++) {
ndev->vqs[i].ready = false;
+ ndev->vqs[i].modified_fields = 0;
+ }

ndev->mvdev.cvq.ready = false;
}
--
2.43.0


2023-12-19 18:13:33

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 08/15] vdpa: Block vq state change in DRIVER_OK unless device supports it

Virtqueue state change during DRIVE_OK is not supported by the virtio
standard. Allow this op in DRIVER_OK only for devices that support
changing the state during DRIVER_OK if the device is suspended.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 6bfa3391935a..77509440c723 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -641,6 +641,9 @@ static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
case VHOST_SET_VRING_ADDR:
feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
break;
+ case VHOST_SET_VRING_BASE:
+ feature = VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND;
+ break;
default:
return false;
}
@@ -737,6 +740,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
break;

case VHOST_SET_VRING_BASE:
+ if (!vhost_vdpa_vq_config_allowed(v, cmd))
+ return -EOPNOTSUPP;
+
if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
vq_state.packed.last_avail_idx = vq->last_avail_idx & 0x7fff;
vq_state.packed.last_avail_counter = !!(vq->last_avail_idx & 0x8000);
--
2.43.0


2023-12-19 18:14:09

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume

Implement vdpa vq and device resume if capability detected. Add support
for suspend -> ready state change.

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 69 +++++++++++++++++++++++++++----
1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1e08a8805640..f8f088cced50 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1170,7 +1170,12 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
return err;
}

-static bool is_valid_state_change(int oldstate, int newstate)
+static bool is_resumable(struct mlx5_vdpa_net *ndev)
+{
+ return ndev->mvdev.vdev.config->resume;
+}
+
+static bool is_valid_state_change(int oldstate, int newstate, bool resumable)
{
switch (oldstate) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
@@ -1178,6 +1183,7 @@ static bool is_valid_state_change(int oldstate, int newstate)
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
return newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
+ return resumable ? newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY : false;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR:
default:
return false;
@@ -1200,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
{
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+ bool state_change = false;
void *obj_context;
void *cmd_hdr;
void *in;
@@ -1211,9 +1218,6 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (!modifiable_virtqueue_fields(mvq))
return -EINVAL;

- if (!is_valid_state_change(mvq->fw_state, state))
- return -EINVAL;
-
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1226,17 +1230,29 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
- if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
+ if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
+ err = -EINVAL;
+ goto done;
+ }
+
MLX5_SET(virtio_net_q_object, obj_context, state, state);
+ state_change = true;
+ }

MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
- kfree(in);
- if (!err)
+ if (err)
+ goto done;
+
+ if (state_change)
mvq->fw_state = state;

mvq->modified_fields = 0;

+done:
+ kfree(in);
return err;
}

@@ -1430,6 +1446,24 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
suspend_vq(ndev, &ndev->vqs[i]);
}

+static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ if (!mvq->initialized || !is_resumable(ndev))
+ return;
+
+ if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
+ return;
+
+ if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
+ mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u\n", mvq->index);
+}
+
+static void resume_vqs(struct mlx5_vdpa_net *ndev)
+{
+ for (int i = 0; i < ndev->mvdev.max_vqs; i++)
+ resume_vq(ndev, &ndev->vqs[i]);
+}
+
static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
if (!mvq->initialized)
@@ -3261,6 +3295,23 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
return 0;
}

+static int mlx5_vdpa_resume(struct vdpa_device *vdev)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev;
+
+ ndev = to_mlx5_vdpa_ndev(mvdev);
+
+ mlx5_vdpa_info(mvdev, "resuming device\n");
+
+ down_write(&ndev->reslock);
+ mvdev->suspended = false;
+ resume_vqs(ndev);
+ register_link_notifier(ndev);
+ up_write(&ndev->reslock);
+ return 0;
+}
+
static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
unsigned int asid)
{
@@ -3317,6 +3368,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
.suspend = mlx5_vdpa_suspend,
+ .resume = mlx5_vdpa_resume, /* Op disabled if not supported. */
};

static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
@@ -3688,6 +3740,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, desc_group_mkey_supported))
mgtdev->vdpa_ops.get_vq_desc_group = NULL;

+ if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, freeze_to_rdy_supported))
+ mgtdev->vdpa_ops.resume = NULL;
+
err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
if (err)
goto reg_err;
--
2.43.0


2023-12-19 18:14:47

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 11/15] vdpa/mlx5: Mark vq addrs for modification in hw vq

Addresses get set by .set_vq_address. hw vq addresses will be updated on
next modify_virtqueue.

Advertise that the device supports changing the vq addresses when
device is in DRIVER_OK state and suspended.

Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 17 ++++++++++++++++-
include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f8f088cced50..93812683c88c 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1209,6 +1209,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
bool state_change = false;
void *obj_context;
void *cmd_hdr;
+ void *vq_ctx;
void *in;
int err;

@@ -1230,6 +1231,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+ vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);

if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
@@ -1241,6 +1243,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
state_change = true;
}

+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS) {
+ MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+ MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+ MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+ }
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2202,6 +2210,7 @@ static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_
mvq->desc_addr = desc_area;
mvq->device_addr = device_area;
mvq->driver_addr = driver_area;
+ mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS;
return 0;
}

@@ -2626,7 +2635,13 @@ static void unregister_link_notifier(struct mlx5_vdpa_net *ndev)

static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device *vdpa)
{
- return BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
+ u64 features = BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK);
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdpa);
+
+ if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, freeze_to_rdy_supported))
+ features |= BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND);
+
+ return features;
}

static int mlx5_vdpa_set_driver_features(struct vdpa_device *vdev, u64 features)
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index b86d51a855f6..9594ac405740 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -145,6 +145,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_STATE = (u64)1 << 0,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS = (u64)1 << 3,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS = (u64)1 << 6,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14,
};

--
2.43.0


2023-12-19 18:15:13

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state for modification in hw vq

.set_vq_state will set the indices and mark the fields to be modified in
the hw vq.

Advertise that the device supports changing the vq state when the device
is in DRIVER_OK state and suspended.

Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 11 ++++++++++-
include/linux/mlx5/mlx5_ifc_vdpa.h | 2 ++
2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 93812683c88c..b760005e2920 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1249,6 +1249,12 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
}

+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX)
+ MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
+ MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2328,6 +2334,8 @@ static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,

mvq->used_idx = state->split.avail_index;
mvq->avail_idx = state->split.avail_index;
+ mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX |
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX;
return 0;
}

@@ -2639,7 +2647,8 @@ static u64 mlx5_vdpa_get_backend_features(const struct vdpa_device *vdpa)
struct mlx5_vdpa_dev *mvdev = to_mvdev(vdpa);

if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, freeze_to_rdy_supported))
- features |= BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND);
+ features |= BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
+ BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND);

return features;
}
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 9594ac405740..32e712106e68 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -146,6 +146,8 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS = (u64)1 << 3,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS = (u64)1 << 6,
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX = (u64)1 << 7,
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX = (u64)1 << 8,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14,
};

--
2.43.0


2023-12-19 18:15:57

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map

Instead of tearing down and setting up vq resources, use vq
suspend/resume during .set_map to speed things up a bit.

The vq mr is updated with the new mapping while the vqs are suspended.

If the device doesn't support resumable vqs, do the old teardown and
setup dance.

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 46 ++++++++++++++++++++++++------
include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
2 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index b760005e2920..fcadbeede3e1 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1206,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
{
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+ struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
bool state_change = false;
void *obj_context;
void *cmd_hdr;
@@ -1255,6 +1256,24 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX)
MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);

+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+ struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+
+ if (mr)
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+ else
+ mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
+ }
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+ struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+
+ if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+ MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+ else
+ mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+ }
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2791,24 +2810,35 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
unsigned int asid)
{
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ bool teardown = !is_resumable(ndev);
int err;

suspend_vqs(ndev);
- err = save_channels_info(ndev);
- if (err)
- return err;
+ if (teardown) {
+ err = save_channels_info(ndev);
+ if (err)
+ return err;

- teardown_driver(ndev);
+ teardown_driver(ndev);
+ }

mlx5_vdpa_update_mr(mvdev, new_mr, asid);

+ for (int i = 0; i < ndev->cur_num_vqs; i++)
+ ndev->vqs[i].modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
+ MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+
if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) || mvdev->suspended)
return 0;

- restore_channels_info(ndev);
- err = setup_driver(mvdev);
- if (err)
- return err;
+ if (teardown) {
+ restore_channels_info(ndev);
+ err = setup_driver(mvdev);
+ if (err)
+ return err;
+ }
+
+ resume_vqs(ndev);

return 0;
}
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index 32e712106e68..40371c916cf9 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -148,6 +148,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_ADDRS = (u64)1 << 6,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_AVAIL_IDX = (u64)1 << 7,
MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_USED_IDX = (u64)1 << 8,
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY = (u64)1 << 11,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14,
};

--
2.43.0


2023-12-19 18:16:59

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 15/15] vdpa/mlx5: Add mkey leak detection

Track allocated mrs in a list and show warning when leaks are detected
on device free or reset.

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 ++
drivers/vdpa/mlx5/core/mr.c | 23 +++++++++++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 ++
3 files changed, 27 insertions(+)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 1a0d27b6e09a..50aac8fe57ef 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -37,6 +37,7 @@ struct mlx5_vdpa_mr {
bool user_mr;

refcount_t refcount;
+ struct list_head mr_list;
};

struct mlx5_vdpa_resources {
@@ -95,6 +96,7 @@ struct mlx5_vdpa_dev {
u32 generation;

struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
+ struct list_head mr_list_head;
/* serialize mr access */
struct mutex mr_mtx;
struct mlx5_control_vq cvq;
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index c7dc8914354a..4758914ccf86 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -508,6 +508,8 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_

vhost_iotlb_free(mr->iotlb);

+ list_del(&mr->mr_list);
+
kfree(mr);
}

@@ -560,12 +562,31 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
mutex_unlock(&mvdev->mr_mtx);
}

+static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_mr *mr;
+
+ mutex_lock(&mvdev->mr_mtx);
+
+ list_for_each_entry(mr, &mvdev->mr_list_head, mr_list) {
+
+ mlx5_vdpa_warn(mvdev, "mkey still alive after resource delete: "
+ "mr: %p, mkey: 0x%x, refcount: %u\n",
+ mr, mr->mkey, refcount_read(&mr->refcount));
+ }
+
+ mutex_unlock(&mvdev->mr_mtx);
+
+}
+
void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
{
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
mlx5_vdpa_update_mr(mvdev, NULL, i);

prune_iotlb(mvdev->cvq.iotlb);
+
+ mlx5_vdpa_show_mr_leaks(mvdev);
}

static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
@@ -592,6 +613,8 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (err)
goto err_iotlb;

+ list_add_tail(&mr->mr_list, &mvdev->mr_list_head);
+
return 0;

err_iotlb:
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f81968b3f9cf..a783e8bd784d 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3729,6 +3729,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_mpfs;

+ INIT_LIST_HEAD(&mvdev->mr_list_head);
+
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
err = mlx5_vdpa_create_dma_mr(mvdev);
if (err)
--
2.43.0


2023-12-19 18:17:43

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs

Deleting the old mr during mr update (.set_map) and then modifying the
vqs with the new mr is not a good flow for firmware. The firmware
expects that mkeys are deleted after there are no more vqs referencing
them.

Introduce reference counting for mrs to fix this. It is the only way to
make sure that mkeys are not in use by vqs.

An mr reference is taken when the mr is associated to the mr asid table
and when the mr is linked to the vq on create/modify. The reference is
released when the mkey is unlinked from the vq (trough modify/destroy)
and from the mr asid table.

To make things consistent, get rid of mlx5_vdpa_destroy_mr and use
get/put semantics everywhere.

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 8 +++--
drivers/vdpa/mlx5/core/mr.c | 50 ++++++++++++++++++++----------
drivers/vdpa/mlx5/net/mlx5_vnet.c | 45 ++++++++++++++++++++++-----
3 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 84547d998bcf..1a0d27b6e09a 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -35,6 +35,8 @@ struct mlx5_vdpa_mr {
struct vhost_iotlb *iotlb;

bool user_mr;
+
+ refcount_t refcount;
};

struct mlx5_vdpa_resources {
@@ -118,8 +120,10 @@ int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb);
void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
- struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr);
void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr,
unsigned int asid);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 2197c46e563a..c7dc8914354a 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -498,32 +498,52 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr

static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
{
+ if (WARN_ON(!mr))
+ return;
+
if (mr->user_mr)
destroy_user_mr(mvdev, mr);
else
destroy_dma_mr(mvdev, mr);

vhost_iotlb_free(mr->iotlb);
+
+ kfree(mr);
}

-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
- struct mlx5_vdpa_mr *mr)
+static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
{
if (!mr)
return;

+ if (refcount_dec_and_test(&mr->refcount))
+ _mlx5_vdpa_destroy_mr(mvdev, mr);
+}
+
+void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
+{
mutex_lock(&mvdev->mr_mtx);
+ _mlx5_vdpa_put_mr(mvdev, mr);
+ mutex_unlock(&mvdev->mr_mtx);
+}

- _mlx5_vdpa_destroy_mr(mvdev, mr);
+static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
+{
+ if (!mr)
+ return;

- for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
- if (mvdev->mr[i] == mr)
- mvdev->mr[i] = NULL;
- }
+ refcount_inc(&mr->refcount);
+}

+void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
+{
+ mutex_lock(&mvdev->mr_mtx);
+ _mlx5_vdpa_get_mr(mvdev, mr);
mutex_unlock(&mvdev->mr_mtx);
-
- kfree(mr);
}

void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
@@ -534,20 +554,16 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,

mutex_lock(&mvdev->mr_mtx);

+ _mlx5_vdpa_put_mr(mvdev, old_mr);
mvdev->mr[asid] = new_mr;
- if (old_mr) {
- _mlx5_vdpa_destroy_mr(mvdev, old_mr);
- kfree(old_mr);
- }

mutex_unlock(&mvdev->mr_mtx);
-
}

void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
{
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
- mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
+ mlx5_vdpa_update_mr(mvdev, NULL, i);

prune_iotlb(mvdev->cvq.iotlb);
}
@@ -607,6 +623,8 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (err)
goto out_err;

+ refcount_set(&mr->refcount, 1);
+
return mr;

out_err:
@@ -651,7 +669,7 @@ int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
if (asid >= MLX5_VDPA_NUM_AS)
return -EINVAL;

- mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[asid]);
+ mlx5_vdpa_update_mr(mvdev, NULL, asid);

if (asid == 0 && MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
if (mlx5_vdpa_create_dma_mr(mvdev))
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index fcadbeede3e1..f81968b3f9cf 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -123,6 +123,9 @@ struct mlx5_vdpa_virtqueue {

u64 modified_fields;

+ struct mlx5_vdpa_mr *vq_mr;
+ struct mlx5_vdpa_mr *desc_mr;
+
struct msi_map map;

/* keep last in the struct */
@@ -946,6 +949,14 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
kfree(in);
mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);

+ mlx5_vdpa_get_mr(mvdev, vq_mr);
+ mvq->vq_mr = vq_mr;
+
+ if (vq_desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported)) {
+ mlx5_vdpa_get_mr(mvdev, vq_desc_mr);
+ mvq->desc_mr = vq_desc_mr;
+ }
+
return 0;

err_cmd:
@@ -972,6 +983,12 @@ static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtq
}
mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
umems_destroy(ndev, mvq);
+
+ mlx5_vdpa_put_mr(&ndev->mvdev, mvq->vq_mr);
+ mvq->vq_mr = NULL;
+
+ mlx5_vdpa_put_mr(&ndev->mvdev, mvq->desc_mr);
+ mvq->desc_mr = NULL;
}

static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
@@ -1207,6 +1224,8 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+ struct mlx5_vdpa_mr *desc_mr = NULL;
+ struct mlx5_vdpa_mr *vq_mr = NULL;
bool state_change = false;
void *obj_context;
void *cmd_hdr;
@@ -1257,19 +1276,19 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);

if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
- struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];

- if (mr)
- MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+ if (vq_mr)
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
else
mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
}

if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
- struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];

- if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
- MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+ if (desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+ MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, desc_mr->mkey);
else
mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
}
@@ -1282,6 +1301,18 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (state_change)
mvq->fw_state = state;

+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+ mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
+ mlx5_vdpa_get_mr(mvdev, vq_mr);
+ mvq->vq_mr = vq_mr;
+ }
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+ mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
+ mlx5_vdpa_get_mr(mvdev, desc_mr);
+ mvq->desc_mr = desc_mr;
+ }
+
mvq->modified_fields = 0;

done:
@@ -3102,7 +3133,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
return mlx5_vdpa_update_cvq_iotlb(mvdev, iotlb, asid);

out_err:
- mlx5_vdpa_destroy_mr(mvdev, new_mr);
+ mlx5_vdpa_put_mr(mvdev, new_mr);
return err;
}

--
2.43.0


2023-12-19 18:18:31

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

The virtio spec doesn't allow changing virtqueue addresses after
DRIVER_OK. Some devices do support this operation when the device is
suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
advertises this support as a backend features.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
include/uapi/linux/vhost_types.h | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index d7656908f730..aacd067afc89 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
#define VHOST_BACKEND_F_DESC_ASID 0x7
/* IOTLB don't flush memory mapping across device reset */
#define VHOST_BACKEND_F_IOTLB_PERSIST 0x8
+/* Device supports changing virtqueue addresses when device is suspended
+ * and is in state DRIVER_OK.
+ */
+#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9

#endif
--
2.43.0


2023-12-19 18:21:03

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it

Virtqueue address change during DRIVE_OK is not supported by the virtio
standard. Allow this op in DRIVER_OK only for devices that support
changing the address during DRIVER_OK if the device is suspended.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index 00b4fa8e89f2..6bfa3391935a 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -625,6 +625,29 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
return ret;
}

+static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
+{
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
+ int feature;
+
+ if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
+ return true;
+
+ if (!v->suspended)
+ return false;
+
+ switch (cmd) {
+ case VHOST_SET_VRING_ADDR:
+ feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
+ break;
+ default:
+ return false;
+ }
+
+ return v->vdev.vqs && vhost_backend_has_feature(v->vdev.vqs[0], feature);
+}
+
static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
void __user *argp)
{
@@ -703,6 +726,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,

switch (cmd) {
case VHOST_SET_VRING_ADDR:
+ if (!vhost_vdpa_vq_config_allowed(v, cmd))
+ return -EOPNOTSUPP;
+
if (ops->set_vq_address(vdpa, idx,
(u64)(uintptr_t)vq->desc,
(u64)(uintptr_t)vq->avail,
--
2.43.0


2023-12-19 18:22:14

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 06/15] vdpa: Track device suspended state

Set vdpa device suspended state on successful suspend. Clear it on
successful resume and reset.

The state will be locked by the vhost_vdpa mutex. The mutex is taken
during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
exception is vhost_vdpa_open which does a device reset but that should
be safe because it can only happen before the other ops.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index b4e8ddf86485..00b4fa8e89f2 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -59,6 +59,7 @@ struct vhost_vdpa {
int in_batch;
struct vdpa_iova_range range;
u32 batch_asid;
+ bool suspended;
};

static DEFINE_IDA(vhost_vdpa_ida);
@@ -232,6 +233,8 @@ static int _compat_vdpa_reset(struct vhost_vdpa *v)
struct vdpa_device *vdpa = v->vdpa;
u32 flags = 0;

+ v->suspended = false;
+
if (v->vdev.vqs) {
flags |= !vhost_backend_has_feature(v->vdev.vqs[0],
VHOST_BACKEND_F_IOTLB_PERSIST) ?
@@ -590,11 +593,16 @@ static long vhost_vdpa_suspend(struct vhost_vdpa *v)
{
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
+ int ret;

if (!ops->suspend)
return -EOPNOTSUPP;

- return ops->suspend(vdpa);
+ ret = ops->suspend(vdpa);
+ if (!ret)
+ v->suspended = true;
+
+ return ret;
}

/* After a successful return of this ioctl the device resumes processing
@@ -605,11 +613,16 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
{
struct vdpa_device *vdpa = v->vdpa;
const struct vdpa_config_ops *ops = vdpa->config;
+ int ret;

if (!ops->resume)
return -EOPNOTSUPP;

- return ops->resume(vdpa);
+ ret = ops->resume(vdpa);
+ if (!ret)
+ v->suspended = false;
+
+ return ret;
}

static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
--
2.43.0


2023-12-19 18:22:36

by Dragos Tatulea

[permalink] [raw]
Subject: [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature

If userland sets this feature, allow it.

Signed-off-by: Dragos Tatulea <[email protected]>
Suggested-by: Eugenio Pérez <[email protected]>
---
drivers/vhost/vdpa.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index da7ec77cdaff..2250fcd90e5b 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -749,6 +749,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST) |
BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_RESUME) |
+ BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
return -EOPNOTSUPP;
if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
--
2.43.0


2023-12-20 03:47:11

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH mlx5-vhost v4 01/15] vdpa/mlx5: Expose resumable vq capability

On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
>
> Necessary for checking if resumable vqs are supported by the hardware.
> Actual support will be added in a downstream patch.
>
> Reviewed-by: Gal Pressman <[email protected]>
> Acked-by: Eugenio Pérez <[email protected]>
> Signed-off-by: Dragos Tatulea <[email protected]>

Acked-by: Jason Wang <[email protected]>

Thanks


> ---
> include/linux/mlx5/mlx5_ifc.h | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
> index 6f3631425f38..9eaceaf6bcb0 100644
> --- a/include/linux/mlx5/mlx5_ifc.h
> +++ b/include/linux/mlx5/mlx5_ifc.h
> @@ -1236,7 +1236,8 @@ struct mlx5_ifc_virtio_emulation_cap_bits {
>
> u8 reserved_at_c0[0x13];
> u8 desc_group_mkey_supported[0x1];
> - u8 reserved_at_d4[0xc];
> + u8 freeze_to_rdy_supported[0x1];
> + u8 reserved_at_d5[0xb];
>
> u8 reserved_at_e0[0x20];
>
> --
> 2.43.0
>


2023-12-20 03:47:32

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
>
> The virtio spec doesn't allow changing virtqueue addresses after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> advertises this support as a backend features.

There's an ongoing effort in virtio spec to introduce the suspend state.

So I wonder if it's better to just allow such behaviour?

Thanks


>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>
> ---
> include/uapi/linux/vhost_types.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..aacd067afc89 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
> #define VHOST_BACKEND_F_DESC_ASID 0x7
> /* IOTLB don't flush memory mapping across device reset */
> #define VHOST_BACKEND_F_IOTLB_PERSIST 0x8
> +/* Device supports changing virtqueue addresses when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9
>
> #endif
> --
> 2.43.0
>


2023-12-20 03:47:48

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state

On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
>
> Set vdpa device suspended state on successful suspend. Clear it on
> successful resume and reset.
>
> The state will be locked by the vhost_vdpa mutex. The mutex is taken
> during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> exception is vhost_vdpa_open which does a device reset but that should
> be safe because it can only happen before the other ops.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>
> ---
> drivers/vhost/vdpa.c | 17 +++++++++++++++--
> 1 file changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index b4e8ddf86485..00b4fa8e89f2 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -59,6 +59,7 @@ struct vhost_vdpa {
> int in_batch;
> struct vdpa_iova_range range;
> u32 batch_asid;
> + bool suspended;

Any reason why we don't do it in the core vDPA device but here?

Thanks


2023-12-20 03:48:11

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 09/15] vdpa/mlx5: Allow modifying multiple vq fields in one modify command

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <[email protected]> wrote:
>
> Add a bitmask variable that tracks hw vq field changes that
> are supposed to be modified on next hw vq change command.
>
> This will be useful to set multiple vq fields when resuming the vq.
>
> Reviewed-by: Gal Pressman <[email protected]>
> Acked-by: Eugenio Pérez <[email protected]>
> Signed-off-by: Dragos Tatulea <[email protected]>

Acked-by: Jason Wang <[email protected]>

Thanks


2023-12-20 03:48:48

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 12/15] vdpa/mlx5: Mark vq state for modification in hw vq

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <[email protected]> wrote:
>
> .set_vq_state will set the indices and mark the fields to be modified in
> the hw vq.
>
> Advertise that the device supports changing the vq state when the device
> is in DRIVER_OK state and suspended.
>
> Reviewed-by: Gal Pressman <[email protected]>
> Signed-off-by: Dragos Tatulea <[email protected]>
> ---

Acked-by: Jason Wang <[email protected]>

Thanks


2023-12-20 03:49:24

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 13/15] vdpa/mlx5: Use vq suspend/resume during .set_map

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <[email protected]> wrote:
>
> Instead of tearing down and setting up vq resources, use vq
> suspend/resume during .set_map to speed things up a bit.
>
> The vq mr is updated with the new mapping while the vqs are suspended.
>
> If the device doesn't support resumable vqs, do the old teardown and
> setup dance.
>
> Reviewed-by: Gal Pressman <[email protected]>
> Acked-by: Eugenio Pérez <[email protected]>
> Signed-off-by: Dragos Tatulea <[email protected]>
> ---

Acked-by: Jason Wang <[email protected]>

Thanks


2023-12-20 03:49:39

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 14/15] vdpa/mlx5: Introduce reference counting to mrs

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <[email protected]> wrote:
>
> Deleting the old mr during mr update (.set_map) and then modifying the
> vqs with the new mr is not a good flow for firmware. The firmware
> expects that mkeys are deleted after there are no more vqs referencing
> them.
>
> Introduce reference counting for mrs to fix this. It is the only way to
> make sure that mkeys are not in use by vqs.
>
> An mr reference is taken when the mr is associated to the mr asid table
> and when the mr is linked to the vq on create/modify. The reference is
> released when the mkey is unlinked from the vq (trough modify/destroy)
> and from the mr asid table.
>
> To make things consistent, get rid of mlx5_vdpa_destroy_mr and use
> get/put semantics everywhere.
>
> Reviewed-by: Gal Pressman <[email protected]>
> Acked-by: Eugenio Pérez <[email protected]>
> Signed-off-by: Dragos Tatulea <[email protected]>
> ---

Acked-by: Jason Wang <[email protected]>

Thanks


2023-12-20 03:57:23

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 10/15] vdpa/mlx5: Introduce per vq and device resume

On Wed, Dec 20, 2023 at 2:10 AM Dragos Tatulea <[email protected]> wrote:
>
> Implement vdpa vq and device resume if capability detected. Add support
> for suspend -> ready state change.
>
> Reviewed-by: Gal Pressman <[email protected]>
> Acked-by: Eugenio Pérez <[email protected]>
> Signed-off-by: Dragos Tatulea <[email protected]>

Acked-by: Jason Wang <[email protected]>

Thanks


2023-12-20 04:06:15

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
>
> On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> >
> > The virtio spec doesn't allow changing virtqueue addresses after
> > DRIVER_OK. Some devices do support this operation when the device is
> > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > advertises this support as a backend features.
>
> There's an ongoing effort in virtio spec to introduce the suspend state.
>
> So I wonder if it's better to just allow such behaviour?

Actually I mean, allow drivers to modify the parameters during suspend
without a new feature.

Thanks

>
> Thanks
>
>


2023-12-20 12:56:01

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state

On Wed, 2023-12-20 at 11:46 +0800, Jason Wang wrote:
> On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> >
> > Set vdpa device suspended state on successful suspend. Clear it on
> > successful resume and reset.
> >
> > The state will be locked by the vhost_vdpa mutex. The mutex is taken
> > during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> > exception is vhost_vdpa_open which does a device reset but that should
> > be safe because it can only happen before the other ops.
> >
> > Signed-off-by: Dragos Tatulea <[email protected]>
> > Suggested-by: Eugenio Pérez <[email protected]>
> > ---
> > drivers/vhost/vdpa.c | 17 +++++++++++++++--
> > 1 file changed, 15 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > index b4e8ddf86485..00b4fa8e89f2 100644
> > --- a/drivers/vhost/vdpa.c
> > +++ b/drivers/vhost/vdpa.c
> > @@ -59,6 +59,7 @@ struct vhost_vdpa {
> > int in_batch;
> > struct vdpa_iova_range range;
> > u32 batch_asid;
> > + bool suspended;
>
> Any reason why we don't do it in the core vDPA device but here?
>
Not really. I wanted to be safe and not expose it in a header due to locking.

Thanks,
Dragos

2023-12-20 12:57:38

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Wed, 2023-12-20 at 12:05 +0800, Jason Wang wrote:
> On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> >
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > >
> > > The virtio spec doesn't allow changing virtqueue addresses after
> > > DRIVER_OK. Some devices do support this operation when the device is
> > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > advertises this support as a backend features.
> >
> > There's an ongoing effort in virtio spec to introduce the suspend state.
> >
> > So I wonder if it's better to just allow such behaviour?
>
> Actually I mean, allow drivers to modify the parameters during suspend
> without a new feature.
>
Fine by me. Less code is better than more code. The v2 of this series would be
sufficient then.

Thanks

2023-12-20 13:32:52

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
>
> On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> >
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > >
> > > The virtio spec doesn't allow changing virtqueue addresses after
> > > DRIVER_OK. Some devices do support this operation when the device is
> > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > advertises this support as a backend features.
> >
> > There's an ongoing effort in virtio spec to introduce the suspend state.
> >
> > So I wonder if it's better to just allow such behaviour?
>
> Actually I mean, allow drivers to modify the parameters during suspend
> without a new feature.
>

That would be ideal, but how do userland checks if it can suspend +
change properties + resume?

The only way that comes to my mind is to make sure all parents return
error if userland tries to do it, and then fallback in userland. I'm
ok with that, but I'm not sure if the current master & previous kernel
has a coherent behavior. Do they return error? Or return success
without changing address / vq state?


2023-12-20 16:18:46

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 05/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend feature

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <[email protected]> wrote:
>
> If userland sets this feature, allow it.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>

Acked-by: Eugenio Pérez <[email protected]>

> ---
> drivers/vhost/vdpa.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 2250fcd90e5b..b4e8ddf86485 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -750,6 +750,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> BIT_ULL(VHOST_BACKEND_F_RESUME) |
> BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
> + BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND) |
> BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> return -EOPNOTSUPP;
> if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> --
> 2.43.0
>


2023-12-20 16:24:59

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 03/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <[email protected]> wrote:
>
> The virtio spec doesn't allow changing virtqueue state after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
> advertises this support as a backend features.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>
> ---
> include/uapi/linux/vhost_types.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index aacd067afc89..848dadc95ca1 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -196,5 +196,9 @@ struct vhost_vdpa_iova_range {
> * and is in state DRIVER_OK.
> */
> #define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9
> +/* Device supports changing virtqueue state when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND 0x10

Acked-by: Eugenio Pérez <[email protected]>

It would be great to find a shorter ID but I'm very bad at naming :(.
Maybe it is ok to drop the _BACKEND_ word? I'm ok with carrying the
acked-by if so.

Thanks!

>
> #endif
> --
> 2.43.0
>


2023-12-20 16:26:41

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 04/15] vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend feature

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <[email protected]> wrote:
>
> If userland sets this feature, allow it.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>

Acked-by: Eugenio Pérez <[email protected]>

> ---
> drivers/vhost/vdpa.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index da7ec77cdaff..2250fcd90e5b 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -749,6 +749,7 @@ static long vhost_vdpa_unlocked_ioctl(struct file *filep,
> BIT_ULL(VHOST_BACKEND_F_IOTLB_PERSIST) |
> BIT_ULL(VHOST_BACKEND_F_SUSPEND) |
> BIT_ULL(VHOST_BACKEND_F_RESUME) |
> + BIT_ULL(VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND) |
> BIT_ULL(VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK)))
> return -EOPNOTSUPP;
> if ((features & BIT_ULL(VHOST_BACKEND_F_SUSPEND)) &&
> --
> 2.43.0
>


2023-12-20 16:28:50

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <[email protected]> wrote:
>
> The virtio spec doesn't allow changing virtqueue addresses after
> DRIVER_OK. Some devices do support this operation when the device is
> suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> advertises this support as a backend features.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>
> ---
> include/uapi/linux/vhost_types.h | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
> index d7656908f730..aacd067afc89 100644
> --- a/include/uapi/linux/vhost_types.h
> +++ b/include/uapi/linux/vhost_types.h
> @@ -192,5 +192,9 @@ struct vhost_vdpa_iova_range {
> #define VHOST_BACKEND_F_DESC_ASID 0x7
> /* IOTLB don't flush memory mapping across device reset */
> #define VHOST_BACKEND_F_IOTLB_PERSIST 0x8
> +/* Device supports changing virtqueue addresses when device is suspended
> + * and is in state DRIVER_OK.
> + */
> +#define VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND 0x9
>

If we go by feature flag,

Acked-by: Eugenio Pérez <[email protected]>

> #endif
> --
> 2.43.0
>


2023-12-20 16:31:58

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 07/15] vdpa: Block vq address change in DRIVER_OK unless device supports it

On Tue, Dec 19, 2023 at 7:09 PM Dragos Tatulea <[email protected]> wrote:
>
> Virtqueue address change during DRIVE_OK is not supported by the virtio
> standard. Allow this op in DRIVER_OK only for devices that support
> changing the address during DRIVER_OK if the device is suspended.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>

Acked-by: Eugenio Pérez <[email protected]>

> ---
> drivers/vhost/vdpa.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 00b4fa8e89f2..6bfa3391935a 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -625,6 +625,29 @@ static long vhost_vdpa_resume(struct vhost_vdpa *v)
> return ret;
> }
>
> +static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
> +{
> + struct vdpa_device *vdpa = v->vdpa;
> + const struct vdpa_config_ops *ops = vdpa->config;
> + int feature;
> +
> + if (!(ops->get_status(vdpa) & VIRTIO_CONFIG_S_DRIVER_OK))
> + return true;
> +
> + if (!v->suspended)
> + return false;
> +
> + switch (cmd) {
> + case VHOST_SET_VRING_ADDR:
> + feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
> + break;
> + default:
> + return false;
> + }
> +
> + return v->vdev.vqs && vhost_backend_has_feature(v->vdev.vqs[0], feature);
> +}
> +
> static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> void __user *argp)
> {
> @@ -703,6 +726,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
>
> switch (cmd) {
> case VHOST_SET_VRING_ADDR:
> + if (!vhost_vdpa_vq_config_allowed(v, cmd))
> + return -EOPNOTSUPP;
> +
> if (ops->set_vq_address(vdpa, idx,
> (u64)(uintptr_t)vq->desc,
> (u64)(uintptr_t)vq->avail,
> --
> 2.43.0
>


2023-12-20 16:33:20

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 08/15] vdpa: Block vq state change in DRIVER_OK unless device supports it

On Tue, Dec 19, 2023 at 7:10 PM Dragos Tatulea <[email protected]> wrote:
>
> Virtqueue state change during DRIVE_OK is not supported by the virtio
> standard. Allow this op in DRIVER_OK only for devices that support
> changing the state during DRIVER_OK if the device is suspended.
>
> Signed-off-by: Dragos Tatulea <[email protected]>
> Suggested-by: Eugenio Pérez <[email protected]>

Acked-by: Eugenio Pérez <[email protected]>

> ---
> drivers/vhost/vdpa.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> index 6bfa3391935a..77509440c723 100644
> --- a/drivers/vhost/vdpa.c
> +++ b/drivers/vhost/vdpa.c
> @@ -641,6 +641,9 @@ static bool vhost_vdpa_vq_config_allowed(struct vhost_vdpa *v, unsigned int cmd)
> case VHOST_SET_VRING_ADDR:
> feature = VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND;
> break;
> + case VHOST_SET_VRING_BASE:
> + feature = VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND;
> + break;
> default:
> return false;
> }
> @@ -737,6 +740,9 @@ static long vhost_vdpa_vring_ioctl(struct vhost_vdpa *v, unsigned int cmd,
> break;
>
> case VHOST_SET_VRING_BASE:
> + if (!vhost_vdpa_vq_config_allowed(v, cmd))
> + return -EOPNOTSUPP;
> +
> if (vhost_has_feature(vq, VIRTIO_F_RING_PACKED)) {
> vq_state.packed.last_avail_idx = vq->last_avail_idx & 0x7fff;
> vq_state.packed.last_avail_counter = !!(vq->last_avail_idx & 0x8000);
> --
> 2.43.0
>


2023-12-21 02:03:30

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
<[email protected]> wrote:
>
> On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> >
> > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > >
> > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > advertises this support as a backend features.
> > >
> > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > >
> > > So I wonder if it's better to just allow such behaviour?
> >
> > Actually I mean, allow drivers to modify the parameters during suspend
> > without a new feature.
> >
>
> That would be ideal, but how do userland checks if it can suspend +
> change properties + resume?

As discussed, it looks to me the only device that supports suspend is
simulator and it supports change properties.

E.g:

static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
u64 desc_area, u64 driver_area,
u64 device_area)
{
struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];

vq->desc_addr = desc_area;
vq->driver_addr = driver_area;
vq->device_addr = device_area;

return 0;
}

>
> The only way that comes to my mind is to make sure all parents return
> error if userland tries to do it, and then fallback in userland.

Yes.

> I'm
> ok with that, but I'm not sure if the current master & previous kernel
> has a coherent behavior. Do they return error? Or return success
> without changing address / vq state?

We probably don't need to worry too much here, as e.g set_vq_address
could fail even without suspend (just at uAPI level).

Thanks

>


2023-12-21 07:47:27

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
>
> On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> <[email protected]> wrote:
> >
> > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > >
> > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > advertises this support as a backend features.
> > > >
> > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > >
> > > > So I wonder if it's better to just allow such behaviour?
> > >
> > > Actually I mean, allow drivers to modify the parameters during suspend
> > > without a new feature.
> > >
> >
> > That would be ideal, but how do userland checks if it can suspend +
> > change properties + resume?
>
> As discussed, it looks to me the only device that supports suspend is
> simulator and it supports change properties.
>
> E.g:
>
> static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> u64 desc_area, u64 driver_area,
> u64 device_area)
> {
> struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
>
> vq->desc_addr = desc_area;
> vq->driver_addr = driver_area;
> vq->device_addr = device_area;
>
> return 0;
> }
>

So in the current kernel master it is valid to set a different vq
address while the device is suspended in vdpa_sim. But it is not valid
in mlx5, as the FW will not be updated in resume (Dragos, please
correct me if I'm wrong). Both of them return success.

How can we know in the destination QEMU if it is valid to suspend &
set address? Should we handle this as a bugfix and backport the
change?

> >
> > The only way that comes to my mind is to make sure all parents return
> > error if userland tries to do it, and then fallback in userland.
>
> Yes.
>
> > I'm
> > ok with that, but I'm not sure if the current master & previous kernel
> > has a coherent behavior. Do they return error? Or return success
> > without changing address / vq state?
>
> We probably don't need to worry too much here, as e.g set_vq_address
> could fail even without suspend (just at uAPI level).
>

I don't get this, sorry. I rephrased my point with an example earlier
in the mail.


2023-12-21 11:53:19

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> >
> > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > <[email protected]> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > >
> > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > advertises this support as a backend features.
> > > > >
> > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > >
> > > > > So I wonder if it's better to just allow such behaviour?
> > > >
> > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > without a new feature.
> > > >
> > >
> > > That would be ideal, but how do userland checks if it can suspend +
> > > change properties + resume?
> >
> > As discussed, it looks to me the only device that supports suspend is
> > simulator and it supports change properties.
> >
> > E.g:
> >
> > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > u64 desc_area, u64 driver_area,
> > u64 device_area)
> > {
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >
> > vq->desc_addr = desc_area;
> > vq->driver_addr = driver_area;
> > vq->device_addr = device_area;
> >
> > return 0;
> > }
> >
>
> So in the current kernel master it is valid to set a different vq
> address while the device is suspended in vdpa_sim. But it is not valid
> in mlx5, as the FW will not be updated in resume (Dragos, please
> correct me if I'm wrong). Both of them return success.
>
In the current state, there is no resume. HW Virtqueues will just get re-created
with the new address.

> How can we know in the destination QEMU if it is valid to suspend &
> set address? Should we handle this as a bugfix and backport the
> change?
>
> > >
> > > The only way that comes to my mind is to make sure all parents return
> > > error if userland tries to do it, and then fallback in userland.
> >
> > Yes.
> >
> > > I'm
> > > ok with that, but I'm not sure if the current master & previous kernel
> > > has a coherent behavior. Do they return error? Or return success
> > > without changing address / vq state?
> >
> > We probably don't need to worry too much here, as e.g set_vq_address
> > could fail even without suspend (just at uAPI level).
> >
>
> I don't get this, sorry. I rephrased my point with an example earlier
> in the mail.
>

2023-12-21 12:09:27

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <[email protected]> wrote:
>
> On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > <[email protected]> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > > >
> > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > advertises this support as a backend features.
> > > > > >
> > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > >
> > > > > > So I wonder if it's better to just allow such behaviour?
> > > > >
> > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > without a new feature.
> > > > >
> > > >
> > > > That would be ideal, but how do userland checks if it can suspend +
> > > > change properties + resume?
> > >
> > > As discussed, it looks to me the only device that supports suspend is
> > > simulator and it supports change properties.
> > >
> > > E.g:
> > >
> > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > u64 desc_area, u64 driver_area,
> > > u64 device_area)
> > > {
> > > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > >
> > > vq->desc_addr = desc_area;
> > > vq->driver_addr = driver_area;
> > > vq->device_addr = device_area;
> > >
> > > return 0;
> > > }
> > >
> >
> > So in the current kernel master it is valid to set a different vq
> > address while the device is suspended in vdpa_sim. But it is not valid
> > in mlx5, as the FW will not be updated in resume (Dragos, please
> > correct me if I'm wrong). Both of them return success.
> >
> In the current state, there is no resume. HW Virtqueues will just get re-created
> with the new address.
>

Oh, then all of this is effectively transparent to the userspace
except for the time it takes?

In that case you're right, we don't need feature flags. But I think it
would be great to also move the error return in case userspace tries
to modify vq parameters out of suspend state.

Thanks!


> > How can we know in the destination QEMU if it is valid to suspend &
> > set address? Should we handle this as a bugfix and backport the
> > change?
> >
> > > >
> > > > The only way that comes to my mind is to make sure all parents return
> > > > error if userland tries to do it, and then fallback in userland.
> > >
> > > Yes.
> > >
> > > > I'm
> > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > has a coherent behavior. Do they return error? Or return success
> > > > without changing address / vq state?
> > >
> > > We probably don't need to worry too much here, as e.g set_vq_address
> > > could fail even without suspend (just at uAPI level).
> > >
> >
> > I don't get this, sorry. I rephrased my point with an example earlier
> > in the mail.
> >
>


2023-12-21 14:38:26

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <[email protected]> wrote:
> >
> > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > <[email protected]> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > > > >
> > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > advertises this support as a backend features.
> > > > > > >
> > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > >
> > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > >
> > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > without a new feature.
> > > > > >
> > > > >
> > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > change properties + resume?
> > > >
> > > > As discussed, it looks to me the only device that supports suspend is
> > > > simulator and it supports change properties.
> > > >
> > > > E.g:
> > > >
> > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > u64 desc_area, u64 driver_area,
> > > > u64 device_area)
> > > > {
> > > > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > >
> > > > vq->desc_addr = desc_area;
> > > > vq->driver_addr = driver_area;
> > > > vq->device_addr = device_area;
> > > >
> > > > return 0;
> > > > }
> > > >
> > >
> > > So in the current kernel master it is valid to set a different vq
> > > address while the device is suspended in vdpa_sim. But it is not valid
> > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > correct me if I'm wrong). Both of them return success.
> > >
> > In the current state, there is no resume. HW Virtqueues will just get re-created
> > with the new address.
> >
>
> Oh, then all of this is effectively transparent to the userspace
> except for the time it takes?
>
Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
representation. Only later will it will call into the FW to update the FW. Later
means:
- On DRIVER_OK state, when the VQs get created.
- On .set_map when the VQs get re-created (before this series) / updated (after
this series)
- On .resume (after this series).

So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
suspended those addresses will be set later for later.

> In that case you're right, we don't need feature flags. But I think it
> would be great to also move the error return in case userspace tries
> to modify vq parameters out of suspend state.
>
On the driver side or on the core side?

Thanks
> Thanks!
>
>
> > > How can we know in the destination QEMU if it is valid to suspend &
> > > set address? Should we handle this as a bugfix and backport the
> > > change?
> > >
> > > > >
> > > > > The only way that comes to my mind is to make sure all parents return
> > > > > error if userland tries to do it, and then fallback in userland.
> > > >
> > > > Yes.
> > > >
> > > > > I'm
> > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > has a coherent behavior. Do they return error? Or return success
> > > > > without changing address / vq state?
> > > >
> > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > could fail even without suspend (just at uAPI level).
> > > >
> > >
> > > I don't get this, sorry. I rephrased my point with an example earlier
> > > in the mail.
> > >
> >
>

2023-12-21 14:57:32

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <[email protected]> wrote:
>
> On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <[email protected]> wrote:
> > >
> > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > advertises this support as a backend features.
> > > > > > > >
> > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > >
> > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > >
> > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > without a new feature.
> > > > > > >
> > > > > >
> > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > change properties + resume?
> > > > >
> > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > simulator and it supports change properties.
> > > > >
> > > > > E.g:
> > > > >
> > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > u64 desc_area, u64 driver_area,
> > > > > u64 device_area)
> > > > > {
> > > > > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > >
> > > > > vq->desc_addr = desc_area;
> > > > > vq->driver_addr = driver_area;
> > > > > vq->device_addr = device_area;
> > > > >
> > > > > return 0;
> > > > > }
> > > > >
> > > >
> > > > So in the current kernel master it is valid to set a different vq
> > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > correct me if I'm wrong). Both of them return success.
> > > >
> > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > with the new address.
> > >
> >
> > Oh, then all of this is effectively transparent to the userspace
> > except for the time it takes?
> >
> Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> representation. Only later will it will call into the FW to update the FW. Later
> means:
> - On DRIVER_OK state, when the VQs get created.
> - On .set_map when the VQs get re-created (before this series) / updated (after
> this series)
> - On .resume (after this series).
>
> So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> suspended those addresses will be set later for later.
>

Ouch, that is more in the line of my thoughts :(.

> > In that case you're right, we don't need feature flags. But I think it
> > would be great to also move the error return in case userspace tries
> > to modify vq parameters out of suspend state.
> >
> On the driver side or on the core side?
>

Core side.

It does not have to be part of this series, I meant it can be proposed
in a separate series and applied before the parent driver one.

> Thanks
> > Thanks!
> >
> >
> > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > set address? Should we handle this as a bugfix and backport the
> > > > change?
> > > >
> > > > > >
> > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > error if userland tries to do it, and then fallback in userland.
> > > > >
> > > > > Yes.
> > > > >
> > > > > > I'm
> > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > without changing address / vq state?
> > > > >
> > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > could fail even without suspend (just at uAPI level).
> > > > >
> > > >
> > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > in the mail.
> > > >
> > >
> >
>


2023-12-21 15:07:38

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, 2023-12-21 at 15:55 +0100, Eugenio Perez Martin wrote:
> On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <[email protected]> wrote:
> >
> > On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <[email protected]> wrote:
> > > >
> > > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > > <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > > advertises this support as a backend features.
> > > > > > > > >
> > > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > > >
> > > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > > >
> > > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > > without a new feature.
> > > > > > > >
> > > > > > >
> > > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > > change properties + resume?
> > > > > >
> > > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > > simulator and it supports change properties.
> > > > > >
> > > > > > E.g:
> > > > > >
> > > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > > u64 desc_area, u64 driver_area,
> > > > > > u64 device_area)
> > > > > > {
> > > > > > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > > >
> > > > > > vq->desc_addr = desc_area;
> > > > > > vq->driver_addr = driver_area;
> > > > > > vq->device_addr = device_area;
> > > > > >
> > > > > > return 0;
> > > > > > }
> > > > > >
> > > > >
> > > > > So in the current kernel master it is valid to set a different vq
> > > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > > correct me if I'm wrong). Both of them return success.
> > > > >
> > > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > > with the new address.
> > > >
> > >
> > > Oh, then all of this is effectively transparent to the userspace
> > > except for the time it takes?
> > >
> > Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> > representation. Only later will it will call into the FW to update the FW. Later
> > means:
> > - On DRIVER_OK state, when the VQs get created.
> > - On .set_map when the VQs get re-created (before this series) / updated (after
> > this series)
> > - On .resume (after this series).
> >
> > So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> > suspended those addresses will be set later for later.
> >
>
> Ouch, that is more in the line of my thoughts :(.
>
> > > In that case you're right, we don't need feature flags. But I think it
> > > would be great to also move the error return in case userspace tries
> > > to modify vq parameters out of suspend state.
> > >
> > On the driver side or on the core side?
> >
>
> Core side.
>
Checking my understanding: instead of the feature flags there would be a check
(for .set_vq_addr and .set_vq_state) to return an error if they are called under
DRIVER_OK and not suspended state?

> It does not have to be part of this series, I meant it can be proposed
> in a separate series and applied before the parent driver one.
>
> > Thanks
> > > Thanks!
> > >
> > >
> > > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > > set address? Should we handle this as a bugfix and backport the
> > > > > change?
> > > > >
> > > > > > >
> > > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > > error if userland tries to do it, and then fallback in userland.
> > > > > >
> > > > > > Yes.
> > > > > >
> > > > > > > I'm
> > > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > > without changing address / vq state?
> > > > > >
> > > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > > could fail even without suspend (just at uAPI level).
> > > > > >
> > > > >
> > > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > > in the mail.
> > > > >
> > > >
> > >
> >
>

2023-12-22 02:50:52

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, Dec 21, 2023 at 3:47 PM Eugenio Perez Martin
<[email protected]> wrote:
>
> On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> >
> > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > <[email protected]> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > >
> > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > advertises this support as a backend features.
> > > > >
> > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > >
> > > > > So I wonder if it's better to just allow such behaviour?
> > > >
> > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > without a new feature.
> > > >
> > >
> > > That would be ideal, but how do userland checks if it can suspend +
> > > change properties + resume?
> >
> > As discussed, it looks to me the only device that supports suspend is
> > simulator and it supports change properties.
> >
> > E.g:
> >
> > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > u64 desc_area, u64 driver_area,
> > u64 device_area)
> > {
> > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> >
> > vq->desc_addr = desc_area;
> > vq->driver_addr = driver_area;
> > vq->device_addr = device_area;
> >
> > return 0;
> > }
> >
>
> So in the current kernel master it is valid to set a different vq
> address while the device is suspended in vdpa_sim. But it is not valid
> in mlx5, as the FW will not be updated in resume (Dragos, please
> correct me if I'm wrong). Both of them return success.
>
> How can we know in the destination QEMU if it is valid to suspend &
> set address? Should we handle this as a bugfix and backport the
> change?

Good point.

We probably need to do backport, this seems to be the easiest way.
Theoretically, userspace may assume this behavior (though I don't
think there would be a user that depends on the simulator).

>
> > >
> > > The only way that comes to my mind is to make sure all parents return
> > > error if userland tries to do it, and then fallback in userland.
> >
> > Yes.
> >
> > > I'm
> > > ok with that, but I'm not sure if the current master & previous kernel
> > > has a coherent behavior. Do they return error? Or return success
> > > without changing address / vq state?
> >
> > We probably don't need to worry too much here, as e.g set_vq_address
> > could fail even without suspend (just at uAPI level).
> >
>
> I don't get this, sorry. I rephrased my point with an example earlier
> in the mail.

I mean currently, VHOST_SET_VRING_ADDR can fail. So userspace should
not assume it will always succeed.

Thanks

>


2023-12-22 07:30:57

by Eugenio Perez Martin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, Dec 21, 2023 at 4:07 PM Dragos Tatulea <[email protected]> wrote:
>
> On Thu, 2023-12-21 at 15:55 +0100, Eugenio Perez Martin wrote:
> > On Thu, Dec 21, 2023 at 3:38 PM Dragos Tatulea <[email protected]> wrote:
> > >
> > > On Thu, 2023-12-21 at 13:08 +0100, Eugenio Perez Martin wrote:
> > > > On Thu, Dec 21, 2023 at 12:52 PM Dragos Tatulea <[email protected]> wrote:
> > > > >
> > > > > On Thu, 2023-12-21 at 08:46 +0100, Eugenio Perez Martin wrote:
> > > > > > On Thu, Dec 21, 2023 at 3:03 AM Jason Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 9:32 PM Eugenio Perez Martin
> > > > > > > <[email protected]> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 5:06 AM Jason Wang <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 11:46 AM Jason Wang <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > The virtio spec doesn't allow changing virtqueue addresses after
> > > > > > > > > > > DRIVER_OK. Some devices do support this operation when the device is
> > > > > > > > > > > suspended. The VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > > > > > > > > > > advertises this support as a backend features.
> > > > > > > > > >
> > > > > > > > > > There's an ongoing effort in virtio spec to introduce the suspend state.
> > > > > > > > > >
> > > > > > > > > > So I wonder if it's better to just allow such behaviour?
> > > > > > > > >
> > > > > > > > > Actually I mean, allow drivers to modify the parameters during suspend
> > > > > > > > > without a new feature.
> > > > > > > > >
> > > > > > > >
> > > > > > > > That would be ideal, but how do userland checks if it can suspend +
> > > > > > > > change properties + resume?
> > > > > > >
> > > > > > > As discussed, it looks to me the only device that supports suspend is
> > > > > > > simulator and it supports change properties.
> > > > > > >
> > > > > > > E.g:
> > > > > > >
> > > > > > > static int vdpasim_set_vq_address(struct vdpa_device *vdpa, u16 idx,
> > > > > > > u64 desc_area, u64 driver_area,
> > > > > > > u64 device_area)
> > > > > > > {
> > > > > > > struct vdpasim *vdpasim = vdpa_to_sim(vdpa);
> > > > > > > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > > > > > >
> > > > > > > vq->desc_addr = desc_area;
> > > > > > > vq->driver_addr = driver_area;
> > > > > > > vq->device_addr = device_area;
> > > > > > >
> > > > > > > return 0;
> > > > > > > }
> > > > > > >
> > > > > >
> > > > > > So in the current kernel master it is valid to set a different vq
> > > > > > address while the device is suspended in vdpa_sim. But it is not valid
> > > > > > in mlx5, as the FW will not be updated in resume (Dragos, please
> > > > > > correct me if I'm wrong). Both of them return success.
> > > > > >
> > > > > In the current state, there is no resume. HW Virtqueues will just get re-created
> > > > > with the new address.
> > > > >
> > > >
> > > > Oh, then all of this is effectively transparent to the userspace
> > > > except for the time it takes?
> > > >
> > > Not quite: mlx5_vdpa_set_vq_address will save the vq address only on the SW vq
> > > representation. Only later will it will call into the FW to update the FW. Later
> > > means:
> > > - On DRIVER_OK state, when the VQs get created.
> > > - On .set_map when the VQs get re-created (before this series) / updated (after
> > > this series)
> > > - On .resume (after this series).
> > >
> > > So if the .set_vq_address is called when the VQ is in DRIVER_OK but not
> > > suspended those addresses will be set later for later.
> > >
> >
> > Ouch, that is more in the line of my thoughts :(.
> >
> > > > In that case you're right, we don't need feature flags. But I think it
> > > > would be great to also move the error return in case userspace tries
> > > > to modify vq parameters out of suspend state.
> > > >
> > > On the driver side or on the core side?
> > >
> >
> > Core side.
> >
> Checking my understanding: instead of the feature flags there would be a check
> (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> DRIVER_OK and not suspended state?
>

Yes, correct. Per Jason's message, it should be enough with two
independent series:
* Patches 6, 7 and 8 of this series, just checking for suspend state
and not feature flags.
* Your v2.

Thanks!

> > It does not have to be part of this series, I meant it can be proposed
> > in a separate series and applied before the parent driver one.
> >
> > > Thanks
> > > > Thanks!
> > > >
> > > >
> > > > > > How can we know in the destination QEMU if it is valid to suspend &
> > > > > > set address? Should we handle this as a bugfix and backport the
> > > > > > change?
> > > > > >
> > > > > > > >
> > > > > > > > The only way that comes to my mind is to make sure all parents return
> > > > > > > > error if userland tries to do it, and then fallback in userland.
> > > > > > >
> > > > > > > Yes.
> > > > > > >
> > > > > > > > I'm
> > > > > > > > ok with that, but I'm not sure if the current master & previous kernel
> > > > > > > > has a coherent behavior. Do they return error? Or return success
> > > > > > > > without changing address / vq state?
> > > > > > >
> > > > > > > We probably don't need to worry too much here, as e.g set_vq_address
> > > > > > > could fail even without suspend (just at uAPI level).
> > > > > > >
> > > > > >
> > > > > > I don't get this, sorry. I rephrased my point with an example earlier
> > > > > > in the mail.
> > > > > >
> > > > >
> > > >
> > >
> >
>


2023-12-22 08:29:28

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > In that case you're right, we don't need feature flags. But I think it
> > > > would be great to also move the error return in case userspace tries
> > > > to modify vq parameters out of suspend state.
> > > >
> > > On the driver side or on the core side?
> > >
> >
> > Core side.
> >
> Checking my understanding:?instead of the feature flags there would be a check
> (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> DRIVER_OK and not suspended state?

Yea this looks much saner, if we start adding feature flags for
each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
feature bits which is not reasonable.

--
MST


2023-12-22 10:51:28

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Fri, 2023-12-22 at 03:29 -0500, Michael S. Tsirkin wrote:
> On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > > In that case you're right, we don't need feature flags. But I think it
> > > > > would be great to also move the error return in case userspace tries
> > > > > to modify vq parameters out of suspend state.
> > > > >
> > > > On the driver side or on the core side?
> > > >
> > >
> > > Core side.
> > >
> > Checking my understanding: instead of the feature flags there would be a check
> > (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> > DRIVER_OK and not suspended state?
>
> Yea this looks much saner, if we start adding feature flags for
> each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
> feature bits which is not reasonable.
>
Ack. Is the v2 enough or should I respin a v5 with the updated Acked-by tags?

I will prepare the core part as a different series without the flags.

Thanks,
Dragos

2023-12-22 11:23:01

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state

On Wed, 2023-12-20 at 13:55 +0100, Dragos Tatulea wrote:
> On Wed, 2023-12-20 at 11:46 +0800, Jason Wang wrote:
> > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > >
> > > Set vdpa device suspended state on successful suspend. Clear it on
> > > successful resume and reset.
> > >
> > > The state will be locked by the vhost_vdpa mutex. The mutex is taken
> > > during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> > > exception is vhost_vdpa_open which does a device reset but that should
> > > be safe because it can only happen before the other ops.
> > >
> > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > Suggested-by: Eugenio Pérez <[email protected]>
> > > ---
> > > drivers/vhost/vdpa.c | 17 +++++++++++++++--
> > > 1 file changed, 15 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > index b4e8ddf86485..00b4fa8e89f2 100644
> > > --- a/drivers/vhost/vdpa.c
> > > +++ b/drivers/vhost/vdpa.c
> > > @@ -59,6 +59,7 @@ struct vhost_vdpa {
> > > int in_batch;
> > > struct vdpa_iova_range range;
> > > u32 batch_asid;
> > > + bool suspended;
> >
> > Any reason why we don't do it in the core vDPA device but here?
> >
> Not really. I wanted to be safe and not expose it in a header due to locking.
>
A few clearer answers for why the state is not added in struct vdpa_device:
- All the suspend infrastructure is currently only for vhost.
- If the state would be moved to struct vdpa_device then the cf_lock would have
to be used. This adds more complexity to the code.

Thanks,
Dragos

2023-12-25 04:11:52

by Jason Wang

[permalink] [raw]
Subject: Re: [PATCH vhost v4 06/15] vdpa: Track device suspended state

On Fri, Dec 22, 2023 at 7:22 PM Dragos Tatulea <[email protected]> wrote:
>
> On Wed, 2023-12-20 at 13:55 +0100, Dragos Tatulea wrote:
> > On Wed, 2023-12-20 at 11:46 +0800, Jason Wang wrote:
> > > On Wed, Dec 20, 2023 at 2:09 AM Dragos Tatulea <[email protected]> wrote:
> > > >
> > > > Set vdpa device suspended state on successful suspend. Clear it on
> > > > successful resume and reset.
> > > >
> > > > The state will be locked by the vhost_vdpa mutex. The mutex is taken
> > > > during suspend, resume and reset in vhost_vdpa_unlocked_ioctl. The
> > > > exception is vhost_vdpa_open which does a device reset but that should
> > > > be safe because it can only happen before the other ops.
> > > >
> > > > Signed-off-by: Dragos Tatulea <[email protected]>
> > > > Suggested-by: Eugenio Pérez <[email protected]>
> > > > ---
> > > > drivers/vhost/vdpa.c | 17 +++++++++++++++--
> > > > 1 file changed, 15 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
> > > > index b4e8ddf86485..00b4fa8e89f2 100644
> > > > --- a/drivers/vhost/vdpa.c
> > > > +++ b/drivers/vhost/vdpa.c
> > > > @@ -59,6 +59,7 @@ struct vhost_vdpa {
> > > > int in_batch;
> > > > struct vdpa_iova_range range;
> > > > u32 batch_asid;
> > > > + bool suspended;
> > >
> > > Any reason why we don't do it in the core vDPA device but here?
> > >
> > Not really. I wanted to be safe and not expose it in a header due to locking.
> >
> A few clearer answers for why the state is not added in struct vdpa_device:
> - All the suspend infrastructure is currently only for vhost.
> - If the state would be moved to struct vdpa_device then the cf_lock would have
> to be used. This adds more complexity to the code.
>
> Thanks,
> Dragos

Ok, I'm fine with that.

Thanks


2023-12-25 13:46:07

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 02/15] vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag

On Fri, 2023-12-22 at 11:51 +0100, Dragos Tatulea wrote:
> On Fri, 2023-12-22 at 03:29 -0500, Michael S. Tsirkin wrote:
> > On Thu, Dec 21, 2023 at 03:07:22PM +0000, Dragos Tatulea wrote:
> > > > > > In that case you're right, we don't need feature flags. But I think it
> > > > > > would be great to also move the error return in case userspace tries
> > > > > > to modify vq parameters out of suspend state.
> > > > > >
> > > > > On the driver side or on the core side?
> > > > >
> > > >
> > > > Core side.
> > > >
> > > Checking my understanding: instead of the feature flags there would be a check
> > > (for .set_vq_addr and .set_vq_state) to return an error if they are called under
> > > DRIVER_OK and not suspended state?
> >
> > Yea this looks much saner, if we start adding feature flags for
> > each OPERATION_X_LEGAL_IN_STATE_Y then we will end up with N^2
> > feature bits which is not reasonable.
> >
> Ack. Is the v2 enough or should I respin a v5 with the updated Acked-by tags?
>
> I will prepare the core part as a different series without the flags.
>
Core part sent:
https://lore.kernel.org/virtualization/[email protected]/T/#t

I also have a v2 respin with extra Acked-by tags if necessary as a v5. Just let
me know if it is needed.

Thanks,
Dragos

2023-12-25 14:41:22

by Michael S. Tsirkin

[permalink] [raw]
Subject: Re: [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs

On Tue, Dec 19, 2023 at 08:08:43PM +0200, Dragos Tatulea wrote:
> Add support for resumable vqs in the mlx5_vdpa driver. This is a
> firmware feature that can be used for the following benefits:
> - Full device .suspend/.resume.
> - .set_map doesn't need to destroy and create new vqs anymore just to
> update the map. When resumable vqs are supported it is enough to
> suspend the vqs, set the new maps, and then resume the vqs.
>
> The first patch exposes relevant bits for the feature in mlx5_ifc.h.
> That means it needs to be applied to the mlx5-vhost tree [0] first. Once
> applied there, the change has to be pulled from mlx5-vhost into the
> vhost tree and only then the remaining patches can be applied. Same flow
> as the vq descriptor mappings patchset [1].
>
> The second part implements the vdpa backend feature support to allow
> vq state and address changes when the device is in DRIVER_OK state and
> suspended.
>
> The third part adds support for seletively modifying vq parameters. This
> is needed to be able to use resumable vqs.
>
> Then the actual support for resumable vqs is added.
>
> The last part of the series introduces reference counting for mrs which
> is necessary to avoid freeing mkeys too early or leaking them.


I lost track. Are you going to send v5 or not?

> * Changes in v4:
> - Added vdpa backend feature support for changing vq properties in
> DRIVER_OK when device is suspended. Added support in the driver as
> well.
> - Dropped Acked-by for the patches that had the tag mistakenly
> added.
>
> * Changes in v3:
> - Faulty version. Please ignore.
>
> * Changes in v2:
> - Added mr refcounting patches.
> - Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
> unlocked variants"
> - Small print improvement in "Introduce per vq and device resume"
> patch.
> - Patch 1/7 has been applied to mlx5-vhost branch.
>
>
> Dragos Tatulea (15):
> vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
> vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend
> feature
> vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend
> feature
> vdpa: Track device suspended state
> vdpa: Block vq address change in DRIVER_OK unless device supports it
> vdpa: Block vq state change in DRIVER_OK unless device supports it
> vdpa/mlx5: Expose resumable vq capability
> vdpa/mlx5: Allow modifying multiple vq fields in one modify command
> vdpa/mlx5: Introduce per vq and device resume
> vdpa/mlx5: Mark vq addrs for modification in hw vq
> vdpa/mlx5: Mark vq state for modification in hw vq
> vdpa/mlx5: Use vq suspend/resume during .set_map
> vdpa/mlx5: Introduce reference counting to mrs
> vdpa/mlx5: Add mkey leak detection
>
> drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 +-
> drivers/vdpa/mlx5/core/mr.c | 69 +++++++--
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 218 ++++++++++++++++++++++++++---
> drivers/vhost/vdpa.c | 51 ++++++-
> include/linux/mlx5/mlx5_ifc.h | 3 +-
> include/linux/mlx5/mlx5_ifc_vdpa.h | 4 +
> include/uapi/linux/vhost_types.h | 8 ++
> 7 files changed, 322 insertions(+), 41 deletions(-)
>
> --
> 2.43.0


2023-12-25 15:05:47

by Dragos Tatulea

[permalink] [raw]
Subject: Re: [PATCH vhost v4 00/15] vdpa/mlx5: Add support for resumable vqs

On Mon, 2023-12-25 at 09:41 -0500, Michael S. Tsirkin wrote:
> On Tue, Dec 19, 2023 at 08:08:43PM +0200, Dragos Tatulea wrote:
> > Add support for resumable vqs in the mlx5_vdpa driver. This is a
> > firmware feature that can be used for the following benefits:
> > - Full device .suspend/.resume.
> > - .set_map doesn't need to destroy and create new vqs anymore just to
> > update the map. When resumable vqs are supported it is enough to
> > suspend the vqs, set the new maps, and then resume the vqs.
> >
> > The first patch exposes relevant bits for the feature in mlx5_ifc.h.
> > That means it needs to be applied to the mlx5-vhost tree [0] first. Once
> > applied there, the change has to be pulled from mlx5-vhost into the
> > vhost tree and only then the remaining patches can be applied. Same flow
> > as the vq descriptor mappings patchset [1].
> >
> > The second part implements the vdpa backend feature support to allow
> > vq state and address changes when the device is in DRIVER_OK state and
> > suspended.
> >
> > The third part adds support for seletively modifying vq parameters. This
> > is needed to be able to use resumable vqs.
> >
> > Then the actual support for resumable vqs is added.
> >
> > The last part of the series introduces reference counting for mrs which
> > is necessary to avoid freeing mkeys too early or leaking them.
>
>
> I lost track. Are you going to send v5 or not?
>
I was waiting for your answer if I should send it or not. I suppose this means
yes. I will send it in a few minutes.

> > * Changes in v4:
> > - Added vdpa backend feature support for changing vq properties in
> > DRIVER_OK when device is suspended. Added support in the driver as
> > well.
> > - Dropped Acked-by for the patches that had the tag mistakenly
> > added.
> >
> > * Changes in v3:
> > - Faulty version. Please ignore.
> >
> > * Changes in v2:
> > - Added mr refcounting patches.
> > - Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
> > unlocked variants"
> > - Small print improvement in "Introduce per vq and device resume"
> > patch.
> > - Patch 1/7 has been applied to mlx5-vhost branch.
> >
> >
> > Dragos Tatulea (15):
> > vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND flag
> > vdpa: Add VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND flag
> > vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_ADDR_IN_SUSPEND backend
> > feature
> > vdpa: Accept VHOST_BACKEND_F_CHANGEABLE_VQ_STATE_IN_SUSPEND backend
> > feature
> > vdpa: Track device suspended state
> > vdpa: Block vq address change in DRIVER_OK unless device supports it
> > vdpa: Block vq state change in DRIVER_OK unless device supports it
> > vdpa/mlx5: Expose resumable vq capability
> > vdpa/mlx5: Allow modifying multiple vq fields in one modify command
> > vdpa/mlx5: Introduce per vq and device resume
> > vdpa/mlx5: Mark vq addrs for modification in hw vq
> > vdpa/mlx5: Mark vq state for modification in hw vq
> > vdpa/mlx5: Use vq suspend/resume during .set_map
> > vdpa/mlx5: Introduce reference counting to mrs
> > vdpa/mlx5: Add mkey leak detection
> >
> > drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 +-
> > drivers/vdpa/mlx5/core/mr.c | 69 +++++++--
> > drivers/vdpa/mlx5/net/mlx5_vnet.c | 218 ++++++++++++++++++++++++++---
> > drivers/vhost/vdpa.c | 51 ++++++-
> > include/linux/mlx5/mlx5_ifc.h | 3 +-
> > include/linux/mlx5/mlx5_ifc_vdpa.h | 4 +
> > include/uapi/linux/vhost_types.h | 8 ++
> > 7 files changed, 322 insertions(+), 41 deletions(-)
> >
> > --
> > 2.43.0
>