2023-12-15 14:05:04

by Dragos Tatulea

Subject: [PATCH vhost v3 0/6] vdpa/mlx5: Add support for resumable vqs

Add support for resumable vqs in the driver. This is a firmware feature
that enables the following benefits:
- Full device .suspend/.resume.
- .set_map no longer needs to destroy and create new vqs just to update
the map. When resumable vqs are supported, it is enough to suspend the
vqs, set the new maps, and then resume the vqs.
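
For illustration, the .set_map fast path then boils down to roughly the
following when resumable vqs are supported (a sketch condensed from patch
4/6; error handling omitted):

  suspend_vqs(ndev);
  mlx5_vdpa_update_mr(mvdev, new_mr, asid);
  /* Flag the new mkeys so the next modify command picks them up. */
  for (int i = 0; i < ndev->cur_num_vqs; i++)
          ndev->vqs[i].modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
                                          MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
  resume_vqs(ndev);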

The first patch exposes the relevant bits in mlx5_ifc.h. That means it
needs to be applied to the mlx5-vhost tree [0] first. Once applied
there, the change has to be pulled from mlx5-vhost into the vhost tree
and only then the remaining patches can be applied. Same flow as the vq
descriptor mappings patchset [1].

The second part adds support for resumable vqs in the form of a device
.resume operation, and also uses them in the .set_map call
(suspending/resuming the vqs instead of re-creating them with new
mappings).

The last part of the series introduces reference counting for mrs, which
is necessary to avoid freeing mkeys too early or leaking them.

* Changes in v3:
- Dropped patches that allowed vq modification of state and addresses
when state is DRIVER_OK. This is not allowed by the standard.
Should be re-added under a vdpa feature flag.

* Changes in v2:
- Added mr refcounting patches.
- Deleted unnecessary patch: "vdpa/mlx5: Split function into locked and
unlocked variants"
- Small print improvement in "Introduce per vq and device resume"
patch.
- Patch 1/7 has been applied to mlx5-vhost branch.

[0] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/log/?h=mlx5-vhost
[1] https://lore.kernel.org/virtualization/[email protected]/


Dragos Tatulea (6):
vdpa/mlx5: Expose resumable vq capability
vdpa/mlx5: Allow modifying multiple vq fields in one modify command
vdpa/mlx5: Introduce per vq and device resume
vdpa/mlx5: Use vq suspend/resume during .set_map
vdpa/mlx5: Introduce reference counting to mrs
vdpa/mlx5: Add mkey leak detection

drivers/vdpa/mlx5/core/mlx5_vdpa.h | 10 +-
drivers/vdpa/mlx5/core/mr.c | 69 +++++++---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 194 +++++++++++++++++++++++++----
include/linux/mlx5/mlx5_ifc.h | 3 +-
include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
5 files changed, 239 insertions(+), 38 deletions(-)

--
2.43.0



2023-12-15 14:05:47

by Dragos Tatulea

Subject: [PATCH vhost v3 2/6] vdpa/mlx5: Allow modifying multiple vq fields in one modify command

Add a bitmask variable that tracks which hw vq fields should be modified
by the next hw vq modify command.

This will be useful for setting multiple vq fields when resuming the vq.
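
For example, a caller can accumulate several field changes and flush them
with a single firmware command. A minimal sketch, equivalent to the
modify_virtqueue_state() helper introduced below:

  mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
  /* ...set more MLX5_VIRTQ_MODIFY_MASK_* bits as needed... */
  err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);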

Reviewed-by: Gal Pressman <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +++++++++++++++++++++++++------
1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 26ba7da6b410..1e08a8805640 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -120,6 +120,9 @@ struct mlx5_vdpa_virtqueue {
u16 avail_idx;
u16 used_idx;
int fw_state;
+
+ u64 modified_fields;
+
struct msi_map map;

/* keep last in the struct */
@@ -1181,7 +1184,19 @@ static bool is_valid_state_change(int oldstate, int newstate)
}
}

-static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
+static bool modifiable_virtqueue_fields(struct mlx5_vdpa_virtqueue *mvq)
+{
+ /* Only state is always modifiable */
+ if (mvq->modified_fields & ~MLX5_VIRTQ_MODIFY_MASK_STATE)
+ return mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT ||
+ mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
+
+ return true;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ int state)
{
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
@@ -1193,6 +1208,9 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_NONE)
return 0;

+ if (!modifiable_virtqueue_fields(mvq))
+ return -EINVAL;
+
if (!is_valid_state_change(mvq->fw_state, state))
return -EINVAL;

@@ -1208,17 +1226,28 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
- MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
- MLX5_VIRTQ_MODIFY_MASK_STATE);
- MLX5_SET(virtio_net_q_object, obj_context, state, state);
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+ MLX5_SET(virtio_net_q_object, obj_context, state, state);
+
+ MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
kfree(in);
if (!err)
mvq->fw_state = state;

+ mvq->modified_fields = 0;
+
return err;
}

+static int modify_virtqueue_state(struct mlx5_vdpa_net *ndev,
+ struct mlx5_vdpa_virtqueue *mvq,
+ unsigned int state)
+{
+ mvq->modified_fields |= MLX5_VIRTQ_MODIFY_MASK_STATE;
+ return modify_virtqueue(ndev, mvq, state);
+}
+
static int counter_set_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
u32 in[MLX5_ST_SZ_DW(create_virtio_q_counters_in)] = {};
@@ -1347,7 +1376,7 @@ static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
goto err_vq;

if (mvq->ready) {
- err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
if (err) {
mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
idx, err);
@@ -1382,7 +1411,7 @@ static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
return;

- if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+ if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");

if (query_virtqueue(ndev, mvq, &attr)) {
@@ -1407,6 +1436,7 @@ static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *
return;

suspend_vq(ndev, mvq);
+ mvq->modified_fields = 0;
destroy_virtqueue(ndev, mvq);
dealloc_vector(ndev, mvq);
counter_set_dealloc(ndev, mvq);
@@ -2207,7 +2237,7 @@ static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready
if (!ready) {
suspend_vq(ndev, mvq);
} else {
- err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ err = modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
if (err) {
mlx5_vdpa_warn(mvdev, "modify VQ %d to ready failed (%d)\n", idx, err);
ready = false;
@@ -2804,8 +2834,10 @@ static void clear_vqs_ready(struct mlx5_vdpa_net *ndev)
{
int i;

- for (i = 0; i < ndev->mvdev.max_vqs; i++)
+ for (i = 0; i < ndev->mvdev.max_vqs; i++) {
ndev->vqs[i].ready = false;
+ ndev->vqs[i].modified_fields = 0;
+ }

ndev->mvdev.cvq.ready = false;
}
--
2.43.0


2023-12-15 14:06:15

by Dragos Tatulea

Subject: [PATCH vhost v3 3/6] vdpa/mlx5: Introduce per vq and device resume

Implement the vdpa vq and device .resume operations when the firmware
capability is detected. Add support for the suspend -> ready state
change.
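
The resulting firmware vq state machine, summarized from
is_valid_state_change() below (the ERR state accepts no transitions):

  INIT    -> RDY
  RDY     -> SUSPEND
  SUSPEND -> RDY    (only when the device is resumable)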

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 69 +++++++++++++++++++++++++++----
1 file changed, 62 insertions(+), 7 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 1e08a8805640..f8f088cced50 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1170,7 +1170,12 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
return err;
}

-static bool is_valid_state_change(int oldstate, int newstate)
+static bool is_resumable(struct mlx5_vdpa_net *ndev)
+{
+ return ndev->mvdev.vdev.config->resume;
+}
+
+static bool is_valid_state_change(int oldstate, int newstate, bool resumable)
{
switch (oldstate) {
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_INIT:
@@ -1178,6 +1183,7 @@ static bool is_valid_state_change(int oldstate, int newstate)
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY:
return newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND:
+ return resumable ? newstate == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY : false;
case MLX5_VIRTIO_NET_Q_OBJECT_STATE_ERR:
default:
return false;
@@ -1200,6 +1206,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
{
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+ bool state_change = false;
void *obj_context;
void *cmd_hdr;
void *in;
@@ -1211,9 +1218,6 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (!modifiable_virtqueue_fields(mvq))
return -EINVAL;

- if (!is_valid_state_change(mvq->fw_state, state))
- return -EINVAL;
-
in = kzalloc(inlen, GFP_KERNEL);
if (!in)
return -ENOMEM;
@@ -1226,17 +1230,29 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
- if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE)
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
+ if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
+ err = -EINVAL;
+ goto done;
+ }
+
MLX5_SET(virtio_net_q_object, obj_context, state, state);
+ state_change = true;
+ }

MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
- kfree(in);
- if (!err)
+ if (err)
+ goto done;
+
+ if (state_change)
mvq->fw_state = state;

mvq->modified_fields = 0;

+done:
+ kfree(in);
return err;
}

@@ -1430,6 +1446,24 @@ static void suspend_vqs(struct mlx5_vdpa_net *ndev)
suspend_vq(ndev, &ndev->vqs[i]);
}

+static void resume_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ if (!mvq->initialized || !is_resumable(ndev))
+ return;
+
+ if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND)
+ return;
+
+ if (modify_virtqueue_state(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY))
+ mlx5_vdpa_warn(&ndev->mvdev, "modify to resume failed for vq %u\n", mvq->index);
+}
+
+static void resume_vqs(struct mlx5_vdpa_net *ndev)
+{
+ for (int i = 0; i < ndev->mvdev.max_vqs; i++)
+ resume_vq(ndev, &ndev->vqs[i]);
+}
+
static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
{
if (!mvq->initialized)
@@ -3261,6 +3295,23 @@ static int mlx5_vdpa_suspend(struct vdpa_device *vdev)
return 0;
}

+static int mlx5_vdpa_resume(struct vdpa_device *vdev)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev;
+
+ ndev = to_mlx5_vdpa_ndev(mvdev);
+
+ mlx5_vdpa_info(mvdev, "resuming device\n");
+
+ down_write(&ndev->reslock);
+ mvdev->suspended = false;
+ resume_vqs(ndev);
+ register_link_notifier(ndev);
+ up_write(&ndev->reslock);
+ return 0;
+}
+
static int mlx5_set_group_asid(struct vdpa_device *vdev, u32 group,
unsigned int asid)
{
@@ -3317,6 +3368,7 @@ static const struct vdpa_config_ops mlx5_vdpa_ops = {
.get_vq_dma_dev = mlx5_get_vq_dma_dev,
.free = mlx5_vdpa_free,
.suspend = mlx5_vdpa_suspend,
+ .resume = mlx5_vdpa_resume, /* Op disabled if not supported. */
};

static int query_mtu(struct mlx5_core_dev *mdev, u16 *mtu)
@@ -3688,6 +3740,9 @@ static int mlx5v_probe(struct auxiliary_device *adev,
if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, desc_group_mkey_supported))
mgtdev->vdpa_ops.get_vq_desc_group = NULL;

+ if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, freeze_to_rdy_supported))
+ mgtdev->vdpa_ops.resume = NULL;
+
err = vdpa_mgmtdev_register(&mgtdev->mgtdev);
if (err)
goto reg_err;
--
2.43.0


2023-12-15 14:06:20

by Dragos Tatulea

Subject: [PATCH mlx5-vhost v3 1/6] vdpa/mlx5: Expose resumable vq capability

Necessary for checking if resumable vqs are supported by the hardware.
Actual support will be added in a downstream patch.
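
For reference, the new bit is queried like the other emulation caps, as
done later in this series:

  if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, freeze_to_rdy_supported))
          mgtdev->vdpa_ops.resume = NULL;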

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
include/linux/mlx5/mlx5_ifc.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h
index 6f3631425f38..9eaceaf6bcb0 100644
--- a/include/linux/mlx5/mlx5_ifc.h
+++ b/include/linux/mlx5/mlx5_ifc.h
@@ -1236,7 +1236,8 @@ struct mlx5_ifc_virtio_emulation_cap_bits {

u8 reserved_at_c0[0x13];
u8 desc_group_mkey_supported[0x1];
- u8 reserved_at_d4[0xc];
+ u8 freeze_to_rdy_supported[0x1];
+ u8 reserved_at_d5[0xb];

u8 reserved_at_e0[0x20];

--
2.43.0


2023-12-15 14:06:36

by Dragos Tatulea

Subject: [PATCH vhost v3 5/6] vdpa/mlx5: Introduce reference counting to mrs

Deleting the old mr during mr update (.set_map) and then modifying the
vqs with the new mr is not a good flow for firmware. The firmware
expects that mkeys are deleted after there are no more vqs referencing
them.

Introduce reference counting for mrs to fix this. It is the only way to
make sure that mkeys are not in use by vqs.

An mr reference is taken when the mr is associated with the mr asid
table and when the mr is linked to the vq on create/modify. The
reference is released when the mkey is unlinked from the vq (through
modify/destroy) and from the mr asid table.

To make things consistent, get rid of mlx5_vdpa_destroy_mr and use
get/put semantics everywhere.
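
In other words, each user pairs a get with a put; the last put frees the
mkey. A sketch condensed from the hunks below:

  /* vq creation links the mr and takes a reference: */
  mlx5_vdpa_get_mr(mvdev, vq_mr);
  mvq->vq_mr = vq_mr;

  /* vq destruction unlinks it and drops the reference: */
  mlx5_vdpa_put_mr(&ndev->mvdev, mvq->vq_mr);
  mvq->vq_mr = NULL;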

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 8 +++--
drivers/vdpa/mlx5/core/mr.c | 50 ++++++++++++++++++++----------
drivers/vdpa/mlx5/net/mlx5_vnet.c | 45 ++++++++++++++++++++++-----
3 files changed, 78 insertions(+), 25 deletions(-)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 84547d998bcf..1a0d27b6e09a 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -35,6 +35,8 @@ struct mlx5_vdpa_mr {
struct vhost_iotlb *iotlb;

bool user_mr;
+
+ refcount_t refcount;
};

struct mlx5_vdpa_resources {
@@ -118,8 +120,10 @@ int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, u32 mkey);
struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
struct vhost_iotlb *iotlb);
void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev);
-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
- struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr);
+void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr);
void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
struct mlx5_vdpa_mr *mr,
unsigned int asid);
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 2197c46e563a..c7dc8914354a 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -498,32 +498,52 @@ static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr

static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
{
+ if (WARN_ON(!mr))
+ return;
+
if (mr->user_mr)
destroy_user_mr(mvdev, mr);
else
destroy_dma_mr(mvdev, mr);

vhost_iotlb_free(mr->iotlb);
+
+ kfree(mr);
}

-void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev,
- struct mlx5_vdpa_mr *mr)
+static void _mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
{
if (!mr)
return;

+ if (refcount_dec_and_test(&mr->refcount))
+ _mlx5_vdpa_destroy_mr(mvdev, mr);
+}
+
+void mlx5_vdpa_put_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
+{
mutex_lock(&mvdev->mr_mtx);
+ _mlx5_vdpa_put_mr(mvdev, mr);
+ mutex_unlock(&mvdev->mr_mtx);
+}

- _mlx5_vdpa_destroy_mr(mvdev, mr);
+static void _mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
+{
+ if (!mr)
+ return;

- for (int i = 0; i < MLX5_VDPA_NUM_AS; i++) {
- if (mvdev->mr[i] == mr)
- mvdev->mr[i] = NULL;
- }
+ refcount_inc(&mr->refcount);
+}

+void mlx5_vdpa_get_mr(struct mlx5_vdpa_dev *mvdev,
+ struct mlx5_vdpa_mr *mr)
+{
+ mutex_lock(&mvdev->mr_mtx);
+ _mlx5_vdpa_get_mr(mvdev, mr);
mutex_unlock(&mvdev->mr_mtx);
-
- kfree(mr);
}

void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
@@ -534,20 +554,16 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,

mutex_lock(&mvdev->mr_mtx);

+ _mlx5_vdpa_put_mr(mvdev, old_mr);
mvdev->mr[asid] = new_mr;
- if (old_mr) {
- _mlx5_vdpa_destroy_mr(mvdev, old_mr);
- kfree(old_mr);
- }

mutex_unlock(&mvdev->mr_mtx);
-
}

void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
{
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
- mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[i]);
+ mlx5_vdpa_update_mr(mvdev, NULL, i);

prune_iotlb(mvdev->cvq.iotlb);
}
@@ -607,6 +623,8 @@ struct mlx5_vdpa_mr *mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (err)
goto out_err;

+ refcount_set(&mr->refcount, 1);
+
return mr;

out_err:
@@ -651,7 +669,7 @@ int mlx5_vdpa_reset_mr(struct mlx5_vdpa_dev *mvdev, unsigned int asid)
if (asid >= MLX5_VDPA_NUM_AS)
return -EINVAL;

- mlx5_vdpa_destroy_mr(mvdev, mvdev->mr[asid]);
+ mlx5_vdpa_update_mr(mvdev, NULL, asid);

if (asid == 0 && MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
if (mlx5_vdpa_create_dma_mr(mvdev))
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index d5883c554333..0df82e4d13f4 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -123,6 +123,9 @@ struct mlx5_vdpa_virtqueue {

u64 modified_fields;

+ struct mlx5_vdpa_mr *vq_mr;
+ struct mlx5_vdpa_mr *desc_mr;
+
struct msi_map map;

/* keep last in the struct */
@@ -946,6 +949,14 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
kfree(in);
mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);

+ mlx5_vdpa_get_mr(mvdev, vq_mr);
+ mvq->vq_mr = vq_mr;
+
+ if (vq_desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported)) {
+ mlx5_vdpa_get_mr(mvdev, vq_desc_mr);
+ mvq->desc_mr = vq_desc_mr;
+ }
+
return 0;

err_cmd:
@@ -972,6 +983,12 @@ static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtq
}
mvq->fw_state = MLX5_VIRTIO_NET_Q_OBJECT_NONE;
umems_destroy(ndev, mvq);
+
+ mlx5_vdpa_put_mr(&ndev->mvdev, mvq->vq_mr);
+ mvq->vq_mr = NULL;
+
+ mlx5_vdpa_put_mr(&ndev->mvdev, mvq->desc_mr);
+ mvq->desc_mr = NULL;
}

static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
@@ -1207,6 +1224,8 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+ struct mlx5_vdpa_mr *desc_mr = NULL;
+ struct mlx5_vdpa_mr *vq_mr = NULL;
bool state_change = false;
void *obj_context;
void *cmd_hdr;
@@ -1245,19 +1264,19 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
}

if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
- struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+ vq_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];

- if (mr)
- MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+ if (vq_mr)
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, vq_mr->mkey);
else
mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
}

if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
- struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+ desc_mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];

- if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
- MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+ if (desc_mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+ MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, desc_mr->mkey);
else
mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
}
@@ -1270,6 +1289,18 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
if (state_change)
mvq->fw_state = state;

+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+ mlx5_vdpa_put_mr(mvdev, mvq->vq_mr);
+ mlx5_vdpa_get_mr(mvdev, vq_mr);
+ mvq->vq_mr = vq_mr;
+ }
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+ mlx5_vdpa_put_mr(mvdev, mvq->desc_mr);
+ mlx5_vdpa_get_mr(mvdev, desc_mr);
+ mvq->desc_mr = desc_mr;
+ }
+
mvq->modified_fields = 0;

done:
@@ -3080,7 +3111,7 @@ static int set_map_data(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
return mlx5_vdpa_update_cvq_iotlb(mvdev, iotlb, asid);

out_err:
- mlx5_vdpa_destroy_mr(mvdev, new_mr);
+ mlx5_vdpa_put_mr(mvdev, new_mr);
return err;
}

--
2.43.0


2023-12-15 14:07:18

by Dragos Tatulea

Subject: [PATCH vhost v3 4/6] vdpa/mlx5: Use vq suspend/resume during .set_map

Instead of tearing down and setting up vq resources, use vq
suspend/resume during .set_map to speed things up a bit.

The vq mr is updated with the new mapping while the vqs are suspended.

If the device doesn't support resumable vqs, do the old teardown and
setup dance.
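
The control flow in mlx5_vdpa_change_map() then becomes roughly the
following (a sketch condensed from the hunk below; error handling and the
save/restore of channel info omitted):

  bool teardown = !is_resumable(ndev);

  suspend_vqs(ndev);
  if (teardown)
          teardown_driver(ndev);           /* old path: destroy vqs */
  mlx5_vdpa_update_mr(mvdev, new_mr, asid);
  if (teardown)
          setup_driver(mvdev);             /* old path: re-create vqs */
  resume_vqs(ndev);                        /* no-op when not resumable */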

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/net/mlx5_vnet.c | 48 +++++++++++++++++++++++++-----
include/linux/mlx5/mlx5_ifc_vdpa.h | 1 +
2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index f8f088cced50..d5883c554333 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -1206,9 +1206,11 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
{
int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+ struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
bool state_change = false;
void *obj_context;
void *cmd_hdr;
+ void *vq_ctx;
void *in;
int err;

@@ -1230,6 +1232,7 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);

obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+ vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);

if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_STATE) {
if (!is_valid_state_change(mvq->fw_state, state, is_resumable(ndev))) {
@@ -1241,6 +1244,24 @@ static int modify_virtqueue(struct mlx5_vdpa_net *ndev,
state_change = true;
}

+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY) {
+ struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_GROUP]];
+
+ if (mr)
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, mr->mkey);
+ else
+ mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY;
+ }
+
+ if (mvq->modified_fields & MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY) {
+ struct mlx5_vdpa_mr *mr = mvdev->mr[mvdev->group2asid[MLX5_VDPA_DATAVQ_DESC_GROUP]];
+
+ if (mr && MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, desc_group_mkey_supported))
+ MLX5_SET(virtio_q, vq_ctx, desc_group_mkey, mr->mkey);
+ else
+ mvq->modified_fields &= ~MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+ }
+
MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select, mvq->modified_fields);
err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
if (err)
@@ -2767,24 +2788,35 @@ static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev,
unsigned int asid)
{
struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ bool teardown = !is_resumable(ndev);
int err;

suspend_vqs(ndev);
- err = save_channels_info(ndev);
- if (err)
- return err;
+ if (teardown) {
+ err = save_channels_info(ndev);
+ if (err)
+ return err;

- teardown_driver(ndev);
+ teardown_driver(ndev);
+ }

mlx5_vdpa_update_mr(mvdev, new_mr, asid);

+ for (int i = 0; i < ndev->cur_num_vqs; i++)
+ ndev->vqs[i].modified_fields |= MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY |
+ MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY;
+
if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK) || mvdev->suspended)
return 0;

- restore_channels_info(ndev);
- err = setup_driver(mvdev);
- if (err)
- return err;
+ if (teardown) {
+ restore_channels_info(ndev);
+ err = setup_driver(mvdev);
+ if (err)
+ return err;
+ }
+
+ resume_vqs(ndev);

return 0;
}
diff --git a/include/linux/mlx5/mlx5_ifc_vdpa.h b/include/linux/mlx5/mlx5_ifc_vdpa.h
index b86d51a855f6..8cbc2592d0f1 100644
--- a/include/linux/mlx5/mlx5_ifc_vdpa.h
+++ b/include/linux/mlx5/mlx5_ifc_vdpa.h
@@ -145,6 +145,7 @@ enum {
MLX5_VIRTQ_MODIFY_MASK_STATE = (u64)1 << 0,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_PARAMS = (u64)1 << 3,
MLX5_VIRTQ_MODIFY_MASK_DIRTY_BITMAP_DUMP_ENABLE = (u64)1 << 4,
+ MLX5_VIRTQ_MODIFY_MASK_VIRTIO_Q_MKEY = (u64)1 << 11,
MLX5_VIRTQ_MODIFY_MASK_DESC_GROUP_MKEY = (u64)1 << 14,
};

--
2.43.0


2023-12-15 14:07:35

by Dragos Tatulea

Subject: [PATCH vhost v3 6/6] vdpa/mlx5: Add mkey leak detection

Track allocated mrs in a list and show a warning when leaks are detected
on device free or reset.
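
A leaked mr then shows up on device free/reset as a warning along these
lines (the values here are hypothetical):

  mkey still alive after resource delete: mr: 00000000abcd1234, mkey: 0x300, refcount: 1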

Reviewed-by: Gal Pressman <[email protected]>
Acked-by: Eugenio Pérez <[email protected]>
Signed-off-by: Dragos Tatulea <[email protected]>
---
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 2 ++
drivers/vdpa/mlx5/core/mr.c | 23 +++++++++++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 2 ++
3 files changed, 27 insertions(+)

diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index 1a0d27b6e09a..50aac8fe57ef 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -37,6 +37,7 @@ struct mlx5_vdpa_mr {
bool user_mr;

refcount_t refcount;
+ struct list_head mr_list;
};

struct mlx5_vdpa_resources {
@@ -95,6 +96,7 @@ struct mlx5_vdpa_dev {
u32 generation;

struct mlx5_vdpa_mr *mr[MLX5_VDPA_NUM_AS];
+ struct list_head mr_list_head;
/* serialize mr access */
struct mutex mr_mtx;
struct mlx5_control_vq cvq;
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index c7dc8914354a..4758914ccf86 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -508,6 +508,8 @@ static void _mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_

vhost_iotlb_free(mr->iotlb);

+ list_del(&mr->mr_list);
+
kfree(mr);
}

@@ -560,12 +562,31 @@ void mlx5_vdpa_update_mr(struct mlx5_vdpa_dev *mvdev,
mutex_unlock(&mvdev->mr_mtx);
}

+static void mlx5_vdpa_show_mr_leaks(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_mr *mr;
+
+ mutex_lock(&mvdev->mr_mtx);
+
+ list_for_each_entry(mr, &mvdev->mr_list_head, mr_list) {
+
+ mlx5_vdpa_warn(mvdev, "mkey still alive after resource delete: "
+ "mr: %p, mkey: 0x%x, refcount: %u\n",
+ mr, mr->mkey, refcount_read(&mr->refcount));
+ }
+
+ mutex_unlock(&mvdev->mr_mtx);
+
+}
+
void mlx5_vdpa_destroy_mr_resources(struct mlx5_vdpa_dev *mvdev)
{
for (int i = 0; i < MLX5_VDPA_NUM_AS; i++)
mlx5_vdpa_update_mr(mvdev, NULL, i);

prune_iotlb(mvdev->cvq.iotlb);
+
+ mlx5_vdpa_show_mr_leaks(mvdev);
}

static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
@@ -592,6 +613,8 @@ static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev,
if (err)
goto err_iotlb;

+ list_add_tail(&mr->mr_list, &mvdev->mr_list_head);
+
return 0;

err_iotlb:
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 0df82e4d13f4..88b633682e18 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -3707,6 +3707,8 @@ static int mlx5_vdpa_dev_add(struct vdpa_mgmt_dev *v_mdev, const char *name,
if (err)
goto err_mpfs;

+ INIT_LIST_HEAD(&mvdev->mr_list_head);
+
if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) {
err = mlx5_vdpa_create_dma_mr(mvdev);
if (err)
--
2.43.0


2023-12-15 14:11:05

by Dragos Tatulea

Subject: Re: [PATCH vhost v3 0/6] vdpa/mlx5: Add support for resumable vqs

On Fri, 2023-12-15 at 16:01 +0200, Dragos Tatulea wrote:
> Add support for resumable vqs in the driver. This is a firmware feature
> that can be used for the following benefits:
> [...]

Please disregard this version. I will send a v4. Sorry about the noise.