Hi Michael,
Please note that this series depends on mlx5 core device driver patches
in the mlx5-next branch of
git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git:
git pull git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git mlx5-next
It also depends on Jason Wang's patches, submitted a couple of weeks
ago, which I have included in this series:
vdpa_sim: use the batching API
vhost-vdpa: support batch updating
The following series of patches provides VDPA support for Mellanox
devices. The supported devices are ConnectX-6 Dx and newer.
Currently, only a network driver is implemented; future patches will
introduce a block device driver. iperf performance on a single queue is
around 12 Gbps; future patches will introduce multi-queue support.
The files are organized such that code that can be used by different
VDPA implementations is placed in a common area, drivers/vdpa/mlx5/core.
Only virtual functions (VFs) are currently supported; physical functions
(PFs) are skipped by the driver. In addition, certain firmware
capabilities must be set to enable the driver.
To make use of the VDPA net driver, one must load mlx5_vdpa (e.g. with
modprobe mlx5_vdpa). The VFs will then be operated by the VDPA driver.
Although one can still see a regular instance of a network driver on the
VF, the VDPA driver takes precedence over the NIC driver, steering-wise.
Currently, the device/interface infrastructure in mlx5_core is used to
probe drivers. Future patches will introduce virtbus as a means to
register devices and drivers, and VDPA will be adapted to it.
The mlx5 mode of operation required to support VDPA is switchdev mode.
One can use a Linux or OVS bridge to take care of layer 2 switching.
In order to provide virtio networking to a guest, an updated version of
qemu is required. The series has been tested with the following qemu
version:
url: https://github.com/jasowang/qemu.git
branch: vdpa
Commit ID: 6f4e59b807db
Eli Cohen (7):
net/vdpa: Use struct for set/get vq state
vhost: Fix documentation
vdpa: Add means to communicate vq status on get_vq_state
vdpa/mlx5: Add hardware descriptive header file
vdpa/mlx5: Add support library for mlx5 VDPA implementation
vdpa/mlx5: Add shared memory registration code
vdpa/mlx5: Add VDPA driver for supported mlx5 devices
Jason Wang (2):
vhost-vdpa: support batch updating
vdpa_sim: use the batching API
Max Gurtovoy (1):
vdpa: remove hard coded virtq num
drivers/vdpa/Kconfig | 18 +
drivers/vdpa/Makefile | 1 +
drivers/vdpa/ifcvf/ifcvf_base.c | 4 +-
drivers/vdpa/ifcvf/ifcvf_base.h | 4 +-
drivers/vdpa/ifcvf/ifcvf_main.c | 13 +-
drivers/vdpa/mlx5/Makefile | 4 +
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 89 ++
drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h | 168 ++
drivers/vdpa/mlx5/core/mr.c | 473 ++++++
drivers/vdpa/mlx5/core/resources.c | 284 ++++
drivers/vdpa/mlx5/net/main.c | 76 +
drivers/vdpa/mlx5/net/mlx5_vnet.c | 1966 ++++++++++++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.h | 32 +
drivers/vdpa/vdpa.c | 3 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 35 +-
drivers/vhost/iotlb.c | 4 +-
drivers/vhost/vdpa.c | 43 +-
include/linux/vdpa.h | 33 +-
include/uapi/linux/vhost_types.h | 2 +
19 files changed, 3193 insertions(+), 59 deletions(-)
create mode 100644 drivers/vdpa/mlx5/Makefile
create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa_ifc.h
create mode 100644 drivers/vdpa/mlx5/core/mr.c
create mode 100644 drivers/vdpa/mlx5/core/resources.c
create mode 100644 drivers/vdpa/mlx5/net/main.c
create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h
--
2.27.0
Fix the documentation to match the actual function prototypes.
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
---
drivers/vhost/iotlb.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 1f0ca6e44410..0d4213a54a88 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
* vhost_iotlb_itree_first - return the first overlapped range
* @iotlb: the IOTLB
* @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
*/
struct vhost_iotlb_map *
vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
@@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
* vhost_iotlb_itree_first - return the next overlapped range
* @iotlb: the IOTLB
* @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
*/
struct vhost_iotlb_map *
vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)
--
2.27.0
Currently, get_vq_state() is used only to pass the available index value
of a vq. Extend the struct to also return the status of the VQ to the
caller.
For now, define VQ_STATE_NOT_READY; in the future it will be extended to
include other information.
Modify the current vdpa drivers to update this field.
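As a hedged illustration (not part of this patch; "vdpa", "ops" and
"qid" are assumed to be at hand), a vdpa bus consumer could test the new
field as follows:

	struct vdpa_vq_state state;

	/* fills both avail_index and the new state bitmask */
	ops->get_vq_state(vdpa, qid, &state);
	if (state.state & BIT(VQ_STATE_NOT_READY))
		return -EAGAIN;	/* queue not ready yet */
	/* state.avail_index is valid here */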
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
---
drivers/vdpa/ifcvf/ifcvf_main.c | 1 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 1 +
include/linux/vdpa.h | 9 +++++++++
3 files changed, 11 insertions(+)
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 69032ee97824..77e3b3d91167 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -240,6 +240,7 @@ static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
{
struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
+ state->state = vf->vring[qid].ready ? 0 : BIT(VQ_STATE_NOT_READY);
state->avail_index = ifcvf_get_vq_state(vf, qid);
}
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
index 599519039f8d..06d974b4bd7b 100644
--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
@@ -434,6 +434,7 @@ static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
struct vringh *vrh = &vq->vring;
+ state->state = vq->ready ? 0 : BIT(VQ_STATE_NOT_READY);
state->avail_index = vrh->last_avail_idx;
}
diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
index 7b088bebffe8..bcefa30a3b2f 100644
--- a/include/linux/vdpa.h
+++ b/include/linux/vdpa.h
@@ -27,12 +27,21 @@ struct vdpa_notification_area {
resource_size_t size;
};
+/**
+ * Bitmask describing the status of the vq
+ */
+enum {
+ VQ_STATE_NOT_READY = 0,
+};
+
/**
* vDPA vq_state definition
* @avail_index: available index
+ * @state: returned status from get_vq_state
*/
struct vdpa_vq_state {
u16 avail_index;
+ u32 state;
};
/**
--
2.27.0
Add code to support registering the guest's memory region with the
device. This code will be shared by both the network and block driver
implementations.
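As a hedged sketch of the intended use (sketch_set_map() and
sketch_remap() are illustrative names, not part of this patch), a
consumer such as the net driver is expected to call this code from its
set_map callback:

	static int sketch_set_map(struct mlx5_vdpa_dev *mvdev,
				  struct vhost_iotlb *iotlb)
	{
		bool change_map;
		int err;

		err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
		if (err)
			return err;

		/* an mkey already existed: the caller must tear down and
		 * re-create its resources against the new mapping
		 */
		if (change_map)
			return sketch_remap(mvdev, iotlb);

		return 0;
	}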
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
---
drivers/vdpa/mlx5/Makefile | 2 +-
drivers/vdpa/mlx5/core/Makefile | 1 -
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 32 ++
drivers/vdpa/mlx5/core/mr.c | 473 +++++++++++++++++++++++++++++
drivers/vdpa/mlx5/core/resources.c | 3 +
5 files changed, 509 insertions(+), 2 deletions(-)
delete mode 100644 drivers/vdpa/mlx5/core/Makefile
create mode 100644 drivers/vdpa/mlx5/core/mr.c
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index d552abb1d126..b347c62032ea 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -1 +1 @@
-obj-$(CONFIG_MLX5_VDPA) += core/resources.o
+obj-$(CONFIG_MLX5_VDPA) += core/resources.o core/mr.o
diff --git a/drivers/vdpa/mlx5/core/Makefile b/drivers/vdpa/mlx5/core/Makefile
deleted file mode 100644
index 7070f8c8680d..000000000000
--- a/drivers/vdpa/mlx5/core/Makefile
+++ /dev/null
@@ -1 +0,0 @@
-obj-y += resources.o
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
index f3571c8b257e..163cebe4ad08 100644
--- a/drivers/vdpa/mlx5/core/mlx5_vdpa.h
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -7,6 +7,31 @@
#include <linux/vdpa.h>
#include <linux/mlx5/driver.h>
+struct mlx5_vdpa_direct_mr {
+ u64 start;
+ u64 end;
+ u32 perm;
+ struct mlx5_core_mkey mr;
+ struct sg_table sg_head;
+ int log_size;
+ int nsg;
+ struct list_head list;
+ u64 offset;
+};
+
+struct mlx5_vdpa_mr {
+ struct mlx5_core_mkey mkey;
+
+ /* list of direct MRs descendants of this indirect mr */
+ struct list_head head;
+ unsigned long num_directs;
+ unsigned long num_klms;
+ bool initialized;
+
+ /* serialize mkey creation and destruction */
+ struct mutex mkey_mtx;
+};
+
struct mlx5_vdpa_resources {
u32 pdn;
struct mlx5_uars_page *uar;
@@ -26,6 +51,8 @@ struct mlx5_vdpa_dev {
u8 status;
u32 max_vqs;
u32 generation;
+
+ struct mlx5_vdpa_mr mr;
};
int mlx5_vdpa_alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid);
@@ -41,6 +68,11 @@ int mlx5_vdpa_alloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 *tdn);
void mlx5_vdpa_dealloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 tdn);
int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev);
void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev);
+int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey, u32 *in,
+ int inlen);
+int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey);
+int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
+ bool *change_map);
#define mlx5_vdpa_warn(__dev, format, ...) \
dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
new file mode 100644
index 000000000000..975aa45fd78b
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -0,0 +1,473 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/vdpa.h>
+#include <linux/gcd.h>
+#include <linux/string.h>
+#include <linux/mlx5/qp.h>
+#include "mlx5_vdpa.h"
+
+static int get_octo_len(u64 len, int page_shift)
+{
+ u64 page_size = 1ULL << page_shift;
+ int npages;
+
+ npages = ALIGN(len, page_size) >> page_shift;
+ return (npages + 1) / 2;
+}
+
+static void fill_sg(struct mlx5_vdpa_direct_mr *mr, void *in)
+{
+ struct scatterlist *sg;
+ __be64 *pas;
+ int i;
+
+ pas = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
+ for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
+ pas[i] = cpu_to_be64(sg_dma_address(sg));
+}
+
+static void mlx5_set_access_mode(void *mkc, int mode)
+{
+ MLX5_SET(mkc, mkc, access_mode_1_0, mode & 0x3);
+ MLX5_SET(mkc, mkc, access_mode_4_2, mode >> 2);
+}
+
+static void populate_mtts(struct mlx5_vdpa_direct_mr *mr, __be64 *mtt)
+{
+ struct scatterlist *sg;
+ int i;
+
+ for_each_sg(mr->sg_head.sgl, sg, mr->nsg, i)
+ mtt[i] = cpu_to_be64(sg_dma_address(sg));
+}
+
+static int create_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+{
+ int inlen;
+ void *mkc;
+ void *in;
+ int err;
+
+ inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + roundup(MLX5_ST_SZ_BYTES(mtt) * mr->nsg, 16);
+ in = kvzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+ fill_sg(mr, in);
+ mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+ MLX5_SET(mkc, mkc, lw, !!(mr->perm & VHOST_MAP_WO));
+ MLX5_SET(mkc, mkc, lr, !!(mr->perm & VHOST_MAP_RO));
+ mlx5_set_access_mode(mkc, MLX5_MKC_ACCESS_MODE_MTT);
+ MLX5_SET(mkc, mkc, qpn, 0xffffff);
+ MLX5_SET(mkc, mkc, pd, mvdev->res.pdn);
+ MLX5_SET64(mkc, mkc, start_addr, mr->offset);
+ MLX5_SET64(mkc, mkc, len, mr->end - mr->start);
+ MLX5_SET(mkc, mkc, log_page_size, mr->log_size);
+ MLX5_SET(mkc, mkc, translations_octword_size,
+ get_octo_len(mr->end - mr->start, mr->log_size));
+ MLX5_SET(create_mkey_in, in, translations_octword_actual_size,
+ get_octo_len(mr->end - mr->start, mr->log_size));
+ populate_mtts(mr, MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt));
+ err = mlx5_vdpa_create_mkey(mvdev, &mr->mr, in, inlen);
+ kvfree(in);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "Failed to create direct MR\n");
+ return err;
+ }
+
+ return 0;
+}
+
+static void destroy_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+{
+ mlx5_vdpa_destroy_mkey(mvdev, &mr->mr);
+}
+
+static u64 map_start(struct vhost_iotlb_map *map, struct mlx5_vdpa_direct_mr *mr)
+{
+ return max_t(u64, map->start, mr->start);
+}
+
+static u64 map_end(struct vhost_iotlb_map *map, struct mlx5_vdpa_direct_mr *mr)
+{
+ return min_t(u64, map->last + 1, mr->end);
+}
+
+static u64 maplen(struct vhost_iotlb_map *map, struct mlx5_vdpa_direct_mr *mr)
+{
+ return map_end(map, mr) - map_start(map, mr);
+}
+
+#define MLX5_VDPA_INVALID_START_ADDR ((u64)-1)
+#define MLX5_VDPA_INVALID_LEN ((u64)-1)
+
+static u64 indir_start_addr(struct mlx5_vdpa_mr *mkey)
+{
+ struct mlx5_vdpa_direct_mr *s;
+
+ s = list_first_entry_or_null(&mkey->head, struct mlx5_vdpa_direct_mr, list);
+ if (!s)
+ return MLX5_VDPA_INVALID_START_ADDR;
+
+ return s->start;
+}
+
+static u64 indir_len(struct mlx5_vdpa_mr *mkey)
+{
+ struct mlx5_vdpa_direct_mr *s;
+ struct mlx5_vdpa_direct_mr *e;
+
+ s = list_first_entry_or_null(&mkey->head, struct mlx5_vdpa_direct_mr, list);
+ if (!s)
+ return MLX5_VDPA_INVALID_LEN;
+
+ e = list_last_entry(&mkey->head, struct mlx5_vdpa_direct_mr, list);
+
+ return e->end - s->start;
+}
+
+#define MAX_KLM_SIZE 0x40000000
+
+static u32 klm_bcount(u64 size)
+{
+ return (u32)size;
+}
+
+static void fill_indir(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mkey, void *in)
+{
+ struct mlx5_vdpa_direct_mr *dmr;
+ struct mlx5_klm *klmarr;
+ struct mlx5_klm *klm;
+ bool first = true;
+ u64 preve;
+ int i;
+
+ klmarr = MLX5_ADDR_OF(create_mkey_in, in, klm_pas_mtt);
+ i = 0;
+ list_for_each_entry(dmr, &mkey->head, list) {
+again:
+ klm = &klmarr[i++];
+ if (first) {
+ preve = dmr->start;
+ first = false;
+ }
+
+ if (preve == dmr->start) {
+ klm->key = cpu_to_be32(dmr->mr.key);
+ klm->bcount = cpu_to_be32(klm_bcount(dmr->end - dmr->start));
+ preve = dmr->end;
+ } else {
+ klm->key = cpu_to_be32(mvdev->res.null_mkey);
+ klm->bcount = cpu_to_be32(klm_bcount(dmr->start - preve));
+ preve = dmr->start;
+ goto again;
+ }
+ }
+}
+
+static int klm_byte_size(int nklms)
+{
+ return 16 * ALIGN(nklms, 4);
+}
+
+static int create_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr)
+{
+ int inlen;
+ void *mkc;
+ void *in;
+ int err;
+ u64 start;
+ u64 len;
+
+ start = indir_start_addr(mr);
+ len = indir_len(mr);
+ if (start == MLX5_VDPA_INVALID_START_ADDR || len == MLX5_VDPA_INVALID_LEN)
+ return -EINVAL;
+
+ inlen = MLX5_ST_SZ_BYTES(create_mkey_in) + klm_byte_size(mr->num_klms);
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+ mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+ MLX5_SET(mkc, mkc, lw, 1);
+ MLX5_SET(mkc, mkc, lr, 1);
+ mlx5_set_access_mode(mkc, MLX5_MKC_ACCESS_MODE_KLMS);
+ MLX5_SET(mkc, mkc, qpn, 0xffffff);
+ MLX5_SET(mkc, mkc, pd, mvdev->res.pdn);
+ MLX5_SET64(mkc, mkc, start_addr, start);
+ MLX5_SET64(mkc, mkc, len, len);
+ MLX5_SET(mkc, mkc, translations_octword_size, klm_byte_size(mr->num_klms) / 16);
+ MLX5_SET(create_mkey_in, in, translations_octword_actual_size, mr->num_klms);
+ fill_indir(mvdev, mr, in);
+ err = mlx5_vdpa_create_mkey(mvdev, &mr->mkey, in, inlen);
+ kfree(in);
+ return err;
+}
+
+static void destroy_indirect_key(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mkey)
+{
+ mlx5_vdpa_destroy_mkey(mvdev, &mkey->mkey);
+}
+
+static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr,
+ struct vhost_iotlb *iotlb)
+{
+ struct vhost_iotlb_map *map;
+ unsigned long lgcd = 0;
+ int log_entity_size;
+ unsigned long size;
+ u64 start = 0;
+ int err;
+ struct page *pg;
+ unsigned int nsg;
+ int sglen;
+ u64 pa;
+ u64 paend;
+ struct scatterlist *sg;
+ struct device *dma = mvdev->mdev->device;
+ int ret;
+
+ for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
+ map; map = vhost_iotlb_itree_next(map, start, mr->end - 1)) {
+ size = maplen(map, mr);
+ lgcd = gcd(lgcd, size);
+ start += size;
+ }
+ log_entity_size = ilog2(lgcd);
+
+ sglen = 1 << log_entity_size;
+ nsg = DIV_ROUND_UP(mr->end - mr->start, sglen);
+
+ err = sg_alloc_table(&mr->sg_head, nsg, GFP_KERNEL);
+ if (err)
+ return err;
+
+ sg = mr->sg_head.sgl;
+ for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
+ map; map = vhost_iotlb_itree_next(map, mr->start, mr->end - 1)) {
+ paend = map->addr + maplen(map, mr);
+ for (pa = map->addr; pa < paend; pa += sglen) {
+ pg = pfn_to_page(__phys_to_pfn(pa));
+ if (!sg) {
+ mlx5_vdpa_warn(mvdev, "sg null. start 0x%llx, end 0x%llx\n",
+ map->start, map->last + 1);
+ err = -ENOMEM;
+ goto err_map;
+ }
+ sg_set_page(sg, pg, sglen, 0);
+ sg = sg_next(sg);
+ if (!sg)
+ goto done;
+ }
+ }
+done:
+ mr->log_size = log_entity_size;
+ mr->nsg = nsg;
+ ret = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+ if (!ret)
+ goto err_map;
+
+ err = create_direct_mr(mvdev, mr);
+ if (err)
+ goto err_direct;
+
+ return 0;
+
+err_direct:
+ dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+err_map:
+ sg_free_table(&mr->sg_head);
+ return err;
+}
+
+static void unmap_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr)
+{
+ struct device *dma = mvdev->mdev->device;
+
+ destroy_direct_mr(mvdev, mr);
+ dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
+ sg_free_table(&mr->sg_head);
+}
+
+static int add_direct_chain(struct mlx5_vdpa_dev *mvdev, u64 start, u64 size, u8 perm,
+ struct vhost_iotlb *iotlb)
+{
+ struct mlx5_vdpa_mr *mr = &mvdev->mr;
+ struct mlx5_vdpa_direct_mr *dmr;
+ struct mlx5_vdpa_direct_mr *n;
+ LIST_HEAD(tmp);
+ u64 st;
+ u64 sz;
+ int err;
+ int i = 0;
+
+ st = start;
+ while (size) {
+ sz = (u32)min_t(u64, MAX_KLM_SIZE, size);
+ dmr = kzalloc(sizeof(*dmr), GFP_KERNEL);
+ if (!dmr)
+ goto err_alloc;
+
+ dmr->start = st;
+ dmr->end = st + sz;
+ dmr->perm = perm;
+ err = map_direct_mr(mvdev, dmr, iotlb);
+ if (err) {
+ kfree(dmr);
+ goto err_alloc;
+ }
+
+ list_add_tail(&dmr->list, &tmp);
+ size -= sz;
+ mr->num_directs++;
+ mr->num_klms++;
+ st += sz;
+ i++;
+ }
+ list_splice_tail(&tmp, &mr->head);
+ return 0;
+
+err_alloc:
+ list_for_each_entry_safe(dmr, n, &mr->head, list) {
+ list_del_init(&dmr->list);
+ unmap_direct_mr(mvdev, dmr);
+ kfree(dmr);
+ }
+ return err;
+}
+
+/* The iotlb pointer contains a list of maps. Go over the maps, possibly
+ * merging mergeable maps, and create direct memory keys that provide the
+ * device access to memory. The direct mkeys are then referred to by the
+ * indirect memory key that provides access to the entire address space given
+ * by iotlb.
+ */
+static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
+{
+ struct mlx5_vdpa_mr *mr = &mvdev->mr;
+ struct mlx5_vdpa_direct_mr *dmr;
+ struct mlx5_vdpa_direct_mr *n;
+ struct vhost_iotlb_map *map;
+ u32 pperm = U16_MAX;
+ u64 last = U64_MAX;
+ u64 ps = U64_MAX;
+ u64 pe = U64_MAX;
+ u64 start = 0;
+ int err = 0;
+ int nnuls;
+
+ if (mr->initialized)
+ return 0;
+
+ INIT_LIST_HEAD(&mr->head);
+ for (map = vhost_iotlb_itree_first(iotlb, start, last); map;
+ map = vhost_iotlb_itree_next(map, start, last)) {
+ start = map->start;
+ if (pe == map->start && pperm == map->perm) {
+ pe = map->last + 1;
+ } else {
+ if (ps != U64_MAX) {
+ if (pe < map->start) {
+ /* We have a hole in the map. Check how
+ * many null keys are required to fill it.
+ */
+ nnuls = DIV_ROUND_UP(map->start - pe, MAX_KLM_SIZE);
+ mr->num_klms += nnuls;
+ }
+ err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb);
+ if (err)
+ goto err_chain;
+ }
+ ps = map->start;
+ pe = map->last + 1;
+ pperm = map->perm;
+ }
+ }
+ err = add_direct_chain(mvdev, ps, pe - ps, pperm, iotlb);
+ if (err)
+ goto err_chain;
+
+ /* Create the memory key that defines the guest's address space. This
+ * memory key refers to the direct keys that contain the MTT
+ * translations
+ */
+ err = create_indirect_key(mvdev, mr);
+ if (err)
+ goto err_chain;
+
+ mr->initialized = true;
+ return 0;
+
+err_chain:
+ list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
+ list_del_init(&dmr->list);
+ unmap_direct_mr(mvdev, dmr);
+ kfree(dmr);
+ }
+ return err;
+}
+
+int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
+{
+ struct mlx5_vdpa_mr *mr = &mvdev->mr;
+ int err;
+
+ mutex_lock(&mr->mkey_mtx);
+ err = _mlx5_vdpa_create_mr(mvdev, iotlb);
+ mutex_unlock(&mr->mkey_mtx);
+ return err;
+}
+
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_mr *mr = &mvdev->mr;
+ struct mlx5_vdpa_direct_mr *dmr;
+ struct mlx5_vdpa_direct_mr *n;
+
+ mutex_lock(&mr->mkey_mtx);
+ if (!mr->initialized)
+ goto out;
+
+ destroy_indirect_key(mvdev, mr);
+ list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
+ list_del_init(&dmr->list);
+ unmap_direct_mr(mvdev, dmr);
+ kfree(dmr);
+ }
+ memset(mr, 0, sizeof(*mr));
+ mr->initialized = false;
+out:
+ mutex_unlock(&mr->mkey_mtx);
+}
+
+static bool map_empty(struct vhost_iotlb *iotlb)
+{
+ return !vhost_iotlb_itree_first(iotlb, 0, U64_MAX);
+}
+
+int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
+ bool *change_map)
+{
+ struct mlx5_vdpa_mr *mr = &mvdev->mr;
+ int err;
+
+ *change_map = false;
+ if (map_empty(iotlb)) {
+ mlx5_vdpa_destroy_mr(mvdev);
+ return 0;
+ }
+ mutex_lock(&mr->mkey_mtx);
+ if (mr->initialized) {
+ mlx5_vdpa_info(mvdev, "memory map update\n");
+ *change_map = true;
+ }
+ if (!*change_map)
+ err = _mlx5_vdpa_create_mr(mvdev, iotlb);
+ mutex_unlock(&mr->mkey_mtx);
+
+ return err;
+}
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
index 6c6552b7e9b5..96e6421c5d1c 100644
--- a/drivers/vdpa/mlx5/core/resources.c
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -227,6 +227,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
mlx5_vdpa_warn(mvdev, "resources already allocated\n");
return -EINVAL;
}
+ mutex_init(&mvdev->mr.mkey_mtx);
res->uar = mlx5_get_uars_page(mdev);
if (IS_ERR(res->uar)) {
err = PTR_ERR(res->uar);
@@ -262,6 +263,7 @@ int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
err_uctx:
mlx5_put_uars_page(mdev, res->uar);
err_uars:
+ mutex_destroy(&mvdev->mr.mkey_mtx);
return err;
}
@@ -277,5 +279,6 @@ void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
dealloc_pd(mvdev, res->pdn, res->uid);
destroy_uctx(mvdev, res->uid);
mlx5_put_uars_page(mvdev->mdev, res->uar);
+ mutex_destroy(&mvdev->mr.mkey_mtx);
res->valid = false;
}
--
2.27.0
From: Jason Wang <[email protected]>
Add two new IOTLB message types, VHOST_IOTLB_BATCH_BEGIN and
VHOST_IOTLB_BATCH_END, that let userspace batch a sequence of IOTLB
updates. For devices implementing the set_map() method, the accumulated
mapping is committed to the device once, when the batch ends, instead of
on every individual update.
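A hedged userspace sketch of the intended flow (assuming a vhost-vdpa
fd and the vhost_msg_v2 interface; error handling elided):

	#include <unistd.h>
	#include <linux/vhost.h>

	static void batch_updates(int fd, struct vhost_iotlb_msg *updates, int n)
	{
		struct vhost_msg_v2 msg = { .type = VHOST_IOTLB_MSG_V2 };
		int i;

		msg.iotlb.type = VHOST_IOTLB_BATCH_BEGIN;
		write(fd, &msg, sizeof(msg));		/* open the batch */

		for (i = 0; i < n; i++) {
			msg.iotlb = updates[i];		/* VHOST_IOTLB_UPDATE entries */
			write(fd, &msg, sizeof(msg));
		}

		msg.iotlb.type = VHOST_IOTLB_BATCH_END;
		write(fd, &msg, sizeof(msg));		/* set_map() is invoked here */
	}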
Signed-off-by: Jason Wang <[email protected]>
---
drivers/vhost/vdpa.c | 23 +++++++++++++++++------
include/uapi/linux/vhost_types.h | 2 ++
2 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/vhost/vdpa.c b/drivers/vhost/vdpa.c
index a54b60d6623f..8827ae31f96d 100644
--- a/drivers/vhost/vdpa.c
+++ b/drivers/vhost/vdpa.c
@@ -73,6 +73,7 @@ struct vhost_vdpa {
int virtio_id;
int minor;
struct eventfd_ctx *config_ctx;
+ bool in_batch;
};
static DEFINE_IDA(vhost_vdpa_ida);
@@ -521,9 +522,9 @@ static int vhost_vdpa_map(struct vhost_vdpa *v,
if (ops->dma_map)
r = ops->dma_map(vdpa, iova, size, pa, perm);
- else if (ops->set_map)
- r = ops->set_map(vdpa, dev->iotlb);
- else
+ else if (ops->set_map) {
+
+ } else
r = iommu_map(v->domain, iova, pa, size,
perm_to_iommu_flags(perm));
@@ -540,9 +541,8 @@ static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size)
if (ops->dma_map)
ops->dma_unmap(vdpa, iova, size);
- else if (ops->set_map)
- ops->set_map(vdpa, dev->iotlb);
- else
+ else if (ops->set_map) {
+ } else
iommu_unmap(v->domain, iova, size);
}
@@ -636,6 +636,8 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
struct vhost_iotlb_msg *msg)
{
struct vhost_vdpa *v = container_of(dev, struct vhost_vdpa, vdev);
+ struct vdpa_device *vdpa = v->vdpa;
+ const struct vdpa_config_ops *ops = vdpa->config;
int r = 0;
r = vhost_dev_check_owner(dev);
@@ -649,6 +651,15 @@ static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev,
case VHOST_IOTLB_INVALIDATE:
vhost_vdpa_unmap(v, msg->iova, msg->size);
break;
+ case VHOST_IOTLB_BATCH_BEGIN:
+ v->in_batch = true;
+ break;
+ case VHOST_IOTLB_BATCH_END:
+ if (v->in_batch && ops->set_map) {
+ ops->set_map(vdpa, dev->iotlb);
+ }
+ v->in_batch = false;
+ break;
default:
r = -EINVAL;
break;
diff --git a/include/uapi/linux/vhost_types.h b/include/uapi/linux/vhost_types.h
index 669457ce5c48..7f52245f8d5d 100644
--- a/include/uapi/linux/vhost_types.h
+++ b/include/uapi/linux/vhost_types.h
@@ -60,6 +60,8 @@ struct vhost_iotlb_msg {
#define VHOST_IOTLB_UPDATE 2
#define VHOST_IOTLB_INVALIDATE 3
#define VHOST_IOTLB_ACCESS_FAIL 4
+#define VHOST_IOTLB_BATCH_BEGIN 5
+#define VHOST_IOTLB_BATCH_END 6
__u8 type;
};
--
2.27.0
Subsequent patches introduce a VDPA network driver for Mellanox
ConnectX-6 devices. This patch provides functionality that will be used
by those patches.
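As a hedged usage sketch (sketch_probe() is an illustrative name), a
dependent driver is expected to allocate the shared resources once at
probe time and release them with mlx5_vdpa_free_resources() on removal:

	static int sketch_probe(struct mlx5_vdpa_dev *mvdev)
	{
		int err;

		/* UAR, user context, PD, null mkey and kick doorbell mapping */
		err = mlx5_vdpa_alloc_resources(mvdev);
		if (err)
			return err;

		/* device objects (TIS/TIR/RQT, transport domain) are then
		 * created against mvdev->res by the specific driver
		 */
		return 0;
	}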
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
---
drivers/vdpa/Kconfig | 8 +
drivers/vdpa/Makefile | 1 +
drivers/vdpa/mlx5/Makefile | 1 +
drivers/vdpa/mlx5/core/Makefile | 1 +
drivers/vdpa/mlx5/core/mlx5_vdpa.h | 57 ++++++
drivers/vdpa/mlx5/core/resources.c | 281 +++++++++++++++++++++++++++++
6 files changed, 349 insertions(+)
create mode 100644 drivers/vdpa/mlx5/Makefile
create mode 100644 drivers/vdpa/mlx5/core/Makefile
create mode 100644 drivers/vdpa/mlx5/core/mlx5_vdpa.h
create mode 100644 drivers/vdpa/mlx5/core/resources.c
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 3e1ceb8e9f2b..48a1a776dd86 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -28,4 +28,12 @@ config IFCVF
To compile this driver as a module, choose M here: the module will
be called ifcvf.
+config MLX5_VDPA
+ bool "MLX5 VDPA support library for ConnectX devices"
+ depends on MLX5_CORE
+ default n
+ help
+ Support library for Mellanox VDPA drivers. Provides code that is
+ common for all types of VDPA drivers.
+
endif # VDPA
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 8bbb686ca7a2..d160e9b63a66 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -2,3 +2,4 @@
obj-$(CONFIG_VDPA) += vdpa.o
obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
obj-$(CONFIG_IFCVF) += ifcvf/
+obj-$(CONFIG_MLX5_VDPA) += mlx5/
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
new file mode 100644
index 000000000000..d552abb1d126
--- /dev/null
+++ b/drivers/vdpa/mlx5/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_MLX5_VDPA) += core/resources.o
diff --git a/drivers/vdpa/mlx5/core/Makefile b/drivers/vdpa/mlx5/core/Makefile
new file mode 100644
index 000000000000..7070f8c8680d
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/Makefile
@@ -0,0 +1 @@
+obj-y += resources.o
diff --git a/drivers/vdpa/mlx5/core/mlx5_vdpa.h b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
new file mode 100644
index 000000000000..f3571c8b257e
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/mlx5_vdpa.h
@@ -0,0 +1,57 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#ifndef __MLX5_VDPA_H__
+#define __MLX5_VDPA_H__
+
+#include <linux/vdpa.h>
+#include <linux/mlx5/driver.h>
+
+struct mlx5_vdpa_resources {
+ u32 pdn;
+ struct mlx5_uars_page *uar;
+ void __iomem *kick_addr;
+ u16 uid;
+ u32 null_mkey;
+ bool valid;
+};
+
+struct mlx5_vdpa_dev {
+ struct vdpa_device vdev;
+ struct mlx5_core_dev *mdev;
+ struct mlx5_vdpa_resources res;
+
+ u64 mlx_features;
+ u64 actual_features;
+ u8 status;
+ u32 max_vqs;
+ u32 generation;
+};
+
+int mlx5_vdpa_alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid);
+int mlx5_vdpa_dealloc_pd(struct mlx5_vdpa_dev *dev, u32 pdn, u16 uid);
+int mlx5_vdpa_get_null_mkey(struct mlx5_vdpa_dev *dev, u32 *null_mkey);
+int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn);
+void mlx5_vdpa_destroy_tis(struct mlx5_vdpa_dev *mvdev, u32 tisn);
+int mlx5_vdpa_create_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 *rqtn);
+void mlx5_vdpa_destroy_rqt(struct mlx5_vdpa_dev *mvdev, u32 rqtn);
+int mlx5_vdpa_create_tir(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tirn);
+void mlx5_vdpa_destroy_tir(struct mlx5_vdpa_dev *mvdev, u32 tirn);
+int mlx5_vdpa_alloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 *tdn);
+void mlx5_vdpa_dealloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 tdn);
+int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev);
+void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev);
+
+#define mlx5_vdpa_warn(__dev, format, ...) \
+ dev_warn((__dev)->mdev->device, "%s:%d:(pid %d) warning: " format, __func__, __LINE__, \
+ current->pid, ##__VA_ARGS__)
+
+#define mlx5_vdpa_info(__dev, format, ...) \
+ dev_info((__dev)->mdev->device, "%s:%d:(pid %d): " format, __func__, __LINE__, \
+ current->pid, ##__VA_ARGS__)
+
+#define mlx5_vdpa_dbg(__dev, format, ...) \
+ dev_dbg((__dev)->mdev->device, "%s:%d:(pid %d): " format, __func__, __LINE__, \
+ current->pid, ##__VA_ARGS__)
+
+#endif /* __MLX5_VDPA_H__ */
diff --git a/drivers/vdpa/mlx5/core/resources.c b/drivers/vdpa/mlx5/core/resources.c
new file mode 100644
index 000000000000..6c6552b7e9b5
--- /dev/null
+++ b/drivers/vdpa/mlx5/core/resources.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/mlx5/driver.h>
+#include "mlx5_vdpa.h"
+
+static int alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid)
+{
+ struct mlx5_core_dev *mdev = dev->mdev;
+
+ u32 out[MLX5_ST_SZ_DW(alloc_pd_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(alloc_pd_in)] = {};
+ int err;
+
+ MLX5_SET(alloc_pd_in, in, opcode, MLX5_CMD_OP_ALLOC_PD);
+ MLX5_SET(alloc_pd_in, in, uid, uid);
+
+ err = mlx5_cmd_exec_inout(mdev, alloc_pd, in, out);
+ if (!err)
+ *pdn = MLX5_GET(alloc_pd_out, out, pd);
+
+ return err;
+}
+
+static int dealloc_pd(struct mlx5_vdpa_dev *dev, u32 pdn, u16 uid)
+{
+ u32 in[MLX5_ST_SZ_DW(dealloc_pd_in)] = {};
+ struct mlx5_core_dev *mdev = dev->mdev;
+
+ MLX5_SET(dealloc_pd_in, in, opcode, MLX5_CMD_OP_DEALLOC_PD);
+ MLX5_SET(dealloc_pd_in, in, pd, pdn);
+ MLX5_SET(dealloc_pd_in, in, uid, uid);
+ return mlx5_cmd_exec_in(mdev, dealloc_pd, in);
+}
+
+static int get_null_mkey(struct mlx5_vdpa_dev *dev, u32 *null_mkey)
+{
+ u32 out[MLX5_ST_SZ_DW(query_special_contexts_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(query_special_contexts_in)] = {};
+ struct mlx5_core_dev *mdev = dev->mdev;
+ int err;
+
+ MLX5_SET(query_special_contexts_in, in, opcode, MLX5_CMD_OP_QUERY_SPECIAL_CONTEXTS);
+ err = mlx5_cmd_exec_inout(mdev, query_special_contexts, in, out);
+ if (!err)
+ *null_mkey = MLX5_GET(query_special_contexts_out, out, null_mkey);
+ return err;
+}
+
+static int create_uctx(struct mlx5_vdpa_dev *mvdev, u16 *uid)
+{
+ u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
+ int inlen;
+ void *in;
+ int err;
+
+ /* 0 means not supported */
+ if (!MLX5_CAP_GEN(mvdev->mdev, log_max_uctx))
+ return -EOPNOTSUPP;
+
+ inlen = MLX5_ST_SZ_BYTES(create_uctx_in);
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
+ MLX5_SET(create_uctx_in, in, uctx.cap, MLX5_UCTX_CAP_RAW_TX);
+
+ err = mlx5_cmd_exec(mvdev->mdev, in, inlen, out, sizeof(out));
+ kfree(in);
+ if (!err)
+ *uid = MLX5_GET(create_uctx_out, out, uid);
+
+ return err;
+}
+
+static void destroy_uctx(struct mlx5_vdpa_dev *mvdev, u32 uid)
+{
+ u32 out[MLX5_ST_SZ_DW(destroy_uctx_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(destroy_uctx_in)] = {};
+
+ MLX5_SET(destroy_uctx_in, in, opcode, MLX5_CMD_OP_DESTROY_UCTX);
+ MLX5_SET(destroy_uctx_in, in, uid, uid);
+
+ mlx5_cmd_exec(mvdev->mdev, in, sizeof(in), out, sizeof(out));
+}
+
+int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn)
+{
+ u32 out[MLX5_ST_SZ_DW(create_tis_out)] = {};
+ int err;
+
+ MLX5_SET(create_tis_in, in, opcode, MLX5_CMD_OP_CREATE_TIS);
+ MLX5_SET(create_tis_in, in, uid, mvdev->res.uid);
+ err = mlx5_cmd_exec_inout(mvdev->mdev, create_tis, in, out);
+ if (!err)
+ *tisn = MLX5_GET(create_tis_out, out, tisn);
+
+ return err;
+}
+
+void mlx5_vdpa_destroy_tis(struct mlx5_vdpa_dev *mvdev, u32 tisn)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_tis_in)] = {};
+
+ MLX5_SET(destroy_tis_in, in, opcode, MLX5_CMD_OP_DESTROY_TIS);
+ MLX5_SET(destroy_tis_in, in, uid, mvdev->res.uid);
+ MLX5_SET(destroy_tis_in, in, tisn, tisn);
+ mlx5_cmd_exec_in(mvdev->mdev, destroy_tis, in);
+}
+
+int mlx5_vdpa_create_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 *rqtn)
+{
+ u32 out[MLX5_ST_SZ_DW(create_rqt_out)] = {};
+ int err;
+
+ MLX5_SET(create_rqt_in, in, opcode, MLX5_CMD_OP_CREATE_RQT);
+ err = mlx5_cmd_exec(mvdev->mdev, in, inlen, out, sizeof(out));
+ if (!err)
+ *rqtn = MLX5_GET(create_rqt_out, out, rqtn);
+
+ return err;
+}
+
+void mlx5_vdpa_destroy_rqt(struct mlx5_vdpa_dev *mvdev, u32 rqtn)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)] = {};
+
+ MLX5_SET(destroy_rqt_in, in, opcode, MLX5_CMD_OP_DESTROY_RQT);
+ MLX5_SET(destroy_rqt_in, in, uid, mvdev->res.uid);
+ MLX5_SET(destroy_rqt_in, in, rqtn, rqtn);
+ mlx5_cmd_exec_in(mvdev->mdev, destroy_rqt, in);
+}
+
+int mlx5_vdpa_create_tir(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tirn)
+{
+ u32 out[MLX5_ST_SZ_DW(create_tir_out)] = {};
+ int err;
+
+ MLX5_SET(create_tir_in, in, opcode, MLX5_CMD_OP_CREATE_TIR);
+ err = mlx5_cmd_exec_inout(mvdev->mdev, create_tir, in, out);
+ if (!err)
+ *tirn = MLX5_GET(create_tir_out, out, tirn);
+
+ return err;
+}
+
+void mlx5_vdpa_destroy_tir(struct mlx5_vdpa_dev *mvdev, u32 tirn)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_tir_in)] = {};
+
+ MLX5_SET(destroy_tir_in, in, opcode, MLX5_CMD_OP_DESTROY_TIR);
+ MLX5_SET(destroy_tir_in, in, uid, mvdev->res.uid);
+ MLX5_SET(destroy_tir_in, in, tirn, tirn);
+ mlx5_cmd_exec_in(mvdev->mdev, destroy_tir, in);
+}
+
+int mlx5_vdpa_alloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 *tdn)
+{
+ u32 out[MLX5_ST_SZ_DW(alloc_transport_domain_out)] = {};
+ u32 in[MLX5_ST_SZ_DW(alloc_transport_domain_in)] = {};
+ int err;
+
+ MLX5_SET(alloc_transport_domain_in, in, opcode, MLX5_CMD_OP_ALLOC_TRANSPORT_DOMAIN);
+ MLX5_SET(alloc_transport_domain_in, in, uid, mvdev->res.uid);
+
+ err = mlx5_cmd_exec_inout(mvdev->mdev, alloc_transport_domain, in, out);
+ if (!err)
+ *tdn = MLX5_GET(alloc_transport_domain_out, out, transport_domain);
+
+ return err;
+}
+
+void mlx5_vdpa_dealloc_transport_domain(struct mlx5_vdpa_dev *mvdev, u32 tdn)
+{
+ u32 in[MLX5_ST_SZ_DW(dealloc_transport_domain_in)] = {};
+
+ MLX5_SET(dealloc_transport_domain_in, in, opcode, MLX5_CMD_OP_DEALLOC_TRANSPORT_DOMAIN);
+ MLX5_SET(dealloc_transport_domain_in, in, uid, mvdev->res.uid);
+ MLX5_SET(dealloc_transport_domain_in, in, transport_domain, tdn);
+ mlx5_cmd_exec_in(mvdev->mdev, dealloc_transport_domain, in);
+}
+
+int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey, u32 *in,
+ int inlen)
+{
+ u32 lout[MLX5_ST_SZ_DW(create_mkey_out)] = {};
+ u32 mkey_index;
+ void *mkc;
+ int err;
+
+ MLX5_SET(create_mkey_in, in, opcode, MLX5_CMD_OP_CREATE_MKEY);
+ MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
+
+ err = mlx5_cmd_exec(mvdev->mdev, in, inlen, lout, sizeof(lout));
+ if (err)
+ return err;
+
+ mkc = MLX5_ADDR_OF(create_mkey_in, in, memory_key_mkey_entry);
+ mkey_index = MLX5_GET(create_mkey_out, lout, mkey_index);
+ mkey->iova = MLX5_GET64(mkc, mkc, start_addr);
+ mkey->size = MLX5_GET64(mkc, mkc, len);
+ mkey->key |= mlx5_idx_to_mkey(mkey_index);
+ mkey->pd = MLX5_GET(mkc, mkc, pd);
+ return 0;
+}
+
+int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_mkey_in)] = {};
+
+ MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
+ MLX5_SET(destroy_mkey_in, in, opcode, MLX5_CMD_OP_DESTROY_MKEY);
+ MLX5_SET(destroy_mkey_in, in, mkey_index, mlx5_mkey_to_idx(mkey->key));
+ return mlx5_cmd_exec_in(mvdev->mdev, destroy_mkey, in);
+}
+
+int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev)
+{
+ u64 offset = MLX5_CAP64_DEV_VDPA_EMULATION(mvdev->mdev, doorbell_bar_offset);
+ struct mlx5_vdpa_resources *res = &mvdev->res;
+ struct mlx5_core_dev *mdev = mvdev->mdev;
+ u64 kick_addr;
+ int err;
+
+ if (res->valid) {
+ mlx5_vdpa_warn(mvdev, "resources already allocated\n");
+ return -EINVAL;
+ }
+ res->uar = mlx5_get_uars_page(mdev);
+ if (IS_ERR(res->uar)) {
+ err = PTR_ERR(res->uar);
+ goto err_uars;
+ }
+
+ err = create_uctx(mvdev, &res->uid);
+ if (err)
+ goto err_uctx;
+
+ err = alloc_pd(mvdev, &res->pdn, res->uid);
+ if (err)
+ goto err_pd;
+
+ err = get_null_mkey(mvdev, &res->null_mkey);
+ if (err)
+ goto err_key;
+
+ kick_addr = pci_resource_start(mdev->pdev, 0) + offset;
+ res->kick_addr = ioremap(kick_addr, PAGE_SIZE);
+ if (!res->kick_addr) {
+ err = -ENOMEM;
+ goto err_key;
+ }
+ res->valid = true;
+
+ return 0;
+
+err_key:
+ dealloc_pd(mvdev, res->pdn, res->uid);
+err_pd:
+ destroy_uctx(mvdev, res->uid);
+err_uctx:
+ mlx5_put_uars_page(mdev, res->uar);
+err_uars:
+ return err;
+}
+
+void mlx5_vdpa_free_resources(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_resources *res = &mvdev->res;
+
+ if (!res->valid)
+ return;
+
+ iounmap(res->kick_addr);
+ res->kick_addr = NULL;
+ dealloc_pd(mvdev, res->pdn, res->uid);
+ destroy_uctx(mvdev, res->uid);
+ mlx5_put_uars_page(mvdev->mdev, res->uar);
+ res->valid = false;
+}
--
2.27.0
Add a front-end VDPA driver that registers on the VDPA bus and provides
networking to a guest. The VDPA driver creates the necessary resources
on the VF it is driving such that the data path is offloaded.
Notifications are communicated through the driver.
Currently, only VFs are supported. In subsequent patches we will add
devlink support to control which VFs are used for VDPA and which
functions are used for regular networking.
Reviewed-by: Parav Pandit <[email protected]>
Signed-off-by: Eli Cohen <[email protected]>
---
drivers/vdpa/Kconfig | 10 +
drivers/vdpa/mlx5/Makefile | 5 +-
drivers/vdpa/mlx5/core/mr.c | 2 +-
drivers/vdpa/mlx5/net/main.c | 76 ++
drivers/vdpa/mlx5/net/mlx5_vnet.c | 1966 +++++++++++++++++++++++++++++
drivers/vdpa/mlx5/net/mlx5_vnet.h | 32 +
6 files changed, 2089 insertions(+), 2 deletions(-)
create mode 100644 drivers/vdpa/mlx5/net/main.c
create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index 48a1a776dd86..809cb4c2eecf 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -36,4 +36,14 @@ config MLX5_VDPA
Support library for Mellanox VDPA drivers. Provides code that is
common for all types of VDPA drivers.
+config MLX5_VDPA_NET
+ tristate "vDPA driver for ConnectX devices"
+ depends on MLX5_VDPA
+ default n
+ help
+ VDPA network driver for ConnectX-6 and newer. Provides offloading
+ of virtio net datapath such that descriptors put on the ring will
+ be executed by the hardware. It also supports a variety of stateless
+ offloads depending on the actual device used and firmware version.
+
endif # VDPA
diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
index b347c62032ea..3f8850c1a300 100644
--- a/drivers/vdpa/mlx5/Makefile
+++ b/drivers/vdpa/mlx5/Makefile
@@ -1 +1,4 @@
-obj-$(CONFIG_MLX5_VDPA) += core/resources.o core/mr.o
+subdir-ccflags-y += -I$(src)/core
+
+obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o
+mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/main.o net/mlx5_vnet.o core/resources.o core/mr.o
diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
index 975aa45fd78b..34e3bbb80df8 100644
--- a/drivers/vdpa/mlx5/core/mr.c
+++ b/drivers/vdpa/mlx5/core/mr.c
@@ -453,7 +453,7 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io
bool *change_map)
{
struct mlx5_vdpa_mr *mr = &mvdev->mr;
- int err;
+ int err = 0;
*change_map = false;
if (map_empty(iotlb)) {
diff --git a/drivers/vdpa/mlx5/net/main.c b/drivers/vdpa/mlx5/net/main.c
new file mode 100644
index 000000000000..838cd98386ff
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/main.c
@@ -0,0 +1,76 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/module.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/device.h>
+#include "mlx5_vdpa_ifc.h"
+#include "mlx5_vnet.h"
+
+MODULE_AUTHOR("Eli Cohen <[email protected]>");
+MODULE_DESCRIPTION("Mellanox VDPA driver");
+MODULE_LICENSE("Dual BSD/GPL");
+
+static bool required_caps_supported(struct mlx5_core_dev *mdev)
+{
+ u8 event_mode;
+ u64 got;
+
+ got = MLX5_CAP_GEN_64(mdev, general_obj_types);
+
+ if (!(got & MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q))
+ return false;
+
+ event_mode = MLX5_CAP_DEV_VDPA_EMULATION(mdev, event_mode);
+ if (!(event_mode & MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE))
+ return false;
+
+ if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, eth_frame_offload_type))
+ return false;
+
+ return true;
+}
+
+static void *mlx5_vdpa_add(struct mlx5_core_dev *mdev)
+{
+ struct mlx5_vdpa_dev *vdev;
+
+ if (mlx5_core_is_pf(mdev))
+ return NULL;
+
+ if (!required_caps_supported(mdev)) {
+ dev_info(mdev->device, "virtio net emulation not supported\n");
+ return NULL;
+ }
+ vdev = mlx5_vdpa_add_dev(mdev);
+ if (IS_ERR(vdev))
+ return NULL;
+
+ return vdev;
+}
+
+static void mlx5_vdpa_remove(struct mlx5_core_dev *mdev, void *context)
+{
+ struct mlx5_vdpa_dev *vdev = context;
+
+ mlx5_vdpa_remove_dev(vdev);
+}
+
+static struct mlx5_interface mlx5_vdpa_interface = {
+ .add = mlx5_vdpa_add,
+ .remove = mlx5_vdpa_remove,
+ .protocol = MLX5_INTERFACE_PROTOCOL_VDPA,
+};
+
+static int __init mlx5_vdpa_init(void)
+{
+ return mlx5_register_interface(&mlx5_vdpa_interface);
+}
+
+static void __exit mlx5_vdpa_exit(void)
+{
+ mlx5_unregister_interface(&mlx5_vdpa_interface);
+}
+
+module_init(mlx5_vdpa_init);
+module_exit(mlx5_vdpa_exit);
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
new file mode 100644
index 000000000000..f7d3c7a0eb92
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -0,0 +1,1966 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#include <linux/vdpa.h>
+#include <uapi/linux/virtio_ids.h>
+#include <linux/virtio_config.h>
+#include <linux/mlx5/qp.h>
+#include <linux/mlx5/device.h>
+#include <linux/mlx5/vport.h>
+#include <linux/mlx5/fs.h>
+#include "mlx5_vnet.h"
+#include "../core/mlx5_vdpa_ifc.h"
+#include "../core/mlx5_vdpa.h"
+
+#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
+
+#define VALID_FEATURES_MASK \
+ (BIT(VIRTIO_NET_F_CSUM) | BIT(VIRTIO_NET_F_GUEST_CSUM) | \
+ BIT(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT(VIRTIO_NET_F_MTU) | BIT(VIRTIO_NET_F_MAC) | \
+ BIT(VIRTIO_NET_F_GUEST_TSO4) | BIT(VIRTIO_NET_F_GUEST_TSO6) | \
+ BIT(VIRTIO_NET_F_GUEST_ECN) | BIT(VIRTIO_NET_F_GUEST_UFO) | BIT(VIRTIO_NET_F_HOST_TSO4) | \
+ BIT(VIRTIO_NET_F_HOST_TSO6) | BIT(VIRTIO_NET_F_HOST_ECN) | BIT(VIRTIO_NET_F_HOST_UFO) | \
+ BIT(VIRTIO_NET_F_MRG_RXBUF) | BIT(VIRTIO_NET_F_STATUS) | BIT(VIRTIO_NET_F_CTRL_VQ) | \
+ BIT(VIRTIO_NET_F_CTRL_RX) | BIT(VIRTIO_NET_F_CTRL_VLAN) | \
+ BIT(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT(VIRTIO_NET_F_GUEST_ANNOUNCE) | \
+ BIT(VIRTIO_NET_F_MQ) | BIT(VIRTIO_NET_F_CTRL_MAC_ADDR) | BIT(VIRTIO_NET_F_HASH_REPORT) | \
+ BIT(VIRTIO_NET_F_RSS) | BIT(VIRTIO_NET_F_RSC_EXT) | BIT(VIRTIO_NET_F_STANDBY) | \
+ BIT(VIRTIO_NET_F_SPEED_DUPLEX) | BIT(VIRTIO_F_NOTIFY_ON_EMPTY) | \
+ BIT(VIRTIO_F_ANY_LAYOUT) | BIT(VIRTIO_F_VERSION_1) | BIT(VIRTIO_F_IOMMU_PLATFORM) | \
+ BIT(VIRTIO_F_RING_PACKED) | BIT(VIRTIO_F_ORDER_PLATFORM) | BIT(VIRTIO_F_SR_IOV))
+
+#define VALID_STATUS_MASK \
+ (VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK | \
+ VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_NEEDS_RESET | VIRTIO_CONFIG_S_FAILED)
+
+struct mlx5_vdpa_net_resources {
+ u32 tisn;
+ u32 tdn;
+ u32 tirn;
+ u32 rqtn;
+ bool valid;
+};
+
+struct mlx5_vdpa_cq_buf {
+ struct mlx5_frag_buf_ctrl fbc;
+ struct mlx5_frag_buf frag_buf;
+ int cqe_size;
+ int nent;
+};
+
+struct mlx5_vdpa_cq {
+ struct mlx5_core_cq mcq;
+ struct mlx5_vdpa_cq_buf buf;
+ struct mlx5_db db;
+ int cqe;
+};
+
+struct mlx5_vdpa_umem {
+ struct mlx5_frag_buf_ctrl fbc;
+ struct mlx5_frag_buf frag_buf;
+ int size;
+ u32 id;
+};
+
+struct mlx5_vdpa_qp {
+ struct mlx5_core_qp mqp;
+ struct mlx5_frag_buf frag_buf;
+ struct mlx5_db db;
+ u16 head;
+ bool fw;
+};
+
+struct mlx5_vq_restore_info {
+ u32 num_ent;
+ u64 desc_addr;
+ u64 device_addr;
+ u64 driver_addr;
+ u16 avail_index;
+ bool ready;
+ struct vdpa_callback cb;
+ bool restore;
+};
+
+struct mlx5_vdpa_virtqueue {
+ bool ready;
+ u64 desc_addr;
+ u64 device_addr;
+ u64 driver_addr;
+ u32 num_ent;
+ struct vdpa_callback event_cb;
+
+ /* Resources for implementing the notification channel from the device
+ * to the driver. fwqp is the firmware end of an RC connection; the
+ * other end is vqqp, used by the driver. cq is where completions are
+ * reported.
+ */
+ struct mlx5_vdpa_cq cq;
+ struct mlx5_vdpa_qp fwqp;
+ struct mlx5_vdpa_qp vqqp;
+
+ /* umem resources are required for the virtqueue operation. Their use
+ * is internal and they must be provided by the driver.
+ */
+ struct mlx5_vdpa_umem umem1;
+ struct mlx5_vdpa_umem umem2;
+ struct mlx5_vdpa_umem umem3;
+
+ bool initialized;
+ int index;
+ u32 virtq_id;
+ struct mlx5_vdpa_net *ndev;
+ u16 avail_idx;
+ int fw_state;
+
+ /* keep last in the struct */
+ struct mlx5_vq_restore_info ri;
+};
+
+/* We will remove this limitation once mlx5_vdpa_alloc_resources()
+ * provides for driver space allocation
+ */
+#define MLX5_MAX_SUPPORTED_VQS 16
+
+struct mlx5_vdpa_net {
+ struct mlx5_vdpa_dev mvdev;
+ struct mlx5_vdpa_net_resources res;
+ struct virtio_net_config config;
+ struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS];
+
+ /* Serialize vq resources creation and destruction. This is required
+ * since the memory map might change and we need to destroy and
+ * re-create resources while the driver is operational.
+ */
+ struct mutex reslock;
+ struct mlx5_flow_table *rxft;
+ struct mlx5_fc *rx_counter;
+ struct mlx5_flow_handle *rx_rule;
+ bool setup;
+};
+
+static void free_resources(struct mlx5_vdpa_net *ndev);
+static void init_mvqs(struct mlx5_vdpa_net *ndev);
+static int setup_driver(struct mlx5_vdpa_net *ndev);
+static void teardown_driver(struct mlx5_vdpa_net *ndev);
+
+static bool mlx5_vdpa_debug;
+
+#define MLX5_LOG_VIO_FLAG(_feature) \
+ do { \
+ if (features & BIT(_feature)) \
+ mlx5_vdpa_info(mvdev, "%s\n", #_feature); \
+ } while (0)
+
+#define MLX5_LOG_VIO_STAT(_status) \
+ do { \
+ if (status & (_status)) \
+ mlx5_vdpa_info(mvdev, "%s\n", #_status); \
+ } while (0)
+
+static void print_status(struct mlx5_vdpa_dev *mvdev, u8 status, bool set)
+{
+ if (status & ~VALID_STATUS_MASK)
+ mlx5_vdpa_warn(mvdev, "Warning: there are invalid status bits 0x%x\n",
+ status & ~VALID_STATUS_MASK);
+
+ if (!mlx5_vdpa_debug)
+ return;
+
+ mlx5_vdpa_info(mvdev, "driver status %s", set ? "set" : "get");
+ if (set && !status) {
+ mlx5_vdpa_info(mvdev, "driver resets the device\n");
+ return;
+ }
+
+ MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_ACKNOWLEDGE);
+ MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER);
+ MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER_OK);
+ MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FEATURES_OK);
+ MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_NEEDS_RESET);
+ MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FAILED);
+}
+
+static void print_features(struct mlx5_vdpa_dev *mvdev, u64 features, bool set)
+{
+ if (features & ~VALID_FEATURES_MASK)
+ mlx5_vdpa_warn(mvdev, "There are invalid feature bits 0x%llx\n",
+ features & ~VALID_FEATURES_MASK);
+
+ if (!mlx5_vdpa_debug)
+ return;
+
+ mlx5_vdpa_info(mvdev, "driver %s feature bits:\n", set ? "sets" : "reads");
+ if (!features)
+ mlx5_vdpa_info(mvdev, "all feature bits are cleared\n");
+
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CSUM);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_CSUM);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MTU);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MAC);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO4);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO6);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ECN);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_UFO);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO4);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO6);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_ECN);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_UFO);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MRG_RXBUF);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STATUS);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VQ);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VLAN);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX_EXTRA);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ANNOUNCE);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MQ);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_MAC_ADDR);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HASH_REPORT);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSS);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSC_EXT);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STANDBY);
+ MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_SPEED_DUPLEX);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_NOTIFY_ON_EMPTY);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_ANY_LAYOUT);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_VERSION_1);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_IOMMU_PLATFORM);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_RING_PACKED);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_ORDER_PLATFORM);
+ MLX5_LOG_VIO_FLAG(VIRTIO_F_SR_IOV);
+}
+
+static int create_tis(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
+ u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
+ void *tisc;
+ int err;
+
+ tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
+ MLX5_SET(tisc, tisc, transport_domain, ndev->res.tdn);
+ err = mlx5_vdpa_create_tis(mvdev, in, &ndev->res.tisn);
+ if (err)
+ mlx5_vdpa_warn(mvdev, "create TIS (%d)\n", err);
+
+ return err;
+}
+
+static void destroy_tis(struct mlx5_vdpa_net *ndev)
+{
+ mlx5_vdpa_destroy_tis(&ndev->mvdev, ndev->res.tisn);
+}
+
+#define MLX5_VDPA_CQE_SIZE 64
+#define MLX5_VDPA_LOG_CQE_SIZE ilog2(MLX5_VDPA_CQE_SIZE)
+
+static int cq_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf, int nent)
+{
+ struct mlx5_frag_buf *frag_buf = &buf->frag_buf;
+ u8 log_wq_stride = MLX5_VDPA_LOG_CQE_SIZE;
+ u8 log_wq_sz = MLX5_VDPA_LOG_CQE_SIZE;
+ int err;
+
+ err = mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, nent * MLX5_VDPA_CQE_SIZE, frag_buf,
+ ndev->mvdev.mdev->priv.numa_node);
+ if (err)
+ return err;
+
+ mlx5_init_fbc(frag_buf->frags, log_wq_stride, log_wq_sz, &buf->fbc);
+
+ buf->cqe_size = MLX5_VDPA_CQE_SIZE;
+ buf->nent = nent;
+
+ return 0;
+}
+
+static int umem_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem, int size)
+{
+ struct mlx5_frag_buf *frag_buf = &umem->frag_buf;
+
+ return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, size, frag_buf,
+ ndev->mvdev.mdev->priv.numa_node);
+}
+
+static void cq_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf)
+{
+ mlx5_frag_buf_free(ndev->mvdev.mdev, &buf->frag_buf);
+}
+
+static void *get_cqe(struct mlx5_vdpa_cq *vcq, int n)
+{
+ return mlx5_frag_buf_get_wqe(&vcq->buf.fbc, n);
+}
+
+static void cq_frag_buf_init(struct mlx5_vdpa_cq *vcq, struct mlx5_vdpa_cq_buf *buf)
+{
+ struct mlx5_cqe64 *cqe64;
+ void *cqe;
+ int i;
+
+ for (i = 0; i < buf->nent; i++) {
+ cqe = get_cqe(vcq, i);
+ cqe64 = cqe;
+ cqe64->op_own = MLX5_CQE_INVALID << 4;
+ }
+}
+
+static void *get_sw_cqe(struct mlx5_vdpa_cq *cq, int n)
+{
+ struct mlx5_cqe64 *cqe64 = get_cqe(cq, n & (cq->cqe - 1));
+
+ if (likely(get_cqe_opcode(cqe64) != MLX5_CQE_INVALID) &&
+ !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & cq->cqe)))
+ return cqe64;
+
+ return NULL;
+}
+
+static void rx_post(struct mlx5_vdpa_qp *vqp, int n)
+{
+ vqp->head += n;
+ vqp->db.db[0] = cpu_to_be32(vqp->head);
+}
+
+static void qp_prepare(struct mlx5_vdpa_net *ndev, bool fw, void *in,
+ struct mlx5_vdpa_virtqueue *mvq, u32 num_ent)
+{
+ struct mlx5_vdpa_qp *vqp;
+ __be64 *pas;
+ void *qpc;
+
+ vqp = fw ? &mvq->fwqp : &mvq->vqqp;
+ MLX5_SET(create_qp_in, in, uid, ndev->mvdev.res.uid);
+ qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+ if (vqp->fw) {
+ /* Firmware QP is allocated by the driver for the firmware's
+ * use so we can skip part of the params as they will be chosen by firmware
+ */
+ qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+ MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
+ MLX5_SET(qpc, qpc, no_sq, 1);
+ return;
+ }
+
+ MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+ MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+ MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
+ MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+ MLX5_SET(qpc, qpc, uar_page, ndev->mvdev.res.uar->index);
+ MLX5_SET(qpc, qpc, log_page_size, vqp->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+ MLX5_SET(qpc, qpc, no_sq, 1);
+ MLX5_SET(qpc, qpc, cqn_rcv, mvq->cq.mcq.cqn);
+ MLX5_SET(qpc, qpc, log_rq_size, ilog2(num_ent));
+ MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
+ pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, in, pas);
+ mlx5_fill_page_frag_array(&vqp->frag_buf, pas);
+}
+
+static int rq_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp, u32 num_ent)
+{
+ return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev,
+ num_ent * sizeof(struct mlx5_wqe_data_seg), &vqp->frag_buf,
+ ndev->mvdev.mdev->priv.numa_node);
+}
+
+static void rq_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
+{
+ mlx5_frag_buf_free(ndev->mvdev.mdev, &vqp->frag_buf);
+}
+
+static int qp_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
+ struct mlx5_vdpa_qp *vqp)
+{
+ struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+ int inlen = MLX5_ST_SZ_BYTES(create_qp_in);
+ u32 out[MLX5_ST_SZ_DW(create_qp_out)] = {};
+ void *qpc;
+ void *in;
+ int err;
+
+ if (!vqp->fw) {
+ vqp = &mvq->vqqp;
+ err = rq_buf_alloc(ndev, vqp, mvq->num_ent);
+ if (err)
+ return err;
+
+ err = mlx5_db_alloc(ndev->mvdev.mdev, &vqp->db);
+ if (err)
+ goto err_db;
+ inlen += vqp->frag_buf.npages * sizeof(__be64);
+ }
+
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in) {
+ err = -ENOMEM;
+ goto err_kzalloc;
+ }
+
+ qp_prepare(ndev, vqp->fw, in, mvq, mvq->num_ent);
+ qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
+ MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
+ MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
+ MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
+ MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+ if (!vqp->fw)
+ MLX5_SET64(qpc, qpc, dbr_addr, vqp->db.dma);
+ MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
+ err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
+ kfree(in);
+ if (err)
+ goto err_kzalloc;
+
+ vqp->mqp.uid = MLX5_GET(create_qp_in, in, uid);
+ vqp->mqp.qpn = MLX5_GET(create_qp_out, out, qpn);
+
+ if (!vqp->fw)
+ rx_post(vqp, mvq->num_ent);
+
+ return 0;
+
+err_kzalloc:
+ if (!vqp->fw)
+ mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
+err_db:
+ if (!vqp->fw)
+ rq_buf_free(ndev, vqp);
+
+ return err;
+}
+
+static void qp_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_qp_in)] = {};
+
+ MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
+ MLX5_SET(destroy_qp_in, in, qpn, vqp->mqp.qpn);
+ MLX5_SET(destroy_qp_in, in, uid, ndev->mvdev.res.uid);
+ if (mlx5_cmd_exec_in(ndev->mvdev.mdev, destroy_qp, in))
+ mlx5_vdpa_warn(&ndev->mvdev, "destroy qp 0x%x\n", vqp->mqp.qpn);
+ if (!vqp->fw) {
+ mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
+ rq_buf_free(ndev, vqp);
+ }
+}
+
+static void *next_cqe_sw(struct mlx5_vdpa_cq *cq)
+{
+ return get_sw_cqe(cq, cq->mcq.cons_index);
+}
+
+static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
+{
+ struct mlx5_cqe64 *cqe64;
+
+ cqe64 = next_cqe_sw(vcq);
+ if (!cqe64)
+ return -EAGAIN;
+
+ vcq->mcq.cons_index++;
+ return 0;
+}
+
+static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+ mlx5_cq_set_ci(&mvq->cq.mcq);
+ rx_post(&mvq->vqqp, num);
+ if (mvq->event_cb.callback)
+ mvq->event_cb.callback(mvq->event_cb.private);
+}
+
+static void mlx5_vdpa_cq_comp(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
+{
+ struct mlx5_vdpa_virtqueue *mvq = container_of(mcq, struct mlx5_vdpa_virtqueue, cq.mcq);
+ struct mlx5_vdpa_net *ndev = mvq->ndev;
+ void __iomem *uar_page = ndev->mvdev.res.uar->map;
+ int num = 0;
+
+ while (!mlx5_vdpa_poll_one(&mvq->cq)) {
+ num++;
+ if (num > mvq->num_ent / 2) {
+ /* If completions keep coming while we poll, we want to
+ * let the hardware know that we consumed them by
+ * updating the doorbell record. We also let the vdpa core
+ * know about this so it can pass it on to the virtio driver
+ * in the guest.
+ */
+ mlx5_vdpa_handle_completions(mvq, num);
+ num = 0;
+ }
+ }
+
+ if (num)
+ mlx5_vdpa_handle_completions(mvq, num);
+
+ mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
+}
+
+static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent)
+{
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+ struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+ void __iomem *uar_page = ndev->mvdev.res.uar->map;
+ u32 out[MLX5_ST_SZ_DW(create_cq_out)];
+ struct mlx5_vdpa_cq *vcq = &mvq->cq;
+ unsigned int irqn;
+ __be64 *pas;
+ int inlen;
+ void *cqc;
+ void *in;
+ int err;
+ int eqn;
+
+ err = mlx5_db_alloc(mdev, &vcq->db);
+ if (err)
+ return err;
+
+ vcq->mcq.set_ci_db = vcq->db.db;
+ vcq->mcq.arm_db = vcq->db.db + 1;
+ vcq->mcq.cqe_sz = 64;
+
+ err = cq_frag_buf_alloc(ndev, &vcq->buf, num_ent);
+ if (err)
+ goto err_db;
+
+ cq_frag_buf_init(vcq, &vcq->buf);
+
+ inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
+ MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * vcq->buf.frag_buf.npages;
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in) {
+ err = -ENOMEM;
+ goto err_vzalloc;
+ }
+
+ MLX5_SET(create_cq_in, in, uid, ndev->mvdev.res.uid);
+ pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
+ mlx5_fill_page_frag_array(&vcq->buf.frag_buf, pas);
+
+ cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
+ MLX5_SET(cqc, cqc, log_page_size, vcq->buf.frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+
+ /* Use vector 0 by default. Consider adding code to choose least used
+ * vector.
+ */
+ err = mlx5_vector2eqn(mdev, 0, &eqn, &irqn);
+ if (err)
+ goto err_vec;
+
+ MLX5_SET(cqc, cqc, log_cq_size, ilog2(num_ent));
+ MLX5_SET(cqc, cqc, uar_page, ndev->mvdev.res.uar->index);
+ MLX5_SET(cqc, cqc, c_eqn, eqn);
+ MLX5_SET64(cqc, cqc, dbr_addr, vcq->db.dma);
+
+ err = mlx5_core_create_cq(mdev, &vcq->mcq, in, inlen, out, sizeof(out));
+ if (err)
+ goto err_vec;
+
+ vcq->mcq.comp = mlx5_vdpa_cq_comp;
+ vcq->cqe = num_ent;
+ mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
+ kfree(in);
+ return 0;
+
+err_vec:
+ kfree(in);
+err_vzalloc:
+ cq_frag_buf_free(ndev, &vcq->buf);
+err_db:
+ mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
+ return err;
+}
+
+static void cq_destroy(struct mlx5_vdpa_net *ndev, u16 idx)
+{
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+ struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+ struct mlx5_vdpa_cq *vcq = &mvq->cq;
+
+ if (mlx5_core_destroy_cq(mdev, &vcq->mcq)) {
+ mlx5_vdpa_warn(&ndev->mvdev, "destroy CQ 0x%x\n", vcq->mcq.cqn);
+ return;
+ }
+ cq_frag_buf_free(ndev, &vcq->buf);
+ mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
+}
+
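+/* The device reports, per umem, a pair of parameters (a, b) in its VDPA
+ * emulation capabilities; the umem size it requires is then
+ * a * queue_size + b bytes.
+ */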
+static int umem_size(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num,
+ struct mlx5_vdpa_umem **umemp)
+{
+ struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
+ int p_a;
+ int p_b;
+
+ switch (num) {
+ case 1:
+ p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_a);
+ p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_b);
+ *umemp = &mvq->umem1;
+ break;
+ case 2:
+ p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_a);
+ p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_b);
+ *umemp = &mvq->umem2;
+ break;
+ case 3:
+ p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_a);
+ p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_b);
+ *umemp = &mvq->umem3;
+		break;
+	default:
+		return -EINVAL;
+	}
+ return p_a * mvq->num_ent + p_b;
+}
+
+static void umem_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem)
+{
+ mlx5_frag_buf_free(ndev->mvdev.mdev, &umem->frag_buf);
+}
+
+static int create_umem(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+ int inlen;
+ u32 out[MLX5_ST_SZ_DW(create_umem_out)] = {};
+ void *um;
+ void *in;
+ int err;
+ __be64 *pas;
+ int size;
+ struct mlx5_vdpa_umem *umem;
+
+ size = umem_size(ndev, mvq, num, &umem);
+ if (size < 0)
+ return size;
+
+ umem->size = size;
+ err = umem_frag_buf_alloc(ndev, umem, size);
+ if (err)
+ return err;
+
+ inlen = MLX5_ST_SZ_BYTES(create_umem_in) + MLX5_ST_SZ_BYTES(mtt) * umem->frag_buf.npages;
+
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in) {
+ err = -ENOMEM;
+ goto err_in;
+ }
+
+ MLX5_SET(create_umem_in, in, opcode, MLX5_CMD_OP_CREATE_UMEM);
+ MLX5_SET(create_umem_in, in, uid, ndev->mvdev.res.uid);
+ um = MLX5_ADDR_OF(create_umem_in, in, umem);
+ MLX5_SET(umem, um, log_page_size, umem->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
+ MLX5_SET64(umem, um, num_of_mtt, umem->frag_buf.npages);
+
+ pas = (__be64 *)MLX5_ADDR_OF(umem, um, mtt[0]);
+ mlx5_fill_page_frag_array_perm(&umem->frag_buf, pas, MLX5_MTT_PERM_RW);
+
+ err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "create umem(%d)\n", err);
+ goto err_cmd;
+ }
+
+ kfree(in);
+ umem->id = MLX5_GET(create_umem_out, out, umem_id);
+
+ return 0;
+
+err_cmd:
+ kfree(in);
+err_in:
+ umem_frag_buf_free(ndev, umem);
+ return err;
+}
+
+static void umem_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_umem_in)] = {};
+ u32 out[MLX5_ST_SZ_DW(destroy_umem_out)] = {};
+ struct mlx5_vdpa_umem *umem;
+
+ switch (num) {
+ case 1:
+ umem = &mvq->umem1;
+ break;
+ case 2:
+ umem = &mvq->umem2;
+ break;
+ case 3:
+ umem = &mvq->umem3;
+		break;
+	default:
+		return;
+	}
+
+ MLX5_SET(destroy_umem_in, in, opcode, MLX5_CMD_OP_DESTROY_UMEM);
+ MLX5_SET(destroy_umem_in, in, umem_id, umem->id);
+ if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out)))
+ return;
+
+ umem_frag_buf_free(ndev, umem);
+}
+
+static int umems_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ int num;
+ int err;
+
+ for (num = 1; num <= 3; num++) {
+ err = create_umem(ndev, mvq, num);
+ if (err)
+ goto err_umem;
+ }
+ return 0;
+
+err_umem:
+ for (num--; num > 0; num--)
+ umem_destroy(ndev, mvq, num);
+
+ return err;
+}
+
+static void umems_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ int num;
+
+ for (num = 3; num > 0; num--)
+ umem_destroy(ndev, mvq, num);
+}
+
+static int get_queue_type(struct mlx5_vdpa_net *ndev)
+{
+ u32 type_mask;
+
+ type_mask = MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, virtio_queue_type);
+
+	/* prefer split queue */
+	if (type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT)
+		return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_SPLIT;
+
+	WARN_ON(!(type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_PACKED));
+
+	return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_PACKED;
+}
+
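+/* In virtio-net, even virtqueue indices are receive queues and odd indices
+ * are transmit queues.
+ */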
+static bool vq_is_tx(u16 idx)
+{
+ return idx % 2;
+}
+
+static u16 get_features_12_3(u64 features)
+{
+ return (!!(features & BIT(VIRTIO_NET_F_HOST_TSO4)) << 9) |
+ (!!(features & BIT(VIRTIO_NET_F_HOST_TSO6)) << 8) |
+ (!!(features & BIT(VIRTIO_NET_F_CSUM)) << 7) |
+ (!!(features & BIT(VIRTIO_NET_F_GUEST_CSUM)) << 6);
+}
+
+static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
+ u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
+ void *obj_context;
+ void *cmd_hdr;
+ void *vq_ctx;
+ void *in;
+ int err;
+
+ err = umems_create(ndev, mvq);
+ if (err)
+ return err;
+
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in) {
+ err = -ENOMEM;
+ goto err_alloc;
+ }
+
+ cmd_hdr = MLX5_ADDR_OF(create_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+
+ obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
+ MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+ MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
+ get_features_12_3(ndev->mvdev.actual_features));
+ vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_type, get_queue_type(ndev));
+
+ if (vq_is_tx(mvq->index))
+ MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, ndev->res.tisn);
+
+ MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
+ MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
+ MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
+ MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
+	MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
+		 !!(ndev->mvdev.actual_features & BIT(VIRTIO_F_VERSION_1)));
+ MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
+ MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
+ MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
+ MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, ndev->mvdev.mr.mkey.key);
+ MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
+ MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
+ MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_2_size, mvq->umem2.size);
+ MLX5_SET(virtio_q, vq_ctx, umem_3_id, mvq->umem3.id);
+	MLX5_SET(virtio_q, vq_ctx, umem_3_size, mvq->umem3.size);
+ MLX5_SET(virtio_q, vq_ctx, pd, ndev->mvdev.res.pdn);
+ if (MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, eth_frame_offload_type))
+ MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0, 1);
+
+ err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+ if (err)
+ goto err_cmd;
+
+ kfree(in);
+ mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
+
+ return 0;
+
+err_cmd:
+ kfree(in);
+err_alloc:
+ umems_destroy(ndev, mvq);
+ return err;
+}
+
+static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ u32 in[MLX5_ST_SZ_DW(destroy_virtio_net_q_in)] = {};
+ u32 out[MLX5_ST_SZ_DW(destroy_virtio_net_q_out)] = {};
+
+ MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.opcode,
+ MLX5_CMD_OP_DESTROY_GENERAL_OBJECT);
+ MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_id, mvq->virtq_id);
+ MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.uid, ndev->mvdev.res.uid);
+ MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_type,
+ MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+ if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out))) {
+ mlx5_vdpa_warn(&ndev->mvdev, "destroy virtqueue 0x%x\n", mvq->virtq_id);
+ return;
+ }
+ umems_destroy(ndev, mvq);
+}
+
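+/* get_qpn() returns the number of the given QP itself; get_rqpn() returns the
+ * number of its peer at the other end of the RC connection.
+ */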
+static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
+{
+ return fw ? mvq->vqqp.mqp.qpn : mvq->fwqp.mqp.qpn;
+}
+
+static u32 get_qpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
+{
+ return fw ? mvq->fwqp.mqp.qpn : mvq->vqqp.mqp.qpn;
+}
+
+static void alloc_inout(struct mlx5_vdpa_net *ndev, int cmd, void **in, int *inlen, void **out,
+ int *outlen, u32 qpn, u32 rqpn)
+{
+ void *qpc;
+ void *pp;
+
+ switch (cmd) {
+ case MLX5_CMD_OP_2RST_QP:
+ *inlen = MLX5_ST_SZ_BYTES(qp_2rst_in);
+ *outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
+ *in = kzalloc(*inlen, GFP_KERNEL);
+ *out = kzalloc(*outlen, GFP_KERNEL);
+		if (!*in || !*out)
+ goto outerr;
+
+ MLX5_SET(qp_2rst_in, *in, opcode, cmd);
+ MLX5_SET(qp_2rst_in, *in, uid, ndev->mvdev.res.uid);
+ MLX5_SET(qp_2rst_in, *in, qpn, qpn);
+ break;
+ case MLX5_CMD_OP_RST2INIT_QP:
+ *inlen = MLX5_ST_SZ_BYTES(rst2init_qp_in);
+ *outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
+ *in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(*outlen, GFP_KERNEL);
+		if (!*in || !*out)
+ goto outerr;
+
+ MLX5_SET(rst2init_qp_in, *in, opcode, cmd);
+ MLX5_SET(rst2init_qp_in, *in, uid, ndev->mvdev.res.uid);
+ MLX5_SET(rst2init_qp_in, *in, qpn, qpn);
+ qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
+ MLX5_SET(qpc, qpc, remote_qpn, rqpn);
+ MLX5_SET(qpc, qpc, rwe, 1);
+ pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+ MLX5_SET(ads, pp, vhca_port_num, 1);
+ break;
+ case MLX5_CMD_OP_INIT2RTR_QP:
+ *inlen = MLX5_ST_SZ_BYTES(init2rtr_qp_in);
+ *outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
+ *in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(*outlen, GFP_KERNEL);
+		if (!*in || !*out)
+ goto outerr;
+
+ MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);
+ MLX5_SET(init2rtr_qp_in, *in, uid, ndev->mvdev.res.uid);
+ MLX5_SET(init2rtr_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(init2rtr_qp_in, *in, qpc);
+ MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
+ MLX5_SET(qpc, qpc, log_msg_max, 30);
+ MLX5_SET(qpc, qpc, remote_qpn, rqpn);
+ pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+ MLX5_SET(ads, pp, fl, 1);
+ break;
+ case MLX5_CMD_OP_RTR2RTS_QP:
+ *inlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_in);
+ *outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
+ *in = kzalloc(*inlen, GFP_KERNEL);
+		*out = kzalloc(*outlen, GFP_KERNEL);
+		if (!*in || !*out)
+ goto outerr;
+
+ MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);
+ MLX5_SET(rtr2rts_qp_in, *in, uid, ndev->mvdev.res.uid);
+ MLX5_SET(rtr2rts_qp_in, *in, qpn, qpn);
+		qpc = MLX5_ADDR_OF(rtr2rts_qp_in, *in, qpc);
+ pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
+ MLX5_SET(ads, pp, ack_timeout, 14);
+ MLX5_SET(qpc, qpc, retry_count, 7);
+ MLX5_SET(qpc, qpc, rnr_retry, 7);
+ break;
+	default:
+		*in = NULL;
+		*out = NULL;
+		return;
+	}
+
+ return;
+
+outerr:
+ kfree(*in);
+ kfree(*out);
+ *in = NULL;
+ *out = NULL;
+}
+
+static void free_inout(void *in, void *out)
+{
+ kfree(in);
+ kfree(out);
+}
+
+/* Two QPs are used by each virtqueue. One is used by the driver and one by
+ * firmware. The fw argument indicates whether the QP being modified is the one
+ * used by firmware.
+ */
+static int modify_qp(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, bool fw, int cmd)
+{
+ int outlen;
+ int inlen;
+ void *out;
+ void *in;
+ int err;
+
+ alloc_inout(ndev, cmd, &in, &inlen, &out, &outlen, get_qpn(mvq, fw), get_rqpn(mvq, fw));
+ if (!in || !out)
+ return -ENOMEM;
+
+ err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, outlen);
+ free_inout(in, out);
+ return err;
+}
+
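+/* Walk both ends of the RC connection through the RESET -> INIT -> RTR state
+ * transitions. Only the firmware QP is taken all the way to RTS since it is
+ * the only side posting sends.
+ */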
+static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ int err;
+
+ err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_2RST_QP);
+ if (err)
+ return err;
+
+ err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_2RST_QP);
+ if (err)
+ return err;
+
+ err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_RST2INIT_QP);
+ if (err)
+ return err;
+
+ err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_RST2INIT_QP);
+ if (err)
+ return err;
+
+ err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_INIT2RTR_QP);
+ if (err)
+ return err;
+
+ err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_INIT2RTR_QP);
+ if (err)
+ return err;
+
+ return modify_qp(ndev, mvq, true, MLX5_CMD_OP_RTR2RTS_QP);
+}
+
+struct mlx5_virtq_attr {
+ u8 state;
+ u16 available_index;
+};
+
+static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
+ struct mlx5_virtq_attr *attr)
+{
+ int outlen = MLX5_ST_SZ_BYTES(query_virtio_net_q_out);
+ u32 in[MLX5_ST_SZ_DW(query_virtio_net_q_in)] = {};
+ void *out;
+ void *obj_context;
+ void *cmd_hdr;
+ int err;
+
+ out = kzalloc(outlen, GFP_KERNEL);
+ if (!out)
+ return -ENOMEM;
+
+ cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+ err = mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, outlen);
+ if (err)
+ goto err_cmd;
+
+ obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, out, obj_context);
+ memset(attr, 0, sizeof(*attr));
+ attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
+ attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
+ kfree(out);
+ return 0;
+
+err_cmd:
+ kfree(out);
+ return err;
+}
+
+static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
+{
+ int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
+ u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
+ void *obj_context;
+ void *cmd_hdr;
+ void *in;
+ int err;
+
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
+
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
+ MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
+
+ obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
+ MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
+ MLX5_VIRTQ_MODIFY_MASK_STATE);
+ MLX5_SET(virtio_net_q_object, obj_context, state, state);
+ err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
+ kfree(in);
+ if (!err)
+ mvq->fw_state = state;
+
+ return err;
+}
+
+static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ u16 idx = mvq->index;
+ int err;
+
+ if (!mvq->num_ent)
+ return 0;
+
+ if (mvq->initialized) {
+ mlx5_vdpa_warn(&ndev->mvdev, "attempt re init\n");
+ return -EINVAL;
+ }
+
+ err = cq_create(ndev, idx, mvq->num_ent);
+ if (err)
+ return err;
+
+ err = qp_create(ndev, mvq, &mvq->fwqp);
+ if (err)
+ goto err_fwqp;
+
+ err = qp_create(ndev, mvq, &mvq->vqqp);
+ if (err)
+ goto err_vqqp;
+
+ err = connect_qps(ndev, mvq);
+ if (err)
+ goto err_connect;
+
+ err = create_virtqueue(ndev, mvq);
+ if (err)
+ goto err_connect;
+
+ if (mvq->ready) {
+ err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
+ idx, err);
+ goto err_connect;
+ }
+ }
+
+ mvq->initialized = true;
+ return 0;
+
+err_connect:
+ qp_destroy(ndev, &mvq->vqqp);
+err_vqqp:
+ qp_destroy(ndev, &mvq->fwqp);
+err_fwqp:
+ cq_destroy(ndev, idx);
+ return err;
+}
+
+static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ struct mlx5_virtq_attr attr;
+
+ if (!mvq->initialized)
+ return;
+
+ if (query_virtqueue(ndev, mvq, &attr)) {
+ mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
+ return;
+ }
+ if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
+ return;
+
+ if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
+ mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
+}
+
+static void suspend_vqs(struct mlx5_vdpa_net *ndev)
+{
+ int i;
+
+ for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
+ suspend_vq(ndev, &ndev->vqs[i]);
+}
+
+static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ if (!mvq->initialized || !mvq->num_ent)
+ return;
+
+ suspend_vq(ndev, mvq);
+ destroy_virtqueue(ndev, mvq);
+ qp_destroy(ndev, &mvq->vqqp);
+ qp_destroy(ndev, &mvq->fwqp);
+ cq_destroy(ndev, mvq->index);
+ mvq->initialized = false;
+}
+
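+/* Only receive virtqueues are placed in the RQT; transmit queues are attached
+ * to the TIS and take no part in receive side steering.
+ */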
+static int create_rqt(struct mlx5_vdpa_net *ndev)
+{
+ int log_max_rqt;
+ __be32 *list;
+ void *rqtc;
+ int inlen;
+ void *in;
+ int i, j;
+ int err;
+
+ log_max_rqt = min_t(int, 1, MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
+ if (log_max_rqt < 1)
+ return -EOPNOTSUPP;
+
+ inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + (1 << log_max_rqt) * MLX5_ST_SZ_BYTES(rq_num);
+ in = kzalloc(inlen, GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_rqt_in, in, uid, ndev->mvdev.res.uid);
+ rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
+
+ MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
+ MLX5_SET(rqtc, rqtc, rqt_max_size, 1 << log_max_rqt);
+ MLX5_SET(rqtc, rqtc, rqt_actual_size, 1);
+ list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
+ for (i = 0, j = 0; j < ndev->mvdev.max_vqs; j++) {
+ if (!ndev->vqs[j].initialized)
+ continue;
+
+ if (!vq_is_tx(ndev->vqs[j].index)) {
+ list[i] = cpu_to_be32(ndev->vqs[j].virtq_id);
+ i++;
+ }
+ }
+
+	err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
+	kfree(in);
+	return err;
+}
+
+static void destroy_rqt(struct mlx5_vdpa_net *ndev)
+{
+ mlx5_vdpa_destroy_rqt(&ndev->mvdev, ndev->res.rqtn);
+}
+
+static int create_tir(struct mlx5_vdpa_net *ndev)
+{
+#define HASH_IP_L4PORTS \
+ (MLX5_HASH_FIELD_SEL_SRC_IP | MLX5_HASH_FIELD_SEL_DST_IP | MLX5_HASH_FIELD_SEL_L4_SPORT | \
+ MLX5_HASH_FIELD_SEL_L4_DPORT)
+ static const u8 rx_hash_toeplitz_key[] = { 0x2c, 0xc6, 0x81, 0xd1, 0x5b, 0xdb, 0xf4, 0xf7,
+ 0xfc, 0xa2, 0x83, 0x19, 0xdb, 0x1a, 0x3e, 0x94,
+ 0x6b, 0x9e, 0x38, 0xd9, 0x2c, 0x9c, 0x03, 0xd1,
+ 0xad, 0x99, 0x44, 0xa7, 0xd9, 0x56, 0x3d, 0x59,
+ 0x06, 0x3c, 0x25, 0xf3, 0xfc, 0x1f, 0xdc, 0x2a };
+ void *rss_key;
+ void *outer;
+ void *tirc;
+ void *in;
+ int err;
+
+ in = kzalloc(MLX5_ST_SZ_BYTES(create_tir_in), GFP_KERNEL);
+ if (!in)
+ return -ENOMEM;
+
+ MLX5_SET(create_tir_in, in, uid, ndev->mvdev.res.uid);
+ tirc = MLX5_ADDR_OF(create_tir_in, in, ctx);
+ MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_INDIRECT);
+
+ MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
+ MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_TOEPLITZ);
+ rss_key = MLX5_ADDR_OF(tirc, tirc, rx_hash_toeplitz_key);
+ memcpy(rss_key, rx_hash_toeplitz_key, sizeof(rx_hash_toeplitz_key));
+
+ outer = MLX5_ADDR_OF(tirc, tirc, rx_hash_field_selector_outer);
+ MLX5_SET(rx_hash_field_select, outer, l3_prot_type, MLX5_L3_PROT_TYPE_IPV4);
+ MLX5_SET(rx_hash_field_select, outer, l4_prot_type, MLX5_L4_PROT_TYPE_TCP);
+ MLX5_SET(rx_hash_field_select, outer, selected_fields, HASH_IP_L4PORTS);
+
+ MLX5_SET(tirc, tirc, indirect_table, ndev->res.rqtn);
+ MLX5_SET(tirc, tirc, transport_domain, ndev->res.tdn);
+
+ err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn);
+ kfree(in);
+ return err;
+}
+
+static void destroy_tir(struct mlx5_vdpa_net *ndev)
+{
+ mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn);
+}
+
+static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_flow_destination dest[2] = {};
+ struct mlx5_flow_table_attr ft_attr = {};
+ struct mlx5_flow_act flow_act = {};
+ struct mlx5_flow_namespace *ns;
+ int err;
+
+ /* for now, one entry, match all, forward to tir */
+ ft_attr.max_fte = 1;
+ ft_attr.autogroup.max_num_groups = 1;
+
+ ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
+ if (!ns) {
+ mlx5_vdpa_warn(&ndev->mvdev, "get flow namespace\n");
+ return -EOPNOTSUPP;
+ }
+
+ ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
+ if (IS_ERR(ndev->rxft))
+ return PTR_ERR(ndev->rxft);
+
+ ndev->rx_counter = mlx5_fc_create(ndev->mvdev.mdev, false);
+ if (IS_ERR(ndev->rx_counter)) {
+ err = PTR_ERR(ndev->rx_counter);
+ goto err_fc;
+ }
+
+ flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT;
+ dest[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
+ dest[0].tir_num = ndev->res.tirn;
+ dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
+ dest[1].counter_id = mlx5_fc_id(ndev->rx_counter);
+ ndev->rx_rule = mlx5_add_flow_rules(ndev->rxft, NULL, &flow_act, dest, 2);
+ if (IS_ERR(ndev->rx_rule)) {
+ err = PTR_ERR(ndev->rx_rule);
+ ndev->rx_rule = NULL;
+ goto err_rule;
+ }
+
+ return 0;
+
+err_rule:
+ mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
+err_fc:
+ mlx5_destroy_flow_table(ndev->rxft);
+ return err;
+}
+
+static void remove_fwd_to_tir(struct mlx5_vdpa_net *ndev)
+{
+ if (!ndev->rx_rule)
+ return;
+
+ mlx5_del_flow_rules(ndev->rx_rule);
+ mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
+ mlx5_destroy_flow_table(ndev->rxft);
+
+ ndev->rx_rule = NULL;
+}
+
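+/* Let the device know the driver posted new buffers on this virtqueue by
+ * writing the queue index to the device kick address.
+ */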
+static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+ if (unlikely(!mvq->ready))
+ return;
+
+ iowrite16(idx, ndev->mvdev.res.kick_addr);
+}
+
+static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_area,
+ u64 driver_area, u64 device_area)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+ mvq->desc_addr = desc_area;
+ mvq->device_addr = device_area;
+ mvq->driver_addr = driver_area;
+ return 0;
+}
+
+static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq;
+
+ mvq = &ndev->vqs[idx];
+ mvq->num_ent = num;
+}
+
+static void mlx5_vdpa_set_vq_cb(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *vq = &ndev->vqs[idx];
+
+ vq->event_cb = *cb;
+}
+
+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+ int err;
+
+ if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
+ err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
+ return;
+ }
+ }
+
+ mvq->ready = ready;
+}
+
+static bool mlx5_vdpa_get_vq_ready(struct vdpa_device *vdev, u16 idx)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+ return mvq->ready;
+}
+
+static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
+ const struct vdpa_vq_state *state)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+
+ if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
+ mlx5_vdpa_warn(mvdev, "can't modify available index\n");
+ return -EINVAL;
+ }
+
+ mvq->avail_idx = state->avail_index;
+ return 0;
+}
+
+static void mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
+ struct mlx5_virtq_attr attr;
+ int err;
+
+ if (!mvq->initialized)
+ goto not_ready;
+
+ err = query_virtqueue(ndev, mvq, &attr);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
+ goto not_ready;
+ }
+	state->avail_index = attr.available_index;
+	state->state = 0;
+	return;
+
+not_ready:
+	state->state = BIT(VQ_STATE_NOT_READY);
+}
+
+static u32 mlx5_vdpa_get_vq_align(struct vdpa_device *vdev)
+{
+ return PAGE_SIZE;
+}
+
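+/* Feature bits as reported by the device in its device_features_bits_mask
+ * capability field.
+ */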
+enum {
+	MLX5_VIRTIO_NET_F_GUEST_CSUM = 1 << 9,
+	MLX5_VIRTIO_NET_F_CSUM = 1 << 10,
+	MLX5_VIRTIO_NET_F_HOST_TSO6 = 1 << 11,
+	MLX5_VIRTIO_NET_F_HOST_TSO4 = 1 << 12,
+};
+
+static u64 mlx_to_virtio_features(u16 dev_features)
+{
+ u64 result = 0;
+
+ if (dev_features & MLX5_VIRTIO_NET_F_GUEST_CSUM)
+ result |= BIT(VIRTIO_NET_F_GUEST_CSUM);
+ if (dev_features & MLX5_VIRTIO_NET_F_CSUM)
+ result |= BIT(VIRTIO_NET_F_CSUM);
+ if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO6)
+ result |= BIT(VIRTIO_NET_F_HOST_TSO6);
+ if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO4)
+ result |= BIT(VIRTIO_NET_F_HOST_TSO4);
+
+ return result;
+}
+
+static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ u16 dev_features;
+
+ dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
+	ndev->mvdev.mlx_features = mlx_to_virtio_features(dev_features);
+ if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
+ ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
+ ndev->mvdev.mlx_features |= BIT(VIRTIO_F_ANY_LAYOUT);
+ ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);
+ if (mlx5_vdpa_max_qps(ndev->mvdev.max_vqs) > 1)
+ ndev->mvdev.mlx_features |= BIT(VIRTIO_NET_F_MQ);
+
+ print_features(mvdev, ndev->mvdev.mlx_features, false);
+ return ndev->mvdev.mlx_features;
+}
+
+static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
+{
+	/* FIXME: qemu currently does not set all the features due to a bug.
+ * Add checks when this is fixed.
+ */
+ return 0;
+}
+
+static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+ int err;
+ int i;
+
+ for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); i++) {
+ err = setup_vq(ndev, &ndev->vqs[i]);
+ if (err)
+ goto err_vq;
+ }
+
+ return 0;
+
+err_vq:
+ for (--i; i >= 0; i--)
+ teardown_vq(ndev, &ndev->vqs[i]);
+
+ return err;
+}
+
+static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_vdpa_virtqueue *mvq;
+ int i;
+
+ for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+ mvq = &ndev->vqs[i];
+ if (!mvq->num_ent)
+ continue;
+
+ teardown_vq(ndev, mvq);
+ }
+}
+
+static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ int err;
+
+ print_features(mvdev, features, true);
+
+ err = verify_min_features(mvdev, features);
+ if (err)
+ return err;
+
+ ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
+ return err;
+}
+
+static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
+{
+ /* not implemented */
+ mlx5_vdpa_warn(to_mvdev(vdev), "set config callback not supported\n");
+}
+
+#define MLX5_VDPA_MAX_VQ_ENTRIES 256
+static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
+{
+ return MLX5_VDPA_MAX_VQ_ENTRIES;
+}
+
+static u32 mlx5_vdpa_get_device_id(struct vdpa_device *vdev)
+{
+ return VIRTIO_ID_NET;
+}
+
+static u32 mlx5_vdpa_get_vendor_id(struct vdpa_device *vdev)
+{
+ return PCI_VENDOR_ID_MELLANOX;
+}
+
+static u8 mlx5_vdpa_get_status(struct vdpa_device *vdev)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+ print_status(mvdev, ndev->mvdev.status, false);
+ return ndev->mvdev.status;
+}
+
+static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
+{
+ struct mlx5_vq_restore_info *ri = &mvq->ri;
+ struct mlx5_virtq_attr attr;
+ int err;
+
+ if (!mvq->initialized)
+ return 0;
+
+ err = query_virtqueue(ndev, mvq, &attr);
+ if (err)
+ return err;
+
+ ri->avail_index = attr.available_index;
+ ri->ready = mvq->ready;
+ ri->num_ent = mvq->num_ent;
+ ri->desc_addr = mvq->desc_addr;
+ ri->device_addr = mvq->device_addr;
+ ri->driver_addr = mvq->driver_addr;
+ ri->cb = mvq->event_cb;
+ ri->restore = true;
+ return 0;
+}
+
+static int save_channels_info(struct mlx5_vdpa_net *ndev)
+{
+ int i;
+
+ for (i = 0; i < ndev->mvdev.max_vqs; i++) {
+ memset(&ndev->vqs[i].ri, 0, sizeof(ndev->vqs[i].ri));
+ save_channel_info(ndev, &ndev->vqs[i]);
+ }
+ return 0;
+}
+
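+/* Clear everything in the virtqueue structs except the restore info, which is
+ * deliberately kept last in the struct so it survives the memset.
+ */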
+static void mlx5_clear_vqs(struct mlx5_vdpa_net *ndev)
+{
+ int i;
+
+ for (i = 0; i < ndev->mvdev.max_vqs; i++)
+ memset(&ndev->vqs[i], 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+}
+
+static void restore_channels_info(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_vdpa_virtqueue *mvq;
+ struct mlx5_vq_restore_info *ri;
+ int i;
+
+ mlx5_clear_vqs(ndev);
+ init_mvqs(ndev);
+ for (i = 0; i < ndev->mvdev.max_vqs; i++) {
+ mvq = &ndev->vqs[i];
+ ri = &mvq->ri;
+ if (!ri->restore)
+ continue;
+
+ mvq->avail_idx = ri->avail_index;
+ mvq->ready = ri->ready;
+ mvq->num_ent = ri->num_ent;
+ mvq->desc_addr = ri->desc_addr;
+ mvq->device_addr = ri->device_addr;
+ mvq->driver_addr = ri->driver_addr;
+ mvq->event_cb = ri->cb;
+ }
+}
+
+static int mlx5_vdpa_change_map(struct mlx5_vdpa_net *ndev, struct vhost_iotlb *iotlb)
+{
+ int err;
+
+ suspend_vqs(ndev);
+ err = save_channels_info(ndev);
+ if (err)
+ goto err_mr;
+
+ teardown_driver(ndev);
+ mlx5_vdpa_destroy_mr(&ndev->mvdev);
+ err = mlx5_vdpa_create_mr(&ndev->mvdev, iotlb);
+ if (err)
+ goto err_mr;
+
+ restore_channels_info(ndev);
+ err = setup_driver(ndev);
+ if (err)
+ goto err_setup;
+
+ return 0;
+
+err_setup:
+ mlx5_vdpa_destroy_mr(&ndev->mvdev);
+err_mr:
+ return err;
+}
+
+static int setup_driver(struct mlx5_vdpa_net *ndev)
+{
+ int err;
+
+ mutex_lock(&ndev->reslock);
+ if (ndev->setup) {
+ mlx5_vdpa_warn(&ndev->mvdev, "setup driver called for already setup driver\n");
+ err = 0;
+ goto out;
+ }
+ err = setup_virtqueues(ndev);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "setup_virtqueues\n");
+ goto out;
+ }
+
+ err = create_rqt(ndev);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "create_rqt\n");
+ goto err_rqt;
+ }
+
+ err = create_tir(ndev);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "create_tir\n");
+ goto err_tir;
+ }
+
+ err = add_fwd_to_tir(ndev);
+ if (err) {
+ mlx5_vdpa_warn(&ndev->mvdev, "add_fwd_to_tir\n");
+ goto err_fwd;
+ }
+ ndev->setup = true;
+ mutex_unlock(&ndev->reslock);
+
+ return 0;
+
+err_fwd:
+ destroy_tir(ndev);
+err_tir:
+ destroy_rqt(ndev);
+err_rqt:
+ teardown_virtqueues(ndev);
+out:
+ mutex_unlock(&ndev->reslock);
+ return err;
+}
+
+static void teardown_driver(struct mlx5_vdpa_net *ndev)
+{
+ mutex_lock(&ndev->reslock);
+ if (!ndev->setup)
+ goto out;
+
+ remove_fwd_to_tir(ndev);
+ destroy_tir(ndev);
+ destroy_rqt(ndev);
+ teardown_virtqueues(ndev);
+ ndev->setup = false;
+out:
+ mutex_unlock(&ndev->reslock);
+}
+
+static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ int err;
+
+ print_status(mvdev, status, true);
+ if (!status) {
+ mlx5_vdpa_info(mvdev, "performing device reset\n");
+ teardown_driver(ndev);
+ mlx5_vdpa_destroy_mr(&ndev->mvdev);
+ ndev->mvdev.status = 0;
+ ndev->mvdev.mlx_features = 0;
+ ++mvdev->generation;
+ return;
+ }
+
+ if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
+ if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
+ err = setup_driver(ndev);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
+ goto err_setup;
+ }
+ } else {
+ mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
+ return;
+ }
+ }
+
+ ndev->mvdev.status = status;
+ return;
+
+err_setup:
+ mlx5_vdpa_destroy_mr(&ndev->mvdev);
+ ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
+}
+
+static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
+ unsigned int len)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+
+	if (offset + len <= sizeof(struct virtio_net_config))
+		memcpy(buf, (u8 *)&ndev->config + offset, len);
+}
+
+static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
+ unsigned int len)
+{
+ /* not supported */
+}
+
+static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+
+ return mvdev->generation;
+}
+
+static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
+{
+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
+ bool change_map;
+ int err;
+
+ err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
+ if (err) {
+ mlx5_vdpa_warn(mvdev, "set map failed(%d)\n", err);
+ return err;
+ }
+
+ if (change_map)
+ return mlx5_vdpa_change_map(ndev, iotlb);
+
+ return 0;
+}
+
+static void mlx5_vdpa_free(struct vdpa_device *vdev)
+{
+ /* not used */
+}
+
+static const struct vdpa_config_ops mlx5_vdpa_ops = {
+ .set_vq_address = mlx5_vdpa_set_vq_address,
+ .set_vq_num = mlx5_vdpa_set_vq_num,
+ .kick_vq = mlx5_vdpa_kick_vq,
+ .set_vq_cb = mlx5_vdpa_set_vq_cb,
+ .set_vq_ready = mlx5_vdpa_set_vq_ready,
+ .get_vq_ready = mlx5_vdpa_get_vq_ready,
+ .set_vq_state = mlx5_vdpa_set_vq_state,
+ .get_vq_state = mlx5_vdpa_get_vq_state,
+ .get_vq_align = mlx5_vdpa_get_vq_align,
+ .get_features = mlx5_vdpa_get_features,
+ .set_features = mlx5_vdpa_set_features,
+ .set_config_cb = mlx5_vdpa_set_config_cb,
+ .get_vq_num_max = mlx5_vdpa_get_vq_num_max,
+ .get_device_id = mlx5_vdpa_get_device_id,
+ .get_vendor_id = mlx5_vdpa_get_vendor_id,
+ .get_status = mlx5_vdpa_get_status,
+ .set_status = mlx5_vdpa_set_status,
+ .get_config = mlx5_vdpa_get_config,
+ .set_config = mlx5_vdpa_set_config,
+ .get_generation = mlx5_vdpa_get_generation,
+ .set_map = mlx5_vdpa_set_map,
+ .free = mlx5_vdpa_free,
+};
+
+static int alloc_resources(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_vdpa_net_resources *res = &ndev->res;
+ int err;
+
+ if (res->valid) {
+ mlx5_vdpa_warn(&ndev->mvdev, "resources already allocated\n");
+ return -EEXIST;
+ }
+
+ err = mlx5_vdpa_alloc_transport_domain(&ndev->mvdev, &res->tdn);
+ if (err)
+ return err;
+
+ err = create_tis(ndev);
+ if (err)
+ goto err_tis;
+
+ res->valid = true;
+
+ return 0;
+
+err_tis:
+ mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
+ return err;
+}
+
+static void free_resources(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_vdpa_net_resources *res = &ndev->res;
+
+ if (!res->valid)
+ return;
+
+ destroy_tis(ndev);
+ mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
+ res->valid = false;
+}
+
+static void init_mvqs(struct mlx5_vdpa_net *ndev)
+{
+ struct mlx5_vdpa_virtqueue *mvq;
+ int i;
+
+ for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); ++i) {
+ mvq = &ndev->vqs[i];
+ memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+ mvq->index = i;
+ mvq->ndev = ndev;
+ mvq->fwqp.fw = true;
+ }
+ for (; i < ndev->mvdev.max_vqs; i++) {
+ mvq = &ndev->vqs[i];
+ memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
+ mvq->index = i;
+ mvq->ndev = ndev;
+ }
+}
+
+void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
+{
+ struct virtio_net_config *config;
+ struct mlx5_vdpa_dev *mvdev;
+ struct mlx5_vdpa_net *ndev;
+ u32 max_vqs;
+ int err;
+
+	/* we reserve one virtqueue for the control virtqueue, should we require it */
+ max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
+ max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
+
+ ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
+ 2 * mlx5_vdpa_max_qps(max_vqs));
+ if (IS_ERR(ndev))
+ return ndev;
+
+ ndev->mvdev.max_vqs = max_vqs;
+ mvdev = &ndev->mvdev;
+ mvdev->mdev = mdev;
+ init_mvqs(ndev);
+ mutex_init(&ndev->reslock);
+ config = &ndev->config;
+ err = mlx5_query_nic_vport_mtu(mdev, &config->mtu);
+ if (err)
+ goto err_mtu;
+
+ err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
+ if (err)
+ goto err_mtu;
+
+ mvdev->vdev.dma_dev = mdev->device;
+ err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
+ if (err)
+ goto err_mtu;
+
+ err = alloc_resources(ndev);
+ if (err)
+ goto err_res;
+
+ err = vdpa_register_device(&mvdev->vdev);
+ if (err)
+ goto err_reg;
+
+ return ndev;
+
+err_reg:
+ free_resources(ndev);
+err_res:
+ mlx5_vdpa_free_resources(&ndev->mvdev);
+err_mtu:
+ mutex_destroy(&ndev->reslock);
+ put_device(&mvdev->vdev.dev);
+ return ERR_PTR(err);
+}
+
+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
+{
+ struct mlx5_vdpa_net *ndev;
+
+ mvdev->status = 0;
+ ndev = container_of(mvdev, struct mlx5_vdpa_net, mvdev);
+ vdpa_unregister_device(&ndev->mvdev.vdev);
+ free_resources(ndev);
+ mlx5_vdpa_free_resources(&ndev->mvdev);
+ mutex_destroy(&ndev->reslock);
+ put_device(&mvdev->vdev.dev);
+}
diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
new file mode 100644
index 000000000000..38bb9adaadc1
--- /dev/null
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
@@ -0,0 +1,32 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2020 Mellanox Technologies Ltd. */
+
+#ifndef __MLX5_VNET_H_
+#define __MLX5_VNET_H_
+
+#include <linux/vdpa.h>
+#include <linux/virtio_net.h>
+#include <linux/vringh.h>
+#include <linux/mlx5/driver.h>
+#include <linux/mlx5/cq.h>
+#include <linux/mlx5/qp.h>
+#include "../core/mlx5_vdpa.h"
+
+/* we want to have one virtqueue reserved for control queue. We do this
+ * reservation only if we have more than two vqs available.
+ */
+static inline u32 mlx5_vdpa_max_qps(int max_vqs)
+{
+ if (max_vqs > 2)
+ return (max_vqs - 1) / 2;
+
+ return max_vqs / 2;
+}
+
+#define to_mlx5_vdpa_ndev(__mvdev) container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
+void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev);
+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev);
+int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb);
+void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
+
+#endif /* __MLX5_VNET_H_ */
--
2.27.0
On 2020/7/16 3:23 PM, Eli Cohen wrote:
> Fix documentation to match actual function prototypes.
>
> Reviewed-by: Parav Pandit <[email protected]>
> Signed-off-by: Eli Cohen <[email protected]>
Acked-by: Jason Wang <[email protected]>
> ---
> drivers/vhost/iotlb.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
> index 1f0ca6e44410..0d4213a54a88 100644
> --- a/drivers/vhost/iotlb.c
> +++ b/drivers/vhost/iotlb.c
> @@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
> * vhost_iotlb_itree_first - return the first overlapped range
> * @iotlb: the IOTLB
> * @start: start of IOVA range
> - * @end: end of IOVA range
> + * @last: last byte in IOVA range
> */
> struct vhost_iotlb_map *
> vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
> @@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
> * vhost_iotlb_itree_first - return the next overlapped range
> * @iotlb: the IOTLB
> * @start: start of IOVA range
> - * @end: end of IOVA range
> + * @last: last byte IOVA range
> */
> struct vhost_iotlb_map *
> vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)
On 2020/7/16 3:23 PM, Eli Cohen wrote:
> Currently, get_vq_state() is used only to pass the available index value
> of a vq. Extend the struct to return status on the VQ to the caller.
> For now, define VQ_STATE_NOT_READY. In the future it will be extended to
> include other information.
>
> Modify current vdpa driver to update this field.
>
> Reviewed-by: Parav Pandit <[email protected]>
> Signed-off-by: Eli Cohen <[email protected]>
What's the difference between this and get_vq_ready()?
Thanks
> ---
> drivers/vdpa/ifcvf/ifcvf_main.c | 1 +
> drivers/vdpa/vdpa_sim/vdpa_sim.c | 1 +
> include/linux/vdpa.h | 9 +++++++++
> 3 files changed, 11 insertions(+)
>
> diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> index 69032ee97824..77e3b3d91167 100644
> --- a/drivers/vdpa/ifcvf/ifcvf_main.c
> +++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> @@ -240,6 +240,7 @@ static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> {
> struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
>
> + state->state = vf->vring[qid].ready ? 0 : BIT(VQ_STATE_NOT_READY);
> state->avail_index = ifcvf_get_vq_state(vf, qid);
> }
>
> diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> index 599519039f8d..06d974b4bd7b 100644
> --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> @@ -434,6 +434,7 @@ static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> struct vringh *vrh = &vq->vring;
>
> + state->state = vq->ready ? 0 : BIT(VQ_STATE_NOT_READY);
> state->avail_index = vrh->last_avail_idx;
> }
>
> diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> index 7b088bebffe8..bcefa30a3b2f 100644
> --- a/include/linux/vdpa.h
> +++ b/include/linux/vdpa.h
> @@ -27,12 +27,21 @@ struct vdpa_notification_area {
> resource_size_t size;
> };
>
> +/**
> + * Bitmask describing the status of the vq
> + */
> +enum {
> + VQ_STATE_NOT_READY = 0,
> +};
> +
> /**
> * vDPA vq_state definition
> * @avail_index: available index
> + * @state: returned status from get_vq_state
> */
> struct vdpa_vq_state {
> u16 avail_index;
> + u32 state;
> };
>
> /**
On 2020/7/16 3:23 PM, Eli Cohen wrote:
> Add code to support registering guest's memory region for the device.
It would be better to use "userspace" memory here since vhost-vDPA could
be used by e.g. a dpdk application on the host in the future.
Thanks
On Thu, Jul 16, 2020 at 04:11:00PM +0800, Jason Wang wrote:
>
> On 2020/7/16 3:23 PM, Eli Cohen wrote:
> >Currently, get_vq_state() is used only to pass the available index value
> >of a vq. Extend the struct to return status on the VQ to the caller.
> >For now, define VQ_STATE_NOT_READY. In the future it will be extended to
> >include other information.
> >
> >Modify current vdpa driver to update this field.
> >
> >Reviewed-by: Parav Pandit <[email protected]>
> >Signed-off-by: Eli Cohen <[email protected]>
>
>
> What's the difference between this and get_vq_ready()?
>
> Thanks
>
There is no difference. It is just a way to communicate a problem with
the state of the VQ back to the caller, which is not possible today.
I think an asynchronous notification would be preferable, but that is
not available currently.
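To illustrate what I have in mind (just a sketch, not part of this
series; handle_vq_error() and resume_from() are made-up consumers of
the returned state):

	struct vdpa_vq_state state;

	ops->get_vq_state(vdev, idx, &state);
	if (state.state & BIT(VQ_STATE_NOT_READY)) {
		/* something is wrong with the VQ; avail_index may be stale */
		handle_vq_error(idx);
	} else {
		resume_from(state.avail_index);
	}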
>
> >---
> > drivers/vdpa/ifcvf/ifcvf_main.c | 1 +
> > drivers/vdpa/vdpa_sim/vdpa_sim.c | 1 +
> > include/linux/vdpa.h | 9 +++++++++
> > 3 files changed, 11 insertions(+)
> >
> >diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
> >index 69032ee97824..77e3b3d91167 100644
> >--- a/drivers/vdpa/ifcvf/ifcvf_main.c
> >+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
> >@@ -240,6 +240,7 @@ static void ifcvf_vdpa_get_vq_state(struct vdpa_device *vdpa_dev, u16 qid,
> > {
> > struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
> >+ state->state = vf->vring[qid].ready ? 0 : BIT(VQ_STATE_NOT_READY);
> > state->avail_index = ifcvf_get_vq_state(vf, qid);
> > }
> >diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> >index 599519039f8d..06d974b4bd7b 100644
> >--- a/drivers/vdpa/vdpa_sim/vdpa_sim.c
> >+++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c
> >@@ -434,6 +434,7 @@ static void vdpasim_get_vq_state(struct vdpa_device *vdpa, u16 idx,
> > struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
> > struct vringh *vrh = &vq->vring;
> >+ state->state = vq->ready ? 0 : BIT(VQ_STATE_NOT_READY);
> > state->avail_index = vrh->last_avail_idx;
> > }
> >diff --git a/include/linux/vdpa.h b/include/linux/vdpa.h
> >index 7b088bebffe8..bcefa30a3b2f 100644
> >--- a/include/linux/vdpa.h
> >+++ b/include/linux/vdpa.h
> >@@ -27,12 +27,21 @@ struct vdpa_notification_area {
> > resource_size_t size;
> > };
> >+/**
> >+ * Bitmask describing the status of the vq
> >+ */
> >+enum {
> >+ VQ_STATE_NOT_READY = 0,
> >+};
> >+
> > /**
> > * vDPA vq_state definition
> > * @avail_index: available index
> >+ * @state: returned status from get_vq_state
> > */
> > struct vdpa_vq_state {
> > u16 avail_index;
> >+ u32 state;
> > };
> > /**
>
On Thu, Jul 16, 2020 at 04:13:21PM +0800, Jason Wang wrote:
>
> On 2020/7/16 3:23 PM, Eli Cohen wrote:
> >Add code to support registering guest's memory region for the device.
>
>
> It would be better to use "userspace" memory here since vhost-vDPA
> could be used by e.g. a dpdk application on the host in the future.
>
How about replacing "guest's memory" with "address space"? It is more
general and aligns with the fact that the virtio driver can run in the
guest's kernel.
On 2020/7/16 3:23 PM, Eli Cohen wrote:
> Add a front end VDPA driver that registers in the VDPA bus and provides
> networking to a guest. The VDPA driver creates the necessary resources
> on the VF it is driving such that data path will be offloaded.
>
> Notifications are being communicated through the driver.
>
> Currently, only VFs are supported. In subsequent patches we will have
> devlink support to control which VF is used for VDPA and which function
> is used for regular networking.
>
> Reviewed-by: Parav Pandit <[email protected]>
> Signed-off-by: Eli Cohen <[email protected]>
Looks good overall.
Few nits inline.
> ---
> drivers/vdpa/Kconfig | 10 +
> drivers/vdpa/mlx5/Makefile | 5 +-
> drivers/vdpa/mlx5/core/mr.c | 2 +-
> drivers/vdpa/mlx5/net/main.c | 76 ++
> drivers/vdpa/mlx5/net/mlx5_vnet.c | 1966 +++++++++++++++++++++++++++++
> drivers/vdpa/mlx5/net/mlx5_vnet.h | 32 +
> 6 files changed, 2089 insertions(+), 2 deletions(-)
> create mode 100644 drivers/vdpa/mlx5/net/main.c
> create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.c
> create mode 100644 drivers/vdpa/mlx5/net/mlx5_vnet.h
>
> diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> index 48a1a776dd86..809cb4c2eecf 100644
> --- a/drivers/vdpa/Kconfig
> +++ b/drivers/vdpa/Kconfig
> @@ -36,4 +36,14 @@ config MLX5_VDPA
> Support library for Mellanox VDPA drivers. Provides code that is
> common for all types of VDPA drivers.
>
> +config MLX5_VDPA_NET
> + tristate "vDPA driver for ConnectX devices"
> + depends on MLX5_VDPA
> + default n
> + help
> + VDPA network driver for ConnectX6 and newer. Provides offloading
> + of virtio net datapath such that descriptors put on the ring will
> + be executed by the hardware. It also supports a variety of stateless
> + offloads depending on the actual device used and firmware version.
> +
> endif # VDPA
> diff --git a/drivers/vdpa/mlx5/Makefile b/drivers/vdpa/mlx5/Makefile
> index b347c62032ea..3f8850c1a300 100644
> --- a/drivers/vdpa/mlx5/Makefile
> +++ b/drivers/vdpa/mlx5/Makefile
> @@ -1 +1,4 @@
> -obj-$(CONFIG_MLX5_VDPA) += core/resources.o core/mr.o
> +subdir-ccflags-y += -I$(src)/core
> +
> +obj-$(CONFIG_MLX5_VDPA_NET) += mlx5_vdpa.o
> +mlx5_vdpa-$(CONFIG_MLX5_VDPA_NET) += net/main.o net/mlx5_vnet.o core/resources.o core/mr.o
> diff --git a/drivers/vdpa/mlx5/core/mr.c b/drivers/vdpa/mlx5/core/mr.c
> index 975aa45fd78b..34e3bbb80df8 100644
> --- a/drivers/vdpa/mlx5/core/mr.c
> +++ b/drivers/vdpa/mlx5/core/mr.c
> @@ -453,7 +453,7 @@ int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *io
> bool *change_map)
> {
> struct mlx5_vdpa_mr *mr = &mvdev->mr;
> - int err;
> + int err = 0;
>
> *change_map = false;
> if (map_empty(iotlb)) {
> diff --git a/drivers/vdpa/mlx5/net/main.c b/drivers/vdpa/mlx5/net/main.c
> new file mode 100644
> index 000000000000..838cd98386ff
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/net/main.c
> @@ -0,0 +1,76 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/* Copyright (c) 2020 Mellanox Technologies Ltd. */
> +
> +#include <linux/module.h>
> +#include <linux/mlx5/driver.h>
> +#include <linux/mlx5/device.h>
> +#include "mlx5_vdpa_ifc.h"
> +#include "mlx5_vnet.h"
> +
> +MODULE_AUTHOR("Eli Cohen <[email protected]>");
> +MODULE_DESCRIPTION("Mellanox VDPA driver");
> +MODULE_LICENSE("Dual BSD/GPL");
> +
> +static bool required_caps_supported(struct mlx5_core_dev *mdev)
> +{
> + u8 event_mode;
> + u64 got;
> +
> + got = MLX5_CAP_GEN_64(mdev, general_obj_types);
> +
> + if (!(got & MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q))
> + return false;
> +
> + event_mode = MLX5_CAP_DEV_VDPA_EMULATION(mdev, event_mode);
> + if (!(event_mode & MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE))
> + return false;
> +
> + if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, eth_frame_offload_type))
> + return false;
> +
> + return true;
> +}
> +
> +static void *mlx5_vdpa_add(struct mlx5_core_dev *mdev)
> +{
> + struct mlx5_vdpa_dev *vdev;
> +
> + if (mlx5_core_is_pf(mdev))
> + return NULL;
> +
> + if (!required_caps_supported(mdev)) {
> + dev_info(mdev->device, "virtio net emulation not supported\n");
> + return NULL;
> + }
> + vdev = mlx5_vdpa_add_dev(mdev);
> + if (IS_ERR(vdev))
> + return NULL;
> +
> + return vdev;
> +}
> +
> +static void mlx5_vdpa_remove(struct mlx5_core_dev *mdev, void *context)
> +{
> + struct mlx5_vdpa_dev *vdev = context;
> +
> + mlx5_vdpa_remove_dev(vdev);
> +}
> +
> +static struct mlx5_interface mlx5_vdpa_interface = {
> + .add = mlx5_vdpa_add,
> + .remove = mlx5_vdpa_remove,
> + .protocol = MLX5_INTERFACE_PROTOCOL_VDPA,
> +};
> +
> +static int __init mlx5_vdpa_init(void)
> +{
> + return mlx5_register_interface(&mlx5_vdpa_interface);
> +}
> +
> +static void __exit mlx5_vdpa_exit(void)
> +{
> + mlx5_unregister_interface(&mlx5_vdpa_interface);
> +}
> +
> +module_init(mlx5_vdpa_init);
> +module_exit(mlx5_vdpa_exit);
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> new file mode 100644
> index 000000000000..f7d3c7a0eb92
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
> @@ -0,0 +1,1966 @@
> +// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
> +/* Copyright (c) 2020 Mellanox Technologies Ltd. */
> +
> +#include <linux/vdpa.h>
> +#include <uapi/linux/virtio_ids.h>
> +#include <linux/virtio_config.h>
> +#include <linux/mlx5/qp.h>
> +#include <linux/mlx5/device.h>
> +#include <linux/mlx5/vport.h>
> +#include <linux/mlx5/fs.h>
> +#include <linux/mlx5/device.h>
> +#include "mlx5_vnet.h"
> +#include "../core/mlx5_vdpa_ifc.h"
> +#include "../core/mlx5_vdpa.h"
> +
> +#define to_mvdev(__vdev) container_of((__vdev), struct mlx5_vdpa_dev, vdev)
> +
> +#define VALID_FEATURES_MASK \
> + (BIT(VIRTIO_NET_F_CSUM) | BIT(VIRTIO_NET_F_GUEST_CSUM) | \
> + BIT(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) | BIT(VIRTIO_NET_F_MTU) | BIT(VIRTIO_NET_F_MAC) | \
> + BIT(VIRTIO_NET_F_GUEST_TSO4) | BIT(VIRTIO_NET_F_GUEST_TSO6) | \
> + BIT(VIRTIO_NET_F_GUEST_ECN) | BIT(VIRTIO_NET_F_GUEST_UFO) | BIT(VIRTIO_NET_F_HOST_TSO4) | \
> + BIT(VIRTIO_NET_F_HOST_TSO6) | BIT(VIRTIO_NET_F_HOST_ECN) | BIT(VIRTIO_NET_F_HOST_UFO) | \
> + BIT(VIRTIO_NET_F_MRG_RXBUF) | BIT(VIRTIO_NET_F_STATUS) | BIT(VIRTIO_NET_F_CTRL_VQ) | \
> + BIT(VIRTIO_NET_F_CTRL_RX) | BIT(VIRTIO_NET_F_CTRL_VLAN) | \
> + BIT(VIRTIO_NET_F_CTRL_RX_EXTRA) | BIT(VIRTIO_NET_F_GUEST_ANNOUNCE) | \
> + BIT(VIRTIO_NET_F_MQ) | BIT(VIRTIO_NET_F_CTRL_MAC_ADDR) | BIT(VIRTIO_NET_F_HASH_REPORT) | \
> + BIT(VIRTIO_NET_F_RSS) | BIT(VIRTIO_NET_F_RSC_EXT) | BIT(VIRTIO_NET_F_STANDBY) | \
> + BIT(VIRTIO_NET_F_SPEED_DUPLEX) | BIT(VIRTIO_F_NOTIFY_ON_EMPTY) | \
> + BIT(VIRTIO_F_ANY_LAYOUT) | BIT(VIRTIO_F_VERSION_1) | BIT(VIRTIO_F_IOMMU_PLATFORM) | \
> + BIT(VIRTIO_F_RING_PACKED) | BIT(VIRTIO_F_ORDER_PLATFORM) | BIT(VIRTIO_F_SR_IOV))
> +
> +#define VALID_STATUS_MASK \
> + (VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK | \
> + VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_NEEDS_RESET | VIRTIO_CONFIG_S_FAILED)
> +
> +struct mlx5_vdpa_net_resources {
> + u32 tisn;
> + u32 tdn;
> + u32 tirn;
> + u32 rqtn;
> + bool valid;
> +};
> +
> +struct mlx5_vdpa_cq_buf {
> + struct mlx5_frag_buf_ctrl fbc;
> + struct mlx5_frag_buf frag_buf;
> + int cqe_size;
> + int nent;
> +};
> +
> +struct mlx5_vdpa_cq {
> + struct mlx5_core_cq mcq;
> + struct mlx5_vdpa_cq_buf buf;
> + struct mlx5_db db;
> + int cqe;
> +};
> +
> +struct mlx5_vdpa_umem {
> + struct mlx5_frag_buf_ctrl fbc;
> + struct mlx5_frag_buf frag_buf;
> + int size;
> + u32 id;
> +};
> +
> +struct mlx5_vdpa_qp {
> + struct mlx5_core_qp mqp;
> + struct mlx5_frag_buf frag_buf;
> + struct mlx5_db db;
> + u16 head;
> + bool fw;
> +};
> +
> +struct mlx5_vq_restore_info {
> + u32 num_ent;
> + u64 desc_addr;
> + u64 device_addr;
> + u64 driver_addr;
> + u16 avail_index;
> + bool ready;
> + struct vdpa_callback cb;
> + bool restore;
> +};
> +
> +struct mlx5_vdpa_virtqueue {
> + bool ready;
> + u64 desc_addr;
> + u64 device_addr;
> + u64 driver_addr;
> + u32 num_ent;
> + struct vdpa_callback event_cb;
> +
> + /* Resources for implementing the notification channel from the device
> + * to the driver. fwqp is the firmware end of an RC connection; the
> + * other end is vqqp, used by the driver. cq is where completions are
> + * reported.
> + */
> + struct mlx5_vdpa_cq cq;
> + struct mlx5_vdpa_qp fwqp;
> + struct mlx5_vdpa_qp vqqp;
> +
> + /* umem resources are required for the virtqueue operation. Their use
> + * is internal and they must be provided by the driver.
> + */
> + struct mlx5_vdpa_umem umem1;
> + struct mlx5_vdpa_umem umem2;
> + struct mlx5_vdpa_umem umem3;
> +
> + bool initialized;
> + int index;
> + u32 virtq_id;
> + struct mlx5_vdpa_net *ndev;
> + u16 avail_idx;
> + int fw_state;
> +
> + /* keep last in the struct */
> + struct mlx5_vq_restore_info ri;
> +};
> +
> +/* We will remove this limitation once mlx5_vdpa_alloc_resources()
> + * provides for driver space allocation
> + */
> +#define MLX5_MAX_SUPPORTED_VQS 16
> +
> +struct mlx5_vdpa_net {
> + struct mlx5_vdpa_dev mvdev;
> + struct mlx5_vdpa_net_resources res;
> + struct virtio_net_config config;
> + struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS];
> +
> + /* Serialize vq resource creation and destruction. This is required
> + * since the memory map might change and we need to destroy and create
> + * resources while the driver is operational.
> + */
> + struct mutex reslock;
> + struct mlx5_flow_table *rxft;
> + struct mlx5_fc *rx_counter;
> + struct mlx5_flow_handle *rx_rule;
> + bool setup;
> +};
> +
> +static void free_resources(struct mlx5_vdpa_net *ndev);
> +static void init_mvqs(struct mlx5_vdpa_net *ndev);
> +static int setup_driver(struct mlx5_vdpa_net *ndev);
> +static void teardown_driver(struct mlx5_vdpa_net *ndev);
> +
> +static bool mlx5_vdpa_debug;
> +
> +#define MLX5_LOG_VIO_FLAG(_feature) \
> + do { \
> + if (features & BIT(_feature)) \
> + mlx5_vdpa_info(mvdev, "%s\n", #_feature); \
> + } while (0)
> +
> +#define MLX5_LOG_VIO_STAT(_status) \
> + do { \
> + if (status & (_status)) \
> + mlx5_vdpa_info(mvdev, "%s\n", #_status); \
> + } while (0)
> +
> +static void print_status(struct mlx5_vdpa_dev *mvdev, u8 status, bool set)
> +{
> + if (status & ~VALID_STATUS_MASK)
> + mlx5_vdpa_warn(mvdev, "Warning: there are invalid status bits 0x%x\n",
> + status & ~VALID_STATUS_MASK);
> +
> + if (!mlx5_vdpa_debug)
> + return;
> +
> +	mlx5_vdpa_info(mvdev, "driver status %s\n", set ? "set" : "get");
> + if (set && !status) {
> + mlx5_vdpa_info(mvdev, "driver resets the device\n");
> + return;
> + }
> +
> + MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_ACKNOWLEDGE);
> + MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER);
> + MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_DRIVER_OK);
> + MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FEATURES_OK);
> + MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_NEEDS_RESET);
> + MLX5_LOG_VIO_STAT(VIRTIO_CONFIG_S_FAILED);
> +}
> +
> +static void print_features(struct mlx5_vdpa_dev *mvdev, u64 features, bool set)
> +{
> + if (features & ~VALID_FEATURES_MASK)
> + mlx5_vdpa_warn(mvdev, "There are invalid feature bits 0x%llx\n",
> + features & ~VALID_FEATURES_MASK);
> +
> + if (!mlx5_vdpa_debug)
> + return;
> +
> + mlx5_vdpa_info(mvdev, "driver %s feature bits:\n", set ? "sets" : "reads");
> + if (!features)
> + mlx5_vdpa_info(mvdev, "all feature bits are cleared\n");
> +
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CSUM);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_CSUM);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MTU);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MAC);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO4);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_TSO6);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ECN);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_UFO);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO4);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_TSO6);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_ECN);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HOST_UFO);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MRG_RXBUF);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STATUS);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VQ);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_VLAN);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_RX_EXTRA);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_GUEST_ANNOUNCE);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_MQ);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_CTRL_MAC_ADDR);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_HASH_REPORT);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSS);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_RSC_EXT);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_STANDBY);
> + MLX5_LOG_VIO_FLAG(VIRTIO_NET_F_SPEED_DUPLEX);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_NOTIFY_ON_EMPTY);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_ANY_LAYOUT);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_VERSION_1);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_IOMMU_PLATFORM);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_RING_PACKED);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_ORDER_PLATFORM);
> + MLX5_LOG_VIO_FLAG(VIRTIO_F_SR_IOV);
> +}
> +
> +static int create_tis(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_dev *mvdev = &ndev->mvdev;
> + u32 in[MLX5_ST_SZ_DW(create_tis_in)] = {};
> + void *tisc;
> + int err;
> +
> + tisc = MLX5_ADDR_OF(create_tis_in, in, ctx);
> + MLX5_SET(tisc, tisc, transport_domain, ndev->res.tdn);
> + err = mlx5_vdpa_create_tis(mvdev, in, &ndev->res.tisn);
> + if (err)
> + mlx5_vdpa_warn(mvdev, "create TIS (%d)\n", err);
> +
> + return err;
> +}
> +
> +static void destroy_tis(struct mlx5_vdpa_net *ndev)
> +{
> + mlx5_vdpa_destroy_tis(&ndev->mvdev, ndev->res.tisn);
> +}
> +
> +#define MLX5_VDPA_CQE_SIZE 64
> +#define MLX5_VDPA_LOG_CQE_SIZE ilog2(MLX5_VDPA_CQE_SIZE)
> +
> +static int cq_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf, int nent)
> +{
> + struct mlx5_frag_buf *frag_buf = &buf->frag_buf;
> + u8 log_wq_stride = MLX5_VDPA_LOG_CQE_SIZE;
> + u8 log_wq_sz = MLX5_VDPA_LOG_CQE_SIZE;
> + int err;
> +
> + err = mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, nent * MLX5_VDPA_CQE_SIZE, frag_buf,
> + ndev->mvdev.mdev->priv.numa_node);
> + if (err)
> + return err;
> +
> + mlx5_init_fbc(frag_buf->frags, log_wq_stride, log_wq_sz, &buf->fbc);
> +
> + buf->cqe_size = MLX5_VDPA_CQE_SIZE;
> + buf->nent = nent;
> +
> + return 0;
> +}
> +
> +static int umem_frag_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem, int size)
> +{
> + struct mlx5_frag_buf *frag_buf = &umem->frag_buf;
> +
> + return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev, size, frag_buf,
> + ndev->mvdev.mdev->priv.numa_node);
> +}
> +
> +static void cq_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_cq_buf *buf)
> +{
> + mlx5_frag_buf_free(ndev->mvdev.mdev, &buf->frag_buf);
> +}
> +
> +static void *get_cqe(struct mlx5_vdpa_cq *vcq, int n)
> +{
> + return mlx5_frag_buf_get_wqe(&vcq->buf.fbc, n);
> +}
> +
> +static void cq_frag_buf_init(struct mlx5_vdpa_cq *vcq, struct mlx5_vdpa_cq_buf *buf)
> +{
> + struct mlx5_cqe64 *cqe64;
> + void *cqe;
> + int i;
> +
> + for (i = 0; i < buf->nent; i++) {
> + cqe = get_cqe(vcq, i);
> + cqe64 = cqe;
> + cqe64->op_own = MLX5_CQE_INVALID << 4;
> + }
> +}
> +
> +static void *get_sw_cqe(struct mlx5_vdpa_cq *cq, int n)
> +{
> + struct mlx5_cqe64 *cqe64 = get_cqe(cq, n & (cq->cqe - 1));
> +
> + if (likely(get_cqe_opcode(cqe64) != MLX5_CQE_INVALID) &&
> + !((cqe64->op_own & MLX5_CQE_OWNER_MASK) ^ !!(n & cq->cqe)))
> + return cqe64;
> +
> + return NULL;
> +}
> +
> +static void rx_post(struct mlx5_vdpa_qp *vqp, int n)
> +{
> + vqp->head += n;
> + vqp->db.db[0] = cpu_to_be32(vqp->head);
> +}
> +
> +static void qp_prepare(struct mlx5_vdpa_net *ndev, bool fw, void *in,
> + struct mlx5_vdpa_virtqueue *mvq, u32 num_ent)
> +{
> + struct mlx5_vdpa_qp *vqp;
> + __be64 *pas;
> + void *qpc;
> +
> + vqp = fw ? &mvq->fwqp : &mvq->vqqp;
> + MLX5_SET(create_qp_in, in, uid, ndev->mvdev.res.uid);
> + qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
> + if (vqp->fw) {
> +		/* The firmware QP is allocated by the driver for the firmware's
> +		 * use, so we can skip some of the params as they will be
> +		 * chosen by the firmware.
> +		 */
> + MLX5_SET(qpc, qpc, rq_type, MLX5_ZERO_LEN_RQ);
> + MLX5_SET(qpc, qpc, no_sq, 1);
> + return;
> + }
> +
> + MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
> + MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
> + MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
> + MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
> + MLX5_SET(qpc, qpc, uar_page, ndev->mvdev.res.uar->index);
> + MLX5_SET(qpc, qpc, log_page_size, vqp->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
> + MLX5_SET(qpc, qpc, no_sq, 1);
> + MLX5_SET(qpc, qpc, cqn_rcv, mvq->cq.mcq.cqn);
> + MLX5_SET(qpc, qpc, log_rq_size, ilog2(num_ent));
> + MLX5_SET(qpc, qpc, rq_type, MLX5_NON_ZERO_RQ);
> + pas = (__be64 *)MLX5_ADDR_OF(create_qp_in, in, pas);
> + mlx5_fill_page_frag_array(&vqp->frag_buf, pas);
> +}
> +
> +static int rq_buf_alloc(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp, u32 num_ent)
> +{
> + return mlx5_frag_buf_alloc_node(ndev->mvdev.mdev,
> + num_ent * sizeof(struct mlx5_wqe_data_seg), &vqp->frag_buf,
> + ndev->mvdev.mdev->priv.numa_node);
> +}
> +
> +static void rq_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
> +{
> + mlx5_frag_buf_free(ndev->mvdev.mdev, &vqp->frag_buf);
> +}
> +
> +static int qp_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> + struct mlx5_vdpa_qp *vqp)
> +{
> + struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
> + int inlen = MLX5_ST_SZ_BYTES(create_qp_in);
> + u32 out[MLX5_ST_SZ_DW(create_qp_out)] = {};
> + void *qpc;
> + void *in;
> + int err;
> +
> + if (!vqp->fw) {
> + vqp = &mvq->vqqp;
> + err = rq_buf_alloc(ndev, vqp, mvq->num_ent);
> + if (err)
> + return err;
> +
> + err = mlx5_db_alloc(ndev->mvdev.mdev, &vqp->db);
> + if (err)
> + goto err_db;
> + inlen += vqp->frag_buf.npages * sizeof(__be64);
> + }
> +
> + in = kzalloc(inlen, GFP_KERNEL);
> + if (!in) {
> + err = -ENOMEM;
> + goto err_kzalloc;
> + }
> +
> + qp_prepare(ndev, vqp->fw, in, mvq, mvq->num_ent);
> + qpc = MLX5_ADDR_OF(create_qp_in, in, qpc);
> + MLX5_SET(qpc, qpc, st, MLX5_QP_ST_RC);
> + MLX5_SET(qpc, qpc, pm_state, MLX5_QP_PM_MIGRATED);
> + MLX5_SET(qpc, qpc, pd, ndev->mvdev.res.pdn);
> + MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
> + if (!vqp->fw)
> + MLX5_SET64(qpc, qpc, dbr_addr, vqp->db.dma);
> + MLX5_SET(create_qp_in, in, opcode, MLX5_CMD_OP_CREATE_QP);
> + err = mlx5_cmd_exec(mdev, in, inlen, out, sizeof(out));
> + kfree(in);
> + if (err)
> + goto err_kzalloc;
> +
> +	/* "in" was freed above; take the uid from the driver resources */
> +	vqp->mqp.uid = ndev->mvdev.res.uid;
> + vqp->mqp.qpn = MLX5_GET(create_qp_out, out, qpn);
> +
> + if (!vqp->fw)
> + rx_post(vqp, mvq->num_ent);
> +
> + return 0;
> +
> +err_kzalloc:
> + if (!vqp->fw)
> + mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
> +err_db:
> + if (!vqp->fw)
> + rq_buf_free(ndev, vqp);
> +
> + return err;
> +}
> +
> +static void qp_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_qp *vqp)
> +{
> + u32 in[MLX5_ST_SZ_DW(destroy_qp_in)] = {};
> +
> + MLX5_SET(destroy_qp_in, in, opcode, MLX5_CMD_OP_DESTROY_QP);
> + MLX5_SET(destroy_qp_in, in, qpn, vqp->mqp.qpn);
> + MLX5_SET(destroy_qp_in, in, uid, ndev->mvdev.res.uid);
> + if (mlx5_cmd_exec_in(ndev->mvdev.mdev, destroy_qp, in))
> + mlx5_vdpa_warn(&ndev->mvdev, "destroy qp 0x%x\n", vqp->mqp.qpn);
> + if (!vqp->fw) {
> + mlx5_db_free(ndev->mvdev.mdev, &vqp->db);
> + rq_buf_free(ndev, vqp);
> + }
> +}
> +
> +static void *next_cqe_sw(struct mlx5_vdpa_cq *cq)
> +{
> + return get_sw_cqe(cq, cq->mcq.cons_index);
> +}
> +
> +static int mlx5_vdpa_poll_one(struct mlx5_vdpa_cq *vcq)
> +{
> + struct mlx5_cqe64 *cqe64;
> +
> + cqe64 = next_cqe_sw(vcq);
> + if (!cqe64)
> + return -EAGAIN;
> +
> + vcq->mcq.cons_index++;
> + return 0;
> +}
> +
> +static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue *mvq, int num)
> +{
> + mlx5_cq_set_ci(&mvq->cq.mcq);
> + rx_post(&mvq->vqqp, num);
> + if (mvq->event_cb.callback)
> + mvq->event_cb.callback(mvq->event_cb.private);
> +}
> +
> +static void mlx5_vdpa_cq_comp(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe)
> +{
> + struct mlx5_vdpa_virtqueue *mvq = container_of(mcq, struct mlx5_vdpa_virtqueue, cq.mcq);
> + struct mlx5_vdpa_net *ndev = mvq->ndev;
> + void __iomem *uar_page = ndev->mvdev.res.uar->map;
> + int num = 0;
> +
> + while (!mlx5_vdpa_poll_one(&mvq->cq)) {
> + num++;
> + if (num > mvq->num_ent / 2) {
> + /* If completions keep coming while we poll, we want to
> + * let the hardware know that we consumed them by
> +			 * updating the doorbell record. We also let the vdpa
> +			 * core know about this so it passes it on to the
> +			 * virtio driver in the guest.
> + */
> + mlx5_vdpa_handle_completions(mvq, num);
> + num = 0;
> + }
> + }
> +
> + if (num)
> + mlx5_vdpa_handle_completions(mvq, num);
> +
> + mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
> +}
> +
> +static int cq_create(struct mlx5_vdpa_net *ndev, u16 idx, u32 num_ent)
> +{
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> + struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
> + void __iomem *uar_page = ndev->mvdev.res.uar->map;
> + u32 out[MLX5_ST_SZ_DW(create_cq_out)];
> + struct mlx5_vdpa_cq *vcq = &mvq->cq;
> + unsigned int irqn;
> + __be64 *pas;
> + int inlen;
> + void *cqc;
> + void *in;
> + int err;
> + int eqn;
> +
> + err = mlx5_db_alloc(mdev, &vcq->db);
> + if (err)
> + return err;
> +
> + vcq->mcq.set_ci_db = vcq->db.db;
> + vcq->mcq.arm_db = vcq->db.db + 1;
> + vcq->mcq.cqe_sz = 64;
> +
> + err = cq_frag_buf_alloc(ndev, &vcq->buf, num_ent);
> + if (err)
> + goto err_db;
> +
> + cq_frag_buf_init(vcq, &vcq->buf);
> +
> + inlen = MLX5_ST_SZ_BYTES(create_cq_in) +
> + MLX5_FLD_SZ_BYTES(create_cq_in, pas[0]) * vcq->buf.frag_buf.npages;
> + in = kzalloc(inlen, GFP_KERNEL);
> + if (!in) {
> + err = -ENOMEM;
> + goto err_vzalloc;
> + }
> +
> + MLX5_SET(create_cq_in, in, uid, ndev->mvdev.res.uid);
> + pas = (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas);
> + mlx5_fill_page_frag_array(&vcq->buf.frag_buf, pas);
> +
> + cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
> + MLX5_SET(cqc, cqc, log_page_size, vcq->buf.frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
> +
> + /* Use vector 0 by default. Consider adding code to choose least used
> + * vector.
> + */
> + err = mlx5_vector2eqn(mdev, 0, &eqn, &irqn);
> + if (err)
> + goto err_vec;
> +
> + cqc = MLX5_ADDR_OF(create_cq_in, in, cq_context);
> + MLX5_SET(cqc, cqc, log_cq_size, ilog2(num_ent));
> + MLX5_SET(cqc, cqc, uar_page, ndev->mvdev.res.uar->index);
> + MLX5_SET(cqc, cqc, c_eqn, eqn);
> + MLX5_SET64(cqc, cqc, dbr_addr, vcq->db.dma);
> +
> + err = mlx5_core_create_cq(mdev, &vcq->mcq, in, inlen, out, sizeof(out));
> + if (err)
> + goto err_vec;
> +
> + vcq->mcq.comp = mlx5_vdpa_cq_comp;
> + vcq->cqe = num_ent;
> + vcq->mcq.set_ci_db = vcq->db.db;
> + vcq->mcq.arm_db = vcq->db.db + 1;
> + mlx5_cq_arm(&mvq->cq.mcq, MLX5_CQ_DB_REQ_NOT, uar_page, mvq->cq.mcq.cons_index);
> + kfree(in);
> + return 0;
> +
> +err_vec:
> + kfree(in);
> +err_vzalloc:
> + cq_frag_buf_free(ndev, &vcq->buf);
> +err_db:
> + mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
> + return err;
> +}
> +
> +static void cq_destroy(struct mlx5_vdpa_net *ndev, u16 idx)
> +{
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> + struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
> + struct mlx5_vdpa_cq *vcq = &mvq->cq;
> +
> + if (mlx5_core_destroy_cq(mdev, &vcq->mcq)) {
> + mlx5_vdpa_warn(&ndev->mvdev, "destroy CQ 0x%x\n", vcq->mcq.cqn);
> + return;
> + }
> + cq_frag_buf_free(ndev, &vcq->buf);
> + mlx5_db_free(ndev->mvdev.mdev, &vcq->db);
> +}
> +
> +static int umem_size(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num,
> + struct mlx5_vdpa_umem **umemp)
> +{
> + struct mlx5_core_dev *mdev = ndev->mvdev.mdev;
> + int p_a;
> + int p_b;
> +
> + switch (num) {
> + case 1:
> + p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_a);
> + p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_1_buffer_param_b);
> + *umemp = &mvq->umem1;
> + break;
> + case 2:
> + p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_a);
> + p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_2_buffer_param_b);
> + *umemp = &mvq->umem2;
> + break;
> + case 3:
> + p_a = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_a);
> + p_b = MLX5_CAP_DEV_VDPA_EMULATION(mdev, umem_3_buffer_param_b);
> + *umemp = &mvq->umem3;
> + break;
> + }
> + return p_a * mvq->num_ent + p_b;
> +}
> +
> +static void umem_frag_buf_free(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_umem *umem)
> +{
> + mlx5_frag_buf_free(ndev->mvdev.mdev, &umem->frag_buf);
> +}
> +
> +static int create_umem(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
> +{
> + int inlen;
> + u32 out[MLX5_ST_SZ_DW(create_umem_out)] = {};
> + void *um;
> + void *in;
> + int err;
> + __be64 *pas;
> + int size;
> + struct mlx5_vdpa_umem *umem;
> +
> + size = umem_size(ndev, mvq, num, &umem);
> + if (size < 0)
> + return size;
> +
> + umem->size = size;
> + err = umem_frag_buf_alloc(ndev, umem, size);
> + if (err)
> + return err;
> +
> + inlen = MLX5_ST_SZ_BYTES(create_umem_in) + MLX5_ST_SZ_BYTES(mtt) * umem->frag_buf.npages;
> +
> + in = kzalloc(inlen, GFP_KERNEL);
> + if (!in) {
> + err = -ENOMEM;
> + goto err_in;
> + }
> +
> + MLX5_SET(create_umem_in, in, opcode, MLX5_CMD_OP_CREATE_UMEM);
> + MLX5_SET(create_umem_in, in, uid, ndev->mvdev.res.uid);
> + um = MLX5_ADDR_OF(create_umem_in, in, umem);
> + MLX5_SET(umem, um, log_page_size, umem->frag_buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
> + MLX5_SET64(umem, um, num_of_mtt, umem->frag_buf.npages);
> +
> + pas = (__be64 *)MLX5_ADDR_OF(umem, um, mtt[0]);
> + mlx5_fill_page_frag_array_perm(&umem->frag_buf, pas, MLX5_MTT_PERM_RW);
> +
> + err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "create umem(%d)\n", err);
> + goto err_cmd;
> + }
> +
> + kfree(in);
> + umem->id = MLX5_GET(create_umem_out, out, umem_id);
> +
> + return 0;
> +
> +err_cmd:
> + kfree(in);
> +err_in:
> + umem_frag_buf_free(ndev, umem);
> + return err;
> +}
> +
> +static void umem_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int num)
> +{
> + u32 in[MLX5_ST_SZ_DW(destroy_umem_in)] = {};
> + u32 out[MLX5_ST_SZ_DW(destroy_umem_out)] = {};
> + struct mlx5_vdpa_umem *umem;
> +
> + switch (num) {
> + case 1:
> + umem = &mvq->umem1;
> + break;
> + case 2:
> + umem = &mvq->umem2;
> + break;
> + case 3:
> + umem = &mvq->umem3;
> + break;
> + }
> +
> + MLX5_SET(destroy_umem_in, in, opcode, MLX5_CMD_OP_DESTROY_UMEM);
> + MLX5_SET(destroy_umem_in, in, umem_id, umem->id);
> + if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out)))
> + return;
> +
> + umem_frag_buf_free(ndev, umem);
> +}
> +
> +static int umems_create(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + int num;
> + int err;
> +
> + for (num = 1; num <= 3; num++) {
> + err = create_umem(ndev, mvq, num);
> + if (err)
> + goto err_umem;
> + }
> + return 0;
> +
> +err_umem:
> + for (num--; num > 0; num--)
> + umem_destroy(ndev, mvq, num);
> +
> + return err;
> +}
> +
> +static void umems_destroy(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + int num;
> +
> + for (num = 3; num > 0; num--)
> + umem_destroy(ndev, mvq, num);
> +}
> +
> +static int get_queue_type(struct mlx5_vdpa_net *ndev)
> +{
> + u32 type_mask;
> +
> + type_mask = MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, virtio_queue_type);
> +
> +	/* prefer split queue */
> +	if (type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_SPLIT)
> +		return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_SPLIT;
> +
> +	WARN_ON(!(type_mask & MLX5_VIRTIO_EMULATION_CAP_VIRTIO_QUEUE_TYPE_PACKED));
> +
> +	return MLX5_VIRTIO_EMULATION_VIRTIO_QUEUE_TYPE_PACKED;
> +}
> +
> +static bool vq_is_tx(u16 idx)
> +{
> + return idx % 2;
> +}
> +
> +static u16 get_features_12_3(u64 features)
> +{
> + return (!!(features & BIT(VIRTIO_NET_F_HOST_TSO4)) << 9) |
> + (!!(features & BIT(VIRTIO_NET_F_HOST_TSO6)) << 8) |
> + (!!(features & BIT(VIRTIO_NET_F_CSUM)) << 7) |
> + (!!(features & BIT(VIRTIO_NET_F_GUEST_CSUM)) << 6);
> +}
> +
> +static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + int inlen = MLX5_ST_SZ_BYTES(create_virtio_net_q_in);
> + u32 out[MLX5_ST_SZ_DW(create_virtio_net_q_out)] = {};
> + void *obj_context;
> + void *cmd_hdr;
> + void *vq_ctx;
> + void *in;
> + int err;
> +
> + err = umems_create(ndev, mvq);
> + if (err)
> + return err;
> +
> + in = kzalloc(inlen, GFP_KERNEL);
> + if (!in) {
> + err = -ENOMEM;
> + goto err_alloc;
> + }
> +
> + cmd_hdr = MLX5_ADDR_OF(create_virtio_net_q_in, in, general_obj_in_cmd_hdr);
> +
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_CREATE_GENERAL_OBJECT);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
> +
> + obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
> + MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
> + MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
> + get_features_12_3(ndev->mvdev.actual_features));
> + vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
> + MLX5_SET(virtio_q, vq_ctx, virtio_q_type, get_queue_type(ndev));
> +
> + if (vq_is_tx(mvq->index))
> + MLX5_SET(virtio_net_q_object, obj_context, tisn_or_qpn, ndev->res.tisn);
> +
> + MLX5_SET(virtio_q, vq_ctx, event_mode, MLX5_VIRTIO_Q_EVENT_MODE_QP_MODE);
> + MLX5_SET(virtio_q, vq_ctx, queue_index, mvq->index);
> + MLX5_SET(virtio_q, vq_ctx, event_qpn_or_msix, mvq->fwqp.mqp.qpn);
> + MLX5_SET(virtio_q, vq_ctx, queue_size, mvq->num_ent);
> + MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0,
> +		 !!(ndev->mvdev.actual_features & BIT(VIRTIO_F_VERSION_1)));
> + MLX5_SET64(virtio_q, vq_ctx, desc_addr, mvq->desc_addr);
> + MLX5_SET64(virtio_q, vq_ctx, used_addr, mvq->device_addr);
> + MLX5_SET64(virtio_q, vq_ctx, available_addr, mvq->driver_addr);
> + MLX5_SET(virtio_q, vq_ctx, virtio_q_mkey, ndev->mvdev.mr.mkey.key);
> + MLX5_SET(virtio_q, vq_ctx, umem_1_id, mvq->umem1.id);
> + MLX5_SET(virtio_q, vq_ctx, umem_1_size, mvq->umem1.size);
> + MLX5_SET(virtio_q, vq_ctx, umem_2_id, mvq->umem2.id);
> +	MLX5_SET(virtio_q, vq_ctx, umem_2_size, mvq->umem2.size);
> + MLX5_SET(virtio_q, vq_ctx, umem_3_id, mvq->umem3.id);
> +	MLX5_SET(virtio_q, vq_ctx, umem_3_size, mvq->umem3.size);
> + MLX5_SET(virtio_q, vq_ctx, pd, ndev->mvdev.res.pdn);
> + if (MLX5_CAP_DEV_VDPA_EMULATION(ndev->mvdev.mdev, eth_frame_offload_type))
> + MLX5_SET(virtio_q, vq_ctx, virtio_version_1_0, 1);
> +
> + err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> + if (err)
> + goto err_cmd;
> +
> + kfree(in);
> + mvq->virtq_id = MLX5_GET(general_obj_out_cmd_hdr, out, obj_id);
> +
> + return 0;
> +
> +err_cmd:
> + kfree(in);
> +err_alloc:
> + umems_destroy(ndev, mvq);
> + return err;
> +}
> +
> +static void destroy_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + u32 in[MLX5_ST_SZ_DW(destroy_virtio_net_q_in)] = {};
> + u32 out[MLX5_ST_SZ_DW(destroy_virtio_net_q_out)] = {};
> +
> + MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.opcode,
> + MLX5_CMD_OP_DESTROY_GENERAL_OBJECT);
> + MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_id, mvq->virtq_id);
> + MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.uid, ndev->mvdev.res.uid);
> + MLX5_SET(destroy_virtio_net_q_in, in, general_obj_out_cmd_hdr.obj_type,
> + MLX5_OBJ_TYPE_VIRTIO_NET_Q);
> + if (mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, sizeof(out))) {
> + mlx5_vdpa_warn(&ndev->mvdev, "destroy virtqueue 0x%x\n", mvq->virtq_id);
> + return;
> + }
> + umems_destroy(ndev, mvq);
> +}
> +
> +static u32 get_rqpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
> +{
> + return fw ? mvq->vqqp.mqp.qpn : mvq->fwqp.mqp.qpn;
> +}
> +
> +static u32 get_qpn(struct mlx5_vdpa_virtqueue *mvq, bool fw)
> +{
> + return fw ? mvq->fwqp.mqp.qpn : mvq->vqqp.mqp.qpn;
> +}
> +
> +static void alloc_inout(struct mlx5_vdpa_net *ndev, int cmd, void **in, int *inlen, void **out,
> + int *outlen, u32 qpn, u32 rqpn)
> +{
> + void *qpc;
> + void *pp;
> +
> + switch (cmd) {
> + case MLX5_CMD_OP_2RST_QP:
> + *inlen = MLX5_ST_SZ_BYTES(qp_2rst_in);
> + *outlen = MLX5_ST_SZ_BYTES(qp_2rst_out);
> + *in = kzalloc(*inlen, GFP_KERNEL);
> + *out = kzalloc(*outlen, GFP_KERNEL);
> +		if (!*in || !*out)
> + goto outerr;
> +
> + MLX5_SET(qp_2rst_in, *in, opcode, cmd);
> + MLX5_SET(qp_2rst_in, *in, uid, ndev->mvdev.res.uid);
> + MLX5_SET(qp_2rst_in, *in, qpn, qpn);
> + break;
> + case MLX5_CMD_OP_RST2INIT_QP:
> + *inlen = MLX5_ST_SZ_BYTES(rst2init_qp_in);
> + *outlen = MLX5_ST_SZ_BYTES(rst2init_qp_out);
> + *in = kzalloc(*inlen, GFP_KERNEL);
> +		*out = kzalloc(*outlen, GFP_KERNEL);
> +		if (!*in || !*out)
> + goto outerr;
> +
> + MLX5_SET(rst2init_qp_in, *in, opcode, cmd);
> + MLX5_SET(rst2init_qp_in, *in, uid, ndev->mvdev.res.uid);
> + MLX5_SET(rst2init_qp_in, *in, qpn, qpn);
> + qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
> + MLX5_SET(qpc, qpc, remote_qpn, rqpn);
> + MLX5_SET(qpc, qpc, rwe, 1);
> + pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
> + MLX5_SET(ads, pp, vhca_port_num, 1);
> + break;
> + case MLX5_CMD_OP_INIT2RTR_QP:
> + *inlen = MLX5_ST_SZ_BYTES(init2rtr_qp_in);
> + *outlen = MLX5_ST_SZ_BYTES(init2rtr_qp_out);
> + *in = kzalloc(*inlen, GFP_KERNEL);
> +		*out = kzalloc(*outlen, GFP_KERNEL);
> +		if (!*in || !*out)
> + goto outerr;
> +
> + MLX5_SET(init2rtr_qp_in, *in, opcode, cmd);
> + MLX5_SET(init2rtr_qp_in, *in, uid, ndev->mvdev.res.uid);
> + MLX5_SET(init2rtr_qp_in, *in, qpn, qpn);
> + qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
> + MLX5_SET(qpc, qpc, mtu, MLX5_QPC_MTU_256_BYTES);
> + MLX5_SET(qpc, qpc, log_msg_max, 30);
> + MLX5_SET(qpc, qpc, remote_qpn, rqpn);
> + pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
> + MLX5_SET(ads, pp, fl, 1);
> + break;
> + case MLX5_CMD_OP_RTR2RTS_QP:
> + *inlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_in);
> + *outlen = MLX5_ST_SZ_BYTES(rtr2rts_qp_out);
> + *in = kzalloc(*inlen, GFP_KERNEL);
> +		*out = kzalloc(*outlen, GFP_KERNEL);
> +		if (!*in || !*out)
> + goto outerr;
> +
> + MLX5_SET(rtr2rts_qp_in, *in, opcode, cmd);
> + MLX5_SET(rtr2rts_qp_in, *in, uid, ndev->mvdev.res.uid);
> + MLX5_SET(rtr2rts_qp_in, *in, qpn, qpn);
> + qpc = MLX5_ADDR_OF(rst2init_qp_in, *in, qpc);
> + pp = MLX5_ADDR_OF(qpc, qpc, primary_address_path);
> + MLX5_SET(ads, pp, ack_timeout, 14);
> + MLX5_SET(qpc, qpc, retry_count, 7);
> + MLX5_SET(qpc, qpc, rnr_retry, 7);
> + break;
> + default:
> + goto outerr;
> + }
> + if (!*in || !*out)
> + goto outerr;
> +
> + return;
> +
> +outerr:
> + kfree(*in);
> + kfree(*out);
> + *in = NULL;
> + *out = NULL;
> +}
> +
> +static void free_inout(void *in, void *out)
> +{
> + kfree(in);
> + kfree(out);
> +}
> +
> +/* Two QPs are used by each virtqueue. One is used by the driver and one by
> + * firmware. The fw argument indicates whether the subjected QP is the one used
> + * by firmware.
> + */
> +static int modify_qp(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, bool fw, int cmd)
> +{
> + int outlen;
> + int inlen;
> + void *out;
> + void *in;
> + int err;
> +
> + alloc_inout(ndev, cmd, &in, &inlen, &out, &outlen, get_qpn(mvq, fw), get_rqpn(mvq, fw));
> + if (!in || !out)
> + return -ENOMEM;
> +
> + err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, outlen);
> + free_inout(in, out);
> + return err;
> +}
> +
> +static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + int err;
> +
> + err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_2RST_QP);
> + if (err)
> + return err;
> +
> + err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_2RST_QP);
> + if (err)
> + return err;
> +
> + err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_RST2INIT_QP);
> + if (err)
> + return err;
> +
> + err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_RST2INIT_QP);
> + if (err)
> + return err;
> +
> + err = modify_qp(ndev, mvq, true, MLX5_CMD_OP_INIT2RTR_QP);
> + if (err)
> + return err;
> +
> + err = modify_qp(ndev, mvq, false, MLX5_CMD_OP_INIT2RTR_QP);
> + if (err)
> + return err;
> +
> + return modify_qp(ndev, mvq, true, MLX5_CMD_OP_RTR2RTS_QP);
> +}
> +
> +struct mlx5_virtq_attr {
> + u8 state;
> + u16 available_index;
> +};
> +
> +static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
> + struct mlx5_virtq_attr *attr)
> +{
> + int outlen = MLX5_ST_SZ_BYTES(query_virtio_net_q_out);
> + u32 in[MLX5_ST_SZ_DW(query_virtio_net_q_in)] = {};
> + void *out;
> + void *obj_context;
> + void *cmd_hdr;
> + int err;
> +
> + out = kzalloc(outlen, GFP_KERNEL);
> + if (!out)
> + return -ENOMEM;
> +
> + cmd_hdr = MLX5_ADDR_OF(query_virtio_net_q_in, in, general_obj_in_cmd_hdr);
> +
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_QUERY_GENERAL_OBJECT);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
> + err = mlx5_cmd_exec(ndev->mvdev.mdev, in, sizeof(in), out, outlen);
> + if (err)
> + goto err_cmd;
> +
> + obj_context = MLX5_ADDR_OF(query_virtio_net_q_out, out, obj_context);
> + memset(attr, 0, sizeof(*attr));
> + attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
> + attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
> + kfree(out);
> + return 0;
> +
> +err_cmd:
> + kfree(out);
> + return err;
> +}
> +
> +static int modify_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq, int state)
> +{
> + int inlen = MLX5_ST_SZ_BYTES(modify_virtio_net_q_in);
> + u32 out[MLX5_ST_SZ_DW(modify_virtio_net_q_out)] = {};
> + void *obj_context;
> + void *cmd_hdr;
> + void *in;
> + int err;
> +
> + in = kzalloc(inlen, GFP_KERNEL);
> + if (!in)
> + return -ENOMEM;
> +
> + cmd_hdr = MLX5_ADDR_OF(modify_virtio_net_q_in, in, general_obj_in_cmd_hdr);
> +
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, opcode, MLX5_CMD_OP_MODIFY_GENERAL_OBJECT);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_type, MLX5_OBJ_TYPE_VIRTIO_NET_Q);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, obj_id, mvq->virtq_id);
> + MLX5_SET(general_obj_in_cmd_hdr, cmd_hdr, uid, ndev->mvdev.res.uid);
> +
> + obj_context = MLX5_ADDR_OF(modify_virtio_net_q_in, in, obj_context);
> + MLX5_SET64(virtio_net_q_object, obj_context, modify_field_select,
> + MLX5_VIRTQ_MODIFY_MASK_STATE);
> + MLX5_SET(virtio_net_q_object, obj_context, state, state);
> + err = mlx5_cmd_exec(ndev->mvdev.mdev, in, inlen, out, sizeof(out));
> + kfree(in);
> + if (!err)
> + mvq->fw_state = state;
> +
> + return err;
> +}
> +
> +static int setup_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + u16 idx = mvq->index;
> + int err;
> +
> + if (!mvq->num_ent)
> + return 0;
> +
> + if (mvq->initialized) {
> + mlx5_vdpa_warn(&ndev->mvdev, "attempt re init\n");
> + return -EINVAL;
> + }
> +
> + err = cq_create(ndev, idx, mvq->num_ent);
> + if (err)
> + return err;
> +
> + err = qp_create(ndev, mvq, &mvq->fwqp);
> + if (err)
> + goto err_fwqp;
> +
> + err = qp_create(ndev, mvq, &mvq->vqqp);
> + if (err)
> + goto err_vqqp;
> +
> + err = connect_qps(ndev, mvq);
> + if (err)
> + goto err_connect;
> +
> + err = create_virtqueue(ndev, mvq);
> + if (err)
> + goto err_connect;
> +
> + if (mvq->ready) {
> + err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "failed to modify to ready vq idx %d(%d)\n",
> + idx, err);
> + goto err_connect;
> + }
> + }
> +
> + mvq->initialized = true;
> + return 0;
> +
> +err_connect:
> + qp_destroy(ndev, &mvq->vqqp);
> +err_vqqp:
> + qp_destroy(ndev, &mvq->fwqp);
> +err_fwqp:
> + cq_destroy(ndev, idx);
> + return err;
> +}
> +
> +static void suspend_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + struct mlx5_virtq_attr attr;
> +
> + if (!mvq->initialized)
> + return;
> +
> + if (query_virtqueue(ndev, mvq, &attr)) {
> + mlx5_vdpa_warn(&ndev->mvdev, "failed to query virtqueue\n");
> + return;
> + }
> + if (mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY)
> + return;
> +
> + if (modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_SUSPEND))
> + mlx5_vdpa_warn(&ndev->mvdev, "modify to suspend failed\n");
> +}
> +
> +static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> +{
> + int i;
> +
> + for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
> + suspend_vq(ndev, &ndev->vqs[i]);
In teardown_virtqueues() there is a check of mvq->num_ent; any reason
for not doing it here?
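Something like the below would do it, I think (untested sketch; it
assumes num_ent == 0 marks a queue that was never configured, as
teardown_virtqueues() does):

	for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++) {
		if (!ndev->vqs[i].num_ent)
			continue;

		suspend_vq(ndev, &ndev->vqs[i]);
	}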
> +}
> +
> +static void teardown_vq(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + if (!mvq->initialized || !mvq->num_ent)
> + return;
> +
> + suspend_vq(ndev, mvq);
> + destroy_virtqueue(ndev, mvq);
> + qp_destroy(ndev, &mvq->vqqp);
> + qp_destroy(ndev, &mvq->fwqp);
> + cq_destroy(ndev, mvq->index);
> + mvq->initialized = false;
> +}
> +
> +static int create_rqt(struct mlx5_vdpa_net *ndev)
> +{
> + int log_max_rqt;
> + __be32 *list;
> + void *rqtc;
> + int inlen;
> + void *in;
> + int i, j;
> + int err;
> +
> + log_max_rqt = min_t(int, 1, MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size));
> + if (log_max_rqt < 1)
> + return -EOPNOTSUPP;
> +
> + inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + (1 << log_max_rqt) * MLX5_ST_SZ_BYTES(rq_num);
> + in = kzalloc(inlen, GFP_KERNEL);
> + if (!in)
> + return -ENOMEM;
> +
> + MLX5_SET(create_rqt_in, in, uid, ndev->mvdev.res.uid);
> + rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context);
> +
> + MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
> + MLX5_SET(rqtc, rqtc, rqt_max_size, 1 << log_max_rqt);
> + MLX5_SET(rqtc, rqtc, rqt_actual_size, 1);
> + list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]);
> + for (i = 0, j = 0; j < ndev->mvdev.max_vqs; j++) {
> + if (!ndev->vqs[j].initialized)
> + continue;
> +
> + if (!vq_is_tx(ndev->vqs[j].index)) {
> + list[i] = cpu_to_be32(ndev->vqs[j].virtq_id);
> + i++;
> + }
> + }
> +
> + err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn);
> + kfree(in);
> + if (err)
> + return err;
> +
> + return 0;
> +}
> +
> +static void destroy_rqt(struct mlx5_vdpa_net *ndev)
> +{
> + mlx5_vdpa_destroy_rqt(&ndev->mvdev, ndev->res.rqtn);
> +}
> +
> +static int create_tir(struct mlx5_vdpa_net *ndev)
> +{
> +#define HASH_IP_L4PORTS \
> + (MLX5_HASH_FIELD_SEL_SRC_IP | MLX5_HASH_FIELD_SEL_DST_IP | MLX5_HASH_FIELD_SEL_L4_SPORT | \
> + MLX5_HASH_FIELD_SEL_L4_DPORT)
> + static const u8 rx_hash_toeplitz_key[] = { 0x2c, 0xc6, 0x81, 0xd1, 0x5b, 0xdb, 0xf4, 0xf7,
> + 0xfc, 0xa2, 0x83, 0x19, 0xdb, 0x1a, 0x3e, 0x94,
> + 0x6b, 0x9e, 0x38, 0xd9, 0x2c, 0x9c, 0x03, 0xd1,
> + 0xad, 0x99, 0x44, 0xa7, 0xd9, 0x56, 0x3d, 0x59,
> + 0x06, 0x3c, 0x25, 0xf3, 0xfc, 0x1f, 0xdc, 0x2a };
> + void *rss_key;
> + void *outer;
> + void *tirc;
> + void *in;
> + int err;
> +
> + in = kzalloc(MLX5_ST_SZ_BYTES(create_tir_in), GFP_KERNEL);
> + if (!in)
> + return -ENOMEM;
> +
> + MLX5_SET(create_tir_in, in, uid, ndev->mvdev.res.uid);
> + tirc = MLX5_ADDR_OF(create_tir_in, in, ctx);
> + MLX5_SET(tirc, tirc, disp_type, MLX5_TIRC_DISP_TYPE_INDIRECT);
> +
> + MLX5_SET(tirc, tirc, rx_hash_symmetric, 1);
> + MLX5_SET(tirc, tirc, rx_hash_fn, MLX5_RX_HASH_FN_TOEPLITZ);
> + rss_key = MLX5_ADDR_OF(tirc, tirc, rx_hash_toeplitz_key);
> + memcpy(rss_key, rx_hash_toeplitz_key, sizeof(rx_hash_toeplitz_key));
> +
> + outer = MLX5_ADDR_OF(tirc, tirc, rx_hash_field_selector_outer);
> + MLX5_SET(rx_hash_field_select, outer, l3_prot_type, MLX5_L3_PROT_TYPE_IPV4);
> + MLX5_SET(rx_hash_field_select, outer, l4_prot_type, MLX5_L4_PROT_TYPE_TCP);
> + MLX5_SET(rx_hash_field_select, outer, selected_fields, HASH_IP_L4PORTS);
> +
> + MLX5_SET(tirc, tirc, indirect_table, ndev->res.rqtn);
> + MLX5_SET(tirc, tirc, transport_domain, ndev->res.tdn);
> +
> + err = mlx5_vdpa_create_tir(&ndev->mvdev, in, &ndev->res.tirn);
> + kfree(in);
> + return err;
> +}
> +
> +static void destroy_tir(struct mlx5_vdpa_net *ndev)
> +{
> + mlx5_vdpa_destroy_tir(&ndev->mvdev, ndev->res.tirn);
> +}
> +
> +static int add_fwd_to_tir(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_flow_destination dest[2] = {};
> + struct mlx5_flow_table_attr ft_attr = {};
> + struct mlx5_flow_act flow_act = {};
> + struct mlx5_flow_namespace *ns;
> + int err;
> +
> + /* for now, one entry, match all, forward to tir */
> + ft_attr.max_fte = 1;
> + ft_attr.autogroup.max_num_groups = 1;
> +
> + ns = mlx5_get_flow_namespace(ndev->mvdev.mdev, MLX5_FLOW_NAMESPACE_BYPASS);
> + if (!ns) {
> + mlx5_vdpa_warn(&ndev->mvdev, "get flow namespace\n");
> + return -EOPNOTSUPP;
> + }
> +
> + ndev->rxft = mlx5_create_auto_grouped_flow_table(ns, &ft_attr);
> + if (IS_ERR(ndev->rxft))
> + return PTR_ERR(ndev->rxft);
> +
> + ndev->rx_counter = mlx5_fc_create(ndev->mvdev.mdev, false);
> + if (IS_ERR(ndev->rx_counter)) {
> + err = PTR_ERR(ndev->rx_counter);
> + goto err_fc;
> + }
> +
> + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_FWD_DEST | MLX5_FLOW_CONTEXT_ACTION_COUNT;
> + dest[0].type = MLX5_FLOW_DESTINATION_TYPE_TIR;
> + dest[0].tir_num = ndev->res.tirn;
> + dest[1].type = MLX5_FLOW_DESTINATION_TYPE_COUNTER;
> + dest[1].counter_id = mlx5_fc_id(ndev->rx_counter);
> + ndev->rx_rule = mlx5_add_flow_rules(ndev->rxft, NULL, &flow_act, dest, 2);
> + if (IS_ERR(ndev->rx_rule)) {
> + err = PTR_ERR(ndev->rx_rule);
> + ndev->rx_rule = NULL;
> + goto err_rule;
> + }
> +
> + return 0;
> +
> +err_rule:
> + mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
> +err_fc:
> + mlx5_destroy_flow_table(ndev->rxft);
> + return err;
> +}
> +
> +static void remove_fwd_to_tir(struct mlx5_vdpa_net *ndev)
> +{
> + if (!ndev->rx_rule)
> + return;
> +
> + mlx5_del_flow_rules(ndev->rx_rule);
> + mlx5_fc_destroy(ndev->mvdev.mdev, ndev->rx_counter);
> + mlx5_destroy_flow_table(ndev->rxft);
> +
> + ndev->rx_rule = NULL;
> +}
> +
> +static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> +
> + if (unlikely(!mvq->ready))
> + return;
> +
> + iowrite16(idx, ndev->mvdev.res.kick_addr);
> +}
> +
> +static int mlx5_vdpa_set_vq_address(struct vdpa_device *vdev, u16 idx, u64 desc_area,
> + u64 driver_area, u64 device_area)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> +
> + mvq->desc_addr = desc_area;
> + mvq->device_addr = device_area;
> + mvq->driver_addr = driver_area;
> + return 0;
> +}
> +
> +static void mlx5_vdpa_set_vq_num(struct vdpa_device *vdev, u16 idx, u32 num)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq;
> +
> + mvq = &ndev->vqs[idx];
> + mvq->num_ent = num;
> +}
> +
> +static void mlx5_vdpa_set_vq_cb(struct vdpa_device *vdev, u16 idx, struct vdpa_callback *cb)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *vq = &ndev->vqs[idx];
> +
> + vq->event_cb = *cb;
> +}
> +
> +static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> + int err;
> +
> + if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
> + err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
> + return;
> + }
> + }
I wonder what the reason is for changing the vq state on the hardware
here. I think we can defer it to set_status().
(Anyhow, we don't sync the vq address in set_vq_address().)
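E.g. only record the state here and let setup_vq(), which already
checks mvq->ready, sync it to the hardware when DRIVER_OK is set. A
rough sketch, not tested:

static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
{
	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);

	/* hardware state would be synced later, in setup_vq() */
	ndev->vqs[idx].ready = ready;
}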
> +
> + mvq->ready = ready;
> +}
> +
> +static bool mlx5_vdpa_get_vq_ready(struct vdpa_device *vdev, u16 idx)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> +
> + return mvq->ready;
> +}
> +
> +static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx,
> + const struct vdpa_vq_state *state)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> +
> + if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
> + mlx5_vdpa_warn(mvdev, "can't modify available index\n");
> + return -EINVAL;
> + }
> +
> + mvq->avail_idx = state->avail_index;
> + return 0;
> +}
> +
> +static void mlx5_vdpa_get_vq_state(struct vdpa_device *vdev, u16 idx, struct vdpa_vq_state *state)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> + struct mlx5_virtq_attr attr;
> + int err;
> +
> + if (!mvq->initialized)
> + goto not_ready;
> +
> + err = query_virtqueue(ndev, mvq, &attr);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to query virtqueue\n");
> + goto not_ready;
> + }
> + state->avail_index = attr.available_index;
> +
> +not_ready:
> + state->state = VQ_STATE_NOT_READY;
> +}
> +
> +static u32 mlx5_vdpa_get_vq_align(struct vdpa_device *vdev)
> +{
> + return PAGE_SIZE;
> +}
> +
> +enum { MLX5_VIRTIO_NET_F_GUEST_CSUM = 1 << 9,
> + MLX5_VIRTIO_NET_F_CSUM = 1 << 10,
> + MLX5_VIRTIO_NET_F_HOST_TSO6 = 1 << 11,
> + MLX5_VIRTIO_NET_F_HOST_TSO4 = 1 << 12,
> +};
> +
> +static u64 mlx_to_virtio_features(u16 dev_features)
> +{
> + u64 result = 0;
> +
> + if (dev_features & MLX5_VIRTIO_NET_F_GUEST_CSUM)
> + result |= BIT(VIRTIO_NET_F_GUEST_CSUM);
> + if (dev_features & MLX5_VIRTIO_NET_F_CSUM)
> + result |= BIT(VIRTIO_NET_F_CSUM);
> + if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO6)
> + result |= BIT(VIRTIO_NET_F_HOST_TSO6);
> + if (dev_features & MLX5_VIRTIO_NET_F_HOST_TSO4)
> + result |= BIT(VIRTIO_NET_F_HOST_TSO4);
> +
> + return result;
> +}
> +
> +static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + u16 dev_features;
> +
> + dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
> +	ndev->mvdev.mlx_features = mlx_to_virtio_features(dev_features);
> + if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
> + ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
This is interesting. This suggests !VIRTIO_F_VERSION_1 &&
VIRTIO_F_IOMMU_PLATFORM is valid. But the virtio spec doesn't allow
such a configuration.
So I think you need either:
1) Fail vDPA device probe when VERSION_1 is not supported
2) Clear IOMMU_PLATFORM if VERSION_1 is not negotiated
For 2) I guess it can't work: according to the spec, without
IOMMU_PLATFORM the device needs to use PA to access the memory.
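For 1), a minimal sketch of what I mean, placed early in
mlx5_vdpa_add_dev() (untested, and -EOPNOTSUPP is just a suggestion):

	/* virtio 1.0 support is mandatory; without it we must not
	 * offer VIRTIO_F_IOMMU_PLATFORM
	 */
	if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, virtio_version_1_0))
		return ERR_PTR(-EOPNOTSUPP);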
> + ndev->mvdev.mlx_features |= BIT(VIRTIO_F_ANY_LAYOUT);
> + ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);
> + if (mlx5_vdpa_max_qps(ndev->mvdev.max_vqs) > 1)
> + ndev->mvdev.mlx_features |= BIT(VIRTIO_NET_F_MQ);
> +
> + print_features(mvdev, ndev->mvdev.mlx_features, false);
> + return ndev->mvdev.mlx_features;
> +}
> +
> +static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> +{
> +	/* FIXME: qemu currently does not set all the features due to a bug.
> + * Add checks when this is fixed.
> + */
I think we should add the check now so qemu can get notified (e.g. for
IOMMU_PLATFORM).
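Something along these lines, perhaps (sketch only; which bits to
mandate is up to you, IOMMU_PLATFORM is just the obvious candidate):

static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
{
	if (!(features & BIT(VIRTIO_F_IOMMU_PLATFORM))) {
		mlx5_vdpa_warn(mvdev, "VIRTIO_F_IOMMU_PLATFORM is required\n");
		return -EOPNOTSUPP;
	}

	return 0;
}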
> + return 0;
> +}
> +
> +static int setup_virtqueues(struct mlx5_vdpa_net *ndev)
> +{
> + int err;
> + int i;
> +
> + for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); i++) {
> + err = setup_vq(ndev, &ndev->vqs[i]);
> + if (err)
> + goto err_vq;
> + }
> +
> + return 0;
> +
> +err_vq:
> + for (--i; i >= 0; i--)
> + teardown_vq(ndev, &ndev->vqs[i]);
> +
> + return err;
> +}
> +
> +static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_virtqueue *mvq;
> + int i;
> +
> + for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
> + mvq = &ndev->vqs[i];
> + if (!mvq->num_ent)
> + continue;
> +
> + teardown_vq(ndev, mvq);
> + }
> +}
> +
> +static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + int err;
> +
> + print_features(mvdev, features, true);
> +
> + err = verify_min_features(mvdev, features);
> + if (err)
> + return err;
> +
> + ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features;
> + return err;
> +}
> +
> +static void mlx5_vdpa_set_config_cb(struct vdpa_device *vdev, struct vdpa_callback *cb)
> +{
> + /* not implemented */
> + mlx5_vdpa_warn(to_mvdev(vdev), "set config callback not supported\n");
> +}
> +
> +#define MLX5_VDPA_MAX_VQ_ENTRIES 256
Is this a hardware limitation? qemu can support up to 1K, which is a
requirement of some NFV cases.
> +static u16 mlx5_vdpa_get_vq_num_max(struct vdpa_device *vdev)
> +{
> + return MLX5_VDPA_MAX_VQ_ENTRIES;
> +}
> +
> +static u32 mlx5_vdpa_get_device_id(struct vdpa_device *vdev)
> +{
> + return VIRTIO_ID_NET;
> +}
> +
> +static u32 mlx5_vdpa_get_vendor_id(struct vdpa_device *vdev)
> +{
> + return PCI_VENDOR_ID_MELLANOX;
> +}
> +
> +static u8 mlx5_vdpa_get_status(struct vdpa_device *vdev)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> +
> + print_status(mvdev, ndev->mvdev.status, false);
> + return ndev->mvdev.status;
> +}
> +
> +static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq)
> +{
> + struct mlx5_vq_restore_info *ri = &mvq->ri;
> + struct mlx5_virtq_attr attr;
> + int err;
> +
> + if (!mvq->initialized)
> + return 0;
> +
> + err = query_virtqueue(ndev, mvq, &attr);
> + if (err)
> + return err;
> +
> + ri->avail_index = attr.available_index;
> + ri->ready = mvq->ready;
> + ri->num_ent = mvq->num_ent;
> + ri->desc_addr = mvq->desc_addr;
> + ri->device_addr = mvq->device_addr;
> + ri->driver_addr = mvq->driver_addr;
> + ri->cb = mvq->event_cb;
> + ri->restore = true;
> + return 0;
> +}
> +
> +static int save_channels_info(struct mlx5_vdpa_net *ndev)
> +{
> + int i;
> +
> + for (i = 0; i < ndev->mvdev.max_vqs; i++) {
> + memset(&ndev->vqs[i].ri, 0, sizeof(ndev->vqs[i].ri));
> + save_channel_info(ndev, &ndev->vqs[i]);
> + }
> + return 0;
> +}
> +
> +static void mlx5_clear_vqs(struct mlx5_vdpa_net *ndev)
> +{
> + int i;
> +
> + for (i = 0; i < ndev->mvdev.max_vqs; i++)
> + memset(&ndev->vqs[i], 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
> +}
> +
> +static void restore_channels_info(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_virtqueue *mvq;
> + struct mlx5_vq_restore_info *ri;
> + int i;
> +
> + mlx5_clear_vqs(ndev);
> + init_mvqs(ndev);
> + for (i = 0; i < ndev->mvdev.max_vqs; i++) {
> + mvq = &ndev->vqs[i];
> + ri = &mvq->ri;
> + if (!ri->restore)
> + continue;
> +
> + mvq->avail_idx = ri->avail_index;
> + mvq->ready = ri->ready;
> + mvq->num_ent = ri->num_ent;
> + mvq->desc_addr = ri->desc_addr;
> + mvq->device_addr = ri->device_addr;
> + mvq->driver_addr = ri->driver_addr;
> + mvq->event_cb = ri->cb;
> + }
> +}
> +
> +static int mlx5_vdpa_change_map(struct mlx5_vdpa_net *ndev, struct vhost_iotlb *iotlb)
> +{
> + int err;
> +
> + suspend_vqs(ndev);
> + err = save_channels_info(ndev);
> + if (err)
> + goto err_mr;
> +
> + teardown_driver(ndev);
> + mlx5_vdpa_destroy_mr(&ndev->mvdev);
> + err = mlx5_vdpa_create_mr(&ndev->mvdev, iotlb);
> + if (err)
> + goto err_mr;
> +
> + restore_channels_info(ndev);
> + err = setup_driver(ndev);
> + if (err)
> + goto err_setup;
> +
> + return 0;
> +
> +err_setup:
> + mlx5_vdpa_destroy_mr(&ndev->mvdev);
> +err_mr:
> + return err;
> +}
> +
> +static int setup_driver(struct mlx5_vdpa_net *ndev)
> +{
> + int err;
> +
> + mutex_lock(&ndev->reslock);
> + if (ndev->setup) {
> +		mlx5_vdpa_warn(&ndev->mvdev, "setup_driver called for an already set up driver\n");
> + err = 0;
> + goto out;
> + }
> + err = setup_virtqueues(ndev);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "setup_virtqueues\n");
> + goto out;
> + }
> +
> + err = create_rqt(ndev);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "create_rqt\n");
> + goto err_rqt;
> + }
> +
> + err = create_tir(ndev);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "create_tir\n");
> + goto err_tir;
> + }
> +
> + err = add_fwd_to_tir(ndev);
> + if (err) {
> + mlx5_vdpa_warn(&ndev->mvdev, "add_fwd_to_tir\n");
> + goto err_fwd;
> + }
> + ndev->setup = true;
> + mutex_unlock(&ndev->reslock);
> +
> + return 0;
> +
> +err_fwd:
> + destroy_tir(ndev);
> +err_tir:
> + destroy_rqt(ndev);
> +err_rqt:
> + teardown_virtqueues(ndev);
> +out:
> + mutex_unlock(&ndev->reslock);
> + return err;
> +}
> +
> +static void teardown_driver(struct mlx5_vdpa_net *ndev)
> +{
> + mutex_lock(&ndev->reslock);
> + if (!ndev->setup)
> + goto out;
> +
> + remove_fwd_to_tir(ndev);
> + destroy_tir(ndev);
> + destroy_rqt(ndev);
> + teardown_virtqueues(ndev);
> + ndev->setup = false;
> +out:
> + mutex_unlock(&ndev->reslock);
> +}
> +
> +static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + int err;
> +
> + print_status(mvdev, status, true);
> + if (!status) {
> + mlx5_vdpa_info(mvdev, "performing device reset\n");
> + teardown_driver(ndev);
> + mlx5_vdpa_destroy_mr(&ndev->mvdev);
> + ndev->mvdev.status = 0;
> + ndev->mvdev.mlx_features = 0;
> + ++mvdev->generation;
> + return;
> + }
> +
> + if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) {
> + if (status & VIRTIO_CONFIG_S_DRIVER_OK) {
> + err = setup_driver(ndev);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "failed to setup driver\n");
> + goto err_setup;
> + }
> + } else {
> + mlx5_vdpa_warn(mvdev, "did not expect DRIVER_OK to be cleared\n");
> + return;
> + }
> + }
> +
> + ndev->mvdev.status = status;
> + return;
> +
> +err_setup:
> + mlx5_vdpa_destroy_mr(&ndev->mvdev);
> + ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED;
> +}
> +
> +static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
> + unsigned int len)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> +
> +	if (offset + len <= sizeof(struct virtio_net_config))
> +		memcpy(buf, (u8 *)&ndev->config + offset, len);
Note that the guest expects LE; what's the endianness of ndev->config?
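If the device is only ever operated as a virtio 1.0 device (see the
VERSION_1 discussion above), the config space could simply be kept LE
at fill time. A sketch, assuming mlx5_query_nic_vport_mtu() returns a
CPU-endian value:

	u16 mtu;

	err = mlx5_query_nic_vport_mtu(mdev, &mtu);
	if (err)
		goto err_mtu;

	/* virtio 1.0 config space is little-endian */
	config->mtu = cpu_to_le16(mtu);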
> +}
> +
> +static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
> + unsigned int len)
> +{
> + /* not supported */
> +}
> +
> +static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> +
> + return mvdev->generation;
> +}
> +
> +static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
> +{
> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> + bool change_map;
> + int err;
> +
> + err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
> + if (err) {
> + mlx5_vdpa_warn(mvdev, "set map failed(%d)\n", err);
> + return err;
> + }
> +
> + if (change_map)
> + return mlx5_vdpa_change_map(ndev, iotlb);
Any reason for not doing this inside mlx5_vdpa_handle_set_map()?
> +
> + return 0;
> +}
> +
> +static void mlx5_vdpa_free(struct vdpa_device *vdev)
> +{
> + /* not used */
> +}
> +
> +static const struct vdpa_config_ops mlx5_vdpa_ops = {
> + .set_vq_address = mlx5_vdpa_set_vq_address,
> + .set_vq_num = mlx5_vdpa_set_vq_num,
> + .kick_vq = mlx5_vdpa_kick_vq,
> + .set_vq_cb = mlx5_vdpa_set_vq_cb,
> + .set_vq_ready = mlx5_vdpa_set_vq_ready,
> + .get_vq_ready = mlx5_vdpa_get_vq_ready,
> + .set_vq_state = mlx5_vdpa_set_vq_state,
> + .get_vq_state = mlx5_vdpa_get_vq_state,
> + .get_vq_align = mlx5_vdpa_get_vq_align,
> + .get_features = mlx5_vdpa_get_features,
> + .set_features = mlx5_vdpa_set_features,
> + .set_config_cb = mlx5_vdpa_set_config_cb,
> + .get_vq_num_max = mlx5_vdpa_get_vq_num_max,
> + .get_device_id = mlx5_vdpa_get_device_id,
> + .get_vendor_id = mlx5_vdpa_get_vendor_id,
> + .get_status = mlx5_vdpa_get_status,
> + .set_status = mlx5_vdpa_set_status,
> + .get_config = mlx5_vdpa_get_config,
> + .set_config = mlx5_vdpa_set_config,
> + .get_generation = mlx5_vdpa_get_generation,
> + .set_map = mlx5_vdpa_set_map,
> + .free = mlx5_vdpa_free,
> +};
> +
> +static int alloc_resources(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_net_resources *res = &ndev->res;
> + int err;
> +
> + if (res->valid) {
> + mlx5_vdpa_warn(&ndev->mvdev, "resources already allocated\n");
> + return -EEXIST;
> + }
> +
> + err = mlx5_vdpa_alloc_transport_domain(&ndev->mvdev, &res->tdn);
> + if (err)
> + return err;
> +
> + err = create_tis(ndev);
> + if (err)
> + goto err_tis;
> +
> + res->valid = true;
> +
> + return 0;
> +
> +err_tis:
> + mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
> + return err;
> +}
> +
> +static void free_resources(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_net_resources *res = &ndev->res;
> +
> + if (!res->valid)
> + return;
> +
> + destroy_tis(ndev);
> + mlx5_vdpa_dealloc_transport_domain(&ndev->mvdev, res->tdn);
> + res->valid = false;
> +}
> +
> +static void init_mvqs(struct mlx5_vdpa_net *ndev)
> +{
> + struct mlx5_vdpa_virtqueue *mvq;
> + int i;
> +
> + for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); ++i) {
> + mvq = &ndev->vqs[i];
> + memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
> + mvq->index = i;
> + mvq->ndev = ndev;
> + mvq->fwqp.fw = true;
> + }
> + for (; i < ndev->mvdev.max_vqs; i++) {
> + mvq = &ndev->vqs[i];
> + memset(mvq, 0, offsetof(struct mlx5_vdpa_virtqueue, ri));
> + mvq->index = i;
> + mvq->ndev = ndev;
> + }
> +}
> +
> +void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev)
> +{
> + struct virtio_net_config *config;
> + struct mlx5_vdpa_dev *mvdev;
> + struct mlx5_vdpa_net *ndev;
> + u32 max_vqs;
> + int err;
> +
> +	/* we reserve one virtqueue for the control virtqueue, should we require it */
> + max_vqs = MLX5_CAP_DEV_VDPA_EMULATION(mdev, max_num_virtio_queues);
> + max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS);
> +
> + ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops,
> + 2 * mlx5_vdpa_max_qps(max_vqs));
> + if (IS_ERR(ndev))
> + return ndev;
> +
> + ndev->mvdev.max_vqs = max_vqs;
> + mvdev = &ndev->mvdev;
> + mvdev->mdev = mdev;
> + init_mvqs(ndev);
> + mutex_init(&ndev->reslock);
> + config = &ndev->config;
> + err = mlx5_query_nic_vport_mtu(mdev, &config->mtu);
> + if (err)
> + goto err_mtu;
> +
> + err = mlx5_query_nic_vport_mac_address(mdev, 0, 0, config->mac);
> + if (err)
> + goto err_mtu;
> +
> + mvdev->vdev.dma_dev = mdev->device;
> + err = mlx5_vdpa_alloc_resources(&ndev->mvdev);
> + if (err)
> + goto err_mtu;
> +
> + err = alloc_resources(ndev);
> + if (err)
> + goto err_res;
> +
> + err = vdpa_register_device(&mvdev->vdev);
> + if (err)
> + goto err_reg;
> +
> + return ndev;
> +
> +err_reg:
> + free_resources(ndev);
> +err_res:
> + mlx5_vdpa_free_resources(&ndev->mvdev);
> +err_mtu:
> + mutex_destroy(&ndev->reslock);
> + put_device(&mvdev->vdev.dev);
> + return ERR_PTR(err);
> +}
> +
> +void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
> +{
> + struct mlx5_vdpa_net *ndev;
> +
> + mvdev->status = 0;
This is probably unnecessary.
Thanks
> + ndev = container_of(mvdev, struct mlx5_vdpa_net, mvdev);
> + vdpa_unregister_device(&ndev->mvdev.vdev);
> + free_resources(ndev);
> + mlx5_vdpa_free_resources(&ndev->mvdev);
> + mutex_destroy(&ndev->reslock);
> + put_device(&mvdev->vdev.dev);
> +}
> diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.h b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> new file mode 100644
> index 000000000000..38bb9adaadc1
> --- /dev/null
> +++ b/drivers/vdpa/mlx5/net/mlx5_vnet.h
> @@ -0,0 +1,32 @@
> +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
> +/* Copyright (c) 2020 Mellanox Technologies Ltd. */
> +
> +#ifndef __MLX5_VNET_H_
> +#define __MLX5_VNET_H_
> +
> +#include <linux/vdpa.h>
> +#include <linux/virtio_net.h>
> +#include <linux/vringh.h>
> +#include <linux/mlx5/driver.h>
> +#include <linux/mlx5/cq.h>
> +#include <linux/mlx5/qp.h>
> +#include "../core/mlx5_vdpa.h"
> +
> +/* we want to have one virtqueue reserved for the control queue. We do this
> + * reservation only if we have more than two vqs available.
> + */
> +static inline u32 mlx5_vdpa_max_qps(int max_vqs)
> +{
> + if (max_vqs > 2)
> + return (max_vqs - 1) / 2;
> +
> + return max_vqs / 2;
> +}
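(For example, with max_vqs = 16 this yields (16 - 1) / 2 = 7 data queue
pairs, leaving one vq reserved for a future control queue.)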
> +
> +#define to_mlx5_vdpa_ndev(__mvdev) container_of(__mvdev, struct mlx5_vdpa_net, mvdev)
> +void *mlx5_vdpa_add_dev(struct mlx5_core_dev *mdev);
> +void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev);
> +int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb);
> +void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev);
> +
> +#endif /* __MLX5_VNET_H_ */
On 2020/7/16 4:28 PM, Eli Cohen wrote:
> On Thu, Jul 16, 2020 at 04:13:21PM +0800, Jason Wang wrote:
>> On 2020/7/16 3:23 PM, Eli Cohen wrote:
>>> Add code to support registering guest's memory region for the device.
>>
>> It would be better to use "userspace" memory here since vhost-vDPA
>> could be used by e.g dpdk application on the host in the future.
>>
> How about replacing "guest's memory" with "address space"? It is more
> general and aligns with the fact that the virtio driver can run in
> the guest's kernel.
Probably but note that guest driver is not the only user for this. It
could be either:
1) Guest virtio driver
2) Userspace virtio driver on the host
3) Kernel virtio driver on the host.
Thanks
>
On 2020/7/16 4:21 PM, Eli Cohen wrote:
> On Thu, Jul 16, 2020 at 04:11:00PM +0800, Jason Wang wrote:
>> On 2020/7/16 3:23 PM, Eli Cohen wrote:
>>> Currently, get_vq_state() is used only to pass the available index value
>>> of a vq. Extend the struct to return status on the VQ to the caller.
>>> For now, define VQ_STATE_NOT_READY. In the future it will be extended to
>>> include other information.
>>>
>>> Modify current vdpa driver to update this field.
>>>
>>> Reviewed-by: Parav Pandit<[email protected]>
>>> Signed-off-by: Eli Cohen<[email protected]>
>> What's the difference between this and get_vq_ready()?
>>
>> Thanks
>>
> There is no difference. It is just a way to communicate a problem with
> the state of the VQ back to the caller. This is not available now.
> I think an asynchronous mechanism would be preferred but that is not
> available currently.
I still don't see the reason; maybe you can give me an example?
Thanks
Hi Eli,
Thank you for the patch! Perhaps something to improve:
[auto build test WARNING on next-20200715]
url: https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200716-155039
base: ca0e494af5edb59002665bf12871e94b4163a257
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=i386
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All warnings (new ones prefixed by >>):
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:78:48: note: in expansion of macro '__mlx5_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
drivers/vdpa/mlx5/core/mr.c: At top level:
>> drivers/vdpa/mlx5/core/mr.c:414:5: warning: no previous prototype for 'mlx5_vdpa_create_mr' [-Wmissing-prototypes]
414 | int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
| ^~~~~~~~~~~~~~~~~~~
>> drivers/vdpa/mlx5/core/mr.c:425:6: warning: no previous prototype for 'mlx5_vdpa_destroy_mr' [-Wmissing-prototypes]
425 | void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
| ^~~~~~~~~~~~~~~~~~~~
vim +/mlx5_vdpa_create_mr +414 drivers/vdpa/mlx5/core/mr.c
413
> 414 int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
415 {
416 struct mlx5_vdpa_mr *mr = &mvdev->mr;
417 int err;
418
419 mutex_lock(&mr->mkey_mtx);
420 err = _mlx5_vdpa_create_mr(mvdev, iotlb);
421 mutex_unlock(&mr->mkey_mtx);
422 return err;
423 }
424
> 425 void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
426 {
427 struct mlx5_vdpa_mr *mr = &mvdev->mr;
428 struct mlx5_vdpa_direct_mr *dmr;
429 struct mlx5_vdpa_direct_mr *n;
430
431 mutex_lock(&mr->mkey_mtx);
432 if (!mr->initialized)
433 goto out;
434
435 destroy_indirect_key(mvdev, mr);
436 list_for_each_entry_safe_reverse(dmr, n, &mr->head, list) {
437 list_del_init(&dmr->list);
438 unmap_direct_mr(mvdev, dmr);
439 kfree(dmr);
440 }
441 memset(mr, 0, sizeof(*mr));
442 mr->initialized = false;
443 out:
444 mutex_unlock(&mr->mkey_mtx);
445 }
446
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Thu, Jul 16, 2020 at 05:35:18PM +0800, Jason Wang wrote:
>
> On 2020/7/16 4:21 PM, Eli Cohen wrote:
> >On Thu, Jul 16, 2020 at 04:11:00PM +0800, Jason Wang wrote:
> >>On 2020/7/16 3:23 PM, Eli Cohen wrote:
> >>>Currently, get_vq_state() is used only to pass the available index value
> >>>of a vq. Extend the struct to return status on the VQ to the caller.
> >>>For now, define VQ_STATE_NOT_READY. In the future it will be extended to
> >>>include other information.
> >>>
> >>>Modify current vdpa driver to update this field.
> >>>
> >>>Reviewed-by: Parav Pandit<[email protected]>
> >>>Signed-off-by: Eli Cohen<[email protected]>
> >>What's the difference between this and get_vq_ready()?
> >>
> >>Thanks
> >>
> >There is no difference. It is just a way to communicate a problem with
> >the state of the VQ back to the caller. This is not available now.
> >I think an asynchronous mechanism is preferred but that is not available
> >currently.
>
>
> I still don't see the reason; maybe you can give me an example?
>
>
My intention was to provide a mechanism to return meaningful information
on the state of the vq, for example when the driver fails to get the
state of the VQ.
Maybe I could just change the prototype of the function to return int,
so the driver could return an error if it has trouble reading the vq
state.
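A rough sketch of what I have in mind (the struct layout and member
names here are illustrative assumptions, not code from this series):

struct vdpa_vq_state {
	u16 avail_index;	/* available index of the vq */
};

/* in struct vdpa_config_ops: return 0 on success or a negative errno
 * if the driver cannot retrieve the vq state from the device
 */
int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
		    struct vdpa_vq_state *state);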
Hi Eli,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on next-20200715]
url: https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200716-155039
base: ca0e494af5edb59002665bf12871e94b4163a257
config: mips-allyesconfig (attached as .config)
compiler: mips-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
In file included from include/linux/mlx5/driver.h:52,
from drivers/vdpa/mlx5/core/resources.c:4:
drivers/vdpa/mlx5/core/resources.c: In function 'create_uctx':
>> include/linux/mlx5/device.h:65:36: error: invalid application of 'sizeof' to incomplete type 'struct mlx5_ifc_create_uctx_out_bits'
65 | #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
| ^~~~~~
drivers/vdpa/mlx5/core/resources.c:52:10: note: in expansion of macro 'MLX5_ST_SZ_DW'
52 | u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
| ^~~~~~~~~~~~~
>> drivers/vdpa/mlx5/core/resources.c:52:44: error: empty scalar initializer
52 | u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
| ^
drivers/vdpa/mlx5/core/resources.c:52:44: note: (near initialization for 'out')
In file included from include/linux/byteorder/big_endian.h:5,
from arch/mips/include/uapi/asm/byteorder.h:13,
from arch/mips/include/asm/bitops.h:20,
from include/linux/bitops.h:29,
from include/linux/kernel.h:12,
from include/linux/mlx5/driver.h:36,
from drivers/vdpa/mlx5/core/resources.c:4:
>> include/linux/compiler_types.h:135:35: error: invalid use of undefined type 'struct mlx5_ifc_create_uctx_out_bits'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:40:51: note: in definition of macro '__be32_to_cpu'
40 | #define __be32_to_cpu(x) ((__force __u32)(__be32)(x))
| ^
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:53:34: note: in expansion of macro '__mlx5_bit_off'
53 | #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:96:1: note: in expansion of macro '__mlx5_dw_off'
96 | __mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
| ^~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:72:10: note: in expansion of macro 'MLX5_GET'
72 | *uid = MLX5_GET(create_uctx_out, out, uid);
| ^~~~~~~~
In file included from include/linux/mlx5/driver.h:52,
from drivers/vdpa/mlx5/core/resources.c:4:
>> include/linux/mlx5/device.h:50:57: error: dereferencing pointer to incomplete type 'struct mlx5_ifc_create_uctx_out_bits'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:96:30: note: in expansion of macro '__mlx5_dw_bit_off'
96 | __mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:72:10: note: in expansion of macro 'MLX5_GET'
72 | *uid = MLX5_GET(create_uctx_out, out, uid);
| ^~~~~~~~
In file included from <command-line>:
>> include/linux/compiler_types.h:135:35: error: invalid use of undefined type 'struct mlx5_ifc_create_uctx_out_bits'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:96:30: note: in expansion of macro '__mlx5_dw_bit_off'
96 | __mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:72:10: note: in expansion of macro 'MLX5_GET'
72 | *uid = MLX5_GET(create_uctx_out, out, uid);
| ^~~~~~~~
drivers/vdpa/mlx5/core/resources.c:52:6: warning: unused variable 'out' [-Wunused-variable]
52 | u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
| ^~~
In file included from include/linux/mlx5/driver.h:52,
from drivers/vdpa/mlx5/core/resources.c:4:
drivers/vdpa/mlx5/core/resources.c: In function 'destroy_uctx':
>> include/linux/mlx5/device.h:65:36: error: invalid application of 'sizeof' to incomplete type 'struct mlx5_ifc_destroy_uctx_out_bits'
65 | #define MLX5_ST_SZ_DW(typ) (sizeof(struct mlx5_ifc_##typ##_bits) / 32)
| ^~~~~~
drivers/vdpa/mlx5/core/resources.c:79:10: note: in expansion of macro 'MLX5_ST_SZ_DW'
79 | u32 out[MLX5_ST_SZ_DW(destroy_uctx_out)] = {};
| ^~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:79:45: error: empty scalar initializer
79 | u32 out[MLX5_ST_SZ_DW(destroy_uctx_out)] = {};
| ^
drivers/vdpa/mlx5/core/resources.c:79:45: note: (near initialization for 'out')
drivers/vdpa/mlx5/core/resources.c:79:6: warning: unused variable 'out' [-Wunused-variable]
79 | u32 out[MLX5_ST_SZ_DW(destroy_uctx_out)] = {};
| ^~~
drivers/vdpa/mlx5/core/resources.c: At top level:
drivers/vdpa/mlx5/core/resources.c:184:5: warning: no previous prototype for 'mlx5_vdpa_create_mkey' [-Wmissing-prototypes]
184 | int mlx5_vdpa_create_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey, u32 *in,
| ^~~~~~~~~~~~~~~~~~~~~
In file included from <command-line>:
drivers/vdpa/mlx5/core/resources.c: In function 'mlx5_vdpa_create_mkey':
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:53:34: note: in expansion of macro '__mlx5_bit_off'
53 | #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:76:20: note: in expansion of macro '__mlx5_dw_off'
76 | *((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
| ^~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
In file included from include/linux/byteorder/big_endian.h:5,
from arch/mips/include/uapi/asm/byteorder.h:13,
from arch/mips/include/asm/bitops.h:20,
from include/linux/bitops.h:29,
from include/linux/kernel.h:12,
from include/linux/mlx5/driver.h:36,
from drivers/vdpa/mlx5/core/resources.c:4:
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/byteorder/generic.h:95:21: note: in expansion of macro '__be32_to_cpu'
95 | #define be32_to_cpu __be32_to_cpu
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:53:34: note: in expansion of macro '__mlx5_bit_off'
53 | #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:77:45: note: in expansion of macro '__mlx5_dw_off'
77 | cpu_to_be32((be32_to_cpu(*((__be32 *)(p) + __mlx5_dw_off(typ, fld))) & \
| ^~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:58:35: note: in expansion of macro '__mlx5_mask'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:78:48: note: in expansion of macro '__mlx5_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:193:2: note: in expansion of macro 'MLX5_SET'
193 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
drivers/vdpa/mlx5/core/resources.c: At top level:
drivers/vdpa/mlx5/core/resources.c:208:5: warning: no previous prototype for 'mlx5_vdpa_destroy_mkey' [-Wmissing-prototypes]
208 | int mlx5_vdpa_destroy_mkey(struct mlx5_vdpa_dev *mvdev, struct mlx5_core_mkey *mkey)
| ^~~~~~~~~~~~~~~~~~~~~~
In file included from <command-line>:
drivers/vdpa/mlx5/core/resources.c: In function 'mlx5_vdpa_destroy_mkey':
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:53:34: note: in expansion of macro '__mlx5_bit_off'
53 | #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:76:20: note: in expansion of macro '__mlx5_dw_off'
76 | *((__be32 *)(p) + __mlx5_dw_off(typ, fld)) = \
| ^~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
In file included from include/linux/byteorder/big_endian.h:5,
from arch/mips/include/uapi/asm/byteorder.h:13,
from arch/mips/include/asm/bitops.h:20,
from include/linux/bitops.h:29,
from include/linux/kernel.h:12,
from include/linux/mlx5/driver.h:36,
from drivers/vdpa/mlx5/core/resources.c:4:
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/byteorder/generic.h:95:21: note: in expansion of macro '__be32_to_cpu'
95 | #define be32_to_cpu __be32_to_cpu
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:53:34: note: in expansion of macro '__mlx5_bit_off'
53 | #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:77:45: note: in expansion of macro '__mlx5_dw_off'
77 | cpu_to_be32((be32_to_cpu(*((__be32 *)(p) + __mlx5_dw_off(typ, fld))) & \
| ^~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:58:35: note: in expansion of macro '__mlx5_mask'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
>> include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:78:48: note: in expansion of macro '__mlx5_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_destroy_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/byteorder/big_endian.h:39:51: note: in definition of macro '__cpu_to_be32'
39 | #define __cpu_to_be32(x) ((__force __be32)(__u32)(x))
| ^
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/resources.c:212:2: note: in expansion of macro 'MLX5_SET'
212 | MLX5_SET(destroy_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
vim +52 drivers/vdpa/mlx5/core/resources.c
49
50 static int create_uctx(struct mlx5_vdpa_dev *mvdev, u16 *uid)
51 {
> 52 u32 out[MLX5_ST_SZ_DW(create_uctx_out)] = {};
53 int inlen;
54 void *in;
55 int err;
56
57 /* 0 means not supported */
58 if (!MLX5_CAP_GEN(mvdev->mdev, log_max_uctx))
59 return -EOPNOTSUPP;
60
61 inlen = MLX5_ST_SZ_BYTES(create_uctx_in);
62 in = kzalloc(inlen, GFP_KERNEL);
63 if (!in)
64 return -ENOMEM;
65
66 MLX5_SET(create_uctx_in, in, opcode, MLX5_CMD_OP_CREATE_UCTX);
67 MLX5_SET(create_uctx_in, in, uctx.cap, MLX5_UCTX_CAP_RAW_TX);
68
69 err = mlx5_cmd_exec(mvdev->mdev, in, inlen, out, sizeof(out));
70 kfree(in);
71 if (!err)
72 *uid = MLX5_GET(create_uctx_out, out, uid);
73
74 return err;
75 }
76
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On Thu, Jul 16, 2020 at 05:14:32PM +0800, Jason Wang wrote:
>
> >+static void suspend_vqs(struct mlx5_vdpa_net *ndev)
> >+{
> >+ int i;
> >+
> >+ for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
> >+ suspend_vq(ndev, &ndev->vqs[i]);
>
>
> In teardown_virtqueues() there is a check of mvq->num_ent; any reason
> not to do it here?
>
Looks like checking the initialized flag is enough. Will fix this.
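For example (a minimal sketch; it assumes the per-vq initialized flag
used elsewhere in this patch):

static void suspend_vqs(struct mlx5_vdpa_net *ndev)
{
	int i;

	for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++) {
		/* skip virtqueues that were never set up */
		if (ndev->vqs[i].initialized)
			suspend_vq(ndev, &ndev->vqs[i]);
	}
}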
> >+
> >+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
> >+{
> >+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> >+ int err;
> >+
> >+ if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
> >+ err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> >+ if (err) {
> >+ mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
> >+ return;
> >+ }
> >+ }
>
>
> I wonder what's the reason for changing the vq state on the hardware
> here. I think we can defer it to set_status().
>
I can defer this to set_status().
I just wonder whether the core vdpa driver may call this function with
ready equal to false and, some time later, call it again with ready
equal to true.
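Deferring it would look roughly like this sketch (an assumption, not
the posted code): only the bookkeeping happens here, and the hardware
state change moves to the DRIVER_OK path in set_status().

static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
{
	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);

	/* record the request only; the virtqueue object is moved to
	 * the RDY state later, when set_status() sees DRIVER_OK
	 */
	ndev->vqs[idx].ready = ready;
}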
> (Anyhow we don't sync vq address in set_vq_address()).
>
>
> >+static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
> >+{
> >+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >+ u16 dev_features;
> >+
> >+ dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
> >+ ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
> >+ if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
> >+ ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
>
>
> This is interesting. This suggests !VIRTIO_F_VERSION_1 &&
> VIRTIO_F_IOMMU_PLATFORM is valid. But the virtio spec doesn't allow such
> a configuration.
Will fix
>
> So I think you need either:
>
> 1) Fail vDPA device probe when VERSION_1 is not supported
> 2) clear IOMMU_PLATFORM if VERSION_1 is not negotiated
>
> For 2) I guess it can't work; according to the spec, without
> IOMMU_PLATFORM the device needs to use PA to access the memory
>
>
> >+ ndev->mvdev.mlx_features |= BIT(VIRTIO_F_ANY_LAYOUT);
I think this should be removed too.
> >+ ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);
> >+ if (mlx5_vdpa_max_qps(ndev->mvdev.max_vqs) > 1)
> >+ ndev->mvdev.mlx_features |= BIT(VIRTIO_NET_F_MQ);
Also this, since multiqueue requires a configuration (control) vq, which
is not supported in this version.
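Taken together, the fixes discussed above would look roughly like the
sketch below (where exactly the probe-time check lives is my
assumption):

	/* in the probe path: fail when virtio 1.0 is not supported */
	if (!MLX5_CAP_DEV_VDPA_EMULATION(mdev, virtio_version_1_0))
		return ERR_PTR(-EOPNOTSUPP);

	/* in mlx5_vdpa_get_features(): ANY_LAYOUT and MQ dropped */
	ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
	ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
	ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);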
> >+
> >+ print_features(mvdev, ndev->mvdev.mlx_features, false);
> >+ return ndev->mvdev.mlx_features;
> >+}
> >+
> >+static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
> >+{
> >+ /* FIXME: qemu currently does not set all the feaures due to a bug.
> >+ * Add checks when this is fixed.
> >+ */
>
>
> I think we should add the check now so qemu can get notified (e.g.
> IOMMU_PLATFORM).
Will do.
>
>
> >+}
> >+
> >+#define MLX5_VDPA_MAX_VQ_ENTRIES 256
>
>
> Is this a hardware limitation? qemu can support up to 1K, which is the
> requirement of some NFV cases.
>
Theoretically the device limit is much higher. In the near future we
will have a device capability for this. I wanted to stay on the safe side
here, but I can change it if you think it's necessary.
>
> >+
> >+static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
> >+ unsigned int len)
> >+{
> >+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >+
> >+ if (offset + len < sizeof(struct virtio_net_config))
> >+ memcpy(buf, &ndev->config + offset, len);
>
>
> Note that the guest expects LE; what's the endianness of ndev->config?
This is struct virtio_net_config from include/uapi/linux/virtio_net.h.
Looking there I see it has mixed endianness. I currently don't touch it,
but if I do, I should follow the endianness guidance per the struct
definition. So I don't think I should care about endianness when copying.
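For reference, the layout in question (abridged from
include/uapi/linux/virtio_net.h; note the mix of raw bytes, __virtio16
and __le32 fields):

struct virtio_net_config {
	__u8 mac[ETH_ALEN];		/* plain byte array */
	__virtio16 status;		/* virtio endianness */
	__virtio16 max_virtqueue_pairs;
	__virtio16 mtu;
	__le32 speed;			/* fixed little endian */
	__u8 duplex;
} __attribute__((packed));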
>
>
> >+}
> >+
> >+static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
> >+ unsigned int len)
> >+{
> >+ /* not supported */
> >+}
> >+
> >+static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
> >+{
> >+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >+
> >+ return mvdev->generation;
> >+}
> >+
> >+static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
> >+{
> >+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >+ bool change_map;
> >+ int err;
> >+
> >+ err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
> >+ if (err) {
> >+ mlx5_vdpa_warn(mvdev, "set map failed(%d)\n", err);
> >+ return err;
> >+ }
> >+
> >+ if (change_map)
> >+ return mlx5_vdpa_change_map(ndev, iotlb);
>
>
> Any reason for not doing this inside mlx5_handle_set_map()?
>
All memory-registration-related operations are done inside mr.c in the
common code directory, but changing the map involves operations on other
objects managed in this file.
> >+
> >+void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
> >+{
> >+ struct mlx5_vdpa_net *ndev;
> >+
> >+ mvdev->status = 0;
>
>
> This is probably unnecessary.
>
Right, will remove.
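The resulting function would then be (a sketch, using the
to_mlx5_vdpa_ndev() helper from mlx5_vnet.h):

void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
{
	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);

	vdpa_unregister_device(&ndev->mvdev.vdev);
	free_resources(ndev);
	mlx5_vdpa_free_resources(&ndev->mvdev);
	mutex_destroy(&ndev->reslock);
	put_device(&mvdev->vdev.dev);
}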
Hi Eli,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on next-20200715]
url: https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200716-155039
base: ca0e494af5edb59002665bf12871e94b4163a257
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-14) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=i386
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All error/warnings (new ones prefixed by >>):
>> drivers/vdpa/mlx5/net/main.c:7:10: fatal error: mlx5_vdpa_ifc.h: No such file or directory
7 | #include "mlx5_vdpa_ifc.h"
| ^~~~~~~~~~~~~~~~~
compilation terminated.
--
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1192:2: note: in expansion of macro 'MLX5_SET'
1192 | MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
| ^~~~~~~~
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_rqtc_bits' has no member named 'list_q_type'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1192:2: note: in expansion of macro 'MLX5_SET'
1192 | MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_rqtc_bits' has no member named 'list_q_type'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:78:48: note: in expansion of macro '__mlx5_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1192:2: note: in expansion of macro 'MLX5_SET'
1192 | MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_rqtc_bits' has no member named 'list_q_type'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1192:2: note: in expansion of macro 'MLX5_SET'
1192 | MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
| ^~~~~~~~
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_rqtc_bits' has no member named 'list_q_type'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1192:2: note: in expansion of macro 'MLX5_SET'
1192 | MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q);
| ^~~~~~~~
>> drivers/vdpa/mlx5/net/mlx5_vnet.c:1171:6: warning: variable 'acutal_rqt' set but not used [-Wunused-but-set-variable]
1171 | int acutal_rqt;
| ^~~~~~~~~~
In file included from include/linux/swab.h:5,
from include/uapi/linux/byteorder/little_endian.h:13,
from include/linux/byteorder/little_endian.h:5,
from arch/x86/include/uapi/asm/byteorder.h:5,
from include/asm-generic/bitops/le.h:6,
from arch/x86/include/asm/bitops.h:395,
from include/linux/bitops.h:29,
from include/linux/kernel.h:12,
from include/linux/vdpa.h:5,
from drivers/vdpa/mlx5/net/mlx5_vnet.c:4:
drivers/vdpa/mlx5/net/mlx5_vnet.c: In function 'mlx5_vdpa_get_features':
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_device_virtio_emulation_cap_bits' has no member named 'device_features_bits_mask'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:115:54: note: in definition of macro '__swab32'
115 | #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
| ^
include/linux/byteorder/generic.h:95:21: note: in expansion of macro '__be32_to_cpu'
95 | #define be32_to_cpu __be32_to_cpu
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:53:34: note: in expansion of macro '__mlx5_bit_off'
53 | #define __mlx5_dw_off(typ, fld) (__mlx5_bit_off(typ, fld) / 32)
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:96:1: note: in expansion of macro '__mlx5_dw_off'
96 | __mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:1364:2: note: in expansion of macro 'MLX5_GET'
1364 | MLX5_GET(device_virtio_emulation_cap, \
| ^~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1469:17: note: in expansion of macro 'MLX5_CAP_DEV_VDPA_EMULATION'
1469 | dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from include/linux/mlx5/qp.h:36,
from drivers/vdpa/mlx5/net/mlx5_vnet.c:7:
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_device_virtio_emulation_cap_bits' has no member named 'device_features_bits_mask'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:96:30: note: in expansion of macro '__mlx5_dw_bit_off'
96 | __mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:1364:2: note: in expansion of macro 'MLX5_GET'
1364 | MLX5_GET(device_virtio_emulation_cap, \
| ^~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1469:17: note: in expansion of macro 'MLX5_CAP_DEV_VDPA_EMULATION'
1469 | dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from <command-line>:
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_device_virtio_emulation_cap_bits' has no member named 'device_features_bits_mask'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:96:30: note: in expansion of macro '__mlx5_dw_bit_off'
96 | __mlx5_dw_off(typ, fld))) >> __mlx5_dw_bit_off(typ, fld)) & \
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:1364:2: note: in expansion of macro 'MLX5_GET'
1364 | MLX5_GET(device_virtio_emulation_cap, \
| ^~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1469:17: note: in expansion of macro 'MLX5_CAP_DEV_VDPA_EMULATION'
1469 | dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from include/linux/mlx5/qp.h:36,
from drivers/vdpa/mlx5/net/mlx5_vnet.c:7:
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_device_virtio_emulation_cap_bits' has no member named 'device_features_bits_mask'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:97:1: note: in expansion of macro '__mlx5_mask'
97 | __mlx5_mask(typ, fld))
| ^~~~~~~~~~~
include/linux/mlx5/device.h:1364:2: note: in expansion of macro 'MLX5_GET'
1364 | MLX5_GET(device_virtio_emulation_cap, \
| ^~~~~~~~
drivers/vdpa/mlx5/net/mlx5_vnet.c:1469:17: note: in expansion of macro 'MLX5_CAP_DEV_VDPA_EMULATION'
1469 | dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from include/linux/swab.h:5,
from include/uapi/linux/byteorder/little_endian.h:13,
from include/linux/byteorder/little_endian.h:5,
from arch/x86/include/uapi/asm/byteorder.h:5,
vim +7 drivers/vdpa/mlx5/net/main.c
3
4 #include <linux/module.h>
5 #include <linux/mlx5/driver.h>
6 #include <linux/mlx5/device.h>
> 7 #include "mlx5_vdpa_ifc.h"
8 #include "mlx5_vnet.h"
9
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
Hi Eli,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on next-20200715]
url: https://github.com/0day-ci/linux/commits/Eli-Cohen/VDPA-support-for-Mellanox-ConnectX-devices/20200716-155039
base: ca0e494af5edb59002665bf12871e94b4163a257
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=ia64
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):
120 | __fswab32(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:58:60: note: in expansion of macro '__mlx5_dw_bit_off'
58 | #define __mlx5_dw_mask(typ, fld) (__mlx5_mask(typ, fld) << __mlx5_dw_bit_off(typ, fld))
| ^~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:78:10: note: in expansion of macro '__mlx5_dw_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/swab.h:120:12: note: in definition of macro '__swab32'
120 | __fswab32(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:57:47: note: in expansion of macro '__mlx5_bit_sz'
57 | #define __mlx5_mask(typ, fld) ((u32)((1ull << __mlx5_bit_sz(typ, fld)) - 1))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:78:48: note: in expansion of macro '__mlx5_mask'
78 | (~__mlx5_dw_mask(typ, fld))) | (((_v) & __mlx5_mask(typ, fld)) \
| ^~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/mlx5/device.h:50:57: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
50 | #define __mlx5_bit_sz(typ, fld) sizeof(__mlx5_nullp(typ)->fld)
| ^~
include/uapi/linux/swab.h:120:12: note: in definition of macro '__swab32'
120 | __fswab32(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:56:43: note: in expansion of macro '__mlx5_bit_sz'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
include/linux/compiler_types.h:135:35: error: 'struct mlx5_ifc_create_mkey_in_bits' has no member named 'uid'
135 | #define __compiler_offsetof(a, b) __builtin_offsetof(a, b)
| ^~~~~~~~~~~~~~~~~~
include/uapi/linux/swab.h:120:12: note: in definition of macro '__swab32'
120 | __fswab32(x))
| ^
include/linux/byteorder/generic.h:94:21: note: in expansion of macro '__cpu_to_be32'
94 | #define cpu_to_be32 __cpu_to_be32
| ^~~~~~~~~~~~~
include/linux/stddef.h:17:32: note: in expansion of macro '__compiler_offsetof'
17 | #define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
| ^~~~~~~~~~~~~~~~~~~
include/linux/mlx5/device.h:51:35: note: in expansion of macro 'offsetof'
51 | #define __mlx5_bit_off(typ, fld) (offsetof(struct mlx5_ifc_##typ##_bits, fld))
| ^~~~~~~~
include/linux/mlx5/device.h:56:70: note: in expansion of macro '__mlx5_bit_off'
56 | #define __mlx5_dw_bit_off(typ, fld) (32 - __mlx5_bit_sz(typ, fld) - (__mlx5_bit_off(typ, fld) & 0x1f))
| ^~~~~~~~~~~~~~
include/linux/mlx5/device.h:79:11: note: in expansion of macro '__mlx5_dw_bit_off'
79 | << __mlx5_dw_bit_off(typ, fld))); \
| ^~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:194:2: note: in expansion of macro 'MLX5_SET'
194 | MLX5_SET(create_mkey_in, in, uid, mvdev->res.uid);
| ^~~~~~~~
In file included from arch/ia64/include/asm/ptrace.h:46,
from arch/ia64/include/asm/processor.h:20,
from arch/ia64/include/asm/thread_info.h:12,
from include/linux/thread_info.h:38,
from include/asm-generic/preempt.h:5,
from ./arch/ia64/include/generated/asm/preempt.h:1,
from include/linux/preempt.h:78,
from include/linux/rcupdate.h:27,
from include/linux/rculist.h:11,
from include/linux/pid.h:5,
from include/linux/sched.h:14,
from include/linux/ratelimit.h:6,
from include/linux/dev_printk.h:16,
from include/linux/device.h:15,
from include/linux/vdpa.h:6,
from drivers/vdpa/mlx5/core/mr.c:4:
drivers/vdpa/mlx5/core/mr.c: In function 'map_direct_mr':
>> drivers/vdpa/mlx5/core/mr.c:254:21: error: implicit declaration of function '__phys_to_pfn'; did you mean 'page_to_pfn'? [-Werror=implicit-function-declaration]
254 | pg = pfn_to_page(__phys_to_pfn(pa));
| ^~~~~~~~~~~~~
arch/ia64/include/asm/page.h:108:40: note: in definition of macro 'pfn_to_page'
108 | # define pfn_to_page(pfn) (vmem_map + (pfn))
| ^~~
drivers/vdpa/mlx5/core/mr.c: At top level:
drivers/vdpa/mlx5/core/mr.c:414:5: warning: no previous prototype for 'mlx5_vdpa_create_mr' [-Wmissing-prototypes]
414 | int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb)
| ^~~~~~~~~~~~~~~~~~~
drivers/vdpa/mlx5/core/mr.c:425:6: warning: no previous prototype for 'mlx5_vdpa_destroy_mr' [-Wmissing-prototypes]
425 | void mlx5_vdpa_destroy_mr(struct mlx5_vdpa_dev *mvdev)
| ^~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
vim +254 drivers/vdpa/mlx5/core/mr.c
215
216 static int map_direct_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_direct_mr *mr,
217 struct vhost_iotlb *iotlb)
218 {
219 struct vhost_iotlb_map *map;
220 unsigned long lgcd = 0;
221 int log_entity_size;
222 unsigned long size;
223 u64 start = 0;
224 int err;
225 struct page *pg;
226 unsigned int nsg;
227 int sglen;
228 u64 pa;
229 u64 paend;
230 struct scatterlist *sg;
231 struct device *dma = mvdev->mdev->device;
232 int ret;
233
234 for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
235 map; map = vhost_iotlb_itree_next(map, start, mr->end - 1)) {
236 size = maplen(map, mr);
237 lgcd = gcd(lgcd, size);
238 start += size;
239 }
240 log_entity_size = ilog2(lgcd);
241
242 sglen = 1 << log_entity_size;
243 nsg = DIV_ROUND_UP(mr->end - mr->start, sglen);
244
245 err = sg_alloc_table(&mr->sg_head, nsg, GFP_KERNEL);
246 if (err)
247 return err;
248
249 sg = mr->sg_head.sgl;
250 for (map = vhost_iotlb_itree_first(iotlb, mr->start, mr->end - 1);
251 map; map = vhost_iotlb_itree_next(map, mr->start, mr->end - 1)) {
252 paend = map->addr + maplen(map, mr);
253 for (pa = map->addr; pa < paend; pa += sglen) {
> 254 pg = pfn_to_page(__phys_to_pfn(pa));
255 if (!sg) {
256 mlx5_vdpa_warn(mvdev, "sg null. start 0x%llx, end 0x%llx\n",
257 map->start, map->last + 1);
258 err = -ENOMEM;
259 goto err_map;
260 }
261 sg_set_page(sg, pg, sglen, 0);
262 sg = sg_next(sg);
263 if (!sg)
264 goto done;
265 }
266 }
267 done:
268 mr->log_size = log_entity_size;
269 mr->nsg = nsg;
270 ret = dma_map_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
271 if (!ret)
272 goto err_map;
273
274 err = create_direct_mr(mvdev, mr);
275 if (err)
276 goto err_direct;
277
278 return 0;
279
280 err_direct:
281 dma_unmap_sg_attrs(dma, mr->sg_head.sgl, mr->nsg, DMA_BIDIRECTIONAL, 0);
282 err_map:
283 sg_free_table(&mr->sg_head);
284 return err;
285 }
286
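The error flagged at line 254 above is an architecture portability issue: __phys_to_pfn() is not available to generic code on every arch (ia64 here). A minimal sketch of a portable alternative, assuming the generic PHYS_PFN() macro from include/linux/pfn.h is acceptable at this call site; this is an illustration, not the final fix:

#include <linux/pfn.h>	/* PHYS_PFN(): generic phys addr -> pfn shift */

	/* Sketch: same conversion as line 254 of mr.c above, using
	 * the generic helper so the file also builds on e.g. ia64.
	 */
	pg = pfn_to_page(PHYS_PFN(pa));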
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]
On 2020/7/16 7:54 PM, Eli Cohen wrote:
> On Thu, Jul 16, 2020 at 05:14:32PM +0800, Jason Wang wrote:
>>> +static void suspend_vqs(struct mlx5_vdpa_net *ndev)
>>> +{
>>> + int i;
>>> +
>>> + for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++)
>>> + suspend_vq(ndev, &ndev->vqs[i]);
>>
>> In teardown_virtqueues() there is a check of mvq->num_ent; any reason
>> not to do the same here?
>>
> Looks like checking initialized is enough. Will fix this.
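For illustration, a minimal sketch of suspend_vqs() with that check, assuming the initialized field referred to above exists on struct mlx5_vdpa_virtqueue in this patch:

static void suspend_vqs(struct mlx5_vdpa_net *ndev)
{
	int i;

	for (i = 0; i < MLX5_MAX_SUPPORTED_VQS; i++) {
		/* Sketch: skip virtqueues that were never set up */
		if (!ndev->vqs[i].initialized)
			continue;

		suspend_vq(ndev, &ndev->vqs[i]);
	}
}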
>
>>> +
>>> +static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
>>> +{
>>> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
>>> + int err;
>>> +
>>> + if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
>>> + err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
>>> + if (err) {
>>> + mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
>>> + return;
>>> + }
>>> + }
>>
>> I wonder what the reason is for changing the vq state on the hardware
>> here. I think we can defer it to set_status().
>>
> I can defer this to set status.
>
> I just wonder if it is possible that the core vdpa driver may call this
> function with ready equals false and after some time call it with ready
> equals true.
Good point, so I think we can keep the logic. But it looks like the code
cannot work if ready equals false, since it only tries to modify the vq
state to RDY.
>
>
>> (Anyhow we don't sync vq address in set_vq_address()).
>>
>>
>>> +static u64 mlx5_vdpa_get_features(struct vdpa_device *vdev)
>>> +{
>>> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>> + u16 dev_features;
>>> +
>>> + dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask);
>>> + ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features);
>>> + if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0))
>>> + ndev->mvdev.mlx_features |= BIT(VIRTIO_F_VERSION_1);
>>
>> This is interesting. This suggests !VIRTIO_F_VERSION_1 &&
>> VIRTIO_F_IOMMU_PLATFORM is valid. But the virtio spec doesn't allow
>> such a configuration.
> Will fix
>> So I think you need either:
>>
>> 1) Fail vDPA device probe when VERSION_1 is not supported
>> 2) clear IOMMU_PLATFORM if VERSION_1 is not negotiated
>>
>> For 2) I guess it can't work: according to the spec, without
>> IOMMU_PLATFORM the device needs to use PA to access the memory.
>>
>>
>>> + ndev->mvdev.mlx_features |= BIT(VIRTIO_F_ANY_LAYOUT);
> I think this should be removed too
Yes, I guess so: hardware vDPA depends on IOMMU_PLATFORM, which implies
VERSION_1, which in turn implies ANY_LAYOUT.
>
>>> + ndev->mvdev.mlx_features |= BIT(VIRTIO_F_IOMMU_PLATFORM);
>>> + if (mlx5_vdpa_max_qps(ndev->mvdev.max_vqs) > 1)
>>> + ndev->mvdev.mlx_features |= BIT(VIRTIO_NET_F_MQ);
> Also this, since multiqueue requires a control vq, which is not
> supported in this version.
Yes.
>
>>> +
>>> + print_features(mvdev, ndev->mvdev.mlx_features, false);
>>> + return ndev->mvdev.mlx_features;
>>> +}
>>> +
>>> +static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
>>> +{
>>> + /* FIXME: qemu currently does not set all the features due to a bug.
>>> + * Add checks when this is fixed.
>>> + */
>>
>> I think we should add the check now so that qemu can get notified (e.g.
>> IOMMU_PLATFORM).
> Will do.
>>
>>> +}
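For illustration, a minimal sketch of such a check, assuming (per the discussion above) that VERSION_1 and IOMMU_PLATFORM are the minimum required bits; the exact policy is still to be decided:

static int verify_min_features(struct mlx5_vdpa_dev *mvdev, u64 features)
{
	/* Sketch: the device relies on the platform IOMMU, and
	 * IOMMU_PLATFORM is only valid together with VERSION_1,
	 * so reject a feature set that lacks either bit.
	 */
	if (!(features & BIT(VIRTIO_F_VERSION_1)) ||
	    !(features & BIT(VIRTIO_F_IOMMU_PLATFORM)))
		return -EOPNOTSUPP;

	return 0;
}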
>>> +
>>> +#define MLX5_VDPA_MAX_VQ_ENTRIES 256
>>
>> Is this a hardware limitation? qemu can support up to 1K, which is a
>> requirement of some NFV cases.
>>
> Theoretically the device limit is much higher. In the near future we
> will have a device capability for this. I wanted to stay on the safe side
> with this, but I can change it if you think it's necessary.
I see, that's fine. Let's keep this untouched.
>>> +
>>> +static void mlx5_vdpa_get_config(struct vdpa_device *vdev, unsigned int offset, void *buf,
>>> + unsigned int len)
>>> +{
>>> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>> +
>>> + if (offset + len < sizeof(struct virtio_net_config))
>>> + memcpy(buf, &ndev->config + offset, len);
>>
>> Note that the guest expects LE; what's the endianness of ndev->config?
> This is struct virtio_net_config from include/uapi/linux/virtio_net.h.
> Looking there I see it has mixed endianness. I currently don't touch it,
> but if I did, I would follow the endianness guidance in the struct
> definition. So I don't think I need to care about endianness when copying.
So the guest would expect LE; we need to be careful when modifying the
config space (e.g. mtu or status). Considering we don't support
VIRTIO_NET_F_STATUS and VIRTIO_NET_F_MTU, we're probably fine.
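For illustration, a hedged sketch of an LE-safe config write, in case the driver ever does modify such a field (this version sets none; new_mtu is hypothetical):

	/* Sketch only: a VERSION_1 guest reads virtio_net_config
	 * fields as little-endian, so any value stored there must
	 * be converted explicitly.
	 */
	ndev->config.mtu = cpu_to_le16(new_mtu);	/* new_mtu: hypothetical */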
>
>>
>>> +}
>>> +
>>> +static void mlx5_vdpa_set_config(struct vdpa_device *vdev, unsigned int offset, const void *buf,
>>> + unsigned int len)
>>> +{
>>> + /* not supported */
>>> +}
>>> +
>>> +static u32 mlx5_vdpa_get_generation(struct vdpa_device *vdev)
>>> +{
>>> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>> +
>>> + return mvdev->generation;
>>> +}
>>> +
>>> +static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb)
>>> +{
>>> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>> + bool change_map;
>>> + int err;
>>> +
>>> + err = mlx5_vdpa_handle_set_map(mvdev, iotlb, &change_map);
>>> + if (err) {
>>> + mlx5_vdpa_warn(mvdev, "set map failed(%d)\n", err);
>>> + return err;
>>> + }
>>> +
>>> + if (change_map)
>>> + return mlx5_vdpa_change_map(ndev, iotlb);
>>
>> Any reason for not doing this inside mlx5_handle_set_map()?
>>
> All memory registration related operations are done inside mr.c in the
> common code directory. But changing the map involves operations on other
> objects managed in this file.
Ok.
(Note that we can generalize this further in the future, since the only
network-specific parts are the config space and the control vq.)
>
>>> +
>>> +void mlx5_vdpa_remove_dev(struct mlx5_vdpa_dev *mvdev)
>>> +{
>>> + struct mlx5_vdpa_net *ndev;
>>> +
>>> + mvdev->status = 0;
>>
>> This is probably unnecessary.
>>
> Right, will remove.
Thanks
>
On Fri, Jul 17, 2020 at 04:57:29PM +0800, Jason Wang wrote:
> >Looks like checking initialized is enough. Will fix this.
> >>>+
> >>>+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
> >>>+{
> >>>+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >>>+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >>>+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> >>>+ int err;
> >>>+
> >>>+ if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
> >>>+ err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> >>>+ if (err) {
> >>>+ mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
> >>>+ return;
> >>>+ }
> >>>+ }
> >>
> >>I wonder what the reason is for changing the vq state on the hardware
> >>here. I think we can defer it to set_status().
> >>
> >I can defer this to set status.
> >
> >I just wonder if it is possible that the core vdpa driver may call this
> >function with ready equals false and after some time call it with ready
> >equals true.
>
>
> Good point, so I think we can keep the logic. But it looks like the
> code cannot work if ready equals false, since it only tries to
> modify the vq state to RDY.
>
The point is that you cannot modify the virtqueue to "not ready". The
only option is to destroy it and create a new one. This means that if I
get ready equals false after the virtqueue has been created, I need to
tear down the driver and set it up again.
Given that, I think your original suggestion to defer this logic is
reasonable.
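For reference, a minimal sketch of the deferred variant, assuming the flag is only recorded here and the firmware transition to RDY is done later from set_status() when DRIVER_OK is handled (suspend_vq() is the suspend helper from this patch):

static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
{
	struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
	struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
	struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];

	/* Sketch: the hardware vq can be suspended but not moved
	 * back to ready, so handle only the suspend direction here
	 * and record the flag; set_status() performs the RDY
	 * transition on DRIVER_OK.
	 */
	if (!ready)
		suspend_vq(ndev, mvq);

	mvq->ready = ready;
}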
On 2020/7/19 3:49 AM, Eli Cohen wrote:
> On Fri, Jul 17, 2020 at 04:57:29PM +0800, Jason Wang wrote:
>>> Looks like checking initialized is enough. Will fix this.
>>>>> +
>>>>> +static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
>>>>> +{
>>>>> + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
>>>>> + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
>>>>> + struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
>>>>> + int err;
>>>>> +
>>>>> + if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
>>>>> + err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
>>>>> + if (err) {
>>>>> + mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
>>>>> + return;
>>>>> + }
>>>>> + }
>>>> I wonder what the reason is for changing the vq state on the hardware
>>>> here. I think we can defer it to set_status().
>>>>
>>> I can defer this to set status.
>>>
>>> I just wonder if it is possible that the core vdpa driver may call this
>>> function with ready equals false and after some time call it with ready
>>> equals true.
>>
>> Good point, so I think we can keep the logic. But it looks like the
>> code cannot work if ready equals false, since it only tries to
>> modify the vq state to RDY.
>>
> The point is that you cannot modify the virtqueue to "not ready".
Is this a hardware limitation or a software one?
I'm asking since we need to support live migration. But one question is
how to stop the device without resetting it, since we need to get e.g.
last_avail_idx from the device.
It could be either:
1) set_status(0)
2) get_vq_state()
or
1) set_queue_ready(0)
2) get_vq_state()
set_status(0) means resetting the virtio device, but last_avail_idx is
something outside the virtio spec. I guess using set_queue_ready() is
better. What's your opinion?
Thanks
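For illustration, a minimal sketch of the second sequence from the caller's side (vdpa_sketch_save_vq() is a hypothetical helper; the ops and struct vdpa_vq_state follow this series):

#include <linux/vdpa.h>

/* Sketch: stop one vq without resetting the whole device, then
 * read back its state (e.g. the available index) for migration.
 */
static void vdpa_sketch_save_vq(struct vdpa_device *vdev, u16 idx,
				struct vdpa_vq_state *state)
{
	const struct vdpa_config_ops *ops = vdev->config;

	ops->set_vq_ready(vdev, idx, false);	/* 1) set_queue_ready(0) */
	ops->get_vq_state(vdev, idx, state);	/* 2) get_vq_state() */
}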
> The
> only option is to destroy it and create a new one. This means that if I
> get ready equals false after the virtqueue has been created, I need to
> tear down the driver and set it up again.
>
> Given that, I think your original suggestion to defer this logic is
> reasonable.
>
On 2020/7/16 6:25 PM, Eli Cohen wrote:
> On Thu, Jul 16, 2020 at 05:35:18PM +0800, Jason Wang wrote:
>> On 2020/7/16 4:21 PM, Eli Cohen wrote:
>>> On Thu, Jul 16, 2020 at 04:11:00PM +0800, Jason Wang wrote:
>>>> On 2020/7/16 3:23 PM, Eli Cohen wrote:
>>>>> Currently, get_vq_state() is used only to pass the available index value
>>>>> of a vq. Extend the struct to return status on the VQ to the caller.
>>>>> For now, define VQ_STATE_NOT_READY. In the future it will be extended to
>>>>> include other information.
>>>>>
>>>>> Modify current vdpa driver to update this field.
>>>>>
>>>>> Reviewed-by: Parav Pandit<[email protected]>
>>>>> Signed-off-by: Eli Cohen<[email protected]>
>>>> What's the difference between this and get_vq_ready()?
>>>>
>>>> Thanks
>>>>
>>> There is no difference. It is just a way to communicate a problem
>>> with the state of the VQ back to the caller. This is not available now.
>>> I think an asynchronous mechanism would be preferred, but that is not
>>> available currently.
>>
>> I still don't see the reason; maybe you can give me an example?
>>
>>
> My intention was to provide a mechanism to return meaningful information
> on the state of the vq. For example, when you fail to get the state of
> the VQ.
>
> Maybe I could just change the prototype of the function to return int,
> and the driver could return an error if it has trouble reporting the vq
> state.
That's fine.
Thanks
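A hedged sketch of that prototype change in struct vdpa_config_ops (the int return is the proposal above; the state struct is the one introduced earlier in this series):

	/* Sketch: return 0 on success or a negative errno when the
	 * driver cannot read the vq state back from the device.
	 */
	int (*get_vq_state)(struct vdpa_device *vdev, u16 idx,
			    struct vdpa_vq_state *state);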
On Mon, Jul 20, 2020 at 12:12:30PM +0800, Jason Wang wrote:
>
> On 2020/7/19 3:49 AM, Eli Cohen wrote:
> >On Fri, Jul 17, 2020 at 04:57:29PM +0800, Jason Wang wrote:
> >>>Looks like checking initialized is enough. Will fix this.
> >>>>>+
> >>>>>+static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready)
> >>>>>+{
> >>>>>+ struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev);
> >>>>>+ struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev);
> >>>>>+ struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx];
> >>>>>+ int err;
> >>>>>+
> >>>>>+ if (!mvq->ready && ready && mvq->fw_state != MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) {
> >>>>>+ err = modify_virtqueue(ndev, mvq, MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY);
> >>>>>+ if (err) {
> >>>>>+ mlx5_vdpa_warn(mvdev, "failed to modify virtqueue(%d)\n", err);
> >>>>>+ return;
> >>>>>+ }
> >>>>>+ }
> >>>>I wonder what the reason is for changing the vq state on the hardware
> >>>>here. I think we can defer it to set_status().
> >>>>
> >>>I can defer this to set status.
> >>>
> >>>I just wonder if it is possible that the core vdpa driver may call this
> >>>function with ready equals false and after some time call it with ready
> >>>equals true.
> >>
> >>Good point, so I think we can keep the logic. But it looks like the
> >>code cannot work if ready equals false, since it only tries to
> >>modify the vq state to RDY.
> >>
> >The point is that you cannot modify the virtqueue to "not ready".
>
>
> Is this a hardware limitation or a software one?
Sorry, but I was not accurate in my statement above. You can suspend the
hardware VQ but you cannot move out of suspend back to ready.
>
> I'm asking since we need to support live migration. But one question is
> how to stop the device without resetting it, since we need to get e.g.
> last_avail_idx from the device.
>
> It could be either:
>
> 1) set_status(0)
> 2) get_vq_state()
>
> or
>
> 1) set_queue_ready(0)
> 2) get_vq_state()
>
This can work.
> set_status(0) means resetting the virtio device, but last_avail_idx is
> something outside the virtio spec. I guess using set_queue_ready() is
> better.
>
> What's your opinion?
So if the intention is to use set ready(0) as a preliminary state for
live migration, then we're OK. We just need to keep in mind that there's
no way out of suspend other than destroying the hardware vq.
>
> Thanks
>
>
> > The
> >only option is to destroy it and create a new one. This means that if I
> >get ready equals false after the virtqueue has been created, I need to
> >tear down the driver and set it up again.
> >
> >Given that, I think your original suggestion to defer this logic is
> >reasonable.
> >
>