2019-03-08 22:08:57

by Parav Pandit

Subject: [RFC net-next v1 0/3] Support mlx5 mediated devices in host

Use case:
---------
A user wants to create/delete hardware linked sub devices without
using SR-IOV.
These sub devices of a pci device can be a netdev (with an optional rdma
device) or other devices. Such sub devices share some of the PCI device's
resources and also have their own dedicated resources.
A user wants to use such a device in a host where the PF PCI device exists
(not in a guest VM). A user may want to use such a sub device in a guest VM
in the future.

A few examples are:
1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
2. netdev with switchdev mode using netdev representor
3. rdma device with IB link layer and IPoIB netdev
4. rdma/RoCE device and a netdev
5. rdma device with multiple ports

Requirements for above use cases:
--------------------------------
1. We need a generic user interface & core APIs to create sub devices
from a parent pci device, generic enough to also work for other parent
devices
2. Interface should be vendor agnostic
3. User should be able to set device params at creation time
4. In future if needed, tool should be able to create passthrough
device to map to a virtual machine
5. A device can have multiple ports
6. Orchestration software wants to know how many such sub devices
can be created from a parent device so that it can manage them as global
cluster resources.

So how is it done?
------------------
The kernel has an existing mediated device infrastructure, provided by the
mdev driver, for the lifecycle of such sub devices.
Hence, these sub devices are created with the help of the mdev driver.

mlx5_core driver registers with mdev core to do so and exposes necessary
sysfs files.

Each created sub device has a unique UUID assigned by the user.
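
As an illustration of the UUID the user assigns, here is a minimal
userspace sketch that checks a string for the canonical 8-4-4-4-12 hex
layout produced by uuidgen. The helper name uuid_is_canonical() is
hypothetical and is not part of this series; the kernel does its own
parsing when the UUID is written to sysfs.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if s looks like
 * xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx with hex digits only.
 */
bool uuid_is_canonical(const char *s)
{
	assert(s != NULL);

	if (strlen(s) != 36)
		return false;

	for (int i = 0; i < 36; i++) {
		if (i == 8 || i == 13 || i == 18 || i == 23) {
			/* separator positions */
			if (s[i] != '-')
				return false;
		} else if (!isxdigit((unsigned char)s[i])) {
			return false;
		}
	}
	return true;
}
```

A tool could run such a check before writing the UUID to the create file,
so that a malformed id is reported early.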

mdev sub devices inherit their parent's dma parameters.

Each registered mdev has a corresponding devlink instance. Through this
devlink instance, the device and its port(s) are managed.

In order to use a mediated device in a VM or in the host, the user decides
which driver to use. Typically vfio_mdev is used to expose an mdev to a
guest VM. In the current use case, mlx5 mediated devices are only usable
inside the host, through the mlx5_core driver binding to them.

Patchset summary:
-----------------
Patch-1 adds support to inherit dma params of parent device in child mdev.
Patch-2 registers with mdev core.
Patch-3 registers a mdev device driver to create actual netdev.

Summary of alternatives considered and discussed:
-------------------------------------------------
1. new subdev bus
Fits the need but mdev simplifies it.
2. visorbus
Very specific to Unisys s-Par devices.
3. platform devices
Primarily meant for autonomous, SoC etc devices.
4. mfd devices
Depends on platform device infra.
5. Directly creating netdev, rdma device instead of sub device
Doesn't fit use case of passthrough mode.
6. creating subports of devlink instance
Doesn't cover multiport rdma device usecase.

While discussion [1], [2] is still ongoing, v1 is posted to describe
how two use cases of using mdev in host or in guest via standard Linux
device driver model are addressed.

[1] https://www.spinics.net/lists/netdev/msg556552.html
[2] https://www.spinics.net/lists/netdev/msg556944.html

All patches are only a reference implementation to show how the framework
works at the devlink, sysfs, mdev and device model levels. Once the RFC looks
good, a solid upstreamable version of the implementation will be done.

System view with one mdev:
--------------------------

$ ls -l /sys/bus/pci/devices/0000:05:00.0
[..]
drwxr-xr-x 3 root root 0 Mar 8 14:53 69ea1551-d054-46e9-974d-8edae8f0aefe
drwxr-xr-x 3 root root 0 Mar 8 15:41 infiniband
drwxr-xr-x 3 root root 0 Mar 8 15:41 mdev_supported_types
-rw-r--r-- 1 root root 4096 Mar 8 13:17 msi_bus
drwxr-xr-x 2 root root 0 Mar 8 15:41 msi_irqs
drwxr-xr-x 3 root root 0 Mar 8 15:41 net

$ ls -l /sys/bus/mdev/drivers
total 0
drwxr-xr-x 2 root root 0 Mar 8 13:39 mlx5_core
drwxr-xr-x 2 root root 0 Mar 8 14:53 vfio_mdev

$ ls -l /sys/bus/mdev/devices/
total 0
lrwxrwxrwx 1 root root 0 Mar 8 14:53 69ea1551-d054-46e9-974d-8edae8f0aefe -> ../../../devices/pci0000:00/0000:00:02.2/0000:05:00.0/69ea1551-d054-46e9-974d-8edae8f0aefe

Bind mdev to mlx5_core driver:
$ echo 69ea1551-d054-46e9-974d-8edae8f0aefe > /sys/bus/mdev/drivers/mlx5_core/bind

$ ls -l /sys/class/net/eth0/
-r--r--r-- 1 root root 4096 Mar 8 15:43 carrier_up_count
lrwxrwxrwx 1 root root 0 Mar 8 15:43 device -> ../../../69ea1551-d054-46e9-974d-8edae8f0aefe
-r--r--r-- 1 root root 4096 Mar 8 15:43 dev_id

$ devlink dev show
pci/0000:05:00.0
mdev/69ea1551-d054-46e9-974d-8edae8f0aefe

Changelog
---
v0->v1:
- Removed subdev bus, instead using existing mdev bus which fits
the need.
- Dropped devlink patches which are not needed anymore due to use of
mdev framework.
- Updated SPDX license line in patches.
- Added TODO to patches where more hardware specific code will be added.

Parav Pandit (3):
vfio/mdev: Inherit dma masks of parent device
net/mlx5: Add mdev sub device life cycle command support
net/mlx5: Add mdev driver to bind to mdev devices

drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 9 ++
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 5 +
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 18 ++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 22 ++++
drivers/net/ethernet/mellanox/mlx5/core/mdev.c | 120 +++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/mdev_driver.c | 106 ++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 19 ++++
drivers/vfio/mdev/mdev_core.c | 4 +
include/linux/mlx5/driver.h | 5 +
9 files changed, 308 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c

--
1.8.3.1



2019-03-08 22:09:09

by Parav Pandit

Subject: [RFC net-next v1 2/3] net/mlx5: Add mdev sub device life cycle command support

Implement mdev hooks to create mediated devices using the mdev driver.
The actual mlx5_core driver in the host is expected to bind to these devices
using the standard device driver model.

mdev devices are created using a sysfs file, as in the example below.

$ uuidgen
49d0e9ac-61b8-4c91-957e-6f6dbc42557d

$ echo 49d0e9ac-61b8-4c91-957e-6f6dbc42557d > \
./bus/pci/devices/0000:05:00.0/mdev_supported_types/mlx5_core-mgmt/create

$ echo 49d0e9ac-61b8-4c91-957e-6f6dbc42557d > \
/sys/bus/mdev/drivers/vfio_mdev/unbind

Once the mlx5_core driver is registered as an mdev driver, the mdev can be
attached to the mlx5_core driver as below.

$ echo 49d0e9ac-61b8-4c91-957e-6f6dbc42557d > \
/sys/bus/mdev/drivers/mlx5_core/bind

devlink output:
$ devlink dev show
pci/0000:05:00.0
mdev/69ea1551-d054-46e9-974d-8edae8f0aefe
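
For tools that drive these sysfs files programmatically rather than via
echo, a minimal userspace sketch follows. The helper names are
hypothetical, and the /sys prefix on the create path is an assumption
taken from the cover letter's listing; the type name mlx5_core-mgmt and
the driver name mlx5_core are the ones used in the commands above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the "create" attribute path for a parent
 * PCI device and an mdev type. Returns 0, or -1 if buf is too small.
 */
int mdev_create_path(char *buf, size_t len,
		     const char *pci_addr, const char *type)
{
	int n = snprintf(buf, len,
			 "/sys/bus/pci/devices/%s/mdev_supported_types/%s/create",
			 pci_addr, type);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Hypothetical helper: format the "bind" attribute path of an mdev driver. */
int mdev_bind_path(char *buf, size_t len, const char *driver)
{
	int n = snprintf(buf, len, "/sys/bus/mdev/drivers/%s/bind", driver);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a string to a sysfs attribute, like `echo val > path`. */
int sysfs_write(const char *path, const char *val)
{
	FILE *f;
	int ok;

	assert(path && val);
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(val, f) >= 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}
```

A caller would build the create path, write the UUID to it, and then
write the same UUID to the bind path of the chosen driver.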

Signed-off-by: Parav Pandit <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/Kconfig | 9 ++
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 5 +
drivers/net/ethernet/mellanox/mlx5/core/main.c | 9 ++
drivers/net/ethernet/mellanox/mlx5/core/mdev.c | 120 +++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 15 +++
include/linux/mlx5/driver.h | 5 +
6 files changed, 163 insertions(+)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
index 37a5514..881ae1a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Kconfig
@@ -117,3 +117,12 @@ config MLX5_EN_TLS
Build support for TLS cryptography-offload accelaration in the NIC.
Note: Support for hardware with this capability needs to be selected
for this option to become available.
+
+config MLX5_MDEV
+ bool "Mellanox Technologies Mediated device support"
+ depends on MLX5_CORE
+ depends on VFIO_MDEV
+ default y
+ help
+ Build support for mediated devices. Mediated devices allow creating
+ multiple netdev and/or rdma device(s) on single PCI function.
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 82d636b..e5c0822c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -58,4 +58,9 @@ mlx5_core-$(CONFIG_MLX5_EN_IPSEC) += en_accel/ipsec.o en_accel/ipsec_rxtx.o \

mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o en_accel/tls_stats.o

+#
+# Mdev basic
+#
+mlx5_core-$(CONFIG_MLX5_MDEV) += mdev.o
+
CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 40d591c..72b0072 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -851,10 +851,18 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 		goto err_sriov_cleanup;
 	}

+	err = mlx5_mdev_init(dev);
+	if (err) {
+		dev_err(&pdev->dev, "Failed to init mdev device %d\n", err);
+		goto err_fpga_cleanup;
+	}
+
 	dev->tracer = mlx5_fw_tracer_create(dev);

 	return 0;

+err_fpga_cleanup:
+	mlx5_fpga_cleanup(dev);
 err_sriov_cleanup:
 	mlx5_sriov_cleanup(dev);
 err_eswitch_cleanup:
@@ -881,6 +889,7 @@ static int mlx5_init_once(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
 static void mlx5_cleanup_once(struct mlx5_core_dev *dev)
 {
 	mlx5_fw_tracer_destroy(dev->tracer);
+	mlx5_mdev_cleanup(dev);
 	mlx5_fpga_cleanup(dev);
 	mlx5_sriov_cleanup(dev);
 	mlx5_eswitch_cleanup(dev->priv.eswitch);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mdev.c b/drivers/net/ethernet/mellanox/mlx5/core/mdev.c
new file mode 100644
index 0000000..e8e4aac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mdev.c
@@ -0,0 +1,120 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-19 Mellanox Technologies
+
+#include <net/devlink.h>
+#include <linux/mdev.h>
+
+#include "mlx5_core.h"
+
+#define MLX5_MAX_MDEVS 1
+
+struct mlx5_mdev {
+	struct mlx5_core_dev *dev;
+};
+
+static int mlx5_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
+{
+	struct mlx5_core_dev *mlx5_dev;
+	struct device *parent_dev;
+	struct mlx5_mdev *mmdev;
+	struct devlink *devlink;
+	bool added;
+	int err;
+
+	parent_dev = mdev_parent_dev(mdev);
+	mlx5_dev = pci_get_drvdata(to_pci_dev(parent_dev));
+
+	added = atomic_add_unless(&mlx5_dev->mdev_info.cnt, 1, MLX5_MAX_MDEVS);
+	if (!added)
+		return -ENOSPC;
+
+	devlink = devlink_alloc(NULL, sizeof(*mmdev));
+	if (!devlink) {
+		atomic_dec(&mlx5_dev->mdev_info.cnt);
+		return -ENOMEM;
+	}
+	mmdev = devlink_priv(devlink);
+	mmdev->dev = mlx5_dev;
+	mdev_set_drvdata(mdev, mmdev);
+
+	/* TODO: create a mediated device hw object that driver
+	 * can work on later on to enable interrupts, create queues etc.
+	 */
+	err = devlink_register(devlink, mdev_dev(mdev));
+	if (err) {
+		devlink_free(devlink);
+		atomic_dec(&mlx5_dev->mdev_info.cnt);
+	}
+	return err;
+}
+
+static int mlx5_mdev_remove(struct mdev_device *mdev)
+{
+	struct mlx5_core_dev *mlx5_dev;
+	struct device *parent_dev;
+	struct mlx5_mdev *mmdev;
+	struct devlink *devlink;
+
+	parent_dev = mdev_parent_dev(mdev);
+	mlx5_dev = pci_get_drvdata(to_pci_dev(parent_dev));
+
+	mmdev = mdev_get_drvdata(mdev);
+	devlink = priv_to_devlink(mmdev);
+
+	devlink_unregister(devlink);
+	devlink_free(devlink);
+	atomic_dec(&mlx5_dev->mdev_info.cnt);
+	return 0;
+}
+
+static ssize_t
+num_mdevs_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct mlx5_core_dev *mlx5_dev;
+
+	mlx5_dev = pci_get_drvdata(pdev);
+
+	return sprintf(buf, "%d\n", atomic_read(&mlx5_dev->mdev_info.cnt));
+}
+MDEV_TYPE_ATTR_RO(num_mdevs);
+
+static ssize_t
+available_instances_show(struct kobject *kobj, struct device *dev, char *buf)
+{
+	return sprintf(buf, "%d\n", MLX5_MAX_MDEVS);
+}
+MDEV_TYPE_ATTR_RO(available_instances);
+
+static struct attribute *mdev_dev_attrs[] = {
+	&mdev_type_attr_num_mdevs.attr,
+	&mdev_type_attr_available_instances.attr,
+	NULL,
+};
+
+static struct attribute_group mdev_mgmt_group = {
+	.name = "mgmt",
+	.attrs = mdev_dev_attrs,
+};
+
+struct attribute_group *mlx5_mdev_groups[] = {
+	&mdev_mgmt_group,
+	NULL,
+};
+
+const struct mdev_parent_ops mlx5_mdev_ops = {
+	.create = mlx5_mdev_create,
+	.remove = mlx5_mdev_remove,
+	.supported_type_groups = mlx5_mdev_groups,
+};
+
+int mlx5_mdev_init(struct mlx5_core_dev *dev)
+{
+	return mdev_register_device(&dev->pdev->dev, &mlx5_mdev_ops);
+}
+
+void mlx5_mdev_cleanup(struct mlx5_core_dev *dev)
+{
+	mdev_unregister_device(&dev->pdev->dev);
+}
+
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 9529cf9..0605a63 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -202,4 +202,19 @@ enum {

 u8 mlx5_get_nic_state(struct mlx5_core_dev *dev);
 void mlx5_set_nic_state(struct mlx5_core_dev *dev, u8 state);
+
+#ifdef CONFIG_MLX5_MDEV
+int mlx5_mdev_init(struct mlx5_core_dev *mdev);
+void mlx5_mdev_cleanup(struct mlx5_core_dev *mdev);
+#else
+static inline int mlx5_mdev_init(struct mlx5_core_dev *mdev)
+{
+	return 0;
+}
+
+static inline void mlx5_mdev_cleanup(struct mlx5_core_dev *mdev)
+{
+}
+#endif
+
 #endif /* __MLX5_CORE_H__ */
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index c2de50f..6836ab9 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -637,6 +637,10 @@ struct mlx5_clock {
struct mlx5_fw_tracer;
struct mlx5_vxlan;

+struct mlx5_mdevs_info {
+ atomic_t cnt;
+};
+
struct mlx5_core_dev {
struct pci_dev *pdev;
/* sync pci state */
@@ -679,6 +683,7 @@ struct mlx5_core_dev {
struct mlx5_ib_clock_info *clock_info;
struct page *clock_info_page;
struct mlx5_fw_tracer *tracer;
+ struct mlx5_mdevs_info mdev_info;
};

struct mlx5_db {
--
1.8.3.1
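
The slot accounting in mlx5_mdev_create() relies on atomic_add_unless()
to cap the number of mdevs at MLX5_MAX_MDEVS. Below is a userspace sketch
of that pattern using C11 atomics. add_unless(), slot_get(), slot_put()
and slots_remaining() are hypothetical stand-ins, not kernel APIs.
slots_remaining() shows one possible way to report how many instances
can still be created; that is an assumption for illustration, since the
patch as posted prints the constant MLX5_MAX_MDEVS.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_MDEVS 1	/* mirrors MLX5_MAX_MDEVS in the patch */

/* Userspace stand-in for the kernel's atomic_add_unless(): add a to *v
 * unless *v == u. Returns true if the add was performed.
 */
bool add_unless(atomic_int *v, int a, int u)
{
	int cur = atomic_load(v);

	while (cur != u) {
		/* On failure, cur is refreshed with the current value. */
		if (atomic_compare_exchange_weak(v, &cur, cur + a))
			return true;
	}
	return false;
}

/* Take one slot, or fail with -ENOSPC when the limit is reached. */
int slot_get(atomic_int *cnt)
{
	return add_unless(cnt, 1, MAX_MDEVS) ? 0 : -ENOSPC;
}

/* Give a slot back, as the remove and error paths do with atomic_dec(). */
void slot_put(atomic_int *cnt)
{
	assert(atomic_load(cnt) > 0);
	atomic_fetch_sub(cnt, 1);
}

/* Hypothetical: number of instances that could still be created. */
int slots_remaining(atomic_int *cnt)
{
	return MAX_MDEVS - atomic_load(cnt);
}
```

Every failure after a successful slot_get() has to be paired with a
slot_put(), which is what the devlink_alloc() and devlink_register()
error paths in the patch do.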


2019-03-08 22:10:27

by Parav Pandit

Subject: [RFC net-next v1 3/3] net/mlx5: Add mdev driver to bind to mdev devices

Add an mdev driver to probe mdev devices and create a fake netdevice
for the probed device.
Similar to a pci driver, when mdevs are created/removed, or when a user
triggers binding/unbinding an mdev to the mlx5_core driver by writing the
mdev device id to the /sys/bus/mdev/drivers/mlx5_core/bind or unbind file,
the mlx5_core driver's probe() and remove() are invoked to handle the life
cycle of the netdev and rdma device associated with the mdev.

The current RFC patch only creates one fake netdev; a subsequent non-RFC
patch will create the related hw objects and netdev.

Signed-off-by: Parav Pandit <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 18 ++++
drivers/net/ethernet/mellanox/mlx5/core/main.c | 13 +++
.../net/ethernet/mellanox/mlx5/core/mdev_driver.c | 106 +++++++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 4 +
5 files changed, 142 insertions(+), 1 deletion(-)
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index e5c0822c..bded136a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -61,6 +61,6 @@ mlx5_core-$(CONFIG_MLX5_EN_TLS) += en_accel/tls.o en_accel/tls_rxtx.o en_accel/t
#
# Mdev basic
#
-mlx5_core-$(CONFIG_MLX5_MDEV) += mdev.o
+mlx5_core-$(CONFIG_MLX5_MDEV) += mdev.o mdev_driver.o

CFLAGS_tracepoint.o := -I$(src)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index ebc046f..91b8d8ba 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -321,3 +321,21 @@ int mlx5_dev_list_trylock(void)
 {
 	return mutex_trylock(&mlx5_intf_mutex);
 }
+
+struct mlx5_core_dev *mlx5_get_core_dev(const struct device *dev)
+{
+	struct mlx5_core_dev *found = NULL;
+	struct mlx5_core_dev *tmp_dev;
+	struct mlx5_priv *priv;
+
+	mutex_lock(&mlx5_intf_mutex);
+	list_for_each_entry(priv, &mlx5_dev_list, dev_list) {
+		tmp_dev = container_of(priv, struct mlx5_core_dev, priv);
+		if (&tmp_dev->pdev->dev == dev) {
+			found = tmp_dev;
+			break;
+		}
+	}
+	mutex_unlock(&mlx5_intf_mutex);
+	return found;
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 72b0072..c1fc0f4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -40,6 +40,7 @@
#include <linux/io-mapping.h>
#include <linux/interrupt.h>
#include <linux/delay.h>
+#include <linux/mdev.h>
#include <linux/mlx5/driver.h>
#include <linux/mlx5/cq.h>
#include <linux/mlx5/qp.h>
@@ -1553,7 +1554,15 @@ static int __init init(void)
 	mlx5e_init();
 #endif

+#if IS_ENABLED(CONFIG_VFIO_MDEV)
+	err = mdev_register_driver(&mlx5_mdev_driver, THIS_MODULE);
+	if (err) {
+		pci_unregister_driver(&mlx5_core_driver);
+		goto err_debug;
+	}
+#else
 	return 0;
+#endif

 err_debug:
 	mlx5_unregister_debugfs();
@@ -1562,6 +1571,10 @@ static int __init init(void)

 static void __exit cleanup(void)
 {
+#if IS_ENABLED(CONFIG_VFIO_MDEV)
+	mdev_unregister_driver(&mlx5_mdev_driver);
+#endif
+
 #ifdef CONFIG_MLX5_CORE_EN
 	mlx5e_cleanup();
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c b/drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c
new file mode 100644
index 0000000..7618c5e
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mdev_driver.c
@@ -0,0 +1,106 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018-19 Mellanox Technologies
+
+#include <linux/module.h>
+#include <linux/dma-mapping.h>
+#include <linux/etherdevice.h>
+#include <net/devlink.h>
+#include <linux/mdev.h>
+
+#include "mlx5_core.h"
+
+struct mlx5_subdev_ndev {
+	struct net_device ndev;
+};
+
+static void mlx5_dma_test(struct device *dev)
+{
+	dma_addr_t pa;
+	void *va;
+
+	va = dma_alloc_coherent(dev, 4096, &pa, GFP_KERNEL);
+	if (va)
+		dma_free_coherent(dev, 4096, va, pa);
+}
+
+static struct net_device *ndev;
+
+static int mlx5e_mdev_open(struct net_device *netdev)
+{
+	return 0;
+}
+
+static int mlx5e_mdev_close(struct net_device *netdev)
+{
+	return 0;
+}
+
+static netdev_tx_t
+mlx5e_mdev_xmit(struct sk_buff *skb, struct net_device *netdev)
+{
+	return NETDEV_TX_BUSY;
+}
+
+const struct net_device_ops mlx5e_mdev_netdev_ops = {
+	.ndo_open = mlx5e_mdev_open,
+	.ndo_stop = mlx5e_mdev_close,
+	.ndo_start_xmit = mlx5e_mdev_xmit,
+};
+
+static int mlx5_mdev_probe(struct device *dev)
+{
+	struct mdev_device *mdev = mdev_from_dev(dev);
+	struct device *parent_dev = mdev_parent_dev(mdev);
+	struct mlx5_core_dev *mlx5_dev;
+	int err;
+
+	mlx5_dev = mlx5_get_core_dev(parent_dev);
+	if (!mlx5_dev)
+		return -ENODEV;
+
+	mlx5_dma_test(dev);
+
+	ndev = alloc_etherdev_mqs(sizeof(struct mlx5_subdev_ndev), 1, 1);
+	if (!ndev)
+		return -ENOMEM;
+
+	SET_NETDEV_DEV(ndev, dev);
+	ndev->netdev_ops = &mlx5e_mdev_netdev_ops;
+
+	/* TODO:
+	 * init_hca().
+	 * create eqs, cqs.
+	 * attach to irq of parent pci device.
+	 */
+	err = register_netdev(ndev);
+	if (err) {
+		free_netdev(ndev);
+		ndev = NULL;
+	}
+	return err;
+}
+
+static void mlx5_mdev_remove(struct device *dev)
+{
+	if (ndev) {
+		unregister_netdev(ndev);
+		free_netdev(ndev);
+		ndev = NULL;
+	}
+}
+
+struct mdev_driver mlx5_mdev_driver = {
+	.name = KBUILD_MODNAME,
+	.probe = mlx5_mdev_probe,
+	.remove = mlx5_mdev_remove,
+};
+
+int __init mlx5_mdev_driver_init(void)
+{
+	return mdev_register_driver(&mlx5_mdev_driver, THIS_MODULE);
+}
+
+void __exit mlx5_mdev_exit(void)
+{
+	mdev_unregister_driver(&mlx5_mdev_driver);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 0605a63..2e2b8b5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -47,6 +47,8 @@

extern uint mlx5_core_debug_mask;

+extern struct mdev_driver mlx5_mdev_driver;
+
#define mlx5_core_dbg(__dev, format, ...) \
dev_dbg(&(__dev)->pdev->dev, "%s:%d:(pid %d): " format, \
__func__, __LINE__, current->pid, \
@@ -217,4 +219,6 @@ static inline void mlx5_mdev_cleanup(struct mlx5_core_dev *mdev)
}
#endif

+struct mlx5_core_dev *mlx5_get_core_dev(const struct device *dev);
+
#endif /* __MLX5_CORE_H__ */
--
1.8.3.1
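
The lookup in mlx5_get_core_dev() walks the driver's device list and
uses container_of() to get from the embedded list node back to the
owning mlx5_core_dev. A userspace sketch of that pattern is shown below.
All struct layouts here are simplified, hypothetical stand-ins, a plain
singly linked list replaces list_for_each_entry(), and the locking done
with mlx5_intf_mutex in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace version of the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct device { int id; };
struct pci_dev { struct device dev; };

/* Stand-in for mlx5_priv with its dev_list linkage. */
struct priv { struct priv *next; };

/* Stand-in for mlx5_core_dev: embeds priv, points at its pci_dev. */
struct core_dev {
	struct pci_dev *pdev;
	struct priv priv;
};

/* Return the core_dev whose PCI device matches dev, or NULL. */
struct core_dev *find_core_dev(struct priv *head, const struct device *dev)
{
	assert(dev != NULL);

	for (struct priv *p = head; p; p = p->next) {
		struct core_dev *c = container_of(p, struct core_dev, priv);

		if (&c->pdev->dev == dev)
			return c;
	}
	return NULL;
}
```

The comparison is by address of the embedded struct device, which is why
the probe path can pass mdev_parent_dev(mdev) directly.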


2019-03-08 22:11:24

by Parav Pandit

Subject: [RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent device

Inherit dma mask of parent device in child mdev devices, so that
protocol stack can use right dma mask while doing dma mappings.

Signed-off-by: Parav Pandit <[email protected]>
---
drivers/vfio/mdev/mdev_core.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index 0212f0e..9b8bdc9 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -315,6 +315,10 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
 	mdev->dev.parent = dev;
 	mdev->dev.bus = &mdev_bus_type;
 	mdev->dev.release = mdev_device_release;
+	mdev->dev.dma_mask = dev->dma_mask;
+	mdev->dev.dma_parms = dev->dma_parms;
+	mdev->dev.coherent_dma_mask = dev->coherent_dma_mask;
+
 	dev_set_name(&mdev->dev, "%pUl", uuid.b);

 	ret = device_register(&mdev->dev);
--
1.8.3.1


2019-03-08 22:33:22

by Alex Williamson

Subject: Re: [RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent device

On Fri, 8 Mar 2019 16:07:54 -0600
Parav Pandit <[email protected]> wrote:

> Inherit dma mask of parent device in child mdev devices, so that
> protocol stack can use right dma mask while doing dma mappings.
>
> Signed-off-by: Parav Pandit <[email protected]>
> ---
> drivers/vfio/mdev/mdev_core.c | 4 ++++
> 1 file changed, 4 insertions(+)
>
> diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> index 0212f0e..9b8bdc9 100644
> --- a/drivers/vfio/mdev/mdev_core.c
> +++ b/drivers/vfio/mdev/mdev_core.c
> @@ -315,6 +315,10 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
> mdev->dev.parent = dev;
> mdev->dev.bus = &mdev_bus_type;
> mdev->dev.release = mdev_device_release;
> + mdev->dev.dma_mask = dev->dma_mask;
> + mdev->dev.dma_parms = dev->dma_parms;
> + mdev->dev.coherent_dma_mask = dev->coherent_dma_mask;
> +
> dev_set_name(&mdev->dev, "%pUl", uuid.b);
>
> ret = device_register(&mdev->dev);

This seems like a rather large assumption and none of the existing mdev
drivers even make use of DMA ops. Why shouldn't this be done in
mdev_parent_ops.create? Thanks,

Alex

2019-03-09 03:57:07

by Parav Pandit

Subject: RE: [RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent device



> -----Original Message-----
> From: Alex Williamson <[email protected]>
> Sent: Friday, March 8, 2019 4:33 PM
> To: Parav Pandit <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; Jiri Pirko <[email protected]>;
> [email protected]; Vu Pham <[email protected]>; Yuval Avnery
> <[email protected]>; [email protected];
> [email protected]
> Subject: Re: [RFC net-next v1 1/3] vfio/mdev: Inherit dma masks of parent
> device
>
> On Fri, 8 Mar 2019 16:07:54 -0600
> Parav Pandit <[email protected]> wrote:
>
> > Inherit dma mask of parent device in child mdev devices, so that
> > protocol stack can use right dma mask while doing dma mappings.
> >
> > Signed-off-by: Parav Pandit <[email protected]>
> > ---
> > drivers/vfio/mdev/mdev_core.c | 4 ++++
> > 1 file changed, 4 insertions(+)
> >
> > diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
> > index 0212f0e..9b8bdc9 100644
> > --- a/drivers/vfio/mdev/mdev_core.c
> > +++ b/drivers/vfio/mdev/mdev_core.c
> > @@ -315,6 +315,10 @@ int mdev_device_create(struct kobject *kobj, struct device *dev, uuid_le uuid)
> > mdev->dev.parent = dev;
> > mdev->dev.bus = &mdev_bus_type;
> > mdev->dev.release = mdev_device_release;
> > + mdev->dev.dma_mask = dev->dma_mask;
> > + mdev->dev.dma_parms = dev->dma_parms;
> > + mdev->dev.coherent_dma_mask = dev->coherent_dma_mask;
> > +
> > dev_set_name(&mdev->dev, "%pUl", uuid.b);
> >
> > ret = device_register(&mdev->dev);
>
> This seems like a rather large assumption and none of the existing mdev
> drivers even make use of DMA ops.
So it's harmless anyway.

> Why shouldn't this be done in mdev_parent_ops.create? Thanks,
>
struct device should be set up correctly before calling device_register().
That is the sane way to use the device_register() API.
Doing this in mdev_parent_ops.create() would do it after device_register().
If you want to make it optional, mdev_register_device() can take a bool
flag saying whether to set up the dma params, which mdev_device_create()
can then apply conditionally before device_register().
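
A rough userspace sketch of that optional variant, only to show the
ordering. struct device here is a minimal stand-in for the kernel
structure, and device_register_stub(), mdev_device_create_sketch() and
the inherit_dma flag are hypothetical names, not existing kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-in for the DMA related fields of struct device. */
struct device {
	struct device *parent;
	uint64_t *dma_mask;
	uint64_t coherent_dma_mask;
	void *dma_parms;
	bool registered;
};

/* Stand-in for device_register(): the device is visible from here on. */
int device_register_stub(struct device *dev)
{
	dev->registered = true;
	return 0;
}

/* Hypothetical variant of mdev_device_create() taking the proposed flag.
 * The DMA fields are filled in before the device is registered.
 */
int mdev_device_create_sketch(struct device *mdev, struct device *parent,
			      bool inherit_dma)
{
	assert(mdev && parent);
	assert(!mdev->registered);

	mdev->parent = parent;
	if (inherit_dma) {
		mdev->dma_mask = parent->dma_mask;
		mdev->dma_parms = parent->dma_parms;
		mdev->coherent_dma_mask = parent->coherent_dma_mask;
	}
	return device_register_stub(mdev);
}
```

Parent drivers that do not use DMA ops would pass false and see no change
in behavior.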

2019-03-13 22:08:47

by Jakub Kicinski

Subject: Re: [RFC net-next v1 0/3] Support mlx5 mediated devices in host

On Fri, 8 Mar 2019 16:07:53 -0600, Parav Pandit wrote:
> Use case:
> ---------
> A user wants to create/delete hardware linked sub devices without
> using SR-IOV.
> These devices for a pci device can be netdev (optional rdma device)
> or other devices. Such sub devices share some of the PCI device
> resources and also have their own dedicated resources.
> A user wants to use this device in a host where PF PCI device exist.
> (not in a guest VM.) A user may want to use such sub device in future
> in guest VM.
>
> Few examples are:
> 1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
> 2. netdev with switchdev mode using netdev representor

Hi Parav!

Sorry for going quiet, I'm hoping to clarify the use cases and
the devlink part on the other thread with Jiri, and then come back
to implementation details.

> 3. rdma device with IB link layer and IPoIB netdev
> 4. rdma/RoCE device and a netdev
> 5. rdma device with multiple ports


2019-03-14 21:40:40

by Parav Pandit

Subject: RE: [RFC net-next v1 0/3] Support mlx5 mediated devices in host

Hi Jakub,

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Wednesday, March 13, 2019 5:08 PM
> To: Parav Pandit <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; Jiri Pirko <[email protected]>;
> [email protected]; [email protected]; Vu Pham
> <[email protected]>; Yuval Avnery <[email protected]>;
> [email protected]
> Subject: Re: [RFC net-next v1 0/3] Support mlx5 mediated devices in host
>
> On Fri, 8 Mar 2019 16:07:53 -0600, Parav Pandit wrote:
> > Use case:
> > ---------
> > A user wants to create/delete hardware linked sub devices without
> > using SR-IOV.
> > These devices for a pci device can be netdev (optional rdma device) or
> > other devices. Such sub devices share some of the PCI device resources
> > and also have their own dedicated resources.
> > A user wants to use this device in a host where PF PCI device exist.
> > (not in a guest VM.) A user may want to use such sub device in future
> > in guest VM.
> >
> > Few examples are:
> > 1. netdev having its own txq(s), rq(s) and/or hw offload parameters.
> > 2. netdev with switchdev mode using netdev representor
>
> Hi Parav!
>
> Sorry for going quiet, I'm hoping to clarify the use cases and the devlink part
> on the other thread with Jiri, and then come back to implementation details.
>
That's fine. The two things are now decoupled, so mediated devices no
longer need to be created through devlink.
This RFC is not just about implementation details.
It describes the single-port and multiport use cases and the attachment
to devlink too.
It uses existing, well-defined devlink objects,
so I am not envisioning a big change.

> > 3. rdma device with IB link layer and IPoIB netdev
> > 4. rdma/RoCE device and a netdev
> > 5. rdma device with multiple ports