2022-02-11 22:33:30

by Moshe Shemesh

[permalink] [raw]
Subject: [PATCH net-next v2 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe

Currently SF device has all the aux devices enabled by default. Once
loaded, user who desire to disable some of them need to perform devlink
reload. This operation helps to reclaim memory that was not supposed
to be used, but the lost time in disabling and enabling again cannot be
recovered by this approach[1].
Therefore, introduce a new devlink generic parameter for PCI PF which
spawns SF devices. This parameter sets a flag in order to disable all
auxiliary devices of the SF. i.e.: All children auxiliary devices of SF
for RDMA, eth and vdpa-net are disabled by default and hence no device
initialization is done at probe stage.

The settings introduced here should suit either if ESW and PF are on
same host or not.

Example 1: When ESW and SF hosting PF are the same:

Disable SF aux dev probe:
$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
value false cmode runtime

Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active

Now depending on the use case, the user can enable specific auxiliary
device(s). For example:

$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_vnet value true cmode driverinit

Afterwards, user needs to reload the SF in order for the SF to come up
with the specific configuration:

$ devlink dev reload auxiliary/mlx5_core.sf.1


Example2: ESW and PF are on different hosts.

Disable SF's children auxiliary device probing for the specified PF on
host:
$ devlink dev param set pci/0000:04:00.0 name enable_sfs_aux_devs \
value false cmode runtime

Create SF on ESW side:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11 \
controller 1
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active

When SF device appears on the host:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_vnet value true cmode driverinit
$ devlink dev reload auxiliary/mlx5_core.sf.1

changelog:
v1->v2:
- updated example to make clear SF port and SF device creation PFs
- added example when SF port and device creation PFs are on different
hosts

[1]
mlx5 devlink reload is taking about 2 seconds, which means that with
256 SFs we are speaking about ~8.5 minutes.

Shay Drory (4):
net/mlx5: Split function_setup() to enable and open functions
net/mlx5: Delete redundant default assignment of runtime devlink
params
devlink: Add new "enable_sfs_aux_devs" generic device param
net/mlx5: Support enable_sfs_aux_devs devlink param

.../networking/devlink/devlink-params.rst | 5 +
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 16 ++
.../net/ethernet/mellanox/mlx5/core/devlink.c | 51 ++---
.../net/ethernet/mellanox/mlx5/core/eswitch.c | 3 +
.../net/ethernet/mellanox/mlx5/core/health.c | 5 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 183 +++++++++++++++---
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 6 +
.../mellanox/mlx5/core/sf/dev/driver.c | 13 +-
.../ethernet/mellanox/mlx5/core/sf/devlink.c | 40 ++++
.../ethernet/mellanox/mlx5/core/sf/hw_table.c | 7 +
.../net/ethernet/mellanox/mlx5/core/sf/priv.h | 2 +
include/linux/mlx5/driver.h | 1 +
include/net/devlink.h | 4 +
net/core/devlink.c | 5 +
14 files changed, 284 insertions(+), 57 deletions(-)

--
2.26.3


2022-02-14 02:59:09

by Moshe Shemesh

[permalink] [raw]
Subject: [PATCH net-next v2 4/4] net/mlx5: Support enable_sfs_aux_devs devlink param

From: Shay Drory <[email protected]>

In case user wants to configure the SFs, for example: to use only vdpa
functionality, he needs to fully probe a SF, configure what he wants,
and afterward reload the SF.
In order to save the time of the reload, exposing a devlink runtime
param on the PF, that when set to off, the SFs won't create any auxiliary
sub-device, so that the SF can be configured prior to its full probe.

The new supported settings here should suit either if ESW and PF are on
same host or not.

Usage example, when ESW and SF hosting PF are the same:

Disable SF aux dev probe:
$ devlink dev param set pci/0000:08:00.0 name enable_sfs_aux_devs \
value false cmode runtime

Create SF:
$ devlink port add pci/0000:08:00.0 flavour pcisf pfnum 0 sfnum 11
$ devlink port function set pci/0000:08:00.0/32768 \
hw_addr 00:00:00:00:00:11 state active

Enable ETH auxiliary device:
$ devlink dev param set auxiliary/mlx5_core.sf.1 \
name enable_eth value true cmode driverinit

Now, in order to fully probe the SF, use devlink reload:
$ devlink dev reload auxiliary/mlx5_core.sf.1

At this point the user have SF devlink instance with auxiliary device
for the Ethernet functionality only.

Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
---
changelog:
v1->v2:
- Clarify the example in commit message suits the case of ESW and SF
hosting PFs are the same.
---
drivers/net/ethernet/mellanox/mlx5/core/dev.c | 16 +++
.../net/ethernet/mellanox/mlx5/core/devlink.c | 31 +++++-
.../net/ethernet/mellanox/mlx5/core/health.c | 5 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 102 +++++++++++++++++-
.../ethernet/mellanox/mlx5/core/mlx5_core.h | 6 ++
.../mellanox/mlx5/core/sf/dev/driver.c | 13 ++-
.../ethernet/mellanox/mlx5/core/sf/devlink.c | 40 +++++++
.../ethernet/mellanox/mlx5/core/sf/hw_table.c | 7 ++
.../net/ethernet/mellanox/mlx5/core/sf/priv.h | 2 +
include/linux/mlx5/driver.h | 1 +
10 files changed, 211 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/dev.c b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
index ba6dad97e308..89ead37b98ad 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/dev.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/dev.c
@@ -333,6 +333,18 @@ static void del_adev(struct auxiliary_device *adev)
auxiliary_device_uninit(adev);
}

+void mlx5_dev_mark_unregistered(struct mlx5_core_dev *dev)
+{
+ mutex_lock(&mlx5_intf_mutex);
+ dev->priv.flags |= MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
+ mutex_unlock(&mlx5_intf_mutex);
+}
+
+bool mlx5_dev_is_unregistered(struct mlx5_core_dev *dev)
+{
+ return dev->priv.flags & MLX5_PRIV_FLAGS_DISABLE_ALL_ADEV;
+}
+
int mlx5_attach_device(struct mlx5_core_dev *dev)
{
struct mlx5_priv *priv = &dev->priv;
@@ -473,6 +485,10 @@ static int add_drivers(struct mlx5_core_dev *dev)
if (!is_supported)
continue;

+ if (mlx5_adev_devices[i].is_enabled &&
+ !(mlx5_adev_devices[i].is_enabled(dev)))
+ continue;
+
priv->adev[i] = add_adev(dev, i);
if (IS_ERR(priv->adev[i])) {
mlx5_core_warn(dev, "Device[%d] (%s) failed to load\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index e832a3f4c18a..1b769b5da577 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -139,6 +139,13 @@ static int mlx5_devlink_reload_down(struct devlink *devlink, bool netns_change,
struct pci_dev *pdev = dev->pdev;
bool sf_dev_allocated;

+ if (mlx5_dev_is_unregistered(dev)) {
+ if (action != DEVLINK_RELOAD_ACTION_DRIVER_REINIT)
+ return -EOPNOTSUPP;
+ mlx5_unload_one_light(dev);
+ return 0;
+ }
+
sf_dev_allocated = mlx5_sf_dev_allocated(dev);
if (sf_dev_allocated) {
/* Reload results in deleting SF device which further results in
@@ -182,6 +189,10 @@ static int mlx5_devlink_reload_up(struct devlink *devlink, enum devlink_reload_a
*actions_performed = BIT(action);
switch (action) {
case DEVLINK_RELOAD_ACTION_DRIVER_REINIT:
+ if (mlx5_dev_is_unregistered(dev)) {
+ mlx5_fw_reporters_create(dev);
+ return mlx5_init_one(dev);
+ }
return mlx5_load_one(dev);
case DEVLINK_RELOAD_ACTION_FW_ACTIVATE:
if (limit == DEVLINK_RELOAD_LIMIT_NO_RESET)
@@ -258,6 +269,9 @@ static int mlx5_devlink_trap_action_set(struct devlink *devlink,
struct mlx5_devlink_trap *dl_trap;
int err = 0;

+ if (mlx5_dev_is_unregistered(dev))
+ return -EOPNOTSUPP;
+
if (is_mdev_switchdev_mode(dev)) {
NL_SET_ERR_MSG_MOD(extack, "Devlink traps can't be set in switchdev mode");
return -EOPNOTSUPP;
@@ -440,6 +454,9 @@ static int mlx5_devlink_fs_mode_get(struct devlink *devlink, u32 id,
{
struct mlx5_core_dev *dev = devlink_priv(devlink);

+ if (mlx5_dev_is_unregistered(dev))
+ return -EOPNOTSUPP;
+
if (dev->priv.steering->mode == MLX5_FLOW_STEERING_MODE_SMFS)
strcpy(ctx->val.vstr, "smfs");
else
@@ -499,6 +516,9 @@ static int mlx5_devlink_esw_port_metadata_get(struct devlink *devlink, u32 id,
{
struct mlx5_core_dev *dev = devlink_priv(devlink);

+ if (mlx5_dev_is_unregistered(dev))
+ return -EOPNOTSUPP;
+
if (!MLX5_ESWITCH_MANAGER(dev))
return -EOPNOTSUPP;

@@ -542,6 +562,9 @@ static int mlx5_devlink_enable_remote_dev_reset_get(struct devlink *devlink, u32
{
struct mlx5_core_dev *dev = devlink_priv(devlink);

+ if (mlx5_dev_is_unregistered(dev))
+ return -EOPNOTSUPP;
+
ctx->val.vbool = mlx5_fw_reset_enable_remote_dev_reset_get(dev);
return 0;
}
@@ -588,7 +611,7 @@ static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
struct mlx5_core_dev *dev = devlink_priv(devlink);
union devlink_param_value value;

- value.vbool = MLX5_CAP_GEN(dev, roce);
+ value.vbool = MLX5_CAP_GEN(dev, roce) && !mlx5_dev_is_unregistered(dev);
devlink_param_driverinit_value_set(devlink,
DEVLINK_PARAM_GENERIC_ID_ENABLE_ROCE,
value);
@@ -628,7 +651,7 @@ static int mlx5_devlink_eth_param_register(struct devlink *devlink)
if (err)
return err;

- value.vbool = true;
+ value.vbool = !mlx5_dev_is_unregistered(dev);
devlink_param_driverinit_value_set(devlink,
DEVLINK_PARAM_GENERIC_ID_ENABLE_ETH,
value);
@@ -673,7 +696,7 @@ static int mlx5_devlink_rdma_param_register(struct devlink *devlink)
if (err)
return err;

- value.vbool = true;
+ value.vbool = !mlx5_dev_is_unregistered(devlink_priv(devlink));
devlink_param_driverinit_value_set(devlink,
DEVLINK_PARAM_GENERIC_ID_ENABLE_RDMA,
value);
@@ -705,7 +728,7 @@ static int mlx5_devlink_vnet_param_register(struct devlink *devlink)
if (err)
return err;

- value.vbool = true;
+ value.vbool = !mlx5_dev_is_unregistered(dev);
devlink_param_driverinit_value_set(devlink,
DEVLINK_PARAM_GENERIC_ID_ENABLE_VNET,
value);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c
index 737df402c927..1b40ada0e9c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/health.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c
@@ -700,7 +700,7 @@ static const struct devlink_health_reporter_ops mlx5_fw_fatal_reporter_ops = {
};

#define MLX5_REPORTER_FW_GRACEFUL_PERIOD 1200000
-static void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
+void mlx5_fw_reporters_create(struct mlx5_core_dev *dev)
{
struct mlx5_core_health *health = &dev->priv.health;
struct devlink *devlink = priv_to_devlink(dev);
@@ -893,7 +893,8 @@ int mlx5_health_init(struct mlx5_core_dev *dev)
struct mlx5_core_health *health;
char *name;

- mlx5_fw_reporters_create(dev);
+ if (!mlx5_dev_is_unregistered(dev))
+ mlx5_fw_reporters_create(dev);

health = &dev->priv.health;
name = kmalloc(64, GFP_KERNEL);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 73d2cec01ead..b5e40ad0b5fb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1314,6 +1314,7 @@ static void mlx5_unload(struct mlx5_core_dev *dev)

int mlx5_init_one(struct mlx5_core_dev *dev)
{
+ bool light_probe = mlx5_dev_is_unregistered(dev);
int err = 0;

mutex_lock(&dev->intf_state_mutex);
@@ -1335,9 +1336,14 @@ int mlx5_init_one(struct mlx5_core_dev *dev)

set_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);

- err = mlx5_devlink_register(priv_to_devlink(dev));
- if (err)
- goto err_devlink_reg;
+ /* In case of light_probe, mlx5_devlink is already registered.
+ * Hence, don't register devlink again.
+ */
+ if (!light_probe) {
+ err = mlx5_devlink_register(priv_to_devlink(dev));
+ if (err)
+ goto err_devlink_reg;
+ }

err = mlx5_register_device(dev);
if (err)
@@ -1347,7 +1353,8 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
return 0;

err_register:
- mlx5_devlink_unregister(priv_to_devlink(dev));
+ if (!light_probe)
+ mlx5_devlink_unregister(priv_to_devlink(dev));
err_devlink_reg:
clear_bit(MLX5_INTERFACE_STATE_UP, &dev->intf_state);
mlx5_unload(dev);
@@ -1443,6 +1450,93 @@ void mlx5_unload_one(struct mlx5_core_dev *dev)
mutex_unlock(&dev->intf_state_mutex);
}

+/* In case of light probe, we don't need a full query of hca_caps, but only the bellow caps.
+ * A full query of hca_caps will be done when the device will reload.
+ */
+static int mlx5_query_hca_caps_light(struct mlx5_core_dev *dev)
+{
+ int err;
+
+ err = mlx5_core_get_caps(dev, MLX5_CAP_GENERAL);
+ if (err)
+ return err;
+
+ if (MLX5_CAP_GEN(dev, eth_net_offloads)) {
+ err = mlx5_core_get_caps(dev, MLX5_CAP_ETHERNET_OFFLOADS);
+ if (err)
+ return err;
+ }
+
+ if (MLX5_CAP_GEN(dev, nic_flow_table) ||
+ MLX5_CAP_GEN(dev, ipoib_enhanced_offloads)) {
+ err = mlx5_core_get_caps(dev, MLX5_CAP_FLOW_TABLE);
+ if (err)
+ return err;
+ }
+
+ if (MLX5_CAP_GEN_64(dev, general_obj_types) &
+ MLX5_GENERAL_OBJ_TYPES_CAP_VIRTIO_NET_Q) {
+ err = mlx5_core_get_caps(dev, MLX5_CAP_VDPA_EMULATION);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+int mlx5_init_one_light(struct mlx5_core_dev *dev)
+{
+ int err;
+
+ dev->state = MLX5_DEVICE_STATE_UP;
+ err = mlx5_function_enable(dev);
+ if (err) {
+ mlx5_core_warn(dev, "mlx5_function_enable err=%d\n", err);
+ goto out;
+ }
+
+ err = mlx5_query_hca_caps_light(dev);
+ if (err) {
+ mlx5_core_warn(dev, "mlx5_query_hca_caps_light err=%d\n", err);
+ goto query_hca_caps_err;
+ }
+
+ err = mlx5_devlink_register(priv_to_devlink(dev));
+ if (err) {
+ mlx5_core_warn(dev, "mlx5_devlink_reg err = %d\n", err);
+ goto query_hca_caps_err;
+ }
+
+ return 0;
+
+query_hca_caps_err:
+ mlx5_function_disable(dev);
+out:
+ dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+ return err;
+}
+
+void mlx5_uninit_one_light(struct mlx5_core_dev *dev)
+{
+ mlx5_devlink_unregister(priv_to_devlink(dev));
+ if (dev->state != MLX5_DEVICE_STATE_UP)
+ return;
+ mlx5_function_disable(dev);
+}
+
+/* xxx_ligth() function are used in order to configure the device without full
+ * init (light init). e.g.: There isn't a point in reload a device to light state.
+ * Hence, mlx5_load_one_light() isn't needed.
+ */
+
+void mlx5_unload_one_light(struct mlx5_core_dev *dev)
+{
+ if (dev->state != MLX5_DEVICE_STATE_UP)
+ return;
+ mlx5_function_disable(dev);
+ dev->state = MLX5_DEVICE_STATE_INTERNAL_ERROR;
+}
+
static const int types[] = {
MLX5_CAP_GENERAL,
MLX5_CAP_GENERAL_2,
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
index 6f8baa0f2a73..e2d827474552 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.h
@@ -209,11 +209,14 @@ int mlx5_attach_device(struct mlx5_core_dev *dev);
void mlx5_detach_device(struct mlx5_core_dev *dev);
int mlx5_register_device(struct mlx5_core_dev *dev);
void mlx5_unregister_device(struct mlx5_core_dev *dev);
+void mlx5_dev_mark_unregistered(struct mlx5_core_dev *dev);
+bool mlx5_dev_is_unregistered(struct mlx5_core_dev *dev);
struct mlx5_core_dev *mlx5_get_next_phys_dev(struct mlx5_core_dev *dev);
void mlx5_dev_list_lock(void);
void mlx5_dev_list_unlock(void);
int mlx5_dev_list_trylock(void);

+void mlx5_fw_reporters_create(struct mlx5_core_dev *dev);
int mlx5_query_mtpps(struct mlx5_core_dev *dev, u32 *mtpps, u32 mtpps_size);
int mlx5_set_mtpps(struct mlx5_core_dev *mdev, u32 *mtpps, u32 mtpps_size);
int mlx5_query_mtppse(struct mlx5_core_dev *mdev, u8 pin, u8 *arm, u8 *mode);
@@ -291,6 +294,9 @@ int mlx5_init_one(struct mlx5_core_dev *dev);
void mlx5_uninit_one(struct mlx5_core_dev *dev);
void mlx5_unload_one(struct mlx5_core_dev *dev);
int mlx5_load_one(struct mlx5_core_dev *dev);
+int mlx5_init_one_light(struct mlx5_core_dev *dev);
+void mlx5_uninit_one_light(struct mlx5_core_dev *dev);
+void mlx5_unload_one_light(struct mlx5_core_dev *dev);

int mlx5_vport_get_other_func_cap(struct mlx5_core_dev *dev, u16 function_id, void *out);

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 7b4783ce213e..8d6e0e44e614 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -28,6 +28,9 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
mdev->priv.adev_idx = adev->id;
sf_dev->mdev = mdev;

+ if (sf_dev->parent_mdev->priv.sfs_light_probe)
+ mlx5_dev_mark_unregistered(mdev);
+
err = mlx5_mdev_init(mdev, MLX5_DEFAULT_PROF);
if (err) {
mlx5_core_warn(mdev, "mlx5_mdev_init on err=%d\n", err);
@@ -41,7 +44,10 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
goto remap_err;
}

- err = mlx5_init_one(mdev);
+ if (sf_dev->parent_mdev->priv.sfs_light_probe)
+ err = mlx5_init_one_light(mdev);
+ else
+ err = mlx5_init_one(mdev);
if (err) {
mlx5_core_warn(mdev, "mlx5_init_one err=%d\n", err);
goto init_one_err;
@@ -64,7 +70,10 @@ static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
struct devlink *devlink = priv_to_devlink(sf_dev->mdev);

devlink_unregister(devlink);
- mlx5_uninit_one(sf_dev->mdev);
+ if (mlx5_dev_is_unregistered(sf_dev->mdev))
+ mlx5_uninit_one_light(sf_dev->mdev);
+ else
+ mlx5_uninit_one(sf_dev->mdev);
iounmap(sf_dev->mdev->iseg);
mlx5_mdev_uninit(sf_dev->mdev);
mlx5_devlink_free(devlink);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
index 3be659cd91f1..e3f6066ab7af 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/devlink.c
@@ -10,6 +10,7 @@
#include "ecpf.h"
#define CREATE_TRACE_POINTS
#include "diag/sf_tracepoint.h"
+#include "devlink.h"

struct mlx5_sf {
struct devlink_port dl_port;
@@ -456,6 +457,44 @@ static int mlx5_sf_vhca_event(struct notifier_block *nb, unsigned long opcode, v
return 0;
}

+static int mlx5_devlink_sfs_light_probe_get(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+ ctx->val.vbool = !dev->priv.sfs_light_probe;
+ return 0;
+}
+
+static int mlx5_devlink_sfs_light_probe_set(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+ dev->priv.sfs_light_probe = !ctx->val.vbool;
+ return 0;
+}
+
+static const struct devlink_param sfs_light_probe_param =
+ DEVLINK_PARAM_GENERIC(ENABLE_SFS_AUX_DEVS, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+ mlx5_devlink_sfs_light_probe_get,
+ mlx5_devlink_sfs_light_probe_set, NULL);
+
+int mlx5_devlink_sfs_light_probe_param_register(struct devlink *devlink)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+
+ if (!mlx5_core_is_pf(dev) || !mlx5_sf_max_functions(dev))
+ return 0;
+
+ return devlink_param_register(devlink, &sfs_light_probe_param);
+}
+
+void mlx5_devlink_sfs_light_probe_param_unregister(struct devlink *devlink)
+{
+ devlink_param_unregister(devlink, &sfs_light_probe_param);
+}
+
static void mlx5_sf_table_enable(struct mlx5_sf_table *table)
{
init_completion(&table->disable_complete);
@@ -533,6 +572,7 @@ int mlx5_sf_table_init(struct mlx5_core_dev *dev)
table->dev = dev;
xa_init(&table->port_indices);
dev->priv.sf_table = table;
+ dev->priv.sfs_light_probe = false;
refcount_set(&table->refcount, 0);
table->esw_nb.notifier_call = mlx5_sf_esw_event;
err = mlx5_esw_event_notifier_register(dev->priv.eswitch, &table->esw_nb);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
index 17aa348989cb..664704598b76 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/hw_table.c
@@ -283,9 +283,15 @@ int mlx5_sf_hw_table_init(struct mlx5_core_dev *dev)
if (err)
goto ext_err;

+ err = mlx5_devlink_sfs_light_probe_param_register(priv_to_devlink(dev));
+ if (err)
+ goto sfs_reg_err;
+
mlx5_core_dbg(dev, "SF HW table: max sfs = %d, ext sfs = %d\n", max_fn, max_ext_fn);
return 0;

+sfs_reg_err:
+ mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_EXTERNAL]);
ext_err:
mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]);
table_err:
@@ -301,6 +307,7 @@ void mlx5_sf_hw_table_cleanup(struct mlx5_core_dev *dev)
if (!table)
return;

+ mlx5_devlink_sfs_light_probe_param_unregister(priv_to_devlink(dev));
mutex_destroy(&table->table_lock);
mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_EXTERNAL]);
mlx5_sf_hw_table_hwc_cleanup(&table->hwc[MLX5_SF_HWC_LOCAL]);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
index 7114f3fc335f..c228edc8e43c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/priv.h
@@ -19,4 +19,6 @@ void mlx5_sf_hw_table_sf_free(struct mlx5_core_dev *dev, u32 controller, u16 id)
void mlx5_sf_hw_table_sf_deferred_free(struct mlx5_core_dev *dev, u32 controller, u16 id);
bool mlx5_sf_hw_table_supported(const struct mlx5_core_dev *dev);

+int mlx5_devlink_sfs_light_probe_param_register(struct devlink *devlink);
+void mlx5_devlink_sfs_light_probe_param_unregister(struct devlink *devlink);
#endif
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 78655d8d13a7..fd07175896ff 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -604,6 +604,7 @@ struct mlx5_priv {
struct mlx5_vhca_state_notifier *vhca_state_notifier;
struct mlx5_sf_dev_table *sf_dev_table;
struct mlx5_core_dev *parent_mdev;
+ u8 sfs_light_probe:1;
#endif
#ifdef CONFIG_MLX5_SF_MANAGER
struct mlx5_sf_hw_table *sf_hw_table;
--
2.26.3

2022-02-14 19:29:05

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v2 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe

On Fri, 11 Feb 2022 11:20:17 +0200 Moshe Shemesh wrote:
> v1->v2:
> - updated example to make clear SF port and SF device creation PFs
> - added example when SF port and device creation PFs are on different hosts

How does this address my comments?

We will not define Linux APIs based on what your firmware can or
cannot do today. Can we somehow avoid having another frustrating
and drawn out discussion that hinges on that point?

Otherwise, why the global policy and all the hoops to jump thru?
User wants a device with a vnet, give them a device with a vnet.

You left out from your steps how ESW learns that the device has
to be spawned. Given there's some form of communication between
user intent and ESW the location of the two is completely irrelevant.
You were right to treat the two cases as equivalent in the cover
letter for v1.

2022-02-14 20:32:50

by Parav Pandit

[permalink] [raw]
Subject: RE: [PATCH net-next v2 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe


> From: Jakub Kicinski <[email protected]>
> Sent: Saturday, February 12, 2022 6:43 AM
>
> On Fri, 11 Feb 2022 11:20:17 +0200 Moshe Shemesh wrote:
> > v1->v2:
> > - updated example to make clear SF port and SF device creation PFs
> > - added example when SF port and device creation PFs are on different
> > hosts
>
> How does this address my comments?
>
Which one?
Your suggestion in [1] to specify vnet during port function spawning time?
Or
Your suggestion in [2] to add "noprobe" option?

> We will not define Linux APIs based on what your firmware can or cannot do
> today.
Sure. I just answered and clarified what it is the device capable to do.
We can possibly enhance the fw if it looks correct.

> Otherwise, why the global policy and all the hoops to jump thru?
> User wants a device with a vnet, give them a device with a vnet.
>
User wants a device with specific attributes.
Do you suggest to tunnel those params at port spawning time to share to different host via fw?
Some are HW/FW capabilities, and some are hints.
Some are hints because vnet also uses some eth resources by its very nature of being vnet.

Lets discuss two use cases.
Use case -1:
User wants params of [3] to below value.
eth=false, vnet=false, rdma=true, roce=false, io_eq_size=4, event_eq_size=256, max_macs=1.

Use case -2:
User wants params of [3] be below value.
eth=true, vnet=false, rdma=true, roce=true, rest=don't care.

Last year, when we added "roce" in [4] on the eswitch side, you commented in [4] to leave the decision on the SF side.
Based on this feedback, you can see growth of such params on the SF side in [5], [6] and reusing existing params in [7].
(instead of doing them on port spawning side)

Port spawning time attributes should cover minimum of below attributes of [3].
(a) enable_vnet,eth,roce,rdma,iwarp (b) io_eq_size, (c) event_eq_size (d) max_macs.

Do you agree if above list is worth addition as port function attributes?
If not, its not addressing the user needs.

Did you get a chance to read my reply in [8]?
In future when user wants to change the cpu affinity of a SF, user needs to perform devlink reload.
And params of [3] + any new devlink params also benefit from single devlink reload?
For example, which and how many cpus to use is something best decided by the depending on the workload and use case.

> You left out from your steps how ESW learns that the device has to be
> spawned.
I read above note few times, but didn't understand. Can you please explain?

> Given there's some form of communication between user intent and
> ESW the location of the two is completely irrelevant.
I find it difficult to have all attributes on the port function, specially knobs which are very host specific.
Few valid knobs that I see on host side are
(a) cpu affinity mask
(b) number of msix vectors to consume within driver internally vs map to user space

At present I see knobs on both sides.
Saeed is offline this week, and I want to gather his feedback as well on passing hints from port spawning side to host side.

2022-02-24 03:54:09

by Parav Pandit

[permalink] [raw]
Subject: RE: [PATCH net-next v2 0/4] net/mlx5: Introduce devlink param to disable SF aux dev probe

Hi Jakub,

> From: Parav Pandit
> Sent: Monday, February 14, 2022 8:16 PM
>
> > From: Jakub Kicinski <[email protected]>
> > Sent: Saturday, February 12, 2022 6:43 AM
> >
> > On Fri, 11 Feb 2022 11:20:17 +0200 Moshe Shemesh wrote:
> > > v1->v2:
> > > - updated example to make clear SF port and SF device creation PFs
> > > - added example when SF port and device creation PFs are on
> > > different hosts
> >
> > How does this address my comments?
> >
> Which one?
> Your suggestion in [1] to specify vnet during port function spawning time?
> Or
> Your suggestion in [2] to add "noprobe" option?

> Saeed is offline this week, and I want to gather his feedback as well on passing
> hints from port spawning side to host side.

Saeed is back. Saeed, others and I discussed the per SF knob further.
The option to indicate to not initialize the SF device during port spawning time is very useful to users.

So, how about crafting below UAPI?
Example:

Esw side:
$ devlink port add <dev> flavour pcisf ..
$ devlink port function set <port> hw_addr 00:11:22:33:44:55 initialize false/true ...

The "initialize" option indicates whether a function should fully initialize or not at driver level on the host side.
When initialize=false, the device will spawn but not initialize until a user on the host initialize it using the existing devlink reload API on this SF device.

For example, on host side, a user will be able to do,
$ devlink dev auxiliary/mlx5_core.sf2 resource/param set
$ devlink dev reload auxiliary/mlx5_core.sf.2