2020-07-27 11:11:17

by Moshe Shemesh

[permalink] [raw]
Subject: [PATCH net-next RFC 00/13] Add devlink reload level option

Introduce new option on devlink reload API to enable the user to select the
reload level required. Complete support for all levels in mlx5.
The following reload levels are supported:
driver: Driver entities re-instantiation only.
fw_reset: Firmware reset and driver entities re-instantiation.
fw_live_patch: Firmware live patching only.

Each driver which support this command should expose the reload levels
supported and the driver's default reload level.
The uAPI is backward compatible, if the reload level option is omitted
from the reload command, the driver's default reload level will be used.

Patch 1 adds the new API reload level option to devlink.
Patch 2 exposes the supported reload levels and default level on devlink
dev get.
Patches 3-8 add support on mlx5 for devlink reload level fw-reset and
handle the firmware reset events.
Patches 9-10 add devlink enable remote dev reset parameter and use it
in mlx5.
Patches 11-12 mlx5 add devlink reload live patch support and event
handling.
Patch 13 adds documentation file devlink-reload.rst

Command examples:

# Run reload command with fw-reset reload level:
$ devlink dev reload pci/0000:82:00.0 level fw-reset

# Run reload command with driver reload level:
$ devlink dev reload pci/0000:82:00.0 level driver

# Run reload command with driver's default level (backward compatible):
$ devlink dev reload pci/0000:82:00.0


Moshe Shemesh (13):
devlink: Add reload level option to devlink reload command
devlink: Add reload levels data to dev get
net/mlx5: Add functions to set/query MFRL register
net/mlx5: Set cap for pci sync for fw update event
net/mlx5: Handle sync reset request event
net/mlx5: Handle sync reset now event
net/mlx5: Handle sync reset abort event
net/mlx5: Add support for devlink reload level fw reset
devlink: Add enable_remote_dev_reset generic parameter
net/mlx5: Add devlink param enable_remote_dev_reset support
net/mlx5: Add support for fw live patch event
net/mlx5: Add support for devlink reload level live patch
devlink: Add Documentation/networking/devlink/devlink-reload.rst

.../networking/devlink/devlink-params.rst | 6 +
.../networking/devlink/devlink-reload.rst | 56 +++
Documentation/networking/devlink/index.rst | 1 +
drivers/net/ethernet/mellanox/mlx4/main.c | 6 +-
.../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
.../net/ethernet/mellanox/mlx5/core/devlink.c | 114 +++++-
.../mellanox/mlx5/core/diag/fw_tracer.c | 31 ++
.../mellanox/mlx5/core/diag/fw_tracer.h | 1 +
.../ethernet/mellanox/mlx5/core/fw_reset.c | 328 ++++++++++++++++++
.../ethernet/mellanox/mlx5/core/fw_reset.h | 17 +
.../net/ethernet/mellanox/mlx5/core/health.c | 74 +++-
.../net/ethernet/mellanox/mlx5/core/main.c | 13 +
drivers/net/ethernet/mellanox/mlxsw/core.c | 6 +-
drivers/net/netdevsim/dev.c | 6 +-
include/linux/mlx5/device.h | 1 +
include/linux/mlx5/driver.h | 12 +
include/net/devlink.h | 10 +-
include/uapi/linux/devlink.h | 22 ++
net/core/devlink.c | 95 ++++-
19 files changed, 764 insertions(+), 37 deletions(-)
create mode 100644 Documentation/networking/devlink/devlink-reload.rst
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h

--
2.17.1


2020-07-27 11:11:18

by Moshe Shemesh

[permalink] [raw]
Subject: [PATCH net-next RFC 11/13] net/mlx5: Add support for fw live patch event

Firmware live patch event notifies the driver that the firmware was just
updated using live patch. In such case the driver should not reload or
re-initiate entities, part to updating the firmware version and
re-initiate the firmware tracer which can be updated by live patch with
new strings database to help debugging an issue.

Signed-off-by: Moshe Shemesh <[email protected]>
---
.../mellanox/mlx5/core/diag/fw_tracer.c | 31 +++++++++++++++++++
.../mellanox/mlx5/core/diag/fw_tracer.h | 1 +
.../ethernet/mellanox/mlx5/core/fw_reset.c | 27 ++++++++++++++++
include/linux/mlx5/device.h | 1 +
4 files changed, 60 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
index ad3594c4afcb..08dae045d185 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.c
@@ -1064,6 +1064,37 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer)
kvfree(tracer);
}

+int mlx5_fw_tracer_recreate_strings_db(struct mlx5_fw_tracer *tracer)
+{
+ struct mlx5_core_dev *dev;
+ int err;
+
+ if (IS_ERR_OR_NULL(tracer))
+ return -EINVAL;
+
+ cancel_work_sync(&tracer->read_fw_strings_work);
+ mlx5_fw_tracer_clean_ready_list(tracer);
+ mlx5_fw_tracer_clean_print_hash(tracer);
+ mlx5_fw_tracer_clean_saved_traces_array(tracer);
+ mlx5_fw_tracer_free_strings_db(tracer);
+
+ dev = tracer->dev;
+ err = mlx5_query_mtrc_caps(tracer);
+ if (err) {
+ mlx5_core_dbg(dev, "FWTracer: Failed to query capabilities %d\n", err);
+ return err;
+ }
+
+ err = mlx5_fw_tracer_allocate_strings_db(tracer);
+ if (err) {
+ mlx5_core_warn(dev, "FWTracer: Allocate strings DB failed %d\n", err);
+ return err;
+ }
+ mlx5_fw_tracer_init_saved_traces_array(tracer);
+
+ return 0;
+}
+
static int fw_tracer_event(struct notifier_block *nb, unsigned long action, void *data)
{
struct mlx5_fw_tracer *tracer = mlx5_nb_cof(nb, struct mlx5_fw_tracer, nb);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
index 40601fba80ba..1a755098aeeb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/diag/fw_tracer.h
@@ -191,5 +191,6 @@ void mlx5_fw_tracer_destroy(struct mlx5_fw_tracer *tracer);
int mlx5_fw_tracer_trigger_core_dump_general(struct mlx5_core_dev *dev);
int mlx5_fw_tracer_get_saved_traces_objects(struct mlx5_fw_tracer *tracer,
struct devlink_fmsg *fmsg);
+int mlx5_fw_tracer_recreate_strings_db(struct mlx5_fw_tracer *tracer);

#endif
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index 45fdecbadf52..87465a5e5577 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -2,11 +2,13 @@
/* Copyright (c) 2020, Mellanox Technologies inc. All rights reserved. */

#include "fw_reset.h"
+#include "diag/fw_tracer.h"

struct mlx5_fw_reset {
struct mlx5_core_dev *dev;
struct mlx5_nb nb;
struct workqueue_struct *wq;
+ struct work_struct fw_live_patch_work;
struct work_struct reset_request_work;
struct work_struct reset_now_work;
struct work_struct reset_abort_work;
@@ -66,6 +68,27 @@ static int mlx5_fw_set_reset_sync_nack(struct mlx5_core_dev *dev)
return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 2, false);
}

+static void mlx5_fw_live_patch_event(struct work_struct *work)
+{
+ struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
+ fw_live_patch_work);
+ struct mlx5_core_dev *dev = fw_reset->dev;
+ struct mlx5_fw_tracer *tracer;
+
+ mlx5_core_info(dev, "Live patch updated firmware version: %d.%d.%d\n", fw_rev_maj(dev),
+ fw_rev_min(dev), fw_rev_sub(dev));
+
+ tracer = dev->tracer;
+ if (IS_ERR_OR_NULL(tracer))
+ return;
+
+ mlx5_fw_tracer_cleanup(tracer);
+ if (mlx5_fw_tracer_recreate_strings_db(tracer))
+ mlx5_core_err(dev, "Failed to recreate FW tracer strings DB\n");
+ if (mlx5_fw_tracer_init(tracer))
+ mlx5_core_err(dev, "Failed to re-initialize FW tracer\n");
+}
+
static void mlx5_sync_reset_request_event(struct work_struct *work)
{
struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
@@ -241,6 +264,9 @@ static int fw_reset_event_notifier(struct notifier_block *nb, unsigned long acti
struct mlx5_eqe *eqe = data;

switch (eqe->sub_type) {
+ case MLX5_GENERAL_SUBTYPE_FW_LIVE_PATCH_EVENT:
+ queue_work(fw_reset->wq, &fw_reset->fw_live_patch_work);
+ break;
case MLX5_GENERAL_SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT:
mlx5_sync_reset_events_handle(fw_reset, eqe);
break;
@@ -280,6 +306,7 @@ int mlx5_fw_reset_events_init(struct mlx5_core_dev *dev)
fw_reset->dev = dev;
dev->priv.fw_reset = fw_reset;

+ INIT_WORK(&fw_reset->fw_live_patch_work, mlx5_fw_live_patch_event);
INIT_WORK(&fw_reset->reset_request_work, mlx5_sync_reset_request_event);
INIT_WORK(&fw_reset->reset_now_work, mlx5_sync_reset_now_event);
INIT_WORK(&fw_reset->reset_abort_work, mlx5_sync_reset_abort_event);
diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h
index 57db125e5802..58e63bb718a2 100644
--- a/include/linux/mlx5/device.h
+++ b/include/linux/mlx5/device.h
@@ -364,6 +364,7 @@ enum {
enum {
MLX5_GENERAL_SUBTYPE_DELAY_DROP_TIMEOUT = 0x1,
MLX5_GENERAL_SUBTYPE_PCI_POWER_CHANGE_EVENT = 0x5,
+ MLX5_GENERAL_SUBTYPE_FW_LIVE_PATCH_EVENT = 0x7,
MLX5_GENERAL_SUBTYPE_PCI_SYNC_FOR_FW_UPDATE_EVENT = 0x8,
};

--
2.17.1

2020-07-27 11:11:32

by Moshe Shemesh

[permalink] [raw]
Subject: [PATCH net-next RFC 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support

The enable_remote_dev_reset devlink param flags that the host admin
allows resets by other hosts. In case it is cleared mlx5 host PF driver
will send NACK on pci sync for firmware update reset request and the
command will fail.
By default enable_remote_dev_reset parameter is true, so pci sync for
firmware update reset is enabled.

Signed-off-by: Moshe Shemesh <[email protected]>
---
.../net/ethernet/mellanox/mlx5/core/devlink.c | 29 +++++++++++++++++++
.../ethernet/mellanox/mlx5/core/fw_reset.c | 12 ++++++++
include/linux/mlx5/driver.h | 1 +
3 files changed, 42 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index 905d55cab4c3..a81204b3dd7c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -275,6 +275,32 @@ static int mlx5_devlink_large_group_num_validate(struct devlink *devlink, u32 id
}
#endif

+static int mlx5_devlink_enable_remote_dev_reset_set(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ struct mlx5_core_health *health;
+
+ health = &dev->priv.health;
+ if (ctx->val.vbool)
+ clear_bit(MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST, &health->reset_flags);
+ else
+ set_bit(MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST, &health->reset_flags);
+ return 0;
+}
+
+static int mlx5_devlink_enable_remote_dev_reset_get(struct devlink *devlink, u32 id,
+ struct devlink_param_gset_ctx *ctx)
+{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
+ struct mlx5_core_health *health;
+
+ health = &dev->priv.health;
+ ctx->val.vbool = !test_bit(MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST,
+ &health->reset_flags);
+ return 0;
+}
+
static const struct devlink_param mlx5_devlink_params[] = {
DEVLINK_PARAM_DRIVER(MLX5_DEVLINK_PARAM_ID_FLOW_STEERING_MODE,
"flow_steering_mode", DEVLINK_PARAM_TYPE_STRING,
@@ -290,6 +316,9 @@ static const struct devlink_param mlx5_devlink_params[] = {
NULL, NULL,
mlx5_devlink_large_group_num_validate),
#endif
+ DEVLINK_PARAM_GENERIC(ENABLE_REMOTE_DEV_RESET, BIT(DEVLINK_PARAM_CMODE_RUNTIME),
+ mlx5_devlink_enable_remote_dev_reset_get,
+ mlx5_devlink_enable_remote_dev_reset_set, NULL),
};

static void mlx5_devlink_set_params_init_values(struct devlink *devlink)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
index f95df226b915..45fdecbadf52 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
@@ -61,12 +61,24 @@ static int mlx5_fw_set_reset_sync_ack(struct mlx5_core_dev *dev)
return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 1, false);
}

+static int mlx5_fw_set_reset_sync_nack(struct mlx5_core_dev *dev)
+{
+ return mlx5_reg_mfrl_set(dev, MLX5_MFRL_REG_RESET_LEVEL3, 0, 2, false);
+}
+
static void mlx5_sync_reset_request_event(struct work_struct *work)
{
struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
reset_request_work);
struct mlx5_core_dev *dev = fw_reset->dev;
+ int err;

+ if (test_bit(MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST, &dev->priv.health.reset_flags)) {
+ err = mlx5_fw_set_reset_sync_nack(dev);
+ mlx5_core_warn(dev, "PCI Sync FW Update Reset Nack %s",
+ err ? "Failed" : "Sent");
+ return;
+ }
mlx5_health_set_reset_requested_mode(dev);
mlx5_reload_health_poll_timer(dev);
if (mlx5_fw_set_reset_sync_ack(dev))
diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
index 27b6086f3095..4999dff9dc8c 100644
--- a/include/linux/mlx5/driver.h
+++ b/include/linux/mlx5/driver.h
@@ -426,6 +426,7 @@ struct mlx5_sq_bfreg {
enum {
MLX5_HEALTH_RESET_FLAGS_RESET_REQUESTED,
MLX5_HEALTH_RESET_FLAGS_SILENT_RECOVERY,
+ MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST,
};

struct mlx5_core_health {
--
2.17.1

2020-07-28 01:00:53

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 10/13] net/mlx5: Add devlink param enable_remote_dev_reset support

On Mon, 27 Jul 2020 14:02:30 +0300 Moshe Shemesh wrote:
> static void mlx5_sync_reset_request_event(struct work_struct *work)
> {
> struct mlx5_fw_reset *fw_reset = container_of(work, struct mlx5_fw_reset,
> reset_request_work);
> struct mlx5_core_dev *dev = fw_reset->dev;
> + int err;
>
> + if (test_bit(MLX5_HEALTH_RESET_FLAGS_NACK_RESET_REQUEST, &dev->priv.health.reset_flags)) {
> + err = mlx5_fw_set_reset_sync_nack(dev);
> + mlx5_core_warn(dev, "PCI Sync FW Update Reset Nack %s",
> + err ? "Failed" : "Sent");
> + return;
> + }

What if the NACK fails? Does the reset still proceed?

> mlx5_health_set_reset_requested_mode(dev);
> mlx5_reload_health_poll_timer(dev);
> if (mlx5_fw_set_reset_sync_ack(dev))

2020-07-28 05:28:46

by Vasundhara Volam

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option

On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
>
> Introduce new option on devlink reload API to enable the user to select the
> reload level required. Complete support for all levels in mlx5.
> The following reload levels are supported:
> driver: Driver entities re-instantiation only.
> fw_reset: Firmware reset and driver entities re-instantiation.
The Name is a little confusing. I think it should be renamed to
fw_live_reset (in which both firmware and driver entities are
re-instantiated). For only fw_reset, the driver should not undergo
reset (it requires a driver reload for firmware to undergo reset).

> fw_live_patch: Firmware live patching only.
This level is not clear. Is this similar to flashing??

Also I have a basic query. The reload command is split into
reload_up/reload_down handlers (Please correct me if this behaviour is
changed with this patchset). What if the vendor specific driver does
not support up/down and needs only a single handler to fire a firmware
reset or firmware live reset command?
>
> Each driver which support this command should expose the reload levels
> supported and the driver's default reload level.
> The uAPI is backward compatible, if the reload level option is omitted
> from the reload command, the driver's default reload level will be used.
>
> Patch 1 adds the new API reload level option to devlink.
> Patch 2 exposes the supported reload levels and default level on devlink
> dev get.
> Patches 3-8 add support on mlx5 for devlink reload level fw-reset and
> handle the firmware reset events.
> Patches 9-10 add devlink enable remote dev reset parameter and use it
> in mlx5.
> Patches 11-12 mlx5 add devlink reload live patch support and event
> handling.
> Patch 13 adds documentation file devlink-reload.rst
>
> Command examples:
>
> # Run reload command with fw-reset reload level:
> $ devlink dev reload pci/0000:82:00.0 level fw-reset
>
> # Run reload command with driver reload level:
> $ devlink dev reload pci/0000:82:00.0 level driver
>
> # Run reload command with driver's default level (backward compatible):
> $ devlink dev reload pci/0000:82:00.0
>
>
> Moshe Shemesh (13):
> devlink: Add reload level option to devlink reload command
> devlink: Add reload levels data to dev get
> net/mlx5: Add functions to set/query MFRL register
> net/mlx5: Set cap for pci sync for fw update event
> net/mlx5: Handle sync reset request event
> net/mlx5: Handle sync reset now event
> net/mlx5: Handle sync reset abort event
> net/mlx5: Add support for devlink reload level fw reset
> devlink: Add enable_remote_dev_reset generic parameter
> net/mlx5: Add devlink param enable_remote_dev_reset support
> net/mlx5: Add support for fw live patch event
> net/mlx5: Add support for devlink reload level live patch
> devlink: Add Documentation/networking/devlink/devlink-reload.rst
>
> .../networking/devlink/devlink-params.rst | 6 +
> .../networking/devlink/devlink-reload.rst | 56 +++
> Documentation/networking/devlink/index.rst | 1 +
> drivers/net/ethernet/mellanox/mlx4/main.c | 6 +-
> .../net/ethernet/mellanox/mlx5/core/Makefile | 2 +-
> .../net/ethernet/mellanox/mlx5/core/devlink.c | 114 +++++-
> .../mellanox/mlx5/core/diag/fw_tracer.c | 31 ++
> .../mellanox/mlx5/core/diag/fw_tracer.h | 1 +
> .../ethernet/mellanox/mlx5/core/fw_reset.c | 328 ++++++++++++++++++
> .../ethernet/mellanox/mlx5/core/fw_reset.h | 17 +
> .../net/ethernet/mellanox/mlx5/core/health.c | 74 +++-
> .../net/ethernet/mellanox/mlx5/core/main.c | 13 +
> drivers/net/ethernet/mellanox/mlxsw/core.c | 6 +-
> drivers/net/netdevsim/dev.c | 6 +-
> include/linux/mlx5/device.h | 1 +
> include/linux/mlx5/driver.h | 12 +
> include/net/devlink.h | 10 +-
> include/uapi/linux/devlink.h | 22 ++
> net/core/devlink.c | 95 ++++-
> 19 files changed, 764 insertions(+), 37 deletions(-)
> create mode 100644 Documentation/networking/devlink/devlink-reload.rst
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c
> create mode 100644 drivers/net/ethernet/mellanox/mlx5/core/fw_reset.h
>
> --
> 2.17.1
>

2020-07-28 16:40:18

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option



On 7/27/2020 4:02 AM, Moshe Shemesh wrote:
> Introduce new option on devlink reload API to enable the user to select the
> reload level required. Complete support for all levels in mlx5.
> The following reload levels are supported:
> driver: Driver entities re-instantiation only.

So, this is the current support. Ok.

> fw_reset: Firmware reset and driver entities re-instantiation.


This would include firmware update? What about differing levels of
device/firmware reset? I.e. I think some of our HW has function level
reset, device wide reset, as well as EMP reset. For us, only EMP reset
would trigger firmware update.

> fw_live_patch: Firmware live patching only.

This is for update without reset, right?

2020-07-28 16:45:19

by Jacob Keller

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option



On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
>>
>> Introduce new option on devlink reload API to enable the user to select the
>> reload level required. Complete support for all levels in mlx5.
>> The following reload levels are supported:
>> driver: Driver entities re-instantiation only.
>> fw_reset: Firmware reset and driver entities re-instantiation.
> The Name is a little confusing. I think it should be renamed to
> fw_live_reset (in which both firmware and driver entities are
> re-instantiated). For only fw_reset, the driver should not undergo
> reset (it requires a driver reload for firmware to undergo reset).
>

So, I think the differentiation here is that "live_patch" doesn't reset
anything.

>> fw_live_patch: Firmware live patching only.
> This level is not clear. Is this similar to flashing??
>
> Also I have a basic query. The reload command is split into
> reload_up/reload_down handlers (Please correct me if this behaviour is
> changed with this patchset). What if the vendor specific driver does
> not support up/down and needs only a single handler to fire a firmware
> reset or firmware live reset command?

In the "reload_down" handler, they would trigger the appropriate reset,
and quiesce anything that needs to be done. Then on reload up, it would
restore and bring up anything quiesced in the first stage.

2020-08-03 10:27:36

by Vasundhara Volam

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option

On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
>
>
>
> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
> > On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
> >>
> >> Introduce new option on devlink reload API to enable the user to select the
> >> reload level required. Complete support for all levels in mlx5.
> >> The following reload levels are supported:
> >> driver: Driver entities re-instantiation only.
> >> fw_reset: Firmware reset and driver entities re-instantiation.
> > The Name is a little confusing. I think it should be renamed to
> > fw_live_reset (in which both firmware and driver entities are
> > re-instantiated). For only fw_reset, the driver should not undergo
> > reset (it requires a driver reload for firmware to undergo reset).
> >
>
> So, I think the differentiation here is that "live_patch" doesn't reset
> anything.
This seems similar to flashing the firmware and does not reset anything.

>
> >> fw_live_patch: Firmware live patching only.
> > This level is not clear. Is this similar to flashing??
> >
> > Also I have a basic query. The reload command is split into
> > reload_up/reload_down handlers (Please correct me if this behaviour is
> > changed with this patchset). What if the vendor specific driver does
> > not support up/down and needs only a single handler to fire a firmware
> > reset or firmware live reset command?
>
> In the "reload_down" handler, they would trigger the appropriate reset,
> and quiesce anything that needs to be done. Then on reload up, it would
> restore and bring up anything quiesced in the first stage.
Yes, I got the "reload_down" and "reload_up". Similar to the device
"remove" and "re-probe" respectively.

But our requirement is a similar "ethtool reset" command, where
ethtool calls a single callback in driver and driver just sends a
firmware command for doing the reset. Once firmware receives the
command, it will initiate the reset of driver and firmware entities
asynchronously.

2020-08-03 12:18:26

by Moshe Shemesh

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option


On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
>>
>>
>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
>>>> Introduce new option on devlink reload API to enable the user to select the
>>>> reload level required. Complete support for all levels in mlx5.
>>>> The following reload levels are supported:
>>>> driver: Driver entities re-instantiation only.
>>>> fw_reset: Firmware reset and driver entities re-instantiation.
>>> The Name is a little confusing. I think it should be renamed to
>>> fw_live_reset (in which both firmware and driver entities are
>>> re-instantiated). For only fw_reset, the driver should not undergo
>>> reset (it requires a driver reload for firmware to undergo reset).
>>>
>> So, I think the differentiation here is that "live_patch" doesn't reset
>> anything.
> This seems similar to flashing the firmware and does not reset anything.


The live patch is activating fw change without reset.

It is not suitable for any fw change but fw gaps which don't require reset.

I can query the fw to check if the pending image change is suitable or
require fw reset.

>>>> fw_live_patch: Firmware live patching only.
>>> This level is not clear. Is this similar to flashing??
>>>
>>> Also I have a basic query. The reload command is split into
>>> reload_up/reload_down handlers (Please correct me if this behaviour is
>>> changed with this patchset). What if the vendor specific driver does
>>> not support up/down and needs only a single handler to fire a firmware
>>> reset or firmware live reset command?
>> In the "reload_down" handler, they would trigger the appropriate reset,
>> and quiesce anything that needs to be done. Then on reload up, it would
>> restore and bring up anything quiesced in the first stage.
> Yes, I got the "reload_down" and "reload_up". Similar to the device
> "remove" and "re-probe" respectively.
>
> But our requirement is a similar "ethtool reset" command, where
> ethtool calls a single callback in driver and driver just sends a
> firmware command for doing the reset. Once firmware receives the
> command, it will initiate the reset of driver and firmware entities
> asynchronously.


It is similar to mlx5 case here for fw_reset. The driver triggers the fw
command to reset and all PFs drivers gets events to handle and do
re-initialization.  To fit it to the devlink reload_down and reload_up,
I wait for the event handler to complete and it stops at driver unload
to have the driver up by devlink reload_up. See patch 8 in this patchset.

2020-08-03 12:52:24

by Vasundhara Volam

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option

On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
>
>
> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
> > On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
> >>
> >>
> >> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
> >>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
> >>>> Introduce new option on devlink reload API to enable the user to select the
> >>>> reload level required. Complete support for all levels in mlx5.
> >>>> The following reload levels are supported:
> >>>> driver: Driver entities re-instantiation only.
> >>>> fw_reset: Firmware reset and driver entities re-instantiation.
> >>> The Name is a little confusing. I think it should be renamed to
> >>> fw_live_reset (in which both firmware and driver entities are
> >>> re-instantiated). For only fw_reset, the driver should not undergo
> >>> reset (it requires a driver reload for firmware to undergo reset).
> >>>
> >> So, I think the differentiation here is that "live_patch" doesn't reset
> >> anything.
> > This seems similar to flashing the firmware and does not reset anything.
>
>
> The live patch is activating fw change without reset.
>
> It is not suitable for any fw change but fw gaps which don't require reset.
>
> I can query the fw to check if the pending image change is suitable or
> require fw reset.
Okay.
>
> >>>> fw_live_patch: Firmware live patching only.
> >>> This level is not clear. Is this similar to flashing??
> >>>
> >>> Also I have a basic query. The reload command is split into
> >>> reload_up/reload_down handlers (Please correct me if this behaviour is
> >>> changed with this patchset). What if the vendor specific driver does
> >>> not support up/down and needs only a single handler to fire a firmware
> >>> reset or firmware live reset command?
> >> In the "reload_down" handler, they would trigger the appropriate reset,
> >> and quiesce anything that needs to be done. Then on reload up, it would
> >> restore and bring up anything quiesced in the first stage.
> > Yes, I got the "reload_down" and "reload_up". Similar to the device
> > "remove" and "re-probe" respectively.
> >
> > But our requirement is a similar "ethtool reset" command, where
> > ethtool calls a single callback in driver and driver just sends a
> > firmware command for doing the reset. Once firmware receives the
> > command, it will initiate the reset of driver and firmware entities
> > asynchronously.
>
>
> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
> command to reset and all PFs drivers gets events to handle and do
> re-initialization. To fit it to the devlink reload_down and reload_up,
> I wait for the event handler to complete and it stops at driver unload
> to have the driver up by devlink reload_up. See patch 8 in this patchset.
>
Yes, I see reload_down is triggering the reset. In our driver, after
triggering the reset through a firmware command, reset is done in
another context as the driver initiates the reset only after receiving
an ASYNC event from the firmware.

Probably, we have to use reload_down() to send firmware command to
trigger reset and do nothing in reload_up. And returning from reload
does not mean that reset is complete as it is done in another context
and the driver notifies the health reporter once the reset is
complete. devlink framework may have to allow drivers to implement
reload_down only to look more clean or call reload_up only if the
driver notifies the devlink once reset is completed from another
context. Please suggest.

2020-08-03 13:53:45

by Moshe Shemesh

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option


On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
> On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
>>
>> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
>>> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
>>>>
>>>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
>>>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
>>>>>> Introduce new option on devlink reload API to enable the user to select the
>>>>>> reload level required. Complete support for all levels in mlx5.
>>>>>> The following reload levels are supported:
>>>>>> driver: Driver entities re-instantiation only.
>>>>>> fw_reset: Firmware reset and driver entities re-instantiation.
>>>>> The Name is a little confusing. I think it should be renamed to
>>>>> fw_live_reset (in which both firmware and driver entities are
>>>>> re-instantiated). For only fw_reset, the driver should not undergo
>>>>> reset (it requires a driver reload for firmware to undergo reset).
>>>>>
>>>> So, I think the differentiation here is that "live_patch" doesn't reset
>>>> anything.
>>> This seems similar to flashing the firmware and does not reset anything.
>>
>> The live patch is activating fw change without reset.
>>
>> It is not suitable for any fw change but fw gaps which don't require reset.
>>
>> I can query the fw to check if the pending image change is suitable or
>> require fw reset.
> Okay.
>>>>>> fw_live_patch: Firmware live patching only.
>>>>> This level is not clear. Is this similar to flashing??
>>>>>
>>>>> Also I have a basic query. The reload command is split into
>>>>> reload_up/reload_down handlers (Please correct me if this behaviour is
>>>>> changed with this patchset). What if the vendor specific driver does
>>>>> not support up/down and needs only a single handler to fire a firmware
>>>>> reset or firmware live reset command?
>>>> In the "reload_down" handler, they would trigger the appropriate reset,
>>>> and quiesce anything that needs to be done. Then on reload up, it would
>>>> restore and bring up anything quiesced in the first stage.
>>> Yes, I got the "reload_down" and "reload_up". Similar to the device
>>> "remove" and "re-probe" respectively.
>>>
>>> But our requirement is a similar "ethtool reset" command, where
>>> ethtool calls a single callback in driver and driver just sends a
>>> firmware command for doing the reset. Once firmware receives the
>>> command, it will initiate the reset of driver and firmware entities
>>> asynchronously.
>>
>> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
>> command to reset and all PFs drivers gets events to handle and do
>> re-initialization. To fit it to the devlink reload_down and reload_up,
>> I wait for the event handler to complete and it stops at driver unload
>> to have the driver up by devlink reload_up. See patch 8 in this patchset.
>>
> Yes, I see reload_down is triggering the reset. In our driver, after
> triggering the reset through a firmware command, reset is done in
> another context as the driver initiates the reset only after receiving
> an ASYNC event from the firmware.


Same here.

>
> Probably, we have to use reload_down() to send firmware command to
> trigger reset and do nothing in reload_up.
I had that in previous version, but its wrong to use devlink reload this
way, so I added wait with timeout for the event handling to complete
before unload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also
the event handler stops before load back to have that done by devlink
reload_up.
> And returning from reload
> does not mean that reset is complete as it is done in another context
> and the driver notifies the health reporter once the reset is
> complete. devlink framework may have to allow drivers to implement
> reload_down only to look more clean or call reload_up only if the
> driver notifies the devlink once reset is completed from another
> context. Please suggest.

2020-08-04 10:15:33

by Vasundhara Volam

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option

On Mon, Aug 3, 2020 at 7:23 PM Moshe Shemesh <[email protected]> wrote:
>
>
> On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
> > On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
> >>
> >> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
> >>> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
> >>>>
> >>>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
> >>>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
> >>>>>> Introduce new option on devlink reload API to enable the user to select the
> >>>>>> reload level required. Complete support for all levels in mlx5.
> >>>>>> The following reload levels are supported:
> >>>>>> driver: Driver entities re-instantiation only.
> >>>>>> fw_reset: Firmware reset and driver entities re-instantiation.
> >>>>> The Name is a little confusing. I think it should be renamed to
> >>>>> fw_live_reset (in which both firmware and driver entities are
> >>>>> re-instantiated). For only fw_reset, the driver should not undergo
> >>>>> reset (it requires a driver reload for firmware to undergo reset).
> >>>>>
> >>>> So, I think the differentiation here is that "live_patch" doesn't reset
> >>>> anything.
> >>> This seems similar to flashing the firmware and does not reset anything.
> >>
> >> The live patch is activating fw change without reset.
> >>
> >> It is not suitable for any fw change but fw gaps which don't require reset.
> >>
> >> I can query the fw to check if the pending image change is suitable or
> >> require fw reset.
> > Okay.
> >>>>>> fw_live_patch: Firmware live patching only.
> >>>>> This level is not clear. Is this similar to flashing??
> >>>>>
> >>>>> Also I have a basic query. The reload command is split into
> >>>>> reload_up/reload_down handlers (Please correct me if this behaviour is
> >>>>> changed with this patchset). What if the vendor specific driver does
> >>>>> not support up/down and needs only a single handler to fire a firmware
> >>>>> reset or firmware live reset command?
> >>>> In the "reload_down" handler, they would trigger the appropriate reset,
> >>>> and quiesce anything that needs to be done. Then on reload up, it would
> >>>> restore and bring up anything quiesced in the first stage.
> >>> Yes, I got the "reload_down" and "reload_up". Similar to the device
> >>> "remove" and "re-probe" respectively.
> >>>
> >>> But our requirement is a similar "ethtool reset" command, where
> >>> ethtool calls a single callback in driver and driver just sends a
> >>> firmware command for doing the reset. Once firmware receives the
> >>> command, it will initiate the reset of driver and firmware entities
> >>> asynchronously.
> >>
> >> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
> >> command to reset and all PFs drivers gets events to handle and do
> >> re-initialization. To fit it to the devlink reload_down and reload_up,
> >> I wait for the event handler to complete and it stops at driver unload
> >> to have the driver up by devlink reload_up. See patch 8 in this patchset.
> >>
> > Yes, I see reload_down is triggering the reset. In our driver, after
> > triggering the reset through a firmware command, reset is done in
> > another context as the driver initiates the reset only after receiving
> > an ASYNC event from the firmware.
>
>
> Same here.
>
> >
> > Probably, we have to use reload_down() to send firmware command to
> > trigger reset and do nothing in reload_up.
> I had that in previous version, but its wrong to use devlink reload this
> way, so I added wait with timeout for the event handling to complete
> before unload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also
> the event handler stops before load back to have that done by devlink
> reload_up.
But "devlink dev reload" will be invoked by the user only on a single
dev handler and all function drivers will be re-instantiated upon the
ASYNC event. reload_down and reload_up are invoked only the function
which the user invoked.

Take an example of a 2-port (PF0 and PF1) adapter on a single host and
with some VFs loaded on the device. User invokes "devlink dev reload"
on PF0, ASYNC event is received on 2 PFs and VFs for reset. All the
function drivers will be re-instantiated including PF0.

If we wait for some time in reload_down() of PF0 and then call load in
reload_up(), this code will be different from other function drivers.

> > And returning from reload
> > does not mean that reset is complete as it is done in another context
> > and the driver notifies the health reporter once the reset is
> > complete. devlink framework may have to allow drivers to implement
> > reload_down only to look more clean or call reload_up only if the
> > driver notifies the devlink once reset is completed from another
> > context. Please suggest.

2020-08-05 06:35:18

by Moshe Shemesh

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option


On 8/4/2020 1:13 PM, Vasundhara Volam wrote:
> On Mon, Aug 3, 2020 at 7:23 PM Moshe Shemesh <[email protected]> wrote:
>>
>> On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
>>> On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
>>>> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
>>>>> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
>>>>>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
>>>>>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
>>>>>>>> Introduce new option on devlink reload API to enable the user to select the
>>>>>>>> reload level required. Complete support for all levels in mlx5.
>>>>>>>> The following reload levels are supported:
>>>>>>>> driver: Driver entities re-instantiation only.
>>>>>>>> fw_reset: Firmware reset and driver entities re-instantiation.
>>>>>>> The Name is a little confusing. I think it should be renamed to
>>>>>>> fw_live_reset (in which both firmware and driver entities are
>>>>>>> re-instantiated). For only fw_reset, the driver should not undergo
>>>>>>> reset (it requires a driver reload for firmware to undergo reset).
>>>>>>>
>>>>>> So, I think the differentiation here is that "live_patch" doesn't reset
>>>>>> anything.
>>>>> This seems similar to flashing the firmware and does not reset anything.
>>>> The live patch is activating fw change without reset.
>>>>
>>>> It is not suitable for any fw change but fw gaps which don't require reset.
>>>>
>>>> I can query the fw to check if the pending image change is suitable or
>>>> require fw reset.
>>> Okay.
>>>>>>>> fw_live_patch: Firmware live patching only.
>>>>>>> This level is not clear. Is this similar to flashing??
>>>>>>>
>>>>>>> Also I have a basic query. The reload command is split into
>>>>>>> reload_up/reload_down handlers (Please correct me if this behaviour is
>>>>>>> changed with this patchset). What if the vendor specific driver does
>>>>>>> not support up/down and needs only a single handler to fire a firmware
>>>>>>> reset or firmware live reset command?
>>>>>> In the "reload_down" handler, they would trigger the appropriate reset,
>>>>>> and quiesce anything that needs to be done. Then on reload up, it would
>>>>>> restore and bring up anything quiesced in the first stage.
>>>>> Yes, I got the "reload_down" and "reload_up". Similar to the device
>>>>> "remove" and "re-probe" respectively.
>>>>>
>>>>> But our requirement is a similar "ethtool reset" command, where
>>>>> ethtool calls a single callback in driver and driver just sends a
>>>>> firmware command for doing the reset. Once firmware receives the
>>>>> command, it will initiate the reset of driver and firmware entities
>>>>> asynchronously.
>>>> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
>>>> command to reset and all PFs drivers gets events to handle and do
>>>> re-initialization. To fit it to the devlink reload_down and reload_up,
>>>> I wait for the event handler to complete and it stops at driver unload
>>>> to have the driver up by devlink reload_up. See patch 8 in this patchset.
>>>>
>>> Yes, I see reload_down is triggering the reset. In our driver, after
>>> triggering the reset through a firmware command, reset is done in
>>> another context as the driver initiates the reset only after receiving
>>> an ASYNC event from the firmware.
>>
>> Same here.
>>
>>> Probably, we have to use reload_down() to send firmware command to
>>> trigger reset and do nothing in reload_up.
>> I had that in previous version, but its wrong to use devlink reload this
>> way, so I added wait with timeout for the event handling to complete
>> before unload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also
>> the event handler stops before load back to have that done by devlink
>> reload_up.
> But "devlink dev reload" will be invoked by the user only on a single
> dev handler and all function drivers will be re-instantiated upon the
> ASYNC event. reload_down and reload_up are invoked only the function
> which the user invoked.
>
> Take an example of a 2-port (PF0 and PF1) adapter on a single host and
> with some VFs loaded on the device. User invokes "devlink dev reload"
> on PF0, ASYNC event is received on 2 PFs and VFs for reset. All the
> function drivers will be re-instantiated including PF0.
>
> If we wait for some time in reload_down() of PF0 and then call load in
> reload_up(), this code will be different from other function drivers.


I see your point here, but the user run devlink reload command on one
PF, in this case of fw-reset it will influence other PFs, but that's a
result of the fw-reset, the user if asked for params change or namespace
change that was for this PF.

>>> And returning from reload
>>> does not mean that reset is complete as it is done in another context
>>> and the driver notifies the health reporter once the reset is
>>> complete. devlink framework may have to allow drivers to implement
>>> reload_down only to look more clean or call reload_up only if the
>>> driver notifies the devlink once reset is completed from another
>>> context. Please suggest.

2020-08-05 06:55:59

by Vasundhara Volam

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option

On Wed, Aug 5, 2020 at 12:02 PM Moshe Shemesh <[email protected]> wrote:
>
>
> On 8/4/2020 1:13 PM, Vasundhara Volam wrote:
> > On Mon, Aug 3, 2020 at 7:23 PM Moshe Shemesh <[email protected]> wrote:
> >>
> >> On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
> >>> On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
> >>>> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
> >>>>> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
> >>>>>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
> >>>>>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
> >>>>>>>> Introduce new option on devlink reload API to enable the user to select the
> >>>>>>>> reload level required. Complete support for all levels in mlx5.
> >>>>>>>> The following reload levels are supported:
> >>>>>>>> driver: Driver entities re-instantiation only.
> >>>>>>>> fw_reset: Firmware reset and driver entities re-instantiation.
> >>>>>>> The Name is a little confusing. I think it should be renamed to
> >>>>>>> fw_live_reset (in which both firmware and driver entities are
> >>>>>>> re-instantiated). For only fw_reset, the driver should not undergo
> >>>>>>> reset (it requires a driver reload for firmware to undergo reset).
> >>>>>>>
> >>>>>> So, I think the differentiation here is that "live_patch" doesn't reset
> >>>>>> anything.
> >>>>> This seems similar to flashing the firmware and does not reset anything.
> >>>> The live patch is activating fw change without reset.
> >>>>
> >>>> It is not suitable for any fw change but fw gaps which don't require reset.
> >>>>
> >>>> I can query the fw to check if the pending image change is suitable or
> >>>> require fw reset.
> >>> Okay.
> >>>>>>>> fw_live_patch: Firmware live patching only.
> >>>>>>> This level is not clear. Is this similar to flashing??
> >>>>>>>
> >>>>>>> Also I have a basic query. The reload command is split into
> >>>>>>> reload_up/reload_down handlers (Please correct me if this behaviour is
> >>>>>>> changed with this patchset). What if the vendor specific driver does
> >>>>>>> not support up/down and needs only a single handler to fire a firmware
> >>>>>>> reset or firmware live reset command?
> >>>>>> In the "reload_down" handler, they would trigger the appropriate reset,
> >>>>>> and quiesce anything that needs to be done. Then on reload up, it would
> >>>>>> restore and bring up anything quiesced in the first stage.
> >>>>> Yes, I got the "reload_down" and "reload_up". Similar to the device
> >>>>> "remove" and "re-probe" respectively.
> >>>>>
> >>>>> But our requirement is a similar "ethtool reset" command, where
> >>>>> ethtool calls a single callback in driver and driver just sends a
> >>>>> firmware command for doing the reset. Once firmware receives the
> >>>>> command, it will initiate the reset of driver and firmware entities
> >>>>> asynchronously.
> >>>> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
> >>>> command to reset and all PFs drivers gets events to handle and do
> >>>> re-initialization. To fit it to the devlink reload_down and reload_up,
> >>>> I wait for the event handler to complete and it stops at driver unload
> >>>> to have the driver up by devlink reload_up. See patch 8 in this patchset.
> >>>>
> >>> Yes, I see reload_down is triggering the reset. In our driver, after
> >>> triggering the reset through a firmware command, reset is done in
> >>> another context as the driver initiates the reset only after receiving
> >>> an ASYNC event from the firmware.
> >>
> >> Same here.
> >>
> >>> Probably, we have to use reload_down() to send firmware command to
> >>> trigger reset and do nothing in reload_up.
> >> I had that in previous version, but its wrong to use devlink reload this
> >> way, so I added wait with timeout for the event handling to complete
> >> before unload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also
> >> the event handler stops before load back to have that done by devlink
> >> reload_up.
> > But "devlink dev reload" will be invoked by the user only on a single
> > dev handler and all function drivers will be re-instantiated upon the
> > ASYNC event. reload_down and reload_up are invoked only the function
> > which the user invoked.
> >
> > Take an example of a 2-port (PF0 and PF1) adapter on a single host and
> > with some VFs loaded on the device. User invokes "devlink dev reload"
> > on PF0, ASYNC event is received on 2 PFs and VFs for reset. All the
> > function drivers will be re-instantiated including PF0.
> >
> > If we wait for some time in reload_down() of PF0 and then call load in
> > reload_up(), this code will be different from other function drivers.
>
>
> I see your point here, but the user run devlink reload command on one
> PF, in this case of fw-reset it will influence other PFs, but that's a
> result of the fw-reset, the user if asked for params change or namespace
> change that was for this PF.
Right, if any driver is implementing only fw-reset have to leave
reload_up as an empty function.

>
> >>> And returning from reload
> >>> does not mean that reset is complete as it is done in another context
> >>> and the driver notifies the health reporter once the reset is
> >>> complete. devlink framework may have to allow drivers to implement
> >>> reload_down only to look more clean or call reload_up only if the
> >>> driver notifies the devlink once reset is completed from another
> >>> context. Please suggest.

2020-08-05 08:23:51

by Moshe Shemesh

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option


On 8/5/2020 9:55 AM, Vasundhara Volam wrote:
> On Wed, Aug 5, 2020 at 12:02 PM Moshe Shemesh <[email protected]> wrote:
>>
>> On 8/4/2020 1:13 PM, Vasundhara Volam wrote:
>>> On Mon, Aug 3, 2020 at 7:23 PM Moshe Shemesh <[email protected]> wrote:
>>>> On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
>>>>> On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
>>>>>> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
>>>>>>> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
>>>>>>>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
>>>>>>>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
>>>>>>>>>> Introduce new option on devlink reload API to enable the user to select the
>>>>>>>>>> reload level required. Complete support for all levels in mlx5.
>>>>>>>>>> The following reload levels are supported:
>>>>>>>>>> driver: Driver entities re-instantiation only.
>>>>>>>>>> fw_reset: Firmware reset and driver entities re-instantiation.
>>>>>>>>> The Name is a little confusing. I think it should be renamed to
>>>>>>>>> fw_live_reset (in which both firmware and driver entities are
>>>>>>>>> re-instantiated). For only fw_reset, the driver should not undergo
>>>>>>>>> reset (it requires a driver reload for firmware to undergo reset).
>>>>>>>>>
>>>>>>>> So, I think the differentiation here is that "live_patch" doesn't reset
>>>>>>>> anything.
>>>>>>> This seems similar to flashing the firmware and does not reset anything.
>>>>>> The live patch is activating fw change without reset.
>>>>>>
>>>>>> It is not suitable for any fw change but fw gaps which don't require reset.
>>>>>>
>>>>>> I can query the fw to check if the pending image change is suitable or
>>>>>> require fw reset.
>>>>> Okay.
>>>>>>>>>> fw_live_patch: Firmware live patching only.
>>>>>>>>> This level is not clear. Is this similar to flashing??
>>>>>>>>>
>>>>>>>>> Also I have a basic query. The reload command is split into
>>>>>>>>> reload_up/reload_down handlers (Please correct me if this behaviour is
>>>>>>>>> changed with this patchset). What if the vendor specific driver does
>>>>>>>>> not support up/down and needs only a single handler to fire a firmware
>>>>>>>>> reset or firmware live reset command?
>>>>>>>> In the "reload_down" handler, they would trigger the appropriate reset,
>>>>>>>> and quiesce anything that needs to be done. Then on reload up, it would
>>>>>>>> restore and bring up anything quiesced in the first stage.
>>>>>>> Yes, I got the "reload_down" and "reload_up". Similar to the device
>>>>>>> "remove" and "re-probe" respectively.
>>>>>>>
>>>>>>> But our requirement is a similar "ethtool reset" command, where
>>>>>>> ethtool calls a single callback in driver and driver just sends a
>>>>>>> firmware command for doing the reset. Once firmware receives the
>>>>>>> command, it will initiate the reset of driver and firmware entities
>>>>>>> asynchronously.
>>>>>> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
>>>>>> command to reset and all PFs drivers gets events to handle and do
>>>>>> re-initialization. To fit it to the devlink reload_down and reload_up,
>>>>>> I wait for the event handler to complete and it stops at driver unload
>>>>>> to have the driver up by devlink reload_up. See patch 8 in this patchset.
>>>>>>
>>>>> Yes, I see reload_down is triggering the reset. In our driver, after
>>>>> triggering the reset through a firmware command, reset is done in
>>>>> another context as the driver initiates the reset only after receiving
>>>>> an ASYNC event from the firmware.
>>>> Same here.
>>>>
>>>>> Probably, we have to use reload_down() to send firmware command to
>>>>> trigger reset and do nothing in reload_up.
>>>> I had that in previous version, but its wrong to use devlink reload this
>>>> way, so I added wait with timeout for the event handling to complete
>>>> before unload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also
>>>> the event handler stops before load back to have that done by devlink
>>>> reload_up.
>>> But "devlink dev reload" will be invoked by the user only on a single
>>> dev handler and all function drivers will be re-instantiated upon the
>>> ASYNC event. reload_down and reload_up are invoked only the function
>>> which the user invoked.
>>>
>>> Take an example of a 2-port (PF0 and PF1) adapter on a single host and
>>> with some VFs loaded on the device. User invokes "devlink dev reload"
>>> on PF0, ASYNC event is received on 2 PFs and VFs for reset. All the
>>> function drivers will be re-instantiated including PF0.
>>>
>>> If we wait for some time in reload_down() of PF0 and then call load in
>>> reload_up(), this code will be different from other function drivers.
>>
>> I see your point here, but the user run devlink reload command on one
>> PF, in this case of fw-reset it will influence other PFs, but that's a
>> result of the fw-reset, the user if asked for params change or namespace
>> change that was for this PF.
> Right, if any driver is implementing only fw-reset have to leave
> reload_up as an empty function.


No, its not only up the driver. The netns option is implemented by
devlink and its running between reload_down and reload_up.

>>>>> And returning from reload
>>>>> does not mean that reset is complete as it is done in another context
>>>>> and the driver notifies the health reporter once the reset is
>>>>> complete. devlink framework may have to allow drivers to implement
>>>>> reload_down only to look more clean or call reload_up only if the
>>>>> driver notifies the devlink once reset is completed from another
>>>>> context. Please suggest.

2020-08-12 09:37:32

by Vasundhara Volam

[permalink] [raw]
Subject: Re: [PATCH net-next RFC 00/13] Add devlink reload level option

On Wed, Aug 5, 2020 at 1:51 PM Moshe Shemesh <[email protected]> wrote:
>
>
> On 8/5/2020 9:55 AM, Vasundhara Volam wrote:
> > On Wed, Aug 5, 2020 at 12:02 PM Moshe Shemesh <[email protected]> wrote:
> >>
> >> On 8/4/2020 1:13 PM, Vasundhara Volam wrote:
> >>> On Mon, Aug 3, 2020 at 7:23 PM Moshe Shemesh <[email protected]> wrote:
> >>>> On 8/3/2020 3:47 PM, Vasundhara Volam wrote:
> >>>>> On Mon, Aug 3, 2020 at 5:47 PM Moshe Shemesh <[email protected]> wrote:
> >>>>>> On 8/3/2020 1:24 PM, Vasundhara Volam wrote:
> >>>>>>> On Tue, Jul 28, 2020 at 10:13 PM Jacob Keller <[email protected]> wrote:
> >>>>>>>> On 7/27/2020 10:25 PM, Vasundhara Volam wrote:
> >>>>>>>>> On Mon, Jul 27, 2020 at 4:36 PM Moshe Shemesh <[email protected]> wrote:
> >>>>>>>>>> Introduce new option on devlink reload API to enable the user to select the
> >>>>>>>>>> reload level required. Complete support for all levels in mlx5.
> >>>>>>>>>> The following reload levels are supported:
> >>>>>>>>>> driver: Driver entities re-instantiation only.
> >>>>>>>>>> fw_reset: Firmware reset and driver entities re-instantiation.
> >>>>>>>>> The Name is a little confusing. I think it should be renamed to
> >>>>>>>>> fw_live_reset (in which both firmware and driver entities are
> >>>>>>>>> re-instantiated). For only fw_reset, the driver should not undergo
> >>>>>>>>> reset (it requires a driver reload for firmware to undergo reset).
> >>>>>>>>>
> >>>>>>>> So, I think the differentiation here is that "live_patch" doesn't reset
> >>>>>>>> anything.
> >>>>>>> This seems similar to flashing the firmware and does not reset anything.
> >>>>>> The live patch is activating fw change without reset.
> >>>>>>
> >>>>>> It is not suitable for any fw change but fw gaps which don't require reset.
> >>>>>>
> >>>>>> I can query the fw to check if the pending image change is suitable or
> >>>>>> require fw reset.
> >>>>> Okay.
> >>>>>>>>>> fw_live_patch: Firmware live patching only.
> >>>>>>>>> This level is not clear. Is this similar to flashing??
> >>>>>>>>>
> >>>>>>>>> Also I have a basic query. The reload command is split into
> >>>>>>>>> reload_up/reload_down handlers (Please correct me if this behaviour is
> >>>>>>>>> changed with this patchset). What if the vendor specific driver does
> >>>>>>>>> not support up/down and needs only a single handler to fire a firmware
> >>>>>>>>> reset or firmware live reset command?
> >>>>>>>> In the "reload_down" handler, they would trigger the appropriate reset,
> >>>>>>>> and quiesce anything that needs to be done. Then on reload up, it would
> >>>>>>>> restore and bring up anything quiesced in the first stage.
> >>>>>>> Yes, I got the "reload_down" and "reload_up". Similar to the device
> >>>>>>> "remove" and "re-probe" respectively.
> >>>>>>>
> >>>>>>> But our requirement is a similar "ethtool reset" command, where
> >>>>>>> ethtool calls a single callback in driver and driver just sends a
> >>>>>>> firmware command for doing the reset. Once firmware receives the
> >>>>>>> command, it will initiate the reset of driver and firmware entities
> >>>>>>> asynchronously.
> >>>>>> It is similar to mlx5 case here for fw_reset. The driver triggers the fw
> >>>>>> command to reset and all PFs drivers gets events to handle and do
> >>>>>> re-initialization. To fit it to the devlink reload_down and reload_up,
> >>>>>> I wait for the event handler to complete and it stops at driver unload
> >>>>>> to have the driver up by devlink reload_up. See patch 8 in this patchset.
> >>>>>>
> >>>>> Yes, I see reload_down is triggering the reset. In our driver, after
> >>>>> triggering the reset through a firmware command, reset is done in
> >>>>> another context as the driver initiates the reset only after receiving
> >>>>> an ASYNC event from the firmware.
> >>>> Same here.
> >>>>
> >>>>> Probably, we have to use reload_down() to send firmware command to
> >>>>> trigger reset and do nothing in reload_up.
> >>>> I had that in previous version, but its wrong to use devlink reload this
> >>>> way, so I added wait with timeout for the event handling to complete
> >>>> before unload_down function ends. See mlx5_fw_wait_fw_reset_done(). Also
> >>>> the event handler stops before load back to have that done by devlink
> >>>> reload_up.
> >>> But "devlink dev reload" will be invoked by the user only on a single
> >>> dev handler and all function drivers will be re-instantiated upon the
> >>> ASYNC event. reload_down and reload_up are invoked only the function
> >>> which the user invoked.
> >>>
> >>> Take an example of a 2-port (PF0 and PF1) adapter on a single host and
> >>> with some VFs loaded on the device. User invokes "devlink dev reload"
> >>> on PF0, ASYNC event is received on 2 PFs and VFs for reset. All the
> >>> function drivers will be re-instantiated including PF0.
> >>>
> >>> If we wait for some time in reload_down() of PF0 and then call load in
> >>> reload_up(), this code will be different from other function drivers.
> >>
> >> I see your point here, but the user run devlink reload command on one
> >> PF, in this case of fw-reset it will influence other PFs, but that's a
> >> result of the fw-reset, the user if asked for params change or namespace
> >> change that was for this PF.
> > Right, if any driver is implementing only fw-reset have to leave
> > reload_up as an empty function.
>
>
> No, its not only up the driver. The netns option is implemented by
> devlink and its running between reload_down and reload_up.
What I mean is, driver will provide a reload_up handler but it will
not do anything and simply return 0.
>
> >>>>> And returning from reload
> >>>>> does not mean that reset is complete as it is done in another context
> >>>>> and the driver notifies the health reporter once the reset is
> >>>>> complete. devlink framework may have to allow drivers to implement
> >>>>> reload_down only to look more clean or call reload_up only if the
> >>>>> driver notifies the devlink once reset is completed from another
> >>>>> context. Please suggest.