2024-04-24 13:33:08

by Danielle Ratson

Subject: [PATCH net-next v5 00/10] Add ability to flash modules' firmware

CMIS compliant modules such as QSFP-DD might be running a firmware that
can be updated in a vendor-neutral way by exchanging messages between
the host and the module as described in section 7.2.2 of revision
4.0 of the CMIS standard.

According to the CMIS standard, the firmware update process is done
using a sequence of CDB commands.

CDB (Command Data Block Message Communication) reads and writes are
performed on memory map pages 9Fh-AFh according to the CMIS standard,
section 8.12 of revision 4.0.
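To make the page range concrete, here is a minimal user-space sketch that classifies a CMIS page number. It assumes that page 9Fh holds the CDB command/reply header and the Local Payload (LPL), and that pages A0h-AFh hold the Extended Payload (EPL). The helper names are hypothetical and are not part of this series.

```c
#include <stdint.h>

/* CDB window: pages 9Fh-AFh (CMIS rev 4.0, section 8.12). */
#define CDB_PAGE_FIRST	0x9F
#define CDB_PAGE_LAST	0xAF

/* Assumed split: 9Fh = header + LPL, A0h-AFh = EPL. */
enum cdb_page_kind {
	CDB_PAGE_NONE,	/* not a CDB page */
	CDB_PAGE_LPL,	/* command/reply header and local payload */
	CDB_PAGE_EPL,	/* extended payload */
};

/* Hypothetical helper: classify a page number. */
static enum cdb_page_kind cdb_classify_page(uint8_t page)
{
	if (page == CDB_PAGE_FIRST)
		return CDB_PAGE_LPL;
	if (page > CDB_PAGE_FIRST && page <= CDB_PAGE_LAST)
		return CDB_PAGE_EPL;
	return CDB_PAGE_NONE;
}

/* Number of pages in the CDB window: 0xAF - 0x9F + 1. */
static unsigned int cdb_page_count(void)
{
	return CDB_PAGE_LAST - CDB_PAGE_FIRST + 1;
}
```

The window therefore spans 17 pages, of which only the first is needed for LPL-based commands such as the ones used by this series.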

Add a pair of new ethtool messages that allow:

* User space to trigger firmware update of transceiver modules

* The kernel to notify user space about the progress of the process

The user interface is designed to be asynchronous in order to avoid RTNL
being held for too long and to allow several modules to be updated
simultaneously. The interface is designed with CMIS compliant modules in
mind, but kept generic enough to accommodate future use cases, if these
arise.
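As an illustration of the asynchronous model, the following hypothetical sketch shows per-device state that a user-space tool could keep while it listens for notifications; one instance per device lets several modules be tracked at the same time. The status names mirror ETHTOOL_MODULE_FW_FLASH_STATUS_* from this series, but the numeric values here are assumed. Notifications after the first terminal one are ignored because, in ethtool_cmis_fw_update(), a failing step sends its own error notification and the common error path then sends a second, generic one.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local mirror of the status names; numeric values are assumed. */
enum fw_flash_status {
	FW_FLASH_STATUS_STARTED = 1,
	FW_FLASH_STATUS_IN_PROGRESS,
	FW_FLASH_STATUS_COMPLETED,
	FW_FLASH_STATUS_ERROR,
};

/* Hypothetical per-device state kept by a user-space listener. */
struct fw_flash_listener {
	bool finished;			/* terminal notification seen */
	enum fw_flash_status last;	/* last status taken into account */
	uint64_t done;			/* last reported progress */
	uint64_t total;
};

/* Process one notification. Returns true while the tool should keep
 * listening for this device; anything after the first COMPLETED or
 * ERROR notification is ignored.
 */
static bool fw_flash_listener_feed(struct fw_flash_listener *l,
				   enum fw_flash_status status,
				   uint64_t done, uint64_t total)
{
	if (l->finished)
		return false;

	l->last = status;
	switch (status) {
	case FW_FLASH_STATUS_IN_PROGRESS:
		l->done = done;
		l->total = total;
		break;
	case FW_FLASH_STATUS_COMPLETED:
	case FW_FLASH_STATUS_ERROR:
		l->finished = true;
		break;
	default:
		break;
	}
	return !l->finished;
}
```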

The kernel interface that implements the firmware update using CDB
commands consists of two layers, both added under ethtool:

* The upper layer, cmis_fw_update, which is triggered from the module
layer.
* The lower layer, cmis_cdb.

In the future there might be more operations to implement using CDB
commands. Therefore, the idea is to keep the cmis_cdb interface clean
and to keep cmis_fw_update specific to the CDB commands that handle the
firmware update.

The communication between the kernel and the driver is done using two
ethtool operations that enable reading from and writing to the
transceiver module EEPROM.
The operation ethtool_ops::get_module_eeprom_by_page, which is already
implemented, is used for reading CDB replies from the EEPROM, e.g.
module settings, state, etc.
The operation ethtool_ops::set_module_eeprom_by_page, which is added in
this patchset, is used for writing CDB commands to the EEPROM, such as
Start Firmware Download, Run Firmware Image, etc.

Therefore, in order to support module flashing, a driver needs to
implement both of these operations. Flashing also requires
ethtool_ops::reset, which is invoked after the new image is committed.
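A minimal user-space sketch of the read/write requirement follows. The callback table is a simplified, hypothetical stand-in for struct ethtool_ops with reduced signatures; it only illustrates the rule that both the read and the write operation must be present.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request describing one EEPROM page access. */
struct mock_eeprom_req {
	unsigned char page;
	unsigned char offset;
	unsigned char length;
	unsigned char *data;
};

/* Simplified stand-in for the two struct ethtool_ops callbacks involved. */
struct mock_ethtool_ops {
	int (*get_module_eeprom_by_page)(const struct mock_eeprom_req *req);
	int (*set_module_eeprom_by_page)(const struct mock_eeprom_req *req);
};

/* Flashing needs both directions: writing CDB commands and reading CDB
 * replies. Mirrors the -EOPNOTSUPP the kernel returns in that case.
 */
static int mock_check_flash_support(const struct mock_ethtool_ops *ops)
{
	if (!ops->get_module_eeprom_by_page ||
	    !ops->set_module_eeprom_by_page)
		return -EOPNOTSUPP;
	return 0;
}

/* Trivial callback that can populate the table. */
static int mock_eeprom_noop(const struct mock_eeprom_req *req)
{
	(void)req;
	return 0;
}
```

For instance, a table that only provides get_module_eeprom_by_page is rejected with -EOPNOTSUPP, which matches the first check in __module_flash_fw_schedule() in patch #9.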

Patchset overview:
Patch #1-#2: Implement the EEPROM writing in mlxsw.
Patch #3: Define the interface between the kernel and user space.
Patch #4: Add ability to notify the flashing firmware progress.
Patch #5: Add firmware flashing in progress flag.
Patch #6: Add extended compliance codes.
Patch #7: Add the cdb layer.
Patch #8: Add the fw_update layer.
Patch #9: Add ability to flash transceiver modules' firmware.
Patch #10: Veto problematic scenarios.

v5:
* Drop all the inline keywords in cmis_cdb.c.
* Modify tools/net/ynl/Makefile.deps so the ynl file will include the
ethtool.h changes.
* u64 -> uint for 'total' and 'done' attrs.
* Translate the enum from ethtool_netlink.h to YAML.

Danielle Ratson (8):
ethtool: Add an interface for flashing transceiver modules' firmware
ethtool: Add flashing transceiver modules' firmware notifications
ability
include: netdevice: Add module firmware flashing in progress flag to
net_device
net: sfp: Add more extended compliance codes
ethtool: cmis_cdb: Add a layer for supporting CDB commands
ethtool: cmis_fw_update: add a layer for supporting firmware update
using CDB
ethtool: Add ability to flash transceiver modules' firmware
ethtool: Veto some operations during firmware flashing process

Ido Schimmel (2):
ethtool: Add ethtool operation to write to a transceiver module EEPROM
mlxsw: Implement ethtool operation to write to a transceiver module
EEPROM

Documentation/netlink/specs/ethtool.yaml | 55 ++
Documentation/networking/ethtool-netlink.rst | 62 ++
.../net/ethernet/mellanox/mlxsw/core_env.c | 57 ++
.../net/ethernet/mellanox/mlxsw/core_env.h | 6 +
drivers/net/ethernet/mellanox/mlxsw/minimal.c | 15 +
.../mellanox/mlxsw/spectrum_ethtool.c | 15 +
include/linux/ethtool.h | 20 +-
include/linux/netdevice.h | 4 +-
include/linux/sfp.h | 6 +
include/uapi/linux/ethtool.h | 18 +
include/uapi/linux/ethtool_netlink.h | 20 +
net/ethtool/Makefile | 2 +-
net/ethtool/cmis.h | 123 ++++
net/ethtool/cmis_cdb.c | 577 ++++++++++++++++++
net/ethtool/cmis_fw_update.c | 397 ++++++++++++
net/ethtool/eeprom.c | 6 +
net/ethtool/ioctl.c | 12 +
net/ethtool/module.c | 286 +++++++++
net/ethtool/module_fw.h | 38 ++
net/ethtool/netlink.c | 37 +-
net/ethtool/netlink.h | 2 +
tools/net/ynl/Makefile.deps | 3 +-
22 files changed, 1749 insertions(+), 12 deletions(-)
create mode 100644 net/ethtool/cmis.h
create mode 100644 net/ethtool/cmis_cdb.c
create mode 100644 net/ethtool/cmis_fw_update.c
create mode 100644 net/ethtool/module_fw.h

--
2.43.0



2024-04-24 13:33:18

by Danielle Ratson

Subject: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

Add the ability to notify user space about the progress of flashing
modules' firmware, by implementing the notification interface between
the kernel and user space.
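The error notification built by ethnl_module_fw_flash_ntf_err() joins an error message and an optional sub-error message into a fixed 120-byte buffer. A user-space sketch of the same formatting with snprintf(), which bounds the write and reports the untruncated length, might look as follows; the function name is hypothetical.

```c
#include <stdio.h>

#define STATUS_MSG_LEN 120	/* size of status_msg[] in the kernel code */

/* Build "<err_msg>, <sub_err_msg>." or "<err_msg>." into buf. snprintf()
 * never writes more than len bytes and always NUL-terminates when len > 0,
 * so a long message is truncated instead of overflowing buf. The return
 * value is the untruncated length; a value >= len signals truncation.
 */
static int compose_status_msg(char *buf, size_t len, const char *err_msg,
			      const char *sub_err_msg)
{
	if (sub_err_msg)
		return snprintf(buf, len, "%s, %s.", err_msg, sub_err_msg);
	return snprintf(buf, len, "%s.", err_msg);
}
```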

Signed-off-by: Danielle Ratson <[email protected]>
---

Notes:
v2:
* Increase err_msg length.

net/ethtool/module.c | 83 +++++++++++++++++++++++++++++++++++++++++
net/ethtool/module_fw.h | 10 +++++
2 files changed, 93 insertions(+)
create mode 100644 net/ethtool/module_fw.h

diff --git a/net/ethtool/module.c b/net/ethtool/module.c
index ceb575efc290..114a2ec986fe 100644
--- a/net/ethtool/module.c
+++ b/net/ethtool/module.c
@@ -5,6 +5,7 @@
#include "netlink.h"
#include "common.h"
#include "bitset.h"
+#include "module_fw.h"

struct module_req_info {
struct ethnl_req_info base;
@@ -158,3 +159,85 @@ const struct ethnl_request_ops ethnl_module_request_ops = {
.set = ethnl_set_module,
.set_ntf_cmd = ETHTOOL_MSG_MODULE_NTF,
};
+
+/* MODULE_FW_FLASH_NTF */
+
+static void
+ethnl_module_fw_flash_ntf(struct net_device *dev,
+ enum ethtool_module_fw_flash_status status,
+ const char *status_msg, u64 done, u64 total)
+{
+ struct sk_buff *skb;
+ void *hdr;
+ int ret;
+
+ skb = genlmsg_new(NLMSG_GOODSIZE, GFP_KERNEL);
+ if (!skb)
+ return;
+
+ hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
+ if (!hdr)
+ goto err_skb;
+
+ ret = ethnl_fill_reply_header(skb, dev,
+ ETHTOOL_A_MODULE_FW_FLASH_HEADER);
+ if (ret < 0)
+ goto err_skb;
+
+ if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
+ goto err_skb;
+
+ if (status_msg &&
+ nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
+ status_msg))
+ goto err_skb;
+
+ if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
+ ETHTOOL_A_MODULE_FW_FLASH_PAD))
+ goto err_skb;
+
+ if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
+ ETHTOOL_A_MODULE_FW_FLASH_PAD))
+ goto err_skb;
+
+ genlmsg_end(skb, hdr);
+ ethnl_multicast(skb, dev);
+ return;
+
+err_skb:
+ nlmsg_free(skb);
+}
+
+void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
+ char *err_msg, char *sub_err_msg)
+{
+ char status_msg[120];
+
+ if (sub_err_msg)
+ snprintf(status_msg, sizeof(status_msg), "%s, %s.", err_msg, sub_err_msg);
+ else
+ snprintf(status_msg, sizeof(status_msg), "%s.", err_msg);
+
+ ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
+ status_msg, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_start(struct net_device *dev)
+{
+ ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_STARTED,
+ NULL, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_complete(struct net_device *dev)
+{
+ ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_COMPLETED,
+ NULL, 0, 0);
+}
+
+void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
+ u64 total)
+{
+ ethnl_module_fw_flash_ntf(dev,
+ ETHTOOL_MODULE_FW_FLASH_STATUS_IN_PROGRESS,
+ NULL, done, total);
+}
diff --git a/net/ethtool/module_fw.h b/net/ethtool/module_fw.h
new file mode 100644
index 000000000000..e40eae442741
--- /dev/null
+++ b/net/ethtool/module_fw.h
@@ -0,0 +1,10 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#include <uapi/linux/ethtool.h>
+
+void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
+ char *err_msg, char *sub_err_msg);
+void ethnl_module_fw_flash_ntf_start(struct net_device *dev);
+void ethnl_module_fw_flash_ntf_complete(struct net_device *dev);
+void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
+ u64 total);
--
2.43.0


2024-04-24 13:33:29

by Danielle Ratson

Subject: [PATCH net-next v5 05/10] include: netdevice: Add module firmware flashing in progress flag to net_device

Some operations cannot be performed during the firmware flashing
process and will be vetoed later in the patchset.
For example, flashing firmware on a device which is already in a flashing
process should be forbidden.

In order to veto those scenarios, add a flag in 'struct net_device' that
indicates when a firmware flash is taking place on the module.
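A minimal sketch of how such a flag can gate operations, using a hypothetical trimmed-down device structure. The helper names and the -EBUSY error code are assumptions for illustration only; the actual vetoes are added in patch 10.

```c
#include <errno.h>

/* Hypothetical stand-in for struct net_device, keeping only the new
 * one-bit flag added by this patch.
 */
struct mock_net_device {
	unsigned module_fw_flash_in_progress:1;
};

/* Refuse a conflicting operation while a flash is running. */
static int mock_veto_if_flashing(const struct mock_net_device *dev)
{
	if (dev->module_fw_flash_in_progress)
		return -EBUSY;
	return 0;
}

/* Claim the device for flashing; fails if a flash is already running. */
static int mock_start_flash(struct mock_net_device *dev)
{
	int err = mock_veto_if_flashing(dev);

	if (err)
		return err;
	dev->module_fw_flash_in_progress = 1;
	return 0;
}
```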

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
---
include/linux/netdevice.h | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f849e7d110ed..68a18911c8cf 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1989,6 +1989,8 @@ enum netdev_reg_state {
*
* @threaded: napi threaded mode is enabled
*
+ * @module_fw_flash_in_progress: Module firmware flashing is in progress.
+ *
* @net_notifier_list: List of per-net netdev notifier block
* that follow this device when it is moved
* to another network namespace.
@@ -2372,7 +2374,7 @@ struct net_device {
bool proto_down;
unsigned wol_enabled:1;
unsigned threaded:1;
-
+ unsigned module_fw_flash_in_progress:1;
struct list_head net_notifier_list;

#if IS_ENABLED(CONFIG_MACSEC)
--
2.43.0


2024-04-24 13:33:41

by Danielle Ratson

Subject: [PATCH net-next v5 06/10] net: sfp: Add more extended compliance codes

SFF-8024 defines various constants that are reused in several SFF
SFP-related specifications.

Add the SFF-8024 Identifier Values of CMIS compliant modules and use them
in a later patch to determine which firmware flashing work to schedule
for a module.
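The following sketch mirrors the switch statement in module_flash_fw_work_init() from patch 9: it decides from the SFF-8024 Identifier Value (byte 0 of the module EEPROM) whether a module is CMIS compliant. The enum is a local copy of the values added here, so the example builds outside the kernel; the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the SFF-8024 Identifier Values added by this patch. */
enum {
	ID_QSFP_DD		= 0x18,
	ID_OSFP			= 0x19,
	ID_DSFP			= 0x1B,
	ID_QSFP_PLUS_CMIS	= 0x1E,
	ID_SFP_DD_CMIS		= 0x1F,
	ID_SFP_PLUS_CMIS	= 0x20,
};

/* True if the identifier denotes a CMIS module, i.e. one that the
 * CDB-based firmware update can handle.
 */
static bool sff8024_id_is_cmis(uint8_t phys_id)
{
	switch (phys_id) {
	case ID_QSFP_DD:
	case ID_OSFP:
	case ID_DSFP:
	case ID_QSFP_PLUS_CMIS:
	case ID_SFP_DD_CMIS:
	case ID_SFP_PLUS_CMIS:
		return true;
	default:
		return false;
	}
}
```

A QSFP28 module (0x11, SFF-8636) is not CMIS compliant, so flashing it is rejected with -EOPNOTSUPP.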

Signed-off-by: Danielle Ratson <[email protected]>
Reviewed-by: Petr Machata <[email protected]>
Reviewed-by: Russell King (Oracle) <[email protected]>
---
include/linux/sfp.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/include/linux/sfp.h b/include/linux/sfp.h
index 55c0ab17c9e2..46c145fa855d 100644
--- a/include/linux/sfp.h
+++ b/include/linux/sfp.h
@@ -284,6 +284,12 @@ enum {
SFF8024_ID_QSFP_8438 = 0x0c,
SFF8024_ID_QSFP_8436_8636 = 0x0d,
SFF8024_ID_QSFP28_8636 = 0x11,
+ SFF8024_ID_QSFP_DD = 0x18,
+ SFF8024_ID_OSFP = 0x19,
+ SFF8024_ID_DSFP = 0x1B,
+ SFF8024_ID_QSFP_PLUS_CMIS = 0x1E,
+ SFF8024_ID_SFP_DD_CMIS = 0x1F,
+ SFF8024_ID_SFP_PLUS_CMIS = 0x20,

SFF8024_ENCODING_UNSPEC = 0x00,
SFF8024_ENCODING_8B10B = 0x01,
--
2.43.0


2024-04-24 13:34:26

by Danielle Ratson

Subject: [PATCH net-next v5 08/10] ethtool: cmis_fw_update: add a layer for supporting firmware update using CDB

According to the CMIS standard, the firmware update process is done
using a sequence of CDB commands.

Implement a work item, triggered from the module layer in the next
patch, that initiates and executes all the CDB commands in sequence to
eventually complete the firmware update process.
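The order in which ethtool_cmis_fw_update() issues CDB commands can be summarized with a small user-space model. The recorder standing in for the cmis_cdb layer, its 64-entry capacity and the function names are hypothetical; the command IDs are copied from the cmis.h hunk of this patch.

```c
#include <stddef.h>
#include <stdint.h>

/* CDB command IDs from cmis.h (local copies). */
enum {
	CDB_CMD_FW_MNG_FEATURES		= 0x0041,
	CDB_CMD_START_FW_DOWNLOAD	= 0x0101,
	CDB_CMD_WRITE_FW_BLOCK_LPL	= 0x0103,
	CDB_CMD_COMPLETE_FW_DOWNLOAD	= 0x0107,
	CDB_CMD_RUN_FW_IMAGE		= 0x0109,
	CDB_CMD_COMMIT_FW_IMAGE		= 0x010A,
};

/* Hypothetical recorder standing in for the lower (cmis_cdb) layer. */
struct cdb_log {
	uint16_t cmds[64];
	size_t n;
};

static int cdb_exec(struct cdb_log *rec, uint16_t cmd)
{
	if (rec->n == sizeof(rec->cmds) / sizeof(rec->cmds[0]))
		return -1;	/* recorder full: simulate a command failure */
	rec->cmds[rec->n++] = cmd;
	return 0;
}

/* Issue the commands in order, with nr_blocks Write Firmware Block LPL
 * commands, stopping at the first error as the real work does.
 */
static int fw_update_sequence(struct cdb_log *rec, size_t nr_blocks)
{
	size_t i;
	int err;

	err = cdb_exec(rec, CDB_CMD_FW_MNG_FEATURES);
	if (err)
		return err;
	err = cdb_exec(rec, CDB_CMD_START_FW_DOWNLOAD);
	if (err)
		return err;
	for (i = 0; i < nr_blocks; i++) {
		err = cdb_exec(rec, CDB_CMD_WRITE_FW_BLOCK_LPL);
		if (err)
			return err;
	}
	err = cdb_exec(rec, CDB_CMD_COMPLETE_FW_DOWNLOAD);
	if (err)
		return err;
	err = cdb_exec(rec, CDB_CMD_RUN_FW_IMAGE);
	if (err)
		return err;
	return cdb_exec(rec, CDB_CMD_COMMIT_FW_IMAGE);
}
```

The real work additionally re-initializes the CDB instance between Run Firmware Image and Commit Firmware Image, and finishes with ethtool_ops::reset, which is not a CDB command.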

This flashing process includes writing the firmware image, running the
new firmware image and committing it after testing, so that it will run
upon reset.
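Writing the image is the bulk of the work. In cmis_fw_update_write_image(), the bytes that follow the vendor data already sent with Start Firmware Download are split into blocks, each carried in an LPL together with a 4-byte block address. A sketch of the resulting block count, assuming ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH is 120 bytes and taking the module's maximum LPL length as a parameter (the function name is hypothetical):

```c
#include <stdint.h>

/* Assumed value of ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH. */
#define LPL_MAX_PL_LENGTH	120
/* Size of the __be32 block_address field that precedes each block. */
#define BLOCK_ADDR_LEN		4

/* Number of Write Firmware Block LPL commands for an image of image_size
 * bytes whose first 'start' bytes were already sent as vendor data in the
 * Start Firmware Download command. max_lpl_len is the largest LPL the
 * module accepts; it is capped at LPL_MAX_PL_LENGTH as in the kernel code.
 */
static uint32_t fw_nr_blocks(uint32_t image_size, uint32_t start,
			     uint32_t max_lpl_len)
{
	uint32_t max_block_size, remaining;

	if (max_lpl_len > LPL_MAX_PL_LENGTH)
		max_lpl_len = LPL_MAX_PL_LENGTH;
	if (max_lpl_len <= BLOCK_ADDR_LEN || image_size <= start)
		return 0;

	max_block_size = max_lpl_len - BLOCK_ADDR_LEN;
	remaining = image_size - start;
	/* Ceiling division; the last block may be shorter than the others. */
	return remaining / max_block_size + (remaining % max_block_size != 0);
}
```

For example, a 1000-byte image with 112 bytes of vendor data and a 120-byte LPL needs 8 blocks of up to 116 bytes, and the i-th block is sent with block address i * 116 (the "offset - start" value in the kernel loop).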

This work will also notify user space about the progress of the firmware
update process.

Signed-off-by: Danielle Ratson <[email protected]>
---

Notes:
v2:
* Decrease msleep before querying completion flag in Write FW
Image command.
* Change the condition for failing when LPL is not supported.
* Re-write cmis_fw_update_write_image().

net/ethtool/Makefile | 2 +-
net/ethtool/cmis.h | 7 +
net/ethtool/cmis_fw_update.c | 397 +++++++++++++++++++++++++++++++++++
net/ethtool/module_fw.h | 18 ++
4 files changed, 423 insertions(+), 1 deletion(-)
create mode 100644 net/ethtool/cmis_fw_update.c

diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 38806b3ecf83..9a190635fe95 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -8,4 +8,4 @@ ethtool_nl-y := netlink.o bitset.o strset.o linkinfo.o linkmodes.o rss.o \
linkstate.o debug.o wol.o features.o privflags.o rings.o \
channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
tunnels.o fec.o eeprom.o stats.o phc_vclocks.o mm.o \
- module.o cmis_cdb.o pse-pd.o plca.o mm.o
+ module.o cmis_fw_update.o cmis_cdb.o pse-pd.o plca.o mm.o
diff --git a/net/ethtool/cmis.h b/net/ethtool/cmis.h
index 6c8e88b8ade2..394308e0c942 100644
--- a/net/ethtool/cmis.h
+++ b/net/ethtool/cmis.h
@@ -20,6 +20,12 @@ struct ethtool_cmis_cdb {
enum ethtool_cmis_cdb_cmd_id {
ETHTOOL_CMIS_CDB_CMD_QUERY_STATUS = 0x0000,
ETHTOOL_CMIS_CDB_CMD_MODULE_FEATURES = 0x0040,
+ ETHTOOL_CMIS_CDB_CMD_FW_MANAGMENT_FEATURES = 0x0041,
+ ETHTOOL_CMIS_CDB_CMD_START_FW_DOWNLOAD = 0x0101,
+ ETHTOOL_CMIS_CDB_CMD_WRITE_FW_BLOCK_LPL = 0x0103,
+ ETHTOOL_CMIS_CDB_CMD_COMPLETE_FW_DOWNLOAD = 0x0107,
+ ETHTOOL_CMIS_CDB_CMD_RUN_FW_IMAGE = 0x0109,
+ ETHTOOL_CMIS_CDB_CMD_COMMIT_FW_IMAGE = 0x010A,
};

/**
@@ -47,6 +53,7 @@ struct ethtool_cmis_cdb_request {

#define CDB_F_COMPLETION_VALID BIT(0)
#define CDB_F_STATUS_VALID BIT(1)
+#define CDB_F_MODULE_STATE_VALID BIT(2)

/**
* struct ethtool_cmis_cdb_cmd_args - CDB commands execution arguments
diff --git a/net/ethtool/cmis_fw_update.c b/net/ethtool/cmis_fw_update.c
new file mode 100644
index 000000000000..a23ff2c86a8a
--- /dev/null
+++ b/net/ethtool/cmis_fw_update.c
@@ -0,0 +1,397 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/ethtool.h>
+#include <linux/firmware.h>
+
+#include "common.h"
+#include "module_fw.h"
+#include "cmis.h"
+
+struct cmis_fw_update_fw_mng_features {
+ u8 start_cmd_payload_size;
+ u16 max_duration_start;
+ u16 max_duration_write;
+ u16 max_duration_complete;
+};
+
+/* See section 9.4.2 "CMD 0041h: Firmware Management Features" in CMIS standard
+ * revision 5.2.
+ * struct cmis_cdb_fw_mng_features_rpl is a structured layout of the flat
+ * array, ethtool_cmis_cdb_rpl::payload.
+ */
+struct cmis_cdb_fw_mng_features_rpl {
+ u8 resv1;
+ u8 resv2;
+ u8 start_cmd_payload_size;
+ u8 resv3;
+ u8 read_write_len_ext;
+ u8 write_mechanism;
+ u8 resv4;
+ u8 resv5;
+ __be16 max_duration_start;
+ __be16 resv6;
+ __be16 max_duration_write;
+ __be16 max_duration_complete;
+ __be16 resv7;
+};
+
+#define CMIS_CDB_FW_WRITE_MECHANISM_LPL 0x01
+
+static int
+cmis_fw_update_fw_mng_features_get(struct ethtool_cmis_cdb *cdb,
+ struct net_device *dev,
+ struct cmis_fw_update_fw_mng_features *fw_mng)
+{
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ struct cmis_cdb_fw_mng_features_rpl *rpl;
+ u8 flags = CDB_F_STATUS_VALID;
+ int err;
+
+ ethtool_cmis_cdb_check_completion_flag(cdb->cmis_rev, &flags);
+ ethtool_cmis_cdb_compose_args(&args,
+ ETHTOOL_CMIS_CDB_CMD_FW_MANAGMENT_FEATURES,
+ NULL, 0, cdb->max_completion_time,
+ cdb->read_write_len_ext, 1000,
+ sizeof(*rpl), flags);
+
+ err = ethtool_cmis_cdb_execute_cmd(dev, &args);
+ if (err < 0) {
+ ethnl_module_fw_flash_ntf_err(dev,
+ "FW Management Features command failed",
+ args.err_msg);
+ return err;
+ }
+
+ rpl = (struct cmis_cdb_fw_mng_features_rpl *)args.req.payload;
+ if (!(rpl->write_mechanism & CMIS_CDB_FW_WRITE_MECHANISM_LPL)) {
+ ethnl_module_fw_flash_ntf_err(dev,
+ "Write LPL is not supported",
+ NULL);
+ return -EOPNOTSUPP;
+ }
+
+ /* Above, we used read_write_len_ext that we got from CDB
+ * advertisement. Update it with the value that we got from the Firmware
+ * Management Features query, which is specific to Firmware Management
+ * Commands (IDs 0100h-01FFh).
+ */
+ cdb->read_write_len_ext = rpl->read_write_len_ext;
+ fw_mng->start_cmd_payload_size = rpl->start_cmd_payload_size;
+ fw_mng->max_duration_start = be16_to_cpu(rpl->max_duration_start);
+ fw_mng->max_duration_write = be16_to_cpu(rpl->max_duration_write);
+ fw_mng->max_duration_complete = be16_to_cpu(rpl->max_duration_complete);
+
+ return 0;
+}
+
+/* See section 9.7.2 "CMD 0101h: Start Firmware Download" in CMIS standard
+ * revision 5.2.
+ * struct cmis_cdb_start_fw_download_pl is a structured layout of the
+ * flat array, ethtool_cmis_cdb_request::payload.
+ */
+struct cmis_cdb_start_fw_download_pl {
+ __struct_group(cmis_cdb_start_fw_download_pl_h, head, /* no attrs */,
+ __be32 image_size;
+ __be32 resv1;
+ );
+ u8 vendor_data[ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH -
+ sizeof(struct cmis_cdb_start_fw_download_pl_h)];
+};
+
+static int
+cmis_fw_update_start_download(struct ethtool_cmis_cdb *cdb,
+ struct ethtool_module_fw_flash *module_fw,
+ struct cmis_fw_update_fw_mng_features *fw_mng)
+{
+ u8 vendor_data_size = fw_mng->start_cmd_payload_size;
+ struct cmis_cdb_start_fw_download_pl pl = {};
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ u8 lpl_len;
+ int err;
+
+ pl.image_size = cpu_to_be32(module_fw->fw->size);
+ memcpy(pl.vendor_data, module_fw->fw->data, vendor_data_size);
+
+ lpl_len = offsetof(struct cmis_cdb_start_fw_download_pl,
+ vendor_data[vendor_data_size]);
+
+ ethtool_cmis_cdb_compose_args(&args,
+ ETHTOOL_CMIS_CDB_CMD_START_FW_DOWNLOAD,
+ (u8 *)&pl, lpl_len,
+ fw_mng->max_duration_start,
+ cdb->read_write_len_ext, 1000, 0,
+ CDB_F_COMPLETION_VALID | CDB_F_STATUS_VALID);
+
+ err = ethtool_cmis_cdb_execute_cmd(module_fw->dev, &args);
+ if (err < 0)
+ ethnl_module_fw_flash_ntf_err(module_fw->dev,
+ "Start FW download command failed",
+ args.err_msg);
+
+ return err;
+}
+
+/* See section 9.7.4 "CMD 0103h: Write Firmware Block LPL" in CMIS standard
+ * revision 5.2.
+ * struct cmis_cdb_write_fw_block_lpl_pl is a structured layout of the
+ * flat array, ethtool_cmis_cdb_request::payload.
+ */
+struct cmis_cdb_write_fw_block_lpl_pl {
+ __be32 block_address;
+ u8 fw_block[ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH - sizeof(__be32)];
+};
+
+static int
+cmis_fw_update_write_image(struct ethtool_cmis_cdb *cdb,
+ struct ethtool_module_fw_flash *module_fw,
+ struct cmis_fw_update_fw_mng_features *fw_mng)
+{
+ u8 start = fw_mng->start_cmd_payload_size;
+ u32 image_size = module_fw->fw->size;
+ u32 offset, max_block_size, max_lpl_len;
+ int err;
+
+ max_lpl_len = min_t(u32,
+ ethtool_cmis_get_max_payload_size(cdb->read_write_len_ext),
+ ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH);
+ max_block_size =
+ max_lpl_len - sizeof_field(struct cmis_cdb_write_fw_block_lpl_pl,
+ block_address);
+
+ for (offset = start; offset < image_size; offset += max_block_size) {
+ struct cmis_cdb_write_fw_block_lpl_pl pl = {
+ .block_address = cpu_to_be32(offset - start),
+ };
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ u32 block_size, lpl_len;
+
+ ethnl_module_fw_flash_ntf_in_progress(module_fw->dev,
+ offset - start,
+ image_size);
+ block_size = min_t(u32, max_block_size, image_size - offset);
+ memcpy(pl.fw_block, &module_fw->fw->data[offset], block_size);
+ lpl_len = block_size +
+ sizeof_field(struct cmis_cdb_write_fw_block_lpl_pl,
+ block_address);
+
+ ethtool_cmis_cdb_compose_args(&args,
+ ETHTOOL_CMIS_CDB_CMD_WRITE_FW_BLOCK_LPL,
+ (u8 *)&pl, lpl_len,
+ fw_mng->max_duration_write,
+ cdb->read_write_len_ext, 1, 0,
+ CDB_F_COMPLETION_VALID | CDB_F_STATUS_VALID);
+
+ err = ethtool_cmis_cdb_execute_cmd(module_fw->dev, &args);
+ if (err < 0) {
+ ethnl_module_fw_flash_ntf_err(module_fw->dev,
+ "Write FW block LPL command failed",
+ args.err_msg);
+ return err;
+ }
+ }
+
+ return 0;
+}
+
+static int
+cmis_fw_update_complete_download(struct ethtool_cmis_cdb *cdb,
+ struct net_device *dev,
+ struct cmis_fw_update_fw_mng_features *fw_mng)
+{
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ int err;
+
+ ethtool_cmis_cdb_compose_args(&args,
+ ETHTOOL_CMIS_CDB_CMD_COMPLETE_FW_DOWNLOAD,
+ NULL, 0, fw_mng->max_duration_complete,
+ cdb->read_write_len_ext, 1000, 0,
+ CDB_F_COMPLETION_VALID | CDB_F_STATUS_VALID);
+
+ err = ethtool_cmis_cdb_execute_cmd(dev, &args);
+ if (err < 0)
+ ethnl_module_fw_flash_ntf_err(dev,
+ "Complete FW download command failed",
+ args.err_msg);
+
+ return err;
+}
+
+static int
+cmis_fw_update_download_image(struct ethtool_cmis_cdb *cdb,
+ struct ethtool_module_fw_flash *module_fw,
+ struct cmis_fw_update_fw_mng_features *fw_mng)
+{
+ int err;
+
+ err = cmis_fw_update_start_download(cdb, module_fw, fw_mng);
+ if (err < 0)
+ return err;
+
+ err = cmis_fw_update_write_image(cdb, module_fw, fw_mng);
+ if (err < 0)
+ return err;
+
+ err = cmis_fw_update_complete_download(cdb, module_fw->dev, fw_mng);
+ if (err < 0)
+ return err;
+
+ return 0;
+}
+
+enum {
+ CMIS_MODULE_LOW_PWR = 1,
+ CMIS_MODULE_READY = 3,
+};
+
+static bool module_is_ready(u8 data)
+{
+ u8 state = (data >> 1) & 7;
+
+ return state == CMIS_MODULE_READY || state == CMIS_MODULE_LOW_PWR;
+}
+
+#define CMIS_MODULE_READY_MAX_DURATION_USEC 1000
+#define CMIS_MODULE_STATE_OFFSET 3
+
+static int
+cmis_fw_update_wait_for_module_state(struct ethtool_module_fw_flash *module_fw,
+ u8 flags)
+{
+ u8 state;
+
+ return ethtool_cmis_wait_for_cond(module_fw->dev, flags,
+ CDB_F_MODULE_STATE_VALID,
+ CMIS_MODULE_READY_MAX_DURATION_USEC,
+ CMIS_MODULE_STATE_OFFSET,
+ module_is_ready, NULL, &state);
+}
+
+/* See section 9.7.10 "CMD 0109h: Run Firmware Image" in CMIS standard
+ * revision 5.2.
+ * struct cmis_cdb_run_fw_image_pl is a structured layout of the flat
+ * array, ethtool_cmis_cdb_request::payload.
+ */
+struct cmis_cdb_run_fw_image_pl {
+ u8 resv1;
+ u8 image_to_run;
+ u16 delay_to_reset;
+};
+
+static int cmis_fw_update_run_image(struct ethtool_cmis_cdb *cdb,
+ struct ethtool_module_fw_flash *module_fw)
+{
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ struct cmis_cdb_run_fw_image_pl pl = {0};
+ int err;
+
+ ethtool_cmis_cdb_compose_args(&args, ETHTOOL_CMIS_CDB_CMD_RUN_FW_IMAGE,
+ (u8 *)&pl, sizeof(pl),
+ cdb->max_completion_time,
+ cdb->read_write_len_ext, 1000, 0,
+ CDB_F_MODULE_STATE_VALID);
+
+ err = ethtool_cmis_cdb_execute_cmd(module_fw->dev, &args);
+ if (err < 0) {
+ ethnl_module_fw_flash_ntf_err(module_fw->dev,
+ "Run image command failed",
+ args.err_msg);
+ return err;
+ }
+
+ err = cmis_fw_update_wait_for_module_state(module_fw, args.flags);
+ if (err < 0)
+ ethnl_module_fw_flash_ntf_err(module_fw->dev,
+ "Module is not ready on time after reset",
+ NULL);
+
+ return err;
+}
+
+static int
+cmis_fw_update_commit_image(struct ethtool_cmis_cdb *cdb,
+ struct ethtool_module_fw_flash *module_fw)
+{
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ int err;
+
+ ethtool_cmis_cdb_compose_args(&args,
+ ETHTOOL_CMIS_CDB_CMD_COMMIT_FW_IMAGE,
+ NULL, 0, cdb->max_completion_time,
+ cdb->read_write_len_ext, 1000, 0,
+ CDB_F_COMPLETION_VALID | CDB_F_STATUS_VALID);
+
+ err = ethtool_cmis_cdb_execute_cmd(module_fw->dev, &args);
+ if (err < 0)
+ ethnl_module_fw_flash_ntf_err(module_fw->dev,
+ "Commit image command failed",
+ args.err_msg);
+
+ return err;
+}
+
+static int cmis_fw_update_reset(struct net_device *dev)
+{
+ __u32 reset_data = ETH_RESET_PHY;
+
+ return dev->ethtool_ops->reset(dev, &reset_data);
+}
+
+void ethtool_cmis_fw_update(struct work_struct *work)
+{
+ struct cmis_fw_update_fw_mng_features fw_mng = {0};
+ struct ethtool_module_fw_flash *module_fw;
+ struct ethtool_cmis_cdb *cdb;
+ int err;
+
+ module_fw = container_of(work, struct ethtool_module_fw_flash, work);
+
+ cdb = ethtool_cmis_cdb_init(module_fw->dev, &module_fw->params);
+ if (IS_ERR(cdb))
+ goto err_send_ntf;
+
+ ethnl_module_fw_flash_ntf_start(module_fw->dev);
+
+ err = cmis_fw_update_fw_mng_features_get(cdb, module_fw->dev, &fw_mng);
+ if (err < 0)
+ goto err_cdb_fini;
+
+ err = cmis_fw_update_download_image(cdb, module_fw, &fw_mng);
+ if (err < 0)
+ goto err_cdb_fini;
+
+ err = cmis_fw_update_run_image(cdb, module_fw);
+ if (err < 0)
+ goto err_cdb_fini;
+
+ /* The CDB command "Run Firmware Image" resets the firmware, so the new
+ * one might have different settings.
+ * Free the old CDB instance, and init a new one.
+ */
+ ethtool_cmis_cdb_fini(cdb);
+
+ cdb = ethtool_cmis_cdb_init(module_fw->dev, &module_fw->params);
+ if (IS_ERR(cdb))
+ goto err_send_ntf;
+
+ err = cmis_fw_update_commit_image(cdb, module_fw);
+ if (err < 0)
+ goto err_cdb_fini;
+
+ err = cmis_fw_update_reset(module_fw->dev);
+ if (err < 0)
+ goto err_cdb_fini;
+
+ ethnl_module_fw_flash_ntf_complete(module_fw->dev);
+ ethtool_cmis_cdb_fini(cdb);
+ goto out;
+
+err_cdb_fini:
+ ethtool_cmis_cdb_fini(cdb);
+err_send_ntf:
+ ethnl_module_fw_flash_ntf_err(module_fw->dev, NULL, NULL);
+out:
+ module_fw->dev->module_fw_flash_in_progress = false;
+ netdev_put(module_fw->dev, &module_fw->dev_tracker);
+ release_firmware(module_fw->fw);
+ kfree(module_fw);
+}
diff --git a/net/ethtool/module_fw.h b/net/ethtool/module_fw.h
index 96da7a8175f2..9af5b15efe85 100644
--- a/net/ethtool/module_fw.h
+++ b/net/ethtool/module_fw.h
@@ -9,6 +9,8 @@ void ethnl_module_fw_flash_ntf_complete(struct net_device *dev);
void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
u64 total);

+void ethtool_cmis_fw_update(struct work_struct *work);
+
/**
* struct ethtool_module_fw_flash_params - module firmware flashing parameters
* @password: Module password. Only valid when @pass_valid is set.
@@ -18,3 +20,19 @@ struct ethtool_module_fw_flash_params {
__be32 password;
u8 password_valid:1;
};
+
+/**
+ * struct ethtool_module_fw_flash - module firmware flashing
+ * @dev: Pointer to the net_device to be flashed.
+ * @dev_tracker: Refcount tracker for @dev.
+ * @params: Module firmware flashing parameters.
+ * @work: The flashing firmware work.
+ * @fw: Firmware to flash.
+ */
+struct ethtool_module_fw_flash {
+ struct net_device *dev;
+ netdevice_tracker dev_tracker;
+ struct ethtool_module_fw_flash_params params;
+ struct work_struct work;
+ const struct firmware *fw;
+};
--
2.43.0


2024-04-24 13:34:50

by Danielle Ratson

Subject: [PATCH net-next v5 09/10] ethtool: Add ability to flash transceiver modules' firmware

Add the ability to flash the modules' firmware by implementing the
interface between user space and the kernel.

Example of a successful flashing process:

# ethtool --flash-module-firmware swp40 file test.bin

Transceiver module firmware flashing started for device eth0

Transceiver module firmware flashing in progress for device eth0
Status message: Downloading firmware image
Progress: 0%

[...]

Transceiver module firmware flashing in progress for device eth0
Status message: Downloading firmware image
Progress: 50%

[...]

Transceiver module firmware flashing in progress for device eth0
Status message: Downloading firmware image
Progress: 100%

Transceiver module firmware flashing completed for device eth0
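The notifications carry raw DONE and TOTAL counters (see ethnl_module_fw_flash_ntf_in_progress() in patch 4), while the output above shows percentages. A hypothetical user-space conversion could look like this; it is not taken from the ethtool utility.

```c
#include <stdint.h>

/* Hypothetical helper: whole-percent progress from the DONE and TOTAL
 * values of an IN_PROGRESS notification. Other notifications carry zeros,
 * so a zero total maps to 0 instead of dividing by zero. Assumes
 * done * 100 fits in 64 bits, which holds for any realistic image size.
 */
static unsigned int fw_flash_progress_pct(uint64_t done, uint64_t total)
{
	if (total == 0)
		return 0;
	if (done >= total)
		return 100;
	return (unsigned int)(done * 100 / total);
}
```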

Signed-off-by: Danielle Ratson <[email protected]>
---
net/ethtool/module.c | 174 ++++++++++++++++++++++++++++++++++++++++++
net/ethtool/netlink.c | 7 ++
net/ethtool/netlink.h | 2 +
3 files changed, 183 insertions(+)

diff --git a/net/ethtool/module.c b/net/ethtool/module.c
index 114a2ec986fe..f42e9a9a1ab8 100644
--- a/net/ethtool/module.c
+++ b/net/ethtool/module.c
@@ -1,6 +1,8 @@
// SPDX-License-Identifier: GPL-2.0-only

#include <linux/ethtool.h>
+#include <linux/firmware.h>
+#include <linux/sfp.h>

#include "netlink.h"
#include "common.h"
@@ -160,6 +162,178 @@ const struct ethnl_request_ops ethnl_module_request_ops = {
.set_ntf_cmd = ETHTOOL_MSG_MODULE_NTF,
};

+/* MODULE_FW_FLASH_ACT */
+
+const struct nla_policy
+ethnl_module_fw_flash_act_policy[ETHTOOL_A_MODULE_FW_FLASH_PASSWORD + 1] = {
+ [ETHTOOL_A_MODULE_FW_FLASH_HEADER] =
+ NLA_POLICY_NESTED(ethnl_header_policy),
+ [ETHTOOL_A_MODULE_FW_FLASH_FILE_NAME] = { .type = NLA_NUL_STRING },
+ [ETHTOOL_A_MODULE_FW_FLASH_PASSWORD] = { .type = NLA_U32 },
+};
+
+#define MODULE_EEPROM_PHYS_ID_PAGE 0
+#define MODULE_EEPROM_PHYS_ID_I2C_ADDR 0x50
+
+static int module_flash_fw_work_init(struct ethtool_module_fw_flash *module_fw,
+ struct net_device *dev,
+ struct netlink_ext_ack *extack)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct ethtool_module_eeprom page_data = {};
+ u8 phys_id;
+ int err;
+
+ /* Fetch the SFF-8024 Identifier Value. For all supported standards, it
+ * is located at I2C address 0x50, byte 0. See section 4.1 in SFF-8024,
+ * revision 4.9.
+ */
+ page_data.page = MODULE_EEPROM_PHYS_ID_PAGE;
+ page_data.offset = SFP_PHYS_ID;
+ page_data.length = sizeof(phys_id);
+ page_data.i2c_address = MODULE_EEPROM_PHYS_ID_I2C_ADDR;
+ page_data.data = &phys_id;
+
+ err = ops->get_module_eeprom_by_page(dev, &page_data, extack);
+ if (err < 0)
+ return err;
+
+ switch (phys_id) {
+ case SFF8024_ID_QSFP_DD:
+ case SFF8024_ID_OSFP:
+ case SFF8024_ID_DSFP:
+ case SFF8024_ID_QSFP_PLUS_CMIS:
+ case SFF8024_ID_SFP_DD_CMIS:
+ case SFF8024_ID_SFP_PLUS_CMIS:
+ INIT_WORK(&module_fw->work, ethtool_cmis_fw_update);
+ break;
+ default:
+ NL_SET_ERR_MSG(extack,
+ "Module type does not support firmware flashing");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int __module_flash_fw_schedule(struct net_device *dev,
+ struct netlink_ext_ack *extack)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+
+ if (!ops->set_module_eeprom_by_page ||
+ !ops->get_module_eeprom_by_page) {
+ NL_SET_ERR_MSG(extack,
+ "Flashing module firmware is not supported by this device");
+ return -EOPNOTSUPP;
+ }
+
+ if (!ops->reset) {
+ NL_SET_ERR_MSG(extack,
+ "Reset module is not supported by this device, so flashing is not permitted");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int
+module_flash_fw_schedule(struct net_device *dev, const char *file_name,
+ struct ethtool_module_fw_flash_params *params,
+ struct netlink_ext_ack *extack)
+{
+ struct ethtool_module_fw_flash *module_fw;
+ int err;
+
+ err = __module_flash_fw_schedule(dev, extack);
+ if (err < 0)
+ return err;
+
+ module_fw = kzalloc(sizeof(*module_fw), GFP_KERNEL);
+ if (!module_fw)
+ return -ENOMEM;
+
+ module_fw->params = *params;
+ err = request_firmware_direct(&module_fw->fw, file_name, &dev->dev);
+ if (err) {
+ NL_SET_ERR_MSG(extack,
+ "Failed to request module firmware image");
+ goto err_request_firmware;
+ }
+
+ err = module_flash_fw_work_init(module_fw, dev, extack);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Flashing module firmware is not supported by this device");
+ goto err_work_init;
+ }
+
+ dev->module_fw_flash_in_progress = true;
+ netdev_hold(dev, &module_fw->dev_tracker, GFP_KERNEL);
+ module_fw->dev = dev;
+
+ schedule_work(&module_fw->work);
+
+ return 0;
+
+err_work_init:
+ release_firmware(module_fw->fw);
+err_request_firmware:
+ kfree(module_fw);
+ return err;
+}
+
+static int module_flash_fw(struct net_device *dev, struct nlattr **tb,
+ struct genl_info *info)
+{
+ struct ethtool_module_fw_flash_params params = {};
+ const char *file_name;
+ struct nlattr *attr;
+
+ if (GENL_REQ_ATTR_CHECK(info, ETHTOOL_A_MODULE_FW_FLASH_FILE_NAME))
+ return -EINVAL;
+
+ file_name = nla_data(tb[ETHTOOL_A_MODULE_FW_FLASH_FILE_NAME]);
+
+ attr = tb[ETHTOOL_A_MODULE_FW_FLASH_PASSWORD];
+ if (attr) {
+ params.password = cpu_to_be32(nla_get_u32(attr));
+ params.password_valid = true;
+ }
+
+ return module_flash_fw_schedule(dev, file_name, &params, info->extack);
+}
+
+int ethnl_act_module_fw_flash(struct sk_buff *skb, struct genl_info *info)
+{
+ struct ethnl_req_info req_info = {};
+ struct nlattr **tb = info->attrs;
+ struct net_device *dev;
+ int ret;
+
+ ret = ethnl_parse_header_dev_get(&req_info,
+ tb[ETHTOOL_A_MODULE_FW_FLASH_HEADER],
+ genl_info_net(info), info->extack,
+ true);
+ if (ret < 0)
+ return ret;
+ dev = req_info.dev;
+
+ rtnl_lock();
+ ret = ethnl_ops_begin(dev);
+ if (ret < 0)
+ goto out_rtnl;
+
+ ret = module_flash_fw(dev, tb, info);
+
+ ethnl_ops_complete(dev);
+
+out_rtnl:
+ rtnl_unlock();
+ ethnl_parse_header_dev_put(&req_info);
+ return ret;
+}
+
/* MODULE_FW_FLASH_NTF */

static void
diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 563e94e0cbd8..1a4f6bd1ec7f 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -1169,6 +1169,13 @@ static const struct genl_ops ethtool_genl_ops[] = {
.policy = ethnl_mm_set_policy,
.maxattr = ARRAY_SIZE(ethnl_mm_set_policy) - 1,
},
+ {
+ .cmd = ETHTOOL_MSG_MODULE_FW_FLASH_ACT,
+ .flags = GENL_UNS_ADMIN_PERM,
+ .doit = ethnl_act_module_fw_flash,
+ .policy = ethnl_module_fw_flash_act_policy,
+ .maxattr = ARRAY_SIZE(ethnl_module_fw_flash_act_policy) - 1,
+ },
};

static const struct genl_multicast_group ethtool_nl_mcgrps[] = {
diff --git a/net/ethtool/netlink.h b/net/ethtool/netlink.h
index d57a890b5d9e..e1e2edd05206 100644
--- a/net/ethtool/netlink.h
+++ b/net/ethtool/netlink.h
@@ -446,6 +446,7 @@ extern const struct nla_policy ethnl_plca_set_cfg_policy[ETHTOOL_A_PLCA_MAX + 1]
extern const struct nla_policy ethnl_plca_get_status_policy[ETHTOOL_A_PLCA_HEADER + 1];
extern const struct nla_policy ethnl_mm_get_policy[ETHTOOL_A_MM_HEADER + 1];
extern const struct nla_policy ethnl_mm_set_policy[ETHTOOL_A_MM_MAX + 1];
+extern const struct nla_policy ethnl_module_fw_flash_act_policy[ETHTOOL_A_MODULE_FW_FLASH_PASSWORD + 1];

int ethnl_set_features(struct sk_buff *skb, struct genl_info *info);
int ethnl_act_cable_test(struct sk_buff *skb, struct genl_info *info);
@@ -453,6 +454,7 @@ int ethnl_act_cable_test_tdr(struct sk_buff *skb, struct genl_info *info);
int ethnl_tunnel_info_doit(struct sk_buff *skb, struct genl_info *info);
int ethnl_tunnel_info_start(struct netlink_callback *cb);
int ethnl_tunnel_info_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int ethnl_act_module_fw_flash(struct sk_buff *skb, struct genl_info *info);

extern const char stats_std_names[__ETHTOOL_STATS_CNT][ETH_GSTRING_LEN];
extern const char stats_eth_phy_names[__ETHTOOL_A_STATS_ETH_PHY_CNT][ETH_GSTRING_LEN];
--
2.43.0


2024-04-24 13:35:17

by Danielle Ratson

Subject: [PATCH net-next v5 10/10] ethtool: Veto some operations during firmware flashing process

Some operations cannot be performed during the firmware flashing process.

For example:

- The port must be down during the whole flashing process, for example to
avoid packet loss while the module is being reset.

- Writing to the EEPROM interrupts the flashing process, so operations
such as ethtool dump, module reset, and getting or setting the power mode
should be vetoed.

- Firmware flashing on a split port should be vetoed.

- Flashing firmware on a device that is already being flashed should be
forbidden.

Use the 'module_fw_flash_in_progress' flag introduced in a previous
patch to veto those operations and prevent interruptions while flashing
the module firmware.
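Each veto above reduces to checking one per-device flag before the conflicting operation and failing with -EBUSY. A minimal userspace sketch of that pattern follows; struct fake_netdev and the fake_*() functions are hypothetical stand-ins for the net_device fields and ethtool paths touched by this patch, not kernel APIs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two net_device fields the patch consults. */
struct fake_netdev {
	bool module_fw_flash_in_progress;
	bool up;	/* models (dev->flags & IFF_UP) */
};

/* Models an EEPROM access that would interrupt an ongoing flash. */
static int fake_get_module_eeprom(const struct fake_netdev *dev)
{
	if (dev->module_fw_flash_in_progress)
		return -EBUSY;
	return 0;	/* the real read would happen here */
}

/* Models the NETDEV_PRE_UP check: refuse to bring the port up. */
static int fake_pre_up(const struct fake_netdev *dev)
{
	return dev->module_fw_flash_in_progress ? -EBUSY : 0;
}

/* Models the checks done before a flash is scheduled. */
static int fake_start_flash(struct fake_netdev *dev)
{
	if (dev->module_fw_flash_in_progress)
		return -EBUSY;	/* a flash is already running */
	if (dev->up)
		return -EBUSY;	/* the port must be down */
	dev->module_fw_flash_in_progress = true;
	return 0;
}
```

Once fake_start_flash() succeeds, a second flash request, an EEPROM read and a pre-up check all return -EBUSY, and a flash is refused while the port is up.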

Signed-off-by: Danielle Ratson <[email protected]>
---
net/ethtool/eeprom.c | 6 ++++++
net/ethtool/ioctl.c | 12 ++++++++++++
net/ethtool/module.c | 29 +++++++++++++++++++++++++++++
net/ethtool/netlink.c | 30 +++++++++++++++++++++++++++++-
4 files changed, 76 insertions(+), 1 deletion(-)

diff --git a/net/ethtool/eeprom.c b/net/ethtool/eeprom.c
index 6209c3a9c8f7..f36811b3ecf1 100644
--- a/net/ethtool/eeprom.c
+++ b/net/ethtool/eeprom.c
@@ -91,6 +91,12 @@ static int get_module_eeprom_by_page(struct net_device *dev,
{
const struct ethtool_ops *ops = dev->ethtool_ops;

+ if (dev->module_fw_flash_in_progress) {
+ NL_SET_ERR_MSG(extack,
+ "Module firmware flashing is in progress");
+ return -EBUSY;
+ }
+
if (dev->sfp_bus)
return sfp_get_module_eeprom_by_page(dev->sfp_bus, page_data, extack);

diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 5a55270aa86e..02b23805d2be 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -658,6 +658,9 @@ static int ethtool_get_settings(struct net_device *dev, void __user *useraddr)
if (!dev->ethtool_ops->get_link_ksettings)
return -EOPNOTSUPP;

+ if (dev->module_fw_flash_in_progress)
+ return -EBUSY;
+
memset(&link_ksettings, 0, sizeof(link_ksettings));
err = dev->ethtool_ops->get_link_ksettings(dev, &link_ksettings);
if (err < 0)
@@ -1449,6 +1452,9 @@ static int ethtool_reset(struct net_device *dev, char __user *useraddr)
if (!dev->ethtool_ops->reset)
return -EOPNOTSUPP;

+ if (dev->module_fw_flash_in_progress)
+ return -EBUSY;
+
if (copy_from_user(&reset, useraddr, sizeof(reset)))
return -EFAULT;

@@ -2462,6 +2468,9 @@ int ethtool_get_module_info_call(struct net_device *dev,
const struct ethtool_ops *ops = dev->ethtool_ops;
struct phy_device *phydev = dev->phydev;

+ if (dev->module_fw_flash_in_progress)
+ return -EBUSY;
+
if (dev->sfp_bus)
return sfp_get_module_info(dev->sfp_bus, modinfo);

@@ -2499,6 +2508,9 @@ int ethtool_get_module_eeprom_call(struct net_device *dev,
const struct ethtool_ops *ops = dev->ethtool_ops;
struct phy_device *phydev = dev->phydev;

+ if (dev->module_fw_flash_in_progress)
+ return -EBUSY;
+
if (dev->sfp_bus)
return sfp_get_module_eeprom(dev->sfp_bus, ee, data);

diff --git a/net/ethtool/module.c b/net/ethtool/module.c
index f42e9a9a1ab8..abb7c01e34d2 100644
--- a/net/ethtool/module.c
+++ b/net/ethtool/module.c
@@ -3,6 +3,7 @@
#include <linux/ethtool.h>
#include <linux/firmware.h>
#include <linux/sfp.h>
+#include <net/devlink.h>

#include "netlink.h"
#include "common.h"
@@ -36,6 +37,12 @@ static int module_get_power_mode(struct net_device *dev,
if (!ops->get_module_power_mode)
return 0;

+ if (dev->module_fw_flash_in_progress) {
+ NL_SET_ERR_MSG(extack,
+ "Module firmware flashing is in progress");
+ return -EBUSY;
+ }
+
return ops->get_module_power_mode(dev, &data->power, extack);
}

@@ -112,6 +119,12 @@ ethnl_set_module_validate(struct ethnl_req_info *req_info,
if (!tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY])
return 0;

+ if (req_info->dev->module_fw_flash_in_progress) {
+ NL_SET_ERR_MSG(info->extack,
+ "Module firmware flashing is in progress");
+ return -EBUSY;
+ }
+
if (!ops->get_module_power_mode || !ops->set_module_power_mode) {
NL_SET_ERR_MSG_ATTR(info->extack,
tb[ETHTOOL_A_MODULE_POWER_MODE_POLICY],
@@ -219,6 +232,7 @@ static int module_flash_fw_work_init(struct ethtool_module_fw_flash *module_fw,
static int __module_flash_fw_schedule(struct net_device *dev,
struct netlink_ext_ack *extack)
{
+ struct devlink_port *devlink_port = dev->devlink_port;
const struct ethtool_ops *ops = dev->ethtool_ops;

if (!ops->set_module_eeprom_by_page ||
@@ -234,6 +248,21 @@ static int __module_flash_fw_schedule(struct net_device *dev,
return -EOPNOTSUPP;
}

+ if (dev->module_fw_flash_in_progress) {
+ NL_SET_ERR_MSG(extack, "Module firmware flashing already in progress");
+ return -EBUSY;
+ }
+
+ if (dev->flags & IFF_UP) {
+ NL_SET_ERR_MSG(extack, "Netdevice is up, so flashing is not permitted");
+ return -EBUSY;
+ }
+
+ if (devlink_port && devlink_port->attrs.split) {
+ NL_SET_ERR_MSG(extack, "Can't perform firmware flashing on a split port");
+ return -EOPNOTSUPP;
+ }
+
return 0;
}

diff --git a/net/ethtool/netlink.c b/net/ethtool/netlink.c
index 1a4f6bd1ec7f..90e5b5312aa2 100644
--- a/net/ethtool/netlink.c
+++ b/net/ethtool/netlink.c
@@ -1194,6 +1194,29 @@ static struct genl_family ethtool_genl_family __ro_after_init = {
.n_mcgrps = ARRAY_SIZE(ethtool_nl_mcgrps),
};

+static int module_netdev_pre_up_event(struct notifier_block *this,
+ unsigned long event, void *ptr)
+{
+ struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+ struct netdev_notifier_info *info = ptr;
+ struct netlink_ext_ack *extack;
+
+ extack = netdev_notifier_info_to_extack(info);
+
+ if (event == NETDEV_PRE_UP) {
+ if (dev->module_fw_flash_in_progress) {
+ NL_SET_ERR_MSG(extack, "Can't set port up while flashing module firmware");
+ return NOTIFY_BAD;
+ }
+ }
+
+ return NOTIFY_DONE;
+}
+
+static struct notifier_block ethtool_module_netdev_pre_up_notifier = {
+ .notifier_call = module_netdev_pre_up_event,
+};
+
/* module setup */

static int __init ethnl_init(void)
@@ -1206,7 +1229,12 @@ static int __init ethnl_init(void)
ethnl_ok = true;

ret = register_netdevice_notifier(&ethnl_netdev_notifier);
- WARN(ret < 0, "ethtool: net device notifier registration failed");
+ if (WARN(ret < 0, "ethtool: net device notifier registration failed"))
+ return ret;
+
+ ret = register_netdevice_notifier(&ethtool_module_netdev_pre_up_notifier);
+ WARN(ret < 0, "ethtool: net device port up notifier registration failed");
+
return ret;
}

--
2.43.0


2024-04-24 13:55:00

by Danielle Ratson

Subject: [PATCH net-next v5 07/10] ethtool: cmis_cdb: Add a layer for supporting CDB commands

CDB (Command Data Block Message Communication) reads and writes are
performed on memory map pages 9Fh-AFh according to the CMIS standard,
section 8.20 of revision 5.2.
Page 9Fh is used to specify the CDB command to be executed and also
provides an area for a local payload (LPL).
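To make the page 9Fh layout concrete, the following sketch mirrors struct ethtool_cmis_cdb_request from cmis.h (added below) with plain userspace types and checks, at compile time, where each field lands once the request is written at CMIS_CDB_CMD_ID_OFFSET (0x80). The uint16_t fields stand in for the kernel's __be16 and the struct_group() wrapper is dropped; both are assumptions that do not change the offsets.

```c
#include <stddef.h>
#include <stdint.h>

#define CDB_CMD_ID_OFFSET	0x80	/* CMIS_CDB_CMD_ID_OFFSET in the patch */
#define CDB_REPLY_OFFSET	0x86	/* CMIS_CDB_REPLY_OFFSET in the patch */
#define CDB_LPL_MAX		120	/* ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH */

/* Userspace mirror of struct ethtool_cmis_cdb_request. Multi-byte fields
 * are big endian on the wire, which does not affect the offsets.
 */
struct cdb_request {
	uint16_t id;		/* CMDID */
	uint16_t epl_len;	/* EPL length */
	uint8_t lpl_len;	/* LPL length */
	uint8_t chk_code;	/* CdbChkCode */
	uint8_t resv1;
	uint8_t resv2;
	uint8_t payload[CDB_LPL_MAX];
};

/* Absolute byte address on page 9Fh of a request field. */
#define CDB_REQ_ADDR(field) \
	(CDB_CMD_ID_OFFSET + offsetof(struct cdb_request, field))

_Static_assert(CDB_REQ_ADDR(id) == 0x80, "CMDID at 0x80");
_Static_assert(CDB_REQ_ADDR(epl_len) == 0x82, "EPL length at 0x82");
_Static_assert(CDB_REQ_ADDR(lpl_len) == 0x84, "LPL length at 0x84");
_Static_assert(CDB_REQ_ADDR(chk_code) == 0x85, "check code at 0x85");
_Static_assert(CDB_REQ_ADDR(payload) == 0x88, "LPL starts at 0x88");
/* The reply header (length, check code) overlays bytes 0x86-0x87, so the
 * reply payload starts at 0x88 as well.
 */
_Static_assert(CDB_REPLY_OFFSET + 2 == CDB_REQ_ADDR(payload), "reply payload");
_Static_assert(sizeof(struct cdb_request) == 128, "request fills 0x80-0xFF");
```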

According to the CMIS standard, the firmware update process is done using
a sequence of CDB commands, which will be implemented in the next patch.

The kernel interface that will implement the firmware update using CDB
commands will consist of two layers added under ethtool:

* The upper layer, cmis_fw_update, which will be triggered from the module
layer.
* The lower layer, cmis_cdb.

In the future there might be more operations to implement using CDB
commands. Therefore, the idea is to keep the CDB interface clean and
cmis_fw_update specific to the CDB commands it handles.

These two layers will communicate using an API that consists of three
functions:

- struct ethtool_cmis_cdb *
  ethtool_cmis_cdb_init(struct net_device *dev,
                        const struct ethtool_module_fw_flash_params *params);
- void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb *cdb);
- int ethtool_cmis_cdb_execute_cmd(struct net_device *dev,
                                   struct ethtool_cmis_cdb_cmd_args *args);
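ethtool_cmis_cdb_init() derives its state from a few module fields. The helper names below are hypothetical, but each one restates a decoding from cmis_cdb.c in this patch: the major revision is the high nibble of byte 01h on page 00h, the number of supported CDB instances is bits 7-6 of byte A3h on page 01h, and the usable LPL length is 8 * (1 + min(n, 15)) bytes for a read/write length extension of n.

```c
#include <stdint.h>

/* Byte 01h of page 00h holds the CMIS revision; the high nibble is the
 * major revision (as in cmis_rev_rpl_major()).
 */
static uint8_t cmis_major_rev(uint8_t rev_byte)
{
	return rev_byte >> 4;
}

/* Bits 7-6 of byte A3h on page 01h give the number of CDB instances;
 * zero means CDB is not supported (as in
 * cmis_cdb_advert_rpl_inst_supported()).
 */
static uint8_t cmis_cdb_instances(uint8_t advert_byte)
{
	return advert_byte >> 6;
}

/* Same formula as ethtool_cmis_get_max_payload_size(): 8 bytes plus 8
 * more per allowed extension octet, capped at 15 extensions.
 */
static uint32_t cmis_max_payload_size(uint8_t num_of_byte_octs)
{
	uint8_t ext = num_of_byte_octs < 15 ? num_of_byte_octs : 15;

	return 8u * (1u + ext);
}

/* The patch refuses flashing below CMIS 4 or without CDB support. */
static int cmis_can_flash(uint8_t rev_byte, uint8_t advert_byte)
{
	return cmis_major_rev(rev_byte) >= 4 &&
	       cmis_cdb_instances(advert_byte) > 0;
}
```

For example, a revision byte of 0x52 decodes to CMIS 5, and an extension of 14 byte octets allows the full 120-byte LPL.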

Add the CDB layer to support initializing, finishing and executing CDB
commands:

* The initialization process will include creating an ethtool_cmis_cdb
instance, querying the module CDB support, entering and validating the
password from user space (CMD 0x0000) and querying the module features
(CMD 0x0040).

* The finishing API will simply free the ethtool_cmis_cdb instance.

* The execution process will write the CDB command to the EEPROM using
set_module_eeprom_by_page(), introduced earlier in this series, and will
then process the reply read back from the EEPROM.
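The execution step can be modelled without hardware. In this sketch a 256-byte array stands in for page 9Fh and the hypothetical mock_write() stands in for set_module_eeprom_by_page(). It computes the check code as cmis_cdb_calc_checksum() does (ones' complement of the byte sum over the whole request, taken while the check-code byte is still zero) and then writes the request in two parts with CMDID last, because, per the comment in ethtool_cmis_cdb_execute_cmd(), writing the CMDID bytes is what triggers the command.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CDB_CMD_ID_OFFSET	0x80	/* CMIS_CDB_CMD_ID_OFFSET */
#define CDB_REQ_LEN		128	/* CMDID (2) + rest of the request (126) */
#define CDB_LPL_MAX		120	/* ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH */

/* Hypothetical model of page 9Fh that also records the start address of
 * each write, so the write order can be inspected.
 */
struct mock_page {
	uint8_t mem[256];
	unsigned int write_addr[4];
	unsigned int nwrites;
};

/* Stand-in for set_module_eeprom_by_page() on page 9Fh. */
static void mock_write(struct mock_page *p, unsigned int addr,
		       const uint8_t *data, size_t len)
{
	memcpy(&p->mem[addr], data, len);
	if (p->nwrites < 4)
		p->write_addr[p->nwrites] = addr;
	p->nwrites++;
}

/* Ones' complement of the byte sum, as in cmis_cdb_calc_checksum(). */
static uint8_t cdb_checksum(const uint8_t *buf, size_t len)
{
	uint8_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum += buf[i];
	return (uint8_t)~sum;
}

/* Build a request without EPL and send it: everything but CMDID first,
 * then CMDID, whose write triggers execution. Returns -1 if the LPL
 * does not fit.
 */
static int cdb_send(struct mock_page *p, uint16_t cmd,
		    const uint8_t *lpl, uint8_t lpl_len)
{
	uint8_t req[CDB_REQ_LEN] = { 0 };

	if (lpl_len > CDB_LPL_MAX)
		return -1;

	req[0] = cmd >> 8;		/* CMDID is big endian */
	req[1] = cmd & 0xff;
	req[4] = lpl_len;		/* EPL length (bytes 2-3) stays 0 */
	if (lpl_len)
		memcpy(&req[8], lpl, lpl_len);
	/* Byte 5 (check code) is still 0, so it does not affect the sum. */
	req[5] = cdb_checksum(req, sizeof(req));

	mock_write(p, CDB_CMD_ID_OFFSET + 2, &req[2], sizeof(req) - 2);
	mock_write(p, CDB_CMD_ID_OFFSET, &req[0], 2);
	return 0;
}
```

Sending CMD 0040h (Module Features) with an empty LPL produces a check code of 0xBF and two writes, at 0x82 and then 0x80.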

Signed-off-by: Danielle Ratson <[email protected]>
---

Notes:
v5:
* Drop all the inline in cmis_cdb.c.

v4:
* Add kernel-doc for msleep_pre_rpl and err_msg.

v3:
* Use kmemdup() instead of kmalloc+memcpy.

v2:
* Define ethtool_cmis_cdb_request::epl_len to be __be16 instead
of u16.

net/ethtool/Makefile | 2 +-
net/ethtool/cmis.h | 116 ++++++++
net/ethtool/cmis_cdb.c | 577 ++++++++++++++++++++++++++++++++++++++++
net/ethtool/module_fw.h | 10 +
4 files changed, 704 insertions(+), 1 deletion(-)
create mode 100644 net/ethtool/cmis.h
create mode 100644 net/ethtool/cmis_cdb.c

diff --git a/net/ethtool/Makefile b/net/ethtool/Makefile
index 504f954a1b28..38806b3ecf83 100644
--- a/net/ethtool/Makefile
+++ b/net/ethtool/Makefile
@@ -8,4 +8,4 @@ ethtool_nl-y := netlink.o bitset.o strset.o linkinfo.o linkmodes.o rss.o \
linkstate.o debug.o wol.o features.o privflags.o rings.o \
channels.o coalesce.o pause.o eee.o tsinfo.o cabletest.o \
tunnels.o fec.o eeprom.o stats.o phc_vclocks.o mm.o \
- module.o pse-pd.o plca.o mm.o
+ module.o cmis_cdb.o pse-pd.o plca.o mm.o
diff --git a/net/ethtool/cmis.h b/net/ethtool/cmis.h
new file mode 100644
index 000000000000..6c8e88b8ade2
--- /dev/null
+++ b/net/ethtool/cmis.h
@@ -0,0 +1,116 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+#define ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH 120
+#define ETHTOOL_CMIS_CDB_CMD_PAGE 0x9F
+#define ETHTOOL_CMIS_CDB_PAGE_I2C_ADDR 0x50
+
+/**
+ * struct ethtool_cmis_cdb - CDB commands parameters
+ * @cmis_rev: CMIS revision major.
+ * @read_write_len_ext: Allowable additional number of byte octets to the LPL
+ * in a READ or WRITE CDB command.
+ * @max_completion_time: Maximum CDB command completion time in msec.
+ */
+struct ethtool_cmis_cdb {
+ u8 cmis_rev;
+ u8 read_write_len_ext;
+ u16 max_completion_time;
+};
+
+enum ethtool_cmis_cdb_cmd_id {
+ ETHTOOL_CMIS_CDB_CMD_QUERY_STATUS = 0x0000,
+ ETHTOOL_CMIS_CDB_CMD_MODULE_FEATURES = 0x0040,
+};
+
+/**
+ * struct ethtool_cmis_cdb_request - CDB commands request fields as described in
+ * the CMIS standard
+ * @id: Command ID.
+ * @epl_len: EPL memory length.
+ * @lpl_len: LPL memory length.
+ * @chk_code: Check code for the previous field and the payload.
+ * @resv1: Added to match the CMIS standard request continuity.
+ * @resv2: Added to match the CMIS standard request continuity.
+ * @payload: Payload for the CDB commands.
+ */
+struct ethtool_cmis_cdb_request {
+ __be16 id;
+ struct_group(body,
+ __be16 epl_len;
+ u8 lpl_len;
+ u8 chk_code;
+ u8 resv1;
+ u8 resv2;
+ u8 payload[ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH];
+ );
+};
+
+#define CDB_F_COMPLETION_VALID BIT(0)
+#define CDB_F_STATUS_VALID BIT(1)
+
+/**
+ * struct ethtool_cmis_cdb_cmd_args - CDB commands execution arguments
+ * @req: CDB command fields as described in the CMIS standard.
+ * @max_duration: Maximum duration time for command completion in msec.
+ * @read_write_len_ext: Allowable additional number of byte octets to the LPL
+ * in a READ or WRITE command.
+ * @msleep_pre_rpl: Waiting time before checking reply in msec.
+ * @rpl_exp_len: Expected reply length in bytes.
+ * @flags: Validation flags for CDB commands.
+ * @err_msg: Error message to be sent to user space.
+ */
+struct ethtool_cmis_cdb_cmd_args {
+ struct ethtool_cmis_cdb_request req;
+ u16 max_duration;
+ u8 read_write_len_ext;
+ u8 msleep_pre_rpl;
+ u8 rpl_exp_len;
+ u8 flags;
+ char *err_msg;
+};
+
+/**
+ * struct ethtool_cmis_cdb_rpl_hdr - CDB commands reply header arguments
+ * @rpl_len: Reply length.
+ * @rpl_chk_code: Reply check code.
+ */
+struct ethtool_cmis_cdb_rpl_hdr {
+ u8 rpl_len;
+ u8 rpl_chk_code;
+};
+
+/**
+ * struct ethtool_cmis_cdb_rpl - CDB commands reply arguments
+ * @hdr: CDB commands reply header arguments.
+ * @payload: Payload for the CDB commands reply.
+ */
+struct ethtool_cmis_cdb_rpl {
+ struct ethtool_cmis_cdb_rpl_hdr hdr;
+ u8 payload[ETHTOOL_CMIS_CDB_LPL_MAX_PL_LENGTH];
+};
+
+u32 ethtool_cmis_get_max_payload_size(u8 num_of_byte_octs);
+
+void ethtool_cmis_cdb_compose_args(struct ethtool_cmis_cdb_cmd_args *args,
+ enum ethtool_cmis_cdb_cmd_id cmd, u8 *pl,
+ u8 lpl_len, u16 max_duration,
+ u8 read_write_len_ext, u16 msleep_pre_rpl,
+ u8 rpl_exp_len, u8 flags);
+
+void ethtool_cmis_cdb_check_completion_flag(u8 cmis_rev, u8 *flags);
+
+void ethtool_cmis_page_init(struct ethtool_module_eeprom *page_data,
+ u8 page, u32 offset, u32 length);
+void ethtool_cmis_page_fini(struct ethtool_module_eeprom *page_data);
+
+struct ethtool_cmis_cdb *
+ethtool_cmis_cdb_init(struct net_device *dev,
+ const struct ethtool_module_fw_flash_params *params);
+void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb *cdb);
+
+int ethtool_cmis_wait_for_cond(struct net_device *dev, u8 flags, u8 flag,
+ u16 max_duration, u32 offset,
+ bool (*cond_success)(u8), bool (*cond_fail)(u8), u8 *state);
+
+int ethtool_cmis_cdb_execute_cmd(struct net_device *dev,
+ struct ethtool_cmis_cdb_cmd_args *args);
diff --git a/net/ethtool/cmis_cdb.c b/net/ethtool/cmis_cdb.c
new file mode 100644
index 000000000000..0642a3e62fd3
--- /dev/null
+++ b/net/ethtool/cmis_cdb.c
@@ -0,0 +1,577 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#include <linux/ethtool.h>
+#include <linux/jiffies.h>
+
+#include "common.h"
+#include "module_fw.h"
+#include "cmis.h"
+
+/* For accessing the LPL field on page 9Fh, the allowable length extension is
+ * min(i, 15) byte octets where i specifies the allowable additional number of
+ * byte octets in a READ or a WRITE.
+ */
+u32 ethtool_cmis_get_max_payload_size(u8 num_of_byte_octs)
+{
+ return 8 * (1 + min_t(u8, num_of_byte_octs, 15));
+}
+
+void ethtool_cmis_cdb_compose_args(struct ethtool_cmis_cdb_cmd_args *args,
+ enum ethtool_cmis_cdb_cmd_id cmd, u8 *pl,
+ u8 lpl_len, u16 max_duration,
+ u8 read_write_len_ext, u16 msleep_pre_rpl,
+ u8 rpl_exp_len, u8 flags)
+{
+ args->req.id = cpu_to_be16(cmd);
+ args->req.lpl_len = lpl_len;
+ if (pl)
+ memcpy(args->req.payload, pl, args->req.lpl_len);
+
+ args->max_duration = max_duration;
+ args->read_write_len_ext =
+ ethtool_cmis_get_max_payload_size(read_write_len_ext);
+ args->msleep_pre_rpl = msleep_pre_rpl;
+ args->rpl_exp_len = rpl_exp_len;
+ args->flags = flags;
+ args->err_msg = NULL;
+}
+
+void ethtool_cmis_page_init(struct ethtool_module_eeprom *page_data,
+ u8 page, u32 offset, u32 length)
+{
+ page_data->page = page;
+ page_data->offset = offset;
+ page_data->length = length;
+ page_data->i2c_address = ETHTOOL_CMIS_CDB_PAGE_I2C_ADDR;
+}
+
+#define CMIS_REVISION_PAGE 0x00
+#define CMIS_REVISION_OFFSET 0x01
+
+struct cmis_rev_rpl {
+ u8 rev;
+};
+
+static u8 cmis_rev_rpl_major(struct cmis_rev_rpl *rpl)
+{
+ return rpl->rev >> 4;
+}
+
+static int cmis_rev_major_get(struct net_device *dev, u8 *rev_major)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct ethtool_module_eeprom page_data = {0};
+ struct netlink_ext_ack extack = {};
+ struct cmis_rev_rpl rpl = {};
+ int err;
+
+ ethtool_cmis_page_init(&page_data, CMIS_REVISION_PAGE,
+ CMIS_REVISION_OFFSET, sizeof(rpl));
+ page_data.data = (u8 *)&rpl;
+
+ err = ops->get_module_eeprom_by_page(dev, &page_data, &extack);
+ if (err < 0) {
+ if (extack._msg)
+ netdev_err(dev, "%s\n", extack._msg);
+ return err;
+ }
+
+ *rev_major = cmis_rev_rpl_major(&rpl);
+
+ return 0;
+}
+
+#define CMIS_CDB_ADVERTISEMENT_PAGE 0x01
+#define CMIS_CDB_ADVERTISEMENT_OFFSET 0xA3
+
+/* Based on section 8.4.11 "CDB Messaging Support Advertisement" in CMIS
+ * standard revision 5.2.
+ */
+struct cmis_cdb_advert_rpl {
+ u8 inst_supported;
+ u8 read_write_len_ext;
+ u8 resv1;
+ u8 resv2;
+};
+
+static u8 cmis_cdb_advert_rpl_inst_supported(struct cmis_cdb_advert_rpl *rpl)
+{
+ return rpl->inst_supported >> 6;
+}
+
+static int cmis_cdb_advertisement_get(struct ethtool_cmis_cdb *cdb,
+ struct net_device *dev)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct ethtool_module_eeprom page_data = {};
+ struct cmis_cdb_advert_rpl rpl = {};
+ struct netlink_ext_ack extack = {};
+ int err;
+
+ ethtool_cmis_page_init(&page_data, CMIS_CDB_ADVERTISEMENT_PAGE,
+ CMIS_CDB_ADVERTISEMENT_OFFSET, sizeof(rpl));
+ page_data.data = (u8 *)&rpl;
+
+ err = ops->get_module_eeprom_by_page(dev, &page_data, &extack);
+ if (err < 0) {
+ if (extack._msg)
+ netdev_err(dev, "%s\n", extack._msg);
+ return err;
+ }
+
+ if (!cmis_cdb_advert_rpl_inst_supported(&rpl))
+ return -EOPNOTSUPP;
+
+ cdb->read_write_len_ext = rpl.read_write_len_ext;
+
+ return 0;
+}
+
+#define CMIS_PASSWORD_ENTRY_PAGE 0x00
+#define CMIS_PASSWORD_ENTRY_OFFSET 0x7A
+
+struct cmis_password_entry_pl {
+ __be32 password;
+};
+
+/* See section 9.3.1 "CMD 0000h: Query Status" in CMIS standard revision 5.2.
+ * struct cmis_cdb_query_status_pl and struct cmis_cdb_query_status_rpl are
+ * structured layouts of the flat arrays,
+ * struct ethtool_cmis_cdb_request::payload and
+ * struct ethtool_cmis_cdb_rpl::payload respectively.
+ */
+struct cmis_cdb_query_status_pl {
+ u16 response_delay;
+};
+
+struct cmis_cdb_query_status_rpl {
+ u8 length;
+ u8 status;
+};
+
+static int
+cmis_cdb_validate_password(struct ethtool_cmis_cdb *cdb,
+ struct net_device *dev,
+ const struct ethtool_module_fw_flash_params *params)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct cmis_cdb_query_status_pl qs_pl = {0};
+ struct ethtool_module_eeprom page_data = {};
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ struct cmis_password_entry_pl pe_pl = {};
+ struct cmis_cdb_query_status_rpl *rpl;
+ struct netlink_ext_ack extack = {};
+ int err;
+
+ ethtool_cmis_page_init(&page_data, CMIS_PASSWORD_ENTRY_PAGE,
+ CMIS_PASSWORD_ENTRY_OFFSET, sizeof(pe_pl));
+ page_data.data = (u8 *)&pe_pl;
+
+ pe_pl = *((struct cmis_password_entry_pl *)page_data.data);
+ pe_pl.password = params->password;
+ err = ops->set_module_eeprom_by_page(dev, &page_data, &extack);
+ if (err < 0) {
+ if (extack._msg)
+ netdev_err(dev, "%s\n", extack._msg);
+ return err;
+ }
+
+ ethtool_cmis_cdb_compose_args(&args, ETHTOOL_CMIS_CDB_CMD_QUERY_STATUS,
+ (u8 *)&qs_pl, sizeof(qs_pl), 0,
+ cdb->read_write_len_ext, 1000,
+ sizeof(*rpl),
+ CDB_F_COMPLETION_VALID | CDB_F_STATUS_VALID);
+
+ err = ethtool_cmis_cdb_execute_cmd(dev, &args);
+ if (err < 0) {
+ ethnl_module_fw_flash_ntf_err(dev,
+ "Query Status command failed",
+ args.err_msg);
+ return err;
+ }
+
+ rpl = (struct cmis_cdb_query_status_rpl *)args.req.payload;
+ if (!rpl->length || !rpl->status) {
+ ethnl_module_fw_flash_ntf_err(dev, "Password was not accepted",
+ NULL);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+/* Some CDB commands assert the CDB completion flag only from CMIS
+ * revision 5. Therefore, check the relevant validity flag only when
+ * the revision supports it.
+ */
+void ethtool_cmis_cdb_check_completion_flag(u8 cmis_rev, u8 *flags)
+{
+ *flags |= cmis_rev >= 5 ? CDB_F_COMPLETION_VALID : 0;
+}
+
+#define CMIS_CDB_MODULE_FEATURES_RESV_DATA 34
+
+/* See section 9.4.1 "CMD 0040h: Module Features" in CMIS standard revision 5.2.
+ * struct cmis_cdb_module_features_rpl is a structured layout of the flat
+ * array, ethtool_cmis_cdb_rpl::payload.
+ */
+struct cmis_cdb_module_features_rpl {
+ u8 resv1[CMIS_CDB_MODULE_FEATURES_RESV_DATA];
+ __be16 max_completion_time;
+};
+
+static u16
+cmis_cdb_module_features_completion_time(struct cmis_cdb_module_features_rpl *rpl)
+{
+ return be16_to_cpu(rpl->max_completion_time);
+}
+
+static int cmis_cdb_module_features_get(struct ethtool_cmis_cdb *cdb,
+ struct net_device *dev)
+{
+ struct ethtool_cmis_cdb_cmd_args args = {};
+ struct cmis_cdb_module_features_rpl *rpl;
+ u8 flags = CDB_F_STATUS_VALID;
+ int err;
+
+ ethtool_cmis_cdb_check_completion_flag(cdb->cmis_rev, &flags);
+ ethtool_cmis_cdb_compose_args(&args,
+ ETHTOOL_CMIS_CDB_CMD_MODULE_FEATURES,
+ NULL, 0, 0, cdb->read_write_len_ext,
+ 1000, sizeof(*rpl), flags);
+
+ err = ethtool_cmis_cdb_execute_cmd(dev, &args);
+ if (err < 0) {
+ ethnl_module_fw_flash_ntf_err(dev,
+ "Module Features command failed",
+ args.err_msg);
+ return err;
+ }
+
+ rpl = (struct cmis_cdb_module_features_rpl *)args.req.payload;
+ cdb->max_completion_time =
+ cmis_cdb_module_features_completion_time(rpl);
+
+ return 0;
+}
+
+struct ethtool_cmis_cdb *
+ethtool_cmis_cdb_init(struct net_device *dev,
+ const struct ethtool_module_fw_flash_params *params)
+{
+ struct ethtool_cmis_cdb *cdb;
+ int err;
+
+ cdb = kzalloc(sizeof(*cdb), GFP_KERNEL);
+ if (!cdb)
+ return ERR_PTR(-ENOMEM);
+
+ err = cmis_rev_major_get(dev, &cdb->cmis_rev);
+ if (err < 0)
+ goto err;
+
+ if (cdb->cmis_rev < 4) {
+ ethnl_module_fw_flash_ntf_err(dev,
+ "CMIS revision doesn't support module firmware flashing",
+ NULL);
+ err = -EOPNOTSUPP;
+ goto err;
+ }
+
+ err = cmis_cdb_advertisement_get(cdb, dev);
+ if (err < 0)
+ goto err;
+
+ if (params->password_valid) {
+ err = cmis_cdb_validate_password(cdb, dev, params);
+ if (err < 0)
+ goto err;
+ }
+
+ err = cmis_cdb_module_features_get(cdb, dev);
+ if (err < 0)
+ goto err;
+
+ return cdb;
+
+err:
+ ethtool_cmis_cdb_fini(cdb);
+ return ERR_PTR(err);
+}
+
+void ethtool_cmis_cdb_fini(struct ethtool_cmis_cdb *cdb)
+{
+ kfree(cdb);
+}
+
+static bool is_completed(u8 data)
+{
+ return !!(data & 0x40);
+}
+
+#define CMIS_CDB_STATUS_SUCCESS 0x01
+
+static bool status_success(u8 data)
+{
+ return data == CMIS_CDB_STATUS_SUCCESS;
+}
+
+#define CMIS_CDB_STATUS_FAIL 0x40
+
+static bool status_fail(u8 data)
+{
+ return data & CMIS_CDB_STATUS_FAIL;
+}
+
+struct cmis_wait_for_cond_rpl {
+ u8 state;
+};
+
+int ethtool_cmis_wait_for_cond(struct net_device *dev, u8 flags, u8 flag,
+ u16 max_duration, u32 offset,
+ bool (*cond_success)(u8), bool (*cond_fail)(u8),
+ u8 *state)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct ethtool_module_eeprom page_data = {0};
+ struct cmis_wait_for_cond_rpl rpl = {};
+ struct netlink_ext_ack extack = {};
+ unsigned long end;
+ int err;
+
+ if (!(flags & flag))
+ return 0;
+
+ if (max_duration == 0)
+ max_duration = U16_MAX;
+
+ end = jiffies + msecs_to_jiffies(max_duration);
+ do {
+ ethtool_cmis_page_init(&page_data, 0, offset, sizeof(rpl));
+ page_data.data = (u8 *)&rpl;
+
+ err = ops->get_module_eeprom_by_page(dev, &page_data, &extack);
+ if (err < 0) {
+ if (extack._msg)
+ netdev_err(dev, "%s\n", extack._msg);
+ continue;
+ }
+
+ if ((*cond_success)(rpl.state))
+ return 0;
+
+ if (*cond_fail && (*cond_fail)(rpl.state))
+ break;
+
+ msleep(20);
+ } while (time_before(jiffies, end));
+
+ *state = rpl.state;
+ return -EBUSY;
+}
+
+#define CMIS_CDB_COMPLETION_FLAG_OFFSET 0x08
+
+static int cmis_cdb_wait_for_completion(struct net_device *dev,
+ struct ethtool_cmis_cdb_cmd_args *args)
+{
+ u8 flag;
+ int err;
+
+ /* Some vendors demand waiting time before checking completion flag
+ * in some CDB commands.
+ */
+ msleep(args->msleep_pre_rpl);
+
+ err = ethtool_cmis_wait_for_cond(dev, args->flags,
+ CDB_F_COMPLETION_VALID,
+ args->max_duration,
+ CMIS_CDB_COMPLETION_FLAG_OFFSET,
+ is_completed, NULL, &flag);
+ if (err < 0)
+ args->err_msg = "Completion Flag did not set on time";
+
+ return err;
+}
+
+#define CMIS_CDB_STATUS_OFFSET 0x25
+
+static void cmis_cdb_status_fail_msg_get(u8 status, char **err_msg)
+{
+ switch (status) {
+ case 0b10000001:
+ *err_msg = "CDB Status is in progress: Busy capturing command";
+ break;
+ case 0b10000010:
+ *err_msg =
+ "CDB Status is in progress: Busy checking/validating command";
+ break;
+ case 0b10000011:
+ *err_msg = "CDB Status is in progress: Busy executing";
+ break;
+ case 0b01000000:
+ *err_msg = "CDB status failed: no specific failure";
+ break;
+ case 0b01000010:
+ *err_msg =
+ "CDB status failed: Parameter range error or parameter not supported";
+ break;
+ case 0b01000101:
+ *err_msg = "CDB status failed: CdbChkCode error";
+ break;
+ default:
+ *err_msg = "Unknown failure reason";
+ }
+}
+
+static int cmis_cdb_wait_for_status(struct net_device *dev,
+ struct ethtool_cmis_cdb_cmd_args *args)
+{
+ u8 status;
+ int err;
+
+ /* Some vendors demand waiting time before checking status in some
+ * CDB commands.
+ */
+ msleep(args->msleep_pre_rpl);
+
+ err = ethtool_cmis_wait_for_cond(dev, args->flags, CDB_F_STATUS_VALID,
+ args->max_duration,
+ CMIS_CDB_STATUS_OFFSET,
+ status_success, status_fail, &status);
+ if (err < 0 && !args->err_msg)
+ cmis_cdb_status_fail_msg_get(status, &args->err_msg);
+
+ return err;
+}
+
+#define CMIS_CDB_REPLY_OFFSET 0x86
+
+static int cmis_cdb_process_reply(struct net_device *dev,
+ struct ethtool_module_eeprom *page_data,
+ struct ethtool_cmis_cdb_cmd_args *args)
+{
+ u8 rpl_hdr_len = sizeof(struct ethtool_cmis_cdb_rpl_hdr);
+ u8 rpl_exp_len = args->rpl_exp_len + rpl_hdr_len;
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct netlink_ext_ack extack = {};
+ struct ethtool_cmis_cdb_rpl *rpl;
+ int err;
+
+ if (!args->rpl_exp_len)
+ return 0;
+
+ ethtool_cmis_page_init(page_data, ETHTOOL_CMIS_CDB_CMD_PAGE,
+ CMIS_CDB_REPLY_OFFSET, rpl_exp_len);
+ page_data->data = kmalloc(page_data->length, GFP_KERNEL);
+ if (!page_data->data)
+ return -ENOMEM;
+
+ err = ops->get_module_eeprom_by_page(dev, page_data, &extack);
+ if (err < 0) {
+ if (extack._msg)
+ netdev_err(dev, "%s\n", extack._msg);
+ goto out;
+ }
+
+ rpl = (struct ethtool_cmis_cdb_rpl *)page_data->data;
+ if ((args->rpl_exp_len > rpl->hdr.rpl_len + rpl_hdr_len) ||
+ !rpl->hdr.rpl_chk_code) {
+ err = -EIO;
+ goto out;
+ }
+
+ args->req.lpl_len = rpl->hdr.rpl_len;
+ memcpy(args->req.payload, rpl->payload, args->req.lpl_len);
+
+out:
+ kfree(page_data->data);
+ return err;
+}
+
+static int
+__ethtool_cmis_cdb_execute_cmd(struct net_device *dev,
+ struct ethtool_module_eeprom *page_data,
+ u8 page, u32 offset, u32 length, void *data)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ struct netlink_ext_ack extack = {};
+ int err;
+
+ ethtool_cmis_page_init(page_data, page, offset, length);
+ page_data->data = kmemdup(data, page_data->length, GFP_KERNEL);
+ if (!page_data->data)
+ return -ENOMEM;
+
+ err = ops->set_module_eeprom_by_page(dev, page_data, &extack);
+ if (err < 0) {
+ if (extack._msg)
+ netdev_err(dev, "%s\n", extack._msg);
+ }
+
+ kfree(page_data->data);
+ return err;
+}
+
+static u8 cmis_cdb_calc_checksum(const void *data, size_t size)
+{
+ const u8 *bytes = (const u8 *)data;
+ u8 checksum = 0;
+
+ for (size_t i = 0; i < size; i++)
+ checksum += bytes[i];
+
+ return ~checksum;
+}
+
+#define CMIS_CDB_CMD_ID_OFFSET 0x80
+
+int ethtool_cmis_cdb_execute_cmd(struct net_device *dev,
+ struct ethtool_cmis_cdb_cmd_args *args)
+{
+ struct ethtool_module_eeprom page_data = {};
+ u32 offset;
+ int err;
+
+ args->req.chk_code =
+ cmis_cdb_calc_checksum(&args->req, sizeof(args->req));
+
+ if (args->req.lpl_len > args->read_write_len_ext) {
+ args->err_msg = "LPL length is longer than CDB read write length extension allows";
+ return -EINVAL;
+ }
+
+ /* According to the CMIS standard, there are two options to trigger the
+ * CDB commands. The default option is triggering the command by writing
+ * the CMDID bytes. Therefore, the command will be split to 2 calls:
+ * First, with everything except the CMDID field and then the CMDID
+ * field.
+ */
+ offset = CMIS_CDB_CMD_ID_OFFSET +
+ offsetof(struct ethtool_cmis_cdb_request, body);
+ err = __ethtool_cmis_cdb_execute_cmd(dev, &page_data,
+ ETHTOOL_CMIS_CDB_CMD_PAGE, offset,
+ sizeof(args->req.body),
+ &args->req.body);
+ if (err < 0)
+ return err;
+
+ offset = CMIS_CDB_CMD_ID_OFFSET +
+ offsetof(struct ethtool_cmis_cdb_request, id);
+ err = __ethtool_cmis_cdb_execute_cmd(dev, &page_data,
+ ETHTOOL_CMIS_CDB_CMD_PAGE, offset,
+ sizeof(args->req.id),
+ &args->req.id);
+ if (err < 0)
+ return err;
+
+ err = cmis_cdb_wait_for_completion(dev, args);
+ if (err < 0)
+ return err;
+
+ err = cmis_cdb_wait_for_status(dev, args);
+ if (err < 0)
+ return err;
+
+ return cmis_cdb_process_reply(dev, &page_data, args);
+}
diff --git a/net/ethtool/module_fw.h b/net/ethtool/module_fw.h
index e40eae442741..96da7a8175f2 100644
--- a/net/ethtool/module_fw.h
+++ b/net/ethtool/module_fw.h
@@ -8,3 +8,13 @@ void ethnl_module_fw_flash_ntf_start(struct net_device *dev);
void ethnl_module_fw_flash_ntf_complete(struct net_device *dev);
void ethnl_module_fw_flash_ntf_in_progress(struct net_device *dev, u64 done,
u64 total);
+
+/**
+ * struct ethtool_module_fw_flash_params - module firmware flashing parameters
+ * @password: Module password. Only valid when @password_valid is set.
+ * @password_valid: Whether the module password is valid or not.
+ */
+struct ethtool_module_fw_flash_params {
+ __be32 password;
+ u8 password_valid:1;
+};
--
2.43.0
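The completion and status checks in the CDB layer above share one polling loop, ethtool_cmis_wait_for_cond(). The userspace sketch below models it with a caller-supplied read_byte() callback in place of get_module_eeprom_by_page(), and clock_gettime()/nanosleep() in place of jiffies and msleep(); struct script and script_read() are hypothetical test helpers. It differs from the patch in one detail: in the patch, a failed read hits continue, which in a do/while loop jumps straight to the condition and skips msleep(20), whereas this sketch sleeps on every iteration.

```c
#define _POSIX_C_SOURCE 199309L
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* CDB status byte (offset 25h on page 00h) encodings used by the patch:
 * 0x01 is success, bit 6 means failure, bit 7 means still busy.
 */
#define CDB_STATUS_SUCCESS	0x01
#define CDB_STATUS_FAIL		0x40

static bool status_success(uint8_t s) { return s == CDB_STATUS_SUCCESS; }
static bool status_fail(uint8_t s) { return (s & CDB_STATUS_FAIL) != 0; }

static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll read_byte() every 20 ms until ok() matches (returns 0), or until
 * fail() matches or max_ms elapses (both return -EBUSY and store the last
 * byte read in *last). fail may be NULL. max_ms == 0 means 65535 ms, as
 * in the patch.
 */
static int wait_for_cond(int (*read_byte)(void *ctx, uint8_t *val), void *ctx,
			 unsigned int max_ms, bool (*ok)(uint8_t),
			 bool (*fail)(uint8_t), uint8_t *last)
{
	const struct timespec delay = { 0, 20L * 1000000L };
	long long end;
	uint8_t val = 0;

	if (max_ms == 0)
		max_ms = 65535;
	end = now_ms() + max_ms;

	do {
		if (read_byte(ctx, &val) == 0) {
			if (ok(val))
				return 0;
			if (fail && fail(val))
				break;
		}
		/* Sleep even after a failed read so errors do not spin. */
		nanosleep(&delay, NULL);
	} while (now_ms() < end);

	*last = val;
	return -EBUSY;
}

/* Hypothetical scripted reader: returns the bytes in order, then keeps
 * returning the last one, so the loop can be exercised without hardware.
 */
struct script {
	const uint8_t *vals;
	unsigned int n, pos;
};

static int script_read(void *ctx, uint8_t *val)
{
	struct script *s = ctx;

	*val = s->vals[s->pos < s->n ? s->pos : s->n - 1];
	if (s->pos < s->n)
		s->pos++;
	return 0;
}
```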


2024-04-30 03:11:42

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Wed, 24 Apr 2024 16:30:17 +0300 Danielle Ratson wrote:
> + hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
> + if (!hdr)
> + goto err_skb;

Do we want to blast it to all listeners or treat it as an async reply?
We can save the seq and portid of the original requester and use reply,
I think.

> + ret = ethnl_fill_reply_header(skb, dev,
> + ETHTOOL_A_MODULE_FW_FLASH_HEADER);
> + if (ret < 0)
> + goto err_skb;
> +
> + if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
> + goto err_skb;
> +
> + if (status_msg &&
> + nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
> + status_msg))
> + goto err_skb;
> +
> + if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
> + ETHTOOL_A_MODULE_FW_FLASH_PAD))

nla_put_uint()

> + goto err_skb;
> +
> + if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
> + ETHTOOL_A_MODULE_FW_FLASH_PAD))

nla_put_uint()

> + goto err_skb;
> +
> + genlmsg_end(skb, hdr);
> + ethnl_multicast(skb, dev);
> + return;
> +
> +err_skb:
> + nlmsg_free(skb);
> +}
> +
> +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> + char *err_msg, char *sub_err_msg)
> +{
> + char status_msg[120];
> +
> + if (sub_err_msg)
> + sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> + else
> + sprintf(status_msg, "%s.", err_msg);

Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116
is a bit surprising. But I guess you have a reason...

Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
nla_reserve() the right amount of space and sprintf() directly into the
skb?

> + ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
> + status_msg, 0, 0);
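On the nla_put_uint() comments: as I understand the helper, it writes a 4-byte payload when the value fits in 32 bits and an 8-byte payload otherwise. Unlike nla_put_u64_64bit(), it takes no pad attribute, so ETHTOOL_A_MODULE_FW_FLASH_PAD would no longer be needed for these two counters. The user-space sketch below mimics that encoding into a plain byte buffer. demo_nlattr, demo_nla_put() and demo_nla_put_uint() are hypothetical stand-ins, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for struct nlattr: a 16-bit length that includes this 4-byte
 * header, followed by a 16-bit attribute type. */
struct demo_nlattr {
	uint16_t nla_len;
	uint16_t nla_type;
};

#define DEMO_NLA_HDRLEN ((size_t)sizeof(struct demo_nlattr))
#define DEMO_NLA_ALIGN(len) (((len) + 3) & ~(size_t)3)

/* Append one attribute at *off; returns 0, or -1 if it does not fit. */
static int demo_nla_put(unsigned char *buf, size_t cap, size_t *off,
			uint16_t type, const void *data, size_t len)
{
	struct demo_nlattr hdr;
	size_t total = DEMO_NLA_ALIGN(DEMO_NLA_HDRLEN + len);

	assert(buf && off && data);
	if (*off > cap || total > cap - *off)
		return -1;
	hdr.nla_len = (uint16_t)(DEMO_NLA_HDRLEN + len);
	hdr.nla_type = type;
	memset(buf + *off, 0, total);	/* zero the 4-byte alignment padding */
	memcpy(buf + *off, &hdr, sizeof(hdr));
	memcpy(buf + *off + DEMO_NLA_HDRLEN, data, len);
	*off += total;
	return 0;
}

/* Mimics the idea behind nla_put_uint(): 4-byte payload when the value
 * fits in 32 bits, 8-byte payload otherwise, and no pad attribute. */
static int demo_nla_put_uint(unsigned char *buf, size_t cap, size_t *off,
			     uint16_t type, uint64_t value)
{
	uint32_t v32 = (uint32_t)value;

	if ((uint64_t)v32 == value)
		return demo_nla_put(buf, cap, off, type, &v32, sizeof(v32));
	return demo_nla_put(buf, cap, off, type, &value, sizeof(value));
}
```

For typical "done"/"total" byte counts the attribute stays at 8 bytes on the wire, and it only grows to 12 bytes when a value really needs 64 bits.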

2024-04-30 03:24:10

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 09/10] ethtool: Add ability to flash transceiver modules' firmware

On Wed, 24 Apr 2024 16:30:22 +0300 Danielle Ratson wrote:
> +static int
> +module_flash_fw_schedule(struct net_device *dev, const char *file_name,
> + struct ethtool_module_fw_flash_params *params,
> + struct netlink_ext_ack *extack)
> +{
> + struct ethtool_module_fw_flash *module_fw;
> + int err;
> +
> + err = __module_flash_fw_schedule(dev, extack);
> + if (err < 0)
> + return err;

Basic dev validation should probably be called directly from
ethnl_act_module_fw_flash() rather than two functions down.

> + module_fw = kzalloc(sizeof(*module_fw), GFP_KERNEL);
> + if (!module_fw)
> + return -ENOMEM;
> +
> + module_fw->params = *params;
> + err = request_firmware_direct(&module_fw->fw, file_name, &dev->dev);
> + if (err) {
> + NL_SET_ERR_MSG(extack,
> + "Failed to request module firmware image");
> + goto err_request_firmware;

Please name the labels after the actions they perform.

> + }
> +
> + err = module_flash_fw_work_init(module_fw, dev, extack);
> + if (err < 0) {
> + NL_SET_ERR_MSG(extack,
> + "Flashing module firmware is not supported by this device");

This overwrites the more accurate extack msg already set by
module_flash_fw_work_init()

> + goto err_work_init;
> + }
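The review points on this hunk are cleanup labels named after the action they undo and not overwriting an extack message that the callee has already set. The small user-space sketch below shows the same unwind shape with those changes applied. Everything prefixed demo_ is hypothetical, and the fail flags only simulate errors. The real code would use NL_SET_ERR_MSG(), request_firmware_direct()/release_firmware() and kzalloc()/kfree() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct demo_extack { const char *msg; };	/* stand-in for netlink_ext_ack */
struct demo_module_fw { int fw_held; };		/* stand-in for the flash context */

static int demo_fw_refs;	/* firmware images currently held */

/* Hypothetical helpers; 'fail' only simulates an error. */
static int demo_request_fw(struct demo_module_fw *m, int fail)
{
	if (fail)
		return -ENOENT;
	m->fw_held = 1;
	demo_fw_refs++;
	return 0;
}

static void demo_release_fw(struct demo_module_fw *m)
{
	m->fw_held = 0;
	demo_fw_refs--;
}

/* Plays the role of module_flash_fw_work_init(): sets its own, more
 * specific extack message on failure. */
static int demo_work_init(struct demo_extack *extack, int fail)
{
	if (fail) {
		extack->msg = "Device does not support module flashing";
		return -EOPNOTSUPP;
	}
	return 0;
}

static int demo_flash_fw_schedule(struct demo_extack *extack, int fail_request,
				  int fail_init, struct demo_module_fw **out)
{
	struct demo_module_fw *m;
	int err;

	assert(extack && out);
	*out = NULL;

	m = calloc(1, sizeof(*m));
	if (!m)
		return -ENOMEM;

	err = demo_request_fw(m, fail_request);
	if (err) {
		extack->msg = "Failed to request module firmware image";
		goto err_free;
	}

	err = demo_work_init(extack, fail_init);
	if (err)
		goto err_release_fw;	/* keep the callee's extack message */

	*out = m;	/* ownership passes to the (simulated) work item */
	return 0;

err_release_fw:
	demo_release_fw(m);
err_free:
	free(m);
	return err;
}
```

Each label says what it undoes, so the unwind order can be checked against the setup order at a glance.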

2024-04-30 03:33:18

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 10/10] ethtool: Veto some operations during firmware flashing process

On Wed, 24 Apr 2024 16:30:23 +0300 Danielle Ratson wrote:
> Some operations cannot be performed during the firmware flashing process.
>
> For example:
>
> - Port must be down during the whole flashing process to avoid packet loss
> while committing reset for example.
>
> - Writing to EEPROM interrupts the flashing process, so operations like
> ethtool dump, module reset, get and set power mode should be vetoed.
>
> - Split port firmware flashing should be vetoed.
>
> - Flashing firmware on a device which is already in a flashing process
> should be forbidden.
>
> Use the 'module_fw_flashing_in_progress' flag introduced in a previous
> patch to veto those operations and prevent interruptions while performing
> module firmware flash.

Feels a little out of order to add this check after the functionality.
I'd merge this with patch 5.
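The veto described in the commit message can be sketched in user space as follows. The sketch assumes a single per-device boolean modeled on the 'module_fw_flashing_in_progress' flag. Every operation that could disturb the flashing checks that flag first, and a second flash request is refused. The struct, the function names and the error codes (-EBUSY, -EINVAL) are illustrative assumptions, not the patch's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-device state; the patch set's real flag lives in a
 * structure not shown in this thread. */
struct demo_dev {
	bool module_fw_flashing_in_progress;
	bool admin_up;
};

/* Shared veto check for operations that would interrupt the flashing. */
static int demo_veto_if_flashing(const struct demo_dev *dev)
{
	assert(dev);
	return dev->module_fw_flashing_in_progress ? -EBUSY : 0;
}

/* E.g. set power mode: writing the module EEPROM would break the flash. */
static int demo_set_power_mode(struct demo_dev *dev)
{
	int err = demo_veto_if_flashing(dev);

	if (err)
		return err;
	/* ... write the module EEPROM ... */
	return 0;
}

/* The port must stay down for the whole flashing process. */
static int demo_set_admin_up(struct demo_dev *dev)
{
	int err = demo_veto_if_flashing(dev);

	if (err)
		return err;
	dev->admin_up = true;
	return 0;
}

/* Starting a flash: no flash already running, and the port is down. */
static int demo_start_flash(struct demo_dev *dev)
{
	int err = demo_veto_if_flashing(dev);

	if (err)
		return err;
	if (dev->admin_up)
		return -EINVAL;
	dev->module_fw_flashing_in_progress = true;
	return 0;
}
```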

2024-04-30 17:51:03

by Danielle Ratson

Subject: RE: [PATCH net-next v5 10/10] ethtool: Veto some operations during firmware flashing process

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Tuesday, 30 April 2024 6:24
> To: Danielle Ratson <[email protected]>
> Subject: Re: [PATCH net-next v5 10/10] ethtool: Veto some operations during
> firmware flashing process
>
> On Wed, 24 Apr 2024 16:30:23 +0300 Danielle Ratson wrote:
> > Some operations cannot be performed during the firmware flashing process.
> >
> > For example:
> >
> > - Port must be down during the whole flashing process to avoid packet loss
> > while committing reset for example.
> >
> > - Writing to EEPROM interrupts the flashing process, so operations like
> > ethtool dump, module reset, get and set power mode should be vetoed.
> >
> > - Split port firmware flashing should be vetoed.
> >
> > - Flashing firmware on a device which is already in a flashing process
> > should be forbidden.
> >
> > Use the 'module_fw_flashing_in_progress' flag introduced in a previous
> > patch to veto those operations and prevent interruptions while
> > performing module firmware flash.
>
> Feels a little out of order to add this check after the functionality.
> I'd merge this with patch 5.

Hi Jakub,

Some of this code is only introduced in patch 9, so merging would split the vetoes: some would go into patch 5 and the rest into the patches that introduce the surrounding code.
Does that sound reasonable to you?

Thanks,
Danielle

2024-04-30 18:01:25

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 10/10] ethtool: Veto some operations during firmware flashing process

On Tue, 30 Apr 2024 17:48:17 +0000 Danielle Ratson wrote:
> Some of this code is only introduced in patch 9, so merging would split
> the vetoes: some would go into patch 5 and the rest into the patches
> that introduce the surrounding code. Does that sound reasonable to you?

Yup! Thanks sounds good.

2024-04-30 18:11:44

by Danielle Ratson

Subject: RE: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Tuesday, 30 April 2024 6:12
> To: Danielle Ratson <[email protected]>
> Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver
> modules' firmware notifications ability
>
> On Wed, 24 Apr 2024 16:30:17 +0300 Danielle Ratson wrote:
> > + hdr = ethnl_bcastmsg_put(skb, ETHTOOL_MSG_MODULE_FW_FLASH_NTF);
> > + if (!hdr)
> > + goto err_skb;
>
> Do we want to blast it to all listeners or treat it as an async reply?
> We can save the seq and portid of the original requester and use reply, I
> think.

I am sorry, I am not sure I understood what you meant here... it should be an async reply, but not sure I understood your suggestion.
Can you explain please?
Thanks!

>
> > + ret = ethnl_fill_reply_header(skb, dev,
> > + ETHTOOL_A_MODULE_FW_FLASH_HEADER);
> > + if (ret < 0)
> > + goto err_skb;
> > +
> > + if (nla_put_u32(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS, status))
> > + goto err_skb;
> > +
> > + if (status_msg &&
> > + nla_put_string(skb, ETHTOOL_A_MODULE_FW_FLASH_STATUS_MSG,
> > + status_msg))
> > + goto err_skb;
> > +
> > + if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_DONE, done,
> > + ETHTOOL_A_MODULE_FW_FLASH_PAD))
>
> nla_put_uint()
>
> > + goto err_skb;
> > +
> > + if (nla_put_u64_64bit(skb, ETHTOOL_A_MODULE_FW_FLASH_TOTAL, total,
> > + ETHTOOL_A_MODULE_FW_FLASH_PAD))
>
> nla_put_uint()
>
> > + goto err_skb;
> > +
> > + genlmsg_end(skb, hdr);
> > + ethnl_multicast(skb, dev);
> > + return;
> > +
> > +err_skb:
> > + nlmsg_free(skb);
> > +}
> > +
> > +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> > + char *err_msg, char *sub_err_msg)
> > +{
> > + char status_msg[120];
> > +
> > + if (sub_err_msg)
> > + sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> > + else
> > + sprintf(status_msg, "%s.", err_msg);
>
> Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116 is a bit
> surprising. But I guess you have a reason...
>
> Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
> nla_reserve() the right amount of space and sprintf() directly into the skb?

I can get rid of the dot actually, would it be ok like that?

>
> > + ethnl_module_fw_flash_ntf(dev, ETHTOOL_MODULE_FW_FLASH_STATUS_ERROR,
> > + status_msg, 0, 0);

2024-04-30 20:13:14

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Tue, 30 Apr 2024 18:11:18 +0000 Danielle Ratson wrote:
> > Do we want to blast it to all listeners or treat it as an async reply?
> > We can save the seq and portid of the original requester and use reply, I
> > think.
>
> I am sorry, I am not sure I understood what you meant here... it
> should be an async reply, but not sure I understood your suggestion.
>
> Can you explain please?

Make sure you have read the netlink intro, it should help fill in some
gaps I won't explicitly cover:
https://docs.kernel.org/next/userspace-api/netlink/intro.html

"True" notifications will have pid = 0 and seq = 0, while replies to
commands have those fields populated based on the request.

pid identifies the socket where the message should be delivered.
ethnl_multicast() assumes that it's zero (since it's designed to work
for notifications) and sends the message to all sockets subscribed to
a multicast / notification group (ETHNL_MCGRP_MONITOR).

So that's the background. What you're doing isn't incorrect but I think
it'd be better if we didn't use the multicast group here, and sent the
messages as a reply - just to the socket which requested the flashing.
Still asynchronously, we just need to save the right pid and seq to use.

Two reasons for this:
1) convenience, the user space socket won't have to subscribe to
the multicast group
2) the multicast group is really intended for notifying about
_configuration changes_ done to the system. If there is a daemon
listening on that group, there's a very high chance it won't care
about progress of the flashing. Maybe we can send a single
notification that flashing has been completed but not "progress
updates"

I think it should work.
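The following user-space sketch illustrates the distinction drawn above. It assumes that the flashing context saves the requester's port ID and sequence number. In the kernel, struct genl_info carries these as snd_portid and snd_seq. A saved requester turns each progress update into an asynchronous reply addressed to that socket. Without one, the message is a "true" notification with pid = 0 and seq = 0 for the monitor group. demo_nlmsghdr mirrors only the two addressing fields of struct nlmsghdr, and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in mirroring the addressing fields of struct nlmsghdr. */
struct demo_nlmsghdr {
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

enum demo_dest { DEMO_MULTICAST, DEMO_UNICAST };

/* Hypothetical state saved when the flash command arrives. */
struct demo_flash_req {
	uint32_t portid;
	uint32_t seq;
};

/* Address a progress message: a saved requester makes it an async reply
 * to that socket; no requester makes it a multicast notification. */
static enum demo_dest demo_address(struct demo_nlmsghdr *hdr,
				   const struct demo_flash_req *req)
{
	assert(hdr);
	if (!req) {
		hdr->nlmsg_pid = 0;
		hdr->nlmsg_seq = 0;
		return DEMO_MULTICAST;
	}
	hdr->nlmsg_pid = req->portid;
	hdr->nlmsg_seq = req->seq;
	return DEMO_UNICAST;
}
```

With this split, the requesting socket does not need to join ETHNL_MCGRP_MONITOR, and monitor daemons only see what is still multicast (for example, a single completion message).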

> > > +void ethnl_module_fw_flash_ntf_err(struct net_device *dev,
> > > + char *err_msg, char *sub_err_msg)
> > > +{
> > > + char status_msg[120];
> > > +
> > > + if (sub_err_msg)
> > > + sprintf(status_msg, "%s, %s.", err_msg, sub_err_msg);
> > > + else
> > > + sprintf(status_msg, "%s.", err_msg);
> >
> > Hm, printing in the dot, and assuming sizeof err_msg + sub_err < 116 is a bit
> > surprising. But I guess you have a reason...
> >
> > Maybe pass them separately to ethnl_module_fw_flash_ntf() then you can
> > nla_reserve() the right amount of space and sprintf() directly into the skb?
>
> I can get rid of the dot actually, would it be ok like that?

It'd still be better to splice the two strings and the comma directly
to the skb, rather than on the stack using a function which doesn't
check the bounds of the buffer :S
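The following user-space sketch shows one way to read this suggestion. First, compute the exact length of "err_msg, sub_err_msg", including the NUL terminator, as nla_put_string() does. Next, reserve that many bytes in the message (nla_reserve() in the kernel). Then format with a bounded snprintf() straight into the reserved space instead of into a fixed 120-byte stack buffer. demo_msg and demo_reserve() are hypothetical stand-ins for the skb and nla_reserve(), and the trailing dot is dropped, as Danielle offered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for an skb: a fixed buffer plus a fill level. */
struct demo_msg {
	char data[256];
	size_t len;
};

/* Stand-in for nla_reserve(): hand out len bytes of tailroom, or NULL. */
static char *demo_reserve(struct demo_msg *msg, size_t len)
{
	char *p;

	assert(msg);
	if (len > sizeof(msg->data) - msg->len)
		return NULL;
	p = msg->data + msg->len;
	msg->len += len;
	return p;
}

/* Write "err_msg" or "err_msg, sub_err_msg" directly into the message,
 * reserving exactly the bytes needed plus the NUL terminator. */
static int demo_put_status_msg(struct demo_msg *msg, const char *err_msg,
			       const char *sub_err_msg)
{
	size_t len;
	char *p;

	assert(msg && err_msg);
	len = strlen(err_msg) + 1;
	if (sub_err_msg)
		len += strlen(", ") + strlen(sub_err_msg);

	p = demo_reserve(msg, len);
	if (!p)
		return -1;	/* the caller would free the message */

	if (sub_err_msg)
		snprintf(p, len, "%s, %s", err_msg, sub_err_msg);
	else
		snprintf(p, len, "%s", err_msg);
	return 0;
}
```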

2024-05-01 07:54:45

by Ido Schimmel

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Tue, Apr 30, 2024 at 01:03:02PM -0700, Jakub Kicinski wrote:
> On Tue, 30 Apr 2024 18:11:18 +0000 Danielle Ratson wrote:
> > > Do we want to blast it to all listeners or treat it as an async reply?
> > > We can save the seq and portid of the original requester and use reply, I
> > > think.
> >
> > I am sorry, I am not sure I understood what you meant here... it
> > should be an async reply, but not sure I understood your suggestion.
> >
> > Can you explain please?
>
> Make sure you have read the netlink intro, it should help fill in some
> gaps I won't explicitly cover:
> https://docs.kernel.org/next/userspace-api/netlink/intro.html
>
> "True" notifications will have pid = 0 and seq = 0, while replies to
> commands have those fields populated based on the request.
>
> pid identifies the socket where the message should be delivered.
> ethnl_multicast() assumes that it's zero (since it's designed to work
> for notifications) and sends the message to all sockets subscribed to
> a multicast / notification group (ETHNL_MCGRP_MONITOR).
>
> So that's the background. What you're doing isn't incorrect but I think
> it'd be better if we didn't use the multicast group here, and sent the
> messages as a reply - just to the socket which requested the flashing.
> Still asynchronously, we just need to save the right pid and seq to use.
>
> Two reasons for this:
> 1) convenience, the user space socket won't have to subscribe to
> the multicast group
> 2) the multicast group is really intended for notifying about
> _configuration changes_ done to the system. If there is a daemon
> listening on that group, there's a very high chance it won't care
> about progress of the flashing. Maybe we can send a single
> notification that flashing has been completed but not "progress
> updates"
>
> I think it should work.

We can try to use unicast, but the current design is influenced by
devlink firmware flash (see __devlink_flash_update_notify()) and ethtool
cable testing (see ethnl_cable_test_started() and
ethnl_cable_test_finished()), both of which use multicast notifications
although the latter does not update about progress.

Do you want us to try the unicast approach or be consistent with the
above examples?

2024-05-01 14:38:12

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Wed, 1 May 2024 10:53:48 +0300 Ido Schimmel wrote:
> We can try to use unicast, but the current design is influenced by
> devlink firmware flash (see __devlink_flash_update_notify()) and ethtool
> cable testing (see ethnl_cable_test_started() and
> ethnl_cable_test_finished()), both of which use multicast notifications
> although the latter does not update about progress.
>
> Do you want us to try the unicast approach or be consistent with the
> above examples?

We are charting a bit of a new territory here, you're right that
the precedents point in the direction of multicast.
The unicast is harder to get done on the kernel side (we should
probably also check that the socket pid didn't get reused, stop
sending the notifications when original socket gets closed?)
It will require using pretty much all the pieces of advanced
netlink infra we have, I'm happy to explain more, but I'll also
understand if you prefer to stick to multicast.

2024-05-22 13:09:02

by Danielle Ratson

Subject: RE: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: Wednesday, 1 May 2024 17:38
> To: Ido Schimmel <[email protected]>
> Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver
> modules' firmware notifications ability
>
> On Wed, 1 May 2024 10:53:48 +0300 Ido Schimmel wrote:
> > We can try to use unicast, but the current design is influenced by
> > devlink firmware flash (see __devlink_flash_update_notify()) and
> > ethtool cable testing (see ethnl_cable_test_started() and
> > ethnl_cable_test_finished()), both of which use multicast
> > notifications although the latter does not update about progress.
> >
> > Do you want us to try the unicast approach or be consistent with the
> > above examples?
>
> We are charting a bit of a new territory here, you're right that the precedents
> point in the direction of multicast.
> The unicast is harder to get done on the kernel side (we should probably also
> check that the socket pid didn't get reused, stop sending the notifications
> when original socket gets closed?) It will require using pretty much all the
> pieces of advanced netlink infra we have, I'm happy to explain more, but I'll
> also understand if you prefer to stick to multicast.

Hi Jakub,

Following our discussion, I wanted to see if you are ok with the idea below:

1. Add a new unicast function to netlink.c:
void *ethnl_unicast_put(struct sk_buff *skb, u32 portid, u32 seq, u8 cmd)

2. Use it, along with genlmsg_unicast(), in the notification function instead of the previously used multicast.
'portid' and 'seq', taken from struct genl_info, are added to struct ethtool_module_fw_flash, which is accessible from the work item.

3. Create a global list that holds nodes of type struct ethtool_module_fw_flash, and add a list node field to struct ethtool_module_fw_flash.
Before scheduling the work item, a new node is added to the list.

4. Add a new netlink notifier that, when the relevant event takes place, deletes the node from the list, cancels the work item with cancel_work_sync() (waiting for it to finish if it is already running), and frees the allocations.
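The list handling in steps 3 and 4 can be sketched in user space as below. The sketch assumes one node per flashing request that records the device (as an ifindex) and the requester's portid and seq. The release handler unlinks every node owned by the released port ID. Where the kernel version would take a lock around the global list and call cancel_work_sync(), the sketch only leaves comments. All demo_ names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ethtool_module_fw_flash with the
 * fields added by the plan: requester portid/seq and a list linkage. */
struct demo_fw_flash {
	int ifindex;		/* identifies the net_device */
	uint32_t portid;	/* requesting socket */
	uint32_t seq;
	struct demo_fw_flash *next;
};

/* Global list; the kernel version needs a lock around it. */
static struct demo_fw_flash *demo_flash_list;

/* Step 3: add a node before the work item is scheduled. */
static struct demo_fw_flash *demo_flash_add(int ifindex, uint32_t portid,
					    uint32_t seq)
{
	struct demo_fw_flash *n;

	assert(portid != 0);	/* port ID 0 is the kernel, never a requester */
	n = calloc(1, sizeof(*n));
	if (!n)
		return NULL;
	n->ifindex = ifindex;
	n->portid = portid;
	n->seq = seq;
	n->next = demo_flash_list;
	demo_flash_list = n;
	return n;
}

/* Step 4: on socket release, unlink and free every node owned by portid.
 * Returns the number of nodes removed. */
static int demo_flash_release(uint32_t portid)
{
	struct demo_fw_flash **pp = &demo_flash_list;
	int removed = 0;

	while (*pp) {
		struct demo_fw_flash *n = *pp;

		if (n->portid == portid) {
			*pp = n->next;
			/* kernel: cancel_work_sync() on the node's work here */
			free(n);
			removed++;
		} else {
			pp = &n->next;
		}
	}
	return removed;
}
```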

Thanks,
Danielle

2024-05-22 13:45:37

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Wed, 22 May 2024 13:08:43 +0000 Danielle Ratson wrote:
> 1. Add a new unicast function to netlink.c:
> void *ethnl_unicast_put(struct sk_buff *skb, u32 portid, u32 seq, u8 cmd)
>
> 2. Use it, along with genlmsg_unicast(), in the notification function instead of the previously used multicast.
> 'portid' and 'seq', taken from struct genl_info, are added to struct ethtool_module_fw_flash, which is accessible from the work item.
>
> 3. Create a global list that holds nodes of type struct ethtool_module_fw_flash, and add a list node field to struct ethtool_module_fw_flash.
> Before scheduling the work item, a new node is added to the list.

Makes sense.

> 4. Add a new netlink notifier that, when the relevant event takes place, deletes the node from the list, cancels the work item with cancel_work_sync() (waiting for it to finish if it is already running), and frees the allocations.

What's the "relevant event" in this case? Closing of the socket that
user had issued the command on?

Easiest way to "notice" the socket got closed would probably be to
add some info to genl_sk_priv_*(). ->sock_priv_destroy() will get
called. But you can also get a close notification in the family
->unbind callback.

I'm on the fence whether we should cancel the work. We could just
mark the command as 'no socket present' and stop sending notifications.
Not sure which is better..
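The following user-space sketch shows the lighter option mentioned above. It assumes per-socket private data, managed through the genl_sk_priv_*() helpers, whose destroy callback only marks the job as having no socket. The work item therefore keeps flashing but stops sending notifications. The types and functions are hypothetical stand-ins. A real implementation also needs locking between the callback and the work item, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flashing job state shared with the work item. */
struct demo_flash_job {
	bool sk_present;	/* cleared when the requesting socket goes away */
	unsigned int sent;	/* progress notifications actually sent */
	unsigned int done, total;
};

/* Hypothetical per-socket private data pointing at the socket's job. */
struct demo_sk_priv {
	struct demo_flash_job *job;
};

/* Stand-in for the family's ->sock_priv_destroy() callback: do not cancel
 * the flashing, just stop addressing notifications to a dead socket. */
static void demo_sock_priv_destroy(struct demo_sk_priv *priv)
{
	assert(priv);
	if (priv->job)
		priv->job->sk_present = false;
	priv->job = NULL;
}

/* One step of the work item: flashing continues regardless of the socket. */
static void demo_flash_step(struct demo_flash_job *job)
{
	assert(job);
	if (job->done < job->total)
		job->done++;
	if (job->sk_present)
		job->sent++;	/* would be the unicast progress notification */
}
```

Because the job is never cancelled mid-flash, the module is not left half-written when user space dies. Only the notifications stop, which also avoids delivering them to a new socket that reuses the old port ID.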

2024-05-22 13:59:41

by Danielle Ratson

Subject: RE: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

> > 1. Add a new unicast function to netlink.c:
> > void *ethnl_unicast_put(struct sk_buff *skb, u32 portid, u32 seq, u8 cmd)
> >
> > 2. Use it, along with genlmsg_unicast(), in the notification function instead of the previously used multicast.
> > 'portid' and 'seq', taken from struct genl_info, are added to struct ethtool_module_fw_flash, which is accessible from the work item.
> >
> > 3. Create a global list that holds nodes of type struct ethtool_module_fw_flash, and add a list node field to struct ethtool_module_fw_flash.
> > Before scheduling the work item, a new node is added to the list.
>
> Makes sense.
>
> > 4. Add a new netlink notifier that, when the relevant event takes place, deletes the node from the list, cancels the work item with cancel_work_sync() (waiting for it to finish if it is already running), and frees the allocations.
>
> What's the "relevant event" in this case? Closing of the socket that user had
> issued the command on?

The event should match the below:
event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC

Then iterate over the list to look for work that matches the dev and portid.
The socket doesn’t close until the work is done in that case.

>
> Easiest way to "notice" the socket got closed would probably be to add some
> info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> get a close notification in the family
> ->unbind callback.
>
> I'm on the fence whether we should cancel the work. We could just mark the
> command as 'no socket present' and stop sending notifications.
> Not sure which is better..

Is there a scenario that we hit this event and won't intend to cancel the work?

Thanks,
Danielle

2024-05-22 14:25:02

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > 4. Add a new netlink notifier that, when the relevant event takes place,
> > > deletes the node from the list, cancels the work item with cancel_work_sync()
> > > (waiting for it to finish if it is already running), and frees the allocations.
> >
> > What's the "relevant event" in this case? Closing of the socket that user had
> > issued the command on?
>
> The event should match the below:
> event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
>
> Then iterate over the list to look for work that matches the dev and portid.
> The socket doesn’t close until the work is done in that case.

Okay, good, yes. I think you can use one of the callbacks I mentioned
below to achieve the same thing with less complexity than the notifier.

> > Easiest way to "notice" the socket got closed would probably be to add some
> > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > get a close notification in the family
> > ->unbind callback.
> >
> > I'm on the fence whether we should cancel the work. We could just mark the
> > command as 'no socket present' and stop sending notifications.
> > Not sure which is better..
>
> Is there a scenario that we hit this event and won't intend to cancel the work?

I think it's up to us. I don't see any legit reason for user space to
intentionally cancel the flashing. So the only option is that user space
is either buggy or has crashed, and the socket got closed before
flashing finished. Right?

2024-05-27 16:18:45

by Ido Schimmel

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Wed, May 22, 2024 at 07:22:12AM -0700, Jakub Kicinski wrote:
> On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > > 4. Add a new netlink notifier that, when the relevant event takes place,
> > > > deletes the node from the list, cancels the work item with cancel_work_sync()
> > > > (waiting for it to finish if it is already running), and frees the allocations.
> > >
> > > What's the "relevant event" in this case? Closing of the socket that user had
> > > issued the command on?
> >
> > The event should match the below:
> > event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> >
> > Then iterate over the list to look for work that matches the dev and portid.
> > The socket doesn’t close until the work is done in that case.
>
> Okay, good, yes. I think you can use one of the callbacks I mentioned
> below to achieve the same thing with less complexity than the notifier.

Danielle already has a POC with the notifier and it's not that
complicated. I wasn't aware of the netlink notifier, but we found it
when we tried to understand how other netlink families get notified
about a socket being closed.

Which advantages do you see in the sock_priv_destroy() approach? Are you
against the notifier approach?

> > > Easiest way to "notice" the socket got closed would probably be to add some
> > > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > > get a close notification in the family
> > > ->unbind callback.

Isn't the unbind callback only for multicast (whereas we are using
unicast)?

> > >
> > > I'm on the fence whether we should cancel the work. We could just mark the
> > > command as 'no socket present' and stop sending notifications.
> > > Not sure which is better..
> >
> > Is there a scenario that we hit this event and won't intend to cancel the work?
>
> I think it's up to us. I don't see any legit reason for user space to
> intentionally cancel the flashing. So the only option is that user space
> is either buggy or has crashed, and the socket got closed before
> flashing finished. Right?

We don't think that closing the socket / killing the process mid
flashing is a legitimate scenario. We looked into it in order to avoid
sending unicast notifications to a socket that did not ask for them but
gets them because it was bound to the port ID that was used by the old
socket.

I agree that we don't need to cancel the work and can simply have the
work item stop sending notifications. User space will get an error if it
tries to flash a module that is already being flashed in the background.
WDYT?

2024-05-27 16:31:49

by Jakub Kicinski

Subject: Re: [PATCH net-next v5 04/10] ethtool: Add flashing transceiver modules' firmware notifications ability

On Mon, 27 May 2024 19:10:55 +0300 Ido Schimmel wrote:
> On Wed, May 22, 2024 at 07:22:12AM -0700, Jakub Kicinski wrote:
> > On Wed, 22 May 2024 13:56:11 +0000 Danielle Ratson wrote:
> > > The event should match the below:
> > > event == NETLINK_URELEASE && notify->protocol == NETLINK_GENERIC
> > >
> > > Then iterate over the list to look for work that matches the dev and portid.
> > > The socket doesn’t close until the work is done in that case.
> >
> > Okay, good, yes. I think you can use one of the callbacks I mentioned
> > below to achieve the same thing with less complexity than the notifier.
>
> Danielle already has a POC with the notifier and it's not that
> complicated. I wasn't aware of the netlink notifier, but we found it
> when we tried to understand how other netlink families get notified
> about a socket being closed.
>
> Which advantages do you see in the sock_priv_destroy() approach? Are you
> against the notifier approach?

Notifier is not incorrect, but I worry it will result in more code,
and basically duplication of what genl_sk_priv* does. Perhaps you
managed to code it up very neatly - if so feel free to send the v6
and we can discuss further if needed?

> > > > Easiest way to "notice" the socket got closed would probably be to add some
> > > > info to genl_sk_priv_*(). ->sock_priv_destroy() will get called. But you can also
> > > > get a close notification in the family
> > > > ->unbind callback.
>
> Isn't the unbind callback only for multicast (whereas we are using
> unicast)?

True, should work in practice, I think. But sock_priv is much better.

> > > Is there a scenario that we hit this event and won't intend to cancel the work?
> >
> > I think it's up to us. I don't see any legit reason for user space to
> > intentionally cancel the flashing. So the only option is that user space
> > is either buggy or has crashed, and the socket got closed before
> > flashing finished. Right?
>
> We don't think that closing the socket / killing the process mid
> flashing is a legitimate scenario. We looked into it in order to avoid
> sending unicast notifications to a socket that did not ask for them but
> gets them because it was bound to the port ID that was used by the old
> socket.
>
> I agree that we don't need to cancel the work and can simply have the
> work item stop sending notifications. User space will get an error if it
> tries to flash a module that is already being flashed in the background.
> WDYT?

SGTM!