2021-10-03 18:14:05

by Leon Romanovsky

[permalink] [raw]
Subject: [PATCH net-next v2 0/5] devlink reload simplification

From: Leon Romanovsky <[email protected]>

Changelog:
v2:
* Dropped const removal patch
* Added new patch to hide struct devlink
* Added new patch to annotate devlink API
* Implemented copy of all callback in devlink ops
v1: https://lore.kernel.org/all/[email protected]
* Missed removal of extra WARN_ON
* Added "ops parameter to macro as Dan suggested.
v0: https://lore.kernel.org/all/[email protected]

-------------------------------------------------------------------
Hi,

This series fixes the bug with mlx5 device, which in some configurations
doesn't support devlink reload and shouldn't have any reload statistics
like any other net device. Unfortunately, it is not the case in the
current implementation of devlink reload.

This fix is done by simplification of internal API.

Thanks

Leon Romanovsky (5):
devlink: Reduce struct devlink exposure
devlink: Annotate devlink API calls
devlink: Allow set specific ops callbacks dynamically
net/mlx5: Register separate reload devlink ops for multiport device
devlink: Delete reload enable/disable interface

.../hisilicon/hns3/hns3pf/hclge_devlink.c | 3 -
.../hisilicon/hns3/hns3vf/hclgevf_devlink.c | 3 -
drivers/net/ethernet/mellanox/mlx4/main.c | 2 -
.../net/ethernet/mellanox/mlx5/core/devlink.c | 13 +-
.../net/ethernet/mellanox/mlx5/core/main.c | 3 -
.../mellanox/mlx5/core/sf/dev/driver.c | 5 +-
drivers/net/ethernet/mellanox/mlxfw/mlxfw.h | 2 +-
drivers/net/ethernet/mellanox/mlxsw/core.c | 10 +-
drivers/net/netdevsim/dev.c | 3 -
include/net/devlink.h | 57 +--
include/trace/events/devlink.h | 72 ++--
net/core/devlink.c | 390 ++++++++++++------
12 files changed, 317 insertions(+), 246 deletions(-)

--
2.31.1


2021-10-03 18:14:18

by Leon Romanovsky

[permalink] [raw]
Subject: [PATCH net-next v2 1/5] devlink: Reduce struct devlink exposure

From: Leon Romanovsky <[email protected]>

The declaration of struct devlink in general header provokes the
situation where internal fields can be accidentally used by the driver
authors. In order to reduce such possible situations, let's reduce the
namespace exposure of struct devlink.

Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/net/ethernet/mellanox/mlxfw/mlxfw.h | 2 +-
include/net/devlink.h | 54 ++--------------
include/trace/events/devlink.h | 72 ++++++++++-----------
net/core/devlink.c | 59 +++++++++++++++++
4 files changed, 100 insertions(+), 87 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h b/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
index 7654841a05c2..e6475ea77cd1 100644
--- a/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
+++ b/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
@@ -19,7 +19,7 @@ struct mlxfw_dev {
static inline
struct device *mlxfw_dev_dev(struct mlxfw_dev *mlxfw_dev)
{
- return mlxfw_dev->devlink->dev;
+ return devlink_to_dev(mlxfw_dev->devlink);
}

#define MLXFW_PRFX "mlxfw: "
diff --git a/include/net/devlink.h b/include/net/devlink.h
index a7852a257bf6..ae03eb1c6cc9 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -21,45 +21,7 @@
#include <linux/xarray.h>
#include <linux/firmware.h>

-#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \
- (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX)
-
-struct devlink_dev_stats {
- u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
- u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
-};
-
-struct devlink_ops;
-
-struct devlink {
- u32 index;
- struct list_head port_list;
- struct list_head rate_list;
- struct list_head sb_list;
- struct list_head dpipe_table_list;
- struct list_head resource_list;
- struct list_head param_list;
- struct list_head region_list;
- struct list_head reporter_list;
- struct mutex reporters_lock; /* protects reporter_list */
- struct devlink_dpipe_headers *dpipe_headers;
- struct list_head trap_list;
- struct list_head trap_group_list;
- struct list_head trap_policer_list;
- const struct devlink_ops *ops;
- struct xarray snapshot_ids;
- struct devlink_dev_stats stats;
- struct device *dev;
- possible_net_t _net;
- struct mutex lock; /* Serializes access to devlink instance specific objects such as
- * port, sb, dpipe, resource, params, region, traps and more.
- */
- u8 reload_failed:1,
- reload_enabled:1;
- refcount_t refcount;
- struct completion comp;
- char priv[0] __aligned(NETDEV_ALIGN);
-};
+struct devlink;

struct devlink_port_phys_attrs {
u32 port_number; /* Same value as "split group".
@@ -1520,17 +1482,9 @@ struct devlink_ops {
struct netlink_ext_ack *extack);
};

-static inline void *devlink_priv(struct devlink *devlink)
-{
- BUG_ON(!devlink);
- return &devlink->priv;
-}
-
-static inline struct devlink *priv_to_devlink(void *priv)
-{
- BUG_ON(!priv);
- return container_of(priv, struct devlink, priv);
-}
+void *devlink_priv(struct devlink *devlink);
+struct devlink *priv_to_devlink(void *priv);
+struct device *devlink_to_dev(const struct devlink *devlink);

static inline struct devlink_port *
netdev_to_devlink_port(struct net_device *dev)
diff --git a/include/trace/events/devlink.h b/include/trace/events/devlink.h
index 44d8e2981065..2814f188d98c 100644
--- a/include/trace/events/devlink.h
+++ b/include/trace/events/devlink.h
@@ -21,9 +21,9 @@ TRACE_EVENT(devlink_hwmsg,
TP_ARGS(devlink, incoming, type, buf, len),

TP_STRUCT__entry(
- __string(bus_name, devlink->dev->bus->name)
- __string(dev_name, dev_name(devlink->dev))
- __string(driver_name, devlink->dev->driver->name)
+ __string(bus_name, devlink_to_dev(devlink)->bus->name)
+ __string(dev_name, dev_name(devlink_to_dev(devlink)))
+ __string(driver_name, devlink_to_dev(devlink)->driver->name)
__field(bool, incoming)
__field(unsigned long, type)
__dynamic_array(u8, buf, len)
@@ -31,9 +31,9 @@ TRACE_EVENT(devlink_hwmsg,
),

TP_fast_assign(
- __assign_str(bus_name, devlink->dev->bus->name);
- __assign_str(dev_name, dev_name(devlink->dev));
- __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(bus_name, devlink_to_dev(devlink)->bus->name);
+ __assign_str(dev_name, dev_name(devlink_to_dev(devlink)));
+ __assign_str(driver_name, devlink_to_dev(devlink)->driver->name);
__entry->incoming = incoming;
__entry->type = type;
memcpy(__get_dynamic_array(buf), buf, len);
@@ -55,17 +55,17 @@ TRACE_EVENT(devlink_hwerr,
TP_ARGS(devlink, err, msg),

TP_STRUCT__entry(
- __string(bus_name, devlink->dev->bus->name)
- __string(dev_name, dev_name(devlink->dev))
- __string(driver_name, devlink->dev->driver->name)
+ __string(bus_name, devlink_to_dev(devlink)->bus->name)
+ __string(dev_name, dev_name(devlink_to_dev(devlink)))
+ __string(driver_name, devlink_to_dev(devlink)->driver->name)
__field(int, err)
__string(msg, msg)
),

TP_fast_assign(
- __assign_str(bus_name, devlink->dev->bus->name);
- __assign_str(dev_name, dev_name(devlink->dev));
- __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(bus_name, devlink_to_dev(devlink)->bus->name);
+ __assign_str(dev_name, dev_name(devlink_to_dev(devlink)));
+ __assign_str(driver_name, devlink_to_dev(devlink)->driver->name);
__entry->err = err;
__assign_str(msg, msg);
),
@@ -85,17 +85,17 @@ TRACE_EVENT(devlink_health_report,
TP_ARGS(devlink, reporter_name, msg),

TP_STRUCT__entry(
- __string(bus_name, devlink->dev->bus->name)
- __string(dev_name, dev_name(devlink->dev))
- __string(driver_name, devlink->dev->driver->name)
+ __string(bus_name, devlink_to_dev(devlink)->bus->name)
+ __string(dev_name, dev_name(devlink_to_dev(devlink)))
+ __string(driver_name, devlink_to_dev(devlink)->driver->name)
__string(reporter_name, msg)
__string(msg, msg)
),

TP_fast_assign(
- __assign_str(bus_name, devlink->dev->bus->name);
- __assign_str(dev_name, dev_name(devlink->dev));
- __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(bus_name, devlink_to_dev(devlink)->bus->name);
+ __assign_str(dev_name, dev_name(devlink_to_dev(devlink)));
+ __assign_str(driver_name, devlink_to_dev(devlink)->driver->name);
__assign_str(reporter_name, reporter_name);
__assign_str(msg, msg);
),
@@ -116,18 +116,18 @@ TRACE_EVENT(devlink_health_recover_aborted,
TP_ARGS(devlink, reporter_name, health_state, time_since_last_recover),

TP_STRUCT__entry(
- __string(bus_name, devlink->dev->bus->name)
- __string(dev_name, dev_name(devlink->dev))
- __string(driver_name, devlink->dev->driver->name)
+ __string(bus_name, devlink_to_dev(devlink)->bus->name)
+ __string(dev_name, dev_name(devlink_to_dev(devlink)))
+ __string(driver_name, devlink_to_dev(devlink)->driver->name)
__string(reporter_name, reporter_name)
__field(bool, health_state)
__field(u64, time_since_last_recover)
),

TP_fast_assign(
- __assign_str(bus_name, devlink->dev->bus->name);
- __assign_str(dev_name, dev_name(devlink->dev));
- __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(bus_name, devlink_to_dev(devlink)->bus->name);
+ __assign_str(dev_name, dev_name(devlink_to_dev(devlink)));
+ __assign_str(driver_name, devlink_to_dev(devlink)->driver->name);
__assign_str(reporter_name, reporter_name);
__entry->health_state = health_state;
__entry->time_since_last_recover = time_since_last_recover;
@@ -150,17 +150,17 @@ TRACE_EVENT(devlink_health_reporter_state_update,
TP_ARGS(devlink, reporter_name, new_state),

TP_STRUCT__entry(
- __string(bus_name, devlink->dev->bus->name)
- __string(dev_name, dev_name(devlink->dev))
- __string(driver_name, devlink->dev->driver->name)
+ __string(bus_name, devlink_to_dev(devlink)->bus->name)
+ __string(dev_name, dev_name(devlink_to_dev(devlink)))
+ __string(driver_name, devlink_to_dev(devlink)->driver->name)
__string(reporter_name, reporter_name)
__field(u8, new_state)
),

TP_fast_assign(
- __assign_str(bus_name, devlink->dev->bus->name);
- __assign_str(dev_name, dev_name(devlink->dev));
- __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(bus_name, devlink_to_dev(devlink)->bus->name);
+ __assign_str(dev_name, dev_name(devlink_to_dev(devlink)));
+ __assign_str(driver_name, devlink_to_dev(devlink)->driver->name);
__assign_str(reporter_name, reporter_name);
__entry->new_state = new_state;
),
@@ -181,9 +181,9 @@ TRACE_EVENT(devlink_trap_report,
TP_ARGS(devlink, skb, metadata),

TP_STRUCT__entry(
- __string(bus_name, devlink->dev->bus->name)
- __string(dev_name, dev_name(devlink->dev))
- __string(driver_name, devlink->dev->driver->name)
+ __string(bus_name, devlink_to_dev(devlink)->bus->name)
+ __string(dev_name, dev_name(devlink_to_dev(devlink)))
+ __string(driver_name, devlink_to_dev(devlink)->driver->name)
__string(trap_name, metadata->trap_name)
__string(trap_group_name, metadata->trap_group_name)
__dynamic_array(char, input_dev_name, IFNAMSIZ)
@@ -192,9 +192,9 @@ TRACE_EVENT(devlink_trap_report,
TP_fast_assign(
struct net_device *input_dev = metadata->input_dev;

- __assign_str(bus_name, devlink->dev->bus->name);
- __assign_str(dev_name, dev_name(devlink->dev));
- __assign_str(driver_name, devlink->dev->driver->name);
+ __assign_str(bus_name, devlink_to_dev(devlink)->bus->name);
+ __assign_str(dev_name, dev_name(devlink_to_dev(devlink)));
+ __assign_str(driver_name, devlink_to_dev(devlink)->driver->name);
__assign_str(trap_name, metadata->trap_name);
__assign_str(trap_group_name, metadata->trap_group_name);
__assign_str(input_dev_name,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 4917112406a0..9642429cec65 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -30,6 +30,65 @@
#define CREATE_TRACE_POINTS
#include <trace/events/devlink.h>

+#define DEVLINK_RELOAD_STATS_ARRAY_SIZE \
+ (__DEVLINK_RELOAD_LIMIT_MAX * __DEVLINK_RELOAD_ACTION_MAX)
+
+struct devlink_dev_stats {
+ u32 reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
+ u32 remote_reload_stats[DEVLINK_RELOAD_STATS_ARRAY_SIZE];
+};
+
+struct devlink {
+ u32 index;
+ struct list_head port_list;
+ struct list_head rate_list;
+ struct list_head sb_list;
+ struct list_head dpipe_table_list;
+ struct list_head resource_list;
+ struct list_head param_list;
+ struct list_head region_list;
+ struct list_head reporter_list;
+ struct mutex reporters_lock; /* protects reporter_list */
+ struct devlink_dpipe_headers *dpipe_headers;
+ struct list_head trap_list;
+ struct list_head trap_group_list;
+ struct list_head trap_policer_list;
+ const struct devlink_ops *ops;
+ struct xarray snapshot_ids;
+ struct devlink_dev_stats stats;
+ struct device *dev;
+ possible_net_t _net;
+ /* Serializes access to devlink instance specific objects such as
+ * port, sb, dpipe, resource, params, region, traps and more.
+ */
+ struct mutex lock;
+ u8 reload_failed:1,
+ reload_enabled:1;
+ refcount_t refcount;
+ struct completion comp;
+ char priv[0] __aligned(NETDEV_ALIGN);
+};
+
+void *devlink_priv(struct devlink *devlink)
+{
+ BUG_ON(!devlink);
+ return &devlink->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_priv);
+
+struct devlink *priv_to_devlink(void *priv)
+{
+ BUG_ON(!priv);
+ return container_of(priv, struct devlink, priv);
+}
+EXPORT_SYMBOL_GPL(priv_to_devlink);
+
+struct device *devlink_to_dev(const struct devlink *devlink)
+{
+ return devlink->dev;
+}
+EXPORT_SYMBOL_GPL(devlink_to_dev);
+
static struct devlink_dpipe_field devlink_dpipe_fields_ethernet[] = {
{
.name = "destination mac",
--
2.31.1

2021-10-03 18:15:36

by Leon Romanovsky

[permalink] [raw]
Subject: [PATCH net-next v2 2/5] devlink: Annotate devlink API calls

From: Leon Romanovsky <[email protected]>

Initial annotation patch to separate calls that needs to be executed
before or after devlink_register().

Signed-off-by: Leon Romanovsky <[email protected]>
---
net/core/devlink.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 9642429cec65..4e484afeadea 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -154,6 +154,22 @@ static const struct nla_policy devlink_function_nl_policy[DEVLINK_PORT_FUNCTION_
static DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
#define DEVLINK_REGISTERED XA_MARK_1

+/* devlink instances are open to the access from the user space after
+ * devlink_register() call. Such logical barrier allows us to have certain
+ * expectations related to locking.
+ *
+ * Before *_register() - we are in initialization stage and no parallel
+ * access possible to the devlink instance. All drivers perform that phase
+ * by implicitly holding device_lock.
+ *
+ * After *_register() - users and driver can access devlink instance at
+ * the same time.
+ */
+#define ASSERT_DEVLINK_REGISTERED(d) \
+ WARN_ON_ONCE(!xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED))
+#define ASSERT_DEVLINK_NOT_REGISTERED(d) \
+ WARN_ON_ONCE(xa_get_mark(&devlinks, (d)->index, DEVLINK_REGISTERED))
+
/* devlink_mutex
*
* An overall lock guarding every operation coming from userspace.
@@ -9115,6 +9131,10 @@ static void devlink_notify_unregister(struct devlink *devlink)
*/
void devlink_register(struct devlink *devlink)
{
+ ASSERT_DEVLINK_NOT_REGISTERED(devlink);
+ /* Make sure that we are in .probe() routine */
+ device_lock_assert(devlink->dev);
+
mutex_lock(&devlink_mutex);
xa_set_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
devlink_notify_register(devlink);
@@ -9129,6 +9149,10 @@ EXPORT_SYMBOL_GPL(devlink_register);
*/
void devlink_unregister(struct devlink *devlink)
{
+ ASSERT_DEVLINK_REGISTERED(devlink);
+ /* Make sure that we are in .remove() routine */
+ device_lock_assert(devlink->dev);
+
devlink_put(devlink);
wait_for_completion(&devlink->comp);

@@ -9183,6 +9207,8 @@ EXPORT_SYMBOL_GPL(devlink_reload_disable);
*/
void devlink_free(struct devlink *devlink)
{
+ ASSERT_DEVLINK_NOT_REGISTERED(devlink);
+
mutex_destroy(&devlink->reporters_lock);
mutex_destroy(&devlink->lock);
WARN_ON(!list_empty(&devlink->trap_policer_list));
--
2.31.1

2021-10-03 18:16:33

by Leon Romanovsky

[permalink] [raw]
Subject: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

From: Leon Romanovsky <[email protected]>

Introduce new devlink call to set specific ops callback during
device initialization phase after devlink_alloc() is already
called.

This allows us to set specific ops based on device property which
is not known at the beginning of driver initialization.

For the sake of simplicity, this API lacks any type of locking and
needs to be called before devlink_register() to make sure that no
parallel access to the ops is possible at this stage.

Signed-off-by: Leon Romanovsky <[email protected]>
---
include/net/devlink.h | 1 +
net/core/devlink.c | 270 ++++++++++++++++++++++++++++--------------
2 files changed, 180 insertions(+), 91 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index ae03eb1c6cc9..320146d95fb8 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1520,6 +1520,7 @@ static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
{
return devlink_alloc_ns(ops, priv_size, &init_net, dev);
}
+void devlink_set_ops(struct devlink *devlink, const struct devlink_ops *ops);
void devlink_register(struct devlink *devlink);
void devlink_unregister(struct devlink *devlink);
void devlink_reload_enable(struct devlink *devlink);
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 4e484afeadea..25c2aa2b35cd 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -53,7 +53,7 @@ struct devlink {
struct list_head trap_list;
struct list_head trap_group_list;
struct list_head trap_policer_list;
- const struct devlink_ops *ops;
+ struct devlink_ops ops;
struct xarray snapshot_ids;
struct devlink_dev_stats stats;
struct device *dev;
@@ -683,13 +683,13 @@ devlink_reload_combination_is_invalid(enum devlink_reload_action action,
static bool
devlink_reload_action_is_supported(struct devlink *devlink, enum devlink_reload_action action)
{
- return test_bit(action, &devlink->ops->reload_actions);
+ return test_bit(action, &devlink->ops.reload_actions);
}

static bool
devlink_reload_limit_is_supported(struct devlink *devlink, enum devlink_reload_limit limit)
{
- return test_bit(limit, &devlink->ops->reload_limits);
+ return test_bit(limit, &devlink->ops.reload_limits);
}

static int devlink_reload_stat_put(struct sk_buff *msg,
@@ -1036,7 +1036,7 @@ devlink_nl_port_function_attrs_put(struct sk_buff *msg, struct devlink_port *por
if (!function_attr)
return -EMSGSIZE;

- ops = port->devlink->ops;
+ ops = &port->devlink->ops;
err = devlink_port_fn_hw_addr_fill(ops, port, msg, extack,
&msg_updated);
if (err)
@@ -1384,14 +1384,13 @@ static int devlink_port_type_set(struct devlink_port *devlink_port,
{
int err;

- if (!devlink_port->devlink->ops->port_type_set)
+ if (!devlink_port->devlink->ops.port_type_set)
return -EOPNOTSUPP;

if (port_type == devlink_port->type)
return 0;

- err = devlink_port->devlink->ops->port_type_set(devlink_port,
- port_type);
+ err = devlink_port->devlink->ops.port_type_set(devlink_port, port_type);
if (err)
return err;

@@ -1404,7 +1403,7 @@ static int devlink_port_function_hw_addr_set(struct devlink_port *port,
const struct nlattr *attr,
struct netlink_ext_ack *extack)
{
- const struct devlink_ops *ops = port->devlink->ops;
+ const struct devlink_ops *ops = &port->devlink->ops;
const u8 *hw_addr;
int hw_addr_len;

@@ -1442,7 +1441,7 @@ static int devlink_port_fn_state_set(struct devlink_port *port,
const struct devlink_ops *ops;

state = nla_get_u8(attr);
- ops = port->devlink->ops;
+ ops = &port->devlink->ops;
if (!ops->port_fn_state_set) {
NL_SET_ERR_MSG_MOD(extack,
"Function does not support state setting");
@@ -1515,9 +1514,9 @@ static int devlink_port_split(struct devlink *devlink, u32 port_index,
u32 count, struct netlink_ext_ack *extack)

{
- if (devlink->ops->port_split)
- return devlink->ops->port_split(devlink, port_index, count,
- extack);
+ if (devlink->ops.port_split)
+ return devlink->ops.port_split(devlink, port_index, count,
+ extack);
return -EOPNOTSUPP;
}

@@ -1561,8 +1560,8 @@ static int devlink_port_unsplit(struct devlink *devlink, u32 port_index,
struct netlink_ext_ack *extack)

{
- if (devlink->ops->port_unsplit)
- return devlink->ops->port_unsplit(devlink, port_index, extack);
+ if (devlink->ops.port_unsplit)
+ return devlink->ops.port_unsplit(devlink, port_index, extack);
return -EOPNOTSUPP;
}

@@ -1622,7 +1621,7 @@ static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb,
unsigned int new_port_index;
int err;

- if (!devlink->ops->port_new || !devlink->ops->port_del)
+ if (!devlink->ops.port_new || !devlink->ops.port_del)
return -EOPNOTSUPP;

if (!info->attrs[DEVLINK_ATTR_PORT_FLAVOUR] ||
@@ -1651,15 +1650,15 @@ static int devlink_nl_cmd_port_new_doit(struct sk_buff *skb,
new_attrs.sfnum_valid = true;
}

- err = devlink->ops->port_new(devlink, &new_attrs, extack,
- &new_port_index);
+ err = devlink->ops.port_new(devlink, &new_attrs, extack,
+ &new_port_index);
if (err)
return err;

err = devlink_port_new_notifiy(devlink, new_port_index, info);
if (err && err != -ENODEV) {
/* Fail to send the response; destroy newly created port. */
- devlink->ops->port_del(devlink, new_port_index, extack);
+ devlink->ops.port_del(devlink, new_port_index, extack);
}
return err;
}
@@ -1671,7 +1670,7 @@ static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb,
struct devlink *devlink = info->user_ptr[0];
unsigned int port_index;

- if (!devlink->ops->port_del)
+ if (!devlink->ops.port_del)
return -EOPNOTSUPP;

if (!info->attrs[DEVLINK_ATTR_PORT_INDEX]) {
@@ -1680,7 +1679,7 @@ static int devlink_nl_cmd_port_del_doit(struct sk_buff *skb,
}
port_index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]);

- return devlink->ops->port_del(devlink, port_index, extack);
+ return devlink->ops.port_del(devlink, port_index, extack);
}

static int
@@ -1690,7 +1689,7 @@ devlink_nl_rate_parent_node_set(struct devlink_rate *devlink_rate,
{
struct devlink *devlink = devlink_rate->devlink;
const char *parent_name = nla_data(nla_parent);
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
size_t len = strlen(parent_name);
struct devlink_rate *parent;
int err = -EOPNOTSUPP;
@@ -1839,7 +1838,7 @@ static int devlink_nl_cmd_rate_set_doit(struct sk_buff *skb,
{
struct devlink_rate *devlink_rate = info->user_ptr[1];
struct devlink *devlink = devlink_rate->devlink;
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
int err;

if (!ops || !devlink_rate_set_ops_supported(ops, info, devlink_rate->type))
@@ -1860,7 +1859,7 @@ static int devlink_nl_cmd_rate_new_doit(struct sk_buff *skb,
const struct devlink_ops *ops;
int err;

- ops = devlink->ops;
+ ops = &devlink->ops;
if (!ops || !ops->rate_node_new || !ops->rate_node_del) {
NL_SET_ERR_MSG_MOD(info->extack, "Rate nodes aren't supported");
return -EOPNOTSUPP;
@@ -1914,7 +1913,7 @@ static int devlink_nl_cmd_rate_del_doit(struct sk_buff *skb,
{
struct devlink_rate *rate_node = info->user_ptr[1];
struct devlink *devlink = rate_node->devlink;
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
int err;

if (refcount_read(&rate_node->refcnt) > 1) {
@@ -2053,8 +2052,8 @@ static int devlink_nl_sb_pool_fill(struct sk_buff *msg, struct devlink *devlink,
void *hdr;
int err;

- err = devlink->ops->sb_pool_get(devlink, devlink_sb->index,
- pool_index, &pool_info);
+ err = devlink->ops.sb_pool_get(devlink, devlink_sb->index, pool_index,
+ &pool_info);
if (err)
return err;

@@ -2105,7 +2104,7 @@ static int devlink_nl_cmd_sb_pool_get_doit(struct sk_buff *skb,
if (err)
return err;

- if (!devlink->ops->sb_pool_get)
+ if (!devlink->ops.sb_pool_get)
return -EOPNOTSUPP;

msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
@@ -2165,7 +2164,7 @@ static int devlink_nl_cmd_sb_pool_get_dumpit(struct sk_buff *msg,
continue;

if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
- !devlink->ops->sb_pool_get)
+ !devlink->ops.sb_pool_get)
goto retry;

mutex_lock(&devlink->lock);
@@ -2202,7 +2201,7 @@ static int devlink_sb_pool_set(struct devlink *devlink, unsigned int sb_index,
struct netlink_ext_ack *extack)

{
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;

if (ops->sb_pool_set)
return ops->sb_pool_set(devlink, sb_index, pool_index,
@@ -2250,7 +2249,7 @@ static int devlink_nl_sb_port_pool_fill(struct sk_buff *msg,
enum devlink_command cmd,
u32 portid, u32 seq, int flags)
{
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
u32 threshold;
void *hdr;
int err;
@@ -2320,7 +2319,7 @@ static int devlink_nl_cmd_sb_port_pool_get_doit(struct sk_buff *skb,
if (err)
return err;

- if (!devlink->ops->sb_port_pool_get)
+ if (!devlink->ops.sb_port_pool_get)
return -EOPNOTSUPP;

msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
@@ -2386,7 +2385,7 @@ static int devlink_nl_cmd_sb_port_pool_get_dumpit(struct sk_buff *msg,
continue;

if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
- !devlink->ops->sb_port_pool_get)
+ !devlink->ops.sb_port_pool_get)
goto retry;

mutex_lock(&devlink->lock);
@@ -2423,7 +2422,7 @@ static int devlink_sb_port_pool_set(struct devlink_port *devlink_port,
struct netlink_ext_ack *extack)

{
- const struct devlink_ops *ops = devlink_port->devlink->ops;
+ const struct devlink_ops *ops = &devlink_port->devlink->ops;

if (ops->sb_port_pool_set)
return ops->sb_port_pool_set(devlink_port, sb_index,
@@ -2466,7 +2465,7 @@ devlink_nl_sb_tc_pool_bind_fill(struct sk_buff *msg, struct devlink *devlink,
enum devlink_command cmd,
u32 portid, u32 seq, int flags)
{
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
u16 pool_index;
u32 threshold;
void *hdr;
@@ -2547,7 +2546,7 @@ static int devlink_nl_cmd_sb_tc_pool_bind_get_doit(struct sk_buff *skb,
if (err)
return err;

- if (!devlink->ops->sb_tc_pool_bind_get)
+ if (!devlink->ops.sb_tc_pool_bind_get)
return -EOPNOTSUPP;

msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
@@ -2635,7 +2634,7 @@ devlink_nl_cmd_sb_tc_pool_bind_get_dumpit(struct sk_buff *msg,
continue;

if (!net_eq(devlink_net(devlink), sock_net(msg->sk)) ||
- !devlink->ops->sb_tc_pool_bind_get)
+ !devlink->ops.sb_tc_pool_bind_get)
goto retry;

mutex_lock(&devlink->lock);
@@ -2674,7 +2673,7 @@ static int devlink_sb_tc_pool_bind_set(struct devlink_port *devlink_port,
struct netlink_ext_ack *extack)

{
- const struct devlink_ops *ops = devlink_port->devlink->ops;
+ const struct devlink_ops *ops = &devlink_port->devlink->ops;

if (ops->sb_tc_pool_bind_set)
return ops->sb_tc_pool_bind_set(devlink_port, sb_index,
@@ -2726,7 +2725,7 @@ static int devlink_nl_cmd_sb_occ_snapshot_doit(struct sk_buff *skb,
struct genl_info *info)
{
struct devlink *devlink = info->user_ptr[0];
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
struct devlink_sb *devlink_sb;

devlink_sb = devlink_sb_get_from_info(devlink, info);
@@ -2742,7 +2741,7 @@ static int devlink_nl_cmd_sb_occ_max_clear_doit(struct sk_buff *skb,
struct genl_info *info)
{
struct devlink *devlink = info->user_ptr[0];
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
struct devlink_sb *devlink_sb;

devlink_sb = devlink_sb_get_from_info(devlink, info);
@@ -2758,7 +2757,7 @@ static int devlink_nl_eswitch_fill(struct sk_buff *msg, struct devlink *devlink,
enum devlink_command cmd, u32 portid,
u32 seq, int flags)
{
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
enum devlink_eswitch_encap_mode encap_mode;
u8 inline_mode;
void *hdr;
@@ -2852,7 +2851,7 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
struct genl_info *info)
{
struct devlink *devlink = info->user_ptr[0];
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;
enum devlink_eswitch_encap_mode encap_mode;
u8 inline_mode;
int err = 0;
@@ -4042,14 +4041,16 @@ static int devlink_reload(struct devlink *devlink, struct net *dest_net,

curr_net = devlink_net(devlink);
devlink_ns_change_notify(devlink, dest_net, curr_net, false);
- err = devlink->ops->reload_down(devlink, !!dest_net, action, limit, extack);
+ err = devlink->ops.reload_down(devlink, !!dest_net, action, limit,
+ extack);
if (err)
return err;

if (dest_net && !net_eq(dest_net, curr_net))
write_pnet(&devlink->_net, dest_net);

- err = devlink->ops->reload_up(devlink, action, limit, actions_performed, extack);
+ err = devlink->ops.reload_up(devlink, action, limit, actions_performed,
+ extack);
devlink_reload_failed_set(devlink, !!err);
if (err)
return err;
@@ -4104,7 +4105,7 @@ static int devlink_nl_cmd_reload(struct sk_buff *skb, struct genl_info *info)
u32 actions_performed;
int err;

- if (!devlink_reload_supported(devlink->ops))
+ if (!devlink_reload_supported(&devlink->ops))
return -EOPNOTSUPP;

err = devlink_resources_validate(devlink, NULL, info);
@@ -4314,13 +4315,13 @@ static int devlink_nl_cmd_flash_update(struct sk_buff *skb,
u32 supported_params;
int ret;

- if (!devlink->ops->flash_update)
+ if (!devlink->ops.flash_update)
return -EOPNOTSUPP;

if (!info->attrs[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME])
return -EINVAL;

- supported_params = devlink->ops->supported_flash_update_params;
+ supported_params = devlink->ops.supported_flash_update_params;

nla_component = info->attrs[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT];
if (nla_component) {
@@ -4354,7 +4355,7 @@ static int devlink_nl_cmd_flash_update(struct sk_buff *skb,
}

devlink_flash_update_begin_notify(devlink);
- ret = devlink->ops->flash_update(devlink, &params, info->extack);
+ ret = devlink->ops.flash_update(devlink, &params, info->extack);
devlink_flash_update_end_notify(devlink);

release_firmware(params.fw);
@@ -6055,7 +6056,7 @@ devlink_nl_info_fill(struct sk_buff *msg, struct devlink *devlink,
goto err_cancel_msg;

req.msg = msg;
- err = devlink->ops->info_get(devlink, &req, extack);
+ err = devlink->ops.info_get(devlink, &req, extack);
if (err)
goto err_cancel_msg;

@@ -6074,7 +6075,7 @@ static int devlink_nl_cmd_info_get_doit(struct sk_buff *skb,
struct sk_buff *msg;
int err;

- if (!devlink->ops->info_get)
+ if (!devlink->ops.info_get)
return -EOPNOTSUPP;

msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
@@ -6109,7 +6110,7 @@ static int devlink_nl_cmd_info_get_dumpit(struct sk_buff *msg,
if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
goto retry;

- if (idx < start || !devlink->ops->info_get)
+ if (idx < start || !devlink->ops.info_get)
goto inc;

mutex_lock(&devlink->lock);
@@ -7746,10 +7747,10 @@ static int devlink_trap_stats_put(struct sk_buff *msg, struct devlink *devlink,
u64 drops = 0;
int err;

- if (devlink->ops->trap_drop_counter_get) {
- err = devlink->ops->trap_drop_counter_get(devlink,
- trap_item->trap,
- &drops);
+ if (devlink->ops.trap_drop_counter_get) {
+ err = devlink->ops.trap_drop_counter_get(devlink,
+ trap_item->trap,
+ &drops);
if (err)
return err;
}
@@ -7760,7 +7761,7 @@ static int devlink_trap_stats_put(struct sk_buff *msg, struct devlink *devlink,
if (!attr)
return -EMSGSIZE;

- if (devlink->ops->trap_drop_counter_get &&
+ if (devlink->ops.trap_drop_counter_get &&
nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_DROPPED, drops,
DEVLINK_ATTR_PAD))
goto nla_put_failure;
@@ -7927,8 +7928,8 @@ static int __devlink_trap_action_set(struct devlink *devlink,
return 0;
}

- err = devlink->ops->trap_action_set(devlink, trap_item->trap,
- trap_action, extack);
+ err = devlink->ops.trap_action_set(devlink, trap_item->trap,
+ trap_action, extack);
if (err)
return err;

@@ -8152,9 +8153,11 @@ __devlink_trap_group_action_set(struct devlink *devlink,
struct devlink_trap_item *trap_item;
int err;

- if (devlink->ops->trap_group_action_set) {
- err = devlink->ops->trap_group_action_set(devlink, group_item->group,
- trap_action, extack);
+ if (devlink->ops.trap_group_action_set) {
+ err = devlink->ops.trap_group_action_set(devlink,
+ group_item->group,
+ trap_action,
+ extack);
if (err)
return err;

@@ -8222,7 +8225,7 @@ static int devlink_trap_group_set(struct devlink *devlink,
if (!attrs[DEVLINK_ATTR_TRAP_POLICER_ID])
return 0;

- if (!devlink->ops->trap_group_set)
+ if (!devlink->ops.trap_group_set)
return -EOPNOTSUPP;

policer_item = group_item->policer_item;
@@ -8239,8 +8242,8 @@ static int devlink_trap_group_set(struct devlink *devlink,
}
policer = policer_item ? policer_item->policer : NULL;

- err = devlink->ops->trap_group_set(devlink, group_item->group, policer,
- extack);
+ err = devlink->ops.trap_group_set(devlink, group_item->group, policer,
+ extack);
if (err)
return err;

@@ -8305,10 +8308,10 @@ devlink_trap_policer_stats_put(struct sk_buff *msg, struct devlink *devlink,
u64 drops;
int err;

- if (!devlink->ops->trap_policer_counter_get)
+ if (!devlink->ops.trap_policer_counter_get)
return 0;

- err = devlink->ops->trap_policer_counter_get(devlink, policer, &drops);
+ err = devlink->ops.trap_policer_counter_get(devlink, policer, &drops);
if (err)
return err;

@@ -8495,8 +8498,8 @@ devlink_trap_policer_set(struct devlink *devlink,
return -EINVAL;
}

- err = devlink->ops->trap_policer_set(devlink, policer_item->policer,
- rate, burst, info->extack);
+ err = devlink->ops.trap_policer_set(devlink, policer_item->policer,
+ rate, burst, info->extack);
if (err)
return err;

@@ -8516,7 +8519,7 @@ static int devlink_nl_cmd_trap_policer_set_doit(struct sk_buff *skb,
if (list_empty(&devlink->trap_policer_list))
return -EOPNOTSUPP;

- if (!devlink->ops->trap_policer_set)
+ if (!devlink->ops.trap_policer_set)
return -EOPNOTSUPP;

policer_item = devlink_trap_policer_item_get_from_info(devlink, info);
@@ -8987,6 +8990,92 @@ static bool devlink_reload_actions_valid(const struct devlink_ops *ops)
return true;
}

+/**
+ * devlink_set_ops - Set devlink ops dynamically
+ *
+ * @devlink: devlink
+ * @ops: devlink ops to set
+ *
+ * This interface allows us to set ops based on device property
+ * which is known after devlink_alloc() was already called.
+ *
+ * This call sets fields that are not initialized yet and ignores
+ * already set fields.
+ *
+ * It should be called before devlink_register(), so doesn't have any
+ * protection from concurent access.
+ */
+void devlink_set_ops(struct devlink *devlink, const struct devlink_ops *ops)
+{
+ struct devlink_ops *dev_ops = &devlink->ops;
+
+ WARN_ON(!devlink_reload_actions_valid(ops));
+ ASSERT_DEVLINK_NOT_REGISTERED(devlink);
+
+#define SET_DEVICE_OP(ptr, op, name) \
+ do { \
+ if ((op)->name) \
+ if (!((ptr)->name)) \
+ (ptr)->name = (op)->name; \
+ } while (0)
+
+ /* Keep sorte alphabetically for readability */
+ SET_DEVICE_OP(dev_ops, ops, eswitch_encap_mode_get);
+ SET_DEVICE_OP(dev_ops, ops, eswitch_encap_mode_set);
+ SET_DEVICE_OP(dev_ops, ops, eswitch_inline_mode_get);
+ SET_DEVICE_OP(dev_ops, ops, eswitch_inline_mode_set);
+ SET_DEVICE_OP(dev_ops, ops, eswitch_mode_get);
+ SET_DEVICE_OP(dev_ops, ops, eswitch_mode_set);
+ SET_DEVICE_OP(dev_ops, ops, flash_update);
+ SET_DEVICE_OP(dev_ops, ops, info_get);
+ SET_DEVICE_OP(dev_ops, ops, port_del);
+ SET_DEVICE_OP(dev_ops, ops, port_fn_state_get);
+ SET_DEVICE_OP(dev_ops, ops, port_fn_state_set);
+ SET_DEVICE_OP(dev_ops, ops, port_function_hw_addr_get);
+ SET_DEVICE_OP(dev_ops, ops, port_function_hw_addr_set);
+ SET_DEVICE_OP(dev_ops, ops, port_new);
+ SET_DEVICE_OP(dev_ops, ops, port_split);
+ SET_DEVICE_OP(dev_ops, ops, port_type_set);
+ SET_DEVICE_OP(dev_ops, ops, port_unsplit);
+ SET_DEVICE_OP(dev_ops, ops, rate_leaf_parent_set);
+ SET_DEVICE_OP(dev_ops, ops, rate_leaf_tx_max_set);
+ SET_DEVICE_OP(dev_ops, ops, rate_leaf_tx_share_set);
+ SET_DEVICE_OP(dev_ops, ops, rate_node_del);
+ SET_DEVICE_OP(dev_ops, ops, rate_node_new);
+ SET_DEVICE_OP(dev_ops, ops, rate_node_parent_set);
+ SET_DEVICE_OP(dev_ops, ops, rate_node_tx_max_set);
+ SET_DEVICE_OP(dev_ops, ops, rate_node_tx_share_set);
+ SET_DEVICE_OP(dev_ops, ops, reload_actions);
+ SET_DEVICE_OP(dev_ops, ops, reload_down);
+ SET_DEVICE_OP(dev_ops, ops, reload_limits);
+ SET_DEVICE_OP(dev_ops, ops, reload_up);
+ SET_DEVICE_OP(dev_ops, ops, sb_occ_max_clear);
+ SET_DEVICE_OP(dev_ops, ops, sb_occ_port_pool_get);
+ SET_DEVICE_OP(dev_ops, ops, sb_occ_snapshot);
+ SET_DEVICE_OP(dev_ops, ops, sb_occ_tc_port_bind_get);
+ SET_DEVICE_OP(dev_ops, ops, sb_pool_get);
+ SET_DEVICE_OP(dev_ops, ops, sb_pool_set);
+ SET_DEVICE_OP(dev_ops, ops, sb_port_pool_get);
+ SET_DEVICE_OP(dev_ops, ops, sb_port_pool_set);
+ SET_DEVICE_OP(dev_ops, ops, sb_tc_pool_bind_get);
+ SET_DEVICE_OP(dev_ops, ops, sb_tc_pool_bind_set);
+ SET_DEVICE_OP(dev_ops, ops, supported_flash_update_params);
+ SET_DEVICE_OP(dev_ops, ops, trap_action_set);
+ SET_DEVICE_OP(dev_ops, ops, trap_drop_counter_get);
+ SET_DEVICE_OP(dev_ops, ops, trap_fini);
+ SET_DEVICE_OP(dev_ops, ops, trap_group_action_set);
+ SET_DEVICE_OP(dev_ops, ops, trap_group_init);
+ SET_DEVICE_OP(dev_ops, ops, trap_group_set);
+ SET_DEVICE_OP(dev_ops, ops, trap_init);
+ SET_DEVICE_OP(dev_ops, ops, trap_policer_counter_get);
+ SET_DEVICE_OP(dev_ops, ops, trap_policer_fini);
+ SET_DEVICE_OP(dev_ops, ops, trap_policer_init);
+ SET_DEVICE_OP(dev_ops, ops, trap_policer_set);
+
+#undef SET_DEVICE_OP
+}
+EXPORT_SYMBOL_GPL(devlink_set_ops);
+
/**
* devlink_alloc_ns - Allocate new devlink instance resources
* in specific namespace
@@ -9008,8 +9097,6 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
int ret;

WARN_ON(!ops || !dev);
- if (!devlink_reload_actions_valid(ops))
- return NULL;

devlink = kzalloc(sizeof(*devlink) + priv_size, GFP_KERNEL);
if (!devlink)
@@ -9023,7 +9110,8 @@ struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
}

devlink->dev = dev;
- devlink->ops = ops;
+
+ devlink_set_ops(devlink, ops);
xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC);
write_pnet(&devlink->_net, net);
INIT_LIST_HEAD(&devlink->port_list);
@@ -9157,7 +9245,7 @@ void devlink_unregister(struct devlink *devlink)
wait_for_completion(&devlink->comp);

mutex_lock(&devlink_mutex);
- WARN_ON(devlink_reload_supported(devlink->ops) &&
+ WARN_ON(devlink_reload_supported(&devlink->ops) &&
devlink->reload_enabled);
devlink_notify_unregister(devlink);
xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
@@ -9616,7 +9704,7 @@ EXPORT_SYMBOL_GPL(devlink_rate_leaf_destroy);
void devlink_rate_nodes_destroy(struct devlink *devlink)
{
static struct devlink_rate *devlink_rate, *tmp;
- const struct devlink_ops *ops = devlink->ops;
+ const struct devlink_ops *ops = &devlink->ops;

mutex_lock(&devlink->lock);
list_for_each_entry(devlink_rate, &devlink->rate_list, list) {
@@ -10315,7 +10403,7 @@ int devlink_param_driverinit_value_get(struct devlink *devlink, u32 param_id,
{
struct devlink_param_item *param_item;

- if (!devlink_reload_supported(devlink->ops))
+ if (!devlink_reload_supported(&devlink->ops))
return -EOPNOTSUPP;

param_item = devlink_param_find_by_id(&devlink->param_list, param_id);
@@ -10901,7 +10989,7 @@ devlink_trap_register(struct devlink *devlink,
if (err)
goto err_group_link;

- err = devlink->ops->trap_init(devlink, trap, trap_item);
+ err = devlink->ops.trap_init(devlink, trap, trap_item);
if (err)
goto err_trap_init;

@@ -10929,8 +11017,8 @@ static void devlink_trap_unregister(struct devlink *devlink,

devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_DEL);
list_del(&trap_item->list);
- if (devlink->ops->trap_fini)
- devlink->ops->trap_fini(devlink, trap, trap_item);
+ if (devlink->ops.trap_fini)
+ devlink->ops.trap_fini(devlink, trap, trap_item);
free_percpu(trap_item->stats);
kfree(trap_item);
}
@@ -10944,8 +11032,8 @@ static void devlink_trap_disable(struct devlink *devlink,
if (WARN_ON_ONCE(!trap_item))
return;

- devlink->ops->trap_action_set(devlink, trap, DEVLINK_TRAP_ACTION_DROP,
- NULL);
+ devlink->ops.trap_action_set(devlink, trap, DEVLINK_TRAP_ACTION_DROP,
+ NULL);
trap_item->action = DEVLINK_TRAP_ACTION_DROP;
}

@@ -10964,7 +11052,7 @@ int devlink_traps_register(struct devlink *devlink,
{
int i, err;

- if (!devlink->ops->trap_init || !devlink->ops->trap_action_set)
+ if (!devlink->ops.trap_init || !devlink->ops.trap_action_set)
return -EINVAL;

mutex_lock(&devlink->lock);
@@ -11134,8 +11222,8 @@ devlink_trap_group_register(struct devlink *devlink,
if (err)
goto err_policer_link;

- if (devlink->ops->trap_group_init) {
- err = devlink->ops->trap_group_init(devlink, group);
+ if (devlink->ops.trap_group_init) {
+ err = devlink->ops.trap_group_init(devlink, group);
if (err)
goto err_group_init;
}
@@ -11275,8 +11363,8 @@ devlink_trap_policer_register(struct devlink *devlink,
policer_item->rate = policer->init_rate;
policer_item->burst = policer->init_burst;

- if (devlink->ops->trap_policer_init) {
- err = devlink->ops->trap_policer_init(devlink, policer);
+ if (devlink->ops.trap_policer_init) {
+ err = devlink->ops.trap_policer_init(devlink, policer);
if (err)
goto err_policer_init;
}
@@ -11305,8 +11393,8 @@ devlink_trap_policer_unregister(struct devlink *devlink,
devlink_trap_policer_notify(devlink, policer_item,
DEVLINK_CMD_TRAP_POLICER_DEL);
list_del(&policer_item->list);
- if (devlink->ops->trap_policer_fini)
- devlink->ops->trap_policer_fini(devlink, policer);
+ if (devlink->ops.trap_policer_fini)
+ devlink->ops.trap_policer_fini(devlink, policer);
kfree(policer_item);
}

@@ -11386,7 +11474,7 @@ static void __devlink_compat_running_version(struct devlink *devlink,
return;

req.msg = msg;
- err = devlink->ops->info_get(devlink, &req, NULL);
+ err = devlink->ops.info_get(devlink, &req, NULL);
if (err)
goto free_msg;

@@ -11418,7 +11506,7 @@ void devlink_compat_running_version(struct net_device *dev,
rtnl_unlock();

devlink = netdev_to_devlink(dev);
- if (!devlink || !devlink->ops->info_get)
+ if (!devlink || !devlink->ops.info_get)
goto out;

mutex_lock(&devlink->lock);
@@ -11440,7 +11528,7 @@ int devlink_compat_flash_update(struct net_device *dev, const char *file_name)
rtnl_unlock();

devlink = netdev_to_devlink(dev);
- if (!devlink || !devlink->ops->flash_update) {
+ if (!devlink || !devlink->ops.flash_update) {
ret = -EOPNOTSUPP;
goto out;
}
@@ -11451,7 +11539,7 @@ int devlink_compat_flash_update(struct net_device *dev, const char *file_name)

mutex_lock(&devlink->lock);
devlink_flash_update_begin_notify(devlink);
- ret = devlink->ops->flash_update(devlink, &params, NULL);
+ ret = devlink->ops.flash_update(devlink, &params, NULL);
devlink_flash_update_end_notify(devlink);
mutex_unlock(&devlink->lock);

@@ -11518,7 +11606,7 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net)
if (!net_eq(devlink_net(devlink), net))
goto retry;

- WARN_ON(!devlink_reload_supported(devlink->ops));
+ WARN_ON(!devlink_reload_supported(&devlink->ops));
err = devlink_reload(devlink, &init_net,
DEVLINK_RELOAD_ACTION_DRIVER_REINIT,
DEVLINK_RELOAD_LIMIT_UNSPEC,
--
2.31.1

2021-10-03 18:40:11

by Leon Romanovsky

[permalink] [raw]
Subject: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

From: Leon Romanovsky <[email protected]>

After changes to allow dynamically set the reload_up/_down callbacks,
we ensure that properly supported devlink ops are not accessible before
devlink_register, which is last command in the initialization sequence.

It makes devlink_reload_enable/_disable not relevant anymore and can be
safely deleted.

Signed-off-by: Leon Romanovsky <[email protected]>
---
.../hisilicon/hns3/hns3pf/hclge_devlink.c | 3 --
.../hisilicon/hns3/hns3vf/hclgevf_devlink.c | 3 --
drivers/net/ethernet/mellanox/mlx4/main.c | 2 -
.../net/ethernet/mellanox/mlx5/core/main.c | 3 --
.../mellanox/mlx5/core/sf/dev/driver.c | 5 +--
drivers/net/ethernet/mellanox/mlxsw/core.c | 10 ++---
drivers/net/netdevsim/dev.c | 3 --
include/net/devlink.h | 2 -
net/core/devlink.c | 43 +------------------
9 files changed, 5 insertions(+), 69 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
index 59b0ae7d59e0..c394c393421e 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_devlink.c
@@ -120,7 +120,6 @@ int hclge_devlink_init(struct hclge_dev *hdev)
hdev->devlink = devlink;

devlink_register(devlink);
- devlink_reload_enable(devlink);
return 0;
}

@@ -128,8 +127,6 @@ void hclge_devlink_uninit(struct hclge_dev *hdev)
{
struct devlink *devlink = hdev->devlink;

- devlink_reload_disable(devlink);
-
devlink_unregister(devlink);

devlink_free(devlink);
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
index d60cc9426f70..d67c151e024b 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_devlink.c
@@ -122,7 +122,6 @@ int hclgevf_devlink_init(struct hclgevf_dev *hdev)
hdev->devlink = devlink;

devlink_register(devlink);
- devlink_reload_enable(devlink);
return 0;
}

@@ -130,8 +129,6 @@ void hclgevf_devlink_uninit(struct hclgevf_dev *hdev)
{
struct devlink *devlink = hdev->devlink;

- devlink_reload_disable(devlink);
-
devlink_unregister(devlink);

devlink_free(devlink);
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c b/drivers/net/ethernet/mellanox/mlx4/main.c
index 9541f3a920c8..8b410800f049 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -4026,7 +4026,6 @@ static int mlx4_init_one(struct pci_dev *pdev, const struct pci_device_id *id)

pci_save_state(pdev);
devlink_register(devlink);
- devlink_reload_enable(devlink);
return 0;

err_params_unregister:
@@ -4135,7 +4134,6 @@ static void mlx4_remove_one(struct pci_dev *pdev)
struct devlink *devlink = priv_to_devlink(priv);
int active_vfs = 0;

- devlink_reload_disable(devlink);
devlink_unregister(devlink);

if (mlx4_is_slave(dev))
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 5893fdd5aedb..65313448a47c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1538,8 +1538,6 @@ static int probe_one(struct pci_dev *pdev, const struct pci_device_id *id)

pci_save_state(pdev);
devlink_register(devlink);
- if (!mlx5_core_is_mp_slave(dev))
- devlink_reload_enable(devlink);
return 0;

err_init_one:
@@ -1559,7 +1557,6 @@ static void remove_one(struct pci_dev *pdev)
struct mlx5_core_dev *dev = pci_get_drvdata(pdev);
struct devlink *devlink = priv_to_devlink(dev);

- devlink_reload_disable(devlink);
devlink_unregister(devlink);
mlx5_crdump_disable(dev);
mlx5_drain_health_wq(dev);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
index 3cf272fa2164..7b4783ce213e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/sf/dev/driver.c
@@ -47,7 +47,6 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
goto init_one_err;
}
devlink_register(devlink);
- devlink_reload_enable(devlink);
return 0;

init_one_err:
@@ -62,10 +61,8 @@ static int mlx5_sf_dev_probe(struct auxiliary_device *adev, const struct auxilia
static void mlx5_sf_dev_remove(struct auxiliary_device *adev)
{
struct mlx5_sf_dev *sf_dev = container_of(adev, struct mlx5_sf_dev, adev);
- struct devlink *devlink;
+ struct devlink *devlink = priv_to_devlink(sf_dev->mdev);

- devlink = priv_to_devlink(sf_dev->mdev);
- devlink_reload_disable(devlink);
devlink_unregister(devlink);
mlx5_uninit_one(sf_dev->mdev);
iounmap(sf_dev->mdev->iseg);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 9e831e8b607a..895b3ba88e45 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -2007,11 +2007,8 @@ __mlxsw_core_bus_device_register(const struct mlxsw_bus_info *mlxsw_bus_info,
goto err_driver_init;
}

- if (!reload) {
+ if (!reload)
devlink_register(devlink);
- devlink_reload_enable(devlink);
- }
-
return 0;

err_driver_init:
@@ -2075,10 +2072,9 @@ void mlxsw_core_bus_device_unregister(struct mlxsw_core *mlxsw_core,
{
struct devlink *devlink = priv_to_devlink(mlxsw_core);

- if (!reload) {
- devlink_reload_disable(devlink);
+ if (!reload)
devlink_unregister(devlink);
- }
+
if (devlink_is_reload_failed(devlink)) {
if (!reload)
/* Only the parts that were not de-initialized in the
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index cb6645012a30..09e48fb232a9 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)

nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
devlink_register(devlink);
- devlink_reload_enable(devlink);
return 0;

err_psample_exit:
@@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
struct devlink *devlink = priv_to_devlink(nsim_dev);

- devlink_reload_disable(devlink);
devlink_unregister(devlink);
-
nsim_dev_reload_destroy(nsim_dev);

nsim_bpf_dev_exit(nsim_dev);
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 320146d95fb8..23355fd92553 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1523,8 +1523,6 @@ static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
void devlink_set_ops(struct devlink *devlink, const struct devlink_ops *ops);
void devlink_register(struct devlink *devlink);
void devlink_unregister(struct devlink *devlink);
-void devlink_reload_enable(struct devlink *devlink);
-void devlink_reload_disable(struct devlink *devlink);
void devlink_free(struct devlink *devlink);
int devlink_port_register(struct devlink *devlink,
struct devlink_port *devlink_port,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 25c2aa2b35cd..b45bdba9775d 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -62,8 +62,7 @@ struct devlink {
* port, sb, dpipe, resource, params, region, traps and more.
*/
struct mutex lock;
- u8 reload_failed:1,
- reload_enabled:1;
+ u8 reload_failed:1;
refcount_t refcount;
struct completion comp;
char priv[0] __aligned(NETDEV_ALIGN);
@@ -4033,9 +4032,6 @@ static int devlink_reload(struct devlink *devlink, struct net *dest_net,
struct net *curr_net;
int err;

- if (!devlink->reload_enabled)
- return -EOPNOTSUPP;
-
memcpy(remote_reload_stats, devlink->stats.remote_reload_stats,
sizeof(remote_reload_stats));

@@ -9245,49 +9241,12 @@ void devlink_unregister(struct devlink *devlink)
wait_for_completion(&devlink->comp);

mutex_lock(&devlink_mutex);
- WARN_ON(devlink_reload_supported(&devlink->ops) &&
- devlink->reload_enabled);
devlink_notify_unregister(devlink);
xa_clear_mark(&devlinks, devlink->index, DEVLINK_REGISTERED);
mutex_unlock(&devlink_mutex);
}
EXPORT_SYMBOL_GPL(devlink_unregister);

-/**
- * devlink_reload_enable - Enable reload of devlink instance
- *
- * @devlink: devlink
- *
- * Should be called at end of device initialization
- * process when reload operation is supported.
- */
-void devlink_reload_enable(struct devlink *devlink)
-{
- mutex_lock(&devlink_mutex);
- devlink->reload_enabled = true;
- mutex_unlock(&devlink_mutex);
-}
-EXPORT_SYMBOL_GPL(devlink_reload_enable);
-
-/**
- * devlink_reload_disable - Disable reload of devlink instance
- *
- * @devlink: devlink
- *
- * Should be called at the beginning of device cleanup
- * process when reload operation is supported.
- */
-void devlink_reload_disable(struct devlink *devlink)
-{
- mutex_lock(&devlink_mutex);
- /* Mutex is taken which ensures that no reload operation is in
- * progress while setting up forbidded flag.
- */
- devlink->reload_enabled = false;
- mutex_unlock(&devlink_mutex);
-}
-EXPORT_SYMBOL_GPL(devlink_reload_disable);
-
/**
* devlink_free - Free devlink instance resources
*
--
2.31.1

2021-10-03 18:41:38

by Leon Romanovsky

[permalink] [raw]
Subject: [PATCH net-next v2 4/5] net/mlx5: Register separate reload devlink ops for multiport device

From: Leon Romanovsky <[email protected]>

Mulitport slave device doesn't support devlink reload, so instead of
complicating initialization flow with devlink_reload_enable() which
will be removed in next patch, set specialized devlink ops callbacks
for reload operations.

This fixes an error when reload counters exposed (and equal zero) for
the mode that is not supported at all.

Fixes: d89ddaae1766 ("net/mlx5: Disable devlink reload for multi port slave device")
Signed-off-by: Leon Romanovsky <[email protected]>
---
drivers/net/ethernet/mellanox/mlx5/core/devlink.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
index b9a6cea03951..2b769e8c97e7 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/devlink.c
@@ -314,14 +314,17 @@ static const struct devlink_ops mlx5_devlink_ops = {
#endif
.flash_update = mlx5_devlink_flash_update,
.info_get = mlx5_devlink_info_get,
+ .trap_init = mlx5_devlink_trap_init,
+ .trap_fini = mlx5_devlink_trap_fini,
+ .trap_action_set = mlx5_devlink_trap_action_set,
+};
+
+static struct devlink_ops mlx5_devlink_reload = {
.reload_actions = BIT(DEVLINK_RELOAD_ACTION_DRIVER_REINIT) |
BIT(DEVLINK_RELOAD_ACTION_FW_ACTIVATE),
.reload_limits = BIT(DEVLINK_RELOAD_LIMIT_NO_RESET),
.reload_down = mlx5_devlink_reload_down,
.reload_up = mlx5_devlink_reload_up,
- .trap_init = mlx5_devlink_trap_init,
- .trap_fini = mlx5_devlink_trap_fini,
- .trap_action_set = mlx5_devlink_trap_action_set,
};

void mlx5_devlink_trap_report(struct mlx5_core_dev *dev, int trap_id, struct sk_buff *skb,
@@ -796,6 +799,7 @@ static void mlx5_devlink_traps_unregister(struct devlink *devlink)

int mlx5_devlink_register(struct devlink *devlink)
{
+ struct mlx5_core_dev *dev = devlink_priv(devlink);
int err;

err = devlink_params_register(devlink, mlx5_devlink_params,
@@ -813,6 +817,9 @@ int mlx5_devlink_register(struct devlink *devlink)
if (err)
goto traps_reg_err;

+ if (!mlx5_core_is_mp_slave(dev))
+ devlink_set_ops(devlink, &mlx5_devlink_reload);
+
return 0;

traps_reg_err:
--
2.31.1

2021-10-04 20:57:19

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky <[email protected]>
>
> After changes to allow dynamically set the reload_up/_down callbacks,
> we ensure that properly supported devlink ops are not accessible before
> devlink_register, which is last command in the initialization sequence.
>
> It makes devlink_reload_enable/_disable not relevant anymore and can be
> safely deleted.
>
> Signed-off-by: Leon Romanovsky <[email protected]>

[...]

> diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> index cb6645012a30..09e48fb232a9 100644
> --- a/drivers/net/netdevsim/dev.c
> +++ b/drivers/net/netdevsim/dev.c
> @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
>
> nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> devlink_register(devlink);
> - devlink_reload_enable(devlink);
> return 0;
>
> err_psample_exit:
> @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> struct devlink *devlink = priv_to_devlink(nsim_dev);
>
> - devlink_reload_disable(devlink);
> devlink_unregister(devlink);
> -
> nsim_dev_reload_destroy(nsim_dev);
>
> nsim_bpf_dev_exit(nsim_dev);

I didn't remember why devlink_reload_{enable,disable}() were added in
the first place so it was not clear to me from the commit message why
they can be removed. It is described in commit a0c76345e3d3 ("devlink:
disallow reload operation during device cleanup") with a reproducer.

Tried the reproducer with this series and I cannot reproduce the issue.
Wasn't quite sure why, but it does not seem to be related to "changes to
allow dynamically set the reload_up/_down callbacks", as this seems to
be specific to mlx5.

IIUC, the reason that the race described in above mentioned commit can
no longer happen is related to the fact that devlink_unregister() is
called first in the device dismantle path, after your previous patches.
Since both the reload operation and devlink_unregister() hold
'devlink_mutex', it is not possible for the reload operation to race
with device dismantle.

Agree? If so, I think it would be good to explain this in the commit
message unless it's clear to everyone else.

Thanks

2021-10-04 22:06:51

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky <[email protected]>
> >
> > After changes to allow dynamically set the reload_up/_down callbacks,
> > we ensure that properly supported devlink ops are not accessible before
> > devlink_register, which is last command in the initialization sequence.
> >
> > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > safely deleted.
> >
> > Signed-off-by: Leon Romanovsky <[email protected]>
>
> [...]
>
> > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > index cb6645012a30..09e48fb232a9 100644
> > --- a/drivers/net/netdevsim/dev.c
> > +++ b/drivers/net/netdevsim/dev.c
> > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> >
> > nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > devlink_register(devlink);
> > - devlink_reload_enable(devlink);
> > return 0;
> >
> > err_psample_exit:
> > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > struct devlink *devlink = priv_to_devlink(nsim_dev);
> >
> > - devlink_reload_disable(devlink);
> > devlink_unregister(devlink);
> > -
> > nsim_dev_reload_destroy(nsim_dev);
> >
> > nsim_bpf_dev_exit(nsim_dev);
>
> I didn't remember why devlink_reload_{enable,disable}() were added in
> the first place so it was not clear to me from the commit message why
> they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> disallow reload operation during device cleanup") with a reproducer.

It was added because devlink ops were accessible by the user space very
early in the driver lifetime. All my latest devlink patches are the
attempt to fix this arch/design/implementation issue.

>
> Tried the reproducer with this series and I cannot reproduce the issue.
> Wasn't quite sure why, but it does not seem to be related to "changes to
> allow dynamically set the reload_up/_down callbacks", as this seems to
> be specific to mlx5.

You didn't reproduce because of my series that moved
devlink_register()/devlink_unregister() to be last/first commands in
.probe()/.remove() flows.

Patch to allow dynamically set ops was needed because mlx5 had logic
like this:
if(something)
devlink_reload_enable()

And I needed a way to keep this if ... condition.

>
> IIUC, the reason that the race described in above mentioned commit can
> no longer happen is related to the fact that devlink_unregister() is
> called first in the device dismantle path, after your previous patches.
> Since both the reload operation and devlink_unregister() hold
> 'devlink_mutex', it is not possible for the reload operation to race
> with device dismantle.
>
> Agree? If so, I think it would be good to explain this in the commit
> message unless it's clear to everyone else.

I don't agree for very simple reason that devlink_mutex is going to be
removed very soon and it is really not a reason why devlink reload is
safer now when before.

The reload can't race due to:
1. devlink_unregister(), which works as a barrier to stop accesses
from the user space.
2. reference counting that ensures that all in-flight commands are counted.
3. wait_for_completion that blocks till all commands are done.

Thanks

>
> Thanks

2021-10-04 22:46:51

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > From: Leon Romanovsky <[email protected]>
> > >
> > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > we ensure that properly supported devlink ops are not accessible before
> > > devlink_register, which is last command in the initialization sequence.
> > >
> > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > safely deleted.
> > >
> > > Signed-off-by: Leon Romanovsky <[email protected]>
> >
> > [...]
> >
> > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > index cb6645012a30..09e48fb232a9 100644
> > > --- a/drivers/net/netdevsim/dev.c
> > > +++ b/drivers/net/netdevsim/dev.c
> > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > >
> > > nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > devlink_register(devlink);
> > > - devlink_reload_enable(devlink);
> > > return 0;
> > >
> > > err_psample_exit:
> > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > struct devlink *devlink = priv_to_devlink(nsim_dev);
> > >
> > > - devlink_reload_disable(devlink);
> > > devlink_unregister(devlink);
> > > -
> > > nsim_dev_reload_destroy(nsim_dev);
> > >
> > > nsim_bpf_dev_exit(nsim_dev);
> >
> > I didn't remember why devlink_reload_{enable,disable}() were added in
> > the first place so it was not clear to me from the commit message why
> > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > disallow reload operation during device cleanup") with a reproducer.
>
> It was added because devlink ops were accessible by the user space very
> early in the driver lifetime. All my latest devlink patches are the
> attempt to fix this arch/design/implementation issue.

The reproducer in the commit message executed the reload after the
device was fully initialized. IIRC, the problem there was that nothing
prevented these two tasks from racing:

devlink dev reload netdevsim/netdevsim10
echo 10 > /sys/bus/netdevsim/del_device

The title also talks about forbidding reload during device cleanup.

>
> >
> > Tried the reproducer with this series and I cannot reproduce the issue.
> > Wasn't quite sure why, but it does not seem to be related to "changes to
> > allow dynamically set the reload_up/_down callbacks", as this seems to
> > be specific to mlx5.
>
> You didn't reproduce because of my series that moved
> devlink_register()/devlink_unregister() to be last/first commands in
> .probe()/.remove() flows.

Agree, that is what I wrote in the next paragraph of my reply.

>
> Patch to allow dynamically set ops was needed because mlx5 had logic
> like this:
> if(something)
> devlink_reload_enable()
>
> And I needed a way to keep this if ... condition.
>
> >
> > IIUC, the reason that the race described in above mentioned commit can
> > no longer happen is related to the fact that devlink_unregister() is
> > called first in the device dismantle path, after your previous patches.
> > Since both the reload operation and devlink_unregister() hold
> > 'devlink_mutex', it is not possible for the reload operation to race
> > with device dismantle.
> >
> > Agree? If so, I think it would be good to explain this in the commit
> > message unless it's clear to everyone else.
>
> I don't agree for very simple reason that devlink_mutex is going to be
> removed very soon and it is really not a reason why devlink reload is
> safer now when before.
>
> The reload can't race due to:
> 1. devlink_unregister(), which works as a barrier to stop accesses
> from the user space.
> 2. reference counting that ensures that all in-flight commands are counted.
> 3. wait_for_completion that blocks till all commands are done.

So the wait_for_completion() is what prevents the race, not
'devlink_mutex' that is taken later. This needs to be explained in the
commit message to make it clear why the removal is safe.

Thanks

2021-10-04 23:32:33

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <[email protected]>
> > > >
> > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > we ensure that properly supported devlink ops are not accessible before
> > > > devlink_register, which is last command in the initialization sequence.
> > > >
> > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > safely deleted.
> > > >
> > > > Signed-off-by: Leon Romanovsky <[email protected]>
> > >
> > > [...]
> > >
> > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > index cb6645012a30..09e48fb232a9 100644
> > > > --- a/drivers/net/netdevsim/dev.c
> > > > +++ b/drivers/net/netdevsim/dev.c
> > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > >
> > > > nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > > devlink_register(devlink);
> > > > - devlink_reload_enable(devlink);
> > > > return 0;
> > > >
> > > > err_psample_exit:
> > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > > struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > > struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > >
> > > > - devlink_reload_disable(devlink);
> > > > devlink_unregister(devlink);
> > > > -
> > > > nsim_dev_reload_destroy(nsim_dev);
> > > >
> > > > nsim_bpf_dev_exit(nsim_dev);
> > >
> > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > the first place so it was not clear to me from the commit message why
> > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > disallow reload operation during device cleanup") with a reproducer.
> >
> > It was added because devlink ops were accessible by the user space very
> > early in the driver lifetime. All my latest devlink patches are the
> > attempt to fix this arch/design/implementation issue.
>
> The reproducer in the commit message executed the reload after the
> device was fully initialized. IIRC, the problem there was that nothing
> prevented these two tasks from racing:
>
> devlink dev reload netdevsim/netdevsim10
> echo 10 > /sys/bus/netdevsim/del_device
>
> The title also talks about forbidding reload during device cleanup.

It is incomplete title and reproducer. In our verification, we observed
more than 40 bugs related to devlink reload flows and races around it.

>
> >
> > >
> > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > be specific to mlx5.
> >
> > You didn't reproduce because of my series that moved
> > devlink_register()/devlink_unregister() to be last/first commands in
> > .probe()/.remove() flows.
>
> Agree, that is what I wrote in the next paragraph of my reply.
>
> >
> > Patch to allow dynamically set ops was needed because mlx5 had logic
> > like this:
> > if(something)
> > devlink_reload_enable()
> >
> > And I needed a way to keep this if ... condition.
> >
> > >
> > > IIUC, the reason that the race described in above mentioned commit can
> > > no longer happen is related to the fact that devlink_unregister() is
> > > called first in the device dismantle path, after your previous patches.
> > > Since both the reload operation and devlink_unregister() hold
> > > 'devlink_mutex', it is not possible for the reload operation to race
> > > with device dismantle.
> > >
> > > Agree? If so, I think it would be good to explain this in the commit
> > > message unless it's clear to everyone else.
> >
> > I don't agree for very simple reason that devlink_mutex is going to be
> > removed very soon and it is really not a reason why devlink reload is
> > safer now when before.
> >
> > The reload can't race due to:
> > 1. devlink_unregister(), which works as a barrier to stop accesses
> > from the user space.
> > 2. reference counting that ensures that all in-flight commands are counted.
> > 3. wait_for_completion that blocks till all commands are done.
>
> So the wait_for_completion() is what prevents the race, not
> 'devlink_mutex' that is taken later. This needs to be explained in the
> commit message to make it clear why the removal is safe.

Can you please suggest what exactly should I write in the commit message
to make it clear?

I'm too much into this delvink stuff already and for me this patch is
trivial. IMHO, that change doesn't need an explanation at all because
coding pattern of refcount + wait_for_completion is pretty common in the
kernel. So I think that I explained good enough: move of
devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.

I have no problem to update the commit message, just help me with the
message.

Thanks

>
> Thanks

2021-10-04 23:57:09

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Sun, 3 Oct 2021 21:12:04 +0300 Leon Romanovsky wrote:
> From: Leon Romanovsky <[email protected]>
>
> Introduce new devlink call to set specific ops callback during
> device initialization phase after devlink_alloc() is already
> called.
>
> This allows us to set specific ops based on device property which
> is not known at the beginning of driver initialization.
>
> For the sake of simplicity, this API lacks any type of locking and
> needs to be called before devlink_register() to make sure that no
> parallel access to the ops is possible at this stage.

The fact that it's not registered does not mean that the callbacks
won't be invoked. Look at uses of devlink_compat_flash_update().

> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index 4e484afeadea..25c2aa2b35cd 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -53,7 +53,7 @@ struct devlink {
> struct list_head trap_list;
> struct list_head trap_group_list;
> struct list_head trap_policer_list;
> - const struct devlink_ops *ops;
> + struct devlink_ops ops;

Security people like ops to live in read-only memory. You're making
them r/w for every devlink instance now.

> struct xarray snapshot_ids;
> struct devlink_dev_stats stats;
> struct device *dev;

> +/**
> + * devlink_set_ops - Set devlink ops dynamically
> + *
> + * @devlink: devlink
> + * @ops: devlink ops to set
> + *
> + * This interface allows us to set ops based on device property
> + * which is known after devlink_alloc() was already called.
> + *
> + * This call sets fields that are not initialized yet and ignores
> + * already set fields.
> + *
> + * It should be called before devlink_register(), so doesn't have any
> + * protection from concurent access.
> + */
> +void devlink_set_ops(struct devlink *devlink, const struct devlink_ops *ops)
> +{
> + struct devlink_ops *dev_ops = &devlink->ops;
> +
> + WARN_ON(!devlink_reload_actions_valid(ops));
> + ASSERT_DEVLINK_NOT_REGISTERED(devlink);
> +
> +#define SET_DEVICE_OP(ptr, op, name) \
> + do { \
> + if ((op)->name) \
> + if (!((ptr)->name)) \
> + (ptr)->name = (op)->name; \
> + } while (0)
> +
> + /* Keep sorte alphabetically for readability */
> + SET_DEVICE_OP(dev_ops, ops, eswitch_encap_mode_get);
> + SET_DEVICE_OP(dev_ops, ops, eswitch_encap_mode_set);
> + SET_DEVICE_OP(dev_ops, ops, eswitch_inline_mode_get);
> + SET_DEVICE_OP(dev_ops, ops, eswitch_inline_mode_set);
> + SET_DEVICE_OP(dev_ops, ops, eswitch_mode_get);
> + SET_DEVICE_OP(dev_ops, ops, eswitch_mode_set);
> + SET_DEVICE_OP(dev_ops, ops, flash_update);
> + SET_DEVICE_OP(dev_ops, ops, info_get);
> + SET_DEVICE_OP(dev_ops, ops, port_del);
> + SET_DEVICE_OP(dev_ops, ops, port_fn_state_get);
> + SET_DEVICE_OP(dev_ops, ops, port_fn_state_set);
> + SET_DEVICE_OP(dev_ops, ops, port_function_hw_addr_get);
> + SET_DEVICE_OP(dev_ops, ops, port_function_hw_addr_set);
> + SET_DEVICE_OP(dev_ops, ops, port_new);
> + SET_DEVICE_OP(dev_ops, ops, port_split);
> + SET_DEVICE_OP(dev_ops, ops, port_type_set);
> + SET_DEVICE_OP(dev_ops, ops, port_unsplit);
> + SET_DEVICE_OP(dev_ops, ops, rate_leaf_parent_set);
> + SET_DEVICE_OP(dev_ops, ops, rate_leaf_tx_max_set);
> + SET_DEVICE_OP(dev_ops, ops, rate_leaf_tx_share_set);
> + SET_DEVICE_OP(dev_ops, ops, rate_node_del);
> + SET_DEVICE_OP(dev_ops, ops, rate_node_new);
> + SET_DEVICE_OP(dev_ops, ops, rate_node_parent_set);
> + SET_DEVICE_OP(dev_ops, ops, rate_node_tx_max_set);
> + SET_DEVICE_OP(dev_ops, ops, rate_node_tx_share_set);
> + SET_DEVICE_OP(dev_ops, ops, reload_actions);
> + SET_DEVICE_OP(dev_ops, ops, reload_down);
> + SET_DEVICE_OP(dev_ops, ops, reload_limits);
> + SET_DEVICE_OP(dev_ops, ops, reload_up);
> + SET_DEVICE_OP(dev_ops, ops, sb_occ_max_clear);
> + SET_DEVICE_OP(dev_ops, ops, sb_occ_port_pool_get);
> + SET_DEVICE_OP(dev_ops, ops, sb_occ_snapshot);
> + SET_DEVICE_OP(dev_ops, ops, sb_occ_tc_port_bind_get);
> + SET_DEVICE_OP(dev_ops, ops, sb_pool_get);
> + SET_DEVICE_OP(dev_ops, ops, sb_pool_set);
> + SET_DEVICE_OP(dev_ops, ops, sb_port_pool_get);
> + SET_DEVICE_OP(dev_ops, ops, sb_port_pool_set);
> + SET_DEVICE_OP(dev_ops, ops, sb_tc_pool_bind_get);
> + SET_DEVICE_OP(dev_ops, ops, sb_tc_pool_bind_set);
> + SET_DEVICE_OP(dev_ops, ops, supported_flash_update_params);
> + SET_DEVICE_OP(dev_ops, ops, trap_action_set);
> + SET_DEVICE_OP(dev_ops, ops, trap_drop_counter_get);
> + SET_DEVICE_OP(dev_ops, ops, trap_fini);
> + SET_DEVICE_OP(dev_ops, ops, trap_group_action_set);
> + SET_DEVICE_OP(dev_ops, ops, trap_group_init);
> + SET_DEVICE_OP(dev_ops, ops, trap_group_set);
> + SET_DEVICE_OP(dev_ops, ops, trap_init);
> + SET_DEVICE_OP(dev_ops, ops, trap_policer_counter_get);
> + SET_DEVICE_OP(dev_ops, ops, trap_policer_fini);
> + SET_DEVICE_OP(dev_ops, ops, trap_policer_init);
> + SET_DEVICE_OP(dev_ops, ops, trap_policer_set);
> +
> +#undef SET_DEVICE_OP
> +}
> +EXPORT_SYMBOL_GPL(devlink_set_ops);

I still don't like this. IMO using feature bits to dynamically mask-off
capabilities has much better properties. We already have static caps
in devlink_ops (first 3 members), we should build on top of that.

2021-10-04 23:57:39

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v2 1/5] devlink: Reduce struct devlink exposure

On Sun, 3 Oct 2021 21:12:02 +0300 Leon Romanovsky wrote:
> From: Leon Romanovsky <[email protected]>
>
> The declaration of struct devlink in general header provokes the
> situation where internal fields can be accidentally used by the driver
> authors. In order to reduce such possible situations, let's reduce the
> namespace exposure of struct devlink.
>
> Signed-off-by: Leon Romanovsky <[email protected]>

100% subjective but every time I decided to hide a structure definition
like this I came to regret it later. The fact there is only one minor
infraction in drivers poking at members seems to prove this is not in
fact needed.

2021-10-05 06:13:54

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Mon, Oct 04, 2021 at 10:02:06PM +0300, Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> > On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > > From: Leon Romanovsky <[email protected]>
> > > > >
> > > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > > we ensure that properly supported devlink ops are not accessible before
> > > > > devlink_register, which is last command in the initialization sequence.
> > > > >
> > > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > > safely deleted.
> > > > >
> > > > > Signed-off-by: Leon Romanovsky <[email protected]>
> > > >
> > > > [...]
> > > >
> > > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > > index cb6645012a30..09e48fb232a9 100644
> > > > > --- a/drivers/net/netdevsim/dev.c
> > > > > +++ b/drivers/net/netdevsim/dev.c
> > > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > > >
> > > > > nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > > > devlink_register(devlink);
> > > > > - devlink_reload_enable(devlink);
> > > > > return 0;
> > > > >
> > > > > err_psample_exit:
> > > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > > > struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > > > struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > > >
> > > > > - devlink_reload_disable(devlink);
> > > > > devlink_unregister(devlink);
> > > > > -
> > > > > nsim_dev_reload_destroy(nsim_dev);
> > > > >
> > > > > nsim_bpf_dev_exit(nsim_dev);
> > > >
> > > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > > the first place so it was not clear to me from the commit message why
> > > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > > disallow reload operation during device cleanup") with a reproducer.
> > >
> > > It was added because devlink ops were accessible by the user space very
> > > early in the driver lifetime. All my latest devlink patches are the
> > > attempt to fix this arch/design/implementation issue.
> >
> > The reproducer in the commit message executed the reload after the
> > device was fully initialized. IIRC, the problem there was that nothing
> > prevented these two tasks from racing:
> >
> > devlink dev reload netdevsim/netdevsim10
> > echo 10 > /sys/bus/netdevsim/del_device
> >
> > The title also talks about forbidding reload during device cleanup.
>
> It is incomplete title and reproducer.

How can the reproducer be incomplete when it reproduced the issue 100%
of the time?

> In our verification, we observed more than 40 bugs related to devlink
> reload flows and races around it.

I assume these bugs are related to mlx5. syzkaller is familiar with the
devlink messages [1] and we are using it to fuzz over both mlxsw and
netdevsim. syzbot is also fuzzing over netdevsim and I'm not aware of
any open bugs.

[1] https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink_generic_devlink.txt

>
> >
> > >
> > > >
> > > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > > be specific to mlx5.
> > >
> > > You didn't reproduce because of my series that moved
> > > devlink_register()/devlink_unregister() to be last/first commands in
> > > .probe()/.remove() flows.
> >
> > Agree, that is what I wrote in the next paragraph of my reply.
> >
> > >
> > > Patch to allow dynamically set ops was needed because mlx5 had logic
> > > like this:
> > > if(something)
> > > devlink_reload_enable()
> > >
> > > And I needed a way to keep this if ... condition.
> > >
> > > >
> > > > IIUC, the reason that the race described in above mentioned commit can
> > > > no longer happen is related to the fact that devlink_unregister() is
> > > > called first in the device dismantle path, after your previous patches.
> > > > Since both the reload operation and devlink_unregister() hold
> > > > 'devlink_mutex', it is not possible for the reload operation to race
> > > > with device dismantle.
> > > >
> > > > Agree? If so, I think it would be good to explain this in the commit
> > > > message unless it's clear to everyone else.
> > >
> > > I don't agree for very simple reason that devlink_mutex is going to be
> > > removed very soon and it is really not a reason why devlink reload is
> > > safer now when before.
> > >
> > > The reload can't race due to:
> > > 1. devlink_unregister(), which works as a barrier to stop accesses
> > > from the user space.
> > > 2. reference counting that ensures that all in-flight commands are counted.
> > > 3. wait_for_completion that blocks till all commands are done.
> >
> > So the wait_for_completion() is what prevents the race, not
> > 'devlink_mutex' that is taken later. This needs to be explained in the
> > commit message to make it clear why the removal is safe.
>
> Can you please suggest what exactly should I write in the commit message
> to make it clear?
>
> I'm too much into this delvink stuff already and for me this patch is
> trivial. IMHO, that change doesn't need an explanation at all because
> coding pattern of refcount + wait_for_completion is pretty common in the
> kernel. So I think that I explained good enough: move of
> devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.
>
> I have no problem to update the commit message, just help me with the
> message.

I suggest something like:

"
Commit a0c76345e3d3 ("devlink: disallow reload operation during device
cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
operation from racing with device probe / dismantle.

After recent changes to move devlink_register() to the end of device
probe and devlink_unregister() to the beginning of device dismantle,
these races can no longer happen. Reload operations will be denied if
the devlink instance is unregistered and devlink_unregister() will block
until all in-flight operations are done.

Therefore, remove these devlink_reload_{enable,disable}() APIs. Tested
with the reproducer mentioned in cited commit.
"

2021-10-05 06:14:35

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 1/5] devlink: Reduce struct devlink exposure

On Mon, Oct 04, 2021 at 04:38:08PM -0700, Jakub Kicinski wrote:
> On Sun, 3 Oct 2021 21:12:02 +0300 Leon Romanovsky wrote:
> > From: Leon Romanovsky <[email protected]>
> >
> > The declaration of struct devlink in general header provokes the
> > situation where internal fields can be accidentally used by the driver
> > authors. In order to reduce such possible situations, let's reduce the
> > namespace exposure of struct devlink.
> >
> > Signed-off-by: Leon Romanovsky <[email protected]>
>
> 100% subjective but every time I decided to hide a structure definition
> like this I came to regret it later. The fact there is only one minor
> infraction in drivers poking at members seems to prove this is not in
> fact needed.

Yes, it is subjective, my experience is completely opposite :). Every
time the internals were exposed, they were abused.

IMHO, the one user that poked into the struct devlink internals is a pure
luck together with lack of devlink adoption outside of the netdev which
limited number of devlink API users. The more devlink will be used, the
more creative usage will be.

For example, ionic had internal logic based on internal devlink_port state:
* c2255ff47768 ("ionic: cleanly release devlink instance")
* d7907a2b1a3b ("devlink: Remove duplicated registration check")

However, this patch was written not because of having right software
abstraction, but because of the next patch, where I needed to have
declaration of "struct devlink_ops" before struct devlink itself.

Without this patch, I would need to heavily reshuffle include/net/devlink.h
to have structs declarations written in different order. So a lot of
churn for something that needs to be fixed anyway (in my opinion).

Thanks

2021-10-05 07:34:31

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Mon, Oct 04, 2021 at 04:44:13PM -0700, Jakub Kicinski wrote:
> On Sun, 3 Oct 2021 21:12:04 +0300 Leon Romanovsky wrote:
> > From: Leon Romanovsky <[email protected]>
> >
> > Introduce new devlink call to set specific ops callback during
> > device initialization phase after devlink_alloc() is already
> > called.
> >
> > This allows us to set specific ops based on device property which
> > is not known at the beginning of driver initialization.
> >
> > For the sake of simplicity, this API lacks any type of locking and
> > needs to be called before devlink_register() to make sure that no
> > parallel access to the ops is possible at this stage.
>
> The fact that it's not registered does not mean that the callbacks
> won't be invoked. Look at uses of devlink_compat_flash_update().

It is impossible, devlink_register() is part of .probe() flow and if it
wasn't called -> probe didn't success -> net_device doesn't exist.

We are not having net_device without "connected" device beneath, aren't we?

At least drivers that I checked are not prepared at all to handle call
to devlink->ops.flash_update() if they didn't probe successfully.

>
> > diff --git a/net/core/devlink.c b/net/core/devlink.c
> > index 4e484afeadea..25c2aa2b35cd 100644
> > --- a/net/core/devlink.c
> > +++ b/net/core/devlink.c
> > @@ -53,7 +53,7 @@ struct devlink {
> > struct list_head trap_list;
> > struct list_head trap_group_list;
> > struct list_head trap_policer_list;
> > - const struct devlink_ops *ops;
> > + struct devlink_ops ops;
>
> Security people like ops to live in read-only memory. You're making
> them r/w for every devlink instance now.

Yes, but we are explicitly copy every function pointer, which is safe.

>
> > struct xarray snapshot_ids;
> > struct devlink_dev_stats stats;
> > struct device *dev;
>
> > +/**
> > + * devlink_set_ops - Set devlink ops dynamically
> > + *
> > + * @devlink: devlink
> > + * @ops: devlink ops to set
> > + *
> > + * This interface allows us to set ops based on device property
> > + * which is known after devlink_alloc() was already called.
> > + *
> > + * This call sets fields that are not initialized yet and ignores
> > + * already set fields.
> > + *
> > + * It should be called before devlink_register(), so doesn't have any
> > + * protection from concurent access.
> > + */
> > +void devlink_set_ops(struct devlink *devlink, const struct devlink_ops *ops)
> > +{
> > + struct devlink_ops *dev_ops = &devlink->ops;
> > +
> > + WARN_ON(!devlink_reload_actions_valid(ops));
> > + ASSERT_DEVLINK_NOT_REGISTERED(devlink);

<...>

> > +EXPORT_SYMBOL_GPL(devlink_set_ops);
>
> I still don't like this. IMO using feature bits to dynamically mask-off
> capabilities has much better properties. We already have static caps
> in devlink_ops (first 3 members), we should build on top of that.

These capabilities are for specific operation, like flash or reload.
They control how these flows will work, they don't control if this flow
is valid or not.

You are too focused on reload caps, but mutliport mlx5 device doesn't
support eswitch too. I just didn't remove the eswitch callbacks to
stay focused on more important work - making devlink better. :)

Even if we decide to use new flag in devlink_ops, we will still need to
add this devlink_set_ops() patch, because the value of that new flag
will be known very late in initialization phase, after FW capabilities
are known and I will need to overwrite RO memory.

Jakub,

Can we please continue with the current approach? It doesn't expose any
user visible API and everything here will be easy rewrite differently
if such needs arise.

We have so much ahead, like removing devlink_lock, rewriting devlink->lock,
fixing devlink reload of IB part, e.t.c

Thanks

2021-10-05 07:43:41

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Tue, Oct 05, 2021 at 09:10:15AM +0300, Ido Schimmel wrote:
> On Mon, Oct 04, 2021 at 10:02:06PM +0300, Leon Romanovsky wrote:
> > On Mon, Oct 04, 2021 at 07:54:02PM +0300, Ido Schimmel wrote:
> > > On Mon, Oct 04, 2021 at 06:45:07PM +0300, Leon Romanovsky wrote:
> > > > On Mon, Oct 04, 2021 at 05:19:40PM +0300, Ido Schimmel wrote:
> > > > > On Sun, Oct 03, 2021 at 09:12:06PM +0300, Leon Romanovsky wrote:
> > > > > > From: Leon Romanovsky <[email protected]>
> > > > > >
> > > > > > After changes to allow dynamically set the reload_up/_down callbacks,
> > > > > > we ensure that properly supported devlink ops are not accessible before
> > > > > > devlink_register, which is last command in the initialization sequence.
> > > > > >
> > > > > > It makes devlink_reload_enable/_disable not relevant anymore and can be
> > > > > > safely deleted.
> > > > > >
> > > > > > Signed-off-by: Leon Romanovsky <[email protected]>
> > > > >
> > > > > [...]
> > > > >
> > > > > > diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
> > > > > > index cb6645012a30..09e48fb232a9 100644
> > > > > > --- a/drivers/net/netdevsim/dev.c
> > > > > > +++ b/drivers/net/netdevsim/dev.c
> > > > > > @@ -1512,7 +1512,6 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
> > > > > >
> > > > > > nsim_dev->esw_mode = DEVLINK_ESWITCH_MODE_LEGACY;
> > > > > > devlink_register(devlink);
> > > > > > - devlink_reload_enable(devlink);
> > > > > > return 0;
> > > > > >
> > > > > > err_psample_exit:
> > > > > > @@ -1566,9 +1565,7 @@ void nsim_dev_remove(struct nsim_bus_dev *nsim_bus_dev)
> > > > > > struct nsim_dev *nsim_dev = dev_get_drvdata(&nsim_bus_dev->dev);
> > > > > > struct devlink *devlink = priv_to_devlink(nsim_dev);
> > > > > >
> > > > > > - devlink_reload_disable(devlink);
> > > > > > devlink_unregister(devlink);
> > > > > > -
> > > > > > nsim_dev_reload_destroy(nsim_dev);
> > > > > >
> > > > > > nsim_bpf_dev_exit(nsim_dev);
> > > > >
> > > > > I didn't remember why devlink_reload_{enable,disable}() were added in
> > > > > the first place so it was not clear to me from the commit message why
> > > > > they can be removed. It is described in commit a0c76345e3d3 ("devlink:
> > > > > disallow reload operation during device cleanup") with a reproducer.
> > > >
> > > > It was added because devlink ops were accessible by the user space very
> > > > early in the driver lifetime. All my latest devlink patches are the
> > > > attempt to fix this arch/design/implementation issue.
> > >
> > > The reproducer in the commit message executed the reload after the
> > > device was fully initialized. IIRC, the problem there was that nothing
> > > prevented these two tasks from racing:
> > >
> > > devlink dev reload netdevsim/netdevsim10
> > > echo 10 > /sys/bus/netdevsim/del_device
> > >
> > > The title also talks about forbidding reload during device cleanup.
> >
> > It is incomplete title and reproducer.
>
> How can the reproducer be incomplete when it reproduced the issue 100%
> of the time?

Incomplete in the sense that other reproducers exists.
Our internally famous one is module load/reload together with devlink
reload. More complex includes PCI errors, health recover e.t.c.

>
> > In our verification, we observed more than 40 bugs related to devlink
> > reload flows and races around it.
>
> I assume these bugs are related to mlx5. syzkaller is familiar with the
> devlink messages [1] and we are using it to fuzz over both mlxsw and
> netdevsim. syzbot is also fuzzing over netdevsim and I'm not aware of
> any open bugs.
>
> [1] https://github.com/google/syzkaller/blob/master/sys/linux/socket_netlink_generic_devlink.txt

We don't know what we don't know.

>
> >
> > >
> > > >
> > > > >
> > > > > Tried the reproducer with this series and I cannot reproduce the issue.
> > > > > Wasn't quite sure why, but it does not seem to be related to "changes to
> > > > > allow dynamically set the reload_up/_down callbacks", as this seems to
> > > > > be specific to mlx5.
> > > >
> > > > You didn't reproduce because of my series that moved
> > > > devlink_register()/devlink_unregister() to be last/first commands in
> > > > .probe()/.remove() flows.
> > >
> > > Agree, that is what I wrote in the next paragraph of my reply.
> > >
> > > >
> > > > Patch to allow dynamically set ops was needed because mlx5 had logic
> > > > like this:
> > > > if(something)
> > > > devlink_reload_enable()
> > > >
> > > > And I needed a way to keep this if ... condition.
> > > >
> > > > >
> > > > > IIUC, the reason that the race described in above mentioned commit can
> > > > > no longer happen is related to the fact that devlink_unregister() is
> > > > > called first in the device dismantle path, after your previous patches.
> > > > > Since both the reload operation and devlink_unregister() hold
> > > > > 'devlink_mutex', it is not possible for the reload operation to race
> > > > > with device dismantle.
> > > > >
> > > > > Agree? If so, I think it would be good to explain this in the commit
> > > > > message unless it's clear to everyone else.
> > > >
> > > > I don't agree for very simple reason that devlink_mutex is going to be
> > > > removed very soon and it is really not a reason why devlink reload is
> > > > safer now when before.
> > > >
> > > > The reload can't race due to:
> > > > 1. devlink_unregister(), which works as a barrier to stop accesses
> > > > from the user space.
> > > > 2. reference counting that ensures that all in-flight commands are counted.
> > > > 3. wait_for_completion that blocks till all commands are done.
> > >
> > > So the wait_for_completion() is what prevents the race, not
> > > 'devlink_mutex' that is taken later. This needs to be explained in the
> > > commit message to make it clear why the removal is safe.
> >
> > Can you please suggest what exactly should I write in the commit message
> > to make it clear?
> >
> > I'm too much into this delvink stuff already and for me this patch is
> > trivial. IMHO, that change doesn't need an explanation at all because
> > coding pattern of refcount + wait_for_completion is pretty common in the
> > kernel. So I think that I explained good enough: move of
> > devlink_register/devlink_unregister obsoletes the devlink_reload_* APIs.
> >
> > I have no problem to update the commit message, just help me with the
> > message.
>
> I suggest something like:
>
> "
> Commit a0c76345e3d3 ("devlink: disallow reload operation during device
> cleanup") added devlink_reload_{enable,disable}() APIs to prevent reload
> operation from racing with device probe / dismantle.
>
> After recent changes to move devlink_register() to the end of device
> probe and devlink_unregister() to the beginning of device dismantle,
> these races can no longer happen. Reload operations will be denied if
> the devlink instance is unregistered and devlink_unregister() will block
> until all in-flight operations are done.
>
> Therefore, remove these devlink_reload_{enable,disable}() APIs. Tested
> with the reproducer mentioned in cited commit.
> "

Sure, thanks.
Can I added your TOB to the patch?

2021-10-05 08:19:41

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH net-next v2 5/5] devlink: Delete reload enable/disable interface

On Tue, Oct 05, 2021 at 10:40:58AM +0300, Leon Romanovsky wrote:
> Can I added your TOB to the patch?

Yes

Tested-by: Ido Schimmel <[email protected]>

2021-10-05 18:34:01

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Tue, 5 Oct 2021 10:32:45 +0300 Leon Romanovsky wrote:
> On Mon, Oct 04, 2021 at 04:44:13PM -0700, Jakub Kicinski wrote:
> > On Sun, 3 Oct 2021 21:12:04 +0300 Leon Romanovsky wrote:
> > > From: Leon Romanovsky <[email protected]>
> > >
> > > Introduce new devlink call to set specific ops callback during
> > > device initialization phase after devlink_alloc() is already
> > > called.
> > >
> > > This allows us to set specific ops based on device property which
> > > is not known at the beginning of driver initialization.
> > >
> > > For the sake of simplicity, this API lacks any type of locking and
> > > needs to be called before devlink_register() to make sure that no
> > > parallel access to the ops is possible at this stage.
> >
> > The fact that it's not registered does not mean that the callbacks
> > won't be invoked. Look at uses of devlink_compat_flash_update().
>
> It is impossible, devlink_register() is part of .probe() flow and if it
> wasn't called -> probe didn't success -> net_device doesn't exist.

Are you talking about reality or the bright future brought by auxbus?

> We are not having net_device without "connected" device beneath, aren't we?
>
> At least drivers that I checked are not prepared at all to handle call
> to devlink->ops.flash_update() if they didn't probe successfully.

Last time I checked you moved the devlink_register() at the end of
probe which for all no-auxbus drivers means after register_netdev().

> > > diff --git a/net/core/devlink.c b/net/core/devlink.c
> > > index 4e484afeadea..25c2aa2b35cd 100644
> > > --- a/net/core/devlink.c
> > > +++ b/net/core/devlink.c
> > > @@ -53,7 +53,7 @@ struct devlink {
> > > struct list_head trap_list;
> > > struct list_head trap_group_list;
> > > struct list_head trap_policer_list;
> > > - const struct devlink_ops *ops;
> > > + struct devlink_ops ops;
> >
> > Security people like ops to live in read-only memory. You're making
> > them r/w for every devlink instance now.
>
> Yes, but we are explicitly copy every function pointer, which is safe.

The goal is for ops to live in pages which are mapped read-only,
so that heap overflows can overwrite the pointers.

> > > struct xarray snapshot_ids;
> > > struct devlink_dev_stats stats;
> > > struct device *dev;

> > > +EXPORT_SYMBOL_GPL(devlink_set_ops);
> >
> > I still don't like this. IMO using feature bits to dynamically mask-off
> > capabilities has much better properties. We already have static caps
> > in devlink_ops (first 3 members), we should build on top of that.
>
> These capabilities are for specific operation, like flash or reload.
> They control how these flows will work, they don't control if this flow
> is valid or not.
>
> You are too focused on reload caps, but mutliport mlx5 device doesn't
> support eswitch too. I just didn't remove the eswitch callbacks to
> stay focused on more important work - making devlink better. :)
>
> Even if we decide to use new flag in devlink_ops, we will still need to
> add this devlink_set_ops() patch, because the value of that new flag
> will be known very late in initialization phase, after FW capabilities
> are known and I will need to overwrite RO memory.

Yes, you can change the caps at run time, that's perfectly reasonable.
You'll also be able to define more fine grained caps going forward as
needed.

> Jakub,
>
> Can we please continue with the current approach? It doesn't expose any
> user visible API and everything here will be easy rewrite differently
> if such needs arise.
>
> We have so much ahead, like removing devlink_lock, rewriting devlink->lock,
> fixing devlink reload of IB part, e.t.c

I don't like it. If you're feeling strongly please gather support of
other developers. Right now it's my preference against yours. I don't
even see you making arguments that your approach is better, just that
mine is not perfect and requires some similar changes.

2021-10-05 19:20:03

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Tue, Oct 05, 2021 at 11:32:13AM -0700, Jakub Kicinski wrote:
> On Tue, 5 Oct 2021 10:32:45 +0300 Leon Romanovsky wrote:
> > On Mon, Oct 04, 2021 at 04:44:13PM -0700, Jakub Kicinski wrote:
> > > On Sun, 3 Oct 2021 21:12:04 +0300 Leon Romanovsky wrote:
> > > > From: Leon Romanovsky <[email protected]>
> > > >
> > > > Introduce new devlink call to set specific ops callback during
> > > > device initialization phase after devlink_alloc() is already
> > > > called.
> > > >
> > > > This allows us to set specific ops based on device property which
> > > > is not known at the beginning of driver initialization.
> > > >
> > > > For the sake of simplicity, this API lacks any type of locking and
> > > > needs to be called before devlink_register() to make sure that no
> > > > parallel access to the ops is possible at this stage.
> > >
> > > The fact that it's not registered does not mean that the callbacks
> > > won't be invoked. Look at uses of devlink_compat_flash_update().
> >
> > It is impossible, devlink_register() is part of .probe() flow and if it
> > wasn't called -> probe didn't success -> net_device doesn't exist.
>
> Are you talking about reality or the bright future brought by auxbus?

I looked on all the drivers which called to devlink_alloc() which is
starting point before devlink_register(). All of them used it in the
probe. My annotation patch checks that too.

https://lore.kernel.org/linux-rdma/f65772d429d2c259bbc18cf5b1bbe61e39eb7081.1633284302.git.leonro@nvidia.com/T/#u

So IMHO, it is reality.

>
> > We are not having net_device without "connected" device beneath, aren't we?
> >
> > At least drivers that I checked are not prepared at all to handle call
> > to devlink->ops.flash_update() if they didn't probe successfully.
>
> Last time I checked you moved the devlink_register() at the end of
> probe which for all no-auxbus drivers means after register_netdev().

I need to add a check of if(devlink_register) inside devlink_compat_flash_update().

>
> > > > diff --git a/net/core/devlink.c b/net/core/devlink.c
> > > > index 4e484afeadea..25c2aa2b35cd 100644
> > > > --- a/net/core/devlink.c
> > > > +++ b/net/core/devlink.c
> > > > @@ -53,7 +53,7 @@ struct devlink {
> > > > struct list_head trap_list;
> > > > struct list_head trap_group_list;
> > > > struct list_head trap_policer_list;
> > > > - const struct devlink_ops *ops;
> > > > + struct devlink_ops ops;
> > >
> > > Security people like ops to live in read-only memory. You're making
> > > them r/w for every devlink instance now.
> >
> > Yes, but we are explicitly copy every function pointer, which is safe.
>
> The goal is for ops to live in pages which are mapped read-only,
> so that heap overflows can overwrite the pointers.

<...>

> I don't like it. If you're feeling strongly please gather support of
> other developers. Right now it's my preference against yours. I don't
> even see you making arguments that your approach is better, just that
> mine is not perfect and requires some similar changes.

I have an idea of how to keep static ops and allow devlink_set_ops()
like functionality.

What about if I group ops by some sort of commonalities?

In my case, it will be devlink_reload_ops, which will include reload
relevant callbacks and provide devlink_set_reload_ops() wrapper to set
them?

It will ensure that all pointers are const without need to have feature
bits.

Thanks

2021-10-06 00:41:34

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Tue, 5 Oct 2021 22:15:40 +0300 Leon Romanovsky wrote:
> On Tue, Oct 05, 2021 at 11:32:13AM -0700, Jakub Kicinski wrote:
> > On Tue, 5 Oct 2021 10:32:45 +0300 Leon Romanovsky wrote:
> > > It is impossible, devlink_register() is part of .probe() flow and if it
> > > wasn't called -> probe didn't success -> net_device doesn't exist.
> >
> > Are you talking about reality or the bright future brought by auxbus?
>
> I looked on all the drivers which called to devlink_alloc() which is
> starting point before devlink_register(). All of them used it in the
> probe. My annotation patch checks that too.
>
> https://lore.kernel.org/linux-rdma/f65772d429d2c259bbc18cf5b1bbe61e39eb7081.1633284302.git.leonro@nvidia.com/T/#u
>
> So IMHO, it is reality.

You say that yet below you admit flashing is broken :/

> > > We are not having net_device without "connected" device beneath, aren't we?
> > >
> > > At least drivers that I checked are not prepared at all to handle call
> > > to devlink->ops.flash_update() if they didn't probe successfully.
> >
> > Last time I checked you moved the devlink_register() at the end of
> > probe which for all no-auxbus drivers means after register_netdev().
>
> I need to add a check of if(devlink_register) inside devlink_compat_flash_update().

... and the workarounds start to pile up.

> > I don't like it. If you're feeling strongly please gather support of
> > other developers. Right now it's my preference against yours. I don't
> > even see you making arguments that your approach is better, just that
> > mine is not perfect and requires some similar changes.
>
> I have an idea of how to keep static ops and allow devlink_set_ops()
> like functionality.
>
> What about if I group ops by some sort of commonalities?
>
> In my case, it will be devlink_reload_ops, which will include reload
> relevant callbacks and provide devlink_set_reload_ops() wrapper to set
> them?
>
> It will ensure that all pointers are const without need to have feature
> bits.

I don't understand why you keep pushing the op reassignment.

2021-10-06 03:40:03

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Tue, Oct 05, 2021 at 05:39:40PM -0700, Jakub Kicinski wrote:
> On Tue, 5 Oct 2021 22:15:40 +0300 Leon Romanovsky wrote:
> > On Tue, Oct 05, 2021 at 11:32:13AM -0700, Jakub Kicinski wrote:
> > > On Tue, 5 Oct 2021 10:32:45 +0300 Leon Romanovsky wrote:
> > > > It is impossible, devlink_register() is part of .probe() flow and if it
> > > > wasn't called -> probe didn't success -> net_device doesn't exist.
> > >
> > > Are you talking about reality or the bright future brought by auxbus?
> >
> > I looked on all the drivers which called to devlink_alloc() which is
> > starting point before devlink_register(). All of them used it in the
> > probe. My annotation patch checks that too.
> >
> > https://lore.kernel.org/linux-rdma/f65772d429d2c259bbc18cf5b1bbe61e39eb7081.1633284302.git.leonro@nvidia.com/T/#u
> >
> > So IMHO, it is reality.
>
> You say that yet below you admit flashing is broken :/

I said more than once, lifetime of devlink is broken. It is placed in
wrong layer, pretend to implement some of driver core functionality
without proper protections and have wrong locks.

At least, I didn't break flash update, there is no change in logic of
flash after any of my changes. Exactly like before, user was able to trigger
flash update through this devlink_compat_flash_update() call without any
protection.

Let's chose random kernel version (v5.11)
https://elixir.bootlin.com/linux/v5.11/source/net/core/devlink.c#L10245
as you can see, it doesn't hold ANY driver core locks, so it can be
called in any time during driver .probe() or .remove(). Drivers that
have implemented ops.flash_update() have no idea about that.

>
> > > > We are not having net_device without "connected" device beneath, aren't we?
> > > >
> > > > At least drivers that I checked are not prepared at all to handle call
> > > > to devlink->ops.flash_update() if they didn't probe successfully.
> > >
> > > Last time I checked you moved the devlink_register() at the end of
> > > probe which for all no-auxbus drivers means after register_netdev().
> >
> > I need to add a check of if(devlink_register) inside devlink_compat_flash_update().
>
> ... and the workarounds start to pile up.

It is not a workaround, but attempt to fix this mess.

I separated devlink netlink callers from the kernel flow and just need
to continue to fix these external to devlink callers.

>
> > > I don't like it. If you're feeling strongly please gather support of
> > > other developers. Right now it's my preference against yours. I don't
> > > even see you making arguments that your approach is better, just that
> > > mine is not perfect and requires some similar changes.
> >
> > I have an idea of how to keep static ops and allow devlink_set_ops()
> > like functionality.
> >
> > What about if I group ops by some sort of commonalities?
> >
> > In my case, it will be devlink_reload_ops, which will include reload
> > relevant callbacks and provide devlink_set_reload_ops() wrapper to set
> > them?
> >
> > It will ensure that all pointers are const without need to have feature
> > bits.
>
> I don't understand why you keep pushing the op reassignment.

It is not reassignment, but ability to assign proper callbacks from the
beginning.

The idea is to make sure that lifetime of devlink is managed by proper
ops callbacks, based on them we can control everything inside devlink
by ensuring right locks, order e.t.c.

I want to get rid from random *_enabled flags that always forgotten and
adds complexity that don't give any advantage only issues.

I'm also changing devlink to allow parallel execution and for that I
need to have reliable devlink instance with minimal number of locks.

Thanks

2021-10-06 13:38:18

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Wed, 6 Oct 2021 06:37:44 +0300 Leon Romanovsky wrote:
> Let's chose random kernel version (v5.11)
> https://elixir.bootlin.com/linux/v5.11/source/net/core/devlink.c#L10245
> as you can see, it doesn't hold ANY driver core locks,

Nope, that is not what I see.

> so it can be called in any time during driver .probe() or .remove().

Having a callback invoked after registering to a subsystem (which used
to be the case for devlink before the changes) is _normal_.

You keep talking about .probe() like it's some magic period of complete
quiescence.

> Drivers that have implemented ops.flash_update() have no idea about that.

I bet.

I don't think this discussion is going anywhere, count me out.

2021-10-06 14:51:44

by Leon Romanovsky

[permalink] [raw]
Subject: Re: [PATCH net-next v2 3/5] devlink: Allow set specific ops callbacks dynamically

On Wed, Oct 06, 2021 at 06:35:58AM -0700, Jakub Kicinski wrote:
> On Wed, 6 Oct 2021 06:37:44 +0300 Leon Romanovsky wrote:

<...>

> I don't think this discussion is going anywhere, count me out.

At least here, we agree on something.