From: Vladimir Oltean <[email protected]>
The initial goal of this series was to have better support for
standalone ports mode and multiple bridges on the DSA drivers like
ocelot/felix and sja1105. Proper support for standalone mode requires
disabling address learning, which in turn requires interaction with the
switchdev notifier, which is actually where most of the patches are.
I also noticed that most of the drivers are actually talking either to
firmware or SPI/MDIO connected devices from the brport flags switchdev
attribute handler, so it makes sense to actually make it sleepable
instead of atomic.
Vladimir Oltean (11):
net: switchdev: propagate extack to port attributes
net: bridge: offload all port flags at once in br_setport
net: bridge: don't print in br_switchdev_set_port_flag
net: dsa: configure proper brport flags when ports leave the bridge
net: squash switchdev attributes PRE_BRIDGE_FLAGS and BRIDGE_FLAGS
net: dsa: kill .port_egress_floods overengineering
net: prep switchdev drivers for concurrent
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
net: bridge: put SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS on the blocking
call chain
net: mscc: ocelot: use separate flooding PGID for broadcast
net: mscc: ocelot: offload bridge port flags to device
net: dsa: sja1105: offload bridge port flags to device
drivers/net/dsa/b53/b53_common.c | 20 +-
drivers/net/dsa/mv88e6xxx/chip.c | 21 +-
drivers/net/dsa/ocelot/felix.c | 10 +
drivers/net/dsa/sja1105/sja1105.h | 2 +
drivers/net/dsa/sja1105/sja1105_main.c | 212 +++++++++++++++++-
drivers/net/dsa/sja1105/sja1105_spi.c | 6 +
.../marvell/prestera/prestera_switchdev.c | 54 +++--
.../mellanox/mlxsw/spectrum_switchdev.c | 90 ++++----
drivers/net/ethernet/mscc/ocelot.c | 72 +++++-
drivers/net/ethernet/mscc/ocelot_net.c | 7 +-
drivers/net/ethernet/rocker/rocker.h | 2 +-
drivers/net/ethernet/rocker/rocker_main.c | 24 +-
drivers/net/ethernet/rocker/rocker_ofdpa.c | 26 ++-
drivers/net/ethernet/ti/cpsw_switchdev.c | 35 ++-
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 43 ++--
include/net/dsa.h | 7 +-
include/net/switchdev.h | 14 +-
include/soc/mscc/ocelot.h | 18 +-
net/bridge/br_netlink.c | 162 ++++++-------
net/bridge/br_private.h | 6 +-
net/bridge/br_switchdev.c | 33 ++-
net/bridge/br_sysfs_if.c | 21 +-
net/dsa/dsa_priv.h | 8 +-
net/dsa/port.c | 76 ++++---
net/dsa/slave.c | 10 +-
net/switchdev/switchdev.c | 11 +-
26 files changed, 654 insertions(+), 336 deletions(-)
--
2.25.1
From: Vladimir Oltean <[email protected]>
If for example this command:
ip link set swp0 type bridge_slave flood off mcast_flood off learning off
succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
BR_LEARNING, there would be no attempt to revert the partial state in
any way. Arguably, if the user changes more than one flag through the
same netlink command, this one _should_ be all or nothing, which means
it should be passed through switchdev as all or nothing.
We also move the br->lock handling inside br_setport in anticipation of
a future patch which will temporarily drop the lock around
br_switchdev_set_port_flag, since we would like that switchdev
notification to be emitted in blocking context.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
Don't attempt to drop br->lock around br_switchdev_set_port_flag now,
move that part to a later patch.
Changes in v2:
Patch is new.
Changes in v2:
Patch is new.
net/bridge/br_netlink.c | 145 ++++++++++++++------------------------
net/bridge/br_switchdev.c | 7 +-
2 files changed, 58 insertions(+), 94 deletions(-)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index bd3962da345a..4e64775bd8fb 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -853,103 +853,76 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
}
/* Set/clear or port flags based on attribute */
-static int br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
- int attrtype, unsigned long mask)
+static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
+ int attrtype, unsigned long mask)
{
- unsigned long flags;
- int err;
-
if (!tb[attrtype])
- return 0;
+ return;
if (nla_get_u8(tb[attrtype]))
- flags = p->flags | mask;
+ p->flags |= mask;
else
- flags = p->flags & ~mask;
-
- err = br_switchdev_set_port_flag(p, flags, mask);
- if (err)
- return err;
-
- p->flags = flags;
- return 0;
+ p->flags &= ~mask;
}
/* Process bridge protocol info on port */
static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
{
- unsigned long old_flags = p->flags;
- bool br_vlan_tunnel_old = false;
+ unsigned long old_flags, changed_mask;
+ bool br_vlan_tunnel_old;
int err;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
- if (err)
- return err;
+ spin_lock_bh(&p->br->lock);
- err = br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
- if (err)
- return err;
+ old_flags = p->flags;
+ br_vlan_tunnel_old = (old_flags & BR_VLAN_TUNNEL) ? true : false;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE, BR_MULTICAST_FAST_LEAVE);
- if (err)
- return err;
+ br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
+ br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE,
+ BR_MULTICAST_FAST_LEAVE);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
+ br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
+ br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST,
+ BR_MULTICAST_TO_UNICAST);
+ br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
+ br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
+ br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS, BR_NEIGH_SUPPRESS);
+ br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
- if (err)
- return err;
+ changed_mask = old_flags ^ p->flags;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST, BR_MULTICAST_TO_UNICAST);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
- if (err)
- return err;
-
- br_vlan_tunnel_old = (p->flags & BR_VLAN_TUNNEL) ? true : false;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
- if (err)
- return err;
+ err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
+ if (err) {
+ p->flags = old_flags;
+ goto out;
+ }
if (br_vlan_tunnel_old && !(p->flags & BR_VLAN_TUNNEL))
nbp_vlan_tunnel_info_flush(p);
+ br_port_flags_change(p, changed_mask);
+
if (tb[IFLA_BRPORT_COST]) {
err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_PRIORITY]) {
err = br_stp_set_port_priority(p, nla_get_u16(tb[IFLA_BRPORT_PRIORITY]));
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_STATE]) {
err = br_set_port_state(p, nla_get_u8(tb[IFLA_BRPORT_STATE]));
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_FLUSH])
@@ -961,7 +934,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
err = br_multicast_set_port_router(p, mcast_router);
if (err)
- return err;
+ goto out;
}
if (tb[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT]) {
@@ -970,27 +943,20 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
hlimit = nla_get_u32(tb[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT]);
err = br_multicast_eht_set_hosts_limit(p, hlimit);
if (err)
- return err;
+ goto out;
}
#endif
if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
u16 fwd_mask = nla_get_u16(tb[IFLA_BRPORT_GROUP_FWD_MASK]);
- if (fwd_mask & BR_GROUPFWD_MACPAUSE)
- return -EINVAL;
+ if (fwd_mask & BR_GROUPFWD_MACPAUSE) {
+ err = -EINVAL;
+ goto out;
+ }
p->group_fwd_mask = fwd_mask;
}
- err = br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS,
- BR_NEIGH_SUPPRESS);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
- if (err)
- return err;
-
if (tb[IFLA_BRPORT_BACKUP_PORT]) {
struct net_device *backup_dev = NULL;
u32 backup_ifindex;
@@ -999,17 +965,21 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
if (backup_ifindex) {
backup_dev = __dev_get_by_index(dev_net(p->dev),
backup_ifindex);
- if (!backup_dev)
- return -ENOENT;
+ if (!backup_dev) {
+ err = -ENOENT;
+ goto out;
+ }
}
err = nbp_backup_change(p, backup_dev);
if (err)
- return err;
+ goto out;
}
- br_port_flags_change(p, old_flags ^ p->flags);
- return 0;
+out:
+ spin_unlock_bh(&p->br->lock);
+
+ return err;
}
/* Change state and parameters on port. */
@@ -1045,9 +1015,7 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh, u16 flags,
if (err)
return err;
- spin_lock_bh(&p->br->lock);
err = br_setport(p, tb);
- spin_unlock_bh(&p->br->lock);
} else {
/* Binary compatibility with old RSTP */
if (nla_len(protinfo) < sizeof(u8))
@@ -1134,17 +1102,10 @@ static int br_port_slave_changelink(struct net_device *brdev,
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
- struct net_bridge *br = netdev_priv(brdev);
- int ret;
-
if (!data)
return 0;
- spin_lock_bh(&br->lock);
- ret = br_setport(br_port_get_rtnl(dev), data);
- spin_unlock_bh(&br->lock);
-
- return ret;
+ return br_setport(br_port_get_rtnl(dev), data);
}
static int br_port_fill_slave_info(struct sk_buff *skb,
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index a9c23ef83443..c004ade25ac0 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -65,16 +65,19 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
struct switchdev_attr attr = {
.orig_dev = p->dev,
.id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
- .u.brport_flags = mask,
};
struct switchdev_notifier_port_attr_info info = {
.attr = &attr,
};
int err;
- if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
+ flags &= BR_PORT_FLAGS_HW_OFFLOAD;
+ mask &= BR_PORT_FLAGS_HW_OFFLOAD;
+ if (!mask)
return 0;
+ attr.u.brport_flags = mask;
+
/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
&info.info, NULL);
--
2.25.1
From: Vladimir Oltean <[email protected]>
Currently br_switchdev_set_port_flag has two options for error handling
and neither is good:
- The driver returns -EOPNOTSUPP in PRE_BRIDGE_FLAGS if it doesn't
support offloading that flag, and this gets silently ignored and
converted to an errno of 0. Nobody does this.
- The driver returns some other error code, like -EINVAL, in
PRE_BRIDGE_FLAGS, and br_switchdev_set_port_flag shouts loudly.
The problem is that we'd like to offload some port flags during bridge
join and leave, but also not have the bridge shout at us if those fail.
But on the other hand we'd like the user to know that we can't offload
something when they set that through netlink. And since we can't have
the driver return -EOPNOTSUPP or -EINVAL depending on whether it's
called by the user or internally by the bridge, let's just add an extack
argument to br_switchdev_set_port_flag and propagate it to its callers.
Then, when we need offloading to really fail silently, this can simply
be passed a NULL argument.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
- Deal with the br_switchdev_set_port_flag call from sysfs too.
Changes in v2:
- br_set_port_flag now returns void, so no extack there.
- don't overwrite extack in br_switchdev_set_port_flag if already
populated.
net/bridge/br_netlink.c | 9 +++++----
net/bridge/br_private.h | 6 ++++--
net/bridge/br_switchdev.c | 13 +++++++------
net/bridge/br_sysfs_if.c | 7 +++++--
4 files changed, 21 insertions(+), 14 deletions(-)
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 4e64775bd8fb..b7731614c036 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -866,7 +866,8 @@ static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
}
/* Process bridge protocol info on port */
-static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
+static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
+ struct netlink_ext_ack *extack)
{
unsigned long old_flags, changed_mask;
bool br_vlan_tunnel_old;
@@ -896,7 +897,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
changed_mask = old_flags ^ p->flags;
- err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
+ err = br_switchdev_set_port_flag(p, p->flags, changed_mask, extack);
if (err) {
p->flags = old_flags;
goto out;
@@ -1015,7 +1016,7 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh, u16 flags,
if (err)
return err;
- err = br_setport(p, tb);
+ err = br_setport(p, tb, extack);
} else {
/* Binary compatibility with old RSTP */
if (nla_len(protinfo) < sizeof(u8))
@@ -1105,7 +1106,7 @@ static int br_port_slave_changelink(struct net_device *brdev,
if (!data)
return 0;
- return br_setport(br_port_get_rtnl(dev), data);
+ return br_setport(br_port_get_rtnl(dev), data, extack);
}
static int br_port_fill_slave_info(struct sk_buff *skb,
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index d242ba668e47..a1639d41188b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1575,7 +1575,8 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
const struct sk_buff *skb);
int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
- unsigned long mask);
+ unsigned long mask,
+ struct netlink_ext_ack *extack);
void br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb,
int type);
int br_switchdev_port_vlan_add(struct net_device *dev, u16 vid, u16 flags,
@@ -1605,7 +1606,8 @@ static inline bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
static inline int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
- unsigned long mask)
+ unsigned long mask,
+ struct netlink_ext_ack *extack)
{
return 0;
}
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index c004ade25ac0..ac8dead86bf2 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -60,7 +60,8 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
- unsigned long mask)
+ unsigned long mask,
+ struct netlink_ext_ack *extack)
{
struct switchdev_attr attr = {
.orig_dev = p->dev,
@@ -80,14 +81,15 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
- &info.info, NULL);
+ &info.info, extack);
err = notifier_to_errno(err);
if (err == -EOPNOTSUPP)
return 0;
if (err) {
- br_warn(p->br, "bridge flag offload is not supported %u(%s)\n",
- (unsigned int)p->port_no, p->dev->name);
+ if (extack && !extack->_msg)
+ NL_SET_ERR_MSG_MOD(extack,
+ "bridge flag offload is not supported");
return -EOPNOTSUPP;
}
@@ -97,8 +99,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
err = switchdev_port_attr_set(p->dev, &attr);
if (err) {
- br_warn(p->br, "error setting offload flag on port %u(%s)\n",
- (unsigned int)p->port_no, p->dev->name);
+ NL_SET_ERR_MSG_MOD(extack, "error setting offload flag on port");
return err;
}
diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 5aea9427ffe1..72e92376eef1 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -59,6 +59,7 @@ static BRPORT_ATTR(_name, 0644, \
static int store_flag(struct net_bridge_port *p, unsigned long v,
unsigned long mask)
{
+ struct netlink_ext_ack extack = {0};
unsigned long flags = p->flags;
int err;
@@ -68,9 +69,11 @@ static int store_flag(struct net_bridge_port *p, unsigned long v,
flags &= ~mask;
if (flags != p->flags) {
- err = br_switchdev_set_port_flag(p, flags, mask);
- if (err)
+ err = br_switchdev_set_port_flag(p, flags, mask, &extack);
+ if (err) {
+ netdev_err(p->dev, "%s\n", extack._msg);
return err;
+ }
p->flags = flags;
br_port_flags_change(p, mask);
--
2.25.1
From: Vladimir Oltean <[email protected]>
The bridge offloads the port flags through a single bit mask using
switchdev, which among others, contains learning and flooding settings.
The commit 57652796aa97 ("net: dsa: add support for bridge flags")
missed one crucial aspect of the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS API
when designing the API one level lower, towards the drivers.
This is that the bitmask of passed brport flags never has more than one
bit set at a time. On the other hand, the prototype passed to the driver
is .port_egress_floods(int port, bool unicast, bool multicast), which
configures two flags at a time.
DSA currently checks if .port_egress_floods is implemented, and if it
is, reports both BR_FLOOD and BR_MCAST_FLOOD as supported. So the driver
has no choice if it wants to inform the bridge that, for example, it
can't configure unicast flooding independently of multicast flooding -
the DSA mid layer is standing in the way. Or the other way around: a new
driver wants to start configuring BR_BCAST_FLOOD separately, but what do
we do with the rest, which only support unicast and multicast flooding?
Do we report broadcast flooding configuration as supported for those
too, and silently do nothing?
Secondly, currently DSA deems the driver too dumb to deserve knowing that
a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we'll add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it. If it does, there
will need to be a dedicated .port_set_mrouter anyway.
Lastly, we have DSA drivers that have a backlink into a pure switchdev
driver (felix -> ocelot). It seems reasonable that the other switchdev
drivers should not have to suffer from the oddities of DSA overengineering,
so keeping DSA a pass-through layer makes more sense there.
To simplify the brport flags situation we just delete .port_egress_floods
and we introduce a simple .port_bridge_flags which is passed to the
driver. Also, the logic from dsa_port_mrouter is removed and a
.port_set_mrouter is created.
Functionally speaking, we simply move the calls to .port_egress_floods
one step lower, in the two drivers that implement it: mv88e6xxx and b53,
so things should work just as before.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
- Pass extack through the newly introduce dsa_port_change_brport_flags.
Changes in v2:
- Reordered with previous patch such that we don't need to introduce
.port_pre_bridge_flags
- Pass extack to drivers.
drivers/net/dsa/b53/b53_common.c | 20 +++++++++++++++++++-
drivers/net/dsa/mv88e6xxx/chip.c | 21 ++++++++++++++++++++-
include/net/dsa.h | 7 +++++--
net/dsa/dsa_priv.h | 6 ++++--
net/dsa/port.c | 20 +++++++++-----------
net/dsa/slave.c | 4 ++--
6 files changed, 59 insertions(+), 19 deletions(-)
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 23fc7225c8d1..d480493cb64d 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1948,6 +1948,23 @@ int b53_br_egress_floods(struct dsa_switch *ds, int port,
}
EXPORT_SYMBOL(b53_br_egress_floods);
+static int b53_br_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
+ return b53_br_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
+}
+
+static int b53_set_mrouter(struct dsa_switch *ds, int port, bool mrouter,
+ struct netlink_ext_ack *extack)
+{
+ return b53_br_egress_floods(ds, port, true, mrouter);
+}
+
static bool b53_possible_cpu_port(struct dsa_switch *ds, int port)
{
/* Broadcom switches will accept enabling Broadcom tags on the
@@ -2187,9 +2204,10 @@ static const struct dsa_switch_ops b53_switch_ops = {
.set_mac_eee = b53_set_mac_eee,
.port_bridge_join = b53_br_join,
.port_bridge_leave = b53_br_leave,
+ .port_bridge_flags = b53_br_flags,
+ .port_set_mrouter = b53_set_mrouter,
.port_stp_state_set = b53_br_set_stp_state,
.port_fast_age = b53_br_fast_age,
- .port_egress_floods = b53_br_egress_floods,
.port_vlan_filtering = b53_vlan_filtering,
.port_vlan_add = b53_vlan_add,
.port_vlan_del = b53_vlan_del,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ae0b490f00cd..b230bfcc4050 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -5380,6 +5380,24 @@ static int mv88e6xxx_port_egress_floods(struct dsa_switch *ds, int port,
return err;
}
+static int mv88e6xxx_port_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
+ return mv88e6xxx_port_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
+}
+
+static int mv88e6xxx_port_set_mrouter(struct dsa_switch *ds, int port,
+ bool mrouter,
+ struct netlink_ext_ack *extack)
+{
+ return mv88e6xxx_port_egress_floods(ds, port, true, mrouter);
+}
+
static bool mv88e6xxx_lag_can_offload(struct dsa_switch *ds,
struct net_device *lag,
struct netdev_lag_upper_info *info)
@@ -5678,7 +5696,8 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.set_ageing_time = mv88e6xxx_set_ageing_time,
.port_bridge_join = mv88e6xxx_port_bridge_join,
.port_bridge_leave = mv88e6xxx_port_bridge_leave,
- .port_egress_floods = mv88e6xxx_port_egress_floods,
+ .port_bridge_flags = mv88e6xxx_port_bridge_flags,
+ .port_set_mrouter = mv88e6xxx_port_set_mrouter,
.port_stp_state_set = mv88e6xxx_port_stp_state_set,
.port_fast_age = mv88e6xxx_port_fast_age,
.port_vlan_filtering = mv88e6xxx_port_vlan_filtering,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 60acb9fca124..09aa28e667c7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -621,8 +621,11 @@ struct dsa_switch_ops {
void (*port_stp_state_set)(struct dsa_switch *ds, int port,
u8 state);
void (*port_fast_age)(struct dsa_switch *ds, int port);
- int (*port_egress_floods)(struct dsa_switch *ds, int port,
- bool unicast, bool multicast);
+ int (*port_bridge_flags)(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack);
+ int (*port_set_mrouter)(struct dsa_switch *ds, int port, bool mrouter,
+ struct netlink_ext_ack *extack);
/*
* VLAN support
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 63770e421e4d..8125806ee135 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -175,8 +175,10 @@ int dsa_port_mdb_add(const struct dsa_port *dp,
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_bridge_flags(const struct dsa_port *dp,
- struct switchdev_brport_flags flags);
-int dsa_port_mrouter(struct dsa_port *dp, bool mrouter);
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack);
+int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
+ struct netlink_ext_ack *extack);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
int dsa_port_vlan_del(struct dsa_port *dp,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 736c5debcb96..545f83c7b193 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -140,7 +140,7 @@ static void dsa_port_change_brport_flags(struct dsa_port *dp,
tmp.val = flags.val & BIT(flag);
tmp.mask = BIT(flag);
- dsa_port_bridge_flags(dp, tmp);
+ dsa_port_bridge_flags(dp, tmp, NULL);
}
}
@@ -425,28 +425,26 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock)
}
int dsa_port_bridge_flags(const struct dsa_port *dp,
- struct switchdev_brport_flags flags)
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
{
struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- if (!ds->ops->port_egress_floods ||
- (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD)))
+ if (!ds->ops->port_bridge_flags)
return -EINVAL;
- return ds->ops->port_egress_floods(ds, port, flags.val & BR_FLOOD,
- flags.val & BR_MCAST_FLOOD);
+ return ds->ops->port_bridge_flags(ds, dp->index, flags, extack);
}
-int dsa_port_mrouter(struct dsa_port *dp, bool mrouter)
+int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
+ struct netlink_ext_ack *extack)
{
struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- if (!ds->ops->port_egress_floods)
+ if (!ds->ops->port_set_mrouter)
return -EOPNOTSUPP;
- return ds->ops->port_egress_floods(ds, port, true, mrouter);
+ return ds->ops->port_set_mrouter(ds, dp->index, mrouter, extack);
}
int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0e1f8f1d4e2c..4a979245e059 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -291,10 +291,10 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
ret = dsa_port_ageing_time(dp, attr->u.ageing_time);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
- ret = dsa_port_bridge_flags(dp, attr->u.brport_flags);
+ ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, extack);
break;
case SWITCHDEV_ATTR_ID_BRIDGE_MROUTER:
- ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter);
+ ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter, extack);
break;
default:
ret = -EOPNOTSUPP;
--
2.25.1
From: Vladimir Oltean <[email protected]>
For a DSA switch port operating in standalone mode, address learning
doesn't make much sense since that is a bridge function. In fact,
address learning even breaks setups such as this one:
+---------------------------------------------+
| |
| +-------------------+ |
| | br0 | send receive |
| +--------+-+--------+ +--------+ +--------+ |
| | | | | | | | | |
| | swp0 | | swp1 | | swp2 | | swp3 | |
| | | | | | | | | |
+-+--------+-+--------+-+--------+-+--------+-+
| ^ | ^
| | | |
| +-----------+ |
| |
+--------------------------------+
because if the switch has a single FDB (can offload a single bridge)
then source address learning on swp3 can "steal" the source MAC address
of swp2 from br0's FDB, because learning frames coming from swp2 will be
done twice: first on the swp1 ingress port, second on the swp3 ingress
port. So the hardware FDB will become out of sync with the software
bridge, and when swp2 tries to send one more packet towards swp1, the
ASIC will attempt to short-circuit the forwarding path and send it
directly to swp3 (since that's the last port it learned that address on),
which it obviously can't, because swp3 operates in standalone mode.
So DSA drivers operating in standalone mode should still configure a
list of bridge port flags even when they are standalone. Currently DSA
attempts to call dsa_port_bridge_flags with 0, which disables egress
flooding of unknown unicast and multicast, something which doesn't make
much sense. For the switches that implement .port_egress_floods - b53
and mv88e6xxx, it probably doesn't matter too much either, since they
can possibly inject traffic from the CPU into a standalone port,
regardless of MAC DA, even if egress flooding is turned off for that
port, but certainly not all DSA switches can do that - sja1105, for
example, can't. So it makes sense to use a better common default there,
such as "flood everything".
It should also be noted that what DSA calls "dsa_port_bridge_flags()"
is a degenerate name for just calling .port_egress_floods(), since
nothing else is implemented - not learning, in particular. But disabling
address learning, something that this driver is also coding up for, will
be supported by individual drivers once .port_egress_floods is replaced
with a more generic .port_bridge_flags.
Previous attempts to code up this logic have been in the common bridge
layer, but as pointed out by Ido Schimmel, there are corner cases that
are missed when doing that:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
So, at least for now, let's leave DSA in charge of setting port flags
before and after the bridge join and leave.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
Patch is new, logically it was moved from the bridge layer to the DSA
layer.
net/dsa/port.c | 45 ++++++++++++++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 7 deletions(-)
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 5e079a61528e..9f65ba11ad00 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -122,6 +122,27 @@ void dsa_port_disable(struct dsa_port *dp)
rtnl_unlock();
}
+static void dsa_port_change_brport_flags(struct dsa_port *dp,
+ bool bridge_offload)
+{
+ unsigned long mask, flags;
+ int flag, err;
+
+ mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+ if (bridge_offload)
+ flags = mask;
+ else
+ flags = mask & ~BR_LEARNING;
+
+ for_each_set_bit(flag, &mask, 32) {
+ err = dsa_port_pre_bridge_flags(dp, BIT(flag));
+ if (err)
+ continue;
+
+ dsa_port_bridge_flags(dp, flags & BIT(flag));
+ }
+}
+
int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
{
struct dsa_notifier_bridge_info info = {
@@ -132,10 +153,10 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
};
int err;
- /* Set the flooding mode before joining the port in the switch */
- err = dsa_port_bridge_flags(dp, BR_FLOOD | BR_MCAST_FLOOD);
- if (err)
- return err;
+ /* Notify the port driver to set its configurable flags in a way that
+ * matches the initial settings of a bridge port.
+ */
+ dsa_port_change_brport_flags(dp, true);
/* Here the interface is already bridged. Reflect the current
* configuration so that drivers can program their chips accordingly.
@@ -146,7 +167,7 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
/* The bridging is rolled back on error */
if (err) {
- dsa_port_bridge_flags(dp, 0);
+ dsa_port_change_brport_flags(dp, false);
dp->bridge_dev = NULL;
}
@@ -172,8 +193,18 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
if (err)
pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");
- /* Port is leaving the bridge, disable flooding */
- dsa_port_bridge_flags(dp, 0);
+ /* Configure the port for standalone mode (no address learning,
+ * flood everything).
+ * The bridge only emits SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS events
+ * when the user requests it through netlink or sysfs, but not
+ * automatically at port join or leave, so we need to handle resetting
+ * the brport flags ourselves. But we even prefer it that way, because
+ * otherwise, some setups might never get the notification they need,
+ * for example, when a port leaves a LAG that offloads the bridge,
+ * it becomes standalone, but as far as the bridge is concerned, no
+ * port ever left.
+ */
+ dsa_port_change_brport_flags(dp, false);
/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
* so allow it to be in BR_STATE_FORWARDING to be kept functional
--
2.25.1
From: Vladimir Oltean <[email protected]>
There does not appear to be any strong reason why
br_switchdev_set_port_flag issues a separate notification for checking
the supported brport flags rather than just attempting to apply them and
propagating the error if that fails.
However, there is a reason why this switchdev API is counterproductive
for a driver writer, and that is because although br_switchdev_set_port_flag
gets passed a "flags" and a "mask", those are passed piecemeal to the
driver, so while the PRE_BRIDGE_FLAGS listener knows what changed
because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it
only has the final value. This means that "edge detection" needs to be
done by each individual BRIDGE_FLAGS listener by XOR-ing the old and the
new flags, which in turn means that copying the flags into a driver
private variable is strictly necessary.
This can be solved by passing the "flags" and the "mask" together into
a single switchdev attribute, and it also reduces some boilerplate in
the drivers that offload this.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
- Reworked mlxsw to check mask before performing any change.
- Also adapted the newly introduced dsa_port_change_brport_flags to the
new API.
Changes in v2:
- Renamed "val" to "flags".
- Reworked drivers to check mask before performing any change.
.../marvell/prestera/prestera_switchdev.c | 29 +++++----
.../mellanox/mlxsw/spectrum_switchdev.c | 59 +++++++++----------
drivers/net/ethernet/rocker/rocker_main.c | 24 ++------
drivers/net/ethernet/ti/cpsw_switchdev.c | 32 ++++------
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 43 +++++++-------
include/net/switchdev.h | 8 ++-
net/bridge/br_switchdev.c | 15 +----
net/dsa/dsa_priv.h | 4 +-
net/dsa/port.c | 43 ++++++--------
net/dsa/slave.c | 3 -
10 files changed, 109 insertions(+), 151 deletions(-)
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index 2c1619715a4b..a797a7ff0cfe 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -581,24 +581,32 @@ int prestera_bridge_port_event(struct net_device *dev, unsigned long event,
static int prestera_port_attr_br_flags_set(struct prestera_port *port,
struct net_device *dev,
- unsigned long flags)
+ struct switchdev_brport_flags flags)
{
struct prestera_bridge_port *br_port;
int err;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+ err = -EINVAL;
+
br_port = prestera_bridge_port_by_dev(port->sw->swdev, dev);
if (!br_port)
return 0;
- err = prestera_hw_port_flood_set(port, flags & BR_FLOOD);
- if (err)
- return err;
+ if (flags.mask & BR_FLOOD) {
+ err = prestera_hw_port_flood_set(port, flags.val & BR_FLOOD);
+ if (err)
+ return err;
+ }
- err = prestera_hw_port_learning_set(port, flags & BR_LEARNING);
- if (err)
- return err;
+ if (flags.mask & BR_LEARNING) {
+ err = prestera_hw_port_learning_set(port,
+ flags.val & BR_LEARNING);
+ if (err)
+ return err;
+ }
- memcpy(&br_port->flags, &flags, sizeof(flags));
+ memcpy(&br_port->flags, &flags.val, sizeof(flags.val));
return 0;
}
@@ -706,11 +714,6 @@ static int prestera_port_obj_attr_set(struct net_device *dev,
err = prestera_port_attr_stp_state_set(port, attr->orig_dev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- if (attr->u.brport_flags &
- ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
- err = -EINVAL;
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = prestera_port_attr_br_flags_set(port, attr->orig_dev,
attr->u.brport_flags);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 18e4f1cd5587..7af79e3a3c7a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -653,51 +653,52 @@ mlxsw_sp_bridge_port_learning_set(struct mlxsw_sp_port *mlxsw_sp_port,
return err;
}
-static int mlxsw_sp_port_attr_br_pre_flags_set(struct mlxsw_sp_port
- *mlxsw_sp_port,
- unsigned long brport_flags)
-{
- if (brport_flags & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
- return -EINVAL;
-
- return 0;
-}
-
static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *orig_dev,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct mlxsw_sp_bridge_port *bridge_port;
int err;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
bridge_port = mlxsw_sp_bridge_port_find(mlxsw_sp_port->mlxsw_sp->bridge,
orig_dev);
if (!bridge_port)
return 0;
- err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
- MLXSW_SP_FLOOD_TYPE_UC,
- brport_flags & BR_FLOOD);
- if (err)
- return err;
+ if (flags.mask & BR_FLOOD) {
+ err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port,
+ bridge_port,
+ MLXSW_SP_FLOOD_TYPE_UC,
+ flags.val & BR_FLOOD);
+ if (err)
+ return err;
+ }
- err = mlxsw_sp_bridge_port_learning_set(mlxsw_sp_port, bridge_port,
- brport_flags & BR_LEARNING);
- if (err)
- return err;
+ if (flags.mask & BR_LEARNING) {
+ err = mlxsw_sp_bridge_port_learning_set(mlxsw_sp_port,
+ bridge_port,
+ flags.val & BR_LEARNING);
+ if (err)
+ return err;
+ }
if (bridge_port->bridge_device->multicast_enabled)
goto out;
- err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
- MLXSW_SP_FLOOD_TYPE_MC,
- brport_flags &
- BR_MCAST_FLOOD);
- if (err)
- return err;
+ if (flags.mask & BR_MCAST_FLOOD) {
+ err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port,
+ bridge_port,
+ MLXSW_SP_FLOOD_TYPE_MC,
+ flags.val & BR_MCAST_FLOOD);
+ if (err)
+ return err;
+ }
out:
- memcpy(&bridge_port->flags, &brport_flags, sizeof(brport_flags));
+ memcpy(&bridge_port->flags, &flags.val, sizeof(flags.val));
return 0;
}
@@ -899,10 +900,6 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
attr->orig_dev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = mlxsw_sp_port_attr_br_pre_flags_set(mlxsw_sp_port,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = mlxsw_sp_port_attr_br_flags_set(mlxsw_sp_port,
attr->orig_dev,
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 740a715c49c6..898abf3d14d0 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1575,8 +1575,8 @@ rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
}
static int
-rocker_world_port_attr_pre_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
+rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
+ struct switchdev_brport_flags flags)
{
struct rocker_world_ops *wops = rocker_port->rocker->wops;
unsigned long brport_flags_s;
@@ -1590,22 +1590,10 @@ rocker_world_port_attr_pre_bridge_flags_set(struct rocker_port *rocker_port,
if (err)
return err;
- if (brport_flags & ~brport_flags_s)
+ if (flags.mask & ~brport_flags_s)
return -EINVAL;
- return 0;
-}
-
-static int
-rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
-{
- struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
- if (!wops->port_attr_bridge_flags_set)
- return -EOPNOTSUPP;
-
- return wops->port_attr_bridge_flags_set(rocker_port, brport_flags);
+ return wops->port_attr_bridge_flags_set(rocker_port, flags.val);
}
static int
@@ -2056,10 +2044,6 @@ static int rocker_port_attr_set(struct net_device *dev,
err = rocker_world_port_attr_stp_state_set(rocker_port,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = rocker_world_port_attr_pre_bridge_flags_set(rocker_port,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = rocker_world_port_attr_bridge_flags_set(rocker_port,
attr->u.brport_flags);
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
index 13524cbaa8b6..5d8ec34f82ad 100644
--- a/drivers/net/ethernet/ti/cpsw_switchdev.c
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -57,27 +57,25 @@ static int cpsw_port_stp_state_set(struct cpsw_priv *priv, u8 state)
static int cpsw_port_attr_br_flags_set(struct cpsw_priv *priv,
struct net_device *orig_dev,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct cpsw_common *cpsw = priv->cpsw;
- bool unreg_mcast_add = false;
- if (brport_flags & BR_MCAST_FLOOD)
- unreg_mcast_add = true;
- dev_dbg(priv->dev, "BR_MCAST_FLOOD: %d port %u\n",
- unreg_mcast_add, priv->emac_port);
+ if (flags.mask & ~(BR_LEARNING | BR_MCAST_FLOOD))
+ return -EINVAL;
- cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
- unreg_mcast_add);
+ if (flags.mask & BR_MCAST_FLOOD) {
+ bool unreg_mcast_add = false;
- return 0;
-}
+ if (flags.val & BR_MCAST_FLOOD)
+ unreg_mcast_add = true;
-static int cpsw_port_attr_br_flags_pre_set(struct net_device *netdev,
- unsigned long flags)
-{
- if (flags & ~(BR_LEARNING | BR_MCAST_FLOOD))
- return -EINVAL;
+ dev_dbg(priv->dev, "BR_MCAST_FLOOD: %d port %u\n",
+ unreg_mcast_add, priv->emac_port);
+
+ cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
+ unreg_mcast_add);
+ }
return 0;
}
@@ -92,10 +90,6 @@ static int cpsw_port_attr_set(struct net_device *ndev,
dev_dbg(priv->dev, "attr: id %u port: %u\n", attr->id, priv->emac_port);
switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- ret = cpsw_port_attr_br_flags_pre_set(ndev,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
ret = cpsw_port_stp_state_set(priv, attr->u.stp_state);
dev_dbg(priv->dev, "stp state: %u\n", attr->u.stp_state);
diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index ca3d07fe7f58..f675a2ba4dce 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -908,31 +908,32 @@ static int dpaa2_switch_port_attr_stp_state_set(struct net_device *netdev,
return dpaa2_switch_port_set_stp_state(port_priv, state);
}
-static int dpaa2_switch_port_attr_br_flags_pre_set(struct net_device *netdev,
- unsigned long flags)
-{
- if (flags & ~(BR_LEARNING | BR_FLOOD))
- return -EINVAL;
-
- return 0;
-}
-
-static int dpaa2_switch_port_attr_br_flags_set(struct net_device *netdev,
- unsigned long flags)
+static int
+dpaa2_switch_port_attr_br_flags_set(struct net_device *netdev,
+ struct switchdev_brport_flags flags)
{
struct ethsw_port_priv *port_priv = netdev_priv(netdev);
int err = 0;
- /* Learning is enabled per switch */
- err = dpaa2_switch_set_learning(port_priv->ethsw_data,
- !!(flags & BR_LEARNING));
- if (err)
- goto exit;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ /* Learning is enabled per switch */
+ err = dpaa2_switch_set_learning(port_priv->ethsw_data,
+ !!(flags.val & BR_LEARNING));
+ if (err)
+ return err;
+ }
- err = dpaa2_switch_port_set_flood(port_priv, !!(flags & BR_FLOOD));
+ if (flags.mask & BR_FLOOD) {
+ err = dpaa2_switch_port_set_flood(port_priv,
+ !!(flags.val & BR_FLOOD));
+ if (err)
+ return err;
+ }
-exit:
- return err;
+ return 0;
}
static int dpaa2_switch_port_attr_set(struct net_device *netdev,
@@ -945,10 +946,6 @@ static int dpaa2_switch_port_attr_set(struct net_device *netdev,
err = dpaa2_switch_port_attr_stp_state_set(netdev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = dpaa2_switch_port_attr_br_flags_pre_set(netdev,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = dpaa2_switch_port_attr_br_flags_set(netdev,
attr->u.brport_flags);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 84c765312001..aa9cad9bad7d 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -20,7 +20,6 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_UNDEFINED,
SWITCHDEV_ATTR_ID_PORT_STP_STATE,
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
- SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_MROUTER,
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
@@ -33,6 +32,11 @@ enum switchdev_attr_id {
#endif
};
+struct switchdev_brport_flags {
+ unsigned long val;
+ unsigned long mask;
+};
+
struct switchdev_attr {
struct net_device *orig_dev;
enum switchdev_attr_id id;
@@ -41,7 +45,7 @@ struct switchdev_attr {
void (*complete)(struct net_device *dev, int err, void *priv);
union {
u8 stp_state; /* PORT_STP_STATE */
- unsigned long brport_flags; /* PORT_{PRE}_BRIDGE_FLAGS */
+ struct switchdev_brport_flags brport_flags; /* PORT_BRIDGE_FLAGS */
bool mrouter; /* PORT_MROUTER */
clock_t ageing_time; /* BRIDGE_AGEING_TIME */
bool vlan_filtering; /* BRIDGE_VLAN_FILTERING */
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index ac8dead86bf2..92d207c8a30e 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -65,7 +65,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
{
struct switchdev_attr attr = {
.orig_dev = p->dev,
- .id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
+ .id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
};
struct switchdev_notifier_port_attr_info info = {
.attr = &attr,
@@ -77,7 +77,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
if (!mask)
return 0;
- attr.u.brport_flags = mask;
+ attr.u.brport_flags.val = flags;
+ attr.u.brport_flags.mask = mask;
/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
@@ -93,16 +94,6 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
return -EOPNOTSUPP;
}
- attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
- attr.flags = SWITCHDEV_F_DEFER;
- attr.u.brport_flags = flags;
-
- err = switchdev_port_attr_set(p->dev, &attr);
- if (err) {
- NL_SET_ERR_MSG_MOD(extack, "error setting offload flag on port");
- return err;
- }
-
return 0;
}
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 8a1bcb2b4208..63770e421e4d 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -174,8 +174,8 @@ int dsa_port_mdb_add(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
-int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags);
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags);
+int dsa_port_bridge_flags(const struct dsa_port *dp,
+ struct switchdev_brport_flags flags);
int dsa_port_mrouter(struct dsa_port *dp, bool mrouter);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 9f65ba11ad00..736c5debcb96 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -125,21 +125,22 @@ void dsa_port_disable(struct dsa_port *dp)
static void dsa_port_change_brport_flags(struct dsa_port *dp,
bool bridge_offload)
{
- unsigned long mask, flags;
- int flag, err;
+ struct switchdev_brport_flags flags;
+ int flag;
- mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+ flags.mask = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
if (bridge_offload)
- flags = mask;
+ flags.val = flags.mask;
else
- flags = mask & ~BR_LEARNING;
+ flags.val = flags.mask & ~BR_LEARNING;
- for_each_set_bit(flag, &mask, 32) {
- err = dsa_port_pre_bridge_flags(dp, BIT(flag));
- if (err)
- continue;
+ for_each_set_bit(flag, &flags.mask, 32) {
+ struct switchdev_brport_flags tmp;
- dsa_port_bridge_flags(dp, flags & BIT(flag));
+ tmp.val = flags.val & BIT(flag);
+ tmp.mask = BIT(flag);
+
+ dsa_port_bridge_flags(dp, tmp);
}
}
@@ -423,28 +424,18 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock)
return 0;
}
-int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags)
+int dsa_port_bridge_flags(const struct dsa_port *dp,
+ struct switchdev_brport_flags flags)
{
struct dsa_switch *ds = dp->ds;
+ int port = dp->index;
if (!ds->ops->port_egress_floods ||
- (flags & ~(BR_FLOOD | BR_MCAST_FLOOD)))
+ (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD)))
return -EINVAL;
- return 0;
-}
-
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags)
-{
- struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- int err = 0;
-
- if (ds->ops->port_egress_floods)
- err = ds->ops->port_egress_floods(ds, port, flags & BR_FLOOD,
- flags & BR_MCAST_FLOOD);
-
- return err;
+ return ds->ops->port_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
}
int dsa_port_mrouter(struct dsa_port *dp, bool mrouter)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8f4c7c232e2c..0e1f8f1d4e2c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -290,9 +290,6 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
ret = dsa_port_ageing_time(dp, attr->u.ageing_time);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- ret = dsa_port_pre_bridge_flags(dp, attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
ret = dsa_port_bridge_flags(dp, attr->u.brport_flags);
break;
--
2.25.1
From: Vladimir Oltean <[email protected]>
Because the bridge will start offloading SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
while not serialized by any lock such as the br->lock spinlock, existing
drivers that treat that attribute and cache the brport flags might no
longer work correctly.
The issue is that the brport flags are a single unsigned long bitmask,
and the bridge only guarantees the validity of the changed bits, not the
full state. So when offloading two concurrent switchdev attributes, such
as one for BR_LEARNING and another for BR_FLOOD, it might happen that
the flags having BR_FLOOD are written into the cached value, and this in
turn disables the BR_LEARNING bit which was set previously.
We can fix this across the board by keeping individual boolean variables
for each brport flag. Note that mlxsw and prestera were setting the
BR_LEARNING_SYNC flag too, but that appears to be just dead code, so I
removed it.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
Patch is new.
.../marvell/prestera/prestera_switchdev.c | 32 ++++++++++------
.../mellanox/mlxsw/spectrum_switchdev.c | 38 ++++++++++++-------
drivers/net/ethernet/rocker/rocker.h | 2 +-
drivers/net/ethernet/rocker/rocker_main.c | 2 +-
drivers/net/ethernet/rocker/rocker_ofdpa.c | 26 +++++++------
net/bridge/br_switchdev.c | 3 +-
6 files changed, 62 insertions(+), 41 deletions(-)
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index a797a7ff0cfe..76030c4c1546 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -49,7 +49,9 @@ struct prestera_bridge_port {
struct prestera_bridge *bridge;
struct list_head vlan_list;
refcount_t ref_count;
- unsigned long flags;
+ bool learning;
+ bool ucast_flood;
+ bool mcast_flood;
u8 stp_state;
};
@@ -339,8 +341,9 @@ prestera_bridge_port_create(struct prestera_bridge *bridge,
if (!br_port)
return NULL;
- br_port->flags = BR_LEARNING | BR_FLOOD | BR_LEARNING_SYNC |
- BR_MCAST_FLOOD;
+ br_port->learning = true;
+ br_port->ucast_flood = true;
+ br_port->mcast_flood = true;
br_port->stp_state = BR_STATE_DISABLED;
refcount_set(&br_port->ref_count, 1);
br_port->bridge = bridge;
@@ -404,11 +407,11 @@ prestera_bridge_1d_port_join(struct prestera_bridge_port *br_port)
if (err)
return err;
- err = prestera_hw_port_flood_set(port, br_port->flags & BR_FLOOD);
+ err = prestera_hw_port_flood_set(port, br_port->ucast_flood);
if (err)
goto err_port_flood_set;
- err = prestera_hw_port_learning_set(port, br_port->flags & BR_LEARNING);
+ err = prestera_hw_port_learning_set(port, br_port->learning);
if (err)
goto err_port_learning_set;
@@ -594,19 +597,24 @@ static int prestera_port_attr_br_flags_set(struct prestera_port *port,
return 0;
if (flags.mask & BR_FLOOD) {
- err = prestera_hw_port_flood_set(port, flags.val & BR_FLOOD);
+ bool ucast_flood = !!(flags.val & BR_FLOOD);
+
+ err = prestera_hw_port_flood_set(port, ucast_flood);
if (err)
return err;
+
+ br_port->ucast_flood = ucast_flood;
}
if (flags.mask & BR_LEARNING) {
- err = prestera_hw_port_learning_set(port,
- flags.val & BR_LEARNING);
+ bool learning = !!(flags.val & BR_LEARNING);
+
+ err = prestera_hw_port_learning_set(port, learning);
if (err)
return err;
- }
- memcpy(&br_port->flags, &flags.val, sizeof(flags.val));
+ br_port->learning = learning;
+ }
return 0;
}
@@ -899,11 +907,11 @@ prestera_port_vlan_bridge_join(struct prestera_port_vlan *port_vlan,
if (port_vlan->br_port)
return 0;
- err = prestera_hw_port_flood_set(port, br_port->flags & BR_FLOOD);
+ err = prestera_hw_port_flood_set(port, br_port->ucast_flood);
if (err)
return err;
- err = prestera_hw_port_learning_set(port, br_port->flags & BR_LEARNING);
+ err = prestera_hw_port_learning_set(port, br_port->learning);
if (err)
goto err_port_learning_set;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 7af79e3a3c7a..0c00754e0cb5 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -62,9 +62,11 @@ struct mlxsw_sp_bridge_port {
struct list_head vlans_list;
unsigned int ref_count;
u8 stp_state;
- unsigned long flags;
bool mrouter;
bool lagged;
+ bool ucast_flood;
+ bool mcast_flood;
+ bool learning;
union {
u16 lag_id;
u16 system_port;
@@ -349,8 +351,9 @@ mlxsw_sp_bridge_port_create(struct mlxsw_sp_bridge_device *bridge_device,
bridge_port->dev = brport_dev;
bridge_port->bridge_device = bridge_device;
bridge_port->stp_state = BR_STATE_DISABLED;
- bridge_port->flags = BR_LEARNING | BR_FLOOD | BR_LEARNING_SYNC |
- BR_MCAST_FLOOD;
+ bridge_port->learning = true;
+ bridge_port->ucast_flood = true;
+ bridge_port->mcast_flood = true;
INIT_LIST_HEAD(&bridge_port->vlans_list);
list_add(&bridge_port->list, &bridge_device->ports_list);
bridge_port->ref_count = 1;
@@ -669,36 +672,45 @@ static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
return 0;
if (flags.mask & BR_FLOOD) {
+ bool ucast_flood = !!(flags.val & BR_FLOOD);
+
err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port,
bridge_port,
MLXSW_SP_FLOOD_TYPE_UC,
- flags.val & BR_FLOOD);
+ ucast_flood);
if (err)
return err;
+
+ bridge_port->ucast_flood = ucast_flood;
}
if (flags.mask & BR_LEARNING) {
+ bool learning = !!(flags.val & BR_LEARNING);
+
err = mlxsw_sp_bridge_port_learning_set(mlxsw_sp_port,
- bridge_port,
- flags.val & BR_LEARNING);
+ bridge_port, learning);
if (err)
return err;
+
+ bridge_port->learning = learning;
}
if (bridge_port->bridge_device->multicast_enabled)
- goto out;
+ return 0;
if (flags.mask & BR_MCAST_FLOOD) {
+ bool mcast_flood = !!(flags.val & BR_MCAST_FLOOD);
+
err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port,
bridge_port,
MLXSW_SP_FLOOD_TYPE_MC,
- flags.val & BR_MCAST_FLOOD);
+ mcast_flood);
if (err)
return err;
+
+ bridge_port->mcast_flood = mcast_flood;
}
-out:
- memcpy(&bridge_port->flags, &flags.val, sizeof(flags.val));
return 0;
}
@@ -796,7 +808,7 @@ static bool mlxsw_sp_mc_flood(const struct mlxsw_sp_bridge_port *bridge_port)
bridge_device = bridge_port->bridge_device;
return bridge_device->multicast_enabled ? bridge_port->mrouter :
- bridge_port->flags & BR_MCAST_FLOOD;
+ bridge_port->mcast_flood;
}
static int mlxsw_sp_port_mc_disabled_set(struct mlxsw_sp_port *mlxsw_sp_port,
@@ -962,7 +974,7 @@ mlxsw_sp_port_vlan_fid_join(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan,
return PTR_ERR(fid);
err = mlxsw_sp_fid_flood_set(fid, MLXSW_SP_FLOOD_TYPE_UC, local_port,
- bridge_port->flags & BR_FLOOD);
+ bridge_port->ucast_flood);
if (err)
goto err_fid_uc_flood_set;
@@ -1043,7 +1055,7 @@ mlxsw_sp_port_vlan_bridge_join(struct mlxsw_sp_port_vlan *mlxsw_sp_port_vlan,
return err;
err = mlxsw_sp_port_vid_learning_set(mlxsw_sp_port, vid,
- bridge_port->flags & BR_LEARNING);
+ bridge_port->learning);
if (err)
goto err_port_vid_learning_set;
diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h
index 315a6e5c0f59..2f533c44fe88 100644
--- a/drivers/net/ethernet/rocker/rocker.h
+++ b/drivers/net/ethernet/rocker/rocker.h
@@ -103,7 +103,7 @@ struct rocker_world_ops {
int (*port_attr_stp_state_set)(struct rocker_port *rocker_port,
u8 state);
int (*port_attr_bridge_flags_set)(struct rocker_port *rocker_port,
- unsigned long brport_flags);
+ struct switchdev_brport_flags flags);
int (*port_attr_bridge_flags_support_get)(const struct rocker_port *
rocker_port,
unsigned long *
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 898abf3d14d0..b1a67c6423a8 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1593,7 +1593,7 @@ rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
if (flags.mask & ~brport_flags_s)
return -EINVAL;
- return wops->port_attr_bridge_flags_set(rocker_port, flags.val);
+ return wops->port_attr_bridge_flags_set(rocker_port, flags);
}
static int
diff --git a/drivers/net/ethernet/rocker/rocker_ofdpa.c b/drivers/net/ethernet/rocker/rocker_ofdpa.c
index 967a634ee9ac..eb36c2a2da7b 100644
--- a/drivers/net/ethernet/rocker/rocker_ofdpa.c
+++ b/drivers/net/ethernet/rocker/rocker_ofdpa.c
@@ -198,7 +198,7 @@ struct ofdpa_port {
struct net_device *bridge_dev;
__be16 internal_vlan_id;
int stp_state;
- u32 brport_flags;
+ bool learning;
unsigned long ageing_time;
bool ctrls[OFDPA_CTRL_MAX];
unsigned long vlan_bitmap[OFDPA_VLAN_BITMAP_LEN];
@@ -2423,7 +2423,7 @@ static int ofdpa_port_pre_init(struct rocker_port *rocker_port)
ofdpa_port->rocker_port = rocker_port;
ofdpa_port->dev = rocker_port->dev;
ofdpa_port->pport = rocker_port->pport;
- ofdpa_port->brport_flags = BR_LEARNING;
+ ofdpa_port->learning = true;
ofdpa_port->ageing_time = BR_DEFAULT_AGEING_TIME;
return 0;
}
@@ -2433,8 +2433,7 @@ static int ofdpa_port_init(struct rocker_port *rocker_port)
struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
int err;
- rocker_port_set_learning(rocker_port,
- !!(ofdpa_port->brport_flags & BR_LEARNING));
+ rocker_port_set_learning(rocker_port, ofdpa_port->learning);
err = ofdpa_port_ig_tbl(ofdpa_port, 0);
if (err) {
@@ -2488,20 +2487,23 @@ static int ofdpa_port_attr_stp_state_set(struct rocker_port *rocker_port,
}
static int ofdpa_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
- unsigned long orig_flags;
- int err = 0;
+ int err;
- orig_flags = ofdpa_port->brport_flags;
- ofdpa_port->brport_flags = brport_flags;
+ if (flags.mask & BR_LEARNING) {
+ bool learning = !!(flags.val & BR_LEARNING);
- if ((orig_flags ^ ofdpa_port->brport_flags) & BR_LEARNING)
err = rocker_port_set_learning(ofdpa_port->rocker_port,
- !!(ofdpa_port->brport_flags & BR_LEARNING));
+ learning);
+ if (err)
+ return err;
- return err;
+ ofdpa_port->learning = learning;
+ }
+
+ return 0;
}
static int
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 92d207c8a30e..dbd94156960f 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -72,12 +72,11 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
};
int err;
- flags &= BR_PORT_FLAGS_HW_OFFLOAD;
mask &= BR_PORT_FLAGS_HW_OFFLOAD;
if (!mask)
return 0;
- attr.u.brport_flags.val = flags;
+ attr.u.brport_flags.val = flags & mask;
attr.u.brport_flags.mask = mask;
/* We run from atomic context here */
--
2.25.1
From: Vladimir Oltean <[email protected]>
The chip can configure unicast flooding, broadcast flooding and learning.
Learning is per port, while flooding is per {ingress, egress} port pair
and we need to configure the same value for all possible ingress ports
towards the requested one.
While multicast flooding is not officially supported, we can hack it by
using a feature of the second generation (P/Q/R/S) devices, which is that
FDB entries are maskable, and multicast addresses always have an odd
first octet. So by putting a match-all for 00:01:00:00:00:00 addr and
00:01:00:00:00:00 mask at the end of the FDB, we make sure that it is
always checked last, and does not take precedence in front of any other
MDB. So it behaves effectively as an unknown multicast entry.
For the first generation switches, this feature is not available, so
unknown multicast will always be treated the same as unknown unicast.
So the only thing we can do is request the user to offload the settings
for these 2 flags in tandem, i.e.
ip link set swp2 type bridge_slave flood off
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
ip link set swp2 type bridge_slave flood off mcast_flood off
ip link set swp2 type bridge_slave mcast_flood on
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
None.
Changes in v2:
Patch is new.
drivers/net/dsa/sja1105/sja1105.h | 2 +
drivers/net/dsa/sja1105/sja1105_main.c | 212 +++++++++++++++++++++++--
drivers/net/dsa/sja1105/sja1105_spi.c | 6 +
3 files changed, 209 insertions(+), 11 deletions(-)
diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index d582308c2401..15a0893d0ff1 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -94,6 +94,7 @@ struct sja1105_info {
* pop it when it's equal to TPID2.
*/
u16 qinq_tpid;
+ bool can_limit_mcast_flood;
int (*reset_cmd)(struct dsa_switch *ds);
int (*setup_rgmii_delay)(const void *ctx, int port);
/* Prototypes from include/net/dsa.h */
@@ -204,6 +205,7 @@ struct sja1105_private {
bool rgmii_rx_delay[SJA1105_NUM_PORTS];
bool rgmii_tx_delay[SJA1105_NUM_PORTS];
bool best_effort_vlan_filtering;
+ unsigned long learn_ena;
const struct sja1105_info *info;
struct gpio_desc *reset_gpio;
struct spi_device *spidev;
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 282253543f3b..8373cc1f5df1 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -25,6 +25,8 @@
#include "sja1105_sgmii.h"
#include "sja1105_tas.h"
+#define SJA1105_UNKNOWN_MULTICAST 0x010000000000ull
+
static const struct dsa_switch_ops sja1105_switch_ops;
static void sja1105_hw_reset(struct gpio_desc *gpio, unsigned int pulse_len,
@@ -42,15 +44,10 @@ static void
sja1105_port_allow_traffic(struct sja1105_l2_forwarding_entry *l2_fwd,
int from, int to, bool allow)
{
- if (allow) {
- l2_fwd[from].bc_domain |= BIT(to);
+ if (allow)
l2_fwd[from].reach_port |= BIT(to);
- l2_fwd[from].fl_domain |= BIT(to);
- } else {
- l2_fwd[from].bc_domain &= ~BIT(to);
+ else
l2_fwd[from].reach_port &= ~BIT(to);
- l2_fwd[from].fl_domain &= ~BIT(to);
- }
}
/* Structure used to temporarily transport device tree
@@ -220,17 +217,43 @@ static int sja1105_init_mii_settings(struct sja1105_private *priv,
static int sja1105_init_static_fdb(struct sja1105_private *priv)
{
+ struct sja1105_l2_lookup_entry *l2_lookup;
struct sja1105_table *table;
+ int port;
table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP];
- /* We only populate the FDB table through dynamic
- * L2 Address Lookup entries
+ /* We only populate the FDB table through dynamic L2 Address Lookup
+ * entries, except for a special entry at the end which is a catch-all
+ * for unknown multicast and will be used to control flooding domain.
*/
if (table->entry_count) {
kfree(table->entries);
table->entry_count = 0;
}
+
+ if (!priv->info->can_limit_mcast_flood)
+ return 0;
+
+ table->entries = kcalloc(1, table->ops->unpacked_entry_size,
+ GFP_KERNEL);
+ if (!table->entries)
+ return -ENOMEM;
+
+ table->entry_count = 1;
+ l2_lookup = table->entries;
+
+ /* All L2 multicast addresses have an odd first octet */
+ l2_lookup[0].macaddr = SJA1105_UNKNOWN_MULTICAST;
+ l2_lookup[0].mask_macaddr = SJA1105_UNKNOWN_MULTICAST;
+ l2_lookup[0].lockeds = true;
+ l2_lookup[0].index = SJA1105_MAX_L2_LOOKUP_COUNT - 1;
+
+ /* Flood multicast to every port by default */
+ for (port = 0; port < priv->ds->num_ports; port++)
+ if (!dsa_is_unused_port(priv->ds, port))
+ l2_lookup[0].destports |= BIT(port);
+
return 0;
}
@@ -390,6 +413,12 @@ static int sja1105_init_l2_forwarding(struct sja1105_private *priv)
sja1105_port_allow_traffic(l2fwd, i, upstream, true);
sja1105_port_allow_traffic(l2fwd, upstream, i, true);
+
+ l2fwd[i].bc_domain = BIT(upstream);
+ l2fwd[i].fl_domain = BIT(upstream);
+
+ l2fwd[upstream].bc_domain |= BIT(i);
+ l2fwd[upstream].fl_domain |= BIT(i);
}
/* Next 8 entries define VLAN PCP mapping from ingress to egress.
* Create a one-to-one mapping.
@@ -1514,6 +1543,12 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port,
*/
if (!(l2_lookup.destports & BIT(port)))
continue;
+
+ /* We need to hide the FDB entry for unknown multicast */
+ if (l2_lookup.macaddr == SJA1105_UNKNOWN_MULTICAST &&
+ l2_lookup.mask_macaddr == SJA1105_UNKNOWN_MULTICAST)
+ continue;
+
u64_to_ether_addr(l2_lookup.macaddr, macaddr);
/* We need to hide the dsa_8021q VLANs from the user. */
@@ -1605,12 +1640,12 @@ static void sja1105_bridge_stp_state_set(struct dsa_switch *ds, int port,
case BR_STATE_LEARNING:
mac[port].ingress = true;
mac[port].egress = false;
- mac[port].dyn_learn = true;
+ mac[port].dyn_learn = !!(priv->learn_ena & BIT(port));
break;
case BR_STATE_FORWARDING:
mac[port].ingress = true;
mac[port].egress = true;
- mac[port].dyn_learn = true;
+ mac[port].dyn_learn = !!(priv->learn_ena & BIT(port));
break;
default:
dev_err(ds->dev, "invalid STP state: %d\n", state);
@@ -3239,6 +3274,160 @@ static void sja1105_port_policer_del(struct dsa_switch *ds, int port)
sja1105_static_config_reload(priv, SJA1105_BEST_EFFORT_POLICING);
}
+static int sja1105_port_set_learning(struct sja1105_private *priv, int port,
+ bool enabled)
+{
+ struct sja1105_mac_config_entry *mac;
+ int rc;
+
+ mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
+
+ mac[port].dyn_learn = !!(priv->learn_ena & BIT(port));
+
+ rc = sja1105_dynamic_config_write(priv, BLK_IDX_MAC_CONFIG, port,
+ &mac[port], true);
+ if (rc)
+ return rc;
+
+ if (enabled)
+ priv->learn_ena |= BIT(port);
+ else
+ priv->learn_ena &= ~BIT(port);
+
+ return 0;
+}
+
+/* Common function for unicast and broadcast flood configuration.
+ * Flooding is configured between each {ingress, egress} port pair, and since
+ * the bridge's semantics are those of "egress flooding", it means we must
+ * enable flooding towards this port from all ingress ports that are in the
+ * same bridge. In practice, we just enable flooding from all possible ingress
+ * ports regardless of whether they're in the same bridge or not, since the
+ * reach_port configuration will not allow flooded frames to leak across
+ * bridging domains anyway.
+ */
+static int sja1105_port_ucast_bcast_flood(struct sja1105_private *priv, int to,
+ struct switchdev_brport_flags flags)
+{
+ struct sja1105_l2_forwarding_entry *l2_fwd;
+ int from, rc;
+
+ l2_fwd = priv->static_config.tables[BLK_IDX_L2_FORWARDING].entries;
+
+ for (from = 0; from < priv->ds->num_ports; from++) {
+ if (dsa_is_unused_port(priv->ds, from))
+ continue;
+ if (from == to)
+ continue;
+
+ /* Unicast */
+ if (flags.mask & BR_FLOOD) {
+ if (flags.val & BR_FLOOD)
+ l2_fwd[from].fl_domain |= BIT(to);
+ else
+ l2_fwd[from].fl_domain &= ~BIT(to);
+ }
+ /* Broadcast */
+ if (flags.mask & BR_BCAST_FLOOD) {
+ if (flags.val & BR_BCAST_FLOOD)
+ l2_fwd[from].bc_domain |= BIT(to);
+ else
+ l2_fwd[from].bc_domain &= ~BIT(to);
+ }
+
+ rc = sja1105_dynamic_config_write(priv, BLK_IDX_L2_FORWARDING,
+ from, &l2_fwd[from], true);
+ if (rc < 0)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int sja1105_port_mcast_flood(struct sja1105_private *priv, int to,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ struct sja1105_l2_lookup_entry *l2_lookup;
+ struct sja1105_table *table;
+ int match;
+
+ table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP];
+ l2_lookup = table->entries;
+
+ for (match = 0; match < table->entry_count; match++)
+ if (l2_lookup[match].macaddr == SJA1105_UNKNOWN_MULTICAST &&
+ l2_lookup[match].mask_macaddr == SJA1105_UNKNOWN_MULTICAST)
+ break;
+
+ if (match == table->entry_count) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Could not find FDB entry for unknown multicast");
+ return -ENOSPC;
+ }
+
+ if (flags.val & BR_MCAST_FLOOD)
+ l2_lookup[match].destports |= BIT(to);
+ else
+ l2_lookup[match].destports &= ~BIT(to);
+
+ return sja1105_dynamic_config_write(priv, BLK_IDX_L2_LOOKUP,
+ l2_lookup[match].index,
+ &l2_lookup[match],
+ true);
+}
+
+static int sja1105_port_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ struct sja1105_private *priv = ds->priv;
+ int rc;
+
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+ BR_BCAST_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ bool learn_ena = !!(flags.val & BR_LEARNING);
+
+ rc = sja1105_port_set_learning(priv, port, learn_ena);
+ if (rc)
+ return rc;
+ }
+
+ if (flags.mask & (BR_FLOOD | BR_BCAST_FLOOD)) {
+ rc = sja1105_port_ucast_bcast_flood(priv, port, flags);
+ if (rc)
+ return rc;
+ }
+
+ if (flags.mask & (BR_FLOOD | BR_MCAST_FLOOD) &&
+ !priv->info->can_limit_mcast_flood) {
+ bool multicast = !!(flags.val & BR_MCAST_FLOOD);
+ bool unicast = !!(flags.val & BR_FLOOD);
+
+ if (unicast != multicast) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "This chip cannot configure multicast flooding independently of unicast");
+ return -EINVAL;
+ }
+ }
+
+ /* For chips that can't offload BR_MCAST_FLOOD independently, there
+ * is nothing to do here, we ensured the configuration is in sync by
+ * offloading BR_FLOOD.
+ */
+ if (flags.mask & BR_MCAST_FLOOD && priv->info->can_limit_mcast_flood) {
+ rc = sja1105_port_mcast_flood(priv, port, flags,
+ extack);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
static const struct dsa_switch_ops sja1105_switch_ops = {
.get_tag_protocol = sja1105_get_tag_protocol,
.setup = sja1105_setup,
@@ -3262,6 +3451,7 @@ static const struct dsa_switch_ops sja1105_switch_ops = {
.port_fdb_del = sja1105_fdb_del,
.port_bridge_join = sja1105_bridge_join,
.port_bridge_leave = sja1105_bridge_leave,
+ .port_bridge_flags = sja1105_port_bridge_flags,
.port_stp_state_set = sja1105_bridge_stp_state_set,
.port_vlan_filtering = sja1105_vlan_filtering,
.port_vlan_add = sja1105_vlan_add,
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index 591c5734747d..f7a1514f81e8 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -512,6 +512,7 @@ const struct sja1105_info sja1105e_info = {
.static_ops = sja1105e_table_ops,
.dyn_ops = sja1105et_dyn_ops,
.qinq_tpid = ETH_P_8021Q,
+ .can_limit_mcast_flood = false,
.ptp_ts_bits = 24,
.ptpegr_ts_bytes = 4,
.num_cbs_shapers = SJA1105ET_MAX_CBS_COUNT,
@@ -529,6 +530,7 @@ const struct sja1105_info sja1105t_info = {
.static_ops = sja1105t_table_ops,
.dyn_ops = sja1105et_dyn_ops,
.qinq_tpid = ETH_P_8021Q,
+ .can_limit_mcast_flood = false,
.ptp_ts_bits = 24,
.ptpegr_ts_bytes = 4,
.num_cbs_shapers = SJA1105ET_MAX_CBS_COUNT,
@@ -546,6 +548,7 @@ const struct sja1105_info sja1105p_info = {
.static_ops = sja1105p_table_ops,
.dyn_ops = sja1105pqrs_dyn_ops,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
@@ -564,6 +567,7 @@ const struct sja1105_info sja1105q_info = {
.static_ops = sja1105q_table_ops,
.dyn_ops = sja1105pqrs_dyn_ops,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
@@ -582,6 +586,7 @@ const struct sja1105_info sja1105r_info = {
.static_ops = sja1105r_table_ops,
.dyn_ops = sja1105pqrs_dyn_ops,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
@@ -601,6 +606,7 @@ const struct sja1105_info sja1105s_info = {
.dyn_ops = sja1105pqrs_dyn_ops,
.regs = &sja1105pqrs_regs,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
--
2.25.1
From: Vladimir Oltean <[email protected]>
In preparation of offloading the bridge port flags which have
independent settings for unknown multicast and for broadcast, we should
also start reserving one destination Port Group ID for the flooding of
broadcast packets, to allow configuring it individually.
Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v3:
None.
Changes in v2:
None.
drivers/net/ethernet/mscc/ocelot.c | 13 ++++++++-----
include/soc/mscc/ocelot.h | 15 ++++++++-------
2 files changed, 16 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index f8b85ab8be5d..8c1052346b58 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1662,7 +1662,7 @@ int ocelot_init(struct ocelot *ocelot)
/* Setup flooding PGIDs */
for (i = 0; i < ocelot->num_flooding_pgids; i++)
ocelot_write_rix(ocelot, ANA_FLOODING_FLD_MULTICAST(PGID_MC) |
- ANA_FLOODING_FLD_BROADCAST(PGID_MC) |
+ ANA_FLOODING_FLD_BROADCAST(PGID_BC) |
ANA_FLOODING_FLD_UNICAST(PGID_UC),
ANA_FLOODING, i);
ocelot_write(ocelot, ANA_FLOODING_IPMC_FLD_MC6_DATA(PGID_MCIPV6) |
@@ -1683,15 +1683,18 @@ int ocelot_init(struct ocelot *ocelot)
ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_SRC + port);
}
- /* Allow broadcast MAC frames. */
for_each_nonreserved_multicast_dest_pgid(ocelot, i) {
u32 val = ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports - 1, 0));
ocelot_write_rix(ocelot, val, ANA_PGID_PGID, i);
}
- ocelot_write_rix(ocelot,
- ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)),
- ANA_PGID_PGID, PGID_MC);
+ /* Allow broadcast and unknown L2 multicast to the CPU. */
+ ocelot_rmw_rix(ocelot, ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID, PGID_MC);
+ ocelot_rmw_rix(ocelot, ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID, PGID_BC);
ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_MCIPV4);
ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_MCIPV6);
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index d0d48e9620fb..7ee85527cb5f 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -54,16 +54,17 @@
* PGID_CPU: used for whitelisting certain MAC addresses, such as the addresses
* of the switch port net devices, towards the CPU port module.
* PGID_UC: the flooding destinations for unknown unicast traffic.
- * PGID_MC: the flooding destinations for broadcast and non-IP multicast
- * traffic.
+ * PGID_MC: the flooding destinations for non-IP multicast traffic.
* PGID_MCIPV4: the flooding destinations for IPv4 multicast traffic.
* PGID_MCIPV6: the flooding destinations for IPv6 multicast traffic.
+ * PGID_BC: the flooding destinations for broadcast traffic.
*/
-#define PGID_CPU 59
-#define PGID_UC 60
-#define PGID_MC 61
-#define PGID_MCIPV4 62
-#define PGID_MCIPV6 63
+#define PGID_CPU 58
+#define PGID_UC 59
+#define PGID_MC 60
+#define PGID_MCIPV4 61
+#define PGID_MCIPV6 62
+#define PGID_BC 63
#define for_each_unicast_dest_pgid(ocelot, pgid) \
for ((pgid) = 0; \
--
2.25.1
From: Vladimir Oltean <[email protected]>
We should not be unconditionally enabling address learning, since doing
that is actively detrimential when a port is standalone and not offloading
a bridge. Namely, if a port in the switch is standalone and others are
offloading the bridge, then we could enter a situation where we learn an
address towards the standalone port, but the bridged ports could not
forward the packet there, because the CPU is the only path between the
standalone and the bridged ports. The solution of course is to not
enable address learning unless the bridge asks for it.
We need to set up the initial port flags for no learning and flooding
everything, then the bridge takes over. The flood configuration was
already configured ok in ocelot_init, we just need to disable learning
in ocelot_init_port.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Alexandre Belloni <[email protected]>
---
Changes in v3:
None.
Changes in v2:
- Disable learning in ocelot_init_port.
- Keep a single bool ocelot_port->learn_ena instead of
ocelot_port->brport_flags.
- Stop touching the brport_flags from ocelot_port_bridge_leave (which
was a leftover).
drivers/net/dsa/ocelot/felix.c | 10 +++++
drivers/net/ethernet/mscc/ocelot.c | 59 +++++++++++++++++++++++++-
drivers/net/ethernet/mscc/ocelot_net.c | 4 ++
include/soc/mscc/ocelot.h | 3 ++
4 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 1bd5aea12b25..4ff18415ef95 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -553,6 +553,15 @@ static void felix_bridge_stp_state_set(struct dsa_switch *ds, int port,
return ocelot_bridge_stp_state_set(ocelot, port, state);
}
+static int felix_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags val,
+ struct netlink_ext_ack *extack)
+{
+ struct ocelot *ocelot = ds->priv;
+
+ return ocelot_port_bridge_flags(ocelot, port, val);
+}
+
static int felix_bridge_join(struct dsa_switch *ds, int port,
struct net_device *br)
{
@@ -1358,6 +1367,7 @@ const struct dsa_switch_ops felix_switch_ops = {
.port_fdb_del = felix_fdb_del,
.port_mdb_add = felix_mdb_add,
.port_mdb_del = felix_mdb_del,
+ .port_bridge_flags = felix_bridge_flags,
.port_bridge_join = felix_bridge_join,
.port_bridge_leave = felix_bridge_leave,
.port_lag_join = felix_lag_join,
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 8c1052346b58..65bc7d59d2c9 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -984,6 +984,7 @@ EXPORT_SYMBOL(ocelot_apply_bridge_fwd_mask);
void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
u32 port_cfg;
if (!(BIT(port) & ocelot->bridge_mask))
@@ -996,7 +997,8 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
ocelot->bridge_fwd_mask |= BIT(port);
fallthrough;
case BR_STATE_LEARNING:
- port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
+ if (ocelot_port->learn_ena)
+ port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
break;
default:
@@ -1480,6 +1482,57 @@ int ocelot_get_max_mtu(struct ocelot *ocelot, int port)
}
EXPORT_SYMBOL(ocelot_get_max_mtu);
+int ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
+ struct switchdev_brport_flags flags)
+{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
+
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+ BR_BCAST_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ u32 val = 0;
+
+ ocelot_port->learn_ena = !!(flags.val & BR_LEARNING);
+ if (ocelot_port->learn_ena)
+ val = ANA_PORT_PORT_CFG_LEARN_ENA;
+
+ ocelot_rmw_gix(ocelot, val, ANA_PORT_PORT_CFG_LEARN_ENA,
+ ANA_PORT_PORT_CFG, port);
+ }
+
+ if (flags.mask & BR_FLOOD) {
+ u32 val = 0;
+
+ if (flags.val & BR_FLOOD)
+ val = BIT(port);
+
+ ocelot_rmw_rix(ocelot, val, BIT(port), ANA_PGID_PGID, PGID_UC);
+ }
+
+ if (flags.mask & BR_MCAST_FLOOD) {
+ u32 val = 0;
+
+ if (flags.val & BR_MCAST_FLOOD)
+ val = BIT(port);
+
+ ocelot_rmw_rix(ocelot, val, BIT(port), ANA_PGID_PGID, PGID_MC);
+ }
+
+ if (flags.mask & BR_BCAST_FLOOD) {
+ u32 val = 0;
+
+ if (flags.val & BR_BCAST_FLOOD)
+ val = BIT(port);
+
+ ocelot_rmw_rix(ocelot, val, BIT(port), ANA_PGID_PGID, PGID_BC);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(ocelot_port_bridge_flags);
+
void ocelot_init_port(struct ocelot *ocelot, int port)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
@@ -1524,6 +1577,10 @@ void ocelot_init_port(struct ocelot *ocelot, int port)
ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA,
ANA_PORT_DROP_CFG, port);
+ /* Disable source address learning for standalone mode */
+ ocelot_rmw_gix(ocelot, 0, ANA_PORT_PORT_CFG_LEARN_ENA,
+ ANA_PORT_PORT_CFG, port);
+
/* Set default VLAN and tag type to 8021Q. */
ocelot_rmw_gix(ocelot, REW_PORT_VLAN_CFG_PORT_TPID(ETH_P_8021Q),
REW_PORT_VLAN_CFG_PORT_TPID_M,
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index f9da4aa39444..9944d79c6685 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1026,6 +1026,10 @@ static int ocelot_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
ocelot_port_attr_mc_set(ocelot, port, !attr->u.mc_disabled);
break;
+ case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
+ err = ocelot_port_bridge_flags(ocelot, port,
+ attr->u.brport_flags);
+ break;
default:
err = -EOPNOTSUPP;
break;
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 7ee85527cb5f..e6aacf65fa1e 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -612,6 +612,7 @@ struct ocelot_port {
u8 *xmit_template;
bool is_dsa_8021q_cpu;
+ bool learn_ena;
struct net_device *bond;
bool lag_tx_active;
@@ -764,6 +765,8 @@ void ocelot_adjust_link(struct ocelot *ocelot, int port,
int ocelot_port_vlan_filtering(struct ocelot *ocelot, int port, bool enabled);
void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state);
void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot);
+int ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
+ struct switchdev_brport_flags val);
int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
struct net_device *bridge);
int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
--
2.25.1
On Wed, Feb 10, 2021 at 11:14:41AM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> Because the bridge will start offloading SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
> while not serialized by any lock such as the br->lock spinlock, existing
> drivers that treat that attribute and cache the brport flags might no
> longer work correctly.
Can you explain the race? This notification is sent from sysfs/netlink
call path, both of which take rtnl.
On Wed, Feb 10, 2021 at 12:12:57PM +0200, Ido Schimmel wrote:
> On Wed, Feb 10, 2021 at 11:14:41AM +0200, Vladimir Oltean wrote:
> > From: Vladimir Oltean <[email protected]>
> >
> > Because the bridge will start offloading SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
> > while not serialized by any lock such as the br->lock spinlock, existing
> > drivers that treat that attribute and cache the brport flags might no
> > longer work correctly.
>
> Can you explain the race? This notification is sent from sysfs/netlink
> call path, both of which take rtnl.
Replying here to both you and Nikolay: there isn't any race, sorry, I
missed the fact that brport_store takes the rtnl_mutex and by extension
I thought that RTM_SETLINK runs unlocked too, without really checking.
Well, at least that's good news, the implementation can be a lot more
straightforward then...
On 10/02/2021 11:14, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> The initial goal of this series was to have better support for
> standalone ports mode and multiple bridges on the DSA drivers like
> ocelot/felix and sja1105. Proper support for standalone mode requires
> disabling address learning, which in turn requires interaction with the
> switchdev notifier, which is actually where most of the patches are.
>
> I also noticed that most of the drivers are actually talking either to
> firmware or SPI/MDIO connected devices from the brport flags switchdev
> attribute handler, so it makes sense to actually make it sleepable
> instead of atomic.
>
Hi Vladimir,
Let's take a step back for a moment and discuss the bridge unlock/lock sequences
that come with this set. I'd really like to avoid those as they're a recipe
for future problems. The only good way to achieve that currently is to keep
the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
after the flags have been changed (if they have changed obviously). That would
make the code read much easier since we'll have all our lock/unlock sequences
in the same code blocks and won't play games to get sleepable context.
Please let's think and work in that direction, rather than having:
+ spin_lock_bh(&p->br->lock);
+ if (err) {
+ netdev_err(p->dev, "%s\n", extack._msg);
+ return err;
}
+
which immediately looks like a bug even though after some code checking we can
verify it's ok. WDYT?
I plan to get rid of most of the br->lock since it's been abused for a very long
time because it's essentially STP lock, but people have started using it for other
things and I plan to fix that when I get more time.
Thanks,
Nik
Hi Nikolay,
On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> Hi Vladimir,
> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> that come with this set. I'd really like to avoid those as they're a recipe
> for future problems. The only good way to achieve that currently is to keep
> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> after the flags have been changed (if they have changed obviously). That would
> make the code read much easier since we'll have all our lock/unlock sequences
> in the same code blocks and won't play games to get sleepable context.
> Please let's think and work in that direction, rather than having:
> + spin_lock_bh(&p->br->lock);
> + if (err) {
> + netdev_err(p->dev, "%s\n", extack._msg);
> + return err;
> }
> +
>
> which immediately looks like a bug even though after some code checking we can
> verify it's ok. WDYT?
>
> I plan to get rid of most of the br->lock since it's been abused for a very long
> time because it's essentially STP lock, but people have started using it for other
> things and I plan to fix that when I get more time.
This won't make the sysfs codepath any nicer, will it?
On 10/02/2021 12:45, Vladimir Oltean wrote:
> Hi Nikolay,
>
> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
>> Hi Vladimir,
>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
>> that come with this set. I'd really like to avoid those as they're a recipe
>> for future problems. The only good way to achieve that currently is to keep
>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
>> after the flags have been changed (if they have changed obviously). That would
>> make the code read much easier since we'll have all our lock/unlock sequences
>> in the same code blocks and won't play games to get sleepable context.
>> Please let's think and work in that direction, rather than having:
>> + spin_lock_bh(&p->br->lock);
>> + if (err) {
>> + netdev_err(p->dev, "%s\n", extack._msg);
>> + return err;
>> }
>> +
>>
>> which immediately looks like a bug even though after some code checking we can
>> verify it's ok. WDYT?
>>
>> I plan to get rid of most of the br->lock since it's been abused for a very long
>> time because it's essentially STP lock, but people have started using it for other
>> things and I plan to fix that when I get more time.
>
> This won't make the sysfs codepath any nicer, will it?
>
Currently we'll have to live with a hack that checks if the flags have changed. I agree
it won't be pretty, but we won't have to unlock and lock again in the middle of the
called function and we'll have all our locking in the same place, easier to verify and
later easier to remove. Once I get rid of most of the br->lock usage we can revisit
the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
send the switchdev notification outside of the lock and revert the flags if it doesn't
go through which doesn't sound much better.
I'm open to any other suggestions, but definitely would like to avoid playing locking games.
Even if it means casing out flag setting from all other store_ functions for sysfs.
On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > Hi Nikolay,
> >
> > On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> >> Hi Vladimir,
> >> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> >> that come with this set. I'd really like to avoid those as they're a recipe
> >> for future problems. The only good way to achieve that currently is to keep
> >> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> >> after the flags have been changed (if they have changed obviously). That would
> >> make the code read much easier since we'll have all our lock/unlock sequences
> >> in the same code blocks and won't play games to get sleepable context.
> >> Please let's think and work in that direction, rather than having:
> >> + spin_lock_bh(&p->br->lock);
> >> + if (err) {
> >> + netdev_err(p->dev, "%s\n", extack._msg);
> >> + return err;
> >> }
> >> +
> >>
> >> which immediately looks like a bug even though after some code checking we can
> >> verify it's ok. WDYT?
> >>
> >> I plan to get rid of most of the br->lock since it's been abused for a very long
> >> time because it's essentially STP lock, but people have started using it for other
> >> things and I plan to fix that when I get more time.
> >
> > This won't make the sysfs codepath any nicer, will it?
> >
>
> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> called function and we'll have all our locking in the same place, easier to verify and
> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> send the switchdev notification outside of the lock and revert the flags if it doesn't
> go through which doesn't sound much better.
> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> Even if it means casing out flag setting from all other store_ functions for sysfs.
By casing out flag settings you mean something like this?
#define BRPORT_ATTR(_name, _mode, _show, _store) \
const struct brport_attribute brport_attr_##_name = { \
.attr = {.name = __stringify(_name), \
.mode = _mode }, \
.show = _show, \
.store_unlocked = _store, \
};
#define BRPORT_ATTR_FLAG(_name, _mask) \
static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
{ \
return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
} \
static int store_##_name(struct net_bridge_port *p, unsigned long v) \
{ \
return store_flag(p, v, _mask); \
} \
static BRPORT_ATTR(_name, 0644, \
show_##_name, store_##_name)
static ssize_t brport_store(struct kobject *kobj,
struct attribute *attr,
const char *buf, size_t count)
{
...
} else if (brport_attr->store_unlocked) {
val = simple_strtoul(buf, &endp, 0);
if (endp == buf)
goto out_unlock;
ret = brport_attr->store_unlocked(p, val);
}
On 10/02/2021 13:01, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
>> On 10/02/2021 12:45, Vladimir Oltean wrote:
>>> Hi Nikolay,
>>>
>>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
>>>> Hi Vladimir,
>>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
>>>> that come with this set. I'd really like to avoid those as they're a recipe
>>>> for future problems. The only good way to achieve that currently is to keep
>>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
>>>> after the flags have been changed (if they have changed obviously). That would
>>>> make the code read much easier since we'll have all our lock/unlock sequences
>>>> in the same code blocks and won't play games to get sleepable context.
>>>> Please let's think and work in that direction, rather than having:
>>>> + spin_lock_bh(&p->br->lock);
>>>> + if (err) {
>>>> + netdev_err(p->dev, "%s\n", extack._msg);
>>>> + return err;
>>>> }
>>>> +
>>>>
>>>> which immediately looks like a bug even though after some code checking we can
>>>> verify it's ok. WDYT?
>>>>
>>>> I plan to get rid of most of the br->lock since it's been abused for a very long
>>>> time because it's essentially STP lock, but people have started using it for other
>>>> things and I plan to fix that when I get more time.
>>>
>>> This won't make the sysfs codepath any nicer, will it?
>>>
>>
>> Currently we'll have to live with a hack that checks if the flags have changed. I agree
>> it won't be pretty, but we won't have to unlock and lock again in the middle of the
>> called function and we'll have all our locking in the same place, easier to verify and
>> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
>> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
>> send the switchdev notification outside of the lock and revert the flags if it doesn't
>> go through which doesn't sound much better.
>> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
>> Even if it means casing out flag setting from all other store_ functions for sysfs.
>
> By casing out flag settings you mean something like this?
>
>
> #define BRPORT_ATTR(_name, _mode, _show, _store) \
> const struct brport_attribute brport_attr_##_name = { \
> .attr = {.name = __stringify(_name), \
> .mode = _mode }, \
> .show = _show, \
> .store_unlocked = _store, \
> };
>
> #define BRPORT_ATTR_FLAG(_name, _mask) \
> static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> { \
> return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> } \
> static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> { \
> return store_flag(p, v, _mask); \
> } \
> static BRPORT_ATTR(_name, 0644, \
> show_##_name, store_##_name)
>
> static ssize_t brport_store(struct kobject *kobj,
> struct attribute *attr,
> const char *buf, size_t count)
> {
> ...
>
> } else if (brport_attr->store_unlocked) {
> val = simple_strtoul(buf, &endp, 0);
> if (endp == buf)
> goto out_unlock;
> ret = brport_attr->store_unlocked(p, val);
> }
>
Yes, this can work but will need a bit more changes because of br_port_flags_change().
Then the netlink side can be modeled in a similar way.
On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> On 10/02/2021 13:01, Vladimir Oltean wrote:
> > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> >>> Hi Nikolay,
> >>>
> >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> >>>> Hi Vladimir,
> >>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> >>>> that come with this set. I'd really like to avoid those as they're a recipe
> >>>> for future problems. The only good way to achieve that currently is to keep
> >>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> >>>> after the flags have been changed (if they have changed obviously). That would
> >>>> make the code read much easier since we'll have all our lock/unlock sequences
> >>>> in the same code blocks and won't play games to get sleepable context.
> >>>> Please let's think and work in that direction, rather than having:
> >>>> + spin_lock_bh(&p->br->lock);
> >>>> + if (err) {
> >>>> + netdev_err(p->dev, "%s\n", extack._msg);
> >>>> + return err;
> >>>> }
> >>>> +
> >>>>
> >>>> which immediately looks like a bug even though after some code checking we can
> >>>> verify it's ok. WDYT?
> >>>>
> >>>> I plan to get rid of most of the br->lock since it's been abused for a very long
> >>>> time because it's essentially STP lock, but people have started using it for other
> >>>> things and I plan to fix that when I get more time.
> >>>
> >>> This won't make the sysfs codepath any nicer, will it?
> >>>
> >>
> >> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> >> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> >> called function and we'll have all our locking in the same place, easier to verify and
> >> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> >> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> >> send the switchdev notification outside of the lock and revert the flags if it doesn't
> >> go through which doesn't sound much better.
> >> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> >> Even if it means casing out flag setting from all other store_ functions for sysfs.
> >
> > By casing out flag settings you mean something like this?
> >
> >
> > #define BRPORT_ATTR(_name, _mode, _show, _store) \
> > const struct brport_attribute brport_attr_##_name = { \
> > .attr = {.name = __stringify(_name), \
> > .mode = _mode }, \
> > .show = _show, \
> > .store_unlocked = _store, \
> > };
> >
> > #define BRPORT_ATTR_FLAG(_name, _mask) \
> > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > { \
> > return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> > } \
> > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > { \
> > return store_flag(p, v, _mask); \
> > } \
> > static BRPORT_ATTR(_name, 0644, \
> > show_##_name, store_##_name)
> >
> > static ssize_t brport_store(struct kobject *kobj,
> > struct attribute *attr,
> > const char *buf, size_t count)
> > {
> > ...
> >
> > } else if (brport_attr->store_unlocked) {
> > val = simple_strtoul(buf, &endp, 0);
> > if (endp == buf)
> > goto out_unlock;
> > ret = brport_attr->store_unlocked(p, val);
> > }
> >
>
> Yes, this can work but will need a bit more changes because of br_port_flags_change().
> Then the netlink side can be modeled in a similar way.
What I just don't understand is how others can get away with doing
sleepable work in atomic context but I can't make the notifier blocking
by dropping a spinlock which isn't needed there, because it looks ugly :D
On 10/02/2021 14:01, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
>> On 10/02/2021 13:01, Vladimir Oltean wrote:
>>> On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
>>>> On 10/02/2021 12:45, Vladimir Oltean wrote:
>>>>> Hi Nikolay,
>>>>>
>>>>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
>>>>>> Hi Vladimir,
>>>>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
>>>>>> that come with this set. I'd really like to avoid those as they're a recipe
>>>>>> for future problems. The only good way to achieve that currently is to keep
>>>>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
>>>>>> after the flags have been changed (if they have changed obviously). That would
>>>>>> make the code read much easier since we'll have all our lock/unlock sequences
>>>>>> in the same code blocks and won't play games to get sleepable context.
>>>>>> Please let's think and work in that direction, rather than having:
>>>>>> + spin_lock_bh(&p->br->lock);
>>>>>> + if (err) {
>>>>>> + netdev_err(p->dev, "%s\n", extack._msg);
>>>>>> + return err;
>>>>>> }
>>>>>> +
>>>>>>
>>>>>> which immediately looks like a bug even though after some code checking we can
>>>>>> verify it's ok. WDYT?
>>>>>>
>>>>>> I plan to get rid of most of the br->lock since it's been abused for a very long
>>>>>> time because it's essentially STP lock, but people have started using it for other
>>>>>> things and I plan to fix that when I get more time.
>>>>>
>>>>> This won't make the sysfs codepath any nicer, will it?
>>>>>
>>>>
>>>> Currently we'll have to live with a hack that checks if the flags have changed. I agree
>>>> it won't be pretty, but we won't have to unlock and lock again in the middle of the
>>>> called function and we'll have all our locking in the same place, easier to verify and
>>>> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
>>>> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
>>>> send the switchdev notification outside of the lock and revert the flags if it doesn't
>>>> go through which doesn't sound much better.
>>>> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
>>>> Even if it means casing out flag setting from all other store_ functions for sysfs.
>>>
>>> By casing out flag settings you mean something like this?
>>>
>>>
>>> #define BRPORT_ATTR(_name, _mode, _show, _store) \
>>> const struct brport_attribute brport_attr_##_name = { \
>>> .attr = {.name = __stringify(_name), \
>>> .mode = _mode }, \
>>> .show = _show, \
>>> .store_unlocked = _store, \
>>> };
>>>
>>> #define BRPORT_ATTR_FLAG(_name, _mask) \
>>> static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
>>> { \
>>> return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
>>> } \
>>> static int store_##_name(struct net_bridge_port *p, unsigned long v) \
>>> { \
>>> return store_flag(p, v, _mask); \
>>> } \
>>> static BRPORT_ATTR(_name, 0644, \
>>> show_##_name, store_##_name)
>>>
>>> static ssize_t brport_store(struct kobject *kobj,
>>> struct attribute *attr,
>>> const char *buf, size_t count)
>>> {
>>> ...
>>>
>>> } else if (brport_attr->store_unlocked) {
>>> val = simple_strtoul(buf, &endp, 0);
>>> if (endp == buf)
>>> goto out_unlock;
>>> ret = brport_attr->store_unlocked(p, val);
>>> }
>>>
>>
>> Yes, this can work but will need a bit more changes because of br_port_flags_change().
>> Then the netlink side can be modeled in a similar way.
>
> What I just don't understand is how others can get away with doing
> sleepable work in atomic context but I can't make the notifier blocking
> by dropping a spinlock which isn't needed there, because it looks ugly :D
>
That's a bug that's gone unnoticed, surely not an argument to make error-prone changes.
It's not because of ugliness, rather for easier reasoning when people want to work with
that code, easier to maintain and later easier to verify when the lock gets removed.
We'll reduce the chance for new bugs by having code that can be understood easier,
especially for locking it's never a good idea to play games, we must try to avoid it
when we can.
On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > >>> Hi Nikolay,
> > >>>
> > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> > >>>> Hi Vladimir,
> > >>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> > >>>> that come with this set. I'd really like to avoid those as they're a recipe
> > >>>> for future problems. The only good way to achieve that currently is to keep
> > >>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> > >>>> after the flags have been changed (if they have changed obviously). That would
> > >>>> make the code read much easier since we'll have all our lock/unlock sequences
> > >>>> in the same code blocks and won't play games to get sleepable context.
> > >>>> Please let's think and work in that direction, rather than having:
> > >>>> + spin_lock_bh(&p->br->lock);
> > >>>> + if (err) {
> > >>>> + netdev_err(p->dev, "%s\n", extack._msg);
> > >>>> + return err;
> > >>>> }
> > >>>> +
> > >>>>
> > >>>> which immediately looks like a bug even though after some code checking we can
> > >>>> verify it's ok. WDYT?
> > >>>>
> > >>>> I plan to get rid of most of the br->lock since it's been abused for a very long
> > >>>> time because it's essentially STP lock, but people have started using it for other
> > >>>> things and I plan to fix that when I get more time.
> > >>>
> > >>> This won't make the sysfs codepath any nicer, will it?
> > >>>
> > >>
> > >> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> > >> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> > >> called function and we'll have all our locking in the same place, easier to verify and
> > >> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> > >> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> > >> send the switchdev notification outside of the lock and revert the flags if it doesn't
> > >> go through which doesn't sound much better.
> > >> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> > >> Even if it means casing out flag setting from all other store_ functions for sysfs.
> > >
> > > By casing out flag settings you mean something like this?
> > >
> > >
> > > #define BRPORT_ATTR(_name, _mode, _show, _store) \
> > > const struct brport_attribute brport_attr_##_name = { \
> > > .attr = {.name = __stringify(_name), \
> > > .mode = _mode }, \
> > > .show = _show, \
> > > .store_unlocked = _store, \
> > > };
> > >
> > > #define BRPORT_ATTR_FLAG(_name, _mask) \
> > > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > > { \
> > > return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> > > } \
> > > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > > { \
> > > return store_flag(p, v, _mask); \
> > > } \
> > > static BRPORT_ATTR(_name, 0644, \
> > > show_##_name, store_##_name)
> > >
> > > static ssize_t brport_store(struct kobject *kobj,
> > > struct attribute *attr,
> > > const char *buf, size_t count)
> > > {
> > > ...
> > >
> > > } else if (brport_attr->store_unlocked) {
> > > val = simple_strtoul(buf, &endp, 0);
> > > if (endp == buf)
> > > goto out_unlock;
> > > ret = brport_attr->store_unlocked(p, val);
> > > }
> > >
> >
> > Yes, this can work but will need a bit more changes because of br_port_flags_change().
> > Then the netlink side can be modeled in a similar way.
>
> What I just don't understand is how others can get away with doing
> sleepable work in atomic context but I can't make the notifier blocking
> by dropping a spinlock which isn't needed there, because it looks ugly :D
Can you please point to the bug? I'm not following
On Wed, Feb 10, 2021 at 02:21:05PM +0200, Ido Schimmel wrote:
> On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> > On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > > >>> Hi Nikolay,
> > > >>>
> > > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> > > >>>> Hi Vladimir,
> > > >>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> > > >>>> that come with this set. I'd really like to avoid those as they're a recipe
> > > >>>> for future problems. The only good way to achieve that currently is to keep
> > > >>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> > > >>>> after the flags have been changed (if they have changed obviously). That would
> > > >>>> make the code read much easier since we'll have all our lock/unlock sequences
> > > >>>> in the same code blocks and won't play games to get sleepable context.
> > > >>>> Please let's think and work in that direction, rather than having:
> > > >>>> + spin_lock_bh(&p->br->lock);
> > > >>>> + if (err) {
> > > >>>> + netdev_err(p->dev, "%s\n", extack._msg);
> > > >>>> + return err;
> > > >>>> }
> > > >>>> +
> > > >>>>
> > > >>>> which immediately looks like a bug even though after some code checking we can
> > > >>>> verify it's ok. WDYT?
> > > >>>>
> > > >>>> I plan to get rid of most of the br->lock since it's been abused for a very long
> > > >>>> time because it's essentially STP lock, but people have started using it for other
> > > >>>> things and I plan to fix that when I get more time.
> > > >>>
> > > >>> This won't make the sysfs codepath any nicer, will it?
> > > >>>
> > > >>
> > > >> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> > > >> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> > > >> called function and we'll have all our locking in the same place, easier to verify and
> > > >> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> > > >> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> > > >> send the switchdev notification outside of the lock and revert the flags if it doesn't
> > > >> go through which doesn't sound much better.
> > > >> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> > > >> Even if it means casing out flag setting from all other store_ functions for sysfs.
> > > >
> > > > By casing out flag settings you mean something like this?
> > > >
> > > >
> > > > #define BRPORT_ATTR(_name, _mode, _show, _store) \
> > > > const struct brport_attribute brport_attr_##_name = { \
> > > > .attr = {.name = __stringify(_name), \
> > > > .mode = _mode }, \
> > > > .show = _show, \
> > > > .store_unlocked = _store, \
> > > > };
> > > >
> > > > #define BRPORT_ATTR_FLAG(_name, _mask) \
> > > > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > > > { \
> > > > return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> > > > } \
> > > > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > > > { \
> > > > return store_flag(p, v, _mask); \
> > > > } \
> > > > static BRPORT_ATTR(_name, 0644, \
> > > > show_##_name, store_##_name)
> > > >
> > > > static ssize_t brport_store(struct kobject *kobj,
> > > > struct attribute *attr,
> > > > const char *buf, size_t count)
> > > > {
> > > > ...
> > > >
> > > > } else if (brport_attr->store_unlocked) {
> > > > val = simple_strtoul(buf, &endp, 0);
> > > > if (endp == buf)
> > > > goto out_unlock;
> > > > ret = brport_attr->store_unlocked(p, val);
> > > > }
> > > >
> > >
> > > Yes, this can work but will need a bit more changes because of br_port_flags_change().
> > > Then the netlink side can be modeled in a similar way.
> >
> > What I just don't understand is how others can get away with doing
> > sleepable work in atomic context but I can't make the notifier blocking
> > by dropping a spinlock which isn't needed there, because it looks ugly :D
>
> Can you please point to the bug? I'm not following
For example, mlxsw eventually calls mlxsw_sp_fid_flood_set from the
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS handling data path, and this
function allocates memory with GFP_KERNEL.
Another example is prestera which eventually calls prestera_fw_send_req
which takes a mutex_lock.
Yet another example are mv88e6xxx and b53 which use MDIO and SPI
from their .port_egress_floods implementation, buses which have
might_sleep() in them.
On Wed, Feb 10, 2021 at 02:29:36PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 02:21:05PM +0200, Ido Schimmel wrote:
> > On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > > > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > > > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > > > >>> Hi Nikolay,
> > > > >>>
> > > > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> > > > >>>> Hi Vladimir,
> > > > >>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> > > > >>>> that come with this set. I'd really like to avoid those as they're a recipe
> > > > >>>> for future problems. The only good way to achieve that currently is to keep
> > > > >>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> > > > >>>> after the flags have been changed (if they have changed obviously). That would
> > > > >>>> make the code read much easier since we'll have all our lock/unlock sequences
> > > > >>>> in the same code blocks and won't play games to get sleepable context.
> > > > >>>> Please let's think and work in that direction, rather than having:
> > > > >>>> + spin_lock_bh(&p->br->lock);
> > > > >>>> + if (err) {
> > > > >>>> + netdev_err(p->dev, "%s\n", extack._msg);
> > > > >>>> + return err;
> > > > >>>> }
> > > > >>>> +
> > > > >>>>
> > > > >>>> which immediately looks like a bug even though after some code checking we can
> > > > >>>> verify it's ok. WDYT?
> > > > >>>>
> > > > >>>> I plan to get rid of most of the br->lock since it's been abused for a very long
> > > > >>>> time because it's essentially STP lock, but people have started using it for other
> > > > >>>> things and I plan to fix that when I get more time.
> > > > >>>
> > > > >>> This won't make the sysfs codepath any nicer, will it?
> > > > >>>
> > > > >>
> > > > >> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> > > > >> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> > > > >> called function and we'll have all our locking in the same place, easier to verify and
> > > > >> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> > > > >> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> > > > >> send the switchdev notification outside of the lock and revert the flags if it doesn't
> > > > >> go through which doesn't sound much better.
> > > > >> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> > > > >> Even if it means casing out flag setting from all other store_ functions for sysfs.
> > > > >
> > > > > By casing out flag settings you mean something like this?
> > > > >
> > > > >
> > > > > #define BRPORT_ATTR(_name, _mode, _show, _store) \
> > > > > const struct brport_attribute brport_attr_##_name = { \
> > > > > .attr = {.name = __stringify(_name), \
> > > > > .mode = _mode }, \
> > > > > .show = _show, \
> > > > > .store_unlocked = _store, \
> > > > > };
> > > > >
> > > > > #define BRPORT_ATTR_FLAG(_name, _mask) \
> > > > > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > > > > { \
> > > > > return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> > > > > } \
> > > > > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > > > > { \
> > > > > return store_flag(p, v, _mask); \
> > > > > } \
> > > > > static BRPORT_ATTR(_name, 0644, \
> > > > > show_##_name, store_##_name)
> > > > >
> > > > > static ssize_t brport_store(struct kobject *kobj,
> > > > > struct attribute *attr,
> > > > > const char *buf, size_t count)
> > > > > {
> > > > > ...
> > > > >
> > > > > } else if (brport_attr->store_unlocked) {
> > > > > val = simple_strtoul(buf, &endp, 0);
> > > > > if (endp == buf)
> > > > > goto out_unlock;
> > > > > ret = brport_attr->store_unlocked(p, val);
> > > > > }
> > > > >
> > > >
> > > > Yes, this can work but will need a bit more changes because of br_port_flags_change().
> > > > Then the netlink side can be modeled in a similar way.
> > >
> > > What I just don't understand is how others can get away with doing
> > > sleepable work in atomic context but I can't make the notifier blocking
> > > by dropping a spinlock which isn't needed there, because it looks ugly :D
> >
> > Can you please point to the bug? I'm not following
>
> For example, mlxsw eventually calls mlxsw_sp_fid_flood_set from the
> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS handling data path, and this
> function allocates memory with GFP_KERNEL.
>
> Another example is prestera which eventually calls prestera_fw_send_req
> which takes a mutex_lock.
>
> Yet another example are mv88e6xxx and b53 which use MDIO and SPI
> from their .port_egress_floods implementation, buses which have
> might_sleep() in them.
Right, but see the code:
```
attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
attr.flags = SWITCHDEV_F_DEFER;
attr.u.brport_flags = flags;
err = switchdev_port_attr_set(p->dev, &attr);
```
And check how SWITCHDEV_F_DEFER is used.
We can squash SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS into one blocking notification
by reducing the scope of the bridge lock like Nik suggested. Currently
it's just blindly taken around br_setport().
On Wed, Feb 10, 2021 at 02:38:23PM +0200, Ido Schimmel wrote:
> On Wed, Feb 10, 2021 at 02:29:36PM +0200, Vladimir Oltean wrote:
> > On Wed, Feb 10, 2021 at 02:21:05PM +0200, Ido Schimmel wrote:
> > > On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> > > > On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > > > > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > > > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > > > > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > > > > >>> Hi Nikolay,
> > > > > >>>
> > > > > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> > > > > >>>> Hi Vladimir,
> > > > > >>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> > > > > >>>> that come with this set. I'd really like to avoid those as they're a recipe
> > > > > >>>> for future problems. The only good way to achieve that currently is to keep
> > > > > >>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> > > > > >>>> after the flags have been changed (if they have changed obviously). That would
> > > > > >>>> make the code read much easier since we'll have all our lock/unlock sequences
> > > > > >>>> in the same code blocks and won't play games to get sleepable context.
> > > > > >>>> Please let's think and work in that direction, rather than having:
> > > > > >>>> + spin_lock_bh(&p->br->lock);
> > > > > >>>> + if (err) {
> > > > > >>>> + netdev_err(p->dev, "%s\n", extack._msg);
> > > > > >>>> + return err;
> > > > > >>>> }
> > > > > >>>> +
> > > > > >>>>
> > > > > >>>> which immediately looks like a bug even though after some code checking we can
> > > > > >>>> verify it's ok. WDYT?
> > > > > >>>>
> > > > > >>>> I plan to get rid of most of the br->lock since it's been abused for a very long
> > > > > >>>> time because it's essentially STP lock, but people have started using it for other
> > > > > >>>> things and I plan to fix that when I get more time.
> > > > > >>>
> > > > > >>> This won't make the sysfs codepath any nicer, will it?
> > > > > >>>
> > > > > >>
> > > > > >> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> > > > > >> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> > > > > >> called function and we'll have all our locking in the same place, easier to verify and
> > > > > >> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> > > > > >> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> > > > > >> send the switchdev notification outside of the lock and revert the flags if it doesn't
> > > > > >> go through which doesn't sound much better.
> > > > > >> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> > > > > >> Even if it means casing out flag setting from all other store_ functions for sysfs.
> > > > > >
> > > > > > By casing out flag settings you mean something like this?
> > > > > >
> > > > > >
> > > > > > #define BRPORT_ATTR(_name, _mode, _show, _store) \
> > > > > > const struct brport_attribute brport_attr_##_name = { \
> > > > > > .attr = {.name = __stringify(_name), \
> > > > > > .mode = _mode }, \
> > > > > > .show = _show, \
> > > > > > .store_unlocked = _store, \
> > > > > > };
> > > > > >
> > > > > > #define BRPORT_ATTR_FLAG(_name, _mask) \
> > > > > > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > > > > > { \
> > > > > > return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> > > > > > } \
> > > > > > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > > > > > { \
> > > > > > return store_flag(p, v, _mask); \
> > > > > > } \
> > > > > > static BRPORT_ATTR(_name, 0644, \
> > > > > > show_##_name, store_##_name)
> > > > > >
> > > > > > static ssize_t brport_store(struct kobject *kobj,
> > > > > > struct attribute *attr,
> > > > > > const char *buf, size_t count)
> > > > > > {
> > > > > > ...
> > > > > >
> > > > > > } else if (brport_attr->store_unlocked) {
> > > > > > val = simple_strtoul(buf, &endp, 0);
> > > > > > if (endp == buf)
> > > > > > goto out_unlock;
> > > > > > ret = brport_attr->store_unlocked(p, val);
> > > > > > }
> > > > > >
> > > > >
> > > > > Yes, this can work but will need a bit more changes because of br_port_flags_change().
> > > > > Then the netlink side can be modeled in a similar way.
> > > >
> > > > What I just don't understand is how others can get away with doing
> > > > sleepable work in atomic context but I can't make the notifier blocking
> > > > by dropping a spinlock which isn't needed there, because it looks ugly :D
> > >
> > > Can you please point to the bug? I'm not following
> >
> > For example, mlxsw eventually calls mlxsw_sp_fid_flood_set from the
> > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS handling data path, and this
> > function allocates memory with GFP_KERNEL.
> >
> > Another example is prestera which eventually calls prestera_fw_send_req
> > which takes a mutex_lock.
> >
> > Yet another example are mv88e6xxx and b53 which use MDIO and SPI
> > from their .port_egress_floods implementation, buses which have
> > might_sleep() in them.
>
> Right, but see the code:
>
> ```
> attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
> attr.flags = SWITCHDEV_F_DEFER;
> attr.u.brport_flags = flags;
>
> err = switchdev_port_attr_set(p->dev, &attr);
> ```
>
> And check how SWITCHDEV_F_DEFER is used.
>
> We can squash SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and
> SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS into one blocking notification
> by reducing the scope of the bridge lock like Nik suggested. Currently
> it's just blindly taken around br_setport().
Okay, so the deferred attr_set propagates just a possible ENOMEM from
the deferred work enqueue, not the actual failure if that occurred.
I can leave alone the piece that sends two notifications for now, but I
would still need to deliver the full struct switchdev_brport_flags with
both the flags and the mask to both the PRE_BRIDGE_FLAGS and the
BRIDGE_FLAGS, because I need to deliver an extack from the sja1105 driver
that BR_FLOOD should always have the same value as BR_MCAST_FLOOD.
On Wed, Feb 10, 2021 at 02:55:01PM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 02:38:23PM +0200, Ido Schimmel wrote:
> > On Wed, Feb 10, 2021 at 02:29:36PM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 02:21:05PM +0200, Ido Schimmel wrote:
> > > > On Wed, Feb 10, 2021 at 02:01:06PM +0200, Vladimir Oltean wrote:
> > > > > On Wed, Feb 10, 2021 at 01:05:57PM +0200, Nikolay Aleksandrov wrote:
> > > > > > On 10/02/2021 13:01, Vladimir Oltean wrote:
> > > > > > > On Wed, Feb 10, 2021 at 12:52:33PM +0200, Nikolay Aleksandrov wrote:
> > > > > > >> On 10/02/2021 12:45, Vladimir Oltean wrote:
> > > > > > >>> Hi Nikolay,
> > > > > > >>>
> > > > > > >>> On Wed, Feb 10, 2021 at 12:31:43PM +0200, Nikolay Aleksandrov wrote:
> > > > > > >>>> Hi Vladimir,
> > > > > > >>>> Let's take a step back for a moment and discuss the bridge unlock/lock sequences
> > > > > > >>>> that come with this set. I'd really like to avoid those as they're a recipe
> > > > > > >>>> for future problems. The only good way to achieve that currently is to keep
> > > > > > >>>> the PRE_FLAGS call and do that in unsleepable context but move the FLAGS call
> > > > > > >>>> after the flags have been changed (if they have changed obviously). That would
> > > > > > >>>> make the code read much easier since we'll have all our lock/unlock sequences
> > > > > > >>>> in the same code blocks and won't play games to get sleepable context.
> > > > > > >>>> Please let's think and work in that direction, rather than having:
> > > > > > >>>> + spin_lock_bh(&p->br->lock);
> > > > > > >>>> + if (err) {
> > > > > > >>>> + netdev_err(p->dev, "%s\n", extack._msg);
> > > > > > >>>> + return err;
> > > > > > >>>> }
> > > > > > >>>> +
> > > > > > >>>>
> > > > > > >>>> which immediately looks like a bug even though after some code checking we can
> > > > > > >>>> verify it's ok. WDYT?
> > > > > > >>>>
> > > > > > >>>> I plan to get rid of most of the br->lock since it's been abused for a very long
> > > > > > >>>> time because it's essentially STP lock, but people have started using it for other
> > > > > > >>>> things and I plan to fix that when I get more time.
> > > > > > >>>
> > > > > > >>> This won't make the sysfs codepath any nicer, will it?
> > > > > > >>>
> > > > > > >>
> > > > > > >> Currently we'll have to live with a hack that checks if the flags have changed. I agree
> > > > > > >> it won't be pretty, but we won't have to unlock and lock again in the middle of the
> > > > > > >> called function and we'll have all our locking in the same place, easier to verify and
> > > > > > >> later easier to remove. Once I get rid of most of the br->lock usage we can revisit
> > > > > > >> the drop of PRE_FLAGS if it's a problem. The alternative is to change the flags, then
> > > > > > >> send the switchdev notification outside of the lock and revert the flags if it doesn't
> > > > > > >> go through which doesn't sound much better.
> > > > > > >> I'm open to any other suggestions, but definitely would like to avoid playing locking games.
> > > > > > >> Even if it means casing out flag setting from all other store_ functions for sysfs.
> > > > > > >
> > > > > > > By casing out flag settings you mean something like this?
> > > > > > >
> > > > > > >
> > > > > > > #define BRPORT_ATTR(_name, _mode, _show, _store) \
> > > > > > > const struct brport_attribute brport_attr_##_name = { \
> > > > > > > .attr = {.name = __stringify(_name), \
> > > > > > > .mode = _mode }, \
> > > > > > > .show = _show, \
> > > > > > > .store_unlocked = _store, \
> > > > > > > };
> > > > > > >
> > > > > > > #define BRPORT_ATTR_FLAG(_name, _mask) \
> > > > > > > static ssize_t show_##_name(struct net_bridge_port *p, char *buf) \
> > > > > > > { \
> > > > > > > return sprintf(buf, "%d\n", !!(p->flags & _mask)); \
> > > > > > > } \
> > > > > > > static int store_##_name(struct net_bridge_port *p, unsigned long v) \
> > > > > > > { \
> > > > > > > return store_flag(p, v, _mask); \
> > > > > > > } \
> > > > > > > static BRPORT_ATTR(_name, 0644, \
> > > > > > > show_##_name, store_##_name)
> > > > > > >
> > > > > > > static ssize_t brport_store(struct kobject *kobj,
> > > > > > > struct attribute *attr,
> > > > > > > const char *buf, size_t count)
> > > > > > > {
> > > > > > > ...
> > > > > > >
> > > > > > > } else if (brport_attr->store_unlocked) {
> > > > > > > val = simple_strtoul(buf, &endp, 0);
> > > > > > > if (endp == buf)
> > > > > > > goto out_unlock;
> > > > > > > ret = brport_attr->store_unlocked(p, val);
> > > > > > > }
> > > > > > >
> > > > > >
> > > > > > Yes, this can work but will need a bit more changes because of br_port_flags_change().
> > > > > > Then the netlink side can be modeled in a similar way.
> > > > >
> > > > > What I just don't understand is how others can get away with doing
> > > > > sleepable work in atomic context but I can't make the notifier blocking
> > > > > by dropping a spinlock which isn't needed there, because it looks ugly :D
> > > >
> > > > Can you please point to the bug? I'm not following
> > >
> > > For example, mlxsw eventually calls mlxsw_sp_fid_flood_set from the
> > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS handling data path, and this
> > > function allocates memory with GFP_KERNEL.
> > >
> > > Another example is prestera which eventually calls prestera_fw_send_req
> > > which takes a mutex_lock.
> > >
> > > Yet another example are mv88e6xxx and b53 which use MDIO and SPI
> > > from their .port_egress_floods implementation, buses which have
> > > might_sleep() in them.
> >
> > Right, but see the code:
> >
> > ```
> > attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
> > attr.flags = SWITCHDEV_F_DEFER;
> > attr.u.brport_flags = flags;
> >
> > err = switchdev_port_attr_set(p->dev, &attr);
> > ```
> >
> > And check how SWITCHDEV_F_DEFER is used.
> >
> > We can squash SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and
> > SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS into one blocking notification
> > by reducing the scope of the bridge lock like Nik suggested. Currently
> > it's just blindly taken around br_setport().
>
> Okay, so the deferred attr_set propagates just a possible ENOMEM from
> the deferred work enqueue, not the actual failure if that occurred.
>
> I can leave alone the piece that sends two notifications for now, but I
> would still need to deliver the full struct switchdev_brport_flags with
> both the flags and the mask to both the PRE_BRIDGE_FLAGS and the
> BRIDGE_FLAGS, because I need to deliver an extack from the sja1105 driver
> that BR_FLOOD should always have the same value as BR_MCAST_FLOOD.
OK
From: Vladimir Oltean <[email protected]>
Date: Wed, 10 Feb 2021 11:14:41 +0200
> From: Vladimir Oltean <[email protected]>
>
> Because the bridge will start offloading SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS
> while not serialized by any lock such as the br->lock spinlock, existing
> drivers that treat that attribute and cache the brport flags might no
> longer work correctly.
>
> The issue is that the brport flags are a single unsigned long bitmask,
> and the bridge only guarantees the validity of the changed bits, not the
> full state. So when offloading two concurrent switchdev attributes, such
> as one for BR_LEARNING and another for BR_FLOOD, it might happen that
> the flags having BR_FLOOD are written into the cached value, and this in
> turn disables the BR_LEARNING bit which was set previously.
>
> We can fix this across the board by keeping individual boolean variables
> for each brport flag. Note that mlxsw and prestera were setting the
> BR_LEARNING_SYNC flag too, but that appears to be just dead code, so I
> removed it.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
This needs updating because, as discussed, there is no race.
On 2/10/2021 1:14 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> For a DSA switch port operating in standalone mode, address learning
> doesn't make much sense since that is a bridge function. In fact,
> address learning even breaks setups such as this one:
>
> +---------------------------------------------+
> | |
> | +-------------------+ |
> | | br0 | send receive |
> | +--------+-+--------+ +--------+ +--------+ |
> | | | | | | | | | |
> | | swp0 | | swp1 | | swp2 | | swp3 | |
> | | | | | | | | | |
> +-+--------+-+--------+-+--------+-+--------+-+
> | ^ | ^
> | | | |
> | +-----------+ |
> | |
> +--------------------------------+
>
> because if the switch has a single FDB (can offload a single bridge)
> then source address learning on swp3 can "steal" the source MAC address
> of swp2 from br0's FDB, because learning frames coming from swp2 will be
> done twice: first on the swp1 ingress port, second on the swp3 ingress
> port. So the hardware FDB will become out of sync with the software
> bridge, and when swp2 tries to send one more packet towards swp1, the
> ASIC will attempt to short-circuit the forwarding path and send it
> directly to swp3 (since that's the last port it learned that address on),
> which it obviously can't, because swp3 operates in standalone mode.
>
> So DSA drivers operating in standalone mode should still configure a
> list of bridge port flags even when they are standalone. Currently DSA
> attempts to call dsa_port_bridge_flags with 0, which disables egress
> flooding of unknown unicast and multicast, something which doesn't make
> much sense. For the switches that implement .port_egress_floods - b53
> and mv88e6xxx, it probably doesn't matter too much either, since they
> can possibly inject traffic from the CPU into a standalone port,
> regardless of MAC DA, even if egress flooding is turned off for that
> port, but certainly not all DSA switches can do that - sja1105, for
> example, can't. So it makes sense to use a better common default there,
> such as "flood everything".
>
> It should also be noted that what DSA calls "dsa_port_bridge_flags()"
> is a degenerate name for just calling .port_egress_floods(), since
> nothing else is implemented - not learning, in particular. But disabling
> address learning, something that this driver is also coding up for, will
> be supported by individual drivers once .port_egress_floods is replaced
> with a more generic .port_bridge_flags.
>
> Previous attempts to code up this logic have been in the common bridge
> layer, but as pointed out by Ido Schimmel, there are corner cases that
> are missed when doing that:
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
>
> So, at least for now, let's leave DSA in charge of setting port flags
> before and after the bridge join and leave.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
--
Florian
On 2/10/2021 1:14 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> The bridge offloads the port flags through a single bit mask using
> switchdev, which among others, contains learning and flooding settings.
>
> The commit 57652796aa97 ("net: dsa: add support for bridge flags")
> missed one crucial aspect of the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS API
> when designing the API one level lower, towards the drivers.
> This is that the bitmask of passed brport flags never has more than one
> bit set at a time. On the other hand, the prototype passed to the driver
> is .port_egress_floods(int port, bool unicast, bool multicast), which
> configures two flags at a time.
>
> DSA currently checks if .port_egress_floods is implemented, and if it
> is, reports both BR_FLOOD and BR_MCAST_FLOOD as supported. So the driver
> has no choice if it wants to inform the bridge that, for example, it
> can't configure unicast flooding independently of multicast flooding -
> the DSA mid layer is standing in the way. Or the other way around: a new
> driver wants to start configuring BR_BCAST_FLOOD separately, but what do
> we do with the rest, which only support unicast and multicast flooding?
> Do we report broadcast flooding configuration as supported for those
> too, and silently do nothing?
>
> Secondly, currently DSA deems the driver too dumb to deserve knowing that
> a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
> just calls .port_egress_floods for the CPU port. When we'll add support
> for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
> problem because the flood settings will need to be held statefully in
> the DSA middle layer, otherwise changing the mrouter port attribute will
> impact the flooding attribute. And that's _assuming_ that the underlying
> hardware doesn't have anything else to do when a multicast router
> attaches to a port than flood unknown traffic to it. If it does, there
> will need to be a dedicated .port_set_mrouter anyway.
>
> Lastly, we have DSA drivers that have a backlink into a pure switchdev
> driver (felix -> ocelot). It seems reasonable that the other switchdev
> drivers should not have to suffer from the oddities of DSA overengineering,
> so keeping DSA a pass-through layer makes more sense there.
>
> To simplify the brport flags situation we just delete .port_egress_floods
> and we introduce a simple .port_bridge_flags which is passed to the
> driver. Also, the logic from dsa_port_mrouter is removed and a
> .port_set_mrouter is created.
>
> Functionally speaking, we simply move the calls to .port_egress_floods
> one step lower, in the two drivers that implement it: mv88e6xxx and b53,
> so things should work just as before.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
--
Florian
On 2/10/2021 1:14 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> In preparation of offloading the bridge port flags which have
> independent settings for unknown multicast and for broadcast, we should
> also start reserving one destination Port Group ID for the flooding of
> broadcast packets, to allow configuring it individually.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
--
Florian
On 2/10/2021 1:14 AM, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> We should not be unconditionally enabling address learning, since doing
> that is actively detrimential when a port is standalone and not offloading
> a bridge. Namely, if a port in the switch is standalone and others are
> offloading the bridge, then we could enter a situation where we learn an
> address towards the standalone port, but the bridged ports could not
> forward the packet there, because the CPU is the only path between the
> standalone and the bridged ports. The solution of course is to not
> enable address learning unless the bridge asks for it.
>
> We need to set up the initial port flags for no learning and flooding
> everything, then the bridge takes over. The flood configuration was
> already configured ok in ocelot_init, we just need to disable learning
> in ocelot_init_port.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
> Reviewed-by: Alexandre Belloni <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
--
Florian