2021-02-09 15:22:32

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 00/11] Cleanup in brport flags switchdev offload for DSA

From: Vladimir Oltean <[email protected]>

The initial goal of this series was to have better support for
standalone ports mode and multiple bridges on the DSA drivers like
ocelot/felix and sja1105. Proper support for standalone mode requires
disabling address learning, which in turn requires interaction with the
switchdev notifier, which is actually where most of the patches are.

I also noticed that most of the drivers are actually talking either to
firmware or SPI/MDIO connected devices from the brport flags switchdev
attribute handler, so it makes sense to actually make it sleepable
instead of atomic.

Vladimir Oltean (11):
net: switchdev: propagate extack to port attributes
net: bridge: offload all port flags at once in br_setport
net: bridge: don't print in br_switchdev_set_port_flag
net: bridge: offload initial and final port flags through switchdev
net: dsa: stop setting initial and final brport flags
net: squash switchdev attributes PRE_BRIDGE_FLAGS and BRIDGE_FLAGS
net: dsa: kill .port_egress_floods overengineering
net: bridge: put SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS on the blocking
call chain
net: mscc: ocelot: use separate flooding PGID for broadcast
net: mscc: ocelot: offload bridge port flags to device
net: dsa: sja1105: offload bridge port flags to device

drivers/net/dsa/b53/b53_common.c | 20 +-
drivers/net/dsa/mv88e6xxx/chip.c | 21 +-
drivers/net/dsa/ocelot/felix.c | 10 +
drivers/net/dsa/sja1105/sja1105.h | 2 +
drivers/net/dsa/sja1105/sja1105_main.c | 212 +++++++++++++++++-
drivers/net/dsa/sja1105/sja1105_spi.c | 6 +
.../marvell/prestera/prestera_switchdev.c | 32 +--
.../mellanox/mlxsw/spectrum_switchdev.c | 31 +--
drivers/net/ethernet/mscc/ocelot.c | 72 +++++-
drivers/net/ethernet/mscc/ocelot_net.c | 7 +-
drivers/net/ethernet/rocker/rocker_main.c | 24 +-
drivers/net/ethernet/ti/cpsw_switchdev.c | 35 ++-
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 43 ++--
include/linux/if_bridge.h | 3 +
include/net/dsa.h | 7 +-
include/net/switchdev.h | 14 +-
include/soc/mscc/ocelot.h | 18 +-
net/bridge/br_if.c | 21 +-
net/bridge/br_netlink.c | 160 ++++++-------
net/bridge/br_private.h | 6 +-
net/bridge/br_switchdev.c | 37 ++-
net/dsa/dsa_priv.h | 8 +-
net/dsa/port.c | 42 +---
net/dsa/slave.c | 10 +-
net/switchdev/switchdev.c | 11 +-
25 files changed, 556 insertions(+), 296 deletions(-)

--
2.25.1


2021-02-09 15:22:56

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 01/11] net: switchdev: propagate extack to port attributes

From: Vladimir Oltean <[email protected]>

When a struct switchdev_attr is notified through switchdev, there is no
way to report informational messages, unlike for struct switchdev_obj.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
Patch is new.

.../ethernet/marvell/prestera/prestera_switchdev.c | 3 ++-
.../net/ethernet/mellanox/mlxsw/spectrum_switchdev.c | 3 ++-
drivers/net/ethernet/mscc/ocelot_net.c | 3 ++-
drivers/net/ethernet/ti/cpsw_switchdev.c | 3 ++-
include/net/switchdev.h | 6 ++++--
net/dsa/slave.c | 3 ++-
net/switchdev/switchdev.c | 11 ++++++++---
7 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index 8c2b03151736..2c1619715a4b 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -695,7 +695,8 @@ static int prestera_port_attr_stp_state_set(struct prestera_port *port,
}

static int prestera_port_obj_attr_set(struct net_device *dev,
- const struct switchdev_attr *attr)
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack)
{
struct prestera_port *port = netdev_priv(dev);
int err = 0;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 20c4f3c2cf23..18e4f1cd5587 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -887,7 +887,8 @@ mlxsw_sp_port_attr_br_mrouter_set(struct mlxsw_sp_port *mlxsw_sp_port,
}

static int mlxsw_sp_port_attr_set(struct net_device *dev,
- const struct switchdev_attr *attr)
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack)
{
struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev);
int err;
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index 8f12fa45b1b5..f9da4aa39444 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1005,7 +1005,8 @@ static void ocelot_port_attr_mc_set(struct ocelot *ocelot, int port, bool mc)
}

static int ocelot_port_attr_set(struct net_device *dev,
- const struct switchdev_attr *attr)
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack)
{
struct ocelot_port_private *priv = netdev_priv(dev);
struct ocelot *ocelot = priv->port.ocelot;
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
index 9967cf985728..13524cbaa8b6 100644
--- a/drivers/net/ethernet/ti/cpsw_switchdev.c
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -83,7 +83,8 @@ static int cpsw_port_attr_br_flags_pre_set(struct net_device *netdev,
}

static int cpsw_port_attr_set(struct net_device *ndev,
- const struct switchdev_attr *attr)
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack)
{
struct cpsw_priv *priv = netdev_priv(ndev);
int ret;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 88fcac140966..84c765312001 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -283,7 +283,8 @@ int switchdev_handle_port_attr_set(struct net_device *dev,
struct switchdev_notifier_port_attr_info *port_attr_info,
bool (*check_cb)(const struct net_device *dev),
int (*set_cb)(struct net_device *dev,
- const struct switchdev_attr *attr));
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack));
#else

static inline void switchdev_deferred_process(void)
@@ -374,7 +375,8 @@ switchdev_handle_port_attr_set(struct net_device *dev,
struct switchdev_notifier_port_attr_info *port_attr_info,
bool (*check_cb)(const struct net_device *dev),
int (*set_cb)(struct net_device *dev,
- const struct switchdev_attr *attr))
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack))
{
return 0;
}
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 431bdbdd8473..8f4c7c232e2c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -271,7 +271,8 @@ static int dsa_slave_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
}

static int dsa_slave_port_attr_set(struct net_device *dev,
- const struct switchdev_attr *attr)
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack)
{
struct dsa_port *dp = dsa_slave_to_port(dev);
int ret;
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 94113ca29dcf..0b84f076591e 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -488,14 +488,18 @@ static int __switchdev_handle_port_attr_set(struct net_device *dev,
struct switchdev_notifier_port_attr_info *port_attr_info,
bool (*check_cb)(const struct net_device *dev),
int (*set_cb)(struct net_device *dev,
- const struct switchdev_attr *attr))
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack))
{
+ struct netlink_ext_ack *extack;
struct net_device *lower_dev;
struct list_head *iter;
int err = -EOPNOTSUPP;

+ extack = switchdev_notifier_info_to_extack(&port_attr_info->info);
+
if (check_cb(dev)) {
- err = set_cb(dev, port_attr_info->attr);
+ err = set_cb(dev, port_attr_info->attr, extack);
if (err != -EOPNOTSUPP)
port_attr_info->handled = true;
return err;
@@ -525,7 +529,8 @@ int switchdev_handle_port_attr_set(struct net_device *dev,
struct switchdev_notifier_port_attr_info *port_attr_info,
bool (*check_cb)(const struct net_device *dev),
int (*set_cb)(struct net_device *dev,
- const struct switchdev_attr *attr))
+ const struct switchdev_attr *attr,
+ struct netlink_ext_ack *extack))
{
int err;

--
2.25.1

2021-02-09 15:23:34

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport

From: Vladimir Oltean <[email protected]>

The br_switchdev_set_port_flag function uses the atomic notifier call
chain because br_setport runs in an atomic section (under br->lock).
This is because port flag changes need to be synchronized with the data
path. But actually the switchdev notifier doesn't need that, only
br_set_port_flag does. So we can collect all the port flag changes and
only emit the notification at the end, then revert the changes if the
switchdev notification failed.

There's also the other aspect: if for example this command:

ip link set swp0 type bridge_slave flood off mcast_flood off learning off

succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
BR_LEARNING, there would be no attempt to revert the partial state in
any way. Arguably, if the user changes more than one flag through the
same netlink command, this one _should_ be all or nothing, which means
it should be passed through switchdev as all or nothing.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
Patch is new.

net/bridge/br_netlink.c | 155 +++++++++++++++-----------------------
net/bridge/br_switchdev.c | 7 +-
2 files changed, 66 insertions(+), 96 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index bd3962da345a..2c110bcbc6d0 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -853,103 +853,82 @@ static int br_set_port_state(struct net_bridge_port *p, u8 state)
}

/* Set/clear or port flags based on attribute */
-static int br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
- int attrtype, unsigned long mask)
+static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
+ int attrtype, unsigned long mask)
{
- unsigned long flags;
- int err;
-
if (!tb[attrtype])
- return 0;
+ return;

if (nla_get_u8(tb[attrtype]))
- flags = p->flags | mask;
+ p->flags |= mask;
else
- flags = p->flags & ~mask;
-
- err = br_switchdev_set_port_flag(p, flags, mask);
- if (err)
- return err;
-
- p->flags = flags;
- return 0;
+ p->flags &= ~mask;
}

/* Process bridge protocol info on port */
static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
{
- unsigned long old_flags = p->flags;
- bool br_vlan_tunnel_old = false;
+ unsigned long old_flags, changed_mask;
+ bool br_vlan_tunnel_old;
int err;

- err = br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE, BR_MULTICAST_FAST_LEAVE);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST, BR_MULTICAST_TO_UNICAST);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
- if (err)
+ spin_lock_bh(&p->br->lock);
+
+ old_flags = p->flags;
+ br_vlan_tunnel_old = (old_flags & BR_VLAN_TUNNEL) ? true : false;
+
+ br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
+ br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE,
+ BR_MULTICAST_FAST_LEAVE);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
+ br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
+ br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST,
+ BR_MULTICAST_TO_UNICAST);
+ br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
+ br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
+ br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
+ br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS, BR_NEIGH_SUPPRESS);
+ br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
+
+ changed_mask = old_flags ^ p->flags;
+
+ spin_unlock_bh(&p->br->lock);
+
+ err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
+ if (err) {
+ spin_lock_bh(&p->br->lock);
+ p->flags = old_flags;
+ spin_unlock_bh(&p->br->lock);
return err;
+ }

- br_vlan_tunnel_old = (p->flags & BR_VLAN_TUNNEL) ? true : false;
- err = br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
- if (err)
- return err;
+ spin_lock_bh(&p->br->lock);

if (br_vlan_tunnel_old && !(p->flags & BR_VLAN_TUNNEL))
nbp_vlan_tunnel_info_flush(p);

+ br_port_flags_change(p, changed_mask);
+
if (tb[IFLA_BRPORT_COST]) {
err = br_stp_set_path_cost(p, nla_get_u32(tb[IFLA_BRPORT_COST]));
if (err)
- return err;
+ goto out;
}

if (tb[IFLA_BRPORT_PRIORITY]) {
err = br_stp_set_port_priority(p, nla_get_u16(tb[IFLA_BRPORT_PRIORITY]));
if (err)
- return err;
+ goto out;
}

if (tb[IFLA_BRPORT_STATE]) {
err = br_set_port_state(p, nla_get_u8(tb[IFLA_BRPORT_STATE]));
if (err)
- return err;
+ goto out;
}

if (tb[IFLA_BRPORT_FLUSH])
@@ -961,7 +940,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])

err = br_multicast_set_port_router(p, mcast_router);
if (err)
- return err;
+ goto out;
}

if (tb[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT]) {
@@ -970,27 +949,20 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
hlimit = nla_get_u32(tb[IFLA_BRPORT_MCAST_EHT_HOSTS_LIMIT]);
err = br_multicast_eht_set_hosts_limit(p, hlimit);
if (err)
- return err;
+ goto out;
}
#endif

if (tb[IFLA_BRPORT_GROUP_FWD_MASK]) {
u16 fwd_mask = nla_get_u16(tb[IFLA_BRPORT_GROUP_FWD_MASK]);

- if (fwd_mask & BR_GROUPFWD_MACPAUSE)
- return -EINVAL;
+ if (fwd_mask & BR_GROUPFWD_MACPAUSE) {
+ err = -EINVAL;
+ goto out;
+ }
p->group_fwd_mask = fwd_mask;
}

- err = br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS,
- BR_NEIGH_SUPPRESS);
- if (err)
- return err;
-
- err = br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
- if (err)
- return err;
-
if (tb[IFLA_BRPORT_BACKUP_PORT]) {
struct net_device *backup_dev = NULL;
u32 backup_ifindex;
@@ -999,17 +971,21 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
if (backup_ifindex) {
backup_dev = __dev_get_by_index(dev_net(p->dev),
backup_ifindex);
- if (!backup_dev)
- return -ENOENT;
+ if (!backup_dev) {
+ err = -ENOENT;
+ goto out;
+ }
}

err = nbp_backup_change(p, backup_dev);
if (err)
- return err;
+ goto out;
}

- br_port_flags_change(p, old_flags ^ p->flags);
- return 0;
+out:
+ spin_unlock_bh(&p->br->lock);
+
+ return err;
}

/* Change state and parameters on port. */
@@ -1045,9 +1021,7 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh, u16 flags,
if (err)
return err;

- spin_lock_bh(&p->br->lock);
err = br_setport(p, tb);
- spin_unlock_bh(&p->br->lock);
} else {
/* Binary compatibility with old RSTP */
if (nla_len(protinfo) < sizeof(u8))
@@ -1134,17 +1108,10 @@ static int br_port_slave_changelink(struct net_device *brdev,
struct nlattr *data[],
struct netlink_ext_ack *extack)
{
- struct net_bridge *br = netdev_priv(brdev);
- int ret;
-
if (!data)
return 0;

- spin_lock_bh(&br->lock);
- ret = br_setport(br_port_get_rtnl(dev), data);
- spin_unlock_bh(&br->lock);
-
- return ret;
+ return br_setport(br_port_get_rtnl(dev), data);
}

static int br_port_fill_slave_info(struct sk_buff *skb,
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index a9c23ef83443..c004ade25ac0 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -65,16 +65,19 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
struct switchdev_attr attr = {
.orig_dev = p->dev,
.id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
- .u.brport_flags = mask,
};
struct switchdev_notifier_port_attr_info info = {
.attr = &attr,
};
int err;

- if (mask & ~BR_PORT_FLAGS_HW_OFFLOAD)
+ flags &= BR_PORT_FLAGS_HW_OFFLOAD;
+ mask &= BR_PORT_FLAGS_HW_OFFLOAD;
+ if (!mask)
return 0;

+ attr.u.brport_flags = mask;
+
/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
&info.info, NULL);
--
2.25.1

2021-02-09 15:25:18

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

From: Vladimir Oltean <[email protected]>

It must first be admitted that switchdev device drivers have a life
beyond the bridge, and when they aren't offloading the bridge driver
they are operating with forwarding disabled between ports, emulating as
closely as possible N standalone network interfaces.

Now it must be said that for a switchdev port operating in standalone
mode, address learning doesn't make much sense since that is a bridge
function. In fact, address learning even breaks setups such as this one:

+---------------------------------------------+
| |
| +-------------------+ |
| | br0 | send receive |
| +--------+-+--------+ +--------+ +--------+ |
| | | | | | | | | |
| | swp0 | | swp1 | | swp2 | | swp3 | |
| | | | | | | | | |
+-+--------+-+--------+-+--------+-+--------+-+
| ^ | ^
| | | |
| +-----------+ |
| |
+--------------------------------+

because if the ASIC has a single FDB (can offload a single bridge)
then source address learning on swp3 can "steal" the source MAC address
of swp2 from br0's FDB, because learning frames coming from swp2 will be
done twice: first on the swp1 ingress port, second on the swp3 ingress
port. So the hardware FDB will become out of sync with the software
bridge, and when swp2 tries to send one more packet towards swp1, the
ASIC will attempt to short-circuit the forwarding path and send it
directly to swp3 (since that's the last port it learned that address on),
which it obviously can't, because swp3 operates in standalone mode.

So switchdev drivers operating in standalone mode should disable address
learning. As a matter of practicality, we can reduce code duplication in
drivers by having the bridge notify through switchdev of the initial and
final brport flags. Then, drivers can simply start up hardcoded for no
address learning (similar to how they already start up hardcoded for no
forwarding), then they only need to listen for
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
need for special cases when the port joins or leaves the bridge etc.

When a port leaves the bridge (and therefore becomes standalone), we
issue a switchdev attribute that apart from disabling address learning,
enables flooding of all kinds. This is also done for pragmatic reasons,
because even though standalone switchdev ports might not need to have
flooding enabled in order to inject traffic with any MAC DA from the
control interface, it certainly doesn't hurt either, and it even makes
more sense than disabling flooding of unknown traffic towards that port.

Note that the implementation is a bit wacky because the switchdev API
for port attributes is very counterproductive. Instead of issuing a
single switchdev notification with a bitwise OR of all flags that we're
modifying, we need to issue 4 individual notifications, one for each bit.
This is because the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS notifier
forces you to refuse the entire operation if there's at least one bit
which you can't offload, and that is currently BR_BCAST_FLOOD which
nobody does. So this change would do nothing for no one if we offloaded
all flags at once, but the idea is to offload as much as possible
instead of all or nothing.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
- Renamed nbp_flags_change to nbp_flags_notify.
- Don't return any errors, offload flags one by one as opposed to all at
once.

include/linux/if_bridge.h | 3 +++
net/bridge/br_if.c | 21 ++++++++++++++++++++-
net/bridge/br_switchdev.c | 3 +--
3 files changed, 24 insertions(+), 3 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index b979005ea39c..36d77fa8f40b 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -58,6 +58,9 @@ struct br_ip_list {
#define BR_MRP_LOST_CONT BIT(18)
#define BR_MRP_LOST_IN_CONT BIT(19)

+#define BR_PORT_DEFAULT_FLAGS (BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD | \
+ BR_LEARNING)
+
#define BR_DEFAULT_AGEING_TIME (300 * HZ)

extern void brioctl_set(int (*ioctl_hook)(struct net *, unsigned int, void __user *));
diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
index f7d2f472ae24..f813eec986ba 100644
--- a/net/bridge/br_if.c
+++ b/net/bridge/br_if.c
@@ -89,6 +89,23 @@ void br_port_carrier_check(struct net_bridge_port *p, bool *notified)
spin_unlock_bh(&br->lock);
}

+/* If @mask has multiple bits set at once, offload them one by one to
+ * switchdev, to allow it to reject only what it doesn't support and accept
+ * what it does.
+ */
+static void nbp_flags_notify(struct net_bridge_port *p, unsigned long flags,
+ unsigned long mask)
+{
+ int flag;
+
+ for_each_set_bit(flag, &mask, 32)
+ br_switchdev_set_port_flag(p, flags & BIT(flag),
+ BIT(flag), NULL);
+
+ p->flags &= ~mask;
+ p->flags |= flags;
+}
+
static void br_port_set_promisc(struct net_bridge_port *p)
{
int err = 0;
@@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p)
update_headroom(br, get_max_headroom(br));
netdev_reset_rx_headroom(dev);

+ nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING,
+ BR_PORT_DEFAULT_FLAGS);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();
@@ -428,7 +447,7 @@ static struct net_bridge_port *new_nbp(struct net_bridge *br,
p->path_cost = port_cost(dev);
p->priority = 0x8000 >> BR_PORT_BITS;
p->port_no = index;
- p->flags = BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD | BR_BCAST_FLOOD;
+ nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS, BR_PORT_DEFAULT_FLAGS);
br_init_port(p);
br_set_state(p, BR_STATE_DISABLED);
br_stp_port_timer_init(p);
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index ac8dead86bf2..1fae532cfbb1 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -55,8 +55,7 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
}

/* Flags that can be offloaded to hardware */
-#define BR_PORT_FLAGS_HW_OFFLOAD (BR_LEARNING | BR_FLOOD | \
- BR_MCAST_FLOOD | BR_BCAST_FLOOD)
+#define BR_PORT_FLAGS_HW_OFFLOAD BR_PORT_DEFAULT_FLAGS

int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
--
2.25.1

2021-02-09 15:26:01

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 08/11] net: bridge: put SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS on the blocking call chain

From: Vladimir Oltean <[email protected]>

Now that br_switchdev_set_port_flag is never called from under br->lock,
it runs in sleepable context.

All switchdev drivers handle SWITCHDEV_PORT_ATTR_SET as both blocking
and atomic, so no changes are needed on that front.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
Patch is new.

net/bridge/br_switchdev.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index bc63b10b2e67..3b152f2cd9b5 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -79,9 +79,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
attr.u.brport_flags.val = flags;
attr.u.brport_flags.mask = mask;

- /* We run from atomic context here */
- err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
- &info.info, extack);
+ err = call_switchdev_blocking_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
+ &info.info, extack);
err = notifier_to_errno(err);
if (err == -EOPNOTSUPP)
return 0;
--
2.25.1

2021-02-09 15:27:02

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 05/11] net: dsa: stop setting initial and final brport flags

From: Vladimir Oltean <[email protected]>

With the bridge driver doing that for us now, we can simplify our
mid-layer logic a little bit, which would have otherwise needed some
tuning for the disabling of address learning that is necessary in
standalone mode.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
None.

net/dsa/port.c | 12 +-----------
1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/net/dsa/port.c b/net/dsa/port.c
index 5e079a61528e..aa1cbba7f89f 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -132,11 +132,6 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
};
int err;

- /* Set the flooding mode before joining the port in the switch */
- err = dsa_port_bridge_flags(dp, BR_FLOOD | BR_MCAST_FLOOD);
- if (err)
- return err;
-
/* Here the interface is already bridged. Reflect the current
* configuration so that drivers can program their chips accordingly.
*/
@@ -145,10 +140,8 @@ int dsa_port_bridge_join(struct dsa_port *dp, struct net_device *br)
err = dsa_broadcast(DSA_NOTIFIER_BRIDGE_JOIN, &info);

/* The bridging is rolled back on error */
- if (err) {
- dsa_port_bridge_flags(dp, 0);
+ if (err)
dp->bridge_dev = NULL;
- }

return err;
}
@@ -172,9 +165,6 @@ void dsa_port_bridge_leave(struct dsa_port *dp, struct net_device *br)
if (err)
pr_err("DSA: failed to notify DSA_NOTIFIER_BRIDGE_LEAVE\n");

- /* Port is leaving the bridge, disable flooding */
- dsa_port_bridge_flags(dp, 0);
-
/* Port left the bridge, put in BR_STATE_DISABLED by the bridge layer,
* so allow it to be in BR_STATE_FORWARDING to be kept functional
*/
--
2.25.1

2021-02-09 15:27:03

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 06/11] net: squash switchdev attributes PRE_BRIDGE_FLAGS and BRIDGE_FLAGS

From: Vladimir Oltean <[email protected]>

There does not appear to be any strong reason why
br_switchdev_set_port_flag issues a separate notification for checking
the supported brport flags rather than just attempting to apply them and
propagating the error if that fails.

However, there is a reason why this switchdev API is counterproductive
for a driver writer, and that is because although br_switchdev_set_port_flag
gets passed a "flags" and a "mask", those are passed piecemeal to the
driver, so while the PRE_BRIDGE_FLAGS listener knows what changed
because it has the "mask", the BRIDGE_FLAGS listener doesn't, because it
only has the final value. This means that "edge detection" needs to be
done by each individual BRIDGE_FLAGS listener by XOR-ing the old and the
new flags, which in turn means that copying the flags into a driver
private variable is strictly necessary.

This can be solved by passing the "flags" and the "mask" together into
a single switchdev attribute, and it also reduces some boilerplate in
the drivers that offload this.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
- Renamed "val" to "flags".
- Reworked drivers to check mask before performing any change.

.../marvell/prestera/prestera_switchdev.c | 29 +++++++------
.../mellanox/mlxsw/spectrum_switchdev.c | 28 ++++--------
drivers/net/ethernet/rocker/rocker_main.c | 24 ++---------
drivers/net/ethernet/ti/cpsw_switchdev.c | 32 ++++++--------
drivers/staging/fsl-dpaa2/ethsw/ethsw.c | 43 +++++++++----------
include/net/switchdev.h | 8 +++-
net/bridge/br_switchdev.c | 15 ++-----
net/dsa/dsa_priv.h | 4 +-
net/dsa/port.c | 22 +++-------
net/dsa/slave.c | 3 --
10 files changed, 78 insertions(+), 130 deletions(-)

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
index 2c1619715a4b..a797a7ff0cfe 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_switchdev.c
@@ -581,24 +581,32 @@ int prestera_bridge_port_event(struct net_device *dev, unsigned long event,

static int prestera_port_attr_br_flags_set(struct prestera_port *port,
struct net_device *dev,
- unsigned long flags)
+ struct switchdev_brport_flags flags)
{
struct prestera_bridge_port *br_port;
int err;

+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+ err = -EINVAL;
+
br_port = prestera_bridge_port_by_dev(port->sw->swdev, dev);
if (!br_port)
return 0;

- err = prestera_hw_port_flood_set(port, flags & BR_FLOOD);
- if (err)
- return err;
+ if (flags.mask & BR_FLOOD) {
+ err = prestera_hw_port_flood_set(port, flags.val & BR_FLOOD);
+ if (err)
+ return err;
+ }

- err = prestera_hw_port_learning_set(port, flags & BR_LEARNING);
- if (err)
- return err;
+ if (flags.mask & BR_LEARNING) {
+ err = prestera_hw_port_learning_set(port,
+ flags.val & BR_LEARNING);
+ if (err)
+ return err;
+ }

- memcpy(&br_port->flags, &flags, sizeof(flags));
+ memcpy(&br_port->flags, &flags.val, sizeof(flags.val));

return 0;
}
@@ -706,11 +714,6 @@ static int prestera_port_obj_attr_set(struct net_device *dev,
err = prestera_port_attr_stp_state_set(port, attr->orig_dev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- if (attr->u.brport_flags &
- ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
- err = -EINVAL;
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = prestera_port_attr_br_flags_set(port, attr->orig_dev,
attr->u.brport_flags);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
index 18e4f1cd5587..0a8521adb4e9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_switchdev.c
@@ -653,23 +653,16 @@ mlxsw_sp_bridge_port_learning_set(struct mlxsw_sp_port *mlxsw_sp_port,
return err;
}

-static int mlxsw_sp_port_attr_br_pre_flags_set(struct mlxsw_sp_port
- *mlxsw_sp_port,
- unsigned long brport_flags)
-{
- if (brport_flags & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
- return -EINVAL;
-
- return 0;
-}
-
static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,
struct net_device *orig_dev,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct mlxsw_sp_bridge_port *bridge_port;
int err;

+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
bridge_port = mlxsw_sp_bridge_port_find(mlxsw_sp_port->mlxsw_sp->bridge,
orig_dev);
if (!bridge_port)
@@ -677,12 +670,12 @@ static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,

err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
MLXSW_SP_FLOOD_TYPE_UC,
- brport_flags & BR_FLOOD);
+ flags.val & BR_FLOOD);
if (err)
return err;

err = mlxsw_sp_bridge_port_learning_set(mlxsw_sp_port, bridge_port,
- brport_flags & BR_LEARNING);
+ flags.val & BR_LEARNING);
if (err)
return err;

@@ -691,13 +684,12 @@ static int mlxsw_sp_port_attr_br_flags_set(struct mlxsw_sp_port *mlxsw_sp_port,

err = mlxsw_sp_bridge_port_flood_table_set(mlxsw_sp_port, bridge_port,
MLXSW_SP_FLOOD_TYPE_MC,
- brport_flags &
- BR_MCAST_FLOOD);
+ flags.val & BR_MCAST_FLOOD);
if (err)
return err;

out:
- memcpy(&bridge_port->flags, &brport_flags, sizeof(brport_flags));
+ memcpy(&bridge_port->flags, &flags.val, sizeof(flags.val));
return 0;
}

@@ -899,10 +891,6 @@ static int mlxsw_sp_port_attr_set(struct net_device *dev,
attr->orig_dev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = mlxsw_sp_port_attr_br_pre_flags_set(mlxsw_sp_port,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = mlxsw_sp_port_attr_br_flags_set(mlxsw_sp_port,
attr->orig_dev,
diff --git a/drivers/net/ethernet/rocker/rocker_main.c b/drivers/net/ethernet/rocker/rocker_main.c
index 740a715c49c6..898abf3d14d0 100644
--- a/drivers/net/ethernet/rocker/rocker_main.c
+++ b/drivers/net/ethernet/rocker/rocker_main.c
@@ -1575,8 +1575,8 @@ rocker_world_port_attr_bridge_flags_support_get(const struct rocker_port *
}

static int
-rocker_world_port_attr_pre_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
+rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
+ struct switchdev_brport_flags flags)
{
struct rocker_world_ops *wops = rocker_port->rocker->wops;
unsigned long brport_flags_s;
@@ -1590,22 +1590,10 @@ rocker_world_port_attr_pre_bridge_flags_set(struct rocker_port *rocker_port,
if (err)
return err;

- if (brport_flags & ~brport_flags_s)
+ if (flags.mask & ~brport_flags_s)
return -EINVAL;

- return 0;
-}
-
-static int
-rocker_world_port_attr_bridge_flags_set(struct rocker_port *rocker_port,
- unsigned long brport_flags)
-{
- struct rocker_world_ops *wops = rocker_port->rocker->wops;
-
- if (!wops->port_attr_bridge_flags_set)
- return -EOPNOTSUPP;
-
- return wops->port_attr_bridge_flags_set(rocker_port, brport_flags);
+ return wops->port_attr_bridge_flags_set(rocker_port, flags.val);
}

static int
@@ -2056,10 +2044,6 @@ static int rocker_port_attr_set(struct net_device *dev,
err = rocker_world_port_attr_stp_state_set(rocker_port,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = rocker_world_port_attr_pre_bridge_flags_set(rocker_port,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = rocker_world_port_attr_bridge_flags_set(rocker_port,
attr->u.brport_flags);
diff --git a/drivers/net/ethernet/ti/cpsw_switchdev.c b/drivers/net/ethernet/ti/cpsw_switchdev.c
index 13524cbaa8b6..5d8ec34f82ad 100644
--- a/drivers/net/ethernet/ti/cpsw_switchdev.c
+++ b/drivers/net/ethernet/ti/cpsw_switchdev.c
@@ -57,27 +57,25 @@ static int cpsw_port_stp_state_set(struct cpsw_priv *priv, u8 state)

static int cpsw_port_attr_br_flags_set(struct cpsw_priv *priv,
struct net_device *orig_dev,
- unsigned long brport_flags)
+ struct switchdev_brport_flags flags)
{
struct cpsw_common *cpsw = priv->cpsw;
- bool unreg_mcast_add = false;

- if (brport_flags & BR_MCAST_FLOOD)
- unreg_mcast_add = true;
- dev_dbg(priv->dev, "BR_MCAST_FLOOD: %d port %u\n",
- unreg_mcast_add, priv->emac_port);
+ if (flags.mask & ~(BR_LEARNING | BR_MCAST_FLOOD))
+ return -EINVAL;

- cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
- unreg_mcast_add);
+ if (flags.mask & BR_MCAST_FLOOD) {
+ bool unreg_mcast_add = false;

- return 0;
-}
+ if (flags.val & BR_MCAST_FLOOD)
+ unreg_mcast_add = true;

-static int cpsw_port_attr_br_flags_pre_set(struct net_device *netdev,
- unsigned long flags)
-{
- if (flags & ~(BR_LEARNING | BR_MCAST_FLOOD))
- return -EINVAL;
+ dev_dbg(priv->dev, "BR_MCAST_FLOOD: %d port %u\n",
+ unreg_mcast_add, priv->emac_port);
+
+ cpsw_ale_set_unreg_mcast(cpsw->ale, BIT(priv->emac_port),
+ unreg_mcast_add);
+ }

return 0;
}
@@ -92,10 +90,6 @@ static int cpsw_port_attr_set(struct net_device *ndev,
dev_dbg(priv->dev, "attr: id %u port: %u\n", attr->id, priv->emac_port);

switch (attr->id) {
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- ret = cpsw_port_attr_br_flags_pre_set(ndev,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
ret = cpsw_port_stp_state_set(priv, attr->u.stp_state);
dev_dbg(priv->dev, "stp state: %u\n", attr->u.stp_state);
diff --git a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
index ca3d07fe7f58..f675a2ba4dce 100644
--- a/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
+++ b/drivers/staging/fsl-dpaa2/ethsw/ethsw.c
@@ -908,31 +908,32 @@ static int dpaa2_switch_port_attr_stp_state_set(struct net_device *netdev,
return dpaa2_switch_port_set_stp_state(port_priv, state);
}

-static int dpaa2_switch_port_attr_br_flags_pre_set(struct net_device *netdev,
- unsigned long flags)
-{
- if (flags & ~(BR_LEARNING | BR_FLOOD))
- return -EINVAL;
-
- return 0;
-}
-
-static int dpaa2_switch_port_attr_br_flags_set(struct net_device *netdev,
- unsigned long flags)
+static int
+dpaa2_switch_port_attr_br_flags_set(struct net_device *netdev,
+ struct switchdev_brport_flags flags)
{
struct ethsw_port_priv *port_priv = netdev_priv(netdev);
int err = 0;

- /* Learning is enabled per switch */
- err = dpaa2_switch_set_learning(port_priv->ethsw_data,
- !!(flags & BR_LEARNING));
- if (err)
- goto exit;
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ /* Learning is enabled per switch */
+ err = dpaa2_switch_set_learning(port_priv->ethsw_data,
+ !!(flags.val & BR_LEARNING));
+ if (err)
+ return err;
+ }

- err = dpaa2_switch_port_set_flood(port_priv, !!(flags & BR_FLOOD));
+ if (flags.mask & BR_FLOOD) {
+ err = dpaa2_switch_port_set_flood(port_priv,
+ !!(flags.val & BR_FLOOD));
+ if (err)
+ return err;
+ }

-exit:
- return err;
+ return 0;
}

static int dpaa2_switch_port_attr_set(struct net_device *netdev,
@@ -945,10 +946,6 @@ static int dpaa2_switch_port_attr_set(struct net_device *netdev,
err = dpaa2_switch_port_attr_stp_state_set(netdev,
attr->u.stp_state);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- err = dpaa2_switch_port_attr_br_flags_pre_set(netdev,
- attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
err = dpaa2_switch_port_attr_br_flags_set(netdev,
attr->u.brport_flags);
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 84c765312001..aa9cad9bad7d 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -20,7 +20,6 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_UNDEFINED,
SWITCHDEV_ATTR_ID_PORT_STP_STATE,
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
- SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_MROUTER,
SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME,
SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING,
@@ -33,6 +32,11 @@ enum switchdev_attr_id {
#endif
};

+struct switchdev_brport_flags {
+ unsigned long val;
+ unsigned long mask;
+};
+
struct switchdev_attr {
struct net_device *orig_dev;
enum switchdev_attr_id id;
@@ -41,7 +45,7 @@ struct switchdev_attr {
void (*complete)(struct net_device *dev, int err, void *priv);
union {
u8 stp_state; /* PORT_STP_STATE */
- unsigned long brport_flags; /* PORT_{PRE}_BRIDGE_FLAGS */
+ struct switchdev_brport_flags brport_flags; /* PORT_BRIDGE_FLAGS */
bool mrouter; /* PORT_MROUTER */
clock_t ageing_time; /* BRIDGE_AGEING_TIME */
bool vlan_filtering; /* BRIDGE_VLAN_FILTERING */
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 1fae532cfbb1..bc63b10b2e67 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -64,7 +64,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
{
struct switchdev_attr attr = {
.orig_dev = p->dev,
- .id = SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
+ .id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
};
struct switchdev_notifier_port_attr_info info = {
.attr = &attr,
@@ -76,7 +76,8 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
if (!mask)
return 0;

- attr.u.brport_flags = mask;
+ attr.u.brport_flags.val = flags;
+ attr.u.brport_flags.mask = mask;

/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
@@ -92,16 +93,6 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,
return -EOPNOTSUPP;
}

- attr.id = SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS;
- attr.flags = SWITCHDEV_F_DEFER;
- attr.u.brport_flags = flags;
-
- err = switchdev_port_attr_set(p->dev, &attr);
- if (err) {
- NL_SET_ERR_MSG_MOD(extack, "error setting offload flag on port");
- return err;
- }
-
return 0;
}

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 8a1bcb2b4208..63770e421e4d 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -174,8 +174,8 @@ int dsa_port_mdb_add(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
-int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags);
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags);
+int dsa_port_bridge_flags(const struct dsa_port *dp,
+ struct switchdev_brport_flags flags);
int dsa_port_mrouter(struct dsa_port *dp, bool mrouter);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index aa1cbba7f89f..597d3d3eb507 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -382,28 +382,18 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock)
return 0;
}

-int dsa_port_pre_bridge_flags(const struct dsa_port *dp, unsigned long flags)
+int dsa_port_bridge_flags(const struct dsa_port *dp,
+ struct switchdev_brport_flags flags)
{
struct dsa_switch *ds = dp->ds;
+ int port = dp->index;

if (!ds->ops->port_egress_floods ||
- (flags & ~(BR_FLOOD | BR_MCAST_FLOOD)))
+ (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD)))
return -EINVAL;

- return 0;
-}
-
-int dsa_port_bridge_flags(const struct dsa_port *dp, unsigned long flags)
-{
- struct dsa_switch *ds = dp->ds;
- int port = dp->index;
- int err = 0;
-
- if (ds->ops->port_egress_floods)
- err = ds->ops->port_egress_floods(ds, port, flags & BR_FLOOD,
- flags & BR_MCAST_FLOOD);
-
- return err;
+ return ds->ops->port_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
}

int dsa_port_mrouter(struct dsa_port *dp, bool mrouter)
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 8f4c7c232e2c..0e1f8f1d4e2c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -290,9 +290,6 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME:
ret = dsa_port_ageing_time(dp, attr->u.ageing_time);
break;
- case SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS:
- ret = dsa_port_pre_bridge_flags(dp, attr->u.brport_flags);
- break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
ret = dsa_port_bridge_flags(dp, attr->u.brport_flags);
break;
--
2.25.1

2021-02-09 15:27:44

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 10/11] net: mscc: ocelot: offload bridge port flags to device

From: Vladimir Oltean <[email protected]>

We should not be unconditionally enabling address learning, since doing
that is actively detrimential when a port is standalone and not offloading
a bridge. Namely, if a port in the switch is standalone and others are
offloading the bridge, then we could enter a situation where we learn an
address towards the standalone port, but the bridged ports could not
forward the packet there, because the CPU is the only path between the
standalone and the bridged ports. The solution of course is to not
enable address learning unless the bridge asks for it.

We need to set up the initial port flags for no learning and flooding
everything, then the bridge takes over. The flood configuration was
already configured ok in ocelot_init, we just need to disable learning
in ocelot_init_port.

Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Alexandre Belloni <[email protected]>
---
Changes in v2:
- Disable learning in ocelot_init_port.
- Keep a single bool ocelot_port->learn_ena instead of
ocelot_port->brport_flags.
- Stop touching the brport_flags from ocelot_port_bridge_leave (which
was a leftover).

drivers/net/dsa/ocelot/felix.c | 10 +++++
drivers/net/ethernet/mscc/ocelot.c | 59 +++++++++++++++++++++++++-
drivers/net/ethernet/mscc/ocelot_net.c | 4 ++
include/soc/mscc/ocelot.h | 3 ++
4 files changed, 75 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c
index 1bd5aea12b25..4ff18415ef95 100644
--- a/drivers/net/dsa/ocelot/felix.c
+++ b/drivers/net/dsa/ocelot/felix.c
@@ -553,6 +553,15 @@ static void felix_bridge_stp_state_set(struct dsa_switch *ds, int port,
return ocelot_bridge_stp_state_set(ocelot, port, state);
}

+static int felix_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags val,
+ struct netlink_ext_ack *extack)
+{
+ struct ocelot *ocelot = ds->priv;
+
+ return ocelot_port_bridge_flags(ocelot, port, val);
+}
+
static int felix_bridge_join(struct dsa_switch *ds, int port,
struct net_device *br)
{
@@ -1358,6 +1367,7 @@ const struct dsa_switch_ops felix_switch_ops = {
.port_fdb_del = felix_fdb_del,
.port_mdb_add = felix_mdb_add,
.port_mdb_del = felix_mdb_del,
+ .port_bridge_flags = felix_bridge_flags,
.port_bridge_join = felix_bridge_join,
.port_bridge_leave = felix_bridge_leave,
.port_lag_join = felix_lag_join,
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 8c1052346b58..65bc7d59d2c9 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -984,6 +984,7 @@ EXPORT_SYMBOL(ocelot_apply_bridge_fwd_mask);

void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
u32 port_cfg;

if (!(BIT(port) & ocelot->bridge_mask))
@@ -996,7 +997,8 @@ void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state)
ocelot->bridge_fwd_mask |= BIT(port);
fallthrough;
case BR_STATE_LEARNING:
- port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
+ if (ocelot_port->learn_ena)
+ port_cfg |= ANA_PORT_PORT_CFG_LEARN_ENA;
break;

default:
@@ -1480,6 +1482,57 @@ int ocelot_get_max_mtu(struct ocelot *ocelot, int port)
}
EXPORT_SYMBOL(ocelot_get_max_mtu);

+int ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
+ struct switchdev_brport_flags flags)
+{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
+
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+ BR_BCAST_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ u32 val = 0;
+
+ ocelot_port->learn_ena = !!(flags.val & BR_LEARNING);
+ if (ocelot_port->learn_ena)
+ val = ANA_PORT_PORT_CFG_LEARN_ENA;
+
+ ocelot_rmw_gix(ocelot, val, ANA_PORT_PORT_CFG_LEARN_ENA,
+ ANA_PORT_PORT_CFG, port);
+ }
+
+ if (flags.mask & BR_FLOOD) {
+ u32 val = 0;
+
+ if (flags.val & BR_FLOOD)
+ val = BIT(port);
+
+ ocelot_rmw_rix(ocelot, val, BIT(port), ANA_PGID_PGID, PGID_UC);
+ }
+
+ if (flags.mask & BR_MCAST_FLOOD) {
+ u32 val = 0;
+
+ if (flags.val & BR_MCAST_FLOOD)
+ val = BIT(port);
+
+ ocelot_rmw_rix(ocelot, val, BIT(port), ANA_PGID_PGID, PGID_MC);
+ }
+
+ if (flags.mask & BR_BCAST_FLOOD) {
+ u32 val = 0;
+
+ if (flags.val & BR_BCAST_FLOOD)
+ val = BIT(port);
+
+ ocelot_rmw_rix(ocelot, val, BIT(port), ANA_PGID_PGID, PGID_BC);
+ }
+
+ return 0;
+}
+EXPORT_SYMBOL(ocelot_port_bridge_flags);
+
void ocelot_init_port(struct ocelot *ocelot, int port)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
@@ -1524,6 +1577,10 @@ void ocelot_init_port(struct ocelot *ocelot, int port)
ANA_PORT_DROP_CFG_DROP_MC_SMAC_ENA,
ANA_PORT_DROP_CFG, port);

+ /* Disable source address learning for standalone mode */
+ ocelot_rmw_gix(ocelot, 0, ANA_PORT_PORT_CFG_LEARN_ENA,
+ ANA_PORT_PORT_CFG, port);
+
/* Set default VLAN and tag type to 8021Q. */
ocelot_rmw_gix(ocelot, REW_PORT_VLAN_CFG_PORT_TPID(ETH_P_8021Q),
REW_PORT_VLAN_CFG_PORT_TPID_M,
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index f9da4aa39444..9944d79c6685 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -1026,6 +1026,10 @@ static int ocelot_port_attr_set(struct net_device *dev,
case SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED:
ocelot_port_attr_mc_set(ocelot, port, !attr->u.mc_disabled);
break;
+ case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
+ err = ocelot_port_bridge_flags(ocelot, port,
+ attr->u.brport_flags);
+ break;
default:
err = -EOPNOTSUPP;
break;
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 7ee85527cb5f..e6aacf65fa1e 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -612,6 +612,7 @@ struct ocelot_port {

u8 *xmit_template;
bool is_dsa_8021q_cpu;
+ bool learn_ena;

struct net_device *bond;
bool lag_tx_active;
@@ -764,6 +765,8 @@ void ocelot_adjust_link(struct ocelot *ocelot, int port,
int ocelot_port_vlan_filtering(struct ocelot *ocelot, int port, bool enabled);
void ocelot_bridge_stp_state_set(struct ocelot *ocelot, int port, u8 state);
void ocelot_apply_bridge_fwd_mask(struct ocelot *ocelot);
+int ocelot_port_bridge_flags(struct ocelot *ocelot, int port,
+ struct switchdev_brport_flags val);
int ocelot_port_bridge_join(struct ocelot *ocelot, int port,
struct net_device *bridge);
int ocelot_port_bridge_leave(struct ocelot *ocelot, int port,
--
2.25.1

2021-02-09 15:27:47

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 11/11] net: dsa: sja1105: offload bridge port flags to device

From: Vladimir Oltean <[email protected]>

The chip can configure unicast flooding, broadcast flooding and learning.
Learning is per port, while flooding is per {ingress, egress} port pair
and we need to configure the same value for all possible ingress ports
towards the requested one.

While multicast flooding is not officially supported, we can hack it by
using a feature of the second generation (P/Q/R/S) devices, which is that
FDB entries are maskable, and multicast addresses always have an odd
first octet. So by putting a match-all for 00:01:00:00:00:00 addr and
00:01:00:00:00:00 mask at the end of the FDB, we make sure that it is
always checked last, and does not take precedence in front of any other
MDB. So it behaves effectively as an unknown multicast entry.

For the first generation switches, this feature is not available, so
unknown multicast will always be treated the same as unknown unicast.
So the only thing we can do is request the user to offload the settings
for these 2 flags in tandem, i.e.

ip link set swp2 type bridge_slave flood off
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.
ip link set swp2 type bridge_slave flood off mcast_flood off
ip link set swp2 type bridge_slave mcast_flood on
Error: sja1105: This chip cannot configure multicast flooding independently of unicast.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
Patch is new.

drivers/net/dsa/sja1105/sja1105.h | 2 +
drivers/net/dsa/sja1105/sja1105_main.c | 212 +++++++++++++++++++++++--
drivers/net/dsa/sja1105/sja1105_spi.c | 6 +
3 files changed, 209 insertions(+), 11 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index d582308c2401..15a0893d0ff1 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -94,6 +94,7 @@ struct sja1105_info {
* pop it when it's equal to TPID2.
*/
u16 qinq_tpid;
+ bool can_limit_mcast_flood;
int (*reset_cmd)(struct dsa_switch *ds);
int (*setup_rgmii_delay)(const void *ctx, int port);
/* Prototypes from include/net/dsa.h */
@@ -204,6 +205,7 @@ struct sja1105_private {
bool rgmii_rx_delay[SJA1105_NUM_PORTS];
bool rgmii_tx_delay[SJA1105_NUM_PORTS];
bool best_effort_vlan_filtering;
+ unsigned long learn_ena;
const struct sja1105_info *info;
struct gpio_desc *reset_gpio;
struct spi_device *spidev;
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 282253543f3b..8373cc1f5df1 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -25,6 +25,8 @@
#include "sja1105_sgmii.h"
#include "sja1105_tas.h"

+#define SJA1105_UNKNOWN_MULTICAST 0x010000000000ull
+
static const struct dsa_switch_ops sja1105_switch_ops;

static void sja1105_hw_reset(struct gpio_desc *gpio, unsigned int pulse_len,
@@ -42,15 +44,10 @@ static void
sja1105_port_allow_traffic(struct sja1105_l2_forwarding_entry *l2_fwd,
int from, int to, bool allow)
{
- if (allow) {
- l2_fwd[from].bc_domain |= BIT(to);
+ if (allow)
l2_fwd[from].reach_port |= BIT(to);
- l2_fwd[from].fl_domain |= BIT(to);
- } else {
- l2_fwd[from].bc_domain &= ~BIT(to);
+ else
l2_fwd[from].reach_port &= ~BIT(to);
- l2_fwd[from].fl_domain &= ~BIT(to);
- }
}

/* Structure used to temporarily transport device tree
@@ -220,17 +217,43 @@ static int sja1105_init_mii_settings(struct sja1105_private *priv,

static int sja1105_init_static_fdb(struct sja1105_private *priv)
{
+ struct sja1105_l2_lookup_entry *l2_lookup;
struct sja1105_table *table;
+ int port;

table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP];

- /* We only populate the FDB table through dynamic
- * L2 Address Lookup entries
+ /* We only populate the FDB table through dynamic L2 Address Lookup
+ * entries, except for a special entry at the end which is a catch-all
+ * for unknown multicast and will be used to control flooding domain.
*/
if (table->entry_count) {
kfree(table->entries);
table->entry_count = 0;
}
+
+ if (!priv->info->can_limit_mcast_flood)
+ return 0;
+
+ table->entries = kcalloc(1, table->ops->unpacked_entry_size,
+ GFP_KERNEL);
+ if (!table->entries)
+ return -ENOMEM;
+
+ table->entry_count = 1;
+ l2_lookup = table->entries;
+
+ /* All L2 multicast addresses have an odd first octet */
+ l2_lookup[0].macaddr = SJA1105_UNKNOWN_MULTICAST;
+ l2_lookup[0].mask_macaddr = SJA1105_UNKNOWN_MULTICAST;
+ l2_lookup[0].lockeds = true;
+ l2_lookup[0].index = SJA1105_MAX_L2_LOOKUP_COUNT - 1;
+
+ /* Flood multicast to every port by default */
+ for (port = 0; port < priv->ds->num_ports; port++)
+ if (!dsa_is_unused_port(priv->ds, port))
+ l2_lookup[0].destports |= BIT(port);
+
return 0;
}

@@ -390,6 +413,12 @@ static int sja1105_init_l2_forwarding(struct sja1105_private *priv)

sja1105_port_allow_traffic(l2fwd, i, upstream, true);
sja1105_port_allow_traffic(l2fwd, upstream, i, true);
+
+ l2fwd[i].bc_domain = BIT(upstream);
+ l2fwd[i].fl_domain = BIT(upstream);
+
+ l2fwd[upstream].bc_domain |= BIT(i);
+ l2fwd[upstream].fl_domain |= BIT(i);
}
/* Next 8 entries define VLAN PCP mapping from ingress to egress.
* Create a one-to-one mapping.
@@ -1514,6 +1543,12 @@ static int sja1105_fdb_dump(struct dsa_switch *ds, int port,
*/
if (!(l2_lookup.destports & BIT(port)))
continue;
+
+ /* We need to hide the FDB entry for unknown multicast */
+ if (l2_lookup.macaddr == SJA1105_UNKNOWN_MULTICAST &&
+ l2_lookup.mask_macaddr == SJA1105_UNKNOWN_MULTICAST)
+ continue;
+
u64_to_ether_addr(l2_lookup.macaddr, macaddr);

/* We need to hide the dsa_8021q VLANs from the user. */
@@ -1605,12 +1640,12 @@ static void sja1105_bridge_stp_state_set(struct dsa_switch *ds, int port,
case BR_STATE_LEARNING:
mac[port].ingress = true;
mac[port].egress = false;
- mac[port].dyn_learn = true;
+ mac[port].dyn_learn = !!(priv->learn_ena & BIT(port));
break;
case BR_STATE_FORWARDING:
mac[port].ingress = true;
mac[port].egress = true;
- mac[port].dyn_learn = true;
+ mac[port].dyn_learn = !!(priv->learn_ena & BIT(port));
break;
default:
dev_err(ds->dev, "invalid STP state: %d\n", state);
@@ -3239,6 +3274,160 @@ static void sja1105_port_policer_del(struct dsa_switch *ds, int port)
sja1105_static_config_reload(priv, SJA1105_BEST_EFFORT_POLICING);
}

+static int sja1105_port_set_learning(struct sja1105_private *priv, int port,
+ bool enabled)
+{
+ struct sja1105_mac_config_entry *mac;
+ int rc;
+
+ mac = priv->static_config.tables[BLK_IDX_MAC_CONFIG].entries;
+
+ mac[port].dyn_learn = !!(priv->learn_ena & BIT(port));
+
+ rc = sja1105_dynamic_config_write(priv, BLK_IDX_MAC_CONFIG, port,
+ &mac[port], true);
+ if (rc)
+ return rc;
+
+ if (enabled)
+ priv->learn_ena |= BIT(port);
+ else
+ priv->learn_ena &= ~BIT(port);
+
+ return 0;
+}
+
+/* Common function for unicast and broadcast flood configuration.
+ * Flooding is configured between each {ingress, egress} port pair, and since
+ * the bridge's semantics are those of "egress flooding", it means we must
+ * enable flooding towards this port from all ingress ports that are in the
+ * same bridge. In practice, we just enable flooding from all possible ingress
+ * ports regardless of whether they're in the same bridge or not, since the
+ * reach_port configuration will not allow flooded frames to leak across
+ * bridging domains anyway.
+ */
+static int sja1105_port_ucast_bcast_flood(struct sja1105_private *priv, int to,
+ struct switchdev_brport_flags flags)
+{
+ struct sja1105_l2_forwarding_entry *l2_fwd;
+ int from, rc;
+
+ l2_fwd = priv->static_config.tables[BLK_IDX_L2_FORWARDING].entries;
+
+ for (from = 0; from < priv->ds->num_ports; from++) {
+ if (dsa_is_unused_port(priv->ds, from))
+ continue;
+ if (from == to)
+ continue;
+
+ /* Unicast */
+ if (flags.mask & BR_FLOOD) {
+ if (flags.val & BR_FLOOD)
+ l2_fwd[from].fl_domain |= BIT(to);
+ else
+ l2_fwd[from].fl_domain &= ~BIT(to);
+ }
+ /* Broadcast */
+ if (flags.mask & BR_BCAST_FLOOD) {
+ if (flags.val & BR_BCAST_FLOOD)
+ l2_fwd[from].bc_domain |= BIT(to);
+ else
+ l2_fwd[from].bc_domain &= ~BIT(to);
+ }
+
+ rc = sja1105_dynamic_config_write(priv, BLK_IDX_L2_FORWARDING,
+ from, &l2_fwd[from], true);
+ if (rc < 0)
+ return rc;
+ }
+
+ return 0;
+}
+
+static int sja1105_port_mcast_flood(struct sja1105_private *priv, int to,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ struct sja1105_l2_lookup_entry *l2_lookup;
+ struct sja1105_table *table;
+ int match;
+
+ table = &priv->static_config.tables[BLK_IDX_L2_LOOKUP];
+ l2_lookup = table->entries;
+
+ for (match = 0; match < table->entry_count; match++)
+ if (l2_lookup[match].macaddr == SJA1105_UNKNOWN_MULTICAST &&
+ l2_lookup[match].mask_macaddr == SJA1105_UNKNOWN_MULTICAST)
+ break;
+
+ if (match == table->entry_count) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Could not find FDB entry for unknown multicast");
+ return -ENOSPC;
+ }
+
+ if (flags.val & BR_MCAST_FLOOD)
+ l2_lookup[match].destports |= BIT(to);
+ else
+ l2_lookup[match].destports &= ~BIT(to);
+
+ return sja1105_dynamic_config_write(priv, BLK_IDX_L2_LOOKUP,
+ l2_lookup[match].index,
+ &l2_lookup[match],
+ true);
+}
+
+static int sja1105_port_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ struct sja1105_private *priv = ds->priv;
+ int rc;
+
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+ BR_BCAST_FLOOD))
+ return -EINVAL;
+
+ if (flags.mask & BR_LEARNING) {
+ bool learn_ena = !!(flags.val & BR_LEARNING);
+
+ rc = sja1105_port_set_learning(priv, port, learn_ena);
+ if (rc)
+ return rc;
+ }
+
+ if (flags.mask & (BR_FLOOD | BR_BCAST_FLOOD)) {
+ rc = sja1105_port_ucast_bcast_flood(priv, port, flags);
+ if (rc)
+ return rc;
+ }
+
+ if (flags.mask & (BR_FLOOD | BR_MCAST_FLOOD) &&
+ !priv->info->can_limit_mcast_flood) {
+ bool multicast = !!(flags.val & BR_MCAST_FLOOD);
+ bool unicast = !!(flags.val & BR_FLOOD);
+
+ if (unicast != multicast) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "This chip cannot configure multicast flooding independently of unicast");
+ return -EINVAL;
+ }
+ }
+
+ /* For chips that can't offload BR_MCAST_FLOOD independently, there
+ * is nothing to do here, we ensured the configuration is in sync by
+ * offloading BR_FLOOD.
+ */
+ if (flags.mask & BR_MCAST_FLOOD && priv->info->can_limit_mcast_flood) {
+ rc = sja1105_port_mcast_flood(priv, port, flags,
+ extack);
+ if (rc)
+ return rc;
+ }
+
+ return 0;
+}
+
static const struct dsa_switch_ops sja1105_switch_ops = {
.get_tag_protocol = sja1105_get_tag_protocol,
.setup = sja1105_setup,
@@ -3262,6 +3451,7 @@ static const struct dsa_switch_ops sja1105_switch_ops = {
.port_fdb_del = sja1105_fdb_del,
.port_bridge_join = sja1105_bridge_join,
.port_bridge_leave = sja1105_bridge_leave,
+ .port_bridge_flags = sja1105_port_bridge_flags,
.port_stp_state_set = sja1105_bridge_stp_state_set,
.port_vlan_filtering = sja1105_vlan_filtering,
.port_vlan_add = sja1105_vlan_add,
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index 591c5734747d..f7a1514f81e8 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -512,6 +512,7 @@ const struct sja1105_info sja1105e_info = {
.static_ops = sja1105e_table_ops,
.dyn_ops = sja1105et_dyn_ops,
.qinq_tpid = ETH_P_8021Q,
+ .can_limit_mcast_flood = false,
.ptp_ts_bits = 24,
.ptpegr_ts_bytes = 4,
.num_cbs_shapers = SJA1105ET_MAX_CBS_COUNT,
@@ -529,6 +530,7 @@ const struct sja1105_info sja1105t_info = {
.static_ops = sja1105t_table_ops,
.dyn_ops = sja1105et_dyn_ops,
.qinq_tpid = ETH_P_8021Q,
+ .can_limit_mcast_flood = false,
.ptp_ts_bits = 24,
.ptpegr_ts_bytes = 4,
.num_cbs_shapers = SJA1105ET_MAX_CBS_COUNT,
@@ -546,6 +548,7 @@ const struct sja1105_info sja1105p_info = {
.static_ops = sja1105p_table_ops,
.dyn_ops = sja1105pqrs_dyn_ops,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
@@ -564,6 +567,7 @@ const struct sja1105_info sja1105q_info = {
.static_ops = sja1105q_table_ops,
.dyn_ops = sja1105pqrs_dyn_ops,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
@@ -582,6 +586,7 @@ const struct sja1105_info sja1105r_info = {
.static_ops = sja1105r_table_ops,
.dyn_ops = sja1105pqrs_dyn_ops,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
@@ -601,6 +606,7 @@ const struct sja1105_info sja1105s_info = {
.dyn_ops = sja1105pqrs_dyn_ops,
.regs = &sja1105pqrs_regs,
.qinq_tpid = ETH_P_8021AD,
+ .can_limit_mcast_flood = true,
.ptp_ts_bits = 32,
.ptpegr_ts_bytes = 8,
.num_cbs_shapers = SJA1105PQRS_MAX_CBS_COUNT,
--
2.25.1

2021-02-09 20:00:13

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 01/11] net: switchdev: propagate extack to port attributes

On Tue, Feb 09, 2021 at 05:19:26PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> When a struct switchdev_attr is notified through switchdev, there is no
> way to report informational messages, unlike for struct switchdev_obj.
>
> Signed-off-by: Vladimir Oltean <[email protected]>

Reviewed-by: Ido Schimmel <[email protected]>

2021-02-09 20:42:37

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport

On Tue, Feb 09, 2021 at 05:19:27PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> The br_switchdev_set_port_flag function uses the atomic notifier call
> chain because br_setport runs in an atomic section (under br->lock).
> This is because port flag changes need to be synchronized with the data
> path. But actually the switchdev notifier doesn't need that, only
> br_set_port_flag does. So we can collect all the port flag changes and
> only emit the notification at the end, then revert the changes if the
> switchdev notification failed.
>
> There's also the other aspect: if for example this command:
>
> ip link set swp0 type bridge_slave flood off mcast_flood off learning off
>
> succeeded at configuring BR_FLOOD and BR_MCAST_FLOOD but not at
> BR_LEARNING, there would be no attempt to revert the partial state in
> any way. Arguably, if the user changes more than one flag through the
> same netlink command, this one _should_ be all or nothing, which means
> it should be passed through switchdev as all or nothing.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---
(...)
> + spin_lock_bh(&p->br->lock);
> +
> + old_flags = p->flags;
> + br_vlan_tunnel_old = (old_flags & BR_VLAN_TUNNEL) ? true : false;
> +
> + br_set_port_flag(p, tb, IFLA_BRPORT_MODE, BR_HAIRPIN_MODE);
> + br_set_port_flag(p, tb, IFLA_BRPORT_GUARD, BR_BPDU_GUARD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_FAST_LEAVE,
> + BR_MULTICAST_FAST_LEAVE);
> + br_set_port_flag(p, tb, IFLA_BRPORT_PROTECT, BR_ROOT_BLOCK);
> + br_set_port_flag(p, tb, IFLA_BRPORT_LEARNING, BR_LEARNING);
> + br_set_port_flag(p, tb, IFLA_BRPORT_UNICAST_FLOOD, BR_FLOOD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_FLOOD, BR_MCAST_FLOOD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_MCAST_TO_UCAST,
> + BR_MULTICAST_TO_UNICAST);
> + br_set_port_flag(p, tb, IFLA_BRPORT_BCAST_FLOOD, BR_BCAST_FLOOD);
> + br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP, BR_PROXYARP);
> + br_set_port_flag(p, tb, IFLA_BRPORT_PROXYARP_WIFI, BR_PROXYARP_WIFI);
> + br_set_port_flag(p, tb, IFLA_BRPORT_VLAN_TUNNEL, BR_VLAN_TUNNEL);
> + br_set_port_flag(p, tb, IFLA_BRPORT_NEIGH_SUPPRESS, BR_NEIGH_SUPPRESS);
> + br_set_port_flag(p, tb, IFLA_BRPORT_ISOLATED, BR_ISOLATED);
> +
> + changed_mask = old_flags ^ p->flags;
> +
> + spin_unlock_bh(&p->br->lock);
> +
> + err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
> + if (err) {
> + spin_lock_bh(&p->br->lock);
> + p->flags = old_flags;
> + spin_unlock_bh(&p->br->lock);
> return err;
> + }
>

I know it's a bit strange to insert this in the middle of review, but
bear with me.

While I was reworking the patch series to also make sysfs non-atomic,
like this:

-----------------------------[cut here]-----------------------------
From 6ff6714b6686e4f9406425edf15db6c92e944954 Mon Sep 17 00:00:00 2001
From: Vladimir Oltean <[email protected]>
Date: Tue, 9 Feb 2021 19:43:40 +0200
Subject: [PATCH] net: bridge: temporarily drop br->lock for
br_switchdev_set_port_flag in sysfs

Since we would like br_switchdev_set_port_flag to not use an atomic
notifier, it should be called from outside spinlock context.

Dropping the lock creates some concurrency complications:
- There might be an "echo 1 > multicast_flood" simultaneous with an
"echo 0 > multicast_flood". The result of this is nondeterministic
either way, so I'm not too concerned as long as the result is
consistent (no other flags have changed).
- There might be an "echo 1 > multicast_flood" simultaneous with an
"echo 0 > learning". My expectation is that none of the two writes are
"eaten", and the final flags contain BR_MCAST_FLOOD=1 and BR_LEARNING=0
regardless of the order of execution. That is actually possible if, on
the commit path, we don't do a trivial "p->flags = flags" which might
overwrite bits outside of our mask, but instead we just change the
flags corresponding to our mask.

Signed-off-by: Vladimir Oltean <[email protected]>
---
net/bridge/br_sysfs_if.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/net/bridge/br_sysfs_if.c b/net/bridge/br_sysfs_if.c
index 62540b31e356..b419d9aad548 100644
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -68,17 +68,23 @@ static int store_flag(struct net_bridge_port *p, unsigned long v,
else
flags &= ~mask;

- if (flags != p->flags) {
- err = br_switchdev_set_port_flag(p, flags, mask, &extack);
- if (err) {
- if (extack._msg)
- netdev_err(p->dev, "%s\n", extack._msg);
- return err;
- }
+ if (flags == p->flags)
+ return 0;

- p->flags = flags;
- br_port_flags_change(p, mask);
+ spin_unlock_bh(&p->br->lock);
+ err = br_switchdev_set_port_flag(p, flags, mask, &extack);
+ spin_lock_bh(&p->br->lock);
+ if (err) {
+ if (extack._msg)
+ netdev_err(p->dev, "%s\n", extack._msg);
+ return err;
}
+
+ p->flags &= ~mask;
+ p->flags |= (flags & mask);
+
+ br_port_flags_change(p, mask);
+
return 0;
}

-----------------------------[cut here]-----------------------------

I figured there's a similar problem in this patch, which I had missed.
The code now looks like this:

changed_mask = old_flags ^ p->flags;
flags = p->flags;

spin_unlock_bh(&p->br->lock);

err = br_switchdev_set_port_flag(p, flags, changed_mask, extack);
if (err) {
spin_lock_bh(&p->br->lock);
p->flags &= ~changed_mask;
p->flags |= (old_flags & changed_mask);
spin_unlock_bh(&p->br->lock);
return err;
}

spin_lock_bh(&p->br->lock);

where I no longer access p->flags directly when calling
br_switchdev_set_port_flag (because I'm not protected by br->lock) but a
copy of it saved on stack. Also, I restore just the mask portion of
p->flags.

But there's an interesting side effect of allowing
br_switchdev_set_port_flag to run concurrently (notifier call chains use
a rw_semaphore and only take the read side). Basically now drivers that
cache the brport flags in their entirety are broken, because there isn't
any guarantee that bits outside the mask are valid any longer (we can
even enforce that by masking the flags with the mask when notifying
them). They would need to do the same trick of updating just the masked
part of their cached flags. Except for the fact that they would need
some sort of spinlock too, I don't think that the basic bitwise
operations are atomic or anything like that. I'm a bit reluctant to add
a spinlock in prestera, rocker, mlxsw just for this purpose. What do you
think?

2021-02-09 21:06:28

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> So switchdev drivers operating in standalone mode should disable address
> learning. As a matter of practicality, we can reduce code duplication in
> drivers by having the bridge notify through switchdev of the initial and
> final brport flags. Then, drivers can simply start up hardcoded for no
> address learning (similar to how they already start up hardcoded for no
> forwarding), then they only need to listen for
> SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> need for special cases when the port joins or leaves the bridge etc.

How are you handling the case where a port leaves a LAG that is linked
to a bridge? In this case the port becomes a standalone port, but will
not get this notification.

2021-02-10 00:27:34

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > So switchdev drivers operating in standalone mode should disable address
> > learning. As a matter of practicality, we can reduce code duplication in
> > drivers by having the bridge notify through switchdev of the initial and
> > final brport flags. Then, drivers can simply start up hardcoded for no
> > address learning (similar to how they already start up hardcoded for no
> > forwarding), then they only need to listen for
> > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > need for special cases when the port joins or leaves the bridge etc.
>
> How are you handling the case where a port leaves a LAG that is linked
> to a bridge? In this case the port becomes a standalone port, but will
> not get this notification.

Apparently the answer to that question is "I delete the code that makes
this use case work", how smart of me. Thanks.

Unless you have any idea how I could move the logic into the bridge, I
guess I'm stuck with DSA and all the other switchdev drivers having this
forest of corner cases to deal with. At least I can add a comment so I'm
not tempted to delete it next time.

2021-02-10 01:21:55

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > So switchdev drivers operating in standalone mode should disable address
> > > learning. As a matter of practicality, we can reduce code duplication in
> > > drivers by having the bridge notify through switchdev of the initial and
> > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > address learning (similar to how they already start up hardcoded for no
> > > forwarding), then they only need to listen for
> > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > need for special cases when the port joins or leaves the bridge etc.
> >
> > How are you handling the case where a port leaves a LAG that is linked
> > to a bridge? In this case the port becomes a standalone port, but will
> > not get this notification.
>
> Apparently the answer to that question is "I delete the code that makes
> this use case work", how smart of me. Thanks.

Not sure how you expect to interpret this.

>
> Unless you have any idea how I could move the logic into the bridge, I
> guess I'm stuck with DSA and all the other switchdev drivers having this
> forest of corner cases to deal with. At least I can add a comment so I'm
> not tempted to delete it next time.

There are too many moving pieces with stacked devices. It is not only
LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
to gracefully / explicitly not handle a case rather than pretending to
handle it correctly with complex / buggy code.

For example, you should refuse to be enslaved to a LAG that already has
upper devices such as a bridge. You are probably not handling this
correctly / at all. This is easy. Just a call to
netdev_has_any_upper_dev().

The reverse, during unlinking, would be to refuse unlinking if the upper
has uppers of its own. netdev_upper_dev_unlink() needs to learn to
return an error and callers such as team/bond need to learn to handle
it, but it seems patchable.

2021-02-10 01:26:23

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Wed, Feb 10, 2021 at 12:01:24AM +0200, Ido Schimmel wrote:
> On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > > So switchdev drivers operating in standalone mode should disable address
> > > > learning. As a matter of practicality, we can reduce code duplication in
> > > > drivers by having the bridge notify through switchdev of the initial and
> > > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > > address learning (similar to how they already start up hardcoded for no
> > > > forwarding), then they only need to listen for
> > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > > need for special cases when the port joins or leaves the bridge etc.
> > >
> > > How are you handling the case where a port leaves a LAG that is linked
> > > to a bridge? In this case the port becomes a standalone port, but will
> > > not get this notification.
> >
> > Apparently the answer to that question is "I delete the code that makes
> > this use case work", how smart of me. Thanks.
>
> Not sure how you expect to interpret this.

Next patch (05/11) deletes that explicit notification from dsa_port_bridge_leave,
function which is called from dsa_port_lag_leave too, apparently with good reason.

> > Unless you have any idea how I could move the logic into the bridge, I
> > guess I'm stuck with DSA and all the other switchdev drivers having this
> > forest of corner cases to deal with. At least I can add a comment so I'm
> > not tempted to delete it next time.
>
> There are too many moving pieces with stacked devices. It is not only
> LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
> to gracefully / explicitly not handle a case rather than pretending to
> handle it correctly with complex / buggy code.
>
> For example, you should refuse to be enslaved to a LAG that already has
> upper devices such as a bridge. You are probably not handling this
> correctly / at all. This is easy. Just a call to
> netdev_has_any_upper_dev().

Correct, good point, in particular this means that joining a bridged LAG
will not get me any notifications of that LAG's CHANGEUPPER because that
was consumed a long time ago. An equally valid approach seems to be to
check for netdev_master_upper_dev_get_rcu in dsa_port_lag_join, and call
dsa_port_bridge_join on the upper if that is present.

> The reverse, during unlinking, would be to refuse unlinking if the upper
> has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> return an error and callers such as team/bond need to learn to handle
> it, but it seems patchable.

Again, this was treated prior to my deletion in this series and not by
erroring out, I just really didn't think it through.

So you're saying that if we impose that all switchdev drivers restrict
the house of cards to be constructed from the bottom up, and destructed
from the top down, then the notification of bridge port flags can stay
in the bridge layer?

2021-02-10 05:37:57

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 03/11] net: bridge: don't print in br_switchdev_set_port_flag

From: Vladimir Oltean <[email protected]>

Currently br_switchdev_set_port_flag has two options for error handling
and neither is good:
- The driver returns -EOPNOTSUPP in PRE_BRIDGE_FLAGS if it doesn't
support offloading that flag, and this gets silently ignored and
converted to an errno of 0. Nobody does this.
- The driver returns some other error code, like -EINVAL, in
PRE_BRIDGE_FLAGS, and br_switchdev_set_port_flag shouts loudly.

The problem is that we'd like to offload some port flags during bridge
join and leave, but also not have the bridge shout at us if those fail.
But on the other hand we'd like the user to know that we can't offload
something when they set that through netlink. And since we can't have
the driver return -EOPNOTSUPP or -EINVAL depending on whether it's
called by the user or internally by the bridge, let's just add an extack
argument to br_switchdev_set_port_flag and propagate it to its callers.
Then, when we need offloading to really fail silently, this can simply
be passed a NULL argument.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
- br_set_port_flag now returns void, so no extack there.
- don't overwrite extack in br_switchdev_set_port_flag if already
populated.

net/bridge/br_netlink.c | 9 +++++----
net/bridge/br_private.h | 6 ++++--
net/bridge/br_switchdev.c | 13 +++++++------
3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index 2c110bcbc6d0..3ba23fb95370 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -866,7 +866,8 @@ static void br_set_port_flag(struct net_bridge_port *p, struct nlattr *tb[],
}

/* Process bridge protocol info on port */
-static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])
+static int br_setport(struct net_bridge_port *p, struct nlattr *tb[],
+ struct netlink_ext_ack *extack)
{
unsigned long old_flags, changed_mask;
bool br_vlan_tunnel_old;
@@ -898,7 +899,7 @@ static int br_setport(struct net_bridge_port *p, struct nlattr *tb[])

spin_unlock_bh(&p->br->lock);

- err = br_switchdev_set_port_flag(p, p->flags, changed_mask);
+ err = br_switchdev_set_port_flag(p, p->flags, changed_mask, extack);
if (err) {
spin_lock_bh(&p->br->lock);
p->flags = old_flags;
@@ -1021,7 +1022,7 @@ int br_setlink(struct net_device *dev, struct nlmsghdr *nlh, u16 flags,
if (err)
return err;

- err = br_setport(p, tb);
+ err = br_setport(p, tb, extack);
} else {
/* Binary compatibility with old RSTP */
if (nla_len(protinfo) < sizeof(u8))
@@ -1111,7 +1112,7 @@ static int br_port_slave_changelink(struct net_device *brdev,
if (!data)
return 0;

- return br_setport(br_port_get_rtnl(dev), data);
+ return br_setport(br_port_get_rtnl(dev), data, extack);
}

static int br_port_fill_slave_info(struct sk_buff *skb,
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index d242ba668e47..a1639d41188b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1575,7 +1575,8 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
const struct sk_buff *skb);
int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
- unsigned long mask);
+ unsigned long mask,
+ struct netlink_ext_ack *extack);
void br_switchdev_fdb_notify(const struct net_bridge_fdb_entry *fdb,
int type);
int br_switchdev_port_vlan_add(struct net_device *dev, u16 vid, u16 flags,
@@ -1605,7 +1606,8 @@ static inline bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,

static inline int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
- unsigned long mask)
+ unsigned long mask,
+ struct netlink_ext_ack *extack)
{
return 0;
}
diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index c004ade25ac0..ac8dead86bf2 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -60,7 +60,8 @@ bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,

int br_switchdev_set_port_flag(struct net_bridge_port *p,
unsigned long flags,
- unsigned long mask)
+ unsigned long mask,
+ struct netlink_ext_ack *extack)
{
struct switchdev_attr attr = {
.orig_dev = p->dev,
@@ -80,14 +81,15 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,

/* We run from atomic context here */
err = call_switchdev_notifiers(SWITCHDEV_PORT_ATTR_SET, p->dev,
- &info.info, NULL);
+ &info.info, extack);
err = notifier_to_errno(err);
if (err == -EOPNOTSUPP)
return 0;

if (err) {
- br_warn(p->br, "bridge flag offload is not supported %u(%s)\n",
- (unsigned int)p->port_no, p->dev->name);
+ if (extack && !extack->_msg)
+ NL_SET_ERR_MSG_MOD(extack,
+ "bridge flag offload is not supported");
return -EOPNOTSUPP;
}

@@ -97,8 +99,7 @@ int br_switchdev_set_port_flag(struct net_bridge_port *p,

err = switchdev_port_attr_set(p->dev, &attr);
if (err) {
- br_warn(p->br, "error setting offload flag on port %u(%s)\n",
- (unsigned int)p->port_no, p->dev->name);
+ NL_SET_ERR_MSG_MOD(extack, "error setting offload flag on port");
return err;
}

--
2.25.1

2021-02-10 05:40:31

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 07/11] net: dsa: kill .port_egress_floods overengineering

From: Vladimir Oltean <[email protected]>

The bridge offloads the port flags through a single bit mask using
switchdev, which among others, contains learning and flooding settings.

The commit 57652796aa97 ("net: dsa: add support for bridge flags")
missed one crucial aspect of the SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS API
when designing the API one level lower, towards the drivers.
This is that the bitmask of passed brport flags never has more than one
bit set at a time. On the other hand, the prototype passed to the driver
is .port_egress_floods(int port, bool unicast, bool multicast), which
configures two flags at a time.

DSA currently checks if .port_egress_floods is implemented, and if it
is, reports both BR_FLOOD and BR_MCAST_FLOOD as supported. So the driver
has no choice if it wants to inform the bridge that, for example, it
can't configure unicast flooding independently of multicast flooding -
the DSA mid layer is standing in the way. Or the other way around: a new
driver wants to start configuring BR_BCAST_FLOOD separately, but what do
we do with the rest, which only support unicast and multicast flooding?
Do we report broadcast flooding configuration as supported for those
too, and silently do nothing?

Secondly, currently DSA deems the driver too dumb to deserve knowing that
a SWITCHDEV_ATTR_ID_BRIDGE_MROUTER attribute was offloaded, because it
just calls .port_egress_floods for the CPU port. When we'll add support
for the plain SWITCHDEV_ATTR_ID_PORT_MROUTER, that will become a real
problem because the flood settings will need to be held statefully in
the DSA middle layer, otherwise changing the mrouter port attribute will
impact the flooding attribute. And that's _assuming_ that the underlying
hardware doesn't have anything else to do when a multicast router
attaches to a port than flood unknown traffic to it. If it does, there
will need to be a dedicated .port_set_mrouter anyway.

Lastly, we have DSA drivers that have a backlink into a pure switchdev
driver (felix -> ocelot). It seems reasonable that the other switchdev
drivers should not have to suffer from the oddities of DSA overengineering,
so keeping DSA a pass-through layer makes more sense there.

To simplify the brport flags situation we just delete .port_egress_floods
and we introduce a simple .port_bridge_flags which is passed to the
driver. Also, the logic from dsa_port_mrouter is removed and a
.port_set_mrouter is created.

Functionally speaking, we simply move the calls to .port_egress_floods
one step lower, in the two drivers that implement it: mv88e6xxx and b53,
so things should work just as before.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
- Reordered with previous patch such that we don't need to introduce
.port_pre_bridge_flags
- Pass extack to drivers.

drivers/net/dsa/b53/b53_common.c | 20 +++++++++++++++++++-
drivers/net/dsa/mv88e6xxx/chip.c | 21 ++++++++++++++++++++-
include/net/dsa.h | 7 +++++--
net/dsa/dsa_priv.h | 6 ++++--
net/dsa/port.c | 18 ++++++++----------
net/dsa/slave.c | 4 ++--
6 files changed, 58 insertions(+), 18 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 23fc7225c8d1..d480493cb64d 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1948,6 +1948,23 @@ int b53_br_egress_floods(struct dsa_switch *ds, int port,
}
EXPORT_SYMBOL(b53_br_egress_floods);

+static int b53_br_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
+ return b53_br_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
+}
+
+static int b53_set_mrouter(struct dsa_switch *ds, int port, bool mrouter,
+ struct netlink_ext_ack *extack)
+{
+ return b53_br_egress_floods(ds, port, true, mrouter);
+}
+
static bool b53_possible_cpu_port(struct dsa_switch *ds, int port)
{
/* Broadcom switches will accept enabling Broadcom tags on the
@@ -2187,9 +2204,10 @@ static const struct dsa_switch_ops b53_switch_ops = {
.set_mac_eee = b53_set_mac_eee,
.port_bridge_join = b53_br_join,
.port_bridge_leave = b53_br_leave,
+ .port_bridge_flags = b53_br_flags,
+ .port_set_mrouter = b53_set_mrouter,
.port_stp_state_set = b53_br_set_stp_state,
.port_fast_age = b53_br_fast_age,
- .port_egress_floods = b53_br_egress_floods,
.port_vlan_filtering = b53_vlan_filtering,
.port_vlan_add = b53_vlan_add,
.port_vlan_del = b53_vlan_del,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index ae0b490f00cd..b230bfcc4050 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -5380,6 +5380,24 @@ static int mv88e6xxx_port_egress_floods(struct dsa_switch *ds, int port,
return err;
}

+static int mv88e6xxx_port_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD))
+ return -EINVAL;
+
+ return mv88e6xxx_port_egress_floods(ds, port, flags.val & BR_FLOOD,
+ flags.val & BR_MCAST_FLOOD);
+}
+
+static int mv88e6xxx_port_set_mrouter(struct dsa_switch *ds, int port,
+ bool mrouter,
+ struct netlink_ext_ack *extack)
+{
+ return mv88e6xxx_port_egress_floods(ds, port, true, mrouter);
+}
+
static bool mv88e6xxx_lag_can_offload(struct dsa_switch *ds,
struct net_device *lag,
struct netdev_lag_upper_info *info)
@@ -5678,7 +5696,8 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.set_ageing_time = mv88e6xxx_set_ageing_time,
.port_bridge_join = mv88e6xxx_port_bridge_join,
.port_bridge_leave = mv88e6xxx_port_bridge_leave,
- .port_egress_floods = mv88e6xxx_port_egress_floods,
+ .port_bridge_flags = mv88e6xxx_port_bridge_flags,
+ .port_set_mrouter = mv88e6xxx_port_set_mrouter,
.port_stp_state_set = mv88e6xxx_port_stp_state_set,
.port_fast_age = mv88e6xxx_port_fast_age,
.port_vlan_filtering = mv88e6xxx_port_vlan_filtering,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 60acb9fca124..09aa28e667c7 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -621,8 +621,11 @@ struct dsa_switch_ops {
void (*port_stp_state_set)(struct dsa_switch *ds, int port,
u8 state);
void (*port_fast_age)(struct dsa_switch *ds, int port);
- int (*port_egress_floods)(struct dsa_switch *ds, int port,
- bool unicast, bool multicast);
+ int (*port_bridge_flags)(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack);
+ int (*port_set_mrouter)(struct dsa_switch *ds, int port, bool mrouter,
+ struct netlink_ext_ack *extack);

/*
* VLAN support
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 63770e421e4d..8125806ee135 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -175,8 +175,10 @@ int dsa_port_mdb_add(const struct dsa_port *dp,
int dsa_port_mdb_del(const struct dsa_port *dp,
const struct switchdev_obj_port_mdb *mdb);
int dsa_port_bridge_flags(const struct dsa_port *dp,
- struct switchdev_brport_flags flags);
-int dsa_port_mrouter(struct dsa_port *dp, bool mrouter);
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack);
+int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
+ struct netlink_ext_ack *extack);
int dsa_port_vlan_add(struct dsa_port *dp,
const struct switchdev_obj_port_vlan *vlan);
int dsa_port_vlan_del(struct dsa_port *dp,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 597d3d3eb507..be5b2244667b 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -383,28 +383,26 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock)
}

int dsa_port_bridge_flags(const struct dsa_port *dp,
- struct switchdev_brport_flags flags)
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
{
struct dsa_switch *ds = dp->ds;
- int port = dp->index;

- if (!ds->ops->port_egress_floods ||
- (flags.mask & ~(BR_FLOOD | BR_MCAST_FLOOD)))
+ if (!ds->ops->port_bridge_flags)
return -EINVAL;

- return ds->ops->port_egress_floods(ds, port, flags.val & BR_FLOOD,
- flags.val & BR_MCAST_FLOOD);
+ return ds->ops->port_bridge_flags(ds, dp->index, flags, extack);
}

-int dsa_port_mrouter(struct dsa_port *dp, bool mrouter)
+int dsa_port_mrouter(struct dsa_port *dp, bool mrouter,
+ struct netlink_ext_ack *extack)
{
struct dsa_switch *ds = dp->ds;
- int port = dp->index;

- if (!ds->ops->port_egress_floods)
+ if (!ds->ops->port_set_mrouter)
return -EOPNOTSUPP;

- return ds->ops->port_egress_floods(ds, port, true, mrouter);
+ return ds->ops->port_set_mrouter(ds, dp->index, mrouter, extack);
}

int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0e1f8f1d4e2c..4a979245e059 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -291,10 +291,10 @@ static int dsa_slave_port_attr_set(struct net_device *dev,
ret = dsa_port_ageing_time(dp, attr->u.ageing_time);
break;
case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
- ret = dsa_port_bridge_flags(dp, attr->u.brport_flags);
+ ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, extack);
break;
case SWITCHDEV_ATTR_ID_BRIDGE_MROUTER:
- ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter);
+ ret = dsa_port_mrouter(dp->cpu_dp, attr->u.mrouter, extack);
break;
default:
ret = -EOPNOTSUPP;
--
2.25.1

2021-02-10 05:41:03

by Vladimir Oltean

[permalink] [raw]
Subject: [PATCH v2 net-next 09/11] net: mscc: ocelot: use separate flooding PGID for broadcast

From: Vladimir Oltean <[email protected]>

In preparation of offloading the bridge port flags which have
independent settings for unknown multicast and for broadcast, we should
also start reserving one destination Port Group ID for the flooding of
broadcast packets, to allow configuring it individually.

Signed-off-by: Vladimir Oltean <[email protected]>
---
Changes in v2:
None.

drivers/net/ethernet/mscc/ocelot.c | 13 ++++++++-----
include/soc/mscc/ocelot.h | 15 ++++++++-------
2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index f8b85ab8be5d..8c1052346b58 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1662,7 +1662,7 @@ int ocelot_init(struct ocelot *ocelot)
/* Setup flooding PGIDs */
for (i = 0; i < ocelot->num_flooding_pgids; i++)
ocelot_write_rix(ocelot, ANA_FLOODING_FLD_MULTICAST(PGID_MC) |
- ANA_FLOODING_FLD_BROADCAST(PGID_MC) |
+ ANA_FLOODING_FLD_BROADCAST(PGID_BC) |
ANA_FLOODING_FLD_UNICAST(PGID_UC),
ANA_FLOODING, i);
ocelot_write(ocelot, ANA_FLOODING_IPMC_FLD_MC6_DATA(PGID_MCIPV6) |
@@ -1683,15 +1683,18 @@ int ocelot_init(struct ocelot *ocelot)
ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_SRC + port);
}

- /* Allow broadcast MAC frames. */
for_each_nonreserved_multicast_dest_pgid(ocelot, i) {
u32 val = ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports - 1, 0));

ocelot_write_rix(ocelot, val, ANA_PGID_PGID, i);
}
- ocelot_write_rix(ocelot,
- ANA_PGID_PGID_PGID(GENMASK(ocelot->num_phys_ports, 0)),
- ANA_PGID_PGID, PGID_MC);
+ /* Allow broadcast and unknown L2 multicast to the CPU. */
+ ocelot_rmw_rix(ocelot, ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID, PGID_MC);
+ ocelot_rmw_rix(ocelot, ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID_PGID(BIT(ocelot->num_phys_ports)),
+ ANA_PGID_PGID, PGID_BC);
ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_MCIPV4);
ocelot_write_rix(ocelot, 0, ANA_PGID_PGID, PGID_MCIPV6);

diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index d0d48e9620fb..7ee85527cb5f 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -54,16 +54,17 @@
* PGID_CPU: used for whitelisting certain MAC addresses, such as the addresses
* of the switch port net devices, towards the CPU port module.
* PGID_UC: the flooding destinations for unknown unicast traffic.
- * PGID_MC: the flooding destinations for broadcast and non-IP multicast
- * traffic.
+ * PGID_MC: the flooding destinations for non-IP multicast traffic.
* PGID_MCIPV4: the flooding destinations for IPv4 multicast traffic.
* PGID_MCIPV6: the flooding destinations for IPv6 multicast traffic.
+ * PGID_BC: the flooding destinations for broadcast traffic.
*/
-#define PGID_CPU 59
-#define PGID_UC 60
-#define PGID_MC 61
-#define PGID_MCIPV4 62
-#define PGID_MCIPV6 63
+#define PGID_CPU 58
+#define PGID_UC 59
+#define PGID_MC 60
+#define PGID_MCIPV4 61
+#define PGID_MCIPV6 62
+#define PGID_BC 63

#define for_each_unicast_dest_pgid(ocelot, pgid) \
for ((pgid) = 0; \
--
2.25.1

2021-02-10 07:47:16

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 03/11] net: bridge: don't print in br_switchdev_set_port_flag

On Tue, Feb 09, 2021 at 05:19:28PM +0200, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> Currently br_switchdev_set_port_flag has two options for error handling
> and neither is good:
> - The driver returns -EOPNOTSUPP in PRE_BRIDGE_FLAGS if it doesn't
> support offloading that flag, and this gets silently ignored and
> converted to an errno of 0. Nobody does this.
> - The driver returns some other error code, like -EINVAL, in
> PRE_BRIDGE_FLAGS, and br_switchdev_set_port_flag shouts loudly.
>
> The problem is that we'd like to offload some port flags during bridge
> join and leave, but also not have the bridge shout at us if those fail.
> But on the other hand we'd like the user to know that we can't offload
> something when they set that through netlink. And since we can't have
> the driver return -EOPNOTSUPP or -EINVAL depending on whether it's
> called by the user or internally by the bridge, let's just add an extack
> argument to br_switchdev_set_port_flag and propagate it to its callers.
> Then, when we need offloading to really fail silently, this can simply
> be passed a NULL argument.
>
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---

The build fails because since I started working on v2 and until I sent
it, Jakub merged net into net-next which contained this fix:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
for which I couldn't change prototype due to it missing in net-next.
I think I would like to rather wait to gather some feedback first before
respinning v3, if possible.

2021-02-10 08:05:18

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 02/11] net: bridge: offload all port flags at once in br_setport

On Tue, Feb 09, 2021 at 08:27:24PM +0200, Vladimir Oltean wrote:
> But there's an interesting side effect of allowing
> br_switchdev_set_port_flag to run concurrently (notifier call chains use
> a rw_semaphore and only take the read side). Basically now drivers that
> cache the brport flags in their entirety are broken, because there isn't
> any guarantee that bits outside the mask are valid any longer (we can
> even enforce that by masking the flags with the mask when notifying
> them). They would need to do the same trick of updating just the masked
> part of their cached flags. Except for the fact that they would need
> some sort of spinlock too, I don't think that the basic bitwise
> operations are atomic or anything like that. I'm a bit reluctant to add
> a spinlock in prestera, rocker, mlxsw just for this purpose. What do you
> think?

My take on things is that I can change those drivers to do what ocelot
and sja1105 do, which is to just have some bool values like this:

if (flags.mask & BR_LEARNING)
ocelot_port->learn_ena = !!(flags.val & BR_LEARNING);

which eliminates concurrency to the shared unsigned long brport_flags
variable. No locking, no complications.

2021-02-10 11:06:11

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Wed, Feb 10, 2021 at 12:51:53AM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:01:24AM +0200, Ido Schimmel wrote:
> > On Tue, Feb 09, 2021 at 10:20:45PM +0200, Vladimir Oltean wrote:
> > > On Tue, Feb 09, 2021 at 08:51:00PM +0200, Ido Schimmel wrote:
> > > > On Tue, Feb 09, 2021 at 05:19:29PM +0200, Vladimir Oltean wrote:
> > > > > So switchdev drivers operating in standalone mode should disable address
> > > > > learning. As a matter of practicality, we can reduce code duplication in
> > > > > drivers by having the bridge notify through switchdev of the initial and
> > > > > final brport flags. Then, drivers can simply start up hardcoded for no
> > > > > address learning (similar to how they already start up hardcoded for no
> > > > > forwarding), then they only need to listen for
> > > > > SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS and their job is basically done, no
> > > > > need for special cases when the port joins or leaves the bridge etc.
> > > >
> > > > How are you handling the case where a port leaves a LAG that is linked
> > > > to a bridge? In this case the port becomes a standalone port, but will
> > > > not get this notification.
> > >
> > > Apparently the answer to that question is "I delete the code that makes
> > > this use case work", how smart of me. Thanks.
> >
> > Not sure how you expect to interpret this.
>
> Next patch (05/11) deletes that explicit notification from dsa_port_bridge_leave,
> function which is called from dsa_port_lag_leave too, apparently with good reason.
>
> > > Unless you have any idea how I could move the logic into the bridge, I
> > > guess I'm stuck with DSA and all the other switchdev drivers having this
> > > forest of corner cases to deal with. At least I can add a comment so I'm
> > > not tempted to delete it next time.
> >
> > There are too many moving pieces with stacked devices. It is not only
> > LAG/bridge. In L3 you have VRFs, SVIs, macvlans etc. It might be better
> > to gracefully / explicitly not handle a case rather than pretending to
> > handle it correctly with complex / buggy code.
> >
> > For example, you should refuse to be enslaved to a LAG that already has
> > upper devices such as a bridge. You are probably not handling this
> > correctly / at all. This is easy. Just a call to
> > netdev_has_any_upper_dev().
>
> Correct, good point, in particular this means that joining a bridged LAG
> will not get me any notifications of that LAG's CHANGEUPPER because that
> was consumed a long time ago. An equally valid approach seems to be to
> check for netdev_master_upper_dev_get_rcu in dsa_port_lag_join, and call
> dsa_port_bridge_join on the upper if that is present.

The bridge might already have a state you are not familiar with (e.g.,
FDB entry pointing to the LAG), so best to just forbid this. I think
it's fair to impose such limitations (assuming they are properly
communicated to user space) given it results in a much less
buggy/complex code to maintain.

>
> > The reverse, during unlinking, would be to refuse unlinking if the upper
> > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > return an error and callers such as team/bond need to learn to handle
> > it, but it seems patchable.
>
> Again, this was treated prior to my deletion in this series and not by
> erroring out, I just really didn't think it through.
>
> So you're saying that if we impose that all switchdev drivers restrict
> the house of cards to be constructed from the bottom up, and destructed
> from the top down, then the notification of bridge port flags can stay
> in the bridge layer?

I actually don't think it's a good idea to have this in the bridge in
any case. I understand that it makes sense for some devices where
learning, flooding, etc are port attributes, but in other devices these
can be {port,vlan} attributes and then you need to take care of them
when a vlan is added / deleted and not only when a port is removed from
the bridge. So for such devices this really won't save anything. I would
thus leave it to the lower levels to decide.

2021-02-10 23:27:32

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > return an error and callers such as team/bond need to learn to handle
> > > it, but it seems patchable.
> >
> > Again, this was treated prior to my deletion in this series and not by
> > erroring out, I just really didn't think it through.
> >
> > So you're saying that if we impose that all switchdev drivers restrict
> > the house of cards to be constructed from the bottom up, and destructed
> > from the top down, then the notification of bridge port flags can stay
> > in the bridge layer?
>
> I actually don't think it's a good idea to have this in the bridge in
> any case. I understand that it makes sense for some devices where
> learning, flooding, etc are port attributes, but in other devices these
> can be {port,vlan} attributes and then you need to take care of them
> when a vlan is added / deleted and not only when a port is removed from
> the bridge. So for such devices this really won't save anything. I would
> thus leave it to the lower levels to decide.

Just for my understanding, how are per-{port,vlan} attributes such as
learning and flooding managed by the Linux bridge? How can I disable
flooding only in a certain VLAN?

2021-02-11 08:01:17

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > return an error and callers such as team/bond need to learn to handle
> > > > it, but it seems patchable.
> > >
> > > Again, this was treated prior to my deletion in this series and not by
> > > erroring out, I just really didn't think it through.
> > >
> > > So you're saying that if we impose that all switchdev drivers restrict
> > > the house of cards to be constructed from the bottom up, and destructed
> > > from the top down, then the notification of bridge port flags can stay
> > > in the bridge layer?
> >
> > I actually don't think it's a good idea to have this in the bridge in
> > any case. I understand that it makes sense for some devices where
> > learning, flooding, etc are port attributes, but in other devices these
> > can be {port,vlan} attributes and then you need to take care of them
> > when a vlan is added / deleted and not only when a port is removed from
> > the bridge. So for such devices this really won't save anything. I would
> > thus leave it to the lower levels to decide.
>
> Just for my understanding, how are per-{port,vlan} attributes such as
> learning and flooding managed by the Linux bridge? How can I disable
> flooding only in a certain VLAN?

You can't (currently). But it does not change the fact that in some
devices these are {port,vlan} attributes and we are talking here about
the interface towards these devices. Having these as {port,vlan}
attributes allows you to support use cases such as a port being enslaved
to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
bridge(s). Obviously you need to ensure there is no conflict between the
VLANs used by the VLAN-aware bridge and the VLAN device(s).

2021-02-11 09:41:55

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Thu, Feb 11, 2021 at 09:44:43AM +0200, Ido Schimmel wrote:
> On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > > return an error and callers such as team/bond need to learn to handle
> > > > > it, but it seems patchable.
> > > >
> > > > Again, this was treated prior to my deletion in this series and not by
> > > > erroring out, I just really didn't think it through.
> > > >
> > > > So you're saying that if we impose that all switchdev drivers restrict
> > > > the house of cards to be constructed from the bottom up, and destructed
> > > > from the top down, then the notification of bridge port flags can stay
> > > > in the bridge layer?
> > >
> > > I actually don't think it's a good idea to have this in the bridge in
> > > any case. I understand that it makes sense for some devices where
> > > learning, flooding, etc are port attributes, but in other devices these
> > > can be {port,vlan} attributes and then you need to take care of them
> > > when a vlan is added / deleted and not only when a port is removed from
> > > the bridge. So for such devices this really won't save anything. I would
> > > thus leave it to the lower levels to decide.
> >
> > Just for my understanding, how are per-{port,vlan} attributes such as
> > learning and flooding managed by the Linux bridge? How can I disable
> > flooding only in a certain VLAN?
>
> You can't (currently). But it does not change the fact that in some
> devices these are {port,vlan} attributes and we are talking here about
> the interface towards these devices. Having these as {port,vlan}
> attributes allows you to support use cases such as a port being enslaved
> to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
> bridge(s).

I don't think I understand the use case really. You mean something like this?

br1 (vlan_filtering=0)
/ \
/ \
swp0.100 \
| \
|(vlan_filtering \
| br0 =1) \
| / \ \
|/ \ \
swp0 swp1 swp2

A packet received on swp0 with VLAN tag 100 will go to swp0.100 which
will be forwarded according to the FDB of br1, and will be delivered to
swp2 as untagged? Respectively in the other direction, a packet received
on swp2 will have a VLAN 100 tag pushed on egress towards swp0, even if
it is already VLAN-tagged?

What do you even use this for?
And also: if the {port,vlan} attributes can be simulated by making the
bridge port be an 8021q upper of a physical interface, then as far as
the bridge is concerned, they still are per-port attributes, and they
are per-{port,vlan} only as far as the switch driver is concerned -
therefore I don't see why it isn't okay for the bridge to notify the
brport flags in exactly the same way for them too.

> Obviously you need to ensure there is no conflict between the
> VLANs used by the VLAN-aware bridge and the VLAN device(s).

On the other hand I think I have a more real-life use case that I think
is in conflict with this last phrase.
I have a VLAN-aware bridge and I want to run PTP in VLAN 7, but I also
need to add VLAN 7 in the VLAN table of the bridge ports so that it
doesn't drop traffic. PTP is link-local, so I need to run it on VLAN
uppers of the switch ports. Like this:

ip link add br0 type bridge vlan_filtering 1
ip link set swp0 master br0
ip link set swp1 master br0
bridge vlan add dev swp0 vid 7 master
bridge vlan add dev swp1 vid 7 master
bridge vlan add dev br0 vid 7 self
ip link add link swp0 name swp0.7 type vlan id 7
ip link add link swp1 name swp0.7 type vlan id 7
ptp4l -i swp0.7 -i swp1.7 -m

How can I do that considering that you recommend avoiding conflicts
between the VLAN-aware bridge and 8021q uppers? Or is that true only
when the 8021q uppers are bridged?

2021-02-11 22:25:05

by Ido Schimmel

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/11] net: bridge: offload initial and final port flags through switchdev

On Thu, Feb 11, 2021 at 11:35:27AM +0200, Vladimir Oltean wrote:
> On Thu, Feb 11, 2021 at 09:44:43AM +0200, Ido Schimmel wrote:
> > On Thu, Feb 11, 2021 at 01:23:52AM +0200, Vladimir Oltean wrote:
> > > On Wed, Feb 10, 2021 at 12:59:49PM +0200, Ido Schimmel wrote:
> > > > > > The reverse, during unlinking, would be to refuse unlinking if the upper
> > > > > > has uppers of its own. netdev_upper_dev_unlink() needs to learn to
> > > > > > return an error and callers such as team/bond need to learn to handle
> > > > > > it, but it seems patchable.
> > > > >
> > > > > Again, this was treated prior to my deletion in this series and not by
> > > > > erroring out, I just really didn't think it through.
> > > > >
> > > > > So you're saying that if we impose that all switchdev drivers restrict
> > > > > the house of cards to be constructed from the bottom up, and destructed
> > > > > from the top down, then the notification of bridge port flags can stay
> > > > > in the bridge layer?
> > > >
> > > > I actually don't think it's a good idea to have this in the bridge in
> > > > any case. I understand that it makes sense for some devices where
> > > > learning, flooding, etc are port attributes, but in other devices these
> > > > can be {port,vlan} attributes and then you need to take care of them
> > > > when a vlan is added / deleted and not only when a port is removed from
> > > > the bridge. So for such devices this really won't save anything. I would
> > > > thus leave it to the lower levels to decide.
> > >
> > > Just for my understanding, how are per-{port,vlan} attributes such as
> > > learning and flooding managed by the Linux bridge? How can I disable
> > > flooding only in a certain VLAN?
> >
> > You can't (currently). But it does not change the fact that in some
> > devices these are {port,vlan} attributes and we are talking here about
> > the interface towards these devices. Having these as {port,vlan}
> > attributes allows you to support use cases such as a port being enslaved
> > to a VLAN-aware bridge and its VLAN upper(s) enslaved to VLAN unaware
> > bridge(s).
>
> I don't think I understand the use case really. You mean something like this?
>
> br1 (vlan_filtering=0)
> / \
> / \
> swp0.100 \
> | \
> |(vlan_filtering \
> | br0 =1) \
> | / \ \
> |/ \ \
> swp0 swp1 swp2
>
> A packet received on swp0 with VLAN tag 100 will go to swp0.100 which
> will be forwarded according to the FDB of br1, and will be delivered to
> swp2 as untagged? Respectively in the other direction, a packet received
> on swp2 will have a VLAN 100 tag pushed on egress towards swp0, even if
> it is already VLAN-tagged?
>
> What do you even use this for?

The more common use case is to have multiple VLAN-unaware bridges
instead of one VLAN-aware bridge. I'm not aware of users that use the
hybrid model (VLAN-aware + VLAN-unaware). But regardless, this entails
treating above mentioned attributes as {port,vlan} attributes. A device
that only supports them as port attributes will have problems supporting
such a model.

> And also: if the {port,vlan} attributes can be simulated by making the
> bridge port be an 8021q upper of a physical interface, then as far as
> the bridge is concerned, they still are per-port attributes, and they
> are per-{port,vlan} only as far as the switch driver is concerned -
> therefore I don't see why it isn't okay for the bridge to notify the
> brport flags in exactly the same way for them too.

Look at this hunk from the patch:

@@ -343,6 +360,8 @@ static void del_nbp(struct net_bridge_port *p)
update_headroom(br, get_max_headroom(br));
netdev_reset_rx_headroom(dev);

+ nbp_flags_notify(p, BR_PORT_DEFAULT_FLAGS & ~BR_LEARNING,
+ BR_PORT_DEFAULT_FLAGS);
nbp_vlan_flush(p);
br_fdb_delete_by_port(br, p, 0, 1);
switchdev_deferred_process();

Devices that treat these attributes as {port,vlan} attributes will undo
this change upon the call to nbp_vlan_flush() when all the VLANs are
flushed.

>
> > Obviously you need to ensure there is no conflict between the
> > VLANs used by the VLAN-aware bridge and the VLAN device(s).
>
> On the other hand I think I have a more real-life use case that I think
> is in conflict with this last phrase.
> I have a VLAN-aware bridge and I want to run PTP in VLAN 7, but I also
> need to add VLAN 7 in the VLAN table of the bridge ports so that it
> doesn't drop traffic. PTP is link-local, so I need to run it on VLAN
> uppers of the switch ports. Like this:
>
> ip link add br0 type bridge vlan_filtering 1
> ip link set swp0 master br0
> ip link set swp1 master br0
> bridge vlan add dev swp0 vid 7 master
> bridge vlan add dev swp1 vid 7 master
> bridge vlan add dev br0 vid 7 self
> ip link add link swp0 name swp0.7 type vlan id 7
> ip link add link swp1 name swp0.7 type vlan id 7
> ptp4l -i swp0.7 -i swp1.7 -m
>
> How can I do that considering that you recommend avoiding conflicts
> between the VLAN-aware bridge and 8021q uppers? Or is that true only
> when the 8021q uppers are bridged?

The problem is with the statement "I also need to add VLAN 7 in the VLAN
table of the bridge ports so that it doesn't drop traffic". Packets with
VLAN 7 received by swp0 will be processed by swp0.7. br0 is irrelevant
and configuring swp0.7 should be enough in order to enable the VLAN
filter for VLAN 7 on swp0. I don't know the internals of the HW you are
working with, but I imagine that you would need to create a HW bridge
between {swp0, VLAN 7} and the CPU port so that all the traffic with
VLAN 7 will be sent / flooded to the CPU.