2022-03-01 10:29:01

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 00/10] net: bridge: Multiple Spanning Trees

The bridge has had per-VLAN STP support for a while now, since:

https://lore.kernel.org/netdev/[email protected]/

The current implementation has some problems:

- The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
is managed independently. This is awkward from an MSTP (802.1Q-2018,
Clause 13.5) point of view, where the model is that multiple VLANs
are grouped into MST instances.

Because of the way that the standard is written, presumably, this is
also reflected in hardware implementations. It is not uncommon for a
switch to support the full 4k range of VIDs, but that the pool of
MST instances is much smaller. Some examples:

Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
Marvell Prestera: 4k VLANs, but only 128 MSTIs
Microchip SparX-5i: 4k VLANs, but only 128 MSTIs

- By default, the feature is enabled, and there is no way to disable
it. This makes it hard to add offloading in a backwards compatible
way, since any underlying switchdevs have no way to refuse the
function if the hardware does not support it

- The port-global STP state has precedence over per-VLAN states. In
MSTP, as far as I understand it, all VLANs will use the common
spanning tree (CST) by default - through traffic engineering you can
then optimize your network to group subsets of VLANs to use
different trees (MSTI). To my understanding, the way this is
typically managed in silicon is roughly:

Incoming packet:
.----.----.--------------.----.-------------
| DA | SA | 802.1Q VID=X | ET | Payload ...
'----'----'--------------'----'-------------
|
'->|\ .----------------------------.
| +--> | VID | Members | ... | MSTI |
PVID -->|/ |-----|---------|-----|------|
| 1 | 0001001 | ... | 0 |
| 2 | 0001010 | ... | 10 |
| 3 | 0001100 | ... | 10 |
'----------------------------'
|
.-----------------------------'
| .------------------------.
'->| MSTI | Fwding | Lrning |
|------|--------|--------|
| 0 | 111110 | 111110 |
| 10 | 110111 | 110111 |
'------------------------'

What this is trying to show is that the STP state (whether MSTP is
used, or ye olde STP) is always accessed via the VLAN table. If STP
is running, all MSTI pointers in that table will reference the same
index in the STP stable - if MSTP is running, some VLANs may point
to other trees (like in this example).

The fact that in the Linux bridge, the global state (think: index 0
in most hardware implementations) is supposed to override the
per-VLAN state, is very awkward to offload. In effect, this means
that when the global state changes to blocking, drivers will have to
iterate over all MSTIs in use, and alter them all to match. This
also means that you have to cache whether the hardware state is
currently tracking the global state or the per-VLAN state. In the
first case, you also have to cache the per-VLAN state so that you
can restore it if the global state transitions back to forwarding.

This series adds a new mst_enable bridge setting (as suggested by Nik)
that can only be changed when no VLANs are configured on the
bridge. Enabling this mode has the following effect:

- The port-global STP state is used to represent the CST (Common
Spanning Tree) (1/10)

- Ingress STP filtering is deferred until the frame's VLAN has been
resolved (1/10)

- The preexisting per-VLAN states can no longer be controlled directly
(1/10). They are instead placed under the MST module's control,
which is managed using a new netlink interface (described in 3/10)

- VLANs can br mapped to MSTIs in an arbitrary M:N fashion, using a
new global VLAN option (2/10)

4-5/10 adds switchdev notifications so that a driver can track VID to
MSTI mappings and MST port states.

An offloading implementation is this provided for mv88e6xxx.

A proposal for the corresponding iproute2 interface is available here:

https://github.com/wkz/iproute2/tree/mst

Tobias Waldekranz (10):
net: bridge: mst: Multiple Spanning Tree (MST) mode
net: bridge: mst: Allow changing a VLAN's MSTI
net: bridge: mst: Support setting and reporting MST port states
net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations
net: bridge: mst: Notify switchdev drivers of MST state changes
net: dsa: Pass VLAN MSTI migration notifications to driver
net: dsa: Pass MST state changes to driver
net: dsa: mv88e6xxx: Disentangle STU from VTU
net: dsa: mv88e6xxx: Export STU as devlink region
net: dsa: mv88e6xxx: MST Offloading

drivers/net/dsa/mv88e6xxx/chip.c | 232 ++++++++++++++
drivers/net/dsa/mv88e6xxx/chip.h | 38 +++
drivers/net/dsa/mv88e6xxx/devlink.c | 94 ++++++
drivers/net/dsa/mv88e6xxx/global1.h | 10 +
drivers/net/dsa/mv88e6xxx/global1_vtu.c | 311 ++++++++++--------
include/net/dsa.h | 5 +
include/net/switchdev.h | 17 +
include/uapi/linux/if_bridge.h | 17 +
include/uapi/linux/if_link.h | 1 +
include/uapi/linux/rtnetlink.h | 5 +
net/bridge/Makefile | 2 +-
net/bridge/br_input.c | 17 +-
net/bridge/br_mst.c | 402 ++++++++++++++++++++++++
net/bridge/br_netlink.c | 17 +-
net/bridge/br_private.h | 31 ++
net/bridge/br_stp.c | 3 +
net/bridge/br_switchdev.c | 57 ++++
net/bridge/br_vlan.c | 20 +-
net/bridge/br_vlan_options.c | 24 +-
net/dsa/dsa_priv.h | 3 +
net/dsa/port.c | 40 +++
net/dsa/slave.c | 12 +
22 files changed, 1216 insertions(+), 142 deletions(-)
create mode 100644 net/bridge/br_mst.c

--
2.25.1


2022-03-01 10:41:03

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 09/10] net: dsa: mv88e6xxx: Export STU as devlink region

Export the raw STU data in a devlink region so that it can be
inspected from userspace and compared to the current bridge
configuration.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.h | 1 +
drivers/net/dsa/mv88e6xxx/devlink.c | 94 +++++++++++++++++++++++++++++
2 files changed, 95 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index be654be69982..6d4daa24d3e5 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -287,6 +287,7 @@ enum mv88e6xxx_region_id {
MV88E6XXX_REGION_GLOBAL2,
MV88E6XXX_REGION_ATU,
MV88E6XXX_REGION_VTU,
+ MV88E6XXX_REGION_STU,
MV88E6XXX_REGION_PVT,

_MV88E6XXX_REGION_MAX,
diff --git a/drivers/net/dsa/mv88e6xxx/devlink.c b/drivers/net/dsa/mv88e6xxx/devlink.c
index 381068395c63..1266eabee086 100644
--- a/drivers/net/dsa/mv88e6xxx/devlink.c
+++ b/drivers/net/dsa/mv88e6xxx/devlink.c
@@ -503,6 +503,85 @@ static int mv88e6xxx_region_vtu_snapshot(struct devlink *dl,
return 0;
}

+/**
+ * struct mv88e6xxx_devlink_stu_entry - Devlink STU entry
+ * @sid: Global1/3: SID, unknown filters and learning.
+ * @vid: Global1/6: Valid bit.
+ * @data: Global1/7-9: Membership data and priority override.
+ * @resvd: Reserved. In case we forgot something.
+ *
+ * The STU entry format varies between chipset generations. Peridot
+ * and Amethyst packs the STU data into Global1/7-8. Older silicon
+ * spreads the information across all three VTU data registers -
+ * inheriting the layout of even older hardware that had no STU at
+ * all. Since this is a low-level debug interface, copy all data
+ * verbatim and defer parsing to the consumer.
+ */
+struct mv88e6xxx_devlink_stu_entry {
+ u16 sid;
+ u16 vid;
+ u16 data[3];
+ u16 resvd;
+};
+
+static int mv88e6xxx_region_stu_snapshot(struct devlink *dl,
+ const struct devlink_region_ops *ops,
+ struct netlink_ext_ack *extack,
+ u8 **data)
+{
+ struct mv88e6xxx_devlink_stu_entry *table, *entry;
+ struct dsa_switch *ds = dsa_devlink_to_ds(dl);
+ struct mv88e6xxx_chip *chip = ds->priv;
+ struct mv88e6xxx_stu_entry stu;
+ int err;
+
+ table = kcalloc(mv88e6xxx_max_sid(chip) + 1,
+ sizeof(struct mv88e6xxx_devlink_stu_entry),
+ GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+
+ entry = table;
+ stu.sid = mv88e6xxx_max_sid(chip);
+ stu.valid = false;
+
+ mv88e6xxx_reg_lock(chip);
+
+ do {
+ err = mv88e6xxx_g1_stu_getnext(chip, &stu);
+ if (err)
+ break;
+
+ if (!stu.valid)
+ break;
+
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6352_G1_VTU_SID,
+ &entry->sid);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_VID,
+ &entry->vid);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_DATA1,
+ &entry->data[0]);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_DATA2,
+ &entry->data[1]);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_DATA3,
+ &entry->data[2]);
+ if (err)
+ break;
+
+ entry++;
+ } while (stu.sid < mv88e6xxx_max_sid(chip));
+
+ mv88e6xxx_reg_unlock(chip);
+
+ if (err) {
+ kfree(table);
+ return err;
+ }
+
+ *data = (u8 *)table;
+ return 0;
+}
+
static int mv88e6xxx_region_pvt_snapshot(struct devlink *dl,
const struct devlink_region_ops *ops,
struct netlink_ext_ack *extack,
@@ -605,6 +684,12 @@ static struct devlink_region_ops mv88e6xxx_region_vtu_ops = {
.destructor = kfree,
};

+static struct devlink_region_ops mv88e6xxx_region_stu_ops = {
+ .name = "stu",
+ .snapshot = mv88e6xxx_region_stu_snapshot,
+ .destructor = kfree,
+};
+
static struct devlink_region_ops mv88e6xxx_region_pvt_ops = {
.name = "pvt",
.snapshot = mv88e6xxx_region_pvt_snapshot,
@@ -640,6 +725,11 @@ static struct mv88e6xxx_region mv88e6xxx_regions[] = {
.ops = &mv88e6xxx_region_vtu_ops
/* calculated at runtime */
},
+ [MV88E6XXX_REGION_STU] = {
+ .ops = &mv88e6xxx_region_stu_ops,
+ .cond = mv88e6xxx_has_stu,
+ /* calculated at runtime */
+ },
[MV88E6XXX_REGION_PVT] = {
.ops = &mv88e6xxx_region_pvt_ops,
.size = MV88E6XXX_MAX_PVT_ENTRIES * sizeof(u16),
@@ -706,6 +796,10 @@ int mv88e6xxx_setup_devlink_regions_global(struct dsa_switch *ds)
size = (mv88e6xxx_max_vid(chip) + 1) *
sizeof(struct mv88e6xxx_devlink_vtu_entry);
break;
+ case MV88E6XXX_REGION_STU:
+ size = (mv88e6xxx_max_sid(chip) + 1) *
+ sizeof(struct mv88e6xxx_devlink_stu_entry);
+ break;
}

region = dsa_devlink_region_create(ds, ops, 1, size);
--
2.25.1

2022-03-01 10:55:46

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 03/10] net: bridge: mst: Support setting and reporting MST port states

Make it possible to change the port state in a given MSTI. This is
done through a new netlink interface, since the MSTIs are objects in
their own right. The proposed iproute2 interface would be:

bridge mst set dev <PORT> msti <MSTI> state <STATE>

Current states in all applicable MSTIs can also be dumped. The
proposed iproute interface looks like this:

$ bridge mst
port msti
vb1 0
state forwarding
100
state disabled
vb2 0
state forwarding
100
state forwarding

The preexisting per-VLAN states are still valid in the MST
mode (although they are read-only), and can be queried as usual if one
is interested in knowing a particular VLAN's state without having to
care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
bound to MSTI 100):

$ bridge -d vlan
port vlan-id
vb1 10
state forwarding mcast_router 1
20
state disabled mcast_router 1
30
state disabled mcast_router 1
40
state forwarding mcast_router 1
vb2 10
state forwarding mcast_router 1
20
state forwarding mcast_router 1
30
state forwarding mcast_router 1
40
state forwarding mcast_router 1

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/uapi/linux/if_bridge.h | 16 +++
include/uapi/linux/rtnetlink.h | 5 +
net/bridge/br_mst.c | 244 +++++++++++++++++++++++++++++++++
net/bridge/br_netlink.c | 3 +
net/bridge/br_private.h | 4 +
5 files changed, 272 insertions(+)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index b68016f625b7..784482527861 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -785,4 +785,20 @@ enum {
__BRIDGE_QUERIER_MAX
};
#define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
+
+enum {
+ BRIDGE_MST_UNSPEC,
+ BRIDGE_MST_ENTRY,
+ __BRIDGE_MST_MAX,
+};
+#define BRIDGE_MST_MAX (__BRIDGE_MST_MAX - 1)
+
+enum {
+ BRIDGE_MST_ENTRY_UNSPEC,
+ BRIDGE_MST_ENTRY_MSTI,
+ BRIDGE_MST_ENTRY_STATE,
+ __BRIDGE_MST_ENTRY_MAX,
+};
+#define BRIDGE_MST_ENTRY_MAX (__BRIDGE_MST_ENTRY_MAX - 1)
+
#endif /* _UAPI_LINUX_IF_BRIDGE_H */
diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
index 0970cb4b1b88..4a48f3ce862c 100644
--- a/include/uapi/linux/rtnetlink.h
+++ b/include/uapi/linux/rtnetlink.h
@@ -192,6 +192,11 @@ enum {
RTM_GETTUNNEL,
#define RTM_GETTUNNEL RTM_GETTUNNEL

+ RTM_GETMST = 124 + 2,
+#define RTM_GETMST RTM_GETMST
+ RTM_SETMST,
+#define RTM_SETMST RTM_SETMST
+
__RTM_MAX,
#define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
};
diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
index f3b8e279b85c..8dea8e7257fd 100644
--- a/net/bridge/br_mst.c
+++ b/net/bridge/br_mst.c
@@ -120,3 +120,247 @@ int br_mst_set_enabled(struct net_bridge *br, unsigned long val)
br_opt_toggle(br, BROPT_MST_ENABLED, !!val);
return 0;
}
+
+static int br_mst_nl_get_one(struct net_bridge_port *p, struct sk_buff *skb,
+ struct netlink_callback *cb)
+{
+ struct net_bridge_vlan_group *vg = nbp_vlan_group(p);
+ int err = 0, idx = 0, s_idx = cb->args[1];
+ struct net_bridge_vlan *v;
+ struct br_port_msg *bpm;
+ struct nlmsghdr *nlh;
+ struct nlattr *nest;
+ unsigned long *seen;
+
+ nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
+ RTM_GETMST, sizeof(*bpm), NLM_F_MULTI);
+ if (!nlh)
+ return -EMSGSIZE;
+
+ bpm = nlmsg_data(nlh);
+ memset(bpm, 0, sizeof(*bpm));
+ bpm->ifindex = p->dev->ifindex;
+
+ seen = bitmap_zalloc(VLAN_N_VID, 0);
+ if (!seen)
+ return -ENOMEM;
+
+ list_for_each_entry(v, &vg->vlan_list, vlist) {
+ if (test_bit(v->brvlan->msti, seen))
+ continue;
+
+ if (idx < s_idx)
+ goto skip;
+
+ nest = nla_nest_start_noflag(skb, BRIDGE_MST_ENTRY);
+ if (!nest ||
+ nla_put_u16(skb, BRIDGE_MST_ENTRY_MSTI, v->brvlan->msti) ||
+ nla_put_u8(skb, BRIDGE_MST_ENTRY_STATE, v->state)) {
+ err = -EMSGSIZE;
+ break;
+ }
+ nla_nest_end(skb, nest);
+
+ set_bit(v->brvlan->msti, seen);
+skip:
+ idx++;
+ }
+
+ kfree(seen);
+ nlmsg_end(skb, nlh);
+ return err;
+}
+
+static struct net_bridge_port *br_mst_nl_get_parse(struct net *net,
+ struct netlink_callback *cb)
+{
+ struct netlink_ext_ack *extack = cb->extack;
+ const struct nlmsghdr *nlh = cb->nlh;
+ struct net_bridge_port *p;
+ struct br_port_msg *bpm;
+ struct net_device *dev;
+
+ if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid header for mst get request");
+ return ERR_PTR(-EINVAL);
+ }
+
+ if (nlmsg_attrlen(nlh, sizeof(*bpm))) {
+ NL_SET_ERR_MSG(extack, "Invalid data after header in mst get request");
+ return ERR_PTR(-EINVAL);
+ }
+
+ bpm = nlmsg_data(nlh);
+ if (!bpm->ifindex)
+ return NULL;
+
+ dev = __dev_get_by_index(net, bpm->ifindex);
+ if (!dev)
+ return ERR_PTR(-ENODEV);
+
+ if (!netif_is_bridge_port(dev)) {
+ NL_SET_ERR_MSG_MOD(extack, "The device is not a valid bridge port");
+ return ERR_PTR(-EINVAL);
+ }
+
+ p = br_port_get_rtnl(dev);
+ if (WARN_ON(!p))
+ return ERR_PTR(-ENODEV);
+
+ if (!br_opt_get(p->br, BROPT_MST_ENABLED)) {
+ NL_SET_ERR_MSG_MOD(extack, "Can't query MST state when MST is disabled");
+ return ERR_PTR(-EINVAL);
+ }
+
+ return p;
+}
+
+static int br_mst_nl_get(struct sk_buff *skb, struct netlink_callback *cb)
+{
+ int err = 0, idx = 0, s_idx = cb->args[0];
+ struct net *net = sock_net(skb->sk);
+ struct net_bridge_port *p;
+ struct net_device *dev;
+
+ p = br_mst_nl_get_parse(net, cb);
+ if (IS_ERR(p))
+ return PTR_ERR(p);
+
+ if (p) {
+ err = br_mst_nl_get_one(p, skb, cb);
+ if (err != -EMSGSIZE)
+ return err;
+ } else {
+ for_each_netdev(net, dev) {
+ if (!netif_is_bridge_port(dev))
+ continue;
+
+ if (idx < s_idx)
+ goto skip;
+
+ p = br_port_get_rtnl(dev);
+ if (WARN_ON(!p))
+ return -ENODEV;
+
+ err = br_mst_nl_get_one(p, skb, cb);
+ if (err == -EMSGSIZE)
+ break;
+skip:
+ idx++;
+ }
+ }
+
+ cb->args[0] = idx;
+ return skb->len;
+}
+
+static const struct nla_policy br_mst_nl_policy[BRIDGE_MST_ENTRY_MAX + 1] = {
+ [BRIDGE_MST_ENTRY_MSTI] = NLA_POLICY_RANGE(NLA_U16,
+ 1, /* 0 reserved for CST */
+ VLAN_N_VID - 1),
+ [BRIDGE_MST_ENTRY_STATE] = NLA_POLICY_RANGE(NLA_U8,
+ BR_STATE_DISABLED,
+ BR_STATE_BLOCKING),
+};
+
+static int br_mst_nl_set_one(struct net_bridge_port *p,
+ const struct nlattr *attr,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[BRIDGE_MST_ENTRY_MAX + 1];
+ u16 msti;
+ u8 state;
+ int err;
+
+ err = nla_parse_nested(tb, BRIDGE_MST_ENTRY_MAX, attr,
+ br_mst_nl_policy, extack);
+ if (err)
+ return err;
+
+ if (!tb[BRIDGE_MST_ENTRY_MSTI]) {
+ NL_SET_ERR_MSG_MOD(extack, "MSTI not specified");
+ return -EINVAL;
+ }
+
+ if (!tb[BRIDGE_MST_ENTRY_STATE]) {
+ NL_SET_ERR_MSG_MOD(extack, "State not specified");
+ return -EINVAL;
+ }
+
+ msti = nla_get_u16(tb[BRIDGE_MST_ENTRY_MSTI]);
+ state = nla_get_u8(tb[BRIDGE_MST_ENTRY_STATE]);
+
+ br_mst_set_state(p, msti, state);
+ return 0;
+}
+
+static int br_mst_nl_set(struct sk_buff *skb, struct nlmsghdr *nlh,
+ struct netlink_ext_ack *extack)
+{
+ struct net *net = sock_net(skb->sk);
+ struct net_bridge_port *p;
+ struct br_port_msg *bpm;
+ struct net_device *dev;
+ struct nlattr *attr;
+ int err, msts = 0;
+ int rem;
+
+ err = nlmsg_parse(nlh, sizeof(*bpm), NULL, BRIDGE_MST_MAX, NULL,
+ extack);
+ if (err < 0)
+ return err;
+
+ bpm = nlmsg_data(nlh);
+ dev = __dev_get_by_index(net, bpm->ifindex);
+ if (!dev)
+ return -ENODEV;
+
+ if (!netif_is_bridge_port(dev)) {
+ NL_SET_ERR_MSG_MOD(extack, "The device is not a valid bridge port");
+ return -EINVAL;
+ }
+
+ p = br_port_get_rtnl(dev);
+ if (WARN_ON(!p))
+ return -ENODEV;
+
+ if (!br_opt_get(p->br, BROPT_MST_ENABLED)) {
+ NL_SET_ERR_MSG_MOD(extack, "Can't modify MST state when MST is disabled");
+ return -EBUSY;
+ }
+
+ nlmsg_for_each_attr(attr, nlh, sizeof(*bpm), rem) {
+ switch (nla_type(attr)) {
+ case BRIDGE_MST_ENTRY:
+ err = br_mst_nl_set_one(p, attr, extack);
+ break;
+ default:
+ continue;
+ }
+
+ msts++;
+ if (err)
+ break;
+ }
+
+ if (!msts) {
+ NL_SET_ERR_MSG_MOD(extack, "Found no MST entries to process");
+ err = -EINVAL;
+ }
+
+ return err;
+}
+
+void br_mst_rtnl_init(void)
+{
+ rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_GETMST, NULL,
+ br_mst_nl_get, 0);
+ rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_SETMST,
+ br_mst_nl_set, NULL, 0);
+}
+
+void br_mst_rtnl_uninit(void)
+{
+ rtnl_unregister(PF_BRIDGE, RTM_SETMST);
+ rtnl_unregister(PF_BRIDGE, RTM_GETMST);
+}
diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index a17a0fe25a58..6d70d6f9cf17 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -1813,6 +1813,7 @@ int __init br_netlink_init(void)

br_mdb_init();
br_vlan_rtnl_init();
+ br_mst_rtnl_init();
rtnl_af_register(&br_af_ops);

err = rtnl_link_register(&br_link_ops);
@@ -1824,6 +1825,7 @@ int __init br_netlink_init(void)
out_af:
rtnl_af_unregister(&br_af_ops);
br_mdb_uninit();
+ br_mst_rtnl_uninit();
return err;
}

@@ -1831,6 +1833,7 @@ void br_netlink_fini(void)
{
br_mdb_uninit();
br_vlan_rtnl_uninit();
+ br_mst_rtnl_uninit();
rtnl_af_unregister(&br_af_ops);
rtnl_link_unregister(&br_link_ops);
}
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 63601043abca..7882a65ffb43 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1782,6 +1782,8 @@ void br_mst_set_state(struct net_bridge_port *p, u16 msti, u8 state);
int br_mst_vlan_set_msti(struct net_bridge_vlan *v, u16 msti);
void br_mst_vlan_init_state(struct net_bridge_vlan *v);
int br_mst_set_enabled(struct net_bridge *br, unsigned long val);
+void br_mst_rtnl_init(void);
+void br_mst_rtnl_uninit(void);
#else
static inline bool br_mst_is_enabled(struct net_bridge *br)
{
@@ -1790,6 +1792,8 @@ static inline bool br_mst_is_enabled(struct net_bridge *br)

static inline void br_mst_set_state(struct net_bridge_port *p,
u16 msti, u8 state) {}
+static inline void br_mst_rtnl_init(void) {}
+static inline void br_mst_rtnl_uninit(void) {}
#endif

struct nf_br_ops {
--
2.25.1

2022-03-01 11:16:49

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 06/10] net: dsa: Pass VLAN MSTI migration notifications to driver

Add the usual trampoline functionality from the generic DSA layer down
to the drivers for VLAN MSTI migrations.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/net/dsa.h | 3 +++
net/dsa/dsa_priv.h | 1 +
net/dsa/port.c | 10 ++++++++++
net/dsa/slave.c | 6 ++++++
4 files changed, 20 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index cfedcfb86350..cc8acb01bd9b 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -962,6 +962,9 @@ struct dsa_switch_ops {
struct netlink_ext_ack *extack);
int (*port_vlan_del)(struct dsa_switch *ds, int port,
const struct switchdev_obj_port_vlan *vlan);
+ int (*vlan_msti_set)(struct dsa_switch *ds,
+ const struct switchdev_attr *attr);
+
/*
* Forwarding database
*/
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 07c0ad52395a..87ec0697e92e 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -217,6 +217,7 @@ int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
struct netlink_ext_ack *extack);
bool dsa_port_skip_vlan_configuration(struct dsa_port *dp);
int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock);
+int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr);
int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
bool targeted_match);
int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
diff --git a/net/dsa/port.c b/net/dsa/port.c
index d9da425a17fb..5f45cb7d70ba 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -778,6 +778,16 @@ int dsa_port_bridge_flags(struct dsa_port *dp,
return 0;
}

+int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr)
+{
+ struct dsa_switch *ds = dp->ds;
+
+ if (!ds->ops->vlan_msti_set)
+ return -EOPNOTSUPP;
+
+ return ds->ops->vlan_msti_set(ds, attr);
+}
+
int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
bool targeted_match)
{
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 089616206b11..c6ffcd782b5a 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -314,6 +314,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev, const void *ctx,

ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, extack);
break;
+ case SWITCHDEV_ATTR_ID_VLAN_MSTI:
+ if (!dsa_port_offloads_bridge_dev(dp, attr->orig_dev))
+ return -EOPNOTSUPP;
+
+ ret = dsa_port_vlan_msti(dp, attr);
+ break;
default:
ret = -EOPNOTSUPP;
break;
--
2.25.1

2022-03-01 13:26:02

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 08/10] net: dsa: mv88e6xxx: Disentangle STU from VTU

In early LinkStreet silicon (e.g. 6095/6185), the per-VLAN STP states
were kept in the VTU - there was no concept of a SID. Later, the
information was split into two tables, where the VTU only tracked
memberships and deferred the STP state tracking to the STU via a
pointer (SID). This meant that a group of VLANs could share the same
STU entry. Most likely, this was done to align with MSTP (802.1Q-2018,
Clause 13), which is built on this principle.

While the VTU is still 4k lines on most devices, the STU is capped at
64 entries. This means that the current stategy, updating STU info
whenever a VTU entry is updated, can not easily support MSTP because:

- The maximum number of VIDs would also be capped at 64, as we would
have to allocate one SID for every VTU entry - even if many VLANs
would effectively share the same MST.

- MSTP updates would be unnecessarily slow as you would have to
iterate over all VLANs that share the same MST.

In order to support MSTP offloading in the future, manage the STU as a
separate entity from the VTU.

Only add support for newer hardware with separate VTU and
STU. VTU-only devices can also be supported, but essentially this
requires a software implementation of an STU (fanning out state
changed to all VLANs tied to the same MST).

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.c | 54 ++++
drivers/net/dsa/mv88e6xxx/chip.h | 24 ++
drivers/net/dsa/mv88e6xxx/global1.h | 10 +
drivers/net/dsa/mv88e6xxx/global1_vtu.c | 311 ++++++++++++++----------
4 files changed, 264 insertions(+), 135 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 84b90fc36c58..c14a62aa6a6c 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1791,6 +1791,33 @@ static int mv88e6xxx_atu_new(struct mv88e6xxx_chip *chip, u16 *fid)
return mv88e6xxx_g1_atu_flush(chip, *fid, true);
}

+static int mv88e6xxx_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ if (!chip->info->ops->stu_loadpurge)
+ return -EOPNOTSUPP;
+
+ return chip->info->ops->stu_loadpurge(chip, entry);
+}
+
+static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
+{
+ struct mv88e6xxx_stu_entry stu = {
+ .valid = true,
+ .sid = 0
+ };
+
+ if (!mv88e6xxx_has_stu(chip))
+ return 0;
+
+ /* Make sure that SID 0 is always valid. This is used by VTU
+ * entries that do not make use of the STU, e.g. when creating
+ * a VLAN upper on a port that is also part of a VLAN
+ * filtering bridge.
+ */
+ return mv88e6xxx_stu_loadpurge(chip, &stu);
+}
+
static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
u16 vid)
{
@@ -3427,6 +3454,13 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;

+ /* Must be called after mv88e6xxx_vtu_setup (which flushes the
+ * VTU, thereby also flushing the STU).
+ */
+ err = mv88e6xxx_stu_setup(chip);
+ if (err)
+ goto unlock;
+
/* Setup Switch Port Registers */
for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
if (dsa_is_unused_port(ds, i))
@@ -3882,6 +3916,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
.phylink_get_caps = mv88e6095_phylink_get_caps,
+ .stu_getnext = mv88e6352_g1_stu_getnext,
+ .stu_loadpurge = mv88e6352_g1_stu_loadpurge,
.set_max_frame_size = mv88e6185_g1_set_max_frame_size,
};

@@ -4968,6 +5004,8 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6352_g1_stu_getnext,
+ .stu_loadpurge = mv88e6352_g1_stu_loadpurge,
.serdes_get_lane = mv88e6352_serdes_get_lane,
.serdes_pcs_get_state = mv88e6352_serdes_pcs_get_state,
.serdes_pcs_config = mv88e6352_serdes_pcs_config,
@@ -5033,6 +5071,8 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6390_g1_stu_getnext,
+ .stu_loadpurge = mv88e6390_g1_stu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
.serdes_get_lane = mv88e6390_serdes_get_lane,
/* Check status register pause & lpa register */
@@ -5098,6 +5138,8 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6390_g1_stu_getnext,
+ .stu_loadpurge = mv88e6390_g1_stu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
.serdes_get_lane = mv88e6390x_serdes_get_lane,
.serdes_pcs_get_state = mv88e6390_serdes_pcs_get_state,
@@ -5166,6 +5208,8 @@ static const struct mv88e6xxx_ops mv88e6393x_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6390_g1_stu_getnext,
+ .stu_loadpurge = mv88e6390_g1_stu_loadpurge,
.serdes_power = mv88e6393x_serdes_power,
.serdes_get_lane = mv88e6393x_serdes_get_lane,
.serdes_pcs_get_state = mv88e6393x_serdes_pcs_get_state,
@@ -5234,6 +5278,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11,
.num_internal_phys = 8,
.max_vid = 4095,
+ .max_sid = 63,
.port_base_addr = 0x10,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5487,6 +5532,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5510,6 +5556,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5532,6 +5579,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5554,6 +5602,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5576,6 +5625,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5815,6 +5865,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 5,
.num_gpio = 15,
.max_vid = 4095,
+ .max_sid = 63,
.port_base_addr = 0x10,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5839,6 +5890,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5863,6 +5915,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5886,6 +5939,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 30b92a265613..be654be69982 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -20,6 +20,7 @@

#define EDSA_HLEN 8
#define MV88E6XXX_N_FID 4096
+#define MV88E6XXX_N_SID 64

#define MV88E6XXX_FID_STANDALONE 0
#define MV88E6XXX_FID_BRIDGED 1
@@ -130,6 +131,7 @@ struct mv88e6xxx_info {
unsigned int num_internal_phys;
unsigned int num_gpio;
unsigned int max_vid;
+ unsigned int max_sid;
unsigned int port_base_addr;
unsigned int phy_base_addr;
unsigned int global1_addr;
@@ -181,6 +183,12 @@ struct mv88e6xxx_vtu_entry {
bool valid;
bool policy;
u8 member[DSA_MAX_PORTS];
+ u8 state[DSA_MAX_PORTS]; /* Older silicon has no STU */
+};
+
+struct mv88e6xxx_stu_entry {
+ u8 sid;
+ bool valid;
u8 state[DSA_MAX_PORTS];
};

@@ -602,6 +610,12 @@ struct mv88e6xxx_ops {
int (*vtu_loadpurge)(struct mv88e6xxx_chip *chip,
struct mv88e6xxx_vtu_entry *entry);

+ /* Spanning Tree Unit operations */
+ int (*stu_getnext)(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+ int (*stu_loadpurge)(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+
/* GPIO operations */
const struct mv88e6xxx_gpio_ops *gpio_ops;

@@ -700,6 +714,11 @@ struct mv88e6xxx_hw_stat {
int type;
};

+static inline bool mv88e6xxx_has_stu(struct mv88e6xxx_chip *chip)
+{
+ return chip->info->max_sid > 0;
+}
+
static inline bool mv88e6xxx_has_pvt(struct mv88e6xxx_chip *chip)
{
return chip->info->pvt;
@@ -730,6 +749,11 @@ static inline unsigned int mv88e6xxx_max_vid(struct mv88e6xxx_chip *chip)
return chip->info->max_vid;
}

+static inline unsigned int mv88e6xxx_max_sid(struct mv88e6xxx_chip *chip)
+{
+ return chip->info->max_sid;
+}
+
static inline u16 mv88e6xxx_port_mask(struct mv88e6xxx_chip *chip)
{
return GENMASK((s32)mv88e6xxx_num_ports(chip) - 1, 0);
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index 2c1607c858a1..65958b2a0d3a 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -348,6 +348,16 @@ int mv88e6390_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
int mv88e6390_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
struct mv88e6xxx_vtu_entry *entry);
int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip);
+int mv88e6xxx_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6352_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6352_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6390_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6390_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
int mv88e6xxx_g1_vtu_prob_irq_setup(struct mv88e6xxx_chip *chip);
void mv88e6xxx_g1_vtu_prob_irq_free(struct mv88e6xxx_chip *chip);
int mv88e6xxx_g1_atu_get_next(struct mv88e6xxx_chip *chip, u16 fid);
diff --git a/drivers/net/dsa/mv88e6xxx/global1_vtu.c b/drivers/net/dsa/mv88e6xxx/global1_vtu.c
index b1bd9274a562..38e18f5811bf 100644
--- a/drivers/net/dsa/mv88e6xxx/global1_vtu.c
+++ b/drivers/net/dsa/mv88e6xxx/global1_vtu.c
@@ -44,8 +44,7 @@ static int mv88e6xxx_g1_vtu_fid_write(struct mv88e6xxx_chip *chip,

/* Offset 0x03: VTU SID Register */

-static int mv88e6xxx_g1_vtu_sid_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+static int mv88e6xxx_g1_vtu_sid_read(struct mv88e6xxx_chip *chip, u8 *sid)
{
u16 val;
int err;
@@ -54,15 +53,14 @@ static int mv88e6xxx_g1_vtu_sid_read(struct mv88e6xxx_chip *chip,
if (err)
return err;

- entry->sid = val & MV88E6352_G1_VTU_SID_MASK;
+ *sid = val & MV88E6352_G1_VTU_SID_MASK;

return 0;
}

-static int mv88e6xxx_g1_vtu_sid_write(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+static int mv88e6xxx_g1_vtu_sid_write(struct mv88e6xxx_chip *chip, u8 sid)
{
- u16 val = entry->sid & MV88E6352_G1_VTU_SID_MASK;
+ u16 val = sid & MV88E6352_G1_VTU_SID_MASK;

return mv88e6xxx_g1_write(chip, MV88E6352_G1_VTU_SID, val);
}
@@ -91,7 +89,7 @@ static int mv88e6xxx_g1_vtu_op(struct mv88e6xxx_chip *chip, u16 op)
/* Offset 0x06: VTU VID Register */

static int mv88e6xxx_g1_vtu_vid_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ bool *valid, u16 *vid)
{
u16 val;
int err;
@@ -100,25 +98,28 @@ static int mv88e6xxx_g1_vtu_vid_read(struct mv88e6xxx_chip *chip,
if (err)
return err;

- entry->vid = val & 0xfff;
+ if (vid) {
+ *vid = val & 0xfff;

- if (val & MV88E6390_G1_VTU_VID_PAGE)
- entry->vid |= 0x1000;
+ if (val & MV88E6390_G1_VTU_VID_PAGE)
+ *vid |= 0x1000;
+ }

- entry->valid = !!(val & MV88E6XXX_G1_VTU_VID_VALID);
+ if (valid)
+ *valid = !!(val & MV88E6XXX_G1_VTU_VID_VALID);

return 0;
}

static int mv88e6xxx_g1_vtu_vid_write(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ bool valid, u16 vid)
{
- u16 val = entry->vid & 0xfff;
+ u16 val = vid & 0xfff;

- if (entry->vid & 0x1000)
+ if (vid & 0x1000)
val |= MV88E6390_G1_VTU_VID_PAGE;

- if (entry->valid)
+ if (valid)
val |= MV88E6XXX_G1_VTU_VID_VALID;

return mv88e6xxx_g1_write(chip, MV88E6XXX_G1_VTU_VID, val);
@@ -147,7 +148,7 @@ static int mv88e6185_g1_vtu_stu_data_read(struct mv88e6xxx_chip *chip,
}

static int mv88e6185_g1_vtu_data_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ u8 *member, u8 *state)
{
u16 regs[3];
int err;
@@ -160,36 +161,20 @@ static int mv88e6185_g1_vtu_data_read(struct mv88e6xxx_chip *chip,
/* Extract MemberTag data */
for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
unsigned int member_offset = (i % 4) * 4;
+ unsigned int state_offset = member_offset + 2;

- entry->member[i] = (regs[i / 4] >> member_offset) & 0x3;
- }
-
- return 0;
-}
-
-static int mv88e6185_g1_stu_data_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
-{
- u16 regs[3];
- int err;
- int i;
-
- err = mv88e6185_g1_vtu_stu_data_read(chip, regs);
- if (err)
- return err;
+ if (member)
+ member[i] = (regs[i / 4] >> member_offset) & 0x3;

- /* Extract PortState data */
- for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
- unsigned int state_offset = (i % 4) * 4 + 2;
-
- entry->state[i] = (regs[i / 4] >> state_offset) & 0x3;
+ if (state)
+ state[i] = (regs[i / 4] >> state_offset) & 0x3;
}

return 0;
}

static int mv88e6185_g1_vtu_data_write(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ u8 *member, u8 *state)
{
u16 regs[3] = { 0 };
int i;
@@ -199,8 +184,11 @@ static int mv88e6185_g1_vtu_data_write(struct mv88e6xxx_chip *chip,
unsigned int member_offset = (i % 4) * 4;
unsigned int state_offset = member_offset + 2;

- regs[i / 4] |= (entry->member[i] & 0x3) << member_offset;
- regs[i / 4] |= (entry->state[i] & 0x3) << state_offset;
+ if (member)
+ regs[i / 4] |= (member[i] & 0x3) << member_offset;
+
+ if (state)
+ regs[i / 4] |= (state[i] & 0x3) << state_offset;
}

/* Write all 3 VTU/STU Data registers */
@@ -268,48 +256,6 @@ static int mv88e6390_g1_vtu_data_write(struct mv88e6xxx_chip *chip, u8 *data)

/* VLAN Translation Unit Operations */

-static int mv88e6xxx_g1_vtu_stu_getnext(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
-{
- int err;
-
- err = mv88e6xxx_g1_vtu_sid_write(chip, entry);
- if (err)
- return err;
-
- err = mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_GET_NEXT);
- if (err)
- return err;
-
- err = mv88e6xxx_g1_vtu_sid_read(chip, entry);
- if (err)
- return err;
-
- return mv88e6xxx_g1_vtu_vid_read(chip, entry);
-}
-
-static int mv88e6xxx_g1_vtu_stu_get(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *vtu)
-{
- struct mv88e6xxx_vtu_entry stu;
- int err;
-
- err = mv88e6xxx_g1_vtu_sid_read(chip, vtu);
- if (err)
- return err;
-
- stu.sid = vtu->sid - 1;
-
- err = mv88e6xxx_g1_vtu_stu_getnext(chip, &stu);
- if (err)
- return err;
-
- if (stu.sid != vtu->sid || !stu.valid)
- return -EINVAL;
-
- return 0;
-}
-
int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
struct mv88e6xxx_vtu_entry *entry)
{
@@ -327,7 +273,7 @@ int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
* write the VID only once, when the entry is given as invalid.
*/
if (!entry->valid) {
- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, false, entry->vid);
if (err)
return err;
}
@@ -336,7 +282,7 @@ int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
if (err)
return err;

- return mv88e6xxx_g1_vtu_vid_read(chip, entry);
+ return mv88e6xxx_g1_vtu_vid_read(chip, &entry->valid, &entry->vid);
}

int mv88e6185_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
@@ -350,11 +296,7 @@ int mv88e6185_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
return err;

if (entry->valid) {
- err = mv88e6185_g1_vtu_data_read(chip, entry);
- if (err)
- return err;
-
- err = mv88e6185_g1_stu_data_read(chip, entry);
+ err = mv88e6185_g1_vtu_data_read(chip, entry->member, entry->state);
if (err)
return err;

@@ -384,7 +326,7 @@ int mv88e6352_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
return err;

if (entry->valid) {
- err = mv88e6185_g1_vtu_data_read(chip, entry);
+ err = mv88e6185_g1_vtu_data_read(chip, entry->member, NULL);
if (err)
return err;

@@ -392,12 +334,7 @@ int mv88e6352_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
if (err)
return err;

- /* Fetch VLAN PortState data from the STU */
- err = mv88e6xxx_g1_vtu_stu_get(chip, entry);
- if (err)
- return err;
-
- err = mv88e6185_g1_stu_data_read(chip, entry);
+ err = mv88e6xxx_g1_vtu_sid_read(chip, &entry->sid);
if (err)
return err;
}
@@ -420,16 +357,11 @@ int mv88e6390_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
if (err)
return err;

- /* Fetch VLAN PortState data from the STU */
- err = mv88e6xxx_g1_vtu_stu_get(chip, entry);
- if (err)
- return err;
-
- err = mv88e6390_g1_vtu_data_read(chip, entry->state);
+ err = mv88e6xxx_g1_vtu_fid_read(chip, entry);
if (err)
return err;

- err = mv88e6xxx_g1_vtu_fid_read(chip, entry);
+ err = mv88e6xxx_g1_vtu_sid_read(chip, &entry->sid);
if (err)
return err;
}
@@ -447,12 +379,12 @@ int mv88e6185_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
if (err)
return err;

- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, entry->vid);
if (err)
return err;

if (entry->valid) {
- err = mv88e6185_g1_vtu_data_write(chip, entry);
+ err = mv88e6185_g1_vtu_data_write(chip, entry->member, entry->state);
if (err)
return err;

@@ -479,27 +411,21 @@ int mv88e6352_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
if (err)
return err;

- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, entry->vid);
if (err)
return err;

if (entry->valid) {
- /* Write MemberTag and PortState data */
- err = mv88e6185_g1_vtu_data_write(chip, entry);
- if (err)
- return err;
-
- err = mv88e6xxx_g1_vtu_sid_write(chip, entry);
+ /* Write MemberTag data */
+ err = mv88e6185_g1_vtu_data_write(chip, entry->member, NULL);
if (err)
return err;

- /* Load STU entry */
- err = mv88e6xxx_g1_vtu_op(chip,
- MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
+ err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
if (err)
return err;

- err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
if (err)
return err;
}
@@ -517,41 +443,113 @@ int mv88e6390_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
if (err)
return err;

- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, entry->vid);
if (err)
return err;

if (entry->valid) {
- /* Write PortState data */
- err = mv88e6390_g1_vtu_data_write(chip, entry->state);
+ /* Write MemberTag data */
+ err = mv88e6390_g1_vtu_data_write(chip, entry->member);
if (err)
return err;

- err = mv88e6xxx_g1_vtu_sid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
if (err)
return err;

- /* Load STU entry */
- err = mv88e6xxx_g1_vtu_op(chip,
- MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
if (err)
return err;
+ }

- /* Write MemberTag data */
- err = mv88e6390_g1_vtu_data_write(chip, entry->member);
+ /* Load/Purge VTU entry */
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_VTU_LOAD_PURGE);
+}
+
+int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ err = mv88e6xxx_g1_vtu_op_wait(chip);
+ if (err)
+ return err;
+
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_FLUSH_ALL);
+}
+
+/* Spanning Tree Unit Operations */
+
+int mv88e6xxx_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_vtu_op_wait(chip);
+ if (err)
+ return err;
+
+ /* To get the next higher active SID, the STU GetNext operation can be
+ * started again without setting the SID registers since it already
+ * contains the last SID.
+ *
+ * To save a few hardware accesses and abstract this to the caller,
+ * write the SID only once, when the entry is given as invalid.
+ */
+ if (!entry->valid) {
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
if (err)
return err;
+ }

- err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_GET_NEXT);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_vid_read(chip, &entry->valid, NULL);
+ if (err)
+ return err;
+
+ if (entry->valid) {
+ err = mv88e6xxx_g1_vtu_sid_read(chip, &entry->sid);
if (err)
return err;
}

- /* Load/Purge VTU entry */
- return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_VTU_LOAD_PURGE);
+ return 0;
}

-int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip)
+int mv88e6352_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_stu_getnext(chip, entry);
+ if (err)
+ return err;
+
+ if (!entry->valid)
+ return 0;
+
+ return mv88e6185_g1_vtu_data_read(chip, NULL, entry->state);
+}
+
+int mv88e6390_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_stu_getnext(chip, entry);
+ if (err)
+ return err;
+
+ if (!entry->valid)
+ return 0;
+
+ return mv88e6390_g1_vtu_data_read(chip, entry->state);
+}
+
+int mv88e6352_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
{
int err;

@@ -559,16 +557,59 @@ int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip)
if (err)
return err;

- return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_FLUSH_ALL);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, 0);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
+ if (err)
+ return err;
+
+ if (entry->valid) {
+ err = mv88e6185_g1_vtu_data_write(chip, NULL, entry->state);
+ if (err)
+ return err;
+ }
+
+ /* Load/Purge STU entry */
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
+}
+
+int mv88e6390_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_vtu_op_wait(chip);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, 0);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
+ if (err)
+ return err;
+
+ if (entry->valid) {
+ err = mv88e6390_g1_vtu_data_write(chip, entry->state);
+ if (err)
+ return err;
+ }
+
+ /* Load/Purge STU entry */
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
}

+/* VTU Violation Management */
+
static irqreturn_t mv88e6xxx_g1_vtu_prob_irq_thread_fn(int irq, void *dev_id)
{
struct mv88e6xxx_chip *chip = dev_id;
- struct mv88e6xxx_vtu_entry entry;
+ u16 val, vid;
int spid;
int err;
- u16 val;

mv88e6xxx_reg_lock(chip);

@@ -580,7 +621,7 @@ static irqreturn_t mv88e6xxx_g1_vtu_prob_irq_thread_fn(int irq, void *dev_id)
if (err)
goto out;

- err = mv88e6xxx_g1_vtu_vid_read(chip, &entry);
+ err = mv88e6xxx_g1_vtu_vid_read(chip, NULL, &vid);
if (err)
goto out;

@@ -588,13 +629,13 @@ static irqreturn_t mv88e6xxx_g1_vtu_prob_irq_thread_fn(int irq, void *dev_id)

if (val & MV88E6XXX_G1_VTU_OP_MEMBER_VIOLATION) {
dev_err_ratelimited(chip->dev, "VTU member violation for vid %d, source port %d\n",
- entry.vid, spid);
+ vid, spid);
chip->ports[spid].vtu_member_violation++;
}

if (val & MV88E6XXX_G1_VTU_OP_MISS_VIOLATION) {
dev_dbg_ratelimited(chip->dev, "VTU miss violation for vid %d, source port %d\n",
- entry.vid, spid);
+ vid, spid);
chip->ports[spid].vtu_miss_violation++;
}

--
2.25.1

2022-03-01 13:47:48

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 10/10] net: dsa: mv88e6xxx: MST Offloading

Allocate a SID in the STU for each MSTID in use by a bridge and handle
the mapping of MSTIDs to VLANs using the SID field of each VTU entry.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.c | 178 +++++++++++++++++++++++++++++++
drivers/net/dsa/mv88e6xxx/chip.h | 13 +++
2 files changed, 191 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index c14a62aa6a6c..4fb4ec1dff79 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1818,6 +1818,137 @@ static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
return mv88e6xxx_stu_loadpurge(chip, &stu);
}

+static int mv88e6xxx_sid_new(struct mv88e6xxx_chip *chip, u8 *sid)
+{
+ DECLARE_BITMAP(busy, MV88E6XXX_N_SID) = { 0 };
+ struct mv88e6xxx_mst *mst;
+
+ set_bit(0, busy);
+
+ list_for_each_entry(mst, &chip->msts, node) {
+ set_bit(mst->stu.sid, busy);
+ }
+
+ *sid = find_first_zero_bit(busy, MV88E6XXX_N_SID);
+
+ return (*sid >= mv88e6xxx_max_sid(chip)) ? -ENOSPC : 0;
+}
+
+static int mv88e6xxx_sid_put(struct mv88e6xxx_chip *chip, u8 sid)
+{
+ struct mv88e6xxx_mst *mst, *tmp;
+ int err = 0;
+
+ list_for_each_entry_safe(mst, tmp, &chip->msts, node) {
+ if (mst->stu.sid == sid) {
+ if (refcount_dec_and_test(&mst->refcnt)) {
+ mst->stu.valid = false;
+ err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
+ list_del(&mst->node);
+ kfree(mst);
+ }
+
+ return err;
+ }
+ }
+
+ return -ENOENT;
+}
+
+static int mv88e6xxx_sid_get(struct mv88e6xxx_chip *chip, struct net_device *br,
+ u16 msti, u8 *sid)
+{
+ struct mv88e6xxx_mst *mst;
+ int err, i;
+
+ if (!br)
+ return 0;
+
+ if (!mv88e6xxx_has_stu(chip))
+ return -EOPNOTSUPP;
+
+ list_for_each_entry(mst, &chip->msts, node) {
+ if (mst->br == br && mst->msti == msti) {
+ refcount_inc(&mst->refcnt);
+ *sid = mst->stu.sid;
+ return 0;
+ }
+ }
+
+ err = mv88e6xxx_sid_new(chip, sid);
+ if (err)
+ return err;
+
+ mst = kzalloc(sizeof(*mst), GFP_KERNEL);
+ if (!mst)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&mst->node);
+ refcount_set(&mst->refcnt, 1);
+ mst->br = br;
+ mst->msti = msti;
+ mst->stu.valid = true;
+ mst->stu.sid = *sid;
+
+ /* The bridge starts out all ports in the disabled state. But
+ * a STU state of disabled means to go by the port-global
+ * state. So we set all user port's initial state to blocking,
+ * to match the bridge's behavior.
+ */
+ for (i = 0; i < mv88e6xxx_num_ports(chip); i++)
+ mst->stu.state[i] = dsa_is_user_port(chip->ds, i) ?
+ MV88E6XXX_PORT_CTL0_STATE_BLOCKING :
+ MV88E6XXX_PORT_CTL0_STATE_DISABLED;
+
+ list_add_tail(&mst->node, &chip->msts);
+ return mv88e6xxx_stu_loadpurge(chip, &mst->stu);
+}
+
+static int mv88e6xxx_port_mst_state_set(struct dsa_switch *ds, int port,
+ const struct switchdev_mst_state *st)
+{
+ struct dsa_port *dp = dsa_to_port(ds, port);
+ struct mv88e6xxx_chip *chip = ds->priv;
+ struct mv88e6xxx_mst *mst;
+ u8 state;
+ int err;
+
+ if (!mv88e6xxx_has_stu(chip))
+ return -EOPNOTSUPP;
+
+ switch (st->state) {
+ case BR_STATE_DISABLED:
+ case BR_STATE_BLOCKING:
+ case BR_STATE_LISTENING:
+ state = MV88E6XXX_PORT_CTL0_STATE_BLOCKING;
+ break;
+ case BR_STATE_LEARNING:
+ state = MV88E6XXX_PORT_CTL0_STATE_LEARNING;
+ break;
+ case BR_STATE_FORWARDING:
+ state = MV88E6XXX_PORT_CTL0_STATE_FORWARDING;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ list_for_each_entry(mst, &chip->msts, node) {
+ if (mst->br == dsa_port_bridge_dev_get(dp) &&
+ mst->msti == st->msti) {
+ if (mst->stu.state[port] == state)
+ return 0;
+
+ mst->stu.state[port] = state;
+ mv88e6xxx_reg_lock(chip);
+ err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
+ mv88e6xxx_reg_unlock(chip);
+ return err;
+ }
+ }
+
+ return -ENOENT;
+}
+
static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
u16 vid)
{
@@ -2437,6 +2568,12 @@ static int mv88e6xxx_port_vlan_leave(struct mv88e6xxx_chip *chip,
if (err)
return err;

+ if (!vlan.valid && vlan.sid) {
+ err = mv88e6xxx_sid_put(chip, vlan.sid);
+ if (err)
+ return err;
+ }
+
return mv88e6xxx_g1_atu_remove(chip, vlan.fid, port, false);
}

@@ -2482,6 +2619,44 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
return err;
}

+static int mv88e6xxx_vlan_msti_set(struct dsa_switch *ds,
+ const struct switchdev_attr *attr)
+{
+ const struct switchdev_vlan_attr *vattr = &attr->u.vlan_attr;
+ struct mv88e6xxx_chip *chip = ds->priv;
+ struct mv88e6xxx_vtu_entry vlan;
+ u8 new_sid;
+ int err;
+
+ mv88e6xxx_reg_lock(chip);
+
+ err = mv88e6xxx_vtu_get(chip, vattr->vid, &vlan);
+ if (err)
+ goto unlock;
+
+ if (!vlan.valid) {
+ err = -EINVAL;
+ goto unlock;
+ }
+
+ err = mv88e6xxx_sid_get(chip, attr->orig_dev, vattr->msti, &new_sid);
+ if (err)
+ goto unlock;
+
+ if (vlan.sid) {
+ err = mv88e6xxx_sid_put(chip, vlan.sid);
+ if (err)
+ goto unlock;
+ }
+
+ vlan.sid = new_sid;
+ err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
+
+unlock:
+ mv88e6xxx_reg_unlock(chip);
+ return err;
+}
+
static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid,
struct dsa_db db)
@@ -6008,6 +6183,7 @@ static struct mv88e6xxx_chip *mv88e6xxx_alloc_chip(struct device *dev)
mutex_init(&chip->reg_lock);
INIT_LIST_HEAD(&chip->mdios);
idr_init(&chip->policies);
+ INIT_LIST_HEAD(&chip->msts);

return chip;
}
@@ -6540,10 +6716,12 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.port_pre_bridge_flags = mv88e6xxx_port_pre_bridge_flags,
.port_bridge_flags = mv88e6xxx_port_bridge_flags,
.port_stp_state_set = mv88e6xxx_port_stp_state_set,
+ .port_mst_state_set = mv88e6xxx_port_mst_state_set,
.port_fast_age = mv88e6xxx_port_fast_age,
.port_vlan_filtering = mv88e6xxx_port_vlan_filtering,
.port_vlan_add = mv88e6xxx_port_vlan_add,
.port_vlan_del = mv88e6xxx_port_vlan_del,
+ .vlan_msti_set = mv88e6xxx_vlan_msti_set,
.port_fdb_add = mv88e6xxx_port_fdb_add,
.port_fdb_del = mv88e6xxx_port_fdb_del,
.port_fdb_dump = mv88e6xxx_port_fdb_dump,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 6d4daa24d3e5..6a0b66354e1d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -297,6 +297,16 @@ struct mv88e6xxx_region_priv {
enum mv88e6xxx_region_id id;
};

+struct mv88e6xxx_mst {
+ struct list_head node;
+
+ refcount_t refcnt;
+ struct net_device *br;
+ u16 msti;
+
+ struct mv88e6xxx_stu_entry stu;
+};
+
struct mv88e6xxx_chip {
const struct mv88e6xxx_info *info;

@@ -397,6 +407,9 @@ struct mv88e6xxx_chip {

/* devlink regions */
struct devlink_region *regions[_MV88E6XXX_REGION_MAX];
+
+ /* Bridge MST to SID mappings */
+ struct list_head msts;
};

struct mv88e6xxx_bus_ops {
--
2.25.1

2022-03-01 15:48:59

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 05/10] net: bridge: mst: Notify switchdev drivers of MST state changes

Generate a switchdev notification whenever an MST state changes. This
notification is keyed by the VLANs MSTI rather than the VID, since
multiple VLANs may share the same MST instance.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/net/switchdev.h | 7 +++++++
net/bridge/br_mst.c | 21 +++++++++++++++++++++
2 files changed, 28 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 39e57aa5005a..441beeb2fda5 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -19,6 +19,7 @@
enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_UNDEFINED,
SWITCHDEV_ATTR_ID_PORT_STP_STATE,
+ SWITCHDEV_ATTR_ID_PORT_MST_STATE,
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_MROUTER,
@@ -31,6 +32,11 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_VLAN_MSTI,
};

+struct switchdev_mst_state {
+ u16 msti;
+ u8 state;
+};
+
struct switchdev_brport_flags {
unsigned long val;
unsigned long mask;
@@ -52,6 +58,7 @@ struct switchdev_attr {
void (*complete)(struct net_device *dev, int err, void *priv);
union {
u8 stp_state; /* PORT_STP_STATE */
+ struct switchdev_mst_state mst_state; /* PORT_MST_STATE */
struct switchdev_brport_flags brport_flags; /* PORT_BRIDGE_FLAGS */
bool mrouter; /* PORT_MROUTER */
clock_t ageing_time; /* BRIDGE_AGEING_TIME */
diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
index aba603675165..9cdd7d9e07c6 100644
--- a/net/bridge/br_mst.c
+++ b/net/bridge/br_mst.c
@@ -29,8 +29,18 @@ void br_mst_vlan_set_state(struct net_bridge_port *p, struct net_bridge_vlan *v,

void br_mst_set_state(struct net_bridge_port *p, u16 msti, u8 state)
{
+ struct switchdev_attr attr = {
+ .id = SWITCHDEV_ATTR_ID_PORT_MST_STATE,
+ .flags = SWITCHDEV_F_DEFER,
+ .orig_dev = p->dev,
+ .u.mst_state = {
+ .msti = msti,
+ .state = state,
+ },
+ };
struct net_bridge_vlan_group *vg;
struct net_bridge_vlan *v;
+ int err;

vg = nbp_vlan_group(p);
if (!vg)
@@ -42,6 +52,17 @@ void br_mst_set_state(struct net_bridge_port *p, u16 msti, u8 state)

br_mst_vlan_set_state(p, v, state);
}
+
+ if (!msti)
+ /* MSTI 0 (CST) state changes are notified via the
+ * regular SWITCHDEV_ATTR_ID_PORT_STP_STATE.
+ */
+ return;
+
+ err = switchdev_port_attr_set(p->dev, &attr, NULL);
+ if (err && err != -EOPNOTSUPP)
+ br_warn(p->br, "unable to offload MST state on %s in MSTI %u",
+ netdev_name(p->dev), msti);
}

static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti)
--
2.25.1

2022-03-01 18:21:48

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 00/10] net: bridge: Multiple Spanning Trees

Hi Tobias,

On Tue, Mar 01, 2022 at 11:03:11AM +0100, Tobias Waldekranz wrote:
> A proposal for the corresponding iproute2 interface is available here:
>
> https://github.com/wkz/iproute2/tree/mst

Please pardon my ignorance. Is there a user-mode STP protocol application
that supports MSTP, and that you've tested these patches with?
I'd like to give it a try.

2022-03-01 18:24:51

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 04/10] net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations

Whenever a VLAN moves to a new MSTI, send a switchdev notification so
that switchdevs can...

...either refuse the migration if the hardware does not support
offloading of MST...

..or track a bridge's VID to MSTI mapping when offloading is
supported.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/net/switchdev.h | 10 +++++++
net/bridge/br_mst.c | 15 +++++++++++
net/bridge/br_switchdev.c | 57 +++++++++++++++++++++++++++++++++++++++
3 files changed, 82 insertions(+)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 3e424d40fae3..39e57aa5005a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -28,6 +28,7 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
SWITCHDEV_ATTR_ID_MRP_PORT_ROLE,
+ SWITCHDEV_ATTR_ID_VLAN_MSTI,
};

struct switchdev_brport_flags {
@@ -35,6 +36,14 @@ struct switchdev_brport_flags {
unsigned long mask;
};

+struct switchdev_vlan_attr {
+ u16 vid;
+
+ union {
+ u16 msti;
+ };
+};
+
struct switchdev_attr {
struct net_device *orig_dev;
enum switchdev_attr_id id;
@@ -50,6 +59,7 @@ struct switchdev_attr {
u16 vlan_protocol; /* BRIDGE_VLAN_PROTOCOL */
bool mc_disabled; /* MC_DISABLED */
u8 mrp_port_role; /* MRP_PORT_ROLE */
+ struct switchdev_vlan_attr vlan_attr; /* VLAN_* */
} u;
};

diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
index 8dea8e7257fd..aba603675165 100644
--- a/net/bridge/br_mst.c
+++ b/net/bridge/br_mst.c
@@ -7,6 +7,7 @@
*/

#include <linux/kernel.h>
+#include <net/switchdev.h>

#include "br_private.h"

@@ -65,9 +66,23 @@ static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti)

int br_mst_vlan_set_msti(struct net_bridge_vlan *mv, u16 msti)
{
+ struct switchdev_attr attr = {
+ .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
+ .flags = SWITCHDEV_F_DEFER,
+ .orig_dev = mv->br->dev,
+ .u.vlan_attr = {
+ .vid = mv->vid,
+ .msti = msti,
+ },
+ };
struct net_bridge_vlan_group *vg;
struct net_bridge_vlan *pv;
struct net_bridge_port *p;
+ int err;
+
+ err = switchdev_port_attr_set(mv->br->dev, &attr, NULL);
+ if (err && err != -EOPNOTSUPP)
+ return err;

mv->msti = msti;

diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
index 6f6a70121a5e..160d7659f88a 100644
--- a/net/bridge/br_switchdev.c
+++ b/net/bridge/br_switchdev.c
@@ -428,6 +428,57 @@ static int br_switchdev_vlan_replay(struct net_device *br_dev,
return 0;
}

+static int br_switchdev_mst_replay(struct net_device *br_dev,
+ const void *ctx, bool adding,
+ struct notifier_block *nb,
+ struct netlink_ext_ack *extack)
+{
+ struct switchdev_notifier_port_attr_info attr_info = {
+ .info = {
+ .dev = br_dev,
+ .extack = extack,
+ .ctx = ctx,
+ },
+ };
+ struct net_bridge *br = netdev_priv(br_dev);
+ struct net_bridge_vlan_group *vg;
+ struct net_bridge_vlan *v;
+ int err;
+
+ ASSERT_RTNL();
+
+ if (!nb)
+ return 0;
+
+ if (!netif_is_bridge_master(br_dev))
+ return -EINVAL;
+
+ vg = br_vlan_group(br);
+
+ list_for_each_entry(v, &vg->vlan_list, vlist) {
+ struct switchdev_attr attr = {
+ .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
+ .flags = SWITCHDEV_F_DEFER,
+ .orig_dev = br_dev,
+ .u.vlan_attr = {
+ .vid = v->vid,
+ .msti = v->msti,
+ }
+ };
+
+ if (!v->msti)
+ continue;
+
+ attr_info.attr = &attr;
+ err = nb->notifier_call(nb, SWITCHDEV_PORT_ATTR_SET, &attr_info);
+ err = notifier_to_errno(err);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
#ifdef CONFIG_BRIDGE_IGMP_SNOOPING
struct br_switchdev_mdb_complete_info {
struct net_bridge_port *port;
@@ -695,6 +746,10 @@ static int nbp_switchdev_sync_objs(struct net_bridge_port *p, const void *ctx,
if (err && err != -EOPNOTSUPP)
return err;

+ err = br_switchdev_mst_replay(br_dev, ctx, true, blocking_nb, extack);
+ if (err && err != -EOPNOTSUPP)
+ return err;
+
err = br_switchdev_mdb_replay(br_dev, dev, ctx, true, blocking_nb,
extack);
if (err && err != -EOPNOTSUPP)
@@ -719,6 +774,8 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p,

br_switchdev_mdb_replay(br_dev, dev, ctx, false, blocking_nb, NULL);

+ br_switchdev_mst_replay(br_dev, ctx, false, blocking_nb, NULL);
+
br_switchdev_vlan_replay(br_dev, ctx, false, blocking_nb, NULL);
}

--
2.25.1

2022-03-01 20:46:11

by Tobias Waldekranz

[permalink] [raw]
Subject: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

Add the usual trampoline functionality from the generic DSA layer down
to the drivers for MST state changes.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/net/dsa.h | 2 ++
net/dsa/dsa_priv.h | 2 ++
net/dsa/port.c | 30 ++++++++++++++++++++++++++++++
net/dsa/slave.c | 6 ++++++
4 files changed, 40 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index cc8acb01bd9b..096e6e3a8e1e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -943,6 +943,8 @@ struct dsa_switch_ops {
struct dsa_bridge bridge);
void (*port_stp_state_set)(struct dsa_switch *ds, int port,
u8 state);
+ int (*port_mst_state_set)(struct dsa_switch *ds, int port,
+ const struct switchdev_mst_state *state);
void (*port_fast_age)(struct dsa_switch *ds, int port);
int (*port_pre_bridge_flags)(struct dsa_switch *ds, int port,
struct switchdev_brport_flags flags,
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 87ec0697e92e..a620e079ebc5 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -198,6 +198,8 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp,
const struct dsa_device_ops *tag_ops);
int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age);
+int dsa_port_set_mst_state(struct dsa_port *dp,
+ const struct switchdev_mst_state *state);
int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
void dsa_port_disable_rt(struct dsa_port *dp);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 5f45cb7d70ba..26cfbc8ab499 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -108,6 +108,36 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age)
return 0;
}

+int dsa_port_set_mst_state(struct dsa_port *dp,
+ const struct switchdev_mst_state *state)
+{
+ struct dsa_switch *ds = dp->ds;
+ int err, port = dp->index;
+
+ if (!ds->ops->port_mst_state_set)
+ return -EOPNOTSUPP;
+
+ err = ds->ops->port_mst_state_set(ds, port, state);
+ if (err)
+ return err;
+
+ if (!dsa_port_can_configure_learning(dp) || dp->learning) {
+ switch (state->state) {
+ case BR_STATE_DISABLED:
+ case BR_STATE_BLOCKING:
+ case BR_STATE_LISTENING:
+ /* Ideally we would only fast age entries
+ * belonging to VLANs controlled by this
+ * MST.
+ */
+ dsa_port_fast_age(dp);
+ break;
+ }
+ }
+
+ return 0;
+}
+
static void dsa_port_set_state_now(struct dsa_port *dp, u8 state,
bool do_fast_age)
{
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index c6ffcd782b5a..32b006a5b778 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -288,6 +288,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev, const void *ctx,

ret = dsa_port_set_state(dp, attr->u.stp_state, true);
break;
+ case SWITCHDEV_ATTR_ID_PORT_MST_STATE:
+ if (!dsa_port_offloads_bridge_port(dp, attr->orig_dev))
+ return -EOPNOTSUPP;
+
+ ret = dsa_port_set_mst_state(dp, &attr->u.mst_state);
+ break;
case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
if (!dsa_port_offloads_bridge_dev(dp, attr->orig_dev))
return -EOPNOTSUPP;
--
2.25.1

2022-03-02 02:05:19

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 00/10] net: bridge: Multiple Spanning Trees

On Tue, Mar 01, 2022 at 18:21, Vladimir Oltean <[email protected]> wrote:
> Hi Tobias,
>
> On Tue, Mar 01, 2022 at 11:03:11AM +0100, Tobias Waldekranz wrote:
>> A proposal for the corresponding iproute2 interface is available here:
>>
>> https://github.com/wkz/iproute2/tree/mst
>
> Please pardon my ignorance. Is there a user-mode STP protocol application
> that supports MSTP, and that you've tested these patches with?
> I'd like to give it a try.

I see that Stephen has already pointed you to mstpd in a sibling
message.

It is important to note though, that AFAIK mstpd does not actually
support MSTP on a vanilla Linux system. The protocol implementation is
in place, and they have a plugin architecture that makes it easy for people
to hook it up to various userspace SDKs and whatnot, but you can't use
it with a regular bridge.

A colleague of mine has been successfully running a modified version of
mstpd which was tailored for v1 of this series (RFC). But I do not
believe he has had the time to rework it for v2. That should mostly be a
matter of removing code though, as v2 allows you to manage the MSTIs
directly, rather than having to translate it to an associated VLAN.

2022-03-02 03:10:34

by Roopa Prabhu

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 03/10] net: bridge: mst: Support setting and reporting MST port states


On 3/1/22 15:19, Nikolay Aleksandrov wrote:
> On 1 March 2022 11:03:14 CET, Tobias Waldekranz <[email protected]> wrote:
>> Make it possible to change the port state in a given MSTI. This is
>> done through a new netlink interface, since the MSTIs are objects in
>> their own right. The proposed iproute2 interface would be:
>>
>> bridge mst set dev <PORT> msti <MSTI> state <STATE>
>>
>> Current states in all applicable MSTIs can also be dumped. The
>> proposed iproute interface looks like this:
>>
>> $ bridge mst
>> port msti
>> vb1 0
>> state forwarding
>> 100
>> state disabled
>> vb2 0
>> state forwarding
>> 100
>> state forwarding
>>
>> The preexisting per-VLAN states are still valid in the MST
>> mode (although they are read-only), and can be queried as usual if one
>> is interested in knowing a particular VLAN's state without having to
>> care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
>> bound to MSTI 100):
>>
>> $ bridge -d vlan
>> port vlan-id
>> vb1 10
>> state forwarding mcast_router 1
>> 20
>> state disabled mcast_router 1
>> 30
>> state disabled mcast_router 1
>> 40
>> state forwarding mcast_router 1
>> vb2 10
>> state forwarding mcast_router 1
>> 20
>> state forwarding mcast_router 1
>> 30
>> state forwarding mcast_router 1
>> 40
>> state forwarding mcast_router 1
>>
>> Signed-off-by: Tobias Waldekranz <[email protected]>
>> ---
>> include/uapi/linux/if_bridge.h | 16 +++
>> include/uapi/linux/rtnetlink.h | 5 +
>> net/bridge/br_mst.c | 244 +++++++++++++++++++++++++++++++++
>> net/bridge/br_netlink.c | 3 +
>> net/bridge/br_private.h | 4 +
>> 5 files changed, 272 insertions(+)
>>
>> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
>> index b68016f625b7..784482527861 100644
>> --- a/include/uapi/linux/if_bridge.h
>> +++ b/include/uapi/linux/if_bridge.h
>> @@ -785,4 +785,20 @@ enum {
>> __BRIDGE_QUERIER_MAX
>> };
>> #define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
>> +
>> +enum {
>> + BRIDGE_MST_UNSPEC,
>> + BRIDGE_MST_ENTRY,
>> + __BRIDGE_MST_MAX,
>> +};
>> +#define BRIDGE_MST_MAX (__BRIDGE_MST_MAX - 1)
>> +
>> +enum {
>> + BRIDGE_MST_ENTRY_UNSPEC,
>> + BRIDGE_MST_ENTRY_MSTI,
>> + BRIDGE_MST_ENTRY_STATE,
>> + __BRIDGE_MST_ENTRY_MAX,
>> +};
>> +#define BRIDGE_MST_ENTRY_MAX (__BRIDGE_MST_ENTRY_MAX - 1)
>> +
>> #endif /* _UAPI_LINUX_IF_BRIDGE_H */
>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>> index 0970cb4b1b88..4a48f3ce862c 100644
>> --- a/include/uapi/linux/rtnetlink.h
>> +++ b/include/uapi/linux/rtnetlink.h
>> @@ -192,6 +192,11 @@ enum {
>> RTM_GETTUNNEL,
>> #define RTM_GETTUNNEL RTM_GETTUNNEL
>>
>> + RTM_GETMST = 124 + 2,
>> +#define RTM_GETMST RTM_GETMST
>> + RTM_SETMST,
>> +#define RTM_SETMST RTM_SETMST
>> +
> I think you should also update selinux (see nlmsgtab.c)
> I'll think about this one, if there is some nice way to avoid the new rtm types.

yes, since these are all port attributes, seems like 'bridge link set'
should work

Tobias, can you pls check if extending RTM_SETLINK (with AF_BRIDGE) is
an option here ?

ie via br_setlink

2022-03-02 07:46:44

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 03/10] net: bridge: mst: Support setting and reporting MST port states

On 1 March 2022 11:03:14 CET, Tobias Waldekranz <[email protected]> wrote:
>Make it possible to change the port state in a given MSTI. This is
>done through a new netlink interface, since the MSTIs are objects in
>their own right. The proposed iproute2 interface would be:
>
> bridge mst set dev <PORT> msti <MSTI> state <STATE>
>
>Current states in all applicable MSTIs can also be dumped. The
>proposed iproute interface looks like this:
>
>$ bridge mst
>port msti
>vb1 0
> state forwarding
> 100
> state disabled
>vb2 0
> state forwarding
> 100
> state forwarding
>
>The preexisting per-VLAN states are still valid in the MST
>mode (although they are read-only), and can be queried as usual if one
>is interested in knowing a particular VLAN's state without having to
>care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
>bound to MSTI 100):
>
>$ bridge -d vlan
>port vlan-id
>vb1 10
> state forwarding mcast_router 1
> 20
> state disabled mcast_router 1
> 30
> state disabled mcast_router 1
> 40
> state forwarding mcast_router 1
>vb2 10
> state forwarding mcast_router 1
> 20
> state forwarding mcast_router 1
> 30
> state forwarding mcast_router 1
> 40
> state forwarding mcast_router 1
>
>Signed-off-by: Tobias Waldekranz <[email protected]>
>---
> include/uapi/linux/if_bridge.h | 16 +++
> include/uapi/linux/rtnetlink.h | 5 +
> net/bridge/br_mst.c | 244 +++++++++++++++++++++++++++++++++
> net/bridge/br_netlink.c | 3 +
> net/bridge/br_private.h | 4 +
> 5 files changed, 272 insertions(+)
>
>diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
>index b68016f625b7..784482527861 100644
>--- a/include/uapi/linux/if_bridge.h
>+++ b/include/uapi/linux/if_bridge.h
>@@ -785,4 +785,20 @@ enum {
> __BRIDGE_QUERIER_MAX
> };
> #define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
>+
>+enum {
>+ BRIDGE_MST_UNSPEC,
>+ BRIDGE_MST_ENTRY,
>+ __BRIDGE_MST_MAX,
>+};
>+#define BRIDGE_MST_MAX (__BRIDGE_MST_MAX - 1)
>+
>+enum {
>+ BRIDGE_MST_ENTRY_UNSPEC,
>+ BRIDGE_MST_ENTRY_MSTI,
>+ BRIDGE_MST_ENTRY_STATE,
>+ __BRIDGE_MST_ENTRY_MAX,
>+};
>+#define BRIDGE_MST_ENTRY_MAX (__BRIDGE_MST_ENTRY_MAX - 1)
>+
> #endif /* _UAPI_LINUX_IF_BRIDGE_H */
>diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>index 0970cb4b1b88..4a48f3ce862c 100644
>--- a/include/uapi/linux/rtnetlink.h
>+++ b/include/uapi/linux/rtnetlink.h
>@@ -192,6 +192,11 @@ enum {
> RTM_GETTUNNEL,
> #define RTM_GETTUNNEL RTM_GETTUNNEL
>
>+ RTM_GETMST = 124 + 2,
>+#define RTM_GETMST RTM_GETMST
>+ RTM_SETMST,
>+#define RTM_SETMST RTM_SETMST
>+

I think you should also update selinux (see nlmsgtab.c)
I'll think about this one, if there is some nice way to avoid the new rtm types.

> __RTM_MAX,
> #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
> };
>diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
>index f3b8e279b85c..8dea8e7257fd 100644
>--- a/net/bridge/br_mst.c
>+++ b/net/bridge/br_mst.c
>@@ -120,3 +120,247 @@ int br_mst_set_enabled(struct net_bridge *br, unsigned long val)
> br_opt_toggle(br, BROPT_MST_ENABLED, !!val);
> return 0;
> }
>+
>+static int br_mst_nl_get_one(struct net_bridge_port *p, struct sk_buff *skb,
>+ struct netlink_callback *cb)
>+{
>+ struct net_bridge_vlan_group *vg = nbp_vlan_group(p);
>+ int err = 0, idx = 0, s_idx = cb->args[1];
>+ struct net_bridge_vlan *v;
>+ struct br_port_msg *bpm;
>+ struct nlmsghdr *nlh;
>+ struct nlattr *nest;
>+ unsigned long *seen;
>+

Reverse xmas tree

>+ nlh = nlmsg_put(skb, NETLINK_CB(cb->skb).portid, cb->nlh->nlmsg_seq,
>+ RTM_GETMST, sizeof(*bpm), NLM_F_MULTI);
>+ if (!nlh)
>+ return -EMSGSIZE;
>+
>+ bpm = nlmsg_data(nlh);
>+ memset(bpm, 0, sizeof(*bpm));
>+ bpm->ifindex = p->dev->ifindex;
>+
>+ seen = bitmap_zalloc(VLAN_N_VID, 0);
>+ if (!seen)
>+ return -ENOMEM;
>+
>+ list_for_each_entry(v, &vg->vlan_list, vlist) {
>+ if (test_bit(v->brvlan->msti, seen))
>+ continue;
>+
>+ if (idx < s_idx)
>+ goto skip;
>+
>+ nest = nla_nest_start_noflag(skb, BRIDGE_MST_ENTRY);
>+ if (!nest ||
>+ nla_put_u16(skb, BRIDGE_MST_ENTRY_MSTI, v->brvlan->msti) ||
>+ nla_put_u8(skb, BRIDGE_MST_ENTRY_STATE, v->state)) {
>+ err = -EMSGSIZE;
>+ break;
>+ }
>+ nla_nest_end(skb, nest);
>+
>+ set_bit(v->brvlan->msti, seen);
>+skip:
>+ idx++;
>+ }
>+
>+ kfree(seen);
>+ nlmsg_end(skb, nlh);
>+ return err;
>+}
>+
>+static struct net_bridge_port *br_mst_nl_get_parse(struct net *net,
>+ struct netlink_callback *cb)
>+{
>+ struct netlink_ext_ack *extack = cb->extack;
>+ const struct nlmsghdr *nlh = cb->nlh;
>+ struct net_bridge_port *p;
>+ struct br_port_msg *bpm;
>+ struct net_device *dev;
>+
>+ if (nlh->nlmsg_len < nlmsg_msg_size(sizeof(*bpm))) {
>+ NL_SET_ERR_MSG_MOD(extack, "Invalid header for mst get request");
>+ return ERR_PTR(-EINVAL);
>+ }
>+
>+ if (nlmsg_attrlen(nlh, sizeof(*bpm))) {
>+ NL_SET_ERR_MSG(extack, "Invalid data after header in mst get request");
>+ return ERR_PTR(-EINVAL);
>+ }
>+
>+ bpm = nlmsg_data(nlh);
>+ if (!bpm->ifindex)
>+ return NULL;
>+
>+ dev = __dev_get_by_index(net, bpm->ifindex);
>+ if (!dev)
>+ return ERR_PTR(-ENODEV);
>+
>+ if (!netif_is_bridge_port(dev)) {
>+ NL_SET_ERR_MSG_MOD(extack, "The device is not a valid bridge port");
>+ return ERR_PTR(-EINVAL);
>+ }
>+
>+ p = br_port_get_rtnl(dev);
>+ if (WARN_ON(!p))
>+ return ERR_PTR(-ENODEV);
>+
>+ if (!br_opt_get(p->br, BROPT_MST_ENABLED)) {
>+ NL_SET_ERR_MSG_MOD(extack, "Can't query MST state when MST is disabled");
>+ return ERR_PTR(-EINVAL);
>+ }
>+
>+ return p;
>+}
>+
>+static int br_mst_nl_get(struct sk_buff *skb, struct netlink_callback *cb)
>+{
>+ int err = 0, idx = 0, s_idx = cb->args[0];
>+ struct net *net = sock_net(skb->sk);
>+ struct net_bridge_port *p;
>+ struct net_device *dev;
>+
>+ p = br_mst_nl_get_parse(net, cb);
>+ if (IS_ERR(p))
>+ return PTR_ERR(p);
>+
>+ if (p) {
>+ err = br_mst_nl_get_one(p, skb, cb);
>+ if (err != -EMSGSIZE)
>+ return err;
>+ } else {
>+ for_each_netdev(net, dev) {
>+ if (!netif_is_bridge_port(dev))
>+ continue;
>+
>+ if (idx < s_idx)
>+ goto skip;
>+
>+ p = br_port_get_rtnl(dev);
>+ if (WARN_ON(!p))
>+ return -ENODEV;
>+
>+ err = br_mst_nl_get_one(p, skb, cb);
>+ if (err == -EMSGSIZE)
>+ break;
>+skip:
>+ idx++;
>+ }
>+ }
>+
>+ cb->args[0] = idx;
>+ return skb->len;
>+}
>+
>+static const struct nla_policy br_mst_nl_policy[BRIDGE_MST_ENTRY_MAX + 1] = {
>+ [BRIDGE_MST_ENTRY_MSTI] = NLA_POLICY_RANGE(NLA_U16,
>+ 1, /* 0 reserved for CST */
>+ VLAN_N_VID - 1),
>+ [BRIDGE_MST_ENTRY_STATE] = NLA_POLICY_RANGE(NLA_U8,
>+ BR_STATE_DISABLED,
>+ BR_STATE_BLOCKING),
>+};
>+
>+static int br_mst_nl_set_one(struct net_bridge_port *p,
>+ const struct nlattr *attr,
>+ struct netlink_ext_ack *extack)
>+{
>+ struct nlattr *tb[BRIDGE_MST_ENTRY_MAX + 1];
>+ u16 msti;
>+ u8 state;
>+ int err;
>+
>+ err = nla_parse_nested(tb, BRIDGE_MST_ENTRY_MAX, attr,
>+ br_mst_nl_policy, extack);
>+ if (err)
>+ return err;
>+
>+ if (!tb[BRIDGE_MST_ENTRY_MSTI]) {
>+ NL_SET_ERR_MSG_MOD(extack, "MSTI not specified");
>+ return -EINVAL;
>+ }
>+
>+ if (!tb[BRIDGE_MST_ENTRY_STATE]) {
>+ NL_SET_ERR_MSG_MOD(extack, "State not specified");
>+ return -EINVAL;
>+ }
>+
>+ msti = nla_get_u16(tb[BRIDGE_MST_ENTRY_MSTI]);
>+ state = nla_get_u8(tb[BRIDGE_MST_ENTRY_STATE]);
>+
>+ br_mst_set_state(p, msti, state);
>+ return 0;
>+}
>+
>+static int br_mst_nl_set(struct sk_buff *skb, struct nlmsghdr *nlh,
>+ struct netlink_ext_ack *extack)
>+{
>+ struct net *net = sock_net(skb->sk);
>+ struct net_bridge_port *p;
>+ struct br_port_msg *bpm;
>+ struct net_device *dev;
>+ struct nlattr *attr;
>+ int err, msts = 0;
>+ int rem;
>+
>+ err = nlmsg_parse(nlh, sizeof(*bpm), NULL, BRIDGE_MST_MAX, NULL,
>+ extack);
>+ if (err < 0)
>+ return err;
>+
>+ bpm = nlmsg_data(nlh);
>+ dev = __dev_get_by_index(net, bpm->ifindex);
>+ if (!dev)
>+ return -ENODEV;
>+
>+ if (!netif_is_bridge_port(dev)) {
>+ NL_SET_ERR_MSG_MOD(extack, "The device is not a valid bridge port");
>+ return -EINVAL;
>+ }
>+
>+ p = br_port_get_rtnl(dev);
>+ if (WARN_ON(!p))
>+ return -ENODEV;
>+
>+ if (!br_opt_get(p->br, BROPT_MST_ENABLED)) {
>+ NL_SET_ERR_MSG_MOD(extack, "Can't modify MST state when MST is disabled");
>+ return -EBUSY;
>+ }
>+
>+ nlmsg_for_each_attr(attr, nlh, sizeof(*bpm), rem) {
>+ switch (nla_type(attr)) {
>+ case BRIDGE_MST_ENTRY:
>+ err = br_mst_nl_set_one(p, attr, extack);
>+ break;
>+ default:
>+ continue;
>+ }
>+
>+ msts++;
>+ if (err)
>+ break;
>+ }
>+
>+ if (!msts) {
>+ NL_SET_ERR_MSG_MOD(extack, "Found no MST entries to process");
>+ err = -EINVAL;
>+ }
>+
>+ return err;
>+}
>+
>+void br_mst_rtnl_init(void)
>+{
>+ rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_GETMST, NULL,
>+ br_mst_nl_get, 0);
>+ rtnl_register_module(THIS_MODULE, PF_BRIDGE, RTM_SETMST,
>+ br_mst_nl_set, NULL, 0);
>+}
>+
>+void br_mst_rtnl_uninit(void)
>+{
>+ rtnl_unregister(PF_BRIDGE, RTM_SETMST);
>+ rtnl_unregister(PF_BRIDGE, RTM_GETMST);
>+}
>diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>index a17a0fe25a58..6d70d6f9cf17 100644
>--- a/net/bridge/br_netlink.c
>+++ b/net/bridge/br_netlink.c
>@@ -1813,6 +1813,7 @@ int __init br_netlink_init(void)
>
> br_mdb_init();
> br_vlan_rtnl_init();
>+ br_mst_rtnl_init();
> rtnl_af_register(&br_af_ops);
>
> err = rtnl_link_register(&br_link_ops);
>@@ -1824,6 +1825,7 @@ int __init br_netlink_init(void)
> out_af:
> rtnl_af_unregister(&br_af_ops);
> br_mdb_uninit();
>+ br_mst_rtnl_uninit();
> return err;
> }
>
>@@ -1831,6 +1833,7 @@ void br_netlink_fini(void)
> {
> br_mdb_uninit();
> br_vlan_rtnl_uninit();
>+ br_mst_rtnl_uninit();
> rtnl_af_unregister(&br_af_ops);
> rtnl_link_unregister(&br_link_ops);
> }
>diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
>index 63601043abca..7882a65ffb43 100644
>--- a/net/bridge/br_private.h
>+++ b/net/bridge/br_private.h
>@@ -1782,6 +1782,8 @@ void br_mst_set_state(struct net_bridge_port *p, u16 msti, u8 state);
> int br_mst_vlan_set_msti(struct net_bridge_vlan *v, u16 msti);
> void br_mst_vlan_init_state(struct net_bridge_vlan *v);
> int br_mst_set_enabled(struct net_bridge *br, unsigned long val);
>+void br_mst_rtnl_init(void);
>+void br_mst_rtnl_uninit(void);
> #else
> static inline bool br_mst_is_enabled(struct net_bridge *br)
> {
>@@ -1790,6 +1792,8 @@ static inline bool br_mst_is_enabled(struct net_bridge *br)
>
> static inline void br_mst_set_state(struct net_bridge_port *p,
> u16 msti, u8 state) {}
>+static inline void br_mst_rtnl_init(void) {}
>+static inline void br_mst_rtnl_uninit(void) {}
> #endif
>
> struct nf_br_ops {

2022-03-02 10:10:52

by Pavel Šimerda

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 00/10] net: bridge: Multiple Spanning Trees



On 01/03/2022 22:20, Tobias Waldekranz wrote:
> On Tue, Mar 01, 2022 at 18:21, Vladimir Oltean <[email protected]> wrote:
>> Hi Tobias,
>>
>> On Tue, Mar 01, 2022 at 11:03:11AM +0100, Tobias Waldekranz wrote:
>>> A proposal for the corresponding iproute2 interface is available here:
>>>
>>> https://github.com/wkz/iproute2/tree/mst
>>
>> Please pardon my ignorance. Is there a user-mode STP protocol application
>> that supports MSTP, and that you've tested these patches with?
>> I'd like to give it a try.
>
> I see that Stephen has already pointed you to mstpd in a sibling
> message.
>
> It is important to note though, that AFAIK mstpd does not actually
> support MSTP on a vanilla Linux system. The protocol implementation is
> in place, and they have a plugin architecture that makes it easy for people
> to hook it up to various userspace SDKs and whatnot, but you can't use
> it with a regular bridge.
>
> A colleague of mine has been successfully running a modified version of
> mstpd which was tailored for v1 of this series (RFC). But I do not
> believe he has had the time to rework it for v2. That should mostly be a
> matter of removing code though, as v2 allows you to manage the MSTIs
> directly, rather than having to translate it to an associated VLAN.

Hello,

we experimented with mstpd with pretty reasonable kernel modifications. Vanilla kernel wasn't capable of transferring the correct mapping from mstpd to the hardware due to lack of vlan2msti mapping and per-msti port state (rather than just per-vlan port state).

https://github.com/mstpd/mstpd/pull/112

I didn't pursue this for a while, though.

Regards,
Pavel

2022-03-02 23:19:08

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 00/10] net: bridge: Multiple Spanning Trees

On Tue, 1 Mar 2022 18:21:42 +0200
Vladimir Oltean <[email protected]> wrote:

> Hi Tobias,
>
> On Tue, Mar 01, 2022 at 11:03:11AM +0100, Tobias Waldekranz wrote:
> > A proposal for the corresponding iproute2 interface is available here:
> >
> > https://github.com/wkz/iproute2/tree/mst
>
> Please pardon my ignorance. Is there a user-mode STP protocol application
> that supports MSTP, and that you've tested these patches with?
> I'd like to give it a try.

https://github.com/mstpd/mstpd

2022-03-03 23:04:07

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 10/10] net: dsa: mv88e6xxx: MST Offloading

On Tue, Mar 01, 2022 at 11:03:21AM +0100, Tobias Waldekranz wrote:
> Allocate a SID in the STU for each MSTID in use by a bridge and handle
> the mapping of MSTIDs to VLANs using the SID field of each VTU entry.
>
> Signed-off-by: Tobias Waldekranz <[email protected]>
> ---
> drivers/net/dsa/mv88e6xxx/chip.c | 178 +++++++++++++++++++++++++++++++
> drivers/net/dsa/mv88e6xxx/chip.h | 13 +++
> 2 files changed, 191 insertions(+)
>
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> index c14a62aa6a6c..4fb4ec1dff79 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> @@ -1818,6 +1818,137 @@ static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
> return mv88e6xxx_stu_loadpurge(chip, &stu);
> }
>
> +static int mv88e6xxx_sid_new(struct mv88e6xxx_chip *chip, u8 *sid)
> +{
> + DECLARE_BITMAP(busy, MV88E6XXX_N_SID) = { 0 };
> + struct mv88e6xxx_mst *mst;
> +
> + set_bit(0, busy);
> +
> + list_for_each_entry(mst, &chip->msts, node) {
> + set_bit(mst->stu.sid, busy);
> + }
> +
> + *sid = find_first_zero_bit(busy, MV88E6XXX_N_SID);
> +
> + return (*sid >= mv88e6xxx_max_sid(chip)) ? -ENOSPC : 0;
> +}
> +
> +static int mv88e6xxx_sid_put(struct mv88e6xxx_chip *chip, u8 sid)
> +{
> + struct mv88e6xxx_mst *mst, *tmp;
> + int err = 0;
> +
> + list_for_each_entry_safe(mst, tmp, &chip->msts, node) {
> + if (mst->stu.sid == sid) {
> + if (refcount_dec_and_test(&mst->refcnt)) {
> + mst->stu.valid = false;
> + err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);

It is interesting what to do if this fails. Possibly not this, because
the entry remains in hardware but not in software.

> + list_del(&mst->node);
> + kfree(mst);
> + }
> +
> + return err;
> + }
> + }
> +
> + return -ENOENT;
> +}
> +
> +static int mv88e6xxx_sid_get(struct mv88e6xxx_chip *chip, struct net_device *br,
> + u16 msti, u8 *sid)
> +{
> + struct mv88e6xxx_mst *mst;
> + int err, i;
> +
> + if (!br)
> + return 0;

Is this condition possible?

> +
> + if (!mv88e6xxx_has_stu(chip))
> + return -EOPNOTSUPP;
> +
> + list_for_each_entry(mst, &chip->msts, node) {
> + if (mst->br == br && mst->msti == msti) {
> + refcount_inc(&mst->refcnt);
> + *sid = mst->stu.sid;
> + return 0;
> + }
> + }
> +
> + err = mv88e6xxx_sid_new(chip, sid);
> + if (err)
> + return err;
> +
> + mst = kzalloc(sizeof(*mst), GFP_KERNEL);
> + if (!mst)
> + return -ENOMEM;

This leaks the new SID.

> +
> + INIT_LIST_HEAD(&mst->node);
> + refcount_set(&mst->refcnt, 1);
> + mst->br = br;
> + mst->msti = msti;
> + mst->stu.valid = true;
> + mst->stu.sid = *sid;
> +
> + /* The bridge starts out all ports in the disabled state. But
> + * a STU state of disabled means to go by the port-global
> + * state. So we set all user port's initial state to blocking,
> + * to match the bridge's behavior.
> + */
> + for (i = 0; i < mv88e6xxx_num_ports(chip); i++)
> + mst->stu.state[i] = dsa_is_user_port(chip->ds, i) ?
> + MV88E6XXX_PORT_CTL0_STATE_BLOCKING :
> + MV88E6XXX_PORT_CTL0_STATE_DISABLED;
> +
> + list_add_tail(&mst->node, &chip->msts);
> + return mv88e6xxx_stu_loadpurge(chip, &mst->stu);

And this doesn't behave too well on failure (the MSTID exists in
software but not in hardware).

> +}
> +
> +static int mv88e6xxx_port_mst_state_set(struct dsa_switch *ds, int port,
> + const struct switchdev_mst_state *st)
> +{
> + struct dsa_port *dp = dsa_to_port(ds, port);
> + struct mv88e6xxx_chip *chip = ds->priv;
> + struct mv88e6xxx_mst *mst;
> + u8 state;
> + int err;
> +
> + if (!mv88e6xxx_has_stu(chip))
> + return -EOPNOTSUPP;
> +
> + switch (st->state) {
> + case BR_STATE_DISABLED:
> + case BR_STATE_BLOCKING:
> + case BR_STATE_LISTENING:
> + state = MV88E6XXX_PORT_CTL0_STATE_BLOCKING;
> + break;
> + case BR_STATE_LEARNING:
> + state = MV88E6XXX_PORT_CTL0_STATE_LEARNING;
> + break;
> + case BR_STATE_FORWARDING:
> + state = MV88E6XXX_PORT_CTL0_STATE_FORWARDING;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + list_for_each_entry(mst, &chip->msts, node) {
> + if (mst->br == dsa_port_bridge_dev_get(dp) &&
> + mst->msti == st->msti) {
> + if (mst->stu.state[port] == state)
> + return 0;
> +
> + mst->stu.state[port] = state;
> + mv88e6xxx_reg_lock(chip);
> + err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
> + mv88e6xxx_reg_unlock(chip);
> + return err;
> + }
> + }
> +
> + return -ENOENT;
> +}
> +
> static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
> u16 vid)
> {
> @@ -2437,6 +2568,12 @@ static int mv88e6xxx_port_vlan_leave(struct mv88e6xxx_chip *chip,
> if (err)
> return err;
>
> + if (!vlan.valid && vlan.sid) {
> + err = mv88e6xxx_sid_put(chip, vlan.sid);
> + if (err)
> + return err;
> + }
> +
> return mv88e6xxx_g1_atu_remove(chip, vlan.fid, port, false);
> }
>
> @@ -2482,6 +2619,44 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
> return err;
> }
>
> +static int mv88e6xxx_vlan_msti_set(struct dsa_switch *ds,
> + const struct switchdev_attr *attr)
> +{
> + const struct switchdev_vlan_attr *vattr = &attr->u.vlan_attr;
> + struct mv88e6xxx_chip *chip = ds->priv;
> + struct mv88e6xxx_vtu_entry vlan;
> + u8 new_sid;
> + int err;
> +
> + mv88e6xxx_reg_lock(chip);
> +
> + err = mv88e6xxx_vtu_get(chip, vattr->vid, &vlan);
> + if (err)
> + goto unlock;
> +
> + if (!vlan.valid) {
> + err = -EINVAL;
> + goto unlock;
> + }
> +
> + err = mv88e6xxx_sid_get(chip, attr->orig_dev, vattr->msti, &new_sid);
> + if (err)
> + goto unlock;
> +
> + if (vlan.sid) {
> + err = mv88e6xxx_sid_put(chip, vlan.sid);
> + if (err)
> + goto unlock;
> + }
> +
> + vlan.sid = new_sid;
> + err = mv88e6xxx_vtu_loadpurge(chip, &vlan);

Maybe you could move mv88e6xxx_sid_put() after this succeeds?

> +
> +unlock:
> + mv88e6xxx_reg_unlock(chip);
> + return err;
> +}
> +
> static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
> const unsigned char *addr, u16 vid,
> struct dsa_db db)
> @@ -6008,6 +6183,7 @@ static struct mv88e6xxx_chip *mv88e6xxx_alloc_chip(struct device *dev)
> mutex_init(&chip->reg_lock);
> INIT_LIST_HEAD(&chip->mdios);
> idr_init(&chip->policies);
> + INIT_LIST_HEAD(&chip->msts);
>
> return chip;
> }
> @@ -6540,10 +6716,12 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
> .port_pre_bridge_flags = mv88e6xxx_port_pre_bridge_flags,
> .port_bridge_flags = mv88e6xxx_port_bridge_flags,
> .port_stp_state_set = mv88e6xxx_port_stp_state_set,
> + .port_mst_state_set = mv88e6xxx_port_mst_state_set,
> .port_fast_age = mv88e6xxx_port_fast_age,
> .port_vlan_filtering = mv88e6xxx_port_vlan_filtering,
> .port_vlan_add = mv88e6xxx_port_vlan_add,
> .port_vlan_del = mv88e6xxx_port_vlan_del,
> + .vlan_msti_set = mv88e6xxx_vlan_msti_set,
> .port_fdb_add = mv88e6xxx_port_fdb_add,
> .port_fdb_del = mv88e6xxx_port_fdb_del,
> .port_fdb_dump = mv88e6xxx_port_fdb_dump,
> diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
> index 6d4daa24d3e5..6a0b66354e1d 100644
> --- a/drivers/net/dsa/mv88e6xxx/chip.h
> +++ b/drivers/net/dsa/mv88e6xxx/chip.h
> @@ -297,6 +297,16 @@ struct mv88e6xxx_region_priv {
> enum mv88e6xxx_region_id id;
> };
>
> +struct mv88e6xxx_mst {
> + struct list_head node;
> +
> + refcount_t refcnt;
> + struct net_device *br;
> + u16 msti;
> +
> + struct mv88e6xxx_stu_entry stu;
> +};
> +
> struct mv88e6xxx_chip {
> const struct mv88e6xxx_info *info;
>
> @@ -397,6 +407,9 @@ struct mv88e6xxx_chip {
>
> /* devlink regions */
> struct devlink_region *regions[_MV88E6XXX_REGION_MAX];
> +
> + /* Bridge MST to SID mappings */
> + struct list_head msts;
> };
>
> struct mv88e6xxx_bus_ops {
> --
> 2.25.1
>

2022-03-04 03:52:03

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Tue, Mar 01, 2022 at 11:03:18AM +0100, Tobias Waldekranz wrote:
> Add the usual trampoline functionality from the generic DSA layer down
> to the drivers for MST state changes.
>
> Signed-off-by: Tobias Waldekranz <[email protected]>
> ---
> include/net/dsa.h | 2 ++
> net/dsa/dsa_priv.h | 2 ++
> net/dsa/port.c | 30 ++++++++++++++++++++++++++++++
> net/dsa/slave.c | 6 ++++++
> 4 files changed, 40 insertions(+)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index cc8acb01bd9b..096e6e3a8e1e 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -943,6 +943,8 @@ struct dsa_switch_ops {
> struct dsa_bridge bridge);
> void (*port_stp_state_set)(struct dsa_switch *ds, int port,
> u8 state);
> + int (*port_mst_state_set)(struct dsa_switch *ds, int port,
> + const struct switchdev_mst_state *state);
> void (*port_fast_age)(struct dsa_switch *ds, int port);
> int (*port_pre_bridge_flags)(struct dsa_switch *ds, int port,
> struct switchdev_brport_flags flags,
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 87ec0697e92e..a620e079ebc5 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -198,6 +198,8 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
> void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp,
> const struct dsa_device_ops *tag_ops);
> int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age);
> +int dsa_port_set_mst_state(struct dsa_port *dp,
> + const struct switchdev_mst_state *state);
> int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
> int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
> void dsa_port_disable_rt(struct dsa_port *dp);
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index 5f45cb7d70ba..26cfbc8ab499 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -108,6 +108,36 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age)
> return 0;
> }
>
> +int dsa_port_set_mst_state(struct dsa_port *dp,
> + const struct switchdev_mst_state *state)
> +{
> + struct dsa_switch *ds = dp->ds;
> + int err, port = dp->index;
> +
> + if (!ds->ops->port_mst_state_set)
> + return -EOPNOTSUPP;
> +
> + err = ds->ops->port_mst_state_set(ds, port, state);
> + if (err)
> + return err;
> +
> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
> + switch (state->state) {
> + case BR_STATE_DISABLED:
> + case BR_STATE_BLOCKING:
> + case BR_STATE_LISTENING:
> + /* Ideally we would only fast age entries
> + * belonging to VLANs controlled by this
> + * MST.
> + */
> + dsa_port_fast_age(dp);

Does mv88e6xxx support this? If it does, you might just as well
introduce another variant of ds->ops->port_fast_age() for an msti.

And since it is new code, you could require that drivers _do_ support
configuring learning before they could support MSTP. After all, we don't
want to keep legacy mechanisms in place forever.

> + break;
> + }
> + }
> +
> + return 0;
> +}
> +
> static void dsa_port_set_state_now(struct dsa_port *dp, u8 state,
> bool do_fast_age)
> {
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index c6ffcd782b5a..32b006a5b778 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -288,6 +288,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev, const void *ctx,
>
> ret = dsa_port_set_state(dp, attr->u.stp_state, true);
> break;
> + case SWITCHDEV_ATTR_ID_PORT_MST_STATE:
> + if (!dsa_port_offloads_bridge_port(dp, attr->orig_dev))
> + return -EOPNOTSUPP;
> +
> + ret = dsa_port_set_mst_state(dp, &attr->u.mst_state);
> + break;
> case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
> if (!dsa_port_offloads_bridge_dev(dp, attr->orig_dev))
> return -EOPNOTSUPP;
> --
> 2.25.1
>

2022-03-04 04:37:00

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 06/10] net: dsa: Pass VLAN MSTI migration notifications to driver

On Tue, Mar 01, 2022 at 11:03:17AM +0100, Tobias Waldekranz wrote:
> Add the usual trampoline functionality from the generic DSA layer down
> to the drivers for VLAN MSTI migrations.
>
> Signed-off-by: Tobias Waldekranz <[email protected]>
> ---
> include/net/dsa.h | 3 +++
> net/dsa/dsa_priv.h | 1 +
> net/dsa/port.c | 10 ++++++++++
> net/dsa/slave.c | 6 ++++++
> 4 files changed, 20 insertions(+)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index cfedcfb86350..cc8acb01bd9b 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -962,6 +962,9 @@ struct dsa_switch_ops {
> struct netlink_ext_ack *extack);
> int (*port_vlan_del)(struct dsa_switch *ds, int port,
> const struct switchdev_obj_port_vlan *vlan);
> + int (*vlan_msti_set)(struct dsa_switch *ds,
> + const struct switchdev_attr *attr);

I would rather pass the struct switchdev_vlan_attr and the orig_dev
(bridge) as separate arguments here. Or even the struct dsa_bridge, for
consistency to the API changes for database isolation.

> +
> /*
> * Forwarding database
> */
> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
> index 07c0ad52395a..87ec0697e92e 100644
> --- a/net/dsa/dsa_priv.h
> +++ b/net/dsa/dsa_priv.h
> @@ -217,6 +217,7 @@ int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
> struct netlink_ext_ack *extack);
> bool dsa_port_skip_vlan_configuration(struct dsa_port *dp);
> int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock);
> +int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr);
> int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
> bool targeted_match);
> int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
> diff --git a/net/dsa/port.c b/net/dsa/port.c
> index d9da425a17fb..5f45cb7d70ba 100644
> --- a/net/dsa/port.c
> +++ b/net/dsa/port.c
> @@ -778,6 +778,16 @@ int dsa_port_bridge_flags(struct dsa_port *dp,
> return 0;
> }
>
> +int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr)
> +{
> + struct dsa_switch *ds = dp->ds;
> +
> + if (!ds->ops->vlan_msti_set)
> + return -EOPNOTSUPP;
> +
> + return ds->ops->vlan_msti_set(ds, attr);

I guess this doesn't need to be a cross-chip notifier event for all
switches, because replication to all bridge ports is handled by
switchdev_handle_port_attr_set(). Ok. But isn't it called too many times
per switch?

> +}
> +
> int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
> bool targeted_match)
> {
> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
> index 089616206b11..c6ffcd782b5a 100644
> --- a/net/dsa/slave.c
> +++ b/net/dsa/slave.c
> @@ -314,6 +314,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev, const void *ctx,
>
> ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, extack);
> break;
> + case SWITCHDEV_ATTR_ID_VLAN_MSTI:
> + if (!dsa_port_offloads_bridge_dev(dp, attr->orig_dev))
> + return -EOPNOTSUPP;
> +
> + ret = dsa_port_vlan_msti(dp, attr);
> + break;
> default:
> ret = -EOPNOTSUPP;
> break;
> --
> 2.25.1
>

2022-03-04 08:27:23

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/10] net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations

On Tue, Mar 01, 2022 at 11:03:15AM +0100, Tobias Waldekranz wrote:
> Whenever a VLAN moves to a new MSTI, send a switchdev notification so
> that switchdevs can...
>
> ...either refuse the migration if the hardware does not support
> offloading of MST...
>
> ..or track a bridge's VID to MSTI mapping when offloading is
> supported.
>
> Signed-off-by: Tobias Waldekranz <[email protected]>
> ---
> include/net/switchdev.h | 10 +++++++
> net/bridge/br_mst.c | 15 +++++++++++
> net/bridge/br_switchdev.c | 57 +++++++++++++++++++++++++++++++++++++++
> 3 files changed, 82 insertions(+)
>
> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> index 3e424d40fae3..39e57aa5005a 100644
> --- a/include/net/switchdev.h
> +++ b/include/net/switchdev.h
> @@ -28,6 +28,7 @@ enum switchdev_attr_id {
> SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
> SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
> SWITCHDEV_ATTR_ID_MRP_PORT_ROLE,
> + SWITCHDEV_ATTR_ID_VLAN_MSTI,
> };
>
> struct switchdev_brport_flags {
> @@ -35,6 +36,14 @@ struct switchdev_brport_flags {
> unsigned long mask;
> };
>
> +struct switchdev_vlan_attr {
> + u16 vid;
> +
> + union {
> + u16 msti;
> + };

Do you see other VLAN attributes that would be added in the future, such
as to justify making this a single-element union from the get-go?
Anyway if that is the case, we're lacking an id for the attribute type,
so we'd end up needing to change drivers when a second union element
appears. Otherwise they'd all expect an u16 msti.

> +};
> +
> struct switchdev_attr {
> struct net_device *orig_dev;
> enum switchdev_attr_id id;
> @@ -50,6 +59,7 @@ struct switchdev_attr {
> u16 vlan_protocol; /* BRIDGE_VLAN_PROTOCOL */
> bool mc_disabled; /* MC_DISABLED */
> u8 mrp_port_role; /* MRP_PORT_ROLE */
> + struct switchdev_vlan_attr vlan_attr; /* VLAN_* */
> } u;
> };
>
> diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
> index 8dea8e7257fd..aba603675165 100644
> --- a/net/bridge/br_mst.c
> +++ b/net/bridge/br_mst.c
> @@ -7,6 +7,7 @@
> */
>
> #include <linux/kernel.h>
> +#include <net/switchdev.h>
>
> #include "br_private.h"
>
> @@ -65,9 +66,23 @@ static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti)
>
> int br_mst_vlan_set_msti(struct net_bridge_vlan *mv, u16 msti)
> {
> + struct switchdev_attr attr = {
> + .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
> + .flags = SWITCHDEV_F_DEFER,

Is the bridge spinlock held (atomic context), or otherwise why is
SWITCHDEV_F_DEFER needed here?

> + .orig_dev = mv->br->dev,
> + .u.vlan_attr = {
> + .vid = mv->vid,
> + .msti = msti,
> + },
> + };
> struct net_bridge_vlan_group *vg;
> struct net_bridge_vlan *pv;
> struct net_bridge_port *p;
> + int err;
> +
> + err = switchdev_port_attr_set(mv->br->dev, &attr, NULL);

Treating a "VLAN attribute" as a "port attribute of the bridge" is
pushing the taxonomy just a little, but I don't have a better suggestion.

> + if (err && err != -EOPNOTSUPP)
> + return err;
>
> mv->msti = msti;
>
> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
> index 6f6a70121a5e..160d7659f88a 100644
> --- a/net/bridge/br_switchdev.c
> +++ b/net/bridge/br_switchdev.c
> @@ -428,6 +428,57 @@ static int br_switchdev_vlan_replay(struct net_device *br_dev,
> return 0;
> }
>
> +static int br_switchdev_mst_replay(struct net_device *br_dev,
> + const void *ctx, bool adding,
> + struct notifier_block *nb,
> + struct netlink_ext_ack *extack)

"bool adding" is unused, and replaying the VLAN to MSTI associations
before deleting them makes little sense anyway.

I understand the appeal of symmetry, so maybe put an

if (adding) {
err = br_switchdev_vlan_attr_replay(...);
if (err && err != -EOPNOTSUPP)
return err;
}

at the end of br_switchdev_vlan_replay()?

> +{
> + struct switchdev_notifier_port_attr_info attr_info = {
> + .info = {
> + .dev = br_dev,
> + .extack = extack,
> + .ctx = ctx,
> + },
> + };
> + struct net_bridge *br = netdev_priv(br_dev);
> + struct net_bridge_vlan_group *vg;
> + struct net_bridge_vlan *v;
> + int err;
> +
> + ASSERT_RTNL();
> +
> + if (!nb)
> + return 0;
> +
> + if (!netif_is_bridge_master(br_dev))
> + return -EINVAL;
> +
> + vg = br_vlan_group(br);
> +
> + list_for_each_entry(v, &vg->vlan_list, vlist) {
> + struct switchdev_attr attr = {
> + .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
> + .flags = SWITCHDEV_F_DEFER,

I don't think SWITCHDEV_F_DEFER has any effect on a replay.

> + .orig_dev = br_dev,
> + .u.vlan_attr = {
> + .vid = v->vid,
> + .msti = v->msti,
> + }
> + };
> +
> + if (!v->msti)
> + continue;
> +
> + attr_info.attr = &attr;
> + err = nb->notifier_call(nb, SWITCHDEV_PORT_ATTR_SET, &attr_info);
> + err = notifier_to_errno(err);
> + if (err)
> + return err;
> + }
> +
> + return 0;
> +}
> +
> #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
> struct br_switchdev_mdb_complete_info {
> struct net_bridge_port *port;
> @@ -695,6 +746,10 @@ static int nbp_switchdev_sync_objs(struct net_bridge_port *p, const void *ctx,
> if (err && err != -EOPNOTSUPP)
> return err;
>
> + err = br_switchdev_mst_replay(br_dev, ctx, true, blocking_nb, extack);
> + if (err && err != -EOPNOTSUPP)
> + return err;
> +
> err = br_switchdev_mdb_replay(br_dev, dev, ctx, true, blocking_nb,
> extack);
> if (err && err != -EOPNOTSUPP)
> @@ -719,6 +774,8 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p,
>
> br_switchdev_mdb_replay(br_dev, dev, ctx, false, blocking_nb, NULL);
>
> + br_switchdev_mst_replay(br_dev, ctx, false, blocking_nb, NULL);
> +
> br_switchdev_vlan_replay(br_dev, ctx, false, blocking_nb, NULL);
> }
>
> --
> 2.25.1
>

2022-03-07 15:49:06

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 03/10] net: bridge: mst: Support setting and reporting MST port states

On 07/03/2022 17:00, Tobias Waldekranz wrote:
> On Wed, Mar 02, 2022 at 00:19, Nikolay Aleksandrov <[email protected]> wrote:
>> On 1 March 2022 11:03:14 CET, Tobias Waldekranz <[email protected]> wrote:
>>> Make it possible to change the port state in a given MSTI. This is
>>> done through a new netlink interface, since the MSTIs are objects in
>>> their own right. The proposed iproute2 interface would be:
>>>
>>> bridge mst set dev <PORT> msti <MSTI> state <STATE>
>>>
>>> Current states in all applicable MSTIs can also be dumped. The
>>> proposed iproute interface looks like this:
>>>
>>> $ bridge mst
>>> port msti
>>> vb1 0
>>> state forwarding
>>> 100
>>> state disabled
>>> vb2 0
>>> state forwarding
>>> 100
>>> state forwarding
>>>
>>> The preexisting per-VLAN states are still valid in the MST
>>> mode (although they are read-only), and can be queried as usual if one
>>> is interested in knowing a particular VLAN's state without having to
>>> care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
>>> bound to MSTI 100):
>>>
>>> $ bridge -d vlan
>>> port vlan-id
>>> vb1 10
>>> state forwarding mcast_router 1
>>> 20
>>> state disabled mcast_router 1
>>> 30
>>> state disabled mcast_router 1
>>> 40
>>> state forwarding mcast_router 1
>>> vb2 10
>>> state forwarding mcast_router 1
>>> 20
>>> state forwarding mcast_router 1
>>> 30
>>> state forwarding mcast_router 1
>>> 40
>>> state forwarding mcast_router 1
>>>
>>> Signed-off-by: Tobias Waldekranz <[email protected]>
>>> ---
>>> include/uapi/linux/if_bridge.h | 16 +++
>>> include/uapi/linux/rtnetlink.h | 5 +
>>> net/bridge/br_mst.c | 244 +++++++++++++++++++++++++++++++++
>>> net/bridge/br_netlink.c | 3 +
>>> net/bridge/br_private.h | 4 +
>>> 5 files changed, 272 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
>>> index b68016f625b7..784482527861 100644
>>> --- a/include/uapi/linux/if_bridge.h
>>> +++ b/include/uapi/linux/if_bridge.h
>>> @@ -785,4 +785,20 @@ enum {
>>> __BRIDGE_QUERIER_MAX
>>> };
>>> #define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
>>> +
>>> +enum {
>>> + BRIDGE_MST_UNSPEC,
>>> + BRIDGE_MST_ENTRY,
>>> + __BRIDGE_MST_MAX,
>>> +};
>>> +#define BRIDGE_MST_MAX (__BRIDGE_MST_MAX - 1)
>>> +
>>> +enum {
>>> + BRIDGE_MST_ENTRY_UNSPEC,
>>> + BRIDGE_MST_ENTRY_MSTI,
>>> + BRIDGE_MST_ENTRY_STATE,
>>> + __BRIDGE_MST_ENTRY_MAX,
>>> +};
>>> +#define BRIDGE_MST_ENTRY_MAX (__BRIDGE_MST_ENTRY_MAX - 1)
>>> +
>>> #endif /* _UAPI_LINUX_IF_BRIDGE_H */
>>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>>> index 0970cb4b1b88..4a48f3ce862c 100644
>>> --- a/include/uapi/linux/rtnetlink.h
>>> +++ b/include/uapi/linux/rtnetlink.h
>>> @@ -192,6 +192,11 @@ enum {
>>> RTM_GETTUNNEL,
>>> #define RTM_GETTUNNEL RTM_GETTUNNEL
>>>
>>> + RTM_GETMST = 124 + 2,
>>> +#define RTM_GETMST RTM_GETMST
>>> + RTM_SETMST,
>>> +#define RTM_SETMST RTM_SETMST
>>> +
>>
>> I think you should also update selinux (see nlmsgtab.c)
>> I'll think about this one, if there is some nice way to avoid the new rtm types.
>>
>>> __RTM_MAX,
>>> #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
>>> };
>>> diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
>>> index f3b8e279b85c..8dea8e7257fd 100644
>>> --- a/net/bridge/br_mst.c
>>> +++ b/net/bridge/br_mst.c
>>> @@ -120,3 +120,247 @@ int br_mst_set_enabled(struct net_bridge *br, unsigned long val)
>>> br_opt_toggle(br, BROPT_MST_ENABLED, !!val);
>>> return 0;
>>> }
>>> +
>>> +static int br_mst_nl_get_one(struct net_bridge_port *p, struct sk_buff *skb,
>>> + struct netlink_callback *cb)
>>> +{
>>> + struct net_bridge_vlan_group *vg = nbp_vlan_group(p);
>>> + int err = 0, idx = 0, s_idx = cb->args[1];
>>> + struct net_bridge_vlan *v;
>>> + struct br_port_msg *bpm;
>>> + struct nlmsghdr *nlh;
>>> + struct nlattr *nest;
>>> + unsigned long *seen;
>>> +
>>
>> Reverse xmas tree
>
> Both of these lines end at the 28th column. Is there some other
> tiebreaking mechanism that forces the reverse ordering of nest and seen?
>
> In a variable-width font, the nest declaration does appear shorter. I
> remember that you did not have your laptop with you, could that be it?

Ah yes, you're right. :) Sorry for the noise.

2022-03-08 09:45:08

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/10] net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations

On Thu, Mar 03, 2022 at 22:59, Vladimir Oltean <[email protected]> wrote:
> On Tue, Mar 01, 2022 at 11:03:15AM +0100, Tobias Waldekranz wrote:
>> Whenever a VLAN moves to a new MSTI, send a switchdev notification so
>> that switchdevs can...
>>
>> ...either refuse the migration if the hardware does not support
>> offloading of MST...
>>
>> ..or track a bridge's VID to MSTI mapping when offloading is
>> supported.
>>
>> Signed-off-by: Tobias Waldekranz <[email protected]>
>> ---
>> include/net/switchdev.h | 10 +++++++
>> net/bridge/br_mst.c | 15 +++++++++++
>> net/bridge/br_switchdev.c | 57 +++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 82 insertions(+)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index 3e424d40fae3..39e57aa5005a 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -28,6 +28,7 @@ enum switchdev_attr_id {
>> SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
>> SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
>> SWITCHDEV_ATTR_ID_MRP_PORT_ROLE,
>> + SWITCHDEV_ATTR_ID_VLAN_MSTI,
>> };
>>
>> struct switchdev_brport_flags {
>> @@ -35,6 +36,14 @@ struct switchdev_brport_flags {
>> unsigned long mask;
>> };
>>
>> +struct switchdev_vlan_attr {
>> + u16 vid;
>> +
>> + union {
>> + u16 msti;
>> + };
>
> Do you see other VLAN attributes that would be added in the future, such
> as to justify making this a single-element union from the get-go?

I could imagine being able to control things like multicast snooping on
a per-VLAN basis. Being able to act as a multicast router in one VLAN
but not another.

> Anyway if that is the case, we're lacking an id for the attribute type,
> so we'd end up needing to change drivers when a second union element
> appears. Otherwise they'd all expect an u16 msti.

My idea was that `enum switchdev_attr_id` would hold all of that
information. In this example SWITCHDEV_ATTR_ID_VLAN_MSTI, denotes both
that `vlan_attr` is the valid member of `u` and that `msti` is the valid
member of `vlan_attr`. If we add SWITCHDEV_ATTR_ID_VLAN_SNOOPING, that
would point to both `vlan_attr` and a new `bool snooping` in the union.

Do you think we should just have a SWITCHDEV_ATTR_ID_VLAN_ATTR for all
per-VLAN attributes and then have a separate union?

>> +};
>> +
>> struct switchdev_attr {
>> struct net_device *orig_dev;
>> enum switchdev_attr_id id;
>> @@ -50,6 +59,7 @@ struct switchdev_attr {
>> u16 vlan_protocol; /* BRIDGE_VLAN_PROTOCOL */
>> bool mc_disabled; /* MC_DISABLED */
>> u8 mrp_port_role; /* MRP_PORT_ROLE */
>> + struct switchdev_vlan_attr vlan_attr; /* VLAN_* */
>> } u;
>> };
>>
>> diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
>> index 8dea8e7257fd..aba603675165 100644
>> --- a/net/bridge/br_mst.c
>> +++ b/net/bridge/br_mst.c
>> @@ -7,6 +7,7 @@
>> */
>>
>> #include <linux/kernel.h>
>> +#include <net/switchdev.h>
>>
>> #include "br_private.h"
>>
>> @@ -65,9 +66,23 @@ static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti)
>>
>> int br_mst_vlan_set_msti(struct net_bridge_vlan *mv, u16 msti)
>> {
>> + struct switchdev_attr attr = {
>> + .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
>> + .flags = SWITCHDEV_F_DEFER,
>
> Is the bridge spinlock held (atomic context), or otherwise why is
> SWITCHDEV_F_DEFER needed here?

Nope, just copypasta. In fact, it shouldn't be needed when setting the
state either, as you can only change the state via a netlink message. I
will remove it.

>> + .orig_dev = mv->br->dev,
>> + .u.vlan_attr = {
>> + .vid = mv->vid,
>> + .msti = msti,
>> + },
>> + };
>> struct net_bridge_vlan_group *vg;
>> struct net_bridge_vlan *pv;
>> struct net_bridge_port *p;
>> + int err;
>> +
>> + err = switchdev_port_attr_set(mv->br->dev, &attr, NULL);
>
> Treating a "VLAN attribute" as a "port attribute of the bridge" is
> pushing the taxonomy just a little, but I don't have a better suggestion.

Isn't there prior art here? I thought things like VLAN filtering already
worked like this?

>> + if (err && err != -EOPNOTSUPP)
>> + return err;
>>
>> mv->msti = msti;
>>
>> diff --git a/net/bridge/br_switchdev.c b/net/bridge/br_switchdev.c
>> index 6f6a70121a5e..160d7659f88a 100644
>> --- a/net/bridge/br_switchdev.c
>> +++ b/net/bridge/br_switchdev.c
>> @@ -428,6 +428,57 @@ static int br_switchdev_vlan_replay(struct net_device *br_dev,
>> return 0;
>> }
>>
>> +static int br_switchdev_mst_replay(struct net_device *br_dev,
>> + const void *ctx, bool adding,
>> + struct notifier_block *nb,
>> + struct netlink_ext_ack *extack)
>
> "bool adding" is unused, and replaying the VLAN to MSTI associations
> before deleting them makes little sense anyway.
>
> I understand the appeal of symmetry, so maybe put an
>
> if (adding) {
> err = br_switchdev_vlan_attr_replay(...);
> if (err && err != -EOPNOTSUPP)
> return err;
> }
>
> at the end of br_switchdev_vlan_replay()?

Yeah, that is better. Will change.

>> +{
>> + struct switchdev_notifier_port_attr_info attr_info = {
>> + .info = {
>> + .dev = br_dev,
>> + .extack = extack,
>> + .ctx = ctx,
>> + },
>> + };
>> + struct net_bridge *br = netdev_priv(br_dev);
>> + struct net_bridge_vlan_group *vg;
>> + struct net_bridge_vlan *v;
>> + int err;
>> +
>> + ASSERT_RTNL();
>> +
>> + if (!nb)
>> + return 0;
>> +
>> + if (!netif_is_bridge_master(br_dev))
>> + return -EINVAL;
>> +
>> + vg = br_vlan_group(br);
>> +
>> + list_for_each_entry(v, &vg->vlan_list, vlist) {
>> + struct switchdev_attr attr = {
>> + .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
>> + .flags = SWITCHDEV_F_DEFER,
>
> I don't think SWITCHDEV_F_DEFER has any effect on a replay.

Right, will fix.

>> + .orig_dev = br_dev,
>> + .u.vlan_attr = {
>> + .vid = v->vid,
>> + .msti = v->msti,
>> + }
>> + };
>> +
>> + if (!v->msti)
>> + continue;
>> +
>> + attr_info.attr = &attr;
>> + err = nb->notifier_call(nb, SWITCHDEV_PORT_ATTR_SET, &attr_info);
>> + err = notifier_to_errno(err);
>> + if (err)
>> + return err;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> #ifdef CONFIG_BRIDGE_IGMP_SNOOPING
>> struct br_switchdev_mdb_complete_info {
>> struct net_bridge_port *port;
>> @@ -695,6 +746,10 @@ static int nbp_switchdev_sync_objs(struct net_bridge_port *p, const void *ctx,
>> if (err && err != -EOPNOTSUPP)
>> return err;
>>
>> + err = br_switchdev_mst_replay(br_dev, ctx, true, blocking_nb, extack);
>> + if (err && err != -EOPNOTSUPP)
>> + return err;
>> +
>> err = br_switchdev_mdb_replay(br_dev, dev, ctx, true, blocking_nb,
>> extack);
>> if (err && err != -EOPNOTSUPP)
>> @@ -719,6 +774,8 @@ static void nbp_switchdev_unsync_objs(struct net_bridge_port *p,
>>
>> br_switchdev_mdb_replay(br_dev, dev, ctx, false, blocking_nb, NULL);
>>
>> + br_switchdev_mst_replay(br_dev, ctx, false, blocking_nb, NULL);
>> +
>> br_switchdev_vlan_replay(br_dev, ctx, false, blocking_nb, NULL);
>> }
>>
>> --
>> 2.25.1
>>

2022-03-08 18:10:47

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/10] net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations

On Tue, Mar 08, 2022 at 09:01:04AM +0100, Tobias Waldekranz wrote:
> On Thu, Mar 03, 2022 at 22:59, Vladimir Oltean <[email protected]> wrote:
> > On Tue, Mar 01, 2022 at 11:03:15AM +0100, Tobias Waldekranz wrote:
> >> Whenever a VLAN moves to a new MSTI, send a switchdev notification so
> >> that switchdevs can...
> >>
> >> ...either refuse the migration if the hardware does not support
> >> offloading of MST...
> >>
> >> ..or track a bridge's VID to MSTI mapping when offloading is
> >> supported.
> >>
> >> Signed-off-by: Tobias Waldekranz <[email protected]>
> >> ---
> >> include/net/switchdev.h | 10 +++++++
> >> net/bridge/br_mst.c | 15 +++++++++++
> >> net/bridge/br_switchdev.c | 57 +++++++++++++++++++++++++++++++++++++++
> >> 3 files changed, 82 insertions(+)
> >>
> >> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
> >> index 3e424d40fae3..39e57aa5005a 100644
> >> --- a/include/net/switchdev.h
> >> +++ b/include/net/switchdev.h
> >> @@ -28,6 +28,7 @@ enum switchdev_attr_id {
> >> SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
> >> SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
> >> SWITCHDEV_ATTR_ID_MRP_PORT_ROLE,
> >> + SWITCHDEV_ATTR_ID_VLAN_MSTI,
> >> };
> >>
> >> struct switchdev_brport_flags {
> >> @@ -35,6 +36,14 @@ struct switchdev_brport_flags {
> >> unsigned long mask;
> >> };
> >>
> >> +struct switchdev_vlan_attr {
> >> + u16 vid;
> >> +
> >> + union {
> >> + u16 msti;
> >> + };
> >
> > Do you see other VLAN attributes that would be added in the future, such
> > as to justify making this a single-element union from the get-go?
>
> I could imagine being able to control things like multicast snooping on
> a per-VLAN basis. Being able to act as a multicast router in one VLAN
> but not another.
>
> > Anyway if that is the case, we're lacking an id for the attribute type,
> > so we'd end up needing to change drivers when a second union element
> > appears. Otherwise they'd all expect an u16 msti.
>
> My idea was that `enum switchdev_attr_id` would hold all of that
> information. In this example SWITCHDEV_ATTR_ID_VLAN_MSTI, denotes both
> that `vlan_attr` is the valid member of `u` and that `msti` is the valid
> member of `vlan_attr`. If we add SWITCHDEV_ATTR_ID_VLAN_SNOOPING, that
> would point to both `vlan_attr` and a new `bool snooping` in the union.
>
> Do you think we should just have a SWITCHDEV_ATTR_ID_VLAN_ATTR for all
> per-VLAN attributes and then have a separate union?

It's the first nested union that I see, and a bit confusing.

I think it would be better if we had a

struct switchdev_vlan_attr_msti {
u16 vid;
u16 msti;
};

and different structures for other, future VLAN attributes. Basically
keep a 1:1 mapping between an attribute id and a union.

> >> +};
> >> +
> >> struct switchdev_attr {
> >> struct net_device *orig_dev;
> >> enum switchdev_attr_id id;
> >> @@ -50,6 +59,7 @@ struct switchdev_attr {
> >> u16 vlan_protocol; /* BRIDGE_VLAN_PROTOCOL */
> >> bool mc_disabled; /* MC_DISABLED */
> >> u8 mrp_port_role; /* MRP_PORT_ROLE */
> >> + struct switchdev_vlan_attr vlan_attr; /* VLAN_* */
> >> } u;
> >> };
> >>
> >> diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
> >> index 8dea8e7257fd..aba603675165 100644
> >> --- a/net/bridge/br_mst.c
> >> +++ b/net/bridge/br_mst.c
> >> @@ -7,6 +7,7 @@
> >> */
> >>
> >> #include <linux/kernel.h>
> >> +#include <net/switchdev.h>
> >>
> >> #include "br_private.h"
> >>
> >> @@ -65,9 +66,23 @@ static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti)
> >>
> >> int br_mst_vlan_set_msti(struct net_bridge_vlan *mv, u16 msti)
> >> {
> >> + struct switchdev_attr attr = {
> >> + .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
> >> + .flags = SWITCHDEV_F_DEFER,
> >
> > Is the bridge spinlock held (atomic context), or otherwise why is
> > SWITCHDEV_F_DEFER needed here?
>
> Nope, just copypasta. In fact, it shouldn't be needed when setting the
> state either, as you can only change the state via a netlink message. I
> will remove it.
>
> >> + .orig_dev = mv->br->dev,
> >> + .u.vlan_attr = {
> >> + .vid = mv->vid,
> >> + .msti = msti,
> >> + },
> >> + };
> >> struct net_bridge_vlan_group *vg;
> >> struct net_bridge_vlan *pv;
> >> struct net_bridge_port *p;
> >> + int err;
> >> +
> >> + err = switchdev_port_attr_set(mv->br->dev, &attr, NULL);
> >
> > Treating a "VLAN attribute" as a "port attribute of the bridge" is
> > pushing the taxonomy just a little, but I don't have a better suggestion.
>
> Isn't there prior art here? I thought things like VLAN filtering already
> worked like this?

Hmm, I can think of VLAN filtering as being an attribute of the bridge
device, but 'which MSTI does VLAN X belong to' is an attribute of the
VLAN (in itself a switchdev object, i.e. something countable).

If the prior art would apply as straightforward as you say, then we'd be
replaying the VLAN MSTIs together with the other port attributes - in
"pull" mode, in dsa_port_switchdev_sync_attrs(), rather than in "push"
mode with the rest of the objects - in nbp_switchdev_sync_objs().
But we're not doing that.

To prove that there is a difference between VLAN filtering as a port
property of the bridge device, and VLAN MSTIs (or other per-VLAN global
bridge options), consider this.
You create a bridge, add 10 VLANs on br0, enable VLAN filtering, then
delete the 10 VLANs and re-create them. The bridge is still VLAN
filtering.
So VLAN filtering is a property of the bridge.

Next you create a bridge, add 10 VLANs on br0, run your new command:
'bridge vlan global set dev br0 vid <VID> msti <MSTI>'
then delete the 10 VLANs and create them back.
Their MSTI is 0, not what was set via the bridge vlan global options...
Because the MSTI is a property of the VLANs, not of the bridge.

A real port attribute wouldn't behave like that.

At least this is what I understand from your patch set, I haven't run it;
sorry if I'm mistaken about something, but I can't find a clearer way to
express what I find strange.

Anyway, I'll stop uselessly commenting here - I can understand the
practical reasons why you wouldn't want to bother expanding the taxonomy
to describe this for what it really is - an "object attribute" of sorts -
because a port attribute for the bridge device has the call path you
need already laid out, including replication towards all bridge ports.

2022-03-08 20:16:31

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 03/10] net: bridge: mst: Support setting and reporting MST port states

On Tue, Mar 01, 2022 at 17:53, Roopa Prabhu <[email protected]> wrote:
> On 3/1/22 15:19, Nikolay Aleksandrov wrote:
>> On 1 March 2022 11:03:14 CET, Tobias Waldekranz <[email protected]> wrote:
>>> Make it possible to change the port state in a given MSTI. This is
>>> done through a new netlink interface, since the MSTIs are objects in
>>> their own right. The proposed iproute2 interface would be:
>>>
>>> bridge mst set dev <PORT> msti <MSTI> state <STATE>
>>>
>>> Current states in all applicable MSTIs can also be dumped. The
>>> proposed iproute interface looks like this:
>>>
>>> $ bridge mst
>>> port msti
>>> vb1 0
>>> state forwarding
>>> 100
>>> state disabled
>>> vb2 0
>>> state forwarding
>>> 100
>>> state forwarding
>>>
>>> The preexisting per-VLAN states are still valid in the MST
>>> mode (although they are read-only), and can be queried as usual if one
>>> is interested in knowing a particular VLAN's state without having to
>>> care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
>>> bound to MSTI 100):
>>>
>>> $ bridge -d vlan
>>> port vlan-id
>>> vb1 10
>>> state forwarding mcast_router 1
>>> 20
>>> state disabled mcast_router 1
>>> 30
>>> state disabled mcast_router 1
>>> 40
>>> state forwarding mcast_router 1
>>> vb2 10
>>> state forwarding mcast_router 1
>>> 20
>>> state forwarding mcast_router 1
>>> 30
>>> state forwarding mcast_router 1
>>> 40
>>> state forwarding mcast_router 1
>>>
>>> Signed-off-by: Tobias Waldekranz <[email protected]>
>>> ---
>>> include/uapi/linux/if_bridge.h | 16 +++
>>> include/uapi/linux/rtnetlink.h | 5 +
>>> net/bridge/br_mst.c | 244 +++++++++++++++++++++++++++++++++
>>> net/bridge/br_netlink.c | 3 +
>>> net/bridge/br_private.h | 4 +
>>> 5 files changed, 272 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
>>> index b68016f625b7..784482527861 100644
>>> --- a/include/uapi/linux/if_bridge.h
>>> +++ b/include/uapi/linux/if_bridge.h
>>> @@ -785,4 +785,20 @@ enum {
>>> __BRIDGE_QUERIER_MAX
>>> };
>>> #define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
>>> +
>>> +enum {
>>> + BRIDGE_MST_UNSPEC,
>>> + BRIDGE_MST_ENTRY,
>>> + __BRIDGE_MST_MAX,
>>> +};
>>> +#define BRIDGE_MST_MAX (__BRIDGE_MST_MAX - 1)
>>> +
>>> +enum {
>>> + BRIDGE_MST_ENTRY_UNSPEC,
>>> + BRIDGE_MST_ENTRY_MSTI,
>>> + BRIDGE_MST_ENTRY_STATE,
>>> + __BRIDGE_MST_ENTRY_MAX,
>>> +};
>>> +#define BRIDGE_MST_ENTRY_MAX (__BRIDGE_MST_ENTRY_MAX - 1)
>>> +
>>> #endif /* _UAPI_LINUX_IF_BRIDGE_H */
>>> diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>>> index 0970cb4b1b88..4a48f3ce862c 100644
>>> --- a/include/uapi/linux/rtnetlink.h
>>> +++ b/include/uapi/linux/rtnetlink.h
>>> @@ -192,6 +192,11 @@ enum {
>>> RTM_GETTUNNEL,
>>> #define RTM_GETTUNNEL RTM_GETTUNNEL
>>>
>>> + RTM_GETMST = 124 + 2,
>>> +#define RTM_GETMST RTM_GETMST
>>> + RTM_SETMST,
>>> +#define RTM_SETMST RTM_SETMST
>>> +
>> I think you should also update selinux (see nlmsgtab.c)
>> I'll think about this one, if there is some nice way to avoid the new rtm types.
>
> yes, since these are all port attributes, seems like 'bridge link set'
> should work
>
> Tobias, can you pls check if extending RTM_SETLINK (with AF_BRIDGE) is
> an option here ?
>
> ie via br_setlink

Yeah that makes sense. Not sure how I convinced myself that I needed a
separate rtm type for it. I will give it a try. Thanks!

2022-03-08 23:49:18

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 03/10] net: bridge: mst: Support setting and reporting MST port states

On Wed, Mar 02, 2022 at 00:19, Nikolay Aleksandrov <[email protected]> wrote:
> On 1 March 2022 11:03:14 CET, Tobias Waldekranz <[email protected]> wrote:
>>Make it possible to change the port state in a given MSTI. This is
>>done through a new netlink interface, since the MSTIs are objects in
>>their own right. The proposed iproute2 interface would be:
>>
>> bridge mst set dev <PORT> msti <MSTI> state <STATE>
>>
>>Current states in all applicable MSTIs can also be dumped. The
>>proposed iproute interface looks like this:
>>
>>$ bridge mst
>>port msti
>>vb1 0
>> state forwarding
>> 100
>> state disabled
>>vb2 0
>> state forwarding
>> 100
>> state forwarding
>>
>>The preexisting per-VLAN states are still valid in the MST
>>mode (although they are read-only), and can be queried as usual if one
>>is interested in knowing a particular VLAN's state without having to
>>care about the VID to MSTI mapping (in this example VLAN 20 and 30 are
>>bound to MSTI 100):
>>
>>$ bridge -d vlan
>>port vlan-id
>>vb1 10
>> state forwarding mcast_router 1
>> 20
>> state disabled mcast_router 1
>> 30
>> state disabled mcast_router 1
>> 40
>> state forwarding mcast_router 1
>>vb2 10
>> state forwarding mcast_router 1
>> 20
>> state forwarding mcast_router 1
>> 30
>> state forwarding mcast_router 1
>> 40
>> state forwarding mcast_router 1
>>
>>Signed-off-by: Tobias Waldekranz <[email protected]>
>>---
>> include/uapi/linux/if_bridge.h | 16 +++
>> include/uapi/linux/rtnetlink.h | 5 +
>> net/bridge/br_mst.c | 244 +++++++++++++++++++++++++++++++++
>> net/bridge/br_netlink.c | 3 +
>> net/bridge/br_private.h | 4 +
>> 5 files changed, 272 insertions(+)
>>
>>diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
>>index b68016f625b7..784482527861 100644
>>--- a/include/uapi/linux/if_bridge.h
>>+++ b/include/uapi/linux/if_bridge.h
>>@@ -785,4 +785,20 @@ enum {
>> __BRIDGE_QUERIER_MAX
>> };
>> #define BRIDGE_QUERIER_MAX (__BRIDGE_QUERIER_MAX - 1)
>>+
>>+enum {
>>+ BRIDGE_MST_UNSPEC,
>>+ BRIDGE_MST_ENTRY,
>>+ __BRIDGE_MST_MAX,
>>+};
>>+#define BRIDGE_MST_MAX (__BRIDGE_MST_MAX - 1)
>>+
>>+enum {
>>+ BRIDGE_MST_ENTRY_UNSPEC,
>>+ BRIDGE_MST_ENTRY_MSTI,
>>+ BRIDGE_MST_ENTRY_STATE,
>>+ __BRIDGE_MST_ENTRY_MAX,
>>+};
>>+#define BRIDGE_MST_ENTRY_MAX (__BRIDGE_MST_ENTRY_MAX - 1)
>>+
>> #endif /* _UAPI_LINUX_IF_BRIDGE_H */
>>diff --git a/include/uapi/linux/rtnetlink.h b/include/uapi/linux/rtnetlink.h
>>index 0970cb4b1b88..4a48f3ce862c 100644
>>--- a/include/uapi/linux/rtnetlink.h
>>+++ b/include/uapi/linux/rtnetlink.h
>>@@ -192,6 +192,11 @@ enum {
>> RTM_GETTUNNEL,
>> #define RTM_GETTUNNEL RTM_GETTUNNEL
>>
>>+ RTM_GETMST = 124 + 2,
>>+#define RTM_GETMST RTM_GETMST
>>+ RTM_SETMST,
>>+#define RTM_SETMST RTM_SETMST
>>+
>
> I think you should also update selinux (see nlmsgtab.c)
> I'll think about this one, if there is some nice way to avoid the new rtm types.
>
>> __RTM_MAX,
>> #define RTM_MAX (((__RTM_MAX + 3) & ~3) - 1)
>> };
>>diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
>>index f3b8e279b85c..8dea8e7257fd 100644
>>--- a/net/bridge/br_mst.c
>>+++ b/net/bridge/br_mst.c
>>@@ -120,3 +120,247 @@ int br_mst_set_enabled(struct net_bridge *br, unsigned long val)
>> br_opt_toggle(br, BROPT_MST_ENABLED, !!val);
>> return 0;
>> }
>>+
>>+static int br_mst_nl_get_one(struct net_bridge_port *p, struct sk_buff *skb,
>>+ struct netlink_callback *cb)
>>+{
>>+ struct net_bridge_vlan_group *vg = nbp_vlan_group(p);
>>+ int err = 0, idx = 0, s_idx = cb->args[1];
>>+ struct net_bridge_vlan *v;
>>+ struct br_port_msg *bpm;
>>+ struct nlmsghdr *nlh;
>>+ struct nlattr *nest;
>>+ unsigned long *seen;
>>+
>
> Reverse xmas tree

Both of these lines end at the 28th column. Is there some other
tiebreaking mechanism that forces the reverse ordering of nest and seen?

In a variable-width font, the nest declaration does appear shorter. I
remember that you did not have your laptop with you, could that be it?

2022-03-09 16:04:27

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 06/10] net: dsa: Pass VLAN MSTI migration notifications to driver

On Fri, Mar 04, 2022 at 00:29, Vladimir Oltean <[email protected]> wrote:
> On Tue, Mar 01, 2022 at 11:03:17AM +0100, Tobias Waldekranz wrote:
>> Add the usual trampoline functionality from the generic DSA layer down
>> to the drivers for VLAN MSTI migrations.
>>
>> Signed-off-by: Tobias Waldekranz <[email protected]>
>> ---
>> include/net/dsa.h | 3 +++
>> net/dsa/dsa_priv.h | 1 +
>> net/dsa/port.c | 10 ++++++++++
>> net/dsa/slave.c | 6 ++++++
>> 4 files changed, 20 insertions(+)
>>
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index cfedcfb86350..cc8acb01bd9b 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -962,6 +962,9 @@ struct dsa_switch_ops {
>> struct netlink_ext_ack *extack);
>> int (*port_vlan_del)(struct dsa_switch *ds, int port,
>> const struct switchdev_obj_port_vlan *vlan);
>> + int (*vlan_msti_set)(struct dsa_switch *ds,
>> + const struct switchdev_attr *attr);
>
> I would rather pass the struct switchdev_vlan_attr and the orig_dev
> (bridge) as separate arguments here. Or even the struct dsa_bridge, for
> consistency to the API changes for database isolation.

Fair point. I'll change.

>> +
>> /*
>> * Forwarding database
>> */
>> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
>> index 07c0ad52395a..87ec0697e92e 100644
>> --- a/net/dsa/dsa_priv.h
>> +++ b/net/dsa/dsa_priv.h
>> @@ -217,6 +217,7 @@ int dsa_port_vlan_filtering(struct dsa_port *dp, bool vlan_filtering,
>> struct netlink_ext_ack *extack);
>> bool dsa_port_skip_vlan_configuration(struct dsa_port *dp);
>> int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock);
>> +int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr);
>> int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
>> bool targeted_match);
>> int dsa_port_fdb_add(struct dsa_port *dp, const unsigned char *addr,
>> diff --git a/net/dsa/port.c b/net/dsa/port.c
>> index d9da425a17fb..5f45cb7d70ba 100644
>> --- a/net/dsa/port.c
>> +++ b/net/dsa/port.c
>> @@ -778,6 +778,16 @@ int dsa_port_bridge_flags(struct dsa_port *dp,
>> return 0;
>> }
>>
>> +int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr)
>> +{
>> + struct dsa_switch *ds = dp->ds;
>> +
>> + if (!ds->ops->vlan_msti_set)
>> + return -EOPNOTSUPP;
>> +
>> + return ds->ops->vlan_msti_set(ds, attr);
>
> I guess this doesn't need to be a cross-chip notifier event for all
> switches, because replication to all bridge ports is handled by
> switchdev_handle_port_attr_set(). Ok. But isn't it called too many times
> per switch?

It is certainly called more times than necessary. But I'm not aware of
any way to limit it. Just as with other bridge-global settings like
ageing timeout, the bridge will just replicate the event to each port,
not knowing whether some of them belong to the same underlying ASIC or
not.

We could leverage hwdoms in the bridge to figure that out, but then:

- Drivers that do not implement forward offloading would miss out on
this optimization. Unfortunate but not a big deal.

- Since DSA presents multi-chip trees as a single switchdev, the DSA
layer would have to replicate the event out to each device. Doable,
but feels like a series of its own.

>> +}
>> +
>> int dsa_port_mtu_change(struct dsa_port *dp, int new_mtu,
>> bool targeted_match)
>> {
>> diff --git a/net/dsa/slave.c b/net/dsa/slave.c
>> index 089616206b11..c6ffcd782b5a 100644
>> --- a/net/dsa/slave.c
>> +++ b/net/dsa/slave.c
>> @@ -314,6 +314,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev, const void *ctx,
>>
>> ret = dsa_port_bridge_flags(dp, attr->u.brport_flags, extack);
>> break;
>> + case SWITCHDEV_ATTR_ID_VLAN_MSTI:
>> + if (!dsa_port_offloads_bridge_dev(dp, attr->orig_dev))
>> + return -EOPNOTSUPP;
>> +
>> + ret = dsa_port_vlan_msti(dp, attr);
>> + break;
>> default:
>> ret = -EOPNOTSUPP;
>> break;
>> --
>> 2.25.1
>>

2022-03-09 16:23:20

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 04/10] net: bridge: mst: Notify switchdev drivers of VLAN MSTI migrations

On Tue, Mar 08, 2022 at 19:17, Vladimir Oltean <[email protected]> wrote:
> On Tue, Mar 08, 2022 at 09:01:04AM +0100, Tobias Waldekranz wrote:
>> On Thu, Mar 03, 2022 at 22:59, Vladimir Oltean <[email protected]> wrote:
>> > On Tue, Mar 01, 2022 at 11:03:15AM +0100, Tobias Waldekranz wrote:
>> >> Whenever a VLAN moves to a new MSTI, send a switchdev notification so
>> >> that switchdevs can...
>> >>
>> >> ...either refuse the migration if the hardware does not support
>> >> offloading of MST...
>> >>
>> >> ..or track a bridge's VID to MSTI mapping when offloading is
>> >> supported.
>> >>
>> >> Signed-off-by: Tobias Waldekranz <[email protected]>
>> >> ---
>> >> include/net/switchdev.h | 10 +++++++
>> >> net/bridge/br_mst.c | 15 +++++++++++
>> >> net/bridge/br_switchdev.c | 57 +++++++++++++++++++++++++++++++++++++++
>> >> 3 files changed, 82 insertions(+)
>> >>
>> >> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> >> index 3e424d40fae3..39e57aa5005a 100644
>> >> --- a/include/net/switchdev.h
>> >> +++ b/include/net/switchdev.h
>> >> @@ -28,6 +28,7 @@ enum switchdev_attr_id {
>> >> SWITCHDEV_ATTR_ID_BRIDGE_MC_DISABLED,
>> >> SWITCHDEV_ATTR_ID_BRIDGE_MROUTER,
>> >> SWITCHDEV_ATTR_ID_MRP_PORT_ROLE,
>> >> + SWITCHDEV_ATTR_ID_VLAN_MSTI,
>> >> };
>> >>
>> >> struct switchdev_brport_flags {
>> >> @@ -35,6 +36,14 @@ struct switchdev_brport_flags {
>> >> unsigned long mask;
>> >> };
>> >>
>> >> +struct switchdev_vlan_attr {
>> >> + u16 vid;
>> >> +
>> >> + union {
>> >> + u16 msti;
>> >> + };
>> >
>> > Do you see other VLAN attributes that would be added in the future, such
>> > as to justify making this a single-element union from the get-go?
>>
>> I could imagine being able to control things like multicast snooping on
>> a per-VLAN basis. Being able to act as a multicast router in one VLAN
>> but not another.
>>
>> > Anyway if that is the case, we're lacking an id for the attribute type,
>> > so we'd end up needing to change drivers when a second union element
>> > appears. Otherwise they'd all expect an u16 msti.
>>
>> My idea was that `enum switchdev_attr_id` would hold all of that
>> information. In this example SWITCHDEV_ATTR_ID_VLAN_MSTI, denotes both
>> that `vlan_attr` is the valid member of `u` and that `msti` is the valid
>> member of `vlan_attr`. If we add SWITCHDEV_ATTR_ID_VLAN_SNOOPING, that
>> would point to both `vlan_attr` and a new `bool snooping` in the union.
>>
>> Do you think we should just have a SWITCHDEV_ATTR_ID_VLAN_ATTR for all
>> per-VLAN attributes and then have a separate union?
>
> It's the first nested union that I see, and a bit confusing.
>
> I think it would be better if we had a
>
> struct switchdev_vlan_attr_msti {
> u16 vid;
> u16 msti;
> };
>
> and different structures for other, future VLAN attributes. Basically
> keep a 1:1 mapping between an attribute id and a union.

Yeah, I like the simplicity of that. Changing.

>> >> +};
>> >> +
>> >> struct switchdev_attr {
>> >> struct net_device *orig_dev;
>> >> enum switchdev_attr_id id;
>> >> @@ -50,6 +59,7 @@ struct switchdev_attr {
>> >> u16 vlan_protocol; /* BRIDGE_VLAN_PROTOCOL */
>> >> bool mc_disabled; /* MC_DISABLED */
>> >> u8 mrp_port_role; /* MRP_PORT_ROLE */
>> >> + struct switchdev_vlan_attr vlan_attr; /* VLAN_* */
>> >> } u;
>> >> };
>> >>
>> >> diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c
>> >> index 8dea8e7257fd..aba603675165 100644
>> >> --- a/net/bridge/br_mst.c
>> >> +++ b/net/bridge/br_mst.c
>> >> @@ -7,6 +7,7 @@
>> >> */
>> >>
>> >> #include <linux/kernel.h>
>> >> +#include <net/switchdev.h>
>> >>
>> >> #include "br_private.h"
>> >>
>> >> @@ -65,9 +66,23 @@ static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti)
>> >>
>> >> int br_mst_vlan_set_msti(struct net_bridge_vlan *mv, u16 msti)
>> >> {
>> >> + struct switchdev_attr attr = {
>> >> + .id = SWITCHDEV_ATTR_ID_VLAN_MSTI,
>> >> + .flags = SWITCHDEV_F_DEFER,
>> >
>> > Is the bridge spinlock held (atomic context), or otherwise why is
>> > SWITCHDEV_F_DEFER needed here?
>>
>> Nope, just copypasta. In fact, it shouldn't be needed when setting the
>> state either, as you can only change the state via a netlink message. I
>> will remove it.
>>
>> >> + .orig_dev = mv->br->dev,
>> >> + .u.vlan_attr = {
>> >> + .vid = mv->vid,
>> >> + .msti = msti,
>> >> + },
>> >> + };
>> >> struct net_bridge_vlan_group *vg;
>> >> struct net_bridge_vlan *pv;
>> >> struct net_bridge_port *p;
>> >> + int err;
>> >> +
>> >> + err = switchdev_port_attr_set(mv->br->dev, &attr, NULL);
>> >
>> > Treating a "VLAN attribute" as a "port attribute of the bridge" is
>> > pushing the taxonomy just a little, but I don't have a better suggestion.
>>
>> Isn't there prior art here? I thought things like VLAN filtering already
>> worked like this?
>
> Hmm, I can think of VLAN filtering as being an attribute of the bridge
> device, but 'which MSTI does VLAN X belong to' is an attribute of the
> VLAN (in itself a switchdev object, i.e. something countable).
>
> If the prior art would apply as straightforward as you say, then we'd be
> replaying the VLAN MSTIs together with the other port attributes - in
> "pull" mode, in dsa_port_switchdev_sync_attrs(), rather than in "push"
> mode with the rest of the objects - in nbp_switchdev_sync_objs().
> But we're not doing that.
>
> To prove that there is a difference between VLAN filtering as a port
> property of the bridge device, and VLAN MSTIs (or other per-VLAN global
> bridge options), consider this.
> You create a bridge, add 10 VLANs on br0, enable VLAN filtering, then
> delete the 10 VLANs and re-create them. The bridge is still VLAN
> filtering.
> So VLAN filtering is a property of the bridge.
>
> Next you create a bridge, add 10 VLANs on br0, run your new command:
> 'bridge vlan global set dev br0 vid <VID> msti <MSTI>'
> then delete the 10 VLANs and create them back.
> Their MSTI is 0, not what was set via the bridge vlan global options...
> Because the MSTI is a property of the VLANs, not of the bridge.
>
> A real port attribute wouldn't behave like that.
>
> At least this is what I understand from your patch set, I haven't run it;
> sorry if I'm mistaken about something, but I can't find a clearer way to
> express what I find strange.
>
> Anyway, I'll stop uselessly commenting here - I can understand the
> practical reasons why you wouldn't want to bother expanding the taxonomy
> to describe this for what it really is - an "object attribute" of sorts -
> because a port attribute for the bridge device has the call path you
> need already laid out, including replication towards all bridge ports.

I yield, I yield! :)

2022-03-10 00:59:39

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 06/10] net: dsa: Pass VLAN MSTI migration notifications to driver

On Wed, Mar 09, 2022 at 04:47:02PM +0100, Tobias Waldekranz wrote:
> >> +int dsa_port_vlan_msti(struct dsa_port *dp, const struct switchdev_attr *attr)
> >> +{
> >> + struct dsa_switch *ds = dp->ds;
> >> +
> >> + if (!ds->ops->vlan_msti_set)
> >> + return -EOPNOTSUPP;
> >> +
> >> + return ds->ops->vlan_msti_set(ds, attr);
> >
> > I guess this doesn't need to be a cross-chip notifier event for all
> > switches, because replication to all bridge ports is handled by
> > switchdev_handle_port_attr_set(). Ok. But isn't it called too many times
> > per switch?
>
> It is certainly called more times than necessary. But I'm not aware of
> any way to limit it. Just as with other bridge-global settings like
> ageing timeout, the bridge will just replicate the event to each port,
> not knowing whether some of them belong to the same underlying ASIC or
> not.
>
> We could leverage hwdoms in the bridge to figure that out, but then:

Hmm, uncalled for. Also, not sure how it helps (it just plain doesn't
work, as you've pointed out below yourself).

>
> - Drivers that do not implement forward offloading would miss out on
> this optimization. Unfortunate but not a big deal.
> - Since DSA presents multi-chip trees as a single switchdev, the DSA
> layer would have to replicate the event out to each device. Doable,
> but feels like a series of its own.

I've mentally walked through the alternatives and I don't see a practical
alternative than letting the driver cut out the duplicate calls.

Maybe it's worth raising awareness by adding a comment above the
dsa_switch_ops :: vlan_msti_set definition that drivers should be
prepared to handle such calls.

Case in point, in mv88e6xxx_vlan_msti_set() you could avoid some useless
MDIO transactions (a call to mv88e6xxx_vtu_loadpurge) with a simple
"if (vlan.sid != new_sid)" check. Basically just go through a refcount
bump followed by an immediate drop.

2022-03-10 09:52:38

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Fri, Mar 04, 2022 at 00:20, Vladimir Oltean <[email protected]> wrote:
> On Tue, Mar 01, 2022 at 11:03:18AM +0100, Tobias Waldekranz wrote:
>> Add the usual trampoline functionality from the generic DSA layer down
>> to the drivers for MST state changes.
>>
>> Signed-off-by: Tobias Waldekranz <[email protected]>
>> ---
>> include/net/dsa.h | 2 ++
>> net/dsa/dsa_priv.h | 2 ++
>> net/dsa/port.c | 30 ++++++++++++++++++++++++++++++
>> net/dsa/slave.c | 6 ++++++
>> 4 files changed, 40 insertions(+)
>>
>> diff --git a/include/net/dsa.h b/include/net/dsa.h
>> index cc8acb01bd9b..096e6e3a8e1e 100644
>> --- a/include/net/dsa.h
>> +++ b/include/net/dsa.h
>> @@ -943,6 +943,8 @@ struct dsa_switch_ops {
>> struct dsa_bridge bridge);
>> void (*port_stp_state_set)(struct dsa_switch *ds, int port,
>> u8 state);
>> + int (*port_mst_state_set)(struct dsa_switch *ds, int port,
>> + const struct switchdev_mst_state *state);
>> void (*port_fast_age)(struct dsa_switch *ds, int port);
>> int (*port_pre_bridge_flags)(struct dsa_switch *ds, int port,
>> struct switchdev_brport_flags flags,
>> diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
>> index 87ec0697e92e..a620e079ebc5 100644
>> --- a/net/dsa/dsa_priv.h
>> +++ b/net/dsa/dsa_priv.h
>> @@ -198,6 +198,8 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
>> void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp,
>> const struct dsa_device_ops *tag_ops);
>> int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age);
>> +int dsa_port_set_mst_state(struct dsa_port *dp,
>> + const struct switchdev_mst_state *state);
>> int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
>> int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
>> void dsa_port_disable_rt(struct dsa_port *dp);
>> diff --git a/net/dsa/port.c b/net/dsa/port.c
>> index 5f45cb7d70ba..26cfbc8ab499 100644
>> --- a/net/dsa/port.c
>> +++ b/net/dsa/port.c
>> @@ -108,6 +108,36 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age)
>> return 0;
>> }
>>
>> +int dsa_port_set_mst_state(struct dsa_port *dp,
>> + const struct switchdev_mst_state *state)
>> +{
>> + struct dsa_switch *ds = dp->ds;
>> + int err, port = dp->index;
>> +
>> + if (!ds->ops->port_mst_state_set)
>> + return -EOPNOTSUPP;
>> +
>> + err = ds->ops->port_mst_state_set(ds, port, state);
>> + if (err)
>> + return err;
>> +
>> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
>> + switch (state->state) {
>> + case BR_STATE_DISABLED:
>> + case BR_STATE_BLOCKING:
>> + case BR_STATE_LISTENING:
>> + /* Ideally we would only fast age entries
>> + * belonging to VLANs controlled by this
>> + * MST.
>> + */
>> + dsa_port_fast_age(dp);
>
> Does mv88e6xxx support this? If it does, you might just as well
> introduce another variant of ds->ops->port_fast_age() for an msti.

You can limit ATU operations to a particular FID. So the way I see it we
could either have:

int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)

+ Maybe more generic. You could imagine there being a way to trigger
this operation from userspace for example.
- We would have to keep the VLAN<->MSTI mapping in the DSA layer in
order to be able to do the fan-out in dsa_port_set_mst_state.

or:

int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)

+ Let's the mapping be an internal affair in the driver.
- Perhaps, less generically useful.

Which one do you prefer? Or is there a hidden third option? :)

> And since it is new code, you could require that drivers _do_ support
> configuring learning before they could support MSTP. After all, we don't
> want to keep legacy mechanisms in place forever.

By "configuring learning", do you mean this new fast-age-per-vid/msti,
or being able to enable/disable learning per port? If it's the latter,
I'm not sure I understand how those two are related.

2022-03-10 18:54:17

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
> On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
>> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
>> >> + switch (state->state) {
>> >> + case BR_STATE_DISABLED:
>> >> + case BR_STATE_BLOCKING:
>> >> + case BR_STATE_LISTENING:
>> >> + /* Ideally we would only fast age entries
>> >> + * belonging to VLANs controlled by this
>> >> + * MST.
>> >> + */
>> >> + dsa_port_fast_age(dp);
>> >
>> > Does mv88e6xxx support this? If it does, you might just as well
>> > introduce another variant of ds->ops->port_fast_age() for an msti.
>>
>> You can limit ATU operations to a particular FID. So the way I see it we
>> could either have:
>>
>> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
>>
>> + Maybe more generic. You could imagine there being a way to trigger
>> this operation from userspace for example.
>> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
>> order to be able to do the fan-out in dsa_port_set_mst_state.
>>
>> or:
>>
>> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
>>
>> + Let's the mapping be an internal affair in the driver.
>> - Perhaps, less generically useful.
>>
>> Which one do you prefer? Or is there a hidden third option? :)
>
> Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
> keeping VLAN to MSTI associations in the DSA layer. Only if we could
> retrieve this mapping from the bridge layer - maybe with something
> analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
> passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
> and zeroes.

That can easily be done. Given that, should we go for port_vlan_fast_age
instead? port_msti_fast_age feels like an awkward interface, since I
don't think there is any hardware out there that can actually perform
that operation without internally fanning it out over all affected VIDs
(or FIDs in the case of mv88e6xxx).

> The reason why I asked for this is because I'm not sure of the
> implications of flushing the entire FDB of the port for a single MSTP
> state change. It would trigger temporary useless flooding in other MSTIs
> at the very least. There isn't any backwards compatibility concern to
> speak of, so we can at least try from the beginning to limit the
> flushing to the required VLANs.

Aside from the performance implications of flows being temporarily
flooded I don't think there are any.

I suppose if you've disabled flooding of unknown unicast on that port,
you would loose the flow until you see some return traffic (or when one
side gives up and ARPs). While somewhat esoteric, it would be nice to
handle this case if the hardware supports it.

> What I didn't think about, and will be a problem, is
> dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
> The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
> add a "vid" argument to it, and let drivers call it. Thoughts?

To me, this seems to be another argument in favor of
port_vlan_fast_age. That way you would know the VIDs being flushed at
the DSA layer, and driver writers needn't concern themselves with having
to remember to generate the proper notifications back to the bridge.

> Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
> isn't a real problem, I suppose we could keep the "port_fast_age" method.

What about falling back to it if the driver doesn't support per-VLAN
flushing? Flushing all entries will work in most cases, at the cost of
some temporary flooding. Seems more useful than refusing the offload
completely.

>> > And since it is new code, you could require that drivers _do_ support
>> > configuring learning before they could support MSTP. After all, we don't
>> > want to keep legacy mechanisms in place forever.
>>
>> By "configuring learning", do you mean this new fast-age-per-vid/msti,
>> or being able to enable/disable learning per port? If it's the latter,
>> I'm not sure I understand how those two are related.
>
> The code from dsa_port_set_state() which you've copied:
>
> if (!dsa_port_can_configure_learning(dp) ||
> (do_fast_age && dp->learning)) {
>
> has this explanation:
>
> 1. DSA keeps standalone ports in the FORWARDING state.
> 2. DSA also disables address learning on standalone ports, where this is
> possible (dsa_port_can_configure_learning(dp) == true).
> 3. When a port joins a bridge, it leaves its FORWARDING state from
> standalone mode and inherits the bridge port's BLOCKING state
> 4. dsa_port_set_state() treats a port transition from FORWARDING to
> BLOCKING as a transition requiring an FDB flush
> 5. due to (2), the FDB flush at stage (4) is in fact not needed, because
> the FDB of that port should already be empty. Flushing the FDB may be
> a costly operation for some drivers, so it is avoided if possible.
>
> So this is why the "dsa_port_can_configure_learning()" check is there -
> for compatibility with drivers that can't configure learning => they
> keep learning enabled also in standalone mode => they need an FDB flush
> when a standalone port joins a bridge.
>
> What I'm saying is: for drivers that offload MSTP, let's force them to
> get the basics right first (have configurable learning), rather than go
> forward forever with a backwards compatibility mode.

Makes sense, I'll just move it up to the initial capability check.

2022-03-10 19:05:54

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
> >> + switch (state->state) {
> >> + case BR_STATE_DISABLED:
> >> + case BR_STATE_BLOCKING:
> >> + case BR_STATE_LISTENING:
> >> + /* Ideally we would only fast age entries
> >> + * belonging to VLANs controlled by this
> >> + * MST.
> >> + */
> >> + dsa_port_fast_age(dp);
> >
> > Does mv88e6xxx support this? If it does, you might just as well
> > introduce another variant of ds->ops->port_fast_age() for an msti.
>
> You can limit ATU operations to a particular FID. So the way I see it we
> could either have:
>
> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
>
> + Maybe more generic. You could imagine there being a way to trigger
> this operation from userspace for example.
> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
> order to be able to do the fan-out in dsa_port_set_mst_state.
>
> or:
>
> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
>
> + Let's the mapping be an internal affair in the driver.
> - Perhaps, less generically useful.
>
> Which one do you prefer? Or is there a hidden third option? :)

Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
keeping VLAN to MSTI associations in the DSA layer. Only if we could
retrieve this mapping from the bridge layer - maybe with something
analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
and zeroes.

The reason why I asked for this is because I'm not sure of the
implications of flushing the entire FDB of the port for a single MSTP
state change. It would trigger temporary useless flooding in other MSTIs
at the very least. There isn't any backwards compatibility concern to
speak of, so we can at least try from the beginning to limit the
flushing to the required VLANs.

What I didn't think about, and will be a problem, is
dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
add a "vid" argument to it, and let drivers call it. Thoughts?

Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
isn't a real problem, I suppose we could keep the "port_fast_age" method.

> > And since it is new code, you could require that drivers _do_ support
> > configuring learning before they could support MSTP. After all, we don't
> > want to keep legacy mechanisms in place forever.
>
> By "configuring learning", do you mean this new fast-age-per-vid/msti,
> or being able to enable/disable learning per port? If it's the latter,
> I'm not sure I understand how those two are related.

The code from dsa_port_set_state() which you've copied:

if (!dsa_port_can_configure_learning(dp) ||
(do_fast_age && dp->learning)) {

has this explanation:

1. DSA keeps standalone ports in the FORWARDING state.
2. DSA also disables address learning on standalone ports, where this is
possible (dsa_port_can_configure_learning(dp) == true).
3. When a port joins a bridge, it leaves its FORWARDING state from
standalone mode and inherits the bridge port's BLOCKING state
4. dsa_port_set_state() treats a port transition from FORWARDING to
BLOCKING as a transition requiring an FDB flush
5. due to (2), the FDB flush at stage (4) is in fact not needed, because
the FDB of that port should already be empty. Flushing the FDB may be
a costly operation for some drivers, so it is avoided if possible.

So this is why the "dsa_port_can_configure_learning()" check is there -
for compatibility with drivers that can't configure learning => they
keep learning enabled also in standalone mode => they need an FDB flush
when a standalone port joins a bridge.

What I'm saying is: for drivers that offload MSTP, let's force them to
get the basics right first (have configurable learning), rather than go
forward forever with a backwards compatibility mode.

2022-03-10 19:54:07

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 10/10] net: dsa: mv88e6xxx: MST Offloading

On Thu, Mar 10, 2022 at 04:14:31PM +0100, Tobias Waldekranz wrote:
> On Fri, Mar 04, 2022 at 00:26, Vladimir Oltean <[email protected]> wrote:
> > On Tue, Mar 01, 2022 at 11:03:21AM +0100, Tobias Waldekranz wrote:
> >> Allocate a SID in the STU for each MSTID in use by a bridge and handle
> >> the mapping of MSTIDs to VLANs using the SID field of each VTU entry.
> >>
> >> Signed-off-by: Tobias Waldekranz <[email protected]>
> >> ---
> >> drivers/net/dsa/mv88e6xxx/chip.c | 178 +++++++++++++++++++++++++++++++
> >> drivers/net/dsa/mv88e6xxx/chip.h | 13 +++
> >> 2 files changed, 191 insertions(+)
> >>
> >> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
> >> index c14a62aa6a6c..4fb4ec1dff79 100644
> >> --- a/drivers/net/dsa/mv88e6xxx/chip.c
> >> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
> >> @@ -1818,6 +1818,137 @@ static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
> >> return mv88e6xxx_stu_loadpurge(chip, &stu);
> >> }
> >>
> >> +static int mv88e6xxx_sid_new(struct mv88e6xxx_chip *chip, u8 *sid)
> >> +{
> >> + DECLARE_BITMAP(busy, MV88E6XXX_N_SID) = { 0 };
> >> + struct mv88e6xxx_mst *mst;
> >> +
> >> + set_bit(0, busy);
> >> +
> >> + list_for_each_entry(mst, &chip->msts, node) {
> >> + set_bit(mst->stu.sid, busy);
> >> + }
> >> +
> >> + *sid = find_first_zero_bit(busy, MV88E6XXX_N_SID);
> >> +
> >> + return (*sid >= mv88e6xxx_max_sid(chip)) ? -ENOSPC : 0;
> >> +}
> >> +
> >> +static int mv88e6xxx_sid_put(struct mv88e6xxx_chip *chip, u8 sid)
> >> +{
> >> + struct mv88e6xxx_mst *mst, *tmp;
> >> + int err = 0;
> >> +
> >> + list_for_each_entry_safe(mst, tmp, &chip->msts, node) {
> >> + if (mst->stu.sid == sid) {
> >> + if (refcount_dec_and_test(&mst->refcnt)) {
> >> + mst->stu.valid = false;
> >> + err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
> >
> > It is interesting what to do if this fails. Possibly not this, because
> > the entry remains in hardware but not in software.
>
> True, I will let the error bubble up and keep the SW state in sync with
> the hardware.

Ok. For what it's worth, if you bump a refcount from 0 to 1 as part of
the error handling here, you need to do so using refcount_set(1), not
refcount_inc(). I found this out in commit 232deb3f9567 ("net: dsa:
avoid refcount warnings when ->port_{fdb,mdb}_del returns error").
Just thought I'd mention it in case you didn't know, to avoid a future
respin for that reason.

> >> + list_del(&mst->node);
> >> + kfree(mst);
> >> + }
> >> +
> >> + return err;
> >> + }
> >> + }
> >> +
> >> + return -ENOENT;
> >> +}
> >> +
> >> +static int mv88e6xxx_sid_get(struct mv88e6xxx_chip *chip, struct net_device *br,
> >> + u16 msti, u8 *sid)
> >> +{
> >> + struct mv88e6xxx_mst *mst;
> >> + int err, i;
> >> +
> >> + if (!br)
> >> + return 0;
> >
> > Is this condition possible?
>
> Removing.
>
> >> +
> >> + if (!mv88e6xxx_has_stu(chip))
> >> + return -EOPNOTSUPP;
> >> +
> >> + list_for_each_entry(mst, &chip->msts, node) {
> >> + if (mst->br == br && mst->msti == msti) {
> >> + refcount_inc(&mst->refcnt);
> >> + *sid = mst->stu.sid;
> >> + return 0;
> >> + }
> >> + }
> >> +
> >> + err = mv88e6xxx_sid_new(chip, sid);
> >> + if (err)
> >> + return err;
> >> +
> >> + mst = kzalloc(sizeof(*mst), GFP_KERNEL);
> >> + if (!mst)
> >> + return -ENOMEM;
> >
> > This leaks the new SID.
>
> I don't think so, the SID is just calculated based on what is in
> chip->msts. However:
>
> - The naming is bad. Will change.

I see now. My bad. What are you renaming it to? If it isn't as concise
you could still keep it sid_new(). I see atu_new() is based on the same
find_first_zero_bit() concept.

> >> +
> >> + INIT_LIST_HEAD(&mst->node);
> >> + refcount_set(&mst->refcnt, 1);
> >> + mst->br = br;
> >> + mst->msti = msti;
> >> + mst->stu.valid = true;
> >> + mst->stu.sid = *sid;
> >> +
> >> + /* The bridge starts out all ports in the disabled state. But
> >> + * a STU state of disabled means to go by the port-global
> >> + * state. So we set all user port's initial state to blocking,
> >> + * to match the bridge's behavior.
> >> + */
> >> + for (i = 0; i < mv88e6xxx_num_ports(chip); i++)
> >> + mst->stu.state[i] = dsa_is_user_port(chip->ds, i) ?
> >> + MV88E6XXX_PORT_CTL0_STATE_BLOCKING :
> >> + MV88E6XXX_PORT_CTL0_STATE_DISABLED;
> >> +
> >> + list_add_tail(&mst->node, &chip->msts);
> >> + return mv88e6xxx_stu_loadpurge(chip, &mst->stu);
> >
> > And this doesn't behave too well on failure (the MSTID exists in
> > software but not in hardware).
>
> Yes, fixing in v3.
>
> >> +}
> >> +
> >> +static int mv88e6xxx_port_mst_state_set(struct dsa_switch *ds, int port,
> >> + const struct switchdev_mst_state *st)
> >> +{
> >> + struct dsa_port *dp = dsa_to_port(ds, port);
> >> + struct mv88e6xxx_chip *chip = ds->priv;
> >> + struct mv88e6xxx_mst *mst;
> >> + u8 state;
> >> + int err;
> >> +
> >> + if (!mv88e6xxx_has_stu(chip))
> >> + return -EOPNOTSUPP;
> >> +
> >> + switch (st->state) {
> >> + case BR_STATE_DISABLED:
> >> + case BR_STATE_BLOCKING:
> >> + case BR_STATE_LISTENING:
> >> + state = MV88E6XXX_PORT_CTL0_STATE_BLOCKING;
> >> + break;
> >> + case BR_STATE_LEARNING:
> >> + state = MV88E6XXX_PORT_CTL0_STATE_LEARNING;
> >> + break;
> >> + case BR_STATE_FORWARDING:
> >> + state = MV88E6XXX_PORT_CTL0_STATE_FORWARDING;
> >> + break;
> >> + default:
> >> + return -EINVAL;
> >> + }
> >> +
> >> + list_for_each_entry(mst, &chip->msts, node) {
> >> + if (mst->br == dsa_port_bridge_dev_get(dp) &&
> >> + mst->msti == st->msti) {
> >> + if (mst->stu.state[port] == state)
> >> + return 0;
> >> +
> >> + mst->stu.state[port] = state;
> >> + mv88e6xxx_reg_lock(chip);
> >> + err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
> >> + mv88e6xxx_reg_unlock(chip);
> >> + return err;
> >> + }
> >> + }
> >> +
> >> + return -ENOENT;
> >> +}
> >> +
> >> static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
> >> u16 vid)
> >> {
> >> @@ -2437,6 +2568,12 @@ static int mv88e6xxx_port_vlan_leave(struct mv88e6xxx_chip *chip,
> >> if (err)
> >> return err;
> >>
> >> + if (!vlan.valid && vlan.sid) {
> >> + err = mv88e6xxx_sid_put(chip, vlan.sid);
> >> + if (err)
> >> + return err;
> >> + }
> >> +
> >> return mv88e6xxx_g1_atu_remove(chip, vlan.fid, port, false);
> >> }
> >>
> >> @@ -2482,6 +2619,44 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
> >> return err;
> >> }
> >>
> >> +static int mv88e6xxx_vlan_msti_set(struct dsa_switch *ds,
> >> + const struct switchdev_attr *attr)
> >> +{
> >> + const struct switchdev_vlan_attr *vattr = &attr->u.vlan_attr;
> >> + struct mv88e6xxx_chip *chip = ds->priv;
> >> + struct mv88e6xxx_vtu_entry vlan;
> >> + u8 new_sid;
> >> + int err;
> >> +
> >> + mv88e6xxx_reg_lock(chip);
> >> +
> >> + err = mv88e6xxx_vtu_get(chip, vattr->vid, &vlan);
> >> + if (err)
> >> + goto unlock;
> >> +
> >> + if (!vlan.valid) {
> >> + err = -EINVAL;
> >> + goto unlock;
> >> + }
> >> +
> >> + err = mv88e6xxx_sid_get(chip, attr->orig_dev, vattr->msti, &new_sid);
> >> + if (err)
> >> + goto unlock;
> >> +
> >> + if (vlan.sid) {
> >> + err = mv88e6xxx_sid_put(chip, vlan.sid);
> >> + if (err)
> >> + goto unlock;
> >> + }
> >> +
> >> + vlan.sid = new_sid;
> >> + err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
> >
> > Maybe you could move mv88e6xxx_sid_put() after this succeeds?
>
> Yep. Also made sure to avoid needless updates of the VTU entry if it
> already belonged to the correct SID.
>
> Thanks for the great review!

I realize I gave you conflicting advice here, first with inverting the
refcount_inc() with the refcount_dec(), then with having fast handling
of noop-changes to vlan.sid. I hope you're able to make some sense out
of that and avoid the obvious issue with the refcount temporarily
dropping to zero before going back to 1, which makes the sanity checker
complain.

2022-03-11 09:39:40

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Thu, Mar 10, 2022 at 18:18, Vladimir Oltean <[email protected]> wrote:
> On Thu, Mar 10, 2022 at 05:05:35PM +0100, Tobias Waldekranz wrote:
>> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
>> > On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
>> >> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
>> >> >> + switch (state->state) {
>> >> >> + case BR_STATE_DISABLED:
>> >> >> + case BR_STATE_BLOCKING:
>> >> >> + case BR_STATE_LISTENING:
>> >> >> + /* Ideally we would only fast age entries
>> >> >> + * belonging to VLANs controlled by this
>> >> >> + * MST.
>> >> >> + */
>> >> >> + dsa_port_fast_age(dp);
>> >> >
>> >> > Does mv88e6xxx support this? If it does, you might just as well
>> >> > introduce another variant of ds->ops->port_fast_age() for an msti.
>> >>
>> >> You can limit ATU operations to a particular FID. So the way I see it we
>> >> could either have:
>> >>
>> >> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
>> >>
>> >> + Maybe more generic. You could imagine there being a way to trigger
>> >> this operation from userspace for example.
>> >> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
>> >> order to be able to do the fan-out in dsa_port_set_mst_state.
>> >>
>> >> or:
>> >>
>> >> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
>> >>
>> >> + Let's the mapping be an internal affair in the driver.
>> >> - Perhaps, less generically useful.
>> >>
>> >> Which one do you prefer? Or is there a hidden third option? :)
>> >
>> > Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
>> > keeping VLAN to MSTI associations in the DSA layer. Only if we could
>> > retrieve this mapping from the bridge layer - maybe with something
>> > analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
>> > passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
>> > and zeroes.
>>
>> That can easily be done. Given that, should we go for port_vlan_fast_age
>> instead? port_msti_fast_age feels like an awkward interface, since I
>> don't think there is any hardware out there that can actually perform
>> that operation without internally fanning it out over all affected VIDs
>> (or FIDs in the case of mv88e6xxx).
>
> Yup, yup. My previous email was all over the place with regard to the
> available options, because I wrote it in multiple phases so it wasn't
> chronologically ordered top-to-bottom. But port_vlan_fast_age() makes
> the most sense if you can implement br_mst_get_info(). Same goes for
> dsa_port_notify_bridge_fdb_flush().
>
>> > The reason why I asked for this is because I'm not sure of the
>> > implications of flushing the entire FDB of the port for a single MSTP
>> > state change. It would trigger temporary useless flooding in other MSTIs
>> > at the very least. There isn't any backwards compatibility concern to
>> > speak of, so we can at least try from the beginning to limit the
>> > flushing to the required VLANs.
>>
>> Aside from the performance implications of flows being temporarily
>> flooded I don't think there are any.
>>
>> I suppose if you've disabled flooding of unknown unicast on that port,
>> you would loose the flow until you see some return traffic (or when one
>> side gives up and ARPs). While somewhat esoteric, it would be nice to
>> handle this case if the hardware supports it.
>
> If by "handle this case" you mean "flush only the affected VLANs", then
> yes, I fully agree.
>
>> > What I didn't think about, and will be a problem, is
>> > dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
>> > The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
>> > add a "vid" argument to it, and let drivers call it. Thoughts?
>>
>> To me, this seems to be another argument in favor of
>> port_vlan_fast_age. That way you would know the VIDs being flushed at
>> the DSA layer, and driver writers needn't concern themselves with having
>> to remember to generate the proper notifications back to the bridge.
>
> See above.
>
>> > Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
>> > isn't a real problem, I suppose we could keep the "port_fast_age" method.
>>
>> What about falling back to it if the driver doesn't support per-VLAN
>> flushing? Flushing all entries will work in most cases, at the cost of
>> some temporary flooding. Seems more useful than refusing the offload
>> completely.
>
> So here's what I don't understand. Do you expect a driver other than
> mv88e6xxx to do something remotely reasonable under a bridge with MSTP
> enabled? The idea being to handle gracefully the case where a port is
> BLOCKING in an MSTI but FORWARDING in another. Because if not, let's
> just outright not offload that kind of bridge, and only concern
> ourselves with what MST-capable drivers can do.

I think you're right. I was trying to make it easier for other driver
writers, but it will just be more confusing and error prone.

Alright, so v3 will have something like this:

bool dsa_port_can_offload_mst(struct dsa_port *dp)
{
return ds->ops->vlan_msti_set &&
ds->ops->port_mst_state_set &&
ds->ops->port_vlan_fast_age &&
dsa_port_can_configure_learning(dp);
}

If this returns false, we have two options:

1. Return -EOPNOTSUPP, which the bridge will be unable to discriminate
from a non-switchdev port saying "I have no idea what you're talking
about". I.e. the bridge will happily apply the config, but the
hardware won't match. I don't like this, but it lines up with most
other stuff.

2. Return a hard error, e.g. -EINVAL/-ENOSYS. This will keep the bridge
in sync with the hardware and also gives some feedback to the
user. This seems like the better approach to me, but it is a new kind
of paradigm.

What do you think?

> I'm shadowing you with a prototype (and untested so far) MSTP
> implementation for the ocelot/felix drivers, and those switches can
> flush the MAC table per VLAN too. So I don't see an immediate need to
> have a fallback implementation if you'll also provide it for mv88e6xxx.
> Let's treat that only if the need arises.

Cool. Agreed, v3 will implement .port_vlan_fast_age for mv88e6xxx.

2022-03-11 12:31:52

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Thu, Mar 10, 2022 at 05:05:35PM +0100, Tobias Waldekranz wrote:
> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
> > On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
> >> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
> >> >> + switch (state->state) {
> >> >> + case BR_STATE_DISABLED:
> >> >> + case BR_STATE_BLOCKING:
> >> >> + case BR_STATE_LISTENING:
> >> >> + /* Ideally we would only fast age entries
> >> >> + * belonging to VLANs controlled by this
> >> >> + * MST.
> >> >> + */
> >> >> + dsa_port_fast_age(dp);
> >> >
> >> > Does mv88e6xxx support this? If it does, you might just as well
> >> > introduce another variant of ds->ops->port_fast_age() for an msti.
> >>
> >> You can limit ATU operations to a particular FID. So the way I see it we
> >> could either have:
> >>
> >> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
> >>
> >> + Maybe more generic. You could imagine there being a way to trigger
> >> this operation from userspace for example.
> >> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
> >> order to be able to do the fan-out in dsa_port_set_mst_state.
> >>
> >> or:
> >>
> >> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
> >>
> >> + Let's the mapping be an internal affair in the driver.
> >> - Perhaps, less generically useful.
> >>
> >> Which one do you prefer? Or is there a hidden third option? :)
> >
> > Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
> > keeping VLAN to MSTI associations in the DSA layer. Only if we could
> > retrieve this mapping from the bridge layer - maybe with something
> > analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
> > passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
> > and zeroes.
>
> That can easily be done. Given that, should we go for port_vlan_fast_age
> instead? port_msti_fast_age feels like an awkward interface, since I
> don't think there is any hardware out there that can actually perform
> that operation without internally fanning it out over all affected VIDs
> (or FIDs in the case of mv88e6xxx).

Yup, yup. My previous email was all over the place with regard to the
available options, because I wrote it in multiple phases so it wasn't
chronologically ordered top-to-bottom. But port_vlan_fast_age() makes
the most sense if you can implement br_mst_get_info(). Same goes for
dsa_port_notify_bridge_fdb_flush().

> > The reason why I asked for this is because I'm not sure of the
> > implications of flushing the entire FDB of the port for a single MSTP
> > state change. It would trigger temporary useless flooding in other MSTIs
> > at the very least. There isn't any backwards compatibility concern to
> > speak of, so we can at least try from the beginning to limit the
> > flushing to the required VLANs.
>
> Aside from the performance implications of flows being temporarily
> flooded I don't think there are any.
>
> I suppose if you've disabled flooding of unknown unicast on that port,
> you would loose the flow until you see some return traffic (or when one
> side gives up and ARPs). While somewhat esoteric, it would be nice to
> handle this case if the hardware supports it.

If by "handle this case" you mean "flush only the affected VLANs", then
yes, I fully agree.

> > What I didn't think about, and will be a problem, is
> > dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
> > The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
> > add a "vid" argument to it, and let drivers call it. Thoughts?
>
> To me, this seems to be another argument in favor of
> port_vlan_fast_age. That way you would know the VIDs being flushed at
> the DSA layer, and driver writers needn't concern themselves with having
> to remember to generate the proper notifications back to the bridge.

See above.

> > Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
> > isn't a real problem, I suppose we could keep the "port_fast_age" method.
>
> What about falling back to it if the driver doesn't support per-VLAN
> flushing? Flushing all entries will work in most cases, at the cost of
> some temporary flooding. Seems more useful than refusing the offload
> completely.

So here's what I don't understand. Do you expect a driver other than
mv88e6xxx to do something remotely reasonable under a bridge with MSTP
enabled? The idea being to handle gracefully the case where a port is
BLOCKING in an MSTI but FORWARDING in another. Because if not, let's
just outright not offload that kind of bridge, and only concern
ourselves with what MST-capable drivers can do.
I'm shadowing you with a prototype (and untested so far) MSTP
implementation for the ocelot/felix drivers, and those switches can
flush the MAC table per VLAN too. So I don't see an immediate need to
have a fallback implementation if you'll also provide it for mv88e6xxx.
Let's treat that only if the need arises.

> >> > And since it is new code, you could require that drivers _do_ support
> >> > configuring learning before they could support MSTP. After all, we don't
> >> > want to keep legacy mechanisms in place forever.
> >>
> >> By "configuring learning", do you mean this new fast-age-per-vid/msti,
> >> or being able to enable/disable learning per port? If it's the latter,
> >> I'm not sure I understand how those two are related.
> >
> > The code from dsa_port_set_state() which you've copied:
> >
> > if (!dsa_port_can_configure_learning(dp) ||
> > (do_fast_age && dp->learning)) {
> >
> > has this explanation:
> >
> > 1. DSA keeps standalone ports in the FORWARDING state.
> > 2. DSA also disables address learning on standalone ports, where this is
> > possible (dsa_port_can_configure_learning(dp) == true).
> > 3. When a port joins a bridge, it leaves its FORWARDING state from
> > standalone mode and inherits the bridge port's BLOCKING state
> > 4. dsa_port_set_state() treats a port transition from FORWARDING to
> > BLOCKING as a transition requiring an FDB flush
> > 5. due to (2), the FDB flush at stage (4) is in fact not needed, because
> > the FDB of that port should already be empty. Flushing the FDB may be
> > a costly operation for some drivers, so it is avoided if possible.
> >
> > So this is why the "dsa_port_can_configure_learning()" check is there -
> > for compatibility with drivers that can't configure learning => they
> > keep learning enabled also in standalone mode => they need an FDB flush
> > when a standalone port joins a bridge.
> >
> > What I'm saying is: for drivers that offload MSTP, let's force them to
> > get the basics right first (have configurable learning), rather than go
> > forward forever with a backwards compatibility mode.
>
> Makes sense, I'll just move it up to the initial capability check.

2022-03-11 13:32:07

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 10/10] net: dsa: mv88e6xxx: MST Offloading

On Thu, Mar 10, 2022 at 05:25:47PM +0200, Vladimir Oltean wrote:
> > >> + err = mv88e6xxx_sid_get(chip, attr->orig_dev, vattr->msti, &new_sid);
> > >> + if (err)
> > >> + goto unlock;
> > >> +
> > >> + if (vlan.sid) {
> > >> + err = mv88e6xxx_sid_put(chip, vlan.sid);
> > >> + if (err)
> > >> + goto unlock;
> > >> + }
> > >> +
> > >> + vlan.sid = new_sid;
> > >> + err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
> > >
> > > Maybe you could move mv88e6xxx_sid_put() after this succeeds?
> >
> > Yep. Also made sure to avoid needless updates of the VTU entry if it
> > already belonged to the correct SID.
>
> I realize I gave you conflicting advice here, first with inverting the
> refcount_inc() with the refcount_dec(), then with having fast handling
> of noop-changes to vlan.sid. I hope you're able to make some sense out
> of that and avoid the obvious issue with the refcount temporarily
> dropping to zero before going back to 1, which makes the sanity checker
> complain.

Oh wow... I didn't look at the code again, and commented based on a
false memory. Disregard, sorry. You aren't reversing sid_get with sid_put,
nor did I suggest that. There's a lot that happened just in my head,
apparently.

2022-03-11 14:40:25

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Thu, Mar 10, 2022 at 11:46:45PM +0100, Tobias Waldekranz wrote:
> On Thu, Mar 10, 2022 at 18:18, Vladimir Oltean <[email protected]> wrote:
> > On Thu, Mar 10, 2022 at 05:05:35PM +0100, Tobias Waldekranz wrote:
> >> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
> >> > On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
> >> >> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
> >> >> >> + switch (state->state) {
> >> >> >> + case BR_STATE_DISABLED:
> >> >> >> + case BR_STATE_BLOCKING:
> >> >> >> + case BR_STATE_LISTENING:
> >> >> >> + /* Ideally we would only fast age entries
> >> >> >> + * belonging to VLANs controlled by this
> >> >> >> + * MST.
> >> >> >> + */
> >> >> >> + dsa_port_fast_age(dp);
> >> >> >
> >> >> > Does mv88e6xxx support this? If it does, you might just as well
> >> >> > introduce another variant of ds->ops->port_fast_age() for an msti.
> >> >>
> >> >> You can limit ATU operations to a particular FID. So the way I see it we
> >> >> could either have:
> >> >>
> >> >> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
> >> >>
> >> >> + Maybe more generic. You could imagine there being a way to trigger
> >> >> this operation from userspace for example.
> >> >> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
> >> >> order to be able to do the fan-out in dsa_port_set_mst_state.
> >> >>
> >> >> or:
> >> >>
> >> >> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
> >> >>
> >> >> + Let's the mapping be an internal affair in the driver.
> >> >> - Perhaps, less generically useful.
> >> >>
> >> >> Which one do you prefer? Or is there a hidden third option? :)
> >> >
> >> > Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
> >> > keeping VLAN to MSTI associations in the DSA layer. Only if we could
> >> > retrieve this mapping from the bridge layer - maybe with something
> >> > analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
> >> > passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
> >> > and zeroes.
> >>
> >> That can easily be done. Given that, should we go for port_vlan_fast_age
> >> instead? port_msti_fast_age feels like an awkward interface, since I
> >> don't think there is any hardware out there that can actually perform
> >> that operation without internally fanning it out over all affected VIDs
> >> (or FIDs in the case of mv88e6xxx).
> >
> > Yup, yup. My previous email was all over the place with regard to the
> > available options, because I wrote it in multiple phases so it wasn't
> > chronologically ordered top-to-bottom. But port_vlan_fast_age() makes
> > the most sense if you can implement br_mst_get_info(). Same goes for
> > dsa_port_notify_bridge_fdb_flush().
> >
> >> > The reason why I asked for this is because I'm not sure of the
> >> > implications of flushing the entire FDB of the port for a single MSTP
> >> > state change. It would trigger temporary useless flooding in other MSTIs
> >> > at the very least. There isn't any backwards compatibility concern to
> >> > speak of, so we can at least try from the beginning to limit the
> >> > flushing to the required VLANs.
> >>
> >> Aside from the performance implications of flows being temporarily
> >> flooded I don't think there are any.
> >>
> >> I suppose if you've disabled flooding of unknown unicast on that port,
> >> you would loose the flow until you see some return traffic (or when one
> >> side gives up and ARPs). While somewhat esoteric, it would be nice to
> >> handle this case if the hardware supports it.
> >
> > If by "handle this case" you mean "flush only the affected VLANs", then
> > yes, I fully agree.
> >
> >> > What I didn't think about, and will be a problem, is
> >> > dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
> >> > The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
> >> > add a "vid" argument to it, and let drivers call it. Thoughts?
> >>
> >> To me, this seems to be another argument in favor of
> >> port_vlan_fast_age. That way you would know the VIDs being flushed at
> >> the DSA layer, and driver writers needn't concern themselves with having
> >> to remember to generate the proper notifications back to the bridge.
> >
> > See above.
> >
> >> > Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
> >> > isn't a real problem, I suppose we could keep the "port_fast_age" method.
> >>
> >> What about falling back to it if the driver doesn't support per-VLAN
> >> flushing? Flushing all entries will work in most cases, at the cost of
> >> some temporary flooding. Seems more useful than refusing the offload
> >> completely.
> >
> > So here's what I don't understand. Do you expect a driver other than
> > mv88e6xxx to do something remotely reasonable under a bridge with MSTP
> > enabled? The idea being to handle gracefully the case where a port is
> > BLOCKING in an MSTI but FORWARDING in another. Because if not, let's
> > just outright not offload that kind of bridge, and only concern
> > ourselves with what MST-capable drivers can do.
>
> I think you're right. I was trying to make it easier for other driver
> writers, but it will just be more confusing and error prone.
>
> Alright, so v3 will have something like this:
>
> bool dsa_port_can_offload_mst(struct dsa_port *dp)
> {
> return ds->ops->vlan_msti_set &&
> ds->ops->port_mst_state_set &&
> ds->ops->port_vlan_fast_age &&
> dsa_port_can_configure_learning(dp);
> }
>
> If this returns false, we have two options:
>
> 1. Return -EOPNOTSUPP, which the bridge will be unable to discriminate
> from a non-switchdev port saying "I have no idea what you're talking
> about". I.e. the bridge will happily apply the config, but the
> hardware won't match. I don't like this, but it lines up with most
> other stuff.
>
> 2. Return a hard error, e.g. -EINVAL/-ENOSYS. This will keep the bridge
> in sync with the hardware and also gives some feedback to the
> user. This seems like the better approach to me, but it is a new kind
> of paradigm.
>
> What do you think?

Wait, what? It matters a lot where you place the call to
dsa_port_can_offload_mst(), too. You don't have to propagate a hard
error code, either, at least if you make dsa_port_bridge_join() return
-EOPNOTSUPP prior to calling switchdev_bridge_port_offload(), no?
DSA transforms this error code into 0, and dsa_port_offloads_bridge*()
starts returning false, which makes us ignore all MSTP related switchdev
notifiers.
The important part will be to make sure that MSTP is enabled for this
bridge from the get-go (that being the only case in which we can offload
an MSTP aware bridge), and refusing to offload dynamic changes to its
MSTP state. I didn't re-check now, but I think I remember there being
limitations even in the software bridge related to dynamic MSTP mode
changes anyway - there had to not be any port VLANs, which IIUC means
that you actually need to _delete_ the port PVIDs which are automatically
created before you could change the MSTP mode.

This is the model, what's wrong with it? I said "don't offload the
bridge", not "don't offload specific MSTP operations".

> > I'm shadowing you with a prototype (and untested so far) MSTP
> > implementation for the ocelot/felix drivers, and those switches can
> > flush the MAC table per VLAN too. So I don't see an immediate need to
> > have a fallback implementation if you'll also provide it for mv88e6xxx.
> > Let's treat that only if the need arises.
>
> Cool. Agreed, v3 will implement .port_vlan_fast_age for mv88e6xxx.

2022-03-11 15:50:10

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 10/10] net: dsa: mv88e6xxx: MST Offloading

On Fri, Mar 04, 2022 at 00:26, Vladimir Oltean <[email protected]> wrote:
> On Tue, Mar 01, 2022 at 11:03:21AM +0100, Tobias Waldekranz wrote:
>> Allocate a SID in the STU for each MSTID in use by a bridge and handle
>> the mapping of MSTIDs to VLANs using the SID field of each VTU entry.
>>
>> Signed-off-by: Tobias Waldekranz <[email protected]>
>> ---
>> drivers/net/dsa/mv88e6xxx/chip.c | 178 +++++++++++++++++++++++++++++++
>> drivers/net/dsa/mv88e6xxx/chip.h | 13 +++
>> 2 files changed, 191 insertions(+)
>>
>> diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
>> index c14a62aa6a6c..4fb4ec1dff79 100644
>> --- a/drivers/net/dsa/mv88e6xxx/chip.c
>> +++ b/drivers/net/dsa/mv88e6xxx/chip.c
>> @@ -1818,6 +1818,137 @@ static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
>> return mv88e6xxx_stu_loadpurge(chip, &stu);
>> }
>>
>> +static int mv88e6xxx_sid_new(struct mv88e6xxx_chip *chip, u8 *sid)
>> +{
>> + DECLARE_BITMAP(busy, MV88E6XXX_N_SID) = { 0 };
>> + struct mv88e6xxx_mst *mst;
>> +
>> + set_bit(0, busy);
>> +
>> + list_for_each_entry(mst, &chip->msts, node) {
>> + set_bit(mst->stu.sid, busy);
>> + }
>> +
>> + *sid = find_first_zero_bit(busy, MV88E6XXX_N_SID);
>> +
>> + return (*sid >= mv88e6xxx_max_sid(chip)) ? -ENOSPC : 0;
>> +}
>> +
>> +static int mv88e6xxx_sid_put(struct mv88e6xxx_chip *chip, u8 sid)
>> +{
>> + struct mv88e6xxx_mst *mst, *tmp;
>> + int err = 0;
>> +
>> + list_for_each_entry_safe(mst, tmp, &chip->msts, node) {
>> + if (mst->stu.sid == sid) {
>> + if (refcount_dec_and_test(&mst->refcnt)) {
>> + mst->stu.valid = false;
>> + err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
>
> It is interesting what to do if this fails. Possibly not this, because
> the entry remains in hardware but not in software.

True, I will let the error bubble up and keep the SW state in sync with
the hardware.

>> + list_del(&mst->node);
>> + kfree(mst);
>> + }
>> +
>> + return err;
>> + }
>> + }
>> +
>> + return -ENOENT;
>> +}
>> +
>> +static int mv88e6xxx_sid_get(struct mv88e6xxx_chip *chip, struct net_device *br,
>> + u16 msti, u8 *sid)
>> +{
>> + struct mv88e6xxx_mst *mst;
>> + int err, i;
>> +
>> + if (!br)
>> + return 0;
>
> Is this condition possible?

Removing.

>> +
>> + if (!mv88e6xxx_has_stu(chip))
>> + return -EOPNOTSUPP;
>> +
>> + list_for_each_entry(mst, &chip->msts, node) {
>> + if (mst->br == br && mst->msti == msti) {
>> + refcount_inc(&mst->refcnt);
>> + *sid = mst->stu.sid;
>> + return 0;
>> + }
>> + }
>> +
>> + err = mv88e6xxx_sid_new(chip, sid);
>> + if (err)
>> + return err;
>> +
>> + mst = kzalloc(sizeof(*mst), GFP_KERNEL);
>> + if (!mst)
>> + return -ENOMEM;
>
> This leaks the new SID.

I don't think so, the SID is just calculated based on what is in
chip->msts. However:

- The naming is bad. Will change.

>> +
>> + INIT_LIST_HEAD(&mst->node);
>> + refcount_set(&mst->refcnt, 1);
>> + mst->br = br;
>> + mst->msti = msti;
>> + mst->stu.valid = true;
>> + mst->stu.sid = *sid;
>> +
>> + /* The bridge starts out all ports in the disabled state. But
>> + * a STU state of disabled means to go by the port-global
>> + * state. So we set all user port's initial state to blocking,
>> + * to match the bridge's behavior.
>> + */
>> + for (i = 0; i < mv88e6xxx_num_ports(chip); i++)
>> + mst->stu.state[i] = dsa_is_user_port(chip->ds, i) ?
>> + MV88E6XXX_PORT_CTL0_STATE_BLOCKING :
>> + MV88E6XXX_PORT_CTL0_STATE_DISABLED;
>> +
>> + list_add_tail(&mst->node, &chip->msts);
>> + return mv88e6xxx_stu_loadpurge(chip, &mst->stu);
>
> And this doesn't behave too well on failure (the MSTID exists in
> software but not in hardware).

Yes, fixing in v3.

>> +}
>> +
>> +static int mv88e6xxx_port_mst_state_set(struct dsa_switch *ds, int port,
>> + const struct switchdev_mst_state *st)
>> +{
>> + struct dsa_port *dp = dsa_to_port(ds, port);
>> + struct mv88e6xxx_chip *chip = ds->priv;
>> + struct mv88e6xxx_mst *mst;
>> + u8 state;
>> + int err;
>> +
>> + if (!mv88e6xxx_has_stu(chip))
>> + return -EOPNOTSUPP;
>> +
>> + switch (st->state) {
>> + case BR_STATE_DISABLED:
>> + case BR_STATE_BLOCKING:
>> + case BR_STATE_LISTENING:
>> + state = MV88E6XXX_PORT_CTL0_STATE_BLOCKING;
>> + break;
>> + case BR_STATE_LEARNING:
>> + state = MV88E6XXX_PORT_CTL0_STATE_LEARNING;
>> + break;
>> + case BR_STATE_FORWARDING:
>> + state = MV88E6XXX_PORT_CTL0_STATE_FORWARDING;
>> + break;
>> + default:
>> + return -EINVAL;
>> + }
>> +
>> + list_for_each_entry(mst, &chip->msts, node) {
>> + if (mst->br == dsa_port_bridge_dev_get(dp) &&
>> + mst->msti == st->msti) {
>> + if (mst->stu.state[port] == state)
>> + return 0;
>> +
>> + mst->stu.state[port] = state;
>> + mv88e6xxx_reg_lock(chip);
>> + err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
>> + mv88e6xxx_reg_unlock(chip);
>> + return err;
>> + }
>> + }
>> +
>> + return -ENOENT;
>> +}
>> +
>> static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
>> u16 vid)
>> {
>> @@ -2437,6 +2568,12 @@ static int mv88e6xxx_port_vlan_leave(struct mv88e6xxx_chip *chip,
>> if (err)
>> return err;
>>
>> + if (!vlan.valid && vlan.sid) {
>> + err = mv88e6xxx_sid_put(chip, vlan.sid);
>> + if (err)
>> + return err;
>> + }
>> +
>> return mv88e6xxx_g1_atu_remove(chip, vlan.fid, port, false);
>> }
>>
>> @@ -2482,6 +2619,44 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
>> return err;
>> }
>>
>> +static int mv88e6xxx_vlan_msti_set(struct dsa_switch *ds,
>> + const struct switchdev_attr *attr)
>> +{
>> + const struct switchdev_vlan_attr *vattr = &attr->u.vlan_attr;
>> + struct mv88e6xxx_chip *chip = ds->priv;
>> + struct mv88e6xxx_vtu_entry vlan;
>> + u8 new_sid;
>> + int err;
>> +
>> + mv88e6xxx_reg_lock(chip);
>> +
>> + err = mv88e6xxx_vtu_get(chip, vattr->vid, &vlan);
>> + if (err)
>> + goto unlock;
>> +
>> + if (!vlan.valid) {
>> + err = -EINVAL;
>> + goto unlock;
>> + }
>> +
>> + err = mv88e6xxx_sid_get(chip, attr->orig_dev, vattr->msti, &new_sid);
>> + if (err)
>> + goto unlock;
>> +
>> + if (vlan.sid) {
>> + err = mv88e6xxx_sid_put(chip, vlan.sid);
>> + if (err)
>> + goto unlock;
>> + }
>> +
>> + vlan.sid = new_sid;
>> + err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
>
> Maybe you could move mv88e6xxx_sid_put() after this succeeds?

Yep. Also made sure to avoid needless updates of the VTU entry if it
already belonged to the correct SID.

Thanks for the great review!

2022-03-11 21:46:08

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Fri, Mar 11, 2022 at 12:59:54AM +0100, Tobias Waldekranz wrote:
> On Fri, Mar 11, 2022 at 01:08, Vladimir Oltean <[email protected]> wrote:
> > On Thu, Mar 10, 2022 at 11:46:45PM +0100, Tobias Waldekranz wrote:
> >> On Thu, Mar 10, 2022 at 18:18, Vladimir Oltean <[email protected]> wrote:
> >> > On Thu, Mar 10, 2022 at 05:05:35PM +0100, Tobias Waldekranz wrote:
> >> >> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
> >> >> > On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
> >> >> >> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
> >> >> >> >> + switch (state->state) {
> >> >> >> >> + case BR_STATE_DISABLED:
> >> >> >> >> + case BR_STATE_BLOCKING:
> >> >> >> >> + case BR_STATE_LISTENING:
> >> >> >> >> + /* Ideally we would only fast age entries
> >> >> >> >> + * belonging to VLANs controlled by this
> >> >> >> >> + * MST.
> >> >> >> >> + */
> >> >> >> >> + dsa_port_fast_age(dp);
> >> >> >> >
> >> >> >> > Does mv88e6xxx support this? If it does, you might just as well
> >> >> >> > introduce another variant of ds->ops->port_fast_age() for an msti.
> >> >> >>
> >> >> >> You can limit ATU operations to a particular FID. So the way I see it we
> >> >> >> could either have:
> >> >> >>
> >> >> >> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
> >> >> >>
> >> >> >> + Maybe more generic. You could imagine there being a way to trigger
> >> >> >> this operation from userspace for example.
> >> >> >> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
> >> >> >> order to be able to do the fan-out in dsa_port_set_mst_state.
> >> >> >>
> >> >> >> or:
> >> >> >>
> >> >> >> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
> >> >> >>
> >> >> >> + Let's the mapping be an internal affair in the driver.
> >> >> >> - Perhaps, less generically useful.
> >> >> >>
> >> >> >> Which one do you prefer? Or is there a hidden third option? :)
> >> >> >
> >> >> > Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
> >> >> > keeping VLAN to MSTI associations in the DSA layer. Only if we could
> >> >> > retrieve this mapping from the bridge layer - maybe with something
> >> >> > analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
> >> >> > passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
> >> >> > and zeroes.
> >> >>
> >> >> That can easily be done. Given that, should we go for port_vlan_fast_age
> >> >> instead? port_msti_fast_age feels like an awkward interface, since I
> >> >> don't think there is any hardware out there that can actually perform
> >> >> that operation without internally fanning it out over all affected VIDs
> >> >> (or FIDs in the case of mv88e6xxx).
> >> >
> >> > Yup, yup. My previous email was all over the place with regard to the
> >> > available options, because I wrote it in multiple phases so it wasn't
> >> > chronologically ordered top-to-bottom. But port_vlan_fast_age() makes
> >> > the most sense if you can implement br_mst_get_info(). Same goes for
> >> > dsa_port_notify_bridge_fdb_flush().
> >> >
> >> >> > The reason why I asked for this is because I'm not sure of the
> >> >> > implications of flushing the entire FDB of the port for a single MSTP
> >> >> > state change. It would trigger temporary useless flooding in other MSTIs
> >> >> > at the very least. There isn't any backwards compatibility concern to
> >> >> > speak of, so we can at least try from the beginning to limit the
> >> >> > flushing to the required VLANs.
> >> >>
> >> >> Aside from the performance implications of flows being temporarily
> >> >> flooded I don't think there are any.
> >> >>
> >> >> I suppose if you've disabled flooding of unknown unicast on that port,
> >> >> you would loose the flow until you see some return traffic (or when one
> >> >> side gives up and ARPs). While somewhat esoteric, it would be nice to
> >> >> handle this case if the hardware supports it.
> >> >
> >> > If by "handle this case" you mean "flush only the affected VLANs", then
> >> > yes, I fully agree.
> >> >
> >> >> > What I didn't think about, and will be a problem, is
> >> >> > dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
> >> >> > The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
> >> >> > add a "vid" argument to it, and let drivers call it. Thoughts?
> >> >>
> >> >> To me, this seems to be another argument in favor of
> >> >> port_vlan_fast_age. That way you would know the VIDs being flushed at
> >> >> the DSA layer, and driver writers needn't concern themselves with having
> >> >> to remember to generate the proper notifications back to the bridge.
> >> >
> >> > See above.
> >> >
> >> >> > Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
> >> >> > isn't a real problem, I suppose we could keep the "port_fast_age" method.
> >> >>
> >> >> What about falling back to it if the driver doesn't support per-VLAN
> >> >> flushing? Flushing all entries will work in most cases, at the cost of
> >> >> some temporary flooding. Seems more useful than refusing the offload
> >> >> completely.
> >> >
> >> > So here's what I don't understand. Do you expect a driver other than
> >> > mv88e6xxx to do something remotely reasonable under a bridge with MSTP
> >> > enabled? The idea being to handle gracefully the case where a port is
> >> > BLOCKING in an MSTI but FORWARDING in another. Because if not, let's
> >> > just outright not offload that kind of bridge, and only concern
> >> > ourselves with what MST-capable drivers can do.
> >>
> >> I think you're right. I was trying to make it easier for other driver
> >> writers, but it will just be more confusing and error prone.
> >>
> >> Alright, so v3 will have something like this:
> >>
> >> bool dsa_port_can_offload_mst(struct dsa_port *dp)
> >> {
> >> return ds->ops->vlan_msti_set &&
> >> ds->ops->port_mst_state_set &&
> >> ds->ops->port_vlan_fast_age &&
> >> dsa_port_can_configure_learning(dp);
> >> }
> >>
> >> If this returns false, we have two options:
> >>
> >> 1. Return -EOPNOTSUPP, which the bridge will be unable to discriminate
> >> from a non-switchdev port saying "I have no idea what you're talking
> >> about". I.e. the bridge will happily apply the config, but the
> >> hardware won't match. I don't like this, but it lines up with most
> >> other stuff.
> >>
> >> 2. Return a hard error, e.g. -EINVAL/-ENOSYS. This will keep the bridge
> >> in sync with the hardware and also gives some feedback to the
> >> user. This seems like the better approach to me, but it is a new kind
> >> of paradigm.
> >>
> >> What do you think?
> >
> > Wait, what? It matters a lot where you place the call to
> > dsa_port_can_offload_mst(), too. You don't have to propagate a hard
> > error code, either, at least if you make dsa_port_bridge_join() return
> > -EOPNOTSUPP prior to calling switchdev_bridge_port_offload(), no?
> > DSA transforms this error code into 0, and dsa_port_offloads_bridge*()
> > starts returning false, which makes us ignore all MSTP related switchdev
> > notifiers.
>
> Right. So we also need:
>
> 1. A br_mst_enabled() that we can call from dsa_port_bridge_join to
> validate the initial state.
>
> 2. A switchdev attr event sent out when enabling/disabling MST on the
> bridge, so that we can NAK the change.

So far, so good. This, to me, is analogous to the way in which a hypothetical
VLAN-unaware switchdev driver wouldn't deny VLAN additions or removals,
but it would only accept a VLAN-unaware bridge, and refuse to transition
into a VLAN-aware one. So even though we wouldn't deny the bridge from
keeping state that would have effect when VLAN awareness is on, we
would just deny the bridge from making that state active. Same with MSTP
awareness in my view - don't deny MSTI migrations, per-MSTI port state
changes etc, just the ability to turn on MSTP awareness.

In practice I have only seen things done the other way around - the
dpaa2-switch driver refuses VLAN-unaware bridges, so it doesn't need to
handle ignoring VLAN switchdev notifiers - a slightly simpler task.
Also, the concept of unoffloaded uppers seems to be pretty unique to DSA
so far, among switchdev drivers.

> > The important part will be to make sure that MSTP is enabled for this
> > bridge from the get-go (that being the only case in which we can offload
> > an MSTP aware bridge), and refusing to offload dynamic changes to its
> > MSTP state. I didn't re-check now, but I think I remember there being
>
> Hang on though. Won't that mean that this sequence...
>
> ip link add dev br0 type bridge \
> vlan_filtering 1 vlan_default_pvid 0 mst_enable 1
> ip link set dev swp1 master br0
>
> ...will work, but offloading will be disabled on swp0; whereas this
> sequence...
>
> ip link add dev br0 type bridge \
> vlan_filtering 1 vlan_default_pvid 0
> ip link set dev swp1 master br0
> ip link set dev br0 type bridge mst_enable 1
>
> ...will fail on the final command? Even though they are logically
> equivalent? But maybe that's just the way the cookie crumbles.

Well, they could be made equivalent for academic purposes, if you're
prepared to dynamically unoffload a bridge port from the MST awareness
notifier, be my guest, I never tried it... I suppose we could try it, in
theory it's just a call to dsa_port_pre_bridge_leave() +
dsa_port_bridge_leave() after all. But it's effort to be spent in work
and testing, and I'm not sure whether anyone will see the benefit or use
case. During initial bridge join, at least it's an established code
path, the drivers which don't implement ds->ops->port_bridge_join() have
exercised it. Alvin Šipraga has fixed a few bugs related to rtl8365mb
and this after some recent rework, it should work just fine now.

> > limitations even in the software bridge related to dynamic MSTP mode
> > changes anyway - there had to not be any port VLANs, which IIUC means
> > that you actually need to _delete_ the port PVIDs which are automatically
> > created before you could change the MSTP mode.
>
> There are some ergonomic issues there, yes. I might look at it again and
> see if there is some reasonable way of allowing the mode to be changed
> even when VLANs are present.
>
> > This is the model, what's wrong with it? I said "don't offload the
> > bridge", not "don't offload specific MSTP operations".
>
> Nothing is wrong, I just couldn't see the whole picture.
>
> This is the way.

2022-03-11 22:24:39

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Fri, Mar 11, 2022 at 02:22, Vladimir Oltean <[email protected]> wrote:
> On Fri, Mar 11, 2022 at 12:59:54AM +0100, Tobias Waldekranz wrote:
>> On Fri, Mar 11, 2022 at 01:08, Vladimir Oltean <[email protected]> wrote:
>> > On Thu, Mar 10, 2022 at 11:46:45PM +0100, Tobias Waldekranz wrote:
>> >> On Thu, Mar 10, 2022 at 18:18, Vladimir Oltean <[email protected]> wrote:
>> >> > On Thu, Mar 10, 2022 at 05:05:35PM +0100, Tobias Waldekranz wrote:
>> >> >> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
>> >> >> > On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
>> >> >> >> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
>> >> >> >> >> + switch (state->state) {
>> >> >> >> >> + case BR_STATE_DISABLED:
>> >> >> >> >> + case BR_STATE_BLOCKING:
>> >> >> >> >> + case BR_STATE_LISTENING:
>> >> >> >> >> + /* Ideally we would only fast age entries
>> >> >> >> >> + * belonging to VLANs controlled by this
>> >> >> >> >> + * MST.
>> >> >> >> >> + */
>> >> >> >> >> + dsa_port_fast_age(dp);
>> >> >> >> >
>> >> >> >> > Does mv88e6xxx support this? If it does, you might just as well
>> >> >> >> > introduce another variant of ds->ops->port_fast_age() for an msti.
>> >> >> >>
>> >> >> >> You can limit ATU operations to a particular FID. So the way I see it we
>> >> >> >> could either have:
>> >> >> >>
>> >> >> >> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
>> >> >> >>
>> >> >> >> + Maybe more generic. You could imagine there being a way to trigger
>> >> >> >> this operation from userspace for example.
>> >> >> >> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
>> >> >> >> order to be able to do the fan-out in dsa_port_set_mst_state.
>> >> >> >>
>> >> >> >> or:
>> >> >> >>
>> >> >> >> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
>> >> >> >>
>> >> >> >> + Let's the mapping be an internal affair in the driver.
>> >> >> >> - Perhaps, less generically useful.
>> >> >> >>
>> >> >> >> Which one do you prefer? Or is there a hidden third option? :)
>> >> >> >
>> >> >> > Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
>> >> >> > keeping VLAN to MSTI associations in the DSA layer. Only if we could
>> >> >> > retrieve this mapping from the bridge layer - maybe with something
>> >> >> > analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
>> >> >> > passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
>> >> >> > and zeroes.
>> >> >>
>> >> >> That can easily be done. Given that, should we go for port_vlan_fast_age
>> >> >> instead? port_msti_fast_age feels like an awkward interface, since I
>> >> >> don't think there is any hardware out there that can actually perform
>> >> >> that operation without internally fanning it out over all affected VIDs
>> >> >> (or FIDs in the case of mv88e6xxx).
>> >> >
>> >> > Yup, yup. My previous email was all over the place with regard to the
>> >> > available options, because I wrote it in multiple phases so it wasn't
>> >> > chronologically ordered top-to-bottom. But port_vlan_fast_age() makes
>> >> > the most sense if you can implement br_mst_get_info(). Same goes for
>> >> > dsa_port_notify_bridge_fdb_flush().
>> >> >
>> >> >> > The reason why I asked for this is because I'm not sure of the
>> >> >> > implications of flushing the entire FDB of the port for a single MSTP
>> >> >> > state change. It would trigger temporary useless flooding in other MSTIs
>> >> >> > at the very least. There isn't any backwards compatibility concern to
>> >> >> > speak of, so we can at least try from the beginning to limit the
>> >> >> > flushing to the required VLANs.
>> >> >>
>> >> >> Aside from the performance implications of flows being temporarily
>> >> >> flooded I don't think there are any.
>> >> >>
>> >> >> I suppose if you've disabled flooding of unknown unicast on that port,
>> >> >> you would loose the flow until you see some return traffic (or when one
>> >> >> side gives up and ARPs). While somewhat esoteric, it would be nice to
>> >> >> handle this case if the hardware supports it.
>> >> >
>> >> > If by "handle this case" you mean "flush only the affected VLANs", then
>> >> > yes, I fully agree.
>> >> >
>> >> >> > What I didn't think about, and will be a problem, is
>> >> >> > dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
>> >> >> > The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
>> >> >> > add a "vid" argument to it, and let drivers call it. Thoughts?
>> >> >>
>> >> >> To me, this seems to be another argument in favor of
>> >> >> port_vlan_fast_age. That way you would know the VIDs being flushed at
>> >> >> the DSA layer, and driver writers needn't concern themselves with having
>> >> >> to remember to generate the proper notifications back to the bridge.
>> >> >
>> >> > See above.
>> >> >
>> >> >> > Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
>> >> >> > isn't a real problem, I suppose we could keep the "port_fast_age" method.
>> >> >>
>> >> >> What about falling back to it if the driver doesn't support per-VLAN
>> >> >> flushing? Flushing all entries will work in most cases, at the cost of
>> >> >> some temporary flooding. Seems more useful than refusing the offload
>> >> >> completely.
>> >> >
>> >> > So here's what I don't understand. Do you expect a driver other than
>> >> > mv88e6xxx to do something remotely reasonable under a bridge with MSTP
>> >> > enabled? The idea being to handle gracefully the case where a port is
>> >> > BLOCKING in an MSTI but FORWARDING in another. Because if not, let's
>> >> > just outright not offload that kind of bridge, and only concern
>> >> > ourselves with what MST-capable drivers can do.
>> >>
>> >> I think you're right. I was trying to make it easier for other driver
>> >> writers, but it will just be more confusing and error prone.
>> >>
>> >> Alright, so v3 will have something like this:
>> >>
>> >> bool dsa_port_can_offload_mst(struct dsa_port *dp)
>> >> {
>> >> return ds->ops->vlan_msti_set &&
>> >> ds->ops->port_mst_state_set &&
>> >> ds->ops->port_vlan_fast_age &&
>> >> dsa_port_can_configure_learning(dp);
>> >> }
>> >>
>> >> If this returns false, we have two options:
>> >>
>> >> 1. Return -EOPNOTSUPP, which the bridge will be unable to discriminate
>> >> from a non-switchdev port saying "I have no idea what you're talking
>> >> about". I.e. the bridge will happily apply the config, but the
>> >> hardware won't match. I don't like this, but it lines up with most
>> >> other stuff.
>> >>
>> >> 2. Return a hard error, e.g. -EINVAL/-ENOSYS. This will keep the bridge
>> >> in sync with the hardware and also gives some feedback to the
>> >> user. This seems like the better approach to me, but it is a new kind
>> >> of paradigm.
>> >>
>> >> What do you think?
>> >
>> > Wait, what? It matters a lot where you place the call to
>> > dsa_port_can_offload_mst(), too. You don't have to propagate a hard
>> > error code, either, at least if you make dsa_port_bridge_join() return
>> > -EOPNOTSUPP prior to calling switchdev_bridge_port_offload(), no?
>> > DSA transforms this error code into 0, and dsa_port_offloads_bridge*()
>> > starts returning false, which makes us ignore all MSTP related switchdev
>> > notifiers.
>>
>> Right. So we also need:
>>
>> 1. A br_mst_enabled() that we can call from dsa_port_bridge_join to
>> validate the initial state.
>>
>> 2. A switchdev attr event sent out when enabling/disabling MST on the
>> bridge, so that we can NAK the change.
>
> So far, so good. This, to me, is analogous to the way in which a hypothetical
> VLAN-unaware switchdev driver wouldn't deny VLAN additions or removals,
> but it would only accept a VLAN-unaware bridge, and refuse to transition
> into a VLAN-aware one. So even though we wouldn't deny the bridge from
> keeping state that would have effect when VLAN awareness is on, we
> would just deny the bridge from making that state active. Same with MSTP
> awareness in my view - don't deny MSTI migrations, per-MSTI port state
> changes etc, just the ability to turn on MSTP awareness.
>
> In practice I have only seen things done the other way around - the
> dpaa2-switch driver refuses VLAN-unaware bridges, so it doesn't need to
> handle ignoring VLAN switchdev notifiers - a slightly simpler task.
> Also, the concept of unoffloaded uppers seems to be pretty unique to DSA
> so far, among switchdev drivers.
>
>> > The important part will be to make sure that MSTP is enabled for this
>> > bridge from the get-go (that being the only case in which we can offload
>> > an MSTP aware bridge), and refusing to offload dynamic changes to its
>> > MSTP state. I didn't re-check now, but I think I remember there being
>>
>> Hang on though. Won't that mean that this sequence...
>>
>> ip link add dev br0 type bridge \
>> vlan_filtering 1 vlan_default_pvid 0 mst_enable 1
>> ip link set dev swp1 master br0
>>
>> ...will work, but offloading will be disabled on swp0; whereas this
>> sequence...
>>
>> ip link add dev br0 type bridge \
>> vlan_filtering 1 vlan_default_pvid 0
>> ip link set dev swp1 master br0
>> ip link set dev br0 type bridge mst_enable 1
>>
>> ...will fail on the final command? Even though they are logically
>> equivalent? But maybe that's just the way the cookie crumbles.
>
> Well, they could be made equivalent for academic purposes, if you're
> prepared to dynamically unoffload a bridge port from the MST awareness
> notifier, be my guest, I never tried it... I suppose we could try it, in
> theory it's just a call to dsa_port_pre_bridge_leave() +
> dsa_port_bridge_leave() after all. But it's effort to be spent in work
> and testing, and I'm not sure whether anyone will see the benefit or use
> case. During initial bridge join, at least it's an established code
> path, the drivers which don't implement ds->ops->port_bridge_join() have
> exercised it. Alvin Šipraga has fixed a few bugs related to rtl8365mb
> and this after some recent rework, it should work just fine now.

I completely agree. Just wanted to make sure that I had understood it
correctly. Thanks.

>> > limitations even in the software bridge related to dynamic MSTP mode
>> > changes anyway - there had to not be any port VLANs, which IIUC means
>> > that you actually need to _delete_ the port PVIDs which are automatically
>> > created before you could change the MSTP mode.
>>
>> There are some ergonomic issues there, yes. I might look at it again and
>> see if there is some reasonable way of allowing the mode to be changed
>> even when VLANs are present.
>>
>> > This is the model, what's wrong with it? I said "don't offload the
>> > bridge", not "don't offload specific MSTP operations".
>>
>> Nothing is wrong, I just couldn't see the whole picture.
>>
>> This is the way.

2022-03-11 22:54:52

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Thu, Mar 10, 2022 at 17:05, Tobias Waldekranz <[email protected]> wrote:
> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
>> On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
>>> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
>>> >> + switch (state->state) {
>>> >> + case BR_STATE_DISABLED:
>>> >> + case BR_STATE_BLOCKING:
>>> >> + case BR_STATE_LISTENING:
>>> >> + /* Ideally we would only fast age entries
>>> >> + * belonging to VLANs controlled by this
>>> >> + * MST.
>>> >> + */
>>> >> + dsa_port_fast_age(dp);
>>> >
>>> > Does mv88e6xxx support this? If it does, you might just as well
>>> > introduce another variant of ds->ops->port_fast_age() for an msti.
>>>
>>> You can limit ATU operations to a particular FID. So the way I see it we
>>> could either have:
>>>
>>> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
>>>
>>> + Maybe more generic. You could imagine there being a way to trigger
>>> this operation from userspace for example.
>>> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
>>> order to be able to do the fan-out in dsa_port_set_mst_state.
>>>
>>> or:
>>>
>>> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
>>>
>>> + Let's the mapping be an internal affair in the driver.
>>> - Perhaps, less generically useful.
>>>
>>> Which one do you prefer? Or is there a hidden third option? :)
>>
>> Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
>> keeping VLAN to MSTI associations in the DSA layer. Only if we could
>> retrieve this mapping from the bridge layer - maybe with something
>> analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
>> passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
>> and zeroes.
>
> That can easily be done. Given that, should we go for port_vlan_fast_age
> instead? port_msti_fast_age feels like an awkward interface, since I
> don't think there is any hardware out there that can actually perform
> that operation without internally fanning it out over all affected VIDs
> (or FIDs in the case of mv88e6xxx).
>
>> The reason why I asked for this is because I'm not sure of the
>> implications of flushing the entire FDB of the port for a single MSTP
>> state change. It would trigger temporary useless flooding in other MSTIs
>> at the very least. There isn't any backwards compatibility concern to
>> speak of, so we can at least try from the beginning to limit the
>> flushing to the required VLANs.
>
> Aside from the performance implications of flows being temporarily
> flooded I don't think there are any.
>
> I suppose if you've disabled flooding of unknown unicast on that port,
> you would loose the flow until you see some return traffic (or when one
> side gives up and ARPs). While somewhat esoteric, it would be nice to
> handle this case if the hardware supports it.
>
>> What I didn't think about, and will be a problem, is
>> dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
>> The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
>> add a "vid" argument to it, and let drivers call it. Thoughts?
>
> To me, this seems to be another argument in favor of
> port_vlan_fast_age. That way you would know the VIDs being flushed at
> the DSA layer, and driver writers needn't concern themselves with having
> to remember to generate the proper notifications back to the bridge.
>
>> Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
>> isn't a real problem, I suppose we could keep the "port_fast_age" method.
>
> What about falling back to it if the driver doesn't support per-VLAN
> flushing? Flushing all entries will work in most cases, at the cost of
> some temporary flooding. Seems more useful than refusing the offload
> completely.

Actually now that I think about it, maybe it is more reasonable to risk
having stale entries in the VLANs where the topology changed, rather
than nuking flows in unrelated VLANs.

>>> > And since it is new code, you could require that drivers _do_ support
>>> > configuring learning before they could support MSTP. After all, we don't
>>> > want to keep legacy mechanisms in place forever.
>>>
>>> By "configuring learning", do you mean this new fast-age-per-vid/msti,
>>> or being able to enable/disable learning per port? If it's the latter,
>>> I'm not sure I understand how those two are related.
>>
>> The code from dsa_port_set_state() which you've copied:
>>
>> if (!dsa_port_can_configure_learning(dp) ||
>> (do_fast_age && dp->learning)) {
>>
>> has this explanation:
>>
>> 1. DSA keeps standalone ports in the FORWARDING state.
>> 2. DSA also disables address learning on standalone ports, where this is
>> possible (dsa_port_can_configure_learning(dp) == true).
>> 3. When a port joins a bridge, it leaves its FORWARDING state from
>> standalone mode and inherits the bridge port's BLOCKING state
>> 4. dsa_port_set_state() treats a port transition from FORWARDING to
>> BLOCKING as a transition requiring an FDB flush
>> 5. due to (2), the FDB flush at stage (4) is in fact not needed, because
>> the FDB of that port should already be empty. Flushing the FDB may be
>> a costly operation for some drivers, so it is avoided if possible.
>>
>> So this is why the "dsa_port_can_configure_learning()" check is there -
>> for compatibility with drivers that can't configure learning => they
>> keep learning enabled also in standalone mode => they need an FDB flush
>> when a standalone port joins a bridge.
>>
>> What I'm saying is: for drivers that offload MSTP, let's force them to
>> get the basics right first (have configurable learning), rather than go
>> forward forever with a backwards compatibility mode.
>
> Makes sense, I'll just move it up to the initial capability check.

2022-03-11 23:23:07

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [PATCH v2 net-next 07/10] net: dsa: Pass MST state changes to driver

On Fri, Mar 11, 2022 at 01:08, Vladimir Oltean <[email protected]> wrote:
> On Thu, Mar 10, 2022 at 11:46:45PM +0100, Tobias Waldekranz wrote:
>> On Thu, Mar 10, 2022 at 18:18, Vladimir Oltean <[email protected]> wrote:
>> > On Thu, Mar 10, 2022 at 05:05:35PM +0100, Tobias Waldekranz wrote:
>> >> On Thu, Mar 10, 2022 at 12:35, Vladimir Oltean <[email protected]> wrote:
>> >> > On Thu, Mar 10, 2022 at 09:54:34AM +0100, Tobias Waldekranz wrote:
>> >> >> >> + if (!dsa_port_can_configure_learning(dp) || dp->learning) {
>> >> >> >> + switch (state->state) {
>> >> >> >> + case BR_STATE_DISABLED:
>> >> >> >> + case BR_STATE_BLOCKING:
>> >> >> >> + case BR_STATE_LISTENING:
>> >> >> >> + /* Ideally we would only fast age entries
>> >> >> >> + * belonging to VLANs controlled by this
>> >> >> >> + * MST.
>> >> >> >> + */
>> >> >> >> + dsa_port_fast_age(dp);
>> >> >> >
>> >> >> > Does mv88e6xxx support this? If it does, you might just as well
>> >> >> > introduce another variant of ds->ops->port_fast_age() for an msti.
>> >> >>
>> >> >> You can limit ATU operations to a particular FID. So the way I see it we
>> >> >> could either have:
>> >> >>
>> >> >> int (*port_vlan_fast_age)(struct dsa_switch *ds, int port, u16 vid)
>> >> >>
>> >> >> + Maybe more generic. You could imagine there being a way to trigger
>> >> >> this operation from userspace for example.
>> >> >> - We would have to keep the VLAN<->MSTI mapping in the DSA layer in
>> >> >> order to be able to do the fan-out in dsa_port_set_mst_state.
>> >> >>
>> >> >> or:
>> >> >>
>> >> >> int (*port_msti_fast_age)(struct dsa_switch *ds, int port, u16 msti)
>> >> >>
>> >> >> + Let's the mapping be an internal affair in the driver.
>> >> >> - Perhaps, less generically useful.
>> >> >>
>> >> >> Which one do you prefer? Or is there a hidden third option? :)
>> >> >
>> >> > Yes, I was thinking of "port_msti_fast_age". I don't see a cheap way of
>> >> > keeping VLAN to MSTI associations in the DSA layer. Only if we could
>> >> > retrieve this mapping from the bridge layer - maybe with something
>> >> > analogous to br_vlan_get_info(), but br_mst_get_info(), and this gets
>> >> > passed a VLAN_N_VID sized bitmap, which the bridge populates with ones
>> >> > and zeroes.
>> >>
>> >> That can easily be done. Given that, should we go for port_vlan_fast_age
>> >> instead? port_msti_fast_age feels like an awkward interface, since I
>> >> don't think there is any hardware out there that can actually perform
>> >> that operation without internally fanning it out over all affected VIDs
>> >> (or FIDs in the case of mv88e6xxx).
>> >
>> > Yup, yup. My previous email was all over the place with regard to the
>> > available options, because I wrote it in multiple phases so it wasn't
>> > chronologically ordered top-to-bottom. But port_vlan_fast_age() makes
>> > the most sense if you can implement br_mst_get_info(). Same goes for
>> > dsa_port_notify_bridge_fdb_flush().
>> >
>> >> > The reason why I asked for this is because I'm not sure of the
>> >> > implications of flushing the entire FDB of the port for a single MSTP
>> >> > state change. It would trigger temporary useless flooding in other MSTIs
>> >> > at the very least. There isn't any backwards compatibility concern to
>> >> > speak of, so we can at least try from the beginning to limit the
>> >> > flushing to the required VLANs.
>> >>
>> >> Aside from the performance implications of flows being temporarily
>> >> flooded I don't think there are any.
>> >>
>> >> I suppose if you've disabled flooding of unknown unicast on that port,
>> >> you would loose the flow until you see some return traffic (or when one
>> >> side gives up and ARPs). While somewhat esoteric, it would be nice to
>> >> handle this case if the hardware supports it.
>> >
>> > If by "handle this case" you mean "flush only the affected VLANs", then
>> > yes, I fully agree.
>> >
>> >> > What I didn't think about, and will be a problem, is
>> >> > dsa_port_notify_bridge_fdb_flush() - we don't know the vid to flush.
>> >> > The easy way out here would be to export dsa_port_notify_bridge_fdb_flush(),
>> >> > add a "vid" argument to it, and let drivers call it. Thoughts?
>> >>
>> >> To me, this seems to be another argument in favor of
>> >> port_vlan_fast_age. That way you would know the VIDs being flushed at
>> >> the DSA layer, and driver writers needn't concern themselves with having
>> >> to remember to generate the proper notifications back to the bridge.
>> >
>> > See above.
>> >
>> >> > Alternatively, if you think that cross-flushing FDBs of multiple MSTIs
>> >> > isn't a real problem, I suppose we could keep the "port_fast_age" method.
>> >>
>> >> What about falling back to it if the driver doesn't support per-VLAN
>> >> flushing? Flushing all entries will work in most cases, at the cost of
>> >> some temporary flooding. Seems more useful than refusing the offload
>> >> completely.
>> >
>> > So here's what I don't understand. Do you expect a driver other than
>> > mv88e6xxx to do something remotely reasonable under a bridge with MSTP
>> > enabled? The idea being to handle gracefully the case where a port is
>> > BLOCKING in an MSTI but FORWARDING in another. Because if not, let's
>> > just outright not offload that kind of bridge, and only concern
>> > ourselves with what MST-capable drivers can do.
>>
>> I think you're right. I was trying to make it easier for other driver
>> writers, but it will just be more confusing and error prone.
>>
>> Alright, so v3 will have something like this:
>>
>> bool dsa_port_can_offload_mst(struct dsa_port *dp)
>> {
>> return ds->ops->vlan_msti_set &&
>> ds->ops->port_mst_state_set &&
>> ds->ops->port_vlan_fast_age &&
>> dsa_port_can_configure_learning(dp);
>> }
>>
>> If this returns false, we have two options:
>>
>> 1. Return -EOPNOTSUPP, which the bridge will be unable to discriminate
>> from a non-switchdev port saying "I have no idea what you're talking
>> about". I.e. the bridge will happily apply the config, but the
>> hardware won't match. I don't like this, but it lines up with most
>> other stuff.
>>
>> 2. Return a hard error, e.g. -EINVAL/-ENOSYS. This will keep the bridge
>> in sync with the hardware and also gives some feedback to the
>> user. This seems like the better approach to me, but it is a new kind
>> of paradigm.
>>
>> What do you think?
>
> Wait, what? It matters a lot where you place the call to
> dsa_port_can_offload_mst(), too. You don't have to propagate a hard
> error code, either, at least if you make dsa_port_bridge_join() return
> -EOPNOTSUPP prior to calling switchdev_bridge_port_offload(), no?
> DSA transforms this error code into 0, and dsa_port_offloads_bridge*()
> starts returning false, which makes us ignore all MSTP related switchdev
> notifiers.

Right. So we also need:

1. A br_mst_enabled() that we can call from dsa_port_bridge_join to
validate the initial state.

2. A switchdev attr event sent out when enabling/disabling MST on the
bridge, so that we can NAK the change.

> The important part will be to make sure that MSTP is enabled for this
> bridge from the get-go (that being the only case in which we can offload
> an MSTP aware bridge), and refusing to offload dynamic changes to its
> MSTP state. I didn't re-check now, but I think I remember there being

Hang on though. Won't that mean that this sequence...

ip link add dev br0 type bridge \
vlan_filtering 1 vlan_default_pvid 0 mst_enable 1
ip link set dev swp1 master br0

...will work, but offloading will be disabled on swp0; whereas this
sequence...

ip link add dev br0 type bridge \
vlan_filtering 1 vlan_default_pvid 0
ip link set dev swp1 master br0
ip link set dev br0 type bridge mst_enable 1

...will fail on the final command? Even though they are logically
equivalent? But maybe that's just the way the cookie crumbles.

> limitations even in the software bridge related to dynamic MSTP mode
> changes anyway - there had to not be any port VLANs, which IIUC means
> that you actually need to _delete_ the port PVIDs which are automatically
> created before you could change the MSTP mode.

There are some ergonomic issues there, yes. I might look at it again and
see if there is some reasonable way of allowing the mode to be changed
even when VLANs are present.

> This is the model, what's wrong with it? I said "don't offload the
> bridge", not "don't offload specific MSTP operations".

Nothing is wrong, I just couldn't see the whole picture.

This is the way.