2022-02-16 13:55:24

by Tobias Waldekranz

Subject: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees

The bridge has had per-VLAN STP support for a while now, since:

https://lore.kernel.org/netdev/[email protected]/

The current implementation has some problems:

- The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
is managed independently. This is awkward from an MSTP (802.1Q-2018,
Clause 13.5) point of view, where the model is that multiple VLANs
are grouped into MST instances.

Presumably because of the way the standard is written, this is
also reflected in hardware implementations. It is not uncommon for a
switch to support the full 4k range of VIDs while the pool of
MST instances is much smaller. Some examples:

Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
Marvell Prestera: 4k VLANs, but only 128 MSTIs
Microchip SparX-5i: 4k VLANs, but only 128 MSTIs

- By default, the feature is enabled, and there is no way to disable
it. This makes it hard to add offloading in a backwards compatible
way, since any underlying switchdevs have no way to refuse the
function if the hardware does not support it.

- The port-global STP state has precedence over per-VLAN states. In
MSTP, as far as I understand it, all VLANs will use the common
spanning tree (CST) by default - through traffic engineering you can
then optimize your network to group subsets of VLANs to use
different trees (MSTI). To my understanding, the way this is
typically managed in silicon is roughly:

Incoming packet:
.----.----.--------------.----.-------------
| DA | SA | 802.1Q VID=X | ET | Payload ...
'----'----'--------------'----'-------------
|
'->|\ .----------------------------.
| +--> | VID | Members | ... | MSTI |
PVID -->|/ |-----|---------|-----|------|
| 1 | 0001001 | ... | 0 |
| 2 | 0001010 | ... | 10 |
| 3 | 0001100 | ... | 10 |
'----------------------------'
|
.-----------------------------'
| .------------------------.
'->| MSTI | Fwding | Lrning |
|------|--------|--------|
| 0 | 111110 | 111110 |
| 10 | 110111 | 110111 |
'------------------------'

What this is trying to show is that the STP state (whether MSTP is
used, or ye olde STP) is always accessed via the VLAN table. If STP
is running, all MSTI pointers in that table will reference the same
index in the STP table - if MSTP is running, some VLANs may point
to other trees (like in this example).
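The two-level lookup described above can be sketched as follows. This is a
hypothetical model, not code from the series or from any driver; the table
names and sizes are invented (64 MSTIs matches the mv88e6xxx example):

```c
/* Sketch of the hardware model: the VLAN table maps a VID to an MSTI
 * index, and the STP state is always read through that indirection.
 * With plain STP, every VID points at MSTI 0; with MSTP, some VIDs
 * point at other instances. */
#include <assert.h>
#include <stdint.h>

#define N_VLANS 4096
#define N_MSTIS 64

enum stp_state { STP_BLOCKING, STP_LEARNING, STP_FORWARDING };

static uint8_t vid_to_msti[N_VLANS];        /* VLAN table: VID -> MSTI */
static enum stp_state msti_state[N_MSTIS];  /* STP table, indexed by MSTI */

/* Resolve the forwarding state for a packet tagged with @vid. */
static enum stp_state stp_state_for_vid(uint16_t vid)
{
	return msti_state[vid_to_msti[vid]];
}
```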

The fact that in the Linux bridge, the global state (think: index 0
in most hardware implementations) is supposed to override the
per-VLAN state, is very awkward to offload. In effect, this means
that when the global state changes to blocking, drivers will have to
iterate over all MSTIs in use, and alter them all to match. This
also means that you have to cache whether the hardware state is
currently tracking the global state or the per-VLAN state. In the
first case, you also have to cache the per-VLAN state so that you
can restore it if the global state transitions back to forwarding.
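The caching burden described above can be made concrete with a small sketch.
This is hypothetical driver-side logic (not from the series): when the
port-global state has precedence, the driver must save every MSTI's state
before clobbering it, and restore the saved states when the global state
returns to forwarding:

```c
/* Sketch of a driver tracking global-over-per-VLAN precedence. */
#include <assert.h>
#include <stdbool.h>

#define N_MSTIS 64

enum stp_state { STP_BLOCKING, STP_LEARNING, STP_FORWARDING };

static enum stp_state hw_state[N_MSTIS];     /* what the silicon holds */
static enum stp_state cached_state[N_MSTIS]; /* per-VLAN states, saved aside */
static bool global_override;

static void global_state_set(enum stp_state state)
{
	int i;

	if (state != STP_FORWARDING) {
		/* Global override: save and clobber every MSTI in use. */
		for (i = 0; i < N_MSTIS; i++) {
			if (!global_override)
				cached_state[i] = hw_state[i];
			hw_state[i] = state;
		}
		global_override = true;
	} else if (global_override) {
		/* Back to forwarding: restore the per-VLAN states. */
		for (i = 0; i < N_MSTIS; i++)
			hw_state[i] = cached_state[i];
		global_override = false;
	}
}
```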

This series adds support for an arbitrary M:N mapping of VIDs to
MSTIs, proposing one solution to the first issue. An example of an
offload implementation for mv88e6xxx is also provided. Offloading is
done on a best-effort basis, i.e. notifications of the relevant events
are generated, but there is no way for the user to see whether the
per-VLAN state has been offloaded or not. There is also no handling of
the relationship between the port-global state and the per-VLAN ditto.

If I was king of net/bridge/*, I would make the following additional
changes:

- By default, when a VLAN is created, assign it to MSTID 0, which
would mean that no per-VLAN state is used and that packets belonging
to this VLAN should be filtered according to the port-global state.

This way, when a VLAN is configured to use a separate tree (setting
a non-zero MSTID), an underlying switchdev could oppose it if it is
not supported.

Obviously, this adds an extra step for existing users of per-VLAN
STP states and would thus not be backwards compatible. Maybe that
makes this change impossible, maybe not.

- Swap the precedence of the port-global and the per-VLAN state,
i.e. the port-global state only applies to packets belonging to
VLANs that do not make use of a per-VLAN state (MSTID == 0).

This would make the offloading much more natural, as you avoid all
of the caching stuff described above.

Again, this changes the behavior of the kernel so it is not
backwards compatible. I suspect that this is less of an issue
though, since my guess is that very few people rely on the old
behavior.
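The proposed precedence could be reduced to a single branch; in this sketch
(all names invented, not from the series) the port-global state is only
consulted for VLANs left in MSTID 0, while any VLAN bound to a non-zero
MSTID uses its own tree's state exclusively:

```c
/* Sketch of the proposed precedence swap. */
#include <assert.h>
#include <stdint.h>

enum stp_state { STP_BLOCKING, STP_LEARNING, STP_FORWARDING };

struct vlan {
	uint16_t mstid;       /* 0 means "follow the port-global state" */
	enum stp_state state; /* per-VLAN (per-MSTI) state */
};

static enum stp_state effective_state(const struct vlan *v,
				      enum stp_state port_state)
{
	return v->mstid ? v->state : port_state;
}
```

With this rule, a driver never has to cache or restore anything: each
packet's state is a pure function of one table entry and the port state.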

Thoughts?

Tobias Waldekranz (9):
net: bridge: vlan: Introduce multiple spanning trees (MST)
net: bridge: vlan: Allow multiple VLANs to be mapped to a single MST
net: bridge: vlan: Notify switchdev drivers of VLAN MST migrations
net: bridge: vlan: Notify switchdev drivers of MST state changes
net: dsa: Pass VLAN MST migration notifications to driver
net: dsa: Pass MST state changes to driver
net: dsa: mv88e6xxx: Disentangle STU from VTU
net: dsa: mv88e6xxx: Export STU as devlink region
net: dsa: mv88e6xxx: MST Offloading

drivers/net/dsa/mv88e6xxx/chip.c | 223 +++++++++++++++++
drivers/net/dsa/mv88e6xxx/chip.h | 38 +++
drivers/net/dsa/mv88e6xxx/devlink.c | 94 +++++++
drivers/net/dsa/mv88e6xxx/global1.h | 10 +
drivers/net/dsa/mv88e6xxx/global1_vtu.c | 311 ++++++++++++++----------
include/linux/if_bridge.h | 6 +
include/net/dsa.h | 5 +
include/net/switchdev.h | 17 ++
include/uapi/linux/if_bridge.h | 1 +
net/bridge/br_private.h | 44 +++-
net/bridge/br_vlan.c | 249 ++++++++++++++++++-
net/bridge/br_vlan_options.c | 48 +++-
net/dsa/dsa_priv.h | 3 +
net/dsa/port.c | 40 +++
net/dsa/slave.c | 12 +
15 files changed, 941 insertions(+), 160 deletions(-)

--
2.25.1


2022-02-16 14:05:03

by Tobias Waldekranz

Subject: [RFC net-next 1/9] net: bridge: vlan: Introduce multiple spanning trees (MST)

Before this commit, the bridge was able to manage the forwarding state
of a port on either a global or per-VLAN basis. I.e. either 1:N or
N:N. There are two issues with this:

1. In order to support MSTP (802.1Q-2018 13.5), the controlling entity
expects the bridge to be able to group multiple VLANs to operate on
the same tree (MST). I.e. an M:N mapping, where M <= N.

2. Some hardware (e.g. mv88e6xxx) has a smaller pool of spanning tree
groups than VLANs. I.e. the full set of 4k VLANs can be configured,
but each VLAN must be mapped to one of only 64 spanning trees.

While somewhat less efficient (and non-atomic), (1) can be worked
around in software by iterating over all affected VLANs when changing
the state of a tree to make sure that they are all in
sync. Unfortunately, (2) means that offloading is not possible in this
architecture.
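The software workaround for (1) amounts to a linear walk. A minimal sketch
(hypothetical names, not the series' implementation) shows both the idea and
why it is non-atomic: an observer can see a mix of old and new states
mid-walk:

```c
/* Sketch of syncing a 1:1 VID-to-state map when an MST changes state. */
#include <assert.h>
#include <stdint.h>

#define N_VLANS 4096

enum stp_state { STP_BLOCKING, STP_LEARNING, STP_FORWARDING };

static uint16_t vid_msti[N_VLANS];        /* which MST each VID belongs to */
static enum stp_state vid_state[N_VLANS]; /* per-VID state, 1:1 */

/* Update every VLAN assigned to @msti, one entry at a time. */
static void mst_state_set(uint16_t msti, enum stp_state state)
{
	uint16_t vid;

	for (vid = 0; vid < N_VLANS; vid++)
		if (vid_msti[vid] == msti)
			vid_state[vid] = state;
}
```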

Therefore, add a level of indirection in the per-VLAN STP state. By
default, each new VLAN will be assigned to a separate MST. I.e. there
are no functional changes introduced by this commit.

Upcoming commits will then extend the VLAN DB configuration to allow
arbitrary M:N mappings.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/linux/if_bridge.h | 6 ++
net/bridge/br_private.h | 41 +++++--
net/bridge/br_vlan.c | 200 +++++++++++++++++++++++++++++++++--
net/bridge/br_vlan_options.c | 9 +-
4 files changed, 234 insertions(+), 22 deletions(-)

diff --git a/include/linux/if_bridge.h b/include/linux/if_bridge.h
index 509e18c7e740..a3b0e95c3047 100644
--- a/include/linux/if_bridge.h
+++ b/include/linux/if_bridge.h
@@ -118,6 +118,7 @@ int br_vlan_get_info(const struct net_device *dev, u16 vid,
struct bridge_vlan_info *p_vinfo);
int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid,
struct bridge_vlan_info *p_vinfo);
+int br_vlan_get_mstid(const struct net_device *dev, u16 vid, u16 *mstid);
#else
static inline bool br_vlan_enabled(const struct net_device *dev)
{
@@ -150,6 +151,11 @@ static inline int br_vlan_get_info_rcu(const struct net_device *dev, u16 vid,
{
return -EINVAL;
}
+static inline int br_vlan_get_mstid(const struct net_device *dev, u16 vid,
+ u16 *mstid)
+{
+ return -EINVAL;
+}
#endif

#if IS_ENABLED(CONFIG_BRIDGE)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 2661dda1a92b..7781e7a4449b 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -153,6 +153,14 @@ struct br_tunnel_info {
struct metadata_dst __rcu *tunnel_dst;
};

+struct br_vlan_mst {
+ refcount_t refcnt;
+ u16 id;
+ u8 state;
+
+ struct rcu_head rcu;
+};
+
/* private vlan flags */
enum {
BR_VLFLAG_PER_PORT_STATS = BIT(0),
@@ -168,7 +176,8 @@ enum {
* @vid: VLAN id
* @flags: bridge vlan flags
* @priv_flags: private (in-kernel) bridge vlan flags
- * @state: STP state (e.g. blocking, learning, forwarding)
+ * @mst: the port's STP state (e.g. blocking, learning, forwarding) in the MST
+ * associated with this VLAN
* @stats: per-cpu VLAN statistics
* @br: if MASTER flag set, this points to a bridge struct
* @port: if MASTER flag unset, this points to a port struct
@@ -192,7 +201,7 @@ struct net_bridge_vlan {
u16 vid;
u16 flags;
u16 priv_flags;
- u8 state;
+ struct br_vlan_mst __rcu *mst;
struct pcpu_sw_netstats __percpu *stats;
union {
struct net_bridge *br;
@@ -215,6 +224,20 @@ struct net_bridge_vlan {
struct rcu_head rcu;
};

+static inline u8 br_vlan_get_state_rcu(const struct net_bridge_vlan *v)
+{
+ const struct br_vlan_mst *mst = rcu_dereference(v->mst);
+
+ return mst->state;
+}
+
+static inline u8 br_vlan_get_state_rtnl(const struct net_bridge_vlan *v)
+{
+ const struct br_vlan_mst *mst = rtnl_dereference(v->mst);
+
+ return mst->state;
+}
+
/**
* struct net_bridge_vlan_group
*
@@ -1179,7 +1202,7 @@ br_multicast_port_ctx_state_disabled(const struct net_bridge_mcast_port *pmctx)
return pmctx->port->state == BR_STATE_DISABLED ||
(br_multicast_port_ctx_is_vlan(pmctx) &&
(br_multicast_port_ctx_vlan_disabled(pmctx) ||
- pmctx->vlan->state == BR_STATE_DISABLED));
+ br_vlan_get_state_rcu(pmctx->vlan) == BR_STATE_DISABLED));
}

static inline bool
@@ -1188,7 +1211,7 @@ br_multicast_port_ctx_state_stopped(const struct net_bridge_mcast_port *pmctx)
return br_multicast_port_ctx_state_disabled(pmctx) ||
pmctx->port->state == BR_STATE_BLOCKING ||
(br_multicast_port_ctx_is_vlan(pmctx) &&
- pmctx->vlan->state == BR_STATE_BLOCKING);
+ br_vlan_get_state_rcu(pmctx->vlan) == BR_STATE_BLOCKING);
}

static inline bool
@@ -1729,15 +1752,11 @@ bool br_vlan_global_opts_can_enter_range(const struct net_bridge_vlan *v_curr,
bool br_vlan_global_opts_fill(struct sk_buff *skb, u16 vid, u16 vid_range,
const struct net_bridge_vlan *v_opts);

-/* vlan state manipulation helpers using *_ONCE to annotate lock-free access */
-static inline u8 br_vlan_get_state(const struct net_bridge_vlan *v)
-{
- return READ_ONCE(v->state);
-}
-
static inline void br_vlan_set_state(struct net_bridge_vlan *v, u8 state)
{
- WRITE_ONCE(v->state, state);
+ struct br_vlan_mst *mst = rtnl_dereference(v->mst);
+
+ mst->state = state;
}

static inline u8 br_vlan_get_pvid_state(const struct net_bridge_vlan_group *vg)
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 6315e43a7a3e..b0383ec6cc91 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -34,6 +34,187 @@ static struct net_bridge_vlan *br_vlan_lookup(struct rhashtable *tbl, u16 vid)
return rhashtable_lookup_fast(tbl, &vid, br_vlan_rht_params);
}

+static void br_vlan_mst_rcu_free(struct rcu_head *rcu)
+{
+ struct br_vlan_mst *mst = container_of(rcu, struct br_vlan_mst, rcu);
+
+ kfree(mst);
+}
+
+static void br_vlan_mst_put(struct net_bridge_vlan *v)
+{
+ struct br_vlan_mst *mst = rtnl_dereference(v->mst);
+
+ if (refcount_dec_and_test(&mst->refcnt))
+ call_rcu(&mst->rcu, br_vlan_mst_rcu_free);
+}
+
+static struct br_vlan_mst *br_vlan_mst_new(u16 id)
+{
+ struct br_vlan_mst *mst;
+
+ mst = kzalloc(sizeof(*mst), GFP_KERNEL);
+ if (!mst)
+ return NULL;
+
+ refcount_set(&mst->refcnt, 1);
+ mst->id = id;
+ mst->state = BR_STATE_FORWARDING;
+ return mst;
+}
+
+static int br_vlan_mstid_get_free(struct net_bridge *br)
+{
+ const struct net_bridge_vlan *v;
+ struct rhashtable_iter iter;
+ struct br_vlan_mst *mst;
+ unsigned long *busy;
+ int err = 0;
+ u16 id;
+
+ busy = bitmap_zalloc(VLAN_N_VID, GFP_KERNEL);
+ if (!busy)
+ return -ENOMEM;
+
+ /* MSTID 0 is reserved for the CIST */
+ set_bit(0, busy);
+
+ rhashtable_walk_enter(&br_vlan_group(br)->vlan_hash, &iter);
+ rhashtable_walk_start(&iter);
+
+ while ((v = rhashtable_walk_next(&iter))) {
+ if (IS_ERR(v)) {
+ err = PTR_ERR(v);
+ goto out_free;
+ }
+
+ mst = rtnl_dereference(v->mst);
+ set_bit(mst->id, busy);
+ }
+
+ rhashtable_walk_stop(&iter);
+
+ id = find_first_zero_bit(busy, VLAN_N_VID);
+ if (id >= VLAN_N_VID)
+ err = -ENOSPC;
+
+out_free:
+ kfree(busy);
+ return err ? : id;
+}
+
+u16 br_vlan_mstid_get(const struct net_bridge_vlan *v)
+{
+ const struct net_bridge_vlan *masterv;
+ const struct br_vlan_mst *mst;
+ const struct net_bridge *br;
+
+ if (br_vlan_is_master(v))
+ br = v->br;
+ else
+ br = v->port->br;
+
+ masterv = br_vlan_lookup(&br_vlan_group(br)->vlan_hash, v->vid);
+
+ mst = rtnl_dereference(masterv->mst);
+
+ return mst->id;
+}
+
+int br_vlan_get_mstid(const struct net_device *dev, u16 vid, u16 *mstid)
+{
+ struct net_bridge *br = netdev_priv(dev);
+ struct net_bridge_vlan *v;
+
+ v = br_vlan_lookup(&br_vlan_group(br)->vlan_hash, vid);
+ if (!v)
+ return -ENOENT;
+
+ *mstid = br_vlan_mstid_get(v);
+ return 0;
+}
+EXPORT_SYMBOL_GPL(br_vlan_get_mstid);
+
+static struct br_vlan_mst *br_vlan_group_mst_get(struct net_bridge_vlan_group *vg, u16 mstid)
+{
+ struct net_bridge_vlan *v;
+ struct br_vlan_mst *mst;
+
+ list_for_each_entry(v, &vg->vlan_list, vlist) {
+ mst = rtnl_dereference(v->mst);
+ if (mst->id == mstid) {
+ refcount_inc(&mst->refcnt);
+ return mst;
+ }
+ }
+
+ return NULL;
+}
+
+static int br_vlan_mst_migrate(struct net_bridge_vlan *v, u16 mstid)
+{
+ struct net_bridge_vlan_group *vg;
+ struct br_vlan_mst *mst;
+
+ if (br_vlan_is_master(v))
+ vg = br_vlan_group(v->br);
+ else
+ vg = nbp_vlan_group(v->port);
+
+ mst = br_vlan_group_mst_get(vg, mstid);
+ if (!mst) {
+ mst = br_vlan_mst_new(mstid);
+ if (!mst)
+ return -ENOMEM;
+ }
+
+ if (rtnl_dereference(v->mst))
+ br_vlan_mst_put(v);
+
+ rcu_assign_pointer(v->mst, mst);
+ return 0;
+}
+
+static int br_vlan_mst_init_master(struct net_bridge_vlan *v)
+{
+ struct net_bridge *br = v->br;
+ struct br_vlan_mst *mst;
+ int mstid;
+
+ /* The bridge VLAN is always added first, either as context or
+ * as a proper entry. Since the bridge default is a 1:1 map
+ * from VID to MST, we always need to allocate a new ID in
+ * this case.
+ */
+ mstid = br_vlan_mstid_get_free(br);
+ if (mstid < 0)
+ return mstid;
+
+ mst = br_vlan_mst_new(mstid);
+ if (!mst)
+ return -ENOMEM;
+
+ rcu_assign_pointer(v->mst, mst);
+ return 0;
+}
+
+static int br_vlan_mst_init_port(struct net_bridge_vlan *v)
+{
+ u16 mstid;
+
+ mstid = br_vlan_mstid_get(v);
+
+ return br_vlan_mst_migrate(v, mstid);
+}
+
+static int br_vlan_mst_init(struct net_bridge_vlan *v)
+{
+ if (br_vlan_is_master(v))
+ return br_vlan_mst_init_master(v);
+ else
+ return br_vlan_mst_init_port(v);
+}
+
static bool __vlan_add_pvid(struct net_bridge_vlan_group *vg,
const struct net_bridge_vlan *v)
{
@@ -41,7 +222,7 @@ static bool __vlan_add_pvid(struct net_bridge_vlan_group *vg,
return false;

smp_wmb();
- br_vlan_set_pvid_state(vg, v->state);
+ br_vlan_set_pvid_state(vg, br_vlan_get_state_rtnl(v));
vg->pvid = v->vid;

return true;
@@ -301,13 +482,14 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags,
vg->num_vlans++;
}

- /* set the state before publishing */
- v->state = BR_STATE_FORWARDING;
+ err = br_vlan_mst_init(v);
+ if (err)
+ goto out_fdb_insert;

err = rhashtable_lookup_insert_fast(&vg->vlan_hash, &v->vnode,
br_vlan_rht_params);
if (err)
- goto out_fdb_insert;
+ goto out_mst_init;

__vlan_add_list(v);
__vlan_add_flags(v, flags);
@@ -318,6 +500,9 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags,
out:
return err;

+out_mst_init:
+ br_vlan_mst_put(v);
+
out_fdb_insert:
if (br_vlan_should_use(v)) {
br_fdb_find_delete_local(br, p, dev->dev_addr, v->vid);
@@ -385,6 +570,7 @@ static int __vlan_del(struct net_bridge_vlan *v)
call_rcu(&v->rcu, nbp_vlan_rcu_free);
}

+ br_vlan_mst_put(v);
br_vlan_put_master(masterv);
out:
return err;
@@ -578,7 +764,7 @@ static bool __allowed_ingress(const struct net_bridge *br,
goto drop;

if (*state == BR_STATE_FORWARDING) {
- *state = br_vlan_get_state(v);
+ *state = br_vlan_get_state_rcu(v);
if (!br_vlan_state_allowed(*state, true))
goto drop;
}
@@ -631,7 +817,7 @@ bool br_allowed_egress(struct net_bridge_vlan_group *vg,
br_vlan_get_tag(skb, &vid);
v = br_vlan_find(vg, vid);
if (v && br_vlan_should_use(v) &&
- br_vlan_state_allowed(br_vlan_get_state(v), false))
+ br_vlan_state_allowed(br_vlan_get_state_rcu(v), false))
return true;

return false;
@@ -665,7 +851,7 @@ bool br_should_learn(struct net_bridge_port *p, struct sk_buff *skb, u16 *vid)
}

v = br_vlan_find(vg, *vid);
- if (v && br_vlan_state_allowed(br_vlan_get_state(v), true))
+ if (v && br_vlan_state_allowed(br_vlan_get_state_rcu(v), true))
return true;

return false;
diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
index a6382973b3e7..0b1099709d4b 100644
--- a/net/bridge/br_vlan_options.c
+++ b/net/bridge/br_vlan_options.c
@@ -43,14 +43,14 @@ bool br_vlan_opts_eq_range(const struct net_bridge_vlan *v_curr,
u8 range_mc_rtr = br_vlan_multicast_router(range_end);
u8 curr_mc_rtr = br_vlan_multicast_router(v_curr);

- return v_curr->state == range_end->state &&
+ return br_vlan_get_state_rtnl(v_curr) == br_vlan_get_state_rtnl(range_end) &&
__vlan_tun_can_enter_range(v_curr, range_end) &&
curr_mc_rtr == range_mc_rtr;
}

bool br_vlan_opts_fill(struct sk_buff *skb, const struct net_bridge_vlan *v)
{
- if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state(v)) ||
+ if (nla_put_u8(skb, BRIDGE_VLANDB_ENTRY_STATE, br_vlan_get_state_rtnl(v)) ||
!__vlan_tun_put(skb, v))
return false;

@@ -99,7 +99,7 @@ static int br_vlan_modify_state(struct net_bridge_vlan_group *vg,
return -EBUSY;
}

- if (v->state == state)
+ if (br_vlan_get_state_rtnl(v) == state)
return 0;

if (v->vid == br_get_pvid(vg))
@@ -294,7 +294,8 @@ bool br_vlan_global_opts_can_enter_range(const struct net_bridge_vlan *v_curr,
((v_curr->priv_flags ^ r_end->priv_flags) &
BR_VLFLAG_GLOBAL_MCAST_ENABLED) == 0 &&
br_multicast_ctx_options_equal(&v_curr->br_mcast_ctx,
- &r_end->br_mcast_ctx);
+ &r_end->br_mcast_ctx) &&
+ br_vlan_mstid_get(v_curr) == br_vlan_mstid_get(r_end);
}

bool br_vlan_global_opts_fill(struct sk_buff *skb, u16 vid, u16 vid_range,
--
2.25.1

2022-02-16 14:09:31

by Tobias Waldekranz

Subject: [RFC net-next 9/9] net: dsa: mv88e6xxx: MST Offloading

Allocate a SID in the STU for each MSTID in use by a bridge and handle
the mapping of MSTIDs to VLANs using the SID field of each VTU entry.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.c | 169 +++++++++++++++++++++++++++++++
drivers/net/dsa/mv88e6xxx/chip.h | 13 +++
2 files changed, 182 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 39cf1bae161e..7d9ef041252d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1817,6 +1817,128 @@ static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
return mv88e6xxx_stu_loadpurge(chip, &stu);
}

+static int mv88e6xxx_sid_new(struct mv88e6xxx_chip *chip, u8 *sid)
+{
+ DECLARE_BITMAP(busy, MV88E6XXX_N_SID) = { 0 };
+ struct mv88e6xxx_mst *mst;
+
+ set_bit(0, busy);
+
+ list_for_each_entry(mst, &chip->msts, node) {
+ set_bit(mst->stu.sid, busy);
+ }
+
+ *sid = find_first_zero_bit(busy, MV88E6XXX_N_SID);
+
+ return (*sid >= mv88e6xxx_max_sid(chip)) ? -ENOSPC : 0;
+}
+
+static int mv88e6xxx_sid_put(struct mv88e6xxx_chip *chip, u8 sid)
+{
+ struct mv88e6xxx_mst *mst, *tmp;
+ int err = 0;
+
+ list_for_each_entry_safe(mst, tmp, &chip->msts, node) {
+ if (mst->stu.sid == sid) {
+ if (refcount_dec_and_test(&mst->refcnt)) {
+ mst->stu.valid = false;
+ err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
+ list_del(&mst->node);
+ kfree(mst);
+ }
+
+ return err;
+ }
+ }
+
+ return -ENOENT;
+}
+
+static int mv88e6xxx_sid_get(struct mv88e6xxx_chip *chip, struct net_device *br,
+ u16 mstid, u8 *sid)
+{
+ struct mv88e6xxx_mst *mst;
+ int err;
+
+ if (!br)
+ return 0;
+
+ if (!mv88e6xxx_has_stu(chip))
+ return -EOPNOTSUPP;
+
+ list_for_each_entry(mst, &chip->msts, node) {
+ if (mst->br == br && mst->mstid == mstid) {
+ refcount_inc(&mst->refcnt);
+ *sid = mst->stu.sid;
+ return 0;
+ }
+ }
+
+ err = mv88e6xxx_sid_new(chip, sid);
+ if (err)
+ return err;
+
+ mst = kzalloc(sizeof(*mst), GFP_KERNEL);
+ if (!mst)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&mst->node);
+ refcount_set(&mst->refcnt, 1);
+ mst->br = br;
+ mst->mstid = mstid;
+ mst->stu.valid = true;
+ mst->stu.sid = *sid;
+ list_add_tail(&mst->node, &chip->msts);
+ return mv88e6xxx_stu_loadpurge(chip, &mst->stu);
+}
+
+static int mv88e6xxx_port_mst_state_set(struct dsa_switch *ds, int port,
+ const struct switchdev_mst_state *st)
+{
+ struct dsa_port *dp = dsa_to_port(ds, port);
+ struct mv88e6xxx_chip *chip = ds->priv;
+ struct mv88e6xxx_mst *mst;
+ u8 state;
+ int err;
+
+ if (!mv88e6xxx_has_stu(chip))
+ return -EOPNOTSUPP;
+
+ switch (st->state) {
+ case BR_STATE_DISABLED:
+ state = MV88E6XXX_PORT_CTL0_STATE_DISABLED;
+ break;
+ case BR_STATE_BLOCKING:
+ case BR_STATE_LISTENING:
+ state = MV88E6XXX_PORT_CTL0_STATE_BLOCKING;
+ break;
+ case BR_STATE_LEARNING:
+ state = MV88E6XXX_PORT_CTL0_STATE_LEARNING;
+ break;
+ case BR_STATE_FORWARDING:
+ state = MV88E6XXX_PORT_CTL0_STATE_FORWARDING;
+ break;
+ default:
+ return -EINVAL;
+ }
+
+ list_for_each_entry(mst, &chip->msts, node) {
+ if (mst->br == dsa_port_bridge_dev_get(dp) &&
+ mst->mstid == st->mstid) {
+ if (mst->stu.state[port] == state)
+ return 0;
+
+ mst->stu.state[port] = state;
+ mv88e6xxx_reg_lock(chip);
+ err = mv88e6xxx_stu_loadpurge(chip, &mst->stu);
+ mv88e6xxx_reg_unlock(chip);
+ return err;
+ }
+ }
+
+ return -ENOENT;
+}
+
static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
u16 vid)
{
@@ -2436,6 +2558,12 @@ static int mv88e6xxx_port_vlan_leave(struct mv88e6xxx_chip *chip,
if (err)
return err;

+ if (!vlan.valid) {
+ err = mv88e6xxx_sid_put(chip, vlan.sid);
+ if (err)
+ return err;
+ }
+
return mv88e6xxx_g1_atu_remove(chip, vlan.fid, port, false);
}

@@ -2474,6 +2602,44 @@ static int mv88e6xxx_port_vlan_del(struct dsa_switch *ds, int port,
return err;
}

+static int mv88e6xxx_vlan_mstid_set(struct dsa_switch *ds,
+ const struct switchdev_attr *attr)
+{
+ const struct switchdev_vlan_attr *vattr = &attr->u.vlan_attr;
+ struct mv88e6xxx_chip *chip = ds->priv;
+ struct mv88e6xxx_vtu_entry vlan;
+ u8 new_sid;
+ int err;
+
+ mv88e6xxx_reg_lock(chip);
+
+ err = mv88e6xxx_vtu_get(chip, vattr->vid, &vlan);
+ if (err)
+ goto unlock;
+
+ if (!vlan.valid) {
+ err = -EINVAL;
+ goto unlock;
+ }
+
+ err = mv88e6xxx_sid_get(chip, attr->orig_dev, vattr->mstid, &new_sid);
+ if (err)
+ goto unlock;
+
+ if (vlan.sid) {
+ err = mv88e6xxx_sid_put(chip, vlan.sid);
+ if (err)
+ goto unlock;
+ }
+
+ vlan.sid = new_sid;
+ err = mv88e6xxx_vtu_loadpurge(chip, &vlan);
+
+unlock:
+ mv88e6xxx_reg_unlock(chip);
+ return err;
+}
+
static int mv88e6xxx_port_fdb_add(struct dsa_switch *ds, int port,
const unsigned char *addr, u16 vid)
{
@@ -5996,6 +6162,7 @@ static struct mv88e6xxx_chip *mv88e6xxx_alloc_chip(struct device *dev)
mutex_init(&chip->reg_lock);
INIT_LIST_HEAD(&chip->mdios);
idr_init(&chip->policies);
+ INIT_LIST_HEAD(&chip->msts);

return chip;
}
@@ -6518,10 +6685,12 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.port_pre_bridge_flags = mv88e6xxx_port_pre_bridge_flags,
.port_bridge_flags = mv88e6xxx_port_bridge_flags,
.port_stp_state_set = mv88e6xxx_port_stp_state_set,
+ .port_mst_state_set = mv88e6xxx_port_mst_state_set,
.port_fast_age = mv88e6xxx_port_fast_age,
.port_vlan_filtering = mv88e6xxx_port_vlan_filtering,
.port_vlan_add = mv88e6xxx_port_vlan_add,
.port_vlan_del = mv88e6xxx_port_vlan_del,
+ .vlan_mstid_set = mv88e6xxx_vlan_mstid_set,
.port_fdb_add = mv88e6xxx_port_fdb_add,
.port_fdb_del = mv88e6xxx_port_fdb_del,
.port_fdb_dump = mv88e6xxx_port_fdb_dump,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 6d4daa24d3e5..af0f53b65689 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -297,6 +297,16 @@ struct mv88e6xxx_region_priv {
enum mv88e6xxx_region_id id;
};

+struct mv88e6xxx_mst {
+ struct list_head node;
+
+ refcount_t refcnt;
+ struct net_device *br;
+ u16 mstid;
+
+ struct mv88e6xxx_stu_entry stu;
+};
+
struct mv88e6xxx_chip {
const struct mv88e6xxx_info *info;

@@ -397,6 +407,9 @@ struct mv88e6xxx_chip {

/* devlink regions */
struct devlink_region *regions[_MV88E6XXX_REGION_MAX];
+
+ /* Bridge MST to SID mappings */
+ struct list_head msts;
};

struct mv88e6xxx_bus_ops {
--
2.25.1

2022-02-16 14:20:30

by Tobias Waldekranz

Subject: [RFC net-next 2/9] net: bridge: vlan: Allow multiple VLANs to be mapped to a single MST

Allow a VLAN to change its MSTID. In particular, allow multiple VLANs
to use the same MSTID. This is a global VLAN setting, i.e. any VLANs
bound to the same MSTID will share their per-VLAN STP states on all
bridge ports.

Example:

By default, each VLAN is placed in a separate MSTID:

root@coronet:~# ip link add dev br0 type bridge vlan_filtering 1
root@coronet:~# ip link set dev eth1 master br0
root@coronet:~# bridge vlan add dev eth1 vid 2 pvid untagged
root@coronet:~# bridge vlan add dev eth1 vid 3
root@coronet:~# bridge vlan global
port vlan-id
br0 1
mcast_snooping 1 mca<redacted>_interval 1000 mstid 1
2
mcast_snooping 1 mca<redacted>_interval 1000 mstid 2
3
mcast_snooping 1 mca<redacted>_interval 1000 mstid 3

Once two or more VLANs are bound to the same MSTID, their states move
in lockstep, independent of which VID is used to access the state:

root@coronet:~# bridge vlan global set dev br0 vid 2 mstid 10
root@coronet:~# bridge vlan global set dev br0 vid 3 mstid 10
root@coronet:~# bridge -d vlan global
port vlan-id
br0 1
mcast_snooping 1 mca<redacted>_interval 1000 mstid 1
2-3
mcast_snooping 1 mca<redacted>_interval 1000 mstid 10

root@coronet:~# bridge vlan set dev eth1 vid 2 state blocking
root@coronet:~# bridge -d vlan
port vlan-id
eth1 1 Egress Untagged
state forwarding mcast_router 1
2 PVID Egress Untagged
state blocking mcast_router 1
3
state blocking mcast_router 1
br0 1 PVID Egress Untagged
state forwarding mcast_router 1
root@coronet:~# bridge vlan set dev eth1 vid 3 state forwarding
root@coronet:~# bridge -d vlan
port vlan-id
eth1 1 Egress Untagged
state forwarding mcast_router 1
2 PVID Egress Untagged
state forwarding mcast_router 1
3
state forwarding mcast_router 1
br0 1 PVID Egress Untagged
state forwarding mcast_router 1

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/uapi/linux/if_bridge.h | 1 +
net/bridge/br_private.h | 3 ++
net/bridge/br_vlan.c | 53 ++++++++++++++++++++++++++--------
net/bridge/br_vlan_options.c | 17 ++++++++++-
4 files changed, 61 insertions(+), 13 deletions(-)

diff --git a/include/uapi/linux/if_bridge.h b/include/uapi/linux/if_bridge.h
index 2711c3522010..4a971b419d9f 100644
--- a/include/uapi/linux/if_bridge.h
+++ b/include/uapi/linux/if_bridge.h
@@ -564,6 +564,7 @@ enum {
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER,
BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS,
BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_STATE,
+ BRIDGE_VLANDB_GOPTS_MSTID,
__BRIDGE_VLANDB_GOPTS_MAX
};
#define BRIDGE_VLANDB_GOPTS_MAX (__BRIDGE_VLANDB_GOPTS_MAX - 1)
diff --git a/net/bridge/br_private.h b/net/bridge/br_private.h
index 7781e7a4449b..5b121cf7aabe 100644
--- a/net/bridge/br_private.h
+++ b/net/bridge/br_private.h
@@ -1759,6 +1759,9 @@ static inline void br_vlan_set_state(struct net_bridge_vlan *v, u8 state)
mst->state = state;
}

+u16 br_vlan_mstid_get(const struct net_bridge_vlan *v);
+int br_vlan_mstid_set(struct net_bridge_vlan *v, u16 mstid);
+
static inline u8 br_vlan_get_pvid_state(const struct net_bridge_vlan_group *vg)
{
return READ_ONCE(vg->pvid_state);
diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index b0383ec6cc91..459e84a7354d 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -41,10 +41,8 @@ static void br_vlan_mst_rcu_free(struct rcu_head *rcu)
kfree(mst);
}

-static void br_vlan_mst_put(struct net_bridge_vlan *v)
+static void br_vlan_mst_put(struct br_vlan_mst *mst)
{
- struct br_vlan_mst *mst = rtnl_dereference(v->mst);
-
if (refcount_dec_and_test(&mst->refcnt))
call_rcu(&mst->rcu, br_vlan_mst_rcu_free);
}
@@ -153,13 +151,17 @@ static struct br_vlan_mst *br_vlan_group_mst_get(struct net_bridge_vlan_group *v

static int br_vlan_mst_migrate(struct net_bridge_vlan *v, u16 mstid)
{
+ struct br_vlan_mst *mst, *old_mst;
struct net_bridge_vlan_group *vg;
- struct br_vlan_mst *mst;
+ struct net_bridge *br;

- if (br_vlan_is_master(v))
- vg = br_vlan_group(v->br);
- else
+ if (br_vlan_is_master(v)) {
+ br = v->br;
+ vg = br_vlan_group(br);
+ } else {
+ br = v->port->br;
vg = nbp_vlan_group(v->port);
+ }

mst = br_vlan_group_mst_get(vg, mstid);
if (!mst) {
@@ -168,10 +170,37 @@ static int br_vlan_mst_migrate(struct net_bridge_vlan *v, u16 mstid)
return -ENOMEM;
}

- if (rtnl_dereference(v->mst))
- br_vlan_mst_put(v);
-
+ old_mst = rtnl_dereference(v->mst);
rcu_assign_pointer(v->mst, mst);
+
+ if (old_mst)
+ br_vlan_mst_put(old_mst);
+
+ return 0;
+}
+
+int br_vlan_mstid_set(struct net_bridge_vlan *v, u16 mstid)
+{
+ struct net_bridge *br = v->br;
+ struct net_bridge_port *p;
+ int err;
+
+ err = br_vlan_mst_migrate(v, mstid);
+ if (err)
+ return err;
+
+ list_for_each_entry(p, &br->port_list, list) {
+ struct net_bridge_vlan_group *vg = nbp_vlan_group(p);
+ struct net_bridge_vlan *portv;
+
+ portv = br_vlan_lookup(&vg->vlan_hash, v->vid);
+ if (!portv)
+ continue;
+
+ err = br_vlan_mst_migrate(portv, mstid);
+ if (err)
+ return err;
+ }
return 0;
}

@@ -501,7 +530,7 @@ static int __vlan_add(struct net_bridge_vlan *v, u16 flags,
return err;

out_mst_init:
- br_vlan_mst_put(v);
+ br_vlan_mst_put(rtnl_dereference(v->mst));

out_fdb_insert:
if (br_vlan_should_use(v)) {
@@ -570,7 +599,7 @@ static int __vlan_del(struct net_bridge_vlan *v)
call_rcu(&v->rcu, nbp_vlan_rcu_free);
}

- br_vlan_mst_put(v);
+ br_vlan_mst_put(rtnl_dereference(v->mst));
br_vlan_put_master(masterv);
out:
return err;
diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
index 0b1099709d4b..1c0fd55fe6c9 100644
--- a/net/bridge/br_vlan_options.c
+++ b/net/bridge/br_vlan_options.c
@@ -380,6 +380,9 @@ bool br_vlan_global_opts_fill(struct sk_buff *skb, u16 vid, u16 vid_range,
#endif
#endif

+ if (nla_put_u16(skb, BRIDGE_VLANDB_GOPTS_MSTID, br_vlan_mstid_get(v_opts)))
+ goto out_err;
+
nla_nest_end(skb, nest);

return true;
@@ -411,7 +414,9 @@ static size_t rtnl_vlan_global_opts_nlmsg_size(const struct net_bridge_vlan *v)
+ nla_total_size(0) /* BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS */
+ br_rports_size(&v->br_mcast_ctx) /* BRIDGE_VLANDB_GOPTS_MCAST_ROUTER_PORTS */
#endif
- + nla_total_size(sizeof(u16)); /* BRIDGE_VLANDB_GOPTS_RANGE */
+ + nla_total_size(sizeof(u16)) /* BRIDGE_VLANDB_GOPTS_RANGE */
+ + nla_total_size(sizeof(u16)) /* BRIDGE_VLANDB_GOPTS_MSTID */
+ + 0;
}

static void br_vlan_global_opts_notify(const struct net_bridge *br,
@@ -560,6 +565,15 @@ static int br_vlan_process_global_one_opts(const struct net_bridge *br,
}
#endif
#endif
+ if (tb[BRIDGE_VLANDB_GOPTS_MSTID]) {
+ u16 mstid;
+
+ mstid = nla_get_u16(tb[BRIDGE_VLANDB_GOPTS_MSTID]);
+ err = br_vlan_mstid_set(v, mstid);
+ if (err)
+ return err;
+ *changed = true;
+ }

return 0;
}
@@ -579,6 +593,7 @@ static const struct nla_policy br_vlan_db_gpol[BRIDGE_VLANDB_GOPTS_MAX + 1] = {
[BRIDGE_VLANDB_GOPTS_MCAST_QUERIER_INTVL] = { .type = NLA_U64 },
[BRIDGE_VLANDB_GOPTS_MCAST_STARTUP_QUERY_INTVL] = { .type = NLA_U64 },
[BRIDGE_VLANDB_GOPTS_MCAST_QUERY_RESPONSE_INTVL] = { .type = NLA_U64 },
+ [BRIDGE_VLANDB_GOPTS_MSTID] = NLA_POLICY_RANGE(NLA_U16, 1, 4094),
};

int br_vlan_rtm_process_global_options(struct net_device *dev,
--
2.25.1

2022-02-16 14:35:25

by Tobias Waldekranz

Subject: [RFC net-next 8/9] net: dsa: mv88e6xxx: Export STU as devlink region

Export the raw STU data in a devlink region so that it can be
inspected from userspace and compared to the current bridge
configuration.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.h | 1 +
drivers/net/dsa/mv88e6xxx/devlink.c | 94 +++++++++++++++++++++++++++++
2 files changed, 95 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index be654be69982..6d4daa24d3e5 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -287,6 +287,7 @@ enum mv88e6xxx_region_id {
MV88E6XXX_REGION_GLOBAL2,
MV88E6XXX_REGION_ATU,
MV88E6XXX_REGION_VTU,
+ MV88E6XXX_REGION_STU,
MV88E6XXX_REGION_PVT,

_MV88E6XXX_REGION_MAX,
diff --git a/drivers/net/dsa/mv88e6xxx/devlink.c b/drivers/net/dsa/mv88e6xxx/devlink.c
index 381068395c63..1266eabee086 100644
--- a/drivers/net/dsa/mv88e6xxx/devlink.c
+++ b/drivers/net/dsa/mv88e6xxx/devlink.c
@@ -503,6 +503,85 @@ static int mv88e6xxx_region_vtu_snapshot(struct devlink *dl,
return 0;
}

+/**
+ * struct mv88e6xxx_devlink_stu_entry - Devlink STU entry
+ * @sid: Global1/3: SID, unknown filters and learning.
+ * @vid: Global1/6: Valid bit.
+ * @data: Global1/7-9: Membership data and priority override.
+ * @resvd: Reserved. In case we forgot something.
+ *
+ * The STU entry format varies between chipset generations. Peridot
+ * and Amethyst packs the STU data into Global1/7-8. Older silicon
+ * spreads the information across all three VTU data registers -
+ * inheriting the layout of even older hardware that had no STU at
+ * all. Since this is a low-level debug interface, copy all data
+ * verbatim and defer parsing to the consumer.
+ */
+struct mv88e6xxx_devlink_stu_entry {
+ u16 sid;
+ u16 vid;
+ u16 data[3];
+ u16 resvd;
+};
+
+static int mv88e6xxx_region_stu_snapshot(struct devlink *dl,
+ const struct devlink_region_ops *ops,
+ struct netlink_ext_ack *extack,
+ u8 **data)
+{
+ struct mv88e6xxx_devlink_stu_entry *table, *entry;
+ struct dsa_switch *ds = dsa_devlink_to_ds(dl);
+ struct mv88e6xxx_chip *chip = ds->priv;
+ struct mv88e6xxx_stu_entry stu;
+ int err;
+
+ table = kcalloc(mv88e6xxx_max_sid(chip) + 1,
+ sizeof(struct mv88e6xxx_devlink_stu_entry),
+ GFP_KERNEL);
+ if (!table)
+ return -ENOMEM;
+
+ entry = table;
+ stu.sid = mv88e6xxx_max_sid(chip);
+ stu.valid = false;
+
+ mv88e6xxx_reg_lock(chip);
+
+ do {
+ err = mv88e6xxx_g1_stu_getnext(chip, &stu);
+ if (err)
+ break;
+
+ if (!stu.valid)
+ break;
+
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6352_G1_VTU_SID,
+ &entry->sid);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_VID,
+ &entry->vid);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_DATA1,
+ &entry->data[0]);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_DATA2,
+ &entry->data[1]);
+ err = err ? : mv88e6xxx_g1_read(chip, MV88E6XXX_G1_VTU_DATA3,
+ &entry->data[2]);
+ if (err)
+ break;
+
+ entry++;
+ } while (stu.sid < mv88e6xxx_max_sid(chip));
+
+ mv88e6xxx_reg_unlock(chip);
+
+ if (err) {
+ kfree(table);
+ return err;
+ }
+
+ *data = (u8 *)table;
+ return 0;
+}
+
static int mv88e6xxx_region_pvt_snapshot(struct devlink *dl,
const struct devlink_region_ops *ops,
struct netlink_ext_ack *extack,
@@ -605,6 +684,12 @@ static struct devlink_region_ops mv88e6xxx_region_vtu_ops = {
.destructor = kfree,
};

+static struct devlink_region_ops mv88e6xxx_region_stu_ops = {
+ .name = "stu",
+ .snapshot = mv88e6xxx_region_stu_snapshot,
+ .destructor = kfree,
+};
+
static struct devlink_region_ops mv88e6xxx_region_pvt_ops = {
.name = "pvt",
.snapshot = mv88e6xxx_region_pvt_snapshot,
@@ -640,6 +725,11 @@ static struct mv88e6xxx_region mv88e6xxx_regions[] = {
.ops = &mv88e6xxx_region_vtu_ops
/* calculated at runtime */
},
+ [MV88E6XXX_REGION_STU] = {
+ .ops = &mv88e6xxx_region_stu_ops,
+ .cond = mv88e6xxx_has_stu,
+ /* calculated at runtime */
+ },
[MV88E6XXX_REGION_PVT] = {
.ops = &mv88e6xxx_region_pvt_ops,
.size = MV88E6XXX_MAX_PVT_ENTRIES * sizeof(u16),
@@ -706,6 +796,10 @@ int mv88e6xxx_setup_devlink_regions_global(struct dsa_switch *ds)
size = (mv88e6xxx_max_vid(chip) + 1) *
sizeof(struct mv88e6xxx_devlink_vtu_entry);
break;
+ case MV88E6XXX_REGION_STU:
+ size = (mv88e6xxx_max_sid(chip) + 1) *
+ sizeof(struct mv88e6xxx_devlink_stu_entry);
+ break;
}

region = dsa_devlink_region_create(ds, ops, 1, size);
--
2.25.1

2022-02-16 14:37:32

by Tobias Waldekranz

Subject: [RFC net-next 6/9] net: dsa: Pass MST state changes to driver

Add the usual trampoline functionality from the generic DSA layer down
to the drivers for MST state changes.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/net/dsa.h | 2 ++
net/dsa/dsa_priv.h | 2 ++
net/dsa/port.c | 30 ++++++++++++++++++++++++++++++
net/dsa/slave.c | 6 ++++++
4 files changed, 40 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 2aabe7f0b176..f030afb68f46 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -882,6 +882,8 @@ struct dsa_switch_ops {
struct dsa_bridge bridge);
void (*port_stp_state_set)(struct dsa_switch *ds, int port,
u8 state);
+ int (*port_mst_state_set)(struct dsa_switch *ds, int port,
+ const struct switchdev_mst_state *state);
void (*port_fast_age)(struct dsa_switch *ds, int port);
int (*port_pre_bridge_flags)(struct dsa_switch *ds, int port,
struct switchdev_brport_flags flags,
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 43709c005461..96d09184de5d 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -185,6 +185,8 @@ static inline struct net_device *dsa_master_find_slave(struct net_device *dev,
void dsa_port_set_tag_protocol(struct dsa_port *cpu_dp,
const struct dsa_device_ops *tag_ops);
int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age);
+int dsa_port_set_mst_state(struct dsa_port *dp,
+ const struct switchdev_mst_state *state);
int dsa_port_enable_rt(struct dsa_port *dp, struct phy_device *phy);
int dsa_port_enable(struct dsa_port *dp, struct phy_device *phy);
void dsa_port_disable_rt(struct dsa_port *dp);
diff --git a/net/dsa/port.c b/net/dsa/port.c
index 4fb2bf2383d9..a34779658f17 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -108,6 +108,36 @@ int dsa_port_set_state(struct dsa_port *dp, u8 state, bool do_fast_age)
return 0;
}

+int dsa_port_set_mst_state(struct dsa_port *dp,
+ const struct switchdev_mst_state *state)
+{
+ struct dsa_switch *ds = dp->ds;
+ int err, port = dp->index;
+
+ if (!ds->ops->port_mst_state_set)
+ return -EOPNOTSUPP;
+
+ err = ds->ops->port_mst_state_set(ds, port, state);
+ if (err)
+ return err;
+
+ if (!dsa_port_can_configure_learning(dp) || dp->learning) {
+ switch (state->state) {
+ case BR_STATE_DISABLED:
+ case BR_STATE_BLOCKING:
+ case BR_STATE_LISTENING:
+ /* Ideally we would only fast age entries
+ * belonging to VLANs controlled by this
+ * MST.
+ */
+ dsa_port_fast_age(dp);
+ break;
+ }
+ }
+
+ return 0;
+}
+
static void dsa_port_set_state_now(struct dsa_port *dp, u8 state,
bool do_fast_age)
{
diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 0a5e44105add..48075d697588 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -288,6 +288,12 @@ static int dsa_slave_port_attr_set(struct net_device *dev, const void *ctx,

ret = dsa_port_set_state(dp, attr->u.stp_state, true);
break;
+ case SWITCHDEV_ATTR_ID_PORT_MST_STATE:
+ if (!dsa_port_offloads_bridge_port(dp, attr->orig_dev))
+ return -EOPNOTSUPP;
+
+ ret = dsa_port_set_mst_state(dp, &attr->u.mst_state);
+ break;
case SWITCHDEV_ATTR_ID_BRIDGE_VLAN_FILTERING:
if (!dsa_port_offloads_bridge_dev(dp, attr->orig_dev))
return -EOPNOTSUPP;
--
2.25.1

2022-02-16 15:17:24

by Tobias Waldekranz

Subject: [RFC net-next 7/9] net: dsa: mv88e6xxx: Disentangle STU from VTU

In early LinkStreet silicon (e.g. 6095/6185), the per-VLAN STP states
were kept in the VTU - there was no concept of a SID. Later, the
information was split into two tables, where the VTU only tracked
memberships and deferred the STP state tracking to the STU via a
pointer (SID). This meant that a group of VLANs could share the same
STU entry. Most likely, this was done to align with MSTP (802.1Q-2018,
Clause 13), which is built on this principle.

While the VTU is still 4k lines on most devices, the STU is capped at
64 entries. This means that the current strategy, updating STU info
whenever a VTU entry is updated, cannot easily support MSTP because:

- The maximum number of VIDs would also be capped at 64, as we would
have to allocate one SID for every VTU entry - even if many VLANs
would effectively share the same MST.

- MSTP updates would be unnecessarily slow as you would have to
iterate over all VLANs that share the same MST.

In order to support MSTP offloading in the future, manage the STU as a
separate entity from the VTU.

Only add support for newer hardware with separate VTU and
STU. VTU-only devices can also be supported, but essentially this
requires a software implementation of an STU (fanning out state
changes to all VLANs tied to the same MST).

Signed-off-by: Tobias Waldekranz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/chip.c | 54 ++++
drivers/net/dsa/mv88e6xxx/chip.h | 24 ++
drivers/net/dsa/mv88e6xxx/global1.h | 10 +
drivers/net/dsa/mv88e6xxx/global1_vtu.c | 311 ++++++++++++++----------
4 files changed, 264 insertions(+), 135 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 34036c555977..39cf1bae161e 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1790,6 +1790,33 @@ static int mv88e6xxx_atu_new(struct mv88e6xxx_chip *chip, u16 *fid)
return mv88e6xxx_g1_atu_flush(chip, *fid, true);
}

+static int mv88e6xxx_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ if (!chip->info->ops->stu_loadpurge)
+ return -EOPNOTSUPP;
+
+ return chip->info->ops->stu_loadpurge(chip, entry);
+}
+
+static int mv88e6xxx_stu_setup(struct mv88e6xxx_chip *chip)
+{
+ struct mv88e6xxx_stu_entry stu = {
+ .valid = true,
+ .sid = 0
+ };
+
+ if (!mv88e6xxx_has_stu(chip))
+ return 0;
+
+ /* Make sure that SID 0 is always valid. This is used by VTU
+ * entries that do not make use of the STU, e.g. when creating
+ * a VLAN upper on a port that is also part of a VLAN
+ * filtering bridge.
+ */
+ return mv88e6xxx_stu_loadpurge(chip, &stu);
+}
+
static int mv88e6xxx_port_check_hw_vlan(struct dsa_switch *ds, int port,
u16 vid)
{
@@ -3415,6 +3442,13 @@ static int mv88e6xxx_setup(struct dsa_switch *ds)
if (err)
goto unlock;

+ /* Must be called after mv88e6xxx_vtu_setup (which flushes the
+ * VTU, thereby also flushing the STU).
+ */
+ err = mv88e6xxx_stu_setup(chip);
+ if (err)
+ goto unlock;
+
/* Setup Switch Port Registers */
for (i = 0; i < mv88e6xxx_num_ports(chip); i++) {
if (dsa_is_unused_port(ds, i))
@@ -3870,6 +3904,8 @@ static const struct mv88e6xxx_ops mv88e6097_ops = {
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
.phylink_get_caps = mv88e6095_phylink_get_caps,
+ .stu_getnext = mv88e6352_g1_stu_getnext,
+ .stu_loadpurge = mv88e6352_g1_stu_loadpurge,
.set_max_frame_size = mv88e6185_g1_set_max_frame_size,
};

@@ -4956,6 +4992,8 @@ static const struct mv88e6xxx_ops mv88e6352_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6352_g1_stu_getnext,
+ .stu_loadpurge = mv88e6352_g1_stu_loadpurge,
.serdes_get_lane = mv88e6352_serdes_get_lane,
.serdes_pcs_get_state = mv88e6352_serdes_pcs_get_state,
.serdes_pcs_config = mv88e6352_serdes_pcs_config,
@@ -5021,6 +5059,8 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6390_g1_stu_getnext,
+ .stu_loadpurge = mv88e6390_g1_stu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
.serdes_get_lane = mv88e6390_serdes_get_lane,
/* Check status register pause & lpa register */
@@ -5086,6 +5126,8 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6390_g1_stu_getnext,
+ .stu_loadpurge = mv88e6390_g1_stu_loadpurge,
.serdes_power = mv88e6390_serdes_power,
.serdes_get_lane = mv88e6390x_serdes_get_lane,
.serdes_pcs_get_state = mv88e6390_serdes_pcs_get_state,
@@ -5154,6 +5196,8 @@ static const struct mv88e6xxx_ops mv88e6393x_ops = {
.atu_set_hash = mv88e6165_g1_atu_set_hash,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+ .stu_getnext = mv88e6390_g1_stu_getnext,
+ .stu_loadpurge = mv88e6390_g1_stu_loadpurge,
.serdes_power = mv88e6393x_serdes_power,
.serdes_get_lane = mv88e6393x_serdes_get_lane,
.serdes_pcs_get_state = mv88e6393x_serdes_pcs_get_state,
@@ -5222,6 +5266,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11,
.num_internal_phys = 8,
.max_vid = 4095,
+ .max_sid = 63,
.port_base_addr = 0x10,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5475,6 +5520,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5498,6 +5544,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5520,6 +5567,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5542,6 +5590,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5564,6 +5613,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5803,6 +5853,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 5,
.num_gpio = 15,
.max_vid = 4095,
+ .max_sid = 63,
.port_base_addr = 0x10,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5827,6 +5878,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5851,6 +5903,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_internal_phys = 9,
.num_gpio = 16,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
@@ -5874,6 +5927,7 @@ static const struct mv88e6xxx_info mv88e6xxx_table[] = {
.num_ports = 11, /* 10 + Z80 */
.num_internal_phys = 9,
.max_vid = 8191,
+ .max_sid = 63,
.port_base_addr = 0x0,
.phy_base_addr = 0x0,
.global1_addr = 0x1b,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.h b/drivers/net/dsa/mv88e6xxx/chip.h
index 30b92a265613..be654be69982 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.h
+++ b/drivers/net/dsa/mv88e6xxx/chip.h
@@ -20,6 +20,7 @@

#define EDSA_HLEN 8
#define MV88E6XXX_N_FID 4096
+#define MV88E6XXX_N_SID 64

#define MV88E6XXX_FID_STANDALONE 0
#define MV88E6XXX_FID_BRIDGED 1
@@ -130,6 +131,7 @@ struct mv88e6xxx_info {
unsigned int num_internal_phys;
unsigned int num_gpio;
unsigned int max_vid;
+ unsigned int max_sid;
unsigned int port_base_addr;
unsigned int phy_base_addr;
unsigned int global1_addr;
@@ -181,6 +183,12 @@ struct mv88e6xxx_vtu_entry {
bool valid;
bool policy;
u8 member[DSA_MAX_PORTS];
+ u8 state[DSA_MAX_PORTS]; /* Older silicon has no STU */
+};
+
+struct mv88e6xxx_stu_entry {
+ u8 sid;
+ bool valid;
u8 state[DSA_MAX_PORTS];
};

@@ -602,6 +610,12 @@ struct mv88e6xxx_ops {
int (*vtu_loadpurge)(struct mv88e6xxx_chip *chip,
struct mv88e6xxx_vtu_entry *entry);

+ /* Spanning Tree Unit operations */
+ int (*stu_getnext)(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+ int (*stu_loadpurge)(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+
/* GPIO operations */
const struct mv88e6xxx_gpio_ops *gpio_ops;

@@ -700,6 +714,11 @@ struct mv88e6xxx_hw_stat {
int type;
};

+static inline bool mv88e6xxx_has_stu(struct mv88e6xxx_chip *chip)
+{
+ return chip->info->max_sid > 0;
+}
+
static inline bool mv88e6xxx_has_pvt(struct mv88e6xxx_chip *chip)
{
return chip->info->pvt;
@@ -730,6 +749,11 @@ static inline unsigned int mv88e6xxx_max_vid(struct mv88e6xxx_chip *chip)
return chip->info->max_vid;
}

+static inline unsigned int mv88e6xxx_max_sid(struct mv88e6xxx_chip *chip)
+{
+ return chip->info->max_sid;
+}
+
static inline u16 mv88e6xxx_port_mask(struct mv88e6xxx_chip *chip)
{
return GENMASK((s32)mv88e6xxx_num_ports(chip) - 1, 0);
diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h
index 2c1607c858a1..65958b2a0d3a 100644
--- a/drivers/net/dsa/mv88e6xxx/global1.h
+++ b/drivers/net/dsa/mv88e6xxx/global1.h
@@ -348,6 +348,16 @@ int mv88e6390_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
int mv88e6390_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
struct mv88e6xxx_vtu_entry *entry);
int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip);
+int mv88e6xxx_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6352_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6352_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6390_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
+int mv88e6390_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry);
int mv88e6xxx_g1_vtu_prob_irq_setup(struct mv88e6xxx_chip *chip);
void mv88e6xxx_g1_vtu_prob_irq_free(struct mv88e6xxx_chip *chip);
int mv88e6xxx_g1_atu_get_next(struct mv88e6xxx_chip *chip, u16 fid);
diff --git a/drivers/net/dsa/mv88e6xxx/global1_vtu.c b/drivers/net/dsa/mv88e6xxx/global1_vtu.c
index b1bd9274a562..38e18f5811bf 100644
--- a/drivers/net/dsa/mv88e6xxx/global1_vtu.c
+++ b/drivers/net/dsa/mv88e6xxx/global1_vtu.c
@@ -44,8 +44,7 @@ static int mv88e6xxx_g1_vtu_fid_write(struct mv88e6xxx_chip *chip,

/* Offset 0x03: VTU SID Register */

-static int mv88e6xxx_g1_vtu_sid_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+static int mv88e6xxx_g1_vtu_sid_read(struct mv88e6xxx_chip *chip, u8 *sid)
{
u16 val;
int err;
@@ -54,15 +53,14 @@ static int mv88e6xxx_g1_vtu_sid_read(struct mv88e6xxx_chip *chip,
if (err)
return err;

- entry->sid = val & MV88E6352_G1_VTU_SID_MASK;
+ *sid = val & MV88E6352_G1_VTU_SID_MASK;

return 0;
}

-static int mv88e6xxx_g1_vtu_sid_write(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+static int mv88e6xxx_g1_vtu_sid_write(struct mv88e6xxx_chip *chip, u8 sid)
{
- u16 val = entry->sid & MV88E6352_G1_VTU_SID_MASK;
+ u16 val = sid & MV88E6352_G1_VTU_SID_MASK;

return mv88e6xxx_g1_write(chip, MV88E6352_G1_VTU_SID, val);
}
@@ -91,7 +89,7 @@ static int mv88e6xxx_g1_vtu_op(struct mv88e6xxx_chip *chip, u16 op)
/* Offset 0x06: VTU VID Register */

static int mv88e6xxx_g1_vtu_vid_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ bool *valid, u16 *vid)
{
u16 val;
int err;
@@ -100,25 +98,28 @@ static int mv88e6xxx_g1_vtu_vid_read(struct mv88e6xxx_chip *chip,
if (err)
return err;

- entry->vid = val & 0xfff;
+ if (vid) {
+ *vid = val & 0xfff;

- if (val & MV88E6390_G1_VTU_VID_PAGE)
- entry->vid |= 0x1000;
+ if (val & MV88E6390_G1_VTU_VID_PAGE)
+ *vid |= 0x1000;
+ }

- entry->valid = !!(val & MV88E6XXX_G1_VTU_VID_VALID);
+ if (valid)
+ *valid = !!(val & MV88E6XXX_G1_VTU_VID_VALID);

return 0;
}

static int mv88e6xxx_g1_vtu_vid_write(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ bool valid, u16 vid)
{
- u16 val = entry->vid & 0xfff;
+ u16 val = vid & 0xfff;

- if (entry->vid & 0x1000)
+ if (vid & 0x1000)
val |= MV88E6390_G1_VTU_VID_PAGE;

- if (entry->valid)
+ if (valid)
val |= MV88E6XXX_G1_VTU_VID_VALID;

return mv88e6xxx_g1_write(chip, MV88E6XXX_G1_VTU_VID, val);
@@ -147,7 +148,7 @@ static int mv88e6185_g1_vtu_stu_data_read(struct mv88e6xxx_chip *chip,
}

static int mv88e6185_g1_vtu_data_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ u8 *member, u8 *state)
{
u16 regs[3];
int err;
@@ -160,36 +161,20 @@ static int mv88e6185_g1_vtu_data_read(struct mv88e6xxx_chip *chip,
/* Extract MemberTag data */
for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
unsigned int member_offset = (i % 4) * 4;
+ unsigned int state_offset = member_offset + 2;

- entry->member[i] = (regs[i / 4] >> member_offset) & 0x3;
- }
-
- return 0;
-}
-
-static int mv88e6185_g1_stu_data_read(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
-{
- u16 regs[3];
- int err;
- int i;
-
- err = mv88e6185_g1_vtu_stu_data_read(chip, regs);
- if (err)
- return err;
+ if (member)
+ member[i] = (regs[i / 4] >> member_offset) & 0x3;

- /* Extract PortState data */
- for (i = 0; i < mv88e6xxx_num_ports(chip); ++i) {
- unsigned int state_offset = (i % 4) * 4 + 2;
-
- entry->state[i] = (regs[i / 4] >> state_offset) & 0x3;
+ if (state)
+ state[i] = (regs[i / 4] >> state_offset) & 0x3;
}

return 0;
}

static int mv88e6185_g1_vtu_data_write(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
+ u8 *member, u8 *state)
{
u16 regs[3] = { 0 };
int i;
@@ -199,8 +184,11 @@ static int mv88e6185_g1_vtu_data_write(struct mv88e6xxx_chip *chip,
unsigned int member_offset = (i % 4) * 4;
unsigned int state_offset = member_offset + 2;

- regs[i / 4] |= (entry->member[i] & 0x3) << member_offset;
- regs[i / 4] |= (entry->state[i] & 0x3) << state_offset;
+ if (member)
+ regs[i / 4] |= (member[i] & 0x3) << member_offset;
+
+ if (state)
+ regs[i / 4] |= (state[i] & 0x3) << state_offset;
}

/* Write all 3 VTU/STU Data registers */
@@ -268,48 +256,6 @@ static int mv88e6390_g1_vtu_data_write(struct mv88e6xxx_chip *chip, u8 *data)

/* VLAN Translation Unit Operations */

-static int mv88e6xxx_g1_vtu_stu_getnext(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *entry)
-{
- int err;
-
- err = mv88e6xxx_g1_vtu_sid_write(chip, entry);
- if (err)
- return err;
-
- err = mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_GET_NEXT);
- if (err)
- return err;
-
- err = mv88e6xxx_g1_vtu_sid_read(chip, entry);
- if (err)
- return err;
-
- return mv88e6xxx_g1_vtu_vid_read(chip, entry);
-}
-
-static int mv88e6xxx_g1_vtu_stu_get(struct mv88e6xxx_chip *chip,
- struct mv88e6xxx_vtu_entry *vtu)
-{
- struct mv88e6xxx_vtu_entry stu;
- int err;
-
- err = mv88e6xxx_g1_vtu_sid_read(chip, vtu);
- if (err)
- return err;
-
- stu.sid = vtu->sid - 1;
-
- err = mv88e6xxx_g1_vtu_stu_getnext(chip, &stu);
- if (err)
- return err;
-
- if (stu.sid != vtu->sid || !stu.valid)
- return -EINVAL;
-
- return 0;
-}
-
int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
struct mv88e6xxx_vtu_entry *entry)
{
@@ -327,7 +273,7 @@ int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
* write the VID only once, when the entry is given as invalid.
*/
if (!entry->valid) {
- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, false, entry->vid);
if (err)
return err;
}
@@ -336,7 +282,7 @@ int mv88e6xxx_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
if (err)
return err;

- return mv88e6xxx_g1_vtu_vid_read(chip, entry);
+ return mv88e6xxx_g1_vtu_vid_read(chip, &entry->valid, &entry->vid);
}

int mv88e6185_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
@@ -350,11 +296,7 @@ int mv88e6185_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
return err;

if (entry->valid) {
- err = mv88e6185_g1_vtu_data_read(chip, entry);
- if (err)
- return err;
-
- err = mv88e6185_g1_stu_data_read(chip, entry);
+ err = mv88e6185_g1_vtu_data_read(chip, entry->member, entry->state);
if (err)
return err;

@@ -384,7 +326,7 @@ int mv88e6352_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
return err;

if (entry->valid) {
- err = mv88e6185_g1_vtu_data_read(chip, entry);
+ err = mv88e6185_g1_vtu_data_read(chip, entry->member, NULL);
if (err)
return err;

@@ -392,12 +334,7 @@ int mv88e6352_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
if (err)
return err;

- /* Fetch VLAN PortState data from the STU */
- err = mv88e6xxx_g1_vtu_stu_get(chip, entry);
- if (err)
- return err;
-
- err = mv88e6185_g1_stu_data_read(chip, entry);
+ err = mv88e6xxx_g1_vtu_sid_read(chip, &entry->sid);
if (err)
return err;
}
@@ -420,16 +357,11 @@ int mv88e6390_g1_vtu_getnext(struct mv88e6xxx_chip *chip,
if (err)
return err;

- /* Fetch VLAN PortState data from the STU */
- err = mv88e6xxx_g1_vtu_stu_get(chip, entry);
- if (err)
- return err;
-
- err = mv88e6390_g1_vtu_data_read(chip, entry->state);
+ err = mv88e6xxx_g1_vtu_fid_read(chip, entry);
if (err)
return err;

- err = mv88e6xxx_g1_vtu_fid_read(chip, entry);
+ err = mv88e6xxx_g1_vtu_sid_read(chip, &entry->sid);
if (err)
return err;
}
@@ -447,12 +379,12 @@ int mv88e6185_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
if (err)
return err;

- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, entry->vid);
if (err)
return err;

if (entry->valid) {
- err = mv88e6185_g1_vtu_data_write(chip, entry);
+ err = mv88e6185_g1_vtu_data_write(chip, entry->member, entry->state);
if (err)
return err;

@@ -479,27 +411,21 @@ int mv88e6352_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
if (err)
return err;

- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, entry->vid);
if (err)
return err;

if (entry->valid) {
- /* Write MemberTag and PortState data */
- err = mv88e6185_g1_vtu_data_write(chip, entry);
- if (err)
- return err;
-
- err = mv88e6xxx_g1_vtu_sid_write(chip, entry);
+ /* Write MemberTag data */
+ err = mv88e6185_g1_vtu_data_write(chip, entry->member, NULL);
if (err)
return err;

- /* Load STU entry */
- err = mv88e6xxx_g1_vtu_op(chip,
- MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
+ err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
if (err)
return err;

- err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
if (err)
return err;
}
@@ -517,41 +443,113 @@ int mv88e6390_g1_vtu_loadpurge(struct mv88e6xxx_chip *chip,
if (err)
return err;

- err = mv88e6xxx_g1_vtu_vid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, entry->vid);
if (err)
return err;

if (entry->valid) {
- /* Write PortState data */
- err = mv88e6390_g1_vtu_data_write(chip, entry->state);
+ /* Write MemberTag data */
+ err = mv88e6390_g1_vtu_data_write(chip, entry->member);
if (err)
return err;

- err = mv88e6xxx_g1_vtu_sid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
if (err)
return err;

- /* Load STU entry */
- err = mv88e6xxx_g1_vtu_op(chip,
- MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
if (err)
return err;
+ }

- /* Write MemberTag data */
- err = mv88e6390_g1_vtu_data_write(chip, entry->member);
+ /* Load/Purge VTU entry */
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_VTU_LOAD_PURGE);
+}
+
+int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ err = mv88e6xxx_g1_vtu_op_wait(chip);
+ if (err)
+ return err;
+
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_FLUSH_ALL);
+}
+
+/* Spanning Tree Unit Operations */
+
+int mv88e6xxx_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_vtu_op_wait(chip);
+ if (err)
+ return err;
+
+ /* To get the next higher active SID, the STU GetNext operation can be
+ * started again without setting the SID registers since it already
+ * contains the last SID.
+ *
+ * To save a few hardware accesses and abstract this to the caller,
+ * write the SID only once, when the entry is given as invalid.
+ */
+ if (!entry->valid) {
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
if (err)
return err;
+ }

- err = mv88e6xxx_g1_vtu_fid_write(chip, entry);
+ err = mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_GET_NEXT);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_vid_read(chip, &entry->valid, NULL);
+ if (err)
+ return err;
+
+ if (entry->valid) {
+ err = mv88e6xxx_g1_vtu_sid_read(chip, &entry->sid);
if (err)
return err;
}

- /* Load/Purge VTU entry */
- return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_VTU_LOAD_PURGE);
+ return 0;
}

-int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip)
+int mv88e6352_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_stu_getnext(chip, entry);
+ if (err)
+ return err;
+
+ if (!entry->valid)
+ return 0;
+
+ return mv88e6185_g1_vtu_data_read(chip, NULL, entry->state);
+}
+
+int mv88e6390_g1_stu_getnext(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_stu_getnext(chip, entry);
+ if (err)
+ return err;
+
+ if (!entry->valid)
+ return 0;
+
+ return mv88e6390_g1_vtu_data_read(chip, entry->state);
+}
+
+int mv88e6352_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
{
int err;

@@ -559,16 +557,59 @@ int mv88e6xxx_g1_vtu_flush(struct mv88e6xxx_chip *chip)
if (err)
return err;

- return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_FLUSH_ALL);
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, 0);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
+ if (err)
+ return err;
+
+ if (entry->valid) {
+ err = mv88e6185_g1_vtu_data_write(chip, NULL, entry->state);
+ if (err)
+ return err;
+ }
+
+ /* Load/Purge STU entry */
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
+}
+
+int mv88e6390_g1_stu_loadpurge(struct mv88e6xxx_chip *chip,
+ struct mv88e6xxx_stu_entry *entry)
+{
+ int err;
+
+ err = mv88e6xxx_g1_vtu_op_wait(chip);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_vid_write(chip, entry->valid, 0);
+ if (err)
+ return err;
+
+ err = mv88e6xxx_g1_vtu_sid_write(chip, entry->sid);
+ if (err)
+ return err;
+
+ if (entry->valid) {
+ err = mv88e6390_g1_vtu_data_write(chip, entry->state);
+ if (err)
+ return err;
+ }
+
+ /* Load/Purge STU entry */
+ return mv88e6xxx_g1_vtu_op(chip, MV88E6XXX_G1_VTU_OP_STU_LOAD_PURGE);
}

+/* VTU Violation Management */
+
static irqreturn_t mv88e6xxx_g1_vtu_prob_irq_thread_fn(int irq, void *dev_id)
{
struct mv88e6xxx_chip *chip = dev_id;
- struct mv88e6xxx_vtu_entry entry;
+ u16 val, vid;
int spid;
int err;
- u16 val;

mv88e6xxx_reg_lock(chip);

@@ -580,7 +621,7 @@ static irqreturn_t mv88e6xxx_g1_vtu_prob_irq_thread_fn(int irq, void *dev_id)
if (err)
goto out;

- err = mv88e6xxx_g1_vtu_vid_read(chip, &entry);
+ err = mv88e6xxx_g1_vtu_vid_read(chip, NULL, &vid);
if (err)
goto out;

@@ -588,13 +629,13 @@ static irqreturn_t mv88e6xxx_g1_vtu_prob_irq_thread_fn(int irq, void *dev_id)

if (val & MV88E6XXX_G1_VTU_OP_MEMBER_VIOLATION) {
dev_err_ratelimited(chip->dev, "VTU member violation for vid %d, source port %d\n",
- entry.vid, spid);
+ vid, spid);
chip->ports[spid].vtu_member_violation++;
}

if (val & MV88E6XXX_G1_VTU_OP_MISS_VIOLATION) {
dev_dbg_ratelimited(chip->dev, "VTU miss violation for vid %d, source port %d\n",
- entry.vid, spid);
+ vid, spid);
chip->ports[spid].vtu_miss_violation++;
}

--
2.25.1
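
The STU load/purge sequence in the patch above programs one entry per SID (spanning-tree ID), and as the cover letter notes, the SID pool is small (64 on mv88e6xxx) while the VID space is 4k. As a rough userspace model of that constraint — the names and allocator are hypothetical, not the driver's actual code — a driver mapping the bridge's MSTIDs onto the limited SID pool might look like:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define NUM_SIDS 64 /* mv88e6xxx: 64 STU entries; SID 0 reserved for the CST */

struct sid_pool {
	unsigned short mstid[NUM_SIDS]; /* 0 means the slot is free */
};

/* Return the SID already bound to @mstid, or bind a free one.
 * -ENOSPC models the hardware pool running out. */
static int sid_get(struct sid_pool *p, unsigned short mstid)
{
	int i, free = -1;

	for (i = 1; i < NUM_SIDS; i++) {
		if (p->mstid[i] == mstid)
			return i;
		if (free < 0 && !p->mstid[i])
			free = i;
	}
	if (free < 0)
		return -ENOSPC;
	p->mstid[free] = mstid;
	return free;
}
```

Because multiple VLANs share one MSTID, repeated lookups for the same MSTID reuse the same SID, so the 4k VIDs only exhaust the pool if more than 63 distinct trees are configured.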

2022-02-16 15:43:01

by Tobias Waldekranz

[permalink] [raw]
Subject: [RFC net-next 4/9] net: bridge: vlan: Notify switchdev drivers of MST state changes

Generate a switchdev notification whenever a per-VLAN STP state
changes. The notification is keyed by the VLAN's MSTID rather than its
VID, since multiple VLANs may share the same MST instance.

Signed-off-by: Tobias Waldekranz <[email protected]>
---
include/net/switchdev.h | 7 +++++++
net/bridge/br_vlan_options.c | 22 ++++++++++++++++++++--
2 files changed, 27 insertions(+), 2 deletions(-)

diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index ee4a7bd1e540..0a3e0e0bb10a 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -19,6 +19,7 @@
enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_UNDEFINED,
SWITCHDEV_ATTR_ID_PORT_STP_STATE,
+ SWITCHDEV_ATTR_ID_PORT_MST_STATE,
SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS,
SWITCHDEV_ATTR_ID_PORT_MROUTER,
@@ -31,6 +32,11 @@ enum switchdev_attr_id {
SWITCHDEV_ATTR_ID_VLAN_MSTID,
};

+struct switchdev_mst_state {
+ u16 mstid;
+ u8 state;
+};
+
struct switchdev_brport_flags {
unsigned long val;
unsigned long mask;
@@ -52,6 +58,7 @@ struct switchdev_attr {
void (*complete)(struct net_device *dev, int err, void *priv);
union {
u8 stp_state; /* PORT_STP_STATE */
+ struct switchdev_mst_state mst_state; /* PORT_MST_STATE */
struct switchdev_brport_flags brport_flags; /* PORT_BRIDGE_FLAGS */
bool mrouter; /* PORT_MROUTER */
clock_t ageing_time; /* BRIDGE_AGEING_TIME */
diff --git a/net/bridge/br_vlan_options.c b/net/bridge/br_vlan_options.c
index 1c0fd55fe6c9..b8840294f98e 100644
--- a/net/bridge/br_vlan_options.c
+++ b/net/bridge/br_vlan_options.c
@@ -5,6 +5,7 @@
#include <linux/rtnetlink.h>
#include <linux/slab.h>
#include <net/ip_tunnels.h>
+#include <net/switchdev.h>

#include "br_private.h"
#include "br_private_tunnel.h"
@@ -80,7 +81,16 @@ static int br_vlan_modify_state(struct net_bridge_vlan_group *vg,
bool *changed,
struct netlink_ext_ack *extack)
{
+ struct switchdev_attr attr = {
+ .id = SWITCHDEV_ATTR_ID_PORT_MST_STATE,
+ .flags = SWITCHDEV_F_DEFER,
+ .u.mst_state = {
+ .mstid = br_vlan_mstid_get(v),
+ .state = state,
+ },
+ };
struct net_bridge *br;
+ int err;

ASSERT_RTNL();

@@ -89,10 +99,12 @@ static int br_vlan_modify_state(struct net_bridge_vlan_group *vg,
return -EINVAL;
}

- if (br_vlan_is_brentry(v))
+ if (br_vlan_is_brentry(v)) {
br = v->br;
- else
+ } else {
br = v->port->br;
+ attr.orig_dev = v->port->dev;
+ }

if (br->stp_enabled == BR_KERNEL_STP) {
NL_SET_ERR_MSG_MOD(extack, "Can't modify vlan state when using kernel STP");
@@ -102,6 +114,12 @@ static int br_vlan_modify_state(struct net_bridge_vlan_group *vg,
if (br_vlan_get_state_rtnl(v) == state)
return 0;

+ if (attr.orig_dev) {
+ err = switchdev_port_attr_set(attr.orig_dev, &attr, NULL);
+ if (err && err != -EOPNOTSUPP)
+ return err;
+ }
+
if (v->vid == br_get_pvid(vg))
br_vlan_set_pvid_state(vg, state);

--
2.25.1
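
To make the new attribute concrete, here is a hedged userspace model of how a switchdev driver could dispatch PORT_MST_STATE — the enum value and state struct mirror the patch, but the driver logic, the `HW_MAX_MSTI` limit, and all names are hypothetical:

```c
#include <assert.h>
#include <errno.h>

/* Mirrors the patch's additions (values illustrative) */
enum attr_id {
	ATTR_ID_PORT_STP_STATE,
	ATTR_ID_PORT_MST_STATE,
};

struct mst_state {
	unsigned short mstid;
	unsigned char state;
};

#define HW_MAX_MSTI 64 /* e.g. mv88e6xxx */
#define BR_STATE_FORWARDING 3

static unsigned char hw_msti_state[HW_MAX_MSTI];

/* Hypothetical driver attr handler: program the MSTI's state, or
 * return -EOPNOTSUPP so the caller treats it as not offloaded. */
static int port_attr_set(enum attr_id id, const struct mst_state *mst)
{
	switch (id) {
	case ATTR_ID_PORT_MST_STATE:
		if (mst->mstid >= HW_MAX_MSTI)
			return -EOPNOTSUPP;
		hw_msti_state[mst->mstid] = mst->state;
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

Note that the bridge side in the patch deliberately ignores -EOPNOTSUPP (`err != -EOPNOTSUPP`), which is what makes the offload best-effort.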

2022-02-16 16:24:13

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees

On 16/02/2022 15:29, Tobias Waldekranz wrote:
> The bridge has had per-VLAN STP support for a while now, since:
>
> https://lore.kernel.org/netdev/[email protected]/
>
> The current implementation has some problems:
>
> - The mapping from VLAN to STP state is fixed as 1:1, i.e. each VLAN
> is managed independently. This is awkward from an MSTP (802.1Q-2018,
> Clause 13.5) point of view, where the model is that multiple VLANs
> are grouped into MST instances.
>
> Because of the way that the standard is written, presumably, this is
> also reflected in hardware implementations. It is not uncommon for a
> switch to support the full 4k range of VIDs, but that the pool of
> MST instances is much smaller. Some examples:
>
> Marvell LinkStreet (mv88e6xxx): 4k VLANs, but only 64 MSTIs
> Marvell Prestera: 4k VLANs, but only 128 MSTIs
> Microchip SparX-5i: 4k VLANs, but only 128 MSTIs
>
> - By default, the feature is enabled, and there is no way to disable
> it. This makes it hard to add offloading in a backwards compatible
> way, since any underlying switchdevs have no way to refuse the
> function if the hardware does not support it
>
> - The port-global STP state has precedence over per-VLAN states. In
> MSTP, as far as I understand it, all VLANs will use the common
> spanning tree (CST) by default - through traffic engineering you can
> then optimize your network to group subsets of VLANs to use
> different trees (MSTI). To my understanding, the way this is
> typically managed in silicon is roughly:
>
> Incoming packet:
> .----.----.--------------.----.-------------
> | DA | SA | 802.1Q VID=X | ET | Payload ...
> '----'----'--------------'----'-------------
> |
> '->|\ .----------------------------.
> | +--> | VID | Members | ... | MSTI |
> PVID -->|/ |-----|---------|-----|------|
> | 1 | 0001001 | ... | 0 |
> | 2 | 0001010 | ... | 10 |
> | 3 | 0001100 | ... | 10 |
> '----------------------------'
> |
> .-----------------------------'
> | .------------------------.
> '->| MSTI | Fwding | Lrning |
> |------|--------|--------|
> | 0 | 111110 | 111110 |
> | 10 | 110111 | 110111 |
> '------------------------'
>
> What this is trying to show is that the STP state (whether MSTP is
> used, or ye olde STP) is always accessed via the VLAN table. If STP
> is running, all MSTI pointers in that table will reference the same
> index in the STP table - if MSTP is running, some VLANs may point
> to other trees (like in this example).
>
> The fact that in the Linux bridge, the global state (think: index 0
> in most hardware implementations) is supposed to override the
> per-VLAN state, is very awkward to offload. In effect, this means
> that when the global state changes to blocking, drivers will have to
> iterate over all MSTIs in use, and alter them all to match. This
> also means that you have to cache whether the hardware state is
> currently tracking the global state or the per-VLAN state. In the
> first case, you also have to cache the per-VLAN state so that you
> can restore it if the global state transitions back to forwarding.
>
> This series adds support for an arbitrary M:N mapping of VIDs to
> MSTIs, proposing one solution to the first issue. An example of an
> offload implementation for mv88e6xxx is also provided. Offloading is
> done on a best-effort basis, i.e. notifications of the relevant events
> are generated, but there is no way for the user to see whether the
> per-VLAN state has been offloaded or not. There is also no handling of
> the relationship between the port-global state and the per-VLAN ditto.
>
> If I was king of net/bridge/*, I would make the following additional
> changes:
>
> - By default, when a VLAN is created, assign it to MSTID 0, which
> would mean that no per-VLAN state is used and that packets belonging
> to this VLAN should be filtered according to the port-global state.
>
> This way, when a VLAN is configured to use a separate tree (setting
> a non-zero MSTID), an underlying switchdev could oppose it if it is
> not supported.
>
> Obviously, this adds an extra step for existing users of per-VLAN
> STP states and would thus not be backwards compatible. Maybe this
> means that that is impossible to do, maybe not.
>
> - Swap the precedence of the port-global and the per-VLAN state,
> i.e. the port-global state only applies to packets belonging to
> VLANs that do not make use of a per-VLAN state (MSTID != 0).
>
> This would make the offloading much more natural, as you avoid all
> of the caching stuff described above.
>
> Again, this changes the behavior of the kernel so it is not
> backwards compatible. I suspect that this is less of an issue
> though, since my guess is that very few people rely on the old
> behavior.
>
> Thoughts?
>

Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
as described help? It could require that no vlans are present in order to be changed,
similar to the per-port vlan stats option. Based on that option you can also alter
how the state checks are performed. For example, you can skip the initial port state
check; then, in br_vlan_allowed_ingress(), you can use the port state if vlan filtering
is disabled and mst is enabled, and you can avoid checking it altogether if both filtering
and mst are enabled, in which case you always use the vlan mst state. Similar changes would
have to happen for the egress path. Since we are talking about multiple tests, the new MST
logic can be hidden behind a static key for both br_handle_frame() and the later stages.

This set needs to read a new cache line to fetch the mst ptr for every packet in the vlan
fast-path, which is definitely undesirable. Please either cache that state in the vlan and
update it when something changes, or think of some way to avoid that cache line in the
fast-path. An alternative would be to make that cache line dependent on the new option,
so it is needed only when the mst feature is enabled.

There are other options, but they're way more invasive... I'll think about it more.

Cheers,
Nik
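
The state-check restructuring described above can be sketched as a pure decision function — a hypothetical helper, not actual bridge code; the real checks would live in br_handle_frame() and br_vlan_allowed_ingress():

```c
#include <assert.h>
#include <stdbool.h>

#define BR_STATE_FORWARDING 3
#define BR_STATE_BLOCKING   4

/* Sketch, assuming the semantics from the reply above: without mst,
 * the port-global state keeps precedence (today's behaviour); with
 * mst but no vlan filtering, only the port state applies; with both,
 * the vlan's mst state alone decides. */
static unsigned char effective_state(bool vlan_filtering, bool mst_enabled,
				     unsigned char port_state,
				     unsigned char vlan_state)
{
	if (!mst_enabled)
		/* today: a non-forwarding port state overrides the vlan */
		return port_state != BR_STATE_FORWARDING ? port_state
							 : vlan_state;
	if (!vlan_filtering)
		return port_state;
	return vlan_state; /* filter && mst: vlan mst state is authoritative */
}
```

Since only the `!mst_enabled` branch is taken on existing setups, a static key guarding the mst branches would keep the fast path unchanged when the feature is off.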

2022-02-16 18:44:44

by Tobias Waldekranz

[permalink] [raw]
Subject: Re: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees

On Wed, Feb 16, 2022 at 17:28, Nikolay Aleksandrov <[email protected]> wrote:
> On 16/02/2022 15:29, Tobias Waldekranz wrote:
>> [full cover letter snipped, quoted in the reply above]
>
> Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
> as described help? It can require that there are no vlans present to change
> similar to the per-port vlan stats option.

Great idea, I did not know that that's how vlan stats worked. I will
definitely look into it, thanks!

> Also based on that option you can alter
> how the state checks are performed. For example, you can skip the initial port state
> check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering
> is disabled and mst enabled and you can avoid checking it altogether if filter && mst
> are enabled then always use the vlan mst state. Similar changes would have to happen
> for the egress path. Since we are talking about multiple tests the new MST logic can
> be hidden behind a static key for both br_handle_frame() and later stages.

Makes sense.

So should we keep the current per-VLAN state as-is then, and bolt the
MST support on to the side? I.e. should `struct net_bridge_vlan` have
both a `u8 state` for the current implementation _and_ a `struct
br_vlan_mst *` that is populated for VLANs tied to a non-zero MSTI?

> This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path,
> that is definitely undesirable. Please either cache that state in the vlan and update it when
> something changes, or think of some way which avoids that cache line in fast-path.
> Alternative would be to make that cache line dependent on the new option, so it's needed
> only when mst feature is enabled.

If we go with the approach I suggested above, then the current `u8
state` on `struct net_bridge_vlan` could be that cache, right?

With the current implementation, it is set directly - in the new MST
mode all grouped VLANs would have their states updated when updating the
MSTI's state.

2022-02-17 05:59:36

by Nikolay Aleksandrov

[permalink] [raw]
Subject: Re: [RFC net-next 0/9] net: bridge: vlan: Multiple Spanning Trees

On 16/02/2022 17:56, Tobias Waldekranz wrote:
> On Wed, Feb 16, 2022 at 17:28, Nikolay Aleksandrov <[email protected]> wrote:
>> On 16/02/2022 15:29, Tobias Waldekranz wrote:
>>> [full cover letter snipped, quoted in an earlier reply]
>>
>> Interesting! Would adding a new (e.g. vlan_mst_enable) option which changes the behaviour
>> as described help? It can require that there are no vlans present to change
>> similar to the per-port vlan stats option.
>
> Great idea, I did not know that that's how vlan stats worked. I will
> definitely look into it, thanks!
>
>> Also based on that option you can alter
>> how the state checks are performed. For example, you can skip the initial port state
>> check, then in br_vlan_allowed_ingress() you can use the port state if vlan filtering
>> is disabled and mst enabled and you can avoid checking it altogether if filter && mst
>> are enabled then always use the vlan mst state. Similar changes would have to happen
>> for the egress path. Since we are talking about multiple tests the new MST logic can
>> be hidden behind a static key for both br_handle_frame() and later stages.
>
> Makes sense.
>
> So should we keep the current per-VLAN state as-is then? And bolt the
> MST on to the side? I.e. should `struct net_bridge_vlan` both have `u8
> state` for the current implementation _and_ a `struct br_vlan_mst *`
> that is populated for VLANs tied to a non-zero MSTI?
>

Good question. We should keep the u8 for quick access/caching of the state; the
ptr we might avoid by keeping just the mst id and looking it up, i.e. it could
be just 2 bytes instead of 8, but that's not really a problem if it is used
only for bookkeeping in slow paths. It can always be pushed to the end of
the struct, and if a ptr makes things simpler and easier, that's ok.
So in short, yes, I think we can keep both.

>> This set needs to read a new cache line to fetch mst ptr for all packets in the vlan fast-path,
>> that is definitely undesirable. Please either cache that state in the vlan and update it when
>> something changes, or think of some way which avoids that cache line in fast-path.
>> Alternative would be to make that cache line dependent on the new option, so it's needed
>> only when mst feature is enabled.
>
> If we go with the approach I suggested above, then the current `u8
> state` on `struct net_bridge_vlan` could be that cache, right?
>

Yep, that's the idea.

> With the current implementation, it is set directly - in the new MST
> mode all grouped VLANs would have their states updated when updating the
> MSTI's state.

Right

Cheers,
Nik
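
The scheme the thread converges on — keep `u8 state` in `struct net_bridge_vlan` as a fast-path cache and refresh it for every member VLAN when an MSTI's state changes — can be sketched like this (hypothetical types, not the bridge's actual structs):

```c
#include <assert.h>
#include <stddef.h>

struct mst {
	unsigned short mstid;
	unsigned char state;
};

struct vlan {
	unsigned short vid;
	unsigned char state; /* fast-path cache; no ptr chase per packet */
	struct mst *mst;     /* NULL when the VLAN follows the port state */
};

/* Slow path: update the MSTI and refresh the cached state of every
 * VLAN grouped into it, so the fast path only ever reads vlan->state. */
static void mst_set_state(struct mst *m, struct vlan *vlans, int n,
			  unsigned char state)
{
	int i;

	m->state = state;
	for (i = 0; i < n; i++)
		if (vlans[i].mst == m)
			vlans[i].state = state;
}
```

The walk over member VLANs only happens on topology changes, which are rare, while every forwarded packet touches nothing beyond the cache byte it already reads today.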