2021-08-07 12:08:59

by Qingfang Deng

[permalink] [raw]
Subject: [RFC net-next 0/3] qca8k bridge flags offload

Add bridge flags support for qca8k.

RFC: This is only compile-tested. Anyone who has the hardware, please
test this and provide Tested-by tags. Thanks.

DENG Qingfang (3):
net: dsa: qca8k: offload bridge flags
net: dsa: qca8k: enable assisted learning on CPU port
net: dsa: tag_qca: set offload_fwd_mark

drivers/net/dsa/qca8k.c | 62 +++++++++++++++++++++++++++++++++++------
net/dsa/tag_qca.c | 11 +++++++-
2 files changed, 64 insertions(+), 9 deletions(-)

--
2.25.1


2021-08-07 12:08:59

by Qingfang Deng

[permalink] [raw]
Subject: [RFC net-next 1/3] net: dsa: qca8k: offload bridge flags

The hardware supports controlling per-port flooding and learning.
Do not enable learning by default, instead configure them in
.port_bridge_flags function.

Signed-off-by: DENG Qingfang <[email protected]>
---
drivers/net/dsa/qca8k.c | 60 +++++++++++++++++++++++++++++++++++------
1 file changed, 52 insertions(+), 8 deletions(-)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 1f63f50f73f1..798bc548e5b0 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -987,10 +987,11 @@ qca8k_setup(struct dsa_switch *ds)
return ret;
}

- /* Disable forwarding by default on all ports */
+ /* Disable forwarding and learning by default on all ports */
for (i = 0; i < QCA8K_NUM_PORTS; i++) {
ret = qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(i),
- QCA8K_PORT_LOOKUP_MEMBER, 0);
+ QCA8K_PORT_LOOKUP_MEMBER |
+ QCA8K_PORT_LOOKUP_LEARN, 0);
if (ret)
return ret;
}
@@ -1028,12 +1029,6 @@ qca8k_setup(struct dsa_switch *ds)
if (ret)
return ret;

- /* Enable ARP Auto-learning by default */
- ret = qca8k_reg_set(priv, QCA8K_PORT_LOOKUP_CTRL(i),
- QCA8K_PORT_LOOKUP_LEARN);
- if (ret)
- return ret;
-
/* For port based vlans to work we need to set the
* default egress vid
*/
@@ -1504,6 +1499,53 @@ qca8k_port_stp_state_set(struct dsa_switch *ds, int port, u8 state)
QCA8K_PORT_LOOKUP_STATE_MASK, stp_state);
}

+static int
+qca8k_port_pre_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ if (flags.mask & ~(BR_LEARNING | BR_FLOOD | BR_MCAST_FLOOD |
+ BR_BCAST_FLOOD))
+ return -EINVAL;
+
+ return 0;
+}
+
+static int
+qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
+ struct switchdev_brport_flags flags,
+ struct netlink_ext_ack *extack)
+{
+ struct qca8k_priv *priv = ds->priv;
+ int ret = 0;
+
+ if (!ret && flags.mask & BR_LEARNING)
+ ret = qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
+ QCA8K_PORT_LOOKUP_LEARN,
+ flags.val & BR_LEARNING ?
+ QCA8K_PORT_LOOKUP_LEARN : 0);
+
+ if (!ret && flags.mask & BR_FLOOD)
+ ret = qca8k_rmw(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
+ BIT(port + QCA8K_GLOBAL_FW_CTRL1_UC_DP_S),
+ flags.val & BR_FLOOD ?
+ BIT(port + QCA8K_GLOBAL_FW_CTRL1_UC_DP_S) : 0);
+
+ if (!ret && flags.mask & BR_MCAST_FLOOD)
+ ret = qca8k_rmw(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
+ BIT(port + QCA8K_GLOBAL_FW_CTRL1_MC_DP_S),
+ flags.val & BR_MCAST_FLOOD ?
+ BIT(port + QCA8K_GLOBAL_FW_CTRL1_MC_DP_S) : 0);
+
+ if (!ret && flags.mask & BR_BCAST_FLOOD)
+ ret = qca8k_rmw(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
+ BIT(port + QCA8K_GLOBAL_FW_CTRL1_BC_DP_S),
+ flags.val & BR_BCAST_FLOOD ?
+ BIT(port + QCA8K_GLOBAL_FW_CTRL1_BC_DP_S) : 0);
+
+ return ret;
+}
+
static int
qca8k_port_bridge_join(struct dsa_switch *ds, int port, struct net_device *br)
{
@@ -1764,6 +1806,8 @@ static const struct dsa_switch_ops qca8k_switch_ops = {
.port_change_mtu = qca8k_port_change_mtu,
.port_max_mtu = qca8k_port_max_mtu,
.port_stp_state_set = qca8k_port_stp_state_set,
+ .port_pre_bridge_flags = qca8k_port_pre_bridge_flags,
+ .port_bridge_flags = qca8k_port_bridge_flags,
.port_bridge_join = qca8k_port_bridge_join,
.port_bridge_leave = qca8k_port_bridge_leave,
.port_fdb_add = qca8k_port_fdb_add,
--
2.25.1

2021-08-07 12:09:03

by Qingfang Deng

[permalink] [raw]
Subject: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

Enable assisted learning on CPU port to fix roaming issues.

Although hardware learning is available, it won't work well with
software bridging fallback or multiple CPU ports.

Signed-off-by: DENG Qingfang <[email protected]>
---
drivers/net/dsa/qca8k.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index 798bc548e5b0..de2aa7812d1c 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -1114,6 +1114,8 @@ qca8k_setup(struct dsa_switch *ds)
/* We don't have interrupts for link changes, so we need to poll */
ds->pcs_poll = true;

+ ds->assisted_learning_on_cpu_port = true;
+
return 0;
}

--
2.25.1

2021-08-07 12:09:34

by Qingfang Deng

[permalink] [raw]
Subject: [RFC net-next 3/3] net: dsa: tag_qca: set offload_fwd_mark

As we offload flooding and forwarding, set offload_fwd_mark according to
Atheros header's type.

Signed-off-by: DENG Qingfang <[email protected]>
---
net/dsa/tag_qca.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
index 6e3136990491..ee5c1fdfef47 100644
--- a/net/dsa/tag_qca.c
+++ b/net/dsa/tag_qca.c
@@ -50,7 +50,7 @@ static struct sk_buff *qca_tag_xmit(struct sk_buff *skb, struct net_device *dev)

static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
{
- u8 ver;
+ u8 ver, type;
u16 hdr;
int port;
__be16 *phdr;
@@ -82,6 +82,15 @@ static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
if (!skb->dev)
return NULL;

+ type = (hdr & QCA_HDR_RECV_TYPE_MASK) >> QCA_HDR_RECV_TYPE_S;
+ switch (type) {
+ case 0x00: /* Normal packet */
+ case 0x19: /* Flooding to CPU */
+ case 0x1a: /* Forwarding to CPU */
+ dsa_default_offload_fwd_mark(skb);
+ break;
+ }
+
return skb;
}

--
2.25.1

2021-08-07 20:57:48

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 1/3] net: dsa: qca8k: offload bridge flags

On Sat, Aug 07, 2021 at 08:07:24PM +0800, DENG Qingfang wrote:
> +static int
> +qca8k_port_bridge_flags(struct dsa_switch *ds, int port,
> + struct switchdev_brport_flags flags,
> + struct netlink_ext_ack *extack)
> +{
> + struct qca8k_priv *priv = ds->priv;
> + int ret = 0;
> +
> + if (!ret && flags.mask & BR_LEARNING)
> + ret = qca8k_rmw(priv, QCA8K_PORT_LOOKUP_CTRL(port),
> + QCA8K_PORT_LOOKUP_LEARN,
> + flags.val & BR_LEARNING ?
> + QCA8K_PORT_LOOKUP_LEARN : 0);

And fast ageing when learning on a port is turned off?

> +
> + if (!ret && flags.mask & BR_FLOOD)
> + ret = qca8k_rmw(priv, QCA8K_REG_GLOBAL_FW_CTRL1,
> + BIT(port + QCA8K_GLOBAL_FW_CTRL1_UC_DP_S),
> + flags.val & BR_FLOOD ?
> + BIT(port + QCA8K_GLOBAL_FW_CTRL1_UC_DP_S) : 0);

2021-08-07 22:34:36

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> Enable assisted learning on CPU port to fix roaming issues.

'roaming issues' implies to me it suffered from blindness to MAC
addresses learned on foreign interfaces, which appears to not be true
since your previous patch removes hardware learning on the CPU port
(=> hardware learning on the CPU port was supported, so there were no
roaming issues)

>
> Although hardware learning is available, it won't work well with
> software bridging fallback or multiple CPU ports.

This part is potentially true however, but it would need proof. I am not
familiar enough with the qca8k driver to say for sure that it suffers
from the typical problem with bridging with software LAG uppers (no FDB
isolation for standalone ports => attempt to shortcircuit the forwarding
through the CPU port and go directly towards the bridged port, which
would result in drops), and I also can't say anything about its support
for multiple CPU ports.

2021-08-07 22:59:03

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 3/3] net: dsa: tag_qca: set offload_fwd_mark

On Sat, Aug 07, 2021 at 08:07:26PM +0800, DENG Qingfang wrote:
> As we offload flooding and forwarding, set offload_fwd_mark according to
> Atheros header's type.
>
> Signed-off-by: DENG Qingfang <[email protected]>
> ---
> net/dsa/tag_qca.c | 11 ++++++++++-
> 1 file changed, 10 insertions(+), 1 deletion(-)
>
> diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
> index 6e3136990491..ee5c1fdfef47 100644
> --- a/net/dsa/tag_qca.c
> +++ b/net/dsa/tag_qca.c
> @@ -50,7 +50,7 @@ static struct sk_buff *qca_tag_xmit(struct sk_buff *skb, struct net_device *dev)
>
> static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
> {
> - u8 ver;
> + u8 ver, type;
> u16 hdr;
> int port;
> __be16 *phdr;
> @@ -82,6 +82,15 @@ static struct sk_buff *qca_tag_rcv(struct sk_buff *skb, struct net_device *dev)
> if (!skb->dev)
> return NULL;
>
> + type = (hdr & QCA_HDR_RECV_TYPE_MASK) >> QCA_HDR_RECV_TYPE_S;
> + switch (type) {
> + case 0x00: /* Normal packet */
> + case 0x19: /* Flooding to CPU */
> + case 0x1a: /* Forwarding to CPU */
> + dsa_default_offload_fwd_mark(skb);
> + break;
> + }
> +
> return skb;
> }
>
> --
> 2.25.1
>

In this day and age, I consider this commit to be a bug fix, since the
software bridge, seeing an skb with offload_fwd_mark = false on an
offloaded port, will think it hasn't been forwarded and do that job
itself. So all broadcast and multicast traffic flooded to the CPU will
end up being transmitted with duplicates on the other bridge ports.

When the qca8k tagger was added in 2016 in commit cafdc45c949b
("net-next: dsa: add Qualcomm tag RX/TX handler"), the offload_fwd_mark
framework was already there, but no DSA driver was using it - the first
commit I can find that uses offload_fwd_mark in DSA is f849772915e5
("net: dsa: lan9303: lan9303_rcv set skb->offload_fwd_mark") in 2017,
and then quite a few more followed suit. But you could still blame
commit cafdc45c949b.

Curious, I also see that the gswip driver is in the same situation: it
implements .port_bridge_join but does not set skb->offload_fwd_mark.
I've copied Hauke Mehrtens to make him aware. I would rather not send
the patch myself because I would do a rather lousy job and set it
unconditionally to 'true', but the hardware can probably do better in
informing the tagger about whether a frame was received only by the host
or not, since it has an 8 byte header on RX.

For the record, I've checked the other tagging drivers too, to see who
else does not set skb->offload_fwd_mark, and they all correspond to
switch drivers which don't implement .port_bridge_join, which in that
case would be the correct thing to do.

2021-08-08 16:23:00

by Qingfang Deng

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> > Enable assisted learning on CPU port to fix roaming issues.
>
> 'roaming issues' implies to me it suffered from blindness to MAC
> addresses learned on foreign interfaces, which appears to not be true
> since your previous patch removes hardware learning on the CPU port
> (=> hardware learning on the CPU port was supported, so there were no
> roaming issues)

The datasheet says learning is enabled by default, but if that's true,
the driver won't have to enable it manually.

Others have reported roaming issues as well:
https://github.com/Ansuel/openwrt/pull/3

As I don't have the hardware to test, I don't know what the default
value really is, so I just disable learning to make sure.

>
> >
> > Although hardware learning is available, it won't work well with
> > software bridging fallback or multiple CPU ports.
>
> This part is potentially true however, but it would need proof. I am not
> familiar enough with the qca8k driver to say for sure that it suffers
> from the typical problem with bridging with software LAG uppers (no FDB
> isolation for standalone ports => attempt to shortcircuit the forwarding
> through the CPU port and go directly towards the bridged port, which
> would result in drops), and I also can't say anything about its support
> for multiple CPU ports.

QCA8337 supports disabling learning and FDB lookup on a per-VLAN basis,
so we could assign all standalone ports to a reserved VLAN (ID 0 or 4095)
with learning and FDB lookup disabled.

Ansuel has a patch set for multiple CPU ports.

2021-08-08 16:24:24

by Qingfang Deng

[permalink] [raw]
Subject: Re: [RFC net-next 3/3] net: dsa: tag_qca: set offload_fwd_mark

On Sun, Aug 08, 2021 at 01:57:21AM +0300, Vladimir Oltean wrote:
> In this day and age, I consider this commit to be a bug fix, since the
> software bridge, seeing an skb with offload_fwd_mark = false on an
> offloaded port, will think it hasn't been forwarded and do that job
> itself. So all broadcast and multicast traffic flooded to the CPU will
> end up being transmitted with duplicates on the other bridge ports.
>
> When the qca8k tagger was added in 2016 in commit cafdc45c949b
> ("net-next: dsa: add Qualcomm tag RX/TX handler"), the offload_fwd_mark
> framework was already there, but no DSA driver was using it - the first
> commit I can find that uses offload_fwd_mark in DSA is f849772915e5
> ("net: dsa: lan9303: lan9303_rcv set skb->offload_fwd_mark") in 2017,
> and then quite a few more followed suit. But you could still blame
> commit cafdc45c949b.

The driver currently only enables flooding to the CPU port (like MT7530
back then), so offload_fwd_mark should NOT be set until bridge flags
offload is supported.

>
> Curious, I also see that the gswip driver is in the same situation: it
> implements .port_bridge_join but does not set skb->offload_fwd_mark.
> I've copied Hauke Mehrtens to make him aware. I would rather not send
> the patch myself because I would do a rather lousy job and set it
> unconditionally to 'true', but the hardware can probably do better in
> informing the tagger about whether a frame was received only by the host
> or not, since it has an 8 byte header on RX.
>
> For the record, I've checked the other tagging drivers too, to see who
> else does not set skb->offload_fwd_mark, and they all correspond to
> switch drivers which don't implement .port_bridge_join, which in that
> case would be the correct thing to do.

2021-08-08 21:16:31

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 3/3] net: dsa: tag_qca: set offload_fwd_mark

On Mon, Aug 09, 2021 at 12:12:24AM +0800, DENG Qingfang wrote:
> On Sun, Aug 08, 2021 at 01:57:21AM +0300, Vladimir Oltean wrote:
> > In this day and age, I consider this commit to be a bug fix, since the
> > software bridge, seeing an skb with offload_fwd_mark = false on an
> > offloaded port, will think it hasn't been forwarded and do that job
> > itself. So all broadcast and multicast traffic flooded to the CPU will
> > end up being transmitted with duplicates on the other bridge ports.
> >
> > When the qca8k tagger was added in 2016 in commit cafdc45c949b
> > ("net-next: dsa: add Qualcomm tag RX/TX handler"), the offload_fwd_mark
> > framework was already there, but no DSA driver was using it - the first
> > commit I can find that uses offload_fwd_mark in DSA is f849772915e5
> > ("net: dsa: lan9303: lan9303_rcv set skb->offload_fwd_mark") in 2017,
> > and then quite a few more followed suit. But you could still blame
> > commit cafdc45c949b.
>
> The driver currently only enables flooding to the CPU port (like MT7530
> back then), so offload_fwd_mark should NOT be set until bridge flags
> offload is supported.

Ok, I missed that. Please squash this with patch 1 then, please.

2021-08-08 21:33:31

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

On Mon, Aug 09, 2021 at 12:05:03AM +0800, DENG Qingfang wrote:
> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> > On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> > > Enable assisted learning on CPU port to fix roaming issues.
> >
> > 'roaming issues' implies to me it suffered from blindness to MAC
> > addresses learned on foreign interfaces, which appears to not be true
> > since your previous patch removes hardware learning on the CPU port
> > (=> hardware learning on the CPU port was supported, so there were no
> > roaming issues)
>
> The datasheet says learning is enabled by default, but if that's true,
> the driver won't have to enable it manually.
>
> Others have reported roaming issues as well:
> https://github.com/Ansuel/openwrt/pull/3
>
> As I don't have the hardware to test, I don't know what the default
> value really is, so I just disable learning to make sure.

That link doesn't really say more than "roaming issues" either, so I am
still not clear on what is being fixed here exactly.

Note that I can still think of 'roaming'-related issues with VLAN-aware
bridges and foreign interfaces and hardware learning on the CPU port,
but I don't want to speculate too much and just want to hear what is the
issue that is being fixed.

> > > Although hardware learning is available, it won't work well with
> > > software bridging fallback or multiple CPU ports.
> >
> > This part is potentially true however, but it would need proof. I am not
> > familiar enough with the qca8k driver to say for sure that it suffers
> > from the typical problem with bridging with software LAG uppers (no FDB
> > isolation for standalone ports => attempt to shortcircuit the forwarding
> > through the CPU port and go directly towards the bridged port, which
> > would result in drops), and I also can't say anything about its support
> > for multiple CPU ports.
>
> QCA8337 supports disabling learning and FDB lookup on a per-VLAN basis,
> so we could assign all standalone ports to a reserved VLAN (ID 0 or 4095)
> with learning and FDB lookup disabled.

And to follow along that idea, if you also change the tagger to send
all packets towards a standalone port using that reserved VLAN, then
even if hardware learning is enabled on the CPU port, it will be
inconsequential as long as IVL is used, because no FDB lookup will match
the VLAN in which those addresses were learned.

My point is, if you come with something functional to the table, present
the whole story. If more changes still need to be made until it works
with software bridging fallback, say that too. Otherwise, I think that
the general idea that "hardware learning on the CPU port won't work well
with software bridging fallback" is not strictly true, and that this
patch has a weak overall justification.

>
> Ansuel has a patch set for multiple CPU ports.

2021-08-10 08:41:00

by Jonathan McDowell

[permalink] [raw]
Subject: Re: [RFC net-next 0/3] qca8k bridge flags offload

On Sat, Aug 07, 2021 at 08:07:23PM +0800, DENG Qingfang wrote:
> Add bridge flags support for qca8k.
>
> RFC: This is only compile-tested. Anyone who has the hardware, please
> test this and provide Tested-by tags. Thanks.

I will try to find time to build + test these on my RB3011 this weekend.
Anything in particular I should look out for, or just that normal
operation is unaffected?

> DENG Qingfang (3):
> net: dsa: qca8k: offload bridge flags
> net: dsa: qca8k: enable assisted learning on CPU port
> net: dsa: tag_qca: set offload_fwd_mark
>
> drivers/net/dsa/qca8k.c | 62 +++++++++++++++++++++++++++++++++++------
> net/dsa/tag_qca.c | 11 +++++++-
> 2 files changed, 64 insertions(+), 9 deletions(-)

J.

--
Are you happy with your wash?

2021-08-10 17:31:52

by Andre Valentin

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
>>> Enable assisted learning on CPU port to fix roaming issues.
>>
>> 'roaming issues' implies to me it suffered from blindness to MAC
>> addresses learned on foreign interfaces, which appears to not be true
>> since your previous patch removes hardware learning on the CPU port
>> (=> hardware learning on the CPU port was supported, so there were no
>> roaming issues)

The issue is with a wifi AP bridged into dsa and previously learned
addresses.

Test setup:
We have to wifi APs a and b(with qca8k). Client is on AP a.

The qca8k switch in AP b sees also the broadcast traffic from the client
and takes the address into its fdb.

Now the client roams to AP b.
The client starts DHCP but does not get an IP. With tcpdump, I see the
packets going through the switch (ap->cpu port->ethernet port) and they
arrive at the DHCP server. It responds, the response packet reaches the
ethernet port of the qca8k, and is not forwarded.

After about 3 minutes the fdb entry in the qca8k on AP b is
"cleaned up" and the client can immediately get its IP from the DHCP server.

I hope this helps understanding the background.

2021-08-10 18:22:08

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> > On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> >> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> >>> Enable assisted learning on CPU port to fix roaming issues.
> >>
> >> 'roaming issues' implies to me it suffered from blindness to MAC
> >> addresses learned on foreign interfaces, which appears to not be true
> >> since your previous patch removes hardware learning on the CPU port
> >> (=> hardware learning on the CPU port was supported, so there were no
> >> roaming issues)
>
> The issue is with a wifi AP bridged into dsa and previously learned
> addresses.
>
> Test setup:
> We have to wifi APs a and b(with qca8k). Client is on AP a.
>
> The qca8k switch in AP b sees also the broadcast traffic from the client
> and takes the address into its fdb.
>
> Now the client roams to AP b.
> The client starts DHCP but does not get an IP. With tcpdump, I see the
> packets going through the switch (ap->cpu port->ethernet port) and they
> arrive at the DHCP server. It responds, the response packet reaches the
> ethernet port of the qca8k, and is not forwarded.
>
> After about 3 minutes the fdb entry in the qca8k on AP b is
> "cleaned up" and the client can immediately get its IP from the DHCP server.
>
> I hope this helps understanding the background.

How does this differ from what is described in commit d5f19486cee7
("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign
bridge neighbors")?

2021-08-10 21:14:00

by Andre Valentin

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

Am 10.08.21 um 19:53 schrieb Vladimir Oltean:
> On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
>> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
>>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
>>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
>>>>> Enable assisted learning on CPU port to fix roaming issues.
>>>>
>>>> 'roaming issues' implies to me it suffered from blindness to MAC
>>>> addresses learned on foreign interfaces, which appears to not be true
>>>> since your previous patch removes hardware learning on the CPU port
>>>> (=> hardware learning on the CPU port was supported, so there were no
>>>> roaming issues)
>>
>> The issue is with a wifi AP bridged into dsa and previously learned
>> addresses.
>>
>> Test setup:
>> We have to wifi APs a and b(with qca8k). Client is on AP a.
>>
>> The qca8k switch in AP b sees also the broadcast traffic from the client
>> and takes the address into its fdb.
>>
>> Now the client roams to AP b.
>> The client starts DHCP but does not get an IP. With tcpdump, I see the
>> packets going through the switch (ap->cpu port->ethernet port) and they
>> arrive at the DHCP server. It responds, the response packet reaches the
>> ethernet port of the qca8k, and is not forwarded.
>>
>> After about 3 minutes the fdb entry in the qca8k on AP b is
>> "cleaned up" and the client can immediately get its IP from the DHCP server.
>>
>> I hope this helps understanding the background.
>
> How does this differ from what is described in commit d5f19486cee7
> ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign
> bridge neighbors")?
>
I lost a bit, It is a bit different.

I've been also working a bit on the ipq807x device with such a switch on
OpenWRT. There is a backport of that commit. To fix the problems described
by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the
problem.

But initially, the device was unreachable until I created traffic from the device
to a client (cpu port->ethernet). At first, I did not notice this because a wifi client
with it's traffic immediately solved the issue automatically.
Later I verified this in detail.

Hopefully DENG Qingfang patches help. But I cannot proove atm.

2021-08-10 23:35:07

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [RFC net-next 2/3] net: dsa: qca8k: enable assisted learning on CPU port

On Tue, Aug 10, 2021 at 11:09:07PM +0200, Andre Valentin wrote:
> Am 10.08.21 um 19:53 schrieb Vladimir Oltean:
> > On Tue, Aug 10, 2021 at 07:27:05PM +0200, Andre Valentin wrote:
> >> On Sun, Aug 08, 2021 at 1805, DENG Qingfang wrote:
> >>> On Sun, Aug 08, 2021 at 01:25:55AM +0300, Vladimir Oltean wrote:
> >>>> On Sat, Aug 07, 2021 at 08:07:25PM +0800, DENG Qingfang wrote:
> >>>>> Enable assisted learning on CPU port to fix roaming issues.
> >>>>
> >>>> 'roaming issues' implies to me it suffered from blindness to MAC
> >>>> addresses learned on foreign interfaces, which appears to not be true
> >>>> since your previous patch removes hardware learning on the CPU port
> >>>> (=> hardware learning on the CPU port was supported, so there were no
> >>>> roaming issues)
> >>
> >> The issue is with a wifi AP bridged into dsa and previously learned
> >> addresses.
> >>
> >> Test setup:
> >> We have to wifi APs a and b(with qca8k). Client is on AP a.
> >>
> >> The qca8k switch in AP b sees also the broadcast traffic from the client
> >> and takes the address into its fdb.
> >>
> >> Now the client roams to AP b.
> >> The client starts DHCP but does not get an IP. With tcpdump, I see the
> >> packets going through the switch (ap->cpu port->ethernet port) and they
> >> arrive at the DHCP server. It responds, the response packet reaches the
> >> ethernet port of the qca8k, and is not forwarded.
> >>
> >> After about 3 minutes the fdb entry in the qca8k on AP b is
> >> "cleaned up" and the client can immediately get its IP from the DHCP server.
> >>
> >> I hope this helps understanding the background.
> >
> > How does this differ from what is described in commit d5f19486cee7
> > ("net: dsa: listen for SWITCHDEV_{FDB,DEL}_ADD_TO_DEVICE on foreign
> > bridge neighbors")?
> >
> I lost a bit, It is a bit different.
>
> I've been also working a bit on the ipq807x device with such a switch on
> OpenWRT. There is a backport of that commit. To fix the problems described
> by d5f19486cee7, I enabled assisted_learning on qca8k. And it solves the
> problem.
>
> But initially, the device was unreachable until I created traffic from the device
> to a client (cpu port->ethernet). At first, I did not notice this because a wifi client
> with it's traffic immediately solved the issue automatically.
> Later I verified this in detail.
>
> Hopefully DENG Qingfang patches help. But I cannot proove atm.

I don't understand. You're saying that when the device sends a packet
from its new position, the switch learns it on the CPU port, and that
fixes the issue?

Isn't that always how issues like that get fixed? If hardware learning
is supported on the CPU port, it is no different than a device roaming
from one switch port to another (but isn't directly connected to that
switch port, otherwise the switch might fast age the port when the
device roams) - it is inaccessible until it says something.

I still have no idea what we're talking about, and why this patch is
necessary. Does the qca8k switch support hardware learning on the CPU
port or not?