This patchset adds MAB [1] offload support in mv88e6xxx.
Patch #1: Fix a problem when reading the FID needed to get the VID.
Patch #2: The MAB implementation for mv88e6xxx.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=a35ec8e38cdd1766f29924ca391a01de20163931
Hans J. Schultz (2):
net: dsa: mv88e6xxx: allow reading FID when handling ATU violations
net: dsa: mv88e6xxx: mac-auth/MAB implementation
drivers/net/dsa/mv88e6xxx/Makefile | 1 +
drivers/net/dsa/mv88e6xxx/chip.c | 19 ++++--
drivers/net/dsa/mv88e6xxx/chip.h | 10 ++++
drivers/net/dsa/mv88e6xxx/global1_atu.c | 70 ++++++++++++++++++++--
drivers/net/dsa/mv88e6xxx/port.h | 6 ++
drivers/net/dsa/mv88e6xxx/switchdev.c | 77 +++++++++++++++++++++++++
drivers/net/dsa/mv88e6xxx/switchdev.h | 19 ++++++
7 files changed, 190 insertions(+), 12 deletions(-)
create mode 100644 drivers/net/dsa/mv88e6xxx/switchdev.c
create mode 100644 drivers/net/dsa/mv88e6xxx/switchdev.h
--
2.34.1
The FID is needed to get hold of which VID was involved in a violation,
thus the need to be able to read the FID.
For convenience the function mv88e6xxx_g1_atu_op() has been used to read
ATU violations, but the function invalidates reading the fid, so to both
read ATU violations without zeroing the fid, and read the fid, functions
have been added to ensure both are done correctly.
Signed-off-by: Hans J. Schultz <[email protected]>
---
drivers/net/dsa/mv88e6xxx/global1_atu.c | 60 ++++++++++++++++++++++---
1 file changed, 55 insertions(+), 5 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c
index 40bd67a5c8e9..8a874b6fc8e1 100644
--- a/drivers/net/dsa/mv88e6xxx/global1_atu.c
+++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c
@@ -114,6 +114,19 @@ static int mv88e6xxx_g1_atu_op_wait(struct mv88e6xxx_chip *chip)
return mv88e6xxx_g1_wait_bit(chip, MV88E6XXX_G1_ATU_OP, bit, 0);
}
+static int mv88e6xxx_g1_read_atu_violation(struct mv88e6xxx_chip *chip)
+{
+ int err;
+
+ err = mv88e6xxx_g1_write(chip, MV88E6XXX_G1_ATU_OP,
+ MV88E6XXX_G1_ATU_OP_BUSY |
+ MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION);
+ if (err)
+ return err;
+
+ return mv88e6xxx_g1_atu_op_wait(chip);
+}
+
static int mv88e6xxx_g1_atu_op(struct mv88e6xxx_chip *chip, u16 fid, u16 op)
{
u16 val;
@@ -159,6 +172,41 @@ int mv88e6xxx_g1_atu_get_next(struct mv88e6xxx_chip *chip, u16 fid)
return mv88e6xxx_g1_atu_op(chip, fid, MV88E6XXX_G1_ATU_OP_GET_NEXT_DB);
}
+static int mv88e6xxx_g1_atu_fid_read(struct mv88e6xxx_chip *chip, u16 *fid)
+{
+ u16 val = 0, upper = 0, op = 0;
+ int err = -EOPNOTSUPP;
+
+ if (mv88e6xxx_num_databases(chip) > 256) {
+ err = mv88e6xxx_g1_read(chip, MV88E6352_G1_ATU_FID, &val);
+ val &= 0xfff;
+ if (err)
+ return err;
+ } else {
+ err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_ATU_OP, &op);
+ if (err)
+ return err;
+ if (mv88e6xxx_num_databases(chip) > 64) {
+ /* ATU DBNum[7:4] are located in ATU Control 15:12 */
+ err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_ATU_CTL,
+ &upper);
+ if (err)
+ return err;
+
+ upper = (upper >> 8) & 0x00f0;
+ } else if (mv88e6xxx_num_databases(chip) > 16) {
+ /* ATU DBNum[5:4] are located in ATU Operation 9:8 */
+ upper = (op >> 4) & 0x30;
+ }
+
+ /* ATU DBNum[3:0] are located in ATU Operation 3:0 */
+ val = (op & 0xf) | upper;
+ }
+ *fid = val;
+
+ return err;
+}
+
/* Offset 0x0C: ATU Data Register */
static int mv88e6xxx_g1_atu_data_read(struct mv88e6xxx_chip *chip,
@@ -353,14 +401,12 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id)
{
struct mv88e6xxx_chip *chip = dev_id;
struct mv88e6xxx_atu_entry entry;
- int spid;
- int err;
- u16 val;
+ int err, spid;
+ u16 val, fid;
mv88e6xxx_reg_lock(chip);
- err = mv88e6xxx_g1_atu_op(chip, 0,
- MV88E6XXX_G1_ATU_OP_GET_CLR_VIOLATION);
+ err = mv88e6xxx_g1_read_atu_violation(chip);
if (err)
goto out;
@@ -368,6 +414,10 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id)
if (err)
goto out;
+ err = mv88e6xxx_g1_atu_fid_read(chip, &fid);
+ if (err)
+ goto out;
+
err = mv88e6xxx_g1_atu_data_read(chip, &entry);
if (err)
goto out;
--
2.34.1
On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote:
> This patchset adds MAB [1] offload support in mv88e6xxx.
>
> Patch #1: Fix a problem when reading the FID needed to get the VID.
>
> Patch #2: The MAB implementation for mv88e6xxx.
Just to be sure, this was tested with bridge_locked_port.sh, right?
On 2022-11-15 10:30, Ido Schimmel wrote:
> On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote:
>> This patchset adds MAB [1] offload support in mv88e6xxx.
>>
>> Patch #1: Fix a problem when reading the FID needed to get the VID.
>>
>> Patch #2: The MAB implementation for mv88e6xxx.
>
> Just to be sure, this was tested with bridge_locked_port.sh, right?
As I have the phy regression I have given notice of, that has simply not
been possible. After maybe 50 resets it worked for me at a point
(something to do with timing), and I tested it manually.
When I have tried to run the selftests, I get errors related to the phy
problem, which I have not been able to find a way around.
On 2022-11-15 11:28, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 11:26:55AM +0100, [email protected]
> wrote:
>> On 2022-11-15 10:30, Ido Schimmel wrote:
>> > On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote:
>> > > This patchset adds MAB [1] offload support in mv88e6xxx.
>> > >
>> > > Patch #1: Fix a problem when reading the FID needed to get the VID.
>> > >
>> > > Patch #2: The MAB implementation for mv88e6xxx.
>> >
>> > Just to be sure, this was tested with bridge_locked_port.sh, right?
>>
>> As I have the phy regression I have given notice of, that has simply
>> not
>> been possible. After maybe 50 resets it worked for me at a point
>> (something to do with timing), and I tested it manually.
>>
>> When I have tried to run the selftests, I get errors related to the
>> phy
>> problem, which I have not been able to find a way around.
>
> What PHY regression?
I had a discussion with Jacub, which resulted in me sending a mail to
maintainers on this. The problem is shown below...
the phy is marvell/6097/88e3082
------------[ cut here ]------------
WARNING: CPU: 0 PID: 332 at drivers/net/phy/phy.c:975
phy_error+0x28/0x54
Modules linked in:
CPU: 0 PID: 332 Comm: kworker/0:0 Tainted: G W 6.0.0 #17
Hardware name: Freescale i.MX27 (Device Tree Support)
Workqueue: events_power_efficient phy_state_machine
unwind_backtrace from show_stack+0x18/0x1c
show_stack from dump_stack_lvl+0x28/0x30
dump_stack_lvl from __warn+0xb8/0x114
__warn from warn_slowpath_fmt+0x80/0xbc
warn_slowpath_fmt from phy_error+0x28/0x54
phy_error from phy_state_machine+0xbc/0x218
phy_state_machine from process_one_work+0x17c/0x244
process_one_work from worker_thread+0x248/0x2cc
worker_thread from kthread+0xb0/0xbc
kthread from ret_from_fork+0x14/0x2c
Exception stack(0xc4a71fb0 to 0xc4a71ff8)
1fa0: 00000000 00000000 00000000
00000000
1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
00000000
1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
---[ end trace 0000000000000000 ]---
On Tue, Nov 15, 2022 at 11:26:55AM +0100, [email protected] wrote:
> On 2022-11-15 10:30, Ido Schimmel wrote:
> > On Sat, Nov 12, 2022 at 09:37:46PM +0100, Hans J. Schultz wrote:
> > > This patchset adds MAB [1] offload support in mv88e6xxx.
> > >
> > > Patch #1: Fix a problem when reading the FID needed to get the VID.
> > >
> > > Patch #2: The MAB implementation for mv88e6xxx.
> >
> > Just to be sure, this was tested with bridge_locked_port.sh, right?
>
> As I have the phy regression I have given notice of, that has simply not
> been possible. After maybe 50 resets it worked for me at a point
> (something to do with timing), and I tested it manually.
>
> When I have tried to run the selftests, I get errors related to the phy
> problem, which I have not been able to find a way around.
What PHY regression?
On Tue, Nov 15, 2022 at 11:52:40AM +0100, [email protected] wrote:
> I had a discussion with Jacub, which resulted in me sending a mail to
> maintainers on this. The problem is shown below...
>
> the phy is marvell/6097/88e3082
>
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 332 at drivers/net/phy/phy.c:975
> phy_error+0x28/0x54
> Modules linked in:
> CPU: 0 PID: 332 Comm: kworker/0:0 Tainted: G W 6.0.0 #17
> Hardware name: Freescale i.MX27 (Device Tree Support)
> Workqueue: events_power_efficient phy_state_machine
> unwind_backtrace from show_stack+0x18/0x1c
> show_stack from dump_stack_lvl+0x28/0x30
> dump_stack_lvl from __warn+0xb8/0x114
> __warn from warn_slowpath_fmt+0x80/0xbc
> warn_slowpath_fmt from phy_error+0x28/0x54
> phy_error from phy_state_machine+0xbc/0x218
> phy_state_machine from process_one_work+0x17c/0x244
> process_one_work from worker_thread+0x248/0x2cc
> worker_thread from kthread+0xb0/0xbc
> kthread from ret_from_fork+0x14/0x2c
> Exception stack(0xc4a71fb0 to 0xc4a71ff8)
> 1fa0: 00000000 00000000 00000000 00000000
> 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
> ---[ end trace 0000000000000000 ]---
Was that email public on the lists? I don't see it...
The phy_error() is called from phy_state_machine() if one of
phy_check_link_status() or phy_start_aneg() fails.
Could you please print exactly the value of "err", as well as dig deeper
to see which call is failing, all the way into the PHY driver?
Easiest way to do that would probably be something like:
$ trace-cmd record -e mdio sleep 10 &
... do your stuff ...
$ trace-cmd report
kworker/u4:3-337 [001] 59.054741: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x01 val:0x7949
kworker/u4:3-337 [001] 59.054941: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x09 val:0x0700
kworker/u4:3-337 [001] 59.055262: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x0a val:0x4000
kworker/u4:3-337 [001] 60.075808: mdio_access: 0000:00:00.3 read phy:0x13 reg:0x1c val:0x3005
"val" will be negative when there is an error. This is to see quicker what fails,
and if some MDIO access ever works.
If you don't want to enable CONFIG_FTRACE or install trace-cmd, you
could also probably do the debugging manually.
On 2022-11-15 12:10, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 11:52:40AM +0100, [email protected]
> wrote:
>> I had a discussion with Jacub, which resulted in me sending a mail to
>> maintainers on this. The problem is shown below...
>>
>> the phy is marvell/6097/88e3082
>>
>> ------------[ cut here ]------------
>> WARNING: CPU: 0 PID: 332 at drivers/net/phy/phy.c:975
>> phy_error+0x28/0x54
>> Modules linked in:
>> CPU: 0 PID: 332 Comm: kworker/0:0 Tainted: G W 6.0.0
>> #17
>> Hardware name: Freescale i.MX27 (Device Tree Support)
>> Workqueue: events_power_efficient phy_state_machine
>> unwind_backtrace from show_stack+0x18/0x1c
>> show_stack from dump_stack_lvl+0x28/0x30
>> dump_stack_lvl from __warn+0xb8/0x114
>> __warn from warn_slowpath_fmt+0x80/0xbc
>> warn_slowpath_fmt from phy_error+0x28/0x54
>> phy_error from phy_state_machine+0xbc/0x218
>> phy_state_machine from process_one_work+0x17c/0x244
>> process_one_work from worker_thread+0x248/0x2cc
>> worker_thread from kthread+0xb0/0xbc
>> kthread from ret_from_fork+0x14/0x2c
>> Exception stack(0xc4a71fb0 to 0xc4a71ff8)
>> 1fa0: 00000000 00000000 00000000
>> 00000000
>> 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> 00000000
>> 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
>> ---[ end trace 0000000000000000 ]---
>
> Was that email public on the lists? I don't see it...
Sorry, yes the public list was not added.
>
> The phy_error() is called from phy_state_machine() if one of
> phy_check_link_status() or phy_start_aneg() fails.
>
> Could you please print exactly the value of "err", as well as dig
> deeper
> to see which call is failing, all the way into the PHY driver?
It happens on upstart, so I would then have to hack the system upstart
to add trace.
I also have:
mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell
88E6097/88E6097F, revision 2
mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link
mode
mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow
control off
mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver
[Generic PHY] (irq=POLL)
mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver
[Marvell 88E1112] (irq=174)
mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY
[!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver
[Marvell 88E1112] (irq=175)
after this and adding the ifaces to the bridge, it continues like:
br0: port 1(eth10) entered blocking state
br0: port 1(eth10) entered disabled state
br0: port 2(eth6) entered blocking state
br0: port 2(eth6) entered disabled state
device eth6 entered promiscuous mode
device eth10 entered promiscuous mode
br0: port 3(eth9) entered blocking state
br0: port 3(eth9) entered disabled state
device eth9 entered promiscuous mode
br0: port 4(eth5) entered blocking state
br0: port 4(eth5) entered disabled state
device eth5 entered promiscuous mode
br0: port 5(eth8) entered blocking state
br0: port 5(eth8) entered disabled state
device eth8 entered promiscuous mode
br0: port 6(eth4) entered blocking state
br0: port 6(eth4) entered disabled state
mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
mv88e6085 1002b000.ethernet-1:04: port 0 failed to add 9a:af:03:f1:bd:0a
vid 1 to fdb: -110
device eth4 entered promiscuous mode
br0: port 7(eth7) entered blocking state
br0: port 7(eth7) entered disabled state
I don't know if that gives ay clues...?
Otherwise I have to take more time to see what I can dig out. The
easiest for me is then to
add some printk statements giving targeted information if told what and
where...
>
> Easiest way to do that would probably be something like:
>
> $ trace-cmd record -e mdio sleep 10 &
> ... do your stuff ...
> $ trace-cmd report
> kworker/u4:3-337 [001] 59.054741: mdio_access:
> 0000:00:00.3 read phy:0x13 reg:0x01 val:0x7949
> kworker/u4:3-337 [001] 59.054941: mdio_access:
> 0000:00:00.3 read phy:0x13 reg:0x09 val:0x0700
> kworker/u4:3-337 [001] 59.055262: mdio_access:
> 0000:00:00.3 read phy:0x13 reg:0x0a val:0x4000
> kworker/u4:3-337 [001] 60.075808: mdio_access:
> 0000:00:00.3 read phy:0x13 reg:0x1c val:0x3005
>
> "val" will be negative when there is an error. This is to see quicker
> what fails,
> and if some MDIO access ever works.
>
> If you don't want to enable CONFIG_FTRACE or install trace-cmd, you
> could also probably do the debugging manually.
On Tue, Nov 15, 2022 at 12:31:59PM +0100, [email protected] wrote:
> It happens on upstart, so I would then have to hack the system upstart to
> add trace.
Hack upstart or disable the service that brings the switch ports up, and
bring them up manually...
> I also have:
> mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell 88E6097/88E6097F, revision 2
> mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link mode
> mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow control off
> mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver [Generic PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver [Marvell 88E1112] (irq=174)
> mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver [Marvell 88E1112] (irq=175)
>
> after this and adding the ifaces to the bridge, it continues like:
>
> br0: port 1(eth10) entered blocking state
> br0: port 1(eth10) entered disabled state
> br0: port 2(eth6) entered blocking state
> br0: port 2(eth6) entered disabled state
> device eth6 entered promiscuous mode
> device eth10 entered promiscuous mode
> br0: port 3(eth9) entered blocking state
> br0: port 3(eth9) entered disabled state
> device eth9 entered promiscuous mode
> br0: port 4(eth5) entered blocking state
> br0: port 4(eth5) entered disabled state
> device eth5 entered promiscuous mode
> br0: port 5(eth8) entered blocking state
> br0: port 5(eth8) entered disabled state
> device eth8 entered promiscuous mode
> br0: port 6(eth4) entered blocking state
> br0: port 6(eth4) entered disabled state
> mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
> mv88e6085 1002b000.ethernet-1:04: port 0 failed to add 9a:af:03:f1:bd:0a vid 1 to fdb: -110
Dumb question, but if you get errors like this, how can you test anything at all
in the patches that you submit?
> device eth4 entered promiscuous mode
> br0: port 7(eth7) entered blocking state
> br0: port 7(eth7) entered disabled state
>
> I don't know if that gives ay clues...?
Not really. That error might be related - something indicating a breakage
in the top-level (fec IIUC) MDIO controller, or not. There was "recent"
rework almost everywhere. For example commit 35da1dfd9484 ("net: dsa:
mv88e6xxx: Improve performance of busy bit polling"). That also hooks
into the mv88e6xxx cascaded MDIO controller (mv88e6xxx_g2_smi_phy_wait),
so there might be something there.
>
> Otherwise I have to take more time to see what I can dig out. The easiest
> for me is then to add some printk statements giving targeted information if told what and
> where...
Do you have a timeline for when the regression was introduced?
Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with
that reverted might be worth a shot. Otherwise, a bisect from a known
working version only takes a couple of hours, and shouldn't require
other changes to the setup.
On 2022-11-15 13:22, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 12:31:59PM +0100, [email protected]
> wrote:
>> It happens on upstart, so I would then have to hack the system upstart
>> to
>> add trace.
>
> Hack upstart or disable the service that brings the switch ports up,
> and
> bring them up manually...
>
>> I also have:
>> mv88e6085 1002b000.ethernet-1:04: switch 0x990 detected: Marvell
>> 88E6097/88E6097F, revision 2
>> mv88e6085 1002b000.ethernet-1:04: configuring for fixed/rgmii-id link
>> mode
>> mv88e6085 1002b000.ethernet-1:04: Link is Up - 100Mbps/Full - flow
>> control off
>> mv88e6085 1002b000.ethernet-1:04 eth10 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver
>> [Generic PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver
>> [Marvell 88E1112] (irq=174)
>> mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver
>> [Marvell 88E1112] (irq=175)
>>
>> after this and adding the ifaces to the bridge, it continues like:
>>
>> br0: port 1(eth10) entered blocking state
>> br0: port 1(eth10) entered disabled state
>> br0: port 2(eth6) entered blocking state
>> br0: port 2(eth6) entered disabled state
>> device eth6 entered promiscuous mode
>> device eth10 entered promiscuous mode
>> br0: port 3(eth9) entered blocking state
>> br0: port 3(eth9) entered disabled state
>> device eth9 entered promiscuous mode
>> br0: port 4(eth5) entered blocking state
>> br0: port 4(eth5) entered disabled state
>> device eth5 entered promiscuous mode
>> br0: port 5(eth8) entered blocking state
>> br0: port 5(eth8) entered disabled state
>> device eth8 entered promiscuous mode
>> br0: port 6(eth4) entered blocking state
>> br0: port 6(eth4) entered disabled state
>> mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
>> mv88e6085 1002b000.ethernet-1:04: port 0 failed to add
>> 9a:af:03:f1:bd:0a vid 1 to fdb: -110
>
> Dumb question, but if you get errors like this, how can you test
> anything at all
> in the patches that you submit?
The answer is that I don't always get these errors... once in a while
(maaany resets) it does
not happen, and all is fine.
The error code is... well of course -110 (timed out).
>
>> device eth4 entered promiscuous mode
>> br0: port 7(eth7) entered blocking state
>> br0: port 7(eth7) entered disabled state
>>
>> I don't know if that gives ay clues...?
>
> Not really. That error might be related - something indicating a
> breakage
> in the top-level (fec IIUC) MDIO controller, or not. There was "recent"
> rework almost everywhere. For example commit 35da1dfd9484 ("net: dsa:
> mv88e6xxx: Improve performance of busy bit polling"). That also hooks
> into the mv88e6xxx cascaded MDIO controller
> (mv88e6xxx_g2_smi_phy_wait),
> so there might be something there.
>
I can check that out, but I remember that net-next has not worked on
this device for quite some
time...
>>
>> Otherwise I have to take more time to see what I can dig out. The
>> easiest
>> for me is then to add some printk statements giving targeted
>> information if told what and
>> where...
>
> Do you have a timeline for when the regression was introduced?
> Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with
> that reverted might be worth a shot. Otherwise, a bisect from a known
> working version only takes a couple of hours, and shouldn't require
> other changes to the setup.
I can't say when the regression was introduced as I used modified
kernels, but something
between 5.16 and 5.17, I know there was something phy related, but it's
a bit more complicated,
so it is only a guess...
I would have to get the whole locked port patch set etc. on a 5.16 to
see if that works.
On 2022-11-15 13:22, Vladimir Oltean wrote:
> Do you have a timeline for when the regression was introduced?
> Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with
> that reverted might be worth a shot. Otherwise, a bisect from a known
> working version only takes a couple of hours, and shouldn't require
> other changes to the setup.
Wow! Reverting 35da1dfd9484 and the problem has disappeared. :-)
The bridge_locked_port.sh tests all succeeded... as expected... ;-)
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver [Generic
> PHY] (irq=POLL)
It is interesting it is using the generic PHY driver, not the Marvell
PHY driver.
> mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver [Generic
> PHY] (irq=POLL)
> mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver
> [Marvell 88E1112] (irq=174)
> mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY
> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver
> [Marvell 88E1112] (irq=175)
And here it does use the Marvell PHY driver. Are ports 8 and 9
external, where as the others are internal?
Andrew
On 2022-11-15 14:21, Andrew Lunn wrote:
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:00] driver
>> [Generic
>> PHY] (irq=POLL)
>
> It is interesting it is using the generic PHY driver, not the Marvell
> PHY driver.
>
>> mv88e6085 1002b000.ethernet-1:04 eth6 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:01] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth9 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:02] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth5 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:03] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth8 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:04] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth4 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:05] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth7 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:06] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth3 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdio:07] driver
>> [Generic
>> PHY] (irq=POLL)
>> mv88e6085 1002b000.ethernet-1:04 eth2 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:08] driver
>> [Marvell 88E1112] (irq=174)
>> mv88e6085 1002b000.ethernet-1:04 eth1 (uninitialized): PHY
>> [!soc!aipi@10020000!ethernet@1002b000!mdio!switch@4!mdioe:09] driver
>> [Marvell 88E1112] (irq=175)
>
> And here it does use the Marvell PHY driver. Are ports 8 and 9
> external, where as the others are internal?
>
> Andrew
It is an 8 port switch, so the two (8+9) are for internal use, I guess,
as I
have not had any part in the system design etc of this device.
I have it for test and development purposes, incl. upstreaming.
On Tue, Nov 15, 2022 at 02:25:13PM +0100, [email protected] wrote:
> On 2022-11-15 13:22, Vladimir Oltean wrote:
> > Do you have a timeline for when the regression was introduced?
> > Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with
> > that reverted might be worth a shot. Otherwise, a bisect from a known
> > working version only takes a couple of hours, and shouldn't require
> > other changes to the setup.
>
> Wow! Reverting 35da1dfd9484 and the problem has disappeared. :-)
See? That wasn't very painful, was it.
Now, why doesn't that commit work for you? that's the real question.
I'm going to say there's a big assumption made there. The old code used
to poll up to 16 times with sleeps of up to 2 ms in between.
The new code polls until at least 50 ms have elapsed.
I can imagine the thought process being something like "hmm, 16*2=32ms,
let's round that up to 50 just to be sure". But the effective timeout
was not really increased. Rather said, in the old code there was never
really an effective timeout, since the polling code could have been
preempted many times, and these preemptions would not be accounted
against the msleep(2) calls. Whereas the new code really tracks
something approximating 50 ms now.
Could you please add the reverted patch back to your git tree, and see
by how much do you need to increase the timeout for your system to get
reliable results?
> The bridge_locked_port.sh tests all succeeded... as expected... ;-)
Yeah, I confirm this works on a 6390 over here. But I still don't like
the log spam from the IRQ handlers.
[root@mox:~/.../drivers/net/dsa] # ./bridge_locked_port.sh lan1 lan2 lan3 lan4
[ 1298.218224] mv88e6085 d0032004.mdio-mii:10 lan1: configuring for phy/gmii link mode
[ 1299.150249] mv88e6085 d0032004.mdio-mii:10 lan4: configuring for phy/gmii link mode
[ 1299.791778] br0: port 1(lan2) entered blocking state
[ 1299.800824] br0: port 1(lan2) entered disabled state
[ 1300.419596] br0: port 2(lan3) entered blocking state
[ 1300.425016] br0: port 2(lan3) entered disabled state
[ 1300.527959] device lan3 entered promiscuous mode
[ 1300.538124] device lan2 entered promiscuous mode
[ 1300.733679] mv88e6085 d0032004.mdio-mii:10 lan2: configuring for phy/gmii link mode
[ 1300.765642] 8021q: adding VLAN 0 to HW filter on device lan2
[ 1300.818540] mv88e6085 d0032004.mdio-mii:10 lan3: configuring for phy/gmii link mode
[ 1300.855003] 8021q: adding VLAN 0 to HW filter on device lan3
[ 1303.903840] mv88e6085 d0032004.mdio-mii:10 lan3: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1303.912939] IPv6: ADDRCONF(NETDEV_CHANGE): lan3: link becomes ready
[ 1303.928313] br0: port 2(lan3) entered blocking state
[ 1303.933685] br0: port 2(lan3) entered forwarding state
[ 1303.948607] mv88e6085 d0032004.mdio-mii:10 lan4: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1303.985784] IPv6: ADDRCONF(NETDEV_CHANGE): br0: link becomes ready
[ 1303.999962] IPv6: ADDRCONF(NETDEV_CHANGE): lan4: link becomes ready
[ 1304.003780] mv88e6085 d0032004.mdio-mii:10 lan2: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1304.017407] IPv6: ADDRCONF(NETDEV_CHANGE): lan2: link becomes ready
[ 1304.027922] br0: port 1(lan2) entered blocking state
[ 1304.033157] br0: port 1(lan2) entered forwarding state
[ 1304.043508] mv88e6085 d0032004.mdio-mii:10 lan1: Link is Up - 1Gbps/Full - flow control rx/tx
[ 1304.052601] IPv6: ADDRCONF(NETDEV_CHANGE): lan4.100: link becomes ready
[ 1304.158404] IPv6: ADDRCONF(NETDEV_CHANGE): lan1: link becomes ready
[ 1304.167574] IPv6: ADDRCONF(NETDEV_CHANGE): lan1.100: link becomes ready
TEST: Locked port ipv4 [ OK ]
TEST: Locked port ipv6 [ OK ]
[ 1337.627010] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 1 callbacks suppressed
[ 1337.627042] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1337.644822] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1337.727920] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1337.862053] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1337.956972] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1338.065996] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1338.136905] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1338.238182] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1338.339244] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1338.440106] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1342.655520] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 33 callbacks suppressed
[ 1342.655568] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1342.763619] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1342.835264] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1342.847464] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1342.865387] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1342.971661] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1343.075774] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1343.179562] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1343.283426] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1343.387409] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1348.758296] mv88e6xxx_g1_vtu_prob_irq_thread_fn: 24 callbacks suppressed
[ 1348.758328] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1348.858879] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1348.990492] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.063977] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.165979] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.268187] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.373827] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.472601] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.573752] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
[ 1349.585837] mv88e6085 d0032004.mdio-mii:10: VTU member violation for vid 100, source port 9
TEST: Locked port vlan [ OK ]
[ 1352.550194] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1352.659486] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1352.792941] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1352.888959] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1352.968150] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1353.072434] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1353.182844] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1353.280595] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1353.384755] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1353.488602] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1358.704139] mv88e6xxx_g1_atu_prob_irq_thread_fn: 39 callbacks suppressed
[ 1358.704172] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.280930] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.388609] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.524500] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.620272] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.696597] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.727872] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.801563] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1359.908665] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1360.012063] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1363.858627] mv88e6xxx_g1_atu_prob_irq_thread_fn: 29 callbacks suppressed
[ 1363.858674] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1364.879407] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
TEST: Locked port MAB [ OK ]
[ 1369.837089] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2
[ 1369.971489] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2
[ 1370.070444] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2
[ 1370.143784] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2
[ 1370.245843] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for a0:b0:c0:c0:b0:a0 portvec 4 spid 2
[ 1371.465429] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2
[ 1371.599084] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2
[ 1371.703341] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2
[ 1371.794905] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2
[ 1371.873307] mv88e6085 d0032004.mdio-mii:10: ATU member violation for a0:b0:c0:c0:b0:a0 portvec 8 spid 2
[ 1372.022403] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
TEST: Locked port MAB roam [ OK ]
TEST: Locked port MAB configuration [ OK ]
[ 1373.039089] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:01 portvec 4 spid 2
[ 1373.060995] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:05 portvec 4 spid 2
[ 1373.167964] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:05 portvec 4 spid 2
[ 1373.296506] mv88e6085 d0032004.mdio-mii:10: ATU miss violation for 00:01:02:03:04:05 portvec 4 spid 2
TEST: Locked port MAB FDB flush [ OK ]
[ 1375.330376] mv88e6085 d0032004.mdio-mii:10 lan3: Link is Down
[ 1375.341372] br0: port 2(lan3) entered disabled state
[ 1375.461136] mv88e6085 d0032004.mdio-mii:10 lan2: Link is Down
[ 1375.489476] br0: port 1(lan2) entered disabled state
[ 1375.611159] device lan3 left promiscuous mode
[ 1375.618253] br0: port 2(lan3) entered disabled state
[ 1375.737909] device lan2 left promiscuous mode
[ 1375.745305] br0: port 1(lan2) entered disabled state
On 2022-11-15 15:56, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 02:25:13PM +0100, [email protected]
> wrote:
>> On 2022-11-15 13:22, Vladimir Oltean wrote:
>
>> The bridge_locked_port.sh tests all succeeded... as expected... ;-)
>
> Yeah, I confirm this works on a 6390 over here.
Thanks :-)
> But I still don't like
> the log spam from the IRQ handlers.
>
> [root@mox:~/.../drivers/net/dsa] # ./bridge_locked_port.sh lan1 lan2
> lan3 lan4
> [ 1298.218224] mv88e6085 d0032004.mdio-mii:10 lan1: configuring for
> ...
I think the violation log issue should be handled in a seperate
patch(set)?
On Tue, Nov 15, 2022 at 04:14:05PM +0100, [email protected] wrote:
> I think the violation log issue should be handled in a seperate patch(set)?
idk, what do you plan to do about it?
On 2022-11-15 15:56, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 02:25:13PM +0100, [email protected]
> wrote:
>> On 2022-11-15 13:22, Vladimir Oltean wrote:
>> > Do you have a timeline for when the regression was introduced?
>> > Commit 35da1dfd9484 reverts cleanly, so I suppose giving it a go with
>> > that reverted might be worth a shot. Otherwise, a bisect from a known
>> > working version only takes a couple of hours, and shouldn't require
>> > other changes to the setup.
>>
>> Wow! Reverting 35da1dfd9484 and the problem has disappeared. :-)
>
> See? That wasn't very painful, was it.
Indeed it was not, when you get a good tip. Thanks alot! :-)
>
> Now, why doesn't that commit work for you? that's the real question.
> I'm going to say there's a big assumption made there. The old code used
> to poll up to 16 times with sleeps of up to 2 ms in between.
> The new code polls until at least 50 ms have elapsed.
> I can imagine the thought process being something like "hmm, 16*2=32ms,
> let's round that up to 50 just to be sure". But the effective timeout
> was not really increased. Rather said, in the old code there was never
> really an effective timeout, since the polling code could have been
> preempted many times, and these preemptions would not be accounted
> against the msleep(2) calls. Whereas the new code really tracks
> something approximating 50 ms now.
>
> Could you please add the reverted patch back to your git tree, and see
> by how much do you need to increase the timeout for your system to get
> reliable results?
>
Yes, so you want me to simply increase the 50ms on line 58 in smi.c...
I have now tried to increase it even to 10000ms == 10s and it didn't
help,
so something else must be needed...
On Tue, Nov 15, 2022 at 05:03:01PM +0100, [email protected] wrote:
> Yes, so you want me to simply increase the 50ms on line 58 in smi.c...
>
> I have now tried to increase it even to 10000ms == 10s and it didn't help,
> so something else must be needed...
Not only that, but also the one in mv88e6xxx_wait_mask().
On 2022-11-15 17:15, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 04:14:05PM +0100, [email protected]
> wrote:
>> I think the violation log issue should be handled in a seperate
>> patch(set)?
>
> idk, what do you plan to do about it?
When I think about it, I think that the messages should be disabled by
default
and like one enables NO_LL_LEARN (echo 1 >/sys/class...), they can be
activated
if one needs it. And of course the ethtool stats will still be there
anyhow...
How does that sound?
On Tue, Nov 15, 2022 at 06:11:08PM +0100, [email protected] wrote:
> On 2022-11-15 17:15, Vladimir Oltean wrote:
> > On Tue, Nov 15, 2022 at 04:14:05PM +0100, [email protected]
> > wrote:
> > > I think the violation log issue should be handled in a seperate
> > > patch(set)?
> >
> > idk, what do you plan to do about it?
>
> When I think about it, I think that the messages should be disabled by default
> and like one enables NO_LL_LEARN (echo 1 >/sys/class...), they can be activated
> if one needs it. And of course the ethtool stats will still be there anyhow...
>
> How does that sound?
Just make them trace points...
If you don't know how to do that, just search for who has "#define TRACE_SYSTEM"
in drivers/net/ethernet/ and steal from them...
On 2022-11-15 17:18, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 05:03:01PM +0100, [email protected]
> wrote:
>> Yes, so you want me to simply increase the 50ms on line 58 in smi.c...
>>
>> I have now tried to increase it even to 10000ms == 10s and it didn't
>> help,
>> so something else must be needed...
>
> Not only that, but also the one in mv88e6xxx_wait_mask().
So, I will not present you with a graph as it is a tedious process
(probably
it is some descending gaussian curve wrt timeout occurring).
But 100ms fails, 125 I had 1 port fail, at 140, 150 and 180 I saw
timeouts
resulting in fdb add fails, like (and occasional port fail):
mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
mv88e6085 1002b000.ethernet-1:04: port 0 failed to add be:7c:96:06:9f:09
vid 1 to fdb: -110
At around 200 ms it looks like it is getting stable (like 5 runs, no
problems).
So with the gaussian curve tail whipping ones behind (risque of failure)
it
might need to be like 300 ms in my case... :-)
On Tue, Nov 15, 2022 at 07:40:02PM +0100, [email protected] wrote:
> So, I will not present you with a graph as it is a tedious process (probably
> it is some descending gaussian curve wrt timeout occurring).
>
> But 100ms fails, 125 I had 1 port fail, at 140, 150 and 180 I saw timeouts
> resulting in fdb add fails, like (and occasional port fail):
>
> mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
> mv88e6085 1002b000.ethernet-1:04: port 0 failed to add be:7c:96:06:9f:09 vid
> 1 to fdb: -110
>
> At around 200 ms it looks like it is getting stable (like 5 runs, no
> problems).
>
> So with the gaussian curve tail whipping ones behind (risque of failure) it
> might need to be like 300 ms in my case... :-)
Pick a value that is high enough to be reliable and submit a patch to
"net" where you present the evidence for it (top-level MDIO controller,
SoC, switch, kernel). I don't believe there's much to read into. A large
timeout shouldn't have a negative effect on the MDIO performance,
because it just determines how long it takes until the kernel declares
it dead, rather than how long it takes for transactions to actually take
place.
> Pick a value that is high enough to be reliable and submit a patch to
> "net" where you present the evidence for it (top-level MDIO controller,
> SoC, switch, kernel). I don't believe there's much to read into. A large
> timeout shouldn't have a negative effect on the MDIO performance,
> because it just determines how long it takes until the kernel declares
> it dead, rather than how long it takes for transactions to actually take
> place.
Yes, please do that.
It is interesting that you found this. I'm just curious, so no need to
investigate if you don't have time. Is there a pattern, is it the same
register which always times out?
Andrew
On 2022-11-16 11:24, Vladimir Oltean wrote:
> On Tue, Nov 15, 2022 at 07:40:02PM +0100, [email protected]
> wrote:
>> So, I will not present you with a graph as it is a tedious process
>> (probably
>> it is some descending gaussian curve wrt timeout occurring).
>>
>> But 100ms fails, 125 I had 1 port fail, at 140, 150 and 180 I saw
>> timeouts
>> resulting in fdb add fails, like (and occasional port fail):
>>
>> mv88e6085 1002b000.ethernet-1:04: Timeout while waiting for switch
>> mv88e6085 1002b000.ethernet-1:04: port 0 failed to add
>> be:7c:96:06:9f:09 vid
>> 1 to fdb: -110
>>
>> At around 200 ms it looks like it is getting stable (like 5 runs, no
>> problems).
>>
>> So with the gaussian curve tail whipping ones behind (risque of
>> failure) it
>> might need to be like 300 ms in my case... :-)
>
> Pick a value that is high enough to be reliable and submit a patch to
> "net" where you present the evidence for it (top-level MDIO controller,
> SoC, switch, kernel). I don't believe there's much to read into. A
> large
> timeout shouldn't have a negative effect on the MDIO performance,
> because it just determines how long it takes until the kernel declares
> it dead, rather than how long it takes for transactions to actually
> take
> place.
Would it not be appropriate to have a define that specifies the value
instead
of the same value two places as it is now?
And in so case, what would be an appropriate name?
> Would it not be appropriate to have a define that specifies the
> value instead of the same value two places as it is now?
Yes.
> And in so case, what would be an appropriate name?
MV88E6XXX_WAIT_TIMEOUT_MS ?
Andrew