Subject: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

Hi.

This patch series fixes EEE support for MT7531 and the switch on the MT7988
SoC. EEE did not work on MT7531 on most boards before this. The status on
the MT7988 SoC switch is unclear, as I don't have the hardware.

Signed-off-by: Arınç ÜNAL <[email protected]>
---
Arınç ÜNAL (3):
net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards
net: dsa: mt7530: fix disabling EEE on failure on MT7531 and MT7988
net: phy: mediatek-ge: do not disable EEE advertisement

drivers/net/dsa/mt7530.c | 7 +++++++
drivers/net/dsa/mt7530.h | 7 ++++++-
drivers/net/phy/mediatek-ge.c | 3 ---
3 files changed, 13 insertions(+), 4 deletions(-)
---
base-commit: ea80e3ed09ab2c2b75724faf5484721753e92c31
change-id: 20240317-for-net-mt7530-fix-eee-for-mt7531-mt7988-a5c5453cc0e8

Best regards,
--
Arınç ÜNAL <[email protected]>



2024-03-18 12:58:14

by Florian Fainelli

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch



On 3/18/2024 12:46 AM, Arınç ÜNAL via B4 Relay wrote:
> Hi.
>
> This patch series fixes EEE support for MT7531 and the switch on the MT7988
> SoC. EEE did not work on MT7531 on most boards before this, it is unclear
> what's the status on MT7988 SoC switch as I don't have the hardware.

We've received your patch series 4 times, and the same thing happened
with your previous b4 submission. Can you find out what happened? Thanks.
--
Florian

2024-03-18 14:04:26

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 18.03.2024 15:57, Florian Fainelli wrote:
>
>
> On 3/18/2024 12:46 AM, Arınç ÜNAL via B4 Relay wrote:
>> Hi.
>>
>> This patch series fixes EEE support for MT7531 and the switch on the MT7988
>> SoC. EEE did not work on MT7531 on most boards before this, it is unclear
>> what's the status on MT7988 SoC switch as I don't have the hardware.
>
> We've received your patch series 4 times and this was the same thing with your previous b4 submission, can you find out what happened? Thanks.

It looks like my branch name was too long again. b4 0.13.0 cannot handle
branch names that are too long. I'll keep it shorter for future
submissions. It'd be great if Konstantin could provide a specific limit.

https://lore.kernel.org/all/20240205-silky-sensible-puffin-8e23ee@lemur/

Arınç

2024-03-18 18:02:09

by Konstantin Ryabitsev

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On Mon, Mar 18, 2024 at 05:03:29PM +0300, Arınç ÜNAL wrote:
> > We've received your patch series 4 times and this was the same thing with your previous b4 submission, can you find out what happened? Thanks.
>
> It looks like my branch name was too long again. b4 0.13.0 cannot handle
> branch names that are too long. I'll keep it shorter for future
> submissions. It'd be great if Konstantin could provide a specific limit.

It's not really b4, it's the web endpoint and the version of python it's
running. I hope to fix it soon by applying the same workarounds as we ended up
doing for b4 itself.

-K

2024-03-19 16:07:43

by Konstantin Ryabitsev

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On Mon, Mar 18, 2024 at 02:01:58PM -0400, Konstantin Ryabitsev wrote:
> > It looks like my branch name was too long again. b4 0.13.0 cannot handle
> > branch names that are too long. I'll keep it shorter for future
> > submissions. It'd be great if Konstantin could provide a specific limit.
>
> It's not really b4, it's the web endpoint and the version of python it's
> running. I hope to fix it soon by applying the same workarounds as we ended up
> doing for b4 itself.

I've deployed a version of the endpoint that works around this bug. Hopefully,
you won't see it again regardless of the message-id length.

-K

2024-03-19 18:26:57

by Daniel Golle

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On Mon, Mar 18, 2024 at 10:46:22AM +0300, Arınç ÜNAL via B4 Relay wrote:
> Hi.
>
> This patch series fixes EEE support for MT7531 and the switch on the MT7988
> SoC. EEE did not work on MT7531 on most boards before this, it is unclear
> what's the status on MT7988 SoC switch as I don't have the hardware.

EEE seems to already work just fine on the MT7988 built-in switch, at least
on the BPI-R4. I don't think the SoC has bootstrap pins related to EEE like
stand-alone MT753x may have.

root@bpi-r4:~# ethtool --show-eee lan1
EEE settings for lan1:
    EEE status: disabled
    Tx LPI: 30 (us)
    Supported EEE link modes: 100baseT/Full
                              1000baseT/Full
    Advertised EEE link modes: Not reported
    Link partner advertised EEE link modes: 100baseT/Full
                                            1000baseT/Full

root@bpi-r4:~# ethtool --set-eee lan1 eee on
root@bpi-r4:~# ethtool --show-eee lan1
EEE settings for lan1:
    EEE status: enabled - inactive
    Tx LPI: 30 (us)
    Supported EEE link modes: 100baseT/Full
                              1000baseT/Full
    Advertised EEE link modes: 100baseT/Full
                               1000baseT/Full
    Link partner advertised EEE link modes: Not reported
root@bpi-r4:~# ethtool --show-eee lan1
EEE settings for lan1:
    EEE status: enabled - active
    Tx LPI: 30 (us)
    Supported EEE link modes: 100baseT/Full
                              1000baseT/Full
    Advertised EEE link modes: 100baseT/Full
                               1000baseT/Full
    Link partner advertised EEE link modes: 100baseT/Full
                                            1000baseT/Full

So don't fix if it ain't broken maybe...?

>
> Signed-off-by: Arınç ÜNAL <[email protected]>
> ---
> Arınç ÜNAL (3):
> net: dsa: mt7530: fix enabling EEE on MT7531 switch on all boards
> net: dsa: mt7530: fix disabling EEE on failure on MT7531 and MT7988
> net: phy: mediatek-ge: do not disable EEE advertisement
>
> drivers/net/dsa/mt7530.c | 7 +++++++
> drivers/net/dsa/mt7530.h | 7 ++++++-
> drivers/net/phy/mediatek-ge.c | 3 ---
> 3 files changed, 13 insertions(+), 4 deletions(-)
> ---
> base-commit: ea80e3ed09ab2c2b75724faf5484721753e92c31
> change-id: 20240317-for-net-mt7530-fix-eee-for-mt7531-mt7988-a5c5453cc0e8
>
> Best regards,
> --
> Arınç ÜNAL <[email protected]>
>
>

2024-03-19 18:32:16

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 19.03.2024 19:03, Konstantin Ryabitsev wrote:
> On Mon, Mar 18, 2024 at 02:01:58PM -0400, Konstantin Ryabitsev wrote:
>>> It looks like my branch name was too long again. b4 0.13.0 cannot handle
>>> branch names that are too long. I'll keep it shorter for future
>>> submissions. It'd be great if Konstantin could provide a specific limit.
>>
>> It's not really b4, it's the web endpoint and the version of python it's
>> running. I hope to fix it soon by applying the same workarounds as we ended up
>> doing for b4 itself.
>
> I've deployed the version of endpoint that works around this bug. Hopefully,
> you won't see it again regardless of the message-id length.

Thank you very much!

Arınç

2024-03-19 18:53:36

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 19.03.2024 21:25, Daniel Golle wrote:
> On Mon, Mar 18, 2024 at 10:46:22AM +0300, Arınç ÜNAL via B4 Relay wrote:
>> Hi.
>>
>> This patch series fixes EEE support for MT7531 and the switch on the MT7988
>> SoC. EEE did not work on MT7531 on most boards before this, it is unclear
>> what's the status on MT7988 SoC switch as I don't have the hardware.
>
> EEE seems to already work just fine on the MT7988 built-in switch, at least
> on the BPI-R4. I don't think the SoC has bootstrap pins related to EEE like
> stand-alone MT753x may have.
>
> root@bpi-r4:~# ethtool --show-eee lan1
> EEE settings for lan1:
> EEE status: disabled
> Tx LPI: 30 (us)
> Supported EEE link modes: 100baseT/Full
> 1000baseT/Full
> Advertised EEE link modes: Not reported
> Link partner advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
>
> root@bpi-r4:~# ethtool --set-eee lan1 eee on
> root@bpi-r4:~# ethtool --show-eee lan1
> EEE settings for lan1:
> EEE status: enabled - inactive
> Tx LPI: 30 (us)
> Supported EEE link modes: 100baseT/Full
> 1000baseT/Full
> Advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
> Link partner advertised EEE link modes: Not reported
> root@bpi-r4:~# ethtool --show-eee lan1
> EEE settings for lan1:
> EEE status: enabled - active
> Tx LPI: 30 (us)
> Supported EEE link modes: 100baseT/Full
> 1000baseT/Full
> Advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
> Link partner advertised EEE link modes: 100baseT/Full
> 1000baseT/Full
>
> So don't fix if it ain't broken maybe...?

I would argue that EEE advertisement on the PHY should be enabled by
default. I guess we're supposed to supply that on the PHY driver. Can you
test with this diff applied and see if it works without manually enabling
EEE using ethtool?

diff --git a/drivers/net/phy/mediatek-ge-soc.c b/drivers/net/phy/mediatek-ge-soc.c
index 0f3a1538a8b8..5f482c12018a 100644
--- a/drivers/net/phy/mediatek-ge-soc.c
+++ b/drivers/net/phy/mediatek-ge-soc.c
@@ -978,6 +978,9 @@ static void mt798x_phy_eee(struct phy_device *phydev)
 		       MTK_PHY_RG_LPI_PCS_DSP_CTRL_REG122,
 		       MTK_PHY_LPI_NORM_MSE_HI_THRESH1000_MASK,
 		       FIELD_PREP(MTK_PHY_LPI_NORM_MSE_HI_THRESH1000_MASK, 0xff));
+
+	phy_write_mmd(phydev, MDIO_MMD_AN, MDIO_AN_EEE_ADV, MDIO_EEE_100TX |
+		      MDIO_EEE_1000T);
 }

 static int cal_sw(struct phy_device *phydev, enum CAL_ITEM cal_item,
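
For reference, the word that this diff would write to the EEE advertisement
register can be worked out by hand. This is only a minimal sketch: it assumes
the bit values 0x0002 (MDIO_EEE_100TX) and 0x0004 (MDIO_EEE_1000T) from the
kernel's include/uapi/linux/mdio.h, and eee_adv_value is a made-up helper
name, not something from the driver.

```shell
# Bit values assumed from include/uapi/linux/mdio.h.
MDIO_EEE_100TX=$(( 0x0002 ))   # advertise EEE for 100BASE-TX
MDIO_EEE_1000T=$(( 0x0004 ))   # advertise EEE for 1000BASE-T

# eee_adv_value: print the combined advertisement word in hex.
# (Hypothetical helper, for illustration only.)
eee_adv_value() {
    printf '0x%04x\n' $(( $MDIO_EEE_100TX | $MDIO_EEE_1000T ))
}

eee_adv_value
```

Both link modes end up advertised in a single register write, which is what
"ethtool --set-eee lan1 eee on" achieved manually in the test above.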

Arınç

2024-03-19 19:38:41

by Andrew Lunn

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

> I would argue that EEE advertisement on the PHY should be enabled by
> default.

That is an open question at the moment. For some use cases, it can add
extra delay and jitter which can cause problems. I've heard people
doing PTP don't like EEE for example.

The current phylib core code leaves the PHY advertisement whatever its
reset default is. So we leave it to the manufacturer to decide if it
should be enabled or disabled by default. It is policy, so it should
really be down to user space to configure EEE how it wants it.

Andrew

2024-03-19 20:05:39

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 19.03.2024 22:38, Andrew Lunn wrote:
>> I would argue that EEE advertisement on the PHY should be enabled by
>> default.
>
> That is an open question at the moment. For some use cases, it can add
> extra delay and jitter which can cause problems. I've heard people
> doing PTP don't like EEE for example.
>
> The current phylib core code leaves the PHY advertisement whatever its
> reset default is. So we leave it to the manufacturer to decide if it
> should be enabled or disabled by default. It is policy, so it should
> really be down to user space to configure EEE how it wants it.

That's fine by me. Then my patch series is okay as it is.

Arınç

2024-03-19 20:27:51

by Daniel Golle

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On Tue, Mar 19, 2024 at 08:38:03PM +0100, Andrew Lunn wrote:
> > I would argue that EEE advertisement on the PHY should be enabled by
> > default.
>
> That is an open question at the moment. For some use cases, it can add
> extra delay and jitter which can cause problems. I've heard people
> doing PTP don't like EEE for example.

MediaTek consumer-grade hardware doesn't support PTP and hence that
quite certainly won't ever be an issue with all switch ICs supported
by the mt7530 driver.

I'd rather first change the (configuration) default in OpenWrt (which
is arguably the way most people are using this hardware), also because
that will be more visible/obvious for users. Or even just make EEE
configurable in the LuCI web-UI as a first step so users start playing
with it.

After all, I also have a hard time imagining that MediaTek disabled
EEE in their downstream driver for no reason:

https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/24091177a18ba7f2dd8d928a8f5b27b14df46b16


>
> The current phylib core code leaves the PHY advertisement whatever its
> reset default is. So we leave it to the manufacturer to decide if it
> should be enabled or disabled by default. It is policy, so it should
> really be down to user space to configure EEE how it wants it.

I very much agree with that policy, changing the default definitely
feels like something which could affect quite a lot of people and
should be done very carefully, if at all.

2024-03-19 21:17:03

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 19.03.2024 23:26, Daniel Golle wrote:
> On Tue, Mar 19, 2024 at 08:38:03PM +0100, Andrew Lunn wrote:
>>> I would argue that EEE advertisement on the PHY should be enabled by
>>> default.
>>
>> That is an open question at the moment. For some use cases, it can add
>> extra delay and jitter which can cause problems. I've heard people
>> doing PTP don't like EEE for example.
>
> MediaTek consumer-grade hardware doesn't support PTP and hence that
> quite certainly won't ever be an issue with all switch ICs supported
> by the mt7530 driver.
>
> I'd rather first change the (configuration) default in OpenWrt (which
> is arguably the way most people are using this hardware), also because
> that will be more visible/obvious for users. Or even just make EEE
> configurable in the LuCI web-UI as a first step so users start playing
> with it.
>
> After all, I also have a hard time imagining that MediaTek disabled
> EEE in their downstream driver for no reason:
>
> https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/24091177a18ba7f2dd8d928a8f5b27b14df46b16

Are you saying this to indicate that we shouldn't remove that from
mediatek-ge? If so, I've already explained that there'd be no practical
change in removing it as both MT7530 and MT7531 switches enable EEE
advertisement after mediatek-ge.

Arınç

2024-03-19 21:31:45

by Florian Fainelli

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 3/19/24 13:26, Daniel Golle wrote:
> On Tue, Mar 19, 2024 at 08:38:03PM +0100, Andrew Lunn wrote:
>>> I would argue that EEE advertisement on the PHY should be enabled by
>>> default.
>>
>> That is an open question at the moment. For some use cases, it can add
>> extra delay and jitter which can cause problems. I've heard people
>> doing PTP don't like EEE for example.
>
> MediaTek consumer-grade hardware doesn't support PTP and hence that
> quite certainly won't ever be an issue with all switch ICs supported
> by the mt7530 driver.
>
> I'd rather first change the (configuration) default in OpenWrt (which
> is arguably the way most people are using this hardware), also because
> that will be more visible/obvious for users. Or even just make EEE
> configurable in the LuCI web-UI as a first step so users start playing
> with it.
>
> After all, I also have a hard time imagining that MediaTek disabled
> EEE in their downstream driver for no reason:
>
> https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/24091177a18ba7f2dd8d928a8f5b27b14df46b16

EEE tends to be an interoperability trap and typically results in
unexplained link drops with different link partners which are difficult
to debug and root cause. It would be great to have more context as to
why it was disabled in the downstream tree to know what we are up
against, though I would not be surprised if there had been a number of
issues reported.

That said, as a user, if someone has a well-controlled environment, they
should absolutely be able to turn on EEE and see how stable it holds in
their environment.
--
Florian


2024-03-20 08:11:04

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 18.03.2024 10:46, Arınç ÜNAL via B4 Relay wrote:
> Hi.
>
> This patch series fixes EEE support for MT7531 and the switch on the MT7988
> SoC. EEE did not work on MT7531 on most boards before this, it is unclear
> what's the status on MT7988 SoC switch as I don't have the hardware.
>
> Signed-off-by: Arınç ÜNAL <[email protected]>

I see the state of this patch series is deferred on patchwork. I see that I
forgot to delegate this to the net tree. As I don't see any objections in
this series, I'll send v2 with it tomorrow.

Arınç

2024-03-20 11:09:12

by Daniel Golle

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On Wed, Mar 20, 2024 at 11:10:19AM +0300, Arınç ÜNAL wrote:
> On 18.03.2024 10:46, Arınç ÜNAL via B4 Relay wrote:
> > Hi.
> >
> > This patch series fixes EEE support for MT7531 and the switch on the MT7988
> > SoC. EEE did not work on MT7531 on most boards before this, it is unclear
> > what's the status on MT7988 SoC switch as I don't have the hardware.
> >
> > Signed-off-by: Arınç ÜNAL <[email protected]>
>
> I see the state of this patch series is deferred on patchwork. I see that I
> forgot to delegate this to the net tree. As I don't see any objections in
> this series, I'll send v2 with it tomorrow.

Sorry, but imho it should go to net-next, and you might have overlooked
it, but there have been some concerns.

For sure it should not go to net tree as you are enabling a new feature
and not fixing anything which is broken. EEE on MT7988 works fine as
of now (see my reply), EEE on MT7531 was supposedly intentionally
disabled for reasons we should ask MTK people about.

2024-03-20 15:04:45

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 20.03.2024 14:08, Daniel Golle wrote:
> On Wed, Mar 20, 2024 at 11:10:19AM +0300, Arınç ÜNAL wrote:
>> On 18.03.2024 10:46, Arınç ÜNAL via B4 Relay wrote:
>>> Hi.
>>>
>>> This patch series fixes EEE support for MT7531 and the switch on the MT7988
>>> SoC. EEE did not work on MT7531 on most boards before this, it is unclear
>>> what's the status on MT7988 SoC switch as I don't have the hardware.
>>>
>>> Signed-off-by: Arınç ÜNAL <[email protected]>
>>
>> I see the state of this patch series is deferred on patchwork. I see that I
>> forgot to delegate this to the net tree. As I don't see any objections in
>> this series, I'll send v2 with it tomorrow.
>
> Sorry, but imho it should go to net-next, and you might have overlooked
> it, but there have been some concerns.

I don't believe I overlooked anything. I've come to this conclusion after
reading every response in this thread.

>
> For sure it should not go to net tree as you are enabling a new feature
> and not fixing anything which is broken. EEE on MT7988 works fine as

EEE support exists since the commit which I've mentioned on my patches
here. I am fixing it. I thought I had explained this clearly on the
patches.

> of now (see my reply), EEE on MT7531 was supposedly intentionally
> disabled for reasons we should ask MTK people about.

Are you talking about the EEE_DIS bit on the trap? There's no default
setting there; it's up to the board vendor to enable or disable EEE by
pulling the corresponding pin low or high. So there's no intentional
disabling by MediaTek there. I see no need to ask the company about this.

This patch series does not in any way enable EEE on the switch PHYs and
MACs when it's disabled by default.

Arınç

2024-03-21 16:19:10

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 20.03.2024 00:31, Florian Fainelli wrote:
> On 3/19/24 13:26, Daniel Golle wrote:
>> On Tue, Mar 19, 2024 at 08:38:03PM +0100, Andrew Lunn wrote:
>>>> I would argue that EEE advertisement on the PHY should be enabled by
>>>> default.
>>>
>>> That is an open question at the moment. For some use cases, it can add
>>> extra delay and jitter which can cause problems. I've heard people
>>> doing PTP don't like EEE for example.
>>
>> MediaTek consumer-grade hardware doesn't support PTP and hence that
>> quite certainly won't ever be an issue with all switch ICs supported
>> by the mt7530 driver.
>>
>> I'd rather first change the (configuration) default in OpenWrt (which
>> is arguably the way most people are using this hardware), also because
>> that will be more visible/obvious for users. Or even just make EEE
>> configurable in the LuCI web-UI as a first step so users start playing
>> with it.
>>
>> After all, I also have a hard time imagining that MediaTek disabled
>> EEE in their downstream driver for no reason:
>>
>> https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/24091177a18ba7f2dd8d928a8f5b27b14df46b16
>
> EEE tends to be an interoperability trap and typically results in unexplained link drops with different link partners which are difficult to debug and root cause. It would be great to have more context as to why it was disabled in the downstream tree to know what we are up against, though I would not be surprised if there had been a number of issues reported.

I have started testing MT7531 with EEE enabled and immediately experienced
frames that wouldn't egress the switch or were improperly received by the
link partner.

SoC MAC       <-EEE off-> MT7531 P6 MAC (acting as PHY)
MT7531 P0 MAC <-EEE on -> MT7531 P0 PHY
MT7531 P0 PHY <-EEE on -> Computer connected with twisted pair

I've tested pinging from the SoC's CPU. Packet capturing on the twisted
pair computer showed very few frames were being received.

# ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2): 56 data bytes
64 bytes from 192.168.2.2: seq=36 ttl=64 time=0.486 ms
^C
--- 192.168.2.2 ping statistics ---
64 packets transmitted, 1 packets received, 98% packet loss
round-trip min/avg/max = 0.486/0.486/0.486 ms

It seems there's less loss when frames are passed more frequently.

# ping 192.168.2.2 -i 0.06
PING 192.168.2.2 (192.168.2.2): 56 data bytes
64 bytes from 192.168.2.2: seq=5 ttl=64 time=0.285 ms
64 bytes from 192.168.2.2: seq=6 ttl=64 time=0.155 ms
64 bytes from 192.168.2.2: seq=7 ttl=64 time=0.243 ms
64 bytes from 192.168.2.2: seq=8 ttl=64 time=0.139 ms
64 bytes from 192.168.2.2: seq=9 ttl=64 time=0.224 ms
64 bytes from 192.168.2.2: seq=68 ttl=64 time=0.350 ms
64 bytes from 192.168.2.2: seq=69 ttl=64 time=0.242 ms
64 bytes from 192.168.2.2: seq=70 ttl=64 time=0.230 ms
64 bytes from 192.168.2.2: seq=71 ttl=64 time=0.242 ms
64 bytes from 192.168.2.2: seq=72 ttl=64 time=0.276 ms
64 bytes from 192.168.2.2: seq=101 ttl=64 time=0.224 ms
64 bytes from 192.168.2.2: seq=102 ttl=64 time=0.238 ms
64 bytes from 192.168.2.2: seq=103 ttl=64 time=0.240 ms
..
--- 192.168.2.2 ping statistics ---
214 packets transmitted, 32 packets received, 85% packet loss
round-trip min/avg/max = 0.099/0.225/0.350 ms

# ping 192.168.2.2 -i 0.05
PING 192.168.2.2 (192.168.2.2): 56 data bytes
64 bytes from 192.168.2.2: seq=1 ttl=64 time=0.277 ms
64 bytes from 192.168.2.2: seq=2 ttl=64 time=0.240 ms
64 bytes from 192.168.2.2: seq=3 ttl=64 time=0.133 ms
64 bytes from 192.168.2.2: seq=4 ttl=64 time=0.233 ms
64 bytes from 192.168.2.2: seq=5 ttl=64 time=0.223 ms
64 bytes from 192.168.2.2: seq=6 ttl=64 time=0.228 ms
64 bytes from 192.168.2.2: seq=7 ttl=64 time=0.236 ms
64 bytes from 192.168.2.2: seq=8 ttl=64 time=0.150 ms
..
--- 192.168.2.2 ping statistics ---
41 packets transmitted, 40 packets received, 2% packet loss
round-trip min/avg/max = 0.112/0.206/0.277 ms
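
As a sanity check on the three ping summaries above, the loss percentages can
be recomputed from the transmitted and received counts. This is a minimal
sketch that assumes ping truncates the percentage to an integer; loss_pct is
a hypothetical helper name.

```shell
# loss_pct TX RX: integer packet-loss percentage, truncated toward zero.
# (Hypothetical helper; assumes TX is non-zero.)
loss_pct() {
    echo $(( ($1 - $2) * 100 / $1 ))
}

loss_pct 64 1     # default 1 s interval
loss_pct 214 32   # ping -i 0.06
loss_pct 41 40    # ping -i 0.05
```

Under that assumption the counts reproduce the reported 98%, 85% and 2%
figures, so the sharp drop in loss between the 60 ms and 50 ms intervals is
in the raw counts and not a rounding artifact.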

>
> That said, as a user, if someone has a well-controlled environment, they should absolutely be able to turn on EEE and see how stable it holds in their environment.

Looks like this is the way to go. I'm planning to submit v2 with patch 1
as:

diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 678b51f9cea6..6aa99b590329 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -2458,6 +2458,20 @@ mt7531_setup(struct dsa_switch *ds)
 	/* Reset the switch through internal reset */
 	mt7530_write(priv, MT7530_SYS_CTRL, SYS_CTRL_SW_RST | SYS_CTRL_REG_RST);

+	/* Allow modifying the trap and enable Energy-Efficient Ethernet (EEE).
+	 */
+	val = mt7530_read(priv, MT7531_HWTRAP);
+	val |= CHG_STRAP;
+	val &= ~EEE_DIS;
+	mt7530_write(priv, MT7530_MHWTRAP, val);
+
+	/* Disable EEE advertisement on the switch PHYs. */
+	for (i = MT753X_CTRL_PHY_ADDR;
+	     i < MT753X_CTRL_PHY_ADDR + MT7530_NUM_PHYS; i++) {
+		mt7531_ind_c45_phy_write(priv, i, MDIO_MMD_AN, MDIO_AN_EEE_ADV,
+					 0);
+	}
+
 	if (!priv->p5_sgmii) {
 		mt7531_pll_setup(priv);
 	} else {
diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
index a71166e0a7fc..509ed5362236 100644
--- a/drivers/net/dsa/mt7530.h
+++ b/drivers/net/dsa/mt7530.h
@@ -457,6 +457,7 @@ enum mt7531_clk_skew {
 #define  XTAL_FSEL_M		BIT(7)
 #define  PHY_EN			BIT(6)
 #define  CHG_STRAP		BIT(8)
+#define  EEE_DIS		BIT(4)

 /* Register for hw trap modification */
 #define MT7530_MHWTRAP		0x7804
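
The read-modify-write on the trap value in the diff above can be tried out
with plain shell arithmetic. This is only a sketch: the bit positions
(CHG_STRAP as bit 8, EEE_DIS as bit 4) are taken from the diff, while
trap_enable_eee and the sample input are hypothetical and do not touch any
hardware.

```shell
CHG_STRAP=$(( 1 << 8 ))   # BIT(8): allow software to override the strap
EEE_DIS=$(( 1 << 4 ))     # BIT(4): EEE disabled by the bootstrap pin

# trap_enable_eee VAL: set CHG_STRAP, clear EEE_DIS, print the result in hex.
# (Hypothetical helper, for illustration only.)
trap_enable_eee() {
    printf '0x%x\n' $(( ($1 | $CHG_STRAP) & ~$EEE_DIS ))
}

# Made-up strap value in which only EEE_DIS is set.
trap_enable_eee 0x10
```

All other bits of the value pass through unchanged, which is why the diff
reads the current trap first rather than writing a constant.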

Arınç

2024-03-21 16:31:49

by Florian Fainelli

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 3/21/24 09:09, Arınç ÜNAL wrote:
> On 20.03.2024 00:31, Florian Fainelli wrote:
>> On 3/19/24 13:26, Daniel Golle wrote:
>>> On Tue, Mar 19, 2024 at 08:38:03PM +0100, Andrew Lunn wrote:
>>>>> I would argue that EEE advertisement on the PHY should be enabled by
>>>>> default.
>>>>
>>>> That is an open question at the moment. For some use cases, it can add
>>>> extra delay and jitter which can cause problems. I've heard people
>>>> doing PTP don't like EEE for example.
>>>
>>> MediaTek consumer-grade hardware doesn't support PTP and hence that
>>> quite certainly won't ever be an issue with all switch ICs supported
>>> by the mt7530 driver.
>>>
>>> I'd rather first change the (configuration) default in OpenWrt (which
>>> is arguably the way most people are using this hardware), also because
>>> that will be more visible/obvious for users. Or even just make EEE
>>> configurable in the LuCI web-UI as a first step so users start playing
>>> with it.
>>>
>>> After all, I also have a hard time imagining that MediaTek disabled
>>> EEE in their downstream driver for no reason:
>>>
>>> https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/24091177a18ba7f2dd8d928a8f5b27b14df46b16
>>
>> EEE tends to be an interoperability trap and typically results in
>> unexplained link drops with different link partners which are
>> difficult to debug and root cause. It would be great to have more
>> context as to why it was disabled in the downstream tree to know what
>> we are up against, though I would not be surprised if there had been a
>> number of issues reported.
>
> I have started testing MT7531 with EEE enabled and immediately experienced
> frames that wouldn't egress the switch or improperly received on the link
> partner.
>
> SoC MAC       <-EEE off-> MT7531 P6 MAC (acting as PHY)
> MT7531 P0 MAC <-EEE on -> MT7531 P0 PHY
> MT7531 P0 PHY <-EEE on -> Computer connected with twisted pair

OK, so this is intended to describe that the SoC's Ethernet MAC link to
the integrated switch did not use EEE, only the user-facing ports did. That
makes sense because it's all digital logic, and you are not going to see
much power saving from having EEE enabled between the SoC's Ethernet MAC
and the CPU port of the switch. That said, I wonder whether this has an
impact on any form of flow control within the switch that reacts to LPI,
so that you need EEE to be enabled end-to-end?

>
> I've tested pinging from the SoC's CPU. Packet capturing on the twisted
> pair computer showed very few frames were being received.
>
> # ping 192.168.2.2
> PING 192.168.2.2 (192.168.2.2): 56 data bytes
> 64 bytes from 192.168.2.2: seq=36 ttl=64 time=0.486 ms
> ^C
> --- 192.168.2.2 ping statistics ---
> 64 packets transmitted, 1 packets received, 98% packet loss
> round-trip min/avg/max = 0.486/0.486/0.486 ms
>
> It seems there's less loss when frames are passed more frequently.

That would point to an issue getting in and out of LPI. Do you see these
packet losses even with different LPI timeouts?

>
> # ping 192.168.2.2 -i 0.06
> PING 192.168.2.2 (192.168.2.2): 56 data bytes
> 64 bytes from 192.168.2.2: seq=5 ttl=64 time=0.285 ms
> 64 bytes from 192.168.2.2: seq=6 ttl=64 time=0.155 ms
> 64 bytes from 192.168.2.2: seq=7 ttl=64 time=0.243 ms
> 64 bytes from 192.168.2.2: seq=8 ttl=64 time=0.139 ms
> 64 bytes from 192.168.2.2: seq=9 ttl=64 time=0.224 ms
> 64 bytes from 192.168.2.2: seq=68 ttl=64 time=0.350 ms
> 64 bytes from 192.168.2.2: seq=69 ttl=64 time=0.242 ms
> 64 bytes from 192.168.2.2: seq=70 ttl=64 time=0.230 ms
> 64 bytes from 192.168.2.2: seq=71 ttl=64 time=0.242 ms
> 64 bytes from 192.168.2.2: seq=72 ttl=64 time=0.276 ms
> 64 bytes from 192.168.2.2: seq=101 ttl=64 time=0.224 ms
> 64 bytes from 192.168.2.2: seq=102 ttl=64 time=0.238 ms
> 64 bytes from 192.168.2.2: seq=103 ttl=64 time=0.240 ms
> ...
> --- 192.168.2.2 ping statistics ---
> 214 packets transmitted, 32 packets received, 85% packet loss
> round-trip min/avg/max = 0.099/0.225/0.350 ms
>
> # ping 192.168.2.2 -i 0.05
> PING 192.168.2.2 (192.168.2.2): 56 data bytes
> 64 bytes from 192.168.2.2: seq=1 ttl=64 time=0.277 ms
> 64 bytes from 192.168.2.2: seq=2 ttl=64 time=0.240 ms
> 64 bytes from 192.168.2.2: seq=3 ttl=64 time=0.133 ms
> 64 bytes from 192.168.2.2: seq=4 ttl=64 time=0.233 ms
> 64 bytes from 192.168.2.2: seq=5 ttl=64 time=0.223 ms
> 64 bytes from 192.168.2.2: seq=6 ttl=64 time=0.228 ms
> 64 bytes from 192.168.2.2: seq=7 ttl=64 time=0.236 ms
> 64 bytes from 192.168.2.2: seq=8 ttl=64 time=0.150 ms
> ...
> --- 192.168.2.2 ping statistics ---
> 41 packets transmitted, 40 packets received, 2% packet loss
> round-trip min/avg/max = 0.112/0.206/0.277 ms
>
>>
>> That said, as a user, if someone has a well-controlled environment,
>> they should absolutely be able to turn on EEE and see how stable it
>> holds in their environment.
>
> Looks like this is the way to go. I'm planning to submit v2 with patch 1
> as:
>
> diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
> index 678b51f9cea6..6aa99b590329 100644
> --- a/drivers/net/dsa/mt7530.c
> +++ b/drivers/net/dsa/mt7530.c
> @@ -2458,6 +2458,20 @@ mt7531_setup(struct dsa_switch *ds)
>      /* Reset the switch through internal reset */
>      mt7530_write(priv, MT7530_SYS_CTRL, SYS_CTRL_SW_RST |
> SYS_CTRL_REG_RST);
>
> +    /* Allow modifying the trap and enable Energy-Efficient Ethernet
> (EEE).
> +     */
> +    val = mt7530_read(priv, MT7531_HWTRAP);
> +    val |= CHG_STRAP;
> +    val &= ~EEE_DIS;
> +    mt7530_write(priv, MT7530_MHWTRAP, val);
> +
> +    /* Disable EEE advertisement on the switch PHYs. */
> +    for (i = MT753X_CTRL_PHY_ADDR;
> +         i < MT753X_CTRL_PHY_ADDR + MT7530_NUM_PHYS; i++) {
> +        mt7531_ind_c45_phy_write(priv, i, MDIO_MMD_AN, MDIO_AN_EEE_ADV,
> +                     0);
> +    }
> +
>      if (!priv->p5_sgmii) {
>          mt7531_pll_setup(priv);
>      } else {
> diff --git a/drivers/net/dsa/mt7530.h b/drivers/net/dsa/mt7530.h
> index a71166e0a7fc..509ed5362236 100644
> --- a/drivers/net/dsa/mt7530.h
> +++ b/drivers/net/dsa/mt7530.h
> @@ -457,6 +457,7 @@ enum mt7531_clk_skew {
>  #define  XTAL_FSEL_M            BIT(7)
>  #define  PHY_EN                BIT(6)
>  #define  CHG_STRAP            BIT(8)
> +#define  EEE_DIS            BIT(4)
>
>  /* Register for hw trap modification */
>  #define MT7530_MHWTRAP            0x7804
>
> Arınç

--
Florian


2024-03-24 09:47:36

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 21/03/2024 18:31, Florian Fainelli wrote:
> On 3/21/24 09:09, Arınç ÜNAL wrote:
>> I have started testing MT7531 with EEE enabled and immediately experienced
>> frames that wouldn't egress the switch or improperly received on the link
>> partner.
>>
>> SoC MAC       <-EEE off-> MT7531 P6 MAC (acting as PHY)
>> MT7531 P0 MAC <-EEE on -> MT7531 P0 PHY
>> MT7531 P0 PHY <-EEE on -> Computer connected with twisted pair
>
> OK, so this is intended to describe that the SoC's Ethernet MAC link to the integrated switch did not use EEE only the user-facing ports. That makes sense because it's all digital logic and you are not going to be seeing much power saving from having EEE enabled between the SoC's Ethernet MAC and CPU port of the switch, that said, however, I wonder if this has an impact on any form of flow control within the switch that is reacting to LPI and you need EEE to be enabled end-to-end?

I've tested pinging between my computers with EEE enabled interfaces. The
behaviour is identical.

>
>>
>> I've tested pinging from the SoC's CPU. Packet capturing on the twisted
>> pair computer showed very few frames were being received.
>>
>> # ping 192.168.2.2
>> PING 192.168.2.2 (192.168.2.2): 56 data bytes
>> 64 bytes from 192.168.2.2: seq=36 ttl=64 time=0.486 ms
>> ^C
>> --- 192.168.2.2 ping statistics ---
>> 64 packets transmitted, 1 packets received, 98% packet loss
>> round-trip min/avg/max = 0.486/0.486/0.486 ms
>>
>> It seems there's less loss when frames are passed more frequently.
>
> That would point to an issue getting in and out of LPI, do you see these packet losses even with different LPI timeouts?

The NICs on my computers don't seem to allow changing the tx-lpi and
tx-timer options.

Computer 1 (Intel I219-V, driver: e1000e):

$ sudo ethtool --set-eee eno1 tx-timer 15
netlink error: Invalid argument

$ sudo ethtool --show-eee eno1
EEE settings for eno1:
        EEE status: enabled - active
        Tx LPI: 17 (us)
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

Computer 2 (Realtek RTL8111H, driver: r8169):

$ sudo ethtool --set-eee eno1 tx-lpi on

$ sudo ethtool --show-eee eno1
EEE settings for eno1:
        EEE status: enabled - active
        Tx LPI: disabled
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  100baseT/Full
                                                 1000baseT/Full

I've tested the switch port interfaces with tx-timer values from 0 to 40,
using the same tx-timer on both interfaces. The loss is still there.

I suppose the MT7531 switch PHYs need calibration for EEE that is currently
missing from the mediatek-ge driver.

Arınç

2024-03-24 11:40:04

by Russell King (Oracle)

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On Sun, Mar 24, 2024 at 12:47:08PM +0300, Arınç ÜNAL wrote:
> On 21/03/2024 18:31, Florian Fainelli wrote:
> > On 3/21/24 09:09, Arınç ÜNAL wrote:
> > > I have started testing MT7531 with EEE enabled and immediately experienced
> > > frames that wouldn't egress the switch or were improperly received on the
> > > link partner.
> > >
> > > SoC MAC       <-EEE off-> MT7531 P6 MAC (acting as PHY)
> > > MT7531 P0 MAC <-EEE on -> MT7531 P0 PHY
> > > MT7531 P0 PHY <-EEE on -> Computer connected with twisted pair
> >
> > OK, so this is intended to describe that the SoC's Ethernet MAC link to the integrated switch did not use EEE, only the user-facing ports did. That makes sense because it's all digital logic and you are not going to see much power saving from having EEE enabled between the SoC's Ethernet MAC and the CPU port of the switch. That said, I wonder if this has an impact on any form of flow control within the switch that is reacting to LPI, so that you need EEE to be enabled end-to-end?
>
> I've tested pinging between my computers with EEE enabled interfaces. The
> behaviour is identical.
>
> >
> > >
> > > I've tested pinging from the SoC's CPU. Packet capturing on the twisted
> > > pair computer showed very few frames were being received.
> > >
> > > # ping 192.168.2.2
> > > PING 192.168.2.2 (192.168.2.2): 56 data bytes
> > > 64 bytes from 192.168.2.2: seq=36 ttl=64 time=0.486 ms
> > > ^C
> > > --- 192.168.2.2 ping statistics ---
> > > 64 packets transmitted, 1 packets received, 98% packet loss
> > > round-trip min/avg/max = 0.486/0.486/0.486 ms
> > >
> > > It seems there's less loss when frames are passed more frequently.
> >
> > That would point to an issue getting in and out of LPI, do you see these packet losses even with different LPI timeouts?
>
> The NICs on my computers don't seem to allow changing the tx-lpi and
> tx-timer options.
>
> Computer 1 (Intel I219-V, driver: e1000e):
>
> $ sudo ethtool --set-eee eno1 tx-timer 15
> netlink error: Invalid argument
>
> $ sudo ethtool --show-eee eno1
> EEE settings for eno1:
>         EEE status: enabled - active
>         Tx LPI: 17 (us)
>         Supported EEE link modes:  100baseT/Full
>                                    1000baseT/Full
>         Advertised EEE link modes:  100baseT/Full
>                                     1000baseT/Full
>         Link partner advertised EEE link modes:  100baseT/Full
>                                                  1000baseT/Full
>
> Computer 2 (Realtek RTL8111H, driver: r8169):
>
> $ sudo ethtool --set-eee eno1 tx-lpi on
>
> $ sudo ethtool --show-eee eno1
> EEE settings for eno1:
>         EEE status: enabled - active
>         Tx LPI: disabled
>         Supported EEE link modes:  100baseT/Full
>                                    1000baseT/Full
>         Advertised EEE link modes:  100baseT/Full
>                                     1000baseT/Full
>         Link partner advertised EEE link modes:  100baseT/Full
>                                                  1000baseT/Full
>
> I've tested the switch port interfaces with tx-timer values from 0 to 40,
> using the same tx-timer on both interfaces. The loss is still there.

EEE implementations tend to be a mess in the way drivers implement the
API, so one can't at the moment rely on what ethtool says about the
status. Sadly, this is what happens when driver authors are left to
their own devices. :(

> I suppose the MT7531 switch PHYs need calibration for EEE that is currently
> missing from the mediatek-ge driver.

EEE is quite simple from the software point of view. There is software
negotiation of the link modes that EEE supports, and then there are
one or more timers that affect the behaviour of EEE. The LPI timer is
"how long the link needs to be idle for before _this_ end signals that
it _can_ enter low power state". The link only enters low power state
when *both* ends of the link signal that they can enter low power
state.

What calibration would be necessary?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-03-24 11:52:01

by Arınç ÜNAL

Subject: Re: [PATCH 0/3] Fix EEE support for MT7531 and MT7988 SoC switch

On 24.03.2024 14:39, Russell King (Oracle) wrote:
> On Sun, Mar 24, 2024 at 12:47:08PM +0300, Arınç ÜNAL wrote:
>> I've tested the switch port interfaces with tx-timer values from 0 to 40,
>> using the same tx-timer on both interfaces. The loss is still there.
>
> EEE implementations tend to be a mess in the way drivers implement the
> API, so one can't at the moment rely on what ethtool says about the
> status. Sadly, this is what happens when driver authors are left to
> their own devices. :(
>
>> I suppose the MT7531 switch PHYs need calibration for EEE that is currently
>> missing from the mediatek-ge driver.
>
> EEE is quite simple from the software point of view. There is software
> negotiation of the link modes that EEE supports, and then there are
> one or more timers that affect the behaviour of EEE. The LPI timer is
> "how long the link needs to be idle for before _this_ end signals that
> it _can_ enter low power state". The link only enters low power state
> when *both* ends of the link signal that they can enter low power
> state.
>
> What calibration would be necessary?

Check out mt798x_phy_eee() in drivers/net/phy/mediatek-ge-soc.c.

Arınç