2024-02-29 10:53:38

by John Ernberg

[permalink] [raw]
Subject: [PATCH net v2 0/2] net: fec: Fixes to suspend / resume with mac_managed_pm

Since the introduction of mac_managed_pm in the FEC driver there were some
discrepancies regarding power management of the PHY.

This failed on our board that has a permanently powered Microchip LAN8700R
attached to the FEC. Although the root cause of the failure can be traced
back to f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO
with polled IO") and probably even before that, we only started noticing
the problem going from 5.10 to 6.1.

Since 557d5dc83f68 ("net: fec: use mac-managed PHY PM") is actually a fix
to most of the power management sequencing problems that came with power
managing the MDIO bus which for the FEC meant adding a race with FEC
resume (and phy_start() if netif was running) and PHY resume.

That it worked before for us was probably just luck...

Thanks to Wei's response to my report at [1] I was able to pick up his
patch and start honing in on the remaining missing details.

[1]: https://lore.kernel.org/netdev/[email protected]/

v2:
- Completely different approach that should be much more correct
(Wei Fang, Jakub Kucinski)
- Re-target to net tree, because I have fixes tags now

v1: https://lore.kernel.org/netdev/[email protected]/

John Ernberg (1):
net: fec: Suspend and resume the PHY

Wei Fang (1):
net: fec: Set mac_managed_pm during probe

drivers/net/ethernet/freescale/fec_main.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)

--
2.43.0


2024-02-29 10:54:14

by John Ernberg

[permalink] [raw]
Subject: [PATCH net v2 2/2] net: fec: Suspend and resume the PHY

PHYs that are always-on will not enter their low power modes otherwise as
they have no regulator to be powered off with.

Since the PHY is picked up via {of_,}phy_connect() and dropped with
phy_disconnect() when the link is brought up and down respectively the only
cases were pm is needed is when the netif is running or or when the link
has never been up.

To deal with the latter case the PHY is suspended on discovery in probe,
since it won't be needed until link up.

Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Signed-off-by: John Ernberg <[email protected]>
---

v2: New patch
---
drivers/net/ethernet/freescale/fec_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 8decb1b072c5..c5394a4d8491 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2539,8 +2539,10 @@ static int fec_enet_mii_init(struct platform_device *pdev)
/* find all the PHY devices on the bus and set mac_managed_pm to true */
for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
phydev = mdiobus_get_phy(fep->mii_bus, addr);
- if (phydev)
+ if (phydev) {
phydev->mac_managed_pm = true;
+ phy_suspend(phydev);
+ }
}

mii_cnt++;
@@ -4631,6 +4633,7 @@ static int __maybe_unused fec_suspend(struct device *dev)
if (fep->wol_flag & FEC_WOL_FLAG_ENABLE)
fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON;
phy_stop(ndev->phydev);
+ phy_suspend(ndev->phydev);
napi_disable(&fep->napi);
netif_tx_lock_bh(ndev);
netif_device_detach(ndev);
@@ -4716,6 +4719,7 @@ static int __maybe_unused fec_resume(struct device *dev)
netif_tx_unlock_bh(ndev);
napi_enable(&fep->napi);
phy_init_hw(ndev->phydev);
+ phy_resume(ndev->phydev);
phy_start(ndev->phydev);
}
rtnl_unlock();
--
2.43.0

2024-02-29 11:14:48

by John Ernberg

[permalink] [raw]
Subject: [PATCH net v2 1/2] net: fec: Set mac_managed_pm during probe

From: Wei Fang <[email protected]>

Setting mac_managed_pm during interface up is too late.

In situations where the link is not brought up yet and the system suspends
the regular PHY power management will run. Since the FEC ETHEREN control
bit is cleared (automatically) on suspend the controller is off in resume.
When the regular PHY power management resume path runs in this context it
will write to the MII_DATA register but nothing will be transmitted on the
MDIO bus.

This can be observed by the following log:

fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110

The data written will however remain in the MII_DATA register.

When the link later is brought up it will trigger a call to fec_restart()
which will restore the MII_SPEED register. This triggers the quirk
explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven
MDIO with polled IO") causing a MII_EVENT storm.

This event storm causes all the MDIO register reads to read as 0 because
fec_enet_mdio_wait() returns too early.

When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
the PHY to be initialized incorrectly and the PHY will not transmit any
ethernet signal in this state. It cannot be brought out of this state
without a power cycle of the PHY.

Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Wei Fang <[email protected]>
[jernberg: commit message]
Signed-off-by: John Ernberg <[email protected]>
---

v2: New patch. Big thanks to Wei for the help on this issue.
---
drivers/net/ethernet/freescale/fec_main.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 432523b2c789..8decb1b072c5 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2406,8 +2406,6 @@ static int fec_enet_mii_probe(struct net_device *ndev)
fep->link = 0;
fep->full_duplex = 0;

- phy_dev->mac_managed_pm = true;
-
phy_attached_info(phy_dev);

return 0;
@@ -2419,10 +2417,12 @@ static int fec_enet_mii_init(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
bool suppress_preamble = false;
+ struct phy_device *phydev;
struct device_node *node;
int err = -ENXIO;
u32 mii_speed, holdtime;
u32 bus_freq;
+ int addr;

/*
* The i.MX28 dual fec interfaces are not equal.
@@ -2536,6 +2536,13 @@ static int fec_enet_mii_init(struct platform_device *pdev)
goto err_out_free_mdiobus;
of_node_put(node);

+ /* find all the PHY devices on the bus and set mac_managed_pm to true */
+ for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
+ phydev = mdiobus_get_phy(fep->mii_bus, addr);
+ if (phydev)
+ phydev->mac_managed_pm = true;
+ }
+
mii_cnt++;

/* save fec0 mii_bus */
--
2.43.0

2024-03-01 01:49:52

by Wei Fang

[permalink] [raw]
Subject: RE: [PATCH net v2 2/2] net: fec: Suspend and resume the PHY

> -----Original Message-----
> From: John Ernberg <[email protected]>
> Sent: 2024??2??29?? 18:53
> To: Wei Fang <[email protected]>
> Cc: Shenwei Wang <[email protected]>; Clark Wang
> <[email protected]>; dl-linux-imx <[email protected]>; David S.
> Miller <[email protected]>; Eric Dumazet <[email protected]>;
> Jakub Kicinski <[email protected]>; Paolo Abeni <[email protected]>;
> Heiner Kallweit <[email protected]>; [email protected];
> [email protected]; John Ernberg <[email protected]>
> Subject: [PATCH net v2 2/2] net: fec: Suspend and resume the PHY
>
> PHYs that are always-on will not enter their low power modes otherwise as
> they have no regulator to be powered off with.
>
> Since the PHY is picked up via {of_,}phy_connect() and dropped with
> phy_disconnect() when the link is brought up and down respectively the only
> cases were pm is needed is when the netif is running or or when the link
nit: where

> has never been up.
>
> To deal with the latter case the PHY is suspended on discovery in probe,
> since it won't be needed until link up.
>
> Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
I'm not sure whether this commit should be blamed. After checking my local
code (not the recent upstream code), fec_suspend() will make the PHY enter
suspend mode when calling phy_stop(), the specific logic is fec_suspend() -->
phy_stop() --> phy_state_machine() --> phy_suspend (). But the latest upstream
code may have changed this logic. I'm sorry that I don't have time to sit down
and look at the latest code.

> Signed-off-by: John Ernberg <[email protected]>
> ---
>
> v2: New patch
> ---
> drivers/net/ethernet/freescale/fec_main.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/freescale/fec_main.c
> b/drivers/net/ethernet/freescale/fec_main.c
> index 8decb1b072c5..c5394a4d8491 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -2539,8 +2539,10 @@ static int fec_enet_mii_init(struct
> platform_device *pdev)
> /* find all the PHY devices on the bus and set mac_managed_pm to true
> */
> for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
> phydev = mdiobus_get_phy(fep->mii_bus, addr);
> - if (phydev)
> + if (phydev) {
> phydev->mac_managed_pm = true;
> + phy_suspend(phydev);
> + }
> }
>
> mii_cnt++;
> @@ -4631,6 +4633,7 @@ static int __maybe_unused fec_suspend(struct
> device *dev)
> if (fep->wol_flag & FEC_WOL_FLAG_ENABLE)
> fep->wol_flag |= FEC_WOL_FLAG_SLEEP_ON;
> phy_stop(ndev->phydev);
> + phy_suspend(ndev->phydev);
As I aforementioned, if phy_stop() does not suspend the PHY in the latest
code, is it more general to restore the suspend operation in phy_stop()?

> napi_disable(&fep->napi);
> netif_tx_lock_bh(ndev);
> netif_device_detach(ndev);
> @@ -4716,6 +4719,7 @@ static int __maybe_unused fec_resume(struct
> device *dev)
> netif_tx_unlock_bh(ndev);
> napi_enable(&fep->napi);
> phy_init_hw(ndev->phydev);
> + phy_resume(ndev->phydev);
> phy_start(ndev->phydev);
> }
> rtnl_unlock();
> --
> 2.43.0

2024-03-01 08:10:51

by John Ernberg

[permalink] [raw]
Subject: Re: [PATCH net v2 2/2] net: fec: Suspend and resume the PHY

Hi Wei,

On 3/1/24 02:49, Wei Fang wrote:
>> -----Original Message-----
>> From: John Ernberg <[email protected]>
>> Sent: 2024年2月29日 18:53
>> To: Wei Fang <[email protected]>
>> Cc: Shenwei Wang <[email protected]>; Clark Wang
>> <[email protected]>; dl-linux-imx <[email protected]>; David S.
>> Miller <[email protected]>; Eric Dumazet <[email protected]>;
>> Jakub Kicinski <[email protected]>; Paolo Abeni <[email protected]>;
>> Heiner Kallweit <[email protected]>; [email protected];
>> [email protected]; John Ernberg <[email protected]>
>> Subject: [PATCH net v2 2/2] net: fec: Suspend and resume the PHY
>>
>> PHYs that are always-on will not enter their low power modes otherwise as
>> they have no regulator to be powered off with.
>>
>> Since the PHY is picked up via {of_,}phy_connect() and dropped with
>> phy_disconnect() when the link is brought up and down respectively the only
>> cases were pm is needed is when the netif is running or or when the link
> nit: where
>
>> has never been up.
>>
>> To deal with the latter case the PHY is suspended on discovery in probe,
>> since it won't be needed until link up.
>>
>> Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
> I'm not sure whether this commit should be blamed. After checking my local
> code (not the recent upstream code), fec_suspend() will make the PHY enter
> suspend mode when calling phy_stop(), the specific logic is fec_suspend() -->
> phy_stop() --> phy_state_machine() --> phy_suspend (). But the latest upstream
> code may have changed this logic. I'm sorry that I don't have time to sit down
> and look at the latest code.

I missed this flow, and also didn't see it take place when I look at the
MDIO traffic. But the code has been there since 2013.

I will check why that isn't happening.

Thanks! // John Ernberg