Since the introduction of mac_managed_pm in the FEC driver there were some
discrepancies regarding power management of the PHY.
This failed on our board that has a permanently powered Microchip LAN8700R
attached to the FEC. Although the root cause of the failure can be traced
back to f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO
with polled IO") and probably even before that, we only started noticing
the problem going from 5.10 to 6.1.
Since 557d5dc83f68 ("net: fec: use mac-managed PHY PM") is actually a fix
to most of the power management sequencing problems that came with power
managing the MDIO bus which for the FEC meant adding a race with FEC
resume (and phy_start() if netif was running) and PHY resume.
That it worked before for us was probably just luck...
Thanks to Wei's response to my report at [1] I was able to pick up his
patch and start honing in on the remaining missing details.
[1]: https://lore.kernel.org/netdev/[email protected]/
v4:
- Adjustments to commit message in patch 1
- Drop patch 2 after discussion in v3.
v3: https://lore.kernel.org/netdev/[email protected]/
- Implement feedback from Wei Fang for patch 2
- Fixes tag in patch 2 dropped, should it be delayed for net-next now?
v2: https://lore.kernel.org/netdev/[email protected]/
- Completely different approach that should be much more correct
(Wei Fang, Jakub Kicinski)
- Re-target to net tree, because I have fixes tags now
v1: https://lore.kernel.org/netdev/[email protected]/
Wei Fang (1):
net: fec: Set mac_managed_pm during probe
drivers/net/ethernet/freescale/fec_main.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
--
2.44.0
From: Wei Fang <[email protected]>
Setting mac_managed_pm during interface up is too late.
In situations where the link is not brought up yet and the system suspends
the regular PHY power management will run. Since the FEC ETHEREN control
bit is cleared (automatically) on suspend the controller is off in resume.
When the regular PHY power management resume path runs in this context it
will write to the MII_DATA register but nothing will be transmitted on the
MDIO bus.
This can be observed by the following log:
fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
The data written will however remain in the MII_DATA register.
When the link later is set to administrative up it will trigger a call to
fec_restart() which will restore the MII_SPEED register. This triggers the
quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt
driven MDIO with polled IO") causing an extra MII_EVENT.
This extra event desynchronizes all the MDIO register reads, causing them
to complete too early. Leading all reads to read as 0 because
fec_enet_mdio_wait() returns too early.
When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
the PHY to be initialized incorrectly and the PHY will not transmit any
ethernet signal in this state. It cannot be brought out of this state
without a power cycle of the PHY.
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Closes: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Wei Fang <[email protected]>
[jernberg: commit message]
Signed-off-by: John Ernberg <[email protected]>
---
v4:
- Adjustments to commit message after a better understanding of the
behavior explained in f166f890c8f0 as a result of the discussion on v3
patch 2.
v3:
- No changes
v2:
- New patch. Big thanks to Wei for the help on this issue.
---
drivers/net/ethernet/freescale/fec_main.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index d7693fdf640d..8bd213da8fb6 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2454,8 +2454,6 @@ static int fec_enet_mii_probe(struct net_device *ndev)
fep->link = 0;
fep->full_duplex = 0;
- phy_dev->mac_managed_pm = true;
-
phy_attached_info(phy_dev);
return 0;
@@ -2467,10 +2465,12 @@ static int fec_enet_mii_init(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
bool suppress_preamble = false;
+ struct phy_device *phydev;
struct device_node *node;
int err = -ENXIO;
u32 mii_speed, holdtime;
u32 bus_freq;
+ int addr;
/*
* The i.MX28 dual fec interfaces are not equal.
@@ -2584,6 +2584,13 @@ static int fec_enet_mii_init(struct platform_device *pdev)
goto err_out_free_mdiobus;
of_node_put(node);
+ /* find all the PHY devices on the bus and set mac_managed_pm to true */
+ for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
+ phydev = mdiobus_get_phy(fep->mii_bus, addr);
+ if (phydev)
+ phydev->mac_managed_pm = true;
+ }
+
mii_cnt++;
/* save fec0 mii_bus */
--
2.44.0
Hello:
This patch was applied to netdev/net.git (main)
by Jakub Kicinski <[email protected]>:
On Thu, 28 Mar 2024 15:59:29 +0000 you wrote:
> Since the introduction of mac_managed_pm in the FEC driver there were some
> discrepancies regarding power management of the PHY.
>
> This failed on our board that has a permanently powered Microchip LAN8700R
> attached to the FEC. Although the root cause of the failure can be traced
> back to f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO
> with polled IO") and probably even before that, we only started noticing
> the problem going from 5.10 to 6.1.
>
> [...]
Here is the summary with links:
- [v4,1/1] net: fec: Set mac_managed_pm during probe
https://git.kernel.org/netdev/net/c/cbc17e7802f5
You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html