2024-02-12 10:56:08

by John Ernberg

Subject: [PATCH net-next] net: fec: Always call fec_restart() in resume path

When trying to resume from suspend the following can be observed:

fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110

This is because the MAC is left powered down after resuming from suspend.

The MAC is brought up in both probe and open, so leaving it off in resume
from suspend is an imbalance.
This imbalance, combined with a LAN8700R that is permanently powered,
results in unusable networking if the board happens to suspend before
the link is brought up, and the only way out of it is a full
power cycle.

NOTE: With this change the PHY ends up taking different resume paths when
the link has never been up compared to once the link has been up. Currently
the resume process is identical and just happens at different times, so
this *should* not have any unforeseen consequences.

Signed-off-by: John Ernberg <[email protected]>
---

Tested on 6.1 kernel and forward ported. I discovered this when we
upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
this imbalance since at least 2009.

This is also why I target the -next tree, I can't identify a proper commit
to blame with a Fixes. Let me know if this should be the net tree anyway.

drivers/net/ethernet/freescale/fec_main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 42bdc01a304e..e6804c068d6b 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -4706,6 +4706,8 @@ static int __maybe_unused fec_resume(struct device *dev)
 		napi_enable(&fep->napi);
 		phy_init_hw(ndev->phydev);
 		phy_start(ndev->phydev);
+	} else {
+		fec_restart(ndev);
 	}
 	rtnl_unlock();

--
2.43.0
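
Editorial sketch, not part of the patch: the imbalance described in the commit message can be modelled in plain userspace C. Everything below is hypothetical (the `mac_model` struct and all `model_*` names are invented for illustration and are not kernel APIs), and it assumes, as the commit message states, that an MDIO read only succeeds while the MAC is powered. The `patched` flag stands in for the new `else` branch in `fec_resume()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the MAC state; not a kernel structure. */
struct mac_model {
	bool powered;	/* MAC, including its MDIO block, is operational */
	bool running;	/* interface is up, i.e. netif_running() would be true */
};

/* Stand-in for fec_restart(): brings the MAC back up. */
static void model_restart(struct mac_model *m)
{
	assert(m != NULL);
	m->powered = true;
}

/* Probe and open both bring the MAC up, as the commit message notes. */
static void model_probe(struct mac_model *m)
{
	m->running = false;
	model_restart(m);
}

static void model_open(struct mac_model *m)
{
	m->running = true;
	model_restart(m);
}

/* Assumption: suspend always leaves the MAC powered down. */
static void model_suspend(struct mac_model *m)
{
	m->powered = false;
}

/* Resume path; 'patched' selects the behaviour added by the else branch. */
static void model_resume(struct mac_model *m, bool patched)
{
	if (m->running)
		model_restart(m);
	else if (patched)
		model_restart(m);
}

/* An MDIO read times out (-ETIMEDOUT, -110 on Linux) if the MAC is down. */
static int model_mdio_read(const struct mac_model *m)
{
	return m->powered ? 0 : -ETIMEDOUT;
}

/* Suspend before the interface was ever opened, then try an MDIO read. */
static int resume_before_open(bool patched)
{
	struct mac_model m;

	model_probe(&m);
	model_suspend(&m);
	model_resume(&m, patched);
	return model_mdio_read(&m);
}

/* Suspend after the interface was opened, then try an MDIO read. */
static int resume_after_open(bool patched)
{
	struct mac_model m;

	model_probe(&m);
	model_open(&m);
	model_suspend(&m);
	model_resume(&m, patched);
	return model_mdio_read(&m);
}
```

In this model only `resume_before_open(false)` returns `-ETIMEDOUT`, which corresponds to the "MDIO read timeout" and "error -110" lines in the log above. The other three combinations return 0. The model says nothing about clocks or runtime PM, which the real driver also has to handle.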


2024-02-14 02:55:14

by Jakub Kicinski

Subject: Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path

On Mon, 12 Feb 2024 10:50:30 +0000 John Ernberg wrote:
> Tested on 6.1 kernel and forward ported. I discovered this when we
> upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
> this imbalance since at least 2009.
>
> This is also why I target the -next tree, I can't identify a proper commit
> to blame with a Fixes. Let me know if this should be the net tree anyway.

I thought you bisected it to one or two specific changes?
I'd put those down as Fixes tags and target net.

2024-02-14 08:27:35

by John Ernberg

Subject: Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path

On 2/14/24 03:44, Jakub Kicinski wrote:
> On Mon, 12 Feb 2024 10:50:30 +0000 John Ernberg wrote:
>> Tested on 6.1 kernel and forward ported. I discovered this when we
>> upgraded from 5.10 to 6.1, but the resume path in the FEC driver has had
>> this imbalance since at least 2009.
>>
>> This is also why I target the -next tree, I can't identify a proper commit
>> to blame with a Fixes. Let me know if this should be the net tree anyway.
>
> I thought you bisected it to one or two specific changes?
> I'd put those down as Fixes tags and target net.

Hi Jakub,

You are correct, we thought so too at [1], but bisection is really hard
because we need a whole bunch of patches on top to even boot the system
(imx8qxp specific stuff in the NXP vendor tree that's difficult to
rebase), we left it a bit open ended.

Over the course of the weekend I lost all confidence in my bisection
after being confident for 4-5 days, because the more I thought about it
the less it made sense for that commit to be the culprit.

I should probably have both followed up on that mail with that, and been
clearer here. I apologize for failing that.

Best regards // John Ernberg

[1]:
https://lore.kernel.org/netdev/[email protected]/

2024-02-14 14:52:34

by Jakub Kicinski

Subject: Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path

On Wed, 14 Feb 2024 08:27:02 +0000 John Ernberg wrote:
> You are correct, we thought so too at [1], but bisection is really hard
> because we need a whole bunch of patches on top to even boot the system
> (imx8qxp specific stuff in the NXP vendor tree that's difficult to
> rebase), we left it a bit open ended.
>
> Over the course of the weekend I lost all confidence in my bisection
> after being confident for 4-5 days, because the more I thought about it
> the less it made sense for that commit to be the culprit.
>
> I should probably have both followed up on that mail with that, and been
> clearer here. I apologize for failing that.

Is it perhaps possible that upstream 5.10 also didn't work?
I'm not saying the change itself is incorrect, indeed there
is fec_restart() on probe and open paths, as you say.
Did you try reverting as many of the changes that happened
in the meantime as possible (instead of bisection)?

The other question is whether we need to enable any of the
clocks or runtime resume before calling fec_restart()?

2024-02-14 15:50:00

by John Ernberg

Subject: Re: [PATCH net-next] net: fec: Always call fec_restart() in resume path

On 2/14/24 15:52, Jakub Kicinski wrote:
> On Wed, 14 Feb 2024 08:27:02 +0000 John Ernberg wrote:
>> You are correct, we thought so too at [1], but bisection is really hard
>> because we need a whole bunch of patches on top to even boot the system
>> (imx8qxp specific stuff in the NXP vendor tree that's difficult to
>> rebase), we left it a bit open ended.
>>
>> Over the course of the weekend I lost all confidence in my bisection
>> after being confident for 4-5 days, because the more I thought about it
>> the less it made sense for that commit to be the culprit.
>>
>> I should probably have both followed up on that mail with that, and been
>> clearer here. I apologize for failing that.
>
> Is it perhaps possible that upstream 5.10 also didn't work?
> I'm not saying the change itself is incorrect, indeed there
> is fec_restart() on probe and open paths, as you say.
> Did you try reverting as many of the changes that happened
> in the meantime as possible (instead of bisection)?
>

That's a really good point. I'll make some time for this in the coming weeks.
Please mark it with changes requested in the meantime, as I expect to
make changes to the patch when I have a result.

> The other question is whether we need to enable any of the
> clocks or runtime resume before calling fec_restart()?

On our board it works fine without it, I don't know enough about this
SoC or other NXP SoCs to know if it's necessary in other situations.

The clocks are re-enabled in the open call which appears to be enough to
get traffic going again when the link is brought up.

Perhaps NXP can fill us in?

Thanks! // John Ernberg