2022-11-30 11:30:10

by Clark Wang

[permalink] [raw]
Subject: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled

Issue we met:
On some platforms, mac cannot work after resumed from the suspend with WoL
enabled.

The cause of the issue:
1. phylink_resolve() is in a workqueue which will not be executed immediately.
This is the call sequence:
phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
For stmmac driver, mac_link_up() will set the correct speed/duplex...
values which are from link_state.
2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
phylink_resume(). stmmac_core_init() is called in function stmmac_hw_setup(),
which will reset the mac and set the speed/duplex... to default value.
Conclusion: Because phylink_resolve() cannot determine when it is called, it
cannot be guaranteed to be called after stmmac_core_init().
Once stmmac_core_init() is called after phylink_resolve(),
the mac will be misconfigured and cannot be used.

In order to solve this problem, the mac_ready flag is added to the phylink
structure to synchronize the state of the mac in the phylink_resolve()
function. To prevent the correct configuration from being overwritten.

By default, mac_ready will be configured as true, that is, it will not affect
other drivers that do not use this flag.

Signed-off-by: Clark Wang <[email protected]>
---
drivers/net/phy/phylink.c | 36 ++++++++++++++++++++++++++++++++++++
include/linux/phylink.h | 2 ++
2 files changed, 38 insertions(+)

diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 09cc65c0da93..312b47fdc12b 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -74,6 +74,7 @@ struct phylink {

bool mac_link_dropped;
bool using_mac_select_pcs;
+ bool mac_ready;

struct sfp_bus *sfp_bus;
bool sfp_may_have_phy;
@@ -1276,6 +1277,12 @@ static void phylink_resolve(struct work_struct *w)
bool retrigger = false;
bool cur_link_state;

+ /* If mac is not ready, retrigger this work queue to wait it ready*/
+ if (!pl->mac_ready) {
+ queue_work(system_power_efficient_wq, &pl->resolve);
+ return;
+ }
+
mutex_lock(&pl->state_mutex);
if (pl->netdev)
cur_link_state = netif_carrier_ok(ndev);
@@ -1450,6 +1457,34 @@ static int phylink_register_sfp(struct phylink *pl,
return ret;
}

+/**
+ * phylink_clear_mac_ready() - clear mac_ready flag
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * set mac_ready to false, which means the mac is not ready,
+ * if the pl->mac_ops->mac_link_up function in resolve is called at this time,
+ * the correct speed/duplex and other parameters set in this function will
+ * be reset to the default values by mac.
+ */
+void phylink_clear_mac_ready(struct phylink *pl)
+{
+ pl->mac_ready = false;
+}
+EXPORT_SYMBOL_GPL(phylink_clear_mac_ready);
+
+/**
+ * phylink_set_mac_ready() - set mac_ready flag
+ * @pl: a pointer to a &struct phylink returned from phylink_create()
+ *
+ * set mac_ready to true, which means the mac is ready now, can reslove the
+ * phylink and set the correct speed/duplex settings for mac.
+ */
+void phylink_set_mac_ready(struct phylink *pl)
+{
+ pl->mac_ready = true;
+}
+EXPORT_SYMBOL_GPL(phylink_set_mac_ready);
+
/**
* phylink_create() - create a phylink instance
* @config: a pointer to the target &struct phylink_config
@@ -1518,6 +1553,7 @@ struct phylink *phylink_create(struct phylink_config *config,
pl->link_config.duplex = DUPLEX_UNKNOWN;
pl->link_config.an_enabled = true;
pl->mac_ops = mac_ops;
+ pl->mac_ready = true;
__set_bit(PHYLINK_DISABLE_STOPPED, &pl->phylink_disable_state);
timer_setup(&pl->link_poll, phylink_fixed_poll, 0);

diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index c492c26202b5..759c8041f3d2 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -570,6 +570,8 @@ void phylink_generic_validate(struct phylink_config *config,
unsigned long *supported,
struct phylink_link_state *state);

+void phylink_clear_mac_ready(struct phylink *pl);
+void phylink_set_mac_ready(struct phylink *pl);
struct phylink *phylink_create(struct phylink_config *, struct fwnode_handle *,
phy_interface_t iface,
const struct phylink_mac_ops *mac_ops);
--
2.34.1


2022-11-30 11:36:33

by Clark Wang

[permalink] [raw]
Subject: RE: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled

Hi Russell,

> -----Original Message-----
> From: Russell King <[email protected]>
> Sent: 2022??11??30?? 19:24
> To: Clark Wang <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume
> issue with WoL enabled
>
> On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> > Issue we met:
> > On some platforms, mac cannot work after resumed from the suspend with
> > WoL enabled.
> >
> > The cause of the issue:
> > 1. phylink_resolve() is in a workqueue which will not be executed immediately.
> > This is the call sequence:
> > phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > For stmmac driver, mac_link_up() will set the correct speed/duplex...
> > values which are from link_state.
> > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > phylink_resume(). stmmac_core_init() is called in function
> > stmmac_hw_setup(),
>
> ... and that is where the problem is. Don't call phylink_resume() before your
> hardware is ready to see a link-up event.

Thank you very much for your reply!

You are right.

However, stmmac requires RXC to have a clock input when performing a reset(in stmmac_hw_setup()). On our board, RXC is provided by the phy.

In WoL mode, this is not a problem, because the phy will not be down when suspend. RXC will keep output. But in normal suspend(without WoL), the phy will be down, which does not guarantee the output of the RXC of the phy. Therefore, the previous code will call phylink_resume() before stmmac_hw_setup().

Thanks again!
Clark Wang

>
> --
> RMK's Patch system:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ar
> mlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=05%7C01%7Cxiaoning.
> wang%40nxp.com%7C5b2cf0060616410813ca08dad2c55a11%7C686ea1d3bc2
> b4c6fa92cd99c5c301635%7C0%7C0%7C638054042306464657%7CUnknown%
> 7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwi
> LCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=CPhQpnvoV95ch%2FUc
> M4Rb2HvY0r24I0FsVSKM9hO13FI%3D&amp;reserved=0
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-11-30 11:47:35

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled

On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> Issue we met:
> On some platforms, mac cannot work after resumed from the suspend with WoL
> enabled.
>
> The cause of the issue:
> 1. phylink_resolve() is in a workqueue which will not be executed immediately.
> This is the call sequence:
> phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> For stmmac driver, mac_link_up() will set the correct speed/duplex...
> values which are from link_state.
> 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> phylink_resume(). stmmac_core_init() is called in function stmmac_hw_setup(),

... and that is where the problem is. Don't call phylink_resume() before
your hardware is ready to see a link-up event.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-11-30 12:01:46

by Clark Wang

[permalink] [raw]
Subject: RE: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled


> -----Original Message-----
> From: Russell King <[email protected]>
> Sent: 2022??11??30?? 19:33
> To: Clark Wang <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume
> issue with WoL enabled
>
> On Wed, Nov 30, 2022 at 11:23:42AM +0000, Russell King (Oracle) wrote:
> > On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> > > Issue we met:
> > > On some platforms, mac cannot work after resumed from the suspend
> > > with WoL enabled.
> > >
> > > The cause of the issue:
> > > 1. phylink_resolve() is in a workqueue which will not be executed
> immediately.
> > > This is the call sequence:
> > > phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > > For stmmac driver, mac_link_up() will set the correct speed/duplex...
> > > values which are from link_state.
> > > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > > phylink_resume(). stmmac_core_init() is called in function
> > > stmmac_hw_setup(),
> >
> > ... and that is where the problem is. Don't call phylink_resume()
> > before your hardware is ready to see a link-up event.
>
> ... and while that is being fixed, maybe the stupid code in
> stmmac_resume() can also be fixed:
>
> rtnl_lock();
> if (device_may_wakeup(priv->device) && priv->plat->pmt) {
> phylink_resume(priv->phylink);
> } else {
> phylink_resume(priv->phylink);
> if (device_may_wakeup(priv->device))
> phylink_speed_up(priv->phylink);
> }
> rtnl_unlock();
>
> rtnl_lock();
>
> 1. phylink_resume() is always called after that first rtnl_lock(), so there's no
> point it being stupidly in each side of the if().
>
> 2. the rtnl_unlock() followed by rtnl_lock() is completely unnecessary.
>
> Thanks.
>
> --
> RMK's Patch system:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ar
> mlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=05%7C01%7Cxiaoning.
> wang%40nxp.com%7Cc0961fa17589464a742308dad2c69502%7C686ea1d3bc2
> b4c6fa92cd99c5c301635%7C0%7C0%7C638054047593066949%7CUnknown%
> 7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwi
> LCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=wURkTxhK2IlqktOKmJzx
> Nr8E4KCzd1gjDgHK3iSy6EA%3D&amp;reserved=0
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-11-30 12:19:56

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled

On Wed, Nov 30, 2022 at 11:23:42AM +0000, Russell King (Oracle) wrote:
> On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> > Issue we met:
> > On some platforms, mac cannot work after resumed from the suspend with WoL
> > enabled.
> >
> > The cause of the issue:
> > 1. phylink_resolve() is in a workqueue which will not be executed immediately.
> > This is the call sequence:
> > phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > For stmmac driver, mac_link_up() will set the correct speed/duplex...
> > values which are from link_state.
> > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > phylink_resume(). stmmac_core_init() is called in function stmmac_hw_setup(),
>
> ... and that is where the problem is. Don't call phylink_resume() before
> your hardware is ready to see a link-up event.

... and while that is being fixed, maybe the stupid code in
stmmac_resume() can also be fixed:

rtnl_lock();
if (device_may_wakeup(priv->device) && priv->plat->pmt) {
phylink_resume(priv->phylink);
} else {
phylink_resume(priv->phylink);
if (device_may_wakeup(priv->device))
phylink_speed_up(priv->phylink);
}
rtnl_unlock();

rtnl_lock();

1. phylink_resume() is always called after that first rtnl_lock(), so
there's no point it being stupidly in each side of the if().

2. the rtnl_unlock() followed by rtnl_lock() is completely unnecessary.

Thanks.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-11-30 12:36:52

by Clark Wang

[permalink] [raw]
Subject: RE: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled


> -----Original Message-----
> From: Russell King <[email protected]>
> Sent: 2022??11??30?? 19:51
> To: Clark Wang <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume
> issue with WoL enabled
>
> On Wed, Nov 30, 2022 at 11:32:09AM +0000, Clark Wang wrote:
> > Hi Russell,
> >
> > > -----Original Message-----
> > > From: Russell King <[email protected]>
> > > Sent: 2022??11??30?? 19:24
> > > To: Clark Wang <[email protected]>
> > > Cc: [email protected]; [email protected];
> > > [email protected]; [email protected];
> [email protected];
> > > [email protected]; [email protected]; [email protected];
> > > [email protected]; [email protected]; [email protected];
> > > [email protected];
> > > [email protected]; [email protected]
> > > Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix
> resume
> > > issue with WoL enabled
> > >
> > > On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> > > > Issue we met:
> > > > On some platforms, mac cannot work after resumed from the suspend
> with
> > > > WoL enabled.
> > > >
> > > > The cause of the issue:
> > > > 1. phylink_resolve() is in a workqueue which will not be executed
> immediately.
> > > > This is the call sequence:
> > > >
> phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > > > For stmmac driver, mac_link_up() will set the correct
> speed/duplex...
> > > > values which are from link_state.
> > > > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > > > phylink_resume(). stmmac_core_init() is called in function
> > > > stmmac_hw_setup(),
> > >
> > > ... and that is where the problem is. Don't call phylink_resume() before
> your
> > > hardware is ready to see a link-up event.
> >
> > Thank you very much for your reply!
> >
> > You are right.
> >
> > However, stmmac requires RXC to have a clock input when performing a
> reset(in stmmac_hw_setup()). On our board, RXC is provided by the phy.
> >
> > In WoL mode, this is not a problem, because the phy will not be down
> when suspend. RXC will keep output. But in normal suspend(without WoL),
> the phy will be down, which does not guarantee the output of the RXC of the
> phy. Therefore, the previous code will call phylink_resume() before
> stmmac_hw_setup().
>
> I think we need phylink_phy_resume() which stmmac can use to resume the
> PHY without resuming phylink, assuming that will output the RXC. Which
> PHY driver(s) are used with stmmac?

Yes, that will be a better way!
For now, we use AR8031 in drivers/net/phy/at803x.c and
RTL8211F/RTL8211F-VD in drivers/net/phy/realtek.c with stmmac.


Best Regards,
Clark Wang

>
> --
> RMK's Patch system:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww
> .armlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=05%7C01%7Cxia
> oning.wang%40nxp.com%7C9b3a5389973b48963ee708dad2c924a3%7C686
> ea1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C638054058591085448%7C
> Unknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJB
> TiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=0qb%2F
> NpZUeI42SGekT488Lob9aQ6d2EB4xV944330QkI%3D&amp;reserved=0
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-11-30 12:39:12

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled

On Wed, Nov 30, 2022 at 11:32:09AM +0000, Clark Wang wrote:
> Hi Russell,
>
> > -----Original Message-----
> > From: Russell King <[email protected]>
> > Sent: 2022年11月30日 19:24
> > To: Clark Wang <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected];
> > [email protected]; [email protected]
> > Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume
> > issue with WoL enabled
> >
> > On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> > > Issue we met:
> > > On some platforms, mac cannot work after resumed from the suspend with
> > > WoL enabled.
> > >
> > > The cause of the issue:
> > > 1. phylink_resolve() is in a workqueue which will not be executed immediately.
> > > This is the call sequence:
> > > phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > > For stmmac driver, mac_link_up() will set the correct speed/duplex...
> > > values which are from link_state.
> > > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > > phylink_resume(). stmmac_core_init() is called in function
> > > stmmac_hw_setup(),
> >
> > ... and that is where the problem is. Don't call phylink_resume() before your
> > hardware is ready to see a link-up event.
>
> Thank you very much for your reply!
>
> You are right.
>
> However, stmmac requires RXC to have a clock input when performing a reset(in stmmac_hw_setup()). On our board, RXC is provided by the phy.
>
> In WoL mode, this is not a problem, because the phy will not be down when suspend. RXC will keep output. But in normal suspend(without WoL), the phy will be down, which does not guarantee the output of the RXC of the phy. Therefore, the previous code will call phylink_resume() before stmmac_hw_setup().

I think we need phylink_phy_resume() which stmmac can use to resume the
PHY without resuming phylink, assuming that will output the RXC. Which
PHY driver(s) are used with stmmac?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2022-11-30 12:42:16

by Clark Wang

[permalink] [raw]
Subject: RE: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume issue with WoL enabled


> -----Original Message-----
> From: Russell King <[email protected]>
> Sent: 2022??11??30?? 19:33
> To: Clark Wang <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected];
> [email protected]; [email protected]
> Subject: Re: [PATCH 1/2] net: phylink: add sync flag mac_ready to fix resume
> issue with WoL enabled
>
> On Wed, Nov 30, 2022 at 11:23:42AM +0000, Russell King (Oracle) wrote:
> > On Wed, Nov 30, 2022 at 07:11:47PM +0800, Clark Wang wrote:
> > > Issue we met:
> > > On some platforms, mac cannot work after resumed from the suspend
> > > with WoL enabled.
> > >
> > > The cause of the issue:
> > > 1. phylink_resolve() is in a workqueue which will not be executed
> immediately.
> > > This is the call sequence:
> > > phylink_resolve()->phylink_link_up()->pl->mac_ops->mac_link_up()
> > > For stmmac driver, mac_link_up() will set the correct speed/duplex...
> > > values which are from link_state.
> > > 2. In stmmac_resume(), it will call stmmac_hw_setup() after called the
> > > phylink_resume(). stmmac_core_init() is called in function
> > > stmmac_hw_setup(),
> >
> > ... and that is where the problem is. Don't call phylink_resume()
> > before your hardware is ready to see a link-up event.
>
> ... and while that is being fixed, maybe the stupid code in
> stmmac_resume() can also be fixed:
>
> rtnl_lock();
> if (device_may_wakeup(priv->device) && priv->plat->pmt) {
> phylink_resume(priv->phylink);
> } else {
> phylink_resume(priv->phylink);
> if (device_may_wakeup(priv->device))
> phylink_speed_up(priv->phylink);
> }
> rtnl_unlock();
>
> rtnl_lock();
>
> 1. phylink_resume() is always called after that first rtnl_lock(), so there's no
> point it being stupidly in each side of the if().
>
> 2. the rtnl_unlock() followed by rtnl_lock() is completely unnecessary.
>
> Thanks.
>

Thanks for your reminder, I will send another patch to clean this.
Sorry, the last empty email was sent by mistake.

Best Regards,
Clark Wang

> --
> RMK's Patch system:
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.ar
> mlinux.org.uk%2Fdeveloper%2Fpatches%2F&amp;data=05%7C01%7Cxiaoning.
> wang%40nxp.com%7Cc0961fa17589464a742308dad2c69502%7C686ea1d3bc2
> b4c6fa92cd99c5c301635%7C0%7C0%7C638054047593066949%7CUnknown%
> 7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwi
> LCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=wURkTxhK2IlqktOKmJzx
> Nr8E4KCzd1gjDgHK3iSy6EA%3D&amp;reserved=0
> FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!