LinuxLists.cc - racing ndo_open()/phylink*connect() with phy

2024-04-30 21:02:43

Subject: racing ndo_open()/phylink*connect() with phy_probe()

Hi,

I've been taking a look at the following error message:

qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)

end1 is using a phy on the mdio bus of end0, not on its own bus. Something
like this devicetree snippet highlights the relationship:

// end0
&ethernet0 {
phy-mode = "sgmii";
phy-handle = <&sgmii_phy0>;

mdio {
compatible = "snps,dwmac-mdio";
sgmii_phy0: phy@8 {
compatible = "ethernet-phy-id0141.0dd4";
reg = <0x8>;
device_type = "ethernet-phy";
};

sgmii_phy1: phy@a {
compatible = "ethernet-phy-id0141.0dd4";
reg = <0xa>;
device_type = "ethernet-phy";
};
};
};

// end1
&ethernet1 {
phy-mode = "sgmii";
phy-handle = <&sgmii_phy1>;
};

Basically, NetworkManager is setting both interfaces to up, and end1's
phy doesn't seem to be ready when ndo_open() runs, returning
-ENODEV in phylink_fwnode_phy_connect() and bubbling that back up. This doesn't
happen very often, but by shoving things into the initramfs or anything to
speed up probe/ndo_open() it's easier to reproduce. Delaying probe()
of end0 and then setting end1 up is another easy way to reproduce.

My question after looking around for a while, is what is the expectation
of userspace in this situation?

NetworkManager retries 4 times right now (tunable via autoconnect-retries setting),
and if you're lucky that's good enough. You could tell it to retry infinitely,
that should get me out of my bind at least, but it's not an amazing solution.

I'm used to dealing with deferral issues in probe() callbacks, but
to me it seems that phylink and netdev sort of move the phy part of the problem
till later (so you don't get the kernel retrying for you, etc, like you do
with deferred probes) when you ndo_open(). I guess the logic there is that
the phys could be hot pluggable, so with phylink we delay getting at them
until ndo_open()? I also guess if the mac is going to create an mdio bus
with phys off of it that also gets dicey as the phy could -EPROBE_DEFER
when you're trying to probe the mac and confirm its phy is ready..

Is retrying the correct solution here from userspace, or am I missing something?
I thought the "does not support carrier detection" below might be my ticket out
of this, but I can still reproduce it even after patching[0] to make that work.
I guess NetworkManager still brings the device up, but doesn't "activate" the
connection until the carrier detect stuff is done.

[0] https://lore.kernel.org/netdev/[email protected]/

Thanks,
Andrew

Some logs to illustrate the issue with a little more context (without
the patch in [0], in the end that doesn't help):

[ 1.541839] fedora kernel: qcom-ethqos 23000000.ethernet end1: renamed from eth0
[ 1.545750] fedora NetworkManager[407]: <info> [1709251201.1647] device (eth0): driver '(null)' does not support carrier detection.
[ 1.545826] fedora NetworkManager[407]: <info> [1709251201.1657] device (eth0): driver 'unknown' does not support carrier detection.
[ 1.545854] fedora NetworkManager[407]: <info> [1709251201.1660] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
[ 1.545880] fedora NetworkManager[407]: <info> [1709251201.1678] device (eth0): interface index 2 renamed iface from 'eth0' to 'end1'
[ 1.551721] fedora NetworkManager[407]: <info> [1709251201.1794] device (end1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
[ 1.554944] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 1.555962] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-1
[ 1.557711] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-2
[ 1.558721] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-3
[ 1.560297] fedora kernel: qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
[ 1.573664] fedora NetworkManager[407]: <info> [1709251201.2013] device (end1): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
[ 1.574344] fedora NetworkManager[407]: <info> [1709251201.2020] policy: auto-activating connection 'Wired Connection' (bfe920e8-6031-4129-bf5c-78198427076a)
[ 1.574733] fedora NetworkManager[407]: <info> [1709251201.2024] device (end1): Activation: starting connection 'Wired Connection' (bfe920e8-6031-4129-bf5c-78198427076a)
[ 1.578589] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 1.579351] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-1
[ 1.580102] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-2
[ 1.578337] fedora NetworkManager[407]: <info> [1709251201.2027] device (end1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
[ 1.578404] fedora NetworkManager[407]: <info> [1709251201.2030] manager: NetworkManager state is now CONNECTING
[ 1.578431] fedora NetworkManager[407]: <info> [1709251201.2033] device (end1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
[ 1.580965] fedora kernel: qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-3
[ 1.582390] fedora kernel: qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
[ 1.593993] fedora NetworkManager[407]: <info> [1709251201.2217] device (end1): state change: config -> failed (reason 'config-failed', sys-iface-state: 'managed')
[ 1.648395] fedora NetworkManager[407]: <info> [1709251201.2220] manager: NetworkManager state is now DISCONNECTED
[ 1.648634] fedora NetworkManager[407]: <warn> [1709251201.2222] device (end1): Activation: failed for connection 'Wired Connection'
[ 1.648926] fedora NetworkManager[407]: <info> [1709251201.2224] device (end1): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
[ 1.649179] fedora NetworkManager[407]: <info> [1709251201.2233] manager: startup complete
[ 1.834016] fedora NetworkManager[407]: <info> [1709251201.4617] device (eth0): driver '(null)' does not support carrier detection.
[ 1.836553] fedora NetworkManager[407]: <info> [1709251201.4624] device (eth0): driver 'unknown' does not support carrier detection.
[ 1.836628] fedora NetworkManager[407]: <info> [1709251201.4627] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
[ 1.896859] fedora kernel: qcom-ethqos 23040000.ethernet end0: renamed from eth0
[ 1.902243] fedora NetworkManager[407]: <info> [1709251201.5299] device (eth0): interface index 3 renamed iface from 'eth0' to 'end0'
[ 1.911400] fedora kernel: qcom-ethqos 23040000.ethernet end0: Register MEM_TYPE_PAGE_POOL RxQ-0
[ 1.909632] fedora NetworkManager[407]: <info> [1709251201.5357] device (end0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')

2024-04-30 21:43:22

by Russell King (Oracle)

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

On Tue, Apr 30, 2024 at 04:02:19PM -0500, Andrew Halaney wrote:
> Basically, NetworkManager is setting both interfaces to up, and end1's
> phy doesn't seem to be ready when ndo_open() runs, returning
> -ENODEV in phylink_fwnode_phy_connect() and bubbling that back up. This doesn't

Let's get something clear - you're attributing phylink to this, but this
is not the case. phylink doesn't deal directly with PHYs, it makes use
of phylib for that, and merely passes back to its caller whatever status
it gets from phylib. It's also not fair to attribute this to phylib as
we will see later...

There are a few reasons for phylink_fwnode_phy_connect() would return
-ENODEV:

1) fwnode_get_phy_node() (a phylib function) returning an error,
basically meaning the phy node isn't found. This would be a persistent
error, so unlikely to be your issue.

2) fwnode_phy_find_device() (another phylib function) not finding the
PHY device corresponding to the fwnode returned by the above on the
MDIO bus. This is possible if the PHY has not been detected on the
MDIO bus, but I suspect this is not the cause of your issue.

3) phy_attach_direct() (another phylib function) returning an error.
This function calls phy_init_hw() which will attempt to talk to the
hardware, and if that returns an error, it will be propagated up.

(3) is the most likely scenario given your quoted DT description. I
suspect that the stmmac/qcom-ethqos driver is what's at fault here.

Your DT description shows that the PHYs are on one MDIO bus owned by
one of the network interfaces. I suspect if that network interface
is down, then the MDIO bus is not accessible.

Therefore, when you attempt to open the _other_ network interface,
accesses to its PHY fail with -ENODEV and that gets propagated all
the way back up.

What's more is if you did manage to get that network interface up
(because the one with the MDIO bus on was up) then if you take
that interface down, you'll end up with a phy_error() splat from
phylib because the PHY it was using has become inaccessible.

Basically, the network driver is buggy for this PHY setup. If a
MDIO bus contains devices that are not owned by the network device
owning that MDIO bus, then the MDIO bus _must_ be prepared to handle
MDIO bus accesses _at_ _any_ _time_. This clearly is not the case
here.

It could also be the case that if the driver is using runtime PM,
that when the network interface is runtime-PM suspended, it causes
MDIO bus accesses to fail... that would be very chaotic though.

In any case, I'm going to say... I don't think this is a phylink nor
phylib issue, but a buggy network driver thinking that it has the
right to shutdown MDIO bus access depending on its network interface
state.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-05-01 15:08:37

by Andrew Lunn

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

On Tue, Apr 30, 2024 at 04:02:19PM -0500, Andrew Halaney wrote:
> Hi,
>
> I've been taking a look at the following error message:
>
> qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
>
> end1 is using a phy on the mdio bus of end0, not on its own bus. Something
> like this devicetree snippet highlights the relationship:
>
> // end0
> &ethernet0 {
> phy-mode = "sgmii";
> phy-handle = <&sgmii_phy0>;
>
> mdio {
> compatible = "snps,dwmac-mdio";
> sgmii_phy0: phy@8 {
> compatible = "ethernet-phy-id0141.0dd4";
> reg = <0x8>;
> device_type = "ethernet-phy";
> };
>
> sgmii_phy1: phy@a {
> compatible = "ethernet-phy-id0141.0dd4";
> reg = <0xa>;
> device_type = "ethernet-phy";
> };
> };
> };
>
> // end1
> &ethernet1 {
> phy-mode = "sgmii";
> phy-handle = <&sgmii_phy1>;
> };

I agree with Russell here. You need to determine where the ENODEV
comes from.

It seems unlikely, but one possibility is:

static int stmmac_xgmac2_mdio_read_c22(struct mii_bus *bus, int phyaddr,
int phyreg)
{
struct net_device *ndev = bus->priv;
struct stmmac_priv *priv;
u32 addr;

priv = netdev_priv(ndev);

/* Until ver 2.20 XGMAC does not support C22 addr >= 4 */
if (priv->synopsys_id < DWXGMAC_CORE_2_20 &&
phyaddr > MII_XGMAC_MAX_C22ADDR)
return -ENODEV;

Maybe if the interface is down, priv->synopsys_id has not been set
yet? Your devices are at address 8 and 10, so if priv->synopsys_id
where 0, this would give you the ENODEV.

Once you know the root cause, we might be able to make suggests how to
fix it.

Andrew

2024-05-01 15:24:25

by Russell King (Oracle)

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

On Wed, May 01, 2024 at 05:07:44PM +0200, Andrew Lunn wrote:
> On Tue, Apr 30, 2024 at 04:02:19PM -0500, Andrew Halaney wrote:
> > Hi,
> >
> > I've been taking a look at the following error message:
> >
> > qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
> >
> > end1 is using a phy on the mdio bus of end0, not on its own bus. Something
> > like this devicetree snippet highlights the relationship:
> >
> > // end0
> > &ethernet0 {
> > phy-mode = "sgmii";
> > phy-handle = <&sgmii_phy0>;
> >
> > mdio {
> > compatible = "snps,dwmac-mdio";
> > sgmii_phy0: phy@8 {
> > compatible = "ethernet-phy-id0141.0dd4";
> > reg = <0x8>;
> > device_type = "ethernet-phy";
> > };
> >
> > sgmii_phy1: phy@a {
> > compatible = "ethernet-phy-id0141.0dd4";
> > reg = <0xa>;
> > device_type = "ethernet-phy";
> > };
> > };
> > };
> >
> > // end1
> > &ethernet1 {
> > phy-mode = "sgmii";
> > phy-handle = <&sgmii_phy1>;
> > };
>
> I agree with Russell here. You need to determine where the ENODEV
> comes from.
>
> It seems unlikely, but one possibility is:
>
> static int stmmac_xgmac2_mdio_read_c22(struct mii_bus *bus, int phyaddr,
> int phyreg)
> {
> struct net_device *ndev = bus->priv;
> struct stmmac_priv *priv;
> u32 addr;
>
> priv = netdev_priv(ndev);
>
> /* Until ver 2.20 XGMAC does not support C22 addr >= 4 */
> if (priv->synopsys_id < DWXGMAC_CORE_2_20 &&
> phyaddr > MII_XGMAC_MAX_C22ADDR)
> return -ENODEV;
>
> Maybe if the interface is down, priv->synopsys_id has not been set
> yet? Your devices are at address 8 and 10, so if priv->synopsys_id
> where 0, this would give you the ENODEV.

Yes, I did look at that, but didn't read the DT snippet to realise
that the addresses would be trapped by that. So, looking at where
priv->synopsys_id is set... is in stmmac_hwif_init(), which is
called from stmmac_hw_init(), which in turn is called from
stmmac_dvr_probe(). This is the only path that leads here.

This all happens before the MDIO bus is registered, so the value of
priv->synopsys_id should be set by the time the MDIO bus is registered.

So I'm unconvinced... but I think Andrew Halaney needs to delve
deeper into debugging what is going on when the failure occurs.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

2024-05-01 15:32:32

by Andrew Lunn

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

> > It seems unlikely, but one possibility is:
> >
> > static int stmmac_xgmac2_mdio_read_c22(struct mii_bus *bus, int phyaddr,
> > int phyreg)
> > {
> > struct net_device *ndev = bus->priv;
> > struct stmmac_priv *priv;
> > u32 addr;
> >
> > priv = netdev_priv(ndev);
> >
> > /* Until ver 2.20 XGMAC does not support C22 addr >= 4 */
> > if (priv->synopsys_id < DWXGMAC_CORE_2_20 &&
> > phyaddr > MII_XGMAC_MAX_C22ADDR)
> > return -ENODEV;
> >
> > Maybe if the interface is down, priv->synopsys_id has not been set
> > yet? Your devices are at address 8 and 10, so if priv->synopsys_id
> > where 0, this would give you the ENODEV.
>
> Yes, I did look at that, but didn't read the DT snippet to realise
> that the addresses would be trapped by that. So, looking at where
> priv->synopsys_id is set... is in stmmac_hwif_init(), which is
> called from stmmac_hw_init(), which in turn is called from
> stmmac_dvr_probe(). This is the only path that leads here.
>
> This all happens before the MDIO bus is registered, so the value of
> priv->synopsys_id should be set by the time the MDIO bus is registered.
>
> So I'm unconvinced...

Me too. I just wanted to give an example of why it is important to
track down the source of the ENODEV.

Andrew

2024-05-02 17:46:10

by Andrew Halaney

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

On Tue, Apr 30, 2024 at 10:42:58PM +0100, Russell King (Oracle) wrote:
> On Tue, Apr 30, 2024 at 04:02:19PM -0500, Andrew Halaney wrote:
> > Basically, NetworkManager is setting both interfaces to up, and end1's
> > phy doesn't seem to be ready when ndo_open() runs, returning
> > -ENODEV in phylink_fwnode_phy_connect() and bubbling that back up. This doesn't
>
> Let's get something clear - you're attributing phylink to this, but this
> is not the case. phylink doesn't deal directly with PHYs, it makes use
> of phylib for that, and merely passes back to its caller whatever status
> it gets from phylib. It's also not fair to attribute this to phylib as
> we will see later...

Sorry for the delay, I wanted to try and test with some extra logs in
the legit setup (not my "simulate via EPROBE_DEFER delays" approach)
which is tedious with the initramfs (plus I wasted time failing to
ftrace some stuff :P) to reconvince me of old notes. Thanks for the
explanation above on the nuances between phylink and phylib, I really
appreciate it.

>
> There are a few reasons for phylink_fwnode_phy_connect() would return
> -ENODEV:
>
> 1) fwnode_get_phy_node() (a phylib function) returning an error,
> basically meaning the phy node isn't found. This would be a persistent
> error, so unlikely to be your issue.
>
> 2) fwnode_phy_find_device() (another phylib function) not finding the
> PHY device corresponding to the fwnode returned by the above on the
> MDIO bus. This is possible if the PHY has not been detected on the
> MDIO bus, but I suspect this is not the cause of your issue.

So I think we're in this case. I added some extra logs to see which
of the cases we were hitting, as well as some extra logs in phy creation
code etc to come to that conclusion:

# end1 probe start (and finish)
[ 1.424099] qcom-ethqos 23000000.ethernet: Adding to iommu group 2
...
[ 1.431267] qcom-ethqos 23000000.ethernet: Using 40/40 bits DMA host/device width

# end0 probe start
[ 1.440517] qcom-ethqos 23040000.ethernet: Adding to iommu group 3
...
[ 1.443502] qcom-ethqos 23040000.ethernet: Using 40/40 bits DMA host/device width

# end0 starts making the mdio bus, and phy devices
[ 1.443537] qcom-ethqos 23040000.ethernet: Before of_mdiobus_reg

# create phy at addr 0x8, end0's phy
[ 1.450118] Starting phy_create_device for addr: 8

# NetworkManager up'ed end1! and again. But the device we're needing
# (0xa) isn't created yet
[ 1.459743] qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
...
[ 1.465168] Failed at fwnode_phy_find_device
[ 1.465174] qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
[ 1.473687] qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
...
[ 1.477637] Failed at fwnode_phy_find_device
[ 1.477643] qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)

# device created for 0x8, probe it
[ 1.531617] Ending phy_create_device for addr: 8
[ 1.627462] Marvell 88E1510 stmmac-0:08: Starting probe
[ 1.627644] hwmon hwmon0: temp1_input not attached to any thermal zone
[ 1.627650] Marvell 88E1510 stmmac-0:08: Ending probe

# device created for 0xa, probe it
[ 1.628992] Starting phy_create_device for addr: a
[ 1.632615] Ending phy_create_device for addr: a
[ 1.731552] Marvell 88E1510 stmmac-0:0a: Starting probe
[ 1.731732] hwmon hwmon1: temp1_input not attached to any thermal zone
[ 1.731738] Marvell 88E1510 stmmac-0:0a: Ending probe

# end0 is done probing now
[ 1.732804] qcom-ethqos 23040000.ethernet: After of_mdiobus_reg
[ 1.820725] qcom-ethqos 23040000.ethernet end0: renamed from eth0

# NetworkManager up's end0
[ 1.851805] qcom-ethqos 23040000.ethernet end0: Register MEM_TYPE_PAGE_POOL RxQ-0
...
[ 1.914980] qcom-ethqos 23040000.ethernet end0: PHY [stmmac-0:08] driver [Marvell 88E1510] (irq=233)
...
[ 1.939432] qcom-ethqos 23040000.ethernet end0: configuring for phy/sgmii link mode
...
[ 4.451765] qcom-ethqos 23040000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx

So end1 is up'ed before end0 can finish making its mdio bus / phy
devices, and therefore we fail to find it. I can easily simulate this
situation as well by -EPROBE_DEFER'ing end0 for say 10 seconds.

In playing around with this I also discovered that if end1's marvell
phy -EPROBE_DEFERs for a bit, up'ing end1 results in matching against the Generic PHY
driver, so then things don't work network wise. That's a similar topic, but
probably should be discussed separately?
Mentioning it now before I forget though. Here's some logs:

# Probe end1
[ 8.245164] qcom-ethqos 23000000.ethernet: Adding to iommu group 8
...
[ 8.377010] qcom-ethqos 23000000.ethernet: Using 40/40 bits DMA host/device width

# Probe end0
[ 8.396919] qcom-ethqos 23040000.ethernet: Adding to iommu group 9
...
[ 8.513481] qcom-ethqos 23040000.ethernet: Using 40/40 bits DMA host/device width
[ 8.521475] qcom-ethqos 23040000.ethernet: Before of_mdiobus_reg
[ 8.529283] Starting phy_create_device for addr: 8
[ 8.553872] Ending phy_create_device for addr: 8
[ 8.714637] Marvell 88E1510 stmmac-0:08: Ending probe
[ 8.721627] Starting phy_create_device for addr: a
[ 8.729064] Ending phy_create_device for addr: a
[ 8.898759] qcom-ethqos 23040000.ethernet: After of_mdiobus_reg
...

# NetworkManager ups end0
[ 9.028419] qcom-ethqos 23040000.ethernet end0: Register MEM_TYPE_PAGE_POOL RxQ-0
...
[ 9.092839] net end0: Before phylink_fwnode_phy_connect
[ 9.164375] qcom-ethqos 23040000.ethernet end0: PHY [stmmac-0:08] driver [Marvell 88E1510] (irq=280)
[ 9.174201] net end0: After phylink_fwnode_phy_connect
...

# NetworkManager ups end1, get the Generic PHY instead of marvell...
[ 9.257364] qcom-ethqos 23040000.ethernet end0: configuring for phy/sgmii link mode
...
[ 9.317542] net end1: Before phylink_fwnode_phy_connect
[ 9.404384] qcom-ethqos 23000000.ethernet end1: PHY [stmmac-0:0a] driver [Generic PHY] (irq=POLL)
[ 9.414730] net end1: After phylink_fwnode_phy_connect
...
[ 9.509450] qcom-ethqos 23000000.ethernet end1: configuring for phy/sgmii link mode

# end0 comes up, end1 doesn't due to the wrong phy being selected here
[ 11.672223] qcom-ethqos 23040000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx

>
> 3) phy_attach_direct() (another phylib function) returning an error.
> This function calls phy_init_hw() which will attempt to talk to the
> hardware, and if that returns an error, it will be propagated up.
>
> (3) is the most likely scenario given your quoted DT description. I
> suspect that the stmmac/qcom-ethqos driver is what's at fault here.
>
> Your DT description shows that the PHYs are on one MDIO bus owned by
> one of the network interfaces. I suspect if that network interface
> is down, then the MDIO bus is not accessible.
>
> Therefore, when you attempt to open the _other_ network interface,
> accesses to its PHY fail with -ENODEV and that gets propagated all
> the way back up.
>
> What's more is if you did manage to get that network interface up
> (because the one with the MDIO bus on was up) then if you take
> that interface down, you'll end up with a phy_error() splat from
> phylib because the PHY it was using has become inaccessible.
>
> Basically, the network driver is buggy for this PHY setup. If a
> MDIO bus contains devices that are not owned by the network device
> owning that MDIO bus, then the MDIO bus _must_ be prepared to handle
> MDIO bus accesses _at_ _any_ _time_. This clearly is not the case
> here.

As far as I can tell, at least from the "link up/down" perspective, any
combo works. If I boot (without NetworkManager doing things), I can
play around with any combo of link up and down without any noticeable
issue.

>
> It could also be the case that if the driver is using runtime PM,
> that when the network interface is runtime-PM suspended, it causes
> MDIO bus accesses to fail... that would be very chaotic though.
>
> In any case, I'm going to say... I don't think this is a phylink nor
> phylib issue, but a buggy network driver thinking that it has the
> right to shutdown MDIO bus access depending on its network interface
> state.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!
>

2024-05-02 21:27:02

by Serge Semin

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

Hi all

On Thu, May 02, 2024 at 12:43:27PM -0500, Andrew Halaney wrote:
> On Tue, Apr 30, 2024 at 10:42:58PM +0100, Russell King (Oracle) wrote:
> > On Tue, Apr 30, 2024 at 04:02:19PM -0500, Andrew Halaney wrote:
> > > Basically, NetworkManager is setting both interfaces to up, and end1's
> > > phy doesn't seem to be ready when ndo_open() runs, returning
> > > -ENODEV in phylink_fwnode_phy_connect() and bubbling that back up. This doesn't
> >
> > Let's get something clear - you're attributing phylink to this, but this
> > is not the case. phylink doesn't deal directly with PHYs, it makes use
> > of phylib for that, and merely passes back to its caller whatever status
> > it gets from phylib. It's also not fair to attribute this to phylib as
> > we will see later...
>
> Sorry for the delay, I wanted to try and test with some extra logs in
> the legit setup (not my "simulate via EPROBE_DEFER delays" approach)
> which is tedious with the initramfs (plus I wasted time failing to
> ftrace some stuff :P) to reconvince me of old notes. Thanks for the
> explanation above on the nuances between phylink and phylib, I really
> appreciate it.
>
> >
> > There are a few reasons for phylink_fwnode_phy_connect() would return
> > -ENODEV:
> >
> > 1) fwnode_get_phy_node() (a phylib function) returning an error,
> > basically meaning the phy node isn't found. This would be a persistent
> > error, so unlikely to be your issue.
> >
> > 2) fwnode_phy_find_device() (another phylib function) not finding the
> > PHY device corresponding to the fwnode returned by the above on the
> > MDIO bus. This is possible if the PHY has not been detected on the
> > MDIO bus, but I suspect this is not the cause of your issue.
>
> So I think we're in this case. I added some extra logs to see which
> of the cases we were hitting, as well as some extra logs in phy creation
> code etc to come to that conclusion:
>
> # end1 probe start (and finish)
> [ 1.424099] qcom-ethqos 23000000.ethernet: Adding to iommu group 2
> ...
> [ 1.431267] qcom-ethqos 23000000.ethernet: Using 40/40 bits DMA host/device width
>
> # end0 probe start
> [ 1.440517] qcom-ethqos 23040000.ethernet: Adding to iommu group 3
> ...
> [ 1.443502] qcom-ethqos 23040000.ethernet: Using 40/40 bits DMA host/device width
>
> # end0 starts making the mdio bus, and phy devices
> [ 1.443537] qcom-ethqos 23040000.ethernet: Before of_mdiobus_reg
>
> # create phy at addr 0x8, end0's phy
> [ 1.450118] Starting phy_create_device for addr: 8
>
> # NetworkManager up'ed end1! and again. But the device we're needing
> # (0xa) isn't created yet
> [ 1.459743] qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
> ...
> [ 1.465168] Failed at fwnode_phy_find_device
> [ 1.465174] qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
> [ 1.473687] qcom-ethqos 23000000.ethernet end1: Register MEM_TYPE_PAGE_POOL RxQ-0
> ...
> [ 1.477637] Failed at fwnode_phy_find_device
> [ 1.477643] qcom-ethqos 23000000.ethernet end1: __stmmac_open: Cannot attach to PHY (error: -19)
>
> # device created for 0x8, probe it
> [ 1.531617] Ending phy_create_device for addr: 8
> [ 1.627462] Marvell 88E1510 stmmac-0:08: Starting probe
> [ 1.627644] hwmon hwmon0: temp1_input not attached to any thermal zone
> [ 1.627650] Marvell 88E1510 stmmac-0:08: Ending probe
>
> # device created for 0xa, probe it
> [ 1.628992] Starting phy_create_device for addr: a
> [ 1.632615] Ending phy_create_device for addr: a
> [ 1.731552] Marvell 88E1510 stmmac-0:0a: Starting probe
> [ 1.731732] hwmon hwmon1: temp1_input not attached to any thermal zone
> [ 1.731738] Marvell 88E1510 stmmac-0:0a: Ending probe
>
> # end0 is done probing now
> [ 1.732804] qcom-ethqos 23040000.ethernet: After of_mdiobus_reg
> [ 1.820725] qcom-ethqos 23040000.ethernet end0: renamed from eth0
>
> # NetworkManager up's end0
> [ 1.851805] qcom-ethqos 23040000.ethernet end0: Register MEM_TYPE_PAGE_POOL RxQ-0
> ...
> [ 1.914980] qcom-ethqos 23040000.ethernet end0: PHY [stmmac-0:08] driver [Marvell 88E1510] (irq=233)
> ...
> [ 1.939432] qcom-ethqos 23040000.ethernet end0: configuring for phy/sgmii link mode
> ...
> [ 4.451765] qcom-ethqos 23040000.ethernet end0: Link is Up - 1Gbps/Full - flow control rx/tx
>
> So end1 is up'ed before end0 can finish making its mdio bus / phy
> devices, and therefore we fail to find it. I can easily simulate this
> situation as well by -EPROBE_DEFER'ing end0 for say 10 seconds.

AFAICS the problem is in the race between the end0 and end1 device
probes. Right?
If so then can't the order be fixed by adding the links between the
OF-devices? As it's already done for various phandle-based references
like "clocks", "gpios", "phys", etc?

* Before this topic was raised I had thought it was working for any
phandle-based dependencies, but apparently it wasn't and the
supplier/consumer linkage was supposed to be implemented for each
particular case. The "phy-handle" property lacks that feature support
(see drivers/of/property.c:of_supplier_bindings and
of_fwnode_add_links() for details).

-Serge(y)

> [...]

2024-05-03 01:25:51

by Andrew Lunn

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

> AFAICS the problem is in the race between the end0 and end1 device
> probes. Right?
> If so then can't the order be fixed by adding the links between the
> OF-devices? As it's already done for various phandle-based references
> like "clocks", "gpios", "phys", etc?

It gets tricky because an MDIO bus master device is often a sub device
of an Ethernet MAC driver. Typically how it works is that a MAC driver
probes. Part way through the probe it creates an MDIO bus driver,
which enumerates the MDIO bus and creates the PHYs. Later in the MAC
driver probe, or maybe when the MAC driver is opened, it follows the
phy-handle to a PHY on its own MDIO bus.

If you were to say it cannot probe the MAC driver until the MDIO bus
driver is created and the PHYs created, you never get anywhere,
because it is the act of probing the MAC driver which creates the PHYs
which fulfils the phandle.

You would need to differentiate between a phandle pointing deeper into
the same branch of a DT tree, or pointing to a different branch of a
DT tree. If it is deeper within the same branch, you need to probe in
order to make progress. If it points to a different branch you need to
defer until that sub-branch has successfully probed. And if you get
two branches which are mutually dependent on each other, probe and
hope EPROBE_DEFER solves it.

Andrew

2024-05-07 15:15:52

by Andrew Halaney

[permalink] [raw]

Subject: Re: racing ndo_open()/phylink*connect() with phy_probe()

On Fri, May 03, 2024 at 03:25:19AM GMT, Andrew Lunn wrote:
> > AFAICS the problem is in the race between the end0 and end1 device
> > probes. Right?
> > If so then can't the order be fixed by adding the links between the
> > OF-devices? As it's already done for various phandle-based references
> > like "clocks", "gpios", "phys", etc?

Thanks for the pointer here Serge, I had no idea (still don't have much of an
idea) on how this works. I think this makes sense to explore some more.
Hopefully sometime this week I'll poke at this more.

>
> It gets tricky because an MDIO bus master device is often a sub device
> of an Ethernet MAC driver. Typically how it works is that a MAC driver
> probes. Part way through the probe it creates an MDIO bus driver,
> which enumerates the MDIO bus and creates the PHYs. Later in the MAC
> driver probe, or maybe when the MAC driver is opened, it follows the
> phy-handle to a PHY on its own MDIO bus.
>
> If you were to say it cannot probe the MAC driver until the MDIO bus
> driver is created and the PHYs created, you never get anywhere,
> because it is the act of probing the MAC driver which creates the PHYs
> which fulfils the phandle.
>
> You would need to differentiate between a phandle pointing deeper into
> the same branch of a DT tree, or pointing to a different branch of a
> DT tree. If it is deeper within the same branch, you need to probe in
> order to make progress. If it points to a different branch you need to
> defer until that sub-branch has successfully probed. And if you get
> two branches which are mutually dependent on each other, probe and
> hope EPROBE_DEFER solves it.

I'll keep this relationship in mind. IIUC the fw_devlink stuff sort of handles
cycles, but I need to look into how all that works further. At least in
my example device, end0 is in this situation, whereas end1 is in the
other situation, so I have a decent test setup for that.