2023-01-15 22:14:47

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Sun, Jan 15, 2023 at 10:38:04PM +0100, Pierluigi Passaro wrote:
> For dual fec interfaces, external phys can only be configured by fec0.
> When the function of_mdiobus_register return -EPROBE_DEFER, the driver
> is lately called to manage fec1, which wrongly register its mii_bus as
> fec0_mii_bus.
> When fec0 retry the probe, the previous assignement prevent the MDIO bus
> registration.
> Use a static boolean to trace the orginal MDIO bus deferred probe and
> prevent further registrations until the fec0 registration completed
> succesfully.

The real problem here seems to be that fep->dev_id is not
deterministic. I think a better fix would be to make the mdio bus name
deterministic. Use pdev->id instead of fep->dev_id + 1. That is what
most mdiobus drivers use.

Andrew


2023-01-15 22:51:39

by Pierluigi Passaro

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Sun, Jan 15, 2023 at 10:56 PM Andrew Lunn <[email protected]> wrote:
> On Sun, Jan 15, 2023 at 10:38:04PM +0100, Pierluigi Passaro wrote:
> > For dual fec interfaces, external phys can only be configured by fec0.
> > When the function of_mdiobus_register return -EPROBE_DEFER, the driver
> > is lately called to manage fec1, which wrongly register its mii_bus as
> > fec0_mii_bus.
> > When fec0 retry the probe, the previous assignement prevent the MDIO bus
> > registration.
> > Use a static boolean to trace the orginal MDIO bus deferred probe and
> > prevent further registrations until the fec0 registration completed
> > succesfully.
>
> The real problem here seems to be that fep->dev_id is not
> deterministic. I think a better fix would be to make the mdio bus name
> deterministic. Use pdev->id instead of fep->dev_id + 1. That is what
> most mdiobus drivers use.
>
Actually, the sequence is deterministic, fec0 and then fec1,
but sometimes the GPIO of fec0 is not yet available.
The EPROBE_DEFER does not prevent the second instance from being probed.
This is the origin of the problem.
>
> Andrew

2023-01-16 00:54:06

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Sun, Jan 15, 2023 at 11:23:51PM +0100, Pierluigi Passaro wrote:
> On Sun, Jan 15, 2023 at 10:56 PM Andrew Lunn <[email protected]> wrote:
> > On Sun, Jan 15, 2023 at 10:38:04PM +0100, Pierluigi Passaro wrote:
> > > For dual fec interfaces, external phys can only be configured by fec0.
> > > When the function of_mdiobus_register return -EPROBE_DEFER, the driver
> > > is lately called to manage fec1, which wrongly register its mii_bus as
> > > fec0_mii_bus.
> > > When fec0 retry the probe, the previous assignement prevent the MDIO bus
> > > registration.
> > > Use a static boolean to trace the orginal MDIO bus deferred probe and
> > > prevent further registrations until the fec0 registration completed
> > > succesfully.
> >
> > The real problem here seems to be that fep->dev_id is not
> > deterministic. I think a better fix would be to make the mdio bus name
> > deterministic. Use pdev->id instead of fep->dev_id + 1. That is what
> > most mdiobus drivers use.
> >
> Actually, the sequence is deterministic, fec0 and then fec1,
> but sometimes the GPIO of fec0 is not yet available.
> The EPROBE_DEFER does not prevent the second instance from being probed.
> This is the origin of the problem.

Maybe i understood you wrongly, but it sounds like the second instance
takes the name space of the first? And when the first probes for the
second time, the name space is taken and the registration fails? To
me, this is indeterminate behaviour, the name fec0_mii_bus is not
determinate.

Andrew

2023-01-16 07:41:59

by Alexander Stein

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

Hi,

Am Sonntag, 15. Januar 2023, 23:23:51 CET schrieb Pierluigi Passaro:
> On Sun, Jan 15, 2023 at 10:56 PM Andrew Lunn <[email protected]> wrote:
> > On Sun, Jan 15, 2023 at 10:38:04PM +0100, Pierluigi Passaro wrote:
> > > For dual fec interfaces, external phys can only be configured by fec0.
> > > When the function of_mdiobus_register return -EPROBE_DEFER, the driver
> > > is lately called to manage fec1, which wrongly register its mii_bus as
> > > fec0_mii_bus.
> > > When fec0 retry the probe, the previous assignement prevent the MDIO bus
> > > registration.
> > > Use a static boolean to trace the orginal MDIO bus deferred probe and
> > > prevent further registrations until the fec0 registration completed
> > > succesfully.
> >
> > The real problem here seems to be that fep->dev_id is not
> > deterministic. I think a better fix would be to make the mdio bus name
> > deterministic. Use pdev->id instead of fep->dev_id + 1. That is what
> > most mdiobus drivers use.
>
> Actually, the sequence is deterministic, fec0 and then fec1,
> but sometimes the GPIO of fec0 is not yet available.

Not in every case though. On i.MX6UL has the following memory map for FEC:
* FEC2: 0x020b4000
* FEC1: 0x02188000

Which essentially means that fec2 will be probed first.

> The EPROBE_DEFER does not prevent the second instance from being probed.
> This is the origin of the problem.

Is this the actual cause? There is also a problem in the case above if the
MDIO controlling interface (fec2) is not probed first, e.g. using fec1 for
MDIO access. But then again there is i.MX6ULG1 which only has fec1
interface...

Best regards,
Alexander



2023-01-16 09:28:27

by Pierluigi Passaro

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Mon, Jan 16, 2023 at 1:01 AM Andrew Lunn <[email protected]> wrote:
> On Sun, Jan 15, 2023 at 11:23:51PM +0100, Pierluigi Passaro wrote:
> > On Sun, Jan 15, 2023 at 10:56 PM Andrew Lunn <[email protected]> wrote:
> > > On Sun, Jan 15, 2023 at 10:38:04PM +0100, Pierluigi Passaro wrote:
> > > > For dual fec interfaces, external phys can only be configured by fec0.
> > > > When the function of_mdiobus_register return -EPROBE_DEFER, the driver
> > > > is lately called to manage fec1, which wrongly register its mii_bus as
> > > > fec0_mii_bus.
> > > > When fec0 retry the probe, the previous assignement prevent the MDIO bus
> > > > registration.
> > > > Use a static boolean to trace the orginal MDIO bus deferred probe and
> > > > prevent further registrations until the fec0 registration completed
> > > > succesfully.
> > >
> > > The real problem here seems to be that fep->dev_id is not
> > > deterministic. I think a better fix would be to make the mdio bus name
> > > deterministic. Use pdev->id instead of fep->dev_id + 1. That is what
> > > most mdiobus drivers use.
> > >
> > Actually, the sequence is deterministic, fec0 and then fec1,
> > but sometimes the GPIO of fec0 is not yet available.
> > The EPROBE_DEFER does not prevent the second instance from being probed.
> > This is the origin of the problem.
>
> Maybe I understood you wrongly, but it sounds like the second instance
> takes the namespace of the first? And when the first probes for the
> second time, the name space is taken and the registration fails? To
> me, this is indeterminate behaviour, the name fec0_mii_bus is not
> determinate.
>
> ? ? ? ? Andrew
This is the setup of the corner case:
- FEC0 is the owner of MDIO bus, but its own PHY rely on a "delayed" GPIO
- FEC1 rely on FEC0 for MDIO communications
The sequence is something like this
- FEC0 probe start, but being the reset GPIO "delayed" it return EPROBE_DEFERRED
- FEC1 is successfully probed: being the MDIO bus still not owned, the driver assume
? that the ownership must be assigned to the 1st one successfully probed, but no
? MDIO node is actually present and no communication takes place.
- FEC0 is successfully probed, but MDIO bus is now assigned to FEC1 and cannot? and no communication takes place

2023-01-16 17:02:54

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

> This is the setup of the corner case:
> - FEC0 is the owner of MDIO bus, but its own PHY rely on a "delayed" GPIO
> - FEC1 rely on FEC0 for MDIO communications
> The sequence is something like this
> - FEC0 probe start, but being the reset GPIO "delayed" it return EPROBE_DEFERRED
> - FEC1 is successfully probed: being the MDIO bus still not owned, the driver assume
> ? that the ownership must be assigned to the 1st one successfully probed, but no
> ? MDIO node is actually present and no communication takes place.

So semantics of a phandle is that you expect what it points to, to
exists. So if phy-handle points to a PHY, when you follow that pointer
and find it missing, you should defer the probe. So this step should
not succeed.

> - FEC0 is successfully probed, but MDIO bus is now assigned to FEC1
> and cannot? and no communication takes place

Andrew

2023-01-16 19:46:16

by Pierluigi Passaro

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Mon, Jan 16, 2023 at 8:35 AM Alexander Stein <[email protected]> wrote:
> Hi,
>
> Am Sonntag, 15. Januar 2023, 23:23:51 CET schrieb Pierluigi Passaro:
> > On Sun, Jan 15, 2023 at 10:56 PM Andrew Lunn <[email protected]> wrote:
> > > On Sun, Jan 15, 2023 at 10:38:04PM +0100, Pierluigi Passaro wrote:
> > > > For dual fec interfaces, external phys can only be configured by fec0.
> > > > When the function of_mdiobus_register return -EPROBE_DEFER, the driver
> > > > is lately called to manage fec1, which wrongly register its mii_bus as
> > > > fec0_mii_bus.
> > > > When fec0 retry the probe, the previous assignement prevent the MDIO bus
> > > > registration.
> > > > Use a static boolean to trace the orginal MDIO bus deferred probe and
> > > > prevent further registrations until the fec0 registration completed
> > > > succesfully.
> > >
> > > The real problem here seems to be that fep->dev_id is not
> > > deterministic. I think a better fix would be to make the mdio bus name
> > > deterministic. Use pdev->id instead of fep->dev_id + 1. That is what
> > > most mdiobus drivers use.
> >
> > Actually, the sequence is deterministic, fec0 and then fec1,
> > but sometimes the GPIO of fec0 is not yet available.
>
> Not in every case though. On i.MX6UL has the following memory map for FEC:
> * FEC2: 0x020b4000
> * FEC1: 0x02188000
>
> Which essentially means that fec2 will be probed first.
>
This is actually the expected behaviour, by FEC0 I refer to the 1st instance of
FEC, no matter the alias used for it: apologizing for the misleading notation.
For iMX6UL, when both the FEC are present, the MDIO is owned by
fec@0x020b4000,?which is the 1st instance of FEC.
>
> > The EPROBE_DEFER does not prevent the second instance from being probed.
> > This is the origin of the problem.
>
> Is this the actual cause? There is also a problem in the case above if the
> MDIO controlling interface (fec2) is not probed first, e.g. using fec1 for
> MDIO access. But then again there is i.MX6ULG1 which only has fec1
> interface...
>
I'm not familiar with iMX6ULG1, but I would expect that its device tree disables
one of the 2 fec: this patch is relevant only for dual FEC configuration.
>
> Best regards,
> Alexander

2023-01-16 20:45:15

by Pierluigi Passaro

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Mon, Jan 16, 2023 at 4:32 PM Andrew Lunn <[email protected]> wrote:
> > This is the setup of the corner case:
> > - FEC0 is the owner of MDIO bus, but its own PHY rely on a "delayed" GPIO
> > - FEC1 rely on FEC0 for MDIO communications
> > The sequence is something like this
> > - FEC0 probe start, but being the reset GPIO "delayed" it return EPROBE_DEFERRED
> > - FEC1 is successfully probed: being the MDIO bus still not owned, the driver assume
> > ? that the ownership must be assigned to the 1st one successfully probed, but no
> > ? MDIO node is actually present and no communication takes place.
>
> So semantics of a phandle is that you expect what it points to, to
> exists. So if phy-handle points to a PHY, when you follow that pointer
> and find it missing, you should defer the probe. So this step should
> not succeed.
>
I agree with you: the check is present, but the current logic is not consistent.
Whenever the node owning the MDIO fails the probe due to EPROBE_DEFERRED,
also the second node must defer the probe, otherwise no MDIO communication
is possible.
That's why the patch set the static variable wait_for_mdio_bus?to track the status.
>
> > - FEC0 is successfully probed, but MDIO bus is now assigned to FEC1
> > ? and cannot ?and no communication takes place
>
> ? ? ? ?Andrew

2023-01-17 06:12:58

by Wei Fang

[permalink] [raw]
Subject: RE: [PATCH v2] net: fec: manage corner deferred probe condition

> -----Original Message-----
> From: Pierluigi Passaro <[email protected]>
> Sent: 2023年1月17日 4:23
> To: Andrew Lunn <[email protected]>
> Cc: Pierluigi Passaro <[email protected]>; Wei Fang
> <[email protected]>; Shenwei Wang <[email protected]>; Clark Wang
> <[email protected]>; dl-linux-imx <[email protected]>;
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Nate Drude <[email protected]>; Francesco
> Ferraro <[email protected]>
> Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition
>
> On Mon, Jan 16, 2023 at 4:32 PM Andrew Lunn <[email protected]> wrote:
> > > This is the setup of the corner case:
> > > - FEC0 is the owner of MDIO bus, but its own PHY rely on a "delayed"
> > > GPIO
> > > - FEC1 rely on FEC0 for MDIO communications The sequence is
> > > something like this
> > > - FEC0 probe start, but being the reset GPIO "delayed" it return
> > > EPROBE_DEFERRED
> > > - FEC1 is successfully probed: being the MDIO bus still not owned,
> > > the driver assume
> > >   that the ownership must be assigned to the 1st one successfully
> > > probed, but no
> > >   MDIO node is actually present and no communication takes place.
> >
> > So semantics of a phandle is that you expect what it points to, to
> > exists. So if phy-handle points to a PHY, when you follow that pointer
> > and find it missing, you should defer the probe. So this step should
> > not succeed.
> >
> I agree with you: the check is present, but the current logic is not consistent.
> Whenever the node owning the MDIO fails the probe due to
> EPROBE_DEFERRED, also the second node must defer the probe, otherwise no
> MDIO communication is possible.
> That's why the patch set the static variable wait_for_mdio_bus to track the
> status.
> > > - FEC0 is successfully probed, but MDIO bus is now assigned to FEC1
> > >   and cannot  and no communication takes place
> >

Have you tested that this issue also exists on the net tree? According to your
description, I simulated your situation on the net tree and tested it with imx6ul,
but the problem you mentioned does not exist. Below is is my test patch.

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 644f3c963730..e4f6937cdc3e 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2284,6 +2284,7 @@ static int fec_enet_mii_init(struct platform_device *pdev)
int err = -ENXIO;
u32 mii_speed, holdtime;
u32 bus_freq;
+ static bool test_flag;

/*
* The i.MX28 dual fec interfaces are not equal.
@@ -2388,7 +2389,12 @@ static int fec_enet_mii_init(struct platform_device *pdev)
fep->mii_bus->priv = fep;
fep->mii_bus->parent = &pdev->dev;

- err = of_mdiobus_register(fep->mii_bus, node);
+ if (node && !test_flag) {
+ err = -EPROBE_DEFER;
+ test_flag = true;
+ } else {
+ err = of_mdiobus_register(fep->mii_bus, node);
+ }
if (err)
goto err_out_free_mdiobus;
of_node_put(node);
@@ -4349,8 +4355,9 @@ fec_probe(struct platform_device *pdev)

/* Decide which interrupt line is wakeup capable */
fec_enet_get_wakeup_irq(pdev);
-
+ dev_info(&pdev->dev, "[FEC Debug] >> Start registering mii bus!\n");
ret = fec_enet_mii_init(pdev);
+ dev_info(&pdev->dev, "[FEC Debug] >> Finish registering mii bus! ret:%d\n", ret);
if (ret)
goto failed_mii_init;

On the imx6ul platform, fec1 (ethernet@2188000) and fec2 (fec2: ethernet@20b4000) share
the same MDIO bus and external PHYs can only be configured by fec2. After applying the test
patch, the fec2 will be delayed to probe, the debug log shows as follows.

[ 7.101569] fec 20b4000.ethernet: [FEC Debug] >> Start registering mii bus!
[ 7.109386] fec 20b4000.ethernet: [FEC Debug] >> Finish registering mii bus! ret:-517
[ 7.153045] fec 2188000.ethernet: [FEC Debug] >> Start registering mii bus!
[ 7.343374] fec 2188000.ethernet: [FEC Debug] >> Finish registering mii bus! ret:0
[ 8.742909] fec 20b4000.ethernet: [FEC Debug] >> Start registering mii bus!
[ 8.769657] fec 20b4000.ethernet: [FEC Debug] >> Finish registering mii bus! ret:0

And the MDIO bus also can be accessed, please refer to the following log.

root@imx6ul7d:~# ./mdio eth0 1
read phy addr: 0x2 reg: 0x1 value : 0x786d

root@imx6ul7d:~# ./mdio eth1 1
read phy addr: 0x1 reg: 0x1 value : 0x786d

The only change is that after applying the test patch, the fec1 and fec2 exchange the ethernet
port names (before fec1: eth1 fec2: eth0, after fec1: eth0, fec2: eth1) because of the sequence
of probe. Of course, this is just a simulated situation on my side, maybe the actual situation on
your side is different from this which leads to different behaviors.

BTW, you'd better use the ./script/checkpatch.pl to check the patch before sending the patch.
There were some warnings and errors in the patch as follows.

WARNING: 'succesfully' may be misspelled - perhaps 'successfully'?
#14:
succesfully.
^^^^^^^^^^^

ERROR: do not initialise statics to false
#29: FILE: drivers/net/ethernet/freescale/fec_main.c:2287:
+ static bool wait_for_mdio_bus = false;

total: 1 errors, 1 warnings, 0 checks, 40 lines checked

2023-01-17 15:23:29

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition

On Tue, Jan 17, 2023 at 05:54:21AM +0000, Wei Fang wrote:
> > -----Original Message-----
> > From: Pierluigi Passaro <[email protected]>
> > Sent: 2023年1月17日 4:23
> > To: Andrew Lunn <[email protected]>
> > Cc: Pierluigi Passaro <[email protected]>; Wei Fang
> > <[email protected]>; Shenwei Wang <[email protected]>; Clark Wang
> > <[email protected]>; dl-linux-imx <[email protected]>;
> > [email protected]; [email protected]; [email protected];
> > [email protected]; [email protected]; [email protected];
> > [email protected]; Nate Drude <[email protected]>; Francesco
> > Ferraro <[email protected]>
> > Subject: Re: [PATCH v2] net: fec: manage corner deferred probe condition
> >
> > On Mon, Jan 16, 2023 at 4:32 PM Andrew Lunn <[email protected]> wrote:
> > > > This is the setup of the corner case:
> > > > - FEC0 is the owner of MDIO bus, but its own PHY rely on a "delayed"
> > > > GPIO
> > > > - FEC1 rely on FEC0 for MDIO communications The sequence is
> > > > something like this
> > > > - FEC0 probe start, but being the reset GPIO "delayed" it return
> > > > EPROBE_DEFERRED
> > > > - FEC1 is successfully probed: being the MDIO bus still not owned,
> > > > the driver assume
> > > >   that the ownership must be assigned to the 1st one successfully
> > > > probed, but no
> > > >   MDIO node is actually present and no communication takes place.
> > >
> > > So semantics of a phandle is that you expect what it points to, to
> > > exists. So if phy-handle points to a PHY, when you follow that pointer
> > > and find it missing, you should defer the probe. So this step should
> > > not succeed.
> > >
> > I agree with you: the check is present, but the current logic is not consistent.
> > Whenever the node owning the MDIO fails the probe due to
> > EPROBE_DEFERRED, also the second node must defer the probe, otherwise no
> > MDIO communication is possible.
> > That's why the patch set the static variable wait_for_mdio_bus to track the
> > status.
> > > > - FEC0 is successfully probed, but MDIO bus is now assigned to FEC1
> > > >   and cannot  and no communication takes place
> > >
>
> Have you tested that this issue also exists on the net tree? According to your
> description, I simulated your situation on the net tree and tested it with imx6ul,
> but the problem you mentioned does not exist. Below is is my test patch.

Hi Wei

Reading the emails from Pierluigi, i don't get the feeling he really
understands the problem and has got to the root cause. I've not seen a
really good, detailed explanation of what is going wrong.

Andrew