2021-02-10 22:16:19

by Saravana Kannan

[permalink] [raw]
Subject: phy_attach_direct()'s use of device_bind_driver()

Hi,

This email was triggered by this other email[1].

Why is phy_attach_direct() directly calling device_bind_driver()
instead of using bus_probe_device()? I'm asking because this is
causing device links status to not get updated correctly and causes
this[2] warning.

We can fix the device links issue with something like this[3], but
want to understand the reason for the current implementation of
phy_attach_direct() before we go ahead and put in that fix.

Thanks,
Saravana

[1] - https://lore.kernel.org/lkml/[email protected]/#t
[2] - https://lore.kernel.org/lkml/[email protected]/
[3] - https://lore.kernel.org/lkml/[email protected]/


2021-02-10 22:44:32

by Heiner Kallweit

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On 10.02.2021 23:13, Saravana Kannan wrote:
> Hi,
>
> This email was triggered by this other email[1].
>
> Why is phy_attach_direct() directly calling device_bind_driver()
> instead of using bus_probe_device()? I'm asking because this is
> causing device links status to not get updated correctly and causes
> this[2] warning.
>

The genphy driver is a fallback if no dedicated PHY driver matches the
PHY device. It doesn't match any device, therefore it needs to be
explicitly bound.

> We can fix the device links issue with something like this[3], but
> want to understand the reason for the current implementation of
> phy_attach_direct() before we go ahead and put in that fix.
>
> Thanks,
> Saravana
>
> [1] - https://lore.kernel.org/lkml/[email protected]/#t
> [2] - https://lore.kernel.org/lkml/[email protected]/
> [3] - https://lore.kernel.org/lkml/[email protected]/
>

2021-02-10 22:54:41

by Andrew Lunn

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> Hi,
>
> This email was triggered by this other email[1].
>
> Why is phy_attach_direct() directly calling device_bind_driver()
> instead of using bus_probe_device()?

Hi Saravana

So this is to do with the generic PHY, which is a special case.

First the normal case. The MDIO bus driver registers an MDIO bus using
mdiobus_register(). This will enumerate the bus, finding PHYs on
it. Each PHY device is registered with the device core, using the
usual device_add(). The core will go through the registered PHY
drivers and see if one can drive this hardware, based on the ID
registers the PHY has at address 2 and 3. If a match is found, the
driver probes the device, all in the usual way.

Sometime later, the MAC driver wants to make use of the PHY
device. This is often in the open() call of the MAC driver, when the
interface is configured up. The MAC driver asks phylib to associate a
PHY devices to the MAC device. In the normal case, the PHY has been
probed, and everything is good to go.

However, sometimes, there is no driver for the PHY. There is no driver
for that hardware. Or the driver has not been built, or it is not on
the disk, etc. So the device core has not been able to probe
it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
PHY should support. And most PHY devices have this minimum. So there
is a fall back driver, the generic PHY driver. It assumes the minimum
registers are available, and does its best to drive the hardware. It
often works, but not always. So if the MAC asks phylib to connect to a
PHY which does not have a driver, we forcefully bind the generic
driver to the device, and hope for the best.

We don't actually recommend using the generic driver. Use the specific
driver for the hardware. But the generic driver can at least get you
going, allow you to scp the correct driver onto the system, etc.

Andrew

2021-02-10 23:01:57

by Andrew Lunn

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> Hi,
>
> This email was triggered by this other email[1].

And it appears the Tegra194 Jetson Xavier uses the Marvell 88E1512
PHY. So ensure the Marvell driver is available, and it should get
probed in the usual way, the fallback driver will not be needed.

Andrew

2021-02-10 23:34:42

by Saravana Kannan

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On Wed, Feb 10, 2021 at 2:52 PM Andrew Lunn <[email protected]> wrote:
>
> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> > Hi,
> >
> > This email was triggered by this other email[1].
> >
> > Why is phy_attach_direct() directly calling device_bind_driver()
> > instead of using bus_probe_device()?
>
> Hi Saravana
>
> So this is to do with the generic PHY, which is a special case.
>
> First the normal case. The MDIO bus driver registers an MDIO bus using
> mdiobus_register(). This will enumerate the bus, finding PHYs on
> it. Each PHY device is registered with the device core, using the
> usual device_add(). The core will go through the registered PHY
> drivers and see if one can drive this hardware, based on the ID
> registers the PHY has at address 2 and 3. If a match is found, the
> driver probes the device, all in the usual way.
>
> Sometime later, the MAC driver wants to make use of the PHY
> device. This is often in the open() call of the MAC driver, when the
> interface is configured up. The MAC driver asks phylib to associate a
> PHY devices to the MAC device. In the normal case, the PHY has been
> probed, and everything is good to go.
>
> However, sometimes, there is no driver for the PHY. There is no driver
> for that hardware. Or the driver has not been built, or it is not on
> the disk, etc. So the device core has not been able to probe
> it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
> PHY should support. And most PHY devices have this minimum. So there
> is a fall back driver, the generic PHY driver. It assumes the minimum
> registers are available, and does its best to drive the hardware. It
> often works, but not always. So if the MAC asks phylib to connect to a
> PHY which does not have a driver, we forcefully bind the generic
> driver to the device, and hope for the best.

Thanks for the detailed answer Andrew! I think it gives me enough
info/context to come up with a proper fix.

> We don't actually recommend using the generic driver. Use the specific
> driver for the hardware. But the generic driver can at least get you
> going, allow you to scp the correct driver onto the system, etc.

I'm not sure if I can control what driver they use. If I can fix this
warning, I'll probably try to do that.

-Saravana

2021-02-11 07:34:42

by Heiner Kallweit

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On 11.02.2021 00:29, Saravana Kannan wrote:
> On Wed, Feb 10, 2021 at 2:52 PM Andrew Lunn <[email protected]> wrote:
>>
>> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
>>> Hi,
>>>
>>> This email was triggered by this other email[1].
>>>
>>> Why is phy_attach_direct() directly calling device_bind_driver()
>>> instead of using bus_probe_device()?
>>
>> Hi Saravana
>>
>> So this is to do with the generic PHY, which is a special case.
>>
>> First the normal case. The MDIO bus driver registers an MDIO bus using
>> mdiobus_register(). This will enumerate the bus, finding PHYs on
>> it. Each PHY device is registered with the device core, using the
>> usual device_add(). The core will go through the registered PHY
>> drivers and see if one can drive this hardware, based on the ID
>> registers the PHY has at address 2 and 3. If a match is found, the
>> driver probes the device, all in the usual way.
>>
>> Sometime later, the MAC driver wants to make use of the PHY
>> device. This is often in the open() call of the MAC driver, when the
>> interface is configured up. The MAC driver asks phylib to associate a
>> PHY devices to the MAC device. In the normal case, the PHY has been
>> probed, and everything is good to go.
>>
>> However, sometimes, there is no driver for the PHY. There is no driver
>> for that hardware. Or the driver has not been built, or it is not on
>> the disk, etc. So the device core has not been able to probe
>> it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
>> PHY should support. And most PHY devices have this minimum. So there
>> is a fall back driver, the generic PHY driver. It assumes the minimum
>> registers are available, and does its best to drive the hardware. It
>> often works, but not always. So if the MAC asks phylib to connect to a
>> PHY which does not have a driver, we forcefully bind the generic
>> driver to the device, and hope for the best.
>
> Thanks for the detailed answer Andrew! I think it gives me enough
> info/context to come up with a proper fix.
>
>> We don't actually recommend using the generic driver. Use the specific
>> driver for the hardware. But the generic driver can at least get you
>> going, allow you to scp the correct driver onto the system, etc.
>
> I'm not sure if I can control what driver they use. If I can fix this
> warning, I'll probably try to do that.
>
The genphy driver is a last resort, at least they lose functionality like
downshift detection and control. Therefore they should go with the
dedicated Marvell PHY driver.

But right, this avoids the warning, but the underlying issue (probably
in device_bind_driver()) still exists. Would be good if you can fix it.

> -Saravana
>

Heiner

2021-02-11 09:05:55

by Saravana Kannan

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On Wed, Feb 10, 2021 at 11:31 PM Heiner Kallweit <[email protected]> wrote:
>
> On 11.02.2021 00:29, Saravana Kannan wrote:
> > On Wed, Feb 10, 2021 at 2:52 PM Andrew Lunn <[email protected]> wrote:
> >>
> >> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> >>> Hi,
> >>>
> >>> This email was triggered by this other email[1].
> >>>
> >>> Why is phy_attach_direct() directly calling device_bind_driver()
> >>> instead of using bus_probe_device()?
> >>
> >> Hi Saravana
> >>
> >> So this is to do with the generic PHY, which is a special case.
> >>
> >> First the normal case. The MDIO bus driver registers an MDIO bus using
> >> mdiobus_register(). This will enumerate the bus, finding PHYs on
> >> it. Each PHY device is registered with the device core, using the
> >> usual device_add(). The core will go through the registered PHY
> >> drivers and see if one can drive this hardware, based on the ID
> >> registers the PHY has at address 2 and 3. If a match is found, the
> >> driver probes the device, all in the usual way.
> >>
> >> Sometime later, the MAC driver wants to make use of the PHY
> >> device. This is often in the open() call of the MAC driver, when the
> >> interface is configured up. The MAC driver asks phylib to associate a
> >> PHY devices to the MAC device. In the normal case, the PHY has been
> >> probed, and everything is good to go.
> >>
> >> However, sometimes, there is no driver for the PHY. There is no driver
> >> for that hardware. Or the driver has not been built, or it is not on
> >> the disk, etc. So the device core has not been able to probe
> >> it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
> >> PHY should support. And most PHY devices have this minimum. So there
> >> is a fall back driver, the generic PHY driver. It assumes the minimum
> >> registers are available, and does its best to drive the hardware. It
> >> often works, but not always. So if the MAC asks phylib to connect to a
> >> PHY which does not have a driver, we forcefully bind the generic
> >> driver to the device, and hope for the best.
> >
> > Thanks for the detailed answer Andrew! I think it gives me enough
> > info/context to come up with a proper fix.
> >
> >> We don't actually recommend using the generic driver. Use the specific
> >> driver for the hardware. But the generic driver can at least get you
> >> going, allow you to scp the correct driver onto the system, etc.
> >
> > I'm not sure if I can control what driver they use. If I can fix this
> > warning, I'll probably try to do that.
> >
> The genphy driver is a last resort, at least they lose functionality like
> downshift detection and control. Therefore they should go with the
> dedicated Marvell PHY driver.
>
> But right, this avoids the warning, but the underlying issue (probably
> in device_bind_driver()) still exists. Would be good if you can fix it.

Yeah, I plan to fix this. So I have a few more questions. In the
example I gave, what should happen if the gpios listed in the phy's DT
node aren't ready yet? The generic phy driver itself probably isn't
using any GPIO? But will the phy work without the GPIO hardware being
initialized? The reason I'm asking this question is, if the phy is
linked to a supplier and the supplier is not ready, should the
device_bind_driver() succeed or not?

-Saravana

2021-02-11 09:39:31

by Heiner Kallweit

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On 11.02.2021 09:57, Saravana Kannan wrote:
> On Wed, Feb 10, 2021 at 11:31 PM Heiner Kallweit <[email protected]> wrote:
>>
>> On 11.02.2021 00:29, Saravana Kannan wrote:
>>> On Wed, Feb 10, 2021 at 2:52 PM Andrew Lunn <[email protected]> wrote:
>>>>
>>>> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
>>>>> Hi,
>>>>>
>>>>> This email was triggered by this other email[1].
>>>>>
>>>>> Why is phy_attach_direct() directly calling device_bind_driver()
>>>>> instead of using bus_probe_device()?
>>>>
>>>> Hi Saravana
>>>>
>>>> So this is to do with the generic PHY, which is a special case.
>>>>
>>>> First the normal case. The MDIO bus driver registers an MDIO bus using
>>>> mdiobus_register(). This will enumerate the bus, finding PHYs on
>>>> it. Each PHY device is registered with the device core, using the
>>>> usual device_add(). The core will go through the registered PHY
>>>> drivers and see if one can drive this hardware, based on the ID
>>>> registers the PHY has at address 2 and 3. If a match is found, the
>>>> driver probes the device, all in the usual way.
>>>>
>>>> Sometime later, the MAC driver wants to make use of the PHY
>>>> device. This is often in the open() call of the MAC driver, when the
>>>> interface is configured up. The MAC driver asks phylib to associate a
>>>> PHY devices to the MAC device. In the normal case, the PHY has been
>>>> probed, and everything is good to go.
>>>>
>>>> However, sometimes, there is no driver for the PHY. There is no driver
>>>> for that hardware. Or the driver has not been built, or it is not on
>>>> the disk, etc. So the device core has not been able to probe
>>>> it. However, IEEE 802.3 clause 22 defines a minimum set of registers a
>>>> PHY should support. And most PHY devices have this minimum. So there
>>>> is a fall back driver, the generic PHY driver. It assumes the minimum
>>>> registers are available, and does its best to drive the hardware. It
>>>> often works, but not always. So if the MAC asks phylib to connect to a
>>>> PHY which does not have a driver, we forcefully bind the generic
>>>> driver to the device, and hope for the best.
>>>
>>> Thanks for the detailed answer Andrew! I think it gives me enough
>>> info/context to come up with a proper fix.
>>>
>>>> We don't actually recommend using the generic driver. Use the specific
>>>> driver for the hardware. But the generic driver can at least get you
>>>> going, allow you to scp the correct driver onto the system, etc.
>>>
>>> I'm not sure if I can control what driver they use. If I can fix this
>>> warning, I'll probably try to do that.
>>>
>> The genphy driver is a last resort, at least they lose functionality like
>> downshift detection and control. Therefore they should go with the
>> dedicated Marvell PHY driver.
>>
>> But right, this avoids the warning, but the underlying issue (probably
>> in device_bind_driver()) still exists. Would be good if you can fix it.
>
> Yeah, I plan to fix this. So I have a few more questions. In the
> example I gave, what should happen if the gpios listed in the phy's DT
> node aren't ready yet? The generic phy driver itself probably isn't
> using any GPIO? But will the phy work without the GPIO hardware being
> initialized? The reason I'm asking this question is, if the phy is
> linked to a supplier and the supplier is not ready, should the
> device_bind_driver() succeed or not?
>

There may be situations where the gpio is used for the PHY reset and
default state is "reset assigned". If we can't control the reset signal
then PHY isn't usable. Therefore I'm inclined to say we should not
succeed. Let's see which opinions Andrew and Russell have.

However I have a little bit of a hard time to imagine how this scenario
can happen. device_bind_driver(), as part of phy_attach_direct(),
is typically called from ndo_open() of the net device, like in
your stmmac case. Means user space would open the network interface
before the gpio controller has even been probed.

> -Saravana
>

2021-02-11 10:29:41

by Jon Hunter

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()


On 10/02/2021 22:56, Andrew Lunn wrote:
> On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
>> Hi,
>>
>> This email was triggered by this other email[1].
>
> And it appears the Tegra194 Jetson Xavier uses the Marvell 88E1512
> PHY. So ensure the Marvell driver is available, and it should get
> probed in the usual way, the fallback driver will not be needed.


Yes that is correct. Enabling the Marvell PHY does fix this indeed and
so I can enable that as part of our testsuite. We were seeing the same
warning on Tegra186 Jetson TX2 and enabling the BRCM PHY resolves that
as well. I will ensure that these are enabled going forward.

Cheers
Jon

--
nvpublic

2021-02-11 14:19:15

by Andrew Lunn

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

> Yeah, I plan to fix this. So I have a few more questions. In the
> example I gave, what should happen if the gpios listed in the phy's DT
> node aren't ready yet?

There are four different use cases for GPIO.

1) The GPIO is used to reset all devices on the MDIO bus. When the bus
is registered with the core, the core will try to get this GPIO. If we
get EPROBE_DEFER, the registration of the bus is deferred and tried
again later. If the MAC driver tries to get the PHY device before the
MDIO bus is enumerated, it should also get EPROBE_DEFER, and in the
end everything should work.

2) The GPIO is for a specific PHY. Here we have an oddity in the
code. If the PHY responds to bus enumeration, before we start doing
anything with the reset GPIO, it will be discovered on the bus. At
this point, we try to get the GPIO. If that fails with EPROBE_DEFER,
all the PHYs on the bus are unregistered, and the bus registration
process fails with EPROBE_DEFER.

3) The GPIO is for a specific PHY. However, the device does not
respond to enumeration, because it is held in reset. You can get
around this by placing the ID values into device tree. The bus is
first enumerated in the normal way. And then devices which are listed
in DT, but have not been found, and have ID registers are registered
to the bus. This follows pretty much the same path as for a device
which is discovered. Before the device is registered with the device
core, we get the GPIOs, and handle the EPROBE_DEFER, unwinding
everything.

4) The GPIO does not use the normal name in DT. Or the PHY has some
other resource, which phylib does nothing with. The driver specific to
the hardware has code to handle the resource. It should try to get
those resources during probe. If probe returns EPROBE_DEFER, the probe
will be retried later. And when the MAC driver tries to find the PHY,
it should also get EPROBE_DEFER.

In case 4, the fallback driver has no idea about these PHY devices
specific properties. They are not part of 802.3 clause 22. So it will
ignore them. Probably the PHY will not work, because it is missing a
reset, or a clock, or a regulator. But we don't really care about
that. In order that the DT was accepted into the kernel, there must be
a device specific driver which uses those properties. So the kernel
installation is broken, that hardware specific driver is missing.

Andrew

2021-02-11 14:23:50

by Andrew Lunn

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On Thu, Feb 11, 2021 at 10:21:03AM +0000, Jon Hunter wrote:
>
> On 10/02/2021 22:56, Andrew Lunn wrote:
> > On Wed, Feb 10, 2021 at 02:13:48PM -0800, Saravana Kannan wrote:
> >> Hi,
> >>
> >> This email was triggered by this other email[1].
> >
> > And it appears the Tegra194 Jetson Xavier uses the Marvell 88E1512
> > PHY. So ensure the Marvell driver is available, and it should get
> > probed in the usual way, the fallback driver will not be needed.
>
>
> Yes that is correct. Enabling the Marvell PHY does fix this indeed and
> so I can enable that as part of our testsuite. We were seeing the same
> warning on Tegra186 Jetson TX2 and enabling the BRCM PHY resolves that
> as well. I will ensure that these are enabled going forward.

Hi Jon

As an added bonus, you might of gained an additional HWMON temperature
sensor for the PHY, some PHY statistics, and maybe cable diagnostics.
Just by using the correct driver for the hardware.

Andrew

2021-02-12 03:46:55

by Saravana Kannan

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

On Thu, Feb 11, 2021 at 5:57 AM Andrew Lunn <[email protected]> wrote:
>
> > Yeah, I plan to fix this. So I have a few more questions. In the
> > example I gave, what should happen if the gpios listed in the phy's DT
> > node aren't ready yet?
>
> There are four different use cases for GPIO.
>
> 1) The GPIO is used to reset all devices on the MDIO bus. When the bus
> is registered with the core, the core will try to get this GPIO. If we
> get EPROBE_DEFER, the registration of the bus is deferred and tried
> again later. If the MAC driver tries to get the PHY device before the
> MDIO bus is enumerated, it should also get EPROBE_DEFER, and in the
> end everything should work.
>
> 2) The GPIO is for a specific PHY. Here we have an oddity in the
> code. If the PHY responds to bus enumeration, before we start doing
> anything with the reset GPIO, it will be discovered on the bus. At
> this point, we try to get the GPIO. If that fails with EPROBE_DEFER,
> all the PHYs on the bus are unregistered, and the bus registration
> process fails with EPROBE_DEFER.
>
> 3) The GPIO is for a specific PHY. However, the device does not
> respond to enumeration, because it is held in reset. You can get
> around this by placing the ID values into device tree. The bus is
> first enumerated in the normal way. And then devices which are listed
> in DT, but have not been found, and have ID registers are registered
> to the bus. This follows pretty much the same path as for a device
> which is discovered. Before the device is registered with the device
> core, we get the GPIOs, and handle the EPROBE_DEFER, unwinding
> everything.
>
> 4) The GPIO does not use the normal name in DT. Or the PHY has some
> other resource, which phylib does nothing with. The driver specific to
> the hardware has code to handle the resource. It should try to get
> those resources during probe. If probe returns EPROBE_DEFER, the probe
> will be retried later. And when the MAC driver tries to find the PHY,
> it should also get EPROBE_DEFER.
>
> In case 4, the fallback driver has no idea about these PHY devices
> specific properties. They are not part of 802.3 clause 22. So it will
> ignore them. Probably the PHY will not work, because it is missing a
> reset, or a clock, or a regulator. But we don't really care about
> that. In order that the DT was accepted into the kernel, there must be
> a device specific driver which uses those properties. So the kernel
> installation is broken, that hardware specific driver is missing.

Thanks! I don't know anything about mdio (other than the generic bus
stuff) or "MAC driver" (except for "MAC address"). So I had to read
this multiple times and I think I finally got it at a high level. So,
to summarize it and ignoring case 4, the phy device would never get
added to driver core before all it's required resources are available
just because of how it's part of an ethernet controller/mdio bus. So
by the time we force bind a PHY to the generic driver, all the
required resources should already be set up and work with the generic
driver.

So the plan to fix this warning is, when device_bind_driver() is called:
1. Delete all device links from the device (in this case, the PHY) to
suppliers that haven't probed yet because there's no probe function
that can defer at this point.
2. Then call the usual device link status update code so that it
updates the status of the remaining device links correctly. This will
avoid the warning.

This seems like a generic solution that works for PHY and for any
device that is force bound.

Thanks for the help!

-Saravana

2021-02-12 14:07:06

by Andrew Lunn

[permalink] [raw]
Subject: Re: phy_attach_direct()'s use of device_bind_driver()

> So the plan to fix this warning is, when device_bind_driver() is called:
> 1. Delete all device links from the device (in this case, the PHY) to
> suppliers that haven't probed yet because there's no probe function
> that can defer at this point.

Just because it currently does not happen, does not mean it couldn't
happen in the future. What are the implications of removing the links?


> 2. Then call the usual device link status update code so that it
> updates the status of the remaining device links correctly. This will
> avoid the warning.
>
> This seems like a generic solution that works for PHY and for any
> device that is force bound.

I don't know if there is any other case in the kernel where a fallback
driver is force bound on a device. But i agree this should be
generic. And hidden away in the drive core, with maybe a new call?

Andrew