This prevents "No phy led trigger registered for speed(-1)"
alert message which is coused by phy_led_trigger_chage_speed()
being called during attaching phy to net_device where phy device
speed could be still unknown.
Signed-off-by: Ivan T. Ivanov <[email protected]>
---
drivers/net/phy/phy_led_triggers.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/phy/phy_led_triggers.c b/drivers/net/phy/phy_led_triggers.c
index f550576eb9da..4d6497c45ae4 100644
--- a/drivers/net/phy/phy_led_triggers.c
+++ b/drivers/net/phy/phy_led_triggers.c
@@ -33,7 +33,7 @@ void phy_led_trigger_change_speed(struct phy_device *phy)
if (!phy->link)
return phy_led_trigger_no_link(phy);
- if (phy->speed == 0)
+ if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN)
return;
plt = phy_speed_to_led_trigger(phy, phy->speed);
--
2.32.0
On Fri, Jul 16, 2021 at 05:11:42PM +0300, Ivan T. Ivanov wrote:
> This prevents "No phy led trigger registered for speed(-1)"
> alert message which is coused by phy_led_trigger_chage_speed()
> being called during attaching phy to net_device where phy device
> speed could be still unknown.
Hi Ivan
It seems odd that when attaching the PHY we have link, but not the
speed. What PHY is this?
> - if (phy->speed == 0)
> + if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN)
> return;
This change makes sense. But i'm wondering if the original logic is
sound. We have link, but no speed information. So the LED trigger is
left in an indeterminate state. Rather than a plain return, maybe
phy_led_trigger_no_link(phy) should be called?
Andrew
On Fri, Jul 16, 2021 at 09:32:00PM +0300, Ivan T. Ivanov wrote:
> Hi,
>
> Quoting Andrew Lunn (2021-07-16 18:19:58)
> > On Fri, Jul 16, 2021 at 05:11:42PM +0300, Ivan T. Ivanov wrote:
> > > This prevents "No phy led trigger registered for speed(-1)"
> > > alert message which is coused by phy_led_trigger_chage_speed()
> > > being called during attaching phy to net_device where phy device
> > > speed could be still unknown.
> >
> > Hi Ivan
> >
> > It seems odd that when attaching the PHY we have link, but not the
> > speed. What PHY is this?
>
> This is lan78xx on RPi3B+
>
> >
> > > - if (phy->speed == 0)
> > > + if (phy->speed == 0 || phy->speed == SPEED_UNKNOWN)
> > > return;
> >
> > This change makes sense. But i'm wondering if the original logic is
> > sound. We have link, but no speed information.
>
> Well, probably my interpretation was not correct. The most probable
> call to phy_led_trigger_change_speed() which couses this alert is
> phy_attach_direct() -> phy_led_triggers_register(), I think. I am
> not sure that we have link at this stage or not.
This does sound weird.
When a phy_device is allocated, it's explicitly initialised with:
dev->speed = SPEED_UNKNOWN;
dev->duplex = DUPLEX_UNKNOWN;
dev->link = 0;
dev->state = PHY_DOWN;
so, unless something is causing state to be read before we've attached
the phy to a network device, this is how this state should remain. I
wonder why you are seeing dev->link be non-zero.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote:
> I have sent new patch[1] which I think is proper fix for this.
>
> [1] https://lore.kernel.org/netdev/[email protected]/T/#u
Thanks.
I haven't reviewed the driver, but the patch itself LGTM from the
point of view that phy_read_status() should definitely only be
called with phydev->lock held.
I think we also need the "Doing it all yourself" section in
Documentation/networking/phy.rst fixed to specify that if you
call this function, you must be holding phydev->lock.
Lastly, I'm wondering how many other places call phy_read_status()
without holding phydev->lock - sounds like something that needs a
kernel-wide review, and then possibly we should introduce a lockdep
check for this in phy_read_status() to catch any new introductions.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
> I think will be easier if we protect public phylib API internally with
> lock, otherwise it is easy to make mistakes, obviously. But still this
> will not protect users which directly dereference phy_device members.
Anybody directly dereference phy_device members, rather than going
through the API, is directly responsible for any breakage they
cause. This is not a supported use case.
Andrew
On Wed, Aug 11, 2021 at 12:51:04PM +0300, Ivan T. Ivanov wrote:
> There are a few callers of this, but then we have a few callers of
> genphy_read_status() too, which are outside just implementing
> phy_driver->read_status() and don't use locking.
I think we need to strongly discourage people using the genphy*
functions directly from anything except phylib driver code. Any
such use is a layering violation.
I think it does make sense to have a set of lower-level API
functions that do the locking necessary, rather than having the
phylib locking spread out across multiple network drivers. It's
better to have it in one place.
> Then there a few users of phy_init_eee() which uses phy_read_status(),
> again without locking.
That is kind-of a special case - phy_init_eee() can be called by
network drivers from within the phylib adjust_link() callback, and
this callback is made while holding the phydev's lock. So those
cases are safe.
If phy_init_eee() is called outside of that, and the lock isn't
taken, then yes, it's buggy.
This all said, I can't say that I have particularly liked the
phy_init_eee() API. It seems to mix interface-up like configuration
(do we wish clocks to stop in EEE) with working out whether EEE
should be enabled for the speed/duplex that we've just read from
the PHY. However, most users of this function are being called as a
result of a link-up event when we already know the link parameters,
so we shouldn't be re-interrogating the PHY at this point. It seems
to me to be entirely unnecessary.
> I think will be easier if we protect public phylib API internally with
> lock, otherwise it is easy to make mistakes, obviously. But still this
> will not protect users which directly dereference phy_device members.
As Andrew says... but there are some members that network drivers
have to access due to the design of phylib, such as speed, duplex,
*pause, link, etc. Indeed these can change at any time when
phydev->lock is not held due to the action of phylib polling or
link interrupts from the PHY. So, accessing them outside of the
adjust_link() callback without holding the lock is racy.
Note that phylink's usage is similarly safe to the adjust_link()
callback; its access to these members is similarly protected by
phydev->lock taken in the phylib state machine - we use the
slightly lower-level phy_link_change() hook which avoids some of
the help that phylib provides to network drivers (which phylink
really doesn't want phylib managing.)
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Aug 09, 2021 at 03:16:33PM +0100, Russell King (Oracle) wrote:
> On Wed, Aug 04, 2021 at 11:33:10AM +0300, Ivan T. Ivanov wrote:
> > I have sent new patch[1] which I think is proper fix for this.
> >
> > [1] https://lore.kernel.org/netdev/[email protected]/T/#u
>
> Thanks.
>
> I haven't reviewed the driver, but the patch itself LGTM from the
> point of view that phy_read_status() should definitely only be
> called with phydev->lock held.
I'm cooking up a patchset which makes phy_read_status() take the
lock. I don't see any external callers taking the lock, so all the
changes are internal to phylib.
The change is however made a bit more complex by phy_read_status()
being in a header file, not phy.c. I wounder if there is some build
dependencies, modules vs built in. So my first patch simply moves it
into phy.c no other change. I will push it to github and let 0-day
chew on it for a while and see if it finds any build failures.
Andrew