2020-06-17 15:38:17

by Dejin Zheng

[permalink] [raw]
Subject: [PATCH net v1] net: phy: smsc: fix printing too many logs

Commit 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout()
to simplify the code") will print a lot of logs as follows when Ethernet
cable is not connected:

[ 4.473105] SMSC LAN8710/LAN8720 2188000.ethernet-1:00: lan87xx_read_status failed: -110

So fix it by read_poll_timeout().

Fixes: 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout() to simplify the code")
Reported-by: Kevin Groeneveld <[email protected]>
Signed-off-by: Dejin Zheng <[email protected]>
---
drivers/net/phy/smsc.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index 93da7d3d0954..36c5a57917b8 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -123,9 +123,10 @@ static int lan87xx_read_status(struct phy_device *phydev)
return rc;

/* Wait max 640 ms to detect energy */
- phy_read_poll_timeout(phydev, MII_LAN83C185_CTRL_STATUS, rc,
- rc & MII_LAN83C185_ENERGYON, 10000,
- 640000, true);
+ read_poll_timeout(phy_read, rc,
+ rc & MII_LAN83C185_ENERGYON || rc < 0,
+ 10000, 640000, true, phydev,
+ MII_LAN83C185_CTRL_STATUS);
if (rc < 0)
return rc;

--
2.25.0


2020-06-17 16:22:07

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH net v1] net: phy: smsc: fix printing too many logs

On Wed, Jun 17, 2020 at 11:33:40PM +0800, Dejin Zheng wrote:
> Commit 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout()
> to simplify the code") will print a lot of logs as follows when Ethernet
> cable is not connected:
>
> [ 4.473105] SMSC LAN8710/LAN8720 2188000.ethernet-1:00: lan87xx_read_status failed: -110
>
> So fix it by read_poll_timeout().

Do you have a more detailed explanation of what is going on here?

After a lot of thought, i think i can see how this happens. But the
commit message should really spell out why this is the correct fix.

Thanks
Andrew

2020-06-17 17:53:35

by Dejin Zheng

[permalink] [raw]
Subject: Re: [PATCH net v1] net: phy: smsc: fix printing too many logs

On Wed, Jun 17, 2020 at 06:19:25PM +0200, Andrew Lunn wrote:
> On Wed, Jun 17, 2020 at 11:33:40PM +0800, Dejin Zheng wrote:
> > Commit 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout()
> > to simplify the code") will print a lot of logs as follows when Ethernet
> > cable is not connected:
> >
> > [ 4.473105] SMSC LAN8710/LAN8720 2188000.ethernet-1:00: lan87xx_read_status failed: -110
> >
> > So fix it by read_poll_timeout().
>
> Do you have a more detailed explanation of what is going on here?
>
> After a lot of thought, i think i can see how this happens. But the
> commit message should really spell out why this is the correct fix.
>
Hi Andrew:

Kevin report a bug for me in link[1], I check the Commit 7ae7ad2f11ef47
("net: phy: smsc: use phy_read_poll_timeout() to simplify the code") and
found it change the original behavior in smsc driver. It does not has
any error message whether it is timeout or phy_read fails, but this Commit
will changed it and will print some error messages by
phy_read_poll_timeout() when it is timeout or phy_read fails. so use the
read_poll_timeout() to replace phy_read_poll_timeout() to fix this
issue. the read_poll_timeout() does not print any log when it goes
wrong.

the original codes is that:

/* Wait max 640 ms to detect energy */
for (i = 0; i < 64; i++) {
/* Sleep to allow link test pulses to be sent */
msleep(10);
rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
if (rc < 0)
return rc;
if (rc & MII_LAN83C185_ENERGYON)
break;
}

Commit 7ae7ad2f11ef47 modify it as this:

phy_read_poll_timeout(phydev, MII_LAN83C185_CTRL_STATUS,
rc & MII_LAN83C185_ENERGYON, 10000,
640000, true);
if (rc < 0)
return rc;

the phy_read_poll_timeout() will print a error log by phydev_err()
when timeout or rc < 0. read_poll_timeout() just return timeout
error and does not print any error log.

#define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
timeout_us, sleep_before_read) \
({ \
int __ret = read_poll_timeout(phy_read, val, (cond) || val < 0, \
sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
if (val < 0) \
__ret = val; \
if (__ret) \
phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
__ret; \
})

So use read_poll_timeout Use read_poll_timeout() to be consistent with the
original code.

link[1]: https://lkml.org/lkml/2020/6/1/1408

> Thanks
> Andrew

2020-06-17 18:48:07

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH net v1] net: phy: smsc: fix printing too many logs

On Thu, Jun 18, 2020 at 01:50:39AM +0800, Dejin Zheng wrote:
> On Wed, Jun 17, 2020 at 06:19:25PM +0200, Andrew Lunn wrote:
> > On Wed, Jun 17, 2020 at 11:33:40PM +0800, Dejin Zheng wrote:
> > > Commit 7ae7ad2f11ef47 ("net: phy: smsc: use phy_read_poll_timeout()
> > > to simplify the code") will print a lot of logs as follows when Ethernet
> > > cable is not connected:
> > >
> > > [ 4.473105] SMSC LAN8710/LAN8720 2188000.ethernet-1:00: lan87xx_read_status failed: -110
> > >
> > > So fix it by read_poll_timeout().
> >
> > Do you have a more detailed explanation of what is going on here?
> >
> > After a lot of thought, i think i can see how this happens. But the
> > commit message should really spell out why this is the correct fix.
> >
> Hi Andrew:
>
> Kevin report a bug for me in link[1], I check the Commit 7ae7ad2f11ef47
> ("net: phy: smsc: use phy_read_poll_timeout() to simplify the code") and
> found it change the original behavior in smsc driver. It does not has
> any error message whether it is timeout or phy_read fails, but this Commit
> will changed it and will print some error messages by
> phy_read_poll_timeout() when it is timeout or phy_read fails. so use the
> read_poll_timeout() to replace phy_read_poll_timeout() to fix this
> issue. the read_poll_timeout() does not print any log when it goes
> wrong.
>
> the original codes is that:
>
> /* Wait max 640 ms to detect energy */
> for (i = 0; i < 64; i++) {
> /* Sleep to allow link test pulses to be sent */
> msleep(10);
> rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
> if (rc < 0)
> return rc;
> if (rc & MII_LAN83C185_ENERGYON)
> break;
> }
>
> Commit 7ae7ad2f11ef47 modify it as this:
>
> phy_read_poll_timeout(phydev, MII_LAN83C185_CTRL_STATUS,
> rc & MII_LAN83C185_ENERGYON, 10000,
> 640000, true);
> if (rc < 0)
> return rc;
>
> the phy_read_poll_timeout() will print a error log by phydev_err()
> when timeout or rc < 0. read_poll_timeout() just return timeout
> error and does not print any error log.
>
> #define phy_read_poll_timeout(phydev, regnum, val, cond, sleep_us, \
> timeout_us, sleep_before_read) \
> ({ \
> int __ret = read_poll_timeout(phy_read, val, (cond) || val < 0, \
> sleep_us, timeout_us, sleep_before_read, phydev, regnum); \
> if (val < 0) \
> __ret = val; \
> if (__ret) \
> phydev_err(phydev, "%s failed: %d\n", __func__, __ret); \
> __ret; \
> })
>
> So use read_poll_timeout Use read_poll_timeout() to be consistent with the
> original code.

You have explained what the change does. But not why it is
needed. What exactly is happening. To me, the key thing is
understanding why we get -110, and why it is not an actual error we
should be reporting as an error. That is what needs explaining.

And once that is understood, i think we might be looking at a
different solution.

Andrew

2020-06-17 20:28:09

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net v1] net: phy: smsc: fix printing too many logs

On Wed, Jun 17, 2020 at 08:43:34PM +0200, Andrew Lunn wrote:
> You have explained what the change does. But not why it is
> needed. What exactly is happening. To me, the key thing is
> understanding why we get -110, and why it is not an actual error we
> should be reporting as an error. That is what needs explaining.

The patch author really ought to be explaining this... but let me
have a go. It's worth pointing out that the comments in the file
aren't good English either, so don't really describe what is going
on.

When this PHY is in EDPD mode, it doesn't always detect a connected
cable. The workaround for it involves, when the link is down, and
at each read_status() call:

- disable EDPD mode, forcing the PHY out of low-power mode
- waiting 640ms to see if we have any energy detected from the media
- re-enable entry to EDPD mode

This is presumably enough to allow the PHY to notice that a cable is
connected, and resume normal operations to negotiate with the partner.

The problem is that when no media is detected, the 640ms wait times
out (as it should, we don't want to wait forever) and the kernel
prints a warning.

This bug was introduced by an inappropriate conversion of:

/* Wait max 640 ms to detect energy */
- for (i = 0; i < 64; i++) {
- /* Sleep to allow link test pulses to be sent */
- msleep(10);
- rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
- if (rc < 0)
- return rc;
- if (rc & MII_LAN83C185_ENERGYON)
- break;
- }

to phy_read_poll_timeout() in the belief that it was "cleaning up"
the code, but it actually results in a functional change of printing
an error at the end of the 640ms window which wasn't there before.
The patch that does this even states that it's about "simplifying"
the code, yet it introduced a bug by doing so - that being the
extra kernel log message.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2020-06-17 21:38:59

by Andrew Lunn

[permalink] [raw]
Subject: Re: [PATCH net v1] net: phy: smsc: fix printing too many logs

On Wed, Jun 17, 2020 at 09:24:50PM +0100, Russell King - ARM Linux admin wrote:
> On Wed, Jun 17, 2020 at 08:43:34PM +0200, Andrew Lunn wrote:
> > You have explained what the change does. But not why it is
> > needed. What exactly is happening. To me, the key thing is
> > understanding why we get -110, and why it is not an actual error we
> > should be reporting as an error. That is what needs explaining.
>
> The patch author really ought to be explaining this... but let me
> have a go. It's worth pointing out that the comments in the file
> aren't good English either, so don't really describe what is going
> on.
>
> When this PHY is in EDPD mode, it doesn't always detect a connected
> cable. The workaround for it involves, when the link is down, and
> at each read_status() call:
>
> - disable EDPD mode, forcing the PHY out of low-power mode
> - waiting 640ms to see if we have any energy detected from the media
> - re-enable entry to EDPD mode
>
> This is presumably enough to allow the PHY to notice that a cable is
> connected, and resume normal operations to negotiate with the partner.
>
> The problem is that when no media is detected, the 640ms wait times
> out (as it should, we don't want to wait forever) and the kernel
> prints a warning.

Hi Russell

Yes, that is what i was thinking.

There probably should be a comment added just to prevent somebody
swapping it back to phy_read_poll_timeout(). It is not clear that
-ETIMEOUT is expected under some conditions.

Andrew

2020-06-18 15:13:08

by Dejin Zheng

[permalink] [raw]
Subject: Re: [PATCH net v1] net: phy: smsc: fix printing too many logs

On Wed, Jun 17, 2020 at 11:36:42PM +0200, Andrew Lunn wrote:
> On Wed, Jun 17, 2020 at 09:24:50PM +0100, Russell King - ARM Linux admin wrote:
> > On Wed, Jun 17, 2020 at 08:43:34PM +0200, Andrew Lunn wrote:
> > > You have explained what the change does. But not why it is
> > > needed. What exactly is happening. To me, the key thing is
> > > understanding why we get -110, and why it is not an actual error we
> > > should be reporting as an error. That is what needs explaining.
> >
> > The patch author really ought to be explaining this... but let me
> > have a go. It's worth pointing out that the comments in the file
> > aren't good English either, so don't really describe what is going
> > on.
> >
> > When this PHY is in EDPD mode, it doesn't always detect a connected
> > cable. The workaround for it involves, when the link is down, and
> > at each read_status() call:
> >
> > - disable EDPD mode, forcing the PHY out of low-power mode
> > - waiting 640ms to see if we have any energy detected from the media
> > - re-enable entry to EDPD mode
> >
> > This is presumably enough to allow the PHY to notice that a cable is
> > connected, and resume normal operations to negotiate with the partner.
> >
> > The problem is that when no media is detected, the 640ms wait times
> > out (as it should, we don't want to wait forever) and the kernel
> > prints a warning.
>
> Hi Russell
>
> Yes, that is what i was thinking.
>
> There probably should be a comment added just to prevent somebody
> swapping it back to phy_read_poll_timeout(). It is not clear that
> -ETIMEOUT is expected under some conditions.
>
> Andrew

Hi Andrew and Russell,

First of all, thanks very much for your comment!

I did not understand Andrew's comments correctly in the previous email,
sorry for my bad English. I found something in the commit 776829de90c597
("net: phy: workaround for buggy cable detection by LAN8700 after cable
plugging") about why it is not an actual error when get a timeout error
in this driver. the commit's link is here:

https://lore.kernel.org/patchwork/patch/590285/

As Russell said, it just for fix a hardware bug on LAN8700 device by wait
640ms, so it should not as an actual error when got timeout.

The following is my understanding:

It leads to unstable detection of plugging in Ethernet cable when LAN87xx
is in Energy Detect Power-Down mode. Because the ENERGYON bit is not set.

For fix this issue, it will disables Energy Detect Power-Down mode and
check ENERGYON bit to waiting for response on link pulses to detect
presence of plugged Ethernet cable in the 640ms. if got the ENERGYON bit
is set, I guess the cable plug-in event was detected and will report an
interrupt after exit lan87xx_read_status() funtion, if the ENERGYON bit
is not set, the plug-in event was not detected. after that, entry Energy
Detect Power-Down mode again.

it will re-call lan87xx_read_status() funtion by interrupt when got the
ENERGYON bit is set. and check link status again. then, It will get a
stable link status for LAN87xx device.

So the timeout for wait 640ms should not as an actual error.

Thanks!

Dejin