The negotiation of flow control / pause frame modes has been broken since
commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
genphy_read_lpa()") moved the setting of phydev->duplex below the
phy_resolve_aneg_pause() call. Due to a check of DUPLEX_FULL in that
function, phydev->pause was no longer set.

Fix it by moving the parsing of the status variable before the blocks
dealing with the pause frames.

Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
Cc: [email protected] # v5.6+
Signed-off-by: Clemens Gruber <[email protected]>
---
drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
1 file changed, 22 insertions(+), 22 deletions(-)
diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 4714ca0e0d4b..02cde4c0668c 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
 	int lpa;
 	int err;
 
+	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
+		return 0;
+
+	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
+		phydev->duplex = DUPLEX_FULL;
+	else
+		phydev->duplex = DUPLEX_HALF;
+
+	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
+	case MII_M1011_PHY_STATUS_1000:
+		phydev->speed = SPEED_1000;
+		break;
+
+	case MII_M1011_PHY_STATUS_100:
+		phydev->speed = SPEED_100;
+		break;
+
+	default:
+		phydev->speed = SPEED_10;
+		break;
+	}
+
 	if (!fiber) {
 		err = genphy_read_lpa(phydev);
 		if (err < 0)
@@ -1291,28 +1313,6 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
 		}
 	}
 
-	if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
-		return 0;
-
-	if (status & MII_M1011_PHY_STATUS_FULLDUPLEX)
-		phydev->duplex = DUPLEX_FULL;
-	else
-		phydev->duplex = DUPLEX_HALF;
-
-	switch (status & MII_M1011_PHY_STATUS_SPD_MASK) {
-	case MII_M1011_PHY_STATUS_1000:
-		phydev->speed = SPEED_1000;
-		break;
-
-	case MII_M1011_PHY_STATUS_100:
-		phydev->speed = SPEED_100;
-		break;
-
-	default:
-		phydev->speed = SPEED_10;
-		break;
-	}
-
 	return 0;
 }
--
2.26.0
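
The ordering problem described in the commit message can be reproduced in a small userspace model. The sketch below is not kernel code: struct phy_model, the MODEL_DUPLEX_* values and all model_* functions are hypothetical stand-ins, and the resolve helper only imitates the DUPLEX_FULL check that the commit message attributes to phy_resolve_aneg_pause().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's DUPLEX_* values (assumed here). */
enum model_duplex {
	MODEL_DUPLEX_HALF = 0,
	MODEL_DUPLEX_FULL = 1,
	MODEL_DUPLEX_UNKNOWN = 0xff,
};

/* Hypothetical, cut-down replacement for struct phy_device. */
struct phy_model {
	int duplex;
	bool lp_pause;		/* link partner advertised Pause */
	bool lp_asym_pause;	/* link partner advertised Asym_Pause */
	bool pause;
	bool asym_pause;
};

/* Imitates the duplex check described for phy_resolve_aneg_pause(). */
static void model_resolve_aneg_pause(struct phy_model *dev)
{
	assert(dev != NULL);

	if (dev->duplex == MODEL_DUPLEX_FULL) {
		dev->pause = dev->lp_pause;
		dev->asym_pause = dev->lp_asym_pause;
	}
}

/* Order after fcf1f59afc67: pause is resolved before duplex is known. */
static void model_read_status_old_order(struct phy_model *dev, bool full_duplex)
{
	model_resolve_aneg_pause(dev);
	dev->duplex = full_duplex ? MODEL_DUPLEX_FULL : MODEL_DUPLEX_HALF;
}

/* Order restored by the patch: duplex first, then pause resolution. */
static void model_read_status_new_order(struct phy_model *dev, bool full_duplex)
{
	dev->duplex = full_duplex ? MODEL_DUPLEX_FULL : MODEL_DUPLEX_HALF;
	model_resolve_aneg_pause(dev);
}

/* Start state: duplex unknown, partner advertises symmetric pause only. */
static struct phy_model model_fresh(void)
{
	struct phy_model dev = {
		.duplex = MODEL_DUPLEX_UNKNOWN,
		.lp_pause = true,
	};

	return dev;
}
```

With the old order the model ends up in full duplex but with pause still false; with the new order pause follows the link partner's advertisement.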
On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> The negotiation of flow control / pause frame modes was broken since
> commit fcf1f59afc67 ("net: phy: marvell: rearrange to use
> genphy_read_lpa()") moved the setting of phydev->duplex below the
> phy_resolve_aneg_pause call. Due to a check of DUPLEX_FULL in that
> function, phydev->pause was no longer set.
>
> Fix it by moving the parsing of the status variable before the blocks
> dealing with the pause frames.
>
> Fixes: fcf1f59afc67 ("net: phy: marvell: rearrange to use genphy_read_lpa()")
> Cc: [email protected] # v5.6+

nit: please don't CC stable on networking patches

> Signed-off-by: Clemens Gruber <[email protected]>
> ---
> drivers/net/phy/marvell.c | 44 +++++++++++++++++++--------------------
> 1 file changed, 22 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 4714ca0e0d4b..02cde4c0668c 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -1263,6 +1263,28 @@ static int marvell_read_status_page_an(struct phy_device *phydev,
> int lpa;
> int err;
>
> + if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> + return 0;

If we return early here won't we miss updating the advertising bits?
We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().

Perhaps extracting info from status should be moved to a helper so we
can return early without affecting the rest of the flow?

Is my understanding correct? Russell?

On Fri, Apr 10, 2020 at 05:43:04PM -0700, Jakub Kicinski wrote:
> On Wed, 8 Apr 2020 23:43:26 +0200 Clemens Gruber wrote:
> > + if (!(status & MII_M1011_PHY_STATUS_RESOLVED))
> > + return 0;
>
> If we return early here won't we miss updating the advertising bits?
> We will no longer call e.g. fiber_lpa_mod_linkmode_lpa_t().
>
> Perhaps extracting info from status should be moved to a helper so we
> can return early without affecting the rest of the flow?
>
> Is my understanding correct? Russell?

You are correct - and yes, there is also a problem here.

It is not clear whether the resolved bit is set before or after the
link status reports that link is up - however, the resolved bit
indicates whether the speed and duplex are valid.

What I've done elsewhere is if the resolved bit is not set, then we
force phydev->link to be false, so we don't attempt to process a
link-up status until we can read the link parameters. I think that's
what needs to happen here, i.o.w.:

	if (!(status & MII_M1011_PHY_STATUS_RESOLVED)) {
		phydev->link = 0;
		return 0;
	}

especially as we're not reading the LPA.
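
For illustration, the early return above could be combined with the helper idea raised earlier in the thread. The following is only a userspace sketch under stated assumptions: struct link_model, model_parse_status() and the MODEL_STATUS_* bit values are hypothetical and chosen for the example, not taken from the driver or the datasheet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit layout assumed for this example only. */
#define MODEL_STATUS_1000	0x8000
#define MODEL_STATUS_100	0x4000
#define MODEL_STATUS_SPD_MASK	0xc000
#define MODEL_STATUS_FULLDUPLEX	0x2000
#define MODEL_STATUS_RESOLVED	0x0800

/* Hypothetical, cut-down replacement for struct phy_device. */
struct link_model {
	int link;		/* 1 = up, 0 = down */
	int speed;		/* Mbit/s */
	int full_duplex;	/* 1 = full, 0 = half */
};

/*
 * Decode speed and duplex from a status word.
 * Returns 1 if the values were decoded, 0 if the resolved bit was clear.
 * In the latter case the link is forced down and speed/duplex are left
 * untouched, so no unresolved data is reported with a link-up state.
 */
static int model_parse_status(struct link_model *dev, uint16_t status)
{
	assert(dev != NULL);

	if (!(status & MODEL_STATUS_RESOLVED)) {
		dev->link = 0;
		return 0;
	}

	dev->full_duplex = !!(status & MODEL_STATUS_FULLDUPLEX);

	switch (status & MODEL_STATUS_SPD_MASK) {
	case MODEL_STATUS_1000:
		dev->speed = 1000;
		break;
	case MODEL_STATUS_100:
		dev->speed = 100;
		break;
	default:
		dev->speed = 10;
		break;
	}

	return 1;
}
```

A caller would treat a return value of 0 as "link down, skip the rest of the processing" and only go on to the link partner handling when 1 is returned.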
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
On Sat, Apr 11, 2020 at 10:17:05AM +0100, Russell King - ARM Linux admin wrote:
> You are correct - and yes, there is also a problem here.
>
> It is not clear whether the resolved bit is set before or after the
> link status reports that link is up - however, the resolved bit
> indicates whether the speed and duplex are valid.

I assumed that in the fiber case, the link status register won't be 1
until autonegotiation is complete. There is a part in the 88E1510
datasheet on page 57 [2.6.2], which says so but it's in the Fiber/Copper
Auto-Selection chapter and I am not sure if that's true in general. (?)

(For copper, we call genphy_update_link, which sets phydev->link to 0 if
autoneg is enabled && !completed. And according to the datasheet,
the resolved bit is set when autonegotiation is completed || disabled)

TL/DR:
It's probably a good idea to force link to 0 to be sure, as you
suggested below. I will send a v2 with that change.

Moving the extraction of info to a helper is probably better left to a
separate patch?

> What I've done elsewhere is if the resolved bit is not set, then we
> force phydev->link to be false, so we don't attempt to process a
> link-up status until we can read the link parameters. I think that's
> what needs to happen here, i.o.w.:
>
> if (!(status & MII_M1011_PHY_STATUS_RESOLVED)) {
> phydev->link = 0;
> return 0;
> }
>
> especially as we're not reading the LPA.

Thanks,
Clemens
On Sat, Apr 11, 2020 at 03:24:01PM +0200, Clemens Gruber wrote:
> I assumed that in the fiber case, the link status register won't be 1
> until autonegotiation is complete. There is a part in the 88E1510
> datasheet on page 57 [2.6.2], which says so but it's in the Fiber/Copper
> Auto-Selection chapter and I am not sure if that's true in general. (?)

The fiber code is IMHO very suspect; the decoding of the pause status
seems to be completely broken. However, I'm not sure whether anyone
actually uses that or not, so I've been trying not to touch it.

> (For copper, we call genphy_update_link, which sets phydev->link to 0 if
> autoneg is enabled && !completed. And according to the datasheet,
> the resolved bit is set when autonegotiation is completed || disabled)

The resolved bit indicates whether the resolution data is valid, which
will be set when autoneg is complete or autoneg is disabled. However,
the timing of the bit compared to the link status is not defined in the
datasheet - and that's the problem. If the link status bits report that
the link is up but the resolved bit is indicating that the resolution
is not valid, what do we do? Report potential garbage but link up to
the higher layers, or pretend that the link is down?

> TL/DR:
> It's probably a good idea to force link to 0 to be sure, as you
> suggested below. I will send a v2 with that change.
>
> Moving the extraction of info to a helper is probably better left to a
> separate patch?

I'm not sure what you're suggesting.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up
On Sat, Apr 11, 2020 at 02:43:44PM +0100, Russell King - ARM Linux admin wrote:
> The fiber code is IMHO very suspect; the decoding of the pause status
> seems to be completely broken. However, I'm not sure whether anyone
> actually uses that or not, so I've been trying not to touch it.
>
> > (For copper, we call genphy_update_link, which sets phydev->link to 0 if
> > autoneg is enabled && !completed. And according to the datasheet,
> > the resolved bit is set when autonegotiation is completed || disabled)
>
> The resolved bit indicates whether the resolution data is valid, which
> will be set when autoneg is complete or autoneg is disabled. However,
> the timing of the bit compared to the link status is not defined in the
> datasheet - and that's the problem. If the link status bits report that
> the link is up but the resolved bit is indicating that the resolution
> is not valid, what do we do? Report potential garbage but link up to
> the higher layers, or pretend that the link is down?

I see, thanks for the clarification. Pretending that the link is down
seems to be the right choice.

> > TL/DR:
> > It's probably a good idea to force link to 0 to be sure, as you
> > suggested below. I will send a v2 with that change.
> >
> > Moving the extraction of info to a helper is probably better left to a
> > separate patch?
>
> I'm not sure what you're suggesting.

I was referring to Jakub's suggestion to create a new helper function
for the parsing of the status register.

Clemens

On Sat, Apr 11, 2020 at 02:43:44PM +0100, Russell King - ARM Linux admin wrote:
> The fiber code is IMHO very suspect; the decoding of the pause status
> seems to be completely broken. However, I'm not sure whether anyone
> actually uses that or not, so I've been trying not to touch it.

If the following table for the link partner advertisement is correct..

  PAUSE  ASYM_PAUSE       MEANING
  0      0                Link partner has no pause frame support
  0      1           <-   Link partner can TX pause frames
  1      0           <->  Link partner can RX and TX pauses
  1      1           ->   Link partner can RX pause frames

..then I think both pause and asym_pause have to be assigned
independently, like this:

	phydev->pause = !!(lpa & LPA_1000XPAUSE);
	phydev->asym_pause = !!(lpa & LPA_1000XPAUSE_ASYM);

(Using the defines from uapi mii.h instead of the redundant/combined
LPA_PAUSE_FIBER etc., which can then be removed from marvell.c)

Currently, if LPA_1000XPAUSE_ASYM is set we do pause=1 and asym_pause=1
regardless of whether LPA_1000XPAUSE is set. This could lead us to mistake
a link partner who can only send for one who can only receive pause frames.
^ Was this the problem you meant?
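
To make the difference concrete, here is a small userspace comparison of the two decodings. It is a sketch only: the MODEL_LPA_* bit values, struct pause_model and both decode functions are assumptions made for this example, and model_decode_combined() merely follows the behaviour described in the paragraph above rather than quoting the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions assumed for this example only. */
#define MODEL_LPA_1000XPAUSE		0x0080
#define MODEL_LPA_1000XPAUSE_ASYM	0x0100

struct pause_model {
	int pause;
	int asym_pause;
};

/* Behaviour as described above: ASYM set implies pause=1, asym_pause=1. */
static struct pause_model model_decode_combined(uint16_t lpa)
{
	if (!(lpa & (MODEL_LPA_1000XPAUSE | MODEL_LPA_1000XPAUSE_ASYM)))
		return (struct pause_model){ 0, 0 };
	if (lpa & MODEL_LPA_1000XPAUSE_ASYM)
		return (struct pause_model){ 1, 1 };
	return (struct pause_model){ 1, 0 };
}

/* Proposed behaviour: each bit maps to exactly one member. */
static struct pause_model model_decode_independent(uint16_t lpa)
{
	struct pause_model res = {
		.pause = !!(lpa & MODEL_LPA_1000XPAUSE),
		.asym_pause = !!(lpa & MODEL_LPA_1000XPAUSE_ASYM),
	};

	return res;
}

/* The two decoders disagree only when ASYM is advertised without PAUSE. */
static void model_pause_demo(void)
{
	struct pause_model c = model_decode_combined(MODEL_LPA_1000XPAUSE_ASYM);
	struct pause_model i = model_decode_independent(MODEL_LPA_1000XPAUSE_ASYM);

	assert(c.pause == 1 && c.asym_pause == 1);
	assert(i.pause == 0 && i.asym_pause == 1);
}
```

For the other three rows of the table both functions return the same result, so only the "0 1" row would change behaviour.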

I saw that for the copper case and in other drivers, we first set the
ETHTOOL_LINK_MODE_(Asym_)Pause_BIT bit in lp_advertising and then set
phydev->(asym_)pause depending on the ETHTOOL_LINK_MODE_... bit.

Do you agree that we should also set the ETHTOOL_ bits in the fiber
case?
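
A rough userspace sketch of that two-step flow for the fiber case is shown below. Everything in it is hypothetical: the lp_advertising mask is modelled as a plain uint64_t, the MODEL_* bit numbers and LPA bit values are placeholders, and the model_* functions are not existing kernel helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder values, assumed for this example only. */
#define MODEL_LPA_1000XPAUSE		0x0080
#define MODEL_LPA_1000XPAUSE_ASYM	0x0100
#define MODEL_LINK_MODE_PAUSE_BIT	13
#define MODEL_LINK_MODE_ASYM_PAUSE_BIT	14

/* Set or clear one bit in a link-mode mask modelled as a uint64_t. */
static void model_mod_bit(uint64_t *mask, unsigned int bit, bool value)
{
	assert(mask != NULL);

	if (value)
		*mask |= UINT64_C(1) << bit;
	else
		*mask &= ~(UINT64_C(1) << bit);
}

/* Step 1: translate the fiber LPA pause bits into the link-mode mask. */
static void model_fiber_lpa_to_lp_advertising(uint64_t *lp_adv, uint16_t lpa)
{
	model_mod_bit(lp_adv, MODEL_LINK_MODE_PAUSE_BIT,
		      lpa & MODEL_LPA_1000XPAUSE);
	model_mod_bit(lp_adv, MODEL_LINK_MODE_ASYM_PAUSE_BIT,
		      lpa & MODEL_LPA_1000XPAUSE_ASYM);
}

/* Step 2: derive pause/asym_pause from the mask, for full duplex only. */
static void model_pause_from_lp_advertising(uint64_t lp_adv, bool full_duplex,
					    bool *pause, bool *asym_pause)
{
	assert(pause != NULL && asym_pause != NULL);

	if (!full_duplex)
		return;

	*pause = (lp_adv >> MODEL_LINK_MODE_PAUSE_BIT) & 1;
	*asym_pause = (lp_adv >> MODEL_LINK_MODE_ASYM_PAUSE_BIT) & 1;
}
```

Keeping the mask as the single source of truth means the reported link partner advertisement and the pause members cannot drift apart, which is the property the copper path appears to rely on.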

Does anybody have access to a Marvell PHY with 1000base-X Ethernet?
(I only have a 88E1510 + 1000Base-T at the home office)

Thanks,
Clemens

On Sun, Apr 12, 2020 at 07:03:36PM +0200, Clemens Gruber wrote:
> On Sat, Apr 11, 2020 at 02:43:44PM +0100, Russell King - ARM Linux admin wrote:
> > The fiber code is IMHO very suspect; the decoding of the pause status
> > seems to be completely broken. However, I'm not sure whether anyone
> > actually uses that or not, so I've been trying not to touch it.
>
> If the following table for the link partner advertisement is correct..
>
>   PAUSE  ASYM_PAUSE       MEANING
>   0      0                Link partner has no pause frame support
>   0      1           <-   Link partner can TX pause frames
>   1      0           <->  Link partner can RX and TX pauses
>   1      1           ->   Link partner can RX pause frames
>
> ..then I think both pause and asym_pause have to be assigned
> independently, like this:
> phydev->pause = !!(lpa & LPA_1000XPAUSE);
> phydev->asym_pause = !!(lpa & LPA_1000XPAUSE_ASYM);

Yes, that's how it should be, because the pause and asym pause bits
correspond exactly with the phydev members.

> (Using the defines from uapi mii.h instead of the redundant/combined
> LPA_PAUSE_FIBER etc. which can then be removed from marvell.c)
>
> Currently, if LPA_1000XPAUSE_ASYM is set we do pause=1 and asym_pause=1
> no matter if LPA_1000XPAUSE is set. This could lead us to mistake a link
> partner who can only send for one who can only receive pause frames.
> ^ Was this the problem you meant?

Exactly, but given that I've no way to actually test anything with
regard to 1G Marvell PHYs using 1000BASE-X, I have to assume that
whoever contributed this code tested it and it worked for them. So,
it should not be changed just because it looks wrong - there may be
some subtle issues in the hardware that we don't know about that
makes this code "do the best it can". We need someone who can
actually do some tests to solve this.

> Does anybody have access to a Marvell PHY with 1000base-X Ethernet?
> (I only have a 88E1510 + 1000Base-T at the home office)

Yes, that's what we need... this isn't the first time I've mentioned
the problem, and so far no one has stepped forward.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 10.2Mbps down 587kbps up