LinuxLists.cc - [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY

2020-05-30 21:47:34

Subject: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

From: Vladimir Oltean <[email protected]>

In kernel 4.19 (and probably earlier too) there are issues surrounding
the PHY_AN state.

For example, if a PHY is in PHY_AN state and AN has not finished, then
what is supposed to happen is that the state machine gets rescheduled
until it is, or until the link_timeout reaches zero which triggers an
autoneg restart process.

But actually the rescheduling never works if the PHY uses interrupts,
because the condition under which rescheduling occurs is just if
phy_polling_mode() is true. So basically, this whole rescheduling
functionality works for AN-not-yet-complete just by mistake. Let me
explain.

Most of the time the AN process manages to finish by the time the
interrupt has triggered. One might say "that should always be the case,
otherwise the PHY wouldn't raise the interrupt, right?".
Well, some PHYs implement an .aneg_done method which allows them to tell
the state machine when the AN is really complete.
The AR8031/AR8033 driver (at803x.c) is one such example. Even when
copper autoneg completes, the driver still keeps the "aneg_done"
variable unset until in-band SGMII autoneg finishes too (there is no
interrupt for that). So we have the premises of a race condition.

In practice, what really happens depends on the log level of the serial
console. If the log level is verbose enough that kernel messages related
to the Ethernet link state are printed to the console, then this gives
in-band AN enough time to complete, which means the link will come up
and everyone will be happy. But if the console is not that verbose, the
link will sometimes come up, and sometimes will be forced down by the
.aneg_done of the PHY driver (forever, since we are not rescheduling).

The conclusion is that an extra condition needs to be explicitly added,
so that the state machine can be rescheduled properly. Otherwise PHY
devices in interrupt mode will never work properly if they have an
.aneg_done callback.

In more recent kernels, the whole PHY_AN state was removed by Heiner
Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib
state machine" series here:

https://patchwork.ozlabs.org/cover/994464/

and the problem was just masked away instead of being addressed with a
punctual patch.

Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done")
Signed-off-by: Vladimir Oltean <[email protected]>
---
I'm not sure the procedure I'm following is correct, sending this
directly to Greg. The patch doesn't apply on net.

drivers/net/phy/phy.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index cc454b8c032c..ca4fd74fd2c8 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work)
struct delayed_work *dwork = to_delayed_work(work);
struct phy_device *phydev =
container_of(dwork, struct phy_device, state_queue);
- bool needs_aneg = false, do_suspend = false;
+ bool recheck = false, needs_aneg = false, do_suspend = false;
enum phy_state old_state;
int err = 0;
int old_link;
@@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work)
phy_link_up(phydev);
} else if (0 == phydev->link_timeout--)
needs_aneg = true;
+ else
+ recheck = true;
break;
case PHY_NOLINK:
if (!phy_polling_mode(phydev))
@@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work)
* PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
* between states from phy_mac_interrupt()
*/
- if (phy_polling_mode(phydev))
+ if (phy_polling_mode(phydev) || recheck)
queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
PHY_STATE_TIME * HZ);
}
--
2.25.1

2020-05-30 22:38:49

by Heiner Kallweit

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On 30.05.2020 23:43, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> In kernel 4.19 (and probably earlier too) there are issues surrounding
> the PHY_AN state.
>
> For example, if a PHY is in PHY_AN state and AN has not finished, then
> what is supposed to happen is that the state machine gets rescheduled
> until it is, or until the link_timeout reaches zero which triggers an
> autoneg restart process.
>
> But actually the rescheduling never works if the PHY uses interrupts,
> because the condition under which rescheduling occurs is just if
> phy_polling_mode() is true. So basically, this whole rescheduling
> functionality works for AN-not-yet-complete just by mistake. Let me
> explain.
>
> Most of the time the AN process manages to finish by the time the
> interrupt has triggered. One might say "that should always be the case,
> otherwise the PHY wouldn't raise the interrupt, right?".
> Well, some PHYs implement an .aneg_done method which allows them to tell
> the state machine when the AN is really complete.
> The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> copper autoneg completes, the driver still keeps the "aneg_done"
> variable unset until in-band SGMII autoneg finishes too (there is no
> interrupt for that). So we have the premises of a race condition.
>
That's not nice from the PHY:
It signals "link up", and if the system asks the PHY for link details,
then it sheepishly says "well, link is *almost* up".

Question would be whether the same happens with other SGMII-capable
PHY's so that we need to cater for this scenario in phylib.
Or whether we consider it a chip quirk. In the latter case a custom
read_status() handler might do the trick too: if link is reported
as up then wait until aneg is signaled as done too before reading
further link details.

And it's interesting that nobody else stumbled across this problem
before. I mean the PHY we talk about isn't really new. Or is your
use case so special?

> In practice, what really happens depends on the log level of the serial
> console. If the log level is verbose enough that kernel messages related
> to the Ethernet link state are printed to the console, then this gives
> in-band AN enough time to complete, which means the link will come up
> and everyone will be happy. But if the console is not that verbose, the
> link will sometimes come up, and sometimes will be forced down by the
> .aneg_done of the PHY driver (forever, since we are not rescheduling).
>
> The conclusion is that an extra condition needs to be explicitly added,
> so that the state machine can be rescheduled properly. Otherwise PHY
> devices in interrupt mode will never work properly if they have an
> .aneg_done callback.
>
> In more recent kernels, the whole PHY_AN state was removed by Heiner
> Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib
> state machine" series here:
>
> https://patchwork.ozlabs.org/cover/994464/
>
> and the problem was just masked away instead of being addressed with a
> punctual patch.
>
> Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done")
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---
> I'm not sure the procedure I'm following is correct, sending this
> directly to Greg. The patch doesn't apply on net.
>
> drivers/net/phy/phy.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> index cc454b8c032c..ca4fd74fd2c8 100644
> --- a/drivers/net/phy/phy.c
> +++ b/drivers/net/phy/phy.c
> @@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work)
> struct delayed_work *dwork = to_delayed_work(work);
> struct phy_device *phydev =
> container_of(dwork, struct phy_device, state_queue);
> - bool needs_aneg = false, do_suspend = false;
> + bool recheck = false, needs_aneg = false, do_suspend = false;
> enum phy_state old_state;
> int err = 0;
> int old_link;
> @@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work)
> phy_link_up(phydev);
> } else if (0 == phydev->link_timeout--)
> needs_aneg = true;
> + else
> + recheck = true;
> break;
> case PHY_NOLINK:
> if (!phy_polling_mode(phydev))
> @@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work)
> * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
> * between states from phy_mac_interrupt()
> */
> - if (phy_polling_mode(phydev))
> + if (phy_polling_mode(phydev) || recheck)
> queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
> PHY_STATE_TIME * HZ);
> }
>

2020-05-30 22:44:38

by Vladimir Oltean

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

Hi Heiner,

On Sun, 31 May 2020 at 01:36, Heiner Kallweit <[email protected]> wrote:
>
> On 30.05.2020 23:43, Vladimir Oltean wrote:
> > From: Vladimir Oltean <[email protected]>
> >
> > In kernel 4.19 (and probably earlier too) there are issues surrounding
> > the PHY_AN state.
> >
> > For example, if a PHY is in PHY_AN state and AN has not finished, then
> > what is supposed to happen is that the state machine gets rescheduled
> > until it is, or until the link_timeout reaches zero which triggers an
> > autoneg restart process.
> >
> > But actually the rescheduling never works if the PHY uses interrupts,
> > because the condition under which rescheduling occurs is just if
> > phy_polling_mode() is true. So basically, this whole rescheduling
> > functionality works for AN-not-yet-complete just by mistake. Let me
> > explain.
> >
> > Most of the time the AN process manages to finish by the time the
> > interrupt has triggered. One might say "that should always be the case,
> > otherwise the PHY wouldn't raise the interrupt, right?".
> > Well, some PHYs implement an .aneg_done method which allows them to tell
> > the state machine when the AN is really complete.
> > The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> > copper autoneg completes, the driver still keeps the "aneg_done"
> > variable unset until in-band SGMII autoneg finishes too (there is no
> > interrupt for that). So we have the premises of a race condition.
> >
> That's not nice from the PHY:
> It signals "link up", and if the system asks the PHY for link details,
> then it sheepishly says "well, link is *almost* up".
>

The copper-side link is 100% up. In my opinion this is actually abuse
of the .aneg_done API. Here's what the guy who added it had to say:

commit f62265b53ef34a372b657c99e23d32e95b464316
Author: Zefir Kurtisi <[email protected]>
Date: Mon Oct 24 12:40:54 2016 +0200

at803x: double check SGMII side autoneg

In SGMII mode, we observed an autonegotiation issue
after power-down-up cycles where the copper side
reports successful link establishment but the
SGMII side's link is down.

This happened in a setup where the at8031 is
connected over SGMII to a eTSEC (fsl gianfar),
but so far could not be reproduced with other
Ethernet device / driver combinations.

This commit adds a wrapper function for at8031
that in case of operating in SGMII mode double
checks SGMII link state when generic aneg_done()
succeeds. It prints a warning on failure but
intentionally does not try to recover from this
state. As a result, if you ever see a warning
'803x_aneg_done: SGMII link is not ok' you will
end up having an Ethernet link up but won't get
any data through. This should not happen, if it
does, please contact the module maintainer.

Signed-off-by: Zefir Kurtisi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>

> Question would be whether the same happens with other SGMII-capable
> PHY's so that we need to cater for this scenario in phylib.
> Or whether we consider it a chip quirk. In the latter case a custom
> read_status() handler might do the trick too: if link is reported
> as up then wait until aneg is signaled as done too before reading
> further link details.
>
> And it's interesting that nobody else stumbled across this problem
> before. I mean the PHY we talk about isn't really new. Or is your
> use case so special?
>

No, my use case isn't special at all. Just using it in interrupt mode :)
Today we have the pcs_poll option in phylink (checking the in-band AN
status at PCS side and not at PHY side). But not all Ethernet drivers
have phylink. Actually I think there's no good place in phylib to do
this in-band AN status checking.

But my patch is rather minimal and makes things work in the way there
were intended at that time.

> > In practice, what really happens depends on the log level of the serial
> > console. If the log level is verbose enough that kernel messages related
> > to the Ethernet link state are printed to the console, then this gives
> > in-band AN enough time to complete, which means the link will come up
> > and everyone will be happy. But if the console is not that verbose, the
> > link will sometimes come up, and sometimes will be forced down by the
> > .aneg_done of the PHY driver (forever, since we are not rescheduling).
> >
> > The conclusion is that an extra condition needs to be explicitly added,
> > so that the state machine can be rescheduled properly. Otherwise PHY
> > devices in interrupt mode will never work properly if they have an
> > .aneg_done callback.
> >
> > In more recent kernels, the whole PHY_AN state was removed by Heiner
> > Kallweit in the "[net-next,0/5] net: phy: improve and simplify phylib
> > state machine" series here:
> >
> > https://patchwork.ozlabs.org/cover/994464/
> >
> > and the problem was just masked away instead of being addressed with a
> > punctual patch.
> >
> > Fixes: 76a423a3f8f1 ("net: phy: allow driver to implement their own aneg_done")
> > Signed-off-by: Vladimir Oltean <[email protected]>
> > ---
> > I'm not sure the procedure I'm following is correct, sending this
> > directly to Greg. The patch doesn't apply on net.
> >
> > drivers/net/phy/phy.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
> > index cc454b8c032c..ca4fd74fd2c8 100644
> > --- a/drivers/net/phy/phy.c
> > +++ b/drivers/net/phy/phy.c
> > @@ -934,7 +934,7 @@ void phy_state_machine(struct work_struct *work)
> > struct delayed_work *dwork = to_delayed_work(work);
> > struct phy_device *phydev =
> > container_of(dwork, struct phy_device, state_queue);
> > - bool needs_aneg = false, do_suspend = false;
> > + bool recheck = false, needs_aneg = false, do_suspend = false;
> > enum phy_state old_state;
> > int err = 0;
> > int old_link;
> > @@ -981,6 +981,8 @@ void phy_state_machine(struct work_struct *work)
> > phy_link_up(phydev);
> > } else if (0 == phydev->link_timeout--)
> > needs_aneg = true;
> > + else
> > + recheck = true;
> > break;
> > case PHY_NOLINK:
> > if (!phy_polling_mode(phydev))
> > @@ -1123,7 +1125,7 @@ void phy_state_machine(struct work_struct *work)
> > * PHY, if PHY_IGNORE_INTERRUPT is set, then we will be moving
> > * between states from phy_mac_interrupt()
> > */
> > - if (phy_polling_mode(phydev))
> > + if (phy_polling_mode(phydev) || recheck)
> > queue_delayed_work(system_power_efficient_wq, &phydev->state_queue,
> > PHY_STATE_TIME * HZ);
> > }
> >
>

Thanks,
-Vladimir

2020-05-31 00:21:14

by Russell King (Oracle)

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On Sun, May 31, 2020 at 12:43:15AM +0300, Vladimir Oltean wrote:
> From: Vladimir Oltean <[email protected]>
>
> In kernel 4.19 (and probably earlier too) there are issues surrounding
> the PHY_AN state.
>
> For example, if a PHY is in PHY_AN state and AN has not finished, then
> what is supposed to happen is that the state machine gets rescheduled
> until it is, or until the link_timeout reaches zero which triggers an
> autoneg restart process.
>
> But actually the rescheduling never works if the PHY uses interrupts,
> because the condition under which rescheduling occurs is just if
> phy_polling_mode() is true. So basically, this whole rescheduling
> functionality works for AN-not-yet-complete just by mistake. Let me
> explain.
>
> Most of the time the AN process manages to finish by the time the
> interrupt has triggered. One might say "that should always be the case,
> otherwise the PHY wouldn't raise the interrupt, right?".
> Well, some PHYs implement an .aneg_done method which allows them to tell
> the state machine when the AN is really complete.
> The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> copper autoneg completes, the driver still keeps the "aneg_done"
> variable unset until in-band SGMII autoneg finishes too (there is no
> interrupt for that). So we have the premises of a race condition.

Why do we care whether SGMII autoneg has completed - is that not the
domain of the MAC side of the link?

It sounds like things are a little confused. The PHY interrupt is
signalling that the copper side has completed its autoneg. If we're
in SGMII mode, the PHY can now start the process of informing the
MAC about the negotiation results across the SGMII link. When the
MAC receives those results, and sends the acknowledgement back to the
PHY, is it not the responsibility of the MAC to then say "the link is
now up" ?

That's how we deal with it elsewhere with phylink integration, which
is what has to be done when you have to cope with PHYs that switch
their host interface mode between SGMII, 2500BASE-X, 5GBASE-R and
10GBASE-R - the MAC side needs to be dynamically reconfigured depending
on the new host-side operating mode of the PHY. Only when the MAC
subsequently reports that the link has been established is the whole
link from the MAC to the media deemed to be operational.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-05-31 21:04:05

by Vladimir Oltean

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On Sun, 31 May 2020 at 03:19, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Sun, May 31, 2020 at 12:43:15AM +0300, Vladimir Oltean wrote:
> > From: Vladimir Oltean <[email protected]>
> >
> > In kernel 4.19 (and probably earlier too) there are issues surrounding
> > the PHY_AN state.
> >
> > For example, if a PHY is in PHY_AN state and AN has not finished, then
> > what is supposed to happen is that the state machine gets rescheduled
> > until it is, or until the link_timeout reaches zero which triggers an
> > autoneg restart process.
> >
> > But actually the rescheduling never works if the PHY uses interrupts,
> > because the condition under which rescheduling occurs is just if
> > phy_polling_mode() is true. So basically, this whole rescheduling
> > functionality works for AN-not-yet-complete just by mistake. Let me
> > explain.
> >
> > Most of the time the AN process manages to finish by the time the
> > interrupt has triggered. One might say "that should always be the case,
> > otherwise the PHY wouldn't raise the interrupt, right?".
> > Well, some PHYs implement an .aneg_done method which allows them to tell
> > the state machine when the AN is really complete.
> > The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> > copper autoneg completes, the driver still keeps the "aneg_done"
> > variable unset until in-band SGMII autoneg finishes too (there is no
> > interrupt for that). So we have the premises of a race condition.
>
> Why do we care whether SGMII autoneg has completed - is that not the
> domain of the MAC side of the link?
>
> It sounds like things are a little confused. The PHY interrupt is
> signalling that the copper side has completed its autoneg. If we're
> in SGMII mode, the PHY can now start the process of informing the
> MAC about the negotiation results across the SGMII link. When the
> MAC receives those results, and sends the acknowledgement back to the
> PHY, is it not the responsibility of the MAC to then say "the link is
> now up" ?
>

Things are not at all confused on my end, Russell.
The "803x_aneg_done: SGMII link is not ok" log message had made me
aware of the existence of this piece of code for a very long while
now, but to be honest I hadn't actually read the commit message in
full detail until I replied to Heiner above. Especially this part:

It prints a warning on failure but
intentionally does not try to recover from this
state. As a result, if you ever see a warning
'803x_aneg_done: SGMII link is not ok' you will
end up having an Ethernet link up but won't get
any data through. This should not happen, if it
does, please contact the module maintainer.

The author highlighted a valid issue, but then came up with a BS
solution for it. It solves no problem, and it creates a problem for
some who originally had none.
When used in poll mode, the at803x.c driver would occasionally catch
the in-band AN in a state where it wasn't yet complete, so it would
print this message once, but all was ok in the end since the state
machine would get rescheduled and the link would come up. So I
genuinely thought that the intention of the patch was to be helpful.
But according to his own words, it is just trying to throw its hands
up in the air and lay the blame on somebody else [ the gianfar
maintainer ]. So in that sense, maybe it's my 'fault' for trying to
make the link come up with 803x_aneg_done in place. Maybe I should
just respectfully revert the patch
f62265b53ef34a372b657c99e23d32e95b464316, and replace it with some
other framework. The trouble is, what to replace it with?

> That's how we deal with it elsewhere with phylink integration, which
> is what has to be done when you have to cope with PHYs that switch
> their host interface mode between SGMII, 2500BASE-X, 5GBASE-R and
> 10GBASE-R - the MAC side needs to be dynamically reconfigured depending
> on the new host-side operating mode of the PHY. Only when the MAC
> subsequently reports that the link has been established is the whole
> link from the MAC to the media deemed to be operational.
>

This sounds to me like 'phylink has this one figured out', but I would
beg to differ.
My opinion is that it's not obvious that it would be the MAC's
responsibility to determine whether the overall link is operational
(applied in this case to in-band AN), but the system's responsibility,
and for a simple reason: it takes 2 to negotiate. The MAC PCS and the
PHY have to agree on whether they perform in-band AN or not. And
that's wild jungle right there, with some PHY drivers capable of
in-band AN keeping it enabled and some disabled (and even worse, PHY
drivers that don't enable it in Linux but enable it in U-Boot, and
since the setting is sticky, it changes the default behavior), and
phylink hasn't done anything to add some rules to it, just some
MAC-side knobs to turn for a particular MAC-PHY combination until
something works.

This is all relevant because our options for the stable trees boil
down to 2 choices:
- Revert f62265b53ef34a372b657c99e23d32e95b464316, fix an API misuse
and a bug, but lose an (admittedly ad-hoc, but still) useful way of
troubleshooting a system misconfiguration (hide the problem that Zefir
Kurtisi was seeing).
- Apply this patch which make the PHY state machine work even with
this bent interpretation of the API. It's not as if all phylib users
could migrate to phylink in stable trees, and even phylink doesn't
catch all possible configuration cases currently.

Either one is fine with me.

> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps u

Thanks,
-Vladimir

2020-06-01 00:30:21

by Russell King (Oracle)

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On Mon, Jun 01, 2020 at 12:00:16AM +0300, Vladimir Oltean wrote:
> On Sun, 31 May 2020 at 03:19, Russell King - ARM Linux admin
> <[email protected]> wrote:
> >
> > On Sun, May 31, 2020 at 12:43:15AM +0300, Vladimir Oltean wrote:
> > > From: Vladimir Oltean <[email protected]>
> > >
> > > In kernel 4.19 (and probably earlier too) there are issues surrounding
> > > the PHY_AN state.
> > >
> > > For example, if a PHY is in PHY_AN state and AN has not finished, then
> > > what is supposed to happen is that the state machine gets rescheduled
> > > until it is, or until the link_timeout reaches zero which triggers an
> > > autoneg restart process.
> > >
> > > But actually the rescheduling never works if the PHY uses interrupts,
> > > because the condition under which rescheduling occurs is just if
> > > phy_polling_mode() is true. So basically, this whole rescheduling
> > > functionality works for AN-not-yet-complete just by mistake. Let me
> > > explain.
> > >
> > > Most of the time the AN process manages to finish by the time the
> > > interrupt has triggered. One might say "that should always be the case,
> > > otherwise the PHY wouldn't raise the interrupt, right?".
> > > Well, some PHYs implement an .aneg_done method which allows them to tell
> > > the state machine when the AN is really complete.
> > > The AR8031/AR8033 driver (at803x.c) is one such example. Even when
> > > copper autoneg completes, the driver still keeps the "aneg_done"
> > > variable unset until in-band SGMII autoneg finishes too (there is no
> > > interrupt for that). So we have the premises of a race condition.
> >
> > Why do we care whether SGMII autoneg has completed - is that not the
> > domain of the MAC side of the link?
> >
> > It sounds like things are a little confused. The PHY interrupt is
> > signalling that the copper side has completed its autoneg. If we're
> > in SGMII mode, the PHY can now start the process of informing the
> > MAC about the negotiation results across the SGMII link. When the
> > MAC receives those results, and sends the acknowledgement back to the
> > PHY, is it not the responsibility of the MAC to then say "the link is
> > now up" ?
> >
>
> Things are not at all confused on my end, Russell.

When I say "It sounds like things are a little confused." I am
referring to the damn PHY/driver, not to you - that should be
obvious because I then describe what _should_ be happening there
in the same damn paragraph.

> The "803x_aneg_done: SGMII link is not ok" log message had made me
> aware of the existence of this piece of code for a very long while
> now, but to be honest I hadn't actually read the commit message in
> full detail until I replied to Heiner above. Especially this part:
>
> It prints a warning on failure but
> intentionally does not try to recover from this
> state. As a result, if you ever see a warning
> '803x_aneg_done: SGMII link is not ok' you will
> end up having an Ethernet link up but won't get
> any data through. This should not happen, if it
> does, please contact the module maintainer.

Yes, I read that after sending my reply - it seems there's a problem
with this PHY not being able to complete the SGMII side of the link,
but it's unclear what the problem actually is. I don't know the
background, but it sounds like Zefir was hoping someone would make
contact soon after the commit was added so that further debug could
happen, maybe? If that was the intention, the warning message could
have been better. In any case, as we're now four years on before
someone has tripped over it...

> The author highlighted a valid issue, but then came up with a BS
> solution for it. It solves no problem, and it creates a problem for
> some who originally had none.
> When used in poll mode, the at803x.c driver would occasionally catch
> the in-band AN in a state where it wasn't yet complete, so it would
> print this message once, but all was ok in the end since the state
> machine would get rescheduled and the link would come up. So I
> genuinely thought that the intention of the patch was to be helpful.
> But according to his own words, it is just trying to throw its hands
> up in the air and lay the blame on somebody else [ the gianfar
> maintainer ]. So in that sense, maybe it's my 'fault' for trying to
> make the link come up with 803x_aneg_done in place. Maybe I should
> just respectfully revert the patch
> f62265b53ef34a372b657c99e23d32e95b464316, and replace it with some
> other framework. The trouble is, what to replace it with?

Maybe it's how you say it is, or maybe it isn't... unless we get
input from those involved, it's going to need to be re-investigated.

> > That's how we deal with it elsewhere with phylink integration, which
> > is what has to be done when you have to cope with PHYs that switch
> > their host interface mode between SGMII, 2500BASE-X, 5GBASE-R and
> > 10GBASE-R - the MAC side needs to be dynamically reconfigured depending
> > on the new host-side operating mode of the PHY. Only when the MAC
> > subsequently reports that the link has been established is the whole
> > link from the MAC to the media deemed to be operational.
>
> This sounds to me like 'phylink has this one figured out', but I would
> beg to differ.

phylink has this case figured out for a PHY that does bring up
the SGMII side according to the specs. That's precisely how
copper SFPs work, which are fully supported - that's not opinion,
that's a fact. In the case of a PHY which doesn't reliably
complete SGMII, then no, phylink doesn't have that worked out.

However, yet again, you've taken my comments completely the wrong
way - what you quote was meant to illustrate _why_ waiting for
the SGMII side to come up before reporting link up from the PHY
is wrong.

> My opinion is that it's not obvious that it would be the MAC's
> responsibility to determine whether the overall link is operational
> (applied in this case to in-band AN), but the system's responsibility,
> and for a simple reason: it takes 2 to negotiate. The MAC PCS and the
> PHY have to agree on whether they perform in-band AN or not.

Right, and if you configure phylink to use "in-band-status" in
DT, you will be telling phylink that this in-band status is to be
used. If that's not what you want, then you don't specify that,
and use PHY or fixed-link modes... and why phylink integrates the
whole system status when using "in-band-status" mode.

> And
> that's wild jungle right there, with some PHY drivers capable of
> in-band AN keeping it enabled and some disabled (and even worse, PHY
> drivers that don't enable it in Linux but enable it in U-Boot, and
> since the setting is sticky, it changes the default behavior), and
> phylink hasn't done anything to add some rules to it, just some
> MAC-side knobs to turn for a particular MAC-PHY combination until
> something works.

... because there hasn't been a /need/ to do add this yet; no one
has asked for it, no one has provided me hardware that requires it,
and I don't have oodles of spare time to hack randomly on stuff that
maybe no one wants right now.

When I set out with phylink, it was to solve the problem of getting
SFPs to work with all the in-band complexities that they have, while
keeping compatibility with phylib and it does that reasonably well.

It seems that you're expecting it to also solve this Coronavirus
pandemic, world hunger, etc.

> This is all relevant because our options for the stable trees boil
> down to 2 choices:
> - Revert f62265b53ef34a372b657c99e23d32e95b464316, fix an API misuse
> and a bug, but lose an (admittedly ad-hoc, but still) useful way of
> troubleshooting a system misconfiguration (hide the problem that Zefir
> Kurtisi was seeing).

Or maybe just allow at803x_aneg_done() to return non-zero but still
print the warning (preferably identifying the affected PHY) so
your hard-to-debug problem still gets a useful kernel message pointing
out what the problem is?

> - Apply this patch which make the PHY state machine work even with
> this bent interpretation of the API. It's not as if all phylib users
> could migrate to phylink in stable trees, and even phylink doesn't
> catch all possible configuration cases currently.

I wasn't even proposing that as a solution.

And yes, I do have some copper SFP modules that have an (inaccessible)
AR803x PHY on them - Microtik S-RJ01 to be exact. I forget exactly
which variant it is, and no, I haven't seen any of this "SGMII fails
to come up" - in fact, the in-band SGMII status is the only way to
know what the PHY negotiated with its link partner... and this SFP
module works with phylink with no issues.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-06-01 20:59:57

by Vladimir Oltean

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On Mon, 1 Jun 2020 at 03:28, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Mon, Jun 01, 2020 at 12:00:16AM +0300, Vladimir Oltean wrote:

> > This is all relevant because our options for the stable trees boil
> > down to 2 choices:
> > - Revert f62265b53ef34a372b657c99e23d32e95b464316, fix an API misuse
> > and a bug, but lose an (admittedly ad-hoc, but still) useful way of
> > troubleshooting a system misconfiguration (hide the problem that Zefir
> > Kurtisi was seeing).
>
> Or maybe just allow at803x_aneg_done() to return non-zero but still
> print the warning (preferably identifying the affected PHY) so
> your hard-to-debug problem still gets a useful kernel message pointing
> out what the problem is?
>

Maybe.

> > - Apply this patch which make the PHY state machine work even with
> > this bent interpretation of the API. It's not as if all phylib users
> > could migrate to phylink in stable trees, and even phylink doesn't
> > catch all possible configuration cases currently.
>
> I wasn't even proposing that as a solution.
>
> And yes, I do have some copper SFP modules that have an (inaccessible)
> AR803x PHY on them - Microtik S-RJ01 to be exact. I forget exactly
> which variant it is, and no, I haven't seen any of this "SGMII fails
> to come up" - in fact, the in-band SGMII status is the only way to
> know what the PHY negotiated with its link partner... and this SFP
> module works with phylink with no issues.
>

See, you should also specify what kernel you're testing on. Since
Heiner did the PHY_AN cleanup, phy_aneg_done is basically dead code
from the state machine's perspective, only a few random drivers call
it:
https://elixir.bootlin.com/linux/latest/A/ident/phy_aneg_done
So I would not be at all surprised that you're not hitting it simply
because at803x_aneg_done is never in your call path.

> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

Thanks,
-Vladimir

2020-06-01 21:23:58

by Russell King (Oracle)

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On Mon, Jun 01, 2020 at 11:57:30PM +0300, Vladimir Oltean wrote:
> On Mon, 1 Jun 2020 at 03:28, Russell King - ARM Linux admin
> <[email protected]> wrote:
> > And yes, I do have some copper SFP modules that have an (inaccessible)
> > AR803x PHY on them - Microtik S-RJ01 to be exact. I forget exactly
> > which variant it is, and no, I haven't seen any of this "SGMII fails
> > to come up" - in fact, the in-band SGMII status is the only way to
> > know what the PHY negotiated with its link partner... and this SFP
> > module works with phylink with no issues.
>
> See, you should also specify what kernel you're testing on. Since
> Heiner did the PHY_AN cleanup, phy_aneg_done is basically dead code
> from the state machine's perspective, only a few random drivers call
> it:
> https://elixir.bootlin.com/linux/latest/A/ident/phy_aneg_done
> So I would not be at all surprised that you're not hitting it simply
> because at803x_aneg_done is never in your call path.

Please re-read the paragraph of my reply that is quoted above, and
consider your response to it in light of the word *inaccessible* in
my paragraph above.

Specifically, ask yourself the question: "if the PHY is inaccessible,
does it matter what kernel version is being tested? Does it matter
what the at803x code is doing?"

The point that I was trying to get across, but you seem to have missed,
is that this SFP module uses an AR803x PHY that is inaccessible and I
have never seen a problem with the SGMII side coming up - and if the
SGMII side does not come up, we have no way to know what the copper
side is doing. With this module, we are totally reliant on the SGMII
side working correctly to work out what the copper side status is.

*Frustrated*.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

2020-06-01 21:42:46

by Vladimir Oltean

[permalink] [raw]

Subject: Re: [PATCH stable-4.19.y] net: phy: reschedule state machine if AN has not completed in PHY_AN state

On Tue, 2 Jun 2020 at 00:21, Russell King - ARM Linux admin
<[email protected]> wrote:
>
> On Mon, Jun 01, 2020 at 11:57:30PM +0300, Vladimir Oltean wrote:
> > On Mon, 1 Jun 2020 at 03:28, Russell King - ARM Linux admin
> > <[email protected]> wrote:
> > > And yes, I do have some copper SFP modules that have an (inaccessible)
> > > AR803x PHY on them - Microtik S-RJ01 to be exact. I forget exactly
> > > which variant it is, and no, I haven't seen any of this "SGMII fails
> > > to come up" - in fact, the in-band SGMII status is the only way to
> > > know what the PHY negotiated with its link partner... and this SFP
> > > module works with phylink with no issues.
> >
> > See, you should also specify what kernel you're testing on. Since
> > Heiner did the PHY_AN cleanup, phy_aneg_done is basically dead code
> > from the state machine's perspective, only a few random drivers call
> > it:
> > https://elixir.bootlin.com/linux/latest/A/ident/phy_aneg_done
> > So I would not be at all surprised that you're not hitting it simply
> > because at803x_aneg_done is never in your call path.
>
> Please re-read the paragraph of my reply that is quoted above, and
> consider your response to it in light of the word *inaccessible* in
> my paragraph above.
>
> Specifically, ask yourself the question: "if the PHY is inaccessible,
> does it matter what kernel version is being tested? Does it matter
> what the at803x code is doing?"
>
> The point that I was trying to get across, but you seem to have missed,
> is that this SFP module uses an AR803x PHY that is inaccessible and I
> have never seen a problem with the SGMII side coming up - and if the
> SGMII side does not come up, we have no way to know what the copper
> side is doing. With this module, we are totally reliant on the SGMII
> side working correctly to work out what the copper side status is.
>
> *Frustrated*.
>
> --
> RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
> FTTC for 0.8m (est. 1762m) line in suburbia: sync at 13.1Mbps down 424kbps up

So I ignored the "inaccessible" part because I failed to understand
the relevance of your reply given the issue at hand. I wasn't trying
to suggest that the AT803x SGMII AN logic is broken.