2023-01-03 22:22:10

by Sean Anderson

[permalink] [raw]
Subject: [PATCH net-next v5 0/4] phy: aquantia: Determine rate adaptation support from registers

This attempts to address the problems first reported in [1]. Tim has an
Aquantia phy where the firmware is set up to use "5G XFI" (underclocked
10GBASE-R) when rate adapting lower speeds. This results in us
advertising that we support lower speeds and then failing to bring the
link up. To avoid this, determine whether to enable rate adaptation
based on what's programmed by the firmware. This is "the worst choice"
[2], but we can't really do better until we have more insight into
what the firmware is doing. At the very least, we can prevent bad
firmware from causing us to advertise the wrong modes.

Past submissions may be found at [3, 4].

[1] https://lore.kernel.org/netdev/CAJ+vNU3zeNqiGhjTKE8jRjDYR0D7f=iqPLB8phNyA2CWixy7JA@mail.gmail.com/
[2] https://lore.kernel.org/netdev/20221118171643.vu6uxbnmog4sna65@skbuf/
[3] https://lore.kernel.org/netdev/[email protected]/
[4] https://lore.kernel.org/netdev/[email protected]/

Changes in v5:
- Add missing PMA/PMD speed bits
- Don't handle PHY_INTERFACE_MODE_NA, and simplify logic

Changes in v4:
- Reorganize MDIO defines
- Fix kerneldoc using - instead of : for parameters

Changes in v3:
- Update speed register bits
- Fix incorrect bits for PMA/PMD speed

Changes in v2:
- Move/rename phylink_interface_max_speed
- Rework to just validate things instead of modifying registers

Sean Anderson (4):
net: phy: Move/rename phylink_interface_max_speed
phy: mdio: Reorganize defines
net: mdio: Update speed register bits
phy: aquantia: Determine rate adaptation support from registers

drivers/net/phy/aquantia_main.c | 136 ++++++++++++++++++++++++++++++--
drivers/net/phy/phy-core.c | 70 ++++++++++++++++
drivers/net/phy/phylink.c | 75 +-----------------
include/linux/phy.h | 1 +
include/uapi/linux/mdio.h | 118 ++++++++++++++++++---------
5 files changed, 282 insertions(+), 118 deletions(-)

--
2.35.1.1320.gc452695387.dirty


2023-01-03 22:47:10

by Sean Anderson

[permalink] [raw]
Subject: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

When autonegotiation completes, the phy interface will be set based on
the global config register for that speed. If the SERDES mode is set to
something which the MAC does not support, then the link will not come
up. To avoid this, validate each combination of interface speed and link
speed which might be configured. This way, we ensure that we only
consider rate adaptation in our advertisement when we can actually use
it.

For some firmwares, not all speeds are supported. In this case, the
global config register for that speed will be initialized to zero
(indicating that rate adaptation is not supported). We can detect this
by reading the PMA/PMD speed register to determine which speeds are
supported. This register is read once in probe and cached for later.

Signed-off-by: Sean Anderson <[email protected]>
---
This commit fixes 3c42563b3041 ("net: phy: aquantia: Add support for
rate matching"). In an effort to avoid backporting of this commit until
it has soaked in master for a while, the fixes tag has been left off.

Changes in v5:
- Don't handle PHY_INTERFACE_MODE_NA, and simplify logic

Changes in v4:
- Fix kerneldoc using - instead of : for parameters

Changes in v3:
- Fix incorrect bits for PMA/PMD speed

Changes in v2:
- Rework to just validate things instead of modifying registers

drivers/net/phy/aquantia_main.c | 136 ++++++++++++++++++++++++++++++--
1 file changed, 128 insertions(+), 8 deletions(-)

diff --git a/drivers/net/phy/aquantia_main.c b/drivers/net/phy/aquantia_main.c
index 334a6904ca5a..06078cd2d5b3 100644
--- a/drivers/net/phy/aquantia_main.c
+++ b/drivers/net/phy/aquantia_main.c
@@ -111,6 +111,12 @@
#define VEND1_GLOBAL_CFG_RATE_ADAPT_NONE 0
#define VEND1_GLOBAL_CFG_RATE_ADAPT_USX 1
#define VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE 2
+#define VEND1_GLOBAL_CFG_SERDES_MODE GENMASK(2, 0)
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI 0
+#define VEND1_GLOBAL_CFG_SERDES_MODE_SGMII 3
+#define VEND1_GLOBAL_CFG_SERDES_MODE_OCSGMII 4
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI5G 6
+#define VEND1_GLOBAL_CFG_SERDES_MODE_XFI20G 7

#define VEND1_GLOBAL_RSVD_STAT1 0xc885
#define VEND1_GLOBAL_RSVD_STAT1_FW_BUILD_ID GENMASK(7, 4)
@@ -175,6 +181,7 @@ static const struct aqr107_hw_stat aqr107_hw_stats[] = {

struct aqr107_priv {
u64 sgmii_stats[AQR107_SGMII_STAT_SZ];
+ int pmapmd_speeds;
};

static int aqr107_get_sset_count(struct phy_device *phydev)
@@ -677,14 +684,119 @@ static int aqr107_wait_processor_intensive_op(struct phy_device *phydev)
return 0;
}

+/**
+ * struct aqr107_link_speed_cfg - Common configuration for link speeds
+ * @speed: The speed of this config
+ * @reg: The global system configuration register for this speed
+ * @speed_bit: The bit in the PMA/PMD speed ability register which determines
+ * whether this link speed is supported
+ */
+struct aqr107_link_speed_cfg {
+ int speed;
+ u16 reg, speed_bit;
+};
+
+/**
+ * aqr107_rate_adapt_ok() - Validate rate adaptation for a configuration
+ * @phydev: The phy to act on
+ * @serdes_speed: The speed of the serdes (aka the phy interface)
+ * @link_cfg: The config for the link speed
+ *
+ * This function validates whether rate adaptation will work for a particular
+ * combination of @serdes_speed and @link_cfg.
+ *
+ * Return: %true if the @link_cfg.reg is configured for rate adaptation or if
+ * @link_cfg.speed will not be advertised, %false otherwise.
+ */
+static bool aqr107_rate_adapt_ok(struct phy_device *phydev, int serdes_speed,
+ const struct aqr107_link_speed_cfg *link_cfg)
+{
+ struct aqr107_priv *priv = phydev->priv;
+ int val;
+
+ phydev_dbg(phydev, "validating link_speed=%d serdes_speed=%d\n",
+ link_cfg->speed, serdes_speed);
+
+ /* Vacuously OK, since we won't advertise it anyway */
+ if (!(priv->pmapmd_speeds & link_cfg->speed_bit))
+ return true;
+
+ val = phy_read_mmd(phydev, MDIO_MMD_VEND1, link_cfg->reg);
+ if (val < 0) {
+ phydev_warn(phydev, "could not read register %x:%.04x (err = %d)\n",
+ MDIO_MMD_VEND1, link_cfg->reg, val);
+ return false;
+ }
+
+ phydev_dbg(phydev, "%x:%.04x = %.04x\n", MDIO_MMD_VEND1, link_cfg->reg, val);
+ if (FIELD_GET(VEND1_GLOBAL_CFG_RATE_ADAPT, val) !=
+ VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE)
+ return false;
+
+ switch (FIELD_GET(VEND1_GLOBAL_CFG_SERDES_MODE, val)) {
+ case VEND1_GLOBAL_CFG_SERDES_MODE_XFI20G:
+ return serdes_speed == SPEED_20000;
+ case VEND1_GLOBAL_CFG_SERDES_MODE_XFI:
+ return serdes_speed == SPEED_10000;
+ case VEND1_GLOBAL_CFG_SERDES_MODE_XFI5G:
+ return serdes_speed == SPEED_5000;
+ case VEND1_GLOBAL_CFG_SERDES_MODE_OCSGMII:
+ return serdes_speed == SPEED_2500;
+ case VEND1_GLOBAL_CFG_SERDES_MODE_SGMII:
+ return serdes_speed == SPEED_1000;
+ default:
+ return false;
+ }
+}
+
static int aqr107_get_rate_matching(struct phy_device *phydev,
phy_interface_t iface)
{
- if (iface == PHY_INTERFACE_MODE_10GBASER ||
- iface == PHY_INTERFACE_MODE_2500BASEX ||
- iface == PHY_INTERFACE_MODE_NA)
- return RATE_MATCH_PAUSE;
- return RATE_MATCH_NONE;
+ static const struct aqr107_link_speed_cfg speed_table[] = {
+ {
+ .speed = SPEED_10,
+ .reg = VEND1_GLOBAL_CFG_10M,
+ .speed_bit = MDIO_PMA_SPEED_10,
+ },
+ {
+ .speed = SPEED_100,
+ .reg = VEND1_GLOBAL_CFG_100M,
+ .speed_bit = MDIO_PMA_SPEED_100,
+ },
+ {
+ .speed = SPEED_1000,
+ .reg = VEND1_GLOBAL_CFG_1G,
+ .speed_bit = MDIO_PMA_SPEED_1000,
+ },
+ {
+ .speed = SPEED_2500,
+ .reg = VEND1_GLOBAL_CFG_2_5G,
+ .speed_bit = MDIO_PMA_SPEED_2_5G,
+ },
+ {
+ .speed = SPEED_5000,
+ .reg = VEND1_GLOBAL_CFG_5G,
+ .speed_bit = MDIO_PMA_SPEED_5G,
+ },
+ {
+ .speed = SPEED_10000,
+ .reg = VEND1_GLOBAL_CFG_10G,
+ .speed_bit = MDIO_PMA_SPEED_10G,
+ },
+ };
+ int speed = phy_interface_max_speed(iface);
+ bool got_one = false;
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(speed_table) &&
+ speed_table[i].speed <= speed; i++) {
+ if (!aqr107_rate_adapt_ok(phydev, speed, &speed_table[i]))
+ return RATE_MATCH_NONE;
+ got_one = true;
+ }
+
+ /* Must match at least one speed */
+ return got_one ? RATE_MATCH_PAUSE : RATE_MATCH_NONE;
}

static int aqr107_suspend(struct phy_device *phydev)
@@ -713,10 +825,18 @@ static int aqr107_resume(struct phy_device *phydev)

static int aqr107_probe(struct phy_device *phydev)
{
- phydev->priv = devm_kzalloc(&phydev->mdio.dev,
- sizeof(struct aqr107_priv), GFP_KERNEL);
- if (!phydev->priv)
+ struct aqr107_priv *priv;
+
+ priv = devm_kzalloc(&phydev->mdio.dev, sizeof(*priv), GFP_KERNEL);
+ if (!priv)
return -ENOMEM;
+ phydev->priv = priv;
+
+ priv->pmapmd_speeds = phy_read_mmd(phydev, MDIO_MMD_PMAPMD, MDIO_SPEED);
+ if (priv->pmapmd_speeds < 0) {
+ phydev_err(phydev, "could not read PMA/PMD speeds\n");
+ return priv->pmapmd_speeds;
+ };

return aqr_hwmon_probe(phydev);
}
--
2.35.1.1320.gc452695387.dirty

2023-01-05 13:48:12

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 0/4] phy: aquantia: Determine rate adaptation support from registers

Hi Sean,

On Tue, Jan 03, 2023 at 05:05:07PM -0500, Sean Anderson wrote:
> This attempts to address the problems first reported in [1]. Tim has an
> Aquantia phy where the firmware is set up to use "5G XFI" (underclocked
> 10GBASE-R) when rate adapting lower speeds. This results in us
> advertising that we support lower speeds and then failing to bring the
> link up. To avoid this, determine whether to enable rate adaptation
> based on what's programmed by the firmware. This is "the worst choice"
> [2], but we can't really do better until we have more insight into
> what the firmware is doing. At the very least, we can prevent bad
> firmware from causing us to advertise the wrong modes.

After this patch set, is there any reason why phydev->rate_matching
still exists and must be populated by the PHY driver?

2023-01-05 14:38:54

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Tue, Jan 03, 2023 at 05:05:11PM -0500, Sean Anderson wrote:
> static int aqr107_get_rate_matching(struct phy_device *phydev,
> phy_interface_t iface)
> {
> - if (iface == PHY_INTERFACE_MODE_10GBASER ||
> - iface == PHY_INTERFACE_MODE_2500BASEX ||
> - iface == PHY_INTERFACE_MODE_NA)
> - return RATE_MATCH_PAUSE;
> - return RATE_MATCH_NONE;
> + static const struct aqr107_link_speed_cfg speed_table[] = {
> + {
> + .speed = SPEED_10,
> + .reg = VEND1_GLOBAL_CFG_10M,
> + .speed_bit = MDIO_PMA_SPEED_10,
> + },
> + {
> + .speed = SPEED_100,
> + .reg = VEND1_GLOBAL_CFG_100M,
> + .speed_bit = MDIO_PMA_SPEED_100,
> + },
> + {
> + .speed = SPEED_1000,
> + .reg = VEND1_GLOBAL_CFG_1G,
> + .speed_bit = MDIO_PMA_SPEED_1000,
> + },
> + {
> + .speed = SPEED_2500,
> + .reg = VEND1_GLOBAL_CFG_2_5G,
> + .speed_bit = MDIO_PMA_SPEED_2_5G,
> + },
> + {
> + .speed = SPEED_5000,
> + .reg = VEND1_GLOBAL_CFG_5G,
> + .speed_bit = MDIO_PMA_SPEED_5G,
> + },
> + {
> + .speed = SPEED_10000,
> + .reg = VEND1_GLOBAL_CFG_10G,
> + .speed_bit = MDIO_PMA_SPEED_10G,
> + },
> + };
> + int speed = phy_interface_max_speed(iface);
> + bool got_one = false;
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(speed_table) &&
> + speed_table[i].speed <= speed; i++) {
> + if (!aqr107_rate_adapt_ok(phydev, speed, &speed_table[i]))
> + return RATE_MATCH_NONE;
> + got_one = true;
> + }

Trying to wrap my head around the API for rate matching that was
originally proposed and how it applies to what we read from Aquantia
registers now.

IIUC, phylink (via the PHY library) asks "what kind of rate matching is
supported for this SERDES protocol?". It doesn't ask "via what kind of
rate matching can this SERDES protocol support this particular media
side speed?".

Your code walks through the speed_table[] of media speeds (from 10M up
until the max speed of the SERDES) and sees whether the PHY was
provisioned, for that speed, to use PAUSE rate adaptation.

If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
and 10M, a call to your implementation of
aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
RATE_MATCH_NONE, right? So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
would be advertised on the media side?

Shouldn't you take into consideration in your aqr107_rate_adapt_ok()
function only the media side link speeds for which the PHY was actually
*configured* to use the SERDES protocol @iface?

> +
> + /* Must match at least one speed */
> + return got_one ? RATE_MATCH_PAUSE : RATE_MATCH_NONE;
> }

2023-01-05 14:58:15

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 04:04:21PM +0200, Vladimir Oltean wrote:
> On Tue, Jan 03, 2023 at 05:05:11PM -0500, Sean Anderson wrote:
> > static int aqr107_get_rate_matching(struct phy_device *phydev,
> > phy_interface_t iface)
> > {
> > - if (iface == PHY_INTERFACE_MODE_10GBASER ||
> > - iface == PHY_INTERFACE_MODE_2500BASEX ||
> > - iface == PHY_INTERFACE_MODE_NA)
> > - return RATE_MATCH_PAUSE;
> > - return RATE_MATCH_NONE;
> > + static const struct aqr107_link_speed_cfg speed_table[] = {
> > + {
> > + .speed = SPEED_10,
> > + .reg = VEND1_GLOBAL_CFG_10M,
> > + .speed_bit = MDIO_PMA_SPEED_10,
> > + },
> > + {
> > + .speed = SPEED_100,
> > + .reg = VEND1_GLOBAL_CFG_100M,
> > + .speed_bit = MDIO_PMA_SPEED_100,
> > + },
> > + {
> > + .speed = SPEED_1000,
> > + .reg = VEND1_GLOBAL_CFG_1G,
> > + .speed_bit = MDIO_PMA_SPEED_1000,
> > + },
> > + {
> > + .speed = SPEED_2500,
> > + .reg = VEND1_GLOBAL_CFG_2_5G,
> > + .speed_bit = MDIO_PMA_SPEED_2_5G,
> > + },
> > + {
> > + .speed = SPEED_5000,
> > + .reg = VEND1_GLOBAL_CFG_5G,
> > + .speed_bit = MDIO_PMA_SPEED_5G,
> > + },
> > + {
> > + .speed = SPEED_10000,
> > + .reg = VEND1_GLOBAL_CFG_10G,
> > + .speed_bit = MDIO_PMA_SPEED_10G,
> > + },
> > + };
> > + int speed = phy_interface_max_speed(iface);
> > + bool got_one = false;
> > + int i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(speed_table) &&
> > + speed_table[i].speed <= speed; i++) {
> > + if (!aqr107_rate_adapt_ok(phydev, speed, &speed_table[i]))
> > + return RATE_MATCH_NONE;
> > + got_one = true;
> > + }
>
> Trying to wrap my head around the API for rate matching that was
> originally proposed and how it applies to what we read from Aquantia
> registers now.
>
> IIUC, phylink (via the PHY library) asks "what kind of rate matching is
> supported for this SERDES protocol?". It doesn't ask "via what kind of
> rate matching can this SERDES protocol support this particular media
> side speed?".
>
> Your code walks through the speed_table[] of media speeds (from 10M up
> until the max speed of the SERDES) and sees whether the PHY was
> provisioned, for that speed, to use PAUSE rate adaptation.
>
> If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
> media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
> and 10M, a call to your implementation of
> aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
> RATE_MATCH_NONE, right? So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
> would be advertised on the media side?

No, beause of the special condition in phylink that if it's a clause 45
PHY and we use something like 10GBASE-R, we don't limit to just 10G
speed, but try all interface modes - on the assumption that the PHY
will switch its host interface.

RATE_MATCH_NONE doesn't state anything about whether the PHY operates
in a single interface mode or not - with 10G PHYs (and thus clause 45
PHYs) it seems very common from current observations for
implementations to do this kind of host-interface switching.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-01-05 16:31:38

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 09:04, Vladimir Oltean wrote:
> On Tue, Jan 03, 2023 at 05:05:11PM -0500, Sean Anderson wrote:
>> static int aqr107_get_rate_matching(struct phy_device *phydev,
>> phy_interface_t iface)
>> {
>> - if (iface == PHY_INTERFACE_MODE_10GBASER ||
>> - iface == PHY_INTERFACE_MODE_2500BASEX ||
>> - iface == PHY_INTERFACE_MODE_NA)
>> - return RATE_MATCH_PAUSE;
>> - return RATE_MATCH_NONE;
>> + static const struct aqr107_link_speed_cfg speed_table[] = {
>> + {
>> + .speed = SPEED_10,
>> + .reg = VEND1_GLOBAL_CFG_10M,
>> + .speed_bit = MDIO_PMA_SPEED_10,
>> + },
>> + {
>> + .speed = SPEED_100,
>> + .reg = VEND1_GLOBAL_CFG_100M,
>> + .speed_bit = MDIO_PMA_SPEED_100,
>> + },
>> + {
>> + .speed = SPEED_1000,
>> + .reg = VEND1_GLOBAL_CFG_1G,
>> + .speed_bit = MDIO_PMA_SPEED_1000,
>> + },
>> + {
>> + .speed = SPEED_2500,
>> + .reg = VEND1_GLOBAL_CFG_2_5G,
>> + .speed_bit = MDIO_PMA_SPEED_2_5G,
>> + },
>> + {
>> + .speed = SPEED_5000,
>> + .reg = VEND1_GLOBAL_CFG_5G,
>> + .speed_bit = MDIO_PMA_SPEED_5G,
>> + },
>> + {
>> + .speed = SPEED_10000,
>> + .reg = VEND1_GLOBAL_CFG_10G,
>> + .speed_bit = MDIO_PMA_SPEED_10G,
>> + },
>> + };
>> + int speed = phy_interface_max_speed(iface);
>> + bool got_one = false;
>> + int i;
>> +
>> + for (i = 0; i < ARRAY_SIZE(speed_table) &&
>> + speed_table[i].speed <= speed; i++) {
>> + if (!aqr107_rate_adapt_ok(phydev, speed, &speed_table[i]))
>> + return RATE_MATCH_NONE;
>> + got_one = true;
>> + }
>
> Trying to wrap my head around the API for rate matching that was
> originally proposed and how it applies to what we read from Aquantia
> registers now.
>
> IIUC, phylink (via the PHY library) asks "what kind of rate matching is
> supported for this SERDES protocol?". It doesn't ask "via what kind of
> rate matching can this SERDES protocol support this particular media
> side speed?".

We ask "What kind of rate matching is supported for this phy interface?"

> Your code walks through the speed_table[] of media speeds (from 10M up
> until the max speed of the SERDES) and sees whether the PHY was
> provisioned, for that speed, to use PAUSE rate adaptation.

This is because we assume that if a phy supports rate matching for a phy
interface mode, then it supports rate matching to all slower speeds that
it otherwise supports. This seemed like a pretty reasonable assumption
when I wrote the code, but it turns out that some firmwares don't abide
by this. This is firstly a problem with the firmware (as it should be
configured so that Linux can use the phy's features), but we have to be
careful not to end up with an unsupported combination.

> If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
> media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
> and 10M, a call to your implementation of
> aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
> RATE_MATCH_NONE, right?

Correct.

> So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
> would be advertised on the media side?


If the host only supports 10GBASE-R and nothing else. If the host
supports SGMII as well, we will advertise 10G, 1G, 100M, and 10M. But
really, this is a problem with the firmware, since if the host supports
only 10GBASE-R, then the firmware should be set up to rate adapt to all
speeds.

> Shouldn't you take into consideration in your aqr107_rate_adapt_ok()
> function only the media side link speeds for which the PHY was actually
> *configured* to use the SERDES protocol @iface?

No, because we don't know what phy interface modes are actually going to
be used. The phy doesn't know whether e.g. the host supports both
10GBASE-R and SGMII or whether it only supports 10GBASE-R. With the
current API we cannot say "I support 5G" without also saying "I support
1G". If you don't like this, please send a patch for an API returning
supported speeds for a phy interface mode.

--Sean

2023-01-05 16:33:17

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 0/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 08:39, Vladimir Oltean wrote:
> Hi Sean,
>
> On Tue, Jan 03, 2023 at 05:05:07PM -0500, Sean Anderson wrote:
>> This attempts to address the problems first reported in [1]. Tim has an
>> Aquantia phy where the firmware is set up to use "5G XFI" (underclocked
>> 10GBASE-R) when rate adapting lower speeds. This results in us
>> advertising that we support lower speeds and then failing to bring the
>> link up. To avoid this, determine whether to enable rate adaptation
>> based on what's programmed by the firmware. This is "the worst choice"
>> [2], but we can't really do better until we have more insight into
>> what the firmware is doing. At the very least, we can prevent bad
>> firmware from causing us to advertise the wrong modes.
>
> After this patch set, is there any reason why phydev->rate_matching
> still exists and must be populated by the PHY driver?

It's necessary for phylink_link_up to know what speed to use for the MAC.

--Sean

2023-01-05 18:04:41

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 07:34:45PM +0200, Vladimir Oltean wrote:
> So we lose the advertisement of 5G and 2.5G, even if the firmware is
> provisioned for them via 10GBASE-R rate adaptation, right? Because when
> asked "What kind of rate matching is supported for 10GBASE-R?", the
> Aquantia driver will respond "None".

The code doesn't have the ability to do any better right now - since
we don't know what sets of interface modes _could_ be used by the PHY
and whether each interface mode may result in rate adaption.

To achieve that would mean reworking yet again all the phylink
validation from scratch, and probably reworking phylib and most of
the PHY drivers too so that they provide a lot more information
about their host interface behaviour.

I don't think there is an easy way to have a "perfect" solution
immediately - it's going to take a while to evolve - and probably
painfully evolve due to the slowness involved in updating all the
drivers that make use of phylink in some way.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-01-05 18:04:53

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 12:34, Vladimir Oltean wrote:
> On Thu, Jan 05, 2023 at 11:21:14AM -0500, Sean Anderson wrote:
>> > Your code walks through the speed_table[] of media speeds (from 10M up
>> > until the max speed of the SERDES) and sees whether the PHY was
>> > provisioned, for that speed, to use PAUSE rate adaptation.
>>
>> This is because we assume that if a phy supports rate matching for a phy
>> interface mode, then it supports rate matching to all slower speeds that
>> it otherwise supports. This seemed like a pretty reasonable assumption
>> when I wrote the code, but it turns out that some firmwares don't abide
>> by this. This is firstly a problem with the firmware (as it should be
>> configured so that Linux can use the phy's features), but we have to be
>> careful not to end up with an unsupported combination.
>
> When you say "problem with the firmware", you're referring specifically
> to my example (10GBASE-R for >1G speeds, SGMII for <=1G speeds)?

Actually, I am mostly referring to cases where rate adaptation is set up
to use a phy interface mode which isn't supported. In particular, Tim
has a board where the phy is set up to rate adapt using 5GBASE-R (-X?),
even though the host only supports 10GBASE-R.

> Why do you consider this a firmware misconfiguration? Let's say the host
> supports both 10GBASE-R and SGMII, but the system designer preferred not
> to use PAUSE-based rate adaptation for the speeds where native rate
> adaptation was available.

This is supported, you just can't get 5G or 2.5G.

>> > If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
>> > media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
>> > and 10M, a call to your implementation of
>> > aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
>> > RATE_MATCH_NONE, right?
>>
>> Correct.
>>
>> > So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
>> > would be advertised on the media side?
>>
>> If the host only supports 10GBASE-R and nothing else. If the host
>> supports SGMII as well, we will advertise 10G, 1G, 100M, and 10M. But
>> really, this is a problem with the firmware, since if the host supports
>> only 10GBASE-R, then the firmware should be set up to rate adapt to all
>> speeds.
>
> So we lose the advertisement of 5G and 2.5G, even if the firmware is
> provisioned for them via 10GBASE-R rate adaptation, right? Because when
> asked "What kind of rate matching is supported for 10GBASE-R?", the
> Aquantia driver will respond "None".

Correct.

>> > Shouldn't you take into consideration in your aqr107_rate_adapt_ok()
>> > function only the media side link speeds for which the PHY was actually
>> > *configured* to use the SERDES protocol @iface?
>>
>> No, because we don't know what phy interface modes are actually going to
>> be used. The phy doesn't know whether e.g. the host supports both
>> 10GBASE-R and SGMII or whether it only supports 10GBASE-R. With the
>> current API we cannot say "I support 5G" without also saying "I support
>> 1G". If you don't like this, please send a patch for an API returning
>> supported speeds for a phy interface mode.

Again, this is to comply with the existing API assumptions. The current
code is buggy. Of course, another way around this is to modify the API.
I have chosen this route because I don't have a situation like you
described. But if support for that is important to you, I encourage you
to refactor things.

--Sean

2023-01-05 18:06:07

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 02:40:50PM +0000, Russell King (Oracle) wrote:
> > If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
> > media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
> > and 10M, a call to your implementation of
> > aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
> > RATE_MATCH_NONE, right? So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
> > would be advertised on the media side?
>
> No, beause of the special condition in phylink that if it's a clause 45
> PHY and we use something like 10GBASE-R, we don't limit to just 10G
> speed, but try all interface modes - on the assumption that the PHY
> will switch its host interface.
>
> RATE_MATCH_NONE doesn't state anything about whether the PHY operates
> in a single interface mode or not - with 10G PHYs (and thus clause 45
> PHYs) it seems very common from current observations for
> implementations to do this kind of host-interface switching.

So you mention commits
7642cc28fd37 ("net: phylink: fix PHY validation with rate adaption") and
df3f57ac9605 ("net: phylink: extend clause 45 PHY validation workaround").

IIUC, these allow the advertised capabilities to be more than 10G (based
on supported_interfaces), on the premise that it's possible for the PHY
to switch SERDES protocol to achieve lower speeds.

This does partly correct the last part of my question, but I believe
that the essence of it still remains. We won't make use of PAUSE rate
adaptation to support the speeds which aren't directly covered by the
supported_interfaces. Aren't we interpreting the PHY provisioning somewhat
too conservatively in this case, or do you believe that this is just an
academic concern?

2023-01-05 18:06:11

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 11:21:14AM -0500, Sean Anderson wrote:
> > Your code walks through the speed_table[] of media speeds (from 10M up
> > until the max speed of the SERDES) and sees whether the PHY was
> > provisioned, for that speed, to use PAUSE rate adaptation.
>
> This is because we assume that if a phy supports rate matching for a phy
> interface mode, then it supports rate matching to all slower speeds that
> it otherwise supports. This seemed like a pretty reasonable assumption
> when I wrote the code, but it turns out that some firmwares don't abide
> by this. This is firstly a problem with the firmware (as it should be
> configured so that Linux can use the phy's features), but we have to be
> careful not to end up with an unsupported combination.

When you say "problem with the firmware", you're referring specifically
to my example (10GBASE-R for >1G speeds, SGMII for <=1G speeds)?

Why do you consider this a firmware misconfiguration? Let's say the host
supports both 10GBASE-R and SGMII, but the system designer preferred not
to use PAUSE-based rate adaptation for the speeds where native rate
adaptation was available.

> > If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
> > media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
> > and 10M, a call to your implementation of
> > aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
> > RATE_MATCH_NONE, right?
>
> Correct.
>
> > So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
> > would be advertised on the media side?
>
> If the host only supports 10GBASE-R and nothing else. If the host
> supports SGMII as well, we will advertise 10G, 1G, 100M, and 10M. But
> really, this is a problem with the firmware, since if the host supports
> only 10GBASE-R, then the firmware should be set up to rate adapt to all
> speeds.

So we lose the advertisement of 5G and 2.5G, even if the firmware is
provisioned for them via 10GBASE-R rate adaptation, right? Because when
asked "What kind of rate matching is supported for 10GBASE-R?", the
Aquantia driver will respond "None".

> > Shouldn't you take into consideration in your aqr107_rate_adapt_ok()
> > function only the media side link speeds for which the PHY was actually
> > *configured* to use the SERDES protocol @iface?
>
> No, because we don't know what phy interface modes are actually going to
> be used. The phy doesn't know whether e.g. the host supports both
> 10GBASE-R and SGMII or whether it only supports 10GBASE-R. With the
> current API we cannot say "I support 5G" without also saying "I support
> 1G". If you don't like this, please send a patch for an API returning
> supported speeds for a phy interface mode.

2023-01-05 18:07:22

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 12:55, Vladimir Oltean wrote:
> On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
>> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
>> > Again, this is to comply with the existing API assumptions. The current
>> > code is buggy. Of course, another way around this is to modify the API.
>> > I have chosen this route because I don't have a situation like you
>> > described. But if support for that is important to you, I encourage you
>> > to refactor things.
>>
>> I don't think I'm aware of a practical situation like that either.
>> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
>> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
>> As for Layerscape boards, SERDES protocol switching is a very new concept there,
>> so they're all going to be provisioned for PAUSE all the way down
>> (or USXGMII, where that is available).
>>
>> I just pointed this out because it jumped out to me. I don't have
>> something against this patch getting accepted as it is.
>
> A real-life (albeit niche) scenario where someone might have an Aquantia
> firmware provisioned like this would be a 10G capable port that also
> wants to support half duplex at 10/100 speeds. Although I'm not quite
> sure who cares about half duplex all that much these days.

IMO if we really want to support this, the easier way would be to teach
the phy driver how to change the rate adaptation mode. That way we could
always advertise rate adaptation, but if someone came along and
requested 10HD we could reconfigure the phy to support it. However, this
was deemed too risky in the discussion for v1, since we don't really
know how the firmware interacts with the registers.

--Sean

2023-01-05 18:19:24

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
> > Again, this is to comply with the existing API assumptions. The current
> > code is buggy. Of course, another way around this is to modify the API.
> > I have chosen this route because I don't have a situation like you
> > described. But if support for that is important to you, I encourage you
> > to refactor things.
>
> I don't think I'm aware of a practical situation like that either.
> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
> As for Layerscape boards, SERDES protocol switching is a very new concept there,
> so they're all going to be provisioned for PAUSE all the way down
> (or USXGMII, where that is available).
>
> I just pointed this out because it jumped out to me. I don't have
> something against this patch getting accepted as it is.

A real-life (albeit niche) scenario where someone might have an Aquantia
firmware provisioned like this would be a 10G capable port that also
wants to support half duplex at 10/100 speeds. Although I'm not quite
sure who cares about half duplex all that much these days.

2023-01-05 18:28:43

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 13:11, Vladimir Oltean wrote:
> I think I would prefer not exporting anything rate adaptation related to
> user space, at least until things clean up a little and we're confident
> that we don't need to radically change how it works.

Currently, we have a rate adaptation field for get_ksettings, which
indicates whether rate adaptation is occurring. To really support the
above case, we'd probably want a way for userspace to express a
preference for e.g. rate-adapted 10M over "native" 10M. I agree that we
should hold off on this until we're more confident in the
implementation.

--Sean

2023-01-05 18:34:59

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 01:03:49PM -0500, Sean Anderson wrote:
> IMO if we really want to support this, the easier way would be to teach
> the phy driver how to change the rate adaptation mode. That way we could
> always advertise rate adaptation, but if someone came along and
> requested 10HD we could reconfigure the phy to support it. However, this
> was deemed too risky in the discussion for v1, since we don't really
> know how the firmware interacts with the registers.

I think I would prefer not exporting anything rate adaptation related to
user space, at least until things clean up a little and we're confident
that we don't need to radically change how it works.

2023-01-05 18:35:06

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
> Again, this is to comply with the existing API assumptions. The current
> code is buggy. Of course, another way around this is to modify the API.
> I have chosen this route because I don't have a situation like you
> described. But if support for that is important to you, I encourage you
> to refactor things.

I don't think I'm aware of a practical situation like that either.
I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
As for Layerscape boards, SERDES protocol switching is a very new concept there,
so they're all going to be provisioned for PAUSE all the way down
(or USXGMII, where that is available).

I just pointed this out because it jumped out to me. I don't have
something against this patch getting accepted as it is.

2023-01-05 19:13:14

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 07:43:42PM +0200, Vladimir Oltean wrote:
> On Thu, Jan 05, 2023 at 02:40:50PM +0000, Russell King (Oracle) wrote:
> > > If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
> > > media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
> > > and 10M, a call to your implementation of
> > > aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
> > > RATE_MATCH_NONE, right? So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
> > > would be advertised on the media side?
> >
> > No, beause of the special condition in phylink that if it's a clause 45
> > PHY and we use something like 10GBASE-R, we don't limit to just 10G
> > speed, but try all interface modes - on the assumption that the PHY
> > will switch its host interface.
> >
> > RATE_MATCH_NONE doesn't state anything about whether the PHY operates
> > in a single interface mode or not - with 10G PHYs (and thus clause 45
> > PHYs) it seems very common from current observations for
> > implementations to do this kind of host-interface switching.
>
> So you mention commits
> 7642cc28fd37 ("net: phylink: fix PHY validation with rate adaption") and
> df3f57ac9605 ("net: phylink: extend clause 45 PHY validation workaround").
>
> IIUC, these allow the advertised capabilities to be more than 10G (based
> on supported_interfaces), on the premise that it's possible for the PHY
> to switch SERDES protocol to achieve lower speeds.

I didn't mention any commits, but yes, it's ever since the second commit
you list above, which was necessary to get PHYs which switch their
interface mode to work sanely. It essentially allows everything that
the combination of host and PHY supports, because we couldn't do much
better at the time that commit was written.

> This does partly correct the last part of my question, but I believe
> that the essence of it still remains. We won't make use of PAUSE rate
> adaptation to support the speeds which aren't directly covered by the
> supported_interfaces. Aren't we interpreting the PHY provisioning somewhat
> too conservatively in this case, or do you believe that this is just an
> academic concern?

Do you have a better idea how to come up with a list of link modes that
the PHY should advertise to its link partner and also report as
supported given the combination of:

- PHYs that switch their host interface
- PHYs that may support some kind of rate adaption
- PCS/MACs that may support half-duplex at some speeds
- PCS/MACs that might support pause modes, and might support them only
with certain interface modes

Layered on top of that is being able to determine which interface a PHY/
PCS/MAC should be using when e.g. a 10G copper PHY is inserted (which
could be inserted into a host which only supports up to 1G.)

I've spent considerable time trying to work out a solution to this, and
even before we had rate adaption, it isn't easy to solve. I've
experimented with several different solutions, and it's from numerous
trials that led to this host_interfaces/mac_capabilities structure -
but that still doesn't let us solve the problems I mention above since
we have no idea what the PHY itself is capable of, or how it's going to
behave, or really which interface modes it might switch between if it's
a clause 45 PHY.

I've experimented with adding phy->supported_interfaces so a phylib
driver can advertise what interfaces it supports. I've also
experimented with phy->possible_interfaces which reports the interface
modes that the PHY _is_ going to switch between having selected its
operating mode. I've not submitted them because even with this, it all
still seems rather inadequate - and there is a huge amount of work to
update all the phylib drivers to provide even that basic information,
let alone have much confidence that it is correct.

You can find these experiments, as normal, in my net-queue branch in
my git tree. These date from before we had rate adaption, so they take
no account of the recent addition of this extra variable.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-01-05 19:14:40

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 13:55, Russell King (Oracle) wrote:
> On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
>> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
>> > Again, this is to comply with the existing API assumptions. The current
>> > code is buggy. Of course, another way around this is to modify the API.
>> > I have chosen this route because I don't have a situation like you
>> > described. But if support for that is important to you, I encourage you
>> > to refactor things.
>>
>> I don't think I'm aware of a practical situation like that either.
>> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
>> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
>
> 88x3310 can dynamically switch between 10GBASE-R, 5GBASE-R, 2500BASE-X
> and SGMII if rate adaption is not being used (and the rate adaption
> method it supports in non-MACSEC PHYs is only via increasing the IPG on
> the MAC... which currently no MAC driver supports.)
>

As an aside, do you know of any MACs which support open-loop rate
matching to below ~95% of the line rate (the amount necessary for
10GBASE-W)?

--Sean

2023-01-05 19:17:29

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 01:03:49PM -0500, Sean Anderson wrote:
> On 1/5/23 12:55, Vladimir Oltean wrote:
> > On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
> >> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
> >> > Again, this is to comply with the existing API assumptions. The current
> >> > code is buggy. Of course, another way around this is to modify the API.
> >> > I have chosen this route because I don't have a situation like you
> >> > described. But if support for that is important to you, I encourage you
> >> > to refactor things.
> >>
> >> I don't think I'm aware of a practical situation like that either.
> >> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
> >> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
> >> As for Layerscape boards, SERDES protocol switching is a very new concept there,
> >> so they're all going to be provisioned for PAUSE all the way down
> >> (or USXGMII, where that is available).
> >>
> >> I just pointed this out because it jumped out to me. I don't have
> >> something against this patch getting accepted as it is.
> >
> > A real-life (albeit niche) scenario where someone might have an Aquantia
> > firmware provisioned like this would be a 10G capable port that also
> > wants to support half duplex at 10/100 speeds. Although I'm not quite
> > sure who cares about half duplex all that much these days.
>
> IMO if we really want to support this, the easier way would be to teach
> the phy driver how to change the rate adaptation mode. That way we could
> always advertise rate adaptation, but if someone came along and
> requested 10HD we could reconfigure the phy to support it. However, this
> was deemed too risky in the discussion for v1, since we don't really
> know how the firmware interacts with the registers.

I'm not sure about "someone came along and requested 10HD". Don't you
mean "if someone plugged the RJ45 into a 10bT hub which only supports
10HD" ? Or are you suggesting that we shouldn't advertise 10HD and
100HD along with everything else, and then switch into this special
mode if someone wants to advertise these and disable all other link
modes?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-01-05 19:17:29

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
> > Again, this is to comply with the existing API assumptions. The current
> > code is buggy. Of course, another way around this is to modify the API.
> > I have chosen this route because I don't have a situation like you
> > described. But if support for that is important to you, I encourage you
> > to refactor things.
>
> I don't think I'm aware of a practical situation like that either.
> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.

88x3310 can dynamically switch between 10GBASE-R, 5GBASE-R, 2500BASE-X
and SGMII if rate adaption is not being used (and the rate adaption
method it supports in non-MACSEC PHYs is only via increasing the IPG on
the MAC... which currently no MAC driver supports.)

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-01-05 19:37:21

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 13:58, Russell King (Oracle) wrote:
> On Thu, Jan 05, 2023 at 01:03:49PM -0500, Sean Anderson wrote:
>> On 1/5/23 12:55, Vladimir Oltean wrote:
>> > On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
>> >> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
>> >> > Again, this is to comply with the existing API assumptions. The current
>> >> > code is buggy. Of course, another way around this is to modify the API.
>> >> > I have chosen this route because I don't have a situation like you
>> >> > described. But if support for that is important to you, I encourage you
>> >> > to refactor things.
>> >>
>> >> I don't think I'm aware of a practical situation like that either.
>> >> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
>> >> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
>> >> As for Layerscape boards, SERDES protocol switching is a very new concept there,
>> >> so they're all going to be provisioned for PAUSE all the way down
>> >> (or USXGMII, where that is available).
>> >>
>> >> I just pointed this out because it jumped out to me. I don't have
>> >> something against this patch getting accepted as it is.
>> >
>> > A real-life (albeit niche) scenario where someone might have an Aquantia
>> > firmware provisioned like this would be a 10G capable port that also
>> > wants to support half duplex at 10/100 speeds. Although I'm not quite
>> > sure who cares about half duplex all that much these days.
>>
>> IMO if we really want to support this, the easier way would be to teach
>> the phy driver how to change the rate adaptation mode. That way we could
>> always advertise rate adaptation, but if someone came along and
>> requested 10HD we could reconfigure the phy to support it. However, this
>> was deemed too risky in the discussion for v1, since we don't really
>> know how the firmware interacts with the registers.
>
> I'm not sure about "someone came along and requested 10HD". Don't you
> mean "if someone plugged the RJ45 into a 10bT hub which only supports
> 10HD" ? Or are you suggesting that we shouldn't advertise 10HD and
> 100HD along with everything else, and then switch into this special
> mode if someone wants to advertise these and disable all other link
> modes?
>

The former. "someone" being userspace or the remote end.

--Sean

2023-01-05 19:37:31

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/5/23 14:06, Russell King (Oracle) wrote:
> On Thu, Jan 05, 2023 at 01:59:27PM -0500, Sean Anderson wrote:
>> On 1/5/23 13:55, Russell King (Oracle) wrote:
>> > On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
>> >> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
>> >> > Again, this is to comply with the existing API assumptions. The current
>> >> > code is buggy. Of course, another way around this is to modify the API.
>> >> > I have chosen this route because I don't have a situation like you
>> >> > described. But if support for that is important to you, I encourage you
>> >> > to refactor things.
>> >>
>> >> I don't think I'm aware of a practical situation like that either.
>> >> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
>> >> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
>> >
>> > 88x3310 can dynamically switch between 10GBASE-R, 5GBASE-R, 2500BASE-X
>> > and SGMII if rate adaption is not being used (and the rate adaption
>> > method it supports in non-MACSEC PHYs is only via increasing the IPG on
>> > the MAC... which currently no MAC driver supports.)
>> >
>>
>> As an aside, do you know of any MACs which support open-loop rate
>> matching to below ~95% of the line rate (the amount necessary for
>> 10GBASE-W)?
>
> I'm afraid I haven't paid too much attention to BASE-W, and I'm not
> aware of anything within the realms of phylink/phylib supporting MAC
> drivers having anything for it. I don't even remember mention of it
> in any SoC datasheets.

The mEMAC supports "WAN mode" which does open-loop rate matching, but
it can really only adapt down to 9.5 GBit/s or so.

> Are you aware of a 10GBASE-W setup?

No.

--Sean

2023-01-05 20:13:48

by Russell King (Oracle)

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 01:59:27PM -0500, Sean Anderson wrote:
> On 1/5/23 13:55, Russell King (Oracle) wrote:
> > On Thu, Jan 05, 2023 at 07:52:06PM +0200, Vladimir Oltean wrote:
> >> On Thu, Jan 05, 2023 at 12:43:47PM -0500, Sean Anderson wrote:
> >> > Again, this is to comply with the existing API assumptions. The current
> >> > code is buggy. Of course, another way around this is to modify the API.
> >> > I have chosen this route because I don't have a situation like you
> >> > described. But if support for that is important to you, I encourage you
> >> > to refactor things.
> >>
> >> I don't think I'm aware of a practical situation like that either.
> >> I remember seeing some S32G boards with Aquantia PHYs which use 2500BASE-X
> >> for 2.5G and SGMII for <=1G, but that's about it in terms of protocol switching.
> >
> > 88x3310 can dynamically switch between 10GBASE-R, 5GBASE-R, 2500BASE-X
> > and SGMII if rate adaption is not being used (and the rate adaption
> > method it supports in non-MACSEC PHYs is only via increasing the IPG on
> > the MAC... which currently no MAC driver supports.)
> >
>
> As an aside, do you know of any MACs which support open-loop rate
> matching to below ~95% of the line rate (the amount necessary for
> 10GBASE-W)?

I'm afraid I haven't paid too much attention to BASE-W, and I'm not
aware of anything within the realms of phylink/phylib supporting MAC
drivers having anything for it. I don't even remember mention of it
in any SoC datasheets.

Are you aware of a 10GBASE-W setup?

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!

2023-01-06 14:51:58

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 06:51:33PM +0000, Russell King (Oracle) wrote:
> On Thu, Jan 05, 2023 at 07:43:42PM +0200, Vladimir Oltean wrote:
> > On Thu, Jan 05, 2023 at 02:40:50PM +0000, Russell King (Oracle) wrote:
> > > > If the PHY firmware uses a combination like this: 10GBASE-R/XFI for
> > > > media speeds of 10G, 5G, 2.5G (rate adapted), and SGMII for 1G, 100M
> > > > and 10M, a call to your implementation of
> > > > aqr107_get_rate_matching(PHY_INTERFACE_MODE_10GBASER) would return
> > > > RATE_MATCH_NONE, right? So only ETHTOOL_LINK_MODE_10000baseT_Full_BIT
> > > > would be advertised on the media side?
> > >
> > > No, beause of the special condition in phylink that if it's a clause 45
> > > PHY and we use something like 10GBASE-R, we don't limit to just 10G
> > > speed, but try all interface modes - on the assumption that the PHY
> > > will switch its host interface.
> > >
> > > RATE_MATCH_NONE doesn't state anything about whether the PHY operates
> > > in a single interface mode or not - with 10G PHYs (and thus clause 45
> > > PHYs) it seems very common from current observations for
> > > implementations to do this kind of host-interface switching.
> >
> > So you mention commits
> > 7642cc28fd37 ("net: phylink: fix PHY validation with rate adaption") and
> > df3f57ac9605 ("net: phylink: extend clause 45 PHY validation workaround").
> >
> > IIUC, these allow the advertised capabilities to be more than 10G (based
> > on supported_interfaces), on the premise that it's possible for the PHY
> > to switch SERDES protocol to achieve lower speeds.
>
> I didn't mention any commits, but yes, it's ever since the second commit
> you list above, which was necessary to get PHYs which switch their
> interface mode to work sanely. It essentially allows everything that
> the combination of host and PHY supports, because we couldn't do much
> better at the time that commit was written.
>
> > This does partly correct the last part of my question, but I believe
> > that the essence of it still remains. We won't make use of PAUSE rate
> > adaptation to support the speeds which aren't directly covered by the
> > supported_interfaces. Aren't we interpreting the PHY provisioning somewhat
> > too conservatively in this case, or do you believe that this is just an
> > academic concern?
>
> Do you have a better idea how to come up with a list of link modes that
> the PHY should advertise to its link partner and also report as
> supported given the combination of:
>
> - PHYs that switch their host interface
> - PHYs that may support some kind of rate adaption
> - PCS/MACs that may support half-duplex at some speeds
> - PCS/MACs that might support pause modes, and might support them only
> with certain interface modes
>
> Layered on top of that is being able to determine which interface a PHY/
> PCS/MAC should be using when e.g. a 10G copper PHY is inserted (which
> could be inserted into a host which only supports up to 1G.)
>
> I've spent considerable time trying to work out a solution to this, and
> even before we had rate adaption, it isn't easy to solve. I've
> experimented with several different solutions, and it's from numerous
> trials that led to this host_interfaces/mac_capabilities structure -
> but that still doesn't let us solve the problems I mention above since
> we have no idea what the PHY itself is capable of, or how it's going to
> behave, or really which interface modes it might switch between if it's
> a clause 45 PHY.
>
> I've experimented with adding phy->supported_interfaces so a phylib
> driver can advertise what interfaces it supports. I've also
> experimented with phy->possible_interfaces which reports the interface
> modes that the PHY _is_ going to switch between having selected its
> operating mode. I've not submitted them because even with this, it all
> still seems rather inadequate - and there is a huge amount of work to
> update all the phylib drivers to provide even that basic information,
> let alone have much confidence that it is correct.
>
> You can find these experiments, as normal, in my net-queue branch in
> my git tree. These date from before we had rate adaption, so they take
> no account of the recent addition of this extra variable.

Don't we actually need an API for the PHY resembling the following?

struct phy_host_cfg {
phy_interface_t interface;
int rate_matching;
};

/* Caller must kfree() @host_cfg */
int phy_get_host_cfg_for_linkmode(struct phy_device *phydev,
enum ethtool_link_mode_bit_indices linkmode,
struct phy_host_cfg **host_cfg,
int *num_host_cfg)
{
if (!phydev->drv->get_host_cfg_for_linkmode) {
/* Assume that PHYs can't change host interface and don't
* support rate matching
*/
*host_cfg = kcalloc(sizeof(*host_cfg), GFP_KERNEL);
*num_host_cfg = 1;
*host_cfg[0].interface = phydev->interface;
*host_cfg[0].rate_matching = RATE_MATCH_NONE;

return 0;
}

return phydev->drv->get_host_cfg_for_linkmode(phydev, linkmode,
host_cfg, num_host_cfg);
}

/* Calling this is only necessary if @num_host_cfg returned by
* phy_get_host_cfg_for_linkmode() is larger than 1.
*/
int phy_set_host_cfg_for_linkmode(struct phy_device *phydev,
enum ethtool_link_mode_bit_indices linkmode,
const struct phy_host_cfg *host_cfg)
{
if (!phydev->drv->set_host_cfg_for_linkmode)
return -EOPNOTSUPP;

return phydev->drv->set_host_cfg_for_linkmode(phydev, linkmode,
host_cfg);
}

Based on the host_cfg array returned by the PHY for each link mode,
phylink could figure out (by intersecting with the MAC/PCS's
host_interfaces/mac_capabilities) what should be advertised and what
shouldn't.

2023-01-06 23:20:54

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Thu, Jan 05, 2023 at 05:46:48PM +0000, Russell King (Oracle) wrote:
> On Thu, Jan 05, 2023 at 07:34:45PM +0200, Vladimir Oltean wrote:
> > So we lose the advertisement of 5G and 2.5G, even if the firmware is
> > provisioned for them via 10GBASE-R rate adaptation, right? Because when
> > asked "What kind of rate matching is supported for 10GBASE-R?", the
> > Aquantia driver will respond "None".
>
> The code doesn't have the ability to do any better right now - since
> we don't know what sets of interface modes _could_ be used by the PHY
> and whether each interface mode may result in rate adaption.
>
> To achieve that would mean reworking yet again all the phylink
> validation from scratch, and probably reworking phylib and most of
> the PHY drivers too so that they provide a lot more information
> about their host interface behaviour.
>
> I don't think there is an easy way to have a "perfect" solution
> immediately - it's going to take a while to evolve - and probably
> painfully evolve due to the slowness involved in updating all the
> drivers that make use of phylink in some way.

Serious question. What do we gain in practical terms with this patch set
applied? With certain firmware provisioning, some unsupported link modes
won't be advertised anymore. But also, with other firmware, some supported
link modes won't be advertised anymore.

IIUC, Tim Harvey's firmware ultimately had incorrect provisioning, it's
not like the existing code prevents his use case from working.

2023-01-06 23:29:50

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/6/23 18:03, Vladimir Oltean wrote:
> On Thu, Jan 05, 2023 at 05:46:48PM +0000, Russell King (Oracle) wrote:
>> On Thu, Jan 05, 2023 at 07:34:45PM +0200, Vladimir Oltean wrote:
>> > So we lose the advertisement of 5G and 2.5G, even if the firmware is
>> > provisioned for them via 10GBASE-R rate adaptation, right? Because when
>> > asked "What kind of rate matching is supported for 10GBASE-R?", the
>> > Aquantia driver will respond "None".
>>
>> The code doesn't have the ability to do any better right now - since
>> we don't know what sets of interface modes _could_ be used by the PHY
>> and whether each interface mode may result in rate adaption.
>>
>> To achieve that would mean reworking yet again all the phylink
>> validation from scratch, and probably reworking phylib and most of
>> the PHY drivers too so that they provide a lot more information
>> about their host interface behaviour.
>>
>> I don't think there is an easy way to have a "perfect" solution
>> immediately - it's going to take a while to evolve - and probably
>> painfully evolve due to the slowness involved in updating all the
>> drivers that make use of phylink in some way.
>
> Serious question. What do we gain in practical terms with this patch set
> applied? With certain firmware provisioning, some unsupported link modes
> won't be advertised anymore. But also, with other firmware, some supported
> link modes won't be advertised anymore.

Well, before the rate adaptation series, none of this would be
advertised. I would rather add advertisement only for what we can
actually support. We can always come back later and add additional
support.

> IIUC, Tim Harvey's firmware ultimately had incorrect provisioning, it's
> not like the existing code prevents his use case from working.

The existing code isn't great as-is, since all the user sees is that we
e.g. negotiated for 1G, but the link never came up.

--Sean

2023-01-06 23:55:40

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Fri, Jan 06, 2023 at 06:21:26PM -0500, Sean Anderson wrote:
> On 1/6/23 18:03, Vladimir Oltean wrote:
> > On Thu, Jan 05, 2023 at 05:46:48PM +0000, Russell King (Oracle) wrote:
> >> On Thu, Jan 05, 2023 at 07:34:45PM +0200, Vladimir Oltean wrote:
> >> > So we lose the advertisement of 5G and 2.5G, even if the firmware is
> >> > provisioned for them via 10GBASE-R rate adaptation, right? Because when
> >> > asked "What kind of rate matching is supported for 10GBASE-R?", the
> >> > Aquantia driver will respond "None".
> >>
> >> The code doesn't have the ability to do any better right now - since
> >> we don't know what sets of interface modes _could_ be used by the PHY
> >> and whether each interface mode may result in rate adaption.
> >>
> >> To achieve that would mean reworking yet again all the phylink
> >> validation from scratch, and probably reworking phylib and most of
> >> the PHY drivers too so that they provide a lot more information
> >> about their host interface behaviour.
> >>
> >> I don't think there is an easy way to have a "perfect" solution
> >> immediately - it's going to take a while to evolve - and probably
> >> painfully evolve due to the slowness involved in updating all the
> >> drivers that make use of phylink in some way.
> >
> > Serious question. What do we gain in practical terms with this patch set
> > applied? With certain firmware provisioning, some unsupported link modes
> > won't be advertised anymore. But also, with other firmware, some supported
> > link modes won't be advertised anymore.
>
> Well, before the rate adaptation series, none of this would be
> advertised. I would rather add advertisement only for what we can
> actually support. We can always come back later and add additional
> support.

Well, yes. But practically, does it matter that we are negotiating a
link speed that we don't support, when the effect is the same (link
doesn't come up)? The only practical case I see is where advertising
e.g. an unsupported 2.5G would cause the link to not establish at a
supported 1G. But as you say, I don't think this will be the case with
the firmware provisioning that Tim gave as an example?

> > IIUC, Tim Harvey's firmware ultimately had incorrect provisioning, it's
> > not like the existing code prevents his use case from working.
>
> The existing code isn't great as-is, since all the user sees is that we
> e.g. negotiated for 1G, but the link never came up.
>
> --Sean

2023-01-09 19:23:36

by Tim Harvey

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On Fri, Jan 6, 2023 at 3:21 PM Sean Anderson <[email protected]> wrote:
>
> On 1/6/23 18:03, Vladimir Oltean wrote:
> > On Thu, Jan 05, 2023 at 05:46:48PM +0000, Russell King (Oracle) wrote:
> >> On Thu, Jan 05, 2023 at 07:34:45PM +0200, Vladimir Oltean wrote:
> >> > So we lose the advertisement of 5G and 2.5G, even if the firmware is
> >> > provisioned for them via 10GBASE-R rate adaptation, right? Because when
> >> > asked "What kind of rate matching is supported for 10GBASE-R?", the
> >> > Aquantia driver will respond "None".
> >>
> >> The code doesn't have the ability to do any better right now - since
> >> we don't know what sets of interface modes _could_ be used by the PHY
> >> and whether each interface mode may result in rate adaption.
> >>
> >> To achieve that would mean reworking yet again all the phylink
> >> validation from scratch, and probably reworking phylib and most of
> >> the PHY drivers too so that they provide a lot more information
> >> about their host interface behaviour.
> >>
> >> I don't think there is an easy way to have a "perfect" solution
> >> immediately - it's going to take a while to evolve - and probably
> >> painfully evolve due to the slowness involved in updating all the
> >> drivers that make use of phylink in some way.
> >
> > Serious question. What do we gain in practical terms with this patch set
> > applied? With certain firmware provisioning, some unsupported link modes
> > won't be advertised anymore. But also, with other firmware, some supported
> > link modes won't be advertised anymore.
>
> Well, before the rate adaptation series, none of this would be
> advertised. I would rather add advertisement only for what we can
> actually support. We can always come back later and add additional
> support.
>
> > IIUC, Tim Harvey's firmware ultimately had incorrect provisioning, it's
> > not like the existing code prevents his use case from working.

Correct - the firmware I was provided was mis-configured.

Tim

>
> The existing code isn't great as-is, since all the user sees is that we
> e.g. negotiated for 1G, but the link never came up.
>
> --Sean

2023-01-19 18:31:02

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 0/4] phy: aquantia: Determine rate adaptation support from registers

On 1/3/23 17:05, Sean Anderson wrote:
> This attempts to address the problems first reported in [1]. Tim has an
> Aquantia phy where the firmware is set up to use "5G XFI" (underclocked
> 10GBASE-R) when rate adapting lower speeds. This results in us
> advertising that we support lower speeds and then failing to bring the
> link up. To avoid this, determine whether to enable rate adaptation
> based on what's programmed by the firmware. This is "the worst choice"
> [2], but we can't really do better until we have more insight into
> what the firmware is doing. At the very least, we can prevent bad
> firmware from causing us to advertise the wrong modes.
>
> Past submissions may be found at [3, 4].
>
> [1] https://lore.kernel.org/netdev/CAJ+vNU3zeNqiGhjTKE8jRjDYR0D7f=iqPLB8phNyA2CWixy7JA@mail.gmail.com/
> [2] https://lore.kernel.org/netdev/20221118171643.vu6uxbnmog4sna65@skbuf/
> [3] https://lore.kernel.org/netdev/[email protected]/
> [4] https://lore.kernel.org/netdev/[email protected]/
>
> Changes in v5:
> - Add missing PMA/PMD speed bits
> - Don't handle PHY_INTERFACE_MODE_NA, and simplify logic
>
> Changes in v4:
> - Reorganize MDIO defines
> - Fix kerneldoc using - instead of : for parameters
>
> Changes in v3:
> - Update speed register bits
> - Fix incorrect bits for PMA/PMD speed
>
> Changes in v2:
> - Move/rename phylink_interface_max_speed
> - Rework to just validate things instead of modifying registers
>
> Sean Anderson (4):
> net: phy: Move/rename phylink_interface_max_speed
> phy: mdio: Reorganize defines
> net: mdio: Update speed register bits
> phy: aquantia: Determine rate adaptation support from registers
>
> drivers/net/phy/aquantia_main.c | 136 ++++++++++++++++++++++++++++++--
> drivers/net/phy/phy-core.c | 70 ++++++++++++++++
> drivers/net/phy/phylink.c | 75 +-----------------
> include/linux/phy.h | 1 +
> include/uapi/linux/mdio.h | 118 ++++++++++++++++++---------
> 5 files changed, 282 insertions(+), 118 deletions(-)
>

Although there was a lot of discussion about the final patch, I think the
first 3 are good changes. Can we apply them as-is? Should I resubmit?

--Sean

2023-01-19 18:51:11

by Sean Anderson

[permalink] [raw]
Subject: Re: [PATCH net-next v5 4/4] phy: aquantia: Determine rate adaptation support from registers

On 1/6/23 18:29, Vladimir Oltean wrote:
> On Fri, Jan 06, 2023 at 06:21:26PM -0500, Sean Anderson wrote:
>> On 1/6/23 18:03, Vladimir Oltean wrote:
>> > On Thu, Jan 05, 2023 at 05:46:48PM +0000, Russell King (Oracle) wrote:
>> >> On Thu, Jan 05, 2023 at 07:34:45PM +0200, Vladimir Oltean wrote:
>> >> > So we lose the advertisement of 5G and 2.5G, even if the firmware is
>> >> > provisioned for them via 10GBASE-R rate adaptation, right? Because when
>> >> > asked "What kind of rate matching is supported for 10GBASE-R?", the
>> >> > Aquantia driver will respond "None".
>> >>
>> >> The code doesn't have the ability to do any better right now - since
>> >> we don't know what sets of interface modes _could_ be used by the PHY
>> >> and whether each interface mode may result in rate adaption.
>> >>
>> >> To achieve that would mean reworking yet again all the phylink
>> >> validation from scratch, and probably reworking phylib and most of
>> >> the PHY drivers too so that they provide a lot more information
>> >> about their host interface behaviour.
>> >>
>> >> I don't think there is an easy way to have a "perfect" solution
>> >> immediately - it's going to take a while to evolve - and probably
>> >> painfully evolve due to the slowness involved in updating all the
>> >> drivers that make use of phylink in some way.
>> >
>> > Serious question. What do we gain in practical terms with this patch set
>> > applied? With certain firmware provisioning, some unsupported link modes
>> > won't be advertised anymore. But also, with other firmware, some supported
>> > link modes won't be advertised anymore.
>>
>> Well, before the rate adaptation series, none of this would be
>> advertised. I would rather add advertisement only for what we can
>> actually support. We can always come back later and add additional
>> support.
>
> Well, yes. But practically, does it matter that we are negotiating a
> link speed that we don't support, when the effect is the same (link
> doesn't come up)? The only practical case I see is where advertising
> e.g. an unsupported 2.5G would cause the link to not establish at a
> supported 1G. But as you say, I don't think this will be the case with
> the firmware provisioning that Tim gave as an example?

I suppose.

I still think we should try to prevent bad firmware from tripping us up.
At the very least, I think we could detect bad configurations and warn
about them, so the user knows it's the firmware and not us.

--Sean

>> > IIUC, Tim Harvey's firmware ultimately had incorrect provisioning, it's
>> > not like the existing code prevents his use case from working.
>>
>> The existing code isn't great as-is, since all the user sees is that we
>> e.g. negotiated for 1G, but the link never came up.
>>
>> --Sean