This adds support for phy rate adaptation: when a phy adapts between
differing phy interface and link speeds. It was originally submitted as
part of [1], which is considered "v1" of this series.
We need support for rate adaptation for two reasons. First, the phy
consumer needs to know if the phy will perform rate adaptation in order to
program the correct advertising. An unaware consumer will only program
support for link modes at the phy interface mode's native speed. This
will cause autonegotiation to fail if the link partner only advertises
support for lower speed link modes. Second, to reduce packet loss it may
be desirable to throttle packet throughput.
There have been several past discussions [2-4] around adding rate
adaptation support. One point is that we must be certain that rate
adaptation is possible before enabling it. It is the opinion of some
developers that it is the responsibility of the system integrator or end
user to set the link settings appropriately for rate adaptation. In
particular, it was argued that (due to differing firmware) it might not
be clear if a particular phy has rate adaptation enabled. Additionally,
upper-layer protocols must already be tolerant of packet loss caused by
differing rates. Packet loss may happen anyway, such as if a faster link
is used with a slower switch or repeater. So adjusting pause settings
for rate adaptation is not strictly necessary.
I believe that our current approach is limiting, especially when
considering that rate adaptation (in two forms) has made it into IEEE
standards. In general, when we have appropriate information we should
set sensible defaults. To consider use a contrasting example, we enable
pause frames by default for link partners which autonegotiate for them.
When it's the phy itself generating these frames, we don't even have to
autonegotiate to know that we should enable pause frames.
Our current approach also encourages workarounds, such as commit
73a21fa817f0 ("dpaa_eth: support all modes with rate adapting PHYs").
These workarounds are fine for phylib drivers, but phylink drivers cannot
use this approach (since there is no direct access to the phy).
Although in earlier versions of this series, userspace could disable
rate adaptation, now it is only possible to determine the current rate
adaptation type. Disabling or otherwise configuring rate adaptation has
been left for future work. However, because currently only
RATE_ADAPT_PAUSE is implemented, it is possible to disable rate
adaptation by modifying the advertisement appropriately.
[1] https://lore.kernel.org/netdev/[email protected]/T/#t
[2] https://lore.kernel.org/netdev/[email protected]/
[3] https://lore.kernel.org/netdev/[email protected]/
[4] https://lore.kernel.org/netdev/[email protected]/
Changes in v3:
- Document MAC_(A)SYM_PAUSE
- Add some helpers for working with mac caps
- Modify link settings directly in phylink_link_up, instead of doing
things more indirectly via link_*.
- Add phylink_cap_from_speed_duplex to look up the mac capability
corresponding to the interface's speed.
- Include RATE_ADAPT_CRS; it's a few lines and it doesn't hurt.
- Move unused defines to next commit (where they will be used)
- Remove "Support differing link/interface speed/duplex". It has been
rendered unnecessary due to simplification of the rate adaptation
patches. Thanks Russell!
- Rewrite cover letter to better reflect the opinions of the developers
involved
Changes in v2:
- Use int/defines instead of enum to allow for use in ioctls/netlink
- Add locking to phy_get_rate_adaptation
- Add (read-only) ethtool support for rate adaptation
- Move part of commit message to cover letter, as it gives a good
overview of the whole series, and allows this patch to focus more on
the specifics.
- Use the phy's rate adaptation setting to determine whether to use its
link speed/duplex or the MAC's speed/duplex with MLO_AN_INBAND.
- Always use the rate adaptation setting to determine the interface
speed/duplex (instead of sometimes using the interface mode).
- Determine the interface speed and max mac speed directly instead of
guessing based on the caps.
- Add comments clarifying the register defines
- Reorder variables in aqr107_read_rate
Sean Anderson (11):
net: dpaa: Fix <1G ethernet on LS1046ARDB
net: phy: Add 1000BASE-KX interface mode
net: phylink: Document MAC_(A)SYM_PAUSE
net: phylink: Export phylink_caps_to_linkmodes
net: phylink: Generate caps and convert to linkmodes separately
net: phylink: Add some helpers for working with mac caps
net: phy: Add support for rate adaptation
net: phylink: Adjust link settings based on rate adaptation
net: phylink: Adjust advertisement based on rate adaptation
net: phy: aquantia: Add some additional phy interfaces
net: phy: aquantia: Add support for rate adaptation
Documentation/networking/ethtool-netlink.rst | 2 +
.../net/ethernet/freescale/dpaa/dpaa_eth.c | 6 +-
drivers/net/phy/aquantia_main.c | 68 +++-
drivers/net/phy/phy-core.c | 15 +
drivers/net/phy/phy.c | 28 ++
drivers/net/phy/phylink.c | 302 ++++++++++++++++--
include/linux/phy.h | 26 +-
include/linux/phylink.h | 29 +-
include/uapi/linux/ethtool.h | 18 +-
include/uapi/linux/ethtool_netlink.h | 1 +
net/ethtool/ioctl.c | 1 +
net/ethtool/linkmodes.c | 5 +
12 files changed, 466 insertions(+), 35 deletions(-)
--
2.35.1.1320.gc452695387.dirty
This adds a table for converting between speed/duplex and mac
capabilities. It also adds a helper for getting the max speed/duplex
from some caps. It is intended to be used by Russell King's DSA phylink
series. The table will be used directly later in this series.
Co-developed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Co-developed-by: Russell King (Oracle) <[email protected]>
Signed-off-by: Russell King (Oracle) <[email protected]>
[ adapted to live in phylink.c ]
Signed-off-by: Sean Anderson <[email protected]>
---
This is adapted from [1].
[1] https://lore.kernel.org/netdev/[email protected]/
Changes in v3:
- New
drivers/net/phy/phylink.c | 56 +++++++++++++++++++++++++++++++++++++++
include/linux/phylink.h | 2 ++
2 files changed, 58 insertions(+)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 68a58ab6a8ed..72bf6b607320 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -304,6 +304,62 @@ void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps)
}
EXPORT_SYMBOL_GPL(phylink_caps_to_linkmodes);
+static struct {
+ unsigned long mask;
+ int speed;
+ unsigned int duplex;
+} phylink_caps_params[] = {
+ { MAC_400000FD, SPEED_400000, DUPLEX_FULL },
+ { MAC_200000FD, SPEED_200000, DUPLEX_FULL },
+ { MAC_100000FD, SPEED_100000, DUPLEX_FULL },
+ { MAC_56000FD, SPEED_56000, DUPLEX_FULL },
+ { MAC_50000FD, SPEED_50000, DUPLEX_FULL },
+ { MAC_40000FD, SPEED_40000, DUPLEX_FULL },
+ { MAC_25000FD, SPEED_25000, DUPLEX_FULL },
+ { MAC_20000FD, SPEED_20000, DUPLEX_FULL },
+ { MAC_10000FD, SPEED_10000, DUPLEX_FULL },
+ { MAC_5000FD, SPEED_5000, DUPLEX_FULL },
+ { MAC_2500FD, SPEED_2500, DUPLEX_FULL },
+ { MAC_1000FD, SPEED_1000, DUPLEX_FULL },
+ { MAC_1000HD, SPEED_1000, DUPLEX_HALF },
+ { MAC_100FD, SPEED_100, DUPLEX_FULL },
+ { MAC_100HD, SPEED_100, DUPLEX_HALF },
+ { MAC_10FD, SPEED_10, DUPLEX_FULL },
+ { MAC_10HD, SPEED_10, DUPLEX_HALF },
+};
+
+/**
+ * phylink_caps_find_max_speed() - Find the max speed/duplex of mac capabilities
+ * @caps: A mask of mac capabilities
+ * @speed: Variable to store the maximum speed
+ * @duplex: Variable to store the maximum duplex
+ *
+ * Find the maximum speed (and associated duplex) supported by a mask of mac
+ * capabilities. @speed and @duplex are always set, even if no matching mac
+ * capability was found.
+ *
+ * Return: 0 on success, or %-EINVAL if the maximum speed/duplex could not be determined.
+ */
+int phylink_caps_find_max_speed(unsigned long caps, int *speed,
+ unsigned int *duplex)
+{
+ int i;
+
+ *speed = SPEED_UNKNOWN;
+ *duplex = DUPLEX_UNKNOWN;
+
+ for (i = 0; i < ARRAY_SIZE(phylink_caps_params); i++) {
+ if (caps & phylink_caps_params[i].mask) {
+ *speed = phylink_caps_params[i].speed;
+ *duplex = phylink_caps_params[i].duplex;
+ return 0;
+ }
+ }
+
+ return -EINVAL;
+}
+EXPORT_SYMBOL_GPL(phylink_caps_find_max_speed);
+
/**
* phylink_get_capabilities() - get capabilities for a given MAC
* @interface: phy interface mode defined by &typedef phy_interface_t
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 661d1d4fdbec..a5a236cfacb6 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -535,6 +535,8 @@ void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
#endif
void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps);
+int phylink_caps_find_max_speed(unsigned long caps, int *speed,
+ unsigned int *duplex);
unsigned long phylink_get_capabilities(phy_interface_t interface,
unsigned long mac_capabilities);
void phylink_generic_validate(struct phylink_config *config,
--
2.35.1.1320.gc452695387.dirty
This adds support for rate adaptation to the phy subsystem. The general
idea is that the phy interface runs at one speed, and the MAC throttles
the rate at which it sends packets to the link speed. There's a good
overview of several techniques for achieving this at [1]. This patch
adds support for three: pause-frame based (such as in Aquantia phys),
CRS-based (such as in 10PASS-TS and 2BASE-TL), and open-loop-based (such
as in 10GBASE-W).
This patch makes a few assumptions and a few non assumptions about the
types of rate adaptation available. First, it assumes that different phys
may use different forms of rate adaptation. Second, it assumes that phys
can use rate adaptation for any of their supported link speeds (e.g. if a
phy supports 10BASE-T and XGMII, then it can adapt XGMII to 10BASE-T).
Third, it does not assume that all interface modes will use the same form
of rate adaptation. Fourth, it does not assume that all phy devices will
support rate adaptation (even some do). Relaxing or strengthening these
(non-)assumptions could result in a different API. For example, if all
interface modes were assumed to use the same form of rate adaptation, then
a bitmask of interface modes supportting rate adaptation would suffice.
For some better visibility into the process, the current rate adaptation
mode is exposed as part of the ethtool ksettings. For the moment, only
read access is supported. I'm not sure what userspace might want to
configure yet (disable it altogether, disable just one mode, specify the
mode to use, etc.). For the moment, since only pause-based rate
adaptation support is added in the next few commits, rate adaptation can
be disabled altogether by adjusting the advertisement.
Signed-off-by: Sean Anderson <[email protected]>
---
Should the unimplemented adaptation modes be kept in?
(no changes since v2)
Changes in v2:
- Use int/defines instead of enum to allow for use in ioctls/netlink
- Add locking to phy_get_rate_adaptation
- Add (read-only) ethtool support for rate adaptation
- Move part of commit message to cover letter, as it gives a good
overview of the whole series, and allows this patch to focus more on
the specifics.
Documentation/networking/ethtool-netlink.rst | 2 ++
drivers/net/phy/phy-core.c | 15 +++++++++++
drivers/net/phy/phy.c | 28 ++++++++++++++++++++
include/linux/phy.h | 22 ++++++++++++++-
include/uapi/linux/ethtool.h | 18 +++++++++++--
include/uapi/linux/ethtool_netlink.h | 1 +
net/ethtool/ioctl.c | 1 +
net/ethtool/linkmodes.c | 5 ++++
8 files changed, 89 insertions(+), 3 deletions(-)
diff --git a/Documentation/networking/ethtool-netlink.rst b/Documentation/networking/ethtool-netlink.rst
index dbca3e9ec782..65ed29e78499 100644
--- a/Documentation/networking/ethtool-netlink.rst
+++ b/Documentation/networking/ethtool-netlink.rst
@@ -426,6 +426,7 @@ Kernel response contents:
``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG`` u8 Master/slave port mode
``ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE`` u8 Master/slave port state
+ ``ETHTOOL_A_LINKMODES_RATE_ADAPTATION`` u8 PHY rate adaptation
========================================== ====== ==========================
For ``ETHTOOL_A_LINKMODES_OURS``, value represents advertised modes and mask
@@ -449,6 +450,7 @@ Request contents:
``ETHTOOL_A_LINKMODES_SPEED`` u32 link speed (Mb/s)
``ETHTOOL_A_LINKMODES_DUPLEX`` u8 duplex mode
``ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG`` u8 Master/slave port mode
+ ``ETHTOOL_A_LINKMODES_RATE_ADAPTATION`` u8 PHY rate adaptation
``ETHTOOL_A_LINKMODES_LANES`` u32 lanes
========================================== ====== ==========================
diff --git a/drivers/net/phy/phy-core.c b/drivers/net/phy/phy-core.c
index 1f2531a1a876..dc70a9088544 100644
--- a/drivers/net/phy/phy-core.c
+++ b/drivers/net/phy/phy-core.c
@@ -74,6 +74,21 @@ const char *phy_duplex_to_str(unsigned int duplex)
}
EXPORT_SYMBOL_GPL(phy_duplex_to_str);
+const char *phy_rate_adaptation_to_str(int rate_adaptation)
+{
+ switch (rate_adaptation) {
+ case RATE_ADAPT_NONE:
+ return "none";
+ case RATE_ADAPT_PAUSE:
+ return "pause";
+ case RATE_ADAPT_CRS:
+ return "crs";
+ case RATE_ADAPT_OPEN_LOOP:
+ return "open-loop";
+ }
+ return "Unsupported (update phy-core.c)";
+}
+
/* A mapping of all SUPPORTED settings to speed/duplex. This table
* must be grouped by speed and sorted in descending match priority
* - iow, descending speed.
diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c
index 8d3ee3a6495b..77cbf07852e6 100644
--- a/drivers/net/phy/phy.c
+++ b/drivers/net/phy/phy.c
@@ -114,6 +114,33 @@ void phy_print_status(struct phy_device *phydev)
}
EXPORT_SYMBOL(phy_print_status);
+/**
+ * phy_get_rate_adaptation - determine if rate adaptation is supported
+ * @phydev: The phy device to return rate adaptation for
+ * @iface: The interface mode to use
+ *
+ * This determines the type of rate adaptation (if any) that @phy supports
+ * using @iface. @iface may be %PHY_INTERFACE_MODE_NA to determine if any
+ * interface supports rate adaptation.
+ *
+ * Return: The type of rate adaptation @phy supports for @iface, or
+ * %RATE_ADAPT_NONE.
+ */
+int phy_get_rate_adaptation(struct phy_device *phydev,
+ phy_interface_t iface)
+{
+ int ret = RATE_ADAPT_NONE;
+
+ if (phydev->drv->get_rate_adaptation) {
+ mutex_lock(&phydev->lock);
+ ret = phydev->drv->get_rate_adaptation(phydev, iface);
+ mutex_unlock(&phydev->lock);
+ }
+
+ return ret;
+}
+EXPORT_SYMBOL_GPL(phy_get_rate_adaptation);
+
/**
* phy_config_interrupt - configure the PHY device for the requested interrupts
* @phydev: the phy_device struct
@@ -256,6 +283,7 @@ void phy_ethtool_ksettings_get(struct phy_device *phydev,
cmd->base.duplex = phydev->duplex;
cmd->base.master_slave_cfg = phydev->master_slave_get;
cmd->base.master_slave_state = phydev->master_slave_state;
+ cmd->base.rate_adaptation = phydev->rate_adaptation;
if (phydev->interface == PHY_INTERFACE_MODE_MOCA)
cmd->base.port = PORT_BNC;
else
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 81ce76c3e799..4ba8126b64f3 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -276,7 +276,6 @@ static inline const char *phy_modes(phy_interface_t interface)
}
}
-
#define PHY_INIT_TIMEOUT 100000
#define PHY_FORCE_TIMEOUT 10
@@ -570,6 +569,7 @@ struct macsec_ops;
* @lp_advertising: Current link partner advertised linkmodes
* @eee_broken_modes: Energy efficient ethernet modes which should be prohibited
* @autoneg: Flag autoneg being used
+ * @rate_adaptation: Current rate adaptation mode
* @link: Current link state
* @autoneg_complete: Flag auto negotiation of the link has completed
* @mdix: Current crossover
@@ -637,6 +637,8 @@ struct phy_device {
unsigned irq_suspended:1;
unsigned irq_rerun:1;
+ int rate_adaptation;
+
enum phy_state state;
u32 dev_flags;
@@ -801,6 +803,21 @@ struct phy_driver {
*/
int (*get_features)(struct phy_device *phydev);
+ /**
+ * @get_rate_adaptation: Get the supported type of rate adaptation for a
+ * particular phy interface. This is used by phy consumers to determine
+ * whether to advertise lower-speed modes for that interface. It is
+ * assumed that if a rate adaptation mode is supported on an interface,
+ * then that interface's rate can be adapted to all slower link speeds
+ * supported by the phy. If iface is %PHY_INTERFACE_MODE_NA, and the phy
+ * supports any kind of rate adaptation for any interface, then it must
+ * return that rate adaptation mode (preferring %RATE_ADAPT_PAUSE, to
+ * %RATE_ADAPT_CRS). If the interface is not supported, this should
+ * return %RATE_ADAPT_NONE.
+ */
+ int (*get_rate_adaptation)(struct phy_device *phydev,
+ phy_interface_t iface);
+
/* PHY Power Management */
/** @suspend: Suspend the hardware, saving state if needed */
int (*suspend)(struct phy_device *phydev);
@@ -967,6 +984,7 @@ struct phy_fixup {
const char *phy_speed_to_str(int speed);
const char *phy_duplex_to_str(unsigned int duplex);
+const char *phy_rate_adaptation_to_str(int rate_adaptation);
/* A structure for mapping a particular speed and duplex
* combination to a particular SUPPORTED and ADVERTISED value
@@ -1681,6 +1699,8 @@ int phy_disable_interrupts(struct phy_device *phydev);
void phy_request_interrupt(struct phy_device *phydev);
void phy_free_interrupt(struct phy_device *phydev);
void phy_print_status(struct phy_device *phydev);
+int phy_get_rate_adaptation(struct phy_device *phydev,
+ phy_interface_t iface);
void phy_set_max_speed(struct phy_device *phydev, u32 max_speed);
void phy_remove_link_mode(struct phy_device *phydev, u32 link_mode);
void phy_advertise_supported(struct phy_device *phydev);
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index e0f0ee9bc89e..3978f9c3fb83 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1840,6 +1840,20 @@ static inline int ethtool_validate_duplex(__u8 duplex)
#define MASTER_SLAVE_STATE_SLAVE 3
#define MASTER_SLAVE_STATE_ERR 4
+/* These are used to throttle the rate of data on the phy interface when the
+ * native speed of the interface is higher than the link speed. These should
+ * not be used for phy interfaces which natively support multiple speeds (e.g.
+ * MII or SGMII).
+ */
+/* No rate adaptation performed. */
+#define RATE_ADAPT_NONE 0
+/* The phy sends pause frames to throttle the MAC. */
+#define RATE_ADAPT_PAUSE 1
+/* The phy asserts CRS to prevent the MAC from transmitting. */
+#define RATE_ADAPT_CRS 2
+/* The MAC is programmed with a sufficiently-large IPG. */
+#define RATE_ADAPT_OPEN_LOOP 3
+
/* Which connector port. */
#define PORT_TP 0x00
#define PORT_AUI 0x01
@@ -2033,8 +2047,8 @@ enum ethtool_reset_flags {
* reported consistently by PHYLIB. Read-only.
* @master_slave_cfg: Master/slave port mode.
* @master_slave_state: Master/slave port state.
+ * @rate_adaptation: Rate adaptation performed by the PHY
* @reserved: Reserved for future use; see the note on reserved space.
- * @reserved1: Reserved for future use; see the note on reserved space.
* @link_mode_masks: Variable length bitmaps.
*
* If autonegotiation is disabled, the speed and @duplex represent the
@@ -2085,7 +2099,7 @@ struct ethtool_link_settings {
__u8 transceiver;
__u8 master_slave_cfg;
__u8 master_slave_state;
- __u8 reserved1[1];
+ __u8 rate_adaptation;
__u32 reserved[7];
__u32 link_mode_masks[0];
/* layout of link_mode_masks fields:
diff --git a/include/uapi/linux/ethtool_netlink.h b/include/uapi/linux/ethtool_netlink.h
index d2fb4f7be61b..3a5d81769ff4 100644
--- a/include/uapi/linux/ethtool_netlink.h
+++ b/include/uapi/linux/ethtool_netlink.h
@@ -242,6 +242,7 @@ enum {
ETHTOOL_A_LINKMODES_MASTER_SLAVE_CFG, /* u8 */
ETHTOOL_A_LINKMODES_MASTER_SLAVE_STATE, /* u8 */
ETHTOOL_A_LINKMODES_LANES, /* u32 */
+ ETHTOOL_A_LINKMODES_RATE_ADAPTATION, /* u8 */
/* add new constants above here */
__ETHTOOL_A_LINKMODES_CNT,
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 6a7308de192d..ef0ad300393a 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -571,6 +571,7 @@ static int ethtool_get_link_ksettings(struct net_device *dev,
= __ETHTOOL_LINK_MODE_MASK_NU32;
link_ksettings.base.master_slave_cfg = MASTER_SLAVE_CFG_UNSUPPORTED;
link_ksettings.base.master_slave_state = MASTER_SLAVE_STATE_UNSUPPORTED;
+ link_ksettings.base.rate_adaptation = RATE_ADAPT_NONE;
return store_link_ksettings_for_user(useraddr, &link_ksettings);
}
diff --git a/net/ethtool/linkmodes.c b/net/ethtool/linkmodes.c
index 99b29b4fe947..7905bd985c7f 100644
--- a/net/ethtool/linkmodes.c
+++ b/net/ethtool/linkmodes.c
@@ -70,6 +70,7 @@ static int linkmodes_reply_size(const struct ethnl_req_info *req_base,
+ nla_total_size(sizeof(u32)) /* LINKMODES_SPEED */
+ nla_total_size(sizeof(u32)) /* LINKMODES_LANES */
+ nla_total_size(sizeof(u8)) /* LINKMODES_DUPLEX */
+ + nla_total_size(sizeof(u8)) /* LINKMODES_RATE_ADAPTATION */
+ 0;
ret = ethnl_bitset_size(ksettings->link_modes.advertising,
ksettings->link_modes.supported,
@@ -143,6 +144,10 @@ static int linkmodes_fill_reply(struct sk_buff *skb,
lsettings->master_slave_state))
return -EMSGSIZE;
+ if (nla_put_u8(skb, ETHTOOL_A_LINKMODES_RATE_ADAPTATION,
+ lsettings->rate_adaptation))
+ return -EMSGSIZE;
+
return 0;
}
--
2.35.1.1320.gc452695387.dirty
This adds support for rate adaptation for phys similar to the AQR107. We
assume that all phys using aqr107_read_status support rate adaptation.
However, it could be possible to determine support based on the firmware
revision if there are phys discovered which do not support rate adaptation.
However, as rate adaptation is advertised in the datasheets for these phys,
I suspect it is supported most boards.
Despite the name, the "config" registers are updated with the current rate
adaptation method (if any). Because they appear to be updated
automatically, I don't know if these registers can be used to disable rate
adaptation.
Signed-off-by: Sean Anderson <[email protected]>
---
(no changes since v2)
Changes in v2:
- Add comments clarifying the register defines
- Reorder variables in aqr107_read_rate
drivers/net/phy/aquantia_main.c | 51 ++++++++++++++++++++++++++++++---
1 file changed, 47 insertions(+), 4 deletions(-)
diff --git a/drivers/net/phy/aquantia_main.c b/drivers/net/phy/aquantia_main.c
index b3a5db487e52..dd73891cf68a 100644
--- a/drivers/net/phy/aquantia_main.c
+++ b/drivers/net/phy/aquantia_main.c
@@ -94,6 +94,19 @@
#define VEND1_GLOBAL_FW_ID_MAJOR GENMASK(15, 8)
#define VEND1_GLOBAL_FW_ID_MINOR GENMASK(7, 0)
+/* The following registers all have similar layouts; first the registers... */
+#define VEND1_GLOBAL_CFG_10M 0x0310
+#define VEND1_GLOBAL_CFG_100M 0x031b
+#define VEND1_GLOBAL_CFG_1G 0x031c
+#define VEND1_GLOBAL_CFG_2_5G 0x031d
+#define VEND1_GLOBAL_CFG_5G 0x031e
+#define VEND1_GLOBAL_CFG_10G 0x031f
+/* ...and now the fields */
+#define VEND1_GLOBAL_CFG_RATE_ADAPT GENMASK(8, 7)
+#define VEND1_GLOBAL_CFG_RATE_ADAPT_NONE 0
+#define VEND1_GLOBAL_CFG_RATE_ADAPT_USX 1
+#define VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE 2
+
#define VEND1_GLOBAL_RSVD_STAT1 0xc885
#define VEND1_GLOBAL_RSVD_STAT1_FW_BUILD_ID GENMASK(7, 4)
#define VEND1_GLOBAL_RSVD_STAT1_PROV_ID GENMASK(3, 0)
@@ -338,40 +351,57 @@ static int aqr_read_status(struct phy_device *phydev)
static int aqr107_read_rate(struct phy_device *phydev)
{
+ u32 config_reg;
int val;
val = phy_read_mmd(phydev, MDIO_MMD_AN, MDIO_AN_TX_VEND_STATUS1);
if (val < 0)
return val;
+ if (val & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX)
+ phydev->duplex = DUPLEX_FULL;
+ else
+ phydev->duplex = DUPLEX_HALF;
+
switch (FIELD_GET(MDIO_AN_TX_VEND_STATUS1_RATE_MASK, val)) {
case MDIO_AN_TX_VEND_STATUS1_10BASET:
phydev->speed = SPEED_10;
+ config_reg = VEND1_GLOBAL_CFG_10M;
break;
case MDIO_AN_TX_VEND_STATUS1_100BASETX:
phydev->speed = SPEED_100;
+ config_reg = VEND1_GLOBAL_CFG_100M;
break;
case MDIO_AN_TX_VEND_STATUS1_1000BASET:
phydev->speed = SPEED_1000;
+ config_reg = VEND1_GLOBAL_CFG_1G;
break;
case MDIO_AN_TX_VEND_STATUS1_2500BASET:
phydev->speed = SPEED_2500;
+ config_reg = VEND1_GLOBAL_CFG_2_5G;
break;
case MDIO_AN_TX_VEND_STATUS1_5000BASET:
phydev->speed = SPEED_5000;
+ config_reg = VEND1_GLOBAL_CFG_5G;
break;
case MDIO_AN_TX_VEND_STATUS1_10GBASET:
phydev->speed = SPEED_10000;
+ config_reg = VEND1_GLOBAL_CFG_10G;
break;
default:
phydev->speed = SPEED_UNKNOWN;
- break;
+ return 0;
}
- if (val & MDIO_AN_TX_VEND_STATUS1_FULL_DUPLEX)
- phydev->duplex = DUPLEX_FULL;
+ val = phy_read_mmd(phydev, MDIO_MMD_VEND1, config_reg);
+ if (val < 0)
+ return val;
+
+ if (FIELD_GET(VEND1_GLOBAL_CFG_RATE_ADAPT, val) ==
+ VEND1_GLOBAL_CFG_RATE_ADAPT_PAUSE)
+ phydev->rate_adaptation = RATE_ADAPT_PAUSE;
else
- phydev->duplex = DUPLEX_HALF;
+ phydev->rate_adaptation = RATE_ADAPT_NONE;
return 0;
}
@@ -612,6 +642,16 @@ static void aqr107_link_change_notify(struct phy_device *phydev)
phydev_info(phydev, "Aquantia 1000Base-T2 mode active\n");
}
+static int aqr107_get_rate_adaptation(struct phy_device *phydev,
+ phy_interface_t iface)
+{
+ if (iface == PHY_INTERFACE_MODE_10GBASER ||
+ iface == PHY_INTERFACE_MODE_2500BASEX ||
+ iface == PHY_INTERFACE_MODE_NA)
+ return RATE_ADAPT_PAUSE;
+ return RATE_ADAPT_NONE;
+}
+
static int aqr107_suspend(struct phy_device *phydev)
{
return phy_set_bits_mmd(phydev, MDIO_MMD_VEND1, MDIO_CTRL1,
@@ -673,6 +713,7 @@ static struct phy_driver aqr_driver[] = {
PHY_ID_MATCH_MODEL(PHY_ID_AQR107),
.name = "Aquantia AQR107",
.probe = aqr107_probe,
+ .get_rate_adaptation = aqr107_get_rate_adaptation,
.config_init = aqr107_config_init,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
@@ -691,6 +732,7 @@ static struct phy_driver aqr_driver[] = {
PHY_ID_MATCH_MODEL(PHY_ID_AQCS109),
.name = "Aquantia AQCS109",
.probe = aqr107_probe,
+ .get_rate_adaptation = aqr107_get_rate_adaptation,
.config_init = aqcs109_config_init,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
@@ -717,6 +759,7 @@ static struct phy_driver aqr_driver[] = {
PHY_ID_MATCH_MODEL(PHY_ID_AQR113C),
.name = "Aquantia AQR113C",
.probe = aqr107_probe,
+ .get_rate_adaptation = aqr107_get_rate_adaptation,
.config_init = aqr107_config_init,
.config_aneg = aqr_config_aneg,
.config_intr = aqr_config_intr,
--
2.35.1.1320.gc452695387.dirty
These are documented in the AQR115 register reference. I haven't tested
them, but perhaps they'll be useful to someone.
Signed-off-by: Sean Anderson <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
---
Changes in v3:
- Move unused defines to next commit (where they will be used)
drivers/net/phy/aquantia_main.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/aquantia_main.c b/drivers/net/phy/aquantia_main.c
index 8b7a46db30e0..b3a5db487e52 100644
--- a/drivers/net/phy/aquantia_main.c
+++ b/drivers/net/phy/aquantia_main.c
@@ -27,9 +27,12 @@
#define MDIO_PHYXS_VEND_IF_STATUS 0xe812
#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_MASK GENMASK(7, 3)
#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_KR 0
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_KX 1
#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_XFI 2
#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_USXGMII 3
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_XAUI 4
#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_SGMII 6
+#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_RXAUI 7
#define MDIO_PHYXS_VEND_IF_STATUS_TYPE_OCSGMII 10
#define MDIO_AN_VEND_PROV 0xc400
@@ -392,15 +395,24 @@ static int aqr107_read_status(struct phy_device *phydev)
case MDIO_PHYXS_VEND_IF_STATUS_TYPE_KR:
phydev->interface = PHY_INTERFACE_MODE_10GKR;
break;
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_KX:
+ phydev->interface = PHY_INTERFACE_MODE_1000BASEKX;
+ break;
case MDIO_PHYXS_VEND_IF_STATUS_TYPE_XFI:
phydev->interface = PHY_INTERFACE_MODE_10GBASER;
break;
case MDIO_PHYXS_VEND_IF_STATUS_TYPE_USXGMII:
phydev->interface = PHY_INTERFACE_MODE_USXGMII;
break;
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_XAUI:
+ phydev->interface = PHY_INTERFACE_MODE_XAUI;
+ break;
case MDIO_PHYXS_VEND_IF_STATUS_TYPE_SGMII:
phydev->interface = PHY_INTERFACE_MODE_SGMII;
break;
+ case MDIO_PHYXS_VEND_IF_STATUS_TYPE_RXAUI:
+ phydev->interface = PHY_INTERFACE_MODE_RXAUI;
+ break;
case MDIO_PHYXS_VEND_IF_STATUS_TYPE_OCSGMII:
phydev->interface = PHY_INTERFACE_MODE_2500BASEX;
break;
@@ -513,11 +525,14 @@ static int aqr107_config_init(struct phy_device *phydev)
/* Check that the PHY interface type is compatible */
if (phydev->interface != PHY_INTERFACE_MODE_SGMII &&
+ phydev->interface != PHY_INTERFACE_MODE_1000BASEKX &&
phydev->interface != PHY_INTERFACE_MODE_2500BASEX &&
phydev->interface != PHY_INTERFACE_MODE_XGMII &&
phydev->interface != PHY_INTERFACE_MODE_USXGMII &&
phydev->interface != PHY_INTERFACE_MODE_10GKR &&
- phydev->interface != PHY_INTERFACE_MODE_10GBASER)
+ phydev->interface != PHY_INTERFACE_MODE_10GBASER &&
+ phydev->interface != PHY_INTERFACE_MODE_XAUI &&
+ phydev->interface != PHY_INTERFACE_MODE_RXAUI)
return -ENODEV;
WARN(phydev->interface == PHY_INTERFACE_MODE_XGMII,
--
2.35.1.1320.gc452695387.dirty
Add 1000BASE-KX interface mode. This 1G backplane ethernet as described in
clause 70. Clause 73 autonegotiation is mandatory, and only full duplex
operation is supported.
Signed-off-by: Sean Anderson <[email protected]>
---
(no changes since v1)
drivers/net/phy/phylink.c | 1 +
include/linux/phy.h | 4 ++++
2 files changed, 5 insertions(+)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 9bd69328dc4d..b08716fe22c1 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -344,6 +344,7 @@ void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface,
case PHY_INTERFACE_MODE_1000BASEX:
caps |= MAC_1000HD;
fallthrough;
+ case PHY_INTERFACE_MODE_1000BASEKX:
case PHY_INTERFACE_MODE_TRGMII:
caps |= MAC_1000FD;
break;
diff --git a/include/linux/phy.h b/include/linux/phy.h
index 87638c55d844..81ce76c3e799 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -115,6 +115,7 @@ extern const int phy_10gbit_features_array[1];
* @PHY_INTERFACE_MODE_25GBASER: 25G BaseR
* @PHY_INTERFACE_MODE_USXGMII: Universal Serial 10GE MII
* @PHY_INTERFACE_MODE_10GKR: 10GBASE-KR - with Clause 73 AN
+ * @PHY_INTERFACE_MODE_1000BASEKX: 1000Base-KX - with Clause 73 AN
* @PHY_INTERFACE_MODE_MAX: Book keeping
*
* Describes the interface between the MAC and PHY.
@@ -152,6 +153,7 @@ typedef enum {
PHY_INTERFACE_MODE_USXGMII,
/* 10GBASE-KR - with Clause 73 AN */
PHY_INTERFACE_MODE_10GKR,
+ PHY_INTERFACE_MODE_1000BASEKX,
PHY_INTERFACE_MODE_MAX,
} phy_interface_t;
@@ -249,6 +251,8 @@ static inline const char *phy_modes(phy_interface_t interface)
return "trgmii";
case PHY_INTERFACE_MODE_1000BASEX:
return "1000base-x";
+ case PHY_INTERFACE_MODE_1000BASEKX:
+ return "1000base-kx";
case PHY_INTERFACE_MODE_2500BASEX:
return "2500base-x";
case PHY_INTERFACE_MODE_5GBASER:
--
2.35.1.1320.gc452695387.dirty
Hi Vladmir,
On 7/25/22 11:41 AM, Vladimir Oltean wrote:
> On Mon, Jul 25, 2022 at 11:37:24AM -0400, Sean Anderson wrote:
>> This adds a table for converting between speed/duplex and mac
>> capabilities. It also adds a helper for getting the max speed/duplex
>> from some caps. It is intended to be used by Russell King's DSA phylink
>> series. The table will be used directly later in this series.
>>
>> Co-developed-by: Vladimir Oltean <[email protected]>
>> Signed-off-by: Vladimir Oltean <[email protected]>
>> Co-developed-by: Russell King (Oracle) <[email protected]>
>> Signed-off-by: Russell King (Oracle) <[email protected]>
>> [ adapted to live in phylink.c ]
>> Signed-off-by: Sean Anderson <[email protected]>
>> ---
>> This is adapted from [1].
>>
>> [1] https://lore.kernel.org/netdev/[email protected]/
>
> I did not write even one line of code from this patch, please drop my
> name from the next revision when there will be one.
>
I merely retained your CDB/SoB from [1].
--Sean
On 7/25/22 11:46 AM, Vladimir Oltean wrote:
> On Mon, Jul 25, 2022 at 11:42:25AM -0400, Sean Anderson wrote:
>> Hi Vladmir,
>>
>> On 7/25/22 11:41 AM, Vladimir Oltean wrote:
>> > On Mon, Jul 25, 2022 at 11:37:24AM -0400, Sean Anderson wrote:
>> >> This adds a table for converting between speed/duplex and mac
>> >> capabilities. It also adds a helper for getting the max speed/duplex
>> >> from some caps. It is intended to be used by Russell King's DSA phylink
>> >> series. The table will be used directly later in this series.
>> >>
>> >> Co-developed-by: Vladimir Oltean <[email protected]>
>> >> Signed-off-by: Vladimir Oltean <[email protected]>
>> >> Co-developed-by: Russell King (Oracle) <[email protected]>
>> >> Signed-off-by: Russell King (Oracle) <[email protected]>
>> >> [ adapted to live in phylink.c ]
>> >> Signed-off-by: Sean Anderson <[email protected]>
>> >> ---
>> >> This is adapted from [1].
>> >>
>> >> [1] https://lore.kernel.org/netdev/[email protected]/
>> >
>> > I did not write even one line of code from this patch, please drop my
>> > name from the next revision when there will be one.
>> >
>>
>> I merely retained your CDB/SoB from [1].
>
> Yes, but context matters, the logic that you cropped out from that patch
> was exactly my contribution to that change, the result no longer has anything
> to do with me. Maybe you didn't have any way to know this, but now you do.
>
OK, I will remove those lines from the next revision.
--Sean
This adds support for adjusting the advertisement for pause-based rate
adaptation. This may result in a lossy link, since the final link settings
are not adjusted. Asymmetric pause support is necessary. It would be
possible for a MAC supporting only symmetric pause to use pause-based rate
adaptation, but only if pause reception was enabled as well.
Signed-off-by: Sean Anderson <[email protected]>
---
Changes in v3:
- Add phylink_cap_from_speed_duplex to look up the mac capability
corresponding to the interface's speed.
- Include RATE_ADAPT_CRS; it's a few lines and it doesn't hurt.
Changes in v2:
- Determine the interface speed and max mac speed directly instead of
guessing based on the caps.
drivers/net/phy/phylink.c | 58 +++++++++++++++++++++++++++++++++++++--
include/linux/phylink.h | 3 +-
2 files changed, 57 insertions(+), 4 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 27b60a89ddc6..0eaca6f34413 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -456,14 +456,18 @@ static unsigned long phylink_cap_from_speed_duplex(int speed,
* phylink_get_capabilities() - get capabilities for a given MAC
* @interface: phy interface mode defined by &typedef phy_interface_t
* @mac_capabilities: bitmask of MAC capabilities
+ * @rate_adaptation: type of rate adaptation being performed
*
* Get the MAC capabilities that are supported by the @interface mode and
* @mac_capabilities.
*/
unsigned long phylink_get_capabilities(phy_interface_t interface,
- unsigned long mac_capabilities)
+ unsigned long mac_capabilities,
+ int rate_adaptation)
{
+ int max_speed = phylink_interface_max_speed(interface);
unsigned long caps = MAC_SYM_PAUSE | MAC_ASYM_PAUSE;
+ unsigned long adapted_caps = 0;
switch (interface) {
case PHY_INTERFACE_MODE_USXGMII:
@@ -536,7 +540,53 @@ unsigned long phylink_get_capabilities(phy_interface_t interface,
break;
}
- return caps & mac_capabilities;
+ switch (rate_adaptation) {
+ case RATE_ADAPT_OPEN_LOOP:
+ /* TODO */
+ fallthrough;
+ case RATE_ADAPT_NONE:
+ adapted_caps = 0;
+ break;
+ case RATE_ADAPT_PAUSE: {
+ /* The MAC must support asymmetric pause towards the local
+ * device for this. We could allow just symmetric pause, but
+ * then we might have to renegotiate if the link partner
+ * doesn't support pause. This is because there's no way to
+ * accept pause frames without transmitting them if we only
+ * support symmetric pause.
+ */
+ if (!(mac_capabilities & MAC_SYM_PAUSE) ||
+ !(mac_capabilities & MAC_ASYM_PAUSE))
+ break;
+
+ /* We can't adapt if the MAC doesn't support the interface's
+ * max speed at full duplex.
+ */
+ if (mac_capabilities &
+ phylink_cap_from_speed_duplex(max_speed, DUPLEX_FULL)) {
+ /* Although a duplex-adapting phy might exist, we
+ * conservatively remove these modes because the MAC
+ * will not be aware of the half-duplex nature of the
+ * link.
+ */
+ adapted_caps = GENMASK(__fls(caps), __fls(MAC_10HD));
+ adapted_caps &= ~(MAC_1000HD | MAC_100HD | MAC_10HD);
+ }
+ break;
+ }
+ case RATE_ADAPT_CRS:
+ /* The MAC must support half duplex at the interface's max
+ * speed.
+ */
+ if (mac_capabilities &
+ phylink_cap_from_speed_duplex(max_speed, DUPLEX_HALF)) {
+ adapted_caps = GENMASK(__fls(caps), __fls(MAC_10HD));
+ adapted_caps &= mac_capabilities;
+ }
+ break;
+ }
+
+ return (caps & mac_capabilities) | adapted_caps;
}
EXPORT_SYMBOL_GPL(phylink_get_capabilities);
@@ -560,7 +610,8 @@ void phylink_generic_validate(struct phylink_config *config,
phylink_set_port_modes(mask);
phylink_set(mask, Autoneg);
caps = phylink_get_capabilities(state->interface,
- config->mac_capabilities);
+ config->mac_capabilities,
+ state->rate_adaptation);
phylink_caps_to_linkmodes(mask, caps);
linkmode_and(supported, supported, mask);
@@ -1586,6 +1637,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
config.interface = PHY_INTERFACE_MODE_NA;
else
config.interface = interface;
+ config.rate_adaptation = phy_get_rate_adaptation(phy, config.interface);
ret = phylink_validate(pl, supported, &config);
if (ret) {
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 192a18a8674a..265a344e1039 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -543,7 +543,8 @@ void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps);
int phylink_caps_find_max_speed(unsigned long caps, int *speed,
unsigned int *duplex);
unsigned long phylink_get_capabilities(phy_interface_t interface,
- unsigned long mac_capabilities);
+ unsigned long mac_capabilities,
+ int rate_adaptation);
void phylink_generic_validate(struct phylink_config *config,
unsigned long *supported,
struct phylink_link_state *state);
--
2.35.1.1320.gc452695387.dirty
If the phy is configured to use pause-based rate adaptation, ensure that
the link is full duplex with pause frame reception enabled. As
suggested, if pause-based rate adaptation is enabled by the phy, then
pause reception is unconditionally enabled.
The interface duplex is determined based on the rate adaptation type.
When rate adaptation is enabled, so is the speed. We assume the maximum
interface speed is used. This is only relevant for MLO_AN_PHY. For
MLO_AN_INBAND, the MAC/PCS's view of the interface speed will be used.
Although there are no RATE_ADAPT_CRS phys in-tree, it has been added for
comparison (and the implementation is quite simple).
Co-developed-by: Russell King <[email protected]>
Signed-off-by: Sean Anderson <[email protected]>
---
Russell, I need your SoB as well as RB, since you wrote some of this.
Changes in v3:
- Modify link settings directly in phylink_link_up, instead of doing
things more indirectly via link_*.
Changes in v2:
- Use the phy's rate adaptation setting to determine whether to use its
link speed/duplex or the MAC's speed/duplex with MLO_AN_INBAND.
- Always use the rate adaptation setting to determine the interface
speed/duplex (instead of sometimes using the interface mode).
drivers/net/phy/phylink.c | 160 +++++++++++++++++++++++++++++++++++---
include/linux/phylink.h | 5 ++
2 files changed, 153 insertions(+), 12 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 72bf6b607320..27b60a89ddc6 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -155,6 +155,74 @@ static const char *phylink_an_mode_str(unsigned int mode)
return mode < ARRAY_SIZE(modestr) ? modestr[mode] : "unknown";
}
+/**
+ * phylink_interface_max_speed() - get the maximum speed of a phy interface
+ * @interface: phy interface mode defined by &typedef phy_interface_t
+ *
+ * Determine the maximum speed of a phy interface. This is intended to help
+ * determine the correct speed to pass to the MAC when the phy is performing
+ * rate adaptation.
+ *
+ * Return: The maximum speed of @interface
+ */
+static int phylink_interface_max_speed(phy_interface_t interface)
+{
+ switch (interface) {
+ case PHY_INTERFACE_MODE_100BASEX:
+ case PHY_INTERFACE_MODE_REVRMII:
+ case PHY_INTERFACE_MODE_RMII:
+ case PHY_INTERFACE_MODE_SMII:
+ case PHY_INTERFACE_MODE_REVMII:
+ case PHY_INTERFACE_MODE_MII:
+ return SPEED_100;
+
+ case PHY_INTERFACE_MODE_TBI:
+ case PHY_INTERFACE_MODE_MOCA:
+ case PHY_INTERFACE_MODE_RTBI:
+ case PHY_INTERFACE_MODE_1000BASEX:
+ case PHY_INTERFACE_MODE_1000BASEKX:
+ case PHY_INTERFACE_MODE_TRGMII:
+ case PHY_INTERFACE_MODE_RGMII_TXID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII:
+ case PHY_INTERFACE_MODE_QSGMII:
+ case PHY_INTERFACE_MODE_SGMII:
+ case PHY_INTERFACE_MODE_GMII:
+ return SPEED_1000;
+
+ case PHY_INTERFACE_MODE_2500BASEX:
+ return SPEED_2500;
+
+ case PHY_INTERFACE_MODE_5GBASER:
+ return SPEED_5000;
+
+ case PHY_INTERFACE_MODE_XGMII:
+ case PHY_INTERFACE_MODE_RXAUI:
+ case PHY_INTERFACE_MODE_XAUI:
+ case PHY_INTERFACE_MODE_10GBASER:
+ case PHY_INTERFACE_MODE_10GKR:
+ case PHY_INTERFACE_MODE_USXGMII:
+ return SPEED_10000;
+
+ case PHY_INTERFACE_MODE_25GBASER:
+ return SPEED_25000;
+
+ case PHY_INTERFACE_MODE_XLGMII:
+ return SPEED_40000;
+
+ case PHY_INTERFACE_MODE_INTERNAL:
+ case PHY_INTERFACE_MODE_NA:
+ case PHY_INTERFACE_MODE_MAX:
+ /* No idea! Garbage in, unknown out */
+ return SPEED_UNKNOWN;
+ }
+
+ /* If we get here, someone forgot to add an interface mode above */
+ WARN_ON_ONCE(1);
+ return SPEED_UNKNOWN;
+}
+
/**
* phylink_caps_to_linkmodes() - Convert capabilities to ethtool link modes
* @linkmodes: ethtool linkmode mask (must be already initialised)
@@ -360,6 +428,30 @@ int phylink_caps_find_max_speed(unsigned long caps, int *speed,
}
EXPORT_SYMBOL_GPL(phylink_caps_find_max_speed);
+/**
+ * phylink_cap_from_speed_duplex - Get mac capability from speed/duplex
+ * @speed: the speed to search for
+ * @duplex: the duplex to search for
+ *
+ * Find the mac capability for a given speed and duplex.
+ *
+ * Return: A mask with the mac capability patching @speed and @duplex, or 0 if
+ * there were no matches.
+ */
+static unsigned long phylink_cap_from_speed_duplex(int speed,
+ unsigned int duplex)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(phylink_caps_params); i++) {
+ if (speed == phylink_caps_params[i].speed &&
+ duplex == phylink_caps_params[i].duplex)
+ return phylink_caps_params[i].mask;
+ }
+
+ return 0;
+}
+
/**
* phylink_get_capabilities() - get capabilities for a given MAC
* @interface: phy interface mode defined by &typedef phy_interface_t
@@ -840,11 +932,12 @@ static void phylink_mac_config(struct phylink *pl,
const struct phylink_link_state *state)
{
phylink_dbg(pl,
- "%s: mode=%s/%s/%s/%s adv=%*pb pause=%02x link=%u an=%u\n",
+ "%s: mode=%s/%s/%s/%s/%s adv=%*pb pause=%02x link=%u an=%u\n",
__func__, phylink_an_mode_str(pl->cur_link_an_mode),
phy_modes(state->interface),
phy_speed_to_str(state->speed),
phy_duplex_to_str(state->duplex),
+ phy_rate_adaptation_to_str(state->rate_adaptation),
__ETHTOOL_LINK_MODE_MASK_NBITS, state->advertising,
state->pause, state->link, state->an_enabled);
@@ -981,7 +1074,8 @@ static void phylink_mac_pcs_get_state(struct phylink *pl,
linkmode_zero(state->lp_advertising);
state->interface = pl->link_config.interface;
state->an_enabled = pl->link_config.an_enabled;
- if (state->an_enabled) {
+ state->rate_adaptation = pl->link_config.rate_adaptation;
+ if (state->an_enabled) {
state->speed = SPEED_UNKNOWN;
state->duplex = DUPLEX_UNKNOWN;
state->pause = MLO_PAUSE_NONE;
@@ -1064,19 +1158,45 @@ static void phylink_link_up(struct phylink *pl,
struct phylink_link_state link_state)
{
struct net_device *ndev = pl->netdev;
+ int speed, duplex;
+ bool rx_pause;
+
+ speed = link_state.speed;
+ duplex = link_state.duplex;
+ rx_pause = !!(link_state.pause & MLO_PAUSE_RX);
+
+ switch (link_state.rate_adaptation) {
+ case RATE_ADAPT_PAUSE:
+ /* The PHY is doing rate adaption from the media rate (in
+ * the link_state) to the interface speed, and will send
+ * pause frames to the MAC to limit its transmission speed.
+ */
+ speed = phylink_interface_max_speed(link_state.interface);
+ duplex = DUPLEX_FULL;
+ rx_pause = true;
+ break;
+
+ case RATE_ADAPT_CRS:
+ /* The PHY is doing rate adaption from the media rate (in
+ * the link_state) to the interface speed, and will cause
+ * collisions to the MAC to limit its transmission speed.
+ */
+ speed = phylink_interface_max_speed(link_state.interface);
+ duplex = DUPLEX_HALF;
+ break;
+ }
pl->cur_interface = link_state.interface;
+ if (link_state.rate_adaptation == RATE_ADAPT_PAUSE)
+ link_state.pause |= MLO_PAUSE_RX;
if (pl->pcs && pl->pcs->ops->pcs_link_up)
pl->pcs->ops->pcs_link_up(pl->pcs, pl->cur_link_an_mode,
- pl->cur_interface,
- link_state.speed, link_state.duplex);
+ pl->cur_interface, speed, duplex);
- pl->mac_ops->mac_link_up(pl->config, pl->phydev,
- pl->cur_link_an_mode, pl->cur_interface,
- link_state.speed, link_state.duplex,
- !!(link_state.pause & MLO_PAUSE_TX),
- !!(link_state.pause & MLO_PAUSE_RX));
+ pl->mac_ops->mac_link_up(pl->config, pl->phydev, pl->cur_link_an_mode,
+ pl->cur_interface, speed, duplex,
+ !!(link_state.pause & MLO_PAUSE_TX), rx_pause);
if (ndev)
netif_carrier_on(ndev);
@@ -1168,6 +1288,17 @@ static void phylink_resolve(struct work_struct *w)
}
link_state.interface = pl->phy_state.interface;
+ /* If we are doing rate adaptation, then the
+ * link speed/duplex comes from the PHY
+ */
+ if (pl->phy_state.rate_adaptation) {
+ link_state.rate_adaptation =
+ pl->phy_state.rate_adaptation;
+ link_state.speed = pl->phy_state.speed;
+ link_state.duplex =
+ pl->phy_state.duplex;
+ }
+
/* If we have a PHY, we need to update with
* the PHY flow control bits.
*/
@@ -1402,6 +1533,7 @@ static void phylink_phy_change(struct phy_device *phydev, bool up)
mutex_lock(&pl->state_mutex);
pl->phy_state.speed = phydev->speed;
pl->phy_state.duplex = phydev->duplex;
+ pl->phy_state.rate_adaptation = phydev->rate_adaptation;
pl->phy_state.pause = MLO_PAUSE_NONE;
if (tx_pause)
pl->phy_state.pause |= MLO_PAUSE_TX;
@@ -1413,10 +1545,11 @@ static void phylink_phy_change(struct phy_device *phydev, bool up)
phylink_run_resolve(pl);
- phylink_dbg(pl, "phy link %s %s/%s/%s/%s\n", up ? "up" : "down",
+ phylink_dbg(pl, "phy link %s %s/%s/%s/%s/%s\n", up ? "up" : "down",
phy_modes(phydev->interface),
phy_speed_to_str(phydev->speed),
phy_duplex_to_str(phydev->duplex),
+ phy_rate_adaptation_to_str(phydev->rate_adaptation),
phylink_pause_to_str(pl->phy_state.pause));
}
@@ -1480,6 +1613,7 @@ static int phylink_bringup_phy(struct phylink *pl, struct phy_device *phy,
pl->phy_state.pause = MLO_PAUSE_NONE;
pl->phy_state.speed = SPEED_UNKNOWN;
pl->phy_state.duplex = DUPLEX_UNKNOWN;
+ pl->phy_state.rate_adaptation = RATE_ADAPT_NONE;
linkmode_copy(pl->supported, supported);
linkmode_copy(pl->link_config.advertising, config.advertising);
@@ -1922,8 +2056,10 @@ static void phylink_get_ksettings(const struct phylink_link_state *state,
{
phylink_merge_link_mode(kset->link_modes.advertising, state->advertising);
linkmode_copy(kset->link_modes.lp_advertising, state->lp_advertising);
- kset->base.speed = state->speed;
- kset->base.duplex = state->duplex;
+ if (kset->base.rate_adaptation == RATE_ADAPT_NONE) {
+ kset->base.speed = state->speed;
+ kset->base.duplex = state->duplex;
+ }
kset->base.autoneg = state->an_enabled ? AUTONEG_ENABLE :
AUTONEG_DISABLE;
}
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index a5a236cfacb6..192a18a8674a 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -75,6 +75,10 @@ static inline bool phylink_autoneg_inband(unsigned int mode)
* @speed: link speed, one of the SPEED_* constants.
* @duplex: link duplex mode, one of DUPLEX_* constants.
* @pause: link pause state, described by MLO_PAUSE_* constants.
+ * @rate_adaptation: rate adaptation being performed, one of the RATE_ADAPT_*
+ * constants. If rate adaptation is taking place, then the speed/duplex of
+ * the medium link mode (@speed and @duplex) and the speed/duplex of the phy
+ * interface mode (@interface) are different.
* @link: true if the link is up.
* @an_enabled: true if autonegotiation is enabled/desired.
* @an_complete: true if autonegotiation has completed.
@@ -86,6 +90,7 @@ struct phylink_link_state {
int speed;
int duplex;
int pause;
+ int rate_adaptation;
unsigned int link:1;
unsigned int an_enabled:1;
unsigned int an_complete:1;
--
2.35.1.1320.gc452695387.dirty
If we call phylink_caps_to_linkmodes directly from
phylink_get_linkmodes, it is difficult to re-use this functionality in
MAC drivers. This is because MAC drivers must then work with an ethtool
linkmode bitmap, instead of with mac capabilities. Instead, let the
caller of phylink_get_linkmodes do the conversion. To reflect this
change, rename the function to phylink_get_capabilities.
Signed-off-by: Sean Anderson <[email protected]>
---
(no changes since v1)
drivers/net/phy/phylink.c | 21 +++++++++++----------
include/linux/phylink.h | 4 ++--
2 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 51e66320526f..68a58ab6a8ed 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -305,17 +305,15 @@ void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps)
EXPORT_SYMBOL_GPL(phylink_caps_to_linkmodes);
/**
- * phylink_get_linkmodes() - get acceptable link modes
- * @linkmodes: ethtool linkmode mask (must be already initialised)
+ * phylink_get_capabilities() - get capabilities for a given MAC
* @interface: phy interface mode defined by &typedef phy_interface_t
* @mac_capabilities: bitmask of MAC capabilities
*
- * Set all possible pause, speed and duplex linkmodes in @linkmodes that
- * are supported by the @interface mode and @mac_capabilities. @linkmodes
- * must have been initialised previously.
+ * Get the MAC capabilities that are supported by the @interface mode and
+ * @mac_capabilities.
*/
-void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface,
- unsigned long mac_capabilities)
+unsigned long phylink_get_capabilities(phy_interface_t interface,
+ unsigned long mac_capabilities)
{
unsigned long caps = MAC_SYM_PAUSE | MAC_ASYM_PAUSE;
@@ -390,9 +388,9 @@ void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface,
break;
}
- phylink_caps_to_linkmodes(linkmodes, caps & mac_capabilities);
+ return caps & mac_capabilities;
}
-EXPORT_SYMBOL_GPL(phylink_get_linkmodes);
+EXPORT_SYMBOL_GPL(phylink_get_capabilities);
/**
* phylink_generic_validate() - generic validate() callback implementation
@@ -408,11 +406,14 @@ void phylink_generic_validate(struct phylink_config *config,
unsigned long *supported,
struct phylink_link_state *state)
{
+ unsigned long caps;
__ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
phylink_set_port_modes(mask);
phylink_set(mask, Autoneg);
- phylink_get_linkmodes(mask, state->interface, config->mac_capabilities);
+ caps = phylink_get_capabilities(state->interface,
+ config->mac_capabilities);
+ phylink_caps_to_linkmodes(mask, caps);
linkmode_and(supported, supported, mask);
linkmode_and(state->advertising, state->advertising, mask);
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 2a6d2f7a7ebb..661d1d4fdbec 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -535,8 +535,8 @@ void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
#endif
void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps);
-void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface,
- unsigned long mac_capabilities);
+unsigned long phylink_get_capabilities(phy_interface_t interface,
+ unsigned long mac_capabilities);
void phylink_generic_validate(struct phylink_config *config,
unsigned long *supported,
struct phylink_link_state *state);
--
2.35.1.1320.gc452695387.dirty
As discussed in commit 73a21fa817f0 ("dpaa_eth: support all modes with
rate adapting PHYs"), we must add a workaround for Aquantia phys with
in-tree support in order to keep 1G support working. Update this
workaround for the AQR113C phy found on revision C LS1046ARDB boards.
Fixes: 12cf1b89a668 ("net: phy: Add support for AQR113C EPHY")
Signed-off-by: Sean Anderson <[email protected]>
Acked-by: Camelia Groza <[email protected]>
---
In a previous version of this commit, I referred to an AQR115, however
on further inspection this appears to be an AQR113C. Confusingly, the
higher-numbered phys support lower data rates.
(no changes since v1)
drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
index f643009cac5f..0a180d17121c 100644
--- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
+++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
@@ -2916,6 +2916,7 @@ static void dpaa_adjust_link(struct net_device *net_dev)
/* The Aquantia PHYs are capable of performing rate adaptation */
#define PHY_VEND_AQUANTIA 0x03a1b400
+#define PHY_VEND_AQUANTIA2 0x31c31c00
static int dpaa_phy_init(struct net_device *net_dev)
{
@@ -2923,6 +2924,7 @@ static int dpaa_phy_init(struct net_device *net_dev)
struct mac_device *mac_dev;
struct phy_device *phy_dev;
struct dpaa_priv *priv;
+ u32 phy_vendor;
priv = netdev_priv(net_dev);
mac_dev = priv->mac_dev;
@@ -2935,9 +2937,11 @@ static int dpaa_phy_init(struct net_device *net_dev)
return -ENODEV;
}
+ phy_vendor = phy_dev->drv->phy_id & GENMASK(31, 10);
/* Unless the PHY is capable of rate adaptation */
if (mac_dev->phy_if != PHY_INTERFACE_MODE_XGMII ||
- ((phy_dev->drv->phy_id & GENMASK(31, 10)) != PHY_VEND_AQUANTIA)) {
+ (phy_vendor != PHY_VEND_AQUANTIA &&
+ phy_vendor != PHY_VEND_AQUANTIA2)) {
/* remove any features not supported by the controller */
ethtool_convert_legacy_u32_to_link_mode(mask,
mac_dev->if_support);
--
2.35.1.1320.gc452695387.dirty
This documents the possible MLO_PAUSE_* settings which can result from
different combinations of MLO_(A)SYM_PAUSE. These are more-or-less a
direct consequence of Table 28B-2.
Signed-off-by: Sean Anderson <[email protected]>
---
Changes in v3:
- New
include/linux/phylink.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 6d06896fc20d..9629bcd594b1 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -21,6 +21,22 @@ enum {
MLO_AN_FIXED, /* Fixed-link mode */
MLO_AN_INBAND, /* In-band protocol */
+ /* MAC_SYM_PAUSE and MAC_ASYM_PAUSE correspond to the PAUSE and
+ * ASM_DIR bits used in autonegotiation, respectively. See IEEE802.3
+ * Annex 28B for more information.
+ *
+ * The following table lists the values of MLO_PAUSE_* (aside from
+ * MLO_PAUSE_AN) which might be requested depending on the results of
+ * autonegotiation or user configuration:
+ *
+ * MAC_SYM_PAUSE MAC_ASYM_PAUSE Valid pause modes
+ * ============= ============== ==============================
+ * 0 0 MLO_PAUSE_NONE
+ * 0 1 MLO_PAUSE_NONE, MLO_PAUSE_TX
+ * 1 0 MLO_PAUSE_NONE, MLO_PAUSE_TXRX
+ * 1 1 MLO_PAUSE_NONE, MLO_PAUSE_TXRX,
+ * MLO_PAUSE_RX
+ */
MAC_SYM_PAUSE = BIT(0),
MAC_ASYM_PAUSE = BIT(1),
MAC_10HD = BIT(2),
--
2.35.1.1320.gc452695387.dirty
This function is convenient for MAC drivers. They can use it to add or
remove particular link modes based on capabilities (such as if half
duplex is not supported for a particular interface mode).
Signed-off-by: Sean Anderson <[email protected]>
---
(no changes since v1)
drivers/net/phy/phylink.c | 12 ++++++++++--
include/linux/phylink.h | 1 +
2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index b08716fe22c1..51e66320526f 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -155,8 +155,15 @@ static const char *phylink_an_mode_str(unsigned int mode)
return mode < ARRAY_SIZE(modestr) ? modestr[mode] : "unknown";
}
-static void phylink_caps_to_linkmodes(unsigned long *linkmodes,
- unsigned long caps)
+/**
+ * phylink_caps_to_linkmodes() - Convert capabilities to ethtool link modes
+ * @linkmodes: ethtool linkmode mask (must be already initialised)
+ * @caps: bitmask of MAC capabilities
+ *
+ * Set all possible pause, speed and duplex linkmodes in @linkmodes that are
+ * supported by the @caps. @linkmodes must have been initialised previously.
+ */
+void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps)
{
if (caps & MAC_SYM_PAUSE)
__set_bit(ETHTOOL_LINK_MODE_Pause_BIT, linkmodes);
@@ -295,6 +302,7 @@ static void phylink_caps_to_linkmodes(unsigned long *linkmodes,
__set_bit(ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT, linkmodes);
}
}
+EXPORT_SYMBOL_GPL(phylink_caps_to_linkmodes);
/**
* phylink_get_linkmodes() - get acceptable link modes
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index 9629bcd594b1..2a6d2f7a7ebb 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -534,6 +534,7 @@ void pcs_link_up(struct phylink_pcs *pcs, unsigned int mode,
phy_interface_t interface, int speed, int duplex);
#endif
+void phylink_caps_to_linkmodes(unsigned long *linkmodes, unsigned long caps);
void phylink_get_linkmodes(unsigned long *linkmodes, phy_interface_t interface,
unsigned long mac_capabilities);
void phylink_generic_validate(struct phylink_config *config,
--
2.35.1.1320.gc452695387.dirty
On Mon, Jul 25, 2022 at 11:42:25AM -0400, Sean Anderson wrote:
> Hi Vladmir,
>
> On 7/25/22 11:41 AM, Vladimir Oltean wrote:
> > On Mon, Jul 25, 2022 at 11:37:24AM -0400, Sean Anderson wrote:
> >> This adds a table for converting between speed/duplex and mac
> >> capabilities. It also adds a helper for getting the max speed/duplex
> >> from some caps. It is intended to be used by Russell King's DSA phylink
> >> series. The table will be used directly later in this series.
> >>
> >> Co-developed-by: Vladimir Oltean <[email protected]>
> >> Signed-off-by: Vladimir Oltean <[email protected]>
> >> Co-developed-by: Russell King (Oracle) <[email protected]>
> >> Signed-off-by: Russell King (Oracle) <[email protected]>
> >> [ adapted to live in phylink.c ]
> >> Signed-off-by: Sean Anderson <[email protected]>
> >> ---
> >> This is adapted from [1].
> >>
> >> [1] https://lore.kernel.org/netdev/[email protected]/
> >
> > I did not write even one line of code from this patch, please drop my
> > name from the next revision when there will be one.
> >
>
> I merely retained your CDB/SoB from [1].
What Vladimir is trying to say is that the code in this patch copied
from [1] was not written by him (although other bits in [1] were), and
thus this patch should not carry a Co-developed-by or s-o-b for him.
Thanks.
--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 40Mbps down 10Mbps up. Decent connectivity at last!
On Mon, Jul 25, 2022 at 11:42:25AM -0400, Sean Anderson wrote:
> Hi Vladmir,
>
> On 7/25/22 11:41 AM, Vladimir Oltean wrote:
> > On Mon, Jul 25, 2022 at 11:37:24AM -0400, Sean Anderson wrote:
> >> This adds a table for converting between speed/duplex and mac
> >> capabilities. It also adds a helper for getting the max speed/duplex
> >> from some caps. It is intended to be used by Russell King's DSA phylink
> >> series. The table will be used directly later in this series.
> >>
> >> Co-developed-by: Vladimir Oltean <[email protected]>
> >> Signed-off-by: Vladimir Oltean <[email protected]>
> >> Co-developed-by: Russell King (Oracle) <[email protected]>
> >> Signed-off-by: Russell King (Oracle) <[email protected]>
> >> [ adapted to live in phylink.c ]
> >> Signed-off-by: Sean Anderson <[email protected]>
> >> ---
> >> This is adapted from [1].
> >>
> >> [1] https://lore.kernel.org/netdev/[email protected]/
> >
> > I did not write even one line of code from this patch, please drop my
> > name from the next revision when there will be one.
> >
>
> I merely retained your CDB/SoB from [1].
Yes, but context matters, the logic that you cropped out from that patch
was exactly my contribution to that change, the result no longer has anything
to do with me. Maybe you didn't have any way to know this, but now you do.
On Mon, Jul 25, 2022 at 11:37:24AM -0400, Sean Anderson wrote:
> This adds a table for converting between speed/duplex and mac
> capabilities. It also adds a helper for getting the max speed/duplex
> from some caps. It is intended to be used by Russell King's DSA phylink
> series. The table will be used directly later in this series.
>
> Co-developed-by: Vladimir Oltean <[email protected]>
> Signed-off-by: Vladimir Oltean <[email protected]>
> Co-developed-by: Russell King (Oracle) <[email protected]>
> Signed-off-by: Russell King (Oracle) <[email protected]>
> [ adapted to live in phylink.c ]
> Signed-off-by: Sean Anderson <[email protected]>
> ---
> This is adapted from [1].
>
> [1] https://lore.kernel.org/netdev/[email protected]/
I did not write even one line of code from this patch, please drop my
name from the next revision when there will be one.
On 7/25/22 11:37 AM, Sean Anderson wrote:
> This adds support for phy rate adaptation: when a phy adapts between
> differing phy interface and link speeds. It was originally submitted as
> part of [1], which is considered "v1" of this series.
>
> We need support for rate adaptation for two reasons. First, the phy
> consumer needs to know if the phy will perform rate adaptation in order to
> program the correct advertising. An unaware consumer will only program
> support for link modes at the phy interface mode's native speed. This
> will cause autonegotiation to fail if the link partner only advertises
> support for lower speed link modes. Second, to reduce packet loss it may
> be desirable to throttle packet throughput.
>
> There have been several past discussions [2-4] around adding rate
> adaptation support. One point is that we must be certain that rate
> adaptation is possible before enabling it. It is the opinion of some
> developers that it is the responsibility of the system integrator or end
> user to set the link settings appropriately for rate adaptation. In
> particular, it was argued that (due to differing firmware) it might not
> be clear if a particular phy has rate adaptation enabled. Additionally,
> upper-layer protocols must already be tolerant of packet loss caused by
> differing rates. Packet loss may happen anyway, such as if a faster link
> is used with a slower switch or repeater. So adjusting pause settings
> for rate adaptation is not strictly necessary.
>
> I believe that our current approach is limiting, especially when
> considering that rate adaptation (in two forms) has made it into IEEE
> standards. In general, when we have appropriate information we should
> set sensible defaults. To consider use a contrasting example, we enable
> pause frames by default for link partners which autonegotiate for them.
> When it's the phy itself generating these frames, we don't even have to
> autonegotiate to know that we should enable pause frames.
>
> Our current approach also encourages workarounds, such as commit
> 73a21fa817f0 ("dpaa_eth: support all modes with rate adapting PHYs").
> These workarounds are fine for phylib drivers, but phylink drivers cannot
> use this approach (since there is no direct access to the phy).
>
> Although in earlier versions of this series, userspace could disable
> rate adaptation, now it is only possible to determine the current rate
> adaptation type. Disabling or otherwise configuring rate adaptation has
> been left for future work. However, because currently only
> RATE_ADAPT_PAUSE is implemented, it is possible to disable rate
> adaptation by modifying the advertisement appropriately.
>
> [1] https://lore.kernel.org/netdev/[email protected]/T/#t
> [2] https://lore.kernel.org/netdev/[email protected]/
> [3] https://lore.kernel.org/netdev/[email protected]/
> [4] https://lore.kernel.org/netdev/[email protected]/
>
> Changes in v3:
> - Document MAC_(A)SYM_PAUSE
> - Add some helpers for working with mac caps
> - Modify link settings directly in phylink_link_up, instead of doing
> things more indirectly via link_*.
> - Add phylink_cap_from_speed_duplex to look up the mac capability
> corresponding to the interface's speed.
> - Include RATE_ADAPT_CRS; it's a few lines and it doesn't hurt.
> - Move unused defines to next commit (where they will be used)
> - Remove "Support differing link/interface speed/duplex". It has been
> rendered unnecessary due to simplification of the rate adaptation
> patches. Thanks Russell!
> - Rewrite cover letter to better reflect the opinions of the developers
> involved
>
> Changes in v2:
> - Use int/defines instead of enum to allow for use in ioctls/netlink
> - Add locking to phy_get_rate_adaptation
> - Add (read-only) ethtool support for rate adaptation
> - Move part of commit message to cover letter, as it gives a good
> overview of the whole series, and allows this patch to focus more on
> the specifics.
> - Use the phy's rate adaptation setting to determine whether to use its
> link speed/duplex or the MAC's speed/duplex with MLO_AN_INBAND.
> - Always use the rate adaptation setting to determine the interface
> speed/duplex (instead of sometimes using the interface mode).
> - Determine the interface speed and max mac speed directly instead of
> guessing based on the caps.
> - Add comments clarifying the register defines
> - Reorder variables in aqr107_read_rate
>
> Sean Anderson (11):
> net: dpaa: Fix <1G ethernet on LS1046ARDB
> net: phy: Add 1000BASE-KX interface mode
> net: phylink: Document MAC_(A)SYM_PAUSE
> net: phylink: Export phylink_caps_to_linkmodes
> net: phylink: Generate caps and convert to linkmodes separately
> net: phylink: Add some helpers for working with mac caps
> net: phy: Add support for rate adaptation
> net: phylink: Adjust link settings based on rate adaptation
> net: phylink: Adjust advertisement based on rate adaptation
> net: phy: aquantia: Add some additional phy interfaces
> net: phy: aquantia: Add support for rate adaptation
>
> Documentation/networking/ethtool-netlink.rst | 2 +
> .../net/ethernet/freescale/dpaa/dpaa_eth.c | 6 +-
> drivers/net/phy/aquantia_main.c | 68 +++-
> drivers/net/phy/phy-core.c | 15 +
> drivers/net/phy/phy.c | 28 ++
> drivers/net/phy/phylink.c | 302 ++++++++++++++++--
> include/linux/phy.h | 26 +-
> include/linux/phylink.h | 29 +-
> include/uapi/linux/ethtool.h | 18 +-
> include/uapi/linux/ethtool_netlink.h | 1 +
> net/ethtool/ioctl.c | 1 +
> net/ethtool/linkmodes.c | 5 +
> 12 files changed, 466 insertions(+), 35 deletions(-)
>
ping?
Are there any comments on this series other than about the tags for patch 6?
--Sean
On 8/18/22 11:41 AM, Andrew Lunn wrote:
> On Thu, Aug 18, 2022 at 11:21:10AM -0400, Sean Anderson wrote:
>>
>>
>> On 7/25/22 11:37 AM, Sean Anderson wrote:
>> > This adds support for phy rate adaptation: when a phy adapts between
>> > differing phy interface and link speeds. It was originally submitted as
>> > part of [1], which is considered "v1" of this series.
>
>> ping?
>>
>> Are there any comments on this series other than about the tags for patch 6?
>
> Anything that old is going to need a rebase. So you may as well repost
> rather than ping.
OK
I have some other series [1,2] which have also received little-to-no feedback. Should
I resend those as well? I didn't ping about these earlier because there was a merge
window open, and e.g. gregkh has told me not to ping at that time (since series will
be reviewed afterwards). Should I be pinging sooner on netdev?
--Sean
[1] https://lore.kernel.org/linux-phy/[email protected]/
[2] https://lore.kernel.org/netdev/[email protected]/
On Thu, Aug 18, 2022 at 11:21:10AM -0400, Sean Anderson wrote:
>
>
> On 7/25/22 11:37 AM, Sean Anderson wrote:
> > This adds support for phy rate adaptation: when a phy adapts between
> > differing phy interface and link speeds. It was originally submitted as
> > part of [1], which is considered "v1" of this series.
> ping?
>
> Are there any comments on this series other than about the tags for patch 6?
Anything that old is going to need a rebase. So you may as well repost
rather than ping.
Andrew
On 8/18/22 12:53 PM, Vladimir Oltean wrote:
> On Mon, Jul 25, 2022 at 11:37:20AM -0400, Sean Anderson wrote:
>> Add 1000BASE-KX interface mode. This 1G backplane ethernet as described in
>> clause 70. Clause 73 autonegotiation is mandatory, and only full duplex
>> operation is supported.
>>
>> Signed-off-by: Sean Anderson <[email protected]>
>> ---
>
> What does 1000BASE-KX have to do with anything?
>
It doesn't, really. This should have been part of [1], but it looks like I
put it in the wrong series.
--Sean
[1] https://lore.kernel.org/linux-phy/[email protected]/
On Mon, Jul 25, 2022 at 11:37:20AM -0400, Sean Anderson wrote:
> Add 1000BASE-KX interface mode. This 1G backplane ethernet as described in
> clause 70. Clause 73 autonegotiation is mandatory, and only full duplex
> operation is supported.
>
> Signed-off-by: Sean Anderson <[email protected]>
> ---
What does 1000BASE-KX have to do with anything?
On 8/18/22 12:57 PM, Sean Anderson wrote:
>
>
> On 8/18/22 12:53 PM, Vladimir Oltean wrote:
>> On Mon, Jul 25, 2022 at 11:37:20AM -0400, Sean Anderson wrote:
>>> Add 1000BASE-KX interface mode. This 1G backplane ethernet as described in
>>> clause 70. Clause 73 autonegotiation is mandatory, and only full duplex
>>> operation is supported.
>>>
>>> Signed-off-by: Sean Anderson <[email protected]>
>>> ---
>>
>> What does 1000BASE-KX have to do with anything?
>>
>
> It doesn't, really. This should have been part of [1], but it looks like I
> put it in the wrong series.
>
> --Sean
>
> [1] https://lore.kernel.org/linux-phy/[email protected]/
>
Well, I suppose the real reason is that this will cause a merge conflict
(or lack of one), since this series introduces phylink_interface_max_speed
in patch 7, which is supposed to contain all the phy modes. So depending on
what gets merged first, the other series will have to be modified and resent.
To be honest, I had expected that trivial patches like that would have been
applied and merged already.
--Sean
On Thu, Aug 18, 2022 at 01:03:54PM -0400, Sean Anderson wrote:
> Well, I suppose the real reason is that this will cause a merge conflict
> (or lack of one), since this series introduces phylink_interface_max_speed
> in patch 7, which is supposed to contain all the phy modes. So depending on
> what gets merged first, the other series will have to be modified and resent.
>
> To be honest, I had expected that trivial patches like that would have been
> applied and merged already.
There's nothing trivial about this patch. 1000Base-KX is not a phy-mode
in exactly the same way that 1000Base-T isn't, either. If you want to
bring PHY_INTERFACE_MODE_10GKR as a "yes, but" counterexample, it was
later clarified that 10gbase-r was what was actually meant in that case,
and we keep 10gbase-kr as phy-mode only for compatibility with some
device trees.
I'd suggest resolving the merge conflict without 1000Base-KX and
splitting off a separate discussion about this topic. Otherwise it will
unnecessarily detract from PAUSE-based rate adaptation.
On 8/18/22 1:12 PM, Vladimir Oltean wrote:
> On Thu, Aug 18, 2022 at 01:03:54PM -0400, Sean Anderson wrote:
>> Well, I suppose the real reason is that this will cause a merge conflict
>> (or lack of one), since this series introduces phylink_interface_max_speed
>> in patch 7, which is supposed to contain all the phy modes. So depending on
>> what gets merged first, the other series will have to be modified and resent.
>>
>> To be honest, I had expected that trivial patches like that would have been
>> applied and merged already.
>
> There's nothing trivial about this patch.
Perhaps "limited in scope and mostly independent" is better, then.
> 1000Base-KX is not a phy-mode
> in exactly the same way that 1000Base-T isn't, either.
It has different AN from 1000BASE-X (c73 vs c37), and doesn't support half
duplex. This is something the serdes and PCS have to care about. Unfortunately,
we don't have a separate PCS_INTERFACE_MODE so these things become
PHY_INTERFACE_MODEs.
> If you want to
> bring PHY_INTERFACE_MODE_10GKR as a "yes, but" counterexample, it was
> later clarified that 10gbase-r was what was actually meant in that case,
> and we keep 10gbase-kr as phy-mode only for compatibility with some
> device trees.
That's not what's documented:
> ``PHY_INTERFACE_MODE_10GBASER``
> This is the IEEE 802.3 Clause 49 defined 10GBASE-R protocol used with
> various different mediums. Please refer to the IEEE standard for a
> definition of this.
>
> Note: 10GBASE-R is just one protocol that can be used with XFI and SFI.
> XFI and SFI permit multiple protocols over a single SERDES lane, and
> also defines the electrical characteristics of the signals with a host
> compliance board plugged into the host XFP/SFP connector. Therefore,
> XFI and SFI are not PHY interface types in their own right.
>
> ``PHY_INTERFACE_MODE_10GKR``
> This is the IEEE 802.3 Clause 49 defined 10GBASE-R with Clause 73
> autonegotiation. Please refer to the IEEE standard for further
> information.
>
> Note: due to legacy usage, some 10GBASE-R usage incorrectly makes
> use of this definition.
so indeed you get a new phy interface mode when you add c73 AN. The
clarification only applies to *incorrect* usage.
> I'd suggest resolving the merge conflict without 1000Base-KX and
> splitting off a separate discussion about this topic. Otherwise it will
> unnecessarily detract from PAUSE-based rate adaptation.
Well, no one is using it yet, so hopefully it will not be a problem...
--Sean
On Thu, Aug 18, 2022 at 01:28:19PM -0400, Sean Anderson wrote:
> That's not what's documented:
>
> > ``PHY_INTERFACE_MODE_10GBASER``
> > This is the IEEE 802.3 Clause 49 defined 10GBASE-R protocol used with
> > various different mediums. Please refer to the IEEE standard for a
> > definition of this.
> >
> > Note: 10GBASE-R is just one protocol that can be used with XFI and SFI.
> > XFI and SFI permit multiple protocols over a single SERDES lane, and
> > also defines the electrical characteristics of the signals with a host
> > compliance board plugged into the host XFP/SFP connector. Therefore,
> > XFI and SFI are not PHY interface types in their own right.
> >
> > ``PHY_INTERFACE_MODE_10GKR``
> > This is the IEEE 802.3 Clause 49 defined 10GBASE-R with Clause 73
> > autonegotiation. Please refer to the IEEE standard for further
> > information.
> >
> > Note: due to legacy usage, some 10GBASE-R usage incorrectly makes
> > use of this definition.
>
> so indeed you get a new phy interface mode when you add c73 AN. The
> clarification only applies to *incorrect* usage.
I challenge you to the following thought experiment. Open clause 73 from
IEEE 802.3, and see what is actually exchanged through auto-negotiation.
You'll discover that the *use* of the 10GBase-KR operating mode is
*established* through clause 73 AN (the Technology Ability field).
So what sense does it make to define 10GBase-KR as "10Base-R with clause 73 AN"
as the document you've quoted does? None whatsoever. The K in KR stands
for bacKplane, and typical of this type of PMD are the signaling and
link training procedures described in the previous clause, 72.
Clause 73 AN is not something that is a property of 10GBase-KR, but
something that exists outside of it.
So if clause 73 *establishes* the use of 10GBase-KR (or 1000Base-KX or
others) through autonegotiation, then what sense does it have to put
phy-mode = "1000base-kx" in the device tree? Does it mean "use C73 AN",
or "don't use it, I already know what operating mode I want to use"?
If it means "use C73 AN", then what advertisement do you use for the
Technology Ability field? There's a priority resolution function for
C73, just like there is one for C28/C40 for the twisted pair medium (aka
that thing that allows you to fall back to the highest supported common
link speed). So why would you populate just one bit in Technology
Ability based on DT, if you can potentially support multiple operating
modes? And why would you even create your advertisement based on the
device tree, for that matter? Twisted pair PHYs don't do this.
On 8/18/22 3:51 PM, Vladimir Oltean wrote:
> On Thu, Aug 18, 2022 at 01:28:19PM -0400, Sean Anderson wrote:
>> That's not what's documented:
>>
>> > ``PHY_INTERFACE_MODE_10GBASER``
>> > This is the IEEE 802.3 Clause 49 defined 10GBASE-R protocol used with
>> > various different mediums. Please refer to the IEEE standard for a
>> > definition of this.
>> >
>> > Note: 10GBASE-R is just one protocol that can be used with XFI and SFI.
>> > XFI and SFI permit multiple protocols over a single SERDES lane, and
>> > also defines the electrical characteristics of the signals with a host
>> > compliance board plugged into the host XFP/SFP connector. Therefore,
>> > XFI and SFI are not PHY interface types in their own right.
>> >
>> > ``PHY_INTERFACE_MODE_10GKR``
>> > This is the IEEE 802.3 Clause 49 defined 10GBASE-R with Clause 73
>> > autonegotiation. Please refer to the IEEE standard for further
>> > information.
>> >
>> > Note: due to legacy usage, some 10GBASE-R usage incorrectly makes
>> > use of this definition.
>>
>> so indeed you get a new phy interface mode when you add c73 AN. The
>> clarification only applies to *incorrect* usage.
>
> I challenge you to the following thought experiment. Open clause 73 from
> IEEE 802.3, and see what is actually exchanged through auto-negotiation.
> You'll discover that the *use* of the 10GBase-KR operating mode is
> *established* through clause 73 AN (the Technology Ability field).
>
> So what sense does it make to define 10GBase-KR as "10Base-R with clause 73 AN"
> as the document you've quoted does?None whatsoever. The K in KR stands
> for bacKplane, and typical of this type of PMD are the signaling and
> link training procedures described in the previous clause, 72.
> Clause 73 AN is not something that is a property of 10GBase-KR, but
> something that exists outside of it.
You should send a patch; this document is Documentation/networking/phy.rst
> So if clause 73 *establishes* the use of 10GBase-KR (or 1000Base-KX or
> others) through autonegotiation, then what sense does it have to put
> phy-mode = "1000base-kx" in the device tree? Does it mean "use C73 AN",
> or "don't use it, I already know what operating mode I want to use"?
>
> If it means "use C73 AN", then what advertisement do you use for the
> Technology Ability field? There's a priority resolution function for
> C73, just like there is one for C28/C40 for the twisted pair medium (aka
> that thing that allows you to fall back to the highest supported common
> link speed). So why would you populate just one bit in Technology
> Ability based on DT, if you can potentially support multiple operating
> modes? And why would you even create your advertisement based on the
> device tree, for that matter? Twisted pair PHYs don't do this.
The problem is that our current model looks something like
1. MAC <-- A --> phy (ethernet) --> B <-- far end
2. MAC <-> "PCS" <-> phy (serdes) --> C <-- phy (ethernet) --> B <-- far end
3. --> C <-- transciever --> B <-- far end
4. --> D <-- far end
Where 1 is the traditional MAC+phy architecture, 2 is a MAC connected to
a phy over a serial link, 3 is a MAC connected to an optical
transcievber, and 4 is a backplane connection. A is the phy interface
mode, and B is the ethtool link mode. C is also the "phy interface
mode", except that sometimes it is highly-dependent on the link mode
(e.g. 1000BASE-X) and sometimes it is not (SGMII). The problem is case
4. Here, there isn't really a phy interface mode; just a link mode.
Consider the serdes driver. It has to know how to configure itself.
Sometimes this will be the phy mode (cases 2 and 3), and sometimes it
will be the link mode (case 4). In particular, for a link mode like
1000BASE-SX, it should be configured for 1000BASE-X. But for
1000BASE-KX, it has to be configured for 1000BASE-KX. I suppose the
right thing to do here is rework the phy subsystem to use link modes and
not phy modes for phy_mode_ext, since it seems like there is a
1000BASE-X link mode. But what should be passed to mac_prepare and
mac_select_pcs?
As another example, consider the AQR113C. It supports the following
(abbreviated) interfaces on the MAC side:
- 10GBASE-KR
- 1000BASE-KX
- 10GBASE-R
- 1000BASE-X
- USXGMII
- SGMII
This example of what phy-mode = "1000base-kx" would imply. I would
expect that selecting -KX over -X would change the electrical settings
to comply with clause 70 (instead of the SFP spec).
--Sean
> I have some other series [1,2] which have also received little-to-no feedback. Should
> I resend those as well? I didn't ping about these earlier because there was a merge
> window open, and e.g. gregkh has told me not to ping at that time (since series will
> be reviewed afterwards). Should I be pinging sooner on netdev?
For netdev, you need to resend after the merge window. Other
subsystems might look at older patch, and if they apply cleanly might
merge them. But not netdev.
Andrew
Hi Vladimir,
On 8/18/22 4:40 PM, Sean Anderson wrote:
> On 8/18/22 3:51 PM, Vladimir Oltean wrote:
>> On Thu, Aug 18, 2022 at 01:28:19PM -0400, Sean Anderson wrote:
>>> That's not what's documented:
>>>
>>> > ``PHY_INTERFACE_MODE_10GBASER``
>>> > This is the IEEE 802.3 Clause 49 defined 10GBASE-R protocol used with
>>> > various different mediums. Please refer to the IEEE standard for a
>>> > definition of this.
>>> >
>>> > Note: 10GBASE-R is just one protocol that can be used with XFI and SFI.
>>> > XFI and SFI permit multiple protocols over a single SERDES lane, and
>>> > also defines the electrical characteristics of the signals with a host
>>> > compliance board plugged into the host XFP/SFP connector. Therefore,
>>> > XFI and SFI are not PHY interface types in their own right.
>>> >
>>> > ``PHY_INTERFACE_MODE_10GKR``
>>> > This is the IEEE 802.3 Clause 49 defined 10GBASE-R with Clause 73
>>> > autonegotiation. Please refer to the IEEE standard for further
>>> > information.
>>> >
>>> > Note: due to legacy usage, some 10GBASE-R usage incorrectly makes
>>> > use of this definition.
>>>
>>> so indeed you get a new phy interface mode when you add c73 AN. The
>>> clarification only applies to *incorrect* usage.
>>
>> I challenge you to the following thought experiment. Open clause 73 from
>> IEEE 802.3, and see what is actually exchanged through auto-negotiation.
>> You'll discover that the *use* of the 10GBase-KR operating mode is
>> *established* through clause 73 AN (the Technology Ability field).
>>
>> So what sense does it make to define 10GBase-KR as "10Base-R with clause 73 AN"
>> as the document you've quoted does?None whatsoever. The K in KR stands
>> for bacKplane, and typical of this type of PMD are the signaling and
>> link training procedures described in the previous clause, 72.
>> Clause 73 AN is not something that is a property of 10GBase-KR, but
>> something that exists outside of it.
>
> You should send a patch; this document is Documentation/networking/phy.rst
>
>> So if clause 73 *establishes* the use of 10GBase-KR (or 1000Base-KX or
>> others) through autonegotiation, then what sense does it have to put
>> phy-mode = "1000base-kx" in the device tree? Does it mean "use C73 AN",
>> or "don't use it, I already know what operating mode I want to use"?
>>
>> If it means "use C73 AN", then what advertisement do you use for the
>> Technology Ability field? There's a priority resolution function for
>> C73, just like there is one for C28/C40 for the twisted pair medium (aka
>> that thing that allows you to fall back to the highest supported common
>> link speed). So why would you populate just one bit in Technology
>> Ability based on DT, if you can potentially support multiple operating
>> modes? And why would you even create your advertisement based on the
>> device tree, for that matter? Twisted pair PHYs don't do this.
>
> The problem is that our current model looks something like
>
> 1. MAC <-- A --> phy (ethernet) --> B <-- far end
> 2. MAC <-> "PCS" <-> phy (serdes) --> C <-- phy (ethernet) --> B <-- far end
> 3. --> C <-- transciever --> B <-- far end
> 4. --> D <-- far end
>
> Where 1 is the traditional MAC+phy architecture, 2 is a MAC connected to
> a phy over a serial link, 3 is a MAC connected to an optical
> transcievber, and 4 is a backplane connection. A is the phy interface
> mode, and B is the ethtool link mode. C is also the "phy interface
> mode", except that sometimes it is highly-dependent on the link mode
> (e.g. 1000BASE-X) and sometimes it is not (SGMII). The problem is case
> 4. Here, there isn't really a phy interface mode; just a link mode.
>
> Consider the serdes driver. It has to know how to configure itself.
> Sometimes this will be the phy mode (cases 2 and 3), and sometimes it
> will be the link mode (case 4). In particular, for a link mode like
> 1000BASE-SX, it should be configured for 1000BASE-X. But for
> 1000BASE-KX, it has to be configured for 1000BASE-KX. I suppose the
> right thing to do here is rework the phy subsystem to use link modes and
> not phy modes for phy_mode_ext, since it seems like there is a
> 1000BASE-X link mode. But what should be passed to mac_prepare and
> mac_select_pcs?
>
> As another example, consider the AQR113C. It supports the following
> (abbreviated) interfaces on the MAC side:
>
> - 10GBASE-KR
> - 1000BASE-KX
> - 10GBASE-R
> - 1000BASE-X
> - USXGMII
> - SGMII
>
> This example of what phy-mode = "1000base-kx" would imply. I would
> expect that selecting -KX over -X would change the electrical settings
> to comply with clause 70 (instead of the SFP spec).
Do you have any comments on the above?
--Sean
On 8/25/22 7:45 PM, Vladimir Oltean wrote:
> On Thu, Aug 25, 2022 at 06:50:23PM -0400, Sean Anderson wrote:
>> > The problem is that our current model looks something like
>> >
>> > 1. MAC <-- A --> phy (ethernet) --> B <-- far end
>> > 2. MAC <-> "PCS" <-> phy (serdes) --> C <-- phy (ethernet) --> B <-- far end
>> > 3. --> C <-- transciever --> B <-- far end
>> > 4. --> D <-- far end
>> >
>> > Where 1 is the traditional MAC+phy architecture, 2 is a MAC connected to
>> > a phy over a serial link, 3 is a MAC connected to an optical
>> > transcievber, and 4 is a backplane connection. A is the phy interface
>> > mode, and B is the ethtool link mode. C is also the "phy interface
>> > mode", except that sometimes it is highly-dependent on the link mode
>> > (e.g. 1000BASE-X) and sometimes it is not (SGMII). The problem is case
>> > 4. Here, there isn't really a phy interface mode; just a link mode.
>> >
>> > Consider the serdes driver. It has to know how to configure itself.
>> > Sometimes this will be the phy mode (cases 2 and 3), and sometimes it
>> > will be the link mode (case 4). In particular, for a link mode like
>> > 1000BASE-SX, it should be configured for 1000BASE-X. But for
>> > 1000BASE-KX, it has to be configured for 1000BASE-KX. I suppose the
>> > right thing to do here is rework the phy subsystem to use link modes and
>> > not phy modes for phy_mode_ext, since it seems like there is a
>> > 1000BASE-X link mode. But what should be passed to mac_prepare and
>> > mac_select_pcs?
>> >
>> > As another example, consider the AQR113C. It supports the following
>> > (abbreviated) interfaces on the MAC side:
>> >
>> > - 10GBASE-KR
>> > - 1000BASE-KX
>> > - 10GBASE-R
>> > - 1000BASE-X
>> > - USXGMII
>> > - SGMII
>> >
>> > This example of what phy-mode = "1000base-kx" would imply. I would
>> > expect that selecting -KX over -X would change the electrical settings
>> > to comply with clause 70 (instead of the SFP spec).
>>
>> Do you have any comments on the above?
>
> What comments do you expect? My message was just don't get sidetracked
> by trying to tackle problems you don't need to solve, thinking they're
> just there, along the way.
>
> "The problem" of wanting to describe an electrical using phy-mode was
> discussed before, with debatable degrees of success and the following
> synopsis:
>
> | phy_interface_t describes the protocol *only*, it doesn't describe
> | the electrical characteristics of the interface. So, if we exclude
> | the electrical characteristics of SFI, we're back to 10GBASE-R,
> | 10GBASE-W, 10GFC or 10GBASE-R with FEC. That's what phy_interface_t
> | is, not SFI.
> |
> | So, I propose that we add 10GBASE-R to the list and *not* SFI.
>
> https://lore.kernel.org/netdev/[email protected]/
>
> I don't have access to 802.3 right now to double check, but I think this
> is a similar case here - between 1000Base-SX and 1000Base-KX, only the
> PMA/PMD is different, otherwise they are still both 1000Base-X in terms
> of protocol/coding.
Well, the main thing is that the PCS has to know we are doing c73 AN, as
opposed to c37. Maybe we need another MLO_AN_? Or perhaps the PCS should
look at the advertisement in order to determine how to advertise? But
then who determines whether we do c73? This should really come from the
device tree, and not involve userspace manually setting up the advertisement.
And of course the serdes has to find out what the link mode is. I suppose
we could stick this stuff in the device tree. Maybe using ethtool link modes
for phy_set_mode_ext is better, but it's still not ideal since many link modes
are electrically identical (we don't want to special-case full/half duplex in
every serdes driver).
> As for the second "example". I had opened my copy of the AQR113C manual
> and IIRC it listed 10GBASE-KR in the system interfaces, but I don't
> recall seeing 1000Base-KX. And even for 10GBASE-KR, there weren't a lot
> of details, like what is configurable about it, is there C73 available etc.
> Not extremely clear what that is about, tbh. Something that has
> 10GBase-KR on system side and 10GBase-T on media side sounds like a
> media converter to me. Not sure how we model those in phylib (I think
> the answer is we don't).
I'd agree with your assesment that it's a media converter and that the
documentation isn't clear. However, it's fairly trivial to support in the
manner I outlined. That's actually the main thing I'm getting at: the rest
of the code is structured to treat a PHY_INTERFACE_MODE_1000BASEKX in the
right way, even if it's not really a phy interface mode (...just like
1000BASE-X isn't really a phy interface mode).
--Sean
On Thu, Aug 25, 2022 at 06:50:23PM -0400, Sean Anderson wrote:
> > The problem is that our current model looks something like
> >
> > 1. MAC <-- A --> phy (ethernet) --> B <-- far end
> > 2. MAC <-> "PCS" <-> phy (serdes) --> C <-- phy (ethernet) --> B <-- far end
> > 3. --> C <-- transciever --> B <-- far end
> > 4. --> D <-- far end
> >
> > Where 1 is the traditional MAC+phy architecture, 2 is a MAC connected to
> > a phy over a serial link, 3 is a MAC connected to an optical
> > transcievber, and 4 is a backplane connection. A is the phy interface
> > mode, and B is the ethtool link mode. C is also the "phy interface
> > mode", except that sometimes it is highly-dependent on the link mode
> > (e.g. 1000BASE-X) and sometimes it is not (SGMII). The problem is case
> > 4. Here, there isn't really a phy interface mode; just a link mode.
> >
> > Consider the serdes driver. It has to know how to configure itself.
> > Sometimes this will be the phy mode (cases 2 and 3), and sometimes it
> > will be the link mode (case 4). In particular, for a link mode like
> > 1000BASE-SX, it should be configured for 1000BASE-X. But for
> > 1000BASE-KX, it has to be configured for 1000BASE-KX. I suppose the
> > right thing to do here is rework the phy subsystem to use link modes and
> > not phy modes for phy_mode_ext, since it seems like there is a
> > 1000BASE-X link mode. But what should be passed to mac_prepare and
> > mac_select_pcs?
> >
> > As another example, consider the AQR113C. It supports the following
> > (abbreviated) interfaces on the MAC side:
> >
> > - 10GBASE-KR
> > - 1000BASE-KX
> > - 10GBASE-R
> > - 1000BASE-X
> > - USXGMII
> > - SGMII
> >
> > This example of what phy-mode = "1000base-kx" would imply. I would
> > expect that selecting -KX over -X would change the electrical settings
> > to comply with clause 70 (instead of the SFP spec).
>
> Do you have any comments on the above?
What comments do you expect? My message was just don't get sidetracked
by trying to tackle problems you don't need to solve, thinking they're
just there, along the way.
"The problem" of wanting to describe an electrical using phy-mode was
discussed before, with debatable degrees of success and the following
synopsis:
| phy_interface_t describes the protocol *only*, it doesn't describe
| the electrical characteristics of the interface. So, if we exclude
| the electrical characteristics of SFI, we're back to 10GBASE-R,
| 10GBASE-W, 10GFC or 10GBASE-R with FEC. That's what phy_interface_t
| is, not SFI.
|
| So, I propose that we add 10GBASE-R to the list and *not* SFI.
https://lore.kernel.org/netdev/[email protected]/
I don't have access to 802.3 right now to double check, but I think this
is a similar case here - between 1000Base-SX and 1000Base-KX, only the
PMA/PMD is different, otherwise they are still both 1000Base-X in terms
of protocol/coding.
As for the second "example". I had opened my copy of the AQR113C manual
and IIRC it listed 10GBASE-KR in the system interfaces, but I don't
recall seeing 1000Base-KX. And even for 10GBASE-KR, there weren't a lot
of details, like what is configurable about it, is there C73 available etc.
Not extremely clear what that is about, tbh. Something that has
10GBase-KR on system side and 10GBase-T on media side sounds like a
media converter to me. Not sure how we model those in phylib (I think
the answer is we don't).