The tc-taprio offload mode supported by the Felix DSA driver has
limitations surrounding its guard bands.
The initial discussion was at:
https://lore.kernel.org/netdev/[email protected]/
with the latest status being that we now have a vsc9959_tas_guard_bands_update()
method which makes a best-guess attempt at deciding how much of a taprio
interval to reserve for useful packet scheduling, and how much to reserve
for the guard band.
IEEE 802.1Q actually does offer a tunable variable (queueMaxSDU) which
determines the maximum SDU size supported per traffic class. From the
queueMaxSDU we can in turn derive the size needed for each guard band.
This way we can make the guard band of small taprio intervals smaller
than one full MTU worth of transmission time, if we know that said
traffic class will transport only smaller packets.
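As a rough illustration (approximate numbers, assuming 1 Gbps i.e. 8 ns
per octet, 18 octets of L2 header plus FCS, 20 octets of preamble plus
interpacket gap, and no VLAN tags):
  guard band for a full 1500 octet MTU:  (1500 + 18 + 20) * 8 ns ~= 12.3 us
  guard band for a queueMaxSDU of 200:   ( 200 + 18 + 20) * 8 ns ~=  1.9 us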
As discussed with Gerhard Engleder, the queueMaxSDU may also be useful
in limiting the latency on an endpoint, if some of the TX queues are
outside of the control of the Linux driver.
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Allow input of queueMaxSDU through netlink into tc-taprio, offload it to
the hardware I have access to (LS1028A), and deny non-default values to
everyone else. Kurt Kanzenbach has also kindly tested and shared a patch
to offload this to hellcreek.
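For reference, a hypothetical invocation which limits TC 5 to 200 octet
SDUs might look like the one below (the interface name, queue layout and
the iproute2 "max-sdu" option are assumptions for illustration, not
something introduced by this kernel series):
  tc qdisc replace dev eno0 parent root handle 100 taprio \
          num_tc 8 map 0 1 2 3 4 5 6 7 \
          queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
          base-time 0 \
          sched-entry S 0x20 10000 \
          sched-entry S 0xdf 190000 \
          max-sdu 0 0 0 0 0 200 0 0 \
          flags 0x2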
Changes in v2:
- precompute the max_frm_len using dev->hard_header_len, so that the
fast path can directly check against skb->len
- add the newly added lan966x taprio offload to the list of drivers
which must reject the new option
- add some enetc cleanup patches from
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
- get rid of some taprio cleanup patches which were merged separately
via https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
- make enetc_vf.ko compile by excluding the taprio offload code:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
v1 at:
https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
Kurt Kanzenbach (1):
net: dsa: hellcreek: Offload per-tc max SDU from tc-taprio
Vladimir Oltean (11):
net/sched: taprio: allow user input of per-tc max SDU
tsnep: deny tc-taprio changes to per-tc max SDU
igc: deny tc-taprio changes to per-tc max SDU
net: stmmac: deny tc-taprio changes to per-tc max SDU
net: am65-cpsw: deny tc-taprio changes to per-tc max SDU
net: lan966x: deny tc-taprio changes to per-tc max SDU
net: dsa: sja1105: deny tc-taprio changes to per-tc max SDU
net: dsa: felix: offload per-tc max SDU from tc-taprio
net: enetc: cache accesses to &priv->si->hw
net: enetc: use common naming scheme for PTGCR and PTGCAPR registers
net: enetc: offload per-tc max SDU from tc-taprio
drivers/net/dsa/hirschmann/hellcreek.c | 59 +++++++-
drivers/net/dsa/hirschmann/hellcreek.h | 7 +
drivers/net/dsa/ocelot/felix_vsc9959.c | 20 ++-
drivers/net/dsa/sja1105/sja1105_tas.c | 6 +-
drivers/net/ethernet/engleder/tsnep_tc.c | 6 +-
drivers/net/ethernet/freescale/enetc/enetc.c | 28 ++--
drivers/net/ethernet/freescale/enetc/enetc.h | 12 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 10 +-
.../net/ethernet/freescale/enetc/enetc_pf.c | 25 +++-
.../net/ethernet/freescale/enetc/enetc_qos.c | 73 +++++----
drivers/net/ethernet/intel/igc/igc_main.c | 6 +-
.../microchip/lan966x/lan966x_taprio.c | 8 +
.../net/ethernet/stmicro/stmmac/stmmac_tc.c | 6 +-
drivers/net/ethernet/ti/am65-cpsw-qos.c | 6 +-
include/net/pkt_sched.h | 1 +
include/uapi/linux/pkt_sched.h | 11 ++
net/sched/sch_taprio.c | 138 +++++++++++++++++-
17 files changed, 351 insertions(+), 71 deletions(-)
--
2.34.1
Since the driver does not act upon the max_sdu argument, deny any other
values except the default all-zeroes, which means that all traffic
classes should use the same MTU as the port itself.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/ethernet/engleder/tsnep_tc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/engleder/tsnep_tc.c b/drivers/net/ethernet/engleder/tsnep_tc.c
index c4c6e1357317..82df837ffc54 100644
--- a/drivers/net/ethernet/engleder/tsnep_tc.c
+++ b/drivers/net/ethernet/engleder/tsnep_tc.c
@@ -320,11 +320,15 @@ static int tsnep_taprio(struct tsnep_adapter *adapter,
{
struct tsnep_gcl *gcl;
struct tsnep_gcl *curr;
- int retval;
+ int retval, tc;
if (!adapter->gate_control)
return -EOPNOTSUPP;
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++)
+ if (qopt->max_sdu[tc])
+ return -EOPNOTSUPP;
+
if (!qopt->enable) {
/* disable gate control if active */
mutex_lock(&adapter->gate_control_lock);
--
2.34.1
Since the driver does not act upon the max_sdu argument, deny any other
values except the default all-zeroes, which means that all traffic
classes should use the same MTU as the port itself.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
index 773e415cc2de..8e9c7d54fb4c 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c
@@ -922,13 +922,17 @@ static int tc_setup_taprio(struct stmmac_priv *priv,
struct plat_stmmacenet_data *plat = priv->plat;
struct timespec64 time, current_time, qopt_time;
ktime_t current_time_ns;
+ int i, tc, ret = 0;
bool fpe = false;
- int i, ret = 0;
u64 ctr;
if (!priv->dma_cap.estsel)
return -EOPNOTSUPP;
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++)
+ if (qopt->max_sdu[tc])
+ return -EOPNOTSUPP;
+
switch (wid) {
case 0x1:
wid = 16;
--
2.34.1
Since the driver does not act upon the max_sdu argument, deny any other
values except the default all-zeroes, which means that all traffic
classes should use the same MTU as the port itself.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: patch is new
drivers/net/ethernet/microchip/lan966x/lan966x_taprio.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_taprio.c b/drivers/net/ethernet/microchip/lan966x/lan966x_taprio.c
index 3f5b212066c5..96367819ff96 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_taprio.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_taprio.c
@@ -219,8 +219,16 @@ static int lan966x_taprio_find_list(struct lan966x_port *port,
static int lan966x_taprio_check(struct tc_taprio_qopt_offload *qopt)
{
u64 total_time = 0;
+ int tc;
u32 i;
+ /* Offloading queueMaxSDU is not supported at the moment,
+ * only accept the default port MTU
+ */
+ for (tc = 0; tc < 8; tc++)
+ if (qopt->max_sdu[tc])
+ return -EOPNOTSUPP;
+
/* This is not supported by th HW */
if (qopt->cycle_time_extension)
return -EOPNOTSUPP;
--
2.34.1
Since the driver does not act upon the max_sdu argument, deny any other
values except the default all-zeroes, which means that all traffic
classes should use the same MTU as the port itself.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/ethernet/ti/am65-cpsw-qos.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/ti/am65-cpsw-qos.c b/drivers/net/ethernet/ti/am65-cpsw-qos.c
index e162771893af..bd3b425083aa 100644
--- a/drivers/net/ethernet/ti/am65-cpsw-qos.c
+++ b/drivers/net/ethernet/ti/am65-cpsw-qos.c
@@ -503,7 +503,11 @@ static int am65_cpsw_set_taprio(struct net_device *ndev, void *type_data)
struct am65_cpsw_port *port = am65_ndev_to_port(ndev);
struct tc_taprio_qopt_offload *taprio = type_data;
struct am65_cpsw_est *est_new;
- int ret = 0;
+ int tc, ret = 0;
+
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++)
+ if (taprio->max_sdu[tc])
+ return -EOPNOTSUPP;
if (taprio->cycle_time_extension) {
dev_err(&ndev->dev, "Failed to set cycle time extension");
--
2.34.1
Since the driver does not act upon the max_sdu argument (which it
should, in full offload mode), deny any other values except the default
all-zeroes, which means that all traffic classes should use the same MTU
as the port itself.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/dsa/sja1105/sja1105_tas.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/sja1105/sja1105_tas.c b/drivers/net/dsa/sja1105/sja1105_tas.c
index e6153848a950..607f4714fb01 100644
--- a/drivers/net/dsa/sja1105/sja1105_tas.c
+++ b/drivers/net/dsa/sja1105/sja1105_tas.c
@@ -511,7 +511,7 @@ int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
{
struct sja1105_private *priv = ds->priv;
struct sja1105_tas_data *tas_data = &priv->tas_data;
- int other_port, rc, i;
+ int other_port, rc, i, tc;
/* Can't change an already configured port (must delete qdisc first).
* Can't delete the qdisc from an unconfigured port.
@@ -519,6 +519,10 @@ int sja1105_setup_tc_taprio(struct dsa_switch *ds, int port,
if (!!tas_data->offload[port] == admin->enable)
return -EINVAL;
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++)
+ if (admin->max_sdu[tc])
+ return -EOPNOTSUPP;
+
if (!admin->enable) {
taprio_offload_free(tas_data->offload[port]);
tas_data->offload[port] = NULL;
--
2.34.1
Our current vsc9959_tas_guard_bands_update() algorithm has a limitation
imposed by the hardware design. To avoid packet overruns between one
gate interval and the next (which would add jitter for scheduled traffic
in the next gate), we configure the switch to use guard bands. These are
as long as the transmission time of the largest packet which could
possibly be sent.
The problem is that for tc-taprio intervals of a size comparable to a
guard band, there isn't an obvious way to split the interval between the
useful portion (where scheduling is allowed) and the guard band portion
(where scheduling is blocked).
For example, a 10 us interval at 1Gbps allows 1225 octets to be
transmitted. We currently split the interval between the bare minimum of
33 ns useful time (required to schedule a single packet) and the rest as
guard band.
But 33 ns of useful scheduling time will only allow a single packet to
be sent, whether that packet is 1200 octets or 60 octets in size. It is
impossible to send two 60-octet frames in the 10 us window. If, however,
we reduced the guard band (and therefore the maximum allowable SDU size)
to 5 us, the useful time for scheduling would also be 5 us, so more
packets could be scheduled.
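To put rough numbers on that (assuming 8 ns per octet at 1 Gbps,
18 octets of L2 header plus FCS, 20 octets of preamble plus interpacket
gap, and no VLAN tags): a 5 us guard band covers 5000 / 8 = 625 octets
of wire time, which corresponds to a max SDU of roughly
625 - 18 - 20 = 587 octets. Restricting the SDU to that size (or less)
shrinks the guard band to ~5 us and frees the other ~5 us of the 10 us
interval for scheduling.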
The hardware's inflexibility of not scheduling according to individual
packet lengths must unfortunately propagate to the user, who needs to
tune the queueMaxSDU values in order to fit more small packets into a
10 us interval, rather than one large packet.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/dsa/ocelot/felix_vsc9959.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 2ec49e42b3f4..aeb52c5c68d9 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1248,6 +1248,14 @@ static u32 vsc9959_port_qmaxsdu_get(struct ocelot *ocelot, int port, int tc)
}
}
+static u32 vsc9959_tas_tc_max_sdu(struct tc_taprio_qopt_offload *taprio, int tc)
+{
+ if (!taprio || !taprio->max_sdu[tc])
+ return 0;
+
+ return taprio->max_sdu[tc] + ETH_HLEN + 2 * VLAN_HLEN + ETH_FCS_LEN;
+}
+
/* Update QSYS_PORT_MAX_SDU to make sure the static guard bands added by the
* switch (see the ALWAYS_GUARD_BAND_SCH_Q comment) are correct at all MTU
* values (the default value is 1518). Also, for traffic class windows smaller
@@ -1257,6 +1265,7 @@ static u32 vsc9959_port_qmaxsdu_get(struct ocelot *ocelot, int port, int tc)
static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
+ struct tc_taprio_qopt_offload *taprio;
u64 min_gate_len[OCELOT_NUM_TC];
int speed, picos_per_byte;
u64 needed_bit_time_ps;
@@ -1266,6 +1275,8 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
lockdep_assert_held(&ocelot->tas_lock);
+ taprio = ocelot_port->taprio;
+
val = ocelot_read_rix(ocelot, QSYS_TAG_CONFIG, port);
tas_speed = QSYS_TAG_CONFIG_LINK_SPEED_X(val);
@@ -1302,11 +1313,12 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
"port %d: max frame size %d needs %llu ps at speed %d\n",
port, maxlen, needed_bit_time_ps, speed);
- vsc9959_tas_min_gate_lengths(ocelot_port->taprio, min_gate_len);
+ vsc9959_tas_min_gate_lengths(taprio, min_gate_len);
mutex_lock(&ocelot->fwd_domain_lock);
for (tc = 0; tc < OCELOT_NUM_TC; tc++) {
+ u32 requested_max_sdu = vsc9959_tas_tc_max_sdu(taprio, tc);
u64 remaining_gate_len_ps;
u32 max_sdu;
@@ -1317,7 +1329,7 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
/* Setting QMAXSDU_CFG to 0 disables oversized frame
* dropping.
*/
- max_sdu = 0;
+ max_sdu = requested_max_sdu;
dev_dbg(ocelot->dev,
"port %d tc %d min gate len %llu"
", sending all frames\n",
@@ -1348,6 +1360,10 @@ static void vsc9959_tas_guard_bands_update(struct ocelot *ocelot, int port)
*/
if (max_sdu > 20)
max_sdu -= 20;
+
+ if (requested_max_sdu && requested_max_sdu < max_sdu)
+ max_sdu = requested_max_sdu;
+
dev_info(ocelot->dev,
"port %d tc %d min gate length %llu"
" ns not enough for max frame size %d at %d"
--
2.34.1
From: Kurt Kanzenbach <[email protected]>
Add support for configuring the max SDU per priority and per port. If not
specified, keep the default.
Signed-off-by: Kurt Kanzenbach <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: replace my patch that rejects custom queueMaxSDU values with
Kurt's patch that acts upon them
drivers/net/dsa/hirschmann/hellcreek.c | 59 +++++++++++++++++++++++++-
drivers/net/dsa/hirschmann/hellcreek.h | 7 +++
2 files changed, 64 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/hirschmann/hellcreek.c b/drivers/net/dsa/hirschmann/hellcreek.c
index eac6ace7c5f9..ed0bdf0ea6bf 100644
--- a/drivers/net/dsa/hirschmann/hellcreek.c
+++ b/drivers/net/dsa/hirschmann/hellcreek.c
@@ -128,6 +128,16 @@ static void hellcreek_select_prio(struct hellcreek *hellcreek, int prio)
hellcreek_write(hellcreek, val, HR_PSEL);
}
+static void hellcreek_select_port_prio(struct hellcreek *hellcreek, int port,
+ int prio)
+{
+ u16 val = port << HR_PSEL_PTWSEL_SHIFT;
+
+ val |= prio << HR_PSEL_PRTCWSEL_SHIFT;
+
+ hellcreek_write(hellcreek, val, HR_PSEL);
+}
+
static void hellcreek_select_counter(struct hellcreek *hellcreek, int counter)
{
u16 val = counter << HR_CSEL_SHIFT;
@@ -1537,6 +1547,45 @@ hellcreek_port_prechangeupper(struct dsa_switch *ds, int port,
return ret;
}
+static void hellcreek_setup_maxsdu(struct hellcreek *hellcreek, int port,
+ const struct tc_taprio_qopt_offload *schedule)
+{
+ int tc;
+
+ for (tc = 0; tc < 8; ++tc) {
+ u32 max_sdu = schedule->max_sdu[tc] + VLAN_ETH_HLEN - ETH_FCS_LEN;
+ u16 val;
+
+ if (!schedule->max_sdu[tc])
+ continue;
+
+ dev_dbg(hellcreek->dev, "Configure max-sdu %u for tc %d on port %d\n",
+ max_sdu, tc, port);
+
+ hellcreek_select_port_prio(hellcreek, port, tc);
+
+ val = (max_sdu & HR_PTPRTCCFG_MAXSDU_MASK) << HR_PTPRTCCFG_MAXSDU_SHIFT;
+
+ hellcreek_write(hellcreek, val, HR_PTPRTCCFG);
+ }
+}
+
+static void hellcreek_reset_maxsdu(struct hellcreek *hellcreek, int port)
+{
+ int tc;
+
+ for (tc = 0; tc < 8; ++tc) {
+ u16 val;
+
+ hellcreek_select_port_prio(hellcreek, port, tc);
+
+ val = (HELLCREEK_DEFAULT_MAX_SDU & HR_PTPRTCCFG_MAXSDU_MASK)
+ << HR_PTPRTCCFG_MAXSDU_SHIFT;
+
+ hellcreek_write(hellcreek, val, HR_PTPRTCCFG);
+ }
+}
+
static void hellcreek_setup_gcl(struct hellcreek *hellcreek, int port,
const struct tc_taprio_qopt_offload *schedule)
{
@@ -1720,7 +1769,10 @@ static int hellcreek_port_set_schedule(struct dsa_switch *ds, int port,
}
hellcreek_port->current_schedule = taprio_offload_get(taprio);
- /* Then select port */
+ /* Configure max sdu */
+ hellcreek_setup_maxsdu(hellcreek, port, hellcreek_port->current_schedule);
+
+ /* Select tdg */
hellcreek_select_tgd(hellcreek, port);
/* Enable gating and keep defaults */
@@ -1772,7 +1824,10 @@ static int hellcreek_port_del_schedule(struct dsa_switch *ds, int port)
hellcreek_port->current_schedule = NULL;
}
- /* Then select port */
+ /* Reset max sdu */
+ hellcreek_reset_maxsdu(hellcreek, port);
+
+ /* Select tgd */
hellcreek_select_tgd(hellcreek, port);
/* Disable gating and return to regular switching flow */
diff --git a/drivers/net/dsa/hirschmann/hellcreek.h b/drivers/net/dsa/hirschmann/hellcreek.h
index 9e303b8ab13c..4a678f7d61ae 100644
--- a/drivers/net/dsa/hirschmann/hellcreek.h
+++ b/drivers/net/dsa/hirschmann/hellcreek.h
@@ -37,6 +37,7 @@
#define HELLCREEK_VLAN_UNTAGGED_MEMBER 0x1
#define HELLCREEK_VLAN_TAGGED_MEMBER 0x3
#define HELLCREEK_NUM_EGRESS_QUEUES 8
+#define HELLCREEK_DEFAULT_MAX_SDU 1536
/* Register definitions */
#define HR_MODID_C (0 * 2)
@@ -72,6 +73,12 @@
#define HR_PRTCCFG_PCP_TC_MAP_SHIFT 0
#define HR_PRTCCFG_PCP_TC_MAP_MASK GENMASK(2, 0)
+#define HR_PTPRTCCFG (0xa9 * 2)
+#define HR_PTPRTCCFG_SET_QTRACK BIT(15)
+#define HR_PTPRTCCFG_REJECT BIT(14)
+#define HR_PTPRTCCFG_MAXSDU_SHIFT 0
+#define HR_PTPRTCCFG_MAXSDU_MASK GENMASK(10, 0)
+
#define HR_CSEL (0x8d * 2)
#define HR_CSEL_SHIFT 0
#define HR_CSEL_MASK GENMASK(7, 0)
--
2.34.1
The &priv->si->hw construct dereferences 2 pointers and makes lines
longer than they need to be, in turn making the code harder to read.
Replace the &priv->si->hw accesses with a "hw" variable in functions
which dereference this construct 2 or more times. This includes loops,
since &priv->si->hw is a loop invariant.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/ethernet/freescale/enetc/enetc.c | 28 +++++----
drivers/net/ethernet/freescale/enetc/enetc.h | 9 +--
.../net/ethernet/freescale/enetc/enetc_qos.c | 60 +++++++++----------
3 files changed, 49 insertions(+), 48 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 9f5b921039bd..151fb3fa4806 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2116,13 +2116,14 @@ static void enetc_setup_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
static void enetc_setup_bdrs(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
int i;
for (i = 0; i < priv->num_tx_rings; i++)
- enetc_setup_txbdr(&priv->si->hw, priv->tx_ring[i]);
+ enetc_setup_txbdr(hw, priv->tx_ring[i]);
for (i = 0; i < priv->num_rx_rings; i++)
- enetc_setup_rxbdr(&priv->si->hw, priv->rx_ring[i]);
+ enetc_setup_rxbdr(hw, priv->rx_ring[i]);
}
static void enetc_clear_rxbdr(struct enetc_hw *hw, struct enetc_bdr *rx_ring)
@@ -2155,13 +2156,14 @@ static void enetc_clear_txbdr(struct enetc_hw *hw, struct enetc_bdr *tx_ring)
static void enetc_clear_bdrs(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
int i;
for (i = 0; i < priv->num_tx_rings; i++)
- enetc_clear_txbdr(&priv->si->hw, priv->tx_ring[i]);
+ enetc_clear_txbdr(hw, priv->tx_ring[i]);
for (i = 0; i < priv->num_rx_rings; i++)
- enetc_clear_rxbdr(&priv->si->hw, priv->rx_ring[i]);
+ enetc_clear_rxbdr(hw, priv->rx_ring[i]);
udelay(1);
}
@@ -2169,13 +2171,13 @@ static void enetc_clear_bdrs(struct enetc_ndev_priv *priv)
static int enetc_setup_irqs(struct enetc_ndev_priv *priv)
{
struct pci_dev *pdev = priv->si->pdev;
+ struct enetc_hw *hw = &priv->si->hw;
int i, j, err;
for (i = 0; i < priv->bdr_int_num; i++) {
int irq = pci_irq_vector(pdev, ENETC_BDR_INT_BASE_IDX + i);
struct enetc_int_vector *v = priv->int_vector[i];
int entry = ENETC_BDR_INT_BASE_IDX + i;
- struct enetc_hw *hw = &priv->si->hw;
snprintf(v->name, sizeof(v->name), "%s-rxtx%d",
priv->ndev->name, i);
@@ -2263,13 +2265,14 @@ static void enetc_setup_interrupts(struct enetc_ndev_priv *priv)
static void enetc_clear_interrupts(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
int i;
for (i = 0; i < priv->num_tx_rings; i++)
- enetc_txbdr_wr(&priv->si->hw, i, ENETC_TBIER, 0);
+ enetc_txbdr_wr(hw, i, ENETC_TBIER, 0);
for (i = 0; i < priv->num_rx_rings; i++)
- enetc_rxbdr_wr(&priv->si->hw, i, ENETC_RBIER, 0);
+ enetc_rxbdr_wr(hw, i, ENETC_RBIER, 0);
}
static int enetc_phylink_connect(struct net_device *ndev)
@@ -2436,6 +2439,7 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
struct tc_mqprio_qopt *mqprio = type_data;
+ struct enetc_hw *hw = &priv->si->hw;
struct enetc_bdr *tx_ring;
int num_stack_tx_queues;
u8 num_tc;
@@ -2452,7 +2456,7 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
/* Reset all ring priorities to 0 */
for (i = 0; i < priv->num_tx_rings; i++) {
tx_ring = priv->tx_ring[i];
- enetc_set_bdr_prio(&priv->si->hw, tx_ring->index, 0);
+ enetc_set_bdr_prio(hw, tx_ring->index, 0);
}
return 0;
@@ -2471,7 +2475,7 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
*/
for (i = 0; i < num_tc; i++) {
tx_ring = priv->tx_ring[i];
- enetc_set_bdr_prio(&priv->si->hw, tx_ring->index, i);
+ enetc_set_bdr_prio(hw, tx_ring->index, i);
}
/* Reset the number of netdev queues based on the TC count */
@@ -2584,19 +2588,21 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static void enetc_enable_rxvlan(struct net_device *ndev, bool en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_hw *hw = &priv->si->hw;
int i;
for (i = 0; i < priv->num_rx_rings; i++)
- enetc_bdr_enable_rxvlan(&priv->si->hw, i, en);
+ enetc_bdr_enable_rxvlan(hw, i, en);
}
static void enetc_enable_txvlan(struct net_device *ndev, bool en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_hw *hw = &priv->si->hw;
int i;
for (i = 0; i < priv->num_tx_rings; i++)
- enetc_bdr_enable_txvlan(&priv->si->hw, i, en);
+ enetc_bdr_enable_txvlan(hw, i, en);
}
void enetc_set_features(struct net_device *ndev, netdev_features_t features)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 2cfe6944ebd3..748677b2ce1f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -467,19 +467,20 @@ int enetc_set_psfp(struct net_device *ndev, bool en);
static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
u32 reg;
- reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ reg = enetc_port_rd(hw, ENETC_PSIDCAPR);
priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
/* Port stream filter capability */
- reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ reg = enetc_port_rd(hw, ENETC_PSFCAPR);
priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
/* Port stream gate capability */
- reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ reg = enetc_port_rd(hw, ENETC_PSGCAPR);
priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
/* Port flow meter capability */
- reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ reg = enetc_port_rd(hw, ENETC_PFMCAPR);
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index f8a2f02ce22d..2e783ef73690 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -17,8 +17,9 @@ static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
void enetc_sched_speed_set(struct enetc_ndev_priv *priv, int speed)
{
+ struct enetc_hw *hw = &priv->si->hw;
u32 old_speed = priv->speed;
- u32 pspeed;
+ u32 pspeed, tmp;
if (speed == old_speed)
return;
@@ -39,16 +40,15 @@ void enetc_sched_speed_set(struct enetc_ndev_priv *priv, int speed)
}
priv->speed = speed;
- enetc_port_wr(&priv->si->hw, ENETC_PMR,
- (enetc_port_rd(&priv->si->hw, ENETC_PMR)
- & (~ENETC_PMR_PSPEED_MASK))
- | pspeed);
+ tmp = enetc_port_rd(hw, ENETC_PMR);
+ enetc_port_wr(hw, ENETC_PMR, (tmp & ~ENETC_PMR_PSPEED_MASK) | pspeed);
}
static int enetc_setup_taprio(struct net_device *ndev,
struct tc_taprio_qopt_offload *admin_conf)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_hw *hw = &priv->si->hw;
struct enetc_cbd cbd = {.cmd = 0};
struct tgs_gcl_conf *gcl_config;
struct tgs_gcl_data *gcl_data;
@@ -61,15 +61,13 @@ static int enetc_setup_taprio(struct net_device *ndev,
int err;
int i;
- if (admin_conf->num_entries > enetc_get_max_gcl_len(&priv->si->hw))
+ if (admin_conf->num_entries > enetc_get_max_gcl_len(hw))
return -EINVAL;
gcl_len = admin_conf->num_entries;
- tge = enetc_rd(&priv->si->hw, ENETC_QBV_PTGCR_OFFSET);
+ tge = enetc_rd(hw, ENETC_QBV_PTGCR_OFFSET);
if (!admin_conf->enable) {
- enetc_wr(&priv->si->hw,
- ENETC_QBV_PTGCR_OFFSET,
- tge & (~ENETC_QBV_TGE));
+ enetc_wr(hw, ENETC_QBV_PTGCR_OFFSET, tge & ~ENETC_QBV_TGE);
priv->active_offloads &= ~ENETC_F_QBV;
@@ -117,14 +115,11 @@ static int enetc_setup_taprio(struct net_device *ndev,
cbd.cls = BDCR_CMD_PORT_GCL;
cbd.status_flags = 0;
- enetc_wr(&priv->si->hw, ENETC_QBV_PTGCR_OFFSET,
- tge | ENETC_QBV_TGE);
+ enetc_wr(hw, ENETC_QBV_PTGCR_OFFSET, tge | ENETC_QBV_TGE);
err = enetc_send_cmd(priv->si, &cbd);
if (err)
- enetc_wr(&priv->si->hw,
- ENETC_QBV_PTGCR_OFFSET,
- tge & (~ENETC_QBV_TGE));
+ enetc_wr(hw, ENETC_QBV_PTGCR_OFFSET, tge & ~ENETC_QBV_TGE);
enetc_cbd_free_data_mem(priv->si, data_size, tmp, &dma);
@@ -138,6 +133,7 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data)
{
struct tc_taprio_qopt_offload *taprio = type_data;
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct enetc_hw *hw = &priv->si->hw;
int err;
int i;
@@ -147,16 +143,14 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data)
return -EBUSY;
for (i = 0; i < priv->num_tx_rings; i++)
- enetc_set_bdr_prio(&priv->si->hw,
- priv->tx_ring[i]->index,
+ enetc_set_bdr_prio(hw, priv->tx_ring[i]->index,
taprio->enable ? i : 0);
err = enetc_setup_taprio(ndev, taprio);
if (err)
for (i = 0; i < priv->num_tx_rings; i++)
- enetc_set_bdr_prio(&priv->si->hw,
- priv->tx_ring[i]->index,
+ enetc_set_bdr_prio(hw, priv->tx_ring[i]->index,
taprio->enable ? 0 : i);
return err;
@@ -178,7 +172,7 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
struct tc_cbs_qopt_offload *cbs = type_data;
u32 port_transmit_rate = priv->speed;
u8 tc_nums = netdev_get_num_tc(ndev);
- struct enetc_si *si = priv->si;
+ struct enetc_hw *hw = &priv->si->hw;
u32 hi_credit_bit, hi_credit_reg;
u32 max_interference_size;
u32 port_frame_max_size;
@@ -199,15 +193,15 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
* lower than this TC have been disabled.
*/
if (tc == prio_top &&
- enetc_get_cbs_enable(&si->hw, prio_next)) {
+ enetc_get_cbs_enable(hw, prio_next)) {
dev_err(&ndev->dev,
"Disable TC%d before disable TC%d\n",
prio_next, tc);
return -EINVAL;
}
- enetc_port_wr(&si->hw, ENETC_PTCCBSR1(tc), 0);
- enetc_port_wr(&si->hw, ENETC_PTCCBSR0(tc), 0);
+ enetc_port_wr(hw, ENETC_PTCCBSR1(tc), 0);
+ enetc_port_wr(hw, ENETC_PTCCBSR0(tc), 0);
return 0;
}
@@ -224,13 +218,13 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
* higher than this TC have been enabled.
*/
if (tc == prio_next) {
- if (!enetc_get_cbs_enable(&si->hw, prio_top)) {
+ if (!enetc_get_cbs_enable(hw, prio_top)) {
dev_err(&ndev->dev,
"Enable TC%d first before enable TC%d\n",
prio_top, prio_next);
return -EINVAL;
}
- bw_sum += enetc_get_cbs_bw(&si->hw, prio_top);
+ bw_sum += enetc_get_cbs_bw(hw, prio_top);
}
if (bw_sum + bw >= 100) {
@@ -239,7 +233,7 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
return -EINVAL;
}
- enetc_port_rd(&si->hw, ENETC_PTCMSDUR(tc));
+ enetc_port_rd(hw, ENETC_PTCMSDUR(tc));
/* For top prio TC, the max_interfrence_size is maxSizedFrame.
*
@@ -259,8 +253,8 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
u32 m0, ma, r0, ra;
m0 = port_frame_max_size * 8;
- ma = enetc_port_rd(&si->hw, ENETC_PTCMSDUR(prio_top)) * 8;
- ra = enetc_get_cbs_bw(&si->hw, prio_top) *
+ ma = enetc_port_rd(hw, ENETC_PTCMSDUR(prio_top)) * 8;
+ ra = enetc_get_cbs_bw(hw, prio_top) *
port_transmit_rate * 10000ULL;
r0 = port_transmit_rate * 1000000ULL;
max_interference_size = m0 + ma +
@@ -280,10 +274,10 @@ int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data)
hi_credit_reg = (u32)div_u64((ENETC_CLK * 100ULL) * hi_credit_bit,
port_transmit_rate * 1000000ULL);
- enetc_port_wr(&si->hw, ENETC_PTCCBSR1(tc), hi_credit_reg);
+ enetc_port_wr(hw, ENETC_PTCCBSR1(tc), hi_credit_reg);
/* Set bw register and enable this traffic class */
- enetc_port_wr(&si->hw, ENETC_PTCCBSR0(tc), bw | ENETC_CBSE);
+ enetc_port_wr(hw, ENETC_PTCCBSR0(tc), bw | ENETC_CBSE);
return 0;
}
@@ -293,6 +287,7 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)
struct enetc_ndev_priv *priv = netdev_priv(ndev);
struct tc_etf_qopt_offload *qopt = type_data;
u8 tc_nums = netdev_get_num_tc(ndev);
+ struct enetc_hw *hw = &priv->si->hw;
int tc;
if (!tc_nums)
@@ -304,12 +299,11 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)
return -EINVAL;
/* TSD and Qbv are mutually exclusive in hardware */
- if (enetc_rd(&priv->si->hw, ENETC_QBV_PTGCR_OFFSET) & ENETC_QBV_TGE)
+ if (enetc_rd(hw, ENETC_QBV_PTGCR_OFFSET) & ENETC_QBV_TGE)
return -EBUSY;
priv->tx_ring[tc]->tsd_enable = qopt->enable;
- enetc_port_wr(&priv->si->hw, ENETC_PTCTSDR(tc),
- qopt->enable ? ENETC_TSDE : 0);
+ enetc_port_wr(hw, ENETC_PTCTSDR(tc), qopt->enable ? ENETC_TSDE : 0);
return 0;
}
--
2.34.1
The Port Time Gating Control Register (PTGCR) and Port Time Gating
Capability Register (PTGCAPR) have definitions in the driver which
aren't in line with the naming convention used for the other registers.
Rename them.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: patch is new (actually taken from the separate preliminary
series at https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/)
drivers/net/ethernet/freescale/enetc/enetc_hw.h | 10 +++++-----
drivers/net/ethernet/freescale/enetc/enetc_qos.c | 13 ++++++-------
2 files changed, 11 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 0b85e37a00eb..18ca1f42b1f7 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -945,13 +945,13 @@ static inline u32 enetc_usecs_to_cycles(u32 usecs)
}
/* port time gating control register */
-#define ENETC_QBV_PTGCR_OFFSET 0x11a00
-#define ENETC_QBV_TGE BIT(31)
-#define ENETC_QBV_TGPE BIT(30)
+#define ENETC_PTGCR 0x11a00
+#define ENETC_PTGCR_TGE BIT(31)
+#define ENETC_PTGCR_TGPE BIT(30)
/* Port time gating capability register */
-#define ENETC_QBV_PTGCAPR_OFFSET 0x11a08
-#define ENETC_QBV_MAX_GCL_LEN_MASK GENMASK(15, 0)
+#define ENETC_PTGCAPR 0x11a08
+#define ENETC_PTGCAPR_MAX_GCL_LEN_MASK GENMASK(15, 0)
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 2e783ef73690..ee28cb62afe8 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -11,8 +11,7 @@
static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
- return enetc_rd(hw, ENETC_QBV_PTGCAPR_OFFSET)
- & ENETC_QBV_MAX_GCL_LEN_MASK;
+ return enetc_rd(hw, ENETC_PTGCAPR) & ENETC_PTGCAPR_MAX_GCL_LEN_MASK;
}
void enetc_sched_speed_set(struct enetc_ndev_priv *priv, int speed)
@@ -65,9 +64,9 @@ static int enetc_setup_taprio(struct net_device *ndev,
return -EINVAL;
gcl_len = admin_conf->num_entries;
- tge = enetc_rd(hw, ENETC_QBV_PTGCR_OFFSET);
+ tge = enetc_rd(hw, ENETC_PTGCR);
if (!admin_conf->enable) {
- enetc_wr(hw, ENETC_QBV_PTGCR_OFFSET, tge & ~ENETC_QBV_TGE);
+ enetc_wr(hw, ENETC_PTGCR, tge & ~ENETC_PTGCR_TGE);
priv->active_offloads &= ~ENETC_F_QBV;
@@ -115,11 +114,11 @@ static int enetc_setup_taprio(struct net_device *ndev,
cbd.cls = BDCR_CMD_PORT_GCL;
cbd.status_flags = 0;
- enetc_wr(hw, ENETC_QBV_PTGCR_OFFSET, tge | ENETC_QBV_TGE);
+ enetc_wr(hw, ENETC_PTGCR, tge | ENETC_PTGCR_TGE);
err = enetc_send_cmd(priv->si, &cbd);
if (err)
- enetc_wr(hw, ENETC_QBV_PTGCR_OFFSET, tge & ~ENETC_QBV_TGE);
+ enetc_wr(hw, ENETC_PTGCR, tge & ~ENETC_PTGCR_TGE);
enetc_cbd_free_data_mem(priv->si, data_size, tmp, &dma);
@@ -299,7 +298,7 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)
return -EINVAL;
/* TSD and Qbv are mutually exclusive in hardware */
- if (enetc_rd(hw, ENETC_QBV_PTGCR_OFFSET) & ENETC_QBV_TGE)
+ if (enetc_rd(hw, ENETC_PTGCR) & ENETC_PTGCR_TGE)
return -EBUSY;
priv->tx_ring[tc]->tsd_enable = qopt->enable;
--
2.34.1
The driver currently sets the PTCMSDUR register statically to the max
MTU supported by the interface. Keep this logic if tc-taprio is absent
or if the max_sdu for a traffic class is 0, and follow the requested max
SDU size otherwise.
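For instance (as per the enetc_set_ptcmsdur() hunk below), a requested
max SDU of 200 octets for a traffic class programs PTCMSDUR(tc) to
200 + VLAN_ETH_HLEN = 218, while traffic classes whose max_sdu is left
at 0 keep the ENETC_MAC_MAXFRM_SIZE default.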
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/ethernet/freescale/enetc/enetc.h | 3 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 25 ++++++++++++++++---
.../net/ethernet/freescale/enetc/enetc_qos.c | 10 +++++---
3 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 748677b2ce1f..d7edc04f4bfc 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -453,6 +453,9 @@ static inline void enetc_cbd_free_data_mem(struct enetc_si *si, int size,
data, *dma);
}
+void enetc_reset_ptcmsdur(struct enetc_hw *hw);
+void enetc_set_ptcmsdur(struct enetc_hw *hw, u32 *queue_max_sdu);
+
#ifdef CONFIG_FSL_ENETC_QOS
int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct enetc_ndev_priv *priv, int speed);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index bb7750222691..de39d9ba7534 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -516,15 +516,34 @@ static void enetc_port_si_configure(struct enetc_si *si)
enetc_port_wr(hw, ENETC_PSIVLANFMR, ENETC_PSIVLANFMR_VS);
}
-static void enetc_configure_port_mac(struct enetc_hw *hw)
+void enetc_set_ptcmsdur(struct enetc_hw *hw, u32 *max_sdu)
{
int tc;
- enetc_port_wr(hw, ENETC_PM0_MAXFRM,
- ENETC_SET_MAXFRM(ENETC_RX_MAXFRM_SIZE));
+ for (tc = 0; tc < 8; tc++) {
+ u32 val = ENETC_MAC_MAXFRM_SIZE;
+
+ if (max_sdu[tc])
+ val = max_sdu[tc] + VLAN_ETH_HLEN;
+
+ enetc_port_wr(hw, ENETC_PTCMSDUR(tc), val);
+ }
+}
+
+void enetc_reset_ptcmsdur(struct enetc_hw *hw)
+{
+ int tc;
for (tc = 0; tc < 8; tc++)
enetc_port_wr(hw, ENETC_PTCMSDUR(tc), ENETC_MAC_MAXFRM_SIZE);
+}
+
+static void enetc_configure_port_mac(struct enetc_hw *hw)
+{
+ enetc_port_wr(hw, ENETC_PM0_MAXFRM,
+ ENETC_SET_MAXFRM(ENETC_RX_MAXFRM_SIZE));
+
+ enetc_reset_ptcmsdur(hw);
enetc_port_wr(hw, ENETC_PM0_CMD_CFG, ENETC_PM0_CMD_PHY_TX_EN |
ENETC_PM0_CMD_TXP | ENETC_PM0_PROMISC);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index ee28cb62afe8..f89623792916 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -67,6 +67,7 @@ static int enetc_setup_taprio(struct net_device *ndev,
tge = enetc_rd(hw, ENETC_PTGCR);
if (!admin_conf->enable) {
enetc_wr(hw, ENETC_PTGCR, tge & ~ENETC_PTGCR_TGE);
+ enetc_reset_ptcmsdur(hw);
priv->active_offloads &= ~ENETC_F_QBV;
@@ -122,10 +123,13 @@ static int enetc_setup_taprio(struct net_device *ndev,
enetc_cbd_free_data_mem(priv->si, data_size, tmp, &dma);
- if (!err)
- priv->active_offloads |= ENETC_F_QBV;
+ if (err)
+ return err;
- return err;
+ enetc_set_ptcmsdur(hw, admin_conf->max_sdu);
+ priv->active_offloads |= ENETC_F_QBV;
+
+ return 0;
}
int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data)
--
2.34.1
IEEE 802.1Q clause 12.29.1.1 "The queueMaxSDUTable structure and data
types" and 8.6.8.4 "Enhancements for scheduled traffic" talk about the
existence of a per traffic class limitation of maximum frame sizes, with
a fallback on the port-based MTU.
As far as I am able to understand, the 802.1Q Service Data Unit (SDU)
represents the MAC Service Data Unit (MSDU, i.e. L2 payload), excluding
any number of prepended VLAN headers which may be otherwise present in
the MSDU. Therefore, the queueMaxSDU is directly comparable to the
device MTU (a value of 1500 means L2 payloads of up to 1500 octets are
accepted, i.e. frame sizes of up to 1518 octets, or 1522 octets with one
VLAN header). Drivers which offload this are directly responsible for
translating it into other units of measurement.
To keep the fast path checks optimized, we keep 2 arrays in the qdisc:
one with max_sdu translated into frame length (so that it's directly
comparable to skb->len), and another with the raw max_sdu values, used
for offloading and for dumping back to the user.
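As a concrete example, on Ethernet (dev->hard_header_len = 14), a
queueMaxSDU of 200 yields a max_frm_len of 214, which is directly
comparable to skb->len at enqueue time (L2 header included, FCS not).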
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: precompute the max_frm_len using dev->hard_header_len, so that
the fast path can directly check against skb->len
include/net/pkt_sched.h | 1 +
include/uapi/linux/pkt_sched.h | 11 +++
net/sched/sch_taprio.c | 138 ++++++++++++++++++++++++++++++++-
3 files changed, 149 insertions(+), 1 deletion(-)
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2ff80cd04c5c..3c65417bea94 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -168,6 +168,7 @@ struct tc_taprio_qopt_offload {
ktime_t base_time;
u64 cycle_time;
u64 cycle_time_extension;
+ u32 max_sdu[TC_MAX_QUEUE];
size_t num_entries;
struct tc_taprio_sched_entry entries[];
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index f292b467b27f..000eec106856 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1232,6 +1232,16 @@ enum {
#define TCA_TAPRIO_ATTR_FLAG_TXTIME_ASSIST _BITUL(0)
#define TCA_TAPRIO_ATTR_FLAG_FULL_OFFLOAD _BITUL(1)
+enum {
+ TCA_TAPRIO_TC_ENTRY_UNSPEC,
+ TCA_TAPRIO_TC_ENTRY_INDEX, /* u32 */
+ TCA_TAPRIO_TC_ENTRY_MAX_SDU, /* u32 */
+
+ /* add new constants above here */
+ __TCA_TAPRIO_TC_ENTRY_CNT,
+ TCA_TAPRIO_TC_ENTRY_MAX = (__TCA_TAPRIO_TC_ENTRY_CNT - 1)
+};
+
enum {
TCA_TAPRIO_ATTR_UNSPEC,
TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
@@ -1245,6 +1255,7 @@ enum {
TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
TCA_TAPRIO_ATTR_FLAGS, /* u32 */
TCA_TAPRIO_ATTR_TXTIME_DELAY, /* u32 */
+ TCA_TAPRIO_ATTR_TC_ENTRY, /* nest */
__TCA_TAPRIO_ATTR_MAX,
};
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 0bc6d90e1e51..c38ed1861ee7 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -78,6 +78,8 @@ struct taprio_sched {
struct sched_gate_list __rcu *admin_sched;
struct hrtimer advance_timer;
struct list_head taprio_list;
+ u32 max_frm_len[TC_MAX_QUEUE]; /* for the fast path */
+ u32 max_sdu[TC_MAX_QUEUE]; /* for dump and offloading */
u32 txtime_delay;
};
@@ -415,6 +417,9 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
struct Qdisc *child, struct sk_buff **to_free)
{
struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ int prio = skb->priority;
+ u8 tc;
/* sk_flags are only safe to use on full sockets. */
if (skb->sk && sk_fullsock(skb->sk) && sock_flag(skb->sk, SOCK_TXTIME)) {
@@ -426,6 +431,11 @@ static int taprio_enqueue_one(struct sk_buff *skb, struct Qdisc *sch,
return qdisc_drop(skb, sch, to_free);
}
+ /* Devices with full offload are expected to honor this in hardware */
+ tc = netdev_get_prio_tc_map(dev, prio);
+ if (skb->len > q->max_frm_len[tc])
+ return qdisc_drop(skb, sch, to_free);
+
qdisc_qstats_backlog_inc(sch, skb);
sch->q.qlen++;
@@ -754,6 +764,11 @@ static const struct nla_policy entry_policy[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = {
[TCA_TAPRIO_SCHED_ENTRY_INTERVAL] = { .type = NLA_U32 },
};
+static const struct nla_policy taprio_tc_policy[TCA_TAPRIO_TC_ENTRY_MAX + 1] = {
+ [TCA_TAPRIO_TC_ENTRY_INDEX] = { .type = NLA_U32 },
+ [TCA_TAPRIO_TC_ENTRY_MAX_SDU] = { .type = NLA_U32 },
+};
+
static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
[TCA_TAPRIO_ATTR_PRIOMAP] = {
.len = sizeof(struct tc_mqprio_qopt)
@@ -766,6 +781,7 @@ static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
[TCA_TAPRIO_ATTR_FLAGS] = { .type = NLA_U32 },
[TCA_TAPRIO_ATTR_TXTIME_DELAY] = { .type = NLA_U32 },
+ [TCA_TAPRIO_ATTR_TC_ENTRY] = { .type = NLA_NESTED },
};
static int fill_sched_entry(struct taprio_sched *q, struct nlattr **tb,
@@ -1216,7 +1232,7 @@ static int taprio_enable_offload(struct net_device *dev,
{
const struct net_device_ops *ops = dev->netdev_ops;
struct tc_taprio_qopt_offload *offload;
- int err = 0;
+ int tc, err = 0;
if (!ops->ndo_setup_tc) {
NL_SET_ERR_MSG(extack,
@@ -1233,6 +1249,9 @@ static int taprio_enable_offload(struct net_device *dev,
offload->enable = 1;
taprio_sched_to_offload(dev, sched, offload);
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++)
+ offload->max_sdu[tc] = q->max_sdu[tc];
+
err = ops->ndo_setup_tc(dev, TC_SETUP_QDISC_TAPRIO, offload);
if (err < 0) {
NL_SET_ERR_MSG(extack,
@@ -1367,6 +1386,89 @@ static int taprio_parse_clockid(struct Qdisc *sch, struct nlattr **tb,
return err;
}
+static int taprio_parse_tc_entry(struct Qdisc *sch,
+ struct nlattr *opt,
+ u32 max_sdu[TC_QOPT_MAX_QUEUE],
+ unsigned long *seen_tcs,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { };
+ struct net_device *dev = qdisc_dev(sch);
+ u32 val = 0;
+ int err, tc;
+
+ err = nla_parse_nested(tb, TCA_TAPRIO_TC_ENTRY_MAX, opt,
+ taprio_tc_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) {
+ NL_SET_ERR_MSG_MOD(extack, "TC entry index missing");
+ return -EINVAL;
+ }
+
+ tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]);
+ if (tc >= TC_QOPT_MAX_QUEUE) {
+ NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range");
+ return -ERANGE;
+ }
+
+ if (*seen_tcs & BIT(tc)) {
+ NL_SET_ERR_MSG_MOD(extack, "Duplicate TC entry");
+ return -EINVAL;
+ }
+
+ *seen_tcs |= BIT(tc);
+
+ if (tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU])
+ val = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]);
+
+ if (val > dev->max_mtu) {
+ NL_SET_ERR_MSG_MOD(extack, "TC max SDU exceeds device max MTU");
+ return -ERANGE;
+ }
+
+ max_sdu[tc] = val;
+
+ return 0;
+}
+
+static int taprio_parse_tc_entries(struct Qdisc *sch,
+ struct nlattr *opt,
+ struct netlink_ext_ack *extack)
+{
+ struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ u32 max_sdu[TC_QOPT_MAX_QUEUE];
+ unsigned long seen_tcs = 0;
+ struct nlattr *n;
+ int tc, rem;
+ int err = 0;
+
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ max_sdu[tc] = q->max_sdu[tc];
+
+ nla_for_each_nested(n, opt, rem) {
+ if (nla_type(n) != TCA_TAPRIO_ATTR_TC_ENTRY)
+ continue;
+
+ err = taprio_parse_tc_entry(sch, n, max_sdu, &seen_tcs, extack);
+ if (err)
+ goto out;
+ }
+
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) {
+ q->max_sdu[tc] = max_sdu[tc];
+ if (max_sdu[tc])
+ q->max_frm_len[tc] = max_sdu[tc] + dev->hard_header_len;
+ else
+ q->max_frm_len[tc] = U32_MAX; /* never oversized */
+ }
+
+out:
+ return err;
+}
+
static int taprio_mqprio_cmp(const struct net_device *dev,
const struct tc_mqprio_qopt *mqprio)
{
@@ -1445,6 +1547,10 @@ static int taprio_change(struct Qdisc *sch, struct nlattr *opt,
if (err < 0)
return err;
+ err = taprio_parse_tc_entries(sch, opt, extack);
+ if (err)
+ return err;
+
new_admin = kzalloc(sizeof(*new_admin), GFP_KERNEL);
if (!new_admin) {
NL_SET_ERR_MSG(extack, "Not enough memory for a new schedule");
@@ -1825,6 +1931,33 @@ static int dump_schedule(struct sk_buff *msg,
return -1;
}
+static int taprio_dump_tc_entries(struct taprio_sched *q, struct sk_buff *skb)
+{
+ struct nlattr *n;
+ int tc;
+
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++) {
+ n = nla_nest_start(skb, TCA_TAPRIO_ATTR_TC_ENTRY);
+ if (!n)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_INDEX, tc))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_MAX_SDU,
+ q->max_sdu[tc]))
+ goto nla_put_failure;
+
+ nla_nest_end(skb, n);
+ }
+
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, n);
+ return -EMSGSIZE;
+}
+
static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
{
struct taprio_sched *q = qdisc_priv(sch);
@@ -1863,6 +1996,9 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
goto options_error;
+ if (taprio_dump_tc_entries(q, skb))
+ goto options_error;
+
if (oper && dump_schedule(skb, oper))
goto options_error;
--
2.34.1
Since the driver does not act upon the max_sdu argument, deny any other
values except the default all-zeroes, which means that all traffic
classes should use the same MTU as the port itself.
Signed-off-by: Vladimir Oltean <[email protected]>
---
v1->v2: none
drivers/net/ethernet/intel/igc/igc_main.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c
index bf6c461e1a2a..47fae443066c 100644
--- a/drivers/net/ethernet/intel/igc/igc_main.c
+++ b/drivers/net/ethernet/intel/igc/igc_main.c
@@ -5965,11 +5965,15 @@ static int igc_tsn_enable_qbv_scheduling(struct igc_adapter *adapter,
struct tc_taprio_qopt_offload *qopt)
{
struct igc_hw *hw = &adapter->hw;
- int err;
+ int tc, err;
if (hw->mac.type != igc_i225)
return -EOPNOTSUPP;
+ for (tc = 0; tc < TC_MAX_QUEUE; tc++)
+ if (qopt->max_sdu[tc])
+ return -EOPNOTSUPP;
+
err = igc_save_qbv_schedule(adapter, qopt);
if (err)
return err;
--
2.34.1
On Fri, 23 Sep 2022 19:32:59 +0300 Vladimir Oltean wrote:
> + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) {
> + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing");
NL_SET_ERR_ATTR_MISS() ?
> + return -EINVAL;
> + }
> +
> + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]);
> + if (tc >= TC_QOPT_MAX_QUEUE) {
> + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range");
NLA_POLICY_MAX()
Are you not using those on purpose? :(
On Fri, 23 Sep 2022 19:33:00 +0300 Vladimir Oltean wrote:
> Since the driver does not act upon the max_sdu argument, deny any other
> values except the default all-zeroes, which means that all traffic
> classes should use the same MTU as the port itself.
Don't all the driver patches make you wanna turn this into an opt-in?
What are the chances we'll catch all drivers missing the validation
in review?
On Mon, Sep 26, 2022 at 01:40:25PM -0700, Jakub Kicinski wrote:
> On Fri, 23 Sep 2022 19:33:00 +0300 Vladimir Oltean wrote:
> > Since the driver does not act upon the max_sdu argument, deny any other
> > values except the default all-zeroes, which means that all traffic
> > classes should use the same MTU as the port itself.
>
> Don't all the driver patches make you wanna turn this into an opt-in?
Presumably you're thinking of a way through which the caller of
ndo_setup_tc(TC_SETUP_QDISC_TAPRIO, struct tc_taprio_qopt_offload *)
knows whether the driver took the new max_sdu field into consideration,
and not just accepted it blindly?
I'm not exactly up to date with all the techniques which can achieve
that without changes in drivers, and I haven't noticed other qdisc
offloads doing it either... but this would be a great trick to learn for
sure. Do you have any idea?
> What are the chances we'll catch all drivers missing the validation
> in review?
Not that slim I think, they are all identifiable if you search for
TC_SETUP_QDISC_TAPRIO.
On Tue, 27 Sep 2022 00:50:49 +0300 Vladimir Oltean wrote:
> > Don't all the driver patches make you wanna turn this into an opt-in?
>
> Presumably you're thinking of a way through which the caller of
> ndo_setup_tc(TC_SETUP_QDISC_TAPRIO, struct tc_taprio_qopt_offload *)
> knows whether the driver took the new max_sdu field into consideration,
> and not just accepted it blindly?
>
> I'm not exactly up to date with all the techniques which can achieve
> that without changes in drivers, and I haven't noticed other qdisc
> offloads doing it either... but this would be a great trick to learn for
> sure. Do you have any idea?
I usually put a capability field into the ops themselves. But since tc
offloads don't have real ops (heh) we need to do the command callback
thing. This is my knee-jerk coding of something:
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 9f42fc871c3b..2d043def76d8 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -960,6 +960,11 @@ enum tc_setup_type {
TC_SETUP_QDISC_FIFO,
TC_SETUP_QDISC_HTB,
TC_SETUP_ACT,
+ TC_QUERY_CAPS,
+};
+
+struct tc_query_caps {
+ u32 cmd;
};
/* These structures hold the attributes of bpf state that are being passed
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2ff80cd04c5c..2416151a23db 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -155,6 +155,12 @@ struct tc_etf_qopt_offload {
s32 queue;
};
+struct tc_taprio_drv_caps {
+ struct tc_query_caps base;
+
+ bool accept_max_sdu;
+};
+
struct tc_taprio_sched_entry {
u8 command; /* TC_TAPRIO_CMD_* */
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 136ae21ebce9..68302ee33937 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -1219,6 +1219,7 @@ static int taprio_enable_offload(struct net_device *dev,
struct sched_gate_list *sched,
struct netlink_ext_ack *extack)
{
+ struct tc_taprio_drv_caps caps = { { .cmd = TC_SETUP_QDISC_TAPRIO, }, };
const struct net_device_ops *ops = dev->netdev_ops;
struct tc_taprio_qopt_offload *offload;
int err = 0;
@@ -1229,6 +1230,12 @@ static int taprio_enable_offload(struct net_device *dev,
return -EOPNOTSUPP;
}
+ ops->ndo_setup_tc(dev, TC_QUERY_CAPS, &caps);
+ if (!caps.accept_max_sdu && taprio_is_max_sdu_used(...)) {
+ NL_SET_ERR_MSG(extack, "nope.");
+ return -EOPNOTSUPP;
+ }
+
offload = taprio_offload_alloc(sched->num_entries);
if (!offload) {
NL_SET_ERR_MSG(extack,
> > What are the chances we'll catch all drivers missing the validation
> > in review?
>
> Not that slim I think, they are all identifiable if you search for
> TC_SETUP_QDISC_TAPRIO.
Right, but that's what's in the tree _now_. Experience teaches that
people may have out of tree code which implements TAPRIO and may send
it for upstream review without so much as testing it against net-next :(
As time passes and our memories fade the chances we'd catch such code
when posted upstream go down, perhaps from high to medium but still,
the explicit opt-in is more foolproof.
On Mon, Sep 26, 2022 at 04:29:34PM -0700, Jakub Kicinski wrote:
> I usually put a capability field into the ops themselves.
Do you also have an example for the 'usual' manner?
> But since tc offloads don't have real ops (heh) we need to do the
> command callback thing. This is my knee-jerk coding of something:
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 9f42fc871c3b..2d043def76d8 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -960,6 +960,11 @@ enum tc_setup_type {
> TC_SETUP_QDISC_FIFO,
> TC_SETUP_QDISC_HTB,
> TC_SETUP_ACT,
> + TC_QUERY_CAPS,
> +};
> +
> +struct tc_query_caps {
> + u32 cmd;
actually s/u32/enum tc_setup_type/
inception....
> };
> Right, but that's what's in the tree _now_. Experience teaches that
> people may have out of tree code which implements TAPRIO and may send
> it for upstream review without so much as testing it against net-next :(
> As time passes and our memories fade the chances we'd catch such code
> when posted upstream go down, perhaps from high to medium but still,
> the explicit opt-in is more foolproof.
You also need to see the flip side. You're making code more self-maintainable
by adding bureaucracy to the run time itself. Whereas things could have
been sorted out between the qdisc and the driver in just one ndo_setup_tc()
call via the straightforward approach (every driver rejects what it
doesn't like), now you need two calls for the normal case when the
driver will accept a valid configuration.
I get the point and I think this probably won't make a big difference
for a slow path like qdisc offload (at least it won't for me), but from
an API perspective, once the mechanism goes in, it will become quite
ossified, so it's best to ask some questions about it now.
Like for example you're funneling the caps through ndo_setup_tc(), which
has these comments in its description:
* int (*ndo_setup_tc)(struct net_device *dev, enum tc_setup_type type,
* void *type_data);
* Called to setup any 'tc' scheduler, classifier or action on @dev.
* This is always called from the stack with the rtnl lock held and netif
* tx queues stopped. This allows the netdevice to perform queue
* management safely.
Do we need to offer guarantees of rtnl lock and stopped TX queues to a
function which just queries capabilities (and likely doesn't need them),
or would it be better to devise a new ndo? Generally, when you have a
separate method to query caps vs to actually do the work, different
calling contexts are the justification for doing that, as opposed
to piggy-backing the caps that the driver acted upon through the same
struct tc_taprio_qopt_offload.
On Mon, Sep 26, 2022 at 01:38:29PM -0700, Jakub Kicinski wrote:
> On Fri, 23 Sep 2022 19:32:59 +0300 Vladimir Oltean wrote:
> > + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) {
> > + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing");
>
> NL_SET_ERR_ATTR_MISS() ?
>
> > + return -EINVAL;
> > + }
> > +
> > + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]);
> > + if (tc >= TC_QOPT_MAX_QUEUE) {
> > + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range");
>
> NLA_POLICY_MAX()
>
> Are you not using those on purpose? :(
I don't exactly see it as being super user friendly to leave it to the
policy validator (or to use NL_SET_ERR_ATTR_MISS()) because all that
will be reported back to user space will be the offset to the original
attribute in the nlmsghdr, which is pretty hard to retrieve and
re-interpret (at least in the iproute2 tc source code, I can't seem to
find a way to stringify it or something like that). For the NLA_POLICY_MAX(),
all I'll get now is an uninformative "Error: integer out of range."
What integer? What range?
I don't understand what is the gain of removing extack message strings
and just pointing to the netlink attribute via NLMSGERR_ATTR_OFFS? Could
I at least use the NL_SET_ERR_ATTR_MISS() helper *and* set a custom message?
That's for the missing nlattr. Regarding the range checking in the
policy, I'd like a custom message there as well, but the NLA_POLICY_MAX()
doesn't provide one. However, I see that struct nla_policy has a const
char *reject_message for NLA_REJECT types. Would it be an abuse to move
this outside of the union and allow U32 policies and such to also
provide it?
On Tue, 27 Sep 2022 18:09:21 +0300 Vladimir Oltean wrote:
> On Mon, Sep 26, 2022 at 01:38:29PM -0700, Jakub Kicinski wrote:
> > On Fri, 23 Sep 2022 19:32:59 +0300 Vladimir Oltean wrote:
> > > + if (!tb[TCA_TAPRIO_TC_ENTRY_INDEX]) {
> > > + NL_SET_ERR_MSG_MOD(extack, "TC entry index missing");
> >
> > NL_SET_ERR_ATTR_MISS() ?
> >
> > > + return -EINVAL;
> > > + }
> > > +
> > > + tc = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_INDEX]);
> > > + if (tc >= TC_QOPT_MAX_QUEUE) {
> > > + NL_SET_ERR_MSG_MOD(extack, "TC entry index out of range");
> >
> > NLA_POLICY_MAX()
> >
> > Are you not using those on purpose? :(
>
> I don't exactly see it as being super user friendly to leave it to the
> policy validator (or to use NL_SET_ERR_ATTR_MISS()) because all that
> will be reported back to user space will be the offset to the original
> attribute in the nlmsghdr, which is pretty hard to retrieve and
> re-interpret (at least in the iproute2 tc source code, I can't seem to
> find a way to stringify it or something like that). For the NLA_POLICY_MAX(),
> all I'll get now is an uninformative "Error: integer out of range."
> What integer? What range?
I know, that's what I expected you'd say :(
You'd need a reverse parser which is a PITA to do unless you have
clearly specified bindings.
> I don't understand what is the gain of removing extack message strings
> and just pointing to the netlink attribute via NLMSGERR_ATTR_OFFS? Could
> I at least use the NL_SET_ERR_ATTR_MISS() helper *and* set a custom message?
> That's for the missing nlattr. Regarding the range checking in the
> policy, I'd like a custom message there as well, but the NLA_POLICY_MAX()
> doesn't provide one. However, I see that struct nla_policy has a const
> char *reject_message for NLA_REJECT types. Would it be an abuse to move
> this outside of the union and allow U32 policies and such to also
> provide it?
I'd rather you kept the code as is than make precedent for adding both
string and machine readable. If we do that people will try to stay on
the safe side and always add both.
The machine readable format carries all the information you need.
It's just the user space is not clever enough to read it which is,
well, solvable.
On Tue, Sep 27, 2022 at 11:27:10AM -0700, Jakub Kicinski wrote:
> I know, that's what I expected you'd say :(
> You'd need a reverse parser which is a PITA to do unless you have
> clearly specified bindings.
I think you're underestimating the problem, it's worse than PITA. My A
still hurts and yet I couldn't find any way in which reverse parsing the
bad netlink attribute is in any way practical in iproute2, other than
doing it to prove a point that it's possible.
> I'd rather you kept the code as is than make precedent for adding both
> string and machine readable. If we do that people will try to stay on
> the safe side and always add both.
>
> The machine readable format carries all the information you need.
Nope, the question "What range?" still isn't answered via the machine
readable format. Just "What integer?".
> It's just the user space is not clever enough to read it which is,
> well, solvable.
You should come work at NXP, we love people who keep a healthy dose of
unnecessary complexity in things :)
Sometimes, "not clever enough" is just fine.
On Tue, 27 Sep 2022 21:23:19 +0000 Vladimir Oltean wrote:
> On Tue, Sep 27, 2022 at 11:27:10AM -0700, Jakub Kicinski wrote:
> > I know, that's what I expected you'd say :(
> > You'd need a reverse parser which is a PITA to do unless you have
> > clearly specified bindings.
>
> I think you're underestimating the problem, it's worse than PITA. My A
> still hurts and yet I couldn't find any way in which reverse parsing the
> bad netlink attribute is in any way practical in iproute2, other than
> doing it to prove a point that it's possible.
Yup, iproute2 does not have policy tables, so it's hard.
Once you have tables like this:
https://github.com/kuba-moo/ynl/blob/main/tools/net/ynl/generated/ethtool-user.c#L22
all linking up the types, it's fairly easy to reverse parse:
https://github.com/kuba-moo/ynl/blob/main/tools/net/ynl/lib/ynl.c#L75
Admittedly I haven't added parsing of the bounds yet so it'd just say
"invalid argument .bla.something" not "argument out of range
.bla.something (is: X, range N-M)" but that's just typing.
> > I'd rather you kept the code as is than make precedent for adding both
> > string and machine readable. If we do that people will try to stay on
> > the safe side and always add both.
> >
> > The machine readable format carries all the information you need.
>
> Nope, the question "What range?" still isn't answered via the machine
> readable format. Just "What integer?".
Hm, doesn't NLMSGERR_ATTR_POLICY contain the bounds?
> > It's just the user space is not clever enough to read it which is,
> > well, solvable.
>
> You should come work at NXP, we love people who keep a healthy dose of
> unnecessary complexity in things :)
Ha! :D
> Sometimes, "not clever enough" is just fine.
Yup, go ahead with just the strings. "We'll get there" for the machine
readable parsing, hopefully, once my YAML descriptions come...