The last RFC in August 2022 contained a proposal for the UAPI of both
TSN standards which together form Frame Preemption (802.1Q and 802.3):
https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
It wasn't clear at the time whether the 802.1Q portion of Frame Preemption
should be exposed via the tc qdisc (mqprio, taprio) or via some other
layer (perhaps also ethtool like the 802.3 portion).
So the 802.3 portion got submitted separately and finally was accepted:
https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/
leaving the only remaining question: how do we expose the 802.1Q bits?
This series proposes that we use the Qdisc layer, through separate
(albeit very similar) UAPI in mqprio and taprio, and that both these
Qdiscs pass the information down to the offloading device driver through
the common mqprio offload structure (which taprio also passes).
Implementations are provided for the NXP LS1028A on-board Ethernet
(enetc, felix).
Some of these patches arguably belong in separate series, which would
leave only patches 09/12 - 12/12 here for ease of review. That may be
true, but there does not seem to be enough time to wait for the
prerequisite cleanup to be merged first, so here they are all together.
Vladimir Oltean (12):
net: enetc: rename "mqprio" to "qopt"
net: mscc: ocelot: add support for mqprio offload
net: dsa: felix: act upon the mqprio qopt in taprio offload
net: ethtool: fix __ethtool_dev_mm_supported() implementation
net: ethtool: create and export ethtool_dev_mm_supported()
net/sched: mqprio: simplify handling of nlattr portion of TCA_OPTIONS
net/sched: mqprio: add extack to mqprio_parse_nlattr()
net/sched: mqprio: add an extack message to mqprio_parse_opt()
net/sched: mqprio: allow per-TC user input of FP adminStatus
net/sched: taprio: allow per-TC user input of FP adminStatus
net: mscc: ocelot: add support for preemptible traffic classes
net: enetc: add support for preemptible traffic classes
drivers/net/dsa/ocelot/felix_vsc9959.c | 44 ++++-
drivers/net/ethernet/freescale/enetc/enetc.c | 31 ++-
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
.../net/ethernet/freescale/enetc/enetc_hw.h | 4 +
drivers/net/ethernet/mscc/ocelot.c | 51 +++++
drivers/net/ethernet/mscc/ocelot.h | 2 +
drivers/net/ethernet/mscc/ocelot_mm.c | 56 ++++++
include/linux/ethtool_netlink.h | 6 +
include/net/pkt_sched.h | 1 +
include/soc/mscc/ocelot.h | 6 +
include/uapi/linux/pkt_sched.h | 17 ++
net/ethtool/mm.c | 24 ++-
net/sched/sch_mqprio.c | 182 +++++++++++++++---
net/sched/sch_mqprio_lib.c | 14 ++
net/sched/sch_mqprio_lib.h | 2 +
net/sched/sch_taprio.c | 65 +++++--
16 files changed, 459 insertions(+), 47 deletions(-)
--
2.34.1
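To gain access to the larger encapsulating structure which has the type
tc_mqprio_qopt_offload, rename the local variable that points to struct
tc_mqprio_qopt from "mqprio" to "qopt", and let "mqprio" point to the
encapsulating structure instead.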
Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 2fc712b24d12..e0207b01ddd6 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -2644,12 +2644,13 @@ static void enetc_reset_tc_mqprio(struct net_device *ndev)
int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
{
+ struct tc_mqprio_qopt_offload *mqprio = type_data;
struct enetc_ndev_priv *priv = netdev_priv(ndev);
- struct tc_mqprio_qopt *mqprio = type_data;
+ struct tc_mqprio_qopt *qopt = &mqprio->qopt;
struct enetc_hw *hw = &priv->si->hw;
int num_stack_tx_queues = 0;
- u8 num_tc = mqprio->num_tc;
struct enetc_bdr *tx_ring;
+ u8 num_tc = qopt->num_tc;
int offset, count;
int err, tc, q;
@@ -2663,8 +2664,8 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
return err;
for (tc = 0; tc < num_tc; tc++) {
- offset = mqprio->offset[tc];
- count = mqprio->count[tc];
+ offset = qopt->offset[tc];
+ count = qopt->count[tc];
num_stack_tx_queues += count;
err = netdev_set_tc_queue(ndev, tc, count, offset);
--
2.34.1
This doesn't apply anything to hardware and in general doesn't do
anything that the software variant doesn't do, except for checking that
there isn't more than 1 TXQ per TC (TXQs for a DSA switch are a dubious
concept anyway). The reason we add this is to be able to parse one more
field added to struct tc_mqprio_qopt_offload, namely preemptible_tcs.
Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/dsa/ocelot/felix_vsc9959.c | 9 +++++
drivers/net/ethernet/mscc/ocelot.c | 48 ++++++++++++++++++++++++++
include/soc/mscc/ocelot.h | 4 +++
3 files changed, 61 insertions(+)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 354aa3dbfde7..3df71444dde1 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1612,6 +1612,13 @@ static int vsc9959_qos_port_cbs_set(struct dsa_switch *ds, int port,
static int vsc9959_qos_query_caps(struct tc_query_caps_base *base)
{
switch (base->type) {
+ case TC_SETUP_QDISC_MQPRIO: {
+ struct tc_mqprio_caps *caps = base->caps;
+
+ caps->validate_queue_counts = true;
+
+ return 0;
+ }
case TC_SETUP_QDISC_TAPRIO: {
struct tc_taprio_caps *caps = base->caps;
@@ -1635,6 +1642,8 @@ static int vsc9959_port_setup_tc(struct dsa_switch *ds, int port,
return vsc9959_qos_query_caps(type_data);
case TC_SETUP_QDISC_TAPRIO:
return vsc9959_qos_port_tas_set(ocelot, port, type_data);
+ case TC_SETUP_QDISC_MQPRIO:
+ return ocelot_port_mqprio(ocelot, port, type_data);
case TC_SETUP_QDISC_CBS:
return vsc9959_qos_port_cbs_set(ds, port, type_data);
default:
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 08acb7b89086..20557a9c46e6 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -7,6 +7,7 @@
#include <linux/dsa/ocelot.h>
#include <linux/if_bridge.h>
#include <linux/iopoll.h>
+#include <net/pkt_sched.h>
#include <soc/mscc/ocelot_vcap.h>
#include "ocelot.h"
#include "ocelot_vcap.h"
@@ -2602,6 +2603,53 @@ void ocelot_port_mirror_del(struct ocelot *ocelot, int from, bool ingress)
}
EXPORT_SYMBOL_GPL(ocelot_port_mirror_del);
+static void ocelot_port_reset_mqprio(struct ocelot *ocelot, int port)
+{
+ struct net_device *dev = ocelot->ops->port_to_netdev(ocelot, port);
+
+ netdev_reset_tc(dev);
+}
+
+int ocelot_port_mqprio(struct ocelot *ocelot, int port,
+ struct tc_mqprio_qopt_offload *mqprio)
+{
+ struct net_device *dev = ocelot->ops->port_to_netdev(ocelot, port);
+ struct tc_mqprio_qopt *qopt = &mqprio->qopt;
+ int num_tc = qopt->num_tc;
+ int tc, err;
+
+ if (!num_tc) {
+ ocelot_port_reset_mqprio(ocelot, port);
+ return 0;
+ }
+
+ err = netdev_set_num_tc(dev, num_tc);
+ if (err)
+ return err;
+
+ for (tc = 0; tc < num_tc; tc++) {
+ if (qopt->count[tc] != 1) {
+ netdev_err(dev, "Only one TXQ per TC supported\n");
+ return -EINVAL;
+ }
+
+ err = netdev_set_tc_queue(dev, tc, 1, qopt->offset[tc]);
+ if (err)
+ goto err_reset_tc;
+ }
+
+ err = netif_set_real_num_tx_queues(dev, num_tc);
+ if (err)
+ goto err_reset_tc;
+
+ return 0;
+
+err_reset_tc:
+ ocelot_port_reset_mqprio(ocelot, port);
+ return err;
+}
+EXPORT_SYMBOL_GPL(ocelot_port_mqprio);
+
void ocelot_init_port(struct ocelot *ocelot, int port)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 2080879e4134..27ff770a6c53 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -11,6 +11,8 @@
#include <linux/regmap.h>
#include <net/dsa.h>
+struct tc_mqprio_qopt_offload;
+
/* Port Group IDs (PGID) are masks of destination ports.
*
* For L2 forwarding, the switch performs 3 lookups in the PGID table for each
@@ -1145,6 +1147,8 @@ int ocelot_port_set_mm(struct ocelot *ocelot, int port,
struct netlink_ext_ack *extack);
int ocelot_port_get_mm(struct ocelot *ocelot, int port,
struct ethtool_mm_state *state);
+int ocelot_port_mqprio(struct ocelot *ocelot, int port,
+ struct tc_mqprio_qopt_offload *mqprio);
#if IS_ENABLED(CONFIG_BRIDGE_MRP)
int ocelot_mrp_add(struct ocelot *ocelot, int port,
--
2.34.1
The mqprio queue configuration can appear either through
TC_SETUP_QDISC_MQPRIO or through TC_SETUP_QDISC_TAPRIO. Make sure both
are treated in the same way.
Code does nothing new for now.
Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/dsa/ocelot/felix_vsc9959.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 3df71444dde1..81fcdccacd8b 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1424,6 +1424,7 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
mutex_lock(&ocelot->tas_lock);
if (!taprio->enable) {
+ ocelot_port_mqprio(ocelot, port, &taprio->mqprio);
ocelot_rmw_rix(ocelot, 0, QSYS_TAG_CONFIG_ENABLE,
QSYS_TAG_CONFIG, port);
@@ -1436,15 +1437,19 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
return 0;
}
+ ret = ocelot_port_mqprio(ocelot, port, &taprio->mqprio);
+ if (ret)
+ goto err_unlock;
+
if (taprio->cycle_time > NSEC_PER_SEC ||
taprio->cycle_time_extension >= NSEC_PER_SEC) {
ret = -EINVAL;
- goto err;
+ goto err_reset_tc;
}
if (taprio->num_entries > VSC9959_TAS_GCL_ENTRY_MAX) {
ret = -ERANGE;
- goto err;
+ goto err_reset_tc;
}
/* Enable guard band. The switch will schedule frames without taking
@@ -1468,7 +1473,7 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
val = ocelot_read(ocelot, QSYS_PARAM_STATUS_REG_8);
if (val & QSYS_PARAM_STATUS_REG_8_CONFIG_PENDING) {
ret = -EBUSY;
- goto err;
+ goto err_reset_tc;
}
ocelot_rmw_rix(ocelot,
@@ -1503,12 +1508,19 @@ static int vsc9959_qos_port_tas_set(struct ocelot *ocelot, int port,
!(val & QSYS_TAS_PARAM_CFG_CTRL_CONFIG_CHANGE),
10, 100000);
if (ret)
- goto err;
+ goto err_reset_tc;
ocelot_port->taprio = taprio_offload_get(taprio);
vsc9959_tas_guard_bands_update(ocelot, port);
-err:
+ mutex_unlock(&ocelot->tas_lock);
+
+ return 0;
+
+err_reset_tc:
+ taprio->mqprio.qopt.num_tc = 0;
+ ocelot_port_mqprio(ocelot, port, &taprio->mqprio);
+err_unlock:
mutex_unlock(&ocelot->tas_lock);
return ret;
--
2.34.1
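The MAC Merge layer is supported when ops->get_mm() returns 0.
The implementation was changed during review, and in this process, a bug
was introduced: the function returned "!!ret", which is true when
ops->get_mm() returns an error rather than when it returns 0. Return
"!ret" instead.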
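The inverted check can be looked at in isolation. What follows is a
minimal user-space sketch, not the kernel code: the sketch_* types and
callbacks are hypothetical stand-ins for struct ethtool_ops and struct
ethtool_mm_state, and the initial value of "ret" is an assumption, since
it is not visible in the hunk below.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct ethtool_mm_state / struct ethtool_ops. */
struct sketch_mm_state {
	bool pmac_enabled;
};

struct sketch_mm_ops {
	int (*get_mm)(struct sketch_mm_state *state);
};

/* A driver with a MAC Merge layer: get_mm() succeeds and returns 0. */
static int sketch_get_mm_ok(struct sketch_mm_state *state)
{
	state->pmac_enabled = false;
	return 0;
}

/* A driver without one: get_mm() fails with a negative errno. */
static int sketch_get_mm_unsupported(struct sketch_mm_state *state)
{
	(void)state;
	return -EOPNOTSUPP;
}

/* Buggy variant: "!!ret" is true exactly when get_mm() did not return 0. */
static bool sketch_mm_supported_buggy(const struct sketch_mm_ops *ops)
{
	struct sketch_mm_state state = { 0 };
	int ret = -EOPNOTSUPP; /* assumed default when get_mm() is absent */

	if (ops && ops->get_mm)
		ret = ops->get_mm(&state);

	return !!ret;
}

/* Fixed variant: supported only when get_mm() returned 0. */
static bool sketch_mm_supported(const struct sketch_mm_ops *ops)
{
	struct sketch_mm_state state = { 0 };
	int ret = -EOPNOTSUPP; /* assumed default when get_mm() is absent */

	if (ops && ops->get_mm)
		ret = ops->get_mm(&state);

	return !ret;
}
```

With these stand-ins, the buggy variant reports support for the
"unsupported" driver and denies it for the working one, which is the
opposite of the first sentence of this commit message.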
Link: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
Fixes: 04692c9020b7 ("net: ethtool: netlink: retrieve stats from multiple sources (eMAC, pMAC)")
Signed-off-by: Vladimir Oltean <[email protected]>
---
net/ethtool/mm.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c
index e612856eed8c..fce3cc2734f9 100644
--- a/net/ethtool/mm.c
+++ b/net/ethtool/mm.c
@@ -247,5 +247,5 @@ bool __ethtool_dev_mm_supported(struct net_device *dev)
if (ops && ops->get_mm)
ret = ops->get_mm(dev, &state);
- return !!ret;
+ return !ret;
}
--
2.34.1
Create a wrapper over __ethtool_dev_mm_supported() which also calls
ethnl_ops_begin() and ethnl_ops_complete(). It can be used by other code
layers, such as tc, to make sure that preemptible TCs are supported
(this is true if an underlying MAC Merge layer exists).
Signed-off-by: Vladimir Oltean <[email protected]>
---
include/linux/ethtool_netlink.h | 6 ++++++
net/ethtool/mm.c | 22 ++++++++++++++++++++++
net/sched/sch_mqprio.c | 1 +
3 files changed, 29 insertions(+)
diff --git a/include/linux/ethtool_netlink.h b/include/linux/ethtool_netlink.h
index 17003b385756..fae0dfb9a9c8 100644
--- a/include/linux/ethtool_netlink.h
+++ b/include/linux/ethtool_netlink.h
@@ -39,6 +39,7 @@ void ethtool_aggregate_pause_stats(struct net_device *dev,
struct ethtool_pause_stats *pause_stats);
void ethtool_aggregate_rmon_stats(struct net_device *dev,
struct ethtool_rmon_stats *rmon_stats);
+bool ethtool_dev_mm_supported(struct net_device *dev);
#else
static inline int ethnl_cable_test_alloc(struct phy_device *phydev, u8 cmd)
@@ -112,5 +113,10 @@ ethtool_aggregate_rmon_stats(struct net_device *dev,
{
}
+static inline bool ethtool_dev_mm_supported(struct net_device *dev)
+{
+ return false;
+}
+
#endif /* IS_ENABLED(CONFIG_ETHTOOL_NETLINK) */
#endif /* _LINUX_ETHTOOL_NETLINK_H_ */
diff --git a/net/ethtool/mm.c b/net/ethtool/mm.c
index fce3cc2734f9..87d9682efadd 100644
--- a/net/ethtool/mm.c
+++ b/net/ethtool/mm.c
@@ -249,3 +249,25 @@ bool __ethtool_dev_mm_supported(struct net_device *dev)
return !ret;
}
+
+bool ethtool_dev_mm_supported(struct net_device *dev)
+{
+ const struct ethtool_ops *ops = dev->ethtool_ops;
+ bool supported;
+ int ret;
+
+ ASSERT_RTNL();
+
+ if (!ops)
+ return false;
+
+ ret = ethnl_ops_begin(dev);
+ if (ret < 0)
+ return false;
+
+ supported = __ethtool_dev_mm_supported(dev);
+
+ ethnl_ops_complete(dev);
+
+ return supported;
+}
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 48ed87b91086..f0232783ced7 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -5,6 +5,7 @@
* Copyright (c) 2010 John Fastabend <[email protected]>
*/
+#include <linux/ethtool_netlink.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/kernel.h>
--
2.34.1
Netlink attribute parsing in mqprio is a minesweeper game, with many
options having the possibility of being passed incorrectly and the user
being none the wiser.
Try to make errors less sour by giving user space some information
regarding what went wrong.
Signed-off-by: Vladimir Oltean <[email protected]>
---
net/sched/sch_mqprio.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index cbb9cd2c3eff..18eda5fade81 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -151,7 +151,8 @@ static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
* TCA_OPTIONS, which are appended right after struct tc_mqprio_qopt.
*/
static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
- struct nlattr *opt)
+ struct nlattr *opt,
+ struct netlink_ext_ack *extack)
{
struct nlattr *nlattr_opt = nla_data(opt) + NLA_ALIGN(sizeof(*qopt));
int nlattr_opt_len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
@@ -168,8 +169,11 @@ static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
return err;
}
- if (!qopt->hw)
+ if (!qopt->hw) {
+ NL_SET_ERR_MSG(extack,
+ "mqprio TCA_OPTIONS can only contain netlink attributes in hardware mode");
return -EINVAL;
+ }
if (tb[TCA_MQPRIO_MODE]) {
priv->flags |= TC_MQPRIO_F_MODE;
@@ -182,13 +186,19 @@ static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
}
if (tb[TCA_MQPRIO_MIN_RATE64]) {
- if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE)
+ if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) {
+ NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_MIN_RATE64],
+ "min_rate accepted only when shaper is in bw_rlimit mode");
return -EINVAL;
+ }
i = 0;
nla_for_each_nested(attr, tb[TCA_MQPRIO_MIN_RATE64],
rem) {
- if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64)
+ if (nla_type(attr) != TCA_MQPRIO_MIN_RATE64) {
+ NL_SET_ERR_MSG_ATTR(extack, attr,
+ "Attribute type expected to be TCA_MQPRIO_MIN_RATE64");
return -EINVAL;
+ }
if (i >= qopt->num_tc)
break;
priv->min_rate[i] = *(u64 *)nla_data(attr);
@@ -198,13 +208,19 @@ static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
}
if (tb[TCA_MQPRIO_MAX_RATE64]) {
- if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE)
+ if (priv->shaper != TC_MQPRIO_SHAPER_BW_RATE) {
+ NL_SET_ERR_MSG_ATTR(extack, tb[TCA_MQPRIO_MAX_RATE64],
+ "max_rate accepted only when shaper is in bw_rlimit mode");
return -EINVAL;
+ }
i = 0;
nla_for_each_nested(attr, tb[TCA_MQPRIO_MAX_RATE64],
rem) {
- if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64)
+ if (nla_type(attr) != TCA_MQPRIO_MAX_RATE64) {
+ NL_SET_ERR_MSG_ATTR(extack, attr,
+ "Attribute type expected to be TCA_MQPRIO_MAX_RATE64");
return -EINVAL;
+ }
if (i >= qopt->num_tc)
break;
priv->max_rate[i] = *(u64 *)nla_data(attr);
@@ -253,7 +269,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
if (len > 0) {
- err = mqprio_parse_nlattr(sch, qopt, opt);
+ err = mqprio_parse_nlattr(sch, qopt, opt, extack);
if (err)
return err;
}
--
2.34.1
In commit 4e8b86c06269 ("mqprio: Introduce new hardware offload mode and
shaper in mqprio"), the TCA_OPTIONS format of mqprio was extended to
contain a fixed portion (of size NLA_ALIGN(sizeof struct tc_mqprio_qopt))
and a variable portion of other nlattrs (in the TCA_MQPRIO_* type space)
following immediately afterwards.
In commit feb2cf3dcfb9 ("net/sched: mqprio: refactor nlattr parsing to a
separate function"), we've moved the nlattr handling to a smaller
function, but yet, a small parse_attr() still remains, and the larger
mqprio_parse_nlattr() still does not have access to the beginning, and
the length, of the TCA_OPTIONS region containing these other nlattrs.
In a future change, the mqprio qdisc will need to iterate through this
nlattr region to discover other attributes, so eliminate parse_attr()
and add 2 variables in mqprio_parse_nlattr() which hold the beginning
and the length of the nlattr range.
We avoid the need to memset when nlattr_opt_len has insufficient length
by pre-initializing the table "tb".
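To make the layout concrete, here is a minimal user-space sketch of how
the start and length of the trailing nlattr region follow from the size
of the fixed portion. It is an illustration under assumptions, not the
kernel code: the sketch_* names are hypothetical, and the 4-byte
alignment and header length are restated locally rather than taken from
the netlink headers.

```c
#include <stddef.h>

#define SKETCH_NLA_ALIGNTO 4 /* netlink attributes are 4-byte aligned */
#define SKETCH_NLA_HDRLEN  4 /* aligned size of an attribute header */

/* Hypothetical description of the nlattr region that follows the
 * fixed-size structure inside the TCA_OPTIONS payload.
 */
struct sketch_nlattr_region {
	const unsigned char *start; /* NULL when no attribute header fits */
	int len;                    /* may be zero or negative */
};

/* Round len up to the next multiple of the netlink alignment. */
static size_t sketch_nla_align(size_t len)
{
	return (len + SKETCH_NLA_ALIGNTO - 1) &
	       ~(size_t)(SKETCH_NLA_ALIGNTO - 1);
}

/* Locate the attributes appended after a fixed portion of fixed_size
 * bytes in a payload of payload_len bytes.
 */
static struct sketch_nlattr_region
sketch_options_tail(const unsigned char *payload, int payload_len,
		    size_t fixed_size)
{
	struct sketch_nlattr_region r = { NULL, 0 };

	r.len = payload_len - (int)sketch_nla_align(fixed_size);
	/* Only expose a start pointer if at least one header fits. */
	if (r.len >= SKETCH_NLA_HDRLEN)
		r.start = payload + sketch_nla_align(fixed_size);

	return r;
}
```

For example, with a fixed portion of 82 bytes, the attributes would
start at offset 84, and a payload of exactly 84 bytes would carry no
attributes at all. In that case the table "tb" is simply left at its
zero-initialized state.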
Signed-off-by: Vladimir Oltean <[email protected]>
---
net/sched/sch_mqprio.c | 32 +++++++++++++-------------------
1 file changed, 13 insertions(+), 19 deletions(-)
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index f0232783ced7..cbb9cd2c3eff 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -147,32 +147,26 @@ static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
[TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED },
};
-static int parse_attr(struct nlattr *tb[], int maxtype, struct nlattr *nla,
- const struct nla_policy *policy, int len)
-{
- int nested_len = nla_len(nla) - NLA_ALIGN(len);
-
- if (nested_len >= nla_attr_size(0))
- return nla_parse_deprecated(tb, maxtype,
- nla_data(nla) + NLA_ALIGN(len),
- nested_len, policy, NULL);
-
- memset(tb, 0, sizeof(struct nlattr *) * (maxtype + 1));
- return 0;
-}
-
+/* Parse the other netlink attributes that represent the payload of
+ * TCA_OPTIONS, which are appended right after struct tc_mqprio_qopt.
+ */
static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
struct nlattr *opt)
{
+ struct nlattr *nlattr_opt = nla_data(opt) + NLA_ALIGN(sizeof(*qopt));
+ int nlattr_opt_len = nla_len(opt) - NLA_ALIGN(sizeof(*qopt));
struct mqprio_sched *priv = qdisc_priv(sch);
- struct nlattr *tb[TCA_MQPRIO_MAX + 1];
+ struct nlattr *tb[TCA_MQPRIO_MAX + 1] = {};
struct nlattr *attr;
int i, rem, err;
- err = parse_attr(tb, TCA_MQPRIO_MAX, opt, mqprio_policy,
- sizeof(*qopt));
- if (err < 0)
- return err;
+ if (nlattr_opt_len >= nla_attr_size(0)) {
+ err = nla_parse_deprecated(tb, TCA_MQPRIO_MAX, nlattr_opt,
+ nlattr_opt_len, mqprio_policy,
+ NULL);
+ if (err < 0)
+ return err;
+ }
if (!qopt->hw)
return -EINVAL;
--
2.34.1
Ferenc reports that a combination of poor iproute2 defaults and obscure
cases where the kernel returns -EINVAL make it difficult to understand
what is wrong with this command:
$ ip link add veth0 numtxqueues 8 numrxqueues 8 type veth peer name veth1
$ tc qdisc add dev veth0 root mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
RTNETLINK answers: Invalid argument
Hopefully with this patch, the cause is clearer:
Error: Device does not support hardware offload.
This was rejected because iproute2 defaults to "hw 1" if the option is
not specified.
Link: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#25215636
Signed-off-by: Vladimir Oltean <[email protected]>
---
net/sched/sch_mqprio.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 18eda5fade81..52cfc0ec2e23 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -134,8 +134,11 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt,
/* If ndo_setup_tc is not present then hardware doesn't support offload
* and we should return an error.
*/
- if (qopt->hw && !dev->netdev_ops->ndo_setup_tc)
+ if (qopt->hw && !dev->netdev_ops->ndo_setup_tc) {
+ NL_SET_ERR_MSG(extack,
+ "Device does not support hardware offload");
return -EINVAL;
+ }
return 0;
}
--
2.34.1
IEEE 802.1Q-2018 clause 6.7.2 Frame preemption specifies that each
packet priority can be assigned to a "frame preemption status" value of
either "express" or "preemptible". Express priorities are transmitted by
the local device through the eMAC, and preemptible priorities through
the pMAC (the concepts of eMAC and pMAC come from the 802.3 MAC Merge
layer).
The FP adminStatus is defined per packet priority, but 802.1Q clause
12.30.1.1.1 framePreemptionAdminStatus also says that:
| Priorities that all map to the same traffic class should be
| constrained to use the same value of preemption status.
It is impossible to ignore the cognitive dissonance in the standard
here, because it practically means that the FP adminStatus only takes
distinct values per traffic class, even though it is defined per
priority.
I can see no valid use case which is prevented by having the kernel take
the FP adminStatus as input per traffic class (what we do here).
In addition, this also enforces the above constraint by construction.
User space network managers which wish to expose FP adminStatus per
priority are free to do so; they must only observe the prio_tc_map of
the netdev.
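As an illustration of that last point, a network manager could derive a
per-priority view from the per-TC input roughly as follows. This is a
hypothetical user-space sketch: the sketch_* names, the count of 16
priorities and 16 traffic classes, and the fallback to "express" for an
out-of-range map entry are all assumptions made for the example.

```c
#include <stdint.h>

#define SKETCH_NUM_PRIO 16 /* assumed number of packet priorities */
#define SKETCH_MAX_TC   16 /* assumed maximum number of traffic classes */

enum {
	SKETCH_FP_EXPRESS = 1,
	SKETCH_FP_PREEMPTIBLE = 2,
};

/* Expand the per-TC frame preemption status into a per-priority view,
 * using the priority to traffic class map of the netdev.
 */
static void sketch_fp_per_prio(const uint8_t prio_tc_map[SKETCH_NUM_PRIO],
			       const uint32_t fp_per_tc[SKETCH_MAX_TC],
			       uint32_t fp_per_prio[SKETCH_NUM_PRIO])
{
	int prio;

	for (prio = 0; prio < SKETCH_NUM_PRIO; prio++) {
		uint8_t tc = prio_tc_map[prio];

		/* Assumed fallback: treat an invalid mapping as express. */
		fp_per_prio[prio] = tc < SKETCH_MAX_TC ?
				    fp_per_tc[tc] : SKETCH_FP_EXPRESS;
	}
}
```

Because every priority inherits the status of its traffic class, two
priorities that map to the same TC can never end up with different
values, which is the constraint quoted from clause 12.30.1.1.1.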
The reason for configuring frame preemption as a property of the Qdisc
layer is that the information about "preemptible TCs" is closest to the
place which handles the num_tc and prio_tc_map of the netdev. Had the
UAPI lived in any other layer, it would be unclear what to do
with the FP information when num_tc collapses to 0. A key assumption is
that only mqprio/taprio change the num_tc and prio_tc_map of the netdev.
Not sure if that's a great assumption to make.
Having FP in tc-mqprio can be seen as an implementation of the use case
defined in 802.1Q Annex S.2 Preemption used in isolation. There will be
a separate implementation of FP in tc-taprio.
Signed-off-by: Vladimir Oltean <[email protected]>
---
include/net/pkt_sched.h | 1 +
include/uapi/linux/pkt_sched.h | 16 +++++
net/sched/sch_mqprio.c | 126 ++++++++++++++++++++++++++++++++-
net/sched/sch_mqprio_lib.c | 14 ++++
net/sched/sch_mqprio_lib.h | 2 +
5 files changed, 158 insertions(+), 1 deletion(-)
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 2016839991a4..23be97f542fc 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -172,6 +172,7 @@ struct tc_mqprio_qopt_offload {
u32 flags;
u64 min_rate[TC_QOPT_MAX_QUEUE];
u64 max_rate[TC_QOPT_MAX_QUEUE];
+ unsigned long preemptible_tcs;
};
struct tc_taprio_caps {
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 000eec106856..b8d29be91b62 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -719,6 +719,11 @@ enum {
#define __TC_MQPRIO_SHAPER_MAX (__TC_MQPRIO_SHAPER_MAX - 1)
+enum {
+ TC_FP_EXPRESS = 1,
+ TC_FP_PREEMPTIBLE = 2,
+};
+
struct tc_mqprio_qopt {
__u8 num_tc;
__u8 prio_tc_map[TC_QOPT_BITMASK + 1];
@@ -732,12 +737,23 @@ struct tc_mqprio_qopt {
#define TC_MQPRIO_F_MIN_RATE 0x4
#define TC_MQPRIO_F_MAX_RATE 0x8
+enum {
+ TCA_MQPRIO_TC_ENTRY_UNSPEC,
+ TCA_MQPRIO_TC_ENTRY_INDEX, /* u32 */
+ TCA_MQPRIO_TC_ENTRY_FP, /* u32 */
+
+ /* add new constants above here */
+ __TCA_MQPRIO_TC_ENTRY_CNT,
+ TCA_MQPRIO_TC_ENTRY_MAX = (__TCA_MQPRIO_TC_ENTRY_CNT - 1)
+};
+
enum {
TCA_MQPRIO_UNSPEC,
TCA_MQPRIO_MODE,
TCA_MQPRIO_SHAPER,
TCA_MQPRIO_MIN_RATE64,
TCA_MQPRIO_MAX_RATE64,
+ TCA_MQPRIO_TC_ENTRY,
__TCA_MQPRIO_MAX,
};
diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
index 52cfc0ec2e23..2db0802c2ce8 100644
--- a/net/sched/sch_mqprio.c
+++ b/net/sched/sch_mqprio.c
@@ -28,6 +28,7 @@ struct mqprio_sched {
u32 flags;
u64 min_rate[TC_QOPT_MAX_QUEUE];
u64 max_rate[TC_QOPT_MAX_QUEUE];
+ u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */
};
static int mqprio_enable_offload(struct Qdisc *sch,
@@ -61,6 +62,8 @@ static int mqprio_enable_offload(struct Qdisc *sch,
return -EINVAL;
}
+ mqprio_fp_to_offload(priv->fp, &mqprio);
+
err = dev->netdev_ops->ndo_setup_tc(dev, TC_SETUP_QDISC_MQPRIO,
&mqprio);
if (err)
@@ -143,13 +146,94 @@ static int mqprio_parse_opt(struct net_device *dev, struct tc_mqprio_qopt *qopt,
return 0;
}
+static const struct
+nla_policy mqprio_tc_entry_policy[TCA_MQPRIO_TC_ENTRY_MAX + 1] = {
+ [TCA_MQPRIO_TC_ENTRY_INDEX] = NLA_POLICY_MAX(NLA_U32,
+ TC_QOPT_MAX_QUEUE - 1),
+ [TCA_MQPRIO_TC_ENTRY_FP] = NLA_POLICY_RANGE(NLA_U32,
+ TC_FP_EXPRESS,
+ TC_FP_PREEMPTIBLE),
+};
+
static const struct nla_policy mqprio_policy[TCA_MQPRIO_MAX + 1] = {
[TCA_MQPRIO_MODE] = { .len = sizeof(u16) },
[TCA_MQPRIO_SHAPER] = { .len = sizeof(u16) },
[TCA_MQPRIO_MIN_RATE64] = { .type = NLA_NESTED },
[TCA_MQPRIO_MAX_RATE64] = { .type = NLA_NESTED },
+ [TCA_MQPRIO_TC_ENTRY] = { .type = NLA_NESTED },
};
+static int mqprio_parse_tc_entry(u32 fp[TC_QOPT_MAX_QUEUE],
+ struct nlattr *opt,
+ unsigned long *seen_tcs,
+ struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[TCA_MQPRIO_TC_ENTRY_MAX + 1] = { };
+ int err, tc;
+
+ err = nla_parse_nested(tb, TCA_MQPRIO_TC_ENTRY_MAX, opt,
+ mqprio_tc_entry_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_MQPRIO_TC_ENTRY_INDEX]) {
+ NL_SET_ERR_MSG(extack, "TC entry index missing");
+ return -EINVAL;
+ }
+
+ tc = nla_get_u32(tb[TCA_MQPRIO_TC_ENTRY_INDEX]);
+ if (*seen_tcs & BIT(tc)) {
+ NL_SET_ERR_MSG(extack, "Duplicate tc entry");
+ return -EINVAL;
+ }
+
+ *seen_tcs |= BIT(tc);
+
+ if (tb[TCA_MQPRIO_TC_ENTRY_FP])
+ fp[tc] = nla_get_u32(tb[TCA_MQPRIO_TC_ENTRY_FP]);
+
+ return 0;
+}
+
+static int mqprio_parse_tc_entries(struct Qdisc *sch, struct nlattr *nlattr_opt,
+ int nlattr_opt_len,
+ struct netlink_ext_ack *extack)
+{
+ struct mqprio_sched *priv = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
+ bool have_preemption = false;
+ unsigned long seen_tcs = 0;
+ u32 fp[TC_QOPT_MAX_QUEUE];
+ struct nlattr *n;
+ int tc, rem;
+ int err = 0;
+
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ fp[tc] = priv->fp[tc];
+
+ nla_for_each_attr(n, nlattr_opt, nlattr_opt_len, rem) {
+ if (nla_type(n) != TCA_MQPRIO_TC_ENTRY)
+ continue;
+
+ err = mqprio_parse_tc_entry(fp, n, &seen_tcs, extack);
+ if (err)
+ goto out;
+ }
+
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) {
+ priv->fp[tc] = fp[tc];
+ if (fp[tc] == TC_FP_PREEMPTIBLE)
+ have_preemption = true;
+ }
+
+ if (have_preemption && !ethtool_dev_mm_supported(dev)) {
+ NL_SET_ERR_MSG(extack, "Device does not support preemption");
+ return -EOPNOTSUPP;
+ }
+out:
+ return err;
+}
+
/* Parse the other netlink attributes that represent the payload of
* TCA_OPTIONS, which are appended right after struct tc_mqprio_qopt.
*/
@@ -232,6 +316,13 @@ static int mqprio_parse_nlattr(struct Qdisc *sch, struct tc_mqprio_qopt *qopt,
priv->flags |= TC_MQPRIO_F_MAX_RATE;
}
+ if (tb[TCA_MQPRIO_TC_ENTRY]) {
+ err = mqprio_parse_tc_entries(sch, nlattr_opt, nlattr_opt_len,
+ extack);
+ if (err)
+ return err;
+ }
+
return 0;
}
@@ -245,7 +336,7 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
int i, err = -EOPNOTSUPP;
struct tc_mqprio_qopt *qopt = NULL;
struct tc_mqprio_caps caps;
- int len;
+ int len, tc;
BUILD_BUG_ON(TC_MAX_QUEUE != TC_QOPT_MAX_QUEUE);
BUILD_BUG_ON(TC_BITMASK != TC_QOPT_BITMASK);
@@ -263,6 +354,9 @@ static int mqprio_init(struct Qdisc *sch, struct nlattr *opt,
if (!opt || nla_len(opt) < sizeof(*qopt))
return -EINVAL;
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ priv->fp[tc] = TC_FP_EXPRESS;
+
qdisc_offload_query_caps(dev, TC_SETUP_QDISC_MQPRIO,
&caps, sizeof(caps));
@@ -413,6 +507,33 @@ static int dump_rates(struct mqprio_sched *priv,
return -1;
}
+static int mqprio_dump_tc_entries(struct mqprio_sched *priv,
+ struct sk_buff *skb)
+{
+ struct nlattr *n;
+ int tc;
+
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) {
+ n = nla_nest_start(skb, TCA_MQPRIO_TC_ENTRY);
+ if (!n)
+ return -EMSGSIZE;
+
+ if (nla_put_u32(skb, TCA_MQPRIO_TC_ENTRY_INDEX, tc))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_MQPRIO_TC_ENTRY_FP, priv->fp[tc]))
+ goto nla_put_failure;
+
+ nla_nest_end(skb, n);
+ }
+
+ return 0;
+
+nla_put_failure:
+ nla_nest_cancel(skb, n);
+ return -EMSGSIZE;
+}
+
static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
{
struct net_device *dev = qdisc_dev(sch);
@@ -463,6 +584,9 @@ static int mqprio_dump(struct Qdisc *sch, struct sk_buff *skb)
(dump_rates(priv, &opt, skb) != 0))
goto nla_put_failure;
+ if (mqprio_dump_tc_entries(priv, skb))
+ goto nla_put_failure;
+
return nla_nest_end(skb, nla);
nla_put_failure:
nlmsg_trim(skb, nla);
diff --git a/net/sched/sch_mqprio_lib.c b/net/sched/sch_mqprio_lib.c
index c58a533b8ec5..83b3793c4012 100644
--- a/net/sched/sch_mqprio_lib.c
+++ b/net/sched/sch_mqprio_lib.c
@@ -114,4 +114,18 @@ void mqprio_qopt_reconstruct(struct net_device *dev, struct tc_mqprio_qopt *qopt
}
EXPORT_SYMBOL_GPL(mqprio_qopt_reconstruct);
+void mqprio_fp_to_offload(u32 fp[TC_QOPT_MAX_QUEUE],
+ struct tc_mqprio_qopt_offload *mqprio)
+{
+ unsigned long preemptible_tcs = 0;
+ int tc;
+
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ if (fp[tc] == TC_FP_PREEMPTIBLE)
+ preemptible_tcs |= BIT(tc);
+
+ mqprio->preemptible_tcs = preemptible_tcs;
+}
+EXPORT_SYMBOL_GPL(mqprio_fp_to_offload);
+
MODULE_LICENSE("GPL");
diff --git a/net/sched/sch_mqprio_lib.h b/net/sched/sch_mqprio_lib.h
index 63f725ab8761..079f597072e3 100644
--- a/net/sched/sch_mqprio_lib.h
+++ b/net/sched/sch_mqprio_lib.h
@@ -14,5 +14,7 @@ int mqprio_validate_qopt(struct net_device *dev, struct tc_mqprio_qopt *qopt,
struct netlink_ext_ack *extack);
void mqprio_qopt_reconstruct(struct net_device *dev,
struct tc_mqprio_qopt *qopt);
+void mqprio_fp_to_offload(u32 fp[TC_QOPT_MAX_QUEUE],
+ struct tc_mqprio_qopt_offload *mqprio);
#endif
--
2.34.1
This is a duplication of the FP adminStatus logic introduced for
tc-mqprio. Offloading is done through the tc_mqprio_qopt_offload
structure embedded within tc_taprio_qopt_offload. So practically, if a
device driver is written to treat the mqprio portion of taprio just like
standalone mqprio, it gets unified handling of frame preemption.
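The driver-side consequence can be sketched in user space as follows.
All sketch_* types and functions are hypothetical stand-ins (the real
structures carry many more fields), and the value 2 for "preemptible" is
restated locally for the example. The point is only that the taprio path
hands the embedded mqprio portion to the same helper as the standalone
mqprio path.

```c
#include <stdint.h>

#define SKETCH_MAX_TC         16 /* assumed maximum number of TCs */
#define SKETCH_FP_PREEMPTIBLE 2

/* Hypothetical, reduced stand-in for the mqprio offload structure. */
struct sketch_mqprio_offload {
	unsigned long preemptible_tcs; /* bit N set: TC N is preemptible */
};

/* Hypothetical, reduced stand-in for the taprio offload structure. */
struct sketch_taprio_offload {
	struct sketch_mqprio_offload mqprio; /* embedded mqprio portion */
	/* schedule fields omitted */
};

/* Hypothetical per-port driver state. */
struct sketch_port {
	unsigned long preemptible_tcs;
};

/* Collapse the per-TC status array into a bitmask of preemptible TCs. */
static unsigned long sketch_fp_to_mask(const uint32_t fp[SKETCH_MAX_TC])
{
	unsigned long mask = 0;
	int tc;

	for (tc = 0; tc < SKETCH_MAX_TC; tc++)
		if (fp[tc] == SKETCH_FP_PREEMPTIBLE)
			mask |= 1UL << tc;

	return mask;
}

/* Standalone mqprio path. */
static void sketch_port_mqprio(struct sketch_port *port,
			       const struct sketch_mqprio_offload *mqprio)
{
	port->preemptible_tcs = mqprio->preemptible_tcs;
}

/* taprio path: schedule programming omitted, mqprio portion reused. */
static void sketch_port_taprio(struct sketch_port *port,
			       const struct sketch_taprio_offload *taprio)
{
	sketch_port_mqprio(port, &taprio->mqprio);
}
```

A driver structured this way does not need a second code path for frame
preemption when the configuration arrives through taprio.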
I would have reused more code with taprio, but this is mostly netlink
attribute parsing, which is hard to transform into generic code without
having something that stinks as a result. We have the same variables
with the same semantics, just different nlattr type values
(TCA_MQPRIO_TC_ENTRY=5 vs TCA_TAPRIO_ATTR_TC_ENTRY=12;
TCA_MQPRIO_TC_ENTRY_FP=2 vs TCA_TAPRIO_TC_ENTRY_FP=3, etc) and
consequently, different policies for the nest.
Every time nla_parse_nested() is called, an on-stack table "tb" of
nlattr pointers is declared, sized for the maximum understood nlattr
type. That array size is a compile-time constant, but when
transforming this into a common parsing function, it would become either
a VLA (which the Linux kernel rightfully doesn't like) or a call to the
allocator.
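The sizing problem can be shown with a reduced user-space model. This is
a sketch under assumptions, not the netlink API: struct sketch_attr and
sketch_parse() are hypothetical simplifications of struct nlattr and
nla_parse_nested(), and only the type values quoted above are taken from
this series.

```c
#include <stddef.h>

/* Hypothetical, flattened model of a netlink attribute. */
struct sketch_attr {
	int type;
	unsigned int value;
};

enum {
	SKETCH_MQPRIO_ENTRY_FP = 2,
	SKETCH_MQPRIO_ENTRY_MAX = 2,
	SKETCH_TAPRIO_ENTRY_FP = 3,
	SKETCH_TAPRIO_ENTRY_MAX = 3,
};

/* Fill tb[type] with a pointer to the last attribute of each type.
 * The caller provides a table with maxtype + 1 slots.
 */
static void sketch_parse(const struct sketch_attr **tb, int maxtype,
			 const struct sketch_attr *attrs, size_t n)
{
	size_t i;
	int type;

	for (type = 0; type <= maxtype; type++)
		tb[type] = NULL;

	for (i = 0; i < n; i++)
		if (attrs[i].type > 0 && attrs[i].type <= maxtype)
			tb[attrs[i].type] = &attrs[i];
}

/* mqprio flavour: the table size is a compile-time constant. */
static int sketch_mqprio_fp(const struct sketch_attr *attrs, size_t n,
			    unsigned int *fp)
{
	const struct sketch_attr *tb[SKETCH_MQPRIO_ENTRY_MAX + 1];

	sketch_parse(tb, SKETCH_MQPRIO_ENTRY_MAX, attrs, n);
	if (!tb[SKETCH_MQPRIO_ENTRY_FP])
		return 0;

	*fp = tb[SKETCH_MQPRIO_ENTRY_FP]->value;
	return 1;
}

/* taprio flavour: same logic, different constants, different table. */
static int sketch_taprio_fp(const struct sketch_attr *attrs, size_t n,
			    unsigned int *fp)
{
	const struct sketch_attr *tb[SKETCH_TAPRIO_ENTRY_MAX + 1];

	sketch_parse(tb, SKETCH_TAPRIO_ENTRY_MAX, attrs, n);
	if (!tb[SKETCH_TAPRIO_ENTRY_FP])
		return 0;

	*fp = tb[SKETCH_TAPRIO_ENTRY_FP]->value;
	return 1;
}
```

Merging the two flavours into one function would mean passing the FP
type and the maximum type as parameters, at which point "tb" could no
longer be sized by a constant.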
Having FP adminStatus in tc-taprio can be seen as addressing the 802.1Q
Annex S.3 "Scheduling and preemption used in combination, no HOLD/RELEASE"
and S.4 "Scheduling and preemption used in combination with HOLD/RELEASE"
use cases. HOLD and RELEASE events are emitted towards the underlying
MAC Merge layer when the schedule hits a Set-And-Hold-MAC or a
Set-And-Release-MAC gate operation.
A small part of the change is dedicated to refactoring the max_sdu
nlattr parsing to put all logic under the "if" that tests for presence
of that nlattr.
Signed-off-by: Vladimir Oltean <[email protected]>
---
include/uapi/linux/pkt_sched.h | 1 +
net/sched/sch_taprio.c | 65 +++++++++++++++++++++++++++-------
2 files changed, 53 insertions(+), 13 deletions(-)
diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index b8d29be91b62..51a7addc56c6 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -1252,6 +1252,7 @@ enum {
TCA_TAPRIO_TC_ENTRY_UNSPEC,
TCA_TAPRIO_TC_ENTRY_INDEX, /* u32 */
TCA_TAPRIO_TC_ENTRY_MAX_SDU, /* u32 */
+ TCA_TAPRIO_TC_ENTRY_FP, /* u32 */
/* add new constants above here */
__TCA_TAPRIO_TC_ENTRY_CNT,
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c
index 9781b47962bb..c799361adea4 100644
--- a/net/sched/sch_taprio.c
+++ b/net/sched/sch_taprio.c
@@ -7,6 +7,7 @@
*/
#include <linux/ethtool.h>
+#include <linux/ethtool_netlink.h>
#include <linux/types.h>
#include <linux/slab.h>
#include <linux/kernel.h>
@@ -96,6 +97,7 @@ struct taprio_sched {
struct list_head taprio_list;
int cur_txq[TC_MAX_QUEUE];
u32 max_sdu[TC_MAX_QUEUE]; /* save info from the user */
+ u32 fp[TC_QOPT_MAX_QUEUE]; /* only for dump and offloading */
u32 txtime_delay;
};
@@ -994,6 +996,9 @@ static const struct nla_policy entry_policy[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = {
static const struct nla_policy taprio_tc_policy[TCA_TAPRIO_TC_ENTRY_MAX + 1] = {
[TCA_TAPRIO_TC_ENTRY_INDEX] = { .type = NLA_U32 },
[TCA_TAPRIO_TC_ENTRY_MAX_SDU] = { .type = NLA_U32 },
+ [TCA_TAPRIO_TC_ENTRY_FP] = NLA_POLICY_RANGE(NLA_U32,
+ TC_FP_EXPRESS,
+ TC_FP_PREEMPTIBLE),
};
static const struct nla_policy taprio_policy[TCA_TAPRIO_ATTR_MAX + 1] = {
@@ -1514,6 +1519,7 @@ static int taprio_enable_offload(struct net_device *dev,
offload->enable = 1;
mqprio_qopt_reconstruct(dev, &offload->mqprio.qopt);
taprio_sched_to_offload(dev, sched, offload, &caps);
+ mqprio_fp_to_offload(q->fp, &offload->mqprio);
for (tc = 0; tc < TC_MAX_QUEUE; tc++)
offload->max_sdu[tc] = q->max_sdu[tc];
@@ -1655,13 +1661,14 @@ static int taprio_parse_clockid(struct Qdisc *sch, struct nlattr **tb,
static int taprio_parse_tc_entry(struct Qdisc *sch,
struct nlattr *opt,
u32 max_sdu[TC_QOPT_MAX_QUEUE],
+ u32 fp[TC_QOPT_MAX_QUEUE],
unsigned long *seen_tcs,
struct netlink_ext_ack *extack)
{
struct nlattr *tb[TCA_TAPRIO_TC_ENTRY_MAX + 1] = { };
struct net_device *dev = qdisc_dev(sch);
- u32 val = 0;
int err, tc;
+ u32 val;
err = nla_parse_nested(tb, TCA_TAPRIO_TC_ENTRY_MAX, opt,
taprio_tc_policy, extack);
@@ -1686,15 +1693,18 @@ static int taprio_parse_tc_entry(struct Qdisc *sch,
*seen_tcs |= BIT(tc);
- if (tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU])
+ if (tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]) {
val = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_MAX_SDU]);
+ if (val > dev->max_mtu) {
+ NL_SET_ERR_MSG_MOD(extack, "TC max SDU exceeds device max MTU");
+ return -ERANGE;
+ }
- if (val > dev->max_mtu) {
- NL_SET_ERR_MSG_MOD(extack, "TC max SDU exceeds device max MTU");
- return -ERANGE;
+ max_sdu[tc] = val;
}
- max_sdu[tc] = val;
+ if (tb[TCA_TAPRIO_TC_ENTRY_FP])
+ fp[tc] = nla_get_u32(tb[TCA_TAPRIO_TC_ENTRY_FP]);
return 0;
}
@@ -1704,29 +1714,51 @@ static int taprio_parse_tc_entries(struct Qdisc *sch,
struct netlink_ext_ack *extack)
{
struct taprio_sched *q = qdisc_priv(sch);
+ struct net_device *dev = qdisc_dev(sch);
u32 max_sdu[TC_QOPT_MAX_QUEUE];
+ bool have_preemption = false;
unsigned long seen_tcs = 0;
+ u32 fp[TC_QOPT_MAX_QUEUE];
struct nlattr *n;
int tc, rem;
int err = 0;
- for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) {
max_sdu[tc] = q->max_sdu[tc];
+ fp[tc] = q->fp[tc];
+ }
nla_for_each_nested(n, opt, rem) {
if (nla_type(n) != TCA_TAPRIO_ATTR_TC_ENTRY)
continue;
- err = taprio_parse_tc_entry(sch, n, max_sdu, &seen_tcs,
+ err = taprio_parse_tc_entry(sch, n, max_sdu, fp, &seen_tcs,
extack);
if (err)
- goto out;
+ return err;
}
- for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++) {
q->max_sdu[tc] = max_sdu[tc];
+ q->fp[tc] = fp[tc];
+ if (fp[tc] != TC_FP_EXPRESS)
+ have_preemption = true;
+ }
+
+ if (have_preemption) {
+ if (!FULL_OFFLOAD_IS_ENABLED(q->flags)) {
+ NL_SET_ERR_MSG(extack,
+ "Preemption only supported with full offload");
+ return -EOPNOTSUPP;
+ }
+
+ if (!ethtool_dev_mm_supported(dev)) {
+ NL_SET_ERR_MSG(extack,
+ "Device does not support preemption");
+ return -EOPNOTSUPP;
+ }
+ }
-out:
return err;
}
@@ -2007,7 +2039,7 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
{
struct taprio_sched *q = qdisc_priv(sch);
struct net_device *dev = qdisc_dev(sch);
- int i;
+ int i, tc;
spin_lock_init(&q->current_entry_lock);
@@ -2064,6 +2096,9 @@ static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
q->qdiscs[i] = qdisc;
}
+ for (tc = 0; tc < TC_QOPT_MAX_QUEUE; tc++)
+ q->fp[tc] = TC_FP_EXPRESS;
+
taprio_detect_broken_mqprio(q);
return taprio_change(sch, opt, extack);
@@ -2207,6 +2242,7 @@ static int dump_schedule(struct sk_buff *msg,
}
static int taprio_dump_tc_entries(struct sk_buff *skb,
+ struct taprio_sched *q,
struct sched_gate_list *sched)
{
struct nlattr *n;
@@ -2224,6 +2260,9 @@ static int taprio_dump_tc_entries(struct sk_buff *skb,
sched->max_sdu[tc]))
goto nla_put_failure;
+ if (nla_put_u32(skb, TCA_TAPRIO_TC_ENTRY_FP, q->fp[tc]))
+ goto nla_put_failure;
+
nla_nest_end(skb, n);
}
@@ -2265,7 +2304,7 @@ static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_put_u32(skb, TCA_TAPRIO_ATTR_TXTIME_DELAY, q->txtime_delay))
goto options_error;
- if (oper && taprio_dump_tc_entries(skb, oper))
+ if (oper && taprio_dump_tc_entries(skb, q, oper))
goto options_error;
if (oper && dump_schedule(skb, oper))
--
2.34.1
In order to not transmit (preemptible) frames which will be received by
the link partner as corrupted (because it doesn't support FP), the
hardware requires the driver to program the QSYS_PREEMPTION_CFG_P_QUEUES
register only after the MAC Merge layer becomes active (verification
succeeds, or was disabled).
There are some cases when FP is known (through experimentation) to be
broken. Give priority to FP over cut-through switching, and disable FP
for known broken link modes.
Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/dsa/ocelot/felix_vsc9959.c | 13 +++++-
drivers/net/ethernet/mscc/ocelot.c | 3 ++
drivers/net/ethernet/mscc/ocelot.h | 2 +
drivers/net/ethernet/mscc/ocelot_mm.c | 56 ++++++++++++++++++++++++++
include/soc/mscc/ocelot.h | 2 +
5 files changed, 74 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/ocelot/felix_vsc9959.c b/drivers/net/dsa/ocelot/felix_vsc9959.c
index 81fcdccacd8b..c6a5cf57dcc6 100644
--- a/drivers/net/dsa/ocelot/felix_vsc9959.c
+++ b/drivers/net/dsa/ocelot/felix_vsc9959.c
@@ -1343,6 +1343,7 @@ static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
u32 speed)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
+ struct ocelot_mm_state *mm = &ocelot->mm[port];
u8 tas_speed;
switch (speed) {
@@ -1374,6 +1375,11 @@ static void vsc9959_sched_speed_set(struct ocelot *ocelot, int port,
vsc9959_tas_guard_bands_update(ocelot, port);
mutex_unlock(&ocelot->tas_lock);
+
+ /* Workaround for hardware bug */
+ mutex_lock(&mm->lock);
+ ocelot_port_update_preemptible_tcs(ocelot, port);
+ mutex_unlock(&mm->lock);
}
static void vsc9959_new_base_time(struct ocelot *ocelot, ktime_t base_time,
@@ -2519,6 +2525,7 @@ static void vsc9959_cut_through_fwd(struct ocelot *ocelot)
for (port = 0; port < ocelot->num_phys_ports; port++) {
struct ocelot_port *ocelot_port = ocelot->ports[port];
+ struct ocelot_mm_state *mm = &ocelot->mm[port];
int min_speed = ocelot_port->speed;
unsigned long mask = 0;
u32 tmp, val = 0;
@@ -2559,10 +2566,12 @@ static void vsc9959_cut_through_fwd(struct ocelot *ocelot)
/* Enable cut-through forwarding for all traffic classes that
* don't have oversized dropping enabled, since this check is
- * bypassed in cut-through mode.
+ * bypassed in cut-through mode. Also exclude preemptible
+ * traffic classes, since these would hang the port for some
+ * reason, if sent as cut-through.
*/
if (ocelot_port->speed == min_speed) {
- val = GENMASK(7, 0);
+ val = GENMASK(7, 0) & ~mm->preemptible_tcs;
for (tc = 0; tc < OCELOT_NUM_TC; tc++)
if (vsc9959_port_qmaxsdu_get(ocelot, port, tc))
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index 20557a9c46e6..76a7c25744b9 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -2608,6 +2608,7 @@ static void ocelot_port_reset_mqprio(struct ocelot *ocelot, int port)
struct net_device *dev = ocelot->ops->port_to_netdev(ocelot, port);
netdev_reset_tc(dev);
+ ocelot_port_update_fp(ocelot, port, 0);
}
int ocelot_port_mqprio(struct ocelot *ocelot, int port,
@@ -2642,6 +2643,8 @@ int ocelot_port_mqprio(struct ocelot *ocelot, int port,
if (err)
goto err_reset_tc;
+ ocelot_port_update_fp(ocelot, port, mqprio->preemptible_tcs);
+
return 0;
err_reset_tc:
diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
index e9a0179448bf..fa9b69ba198c 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -110,6 +110,8 @@ int ocelot_stats_init(struct ocelot *ocelot);
void ocelot_stats_deinit(struct ocelot *ocelot);
int ocelot_mm_init(struct ocelot *ocelot);
+void ocelot_port_update_fp(struct ocelot *ocelot, int port,
+ unsigned long preemptible_tcs);
extern struct notifier_block ocelot_netdevice_nb;
extern struct notifier_block ocelot_switchdev_nb;
diff --git a/drivers/net/ethernet/mscc/ocelot_mm.c b/drivers/net/ethernet/mscc/ocelot_mm.c
index 0a8f21ae23f0..21d5656dfc70 100644
--- a/drivers/net/ethernet/mscc/ocelot_mm.c
+++ b/drivers/net/ethernet/mscc/ocelot_mm.c
@@ -49,6 +49,61 @@ static enum ethtool_mm_verify_status ocelot_mm_verify_status(u32 val)
}
}
+void ocelot_port_update_preemptible_tcs(struct ocelot *ocelot, int port)
+{
+ struct ocelot_port *ocelot_port = ocelot->ports[port];
+ struct ocelot_mm_state *mm = &ocelot->mm[port];
+ u32 val = 0;
+
+ lockdep_assert_held(&mm->lock);
+
+ /* On NXP LS1028A, when using QSGMII, the port hangs if transmitting
+ * preemptible frames at any other link speed than gigabit
+ */
+ if (ocelot_port->phy_mode != PHY_INTERFACE_MODE_QSGMII ||
+ ocelot_port->speed == SPEED_1000) {
+ /* Only commit preemptible TCs when MAC Merge is active */
+ switch (mm->verify_status) {
+ case ETHTOOL_MM_VERIFY_STATUS_SUCCEEDED:
+ case ETHTOOL_MM_VERIFY_STATUS_DISABLED:
+ val = mm->preemptible_tcs;
+ break;
+ default:
+ }
+ }
+
+ ocelot_rmw_rix(ocelot, QSYS_PREEMPTION_CFG_P_QUEUES(val),
+ QSYS_PREEMPTION_CFG_P_QUEUES_M,
+ QSYS_PREEMPTION_CFG, port);
+}
+EXPORT_SYMBOL_GPL(ocelot_port_update_preemptible_tcs);
+
+void ocelot_port_update_fp(struct ocelot *ocelot, int port,
+ unsigned long preemptible_tcs)
+{
+ struct ocelot_mm_state *mm = &ocelot->mm[port];
+
+ mutex_lock(&mm->lock);
+
+ if (mm->preemptible_tcs == preemptible_tcs)
+ goto out_unlock;
+
+ mm->preemptible_tcs = preemptible_tcs;
+
+ /* Cut through switching doesn't work for preemptible priorities,
+ * so disable it.
+ */
+ mutex_lock(&ocelot->fwd_domain_lock);
+ ocelot->ops->cut_through_fwd(ocelot);
+ mutex_unlock(&ocelot->fwd_domain_lock);
+
+ ocelot_port_update_preemptible_tcs(ocelot, port);
+
+out_unlock:
+ mutex_unlock(&mm->lock);
+}
+EXPORT_SYMBOL_GPL(ocelot_port_update_fp);
+
void ocelot_port_mm_irq(struct ocelot *ocelot, int port)
{
struct ocelot_port *ocelot_port = ocelot->ports[port];
@@ -66,6 +121,7 @@ void ocelot_port_mm_irq(struct ocelot *ocelot, int port)
"Port %d MAC Merge verification state %s\n",
port, mm_verify_state_to_string(verify_status));
mm->verify_status = verify_status;
+ ocelot_port_update_preemptible_tcs(ocelot, port);
}
if (val & DEV_MM_STAT_MM_STATUS_PRMPT_ACTIVE_STICKY) {
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index 27ff770a6c53..7ee7a29e7c51 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -748,6 +748,7 @@ struct ocelot_mm_state {
struct mutex lock;
enum ethtool_mm_verify_status verify_status;
bool tx_active;
+ u8 preemptible_tcs;
};
struct ocelot_port;
@@ -1149,6 +1150,7 @@ int ocelot_port_get_mm(struct ocelot *ocelot, int port,
struct ethtool_mm_state *state);
int ocelot_port_mqprio(struct ocelot *ocelot, int port,
struct tc_mqprio_qopt_offload *mqprio);
+void ocelot_port_update_preemptible_tcs(struct ocelot *ocelot, int port);
#if IS_ENABLED(CONFIG_BRIDGE_MRP)
int ocelot_mrp_add(struct ocelot *ocelot, int port,
--
2.34.1
PFs which support the MAC Merge layer also have a set of 8 registers
called "Port traffic class N frame preemption register (PTC0FPR - PTC7FPR)".
Through these, a traffic class (group of TX rings of same dequeue
priority) can be mapped to the eMAC or to the pMAC.
There's nothing particularly spectacular here. We should probably only
commit the preemptible TCs to hardware once the MAC Merge layer has become
active, but unlike Felix, we don't have an IRQ that notifies us of that.
We'd have to sleep for up to verifyTime (127 ms) to wait for a
resolution coming from the verification state machine; not only from the
ndo_setup_tc() code path, but also from enetc_mm_link_state_update().
Since it's relatively complicated and has a relatively small benefit,
I'm not doing it.
Signed-off-by: Vladimir Oltean <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 22 +++++++++++++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 1 +
.../net/ethernet/freescale/enetc/enetc_hw.h | 4 ++++
3 files changed, 27 insertions(+)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index e0207b01ddd6..41c194c1672d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -25,6 +25,24 @@ void enetc_port_mac_wr(struct enetc_si *si, u32 reg, u32 val)
}
EXPORT_SYMBOL_GPL(enetc_port_mac_wr);
+void enetc_set_ptcfpr(struct enetc_hw *hw, unsigned long preemptible_tcs)
+{
+ u32 val;
+ int tc;
+
+ for (tc = 0; tc < 8; tc++) {
+ val = enetc_port_rd(hw, ENETC_PTCFPR(tc));
+
+ if (preemptible_tcs & BIT(tc))
+ val |= ENETC_PTCFPR_FPE;
+ else
+ val &= ~ENETC_PTCFPR_FPE;
+
+ enetc_port_wr(hw, ENETC_PTCFPR(tc), val);
+ }
+}
+EXPORT_SYMBOL_GPL(enetc_set_ptcfpr);
+
static int enetc_num_stack_tx_queues(struct enetc_ndev_priv *priv)
{
int num_tx_rings = priv->num_tx_rings;
@@ -2640,6 +2658,8 @@ static void enetc_reset_tc_mqprio(struct net_device *ndev)
}
enetc_debug_tx_ring_prios(priv);
+
+ enetc_set_ptcfpr(hw, 0);
}
int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
@@ -2694,6 +2714,8 @@ int enetc_setup_tc_mqprio(struct net_device *ndev, void *type_data)
enetc_debug_tx_ring_prios(priv);
+ enetc_set_ptcfpr(hw, mqprio->preemptible_tcs);
+
return 0;
err_reset_tc:
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 8010f31cd10d..143078a9ef16 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -486,6 +486,7 @@ static inline void enetc_cbd_free_data_mem(struct enetc_si *si, int size,
void enetc_reset_ptcmsdur(struct enetc_hw *hw);
void enetc_set_ptcmsdur(struct enetc_hw *hw, u32 *queue_max_sdu);
+void enetc_set_ptcfpr(struct enetc_hw *hw, unsigned long preemptible_tcs);
#ifdef CONFIG_FSL_ENETC_QOS
int enetc_qos_query_caps(struct net_device *ndev, void *type_data);
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index de2e0ee8cdcb..36bb2d6d5658 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -965,6 +965,10 @@ static inline u32 enetc_usecs_to_cycles(u32 usecs)
return (u32)div_u64(usecs * ENETC_CLK, 1000000ULL);
}
+/* Port traffic class frame preemption register */
+#define ENETC_PTCFPR(n) (0x1910 + (n) * 4) /* n = [0 ..7] */
+#define ENETC_PTCFPR_FPE BIT(31)
+
/* port time gating control register */
#define ENETC_PTGCR 0x11a00
#define ENETC_PTGCR_TGE BIT(31)
--
2.34.1
Hi Vladimir!
On Fri, 2023-02-17 at 01:21 +0200, Vladimir Oltean wrote:
> Ferenc reports that a combination of poor iproute2 defaults and
> obscure
> cases where the kernel returns -EINVAL make it difficult to
> understand
> what is wrong with this command:
>
> $ ip link add veth0 numtxqueues 8 numrxqueues 8 type veth peer name
> veth1
> $ tc qdisc add dev veth0 root mqprio num_tc 8 map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7
> RTNETLINK answers: Invalid argument
>
> Hopefully with this patch, the cause is clearer:
>
> Error: Device does not support hardware offload.
Much better, great improvement!
>
> This was rejected because iproute2 defaults to "hw 1" if the option
> is
> not specified.
>
> Link:
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#25215636
> Signed-off-by: Vladimir Oltean <[email protected]>
> ---
> net/sched/sch_mqprio.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/sched/sch_mqprio.c b/net/sched/sch_mqprio.c
> index 18eda5fade81..52cfc0ec2e23 100644
> --- a/net/sched/sch_mqprio.c
> +++ b/net/sched/sch_mqprio.c
> @@ -134,8 +134,11 @@ static int mqprio_parse_opt(struct net_device
> *dev, struct tc_mqprio_qopt *qopt,
> /* If ndo_setup_tc is not present then hardware doesn't
> support offload
> * and we should return an error.
> */
> - if (qopt->hw && !dev->netdev_ops->ndo_setup_tc)
> + if (qopt->hw && !dev->netdev_ops->ndo_setup_tc) {
> + NL_SET_ERR_MSG(extack,
> + "Device does not support hardware
> offload");
> return -EINVAL;
> + }
>
> return 0;
> }
Thanks for doing this!
Best,
Ferenc
On Fri, Feb 17, 2023 at 01:21:19AM +0200, Vladimir Oltean wrote:
> +bool ethtool_dev_mm_supported(struct net_device *dev)
> +{
> + const struct ethtool_ops *ops = dev->ethtool_ops;
> + bool supported;
> + int ret;
> +
> + ASSERT_RTNL();
> +
> + if (!ops)
> + return false;
> +
> + ret = ethnl_ops_begin(dev);
> + if (ret < 0)
> + return false;
> +
> + supported = __ethtool_dev_mm_supported(dev);
> +
> + ethnl_ops_complete(dev);
> +
> + return supported;
> +}
In the first patch that uses this:
ERROR: modpost: "ethtool_dev_mm_supported" [net/sched/sch_mqprio.ko] undefined!
due to a missing EXPORT_SYMBOL_GPL(). Sorry.
On Fri, Feb 17, 2023 at 01:21:14AM +0200, Vladimir Oltean wrote:
> [...]
Seeing that there is no feedback on the proposed UAPI, I'd be tempted
to resend this, with just the modular build fixed (export the
ethtool_dev_mm_supported() symbol).
Would anyone hate me for doing this, considering that the merge window
is close? Does anyone need some time to take a closer look at this, or
think about a better alternative?
Hi Vladimir!
On Sat, 2023-02-18 at 17:20 +0200, Vladimir Oltean wrote:
> On Fri, Feb 17, 2023 at 01:21:14AM +0200, Vladimir Oltean wrote:
> > [...]
>
> Seeing that there is no feedback on the proposed UAPI, I'd be tempted
> to resend this, with just the modular build fixed (export the
> ethtool_dev_mm_supported() symbol).
>
> Would anyone hate me for doing this, considering that the merge
> window
> is close? Does anyone need some time to take a closer look at this,
> or
> think about a better alternative?
Do you have the iproute2 part? Sorry if I missed it, but it would be
nice to see how that UAPI is exposed to the config tools. Are there any
new parameters for mqprio/taprio?
Best,
Ferenc
Hi Ferenc,
On Sun, Feb 19, 2023 at 10:47:31AM +0100, Ferenc Fejes wrote:
> Do you have the iproute2 part? Sorry if I missed it, but it would be
> nice to see how is that UAPI exposed for the config tools. Is there any
> new parameter for mqprio/taprio?
I haven't posted the iproute2 part (yet). For those familiar with my
recent development, FP is a per-traffic-class netlink attribute just
like queueMaxSDU from tc-taprio. That was exposed in iproute2 as an
array of values, one per tc.
What I have in my tree would allow something like this (the "fp" argument
is new, with one entry per tc):
tc qdisc replace dev $swp1 root stab overhead 20 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 \
sched-entry S 0x7e 900000 \
sched-entry S 0x82 100000 \
max-sdu 0 0 0 0 0 0 0 200 \
fp P E E E E E E E \
flags 0x2
tc qdisc replace dev $swp1 root mqprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
fp P E E E E E E E \
hw 1
Of course, the exact syntax is a potential matter of debate on its own,
and does not really matter for the purpose of defining the kernel UAPI,
which is why I wanted to keep the discussions separate.
For hardware which understands preemptible queues rather than traffic
classes, how many queues are preemptible, and what are their offsets,
will be deduced by translating the "queues" argument.
For hardware which understands preemptible priorities rather than
traffic classes, which priorities are preemptible will be deduced by
translating the "map" argument.
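The two translations described above can be sketched in userspace C roughly as follows. This is only an illustration under stated assumptions: the sketch_* structure and helpers are hypothetical mirrors of the "map" and "queues" arguments, not code from the series, and the limits of 16 traffic classes and 16 priorities are assumed.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_TC   16 /* assumed upper bound on traffic classes */
#define SKETCH_NUM_PRIO 16 /* assumed number of priorities in "map" */

/* Hypothetical mirror of the mqprio/taprio "map" and "queues" arguments. */
struct sketch_mqprio {
	uint8_t num_tc;
	uint8_t prio_tc_map[SKETCH_NUM_PRIO]; /* "map": priority -> tc */
	uint16_t count[SKETCH_MAX_TC];        /* "queues": count@offset per tc */
	uint16_t offset[SKETCH_MAX_TC];
};

/* For hardware with preemptible queues: a queue is preemptible if it
 * belongs to a preemptible traffic class.
 */
static uint32_t sketch_preemptible_queues(const struct sketch_mqprio *q,
					  uint32_t preemptible_tcs)
{
	uint32_t mask = 0;

	assert(q->num_tc <= SKETCH_MAX_TC);

	for (int tc = 0; tc < q->num_tc; tc++) {
		if (!(preemptible_tcs & (UINT32_C(1) << tc)))
			continue;

		for (int i = 0; i < q->count[tc]; i++) {
			assert(q->offset[tc] + i < 32);
			mask |= UINT32_C(1) << (q->offset[tc] + i);
		}
	}

	return mask;
}

/* For hardware with preemptible priorities: a priority is preemptible if
 * "map" sends it to a preemptible traffic class.
 */
static uint32_t sketch_preemptible_prios(const struct sketch_mqprio *q,
					 uint32_t preemptible_tcs)
{
	uint32_t mask = 0;

	for (int prio = 0; prio < SKETCH_NUM_PRIO; prio++) {
		assert(q->prio_tc_map[prio] < SKETCH_MAX_TC);
		if (preemptible_tcs & (UINT32_C(1) << q->prio_tc_map[prio]))
			mask |= UINT32_C(1) << prio;
	}

	return mask;
}
```

With two traffic classes, "queues 2@0 2@2" and only TC 1 preemptible, the first helper yields queues 2 and 3, and the second yields whichever priorities "map" assigns to TC 1.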
In my proposed UAPI, the traffic class is the kernel entity which carries
the preemptible quality, because my analysis of the standard concluded
that this is what the preemptible quality is fundamentally attached to.
Considering that the UAPI for FP is a topic that has been discussed to
death at least since August without any really new input since then, I'm
going to submit v2 later today, and the iproute2 patch set afterwards
(still need to write man page entries for that).
Hi Vladimir!
Thank you for the update!
On Sun, 2023-02-19 at 14:58 +0200, Vladimir Oltean wrote:
> Hi Ferenc,
>
> On Sun, Feb 19, 2023 at 10:47:31AM +0100, Ferenc Fejes wrote:
> > Do you have the iproute2 part? Sorry if I missed it, but it would
> > be
> > nice to see how is that UAPI exposed for the config tools. Is there
> > any
> > new parameter for mqprio/taprio?
>
> I haven't posted the iproute2 part (yet). For those familiar with my
> recent development, FP is a per-traffic-class netlink attribute just
> like queueMaxSDU from tc-taprio. That was exposed in iproute2 as an
> array of values, one per tc.
>
> What I have in my tree would allow something like this (the "fp" argument
> is new, with one entry per tc):
>
> tc qdisc replace dev $swp1 root stab overhead 20 taprio \
> num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> base-time 0 \
> sched-entry S 0x7e 900000 \
> sched-entry S 0x82 100000 \
> max-sdu 0 0 0 0 0 0 0 200 \
> fp P E E E E E E E \
> flags 0x2
>
> tc qdisc replace dev $swp1 root mqprio \
> num_tc 8 \
> map 0 1 2 3 4 5 6 7 \
> queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
> fp P E E E E E E E \
> hw 1
>
> of course the exact syntax is a potential matter of debate on its
> own,
> and does not really matter for the purpose of defining the kernel
> UAPI,
> which is why I wanted to keep discussions separate.
Fair enough. What you have right here is pretty straightforward IMO, I
would definitely support something like this.
>
> For hardware which understands preemptible queues rather than traffic
> classes, how many queues are preemptible, and what are their offsets,
> will be deduced by translating the "queues" argument.
>
> For hardware which understands preemptible priorities rather than
> traffic classes, which priorities are preemptible will be deduced by
> translating the "map" argument.
Great, that covers both cases with the same UAPI. I love the fact that
this even leaves open the possibility of using prios (map) instead of
queues for FP.
>
> The traffic class is the kernel entity which has the preemptible
> priority in my proposed UAPI because this is what my analysis of the
> standard has deduced that the preemptible quality is fundamentally
> attached to.
>
> Considering that the UAPI for FP is a topic that has been discussed
> to
> death at least since August without any really new input since then,
> I'm
> going to submit v2 later today, and the iproute2 patch set afterwards
> (still need to write man page entries for that).
Best,
Ferenc
Hi Ferenc,
On Fri, Feb 17, 2023 at 07:24:07AM +0000, Ferenc Fejes wrote:
> Much better, great improvement!
Would appreciate a formal review/test tag on v2 :)