This patch series adds new extension attributes to dcbnl, to support PCP
prioritization (and thereby hw offloadable pcp-based queue
classification) and per-selector trust and trust order. Additionally,
the microchip sparx5 driver has been dcb-enabled to make use of the new
attributes to offload PCP, DSCP and Default prio to the switch, and
implement trust order of selectors.
For pre-RFC discussion see:
https://lore.kernel.org/netdev/Yv9VO1DYAxNduw6A@DEN-LT-70577/
For RFC series see:
https://lore.kernel.org/netdev/[email protected]/
In summary: there currently exist no convenient way to offload per-port
PCP-based queue classification to hardware. The DCB subsystem offers
different ways to prioritize through its APP table, but lacks an option
for PCP. Similarly, there is no way to indicate the notion of trust for
APP table selectors. This patch series addresses both topics.
PCP based queue classification:
- 8021Q standardizes the Priority Code Point table (see 6.9.3 of IEEE
Std 802.1Q-2018). This patch series makes it possible, to offload
the PCP classification to said table. The new PCP selector is not a
standard part of the APP managed object, therefore it is
encapsulated in a new non-std extension attribute.
Selector trust:
- ASIC's often has the notion of trust DSCP and trust PCP. The new
attribute makes it possible to specify a trust order of app
selectors, which drivers can then react on.
DCB-enable sparx5 driver:
- Now supports offloading of DSCP, PCP and default priority. Only one
mapping of protocol:priority is allowed. Consecutive mappings of the
same protocol to some new priority, will overwrite the previous. This
is to keep a consistent view of the app table and the hardware.
- Now supports dscp and pcp trust, by use of the introduced
dcbnl_set/getapptrust ops. Sparx5 supports trust orders: [], [dscp],
[pcp] and [dscp, pcp]. For now, only DSCP and PCP selectors are
supported by the driver, everything else is bounced.
Patch #1 introduces a new PCP selector to the APP object, which makes it
possible to encode PCP and DEI in the app triplet and offload it to the
PCP table of the ASIC.
Patch #2 Introduces the new extension attributes
DCB_ATTR_DCB_APP_TRUST_TABLE and DCB_ATTR_DCB_APP_TRUST. Trusted
selectors are passed in the nested DCB_ATTR_DCB_APP_TRUST_TABLE
attribute, and assembled into an array of selectors:
u8 selectors[256];
where lower indexes has higher precedence. In the array, selectors are
stored consecutively, starting from index zero. With a maximum number of
256 unique selectors, the list has the same maximum size.
Patch #3 Sets up the dcbnl ops hook, and adds support for offloading pcp
app entries, to the PCP table of the switch.
Patch #4 Makes use of the dcbnl_set/getapptrust ops, to set a per-port
trust order.
Patch #5 Adds support for offloading dscp app entries to the DSCP table
of the switch.
Patch #6 Adds support for offloading default prio app entries to the
switch.
================================================================================
RFC v1:
https://lore.kernel.org/netdev/[email protected]/
RFC v1 -> RFC v2:
- Added new nested attribute type DCB_ATTR_DCB_APP_TRUST_TABLE.
- Renamed attributes from DCB_ATTR_IEEE_* to DCB_ATTR_DCB_*.
- Renamed ieee_set/getapptrust to dcbnl_set/getapptrust.
- Added -EOPNOTSUPP if dcbnl_setapptrust is not set.
- Added sanitization of selector array, before passing to driver.
RFC v2 -> (non-RFC) v1
- Added additional check for selector validity.
- Fixed a few style errors.
- using nla_start_nest() instead of nla_start_nest_no_flag().
- Moved DCB_ATTR_DCB_APP_TRUST into new enum.
- Added new DCB_ATTR_DCB_APP extension attribute, for non-std selector
values.
- Added support for offloading dscp, pcp and default prio in the sparx5
driver.
- Added support for per-selector trust and trust order in the sparx5
driver.
v1 -> v2
- Fixed compiler and kdoc warning
v2 -> v3
- Moved back to 255 as PCP selector value.
- Fixed return value in dcbnl_app_attr_type_get() to enum.
- Modified in dcbnl_app_attr_type_get() dcbnl_app_attr_type_validate() to
return directly.
- Added nselector check in sparx5_dcb_apptrust_validate().
- Added const qualifier to "names" variable in struct sparx5_dcb_apptrust.
- Added new SPARX5_DCB config. Fixes issues reported by kernel test robot.
Daniel Machon (6):
net: dcb: add new pcp selector to app object
net: dcb: add new apptrust attribute
net: microchip: sparx5: add support for offloading pcp table
net: microchip: sparx5: add support for apptrust
net: microchip: sparx5: add support for offloading dscp table
net: microchip: sparx5: add support for offloading default prio
drivers/net/ethernet/microchip/sparx5/Kconfig | 8 +
.../net/ethernet/microchip/sparx5/Makefile | 2 +
.../ethernet/microchip/sparx5/sparx5_dcb.c | 293 ++++++++++++++++++
.../ethernet/microchip/sparx5/sparx5_main.h | 11 +
.../microchip/sparx5/sparx5_main_regs.h | 127 +++++++-
.../ethernet/microchip/sparx5/sparx5_port.c | 99 ++++++
.../ethernet/microchip/sparx5/sparx5_port.h | 37 +++
.../ethernet/microchip/sparx5/sparx5_qos.c | 4 +
include/net/dcbnl.h | 4 +
include/uapi/linux/dcbnl.h | 15 +
net/dcb/dcbnl.c | 113 ++++++-
11 files changed, 703 insertions(+), 10 deletions(-)
create mode 100644 drivers/net/ethernet/microchip/sparx5/sparx5_dcb.c
--
2.34.1
Add new PCP selector for the 8021Qaz APP managed object.
As the PCP selector is not part of the 8021Qaz standard, a new non-std
extension attribute DCB_ATTR_DCB_APP has been introduced. Also two
helper functions to translate between selector and app attribute type
has been added. The new selector has been given a value of 255, to
minimize the risk of future overlap of std- and non-std attributes.
The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
app table. This means that the dcb_app struct can now both contain std-
and non-std app attributes. Currently there is no overlap between the
selector values of the two attributes.
The purpose of adding the PCP selector, is to be able to offload
PCP-based queue classification to the 8021Q Priority Code Point table,
see 6.9.3 of IEEE Std 802.1Q-2018.
PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a
mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.
Signed-off-by: Daniel Machon <[email protected]>
---
include/uapi/linux/dcbnl.h | 6 ++++++
net/dcb/dcbnl.c | 36 ++++++++++++++++++++++++++++++++----
2 files changed, 38 insertions(+), 4 deletions(-)
diff --git a/include/uapi/linux/dcbnl.h b/include/uapi/linux/dcbnl.h
index a791a94013a6..dc7ef96207ca 100644
--- a/include/uapi/linux/dcbnl.h
+++ b/include/uapi/linux/dcbnl.h
@@ -218,6 +218,9 @@ struct cee_pfc {
#define IEEE_8021QAZ_APP_SEL_ANY 4
#define IEEE_8021QAZ_APP_SEL_DSCP 5
+/* Non-std selector values */
+#define DCB_APP_SEL_PCP 255
+
/* This structure contains the IEEE 802.1Qaz APP managed object. This
* object is also used for the CEE std as well.
*
@@ -247,6 +250,8 @@ struct dcb_app {
__u16 protocol;
};
+#define IEEE_8021QAZ_APP_SEL_MAX 255
+
/**
* struct dcb_peer_app_info - APP feature information sent by the peer
*
@@ -425,6 +430,7 @@ enum ieee_attrs {
enum ieee_attrs_app {
DCB_ATTR_IEEE_APP_UNSPEC,
DCB_ATTR_IEEE_APP,
+ DCB_ATTR_DCB_APP,
__DCB_ATTR_IEEE_APP_MAX
};
#define DCB_ATTR_IEEE_APP_MAX (__DCB_ATTR_IEEE_APP_MAX - 1)
diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index dc4fb699b56c..92c32bc11374 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -179,6 +179,33 @@ static const struct nla_policy dcbnl_featcfg_nest[DCB_FEATCFG_ATTR_MAX + 1] = {
static LIST_HEAD(dcb_app_list);
static DEFINE_SPINLOCK(dcb_lock);
+static enum ieee_attrs_app dcbnl_app_attr_type_get(u8 selector)
+{
+ switch (selector) {
+ case IEEE_8021QAZ_APP_SEL_ETHERTYPE:
+ case IEEE_8021QAZ_APP_SEL_STREAM:
+ case IEEE_8021QAZ_APP_SEL_DGRAM:
+ case IEEE_8021QAZ_APP_SEL_ANY:
+ case IEEE_8021QAZ_APP_SEL_DSCP:
+ return DCB_ATTR_IEEE_APP;
+ case DCB_APP_SEL_PCP:
+ return DCB_ATTR_DCB_APP;
+ default:
+ return DCB_ATTR_IEEE_APP_UNSPEC;
+ }
+}
+
+static bool dcbnl_app_attr_type_validate(enum ieee_attrs_app type)
+{
+ switch (type) {
+ case DCB_ATTR_IEEE_APP:
+ case DCB_ATTR_DCB_APP:
+ return true;
+ default:
+ return false;
+ }
+}
+
static struct sk_buff *dcbnl_newmsg(int type, u8 cmd, u32 port, u32 seq,
u32 flags, struct nlmsghdr **nlhp)
{
@@ -1116,8 +1143,9 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev)
spin_lock_bh(&dcb_lock);
list_for_each_entry(itr, &dcb_app_list, list) {
if (itr->ifindex == netdev->ifindex) {
- err = nla_put(skb, DCB_ATTR_IEEE_APP, sizeof(itr->app),
- &itr->app);
+ enum ieee_attrs_app type =
+ dcbnl_app_attr_type_get(itr->app.selector);
+ err = nla_put(skb, type, sizeof(itr->app), &itr->app);
if (err) {
spin_unlock_bh(&dcb_lock);
return -EMSGSIZE;
@@ -1495,7 +1523,7 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh,
nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
struct dcb_app *app_data;
- if (nla_type(attr) != DCB_ATTR_IEEE_APP)
+ if (!dcbnl_app_attr_type_validate(nla_type(attr)))
continue;
if (nla_len(attr) < sizeof(struct dcb_app)) {
@@ -1556,7 +1584,7 @@ static int dcbnl_ieee_del(struct net_device *netdev, struct nlmsghdr *nlh,
nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
struct dcb_app *app_data;
- if (nla_type(attr) != DCB_ATTR_IEEE_APP)
+ if (!dcbnl_app_attr_type_validate(nla_type(attr)))
continue;
app_data = nla_data(attr);
if (ops->ieee_delapp)
--
2.34.1
Daniel Machon <[email protected]> writes:
> Add new PCP selector for the 8021Qaz APP managed object.
>
> As the PCP selector is not part of the 8021Qaz standard, a new non-std
> extension attribute DCB_ATTR_DCB_APP has been introduced. Also two
> helper functions to translate between selector and app attribute type
> has been added. The new selector has been given a value of 255, to
> minimize the risk of future overlap of std- and non-std attributes.
>
> The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
> app table. This means that the dcb_app struct can now both contain std-
> and non-std app attributes. Currently there is no overlap between the
> selector values of the two attributes.
>
> The purpose of adding the PCP selector, is to be able to offload
> PCP-based queue classification to the 8021Q Priority Code Point table,
> see 6.9.3 of IEEE Std 802.1Q-2018.
>
> PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a
> mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.
>
> Signed-off-by: Daniel Machon <[email protected]>
> static struct sk_buff *dcbnl_newmsg(int type, u8 cmd, u32 port, u32 seq,
> u32 flags, struct nlmsghdr **nlhp)
> {
> @@ -1116,8 +1143,9 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev)
> spin_lock_bh(&dcb_lock);
> list_for_each_entry(itr, &dcb_app_list, list) {
> if (itr->ifindex == netdev->ifindex) {
> - err = nla_put(skb, DCB_ATTR_IEEE_APP, sizeof(itr->app),
> - &itr->app);
> + enum ieee_attrs_app type =
> + dcbnl_app_attr_type_get(itr->app.selector);
> + err = nla_put(skb, type, sizeof(itr->app), &itr->app);
> if (err) {
> spin_unlock_bh(&dcb_lock);
> return -EMSGSIZE;
> @@ -1495,7 +1523,7 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh,
> nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
> struct dcb_app *app_data;
>
> - if (nla_type(attr) != DCB_ATTR_IEEE_APP)
> + if (!dcbnl_app_attr_type_validate(nla_type(attr)))
> continue;
>
> if (nla_len(attr) < sizeof(struct dcb_app)) {
> @@ -1556,7 +1584,7 @@ static int dcbnl_ieee_del(struct net_device *netdev, struct nlmsghdr *nlh,
> nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
> struct dcb_app *app_data;
>
> - if (nla_type(attr) != DCB_ATTR_IEEE_APP)
> + if (!dcbnl_app_attr_type_validate(nla_type(attr)))
> continue;
> app_data = nla_data(attr);
> if (ops->ieee_delapp)
I'm missing a validation that DCB_APP_SEL_PCP is always sent in
DCB_ATTR_DCB_APP encapsulation. Wouldn't the current code permit
sending it in the IEEE encap? This should be forbidden.
And vice versa: I'm not sure we want to permit sending the standard
attributes in the DCB encap.
> > Add new PCP selector for the 8021Qaz APP managed object.
> >
> > As the PCP selector is not part of the 8021Qaz standard, a new non-std
> > extension attribute DCB_ATTR_DCB_APP has been introduced. Also two
> > helper functions to translate between selector and app attribute type
> > has been added. The new selector has been given a value of 255, to
> > minimize the risk of future overlap of std- and non-std attributes.
> >
> > The new DCB_ATTR_DCB_APP is sent alongside the ieee std attribute in the
> > app table. This means that the dcb_app struct can now both contain std-
> > and non-std app attributes. Currently there is no overlap between the
> > selector values of the two attributes.
> >
> > The purpose of adding the PCP selector, is to be able to offload
> > PCP-based queue classification to the 8021Q Priority Code Point table,
> > see 6.9.3 of IEEE Std 802.1Q-2018.
> >
> > PCP and DEI is encoded in the protocol field as 8*dei+pcp, so that a
> > mapping of PCP 2 and DEI 1 to priority 3 is encoded as {255, 10, 3}.
> >
> > Signed-off-by: Daniel Machon <[email protected]>
>
> > static struct sk_buff *dcbnl_newmsg(int type, u8 cmd, u32 port, u32 seq,
> > u32 flags, struct nlmsghdr **nlhp)
> > {
> > @@ -1116,8 +1143,9 @@ static int dcbnl_ieee_fill(struct sk_buff *skb, struct net_device *netdev)
> > spin_lock_bh(&dcb_lock);
> > list_for_each_entry(itr, &dcb_app_list, list) {
> > if (itr->ifindex == netdev->ifindex) {
> > - err = nla_put(skb, DCB_ATTR_IEEE_APP, sizeof(itr->app),
> > - &itr->app);
> > + enum ieee_attrs_app type =
> > + dcbnl_app_attr_type_get(itr->app.selector);
> > + err = nla_put(skb, type, sizeof(itr->app), &itr->app);
> > if (err) {
> > spin_unlock_bh(&dcb_lock);
> > return -EMSGSIZE;
> > @@ -1495,7 +1523,7 @@ static int dcbnl_ieee_set(struct net_device *netdev, struct nlmsghdr *nlh,
> > nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
> > struct dcb_app *app_data;
> >
> > - if (nla_type(attr) != DCB_ATTR_IEEE_APP)
> > + if (!dcbnl_app_attr_type_validate(nla_type(attr)))
> > continue;
> >
> > if (nla_len(attr) < sizeof(struct dcb_app)) {
> > @@ -1556,7 +1584,7 @@ static int dcbnl_ieee_del(struct net_device *netdev, struct nlmsghdr *nlh,
> > nla_for_each_nested(attr, ieee[DCB_ATTR_IEEE_APP_TABLE], rem) {
> > struct dcb_app *app_data;
> >
> > - if (nla_type(attr) != DCB_ATTR_IEEE_APP)
> > + if (!dcbnl_app_attr_type_validate(nla_type(attr)))
> > continue;
> > app_data = nla_data(attr);
> > if (ops->ieee_delapp)
>
> I'm missing a validation that DCB_APP_SEL_PCP is always sent in
> DCB_ATTR_DCB_APP encapsulation. Wouldn't the current code permit
> sending it in the IEEE encap? This should be forbidden.
Right. Current impl. does not check that the non-std selectors received, are
sent with a DCB_ATTR_DCB_APP type. We could introduce a new check
dcbnl_app_attr_selector_validate() that checks combination of type and
selector, after the type and nla_len(attr) has been checked, so that:
validate type -> validate nla_len(attr) -> validate selector
> And vice versa: I'm not sure we want to permit sending the standard
> attributes in the DCB encap.
dcbnl_app_attr_type_get() in dcbnl_ieee_fill() takes care of this. IEEE are
always sent in DCB_ATTR_IEEE and non-std are sent in DCB_ATTR_DCB.
/ Daniel
<[email protected]> writes:
>> I'm missing a validation that DCB_APP_SEL_PCP is always sent in
>> DCB_ATTR_DCB_APP encapsulation. Wouldn't the current code permit
>> sending it in the IEEE encap? This should be forbidden.
>
> Right. Current impl. does not check that the non-std selectors received, are
> sent with a DCB_ATTR_DCB_APP type. We could introduce a new check
> dcbnl_app_attr_selector_validate() that checks combination of type and
> selector, after the type and nla_len(attr) has been checked, so that:
>
> validate type -> validate nla_len(attr) -> validate selector
This needs to be validated, otherwise there's no point in having a
dedicated attribute for the non-standard stuff.
>> And vice versa: I'm not sure we want to permit sending the standard
>> attributes in the DCB encap.
>
> dcbnl_app_attr_type_get() in dcbnl_ieee_fill() takes care of this. IEEE are
> always sent in DCB_ATTR_IEEE and non-std are sent in DCB_ATTR_DCB.
By "sending" I meant userspace sending this to the kernel. So bounce
extended opcodes that are wrapped in IEEE and bounce IEEE opcodes
wrapped in DCB as well.
> >> I'm missing a validation that DCB_APP_SEL_PCP is always sent in
> >> DCB_ATTR_DCB_APP encapsulation. Wouldn't the current code permit
> >> sending it in the IEEE encap? This should be forbidden.
> >
> > Right. Current impl. does not check that the non-std selectors received, are
> > sent with a DCB_ATTR_DCB_APP type. We could introduce a new check
> > dcbnl_app_attr_selector_validate() that checks combination of type and
> > selector, after the type and nla_len(attr) has been checked, so that:
> >
> > validate type -> validate nla_len(attr) -> validate selector
>
> This needs to be validated, otherwise there's no point in having a
> dedicated attribute for the non-standard stuff.
Agree.
>
> >> And vice versa: I'm not sure we want to permit sending the standard
> >> attributes in the DCB encap.
> >
> > dcbnl_app_attr_type_get() in dcbnl_ieee_fill() takes care of this. IEEE are
> > always sent in DCB_ATTR_IEEE and non-std are sent in DCB_ATTR_DCB.
>
> By "sending" I meant userspace sending this to the kernel. So bounce
> extended opcodes that are wrapped in IEEE and bounce IEEE opcodes
> wrapped in DCB as well.
Right. Then we only need to decide what to do with any opcode in-between
(not defined in uapi, neither ieee or extension opcode, 7-254). If they are
sent in DCB_ATTR_DCB they should be bounced, because we agreed that we can
interpret data in the new attr), _but_ if they are sent in DCB_ATTR_IEEE I
guess we should accept them, to not break userspace that is already sending
them.
Here is what that could look like:
/* Make sure any non-std selectors is always encapsulated in the non-std
* DCB_ATTR_DCB_APP attribute.
*/
static bool dcbnl_app_attr_selector_validate(enum ieee_attrs_app type, u32 selector)
{
switch (selector) {
case DCB_APP_SEL_PCP:
/* Non-std selector in non-std attr? */
if (type == DCB_ATTR_DCB_APP)
return true;
default:
/* Std selector in std attr? */
if (type == DCB_ATTR_IEEE_APP)
return true;
}
return false;
}
/ Daniel
<[email protected]> writes:
>> >> And vice versa: I'm not sure we want to permit sending the standard
>> >> attributes in the DCB encap.
>> >
>> > dcbnl_app_attr_type_get() in dcbnl_ieee_fill() takes care of this. IEEE are
>> > always sent in DCB_ATTR_IEEE and non-std are sent in DCB_ATTR_DCB.
>>
>> By "sending" I meant userspace sending this to the kernel. So bounce
>> extended opcodes that are wrapped in IEEE and bounce IEEE opcodes
>> wrapped in DCB as well.
>
> Right. Then we only need to decide what to do with any opcode in-between
> (not defined in uapi, neither ieee or extension opcode, 7-254). If they are
> sent in DCB_ATTR_DCB they should be bounced, because we agreed that we can
> interpret data in the new attr), _but_ if they are sent in DCB_ATTR_IEEE I
> guess we should accept them, to not break userspace that is already sending
> them.
I see, it's not currently validating at all. It just relies on the
driver to do the validation, but e.g. bnxt_dcbnl_ieee_setapp(), just
lets nonsense right through.
OK, but this interface is built on standards. The selector has a
well-defined, IEEE-backed meaning with enumerators published in the UAPI
headers. As before, even though this constitutes API breakage, IMHO if
anyone relies on shoving random garbage through this interface, it's on
them...
I think it's kosher to start bouncing undefined selectors.