2020-03-06 13:16:15

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 0/9] Introduce a flow gate control action and apply IEEE

This patch set introduces a way to add tc flower offload for the enetc
IEEE 802.1Qci (PSFP) function. PSFP implements flow policing and
filtering for ingress flows with four main feature parts: stream
identification (defined in IEEE P802.1CB proper, but required by
802.1Qci), stream filtering, stream gating and flow metering.

The stream gate function is the central part of these features, but the
qdisc filter part has no comparable action. The second patch introduces
an ingress frame gate control flow action. Creating a gate action in tc
provides a gate list that controls the gate's open/closed state: a flow
may pass while the gate is open, and is dropped while it is closed. The
driver repeats the gate list periodically. The user can also assign a
start time for the gate list with the basetime parameter; if basetime is
already in the past, the start time is calculated from the cycletime of
the gate list. The patch also introduces a software-simulated gate
control method.
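As a sketch (not part of this patch set), the basetime handling
described above could look like the following: when basetime lies in
the past, the effective start is advanced to the first cycle boundary
at or after the current time. The helper name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: normalize a gate basetime that lies in the
 * past to the first cycle start at or after "now", as the software
 * gate simulator needs to do. All times are in nanoseconds.
 */
static uint64_t gate_effective_start(uint64_t basetime, uint64_t cycletime,
				     uint64_t now)
{
	uint64_t n;

	if (basetime >= now || !cycletime)
		return basetime;

	/* whole cycles elapsed since basetime, rounded up */
	n = (now - basetime + cycletime - 1) / cycletime;

	return basetime + n * cycletime;
}
```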

The first patch fixes an issue where flow offload could not report a
dropped frame count. This is needed to retrieve the number of frames
dropped by the hardware offload.

The third patch adds gate flow offloading.

The fourth patch adds the tc offload command to enetc, controlling
on/off for the tc flower offloading.

With these in place, the enetc driver receives the gate control list,
the filter MAC address and so on. It collects these parameters to
create a stream identification entry and a stream gate control entry,
then creates a stream filter entry from those inputs, and maintains the
flow chain list. The fifth patch implements the stream gate, stream
filter and stream identification functions in the driver from the tc
flower actions and tc filter parameters.

The sixth patch extends the police action with a max frame size
parameter, preparing for PSFP per-stream filtering by frame size.

The seventh patch adds max frame size policing to the stream filtering
function.

The eighth patch passes the police action index to the driver, so the
driver knows which hardware entry polices the rate and burst size.

The ninth patch adds the flow metering function to the driver for
IEEE 802.1Qci, using the police action's 'index'/'burst'/'rate_bytes_ps'.

The iproute2 test patch will be uploaded to:

git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git

There is still work to improve, and I'd like your feedback:
- The gate action software simulator needs an admin/oper state machine.
This state machine would keep the previous operational gate list in
effect until the start time of the new admin gate list arrives. Is
admin/oper required for the software implementation?

- More PSFP flow metering parameters. Flow metering is an optional
function for a specific filter chain, and adding more parameters would
make it more complete. Flow metering provides two rates (CIR, EIR) and
two buckets (CBS, EBS) and marks flows with three colors (green,
yellow, red). The current tc flower offload only provides 'burst' and
'rate_bytes_ps' in the police action to the driver. This patch set uses
these two parameters to set one bucket and one rate. Since each flow
metering entry owns two buckets and two rate policers, the second
rate/burst pair stays disabled.
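For illustration, the one-rate, one-bucket, color-blind subset of MEF
10.3 flow metering that this series maps to 'rate_bytes_ps' (CIR) and
'burst' (CBS) can be sketched as a simple token bucket. This is a
hypothetical model, not driver or hardware code:

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative single-rate, single-bucket policer in color-blind
 * mode. CIR is in bytes per second and CBS in bytes, matching the
 * police action parameters this series forwards to the driver.
 */
struct meter {
	uint64_t cir;     /* committed rate, bytes per second */
	uint64_t cbs;     /* committed burst size, bytes */
	uint64_t tokens;  /* current bucket fill, bytes */
	uint64_t last_ns; /* timestamp of last update */
};

static bool meter_conforms(struct meter *m, uint64_t now_ns, uint32_t len)
{
	uint64_t delta = now_ns - m->last_ns;

	/* refill the bucket for the elapsed time, capped at CBS */
	m->tokens += m->cir * delta / 1000000000ull;
	if (m->tokens > m->cbs)
		m->tokens = m->cbs;
	m->last_ns = now_ns;

	if (len <= m->tokens) {
		m->tokens -= len;
		return true;  /* green: frame passes */
	}

	return false;         /* non-conforming: frame is dropped */
}
```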

Po Liu (9):
net: qos offload add flow status with dropped count
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSFP capability
net: enetc: add tc flower psfp offload driver
net: qos: add tc police offloading action with max frame size limit
net: enetc: add support max frame size for tc flower offload
net: qos: police action add index for tc flower offloading
net: enetc: add tc flower offload flow metering policing action

drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
.../ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
.../chelsio/cxgb4/cxgb4_tc_matchall.c | 2 +-
drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 183 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1228 ++++++++++++++++-
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 4 +-
.../ethernet/mellanox/mlxsw/spectrum_flower.c | 2 +-
drivers/net/ethernet/mscc/ocelot_flower.c | 2 +-
.../ethernet/netronome/nfp/flower/offload.c | 2 +-
.../ethernet/netronome/nfp/flower/qos_conf.c | 2 +-
include/net/act_api.h | 11 +-
include/net/flow_offload.h | 17 +-
include/net/pkt_cls.h | 5 +-
include/net/tc_act/tc_gate.h | 169 +++
include/net/tc_act/tc_police.h | 10 +
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 15 +
net/sched/Makefile | 1 +
net/sched/act_api.c | 12 +-
net/sched/act_ct.c | 6 +-
net/sched/act_gact.c | 7 +-
net/sched/act_gate.c | 645 +++++++++
net/sched/act_mirred.c | 6 +-
net/sched/act_police.c | 6 +-
net/sched/act_vlan.c | 6 +-
net/sched/cls_api.c | 35 +
net/sched/cls_flower.c | 3 +-
net/sched/cls_matchall.c | 3 +-
32 files changed, 2515 insertions(+), 45 deletions(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1


2020-03-06 13:17:05

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 6/9] net: qos: add tc police offloading action with max frame size limit

Current police offloading supports 'burst' and 'rate_bytes_ps'. Some
hardware has the capability to limit the frame size: if a frame is
larger than the setting, it is dropped. The police action itself
already accepts an 'mtu' parameter in the tc command, but it is not
passed through to tc flower offloading. So extend 'mtu' to tc flower
offloading.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 1 +
include/net/tc_act/tc_police.h | 10 ++++++++++
net/sched/cls_api.c | 1 +
3 files changed, 12 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 7f5a097f5072..54df87328edc 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -203,6 +203,7 @@ struct flow_action_entry {
struct { /* FLOW_ACTION_POLICE */
s64 burst;
u64 rate_bytes_ps;
+ u32 mtu;
} police;
struct { /* FLOW_ACTION_CT */
int action;
diff --git a/include/net/tc_act/tc_police.h b/include/net/tc_act/tc_police.h
index f098ad4424be..39fbf28f8f3e 100644
--- a/include/net/tc_act/tc_police.h
+++ b/include/net/tc_act/tc_police.h
@@ -69,4 +69,14 @@ static inline s64 tcf_police_tcfp_burst(const struct tc_action *act)
return params->tcfp_burst;
}

+static inline u32 tcf_police_mtu(const struct tc_action *act)
+{
+ struct tcf_police *police = to_police(act);
+ struct tcf_police_params *params;
+
+ params = rcu_dereference_protected(police->params,
+ lockdep_is_held(&police->tcf_lock));
+
+ return params->tcfp_mtu;
+}
#endif /* __NET_TC_POLICE_H */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 0ada7b2a5c2c..363d3991793d 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3583,6 +3583,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
entry->police.burst = tcf_police_tcfp_burst(act);
entry->police.rate_bytes_ps =
tcf_police_rate_bytes_ps(act);
+ entry->police.mtu = tcf_police_mtu(act);
} else if (is_tcf_ct(act)) {
entry->id = FLOW_ACTION_CT;
entry->ct.action = tcf_ct_action(act);
--
2.17.1

2020-03-06 13:17:14

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 8/9] net: qos: police action add index for tc flower offloading

Hardware may own many entries for policing flows, so one (or multiple)
flows can be policed by one hardware entry. This patch provides the
police action index to the driver side, so the driver can map it to a
hardware entry index.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 1 +
net/sched/cls_api.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 54df87328edc..3b78b15ed20b 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -201,6 +201,7 @@ struct flow_action_entry {
bool truncate;
} sample;
struct { /* FLOW_ACTION_POLICE */
+ u32 index;
s64 burst;
u64 rate_bytes_ps;
u32 mtu;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 363d3991793d..ce846a9dadc1 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3584,6 +3584,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
entry->police.rate_bytes_ps =
tcf_police_rate_bytes_ps(act);
entry->police.mtu = tcf_police_mtu(act);
+ entry->police.index = act->tcfa_index;
} else if (is_tcf_ct(act)) {
entry->id = FLOW_ACTION_CT;
entry->ct.action = tcf_ct_action(act);
--
2.17.1

2020-03-06 13:17:16

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 7/9] net: enetc: add support max frame size for tc flower offload

Based on the tc flower offload police action, add max frame size via
the parameter 'mtu'. A tc flower device driver working with the IEEE
802.1Qci stream filter can implement max frame size filtering with it.
Add it to the current hardware tc flower stream filter driver.

The limitation is that the police action must also supply a burst size
and a rate per second; for now the driver ignores the 'burst' and
'rate_bytes_ps' parameters. The next driver patch will wire burst and
rate to the flow metering function.
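The check the stream filter performs once 'mtu' is programmed as the
per-stream max SDU can be sketched as below. This is an illustrative
model of the hardware behavior, not driver code; a maxsdu of 0 means
the check is disabled (the pre-patch behavior):

```c
#include <stdint.h>
#include <stdbool.h>

/* Illustrative per-stream max-SDU check: a frame passes when the
 * check is disabled (maxsdu == 0) or the frame fits within maxsdu.
 */
static bool psfp_sdu_pass(uint32_t maxsdu, uint32_t frame_len)
{
	return !maxsdu || frame_len <= maxsdu;
}
```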

Signed-off-by: Po Liu <[email protected]>
---
.../net/ethernet/freescale/enetc/enetc_qos.c | 52 +++++++++++++------
1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 3ef46190d71d..278f1603b00a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -388,6 +388,7 @@ struct enetc_psfp_filter {
u32 index;
s32 handle;
s8 prio;
+ u32 maxsdu;
u32 gate_id;
s32 meter_id;
u32 refcount;
@@ -429,6 +430,12 @@ struct actions_fwd enetc_act_fwd[] = {
BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
FILTER_ACTION_TYPE_PSFP
},
+ {
+ BIT(FLOW_ACTION_POLICE) |
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
/* example for ACL actions */
{
BIT(FLOW_ACTION_DROP),
@@ -593,8 +600,12 @@ static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
/* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
* field as being either an MSDU value or an index into the Flow
* Meter Instance table.
- * TODO: no limit max sdu
*/
+ if (sfi->maxsdu) {
+ sfi_config->msdu =
+ cpu_to_le16(sfi->maxsdu);
+ sfi_config->multi |= 0x40;
+ }

if (sfi->meter_id >= 0) {
sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
@@ -860,6 +871,7 @@ static struct enetc_psfp_filter
hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
if (s->gate_id == sfi->gate_id &&
s->prio == sfi->prio &&
+ s->maxsdu == sfi->maxsdu &&
s->meter_id == sfi->meter_id)
return s;

@@ -965,6 +977,7 @@ struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
struct flow_cls_offload *f)
{
+ struct flow_action_entry *entryg = NULL, *entryp = NULL;
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct netlink_ext_ack *extack = f->common.extack;
struct enetc_stream_filter *filter, *old_filter;
@@ -983,9 +996,12 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

flow_action_for_each(i, entry, &rule->action)
if (entry->id == FLOW_ACTION_GATE)
- break;
+ entryg = entry;
+ else if (entry->id == FLOW_ACTION_POLICE)
+ entryp = entry;

- if (entry->id != FLOW_ACTION_GATE)
+ /* Not support without gate action */
+ if (!entryg)
return -EINVAL;

filter = kzalloc(sizeof(*filter), GFP_KERNEL);
@@ -1044,37 +1060,37 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

/* parsing gate action */
- if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ if (entryg->gate.index >= priv->psfp_cap.max_psfp_gate) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
err = -ENOSPC;
goto free_filter;
}

- if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ if (entryg->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
err = -ENOSPC;
goto free_filter;
}

- entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ entries_size = struct_size(sgi, entries, entryg->gate.num_entries);
sgi = kzalloc(entries_size, GFP_KERNEL);
if (!sgi) {
err = -ENOMEM;
goto free_filter;
}

- sgi->index = entry->gate.index;
- sgi->init_ipv = entry->gate.prio;
- sgi->basetime = entry->gate.basetime;
- sgi->cycletime = entry->gate.cycletime;
- sgi->num_entries = entry->gate.num_entries;
+ sgi->index = entryg->gate.index;
+ sgi->init_ipv = entryg->gate.prio;
+ sgi->basetime = entryg->gate.basetime;
+ sgi->cycletime = entryg->gate.cycletime;
+ sgi->num_entries = entryg->gate.num_entries;

e = sgi->entries;
- for (i = 0; i < entry->gate.num_entries; i++) {
- e[i].gate_state = entry->gate.entries[i].gate_state;
- e[i].interval = entry->gate.entries[i].interval;
- e[i].ipv = entry->gate.entries[i].ipv;
- e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ for (i = 0; i < entryg->gate.num_entries; i++) {
+ e[i].gate_state = entryg->gate.entries[i].gate_state;
+ e[i].interval = entryg->gate.entries[i].interval;
+ e[i].ipv = entryg->gate.entries[i].ipv;
+ e[i].maxoctets = entryg->gate.entries[i].maxoctets;
}

filter->sgi_index = sgi->index;
@@ -1090,6 +1106,10 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
/* flow meter not support yet */
sfi->meter_id = ENETC_PSFP_WILDCARD;

+ /* Max frame size */
+ if (entryp)
+ sfi->maxsdu = entryp->police.mtu;
+
/* prio ref the filter prio */
if (f->common.prio && f->common.prio <= BIT(3))
sfi->prio = f->common.prio - 1;
--
2.17.1

2020-03-06 13:17:19

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 9/9] net: enetc: add tc flower offload flow metering policing action

Flow metering in IEEE 802.1Qci is an optional function of the flow
filtering module. Flow metering uses two rates, two buckets and a
three-color marker to police frames. This patch only enables one rate,
one bucket, in color-blind mode. Flow metering instances are as
specified by the algorithm in MEF 10.3 and its Bandwidth Profile
Parameters. They are:

a) Flow meter instance identifier. An integer value identifying the flow
meter instance. This patch uses the police 'index' as this value.
b) Committed Information Rate (CIR), in bits per second. This patch uses
'rate_bytes_ps' to represent this value.
c) Committed Burst Size (CBS), in octets. This patch uses 'burst' to
represent this value.
d) Excess Information Rate (EIR), in bits per second.
e) Excess Burst Size per Bandwidth Profile Flow (EBS), in octets.
There are some further parameters as well. This patch leaves EIR/EBS
disabled by default and uses color-blind mode.
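The CIR conversion in enetc_flowmeter_hw_set() below scales the police
rate (bytes per second) by 8000/3725 before writing it to the flow
meter table; the 3725 divisor is taken from the patch as the hardware's
rate granularity. Extracted as a standalone helper for illustration:

```c
#include <stdint.h>

/* Mirrors the CIR scaling in enetc_flowmeter_hw_set(): the police
 * rate arrives in bytes per second and is converted to hardware rate
 * units via the factor 8000/3725, computed in 64 bits to avoid
 * overflow for large rates.
 */
static uint32_t enetc_cir_to_hw(uint32_t cir_bytes_ps)
{
	uint64_t temp = (uint64_t)8000 * cir_bytes_ps;

	return (uint32_t)(temp / 3725);
}
```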

Signed-off-by: Po Liu <[email protected]>
---
.../net/ethernet/freescale/enetc/enetc_hw.h | 24 +++
.../net/ethernet/freescale/enetc/enetc_qos.c | 150 ++++++++++++++++--
2 files changed, 164 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 640099f48a0d..74eac8ea6336 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -573,6 +573,7 @@ enum bdcr_cmd_class {
BDCR_CMD_STREAM_IDENTIFY,
BDCR_CMD_STREAM_FILTER,
BDCR_CMD_STREAM_GCL,
+ BDCR_CMD_FLOW_METER,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -739,10 +740,33 @@ struct sgcl_data {
struct sgce sgcl[0];
};

+#define ENETC_CBDR_FMI_MR BIT(0)
+#define ENETC_CBDR_FMI_MREN BIT(1)
+#define ENETC_CBDR_FMI_DOY BIT(2)
+#define ENETC_CBDR_FMI_CM BIT(3)
+#define ENETC_CBDR_FMI_CF BIT(4)
+#define ENETC_CBDR_FMI_NDOR BIT(5)
+#define ENETC_CBDR_FMI_OALEN BIT(6)
+#define ENETC_CBDR_FMI_IRFPP_MASK GENMASK(4, 0)
+
+/* class 10: command 0/1, Flow Meter Instance Set, short Format */
+struct fmi_conf {
+ __le32 cir;
+ __le32 cbs;
+ __le32 eir;
+ __le32 ebs;
+ u8 conf;
+ u8 res1;
+ u8 ir_fpp;
+ u8 res2[4];
+ u8 en;
+};
+
struct enetc_cbd {
union{
struct sfi_conf sfi_conf;
struct sgi_table sgi_table;
+ struct fmi_conf fmi_conf;
struct {
__le32 addr[2];
union {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 278f1603b00a..8670ab395856 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -407,10 +407,26 @@ struct enetc_psfp_gate {
struct action_gate_entry entries[0];
};

+/* Only enable the green color frame now
+ * Will add eir and ebs color blind, couple flag etc when
+ * policing action add more offloading parameters
+ */
+struct enetc_psfp_meter {
+ u32 index;
+ u32 cir;
+ u32 cbs;
+ u32 refcount;
+ struct hlist_node node;
+};
+
+#define ENETC_PSFP_FLAGS_FMI BIT(0)
+
struct enetc_stream_filter {
struct enetc_streamid sid;
u32 sfi_index;
u32 sgi_index;
+ u32 flags;
+ u32 fmi_index;
struct flow_stats stats;
struct hlist_node node;
};
@@ -421,6 +437,7 @@ struct enetc_psfp {
struct hlist_head stream_list;
struct hlist_head psfp_filter_list;
struct hlist_head psfp_gate_list;
+ struct hlist_head psfp_meter_list;
spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
};

@@ -830,6 +847,47 @@ static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
return err;
}

+static int enetc_flowmeter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_meter *fmi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct fmi_conf *fmi_config;
+ u64 temp = 0;
+
+ cbd.index = cpu_to_le16((u16)fmi->index);
+ cbd.cls = BDCR_CMD_FLOW_METER;
+ cbd.status_flags = 0x80;
+
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ fmi_config = &cbd.fmi_conf;
+ fmi_config->en = 0x80;
+
+ if (fmi->cir) {
+ temp = (u64)8000 * fmi->cir;
+ temp = temp / 3725;
+ }
+
+ fmi_config->cir = cpu_to_le32((u32)temp);
+ fmi_config->cbs = cpu_to_le32(fmi->cbs);
+
+ /* Default for eir ebs disable */
+ fmi_config->eir = 0;
+ fmi_config->ebs = 0;
+
+ /* Default:
+ * mark red disable
+ * drop on yellow disable
+ * color mode disable
+ * couple flag disable
+ */
+ fmi_config->conf = 0;
+
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
{
struct enetc_stream_filter *f;
@@ -863,6 +921,17 @@ static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
return NULL;
}

+static struct enetc_psfp_meter *enetc_get_meter_by_index(u32 index)
+{
+ struct enetc_psfp_meter *m;
+
+ hlist_for_each_entry(m, &epsfp.psfp_meter_list, node)
+ if (m->index == index)
+ return m;
+
+ return NULL;
+}
+
static struct enetc_psfp_filter
*enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
{
@@ -921,9 +990,28 @@ static void reduce_ref_sgi(struct enetc_ndev_priv *priv, u32 index)
}
}

+static void reduce_ref_fmi(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_meter *fmi;
+
+ fmi = enetc_get_meter_by_index(index);
+ if (!fmi)
+ return;
+
+ fmi->refcount--;
+
+ if (!fmi->refcount) {
+ enetc_flowmeter_hw_set(priv, fmi, false);
+ hlist_del(&fmi->node);
+ kfree(fmi);
+ }
+}
+
static void remove_one_chain(struct enetc_ndev_priv *priv,
struct enetc_stream_filter *filter)
{
+ if (filter->flags & ENETC_PSFP_FLAGS_FMI)
+ reduce_ref_fmi(priv, filter->fmi_index);
reduce_ref_sgi(priv, filter->sgi_index);
reduce_ref_sfi(priv, filter->sfi_index);

@@ -934,7 +1022,8 @@ static void remove_one_chain(struct enetc_ndev_priv *priv,
static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
struct enetc_streamid *sid,
struct enetc_psfp_filter *sfi,
- struct enetc_psfp_gate *sgi)
+ struct enetc_psfp_gate *sgi,
+ struct enetc_psfp_meter *fmi)
{
int err;

@@ -952,8 +1041,16 @@ static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
if (err)
goto revert_sfi;

+ if (fmi) {
+ err = enetc_flowmeter_hw_set(priv, fmi, true);
+ if (err)
+ goto revert_sgi;
+ }
+
return 0;

+revert_sgi:
+ enetc_streamgate_hw_set(priv, sgi, false);
revert_sfi:
if (sfi && !sfi->refcount)
enetc_streamfilter_hw_set(priv, sfi, false);
@@ -981,6 +1078,7 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct netlink_ext_ack *extack = f->common.extack;
struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_meter *fmi = NULL, *old_fmi;
struct enetc_psfp_filter *sfi, *old_sfi;
struct enetc_psfp_gate *sgi, *old_sgi;
struct flow_action_entry *entry;
@@ -1095,20 +1193,35 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

filter->sgi_index = sgi->index;

+ /* flow meter */
+ if (entryp) {
+ fmi = kzalloc(sizeof(*fmi), GFP_KERNEL);
+ if (!fmi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+ fmi->cir = entryp->police.rate_bytes_ps;
+ fmi->cbs = entryp->police.burst;
+ fmi->index = entryp->police.index;
+ filter->flags |= ENETC_PSFP_FLAGS_FMI;
+ filter->fmi_index = fmi->index;
+ }
+
sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
if (!sfi) {
err = -ENOMEM;
- goto free_gate;
+ goto free_fmi;
}

sfi->gate_id = sgi->index;

- /* flow meter not support yet */
- sfi->meter_id = ENETC_PSFP_WILDCARD;
-
/* Max frame size */
- if (entryp)
+ if (entryp) {
sfi->maxsdu = entryp->police.mtu;
+ sfi->meter_id = fmi->index;
+ } else {
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+ }

/* prio ref the filter prio */
if (f->common.prio && f->common.prio <= BIT(3))
@@ -1140,11 +1253,22 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

err = enetc_psfp_hw_set(priv, &filter->sid,
- sfi_overwrite ? NULL : sfi, sgi);
+ sfi_overwrite ? NULL : sfi, sgi, fmi);
if (err)
goto free_sfi;

spin_lock(&epsfp.psfp_lock);
+ if (filter->flags & ENETC_PSFP_FLAGS_FMI) {
+ old_fmi = enetc_get_meter_by_index(filter->fmi_index);
+ if (old_fmi) {
+ fmi->refcount = old_fmi->refcount;
+ hlist_del(&old_fmi->node);
+ kfree(old_fmi);
+ }
+ fmi->refcount++;
+ hlist_add_head(&fmi->node, &epsfp.psfp_meter_list);
+ }
+
old_sgi = enetc_get_gate_by_index(filter->sgi_index);
if (old_sgi) {
sgi->refcount = old_sgi->refcount;
@@ -1177,6 +1301,8 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

free_sfi:
kfree(sfi);
+free_fmi:
+ kfree(fmi);
free_gate:
kfree(sgi);
free_filter:
@@ -1273,9 +1399,13 @@ static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
return -EINVAL;

spin_lock(&epsfp.psfp_lock);
- stats.pkts = counters.matching_frames_count - filter->stats.pkts;
- stats.dropped = counters.not_passing_frames_count -
- filter->stats.dropped;
+ stats.pkts = counters.matching_frames_count +
+ counters.not_passing_sdu_count -
+ filter->stats.pkts;
+ stats.dropped = counters.not_passing_frames_count +
+ counters.not_passing_sdu_count +
+ counters.red_frames_count -
+ filter->stats.dropped;
stats.lastused = filter->stats.lastused;
filter->stats.pkts += stats.pkts;
filter->stats.dropped += stats.dropped;
--
2.17.1

2020-03-06 13:17:19

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 3/9] net: schedule: add action gate offloading

Add the gate action to the flow action entry. Add the gate parameters
to tc_setup_flow_action(), queueing them into the flow_action_entry
array provided to the driver.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 +++
include/net/tc_act/tc_gate.h | 115 +++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++
3 files changed, 158 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index eb013ffc24f3..7f5a097f5072 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -138,6 +138,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -223,6 +224,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index 932a2b91b944..d93fa0d6f516 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -51,4 +58,112 @@ struct tcf_gate {
#define get_gate_param(act) ((struct tcf_gate_params *)act)
#define get_gate_action(p) ((struct gate_action *)p)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ rcu_read_lock();
+ tcfg_prio = rcu_dereference(to_gate(a)->actg)->param.tcfg_priority;
+ rcu_read_unlock();
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ rcu_read_lock();
+ tcfg_basetime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_basetime;
+ rcu_read_unlock();
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ rcu_read_lock();
+ tcfg_cycletime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime;
+ rcu_read_unlock();
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ rcu_read_lock();
+ tcfg_cycletimeext =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime_ext;
+ rcu_read_unlock();
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ rcu_read_lock();
+ num_entries =
+ rcu_dereference(to_gate(a)->actg)->param.num_entries;
+ rcu_read_unlock();
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ rcu_read_lock();
+ p = &(rcu_dereference(to_gate(a)->actg)->param);
+ num_entries = p->num_entries;
+ rcu_read_unlock();
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries)
+ return NULL;
+
+ oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
+ if (!oe)
+ return NULL;
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 4e766c5ab77a..0ada7b2a5c2c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -38,6 +38,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3458,6 +3459,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3592,6 +3614,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_ptype(act)) {
entry->id = FLOW_ACTION_PTYPE;
entry->ptype = tcf_skbedit_ptype(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-03-06 13:17:21

by Po Liu

[permalink] [raw]
Subject: [RFC,net-next 4/9] net: enetc: add hw tc hw offload features for PSPF capability

This patch lets ethtool enable/disable the tc flower offload feature.
The enetc hardware has the PSFP feature for per-stream policing. When
the tc hw offloading feature is enabled, the driver enables the IEEE
802.1Qci feature. It only sets the register enable bit for this feature
and reads the capability limits of each sub-feature; it does not
program any per-stream filtering, stream gate or stream identification
entries.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 23 ++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 55 +++++++++++++++++++
.../net/ethernet/freescale/enetc/enetc_hw.h | 17 ++++++
.../net/ethernet/freescale/enetc/enetc_pf.c | 8 +++
4 files changed, 103 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 1f79e36116a3..d810651317e1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -762,6 +762,9 @@ void enetc_get_si_caps(struct enetc_si *si)

if (val & ENETC_SIPCAPR0_QBV)
si->hw_features |= ENETC_SI_F_QBV;
+
+ if (val & ENETC_SIPCAPR0_PSFP)
+ si->hw_features |= ENETC_SI_F_PSFP;
}

static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
@@ -1566,6 +1569,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
return 0;
}

+static int enetc_set_psfp(struct net_device *ndev, int en)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ if (en) {
+ priv->active_offloads |= ENETC_F_QCI;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&priv->si->hw);
+ } else {
+ priv->active_offloads &= ~ENETC_F_QCI;
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+ enetc_psfp_disable(&priv->si->hw);
+ }
+
+ return 0;
+}
+
int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1574,6 +1594,9 @@ int enetc_set_features(struct net_device *ndev,
if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

+ if (changed & NETIF_F_HW_TC)
+ enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 9938f7a5fc0a..bcdade8f7b8a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -120,6 +120,7 @@ enum enetc_errata {
};

#define ENETC_SI_F_QBV BIT(0)
+#define ENETC_SI_F_PSFP BIT(1)

/* PCI IEP device data */
struct enetc_si {
@@ -172,12 +173,20 @@ struct enetc_cls_rule {
};

#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */
+struct psfp_cap {
+ u32 max_streamid;
+ u32 max_psfp_filter;
+ u32 max_psfp_gate;
+ u32 max_psfp_gatelist;
+ u32 max_psfp_meter;
+};

/* TODO: more hardware offloads */
enum enetc_active_offloads {
ENETC_F_RX_TSTAMP = BIT(0),
ENETC_F_TX_TSTAMP = BIT(1),
ENETC_F_QBV = BIT(2),
+ ENETC_F_QCI = BIT(3),
};

struct enetc_ndev_priv {
@@ -200,6 +209,8 @@ struct enetc_ndev_priv {

struct enetc_cls_rule *cls_rules;

+ struct psfp_cap psfp_cap;
+
struct device_node *phy_node;
phy_interface_t if_mode;
};
@@ -258,9 +269,53 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+
+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
+{
+ u32 reg = 0;
+
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
+ /* Port stream filter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
+ /* Port stream gate capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
+ /* Port flow meter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
+}
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
+ | ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS
+ | ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
+ & ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS
+ & ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+}
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_get_max_cap(p) \
+ memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ return 0;
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ return 0;
+}
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index da134e211c1a..99d520207069 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -19,6 +19,7 @@
#define ENETC_SICTR1 0x1c
#define ENETC_SIPCAPR0 0x20
#define ENETC_SIPCAPR0_QBV BIT(4)
#define ENETC_SIPCAPR0_RSS BIT(8)
+#define ENETC_SIPCAPR0_PSFP BIT(9)
#define ENETC_SIPCAPR1 0x24
#define ENETC_SITGTGR 0x30
@@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11))
#define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1))
#define ENETC_PM0_IFM_XGMII BIT(12)
+#define ENETC_PSIDCAPR 0x1b08
+#define ENETC_PSIDCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSFCAPR 0x1b18
+#define ENETC_PSFCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSGCAPR 0x1b28
+#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16)
+#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0)
+#define ENETC_PFMCAPR 0x1b38
+#define ENETC_PFMCAPR_MSK GENMASK(15, 0)

/* MAC counters */
#define ENETC_PM0_REOCT 0x8100
@@ -624,3 +634,10 @@ struct enetc_cbd {
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
#define ENETC_TSDE BIT(31)
+
+/* PSFP setting */
+#define ENETC_PPSFPMR 0x11b00
+#define ENETC_PPSFPMR_PSFPEN BIT(0)
+#define ENETC_PPSFPMR_VS BIT(1)
+#define ENETC_PPSFPMR_PVC BIT(2)
+#define ENETC_PPSFPMR_PVZC BIT(3)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 545a344bce00..d880cbdc0d2e 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -740,6 +740,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

+ if (si->hw_features & ENETC_SI_F_PSFP) {
+ priv->active_offloads |= ENETC_F_QCI;
+ ndev->features |= NETIF_F_HW_TC;
+ ndev->hw_features |= NETIF_F_HW_TC;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&si->hw);
+ }
+
/* pick up primary MAC address from SI */
enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
}
--
2.17.1
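With this series applied, PSFP offload is tied to the NETIF_F_HW_TC feature
flag, so it can be toggled from user space once the SI advertises the PSFP
capability. A hypothetical session (the interface name eth0 is an assumption):

```shell
# Enable tc flower hardware offload; this reaches enetc_set_psfp(ndev, 1),
# which reads the PSFP capability registers and sets the enable bits
# in ENETC_PPSFPMR.
ethtool -K eth0 hw-tc-offload on

# Inspect the current state of the feature flag.
ethtool -k eth0 | grep hw-tc-offload

# Disable again; the driver clears ENETC_F_QCI and the cached capabilities.
ethtool -K eth0 hw-tc-offload off
```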

2020-03-06 13:17:21

by Po Liu

Subject: [RFC,net-next 5/9] net: enetc: add tc flower psfp offload driver

This patch adds tc flower offload for the ENETC IEEE 802.1Qci (PSFP)
function. Four feature blocks implement flow policing and filtering
for ingress flows with IEEE 802.1Qci: stream identification (defined
in IEEE 802.1CB but required by 802.1Qci), stream filtering, stream
gating and flow metering. Each block holds many entries, addressed by
index, which carry the configured parameters.

A frame is first matched by the stream identification block, then
enters the stream filter block via the stream handle shared between
the two blocks. The stream filter entry selects a stream gate entry,
so the frame is policed by the gate and, optionally, limited by the
max SDU configured in the filter entry. Finally it is policed by the
flow metering block, whose entry index is also chosen in the filter
entry. An entry of a lower block may therefore be linked from many
upper entries: streams configured with the same index share the same
stream filter, stream gate or flow meter.

To implement this, each stream filtered by source/destination MAC
address, optionally together with a VLAN ID, is treated as one flow
chain, identified by the chain_index that already exists in the tc
filter concept. The driver maintains this chain together with the
gate modules. A stream filter entry is created from the gate index,
the (optional) flow meter entry id and a priority value.

The offload only transfers the gate action and the flow filtering
parameters. The driver creates a stream filter entry (or reuses an
existing one with the same gate id, flow meter id and priority) and
programs it into the hardware, so stream filtering does not need to
be transferred as a separate offloaded action. This mirrors the
relationship between tc filters and actions: tc filters maintain the
per-flow lists keyed by the match keys, while actions are kept on the
action list.

Below is an example tc command sequence:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry OPEN 200000000 -1 -1 \
sched-entry CLOSE 100000000 -1 -1
(or create a gate list with a single CLOSE entry to keep the gate closed)

This binds dst_mac 10:00:80:00:00:00 to entry 11 of the stream
identification module and to entry 10 of the stream gate module.
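When no basetime is supplied (basetime == 0), the driver picks the first
start time itself by rounding the current hardware time up to the next
cycle boundary, as get_start_ns() does in this patch. A minimal sketch of
that rule (a hypothetical user-space helper, not part of the series):

```shell
# First gate-list start time (ns) strictly after "now", mirroring
# get_start_ns(): n = now / cycle; start = (n + 1) * cycle.
gate_start_ns() {
    now=$1
    cycle=$2                     # must be non-zero, as the driver checks
    echo $(( (now / cycle + 1) * cycle ))
}

# 300 ms cycle: 200 ms OPEN + 100 ms CLOSE as in the example above
gate_start_ns 1000000000 300000000   # -> 1200000000
```

So a gate list with the 300 ms cycle above, programmed when the hardware
clock reads 1 s, would first start running at 1.2 s.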

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 39 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
.../net/ethernet/freescale/enetc/enetc_qos.c | 1078 ++++++++++++++++-
5 files changed, 1269 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index d810651317e1..df2e77619f64 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1520,6 +1520,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
return enetc_setup_tc_cbs(ndev, type_data);
case TC_SETUP_QDISC_ETF:
return enetc_setup_tc_txtime(ndev, type_data);
+ case TC_SETUP_BLOCK:
+ return enetc_setup_tc_psfp(ndev, type_data);
default:
return -EOPNOTSUPP;
}
@@ -1572,17 +1574,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static int enetc_set_psfp(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ int err;

if (en) {
+ err = enetc_psfp_enable(priv);
+ if (err)
+ return err;
+
priv->active_offloads |= ENETC_F_QCI;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&priv->si->hw);
- } else {
- priv->active_offloads &= ~ENETC_F_QCI;
- memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
- enetc_psfp_disable(&priv->si->hw);
+ return 0;
}

+ err = enetc_psfp_disable(priv);
+ if (err)
+ return err;
+
+ priv->active_offloads &= ~ENETC_F_QCI;
+
return 0;
}

@@ -1590,14 +1598,15 @@ int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
netdev_features_t changed = ndev->features ^ features;
+ int err = 0;

if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

if (changed & NETIF_F_HW_TC)
- enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+ err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));

- return 0;
+ return err;
}

#ifdef CONFIG_FSL_ENETC_HW_TIMESTAMPING
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index bcdade8f7b8a..f1a9a4cf1914 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -269,6 +269,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv);
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
+int enetc_psfp_init(struct enetc_ndev_priv *priv);
+int enetc_psfp_clean(struct enetc_ndev_priv *priv);

static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
@@ -288,33 +293,59 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ enetc_get_max_cap(priv);
+
+ err = enetc_psfp_init(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
| ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS
| ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+
+ return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ err = enetc_psfp_clean(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
& ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS
& ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+
+ return 0;
}
+
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_block_cb NULL
+
#define enetc_get_max_cap(p) \
memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
return 0;
}
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 99d520207069..640099f48a0d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -570,6 +570,9 @@ enum bdcr_cmd_class {
BDCR_CMD_RFS,
BDCR_CMD_PORT_GCL,
BDCR_CMD_RECV_CLASSIFIER,
+ BDCR_CMD_STREAM_IDENTIFY,
+ BDCR_CMD_STREAM_FILTER,
+ BDCR_CMD_STREAM_GCL,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -601,13 +604,152 @@ struct tgs_gcl_data {
struct gce entry[];
};

+/* class 7, command 0, Stream Identity Entry Configuration */
+struct streamid_conf {
+ __le32 stream_handle; /* init gate value */
+ __le32 iports;
+ u8 id_type;
+ u8 oui[3];
+ u8 res[3];
+ u8 en;
+};
+
+#define ENETC_CBDR_SID_VID_MASK 0xfff
+#define ENETC_CBDR_SID_VIDM BIT(12)
+#define ENETC_CBDR_SID_TG_MASK 0xc000
+/* streamid_conf address points to this data space */
+struct streamid_data {
+ union {
+ u8 dmac[6];
+ u8 smac[6];
+ };
+ u16 vid_vidm_tg;
+};
+
+#define ENETC_CBDR_SFI_PRI_MASK 0x7
+#define ENETC_CBDR_SFI_PRIM BIT(3)
+#define ENETC_CBDR_SFI_BLOV BIT(4)
+#define ENETC_CBDR_SFI_BLEN BIT(5)
+#define ENETC_CBDR_SFI_MSDUEN BIT(6)
+#define ENETC_CBDR_SFI_FMITEN BIT(7)
+#define ENETC_CBDR_SFI_ENABLE BIT(7)
+/* class 8, command 0, Stream Filter Instance, Short Format */
+struct sfi_conf {
+ __le32 stream_handle;
+ u8 multi;
+ u8 res[2];
+ u8 sthm;
+ /* Max Service Data Unit or Flow Meter Instance Table index.
+ * Depending on the value of FLT this represents either Max
+ * Service Data Unit (max frame size) allowed by the filter
+ * entry or is an index into the Flow Meter Instance table
+ * index identifying the policer which will be used to police
+ * it.
+ */
+ __le16 fm_inst_table_index;
+ __le16 msdu;
+ __le16 sg_inst_table_index;
+ u8 res1[2];
+ __le32 input_ports;
+ u8 res2[3];
+ u8 en;
+};
+
+/* class 8, command 2 stream Filter Instance status query short format
+ * command no need structure define
+ * Stream Filter Instance Query Statistics Response data
+ */
+struct sfi_counter_data {
+ __le32 matchl;
+ __le32 matchh;
+ __le32 msdu_dropl;
+ __le32 msdu_droph;
+ __le32 stream_gate_dropl;
+ __le32 stream_gate_droph;
+ __le32 flow_meter_dropl;
+ __le32 flow_meter_droph;
+};
+
+#define ENETC_CBDR_SGI_OIPV_MASK 0x7
+#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_CGTST BIT(6)
+#define ENETC_CBDR_SGI_OGTST BIT(7)
+#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
+#define ENETC_CBDR_SGI_CFG_PND BIT(2)
+#define ENETC_CBDR_SGI_OEX BIT(4)
+#define ENETC_CBDR_SGI_OEXEN BIT(5)
+#define ENETC_CBDR_SGI_IRX BIT(6)
+#define ENETC_CBDR_SGI_IRXEN BIT(7)
+#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
+#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
+#define ENETC_CBDR_SGI_EN BIT(7)
+/* class 9, command 0, Stream Gate Instance Table, Short Format
+ * class 9, command 2, Stream Gate Instance Table entry query write back
+ * Short Format
+ */
+struct sgi_table {
+ u8 res[8];
+ u8 oipv;
+ u8 res0[2];
+ u8 ocgtst;
+ u8 res1[7];
+ u8 gset;
+ u8 oacl_len;
+ u8 res2[2];
+ u8 en;
+};
+
+#define ENETC_CBDR_SGI_AIPV_MASK 0x7
+#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_AGTST BIT(7)
+
+/* class 9, command 1, Stream Gate Control List, Long Format */
+struct sgcl_conf {
+ u8 aipv;
+ u8 res[2];
+ u8 agtst;
+ u8 res1[4];
+ union {
+ struct {
+ u8 res2[4];
+ u8 acl_len;
+ u8 res3[3];
+ };
+ u8 cct[8]; /* Config change time */
+ };
+};
+
+#define ENETC_CBDR_SGL_IOMEN BIT(0)
+#define ENETC_CBDR_SGL_IPVEN BIT(3)
+#define ENETC_CBDR_SGL_GTST BIT(4)
+#define ENETC_CBDR_SGL_IPV_MASK 0xe
+/* Stream Gate Control List Entry */
+struct sgce {
+ __le32 interval;
+ u8 msdu[3];
+ u8 multi;
+};
+
+/* Stream Gate Control List, class 9, cmd 1 data buffer */
+struct sgcl_data {
+ __le32 btl;
+ __le32 bth;
+ __le32 ct;
+ __le32 cte;
+ struct sgce sgcl[];
+};
+
struct enetc_cbd {
union{
+ struct sfi_conf sfi_conf;
+ struct sgi_table sgi_table;
struct {
__le32 addr[2];
union {
__le32 opt[4];
struct tgs_gcl_conf gcl_conf;
+ struct streamid_conf sid_set;
+ struct sgcl_conf sgcl_conf;
};
}; /* Long format */
__le32 data[6];
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index d880cbdc0d2e..a19095ab7b41 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -740,12 +740,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

- if (si->hw_features & ENETC_SI_F_PSFP) {
+ if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
priv->active_offloads |= ENETC_F_QCI;
ndev->features |= NETIF_F_HW_TC;
ndev->hw_features |= NETIF_F_HW_TC;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&si->hw);
}

/* pick up primary MAC address from SI */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 0c6bf3a55a9a..3ef46190d71d 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -5,6 +5,8 @@

#include <net/pkt_sched.h>
#include <linux/math64.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>

static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
@@ -108,13 +110,13 @@ static int enetc_setup_taprio(struct net_device *ndev,
gcl_data->cte = cpu_to_le32(admin_conf->cycle_time_extension);

for (i = 0; i < gcl_len; i++) {
- struct tc_taprio_sched_entry *temp_entry;
+ struct tc_taprio_sched_entry *to;
struct gce *temp_gce = gce + i;

- temp_entry = &admin_conf->entries[i];
+ to = &admin_conf->entries[i];

- temp_gce->gate = (u8)temp_entry->gate_mask;
- temp_gce->period = cpu_to_le32(temp_entry->interval);
+ temp_gce->gate = (u8)to->gate_mask;
+ temp_gce->period = cpu_to_le32(to->interval);
}

cbd.length = cpu_to_le16(data_size);
@@ -331,3 +333,1071 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)

return 0;
}
+
+enum streamid_type {
+ STREAMID_TYPE_RESERVED = 0,
+ STREAMID_TYPE_NULL,
+ STREAMID_TYPE_SMAC,
+};
+
+enum streamid_vlan_tagged {
+ STREAMID_VLAN_RESERVED = 0,
+ STREAMID_VLAN_TAGGED,
+ STREAMID_VLAN_UNTAGGED,
+ STREAMID_VLAN_ALL,
+};
+
+#define ENETC_PSFP_WILDCARD -1
+#define HANDLE_OFFSET 100
+
+enum forward_type {
+ FILTER_ACTION_TYPE_PSFP = BIT(0),
+ FILTER_ACTION_TYPE_ACL = BIT(1),
+ FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
+};
+
+/* Limits the permitted output type for a given set of input actions */
+struct actions_fwd {
+ u64 actions;
+ u64 keys; /* include the must needed keys */
+ enum forward_type output;
+};
+
+struct psfp_streamfilter_counters {
+ u64 matching_frames_count;
+ u64 passing_frames_count;
+ u64 not_passing_frames_count;
+ u64 passing_sdu_count;
+ u64 not_passing_sdu_count;
+ u64 red_frames_count;
+};
+
+struct enetc_streamid {
+ u32 index;
+ union {
+ u8 src_mac[6];
+ u8 dst_mac[6];
+ };
+ u8 filtertype;
+ u16 vid;
+ u8 tagged;
+ s32 handle;
+};
+
+struct enetc_psfp_filter {
+ u32 index;
+ s32 handle;
+ s8 prio;
+ u32 gate_id;
+ s32 meter_id;
+ u32 refcount;
+ struct hlist_node node;
+};
+
+struct enetc_psfp_gate {
+ u32 index;
+ s8 init_ipv;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimext;
+ u32 num_entries;
+ u32 refcount;
+ struct hlist_node node;
+ struct action_gate_entry entries[];
+};
+
+struct enetc_stream_filter {
+ struct enetc_streamid sid;
+ u32 sfi_index;
+ u32 sgi_index;
+ struct flow_stats stats;
+ struct hlist_node node;
+};
+
+struct enetc_psfp {
+ unsigned long dev_bitmap;
+ unsigned long *psfp_sfi_bitmap;
+ struct hlist_head stream_list;
+ struct hlist_head psfp_filter_list;
+ struct hlist_head psfp_gate_list;
+ spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
+};
+
+static struct actions_fwd enetc_act_fwd[] = {
+ {
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
+ /* example for ACL actions */
+ {
+ BIT(FLOW_ACTION_DROP),
+ 0,
+ FILTER_ACTION_TYPE_ACL
+ }
+};
+
+static struct enetc_psfp epsfp = {
+ .psfp_sfi_bitmap = NULL,
+};
+
+static LIST_HEAD(enetc_block_cb_list);
+
+static inline int enetc_get_port(struct enetc_ndev_priv *priv)
+{
+ return priv->si->pdev->devfn & 0x7;
+}
+
+/* Stream Identity Entry Set Descriptor */
+static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct streamid_data *si_data;
+ struct streamid_conf *si_conf;
+ u16 data_size;
+ dma_addr_t dma;
+ int err;
+
+ if (sid->index >= priv->psfp_cap.max_streamid)
+ return -EINVAL;
+
+ if (sid->filtertype != STREAMID_TYPE_NULL &&
+ sid->filtertype != STREAMID_TYPE_SMAC)
+ return -EOPNOTSUPP;
+
+ /* Disable operation before enable */
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct streamid_data);
+ si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!si_data)
+ return -ENOMEM;
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev, si_data,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(si_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+ memset(si_data->dmac, 0xff, ETH_ALEN);
+ si_data->vid_vidm_tg =
+ cpu_to_le16(ENETC_CBDR_SID_VID_MASK
+ + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
+
+ si_conf = &cbd.sid_set;
+ /* Only one port supported for one entry, set itself */
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = 1;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err) {
+ kfree(si_data);
+ return -EINVAL;
+ }
+
+ if (!enable) {
+ kfree(si_data);
+ return 0;
+ }
+
+ /* Enable the entry and overwrite again in case the space was flushed by hardware */
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ si_conf->en = 0x80;
+ si_conf->stream_handle = cpu_to_le32(sid->handle);
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = sid->filtertype;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ memset(si_data, 0, data_size);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ /* VIDM default to be 1.
+ * VID Match. If set (b1) then the VID must match, otherwise
+ * any VID is considered a match. VIDM setting is only used
+ * when TG is set to b01.
+ */
+ if (si_conf->id_type == STREAMID_TYPE_NULL) {
+ ether_addr_copy(si_data->dmac, sid->dst_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
+ ether_addr_copy(si_data->smac, sid->src_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ kfree(si_data);
+
+ return err;
+}
+
+/* Stream Filter Instance Set Descriptor */
+static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_filter *sfi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct sfi_conf *sfi_config;
+
+ cbd.index = cpu_to_le16(sfi->index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0x80;
+ cbd.length = cpu_to_le16(1);
+
+ sfi_config = &cbd.sfi_conf;
+ if (!enable)
+ goto exit;
+
+ sfi_config->en = 0x80;
+
+ if (sfi->handle >= 0) {
+ sfi_config->stream_handle =
+ cpu_to_le32(sfi->handle);
+ sfi_config->sthm |= 0x80;
+ }
+
+ sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
+ sfi_config->input_ports = 1 << enetc_get_port(priv);
+
+ /* The priority value which may be matched against the
+ * frame’s priority value to determine a match for this entry.
+ */
+ if (sfi->prio >= 0)
+ sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
+
+ /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
+ * field as being either an MSDU value or an index into the Flow
+ * Meter Instance table.
+ * TODO: no limit max sdu
+ */
+
+ if (sfi->meter_id >= 0) {
+ sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
+ sfi_config->multi |= 0x80;
+ }
+
+exit:
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
+static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
+ u32 index,
+ struct psfp_streamfilter_counters *cnt)
+{
+ struct enetc_cbd cbd = { .cmd = 2 };
+ struct sfi_counter_data *data_buf;
+ dma_addr_t dma;
+ u16 data_size;
+ int err;
+
+ cbd.index = cpu_to_le16((u16)index);
+ cbd.cmd = 2;
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct sfi_counter_data);
+ data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ dma = dma_map_single(&priv->si->pdev->dev, data_buf,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ err = -ENOMEM;
+ goto exit;
+ }
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ goto exit;
+
+ cnt->matching_frames_count =
+ ((u64)le32_to_cpu(data_buf->matchh) << 32)
+ + le32_to_cpu(data_buf->matchl);
+
+ cnt->not_passing_sdu_count =
+ ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
+ + le32_to_cpu(data_buf->msdu_dropl);
+
+ cnt->passing_sdu_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count;
+
+ cnt->not_passing_frames_count =
+ ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
+ + le32_to_cpu(data_buf->stream_gate_dropl);
+
+ cnt->passing_frames_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count
+ - cnt->not_passing_frames_count;
+
+ cnt->red_frames_count =
+ ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
+ + le32_to_cpu(data_buf->flow_meter_dropl);
+
+exit:
+ kfree(data_buf);
+ return err;
+}
+
+static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
+{
+ u64 now_lo, now_hi, now, n;
+
+ now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
+ now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
+ now = now_lo | now_hi << 32;
+
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(now, cycle);
+
+ *start = (n + 1) * cycle;
+
+ return 0;
+}
+
+/* Stream Gate Instance Set Descriptor */
+static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_gate *sgi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct sgi_table *sgi_config;
+ struct sgcl_conf *sgcl_config;
+ struct sgcl_data *sgcl_data;
+ struct sgce *sgce;
+ dma_addr_t dma;
+ u16 data_size;
+ int err, i;
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0x80;
+
+ /* disable */
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ /* enable */
+ sgi_config = &cbd.sgi_table;
+
+ /* Keep open before gate list start */
+ sgi_config->ocgtst = 0x80;
+
+ sgi_config->oipv = (sgi->init_ipv < 0) ?
+ 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
+
+ sgi_config->en = 0x80;
+
+ /* Basic config */
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ if (!sgi->num_entries)
+ return 0;
+
+ if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
+ !sgi->cycletime)
+ return -EINVAL;
+
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 1;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0;
+
+ sgcl_config = &cbd.sgcl_conf;
+
+ sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
+
+ data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
+
+ sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!sgcl_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev,
+ sgcl_data, data_size,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(sgcl_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ sgce = &sgcl_data->sgcl[0];
+
+ sgcl_config->agtst = 0x80;
+
+ sgcl_data->ct = cpu_to_le32(sgi->cycletime);
+ sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
+
+ if (sgi->init_ipv >= 0)
+ sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
+
+ for (i = 0; i < sgi->num_entries; i++) {
+ struct action_gate_entry *from = &sgi->entries[i];
+ struct sgce *to = &sgce[i];
+
+ if (from->gate_state)
+ to->multi |= 0x10;
+
+ if (from->ipv >= 0)
+ to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
+
+ if (from->maxoctets)
+ to->multi |= 0x01;
+
+ to->interval = cpu_to_le32(from->interval);
+ to->msdu[0] = from->maxoctets & 0xFF;
+ to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
+ to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
+ }
+
+ /* If basetime is 0, calculate start time */
+ if (!sgi->basetime) {
+ u64 start;
+
+ err = get_start_ns(priv, sgi->cycletime, &start);
+ if (err)
+ goto exit;
+ sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
+ sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
+ } else {
+ u32 hi, lo;
+
+ hi = upper_32_bits(sgi->basetime);
+ lo = lower_32_bits(sgi->basetime);
+ sgcl_data->bth = cpu_to_le32(hi);
+ sgcl_data->btl = cpu_to_le32(lo);
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+
+exit:
+ kfree(sgcl_data);
+
+ return err;
+}
+
+static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
+{
+ struct enetc_stream_filter *f;
+
+ hlist_for_each_entry(f, &epsfp.stream_list, node)
+ if (f->sid.index == index)
+ return f;
+
+ return NULL;
+}
+
+static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
+{
+ struct enetc_psfp_gate *g;
+
+ hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
+ if (g->index == index)
+ return g;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->index == index)
+ return s;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *
+enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->gate_id == sfi->gate_id &&
+ s->prio == sfi->prio &&
+ s->meter_id == sfi->meter_id)
+ return s;
+
+ return NULL;
+}
+
+static int enetc_get_free_index(struct enetc_ndev_priv *priv)
+{
+ u32 max_size = priv->psfp_cap.max_psfp_filter;
+ unsigned long index;
+
+ index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
+ if (index == max_size)
+ return -1;
+
+ return index;
+}
+
+static void reduce_ref_sfi(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_filter *sfi;
+
+ sfi = enetc_get_filter_by_index(index);
+ WARN_ON(!sfi);
+ sfi->refcount--;
+
+ if (!sfi->refcount) {
+ enetc_streamfilter_hw_set(priv, sfi, false);
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ }
+}
+
+static void reduce_ref_sgi(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_gate *sgi;
+
+ sgi = enetc_get_gate_by_index(index);
+ WARN_ON(!sgi);
+ sgi->refcount--;
+
+ if (!sgi->refcount) {
+ enetc_streamgate_hw_set(priv, sgi, false);
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void remove_one_chain(struct enetc_ndev_priv *priv,
+ struct enetc_stream_filter *filter)
+{
+ reduce_ref_sgi(priv, filter->sgi_index);
+ reduce_ref_sfi(priv, filter->sfi_index);
+
+ hlist_del(&filter->node);
+ kfree(filter);
+}
+
+static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ struct enetc_psfp_filter *sfi,
+ struct enetc_psfp_gate *sgi)
+{
+ int err;
+
+ err = enetc_streamid_hw_set(priv, sid, true);
+ if (err)
+ return err;
+
+ if (sfi) {
+ err = enetc_streamfilter_hw_set(priv, sfi, true);
+ if (err)
+ goto revert_sid;
+ }
+
+ err = enetc_streamgate_hw_set(priv, sgi, true);
+ if (err)
+ goto revert_sfi;
+
+ return 0;
+
+revert_sfi:
+ if (sfi && !sfi->refcount)
+ enetc_streamfilter_hw_set(priv, sfi, false);
+revert_sid:
+ enetc_streamid_hw_set(priv, sid, false);
+ return err;
+}
+
+static struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
+ if (acts == enetc_act_fwd[i].actions &&
+ inputkeys & enetc_act_fwd[i].keys)
+ return &enetc_act_fwd[i];
+
+ return NULL;
+}
+
+static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_filter *sfi, *old_sfi;
+ struct enetc_psfp_gate *sgi, *old_sgi;
+ struct flow_action_entry *entry;
+ struct action_gate_entry *e;
+ u8 sfi_overwrite = 0;
+ int entries_size;
+ int i, err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ flow_action_for_each(i, entry, &rule->action)
+ if (entry->id == FLOW_ACTION_GATE)
+ break;
+
+ if (entry->id != FLOW_ACTION_GATE)
+ return -EINVAL;
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ filter->sid.index = f->common.chain_index;
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_match_eth_addrs match;
+
+ flow_rule_match_eth_addrs(rule, &match);
+
+ if (!is_zero_ether_addr(match.mask->dst)) {
+ ether_addr_copy(filter->sid.dst_mac, match.key->dst);
+ filter->sid.filtertype = STREAMID_TYPE_NULL;
+ }
+
+ if (!is_zero_ether_addr(match.mask->src)) {
+ ether_addr_copy(filter->sid.src_mac, match.key->src);
+ filter->sid.filtertype = STREAMID_TYPE_SMAC;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported, must include ETH_ADDRS");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_match_vlan match;
+
+ flow_rule_match_vlan(rule, &match);
+ if (match.mask->vlan_priority) {
+ if (match.mask->vlan_priority !=
+ (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ }
+
+ if (match.mask->vlan_id) {
+ if (match.mask->vlan_id != VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ filter->sid.vid = match.key->vlan_id;
+ if (!filter->sid.vid)
+ filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
+ else
+ filter->sid.tagged = STREAMID_VLAN_TAGGED;
+ }
+ } else {
+ filter->sid.tagged = STREAMID_VLAN_ALL;
+ }
+
+ /* parsing gate action */
+ if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ sgi = kzalloc(entries_size, GFP_KERNEL);
+ if (!sgi) {
+ err = -ENOMEM;
+ goto free_filter;
+ }
+
+ sgi->index = entry->gate.index;
+ sgi->init_ipv = entry->gate.prio;
+ sgi->basetime = entry->gate.basetime;
+ sgi->cycletime = entry->gate.cycletime;
+ sgi->num_entries = entry->gate.num_entries;
+
+ e = sgi->entries;
+ for (i = 0; i < entry->gate.num_entries; i++) {
+ e[i].gate_state = entry->gate.entries[i].gate_state;
+ e[i].interval = entry->gate.entries[i].interval;
+ e[i].ipv = entry->gate.entries[i].ipv;
+ e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ }
+
+ filter->sgi_index = sgi->index;
+
+ sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
+ if (!sfi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+
+ sfi->gate_id = sgi->index;
+
+ /* flow meter not supported yet */
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+
+ /* prio ref the filter prio */
+ if (f->common.prio && f->common.prio <= BIT(3))
+ sfi->prio = f->common.prio - 1;
+ else
+ sfi->prio = ENETC_PSFP_WILDCARD;
+
+ old_sfi = enetc_psfp_check_sfi(sfi);
+ if (!old_sfi) {
+ int index;
+
+ index = enetc_get_free_index(priv);
+ if (index < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
+ err = -ENOSPC;
+ goto free_sfi;
+ }
+
+ sfi->index = index;
+ sfi->handle = index + HANDLE_OFFSET;
+ /* Update the stream filter handle also */
+ filter->sid.handle = sfi->handle;
+ filter->sfi_index = sfi->index;
+ sfi_overwrite = 0;
+ } else {
+ filter->sfi_index = old_sfi->index;
+ filter->sid.handle = old_sfi->handle;
+ sfi_overwrite = 1;
+ }
+
+ err = enetc_psfp_hw_set(priv, &filter->sid,
+ sfi_overwrite ? NULL : sfi, sgi);
+ if (err)
+ goto free_sfi;
+
+ spin_lock(&epsfp.psfp_lock);
+ old_sgi = enetc_get_gate_by_index(filter->sgi_index);
+ if (old_sgi) {
+ sgi->refcount = old_sgi->refcount;
+ hlist_del(&old_sgi->node);
+ kfree(old_sgi);
+ }
+
+ sgi->refcount++;
+ hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
+
+ if (!old_sfi) {
+ hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
+ set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ sfi->refcount++;
+ } else {
+ kfree(sfi);
+ old_sfi->refcount++;
+ }
+
+ old_filter = enetc_get_stream_by_index(filter->sid.index);
+ if (old_filter)
+ remove_one_chain(priv, old_filter);
+
+ filter->stats.lastused = jiffies;
+ hlist_add_head(&filter->node, &epsfp.stream_list);
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ return 0;
+
+free_sfi:
+ kfree(sfi);
+free_gate:
+ kfree(sgi);
+free_filter:
+ kfree(filter);
+
+ return err;
+}
+
+static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct flow_dissector *dissector = rule->match.dissector;
+ struct flow_action *action = &rule->action;
+ struct flow_action_entry *entry;
+ struct actions_fwd *fwd;
+ u64 actions = 0;
+ int i, err;
+
+ if (!flow_action_has_entries(action)) {
+ NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
+ return -EINVAL;
+ }
+
+ flow_action_for_each(i, entry, action)
+ actions |= BIT(entry->id);
+
+ fwd = enetc_check_flow_actions(actions, dissector->used_keys);
+ if (!fwd) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
+ return -EOPNOTSUPP;
+ }
+
+ if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
+ err = enetc_psfp_parse_clsflower(priv, cls_flower);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
+ return err;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct enetc_stream_filter *filter;
+ struct netlink_ext_ack *extack = f->common.extack;
+ int err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamid_hw_set(priv, &filter->sid, false);
+ if (err)
+ return err;
+
+ remove_one_chain(priv, filter);
+
+ return 0;
+}
+
+static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ return enetc_psfp_destroy_clsflower(priv, f);
+}
+
+static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct psfp_streamfilter_counters counters = {};
+ struct enetc_stream_filter *filter;
+ struct flow_stats stats = {};
+ int err;
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
+ if (err)
+ return -EINVAL;
+
+ spin_lock(&epsfp.psfp_lock);
+ stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.dropped = counters.not_passing_frames_count -
+ filter->stats.dropped;
+ stats.lastused = filter->stats.lastused;
+ filter->stats.pkts += stats.pkts;
+ filter->stats.dropped += stats.dropped;
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
+ stats.dropped);
+
+ return 0;
+}
+
+static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case FLOW_CLS_REPLACE:
+ return enetc_config_clsflower(priv, cls_flower);
+ case FLOW_CLS_DESTROY:
+ return enetc_destroy_clsflower(priv, cls_flower);
+ case FLOW_CLS_STATS:
+ return enetc_psfp_get_stats(priv, cls_flower);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static inline void clean_psfp_sfi_bitmap(void)
+{
+ bitmap_free(epsfp.psfp_sfi_bitmap);
+ epsfp.psfp_sfi_bitmap = NULL;
+}
+
+static void clean_stream_list(void)
+{
+ struct enetc_stream_filter *s;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
+ hlist_del(&s->node);
+ kfree(s);
+ }
+}
+
+static void clean_sfi_list(void)
+{
+ struct enetc_psfp_filter *sfi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ }
+}
+
+static void clean_sgi_list(void)
+{
+ struct enetc_psfp_gate *sgi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void clean_psfp_all(void)
+{
+ /* Disable all list nodes and free all memory */
+ clean_sfi_list();
+ clean_sgi_list();
+ clean_stream_list();
+ epsfp.dev_bitmap = 0;
+ clean_psfp_sfi_bitmap();
+}
+
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct net_device *ndev = cb_priv;
+
+ if (!tc_can_offload(ndev))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_psfp_init(struct enetc_ndev_priv *priv)
+{
+ if (epsfp.psfp_sfi_bitmap)
+ return 0;
+
+ epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
+ GFP_KERNEL);
+ if (!epsfp.psfp_sfi_bitmap)
+ return -ENOMEM;
+
+ spin_lock_init(&epsfp.psfp_lock);
+
+ if (list_empty(&enetc_block_cb_list))
+ epsfp.dev_bitmap = 0;
+
+ return 0;
+}
+
+int enetc_psfp_clean(struct enetc_ndev_priv *priv)
+{
+ if (!list_empty(&enetc_block_cb_list))
+ return -EBUSY;
+
+ clean_psfp_all();
+
+ return 0;
+}
+
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct flow_block_offload *f = type_data;
+ int err;
+
+ err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
+ enetc_setup_tc_block_cb,
+ ndev, ndev, true);
+ if (err)
+ return err;
+
+ switch (f->command) {
+ case FLOW_BLOCK_BIND:
+ set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ if (!epsfp.dev_bitmap)
+ clean_psfp_all();
+ break;
+ }
+
+ return 0;
+}
--
2.17.1

2020-03-06 13:18:27

by Po Liu

[permalink] [raw]
Subject: [RFC, iproute2-next] iproute2:tc:action: add a gate control action

Introduce an ingress frame gate control flow action. Creating a gate
action with tc provides a gate list that controls when the gate is in
the open or closed state. While the gate is open the flow may pass;
while it is closed it may not. The driver repeats the gate list
cyclically. The user may also assign a time point at which the gate
list starts via the basetime parameter. If the basetime is already in
the past, the start time is calculated from the cycletime of the gate
list.
The behavior aims to follow the IEEE 802.1Qci spec. For the
software emulation, the user is required to supply the clock type.

Below is an example of configuring the software simulator from user space:

> tc qdisc add dev eth0 ingress

> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw src_ip 192.168.0.20 \
action gate index 2 \
sched-entry OPEN 200000000 -1 -1 \
sched-entry CLOSE 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follows the taprio naming style. The gate state is
"OPEN"/"CLOSE", followed by the period in nanoseconds. The next value
is the internal priority value, which decides which ingress queue the
frame goes to; "-1" means wildcard. The last value optionally
specifies the maximum number of MSDU octets that are permitted to pass
the gate during the specified time interval.

The command also supports tc flower offload when the device driver supports it:

> tc filter add dev eth0 parent ffff: protocol ip chain 14 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 12 sched-entry close 200000000 -1 -1

This iproute2 patch is intended to be applied to:

git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git

Signed-off-by: Po Liu <[email protected]>
---
include/uapi/linux/tc_act/tc_gate.h | 47 +++
tc/Makefile | 1 +
tc/m_gate.c | 483 ++++++++++++++++++++++++++++
3 files changed, 531 insertions(+)
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 tc/m_gate.c

diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 0000000..e1b3a7f
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: (GPL-2.0+) */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index a468a52..9365f3c 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -54,6 +54,7 @@ TCMODULES += m_bpf.o
TCMODULES += m_tunnel_key.o
TCMODULES += m_sample.o
TCMODULES += m_ct.o
+TCMODULES += m_gate.o
TCMODULES += p_ip.o
TCMODULES += p_ip6.o
TCMODULES += p_icmp.o
diff --git a/tc/m_gate.c b/tc/m_gate.c
new file mode 100644
index 0000000..50b65b7
--- /dev/null
+++ b/tc/m_gate.c
@@ -0,0 +1,483 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/* Copyright 2020 NXP */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include "list.h"
+#include <linux/tc_act/tc_gate.h>
+
+struct gate_entry {
+ struct list_head list;
+ uint8_t gate_state;
+ uint32_t interval;
+ int32_t ipv;
+ int32_t maxoctets;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct clockid_table {
+ const char *name;
+ clockid_t clockid;
+} clockt_map[] = {
+ { "REALTIME", CLOCK_REALTIME },
+ { "TAI", CLOCK_TAI },
+ { "BOOTTIME", CLOCK_BOOTTIME },
+ { "MONOTONIC", CLOCK_MONOTONIC },
+ { NULL }
+};
+
+static void explain(void)
+{
+ fprintf(stderr,
+ "Usage: gate [ priority PRIO-SPEC ] [ base-time BASE-TIME ]\n"
+ " [ cycle-time CYCLE-TIME ]\n"
+ " [ cycle-time-ext CYCLE-TIME-EXT ]\n"
+ " [ clockid CLOCKID ] [flags FLAGS]\n"
+ " [ sched-entry GATE0 INTERVAL INTERNAL-PRIO-VALUE MAX-OCTETS ]\n"
+ " [ sched-entry GATE1 INTERVAL INTERNAL-PRIO-VALUE MAX-OCTETS ]\n"
+ " ......\n"
+ " [ sched-entry GATEn INTERVAL INTERNAL-PRIO-VALUE MAX-OCTETS ]\n"
+ " [ CONTROL ]\n"
+ " GATEn := open | close\n"
+ " INTERVAL : nanoseconds period of gate slot\n"
+ " INTERNAL-PRIO-VALUE : internal priority that decides which\n"
+ " rx queue the frame goes to\n"
+ " -1 means wildcard\n"
+ " MAX-OCTETS : maximum number of MSDU octets that are\n"
+ " permitted to pass the gate during the\n"
+ " specified time interval\n"
+ " CONTROL := pipe | drop | continue | pass |\n"
+ " goto chain <CHAIN_INDEX>\n");
+}
+
+static void usage(void)
+{
+ explain();
+ exit(-1);
+}
+
+static void explain_entry_format(void)
+{
+ fprintf(stderr, "Usage: sched-entry <open | close> <interval> <interval ipv> <octets max bytes>\n");
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n);
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg);
+
+struct action_util gate_action_util = {
+ .id = "gate",
+ .parse_aopt = parse_gate,
+ .print_aopt = print_gate,
+};
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+ const struct clockid_table *c;
+
+ if (strncasecmp(arg, "CLOCK_", sizeof("CLOCK_") - 1) == 0)
+ arg += sizeof("CLOCK_") - 1;
+
+ for (c = clockt_map; c->name; c++) {
+ if (strcasecmp(c->name, arg) == 0) {
+ *val = c->clockid;
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static const char *get_clock_name(clockid_t clockid)
+{
+ const struct clockid_table *c;
+
+ for (c = clockt_map; c->name; c++) {
+ if (clockid == c->clockid)
+ return c->name;
+ }
+
+ return "invalid";
+}
+
+static int get_gate_state(__u8 *val, const char *arg)
+{
+ if (!strcasecmp("OPEN", arg)) {
+ *val = 1;
+ return 0;
+ }
+
+ if (!strcasecmp("CLOSE", arg)) {
+ *val = 0;
+ return 0;
+ }
+
+ return -1;
+}
+
+static struct gate_entry *create_gate_entry(uint8_t gate_state,
+ uint32_t interval,
+ int32_t ipv,
+ int32_t maxoctets)
+{
+ struct gate_entry *e;
+
+ e = calloc(1, sizeof(*e));
+ if (!e)
+ return NULL;
+
+ e->gate_state = gate_state;
+ e->interval = interval;
+ e->ipv = ipv;
+ e->maxoctets = maxoctets;
+
+ return e;
+}
+
+static int add_gate_list(struct list_head *gate_entries, struct nlmsghdr *n)
+{
+ struct gate_entry *e;
+
+ list_for_each_entry(e, gate_entries, list) {
+ struct rtattr *a;
+
+ a = addattr_nest(n, MAX_MSG, TCA_GATE_ONE_ENTRY);
+
+ if (e->gate_state)
+ addattr(n, MAX_MSG, TCA_GATE_ENTRY_GATE);
+
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_INTERVAL,
+ &e->interval, sizeof(e->interval));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_IPV,
+ &e->ipv, sizeof(e->ipv));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_MAX_OCTETS,
+ &e->maxoctets, sizeof(e->maxoctets));
+
+ addattr_nest_end(n, a);
+ }
+
+ return 0;
+}
+
+static void free_entries(struct list_head *gate_entries)
+{
+ struct gate_entry *e, *n;
+
+ list_for_each_entry_safe(e, n, gate_entries, list) {
+ list_del(&e->list);
+ free(e);
+ }
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n)
+{
+ __s32 clockid = CLOCKID_INVALID;
+ int argc = *argc_p;
+ char **argv = *argv_p;
+ struct rtattr *tail, *nle;
+ struct tc_gate parm = {.action = TC_ACT_PIPE};
+ struct list_head gate_entries;
+ int prio = -1;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ __u32 flags = 0;
+ int entry_num = 0;
+ int err;
+
+ if (matches(*argv, "gate") != 0)
+ return -1;
+
+ NEXT_ARG();
+ if (argc <= 0)
+ return -1;
+
+ INIT_LIST_HEAD(&gate_entries);
+
+ while (argc > 0) {
+ if (matches(*argv, "index") == 0) {
+ NEXT_ARG();
+ if (get_u32(&parm.index, *argv, 10))
+ invarg("index", *argv);
+ } else if (matches(*argv, "priority") == 0) {
+ NEXT_ARG();
+ if (get_s32(&prio, *argv, 0))
+ invarg("priority", *argv);
+ } else if (matches(*argv, "base-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&base_time, *argv, 10))
+ invarg("base-time", *argv);
+ } else if (matches(*argv, "cycle-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time, *argv, 10))
+ invarg("cycle-time", *argv);
+ } else if (matches(*argv, "cycle-time-ext") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time_ext, *argv, 10))
+ invarg("cycle-time-ext", *argv);
+ } else if (matches(*argv, "clockid") == 0) {
+ NEXT_ARG();
+ if (get_clockid(&clockid, *argv))
+ invarg("clockid", *argv);
+ } else if (matches(*argv, "flags") == 0) {
+ NEXT_ARG();
+ if (get_u32(&flags, *argv, 0))
+ invarg("flags", *argv);
+ } else if (matches(*argv, "sched-entry") == 0) {
+ struct gate_entry *e;
+ uint8_t gate_state = 0;
+ uint32_t interval = 0;
+ int32_t ipv = -1;
+ int32_t maxoctets = -1;
+
+ NEXT_ARG();
+
+ if (get_gate_state(&gate_state, *argv)) {
+ explain_entry_format();
+ invarg("gate", *argv);
+ }
+
+ NEXT_ARG();
+
+ if (get_u32(&interval, *argv, 0)) {
+ explain_entry_format();
+ invarg("interval", *argv);
+ }
+
+ NEXT_ARG();
+ if (get_s32(&ipv, *argv, 0)) {
+ explain_entry_format();
+ invarg("interval ipv", *argv);
+ }
+
+ NEXT_ARG();
+ if (get_s32(&maxoctets, *argv, 0)) {
+ explain_entry_format();
+ invarg("max octets", *argv);
+ }
+
+ e = create_gate_entry(gate_state, interval,
+ ipv, maxoctets);
+ if (!e) {
+ fprintf(stderr, "gate: not enough memory\n");
+ exit(-1);
+ }
+
+ list_add_tail(&e->list, &gate_entries);
+ entry_num++;
+
+ } else if (matches(*argv, "reclassify") == 0 ||
+ matches(*argv, "drop") == 0 ||
+ matches(*argv, "shot") == 0 ||
+ matches(*argv, "continue") == 0 ||
+ matches(*argv, "pass") == 0 ||
+ matches(*argv, "ok") == 0 ||
+ matches(*argv, "pipe") == 0 ||
+ matches(*argv, "goto") == 0) {
+ if (parse_action_control(&argc, &argv,
+ &parm.action, false))
+ return -1;
+ } else if (matches(*argv, "help") == 0) {
+ usage();
+ } else {
+ break;
+ }
+
+ argc--;
+ argv++;
+ }
+
+ parse_action_control_dflt(&argc, &argv, &parm.action,
+ false, TC_ACT_PIPE);
+
+ if (!entry_num && !parm.index) {
+ fprintf(stderr, "gate: must add at least one entry\n");
+ exit(-1);
+ }
+
+ tail = addattr_nest(n, MAX_MSG, tca_id);
+ addattr_l(n, MAX_MSG, TCA_GATE_PARMS, &parm, sizeof(parm));
+
+ if (prio != -1)
+ addattr_l(n, MAX_MSG, TCA_GATE_PRIORITY, &prio, sizeof(prio));
+
+ if (flags)
+ addattr_l(n, MAX_MSG, TCA_GATE_FLAGS, &flags, sizeof(flags));
+
+ if (base_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_BASE_TIME,
+ &base_time, sizeof(base_time));
+
+ if (cycle_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME,
+ &cycle_time, sizeof(cycle_time));
+
+ if (cycle_time_ext)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME_EXT,
+ &cycle_time_ext, sizeof(cycle_time_ext));
+
+ if (clockid != CLOCKID_INVALID)
+ addattr_l(n, MAX_MSG, TCA_GATE_CLOCKID, &clockid, sizeof(clockid));
+
+ nle = addattr_nest(n, MAX_MSG, TCA_GATE_ENTRY_LIST | NLA_F_NESTED);
+ err = add_gate_list(&gate_entries, n);
+ if (err < 0) {
+ fprintf(stderr, "Could not add entries to netlink message\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ addattr_nest_end(n, nle);
+ addattr_nest_end(n, tail);
+ free_entries(&gate_entries);
+ *argc_p = argc;
+ *argv_p = argv;
+
+ return 0;
+}
+
+static int print_gate_list(struct rtattr *list)
+{
+ struct rtattr *item;
+ int rem;
+
+ rem = RTA_PAYLOAD(list);
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+
+ for (item = RTA_DATA(list);
+ RTA_OK(item, rem);
+ item = RTA_NEXT(item, rem)) {
+ struct rtattr *tb[TCA_GATE_ENTRY_MAX + 1];
+ __u32 index = 0, interval = 0;
+ __u8 gate_state = 0;
+ __s32 ipv = -1, maxoctets = -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_ENTRY_MAX, item);
+
+ if (tb[TCA_GATE_ENTRY_INDEX])
+ index = rta_getattr_u32(tb[TCA_GATE_ENTRY_INDEX]);
+
+ if (tb[TCA_GATE_ENTRY_GATE])
+ gate_state = 1;
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = rta_getattr_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ ipv = rta_getattr_s32(tb[TCA_GATE_ENTRY_IPV]);
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ maxoctets = rta_getattr_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+
+ print_uint(PRINT_ANY, "number", "\t number %4u", index);
+ print_string(PRINT_ANY, "gate state", "\tgate-state %-8s",
+ gate_state ? "open" : "close");
+
+ print_uint(PRINT_ANY, "interval", "\tinterval %-16u", interval);
+
+ if (ipv != -1)
+ print_uint(PRINT_ANY, "ipv", "\tipv %-8u", ipv);
+ else
+ print_string(PRINT_FP, "ipv", "\tipv %s", "wildcard");
+
+ if (maxoctets != -1)
+ print_uint(PRINT_ANY, "max_octets", "\tmax-octets %-8u", maxoctets);
+ else
+ print_string(PRINT_FP, "max_octets", "\tmax-octets %s", "wildcard");
+
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+ }
+
+ return 0;
+}
+
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+ struct tc_gate *parm;
+ struct rtattr *tb[TCA_GATE_MAX + 1];
+ __s32 clockid = CLOCKID_INVALID;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ int prio = -1;
+
+ if (arg == NULL)
+ return -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_MAX, arg);
+
+ if (!tb[TCA_GATE_PARMS]) {
+ fprintf(stderr, "Missing gate parameters\n");
+ return -1;
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ parm = RTA_DATA(tb[TCA_GATE_PARMS]);
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = rta_getattr_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (prio != -1)
+ print_int(PRINT_ANY, "priority", "\tpriority %-8d", prio);
+ else
+ print_string(PRINT_FP, "priority", "\tpriority %s", "wildcard");
+
+ if (tb[TCA_GATE_CLOCKID])
+ clockid = rta_getattr_s32(tb[TCA_GATE_CLOCKID]);
+ print_string(PRINT_ANY, "clockid", "\tclockid %s",
+ get_clock_name(clockid));
+
+ if (tb[TCA_GATE_FLAGS]) {
+ __u32 flags;
+
+ flags = rta_getattr_u32(tb[TCA_GATE_FLAGS]);
+ print_0xhex(PRINT_ANY, "flags", "\tflags %#x", flags);
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ if (tb[TCA_GATE_BASE_TIME])
+ base_time = rta_getattr_u64(tb[TCA_GATE_BASE_TIME]);
+
+ print_lluint(PRINT_ANY, "base_time", "\tbase-time %-22lld", base_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME])
+ cycle_time = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME]);
+
+ print_lluint(PRINT_ANY, "cycle_time", "\tcycle-time %-16lld", cycle_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ cycle_time_ext = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ print_lluint(PRINT_ANY, "cycle_time_ext", "\tcycle-time-ext %-16lld",
+ cycle_time_ext);
+
+ if (tb[TCA_GATE_ENTRY_LIST])
+ print_gate_list(tb[TCA_GATE_ENTRY_LIST]);
+
+ print_action_control(f, "\t", parm->action, "");
+
+ print_uint(PRINT_ANY, "index", "\n\t index %u", parm->index);
+ print_int(PRINT_ANY, "ref", " ref %d", parm->refcnt);
+ print_int(PRINT_ANY, "bind", " bind %d", parm->bindcnt);
+
+ if (show_stats) {
+ if (tb[TCA_GATE_TM]) {
+ struct tcf_t *tm = RTA_DATA(tb[TCA_GATE_TM]);
+
+ print_tm(f, tm);
+ }
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ return 0;
+}
--
2.17.1

2020-03-06 19:02:43

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC,net-next 3/9] net: schedule: add action gate offloading

On Fri, 6 Mar 2020 20:56:01 +0800 Po Liu wrote:
> +static int tcf_gate_get_entries(struct flow_action_entry *entry,
> + const struct tc_action *act)
> +{
> + entry->gate.entries = tcf_gate_get_list(act);
> +
> + if (!entry->gate.entries)
> + return -EINVAL;
> +
> + entry->destructor = tcf_gate_entry_destructor;
> + entry->destructor_priv = entry->gate.entries;

What's this destructor stuff doing? I don't see it being called.

> + return 0;
> +}

2020-03-06 19:20:11

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [RFC,net-next 3/9] net: schedule: add action gate offloading

On Fri, 6 Mar 2020 11:02:00 -0800 Jakub Kicinski wrote:
> On Fri, 6 Mar 2020 20:56:01 +0800 Po Liu wrote:
> > +static int tcf_gate_get_entries(struct flow_action_entry *entry,
> > + const struct tc_action *act)
> > +{
> > + entry->gate.entries = tcf_gate_get_list(act);
> > +
> > + if (!entry->gate.entries)
> > + return -EINVAL;
> > +
> > + entry->destructor = tcf_gate_entry_destructor;
> > + entry->destructor_priv = entry->gate.entries;
>
> What's this destructor stuff doing? I don't see it being called.

Ah, it's the action destructor, not something gate specific. Disregard.

2020-03-07 04:41:29

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [RFC,net-next 3/9] net: schedule: add action gate offloading

Hi Jakub,


> -----Original Message-----
> From: Jakub Kicinski <[email protected]>
> Sent: March 7, 2020 3:19
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>; Xiaoliang Yang
> <[email protected]>; Roy Zang <[email protected]>; Mingkai Hu
> <[email protected]>; Jerry Huang <[email protected]>; Leo Li
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [EXT] Re: [RFC,net-next 3/9] net: schedule: add action gate
> offloading
>
> Caution: EXT Email
>
> On Fri, 6 Mar 2020 11:02:00 -0800 Jakub Kicinski wrote:
> > On Fri, 6 Mar 2020 20:56:01 +0800 Po Liu wrote:
> > > +static int tcf_gate_get_entries(struct flow_action_entry *entry,
> > > + const struct tc_action *act) {
> > > + entry->gate.entries = tcf_gate_get_list(act);
> > > +
> > > + if (!entry->gate.entries)
> > > + return -EINVAL;
> > > +
> > > + entry->destructor = tcf_gate_entry_destructor;
> > > + entry->destructor_priv = entry->gate.entries;
> >
> > What's this destructor stuff doing? I don't see it being called.

It prepares a gate list array of parameters for offloading, so the driver side does not reference data owned by the protocol side. The destructor frees that temporary gate list array.

>
> Ah, it's the action destructor, not something gate specific. Disregard.

My understanding of the actions here is:
each tc flower filter is followed by actions, and each action is defined as:

struct flow_action_entry {
enum flow_action_id id;
action_destr destructor;
void *destructor_priv;
union {
......
{}sample,
{}police,
{}gate,
}
}

So destructor and destructor_priv are provided specifically for the union action, and are not gate specific. For a mirror action, destructor_priv would point to the mirror device data.
I suppose it is for destroying temporary data, as the name suggests. After tc_setup_flow_action() completes, the destructor function is invoked by tc_cleanup_flow_action() to destroy and free the temporary data.

Code flow is:
net/sched/cls_flower.c
static int fl_hw_replace_filter()
{
......
tc_setup_flow_action(); ---------------------------------------> assign action parameters (with the destructor and destructor_priv if the action needs them)
......
tc_setup_cb_add() ----------------------------------------------> call the driver-provided callback with the rule's action data for the device
......
tc_cleanup_flow_action(&cls_flower.rule->action); ---> invoke each action's destructor(destructor_priv)
}

So each action carries its own private destructor and destructor_priv if it needs them, and they are then invoked in tc_cleanup_flow_action().

Did I misunderstand anything?

Thanks!

Br,
Po Liu

2020-03-12 22:12:50

by Vinicius Costa Gomes

[permalink] [raw]
Subject: Re: [RFC,net-next 5/9] net: enetc: add tc flower psfp offload driver

Po Liu <[email protected]> writes:

> This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
> function. There are four main feature blocks that implement flow
> policing and filtering for ingress flows with IEEE 802.1Qci features:
> stream identification (defined exactly in P802.1CB but needed for
> 802.1Qci), stream filtering, stream gating and flow metering.
> Each function block includes many entries, indexed to assign
> parameters. A frame is first filtered by stream identification, then
> flows into the stream filter block via the handle shared between
> stream identification and stream filtering. It then flows into the
> stream gate assigned by the stream filter entry, where it is policed
> by the gate and limited by the max SDU in the filter block (optional).
> Finally it is policed by the flow metering block, whose index is
> chosen in the filter block.
> So each entry of a block may be linked to by many upper entries,
> since several streams may be assigned the same index when they want
> to share the same feature in stream filtering, stream gating or flow
> metering.
> To implement these features, each stream filtered by source/destination
> MAC address, possibly plus the VLAN id value, is treated as one flow
> chain, identified by the chain_index that already exists in the tc
> filter concept. The driver maintains this chain together with the gate
> modules. The stream filter entry is created from the gate index, the
> flow meter entry id (optional) and a priority value.
> Offloading only transfers the gate action and the flow filtering
> parameters. The driver creates (or looks up, by the same gate id, flow
> meter id and priority) one stream filter entry to program into the
> hardware, so stream filtering does not need to be transferred via
> action offloading.
> This architecture matches the relationship between tc filters and
> actions: tc filters maintain the list for each flow feature by keys,
> and actions are maintained in the action list.
>
> Below is an example of the tc commands:
>> tc qdisc add dev eth0 ingress
>> ip link set eth0 address 10:00:80:00:00:00
>> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
> flower skip_sw dst_mac 10:00:80:00:00:00 \
> action gate index 10 \
> sched-entry OPEN 200000000 -1 -1 \
> sched-entry CLOSE 100000000 -1 -1
> (or just create one gate entry list to keep close)
>
> This sets dst_mac 10:00:80:00:00:00 at index 11 of the stream
> identification module, and sets gate index 10 of the stream gate module.
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
> drivers/net/ethernet/freescale/enetc/enetc.h | 39 +-
> .../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
> .../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
> .../net/ethernet/freescale/enetc/enetc_qos.c | 1078 ++++++++++++++++-
> 5 files changed, 1269 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index d810651317e1..df2e77619f64 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -1520,6 +1520,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
> return enetc_setup_tc_cbs(ndev, type_data);
> case TC_SETUP_QDISC_ETF:
> return enetc_setup_tc_txtime(ndev, type_data);
> + case TC_SETUP_BLOCK:
> + return enetc_setup_tc_psfp(ndev, type_data);
> default:
> return -EOPNOTSUPP;
> }
> @@ -1572,17 +1574,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
> static int enetc_set_psfp(struct net_device *ndev, int en)
> {
> struct enetc_ndev_priv *priv = netdev_priv(ndev);
> + int err;
>
> if (en) {
> + err = enetc_psfp_enable(priv);
> + if (err)
> + return err;
> +
> priv->active_offloads |= ENETC_F_QCI;
> - enetc_get_max_cap(priv);
> - enetc_psfp_enable(&priv->si->hw);
> - } else {
> - priv->active_offloads &= ~ENETC_F_QCI;
> - memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
> - enetc_psfp_disable(&priv->si->hw);
> + return 0;
> }
>
> + err = enetc_psfp_disable(priv);
> + if (err)
> + return err;
> +
> + priv->active_offloads &= ~ENETC_F_QCI;
> +
> return 0;
> }
>
> @@ -1590,14 +1598,15 @@ int enetc_set_features(struct net_device *ndev,
> netdev_features_t features)
> {
> netdev_features_t changed = ndev->features ^ features;
> + int err = 0;
>
> if (changed & NETIF_F_RXHASH)
> enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));
>
> if (changed & NETIF_F_HW_TC)
> - enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
> + err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
>
> - return 0;
> + return err;
> }
>
> #ifdef CONFIG_FSL_ENETC_HW_TIMESTAMPING
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> index bcdade8f7b8a..f1a9a4cf1914 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> @@ -269,6 +269,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
> void enetc_sched_speed_set(struct net_device *ndev);
> int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
> int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
> +int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
> + void *cb_priv);
> +int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
> +int enetc_psfp_init(struct enetc_ndev_priv *priv);
> +int enetc_psfp_clean(struct enetc_ndev_priv *priv);
>
> static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
> {
> @@ -288,33 +293,59 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
> priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
> }
>
> -static inline void enetc_psfp_enable(struct enetc_hw *hw)
> +static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
> {
> + struct enetc_hw *hw = &priv->si->hw;
> + int err;
> +
> + enetc_get_max_cap(priv);
> +
> + err = enetc_psfp_init(priv);
> + if (err)
> + return err;
> +
> enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
> | ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS
> | ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
> +
> + return 0;
> }
>
> -static inline void enetc_psfp_disable(struct enetc_hw *hw)
> +static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
> {
> + struct enetc_hw *hw = &priv->si->hw;
> + int err;
> +
> + err = enetc_psfp_clean(priv);
> + if (err)
> + return err;
> +
> enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
> & ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS
> & ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
> +
> + memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
> +
> + return 0;
> }
> +
> #else
> #define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
> #define enetc_sched_speed_set(ndev) (void)0
> #define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
> #define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
> +#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
> +#define enetc_setup_tc_block_cb NULL
> +
> #define enetc_get_max_cap(p) \
> memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
>
> -static inline void enetc_psfp_enable(struct enetc_hw *hw)
> +static inline int enetc_psfp_enable(struct enetc_hw *hw)
> {
> return 0;
> }
>
> -static inline void enetc_psfp_disable(struct enetc_hw *hw)
> +static inline int enetc_psfp_disable(struct enetc_hw *hw)
> {
> return 0;
> }
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
> index 99d520207069..640099f48a0d 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
> @@ -570,6 +570,9 @@ enum bdcr_cmd_class {
> BDCR_CMD_RFS,
> BDCR_CMD_PORT_GCL,
> BDCR_CMD_RECV_CLASSIFIER,
> + BDCR_CMD_STREAM_IDENTIFY,
> + BDCR_CMD_STREAM_FILTER,
> + BDCR_CMD_STREAM_GCL,
> __BDCR_CMD_MAX_LEN,
> BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
> };
> @@ -601,13 +604,152 @@ struct tgs_gcl_data {
> struct gce entry[];
> };
>
> +/* class 7, command 0, Stream Identity Entry Configuration */
> +struct streamid_conf {
> + __le32 stream_handle; /* init gate value */
> + __le32 iports;
> + u8 id_type;
> + u8 oui[3];
> + u8 res[3];
> + u8 en;
> +};
> +
> +#define ENETC_CBDR_SID_VID_MASK 0xfff
> +#define ENETC_CBDR_SID_VIDM BIT(12)
> +#define ENETC_CBDR_SID_TG_MASK 0xc000
> +/* streamid_conf address point to this data space */
> +struct streamid_data {
> + union {
> + u8 dmac[6];
> + u8 smac[6];
> + };
> + u16 vid_vidm_tg;
> +};
> +
> +#define ENETC_CBDR_SFI_PRI_MASK 0x7
> +#define ENETC_CBDR_SFI_PRIM BIT(3)
> +#define ENETC_CBDR_SFI_BLOV BIT(4)
> +#define ENETC_CBDR_SFI_BLEN BIT(5)
> +#define ENETC_CBDR_SFI_MSDUEN BIT(6)
> +#define ENETC_CBDR_SFI_FMITEN BIT(7)
> +#define ENETC_CBDR_SFI_ENABLE BIT(7)
> +/* class 8, command 0, Stream Filter Instance, Short Format */
> +struct sfi_conf {
> + __le32 stream_handle;
> + u8 multi;
> + u8 res[2];
> + u8 sthm;
> + /* Max Service Data Unit or Flow Meter Instance Table index.
> + * Depending on the value of FLT this represents either Max
> + * Service Data Unit (max frame size) allowed by the filter
> + * entry or is an index into the Flow Meter Instance table
> + * index identifying the policer which will be used to police
> + * it.
> + */
> + __le16 fm_inst_table_index;
> + __le16 msdu;
> + __le16 sg_inst_table_index;
> + u8 res1[2];
> + __le32 input_ports;
> + u8 res2[3];
> + u8 en;
> +};
> +
> +/* class 8, command 2 stream Filter Instance status query short format
> + * command no need structure define
> + * Stream Filter Instance Query Statistics Response data
> + */
> +struct sfi_counter_data {
> + u32 matchl;
> + u32 matchh;
> + u32 msdu_dropl;
> + u32 msdu_droph;
> + u32 stream_gate_dropl;
> + u32 stream_gate_droph;
> + u32 flow_meter_dropl;
> + u32 flow_meter_droph;
> +};
> +
> +#define ENETC_CBDR_SGI_OIPV_MASK 0x7
> +#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
> +#define ENETC_CBDR_SGI_CGTST BIT(6)
> +#define ENETC_CBDR_SGI_OGTST BIT(7)
> +#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
> +#define ENETC_CBDR_SGI_CFG_PND BIT(2)
> +#define ENETC_CBDR_SGI_OEX BIT(4)
> +#define ENETC_CBDR_SGI_OEXEN BIT(5)
> +#define ENETC_CBDR_SGI_IRX BIT(6)
> +#define ENETC_CBDR_SGI_IRXEN BIT(7)
> +#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
> +#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
> +#define ENETC_CBDR_SGI_EN BIT(7)
> +/* class 9, command 0, Stream Gate Instance Table, Short Format
> + * class 9, command 2, Stream Gate Instance Table entry query write back
> + * Short Format
> + */
> +struct sgi_table {
> + u8 res[8];
> + u8 oipv;
> + u8 res0[2];
> + u8 ocgtst;
> + u8 res1[7];
> + u8 gset;
> + u8 oacl_len;
> + u8 res2[2];
> + u8 en;
> +};
> +
> +#define ENETC_CBDR_SGI_AIPV_MASK 0x7
> +#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
> +#define ENETC_CBDR_SGI_AGTST BIT(7)
> +
> +/* class 9, command 1, Stream Gate Control List, Long Format */
> +struct sgcl_conf {
> + u8 aipv;
> + u8 res[2];
> + u8 agtst;
> + u8 res1[4];
> + union {
> + struct {
> + u8 res2[4];
> + u8 acl_len;
> + u8 res3[3];
> + };
> + u8 cct[8]; /* Config change time */
> + };
> +};
> +
> +#define ENETC_CBDR_SGL_IOMEN BIT(0)
> +#define ENETC_CBDR_SGL_IPVEN BIT(3)
> +#define ENETC_CBDR_SGL_GTST BIT(4)
> +#define ENETC_CBDR_SGL_IPV_MASK 0xe
> +/* Stream Gate Control List Entry */
> +struct sgce {
> + u32 interval;
> + u8 msdu[3];
> + u8 multi;
> +};
> +
> +/* stream control list class 9 , cmd 1 data buffer */
> +struct sgcl_data {
> + u32 btl;
> + u32 bth;
> + u32 ct;
> + u32 cte;
> + struct sgce sgcl[0];
> +};
> +
> struct enetc_cbd {
> union{

Wrong indentation here. It should be fixed in a separate patch.

> + struct sfi_conf sfi_conf;
> + struct sgi_table sgi_table;
> struct {
> __le32 addr[2];
> union {
> __le32 opt[4];
> struct tgs_gcl_conf gcl_conf;
> + struct streamid_conf sid_set;
> + struct sgcl_conf sgcl_conf;
> };
> }; /* Long format */
> __le32 data[6];
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> index d880cbdc0d2e..a19095ab7b41 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> @@ -740,12 +740,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
> if (si->hw_features & ENETC_SI_F_QBV)
> priv->active_offloads |= ENETC_F_QBV;
>
> - if (si->hw_features & ENETC_SI_F_PSFP) {
> + if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
> priv->active_offloads |= ENETC_F_QCI;
> ndev->features |= NETIF_F_HW_TC;
> ndev->hw_features |= NETIF_F_HW_TC;
> - enetc_get_max_cap(priv);
> - enetc_psfp_enable(&si->hw);
> }
>
> /* pick up primary MAC address from SI */
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
> index 0c6bf3a55a9a..3ef46190d71d 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
> @@ -5,6 +5,8 @@
>
> #include <net/pkt_sched.h>
> #include <linux/math64.h>
> +#include <net/pkt_cls.h>
> +#include <net/tc_act/tc_gate.h>
>
> static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
> {
> @@ -108,13 +110,13 @@ static int enetc_setup_taprio(struct net_device *ndev,
> gcl_data->cte = cpu_to_le32(admin_conf->cycle_time_extension);
>
> for (i = 0; i < gcl_len; i++) {
> - struct tc_taprio_sched_entry *temp_entry;
> + struct tc_taprio_sched_entry *to;
> struct gce *temp_gce = gce + i;
>
> - temp_entry = &admin_conf->entries[i];
> + to = &admin_conf->entries[i];
>
> - temp_gce->gate = (u8)temp_entry->gate_mask;
> - temp_gce->period = cpu_to_le32(temp_entry->interval);
> + temp_gce->gate = (u8)to->gate_mask;
> + temp_gce->period = cpu_to_le32(to->interval);

These changes seem unrelated to the series.

> }
>
> cbd.length = cpu_to_le16(data_size);
> @@ -331,3 +333,1071 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)
>
> return 0;
> }
> +
> +enum streamid_type {
> + STREAMID_TYPE_RESERVED = 0,
> + STREAMID_TYPE_NULL,
> + STREAMID_TYPE_SMAC,
> +};
> +
> +enum streamid_vlan_tagged {
> + STREAMID_VLAN_RESERVED = 0,
> + STREAMID_VLAN_TAGGED,
> + STREAMID_VLAN_UNTAGGED,
> + STREAMID_VLAN_ALL,
> +};
> +
> +#define ENETC_PSFP_WILDCARD -1
> +#define HANDLE_OFFSET 100
> +
> +enum forward_type {
> + FILTER_ACTION_TYPE_PSFP = BIT(0),
> + FILTER_ACTION_TYPE_ACL = BIT(1),
> + FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
> +};
> +
> +/* This is for limit output type for input actions */
> +struct actions_fwd {
> + u64 actions;
> + u64 keys; /* include the must needed keys */
> + enum forward_type output;
> +};
> +
> +struct psfp_streamfilter_counters {
> + u64 matching_frames_count;
> + u64 passing_frames_count;
> + u64 not_passing_frames_count;
> + u64 passing_sdu_count;
> + u64 not_passing_sdu_count;
> + u64 red_frames_count;
> +};
> +
> +struct enetc_streamid {
> + u32 index;
> + union {
> + u8 src_mac[6];
> + u8 dst_mac[6];
> + };
> + u8 filtertype;
> + u16 vid;
> + u8 tagged;
> + s32 handle;
> +};
> +
> +struct enetc_psfp_filter {
> + u32 index;
> + s32 handle;
> + s8 prio;
> + u32 gate_id;
> + s32 meter_id;
> + u32 refcount;

It may be more appropriate to use <linux/refcount.h> instead of rolling
your own.

> + struct hlist_node node;
> +};
> +
> +struct enetc_psfp_gate {
> + u32 index;
> + s8 init_ipv;
> + u64 basetime;
> + u64 cycletime;
> + u64 cycletimext;
> + u32 num_entries;
> + u32 refcount;
> + struct hlist_node node;
> + struct action_gate_entry entries[0];
> +};
> +
> +struct enetc_stream_filter {
> + struct enetc_streamid sid;
> + u32 sfi_index;
> + u32 sgi_index;
> + struct flow_stats stats;
> + struct hlist_node node;
> +};
> +
> +struct enetc_psfp {
> + unsigned long dev_bitmap;
> + unsigned long *psfp_sfi_bitmap;
> + struct hlist_head stream_list;
> + struct hlist_head psfp_filter_list;
> + struct hlist_head psfp_gate_list;
> + spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
> +};
> +
> +struct actions_fwd enetc_act_fwd[] = {
> + {
> + BIT(FLOW_ACTION_GATE),
> + BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
> + FILTER_ACTION_TYPE_PSFP
> + },
> + /* example for ACL actions */
> + {
> + BIT(FLOW_ACTION_DROP),
> + 0,
> + FILTER_ACTION_TYPE_ACL
> + }
> +};
> +
> +static struct enetc_psfp epsfp = {
> + .psfp_sfi_bitmap = NULL,
> +};

Is it possible to have more than one of these controllers in the same
system? If it is, this state should be moved to live under a
per-device struct.

> +
> +static LIST_HEAD(enetc_block_cb_list);
> +
> +static inline int enetc_get_port(struct enetc_ndev_priv *priv)
> +{
> + return priv->si->pdev->devfn & 0x7;
> +}
> +
> +/* Stream Identity Entry Set Descriptor */
> +static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_streamid *sid,
> + u8 enable)
> +{
> + struct enetc_cbd cbd = {.cmd = 0};
> + struct streamid_data *si_data;
> + struct streamid_conf *si_conf;
> + u16 data_size;
> + dma_addr_t dma;
> + int err;
> +
> + if (sid->index >= priv->psfp_cap.max_streamid)
> + return -EINVAL;
> +
> + if (sid->filtertype != STREAMID_TYPE_NULL &&
> + sid->filtertype != STREAMID_TYPE_SMAC)
> + return -EOPNOTSUPP;
> +
> + /* Disable operation before enable */
> + cbd.index = cpu_to_le16((u16)sid->index);
> + cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
> + cbd.status_flags = 0;
> +
> + data_size = sizeof(struct streamid_data);
> + si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> + cbd.length = cpu_to_le16(data_size);
> +
> + dma = dma_map_single(&priv->si->pdev->dev, si_data,
> + data_size, DMA_FROM_DEVICE);
> + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> + kfree(si_data);
> + return -ENOMEM;
> + }
> +
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> + memset(si_data->dmac, 0xff, ETH_ALEN);
> + si_data->vid_vidm_tg =
> + cpu_to_le16(ENETC_CBDR_SID_VID_MASK
> + + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
> +
> + si_conf = &cbd.sid_set;
> + /* Only one port supported for one entry, set itself */
> + si_conf->iports = 1 << enetc_get_port(priv);
> + si_conf->id_type = 1;
> + si_conf->oui[2] = 0x0;
> + si_conf->oui[1] = 0x80;
> + si_conf->oui[0] = 0xC2;
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> + if (err)
> + return -EINVAL;
> +
> + if (!enable) {
> + kfree(si_data);
> + return 0;
> + }
> +
> + /* Enable the entry overwrite again incase space flushed by hardware */
> + memset(&cbd, 0, sizeof(cbd));
> +
> + cbd.index = cpu_to_le16((u16)sid->index);
> + cbd.cmd = 0;
> + cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
> + cbd.status_flags = 0;
> +
> + si_conf->en = 0x80;
> + si_conf->stream_handle = cpu_to_le32(sid->handle);
> + si_conf->iports = 1 << enetc_get_port(priv);
> + si_conf->id_type = sid->filtertype;
> + si_conf->oui[2] = 0x0;
> + si_conf->oui[1] = 0x80;
> + si_conf->oui[0] = 0xC2;
> +
> + memset(si_data, 0, data_size);
> +
> + cbd.length = cpu_to_le16(data_size);
> +
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> +
> + /* VIDM default to be 1.
> + * VID Match. If set (b1) then the VID must match, otherwise
> + * any VID is considered a match. VIDM setting is only used
> + * when TG is set to b01.
> + */
> + if (si_conf->id_type == STREAMID_TYPE_NULL) {
> + ether_addr_copy(si_data->dmac, sid->dst_mac);
> + si_data->vid_vidm_tg =
> + cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
> + ((((u16)(sid->tagged) & 0x3) << 14)
> + | ENETC_CBDR_SID_VIDM));
> + } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
> + ether_addr_copy(si_data->smac, sid->src_mac);
> + si_data->vid_vidm_tg =
> + cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
> + ((((u16)(sid->tagged) & 0x3) << 14)
> + | ENETC_CBDR_SID_VIDM));
> + }
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> + kfree(si_data);
> +
> + return err;
> +}
> +
> +/* Stream Filter Instance Set Descriptor */
> +static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_psfp_filter *sfi,
> + u8 enable)
> +{
> + struct enetc_cbd cbd = {.cmd = 0};
> + struct sfi_conf *sfi_config;
> +
> + cbd.index = cpu_to_le16(sfi->index);
> + cbd.cls = BDCR_CMD_STREAM_FILTER;
> + cbd.status_flags = 0x80;
> + cbd.length = cpu_to_le16(1);
> +
> + sfi_config = &cbd.sfi_conf;
> + if (!enable)
> + goto exit;
> +
> + sfi_config->en = 0x80;
> +
> + if (sfi->handle >= 0) {
> + sfi_config->stream_handle =
> + cpu_to_le32(sfi->handle);
> + sfi_config->sthm |= 0x80;
> + }
> +
> + sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
> + sfi_config->input_ports = 1 << enetc_get_port(priv);
> +
> + /* The priority value which may be matched against the
> + * frame’s priority value to determine a match for this entry.
> + */
> + if (sfi->prio >= 0)
> + sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
> +
> + /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
> + * field as being either an MSDU value or an index into the Flow
> + * Meter Instance table.
> + * TODO: no limit max sdu
> + */
> +
> + if (sfi->meter_id >= 0) {
> + sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
> + sfi_config->multi |= 0x80;
> + }
> +
> +exit:
> + return enetc_send_cmd(priv->si, &cbd);
> +}
> +
> +static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
> + u32 index,
> + struct psfp_streamfilter_counters *cnt)
> +{
> + struct enetc_cbd cbd = { .cmd = 2 };
> + struct sfi_counter_data *data_buf;
> + dma_addr_t dma;
> + u16 data_size;
> + int err;
> +
> + cbd.index = cpu_to_le16((u16)index);
> + cbd.cmd = 2;
> + cbd.cls = BDCR_CMD_STREAM_FILTER;
> + cbd.status_flags = 0;
> +
> + data_size = sizeof(struct sfi_counter_data);
> + data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> + if (!data_buf)
> + return -ENOMEM;
> +
> + dma = dma_map_single(&priv->si->pdev->dev, data_buf,
> + data_size, DMA_FROM_DEVICE);
> + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> + err = -ENOMEM;
> + goto exit;
> + }
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> +
> + cbd.length = cpu_to_le16(data_size);
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> + if (err)
> + goto exit;
> +
> + cnt->matching_frames_count =
> + ((u64)le32_to_cpu(data_buf->matchh) << 32)
> + + data_buf->matchl;
> +
> + cnt->not_passing_sdu_count =
> + ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
> + + data_buf->msdu_dropl;
> +
> + cnt->passing_sdu_count = cnt->matching_frames_count
> + - cnt->not_passing_sdu_count;
> +
> + cnt->not_passing_frames_count =
> + ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
> + + le32_to_cpu(data_buf->stream_gate_dropl);
> +
> + cnt->passing_frames_count = cnt->matching_frames_count
> + - cnt->not_passing_sdu_count
> + - cnt->not_passing_frames_count;
> +
> + cnt->red_frames_count =
> + ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
> + + le32_to_cpu(data_buf->flow_meter_dropl);
> +
> +exit:
> + kfree(data_buf);
> + return err;
> +}
> +
> +static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
> +{
> + u64 now_lo, now_hi, now, n;
> +
> + now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
> + now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
> + now = now_lo | now_hi << 32;
> +
> + if (WARN_ON(!cycle))
> + return -EFAULT;
> +
> + n = div64_u64(now, cycle);
> +
> + *start = (n + 1) * cycle;
> +
> + return 0;
> +}
> +
> +/* Stream Gate Instance Set Descriptor */
> +static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_psfp_gate *sgi,
> + u8 enable)
> +{
> + struct enetc_cbd cbd = { .cmd = 0 };
> + struct sgi_table *sgi_config;
> + struct sgcl_conf *sgcl_config;
> + struct sgcl_data *sgcl_data;
> + struct sgce *sgce;
> + dma_addr_t dma;
> + u16 data_size;
> + int err, i;
> +
> + cbd.index = cpu_to_le16(sgi->index);
> + cbd.cmd = 0;
> + cbd.cls = BDCR_CMD_STREAM_GCL;
> + cbd.status_flags = 0x80;
> +
> + /* disable */
> + if (!enable)
> + return enetc_send_cmd(priv->si, &cbd);
> +
> + /* enable */
> + sgi_config = &cbd.sgi_table;
> +
> + /* Keep open before gate list start */
> + sgi_config->ocgtst = 0x80;
> +
> + sgi_config->oipv = (sgi->init_ipv < 0) ?
> + 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
> +
> + sgi_config->en = 0x80;
> +
> + /* Basic config */
> + err = enetc_send_cmd(priv->si, &cbd);
> + if (err)
> + return -EINVAL;
> +
> + if (!sgi->num_entries)
> + return 0;
> +
> + if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
> + !sgi->cycletime)
> + return -EINVAL;

You already check this when "sgi" is created, so this check doesn't
seem necessary. If it is needed, it should be moved to the top of the
function.

> +
> + memset(&cbd, 0, sizeof(cbd));
> +
> + cbd.index = cpu_to_le16(sgi->index);
> + cbd.cmd = 1;
> + cbd.cls = BDCR_CMD_STREAM_GCL;
> + cbd.status_flags = 0;
> +
> + sgcl_config = &cbd.sgcl_conf;
> +
> + sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
> +
> + data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
> +
> + sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> + if (!sgcl_data)
> + return -ENOMEM;
> +
> + cbd.length = cpu_to_le16(data_size);
> +
> + dma = dma_map_single(&priv->si->pdev->dev,
> + sgcl_data, data_size,
> + DMA_FROM_DEVICE);
> + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> + kfree(sgcl_data);
> + return -ENOMEM;
> + }
> +
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> +
> + sgce = &sgcl_data->sgcl[0];
> +
> + sgcl_config->agtst = 0x80;
> +
> + sgcl_data->ct = cpu_to_le32(sgi->cycletime);
> + sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
> +
> + if (sgi->init_ipv >= 0)
> + sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
> +
> + for (i = 0; i < sgi->num_entries; i++) {
> + struct action_gate_entry *from = &sgi->entries[i];
> + struct sgce *to = &sgce[i];
> +
> + if (from->gate_state)
> + to->multi |= 0x10;
> +
> + if (from->ipv >= 0)
> + to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
> +
> + if (from->maxoctets)
> + to->multi |= 0x01;
> +
> + to->interval = cpu_to_le32(from->interval);
> + to->msdu[0] = from->maxoctets & 0xFF;
> + to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
> + to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
> + }
> +
> + /* If basetime is 0, calculate start time */
> + if (!sgi->basetime) {
> + u64 start;
> +
> + err = get_start_ns(priv, sgi->cycletime, &start);
> + if (err)
> + goto exit;
> + sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
> + sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
> + } else {
> + u32 hi, lo;
> +
> + hi = upper_32_bits(sgi->basetime);
> + lo = lower_32_bits(sgi->basetime);
> + sgcl_data->bth = cpu_to_le32(hi);
> + sgcl_data->btl = cpu_to_le32(lo);
> + }
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> +
> +exit:
> + kfree(sgcl_data);
> +
> + return err;
> +}
> +
> +static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
> +{
> + struct enetc_stream_filter *f;
> +
> + hlist_for_each_entry(f, &epsfp.stream_list, node)
> + if (f->sid.index == index)
> + return f;
> +
> + return NULL;
> +}
> +
> +static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
> +{
> + struct enetc_psfp_gate *g;
> +
> + hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
> + if (g->index == index)
> + return g;
> +
> + return NULL;
> +}
> +
> +static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
> +{
> + struct enetc_psfp_filter *s;
> +
> + hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
> + if (s->index == index)
> + return s;
> +
> + return NULL;
> +}
> +
> +static struct enetc_psfp_filter
> + *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
> +{
> + struct enetc_psfp_filter *s;
> +
> + hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
> + if (s->gate_id == sfi->gate_id &&
> + s->prio == sfi->prio &&
> + s->meter_id == sfi->meter_id)
> + return s;
> +
> + return NULL;
> +}
> +
> +static int enetc_get_free_index(struct enetc_ndev_priv *priv)
> +{
> + u32 max_size = priv->psfp_cap.max_psfp_filter;
> + unsigned long index;
> +
> + index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
> + if (index == max_size)
> + return -1;
> +
> + return index;
> +}
> +
> +static void reduce_ref_sfi(struct enetc_ndev_priv *priv, u32 index)

Please use the more usual idioms, something like "sfi_unref()" or
"stream_filter_unref".

> +{
> + struct enetc_psfp_filter *sfi;
> +
> + sfi = enetc_get_filter_by_index(index);
> + WARN_ON(!sfi);
> + sfi->refcount--;
> +
> + if (!sfi->refcount) {
> + enetc_streamfilter_hw_set(priv, sfi, false);
> + hlist_del(&sfi->node);
> + kfree(sfi);
> + clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
> + }
> +}
> +
> +static void reduce_ref_sgi(struct enetc_ndev_priv *priv, u32 index)
> +{
> + struct enetc_psfp_gate *sgi;
> +
> + sgi = enetc_get_gate_by_index(index);
> + WARN_ON(!sgi);
> + sgi->refcount--;
> +
> + if (!sgi->refcount) {
> + enetc_streamgate_hw_set(priv, sgi, false);
> + hlist_del(&sgi->node);
> + kfree(sgi);
> + }
> +}
> +
> +static void remove_one_chain(struct enetc_ndev_priv *priv,
> + struct enetc_stream_filter *filter)
> +{
> + reduce_ref_sgi(priv, filter->sgi_index);
> + reduce_ref_sfi(priv, filter->sfi_index);
> +
> + hlist_del(&filter->node);
> + kfree(filter);
> +}
> +
> +static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_streamid *sid,
> + struct enetc_psfp_filter *sfi,
> + struct enetc_psfp_gate *sgi)
> +{
> + int err;
> +
> + err = enetc_streamid_hw_set(priv, sid, true);
> + if (err)
> + return err;
> +
> + if (sfi) {
> + err = enetc_streamfilter_hw_set(priv, sfi, true);
> + if (err)
> + goto revert_sid;
> + }
> +
> + err = enetc_streamgate_hw_set(priv, sgi, true);
> + if (err)
> + goto revert_sfi;
> +
> + return 0;
> +
> +revert_sfi:
> + if (sfi && !sfi->refcount)
> + enetc_streamfilter_hw_set(priv, sfi, false);
> +revert_sid:
> + enetc_streamid_hw_set(priv, sid, false);
> + return err;
> +}
> +
> +struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
> + if (acts == enetc_act_fwd[i].actions &&
> + inputkeys & enetc_act_fwd[i].keys)
> + return &enetc_act_fwd[i];
> +
> + return NULL;
> +}
> +
> +static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + struct flow_rule *rule = flow_cls_offload_flow_rule(f);
> + struct netlink_ext_ack *extack = f->common.extack;
> + struct enetc_stream_filter *filter, *old_filter;
> + struct enetc_psfp_filter *sfi, *old_sfi;
> + struct enetc_psfp_gate *sgi, *old_sgi;
> + struct flow_action_entry *entry;
> + struct action_gate_entry *e;
> + u8 sfi_overwrite = 0;
> + int entries_size;
> + int i, err;
> +
> + if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
> + return -ENOSPC;
> + }
> +
> + flow_action_for_each(i, entry, &rule->action)
> + if (entry->id == FLOW_ACTION_GATE)
> + break;
> +
> + if (entry->id != FLOW_ACTION_GATE)
> + return -EINVAL;
> +
> + filter = kzalloc(sizeof(*filter), GFP_KERNEL);
> + if (!filter)
> + return -ENOMEM;
> +
> + filter->sid.index = f->common.chain_index;
> +
> + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
> + struct flow_match_eth_addrs match;
> +
> + flow_rule_match_eth_addrs(rule, &match);
> +
> + if (!is_zero_ether_addr(match.mask->dst)) {
> + ether_addr_copy(filter->sid.dst_mac, match.key->dst);
> + filter->sid.filtertype = STREAMID_TYPE_NULL;
> + }
> +
> + if (!is_zero_ether_addr(match.mask->src)) {
> + ether_addr_copy(filter->sid.src_mac, match.key->src);
> + filter->sid.filtertype = STREAMID_TYPE_SMAC;
> + }
> + } else {
> + NL_SET_ERR_MSG_MOD(extack, "Unsupported, must ETH_ADDRS");
> + return -EINVAL;
> + }
> +
> + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
> + struct flow_match_vlan match;
> +
> + flow_rule_match_vlan(rule, &match);
> + if (match.mask->vlan_priority) {
> + if (match.mask->vlan_priority !=
> + (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
> + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
> + err = -EINVAL;
> + goto free_filter;
> + }
> + }
> +
> + if (match.mask->vlan_tpid) {
> + if (match.mask->vlan_tpid != VLAN_VID_MASK) {
> + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
> + err = -EINVAL;
> + goto free_filter;
> + }
> +
> + filter->sid.vid = match.key->vlan_tpid;
> + if (!filter->sid.vid)
> + filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
> + else
> + filter->sid.tagged = STREAMID_VLAN_TAGGED;
> + }
> + } else {
> + filter->sid.tagged = STREAMID_VLAN_ALL;
> + }
> +
> + /* parsing gate action */
> + if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
> + err = -ENOSPC;
> + goto free_filter;
> + }
> +
> + if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
> + err = -ENOSPC;
> + goto free_filter;
> + }
> +
> + entries_size = struct_size(sgi, entries, entry->gate.num_entries);
> + sgi = kzalloc(entries_size, GFP_KERNEL);
> + if (!sgi) {
> + err = -ENOMEM;
> + goto free_filter;
> + }
> +
> + sgi->index = entry->gate.index;
> + sgi->init_ipv = entry->gate.prio;
> + sgi->basetime = entry->gate.basetime;
> + sgi->cycletime = entry->gate.cycletime;
> + sgi->num_entries = entry->gate.num_entries;
> +
> + e = sgi->entries;
> + for (i = 0; i < entry->gate.num_entries; i++) {
> + e[i].gate_state = entry->gate.entries[i].gate_state;
> + e[i].interval = entry->gate.entries[i].interval;
> + e[i].ipv = entry->gate.entries[i].ipv;
> + e[i].maxoctets = entry->gate.entries[i].maxoctets;
> + }
> +
> + filter->sgi_index = sgi->index;
> +
> + sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
> + if (!sfi) {
> + err = -ENOMEM;
> + goto free_gate;
> + }
> +
> + sfi->gate_id = sgi->index;
> +
> + /* flow meter not support yet */
> + sfi->meter_id = ENETC_PSFP_WILDCARD;
> +
> + /* prio ref the filter prio */
> + if (f->common.prio && f->common.prio <= BIT(3))
> + sfi->prio = f->common.prio - 1;
> + else
> + sfi->prio = ENETC_PSFP_WILDCARD;
> +
> + old_sfi = enetc_psfp_check_sfi(sfi);
> + if (!old_sfi) {
> + int index;
> +
> + index = enetc_get_free_index(priv);
> + if (sfi->handle < 0) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
> + err = -ENOSPC;
> + goto free_sfi;
> + }
> +
> + sfi->index = index;
> + sfi->handle = index + HANDLE_OFFSET;
> + /* Update the stream filter handle also */
> + filter->sid.handle = sfi->handle;
> + filter->sfi_index = sfi->index;
> + sfi_overwrite = 0;
> + } else {
> + filter->sfi_index = old_sfi->index;
> + filter->sid.handle = old_sfi->handle;
> + sfi_overwrite = 1;
> + }
> +
> + err = enetc_psfp_hw_set(priv, &filter->sid,
> + sfi_overwrite ? NULL : sfi, sgi);
> + if (err)
> + goto free_sfi;
> +
> + spin_lock(&epsfp.psfp_lock);
> + old_sgi = enetc_get_gate_by_index(filter->sgi_index);
> + if (old_sgi) {
> + sgi->refcount = old_sgi->refcount;
> + hlist_del(&old_sgi->node);
> + kfree(old_sgi);

I don't understand what you are trying to achieve here. But there should
be a cleaner way.

> + }
> +
> + sgi->refcount++;
> + hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
> +
> + if (!old_sfi) {
> + hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
> + set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
> + sfi->refcount++;
> + } else {
> + kfree(sfi);
> + old_sfi->refcount++;
> + }
> +
> + old_filter = enetc_get_stream_by_index(filter->sid.index);
> + if (old_filter)
> + remove_one_chain(priv, old_filter);
> +
> + filter->stats.lastused = jiffies;
> + hlist_add_head(&filter->node, &epsfp.stream_list);
> +
> + spin_unlock(&epsfp.psfp_lock);
> +
> + return 0;
> +
> +free_sfi:
> + kfree(sfi);
> +free_gate:
> + kfree(sgi);
> +free_filter:
> + kfree(filter);
> +
> + return err;
> +}
> +
> +static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *cls_flower)
> +{
> + struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
> + struct netlink_ext_ack *extack = cls_flower->common.extack;
> + struct flow_dissector *dissector = rule->match.dissector;
> + struct flow_action *action = &rule->action;
> + struct flow_action_entry *entry;
> + struct actions_fwd *fwd;
> + u64 actions = 0;
> + int i, err;
> +
> + if (!flow_action_has_entries(action)) {
> + NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
> + return -EINVAL;
> + }
> +
> + flow_action_for_each(i, entry, action)
> + actions |= BIT(entry->id);
> +
> + fwd = enetc_check_flow_actions(actions, dissector->used_keys);
> + if (!fwd) {
> + NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
> + return -EOPNOTSUPP;
> + }
> +
> + if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
> + err = enetc_psfp_parse_clsflower(priv, cls_flower);
> + if (err) {
> + NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
> + return err;
> + }
> + } else {
> + NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> +static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + struct enetc_stream_filter *filter;
> + struct netlink_ext_ack *extack = f->common.extack;
> + int err;
> +
> + if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
> + return -ENOSPC;
> + }
> +
> + filter = enetc_get_stream_by_index(f->common.chain_index);
> + if (!filter)
> + return -EINVAL;
> +
> + err = enetc_streamid_hw_set(priv, &filter->sid, false);
> + if (err)
> + return err;
> +
> + remove_one_chain(priv, filter);
> +
> + return 0;
> +}
> +
> +static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + return enetc_psfp_destroy_clsflower(priv, f);
> +}
> +
> +static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + struct psfp_streamfilter_counters counters = {};
> + struct enetc_stream_filter *filter;
> + struct flow_stats stats = {};
> + int err;
> +
> + filter = enetc_get_stream_by_index(f->common.chain_index);
> + if (!filter)
> + return -EINVAL;
> +
> + err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
> + if (err)
> + return -EINVAL;
> +
> + spin_lock(&epsfp.psfp_lock);
> + stats.pkts = counters.matching_frames_count - filter->stats.pkts;
> + stats.dropped = counters.not_passing_frames_count -
> + filter->stats.dropped;
> + stats.lastused = filter->stats.lastused;
> + filter->stats.pkts += stats.pkts;
> + filter->stats.dropped += stats.dropped;
> +
> + spin_unlock(&epsfp.psfp_lock);
> +
> + flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
> + stats.dropped);
> +
> + return 0;
> +}
> +
> +static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *cls_flower)
> +{
> + switch (cls_flower->command) {
> + case FLOW_CLS_REPLACE:
> + return enetc_config_clsflower(priv, cls_flower);
> + case FLOW_CLS_DESTROY:
> + return enetc_destroy_clsflower(priv, cls_flower);
> + case FLOW_CLS_STATS:
> + return enetc_psfp_get_stats(priv, cls_flower);
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static inline void clean_psfp_sfi_bitmap(void)
> +{
> + bitmap_free(epsfp.psfp_sfi_bitmap);
> + epsfp.psfp_sfi_bitmap = NULL;
> +}
> +
> +static void clean_stream_list(void)
> +{
> + struct enetc_stream_filter *s;
> + struct hlist_node *tmp;
> +
> + hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
> + hlist_del(&s->node);
> + kfree(s);
> + }
> +}
> +
> +static void clean_sfi_list(void)
> +{
> + struct enetc_psfp_filter *sfi;
> + struct hlist_node *tmp;
> +
> + hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
> + hlist_del(&sfi->node);
> + kfree(sfi);
> + }
> +}
> +
> +static void clean_sgi_list(void)
> +{
> + struct enetc_psfp_gate *sgi;
> + struct hlist_node *tmp;
> +
> + hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
> + hlist_del(&sgi->node);
> + kfree(sgi);
> + }
> +}
> +
> +static void clean_psfp_all(void)
> +{
> + /* Disable all list nodes and free all memory */
> + clean_sfi_list();
> + clean_sgi_list();
> + clean_stream_list();
> + epsfp.dev_bitmap = 0;
> + clean_psfp_sfi_bitmap();
> +}
> +
> +int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
> + void *cb_priv)
> +{
> + struct net_device *ndev = cb_priv;
> +
> + if (!tc_can_offload(ndev))
> + return -EOPNOTSUPP;
> +
> + switch (type) {
> + case TC_SETUP_CLSFLOWER:
> + return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +int enetc_psfp_init(struct enetc_ndev_priv *priv)
> +{
> + if (epsfp.psfp_sfi_bitmap)
> + return 0;
> +
> + epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
> + GFP_KERNEL);
> + if (!epsfp.psfp_sfi_bitmap)
> + return -ENOMEM;
> +
> + spin_lock_init(&epsfp.psfp_lock);
> +
> + if (list_empty(&enetc_block_cb_list))
> + epsfp.dev_bitmap = 0;
> +
> + return 0;
> +}
> +
> +int enetc_psfp_clean(struct enetc_ndev_priv *priv)
> +{
> + if (!list_empty(&enetc_block_cb_list))
> + return -EBUSY;
> +
> + clean_psfp_all();
> +
> + return 0;
> +}
> +
> +int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
> +{
> + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> + struct flow_block_offload *f = type_data;
> + int err;
> +
> + err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
> + enetc_setup_tc_block_cb,
> + ndev, ndev, true);
> + if (err)
> + return err;
> +
> + switch (f->command) {
> + case FLOW_BLOCK_BIND:
> + set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
> + break;
> + case FLOW_BLOCK_UNBIND:
> + clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
> + if (!epsfp.dev_bitmap)
> + clean_psfp_all();
> + break;
> + }
> +
> + return 0;
> +}
> --
> 2.17.1
>

--
Vinicius

2020-03-13 06:04:10

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [RFC,net-next 5/9] net: enetc: add tc flower psfp offload driver

Hi Vinicius

> -----Original Message-----
> From: Vinicius Costa Gomes <[email protected]>
> Sent: March 13, 2020 6:14

> > struct enetc_cbd {
> > union{
>
> Wrong identation here. Should be fixed on a separated patch.

Thanks.

>
> > static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
> > {
> > @@ -108,13 +110,13 @@ static int enetc_setup_taprio(struct net_device *ndev,
> > gcl_data->cte = cpu_to_le32(admin_conf->cycle_time_extension);
> >
> > for (i = 0; i < gcl_len; i++) {
> > - struct tc_taprio_sched_entry *temp_entry;
> > + struct tc_taprio_sched_entry *to;
> > struct gce *temp_gce = gce + i;
> >
> > - temp_entry = &admin_conf->entries[i];
> > + to = &admin_conf->entries[i];
> >
> > - temp_gce->gate = (u8)temp_entry->gate_mask;
> > - temp_gce->period = cpu_to_le32(temp_entry->interval);
> > + temp_gce->gate = (u8)to->gate_mask;
> > + temp_gce->period = cpu_to_le32(to->interval);
>
> These changes seem unrelated to the series.

Thanks, these were accidental changes from a search-and-replace; I will drop them.

>
> > }
> >
> > cbd.length = cpu_to_le16(data_size);
> > @@ -331,3 +333,1071 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)
> >
> > return 0;
> > }
> > +
> > +enum streamid_type {
> > + STREAMID_TYPE_RESERVED = 0,
> > + STREAMID_TYPE_NULL,
> > + STREAMID_TYPE_SMAC,
> > +};
> > +
> > +enum streamid_vlan_tagged {
> > + STREAMID_VLAN_RESERVED = 0,
> > + STREAMID_VLAN_TAGGED,
> > + STREAMID_VLAN_UNTAGGED,
> > + STREAMID_VLAN_ALL,
> > +};
> > +
> > +#define ENETC_PSFP_WILDCARD -1
> > +#define HANDLE_OFFSET 100
> > +
> > +enum forward_type {
> > + FILTER_ACTION_TYPE_PSFP = BIT(0),
> > + FILTER_ACTION_TYPE_ACL = BIT(1),
> > + FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
> > +};
> > +
> > +/* This is for limit output type for input actions */
> > +struct actions_fwd {
> > + u64 actions;
> > + u64 keys; /* include the must needed keys */
> > + enum forward_type output;
> > +};
> > +
> > +struct psfp_streamfilter_counters {
> > + u64 matching_frames_count;
> > + u64 passing_frames_count;
> > + u64 not_passing_frames_count;
> > + u64 passing_sdu_count;
> > + u64 not_passing_sdu_count;
> > + u64 red_frames_count;
> > +};
> > +
> > +struct enetc_streamid {
> > + u32 index;
> > + union {
> > + u8 src_mac[6];
> > + u8 dst_mac[6];
> > + };
> > + u8 filtertype;
> > + u16 vid;
> > + u8 tagged;
> > + s32 handle;
> > +};
> > +
> > +struct enetc_psfp_filter {
> > + u32 index;
> > + s32 handle;
> > + s8 prio;
> > + u32 gate_id;
> > + s32 meter_id;
> > + u32 refcount;
>
> It may be more appropriate to use <linux/refcount.h> instead of rolling
> your own.

OK, will try. Thanks.

>
> > + struct hlist_node node;
> > +};
> > +
> > +struct enetc_psfp_gate {
> > + u32 index;
> > + s8 init_ipv;
> > + u64 basetime;
> > + u64 cycletime;
> > + u64 cycletimext;
> > + u32 num_entries;
> > + u32 refcount;
> > + struct hlist_node node;
> > + struct action_gate_entry entries[0];
> > +};
> > +
> > +struct enetc_stream_filter {
> > + struct enetc_streamid sid;
> > + u32 sfi_index;
> > + u32 sgi_index;
> > + struct flow_stats stats;
> > + struct hlist_node node;
> > +};
> > +
> > +struct enetc_psfp {
> > + unsigned long dev_bitmap;
> > + unsigned long *psfp_sfi_bitmap;
> > + struct hlist_head stream_list;
> > + struct hlist_head psfp_filter_list;
> > + struct hlist_head psfp_gate_list;
> > + spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
> > +};
> > +
> > +struct actions_fwd enetc_act_fwd[] = {
> > + {
> > + BIT(FLOW_ACTION_GATE),
> > + BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
> > + FILTER_ACTION_TYPE_PSFP
> > + },
> > + /* example for ACL actions */
> > + {
> > + BIT(FLOW_ACTION_DROP),
> > + 0,
> > + FILTER_ACTION_TYPE_ACL
> > + }
> > +};
> > +
> > +static struct enetc_psfp epsfp = {
> > + .psfp_sfi_bitmap = NULL,
> > +};
>
> Is it possible to have more than one these controllers in the same
> system? If it's possible to have more than one, this should be moved to
> live under a per-device struct.
>

There is only one Qci block per controller (here, one ENETC controller with multiple ports). Each stream gate/flow meter entry can be assigned to any enetc port,
so all the enetc ports share this structure. psfp_sfi_bitmap tracks which entries are occupied.
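To make the sharing scheme above concrete, here is a small userspace model of the occupancy bitmap: every port allocates stream-filter indices from the same bitmap, so a set bit means "entry in use by some port". The names mirror the driver but the capability value and implementation are illustrative assumptions, not the kernel code (which uses `bitmap_zalloc()`/`find_first_zero_bit()`).

```c
#include <assert.h>
#include <limits.h>

/* Assumed capability value, for illustration only. */
#define MAX_PSFP_FILTER 76
#define BITS_PER_LONG_ (sizeof(unsigned long) * CHAR_BIT)
#define BITMAP_WORDS ((MAX_PSFP_FILTER + BITS_PER_LONG_ - 1) / BITS_PER_LONG_)

/* One bitmap shared by all ports of the controller. */
static unsigned long sfi_bitmap[BITMAP_WORDS];

/* Mimics find_first_zero_bit() + set_bit(): returns the first free
 * stream-filter index, or -1 when the table is exhausted. */
static int get_free_index(void)
{
	for (int i = 0; i < MAX_PSFP_FILTER; i++) {
		unsigned long *word = &sfi_bitmap[i / BITS_PER_LONG_];
		unsigned long mask = 1UL << (i % BITS_PER_LONG_);

		if (!(*word & mask)) {
			*word |= mask;
			return i;
		}
	}
	return -1;
}

/* Mimics clear_bit(): releases an index back to the shared pool. */
static void put_index(int i)
{
	sfi_bitmap[i / BITS_PER_LONG_] &= ~(1UL << (i % BITS_PER_LONG_));
}
```

Because the pool is shared, a release from one port immediately makes the index available to any other port.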

> > +
> > +static LIST_HEAD(enetc_block_cb_list);
> > +
> > +static inline int enetc_get_port(struct enetc_ndev_priv *priv)
> > +{
> > + return priv->si->pdev->devfn & 0x7;
> > +}
> > +
> > +/* Stream Identity Entry Set Descriptor */
> > +static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
> > + struct enetc_streamid *sid,
> > + u8 enable)
> > +{
> > + struct enetc_cbd cbd = {.cmd = 0};
> > + struct streamid_data *si_data;
> > + struct streamid_conf *si_conf;
> > + u16 data_size;
> > + dma_addr_t dma;
> > + int err;
> > +
> > + if (sid->index >= priv->psfp_cap.max_streamid)
> > + return -EINVAL;
> > +
> > + if (sid->filtertype != STREAMID_TYPE_NULL &&
> > + sid->filtertype != STREAMID_TYPE_SMAC)
> > + return -EOPNOTSUPP;
> > +
> > + /* Disable operation before enable */
> > + cbd.index = cpu_to_le16((u16)sid->index);
> > + cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
> > + cbd.status_flags = 0;
> > +
> > + data_size = sizeof(struct streamid_data);
> > + si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> > + cbd.length = cpu_to_le16(data_size);
> > +
> > + dma = dma_map_single(&priv->si->pdev->dev, si_data,
> > + data_size, DMA_FROM_DEVICE);
> > + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> > + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> > + kfree(si_data);
> > + return -ENOMEM;
> > + }
> > +
> > + cbd.addr[0] = lower_32_bits(dma);
> > + cbd.addr[1] = upper_32_bits(dma);
> > + memset(si_data->dmac, 0xff, ETH_ALEN);
> > + si_data->vid_vidm_tg =
> > + cpu_to_le16(ENETC_CBDR_SID_VID_MASK
> > + + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
> > +
> > + si_conf = &cbd.sid_set;
> > + /* Only one port supported for one entry, set itself */
> > + si_conf->iports = 1 << enetc_get_port(priv);
> > + si_conf->id_type = 1;
> > + si_conf->oui[2] = 0x0;
> > + si_conf->oui[1] = 0x80;
> > + si_conf->oui[0] = 0xC2;
> > +
> > + err = enetc_send_cmd(priv->si, &cbd);
> > + if (err)
> > + return -EINVAL;
> > +
> > + if (!enable) {
> > + kfree(si_data);
> > + return 0;
> > + }
> > +
> > + /* Enable the entry overwrite again incase space flushed by hardware */
> > + memset(&cbd, 0, sizeof(cbd));
> > +
> > + cbd.index = cpu_to_le16((u16)sid->index);
> > + cbd.cmd = 0;
> > + cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
> > + cbd.status_flags = 0;
> > +
> > + si_conf->en = 0x80;
> > + si_conf->stream_handle = cpu_to_le32(sid->handle);
> > + si_conf->iports = 1 << enetc_get_port(priv);
> > + si_conf->id_type = sid->filtertype;
> > + si_conf->oui[2] = 0x0;
> > + si_conf->oui[1] = 0x80;
> > + si_conf->oui[0] = 0xC2;
> > +
> > + memset(si_data, 0, data_size);
> > +
> > + cbd.length = cpu_to_le16(data_size);
> > +
> > + cbd.addr[0] = lower_32_bits(dma);
> > + cbd.addr[1] = upper_32_bits(dma);
> > +
> > + /* VIDM default to be 1.
> > + * VID Match. If set (b1) then the VID must match, otherwise
> > + * any VID is considered a match. VIDM setting is only used
> > + * when TG is set to b01.
> > + */
> > + if (si_conf->id_type == STREAMID_TYPE_NULL) {
> > + ether_addr_copy(si_data->dmac, sid->dst_mac);
> > + si_data->vid_vidm_tg =
> > + cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
> > + ((((u16)(sid->tagged) & 0x3) << 14)
> > + | ENETC_CBDR_SID_VIDM));
> > + } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
> > + ether_addr_copy(si_data->smac, sid->src_mac);
> > + si_data->vid_vidm_tg =
> > + cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
> > + ((((u16)(sid->tagged) & 0x3) << 14)
> > + | ENETC_CBDR_SID_VIDM));
> > + }
> > +
> > + err = enetc_send_cmd(priv->si, &cbd);
> > + kfree(si_data);
> > +
> > + return err;
> > +}
> > +
> > +/* Stream Filter Instance Set Descriptor */
> > +static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
> > + struct enetc_psfp_filter *sfi,
> > + u8 enable)
> > +{
> > + struct enetc_cbd cbd = {.cmd = 0};
> > + struct sfi_conf *sfi_config;
> > +
> > + cbd.index = cpu_to_le16(sfi->index);
> > + cbd.cls = BDCR_CMD_STREAM_FILTER;
> > + cbd.status_flags = 0x80;
> > + cbd.length = cpu_to_le16(1);
> > +
> > + sfi_config = &cbd.sfi_conf;
> > + if (!enable)
> > + goto exit;
> > +
> > + sfi_config->en = 0x80;
> > +
> > + if (sfi->handle >= 0) {
> > + sfi_config->stream_handle =
> > + cpu_to_le32(sfi->handle);
> > + sfi_config->sthm |= 0x80;
> > + }
> > +
> > + sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
> > + sfi_config->input_ports = 1 << enetc_get_port(priv);
> > +
> > + /* The priority value which may be matched against the
> > + * frame’s priority value to determine a match for this entry.
> > + */
> > + if (sfi->prio >= 0)
> > + sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
> > +
> > + /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
> > + * field as being either an MSDU value or an index into the Flow
> > + * Meter Instance table.
> > + * TODO: no limit max sdu
> > + */
> > +
> > + if (sfi->meter_id >= 0) {
> > + sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
> > + sfi_config->multi |= 0x80;
> > + }
> > +
> > +exit:
> > + return enetc_send_cmd(priv->si, &cbd);
> > +}
> > +
> > +static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
> > + u32 index,
> > + struct psfp_streamfilter_counters *cnt)
> > +{
> > + struct enetc_cbd cbd = { .cmd = 2 };
> > + struct sfi_counter_data *data_buf;
> > + dma_addr_t dma;
> > + u16 data_size;
> > + int err;
> > +
> > + cbd.index = cpu_to_le16((u16)index);
> > + cbd.cmd = 2;
> > + cbd.cls = BDCR_CMD_STREAM_FILTER;
> > + cbd.status_flags = 0;
> > +
> > + data_size = sizeof(struct sfi_counter_data);
> > + data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> > + if (!data_buf)
> > + return -ENOMEM;
> > +
> > + dma = dma_map_single(&priv->si->pdev->dev, data_buf,
> > + data_size, DMA_FROM_DEVICE);
> > + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> > + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> > + err = -ENOMEM;
> > + goto exit;
> > + }
> > + cbd.addr[0] = lower_32_bits(dma);
> > + cbd.addr[1] = upper_32_bits(dma);
> > +
> > + cbd.length = cpu_to_le16(data_size);
> > +
> > + err = enetc_send_cmd(priv->si, &cbd);
> > + if (err)
> > + goto exit;
> > +
> > + cnt->matching_frames_count =
> > + ((u64)le32_to_cpu(data_buf->matchh) << 32)
> > + + data_buf->matchl;
> > +
> > + cnt->not_passing_sdu_count =
> > + ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
> > + + data_buf->msdu_dropl;
> > +
> > + cnt->passing_sdu_count = cnt->matching_frames_count
> > + - cnt->not_passing_sdu_count;
> > +
> > + cnt->not_passing_frames_count =
> > + ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
> > + + le32_to_cpu(data_buf->stream_gate_dropl);
> > +
> > + cnt->passing_frames_count = cnt->matching_frames_count
> > + - cnt->not_passing_sdu_count
> > + - cnt->not_passing_frames_count;
> > +
> > + cnt->red_frames_count =
> > + ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
> > + + le32_to_cpu(data_buf->flow_meter_dropl);
> > +
> > +exit:
> > + kfree(data_buf);
> > + return err;
> > +}
> > +
> > +static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
> > +{
> > + u64 now_lo, now_hi, now, n;
> > +
> > + now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
> > + now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
> > + now = now_lo | now_hi << 32;
> > +
> > + if (WARN_ON(!cycle))
> > + return -EFAULT;
> > +
> > + n = div64_u64(now, cycle);
> > +
> > + *start = (n + 1) * cycle;
> > +
> > + return 0;
> > +}
> > +
> > +/* Stream Gate Instance Set Descriptor */
> > +static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
> > + struct enetc_psfp_gate *sgi,
> > + u8 enable)
> > +{
> > + struct enetc_cbd cbd = { .cmd = 0 };
> > + struct sgi_table *sgi_config;
> > + struct sgcl_conf *sgcl_config;
> > + struct sgcl_data *sgcl_data;
> > + struct sgce *sgce;
> > + dma_addr_t dma;
> > + u16 data_size;
> > + int err, i;
> > +
> > + cbd.index = cpu_to_le16(sgi->index);
> > + cbd.cmd = 0;
> > + cbd.cls = BDCR_CMD_STREAM_GCL;
> > + cbd.status_flags = 0x80;
> > +
> > + /* disable */
> > + if (!enable)
> > + return enetc_send_cmd(priv->si, &cbd);
> > +
> > + /* enable */
> > + sgi_config = &cbd.sgi_table;
> > +
> > + /* Keep open before gate list start */
> > + sgi_config->ocgtst = 0x80;
> > +
> > + sgi_config->oipv = (sgi->init_ipv < 0) ?
> > + 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
> > +
> > + sgi_config->en = 0x80;
> > +
> > + /* Basic config */
> > + err = enetc_send_cmd(priv->si, &cbd);
> > + if (err)
> > + return -EINVAL;
> > +
> > + if (!sgi->num_entries)
> > + return 0;
> > +
> > + if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
> > + !sgi->cycletime)
> > + return -EINVAL;
>
> You already check this when "sgi" is created, this check doesn't seem to
> be needed. If it's needed, it should be moved to the top of the
> function.

Yes, it should be moved up, right after the disable operation, since the sgi entries are not set when disabling.
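The reordering being agreed on can be sketched as follows: the entry-count/cycletime validation only makes sense on the enable path, so it should run immediately after the disable early-return, before any gate-list state is consumed. This is a simplified userspace sketch with an assumed list-length limit, not the driver function itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_GATELIST 4 /* assumed max_psfp_gatelist capability */

struct gate_cfg {
	unsigned int num_entries;
	unsigned long long cycletime;
};

static int streamgate_set(const struct gate_cfg *sgi, bool enable)
{
	/* Disable path: entries are not populated, so no checks apply. */
	if (!enable)
		return 0;

	/* Enable path: validate up front, before touching the entries. */
	if (!sgi->num_entries)
		return 0; /* basic config only, no gate control list */
	if (sgi->num_entries > MAX_GATELIST || !sgi->cycletime)
		return -EINVAL;

	/* ... program the gate control list here ... */
	return 0;
}
```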

>
> > +
> > + memset(&cbd, 0, sizeof(cbd));
> > +
> > + cbd.index = cpu_to_le16(sgi->index);
> > + cbd.cmd = 1;
> > + cbd.cls = BDCR_CMD_STREAM_GCL;
> > + cbd.status_flags = 0;
> > +
> > + sgcl_config = &cbd.sgcl_conf;
> > +
> > + sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
> > +
> > + data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
> > +
> > + sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> > + if (!sgcl_data)
> > + return -ENOMEM;
> > +
> > + cbd.length = cpu_to_le16(data_size);
> > +
> > + dma = dma_map_single(&priv->si->pdev->dev,
> > + sgcl_data, data_size,
> > + DMA_FROM_DEVICE);
> > + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> > + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> > + kfree(sgcl_data);
> > + return -ENOMEM;
> > + }
> > +
> > + cbd.addr[0] = lower_32_bits(dma);
> > + cbd.addr[1] = upper_32_bits(dma);
> > +
> > + sgce = &sgcl_data->sgcl[0];
> > +
> > + sgcl_config->agtst = 0x80;
> > +
> > + sgcl_data->ct = cpu_to_le32(sgi->cycletime);
> > + sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
> > +
> > + if (sgi->init_ipv >= 0)
> > + sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
> > +
> > + for (i = 0; i < sgi->num_entries; i++) {
> > + struct action_gate_entry *from = &sgi->entries[i];
> > + struct sgce *to = &sgce[i];
> > +
> > + if (from->gate_state)
> > + to->multi |= 0x10;
> > +
> > + if (from->ipv >= 0)
> > + to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
> > +
> > + if (from->maxoctets)
> > + to->multi |= 0x01;
> > +
> > + to->interval = cpu_to_le32(from->interval);
> > + to->msdu[0] = from->maxoctets & 0xFF;
> > + to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
> > + to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
> > + }
> > +
> > + /* If basetime is 0, calculate start time */
> > + if (!sgi->basetime) {
> > + u64 start;
> > +
> > + err = get_start_ns(priv, sgi->cycletime, &start);
> > + if (err)
> > + goto exit;
> > + sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
> > + sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
> > + } else {
> > + u32 hi, lo;
> > +
> > + hi = upper_32_bits(sgi->basetime);
> > + lo = lower_32_bits(sgi->basetime);
> > + sgcl_data->bth = cpu_to_le32(hi);
> > + sgcl_data->btl = cpu_to_le32(lo);
> > + }
> > +
> > + err = enetc_send_cmd(priv->si, &cbd);
> > +
> > +exit:
> > + kfree(sgcl_data);
> > +
> > + return err;
> > +}
> > +
> > +static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
> > +{
> > + struct enetc_stream_filter *f;
> > +
> > + hlist_for_each_entry(f, &epsfp.stream_list, node)
> > + if (f->sid.index == index)
> > + return f;
> > +
> > + return NULL;
> > +}
> > +
> > +static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
> > +{
> > + struct enetc_psfp_gate *g;
> > +
> > + hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
> > + if (g->index == index)
> > + return g;
> > +
> > + return NULL;
> > +}
> > +
> > +static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
> > +{
> > + struct enetc_psfp_filter *s;
> > +
> > + hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
> > + if (s->index == index)
> > + return s;
> > +
> > + return NULL;
> > +}
> > +
> > +static struct enetc_psfp_filter
> > + *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
> > +{
> > + struct enetc_psfp_filter *s;
> > +
> > + hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
> > + if (s->gate_id == sfi->gate_id &&
> > + s->prio == sfi->prio &&
> > + s->meter_id == sfi->meter_id)
> > + return s;
> > +
> > + return NULL;
> > +}
> > +
> > +static int enetc_get_free_index(struct enetc_ndev_priv *priv)
> > +{
> > + u32 max_size = priv->psfp_cap.max_psfp_filter;
> > + unsigned long index;
> > +
> > + index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
> > + if (index == max_size)
> > + return -1;
> > +
> > + return index;
> > +}
> > +
> > +static void reduce_ref_sfi(struct enetc_ndev_priv *priv, u32 index)
>
> Please use the more usual idioms, something like "sfi_unref()" or
> "stream_filter_unref".

OK, makes sense.

>
> > +{
> > + struct enetc_psfp_filter *sfi;
> > +
> > + sfi = enetc_get_filter_by_index(index);
> > + WARN_ON(!sfi);
> > + sfi->refcount--;
> > +
> > + if (!sfi->refcount) {
> > + enetc_streamfilter_hw_set(priv, sfi, false);
> > + hlist_del(&sfi->node);
> > + kfree(sfi);
> > + clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
> > + }
> > +}
> > +
> > +static void reduce_ref_sgi(struct enetc_ndev_priv *priv, u32 index)
> > +{
> > + struct enetc_psfp_gate *sgi;
> > +
> > + sgi = enetc_get_gate_by_index(index);
> > + WARN_ON(!sgi);
> > + sgi->refcount--;
> > +
> > + if (!sgi->refcount) {
> > + enetc_streamgate_hw_set(priv, sgi, false);
> > + hlist_del(&sgi->node);
> > + kfree(sgi);
> > + }
> > +}
> > +
> > +static void remove_one_chain(struct enetc_ndev_priv *priv,
> > + struct enetc_stream_filter *filter)
> > +{
> > + reduce_ref_sgi(priv, filter->sgi_index);
> > + reduce_ref_sfi(priv, filter->sfi_index);
> > +
> > + hlist_del(&filter->node);
> > + kfree(filter);
> > +}
> > +
> > +static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
> > + struct enetc_streamid *sid,
> > + struct enetc_psfp_filter *sfi,
> > + struct enetc_psfp_gate *sgi)
> > +{
> > + int err;
> > +
> > + err = enetc_streamid_hw_set(priv, sid, true);
> > + if (err)
> > + return err;
> > +
> > + if (sfi) {
> > + err = enetc_streamfilter_hw_set(priv, sfi, true);
> > + if (err)
> > + goto revert_sid;
> > + }
> > +
> > + err = enetc_streamgate_hw_set(priv, sgi, true);
> > + if (err)
> > + goto revert_sfi;
> > +
> > + return 0;
> > +
> > +revert_sfi:
> > + if (sfi && !sfi->refcount)
> > + enetc_streamfilter_hw_set(priv, sfi, false);
> > +revert_sid:
> > + enetc_streamid_hw_set(priv, sid, false);
> > + return err;
> > +}
> > +
> > +struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
> > +{
> > + int i;
> > +
> > + for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
> > + if (acts == enetc_act_fwd[i].actions &&
> > + inputkeys & enetc_act_fwd[i].keys)
> > + return &enetc_act_fwd[i];
> > +
> > + return NULL;
> > +}
> > +
> > +static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
> > + struct flow_cls_offload *f)
> > +{
> > + struct flow_rule *rule = flow_cls_offload_flow_rule(f);
> > + struct netlink_ext_ack *extack = f->common.extack;
> > + struct enetc_stream_filter *filter, *old_filter;
> > + struct enetc_psfp_filter *sfi, *old_sfi;
> > + struct enetc_psfp_gate *sgi, *old_sgi;
> > + struct flow_action_entry *entry;
> > + struct action_gate_entry *e;
> > + u8 sfi_overwrite = 0;
> > + int entries_size;
> > + int i, err;
> > +
> > + if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
> > + NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
> > + return -ENOSPC;
> > + }
> > +
> > + flow_action_for_each(i, entry, &rule->action)
> > + if (entry->id == FLOW_ACTION_GATE)
> > + break;
> > +
> > + if (entry->id != FLOW_ACTION_GATE)
> > + return -EINVAL;
> > +
> > + filter = kzalloc(sizeof(*filter), GFP_KERNEL);
> > + if (!filter)
> > + return -ENOMEM;
> > +
> > + filter->sid.index = f->common.chain_index;
> > +
> > + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
> > + struct flow_match_eth_addrs match;
> > +
> > + flow_rule_match_eth_addrs(rule, &match);
> > +
> > + if (!is_zero_ether_addr(match.mask->dst)) {
> > + ether_addr_copy(filter->sid.dst_mac, match.key->dst);
> > + filter->sid.filtertype = STREAMID_TYPE_NULL;
> > + }
> > +
> > + if (!is_zero_ether_addr(match.mask->src)) {
> > + ether_addr_copy(filter->sid.src_mac, match.key->src);
> > + filter->sid.filtertype = STREAMID_TYPE_SMAC;
> > + }
> > + } else {
> > + NL_SET_ERR_MSG_MOD(extack, "Unsupported, must ETH_ADDRS");
> > + return -EINVAL;
> > + }
> > +
> > + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
> > + struct flow_match_vlan match;
> > +
> > + flow_rule_match_vlan(rule, &match);
> > + if (match.mask->vlan_priority) {
> > + if (match.mask->vlan_priority !=
> > + (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
> > + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
> > + err = -EINVAL;
> > + goto free_filter;
> > + }
> > + }
> > +
> > + if (match.mask->vlan_tpid) {
> > + if (match.mask->vlan_tpid != VLAN_VID_MASK) {
> > + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
> > + err = -EINVAL;
> > + goto free_filter;
> > + }
> > +
> > + filter->sid.vid = match.key->vlan_tpid;
> > + if (!filter->sid.vid)
> > + filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
> > + else
> > + filter->sid.tagged = STREAMID_VLAN_TAGGED;
> > + }
> > + } else {
> > + filter->sid.tagged = STREAMID_VLAN_ALL;
> > + }
> > +
> > + /* parsing gate action */
> > + if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
> > + NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
> > + err = -ENOSPC;
> > + goto free_filter;
> > + }
> > +
> > + if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
> > + NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
> > + err = -ENOSPC;
> > + goto free_filter;
> > + }
> > +
> > + entries_size = struct_size(sgi, entries, entry->gate.num_entries);
> > + sgi = kzalloc(entries_size, GFP_KERNEL);
> > + if (!sgi) {
> > + err = -ENOMEM;
> > + goto free_filter;
> > + }
> > +
> > + sgi->index = entry->gate.index;
> > + sgi->init_ipv = entry->gate.prio;
> > + sgi->basetime = entry->gate.basetime;
> > + sgi->cycletime = entry->gate.cycletime;
> > + sgi->num_entries = entry->gate.num_entries;
> > +
> > + e = sgi->entries;
> > + for (i = 0; i < entry->gate.num_entries; i++) {
> > + e[i].gate_state = entry->gate.entries[i].gate_state;
> > + e[i].interval = entry->gate.entries[i].interval;
> > + e[i].ipv = entry->gate.entries[i].ipv;
> > + e[i].maxoctets = entry->gate.entries[i].maxoctets;
> > + }
> > +
> > + filter->sgi_index = sgi->index;
> > +
> > + sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
> > + if (!sfi) {
> > + err = -ENOMEM;
> > + goto free_gate;
> > + }
> > +
> > + sfi->gate_id = sgi->index;
> > +
> > + /* flow meter not support yet */
> > + sfi->meter_id = ENETC_PSFP_WILDCARD;
> > +
> > + /* prio ref the filter prio */
> > + if (f->common.prio && f->common.prio <= BIT(3))
> > + sfi->prio = f->common.prio - 1;
> > + else
> > + sfi->prio = ENETC_PSFP_WILDCARD;
> > +
> > + old_sfi = enetc_psfp_check_sfi(sfi);
> > + if (!old_sfi) {
> > + int index;
> > +
> > + index = enetc_get_free_index(priv);
> > + if (sfi->handle < 0) {
> > + NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
> > + err = -ENOSPC;
> > + goto free_sfi;
> > + }
> > +
> > + sfi->index = index;
> > + sfi->handle = index + HANDLE_OFFSET;
> > + /* Update the stream filter handle also */
> > + filter->sid.handle = sfi->handle;
> > + filter->sfi_index = sfi->index;
> > + sfi_overwrite = 0;
> > + } else {
> > + filter->sfi_index = old_sfi->index;
> > + filter->sid.handle = old_sfi->handle;
> > + sfi_overwrite = 1;
> > + }
> > +
> > + err = enetc_psfp_hw_set(priv, &filter->sid,
> > + sfi_overwrite ? NULL : sfi, sgi);
> > + if (err)
> > + goto free_sfi;
> > +
> > + spin_lock(&epsfp.psfp_lock);
> > + old_sgi = enetc_get_gate_by_index(filter->sgi_index);
> > + if (old_sgi) {
> > + sgi->refcount = old_sgi->refcount;
> > + hlist_del(&old_sgi->node);
> > + kfree(old_sgi);
>
> I don't understand what you are trying to achieve here. But there should
> be a cleaner way.

The code here removes the old stream gate instance from the list. If an instance with the same index already exists, the new instance replaces the old settings, but the bind (reference) count must be carried over.
Since the new gate instance may differ from the one the old filter was created with, from the user's view setting a gate with an existing index replaces the old gate settings while the old filter keeps working.
I will check again whether this can be made clearer.
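In other words, replacing a gate instance that shares an index transfers the old instance's reference count to the new one, so filters already bound to that index stay valid. A minimal userspace sketch of that pattern (hypothetical types and names, not the driver code; a plain singly linked list stands in for the kernel hlist):

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical sketch: replace a gate instance by index while
 * preserving the bind (reference) count held by existing filters. */
struct gate_inst {
	int index;
	int refcount;
	struct gate_inst *next;	/* stands in for the hlist linkage */
};

/* Unlink and return the instance with a matching index, if any. */
static struct gate_inst *gate_take(struct gate_inst **head, int index)
{
	struct gate_inst **pp, *g;

	for (pp = head; (g = *pp) != NULL; pp = &g->next) {
		if (g->index == index) {
			*pp = g->next;
			return g;
		}
	}
	return NULL;
}

/* Insert a new instance, inheriting the old one's refcount. */
static void gate_replace(struct gate_inst **head, struct gate_inst *new)
{
	struct gate_inst *old = gate_take(head, new->index);

	if (old) {
		new->refcount = old->refcount;	/* old filters stay bound */
		free(old);
	}
	new->refcount++;			/* the new filter's reference */
	new->next = *head;
	*head = new;
}
```

The point is only the refcount transfer: the gate settings are swapped wholesale, the binding survives.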

>
> > + }
> > +
> > + sgi->refcount++;
> > + hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
> > +
> > + if (!old_sfi) {
> > + hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
> > + set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
> > + sfi->refcount++;
> > + } else {
> > + kfree(sfi);
> > + old_sfi->refcount++;
> > + }
> > +
> > + old_filter = enetc_get_stream_by_index(filter->sid.index);
> > + if (old_filter)
> > + remove_one_chain(priv, old_filter);
> > +
> > + filter->stats.lastused = jiffies;
> > + hlist_add_head(&filter->node, &epsfp.stream_list);
> > +
> > + spin_unlock(&epsfp.psfp_lock);
> > +
> > + return 0;
> > +
> > +free_sfi:
> > + kfree(sfi);
> > +free_gate:
> > + kfree(sgi);
> > +free_filter:
> > + kfree(filter);
> > +
> > + return err;
> > +}
> > +
> > +static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
> > + struct flow_cls_offload *cls_flower)
> > +{
> > + struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
> > + struct netlink_ext_ack *extack = cls_flower->common.extack;
> > + struct flow_dissector *dissector = rule->match.dissector;
> > + struct flow_action *action = &rule->action;
> > + struct flow_action_entry *entry;
> > + struct actions_fwd *fwd;
> > + u64 actions = 0;
> > + int i, err;
> > +
> > + if (!flow_action_has_entries(action)) {
> > + NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
> > + return -EINVAL;
> > + }
> > +
> > + flow_action_for_each(i, entry, action)
> > + actions |= BIT(entry->id);
> > +
> > + fwd = enetc_check_flow_actions(actions, dissector->used_keys);
> > + if (!fwd) {
> > + NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
> > + err = enetc_psfp_parse_clsflower(priv, cls_flower);
> > + if (err) {
> > + NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
> > + return err;
> > + }
> > + } else {
> > + NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
> > + return -EOPNOTSUPP;
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
> > + struct flow_cls_offload *f)
> > +{
> > + struct enetc_stream_filter *filter;
> > + struct netlink_ext_ack *extack = f->common.extack;
> > + int err;
> > +
> > + if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
> > + NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
> > + return -ENOSPC;
> > + }
> > +
> > + filter = enetc_get_stream_by_index(f->common.chain_index);
> > + if (!filter)
> > + return -EINVAL;
> > +
> > + err = enetc_streamid_hw_set(priv, &filter->sid, false);
> > + if (err)
> > + return err;
> > +
> > + remove_one_chain(priv, filter);
> > +
> > + return 0;
> > +}
> > +
> > +static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
> > + struct flow_cls_offload *f)
> > +{
> > + return enetc_psfp_destroy_clsflower(priv, f);
> > +}
> > +
> > +static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
> > + struct flow_cls_offload *f)
> > +{
> > + struct psfp_streamfilter_counters counters = {};
> > + struct enetc_stream_filter *filter;
> > + struct flow_stats stats = {};
> > + int err;
> > +
> > + filter = enetc_get_stream_by_index(f->common.chain_index);
> > + if (!filter)
> > + return -EINVAL;
> > +
> > + err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
> > + if (err)
> > + return -EINVAL;
> > +
> > + spin_lock(&epsfp.psfp_lock);
> > + stats.pkts = counters.matching_frames_count - filter->stats.pkts;
> > + stats.dropped = counters.not_passing_frames_count -
> > + filter->stats.dropped;
> > + stats.lastused = filter->stats.lastused;
> > + filter->stats.pkts += stats.pkts;
> > + filter->stats.dropped += stats.dropped;
> > +
> > + spin_unlock(&epsfp.psfp_lock);
> > +
> > + flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
> > + stats.dropped);
> > +
> > + return 0;
> > +}
> > +
> > +static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
> > + struct flow_cls_offload *cls_flower)
> > +{
> > + switch (cls_flower->command) {
> > + case FLOW_CLS_REPLACE:
> > + return enetc_config_clsflower(priv, cls_flower);
> > + case FLOW_CLS_DESTROY:
> > + return enetc_destroy_clsflower(priv, cls_flower);
> > + case FLOW_CLS_STATS:
> > + return enetc_psfp_get_stats(priv, cls_flower);
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +}
> > +
> > +static inline void clean_psfp_sfi_bitmap(void)
> > +{
> > + bitmap_free(epsfp.psfp_sfi_bitmap);
> > + epsfp.psfp_sfi_bitmap = NULL;
> > +}
> > +
> > +static void clean_stream_list(void)
> > +{
> > + struct enetc_stream_filter *s;
> > + struct hlist_node *tmp;
> > +
> > + hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
> > + hlist_del(&s->node);
> > + kfree(s);
> > + }
> > +}
> > +
> > +static void clean_sfi_list(void)
> > +{
> > + struct enetc_psfp_filter *sfi;
> > + struct hlist_node *tmp;
> > +
> > + hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
> > + hlist_del(&sfi->node);
> > + kfree(sfi);
> > + }
> > +}
> > +
> > +static void clean_sgi_list(void)
> > +{
> > + struct enetc_psfp_gate *sgi;
> > + struct hlist_node *tmp;
> > +
> > + hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
> > + hlist_del(&sgi->node);
> > + kfree(sgi);
> > + }
> > +}
> > +
> > +static void clean_psfp_all(void)
> > +{
> > + /* Disable all list nodes and free all memory */
> > + clean_sfi_list();
> > + clean_sgi_list();
> > + clean_stream_list();
> > + epsfp.dev_bitmap = 0;
> > + clean_psfp_sfi_bitmap();
> > +}
> > +
> > +int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
> > + void *cb_priv)
> > +{
> > + struct net_device *ndev = cb_priv;
> > +
> > + if (!tc_can_offload(ndev))
> > + return -EOPNOTSUPP;
> > +
> > + switch (type) {
> > + case TC_SETUP_CLSFLOWER:
> > + return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
> > + default:
> > + return -EOPNOTSUPP;
> > + }
> > +}
> > +
> > +int enetc_psfp_init(struct enetc_ndev_priv *priv)
> > +{
> > + if (epsfp.psfp_sfi_bitmap)
> > + return 0;
> > +
> > + epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
> > + GFP_KERNEL);
> > + if (!epsfp.psfp_sfi_bitmap)
> > + return -ENOMEM;
> > +
> > + spin_lock_init(&epsfp.psfp_lock);
> > +
> > + if (list_empty(&enetc_block_cb_list))
> > + epsfp.dev_bitmap = 0;
> > +
> > + return 0;
> > +}
> > +
> > +int enetc_psfp_clean(struct enetc_ndev_priv *priv)
> > +{
> > + if (!list_empty(&enetc_block_cb_list))
> > + return -EBUSY;
> > +
> > + clean_psfp_all();
> > +
> > + return 0;
> > +}
> > +
> > +int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
> > +{
> > + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> > + struct flow_block_offload *f = type_data;
> > + int err;
> > +
> > + err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
> > + enetc_setup_tc_block_cb,
> > + ndev, ndev, true);
> > + if (err)
> > + return err;
> > +
> > + switch (f->command) {
> > + case FLOW_BLOCK_BIND:
> > + set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
> > + break;
> > + case FLOW_BLOCK_UNBIND:
> > + clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
> > + if (!epsfp.dev_bitmap)
> > + clean_psfp_all();
> > + break;
> > + }
> > +
> > + return 0;
> > +}
> > --
> > 2.17.1
> >
>
> --
> Vinicius

Br,
Po Liu

2020-03-24 04:07:55

by Po Liu

Subject: [v1,net-next 0/5] Introduce a flow gate control action and apply IEEE

Changes from RFC:
0000: Reduced to 5 patches; removed the max frame size offload and
flow metering from the policing offload action. Only the gate action
offloading implementation is kept.
0001: No changes.
0002:
- Fix a kfree leak, acked by Jakub Kicinski and Cong Wang
- License fix from Jakub Kicinski and Stephen Hemminger
- Update the example in the commit message, acked by Vinicius Costa Gomes
- Fix the RCU protection in tcf_gate_act(), acked by Vinicius

0003: No changes
0004: No changes
0005:
Acked by Vinicius Costa Gomes
- Use the kernel refcount library
- Move the stream gate check code
- Rename refcount variables for clarity

iproute2 command patches:
0000: Update license expression and add gate id
0001: Add tc action gate man page

This patch set introduces a way to add tc flower offload for the enetc
IEEE 802.1Qci (PSFP) function. PSFP implements flow policing and
filtering for ingress flows with four main feature parts: stream
identification (defined exactly in P802.1CB but required by 802.1Qci),
stream filtering, stream gating, and flow metering.

The stream gate function is the key part of these features, but the
qdisc filter part has no comparable action. The second patch introduces
an ingress frame gate control flow action. Creating a gate action in tc
provides a gate list that controls the gate's open/close state: while
the gate is open the flow may pass, and while it is closed it may not.
The driver repeats the gate list periodically. The user can also assign
a start time for the gate list with the basetime parameter; if the
basetime is already in the past, the start time is calculated from the
cycletime of the gate list. A software-simulated gate control method is
also introduced.
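The basetime adjustment described above can be sketched in plain C (hypothetical helper, not the kernel code): when basetime lies in the past, the effective start time is advanced by whole cycles so it lands on the first cycle boundary at or after "now".

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of the gate-list start time calculation:
 * if basetime is already in the past, advance it by an integral
 * number of cycletime periods so the schedule starts at the first
 * cycle boundary at or after `now`. All times in nanoseconds;
 * cycletime must be non-zero. */
static uint64_t gate_start_time(uint64_t basetime, uint64_t cycletime,
				uint64_t now)
{
	uint64_t n;

	if (basetime >= now)
		return basetime;	/* starts in the future as given */

	/* number of full cycles elapsed since basetime, rounded up */
	n = (now - basetime + cycletime - 1) / cycletime;
	return basetime + n * cycletime;
}
```

This is the same alignment rule taprio-style schedules use: the cycle phase relative to basetime is preserved no matter when the schedule is installed.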

The first patch fixes the issue that flow offload cannot provide a
dropped frame count. This is used to retrieve the number of frames
dropped by the hardware offload.

The third patch adds the gate flow offloading.

The fourth patch adds a tc offload command in enetc, which controls
enabling and disabling the tc flower offloading.

With this, the enetc driver receives the gate control list, the filter
MAC address, and so on. enetc collects these parameters and creates the
stream identification entry and the stream gate control entry, then
creates a stream filter entry from these inputs. The driver maintains
the flow chain list. The fifth patch implements the stream gate, stream
filter, and stream identification functions in the driver, driven by
the tc flower actions and tc filter parameters.

The iproute2 test patches need to be submitted to:

git://git.kernel.org/pub/scm/network/iproute2/iproute2-next.git

Po Liu (5):
net: qos offload add flow status with dropped count
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSPF capability
net: enetc: add tc flower psfp offload driver

drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
.../ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
.../chelsio/cxgb4/cxgb4_tc_matchall.c | 2 +-
drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 159 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1074 +++++++++++++++++
.../net/ethernet/mellanox/mlx5/core/en_tc.c | 4 +-
.../ethernet/mellanox/mlxsw/spectrum_flower.c | 2 +-
drivers/net/ethernet/mscc/ocelot_flower.c | 2 +-
.../ethernet/netronome/nfp/flower/offload.c | 2 +-
.../ethernet/netronome/nfp/flower/qos_conf.c | 2 +-
include/net/act_api.h | 11 +-
include/net/flow_offload.h | 15 +-
include/net/pkt_cls.h | 5 +-
include/net/tc_act/tc_gate.h | 169 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 15 +
net/sched/Makefile | 1 +
net/sched/act_api.c | 12 +-
net/sched/act_ct.c | 6 +-
net/sched/act_gact.c | 7 +-
net/sched/act_gate.c | 647 ++++++++++
net/sched/act_mirred.c | 6 +-
net/sched/act_police.c | 6 +-
net/sched/act_vlan.c | 6 +-
net/sched/cls_api.c | 33 +
net/sched/cls_flower.c | 3 +-
net/sched/cls_matchall.c | 3 +-
31 files changed, 2329 insertions(+), 41 deletions(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1

2020-03-24 04:08:48

by Po Liu

Subject: [v1,net-next 1/5] net: qos offload add flow status with dropped count

Add a dropped frame counter to the hardware tc flower offloading
status update. The action ops->stats_update is only called from
tcf_exts_stats_update(), and tcf_exts_stats_update() is only called by
the matchall and tc flower hardware filters. But stats_update so far
hard-coded the drop flag to false in ops->stats_update. This patch adds
a dropped counter to the action stats update, to be filled in by the
hardware offloading driver. The change replaces the drop flag with a
dropped frame counter.

The driver side should report how many "packets" it filtered and how
many of those "packets" were "dropped".
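The driver-side pattern this enables can be sketched in plain C (hypothetical names, not the driver code): hardware exposes free-running counters, and the driver reports only the delta since the last query, for both matched and dropped frames.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical sketch of delta-based stats reporting: the hardware
 * keeps free-running totals, the driver remembers what it already
 * reported and hands only the difference to flow_stats_update(). */
struct sketch_stats {
	uint64_t pkts;
	uint64_t dropped;
};

static void report_stats(struct sketch_stats *cached,
			 uint64_t hw_matched, uint64_t hw_dropped,
			 struct sketch_stats *out)
{
	out->pkts = hw_matched - cached->pkts;	/* delta since last read */
	out->dropped = hw_dropped - cached->dropped;

	cached->pkts += out->pkts;		/* catch up to hardware */
	cached->dropped += out->dropped;
}
```

This mirrors what the enetc stats path in patch 5/5 does with matching_frames_count and not_passing_frames_count: the cached copy guarantees each frame is reported upward exactly once.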

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
.../net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++--
.../net/ethernet/mellanox/mlxsw/spectrum_flower.c | 2 +-
drivers/net/ethernet/mscc/ocelot_flower.c | 2 +-
drivers/net/ethernet/netronome/nfp/flower/offload.c | 2 +-
drivers/net/ethernet/netronome/nfp/flower/qos_conf.c | 2 +-
include/net/act_api.h | 11 ++++++-----
include/net/flow_offload.h | 5 ++++-
include/net/pkt_cls.h | 5 +++--
net/sched/act_api.c | 12 ++++++------
net/sched/act_ct.c | 6 +++---
net/sched/act_gact.c | 7 ++++---
net/sched/act_mirred.c | 6 +++---
net/sched/act_police.c | 6 +++---
net/sched/act_vlan.c | 6 +++---
net/sched/cls_flower.c | 3 ++-
net/sched/cls_matchall.c | 3 ++-
19 files changed, 48 insertions(+), 40 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index b19be7549aad..d14b33fe745c 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1639,7 +1639,7 @@ static int bnxt_tc_get_flow_stats(struct bnxt *bp,
spin_unlock(&flow->stats_lock);

flow_stats_update(&tc_flow_cmd->stats, stats.bytes, stats.packets,
- lastused);
+ lastused, 0);
return 0;
}

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index aec9b90313e7..c0d9bc9e6cb7 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -903,7 +903,7 @@ int cxgb4_tc_flower_stats(struct net_device *dev,
ofld_stats->last_used = jiffies;
flow_stats_update(&cls->stats, bytes - ofld_stats->byte_count,
packets - ofld_stats->packet_count,
- ofld_stats->last_used);
+ ofld_stats->last_used, 0);

ofld_stats->packet_count = packets;
ofld_stats->byte_count = bytes;
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c
index 8a5ae8bc9b7d..d2451836d5fd 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c
@@ -346,7 +346,7 @@ int cxgb4_tc_matchall_stats(struct net_device *dev,
flow_stats_update(&cls_matchall->stats,
bytes - tc_port_matchall->ingress.bytes,
packets - tc_port_matchall->ingress.packets,
- tc_port_matchall->ingress.last_used);
+ tc_port_matchall->ingress.last_used, 0);

tc_port_matchall->ingress.packets = packets;
tc_port_matchall->ingress.bytes = bytes;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 901f88a886c8..ca1b694d13d4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4467,7 +4467,7 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
no_peer_counter:
mlx5_devcom_release_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
out:
- flow_stats_update(&f->stats, bytes, packets, lastuse);
+ flow_stats_update(&f->stats, bytes, packets, lastuse, 0);
trace_mlx5e_stats_flower(f);
errout:
mlx5e_flow_put(priv, flow);
@@ -4584,7 +4584,7 @@ void mlx5e_tc_stats_matchall(struct mlx5e_priv *priv,
dpkts = cur_stats.rx_packets - rpriv->prev_vf_vport_stats.rx_packets;
dbytes = cur_stats.rx_bytes - rpriv->prev_vf_vport_stats.rx_bytes;
rpriv->prev_vf_vport_stats = cur_stats;
- flow_stats_update(&ma->stats, dpkts, dbytes, jiffies);
+ flow_stats_update(&ma->stats, dpkts, dbytes, jiffies, 0);
}

static void mlx5e_tc_hairpin_update_dead_peer(struct mlx5e_priv *priv,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index 1cb023955d8f..f1d90ffa5eae 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -578,7 +578,7 @@ int mlxsw_sp_flower_stats(struct mlxsw_sp *mlxsw_sp,
if (err)
goto err_rule_get_stats;

- flow_stats_update(&f->stats, bytes, packets, lastuse);
+ flow_stats_update(&f->stats, bytes, packets, lastuse, 0);

mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset);
return 0;
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 873a9944fbfb..6cbca9b05520 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -224,7 +224,7 @@ int ocelot_cls_flower_stats(struct ocelot *ocelot, int port,
if (ret)
return ret;

- flow_stats_update(&f->stats, 0x0, ace.stats.pkts, 0x0);
+ flow_stats_update(&f->stats, 0x0, ace.stats.pkts, 0x0, 0x0);
return 0;
}
EXPORT_SYMBOL_GPL(ocelot_cls_flower_stats);
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 7ca5c1becfcf..053f647c1ec6 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1490,7 +1490,7 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device *netdev,
nfp_flower_update_merge_stats(app, nfp_flow);

flow_stats_update(&flow->stats, priv->stats[ctx_id].bytes,
- priv->stats[ctx_id].pkts, priv->stats[ctx_id].used);
+ priv->stats[ctx_id].pkts, priv->stats[ctx_id].used, 0);

priv->stats[ctx_id].pkts = 0;
priv->stats[ctx_id].bytes = 0;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c b/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c
index 124a43dc136a..354d64edbec0 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c
@@ -320,7 +320,7 @@ nfp_flower_stats_rate_limiter(struct nfp_app *app, struct net_device *netdev,
spin_unlock_bh(&fl_priv->qos_stats_lock);

flow_stats_update(&flow->stats, diff_bytes, diff_pkts,
- repr_priv->qos_table.last_update);
+ repr_priv->qos_table.last_update, 0);
return 0;
}

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 41337c7fc728..7bb03ee2f747 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -103,7 +103,7 @@ struct tc_action_ops {
struct netlink_callback *, int,
const struct tc_action_ops *,
struct netlink_ext_ack *);
- void (*stats_update)(struct tc_action *, u64, u32, u64, bool);
+ void (*stats_update)(struct tc_action *, u64, u64, u64, u64, bool);
size_t (*get_fill_size)(const struct tc_action *act);
struct net_device *(*get_dev)(const struct tc_action *a,
tc_action_priv_destructor *destructor);
@@ -229,8 +229,8 @@ static inline void tcf_action_inc_overlimit_qstats(struct tc_action *a)
spin_unlock(&a->tcfa_lock);
}

-void tcf_action_update_stats(struct tc_action *a, u64 bytes, u32 packets,
- bool drop, bool hw);
+void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
+ u64 dropped, bool hw);
int tcf_action_copy_stats(struct sk_buff *, struct tc_action *, int);

int tcf_action_check_ctrlact(int action, struct tcf_proto *tp,
@@ -241,13 +241,14 @@ struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
#endif /* CONFIG_NET_CLS_ACT */

static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
- u64 packets, u64 lastuse, bool hw)
+ u64 packets, u64 lastuse,
+ u64 dropped, bool hw)
{
#ifdef CONFIG_NET_CLS_ACT
if (!a->ops->stats_update)
return;

- a->ops->stats_update(a, bytes, packets, lastuse, hw);
+ a->ops->stats_update(a, bytes, packets, lastuse, dropped, hw);
#endif
}

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 51b9893d4ccb..cae3658a1844 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -369,14 +369,17 @@ struct flow_stats {
u64 pkts;
u64 bytes;
u64 lastused;
+ u64 dropped;
};

static inline void flow_stats_update(struct flow_stats *flow_stats,
- u64 bytes, u64 pkts, u64 lastused)
+ u64 bytes, u64 pkts,
+ u64 lastused, u64 dropped)
{
flow_stats->pkts += pkts;
flow_stats->bytes += bytes;
flow_stats->lastused = max_t(u64, flow_stats->lastused, lastused);
+ flow_stats->dropped += dropped;
}

enum flow_block_command {
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 1db8b27d4515..4d12d3edeb71 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -262,7 +262,7 @@ static inline void tcf_exts_put_net(struct tcf_exts *exts)

static inline void
tcf_exts_stats_update(const struct tcf_exts *exts,
- u64 bytes, u64 packets, u64 lastuse)
+ u64 bytes, u64 packets, u64 lastuse, u64 dropped)
{
#ifdef CONFIG_NET_CLS_ACT
int i;
@@ -272,7 +272,8 @@ tcf_exts_stats_update(const struct tcf_exts *exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];

- tcf_action_stats_update(a, bytes, packets, lastuse, true);
+ tcf_action_stats_update(a, bytes, packets,
+ lastuse, dropped, true);
}

preempt_enable();
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index aa7b737fed2e..83ffb3fb63f4 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1053,14 +1053,13 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
return err;
}

-void tcf_action_update_stats(struct tc_action *a, u64 bytes, u32 packets,
- bool drop, bool hw)
+void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
+ u64 dropped, bool hw)
{
if (a->cpu_bstats) {
_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);

- if (drop)
- this_cpu_ptr(a->cpu_qstats)->drops += packets;
+ this_cpu_ptr(a->cpu_qstats)->drops += dropped;

if (hw)
_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats_hw),
@@ -1069,8 +1068,9 @@ void tcf_action_update_stats(struct tc_action *a, u64 bytes, u32 packets,
}

_bstats_update(&a->tcfa_bstats, bytes, packets);
- if (drop)
- a->tcfa_qstats.drops += packets;
+
+ a->tcfa_qstats.drops += dropped;
+
if (hw)
_bstats_update(&a->tcfa_bstats_hw, bytes, packets);
}
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index 56b66d215a89..f8eda414537c 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1445,12 +1445,12 @@ static int tcf_ct_search(struct net *net, struct tc_action **a, u32 index)
return tcf_idr_search(tn, a, index);
}

-static void tcf_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 lastuse, u64 dropped, bool hw)
{
struct tcf_ct *c = to_ct(a);

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, dropped, hw);
c->tcf_tm.lastuse = max_t(u64, c->tcf_tm.lastuse, lastuse);
}

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 416065772719..173908368dcc 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -171,14 +171,15 @@ static int tcf_gact_act(struct sk_buff *skb, const struct tc_action *a,
return action;
}

-static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 lastuse, u64 dropped, bool hw)
{
struct tcf_gact *gact = to_gact(a);
int action = READ_ONCE(gact->tcf_action);
struct tcf_t *tm = &gact->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, action == TC_ACT_SHOT, hw);
+ tcf_action_update_stats(a, bytes, packets,
+ (action == TC_ACT_SHOT) ? packets : 0, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 1ad300e6dbc0..1c56f59e86a8 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -314,13 +314,13 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
return retval;
}

-static void tcf_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 lastuse, u64 dropped, bool hw)
{
struct tcf_mirred *m = to_mirred(a);
struct tcf_t *tm = &m->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, dropped, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 8b7a0ac96c51..7dcc418043fc 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -288,13 +288,13 @@ static void tcf_police_cleanup(struct tc_action *a)
}

static void tcf_police_stats_update(struct tc_action *a,
- u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+ u64 bytes, u64 packets,
+ u64 lastuse, u64 dropped, bool hw)
{
struct tcf_police *police = to_police(a);
struct tcf_t *tm = &police->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, dropped, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index c91d3958fcbb..d579ce23b479 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -302,13 +302,13 @@ static int tcf_vlan_walker(struct net *net, struct sk_buff *skb,
return tcf_generic_walker(tn, skb, cb, type, ops, extack);
}

-static void tcf_vlan_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_vlan_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 lastuse, u64 dropped, bool hw)
{
struct tcf_vlan *v = to_vlan(a);
struct tcf_t *tm = &v->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, dropped, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 258dc45ab7e3..8afaaabef15d 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -492,7 +492,8 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,

tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
cls_flower.stats.pkts,
- cls_flower.stats.lastused);
+ cls_flower.stats.lastused,
+ cls_flower.stats.dropped);
}

static void __fl_put(struct cls_fl_filter *f)
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index a34b36adb9b7..606c131d4df7 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -338,7 +338,8 @@ static void mall_stats_hw_filter(struct tcf_proto *tp,
tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true);

tcf_exts_stats_update(&head->exts, cls_mall.stats.bytes,
- cls_mall.stats.pkts, cls_mall.stats.lastused);
+ cls_mall.stats.pkts, cls_mall.stats.lastused,
+ cls_mall.stats.dropped);
}

static int mall_dump(struct net *net, struct tcf_proto *tp, void *fh,
--
2.17.1

2020-03-24 04:08:56

by Po Liu

Subject: [v1,net-next 3/5] net: schedule: add action gate offloading

Add the gate action to the flow action entry. In
tc_setup_flow_action(), queue the gate parameters into the
flow_action_entry array provided to the driver.
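The core conversion here, flattening the action's linked schedule list into the array a driver consumes, can be sketched as follows (hypothetical types and field subset, not the kernel code; tcf_gate_get_list() additionally checks the list length against the recorded entry count):

```c
#include <stdint.h>
#include <stdlib.h>
#include <assert.h>

/* Hypothetical sketch: flatten a gate schedule list into a flat
 * array, with the count consistency check the kernel helper does. */
struct sched_node {
	uint8_t gate_state;
	uint32_t interval;
	struct sched_node *next;
};

struct sched_entry {
	uint8_t gate_state;
	uint32_t interval;
};

static struct sched_entry *flatten_schedule(const struct sched_node *head,
					    uint32_t expected)
{
	const struct sched_node *n;
	struct sched_entry *arr;
	uint32_t count = 0, i = 0;

	for (n = head; n; n = n->next)
		count++;
	if (count != expected)		/* list/parameter mismatch */
		return NULL;

	arr = calloc(count, sizeof(*arr));
	if (!arr)
		return NULL;

	for (n = head; n; n = n->next, i++) {
		arr[i].gate_state = n->gate_state;
		arr[i].interval = n->interval;
	}
	return arr;
}
```

The caller owns the returned array, which is why the flow action entry also gets a destructor to free it once the driver is done.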

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 +++
include/net/tc_act/tc_gate.h | 115 +++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++
3 files changed, 158 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index cae3658a1844..ef9b8fe82e85 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -254,6 +255,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index b0ace55b2aaa..62633cb02c7a 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -51,4 +58,112 @@ struct tcf_gate {
#define get_gate_param(act) ((struct tcf_gate_params *)act)
#define get_gate_action(p) ((struct gate_action *)p)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ rcu_read_lock();
+ tcfg_prio = rcu_dereference(to_gate(a)->actg)->param.tcfg_priority;
+ rcu_read_unlock();
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ rcu_read_lock();
+ tcfg_basetime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_basetime;
+ rcu_read_unlock();
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ rcu_read_lock();
+ tcfg_cycletime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime;
+ rcu_read_unlock();
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ rcu_read_lock();
+ tcfg_cycletimeext =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime_ext;
+ rcu_read_unlock();
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ rcu_read_lock();
+ num_entries =
+ rcu_dereference(to_gate(a)->actg)->param.num_entries;
+ rcu_read_unlock();
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ rcu_read_lock();
+ p = &(rcu_dereference(to_gate(a)->actg)->param);
+ num_entries = p->num_entries;
+ rcu_read_unlock();
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries)
+ return NULL;
+
+ oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
+ if (!oe)
+ return NULL;
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index fb6c3660fb9a..047733442850 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,6 +39,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3522,6 +3523,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3668,6 +3690,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_priority(act)) {
entry->id = FLOW_ACTION_PRIORITY;
entry->priority = tcf_skbedit_priority(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-03-24 04:09:19

by Po Liu

[permalink] [raw]
Subject: [v1,iproute2 2/2] iproute2: add gate action man page

Signed-off-by: Po Liu <[email protected]>
---
man/man8/tc-gate.8 | 106 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 106 insertions(+)
create mode 100644 man/man8/tc-gate.8

diff --git a/man/man8/tc-gate.8 b/man/man8/tc-gate.8
new file mode 100644
index 0000000..2b2d101
--- /dev/null
+++ b/man/man8/tc-gate.8
@@ -0,0 +1,106 @@
+.TH GATE 8 "12 Mar 2020" "iproute2" "Linux"
+.SH NAME
+gate \- Stream Gate Action
+.SH SYNOPSIS
+.B tc " ... " action gate
+.ti +8
+.B [ base-time
+BASETIME ]
+.B [ clockid
+CLOCKID ]
+.ti +8
+.B sched-entry
+<gate state> <interval 1> <internal priority> <max octets>
+.ti +8
+.B sched-entry
+<gate state> <interval 2> <internal priority> <max octets>
+.ti +8
+.B sched-entry
+<gate state> <interval 3> <internal priority> <max octets>
+.ti +8
+.B ......
+.ti +8
+.B sched-entry
+<gate state> <interval N> <internal priority> <max octets>
+
+.SH DESCRIPTION
+The GATE action provides a gate list that controls when traffic may
+pass. While the gate is in the open state the flow may pass; while it
+is closed, frames are dropped. The gate list is repeated periodically.
+The user may also specify the point in time at which the gate list
+starts via the base-time parameter. If base-time is already in the
+past, the start time is derived from the cycle time of the gate list.
+
+.SH PARAMETERS
+
+.TP
+base-time
+.br
+Specifies the instant, in nanoseconds, at which the schedule starts. If
+'base-time' is a time in the past, the schedule will start at
+
+base-time + (N * cycle-time)
+
+where N is the smallest integer such that the resulting time is greater
+than "now", and "cycle-time" is the sum of all the intervals of the
+entries in the schedule. If base-time is not specified, it defaults to 0.
+
+.TP
+clockid
+.br
+Specifies the clock used by the action's internal timer for measuring
+time and scheduling events. Not valid when the filter is offloaded to
+hardware, for example a tc filter command with the
+.B skip_sw
+parameter.
+
+.TP
+sched-entry
+.br
+There may be multiple
+.B sched-entry
+parameters in a single schedule. Each one has the format:
+
+sched-entry <gate state> <interval> <internal priority> <max octets>
+
+.br
+<gate state> is the gate state: 'OPEN' keeps the gate open, 'CLOSE'
+keeps it closed.
+.br
+<interval> is the duration of this time slot in nanoseconds.
+.br
+<internal priority> is the internal priority value, selecting the
+internal receive queue for this stream. "-1" means wildcard.
+.br
+<max octets> is the maximum number of octets permitted to pass during
+this time slot; frames over the limit are dropped. "-1" means wildcard.
+
+.SH EXAMPLES
+
+The following example shows a tc filter matching frames with source IP
+address 192.168.0.20. Frames pass during the first 200000000 ns of each
+cycle and are dropped during the following 100000000 ns, after which the
+gate list repeats. The schedule starts at instant 200000000000 using the
+reference clock CLOCK_TAI and is composed of two entries.
+
+.EX
+# tc filter add dev eth0 parent ffff: protocol ip \\
+ flower skip_hw src_ip 192.168.0.20 \\
+ action gate index 2 clockid CLOCK_TAI \\
+ base-time 200000000000 \\
+ sched-entry OPEN 200000000 -1 -1 \\
+ sched-entry CLOSE 100000000 -1 -1
+
+.EE
+
+The following example filters a stream whose destination MAC address
+matches 10:00:80:00:00:00 and drops it at all times.
+
+.EX
+# tc filter add dev eth0 parent ffff: protocol ip chain 14 \\
+ flower skip_sw dst_mac 10:00:80:00:00:00 \\
+ action gate index 12 sched-entry close 200000000 -1 -1
+
+.EE
+
+.SH AUTHORS
+Po Liu <[email protected]>
--
2.17.1

2020-03-24 04:09:24

by Po Liu

[permalink] [raw]
Subject: [v1,iproute2 1/2] iproute2:tc:action: add a gate control action

Introduce an ingress frame gate control flow action. Creating a gate
action with tc provides a gate list that controls the open/close state
over time. While the gate is open the flow may pass; while it is
closed, frames are dropped. The driver repeats the gate list
cyclically. The user may also specify the point in time at which the
gate list starts via the basetime parameter. If the basetime is already
in the past, the start time is derived from the cycletime of the gate
list.
The behavior aims to follow the IEEE 802.1Qci specification. For the
software emulation, the user is required to supply the clock type.

Below is a configuration example in user space. The tc filter matches a
stream with source IP address 192.168.0.20, and the gate action has two
time slots: a 200 ms slot with the gate open that lets frames pass,
followed by a 100 ms slot with the gate closed that drops frames.

> tc qdisc add dev eth0 ingress
> tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry OPEN 200000000 -1 -1 \
sched-entry CLOSE 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. gate state is
"OPEN"/"CLOSE". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.

The example below filters a stream whose destination MAC address is
10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
action. The gate action runs with one closed time slot, which means the
gate is always kept closed. The total cycle time is 200000000 ns. The
base-time is calculated as:

1357000000000 + (N + 1) * cycletime

The smallest such value in the future becomes the start time. The
cycletime here is 200000000 ns for this case.

> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry CLOSE 200000000 -1 -1 \
clockid CLOCK_TAI

Signed-off-by: Po Liu <[email protected]>
---
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +++
tc/Makefile | 1 +
tc/m_gate.c | 483 ++++++++++++++++++++++++++++
4 files changed, 532 insertions(+)
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 tc/m_gate.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index a6aa466..7a047a9 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -106,6 +106,7 @@ enum tca_id {
TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
TCA_ID_CTINFO,
TCA_ID_MPLS,
+ TCA_ID_GATE,
TCA_ID_CT,
/* other actions go here */
__TCA_ID_MAX = 255
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 0000000..f214b3a
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index a468a52..9365f3c 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -54,6 +54,7 @@ TCMODULES += m_bpf.o
TCMODULES += m_tunnel_key.o
TCMODULES += m_sample.o
TCMODULES += m_ct.o
+TCMODULES += m_gate.o
TCMODULES += p_ip.o
TCMODULES += p_ip6.o
TCMODULES += p_icmp.o
diff --git a/tc/m_gate.c b/tc/m_gate.c
new file mode 100644
index 0000000..326ceb8
--- /dev/null
+++ b/tc/m_gate.c
@@ -0,0 +1,483 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/* Copyright 2020 NXP */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include "list.h"
+#include <linux/tc_act/tc_gate.h>
+
+struct gate_entry {
+ struct list_head list;
+ uint8_t gate_state;
+ uint32_t interval;
+ int32_t ipv;
+ int32_t maxoctets;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct clockid_table {
+ const char *name;
+ clockid_t clockid;
+} clockt_map[] = {
+ { "REALTIME", CLOCK_REALTIME },
+ { "TAI", CLOCK_TAI },
+ { "BOOTTIME", CLOCK_BOOTTIME },
+ { "MONOTONIC", CLOCK_MONOTONIC },
+ { NULL }
+};
+
+static void explain(void)
+{
+ fprintf(stderr,
+ "Usage: gate [ priority PRIO-SPEC ] [ base-time BASE-TIME ]\n"
+ " [ cycle-time CYCLE-TIME ]\n"
+ " [ cycle-time-ext CYCLE-TIME-EXT ]\n"
+ " [ clockid CLOCKID ] [flags FLAGS]\n"
+ " [ sched-entry GATE0 INTERVAL INTERNAL-PRIO-VALUE MAX-OCTETS ]\n"
+ " [ sched-entry GATE1 INTERVAL INTERNAL-PRIO-VALUE MAX-OCTETS ]\n"
+ " ......\n"
+ " [ sched-entry GATEn INTERVAL INTERNAL-PRIO-VALUE MAX-OCTETS ]\n"
+ " [ CONTROL ]\n"
+ " GATEn := open | close |\n"
+ " INTERVAL : nanoseconds period of gate slot\n"
+ " INTERNAL-PRIO-VALUE : internal priority decide which\n"
+ " rx queue going to\n"
+ " -1 means wildcard\n"
+ " MAX-OCTETS : maximum number of MSDU octets that are"
+ " permitted to pas the gate during the\n"
+ " specified TimeInterval"
+ " CONTROL := pipe | drop | continue | pass |\n"
+ " goto chain <CHAIN_INDEX>\n");
+}
+
+static void usage(void)
+{
+ explain();
+ exit(-1);
+}
+
+static void explain_entry_format(void)
+{
+ fprintf(stderr, "Usage: sched-entry <open | close> <interval> <interval ipv> <octets max bytes>\n");
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n);
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg);
+
+struct action_util gate_action_util = {
+ .id = "gate",
+ .parse_aopt = parse_gate,
+ .print_aopt = print_gate,
+};
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+ const struct clockid_table *c;
+
+ if (strcasestr(arg, "CLOCK_") != NULL)
+ arg += sizeof("CLOCK_") - 1;
+
+ for (c = clockt_map; c->name; c++) {
+ if (strcasecmp(c->name, arg) == 0) {
+ *val = c->clockid;
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static const char *get_clock_name(clockid_t clockid)
+{
+ const struct clockid_table *c;
+
+ for (c = clockt_map; c->name; c++) {
+ if (clockid == c->clockid)
+ return c->name;
+ }
+
+ return "invalid";
+}
+
+static int get_gate_state(__u8 *val, const char *arg)
+{
+ if (!strcasecmp("OPEN", arg)) {
+ *val = 1;
+ return 0;
+ }
+
+ if (!strcasecmp("CLOSE", arg)) {
+ *val = 0;
+ return 0;
+ }
+
+ return -1;
+}
+
+static struct gate_entry *create_gate_entry(uint8_t gate_state,
+ uint32_t interval,
+ int32_t ipv,
+ int32_t maxoctets)
+{
+ struct gate_entry *e;
+
+ e = calloc(1, sizeof(*e));
+ if (!e)
+ return NULL;
+
+ e->gate_state = gate_state;
+ e->interval = interval;
+ e->ipv = ipv;
+ e->maxoctets = maxoctets;
+
+ return e;
+}
+
+static int add_gate_list(struct list_head *gate_entries, struct nlmsghdr *n)
+{
+ struct gate_entry *e;
+
+ list_for_each_entry(e, gate_entries, list) {
+ struct rtattr *a;
+
+ a = addattr_nest(n, 1024, TCA_GATE_ONE_ENTRY | NLA_F_NESTED);
+
+ if (e->gate_state)
+ addattr(n, MAX_MSG, TCA_GATE_ENTRY_GATE);
+
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_INTERVAL,
+ &e->interval, sizeof(e->interval));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_IPV,
+ &e->ipv, sizeof(e->ipv));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_MAX_OCTETS,
+ &e->maxoctets, sizeof(e->maxoctets));
+
+ addattr_nest_end(n, a);
+ }
+
+ return 0;
+}
+
+static void free_entries(struct list_head *gate_entries)
+{
+ struct gate_entry *e, *n;
+
+ list_for_each_entry_safe(e, n, gate_entries, list) {
+ list_del(&e->list);
+ free(e);
+ }
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n)
+{
+ struct tc_gate parm = {.action = TC_ACT_PIPE};
+ struct list_head gate_entries;
+ __s32 clockid = CLOCKID_INVALID;
+ struct rtattr *tail, *nle;
+ char **argv = *argv_p;
+ int argc = *argc_p;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ __u32 flags = 0;
+ int prio = -1;
+ int entry_num = 0;
+ int err;
+
+ if (matches(*argv, "gate") != 0)
+ return -1;
+
+ NEXT_ARG();
+ if (argc <= 0)
+ return -1;
+
+ INIT_LIST_HEAD(&gate_entries);
+
+ while (argc > 0) {
+ if (matches(*argv, "index") == 0) {
+ NEXT_ARG();
+ if (get_u32(&parm.index, *argv, 10))
+ invarg("index", *argv);
+ } else if (matches(*argv, "priority") == 0) {
+ NEXT_ARG();
+ if (get_s32(&prio, *argv, 0))
+ invarg("priority", *argv);
+ } else if (matches(*argv, "base-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&base_time, *argv, 10))
+ invarg("base-time", *argv);
+ } else if (matches(*argv, "cycle-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time, *argv, 10))
+ invarg("cycle-time", *argv);
+ } else if (matches(*argv, "cycle-time-ext") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time_ext, *argv, 10))
+ invarg("cycle-time-ext", *argv);
+ } else if (matches(*argv, "clockid") == 0) {
+ NEXT_ARG();
+ if (get_clockid(&clockid, *argv))
+ invarg("clockid", *argv);
+ } else if (matches(*argv, "flags") == 0) {
+ NEXT_ARG();
+ if (get_u32(&flags, *argv, 0))
+ invarg("flags", *argv);
+ } else if (matches(*argv, "sched-entry") == 0) {
+ struct gate_entry *e;
+ uint8_t gate_state = 0;
+ uint32_t interval = 0;
+ int32_t ipv = -1;
+ int32_t maxoctets = -1;
+
+ NEXT_ARG();
+
+ if (get_gate_state(&gate_state, *argv)) {
+ explain_entry_format();
+ invarg("gate", *argv);
+ }
+
+ NEXT_ARG();
+
+ if (get_u32(&interval, *argv, 0)) {
+ explain_entry_format();
+ invarg("interval", *argv);
+ }
+
+ NEXT_ARG();
+ if (get_s32(&ipv, *argv, 0)) {
+ explain_entry_format();
+ invarg("interval ipv", *argv);
+ }
+
+ NEXT_ARG();
+ if (get_s32(&maxoctets, *argv, 0)) {
+ explain_entry_format();
+ invarg("max octets", *argv);
+ }
+
+ e = create_gate_entry(gate_state, interval,
+ ipv, maxoctets);
+ if (!e) {
+ fprintf(stderr, "gate: not enough memory\n");
+ exit(-1);
+ }
+
+ list_add_tail(&e->list, &gate_entries);
+ entry_num++;
+
+ } else if (matches(*argv, "reclassify") == 0 ||
+ matches(*argv, "drop") == 0 ||
+ matches(*argv, "shot") == 0 ||
+ matches(*argv, "continue") == 0 ||
+ matches(*argv, "pass") == 0 ||
+ matches(*argv, "ok") == 0 ||
+ matches(*argv, "pipe") == 0 ||
+ matches(*argv, "goto") == 0) {
+ if (parse_action_control(&argc, &argv,
+ &parm.action, false))
+ return -1;
+ } else if (matches(*argv, "help") == 0) {
+ usage();
+ } else {
+ break;
+ }
+
+ argc--;
+ argv++;
+ }
+
+ parse_action_control_dflt(&argc, &argv, &parm.action,
+ false, TC_ACT_PIPE);
+
+ if (!entry_num && !parm.index) {
+ fprintf(stderr, "gate: must add at least one entry\n");
+ exit(-1);
+ }
+
+ tail = addattr_nest(n, MAX_MSG, tca_id | NLA_F_NESTED);
+ addattr_l(n, MAX_MSG, TCA_GATE_PARMS, &parm, sizeof(parm));
+
+ if (prio != -1)
+ addattr_l(n, MAX_MSG, TCA_GATE_PRIORITY, &prio, sizeof(prio));
+
+ if (flags)
+ addattr_l(n, MAX_MSG, TCA_GATE_FLAGS, &flags, sizeof(flags));
+
+ if (base_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_BASE_TIME,
+ &base_time, sizeof(base_time));
+
+ if (cycle_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME,
+ &cycle_time, sizeof(cycle_time));
+
+ if (cycle_time_ext)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME_EXT,
+ &cycle_time_ext, sizeof(cycle_time_ext));
+
+ if (clockid != CLOCKID_INVALID)
+ addattr_l(n, MAX_MSG, TCA_GATE_CLOCKID, &clockid, sizeof(clockid));
+
+ nle = addattr_nest(n, MAX_MSG, TCA_GATE_ENTRY_LIST | NLA_F_NESTED);
+ err = add_gate_list(&gate_entries, n);
+ if (err < 0) {
+ fprintf(stderr, "Could not add entries to netlink message\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ addattr_nest_end(n, nle);
+ addattr_nest_end(n, tail);
+ free_entries(&gate_entries);
+ *argc_p = argc;
+ *argv_p = argv;
+
+ return 0;
+}
+
+static int print_gate_list(struct rtattr *list)
+{
+ struct rtattr *item;
+ int rem;
+
+ rem = RTA_PAYLOAD(list);
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+
+ for (item = RTA_DATA(list);
+ RTA_OK(item, rem);
+ item = RTA_NEXT(item, rem)) {
+ struct rtattr *tb[TCA_GATE_ENTRY_MAX + 1];
+ __u32 index = 0, interval = 0;
+ __u8 gate_state = 0;
+ __s32 ipv = -1, maxoctets = -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_ENTRY_MAX, item);
+
+ if (tb[TCA_GATE_ENTRY_INDEX])
+ index = rta_getattr_u32(tb[TCA_GATE_ENTRY_INDEX]);
+
+ if (tb[TCA_GATE_ENTRY_GATE])
+ gate_state = 1;
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = rta_getattr_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ ipv = rta_getattr_s32(tb[TCA_GATE_ENTRY_IPV]);
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ maxoctets = rta_getattr_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+
+ print_uint(PRINT_ANY, "number", "\t number %4u", index);
+ print_string(PRINT_ANY, "gate state", "\tgate-state %-8s",
+ gate_state ? "open" : "close");
+
+ print_uint(PRINT_ANY, "interval", "\tinterval %-16u", interval);
+
+ if (ipv != -1)
+ print_uint(PRINT_ANY, "ipv", "\tipv %-8u", ipv);
+ else
+ print_string(PRINT_FP, "ipv", "\tipv %s", "wildcard");
+
+ if (maxoctets != -1)
+ print_uint(PRINT_ANY, "max_octets", "\tmax-octets %-8u", maxoctets);
+ else
+ print_string(PRINT_FP, "max_octets", "\tmax-octets %s", "wildcard");
+
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+ }
+
+ return 0;
+}
+
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+ struct tc_gate *parm;
+ struct rtattr *tb[TCA_GATE_MAX + 1];
+ __s32 clockid = CLOCKID_INVALID;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ int prio = -1;
+
+ if (arg == NULL)
+ return -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_MAX, arg);
+
+ if (!tb[TCA_GATE_PARMS]) {
+ fprintf(stderr, "Missing gate parameters\n");
+ return -1;
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ parm = RTA_DATA(tb[TCA_GATE_PARMS]);
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = rta_getattr_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (prio != -1)
+ print_int(PRINT_ANY, "priority", "\tpriority %-8d", prio);
+ else
+ print_string(PRINT_FP, "priority", "\tpriority %s", "wildcard");
+
+ if (tb[TCA_GATE_CLOCKID])
+ clockid = rta_getattr_s32(tb[TCA_GATE_CLOCKID]);
+ print_string(PRINT_ANY, "clockid", "\tclockid %s",
+ get_clock_name(clockid));
+
+ if (tb[TCA_GATE_FLAGS]) {
+ __u32 flags;
+
+ flags = rta_getattr_u32(tb[TCA_GATE_FLAGS]);
+ print_0xhex(PRINT_ANY, "flags", "\tflags %#x", flags);
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ if (tb[TCA_GATE_BASE_TIME])
+ base_time = rta_getattr_u64(tb[TCA_GATE_BASE_TIME]);
+
+ print_lluint(PRINT_ANY, "base_time", "\tbase-time %-22lld", base_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME])
+ cycle_time = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME]);
+
+ print_lluint(PRINT_ANY, "cycle_time", "\tcycle-time %-16lld", cycle_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ cycle_time_ext = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ print_lluint(PRINT_ANY, "cycle_time_ext", "\tcycle-time-ext %-16lld",
+ cycle_time_ext);
+
+ if (tb[TCA_GATE_ENTRY_LIST])
+ print_gate_list(tb[TCA_GATE_ENTRY_LIST]);
+
+ print_action_control(f, "\t", parm->action, "");
+
+ print_uint(PRINT_ANY, "index", "\n\t index %u", parm->index);
+ print_int(PRINT_ANY, "ref", " ref %d", parm->refcnt);
+ print_int(PRINT_ANY, "bind", " bind %d", parm->bindcnt);
+
+ if (show_stats) {
+ if (tb[TCA_GATE_TM]) {
+ struct tcf_t *tm = RTA_DATA(tb[TCA_GATE_TM]);
+
+ print_tm(f, tm);
+ }
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ return 0;
+}
--
2.17.1

2020-03-24 04:10:11

by Po Liu

[permalink] [raw]
Subject: [v1,net-next 2/5] net: qos: introduce a gate control flow action

Introduce an ingress frame gate control flow action. Creating a gate
action with tc provides a gate list that controls the open/close state
over time. While the gate is open the flow may pass; while it is
closed, frames are dropped. The driver repeats the gate list
cyclically. The user may also specify the point in time at which the
gate list starts via the basetime parameter. If the basetime is already
in the past, the start time is derived from the cycletime of the gate
list.
The action gate behavior aims to follow the IEEE 802.1Qci
specification. For the software simulation, the user is required to
supply the clock type.

Below is a configuration example in user space. The tc filter matches a
stream with source IP address 192.168.0.20, and the gate action has two
time slots: a 200 ms slot with the gate open that lets frames pass,
followed by a 100 ms slot with the gate closed that drops frames. In
addition, once more than 8000000 bytes have passed within one
200000000 ns open slot, further frames in that slot are dropped.

> tc qdisc add dev eth0 ingress

> tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry OPEN 200000000 -1 8000000 \
sched-entry CLOSE 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. gate state is
"OPEN"/"CLOSE". Follow the period nanosecond. Then next item is internal
priority value means which ingress queue should put. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.
Base-time is not set will be as 0 as default, as result start time would
be ((N + 1) * cycletime) which is the minimal of future time.

The example below filters a stream whose destination MAC address is
10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
action. The gate action runs with one closed time slot, which means the
gate is always kept closed. The total cycle time is 200000000 ns. The
base-time is calculated as:

1357000000000 + (N + 1) * cycletime

The smallest such value in the future becomes the start time. The
cycletime here is 200000000 ns for this case.

> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry CLOSE 200000000 -1 -1 \
clockid CLOCK_TAI

NOTE: This software simulator version does not separate the
admin/operational state machines. Updating the settings overwrites and
stops the previous settings and waits for the new cycle to start.

Signed-off-by: Po Liu <[email protected]>
---
Changes from RFC:
- fix kfree issue, acked by Jakub Kicinski and Cong Wang
- license fix from Jakub Kicinski and Stephen Hemminger
- update example in commit message, acked by Vinicius Costa Gomes
- fix the RCU protection in tcf_gate_act(), acked by Vinicius


include/net/tc_act/tc_gate.h | 54 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 ++
net/sched/Kconfig | 15 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
6 files changed, 765 insertions(+)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
new file mode 100644
index 000000000000..b0ace55b2aaa
--- /dev/null
+++ b/include/net/tc_act/tc_gate.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright 2020 NXP */
+
+#ifndef __NET_TC_GATE_H
+#define __NET_TC_GATE_H
+
+#include <net/act_api.h>
+#include <linux/tc_act/tc_gate.h>
+
+struct tcfg_gate_entry {
+ int index;
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+ struct list_head list;
+};
+
+struct tcf_gate_params {
+ s32 tcfg_priority;
+ u64 tcfg_basetime;
+ u64 tcfg_cycletime;
+ u64 tcfg_cycletime_ext;
+ u32 tcfg_flags;
+ s32 tcfg_clockid;
+ size_t num_entries;
+ struct list_head entries;
+};
+
+#define GATE_ACT_GATE_OPEN BIT(0)
+#define GATE_ACT_PENDING BIT(1)
+struct gate_action {
+ struct tcf_gate_params param;
+ spinlock_t entry_lock;
+ u8 current_gate_status;
+ ktime_t current_close_time;
+ u32 current_entry_octets;
+ s32 current_max_octets;
+ struct tcfg_gate_entry __rcu *next_entry;
+ struct hrtimer hitimer;
+ enum tk_offsets tk_offset;
+ struct rcu_head rcu;
+};
+
+struct tcf_gate {
+ struct tc_action common;
+ struct gate_action __rcu *actg;
+};
+#define to_gate(a) ((struct tcf_gate *)a)
+
+#define get_gate_param(act) ((struct tcf_gate_params *)act)
+#define get_gate_action(p) ((struct gate_action *)p)
+
+#endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 81cc1a869588..aaa8fb214b28 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -134,6 +134,7 @@ enum tca_id {
TCA_ID_CTINFO,
TCA_ID_MPLS,
TCA_ID_CT,
+ TCA_ID_GATE,
/* other actions go here */
__TCA_ID_MAX = 255
};
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 000000000000..f214b3a6d44f
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index bfbefb7bff9d..320471a0a21d 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -981,6 +981,21 @@ config NET_ACT_CT
To compile this code as a module, choose M here: the
module will be called act_ct.

+config NET_ACT_GATE
+ tristate "Frame gate list control tc action"
+ depends on NET_CLS_ACT
+ help
+ Say Y here to allow controlling ingress flows with a gate list.
+ Frames are policed according to the open/close cycle times of the
+ gate entries. This simulates the IEEE 802.1Qci stream gate control
+ behavior. The action can be offloaded via tc flower to hardware
+ drivers whose hardware has the IEEE 802.1Qci capability.
+
+ If unsure, say N.
+ To compile this code as a module, choose M here: the
+ module will be called act_gate.
+
config NET_IFE_SKBMARK
tristate "Support to encoding decoding skb mark on IFE action"
depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 31c367a6cd09..66bbf9a98f9e 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
obj-$(CONFIG_NET_ACT_CT) += act_ct.o
+obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
new file mode 100644
index 000000000000..a70dadc4213b
--- /dev/null
+++ b/net/sched/act_gate.c
@@ -0,0 +1,647 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright 2020 NXP */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/act_api.h>
+#include <net/netlink.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>
+
+static unsigned int gate_net_id;
+static struct tc_action_ops act_gate_ops;
+
+static ktime_t gate_get_time(struct gate_action *gact)
+{
+ ktime_t mono = ktime_get();
+
+ switch (gact->tk_offset) {
+ case TK_OFFS_MAX:
+ return mono;
+ default:
+ return ktime_mono_to_any(mono, gact->tk_offset);
+ }
+
+ return KTIME_MAX;
+}
+
+static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
+{
+ struct tcf_gate_params *param = get_gate_param(gact);
+ ktime_t now, base, cycle;
+ u64 n;
+
+ base = ns_to_ktime(param->tcfg_basetime);
+ now = gate_get_time(gact);
+
+ if (ktime_after(base, now)) {
+ *start = base;
+ return 0;
+ }
+
+ cycle = param->tcfg_cycletime;
+
+ /* cycle time should not be zero */
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ *start = ktime_add_ns(base, (n + 1) * cycle);
+ return 0;
+}
+
+static void gate_start_timer(struct gate_action *gact, ktime_t start)
+{
+ ktime_t expires;
+
+ expires = hrtimer_get_expires(&gact->hitimer);
+ if (expires == 0)
+ expires = KTIME_MAX;
+
+ start = min_t(ktime_t, start, expires);
+
+ hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
+}
+
+static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
+{
+ struct gate_action *gact = container_of(timer, struct gate_action,
+ hitimer);
+ struct tcf_gate_params *p = get_gate_param(gact);
+ struct tcfg_gate_entry *next;
+ ktime_t close_time, now;
+
+ spin_lock(&gact->entry_lock);
+
+ next = rcu_dereference_protected(gact->next_entry,
+ lockdep_is_held(&gact->entry_lock));
+
+ /* cycle start, clear pending bit, clear total octets */
+ gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
+ gact->current_entry_octets = 0;
+ gact->current_max_octets = next->maxoctets;
+
+ gact->current_close_time = ktime_add_ns(gact->current_close_time,
+ next->interval);
+
+ close_time = gact->current_close_time;
+
+ if (list_is_last(&next->list, &p->entries))
+ next = list_first_entry(&p->entries,
+ struct tcfg_gate_entry, list);
+ else
+ next = list_next_entry(next, list);
+
+ now = gate_get_time(gact);
+
+ if (ktime_after(now, close_time)) {
+ ktime_t cycle, base;
+ u64 n;
+
+ cycle = p->tcfg_cycletime;
+ base = ns_to_ktime(p->tcfg_basetime);
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ close_time = ktime_add_ns(base, (n + 1) * cycle);
+ }
+
+ rcu_assign_pointer(gact->next_entry, next);
+ spin_unlock(&gact->entry_lock);
+
+ hrtimer_set_expires(&gact->hitimer, close_time);
+
+ return HRTIMER_RESTART;
+}
+
+static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
+ struct tcf_result *res)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct gate_action *gact;
+ int action;
+
+ tcf_lastuse_update(&g->tcf_tm);
+ bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
+
+ action = READ_ONCE(g->tcf_action);
+ rcu_read_lock();
+ gact = rcu_dereference_bh(g->actg);
+ if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
+ rcu_read_unlock();
+ return action;
+ }
+
+ if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
+ goto drop;
+
+ if (gact->current_max_octets >= 0) {
+ gact->current_entry_octets += qdisc_pkt_len(skb);
+ if (gact->current_entry_octets > gact->current_max_octets) {
+ qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
+ goto drop;
+ }
+ }
+ rcu_read_unlock();
+
+ return action;
+drop:
+ rcu_read_unlock();
+ qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
+ return TC_ACT_SHOT;
+}
+
+static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
+ [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
+ [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
+};
+
+static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
+ [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
+ .type = NLA_EXACT_LEN },
+ [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
+ [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
+ [TCA_GATE_FLAGS] = { .type = NLA_U32 },
+ [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
+};
+
+static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
+ struct netlink_ext_ack *extack)
+{
+ u32 interval = 0;
+
+ entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (interval == 0) {
+ NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
+ return -EINVAL;
+ }
+
+ entry->interval = interval;
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
+ else
+ entry->ipv = -1;
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+ else
+ entry->maxoctets = -1;
+
+ return 0;
+}
+
+static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
+ int index, struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
+ int err;
+
+ err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Could not parse nested entry");
+ return -EINVAL;
+ }
+
+ entry->index = index;
+
+ return fill_gate_entry(tb, entry, extack);
+}
+
+static int parse_gate_list(struct nlattr *list_attr,
+ struct tcf_gate_params *sched,
+ struct netlink_ext_ack *extack)
+{
+ struct tcfg_gate_entry *entry, *e;
+ struct nlattr *n;
+ int err, rem;
+ int i = 0;
+
+ if (!list_attr)
+ return -EINVAL;
+
+ nla_for_each_nested(n, list_attr, rem) {
+ if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
+ NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
+ continue;
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ NL_SET_ERR_MSG(extack, "Not enough memory for entry");
+ err = -ENOMEM;
+ goto release_list;
+ }
+
+ err = parse_gate_entry(n, entry, i, extack);
+ if (err < 0) {
+ kfree(entry);
+ goto release_list;
+ }
+
+ list_add_tail(&entry->list, &sched->entries);
+ i++;
+ }
+
+ sched->num_entries = i;
+
+ return i;
+
+release_list:
+ list_for_each_entry_safe(entry, e, &sched->entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+
+ return err;
+}
+
+static int tcf_gate_init(struct net *net, struct nlattr *nla,
+ struct nlattr *est, struct tc_action **a,
+ int ovr, int bind, bool rtnl_held,
+ struct tcf_proto *tp, u32 flags,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+ enum tk_offsets tk_offset = TK_OFFS_TAI;
+ struct nlattr *tb[TCA_GATE_MAX + 1];
+ struct tcf_chain *goto_ch = NULL;
+ struct tcfg_gate_entry *next;
+ struct tcf_gate_params *p;
+ struct gate_action *gact;
+ s32 clockid = CLOCK_TAI;
+ struct tc_gate *parm;
+ struct tcf_gate *g;
+ int ret = 0, err;
+ u64 basetime = 0;
+ u32 gflags = 0;
+ s32 prio = -1;
+ ktime_t start;
+ u32 index;
+
+ if (!nla)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_GATE_PARMS])
+ return -EINVAL;
+ parm = nla_data(tb[TCA_GATE_PARMS]);
+ index = parm->index;
+ err = tcf_idr_check_alloc(tn, &index, a, bind);
+ if (err < 0)
+ return err;
+
+ if (err && bind)
+ return 0;
+
+ if (!err) {
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_gate_ops, bind, flags);
+ if (ret) {
+ tcf_idr_cleanup(tn, index);
+ return ret;
+ }
+
+ ret = ACT_P_CREATED;
+ } else if (!ovr) {
+ tcf_idr_release(*a, bind);
+ return -EEXIST;
+ }
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (tb[TCA_GATE_BASE_TIME])
+ basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
+
+ if (tb[TCA_GATE_FLAGS])
+ gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
+
+ if (tb[TCA_GATE_CLOCKID]) {
+ clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ tk_offset = TK_OFFS_REAL;
+ break;
+ case CLOCK_MONOTONIC:
+ tk_offset = TK_OFFS_MAX;
+ break;
+ case CLOCK_BOOTTIME:
+ tk_offset = TK_OFFS_BOOT;
+ break;
+ case CLOCK_TAI:
+ tk_offset = TK_OFFS_TAI;
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+ err = -EINVAL;
+ goto release_idr;
+ }
+ }
+
+ err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+ if (err < 0)
+ goto release_idr;
+
+ g = to_gate(*a);
+
+ gact = kzalloc(sizeof(*gact), GFP_KERNEL);
+ if (!gact) {
+ err = -ENOMEM;
+ goto put_chain;
+ }
+
+ p = get_gate_param(gact);
+
+ INIT_LIST_HEAD(&p->entries);
+ if (tb[TCA_GATE_ENTRY_LIST]) {
+ err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
+ if (err < 0)
+ goto release_mem;
+ }
+
+ if (list_empty(&p->entries)) {
+ NL_SET_ERR_MSG(extack, "Gate entry list cannot be empty");
+ err = -EINVAL;
+ goto release_mem;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME]) {
+ p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
+ } else {
+ struct tcfg_gate_entry *entry;
+ ktime_t cycle = 0;
+
+ list_for_each_entry(entry, &p->entries, list)
+ cycle = ktime_add_ns(cycle, entry->interval);
+ p->tcfg_cycletime = cycle;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ p->tcfg_cycletime_ext =
+ nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ p->tcfg_priority = prio;
+ p->tcfg_basetime = basetime;
+ p->tcfg_clockid = clockid;
+ p->tcfg_flags = gflags;
+
+ gact->tk_offset = tk_offset;
+ spin_lock_init(&gact->entry_lock);
+ hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
+ gact->hitimer.function = gate_timer_func;
+
+ err = gate_get_start_time(gact, &start);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Internal error: failed get start time");
+ goto release_mem;
+ }
+
+ gact->current_close_time = start;
+ gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING;
+
+ next = list_first_entry(&p->entries, struct tcfg_gate_entry, list);
+ rcu_assign_pointer(gact->next_entry, next);
+
+ gate_start_timer(gact, start);
+
+ spin_lock_bh(&g->tcf_lock);
+ goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+ gact = rcu_replace_pointer(g->actg, gact,
+ lockdep_is_held(&g->tcf_lock));
+ spin_unlock_bh(&g->tcf_lock);
+
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+ if (gact)
+ kfree_rcu(gact, rcu);
+
+ if (ret == ACT_P_CREATED)
+ tcf_idr_insert(tn, *a);
+ return ret;
+
+release_mem:
+ kfree(gact);
+put_chain:
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+release_idr:
+ tcf_idr_release(*a, bind);
+ return err;
+}
+
+static void tcf_gate_cleanup(struct tc_action *a)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct tcfg_gate_entry *entry, *n;
+ struct tcf_gate_params *p;
+ struct gate_action *gact;
+
+ spin_lock_bh(&g->tcf_lock);
+ gact = rcu_dereference_protected(g->actg,
+ lockdep_is_held(&g->tcf_lock));
+ hrtimer_cancel(&gact->hitimer);
+
+ p = get_gate_param(gact);
+ list_for_each_entry_safe(entry, n, &p->entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ spin_unlock_bh(&g->tcf_lock);
+
+ kfree_rcu(gact, rcu);
+}
+
+static int dumping_entry(struct sk_buff *skb,
+ struct tcfg_gate_entry *entry)
+{
+ struct nlattr *item;
+
+ item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
+ if (!item)
+ return -ENOSPC;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
+ goto nla_put_failure;
+
+ if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, item);
+
+nla_put_failure:
+ nla_nest_cancel(skb, item);
+ return -1;
+}
+
+static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
+ int bind, int ref)
+{
+ unsigned char *b = skb_tail_pointer(skb);
+ struct tcf_gate *g = to_gate(a);
+ struct tc_gate opt = {
+ .index = g->tcf_index,
+ .refcnt = refcount_read(&g->tcf_refcnt) - ref,
+ .bindcnt = atomic_read(&g->tcf_bindcnt) - bind,
+ };
+ struct tcfg_gate_entry *entry;
+ struct gate_action *gact;
+ struct tcf_gate_params *p;
+ struct nlattr *entry_list;
+ struct tcf_t t;
+
+ spin_lock_bh(&g->tcf_lock);
+ opt.action = g->tcf_action;
+ gact = rcu_dereference_protected(g->actg,
+ lockdep_is_held(&g->tcf_lock));
+
+ p = get_gate_param(gact);
+
+ if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
+ p->tcfg_basetime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
+ p->tcfg_cycletime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
+ p->tcfg_cycletime_ext, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
+ goto nla_put_failure;
+
+ entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
+ if (!entry_list)
+ goto nla_put_failure;
+
+ list_for_each_entry(entry, &p->entries, list) {
+ if (dumping_entry(skb, entry) < 0)
+ goto nla_put_failure;
+ }
+
+ nla_nest_end(skb, entry_list);
+
+ tcf_tm_dump(&t, &g->tcf_tm);
+ if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
+ goto nla_put_failure;
+ spin_unlock_bh(&g->tcf_lock);
+
+ return skb->len;
+
+nla_put_failure:
+ spin_unlock_bh(&g->tcf_lock);
+ nlmsg_trim(skb, b);
+ return -1;
+}
+
+static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
+ struct netlink_callback *cb, int type,
+ const struct tc_action_ops *ops,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 lastuse, u64 dropped, bool hw)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct tcf_t *tm = &g->tcf_tm;
+
+ tcf_action_update_stats(a, bytes, packets, dropped, hw);
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
+}
+
+static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_idr_search(tn, a, index);
+}
+
+static size_t tcf_gate_get_fill_size(const struct tc_action *act)
+{
+ return nla_total_size(sizeof(struct tc_gate));
+}
+
+static struct tc_action_ops act_gate_ops = {
+ .kind = "gate",
+ .id = TCA_ID_GATE,
+ .owner = THIS_MODULE,
+ .act = tcf_gate_act,
+ .dump = tcf_gate_dump,
+ .init = tcf_gate_init,
+ .cleanup = tcf_gate_cleanup,
+ .walk = tcf_gate_walker,
+ .stats_update = tcf_gate_stats_update,
+ .get_fill_size = tcf_gate_get_fill_size,
+ .lookup = tcf_gate_search,
+ .size = sizeof(struct gate_action),
+};
+
+static __net_init int gate_init_net(struct net *net)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tc_action_net_init(net, tn, &act_gate_ops);
+}
+
+static void __net_exit gate_exit_net(struct list_head *net_list)
+{
+ tc_action_net_exit(net_list, gate_net_id);
+}
+
+static struct pernet_operations gate_net_ops = {
+ .init = gate_init_net,
+ .exit_batch = gate_exit_net,
+ .id = &gate_net_id,
+ .size = sizeof(struct tc_action_net),
+};
+
+static int __init gate_init_module(void)
+{
+ return tcf_register_action(&act_gate_ops, &gate_net_ops);
+}
+
+static void __exit gate_cleanup_module(void)
+{
+ tcf_unregister_action(&act_gate_ops, &gate_net_ops);
+}
+
+module_init(gate_init_module);
+module_exit(gate_cleanup_module);
+MODULE_LICENSE("GPL v2");
--
2.17.1

2020-03-24 04:10:44

by Po Liu

[permalink] [raw]
Subject: [v1,net-next 5/5] net: enetc: add tc flower psfp offload driver

This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
function. Four main feature blocks implement flow policing and
filtering for ingress flows with IEEE 802.1Qci: stream identification
(defined in P802.1CB but required by 802.1Qci), stream filtering,
stream gating and flow metering.
Each function block holds many entries, selected by index, to carry
its parameters. A frame is first matched by stream identification,
then passed to the stream filter block via the handle shared between
the stream identification and stream filter entries. It then enters
the stream gate referenced by the stream filter entry, where it is
policed by the gate and optionally limited by the max SDU configured
in the filter block. Finally it is policed by the flow metering block,
whose index is also chosen in the filter block.
Note that one entry of a block may be linked from many upper-level
entries: assigning the same index lets multiple streams share the same
stream filter, stream gate or flow meter.
To implement these features, each stream, filtered by
source/destination MAC address and optionally the VLAN ID, is treated
as one flow chain, identified by the chain_index that already exists
in the tc filter concept. The driver maintains this chain together
with the gate modules. A stream filter entry is created from the gate
index, the (optional) flow meter entry id and a priority value.
Offloading only transfers the gate action and the flow filtering
parameters; the driver creates one stream filter entry (or reuses an
existing one with the same gate id, flow meter id and priority) and
programs it into the hardware, so stream filtering itself does not
need to be transferred via action offloading.
This architecture matches the relationship between tc filters and
actions: tc filters maintain a list per flow keyed by match keys, and
actions are maintained in the action list.
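
The filter-entry sharing described above can be sketched in plain
user-space C (an illustrative model, not the driver code; all names
here are hypothetical): an entry is reused when an existing one
already carries the same gate index, flow meter index and priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's filter table. */
struct psfp_filter {
	unsigned int gate_id;
	int meter_id;		/* -1 means no flow meter attached */
	int prio;		/* -1 means wildcard priority */
	unsigned int refcount;	/* 0 means the slot is free */
};

/* Return an existing entry with the same (gate_id, meter_id, prio),
 * bumping its refcount, or allocate a free slot; NULL when full.
 */
static struct psfp_filter *find_or_ref_filter(struct psfp_filter *tbl,
					      size_t n, unsigned int gate_id,
					      int meter_id, int prio)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].refcount && tbl[i].gate_id == gate_id &&
		    tbl[i].meter_id == meter_id && tbl[i].prio == prio) {
			tbl[i].refcount++;	/* share the existing entry */
			return &tbl[i];
		}
	}

	for (i = 0; i < n; i++) {	/* otherwise take a free slot */
		if (!tbl[i].refcount) {
			tbl[i].gate_id = gate_id;
			tbl[i].meter_id = meter_id;
			tbl[i].prio = prio;
			tbl[i].refcount = 1;
			return &tbl[i];
		}
	}
	return NULL;
}
```

Two streams requesting the same (gate, meter, priority) triple thus
end up pointing at one hardware filter entry, which is why the driver
keeps a refcount per entry.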

Example tc commands:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry OPEN 200000000 1 8000000 \
sched-entry CLOSE 100000000 -1 -1

These commands set dst_mac 10:00:80:00:00:00 into entry 11 of the
stream identification module and reference gate index 10 of the stream
gate module. The gate list keeps the OPEN state for 200 ms, steering
the passing frames to ingress queue 1 and limiting traffic during the
entry to 8,000,000 octets, then keeps the CLOSE state for 100 ms.
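
When the configured base time is already in the past, the gate list is
started at the beginning of the next full cycle. A minimal user-space
sketch of that computation (mirroring gate_get_start_time() in the
gate action patch; not driver code, all times in nanoseconds):

```c
#include <assert.h>
#include <stdint.h>

/* Start at base_ns if it is still in the future; otherwise skip
 * forward to the first cycle boundary after now_ns, i.e.
 * base + (n + 1) * cycle where n = (now - base) / cycle.
 */
static uint64_t next_cycle_start(uint64_t base_ns, uint64_t cycle_ns,
				 uint64_t now_ns)
{
	uint64_t n;

	if (base_ns > now_ns)
		return base_ns;	/* base time still in the future */

	n = (now_ns - base_ns) / cycle_ns;
	return base_ns + (n + 1) * cycle_ns;
}
```

With the 300 ms cycle from the example above (200 ms OPEN + 100 ms
CLOSE) and a base time of 0, a "now" of 1 s yields a start time of
1.2 s, the next cycle boundary.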

Signed-off-by: Po Liu <[email protected]>
---
Changes from RFC (Acked-by: Vinicius Costa Gomes):
- Use the kernel refcount library
- Move the stream gate check code
- Shorten refcount-related names for clarity

drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 39 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
.../net/ethernet/freescale/enetc/enetc_qos.c | 1074 +++++++++++++++++
5 files changed, 1269 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 04aac7cbb506..298c55786fd9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1521,6 +1521,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
return enetc_setup_tc_cbs(ndev, type_data);
case TC_SETUP_QDISC_ETF:
return enetc_setup_tc_txtime(ndev, type_data);
+ case TC_SETUP_BLOCK:
+ return enetc_setup_tc_psfp(ndev, type_data);
default:
return -EOPNOTSUPP;
}
@@ -1573,17 +1575,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static int enetc_set_psfp(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ int err;

if (en) {
+ err = enetc_psfp_enable(priv);
+ if (err)
+ return err;
+
priv->active_offloads |= ENETC_F_QCI;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&priv->si->hw);
- } else {
- priv->active_offloads &= ~ENETC_F_QCI;
- memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
- enetc_psfp_disable(&priv->si->hw);
+ return 0;
}

+ err = enetc_psfp_disable(priv);
+ if (err)
+ return err;
+
+ priv->active_offloads &= ~ENETC_F_QCI;
+
return 0;
}

@@ -1591,14 +1599,15 @@ int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
netdev_features_t changed = ndev->features ^ features;
+ int err = 0;

if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

if (changed & NETIF_F_HW_TC)
- enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+ err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));

- return 0;
+ return err;
}

#ifdef CONFIG_FSL_ENETC_PTP_CLOCK
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index f1690a178c17..56583df1acb8 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -300,6 +300,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv);
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
+int enetc_psfp_init(struct enetc_ndev_priv *priv);
+int enetc_psfp_clean(struct enetc_ndev_priv *priv);

static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
@@ -319,33 +324,59 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ enetc_get_max_cap(priv);
+
+ err = enetc_psfp_init(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
| ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS
| ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+
+ return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ err = enetc_psfp_clean(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
& ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS
& ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+
+ return 0;
}
+
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_block_cb NULL
+
#define enetc_get_max_cap(p) \
memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
return 0;
}
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 587974862f48..6314051bc6c1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -567,6 +567,9 @@ enum bdcr_cmd_class {
BDCR_CMD_RFS,
BDCR_CMD_PORT_GCL,
BDCR_CMD_RECV_CLASSIFIER,
+ BDCR_CMD_STREAM_IDENTIFY,
+ BDCR_CMD_STREAM_FILTER,
+ BDCR_CMD_STREAM_GCL,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -598,13 +601,152 @@ struct tgs_gcl_data {
struct gce entry[];
};

+/* class 7, command 0, Stream Identity Entry Configuration */
+struct streamid_conf {
+ __le32 stream_handle; /* init gate value */
+ __le32 iports;
+ u8 id_type;
+ u8 oui[3];
+ u8 res[3];
+ u8 en;
+};
+
+#define ENETC_CBDR_SID_VID_MASK 0xfff
+#define ENETC_CBDR_SID_VIDM BIT(12)
+#define ENETC_CBDR_SID_TG_MASK 0xc000
+/* streamid_conf address points to this data space */
+struct streamid_data {
+ union {
+ u8 dmac[6];
+ u8 smac[6];
+ };
+ u16 vid_vidm_tg;
+};
+
+#define ENETC_CBDR_SFI_PRI_MASK 0x7
+#define ENETC_CBDR_SFI_PRIM BIT(3)
+#define ENETC_CBDR_SFI_BLOV BIT(4)
+#define ENETC_CBDR_SFI_BLEN BIT(5)
+#define ENETC_CBDR_SFI_MSDUEN BIT(6)
+#define ENETC_CBDR_SFI_FMITEN BIT(7)
+#define ENETC_CBDR_SFI_ENABLE BIT(7)
+/* class 8, command 0, Stream Filter Instance, Short Format */
+struct sfi_conf {
+ __le32 stream_handle;
+ u8 multi;
+ u8 res[2];
+ u8 sthm;
+ /* Max Service Data Unit or Flow Meter Instance Table index.
+ * Depending on the value of FLT this represents either Max
+ * Service Data Unit (max frame size) allowed by the filter
+ * entry or is an index into the Flow Meter Instance table
+ * index identifying the policer which will be used to police
+ * it.
+ */
+ __le16 fm_inst_table_index;
+ __le16 msdu;
+ __le16 sg_inst_table_index;
+ u8 res1[2];
+ __le32 input_ports;
+ u8 res2[3];
+ u8 en;
+};
+
+/* class 8, command 2, Stream Filter Instance status query, short format;
+ * the command needs no structure definition.
+ * Stream Filter Instance Query Statistics Response data
+ */
+struct sfi_counter_data {
+ u32 matchl;
+ u32 matchh;
+ u32 msdu_dropl;
+ u32 msdu_droph;
+ u32 stream_gate_dropl;
+ u32 stream_gate_droph;
+ u32 flow_meter_dropl;
+ u32 flow_meter_droph;
+};
+
+#define ENETC_CBDR_SGI_OIPV_MASK 0x7
+#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_CGTST BIT(6)
+#define ENETC_CBDR_SGI_OGTST BIT(7)
+#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
+#define ENETC_CBDR_SGI_CFG_PND BIT(2)
+#define ENETC_CBDR_SGI_OEX BIT(4)
+#define ENETC_CBDR_SGI_OEXEN BIT(5)
+#define ENETC_CBDR_SGI_IRX BIT(6)
+#define ENETC_CBDR_SGI_IRXEN BIT(7)
+#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
+#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
+#define ENETC_CBDR_SGI_EN BIT(7)
+/* class 9, command 0, Stream Gate Instance Table, Short Format
+ * class 9, command 2, Stream Gate Instance Table entry query write back
+ * Short Format
+ */
+struct sgi_table {
+ u8 res[8];
+ u8 oipv;
+ u8 res0[2];
+ u8 ocgtst;
+ u8 res1[7];
+ u8 gset;
+ u8 oacl_len;
+ u8 res2[2];
+ u8 en;
+};
+
+#define ENETC_CBDR_SGI_AIPV_MASK 0x7
+#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_AGTST BIT(7)
+
+/* class 9, command 1, Stream Gate Control List, Long Format */
+struct sgcl_conf {
+ u8 aipv;
+ u8 res[2];
+ u8 agtst;
+ u8 res1[4];
+ union {
+ struct {
+ u8 res2[4];
+ u8 acl_len;
+ u8 res3[3];
+ };
+ u8 cct[8]; /* Config change time */
+ };
+};
+
+#define ENETC_CBDR_SGL_IOMEN BIT(0)
+#define ENETC_CBDR_SGL_IPVEN BIT(3)
+#define ENETC_CBDR_SGL_GTST BIT(4)
+#define ENETC_CBDR_SGL_IPV_MASK 0xe
+/* Stream Gate Control List Entry */
+struct sgce {
+ u32 interval;
+ u8 msdu[3];
+ u8 multi;
+};
+
+/* stream control list class 9 , cmd 1 data buffer */
+struct sgcl_data {
+ u32 btl;
+ u32 bth;
+ u32 ct;
+ u32 cte;
+ struct sgce sgcl[0];
+};
+
struct enetc_cbd {
union{
+ struct sfi_conf sfi_conf;
+ struct sgi_table sgi_table;
struct {
__le32 addr[2];
union {
__le32 opt[4];
struct tgs_gcl_conf gcl_conf;
+ struct streamid_conf sid_set;
+ struct sgcl_conf sgcl_conf;
};
}; /* Long format */
__le32 data[6];
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index b38beb41998b..64d36fd1d444 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -740,12 +740,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

- if (si->hw_features & ENETC_SI_F_PSFP) {
+ if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
priv->active_offloads |= ENETC_F_QCI;
ndev->features |= NETIF_F_HW_TC;
ndev->hw_features |= NETIF_F_HW_TC;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&si->hw);
}

/* pick up primary MAC address from SI */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 0c6bf3a55a9a..f479a190f6cf 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -5,6 +5,9 @@

#include <net/pkt_sched.h>
#include <linux/math64.h>
+#include <linux/refcount.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>

static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
@@ -331,3 +334,1074 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)

return 0;
}
+
+enum streamid_type {
+ STREAMID_TYPE_RESERVED = 0,
+ STREAMID_TYPE_NULL,
+ STREAMID_TYPE_SMAC,
+};
+
+enum streamid_vlan_tagged {
+ STREAMID_VLAN_RESERVED = 0,
+ STREAMID_VLAN_TAGGED,
+ STREAMID_VLAN_UNTAGGED,
+ STREAMID_VLAN_ALL,
+};
+
+#define ENETC_PSFP_WILDCARD -1
+#define HANDLE_OFFSET 100
+
+enum forward_type {
+ FILTER_ACTION_TYPE_PSFP = BIT(0),
+ FILTER_ACTION_TYPE_ACL = BIT(1),
+ FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
+};
+
+/* This limits the allowed output type for a given set of input actions */
+struct actions_fwd {
+ u64 actions;
+ u64 keys; /* include the must needed keys */
+ enum forward_type output;
+};
+
+struct psfp_streamfilter_counters {
+ u64 matching_frames_count;
+ u64 passing_frames_count;
+ u64 not_passing_frames_count;
+ u64 passing_sdu_count;
+ u64 not_passing_sdu_count;
+ u64 red_frames_count;
+};
+
+struct enetc_streamid {
+ u32 index;
+ union {
+ u8 src_mac[6];
+ u8 dst_mac[6];
+ };
+ u8 filtertype;
+ u16 vid;
+ u8 tagged;
+ s32 handle;
+};
+
+struct enetc_psfp_filter {
+ u32 index;
+ s32 handle;
+ s8 prio;
+ u32 gate_id;
+ s32 meter_id;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+struct enetc_psfp_gate {
+ u32 index;
+ s8 init_ipv;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimext;
+ u32 num_entries;
+ refcount_t refcount;
+ struct hlist_node node;
+ struct action_gate_entry entries[0];
+};
+
+struct enetc_stream_filter {
+ struct enetc_streamid sid;
+ u32 sfi_index;
+ u32 sgi_index;
+ struct flow_stats stats;
+ struct hlist_node node;
+};
+
+struct enetc_psfp {
+ unsigned long dev_bitmap;
+ unsigned long *psfp_sfi_bitmap;
+ struct hlist_head stream_list;
+ struct hlist_head psfp_filter_list;
+ struct hlist_head psfp_gate_list;
+ spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
+};
+
+struct actions_fwd enetc_act_fwd[] = {
+ {
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
+ /* example for ACL actions */
+ {
+ BIT(FLOW_ACTION_DROP),
+ 0,
+ FILTER_ACTION_TYPE_ACL
+ }
+};
+
+static struct enetc_psfp epsfp = {
+ .psfp_sfi_bitmap = NULL,
+};
+
+static LIST_HEAD(enetc_block_cb_list);
+
+static inline int enetc_get_port(struct enetc_ndev_priv *priv)
+{
+ return priv->si->pdev->devfn & 0x7;
+}
+
+/* Stream Identity Entry Set Descriptor */
+static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct streamid_data *si_data;
+ struct streamid_conf *si_conf;
+ u16 data_size;
+ dma_addr_t dma;
+ int err;
+
+ if (sid->index >= priv->psfp_cap.max_streamid)
+ return -EINVAL;
+
+ if (sid->filtertype != STREAMID_TYPE_NULL &&
+ sid->filtertype != STREAMID_TYPE_SMAC)
+ return -EOPNOTSUPP;
+
+ /* Disable operation before enable */
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct streamid_data);
+ si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!si_data)
+ return -ENOMEM;
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev, si_data,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(si_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+ memset(si_data->dmac, 0xff, ETH_ALEN);
+ si_data->vid_vidm_tg =
+ cpu_to_le16(ENETC_CBDR_SID_VID_MASK
+ + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
+
+ si_conf = &cbd.sid_set;
+ /* Only one port supported for one entry, set itself */
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = 1;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err) {
+ kfree(si_data);
+ return -EINVAL;
+ }
+
+ if (!enable) {
+ kfree(si_data);
+ return 0;
+ }
+
+ /* Enable the entry and overwrite again in case the space was flushed by hardware */
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ si_conf->en = 0x80;
+ si_conf->stream_handle = cpu_to_le32(sid->handle);
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = sid->filtertype;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ memset(si_data, 0, data_size);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ /* VIDM default to be 1.
+ * VID Match. If set (b1) then the VID must match, otherwise
+ * any VID is considered a match. VIDM setting is only used
+ * when TG is set to b01.
+ */
+ if (si_conf->id_type == STREAMID_TYPE_NULL) {
+ ether_addr_copy(si_data->dmac, sid->dst_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
+ ether_addr_copy(si_data->smac, sid->src_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ kfree(si_data);
+
+ return err;
+}
+
+/* Stream Filter Instance Set Descriptor */
+static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_filter *sfi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct sfi_conf *sfi_config;
+
+ cbd.index = cpu_to_le16(sfi->index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0x80;
+ cbd.length = cpu_to_le16(1);
+
+ sfi_config = &cbd.sfi_conf;
+ if (!enable)
+ goto exit;
+
+ sfi_config->en = 0x80;
+
+ if (sfi->handle >= 0) {
+ sfi_config->stream_handle =
+ cpu_to_le32(sfi->handle);
+ sfi_config->sthm |= 0x80;
+ }
+
+ sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
+ sfi_config->input_ports = 1 << enetc_get_port(priv);
+
+ /* The priority value which may be matched against the
+ * frame’s priority value to determine a match for this entry.
+ */
+ if (sfi->prio >= 0)
+ sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
+
+ /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
+ * field as being either an MSDU value or an index into the Flow
+ * Meter Instance table.
+ * TODO: no maximum SDU limit is set for now
+ */
+
+ if (sfi->meter_id >= 0) {
+ sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
+ sfi_config->multi |= 0x80;
+ }
+
+exit:
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
+static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
+ u32 index,
+ struct psfp_streamfilter_counters *cnt)
+{
+ struct enetc_cbd cbd = { .cmd = 2 };
+ struct sfi_counter_data *data_buf;
+ dma_addr_t dma;
+ u16 data_size;
+ int err;
+
+ cbd.index = cpu_to_le16((u16)index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct sfi_counter_data);
+ data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ dma = dma_map_single(&priv->si->pdev->dev, data_buf,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ err = -ENOMEM;
+ goto exit;
+ }
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ goto exit;
+
+ cnt->matching_frames_count =
+ ((u64)le32_to_cpu(data_buf->matchh) << 32)
+ + le32_to_cpu(data_buf->matchl);
+
+ cnt->not_passing_sdu_count =
+ ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
+ + le32_to_cpu(data_buf->msdu_dropl);
+
+ cnt->passing_sdu_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count;
+
+ cnt->not_passing_frames_count =
+ ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
+ + le32_to_cpu(data_buf->stream_gate_dropl);
+
+ cnt->passing_frames_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count
+ - cnt->not_passing_frames_count;
+
+ cnt->red_frames_count =
+ ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
+ + le32_to_cpu(data_buf->flow_meter_dropl);
+
+exit:
+ kfree(data_buf);
+ return err;
+}
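The derived counters above follow fixed arithmetic: SDU-passing frames are matches minus size drops, and gate-passing frames additionally subtract gate drops. A minimal user-space sketch of that arithmetic (all names here are illustrative, not the driver's):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative sketch of the counter arithmetic in
 * enetc_streamcounter_hw_get(): hardware reports raw match and drop
 * counts, and the per-filter statistics are derived from them.
 * All names are hypothetical.
 */
struct sketch_counters {
	uint64_t matching;   /* frames that matched the stream filter */
	uint64_t msdu_drops; /* dropped for exceeding the max SDU size */
	uint64_t gate_drops; /* dropped by the stream gate */
};

/* frames that passed the SDU-size check */
static uint64_t passing_sdu(const struct sketch_counters *c)
{
	return c->matching - c->msdu_drops;
}

/* frames that survived both the SDU check and the gate */
static uint64_t passing_frames(const struct sketch_counters *c)
{
	return c->matching - c->msdu_drops - c->gate_drops;
}
```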
+
+static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
+{
+ u64 now_lo, now_hi, now, n;
+
+ now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
+ now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
+ now = now_lo | now_hi << 32;
+
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(now, cycle);
+
+ *start = (n + 1) * cycle;
+
+ return 0;
+}
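The helper above aligns the start time to the next cycle boundary, i.e. ((now / cycle) + 1) * cycle. A user-space sketch of the same computation, with illustrative names:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the start-time alignment in get_start_ns(): when no base
 * time is configured, the gate list begins at the next whole cycle
 * boundary after "now". Names are illustrative.
 */
static int next_cycle_start(uint64_t now, uint64_t cycle, uint64_t *start)
{
	if (!cycle)
		return -1; /* a zero cycle time is invalid */

	*start = (now / cycle + 1) * cycle;
	return 0;
}
```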
+
+/* Stream Gate Instance Set Descriptor */
+static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_gate *sgi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct sgi_table *sgi_config;
+ struct sgcl_conf *sgcl_config;
+ struct sgcl_data *sgcl_data;
+ struct sgce *sgce;
+ dma_addr_t dma;
+ u16 data_size;
+ int err, i;
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0x80;
+
+ /* disable */
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ if (!sgi->num_entries)
+ return 0;
+
+ if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
+ !sgi->cycletime)
+ return -EINVAL;
+
+ /* enable */
+ sgi_config = &cbd.sgi_table;
+
+ /* Keep open before gate list start */
+ sgi_config->ocgtst = 0x80;
+
+ sgi_config->oipv = (sgi->init_ipv < 0) ?
+ 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
+
+ sgi_config->en = 0x80;
+
+ /* Basic config */
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 1;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0;
+
+ sgcl_config = &cbd.sgcl_conf;
+
+ sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
+
+ data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
+
+ sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!sgcl_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev,
+ sgcl_data, data_size,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(sgcl_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ sgce = &sgcl_data->sgcl[0];
+
+ sgcl_config->agtst = 0x80;
+
+ sgcl_data->ct = cpu_to_le32(sgi->cycletime);
+ sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
+
+ if (sgi->init_ipv >= 0)
+ sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
+
+ for (i = 0; i < sgi->num_entries; i++) {
+ struct action_gate_entry *from = &sgi->entries[i];
+ struct sgce *to = &sgce[i];
+
+ if (from->gate_state)
+ to->multi |= 0x10;
+
+ if (from->ipv >= 0)
+ to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
+
+ if (from->maxoctets)
+ to->multi |= 0x01;
+
+ to->interval = cpu_to_le32(from->interval);
+ to->msdu[0] = from->maxoctets & 0xFF;
+ to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
+ to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
+ }
+
+ /* If basetime is 0, calculate start time */
+ if (!sgi->basetime) {
+ u64 start;
+
+ err = get_start_ns(priv, sgi->cycletime, &start);
+ if (err)
+ goto exit;
+ sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
+ sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
+ } else {
+ u32 hi, lo;
+
+ hi = upper_32_bits(sgi->basetime);
+ lo = lower_32_bits(sgi->basetime);
+ sgcl_data->bth = cpu_to_le32(hi);
+ sgcl_data->btl = cpu_to_le32(lo);
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+
+exit:
+ kfree(sgcl_data);
+
+ return err;
+}
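One detail worth noting in the gate-list loop above is the msdu[] field: the 24-bit maxoctets limit is stored byte by byte, least significant byte first. A small user-space sketch of that packing (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the 24-bit maxoctets packing used for a gate control list
 * entry's three-byte msdu[] field, least significant byte first.
 * Purely illustrative, not the driver's code.
 */
static void pack_msdu24(uint32_t maxoctets, uint8_t msdu[3])
{
	msdu[0] = maxoctets & 0xFF;         /* bits 7:0 */
	msdu[1] = (maxoctets >> 8) & 0xFF;  /* bits 15:8 */
	msdu[2] = (maxoctets >> 16) & 0xFF; /* bits 23:16 */
}

static uint32_t unpack_msdu24(const uint8_t msdu[3])
{
	return msdu[0] | (uint32_t)msdu[1] << 8 | (uint32_t)msdu[2] << 16;
}
```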
+
+static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
+{
+ struct enetc_stream_filter *f;
+
+ hlist_for_each_entry(f, &epsfp.stream_list, node)
+ if (f->sid.index == index)
+ return f;
+
+ return NULL;
+}
+
+static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
+{
+ struct enetc_psfp_gate *g;
+
+ hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
+ if (g->index == index)
+ return g;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->index == index)
+ return s;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter
+ *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->gate_id == sfi->gate_id &&
+ s->prio == sfi->prio &&
+ s->meter_id == sfi->meter_id)
+ return s;
+
+ return NULL;
+}
+
+static int enetc_get_free_index(struct enetc_ndev_priv *priv)
+{
+ u32 max_size = priv->psfp_cap.max_psfp_filter;
+ unsigned long index;
+
+ index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
+ if (index == max_size)
+ return -1;
+
+ return index;
+}
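The index allocator above is a find-first-zero-bit scan over the stream filter bitmap. A user-space sketch of the same idea, using a plain byte array in place of the kernel bitmap (all names illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the free-index allocation in enetc_get_free_index():
 * scan for the first clear bit, or fail when the table is full.
 * A byte array stands in for the kernel bitmap API.
 */
static int sketch_get_free_index(const uint8_t *bitmap, int max_size)
{
	for (int i = 0; i < max_size; i++)
		if (!(bitmap[i / 8] & (1u << (i % 8))))
			return i;

	return -1; /* no free entry */
}
```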
+
+static void stream_filter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_filter *sfi;
+
+ sfi = enetc_get_filter_by_index(index);
+ if (WARN_ON(!sfi))
+ return;
+
+ if (refcount_dec_and_test(&sfi->refcount)) {
+ enetc_streamfilter_hw_set(priv, sfi, false);
+ hlist_del(&sfi->node);
+ clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ kfree(sfi);
+ }
+}
+
+static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_gate *sgi;
+
+ sgi = enetc_get_gate_by_index(index);
+ if (WARN_ON(!sgi))
+ return;
+
+ if (refcount_dec_and_test(&sgi->refcount)) {
+ enetc_streamgate_hw_set(priv, sgi, false);
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void remove_one_chain(struct enetc_ndev_priv *priv,
+ struct enetc_stream_filter *filter)
+{
+ stream_gate_unref(priv, filter->sgi_index);
+ stream_filter_unref(priv, filter->sfi_index);
+
+ hlist_del(&filter->node);
+ kfree(filter);
+}
+
+static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ struct enetc_psfp_filter *sfi,
+ struct enetc_psfp_gate *sgi)
+{
+ int err;
+
+ err = enetc_streamid_hw_set(priv, sid, true);
+ if (err)
+ return err;
+
+ if (sfi) {
+ err = enetc_streamfilter_hw_set(priv, sfi, true);
+ if (err)
+ goto revert_sid;
+ }
+
+ err = enetc_streamgate_hw_set(priv, sgi, true);
+ if (err)
+ goto revert_sfi;
+
+ return 0;
+
+revert_sfi:
+ if (sfi)
+ enetc_streamfilter_hw_set(priv, sfi, false);
+revert_sid:
+ enetc_streamid_hw_set(priv, sid, false);
+ return err;
+}
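The function above uses the common goto-unwind pattern: each successfully applied stage is reverted in reverse order when a later stage fails. A minimal user-space sketch of that pattern, with stand-in stage functions (not the driver's):

```c
#include <assert.h>

/* Sketch of the revert-on-failure ladder in enetc_psfp_hw_set():
 * stages are applied in order, and on failure the already-applied
 * stages are torn down in reverse order. The stage functions here
 * are hypothetical stand-ins that just record state.
 */
static int applied[3]; /* 1 while a stage is "in hardware" */

static int set_stage(int i, int fail)
{
	if (fail)
		return -1;
	applied[i] = 1;
	return 0;
}

static void clear_stage(int i)
{
	applied[i] = 0;
}

static int apply_all(int fail_at) /* stage index to fail, or -1 */
{
	int err;

	err = set_stage(0, fail_at == 0);
	if (err)
		return err;

	err = set_stage(1, fail_at == 1);
	if (err)
		goto revert0;

	err = set_stage(2, fail_at == 2);
	if (err)
		goto revert1;

	return 0;

revert1:
	clear_stage(1);
revert0:
	clear_stage(0);
	return err;
}
```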
+
+struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
+ if (acts == enetc_act_fwd[i].actions &&
+ inputkeys & enetc_act_fwd[i].keys)
+ return &enetc_act_fwd[i];
+
+ return NULL;
+}
+
+static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_filter *sfi, *old_sfi;
+ struct enetc_psfp_gate *sgi, *old_sgi;
+ struct flow_action_entry *entry;
+ struct action_gate_entry *e;
+ u8 sfi_overwrite = 0;
+ int entries_size;
+ int i, err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ flow_action_for_each(i, entry, &rule->action)
+ if (entry->id == FLOW_ACTION_GATE)
+ break;
+
+ if (entry->id != FLOW_ACTION_GATE)
+ return -EINVAL;
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ filter->sid.index = f->common.chain_index;
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_match_eth_addrs match;
+
+ flow_rule_match_eth_addrs(rule, &match);
+
+ if (!is_zero_ether_addr(match.mask->dst)) {
+ ether_addr_copy(filter->sid.dst_mac, match.key->dst);
+ filter->sid.filtertype = STREAMID_TYPE_NULL;
+ }
+
+ if (!is_zero_ether_addr(match.mask->src)) {
+ ether_addr_copy(filter->sid.src_mac, match.key->src);
+ filter->sid.filtertype = STREAMID_TYPE_SMAC;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported, ETH_ADDRS match is required");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_match_vlan match;
+
+ flow_rule_match_vlan(rule, &match);
+ if (match.mask->vlan_priority) {
+ if (match.mask->vlan_priority !=
+ (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ }
+
+ if (match.mask->vlan_id) {
+ if (match.mask->vlan_id != VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ filter->sid.vid = match.key->vlan_id;
+ if (!filter->sid.vid)
+ filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
+ else
+ filter->sid.tagged = STREAMID_VLAN_TAGGED;
+ }
+ } else {
+ filter->sid.tagged = STREAMID_VLAN_ALL;
+ }
+
+ /* parsing gate action */
+ if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ sgi = kzalloc(entries_size, GFP_KERNEL);
+ if (!sgi) {
+ err = -ENOMEM;
+ goto free_filter;
+ }
+
+ refcount_set(&sgi->refcount, 1);
+ sgi->index = entry->gate.index;
+ sgi->init_ipv = entry->gate.prio;
+ sgi->basetime = entry->gate.basetime;
+ sgi->cycletime = entry->gate.cycletime;
+ sgi->num_entries = entry->gate.num_entries;
+
+ e = sgi->entries;
+ for (i = 0; i < entry->gate.num_entries; i++) {
+ e[i].gate_state = entry->gate.entries[i].gate_state;
+ e[i].interval = entry->gate.entries[i].interval;
+ e[i].ipv = entry->gate.entries[i].ipv;
+ e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ }
+
+ filter->sgi_index = sgi->index;
+
+ sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
+ if (!sfi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+
+ refcount_set(&sfi->refcount, 1);
+ sfi->gate_id = sgi->index;
+
+ /* flow metering is not supported yet */
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+
+ /* prio reflects the tc filter's priority */
+ if (f->common.prio && f->common.prio <= BIT(3))
+ sfi->prio = f->common.prio - 1;
+ else
+ sfi->prio = ENETC_PSFP_WILDCARD;
+
+ old_sfi = enetc_psfp_check_sfi(sfi);
+ if (!old_sfi) {
+ int index;
+
+ index = enetc_get_free_index(priv);
+ if (index < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
+ err = -ENOSPC;
+ goto free_sfi;
+ }
+
+ sfi->index = index;
+ sfi->handle = index + HANDLE_OFFSET;
+ /* Update the stream filter handle also */
+ filter->sid.handle = sfi->handle;
+ filter->sfi_index = sfi->index;
+ sfi_overwrite = 0;
+ } else {
+ filter->sfi_index = old_sfi->index;
+ filter->sid.handle = old_sfi->handle;
+ sfi_overwrite = 1;
+ }
+
+ err = enetc_psfp_hw_set(priv, &filter->sid,
+ sfi_overwrite ? NULL : sfi, sgi);
+ if (err)
+ goto free_sfi;
+
+ spin_lock(&epsfp.psfp_lock);
+ /* Remove the old node if exist and update with a new node */
+ old_sgi = enetc_get_gate_by_index(filter->sgi_index);
+ if (old_sgi) {
+ refcount_set(&sgi->refcount,
+ refcount_read(&old_sgi->refcount) + 1);
+ hlist_del(&old_sgi->node);
+ kfree(old_sgi);
+ }
+
+ hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
+
+ if (!old_sfi) {
+ hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
+ set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ } else {
+ kfree(sfi);
+ refcount_inc(&old_sfi->refcount);
+ }
+
+ old_filter = enetc_get_stream_by_index(filter->sid.index);
+ if (old_filter)
+ remove_one_chain(priv, old_filter);
+
+ filter->stats.lastused = jiffies;
+ hlist_add_head(&filter->node, &epsfp.stream_list);
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ return 0;
+
+free_sfi:
+ kfree(sfi);
+free_gate:
+ kfree(sgi);
+free_filter:
+ kfree(filter);
+
+ return err;
+}
+
+static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct flow_dissector *dissector = rule->match.dissector;
+ struct flow_action *action = &rule->action;
+ struct flow_action_entry *entry;
+ struct actions_fwd *fwd;
+ u64 actions = 0;
+ int i, err;
+
+ if (!flow_action_has_entries(action)) {
+ NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
+ return -EINVAL;
+ }
+
+ flow_action_for_each(i, entry, action)
+ actions |= BIT(entry->id);
+
+ fwd = enetc_check_flow_actions(actions, dissector->used_keys);
+ if (!fwd) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
+ return -EOPNOTSUPP;
+ }
+
+ if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
+ err = enetc_psfp_parse_clsflower(priv, cls_flower);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
+ return err;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct enetc_stream_filter *filter;
+ struct netlink_ext_ack *extack = f->common.extack;
+ int err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamid_hw_set(priv, &filter->sid, false);
+ if (err)
+ return err;
+
+ remove_one_chain(priv, filter);
+
+ return 0;
+}
+
+static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ return enetc_psfp_destroy_clsflower(priv, f);
+}
+
+static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct psfp_streamfilter_counters counters = {};
+ struct enetc_stream_filter *filter;
+ struct flow_stats stats = {};
+ int err;
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
+ if (err)
+ return -EINVAL;
+
+ spin_lock(&epsfp.psfp_lock);
+ stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.dropped = counters.not_passing_frames_count -
+ filter->stats.dropped;
+ stats.lastused = filter->stats.lastused;
+ filter->stats.pkts += stats.pkts;
+ filter->stats.dropped += stats.dropped;
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
+ stats.dropped);
+
+ return 0;
+}
+
+static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case FLOW_CLS_REPLACE:
+ return enetc_config_clsflower(priv, cls_flower);
+ case FLOW_CLS_DESTROY:
+ return enetc_destroy_clsflower(priv, cls_flower);
+ case FLOW_CLS_STATS:
+ return enetc_psfp_get_stats(priv, cls_flower);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static inline void clean_psfp_sfi_bitmap(void)
+{
+ bitmap_free(epsfp.psfp_sfi_bitmap);
+ epsfp.psfp_sfi_bitmap = NULL;
+}
+
+static void clean_stream_list(void)
+{
+ struct enetc_stream_filter *s;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
+ hlist_del(&s->node);
+ kfree(s);
+ }
+}
+
+static void clean_sfi_list(void)
+{
+ struct enetc_psfp_filter *sfi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ }
+}
+
+static void clean_sgi_list(void)
+{
+ struct enetc_psfp_gate *sgi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void clean_psfp_all(void)
+{
+ /* Disable all list nodes and free all memory */
+ clean_sfi_list();
+ clean_sgi_list();
+ clean_stream_list();
+ epsfp.dev_bitmap = 0;
+ clean_psfp_sfi_bitmap();
+}
+
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct net_device *ndev = cb_priv;
+
+ if (!tc_can_offload(ndev))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_psfp_init(struct enetc_ndev_priv *priv)
+{
+ if (epsfp.psfp_sfi_bitmap)
+ return 0;
+
+ epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
+ GFP_KERNEL);
+ if (!epsfp.psfp_sfi_bitmap)
+ return -ENOMEM;
+
+ spin_lock_init(&epsfp.psfp_lock);
+
+ if (list_empty(&enetc_block_cb_list))
+ epsfp.dev_bitmap = 0;
+
+ return 0;
+}
+
+int enetc_psfp_clean(struct enetc_ndev_priv *priv)
+{
+ if (!list_empty(&enetc_block_cb_list))
+ return -EBUSY;
+
+ clean_psfp_all();
+
+ return 0;
+}
+
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct flow_block_offload *f = type_data;
+ int err;
+
+ err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
+ enetc_setup_tc_block_cb,
+ ndev, ndev, true);
+ if (err)
+ return err;
+
+ switch (f->command) {
+ case FLOW_BLOCK_BIND:
+ set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ if (!epsfp.dev_bitmap)
+ clean_psfp_all();
+ break;
+ }
+
+ return 0;
+}
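The bind/unbind bookkeeping above keeps one bit per port in a shared bitmap and tears down the shared PSFP state only when the last port unbinds. A user-space sketch of that idea (all names illustrative):

```c
#include <assert.h>

/* Sketch of the FLOW_BLOCK_BIND/UNBIND bookkeeping in
 * enetc_setup_tc_psfp(): each bound port sets its bit in a shared
 * bitmap, and the shared state is cleaned only when the bitmap
 * becomes empty. Names are hypothetical.
 */
static unsigned long dev_bitmap;
static int cleaned;

static void sketch_clean_all(void)
{
	cleaned = 1;
}

static void sketch_bind(int port)
{
	dev_bitmap |= 1UL << port;
}

static void sketch_unbind(int port)
{
	dev_bitmap &= ~(1UL << port);
	if (!dev_bitmap)
		sketch_clean_all(); /* last user gone */
}
```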
--
2.17.1

2020-03-24 04:10:48

by Po Liu

[permalink] [raw]
Subject: [v1,net-next 4/5] net: enetc: add tc hw offload features for PSFP capability

This patch lets ethtool enable/disable the tc flower offload feature.
The enetc hardware supports PSFP (IEEE 802.1Qci) for per-stream
policing. When the tc hardware offload feature is enabled, the driver
enables the 802.1Qci function by setting only the feature's enable bit
in the register; it does not program any stream identification, stream
filtering or stream gate entries, but it does read out the hardware
capability limits for each of these features.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 23 ++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 55 +++++++++++++++++++
.../net/ethernet/freescale/enetc/enetc_hw.h | 17 ++++++
.../net/ethernet/freescale/enetc/enetc_pf.c | 8 +++
4 files changed, 103 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index ccf2611f4a20..04aac7cbb506 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -756,6 +756,9 @@ void enetc_get_si_caps(struct enetc_si *si)

if (val & ENETC_SIPCAPR0_QBV)
si->hw_features |= ENETC_SI_F_QBV;
+
+ if (val & ENETC_SIPCAPR0_PSFP)
+ si->hw_features |= ENETC_SI_F_PSFP;
}

static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
@@ -1567,6 +1570,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
return 0;
}

+static int enetc_set_psfp(struct net_device *ndev, int en)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ if (en) {
+ priv->active_offloads |= ENETC_F_QCI;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&priv->si->hw);
+ } else {
+ priv->active_offloads &= ~ENETC_F_QCI;
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+ enetc_psfp_disable(&priv->si->hw);
+ }
+
+ return 0;
+}
+
int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1575,6 +1595,9 @@ int enetc_set_features(struct net_device *ndev,
if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

+ if (changed & NETIF_F_HW_TC)
+ enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 56c43f35b633..f1690a178c17 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -151,6 +151,7 @@ enum enetc_errata {
};

#define ENETC_SI_F_QBV BIT(0)
+#define ENETC_SI_F_PSFP BIT(1)

/* PCI IEP device data */
struct enetc_si {
@@ -203,12 +204,20 @@ struct enetc_cls_rule {
};

#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */
+struct psfp_cap {
+ u32 max_streamid;
+ u32 max_psfp_filter;
+ u32 max_psfp_gate;
+ u32 max_psfp_gatelist;
+ u32 max_psfp_meter;
+};

/* TODO: more hardware offloads */
enum enetc_active_offloads {
ENETC_F_RX_TSTAMP = BIT(0),
ENETC_F_TX_TSTAMP = BIT(1),
ENETC_F_QBV = BIT(2),
+ ENETC_F_QCI = BIT(3),
};

struct enetc_ndev_priv {
@@ -231,6 +240,8 @@ struct enetc_ndev_priv {

struct enetc_cls_rule *cls_rules;

+ struct psfp_cap psfp_cap;
+
struct device_node *phy_node;
phy_interface_t if_mode;
};
@@ -289,9 +300,53 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+
+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
+{
+ u32 reg = 0;
+
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
+ /* Port stream filter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
+ /* Port stream gate capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
+ /* Port flow meter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
+}
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
+ | ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS
+ | ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
+ & ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS
+ & ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+}
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_get_max_cap(p) \
+ memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+}
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 2a6523136947..587974862f48 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -19,6 +19,7 @@
#define ENETC_SICTR1 0x1c
#define ENETC_SIPCAPR0 0x20
#define ENETC_SIPCAPR0_QBV BIT(4)
+#define ENETC_SIPCAPR0_PSFP BIT(9)
#define ENETC_SIPCAPR0_RSS BIT(8)
#define ENETC_SIPCAPR1 0x24
#define ENETC_SITGTGR 0x30
@@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11))
#define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1))
#define ENETC_PM0_IFM_XGMII BIT(12)
+#define ENETC_PSIDCAPR 0x1b08
+#define ENETC_PSIDCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSFCAPR 0x1b18
+#define ENETC_PSFCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSGCAPR 0x1b28
+#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16)
+#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0)
+#define ENETC_PFMCAPR 0x1b38
+#define ENETC_PFMCAPR_MSK GENMASK(15, 0)

/* MAC counters */
#define ENETC_PM0_REOCT 0x8100
@@ -621,3 +631,10 @@ struct enetc_cbd {
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
#define ENETC_TSDE BIT(31)
+
+/* PSFP setting */
+#define ENETC_PPSFPMR 0x11b00
+#define ENETC_PPSFPMR_PSFPEN BIT(0)
+#define ENETC_PPSFPMR_VS BIT(1)
+#define ENETC_PPSFPMR_PVC BIT(2)
+#define ENETC_PPSFPMR_PVZC BIT(3)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 4e4a49179f0b..b38beb41998b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -740,6 +740,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

+ if (si->hw_features & ENETC_SI_F_PSFP) {
+ priv->active_offloads |= ENETC_F_QCI;
+ ndev->features |= NETIF_F_HW_TC;
+ ndev->hw_features |= NETIF_F_HW_TC;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&si->hw);
+ }
+
/* pick up primary MAC address from SI */
enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
}
--
2.17.1

2020-03-24 10:02:24

by Jiri Pirko

[permalink] [raw]
Subject: Re: [v1,net-next 1/5] net: qos offload add flow status with dropped count

Tue, Mar 24, 2020 at 04:47:39AM CET, [email protected] wrote:
>Add the hardware tc flower offloading with dropped frame counter for
>status update. action ops->stats_update only loaded by the
>tcf_exts_stats_update() and tcf_exts_stats_update() only loaded by
>matchall and tc flower hardware filter. But the stats_update only set
>the dropped count as default false in the ops->stats_update. This
>patch add the dropped counter to action stats update. Its dropped counter
>update by the hardware offloading driver.
>This is changed by replacing the drop flag with dropped frame counter.

I just read this paragraph 3 times, I'm unable to decipher :(



>
>Driver side should update how many "packets" it filtered and how many
>"dropped" in those "packets".
>

[...]


> return action;
> }
>
>-static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u32 packets,
>- u64 lastuse, bool hw)
>+static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u64 packets,
>+ u64 lastuse, u64 dropped, bool hw)
> {
> struct tcf_gact *gact = to_gact(a);
> int action = READ_ONCE(gact->tcf_action);
> struct tcf_t *tm = &gact->tcf_tm;
>
>- tcf_action_update_stats(a, bytes, packets, action == TC_ACT_SHOT, hw);
>+ tcf_action_update_stats(a, bytes, packets,
>+ (action == TC_ACT_SHOT) ? packets : 0, hw);

Avoid ()s here.


> tm->lastuse = max_t(u64, tm->lastuse, lastuse);
> }
>

2020-03-24 10:21:36

by Jiri Pirko

[permalink] [raw]
Subject: Re: [v1,net-next 2/5] net: qos: introduce a gate control flow action

Tue, Mar 24, 2020 at 04:47:40AM CET, [email protected] wrote:
>Introduce a ingress frame gate control flow action. tc create a gate
>action would provide a gate list to control when open/close state. when
>the gate open state, the flow could pass but not when gate state is
>close. The driver would repeat the gate list cyclically. User also could
>assign a time point to start the gate list by the basetime parameter. if
>the basetime has passed current time, start time would calculate by the
>cycletime of the gate list.

Cannot decipher this either :/ Seriously, please make the patch
descriptions readable.

Also, a sentence starts with capital letter.



>The action gate behavior try to keep according to the IEEE 802.1Qci spec.
>For the software simulation, require the user input the clock type.
>
>Below is the setting example in user space. Tc filter a stream source ip
>address is 192.168.0.20 and gate action own two time slots. One is last
>200ms gate open let frame pass another is last 100ms gate close let
>frames dropped. When the passed total frames over 8000000 bytes, it will
>dropped in one 200000000ns time slot.
>
>> tc qdisc add dev eth0 ingress
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action gate index 2 clockid CLOCK_TAI \
> sched-entry OPEN 200000000 -1 8000000 \
> sched-entry CLOSE 100000000 -1 -1

The rest of the commands do not use capitals. Please lowercase these.


>
>> tc chain del dev eth0 ingress chain 0
>
>"sched-entry" follow the name taprio style. gate state is
>"OPEN"/"CLOSE". Follow the period nanosecond. Then next item is internal
>priority value means which ingress queue should put. "-1" means
>wildcard. The last value optional specifies the maximum number of
>MSDU octets that are permitted to pass the gate during the specified
>time interval.
>Base-time is not set will be as 0 as default, as result start time would
>be ((N + 1) * cycletime) which is the minimal of future time.
>
>Below example shows filtering a stream with destination mac address is
>10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
>action would run with one close time slot which means always keep close.
>The time cycle is total 200000000ns. The base-time would calculate by:
>
> 1357000000000 + (N + 1) * cycletime
>
>When the total value is the future time, it will be the start time.
>The cycletime here would be 200000000ns for this case.
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> action gate index 12 base-time 1357000000000 \
> sched-entry CLOSE 200000000 -1 -1 \
> clockid CLOCK_TAI
>
>NOTE: This software simulator version not separate the admin/operation
>state machine. Update setting would overwrite stop the previos setting
>and waiting new cycle start.
>
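The start-time rule quoted above can be sketched in user-space C: when the base-time is already in the past, the schedule starts at base-time + (N + 1) * cycletime, the smallest such value that lies in the future (names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the quoted start-time rule: a base-time in the past is
 * advanced by whole cycles until it lands in the future. Purely
 * illustrative names.
 */
static uint64_t sketch_start_time(uint64_t basetime, uint64_t cycle,
				  uint64_t now)
{
	uint64_t n;

	if (basetime > now)
		return basetime; /* base-time still in the future */

	n = (now - basetime) / cycle;
	return basetime + (n + 1) * cycle;
}
```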

[...]


>diff --git a/net/sched/Kconfig b/net/sched/Kconfig
>index bfbefb7bff9d..320471a0a21d 100644
>--- a/net/sched/Kconfig
>+++ b/net/sched/Kconfig
>@@ -981,6 +981,21 @@ config NET_ACT_CT
> To compile this code as a module, choose M here: the
> module will be called act_ct.
>
>+config NET_ACT_GATE
>+ tristate "Frame gate list control tc action"
>+ depends on NET_CLS_ACT
>+ help
>+ Say Y here to allow the control the ingress flow by the gate list

"to control"?


>+ control. The frame policing by the time gate list control open/close

Incomplete sentence.


>+ cycle time. The manipulation will simulate the IEEE 802.1Qci stream
>+ gate control behavior. The action could be offload by the tc flower
>+ to hardware driver which the hardware own the capability of IEEE
>+ 802.1Qci.

We do not mention offload for the other actions. I suggest to not to
mention it here either.


>+
>+ If unsure, say N.
>+ To compile this code as a module, choose M here: the
>+ module will be called act_gate.
>+
> config NET_IFE_SKBMARK
> tristate "Support to encoding decoding skb mark on IFE action"
> depends on NET_ACT_IFE

[...]

2020-03-24 10:30:32

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v1,net-next 2/5] net: qos: introduce a gate control flow action

Hi Jiri,

I will update the descriptions to make them clearer.

> -----Original Message-----
> From: Jiri Pirko <[email protected]>
> Sent: March 24, 2020 18:19
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>; Xiaoliang Yang
> <[email protected]>; Roy Zang <[email protected]>; Mingkai Hu
> <[email protected]>; Jerry Huang <[email protected]>; Leo Li
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [EXT] Re: [v1,net-next 2/5] net: qos: introduce a gate control flow
> action
>
> Caution: EXT Email
>
> Tue, Mar 24, 2020 at 04:47:40AM CET, [email protected] wrote:
> >Introduce a ingress frame gate control flow action. tc create a gate
> >action would provide a gate list to control when open/close state. when
> >the gate open state, the flow could pass but not when gate state is
> >close. The driver would repeat the gate list cyclically. User also
> >could assign a time point to start the gate list by the basetime
> >parameter. if the basetime has passed current time, start time would
> >calculate by the cycletime of the gate list.
>
> Cannot decipher this either :/ Seriously, please make the patch
> descriptions readable.
>

Ok.

> Also, a sentence starts with capital letter.
>
>
>
> >The action gate behavior try to keep according to the IEEE 802.1Qci spec.
> >For the software simulation, require the user input the clock type.
> >
> >Below is the setting example in user space. Tc filter a stream source
> >ip address is 192.168.0.20 and gate action own two time slots. One is
> >last 200ms gate open let frame pass another is last 100ms gate close
> >let frames dropped. When the passed total frames over 8000000 bytes, it
> >will dropped in one 200000000ns time slot.
> >
> >> tc qdisc add dev eth0 ingress
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower src_ip 192.168.0.20 \
> > action gate index 2 clockid CLOCK_TAI \
> > sched-entry OPEN 200000000 -1 8000000 \
> > sched-entry CLOSE 100000000 -1 -1
>
> The rest of the commands do not use capitals. Please lowercase these.
>

Ok.

>
> >
> >> tc chain del dev eth0 ingress chain 0
> >
> >"sched-entry" follow the name taprio style. gate state is
> >"OPEN"/"CLOSE". Follow the period nanosecond. Then next item is
> >internal priority value means which ingress queue should put. "-1"
> >means wildcard. The last value optional specifies the maximum number
> of
> >MSDU octets that are permitted to pass the gate during the specified
> >time interval.
> >Base-time is not set will be as 0 as default, as result start time
> >would be ((N + 1) * cycletime) which is the minimal of future time.
> >
> >Below example shows filtering a stream with destination mac address is
> >10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
> >action would run with one close time slot which means always keep close.
> >The time cycle is total 200000000ns. The base-time would calculate by:
> >
> > 1357000000000 + (N + 1) * cycletime
> >
> >When the total value is the future time, it will be the start time.
> >The cycletime here would be 200000000ns for this case.
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> > action gate index 12 base-time 1357000000000 \
> > sched-entry CLOSE 200000000 -1 -1 \
> > clockid CLOCK_TAI
> >
> >NOTE: This software simulator version not separate the admin/operation
> >state machine. Update setting would overwrite stop the previos setting
> >and waiting new cycle start.
> >
>
> [...]
>
>
> >diff --git a/net/sched/Kconfig b/net/sched/Kconfig index
> >bfbefb7bff9d..320471a0a21d 100644
> >--- a/net/sched/Kconfig
> >+++ b/net/sched/Kconfig
> >@@ -981,6 +981,21 @@ config NET_ACT_CT
> > To compile this code as a module, choose M here: the
> > module will be called act_ct.
> >
> >+config NET_ACT_GATE
> >+ tristate "Frame gate list control tc action"
> >+ depends on NET_CLS_ACT
> >+ help
> >+ Say Y here to allow the control the ingress flow by the gate
> >+list
>
> "to control"?

Ok.

>
>
> >+ control. The frame policing by the time gate list control
> >+ open/close
>
> Incomplete sentence.
>
>
> >+ cycle time. The manipulation will simulate the IEEE 802.1Qci stream
> >+ gate control behavior. The action could be offload by the tc flower
> >+ to hardware driver which the hardware own the capability of IEEE
> >+ 802.1Qci.
>
> We do not mention offload for the other actions. I suggest to not to
> mention it here either.

Ok.

>
>
> >+
> >+ If unsure, say N.
> >+ To compile this code as a module, choose M here: the
> >+ module will be called act_gate.
> >+
> > config NET_IFE_SKBMARK
> > tristate "Support to encoding decoding skb mark on IFE action"
> > depends on NET_ACT_IFE
>
> [...]

Thanks!

Br,
Po Liu

2020-03-24 11:19:47

by kernel test robot

[permalink] [raw]
Subject: Re: [v1,net-next 4/5] net: enetc: add hw tc hw offload features for PSPF capability

Hi Po,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on next-20200323]
[cannot apply to linux/master linus/master sparc-next/master ipvs/master v5.6-rc7 v5.6-rc6 v5.6-rc5 v5.6-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Po-Liu/Introduce-a-flow-gate-control-action-and-apply-IEEE/20200324-121156
base: 5149100c3aebe5e640d6ff68e0b5e5a7eb8638e0
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.2.0 make.cross ARCH=arm64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All warnings (new ones prefixed by >>):

In file included from drivers/net/ethernet/freescale/enetc/enetc_ptp.c:8:
drivers/net/ethernet/freescale/enetc/enetc.h: In function 'enetc_psfp_enable':
>> drivers/net/ethernet/freescale/enetc/enetc.h:345:9: warning: 'return' with a value, in function returning void [-Wreturn-type]
345 | return 0;
| ^
drivers/net/ethernet/freescale/enetc/enetc.h:343:20: note: declared here
343 | static inline void enetc_psfp_enable(struct enetc_hw *hw)
| ^~~~~~~~~~~~~~~~~
drivers/net/ethernet/freescale/enetc/enetc.h: In function 'enetc_psfp_disable':
drivers/net/ethernet/freescale/enetc/enetc.h:350:9: warning: 'return' with a value, in function returning void [-Wreturn-type]
350 | return 0;
| ^
drivers/net/ethernet/freescale/enetc/enetc.h:348:20: note: declared here
348 | static inline void enetc_psfp_disable(struct enetc_hw *hw)
| ^~~~~~~~~~~~~~~~~~

vim +/return +345 drivers/net/ethernet/freescale/enetc/enetc.h

328
329 static inline void enetc_psfp_disable(struct enetc_hw *hw)
330 {
331 enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
332 & ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS
333 & ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
334 }
335 #else
336 #define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
337 #define enetc_sched_speed_set(ndev) (void)0
338 #define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
339 #define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
340 #define enetc_get_max_cap(p) \
341 memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
342
343 static inline void enetc_psfp_enable(struct enetc_hw *hw)
344 {
> 345 return 0;
346 }
347

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (3.00 kB)
.config.gz (47.80 kB)
Download all attachments

2020-03-24 12:16:01

by Jiri Pirko

[permalink] [raw]
Subject: Re: [v1,net-next 4/5] net: enetc: add hw tc hw offload features for PSPF capability

Tue, Mar 24, 2020 at 04:47:42AM CET, [email protected] wrote:

[...]


>@@ -289,9 +300,53 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
> void enetc_sched_speed_set(struct net_device *ndev);
> int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
> int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
>+
>+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
>+{
>+ u32 reg = 0;

Pointless init.


>+
>+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
>+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
>+ /* Port stream filter capability */
>+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
>+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
>+ /* Port stream gate capability */
>+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
>+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
>+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
>+ /* Port flow meter capability */
>+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
>+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
>+}
>+
>+static inline void enetc_psfp_enable(struct enetc_hw *hw)
>+{
>+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
>+ | ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS

Hmm, I think it is better to have "|" at the end of the line".


>+ | ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
>+}
>+
>+static inline void enetc_psfp_disable(struct enetc_hw *hw)
>+{
>+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR)
>+ & ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS

Same here.


>+ & ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
>+}
> #else
> #define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
> #define enetc_sched_speed_set(ndev) (void)0
> #define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
> #define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP

[...]

2020-03-24 12:54:31

by kernel test robot

[permalink] [raw]
Subject: Re: [v1,net-next 5/5] net: enetc: add tc flower psfp offload driver

Hi Po,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20200323]
[cannot apply to linux/master linus/master sparc-next/master ipvs/master v5.6-rc7 v5.6-rc6 v5.6-rc5 v5.6-rc7]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url: https://github.com/0day-ci/linux/commits/Po-Liu/Introduce-a-flow-gate-control-action-and-apply-IEEE/20200324-121156
base: 5149100c3aebe5e640d6ff68e0b5e5a7eb8638e0
config: arm64-defconfig (attached as .config)
compiler: aarch64-linux-gcc (GCC) 9.2.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=9.2.0 make.cross ARCH=arm64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <[email protected]>

All errors (new ones prefixed by >>):

drivers/net/ethernet/freescale/enetc/enetc_pf.c: In function 'enetc_pf_netdev_setup':
>> drivers/net/ethernet/freescale/enetc/enetc_pf.c:743:62: error: passing argument 1 of 'enetc_psfp_enable' from incompatible pointer type [-Werror=incompatible-pointer-types]
743 | if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
| ^~~~
| |
| struct enetc_ndev_priv *
In file included from drivers/net/ethernet/freescale/enetc/enetc_pf.h:4,
from drivers/net/ethernet/freescale/enetc/enetc_pf.c:8:
drivers/net/ethernet/freescale/enetc/enetc.h:374:54: note: expected 'struct enetc_hw *' but argument is of type 'struct enetc_ndev_priv *'
374 | static inline int enetc_psfp_enable(struct enetc_hw *hw)
| ~~~~~~~~~~~~~~~~~^~
cc1: some warnings being treated as errors
--
drivers/net/ethernet/freescale/enetc/enetc.c: In function 'enetc_set_psfp':
>> drivers/net/ethernet/freescale/enetc/enetc.c:1581:27: error: passing argument 1 of 'enetc_psfp_enable' from incompatible pointer type [-Werror=incompatible-pointer-types]
1581 | err = enetc_psfp_enable(priv);
| ^~~~
| |
| struct enetc_ndev_priv *
In file included from drivers/net/ethernet/freescale/enetc/enetc.c:4:
drivers/net/ethernet/freescale/enetc/enetc.h:374:54: note: expected 'struct enetc_hw *' but argument is of type 'struct enetc_ndev_priv *'
374 | static inline int enetc_psfp_enable(struct enetc_hw *hw)
| ~~~~~~~~~~~~~~~~~^~
>> drivers/net/ethernet/freescale/enetc/enetc.c:1589:27: error: passing argument 1 of 'enetc_psfp_disable' from incompatible pointer type [-Werror=incompatible-pointer-types]
1589 | err = enetc_psfp_disable(priv);
| ^~~~
| |
| struct enetc_ndev_priv *
In file included from drivers/net/ethernet/freescale/enetc/enetc.c:4:
drivers/net/ethernet/freescale/enetc/enetc.h:379:55: note: expected 'struct enetc_hw *' but argument is of type 'struct enetc_ndev_priv *'
379 | static inline int enetc_psfp_disable(struct enetc_hw *hw)
| ~~~~~~~~~~~~~~~~~^~
cc1: some warnings being treated as errors

vim +/enetc_psfp_enable +743 drivers/net/ethernet/freescale/enetc/enetc_pf.c

703
704 static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
705 const struct net_device_ops *ndev_ops)
706 {
707 struct enetc_ndev_priv *priv = netdev_priv(ndev);
708
709 SET_NETDEV_DEV(ndev, &si->pdev->dev);
710 priv->ndev = ndev;
711 priv->si = si;
712 priv->dev = &si->pdev->dev;
713 si->ndev = ndev;
714
715 priv->msg_enable = (NETIF_MSG_WOL << 1) - 1;
716 ndev->netdev_ops = ndev_ops;
717 enetc_set_ethtool_ops(ndev);
718 ndev->watchdog_timeo = 5 * HZ;
719 ndev->max_mtu = ENETC_MAX_MTU;
720
721 ndev->hw_features = NETIF_F_SG | NETIF_F_RXCSUM | NETIF_F_HW_CSUM |
722 NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_CTAG_RX |
723 NETIF_F_LOOPBACK;
724 ndev->features = NETIF_F_HIGHDMA | NETIF_F_SG |
725 NETIF_F_RXCSUM | NETIF_F_HW_CSUM |
726 NETIF_F_HW_VLAN_CTAG_TX |
727 NETIF_F_HW_VLAN_CTAG_RX |
728 NETIF_F_HW_VLAN_CTAG_FILTER;
729
730 if (si->num_rss)
731 ndev->hw_features |= NETIF_F_RXHASH;
732
733 if (si->errata & ENETC_ERR_TXCSUM) {
734 ndev->hw_features &= ~NETIF_F_HW_CSUM;
735 ndev->features &= ~NETIF_F_HW_CSUM;
736 }
737
738 ndev->priv_flags |= IFF_UNICAST_FLT;
739
740 if (si->hw_features & ENETC_SI_F_QBV)
741 priv->active_offloads |= ENETC_F_QBV;
742
> 743 if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
744 priv->active_offloads |= ENETC_F_QCI;
745 ndev->features |= NETIF_F_HW_TC;
746 ndev->hw_features |= NETIF_F_HW_TC;
747 }
748
749 /* pick up primary MAC address from SI */
750 enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
751 }
752

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


Attachments:
(No filename) (5.64 kB)
.config.gz (47.80 kB)
Download all attachments

2020-03-24 13:07:07

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v1,net-next 1/5] net: qos offload add flow status with dropped count

Hi Jiri,

> -----Original Message-----
> From: Jiri Pirko <[email protected]>
> Sent: March 24, 2020 18:02
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>; Xiaoliang Yang
> <[email protected]>; Roy Zang <[email protected]>; Mingkai Hu
> <[email protected]>; Jerry Huang <[email protected]>; Leo Li
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [EXT] Re: [v1,net-next 1/5] net: qos offload add flow status with
> dropped count
>
> Caution: EXT Email
>
> Tue, Mar 24, 2020 at 04:47:39AM CET, [email protected] wrote:
> >Add the hardware tc flower offloading with dropped frame counter for
> >status update. action ops->stats_update only loaded by the
> >tcf_exts_stats_update() and tcf_exts_stats_update() only loaded by
> >matchall and tc flower hardware filter. But the stats_update only set
> >the dropped count as default false in the ops->stats_update. This patch
> >add the dropped counter to action stats update. Its dropped counter
> >update by the hardware offloading driver.
> >This is changed by replacing the drop flag with dropped frame counter.
>
> I just read this paragraph 3 times, I'm unable to decipher :(

Sorry for the confusion; I will write a clearer description.
Before that, let me explain what the patch does here so you can give more suggestions.

With tc flower offloading, the stats are reported (via the FLOW_CLS_STATS flag) in 'struct flow_stats' inside 'struct flow_cls_offload'. But 'struct flow_stats' only includes the packet count. Some actions, such as policing, and the stream gate action introduced by patches 0002/0003, can drop frames, and the dropped count is important for the user's evaluation. The relation between the packet count ('pkts' in struct flow_stats) and the dropped count should be: 'pkts' is how many frames the filter matched, and 'dropped' is how many of those 'pkts' were dropped.
Eventually, the stats are updated through the 'struct tc_action' operation stats_update(). To add the dropped count, the patch adds a 'dropped' parameter to the related functions: ops->stats_update(), tcf_exts_stats_update() and tcf_action_update_stats(). There is currently a bool parameter 'drop', which indicates whether the update is for the drop count or not, but as far as I can see it is not actually used anywhere in the current kernel code (correct me if I am wrong). So the patch replaces the bool 'drop' flag with a 'dropped' count in tcf_action_update_stats(), so that tcf_action_update_stats() reports how many 'packets' were updated and how many of those 'packets' were 'dropped'.
Thanks!

>
>
>
> >
> >Driver side should update how many "packets" it filtered and how many
> >"dropped" in those "packets".
> >
>
> [...]
>
>
> > return action;
> > }
> >
> >-static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u32
> packets,
> >- u64 lastuse, bool hw)
> >+static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u64
> packets,
> >+ u64 lastuse, u64 dropped, bool hw)
> > {
> > struct tcf_gact *gact = to_gact(a);
> > int action = READ_ONCE(gact->tcf_action);
> > struct tcf_t *tm = &gact->tcf_tm;
> >
> >- tcf_action_update_stats(a, bytes, packets, action == TC_ACT_SHOT,
> hw);
> >+ tcf_action_update_stats(a, bytes, packets,
> >+ (action == TC_ACT_SHOT) ? packets : 0,
> >+ hw);
>
> Avoid ()s here.

Ok.

>
>
> > tm->lastuse = max_t(u64, tm->lastuse, lastuse); }
> >


Br,
Po Liu

2020-03-24 22:03:49

by Stephen Hemminger

[permalink] [raw]
Subject: Re: [v1,iproute2 1/2] iproute2:tc:action: add a gate control action

On Tue, 24 Mar 2020 11:47:44 +0800
Po Liu <[email protected]> wrote:

> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index a6aa466..7a047a9 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -106,6 +106,7 @@ enum tca_id {
> TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
> TCA_ID_CTINFO,
> TCA_ID_MPLS,
> + TCA_ID_GATE,
> TCA_ID_CT,
> /* other actions go here */
> __TCA_ID_MAX = 255

All uapi headers need to come from the checked-in kernel.

This is an example of why: you have an out-of-date version, and this
version would have broken the ABI.


This patch should be against iproute2-next, since it depends on net-next.

2020-03-25 02:41:53

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v1,iproute2 1/2] iproute2:tc:action: add a gate control action

Hi Stephen,


> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: March 25, 2020 5:59
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>; Xiaoliang Yang
> <[email protected]>; Roy Zang <[email protected]>; Mingkai Hu
> <[email protected]>; Jerry Huang <[email protected]>; Leo Li
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; David Ahern <[email protected]>
> Subject: [EXT] Re: [v1,iproute2 1/2] iproute2:tc:action: add a gate control
> action
>
> Caution: EXT Email
>
> On Tue, 24 Mar 2020 11:47:44 +0800
> Po Liu <[email protected]> wrote:
>
> > diff --git a/include/uapi/linux/pkt_cls.h
> > b/include/uapi/linux/pkt_cls.h index a6aa466..7a047a9 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -106,6 +106,7 @@ enum tca_id {
> > TCA_ID_SAMPLE = TCA_ACT_SAMPLE,
> > TCA_ID_CTINFO,
> > TCA_ID_MPLS,
> > + TCA_ID_GATE,
> > TCA_ID_CT,
> > /* other actions go here */
> > __TCA_ID_MAX = 255
>
> All uapi headers need to come from the checked-in kernel.
>
> This is an example of why: you have an out-of-date version, and this
> version would have broken the ABI.
>
>
> This patch should be against iproute2-next, since it depends on
> net-next.

I'll keep up with the iproute2-next branch. Thanks!

Br,
Po Liu

2020-04-14 14:32:56

by Po Liu

[permalink] [raw]
Subject: [v2,net-next 0/4] Introduce a flow gate control action and apply IEEE

Changes from V1:
0000: Updated the description to make it clearer.
0001: Removed the 'add update dropped stats' patch; it will be sent
as standalone patches in a separate pull request.
0001: Updated the commit description to make it clearer, as requested
by Jiri Pirko.
0001: No changes.
0003: Fixed some code style issues pointed out by Jiri Pirko.
0004: Fixed the enetc_psfp_enable/disable parameter type reported by
the kernel test robot.

iproute2 command patches:
Not attached to this series; a separate pull request will be sent
after the kernel patches are accepted.

Changes from RFC:
0000: Reduced to 5 patches; removed the max frame size offload and
flow metering from the policing offload action, keeping only the gate
action offloading implementation.
0001: No changes.
0002:
- Fixed a kfree issue reported by Jakub Kicinski and Cong Wang
- License fix from Jakub Kicinski and Stephen Hemminger
- Updated the example in the commit message, acked by Vinicius Costa Gomes
- Fixed the RCU protection in tcf_gate_act(), acked by Vinicius

0003: No changes
0004: No changes
0005:
Acked by Vinicius Costa Gomes
- Used the kernel refcount library
- Moved the stream gate check code
- Made the refcount names clearer

iprout2 command patches:
0000: Update license expression and add gate id
0001: Add tc action gate man page

--------------------------------------------------------------------
These patches add software support and tc flower hardware offload
support for stream gate action policing as defined in IEEE 802.1Qci
(Per-Stream Filtering and Policing), and implement the stream
identification, stream filtering and stream gate filtering functions
in the NXP ENETC ethernet driver.
Per-Stream Filtering and Policing (PSFP) specifies flow policing and
filtering for ingress flows, and has three main parts:
1. The stream filter instance table consists of an ordered list of
stream filters that determine the filtering and policing actions to
be applied to frames received on a specific stream. The main elements
are the stream gate id, flow metering id and maximum SDU size.
2. The stream gate function sets up a gate list to control the
open/close state of an ingress traffic class. While the gate is open
the flow may pass; while it is closed frames are dropped. The user
sets a basetime to tell the gate when to start running the entry
list, and the hardware then repeats the list periodically. There is
no comparable qdisc action today.
3. Flow metering uses a two-rate, two-bucket, three-color marker to
police the frames. Flow metering instances follow the algorithm
specified in MEF 10.3. The closest existing qdisc action is the
police action.

The first patch introduces an ingress frame flow control gate action,
covering point 2. The tc gate action maintains the gate list of
open/close states, allowing flows to pass while the gate is open.
Each gate action may police one or more qdisc filters. Once the start
time arrives, the driver repeats the gate list periodically. If the
user supplies a basetime that is already in the past, the driver
calculates a new future start time from the cycletime of the gate
list.

The 0002 patch introduces the gate flow hardware offloading.

The 0003 patch adds support for controlling the on/off state of the
tc flower offloading via ethtool.

The 0004 patch implements the stream identification, stream filtering
and stream gate filtering functions in the NXP ENETC ethernet driver.
The tc filter command provides filtering keys such as the MAC address
and VLAN id. These keys are set in the stream identification instance
entry. The stream gate instance entry takes its parameters from the
gate action. The stream filter instance entry references the stream
gate index and is assigned a stream handle value that matches the
stream identification instance.

Po Liu (4):
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSPF capability
net: enetc: add tc flower psfp offload driver

drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 159 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1070 +++++++++++++++++
include/net/flow_offload.h | 10 +
include/net/tc_act/tc_gate.h | 169 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 13 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 647 ++++++++++
net/sched/cls_api.c | 33 +
13 files changed, 2275 insertions(+), 1 deletion(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1

2020-04-14 14:33:01

by Po Liu

[permalink] [raw]
Subject: [ v2,net-next 2/4] net: schedule: add action gate offloading

Add the gate action to the flow action entry. Add the gate parameters
in tc_setup_flow_action() when filling the entries of the
flow_action_entry array provided to the driver.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 +++
include/net/tc_act/tc_gate.h | 115 +++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++
3 files changed, 158 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 3619c6acf60f..94a30fe02e6d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -255,6 +256,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index b0ace55b2aaa..62633cb02c7a 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -51,4 +58,112 @@ struct tcf_gate {
#define get_gate_param(act) ((struct tcf_gate_params *)act)
#define get_gate_action(p) ((struct gate_action *)p)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ rcu_read_lock();
+ tcfg_prio = rcu_dereference(to_gate(a)->actg)->param.tcfg_priority;
+ rcu_read_unlock();
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ rcu_read_lock();
+ tcfg_basetime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_basetime;
+ rcu_read_unlock();
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ rcu_read_lock();
+ tcfg_cycletime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime;
+ rcu_read_unlock();
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ rcu_read_lock();
+ tcfg_cycletimeext =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime_ext;
+ rcu_read_unlock();
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ rcu_read_lock();
+ num_entries =
+ rcu_dereference(to_gate(a)->actg)->param.num_entries;
+ rcu_read_unlock();
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ rcu_read_lock();
+ p = &(rcu_dereference(to_gate(a)->actg)->param);
+ num_entries = p->num_entries;
+ rcu_read_unlock();
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries)
+ return NULL;
+
+ oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
+ if (!oe)
+ return NULL;
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f6a3b969ead0..c8de5a887230 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,6 +39,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3522,6 +3523,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3668,6 +3690,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_priority(act)) {
entry->id = FLOW_ACTION_PRIORITY;
entry->priority = tcf_skbedit_priority(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-04-15 23:37:06

by David Miller

[permalink] [raw]
Subject: Re: [v2,net-next 0/4] Introduce a flow gate control action and apply IEEE


net-next is still closed, please resubmit when it opens back up.

Thank you.

2020-04-18 01:34:25

by Po Liu

[permalink] [raw]
Subject: [v2,net-next 0/4] Introduce a flow gate control action and apply IEEE

Changes from V1:
0000: Updated the description to make it clearer.
0001: Removed the 'add update dropped stats' patch; it will be sent
separately as standalone patches.
0001: Updated the commit description to make it clearer, acked by Jiri Pirko.
0001: No changes.
0003: Fixed some code style issues, acked by Jiri Pirko.
0004: Fixed the enetc_psfp_enable/disable parameter type, reported by the
kernel test robot.

iproute2 command patches:
Not attached to this series; a separate pull request will follow once
the kernel accepts these patches.

Changes from RFC:
0000: Reduced to 5 patches; removed the max frame size offload and the
flow metering from the policing offload action. Only the gate action
offloading implementation is kept.
0001: No changes.
0002:
- Fixed a kfree leak, acked by Jakub Kicinski and Cong Wang
- License fix from Jakub Kicinski and Stephen Hemminger
- Updated the example in the commit message, acked by Vinicius Costa Gomes
- Fixed the RCU protection in tcf_gate_act(), acked by Vinicius

0003: No changes
0004: No changes
0005:
Acked by Vinicius Costa Gomes
- Use the kernel refcount library
- Moved the stream gate check code
- Renamed the refcount variables for clarity

iproute2 command patches:
0000: Updated the license expression and added the gate id
0001: Added a tc action gate man page

--------------------------------------------------------------------
These patches add software support and hardware offload support for the
stream gate action of IEEE 802.1Qci (Per-Stream Filtering and Policing)
in tc flower, and implement the stream identification, stream filtering
and stream gate filtering actions in the NXP ENETC ethernet driver.
Per-Stream Filtering and Policing (PSFP) specifies flow policing and
filtering for ingress flows, and has three main parts:
1. The stream filter instance table consists of an ordered list of
stream filters that determine the filtering and policing actions to be
applied to frames received on a specific stream. The main elements are
the stream gate id, the flow metering id and the maximum SDU size.
2. The stream gate function sets up a gate list that controls the
open/close state of the ingress traffic class. While the gate is in the
open state, the flow may pass; while it is in the closed state, frames
are dropped. The user sets a base-time to tell the gate when to start
running the entry list; the hardware then repeats the list periodically.
There is no comparable qdisc action today.
3. Flow metering uses a two-rate, two-bucket, three-color marker to
police the frames. Flow metering instances follow the algorithm
specified in MEF 10.3. The closest existing qdisc action is the police
action.

The first patch introduces an ingress frame flow control gate action for
point 2. The tc gate action maintains a gate list of open/close states,
allowing flows to pass while the gate is open. Each gate action may
police one or more qdisc filters. When the start time arrives, the
driver repeats the gate list periodically. If the user assigns a
base-time that has already passed, the driver calculates a new start
time in the future from the cycle time of the gate list.

The 0002 patch introduces the gate flow hardware offloading.

The 0003 patch adds support for controlling the on/off state of tc
flower offloading via ethtool.

The 0004 patch implements the stream identification, stream filtering
and stream gate filtering actions in the NXP ENETC ethernet driver. The
tc filter command provides filtering keys such as MAC address and VLAN
id. These keys are set in the stream identification instance entry. The
stream gate instance entry takes the gate action parameters, and the
stream filter instance entry references the stream gate index and
assigns a stream handle value that matches the stream identification
instance.

Po Liu (4):
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSPF capability
net: enetc: add tc flower psfp offload driver

drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 159 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1070 +++++++++++++++++
include/net/flow_offload.h | 10 +
include/net/tc_act/tc_gate.h | 169 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 13 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 647 ++++++++++
net/sched/cls_api.c | 33 +
13 files changed, 2275 insertions(+), 1 deletion(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1

2020-04-18 01:34:28

by Po Liu

[permalink] [raw]
Subject: [ v2,net-next 1/4] net: qos: introduce a gate control flow action

Introduce an ingress frame gate control flow action.
The tc gate action works like this:
Assume there is a gate that allows specified ingress frames to pass
during certain time slots and drops them during others. A tc filter
selects the ingress frames, and the tc gate action specifies in which
time slots the frames may be passed to the device and in which time
slots they are dropped.
The tc gate action provides an entry list that tells how long the gate
stays open and how long it stays closed. The gate action also assigns a
start time that tells when the entry list starts. The driver then
repeats the gate entry list cyclically.
For the software simulation, the gate action requires the user to
assign a clock type.

Below is a configuration example from user space. The tc filter matches
a stream with source ip address 192.168.0.20, and the gate action owns
two time slots: one lasts 200ms with the gate open to let frames pass,
the other lasts 100ms with the gate closed so frames are dropped. When
the frames passed in one 200000000ns open slot exceed 8000000 bytes in
total, further frames in that slot are dropped.

> tc qdisc add dev eth0 ingress

> tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry open 200000000 -1 8000000 \
sched-entry close 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow with period nanosecond. Then next item is internal
priority value means which ingress queue should put. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.
Base-time is not set will be 0 as default, as result start time would
be ((N + 1) * cycletime) which is the minimal of future time.

The example below filters a stream with destination mac address
10:00:80:00:00:00 and ip protocol ICMP, followed by the gate action.
The gate action runs with a single closed time slot, which means the
gate is always closed. The total cycle time is 200000000ns. The
base-time is calculated by:

1357000000000 + (N + 1) * cycletime

The first value of this expression that lies in the future becomes the
start time. The cycletime here is 200000000ns for this case.

> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry close 200000000 -1 -1 \
clockid CLOCK_TAI

Signed-off-by: Po Liu <[email protected]>
---
include/net/tc_act/tc_gate.h | 54 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 ++
net/sched/Kconfig | 13 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
6 files changed, 763 insertions(+)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
new file mode 100644
index 000000000000..b0ace55b2aaa
--- /dev/null
+++ b/include/net/tc_act/tc_gate.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright 2020 NXP */
+
+#ifndef __NET_TC_GATE_H
+#define __NET_TC_GATE_H
+
+#include <net/act_api.h>
+#include <linux/tc_act/tc_gate.h>
+
+struct tcfg_gate_entry {
+ int index;
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+ struct list_head list;
+};
+
+struct tcf_gate_params {
+ s32 tcfg_priority;
+ u64 tcfg_basetime;
+ u64 tcfg_cycletime;
+ u64 tcfg_cycletime_ext;
+ u32 tcfg_flags;
+ s32 tcfg_clockid;
+ size_t num_entries;
+ struct list_head entries;
+};
+
+#define GATE_ACT_GATE_OPEN BIT(0)
+#define GATE_ACT_PENDING BIT(1)
+struct gate_action {
+ struct tcf_gate_params param;
+ spinlock_t entry_lock;
+ u8 current_gate_status;
+ ktime_t current_close_time;
+ u32 current_entry_octets;
+ s32 current_max_octets;
+ struct tcfg_gate_entry __rcu *next_entry;
+ struct hrtimer hitimer;
+ enum tk_offsets tk_offset;
+ struct rcu_head rcu;
+};
+
+struct tcf_gate {
+ struct tc_action common;
+ struct gate_action __rcu *actg;
+};
+#define to_gate(a) ((struct tcf_gate *)a)
+
+#define get_gate_param(act) ((struct tcf_gate_params *)act)
+#define get_gate_action(p) ((struct gate_action *)p)
+
+#endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 9f06d29cab70..fc672b232437 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -134,6 +134,7 @@ enum tca_id {
TCA_ID_CTINFO,
TCA_ID_MPLS,
TCA_ID_CT,
+ TCA_ID_GATE,
/* other actions go here */
__TCA_ID_MAX = 255
};
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 000000000000..f214b3a6d44f
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index bfbefb7bff9d..1314549c7567 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -981,6 +981,19 @@ config NET_ACT_CT
To compile this code as a module, choose M here: the
module will be called act_ct.

+config NET_ACT_GATE
+ tristate "Frame gate entry list control tc action"
+ depends on NET_CLS_ACT
+ help
+ Say Y here to allow the ingress flow to be passed in specific time
+ slots and dropped in other time slots according to the gate entry
+ list. This simulates the IEEE 802.1Qci stream gate control
+ behavior.
+
+ If unsure, say N.
+ To compile this code as a module, choose M here: the
+ module will be called act_gate.
+
config NET_IFE_SKBMARK
tristate "Support to encoding decoding skb mark on IFE action"
depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 31c367a6cd09..66bbf9a98f9e 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
obj-$(CONFIG_NET_ACT_CT) += act_ct.o
+obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
new file mode 100644
index 000000000000..e932f402b4f1
--- /dev/null
+++ b/net/sched/act_gate.c
@@ -0,0 +1,647 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright 2020 NXP */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/act_api.h>
+#include <net/netlink.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>
+
+static unsigned int gate_net_id;
+static struct tc_action_ops act_gate_ops;
+
+static ktime_t gate_get_time(struct gate_action *gact)
+{
+ ktime_t mono = ktime_get();
+
+ switch (gact->tk_offset) {
+ case TK_OFFS_MAX:
+ return mono;
+ default:
+ return ktime_mono_to_any(mono, gact->tk_offset);
+ }
+
+ return KTIME_MAX;
+}
+
+static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
+{
+ struct tcf_gate_params *param = get_gate_param(gact);
+ ktime_t now, base, cycle;
+ u64 n;
+
+ base = ns_to_ktime(param->tcfg_basetime);
+ now = gate_get_time(gact);
+
+ if (ktime_after(base, now)) {
+ *start = base;
+ return 0;
+ }
+
+ cycle = param->tcfg_cycletime;
+
+ /* cycle time should not be zero */
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ *start = ktime_add_ns(base, (n + 1) * cycle);
+ return 0;
+}
+
+static void gate_start_timer(struct gate_action *gact, ktime_t start)
+{
+ ktime_t expires;
+
+ expires = hrtimer_get_expires(&gact->hitimer);
+ if (expires == 0)
+ expires = KTIME_MAX;
+
+ start = min_t(ktime_t, start, expires);
+
+ hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
+}
+
+static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
+{
+ struct gate_action *gact = container_of(timer, struct gate_action,
+ hitimer);
+ struct tcf_gate_params *p = get_gate_param(gact);
+ struct tcfg_gate_entry *next;
+ ktime_t close_time, now;
+
+ spin_lock(&gact->entry_lock);
+
+ next = rcu_dereference_protected(gact->next_entry,
+ lockdep_is_held(&gact->entry_lock));
+
+ /* cycle start, clear pending bit, clear total octets */
+ gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
+ gact->current_entry_octets = 0;
+ gact->current_max_octets = next->maxoctets;
+
+ gact->current_close_time = ktime_add_ns(gact->current_close_time,
+ next->interval);
+
+ close_time = gact->current_close_time;
+
+ if (list_is_last(&next->list, &p->entries))
+ next = list_first_entry(&p->entries,
+ struct tcfg_gate_entry, list);
+ else
+ next = list_next_entry(next, list);
+
+ now = gate_get_time(gact);
+
+ if (ktime_after(now, close_time)) {
+ ktime_t cycle, base;
+ u64 n;
+
+ cycle = p->tcfg_cycletime;
+ base = ns_to_ktime(p->tcfg_basetime);
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ close_time = ktime_add_ns(base, (n + 1) * cycle);
+ }
+
+ rcu_assign_pointer(gact->next_entry, next);
+ spin_unlock(&gact->entry_lock);
+
+ hrtimer_set_expires(&gact->hitimer, close_time);
+
+ return HRTIMER_RESTART;
+}
+
+static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
+ struct tcf_result *res)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct gate_action *gact;
+ int action;
+
+ tcf_lastuse_update(&g->tcf_tm);
+ bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
+
+ action = READ_ONCE(g->tcf_action);
+ rcu_read_lock();
+ gact = rcu_dereference_bh(g->actg);
+ if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
+ rcu_read_unlock();
+ return action;
+ }
+
+ if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
+ goto drop;
+
+ if (gact->current_max_octets >= 0) {
+ gact->current_entry_octets += qdisc_pkt_len(skb);
+ if (gact->current_entry_octets > gact->current_max_octets) {
+ qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
+ goto drop;
+ }
+ }
+ rcu_read_unlock();
+
+ return action;
+drop:
+ rcu_read_unlock();
+ qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
+ return TC_ACT_SHOT;
+}
+
+static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
+ [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
+ [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
+};
+
+static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
+ [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
+ .type = NLA_EXACT_LEN },
+ [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
+ [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
+ [TCA_GATE_FLAGS] = { .type = NLA_U32 },
+ [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
+};
+
+static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
+ struct netlink_ext_ack *extack)
+{
+ u32 interval = 0;
+
+ entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (interval == 0) {
+ NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
+ return -EINVAL;
+ }
+
+ entry->interval = interval;
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
+ else
+ entry->ipv = -1;
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+ else
+ entry->maxoctets = -1;
+
+ return 0;
+}
+
+static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
+ int index, struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
+ int err;
+
+ err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Could not parse nested entry");
+ return -EINVAL;
+ }
+
+ entry->index = index;
+
+ return fill_gate_entry(tb, entry, extack);
+}
+
+static int parse_gate_list(struct nlattr *list_attr,
+ struct tcf_gate_params *sched,
+ struct netlink_ext_ack *extack)
+{
+ struct tcfg_gate_entry *entry, *e;
+ struct nlattr *n;
+ int err, rem;
+ int i = 0;
+
+ if (!list_attr)
+ return -EINVAL;
+
+ nla_for_each_nested(n, list_attr, rem) {
+ if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
+ NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
+ continue;
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ NL_SET_ERR_MSG(extack, "Not enough memory for entry");
+ err = -ENOMEM;
+ goto release_list;
+ }
+
+ err = parse_gate_entry(n, entry, i, extack);
+ if (err < 0) {
+ kfree(entry);
+ goto release_list;
+ }
+
+ list_add_tail(&entry->list, &sched->entries);
+ i++;
+ }
+
+ sched->num_entries = i;
+
+ return i;
+
+release_list:
+ list_for_each_entry_safe(entry, e, &sched->entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+
+ return err;
+}
+
+static int tcf_gate_init(struct net *net, struct nlattr *nla,
+ struct nlattr *est, struct tc_action **a,
+ int ovr, int bind, bool rtnl_held,
+ struct tcf_proto *tp, u32 flags,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+ enum tk_offsets tk_offset = TK_OFFS_TAI;
+ struct nlattr *tb[TCA_GATE_MAX + 1];
+ struct tcf_chain *goto_ch = NULL;
+ struct tcfg_gate_entry *next;
+ struct tcf_gate_params *p;
+ struct gate_action *gact;
+ s32 clockid = CLOCK_TAI;
+ struct tc_gate *parm;
+ struct tcf_gate *g;
+ int ret = 0, err;
+ u64 basetime = 0;
+ u32 gflags = 0;
+ s32 prio = -1;
+ ktime_t start;
+ u32 index;
+
+ if (!nla)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_GATE_PARMS])
+ return -EINVAL;
+ parm = nla_data(tb[TCA_GATE_PARMS]);
+ index = parm->index;
+ err = tcf_idr_check_alloc(tn, &index, a, bind);
+ if (err < 0)
+ return err;
+
+ if (err && bind)
+ return 0;
+
+ if (!err) {
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_gate_ops, bind, flags);
+ if (ret) {
+ tcf_idr_cleanup(tn, index);
+ return ret;
+ }
+
+ ret = ACT_P_CREATED;
+ } else if (!ovr) {
+ tcf_idr_release(*a, bind);
+ return -EEXIST;
+ }
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (tb[TCA_GATE_BASE_TIME])
+ basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
+
+ if (tb[TCA_GATE_FLAGS])
+ gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
+
+ if (tb[TCA_GATE_CLOCKID]) {
+ clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ tk_offset = TK_OFFS_REAL;
+ break;
+ case CLOCK_MONOTONIC:
+ tk_offset = TK_OFFS_MAX;
+ break;
+ case CLOCK_BOOTTIME:
+ tk_offset = TK_OFFS_BOOT;
+ break;
+ case CLOCK_TAI:
+ tk_offset = TK_OFFS_TAI;
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+ goto release_idr;
+ }
+ }
+
+ err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+ if (err < 0)
+ goto release_idr;
+
+ g = to_gate(*a);
+
+ gact = kzalloc(sizeof(*gact), GFP_KERNEL);
+ if (!gact) {
+ err = -ENOMEM;
+ goto put_chain;
+ }
+
+ p = get_gate_param(gact);
+
+ INIT_LIST_HEAD(&p->entries);
+ if (tb[TCA_GATE_ENTRY_LIST]) {
+ err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
+ if (err < 0)
+ goto release_mem;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME]) {
+ p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
+ } else {
+ struct tcfg_gate_entry *entry;
+ ktime_t cycle = 0;
+
+ list_for_each_entry(entry, &p->entries, list)
+ cycle = ktime_add_ns(cycle, entry->interval);
+ p->tcfg_cycletime = cycle;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ p->tcfg_cycletime_ext =
+ nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ p->tcfg_priority = prio;
+ p->tcfg_basetime = basetime;
+ p->tcfg_clockid = clockid;
+ p->tcfg_flags = gflags;
+
+ gact->tk_offset = tk_offset;
+ spin_lock_init(&gact->entry_lock);
+ hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
+ gact->hitimer.function = gate_timer_func;
+
+ err = gate_get_start_time(gact, &start);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Internal error: failed get start time");
+ goto release_mem;
+ }
+
+ gact->current_close_time = start;
+ gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING;
+
+ next = list_first_entry(&p->entries, struct tcfg_gate_entry, list);
+ rcu_assign_pointer(gact->next_entry, next);
+
+ gate_start_timer(gact, start);
+
+ spin_lock_bh(&g->tcf_lock);
+ goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+ gact = rcu_replace_pointer(g->actg, gact,
+ lockdep_is_held(&g->tcf_lock));
+ spin_unlock_bh(&g->tcf_lock);
+
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+ if (gact)
+ kfree_rcu(gact, rcu);
+
+ if (ret == ACT_P_CREATED)
+ tcf_idr_insert(tn, *a);
+ return ret;
+
+release_mem:
+ kfree(gact);
+put_chain:
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+release_idr:
+ tcf_idr_release(*a, bind);
+ return err;
+}
+
+static void tcf_gate_cleanup(struct tc_action *a)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct tcfg_gate_entry *entry, *n;
+ struct tcf_gate_params *p;
+ struct gate_action *gact;
+
+ spin_lock_bh(&g->tcf_lock);
+ gact = rcu_dereference_protected(g->actg,
+ lockdep_is_held(&g->tcf_lock));
+ hrtimer_cancel(&gact->hitimer);
+
+ p = get_gate_param(gact);
+ list_for_each_entry_safe(entry, n, &p->entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ spin_unlock_bh(&g->tcf_lock);
+
+ kfree_rcu(gact, rcu);
+}
+
+static int dumping_entry(struct sk_buff *skb,
+ struct tcfg_gate_entry *entry)
+{
+ struct nlattr *item;
+
+ item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
+ if (!item)
+ return -ENOSPC;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
+ goto nla_put_failure;
+
+ if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, item);
+
+nla_put_failure:
+ nla_nest_cancel(skb, item);
+ return -1;
+}
+
+static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
+ int bind, int ref)
+{
+ unsigned char *b = skb_tail_pointer(skb);
+ struct tcf_gate *g = to_gate(a);
+ struct tc_gate opt = {
+ .index = g->tcf_index,
+ .refcnt = refcount_read(&g->tcf_refcnt) - ref,
+ .bindcnt = atomic_read(&g->tcf_bindcnt) - bind,
+ };
+ struct tcfg_gate_entry *entry;
+ struct gate_action *gact;
+ struct tcf_gate_params *p;
+ struct nlattr *entry_list;
+ struct tcf_t t;
+
+ spin_lock_bh(&g->tcf_lock);
+ opt.action = g->tcf_action;
+ gact = rcu_dereference_protected(g->actg,
+ lockdep_is_held(&g->tcf_lock));
+
+ p = get_gate_param(gact);
+
+ if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
+ p->tcfg_basetime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
+ p->tcfg_cycletime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
+ p->tcfg_cycletime_ext, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
+ goto nla_put_failure;
+
+ entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
+ if (!entry_list)
+ goto nla_put_failure;
+
+ list_for_each_entry(entry, &p->entries, list) {
+ if (dumping_entry(skb, entry) < 0)
+ goto nla_put_failure;
+ }
+
+ nla_nest_end(skb, entry_list);
+
+ tcf_tm_dump(&t, &g->tcf_tm);
+ if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
+ goto nla_put_failure;
+ spin_unlock_bh(&g->tcf_lock);
+
+ return skb->len;
+
+nla_put_failure:
+ spin_unlock_bh(&g->tcf_lock);
+ nlmsg_trim(skb, b);
+ return -1;
+}
+
+static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
+ struct netlink_callback *cb, int type,
+ const struct tc_action_ops *ops,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+ u64 lastuse, bool hw)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct tcf_t *tm = &g->tcf_tm;
+
+ tcf_action_update_stats(a, bytes, packets, false, hw);
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
+}
+
+static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_idr_search(tn, a, index);
+}
+
+static size_t tcf_gate_get_fill_size(const struct tc_action *act)
+{
+ return nla_total_size(sizeof(struct tc_gate));
+}
+
+static struct tc_action_ops act_gate_ops = {
+ .kind = "gate",
+ .id = TCA_ID_GATE,
+ .owner = THIS_MODULE,
+ .act = tcf_gate_act,
+ .dump = tcf_gate_dump,
+ .init = tcf_gate_init,
+ .cleanup = tcf_gate_cleanup,
+ .walk = tcf_gate_walker,
+ .stats_update = tcf_gate_stats_update,
+ .get_fill_size = tcf_gate_get_fill_size,
+ .lookup = tcf_gate_search,
+ .size = sizeof(struct gate_action),
+};
+
+static __net_init int gate_init_net(struct net *net)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tc_action_net_init(net, tn, &act_gate_ops);
+}
+
+static void __net_exit gate_exit_net(struct list_head *net_list)
+{
+ tc_action_net_exit(net_list, gate_net_id);
+}
+
+static struct pernet_operations gate_net_ops = {
+ .init = gate_init_net,
+ .exit_batch = gate_exit_net,
+ .id = &gate_net_id,
+ .size = sizeof(struct tc_action_net),
+};
+
+static int __init gate_init_module(void)
+{
+ return tcf_register_action(&act_gate_ops, &gate_net_ops);
+}
+
+static void __exit gate_cleanup_module(void)
+{
+ tcf_unregister_action(&act_gate_ops, &gate_net_ops);
+}
+
+module_init(gate_init_module);
+module_exit(gate_cleanup_module);
+MODULE_LICENSE("GPL v2");
--
2.17.1

2020-04-18 01:36:51

by Po Liu

[permalink] [raw]
Subject: [ v2,net-next 2/4] net: schedule: add action gate offloading

Add the gate action to the flow action entry. Add the gate parameters in
tc_setup_flow_action(), filling them into the entries of the
flow_action_entry array provided to the driver.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 +++
include/net/tc_act/tc_gate.h | 115 +++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++
3 files changed, 158 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 3619c6acf60f..94a30fe02e6d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -255,6 +256,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index b0ace55b2aaa..62633cb02c7a 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -51,4 +58,112 @@ struct tcf_gate {
#define get_gate_param(act) ((struct tcf_gate_params *)act)
#define get_gate_action(p) ((struct gate_action *)p)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ rcu_read_lock();
+ tcfg_prio = rcu_dereference(to_gate(a)->actg)->param.tcfg_priority;
+ rcu_read_unlock();
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ rcu_read_lock();
+ tcfg_basetime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_basetime;
+ rcu_read_unlock();
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ rcu_read_lock();
+ tcfg_cycletime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime;
+ rcu_read_unlock();
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ rcu_read_lock();
+ tcfg_cycletimeext =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime_ext;
+ rcu_read_unlock();
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ rcu_read_lock();
+ num_entries =
+ rcu_dereference(to_gate(a)->actg)->param.num_entries;
+ rcu_read_unlock();
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ rcu_read_lock();
+ p = &(rcu_dereference(to_gate(a)->actg)->param);
+ num_entries = p->num_entries;
+ rcu_read_unlock();
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries)
+ return NULL;
+
+ oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
+ if (!oe)
+ return NULL;
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index f6a3b969ead0..c8de5a887230 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,6 +39,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3522,6 +3523,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3668,6 +3690,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_priority(act)) {
entry->id = FLOW_ACTION_PRIORITY;
entry->priority = tcf_skbedit_priority(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-04-18 01:37:27

by Po Liu

Subject: [v2,net-next 3/4] net: enetc: add hw tc offload features for PSFP capability

This patch lets ethtool enable/disable the tc flower offload
features. The ENETC hardware supports PSFP, which provides per-stream
policing. When the tc hw offload feature is enabled, the driver
enables the IEEE 802.1Qci feature. This only sets the register enable
bit for the feature and reads the capability limits of each
sub-feature; it does not program any per-stream filtering, stream gate
or stream identification entries.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 48 +++++++++++++++++++
.../net/ethernet/freescale/enetc/enetc_hw.h | 17 +++++++
.../net/ethernet/freescale/enetc/enetc_pf.c | 8 ++++
4 files changed, 96 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index ccf2611f4a20..04aac7cbb506 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -756,6 +756,9 @@ void enetc_get_si_caps(struct enetc_si *si)

if (val & ENETC_SIPCAPR0_QBV)
si->hw_features |= ENETC_SI_F_QBV;
+
+ if (val & ENETC_SIPCAPR0_PSFP)
+ si->hw_features |= ENETC_SI_F_PSFP;
}

static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
@@ -1567,6 +1570,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
return 0;
}

+static int enetc_set_psfp(struct net_device *ndev, int en)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ if (en) {
+ priv->active_offloads |= ENETC_F_QCI;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&priv->si->hw);
+ } else {
+ priv->active_offloads &= ~ENETC_F_QCI;
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+ enetc_psfp_disable(&priv->si->hw);
+ }
+
+ return 0;
+}
+
int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1575,6 +1595,9 @@ int enetc_set_features(struct net_device *ndev,
if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

+ if (changed & NETIF_F_HW_TC)
+ enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 56c43f35b633..2cfe877c3778 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -151,6 +151,7 @@ enum enetc_errata {
};

#define ENETC_SI_F_QBV BIT(0)
+#define ENETC_SI_F_PSFP BIT(1)

/* PCI IEP device data */
struct enetc_si {
@@ -203,12 +204,20 @@ struct enetc_cls_rule {
};

#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */
+struct psfp_cap {
+ u32 max_streamid;
+ u32 max_psfp_filter;
+ u32 max_psfp_gate;
+ u32 max_psfp_gatelist;
+ u32 max_psfp_meter;
+};

/* TODO: more hardware offloads */
enum enetc_active_offloads {
ENETC_F_RX_TSTAMP = BIT(0),
ENETC_F_TX_TSTAMP = BIT(1),
ENETC_F_QBV = BIT(2),
+ ENETC_F_QCI = BIT(3),
};

struct enetc_ndev_priv {
@@ -231,6 +240,8 @@ struct enetc_ndev_priv {

struct enetc_cls_rule *cls_rules;

+ struct psfp_cap psfp_cap;
+
struct device_node *phy_node;
phy_interface_t if_mode;
};
@@ -289,9 +300,46 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+
+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
+{
+ u32 reg;
+
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
+ /* Port stream filter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
+ /* Port stream gate capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
+ /* Port flow meter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
+}
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
+ ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
+ ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
+ ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
+ ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+}
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_get_max_cap(p) \
+ memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
+
+#define enetc_psfp_enable(hw) (void)0
+#define enetc_psfp_disable(hw) (void)0
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 2a6523136947..587974862f48 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -19,6 +19,7 @@
#define ENETC_SICTR1 0x1c
#define ENETC_SIPCAPR0 0x20
#define ENETC_SIPCAPR0_QBV BIT(4)
+#define ENETC_SIPCAPR0_PSFP BIT(9)
#define ENETC_SIPCAPR0_RSS BIT(8)
#define ENETC_SIPCAPR1 0x24
#define ENETC_SITGTGR 0x30
@@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11))
#define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1))
#define ENETC_PM0_IFM_XGMII BIT(12)
+#define ENETC_PSIDCAPR 0x1b08
+#define ENETC_PSIDCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSFCAPR 0x1b18
+#define ENETC_PSFCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSGCAPR 0x1b28
+#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16)
+#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0)
+#define ENETC_PFMCAPR 0x1b38
+#define ENETC_PFMCAPR_MSK GENMASK(15, 0)

/* MAC counters */
#define ENETC_PM0_REOCT 0x8100
@@ -621,3 +631,10 @@ struct enetc_cbd {
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
#define ENETC_TSDE BIT(31)
+
+/* PSFP setting */
+#define ENETC_PPSFPMR 0x11b00
+#define ENETC_PPSFPMR_PSFPEN BIT(0)
+#define ENETC_PPSFPMR_VS BIT(1)
+#define ENETC_PPSFPMR_PVC BIT(2)
+#define ENETC_PPSFPMR_PVZC BIT(3)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index 85e2b741df41..eacd597b55f2 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -739,6 +739,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

+ if (si->hw_features & ENETC_SI_F_PSFP) {
+ priv->active_offloads |= ENETC_F_QCI;
+ ndev->features |= NETIF_F_HW_TC;
+ ndev->hw_features |= NETIF_F_HW_TC;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&si->hw);
+ }
+
/* pick up primary MAC address from SI */
enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
}
--
2.17.1

2020-04-18 01:38:19

by Po Liu

Subject: [v2,net-next 4/4] net: enetc: add tc flower psfp offload driver

This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
function. Four main feature blocks implement flow policing and
filtering for ingress flows with IEEE 802.1Qci: stream identification
(defined in P802.1CB exactly, but required by 802.1Qci), stream
filtering, stream gating and flow metering.
Each function block contains many entries, selected by index, that
hold the parameters. A frame is first matched by the stream
identification block, then passed to the stream filter block via the
handle shared between the two blocks. From there it flows into the
stream gate assigned by the stream filter entry, where it is policed
by the gate and optionally limited by the max SDU configured in the
filter entry. Finally it is policed by the flow meter block, whose
index is also chosen in the filter entry.
An entry in one block may therefore be linked from many upper-level
entries: assigning the same index lets multiple streams share the same
stream filter, stream gate or flow meter.
To implement these features, each stream matched by source/destination
MAC address (some streams additionally by VLAN ID) is treated as one
flow chain, identified by the chain_index that already exists in the
tc filter concept. The driver maintains this chain together with the
gate modules. A stream filter entry is created from the gate index,
the optional flow meter entry id and a priority value. Offloading only
transfers the gate action and the flow filtering parameters; the
driver creates (or looks up an existing entry with the same gate id,
flow meter id and priority) one stream filter entry and programs it
into the hardware, so stream filtering does not need to be transferred
via action offloading.
This architecture matches the tc filter/action relationship: tc
filters maintain a list per flow keyed by match keys, while actions
are maintained in the action list.

Below is an example tc command sequence:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry OPEN 200000000 1 8000000 \
sched-entry CLOSE 100000000 -1 -1

These commands assign dst_mac 10:00:80:00:00:00 to index 11 of the
stream identification module, and set gate index 10 in the stream gate
module. The gate list keeps the OPEN state for 200ms, passing frames
to ingress queue 1 with at most 8 Mbytes of octets, then keeps the
CLOSE state for 100ms.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 46 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
.../net/ethernet/freescale/enetc/enetc_qos.c | 1070 +++++++++++++++++
5 files changed, 1272 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 04aac7cbb506..298c55786fd9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1521,6 +1521,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
return enetc_setup_tc_cbs(ndev, type_data);
case TC_SETUP_QDISC_ETF:
return enetc_setup_tc_txtime(ndev, type_data);
+ case TC_SETUP_BLOCK:
+ return enetc_setup_tc_psfp(ndev, type_data);
default:
return -EOPNOTSUPP;
}
@@ -1573,17 +1575,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static int enetc_set_psfp(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ int err;

if (en) {
+ err = enetc_psfp_enable(priv);
+ if (err)
+ return err;
+
priv->active_offloads |= ENETC_F_QCI;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&priv->si->hw);
- } else {
- priv->active_offloads &= ~ENETC_F_QCI;
- memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
- enetc_psfp_disable(&priv->si->hw);
+ return 0;
}

+ err = enetc_psfp_disable(priv);
+ if (err)
+ return err;
+
+ priv->active_offloads &= ~ENETC_F_QCI;
+
return 0;
}

@@ -1591,14 +1599,15 @@ int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
netdev_features_t changed = ndev->features ^ features;
+ int err = 0;

if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

if (changed & NETIF_F_HW_TC)
- enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+ err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));

- return 0;
+ return err;
}

#ifdef CONFIG_FSL_ENETC_PTP_CLOCK
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 2cfe877c3778..b705464f6882 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -300,6 +300,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv);
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
+int enetc_psfp_init(struct enetc_ndev_priv *priv);
+int enetc_psfp_clean(struct enetc_ndev_priv *priv);

static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
@@ -319,27 +324,60 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ enetc_get_max_cap(priv);
+
+ err = enetc_psfp_init(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+
+ return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ err = enetc_psfp_clean(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+
+ return 0;
}
+
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_block_cb NULL
+
#define enetc_get_max_cap(p) \
memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))

-#define enetc_psfp_enable(hw) (void)0
-#define enetc_psfp_disable(hw) (void)0
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
+
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 587974862f48..6314051bc6c1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -567,6 +567,9 @@ enum bdcr_cmd_class {
BDCR_CMD_RFS,
BDCR_CMD_PORT_GCL,
BDCR_CMD_RECV_CLASSIFIER,
+ BDCR_CMD_STREAM_IDENTIFY,
+ BDCR_CMD_STREAM_FILTER,
+ BDCR_CMD_STREAM_GCL,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -598,13 +601,152 @@ struct tgs_gcl_data {
struct gce entry[];
};

+/* class 7, command 0, Stream Identity Entry Configuration */
+struct streamid_conf {
+ __le32 stream_handle; /* init gate value */
+ __le32 iports;
+ u8 id_type;
+ u8 oui[3];
+ u8 res[3];
+ u8 en;
+};
+
+#define ENETC_CBDR_SID_VID_MASK 0xfff
+#define ENETC_CBDR_SID_VIDM BIT(12)
+#define ENETC_CBDR_SID_TG_MASK 0xc000
+/* streamid_conf address point to this data space */
+struct streamid_data {
+ union {
+ u8 dmac[6];
+ u8 smac[6];
+ };
+ u16 vid_vidm_tg;
+};
+
+#define ENETC_CBDR_SFI_PRI_MASK 0x7
+#define ENETC_CBDR_SFI_PRIM BIT(3)
+#define ENETC_CBDR_SFI_BLOV BIT(4)
+#define ENETC_CBDR_SFI_BLEN BIT(5)
+#define ENETC_CBDR_SFI_MSDUEN BIT(6)
+#define ENETC_CBDR_SFI_FMITEN BIT(7)
+#define ENETC_CBDR_SFI_ENABLE BIT(7)
+/* class 8, command 0, Stream Filter Instance, Short Format */
+struct sfi_conf {
+ __le32 stream_handle;
+ u8 multi;
+ u8 res[2];
+ u8 sthm;
+ /* Max Service Data Unit or Flow Meter Instance Table index.
+ * Depending on the value of FLT this represents either Max
+ * Service Data Unit (max frame size) allowed by the filter
+ * entry or is an index into the Flow Meter Instance table
+ * index identifying the policer which will be used to police
+ * it.
+ */
+ __le16 fm_inst_table_index;
+ __le16 msdu;
+ __le16 sg_inst_table_index;
+ u8 res1[2];
+ __le32 input_ports;
+ u8 res2[3];
+ u8 en;
+};
+
+/* class 8, command 2 stream Filter Instance status query short format
+ * command no need structure define
+ * Stream Filter Instance Query Statistics Response data
+ */
+struct sfi_counter_data {
+ u32 matchl;
+ u32 matchh;
+ u32 msdu_dropl;
+ u32 msdu_droph;
+ u32 stream_gate_dropl;
+ u32 stream_gate_droph;
+ u32 flow_meter_dropl;
+ u32 flow_meter_droph;
+};
+
+#define ENETC_CBDR_SGI_OIPV_MASK 0x7
+#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_CGTST BIT(6)
+#define ENETC_CBDR_SGI_OGTST BIT(7)
+#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
+#define ENETC_CBDR_SGI_CFG_PND BIT(2)
+#define ENETC_CBDR_SGI_OEX BIT(4)
+#define ENETC_CBDR_SGI_OEXEN BIT(5)
+#define ENETC_CBDR_SGI_IRX BIT(6)
+#define ENETC_CBDR_SGI_IRXEN BIT(7)
+#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
+#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
+#define ENETC_CBDR_SGI_EN BIT(7)
+/* class 9, command 0, Stream Gate Instance Table, Short Format
+ * class 9, command 2, Stream Gate Instance Table entry query write back
+ * Short Format
+ */
+struct sgi_table {
+ u8 res[8];
+ u8 oipv;
+ u8 res0[2];
+ u8 ocgtst;
+ u8 res1[7];
+ u8 gset;
+ u8 oacl_len;
+ u8 res2[2];
+ u8 en;
+};
+
+#define ENETC_CBDR_SGI_AIPV_MASK 0x7
+#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_AGTST BIT(7)
+
+/* class 9, command 1, Stream Gate Control List, Long Format */
+struct sgcl_conf {
+ u8 aipv;
+ u8 res[2];
+ u8 agtst;
+ u8 res1[4];
+ union {
+ struct {
+ u8 res2[4];
+ u8 acl_len;
+ u8 res3[3];
+ };
+ u8 cct[8]; /* Config change time */
+ };
+};
+
+#define ENETC_CBDR_SGL_IOMEN BIT(0)
+#define ENETC_CBDR_SGL_IPVEN BIT(3)
+#define ENETC_CBDR_SGL_GTST BIT(4)
+#define ENETC_CBDR_SGL_IPV_MASK 0xe
+/* Stream Gate Control List Entry */
+struct sgce {
+ u32 interval;
+ u8 msdu[3];
+ u8 multi;
+};
+
+/* stream control list class 9 , cmd 1 data buffer */
+struct sgcl_data {
+ u32 btl;
+ u32 bth;
+ u32 ct;
+ u32 cte;
+ struct sgce sgcl[0];
+};
+
struct enetc_cbd {
union{
+ struct sfi_conf sfi_conf;
+ struct sgi_table sgi_table;
struct {
__le32 addr[2];
union {
__le32 opt[4];
struct tgs_gcl_conf gcl_conf;
+ struct streamid_conf sid_set;
+ struct sgcl_conf sgcl_conf;
};
}; /* Long format */
__le32 data[6];
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index eacd597b55f2..d06fce0dcd8a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -739,12 +739,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

- if (si->hw_features & ENETC_SI_F_PSFP) {
+ if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
priv->active_offloads |= ENETC_F_QCI;
ndev->features |= NETIF_F_HW_TC;
ndev->hw_features |= NETIF_F_HW_TC;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&si->hw);
}

/* pick up primary MAC address from SI */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 0c6bf3a55a9a..7944c243903c 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -5,6 +5,9 @@

#include <net/pkt_sched.h>
#include <linux/math64.h>
+#include <linux/refcount.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>

static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
@@ -331,3 +334,1070 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)

return 0;
}
+
+enum streamid_type {
+ STREAMID_TYPE_RESERVED = 0,
+ STREAMID_TYPE_NULL,
+ STREAMID_TYPE_SMAC,
+};
+
+enum streamid_vlan_tagged {
+ STREAMID_VLAN_RESERVED = 0,
+ STREAMID_VLAN_TAGGED,
+ STREAMID_VLAN_UNTAGGED,
+ STREAMID_VLAN_ALL,
+};
+
+#define ENETC_PSFP_WILDCARD -1
+#define HANDLE_OFFSET 100
+
+enum forward_type {
+ FILTER_ACTION_TYPE_PSFP = BIT(0),
+ FILTER_ACTION_TYPE_ACL = BIT(1),
+ FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
+};
+
+/* This is for limit output type for input actions */
+struct actions_fwd {
+ u64 actions;
+ u64 keys; /* include the must needed keys */
+ enum forward_type output;
+};
+
+struct psfp_streamfilter_counters {
+ u64 matching_frames_count;
+ u64 passing_frames_count;
+ u64 not_passing_frames_count;
+ u64 passing_sdu_count;
+ u64 not_passing_sdu_count;
+ u64 red_frames_count;
+};
+
+struct enetc_streamid {
+ u32 index;
+ union {
+ u8 src_mac[6];
+ u8 dst_mac[6];
+ };
+ u8 filtertype;
+ u16 vid;
+ u8 tagged;
+ s32 handle;
+};
+
+struct enetc_psfp_filter {
+ u32 index;
+ s32 handle;
+ s8 prio;
+ u32 gate_id;
+ s32 meter_id;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+struct enetc_psfp_gate {
+ u32 index;
+ s8 init_ipv;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimext;
+ u32 num_entries;
+ refcount_t refcount;
+ struct hlist_node node;
+ struct action_gate_entry entries[0];
+};
+
+struct enetc_stream_filter {
+ struct enetc_streamid sid;
+ u32 sfi_index;
+ u32 sgi_index;
+ struct flow_stats stats;
+ struct hlist_node node;
+};
+
+struct enetc_psfp {
+ unsigned long dev_bitmap;
+ unsigned long *psfp_sfi_bitmap;
+ struct hlist_head stream_list;
+ struct hlist_head psfp_filter_list;
+ struct hlist_head psfp_gate_list;
+ spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
+};
+
+struct actions_fwd enetc_act_fwd[] = {
+ {
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
+ /* example for ACL actions */
+ {
+ BIT(FLOW_ACTION_DROP),
+ 0,
+ FILTER_ACTION_TYPE_ACL
+ }
+};
+
+static struct enetc_psfp epsfp = {
+ .psfp_sfi_bitmap = NULL,
+};
+
+static LIST_HEAD(enetc_block_cb_list);
+
+static inline int enetc_get_port(struct enetc_ndev_priv *priv)
+{
+ return priv->si->pdev->devfn & 0x7;
+}
+
+/* Stream Identity Entry Set Descriptor */
+static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct streamid_data *si_data;
+ struct streamid_conf *si_conf;
+ u16 data_size;
+ dma_addr_t dma;
+ int err;
+
+ if (sid->index >= priv->psfp_cap.max_streamid)
+ return -EINVAL;
+
+ if (sid->filtertype != STREAMID_TYPE_NULL &&
+ sid->filtertype != STREAMID_TYPE_SMAC)
+ return -EOPNOTSUPP;
+
+ /* Disable operation before enable */
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct streamid_data);
+ si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev, si_data,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(si_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+ memset(si_data->dmac, 0xff, ETH_ALEN);
+ si_data->vid_vidm_tg =
+ cpu_to_le16(ENETC_CBDR_SID_VID_MASK
+ + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
+
+ si_conf = &cbd.sid_set;
+ /* Only one port supported for one entry, set itself */
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = 1;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ if (!enable) {
+ kfree(si_data);
+ return 0;
+ }
+
+ /* Enable the entry overwrite again incase space flushed by hardware */
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ si_conf->en = 0x80;
+ si_conf->stream_handle = cpu_to_le32(sid->handle);
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = sid->filtertype;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ memset(si_data, 0, data_size);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ /* VIDM default to be 1.
+ * VID Match. If set (b1) then the VID must match, otherwise
+ * any VID is considered a match. VIDM setting is only used
+ * when TG is set to b01.
+ */
+ if (si_conf->id_type == STREAMID_TYPE_NULL) {
+ ether_addr_copy(si_data->dmac, sid->dst_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
+ ether_addr_copy(si_data->smac, sid->src_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ kfree(si_data);
+
+ return err;
+}
+
+/* Stream Filter Instance Set Descriptor */
+static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_filter *sfi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct sfi_conf *sfi_config;
+
+ cbd.index = cpu_to_le16(sfi->index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0x80;
+ cbd.length = cpu_to_le16(1);
+
+ sfi_config = &cbd.sfi_conf;
+ if (!enable)
+ goto exit;
+
+ sfi_config->en = 0x80;
+
+ if (sfi->handle >= 0) {
+ sfi_config->stream_handle =
+ cpu_to_le32(sfi->handle);
+ sfi_config->sthm |= 0x80;
+ }
+
+ sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
+ sfi_config->input_ports = 1 << enetc_get_port(priv);
+
+ /* The priority value which may be matched against the
+ * frame’s priority value to determine a match for this entry.
+ */
+ if (sfi->prio >= 0)
+ sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
+
+ /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
+ * field as being either an MSDU value or an index into the Flow
+ * Meter Instance table.
+ * TODO: no limit max sdu
+ */
+
+ if (sfi->meter_id >= 0) {
+ sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
+ sfi_config->multi |= 0x80;
+ }
+
+exit:
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
+static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
+ u32 index,
+ struct psfp_streamfilter_counters *cnt)
+{
+ struct enetc_cbd cbd = { .cmd = 2 };
+ struct sfi_counter_data *data_buf;
+ dma_addr_t dma;
+ u16 data_size;
+ int err;
+
+ cbd.index = cpu_to_le16((u16)index);
+ cbd.cmd = 2;
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct sfi_counter_data);
+ data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ dma = dma_map_single(&priv->si->pdev->dev, data_buf,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ err = -ENOMEM;
+ goto exit;
+ }
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ goto exit;
+
+ cnt->matching_frames_count =
+ ((u64)le32_to_cpu(data_buf->matchh) << 32)
+ + data_buf->matchl;
+
+ cnt->not_passing_sdu_count =
+ ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
+ + data_buf->msdu_dropl;
+
+ cnt->passing_sdu_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count;
+
+ cnt->not_passing_frames_count =
+ ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
+ + le32_to_cpu(data_buf->stream_gate_dropl);
+
+ cnt->passing_frames_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count
+ - cnt->not_passing_frames_count;
+
+ cnt->red_frames_count =
+ ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
+ + le32_to_cpu(data_buf->flow_meter_dropl);
+
+exit:
+ kfree(data_buf);
+ return err;
+}
+
+static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
+{
+ u64 now_lo, now_hi, now, n;
+
+ now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
+ now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
+ now = now_lo | now_hi << 32;
+
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(now, cycle);
+
+ *start = (n + 1) * cycle;
+
+ return 0;
+}
+
+/* Stream Gate Instance Set Descriptor */
+static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_gate *sgi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct sgi_table *sgi_config;
+ struct sgcl_conf *sgcl_config;
+ struct sgcl_data *sgcl_data;
+ struct sgce *sgce;
+ dma_addr_t dma;
+ u16 data_size;
+ int err, i;
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0x80;
+
+ /* disable */
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ if (!sgi->num_entries)
+ return 0;
+
+ if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
+ !sgi->cycletime)
+ return -EINVAL;
+
+ /* enable */
+ sgi_config = &cbd.sgi_table;
+
+ /* Keep open before gate list start */
+ sgi_config->ocgtst = 0x80;
+
+ sgi_config->oipv = (sgi->init_ipv < 0) ?
+ 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
+
+ sgi_config->en = 0x80;
+
+ /* Basic config */
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 1;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0;
+
+ sgcl_config = &cbd.sgcl_conf;
+
+ sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
+
+ data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
+
+ sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!sgcl_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev,
+ sgcl_data, data_size,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(sgcl_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ sgce = &sgcl_data->sgcl[0];
+
+ sgcl_config->agtst = 0x80;
+
+ sgcl_data->ct = cpu_to_le32(sgi->cycletime);
+ sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
+
+ if (sgi->init_ipv >= 0)
+ sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
+
+ for (i = 0; i < sgi->num_entries; i++) {
+ struct action_gate_entry *from = &sgi->entries[i];
+ struct sgce *to = &sgce[i];
+
+ if (from->gate_state)
+ to->multi |= 0x10;
+
+ if (from->ipv >= 0)
+ to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
+
+ if (from->maxoctets)
+ to->multi |= 0x01;
+
+ to->interval = cpu_to_le32(from->interval);
+ to->msdu[0] = from->maxoctets & 0xFF;
+ to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
+ to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
+ }
+
+ /* If basetime is 0, calculate start time */
+ if (!sgi->basetime) {
+ u64 start;
+
+ err = get_start_ns(priv, sgi->cycletime, &start);
+ if (err)
+ goto exit;
+ sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
+ sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
+ } else {
+ u32 hi, lo;
+
+ hi = upper_32_bits(sgi->basetime);
+ lo = lower_32_bits(sgi->basetime);
+ sgcl_data->bth = cpu_to_le32(hi);
+ sgcl_data->btl = cpu_to_le32(lo);
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+
+exit:
+ kfree(sgcl_data);
+
+ return err;
+}
+
+static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
+{
+ struct enetc_stream_filter *f;
+
+ hlist_for_each_entry(f, &epsfp.stream_list, node)
+ if (f->sid.index == index)
+ return f;
+
+ return NULL;
+}
+
+static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
+{
+ struct enetc_psfp_gate *g;
+
+ hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
+ if (g->index == index)
+ return g;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->index == index)
+ return s;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter
+ *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->gate_id == sfi->gate_id &&
+ s->prio == sfi->prio &&
+ s->meter_id == sfi->meter_id)
+ return s;
+
+ return NULL;
+}
+
+static int enetc_get_free_index(struct enetc_ndev_priv *priv)
+{
+ u32 max_size = priv->psfp_cap.max_psfp_filter;
+ unsigned long index;
+
+ index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
+ if (index == max_size)
+ return -1;
+
+ return index;
+}
+
+static void stream_filter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_filter *sfi;
+
+ sfi = enetc_get_filter_by_index(index);
+ if (WARN_ON(!sfi))
+ return;
+
+ if (refcount_dec_and_test(&sfi->refcount)) {
+ enetc_streamfilter_hw_set(priv, sfi, false);
+ hlist_del(&sfi->node);
+ clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ kfree(sfi);
+ }
+}
+
+static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_gate *sgi;
+
+ sgi = enetc_get_gate_by_index(index);
+ if (WARN_ON(!sgi))
+ return;
+
+ if (refcount_dec_and_test(&sgi->refcount)) {
+ enetc_streamgate_hw_set(priv, sgi, false);
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void remove_one_chain(struct enetc_ndev_priv *priv,
+ struct enetc_stream_filter *filter)
+{
+ stream_gate_unref(priv, filter->sgi_index);
+ stream_filter_unref(priv, filter->sfi_index);
+
+ hlist_del(&filter->node);
+ kfree(filter);
+}
+
+static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ struct enetc_psfp_filter *sfi,
+ struct enetc_psfp_gate *sgi)
+{
+ int err;
+
+ err = enetc_streamid_hw_set(priv, sid, true);
+ if (err)
+ return err;
+
+ if (sfi) {
+ err = enetc_streamfilter_hw_set(priv, sfi, true);
+ if (err)
+ goto revert_sid;
+ }
+
+ err = enetc_streamgate_hw_set(priv, sgi, true);
+ if (err)
+ goto revert_sfi;
+
+ return 0;
+
+revert_sfi:
+ if (sfi)
+ enetc_streamfilter_hw_set(priv, sfi, false);
+revert_sid:
+ enetc_streamid_hw_set(priv, sid, false);
+ return err;
+}
+
+struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
+ if (acts == enetc_act_fwd[i].actions &&
+ inputkeys & enetc_act_fwd[i].keys)
+ return &enetc_act_fwd[i];
+
+ return NULL;
+}
+
+static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_filter *sfi, *old_sfi;
+ struct enetc_psfp_gate *sgi, *old_sgi;
+ struct flow_action_entry *entry;
+ struct action_gate_entry *e;
+ u8 sfi_overwrite = 0;
+ int entries_size;
+ int i, err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No stream identification resource!");
+ return -ENOSPC;
+ }
+
+ flow_action_for_each(i, entry, &rule->action)
+ if (entry->id == FLOW_ACTION_GATE)
+ break;
+
+ if (entry->id != FLOW_ACTION_GATE)
+ return -EINVAL;
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ filter->sid.index = f->common.chain_index;
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_match_eth_addrs match;
+
+ flow_rule_match_eth_addrs(rule, &match);
+
+ if (!is_zero_ether_addr(match.mask->dst)) {
+ ether_addr_copy(filter->sid.dst_mac, match.key->dst);
+ filter->sid.filtertype = STREAMID_TYPE_NULL;
+ }
+
+ if (!is_zero_ether_addr(match.mask->src)) {
+ ether_addr_copy(filter->sid.src_mac, match.key->src);
+ filter->sid.filtertype = STREAMID_TYPE_SMAC;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported, must include ETH_ADDRS");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_match_vlan match;
+
+ flow_rule_match_vlan(rule, &match);
+ if (match.mask->vlan_priority) {
+ if (match.mask->vlan_priority !=
+ (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ }
+
+ if (match.mask->vlan_id) {
+ if (match.mask->vlan_id != VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ filter->sid.vid = match.key->vlan_id;
+ if (!filter->sid.vid)
+ filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
+ else
+ filter->sid.tagged = STREAMID_VLAN_TAGGED;
+ }
+ } else {
+ filter->sid.tagged = STREAMID_VLAN_ALL;
+ }
+
+ /* parsing gate action */
+ if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ sgi = kzalloc(entries_size, GFP_KERNEL);
+ if (!sgi) {
+ err = -ENOMEM;
+ goto free_filter;
+ }
+
+ refcount_set(&sgi->refcount, 1);
+ sgi->index = entry->gate.index;
+ sgi->init_ipv = entry->gate.prio;
+ sgi->basetime = entry->gate.basetime;
+ sgi->cycletime = entry->gate.cycletime;
+ sgi->num_entries = entry->gate.num_entries;
+
+ e = sgi->entries;
+ for (i = 0; i < entry->gate.num_entries; i++) {
+ e[i].gate_state = entry->gate.entries[i].gate_state;
+ e[i].interval = entry->gate.entries[i].interval;
+ e[i].ipv = entry->gate.entries[i].ipv;
+ e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ }
+
+ filter->sgi_index = sgi->index;
+
+ sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
+ if (!sfi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+
+ refcount_set(&sfi->refcount, 1);
+ sfi->gate_id = sgi->index;
+
+ /* Flow meter is not supported yet */
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+
+ /* The priority is taken from the tc filter's priority */
+ if (f->common.prio && f->common.prio <= BIT(3))
+ sfi->prio = f->common.prio - 1;
+ else
+ sfi->prio = ENETC_PSFP_WILDCARD;
+
+ old_sfi = enetc_psfp_check_sfi(sfi);
+ if (!old_sfi) {
+ int index;
+
+ index = enetc_get_free_index(priv);
+ if (index < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
+ err = -ENOSPC;
+ goto free_sfi;
+ }
+
+ sfi->index = index;
+ sfi->handle = index + HANDLE_OFFSET;
+ /* Update the stream filter handle also */
+ filter->sid.handle = sfi->handle;
+ filter->sfi_index = sfi->index;
+ sfi_overwrite = 0;
+ } else {
+ filter->sfi_index = old_sfi->index;
+ filter->sid.handle = old_sfi->handle;
+ sfi_overwrite = 1;
+ }
+
+ err = enetc_psfp_hw_set(priv, &filter->sid,
+ sfi_overwrite ? NULL : sfi, sgi);
+ if (err)
+ goto free_sfi;
+
+ spin_lock(&epsfp.psfp_lock);
+ /* Remove the old node if it exists and replace it with the new one */
+ old_sgi = enetc_get_gate_by_index(filter->sgi_index);
+ if (old_sgi) {
+ refcount_set(&sgi->refcount,
+ refcount_read(&old_sgi->refcount) + 1);
+ hlist_del(&old_sgi->node);
+ kfree(old_sgi);
+ }
+
+ hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
+
+ if (!old_sfi) {
+ hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
+ set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ } else {
+ kfree(sfi);
+ refcount_inc(&old_sfi->refcount);
+ }
+
+ old_filter = enetc_get_stream_by_index(filter->sid.index);
+ if (old_filter)
+ remove_one_chain(priv, old_filter);
+
+ filter->stats.lastused = jiffies;
+ hlist_add_head(&filter->node, &epsfp.stream_list);
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ return 0;
+
+free_sfi:
+ kfree(sfi);
+free_gate:
+ kfree(sgi);
+free_filter:
+ kfree(filter);
+
+ return err;
+}
+
+static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct flow_dissector *dissector = rule->match.dissector;
+ struct flow_action *action = &rule->action;
+ struct flow_action_entry *entry;
+ struct actions_fwd *fwd;
+ u64 actions = 0;
+ int i, err;
+
+ if (!flow_action_has_entries(action)) {
+ NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
+ return -EINVAL;
+ }
+
+ flow_action_for_each(i, entry, action)
+ actions |= BIT(entry->id);
+
+ fwd = enetc_check_flow_actions(actions, dissector->used_keys);
+ if (!fwd) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
+ return -EOPNOTSUPP;
+ }
+
+ if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
+ err = enetc_psfp_parse_clsflower(priv, cls_flower);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
+ return err;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter;
+ int err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No stream identification resource!");
+ return -ENOSPC;
+ }
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamid_hw_set(priv, &filter->sid, false);
+ if (err)
+ return err;
+
+ remove_one_chain(priv, filter);
+
+ return 0;
+}
+
+static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ return enetc_psfp_destroy_clsflower(priv, f);
+}
+
+static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct psfp_streamfilter_counters counters = {};
+ struct enetc_stream_filter *filter;
+ struct flow_stats stats = {};
+ int err;
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
+ if (err)
+ return -EINVAL;
+
+ spin_lock(&epsfp.psfp_lock);
+ stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.lastused = filter->stats.lastused;
+ filter->stats.pkts += stats.pkts;
+ spin_unlock(&epsfp.psfp_lock);
+
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
+ FLOW_ACTION_HW_STATS_DELAYED);
+
+ return 0;
+}
+
+static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case FLOW_CLS_REPLACE:
+ return enetc_config_clsflower(priv, cls_flower);
+ case FLOW_CLS_DESTROY:
+ return enetc_destroy_clsflower(priv, cls_flower);
+ case FLOW_CLS_STATS:
+ return enetc_psfp_get_stats(priv, cls_flower);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static inline void clean_psfp_sfi_bitmap(void)
+{
+ bitmap_free(epsfp.psfp_sfi_bitmap);
+ epsfp.psfp_sfi_bitmap = NULL;
+}
+
+static void clean_stream_list(void)
+{
+ struct enetc_stream_filter *s;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
+ hlist_del(&s->node);
+ kfree(s);
+ }
+}
+
+static void clean_sfi_list(void)
+{
+ struct enetc_psfp_filter *sfi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ }
+}
+
+static void clean_sgi_list(void)
+{
+ struct enetc_psfp_gate *sgi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void clean_psfp_all(void)
+{
+ /* Disable all list nodes and free all memory */
+ clean_sfi_list();
+ clean_sgi_list();
+ clean_stream_list();
+ epsfp.dev_bitmap = 0;
+ clean_psfp_sfi_bitmap();
+}
+
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct net_device *ndev = cb_priv;
+
+ if (!tc_can_offload(ndev))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_psfp_init(struct enetc_ndev_priv *priv)
+{
+ if (epsfp.psfp_sfi_bitmap)
+ return 0;
+
+ epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
+ GFP_KERNEL);
+ if (!epsfp.psfp_sfi_bitmap)
+ return -ENOMEM;
+
+ spin_lock_init(&epsfp.psfp_lock);
+
+ if (list_empty(&enetc_block_cb_list))
+ epsfp.dev_bitmap = 0;
+
+ return 0;
+}
+
+int enetc_psfp_clean(struct enetc_ndev_priv *priv)
+{
+ if (!list_empty(&enetc_block_cb_list))
+ return -EBUSY;
+
+ clean_psfp_all();
+
+ return 0;
+}
+
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct flow_block_offload *f = type_data;
+ int err;
+
+ err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
+ enetc_setup_tc_block_cb,
+ ndev, ndev, true);
+ if (err)
+ return err;
+
+ switch (f->command) {
+ case FLOW_BLOCK_BIND:
+ set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ if (!epsfp.dev_bitmap)
+ clean_psfp_all();
+ break;
+ }
+
+ return 0;
+}
--
2.17.1

2020-04-18 22:56:43

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [ v2,net-next 4/4] net: enetc: add tc flower psfp offload driver

Hi Po,

On Sat, 18 Apr 2020 at 04:35, Po Liu <[email protected]> wrote:
>
> This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
> function. PSFP implements policing and filtering of ingress flows
> through four main feature blocks: stream identification (defined in
> P802.1CB but required by 802.1Qci), stream filtering, stream gating
> and flow metering. Each block contains many entries, addressed by
> index, which carry its parameters.
> A frame is first matched by the stream identification block, then
> passed into the stream filter block via the handle shared between the
> two. From there it flows into the stream gate instance assigned by
> the stream filter entry, where it is policed by the gate and
> optionally limited by the max SDU size configured in the filter
> block. Finally it is policed by the flow metering block, whose index
> is also chosen in the filter block.
> Note that each entry of a block may be referenced by many upper-level
> entries: streams that want to share a feature in stream filtering,
> stream gating or flow metering are simply assigned the same index.
> To implement these features, each stream, filtered by
> source/destination MAC address (and for some streams also the VLAN
> id), is treated as one flow chain, identified by the chain_index that
> already exists in the tc filter concept. The driver maintains this
> chain together with the gate modules. The stream filter entry is
> created from the gate index, the (optional) flow meter entry id and
> one priority value. Offloading only transfers the gate action and the
> flow filtering parameters; the driver creates (or looks up an
> existing entry with the same gate id, flow meter id and priority) one
> stream filter entry and programs it into the hardware, so stream
> filtering does not need to be transferred through action offloading.
> This architecture mirrors the tc filter/action relationship: tc
> filters maintain a list per flow keyed by the match keys, while
> actions are maintained in the action list.
>
> Below showing a example commands by tc:
> > tc qdisc add dev eth0 ingress
> > ip link set eth0 address 10:00:80:00:00:00
> > tc filter add dev eth0 parent ffff: protocol ip chain 11 \
> flower skip_sw dst_mac 10:00:80:00:00:00 \
> action gate index 10 \
> sched-entry OPEN 200000000 1 8000000 \
> sched-entry CLOSE 100000000 -1 -1
>
> The commands assign dst_mac 10:00:80:00:00:00 to index 11 of the
> stream identification module and configure gate index 10 of the
> stream gate module. The gate list keeps the OPEN state for 200ms,
> passing frames to ingress queue 1 with a limit of 8M octets, then
> stays CLOSED for 100ms.
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
> drivers/net/ethernet/freescale/enetc/enetc.h | 46 +-
> .../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
> .../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
> .../net/ethernet/freescale/enetc/enetc_qos.c | 1070 +++++++++++++++++
> 5 files changed, 1272 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
> index 04aac7cbb506..298c55786fd9 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.c
> @@ -1521,6 +1521,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
> return enetc_setup_tc_cbs(ndev, type_data);
> case TC_SETUP_QDISC_ETF:
> return enetc_setup_tc_txtime(ndev, type_data);
> + case TC_SETUP_BLOCK:
> + return enetc_setup_tc_psfp(ndev, type_data);
> default:
> return -EOPNOTSUPP;
> }
> @@ -1573,17 +1575,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
> static int enetc_set_psfp(struct net_device *ndev, int en)
> {
> struct enetc_ndev_priv *priv = netdev_priv(ndev);
> + int err;
>
> if (en) {
> + err = enetc_psfp_enable(priv);
> + if (err)
> + return err;
> +
> priv->active_offloads |= ENETC_F_QCI;
> - enetc_get_max_cap(priv);
> - enetc_psfp_enable(&priv->si->hw);
> - } else {
> - priv->active_offloads &= ~ENETC_F_QCI;
> - memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
> - enetc_psfp_disable(&priv->si->hw);
> + return 0;
> }
>
> + err = enetc_psfp_disable(priv);
> + if (err)
> + return err;
> +
> + priv->active_offloads &= ~ENETC_F_QCI;
> +
> return 0;
> }
>
> @@ -1591,14 +1599,15 @@ int enetc_set_features(struct net_device *ndev,
> netdev_features_t features)
> {
> netdev_features_t changed = ndev->features ^ features;
> + int err = 0;
>
> if (changed & NETIF_F_RXHASH)
> enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));
>
> if (changed & NETIF_F_HW_TC)
> - enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
> + err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
>
> - return 0;
> + return err;
> }
>
> #ifdef CONFIG_FSL_ENETC_PTP_CLOCK
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
> index 2cfe877c3778..b705464f6882 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc.h
> @@ -300,6 +300,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
> void enetc_sched_speed_set(struct net_device *ndev);
> int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
> int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
> +int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
> + void *cb_priv);
> +int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
> +int enetc_psfp_init(struct enetc_ndev_priv *priv);
> +int enetc_psfp_clean(struct enetc_ndev_priv *priv);
>
> static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
> {
> @@ -319,27 +324,60 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
> priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
> }
>
> -static inline void enetc_psfp_enable(struct enetc_hw *hw)
> +static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
> {
> + struct enetc_hw *hw = &priv->si->hw;
> + int err;
> +
> + enetc_get_max_cap(priv);
> +
> + err = enetc_psfp_init(priv);
> + if (err)
> + return err;
> +
> enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
> ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
> ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
> +
> + return 0;
> }
>
> -static inline void enetc_psfp_disable(struct enetc_hw *hw)
> +static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
> {
> + struct enetc_hw *hw = &priv->si->hw;
> + int err;
> +
> + err = enetc_psfp_clean(priv);
> + if (err)
> + return err;
> +
> enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
> ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
> ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
> +
> + memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
> +
> + return 0;
> }
> +
> #else
> #define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
> #define enetc_sched_speed_set(ndev) (void)0
> #define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
> #define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
> +#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
> +#define enetc_setup_tc_block_cb NULL
> +
> #define enetc_get_max_cap(p) \
> memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
>
> -#define enetc_psfp_enable(hw) (void)0
> -#define enetc_psfp_disable(hw) (void)0
> +static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
> +{
> + return 0;
> +}
> +
> +static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
> +{
> + return 0;
> +}
> #endif
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
> index 587974862f48..6314051bc6c1 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
> @@ -567,6 +567,9 @@ enum bdcr_cmd_class {
> BDCR_CMD_RFS,
> BDCR_CMD_PORT_GCL,
> BDCR_CMD_RECV_CLASSIFIER,
> + BDCR_CMD_STREAM_IDENTIFY,
> + BDCR_CMD_STREAM_FILTER,
> + BDCR_CMD_STREAM_GCL,
> __BDCR_CMD_MAX_LEN,
> BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
> };
> @@ -598,13 +601,152 @@ struct tgs_gcl_data {
> struct gce entry[];
> };
>
> +/* class 7, command 0, Stream Identity Entry Configuration */
> +struct streamid_conf {
> + __le32 stream_handle; /* init gate value */
> + __le32 iports;
> + u8 id_type;
> + u8 oui[3];
> + u8 res[3];
> + u8 en;
> +};
> +
> +#define ENETC_CBDR_SID_VID_MASK 0xfff
> +#define ENETC_CBDR_SID_VIDM BIT(12)
> +#define ENETC_CBDR_SID_TG_MASK 0xc000
> +/* streamid_conf address point to this data space */
> +struct streamid_data {
> + union {
> + u8 dmac[6];
> + u8 smac[6];
> + };
> + u16 vid_vidm_tg;
> +};
> +
> +#define ENETC_CBDR_SFI_PRI_MASK 0x7
> +#define ENETC_CBDR_SFI_PRIM BIT(3)
> +#define ENETC_CBDR_SFI_BLOV BIT(4)
> +#define ENETC_CBDR_SFI_BLEN BIT(5)
> +#define ENETC_CBDR_SFI_MSDUEN BIT(6)
> +#define ENETC_CBDR_SFI_FMITEN BIT(7)
> +#define ENETC_CBDR_SFI_ENABLE BIT(7)
> +/* class 8, command 0, Stream Filter Instance, Short Format */
> +struct sfi_conf {
> + __le32 stream_handle;
> + u8 multi;
> + u8 res[2];
> + u8 sthm;
> + /* Max Service Data Unit or Flow Meter Instance Table index.
> + * Depending on the value of FLT this represents either Max
> + * Service Data Unit (max frame size) allowed by the filter
> + * entry or is an index into the Flow Meter Instance table
> + * index identifying the policer which will be used to police
> + * it.
> + */
> + __le16 fm_inst_table_index;
> + __le16 msdu;
> + __le16 sg_inst_table_index;
> + u8 res1[2];
> + __le32 input_ports;
> + u8 res2[3];
> + u8 en;
> +};
> +
> +/* class 8, command 2 stream Filter Instance status query short format
> + * command no need structure define
> + * Stream Filter Instance Query Statistics Response data
> + */
> +struct sfi_counter_data {
> + u32 matchl;
> + u32 matchh;
> + u32 msdu_dropl;
> + u32 msdu_droph;
> + u32 stream_gate_dropl;
> + u32 stream_gate_droph;
> + u32 flow_meter_dropl;
> + u32 flow_meter_droph;
> +};
> +
> +#define ENETC_CBDR_SGI_OIPV_MASK 0x7
> +#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
> +#define ENETC_CBDR_SGI_CGTST BIT(6)
> +#define ENETC_CBDR_SGI_OGTST BIT(7)
> +#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
> +#define ENETC_CBDR_SGI_CFG_PND BIT(2)
> +#define ENETC_CBDR_SGI_OEX BIT(4)
> +#define ENETC_CBDR_SGI_OEXEN BIT(5)
> +#define ENETC_CBDR_SGI_IRX BIT(6)
> +#define ENETC_CBDR_SGI_IRXEN BIT(7)
> +#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
> +#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
> +#define ENETC_CBDR_SGI_EN BIT(7)
> +/* class 9, command 0, Stream Gate Instance Table, Short Format
> + * class 9, command 2, Stream Gate Instance Table entry query write back
> + * Short Format
> + */
> +struct sgi_table {
> + u8 res[8];
> + u8 oipv;
> + u8 res0[2];
> + u8 ocgtst;
> + u8 res1[7];
> + u8 gset;
> + u8 oacl_len;
> + u8 res2[2];
> + u8 en;
> +};
> +
> +#define ENETC_CBDR_SGI_AIPV_MASK 0x7
> +#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
> +#define ENETC_CBDR_SGI_AGTST BIT(7)
> +
> +/* class 9, command 1, Stream Gate Control List, Long Format */
> +struct sgcl_conf {
> + u8 aipv;
> + u8 res[2];
> + u8 agtst;
> + u8 res1[4];
> + union {
> + struct {
> + u8 res2[4];
> + u8 acl_len;
> + u8 res3[3];
> + };
> + u8 cct[8]; /* Config change time */
> + };
> +};
> +
> +#define ENETC_CBDR_SGL_IOMEN BIT(0)
> +#define ENETC_CBDR_SGL_IPVEN BIT(3)
> +#define ENETC_CBDR_SGL_GTST BIT(4)
> +#define ENETC_CBDR_SGL_IPV_MASK 0xe
> +/* Stream Gate Control List Entry */
> +struct sgce {
> + u32 interval;
> + u8 msdu[3];
> + u8 multi;
> +};
> +
> +/* stream control list class 9 , cmd 1 data buffer */
> +struct sgcl_data {
> + u32 btl;
> + u32 bth;
> + u32 ct;
> + u32 cte;
> + struct sgce sgcl[0];
> +};
> +
> struct enetc_cbd {
> union{
> + struct sfi_conf sfi_conf;
> + struct sgi_table sgi_table;
> struct {
> __le32 addr[2];
> union {
> __le32 opt[4];
> struct tgs_gcl_conf gcl_conf;
> + struct streamid_conf sid_set;
> + struct sgcl_conf sgcl_conf;
> };
> }; /* Long format */
> __le32 data[6];
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> index eacd597b55f2..d06fce0dcd8a 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
> @@ -739,12 +739,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
> if (si->hw_features & ENETC_SI_F_QBV)
> priv->active_offloads |= ENETC_F_QBV;
>
> - if (si->hw_features & ENETC_SI_F_PSFP) {
> + if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
> priv->active_offloads |= ENETC_F_QCI;
> ndev->features |= NETIF_F_HW_TC;
> ndev->hw_features |= NETIF_F_HW_TC;
> - enetc_get_max_cap(priv);
> - enetc_psfp_enable(&si->hw);
> }
>
> /* pick up primary MAC address from SI */
> diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
> index 0c6bf3a55a9a..7944c243903c 100644
> --- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
> +++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
> @@ -5,6 +5,9 @@
>
> #include <net/pkt_sched.h>
> #include <linux/math64.h>
> +#include <linux/refcount.h>
> +#include <net/pkt_cls.h>
> +#include <net/tc_act/tc_gate.h>
>
> static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
> {
> @@ -331,3 +334,1070 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)
>
> return 0;
> }
> +
> +enum streamid_type {
> + STREAMID_TYPE_RESERVED = 0,
> + STREAMID_TYPE_NULL,
> + STREAMID_TYPE_SMAC,
> +};
> +
> +enum streamid_vlan_tagged {
> + STREAMID_VLAN_RESERVED = 0,
> + STREAMID_VLAN_TAGGED,
> + STREAMID_VLAN_UNTAGGED,
> + STREAMID_VLAN_ALL,
> +};
> +
> +#define ENETC_PSFP_WILDCARD -1
> +#define HANDLE_OFFSET 100
> +
> +enum forward_type {
> + FILTER_ACTION_TYPE_PSFP = BIT(0),
> + FILTER_ACTION_TYPE_ACL = BIT(1),
> + FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
> +};
> +
> +/* This is for limit output type for input actions */
> +struct actions_fwd {
> + u64 actions;
> + u64 keys; /* include the must needed keys */
> + enum forward_type output;
> +};
> +
> +struct psfp_streamfilter_counters {
> + u64 matching_frames_count;
> + u64 passing_frames_count;
> + u64 not_passing_frames_count;
> + u64 passing_sdu_count;
> + u64 not_passing_sdu_count;
> + u64 red_frames_count;
> +};
> +
> +struct enetc_streamid {
> + u32 index;
> + union {
> + u8 src_mac[6];
> + u8 dst_mac[6];
> + };
> + u8 filtertype;
> + u16 vid;
> + u8 tagged;
> + s32 handle;
> +};
> +
> +struct enetc_psfp_filter {
> + u32 index;
> + s32 handle;
> + s8 prio;
> + u32 gate_id;
> + s32 meter_id;
> + refcount_t refcount;
> + struct hlist_node node;
> +};
> +
> +struct enetc_psfp_gate {
> + u32 index;
> + s8 init_ipv;
> + u64 basetime;
> + u64 cycletime;
> + u64 cycletimext;
> + u32 num_entries;
> + refcount_t refcount;
> + struct hlist_node node;
> + struct action_gate_entry entries[0];
> +};
> +
> +struct enetc_stream_filter {
> + struct enetc_streamid sid;
> + u32 sfi_index;
> + u32 sgi_index;
> + struct flow_stats stats;
> + struct hlist_node node;
> +};
> +
> +struct enetc_psfp {
> + unsigned long dev_bitmap;
> + unsigned long *psfp_sfi_bitmap;
> + struct hlist_head stream_list;
> + struct hlist_head psfp_filter_list;
> + struct hlist_head psfp_gate_list;
> + spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
> +};
> +
> +struct actions_fwd enetc_act_fwd[] = {
> + {
> + BIT(FLOW_ACTION_GATE),
> + BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
> + FILTER_ACTION_TYPE_PSFP
> + },
> + /* example for ACL actions */
> + {
> + BIT(FLOW_ACTION_DROP),
> + 0,
> + FILTER_ACTION_TYPE_ACL
> + }
> +};
> +
> +static struct enetc_psfp epsfp = {
> + .psfp_sfi_bitmap = NULL,
> +};
> +
> +static LIST_HEAD(enetc_block_cb_list);
> +
> +static inline int enetc_get_port(struct enetc_ndev_priv *priv)
> +{
> + return priv->si->pdev->devfn & 0x7;
> +}
> +
> +/* Stream Identity Entry Set Descriptor */
> +static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_streamid *sid,
> + u8 enable)
> +{
> + struct enetc_cbd cbd = {.cmd = 0};
> + struct streamid_data *si_data;
> + struct streamid_conf *si_conf;
> + u16 data_size;
> + dma_addr_t dma;
> + int err;
> +
> + if (sid->index >= priv->psfp_cap.max_streamid)
> + return -EINVAL;
> +
> + if (sid->filtertype != STREAMID_TYPE_NULL &&
> + sid->filtertype != STREAMID_TYPE_SMAC)
> + return -EOPNOTSUPP;
> +
> + /* Disable operation before enable */
> + cbd.index = cpu_to_le16((u16)sid->index);
> + cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
> + cbd.status_flags = 0;
> +
> + data_size = sizeof(struct streamid_data);
> + si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> + if (!si_data)
> + return -ENOMEM;
> + cbd.length = cpu_to_le16(data_size);
> +
> + dma = dma_map_single(&priv->si->pdev->dev, si_data,
> + data_size, DMA_FROM_DEVICE);
> + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> + kfree(si_data);
> + return -ENOMEM;
> + }
> +
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> + memset(si_data->dmac, 0xff, ETH_ALEN);
> + si_data->vid_vidm_tg =
> + cpu_to_le16(ENETC_CBDR_SID_VID_MASK
> + + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
> +
> + si_conf = &cbd.sid_set;
> + /* Only one port supported for one entry, set itself */
> + si_conf->iports = 1 << enetc_get_port(priv);
> + si_conf->id_type = 1;
> + si_conf->oui[2] = 0x0;
> + si_conf->oui[1] = 0x80;
> + si_conf->oui[0] = 0xC2;
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> + if (err) {
> + kfree(si_data);
> + return -EINVAL;
> + }
> +
> + if (!enable) {
> + kfree(si_data);
> + return 0;
> + }
> +
> + /* Enable the entry overwrite again in case the space was flushed by hardware */
> + memset(&cbd, 0, sizeof(cbd));
> +
> + cbd.index = cpu_to_le16((u16)sid->index);
> + cbd.cmd = 0;
> + cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
> + cbd.status_flags = 0;
> +
> + si_conf->en = 0x80;
> + si_conf->stream_handle = cpu_to_le32(sid->handle);
> + si_conf->iports = 1 << enetc_get_port(priv);
> + si_conf->id_type = sid->filtertype;
> + si_conf->oui[2] = 0x0;
> + si_conf->oui[1] = 0x80;
> + si_conf->oui[0] = 0xC2;
> +
> + memset(si_data, 0, data_size);
> +
> + cbd.length = cpu_to_le16(data_size);
> +
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> +
> + /* VIDM default to be 1.
> + * VID Match. If set (b1) then the VID must match, otherwise
> + * any VID is considered a match. VIDM setting is only used
> + * when TG is set to b01.
> + */
> + if (si_conf->id_type == STREAMID_TYPE_NULL) {
> + ether_addr_copy(si_data->dmac, sid->dst_mac);
> + si_data->vid_vidm_tg =
> + cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
> + ((((u16)(sid->tagged) & 0x3) << 14)
> + | ENETC_CBDR_SID_VIDM));
> + } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
> + ether_addr_copy(si_data->smac, sid->src_mac);
> + si_data->vid_vidm_tg =
> + cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
> + ((((u16)(sid->tagged) & 0x3) << 14)
> + | ENETC_CBDR_SID_VIDM));
> + }
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> + kfree(si_data);
> +
> + return err;
> +}
> +
> +/* Stream Filter Instance Set Descriptor */
> +static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_psfp_filter *sfi,
> + u8 enable)
> +{
> + struct enetc_cbd cbd = {.cmd = 0};
> + struct sfi_conf *sfi_config;
> +
> + cbd.index = cpu_to_le16(sfi->index);
> + cbd.cls = BDCR_CMD_STREAM_FILTER;
> + cbd.status_flags = 0x80;
> + cbd.length = cpu_to_le16(1);
> +
> + sfi_config = &cbd.sfi_conf;
> + if (!enable)
> + goto exit;
> +
> + sfi_config->en = 0x80;
> +
> + if (sfi->handle >= 0) {
> + sfi_config->stream_handle =
> + cpu_to_le32(sfi->handle);
> + sfi_config->sthm |= 0x80;
> + }
> +
> + sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
> + sfi_config->input_ports = 1 << enetc_get_port(priv);
> +
> + /* The priority value which may be matched against the
> + * frame’s priority value to determine a match for this entry.
> + */
> + if (sfi->prio >= 0)
> + sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
> +
> + /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
> + * field as being either an MSDU value or an index into the Flow
> + * Meter Instance table.
> + * TODO: no limit max sdu
> + */
> +
> + if (sfi->meter_id >= 0) {
> + sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
> + sfi_config->multi |= 0x80;
> + }
> +
> +exit:
> + return enetc_send_cmd(priv->si, &cbd);
> +}
> +
> +static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
> + u32 index,
> + struct psfp_streamfilter_counters *cnt)
> +{
> + struct enetc_cbd cbd = { .cmd = 2 };
> + struct sfi_counter_data *data_buf;
> + dma_addr_t dma;
> + u16 data_size;
> + int err;
> +
> + cbd.index = cpu_to_le16((u16)index);
> + cbd.cmd = 2;
> + cbd.cls = BDCR_CMD_STREAM_FILTER;
> + cbd.status_flags = 0;
> +
> + data_size = sizeof(struct sfi_counter_data);
> + data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> + if (!data_buf)
> + return -ENOMEM;
> +
> + dma = dma_map_single(&priv->si->pdev->dev, data_buf,
> + data_size, DMA_FROM_DEVICE);
> + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> + err = -ENOMEM;
> + goto exit;
> + }
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> +
> + cbd.length = cpu_to_le16(data_size);
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> + if (err)
> + goto exit;
> +
> + cnt->matching_frames_count =
> + ((u64)le32_to_cpu(data_buf->matchh) << 32)
> + + le32_to_cpu(data_buf->matchl);
> +
> + cnt->not_passing_sdu_count =
> + ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
> + + le32_to_cpu(data_buf->msdu_dropl);
> +
> + cnt->passing_sdu_count = cnt->matching_frames_count
> + - cnt->not_passing_sdu_count;
> +
> + cnt->not_passing_frames_count =
> + ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
> + + le32_to_cpu(data_buf->stream_gate_dropl);
> +
> + cnt->passing_frames_count = cnt->matching_frames_count
> + - cnt->not_passing_sdu_count
> + - cnt->not_passing_frames_count;
> +
> + cnt->red_frames_count =
> + ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
> + + le32_to_cpu(data_buf->flow_meter_dropl);
> +
> +exit:
> + kfree(data_buf);
> + return err;
> +}
> +
> +static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
> +{
> + u64 now_lo, now_hi, now, n;
> +
> + now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
> + now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
> + now = now_lo | now_hi << 32;
> +
> + if (WARN_ON(!cycle))
> + return -EFAULT;
> +
> + n = div64_u64(now, cycle);
> +
> + *start = (n + 1) * cycle;
> +
> + return 0;
> +}
> +
> +/* Stream Gate Instance Set Descriptor */
> +static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_psfp_gate *sgi,
> + u8 enable)
> +{
> + struct enetc_cbd cbd = { .cmd = 0 };
> + struct sgi_table *sgi_config;
> + struct sgcl_conf *sgcl_config;
> + struct sgcl_data *sgcl_data;
> + struct sgce *sgce;
> + dma_addr_t dma;
> + u16 data_size;
> + int err, i;
> +
> + cbd.index = cpu_to_le16(sgi->index);
> + cbd.cmd = 0;
> + cbd.cls = BDCR_CMD_STREAM_GCL;
> + cbd.status_flags = 0x80;
> +
> + /* disable */
> + if (!enable)
> + return enetc_send_cmd(priv->si, &cbd);
> +
> + if (!sgi->num_entries)
> + return 0;
> +
> + if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
> + !sgi->cycletime)
> + return -EINVAL;
> +
> + /* enable */
> + sgi_config = &cbd.sgi_table;
> +
> + /* Keep open before gate list start */
> + sgi_config->ocgtst = 0x80;
> +
> + sgi_config->oipv = (sgi->init_ipv < 0) ?
> + 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
> +
> + sgi_config->en = 0x80;
> +
> + /* Basic config */
> + err = enetc_send_cmd(priv->si, &cbd);
> + if (err)
> + return -EINVAL;
> +
> + memset(&cbd, 0, sizeof(cbd));
> +
> + cbd.index = cpu_to_le16(sgi->index);
> + cbd.cmd = 1;
> + cbd.cls = BDCR_CMD_STREAM_GCL;
> + cbd.status_flags = 0;
> +
> + sgcl_config = &cbd.sgcl_conf;
> +
> + sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
> +
> + data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
> +
> + sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
> + if (!sgcl_data)
> + return -ENOMEM;
> +
> + cbd.length = cpu_to_le16(data_size);
> +
> + dma = dma_map_single(&priv->si->pdev->dev,
> + sgcl_data, data_size,
> + DMA_FROM_DEVICE);
> + if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
> + netdev_err(priv->si->ndev, "DMA mapping failed!\n");
> + kfree(sgcl_data);
> + return -ENOMEM;
> + }
> +
> + cbd.addr[0] = lower_32_bits(dma);
> + cbd.addr[1] = upper_32_bits(dma);
> +
> + sgce = &sgcl_data->sgcl[0];
> +
> + sgcl_config->agtst = 0x80;
> +
> + sgcl_data->ct = cpu_to_le32(sgi->cycletime);
> + sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
> +
> + if (sgi->init_ipv >= 0)
> + sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
> +
> + for (i = 0; i < sgi->num_entries; i++) {
> + struct action_gate_entry *from = &sgi->entries[i];
> + struct sgce *to = &sgce[i];
> +
> + if (from->gate_state)
> + to->multi |= 0x10;
> +
> + if (from->ipv >= 0)
> + to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
> +
> + if (from->maxoctets)
> + to->multi |= 0x01;
> +
> + to->interval = cpu_to_le32(from->interval);
> + to->msdu[0] = from->maxoctets & 0xFF;
> + to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
> + to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
> + }
> +
> + /* If basetime is 0, calculate start time */
> + if (!sgi->basetime) {
> + u64 start;
> +
> + err = get_start_ns(priv, sgi->cycletime, &start);
> + if (err)
> + goto exit;
> + sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
> + sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
> + } else {
> + u32 hi, lo;
> +
> + hi = upper_32_bits(sgi->basetime);
> + lo = lower_32_bits(sgi->basetime);
> + sgcl_data->bth = cpu_to_le32(hi);
> + sgcl_data->btl = cpu_to_le32(lo);
> + }
> +
> + err = enetc_send_cmd(priv->si, &cbd);
> +
> +exit:
> + kfree(sgcl_data);
> +
> + return err;
> +}
> +
> +static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
> +{
> + struct enetc_stream_filter *f;
> +
> + hlist_for_each_entry(f, &epsfp.stream_list, node)
> + if (f->sid.index == index)
> + return f;
> +
> + return NULL;
> +}
> +
> +static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
> +{
> + struct enetc_psfp_gate *g;
> +
> + hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
> + if (g->index == index)
> + return g;
> +
> + return NULL;
> +}
> +
> +static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
> +{
> + struct enetc_psfp_filter *s;
> +
> + hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
> + if (s->index == index)
> + return s;
> +
> + return NULL;
> +}
> +
> +static struct enetc_psfp_filter
> + *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
> +{
> + struct enetc_psfp_filter *s;
> +
> + hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
> + if (s->gate_id == sfi->gate_id &&
> + s->prio == sfi->prio &&
> + s->meter_id == sfi->meter_id)
> + return s;
> +
> + return NULL;
> +}
> +
> +static int enetc_get_free_index(struct enetc_ndev_priv *priv)
> +{
> + u32 max_size = priv->psfp_cap.max_psfp_filter;
> + unsigned long index;
> +
> + index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
> + if (index == max_size)
> + return -1;
> +
> + return index;
> +}
> +
> +static void stream_filter_unref(struct enetc_ndev_priv *priv, u32 index)
> +{
> + struct enetc_psfp_filter *sfi;
> + u8 z;
> +
> + sfi = enetc_get_filter_by_index(index);
> + WARN_ON(!sfi);
> + z = refcount_dec_and_test(&sfi->refcount);
> +
> + if (z) {
> + enetc_streamfilter_hw_set(priv, sfi, false);
> + hlist_del(&sfi->node);
> + clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
> + kfree(sfi);
> + }
> +}
> +
> +static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
> +{
> + struct enetc_psfp_gate *sgi;
> + u8 z;
> +
> + sgi = enetc_get_gate_by_index(index);
> + WARN_ON(!sgi);
> + z = refcount_dec_and_test(&sgi->refcount);
> + if (z) {
> + enetc_streamgate_hw_set(priv, sgi, false);
> + hlist_del(&sgi->node);
> + kfree(sgi);
> + }
> +}
> +
> +static void remove_one_chain(struct enetc_ndev_priv *priv,
> + struct enetc_stream_filter *filter)
> +{
> + stream_gate_unref(priv, filter->sgi_index);
> + stream_filter_unref(priv, filter->sfi_index);
> +
> + hlist_del(&filter->node);
> + kfree(filter);
> +}
> +
> +static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
> + struct enetc_streamid *sid,
> + struct enetc_psfp_filter *sfi,
> + struct enetc_psfp_gate *sgi)
> +{
> + int err;
> +
> + err = enetc_streamid_hw_set(priv, sid, true);
> + if (err)
> + return err;
> +
> + if (sfi) {
> + err = enetc_streamfilter_hw_set(priv, sfi, true);
> + if (err)
> + goto revert_sid;
> + }
> +
> + err = enetc_streamgate_hw_set(priv, sgi, true);
> + if (err)
> + goto revert_sfi;
> +
> + return 0;
> +
> +revert_sfi:
> + if (sfi)
> + enetc_streamfilter_hw_set(priv, sfi, false);
> +revert_sid:
> + enetc_streamid_hw_set(priv, sid, false);
> + return err;
> +}
> +
> +struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
> +{
> + int i;
> +
> + for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
> + if (acts == enetc_act_fwd[i].actions &&
> + inputkeys & enetc_act_fwd[i].keys)
> + return &enetc_act_fwd[i];
> +
> + return NULL;
> +}
> +
> +static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + struct flow_rule *rule = flow_cls_offload_flow_rule(f);
> + struct netlink_ext_ack *extack = f->common.extack;
> + struct enetc_stream_filter *filter, *old_filter;
> + struct enetc_psfp_filter *sfi, *old_sfi;
> + struct enetc_psfp_gate *sgi, *old_sgi;
> + struct flow_action_entry *entry;
> + struct action_gate_entry *e;
> + u8 sfi_overwrite = 0;
> + int entries_size;
> + int i, err;
> +
> + if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
> + return -ENOSPC;
> + }
> +
> + flow_action_for_each(i, entry, &rule->action)
> + if (entry->id == FLOW_ACTION_GATE)
> + break;
> +
> + if (entry->id != FLOW_ACTION_GATE)
> + return -EINVAL;
> +
> + filter = kzalloc(sizeof(*filter), GFP_KERNEL);
> + if (!filter)
> + return -ENOMEM;
> +
> + filter->sid.index = f->common.chain_index;
> +
> + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
> + struct flow_match_eth_addrs match;
> +
> + flow_rule_match_eth_addrs(rule, &match);
> +
> + if (!is_zero_ether_addr(match.mask->dst)) {

Does ENETC support masked matching on MAC address? If not, you should
error out if the mask is not ff:ff:ff:ff:ff:ff.

> + ether_addr_copy(filter->sid.dst_mac, match.key->dst);
> + filter->sid.filtertype = STREAMID_TYPE_NULL;
> + }
> +
> + if (!is_zero_ether_addr(match.mask->src)) {
> + ether_addr_copy(filter->sid.src_mac, match.key->src);
> + filter->sid.filtertype = STREAMID_TYPE_SMAC;
> + }
> + } else {
> + NL_SET_ERR_MSG_MOD(extack, "Unsupported, must ETH_ADDRS");
> + return -EINVAL;
> + }
> +
> + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
> + struct flow_match_vlan match;
> +
> + flow_rule_match_vlan(rule, &match);
> + if (match.mask->vlan_priority) {
> + if (match.mask->vlan_priority !=
> + (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
> + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
> + err = -EINVAL;
> + goto free_filter;
> + }
> + }
> +
> + if (match.mask->vlan_tpid) {
> + if (match.mask->vlan_tpid != VLAN_VID_MASK) {

I'm pretty sure that vlan_tpid is the EtherType (0x8100, etc), and
that you actually meant vlan_id.


> + NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
> + err = -EINVAL;
> + goto free_filter;
> + }
> +
> + filter->sid.vid = match.key->vlan_tpid;
> + if (!filter->sid.vid)
> + filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
> + else
> + filter->sid.tagged = STREAMID_VLAN_TAGGED;
> + }
> + } else {
> + filter->sid.tagged = STREAMID_VLAN_ALL;
> + }
> +
> + /* parsing gate action */
> + if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
> + err = -ENOSPC;
> + goto free_filter;
> + }
> +
> + if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
> + err = -ENOSPC;
> + goto free_filter;
> + }
> +
> + entries_size = struct_size(sgi, entries, entry->gate.num_entries);
> + sgi = kzalloc(entries_size, GFP_KERNEL);
> + if (!sgi) {
> + err = -ENOMEM;
> + goto free_filter;
> + }
> +
> + refcount_set(&sgi->refcount, 1);
> + sgi->index = entry->gate.index;
> + sgi->init_ipv = entry->gate.prio;
> + sgi->basetime = entry->gate.basetime;
> + sgi->cycletime = entry->gate.cycletime;
> + sgi->num_entries = entry->gate.num_entries;
> +
> + e = sgi->entries;
> + for (i = 0; i < entry->gate.num_entries; i++) {
> + e[i].gate_state = entry->gate.entries[i].gate_state;
> + e[i].interval = entry->gate.entries[i].interval;
> + e[i].ipv = entry->gate.entries[i].ipv;
> + e[i].maxoctets = entry->gate.entries[i].maxoctets;
> + }
> +
> + filter->sgi_index = sgi->index;
> +
> + sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
> + if (!sfi) {
> + err = -ENOMEM;
> + goto free_gate;
> + }
> +
> + refcount_set(&sfi->refcount, 1);
> + sfi->gate_id = sgi->index;
> +
> + /* flow metering is not supported yet */
> + sfi->meter_id = ENETC_PSFP_WILDCARD;
> +
> + /* prio ref the filter prio */
> + if (f->common.prio && f->common.prio <= BIT(3))
> + sfi->prio = f->common.prio - 1;
> + else
> + sfi->prio = ENETC_PSFP_WILDCARD;
> +
> + old_sfi = enetc_psfp_check_sfi(sfi);
> + if (!old_sfi) {
> + int index;
> +
> + index = enetc_get_free_index(priv);
> + if (index < 0) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
> + err = -ENOSPC;
> + goto free_sfi;
> + }
> +
> + sfi->index = index;
> + sfi->handle = index + HANDLE_OFFSET;
> + /* Update the stream filter handle also */
> + filter->sid.handle = sfi->handle;
> + filter->sfi_index = sfi->index;
> + sfi_overwrite = 0;
> + } else {
> + filter->sfi_index = old_sfi->index;
> + filter->sid.handle = old_sfi->handle;
> + sfi_overwrite = 1;
> + }
> +
> + err = enetc_psfp_hw_set(priv, &filter->sid,
> + sfi_overwrite ? NULL : sfi, sgi);
> + if (err)
> + goto free_sfi;
> +
> + spin_lock(&epsfp.psfp_lock);
> + /* Remove the old node if it exists and update with the new node */
> + old_sgi = enetc_get_gate_by_index(filter->sgi_index);
> + if (old_sgi) {
> + refcount_set(&sgi->refcount,
> + refcount_read(&old_sgi->refcount) + 1);
> + hlist_del(&old_sgi->node);
> + kfree(old_sgi);
> + }
> +
> + hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
> +
> + if (!old_sfi) {
> + hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
> + set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
> + } else {
> + kfree(sfi);
> + refcount_inc(&old_sfi->refcount);
> + }
> +
> + old_filter = enetc_get_stream_by_index(filter->sid.index);
> + if (old_filter)
> + remove_one_chain(priv, old_filter);
> +
> + filter->stats.lastused = jiffies;
> + hlist_add_head(&filter->node, &epsfp.stream_list);
> +
> + spin_unlock(&epsfp.psfp_lock);
> +
> + return 0;
> +
> +free_sfi:
> + kfree(sfi);
> +free_gate:
> + kfree(sgi);
> +free_filter:
> + kfree(filter);
> +
> + return err;
> +}
> +
> +static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *cls_flower)
> +{
> + struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
> + struct netlink_ext_ack *extack = cls_flower->common.extack;
> + struct flow_dissector *dissector = rule->match.dissector;
> + struct flow_action *action = &rule->action;
> + struct flow_action_entry *entry;
> + struct actions_fwd *fwd;
> + u64 actions = 0;
> + int i, err;
> +
> + if (!flow_action_has_entries(action)) {
> + NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
> + return -EINVAL;
> + }
> +
> + flow_action_for_each(i, entry, action)
> + actions |= BIT(entry->id);
> +
> + fwd = enetc_check_flow_actions(actions, dissector->used_keys);
> + if (!fwd) {
> + NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
> + return -EOPNOTSUPP;
> + }
> +
> + if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
> + err = enetc_psfp_parse_clsflower(priv, cls_flower);
> + if (err) {
> + NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
> + return err;
> + }
> + } else {
> + NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
> + return -EOPNOTSUPP;
> + }
> +
> + return 0;
> +}
> +
> +static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + struct enetc_stream_filter *filter;
> + struct netlink_ext_ack *extack = f->common.extack;
> + int err;
> +
> + if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
> + NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
> + return -ENOSPC;
> + }
> +
> + filter = enetc_get_stream_by_index(f->common.chain_index);
> + if (!filter)
> + return -EINVAL;
> +
> + err = enetc_streamid_hw_set(priv, &filter->sid, false);
> + if (err)
> + return err;
> +
> + remove_one_chain(priv, filter);
> +
> + return 0;
> +}
> +
> +static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + return enetc_psfp_destroy_clsflower(priv, f);
> +}
> +
> +static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *f)
> +{
> + struct psfp_streamfilter_counters counters = {};
> + struct enetc_stream_filter *filter;
> + struct flow_stats stats = {};
> + int err;
> +
> + filter = enetc_get_stream_by_index(f->common.chain_index);
> + if (!filter)
> + return -EINVAL;
> +
> + err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
> + if (err)
> + return -EINVAL;
> +
> + spin_lock(&epsfp.psfp_lock);
> + stats.pkts = counters.matching_frames_count - filter->stats.pkts;
> + stats.lastused = filter->stats.lastused;
> + filter->stats.pkts += stats.pkts;
> + spin_unlock(&epsfp.psfp_lock);
> +
> + flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
> + FLOW_ACTION_HW_STATS_DELAYED);
> +
> + return 0;
> +}
> +
> +static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
> + struct flow_cls_offload *cls_flower)
> +{
> + switch (cls_flower->command) {
> + case FLOW_CLS_REPLACE:
> + return enetc_config_clsflower(priv, cls_flower);
> + case FLOW_CLS_DESTROY:
> + return enetc_destroy_clsflower(priv, cls_flower);
> + case FLOW_CLS_STATS:
> + return enetc_psfp_get_stats(priv, cls_flower);
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +static inline void clean_psfp_sfi_bitmap(void)
> +{
> + bitmap_free(epsfp.psfp_sfi_bitmap);
> + epsfp.psfp_sfi_bitmap = NULL;
> +}
> +
> +static void clean_stream_list(void)
> +{
> + struct enetc_stream_filter *s;
> + struct hlist_node *tmp;
> +
> + hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
> + hlist_del(&s->node);
> + kfree(s);
> + }
> +}
> +
> +static void clean_sfi_list(void)
> +{
> + struct enetc_psfp_filter *sfi;
> + struct hlist_node *tmp;
> +
> + hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
> + hlist_del(&sfi->node);
> + kfree(sfi);
> + }
> +}
> +
> +static void clean_sgi_list(void)
> +{
> + struct enetc_psfp_gate *sgi;
> + struct hlist_node *tmp;
> +
> + hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
> + hlist_del(&sgi->node);
> + kfree(sgi);
> + }
> +}
> +
> +static void clean_psfp_all(void)
> +{
> + /* Disable all list nodes and free all memory */
> + clean_sfi_list();
> + clean_sgi_list();
> + clean_stream_list();
> + epsfp.dev_bitmap = 0;
> + clean_psfp_sfi_bitmap();
> +}
> +
> +int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
> + void *cb_priv)
> +{
> + struct net_device *ndev = cb_priv;
> +
> + if (!tc_can_offload(ndev))
> + return -EOPNOTSUPP;
> +
> + switch (type) {
> + case TC_SETUP_CLSFLOWER:
> + return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
> + default:
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +int enetc_psfp_init(struct enetc_ndev_priv *priv)
> +{
> + if (epsfp.psfp_sfi_bitmap)
> + return 0;
> +
> + epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
> + GFP_KERNEL);
> + if (!epsfp.psfp_sfi_bitmap)
> + return -ENOMEM;
> +
> + spin_lock_init(&epsfp.psfp_lock);
> +
> + if (list_empty(&enetc_block_cb_list))
> + epsfp.dev_bitmap = 0;
> +
> + return 0;
> +}
> +
> +int enetc_psfp_clean(struct enetc_ndev_priv *priv)
> +{
> + if (!list_empty(&enetc_block_cb_list))
> + return -EBUSY;
> +
> + clean_psfp_all();
> +
> + return 0;
> +}
> +
> +int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
> +{
> + struct enetc_ndev_priv *priv = netdev_priv(ndev);
> + struct flow_block_offload *f = type_data;
> + int err;
> +
> + err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
> + enetc_setup_tc_block_cb,
> + ndev, ndev, true);
> + if (err)
> + return err;
> +
> + switch (f->command) {
> + case FLOW_BLOCK_BIND:
> + set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
> + break;
> + case FLOW_BLOCK_UNBIND:
> + clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
> + if (!epsfp.dev_bitmap)
> + clean_psfp_all();
> + break;
> + }
> +
> + return 0;
> +}
> --
> 2.17.1
>

Thanks,
-Vladimir

2020-04-19 01:48:10

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [ v2,net-next 4/4] net: enetc: add tc flower psfp offload driver

Hi Vladimir,

> -----Original Message-----
> From: Vladimir Oltean <[email protected]>
> Sent: 2020年4月19日 6:53
> To: Po Liu <[email protected]>
> Cc: David S. Miller <[email protected]>; lkml <linux-
> [email protected]>; netdev <[email protected]>; Vinicius Costa
> Gomes <[email protected]>; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; Jiri Pirko <[email protected]>;
> Ido Schimmel <[email protected]>; Alexandre Belloni
> <[email protected]>; Microchip Linux Driver Support
> <[email protected]>; Jakub Kicinski <[email protected]>;
> Jamal Hadi Salim <[email protected]>; Cong Wang
> <[email protected]>; [email protected];
> [email protected]; [email protected]; Murali Karicheri <m-
> [email protected]>; Andre Guedes <[email protected]>;
> Stephen Hemminger <[email protected]>
> Subject: [EXT] Re: [ v2,net-next 4/4] net: enetc: add tc flower psfp offload
> driver
>
> Hi Po,
>
> On Sat, 18 Apr 2020 at 04:35, Po Liu <[email protected]> wrote:
> >
> > + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS))
> {
> > + struct flow_match_eth_addrs match;
> > +
> > + flow_rule_match_eth_addrs(rule, &match);
> > +
> > + if (!is_zero_ether_addr(match.mask->dst)) {
>
> Does ENETC support masked matching on MAC address? If not, you should
> error out if the mask is not ff:ff:ff:ff:ff:ff.

I get it. Thanks.

>
> > + ether_addr_copy(filter->sid.dst_mac, match.key->dst);
> > + filter->sid.filtertype = STREAMID_TYPE_NULL;
> > + }
> > +
> > + if (!is_zero_ether_addr(match.mask->src)) {
> > + ether_addr_copy(filter->sid.src_mac, match.key->src);
> > + filter->sid.filtertype = STREAMID_TYPE_SMAC;
> > + }
> > + } else {
> > + NL_SET_ERR_MSG_MOD(extack, "Unsupported, must
> ETH_ADDRS");
> > + return -EINVAL;
> > + }
> > +
> > + if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
> > + struct flow_match_vlan match;
> > +
> > + flow_rule_match_vlan(rule, &match);
> > + if (match.mask->vlan_priority) {
> > + if (match.mask->vlan_priority !=
> > + (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
> > + NL_SET_ERR_MSG_MOD(extack, "Only full mask is
> supported for VLAN priority");
> > + err = -EINVAL;
> > + goto free_filter;
> > + }
> > + }
> > +
> > + if (match.mask->vlan_tpid) {
> > + if (match.mask->vlan_tpid != VLAN_VID_MASK) {
>
> I'm pretty sure that vlan_tpid is the EtherType (0x8100, etc), and
> that you actually meant vlan_id.
>

Yes, I'll correct it.

> > --
> > 2.17.1
> >
>
> Thanks,
> -Vladimir

Br,
Po Liu

2020-04-22 03:10:39

by Po Liu

[permalink] [raw]
Subject: [v3,net-next 0/4] Introduce a flow gate control action and apply IEEE

Changes from V2:
0001: No changes.
0002: No changes.
0003: No changes.
0004: Fix the VLAN id filter parameter and add rejection of src MAC
masks other than ff:ff:ff:ff:ff:ff in the driver.

Changes from V1:
0000: Updated the description to make it clearer.
0001: Removed the 'add update dropped stats' patch; it will be provided
as standalone patches in a separate pull request.
0001: Updated the commit description to make it clearer, acked by Jiri Pirko.
0002: No changes.
0003: Fixed some code style issues, acked by Jiri Pirko.
0004: Fixed the enetc_psfp_enable/disable parameter type, reported by
the kernel test robot.

iproute2 command patches:
Not attached to this patch series; a separate pull request will be
provided after the kernel accepts these patches.

Changes from RFC:
0000: Reduced to 5 patches; removed the max frame size offload and the
flow metering from the policing offload action, keeping only the gate
action offloading implementation.
0001: No changes.
0002:
- Fix a kfree leak, acked by Jakub Kicinski and Cong Wang
- License fix from Jakub Kicinski and Stephen Hemminger
- Update the example in the commit message, acked by Vinicius Costa Gomes
- Fix the RCU protection in tcf_gate_act(), acked by Vinicius

0003: No changes
0004: No changes
0005:
Acked by Vinicius Costa Gomes
- Use the kernel refcount library
- Move the stream gate check code
- Rename the refcount variables to be clearer

iproute2 command patches:
0000: Update license expression and add gate id
0001: Add tc action gate man page

--------------------------------------------------------------------
These patches add stream gate action policing from IEEE 802.1Qci
(Per-Stream Filtering and Policing), with software support and hardware
offload support in tc flower, and implement the stream identification,
stream filtering and stream gate filtering actions in the NXP ENETC
ethernet driver.
Per-Stream Filtering and Policing (PSFP) specifies flow policing and
filtering for ingress flows, and has three main parts:
1. The stream filter instance table consists of an ordered list of
stream filters that determine the filtering and policing actions to be
applied to frames received on a specific stream. The main elements are
the stream gate id, flow metering id and maximum SDU size.
2. The stream gate function sets up a gate list to control the ingress
traffic class open/close state. While the gate is open the flow can
pass; frames are dropped while the gate is closed. The user sets a base
time to tell the gate when to start running the entry list, and the
hardware then repeats the list periodically. There is no comparable
qdisc action today.
3. Flow metering uses a two-rate, two-bucket, three-color marker to
police frames. Flow metering instances follow the algorithm specified
in MEF 10.3. The closest existing qdisc action is the police action.

The first patch introduces an ingress frame flow control gate action,
covering point 2. The tc gate action maintains the open/close gate
list, allowing flows to pass when the gate is open. Each gate action
may police one or more qdisc filters. Once the start time arrives, the
driver repeats the gate list periodically. The user can assign a base
time that has already passed; the driver then calculates a new future
start time from the cycle time of the gate list.

The 0002 patch introduces the gate flow hardware offloading.

The 0003 patch adds support for turning tc flower offloading on/off
via ethtool.

The 0004 patch implements the stream identification, stream filtering
and stream gate filtering actions in the NXP ENETC ethernet driver. The
tc filter command provides filtering keys with MAC address and VLAN id.
These keys are set in the stream identification instance entry. The
stream gate instance entry references the gate action parameters. The
stream filter instance entry references the stream gate index and
assigns a stream handle value matching the stream identification
instance.

Po Liu (4):
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSPF capability
net: enetc: add tc flower psfp offload driver

drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 159 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1082 +++++++++++++++++
include/net/flow_offload.h | 10 +
include/net/tc_act/tc_gate.h | 169 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 13 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 647 ++++++++++
net/sched/cls_api.c | 33 +
13 files changed, 2287 insertions(+), 1 deletion(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1

2020-04-22 03:10:49

by Po Liu

Subject: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Introduce an ingress frame gate control flow action.
The tc gate action works like this:
Assume there is a gate that allows specified ingress frames to pass
during some time slots and drops them during others. A tc filter
selects the ingress frames, and the tc gate action specifies in which
time slots these frames may be passed to the device and in which they
are dropped.
The tc gate action provides an entry list describing how long the gate
stays open and how long it stays closed. The gate action also assigns
a start time telling when the entry list starts. The driver then
repeats the gate entry list cyclically.
For the software simulation, the gate action requires the user to
assign a clock type.

Below is a configuration example from user space. The tc filter matches
a stream with source IP address 192.168.0.20, and the gate action owns
two time slots: one lasting 200ms with the gate open so frames pass, the
other lasting 100ms with the gate closed so frames are dropped. Once the
frames passed in a 200000000ns open slot total more than 8000000 bytes,
further frames in that slot are dropped.

> tc qdisc add dev eth0 ingress

> tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry open 200000000 -1 8000000 \
sched-entry close 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow with period nanosecond. Then next item is internal
priority value means which ingress queue should put. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.
Base-time is not set will be 0 as default, as result start time would
be ((N + 1) * cycletime) which is the minimal of future time.

The example below filters a stream with destination MAC address
10:00:80:00:00:00 and IP protocol ICMP, followed by the gate action. The
gate action runs with a single close time slot, i.e. it always stays
closed. The total cycle time is 200000000ns. The start time is
calculated from the base-time as:

1357000000000 + (N + 1) * cycletime

The first such value lying in the future becomes the start time. The
cycletime here is 200000000ns for this case.

> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry close 200000000 -1 -1 \
clockid CLOCK_TAI

Signed-off-by: Po Liu <[email protected]>
---
include/net/tc_act/tc_gate.h | 54 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 ++
net/sched/Kconfig | 13 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
6 files changed, 763 insertions(+)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
new file mode 100644
index 000000000000..b0ace55b2aaa
--- /dev/null
+++ b/include/net/tc_act/tc_gate.h
@@ -0,0 +1,54 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright 2020 NXP */
+
+#ifndef __NET_TC_GATE_H
+#define __NET_TC_GATE_H
+
+#include <net/act_api.h>
+#include <linux/tc_act/tc_gate.h>
+
+struct tcfg_gate_entry {
+ int index;
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+ struct list_head list;
+};
+
+struct tcf_gate_params {
+ s32 tcfg_priority;
+ u64 tcfg_basetime;
+ u64 tcfg_cycletime;
+ u64 tcfg_cycletime_ext;
+ u32 tcfg_flags;
+ s32 tcfg_clockid;
+ size_t num_entries;
+ struct list_head entries;
+};
+
+#define GATE_ACT_GATE_OPEN BIT(0)
+#define GATE_ACT_PENDING BIT(1)
+struct gate_action {
+ struct tcf_gate_params param;
+ spinlock_t entry_lock;
+ u8 current_gate_status;
+ ktime_t current_close_time;
+ u32 current_entry_octets;
+ s32 current_max_octets;
+ struct tcfg_gate_entry __rcu *next_entry;
+ struct hrtimer hitimer;
+ enum tk_offsets tk_offset;
+ struct rcu_head rcu;
+};
+
+struct tcf_gate {
+ struct tc_action common;
+ struct gate_action __rcu *actg;
+};
+#define to_gate(a) ((struct tcf_gate *)a)
+
+#define get_gate_param(act) ((struct tcf_gate_params *)act)
+#define get_gate_action(p) ((struct gate_action *)p)
+
+#endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 9f06d29cab70..fc672b232437 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -134,6 +134,7 @@ enum tca_id {
TCA_ID_CTINFO,
TCA_ID_MPLS,
TCA_ID_CT,
+ TCA_ID_GATE,
/* other actions go here */
__TCA_ID_MAX = 255
};
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 000000000000..f214b3a6d44f
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index bfbefb7bff9d..1314549c7567 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -981,6 +981,19 @@ config NET_ACT_CT
To compile this code as a module, choose M here: the
module will be called act_ct.

+config NET_ACT_GATE
+ tristate "Frame gate entry list control tc action"
+ depends on NET_CLS_ACT
+ help
+ Say Y here to allow to control the ingress flow to be passed at
+ specific time slot and be dropped at other specific time slot by
+ the gate entry list. The manipulation will simulate the IEEE
+ 802.1Qci stream gate control behavior.
+
+ If unsure, say N.
+ To compile this code as a module, choose M here: the
+ module will be called act_gate.
+
config NET_IFE_SKBMARK
tristate "Support to encoding decoding skb mark on IFE action"
depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 31c367a6cd09..66bbf9a98f9e 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
obj-$(CONFIG_NET_ACT_CT) += act_ct.o
+obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
new file mode 100644
index 000000000000..e932f402b4f1
--- /dev/null
+++ b/net/sched/act_gate.c
@@ -0,0 +1,647 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright 2020 NXP */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/act_api.h>
+#include <net/netlink.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>
+
+static unsigned int gate_net_id;
+static struct tc_action_ops act_gate_ops;
+
+static ktime_t gate_get_time(struct gate_action *gact)
+{
+ ktime_t mono = ktime_get();
+
+ switch (gact->tk_offset) {
+ case TK_OFFS_MAX:
+ return mono;
+ default:
+ return ktime_mono_to_any(mono, gact->tk_offset);
+ }
+
+ return KTIME_MAX;
+}
+
+static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
+{
+ struct tcf_gate_params *param = get_gate_param(gact);
+ ktime_t now, base, cycle;
+ u64 n;
+
+ base = ns_to_ktime(param->tcfg_basetime);
+ now = gate_get_time(gact);
+
+ if (ktime_after(base, now)) {
+ *start = base;
+ return 0;
+ }
+
+ cycle = param->tcfg_cycletime;
+
+ /* cycle time should not be zero */
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ *start = ktime_add_ns(base, (n + 1) * cycle);
+ return 0;
+}
+
+static void gate_start_timer(struct gate_action *gact, ktime_t start)
+{
+ ktime_t expires;
+
+ expires = hrtimer_get_expires(&gact->hitimer);
+ if (expires == 0)
+ expires = KTIME_MAX;
+
+ start = min_t(ktime_t, start, expires);
+
+ hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
+}
+
+static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
+{
+ struct gate_action *gact = container_of(timer, struct gate_action,
+ hitimer);
+ struct tcf_gate_params *p = get_gate_param(gact);
+ struct tcfg_gate_entry *next;
+ ktime_t close_time, now;
+
+ spin_lock(&gact->entry_lock);
+
+ next = rcu_dereference_protected(gact->next_entry,
+ lockdep_is_held(&gact->entry_lock));
+
+ /* cycle start, clear pending bit, clear total octets */
+ gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
+ gact->current_entry_octets = 0;
+ gact->current_max_octets = next->maxoctets;
+
+ gact->current_close_time = ktime_add_ns(gact->current_close_time,
+ next->interval);
+
+ close_time = gact->current_close_time;
+
+ if (list_is_last(&next->list, &p->entries))
+ next = list_first_entry(&p->entries,
+ struct tcfg_gate_entry, list);
+ else
+ next = list_next_entry(next, list);
+
+ now = gate_get_time(gact);
+
+ if (ktime_after(now, close_time)) {
+ ktime_t cycle, base;
+ u64 n;
+
+ cycle = p->tcfg_cycletime;
+ base = ns_to_ktime(p->tcfg_basetime);
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ close_time = ktime_add_ns(base, (n + 1) * cycle);
+ }
+
+ rcu_assign_pointer(gact->next_entry, next);
+ spin_unlock(&gact->entry_lock);
+
+ hrtimer_set_expires(&gact->hitimer, close_time);
+
+ return HRTIMER_RESTART;
+}
+
+static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
+ struct tcf_result *res)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct gate_action *gact;
+ int action;
+
+ tcf_lastuse_update(&g->tcf_tm);
+ bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
+
+ action = READ_ONCE(g->tcf_action);
+ rcu_read_lock();
+ gact = rcu_dereference_bh(g->actg);
+ if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
+ rcu_read_unlock();
+ return action;
+ }
+
+ if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
+ goto drop;
+
+ if (gact->current_max_octets >= 0) {
+ gact->current_entry_octets += qdisc_pkt_len(skb);
+ if (gact->current_entry_octets > gact->current_max_octets) {
+ qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
+ goto drop;
+ }
+ }
+ rcu_read_unlock();
+
+ return action;
+drop:
+ rcu_read_unlock();
+ qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
+ return TC_ACT_SHOT;
+}
+
+static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
+ [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
+ [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
+};
+
+static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
+ [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
+ .type = NLA_EXACT_LEN },
+ [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
+ [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
+ [TCA_GATE_FLAGS] = { .type = NLA_U32 },
+ [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
+};
+
+static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
+ struct netlink_ext_ack *extack)
+{
+ u32 interval = 0;
+
+ entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (interval == 0) {
+ NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
+ return -EINVAL;
+ }
+
+ entry->interval = interval;
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
+ else
+ entry->ipv = -1;
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+ else
+ entry->maxoctets = -1;
+
+ return 0;
+}
+
+static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
+ int index, struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
+ int err;
+
+ err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Could not parse nested entry");
+ return -EINVAL;
+ }
+
+ entry->index = index;
+
+ return fill_gate_entry(tb, entry, extack);
+}
+
+static int parse_gate_list(struct nlattr *list_attr,
+ struct tcf_gate_params *sched,
+ struct netlink_ext_ack *extack)
+{
+ struct tcfg_gate_entry *entry, *e;
+ struct nlattr *n;
+ int err, rem;
+ int i = 0;
+
+ if (!list_attr)
+ return -EINVAL;
+
+ nla_for_each_nested(n, list_attr, rem) {
+ if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
+ NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
+ continue;
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ NL_SET_ERR_MSG(extack, "Not enough memory for entry");
+ err = -ENOMEM;
+ goto release_list;
+ }
+
+ err = parse_gate_entry(n, entry, i, extack);
+ if (err < 0) {
+ kfree(entry);
+ goto release_list;
+ }
+
+ list_add_tail(&entry->list, &sched->entries);
+ i++;
+ }
+
+ sched->num_entries = i;
+
+ return i;
+
+release_list:
+ list_for_each_entry_safe(entry, e, &sched->entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+
+ return err;
+}
+
+static int tcf_gate_init(struct net *net, struct nlattr *nla,
+ struct nlattr *est, struct tc_action **a,
+ int ovr, int bind, bool rtnl_held,
+ struct tcf_proto *tp, u32 flags,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+ enum tk_offsets tk_offset = TK_OFFS_TAI;
+ struct nlattr *tb[TCA_GATE_MAX + 1];
+ struct tcf_chain *goto_ch = NULL;
+ struct tcfg_gate_entry *next;
+ struct tcf_gate_params *p;
+ struct gate_action *gact;
+ s32 clockid = CLOCK_TAI;
+ struct tc_gate *parm;
+ struct tcf_gate *g;
+ int ret = 0, err;
+ u64 basetime = 0;
+ u32 gflags = 0;
+ s32 prio = -1;
+ ktime_t start;
+ u32 index;
+
+ if (!nla)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_GATE_PARMS])
+ return -EINVAL;
+ parm = nla_data(tb[TCA_GATE_PARMS]);
+ index = parm->index;
+ err = tcf_idr_check_alloc(tn, &index, a, bind);
+ if (err < 0)
+ return err;
+
+ if (err && bind)
+ return 0;
+
+ if (!err) {
+ ret = tcf_idr_create_from_flags(tn, index, est, a,
+ &act_gate_ops, bind, flags);
+ if (ret) {
+ tcf_idr_cleanup(tn, index);
+ return ret;
+ }
+
+ ret = ACT_P_CREATED;
+ } else if (!ovr) {
+ tcf_idr_release(*a, bind);
+ return -EEXIST;
+ }
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (tb[TCA_GATE_BASE_TIME])
+ basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
+
+ if (tb[TCA_GATE_FLAGS])
+ gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
+
+ if (tb[TCA_GATE_CLOCKID]) {
+ clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ tk_offset = TK_OFFS_REAL;
+ break;
+ case CLOCK_MONOTONIC:
+ tk_offset = TK_OFFS_MAX;
+ break;
+ case CLOCK_BOOTTIME:
+ tk_offset = TK_OFFS_BOOT;
+ break;
+ case CLOCK_TAI:
+ tk_offset = TK_OFFS_TAI;
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+ goto release_idr;
+ }
+ }
+
+ err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+ if (err < 0)
+ goto release_idr;
+
+ g = to_gate(*a);
+
+ gact = kzalloc(sizeof(*gact), GFP_KERNEL);
+ if (!gact) {
+ err = -ENOMEM;
+ goto put_chain;
+ }
+
+ p = get_gate_param(gact);
+
+ INIT_LIST_HEAD(&p->entries);
+ if (tb[TCA_GATE_ENTRY_LIST]) {
+ err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
+ if (err < 0)
+ goto release_mem;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME]) {
+ p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
+ } else {
+ struct tcfg_gate_entry *entry;
+ ktime_t cycle = 0;
+
+ list_for_each_entry(entry, &p->entries, list)
+ cycle = ktime_add_ns(cycle, entry->interval);
+ p->tcfg_cycletime = cycle;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ p->tcfg_cycletime_ext =
+ nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ p->tcfg_priority = prio;
+ p->tcfg_basetime = basetime;
+ p->tcfg_clockid = clockid;
+ p->tcfg_flags = gflags;
+
+ gact->tk_offset = tk_offset;
+ spin_lock_init(&gact->entry_lock);
+ hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
+ gact->hitimer.function = gate_timer_func;
+
+ err = gate_get_start_time(gact, &start);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Internal error: failed get start time");
+ goto release_mem;
+ }
+
+ gact->current_close_time = start;
+ gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING;
+
+ next = list_first_entry(&p->entries, struct tcfg_gate_entry, list);
+ rcu_assign_pointer(gact->next_entry, next);
+
+ gate_start_timer(gact, start);
+
+ spin_lock_bh(&g->tcf_lock);
+ goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+ gact = rcu_replace_pointer(g->actg, gact,
+ lockdep_is_held(&g->tcf_lock));
+ spin_unlock_bh(&g->tcf_lock);
+
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+ if (gact)
+ kfree_rcu(gact, rcu);
+
+ if (ret == ACT_P_CREATED)
+ tcf_idr_insert(tn, *a);
+ return ret;
+
+release_mem:
+ kfree(gact);
+put_chain:
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+release_idr:
+ tcf_idr_release(*a, bind);
+ return err;
+}
+
+static void tcf_gate_cleanup(struct tc_action *a)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct tcfg_gate_entry *entry, *n;
+ struct tcf_gate_params *p;
+ struct gate_action *gact;
+
+ spin_lock_bh(&g->tcf_lock);
+ gact = rcu_dereference_protected(g->actg,
+ lockdep_is_held(&g->tcf_lock));
+ hrtimer_cancel(&gact->hitimer);
+
+ p = get_gate_param(gact);
+ list_for_each_entry_safe(entry, n, &p->entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+ spin_unlock_bh(&g->tcf_lock);
+
+ kfree_rcu(gact, rcu);
+}
+
+static int dumping_entry(struct sk_buff *skb,
+ struct tcfg_gate_entry *entry)
+{
+ struct nlattr *item;
+
+ item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
+ if (!item)
+ return -ENOSPC;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
+ goto nla_put_failure;
+
+ if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, item);
+
+nla_put_failure:
+ nla_nest_cancel(skb, item);
+ return -1;
+}
+
+static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
+ int bind, int ref)
+{
+ unsigned char *b = skb_tail_pointer(skb);
+ struct tcf_gate *g = to_gate(a);
+ struct tc_gate opt = {
+ .index = g->tcf_index,
+ .refcnt = refcount_read(&g->tcf_refcnt) - ref,
+ .bindcnt = atomic_read(&g->tcf_bindcnt) - bind,
+ };
+ struct tcfg_gate_entry *entry;
+ struct gate_action *gact;
+ struct tcf_gate_params *p;
+ struct nlattr *entry_list;
+ struct tcf_t t;
+
+ spin_lock_bh(&g->tcf_lock);
+ opt.action = g->tcf_action;
+ gact = rcu_dereference_protected(g->actg,
+ lockdep_is_held(&g->tcf_lock));
+
+ p = get_gate_param(gact);
+
+ if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
+ p->tcfg_basetime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
+ p->tcfg_cycletime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
+ p->tcfg_cycletime_ext, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
+ goto nla_put_failure;
+
+ entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
+ if (!entry_list)
+ goto nla_put_failure;
+
+ list_for_each_entry(entry, &p->entries, list) {
+ if (dumping_entry(skb, entry) < 0)
+ goto nla_put_failure;
+ }
+
+ nla_nest_end(skb, entry_list);
+
+ tcf_tm_dump(&t, &g->tcf_tm);
+ if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
+ goto nla_put_failure;
+ spin_unlock_bh(&g->tcf_lock);
+
+ return skb->len;
+
+nla_put_failure:
+ spin_unlock_bh(&g->tcf_lock);
+ nlmsg_trim(skb, b);
+ return -1;
+}
+
+static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
+ struct netlink_callback *cb, int type,
+ const struct tc_action_ops *ops,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+ u64 lastuse, bool hw)
+{
+ struct tcf_gate *g = to_gate(a);
+ struct tcf_t *tm = &g->tcf_tm;
+
+ tcf_action_update_stats(a, bytes, packets, false, hw);
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
+}
+
+static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_idr_search(tn, a, index);
+}
+
+static size_t tcf_gate_get_fill_size(const struct tc_action *act)
+{
+ return nla_total_size(sizeof(struct tc_gate));
+}
+
+static struct tc_action_ops act_gate_ops = {
+ .kind = "gate",
+ .id = TCA_ID_GATE,
+ .owner = THIS_MODULE,
+ .act = tcf_gate_act,
+ .dump = tcf_gate_dump,
+ .init = tcf_gate_init,
+ .cleanup = tcf_gate_cleanup,
+ .walk = tcf_gate_walker,
+ .stats_update = tcf_gate_stats_update,
+ .get_fill_size = tcf_gate_get_fill_size,
+ .lookup = tcf_gate_search,
+ .size = sizeof(struct gate_action),
+};
+
+static __net_init int gate_init_net(struct net *net)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tc_action_net_init(net, tn, &act_gate_ops);
+}
+
+static void __net_exit gate_exit_net(struct list_head *net_list)
+{
+ tc_action_net_exit(net_list, gate_net_id);
+}
+
+static struct pernet_operations gate_net_ops = {
+ .init = gate_init_net,
+ .exit_batch = gate_exit_net,
+ .id = &gate_net_id,
+ .size = sizeof(struct tc_action_net),
+};
+
+static int __init gate_init_module(void)
+{
+ return tcf_register_action(&act_gate_ops, &gate_net_ops);
+}
+
+static void __exit gate_cleanup_module(void)
+{
+ tcf_unregister_action(&act_gate_ops, &gate_net_ops);
+}
+
+module_init(gate_init_module);
+module_exit(gate_cleanup_module);
+MODULE_LICENSE("GPL v2");
--
2.17.1

2020-04-22 03:11:17

by Po Liu

Subject: [v3,net-next 2/4] net: schedule: add action gate offloading

Add the gate action to the flow action entry. Add the gate parameters
to tc_setup_flow_action(), filling them into the flow_action_entry
array provided to the driver.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 +++
include/net/tc_act/tc_gate.h | 115 +++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++
3 files changed, 158 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 3619c6acf60f..94a30fe02e6d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -255,6 +256,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index b0ace55b2aaa..62633cb02c7a 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -51,4 +58,112 @@ struct tcf_gate {
#define get_gate_param(act) ((struct tcf_gate_params *)act)
#define get_gate_action(p) ((struct gate_action *)p)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ rcu_read_lock();
+ tcfg_prio = rcu_dereference(to_gate(a)->actg)->param.tcfg_priority;
+ rcu_read_unlock();
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ rcu_read_lock();
+ tcfg_basetime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_basetime;
+ rcu_read_unlock();
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ rcu_read_lock();
+ tcfg_cycletime =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime;
+ rcu_read_unlock();
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ rcu_read_lock();
+ tcfg_cycletimeext =
+ rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime_ext;
+ rcu_read_unlock();
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ rcu_read_lock();
+ num_entries =
+ rcu_dereference(to_gate(a)->actg)->param.num_entries;
+ rcu_read_unlock();
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ rcu_read_lock();
+ p = &(rcu_dereference(to_gate(a)->actg)->param);
+ num_entries = p->num_entries;
+ rcu_read_unlock();
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries)
+ return NULL;
+
+ oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
+ if (!oe)
+ return NULL;
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 55bd1429678f..ca8bf74be4ba 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,6 +39,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3523,6 +3524,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3669,6 +3691,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_priority(act)) {
entry->id = FLOW_ACTION_PRIORITY;
entry->priority = tcf_skbedit_priority(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-04-22 03:11:31

by Po Liu

Subject: [v3,net-next 3/4] net: enetc: add hw tc offload features for PSFP capability

This patch lets ethtool enable/disable the tc flower offload features.
The ENETC hardware provides PSFP, which performs per-stream policing.
When the tc hw offloading feature is enabled, the driver enables the
IEEE 802.1Qci feature. This only sets the enable bit in the register
and reads out the capability limits of each sub-feature; it does not
program any per-stream filtering, stream gate or stream identification
entries.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 48 +++++++++++++++++++
.../net/ethernet/freescale/enetc/enetc_hw.h | 17 +++++++
.../net/ethernet/freescale/enetc/enetc_pf.c | 8 ++++
4 files changed, 96 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index ccf2611f4a20..04aac7cbb506 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -756,6 +756,9 @@ void enetc_get_si_caps(struct enetc_si *si)

if (val & ENETC_SIPCAPR0_QBV)
si->hw_features |= ENETC_SI_F_QBV;
+
+ if (val & ENETC_SIPCAPR0_PSFP)
+ si->hw_features |= ENETC_SI_F_PSFP;
}

static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
@@ -1567,6 +1570,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
return 0;
}

+static int enetc_set_psfp(struct net_device *ndev, int en)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ if (en) {
+ priv->active_offloads |= ENETC_F_QCI;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&priv->si->hw);
+ } else {
+ priv->active_offloads &= ~ENETC_F_QCI;
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+ enetc_psfp_disable(&priv->si->hw);
+ }
+
+ return 0;
+}
+
int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1575,6 +1595,9 @@ int enetc_set_features(struct net_device *ndev,
if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

+ if (changed & NETIF_F_HW_TC)
+ enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 56c43f35b633..2cfe877c3778 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -151,6 +151,7 @@ enum enetc_errata {
};

#define ENETC_SI_F_QBV BIT(0)
+#define ENETC_SI_F_PSFP BIT(1)

/* PCI IEP device data */
struct enetc_si {
@@ -203,12 +204,20 @@ struct enetc_cls_rule {
};

#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */
+struct psfp_cap {
+ u32 max_streamid;
+ u32 max_psfp_filter;
+ u32 max_psfp_gate;
+ u32 max_psfp_gatelist;
+ u32 max_psfp_meter;
+};

/* TODO: more hardware offloads */
enum enetc_active_offloads {
ENETC_F_RX_TSTAMP = BIT(0),
ENETC_F_TX_TSTAMP = BIT(1),
ENETC_F_QBV = BIT(2),
+ ENETC_F_QCI = BIT(3),
};

struct enetc_ndev_priv {
@@ -231,6 +240,8 @@ struct enetc_ndev_priv {

struct enetc_cls_rule *cls_rules;

+ struct psfp_cap psfp_cap;
+
struct device_node *phy_node;
phy_interface_t if_mode;
};
@@ -289,9 +300,46 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+
+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
+{
+ u32 reg;
+
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
+ /* Port stream filter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
+ /* Port stream gate capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
+ /* Port flow meter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
+}
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
+ ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
+ ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
+ ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
+ ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+}
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_get_max_cap(p) \
+ memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
+
+#define enetc_psfp_enable(hw) (void)0
+#define enetc_psfp_disable(hw) (void)0
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 2a6523136947..587974862f48 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -19,6 +19,7 @@
#define ENETC_SICTR1 0x1c
#define ENETC_SIPCAPR0 0x20
#define ENETC_SIPCAPR0_QBV BIT(4)
+#define ENETC_SIPCAPR0_PSFP BIT(9)
#define ENETC_SIPCAPR0_RSS BIT(8)
#define ENETC_SIPCAPR1 0x24
#define ENETC_SITGTGR 0x30
@@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11))
#define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1))
#define ENETC_PM0_IFM_XGMII BIT(12)
+#define ENETC_PSIDCAPR 0x1b08
+#define ENETC_PSIDCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSFCAPR 0x1b18
+#define ENETC_PSFCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSGCAPR 0x1b28
+#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16)
+#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0)
+#define ENETC_PFMCAPR 0x1b38
+#define ENETC_PFMCAPR_MSK GENMASK(15, 0)

/* MAC counters */
#define ENETC_PM0_REOCT 0x8100
@@ -621,3 +631,10 @@ struct enetc_cbd {
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
#define ENETC_TSDE BIT(31)
+
+/* PSFP setting */
+#define ENETC_PPSFPMR 0x11b00
+#define ENETC_PPSFPMR_PSFPEN BIT(0)
+#define ENETC_PPSFPMR_VS BIT(1)
+#define ENETC_PPSFPMR_PVC BIT(2)
+#define ENETC_PPSFPMR_PVZC BIT(3)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index de1ad4975074..cef9fbfdb056 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -727,6 +727,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

+ if (si->hw_features & ENETC_SI_F_PSFP) {
+ priv->active_offloads |= ENETC_F_QCI;
+ ndev->features |= NETIF_F_HW_TC;
+ ndev->hw_features |= NETIF_F_HW_TC;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&si->hw);
+ }
+
/* pick up primary MAC address from SI */
enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
}
--
2.17.1

2020-04-22 03:14:28

by Po Liu

[permalink] [raw]
Subject: [v3,net-next 4/4] net: enetc: add tc flower psfp offload driver

This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
function. Four main feature blocks implement flow policing and
filtering for ingress flows with IEEE 802.1Qci: stream identification
(defined in IEEE P802.1CB, but required by 802.1Qci), stream
filtering, stream gating and flow metering. Each block contains many
entries, addressed by index, that hold the parameters.

A frame is first matched by the stream identification block and then
enters the stream filter block through the handle shared between the
two. The stream filter entry selects the stream gate entry, so the
frame is policed by the gate and optionally limited by the max SDU
configured in the filter entry. Finally it is policed by the flow
meter block, whose index is also chosen in the filter entry. An entry
of a lower block may therefore be linked to many upper entries:
assigning the same index to several streams lets them share the same
stream filter, stream gate or flow meter.

To implement this, each stream matched by source/destination MAC
address (for some streams, plus a VLAN id) is treated as one flow
chain, identified by the chain_index that already exists in the tc
filter concept. The driver maintains this chain together with the
gate entries. A stream filter entry is created from the gate index,
an optional flow meter entry id and a priority value. Offloading only
transfers the gate action and the flow matching parameters; the
driver creates one stream filter entry (or reuses an existing one
with the same gate id, flow meter id and priority) and programs it to
hardware, so stream filtering does not need to be transferred by
action offloading. This architecture mirrors the relationship between
tc filters and actions: tc filters maintain a list per flow keyed by
the match keys, while actions are maintained in the action list.

Below is an example of the tc commands:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry OPEN 200000000 1 8000000 \
sched-entry CLOSE 100000000 -1 -1

The command sets dst_mac 10:00:80:00:00:00 into entry 11 of the
stream identification module and selects entry 10 of the stream gate
module. The gate list keeps the gate OPEN for 200 ms, passing frames
to ingress queue 1 with a limit of 8 Mbytes of octets, then keeps it
CLOSED for 100 ms.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 46 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
.../net/ethernet/freescale/enetc/enetc_qos.c | 1082 +++++++++++++++++
5 files changed, 1284 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 04aac7cbb506..298c55786fd9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1521,6 +1521,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
return enetc_setup_tc_cbs(ndev, type_data);
case TC_SETUP_QDISC_ETF:
return enetc_setup_tc_txtime(ndev, type_data);
+ case TC_SETUP_BLOCK:
+ return enetc_setup_tc_psfp(ndev, type_data);
default:
return -EOPNOTSUPP;
}
@@ -1573,17 +1575,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static int enetc_set_psfp(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ int err;

if (en) {
+ err = enetc_psfp_enable(priv);
+ if (err)
+ return err;
+
priv->active_offloads |= ENETC_F_QCI;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&priv->si->hw);
- } else {
- priv->active_offloads &= ~ENETC_F_QCI;
- memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
- enetc_psfp_disable(&priv->si->hw);
+ return 0;
}

+ err = enetc_psfp_disable(priv);
+ if (err)
+ return err;
+
+ priv->active_offloads &= ~ENETC_F_QCI;
+
return 0;
}

@@ -1591,14 +1599,15 @@ int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
netdev_features_t changed = ndev->features ^ features;
+ int err = 0;

if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

if (changed & NETIF_F_HW_TC)
- enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+ err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));

- return 0;
+ return err;
}

#ifdef CONFIG_FSL_ENETC_PTP_CLOCK
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 2cfe877c3778..b705464f6882 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -300,6 +300,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv);
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
+int enetc_psfp_init(struct enetc_ndev_priv *priv);
+int enetc_psfp_clean(struct enetc_ndev_priv *priv);

static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
@@ -319,27 +324,60 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ enetc_get_max_cap(priv);
+
+ err = enetc_psfp_init(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+
+ return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ err = enetc_psfp_clean(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+
+ return 0;
}
+
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_block_cb NULL
+
#define enetc_get_max_cap(p) \
memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))

-#define enetc_psfp_enable(hw) (void)0
-#define enetc_psfp_disable(hw) (void)0
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
+
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 587974862f48..6314051bc6c1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -567,6 +567,9 @@ enum bdcr_cmd_class {
BDCR_CMD_RFS,
BDCR_CMD_PORT_GCL,
BDCR_CMD_RECV_CLASSIFIER,
+ BDCR_CMD_STREAM_IDENTIFY,
+ BDCR_CMD_STREAM_FILTER,
+ BDCR_CMD_STREAM_GCL,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -598,13 +601,152 @@ struct tgs_gcl_data {
struct gce entry[];
};

+/* class 7, command 0, Stream Identity Entry Configuration */
+struct streamid_conf {
+ __le32 stream_handle; /* init gate value */
+ __le32 iports;
+ u8 id_type;
+ u8 oui[3];
+ u8 res[3];
+ u8 en;
+};
+
+#define ENETC_CBDR_SID_VID_MASK 0xfff
+#define ENETC_CBDR_SID_VIDM BIT(12)
+#define ENETC_CBDR_SID_TG_MASK 0xc000
+/* streamid_conf address point to this data space */
+struct streamid_data {
+ union {
+ u8 dmac[6];
+ u8 smac[6];
+ };
+ u16 vid_vidm_tg;
+};
+
+#define ENETC_CBDR_SFI_PRI_MASK 0x7
+#define ENETC_CBDR_SFI_PRIM BIT(3)
+#define ENETC_CBDR_SFI_BLOV BIT(4)
+#define ENETC_CBDR_SFI_BLEN BIT(5)
+#define ENETC_CBDR_SFI_MSDUEN BIT(6)
+#define ENETC_CBDR_SFI_FMITEN BIT(7)
+#define ENETC_CBDR_SFI_ENABLE BIT(7)
+/* class 8, command 0, Stream Filter Instance, Short Format */
+struct sfi_conf {
+ __le32 stream_handle;
+ u8 multi;
+ u8 res[2];
+ u8 sthm;
+ /* Max Service Data Unit or Flow Meter Instance Table index.
+ * Depending on the value of FLT this represents either Max
+ * Service Data Unit (max frame size) allowed by the filter
+ * entry or is an index into the Flow Meter Instance table
+ * index identifying the policer which will be used to police
+ * it.
+ */
+ __le16 fm_inst_table_index;
+ __le16 msdu;
+ __le16 sg_inst_table_index;
+ u8 res1[2];
+ __le32 input_ports;
+ u8 res2[3];
+ u8 en;
+};
+
+/* class 8, command 2 stream Filter Instance status query short format
+ * command no need structure define
+ * Stream Filter Instance Query Statistics Response data
+ */
+struct sfi_counter_data {
+ u32 matchl;
+ u32 matchh;
+ u32 msdu_dropl;
+ u32 msdu_droph;
+ u32 stream_gate_dropl;
+ u32 stream_gate_droph;
+ u32 flow_meter_dropl;
+ u32 flow_meter_droph;
+};
+
+#define ENETC_CBDR_SGI_OIPV_MASK 0x7
+#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_CGTST BIT(6)
+#define ENETC_CBDR_SGI_OGTST BIT(7)
+#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
+#define ENETC_CBDR_SGI_CFG_PND BIT(2)
+#define ENETC_CBDR_SGI_OEX BIT(4)
+#define ENETC_CBDR_SGI_OEXEN BIT(5)
+#define ENETC_CBDR_SGI_IRX BIT(6)
+#define ENETC_CBDR_SGI_IRXEN BIT(7)
+#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
+#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
+#define ENETC_CBDR_SGI_EN BIT(7)
+/* class 9, command 0, Stream Gate Instance Table, Short Format
+ * class 9, command 2, Stream Gate Instance Table entry query write back
+ * Short Format
+ */
+struct sgi_table {
+ u8 res[8];
+ u8 oipv;
+ u8 res0[2];
+ u8 ocgtst;
+ u8 res1[7];
+ u8 gset;
+ u8 oacl_len;
+ u8 res2[2];
+ u8 en;
+};
+
+#define ENETC_CBDR_SGI_AIPV_MASK 0x7
+#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_AGTST BIT(7)
+
+/* class 9, command 1, Stream Gate Control List, Long Format */
+struct sgcl_conf {
+ u8 aipv;
+ u8 res[2];
+ u8 agtst;
+ u8 res1[4];
+ union {
+ struct {
+ u8 res2[4];
+ u8 acl_len;
+ u8 res3[3];
+ };
+ u8 cct[8]; /* Config change time */
+ };
+};
+
+#define ENETC_CBDR_SGL_IOMEN BIT(0)
+#define ENETC_CBDR_SGL_IPVEN BIT(3)
+#define ENETC_CBDR_SGL_GTST BIT(4)
+#define ENETC_CBDR_SGL_IPV_MASK 0xe
+/* Stream Gate Control List Entry */
+struct sgce {
+ u32 interval;
+ u8 msdu[3];
+ u8 multi;
+};
+
+/* stream control list class 9 , cmd 1 data buffer */
+struct sgcl_data {
+ u32 btl;
+ u32 bth;
+ u32 ct;
+ u32 cte;
+ struct sgce sgcl[0];
+};
+
struct enetc_cbd {
union{
+ struct sfi_conf sfi_conf;
+ struct sgi_table sgi_table;
struct {
__le32 addr[2];
union {
__le32 opt[4];
struct tgs_gcl_conf gcl_conf;
+ struct streamid_conf sid_set;
+ struct sgcl_conf sgcl_conf;
};
}; /* Long format */
__le32 data[6];
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index cef9fbfdb056..824d211ec00f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -727,12 +727,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

- if (si->hw_features & ENETC_SI_F_PSFP) {
+ if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
priv->active_offloads |= ENETC_F_QCI;
ndev->features |= NETIF_F_HW_TC;
ndev->hw_features |= NETIF_F_HW_TC;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&si->hw);
}

/* pick up primary MAC address from SI */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 0c6bf3a55a9a..609cb5752b47 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -5,6 +5,9 @@

#include <net/pkt_sched.h>
#include <linux/math64.h>
+#include <linux/refcount.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>

static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
@@ -331,3 +334,1082 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)

return 0;
}
+
+enum streamid_type {
+ STREAMID_TYPE_RESERVED = 0,
+ STREAMID_TYPE_NULL,
+ STREAMID_TYPE_SMAC,
+};
+
+enum streamid_vlan_tagged {
+ STREAMID_VLAN_RESERVED = 0,
+ STREAMID_VLAN_TAGGED,
+ STREAMID_VLAN_UNTAGGED,
+ STREAMID_VLAN_ALL,
+};
+
+#define ENETC_PSFP_WILDCARD -1
+#define HANDLE_OFFSET 100
+
+enum forward_type {
+ FILTER_ACTION_TYPE_PSFP = BIT(0),
+ FILTER_ACTION_TYPE_ACL = BIT(1),
+ FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
+};
+
+/* This is for limit output type for input actions */
+struct actions_fwd {
+ u64 actions;
+ u64 keys; /* include the must needed keys */
+ enum forward_type output;
+};
+
+struct psfp_streamfilter_counters {
+ u64 matching_frames_count;
+ u64 passing_frames_count;
+ u64 not_passing_frames_count;
+ u64 passing_sdu_count;
+ u64 not_passing_sdu_count;
+ u64 red_frames_count;
+};
+
+struct enetc_streamid {
+ u32 index;
+ union {
+ u8 src_mac[6];
+ u8 dst_mac[6];
+ };
+ u8 filtertype;
+ u16 vid;
+ u8 tagged;
+ s32 handle;
+};
+
+struct enetc_psfp_filter {
+ u32 index;
+ s32 handle;
+ s8 prio;
+ u32 gate_id;
+ s32 meter_id;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+struct enetc_psfp_gate {
+ u32 index;
+ s8 init_ipv;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimext;
+ u32 num_entries;
+ refcount_t refcount;
+ struct hlist_node node;
+ struct action_gate_entry entries[0];
+};
+
+struct enetc_stream_filter {
+ struct enetc_streamid sid;
+ u32 sfi_index;
+ u32 sgi_index;
+ struct flow_stats stats;
+ struct hlist_node node;
+};
+
+struct enetc_psfp {
+ unsigned long dev_bitmap;
+ unsigned long *psfp_sfi_bitmap;
+ struct hlist_head stream_list;
+ struct hlist_head psfp_filter_list;
+ struct hlist_head psfp_gate_list;
+ spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
+};
+
+struct actions_fwd enetc_act_fwd[] = {
+ {
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
+ /* example for ACL actions */
+ {
+ BIT(FLOW_ACTION_DROP),
+ 0,
+ FILTER_ACTION_TYPE_ACL
+ }
+};
+
+static struct enetc_psfp epsfp = {
+ .psfp_sfi_bitmap = NULL,
+};
+
+static LIST_HEAD(enetc_block_cb_list);
+
+static inline int enetc_get_port(struct enetc_ndev_priv *priv)
+{
+ return priv->si->pdev->devfn & 0x7;
+}
+
+/* Stream Identity Entry Set Descriptor */
+static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct streamid_data *si_data;
+ struct streamid_conf *si_conf;
+ u16 data_size;
+ dma_addr_t dma;
+ int err;
+
+ if (sid->index >= priv->psfp_cap.max_streamid)
+ return -EINVAL;
+
+ if (sid->filtertype != STREAMID_TYPE_NULL &&
+ sid->filtertype != STREAMID_TYPE_SMAC)
+ return -EOPNOTSUPP;
+
+ /* Disable operation before enable */
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct streamid_data);
+ si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!si_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev, si_data,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(si_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+ memset(si_data->dmac, 0xff, ETH_ALEN);
+ si_data->vid_vidm_tg =
+ cpu_to_le16(ENETC_CBDR_SID_VID_MASK
+ + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
+
+ si_conf = &cbd.sid_set;
+ /* Only one port supported for one entry, set itself */
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = 1;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ if (!enable) {
+ kfree(si_data);
+ return 0;
+ }
+
+ /* Enable the entry and overwrite it again, in case the space was flushed by hardware */
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ si_conf->en = 0x80;
+ si_conf->stream_handle = cpu_to_le32(sid->handle);
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = sid->filtertype;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ memset(si_data, 0, data_size);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ /* VIDM default to be 1.
+ * VID Match. If set (b1) then the VID must match, otherwise
+ * any VID is considered a match. VIDM setting is only used
+ * when TG is set to b01.
+ */
+ if (si_conf->id_type == STREAMID_TYPE_NULL) {
+ ether_addr_copy(si_data->dmac, sid->dst_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
+ ether_addr_copy(si_data->smac, sid->src_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ kfree(si_data);
+
+ return err;
+}
+
+/* Stream Filter Instance Set Descriptor */
+static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_filter *sfi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct sfi_conf *sfi_config;
+
+ cbd.index = cpu_to_le16(sfi->index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0x80;
+ cbd.length = cpu_to_le16(1);
+
+ sfi_config = &cbd.sfi_conf;
+ if (!enable)
+ goto exit;
+
+ sfi_config->en = 0x80;
+
+ if (sfi->handle >= 0) {
+ sfi_config->stream_handle =
+ cpu_to_le32(sfi->handle);
+ sfi_config->sthm |= 0x80;
+ }
+
+ sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
+ sfi_config->input_ports = 1 << enetc_get_port(priv);
+
+ /* The priority value which may be matched against the
+ * frame’s priority value to determine a match for this entry.
+ */
+ if (sfi->prio >= 0)
+ sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
+
+ /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
+ * field as being either an MSDU value or an index into the Flow
+ * Meter Instance table.
+ * TODO: no limit max sdu
+ */
+
+ if (sfi->meter_id >= 0) {
+ sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
+ sfi_config->multi |= 0x80;
+ }
+
+exit:
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
+static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
+ u32 index,
+ struct psfp_streamfilter_counters *cnt)
+{
+ struct enetc_cbd cbd = { .cmd = 2 };
+ struct sfi_counter_data *data_buf;
+ dma_addr_t dma;
+ u16 data_size;
+ int err;
+
+ cbd.index = cpu_to_le16((u16)index);
+ cbd.cmd = 2;
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct sfi_counter_data);
+ data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ dma = dma_map_single(&priv->si->pdev->dev, data_buf,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ err = -ENOMEM;
+ goto exit;
+ }
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ goto exit;
+
+ cnt->matching_frames_count =
+ ((u64)le32_to_cpu(data_buf->matchh) << 32)
+ + data_buf->matchl;
+
+ cnt->not_passing_sdu_count =
+ ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
+ + data_buf->msdu_dropl;
+
+ cnt->passing_sdu_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count;
+
+ cnt->not_passing_frames_count =
+ ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
+ + le32_to_cpu(data_buf->stream_gate_dropl);
+
+ cnt->passing_frames_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count
+ - cnt->not_passing_frames_count;
+
+ cnt->red_frames_count =
+ ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
+ + le32_to_cpu(data_buf->flow_meter_dropl);
+
+exit:
+ kfree(data_buf);
+ return err;
+}
+
+static int get_start_ns(struct enetc_ndev_priv *priv, u64 cycle, u64 *start)
+{
+ u64 now_lo, now_hi, now, n;
+
+ now_lo = enetc_rd(&priv->si->hw, ENETC_SICTR0);
+ now_hi = enetc_rd(&priv->si->hw, ENETC_SICTR1);
+ now = now_lo | now_hi << 32;
+
+ if (WARN_ON(!cycle))
+ return -EFAULT;
+
+ n = div64_u64(now, cycle);
+
+ *start = (n + 1) * cycle;
+
+ return 0;
+}
+
+/* Stream Gate Instance Set Descriptor */
+static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_gate *sgi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct sgi_table *sgi_config;
+ struct sgcl_conf *sgcl_config;
+ struct sgcl_data *sgcl_data;
+ struct sgce *sgce;
+ dma_addr_t dma;
+ u16 data_size;
+ int err, i;
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0x80;
+
+ /* disable */
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ if (!sgi->num_entries)
+ return 0;
+
+ if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
+ !sgi->cycletime)
+ return -EINVAL;
+
+ /* enable */
+ sgi_config = &cbd.sgi_table;
+
+ /* Keep open before gate list start */
+ sgi_config->ocgtst = 0x80;
+
+ sgi_config->oipv = (sgi->init_ipv < 0) ?
+ 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
+
+ sgi_config->en = 0x80;
+
+ /* Basic config */
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 1;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0;
+
+ sgcl_config = &cbd.sgcl_conf;
+
+ sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
+
+ data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
+
+ sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!sgcl_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev,
+ sgcl_data, data_size,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(sgcl_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ sgce = &sgcl_data->sgcl[0];
+
+ sgcl_config->agtst = 0x80;
+
+ sgcl_data->ct = cpu_to_le32(sgi->cycletime);
+ sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
+
+ if (sgi->init_ipv >= 0)
+ sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
+
+ for (i = 0; i < sgi->num_entries; i++) {
+ struct action_gate_entry *from = &sgi->entries[i];
+ struct sgce *to = &sgce[i];
+
+ if (from->gate_state)
+ to->multi |= 0x10;
+
+ if (from->ipv >= 0)
+ to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
+
+ if (from->maxoctets)
+ to->multi |= 0x01;
+
+ to->interval = cpu_to_le32(from->interval);
+ to->msdu[0] = from->maxoctets & 0xFF;
+ to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
+ to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
+ }
+
+ /* If basetime is 0, calculate start time */
+ if (!sgi->basetime) {
+ u64 start;
+
+ err = get_start_ns(priv, sgi->cycletime, &start);
+ if (err)
+ goto exit;
+ sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
+ sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
+ } else {
+ u32 hi, lo;
+
+ hi = upper_32_bits(sgi->basetime);
+ lo = lower_32_bits(sgi->basetime);
+ sgcl_data->bth = cpu_to_le32(hi);
+ sgcl_data->btl = cpu_to_le32(lo);
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+
+exit:
+ kfree(sgcl_data);
+
+ return err;
+}
+
+static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
+{
+ struct enetc_stream_filter *f;
+
+ hlist_for_each_entry(f, &epsfp.stream_list, node)
+ if (f->sid.index == index)
+ return f;
+
+ return NULL;
+}
+
+static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
+{
+ struct enetc_psfp_gate *g;
+
+ hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
+ if (g->index == index)
+ return g;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->index == index)
+ return s;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter
+ *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->gate_id == sfi->gate_id &&
+ s->prio == sfi->prio &&
+ s->meter_id == sfi->meter_id)
+ return s;
+
+ return NULL;
+}
+
+static int enetc_get_free_index(struct enetc_ndev_priv *priv)
+{
+ u32 max_size = priv->psfp_cap.max_psfp_filter;
+ unsigned long index;
+
+ index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
+ if (index == max_size)
+ return -1;
+
+ return index;
+}
+
+static void stream_filter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_filter *sfi;
+ u8 z;
+
+ sfi = enetc_get_filter_by_index(index);
+ WARN_ON(!sfi);
+ z = refcount_dec_and_test(&sfi->refcount);
+
+ if (z) {
+ enetc_streamfilter_hw_set(priv, sfi, false);
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ }
+}
+
+static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_gate *sgi;
+ u8 z;
+
+ sgi = enetc_get_gate_by_index(index);
+ WARN_ON(!sgi);
+ z = refcount_dec_and_test(&sgi->refcount);
+ if (z) {
+ enetc_streamgate_hw_set(priv, sgi, false);
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void remove_one_chain(struct enetc_ndev_priv *priv,
+ struct enetc_stream_filter *filter)
+{
+ stream_gate_unref(priv, filter->sgi_index);
+ stream_filter_unref(priv, filter->sfi_index);
+
+ hlist_del(&filter->node);
+ kfree(filter);
+}
+
+static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ struct enetc_psfp_filter *sfi,
+ struct enetc_psfp_gate *sgi)
+{
+ int err;
+
+ err = enetc_streamid_hw_set(priv, sid, true);
+ if (err)
+ return err;
+
+ if (sfi) {
+ err = enetc_streamfilter_hw_set(priv, sfi, true);
+ if (err)
+ goto revert_sid;
+ }
+
+ err = enetc_streamgate_hw_set(priv, sgi, true);
+ if (err)
+ goto revert_sfi;
+
+ return 0;
+
+revert_sfi:
+ if (sfi)
+ enetc_streamfilter_hw_set(priv, sfi, false);
+revert_sid:
+ enetc_streamid_hw_set(priv, sid, false);
+ return err;
+}
+
+struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
+ if (acts == enetc_act_fwd[i].actions &&
+ inputkeys & enetc_act_fwd[i].keys)
+ return &enetc_act_fwd[i];
+
+ return NULL;
+}
+
+static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_filter *sfi, *old_sfi;
+ struct enetc_psfp_gate *sgi, *old_sgi;
+ struct flow_action_entry *entry;
+ struct action_gate_entry *e;
+ u8 sfi_overwrite = 0;
+ int entries_size;
+ int i, err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ flow_action_for_each(i, entry, &rule->action)
+ if (entry->id == FLOW_ACTION_GATE)
+ break;
+
+ if (entry->id != FLOW_ACTION_GATE)
+ return -EINVAL;
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ filter->sid.index = f->common.chain_index;
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_match_eth_addrs match;
+
+ flow_rule_match_eth_addrs(rule, &match);
+
+ if (!is_zero_ether_addr(match.mask->dst) &&
+ !is_zero_ether_addr(match.mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Cannot match on both source and destination MAC");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ if (!is_zero_ether_addr(match.mask->dst)) {
+ ether_addr_copy(filter->sid.dst_mac, match.key->dst);
+ filter->sid.filtertype = STREAMID_TYPE_NULL;
+ }
+
+ if (!is_zero_ether_addr(match.mask->src)) {
+ if (!is_broadcast_ether_addr(match.mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Masked matching on source MAC not supported");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ ether_addr_copy(filter->sid.src_mac, match.key->src);
+ filter->sid.filtertype = STREAMID_TYPE_SMAC;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported, must include ETH_ADDRS");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_match_vlan match;
+
+ flow_rule_match_vlan(rule, &match);
+ if (match.mask->vlan_priority) {
+ if (match.mask->vlan_priority !=
+ (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ }
+
+ if (match.mask->vlan_id) {
+ if (match.mask->vlan_id != VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ filter->sid.vid = match.key->vlan_id;
+ if (!filter->sid.vid)
+ filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
+ else
+ filter->sid.tagged = STREAMID_VLAN_TAGGED;
+ }
+ } else {
+ filter->sid.tagged = STREAMID_VLAN_ALL;
+ }
+
+ /* parsing gate action */
+ if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ sgi = kzalloc(entries_size, GFP_KERNEL);
+ if (!sgi) {
+ err = -ENOMEM;
+ goto free_filter;
+ }
+
+ refcount_set(&sgi->refcount, 1);
+ sgi->index = entry->gate.index;
+ sgi->init_ipv = entry->gate.prio;
+ sgi->basetime = entry->gate.basetime;
+ sgi->cycletime = entry->gate.cycletime;
+ sgi->num_entries = entry->gate.num_entries;
+
+ e = sgi->entries;
+ for (i = 0; i < entry->gate.num_entries; i++) {
+ e[i].gate_state = entry->gate.entries[i].gate_state;
+ e[i].interval = entry->gate.entries[i].interval;
+ e[i].ipv = entry->gate.entries[i].ipv;
+ e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ }
+
+ filter->sgi_index = sgi->index;
+
+ sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
+ if (!sfi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+
+ refcount_set(&sfi->refcount, 1);
+ sfi->gate_id = sgi->index;
+
+ /* flow meter not supported yet */
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+
+ /* prio refers to the filter prio */
+ if (f->common.prio && f->common.prio <= BIT(3))
+ sfi->prio = f->common.prio - 1;
+ else
+ sfi->prio = ENETC_PSFP_WILDCARD;
+
+ old_sfi = enetc_psfp_check_sfi(sfi);
+ if (!old_sfi) {
+ int index;
+
+ index = enetc_get_free_index(priv);
+ if (index < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
+ err = -ENOSPC;
+ goto free_sfi;
+ }
+
+ sfi->index = index;
+ sfi->handle = index + HANDLE_OFFSET;
+ /* Update the stream filter handle also */
+ filter->sid.handle = sfi->handle;
+ filter->sfi_index = sfi->index;
+ sfi_overwrite = 0;
+ } else {
+ filter->sfi_index = old_sfi->index;
+ filter->sid.handle = old_sfi->handle;
+ sfi_overwrite = 1;
+ }
+
+ err = enetc_psfp_hw_set(priv, &filter->sid,
+ sfi_overwrite ? NULL : sfi, sgi);
+ if (err)
+ goto free_sfi;
+
+ spin_lock(&epsfp.psfp_lock);
+ /* Remove the old node if it exists and replace it with the new node */
+ old_sgi = enetc_get_gate_by_index(filter->sgi_index);
+ if (old_sgi) {
+ refcount_set(&sgi->refcount,
+ refcount_read(&old_sgi->refcount) + 1);
+ hlist_del(&old_sgi->node);
+ kfree(old_sgi);
+ }
+
+ hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
+
+ if (!old_sfi) {
+ hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
+ set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ } else {
+ kfree(sfi);
+ refcount_inc(&old_sfi->refcount);
+ }
+
+ old_filter = enetc_get_stream_by_index(filter->sid.index);
+ if (old_filter)
+ remove_one_chain(priv, old_filter);
+
+ filter->stats.lastused = jiffies;
+ hlist_add_head(&filter->node, &epsfp.stream_list);
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ return 0;
+
+free_sfi:
+ kfree(sfi);
+free_gate:
+ kfree(sgi);
+free_filter:
+ kfree(filter);
+
+ return err;
+}
+
+static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct flow_dissector *dissector = rule->match.dissector;
+ struct flow_action *action = &rule->action;
+ struct flow_action_entry *entry;
+ struct actions_fwd *fwd;
+ u64 actions = 0;
+ int i, err;
+
+ if (!flow_action_has_entries(action)) {
+ NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
+ return -EINVAL;
+ }
+
+ flow_action_for_each(i, entry, action)
+ actions |= BIT(entry->id);
+
+ fwd = enetc_check_flow_actions(actions, dissector->used_keys);
+ if (!fwd) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
+ return -EOPNOTSUPP;
+ }
+
+ if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
+ err = enetc_psfp_parse_clsflower(priv, cls_flower);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
+ return err;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct enetc_stream_filter *filter;
+ struct netlink_ext_ack *extack = f->common.extack;
+ int err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamid_hw_set(priv, &filter->sid, false);
+ if (err)
+ return err;
+
+ remove_one_chain(priv, filter);
+
+ return 0;
+}
+
+static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ return enetc_psfp_destroy_clsflower(priv, f);
+}
+
+static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct psfp_streamfilter_counters counters = {};
+ struct enetc_stream_filter *filter;
+ struct flow_stats stats = {};
+ int err;
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
+ if (err)
+ return -EINVAL;
+
+ spin_lock(&epsfp.psfp_lock);
+ stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.lastused = filter->stats.lastused;
+ filter->stats.pkts += stats.pkts;
+ spin_unlock(&epsfp.psfp_lock);
+
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
+ FLOW_ACTION_HW_STATS_DELAYED);
+
+ return 0;
+}
+
+static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case FLOW_CLS_REPLACE:
+ return enetc_config_clsflower(priv, cls_flower);
+ case FLOW_CLS_DESTROY:
+ return enetc_destroy_clsflower(priv, cls_flower);
+ case FLOW_CLS_STATS:
+ return enetc_psfp_get_stats(priv, cls_flower);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static inline void clean_psfp_sfi_bitmap(void)
+{
+ bitmap_free(epsfp.psfp_sfi_bitmap);
+ epsfp.psfp_sfi_bitmap = NULL;
+}
+
+static void clean_stream_list(void)
+{
+ struct enetc_stream_filter *s;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
+ hlist_del(&s->node);
+ kfree(s);
+ }
+}
+
+static void clean_sfi_list(void)
+{
+ struct enetc_psfp_filter *sfi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ }
+}
+
+static void clean_sgi_list(void)
+{
+ struct enetc_psfp_gate *sgi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void clean_psfp_all(void)
+{
+ /* Disable all list nodes and free all memory */
+ clean_sfi_list();
+ clean_sgi_list();
+ clean_stream_list();
+ epsfp.dev_bitmap = 0;
+ clean_psfp_sfi_bitmap();
+}
+
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct net_device *ndev = cb_priv;
+
+ if (!tc_can_offload(ndev))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_psfp_init(struct enetc_ndev_priv *priv)
+{
+ if (epsfp.psfp_sfi_bitmap)
+ return 0;
+
+ epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
+ GFP_KERNEL);
+ if (!epsfp.psfp_sfi_bitmap)
+ return -ENOMEM;
+
+ spin_lock_init(&epsfp.psfp_lock);
+
+ if (list_empty(&enetc_block_cb_list))
+ epsfp.dev_bitmap = 0;
+
+ return 0;
+}
+
+int enetc_psfp_clean(struct enetc_ndev_priv *priv)
+{
+ if (!list_empty(&enetc_block_cb_list))
+ return -EBUSY;
+
+ clean_psfp_all();
+
+ return 0;
+}
+
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct flow_block_offload *f = type_data;
+ int err;
+
+ err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
+ enetc_setup_tc_block_cb,
+ ndev, ndev, true);
+ if (err)
+ return err;
+
+ switch (f->command) {
+ case FLOW_BLOCK_BIND:
+ set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ if (!epsfp.dev_bitmap)
+ clean_psfp_all();
+ break;
+ }
+
+ return 0;
+}
--
2.17.1

2020-04-22 13:34:18

by Vlad Buslov

[permalink] [raw]
Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Po,

On Wed 22 Apr 2020 at 05:48, Po Liu <[email protected]> wrote:
> Introduce an ingress frame gate control flow action.
> The tc gate action works like this:
> Assume there is a gate that allows specified ingress frames to pass
> at specific time slots and drops them at other time slots. A tc
> filter selects the ingress frames, and the tc gate action specifies
> in which time slots those frames are passed to the device and in
> which time slots they are dropped.
> The gate action provides an entry list that tells how long the gate
> stays open and how long it stays closed. The gate action also
> assigns a start time that tells when the entry list begins. The
> driver then repeats the gate entry list cyclically.
> For the software simulation, the gate action requires the user to
> assign a time clock type.
>
> Below is a configuration example from user space. The tc filter
> matches a stream with source IP address 192.168.0.20, and the gate
> action owns two time slots. One lasts 200ms with the gate open,
> letting frames pass; the other lasts 100ms with the gate closed,
> dropping frames. When the passed frames exceed 8000000 bytes in
> total within one 200000000ns open slot, further frames in that
> slot are dropped.
>
>> tc qdisc add dev eth0 ingress
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action gate index 2 clockid CLOCK_TAI \
> sched-entry open 200000000 -1 8000000 \
> sched-entry close 100000000 -1 -1
>
>> tc chain del dev eth0 ingress chain 0
>
> "sched-entry" follows the taprio naming style. The gate state is
> "open"/"close", followed by the period in nanoseconds. The next
> item is the internal priority value, which selects the ingress
> queue; "-1" means wildcard. The last, optional value specifies the
> maximum number of MSDU octets that are permitted to pass the gate
> during the specified time interval.
> If base-time is not set, it defaults to 0; as a result the start
> time will be ((N + 1) * cycletime), the nearest future time.
>
> The example below filters a stream whose destination MAC address
> is 10:00:80:00:00:00 and whose IP protocol is ICMP, then applies
> the gate action. The gate action runs with a single close time
> slot, which means the gate is always closed. The total cycle time
> is 200000000ns. The base-time is calculated by:
>
> 1357000000000 + (N + 1) * cycletime
>
> The first such value that lies in the future becomes the start
> time. The cycletime here is 200000000ns for this case.
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> action gate index 12 base-time 1357000000000 \
> sched-entry close 200000000 -1 -1 \
> clockid CLOCK_TAI
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/tc_act/tc_gate.h | 54 +++
> include/uapi/linux/pkt_cls.h | 1 +
> include/uapi/linux/tc_act/tc_gate.h | 47 ++
> net/sched/Kconfig | 13 +
> net/sched/Makefile | 1 +
> net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
> 6 files changed, 763 insertions(+)
> create mode 100644 include/net/tc_act/tc_gate.h
> create mode 100644 include/uapi/linux/tc_act/tc_gate.h
> create mode 100644 net/sched/act_gate.c
>
> diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> new file mode 100644
> index 000000000000..b0ace55b2aaa
> --- /dev/null
> +++ b/include/net/tc_act/tc_gate.h
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* Copyright 2020 NXP */
> +
> +#ifndef __NET_TC_GATE_H
> +#define __NET_TC_GATE_H
> +
> +#include <net/act_api.h>
> +#include <linux/tc_act/tc_gate.h>
> +
> +struct tcfg_gate_entry {
> + int index;
> + u8 gate_state;
> + u32 interval;
> + s32 ipv;
> + s32 maxoctets;
> + struct list_head list;
> +};
> +
> +struct tcf_gate_params {
> + s32 tcfg_priority;
> + u64 tcfg_basetime;
> + u64 tcfg_cycletime;
> + u64 tcfg_cycletime_ext;
> + u32 tcfg_flags;
> + s32 tcfg_clockid;
> + size_t num_entries;
> + struct list_head entries;
> +};
> +
> +#define GATE_ACT_GATE_OPEN BIT(0)
> +#define GATE_ACT_PENDING BIT(1)
> +struct gate_action {
> + struct tcf_gate_params param;
> + spinlock_t entry_lock;
> + u8 current_gate_status;
> + ktime_t current_close_time;
> + u32 current_entry_octets;
> + s32 current_max_octets;
> + struct tcfg_gate_entry __rcu *next_entry;
> + struct hrtimer hitimer;
> + enum tk_offsets tk_offset;
> + struct rcu_head rcu;
> +};
> +
> +struct tcf_gate {
> + struct tc_action common;
> + struct gate_action __rcu *actg;
> +};
> +#define to_gate(a) ((struct tcf_gate *)a)
> +
> +#define get_gate_param(act) ((struct tcf_gate_params *)act)
> +#define get_gate_action(p) ((struct gate_action *)p)
> +
> +#endif
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 9f06d29cab70..fc672b232437 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -134,6 +134,7 @@ enum tca_id {
> TCA_ID_CTINFO,
> TCA_ID_MPLS,
> TCA_ID_CT,
> + TCA_ID_GATE,
> /* other actions go here */
> __TCA_ID_MAX = 255
> };
> diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
> new file mode 100644
> index 000000000000..f214b3a6d44f
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_gate.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/* Copyright 2020 NXP */
> +
> +#ifndef __LINUX_TC_GATE_H
> +#define __LINUX_TC_GATE_H
> +
> +#include <linux/pkt_cls.h>
> +
> +struct tc_gate {
> + tc_gen;
> +};
> +
> +enum {
> + TCA_GATE_ENTRY_UNSPEC,
> + TCA_GATE_ENTRY_INDEX,
> + TCA_GATE_ENTRY_GATE,
> + TCA_GATE_ENTRY_INTERVAL,
> + TCA_GATE_ENTRY_IPV,
> + TCA_GATE_ENTRY_MAX_OCTETS,
> + __TCA_GATE_ENTRY_MAX,
> +};
> +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> +
> +enum {
> + TCA_GATE_ONE_ENTRY_UNSPEC,
> + TCA_GATE_ONE_ENTRY,
> + __TCA_GATE_ONE_ENTRY_MAX,
> +};
> +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
> +
> +enum {
> + TCA_GATE_UNSPEC,
> + TCA_GATE_TM,
> + TCA_GATE_PARMS,
> + TCA_GATE_PAD,
> + TCA_GATE_PRIORITY,
> + TCA_GATE_ENTRY_LIST,
> + TCA_GATE_BASE_TIME,
> + TCA_GATE_CYCLE_TIME,
> + TCA_GATE_CYCLE_TIME_EXT,
> + TCA_GATE_FLAGS,
> + TCA_GATE_CLOCKID,
> + __TCA_GATE_MAX,
> +};
> +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> +
> +#endif
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index bfbefb7bff9d..1314549c7567 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -981,6 +981,19 @@ config NET_ACT_CT
> To compile this code as a module, choose M here: the
> module will be called act_ct.
>
> +config NET_ACT_GATE
> + tristate "Frame gate entry list control tc action"
> + depends on NET_CLS_ACT
> + help
> + Say Y here to allow ingress flows to be passed or dropped at
> + specific time slots according to the gate entry list. The
> + manipulation simulates the IEEE 802.1Qci stream gate control
> + behavior.
> +
> + If unsure, say N.
> + To compile this code as a module, choose M here: the
> + module will be called act_gate.
> +
> config NET_IFE_SKBMARK
> tristate "Support to encoding decoding skb mark on IFE action"
> depends on NET_ACT_IFE
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 31c367a6cd09..66bbf9a98f9e 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
> obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> new file mode 100644
> index 000000000000..e932f402b4f1
> --- /dev/null
> +++ b/net/sched/act_gate.c
> @@ -0,0 +1,647 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Copyright 2020 NXP */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <net/act_api.h>
> +#include <net/netlink.h>
> +#include <net/pkt_cls.h>
> +#include <net/tc_act/tc_gate.h>
> +
> +static unsigned int gate_net_id;
> +static struct tc_action_ops act_gate_ops;
> +
> +static ktime_t gate_get_time(struct gate_action *gact)
> +{
> + ktime_t mono = ktime_get();
> +
> + switch (gact->tk_offset) {
> + case TK_OFFS_MAX:
> + return mono;
> + default:
> + return ktime_mono_to_any(mono, gact->tk_offset);
> + }
> +
> + return KTIME_MAX;
> +}
> +
> +static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
> +{
> + struct tcf_gate_params *param = get_gate_param(gact);
> + ktime_t now, base, cycle;
> + u64 n;
> +
> + base = ns_to_ktime(param->tcfg_basetime);
> + now = gate_get_time(gact);
> +
> + if (ktime_after(base, now)) {
> + *start = base;
> + return 0;
> + }
> +
> + cycle = param->tcfg_cycletime;
> +
> + /* cycle time should not be zero */
> + if (WARN_ON(!cycle))
> + return -EFAULT;

Looking at the init code, it seems that this value can be set to 0
directly from the netlink packet without further validation, which
would allow a user to trigger the warning here.
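As a quick sanity check, here is a hedged user-space sketch (names hypothetical, not kernel code) of the start-time computation, with the zero-cycle case rejected up front instead of being left to the WARN_ON in the timer path:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical user-space model of gate_get_start_time(): given a base
 * time and a non-zero cycle, the start time is the first cycle boundary
 * strictly after "now". A zero cycle must be rejected before the
 * division is reached, which is what the init path should enforce. */
static int gate_start_time(uint64_t base, uint64_t cycle,
			   uint64_t now, uint64_t *start)
{
	uint64_t n;

	if (!cycle)
		return -1;	/* reject at configure time, don't WARN at runtime */

	if (base > now) {
		*start = base;
		return 0;
	}

	n = (now - base) / cycle;
	*start = base + (n + 1) * cycle;
	return 0;
}
```

Since netlink can hand the kernel a zero TCA_GATE_CYCLE_TIME directly, the equivalent check belongs in tcf_gate_init() with an extack error rather than in gate_get_start_time().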

> +
> + n = div64_u64(ktime_sub_ns(now, base), cycle);
> + *start = ktime_add_ns(base, (n + 1) * cycle);
> + return 0;
> +}
> +
> +static void gate_start_timer(struct gate_action *gact, ktime_t start)
> +{
> + ktime_t expires;
> +
> + expires = hrtimer_get_expires(&gact->hitimer);
> + if (expires == 0)
> + expires = KTIME_MAX;
> +
> + start = min_t(ktime_t, start, expires);
> +
> + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
> +}
> +
> +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
> +{
> + struct gate_action *gact = container_of(timer, struct gate_action,
> + hitimer);
> + struct tcf_gate_params *p = get_gate_param(gact);
> + struct tcfg_gate_entry *next;
> + ktime_t close_time, now;
> +
> + spin_lock(&gact->entry_lock);
> +
> + next = rcu_dereference_protected(gact->next_entry,
> + lockdep_is_held(&gact->entry_lock));
> +
> + /* cycle start, clear pending bit, clear total octets */
> + gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
> + gact->current_entry_octets = 0;
> + gact->current_max_octets = next->maxoctets;
> +
> + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> + next->interval);
> +
> + close_time = gact->current_close_time;
> +
> + if (list_is_last(&next->list, &p->entries))
> + next = list_first_entry(&p->entries,
> + struct tcfg_gate_entry, list);
> + else
> + next = list_next_entry(next, list);
> +
> + now = gate_get_time(gact);
> +
> + if (ktime_after(now, close_time)) {
> + ktime_t cycle, base;
> + u64 n;
> +
> + cycle = p->tcfg_cycletime;
> + base = ns_to_ktime(p->tcfg_basetime);
> + n = div64_u64(ktime_sub_ns(now, base), cycle);
> + close_time = ktime_add_ns(base, (n + 1) * cycle);
> + }
> +
> + rcu_assign_pointer(gact->next_entry, next);
> + spin_unlock(&gact->entry_lock);

I have a couple of questions about synchronization here:

- Why does next_entry need to be an rcu pointer? It is only assigned
here with entry_lock protection and in the init code before the action
is visible to concurrent users. I don't see any unlocked rcu-protected
readers here that could benefit from it.

- Why create a dedicated entry_lock instead of using the already
existing per-action tcf_lock?

> +
> + hrtimer_set_expires(&gact->hitimer, close_time);
> +
> + return HRTIMER_RESTART;
> +}
> +
> +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> + struct tcf_result *res)
> +{
> + struct tcf_gate *g = to_gate(a);
> + struct gate_action *gact;
> + int action;
> +
> + tcf_lastuse_update(&g->tcf_tm);
> + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
> +
> + action = READ_ONCE(g->tcf_action);
> + rcu_read_lock();

The action fastpath is already rcu read lock protected; you don't need
to obtain it manually.

> + gact = rcu_dereference_bh(g->actg);
> + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {

Can't current_gate_status be concurrently modified by the timer
callback? This function doesn't use entry_lock to synchronize with the
timer.

> + rcu_read_unlock();
> + return action;
> + }
> +
> + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))

...and here

> + goto drop;
> +
> + if (gact->current_max_octets >= 0) {
> + gact->current_entry_octets += qdisc_pkt_len(skb);
> + if (gact->current_entry_octets > gact->current_max_octets) {

here also.

> + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));

Please use tcf_action_inc_overlimit_qstats() and the other wrappers
for stats. Otherwise this will crash if the user passes the
TCA_ACT_FLAGS_NO_PERCPU_STATS flag.
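To illustrate why the wrapper matters, here is a toy user-space model (all names hypothetical): with TCA_ACT_FLAGS_NO_PERCPU_STATS the per-cpu pointer is NULL, so a direct this_cpu_ptr() dereference crashes, whereas the wrapper falls back to the central counters:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: an action either has per-CPU counters or only a
 * central counter (the NO_PERCPU_STATS case). The wrapper must check
 * for NULL before touching the per-CPU side, analogous to what
 * tcf_action_inc_overlimit_qstats() does in the kernel. */
struct toy_action {
	unsigned long *cpu_overlimits;	/* NULL when per-CPU stats are off */
	unsigned long overlimits;	/* central fallback counter */
};

static void toy_inc_overlimit(struct toy_action *a)
{
	if (a->cpu_overlimits)
		(*a->cpu_overlimits)++;	/* fast per-CPU path */
	else
		a->overlimits++;	/* fallback; a raw deref would crash */
}
```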

> + goto drop;
> + }
> + }
> + rcu_read_unlock();
> +
> + return action;
> +drop:
> + rcu_read_unlock();
> + qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
> + return TC_ACT_SHOT;
> +}
> +
> +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
> + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
> + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
> + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
> + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
> + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
> +};
> +
> +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
> + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
> + .type = NLA_EXACT_LEN },
> + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
> + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
> + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
> + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
> + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
> + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
> + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
> +};
> +
> +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
> + struct netlink_ext_ack *extack)
> +{
> + u32 interval = 0;
> +
> + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
> +
> + if (tb[TCA_GATE_ENTRY_INTERVAL])
> + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
> +
> + if (interval == 0) {
> + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> + return -EINVAL;
> + }
> +
> + entry->interval = interval;
> +
> + if (tb[TCA_GATE_ENTRY_IPV])
> + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
> + else
> + entry->ipv = -1;
> +
> + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
> + entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
> + else
> + entry->maxoctets = -1;
> +
> + return 0;
> +}
> +
> +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
> + int index, struct netlink_ext_ack *extack)
> +{
> + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
> + int err;

> + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
> + if (err < 0) {
> + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
> + return -EINVAL;
> + }
> +
> + entry->index = index;
> +
> + return fill_gate_entry(tb, entry, extack);
> +}
> +
> +static int parse_gate_list(struct nlattr *list_attr,
> + struct tcf_gate_params *sched,
> + struct netlink_ext_ack *extack)
> +{
> + struct tcfg_gate_entry *entry, *e;
> + struct nlattr *n;
> + int err, rem;
> + int i = 0;
> +
> + if (!list_attr)
> + return -EINVAL;
> +
> + nla_for_each_nested(n, list_attr, rem) {
> + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
> + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
> + continue;
> + }
> +
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry) {
> + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
> + err = -ENOMEM;
> + goto release_list;
> + }
> +
> + err = parse_gate_entry(n, entry, i, extack);
> + if (err < 0) {
> + kfree(entry);
> + goto release_list;
> + }
> +
> + list_add_tail(&entry->list, &sched->entries);
> + i++;
> + }
> +
> + sched->num_entries = i;
> +
> + return i;
> +
> +release_list:
> + list_for_each_entry_safe(entry, e, &sched->entries, list) {
> + list_del(&entry->list);
> + kfree(entry);
> + }
> +
> + return err;
> +}
> +
> +static int tcf_gate_init(struct net *net, struct nlattr *nla,
> + struct nlattr *est, struct tc_action **a,
> + int ovr, int bind, bool rtnl_held,
> + struct tcf_proto *tp, u32 flags,
> + struct netlink_ext_ack *extack)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> + enum tk_offsets tk_offset = TK_OFFS_TAI;
> + struct nlattr *tb[TCA_GATE_MAX + 1];
> + struct tcf_chain *goto_ch = NULL;
> + struct tcfg_gate_entry *next;
> + struct tcf_gate_params *p;
> + struct gate_action *gact;
> + s32 clockid = CLOCK_TAI;
> + struct tc_gate *parm;
> + struct tcf_gate *g;
> + int ret = 0, err;
> + u64 basetime = 0;
> + u32 gflags = 0;
> + s32 prio = -1;
> + ktime_t start;
> + u32 index;
> +
> + if (!nla)
> + return -EINVAL;
> +
> + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
> + if (err < 0)
> + return err;
> +
> + if (!tb[TCA_GATE_PARMS])
> + return -EINVAL;
> + parm = nla_data(tb[TCA_GATE_PARMS]);
> + index = parm->index;
> + err = tcf_idr_check_alloc(tn, &index, a, bind);
> + if (err < 0)
> + return err;
> +
> + if (err && bind)
> + return 0;
> +
> + if (!err) {
> + ret = tcf_idr_create_from_flags(tn, index, est, a,
> + &act_gate_ops, bind, flags);
> + if (ret) {
> + tcf_idr_cleanup(tn, index);
> + return ret;
> + }
> +
> + ret = ACT_P_CREATED;
> + } else if (!ovr) {
> + tcf_idr_release(*a, bind);
> + return -EEXIST;
> + }
> +
> + if (tb[TCA_GATE_PRIORITY])
> + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
> +
> + if (tb[TCA_GATE_BASE_TIME])
> + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
> +
> + if (tb[TCA_GATE_FLAGS])
> + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
> +
> + if (tb[TCA_GATE_CLOCKID]) {
> + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> + switch (clockid) {
> + case CLOCK_REALTIME:
> + tk_offset = TK_OFFS_REAL;
> + break;
> + case CLOCK_MONOTONIC:
> + tk_offset = TK_OFFS_MAX;
> + break;
> + case CLOCK_BOOTTIME:
> + tk_offset = TK_OFFS_BOOT;
> + break;
> + case CLOCK_TAI:
> + tk_offset = TK_OFFS_TAI;
> + break;
> + default:
> + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> + goto release_idr;
> + }
> + }
> +
> + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> + if (err < 0)
> + goto release_idr;
> +
> + g = to_gate(*a);
> +
> + gact = kzalloc(sizeof(*gact), GFP_KERNEL);
> + if (!gact) {
> + err = -ENOMEM;
> + goto put_chain;
> + }
> +
> + p = get_gate_param(gact);
> +
> + INIT_LIST_HEAD(&p->entries);
> + if (tb[TCA_GATE_ENTRY_LIST]) {
> + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> + if (err < 0)
> + goto release_mem;
> + }
> +
> + if (tb[TCA_GATE_CYCLE_TIME]) {
> + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
> + } else {
> + struct tcfg_gate_entry *entry;
> + ktime_t cycle = 0;
> +
> + list_for_each_entry(entry, &p->entries, list)
> + cycle = ktime_add_ns(cycle, entry->interval);
> + p->tcfg_cycletime = cycle;
> + }
> +
> + if (tb[TCA_GATE_CYCLE_TIME_EXT])
> + p->tcfg_cycletime_ext =
> + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
> +
> + p->tcfg_priority = prio;
> + p->tcfg_basetime = basetime;
> + p->tcfg_clockid = clockid;
> + p->tcfg_flags = gflags;
> +
> + gact->tk_offset = tk_offset;
> + spin_lock_init(&gact->entry_lock);
> + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
> + gact->hitimer.function = gate_timer_func;
> +
> + err = gate_get_start_time(gact, &start);
> + if (err < 0) {
> + NL_SET_ERR_MSG(extack,
> + "Internal error: failed get start time");
> + goto release_mem;
> + }
> +
> + gact->current_close_time = start;
> + gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING;
> +
> + next = list_first_entry(&p->entries, struct tcfg_gate_entry, list);
> + rcu_assign_pointer(gact->next_entry, next);
> +
> + gate_start_timer(gact, start);
> +
> + spin_lock_bh(&g->tcf_lock);
> + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
> + gact = rcu_replace_pointer(g->actg, gact,
> + lockdep_is_held(&g->tcf_lock));
> + spin_unlock_bh(&g->tcf_lock);
> +
> + if (goto_ch)
> + tcf_chain_put_by_act(goto_ch);
> + if (gact)
> + kfree_rcu(gact, rcu);

This leaks the entries. For example, the tunnel key action implements
a tunnel_key_release_params() helper that is used by both the init and
release code. I guess that would be the best approach here as well.
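For illustration, a minimal user-space model (names hypothetical) of such a shared release helper, callable from both the init path when replacing an old params block and from the cleanup path, so the entry list is always freed together with the block that owns it:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a shared release helper, analogous to
 * tunnel_key_release_params(): free every list entry before freeing
 * the params block itself, otherwise the entries leak. Returns the
 * number of entries freed so the behavior can be observed. */
struct toy_entry {
	struct toy_entry *next;
};

struct toy_params {
	struct toy_entry *entries;
};

static int toy_release_params(struct toy_params *p)
{
	struct toy_entry *e, *n;
	int freed = 0;

	if (!p)
		return 0;
	for (e = p->entries; e; e = n) {
		n = e->next;	/* save link before freeing the node */
		free(e);
		freed++;
	}
	free(p);
	return freed;
}
```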

> +
> + if (ret == ACT_P_CREATED)
> + tcf_idr_insert(tn, *a);
> + return ret;
> +
> +release_mem:
> + kfree(gact);
> +put_chain:
> + if (goto_ch)
> + tcf_chain_put_by_act(goto_ch);
> +release_idr:
> + tcf_idr_release(*a, bind);
> + return err;
> +}
> +
> +static void tcf_gate_cleanup(struct tc_action *a)
> +{
> + struct tcf_gate *g = to_gate(a);
> + struct tcfg_gate_entry *entry, *n;
> + struct tcf_gate_params *p;
> + struct gate_action *gact;
> +
> + spin_lock_bh(&g->tcf_lock);
> + gact = rcu_dereference_protected(g->actg,
> + lockdep_is_held(&g->tcf_lock));
> + hrtimer_cancel(&gact->hitimer);
> +
> + p = get_gate_param(gact);
> + list_for_each_entry_safe(entry, n, &p->entries, list) {
> + list_del(&entry->list);
> + kfree(entry);
> + }
> + spin_unlock_bh(&g->tcf_lock);
> +
> + kfree_rcu(gact, rcu);
> +}
> +
> +static int dumping_entry(struct sk_buff *skb,
> + struct tcfg_gate_entry *entry)
> +{
> + struct nlattr *item;
> +
> + item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
> + if (!item)
> + return -ENOSPC;
> +
> + if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
> + goto nla_put_failure;
> +
> + if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
> + goto nla_put_failure;
> +
> + if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
> + goto nla_put_failure;
> +
> + return nla_nest_end(skb, item);
> +
> +nla_put_failure:
> + nla_nest_cancel(skb, item);
> + return -1;
> +}
> +
> +static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
> + int bind, int ref)
> +{
> + unsigned char *b = skb_tail_pointer(skb);
> + struct tcf_gate *g = to_gate(a);
> + struct tc_gate opt = {
> + .index = g->tcf_index,
> + .refcnt = refcount_read(&g->tcf_refcnt) - ref,
> + .bindcnt = atomic_read(&g->tcf_bindcnt) - bind,
> + };
> + struct tcfg_gate_entry *entry;
> + struct gate_action *gact;
> + struct tcf_gate_params *p;
> + struct nlattr *entry_list;
> + struct tcf_t t;
> +
> + spin_lock_bh(&g->tcf_lock);
> + opt.action = g->tcf_action;
> + gact = rcu_dereference_protected(g->actg,
> + lockdep_is_held(&g->tcf_lock));
> +
> + p = get_gate_param(gact);
> +
> + if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
> + p->tcfg_basetime, TCA_GATE_PAD))
> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
> + p->tcfg_cycletime, TCA_GATE_PAD))
> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
> + p->tcfg_cycletime_ext, TCA_GATE_PAD))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
> + goto nla_put_failure;
> +
> + if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
> + goto nla_put_failure;
> +
> + entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
> + if (!entry_list)
> + goto nla_put_failure;
> +
> + list_for_each_entry(entry, &p->entries, list) {
> + if (dumping_entry(skb, entry) < 0)
> + goto nla_put_failure;
> + }
> +
> + nla_nest_end(skb, entry_list);
> +
> + tcf_tm_dump(&t, &g->tcf_tm);
> + if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
> + goto nla_put_failure;
> + spin_unlock_bh(&g->tcf_lock);
> +
> + return skb->len;
> +
> +nla_put_failure:
> + spin_unlock_bh(&g->tcf_lock);
> + nlmsg_trim(skb, b);
> + return -1;
> +}
> +
> +static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
> + struct netlink_callback *cb, int type,
> + const struct tc_action_ops *ops,
> + struct netlink_ext_ack *extack)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> +
> + return tcf_generic_walker(tn, skb, cb, type, ops, extack);
> +}
> +
> +static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
> + u64 lastuse, bool hw)
> +{
> + struct tcf_gate *g = to_gate(a);
> + struct tcf_t *tm = &g->tcf_tm;
> +
> + tcf_action_update_stats(a, bytes, packets, false, hw);
> + tm->lastuse = max_t(u64, tm->lastuse, lastuse);
> +}
> +
> +static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> +
> + return tcf_idr_search(tn, a, index);
> +}
> +
> +static size_t tcf_gate_get_fill_size(const struct tc_action *act)
> +{
> + return nla_total_size(sizeof(struct tc_gate));
> +}
> +
> +static struct tc_action_ops act_gate_ops = {
> + .kind = "gate",
> + .id = TCA_ID_GATE,
> + .owner = THIS_MODULE,
> + .act = tcf_gate_act,
> + .dump = tcf_gate_dump,
> + .init = tcf_gate_init,
> + .cleanup = tcf_gate_cleanup,
> + .walk = tcf_gate_walker,
> + .stats_update = tcf_gate_stats_update,
> + .get_fill_size = tcf_gate_get_fill_size,
> + .lookup = tcf_gate_search,
> + .size = sizeof(struct gate_action),
> +};
> +
> +static __net_init int gate_init_net(struct net *net)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> +
> + return tc_action_net_init(net, tn, &act_gate_ops);
> +}
> +
> +static void __net_exit gate_exit_net(struct list_head *net_list)
> +{
> + tc_action_net_exit(net_list, gate_net_id);
> +}
> +
> +static struct pernet_operations gate_net_ops = {
> + .init = gate_init_net,
> + .exit_batch = gate_exit_net,
> + .id = &gate_net_id,
> + .size = sizeof(struct tc_action_net),
> +};
> +
> +static int __init gate_init_module(void)
> +{
> + return tcf_register_action(&act_gate_ops, &gate_net_ops);
> +}
> +
> +static void __exit gate_cleanup_module(void)
> +{
> + tcf_unregister_action(&act_gate_ops, &gate_net_ops);
> +}
> +
> +module_init(gate_init_module);
> +module_exit(gate_cleanup_module);
> +MODULE_LICENSE("GPL v2");

2020-04-22 14:16:04

by Vlad Buslov

[permalink] [raw]
Subject: Re: [v3,net-next 2/4] net: schedule: add action gate offloading


On Wed 22 Apr 2020 at 05:48, Po Liu <[email protected]> wrote:
> Add the gate action to the flow action entry. Add the gate parameters to
> the tc_setup_flow_action() queueing to the entries of flow_action_entry
> array provide to the driver.
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/flow_offload.h | 10 +++
> include/net/tc_act/tc_gate.h | 115 +++++++++++++++++++++++++++++++++++
> net/sched/cls_api.c | 33 ++++++++++
> 3 files changed, 158 insertions(+)
>
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 3619c6acf60f..94a30fe02e6d 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -147,6 +147,7 @@ enum flow_action_id {
> FLOW_ACTION_MPLS_PUSH,
> FLOW_ACTION_MPLS_POP,
> FLOW_ACTION_MPLS_MANGLE,
> + FLOW_ACTION_GATE,
> NUM_FLOW_ACTIONS,
> };
>
> @@ -255,6 +256,15 @@ struct flow_action_entry {
> u8 bos;
> u8 ttl;
> } mpls_mangle;
> + struct {
> + u32 index;
> + s32 prio;
> + u64 basetime;
> + u64 cycletime;
> + u64 cycletimeext;
> + u32 num_entries;
> + struct action_gate_entry *entries;
> + } gate;
> };
> struct flow_action_cookie *cookie; /* user defined action cookie */
> };
> diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> index b0ace55b2aaa..62633cb02c7a 100644
> --- a/include/net/tc_act/tc_gate.h
> +++ b/include/net/tc_act/tc_gate.h
> @@ -7,6 +7,13 @@
> #include <net/act_api.h>
> #include <linux/tc_act/tc_gate.h>
>
> +struct action_gate_entry {
> + u8 gate_state;
> + u32 interval;
> + s32 ipv;
> + s32 maxoctets;
> +};
> +
> struct tcfg_gate_entry {
> int index;
> u8 gate_state;
> @@ -51,4 +58,112 @@ struct tcf_gate {
> #define get_gate_param(act) ((struct tcf_gate_params *)act)
> #define get_gate_action(p) ((struct gate_action *)p)
>
> +static inline bool is_tcf_gate(const struct tc_action *a)
> +{
> +#ifdef CONFIG_NET_CLS_ACT
> + if (a->ops && a->ops->id == TCA_ID_GATE)
> + return true;
> +#endif
> + return false;
> +}
> +
> +static inline u32 tcf_gate_index(const struct tc_action *a)
> +{
> + return a->tcfa_index;
> +}
> +
> +static inline s32 tcf_gate_prio(const struct tc_action *a)
> +{
> + s32 tcfg_prio;
> +
> + rcu_read_lock();
> + tcfg_prio = rcu_dereference(to_gate(a)->actg)->param.tcfg_priority;
> + rcu_read_unlock();
> +
> + return tcfg_prio;
> +}
> +
> +static inline u64 tcf_gate_basetime(const struct tc_action *a)
> +{
> + u64 tcfg_basetime;
> +
> + rcu_read_lock();
> + tcfg_basetime =
> + rcu_dereference(to_gate(a)->actg)->param.tcfg_basetime;
> + rcu_read_unlock();
> +
> + return tcfg_basetime;
> +}
> +
> +static inline u64 tcf_gate_cycletime(const struct tc_action *a)
> +{
> + u64 tcfg_cycletime;
> +
> + rcu_read_lock();
> + tcfg_cycletime =
> + rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime;
> + rcu_read_unlock();
> +
> + return tcfg_cycletime;
> +}
> +
> +static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
> +{
> + u64 tcfg_cycletimeext;
> +
> + rcu_read_lock();
> + tcfg_cycletimeext =
> + rcu_dereference(to_gate(a)->actg)->param.tcfg_cycletime_ext;
> + rcu_read_unlock();
> +
> + return tcfg_cycletimeext;
> +}
> +
> +static inline u32 tcf_gate_num_entries(const struct tc_action *a)
> +{
> + u32 num_entries;
> +
> + rcu_read_lock();
> + num_entries =
> + rcu_dereference(to_gate(a)->actg)->param.num_entries;
> + rcu_read_unlock();
> +
> + return num_entries;
> +}
> +
> +static inline struct action_gate_entry
> + *tcf_gate_get_list(const struct tc_action *a)
> +{
> + struct action_gate_entry *oe;
> + struct tcf_gate_params *p;
> + struct tcfg_gate_entry *entry;
> + u32 num_entries;
> + int i = 0;
> +
> + rcu_read_lock();
> + p = &(rcu_dereference(to_gate(a)->actg)->param);
> + num_entries = p->num_entries;
> + rcu_read_unlock();

Here you obtain a pointer into an RCU-protected object and then use it
past the RCU read-side critical section.

> +
> + list_for_each_entry(entry, &p->entries, list)
> + i++;
> +
> + if (i != num_entries)
> + return NULL;
> +
> + oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
> + if (!oe)
> + return NULL;
> +
> + i = 0;
> + list_for_each_entry(entry, &p->entries, list) {
> + oe[i].gate_state = entry->gate_state;
> + oe[i].interval = entry->interval;
> + oe[i].ipv = entry->ipv;
> + oe[i].maxoctets = entry->maxoctets;
> + i++;
> + }
> +
> + return oe;
> +}
> #endif
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 55bd1429678f..ca8bf74be4ba 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -39,6 +39,7 @@
> #include <net/tc_act/tc_skbedit.h>
> #include <net/tc_act/tc_ct.h>
> #include <net/tc_act/tc_mpls.h>
> +#include <net/tc_act/tc_gate.h>
> #include <net/flow_offload.h>
>
> extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
> @@ -3523,6 +3524,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
> #endif
> }
>
> +static void tcf_gate_entry_destructor(void *priv)
> +{
> + struct action_gate_entry *oe = priv;
> +
> + kfree(oe);
> +}
> +
> +static int tcf_gate_get_entries(struct flow_action_entry *entry,
> + const struct tc_action *act)
> +{
> + entry->gate.entries = tcf_gate_get_list(act);
> +
> + if (!entry->gate.entries)
> + return -EINVAL;
> +
> + entry->destructor = tcf_gate_entry_destructor;
> + entry->destructor_priv = entry->gate.entries;
> +
> + return 0;
> +}
> +
> int tc_setup_flow_action(struct flow_action *flow_action,
> const struct tcf_exts *exts)
> {
> @@ -3669,6 +3691,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> } else if (is_tcf_skbedit_priority(act)) {
> entry->id = FLOW_ACTION_PRIORITY;
> entry->priority = tcf_skbedit_priority(act);
> + } else if (is_tcf_gate(act)) {
> + entry->id = FLOW_ACTION_GATE;
> + entry->gate.index = tcf_gate_index(act);
> + entry->gate.prio = tcf_gate_prio(act);
> + entry->gate.basetime = tcf_gate_basetime(act);
> + entry->gate.cycletime = tcf_gate_cycletime(act);
> + entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
> + entry->gate.num_entries = tcf_gate_num_entries(act);
> + err = tcf_gate_get_entries(entry, act);
> + if (err)
> + goto err_out;
> } else {
> err = -EOPNOTSUPP;
> goto err_out_locked;

2020-04-22 19:20:49

by [email protected]

[permalink] [raw]
Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Po,

Nice to see even more work on the TSN standards in the upstream kernel.

On 22.04.2020 10:48, Po Liu wrote:
>Introduce an ingress frame gate control flow action.
>The tc gate action works like this:
>assume there is a gate that allows specified ingress frames to pass
>during certain time slots and drops them during others. The tc filter
>selects the ingress frames, and the tc gate action specifies in which
>time slots those frames may be passed to the device and in which they
>are dropped.
>The gate action provides an entry list that tells how long the gate
>stays open and how long it stays closed. The action also assigns a
>start time that tells when the entry list begins. The driver then
>repeats the gate entry list cyclically.
>For the software simulation, the gate action requires the user to
>assign a clock type.
>
>Below is a configuration example in user space. The tc filter matches
>a stream with source IP address 192.168.0.20, and the gate action owns
>two time slots: one lasting 200 ms with the gate open, letting frames
>pass, and one lasting 100 ms with the gate closed, dropping frames.
>Once more than 8000000 bytes of frames have passed within one
>200000000 ns open slot, further frames in that slot are dropped.
>
>> tc qdisc add dev eth0 ingress
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action gate index 2 clockid CLOCK_TAI \
> sched-entry open 200000000 -1 8000000 \
> sched-entry close 100000000 -1 -1

First of all, it has been a long time since I read 802.1Qci, and when I
did, it was still a draft. So please let me know if I'm completely off
here.

I know you are focusing on the gate control in this patch series, but I
assume that you will later want to do the policing and flow metering as
well, and it could make sense to consider how all of this works
together.

A common use-case for policing is to have multiple rules pointing at
the same policer instance. Maybe you want the sum of the traffic on two
ports to be limited to 100 Mbit/s. If you specify such an action on the
individual rule (as done with the gate), then you cannot have two rules
pointing at the same policer instance.

Long story short, have you considered whether it would be better to do
something like:

tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action psfp-id 42

And then have some other function to configure the properties of psfp-id
42?


/Allan

2020-04-22 19:31:10

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Allan,

On Wed, 22 Apr 2020 at 22:20, Allan W. Nielsen
<[email protected]> wrote:
>
> Hi Po,
>
> Nice to see even more work on the TSN standards in the upstream kernel.
>
> On 22.04.2020 10:48, Po Liu wrote:
> >[commit message and example snipped]
>
> First of all, it has been a long time since I read 802.1Qci, and when
> I did, it was still a draft. So please let me know if I'm completely
> off here.
>
> I know you are focusing on the gate control in this patch series, but
> I assume that you will later want to do the policing and flow metering
> as well, and it could make sense to consider how all of this works
> together.
>
> A common use-case for policing is to have multiple rules pointing at
> the same policer instance. Maybe you want the sum of the traffic on
> two ports to be limited to 100 Mbit/s. If you specify such an action
> on the individual rule (as done with the gate), then you cannot have
> two rules pointing at the same policer instance.
>
> Long story short, have you considered whether it would be better to do
>
> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action psfp-id 42
>
> And then have some other function to configure the properties of psfp-id
> 42?
>
>
> /Allan
>

It is very good that you brought it up though, since in my opinion too
it is a rather important aspect, and the fact that this feature is
already designed in seems to have been a bit too subtle.

"psfp-id" is actually his "index" argument.

You can actually do this:
tc filter add dev eth0 ingress \
flower skip_hw dst_mac 01:80:c2:00:00:0e \
action gate index 1 clockid CLOCK_TAI \
base-time 200000000000 \
sched-entry OPEN 200000000 -1 -1 \
sched-entry CLOSE 100000000 -1 -1
tc filter add dev eth0 ingress \
flower skip_hw dst_mac 01:80:c2:00:00:0f \
action gate index 1

Then 2 filters get created with the same action:

tc -s filter show dev swp2 ingress
filter protocol all pref 49151 flower chain 0
filter protocol all pref 49151 flower chain 0 handle 0x1
dst_mac 01:80:c2:00:00:0f
skip_hw
not_in_hw
action order 1:
priority wildcard clockid TAI flags 0x6404f
base-time 200000000000 cycle-time 300000000
cycle-time-ext 0
number 0 gate-state open interval 200000000
ipv wildcard max-octets wildcard
number 1 gate-state close interval 100000000
ipv wildcard max-octets wildcard
pipe
index 2 ref 2 bind 2 installed 168 sec used 168 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

filter protocol all pref 49152 flower chain 0
filter protocol all pref 49152 flower chain 0 handle 0x1
dst_mac 01:80:c2:00:00:0e
skip_hw
not_in_hw
action order 1:
priority wildcard clockid TAI flags 0x6404f
base-time 200000000000 cycle-time 300000000
cycle-time-ext 0
number 0 gate-state open interval 200000000
ipv wildcard max-octets wildcard
number 1 gate-state close interval 100000000
ipv wildcard max-octets wildcard
pipe
index 2 ref 2 bind 2 installed 168 sec used 168 sec
Action statistics:
Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
backlog 0b 0p requeues 0

Actually my only concern is that maybe this mechanism should (?) have
been made more generic. At the moment, this patch series implements it
via a TCA_GATE_ENTRY_INDEX netlink attribute, so every action that
wants to be shared across filters needs to reinvent this wheel.

Thoughts, everyone?

Thanks,
-Vladimir

2020-04-22 20:07:41

by Dave Taht

[permalink] [raw]
Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

On Wed, Apr 22, 2020 at 12:31 PM Vladimir Oltean <[email protected]> wrote:
>
> [full quote snipped]
> Actually my only concern is that maybe this mechanism should (?) have
> been more generic. At the moment, this patch series implements it via
> a TCA_GATE_ENTRY_INDEX netlink attribute, so every action which wants
> to be shared across filters needs to reinvent this wheel.
>
> Thoughts, everyone?

I don't have anything valuable to add, aside from commenting that this
whole thing makes my brain hurt.

> Thanks,
> -Vladimir



--
Make Music, Not War

Dave Täht
CTO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-831-435-0729

2020-04-23 03:16:38

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Vlad Buslov,

> -----Original Message-----
> From: Vlad Buslov <[email protected]>
> Sent: 2020-04-22 21:23
> To: Po Liu <[email protected]>
> Cc: [trimmed]
> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow
> action
>
> Hi Po,
>
> On Wed 22 Apr 2020 at 05:48, Po Liu <[email protected]> wrote:
> > Introduce an ingress frame gate control flow action.
> > The tc gate action works like this:
> > assume there is a gate that allows specified ingress frames to pass
> > during certain time slots and drops them during others. The tc
> > filter selects the ingress frames, and the tc gate action specifies
> > in which time slots those frames may be passed to the device and in
> > which they are dropped.
> > The gate action provides an entry list that tells how long the gate
> > stays open and how long it stays closed. The action also assigns a
> > start time that tells when the entry list begins. The driver then
> > repeats the gate entry list cyclically.
> > For the software simulation, the gate action requires the user to
> > assign a clock type.
> >
> > Below is a configuration example in user space. The tc filter
> > matches a stream with source IP address 192.168.0.20, and the gate
> > action owns two time slots: one lasting 200 ms with the gate open,
> > letting frames pass, and one lasting 100 ms with the gate closed,
> > dropping frames. Once more than 8000000 bytes of frames have passed
> > within one 200000000 ns open slot, further frames in that slot are
> > dropped.
> >
> >> tc qdisc add dev eth0 ingress
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower src_ip 192.168.0.20 \
> > action gate index 2 clockid CLOCK_TAI \
> > sched-entry open 200000000 -1 8000000 \
> > sched-entry close 100000000 -1 -1
> >
> >> tc chain del dev eth0 ingress chain 0
> >
> > "sched-entry" follow the name taprio style. Gate state is
> > "open"/"close". Follow with period nanosecond. Then next item is
> > internal priority value means which ingress queue should put. "-1"
> > means wildcard. The last value optional specifies the maximum number
> > of MSDU octets that are permitted to pass the gate during the
> > specified time interval.
> > Base-time is not set will be 0 as default, as result start time would
> > be ((N + 1) * cycletime) which is the minimal of future time.
> >
> > Below example shows filtering a stream with destination mac address is
> > 10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The
> > gate action would run with one close time slot which means always keep
> close.
> > The time cycle is total 200000000ns. The base-time would calculate by:
> >
> > 1357000000000 + (N + 1) * cycletime
> >
> > When the total value is the future time, it will be the start time.
> > The cycletime here would be 200000000ns for this case.
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> > action gate index 12 base-time 1357000000000 \
> > sched-entry close 200000000 -1 -1 \
> > clockid CLOCK_TAI
> >
> > Signed-off-by: Po Liu <[email protected]>
> > ---
> > include/net/tc_act/tc_gate.h | 54 +++
> > include/uapi/linux/pkt_cls.h | 1 +
> > include/uapi/linux/tc_act/tc_gate.h | 47 ++
> > net/sched/Kconfig | 13 +
> > net/sched/Makefile | 1 +
> > net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
> > 6 files changed, 763 insertions(+)
> > create mode 100644 include/net/tc_act/tc_gate.h create mode 100644
> > include/uapi/linux/tc_act/tc_gate.h
> > create mode 100644 net/sched/act_gate.c
> >
> > diff --git a/include/net/tc_act/tc_gate.h
> > b/include/net/tc_act/tc_gate.h new file mode 100644 index
> > 000000000000..b0ace55b2aaa
> > --- /dev/null
> > +++ b/include/net/tc_act/tc_gate.h
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/* Copyright 2020 NXP */
> > +
> > +#ifndef __NET_TC_GATE_H
> > +#define __NET_TC_GATE_H
> > +
> > +#include <net/act_api.h>
> > +#include <linux/tc_act/tc_gate.h>
> > +
> > +struct tcfg_gate_entry {
> > + int index;
> > + u8 gate_state;
> > + u32 interval;
> > + s32 ipv;
> > + s32 maxoctets;
> > + struct list_head list;
> > +};
> > +
> > +struct tcf_gate_params {
> > + s32 tcfg_priority;
> > + u64 tcfg_basetime;
> > + u64 tcfg_cycletime;
> > + u64 tcfg_cycletime_ext;
> > + u32 tcfg_flags;
> > + s32 tcfg_clockid;
> > + size_t num_entries;
> > + struct list_head entries;
> > +};
> > +
> > +#define GATE_ACT_GATE_OPEN BIT(0)
> > +#define GATE_ACT_PENDING BIT(1)
> > +struct gate_action {
> > + struct tcf_gate_params param;
> > + spinlock_t entry_lock;
> > + u8 current_gate_status;
> > + ktime_t current_close_time;
> > + u32 current_entry_octets;
> > + s32 current_max_octets;
> > + struct tcfg_gate_entry __rcu *next_entry;
> > + struct hrtimer hitimer;
> > + enum tk_offsets tk_offset;
> > + struct rcu_head rcu;
> > +};
> > +
> > +struct tcf_gate {
> > + struct tc_action common;
> > + struct gate_action __rcu *actg;
> > +};
> > +#define to_gate(a) ((struct tcf_gate *)a)
> > +
> > +#define get_gate_param(act) ((struct tcf_gate_params *)act) #define
> > +get_gate_action(p) ((struct gate_action *)p)
> > +
> > +#endif
> > diff --git a/include/uapi/linux/pkt_cls.h
> > b/include/uapi/linux/pkt_cls.h index 9f06d29cab70..fc672b232437
> 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -134,6 +134,7 @@ enum tca_id {
> > TCA_ID_CTINFO,
> > TCA_ID_MPLS,
> > TCA_ID_CT,
> > + TCA_ID_GATE,
> > /* other actions go here */
> > __TCA_ID_MAX = 255
> > };
> > diff --git a/include/uapi/linux/tc_act/tc_gate.h
> > b/include/uapi/linux/tc_act/tc_gate.h
> > new file mode 100644
> > index 000000000000..f214b3a6d44f
> > --- /dev/null
> > +++ b/include/uapi/linux/tc_act/tc_gate.h
> > @@ -0,0 +1,47 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +/* Copyright 2020 NXP */
> > +
> > +#ifndef __LINUX_TC_GATE_H
> > +#define __LINUX_TC_GATE_H
> > +
> > +#include <linux/pkt_cls.h>
> > +
> > +struct tc_gate {
> > + tc_gen;
> > +};
> > +
> > +enum {
> > + TCA_GATE_ENTRY_UNSPEC,
> > + TCA_GATE_ENTRY_INDEX,
> > + TCA_GATE_ENTRY_GATE,
> > + TCA_GATE_ENTRY_INTERVAL,
> > + TCA_GATE_ENTRY_IPV,
> > + TCA_GATE_ENTRY_MAX_OCTETS,
> > + __TCA_GATE_ENTRY_MAX,
> > +};
> > +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> > +
> > +enum {
> > + TCA_GATE_ONE_ENTRY_UNSPEC,
> > + TCA_GATE_ONE_ENTRY,
> > + __TCA_GATE_ONE_ENTRY_MAX,
> > +};
> > +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX
> - 1)
> > +
> > +enum {
> > + TCA_GATE_UNSPEC,
> > + TCA_GATE_TM,
> > + TCA_GATE_PARMS,
> > + TCA_GATE_PAD,
> > + TCA_GATE_PRIORITY,
> > + TCA_GATE_ENTRY_LIST,
> > + TCA_GATE_BASE_TIME,
> > + TCA_GATE_CYCLE_TIME,
> > + TCA_GATE_CYCLE_TIME_EXT,
> > + TCA_GATE_FLAGS,
> > + TCA_GATE_CLOCKID,
> > + __TCA_GATE_MAX,
> > +};
> > +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> > +
> > +#endif
> > diff --git a/net/sched/Kconfig b/net/sched/Kconfig index
> > bfbefb7bff9d..1314549c7567 100644
> > --- a/net/sched/Kconfig
> > +++ b/net/sched/Kconfig
> > @@ -981,6 +981,19 @@ config NET_ACT_CT
> > To compile this code as a module, choose M here: the
> > module will be called act_ct.
> >
> > +config NET_ACT_GATE
> > + tristate "Frame gate entry list control tc action"
> > + depends on NET_CLS_ACT
> > + help
> > + Say Y here to allow to control the ingress flow to be passed at
> > + specific time slot and be dropped at other specific time slot by
> > + the gate entry list. The manipulation will simulate the IEEE
> > + 802.1Qci stream gate control behavior.
> > +
> > + If unsure, say N.
> > + To compile this code as a module, choose M here: the
> > + module will be called act_gate.
> > +
> > config NET_IFE_SKBMARK
> > tristate "Support to encoding decoding skb mark on IFE action"
> > depends on NET_ACT_IFE
> > diff --git a/net/sched/Makefile b/net/sched/Makefile index
> > 31c367a6cd09..66bbf9a98f9e 100644
> > --- a/net/sched/Makefile
> > +++ b/net/sched/Makefile
> > @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
> > obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> > obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> > obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> > +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> > obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> > obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> > obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c new file mode
> > 100644 index 000000000000..e932f402b4f1
> > --- /dev/null
> > +++ b/net/sched/act_gate.c
> > @@ -0,0 +1,647 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/* Copyright 2020 NXP */
> > +
> > +#include <linux/module.h>
> > +#include <linux/types.h>
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +#include <linux/errno.h>
> > +#include <linux/skbuff.h>
> > +#include <linux/rtnetlink.h>
> > +#include <linux/init.h>
> > +#include <linux/slab.h>
> > +#include <net/act_api.h>
> > +#include <net/netlink.h>
> > +#include <net/pkt_cls.h>
> > +#include <net/tc_act/tc_gate.h>
> > +
> > +static unsigned int gate_net_id;
> > +static struct tc_action_ops act_gate_ops;
> > +
> > +static ktime_t gate_get_time(struct gate_action *gact) {
> > + ktime_t mono = ktime_get();
> > +
> > + switch (gact->tk_offset) {
> > + case TK_OFFS_MAX:
> > + return mono;
> > + default:
> > + return ktime_mono_to_any(mono, gact->tk_offset);
> > + }
> > +
> > + return KTIME_MAX;
> > +}
> > +
> > +static int gate_get_start_time(struct gate_action *gact, ktime_t
> > +*start) {
> > + struct tcf_gate_params *param = get_gate_param(gact);
> > + ktime_t now, base, cycle;
> > + u64 n;
> > +
> > + base = ns_to_ktime(param->tcfg_basetime);
> > + now = gate_get_time(gact);
> > +
> > + if (ktime_after(base, now)) {
> > + *start = base;
> > + return 0;
> > + }
> > +
> > + cycle = param->tcfg_cycletime;
> > +
> > + /* cycle time should not be zero */
> > + if (WARN_ON(!cycle))
> > + return -EFAULT;
>
> Looking at the init code it seems that this value can be set to 0 directly
> from netlink packet without further validation, which would allow user to
> trigger warning here.

Yes, will validate this earlier in init so the warning cannot be triggered from user space.

>
> > +
> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > + *start = ktime_add_ns(base, (n + 1) * cycle);
> > + return 0;
> > +}
> > +
> > +static void gate_start_timer(struct gate_action *gact, ktime_t start)
> > +{
> > + ktime_t expires;
> > +
> > + expires = hrtimer_get_expires(&gact->hitimer);
> > + if (expires == 0)
> > + expires = KTIME_MAX;
> > +
> > + start = min_t(ktime_t, start, expires);
> > +
> > + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS); }
> > +
> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer) {
> > + struct gate_action *gact = container_of(timer, struct gate_action,
> > + hitimer);
> > + struct tcf_gate_params *p = get_gate_param(gact);
> > + struct tcfg_gate_entry *next;
> > + ktime_t close_time, now;
> > +
> > + spin_lock(&gact->entry_lock);
> > +
> > + next = rcu_dereference_protected(gact->next_entry,
> > + lockdep_is_held(&gact->entry_lock));
> > +
> > + /* cycle start, clear pending bit, clear total octets */
> > + gact->current_gate_status = next->gate_state ?
> GATE_ACT_GATE_OPEN : 0;
> > + gact->current_entry_octets = 0;
> > + gact->current_max_octets = next->maxoctets;
> > +
> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> > + next->interval);
> > +
> > + close_time = gact->current_close_time;
> > +
> > + if (list_is_last(&next->list, &p->entries))
> > + next = list_first_entry(&p->entries,
> > + struct tcfg_gate_entry, list);
> > + else
> > + next = list_next_entry(next, list);
> > +
> > + now = gate_get_time(gact);
> > +
> > + if (ktime_after(now, close_time)) {
> > + ktime_t cycle, base;
> > + u64 n;
> > +
> > + cycle = p->tcfg_cycletime;
> > + base = ns_to_ktime(p->tcfg_basetime);
> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
> > + }
> > +
> > + rcu_assign_pointer(gact->next_entry, next);
> > + spin_unlock(&gact->entry_lock);
>
> I have couple of question about synchronization here:
>
> - Why do you need next_entry to be rcu pointer? It is only assigned here
> with entry_lock protection and in init code before action is visible to
> concurrent users. I don't see any unlocked rcu-protected readers here that
> could benefit from it.
>
> - Why create dedicated entry_lock instead of using already existing per-
> action tcf_lock?

Will try using tcf_lock and verify.
The thinking was: when the timer period arrives, walking the list and computing the next expiry takes a relatively long time, and the action function is busy under traffic. So a separate lock was used for the timer updates to avoid blocking the fast path.

>
> > +
> > + hrtimer_set_expires(&gact->hitimer, close_time);
> > +
> > + return HRTIMER_RESTART;
> > +}
> > +
> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> > + struct tcf_result *res) {
> > + struct tcf_gate *g = to_gate(a);
> > + struct gate_action *gact;
> > + int action;
> > +
> > + tcf_lastuse_update(&g->tcf_tm);
> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
> > +
> > + action = READ_ONCE(g->tcf_action);
> > + rcu_read_lock();
>
> Action fastpath is already rcu read lock protected, you don't need to
> manually obtain it.

Will be removed.

>
> > + gact = rcu_dereference_bh(g->actg);
> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
>
> Can't current_gate_status be concurrently modified by timer callback?
> This function doesn't use entry_lock to synchronize with timer.

Will try tcf_lock here as well.

>
> > + rcu_read_unlock();
> > + return action;
> > + }
> > +
> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
>
> ...and here
>
> > + goto drop;
> > +
> > + if (gact->current_max_octets >= 0) {
> > + gact->current_entry_octets += qdisc_pkt_len(skb);
> > + if (gact->current_entry_octets >
> > + gact->current_max_octets) {
>
> here also.
>
> > +
> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
>
> Please use tcf_action_inc_overlimit_qstats() and other wrappers for stats.
> Otherwise it will crash if user passes TCA_ACT_FLAGS_NO_PERCPU_STATS
> flag.

tcf_action_inc_overlimit_qstats() doesn't show the overlimit counts in the tc show command output. Is there anything else that needs to be done?

>
> > + goto drop;
> > + }
> > + }
> > + rcu_read_unlock();
> > +
> > + return action;
> > +drop:
> > + rcu_read_unlock();
> > + qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
> > + return TC_ACT_SHOT;
> > +}
> > +
> > +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1]
> = {
> > + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
> > + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
> > + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
> > + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
> > + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
> > +};
> > +
> > +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
> > + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
> > + .type = NLA_EXACT_LEN },
> > + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
> > + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
> > + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
> > + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
> > + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
> > + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
> > + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
> > +};
> > +
> > +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
> > + struct netlink_ext_ack *extack) {
> > + u32 interval = 0;
> > +
> > + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
> > +
> > + if (tb[TCA_GATE_ENTRY_INTERVAL])
> > + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
> > +
> > + if (interval == 0) {
> > + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> > + return -EINVAL;
> > + }
> > +
> > + entry->interval = interval;
> > +
> > + if (tb[TCA_GATE_ENTRY_IPV])
> > + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
> > + else
> > + entry->ipv = -1;
> > +
> > + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
> > + entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
> > + else
> > + entry->maxoctets = -1;
> > +
> > + return 0;
> > +}
> > +
> > +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry
> *entry,
> > + int index, struct netlink_ext_ack *extack) {
> > + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
> > + int err;
>
> > + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy,
> extack);
> > + if (err < 0) {
> > + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
> > + return -EINVAL;
> > + }
> > +
> > + entry->index = index;
> > +
> > + return fill_gate_entry(tb, entry, extack); }
> > +
> > +static int parse_gate_list(struct nlattr *list_attr,
> > + struct tcf_gate_params *sched,
> > + struct netlink_ext_ack *extack) {
> > + struct tcfg_gate_entry *entry, *e;
> > + struct nlattr *n;
> > + int err, rem;
> > + int i = 0;
> > +
> > + if (!list_attr)
> > + return -EINVAL;
> > +
> > + nla_for_each_nested(n, list_attr, rem) {
> > + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
> > + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
> > + continue;
> > + }
> > +
> > + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> > + if (!entry) {
> > + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
> > + err = -ENOMEM;
> > + goto release_list;
> > + }
> > +
> > + err = parse_gate_entry(n, entry, i, extack);
> > + if (err < 0) {
> > + kfree(entry);
> > + goto release_list;
> > + }
> > +
> > + list_add_tail(&entry->list, &sched->entries);
> > + i++;
> > + }
> > +
> > + sched->num_entries = i;
> > +
> > + return i;
> > +
> > +release_list:
> > + list_for_each_entry_safe(entry, e, &sched->entries, list) {
> > + list_del(&entry->list);
> > + kfree(entry);
> > + }
> > +
> > + return err;
> > +}
> > +
> > +static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > + struct nlattr *est, struct tc_action **a,
> > + int ovr, int bind, bool rtnl_held,
> > + struct tcf_proto *tp, u32 flags,
> > + struct netlink_ext_ack *extack) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > + enum tk_offsets tk_offset = TK_OFFS_TAI;
> > + struct nlattr *tb[TCA_GATE_MAX + 1];
> > + struct tcf_chain *goto_ch = NULL;
> > + struct tcfg_gate_entry *next;
> > + struct tcf_gate_params *p;
> > + struct gate_action *gact;
> > + s32 clockid = CLOCK_TAI;
> > + struct tc_gate *parm;
> > + struct tcf_gate *g;
> > + int ret = 0, err;
> > + u64 basetime = 0;
> > + u32 gflags = 0;
> > + s32 prio = -1;
> > + ktime_t start;
> > + u32 index;
> > +
> > + if (!nla)
> > + return -EINVAL;
> > +
> > + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
> > + if (err < 0)
> > + return err;
> > +
> > + if (!tb[TCA_GATE_PARMS])
> > + return -EINVAL;
> > + parm = nla_data(tb[TCA_GATE_PARMS]);
> > + index = parm->index;
> > + err = tcf_idr_check_alloc(tn, &index, a, bind);
> > + if (err < 0)
> > + return err;
> > +
> > + if (err && bind)
> > + return 0;
> > +
> > + if (!err) {
> > + ret = tcf_idr_create_from_flags(tn, index, est, a,
> > + &act_gate_ops, bind, flags);
> > + if (ret) {
> > + tcf_idr_cleanup(tn, index);
> > + return ret;
> > + }
> > +
> > + ret = ACT_P_CREATED;
> > + } else if (!ovr) {
> > + tcf_idr_release(*a, bind);
> > + return -EEXIST;
> > + }
> > +
> > + if (tb[TCA_GATE_PRIORITY])
> > + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
> > +
> > + if (tb[TCA_GATE_BASE_TIME])
> > + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
> > +
> > + if (tb[TCA_GATE_FLAGS])
> > + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
> > +
> > + if (tb[TCA_GATE_CLOCKID]) {
> > + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> > + switch (clockid) {
> > + case CLOCK_REALTIME:
> > + tk_offset = TK_OFFS_REAL;
> > + break;
> > + case CLOCK_MONOTONIC:
> > + tk_offset = TK_OFFS_MAX;
> > + break;
> > + case CLOCK_BOOTTIME:
> > + tk_offset = TK_OFFS_BOOT;
> > + break;
> > + case CLOCK_TAI:
> > + tk_offset = TK_OFFS_TAI;
> > + break;
> > + default:
> > + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> > + goto release_idr;
> > + }
> > + }
> > +
> > + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> > + if (err < 0)
> > + goto release_idr;
> > +
> > + g = to_gate(*a);
> > +
> > + gact = kzalloc(sizeof(*gact), GFP_KERNEL);
> > + if (!gact) {
> > + err = -ENOMEM;
> > + goto put_chain;
> > + }
> > +
> > + p = get_gate_param(gact);
> > +
> > + INIT_LIST_HEAD(&p->entries);
> > + if (tb[TCA_GATE_ENTRY_LIST]) {
> > + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> > + if (err < 0)
> > + goto release_mem;
> > + }
> > +
> > + if (tb[TCA_GATE_CYCLE_TIME]) {
> > + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
> > + } else {
> > + struct tcfg_gate_entry *entry;
> > + ktime_t cycle = 0;
> > +
> > + list_for_each_entry(entry, &p->entries, list)
> > + cycle = ktime_add_ns(cycle, entry->interval);
> > + p->tcfg_cycletime = cycle;
> > + }
> > +
> > + if (tb[TCA_GATE_CYCLE_TIME_EXT])
> > + p->tcfg_cycletime_ext =
> > + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
> > +
> > + p->tcfg_priority = prio;
> > + p->tcfg_basetime = basetime;
> > + p->tcfg_clockid = clockid;
> > + p->tcfg_flags = gflags;
> > +
> > + gact->tk_offset = tk_offset;
> > + spin_lock_init(&gact->entry_lock);
> > + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
> > + gact->hitimer.function = gate_timer_func;
> > +
> > + err = gate_get_start_time(gact, &start);
> > + if (err < 0) {
> > + NL_SET_ERR_MSG(extack,
> > + "Internal error: failed get start time");
> > + goto release_mem;
> > + }
> > +
> > + gact->current_close_time = start;
> > + gact->current_gate_status = GATE_ACT_GATE_OPEN |
> > + GATE_ACT_PENDING;
> > +
> > + next = list_first_entry(&p->entries, struct tcfg_gate_entry, list);
> > + rcu_assign_pointer(gact->next_entry, next);
> > +
> > + gate_start_timer(gact, start);
> > +
> > + spin_lock_bh(&g->tcf_lock);
> > + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
> > + gact = rcu_replace_pointer(g->actg, gact,
> > + lockdep_is_held(&g->tcf_lock));
> > + spin_unlock_bh(&g->tcf_lock);
> > +
> > + if (goto_ch)
> > + tcf_chain_put_by_act(goto_ch);
> > + if (gact)
> > + kfree_rcu(gact, rcu);
>
> This leaks entries. For example, tunnel key action implements
> tunnel_key_release_params() helper that is used by both init and release
> code. I guess that would be the best approach here as well.
>

Will refer to the tunnel_key action. Thanks for pointing that out.

> > +
> > + if (ret == ACT_P_CREATED)
> > + tcf_idr_insert(tn, *a);
> > + return ret;
> > +
> > +release_mem:
> > + kfree(gact);
> > +put_chain:
> > + if (goto_ch)
> > + tcf_chain_put_by_act(goto_ch);
> > +release_idr:
> > + tcf_idr_release(*a, bind);
> > + return err;
> > +}
> > +
> > +static void tcf_gate_cleanup(struct tc_action *a) {
> > + struct tcf_gate *g = to_gate(a);
> > + struct tcfg_gate_entry *entry, *n;
> > + struct tcf_gate_params *p;
> > + struct gate_action *gact;
> > +
> > + spin_lock_bh(&g->tcf_lock);
> > + gact = rcu_dereference_protected(g->actg,
> > + lockdep_is_held(&g->tcf_lock));
> > + hrtimer_cancel(&gact->hitimer);
> > +
> > + p = get_gate_param(gact);
> > + list_for_each_entry_safe(entry, n, &p->entries, list) {
> > + list_del(&entry->list);
> > + kfree(entry);
> > + }
> > + spin_unlock_bh(&g->tcf_lock);
> > +
> > + kfree_rcu(gact, rcu);
> > +}
> > +
> > +static int dumping_entry(struct sk_buff *skb,
> > + struct tcfg_gate_entry *entry) {
> > + struct nlattr *item;
> > +
> > + item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
> > + if (!item)
> > + return -ENOSPC;
> > +
> > + if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
> > + goto nla_put_failure;
> > +
> > + if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
> > + goto nla_put_failure;
> > +
> > + return nla_nest_end(skb, item);
> > +
> > +nla_put_failure:
> > + nla_nest_cancel(skb, item);
> > + return -1;
> > +}
> > +
> > +static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
> > + int bind, int ref) {
> > + unsigned char *b = skb_tail_pointer(skb);
> > + struct tcf_gate *g = to_gate(a);
> > + struct tc_gate opt = {
> > + .index = g->tcf_index,
> > + .refcnt = refcount_read(&g->tcf_refcnt) - ref,
> > + .bindcnt = atomic_read(&g->tcf_bindcnt) - bind,
> > + };
> > + struct tcfg_gate_entry *entry;
> > + struct gate_action *gact;
> > + struct tcf_gate_params *p;
> > + struct nlattr *entry_list;
> > + struct tcf_t t;
> > +
> > + spin_lock_bh(&g->tcf_lock);
> > + opt.action = g->tcf_action;
> > + gact = rcu_dereference_protected(g->actg,
> > + lockdep_is_held(&g->tcf_lock));
> > +
> > + p = get_gate_param(gact);
> > +
> > + if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
> > + p->tcfg_basetime, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
> > + p->tcfg_cycletime, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
> > + p->tcfg_cycletime_ext, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
> > + goto nla_put_failure;
> > +
> > + entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
> > + if (!entry_list)
> > + goto nla_put_failure;
> > +
> > + list_for_each_entry(entry, &p->entries, list) {
> > + if (dumping_entry(skb, entry) < 0)
> > + goto nla_put_failure;
> > + }
> > +
> > + nla_nest_end(skb, entry_list);
> > +
> > + tcf_tm_dump(&t, &g->tcf_tm);
> > + if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > + spin_unlock_bh(&g->tcf_lock);
> > +
> > + return skb->len;
> > +
> > +nla_put_failure:
> > + spin_unlock_bh(&g->tcf_lock);
> > + nlmsg_trim(skb, b);
> > + return -1;
> > +}
> > +
> > +static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
> > + struct netlink_callback *cb, int type,
> > + const struct tc_action_ops *ops,
> > + struct netlink_ext_ack *extack) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > +
> > + return tcf_generic_walker(tn, skb, cb, type, ops, extack); }
> > +
> > +static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
> > + u64 lastuse, bool hw) {
> > + struct tcf_gate *g = to_gate(a);
> > + struct tcf_t *tm = &g->tcf_tm;
> > +
> > + tcf_action_update_stats(a, bytes, packets, false, hw);
> > + tm->lastuse = max_t(u64, tm->lastuse, lastuse); }
> > +
> > +static int tcf_gate_search(struct net *net, struct tc_action **a, u32
> > +index) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > +
> > + return tcf_idr_search(tn, a, index); }
> > +
> > +static size_t tcf_gate_get_fill_size(const struct tc_action *act) {
> > + return nla_total_size(sizeof(struct tc_gate)); }
> > +
> > +static struct tc_action_ops act_gate_ops = {
> > + .kind = "gate",
> > + .id = TCA_ID_GATE,
> > + .owner = THIS_MODULE,
> > + .act = tcf_gate_act,
> > + .dump = tcf_gate_dump,
> > + .init = tcf_gate_init,
> > + .cleanup = tcf_gate_cleanup,
> > + .walk = tcf_gate_walker,
> > + .stats_update = tcf_gate_stats_update,
> > + .get_fill_size = tcf_gate_get_fill_size,
> > + .lookup = tcf_gate_search,
> > + .size = sizeof(struct gate_action),
> > +};
> > +
> > +static __net_init int gate_init_net(struct net *net) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > +
> > + return tc_action_net_init(net, tn, &act_gate_ops); }
> > +
> > +static void __net_exit gate_exit_net(struct list_head *net_list) {
> > + tc_action_net_exit(net_list, gate_net_id); }
> > +
> > +static struct pernet_operations gate_net_ops = {
> > + .init = gate_init_net,
> > + .exit_batch = gate_exit_net,
> > + .id = &gate_net_id,
> > + .size = sizeof(struct tc_action_net), };
> > +
> > +static int __init gate_init_module(void) {
> > + return tcf_register_action(&act_gate_ops, &gate_net_ops); }
> > +
> > +static void __exit gate_cleanup_module(void) {
> > + tcf_unregister_action(&act_gate_ops, &gate_net_ops); }
> > +
> > +module_init(gate_init_module);
> > +module_exit(gate_cleanup_module);
> > +MODULE_LICENSE("GPL v2");

Thanks a lot.

Br,
Po Liu

2020-04-23 03:29:01

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Nielsen,

> -----Original Message-----
> From: Allan W. Nielsen <[email protected]>
> Sent: April 23, 2020 3:19
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow
> action
>
> Caution: EXT Email
>
> Hi Po,
>
> Nice to see even more work on the TSN standards in the upstream kernel.
>
> On 22.04.2020 10:48, Po Liu wrote:
> >EXTERNAL EMAIL: Do not click links or open attachments unless you know
> >the content is safe
> >
> >Introduce a ingress frame gate control flow action.
> >Tc gate action does the work like this:
> >Assume there is a gate allow specified ingress frames can be passed at
> >specific time slot, and be dropped at specific time slot. Tc filter
> >chooses the ingress frames, and tc gate action would specify what slot
> >does these frames can be passed to device and what time slot would be
> >dropped.
> >Tc gate action would provide an entry list to tell how much time gate
> >keep open and how much time gate keep state close. Gate action also
> >assign a start time to tell when the entry list start. Then driver
> >would repeat the gate entry list cyclically.
> >For the software simulation, gate action requires the user assign a
> >time clock type.
> >
> >Below is the setting example in user space. Tc filter a stream source
> >ip address is 192.168.0.20 and gate action own two time slots. One is
> >last 200ms gate open let frame pass another is last 100ms gate close
> >let frames dropped. When the frames have passed total frames over
> >8000000 bytes, frames will be dropped in one 200000000ns time slot.
> >
> >> tc qdisc add dev eth0 ingress
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower src_ip 192.168.0.20 \
> > action gate index 2 clockid CLOCK_TAI \
> > sched-entry open 200000000 -1 8000000 \
> > sched-entry close 100000000 -1 -1
>
> First of all, it is a long time since I read the 802.1Qci and when I did it, it
> was a draft. So please let me know if I'm completly off here.
>
> I know you are focusing on the gate control in this patch serie, but I
> assume that you later will want to do the policing and flow-meter as well.
> And it could make sense to consider how all of this work toghether.

The gate action is the must-have feature, so this series covers that part, along with the stream filter implementation in the driver.
I described the whole design, with the flow meter mapped onto the police action and other thoughts, in the RFC version. You can look back at that version for reference.

>
> A common use-case for the policing is to have multiple rules pointing at
> the same policing instance. Maybe you want the sum of the traffic on 2
> ports to be limited to 100mbit. If you specify such action on the individual
> rule (like done with the gate), then you can not have two rules pointing at
> the same policer instance.
>
> Long storry short, have you considered if it would be better to do
> something like:
>
> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action psfp-id 42
>
> And then have some other function to configure the properties of psfp-id
> 42?

I think the psfp-id you mention covers the gate, stream filter, flow meter, and stream identify; each of them should have its own index.
The stream-identify index comes from the chain value. The gate index comes from the index parameter of the gate action, and two or more ports can share the same gate index. The same will hold for the flow-meter action.

>
>
> /Allan

Thanks!

Br,
Po Liu

2020-04-23 03:34:26

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Dave Taht,


> -----Original Message-----
> From: Dave Taht <[email protected]>
> Sent: April 23, 2020 4:06
> To: Vladimir Oltean <[email protected]>
> Cc: Allan W. Nielsen <[email protected]>; Po Liu
> <[email protected]>; David S. Miller <[email protected]>; lkml <linux-
> [email protected]>; netdev <[email protected]>; Vinicius Costa
> Gomes <[email protected]>; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>;
> [email protected]; [email protected]; Saeed Mahameed
> <[email protected]>; [email protected]; Jiri Pirko
> <[email protected]>; Ido Schimmel <[email protected]>; Alexandre
> Belloni <[email protected]>; Microchip Linux Driver Support
> <[email protected]>; Jakub Kicinski <[email protected]>;
> Jamal Hadi Salim <[email protected]>; Cong Wang
> <[email protected]>; [email protected]; Pablo
> Neira Ayuso <[email protected]>; [email protected]; Murali
> Karicheri <[email protected]>; Andre Guedes
> <[email protected]>; Stephen Hemminger
> <[email protected]>
> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow
> action
>
> Caution: EXT Email
>
> On Wed, Apr 22, 2020 at 12:31 PM Vladimir Oltean <[email protected]>
> wrote:
> >
> > Hi Allan,
> >
> > On Wed, 22 Apr 2020 at 22:20, Allan W. Nielsen
> > <[email protected]> wrote:
> > >
> > > Hi Po,
> > >
> > > Nice to see even more work on the TSN standards in the upstream
> kernel.
> > >
> > > On 22.04.2020 10:48, Po Liu wrote:
> > > >EXTERNAL EMAIL: Do not click links or open attachments unless you
> > > >know the content is safe
> > > >
> > > >Introduce a ingress frame gate control flow action.
> > > >Tc gate action does the work like this:
> > > >Assume there is a gate allow specified ingress frames can be passed
> > > >at specific time slot, and be dropped at specific time slot. Tc
> > > >filter chooses the ingress frames, and tc gate action would specify
> > > >what slot does these frames can be passed to device and what time
> > > >slot would be dropped.
> > > >Tc gate action would provide an entry list to tell how much time
> > > >gate keep open and how much time gate keep state close. Gate
> action
> > > >also assign a start time to tell when the entry list start. Then
> > > >driver would repeat the gate entry list cyclically.
> > > >For the software simulation, gate action requires the user assign a
> > > >time clock type.
> > > >
> > > >Below is the setting example in user space. Tc filter a stream
> > > >source ip address is 192.168.0.20 and gate action own two time
> > > >slots. One is last 200ms gate open let frame pass another is last
> > > >100ms gate close let frames dropped. When the frames have passed
> > > >total frames over 8000000 bytes, frames will be dropped in one
> 200000000ns time slot.
> > > >
> > > >> tc qdisc add dev eth0 ingress
> > > >
> > > >> tc filter add dev eth0 parent ffff: protocol ip \
> > > > flower src_ip 192.168.0.20 \
> > > > action gate index 2 clockid CLOCK_TAI \
> > > > sched-entry open 200000000 -1 8000000 \
> > > > sched-entry close 100000000 -1 -1
> > >
> > > First of all, it is a long time since I read the 802.1Qci and when I
> > > did it, it was a draft. So please let me know if I'm completly off here.
> > >
> > > I know you are focusing on the gate control in this patch serie, but
> > > I assume that you later will want to do the policing and flow-meter
> > > as well. And it could make sense to consider how all of this work
> > > toghether.
> > >
> > > A common use-case for the policing is to have multiple rules
> > > pointing at the same policing instance. Maybe you want the sum of
> > > the traffic on 2 ports to be limited to 100mbit. If you specify such
> > > action on the individual rule (like done with the gate), then you
> > > can not have two rules pointing at the same policer instance.
> > >
> > > Long storry short, have you considered if it would be better to do
> > > something like:
> > >
> > > tc filter add dev eth0 parent ffff: protocol ip \
> > > flower src_ip 192.168.0.20 \
> > > action psfp-id 42
> > >
> > > And then have some other function to configure the properties of
> > > psfp-id 42?
> > >
> > >
> > > /Allan
> > >
> >
> > It is very good that you brought it up though, since in my opinion too
> > it is a rather important aspect, and it seems that the fact this
> > feature is already designed-in was a bit too subtle.
> >
> > "psfp-id" is actually his "index" argument.
> >
> > You can actually do this:
> > tc filter add dev eth0 ingress \
> > flower skip_hw dst_mac 01:80:c2:00:00:0e \
> > action gate index 1 clockid CLOCK_TAI \
> > base-time 200000000000 \
> > sched-entry OPEN 200000000 -1 -1 \
> > sched-entry CLOSE 100000000 -1 -1 tc filter add dev eth0
> > ingress \
> > flower skip_hw dst_mac 01:80:c2:00:00:0f \
> > action gate index 1
> >
> > Then 2 filters get created with the same action:
> >
> > tc -s filter show dev swp2 ingress
> > filter protocol all pref 49151 flower chain 0 filter protocol all pref
> > 49151 flower chain 0 handle 0x1
> > dst_mac 01:80:c2:00:00:0f
> > skip_hw
> > not_in_hw
> > action order 1:
> > priority wildcard clockid TAI flags 0x6404f
> > base-time 200000000000 cycle-time 300000000
> > cycle-time-ext 0
> > number 0 gate-state open interval 200000000
> > ipv wildcard max-octets wildcard
> > number 1 gate-state close interval 100000000
> > ipv wildcard max-octets wildcard
> > pipe
> > index 2 ref 2 bind 2 installed 168 sec used 168 sec
> > Action statistics:
> > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> > backlog 0b 0p requeues 0
> >
> > filter protocol all pref 49152 flower chain 0 filter protocol all pref
> > 49152 flower chain 0 handle 0x1
> > dst_mac 01:80:c2:00:00:0e
> > skip_hw
> > not_in_hw
> > action order 1:
> > priority wildcard clockid TAI flags 0x6404f
> > base-time 200000000000 cycle-time 300000000
> > cycle-time-ext 0
> > number 0 gate-state open interval 200000000
> > ipv wildcard max-octets wildcard
> > number 1 gate-state close interval 100000000
> > ipv wildcard max-octets wildcard
> > pipe
> > index 2 ref 2 bind 2 installed 168 sec used 168 sec
> > Action statistics:
> > Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
> > backlog 0b 0p requeues 0
> >
> > Actually my only concern is that maybe this mechanism should (?) have
> > been more generic. At the moment, this patch series implements it via
> > a TCA_GATE_ENTRY_INDEX netlink attribute, so every action which
> wants
> > to be shared across filters needs to reinvent this wheel.
> >
> > Thoughts, everyone?
>
> I don't have anything valuable to add, aside from commenting this whole
> thing makes my brain hurt.

Thanks for expressing your thoughts.

>
> > Thanks,
> > -Vladimir
>
>
>
> --
> Make Music, Not War

Thanks!

>
> Dave Täht
> CTO, TekLibre, LLC
> http://www.teklibre.com
> Tel: 1-831-435-0729



Br,
Po Liu
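Editorial sketch of the sharing mechanism discussed in the message above: since "psfp-id" is just the action's `index`, the gate action could equally be created once in the generic tc action table and then bound from several filters. The command lines below are illustrative only (interface name and timing values are assumed, not taken from the thread), on the premise that the gate action registers like any other tc action:

```shell
# Create the gate action once, standalone (index 1 identifies it).
tc actions add action gate index 1 clockid CLOCK_TAI \
        base-time 200000000000 \
        sched-entry OPEN  200000000 -1 -1 \
        sched-entry CLOSE 100000000 -1 -1

# Bind the same action instance from two flower filters by index.
tc filter add dev eth0 ingress flower skip_hw \
        dst_mac 01:80:c2:00:00:0e action gate index 1
tc filter add dev eth0 ingress flower skip_hw \
        dst_mac 01:80:c2:00:00:0f action gate index 1
```

Both filters then reference one action instance, so they share one schedule and one set of counters, as the `ref 2 bind 2` in the dump output above indicates.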

2020-04-23 07:47:08

by Vlad Buslov

[permalink] [raw]
Subject: Re: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action


On Thu 23 Apr 2020 at 06:14, Po Liu <[email protected]> wrote:
> Hi Vlad Buslov,
>
>> -----Original Message-----
>> From: Vlad Buslov <[email protected]>
>> Sent: 2020年4月22日 21:23
>> To: Po Liu <[email protected]>
>> Cc: [email protected]; [email protected];
>> [email protected]; [email protected]; Claudiu Manoil
>> <[email protected]>; Vladimir Oltean <[email protected]>;
>> Alexandru Marginean <[email protected]>;
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow
>> action
>>
>> Caution: EXT Email
>>
>> Hi Po,
>>
>> On Wed 22 Apr 2020 at 05:48, Po Liu <[email protected]> wrote:
>> > Introduce an ingress frame gate control flow action.
>> > The tc gate action works like this:
>> > Assume there is a gate that allows specified ingress frames to pass
>> > at certain time slots and drops them at others. A tc filter selects
>> > the ingress frames, and the tc gate action specifies at which time
>> > slots those frames may be passed to the device and at which time
>> > slots they are dropped.
>> > The tc gate action provides an entry list describing how long the
>> > gate keeps open and how long it keeps closed. The gate action also
>> > assigns a start time telling when the entry list starts. The driver
>> > then repeats the gate entry list cyclically.
>> > For the software simulation, the gate action requires the user to
>> > assign a time clock type.
>> >
>> > Below is a setting example from user space. The tc filter matches a
>> > stream whose source IP address is 192.168.0.20, and the gate action
>> > owns two time slots: one lasting 200ms with the gate open, letting
>> > frames pass, and one lasting 100ms with the gate closed, dropping
>> > frames. When the passed frames exceed 8000000 bytes in total within
>> > one 200000000ns time slot, further frames in that slot are dropped.
>> >
>> >> tc qdisc add dev eth0 ingress
>> >
>> >> tc filter add dev eth0 parent ffff: protocol ip \
>> > flower src_ip 192.168.0.20 \
>> > action gate index 2 clockid CLOCK_TAI \
>> > sched-entry open 200000000 -1 8000000 \
>> > sched-entry close 100000000 -1 -1
>> >
>> >> tc chain del dev eth0 ingress chain 0
>> >
>> > "sched-entry" follows the taprio naming style. The gate state is
>> > "open"/"close", followed by the period in nanoseconds. The next item
>> > is the internal priority value, which selects the ingress queue
>> > frames should be put into; "-1" means wildcard. The last, optional
>> > value specifies the maximum number of MSDU octets that are permitted
>> > to pass the gate during the specified time interval.
>> > If base-time is not set it defaults to 0; as a result the start time
>> > becomes ((N + 1) * cycletime), the smallest such value in the future.
>> >
>> > The example below filters a stream whose destination mac address is
>> > 10:00:80:00:00:00 and whose IP type is ICMP, followed by the gate
>> > action. The gate action runs with a single close time slot, which
>> > means the gate always keeps closed. The total cycle time is
>> > 200000000ns. The base-time is calculated as:
>> >
>> > 1357000000000 + (N + 1) * cycletime
>> >
>> > and the first such value in the future becomes the start time. The
>> > cycletime here is 200000000ns for this case.
>> >
>> >> tc filter add dev eth0 parent ffff: protocol ip \
>> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
>> > action gate index 12 base-time 1357000000000 \
>> > sched-entry close 200000000 -1 -1 \
>> > clockid CLOCK_TAI
>> >
>> > Signed-off-by: Po Liu <[email protected]>
>> > ---
>> > include/net/tc_act/tc_gate.h | 54 +++
>> > include/uapi/linux/pkt_cls.h | 1 +
>> > include/uapi/linux/tc_act/tc_gate.h | 47 ++
>> > net/sched/Kconfig | 13 +
>> > net/sched/Makefile | 1 +
>> > net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
>> > 6 files changed, 763 insertions(+)
>> > create mode 100644 include/net/tc_act/tc_gate.h
>> > create mode 100644 include/uapi/linux/tc_act/tc_gate.h
>> > create mode 100644 net/sched/act_gate.c
>> >
>> > diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
>> > new file mode 100644
>> > index 000000000000..b0ace55b2aaa
>> > --- /dev/null
>> > +++ b/include/net/tc_act/tc_gate.h
>> > @@ -0,0 +1,54 @@
>> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> > +/* Copyright 2020 NXP */
>> > +
>> > +#ifndef __NET_TC_GATE_H
>> > +#define __NET_TC_GATE_H
>> > +
>> > +#include <net/act_api.h>
>> > +#include <linux/tc_act/tc_gate.h>
>> > +
>> > +struct tcfg_gate_entry {
>> > + int index;
>> > + u8 gate_state;
>> > + u32 interval;
>> > + s32 ipv;
>> > + s32 maxoctets;
>> > + struct list_head list;
>> > +};
>> > +
>> > +struct tcf_gate_params {
>> > + s32 tcfg_priority;
>> > + u64 tcfg_basetime;
>> > + u64 tcfg_cycletime;
>> > + u64 tcfg_cycletime_ext;
>> > + u32 tcfg_flags;
>> > + s32 tcfg_clockid;
>> > + size_t num_entries;
>> > + struct list_head entries;
>> > +};
>> > +
>> > +#define GATE_ACT_GATE_OPEN BIT(0)
>> > +#define GATE_ACT_PENDING BIT(1)
>> > +struct gate_action {
>> > + struct tcf_gate_params param;
>> > + spinlock_t entry_lock;
>> > + u8 current_gate_status;
>> > + ktime_t current_close_time;
>> > + u32 current_entry_octets;
>> > + s32 current_max_octets;
>> > + struct tcfg_gate_entry __rcu *next_entry;
>> > + struct hrtimer hitimer;
>> > + enum tk_offsets tk_offset;
>> > + struct rcu_head rcu;
>> > +};
>> > +
>> > +struct tcf_gate {
>> > + struct tc_action common;
>> > + struct gate_action __rcu *actg;
>> > +};
>> > +#define to_gate(a) ((struct tcf_gate *)a)
>> > +
>> > +#define get_gate_param(act) ((struct tcf_gate_params *)act)
>> > +#define get_gate_action(p) ((struct gate_action *)p)
>> > +
>> > +#endif
>> > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> > index 9f06d29cab70..fc672b232437 100644
>> > --- a/include/uapi/linux/pkt_cls.h
>> > +++ b/include/uapi/linux/pkt_cls.h
>> > @@ -134,6 +134,7 @@ enum tca_id {
>> > TCA_ID_CTINFO,
>> > TCA_ID_MPLS,
>> > TCA_ID_CT,
>> > + TCA_ID_GATE,
>> > /* other actions go here */
>> > __TCA_ID_MAX = 255
>> > };
>> > diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
>> > new file mode 100644
>> > index 000000000000..f214b3a6d44f
>> > --- /dev/null
>> > +++ b/include/uapi/linux/tc_act/tc_gate.h
>> > @@ -0,0 +1,47 @@
>> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> > +/* Copyright 2020 NXP */
>> > +
>> > +#ifndef __LINUX_TC_GATE_H
>> > +#define __LINUX_TC_GATE_H
>> > +
>> > +#include <linux/pkt_cls.h>
>> > +
>> > +struct tc_gate {
>> > + tc_gen;
>> > +};
>> > +
>> > +enum {
>> > + TCA_GATE_ENTRY_UNSPEC,
>> > + TCA_GATE_ENTRY_INDEX,
>> > + TCA_GATE_ENTRY_GATE,
>> > + TCA_GATE_ENTRY_INTERVAL,
>> > + TCA_GATE_ENTRY_IPV,
>> > + TCA_GATE_ENTRY_MAX_OCTETS,
>> > + __TCA_GATE_ENTRY_MAX,
>> > +};
>> > +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
>> > +
>> > +enum {
>> > + TCA_GATE_ONE_ENTRY_UNSPEC,
>> > + TCA_GATE_ONE_ENTRY,
>> > + __TCA_GATE_ONE_ENTRY_MAX,
>> > +};
>> > +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
>> > +
>> > +enum {
>> > + TCA_GATE_UNSPEC,
>> > + TCA_GATE_TM,
>> > + TCA_GATE_PARMS,
>> > + TCA_GATE_PAD,
>> > + TCA_GATE_PRIORITY,
>> > + TCA_GATE_ENTRY_LIST,
>> > + TCA_GATE_BASE_TIME,
>> > + TCA_GATE_CYCLE_TIME,
>> > + TCA_GATE_CYCLE_TIME_EXT,
>> > + TCA_GATE_FLAGS,
>> > + TCA_GATE_CLOCKID,
>> > + __TCA_GATE_MAX,
>> > +};
>> > +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
>> > +
>> > +#endif
>> > diff --git a/net/sched/Kconfig b/net/sched/Kconfig
>> > index bfbefb7bff9d..1314549c7567 100644
>> > --- a/net/sched/Kconfig
>> > +++ b/net/sched/Kconfig
>> > @@ -981,6 +981,19 @@ config NET_ACT_CT
>> > To compile this code as a module, choose M here: the
>> > module will be called act_ct.
>> >
>> > +config NET_ACT_GATE
>> > + tristate "Frame gate entry list control tc action"
>> > + depends on NET_CLS_ACT
>> > + help
>> > + Say Y here to allow to control the ingress flow to be passed at
>> > + specific time slot and be dropped at other specific time slot by
>> > + the gate entry list. The manipulation will simulate the IEEE
>> > + 802.1Qci stream gate control behavior.
>> > +
>> > + If unsure, say N.
>> > + To compile this code as a module, choose M here: the
>> > + module will be called act_gate.
>> > +
>> > config NET_IFE_SKBMARK
>> > tristate "Support to encoding decoding skb mark on IFE action"
>> > depends on NET_ACT_IFE
>> > diff --git a/net/sched/Makefile b/net/sched/Makefile
>> > index 31c367a6cd09..66bbf9a98f9e 100644
>> > --- a/net/sched/Makefile
>> > +++ b/net/sched/Makefile
>> > @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
>> > obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
>> > obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
>> > obj-$(CONFIG_NET_ACT_CT) += act_ct.o
>> > +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
>> > obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
>> > obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
>> > obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
>> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
>> > new file mode 100644
>> > index 000000000000..e932f402b4f1
>> > --- /dev/null
>> > +++ b/net/sched/act_gate.c
>> > @@ -0,0 +1,647 @@
>> > +// SPDX-License-Identifier: GPL-2.0-or-later
>> > +/* Copyright 2020 NXP */
>> > +
>> > +#include <linux/module.h>
>> > +#include <linux/types.h>
>> > +#include <linux/kernel.h>
>> > +#include <linux/string.h>
>> > +#include <linux/errno.h>
>> > +#include <linux/skbuff.h>
>> > +#include <linux/rtnetlink.h>
>> > +#include <linux/init.h>
>> > +#include <linux/slab.h>
>> > +#include <net/act_api.h>
>> > +#include <net/netlink.h>
>> > +#include <net/pkt_cls.h>
>> > +#include <net/tc_act/tc_gate.h>
>> > +
>> > +static unsigned int gate_net_id;
>> > +static struct tc_action_ops act_gate_ops;
>> > +
>> > +static ktime_t gate_get_time(struct gate_action *gact)
>> > +{
>> > + ktime_t mono = ktime_get();
>> > +
>> > + switch (gact->tk_offset) {
>> > + case TK_OFFS_MAX:
>> > + return mono;
>> > + default:
>> > + return ktime_mono_to_any(mono, gact->tk_offset);
>> > + }
>> > +
>> > + return KTIME_MAX;
>> > +}
>> > +
>> > +static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
>> > +{
>> > + struct tcf_gate_params *param = get_gate_param(gact);
>> > + ktime_t now, base, cycle;
>> > + u64 n;
>> > +
>> > + base = ns_to_ktime(param->tcfg_basetime);
>> > + now = gate_get_time(gact);
>> > +
>> > + if (ktime_after(base, now)) {
>> > + *start = base;
>> > + return 0;
>> > + }
>> > +
>> > + cycle = param->tcfg_cycletime;
>> > +
>> > + /* cycle time should not be zero */
>> > + if (WARN_ON(!cycle))
>> > + return -EFAULT;
>>
>> Looking at the init code it seems that this value can be set to 0 directly
>> from netlink packet without further validation, which would allow user to
>> trigger warning here.
>
> Yes, will add the validation earlier so a zero cycle time can't reach this point.
>
>>
>> > +
>> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
>> > + *start = ktime_add_ns(base, (n + 1) * cycle);
>> > + return 0;
>> > +}
>> > +
>> > +static void gate_start_timer(struct gate_action *gact, ktime_t start)
>> > +{
>> > + ktime_t expires;
>> > +
>> > + expires = hrtimer_get_expires(&gact->hitimer);
>> > + if (expires == 0)
>> > + expires = KTIME_MAX;
>> > +
>> > + start = min_t(ktime_t, start, expires);
>> > +
>> > + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
>> > +}
>> > +
>> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
>> > +{
>> > + struct gate_action *gact = container_of(timer, struct gate_action,
>> > + hitimer);
>> > + struct tcf_gate_params *p = get_gate_param(gact);
>> > + struct tcfg_gate_entry *next;
>> > + ktime_t close_time, now;
>> > +
>> > + spin_lock(&gact->entry_lock);
>> > +
>> > + next = rcu_dereference_protected(gact->next_entry,
>> > + lockdep_is_held(&gact->entry_lock));
>> > +
>> > + /* cycle start, clear pending bit, clear total octets */
>> > + gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
>> > + gact->current_entry_octets = 0;
>> > + gact->current_max_octets = next->maxoctets;
>> > +
>> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
>> > + next->interval);
>> > +
>> > + close_time = gact->current_close_time;
>> > +
>> > + if (list_is_last(&next->list, &p->entries))
>> > + next = list_first_entry(&p->entries,
>> > + struct tcfg_gate_entry, list);
>> > + else
>> > + next = list_next_entry(next, list);
>> > +
>> > + now = gate_get_time(gact);
>> > +
>> > + if (ktime_after(now, close_time)) {
>> > + ktime_t cycle, base;
>> > + u64 n;
>> > +
>> > + cycle = p->tcfg_cycletime;
>> > + base = ns_to_ktime(p->tcfg_basetime);
>> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
>> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
>> > + }
>> > +
>> > + rcu_assign_pointer(gact->next_entry, next);
>> > + spin_unlock(&gact->entry_lock);
>>
>> I have couple of question about synchronization here:
>>
>> - Why do you need next_entry to be rcu pointer? It is only assigned here
>> with entry_lock protection and in init code before action is visible to
>> concurrent users. I don't see any unlocked rcu-protected readers here that
>> could benefit from it.
>>
>> - Why create dedicated entry_lock instead of using already existing per-
>> action tcf_lock?
>
> Will try using tcf_lock and verify.
> The reasoning was that when the timer fires, walking through the entry list and computing the next expiry takes comparatively long, and the action function is busy under traffic. So a separate lock was used to keep the timer work from blocking the fastpath.
>
>>
>> > +
>> > + hrtimer_set_expires(&gact->hitimer, close_time);
>> > +
>> > + return HRTIMER_RESTART;
>> > +}
>> > +
>> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
>> > + struct tcf_result *res)
>> > +{
>> > + struct tcf_gate *g = to_gate(a);
>> > + struct gate_action *gact;
>> > + int action;
>> > +
>> > + tcf_lastuse_update(&g->tcf_tm);
>> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
>> > +
>> > + action = READ_ONCE(g->tcf_action);
>> > + rcu_read_lock();
>>
>> Action fastpath is already rcu read lock protected, you don't need to
>> manually obtain it.
>
> Will be removed.
>
>>
>> > + gact = rcu_dereference_bh(g->actg);
>> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
>>
>> Can't current_gate_status be concurrently modified by timer callback?
>> This function doesn't use entry_lock to synchronize with timer.
>
> Will try tcf_lock here as well.
>
>>
>> > + rcu_read_unlock();
>> > + return action;
>> > + }
>> > +
>> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
>>
>> ...and here
>>
>> > + goto drop;
>> > +
>> > + if (gact->current_max_octets >= 0) {
>> > + gact->current_entry_octets += qdisc_pkt_len(skb);
>> > + if (gact->current_entry_octets >
>> > + gact->current_max_octets) {
>>
>> here also.
>>
>> > +
>> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
>>
>> Please use tcf_action_inc_overlimit_qstats() and other wrappers for stats.
>> Otherwise it will crash if user passes TCA_ACT_FLAGS_NO_PERCPU_STATS
>> flag.
>
> With tcf_action_inc_overlimit_qstats() the overlimit counts don't show up in the 'tc -s' output. Is there anything else that needs to be done?

What do you mean? Internally tcf_action_inc_overlimit_qstats() just
calls qstats_overlimit_inc, if cpu_qstats percpu counter is not NULL:


	if (likely(a->cpu_qstats)) {
		qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
		return;
	}

Is there a subtle bug somewhere in this function?
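For reference, the rest of that helper (paraphrased from net/sched/act_api.c around this thread's timeframe; not verbatim, and not compilable outside the kernel) falls back to the lock-protected per-action counter when the percpu pointer is NULL, which is what makes it safe under TCA_ACT_FLAGS_NO_PERCPU_STATS:

```c
/* Paraphrased sketch of tcf_action_inc_overlimit_qstats(): with
 * TCA_ACT_FLAGS_NO_PERCPU_STATS set, a->cpu_qstats is NULL, so the
 * helper takes tcfa_lock and bumps the plain per-action counter
 * instead of dereferencing a NULL percpu pointer. */
void tcf_action_inc_overlimit_qstats(struct tc_action *a)
{
	if (likely(a->cpu_qstats)) {
		qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
		return;
	}
	spin_lock(&a->tcfa_lock);
	qstats_overlimit_inc(&a->tcfa_qstats);
	spin_unlock(&a->tcfa_lock);
}
```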

>
>>
>> > + goto drop;
>> > + }
>> > + }
>> > + rcu_read_unlock();
>> > +
>> > + return action;
>> > +drop:
>> > + rcu_read_unlock();
>> > + qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
>> > + return TC_ACT_SHOT;
>> > +}
>> > +
>> > +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
>> > + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
>> > + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
>> > + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
>> > + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
>> > + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
>> > +};
>> > +
>> > +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
>> > + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
>> > + .type = NLA_EXACT_LEN },
>> > + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
>> > + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
>> > + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
>> > + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
>> > + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
>> > + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
>> > + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
>> > +};
>> > +
>> > +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
>> > + struct netlink_ext_ack *extack)
>> > +{
>> > + u32 interval = 0;
>> > +
>> > + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
>> > +
>> > + if (tb[TCA_GATE_ENTRY_INTERVAL])
>> > + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
>> > +
>> > + if (interval == 0) {
>> > + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
>> > + return -EINVAL;
>> > + }
>> > +
>> > + entry->interval = interval;
>> > +
>> > + if (tb[TCA_GATE_ENTRY_IPV])
>> > + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
>> > + else
>> > + entry->ipv = -1;
>> > +
>> > + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
>> > + entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
>> > + else
>> > + entry->maxoctets = -1;
>> > +
>> > + return 0;
>> > +}
>> > +
>> > +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
>> > + int index, struct netlink_ext_ack *extack)
>> > +{
>> > + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
>> > + int err;
>>
>> > + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
>> > + if (err < 0) {
>> > + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
>> > + return -EINVAL;
>> > + }
>> > +
>> > + entry->index = index;
>> > +
>> > + return fill_gate_entry(tb, entry, extack);
>> > +}
>> > +
>> > +static int parse_gate_list(struct nlattr *list_attr,
>> > + struct tcf_gate_params *sched,
>> > + struct netlink_ext_ack *extack)
>> > +{
>> > + struct tcfg_gate_entry *entry, *e;
>> > + struct nlattr *n;
>> > + int err, rem;
>> > + int i = 0;
>> > +
>> > + if (!list_attr)
>> > + return -EINVAL;
>> > +
>> > + nla_for_each_nested(n, list_attr, rem) {
>> > + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
>> > + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
>> > + continue;
>> > + }
>> > +
>> > + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
>> > + if (!entry) {
>> > + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
>> > + err = -ENOMEM;
>> > + goto release_list;
>> > + }
>> > +
>> > + err = parse_gate_entry(n, entry, i, extack);
>> > + if (err < 0) {
>> > + kfree(entry);
>> > + goto release_list;
>> > + }
>> > +
>> > + list_add_tail(&entry->list, &sched->entries);
>> > + i++;
>> > + }
>> > +
>> > + sched->num_entries = i;
>> > +
>> > + return i;
>> > +
>> > +release_list:
>> > + list_for_each_entry_safe(entry, e, &sched->entries, list) {
>> > + list_del(&entry->list);
>> > + kfree(entry);
>> > + }
>> > +
>> > + return err;
>> > +}
>> > +
>> > +static int tcf_gate_init(struct net *net, struct nlattr *nla,
>> > + struct nlattr *est, struct tc_action **a,
>> > + int ovr, int bind, bool rtnl_held,
>> > + struct tcf_proto *tp, u32 flags,
>> > + struct netlink_ext_ack *extack)
>> > +{
>> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
>> > + enum tk_offsets tk_offset = TK_OFFS_TAI;
>> > + struct nlattr *tb[TCA_GATE_MAX + 1];
>> > + struct tcf_chain *goto_ch = NULL;
>> > + struct tcfg_gate_entry *next;
>> > + struct tcf_gate_params *p;
>> > + struct gate_action *gact;
>> > + s32 clockid = CLOCK_TAI;
>> > + struct tc_gate *parm;
>> > + struct tcf_gate *g;
>> > + int ret = 0, err;
>> > + u64 basetime = 0;
>> > + u32 gflags = 0;
>> > + s32 prio = -1;
>> > + ktime_t start;
>> > + u32 index;
>> > +
>> > + if (!nla)
>> > + return -EINVAL;
>> > +
>> > + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
>> > + if (err < 0)
>> > + return err;
>> > +
>> > + if (!tb[TCA_GATE_PARMS])
>> > + return -EINVAL;
>> > + parm = nla_data(tb[TCA_GATE_PARMS]);
>> > + index = parm->index;
>> > + err = tcf_idr_check_alloc(tn, &index, a, bind);
>> > + if (err < 0)
>> > + return err;
>> > +
>> > + if (err && bind)
>> > + return 0;
>> > +
>> > + if (!err) {
>> > + ret = tcf_idr_create_from_flags(tn, index, est, a,
>> > + &act_gate_ops, bind, flags);
>> > + if (ret) {
>> > + tcf_idr_cleanup(tn, index);
>> > + return ret;
>> > + }
>> > +
>> > + ret = ACT_P_CREATED;
>> > + } else if (!ovr) {
>> > + tcf_idr_release(*a, bind);
>> > + return -EEXIST;
>> > + }
>> > +
>> > + if (tb[TCA_GATE_PRIORITY])
>> > + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
>> > +
>> > + if (tb[TCA_GATE_BASE_TIME])
>> > + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
>> > +
>> > + if (tb[TCA_GATE_FLAGS])
>> > + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
>> > +
>> > + if (tb[TCA_GATE_CLOCKID]) {
>> > + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
>> > + switch (clockid) {
>> > + case CLOCK_REALTIME:
>> > + tk_offset = TK_OFFS_REAL;
>> > + break;
>> > + case CLOCK_MONOTONIC:
>> > + tk_offset = TK_OFFS_MAX;
>> > + break;
>> > + case CLOCK_BOOTTIME:
>> > + tk_offset = TK_OFFS_BOOT;
>> > + break;
>> > + case CLOCK_TAI:
>> > + tk_offset = TK_OFFS_TAI;
>> > + break;
>> > + default:
>> > + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
>> > + goto release_idr;
>> > + }
>> > + }
>> > +
>> > + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
>> > + if (err < 0)
>> > + goto release_idr;
>> > +
>> > + g = to_gate(*a);
>> > +
>> > + gact = kzalloc(sizeof(*gact), GFP_KERNEL);
>> > + if (!gact) {
>> > + err = -ENOMEM;
>> > + goto put_chain;
>> > + }
>> > +
>> > + p = get_gate_param(gact);
>> > +
>> > + INIT_LIST_HEAD(&p->entries);
>> > + if (tb[TCA_GATE_ENTRY_LIST]) {
>> > + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
>> > + if (err < 0)
>> > + goto release_mem;
>> > + }
>> > +
>> > + if (tb[TCA_GATE_CYCLE_TIME]) {
>> > + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
>> > + } else {
>> > + struct tcfg_gate_entry *entry;
>> > + ktime_t cycle = 0;
>> > +
>> > + list_for_each_entry(entry, &p->entries, list)
>> > + cycle = ktime_add_ns(cycle, entry->interval);
>> > + p->tcfg_cycletime = cycle;
>> > + }
>> > +
>> > + if (tb[TCA_GATE_CYCLE_TIME_EXT])
>> > + p->tcfg_cycletime_ext =
>> > + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
>> > +
>> > + p->tcfg_priority = prio;
>> > + p->tcfg_basetime = basetime;
>> > + p->tcfg_clockid = clockid;
>> > + p->tcfg_flags = gflags;
>> > +
>> > + gact->tk_offset = tk_offset;
>> > + spin_lock_init(&gact->entry_lock);
>> > + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
>> > + gact->hitimer.function = gate_timer_func;
>> > +
>> > + err = gate_get_start_time(gact, &start);
>> > + if (err < 0) {
>> > + NL_SET_ERR_MSG(extack,
>> > + "Internal error: failed get start time");
>> > + goto release_mem;
>> > + }
>> > +
>> > + gact->current_close_time = start;
>> > + gact->current_gate_status = GATE_ACT_GATE_OPEN |
>> > + GATE_ACT_PENDING;
>> > +
>> > + next = list_first_entry(&p->entries, struct tcfg_gate_entry, list);
>> > + rcu_assign_pointer(gact->next_entry, next);
>> > +
>> > + gate_start_timer(gact, start);
>> > +
>> > + spin_lock_bh(&g->tcf_lock);
>> > + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
>> > + gact = rcu_replace_pointer(g->actg, gact,
>> > + lockdep_is_held(&g->tcf_lock));
>> > + spin_unlock_bh(&g->tcf_lock);
>> > +
>> > + if (goto_ch)
>> > + tcf_chain_put_by_act(goto_ch);
>> > + if (gact)
>> > + kfree_rcu(gact, rcu);
>>
>> This leaks entries. For example, tunnel key action implements
>> tunnel_key_release_params() helper that is used by both init and release
>> code. I guess that would be the best approach here as well.
>>
>
> Will refer to the tunnel key action. Thanks for pointing that out.
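A rough idea of what a shared release helper for the gate action might look like, modeled on tunnel_key_release_params(); the name and shape are hypothetical, and this is a kernel-context sketch, not compilable standalone:

```c
/* Hypothetical gate_release_params(): free the schedule entries and
 * the parameter block in one place, so that both the init error path
 * and ->cleanup() can share it instead of leaking entries. */
static void gate_release_params(struct gate_action *gact)
{
	struct tcfg_gate_entry *entry, *tmp;

	if (!gact)
		return;
	list_for_each_entry_safe(entry, tmp, &gact->param.entries, list) {
		list_del(&entry->list);
		kfree(entry);
	}
	kfree_rcu(gact, rcu);
}
```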
>
>> > +
>> > + if (ret == ACT_P_CREATED)
>> > + tcf_idr_insert(tn, *a);
>> > + return ret;
>> > +
>> > +release_mem:
>> > + kfree(gact);
>> > +put_chain:
>> > + if (goto_ch)
>> > + tcf_chain_put_by_act(goto_ch);
>> > +release_idr:
>> > + tcf_idr_release(*a, bind);
>> > + return err;
>> > +}
>> > +
>> > +static void tcf_gate_cleanup(struct tc_action *a)
>> > +{
>> > + struct tcf_gate *g = to_gate(a);
>> > + struct tcfg_gate_entry *entry, *n;
>> > + struct tcf_gate_params *p;
>> > + struct gate_action *gact;
>> > +
>> > + spin_lock_bh(&g->tcf_lock);
>> > + gact = rcu_dereference_protected(g->actg,
>> > + lockdep_is_held(&g->tcf_lock));
>> > + hrtimer_cancel(&gact->hitimer);
>> > +
>> > + p = get_gate_param(gact);
>> > + list_for_each_entry_safe(entry, n, &p->entries, list) {
>> > + list_del(&entry->list);
>> > + kfree(entry);
>> > + }
>> > + spin_unlock_bh(&g->tcf_lock);
>> > +
>> > + kfree_rcu(gact, rcu);
>> > +}
>> > +
>> > +static int dumping_entry(struct sk_buff *skb,
>> > + struct tcfg_gate_entry *entry)
>> > +{
>> > + struct nlattr *item;
>> > +
>> > + item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
>> > + if (!item)
>> > + return -ENOSPC;
>> > +
>> > + if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
>> > + goto nla_put_failure;
>> > +
>> > + if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
>> > + goto nla_put_failure;
>> > +
>> > + return nla_nest_end(skb, item);
>> > +
>> > +nla_put_failure:
>> > + nla_nest_cancel(skb, item);
>> > + return -1;
>> > +}
>> > +
>> > +static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
>> > + int bind, int ref)
>> > +{
>> > + unsigned char *b = skb_tail_pointer(skb);
>> > + struct tcf_gate *g = to_gate(a);
>> > + struct tc_gate opt = {
>> > + .index = g->tcf_index,
>> > + .refcnt = refcount_read(&g->tcf_refcnt) - ref,
>> > + .bindcnt = atomic_read(&g->tcf_bindcnt) - bind,
>> > + };
>> > + struct tcfg_gate_entry *entry;
>> > + struct gate_action *gact;
>> > + struct tcf_gate_params *p;
>> > + struct nlattr *entry_list;
>> > + struct tcf_t t;
>> > +
>> > + spin_lock_bh(&g->tcf_lock);
>> > + opt.action = g->tcf_action;
>> > + gact = rcu_dereference_protected(g->actg,
>> > + lockdep_is_held(&g->tcf_lock));
>> > +
>> > + p = get_gate_param(gact);
>> > +
>> > + if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
>> > + p->tcfg_basetime, TCA_GATE_PAD))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
>> > + p->tcfg_cycletime, TCA_GATE_PAD))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
>> > + p->tcfg_cycletime_ext, TCA_GATE_PAD))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
>> > + goto nla_put_failure;
>> > +
>> > + if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
>> > + goto nla_put_failure;
>> > +
>> > + entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
>> > + if (!entry_list)
>> > + goto nla_put_failure;
>> > +
>> > + list_for_each_entry(entry, &p->entries, list) {
>> > + if (dumping_entry(skb, entry) < 0)
>> > + goto nla_put_failure;
>> > + }
>> > +
>> > + nla_nest_end(skb, entry_list);
>> > +
>> > + tcf_tm_dump(&t, &g->tcf_tm);
>> > + if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
>> > + goto nla_put_failure;
>> > + spin_unlock_bh(&g->tcf_lock);
>> > +
>> > + return skb->len;
>> > +
>> > +nla_put_failure:
>> > + spin_unlock_bh(&g->tcf_lock);
>> > + nlmsg_trim(skb, b);
>> > + return -1;
>> > +}
>> > +
>> > +static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
>> > + struct netlink_callback *cb, int type,
>> > + const struct tc_action_ops *ops,
>> > + struct netlink_ext_ack *extack)
>> > +{
>> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
>> > +
>> > + return tcf_generic_walker(tn, skb, cb, type, ops, extack);
>> > +}
>> > +
>> > +static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
>> > + u64 lastuse, bool hw)
>> > +{
>> > + struct tcf_gate *g = to_gate(a);
>> > + struct tcf_t *tm = &g->tcf_tm;
>> > +
>> > + tcf_action_update_stats(a, bytes, packets, false, hw);
>> > + tm->lastuse = max_t(u64, tm->lastuse, lastuse);
>> > +}
>> > +
>> > +static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
>> > +{
>> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
>> > +
>> > + return tcf_idr_search(tn, a, index);
>> > +}
>> > +
>> > +static size_t tcf_gate_get_fill_size(const struct tc_action *act)
>> > +{
>> > + return nla_total_size(sizeof(struct tc_gate));
>> > +}
>> > +
>> > +static struct tc_action_ops act_gate_ops = {
>> > + .kind = "gate",
>> > + .id = TCA_ID_GATE,
>> > + .owner = THIS_MODULE,
>> > + .act = tcf_gate_act,
>> > + .dump = tcf_gate_dump,
>> > + .init = tcf_gate_init,
>> > + .cleanup = tcf_gate_cleanup,
>> > + .walk = tcf_gate_walker,
>> > + .stats_update = tcf_gate_stats_update,
>> > + .get_fill_size = tcf_gate_get_fill_size,
>> > + .lookup = tcf_gate_search,
>> > + .size = sizeof(struct gate_action),
>> > +};
>> > +
>> > +static __net_init int gate_init_net(struct net *net)
>> > +{
>> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
>> > +
>> > + return tc_action_net_init(net, tn, &act_gate_ops);
>> > +}
>> > +
>> > +static void __net_exit gate_exit_net(struct list_head *net_list)
>> > +{
>> > + tc_action_net_exit(net_list, gate_net_id);
>> > +}
>> > +
>> > +static struct pernet_operations gate_net_ops = {
>> > + .init = gate_init_net,
>> > + .exit_batch = gate_exit_net,
>> > + .id = &gate_net_id,
>> > + .size = sizeof(struct tc_action_net),
>> > +};
>> > +
>> > +static int __init gate_init_module(void)
>> > +{
>> > + return tcf_register_action(&act_gate_ops, &gate_net_ops);
>> > +}
>> > +
>> > +static void __exit gate_cleanup_module(void)
>> > +{
>> > + tcf_unregister_action(&act_gate_ops, &gate_net_ops);
>> > +}
>> > +
>> > +module_init(gate_init_module);
>> > +module_exit(gate_cleanup_module);
>> > +MODULE_LICENSE("GPL v2");
>
> Thanks a lot.
>
> Br,
> Po Liu

2020-04-23 08:34:49

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action




> -----Original Message-----
> From: Vlad Buslov <[email protected]>
> Sent: 2020年4月23日 15:43
> To: Po Liu <[email protected]>
> Cc: Vlad Buslov <[email protected]>; [email protected]; linux-
> [email protected]; [email protected];
> [email protected]; Claudiu Manoil <[email protected]>;
> Vladimir Oltean <[email protected]>; Alexandru Marginean
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control
> flow action
>
> Caution: EXT Email
>
> On Thu 23 Apr 2020 at 06:14, Po Liu <[email protected]> wrote:
> > Hi Vlad Buslov,
> >
> >> -----Original Message-----
> >> From: Vlad Buslov <[email protected]>
> >> Sent: 2020年4月22日 21:23
> >> To: Po Liu <[email protected]>
> >> Cc: [email protected]; [email protected];
> >> [email protected]; [email protected]; Claudiu Manoil
> >> <[email protected]>; Vladimir Oltean
> <[email protected]>;
> >> Alexandru Marginean <[email protected]>;
> >> [email protected]; [email protected];
> [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected];
> >> [email protected]; [email protected]; [email protected];
> >> [email protected]; [email protected];
> [email protected];
> >> [email protected]; [email protected];
> >> [email protected]
> >> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate
> >> control flow action
> >>
> >>
> >> Hi Po,
> >>
> >> On Wed 22 Apr 2020 at 05:48, Po Liu <[email protected]> wrote:
> >> > Introduce an ingress frame gate control flow action.
> >> > The tc gate action works like this: assume there is a gate that
> >> > allows specified ingress frames to pass at certain time slots and
> >> > drops them at others. A tc filter selects the ingress frames, and
> >> > the tc gate action specifies in which time slots those frames may
> >> > be passed to the device and in which they are dropped.
> >> > The gate action provides an entry list describing how long the
> >> > gate stays open and how long it stays closed. It also takes a
> >> > start time telling when the entry list begins; the driver then
> >> > repeats the gate entry list cyclically.
> >> > For the software simulation, the gate action requires the user to
> >> > assign a time clock type.
> >> >
> >> > Below is a configuration example from user space. The tc filter
> >> > matches a stream with source IP address 192.168.0.20, and the gate
> >> > action has two time slots: one lasts 200ms with the gate open,
> >> > letting frames pass, the other lasts 100ms with the gate closed,
> >> > dropping frames. Once the frames passed in one 200000000ns open
> >> > slot exceed 8000000 bytes in total, further frames are dropped for
> >> > the rest of that slot.
> >> >
> >> >> tc qdisc add dev eth0 ingress
> >> >
> >> >> tc filter add dev eth0 parent ffff: protocol ip \
> >> > flower src_ip 192.168.0.20 \
> >> > action gate index 2 clockid CLOCK_TAI \
> >> > sched-entry open 200000000 -1 8000000 \
> >> > sched-entry close 100000000 -1 -1
> >> >
> >> >> tc chain del dev eth0 ingress chain 0
> >> >
> >> > "sched-entry" follows the taprio naming style. The gate state is
> >> > "open"/"close", followed by the period in nanoseconds. The next
> >> > item is the internal priority value, which selects the ingress
> >> > queue to use; "-1" means wildcard. The last, optional value
> >> > specifies the maximum number of MSDU octets that are permitted to
> >> > pass the gate during the specified time interval.
> >> > If base-time is not set, it defaults to 0; the start time then
> >> > becomes ((N + 1) * cycletime), the earliest future cycle boundary.
> >> >
> >> > The example below filters a stream with destination MAC address
> >> > 10:00:80:00:00:00 and IP protocol ICMP, followed by the gate
> >> > action. The gate action runs with a single "close" time slot,
> >> > which means it is always closed. The total cycle time is
> >> > 200000000ns. The start time is calculated from the base-time as:
> >> >
> >> > 1357000000000 + (N + 1) * cycletime
> >> >
> >> > taking the smallest N for which this value lies in the future.
> >> > The cycletime here is 200000000ns.
> >> >
> >> >> tc filter add dev eth0 parent ffff: protocol ip \
> >> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> >> > action gate index 12 base-time 1357000000000 \
> >> > sched-entry close 200000000 -1 -1 \
> >> > clockid CLOCK_TAI
> >> >
> >> > Signed-off-by: Po Liu <[email protected]>
> >> > ---
> >> > include/net/tc_act/tc_gate.h | 54 +++
> >> > include/uapi/linux/pkt_cls.h | 1 +
> >> > include/uapi/linux/tc_act/tc_gate.h | 47 ++
> >> > net/sched/Kconfig | 13 +
> >> > net/sched/Makefile | 1 +
> >> > net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
> >> > 6 files changed, 763 insertions(+)
> >> > create mode 100644 include/net/tc_act/tc_gate.h
> >> > create mode 100644 include/uapi/linux/tc_act/tc_gate.h
> >> > create mode 100644 net/sched/act_gate.c
> >> >
> >> > diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> >> > new file mode 100644
> >> > index 000000000000..b0ace55b2aaa
> >> > --- /dev/null
> >> > +++ b/include/net/tc_act/tc_gate.h
> >> > @@ -0,0 +1,54 @@
> >> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> >> > +/* Copyright 2020 NXP */
> >> > +
> >> > +#ifndef __NET_TC_GATE_H
> >> > +#define __NET_TC_GATE_H
> >> > +
> >> > +#include <net/act_api.h>
> >> > +#include <linux/tc_act/tc_gate.h>
> >> > +
> >> > +struct tcfg_gate_entry {
> >> > + int index;
> >> > + u8 gate_state;
> >> > + u32 interval;
> >> > + s32 ipv;
> >> > + s32 maxoctets;
> >> > + struct list_head list;
> >> > +};
> >> > +
> >> > +struct tcf_gate_params {
> >> > + s32 tcfg_priority;
> >> > + u64 tcfg_basetime;
> >> > + u64 tcfg_cycletime;
> >> > + u64 tcfg_cycletime_ext;
> >> > + u32 tcfg_flags;
> >> > + s32 tcfg_clockid;
> >> > + size_t num_entries;
> >> > + struct list_head entries;
> >> > +};
> >> > +
> >> > +#define GATE_ACT_GATE_OPEN BIT(0)
> >> > +#define GATE_ACT_PENDING BIT(1)
> >> > +struct gate_action {
> >> > + struct tcf_gate_params param;
> >> > + spinlock_t entry_lock;
> >> > + u8 current_gate_status;
> >> > + ktime_t current_close_time;
> >> > + u32 current_entry_octets;
> >> > + s32 current_max_octets;
> >> > + struct tcfg_gate_entry __rcu *next_entry;
> >> > + struct hrtimer hitimer;
> >> > + enum tk_offsets tk_offset;
> >> > + struct rcu_head rcu;
> >> > +};
> >> > +
> >> > +struct tcf_gate {
> >> > + struct tc_action common;
> >> > + struct gate_action __rcu *actg;
> >> > +};
> >> > +#define to_gate(a) ((struct tcf_gate *)a)
> >> > +
> >> > +#define get_gate_param(act) ((struct tcf_gate_params *)act)
> >> > +#define get_gate_action(p) ((struct gate_action *)p)
> >> > +
> >> > +#endif
> >> > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> >> > index 9f06d29cab70..fc672b232437 100644
> >> > --- a/include/uapi/linux/pkt_cls.h
> >> > +++ b/include/uapi/linux/pkt_cls.h
> >> > @@ -134,6 +134,7 @@ enum tca_id {
> >> > TCA_ID_CTINFO,
> >> > TCA_ID_MPLS,
> >> > TCA_ID_CT,
> >> > + TCA_ID_GATE,
> >> > /* other actions go here */
> >> > __TCA_ID_MAX = 255
> >> > };
> >> > diff --git a/include/uapi/linux/tc_act/tc_gate.h
> >> > b/include/uapi/linux/tc_act/tc_gate.h
> >> > new file mode 100644
> >> > index 000000000000..f214b3a6d44f
> >> > --- /dev/null
> >> > +++ b/include/uapi/linux/tc_act/tc_gate.h
> >> > @@ -0,0 +1,47 @@
> >> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> >> > +/* Copyright 2020 NXP */
> >> > +
> >> > +#ifndef __LINUX_TC_GATE_H
> >> > +#define __LINUX_TC_GATE_H
> >> > +
> >> > +#include <linux/pkt_cls.h>
> >> > +
> >> > +struct tc_gate {
> >> > + tc_gen;
> >> > +};
> >> > +
> >> > +enum {
> >> > + TCA_GATE_ENTRY_UNSPEC,
> >> > + TCA_GATE_ENTRY_INDEX,
> >> > + TCA_GATE_ENTRY_GATE,
> >> > + TCA_GATE_ENTRY_INTERVAL,
> >> > + TCA_GATE_ENTRY_IPV,
> >> > + TCA_GATE_ENTRY_MAX_OCTETS,
> >> > + __TCA_GATE_ENTRY_MAX,
> >> > +};
> >> > +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> >> > +
> >> > +enum {
> >> > + TCA_GATE_ONE_ENTRY_UNSPEC,
> >> > + TCA_GATE_ONE_ENTRY,
> >> > + __TCA_GATE_ONE_ENTRY_MAX,
> >> > +};
> >> > +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
> >> > +
> >> > +enum {
> >> > + TCA_GATE_UNSPEC,
> >> > + TCA_GATE_TM,
> >> > + TCA_GATE_PARMS,
> >> > + TCA_GATE_PAD,
> >> > + TCA_GATE_PRIORITY,
> >> > + TCA_GATE_ENTRY_LIST,
> >> > + TCA_GATE_BASE_TIME,
> >> > + TCA_GATE_CYCLE_TIME,
> >> > + TCA_GATE_CYCLE_TIME_EXT,
> >> > + TCA_GATE_FLAGS,
> >> > + TCA_GATE_CLOCKID,
> >> > + __TCA_GATE_MAX,
> >> > +};
> >> > +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> >> > +
> >> > +#endif
> >> > diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> >> > index bfbefb7bff9d..1314549c7567 100644
> >> > --- a/net/sched/Kconfig
> >> > +++ b/net/sched/Kconfig
> >> > @@ -981,6 +981,19 @@ config NET_ACT_CT
> >> > To compile this code as a module, choose M here: the
> >> > module will be called act_ct.
> >> >
> >> > +config NET_ACT_GATE
> >> > + tristate "Frame gate entry list control tc action"
> >> > + depends on NET_CLS_ACT
> >> > + help
> >> > + Say Y here to allow the ingress flow to be passed at specific
> >> > + time slots and dropped at other specific time slots, according
> >> > + to the gate entry list. This simulates the IEEE 802.1Qci
> >> > + stream gate control behavior.
> >> > +
> >> > + If unsure, say N.
> >> > + To compile this code as a module, choose M here: the
> >> > + module will be called act_gate.
> >> > +
> >> > config NET_IFE_SKBMARK
> >> > tristate "Support to encoding decoding skb mark on IFE action"
> >> > depends on NET_ACT_IFE
> >> > diff --git a/net/sched/Makefile b/net/sched/Makefile
> >> > index 31c367a6cd09..66bbf9a98f9e 100644
> >> > --- a/net/sched/Makefile
> >> > +++ b/net/sched/Makefile
> >> > @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
> >> > obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> >> > obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> >> > obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> >> > +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> >> > obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> >> > obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> >> > obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> >> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> >> > new file mode 100644
> >> > index 000000000000..e932f402b4f1
> >> > --- /dev/null
> >> > +++ b/net/sched/act_gate.c
> >> > @@ -0,0 +1,647 @@
> >> > +// SPDX-License-Identifier: GPL-2.0-or-later
> >> > +/* Copyright 2020 NXP */
> >> > +
> >> > +#include <linux/module.h>
> >> > +#include <linux/types.h>
> >> > +#include <linux/kernel.h>
> >> > +#include <linux/string.h>
> >> > +#include <linux/errno.h>
> >> > +#include <linux/skbuff.h>
> >> > +#include <linux/rtnetlink.h>
> >> > +#include <linux/init.h>
> >> > +#include <linux/slab.h>
> >> > +#include <net/act_api.h>
> >> > +#include <net/netlink.h>
> >> > +#include <net/pkt_cls.h>
> >> > +#include <net/tc_act/tc_gate.h>
> >> > +
> >> > +static unsigned int gate_net_id;
> >> > +static struct tc_action_ops act_gate_ops;
> >> > +
> >> > +static ktime_t gate_get_time(struct gate_action *gact)
> >> > +{
> >> > + ktime_t mono = ktime_get();
> >> > +
> >> > + switch (gact->tk_offset) {
> >> > + case TK_OFFS_MAX:
> >> > + return mono;
> >> > + default:
> >> > + return ktime_mono_to_any(mono, gact->tk_offset);
> >> > + }
> >> > +
> >> > + return KTIME_MAX;
> >> > +}
> >> > +
> >> > +static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
> >> > +{
> >> > + struct tcf_gate_params *param = get_gate_param(gact);
> >> > + ktime_t now, base, cycle;
> >> > + u64 n;
> >> > +
> >> > + base = ns_to_ktime(param->tcfg_basetime);
> >> > + now = gate_get_time(gact);
> >> > +
> >> > + if (ktime_after(base, now)) {
> >> > + *start = base;
> >> > + return 0;
> >> > + }
> >> > +
> >> > + cycle = param->tcfg_cycletime;
> >> > +
> >> > + /* cycle time should not be zero */
> >> > + if (WARN_ON(!cycle))
> >> > + return -EFAULT;
> >>
> >> Looking at the init code it seems that this value can be set to 0
> >> directly from netlink packet without further validation, which would
> >> allow user to trigger warning here.
> >
> > Yes, I will add validation earlier to avoid this.
> >
> >>
> >> > +
> >> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> >> > + *start = ktime_add_ns(base, (n + 1) * cycle);
> >> > + return 0;
> >> > +}
> >> > +
> >> > +static void gate_start_timer(struct gate_action *gact, ktime_t start)
> >> > +{
> >> > + ktime_t expires;
> >> > +
> >> > + expires = hrtimer_get_expires(&gact->hitimer);
> >> > + if (expires == 0)
> >> > + expires = KTIME_MAX;
> >> > +
> >> > + start = min_t(ktime_t, start, expires);
> >> > +
> >> > + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
> >> > +}
> >> > +
> >> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
> >> > +{
> >> > + struct gate_action *gact = container_of(timer, struct gate_action,
> >> > + hitimer);
> >> > + struct tcf_gate_params *p = get_gate_param(gact);
> >> > + struct tcfg_gate_entry *next;
> >> > + ktime_t close_time, now;
> >> > +
> >> > + spin_lock(&gact->entry_lock);
> >> > +
> >> > + next = rcu_dereference_protected(gact->next_entry,
> >> > +
> >> > + lockdep_is_held(&gact->entry_lock));
> >> > +
> >> > + /* cycle start, clear pending bit, clear total octets */
> >> > + gact->current_gate_status = next->gate_state ?
> >> GATE_ACT_GATE_OPEN : 0;
> >> > + gact->current_entry_octets = 0;
> >> > + gact->current_max_octets = next->maxoctets;
> >> > +
> >> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> >> > + next->interval);
> >> > +
> >> > + close_time = gact->current_close_time;
> >> > +
> >> > + if (list_is_last(&next->list, &p->entries))
> >> > + next = list_first_entry(&p->entries,
> >> > + struct tcfg_gate_entry, list);
> >> > + else
> >> > + next = list_next_entry(next, list);
> >> > +
> >> > + now = gate_get_time(gact);
> >> > +
> >> > + if (ktime_after(now, close_time)) {
> >> > + ktime_t cycle, base;
> >> > + u64 n;
> >> > +
> >> > + cycle = p->tcfg_cycletime;
> >> > + base = ns_to_ktime(p->tcfg_basetime);
> >> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> >> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
> >> > + }
> >> > +
> >> > + rcu_assign_pointer(gact->next_entry, next);
> >> > + spin_unlock(&gact->entry_lock);
> >>
> >> I have couple of question about synchronization here:
> >>
> >> - Why do you need next_entry to be rcu pointer? It is only assigned
> >> here with entry_lock protection and in init code before action is
> >> visible to concurrent users. I don't see any unlocked rcu-protected
> >> readers here that could benefit from it.
> >>
> >> - Why create dedicated entry_lock instead of using already existing
> >> per- action tcf_lock?
> >
> > Will try to use the tcf_lock and verify.
> > My thinking was that when the timer fires, walking through the list
> > and computing the next expiry takes a relatively long time, and the
> > action function is busy under traffic, so a separate lock was used
> > here to reduce contention.
> >
> >>
> >> > +
> >> > + hrtimer_set_expires(&gact->hitimer, close_time);
> >> > +
> >> > + return HRTIMER_RESTART;
> >> > +}
> >> > +
> >> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> >> > + struct tcf_result *res)
> >> > +{
> >> > + struct tcf_gate *g = to_gate(a);
> >> > + struct gate_action *gact;
> >> > + int action;
> >> > +
> >> > + tcf_lastuse_update(&g->tcf_tm);
> >> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
> >> > +
> >> > + action = READ_ONCE(g->tcf_action);
> >> > + rcu_read_lock();
> >>
> >> Action fastpath is already rcu read lock protected, you don't need to
> >> manually obtain it.
> >
> > Will be removed.
> >
> >>
> >> > + gact = rcu_dereference_bh(g->actg);
> >> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
> >>
> >> Can't current_gate_status be concurrently modified by timer callback?
> >> This function doesn't use entry_lock to synchronize with timer.
> >
> > Will try tcf_lock here as well.
> >
> >>
> >> > + rcu_read_unlock();
> >> > + return action;
> >> > + }
> >> > +
> >> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
> >>
> >> ...and here
> >>
> >> > + goto drop;
> >> > +
> >> > + if (gact->current_max_octets >= 0) {
> >> > + gact->current_entry_octets += qdisc_pkt_len(skb);
> >> > + if (gact->current_entry_octets >
> >> > + gact->current_max_octets) {
> >>
> >> here also.
> >>
> >> > +
> >> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
> >>
> >> Please use tcf_action_inc_overlimit_qstats() and other wrappers for
> stats.
> >> Otherwise it will crash if user passes
> TCA_ACT_FLAGS_NO_PERCPU_STATS
> >> flag.
> >
> > The tcf_action_inc_overlimit_qstats() can't show the overlimit
> > counts in the 'tc show' output. Is there anything else that needs to
> > be done?
>
> What do you mean? Internally tcf_action_inc_overlimit_qstats() just calls
> qstats_overlimit_inc, if cpu_qstats percpu counter is not NULL:
>
>
> if (likely(a->cpu_qstats)) {
> qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
> return;
> }
>
> Is there a subtle bug somewhere in this function?

Sorry, I did try the tcf_action_* helpers and the counting was correct. I moved back to qstats_overlimit_inc() because the tcf_action_* helpers take spin_lock(&a->tcfa_lock).
I will switch back to the tcf_action_* helpers for the increment.

>
> >
> > Br,
> > Po Liu

Thanks a lot.

Br,
Po Liu
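The start-time rule reviewed above — a basetime still in the future is used as-is, while a basetime already in the past is advanced to base + (n + 1) * cycle, with a zero cycletime rejected before the division — can be sketched as a small userspace C function. This is an illustration only, not the kernel code itself; `gate_start_time` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the start-time rule from gate_get_start_time():
 * a basetime still in the future is used directly; a basetime in the
 * past is advanced to base + (n + 1) * cycle, the first cycle boundary
 * strictly after "now". All values are in nanoseconds. The caller is
 * expected to reject cycle == 0 beforehand, as suggested in the review.
 */
static uint64_t gate_start_time(uint64_t base, uint64_t cycle, uint64_t now)
{
	uint64_t n;

	if (base > now)
		return base;		/* future basetime: start exactly there */

	n = (now - base) / cycle;	/* completed cycles since base */
	return base + (n + 1) * cycle;
}
```

For base = 1000, cycle = 300 and now = 2000, three full cycles have elapsed, so the sketch returns 1000 + 4 * 300 = 2200, the first boundary after now — the same arithmetic that yields (N + 1) * cycletime in the default base-time = 0 case from the commit message.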

2020-04-23 09:17:49

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Vlad Buslov,

> > >> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
> > >> > +{
> > >> > + struct gate_action *gact = container_of(timer, struct gate_action,
> > >> > + hitimer);
> > >> > + struct tcf_gate_params *p = get_gate_param(gact);
> > >> > + struct tcfg_gate_entry *next;
> > >> > + ktime_t close_time, now;
> > >> > +
> > >> > + spin_lock(&gact->entry_lock);
> > >> > +
> > >> > + next = rcu_dereference_protected(gact->next_entry,
> > >> > +
> > >> > + lockdep_is_held(&gact->entry_lock));
> > >> > +
> > >> > + /* cycle start, clear pending bit, clear total octets */
> > >> > + gact->current_gate_status = next->gate_state ?
> > >> GATE_ACT_GATE_OPEN : 0;
> > >> > + gact->current_entry_octets = 0;
> > >> > + gact->current_max_octets = next->maxoctets;
> > >> > +
> > >> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> > >> > + next->interval);
> > >> > +
> > >> > + close_time = gact->current_close_time;
> > >> > +
> > >> > + if (list_is_last(&next->list, &p->entries))
> > >> > + next = list_first_entry(&p->entries,
> > >> > + struct tcfg_gate_entry, list);
> > >> > + else
> > >> > + next = list_next_entry(next, list);
> > >> > +
> > >> > + now = gate_get_time(gact);
> > >> > +
> > >> > + if (ktime_after(now, close_time)) {
> > >> > + ktime_t cycle, base;
> > >> > + u64 n;
> > >> > +
> > >> > + cycle = p->tcfg_cycletime;
> > >> > + base = ns_to_ktime(p->tcfg_basetime);
> > >> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > >> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
> > >> > + }
> > >> > +
> > >> > + rcu_assign_pointer(gact->next_entry, next);
> > >> > + spin_unlock(&gact->entry_lock);
> > >>
> > >> I have couple of question about synchronization here:
> > >>
> > >> - Why do you need next_entry to be rcu pointer? It is only assigned
> > >> here with entry_lock protection and in init code before action is
> > >> visible to concurrent users. I don't see any unlocked rcu-protected
> > >> readers here that could benefit from it.
> > >>
> > >> - Why create dedicated entry_lock instead of using already existing
> > >> per- action tcf_lock?
> > >
> > > Will try to use the tcf_lock for verification.

I think I added entry_lock because I can't reach the tc_action common fields in this timer function. If I insist on using tcf_lock, I have to move the hrtimer into struct tcf_gate, which contains the tc_action common.
What do you think?

> > > My thinking was that when the timer fires, walking through the
> > > list and computing the next expiry takes a relatively long time,
> > > and the action function is busy under traffic, so a separate lock
> > > was used here to reduce contention.
> > >
> > >>
> > >> > +
> > >> > + hrtimer_set_expires(&gact->hitimer, close_time);
> > >> > +
> > >> > + return HRTIMER_RESTART;
> > >> > +}
> > >> > +
> > >> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> > >> > + struct tcf_result *res)
> > >> > +{
> > >> > + struct tcf_gate *g = to_gate(a);
> > >> > + struct gate_action *gact;
> > >> > + int action;
> > >> > +
> > >> > + tcf_lastuse_update(&g->tcf_tm);
> > >> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
> > >> > +
> > >> > + action = READ_ONCE(g->tcf_action);
> > >> > + rcu_read_lock();
> > >>
> > >> Action fastpath is already rcu read lock protected, you don't need
> > >> to manually obtain it.
> > >
> > > Will be removed.
> > >
> > >>
> > >> > + gact = rcu_dereference_bh(g->actg);
> > >> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING))
> > >> > + {
> > >>
> > >> Can't current_gate_status be concurrently modified by timer callback?
> > >> This function doesn't use entry_lock to synchronize with timer.
> > >
> > > Will try tcf_lock here as well.
> > >
> > >>
> > >> > + rcu_read_unlock();
> > >> > + return action;
> > >> > + }
> > >> > +
> > >> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
> > >>
> > >> ...and here
> > >>
> > >> > + goto drop;
> > >> > +
> > >> > + if (gact->current_max_octets >= 0) {
> > >> > + gact->current_entry_octets += qdisc_pkt_len(skb);
> > >> > + if (gact->current_entry_octets >
> > >> > + gact->current_max_octets) {
> > >>
> > >> here also.
> > >>
> > >> > +
> > >> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
> > >>
> > >> Please use tcf_action_inc_overlimit_qstats() and other wrappers for
> > stats.
> > >> Otherwise it will crash if user passes
> > TCA_ACT_FLAGS_NO_PERCPU_STATS
> > >> flag.
> > >
> > > The tcf_action_inc_overlimit_qstats() can't show the overlimit
> > > counts in the 'tc show' output. Is there anything else that needs
> > > to be done?
> >
> > What do you mean? Internally tcf_action_inc_overlimit_qstats() just
> > calls qstats_overlimit_inc, if cpu_qstats percpu counter is not NULL:
> >
> >
> > if (likely(a->cpu_qstats)) {
> > qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
> > return;
> > }
> >
> > Is there a subtle bug somewhere in this function?
>
> Sorry, I did try the tcf_action_* helpers and the counting was
> correct. I moved back to qstats_overlimit_inc() because the
> tcf_action_* helpers take spin_lock(&a->tcfa_lock).
> I will switch back to the tcf_action_* helpers for the increment.
>
> >
> > >
> > > Br,
> > > Po Liu
>
> Thanks a lot.
>
> Br,
> Po Liu

Thanks a lot.

Br,
Po Liu
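The late-timer handling in the gate_timer_func() quoted above reduces to the same arithmetic: when the hrtimer fires so late that "now" has already passed the entry's computed close_time, the code snaps forward to the next cycle boundary instead of replaying every missed interval. A userspace C sketch follows (illustration only; `realign_close_time` is a hypothetical name, and a validated non-zero cycle is assumed, per the earlier review comment).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace sketch of the close-time re-alignment in gate_timer_func():
 * if the timer fired in time, keep the scheduled close_time; if "now"
 * is already past it, jump to base + (n + 1) * cycle, the first cycle
 * boundary strictly after "now". All values are in nanoseconds; cycle
 * must be non-zero (validated before the action is installed).
 */
static uint64_t realign_close_time(uint64_t close_time, uint64_t now,
				   uint64_t base, uint64_t cycle)
{
	uint64_t n;

	if (now <= close_time)
		return close_time;	/* timer fired in time: keep schedule */

	n = (now - base) / cycle;	/* completed cycles since base */
	return base + (n + 1) * cycle;
}
```

With base = 1000 and cycle = 300, a timer firing at now = 2000 against a stale close_time of 1500 is realigned to 2200, the first cycle boundary after now, so the gate schedule stays phase-aligned with the base-time rather than drifting by the missed amount.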

2020-04-23 11:05:12

by Vlad Buslov

[permalink] [raw]
Subject: Re: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action


On Thu 23 Apr 2020 at 11:32, Po Liu <[email protected]> wrote:
>> -----Original Message-----
>> From: Vlad Buslov <[email protected]>
>> Sent: 2020年4月23日 15:43
>> To: Po Liu <[email protected]>
>> Cc: Vlad Buslov <[email protected]>; [email protected]; linux-
>> [email protected]; [email protected];
>> [email protected]; Claudiu Manoil <[email protected]>;
>> Vladimir Oltean <[email protected]>; Alexandru Marginean
>> <[email protected]>; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected];
>> [email protected]; [email protected]
>> Subject: Re: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control
>> flow action
>>
>>
>> On Thu 23 Apr 2020 at 06:14, Po Liu <[email protected]> wrote:
>> > Hi Vlad Buslov,
>> >
>> >> -----Original Message-----
>> >> From: Vlad Buslov <[email protected]>
>> >> Sent: 2020年4月22日 21:23
>> >> To: Po Liu <[email protected]>
>> >> Cc: [email protected]; [email protected];
>> >> [email protected]; [email protected]; Claudiu Manoil
>> >> <[email protected]>; Vladimir Oltean
>> <[email protected]>;
>> >> Alexandru Marginean <[email protected]>;
>> >> [email protected]; [email protected];
>> [email protected];
>> >> [email protected]; [email protected]; [email protected];
>> >> [email protected]; [email protected];
>> >> [email protected]; [email protected]; [email protected];
>> >> [email protected]; [email protected];
>> [email protected];
>> >> [email protected]; [email protected];
>> >> [email protected]
>> >> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate
>> >> control flow action
>> >>
>> >>
>> >> Hi Po,
>> >>
>> >> On Wed 22 Apr 2020 at 05:48, Po Liu <[email protected]> wrote:
>> >> > Introduce an ingress frame gate control flow action.
>> >> > The tc gate action works like this: assume there is a gate that
>> >> > allows specified ingress frames to pass at certain time slots and
>> >> > drops them at others. A tc filter selects the ingress frames, and
>> >> > the tc gate action specifies in which time slots those frames may
>> >> > be passed to the device and in which they are dropped.
>> >> > The gate action provides an entry list describing how long the
>> >> > gate stays open and how long it stays closed. It also takes a
>> >> > start time telling when the entry list begins; the driver then
>> >> > repeats the gate entry list cyclically.
>> >> > For the software simulation, the gate action requires the user to
>> >> > assign a time clock type.
>> >> >
>> >> > Below is a configuration example from user space. The tc filter
>> >> > matches a stream with source IP address 192.168.0.20, and the
>> >> > gate action has two time slots: one lasts 200ms with the gate
>> >> > open, letting frames pass, the other lasts 100ms with the gate
>> >> > closed, dropping frames. Once the frames passed in one
>> >> > 200000000ns open slot exceed 8000000 bytes in total, further
>> >> > frames are dropped for the rest of that slot.
>> >> >
>> >> >> tc qdisc add dev eth0 ingress
>> >> >
>> >> >> tc filter add dev eth0 parent ffff: protocol ip \
>> >> > flower src_ip 192.168.0.20 \
>> >> > action gate index 2 clockid CLOCK_TAI \
>> >> > sched-entry open 200000000 -1 8000000 \
>> >> > sched-entry close 100000000 -1 -1
>> >> >
>> >> >> tc chain del dev eth0 ingress chain 0
>> >> >
>> >> > "sched-entry" follows the taprio naming style. The gate state is
>> >> > "open"/"close", followed by the period in nanoseconds. The next
>> >> > item is the internal priority value, which selects the ingress
>> >> > queue to use; "-1" means wildcard. The last, optional value
>> >> > specifies the maximum number of MSDU octets that are permitted to
>> >> > pass the gate during the specified time interval.
>> >> > If base-time is not set, it defaults to 0; the start time then
>> >> > becomes ((N + 1) * cycletime), the earliest future cycle boundary.
>> >> >
>> >> > The example below filters a stream with destination MAC address
>> >> > 10:00:80:00:00:00 and IP protocol ICMP, followed by the gate
>> >> > action. The gate action runs with a single "close" time slot,
>> >> > which means it is always closed. The total cycle time is
>> >> > 200000000ns. The start time is calculated from the base-time as:
>> >> >
>> >> > 1357000000000 + (N + 1) * cycletime
>> >> >
>> >> > taking the smallest N for which this value lies in the future.
>> >> > The cycletime here is 200000000ns.
>> >> >
>> >> >> tc filter add dev eth0 parent ffff: protocol ip \
>> >> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
>> >> > action gate index 12 base-time 1357000000000 \
>> >> > sched-entry close 200000000 -1 -1 \
>> >> > clockid CLOCK_TAI
>> >> >
>> >> > Signed-off-by: Po Liu <[email protected]>
>> >> > ---
>> >> > include/net/tc_act/tc_gate.h | 54 +++
>> >> > include/uapi/linux/pkt_cls.h | 1 +
>> >> > include/uapi/linux/tc_act/tc_gate.h | 47 ++
>> >> > net/sched/Kconfig | 13 +
>> >> > net/sched/Makefile | 1 +
>> >> > net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
>> >> > 6 files changed, 763 insertions(+)
>> >> > create mode 100644 include/net/tc_act/tc_gate.h
>> >> > create mode 100644 include/uapi/linux/tc_act/tc_gate.h
>> >> > create mode 100644 net/sched/act_gate.c
>> >> >
>> >> > diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
>> >> > new file mode 100644
>> >> > index 000000000000..b0ace55b2aaa
>> >> > --- /dev/null
>> >> > +++ b/include/net/tc_act/tc_gate.h
>> >> > @@ -0,0 +1,54 @@
>> >> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
>> >> > +/* Copyright 2020 NXP */
>> >> > +
>> >> > +#ifndef __NET_TC_GATE_H
>> >> > +#define __NET_TC_GATE_H
>> >> > +
>> >> > +#include <net/act_api.h>
>> >> > +#include <linux/tc_act/tc_gate.h>
>> >> > +
>> >> > +struct tcfg_gate_entry {
>> >> > + int index;
>> >> > + u8 gate_state;
>> >> > + u32 interval;
>> >> > + s32 ipv;
>> >> > + s32 maxoctets;
>> >> > + struct list_head list;
>> >> > +};
>> >> > +
>> >> > +struct tcf_gate_params {
>> >> > + s32 tcfg_priority;
>> >> > + u64 tcfg_basetime;
>> >> > + u64 tcfg_cycletime;
>> >> > + u64 tcfg_cycletime_ext;
>> >> > + u32 tcfg_flags;
>> >> > + s32 tcfg_clockid;
>> >> > + size_t num_entries;
>> >> > + struct list_head entries;
>> >> > +};
>> >> > +
>> >> > +#define GATE_ACT_GATE_OPEN BIT(0)
>> >> > +#define GATE_ACT_PENDING BIT(1)
>> >> > +struct gate_action {
>> >> > + struct tcf_gate_params param;
>> >> > + spinlock_t entry_lock;
>> >> > + u8 current_gate_status;
>> >> > + ktime_t current_close_time;
>> >> > + u32 current_entry_octets;
>> >> > + s32 current_max_octets;
>> >> > + struct tcfg_gate_entry __rcu *next_entry;
>> >> > + struct hrtimer hitimer;
>> >> > + enum tk_offsets tk_offset;
>> >> > + struct rcu_head rcu;
>> >> > +};
>> >> > +
>> >> > +struct tcf_gate {
>> >> > + struct tc_action common;
>> >> > + struct gate_action __rcu *actg;
>> >> > +};
>> >> > +#define to_gate(a) ((struct tcf_gate *)a)
>> >> > +
>> >> > +#define get_gate_param(act) ((struct tcf_gate_params *)act)
>> >> > +#define get_gate_action(p) ((struct gate_action *)p)
>> >> > +
>> >> > +#endif
>> >> > diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> >> > index 9f06d29cab70..fc672b232437 100644
>> >> > --- a/include/uapi/linux/pkt_cls.h
>> >> > +++ b/include/uapi/linux/pkt_cls.h
>> >> > @@ -134,6 +134,7 @@ enum tca_id {
>> >> > TCA_ID_CTINFO,
>> >> > TCA_ID_MPLS,
>> >> > TCA_ID_CT,
>> >> > + TCA_ID_GATE,
>> >> > /* other actions go here */
>> >> > __TCA_ID_MAX = 255
>> >> > };
>> >> > diff --git a/include/uapi/linux/tc_act/tc_gate.h
>> >> > b/include/uapi/linux/tc_act/tc_gate.h
>> >> > new file mode 100644
>> >> > index 000000000000..f214b3a6d44f
>> >> > --- /dev/null
>> >> > +++ b/include/uapi/linux/tc_act/tc_gate.h
>> >> > @@ -0,0 +1,47 @@
>> >> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
>> >> > +/* Copyright 2020 NXP */
>> >> > +
>> >> > +#ifndef __LINUX_TC_GATE_H
>> >> > +#define __LINUX_TC_GATE_H
>> >> > +
>> >> > +#include <linux/pkt_cls.h>
>> >> > +
>> >> > +struct tc_gate {
>> >> > + tc_gen;
>> >> > +};
>> >> > +
>> >> > +enum {
>> >> > + TCA_GATE_ENTRY_UNSPEC,
>> >> > + TCA_GATE_ENTRY_INDEX,
>> >> > + TCA_GATE_ENTRY_GATE,
>> >> > + TCA_GATE_ENTRY_INTERVAL,
>> >> > + TCA_GATE_ENTRY_IPV,
>> >> > + TCA_GATE_ENTRY_MAX_OCTETS,
>> >> > + __TCA_GATE_ENTRY_MAX,
>> >> > +};
>> >> > +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
>> >> > +
>> >> > +enum {
>> >> > + TCA_GATE_ONE_ENTRY_UNSPEC,
>> >> > + TCA_GATE_ONE_ENTRY,
>> >> > + __TCA_GATE_ONE_ENTRY_MAX,
>> >> > +};
>> >> > +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
>> >> > +
>> >> > +enum {
>> >> > + TCA_GATE_UNSPEC,
>> >> > + TCA_GATE_TM,
>> >> > + TCA_GATE_PARMS,
>> >> > + TCA_GATE_PAD,
>> >> > + TCA_GATE_PRIORITY,
>> >> > + TCA_GATE_ENTRY_LIST,
>> >> > + TCA_GATE_BASE_TIME,
>> >> > + TCA_GATE_CYCLE_TIME,
>> >> > + TCA_GATE_CYCLE_TIME_EXT,
>> >> > + TCA_GATE_FLAGS,
>> >> > + TCA_GATE_CLOCKID,
>> >> > + __TCA_GATE_MAX,
>> >> > +};
>> >> > +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
>> >> > +
>> >> > +#endif
>> >> > diff --git a/net/sched/Kconfig b/net/sched/Kconfig
>> >> > index bfbefb7bff9d..1314549c7567 100644
>> >> > --- a/net/sched/Kconfig
>> >> > +++ b/net/sched/Kconfig
>> >> > @@ -981,6 +981,19 @@ config NET_ACT_CT
>> >> > To compile this code as a module, choose M here: the
>> >> > module will be called act_ct.
>> >> >
>> >> > +config NET_ACT_GATE
>> >> > + tristate "Frame gate entry list control tc action"
>> >> > + depends on NET_CLS_ACT
>> >> > + help
>> >> > + Say Y here to allow to control the ingress flow to be passed at
>> >> > + specific time slot and be dropped at other specific time slot by
>> >> > + the gate entry list. The manipulation will simulate the IEEE
>> >> > + 802.1Qci stream gate control behavior.
>> >> > +
>> >> > + If unsure, say N.
>> >> > + To compile this code as a module, choose M here: the
>> >> > + module will be called act_gate.
>> >> > +
>> >> > config NET_IFE_SKBMARK
>> >> > tristate "Support to encoding decoding skb mark on IFE action"
>> >> > depends on NET_ACT_IFE
>> >> > diff --git a/net/sched/Makefile b/net/sched/Makefile
>> >> > index 31c367a6cd09..66bbf9a98f9e 100644
>> >> > --- a/net/sched/Makefile
>> >> > +++ b/net/sched/Makefile
>> >> > @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
>> >> > obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
>> >> > obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
>> >> > obj-$(CONFIG_NET_ACT_CT) += act_ct.o
>> >> > +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
>> >> > obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
>> >> > obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
>> >> > obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
>> >> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
>> >> > new file mode 100644
>> >> > index 000000000000..e932f402b4f1
>> >> > --- /dev/null
>> >> > +++ b/net/sched/act_gate.c
>> >> > @@ -0,0 +1,647 @@
>> >> > +// SPDX-License-Identifier: GPL-2.0-or-later
>> >> > +/* Copyright 2020 NXP */
>> >> > +
>> >> > +#include <linux/module.h>
>> >> > +#include <linux/types.h>
>> >> > +#include <linux/kernel.h>
>> >> > +#include <linux/string.h>
>> >> > +#include <linux/errno.h>
>> >> > +#include <linux/skbuff.h>
>> >> > +#include <linux/rtnetlink.h>
>> >> > +#include <linux/init.h>
>> >> > +#include <linux/slab.h>
>> >> > +#include <net/act_api.h>
>> >> > +#include <net/netlink.h>
>> >> > +#include <net/pkt_cls.h>
>> >> > +#include <net/tc_act/tc_gate.h>
>> >> > +
>> >> > +static unsigned int gate_net_id;
>> >> > +static struct tc_action_ops act_gate_ops;
>> >> > +
>> >> > +static ktime_t gate_get_time(struct gate_action *gact)
>> >> > +{
>> >> > + ktime_t mono = ktime_get();
>> >> > +
>> >> > + switch (gact->tk_offset) {
>> >> > + case TK_OFFS_MAX:
>> >> > + return mono;
>> >> > + default:
>> >> > + return ktime_mono_to_any(mono, gact->tk_offset);
>> >> > + }
>> >> > +
>> >> > + return KTIME_MAX;
>> >> > +}
>> >> > +
>> >> > +static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
>> >> > +{
>> >> > + struct tcf_gate_params *param = get_gate_param(gact);
>> >> > + ktime_t now, base, cycle;
>> >> > + u64 n;
>> >> > +
>> >> > + base = ns_to_ktime(param->tcfg_basetime);
>> >> > + now = gate_get_time(gact);
>> >> > +
>> >> > + if (ktime_after(base, now)) {
>> >> > + *start = base;
>> >> > + return 0;
>> >> > + }
>> >> > +
>> >> > + cycle = param->tcfg_cycletime;
>> >> > +
>> >> > + /* cycle time should not be zero */
>> >> > + if (WARN_ON(!cycle))
>> >> > + return -EFAULT;
>> >>
>> >> Looking at the init code it seems that this value can be set to 0
>> >> directly from netlink packet without further validation, which would
>> >> allow user to trigger warning here.
>> >
> Yes, I will validate this earlier, at init time.
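For readers following this thread, the start-time arithmetic under discussion reduces to a few lines. The sketch below is a plain userspace model with hypothetical names (the kernel code uses ktime_t and div64_u64()); it also shows why a zero cycle time must be rejected at configuration time, before the division is ever reached:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of gate_get_start_time(): given the configured base
 * time and cycle time (both in ns), find the first cycle boundary at
 * or after 'now'. Names and types are illustrative, not the kernel's. */
int gate_start_time(uint64_t base, uint64_t cycle, uint64_t now,
		    uint64_t *start)
{
	uint64_t n;

	if (base > now) {	/* base time is still in the future */
		*start = base;
		return 0;
	}

	if (!cycle)		/* must be rejected at init time, not */
		return -1;	/* WARN_ON()'d in the scheduling path */

	n = (now - base) / cycle;	/* full cycles already elapsed */
	*start = base + (n + 1) * cycle;
	return 0;
}
```

With base = 0 and a 300000000ns cycle (the 200ms open + 100ms close schedule), a 'now' of 1s yields a start of 1.2s, the next cycle boundary.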
>> >
>> >>
>> >> > +
>> >> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
>> >> > + *start = ktime_add_ns(base, (n + 1) * cycle);
>> >> > + return 0;
>> >> > +}
>> >> > +
>> >> > +static void gate_start_timer(struct gate_action *gact, ktime_t start)
>> >> > +{
>> >> > + ktime_t expires;
>> >> > +
>> >> > + expires = hrtimer_get_expires(&gact->hitimer);
>> >> > + if (expires == 0)
>> >> > + expires = KTIME_MAX;
>> >> > +
>> >> > + start = min_t(ktime_t, start, expires);
>> >> > +
>> >> > + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
>> >> > +}
>> >> > +
>> >> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
>> >> > +{
>> >> > + struct gate_action *gact = container_of(timer, struct gate_action,
>> >> > + hitimer);
>> >> > + struct tcf_gate_params *p = get_gate_param(gact);
>> >> > + struct tcfg_gate_entry *next;
>> >> > + ktime_t close_time, now;
>> >> > +
>> >> > + spin_lock(&gact->entry_lock);
>> >> > +
>> >> > + next = rcu_dereference_protected(gact->next_entry,
>> >> > +
>> >> > + lockdep_is_held(&gact->entry_lock));
>> >> > +
>> >> > + /* cycle start, clear pending bit, clear total octets */
>> >> > + gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
>> >> > + gact->current_entry_octets = 0;
>> >> > + gact->current_max_octets = next->maxoctets;
>> >> > +
>> >> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
>> >> > + next->interval);
>> >> > +
>> >> > + close_time = gact->current_close_time;
>> >> > +
>> >> > + if (list_is_last(&next->list, &p->entries))
>> >> > + next = list_first_entry(&p->entries,
>> >> > + struct tcfg_gate_entry, list);
>> >> > + else
>> >> > + next = list_next_entry(next, list);
>> >> > +
>> >> > + now = gate_get_time(gact);
>> >> > +
>> >> > + if (ktime_after(now, close_time)) {
>> >> > + ktime_t cycle, base;
>> >> > + u64 n;
>> >> > +
>> >> > + cycle = p->tcfg_cycletime;
>> >> > + base = ns_to_ktime(p->tcfg_basetime);
>> >> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
>> >> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
>> >> > + }
>> >> > +
>> >> > + rcu_assign_pointer(gact->next_entry, next);
>> >> > + spin_unlock(&gact->entry_lock);
>> >>
>> >> I have couple of question about synchronization here:
>> >>
>> >> - Why do you need next_entry to be rcu pointer? It is only assigned
>> >> here with entry_lock protection and in init code before action is
>> >> visible to concurrent users. I don't see any unlocked rcu-protected
>> >> readers here that could benefit from it.
>> >>
>> >> - Why create dedicated entry_lock instead of using already existing
>> >> per- action tcf_lock?
>> >
>> > I will try using tcf_lock and verify.
>> > My thinking was that when the timer fires, walking the entry list and
>> > updating the next expiry takes a comparatively long time, while the
>> > action function is busy whenever there is traffic, so I used a
>> > separate lock here to avoid contention between the two paths.
>> >
>> >>
>> >> > +
>> >> > + hrtimer_set_expires(&gact->hitimer, close_time);
>> >> > +
>> >> > + return HRTIMER_RESTART;
>> >> > +}
>> >> > +
>> >> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
>> >> > + struct tcf_result *res)
>> >> > +{
>> >> > + struct tcf_gate *g = to_gate(a);
>> >> > + struct gate_action *gact;
>> >> > + int action;
>> >> > +
>> >> > + tcf_lastuse_update(&g->tcf_tm);
>> >> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
>> >> > +
>> >> > + action = READ_ONCE(g->tcf_action);
>> >> > + rcu_read_lock();
>> >>
>> >> Action fastpath is already rcu read lock protected, you don't need to
>> >> manually obtain it.
>> >
>> > Will be removed.
>> >
>> >>
>> >> > + gact = rcu_dereference_bh(g->actg);
>> >> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
>> >>
>> >> Can't current_gate_status be concurrently modified by timer callback?
>> >> This function doesn't use entry_lock to synchronize with timer.
>> >
>> > Will try tcf_lock either.
>> >
>> >>
>> >> > + rcu_read_unlock();
>> >> > + return action;
>> >> > + }
>> >> > +
>> >> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
>> >>
>> >> ...and here
>> >>
>> >> > + goto drop;
>> >> > +
>> >> > + if (gact->current_max_octets >= 0) {
>> >> > + gact->current_entry_octets += qdisc_pkt_len(skb);
>> >> > + if (gact->current_entry_octets >
>> >> > + gact->current_max_octets) {
>> >>
>> >> here also.
>> >>
>> >> > +
>> >> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
>> >>
>> >> Please use tcf_action_inc_overlimit_qstats() and other wrappers for stats.
>> >> Otherwise it will crash if user passes the TCA_ACT_FLAGS_NO_PERCPU_STATS
>> >> flag.
>> >
>> > The tcf_action_inc_overlimit_qstats() doesn't show the overlimit counts
>> > in the tc show command output. Is there anything else that needs to be done?
>>
>> What do you mean? Internally tcf_action_inc_overlimit_qstats() just calls
>> qstats_overlimit_inc, if cpu_qstats percpu counter is not NULL:
>>
>>
>> if (likely(a->cpu_qstats)) {
>> qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
>> return;
>> }
>>
>> Is there a subtle bug somewhere in this function?
>
> Sorry, I had updated to use the tcf_action_*() helpers and the counting was ok. I moved back to qstats_overlimit_inc() because the tcf_action_*() helpers take spin_lock(&a->tcfa_lock).
> I will switch back to the tcf_action_*() helpers for the stats increments.

BTW if you end up synchronizing the fastpath with tcfa_lock, then you
don't need to use the tcf_action_*stats() helpers and percpu counters (they
will only slow down action init and increase memory usage without
providing any improvements for parallelism). Instead, you can just
directly change the tcf_{q|b}stats while holding the tcfa_lock. Check
pedit for an example of such an action.
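The tradeoff can be pictured with a userspace model (all names here are illustrative; in the kernel the per-CPU slots correspond to the cpu_qstats allocation and the commented-out locking to tcfa_lock): per-CPU counters make each increment lock-free at the cost of memory and a summing pass at read time, while a single counter updated under the per-action lock, as pedit does, is cheap to allocate and read but serializes writers:

```c
#include <assert.h>
#include <stdint.h>

#define NCPU 4

/* Per-CPU style: each CPU increments its own slot; reads sum all slots. */
struct percpu_stat {
	uint64_t slot[NCPU];
};

void percpu_inc(struct percpu_stat *s, int cpu)
{
	s->slot[cpu]++;		/* no lock needed: slot is CPU-private */
}

uint64_t percpu_read(const struct percpu_stat *s)
{
	uint64_t sum = 0;

	for (int i = 0; i < NCPU; i++)
		sum += s->slot[i];	/* summing pass at read time */
	return sum;
}

/* Locked style (pedit-like): one counter, updated under the action lock. */
struct locked_stat {
	/* in the kernel: tcfa_lock would protect 'count' */
	uint64_t count;
};

void locked_inc(struct locked_stat *s)
{
	/* spin_lock(&a->tcfa_lock); */
	s->count++;
	/* spin_unlock(&a->tcfa_lock); */
}
```

If the fastpath already takes tcfa_lock for other state, the locked style adds no extra synchronization cost, which is the point being made above.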

>
>>
>> >
>> > Br,
>> > Po Liu
>
> Thanks a lot.
>
> Br,
> Po Liu

2020-04-23 11:16:53

by Vlad Buslov

Subject: Re: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action


On Thu 23 Apr 2020 at 12:15, Po Liu <[email protected]> wrote:
> Hi Vlad Buslov,
>
>> > >> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
>> > >> > +{
>> > >> > + struct gate_action *gact = container_of(timer, struct gate_action,
>> > >> > + hitimer);
>> > >> > + struct tcf_gate_params *p = get_gate_param(gact);
>> > >> > + struct tcfg_gate_entry *next;
>> > >> > + ktime_t close_time, now;
>> > >> > +
>> > >> > + spin_lock(&gact->entry_lock);
>> > >> > +
>> > >> > + next = rcu_dereference_protected(gact->next_entry,
>> > >> > +
>> > >> > + lockdep_is_held(&gact->entry_lock));
>> > >> > +
>> > >> > + /* cycle start, clear pending bit, clear total octets */
>> > >> > + gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
>> > >> > + gact->current_entry_octets = 0;
>> > >> > + gact->current_max_octets = next->maxoctets;
>> > >> > +
>> > >> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
>> > >> > + next->interval);
>> > >> > +
>> > >> > + close_time = gact->current_close_time;
>> > >> > +
>> > >> > + if (list_is_last(&next->list, &p->entries))
>> > >> > + next = list_first_entry(&p->entries,
>> > >> > + struct tcfg_gate_entry, list);
>> > >> > + else
>> > >> > + next = list_next_entry(next, list);
>> > >> > +
>> > >> > + now = gate_get_time(gact);
>> > >> > +
>> > >> > + if (ktime_after(now, close_time)) {
>> > >> > + ktime_t cycle, base;
>> > >> > + u64 n;
>> > >> > +
>> > >> > + cycle = p->tcfg_cycletime;
>> > >> > + base = ns_to_ktime(p->tcfg_basetime);
>> > >> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
>> > >> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
>> > >> > + }
>> > >> > +
>> > >> > + rcu_assign_pointer(gact->next_entry, next);
>> > >> > + spin_unlock(&gact->entry_lock);
>> > >>
>> > >> I have couple of question about synchronization here:
>> > >>
>> > >> - Why do you need next_entry to be rcu pointer? It is only assigned
>> > >> here with entry_lock protection and in init code before action is
>> > >> visible to concurrent users. I don't see any unlocked rcu-protected
>> > >> readers here that could benefit from it.
>> > >>
>> > >> - Why create dedicated entry_lock instead of using already existing
>> > >> per- action tcf_lock?
>> > >
>> > > Will try to use the tcf_lock for verification.
>
> I think the reason I added entry_lock was that I couldn't get at the tc_action common parameters in this timer function. If I insist on using tcf_lock, I have to move the hrtimer into struct tcf_gate, which contains the tc_action common.
> What do you think?

Well, if you use tcf_lock instead of rcu to sync with the fastpath, then you
don't need to implement struct gate_action as a standalone object pointed
to by an rcu pointer from the base structure that includes tc_action common.
All the necessary data can be included in the tcf_gate structure directly
and used from both the timer and the action fastpath. See pedit for an
example of an action that doesn't use rcu for its fastpath.
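A rough userspace sketch of that flattened layout (every name here is illustrative, not the eventual kernel code): the schedule entries and the runtime state sit in one structure, and the timer path and the packet fastpath both touch it directly; in the kernel, both paths would do so while holding tcfa_lock:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NENTRIES 2

struct gate_entry {
	bool open;		/* gate state during this interval */
	int64_t maxoctets;	/* -1 = unlimited */
};

/* Flattened action: parameters and runtime state in one struct,
 * as in act_pedit. In the kernel, tcfa_lock would protect it. */
struct gate {
	struct gate_entry entries[NENTRIES];
	int cur;		/* index of the current entry */
	int64_t entry_octets;	/* octets passed in current interval */
};

/* Timer callback: advance to the next schedule entry. */
void gate_tick(struct gate *g)
{
	g->cur = (g->cur + 1) % NENTRIES;
	g->entry_octets = 0;	/* new interval, reset octet budget */
}

/* Fastpath: true if a packet of 'len' octets may pass right now. */
bool gate_pass(struct gate *g, int64_t len)
{
	const struct gate_entry *e = &g->entries[g->cur];

	if (!e->open)
		return false;
	if (e->maxoctets >= 0) {
		g->entry_octets += len;
		if (g->entry_octets > e->maxoctets)
			return false;	/* over the interval's budget */
	}
	return true;
}
```

Because gate_tick() and gate_pass() share the same structure, a single per-action lock taken in both is enough; no rcu pointer swap is required.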

>
>> > > My thinking was that when the timer fires, walking the entry list
>> > > and updating the next expiry takes a comparatively long time, while
>> > > the action function is busy whenever there is traffic, so I used a
>> > > separate lock here to avoid contention between the two paths.
>> > >
>> > >>
>> > >> > +
>> > >> > + hrtimer_set_expires(&gact->hitimer, close_time);
>> > >> > +
>> > >> > + return HRTIMER_RESTART;
>> > >> > +}
>> > >> > +
>> > >> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
>> > >> > + struct tcf_result *res)
>> > >> > +{
>> > >> > + struct tcf_gate *g = to_gate(a);
>> > >> > + struct gate_action *gact;
>> > >> > + int action;
>> > >> > +
>> > >> > + tcf_lastuse_update(&g->tcf_tm);
>> > >> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
>> > >> > +
>> > >> > + action = READ_ONCE(g->tcf_action);
>> > >> > + rcu_read_lock();
>> > >>
>> > >> Action fastpath is already rcu read lock protected, you don't need
>> > >> to manually obtain it.
>> > >
>> > > Will be removed.
>> > >
>> > >>
>> > >> > + gact = rcu_dereference_bh(g->actg);
>> > >> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING))
>> > >> > + {
>> > >>
>> > >> Can't current_gate_status be concurrently modified by timer callback?
>> > >> This function doesn't use entry_lock to synchronize with timer.
>> > >
>> > > Will try tcf_lock either.
>> > >
>> > >>
>> > >> > + rcu_read_unlock();
>> > >> > + return action;
>> > >> > + }
>> > >> > +
>> > >> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
>> > >>
>> > >> ...and here
>> > >>
>> > >> > + goto drop;
>> > >> > +
>> > >> > + if (gact->current_max_octets >= 0) {
>> > >> > + gact->current_entry_octets += qdisc_pkt_len(skb);
>> > >> > + if (gact->current_entry_octets >
>> > >> > + gact->current_max_octets) {
>> > >>
>> > >> here also.
>> > >>
>> > >> > +
>> > >> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
>> > >>
>> > >> Please use tcf_action_inc_overlimit_qstats() and other wrappers for stats.
>> > >> Otherwise it will crash if user passes the TCA_ACT_FLAGS_NO_PERCPU_STATS
>> > >> flag.
>> > >
>> > > The tcf_action_inc_overlimit_qstats() doesn't show the overlimit
>> > > counts in the tc show command output. Is there anything else that
>> > > needs to be done?
>> >
>> > What do you mean? Internally tcf_action_inc_overlimit_qstats() just
>> > calls qstats_overlimit_inc, if cpu_qstats percpu counter is not NULL:
>> >
>> >
>> > if (likely(a->cpu_qstats)) {
>> > qstats_overlimit_inc(this_cpu_ptr(a->cpu_qstats));
>> > return;
>> > }
>> >
>> > Is there a subtle bug somewhere in this function?
>>
>> Sorry, I had updated to use the tcf_action_*() helpers and the counting
>> was ok. I moved back to qstats_overlimit_inc() because the tcf_action_*()
>> helpers take spin_lock(&a->tcfa_lock).
>> I will switch back to the tcf_action_*() helpers for the stats increments.
>>
>> >
>> > >
>> > > Br,
>> > > Po Liu
>>
>> Thanks a lot.
>>
>> Br,
>> Po Liu
>
> Thanks a lot.
>
> Br,
> Po Liu

2020-04-23 19:36:02

by Vinicius Costa Gomes

Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Po Liu <[email protected]> writes:

> Introduce an ingress frame gate control flow action.
> The tc gate action works like this:
> Assume there is a gate that allows specified ingress frames to pass
> during certain time slots and drops them during others. A tc filter
> selects the ingress frames, and the tc gate action specifies in which
> time slots these frames may be passed to the device and in which time
> slots they are dropped.
> The tc gate action provides an entry list that tells how long the gate
> keeps open and how long it keeps closed. The gate action also assigns a
> start time that tells when the entry list starts. The driver then
> repeats the gate entry list cyclically.
> For the software simulation, the gate action requires the user to
> assign a clock type.
>
> Below is a configuration example in user space. The tc filter matches a
> stream with source ip address 192.168.0.20, and the gate action owns two
> time slots: one lasting 200ms with the gate open to let frames pass, and
> one lasting 100ms with the gate closed to drop frames. Once the frames
> passed within one 200000000ns time slot exceed 8000000 bytes in total,
> further frames in that slot are dropped.
>
>> tc qdisc add dev eth0 ingress
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action gate index 2 clockid CLOCK_TAI \
> sched-entry open 200000000 -1 8000000 \
> sched-entry close 100000000 -1 -1

From the insight that Vladimir gave, it would be much easier for me to
understand if you added these filters and actions in two steps: first,
you would add the "time based" actions, and second, you would plug the
filters into the actions. I think this would also match real world usage
better.

Another small usability improvement is to make the "extra" parameters to
'sched-entry close' optional: if packets that arrive during a closed
gate are dropped, those parameters don't make much sense.

>
>> tc chain del dev eth0 ingress chain 0
>
> "sched-entry" follow the name taprio style. Gate state is
> "open"/"close". Follow with period nanosecond. Then next item is internal
> priority value means which ingress queue should put. "-1" means
> wildcard. The last value optional specifies the maximum number of
> MSDU octets that are permitted to pass the gate during the specified
> time interval.
> Base-time is not set will be 0 as default, as result start time would
> be ((N + 1) * cycletime) which is the minimal of future time.
>
> The example below filters a stream with destination mac address
> 10:00:80:00:00:00 and ip protocol ICMP, followed by the gate action. The
> gate action runs with a single close time slot, which means it always
> keeps closed. The total time cycle is 200000000ns. The start time is
> calculated as:
>
> 1357000000000 + (N + 1) * cycletime
>
> The first such value that lies in the future becomes the start time.
> The cycletime here is 200000000ns for this case.
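The calculation above can be checked with a small userspace model (hypothetical helper names; the kernel uses ktime_t arithmetic): the cycletime, when not specified, is the sum of the sched-entry intervals, here a single 200000000ns close entry, and the start time is the first cycle boundary at or after the current time:

```c
#include <assert.h>
#include <stdint.h>

/* The cycle time, when not given, defaults to the sum of the
 * sched-entry intervals; here one close entry of 200000000 ns. */
uint64_t cycle_from_entries(const uint32_t *intervals, int n)
{
	uint64_t cycle = 0;

	for (int i = 0; i < n; i++)
		cycle += intervals[i];
	return cycle;
}

/* First start time at or after 'now': base + (N + 1) * cycletime,
 * where N is the number of full cycles elapsed since the base time. */
uint64_t first_start(uint64_t base, uint64_t cycle, uint64_t now)
{
	if (base > now)
		return base;
	return base + ((now - base) / cycle + 1) * cycle;
}
```

For example, with now = 1360000000000, N = 15 full cycles have elapsed since the base time, so the start time is 1357000000000 + 16 * 200000000 = 1360200000000.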
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> action gate index 12 base-time 1357000000000 \
> sched-entry close 200000000 -1 -1 \
> clockid CLOCK_TAI
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/tc_act/tc_gate.h | 54 +++
> include/uapi/linux/pkt_cls.h | 1 +
> include/uapi/linux/tc_act/tc_gate.h | 47 ++
> net/sched/Kconfig | 13 +
> net/sched/Makefile | 1 +
> net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
> 6 files changed, 763 insertions(+)
> create mode 100644 include/net/tc_act/tc_gate.h
> create mode 100644 include/uapi/linux/tc_act/tc_gate.h
> create mode 100644 net/sched/act_gate.c
>
> diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> new file mode 100644
> index 000000000000..b0ace55b2aaa
> --- /dev/null
> +++ b/include/net/tc_act/tc_gate.h
> @@ -0,0 +1,54 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* Copyright 2020 NXP */
> +
> +#ifndef __NET_TC_GATE_H
> +#define __NET_TC_GATE_H
> +
> +#include <net/act_api.h>
> +#include <linux/tc_act/tc_gate.h>
> +
> +struct tcfg_gate_entry {
> + int index;
> + u8 gate_state;
> + u32 interval;
> + s32 ipv;
> + s32 maxoctets;
> + struct list_head list;
> +};
> +
> +struct tcf_gate_params {
> + s32 tcfg_priority;
> + u64 tcfg_basetime;
> + u64 tcfg_cycletime;
> + u64 tcfg_cycletime_ext;
> + u32 tcfg_flags;
> + s32 tcfg_clockid;
> + size_t num_entries;
> + struct list_head entries;
> +};
> +
> +#define GATE_ACT_GATE_OPEN BIT(0)
> +#define GATE_ACT_PENDING BIT(1)
> +struct gate_action {
> + struct tcf_gate_params param;
> + spinlock_t entry_lock;
> + u8 current_gate_status;
> + ktime_t current_close_time;
> + u32 current_entry_octets;
> + s32 current_max_octets;
> + struct tcfg_gate_entry __rcu *next_entry;
> + struct hrtimer hitimer;
> + enum tk_offsets tk_offset;
> + struct rcu_head rcu;
> +};
> +
> +struct tcf_gate {
> + struct tc_action common;
> + struct gate_action __rcu *actg;
> +};
> +#define to_gate(a) ((struct tcf_gate *)a)
> +
> +#define get_gate_param(act) ((struct tcf_gate_params *)act)
> +#define get_gate_action(p) ((struct gate_action *)p)
> +
> +#endif
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 9f06d29cab70..fc672b232437 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -134,6 +134,7 @@ enum tca_id {
> TCA_ID_CTINFO,
> TCA_ID_MPLS,
> TCA_ID_CT,
> + TCA_ID_GATE,
> /* other actions go here */
> __TCA_ID_MAX = 255
> };
> diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
> new file mode 100644
> index 000000000000..f214b3a6d44f
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_gate.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/* Copyright 2020 NXP */
> +
> +#ifndef __LINUX_TC_GATE_H
> +#define __LINUX_TC_GATE_H
> +
> +#include <linux/pkt_cls.h>
> +
> +struct tc_gate {
> + tc_gen;
> +};
> +
> +enum {
> + TCA_GATE_ENTRY_UNSPEC,
> + TCA_GATE_ENTRY_INDEX,
> + TCA_GATE_ENTRY_GATE,
> + TCA_GATE_ENTRY_INTERVAL,
> + TCA_GATE_ENTRY_IPV,
> + TCA_GATE_ENTRY_MAX_OCTETS,
> + __TCA_GATE_ENTRY_MAX,
> +};
> +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> +
> +enum {
> + TCA_GATE_ONE_ENTRY_UNSPEC,
> + TCA_GATE_ONE_ENTRY,
> + __TCA_GATE_ONE_ENTRY_MAX,
> +};
> +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
> +
> +enum {
> + TCA_GATE_UNSPEC,
> + TCA_GATE_TM,
> + TCA_GATE_PARMS,
> + TCA_GATE_PAD,
> + TCA_GATE_PRIORITY,
> + TCA_GATE_ENTRY_LIST,
> + TCA_GATE_BASE_TIME,
> + TCA_GATE_CYCLE_TIME,
> + TCA_GATE_CYCLE_TIME_EXT,
> + TCA_GATE_FLAGS,
> + TCA_GATE_CLOCKID,
> + __TCA_GATE_MAX,
> +};
> +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> +
> +#endif
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index bfbefb7bff9d..1314549c7567 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -981,6 +981,19 @@ config NET_ACT_CT
> To compile this code as a module, choose M here: the
> module will be called act_ct.
>
> +config NET_ACT_GATE
> + tristate "Frame gate entry list control tc action"
> + depends on NET_CLS_ACT
> + help
> + Say Y here to allow to control the ingress flow to be passed at
> + specific time slot and be dropped at other specific time slot by
> + the gate entry list. The manipulation will simulate the IEEE
> + 802.1Qci stream gate control behavior.
> +
> + If unsure, say N.
> + To compile this code as a module, choose M here: the
> + module will be called act_gate.
> +
> config NET_IFE_SKBMARK
> tristate "Support to encoding decoding skb mark on IFE action"
> depends on NET_ACT_IFE
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 31c367a6cd09..66bbf9a98f9e 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
> obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> new file mode 100644
> index 000000000000..e932f402b4f1
> --- /dev/null
> +++ b/net/sched/act_gate.c
> @@ -0,0 +1,647 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Copyright 2020 NXP */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <net/act_api.h>
> +#include <net/netlink.h>
> +#include <net/pkt_cls.h>
> +#include <net/tc_act/tc_gate.h>
> +
> +static unsigned int gate_net_id;
> +static struct tc_action_ops act_gate_ops;
> +
> +static ktime_t gate_get_time(struct gate_action *gact)
> +{
> + ktime_t mono = ktime_get();
> +
> + switch (gact->tk_offset) {
> + case TK_OFFS_MAX:
> + return mono;
> + default:
> + return ktime_mono_to_any(mono, gact->tk_offset);
> + }
> +
> + return KTIME_MAX;
> +}
> +
> +static int gate_get_start_time(struct gate_action *gact, ktime_t *start)
> +{
> + struct tcf_gate_params *param = get_gate_param(gact);
> + ktime_t now, base, cycle;
> + u64 n;
> +
> + base = ns_to_ktime(param->tcfg_basetime);
> + now = gate_get_time(gact);
> +
> + if (ktime_after(base, now)) {
> + *start = base;
> + return 0;
> + }
> +
> + cycle = param->tcfg_cycletime;
> +
> + /* cycle time should not be zero */
> + if (WARN_ON(!cycle))
> + return -EFAULT;
> +
> + n = div64_u64(ktime_sub_ns(now, base), cycle);
> + *start = ktime_add_ns(base, (n + 1) * cycle);
> + return 0;
> +}
> +
> +static void gate_start_timer(struct gate_action *gact, ktime_t start)
> +{
> + ktime_t expires;
> +
> + expires = hrtimer_get_expires(&gact->hitimer);
> + if (expires == 0)
> + expires = KTIME_MAX;
> +
> + start = min_t(ktime_t, start, expires);
> +
> + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
> +}
> +
> +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
> +{
> + struct gate_action *gact = container_of(timer, struct gate_action,
> + hitimer);
> + struct tcf_gate_params *p = get_gate_param(gact);
> + struct tcfg_gate_entry *next;
> + ktime_t close_time, now;
> +
> + spin_lock(&gact->entry_lock);
> +
> + next = rcu_dereference_protected(gact->next_entry,
> + lockdep_is_held(&gact->entry_lock));
> +
> + /* cycle start, clear pending bit, clear total octets */
> + gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
> + gact->current_entry_octets = 0;
> + gact->current_max_octets = next->maxoctets;
> +
> + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> + next->interval);
> +
> + close_time = gact->current_close_time;
> +
> + if (list_is_last(&next->list, &p->entries))
> + next = list_first_entry(&p->entries,
> + struct tcfg_gate_entry, list);
> + else
> + next = list_next_entry(next, list);
> +
> + now = gate_get_time(gact);
> +
> + if (ktime_after(now, close_time)) {
> + ktime_t cycle, base;
> + u64 n;
> +
> + cycle = p->tcfg_cycletime;
> + base = ns_to_ktime(p->tcfg_basetime);
> + n = div64_u64(ktime_sub_ns(now, base), cycle);
> + close_time = ktime_add_ns(base, (n + 1) * cycle);
> + }
> +
> + rcu_assign_pointer(gact->next_entry, next);
> + spin_unlock(&gact->entry_lock);
> +
> + hrtimer_set_expires(&gact->hitimer, close_time);
> +
> + return HRTIMER_RESTART;
> +}
> +
> +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> + struct tcf_result *res)
> +{
> + struct tcf_gate *g = to_gate(a);
> + struct gate_action *gact;
> + int action;
> +
> + tcf_lastuse_update(&g->tcf_tm);
> + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
> +
> + action = READ_ONCE(g->tcf_action);
> + rcu_read_lock();
> + gact = rcu_dereference_bh(g->actg);
> + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
> + rcu_read_unlock();
> + return action;
> + }
> +
> + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
> + goto drop;
> +
> + if (gact->current_max_octets >= 0) {
> + gact->current_entry_octets += qdisc_pkt_len(skb);
> + if (gact->current_entry_octets > gact->current_max_octets) {
> + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
> + goto drop;
> + }
> + }
> + rcu_read_unlock();
> +
> + return action;
> +drop:
> + rcu_read_unlock();
> + qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
> + return TC_ACT_SHOT;
> +}
> +
> +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
> + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
> + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
> + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
> + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
> + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
> +};
> +
> +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
> + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
> + .type = NLA_EXACT_LEN },
> + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
> + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
> + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
> + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
> + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
> + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
> + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
> +};
> +
> +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
> + struct netlink_ext_ack *extack)
> +{
> + u32 interval = 0;
> +
> + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
> +
> + if (tb[TCA_GATE_ENTRY_INTERVAL])
> + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
> +
> + if (interval == 0) {
> + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> + return -EINVAL;
> + }
> +
> + entry->interval = interval;
> +
> + if (tb[TCA_GATE_ENTRY_IPV])
> + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
> + else
> + entry->ipv = -1;
> +
> + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
> + entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
> + else
> + entry->maxoctets = -1;
> +
> + return 0;
> +}
> +
> +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
> + int index, struct netlink_ext_ack *extack)
> +{
> + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
> + int err;
> +
> + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
> + if (err < 0) {
> + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
> + return -EINVAL;
> + }
> +
> + entry->index = index;
> +
> + return fill_gate_entry(tb, entry, extack);
> +}
> +
> +static int parse_gate_list(struct nlattr *list_attr,
> + struct tcf_gate_params *sched,
> + struct netlink_ext_ack *extack)
> +{
> + struct tcfg_gate_entry *entry, *e;
> + struct nlattr *n;
> + int err, rem;
> + int i = 0;
> +
> + if (!list_attr)
> + return -EINVAL;
> +
> + nla_for_each_nested(n, list_attr, rem) {
> + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
> + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
> + continue;
> + }
> +
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry) {
> + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
> + err = -ENOMEM;
> + goto release_list;
> + }
> +
> + err = parse_gate_entry(n, entry, i, extack);
> + if (err < 0) {
> + kfree(entry);
> + goto release_list;
> + }
> +
> + list_add_tail(&entry->list, &sched->entries);
> + i++;
> + }
> +
> + sched->num_entries = i;
> +
> + return i;
> +
> +release_list:
> + list_for_each_entry_safe(entry, e, &sched->entries, list) {
> + list_del(&entry->list);
> + kfree(entry);
> + }
> +
> + return err;
> +}
> +
> +static int tcf_gate_init(struct net *net, struct nlattr *nla,
> + struct nlattr *est, struct tc_action **a,
> + int ovr, int bind, bool rtnl_held,
> + struct tcf_proto *tp, u32 flags,
> + struct netlink_ext_ack *extack)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> + enum tk_offsets tk_offset = TK_OFFS_TAI;
> + struct nlattr *tb[TCA_GATE_MAX + 1];
> + struct tcf_chain *goto_ch = NULL;
> + struct tcfg_gate_entry *next;
> + struct tcf_gate_params *p;
> + struct gate_action *gact;
> + s32 clockid = CLOCK_TAI;
> + struct tc_gate *parm;
> + struct tcf_gate *g;
> + int ret = 0, err;
> + u64 basetime = 0;
> + u32 gflags = 0;
> + s32 prio = -1;
> + ktime_t start;
> + u32 index;
> +
> + if (!nla)
> + return -EINVAL;
> +
> + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
> + if (err < 0)
> + return err;
> +
> + if (!tb[TCA_GATE_PARMS])
> + return -EINVAL;
> + parm = nla_data(tb[TCA_GATE_PARMS]);
> + index = parm->index;
> + err = tcf_idr_check_alloc(tn, &index, a, bind);
> + if (err < 0)
> + return err;
> +
> + if (err && bind)
> + return 0;
> +
> + if (!err) {
> + ret = tcf_idr_create_from_flags(tn, index, est, a,
> + &act_gate_ops, bind, flags);
> + if (ret) {
> + tcf_idr_cleanup(tn, index);
> + return ret;
> + }
> +
> + ret = ACT_P_CREATED;
> + } else if (!ovr) {
> + tcf_idr_release(*a, bind);
> + return -EEXIST;
> + }
> +
> + if (tb[TCA_GATE_PRIORITY])
> + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
> +
> + if (tb[TCA_GATE_BASE_TIME])
> + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
> +
> + if (tb[TCA_GATE_FLAGS])
> + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
> +
> + if (tb[TCA_GATE_CLOCKID]) {
> + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> + switch (clockid) {
> + case CLOCK_REALTIME:
> + tk_offset = TK_OFFS_REAL;
> + break;
> + case CLOCK_MONOTONIC:
> + tk_offset = TK_OFFS_MAX;
> + break;
> + case CLOCK_BOOTTIME:
> + tk_offset = TK_OFFS_BOOT;
> + break;
> + case CLOCK_TAI:
> + tk_offset = TK_OFFS_TAI;
> + break;
> + default:
> + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> + goto release_idr;
> + }
> + }
> +
> + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> + if (err < 0)
> + goto release_idr;
> +
> + g = to_gate(*a);
> +
> + gact = kzalloc(sizeof(*gact), GFP_KERNEL);
> + if (!gact) {
> + err = -ENOMEM;
> + goto put_chain;
> + }
> +
> + p = get_gate_param(gact);
> +
> + INIT_LIST_HEAD(&p->entries);
> + if (tb[TCA_GATE_ENTRY_LIST]) {
> + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> + if (err < 0)
> + goto release_mem;
> + }
> +
> + if (tb[TCA_GATE_CYCLE_TIME]) {
> + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
> + } else {
> + struct tcfg_gate_entry *entry;
> + ktime_t cycle = 0;
> +
> + list_for_each_entry(entry, &p->entries, list)
> + cycle = ktime_add_ns(cycle, entry->interval);
> + p->tcfg_cycletime = cycle;
> + }
> +
> + if (tb[TCA_GATE_CYCLE_TIME_EXT])
> + p->tcfg_cycletime_ext =
> + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
> +
> + p->tcfg_priority = prio;
> + p->tcfg_basetime = basetime;
> + p->tcfg_clockid = clockid;
> + p->tcfg_flags = gflags;
> +
> + gact->tk_offset = tk_offset;
> + spin_lock_init(&gact->entry_lock);
> + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
> + gact->hitimer.function = gate_timer_func;
> +

One idea that just occurred to me: if you find a way to enable RX
timestamping and can rely on all packets having a timestamp, the code can
be simplified a lot. You wouldn't need any hrtimers, and deciding whether
or not to drop a packet becomes a couple of mathematical operations. Seems
worth a thought.

The real question is whether requiring the driver to support at least
software RX timestamping is excessive (it doesn't seem so to me).


Cheers,
--
Vinicius

2020-04-23 19:41:08

by [email protected]

[permalink] [raw]
Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

On 23.04.2020 10:38, Vinicius Costa Gomes wrote:
>
>Po Liu <[email protected]> writes:
>>> tc filter add dev eth0 parent ffff: protocol ip \
>> flower src_ip 192.168.0.20 \
>> action gate index 2 clockid CLOCK_TAI \
>> sched-entry open 200000000 -1 8000000 \
>> sched-entry close 100000000 -1 -1
>
>From the insight that Vladimir gave, it would really be easier for me to
>understand if you added these filters and actions in two steps: first,
>you would add the "time based" actions, and second, you would plug the
>filters into those actions. And I think this would match real-world
>usage better.
I agree.

/Allan

2020-04-23 19:42:30

by [email protected]

[permalink] [raw]
Subject: Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

On 22.04.2020 22:28, Vladimir Oltean wrote:
>> >> tc qdisc add dev eth0 ingress
>> >
>> >> tc filter add dev eth0 parent ffff: protocol ip \
>> > flower src_ip 192.168.0.20 \
>> > action gate index 2 clockid CLOCK_TAI \
>> > sched-entry open 200000000 -1 8000000 \
>> > sched-entry close 100000000 -1 -1
>>
>> First of all, it has been a long time since I read 802.1Qci, and when I
>> did, it was a draft. So please let me know if I'm completely off here.
>>
>> I know you are focusing on the gate control in this patch series, but I
>> assume that you will later want to do the policing and flow metering as
>> well, and it could make sense to consider how all of this works
>> together.
>>
>> A common use-case for policing is to have multiple rules pointing at
>> the same policing instance. Maybe you want the sum of the traffic on two
>> ports to be limited to 100 Mbit. If you specify such an action on the
>> individual rule (as done with the gate), then you cannot have two
>> rules pointing at the same policer instance.
>>
>> Long story short, have you considered whether it would be better to do
>> something like:
>>
>> tc filter add dev eth0 parent ffff: protocol ip \
>> flower src_ip 192.168.0.20 \
>> action psfp-id 42
>>
>> And then have some other function to configure the properties of psfp-id
>> 42?
>>
>>
>> /Allan
>>
>
>It is very good that you brought it up though, since in my opinion too
>it is a rather important aspect, and it seems that the fact that this
>feature is already designed in was a bit too subtle.
>
>"psfp-id" is actually his "index" argument.
Ahh... Thanks for clarifying, I missed this point completely.
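A hedged sketch of what such a two-step, shared-instance configuration could look like with the syntax from this series ("index 42" standing in for Allan's psfp-id; whether `tc actions add` is wired up for the gate action depends on the iproute2 side, so treat this as illustrative only):

```
# step 1: create the gate action instance on its own
tc actions add action gate index 42 clockid CLOCK_TAI \
   sched-entry open 200000000 -1 8000000 \
   sched-entry close 100000000 -1 -1

# step 2: plug filters into it; both filters share gate instance 42
tc filter add dev eth0 parent ffff: protocol ip \
   flower src_ip 192.168.0.20 action gate index 42
tc filter add dev eth1 parent ffff: protocol ip \
   flower src_ip 192.168.0.20 action gate index 42
```

This mirrors the usual tc action sharing model, where referring to an existing action by its index attaches the filter to the same instance rather than creating a new one.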

2020-04-24 03:25:26

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Hi Vinicius,


> -----Original Message-----
> From: Vinicius Costa Gomes <[email protected]>
> Sent: April 24, 2020 1:38
> To: Po Liu <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]
> Cc: Po Liu <[email protected]>; Claudiu Manoil <[email protected]>;
> Vladimir Oltean <[email protected]>; Alexandru Marginean
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Po Liu
> <[email protected]>
> Subject: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow
> action
>
>
> Po Liu <[email protected]> writes:
>
> > Introduce an ingress frame gate control flow action.
> > The tc gate action works like this:
> > Assume there is a gate that allows specified ingress frames to pass at
> > certain time slots and drops them at others. A tc filter selects the
> > ingress frames, and the tc gate action specifies in which time slots
> > those frames may be passed to the device and in which time slots they
> > are dropped.
> > The tc gate action provides an entry list that tells how long the gate
> > stays open and how long it stays closed. The gate action also assigns
> > a start time that tells when the entry list begins. The driver then
> > repeats the gate entry list cyclically.
> > For the software simulation, the gate action requires the user to
> > assign a clock type.
> >
> > Below is an example configuration from user space. The tc filter
> > matches a stream whose source IP address is 192.168.0.20, and the gate
> > action owns two time slots: one keeps the gate open for 200 ms,
> > letting frames pass, and the other keeps the gate closed for 100 ms,
> > dropping frames. Once the frames passed in one 200000000 ns open slot
> > total more than 8000000 bytes, further frames in that slot are dropped.
> >
> >> tc qdisc add dev eth0 ingress
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower src_ip 192.168.0.20 \
> > action gate index 2 clockid CLOCK_TAI \
> > sched-entry open 200000000 -1 8000000 \
> > sched-entry close 100000000 -1 -1
>
> From the insight that Vladimir gave, it would really be easier for me to
> understand if you added these filters and actions in two steps: first,
> you would add the "time based" actions, and second, you would plug the
> filters into those actions. And I think this would match real-world
> usage better.
>
> Another small usability improvement is to make the "extra" parameters to
> 'sched-entry close' optional; if packets that arrive during a closed gate
> are dropped, those parameters don't make much sense.
>

This makes sense. Vladimir Oltean also suggested using default values when the last two parameters are not set. I will modify this in the userspace iproute2 patch.
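With such defaults in place, the closed entry from the example above could presumably be written without the trailing "-1 -1" (hypothetical syntax, pending that iproute2 change):

```
tc filter add dev eth0 parent ffff: protocol ip \
   flower src_ip 192.168.0.20 \
   action gate index 2 clockid CLOCK_TAI \
   sched-entry open 200000000 -1 8000000 \
   sched-entry close 100000000
```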

> >
> >> tc chain del dev eth0 ingress chain 0
> >
> > "sched-entry" follows the taprio naming style. The gate state is
> > "open"/"close", followed by the period in nanoseconds. The next item
> > is the internal priority value, which selects the ingress queue to
> > use; "-1" means wildcard. The last, optional value specifies the
> > maximum number of MSDU octets that are permitted to pass the gate
> > during the specified time interval.
> > If base-time is not set it defaults to 0; as a result, the start time
> > will be ((N + 1) * cycletime), the nearest time in the future.
> >
> > The example below filters a stream whose destination MAC address is
> > 10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
> > action. The gate action runs with one closed time slot, which means
> > the gate is always kept closed. The total cycle time is 200000000 ns.
> > The base-time is calculated as:
> >
> > 1357000000000 + (N + 1) * cycletime
> >
> > The first such value that lies in the future becomes the start time.
> > The cycletime here is 200000000 ns for this case.
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> > action gate index 12 base-time 1357000000000 \
> > sched-entry close 200000000 -1 -1 \
> > clockid CLOCK_TAI
> >
> > Signed-off-by: Po Liu <[email protected]>
> > ---
> > include/net/tc_act/tc_gate.h | 54 +++
> > include/uapi/linux/pkt_cls.h | 1 +
> > include/uapi/linux/tc_act/tc_gate.h | 47 ++
> > net/sched/Kconfig | 13 +
> > net/sched/Makefile | 1 +
> > net/sched/act_gate.c | 647 ++++++++++++++++++++++++++++
> > 6 files changed, 763 insertions(+)
> > create mode 100644 include/net/tc_act/tc_gate.h create mode 100644
> > include/uapi/linux/tc_act/tc_gate.h
> > create mode 100644 net/sched/act_gate.c
> >
> > diff --git a/include/net/tc_act/tc_gate.h
> > b/include/net/tc_act/tc_gate.h new file mode 100644 index
> > 000000000000..b0ace55b2aaa
> > --- /dev/null
> > +++ b/include/net/tc_act/tc_gate.h
> > @@ -0,0 +1,54 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/* Copyright 2020 NXP */
> > +
> > +#ifndef __NET_TC_GATE_H
> > +#define __NET_TC_GATE_H
> > +
> > +#include <net/act_api.h>
> > +#include <linux/tc_act/tc_gate.h>
> > +
> > +struct tcfg_gate_entry {
> > + int index;
> > + u8 gate_state;
> > + u32 interval;
> > + s32 ipv;
> > + s32 maxoctets;
> > + struct list_head list;
> > +};
> > +
> > +struct tcf_gate_params {
> > + s32 tcfg_priority;
> > + u64 tcfg_basetime;
> > + u64 tcfg_cycletime;
> > + u64 tcfg_cycletime_ext;
> > + u32 tcfg_flags;
> > + s32 tcfg_clockid;
> > + size_t num_entries;
> > + struct list_head entries;
> > +};
> > +
> > +#define GATE_ACT_GATE_OPEN BIT(0)
> > +#define GATE_ACT_PENDING BIT(1)
> > +struct gate_action {
> > + struct tcf_gate_params param;
> > + spinlock_t entry_lock;
> > + u8 current_gate_status;
> > + ktime_t current_close_time;
> > + u32 current_entry_octets;
> > + s32 current_max_octets;
> > + struct tcfg_gate_entry __rcu *next_entry;
> > + struct hrtimer hitimer;
> > + enum tk_offsets tk_offset;
> > + struct rcu_head rcu;
> > +};
> > +
> > +struct tcf_gate {
> > + struct tc_action common;
> > + struct gate_action __rcu *actg;
> > +};
> > +#define to_gate(a) ((struct tcf_gate *)a)
> > +
> > +#define get_gate_param(act) ((struct tcf_gate_params *)act) #define
> > +get_gate_action(p) ((struct gate_action *)p)
> > +
> > +#endif
> > diff --git a/include/uapi/linux/pkt_cls.h
> > b/include/uapi/linux/pkt_cls.h index 9f06d29cab70..fc672b232437
> 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -134,6 +134,7 @@ enum tca_id {
> > TCA_ID_CTINFO,
> > TCA_ID_MPLS,
> > TCA_ID_CT,
> > + TCA_ID_GATE,
> > /* other actions go here */
> > __TCA_ID_MAX = 255
> > };
> > diff --git a/include/uapi/linux/tc_act/tc_gate.h
> > b/include/uapi/linux/tc_act/tc_gate.h
> > new file mode 100644
> > index 000000000000..f214b3a6d44f
> > --- /dev/null
> > +++ b/include/uapi/linux/tc_act/tc_gate.h
> > @@ -0,0 +1,47 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +/* Copyright 2020 NXP */
> > +
> > +#ifndef __LINUX_TC_GATE_H
> > +#define __LINUX_TC_GATE_H
> > +
> > +#include <linux/pkt_cls.h>
> > +
> > +struct tc_gate {
> > + tc_gen;
> > +};
> > +
> > +enum {
> > + TCA_GATE_ENTRY_UNSPEC,
> > + TCA_GATE_ENTRY_INDEX,
> > + TCA_GATE_ENTRY_GATE,
> > + TCA_GATE_ENTRY_INTERVAL,
> > + TCA_GATE_ENTRY_IPV,
> > + TCA_GATE_ENTRY_MAX_OCTETS,
> > + __TCA_GATE_ENTRY_MAX,
> > +};
> > +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> > +
> > +enum {
> > + TCA_GATE_ONE_ENTRY_UNSPEC,
> > + TCA_GATE_ONE_ENTRY,
> > + __TCA_GATE_ONE_ENTRY_MAX,
> > +};
> > +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX
> - 1)
> > +
> > +enum {
> > + TCA_GATE_UNSPEC,
> > + TCA_GATE_TM,
> > + TCA_GATE_PARMS,
> > + TCA_GATE_PAD,
> > + TCA_GATE_PRIORITY,
> > + TCA_GATE_ENTRY_LIST,
> > + TCA_GATE_BASE_TIME,
> > + TCA_GATE_CYCLE_TIME,
> > + TCA_GATE_CYCLE_TIME_EXT,
> > + TCA_GATE_FLAGS,
> > + TCA_GATE_CLOCKID,
> > + __TCA_GATE_MAX,
> > +};
> > +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> > +
> > +#endif
> > diff --git a/net/sched/Kconfig b/net/sched/Kconfig index
> > bfbefb7bff9d..1314549c7567 100644
> > --- a/net/sched/Kconfig
> > +++ b/net/sched/Kconfig
> > @@ -981,6 +981,19 @@ config NET_ACT_CT
> > To compile this code as a module, choose M here: the
> > module will be called act_ct.
> >
> > +config NET_ACT_GATE
> > + tristate "Frame gate entry list control tc action"
> > + depends on NET_CLS_ACT
> > + help
> > + Say Y here to allow controlling whether the ingress flow is passed
> > + or dropped at specific time slots according to the gate entry
> > + list. This simulates the IEEE 802.1Qci stream gate control
> > + behavior.
> > +
> > + If unsure, say N.
> > + To compile this code as a module, choose M here: the
> > + module will be called act_gate.
> > +
> > config NET_IFE_SKBMARK
> > tristate "Support to encoding decoding skb mark on IFE action"
> > depends on NET_ACT_IFE
> > diff --git a/net/sched/Makefile b/net/sched/Makefile index
> > 31c367a6cd09..66bbf9a98f9e 100644
> > --- a/net/sched/Makefile
> > +++ b/net/sched/Makefile
> > @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) +=
> act_meta_skbprio.o
> > obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> > obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> > obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> > +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> > obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> > obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> > obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c new file mode
> > 100644 index 000000000000..e932f402b4f1
> > --- /dev/null
> > +++ b/net/sched/act_gate.c
> > @@ -0,0 +1,647 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/* Copyright 2020 NXP */
> > +
> > +#include <linux/module.h>
> > +#include <linux/types.h>
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +#include <linux/errno.h>
> > +#include <linux/skbuff.h>
> > +#include <linux/rtnetlink.h>
> > +#include <linux/init.h>
> > +#include <linux/slab.h>
> > +#include <net/act_api.h>
> > +#include <net/netlink.h>
> > +#include <net/pkt_cls.h>
> > +#include <net/tc_act/tc_gate.h>
> > +
> > +static unsigned int gate_net_id;
> > +static struct tc_action_ops act_gate_ops;
> > +
> > +static ktime_t gate_get_time(struct gate_action *gact) {
> > + ktime_t mono = ktime_get();
> > +
> > + switch (gact->tk_offset) {
> > + case TK_OFFS_MAX:
> > + return mono;
> > + default:
> > + return ktime_mono_to_any(mono, gact->tk_offset);
> > + }
> > +
> > + return KTIME_MAX;
> > +}
> > +
> > +static int gate_get_start_time(struct gate_action *gact, ktime_t
> > +*start) {
> > + struct tcf_gate_params *param = get_gate_param(gact);
> > + ktime_t now, base, cycle;
> > + u64 n;
> > +
> > + base = ns_to_ktime(param->tcfg_basetime);
> > + now = gate_get_time(gact);
> > +
> > + if (ktime_after(base, now)) {
> > + *start = base;
> > + return 0;
> > + }
> > +
> > + cycle = param->tcfg_cycletime;
> > +
> > + /* cycle time should not be zero */
> > + if (WARN_ON(!cycle))
> > + return -EFAULT;
> > +
> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > + *start = ktime_add_ns(base, (n + 1) * cycle);
> > + return 0;
> > +}
> > +
> > +static void gate_start_timer(struct gate_action *gact, ktime_t start)
> > +{
> > + ktime_t expires;
> > +
> > + expires = hrtimer_get_expires(&gact->hitimer);
> > + if (expires == 0)
> > + expires = KTIME_MAX;
> > +
> > + start = min_t(ktime_t, start, expires);
> > +
> > + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS); }
> > +
> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer) {
> > + struct gate_action *gact = container_of(timer, struct gate_action,
> > + hitimer);
> > + struct tcf_gate_params *p = get_gate_param(gact);
> > + struct tcfg_gate_entry *next;
> > + ktime_t close_time, now;
> > +
> > + spin_lock(&gact->entry_lock);
> > +
> > + next = rcu_dereference_protected(gact->next_entry,
> > +
> > + lockdep_is_held(&gact->entry_lock));
> > +
> > + /* cycle start, clear pending bit, clear total octets */
> > + gact->current_gate_status = next->gate_state ?
> GATE_ACT_GATE_OPEN : 0;
> > + gact->current_entry_octets = 0;
> > + gact->current_max_octets = next->maxoctets;
> > +
> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> > + next->interval);
> > +
> > + close_time = gact->current_close_time;
> > +
> > + if (list_is_last(&next->list, &p->entries))
> > + next = list_first_entry(&p->entries,
> > + struct tcfg_gate_entry, list);
> > + else
> > + next = list_next_entry(next, list);
> > +
> > + now = gate_get_time(gact);
> > +
> > + if (ktime_after(now, close_time)) {
> > + ktime_t cycle, base;
> > + u64 n;
> > +
> > + cycle = p->tcfg_cycletime;
> > + base = ns_to_ktime(p->tcfg_basetime);
> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
> > + }
> > +
> > + rcu_assign_pointer(gact->next_entry, next);
> > + spin_unlock(&gact->entry_lock);
> > +
> > + hrtimer_set_expires(&gact->hitimer, close_time);
> > +
> > + return HRTIMER_RESTART;
> > +}
> > +
> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> > + struct tcf_result *res) {
> > + struct tcf_gate *g = to_gate(a);
> > + struct gate_action *gact;
> > + int action;
> > +
> > + tcf_lastuse_update(&g->tcf_tm);
> > + bstats_cpu_update(this_cpu_ptr(g->common.cpu_bstats), skb);
> > +
> > + action = READ_ONCE(g->tcf_action);
> > + rcu_read_lock();
> > + gact = rcu_dereference_bh(g->actg);
> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
> > + rcu_read_unlock();
> > + return action;
> > + }
> > +
> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
> > + goto drop;
> > +
> > + if (gact->current_max_octets >= 0) {
> > + gact->current_entry_octets += qdisc_pkt_len(skb);
> > + if (gact->current_entry_octets > gact->current_max_octets) {
> > + qstats_overlimit_inc(this_cpu_ptr(g->common.cpu_qstats));
> > + goto drop;
> > + }
> > + }
> > + rcu_read_unlock();
> > +
> > + return action;
> > +drop:
> > + rcu_read_unlock();
> > + qstats_drop_inc(this_cpu_ptr(g->common.cpu_qstats));
> > + return TC_ACT_SHOT;
> > +}
> > +
> > +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1]
> = {
> > + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
> > + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
> > + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
> > + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
> > + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
> > +};
> > +
> > +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
> > + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
> > + .type = NLA_EXACT_LEN },
> > + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
> > + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
> > + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
> > + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
> > + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
> > + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
> > + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
> > +};
> > +
> > +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
> > + struct netlink_ext_ack *extack) {
> > + u32 interval = 0;
> > +
> > + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
> > +
> > + if (tb[TCA_GATE_ENTRY_INTERVAL])
> > + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
> > +
> > + if (interval == 0) {
> > + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> > + return -EINVAL;
> > + }
> > +
> > + entry->interval = interval;
> > +
> > + if (tb[TCA_GATE_ENTRY_IPV])
> > + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
> > + else
> > + entry->ipv = -1;
> > +
> > + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
> > + entry->maxoctets =
> nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
> > + else
> > + entry->maxoctets = -1;
> > +
> > + return 0;
> > +}
> > +
> > +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry
> *entry,
> > + int index, struct netlink_ext_ack *extack) {
> > + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
> > + int err;
> > +
> > + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy,
> extack);
> > + if (err < 0) {
> > + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
> > + return -EINVAL;
> > + }
> > +
> > + entry->index = index;
> > +
> > + return fill_gate_entry(tb, entry, extack); }
> > +
> > +static int parse_gate_list(struct nlattr *list_attr,
> > + struct tcf_gate_params *sched,
> > + struct netlink_ext_ack *extack) {
> > + struct tcfg_gate_entry *entry, *e;
> > + struct nlattr *n;
> > + int err, rem;
> > + int i = 0;
> > +
> > + if (!list_attr)
> > + return -EINVAL;
> > +
> > + nla_for_each_nested(n, list_attr, rem) {
> > + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
> > + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
> > + continue;
> > + }
> > +
> > + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> > + if (!entry) {
> > + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
> > + err = -ENOMEM;
> > + goto release_list;
> > + }
> > +
> > + err = parse_gate_entry(n, entry, i, extack);
> > + if (err < 0) {
> > + kfree(entry);
> > + goto release_list;
> > + }
> > +
> > + list_add_tail(&entry->list, &sched->entries);
> > + i++;
> > + }
> > +
> > + sched->num_entries = i;
> > +
> > + return i;
> > +
> > +release_list:
> > + list_for_each_entry_safe(entry, e, &sched->entries, list) {
> > + list_del(&entry->list);
> > + kfree(entry);
> > + }
> > +
> > + return err;
> > +}
> > +
> > +static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > + struct nlattr *est, struct tc_action **a,
> > + int ovr, int bind, bool rtnl_held,
> > + struct tcf_proto *tp, u32 flags,
> > + struct netlink_ext_ack *extack) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > + enum tk_offsets tk_offset = TK_OFFS_TAI;
> > + struct nlattr *tb[TCA_GATE_MAX + 1];
> > + struct tcf_chain *goto_ch = NULL;
> > + struct tcfg_gate_entry *next;
> > + struct tcf_gate_params *p;
> > + struct gate_action *gact;
> > + s32 clockid = CLOCK_TAI;
> > + struct tc_gate *parm;
> > + struct tcf_gate *g;
> > + int ret = 0, err;
> > + u64 basetime = 0;
> > + u32 gflags = 0;
> > + s32 prio = -1;
> > + ktime_t start;
> > + u32 index;
> > +
> > + if (!nla)
> > + return -EINVAL;
> > +
> > + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
> > + if (err < 0)
> > + return err;
> > +
> > + if (!tb[TCA_GATE_PARMS])
> > + return -EINVAL;
> > + parm = nla_data(tb[TCA_GATE_PARMS]);
> > + index = parm->index;
> > + err = tcf_idr_check_alloc(tn, &index, a, bind);
> > + if (err < 0)
> > + return err;
> > +
> > + if (err && bind)
> > + return 0;
> > +
> > + if (!err) {
> > + ret = tcf_idr_create_from_flags(tn, index, est, a,
> > + &act_gate_ops, bind, flags);
> > + if (ret) {
> > + tcf_idr_cleanup(tn, index);
> > + return ret;
> > + }
> > +
> > + ret = ACT_P_CREATED;
> > + } else if (!ovr) {
> > + tcf_idr_release(*a, bind);
> > + return -EEXIST;
> > + }
> > +
> > + if (tb[TCA_GATE_PRIORITY])
> > + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
> > +
> > + if (tb[TCA_GATE_BASE_TIME])
> > + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
> > +
> > + if (tb[TCA_GATE_FLAGS])
> > + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
> > +
> > + if (tb[TCA_GATE_CLOCKID]) {
> > + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> > + switch (clockid) {
> > + case CLOCK_REALTIME:
> > + tk_offset = TK_OFFS_REAL;
> > + break;
> > + case CLOCK_MONOTONIC:
> > + tk_offset = TK_OFFS_MAX;
> > + break;
> > + case CLOCK_BOOTTIME:
> > + tk_offset = TK_OFFS_BOOT;
> > + break;
> > + case CLOCK_TAI:
> > + tk_offset = TK_OFFS_TAI;
> > + break;
> > + default:
> > + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> > + goto release_idr;
> > + }
> > + }
> > +
> > + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> > + if (err < 0)
> > + goto release_idr;
> > +
> > + g = to_gate(*a);
> > +
> > + gact = kzalloc(sizeof(*gact), GFP_KERNEL);
> > + if (!gact) {
> > + err = -ENOMEM;
> > + goto put_chain;
> > + }
> > +
> > + p = get_gate_param(gact);
> > +
> > + INIT_LIST_HEAD(&p->entries);
> > + if (tb[TCA_GATE_ENTRY_LIST]) {
> > + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> > + if (err < 0)
> > + goto release_mem;
> > + }
> > +
> > + if (tb[TCA_GATE_CYCLE_TIME]) {
> > + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
> > + } else {
> > + struct tcfg_gate_entry *entry;
> > + ktime_t cycle = 0;
> > +
> > + list_for_each_entry(entry, &p->entries, list)
> > + cycle = ktime_add_ns(cycle, entry->interval);
> > + p->tcfg_cycletime = cycle;
> > + }
> > +
> > + if (tb[TCA_GATE_CYCLE_TIME_EXT])
> > + p->tcfg_cycletime_ext =
> > + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
> > +
> > + p->tcfg_priority = prio;
> > + p->tcfg_basetime = basetime;
> > + p->tcfg_clockid = clockid;
> > + p->tcfg_flags = gflags;
> > +
> > + gact->tk_offset = tk_offset;
> > + spin_lock_init(&gact->entry_lock);
> > + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
> > + gact->hitimer.function = gate_timer_func;
> > +
>
> One idea that just occurred to me: if you find a way to enable RX
> timestamping and can rely on all packets having a timestamp, the code can
> be simplified a lot. You wouldn't need any hrtimers, and deciding whether
> or not to drop a packet becomes a couple of mathematical operations.
> Seems worth a thought.

Thanks for the different ideas. The basic problem is that, in the action, we need to know whether "now" falls in a gate-open or a gate-close interval, and I still don't know a better way than an hrtimer to set that flag.

>
> The real question is whether requiring the driver to support at least
> software RX timestamping is excessive (it doesn't seem so to me).

I understand.

>
>
> Cheers,
> --
> Vinicius

Thanks a lot!

Br,
Po Liu

2020-04-24 17:45:23

by Vinicius Costa Gomes

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action

Po Liu <[email protected]> writes:

>>
>> One idea that just occurred to me: if you find a way to enable RX
>> timestamping and can rely on all packets having a timestamp, the code
>> can be simplified a lot. You wouldn't need any hrtimers, and deciding
>> whether or not to drop a packet becomes a couple of mathematical
>> operations. Seems worth a thought.
>
> Thanks for the different ideas. The basic problem is that, in the
> action, we need to know whether "now" falls in a gate-open or a
> gate-close interval, and I still don't know a better way than an hrtimer
> to set that flag.

That's the point: if you have the timestamp of when the packet arrived,
you can calculate whether the gate is open or closed at that point. You
don't need to know "now"; you work only in terms of "skb->tstamp"
(supposing that's where the timestamp is stored). In other words, it
doesn't matter when the packet arrives at the qdisc action, but when it
arrived at the controller, and the actions should be taken based on that
time.

>
>>
>> The real question is whether requiring the driver to support at least
>> software RX timestamping is excessive (it doesn't seem so to me).
>
> I understand.
>
>>
>>
>> Cheers,
>> --
>> Vinicius
>
> Thanks a lot!
>
> Br,
> Po Liu

--
Vinicius

2020-04-25 01:51:11

by Po Liu

[permalink] [raw]
Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control flow action




Br,
Po Liu

> -----Original Message-----
> From: Vinicius Costa Gomes <[email protected]>
> Sent: April 25, 2020 1:41
> To: Po Liu <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]
> Cc: Claudiu Manoil <[email protected]>; Vladimir Oltean
> <[email protected]>; Alexandru Marginean
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: RE: [EXT] Re: [v3,net-next 1/4] net: qos: introduce a gate control
> flow action
>
>
> Po Liu <[email protected]> writes:
>
> >>
> >> One idea that just occurred to me: if you find a way to enable RX
> >> timestamping and can rely on all packets having a timestamp, the code
> >> can be simplified a lot. You wouldn't need any hrtimers, and deciding
> >> whether or not to drop a packet becomes a couple of mathematical
> >> operations. Seems worth a thought.
> >
> > Thanks for the different ideas. The basic problem is that, in the
> > action, we need to know whether "now" falls in a gate-open or a
> > gate-close interval, and I still don't know a better way than an
> > hrtimer to set that flag.
>
> That's the point: if you have the timestamp of when the packet arrived,
> you can calculate whether the gate is open or closed at that point. You
> don't need to know "now"; you work only in terms of "skb->tstamp"
> (supposing that's where the timestamp is stored). In other words, it
> doesn't matter when the packet arrives at the qdisc action, but when it
> arrived at the controller, and the actions should be taken based on that
> time.

I get your idea. But that would rely on software timestamping being enabled in the driver, either for all streams or per filtered stream each time a tc flower filter is set. Also, how virtio-net would handle it is unknown.

>
> >
> >>
> >> The real question is whether requiring the driver to support at
> >> least software RX timestamping is excessive (it doesn't seem so to me).
> >
> > I understand.
> >
> >>
> >>
> >> Cheers,
> >> --
> >> Vinicius
> >
> > Thanks a lot!
> >
> > Br,
> > Po Liu
>
> --
> Vinicius

2020-04-28 03:57:33

by Po Liu

[permalink] [raw]
Subject: [v4,net-next 0/4] Introduce a flow gate control action and apply IEEE

Changes from V3:

0001:

Fixes and modifications according to Vlad Buslov:
- Remove the struct gate_action and move its parameters into struct
tcf_gate, aligned with the tc_action parameters. This avoids allocating
separate RCU-managed memory behind a pointer.
- Remove the spinlock entry_lock, which is no longer needed; use the
tcf_lock the framework provides instead.
- Provide lockdep protection for the status parameters in tcf_gate_act().
- Remove the warning on a cycletime of 0; return an error directly.

And:
- Remove Qci related description in the Kconfig for gate action.

0002:
- Fix rcu_read_lock protection range, suggested by Vlad Buslov.

0003:
- No changes.

0004:
- Fix bug where the gate maxoct wildcard condition was not included.
- Fix the past-time basetime calculation reported by Vladimir Oltean.

Changes from V2:
0001: No changes.
0002: No changes.
0003: No changes.
0004: Fix the vlan id filter parameter and add reject src mac
FF-FF-FF-FF-FF-FF filter in driver.

Changes from V1:
0000: Update description to make it clearer.
0001: Removed 'add update dropped stats' patch, will provide a pull
request as standalone patches.
0001: Update commit description to make it clearer, acked by Jiri Pirko.
0002: No changes
0003: Fix some code style issues, acked by Jiri Pirko.
0004: Fix enetc_psfp_enable/disable parameter type, reported by the
kernel test robot

iproute2 command patches:
Not attached to this patch series; a separate pull request will be
provided after the kernel accepts these patches.

Changes from RFC:
0000: Reduce to 5 patches and remove the 4 max frame size offload and
flow metering in the policing offload action, Only keep gate action
offloading implementation.
0001: No changes.
0002:
- Fix kfree leak, acked by Jakub Kicinski and Cong Wang
- License fix from Jakub Kicinski and Stephen Hemminger
- Update example in commit message, acked by Vinicius Costa Gomes
- Fix the rcu protection in tcf_gate_act(), acked by Vinicius

0003: No changes
0004: No changes
0005:
Acked by Vinicius Costa Gomes
- Use the kernel refcount library
- Update the stream gate check code position
- Rename the refcount variables to be clearer

iproute2 command patches:
0000: Update license expression and add gate id
0001: Add tc action gate man page

--------------------------------------------------------------------
These patches add stream gate action policing in IEEE802.1Qci (Per-Stream
Filtering and Policing) software support and hardware offload support in
tc flower, and implement the stream identify, stream filtering and
stream gate filtering action in the NXP ENETC ethernet driver.
Per-Stream Filtering and Policing (PSFP) specifies flow policing and
filtering for ingress flows, and has three main parts:
1. The stream filter instance table consists of an ordered list of
stream filters that determine the filtering and policing actions that
are to be applied to frames received on a specific stream. The main
elements are stream gate id, flow metering id and maximum SDU size.
2. The stream gate function sets up a gate list to control the ingress
traffic class open/close state. While the gate is in the open state the
flow can pass; frames are dropped while the gate is closed. The user
sets a basetime to tell the gate when to start running the entry list,
and the hardware then repeats the list periodically. There is no
comparable qdisc action providing this today.
3. Flow metering uses a two-rate, two-bucket, three-color marker to
police the frames. Flow metering instances follow the algorithm
specified in MEF 10.3. The closest existing qdisc action is the police
action.
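As a rough illustration of the MEF 10.3-style two-rate three-color marking mentioned in point 3, here is a simplified token-bucket sketch. The field names, byte-based rates and refill granularity are illustrative assumptions, not the ENETC or kernel implementation:

```c
#include <stdint.h>

enum color { GREEN, YELLOW, RED };

struct trtcm {
	uint64_t cir_bps, eir_bps; /* committed / excess rates (bytes/s) */
	uint64_t cbs, ebs;         /* committed / excess bucket sizes (bytes) */
	uint64_t tc, te;           /* current token counts */
	uint64_t last_ns;          /* last refill time */
};

/* Top up both buckets proportionally to elapsed time, capped at the
 * configured burst sizes.
 */
static void trtcm_refill(struct trtcm *m, uint64_t now_ns)
{
	uint64_t delta = now_ns - m->last_ns;

	m->tc += m->cir_bps * delta / 1000000000ull;
	if (m->tc > m->cbs)
		m->tc = m->cbs;
	m->te += m->eir_bps * delta / 1000000000ull;
	if (m->te > m->ebs)
		m->te = m->ebs;
	m->last_ns = now_ns;
}

/* Mark a frame: green if it fits the committed bucket, yellow if it
 * fits the excess bucket, red otherwise.
 */
static enum color trtcm_mark(struct trtcm *m, uint64_t now_ns, uint32_t len)
{
	trtcm_refill(m, now_ns);
	if (m->tc >= len) {
		m->tc -= len;
		return GREEN;
	}
	if (m->te >= len) {
		m->te -= len;
		return YELLOW;
	}
	return RED;
}
```

In a policing setup, red frames are typically dropped while yellow frames may be passed with lowered priority.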

The first patch introduces an ingress frame flow control gate action,
covering point 2. The tc gate action maintains the open/close state gate
list, allowing flows to pass when the gate is open. Each gate action
may police one or more qdisc filters. Once the start time arrives, the
driver repeats the gate list periodically. The user can assign a time
already in the past; the driver then calculates a new future start time
from the cycletime of the gate list.

The 0002 patch introduces the gate flow hardware offloading.

The 0003 patch adds support for controlling the on/off state of tc
flower offloading via ethtool.

The 0004 patch implements the stream identify, stream filtering and
stream gate filtering actions in the NXP ENETC ethernet driver. The tc
filter command provides filtering keys such as MAC address and VLAN id.
These keys are set in the stream identify instance entry. The stream
gate instance entry takes its parameters from the gate action. The
stream filter instance entry refers to the stream gate index and assigns
a stream handle value matching the stream identify instance.

Po Liu (4):
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSPF capability
net: enetc: add tc flower psfp offload driver

drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 159 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1098 +++++++++++++++++
include/net/flow_offload.h | 10 +
include/net/tc_act/tc_gate.h | 160 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 637 ++++++++++
net/sched/cls_api.c | 33 +
13 files changed, 2283 insertions(+), 1 deletion(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1

2020-04-28 03:57:44

by Po Liu

[permalink] [raw]
Subject: [v4,net-next 1/4] net: qos: introduce a gate control flow action

Introduce an ingress frame gate control flow action.
The tc gate action works like this:
Assume there is a gate that allows specified ingress frames to pass
during certain time slots and drops them during others. A tc filter
selects the ingress frames, and the tc gate action specifies during
which time slots those frames may be passed to the device and during
which they are dropped.
The tc gate action provides an entry list that tells how long the gate
stays open and how long it stays closed. The gate action also assigns
a start time telling when the entry list starts. The driver then repeats
the gate entry list cyclically.
For the software simulation, the gate action requires the user to assign
a time clock type.

Below is a configuration example from user space. The tc filter matches
a stream whose source ip address is 192.168.0.20, and the gate action
owns two time slots: one lasting 200ms with the gate open, letting
frames pass, and one lasting 100ms with the gate closed, dropping
frames. When the ingress frames exceed 8000000 bytes in total, the
excess frames are dropped within that 200000000ns time slot.

> tc qdisc add dev eth0 ingress

> tc filter add dev eth0 parent ffff: protocol ip \
flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry open 200000000 -1 8000000 \
sched-entry close 100000000 -1 -1

> tc chain del dev eth0 ingress chain 0

"sched-entry" follows the taprio naming style. The gate state is
"open"/"close", followed by the period in nanoseconds. The next item
is the internal priority value, indicating which ingress queue the
frames should be put in; "-1" means wildcard. The optional last value
specifies the maximum number of MSDU octets that are permitted to pass
the gate during the specified time interval.
If base-time is not set it defaults to 0; as a result the start time
will be ((N + 1) * cycletime), the nearest such time in the future.
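The per-entry max-octets accounting can be sketched as follows (the helper name is illustrative; the count-then-compare ordering mirrors what tcf_gate_act() does with current_entry_octets):

```c
#include <stdint.h>

/* Returns 1 if the frame may pass this gate entry, 0 if it busts the
 * entry's byte budget. A negative max_octets means wildcard (-1): no
 * byte limit for this time slot.
 */
static int gate_octets_pass(int32_t max_octets, uint32_t *used_octets,
			    uint32_t pkt_len)
{
	if (max_octets < 0)
		return 1; /* -1: unlimited */

	*used_octets += pkt_len; /* count first, then test the limit */
	if (*used_octets > (uint32_t)max_octets)
		return 0; /* exceeds the slot's budget: drop */
	return 1;
}
```

The counter resets to zero each time a new gate entry starts, so the budget applies per interval, not per cycle.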

The example below filters a stream whose destination mac address is
10:00:80:00:00:00 and ip protocol is ICMP, followed by the gate action.
The gate action runs with a single close time slot, which means the gate
always stays closed. The total cycle time is 200000000ns. The base-time
is calculated by:

1357000000000 + (N + 1) * cycletime

When the resulting value lies in the future, it becomes the start time.
The cycletime here is 200000000ns for this case.
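This base-time handling matches gate_get_start_time() in the patch; as a standalone sketch (error-code choice is illustrative):

```c
#include <stdint.h>

/* If basetime is still in the future, start there; otherwise pick the
 * first cycle boundary strictly after 'now': basetime + (N + 1) * cycle.
 */
static int gate_start_time(uint64_t basetime_ns, uint64_t now_ns,
			   uint64_t cycletime_ns, uint64_t *start_ns)
{
	uint64_t n;

	if (basetime_ns > now_ns) {
		*start_ns = basetime_ns;
		return 0;
	}

	if (!cycletime_ns)
		return -1; /* cycle time must be non-zero */

	n = (now_ns - basetime_ns) / cycletime_ns;
	*start_ns = basetime_ns + (n + 1) * cycletime_ns;
	return 0;
}
```

For the example above, with now 500ms past base-time 1357000000000 and a 200ms cycle, the schedule would start at 1357000000000 + 3 * 200000000.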

> tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry close 200000000 -1 -1 \
clockid CLOCK_TAI

Signed-off-by: Po Liu <[email protected]>
---
include/net/tc_act/tc_gate.h | 47 ++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 ++
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 638 ++++++++++++++++++++++++++++
6 files changed, 746 insertions(+)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
new file mode 100644
index 000000000000..330ad8b02495
--- /dev/null
+++ b/include/net/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* Copyright 2020 NXP */
+
+#ifndef __NET_TC_GATE_H
+#define __NET_TC_GATE_H
+
+#include <net/act_api.h>
+#include <linux/tc_act/tc_gate.h>
+
+struct tcfg_gate_entry {
+ int index;
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+ struct list_head list;
+};
+
+struct tcf_gate_params {
+ s32 tcfg_priority;
+ u64 tcfg_basetime;
+ u64 tcfg_cycletime;
+ u64 tcfg_cycletime_ext;
+ u32 tcfg_flags;
+ s32 tcfg_clockid;
+ size_t num_entries;
+ struct list_head entries;
+};
+
+#define GATE_ACT_GATE_OPEN BIT(0)
+#define GATE_ACT_PENDING BIT(1)
+
+struct tcf_gate {
+ struct tc_action common;
+ struct tcf_gate_params param;
+ u8 current_gate_status;
+ ktime_t current_close_time;
+ u32 current_entry_octets;
+ s32 current_max_octets;
+ struct tcfg_gate_entry *next_entry;
+ struct hrtimer hitimer;
+ enum tk_offsets tk_offset;
+};
+
+#define to_gate(a) ((struct tcf_gate *)a)
+
+#endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 9f06d29cab70..fc672b232437 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -134,6 +134,7 @@ enum tca_id {
TCA_ID_CTINFO,
TCA_ID_MPLS,
TCA_ID_CT,
+ TCA_ID_GATE,
/* other actions go here */
__TCA_ID_MAX = 255
};
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 000000000000..f214b3a6d44f
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index bfbefb7bff9d..2f20073f4f84 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -981,6 +981,18 @@ config NET_ACT_CT
To compile this code as a module, choose M here: the
module will be called act_ct.

+config NET_ACT_GATE
+ tristate "Frame gate entry list control tc action"
+ depends on NET_CLS_ACT
+ help
+ Say Y here to allow to control the ingress flow to be passed at
+ specific time slot and be dropped at other specific time slot by
+ the gate entry list.
+
+ If unsure, say N.
+ To compile this code as a module, choose M here: the
+ module will be called act_gate.
+
config NET_IFE_SKBMARK
tristate "Support to encoding decoding skb mark on IFE action"
depends on NET_ACT_IFE
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 31c367a6cd09..66bbf9a98f9e 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
obj-$(CONFIG_NET_ACT_CT) += act_ct.o
+obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
new file mode 100644
index 000000000000..916f6fe56692
--- /dev/null
+++ b/net/sched/act_gate.c
@@ -0,0 +1,638 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+/* Copyright 2020 NXP */
+
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/skbuff.h>
+#include <linux/rtnetlink.h>
+#include <linux/init.h>
+#include <linux/slab.h>
+#include <net/act_api.h>
+#include <net/netlink.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>
+
+static unsigned int gate_net_id;
+static struct tc_action_ops act_gate_ops;
+
+static ktime_t gate_get_time(struct tcf_gate *gact)
+{
+ ktime_t mono = ktime_get();
+
+ switch (gact->tk_offset) {
+ case TK_OFFS_MAX:
+ return mono;
+ default:
+ return ktime_mono_to_any(mono, gact->tk_offset);
+ }
+
+ return KTIME_MAX;
+}
+
+static int gate_get_start_time(struct tcf_gate *gact, ktime_t *start)
+{
+ struct tcf_gate_params *param = &gact->param;
+ ktime_t now, base, cycle;
+ u64 n;
+
+ base = ns_to_ktime(param->tcfg_basetime);
+ now = gate_get_time(gact);
+
+ if (ktime_after(base, now)) {
+ *start = base;
+ return 0;
+ }
+
+ cycle = param->tcfg_cycletime;
+
+ /* cycle time should not be zero */
+ if (!cycle)
+ return -EFAULT;
+
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ *start = ktime_add_ns(base, (n + 1) * cycle);
+ return 0;
+}
+
+static void gate_start_timer(struct tcf_gate *gact, ktime_t start)
+{
+ ktime_t expires;
+
+ expires = hrtimer_get_expires(&gact->hitimer);
+ if (expires == 0)
+ expires = KTIME_MAX;
+
+ start = min_t(ktime_t, start, expires);
+
+ hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
+}
+
+static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
+{
+ struct tcf_gate *gact = container_of(timer, struct tcf_gate,
+ hitimer);
+ struct tcf_gate_params *p = &gact->param;
+ struct tcfg_gate_entry *next;
+ ktime_t close_time, now;
+
+ spin_lock(&gact->tcf_lock);
+
+ next = gact->next_entry;
+
+ /* cycle start, clear pending bit, clear total octets */
+ gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
+ gact->current_entry_octets = 0;
+ gact->current_max_octets = next->maxoctets;
+
+ gact->current_close_time = ktime_add_ns(gact->current_close_time,
+ next->interval);
+
+ close_time = gact->current_close_time;
+
+ if (list_is_last(&next->list, &p->entries))
+ next = list_first_entry(&p->entries,
+ struct tcfg_gate_entry, list);
+ else
+ next = list_next_entry(next, list);
+
+ now = gate_get_time(gact);
+
+ if (ktime_after(now, close_time)) {
+ ktime_t cycle, base;
+ u64 n;
+
+ cycle = p->tcfg_cycletime;
+ base = ns_to_ktime(p->tcfg_basetime);
+ n = div64_u64(ktime_sub_ns(now, base), cycle);
+ close_time = ktime_add_ns(base, (n + 1) * cycle);
+ }
+
+ gact->next_entry = next;
+
+ hrtimer_set_expires(&gact->hitimer, close_time);
+
+ spin_unlock(&gact->tcf_lock);
+
+ return HRTIMER_RESTART;
+}
+
+static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
+ struct tcf_result *res)
+{
+ struct tcf_gate *gact = to_gate(a);
+
+ spin_lock_bh(&gact->tcf_lock);
+
+ tcf_lastuse_update(&gact->tcf_tm);
+ bstats_update(&gact->tcf_bstats, skb);
+
+ if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
+ spin_unlock_bh(&gact->tcf_lock);
+ return gact->tcf_action;
+ }
+
+ if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
+ goto drop;
+
+ if (gact->current_max_octets >= 0) {
+ gact->current_entry_octets += qdisc_pkt_len(skb);
+ if (gact->current_entry_octets > gact->current_max_octets) {
+ gact->tcf_qstats.overlimits++;
+ goto drop;
+ }
+ }
+
+ spin_unlock_bh(&gact->tcf_lock);
+
+ return gact->tcf_action;
+drop:
+ gact->tcf_qstats.drops++;
+ spin_unlock_bh(&gact->tcf_lock);
+
+ return TC_ACT_SHOT;
+}
+
+static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
+ [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
+ [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
+ [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
+};
+
+static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
+ [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
+ .type = NLA_EXACT_LEN },
+ [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
+ [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
+ [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
+ [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
+ [TCA_GATE_FLAGS] = { .type = NLA_U32 },
+ [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
+};
+
+static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
+ struct netlink_ext_ack *extack)
+{
+ u32 interval = 0;
+
+ entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (interval == 0) {
+ NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
+ return -EINVAL;
+ }
+
+ entry->interval = interval;
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
+ else
+ entry->ipv = -1;
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+ else
+ entry->maxoctets = -1;
+
+ return 0;
+}
+
+static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
+ int index, struct netlink_ext_ack *extack)
+{
+ struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
+ int err;
+
+ err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack, "Could not parse nested entry");
+ return -EINVAL;
+ }
+
+ entry->index = index;
+
+ return fill_gate_entry(tb, entry, extack);
+}
+
+static void release_entry_list(struct list_head *entries)
+{
+ struct tcfg_gate_entry *entry, *e;
+
+ list_for_each_entry_safe(entry, e, entries, list) {
+ list_del(&entry->list);
+ kfree(entry);
+ }
+}
+
+static int parse_gate_list(struct nlattr *list_attr,
+ struct tcf_gate_params *sched,
+ struct netlink_ext_ack *extack)
+{
+ struct tcfg_gate_entry *entry;
+ struct nlattr *n;
+ int err, rem;
+ int i = 0;
+
+ if (!list_attr)
+ return -EINVAL;
+
+ nla_for_each_nested(n, list_attr, rem) {
+ if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
+ NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
+ continue;
+ }
+
+ entry = kzalloc(sizeof(*entry), GFP_KERNEL);
+ if (!entry) {
+ NL_SET_ERR_MSG(extack, "Not enough memory for entry");
+ err = -ENOMEM;
+ goto release_list;
+ }
+
+ err = parse_gate_entry(n, entry, i, extack);
+ if (err < 0) {
+ kfree(entry);
+ goto release_list;
+ }
+
+ list_add_tail(&entry->list, &sched->entries);
+ i++;
+ }
+
+ sched->num_entries = i;
+
+ return i;
+
+release_list:
+ release_entry_list(&sched->entries);
+
+ return err;
+}
+
+static int tcf_gate_init(struct net *net, struct nlattr *nla,
+ struct nlattr *est, struct tc_action **a,
+ int ovr, int bind, bool rtnl_held,
+ struct tcf_proto *tp, u32 flags,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+ enum tk_offsets tk_offset = TK_OFFS_TAI;
+ struct nlattr *tb[TCA_GATE_MAX + 1];
+ struct tcf_chain *goto_ch = NULL;
+ struct tcf_gate_params *p;
+ s32 clockid = CLOCK_TAI;
+ struct tcf_gate *gact;
+ struct tc_gate *parm;
+ int ret = 0, err;
+ u64 basetime = 0;
+ u32 gflags = 0;
+ s32 prio = -1;
+ ktime_t start;
+ u32 index;
+
+ if (!nla)
+ return -EINVAL;
+
+ err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
+ if (err < 0)
+ return err;
+
+ if (!tb[TCA_GATE_PARMS])
+ return -EINVAL;
+
+ parm = nla_data(tb[TCA_GATE_PARMS]);
+ index = parm->index;
+
+ err = tcf_idr_check_alloc(tn, &index, a, bind);
+ if (err < 0)
+ return err;
+
+ if (err && bind)
+ return 0;
+
+ if (!err) {
+ ret = tcf_idr_create(tn, index, est, a,
+ &act_gate_ops, bind, false, 0);
+ if (ret) {
+ tcf_idr_cleanup(tn, index);
+ return ret;
+ }
+
+ ret = ACT_P_CREATED;
+ } else if (!ovr) {
+ tcf_idr_release(*a, bind);
+ return -EEXIST;
+ }
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (tb[TCA_GATE_BASE_TIME])
+ basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
+
+ if (tb[TCA_GATE_FLAGS])
+ gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
+
+ if (tb[TCA_GATE_CLOCKID]) {
+ clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
+ switch (clockid) {
+ case CLOCK_REALTIME:
+ tk_offset = TK_OFFS_REAL;
+ break;
+ case CLOCK_MONOTONIC:
+ tk_offset = TK_OFFS_MAX;
+ break;
+ case CLOCK_BOOTTIME:
+ tk_offset = TK_OFFS_BOOT;
+ break;
+ case CLOCK_TAI:
+ tk_offset = TK_OFFS_TAI;
+ break;
+ default:
+ NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+ goto release_idr;
+ }
+ }
+
+ err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
+ if (err < 0)
+ goto release_idr;
+
+ gact = to_gate(*a);
+
+ spin_lock_bh(&gact->tcf_lock);
+ p = &gact->param;
+
+ if (tb[TCA_GATE_CYCLE_TIME]) {
+ p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
+ if (!p->tcfg_cycletime_ext)
+ goto chain_put;
+ }
+
+ INIT_LIST_HEAD(&p->entries);
+ if (tb[TCA_GATE_ENTRY_LIST]) {
+ err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
+ if (err < 0)
+ goto chain_put;
+ }
+
+ if (!p->tcfg_cycletime) {
+ struct tcfg_gate_entry *entry;
+ ktime_t cycle = 0;
+
+ list_for_each_entry(entry, &p->entries, list)
+ cycle = ktime_add_ns(cycle, entry->interval);
+ p->tcfg_cycletime = cycle;
+ }
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ p->tcfg_cycletime_ext =
+ nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ p->tcfg_priority = prio;
+ p->tcfg_basetime = basetime;
+ p->tcfg_clockid = clockid;
+ p->tcfg_flags = gflags;
+
+ gact->tk_offset = tk_offset;
+ hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
+ gact->hitimer.function = gate_timer_func;
+
+ err = gate_get_start_time(gact, &start);
+ if (err < 0) {
+ NL_SET_ERR_MSG(extack,
+ "Internal error: failed get start time");
+ release_entry_list(&p->entries);
+ goto chain_put;
+ }
+
+ gact->current_close_time = start;
+ gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING;
+
+ gact->next_entry = list_first_entry(&p->entries,
+ struct tcfg_gate_entry, list);
+
+ goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
+
+ gate_start_timer(gact, start);
+
+ spin_unlock_bh(&gact->tcf_lock);
+
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+
+ if (ret == ACT_P_CREATED)
+ tcf_idr_insert(tn, *a);
+ return ret;
+
+chain_put:
+ spin_unlock_bh(&gact->tcf_lock);
+
+ if (goto_ch)
+ tcf_chain_put_by_act(goto_ch);
+release_idr:
+ tcf_idr_release(*a, bind);
+ return err;
+}
+
+static void tcf_gate_cleanup(struct tc_action *a)
+{
+ struct tcf_gate *gact = to_gate(a);
+ struct tcf_gate_params *p;
+
+ spin_lock_bh(&gact->tcf_lock);
+
+ hrtimer_cancel(&gact->hitimer);
+
+ p = &gact->param;
+ release_entry_list(&p->entries);
+
+ spin_unlock_bh(&gact->tcf_lock);
+}
+
+static int dumping_entry(struct sk_buff *skb,
+ struct tcfg_gate_entry *entry)
+{
+ struct nlattr *item;
+
+ item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
+ if (!item)
+ return -ENOSPC;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
+ goto nla_put_failure;
+
+ if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
+ goto nla_put_failure;
+
+ return nla_nest_end(skb, item);
+
+nla_put_failure:
+ nla_nest_cancel(skb, item);
+ return -1;
+}
+
+static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
+ int bind, int ref)
+{
+ unsigned char *b = skb_tail_pointer(skb);
+ struct tcf_gate *gact = to_gate(a);
+ struct tc_gate opt = {
+ .index = gact->tcf_index,
+ .refcnt = refcount_read(&gact->tcf_refcnt) - ref,
+ .bindcnt = atomic_read(&gact->tcf_bindcnt) - bind,
+ };
+ struct tcfg_gate_entry *entry;
+ struct tcf_gate_params *p;
+ struct nlattr *entry_list;
+ struct tcf_t t;
+
+ spin_lock_bh(&gact->tcf_lock);
+ opt.action = gact->tcf_action;
+
+ p = &gact->param;
+
+ if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
+ p->tcfg_basetime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
+ p->tcfg_cycletime, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
+ p->tcfg_cycletime_ext, TCA_GATE_PAD))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
+ goto nla_put_failure;
+
+ if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
+ goto nla_put_failure;
+
+ if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
+ goto nla_put_failure;
+
+ entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
+ if (!entry_list)
+ goto nla_put_failure;
+
+ list_for_each_entry(entry, &p->entries, list) {
+ if (dumping_entry(skb, entry) < 0)
+ goto nla_put_failure;
+ }
+
+ nla_nest_end(skb, entry_list);
+
+ tcf_tm_dump(&t, &gact->tcf_tm);
+ if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
+ goto nla_put_failure;
+ spin_unlock_bh(&gact->tcf_lock);
+
+ return skb->len;
+
+nla_put_failure:
+ spin_unlock_bh(&gact->tcf_lock);
+ nlmsg_trim(skb, b);
+ return -1;
+}
+
+static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
+ struct netlink_callback *cb, int type,
+ const struct tc_action_ops *ops,
+ struct netlink_ext_ack *extack)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_generic_walker(tn, skb, cb, type, ops, extack);
+}
+
+static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+ u64 lastuse, bool hw)
+{
+ struct tcf_gate *gact = to_gate(a);
+ struct tcf_t *tm = &gact->tcf_tm;
+
+ tcf_action_update_stats(a, bytes, packets, false, hw);
+ tm->lastuse = max_t(u64, tm->lastuse, lastuse);
+}
+
+static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tcf_idr_search(tn, a, index);
+}
+
+static size_t tcf_gate_get_fill_size(const struct tc_action *act)
+{
+ return nla_total_size(sizeof(struct tc_gate));
+}
+
+static struct tc_action_ops act_gate_ops = {
+ .kind = "gate",
+ .id = TCA_ID_GATE,
+ .owner = THIS_MODULE,
+ .act = tcf_gate_act,
+ .dump = tcf_gate_dump,
+ .init = tcf_gate_init,
+ .cleanup = tcf_gate_cleanup,
+ .walk = tcf_gate_walker,
+ .stats_update = tcf_gate_stats_update,
+ .get_fill_size = tcf_gate_get_fill_size,
+ .lookup = tcf_gate_search,
+ .size = sizeof(struct tcf_gate),
+};
+
+static __net_init int gate_init_net(struct net *net)
+{
+ struct tc_action_net *tn = net_generic(net, gate_net_id);
+
+ return tc_action_net_init(net, tn, &act_gate_ops);
+}
+
+static void __net_exit gate_exit_net(struct list_head *net_list)
+{
+ tc_action_net_exit(net_list, gate_net_id);
+}
+
+static struct pernet_operations gate_net_ops = {
+ .init = gate_init_net,
+ .exit_batch = gate_exit_net,
+ .id = &gate_net_id,
+ .size = sizeof(struct tc_action_net),
+};
+
+static int __init gate_init_module(void)
+{
+ return tcf_register_action(&act_gate_ops, &gate_net_ops);
+}
+
+static void __exit gate_cleanup_module(void)
+{
+ tcf_unregister_action(&act_gate_ops, &gate_net_ops);
+}
+
+module_init(gate_init_module);
+module_exit(gate_cleanup_module);
+MODULE_LICENSE("GPL v2");
--
2.17.1

2020-04-28 03:58:03

by Po Liu

[permalink] [raw]
Subject: [v4,net-next 2/4] net: schedule: add action gate offloading

Add the gate action to the flow action entry. Add the gate parameters to
the tc_setup_flow_action() queueing to the entries of flow_action_entry
array provide to the driver.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 ++++
include/net/tc_act/tc_gate.h | 113 +++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++
3 files changed, 156 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 3619c6acf60f..94a30fe02e6d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -255,6 +256,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index 330ad8b02495..9e698c7d64cd 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -44,4 +51,110 @@ struct tcf_gate {

#define to_gate(a) ((struct tcf_gate *)a)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ rcu_read_lock();
+ tcfg_prio = to_gate(a)->param.tcfg_priority;
+ rcu_read_unlock();
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ rcu_read_lock();
+ tcfg_basetime = to_gate(a)->param.tcfg_basetime;
+ rcu_read_unlock();
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ rcu_read_lock();
+ tcfg_cycletime = to_gate(a)->param.tcfg_cycletime;
+ rcu_read_unlock();
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ rcu_read_lock();
+ tcfg_cycletimeext = to_gate(a)->param.tcfg_cycletime_ext;
+ rcu_read_unlock();
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ rcu_read_lock();
+ num_entries = to_gate(a)->param.num_entries;
+ rcu_read_unlock();
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ rcu_read_lock();
+
+ p = &to_gate(a)->param;
+ num_entries = p->num_entries;
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries) {
+ rcu_read_unlock();
+ return NULL;
+ }
+
+ oe = kzalloc(sizeof(*oe) * num_entries, GFP_ATOMIC);
+ if (!oe) {
+ rcu_read_unlock();
+ return NULL;
+ }
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ rcu_read_unlock();
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 11b683c45c28..7e85c91d0752 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,6 +39,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3526,6 +3527,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3672,6 +3694,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_priority(act)) {
entry->id = FLOW_ACTION_PRIORITY;
entry->priority = tcf_skbedit_priority(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-04-28 04:00:05

by Po Liu

[permalink] [raw]
Subject: [v4,net-next 3/4] net: enetc: add hw tc offload features for PSFP capability

This patch lets ethtool enable/disable the tc flower offload features.
The ENETC hardware supports PSFP (Per-Stream Filtering and Policing).
When the tc hw offload feature is enabled, the driver enables the IEEE
802.1Qci function: it only sets the feature's enable bit in the register
and reads back the capability limits of each sub-feature; it does not
yet program any stream identification, stream filter or stream gate
entries.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 48 +++++++++++++++++++
.../net/ethernet/freescale/enetc/enetc_hw.h | 17 +++++++
.../net/ethernet/freescale/enetc/enetc_pf.c | 8 ++++
4 files changed, 96 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index ccf2611f4a20..04aac7cbb506 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -756,6 +756,9 @@ void enetc_get_si_caps(struct enetc_si *si)

if (val & ENETC_SIPCAPR0_QBV)
si->hw_features |= ENETC_SI_F_QBV;
+
+ if (val & ENETC_SIPCAPR0_PSFP)
+ si->hw_features |= ENETC_SI_F_PSFP;
}

static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
@@ -1567,6 +1570,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
return 0;
}

+static int enetc_set_psfp(struct net_device *ndev, int en)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ if (en) {
+ priv->active_offloads |= ENETC_F_QCI;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&priv->si->hw);
+ } else {
+ priv->active_offloads &= ~ENETC_F_QCI;
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+ enetc_psfp_disable(&priv->si->hw);
+ }
+
+ return 0;
+}
+
int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1575,6 +1595,9 @@ int enetc_set_features(struct net_device *ndev,
if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

+ if (changed & NETIF_F_HW_TC)
+ enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 56c43f35b633..2cfe877c3778 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -151,6 +151,7 @@ enum enetc_errata {
};

#define ENETC_SI_F_QBV BIT(0)
+#define ENETC_SI_F_PSFP BIT(1)

/* PCI IEP device data */
struct enetc_si {
@@ -203,12 +204,20 @@ struct enetc_cls_rule {
};

#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */
+struct psfp_cap {
+ u32 max_streamid;
+ u32 max_psfp_filter;
+ u32 max_psfp_gate;
+ u32 max_psfp_gatelist;
+ u32 max_psfp_meter;
+};

/* TODO: more hardware offloads */
enum enetc_active_offloads {
ENETC_F_RX_TSTAMP = BIT(0),
ENETC_F_TX_TSTAMP = BIT(1),
ENETC_F_QBV = BIT(2),
+ ENETC_F_QCI = BIT(3),
};

struct enetc_ndev_priv {
@@ -231,6 +240,8 @@ struct enetc_ndev_priv {

struct enetc_cls_rule *cls_rules;

+ struct psfp_cap psfp_cap;
+
struct device_node *phy_node;
phy_interface_t if_mode;
};
@@ -289,9 +300,46 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+
+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
+{
+ u32 reg;
+
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
+ /* Port stream filter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
+ /* Port stream gate capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
+ /* Port flow meter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
+}
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
+ ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
+ ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
+ ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
+ ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+}
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_get_max_cap(p) \
+ memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
+
+#define enetc_psfp_enable(hw) (void)0
+#define enetc_psfp_disable(hw) (void)0
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 2a6523136947..587974862f48 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -19,6 +19,7 @@
#define ENETC_SICTR1 0x1c
#define ENETC_SIPCAPR0 0x20
#define ENETC_SIPCAPR0_QBV BIT(4)
+#define ENETC_SIPCAPR0_PSFP BIT(9)
#define ENETC_SIPCAPR0_RSS BIT(8)
#define ENETC_SIPCAPR1 0x24
#define ENETC_SITGTGR 0x30
@@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11))
#define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1))
#define ENETC_PM0_IFM_XGMII BIT(12)
+#define ENETC_PSIDCAPR 0x1b08
+#define ENETC_PSIDCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSFCAPR 0x1b18
+#define ENETC_PSFCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSGCAPR 0x1b28
+#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16)
+#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0)
+#define ENETC_PFMCAPR 0x1b38
+#define ENETC_PFMCAPR_MSK GENMASK(15, 0)

/* MAC counters */
#define ENETC_PM0_REOCT 0x8100
@@ -621,3 +631,10 @@ struct enetc_cbd {
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
#define ENETC_TSDE BIT(31)
+
+/* PSFP setting */
+#define ENETC_PPSFPMR 0x11b00
+#define ENETC_PPSFPMR_PSFPEN BIT(0)
+#define ENETC_PPSFPMR_VS BIT(1)
+#define ENETC_PPSFPMR_PVC BIT(2)
+#define ENETC_PPSFPMR_PVZC BIT(3)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index de1ad4975074..cef9fbfdb056 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -727,6 +727,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

+ if (si->hw_features & ENETC_SI_F_PSFP) {
+ priv->active_offloads |= ENETC_F_QCI;
+ ndev->features |= NETIF_F_HW_TC;
+ ndev->hw_features |= NETIF_F_HW_TC;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&si->hw);
+ }
+
/* pick up primary MAC address from SI */
enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
}
--
2.17.1

2020-04-28 04:01:30

by Po Liu

[permalink] [raw]
Subject: [v4,net-next 4/4] net: enetc: add tc flower psfp offload driver

This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
function. Four main function blocks implement flow policing and
filtering for ingress flows with IEEE 802.1Qci: stream identification
(defined in IEEE P802.1CB but required by 802.1Qci), stream filtering,
stream gating and flow metering. Each block contains multiple entries,
addressed by index, which hold the configured parameters.

A frame is first matched by the stream identification block, then flows
into the stream filter block via the stream handle shared between the
two. From the stream filter entry it is directed to the stream gate
assigned by that entry, policed by the gate and optionally limited by
the max SDU configured in the filter entry. Finally it is policed by the
flow metering block, whose index is also chosen in the filter entry. An
entry of a lower block may therefore be linked from many upper entries:
assigning the same index lets multiple streams share the same stream
filter, stream gate or flow meter.

Each stream, identified by source/destination MAC address and optionally
a VLAN ID, is treated as one flow chain, identified by the chain_index
that already exists in the tc filter concept. The driver maintains this
chain together with the gate entries. A stream filter entry is created
from the gate index, the (optional) flow meter entry id and a priority
value. The offload only transfers the gate action and the flow filtering
parameters; the driver creates (or looks up an existing entry with the
same gate id, flow meter id and priority) one stream filter entry and
programs it into the hardware, so stream filtering does not need to be
transferred through the action offload. This architecture mirrors the
relationship between tc filters and actions: tc filters maintain a list
per flow keyed on match keys, and actions are maintained in the action
list.

Below is an example command sequence using tc:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry OPEN 200000000 1 8000000 \
sched-entry CLOSE 100000000 -1 -1

These commands attach the dst_mac 10:00:80:00:00:00 match to chain 11 of
the stream identification module and assign gate index 10 in the stream
gate module. The gate list keeps the OPEN state for 200 ms, passing
frames to ingress queue 1 with a budget of at most 8 Mbytes of octets,
then stays CLOSED for 100 ms.

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 46 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
.../net/ethernet/freescale/enetc/enetc_qos.c | 1098 +++++++++++++++++
5 files changed, 1300 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 04aac7cbb506..298c55786fd9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1521,6 +1521,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
return enetc_setup_tc_cbs(ndev, type_data);
case TC_SETUP_QDISC_ETF:
return enetc_setup_tc_txtime(ndev, type_data);
+ case TC_SETUP_BLOCK:
+ return enetc_setup_tc_psfp(ndev, type_data);
default:
return -EOPNOTSUPP;
}
@@ -1573,17 +1575,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static int enetc_set_psfp(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ int err;

if (en) {
+ err = enetc_psfp_enable(priv);
+ if (err)
+ return err;
+
priv->active_offloads |= ENETC_F_QCI;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&priv->si->hw);
- } else {
- priv->active_offloads &= ~ENETC_F_QCI;
- memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
- enetc_psfp_disable(&priv->si->hw);
+ return 0;
}

+ err = enetc_psfp_disable(priv);
+ if (err)
+ return err;
+
+ priv->active_offloads &= ~ENETC_F_QCI;
+
return 0;
}

@@ -1591,14 +1599,15 @@ int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
netdev_features_t changed = ndev->features ^ features;
+ int err = 0;

if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

if (changed & NETIF_F_HW_TC)
- enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+ err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));

- return 0;
+ return err;
}

#ifdef CONFIG_FSL_ENETC_PTP_CLOCK
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 2cfe877c3778..b705464f6882 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -300,6 +300,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv);
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
+int enetc_psfp_init(struct enetc_ndev_priv *priv);
+int enetc_psfp_clean(struct enetc_ndev_priv *priv);

static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
@@ -319,27 +324,60 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ enetc_get_max_cap(priv);
+
+ err = enetc_psfp_init(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+
+ return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ err = enetc_psfp_clean(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+
+ return 0;
}
+
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_block_cb NULL
+
#define enetc_get_max_cap(p) \
memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))

-#define enetc_psfp_enable(hw) (void)0
-#define enetc_psfp_disable(hw) (void)0
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
+
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 587974862f48..6314051bc6c1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -567,6 +567,9 @@ enum bdcr_cmd_class {
BDCR_CMD_RFS,
BDCR_CMD_PORT_GCL,
BDCR_CMD_RECV_CLASSIFIER,
+ BDCR_CMD_STREAM_IDENTIFY,
+ BDCR_CMD_STREAM_FILTER,
+ BDCR_CMD_STREAM_GCL,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -598,13 +601,152 @@ struct tgs_gcl_data {
struct gce entry[];
};

+/* class 7, command 0, Stream Identity Entry Configuration */
+struct streamid_conf {
+ __le32 stream_handle; /* init gate value */
+ __le32 iports;
+ u8 id_type;
+ u8 oui[3];
+ u8 res[3];
+ u8 en;
+};
+
+#define ENETC_CBDR_SID_VID_MASK 0xfff
+#define ENETC_CBDR_SID_VIDM BIT(12)
+#define ENETC_CBDR_SID_TG_MASK 0xc000
+/* The streamid_conf command's address field points to this data buffer */
+struct streamid_data {
+ union {
+ u8 dmac[6];
+ u8 smac[6];
+ };
+ u16 vid_vidm_tg;
+};
+
+#define ENETC_CBDR_SFI_PRI_MASK 0x7
+#define ENETC_CBDR_SFI_PRIM BIT(3)
+#define ENETC_CBDR_SFI_BLOV BIT(4)
+#define ENETC_CBDR_SFI_BLEN BIT(5)
+#define ENETC_CBDR_SFI_MSDUEN BIT(6)
+#define ENETC_CBDR_SFI_FMITEN BIT(7)
+#define ENETC_CBDR_SFI_ENABLE BIT(7)
+/* class 8, command 0, Stream Filter Instance, Short Format */
+struct sfi_conf {
+ __le32 stream_handle;
+ u8 multi;
+ u8 res[2];
+ u8 sthm;
+ /* Max Service Data Unit or Flow Meter Instance Table index.
+ * Depending on the value of FLT this represents either Max
+ * Service Data Unit (max frame size) allowed by the filter
+ * entry or is an index into the Flow Meter Instance table
+ * index identifying the policer which will be used to police
+ * it.
+ */
+ __le16 fm_inst_table_index;
+ __le16 msdu;
+ __le16 sg_inst_table_index;
+ u8 res1[2];
+ __le32 input_ports;
+ u8 res2[3];
+ u8 en;
+};
+
+/* class 8, command 2, Stream Filter Instance status query, short format.
+ * No structure definition is needed for the command itself; below is the
+ * Stream Filter Instance Query Statistics response data.
+ */
+struct sfi_counter_data {
+ u32 matchl;
+ u32 matchh;
+ u32 msdu_dropl;
+ u32 msdu_droph;
+ u32 stream_gate_dropl;
+ u32 stream_gate_droph;
+ u32 flow_meter_dropl;
+ u32 flow_meter_droph;
+};
+
+#define ENETC_CBDR_SGI_OIPV_MASK 0x7
+#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_CGTST BIT(6)
+#define ENETC_CBDR_SGI_OGTST BIT(7)
+#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
+#define ENETC_CBDR_SGI_CFG_PND BIT(2)
+#define ENETC_CBDR_SGI_OEX BIT(4)
+#define ENETC_CBDR_SGI_OEXEN BIT(5)
+#define ENETC_CBDR_SGI_IRX BIT(6)
+#define ENETC_CBDR_SGI_IRXEN BIT(7)
+#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
+#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
+#define ENETC_CBDR_SGI_EN BIT(7)
+/* class 9, command 0, Stream Gate Instance Table, Short Format
+ * class 9, command 2, Stream Gate Instance Table entry query write back
+ * Short Format
+ */
+struct sgi_table {
+ u8 res[8];
+ u8 oipv;
+ u8 res0[2];
+ u8 ocgtst;
+ u8 res1[7];
+ u8 gset;
+ u8 oacl_len;
+ u8 res2[2];
+ u8 en;
+};
+
+#define ENETC_CBDR_SGI_AIPV_MASK 0x7
+#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_AGTST BIT(7)
+
+/* class 9, command 1, Stream Gate Control List, Long Format */
+struct sgcl_conf {
+ u8 aipv;
+ u8 res[2];
+ u8 agtst;
+ u8 res1[4];
+ union {
+ struct {
+ u8 res2[4];
+ u8 acl_len;
+ u8 res3[3];
+ };
+ u8 cct[8]; /* Config change time */
+ };
+};
+
+#define ENETC_CBDR_SGL_IOMEN BIT(0)
+#define ENETC_CBDR_SGL_IPVEN BIT(3)
+#define ENETC_CBDR_SGL_GTST BIT(4)
+#define ENETC_CBDR_SGL_IPV_MASK 0xe
+/* Stream Gate Control List Entry */
+struct sgce {
+ u32 interval;
+ u8 msdu[3];
+ u8 multi;
+};
+
+/* Stream Gate Control List, class 9, command 1, data buffer */
+struct sgcl_data {
+ u32 btl;
+ u32 bth;
+ u32 ct;
+ u32 cte;
+ struct sgce sgcl[0];
+};
+
struct enetc_cbd {
union{
+ struct sfi_conf sfi_conf;
+ struct sgi_table sgi_table;
struct {
__le32 addr[2];
union {
__le32 opt[4];
struct tgs_gcl_conf gcl_conf;
+ struct streamid_conf sid_set;
+ struct sgcl_conf sgcl_conf;
};
}; /* Long format */
__le32 data[6];
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index cef9fbfdb056..824d211ec00f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -727,12 +727,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

- if (si->hw_features & ENETC_SI_F_PSFP) {
+ if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
priv->active_offloads |= ENETC_F_QCI;
ndev->features |= NETIF_F_HW_TC;
ndev->hw_features |= NETIF_F_HW_TC;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&si->hw);
}

/* pick up primary MAC address from SI */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 0c6bf3a55a9a..30fca29b2739 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -5,6 +5,9 @@

#include <net/pkt_sched.h>
#include <linux/math64.h>
+#include <linux/refcount.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>

static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
@@ -331,3 +334,1098 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)

return 0;
}
+
+enum streamid_type {
+ STREAMID_TYPE_RESERVED = 0,
+ STREAMID_TYPE_NULL,
+ STREAMID_TYPE_SMAC,
+};
+
+enum streamid_vlan_tagged {
+ STREAMID_VLAN_RESERVED = 0,
+ STREAMID_VLAN_TAGGED,
+ STREAMID_VLAN_UNTAGGED,
+ STREAMID_VLAN_ALL,
+};
+
+#define ENETC_PSFP_WILDCARD -1
+#define HANDLE_OFFSET 100
+
+enum forward_type {
+ FILTER_ACTION_TYPE_PSFP = BIT(0),
+ FILTER_ACTION_TYPE_ACL = BIT(1),
+ FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
+};
+
+/* Limits the allowed output types for a given set of input actions */
+struct actions_fwd {
+ u64 actions;
+ u64 keys; /* include the must needed keys */
+ enum forward_type output;
+};
+
+struct psfp_streamfilter_counters {
+ u64 matching_frames_count;
+ u64 passing_frames_count;
+ u64 not_passing_frames_count;
+ u64 passing_sdu_count;
+ u64 not_passing_sdu_count;
+ u64 red_frames_count;
+};
+
+struct enetc_streamid {
+ u32 index;
+ union {
+ u8 src_mac[6];
+ u8 dst_mac[6];
+ };
+ u8 filtertype;
+ u16 vid;
+ u8 tagged;
+ s32 handle;
+};
+
+struct enetc_psfp_filter {
+ u32 index;
+ s32 handle;
+ s8 prio;
+ u32 gate_id;
+ s32 meter_id;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+struct enetc_psfp_gate {
+ u32 index;
+ s8 init_ipv;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimext;
+ u32 num_entries;
+ refcount_t refcount;
+ struct hlist_node node;
+ struct action_gate_entry entries[0];
+};
+
+struct enetc_stream_filter {
+ struct enetc_streamid sid;
+ u32 sfi_index;
+ u32 sgi_index;
+ struct flow_stats stats;
+ struct hlist_node node;
+};
+
+struct enetc_psfp {
+ unsigned long dev_bitmap;
+ unsigned long *psfp_sfi_bitmap;
+ struct hlist_head stream_list;
+ struct hlist_head psfp_filter_list;
+ struct hlist_head psfp_gate_list;
+ spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
+};
+
+struct actions_fwd enetc_act_fwd[] = {
+ {
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
+ /* example for ACL actions */
+ {
+ BIT(FLOW_ACTION_DROP),
+ 0,
+ FILTER_ACTION_TYPE_ACL
+ }
+};
+
+static struct enetc_psfp epsfp = {
+ .psfp_sfi_bitmap = NULL,
+};
+
+static LIST_HEAD(enetc_block_cb_list);
+
+static inline int enetc_get_port(struct enetc_ndev_priv *priv)
+{
+ return priv->si->pdev->devfn & 0x7;
+}
+
+/* Stream Identity Entry Set Descriptor */
+static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct streamid_data *si_data;
+ struct streamid_conf *si_conf;
+ u16 data_size;
+ dma_addr_t dma;
+ int err;
+
+ if (sid->index >= priv->psfp_cap.max_streamid)
+ return -EINVAL;
+
+ if (sid->filtertype != STREAMID_TYPE_NULL &&
+ sid->filtertype != STREAMID_TYPE_SMAC)
+ return -EOPNOTSUPP;
+
+ /* Disable operation before enable */
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct streamid_data);
+ si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!si_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev, si_data,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(si_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+ memset(si_data->dmac, 0xff, ETH_ALEN);
+ si_data->vid_vidm_tg =
+ cpu_to_le16(ENETC_CBDR_SID_VID_MASK
+ + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
+
+ si_conf = &cbd.sid_set;
+ /* Only one port supported for one entry, set itself */
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = 1;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err) {
+ kfree(si_data);
+ return -EINVAL;
+ }
+
+ if (!enable) {
+ kfree(si_data);
+ return 0;
+ }
+
+ /* Enable the entry again, in case the space was flushed by hardware */
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ si_conf->en = 0x80;
+ si_conf->stream_handle = cpu_to_le32(sid->handle);
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = sid->filtertype;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ memset(si_data, 0, data_size);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ /* VIDM default to be 1.
+ * VID Match. If set (b1) then the VID must match, otherwise
+ * any VID is considered a match. VIDM setting is only used
+ * when TG is set to b01.
+ */
+ if (si_conf->id_type == STREAMID_TYPE_NULL) {
+ ether_addr_copy(si_data->dmac, sid->dst_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
+ ether_addr_copy(si_data->smac, sid->src_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ kfree(si_data);
+
+ return err;
+}
+
+/* Stream Filter Instance Set Descriptor */
+static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_filter *sfi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct sfi_conf *sfi_config;
+
+ cbd.index = cpu_to_le16(sfi->index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0x80;
+ cbd.length = cpu_to_le16(1);
+
+ sfi_config = &cbd.sfi_conf;
+ if (!enable)
+ goto exit;
+
+ sfi_config->en = 0x80;
+
+ if (sfi->handle >= 0) {
+ sfi_config->stream_handle =
+ cpu_to_le32(sfi->handle);
+ sfi_config->sthm |= 0x80;
+ }
+
+ sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
+ sfi_config->input_ports = 1 << enetc_get_port(priv);
+
+ /* The priority value which may be matched against the
+ * frame's priority value to determine a match for this entry.
+ */
+ if (sfi->prio >= 0)
+ sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
+
+ /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
+ * field as being either an MSDU value or an index into the Flow
+ * Meter Instance table.
+ * TODO: no limit max sdu
+ */
+
+ if (sfi->meter_id >= 0) {
+ sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
+ sfi_config->multi |= 0x80;
+ }
+
+exit:
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
+static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
+ u32 index,
+ struct psfp_streamfilter_counters *cnt)
+{
+ struct enetc_cbd cbd = { .cmd = 2 };
+ struct sfi_counter_data *data_buf;
+ dma_addr_t dma;
+ u16 data_size;
+ int err;
+
+ cbd.index = cpu_to_le16((u16)index);
+ cbd.cmd = 2;
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct sfi_counter_data);
+ data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ dma = dma_map_single(&priv->si->pdev->dev, data_buf,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ err = -ENOMEM;
+ goto exit;
+ }
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ goto exit;
+
+ cnt->matching_frames_count =
+ ((u64)le32_to_cpu(data_buf->matchh) << 32)
+ + data_buf->matchl;
+
+ cnt->not_passing_sdu_count =
+ ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
+ + data_buf->msdu_dropl;
+
+ cnt->passing_sdu_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count;
+
+ cnt->not_passing_frames_count =
+ ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
+ + le32_to_cpu(data_buf->stream_gate_dropl);
+
+ cnt->passing_frames_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count
+ - cnt->not_passing_frames_count;
+
+ cnt->red_frames_count =
+ ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
+ + le32_to_cpu(data_buf->flow_meter_dropl);
+
+exit:
+ kfree(data_buf);
+ return err;
+}
+
+static u64 get_ptp_now(struct enetc_hw *hw)
+{
+ u64 now_lo, now_hi, now;
+
+ now_lo = enetc_rd(hw, ENETC_SICTR0);
+ now_hi = enetc_rd(hw, ENETC_SICTR1);
+ now = now_lo | now_hi << 32;
+
+ return now;
+}
+
+static int get_start_ns(u64 now, u64 cycle, u64 *start)
+{
+ u64 n;
+
+ if (!cycle)
+ return -EFAULT;
+
+ n = div64_u64(now, cycle);
+
+ *start = (n + 1) * cycle;
+
+ return 0;
+}
+
+/* Stream Gate Instance Set Descriptor */
+static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_gate *sgi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct sgi_table *sgi_config;
+ struct sgcl_conf *sgcl_config;
+ struct sgcl_data *sgcl_data;
+ struct sgce *sgce;
+ dma_addr_t dma;
+ u16 data_size;
+ int err, i;
+ u64 now;
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0x80;
+
+ /* disable */
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ if (!sgi->num_entries)
+ return 0;
+
+ if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
+ !sgi->cycletime)
+ return -EINVAL;
+
+ /* enable */
+ sgi_config = &cbd.sgi_table;
+
+ /* Keep open before gate list start */
+ sgi_config->ocgtst = 0x80;
+
+ sgi_config->oipv = (sgi->init_ipv < 0) ?
+ 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
+
+ sgi_config->en = 0x80;
+
+ /* Basic config */
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 1;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0;
+
+ sgcl_config = &cbd.sgcl_conf;
+
+ sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
+
+ data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
+
+ sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!sgcl_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev,
+ sgcl_data, data_size,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(sgcl_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ sgce = &sgcl_data->sgcl[0];
+
+ sgcl_config->agtst = 0x80;
+
+ sgcl_data->ct = cpu_to_le32(sgi->cycletime);
+ sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
+
+ if (sgi->init_ipv >= 0)
+ sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
+
+ for (i = 0; i < sgi->num_entries; i++) {
+ struct action_gate_entry *from = &sgi->entries[i];
+ struct sgce *to = &sgce[i];
+
+ if (from->gate_state)
+ to->multi |= 0x10;
+
+ if (from->ipv >= 0)
+ to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
+
+ if (from->maxoctets >= 0) {
+ to->multi |= 0x01;
+ to->msdu[0] = from->maxoctets & 0xFF;
+ to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
+ to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
+ }
+
+ to->interval = cpu_to_le32(from->interval);
+ }
+
+ /* If basetime is less than now, calculate start time */
+ now = get_ptp_now(&priv->si->hw);
+
+ if (sgi->basetime < now) {
+ u64 start;
+
+ err = get_start_ns(now, sgi->cycletime, &start);
+ if (err)
+ goto exit;
+ sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
+ sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
+ } else {
+ u32 hi, lo;
+
+ hi = upper_32_bits(sgi->basetime);
+ lo = lower_32_bits(sgi->basetime);
+ sgcl_data->bth = cpu_to_le32(hi);
+ sgcl_data->btl = cpu_to_le32(lo);
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+
+exit:
+	dma_unmap_single(&priv->si->pdev->dev, dma, data_size, DMA_FROM_DEVICE);
+	kfree(sgcl_data);
+
+ return err;
+}
+
+static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
+{
+ struct enetc_stream_filter *f;
+
+ hlist_for_each_entry(f, &epsfp.stream_list, node)
+ if (f->sid.index == index)
+ return f;
+
+ return NULL;
+}
+
+static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
+{
+ struct enetc_psfp_gate *g;
+
+ hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
+ if (g->index == index)
+ return g;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->index == index)
+ return s;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter
+ *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->gate_id == sfi->gate_id &&
+ s->prio == sfi->prio &&
+ s->meter_id == sfi->meter_id)
+ return s;
+
+ return NULL;
+}
+
+static int enetc_get_free_index(struct enetc_ndev_priv *priv)
+{
+ u32 max_size = priv->psfp_cap.max_psfp_filter;
+ unsigned long index;
+
+ index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
+ if (index == max_size)
+ return -1;
+
+ return index;
+}
+
+static void stream_filter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_filter *sfi;
+ u8 z;
+
+ sfi = enetc_get_filter_by_index(index);
+ WARN_ON(!sfi);
+ z = refcount_dec_and_test(&sfi->refcount);
+
+ if (z) {
+		enetc_streamfilter_hw_set(priv, sfi, false);
+		hlist_del(&sfi->node);
+		clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+		kfree(sfi);
+ }
+}
+
+static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_gate *sgi;
+ u8 z;
+
+ sgi = enetc_get_gate_by_index(index);
+ WARN_ON(!sgi);
+ z = refcount_dec_and_test(&sgi->refcount);
+ if (z) {
+ enetc_streamgate_hw_set(priv, sgi, false);
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void remove_one_chain(struct enetc_ndev_priv *priv,
+ struct enetc_stream_filter *filter)
+{
+ stream_gate_unref(priv, filter->sgi_index);
+ stream_filter_unref(priv, filter->sfi_index);
+
+ hlist_del(&filter->node);
+ kfree(filter);
+}
+
+static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ struct enetc_psfp_filter *sfi,
+ struct enetc_psfp_gate *sgi)
+{
+ int err;
+
+ err = enetc_streamid_hw_set(priv, sid, true);
+ if (err)
+ return err;
+
+ if (sfi) {
+ err = enetc_streamfilter_hw_set(priv, sfi, true);
+ if (err)
+ goto revert_sid;
+ }
+
+ err = enetc_streamgate_hw_set(priv, sgi, true);
+ if (err)
+ goto revert_sfi;
+
+ return 0;
+
+revert_sfi:
+ if (sfi)
+ enetc_streamfilter_hw_set(priv, sfi, false);
+revert_sid:
+ enetc_streamid_hw_set(priv, sid, false);
+ return err;
+}
+
+struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
+ if (acts == enetc_act_fwd[i].actions &&
+ inputkeys & enetc_act_fwd[i].keys)
+ return &enetc_act_fwd[i];
+
+ return NULL;
+}
+
+static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_filter *sfi, *old_sfi;
+ struct enetc_psfp_gate *sgi, *old_sgi;
+ struct flow_action_entry *entry;
+ struct action_gate_entry *e;
+ u8 sfi_overwrite = 0;
+ int entries_size;
+ int i, err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ flow_action_for_each(i, entry, &rule->action)
+ if (entry->id == FLOW_ACTION_GATE)
+ break;
+
+ if (entry->id != FLOW_ACTION_GATE)
+ return -EINVAL;
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ filter->sid.index = f->common.chain_index;
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_match_eth_addrs match;
+
+ flow_rule_match_eth_addrs(rule, &match);
+
+ if (!is_zero_ether_addr(match.mask->dst) &&
+ !is_zero_ether_addr(match.mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Cannot match on both source and destination MAC");
+			err = -EINVAL;
+			goto free_filter;
+ }
+
+ if (!is_zero_ether_addr(match.mask->dst)) {
+ if (!is_broadcast_ether_addr(match.mask->dst)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Masked matching on destination MAC not supported");
+				err = -EINVAL;
+				goto free_filter;
+ }
+ ether_addr_copy(filter->sid.dst_mac, match.key->dst);
+ filter->sid.filtertype = STREAMID_TYPE_NULL;
+ }
+
+ if (!is_zero_ether_addr(match.mask->src)) {
+ if (!is_broadcast_ether_addr(match.mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Masked matching on source MAC not supported");
+				err = -EINVAL;
+				goto free_filter;
+ }
+ ether_addr_copy(filter->sid.src_mac, match.key->src);
+ filter->sid.filtertype = STREAMID_TYPE_SMAC;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported, must include ETH_ADDRS");
+		err = -EINVAL;
+		goto free_filter;
+ }
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_match_vlan match;
+
+ flow_rule_match_vlan(rule, &match);
+ if (match.mask->vlan_priority) {
+ if (match.mask->vlan_priority !=
+ (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ }
+
+ if (match.mask->vlan_id) {
+ if (match.mask->vlan_id != VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ filter->sid.vid = match.key->vlan_id;
+ if (!filter->sid.vid)
+ filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
+ else
+ filter->sid.tagged = STREAMID_VLAN_TAGGED;
+ }
+ } else {
+ filter->sid.tagged = STREAMID_VLAN_ALL;
+ }
+
+ /* parsing gate action */
+ if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ sgi = kzalloc(entries_size, GFP_KERNEL);
+ if (!sgi) {
+ err = -ENOMEM;
+ goto free_filter;
+ }
+
+ refcount_set(&sgi->refcount, 1);
+ sgi->index = entry->gate.index;
+ sgi->init_ipv = entry->gate.prio;
+ sgi->basetime = entry->gate.basetime;
+ sgi->cycletime = entry->gate.cycletime;
+ sgi->num_entries = entry->gate.num_entries;
+
+ e = sgi->entries;
+ for (i = 0; i < entry->gate.num_entries; i++) {
+ e[i].gate_state = entry->gate.entries[i].gate_state;
+ e[i].interval = entry->gate.entries[i].interval;
+ e[i].ipv = entry->gate.entries[i].ipv;
+ e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ }
+
+ filter->sgi_index = sgi->index;
+
+ sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
+ if (!sfi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+
+ refcount_set(&sfi->refcount, 1);
+ sfi->gate_id = sgi->index;
+
+	/* flow meter not supported yet */
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+
+	/* prio references the filter's prio */
+ if (f->common.prio && f->common.prio <= BIT(3))
+ sfi->prio = f->common.prio - 1;
+ else
+ sfi->prio = ENETC_PSFP_WILDCARD;
+
+ old_sfi = enetc_psfp_check_sfi(sfi);
+ if (!old_sfi) {
+ int index;
+
+ index = enetc_get_free_index(priv);
+		if (index < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
+ err = -ENOSPC;
+ goto free_sfi;
+ }
+
+ sfi->index = index;
+ sfi->handle = index + HANDLE_OFFSET;
+ /* Update the stream filter handle also */
+ filter->sid.handle = sfi->handle;
+ filter->sfi_index = sfi->index;
+ sfi_overwrite = 0;
+ } else {
+ filter->sfi_index = old_sfi->index;
+ filter->sid.handle = old_sfi->handle;
+ sfi_overwrite = 1;
+ }
+
+ err = enetc_psfp_hw_set(priv, &filter->sid,
+ sfi_overwrite ? NULL : sfi, sgi);
+ if (err)
+ goto free_sfi;
+
+ spin_lock(&epsfp.psfp_lock);
+	/* Remove the old node if it exists and replace it with a new node */
+ old_sgi = enetc_get_gate_by_index(filter->sgi_index);
+ if (old_sgi) {
+ refcount_set(&sgi->refcount,
+ refcount_read(&old_sgi->refcount) + 1);
+ hlist_del(&old_sgi->node);
+ kfree(old_sgi);
+ }
+
+ hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
+
+ if (!old_sfi) {
+ hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
+ set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ } else {
+ kfree(sfi);
+ refcount_inc(&old_sfi->refcount);
+ }
+
+ old_filter = enetc_get_stream_by_index(filter->sid.index);
+ if (old_filter)
+ remove_one_chain(priv, old_filter);
+
+ filter->stats.lastused = jiffies;
+ hlist_add_head(&filter->node, &epsfp.stream_list);
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ return 0;
+
+free_sfi:
+ kfree(sfi);
+free_gate:
+ kfree(sgi);
+free_filter:
+ kfree(filter);
+
+ return err;
+}
+
+static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct flow_dissector *dissector = rule->match.dissector;
+ struct flow_action *action = &rule->action;
+ struct flow_action_entry *entry;
+ struct actions_fwd *fwd;
+ u64 actions = 0;
+ int i, err;
+
+ if (!flow_action_has_entries(action)) {
+ NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
+ return -EINVAL;
+ }
+
+ flow_action_for_each(i, entry, action)
+ actions |= BIT(entry->id);
+
+ fwd = enetc_check_flow_actions(actions, dissector->used_keys);
+ if (!fwd) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
+ return -EOPNOTSUPP;
+ }
+
+ if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
+ err = enetc_psfp_parse_clsflower(priv, cls_flower);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
+ return err;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct enetc_stream_filter *filter;
+ struct netlink_ext_ack *extack = f->common.extack;
+ int err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamid_hw_set(priv, &filter->sid, false);
+ if (err)
+ return err;
+
+ remove_one_chain(priv, filter);
+
+ return 0;
+}
+
+static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ return enetc_psfp_destroy_clsflower(priv, f);
+}
+
+static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct psfp_streamfilter_counters counters = {};
+ struct enetc_stream_filter *filter;
+ struct flow_stats stats = {};
+ int err;
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
+ if (err)
+ return -EINVAL;
+
+ spin_lock(&epsfp.psfp_lock);
+ stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.lastused = filter->stats.lastused;
+ filter->stats.pkts += stats.pkts;
+ spin_unlock(&epsfp.psfp_lock);
+
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
+ FLOW_ACTION_HW_STATS_DELAYED);
+
+ return 0;
+}
+
+static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case FLOW_CLS_REPLACE:
+ return enetc_config_clsflower(priv, cls_flower);
+ case FLOW_CLS_DESTROY:
+ return enetc_destroy_clsflower(priv, cls_flower);
+ case FLOW_CLS_STATS:
+ return enetc_psfp_get_stats(priv, cls_flower);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static inline void clean_psfp_sfi_bitmap(void)
+{
+ bitmap_free(epsfp.psfp_sfi_bitmap);
+ epsfp.psfp_sfi_bitmap = NULL;
+}
+
+static void clean_stream_list(void)
+{
+ struct enetc_stream_filter *s;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
+ hlist_del(&s->node);
+ kfree(s);
+ }
+}
+
+static void clean_sfi_list(void)
+{
+ struct enetc_psfp_filter *sfi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ }
+}
+
+static void clean_sgi_list(void)
+{
+ struct enetc_psfp_gate *sgi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void clean_psfp_all(void)
+{
+ /* Disable all list nodes and free all memory */
+ clean_sfi_list();
+ clean_sgi_list();
+ clean_stream_list();
+ epsfp.dev_bitmap = 0;
+ clean_psfp_sfi_bitmap();
+}
+
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct net_device *ndev = cb_priv;
+
+ if (!tc_can_offload(ndev))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_psfp_init(struct enetc_ndev_priv *priv)
+{
+ if (epsfp.psfp_sfi_bitmap)
+ return 0;
+
+ epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
+ GFP_KERNEL);
+ if (!epsfp.psfp_sfi_bitmap)
+ return -ENOMEM;
+
+ spin_lock_init(&epsfp.psfp_lock);
+
+ if (list_empty(&enetc_block_cb_list))
+ epsfp.dev_bitmap = 0;
+
+ return 0;
+}
+
+int enetc_psfp_clean(struct enetc_ndev_priv *priv)
+{
+ if (!list_empty(&enetc_block_cb_list))
+ return -EBUSY;
+
+ clean_psfp_all();
+
+ return 0;
+}
+
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct flow_block_offload *f = type_data;
+ int err;
+
+ err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
+ enetc_setup_tc_block_cb,
+ ndev, ndev, true);
+ if (err)
+ return err;
+
+ switch (f->command) {
+ case FLOW_BLOCK_BIND:
+ set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ if (!epsfp.dev_bitmap)
+ clean_psfp_all();
+ break;
+ }
+
+ return 0;
+}
--
2.17.1

2020-04-29 17:08:41

by Vlad Buslov

[permalink] [raw]
Subject: Re: [v4,net-next 1/4] net: qos: introduce a gate control flow action

Hi Po,

On Tue 28 Apr 2020 at 06:34, Po Liu <[email protected]> wrote:
> Introduce an ingress frame gate control flow action.
> The tc gate action works like this:
> Assume there is a gate that allows specified ingress frames to be
> passed at specific time slots and dropped at other time slots. The tc
> filter selects the ingress frames, and the tc gate action specifies in
> which time slots those frames may be passed to the device and in which
> time slots they are dropped.
> The tc gate action provides an entry list that tells how long the gate
> stays open and how long it stays closed. The gate action also assigns
> a start time that tells when the entry list starts. The driver then
> repeats the gate entry list cyclically.
> For the software simulation, the gate action requires the user to
> assign a clock type.
>
> Below is a configuration example in user space. The tc filter matches
> a stream whose source IP address is 192.168.0.20, and the gate action
> owns two time slots: one lasting 200 ms with the gate open, letting
> frames pass, and one lasting 100 ms with the gate closed, dropping
> frames. When the ingress frames exceed a total of 8000000 bytes within
> the 200000000 ns open slot, the excess frames are dropped.
>
>> tc qdisc add dev eth0 ingress
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower src_ip 192.168.0.20 \
> action gate index 2 clockid CLOCK_TAI \
> sched-entry open 200000000 -1 8000000 \
> sched-entry close 100000000 -1 -1
>
>> tc chain del dev eth0 ingress chain 0
>
> "sched-entry" follows the taprio naming style. The gate state is
> "open"/"close", followed by the period in nanoseconds. The next item
> is the internal priority value, which selects the ingress queue the
> frames should be put in; "-1" means wildcard. The optional last value
> specifies the maximum number of MSDU octets that are permitted to
> pass the gate during the specified time interval.
> If base-time is not set, it defaults to 0; as a result the start time
> will be ((N + 1) * cycletime), the nearest future time.
>
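The per-slot gate decision described above (open/close state plus the optional max-octets budget) can be sketched in plain C. The helper below is a hypothetical userspace illustration mirroring the commit-message semantics, not the driver or act_gate code itself; the struct and function names are invented for this sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper mirroring the gate decision described above:
 * a frame passes only while the gate is open and, when the entry
 * carries a max-octets budget (>= 0), while the running octet count
 * for the current interval stays within that budget. */
struct gate_entry_state {
	bool open;             /* sched-entry state: "open"/"close" */
	int32_t max_octets;    /* -1 means no byte limit            */
	uint32_t entry_octets; /* octets passed in this interval    */
};

static bool gate_frame_may_pass(struct gate_entry_state *st, uint32_t pkt_len)
{
	if (!st->open)
		return false; /* gate closed: drop */

	if (st->max_octets >= 0) {
		st->entry_octets += pkt_len;
		if (st->entry_octets > (uint32_t)st->max_octets)
			return false; /* interval budget exceeded: drop */
	}

	return true;
}
```

In the first tc example this corresponds to an open 200000000 ns entry with an 8000000-octet budget followed by a closed 100000000 ns entry; `entry_octets` resets whenever a new entry starts.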
> The example below filters a stream whose destination MAC address is
> 10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
> action. The gate action runs with a single close time slot, meaning
> the gate always stays closed. The total cycle time is 200000000 ns.
> The base time is calculated by:
>
> 1357000000000 + (N + 1) * cycletime
>
> The smallest such value that lies in the future becomes the start
> time. The cycletime here is 200000000 ns for this case.
>
>> tc filter add dev eth0 parent ffff: protocol ip \
> flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> action gate index 12 base-time 1357000000000 \
> sched-entry close 200000000 -1 -1 \
> clockid CLOCK_TAI
>
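The base-time handling in both examples reduces to one rule: a base time still in the future is used as-is, otherwise the start is advanced by whole cycles to the first boundary after now. A minimal userspace sketch of that rule (the function name is invented for this sketch; the kernel's version is gate_get_start_time() in act_gate.c, which additionally rejects a zero cycle time):

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the start-time rule described above; all values are
 * nanoseconds. A base time still in the future is used directly,
 * otherwise the start is base + (N + 1) * cycletime, the first cycle
 * boundary strictly after "now". A zero cycle time is assumed to have
 * been rejected already and is not handled here. */
static uint64_t gate_start_ns(uint64_t base, uint64_t cycle, uint64_t now)
{
	uint64_t n;

	if (base > now)
		return base;

	n = (now - base) / cycle;
	return base + (n + 1) * cycle;
}
```

For the second example, with base 1357000000000 and cycle 200000000, any "now" past the base yields the next 200 ms boundary after it.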
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/tc_act/tc_gate.h | 47 ++
> include/uapi/linux/pkt_cls.h | 1 +
> include/uapi/linux/tc_act/tc_gate.h | 47 ++
> net/sched/Kconfig | 12 +
> net/sched/Makefile | 1 +
> net/sched/act_gate.c | 638 ++++++++++++++++++++++++++++
> 6 files changed, 746 insertions(+)
> create mode 100644 include/net/tc_act/tc_gate.h
> create mode 100644 include/uapi/linux/tc_act/tc_gate.h
> create mode 100644 net/sched/act_gate.c
>
> diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> new file mode 100644
> index 000000000000..330ad8b02495
> --- /dev/null
> +++ b/include/net/tc_act/tc_gate.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/* Copyright 2020 NXP */
> +
> +#ifndef __NET_TC_GATE_H
> +#define __NET_TC_GATE_H
> +
> +#include <net/act_api.h>
> +#include <linux/tc_act/tc_gate.h>
> +
> +struct tcfg_gate_entry {
> + int index;
> + u8 gate_state;
> + u32 interval;
> + s32 ipv;
> + s32 maxoctets;
> + struct list_head list;
> +};
> +
> +struct tcf_gate_params {
> + s32 tcfg_priority;
> + u64 tcfg_basetime;
> + u64 tcfg_cycletime;
> + u64 tcfg_cycletime_ext;
> + u32 tcfg_flags;
> + s32 tcfg_clockid;
> + size_t num_entries;
> + struct list_head entries;
> +};
> +
> +#define GATE_ACT_GATE_OPEN BIT(0)
> +#define GATE_ACT_PENDING BIT(1)
> +
> +struct tcf_gate {
> + struct tc_action common;
> + struct tcf_gate_params param;
> + u8 current_gate_status;
> + ktime_t current_close_time;
> + u32 current_entry_octets;
> + s32 current_max_octets;
> + struct tcfg_gate_entry *next_entry;
> + struct hrtimer hitimer;
> + enum tk_offsets tk_offset;
> +};
> +
> +#define to_gate(a) ((struct tcf_gate *)a)
> +
> +#endif
> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
> index 9f06d29cab70..fc672b232437 100644
> --- a/include/uapi/linux/pkt_cls.h
> +++ b/include/uapi/linux/pkt_cls.h
> @@ -134,6 +134,7 @@ enum tca_id {
> TCA_ID_CTINFO,
> TCA_ID_MPLS,
> TCA_ID_CT,
> + TCA_ID_GATE,
> /* other actions go here */
> __TCA_ID_MAX = 255
> };
> diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
> new file mode 100644
> index 000000000000..f214b3a6d44f
> --- /dev/null
> +++ b/include/uapi/linux/tc_act/tc_gate.h
> @@ -0,0 +1,47 @@
> +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> +/* Copyright 2020 NXP */
> +
> +#ifndef __LINUX_TC_GATE_H
> +#define __LINUX_TC_GATE_H
> +
> +#include <linux/pkt_cls.h>
> +
> +struct tc_gate {
> + tc_gen;
> +};
> +
> +enum {
> + TCA_GATE_ENTRY_UNSPEC,
> + TCA_GATE_ENTRY_INDEX,
> + TCA_GATE_ENTRY_GATE,
> + TCA_GATE_ENTRY_INTERVAL,
> + TCA_GATE_ENTRY_IPV,
> + TCA_GATE_ENTRY_MAX_OCTETS,
> + __TCA_GATE_ENTRY_MAX,
> +};
> +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> +
> +enum {
> + TCA_GATE_ONE_ENTRY_UNSPEC,
> + TCA_GATE_ONE_ENTRY,
> + __TCA_GATE_ONE_ENTRY_MAX,
> +};
> +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
> +
> +enum {
> + TCA_GATE_UNSPEC,
> + TCA_GATE_TM,
> + TCA_GATE_PARMS,
> + TCA_GATE_PAD,
> + TCA_GATE_PRIORITY,
> + TCA_GATE_ENTRY_LIST,
> + TCA_GATE_BASE_TIME,
> + TCA_GATE_CYCLE_TIME,
> + TCA_GATE_CYCLE_TIME_EXT,
> + TCA_GATE_FLAGS,
> + TCA_GATE_CLOCKID,
> + __TCA_GATE_MAX,
> +};
> +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> +
> +#endif
> diff --git a/net/sched/Kconfig b/net/sched/Kconfig
> index bfbefb7bff9d..2f20073f4f84 100644
> --- a/net/sched/Kconfig
> +++ b/net/sched/Kconfig
> @@ -981,6 +981,18 @@ config NET_ACT_CT
> To compile this code as a module, choose M here: the
> module will be called act_ct.
>
> +config NET_ACT_GATE
> + tristate "Frame gate entry list control tc action"
> + depends on NET_CLS_ACT
> + help
> + Say Y here to allow to control the ingress flow to be passed at
> + specific time slot and be dropped at other specific time slot by
> + the gate entry list.
> +
> + If unsure, say N.
> + To compile this code as a module, choose M here: the
> + module will be called act_gate.
> +
> config NET_IFE_SKBMARK
> tristate "Support to encoding decoding skb mark on IFE action"
> depends on NET_ACT_IFE
> diff --git a/net/sched/Makefile b/net/sched/Makefile
> index 31c367a6cd09..66bbf9a98f9e 100644
> --- a/net/sched/Makefile
> +++ b/net/sched/Makefile
> @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) += act_meta_skbprio.o
> obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
> new file mode 100644
> index 000000000000..916f6fe56692
> --- /dev/null
> +++ b/net/sched/act_gate.c
> @@ -0,0 +1,638 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/* Copyright 2020 NXP */
> +
> +#include <linux/module.h>
> +#include <linux/types.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/skbuff.h>
> +#include <linux/rtnetlink.h>
> +#include <linux/init.h>
> +#include <linux/slab.h>
> +#include <net/act_api.h>
> +#include <net/netlink.h>
> +#include <net/pkt_cls.h>
> +#include <net/tc_act/tc_gate.h>
> +
> +static unsigned int gate_net_id;
> +static struct tc_action_ops act_gate_ops;
> +
> +static ktime_t gate_get_time(struct tcf_gate *gact)
> +{
> + ktime_t mono = ktime_get();
> +
> + switch (gact->tk_offset) {
> + case TK_OFFS_MAX:
> + return mono;
> + default:
> + return ktime_mono_to_any(mono, gact->tk_offset);
> + }
> +
> + return KTIME_MAX;
> +}
> +
> +static int gate_get_start_time(struct tcf_gate *gact, ktime_t *start)
> +{
> + struct tcf_gate_params *param = &gact->param;
> + ktime_t now, base, cycle;
> + u64 n;
> +
> + base = ns_to_ktime(param->tcfg_basetime);
> + now = gate_get_time(gact);
> +
> + if (ktime_after(base, now)) {
> + *start = base;
> + return 0;
> + }
> +
> + cycle = param->tcfg_cycletime;
> +
> + /* cycle time should not be zero */
> + if (!cycle)
> + return -EFAULT;
> +
> + n = div64_u64(ktime_sub_ns(now, base), cycle);
> + *start = ktime_add_ns(base, (n + 1) * cycle);
> + return 0;
> +}
> +
> +static void gate_start_timer(struct tcf_gate *gact, ktime_t start)
> +{
> + ktime_t expires;
> +
> + expires = hrtimer_get_expires(&gact->hitimer);
> + if (expires == 0)
> + expires = KTIME_MAX;
> +
> + start = min_t(ktime_t, start, expires);
> +
> + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS);
> +}
> +
> +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer)
> +{
> + struct tcf_gate *gact = container_of(timer, struct tcf_gate,
> + hitimer);
> + struct tcf_gate_params *p = &gact->param;
> + struct tcfg_gate_entry *next;
> + ktime_t close_time, now;
> +
> + spin_lock(&gact->tcf_lock);
> +
> + next = gact->next_entry;
> +
> + /* cycle start, clear pending bit, clear total octets */
> + gact->current_gate_status = next->gate_state ? GATE_ACT_GATE_OPEN : 0;
> + gact->current_entry_octets = 0;
> + gact->current_max_octets = next->maxoctets;
> +
> + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> + next->interval);
> +
> + close_time = gact->current_close_time;
> +
> + if (list_is_last(&next->list, &p->entries))
> + next = list_first_entry(&p->entries,
> + struct tcfg_gate_entry, list);
> + else
> + next = list_next_entry(next, list);
> +
> + now = gate_get_time(gact);
> +
> + if (ktime_after(now, close_time)) {
> + ktime_t cycle, base;
> + u64 n;
> +
> + cycle = p->tcfg_cycletime;
> + base = ns_to_ktime(p->tcfg_basetime);
> + n = div64_u64(ktime_sub_ns(now, base), cycle);
> + close_time = ktime_add_ns(base, (n + 1) * cycle);
> + }
> +
> + gact->next_entry = next;
> +
> + hrtimer_set_expires(&gact->hitimer, close_time);
> +
> + spin_unlock(&gact->tcf_lock);
> +
> + return HRTIMER_RESTART;
> +}
> +
> +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> + struct tcf_result *res)
> +{
> + struct tcf_gate *gact = to_gate(a);
> +
> + spin_lock_bh(&gact->tcf_lock);

I think you need to use regular spin_lock() function here because this
function is called from softirq context.

> +
> + tcf_lastuse_update(&gact->tcf_tm);
> + bstats_update(&gact->tcf_bstats, skb);
> +
> + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
> + spin_unlock_bh(&gact->tcf_lock);
> + return gact->tcf_action;
> + }
> +
> + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
> + goto drop;
> +
> + if (gact->current_max_octets >= 0) {
> + gact->current_entry_octets += qdisc_pkt_len(skb);
> + if (gact->current_entry_octets > gact->current_max_octets) {
> + gact->tcf_qstats.overlimits++;
> + goto drop;
> + }
> + }
> +
> + spin_unlock_bh(&gact->tcf_lock);
> +
> + return gact->tcf_action;
> +drop:
> + gact->tcf_qstats.drops++;
> + spin_unlock_bh(&gact->tcf_lock);
> +
> + return TC_ACT_SHOT;
> +}
> +
> +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1] = {
> + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
> + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
> + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
> + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
> + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
> +};
> +
> +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
> + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
> + .type = NLA_EXACT_LEN },
> + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
> + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
> + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
> + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
> + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
> + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
> + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
> +};
> +
> +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
> + struct netlink_ext_ack *extack)
> +{
> + u32 interval = 0;
> +
> + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
> +
> + if (tb[TCA_GATE_ENTRY_INTERVAL])
> + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
> +
> + if (interval == 0) {
> + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> + return -EINVAL;
> + }
> +
> + entry->interval = interval;
> +
> + if (tb[TCA_GATE_ENTRY_IPV])
> + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
> + else
> + entry->ipv = -1;
> +
> + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
> + entry->maxoctets = nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
> + else
> + entry->maxoctets = -1;
> +
> + return 0;
> +}
> +
> +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry *entry,
> + int index, struct netlink_ext_ack *extack)
> +{
> + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
> + int err;
> +
> + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy, extack);
> + if (err < 0) {
> + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
> + return -EINVAL;
> + }
> +
> + entry->index = index;
> +
> + return fill_gate_entry(tb, entry, extack);
> +}
> +
> +static void release_entry_list(struct list_head *entries)
> +{
> + struct tcfg_gate_entry *entry, *e;
> +
> + list_for_each_entry_safe(entry, e, entries, list) {
> + list_del(&entry->list);
> + kfree(entry);
> + }
> +}
> +
> +static int parse_gate_list(struct nlattr *list_attr,
> + struct tcf_gate_params *sched,
> + struct netlink_ext_ack *extack)
> +{
> + struct tcfg_gate_entry *entry;
> + struct nlattr *n;
> + int err, rem;
> + int i = 0;
> +
> + if (!list_attr)
> + return -EINVAL;
> +
> + nla_for_each_nested(n, list_attr, rem) {
> + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
> + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
> + continue;
> + }
> +
> + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> + if (!entry) {
> + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
> + err = -ENOMEM;
> + goto release_list;
> + }
> +
> + err = parse_gate_entry(n, entry, i, extack);
> + if (err < 0) {
> + kfree(entry);
> + goto release_list;
> + }
> +
> + list_add_tail(&entry->list, &sched->entries);
> + i++;
> + }
> +
> + sched->num_entries = i;
> +
> + return i;
> +
> +release_list:
> + release_entry_list(&sched->entries);
> +
> + return err;
> +}
> +
> +static int tcf_gate_init(struct net *net, struct nlattr *nla,
> + struct nlattr *est, struct tc_action **a,
> + int ovr, int bind, bool rtnl_held,
> + struct tcf_proto *tp, u32 flags,
> + struct netlink_ext_ack *extack)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> + enum tk_offsets tk_offset = TK_OFFS_TAI;
> + struct nlattr *tb[TCA_GATE_MAX + 1];
> + struct tcf_chain *goto_ch = NULL;
> + struct tcf_gate_params *p;
> + s32 clockid = CLOCK_TAI;
> + struct tcf_gate *gact;
> + struct tc_gate *parm;
> + int ret = 0, err;
> + u64 basetime = 0;
> + u32 gflags = 0;
> + s32 prio = -1;
> + ktime_t start;
> + u32 index;
> +
> + if (!nla)
> + return -EINVAL;
> +
> + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
> + if (err < 0)
> + return err;
> +
> + if (!tb[TCA_GATE_PARMS])
> + return -EINVAL;
> +
> + parm = nla_data(tb[TCA_GATE_PARMS]);
> + index = parm->index;
> +
> + err = tcf_idr_check_alloc(tn, &index, a, bind);
> + if (err < 0)
> + return err;
> +
> + if (err && bind)
> + return 0;
> +
> + if (!err) {
> + ret = tcf_idr_create(tn, index, est, a,
> + &act_gate_ops, bind, false, 0);
> + if (ret) {
> + tcf_idr_cleanup(tn, index);
> + return ret;
> + }
> +
> + ret = ACT_P_CREATED;
> + } else if (!ovr) {
> + tcf_idr_release(*a, bind);
> + return -EEXIST;
> + }
> +
> + if (tb[TCA_GATE_PRIORITY])
> + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
> +
> + if (tb[TCA_GATE_BASE_TIME])
> + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
> +
> + if (tb[TCA_GATE_FLAGS])
> + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
> +
> + if (tb[TCA_GATE_CLOCKID]) {
> + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> + switch (clockid) {
> + case CLOCK_REALTIME:
> + tk_offset = TK_OFFS_REAL;
> + break;
> + case CLOCK_MONOTONIC:
> + tk_offset = TK_OFFS_MAX;
> + break;
> + case CLOCK_BOOTTIME:
> + tk_offset = TK_OFFS_BOOT;
> + break;
> + case CLOCK_TAI:
> + tk_offset = TK_OFFS_TAI;
> + break;
> + default:
> + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> + err = -EINVAL;
> + goto release_idr;
> + }
> + }
> +
> + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> + if (err < 0)
> + goto release_idr;
> +
> + gact = to_gate(*a);
> +
> + spin_lock_bh(&gact->tcf_lock);
> + p = &gact->param;
> +
> + if (tb[TCA_GATE_CYCLE_TIME]) {
> + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
> + if (!p->tcfg_cycletime) {
> + NL_SET_ERR_MSG(extack, "zero 'cycle_time' is not allowed");
> + err = -EINVAL;
> + goto chain_put;
> + }
> + }
> +
> + INIT_LIST_HEAD(&p->entries);
> + if (tb[TCA_GATE_ENTRY_LIST]) {
> + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> + if (err < 0)
> + goto chain_put;
> + }
> +
> + if (!p->tcfg_cycletime) {
> + struct tcfg_gate_entry *entry;
> + ktime_t cycle = 0;
> +
> + list_for_each_entry(entry, &p->entries, list)
> + cycle = ktime_add_ns(cycle, entry->interval);
> + p->tcfg_cycletime = cycle;
> + }
> +
> + if (tb[TCA_GATE_CYCLE_TIME_EXT])
> + p->tcfg_cycletime_ext =
> + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
> +
> + p->tcfg_priority = prio;
> + p->tcfg_basetime = basetime;
> + p->tcfg_clockid = clockid;
> + p->tcfg_flags = gflags;
> +
> + gact->tk_offset = tk_offset;
> + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
> + gact->hitimer.function = gate_timer_func;
> +
> + err = gate_get_start_time(gact, &start);
> + if (err < 0) {
> + NL_SET_ERR_MSG(extack,
> + "Internal error: failed get start time");
> + release_entry_list(&p->entries);
> + goto chain_put;
> + }
> +
> + gact->current_close_time = start;
> + gact->current_gate_status = GATE_ACT_GATE_OPEN | GATE_ACT_PENDING;
> +
> + gact->next_entry = list_first_entry(&p->entries,
> + struct tcfg_gate_entry, list);
> +
> + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
> +
> + gate_start_timer(gact, start);
> +
> + spin_unlock_bh(&gact->tcf_lock);
> +
> + if (goto_ch)
> + tcf_chain_put_by_act(goto_ch);
> +
> + if (ret == ACT_P_CREATED)
> + tcf_idr_insert(tn, *a);
> + return ret;
> +
> +chain_put:
> + spin_unlock_bh(&gact->tcf_lock);
> +
> + if (goto_ch)
> + tcf_chain_put_by_act(goto_ch);
> +release_idr:
> + tcf_idr_release(*a, bind);
> + return err;
> +}
> +
> +static void tcf_gate_cleanup(struct tc_action *a)
> +{
> + struct tcf_gate *gact = to_gate(a);
> + struct tcf_gate_params *p;
> +
> + spin_lock_bh(&gact->tcf_lock);

No need to lock in tc_action_ops->cleanup() callbacks.

> +
> + hrtimer_cancel(&gact->hitimer);
> +
> + p = &gact->param;
> + release_entry_list(&p->entries);
> +
> + spin_unlock_bh(&gact->tcf_lock);
> +}
> +
> +static int dumping_entry(struct sk_buff *skb,
> + struct tcfg_gate_entry *entry)
> +{
> + struct nlattr *item;
> +
> + item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
> + if (!item)
> + return -ENOSPC;
> +
> + if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
> + goto nla_put_failure;
> +
> + if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
> + goto nla_put_failure;
> +
> + if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
> + goto nla_put_failure;
> +
> + return nla_nest_end(skb, item);
> +
> +nla_put_failure:
> + nla_nest_cancel(skb, item);
> + return -1;
> +}
> +
> +static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
> + int bind, int ref)
> +{
> + unsigned char *b = skb_tail_pointer(skb);
> + struct tcf_gate *gact = to_gate(a);
> + struct tc_gate opt = {
> + .index = gact->tcf_index,
> + .refcnt = refcount_read(&gact->tcf_refcnt) - ref,
> + .bindcnt = atomic_read(&gact->tcf_bindcnt) - bind,
> + };
> + struct tcfg_gate_entry *entry;
> + struct tcf_gate_params *p;
> + struct nlattr *entry_list;
> + struct tcf_t t;
> +
> + spin_lock_bh(&gact->tcf_lock);
> + opt.action = gact->tcf_action;
> +
> + p = &gact->param;
> +
> + if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
> + p->tcfg_basetime, TCA_GATE_PAD))
> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
> + p->tcfg_cycletime, TCA_GATE_PAD))
> + goto nla_put_failure;
> +
> + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
> + p->tcfg_cycletime_ext, TCA_GATE_PAD))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
> + goto nla_put_failure;
> +
> + if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
> + goto nla_put_failure;
> +
> + if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
> + goto nla_put_failure;
> +
> + entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
> + if (!entry_list)
> + goto nla_put_failure;
> +
> + list_for_each_entry(entry, &p->entries, list) {
> + if (dumping_entry(skb, entry) < 0)
> + goto nla_put_failure;
> + }
> +
> + nla_nest_end(skb, entry_list);
> +
> + tcf_tm_dump(&t, &gact->tcf_tm);
> + if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
> + goto nla_put_failure;
> + spin_unlock_bh(&gact->tcf_lock);
> +
> + return skb->len;
> +
> +nla_put_failure:
> + spin_unlock_bh(&gact->tcf_lock);
> + nlmsg_trim(skb, b);
> + return -1;
> +}
> +
> +static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
> + struct netlink_callback *cb, int type,
> + const struct tc_action_ops *ops,
> + struct netlink_ext_ack *extack)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> +
> + return tcf_generic_walker(tn, skb, cb, type, ops, extack);
> +}
> +
> +static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
> + u64 lastuse, bool hw)
> +{
> + struct tcf_gate *gact = to_gate(a);
> + struct tcf_t *tm = &gact->tcf_tm;
> +
> + tcf_action_update_stats(a, bytes, packets, false, hw);
> + tm->lastuse = max_t(u64, tm->lastuse, lastuse);
> +}
> +
> +static int tcf_gate_search(struct net *net, struct tc_action **a, u32 index)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> +
> + return tcf_idr_search(tn, a, index);
> +}
> +
> +static size_t tcf_gate_get_fill_size(const struct tc_action *act)
> +{
> + return nla_total_size(sizeof(struct tc_gate));
> +}
> +
> +static struct tc_action_ops act_gate_ops = {
> + .kind = "gate",
> + .id = TCA_ID_GATE,
> + .owner = THIS_MODULE,
> + .act = tcf_gate_act,
> + .dump = tcf_gate_dump,
> + .init = tcf_gate_init,
> + .cleanup = tcf_gate_cleanup,
> + .walk = tcf_gate_walker,
> + .stats_update = tcf_gate_stats_update,
> + .get_fill_size = tcf_gate_get_fill_size,
> + .lookup = tcf_gate_search,
> + .size = sizeof(struct tcf_gate),
> +};
> +
> +static __net_init int gate_init_net(struct net *net)
> +{
> + struct tc_action_net *tn = net_generic(net, gate_net_id);
> +
> + return tc_action_net_init(net, tn, &act_gate_ops);
> +}
> +
> +static void __net_exit gate_exit_net(struct list_head *net_list)
> +{
> + tc_action_net_exit(net_list, gate_net_id);
> +}
> +
> +static struct pernet_operations gate_net_ops = {
> + .init = gate_init_net,
> + .exit_batch = gate_exit_net,
> + .id = &gate_net_id,
> + .size = sizeof(struct tc_action_net),
> +};
> +
> +static int __init gate_init_module(void)
> +{
> + return tcf_register_action(&act_gate_ops, &gate_net_ops);
> +}
> +
> +static void __exit gate_cleanup_module(void)
> +{
> + tcf_unregister_action(&act_gate_ops, &gate_net_ops);
> +}
> +
> +module_init(gate_init_module);
> +module_exit(gate_cleanup_module);
> +MODULE_LICENSE("GPL v2");

2020-04-29 17:42:32

by Vlad Buslov

Subject: Re: [v4,net-next 2/4] net: schedule: add action gate offloading


On Tue 28 Apr 2020 at 06:34, Po Liu <[email protected]> wrote:
> Add the gate action to the flow action entry. Add the gate parameters
> in tc_setup_flow_action() to the entries of the flow_action_entry
> array provided to the driver.
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/flow_offload.h | 10 ++++
> include/net/tc_act/tc_gate.h | 113 +++++++++++++++++++++++++++++++++++
> net/sched/cls_api.c | 33 ++++++++++
> 3 files changed, 156 insertions(+)
>
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 3619c6acf60f..94a30fe02e6d 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -147,6 +147,7 @@ enum flow_action_id {
> FLOW_ACTION_MPLS_PUSH,
> FLOW_ACTION_MPLS_POP,
> FLOW_ACTION_MPLS_MANGLE,
> + FLOW_ACTION_GATE,
> NUM_FLOW_ACTIONS,
> };
>
> @@ -255,6 +256,15 @@ struct flow_action_entry {
> u8 bos;
> u8 ttl;
> } mpls_mangle;
> + struct {
> + u32 index;
> + s32 prio;
> + u64 basetime;
> + u64 cycletime;
> + u64 cycletimeext;
> + u32 num_entries;
> + struct action_gate_entry *entries;
> + } gate;
> };
> struct flow_action_cookie *cookie; /* user defined action cookie */
> };
> diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> index 330ad8b02495..9e698c7d64cd 100644
> --- a/include/net/tc_act/tc_gate.h
> +++ b/include/net/tc_act/tc_gate.h
> @@ -7,6 +7,13 @@
> #include <net/act_api.h>
> #include <linux/tc_act/tc_gate.h>
>
> +struct action_gate_entry {
> + u8 gate_state;
> + u32 interval;
> + s32 ipv;
> + s32 maxoctets;
> +};
> +
> struct tcfg_gate_entry {
> int index;
> u8 gate_state;
> @@ -44,4 +51,110 @@ struct tcf_gate {
>
> #define to_gate(a) ((struct tcf_gate *)a)
>
> +static inline bool is_tcf_gate(const struct tc_action *a)
> +{
> +#ifdef CONFIG_NET_CLS_ACT
> + if (a->ops && a->ops->id == TCA_ID_GATE)
> + return true;
> +#endif
> + return false;
> +}
> +
> +static inline u32 tcf_gate_index(const struct tc_action *a)
> +{
> + return a->tcfa_index;
> +}
> +
> +static inline s32 tcf_gate_prio(const struct tc_action *a)
> +{
> + s32 tcfg_prio;
> +
> + rcu_read_lock();

This action no longer uses rcu, so you don't need to protect with
rcu_read_lock() in all these helpers.

> + tcfg_prio = to_gate(a)->param.tcfg_priority;
> + rcu_read_unlock();
> +
> + return tcfg_prio;
> +}
> +
> +static inline u64 tcf_gate_basetime(const struct tc_action *a)
> +{
> + u64 tcfg_basetime;
> +
> + rcu_read_lock();
> + tcfg_basetime = to_gate(a)->param.tcfg_basetime;
> + rcu_read_unlock();
> +
> + return tcfg_basetime;
> +}
> +
> +static inline u64 tcf_gate_cycletime(const struct tc_action *a)
> +{
> + u64 tcfg_cycletime;
> +
> + rcu_read_lock();
> + tcfg_cycletime = to_gate(a)->param.tcfg_cycletime;
> + rcu_read_unlock();
> +
> + return tcfg_cycletime;
> +}
> +
> +static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
> +{
> + u64 tcfg_cycletimeext;
> +
> + rcu_read_lock();
> + tcfg_cycletimeext = to_gate(a)->param.tcfg_cycletime_ext;
> + rcu_read_unlock();
> +
> + return tcfg_cycletimeext;
> +}
> +
> +static inline u32 tcf_gate_num_entries(const struct tc_action *a)
> +{
> + u32 num_entries;
> +
> + rcu_read_lock();
> + num_entries = to_gate(a)->param.num_entries;
> + rcu_read_unlock();
> +
> + return num_entries;
> +}
> +
> +static inline struct action_gate_entry
> + *tcf_gate_get_list(const struct tc_action *a)
> +{
> + struct action_gate_entry *oe;
> + struct tcf_gate_params *p;
> + struct tcfg_gate_entry *entry;
> + u32 num_entries;
> + int i = 0;
> +
> + rcu_read_lock();
> +
> + p = &to_gate(a)->param;
> + num_entries = p->num_entries;
> +
> + list_for_each_entry(entry, &p->entries, list)
> + i++;
> +
> + if (i != num_entries)
> + return NULL;
> +
> + oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);

Can't allocate with GFP_KERNEL flag in rcu read blocks, but you don't
need the rcu read lock here anyway. However, tc_setup_flow_action()
calls this function while holding tcfa_lock spinlock, which also
precludes allocating memory with that flag. You can test for such
problems by enabling CONFIG_DEBUG_ATOMIC_SLEEP. To help uncover such
errors all new act APIs and action implementations are usually
accompanied by tdc tests. If you chose to implement such tests you can
look at 6e52fca36c67 ("tc-tests: Add tc action ct tests") for recent
example.

> + if (!oe)
> + return NULL;

This returns without releasing the rcu read lock, but as I said before you
probably don't need rcu protection here anyway.

> +
> + i = 0;
> + list_for_each_entry(entry, &p->entries, list) {
> + oe[i].gate_state = entry->gate_state;
> + oe[i].interval = entry->interval;
> + oe[i].ipv = entry->ipv;
> + oe[i].maxoctets = entry->maxoctets;
> + i++;
> + }
> +
> + rcu_read_unlock();
> +
> + return oe;
> +}
> #endif
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 11b683c45c28..7e85c91d0752 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -39,6 +39,7 @@
> #include <net/tc_act/tc_skbedit.h>
> #include <net/tc_act/tc_ct.h>
> #include <net/tc_act/tc_mpls.h>
> +#include <net/tc_act/tc_gate.h>
> #include <net/flow_offload.h>
>
> extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
> @@ -3526,6 +3527,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
> #endif
> }
>
> +static void tcf_gate_entry_destructor(void *priv)
> +{
> + struct action_gate_entry *oe = priv;
> +
> + kfree(oe);
> +}
> +
> +static int tcf_gate_get_entries(struct flow_action_entry *entry,
> + const struct tc_action *act)
> +{
> + entry->gate.entries = tcf_gate_get_list(act);
> +
> + if (!entry->gate.entries)
> + return -EINVAL;
> +
> + entry->destructor = tcf_gate_entry_destructor;
> + entry->destructor_priv = entry->gate.entries;
> +
> + return 0;
> +}
> +
> int tc_setup_flow_action(struct flow_action *flow_action,
> const struct tcf_exts *exts)
> {
> @@ -3672,6 +3694,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> } else if (is_tcf_skbedit_priority(act)) {
> entry->id = FLOW_ACTION_PRIORITY;
> entry->priority = tcf_skbedit_priority(act);
> + } else if (is_tcf_gate(act)) {
> + entry->id = FLOW_ACTION_GATE;
> + entry->gate.index = tcf_gate_index(act);
> + entry->gate.prio = tcf_gate_prio(act);
> + entry->gate.basetime = tcf_gate_basetime(act);
> + entry->gate.cycletime = tcf_gate_cycletime(act);
> + entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
> + entry->gate.num_entries = tcf_gate_num_entries(act);
> + err = tcf_gate_get_entries(entry, act);
> + if (err)
> + goto err_out;
> } else {
> err = -EOPNOTSUPP;
> goto err_out_locked;

2020-04-30 01:11:57

by Po Liu

Subject: RE: [EXT] Re: [v4,net-next 1/4] net: qos: introduce a gate control flow action

Hi Vlad,

> -----Original Message-----
> From: Vlad Buslov <[email protected]>
> Sent: April 30, 2020, 1:04
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>
> Subject: [EXT] Re: [v4,net-next 1/4] net: qos: introduce a gate control flow
> action
>
> Hi Po,
>
> On Tue 28 Apr 2020 at 06:34, Po Liu <[email protected]> wrote:
> > Introduce an ingress frame gate control flow action.
> > The tc gate action works like this:
> > Assume there is a gate that allows specified ingress frames to pass
> > during certain time slots and drops them during others. A tc filter
> > selects the ingress frames, and the tc gate action specifies during
> > which time slots those frames may be passed to the device and during
> > which time slots they are dropped.
> > The tc gate action provides an entry list describing how long the
> > gate stays open and how long it stays closed, and assigns a start
> > time that tells when the entry list begins. The driver then repeats
> > the gate entry list cyclically.
> > For the software simulation, the gate action requires the user to
> > assign a clock type.
> >
> > Below is a configuration example in user space. The tc filter matches
> > a stream with source ip address 192.168.0.20, and the gate action owns
> > two time slots: one lasts 200ms with the gate open to let frames pass,
> > the other lasts 100ms with the gate closed so frames are dropped. When
> > the ingress frames exceed a total of 8000000 bytes, the excess frames
> > are dropped within that 200000000ns time slot.
> >
> >> tc qdisc add dev eth0 ingress
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower src_ip 192.168.0.20 \
> > action gate index 2 clockid CLOCK_TAI \
> > sched-entry open 200000000 -1 8000000 \
> > sched-entry close 100000000 -1 -1
> >
> >> tc chain del dev eth0 ingress chain 0
> >
> > "sched-entry" follows the taprio naming style. The gate state is
> > "open"/"close", followed by the period in nanoseconds. The next item
> > is the internal priority value, which selects the ingress queue the
> > frames should be put in; "-1" means wildcard. The optional last value
> > specifies the maximum number of MSDU octets that are permitted to
> > pass the gate during the specified time interval.
> > If base-time is not set, it defaults to 0; as a result the start time
> > will be ((N + 1) * cycletime), the nearest future time.
> >
> > The example below filters a stream whose destination mac address is
> > 10:00:80:00:00:00 and whose ip protocol is ICMP, followed by the gate
> > action. The gate action runs with a single close time slot, which
> > means the gate is always kept closed. The total cycle time is
> > 200000000ns. The base-time is calculated as:
> >
> > 1357000000000 + (N + 1) * cycletime
> >
> > The first such value that lies in the future becomes the start time.
> > The cycletime here is 200000000ns for this case.
> >
> >> tc filter add dev eth0 parent ffff: protocol ip \
> > flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> > action gate index 12 base-time 1357000000000 \
> > sched-entry close 200000000 -1 -1 \
> > clockid CLOCK_TAI
> >
> > Signed-off-by: Po Liu <[email protected]>
> > ---
> > include/net/tc_act/tc_gate.h | 47 ++
> > include/uapi/linux/pkt_cls.h | 1 +
> > include/uapi/linux/tc_act/tc_gate.h | 47 ++
> > net/sched/Kconfig | 12 +
> > net/sched/Makefile | 1 +
> > net/sched/act_gate.c | 638 ++++++++++++++++++++++++++++
> > 6 files changed, 746 insertions(+)
> > create mode 100644 include/net/tc_act/tc_gate.h create mode 100644
> > include/uapi/linux/tc_act/tc_gate.h
> > create mode 100644 net/sched/act_gate.c
> >
> > diff --git a/include/net/tc_act/tc_gate.h
> > b/include/net/tc_act/tc_gate.h new file mode 100644 index
> > 000000000000..330ad8b02495
> > --- /dev/null
> > +++ b/include/net/tc_act/tc_gate.h
> > @@ -0,0 +1,47 @@
> > +/* SPDX-License-Identifier: GPL-2.0-or-later */
> > +/* Copyright 2020 NXP */
> > +
> > +#ifndef __NET_TC_GATE_H
> > +#define __NET_TC_GATE_H
> > +
> > +#include <net/act_api.h>
> > +#include <linux/tc_act/tc_gate.h>
> > +
> > +struct tcfg_gate_entry {
> > + int index;
> > + u8 gate_state;
> > + u32 interval;
> > + s32 ipv;
> > + s32 maxoctets;
> > + struct list_head list;
> > +};
> > +
> > +struct tcf_gate_params {
> > + s32 tcfg_priority;
> > + u64 tcfg_basetime;
> > + u64 tcfg_cycletime;
> > + u64 tcfg_cycletime_ext;
> > + u32 tcfg_flags;
> > + s32 tcfg_clockid;
> > + size_t num_entries;
> > + struct list_head entries;
> > +};
> > +
> > +#define GATE_ACT_GATE_OPEN BIT(0)
> > +#define GATE_ACT_PENDING BIT(1)
> > +
> > +struct tcf_gate {
> > + struct tc_action common;
> > + struct tcf_gate_params param;
> > + u8 current_gate_status;
> > + ktime_t current_close_time;
> > + u32 current_entry_octets;
> > + s32 current_max_octets;
> > + struct tcfg_gate_entry *next_entry;
> > + struct hrtimer hitimer;
> > + enum tk_offsets tk_offset;
> > +};
> > +
> > +#define to_gate(a) ((struct tcf_gate *)a)
> > +
> > +#endif
> > diff --git a/include/uapi/linux/pkt_cls.h
> > b/include/uapi/linux/pkt_cls.h index 9f06d29cab70..fc672b232437
> 100644
> > --- a/include/uapi/linux/pkt_cls.h
> > +++ b/include/uapi/linux/pkt_cls.h
> > @@ -134,6 +134,7 @@ enum tca_id {
> > TCA_ID_CTINFO,
> > TCA_ID_MPLS,
> > TCA_ID_CT,
> > + TCA_ID_GATE,
> > /* other actions go here */
> > __TCA_ID_MAX = 255
> > };
> > diff --git a/include/uapi/linux/tc_act/tc_gate.h
> > b/include/uapi/linux/tc_act/tc_gate.h
> > new file mode 100644
> > index 000000000000..f214b3a6d44f
> > --- /dev/null
> > +++ b/include/uapi/linux/tc_act/tc_gate.h
> > @@ -0,0 +1,47 @@
> > +/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
> > +/* Copyright 2020 NXP */
> > +
> > +#ifndef __LINUX_TC_GATE_H
> > +#define __LINUX_TC_GATE_H
> > +
> > +#include <linux/pkt_cls.h>
> > +
> > +struct tc_gate {
> > + tc_gen;
> > +};
> > +
> > +enum {
> > + TCA_GATE_ENTRY_UNSPEC,
> > + TCA_GATE_ENTRY_INDEX,
> > + TCA_GATE_ENTRY_GATE,
> > + TCA_GATE_ENTRY_INTERVAL,
> > + TCA_GATE_ENTRY_IPV,
> > + TCA_GATE_ENTRY_MAX_OCTETS,
> > + __TCA_GATE_ENTRY_MAX,
> > +};
> > +#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
> > +
> > +enum {
> > + TCA_GATE_ONE_ENTRY_UNSPEC,
> > + TCA_GATE_ONE_ENTRY,
> > + __TCA_GATE_ONE_ENTRY_MAX,
> > +};
> > +#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX
> - 1)
> > +
> > +enum {
> > + TCA_GATE_UNSPEC,
> > + TCA_GATE_TM,
> > + TCA_GATE_PARMS,
> > + TCA_GATE_PAD,
> > + TCA_GATE_PRIORITY,
> > + TCA_GATE_ENTRY_LIST,
> > + TCA_GATE_BASE_TIME,
> > + TCA_GATE_CYCLE_TIME,
> > + TCA_GATE_CYCLE_TIME_EXT,
> > + TCA_GATE_FLAGS,
> > + TCA_GATE_CLOCKID,
> > + __TCA_GATE_MAX,
> > +};
> > +#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
> > +
> > +#endif
> > diff --git a/net/sched/Kconfig b/net/sched/Kconfig index
> > bfbefb7bff9d..2f20073f4f84 100644
> > --- a/net/sched/Kconfig
> > +++ b/net/sched/Kconfig
> > @@ -981,6 +981,18 @@ config NET_ACT_CT
> > To compile this code as a module, choose M here: the
> > module will be called act_ct.
> >
> > +config NET_ACT_GATE
> > + tristate "Frame gate entry list control tc action"
> > + depends on NET_CLS_ACT
> > + help
> > + Say Y here to allow to control the ingress flow to be passed at
> > + specific time slot and be dropped at other specific time slot by
> > + the gate entry list.
> > +
> > + If unsure, say N.
> > + To compile this code as a module, choose M here: the
> > + module will be called act_gate.
> > +
> > config NET_IFE_SKBMARK
> > tristate "Support to encoding decoding skb mark on IFE action"
> > depends on NET_ACT_IFE
> > diff --git a/net/sched/Makefile b/net/sched/Makefile index
> > 31c367a6cd09..66bbf9a98f9e 100644
> > --- a/net/sched/Makefile
> > +++ b/net/sched/Makefile
> > @@ -30,6 +30,7 @@ obj-$(CONFIG_NET_IFE_SKBPRIO) +=
> act_meta_skbprio.o
> > obj-$(CONFIG_NET_IFE_SKBTCINDEX) += act_meta_skbtcindex.o
> > obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
> > obj-$(CONFIG_NET_ACT_CT) += act_ct.o
> > +obj-$(CONFIG_NET_ACT_GATE) += act_gate.o
> > obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
> > obj-$(CONFIG_NET_SCH_CBQ) += sch_cbq.o
> > obj-$(CONFIG_NET_SCH_HTB) += sch_htb.o
> > diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c new file mode
> > 100644 index 000000000000..916f6fe56692
> > --- /dev/null
> > +++ b/net/sched/act_gate.c
> > @@ -0,0 +1,638 @@
> > +// SPDX-License-Identifier: GPL-2.0-or-later
> > +/* Copyright 2020 NXP */
> > +
> > +#include <linux/module.h>
> > +#include <linux/types.h>
> > +#include <linux/kernel.h>
> > +#include <linux/string.h>
> > +#include <linux/errno.h>
> > +#include <linux/skbuff.h>
> > +#include <linux/rtnetlink.h>
> > +#include <linux/init.h>
> > +#include <linux/slab.h>
> > +#include <net/act_api.h>
> > +#include <net/netlink.h>
> > +#include <net/pkt_cls.h>
> > +#include <net/tc_act/tc_gate.h>
> > +
> > +static unsigned int gate_net_id;
> > +static struct tc_action_ops act_gate_ops;
> > +
> > +static ktime_t gate_get_time(struct tcf_gate *gact) {
> > + ktime_t mono = ktime_get();
> > +
> > + switch (gact->tk_offset) {
> > + case TK_OFFS_MAX:
> > + return mono;
> > + default:
> > + return ktime_mono_to_any(mono, gact->tk_offset);
> > + }
> > +
> > + return KTIME_MAX;
> > +}
> > +
> > +static int gate_get_start_time(struct tcf_gate *gact, ktime_t *start)
> > +{
> > + struct tcf_gate_params *param = &gact->param;
> > + ktime_t now, base, cycle;
> > + u64 n;
> > +
> > + base = ns_to_ktime(param->tcfg_basetime);
> > + now = gate_get_time(gact);
> > +
> > + if (ktime_after(base, now)) {
> > + *start = base;
> > + return 0;
> > + }
> > +
> > + cycle = param->tcfg_cycletime;
> > +
> > + /* cycle time should not be zero */
> > + if (!cycle)
> > + return -EFAULT;
> > +
> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > + *start = ktime_add_ns(base, (n + 1) * cycle);
> > + return 0;
> > +}
> > +
> > +static void gate_start_timer(struct tcf_gate *gact, ktime_t start) {
> > + ktime_t expires;
> > +
> > + expires = hrtimer_get_expires(&gact->hitimer);
> > + if (expires == 0)
> > + expires = KTIME_MAX;
> > +
> > + start = min_t(ktime_t, start, expires);
> > +
> > + hrtimer_start(&gact->hitimer, start, HRTIMER_MODE_ABS); }
> > +
> > +static enum hrtimer_restart gate_timer_func(struct hrtimer *timer) {
> > + struct tcf_gate *gact = container_of(timer, struct tcf_gate,
> > + hitimer);
> > + struct tcf_gate_params *p = &gact->param;
> > + struct tcfg_gate_entry *next;
> > + ktime_t close_time, now;
> > +
> > + spin_lock(&gact->tcf_lock);
> > +
> > + next = gact->next_entry;
> > +
> > + /* cycle start, clear pending bit, clear total octets */
> > + gact->current_gate_status = next->gate_state ?
> GATE_ACT_GATE_OPEN : 0;
> > + gact->current_entry_octets = 0;
> > + gact->current_max_octets = next->maxoctets;
> > +
> > + gact->current_close_time = ktime_add_ns(gact->current_close_time,
> > + next->interval);
> > +
> > + close_time = gact->current_close_time;
> > +
> > + if (list_is_last(&next->list, &p->entries))
> > + next = list_first_entry(&p->entries,
> > + struct tcfg_gate_entry, list);
> > + else
> > + next = list_next_entry(next, list);
> > +
> > + now = gate_get_time(gact);
> > +
> > + if (ktime_after(now, close_time)) {
> > + ktime_t cycle, base;
> > + u64 n;
> > +
> > + cycle = p->tcfg_cycletime;
> > + base = ns_to_ktime(p->tcfg_basetime);
> > + n = div64_u64(ktime_sub_ns(now, base), cycle);
> > + close_time = ktime_add_ns(base, (n + 1) * cycle);
> > + }
> > +
> > + gact->next_entry = next;
> > +
> > + hrtimer_set_expires(&gact->hitimer, close_time);
> > +
> > + spin_unlock(&gact->tcf_lock);
> > +
> > + return HRTIMER_RESTART;
> > +}
> > +
> > +static int tcf_gate_act(struct sk_buff *skb, const struct tc_action *a,
> > + struct tcf_result *res) {
> > + struct tcf_gate *gact = to_gate(a);
> > +
> > + spin_lock_bh(&gact->tcf_lock);
>
> I think you need to use regular spin_lock() function here because this
> function is called from softirq context.

Understood, I'll change it to spin_lock().

>
> > +
> > + tcf_lastuse_update(&gact->tcf_tm);
> > + bstats_update(&gact->tcf_bstats, skb);
> > +
> > + if (unlikely(gact->current_gate_status & GATE_ACT_PENDING)) {
> > + spin_unlock_bh(&gact->tcf_lock);
> > + return gact->tcf_action;
> > + }
> > +
> > + if (!(gact->current_gate_status & GATE_ACT_GATE_OPEN))
> > + goto drop;
> > +
> > + if (gact->current_max_octets >= 0) {
> > + gact->current_entry_octets += qdisc_pkt_len(skb);
> > + if (gact->current_entry_octets > gact->current_max_octets) {
> > + gact->tcf_qstats.overlimits++;
> > + goto drop;
> > + }
> > + }
> > +
> > + spin_unlock_bh(&gact->tcf_lock);
> > +
> > + return gact->tcf_action;
> > +drop:
> > + gact->tcf_qstats.drops++;
> > + spin_unlock_bh(&gact->tcf_lock);
> > +
> > + return TC_ACT_SHOT;
> > +}
> > +
> > +static const struct nla_policy entry_policy[TCA_GATE_ENTRY_MAX + 1]
> = {
> > + [TCA_GATE_ENTRY_INDEX] = { .type = NLA_U32 },
> > + [TCA_GATE_ENTRY_GATE] = { .type = NLA_FLAG },
> > + [TCA_GATE_ENTRY_INTERVAL] = { .type = NLA_U32 },
> > + [TCA_GATE_ENTRY_IPV] = { .type = NLA_S32 },
> > + [TCA_GATE_ENTRY_MAX_OCTETS] = { .type = NLA_S32 },
> > +};
> > +
> > +static const struct nla_policy gate_policy[TCA_GATE_MAX + 1] = {
> > + [TCA_GATE_PARMS] = { .len = sizeof(struct tc_gate),
> > + .type = NLA_EXACT_LEN },
> > + [TCA_GATE_PRIORITY] = { .type = NLA_S32 },
> > + [TCA_GATE_ENTRY_LIST] = { .type = NLA_NESTED },
> > + [TCA_GATE_BASE_TIME] = { .type = NLA_U64 },
> > + [TCA_GATE_CYCLE_TIME] = { .type = NLA_U64 },
> > + [TCA_GATE_CYCLE_TIME_EXT] = { .type = NLA_U64 },
> > + [TCA_GATE_FLAGS] = { .type = NLA_U32 },
> > + [TCA_GATE_CLOCKID] = { .type = NLA_S32 },
> > +};
> > +
> > +static int fill_gate_entry(struct nlattr **tb, struct tcfg_gate_entry *entry,
> > + struct netlink_ext_ack *extack) {
> > + u32 interval = 0;
> > +
> > + entry->gate_state = nla_get_flag(tb[TCA_GATE_ENTRY_GATE]);
> > +
> > + if (tb[TCA_GATE_ENTRY_INTERVAL])
> > + interval = nla_get_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
> > +
> > + if (interval == 0) {
> > + NL_SET_ERR_MSG(extack, "Invalid interval for schedule entry");
> > + return -EINVAL;
> > + }
> > +
> > + entry->interval = interval;
> > +
> > + if (tb[TCA_GATE_ENTRY_IPV])
> > + entry->ipv = nla_get_s32(tb[TCA_GATE_ENTRY_IPV]);
> > + else
> > + entry->ipv = -1;
> > +
> > + if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
> > + entry->maxoctets =
> nla_get_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
> > + else
> > + entry->maxoctets = -1;
> > +
> > + return 0;
> > +}
> > +
> > +static int parse_gate_entry(struct nlattr *n, struct tcfg_gate_entry
> *entry,
> > + int index, struct netlink_ext_ack *extack) {
> > + struct nlattr *tb[TCA_GATE_ENTRY_MAX + 1] = { };
> > + int err;
> > +
> > + err = nla_parse_nested(tb, TCA_GATE_ENTRY_MAX, n, entry_policy,
> extack);
> > + if (err < 0) {
> > + NL_SET_ERR_MSG(extack, "Could not parse nested entry");
> > + return -EINVAL;
> > + }
> > +
> > + entry->index = index;
> > +
> > + return fill_gate_entry(tb, entry, extack); }
> > +
> > +static void release_entry_list(struct list_head *entries) {
> > + struct tcfg_gate_entry *entry, *e;
> > +
> > + list_for_each_entry_safe(entry, e, entries, list) {
> > + list_del(&entry->list);
> > + kfree(entry);
> > + }
> > +}
> > +
> > +static int parse_gate_list(struct nlattr *list_attr,
> > + struct tcf_gate_params *sched,
> > + struct netlink_ext_ack *extack) {
> > + struct tcfg_gate_entry *entry;
> > + struct nlattr *n;
> > + int err, rem;
> > + int i = 0;
> > +
> > + if (!list_attr)
> > + return -EINVAL;
> > +
> > + nla_for_each_nested(n, list_attr, rem) {
> > + if (nla_type(n) != TCA_GATE_ONE_ENTRY) {
> > + NL_SET_ERR_MSG(extack, "Attribute isn't type 'entry'");
> > + continue;
> > + }
> > +
> > + entry = kzalloc(sizeof(*entry), GFP_KERNEL);
> > + if (!entry) {
> > + NL_SET_ERR_MSG(extack, "Not enough memory for entry");
> > + err = -ENOMEM;
> > + goto release_list;
> > + }
> > +
> > + err = parse_gate_entry(n, entry, i, extack);
> > + if (err < 0) {
> > + kfree(entry);
> > + goto release_list;
> > + }
> > +
> > + list_add_tail(&entry->list, &sched->entries);
> > + i++;
> > + }
> > +
> > + sched->num_entries = i;
> > +
> > + return i;
> > +
> > +release_list:
> > + release_entry_list(&sched->entries);
> > +
> > + return err;
> > +}
> > +
> > +static int tcf_gate_init(struct net *net, struct nlattr *nla,
> > + struct nlattr *est, struct tc_action **a,
> > + int ovr, int bind, bool rtnl_held,
> > + struct tcf_proto *tp, u32 flags,
> > + struct netlink_ext_ack *extack) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > + enum tk_offsets tk_offset = TK_OFFS_TAI;
> > + struct nlattr *tb[TCA_GATE_MAX + 1];
> > + struct tcf_chain *goto_ch = NULL;
> > + struct tcf_gate_params *p;
> > + s32 clockid = CLOCK_TAI;
> > + struct tcf_gate *gact;
> > + struct tc_gate *parm;
> > + int ret = 0, err;
> > + u64 basetime = 0;
> > + u32 gflags = 0;
> > + s32 prio = -1;
> > + ktime_t start;
> > + u32 index;
> > +
> > + if (!nla)
> > + return -EINVAL;
> > +
> > + err = nla_parse_nested(tb, TCA_GATE_MAX, nla, gate_policy, extack);
> > + if (err < 0)
> > + return err;
> > +
> > + if (!tb[TCA_GATE_PARMS])
> > + return -EINVAL;
> > +
> > + parm = nla_data(tb[TCA_GATE_PARMS]);
> > + index = parm->index;
> > +
> > + err = tcf_idr_check_alloc(tn, &index, a, bind);
> > + if (err < 0)
> > + return err;
> > +
> > + if (err && bind)
> > + return 0;
> > +
> > + if (!err) {
> > + ret = tcf_idr_create(tn, index, est, a,
> > + &act_gate_ops, bind, false, 0);
> > + if (ret) {
> > + tcf_idr_cleanup(tn, index);
> > + return ret;
> > + }
> > +
> > + ret = ACT_P_CREATED;
> > + } else if (!ovr) {
> > + tcf_idr_release(*a, bind);
> > + return -EEXIST;
> > + }
> > +
> > + if (tb[TCA_GATE_PRIORITY])
> > + prio = nla_get_s32(tb[TCA_GATE_PRIORITY]);
> > +
> > + if (tb[TCA_GATE_BASE_TIME])
> > + basetime = nla_get_u64(tb[TCA_GATE_BASE_TIME]);
> > +
> > + if (tb[TCA_GATE_FLAGS])
> > + gflags = nla_get_u32(tb[TCA_GATE_FLAGS]);
> > +
> > + if (tb[TCA_GATE_CLOCKID]) {
> > + clockid = nla_get_s32(tb[TCA_GATE_CLOCKID]);
> > + switch (clockid) {
> > + case CLOCK_REALTIME:
> > + tk_offset = TK_OFFS_REAL;
> > + break;
> > + case CLOCK_MONOTONIC:
> > + tk_offset = TK_OFFS_MAX;
> > + break;
> > + case CLOCK_BOOTTIME:
> > + tk_offset = TK_OFFS_BOOT;
> > + break;
> > + case CLOCK_TAI:
> > + tk_offset = TK_OFFS_TAI;
> > + break;
> > + default:
> > + NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
> > + goto release_idr;
> > + }
> > + }
> > +
> > + err = tcf_action_check_ctrlact(parm->action, tp, &goto_ch, extack);
> > + if (err < 0)
> > + goto release_idr;
> > +
> > + gact = to_gate(*a);
> > +
> > + spin_lock_bh(&gact->tcf_lock);
> > + p = &gact->param;
> > +
> > + if (tb[TCA_GATE_CYCLE_TIME]) {
> > + p->tcfg_cycletime = nla_get_u64(tb[TCA_GATE_CYCLE_TIME]);
> > + if (!p->tcfg_cycletime_ext)
> > + goto chain_put;
> > + }
> > +
> > + INIT_LIST_HEAD(&p->entries);
> > + if (tb[TCA_GATE_ENTRY_LIST]) {
> > + err = parse_gate_list(tb[TCA_GATE_ENTRY_LIST], p, extack);
> > + if (err < 0)
> > + goto chain_put;
> > + }
> > +
> > + if (!p->tcfg_cycletime) {
> > + struct tcfg_gate_entry *entry;
> > + ktime_t cycle = 0;
> > +
> > + list_for_each_entry(entry, &p->entries, list)
> > + cycle = ktime_add_ns(cycle, entry->interval);
> > + p->tcfg_cycletime = cycle;
> > + }
> > +
> > + if (tb[TCA_GATE_CYCLE_TIME_EXT])
> > + p->tcfg_cycletime_ext =
> > + nla_get_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
> > +
> > + p->tcfg_priority = prio;
> > + p->tcfg_basetime = basetime;
> > + p->tcfg_clockid = clockid;
> > + p->tcfg_flags = gflags;
> > +
> > + gact->tk_offset = tk_offset;
> > + hrtimer_init(&gact->hitimer, clockid, HRTIMER_MODE_ABS);
> > + gact->hitimer.function = gate_timer_func;
> > +
> > + err = gate_get_start_time(gact, &start);
> > + if (err < 0) {
> > + NL_SET_ERR_MSG(extack,
> > + "Internal error: failed get start time");
> > + release_entry_list(&p->entries);
> > + goto chain_put;
> > + }
> > +
> > + gact->current_close_time = start;
> > + gact->current_gate_status = GATE_ACT_GATE_OPEN |
> > + GATE_ACT_PENDING;
> > +
> > + gact->next_entry = list_first_entry(&p->entries,
> > + struct tcfg_gate_entry,
> > + list);
> > +
> > + goto_ch = tcf_action_set_ctrlact(*a, parm->action, goto_ch);
> > +
> > + gate_start_timer(gact, start);
> > +
> > + spin_unlock_bh(&gact->tcf_lock);
> > +
> > + if (goto_ch)
> > + tcf_chain_put_by_act(goto_ch);
> > +
> > + if (ret == ACT_P_CREATED)
> > + tcf_idr_insert(tn, *a);
> > + return ret;
> > +
> > +chain_put:
> > + spin_unlock_bh(&gact->tcf_lock);
> > +
> > + if (goto_ch)
> > + tcf_chain_put_by_act(goto_ch);
> > +release_idr:
> > + tcf_idr_release(*a, bind);
> > + return err;
> > +}
> > +
> > +static void tcf_gate_cleanup(struct tc_action *a) {
> > + struct tcf_gate *gact = to_gate(a);
> > + struct tcf_gate_params *p;
> > +
> > + spin_lock_bh(&gact->tcf_lock);
>
> No need to lock in tc_action_ops->cleanup() callbacks.

I'll remove the lock in cleanup callback. Thanks!

>
> > +
> > + hrtimer_cancel(&gact->hitimer);
> > +
> > + p = &gact->param;
> > + release_entry_list(&p->entries);
> > +
> > + spin_unlock_bh(&gact->tcf_lock); }
> > +
> > +static int dumping_entry(struct sk_buff *skb,
> > + struct tcfg_gate_entry *entry) {
> > + struct nlattr *item;
> > +
> > + item = nla_nest_start_noflag(skb, TCA_GATE_ONE_ENTRY);
> > + if (!item)
> > + return -ENOSPC;
> > +
> > + if (nla_put_u32(skb, TCA_GATE_ENTRY_INDEX, entry->index))
> > + goto nla_put_failure;
> > +
> > + if (entry->gate_state && nla_put_flag(skb, TCA_GATE_ENTRY_GATE))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u32(skb, TCA_GATE_ENTRY_INTERVAL, entry->interval))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_ENTRY_MAX_OCTETS, entry->maxoctets))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_ENTRY_IPV, entry->ipv))
> > + goto nla_put_failure;
> > +
> > + return nla_nest_end(skb, item);
> > +
> > +nla_put_failure:
> > + nla_nest_cancel(skb, item);
> > + return -1;
> > +}
> > +
> > +static int tcf_gate_dump(struct sk_buff *skb, struct tc_action *a,
> > + int bind, int ref) {
> > + unsigned char *b = skb_tail_pointer(skb);
> > + struct tcf_gate *gact = to_gate(a);
> > + struct tc_gate opt = {
> > + .index = gact->tcf_index,
> > + .refcnt = refcount_read(&gact->tcf_refcnt) - ref,
> > + .bindcnt = atomic_read(&gact->tcf_bindcnt) - bind,
> > + };
> > + struct tcfg_gate_entry *entry;
> > + struct tcf_gate_params *p;
> > + struct nlattr *entry_list;
> > + struct tcf_t t;
> > +
> > + spin_lock_bh(&gact->tcf_lock);
> > + opt.action = gact->tcf_action;
> > +
> > + p = &gact->param;
> > +
> > + if (nla_put(skb, TCA_GATE_PARMS, sizeof(opt), &opt))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u64_64bit(skb, TCA_GATE_BASE_TIME,
> > + p->tcfg_basetime, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME,
> > + p->tcfg_cycletime, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u64_64bit(skb, TCA_GATE_CYCLE_TIME_EXT,
> > + p->tcfg_cycletime_ext, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_CLOCKID, p->tcfg_clockid))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_u32(skb, TCA_GATE_FLAGS, p->tcfg_flags))
> > + goto nla_put_failure;
> > +
> > + if (nla_put_s32(skb, TCA_GATE_PRIORITY, p->tcfg_priority))
> > + goto nla_put_failure;
> > +
> > + entry_list = nla_nest_start_noflag(skb, TCA_GATE_ENTRY_LIST);
> > + if (!entry_list)
> > + goto nla_put_failure;
> > +
> > + list_for_each_entry(entry, &p->entries, list) {
> > + if (dumping_entry(skb, entry) < 0)
> > + goto nla_put_failure;
> > + }
> > +
> > + nla_nest_end(skb, entry_list);
> > +
> > + tcf_tm_dump(&t, &gact->tcf_tm);
> > + if (nla_put_64bit(skb, TCA_GATE_TM, sizeof(t), &t, TCA_GATE_PAD))
> > + goto nla_put_failure;
> > + spin_unlock_bh(&gact->tcf_lock);
> > +
> > + return skb->len;
> > +
> > +nla_put_failure:
> > + spin_unlock_bh(&gact->tcf_lock);
> > + nlmsg_trim(skb, b);
> > + return -1;
> > +}
> > +
> > +static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
> > + struct netlink_callback *cb, int type,
> > + const struct tc_action_ops *ops,
> > + struct netlink_ext_ack *extack) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > +
> > + return tcf_generic_walker(tn, skb, cb, type, ops, extack); }
> > +
> > +static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
> > + u64 lastuse, bool hw) {
> > + struct tcf_gate *gact = to_gate(a);
> > + struct tcf_t *tm = &gact->tcf_tm;
> > +
> > + tcf_action_update_stats(a, bytes, packets, false, hw);
> > + tm->lastuse = max_t(u64, tm->lastuse, lastuse); }
> > +
> > +static int tcf_gate_search(struct net *net, struct tc_action **a, u32
> > +index) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > +
> > + return tcf_idr_search(tn, a, index); }
> > +
> > +static size_t tcf_gate_get_fill_size(const struct tc_action *act) {
> > + return nla_total_size(sizeof(struct tc_gate)); }
> > +
> > +static struct tc_action_ops act_gate_ops = {
> > + .kind = "gate",
> > + .id = TCA_ID_GATE,
> > + .owner = THIS_MODULE,
> > + .act = tcf_gate_act,
> > + .dump = tcf_gate_dump,
> > + .init = tcf_gate_init,
> > + .cleanup = tcf_gate_cleanup,
> > + .walk = tcf_gate_walker,
> > + .stats_update = tcf_gate_stats_update,
> > + .get_fill_size = tcf_gate_get_fill_size,
> > + .lookup = tcf_gate_search,
> > + .size = sizeof(struct tcf_gate),
> > +};
> > +
> > +static __net_init int gate_init_net(struct net *net) {
> > + struct tc_action_net *tn = net_generic(net, gate_net_id);
> > +
> > + return tc_action_net_init(net, tn, &act_gate_ops); }
> > +
> > +static void __net_exit gate_exit_net(struct list_head *net_list) {
> > + tc_action_net_exit(net_list, gate_net_id); }
> > +
> > +static struct pernet_operations gate_net_ops = {
> > + .init = gate_init_net,
> > + .exit_batch = gate_exit_net,
> > + .id = &gate_net_id,
> > + .size = sizeof(struct tc_action_net), };
> > +
> > +static int __init gate_init_module(void) {
> > + return tcf_register_action(&act_gate_ops, &gate_net_ops); }
> > +
> > +static void __exit gate_cleanup_module(void) {
> > + tcf_unregister_action(&act_gate_ops, &gate_net_ops); }
> > +
> > +module_init(gate_init_module);
> > +module_exit(gate_cleanup_module);
> > +MODULE_LICENSE("GPL v2");

Thanks a lot.

Br,
Po Liu

2020-04-30 07:46:37

by Po Liu

[permalink] [raw]
Subject: RE: Re: [v4,net-next 2/4] net: schedule: add action gate offloading

Hi Vlad,



> -----Original Message-----
> From: Vlad Buslov <[email protected]>
> Sent: April 30, 2020 1:41
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>;
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v4,net-next 2/4] net: schedule: add action gate
> offloading
>
>
> On Tue 28 Apr 2020 at 06:34, Po Liu <[email protected]> wrote:
> > Add the gate action to the flow action entry. Add the gate parameters
> > to the tc_setup_flow_action() queueing to the entries of
> > flow_action_entry array provide to the driver.
> >
> > Signed-off-by: Po Liu <[email protected]>
> > ---
> > include/net/flow_offload.h | 10 ++++
> > include/net/tc_act/tc_gate.h | 113
> +++++++++++++++++++++++++++++++++++
> > net/sched/cls_api.c | 33 ++++++++++
> > 3 files changed, 156 insertions(+)
> >
> > diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> > index 3619c6acf60f..94a30fe02e6d 100644
> > --- a/include/net/flow_offload.h
> > +++ b/include/net/flow_offload.h
> > @@ -147,6 +147,7 @@ enum flow_action_id {
> > FLOW_ACTION_MPLS_PUSH,
> > FLOW_ACTION_MPLS_POP,
> > FLOW_ACTION_MPLS_MANGLE,
> > + FLOW_ACTION_GATE,
> > NUM_FLOW_ACTIONS,
> > };
> >
> > @@ -255,6 +256,15 @@ struct flow_action_entry {
> > u8 bos;
> > u8 ttl;
> > } mpls_mangle;
> > + struct {
> > + u32 index;
> > + s32 prio;
> > + u64 basetime;
> > + u64 cycletime;
> > + u64 cycletimeext;
> > + u32 num_entries;
> > + struct action_gate_entry *entries;
> > + } gate;
> > };
> > 	struct flow_action_cookie *cookie; /* user defined action cookie */
> > };
> > diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
> > index 330ad8b02495..9e698c7d64cd 100644
> > --- a/include/net/tc_act/tc_gate.h
> > +++ b/include/net/tc_act/tc_gate.h
> > @@ -7,6 +7,13 @@
> > #include <net/act_api.h>
> > #include <linux/tc_act/tc_gate.h>
> >
> > +struct action_gate_entry {
> > + u8 gate_state;
> > + u32 interval;
> > + s32 ipv;
> > + s32 maxoctets;
> > +};
> > +
> > struct tcfg_gate_entry {
> > int index;
> > u8 gate_state;
> > @@ -44,4 +51,110 @@ struct tcf_gate {
> >
> > #define to_gate(a) ((struct tcf_gate *)a)
> >
> > +static inline bool is_tcf_gate(const struct tc_action *a)
> > +{
> > +#ifdef CONFIG_NET_CLS_ACT
> > + if (a->ops && a->ops->id == TCA_ID_GATE)
> > + return true;
> > +#endif
> > + return false;
> > +}
> > +
> > +static inline u32 tcf_gate_index(const struct tc_action *a) {
> > + return a->tcfa_index;
> > +}
> > +
> > +static inline s32 tcf_gate_prio(const struct tc_action *a) {
> > + s32 tcfg_prio;
> > +
> > + rcu_read_lock();
>
> This action no longer uses rcu, so you don't need protect with
> rcu_read_lock() in all these helpers.

I will remove all the rcu_read_lock() calls from these helpers in this patch.

>
> > + tcfg_prio = to_gate(a)->param.tcfg_priority;
> > + rcu_read_unlock();
> > +
> > + return tcfg_prio;
> > +}
> > +
> > +static inline u64 tcf_gate_basetime(const struct tc_action *a) {
> > + u64 tcfg_basetime;
> > +
> > + rcu_read_lock();
> > + tcfg_basetime = to_gate(a)->param.tcfg_basetime;
> > + rcu_read_unlock();
> > +
> > + return tcfg_basetime;
> > +}
> > +
> > +static inline u64 tcf_gate_cycletime(const struct tc_action *a) {
> > + u64 tcfg_cycletime;
> > +
> > + rcu_read_lock();
> > + tcfg_cycletime = to_gate(a)->param.tcfg_cycletime;
> > + rcu_read_unlock();
> > +
> > + return tcfg_cycletime;
> > +}
> > +
> > +static inline u64 tcf_gate_cycletimeext(const struct tc_action *a) {
> > + u64 tcfg_cycletimeext;
> > +
> > + rcu_read_lock();
> > + tcfg_cycletimeext = to_gate(a)->param.tcfg_cycletime_ext;
> > + rcu_read_unlock();
> > +
> > + return tcfg_cycletimeext;
> > +}
> > +
> > +static inline u32 tcf_gate_num_entries(const struct tc_action *a) {
> > + u32 num_entries;
> > +
> > + rcu_read_lock();
> > + num_entries = to_gate(a)->param.num_entries;
> > + rcu_read_unlock();
> > +
> > + return num_entries;
> > +}
> > +
> > +static inline struct action_gate_entry
> > + *tcf_gate_get_list(const struct tc_action *a) {
> > + struct action_gate_entry *oe;
> > + struct tcf_gate_params *p;
> > + struct tcfg_gate_entry *entry;
> > + u32 num_entries;
> > + int i = 0;
> > +
> > + rcu_read_lock();
> > +
> > + p = &to_gate(a)->param;
> > + num_entries = p->num_entries;
> > +
> > + list_for_each_entry(entry, &p->entries, list)
> > + i++;
> > +
> > + if (i != num_entries)
> > + return NULL;
> > +
> > + oe = kzalloc(sizeof(*oe) * num_entries, GFP_KERNEL);
>
> Can't allocate with GFP_KERNEL flag in rcu read blocks, but you don't need
> the rcu read lock here anyway. However, tc_setup_flow_action() calls this
> function while holding tcfa_lock spinlock, which also precludes allocating
> memory with that flag. You can test for such problems by enabling
> CONFIG_DEBUG_ATOMIC_SLEEP. To help uncover such errors all new act

Thanks a lot. I have enabled this config for debugging. I will use the GFP_ATOMIC flag to avoid a sleeping allocation, and switch to kcalloc() for the array.
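
The allocation change discussed here can be sketched in userspace as follows. This is illustrative only: kcalloc() and GFP_ATOMIC are kernel APIs, so the sketch models the overflow check with plain calloc(), and the helper name is made up.

```c
#include <stdint.h>
#include <stdlib.h>

/* Userspace model of why kcalloc() is preferred over kzalloc(n * size):
 * the num * size multiplication is checked for overflow before the
 * allocation happens.  In the kernel, the GFP_ATOMIC flag additionally
 * makes the allocation legal under a held spinlock, which calloc()
 * cannot model. */
static void *alloc_entry_array(size_t num_entries, size_t entry_size)
{
	if (entry_size && num_entries > SIZE_MAX / entry_size)
		return NULL; /* num_entries * entry_size would overflow */
	return calloc(num_entries, entry_size);
}
```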

> APIs and action implementations are usually accompanied by tdc tests. If
> you chose to implement such tests you can look at 6e52fca36c67 ("tc-tests:
> Add tc action ct tests") for recent example.

I will look into the tests. Thanks!

>
> > + if (!oe)
> > + return NULL;
>
> This returns without releasing rcu read lock, but as I said before you
> probably don't need rcu protection here anyway.

Thanks for the reminder, that is helpful.

>
> > +
> > + i = 0;
> > + list_for_each_entry(entry, &p->entries, list) {
> > + oe[i].gate_state = entry->gate_state;
> > + oe[i].interval = entry->interval;
> > + oe[i].ipv = entry->ipv;
> > + oe[i].maxoctets = entry->maxoctets;
> > + i++;
> > + }
> > +
> > + rcu_read_unlock();
> > +
> > + return oe;
> > +}
> > #endif
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> > index 11b683c45c28..7e85c91d0752 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -39,6 +39,7 @@
> > #include <net/tc_act/tc_skbedit.h>
> > #include <net/tc_act/tc_ct.h>
> > #include <net/tc_act/tc_mpls.h>
> > +#include <net/tc_act/tc_gate.h>
> > #include <net/flow_offload.h>
> >
> > extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
> > @@ -3526,6 +3527,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
> > #endif
> > }
> >
> > +static void tcf_gate_entry_destructor(void *priv) {
> > + struct action_gate_entry *oe = priv;
> > +
> > + kfree(oe);
> > +}
> > +
> > +static int tcf_gate_get_entries(struct flow_action_entry *entry,
> > + const struct tc_action *act) {
> > + entry->gate.entries = tcf_gate_get_list(act);
> > +
> > + if (!entry->gate.entries)
> > + return -EINVAL;
> > +
> > + entry->destructor = tcf_gate_entry_destructor;
> > + entry->destructor_priv = entry->gate.entries;
> > +
> > + return 0;
> > +}
> > +
> > int tc_setup_flow_action(struct flow_action *flow_action,
> > 			 const struct tcf_exts *exts)
> > {
> > @@ -3672,6 +3694,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> > } else if (is_tcf_skbedit_priority(act)) {
> > entry->id = FLOW_ACTION_PRIORITY;
> > entry->priority = tcf_skbedit_priority(act);
> > + } else if (is_tcf_gate(act)) {
> > + entry->id = FLOW_ACTION_GATE;
> > + entry->gate.index = tcf_gate_index(act);
> > + entry->gate.prio = tcf_gate_prio(act);
> > + entry->gate.basetime = tcf_gate_basetime(act);
> > + entry->gate.cycletime = tcf_gate_cycletime(act);
> > + entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
> > + entry->gate.num_entries = tcf_gate_num_entries(act);
> > + err = tcf_gate_get_entries(entry, act);
> > + if (err)
> > + goto err_out;
> > } else {
> > err = -EOPNOTSUPP;
> > goto err_out_locked;

Thanks a lot.

Br,
Po Liu

2020-05-01 01:16:39

by Po Liu

[permalink] [raw]
Subject: [v5,net-next 0/4] Introduce a flow gate control action and apply IEEE

Changes from V4:
----------------
0001:
Fixes and modifications according to Vlad Buslov's suggestions:
- Change spin_lock_bh() to spin_lock() since tcf_gate_act() already runs
in softirq context.
- Remove the spin lock protection in the ops->cleanup function.
- Enable the CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING checks,
then fix the kzalloc flag type and a lock deadlock.
- Change the kzalloc() flag type from GFP_KERNEL to GFP_ATOMIC since the
function runs under spin_lock protection.
- Change the hrtimer type from HRTIMER_MODE_ABS to HRTIMER_MODE_ABS_SOFT
to avoid a deadlock.

0002:
Fixes and modifications according to Vlad Buslov's suggestions:
- Remove all rcu_read_lock() protection since there are no rcu parameters.
- Enable the CONFIG_DEBUG_ATOMIC_SLEEP and CONFIG_PROVE_LOCKING checks,
then check the kzalloc sleeping flag.
- Change kzalloc to kcalloc for the array memory allocation, and change
the GFP_KERNEL flag to GFP_ATOMIC since the function holds a spin_lock.

0003:
- No changes.

0004:
- Commit comments rephrase act by Claudiu Manoil.

Changes from V3:
----------------
0001:

Fixes and modifications according to Vlad Buslov's suggestions:
- Remove the struct gate_action and move its parameters into struct
tcf_gate, aligned with the tc_action parameters. This avoids allocating
rcu-managed memory through a pointer.
- Remove the spin_lock entry_lock, which is no longer needed; use the
provided tcf_lock instead.
- Provide lockdep protection for the status parameters in tcf_gate_act().
- Remove the warning on a cycletime input of 0; return an error directly.

And:
- Remove Qci related description in the Kconfig for gate action.

0002:
- Fix the rcu_read_lock() protection range, suggested by Vlad Buslov.

0003:
- No changes.

0004:
- Fix a bug where the gate maxoctets wildcard condition was not handled.
- Fix the past-time basetime calculation, reported by Vladimir Oltean.

Changes from V2:
0001: No changes.
0002: No changes.
0003: No changes.
0004: Fix the vlan id filter parameter, and reject the src mac
FF-FF-FF-FF-FF-FF filter in the driver.

Changes from V1:
----------------
0000: Update the description to make it clearer.
0001: Removed the 'add update dropped stats' patch; it will be provided
as standalone patches in a separate pull request.
0001: Update the commit description to make it clearer, acked by Jiri
Pirko.
0002: No changes.
0003: Fix some code style issues, acked by Jiri Pirko.
0004: Fix the enetc_psfp_enable/disable parameter type, reported by the
kernel test robot.

iproute2 command patches:
Not attached to this patch series; a separate pull request will be
provided after the kernel accepts these patches.

Changes from RFC:
-----------------
0000: Reduced to 5 patches; removed the max frame size offload and flow
metering from the policing offload action, keeping only the gate action
offloading implementation.
0001: No changes.
0002:
- Fix a kfree leak, acked by Jakub Kicinski and Cong Wang
- License fix from Jakub Kicinski and Stephen Hemminger
- Update the example in the commit message, acked by Vinicius Costa Gomes
- Fix the rcu protection in tcf_gate_act(), acked by Vinicius

0003: No changes
0004: No changes
0005:
Acked by Vinicius Costa Gomes
- Use the kernel refcount library
- Move the stream gate check code
- Rename refs to be clearer

iproute2 command patches:
0000: Update license expression and add gate id
0001: Add tc action gate man page

--------------------------------------------------------------------
These patches add software support and tc flower hardware offload support
for stream gate action policing as defined in IEEE 802.1Qci (Per-Stream
Filtering and Policing), and implement the stream identification, stream
filtering and stream gate filtering actions in the NXP ENETC ethernet
driver.
Per-Stream Filtering and Policing (PSFP) specifies flow policing and
filtering for ingress flows, and has three main parts:
1. The stream filter instance table consists of an ordered list of
stream filters that determine the filtering and policing actions to be
applied to frames received on a specific stream. The main elements are
the stream gate id, flow metering id and maximum SDU size.
2. The stream gate function sets up a gate list to control the ingress
traffic class open/close state. While a gate is in the open state the
flow may pass, but frames are dropped while the gate is in the closed
state. The user sets a basetime to tell the gate when to start running
the entry list, which the hardware then repeats periodically. There is
no comparable qdisc action support today.
3. Flow metering uses a two-rate, two-bucket, three-color marker to
police frames. Flow metering instances are as specified by the algorithm
in MEF 10.3. The closest qdisc action is the police action.

The first patch introduces an ingress frame flow control gate action,
implementing point 2. The tc gate action maintains the gate list of
open/close states, allowing flows to pass while the gate is open. Each
gate action may police one or more qdisc filters. Once the start time
arrives, the driver repeats the gate list periodically. The user can
supply a base time that has already passed; the driver then calculates
a new future start time from the cycle time of the gate list.
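
The future start-time calculation described above can be sketched like this (a simplified model working in plain nanosecond counters; the helper name is illustrative, not the kernel's):

```c
#include <stdint.h>

/* If the configured basetime lies in the past, advance it by whole
 * cycles so the gate list starts at the next cycle boundary in the
 * future.  A basetime still ahead of 'now' is used as-is. */
static uint64_t gate_next_start(uint64_t basetime, uint64_t cycletime,
				uint64_t now)
{
	uint64_t n;

	if (basetime > now)
		return basetime;

	/* number of complete cycles already elapsed, plus one */
	n = (now - basetime) / cycletime + 1;
	return basetime + n * cycletime;
}
```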

The 0002 patch introduces the gate flow hardware offloading.

The 0003 patch adds support for controlling the on/off state of the tc
flower offloading via ethtool.

The 0004 patch implements the stream identification, stream filtering
and stream gate filtering actions in the NXP ENETC ethernet driver. The
tc filter command provides filtering keys with the MAC address and VLAN
id. These keys are set in the stream identification instance entry. The
stream gate instance entry refers to the gate action parameters. The
stream filter instance entry refers to the stream gate index and is
assigned a stream handle value matching the stream identification
instance.

Po Liu (4):
net: qos: introduce a gate control flow action
net: schedule: add action gate offloading
net: enetc: add hw tc hw offload features for PSPF capability
net: enetc: add tc flower psfp offload driver

drivers/net/ethernet/freescale/enetc/enetc.c | 34 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 86 ++
.../net/ethernet/freescale/enetc/enetc_hw.h | 159 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 6 +
.../net/ethernet/freescale/enetc/enetc_qos.c | 1098 +++++++++++++++++
include/net/flow_offload.h | 10 +
include/net/tc_act/tc_gate.h | 146 +++
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +
net/sched/Kconfig | 12 +
net/sched/Makefile | 1 +
net/sched/act_gate.c | 636 ++++++++++
net/sched/cls_api.c | 33 +
13 files changed, 2268 insertions(+), 1 deletion(-)
create mode 100644 include/net/tc_act/tc_gate.h
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 net/sched/act_gate.c

--
2.17.1

2020-05-01 01:16:51

by Po Liu

[permalink] [raw]
Subject: [v5,net-next 2/4] net: schedule: add action gate offloading

Add the gate action to the flow action entry. Add the gate parameters to
the tc_setup_flow_action() queueing to the entries of flow_action_entry
array provide to the driver.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 10 ++++
include/net/tc_act/tc_gate.h | 99 ++++++++++++++++++++++++++++++++++++
net/sched/cls_api.c | 33 ++++++++++++
3 files changed, 142 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 3619c6acf60f..94a30fe02e6d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -147,6 +147,7 @@ enum flow_action_id {
FLOW_ACTION_MPLS_PUSH,
FLOW_ACTION_MPLS_POP,
FLOW_ACTION_MPLS_MANGLE,
+ FLOW_ACTION_GATE,
NUM_FLOW_ACTIONS,
};

@@ -255,6 +256,15 @@ struct flow_action_entry {
u8 bos;
u8 ttl;
} mpls_mangle;
+ struct {
+ u32 index;
+ s32 prio;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimeext;
+ u32 num_entries;
+ struct action_gate_entry *entries;
+ } gate;
};
struct flow_action_cookie *cookie; /* user defined action cookie */
};
diff --git a/include/net/tc_act/tc_gate.h b/include/net/tc_act/tc_gate.h
index 330ad8b02495..8bc6be81a7ad 100644
--- a/include/net/tc_act/tc_gate.h
+++ b/include/net/tc_act/tc_gate.h
@@ -7,6 +7,13 @@
#include <net/act_api.h>
#include <linux/tc_act/tc_gate.h>

+struct action_gate_entry {
+ u8 gate_state;
+ u32 interval;
+ s32 ipv;
+ s32 maxoctets;
+};
+
struct tcfg_gate_entry {
int index;
u8 gate_state;
@@ -44,4 +51,96 @@ struct tcf_gate {

#define to_gate(a) ((struct tcf_gate *)a)

+static inline bool is_tcf_gate(const struct tc_action *a)
+{
+#ifdef CONFIG_NET_CLS_ACT
+ if (a->ops && a->ops->id == TCA_ID_GATE)
+ return true;
+#endif
+ return false;
+}
+
+static inline u32 tcf_gate_index(const struct tc_action *a)
+{
+ return a->tcfa_index;
+}
+
+static inline s32 tcf_gate_prio(const struct tc_action *a)
+{
+ s32 tcfg_prio;
+
+ tcfg_prio = to_gate(a)->param.tcfg_priority;
+
+ return tcfg_prio;
+}
+
+static inline u64 tcf_gate_basetime(const struct tc_action *a)
+{
+ u64 tcfg_basetime;
+
+ tcfg_basetime = to_gate(a)->param.tcfg_basetime;
+
+ return tcfg_basetime;
+}
+
+static inline u64 tcf_gate_cycletime(const struct tc_action *a)
+{
+ u64 tcfg_cycletime;
+
+ tcfg_cycletime = to_gate(a)->param.tcfg_cycletime;
+
+ return tcfg_cycletime;
+}
+
+static inline u64 tcf_gate_cycletimeext(const struct tc_action *a)
+{
+ u64 tcfg_cycletimeext;
+
+ tcfg_cycletimeext = to_gate(a)->param.tcfg_cycletime_ext;
+
+ return tcfg_cycletimeext;
+}
+
+static inline u32 tcf_gate_num_entries(const struct tc_action *a)
+{
+ u32 num_entries;
+
+ num_entries = to_gate(a)->param.num_entries;
+
+ return num_entries;
+}
+
+static inline struct action_gate_entry
+ *tcf_gate_get_list(const struct tc_action *a)
+{
+ struct action_gate_entry *oe;
+ struct tcf_gate_params *p;
+ struct tcfg_gate_entry *entry;
+ u32 num_entries;
+ int i = 0;
+
+ p = &to_gate(a)->param;
+ num_entries = p->num_entries;
+
+ list_for_each_entry(entry, &p->entries, list)
+ i++;
+
+ if (i != num_entries)
+ return NULL;
+
+ oe = kcalloc(num_entries, sizeof(*oe), GFP_ATOMIC);
+ if (!oe)
+ return NULL;
+
+ i = 0;
+ list_for_each_entry(entry, &p->entries, list) {
+ oe[i].gate_state = entry->gate_state;
+ oe[i].interval = entry->interval;
+ oe[i].ipv = entry->ipv;
+ oe[i].maxoctets = entry->maxoctets;
+ i++;
+ }
+
+ return oe;
+}
#endif
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 11b683c45c28..7e85c91d0752 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -39,6 +39,7 @@
#include <net/tc_act/tc_skbedit.h>
#include <net/tc_act/tc_ct.h>
#include <net/tc_act/tc_mpls.h>
+#include <net/tc_act/tc_gate.h>
#include <net/flow_offload.h>

extern const struct nla_policy rtm_tca_policy[TCA_MAX + 1];
@@ -3526,6 +3527,27 @@ static void tcf_sample_get_group(struct flow_action_entry *entry,
#endif
}

+static void tcf_gate_entry_destructor(void *priv)
+{
+ struct action_gate_entry *oe = priv;
+
+ kfree(oe);
+}
+
+static int tcf_gate_get_entries(struct flow_action_entry *entry,
+ const struct tc_action *act)
+{
+ entry->gate.entries = tcf_gate_get_list(act);
+
+ if (!entry->gate.entries)
+ return -EINVAL;
+
+ entry->destructor = tcf_gate_entry_destructor;
+ entry->destructor_priv = entry->gate.entries;
+
+ return 0;
+}
+
int tc_setup_flow_action(struct flow_action *flow_action,
const struct tcf_exts *exts)
{
@@ -3672,6 +3694,17 @@ int tc_setup_flow_action(struct flow_action *flow_action,
} else if (is_tcf_skbedit_priority(act)) {
entry->id = FLOW_ACTION_PRIORITY;
entry->priority = tcf_skbedit_priority(act);
+ } else if (is_tcf_gate(act)) {
+ entry->id = FLOW_ACTION_GATE;
+ entry->gate.index = tcf_gate_index(act);
+ entry->gate.prio = tcf_gate_prio(act);
+ entry->gate.basetime = tcf_gate_basetime(act);
+ entry->gate.cycletime = tcf_gate_cycletime(act);
+ entry->gate.cycletimeext = tcf_gate_cycletimeext(act);
+ entry->gate.num_entries = tcf_gate_num_entries(act);
+ err = tcf_gate_get_entries(entry, act);
+ if (err)
+ goto err_out;
} else {
err = -EOPNOTSUPP;
goto err_out_locked;
--
2.17.1

2020-05-01 01:17:52

by Po Liu

[permalink] [raw]
Subject: [v5,net-next 4/4] net: enetc: add tc flower psfp offload driver

This patch adds tc flower offload for the enetc IEEE 802.1Qci (PSFP)
function. There are four main feature parts implementing flow policing
and filtering for ingress flows with IEEE 802.1Qci features: stream
identification (defined exactly in P802.1CB but needed for 802.1Qci),
stream filtering, stream gate and flow metering. Each function block
includes many entries, addressed by index, to assign parameters.
A frame is first matched by the stream identification block, then flows
into the stream filter block through the handle shared between stream
identification and stream filtering. From there it flows into the stream
gate control assigned by the stream filter entry, is policed by the gate
and limited by the max SDU configured in the filter block (optional),
and is finally policed by the flow metering block, whose index is chosen
in the filter block.
Each block entry may therefore be linked to by many upper entries, since
entries can be assigned the same index when multiple streams want to
share the same feature in the stream filtering, stream gate or flow
metering block.
To implement these features, each stream, filtered by source/destination
MAC address and possibly also the VLAN id value, is treated as one flow
chain. It is identified by the chain_index that already exists in the tc
filter concept. The driver maintains this chain together with the gate
modules. The stream filter entry is created from the gate index, the
(optional) flow meter entry id and a priority value. Offloading only
transfers the gate action and flow filtering parameters. The driver
creates (or looks up an existing entry with the same gate id, flow meter
id and priority) a stream filter entry and programs it into the
hardware, so stream filtering does not need to be transferred by action
offloading.
This architecture matches the tc filter and action relationship: tc
filters maintain the list for each flow feature by keys, and actions are
maintained in the action list.

Below is an example command sequence using tc:
> tc qdisc add dev eth0 ingress
> ip link set eth0 address 10:00:80:00:00:00
> tc filter add dev eth0 parent ffff: protocol ip chain 11 \
flower skip_sw dst_mac 10:00:80:00:00:00 \
action gate index 10 \
sched-entry open 200000000 1 8000000 \
sched-entry close 100000000 -1 -1

The commands set dst_mac 10:00:80:00:00:00 at index 11 of the stream
identification module, then configure gate index 10 of the stream gate
module. The open sched-entry keeps the gate open for 200ms, limits the
traffic volume to 8MB within that entry, and directs the frames to
ingress queue 1.
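
As a rough sanity check on the example, the maximum sustained rate the gate admits can be derived from the sched-entry parameters. This is hedged back-of-the-envelope arithmetic, not driver code, and assumes maxoctets is the byte budget for one open interval per cycle:

```c
#include <stdint.h>

/* Maximum bytes per second admitted by a repeating gate cycle:
 * 'maxoctets' bytes may pass during each open interval, and the cycle
 * is the sum of the open and close intervals (both in nanoseconds). */
static uint64_t gate_max_rate(uint64_t maxoctets, uint64_t open_ns,
			      uint64_t close_ns)
{
	return maxoctets * 1000000000ULL / (open_ns + close_ns);
}
```

With the example's 8000000 octets per 200ms open interval and 100ms closed interval, this works out to about 26.7 MB/s (~213 Mbit/s) of sustained throughput.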

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 25 +-
drivers/net/ethernet/freescale/enetc/enetc.h | 46 +-
.../net/ethernet/freescale/enetc/enetc_hw.h | 142 +++
.../net/ethernet/freescale/enetc/enetc_pf.c | 4 +-
.../net/ethernet/freescale/enetc/enetc_qos.c | 1098 +++++++++++++++++
5 files changed, 1300 insertions(+), 15 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 04aac7cbb506..298c55786fd9 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1521,6 +1521,8 @@ int enetc_setup_tc(struct net_device *ndev, enum tc_setup_type type,
return enetc_setup_tc_cbs(ndev, type_data);
case TC_SETUP_QDISC_ETF:
return enetc_setup_tc_txtime(ndev, type_data);
+ case TC_SETUP_BLOCK:
+ return enetc_setup_tc_psfp(ndev, type_data);
default:
return -EOPNOTSUPP;
}
@@ -1573,17 +1575,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
static int enetc_set_psfp(struct net_device *ndev, int en)
{
struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ int err;

if (en) {
+ err = enetc_psfp_enable(priv);
+ if (err)
+ return err;
+
priv->active_offloads |= ENETC_F_QCI;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&priv->si->hw);
- } else {
- priv->active_offloads &= ~ENETC_F_QCI;
- memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
- enetc_psfp_disable(&priv->si->hw);
+ return 0;
}

+ err = enetc_psfp_disable(priv);
+ if (err)
+ return err;
+
+ priv->active_offloads &= ~ENETC_F_QCI;
+
return 0;
}

@@ -1591,14 +1599,15 @@ int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
netdev_features_t changed = ndev->features ^ features;
+ int err = 0;

if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

if (changed & NETIF_F_HW_TC)
- enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+ err = enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));

- return 0;
+ return err;
}

#ifdef CONFIG_FSL_ENETC_PTP_CLOCK
diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 2cfe877c3778..b705464f6882 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -300,6 +300,11 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv);
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data);
+int enetc_psfp_init(struct enetc_ndev_priv *priv);
+int enetc_psfp_clean(struct enetc_ndev_priv *priv);

static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
{
@@ -319,27 +324,60 @@ static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
}

-static inline void enetc_psfp_enable(struct enetc_hw *hw)
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ enetc_get_max_cap(priv);
+
+ err = enetc_psfp_init(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+
+ return 0;
}

-static inline void enetc_psfp_disable(struct enetc_hw *hw)
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
{
+ struct enetc_hw *hw = &priv->si->hw;
+ int err;
+
+ err = enetc_psfp_clean(priv);
+ if (err)
+ return err;
+
enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+
+ return 0;
}
+
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_psfp(ndev, type_data) -EOPNOTSUPP
+#define enetc_setup_tc_block_cb NULL
+
#define enetc_get_max_cap(p) \
memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))

-#define enetc_psfp_enable(hw) (void)0
-#define enetc_psfp_disable(hw) (void)0
+static inline int enetc_psfp_enable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
+
+static inline int enetc_psfp_disable(struct enetc_ndev_priv *priv)
+{
+ return 0;
+}
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 587974862f48..6314051bc6c1 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -567,6 +567,9 @@ enum bdcr_cmd_class {
BDCR_CMD_RFS,
BDCR_CMD_PORT_GCL,
BDCR_CMD_RECV_CLASSIFIER,
+ BDCR_CMD_STREAM_IDENTIFY,
+ BDCR_CMD_STREAM_FILTER,
+ BDCR_CMD_STREAM_GCL,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -598,13 +601,152 @@ struct tgs_gcl_data {
struct gce entry[];
};

+/* class 7, command 0, Stream Identity Entry Configuration */
+struct streamid_conf {
+ __le32 stream_handle; /* init gate value */
+ __le32 iports;
+ u8 id_type;
+ u8 oui[3];
+ u8 res[3];
+ u8 en;
+};
+
+#define ENETC_CBDR_SID_VID_MASK 0xfff
+#define ENETC_CBDR_SID_VIDM BIT(12)
+#define ENETC_CBDR_SID_TG_MASK 0xc000
+/* the streamid_conf address points to this data space */
+struct streamid_data {
+ union {
+ u8 dmac[6];
+ u8 smac[6];
+ };
+ u16 vid_vidm_tg;
+};
+
+#define ENETC_CBDR_SFI_PRI_MASK 0x7
+#define ENETC_CBDR_SFI_PRIM BIT(3)
+#define ENETC_CBDR_SFI_BLOV BIT(4)
+#define ENETC_CBDR_SFI_BLEN BIT(5)
+#define ENETC_CBDR_SFI_MSDUEN BIT(6)
+#define ENETC_CBDR_SFI_FMITEN BIT(7)
+#define ENETC_CBDR_SFI_ENABLE BIT(7)
+/* class 8, command 0, Stream Filter Instance, Short Format */
+struct sfi_conf {
+ __le32 stream_handle;
+ u8 multi;
+ u8 res[2];
+ u8 sthm;
+ /* Max Service Data Unit or Flow Meter Instance Table index.
+ * Depending on the value of FLT this represents either Max
+ * Service Data Unit (max frame size) allowed by the filter
+ * entry or is an index into the Flow Meter Instance table
+ * index identifying the policer which will be used to police
+ * it.
+ */
+ __le16 fm_inst_table_index;
+ __le16 msdu;
+ __le16 sg_inst_table_index;
+ u8 res1[2];
+ __le32 input_ports;
+ u8 res2[3];
+ u8 en;
+};
+
+/* class 8, command 2, Stream Filter Instance status query, short format.
+ * The command itself needs no structure; this is the Stream Filter
+ * Instance Query Statistics response data.
+ */
+struct sfi_counter_data {
+ u32 matchl;
+ u32 matchh;
+ u32 msdu_dropl;
+ u32 msdu_droph;
+ u32 stream_gate_dropl;
+ u32 stream_gate_droph;
+ u32 flow_meter_dropl;
+ u32 flow_meter_droph;
+};
+
+#define ENETC_CBDR_SGI_OIPV_MASK 0x7
+#define ENETC_CBDR_SGI_OIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_CGTST BIT(6)
+#define ENETC_CBDR_SGI_OGTST BIT(7)
+#define ENETC_CBDR_SGI_CFG_CHG BIT(1)
+#define ENETC_CBDR_SGI_CFG_PND BIT(2)
+#define ENETC_CBDR_SGI_OEX BIT(4)
+#define ENETC_CBDR_SGI_OEXEN BIT(5)
+#define ENETC_CBDR_SGI_IRX BIT(6)
+#define ENETC_CBDR_SGI_IRXEN BIT(7)
+#define ENETC_CBDR_SGI_ACLLEN_MASK 0x3
+#define ENETC_CBDR_SGI_OCLLEN_MASK 0xc
+#define ENETC_CBDR_SGI_EN BIT(7)
+/* class 9, command 0, Stream Gate Instance Table, Short Format
+ * class 9, command 2, Stream Gate Instance Table entry query write back
+ * Short Format
+ */
+struct sgi_table {
+ u8 res[8];
+ u8 oipv;
+ u8 res0[2];
+ u8 ocgtst;
+ u8 res1[7];
+ u8 gset;
+ u8 oacl_len;
+ u8 res2[2];
+ u8 en;
+};
+
+#define ENETC_CBDR_SGI_AIPV_MASK 0x7
+#define ENETC_CBDR_SGI_AIPV_EN BIT(3)
+#define ENETC_CBDR_SGI_AGTST BIT(7)
+
+/* class 9, command 1, Stream Gate Control List, Long Format */
+struct sgcl_conf {
+ u8 aipv;
+ u8 res[2];
+ u8 agtst;
+ u8 res1[4];
+ union {
+ struct {
+ u8 res2[4];
+ u8 acl_len;
+ u8 res3[3];
+ };
+ u8 cct[8]; /* Config change time */
+ };
+};
+
+#define ENETC_CBDR_SGL_IOMEN BIT(0)
+#define ENETC_CBDR_SGL_IPVEN BIT(3)
+#define ENETC_CBDR_SGL_GTST BIT(4)
+#define ENETC_CBDR_SGL_IPV_MASK 0xe
+/* Stream Gate Control List Entry */
+struct sgce {
+ u32 interval;
+ u8 msdu[3];
+ u8 multi;
+};
+
+/* Stream Gate Control List, class 9, cmd 1, data buffer */
+struct sgcl_data {
+ u32 btl;
+ u32 bth;
+ u32 ct;
+ u32 cte;
+ struct sgce sgcl[0];
+};
+
struct enetc_cbd {
union{
+ struct sfi_conf sfi_conf;
+ struct sgi_table sgi_table;
struct {
__le32 addr[2];
union {
__le32 opt[4];
struct tgs_gcl_conf gcl_conf;
+ struct streamid_conf sid_set;
+ struct sgcl_conf sgcl_conf;
};
}; /* Long format */
__le32 data[6];
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index cef9fbfdb056..824d211ec00f 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -727,12 +727,10 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

- if (si->hw_features & ENETC_SI_F_PSFP) {
+ if (si->hw_features & ENETC_SI_F_PSFP && !enetc_psfp_enable(priv)) {
priv->active_offloads |= ENETC_F_QCI;
ndev->features |= NETIF_F_HW_TC;
ndev->hw_features |= NETIF_F_HW_TC;
- enetc_get_max_cap(priv);
- enetc_psfp_enable(&si->hw);
}

/* pick up primary MAC address from SI */
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 0c6bf3a55a9a..30fca29b2739 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -5,6 +5,9 @@

#include <net/pkt_sched.h>
#include <linux/math64.h>
+#include <linux/refcount.h>
+#include <net/pkt_cls.h>
+#include <net/tc_act/tc_gate.h>

static u16 enetc_get_max_gcl_len(struct enetc_hw *hw)
{
@@ -331,3 +334,1098 @@ int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data)

return 0;
}
+
+enum streamid_type {
+ STREAMID_TYPE_RESERVED = 0,
+ STREAMID_TYPE_NULL,
+ STREAMID_TYPE_SMAC,
+};
+
+enum streamid_vlan_tagged {
+ STREAMID_VLAN_RESERVED = 0,
+ STREAMID_VLAN_TAGGED,
+ STREAMID_VLAN_UNTAGGED,
+ STREAMID_VLAN_ALL,
+};
+
+#define ENETC_PSFP_WILDCARD -1
+#define HANDLE_OFFSET 100
+
+enum forward_type {
+ FILTER_ACTION_TYPE_PSFP = BIT(0),
+ FILTER_ACTION_TYPE_ACL = BIT(1),
+ FILTER_ACTION_TYPE_BOTH = GENMASK(1, 0),
+};
+
+/* Limits the allowed output type for a given set of input actions */
+struct actions_fwd {
+ u64 actions;
+ u64 keys; /* include the must needed keys */
+ enum forward_type output;
+};
+
+struct psfp_streamfilter_counters {
+ u64 matching_frames_count;
+ u64 passing_frames_count;
+ u64 not_passing_frames_count;
+ u64 passing_sdu_count;
+ u64 not_passing_sdu_count;
+ u64 red_frames_count;
+};
+
+struct enetc_streamid {
+ u32 index;
+ union {
+ u8 src_mac[6];
+ u8 dst_mac[6];
+ };
+ u8 filtertype;
+ u16 vid;
+ u8 tagged;
+ s32 handle;
+};
+
+struct enetc_psfp_filter {
+ u32 index;
+ s32 handle;
+ s8 prio;
+ u32 gate_id;
+ s32 meter_id;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+struct enetc_psfp_gate {
+ u32 index;
+ s8 init_ipv;
+ u64 basetime;
+ u64 cycletime;
+ u64 cycletimext;
+ u32 num_entries;
+ refcount_t refcount;
+ struct hlist_node node;
+ struct action_gate_entry entries[0];
+};
+
+struct enetc_stream_filter {
+ struct enetc_streamid sid;
+ u32 sfi_index;
+ u32 sgi_index;
+ struct flow_stats stats;
+ struct hlist_node node;
+};
+
+struct enetc_psfp {
+ unsigned long dev_bitmap;
+ unsigned long *psfp_sfi_bitmap;
+ struct hlist_head stream_list;
+ struct hlist_head psfp_filter_list;
+ struct hlist_head psfp_gate_list;
+ spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
+};
+
+struct actions_fwd enetc_act_fwd[] = {
+ {
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
+ /* example for ACL actions */
+ {
+ BIT(FLOW_ACTION_DROP),
+ 0,
+ FILTER_ACTION_TYPE_ACL
+ }
+};
+
+static struct enetc_psfp epsfp = {
+ .psfp_sfi_bitmap = NULL,
+};
+
+static LIST_HEAD(enetc_block_cb_list);
+
+static inline int enetc_get_port(struct enetc_ndev_priv *priv)
+{
+ return priv->si->pdev->devfn & 0x7;
+}
+
+/* Stream Identity Entry Set Descriptor */
+static int enetc_streamid_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct streamid_data *si_data;
+ struct streamid_conf *si_conf;
+ u16 data_size;
+ dma_addr_t dma;
+ int err;
+
+ if (sid->index >= priv->psfp_cap.max_streamid)
+ return -EINVAL;
+
+ if (sid->filtertype != STREAMID_TYPE_NULL &&
+ sid->filtertype != STREAMID_TYPE_SMAC)
+ return -EOPNOTSUPP;
+
+ /* Disable operation before enable */
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct streamid_data);
+ si_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev, si_data,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(si_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+ memset(si_data->dmac, 0xff, ETH_ALEN);
+ si_data->vid_vidm_tg =
+ cpu_to_le16(ENETC_CBDR_SID_VID_MASK
+ + ((0x3 << 14) | ENETC_CBDR_SID_VIDM));
+
+ si_conf = &cbd.sid_set;
+ /* Only one port supported for one entry, set itself */
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = 1;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ if (!enable) {
+ kfree(si_data);
+ return 0;
+ }
+
+ /* Enable the entry and overwrite again in case the space was flushed by hardware */
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16((u16)sid->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_IDENTIFY;
+ cbd.status_flags = 0;
+
+ si_conf->en = 0x80;
+ si_conf->stream_handle = cpu_to_le32(sid->handle);
+ si_conf->iports = 1 << enetc_get_port(priv);
+ si_conf->id_type = sid->filtertype;
+ si_conf->oui[2] = 0x0;
+ si_conf->oui[1] = 0x80;
+ si_conf->oui[0] = 0xC2;
+
+ memset(si_data, 0, data_size);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ /* VIDM defaults to 1.
+ * VID Match. If set (b1) then the VID must match, otherwise
+ * any VID is considered a match. VIDM setting is only used
+ * when TG is set to b01.
+ */
+ if (si_conf->id_type == STREAMID_TYPE_NULL) {
+ ether_addr_copy(si_data->dmac, sid->dst_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ } else if (si_conf->id_type == STREAMID_TYPE_SMAC) {
+ ether_addr_copy(si_data->smac, sid->src_mac);
+ si_data->vid_vidm_tg =
+ cpu_to_le16((sid->vid & ENETC_CBDR_SID_VID_MASK) +
+ ((((u16)(sid->tagged) & 0x3) << 14)
+ | ENETC_CBDR_SID_VIDM));
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ kfree(si_data);
+
+ return err;
+}
+
+/* Stream Filter Instance Set Descriptor */
+static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_filter *sfi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = {.cmd = 0};
+ struct sfi_conf *sfi_config;
+
+ cbd.index = cpu_to_le16(sfi->index);
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0x80;
+ cbd.length = cpu_to_le16(1);
+
+ sfi_config = &cbd.sfi_conf;
+ if (!enable)
+ goto exit;
+
+ sfi_config->en = 0x80;
+
+ if (sfi->handle >= 0) {
+ sfi_config->stream_handle =
+ cpu_to_le32(sfi->handle);
+ sfi_config->sthm |= 0x80;
+ }
+
+ sfi_config->sg_inst_table_index = cpu_to_le16(sfi->gate_id);
+ sfi_config->input_ports = 1 << enetc_get_port(priv);
+
+ /* The priority value which may be matched against the
+ * frame's priority value to determine a match for this entry.
+ */
+ if (sfi->prio >= 0)
+ sfi_config->multi |= (sfi->prio & 0x7) | 0x8;
+
+ /* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
+ * field as being either an MSDU value or an index into the Flow
+ * Meter Instance table.
+ * TODO: no limit max sdu
+ */
+
+ if (sfi->meter_id >= 0) {
+ sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
+ sfi_config->multi |= 0x80;
+ }
+
+exit:
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
+static int enetc_streamcounter_hw_get(struct enetc_ndev_priv *priv,
+ u32 index,
+ struct psfp_streamfilter_counters *cnt)
+{
+ struct enetc_cbd cbd = { .cmd = 2 };
+ struct sfi_counter_data *data_buf;
+ dma_addr_t dma;
+ u16 data_size;
+ int err;
+
+ cbd.index = cpu_to_le16((u16)index);
+ cbd.cmd = 2;
+ cbd.cls = BDCR_CMD_STREAM_FILTER;
+ cbd.status_flags = 0;
+
+ data_size = sizeof(struct sfi_counter_data);
+ data_buf = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!data_buf)
+ return -ENOMEM;
+
+ dma = dma_map_single(&priv->si->pdev->dev, data_buf,
+ data_size, DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ err = -ENOMEM;
+ goto exit;
+ }
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ cbd.length = cpu_to_le16(data_size);
+
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ goto exit;
+
+ cnt->matching_frames_count =
+ ((u64)le32_to_cpu(data_buf->matchh) << 32)
+ + le32_to_cpu(data_buf->matchl);
+
+ cnt->not_passing_sdu_count =
+ ((u64)le32_to_cpu(data_buf->msdu_droph) << 32)
+ + le32_to_cpu(data_buf->msdu_dropl);
+
+ cnt->passing_sdu_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count;
+
+ cnt->not_passing_frames_count =
+ ((u64)le32_to_cpu(data_buf->stream_gate_droph) << 32)
+ + le32_to_cpu(data_buf->stream_gate_dropl);
+
+ cnt->passing_frames_count = cnt->matching_frames_count
+ - cnt->not_passing_sdu_count
+ - cnt->not_passing_frames_count;
+
+ cnt->red_frames_count =
+ ((u64)le32_to_cpu(data_buf->flow_meter_droph) << 32)
+ + le32_to_cpu(data_buf->flow_meter_dropl);
+
+exit:
+ kfree(data_buf);
+ return err;
+}
+
+static u64 get_ptp_now(struct enetc_hw *hw)
+{
+ u64 now_lo, now_hi, now;
+
+ now_lo = enetc_rd(hw, ENETC_SICTR0);
+ now_hi = enetc_rd(hw, ENETC_SICTR1);
+ now = now_lo | now_hi << 32;
+
+ return now;
+}
+
+static int get_start_ns(u64 now, u64 cycle, u64 *start)
+{
+ u64 n;
+
+ if (!cycle)
+ return -EFAULT;
+
+ n = div64_u64(now, cycle);
+
+ *start = (n + 1) * cycle;
+
+ return 0;
+}
+
+/* Stream Gate Instance Set Descriptor */
+static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_gate *sgi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct sgi_table *sgi_config;
+ struct sgcl_conf *sgcl_config;
+ struct sgcl_data *sgcl_data;
+ struct sgce *sgce;
+ dma_addr_t dma;
+ u16 data_size;
+ int err, i;
+ u64 now;
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 0;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0x80;
+
+ /* disable */
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ if (!sgi->num_entries)
+ return 0;
+
+ if (sgi->num_entries > priv->psfp_cap.max_psfp_gatelist ||
+ !sgi->cycletime)
+ return -EINVAL;
+
+ /* enable */
+ sgi_config = &cbd.sgi_table;
+
+ /* Keep open before gate list start */
+ sgi_config->ocgtst = 0x80;
+
+ sgi_config->oipv = (sgi->init_ipv < 0) ?
+ 0x0 : ((sgi->init_ipv & 0x7) | 0x8);
+
+ sgi_config->en = 0x80;
+
+ /* Basic config */
+ err = enetc_send_cmd(priv->si, &cbd);
+ if (err)
+ return -EINVAL;
+
+ memset(&cbd, 0, sizeof(cbd));
+
+ cbd.index = cpu_to_le16(sgi->index);
+ cbd.cmd = 1;
+ cbd.cls = BDCR_CMD_STREAM_GCL;
+ cbd.status_flags = 0;
+
+ sgcl_config = &cbd.sgcl_conf;
+
+ sgcl_config->acl_len = (sgi->num_entries - 1) & 0x3;
+
+ data_size = struct_size(sgcl_data, sgcl, sgi->num_entries);
+
+ sgcl_data = kzalloc(data_size, __GFP_DMA | GFP_KERNEL);
+ if (!sgcl_data)
+ return -ENOMEM;
+
+ cbd.length = cpu_to_le16(data_size);
+
+ dma = dma_map_single(&priv->si->pdev->dev,
+ sgcl_data, data_size,
+ DMA_FROM_DEVICE);
+ if (dma_mapping_error(&priv->si->pdev->dev, dma)) {
+ netdev_err(priv->si->ndev, "DMA mapping failed!\n");
+ kfree(sgcl_data);
+ return -ENOMEM;
+ }
+
+ cbd.addr[0] = lower_32_bits(dma);
+ cbd.addr[1] = upper_32_bits(dma);
+
+ sgce = &sgcl_data->sgcl[0];
+
+ sgcl_config->agtst = 0x80;
+
+ sgcl_data->ct = cpu_to_le32(sgi->cycletime);
+ sgcl_data->cte = cpu_to_le32(sgi->cycletimext);
+
+ if (sgi->init_ipv >= 0)
+ sgcl_config->aipv = (sgi->init_ipv & 0x7) | 0x8;
+
+ for (i = 0; i < sgi->num_entries; i++) {
+ struct action_gate_entry *from = &sgi->entries[i];
+ struct sgce *to = &sgce[i];
+
+ if (from->gate_state)
+ to->multi |= 0x10;
+
+ if (from->ipv >= 0)
+ to->multi |= ((from->ipv & 0x7) << 5) | 0x08;
+
+ if (from->maxoctets >= 0) {
+ to->multi |= 0x01;
+ to->msdu[0] = from->maxoctets & 0xFF;
+ to->msdu[1] = (from->maxoctets >> 8) & 0xFF;
+ to->msdu[2] = (from->maxoctets >> 16) & 0xFF;
+ }
+
+ to->interval = cpu_to_le32(from->interval);
+ }
+
+ /* If basetime is less than now, calculate start time */
+ now = get_ptp_now(&priv->si->hw);
+
+ if (sgi->basetime < now) {
+ u64 start;
+
+ err = get_start_ns(now, sgi->cycletime, &start);
+ if (err)
+ goto exit;
+ sgcl_data->btl = cpu_to_le32(lower_32_bits(start));
+ sgcl_data->bth = cpu_to_le32(upper_32_bits(start));
+ } else {
+ u32 hi, lo;
+
+ hi = upper_32_bits(sgi->basetime);
+ lo = lower_32_bits(sgi->basetime);
+ sgcl_data->bth = cpu_to_le32(hi);
+ sgcl_data->btl = cpu_to_le32(lo);
+ }
+
+ err = enetc_send_cmd(priv->si, &cbd);
+
+exit:
+ kfree(sgcl_data);
+
+ return err;
+}
+
+static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
+{
+ struct enetc_stream_filter *f;
+
+ hlist_for_each_entry(f, &epsfp.stream_list, node)
+ if (f->sid.index == index)
+ return f;
+
+ return NULL;
+}
+
+static struct enetc_psfp_gate *enetc_get_gate_by_index(u32 index)
+{
+ struct enetc_psfp_gate *g;
+
+ hlist_for_each_entry(g, &epsfp.psfp_gate_list, node)
+ if (g->index == index)
+ return g;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->index == index)
+ return s;
+
+ return NULL;
+}
+
+static struct enetc_psfp_filter
+ *enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
+{
+ struct enetc_psfp_filter *s;
+
+ hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
+ if (s->gate_id == sfi->gate_id &&
+ s->prio == sfi->prio &&
+ s->meter_id == sfi->meter_id)
+ return s;
+
+ return NULL;
+}
+
+static int enetc_get_free_index(struct enetc_ndev_priv *priv)
+{
+ u32 max_size = priv->psfp_cap.max_psfp_filter;
+ unsigned long index;
+
+ index = find_first_zero_bit(epsfp.psfp_sfi_bitmap, max_size);
+ if (index == max_size)
+ return -1;
+
+ return index;
+}
+
+static void stream_filter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_filter *sfi;
+ u8 z;
+
+ sfi = enetc_get_filter_by_index(index);
+ WARN_ON(!sfi);
+ z = refcount_dec_and_test(&sfi->refcount);
+
+ if (z) {
+ enetc_streamfilter_hw_set(priv, sfi, false);
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ clear_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ }
+}
+
+static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_gate *sgi;
+ u8 z;
+
+ sgi = enetc_get_gate_by_index(index);
+ WARN_ON(!sgi);
+ z = refcount_dec_and_test(&sgi->refcount);
+ if (z) {
+ enetc_streamgate_hw_set(priv, sgi, false);
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void remove_one_chain(struct enetc_ndev_priv *priv,
+ struct enetc_stream_filter *filter)
+{
+ stream_gate_unref(priv, filter->sgi_index);
+ stream_filter_unref(priv, filter->sfi_index);
+
+ hlist_del(&filter->node);
+ kfree(filter);
+}
+
+static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_streamid *sid,
+ struct enetc_psfp_filter *sfi,
+ struct enetc_psfp_gate *sgi)
+{
+ int err;
+
+ err = enetc_streamid_hw_set(priv, sid, true);
+ if (err)
+ return err;
+
+ if (sfi) {
+ err = enetc_streamfilter_hw_set(priv, sfi, true);
+ if (err)
+ goto revert_sid;
+ }
+
+ err = enetc_streamgate_hw_set(priv, sgi, true);
+ if (err)
+ goto revert_sfi;
+
+ return 0;
+
+revert_sfi:
+ if (sfi)
+ enetc_streamfilter_hw_set(priv, sfi, false);
+revert_sid:
+ enetc_streamid_hw_set(priv, sid, false);
+ return err;
+}
+
+struct actions_fwd *enetc_check_flow_actions(u64 acts, unsigned int inputkeys)
+{
+ int i;
+
+ for (i = 0; i < ARRAY_SIZE(enetc_act_fwd); i++)
+ if (acts == enetc_act_fwd[i].actions &&
+ inputkeys & enetc_act_fwd[i].keys)
+ return &enetc_act_fwd[i];
+
+ return NULL;
+}
+
+static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(f);
+ struct netlink_ext_ack *extack = f->common.extack;
+ struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_filter *sfi, *old_sfi;
+ struct enetc_psfp_gate *sgi, *old_sgi;
+ struct flow_action_entry *entry;
+ struct action_gate_entry *e;
+ u8 sfi_overwrite = 0;
+ int entries_size;
+ int i, err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ flow_action_for_each(i, entry, &rule->action)
+ if (entry->id == FLOW_ACTION_GATE)
+ break;
+
+ if (entry->id != FLOW_ACTION_GATE)
+ return -EINVAL;
+
+ filter = kzalloc(sizeof(*filter), GFP_KERNEL);
+ if (!filter)
+ return -ENOMEM;
+
+ filter->sid.index = f->common.chain_index;
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_ETH_ADDRS)) {
+ struct flow_match_eth_addrs match;
+
+ flow_rule_match_eth_addrs(rule, &match);
+
+ if (!is_zero_ether_addr(match.mask->dst) &&
+ !is_zero_ether_addr(match.mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Cannot match on both source and destination MAC");
+ goto free_filter;
+ }
+
+ if (!is_zero_ether_addr(match.mask->dst)) {
+ if (!is_broadcast_ether_addr(match.mask->dst)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Masked matching on destination MAC not supported");
+ goto free_filter;
+ }
+ ether_addr_copy(filter->sid.dst_mac, match.key->dst);
+ filter->sid.filtertype = STREAMID_TYPE_NULL;
+ }
+
+ if (!is_zero_ether_addr(match.mask->src)) {
+ if (!is_broadcast_ether_addr(match.mask->src)) {
+ NL_SET_ERR_MSG_MOD(extack,
+ "Masked matching on source MAC not supported");
+ goto free_filter;
+ }
+ ether_addr_copy(filter->sid.src_mac, match.key->src);
+ filter->sid.filtertype = STREAMID_TYPE_SMAC;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported, must include ETH_ADDRS");
+ goto free_filter;
+ }
+
+ if (flow_rule_match_key(rule, FLOW_DISSECTOR_KEY_VLAN)) {
+ struct flow_match_vlan match;
+
+ flow_rule_match_vlan(rule, &match);
+ if (match.mask->vlan_priority) {
+ if (match.mask->vlan_priority !=
+ (VLAN_PRIO_MASK >> VLAN_PRIO_SHIFT)) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN priority");
+ err = -EINVAL;
+ goto free_filter;
+ }
+ }
+
+ if (match.mask->vlan_id) {
+ if (match.mask->vlan_id != VLAN_VID_MASK) {
+ NL_SET_ERR_MSG_MOD(extack, "Only full mask is supported for VLAN id");
+ err = -EINVAL;
+ goto free_filter;
+ }
+
+ filter->sid.vid = match.key->vlan_id;
+ if (!filter->sid.vid)
+ filter->sid.tagged = STREAMID_VLAN_UNTAGGED;
+ else
+ filter->sid.tagged = STREAMID_VLAN_TAGGED;
+ }
+ } else {
+ filter->sid.tagged = STREAMID_VLAN_ALL;
+ }
+
+ /* parsing gate action */
+ if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
+ err = -ENOSPC;
+ goto free_filter;
+ }
+
+ entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ sgi = kzalloc(entries_size, GFP_KERNEL);
+ if (!sgi) {
+ err = -ENOMEM;
+ goto free_filter;
+ }
+
+ refcount_set(&sgi->refcount, 1);
+ sgi->index = entry->gate.index;
+ sgi->init_ipv = entry->gate.prio;
+ sgi->basetime = entry->gate.basetime;
+ sgi->cycletime = entry->gate.cycletime;
+ sgi->num_entries = entry->gate.num_entries;
+
+ e = sgi->entries;
+ for (i = 0; i < entry->gate.num_entries; i++) {
+ e[i].gate_state = entry->gate.entries[i].gate_state;
+ e[i].interval = entry->gate.entries[i].interval;
+ e[i].ipv = entry->gate.entries[i].ipv;
+ e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ }
+
+ filter->sgi_index = sgi->index;
+
+ sfi = kzalloc(sizeof(*sfi), GFP_KERNEL);
+ if (!sfi) {
+ err = -ENOMEM;
+ goto free_gate;
+ }
+
+ refcount_set(&sfi->refcount, 1);
+ sfi->gate_id = sgi->index;
+
+ /* flow meter not supported yet */
+ sfi->meter_id = ENETC_PSFP_WILDCARD;
+
+ /* prio refers to the tc filter prio */
+ if (f->common.prio && f->common.prio <= BIT(3))
+ sfi->prio = f->common.prio - 1;
+ else
+ sfi->prio = ENETC_PSFP_WILDCARD;
+
+ old_sfi = enetc_psfp_check_sfi(sfi);
+ if (!old_sfi) {
+ int index;
+
+ index = enetc_get_free_index(priv);
+ if (index < 0) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
+ err = -ENOSPC;
+ goto free_sfi;
+ }
+
+ sfi->index = index;
+ sfi->handle = index + HANDLE_OFFSET;
+ /* Update the stream filter handle also */
+ filter->sid.handle = sfi->handle;
+ filter->sfi_index = sfi->index;
+ sfi_overwrite = 0;
+ } else {
+ filter->sfi_index = old_sfi->index;
+ filter->sid.handle = old_sfi->handle;
+ sfi_overwrite = 1;
+ }
+
+ err = enetc_psfp_hw_set(priv, &filter->sid,
+ sfi_overwrite ? NULL : sfi, sgi);
+ if (err)
+ goto free_sfi;
+
+ spin_lock(&epsfp.psfp_lock);
+ /* Remove the old node if it exists and replace it with the new node */
+ old_sgi = enetc_get_gate_by_index(filter->sgi_index);
+ if (old_sgi) {
+ refcount_set(&sgi->refcount,
+ refcount_read(&old_sgi->refcount) + 1);
+ hlist_del(&old_sgi->node);
+ kfree(old_sgi);
+ }
+
+ hlist_add_head(&sgi->node, &epsfp.psfp_gate_list);
+
+ if (!old_sfi) {
+ hlist_add_head(&sfi->node, &epsfp.psfp_filter_list);
+ set_bit(sfi->index, epsfp.psfp_sfi_bitmap);
+ } else {
+ kfree(sfi);
+ refcount_inc(&old_sfi->refcount);
+ }
+
+ old_filter = enetc_get_stream_by_index(filter->sid.index);
+ if (old_filter)
+ remove_one_chain(priv, old_filter);
+
+ filter->stats.lastused = jiffies;
+ hlist_add_head(&filter->node, &epsfp.stream_list);
+
+ spin_unlock(&epsfp.psfp_lock);
+
+ return 0;
+
+free_sfi:
+ kfree(sfi);
+free_gate:
+ kfree(sgi);
+free_filter:
+ kfree(filter);
+
+ return err;
+}
+
+static int enetc_config_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ struct flow_rule *rule = flow_cls_offload_flow_rule(cls_flower);
+ struct netlink_ext_ack *extack = cls_flower->common.extack;
+ struct flow_dissector *dissector = rule->match.dissector;
+ struct flow_action *action = &rule->action;
+ struct flow_action_entry *entry;
+ struct actions_fwd *fwd;
+ u64 actions = 0;
+ int i, err;
+
+ if (!flow_action_has_entries(action)) {
+ NL_SET_ERR_MSG_MOD(extack, "At least one action is needed");
+ return -EINVAL;
+ }
+
+ flow_action_for_each(i, entry, action)
+ actions |= BIT(entry->id);
+
+ fwd = enetc_check_flow_actions(actions, dissector->used_keys);
+ if (!fwd) {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported filter type!");
+ return -EOPNOTSUPP;
+ }
+
+ if (fwd->output & FILTER_ACTION_TYPE_PSFP) {
+ err = enetc_psfp_parse_clsflower(priv, cls_flower);
+ if (err) {
+ NL_SET_ERR_MSG_MOD(extack, "Invalid PSFP inputs");
+ return err;
+ }
+ } else {
+ NL_SET_ERR_MSG_MOD(extack, "Unsupported actions");
+ return -EOPNOTSUPP;
+ }
+
+ return 0;
+}
+
+static int enetc_psfp_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct enetc_stream_filter *filter;
+ struct netlink_ext_ack *extack = f->common.extack;
+ int err;
+
+ if (f->common.chain_index >= priv->psfp_cap.max_streamid) {
+ NL_SET_ERR_MSG_MOD(extack, "No Stream identify resource!");
+ return -ENOSPC;
+ }
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamid_hw_set(priv, &filter->sid, false);
+ if (err)
+ return err;
+
+ remove_one_chain(priv, filter);
+
+ return 0;
+}
+
+static int enetc_destroy_clsflower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ return enetc_psfp_destroy_clsflower(priv, f);
+}
+
+static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *f)
+{
+ struct psfp_streamfilter_counters counters = {};
+ struct enetc_stream_filter *filter;
+ struct flow_stats stats = {};
+ int err;
+
+ filter = enetc_get_stream_by_index(f->common.chain_index);
+ if (!filter)
+ return -EINVAL;
+
+ err = enetc_streamcounter_hw_get(priv, filter->sfi_index, &counters);
+ if (err)
+ return -EINVAL;
+
+ spin_lock(&epsfp.psfp_lock);
+ stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.lastused = filter->stats.lastused;
+ filter->stats.pkts += stats.pkts;
+ spin_unlock(&epsfp.psfp_lock);
+
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
+ FLOW_ACTION_HW_STATS_DELAYED);
+
+ return 0;
+}
+
+static int enetc_setup_tc_cls_flower(struct enetc_ndev_priv *priv,
+ struct flow_cls_offload *cls_flower)
+{
+ switch (cls_flower->command) {
+ case FLOW_CLS_REPLACE:
+ return enetc_config_clsflower(priv, cls_flower);
+ case FLOW_CLS_DESTROY:
+ return enetc_destroy_clsflower(priv, cls_flower);
+ case FLOW_CLS_STATS:
+ return enetc_psfp_get_stats(priv, cls_flower);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+static inline void clean_psfp_sfi_bitmap(void)
+{
+ bitmap_free(epsfp.psfp_sfi_bitmap);
+ epsfp.psfp_sfi_bitmap = NULL;
+}
+
+static void clean_stream_list(void)
+{
+ struct enetc_stream_filter *s;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(s, tmp, &epsfp.stream_list, node) {
+ hlist_del(&s->node);
+ kfree(s);
+ }
+}
+
+static void clean_sfi_list(void)
+{
+ struct enetc_psfp_filter *sfi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sfi, tmp, &epsfp.psfp_filter_list, node) {
+ hlist_del(&sfi->node);
+ kfree(sfi);
+ }
+}
+
+static void clean_sgi_list(void)
+{
+ struct enetc_psfp_gate *sgi;
+ struct hlist_node *tmp;
+
+ hlist_for_each_entry_safe(sgi, tmp, &epsfp.psfp_gate_list, node) {
+ hlist_del(&sgi->node);
+ kfree(sgi);
+ }
+}
+
+static void clean_psfp_all(void)
+{
+ /* Disable all list nodes and free all memory */
+ clean_sfi_list();
+ clean_sgi_list();
+ clean_stream_list();
+ epsfp.dev_bitmap = 0;
+ clean_psfp_sfi_bitmap();
+}
+
+int enetc_setup_tc_block_cb(enum tc_setup_type type, void *type_data,
+ void *cb_priv)
+{
+ struct net_device *ndev = cb_priv;
+
+ if (!tc_can_offload(ndev))
+ return -EOPNOTSUPP;
+
+ switch (type) {
+ case TC_SETUP_CLSFLOWER:
+ return enetc_setup_tc_cls_flower(netdev_priv(ndev), type_data);
+ default:
+ return -EOPNOTSUPP;
+ }
+}
+
+int enetc_psfp_init(struct enetc_ndev_priv *priv)
+{
+ if (epsfp.psfp_sfi_bitmap)
+ return 0;
+
+ epsfp.psfp_sfi_bitmap = bitmap_zalloc(priv->psfp_cap.max_psfp_filter,
+ GFP_KERNEL);
+ if (!epsfp.psfp_sfi_bitmap)
+ return -ENOMEM;
+
+ spin_lock_init(&epsfp.psfp_lock);
+
+ if (list_empty(&enetc_block_cb_list))
+ epsfp.dev_bitmap = 0;
+
+ return 0;
+}
+
+int enetc_psfp_clean(struct enetc_ndev_priv *priv)
+{
+ if (!list_empty(&enetc_block_cb_list))
+ return -EBUSY;
+
+ clean_psfp_all();
+
+ return 0;
+}
+
+int enetc_setup_tc_psfp(struct net_device *ndev, void *type_data)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+ struct flow_block_offload *f = type_data;
+ int err;
+
+ err = flow_block_cb_setup_simple(f, &enetc_block_cb_list,
+ enetc_setup_tc_block_cb,
+ ndev, ndev, true);
+ if (err)
+ return err;
+
+ switch (f->command) {
+ case FLOW_BLOCK_BIND:
+ set_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ break;
+ case FLOW_BLOCK_UNBIND:
+ clear_bit(enetc_get_port(priv), &epsfp.dev_bitmap);
+ if (!epsfp.dev_bitmap)
+ clean_psfp_all();
+ break;
+ }
+
+ return 0;
+}
--
2.17.1

2020-05-01 01:19:07

by Po Liu

[permalink] [raw]
Subject: [v5,net-next 3/4] net: enetc: add tc hw offload features for PSFP capability

This patch lets ethtool enable/disable the tc flower offload features.
The ENETC hardware supports PSFP, which provides per-stream policing.
When the tc hw offloading feature is enabled, the driver enables the
IEEE 802.1Qci feature. This only sets the register enable bit for the
feature and reads the capability limits of each sub-feature; it does not
program any per-stream filtering, stream gate, or stream identification
entries.
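
As a rough illustration (not kernel code), the toggle in enetc_set_features()
acts only on feature bits that actually changed between the current and the
requested feature set. A minimal Python sketch, with made-up bit positions
(the kernel's NETIF_F_* values differ):

```python
# Illustrative sketch of the enetc_set_features() toggle logic.
# Bit positions are invented for this example only.
NETIF_F_RXHASH = 1 << 0
NETIF_F_HW_TC = 1 << 1

def set_features(current, requested):
    """Act only on feature bits that changed, as the driver does."""
    changed = current ^ requested
    actions = []
    if changed & NETIF_F_RXHASH:
        actions.append(("rss", bool(requested & NETIF_F_RXHASH)))
    if changed & NETIF_F_HW_TC:
        # Enabling NETIF_F_HW_TC turns PSFP on: set the enable bit and
        # read the per-feature capability limits; disabling clears both.
        actions.append(("psfp", bool(requested & NETIF_F_HW_TC)))
    return actions
```

Requesting a feature that is already set is a no-op, which is why the driver
XORs the old and new feature masks first.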

Signed-off-by: Po Liu <[email protected]>
---
drivers/net/ethernet/freescale/enetc/enetc.c | 23 +++++++++
drivers/net/ethernet/freescale/enetc/enetc.h | 48 +++++++++++++++++++
.../net/ethernet/freescale/enetc/enetc_hw.h | 17 +++++++
.../net/ethernet/freescale/enetc/enetc_pf.c | 8 ++++
4 files changed, 96 insertions(+)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index ccf2611f4a20..04aac7cbb506 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -756,6 +756,9 @@ void enetc_get_si_caps(struct enetc_si *si)

if (val & ENETC_SIPCAPR0_QBV)
si->hw_features |= ENETC_SI_F_QBV;
+
+ if (val & ENETC_SIPCAPR0_PSFP)
+ si->hw_features |= ENETC_SI_F_PSFP;
}

static int enetc_dma_alloc_bdr(struct enetc_bdr *r, size_t bd_size)
@@ -1567,6 +1570,23 @@ static int enetc_set_rss(struct net_device *ndev, int en)
return 0;
}

+static int enetc_set_psfp(struct net_device *ndev, int en)
+{
+ struct enetc_ndev_priv *priv = netdev_priv(ndev);
+
+ if (en) {
+ priv->active_offloads |= ENETC_F_QCI;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&priv->si->hw);
+ } else {
+ priv->active_offloads &= ~ENETC_F_QCI;
+ memset(&priv->psfp_cap, 0, sizeof(struct psfp_cap));
+ enetc_psfp_disable(&priv->si->hw);
+ }
+
+ return 0;
+}
+
int enetc_set_features(struct net_device *ndev,
netdev_features_t features)
{
@@ -1575,6 +1595,9 @@ int enetc_set_features(struct net_device *ndev,
if (changed & NETIF_F_RXHASH)
enetc_set_rss(ndev, !!(features & NETIF_F_RXHASH));

+ if (changed & NETIF_F_HW_TC)
+ enetc_set_psfp(ndev, !!(features & NETIF_F_HW_TC));
+
return 0;
}

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.h b/drivers/net/ethernet/freescale/enetc/enetc.h
index 56c43f35b633..2cfe877c3778 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc.h
@@ -151,6 +151,7 @@ enum enetc_errata {
};

#define ENETC_SI_F_QBV BIT(0)
+#define ENETC_SI_F_PSFP BIT(1)

/* PCI IEP device data */
struct enetc_si {
@@ -203,12 +204,20 @@ struct enetc_cls_rule {
};

#define ENETC_MAX_BDR_INT 2 /* fixed to max # of available cpus */
+struct psfp_cap {
+ u32 max_streamid;
+ u32 max_psfp_filter;
+ u32 max_psfp_gate;
+ u32 max_psfp_gatelist;
+ u32 max_psfp_meter;
+};

/* TODO: more hardware offloads */
enum enetc_active_offloads {
ENETC_F_RX_TSTAMP = BIT(0),
ENETC_F_TX_TSTAMP = BIT(1),
ENETC_F_QBV = BIT(2),
+ ENETC_F_QCI = BIT(3),
};

struct enetc_ndev_priv {
@@ -231,6 +240,8 @@ struct enetc_ndev_priv {

struct enetc_cls_rule *cls_rules;

+ struct psfp_cap psfp_cap;
+
struct device_node *phy_node;
phy_interface_t if_mode;
};
@@ -289,9 +300,46 @@ int enetc_setup_tc_taprio(struct net_device *ndev, void *type_data);
void enetc_sched_speed_set(struct net_device *ndev);
int enetc_setup_tc_cbs(struct net_device *ndev, void *type_data);
int enetc_setup_tc_txtime(struct net_device *ndev, void *type_data);
+
+static inline void enetc_get_max_cap(struct enetc_ndev_priv *priv)
+{
+ u32 reg;
+
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSIDCAPR);
+ priv->psfp_cap.max_streamid = reg & ENETC_PSIDCAPR_MSK;
+ /* Port stream filter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSFCAPR);
+ priv->psfp_cap.max_psfp_filter = reg & ENETC_PSFCAPR_MSK;
+ /* Port stream gate capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PSGCAPR);
+ priv->psfp_cap.max_psfp_gate = (reg & ENETC_PSGCAPR_SGIT_MSK);
+ priv->psfp_cap.max_psfp_gatelist = (reg & ENETC_PSGCAPR_GCL_MSK) >> 16;
+ /* Port flow meter capability */
+ reg = enetc_port_rd(&priv->si->hw, ENETC_PFMCAPR);
+ priv->psfp_cap.max_psfp_meter = reg & ENETC_PFMCAPR_MSK;
+}
+
+static inline void enetc_psfp_enable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) |
+ ENETC_PPSFPMR_PSFPEN | ENETC_PPSFPMR_VS |
+ ENETC_PPSFPMR_PVC | ENETC_PPSFPMR_PVZC);
+}
+
+static inline void enetc_psfp_disable(struct enetc_hw *hw)
+{
+ enetc_wr(hw, ENETC_PPSFPMR, enetc_rd(hw, ENETC_PPSFPMR) &
+ ~ENETC_PPSFPMR_PSFPEN & ~ENETC_PPSFPMR_VS &
+ ~ENETC_PPSFPMR_PVC & ~ENETC_PPSFPMR_PVZC);
+}
#else
#define enetc_setup_tc_taprio(ndev, type_data) -EOPNOTSUPP
#define enetc_sched_speed_set(ndev) (void)0
#define enetc_setup_tc_cbs(ndev, type_data) -EOPNOTSUPP
#define enetc_setup_tc_txtime(ndev, type_data) -EOPNOTSUPP
+#define enetc_get_max_cap(p) \
+ memset(&((p)->psfp_cap), 0, sizeof(struct psfp_cap))
+
+#define enetc_psfp_enable(hw) (void)0
+#define enetc_psfp_disable(hw) (void)0
#endif
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 2a6523136947..587974862f48 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -19,6 +19,7 @@
#define ENETC_SICTR1 0x1c
#define ENETC_SIPCAPR0 0x20
#define ENETC_SIPCAPR0_QBV BIT(4)
+#define ENETC_SIPCAPR0_PSFP BIT(9)
#define ENETC_SIPCAPR0_RSS BIT(8)
#define ENETC_SIPCAPR1 0x24
#define ENETC_SITGTGR 0x30
@@ -228,6 +229,15 @@ enum enetc_bdr_type {TX, RX};
#define ENETC_PM0_IFM_RLP (BIT(5) | BIT(11))
#define ENETC_PM0_IFM_RGAUTO (BIT(15) | ENETC_PMO_IFM_RG | BIT(1))
#define ENETC_PM0_IFM_XGMII BIT(12)
+#define ENETC_PSIDCAPR 0x1b08
+#define ENETC_PSIDCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSFCAPR 0x1b18
+#define ENETC_PSFCAPR_MSK GENMASK(15, 0)
+#define ENETC_PSGCAPR 0x1b28
+#define ENETC_PSGCAPR_GCL_MSK GENMASK(18, 16)
+#define ENETC_PSGCAPR_SGIT_MSK GENMASK(15, 0)
+#define ENETC_PFMCAPR 0x1b38
+#define ENETC_PFMCAPR_MSK GENMASK(15, 0)

/* MAC counters */
#define ENETC_PM0_REOCT 0x8100
@@ -621,3 +631,10 @@ struct enetc_cbd {
/* Port time specific departure */
#define ENETC_PTCTSDR(n) (0x1210 + 4 * (n))
#define ENETC_TSDE BIT(31)
+
+/* PSFP setting */
+#define ENETC_PPSFPMR 0x11b00
+#define ENETC_PPSFPMR_PSFPEN BIT(0)
+#define ENETC_PPSFPMR_VS BIT(1)
+#define ENETC_PPSFPMR_PVC BIT(2)
+#define ENETC_PPSFPMR_PVZC BIT(3)
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_pf.c b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
index de1ad4975074..cef9fbfdb056 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_pf.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_pf.c
@@ -727,6 +727,14 @@ static void enetc_pf_netdev_setup(struct enetc_si *si, struct net_device *ndev,
if (si->hw_features & ENETC_SI_F_QBV)
priv->active_offloads |= ENETC_F_QBV;

+ if (si->hw_features & ENETC_SI_F_PSFP) {
+ priv->active_offloads |= ENETC_F_QCI;
+ ndev->features |= NETIF_F_HW_TC;
+ ndev->hw_features |= NETIF_F_HW_TC;
+ enetc_get_max_cap(priv);
+ enetc_psfp_enable(&si->hw);
+ }
+
/* pick up primary MAC address from SI */
enetc_get_primary_mac_addr(&si->hw, ndev->dev_addr);
}
--
2.17.1

2020-05-01 23:10:37

by David Miller

[permalink] [raw]
Subject: Re: [v5,net-next 0/4] Introduce a flow gate control action and apply IEEE

From: Po Liu <[email protected]>
Date: Fri, 1 May 2020 08:53:14 +0800

...
> These patches add stream gate action policing in IEEE802.1Qci (Per-Stream
> Filtering and Policing) software support and hardware offload support in
> tc flower, and implement the stream identify, stream filtering and
> stream gate filtering action in the NXP ENETC ethernet driver.
...

Series applied, thanks.

2020-05-03 06:55:47

by Po Liu

[permalink] [raw]
Subject: [v3,iproute2 1/2] iproute2:tc:action: add a gate control action

Introduce an ingress frame gate control flow action.
The tc gate action works like this:
Assume there is a gate that allows specified ingress frames to pass at
specific time slots and drops them at other time slots. A tc filter
chooses the ingress frames, and the tc gate action specifies in which
time slots these frames may be passed to the device and in which time
slots they are dropped.
The tc gate action provides an entry list describing how long the gate
stays open and how long it stays closed. The gate action also takes a
start time that tells when the entry list begins; the driver then
repeats the gate entry list cyclically.
For the software simulation, the gate action requires the user to
assign a clock type.

Below is a configuration example in user space. The tc filter matches a
stream with source IP address 192.168.0.20, and the gate action owns two
time slots: the gate stays open for 200ms, letting frames pass, then
stays closed for 100ms, dropping frames.

# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 parent ffff: protocol ip \

flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry open 200000000 -1 -1 \
sched-entry close 100000000

# tc chain del dev eth0 ingress chain 0

"sched-entry" follows the taprio naming style. The gate state is
"open"/"close", followed by the slot period in nanoseconds. The next
value, -1 here, is the internal priority, which decides which ingress
queue frames are directed to; "-1" means wildcard. The last, optional
value specifies the maximum number of MSDU octets that are permitted to
pass the gate during the specified time interval.

The example below filters a stream whose destination MAC address is
10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
action. The gate action runs with a single close time slot, which means
the gate always stays closed. The total cycle time is 200000000ns. The
base-time is calculated as:

1357000000000 + (N + 1) * cycletime

The smallest such value that lies in the future becomes the start time.
The cycletime here is 200000000ns.

#tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry CLOSE 200000000 \
clockid CLOCK_TAI
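
One way to read the base-time rule above, sketched in Python (an
interpretation of the stated formula, not the kernel's implementation;
all times in nanoseconds):

```python
def gate_start_time(base_time, cycle_time, now):
    """First schedule start after 'now' when base_time is in the past;
    otherwise the schedule starts at base_time itself.

    Implements base_time + (N + 1) * cycle_time with the smallest N
    that puts the result in the future.
    """
    if base_time > now:
        return base_time
    n = (now - base_time) // cycle_time + 1
    return base_time + n * cycle_time
```

So for base-time 1357000000000 and a 200000000ns cycle, the action keeps
adding whole cycles to the base-time until the result lands after the
current time.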

Signed-off-by: Po Liu <[email protected]>
---
These patches add iproute2 tc command support for the gate action, now
that the kernel patch has been applied (a51c328df310 "net: qos:
introduce a gate control flow action").
This is version 3.
Changes from v2:
Allow flexible input for a time slot (sched-entry), as suggested by
Vladimir Oltean and Vinicius Gomes:
- ipv and maxoctets in a sched-entry may be omitted and default to
wildcard (value -1).
include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +++
tc/Makefile | 1 +
tc/m_gate.c | 533 ++++++++++++++++++++++++++++
4 files changed, 582 insertions(+)
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 tc/m_gate.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 9f06d29c..fc672b23 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -134,6 +134,7 @@ enum tca_id {
TCA_ID_CTINFO,
TCA_ID_MPLS,
TCA_ID_CT,
+ TCA_ID_GATE,
/* other actions go here */
__TCA_ID_MAX = 255
};
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 00000000..f214b3a6
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index e31cbc12..79c9c1dd 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -54,6 +54,7 @@ TCMODULES += m_bpf.o
TCMODULES += m_tunnel_key.o
TCMODULES += m_sample.o
TCMODULES += m_ct.o
+TCMODULES += m_gate.o
TCMODULES += p_ip.o
TCMODULES += p_ip6.o
TCMODULES += p_icmp.o
diff --git a/tc/m_gate.c b/tc/m_gate.c
new file mode 100644
index 00000000..8e0211f5
--- /dev/null
+++ b/tc/m_gate.c
@@ -0,0 +1,533 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/* Copyright 2020 NXP */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include "list.h"
+#include <linux/tc_act/tc_gate.h>
+
+struct gate_entry {
+ struct list_head list;
+ uint8_t gate_state;
+ uint32_t interval;
+ int32_t ipv;
+ int32_t maxoctets;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct clockid_table {
+ const char *name;
+ clockid_t clockid;
+} clockt_map[] = {
+ { "REALTIME", CLOCK_REALTIME },
+ { "TAI", CLOCK_TAI },
+ { "BOOTTIME", CLOCK_BOOTTIME },
+ { "MONOTONIC", CLOCK_MONOTONIC },
+ { NULL }
+};
+
+static void explain(void)
+{
+ fprintf(stderr,
+ "Usage: gate [ priority PRIO-SPEC ] [ base-time BASE-TIME ]\n"
+ " [ cycle-time CYCLE-TIME ]\n"
+ " [ cycle-time-ext CYCLE-TIME-EXT ]\n"
+ " [ clockid CLOCKID ] [flags FLAGS]\n"
+ " [ sched-entry GATE0 INTERVAL [ INTERNAL-PRIO-VALUE MAX-OCTETS ] ]\n"
+ " [ sched-entry GATE1 INTERVAL [ INTERNAL-PRIO-VALUE MAX-OCTETS ] ]\n"
+ " ......\n"
+ " [ sched-entry GATEn INTERVAL [ INTERNAL-PRIO-VALUE MAX-OCTETS ] ]\n"
+ " [ CONTROL ]\n"
+ " GATEn := open | close\n"
+ " INTERVAL : nanoseconds period of gate slot\n"
+ " INTERNAL-PRIO-VALUE : internal priority deciding which\n"
+ " rx queue frames are directed to.\n"
+ " defaults to -1, which means wildcard.\n"
+ " MAX-OCTETS : maximum number of MSDU octets that are\n"
+ " permitted to pass the gate during the\n"
+ " specified TimeInterval.\n"
+ " defaults to -1, which means wildcard.\n"
+ " CONTROL := pipe | drop | continue | pass |\n"
+ " goto chain <CHAIN_INDEX>\n");
+}
+
+static void usage(void)
+{
+ explain();
+ exit(-1);
+}
+
+static void explain_entry_format(void)
+{
+ fprintf(stderr, "Usage: sched-entry <open | close> <interval> [ <interval ipv> <octets max bytes> ]\n");
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n);
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg);
+
+struct action_util gate_action_util = {
+ .id = "gate",
+ .parse_aopt = parse_gate,
+ .print_aopt = print_gate,
+};
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+ const struct clockid_table *c;
+
+ if (strcasestr(arg, "CLOCK_") != NULL)
+ arg += sizeof("CLOCK_") - 1;
+
+ for (c = clockt_map; c->name; c++) {
+ if (strcasecmp(c->name, arg) == 0) {
+ *val = c->clockid;
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static const char *get_clock_name(clockid_t clockid)
+{
+ const struct clockid_table *c;
+
+ for (c = clockt_map; c->name; c++) {
+ if (clockid == c->clockid)
+ return c->name;
+ }
+
+ return "invalid";
+}
+
+static int get_gate_state(__u8 *val, const char *arg)
+{
+ if (!strcasecmp("OPEN", arg)) {
+ *val = 1;
+ return 0;
+ }
+
+ if (!strcasecmp("CLOSE", arg)) {
+ *val = 0;
+ return 0;
+ }
+
+ return -1;
+}
+
+static struct gate_entry *create_gate_entry(uint8_t gate_state,
+ uint32_t interval,
+ int32_t ipv,
+ int32_t maxoctets)
+{
+ struct gate_entry *e;
+
+ e = calloc(1, sizeof(*e));
+ if (!e)
+ return NULL;
+
+ e->gate_state = gate_state;
+ e->interval = interval;
+ e->ipv = ipv;
+ e->maxoctets = maxoctets;
+
+ return e;
+}
+
+static int add_gate_list(struct list_head *gate_entries, struct nlmsghdr *n)
+{
+ struct gate_entry *e;
+
+ list_for_each_entry(e, gate_entries, list) {
+ struct rtattr *a;
+
+ a = addattr_nest(n, 1024, TCA_GATE_ONE_ENTRY | NLA_F_NESTED);
+
+ if (e->gate_state)
+ addattr(n, MAX_MSG, TCA_GATE_ENTRY_GATE);
+
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_INTERVAL,
+ &e->interval, sizeof(e->interval));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_IPV,
+ &e->ipv, sizeof(e->ipv));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_MAX_OCTETS,
+ &e->maxoctets, sizeof(e->maxoctets));
+
+ addattr_nest_end(n, a);
+ }
+
+ return 0;
+}
+
+static void free_entries(struct list_head *gate_entries)
+{
+ struct gate_entry *e, *n;
+
+ list_for_each_entry_safe(e, n, gate_entries, list) {
+ list_del(&e->list);
+ free(e);
+ }
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n)
+{
+ struct tc_gate parm = {.action = TC_ACT_PIPE};
+ struct list_head gate_entries;
+ __s32 clockid = CLOCKID_INVALID;
+ struct rtattr *tail, *nle;
+ char **argv = *argv_p;
+ int argc = *argc_p;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ int entry_num = 0;
+ char *invalidarg;
+ __u32 flags = 0;
+ int prio = -1;
+
+ int err;
+
+ if (matches(*argv, "gate") != 0)
+ return -1;
+
+ NEXT_ARG();
+ if (argc <= 0)
+ return -1;
+
+ INIT_LIST_HEAD(&gate_entries);
+
+ while (argc > 0) {
+ if (matches(*argv, "index") == 0) {
+ NEXT_ARG();
+ if (get_u32(&parm.index, *argv, 10)) {
+ invalidarg = "index";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "priority") == 0) {
+ NEXT_ARG();
+ if (get_s32(&prio, *argv, 0)) {
+ invalidarg = "priority";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "base-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&base_time, *argv, 10)) {
+ invalidarg = "base-time";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "cycle-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time, *argv, 10)) {
+ invalidarg = "cycle-time";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "cycle-time-ext") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time_ext, *argv, 10)) {
+ invalidarg = "cycle-time-ext";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "clockid") == 0) {
+ NEXT_ARG();
+ if (get_clockid(&clockid, *argv)) {
+ invalidarg = "clockid";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "flags") == 0) {
+ NEXT_ARG();
+ if (get_u32(&flags, *argv, 0)) {
+ invalidarg = "flags";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "sched-entry") == 0) {
+ struct gate_entry *e;
+ uint8_t gate_state = 0;
+ uint32_t interval = 0;
+ int32_t ipv = -1;
+ int32_t maxoctets = -1;
+ uint8_t backarg = 0;
+
+ NEXT_ARG();
+
+ if (get_gate_state(&gate_state, *argv)) {
+ explain_entry_format();
+ invalidarg = "sched-entry";
+ goto err_arg;
+ }
+
+ NEXT_ARG();
+
+ if (get_u32(&interval, *argv, 0)) {
+ explain_entry_format();
+ invalidarg = "sched-entry";
+ goto err_arg;
+ }
+
+ if (!NEXT_ARG_OK())
+ goto create_entry;
+
+ NEXT_ARG();
+
+ if (get_s32(&ipv, *argv, 0)) {
+ backarg++;
+ goto create_entry;
+ }
+
+ if (!gate_state)
+ ipv = -1;
+
+ if (!NEXT_ARG_OK())
+ goto create_entry;
+
+ NEXT_ARG();
+
+ if (get_s32(&maxoctets, *argv, 0))
+ backarg++;
+
+ if (!gate_state)
+ maxoctets = -1;
+
+create_entry:
+ e = create_gate_entry(gate_state, interval,
+ ipv, maxoctets);
+ if (!e) {
+ fprintf(stderr, "gate: not enough memory\n");
+ free_entries(&gate_entries);
+ exit(-1);
+ }
+
+ list_add_tail(&e->list, &gate_entries);
+ entry_num++;
+
+ while (backarg) {
+ PREV_ARG();
+ backarg--;
+ }
+
+ } else if (matches(*argv, "reclassify") == 0 ||
+ matches(*argv, "drop") == 0 ||
+ matches(*argv, "shot") == 0 ||
+ matches(*argv, "continue") == 0 ||
+ matches(*argv, "pass") == 0 ||
+ matches(*argv, "ok") == 0 ||
+ matches(*argv, "pipe") == 0 ||
+ matches(*argv, "goto") == 0) {
+ if (parse_action_control(&argc, &argv,
+ &parm.action, false)) {
+ free_entries(&gate_entries);
+ return -1;
+ }
+ } else if (matches(*argv, "help") == 0) {
+ usage();
+ } else {
+ break;
+ }
+
+ argc--;
+ argv++;
+ }
+
+ parse_action_control_dflt(&argc, &argv, &parm.action,
+ false, TC_ACT_PIPE);
+
+ if (!entry_num && !parm.index) {
+ fprintf(stderr, "gate: must add at least one entry\n");
+ exit(-1);
+ }
+
+ tail = addattr_nest(n, MAX_MSG, tca_id | NLA_F_NESTED);
+ addattr_l(n, MAX_MSG, TCA_GATE_PARMS, &parm, sizeof(parm));
+
+ if (prio != -1)
+ addattr_l(n, MAX_MSG, TCA_GATE_PRIORITY, &prio, sizeof(prio));
+
+ if (flags)
+ addattr_l(n, MAX_MSG, TCA_GATE_FLAGS, &flags, sizeof(flags));
+
+ if (base_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_BASE_TIME,
+ &base_time, sizeof(base_time));
+
+ if (cycle_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME,
+ &cycle_time, sizeof(cycle_time));
+
+ if (cycle_time_ext)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME_EXT,
+ &cycle_time_ext, sizeof(cycle_time_ext));
+
+ if (clockid != CLOCKID_INVALID)
+ addattr_l(n, MAX_MSG, TCA_GATE_CLOCKID,
+ &clockid, sizeof(clockid));
+
+ nle = addattr_nest(n, MAX_MSG, TCA_GATE_ENTRY_LIST | NLA_F_NESTED);
+ err = add_gate_list(&gate_entries, n);
+ if (err < 0) {
+ fprintf(stderr, "Could not add entries to netlink message\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ addattr_nest_end(n, nle);
+ addattr_nest_end(n, tail);
+ free_entries(&gate_entries);
+ *argc_p = argc;
+ *argv_p = argv;
+
+ return 0;
+err_arg:
+ invarg(invalidarg, *argv);
+ free_entries(&gate_entries);
+
+ exit(-1);
+}
+
+static int print_gate_list(struct rtattr *list)
+{
+ struct rtattr *item;
+ int rem;
+
+ rem = RTA_PAYLOAD(list);
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+
+ for (item = RTA_DATA(list);
+ RTA_OK(item, rem);
+ item = RTA_NEXT(item, rem)) {
+ struct rtattr *tb[TCA_GATE_ENTRY_MAX + 1];
+ __u32 index = 0, interval = 0;
+ __u8 gate_state = 0;
+ __s32 ipv = -1, maxoctets = -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_ENTRY_MAX, item);
+
+ if (tb[TCA_GATE_ENTRY_INDEX])
+ index = rta_getattr_u32(tb[TCA_GATE_ENTRY_INDEX]);
+
+ if (tb[TCA_GATE_ENTRY_GATE])
+ gate_state = 1;
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = rta_getattr_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ ipv = rta_getattr_s32(tb[TCA_GATE_ENTRY_IPV]);
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ maxoctets = rta_getattr_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+
+ print_uint(PRINT_ANY, "number", "\t number %4u", index);
+ print_string(PRINT_ANY, "gate state", "\tgate-state %-8s",
+ gate_state ? "open" : "close");
+
+ print_uint(PRINT_ANY, "interval", "\tinterval %-16u", interval);
+
+ if (ipv != -1)
+ print_uint(PRINT_ANY, "ipv", "\tipv %-8u", ipv);
+ else
+ print_string(PRINT_FP, "ipv", "\tipv %s", "wildcard");
+
+ if (maxoctets != -1)
+ print_uint(PRINT_ANY, "max_octets",
+ "\tmax-octets %-8u", maxoctets);
+ else
+ print_string(PRINT_FP, "max_octets",
+ "\tmax-octets %s", "wildcard");
+
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+ }
+
+ return 0;
+}
+
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+ struct tc_gate *parm;
+ struct rtattr *tb[TCA_GATE_MAX + 1];
+ __s32 clockid = CLOCKID_INVALID;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ int prio = -1;
+
+ if (arg == NULL)
+ return -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_MAX, arg);
+
+ if (!tb[TCA_GATE_PARMS]) {
+ fprintf(stderr, "Missing gate parameters\n");
+ return -1;
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ parm = RTA_DATA(tb[TCA_GATE_PARMS]);
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = rta_getattr_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (prio != -1)
+ print_int(PRINT_ANY, "priority", "\tpriority %-8d", prio);
+ else
+ print_string(PRINT_FP, "priority", "\tpriority %s", "wildcard");
+
+ if (tb[TCA_GATE_CLOCKID])
+ clockid = rta_getattr_s32(tb[TCA_GATE_CLOCKID]);
+ print_string(PRINT_ANY, "clockid", "\tclockid %s",
+ get_clock_name(clockid));
+
+ if (tb[TCA_GATE_FLAGS]) {
+ __u32 flags;
+
+ flags = rta_getattr_u32(tb[TCA_GATE_FLAGS]);
+ print_0xhex(PRINT_ANY, "flags", "\tflags %#x", flags);
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ if (tb[TCA_GATE_BASE_TIME])
+ base_time = rta_getattr_u64(tb[TCA_GATE_BASE_TIME]);
+
+ print_lluint(PRINT_ANY, "base_time", "\tbase-time %-22lld", base_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME])
+ cycle_time = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME]);
+
+ print_lluint(PRINT_ANY, "cycle_time",
+ "\tcycle-time %-16lld", cycle_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ cycle_time_ext = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ print_lluint(PRINT_ANY, "cycle_time_ext", "\tcycle-time-ext %-16lld",
+ cycle_time_ext);
+
+ if (tb[TCA_GATE_ENTRY_LIST])
+ print_gate_list(tb[TCA_GATE_ENTRY_LIST]);
+
+ print_action_control(f, "\t", parm->action, "");
+
+ print_uint(PRINT_ANY, "index", "\n\t index %u", parm->index);
+ print_int(PRINT_ANY, "ref", " ref %d", parm->refcnt);
+ print_int(PRINT_ANY, "bind", " bind %d", parm->bindcnt);
+
+ if (show_stats) {
+ if (tb[TCA_GATE_TM]) {
+ struct tcf_t *tm = RTA_DATA(tb[TCA_GATE_TM]);
+
+ print_tm(f, tm);
+ }
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ return 0;
+}
--
2.17.1

2020-05-03 06:58:11

by Po Liu

[permalink] [raw]
Subject: [v3,iproute2 2/2] iproute2: add gate action man page

This patch adds the man page for the tc gate action.

Signed-off-by: Po Liu <[email protected]>
---
man/man8/tc-gate.8 | 123 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 123 insertions(+)
create mode 100644 man/man8/tc-gate.8

diff --git a/man/man8/tc-gate.8 b/man/man8/tc-gate.8
new file mode 100644
index 00000000..0f48d7f3
--- /dev/null
+++ b/man/man8/tc-gate.8
@@ -0,0 +1,123 @@
+.TH GATE 8 "12 Mar 2020" "iproute2" "Linux"
+.SH NAME
+gate \- Stream Gate Action
+.SH SYNOPSIS
+.B tc " ... " action gate
+.ti +8
+.B [ base-time
+BASETIME ]
+.B [ clockid
+CLOCKID ]
+.ti +8
+.B sched-entry
+<gate state> <interval 1> [ <internal priority> <max octets> ]
+.ti +8
+.B sched-entry
+<gate state> <interval 2> [ <internal priority> <max octets> ]
+.ti +8
+.B sched-entry
+<gate state> <interval 3> [ <internal priority> <max octets> ]
+.ti +8
+.B ......
+.ti +8
+.B sched-entry
+<gate state> <interval N> [ <internal priority> <max octets> ]
+
+.SH DESCRIPTION
+The GATE action allows specified ingress frames to be passed at
+specific time slots, or dropped at specific time slots. A tc filter
+selects the ingress frames, then the tc gate action specifies in which
+time slots, and up to how many bytes, these frames may be passed to the
+device, and in which time slots frames are dropped.
+The gate action also takes a base-time that tells when the entry list
+starts; from that time on, the gate action repeats the gate entry list
+cyclically.
+For the software simulation, the gate action requires the user to
+assign a reference clock type.
+
+.SH PARAMETERS
+
+.TP
+base-time
+.br
+Specifies the instant in nanoseconds, defining the time when the schedule
+starts. If 'base-time' is a time in the past, the schedule will start at
+
+base-time + (N * cycle-time)
+
+where N is the smallest integer such that the resulting time is greater
+than "now", and "cycle-time" is the sum of all the intervals of the
+entries in the schedule. If base-time is not specified, it defaults to 0.
+
+.TP
+clockid
+.br
+Specifies the clock to be used by the action's internal timer for
+measuring time and scheduling events. Not valid if the gate action is
+used for an offloaded filter, for example a tc filter command with the
+.B skip_sw
+parameter.
+
+.TP
+sched-entry
+.br
+There may be multiple
+.B sched-entry
+parameters in a single schedule. Each one has the format:
+
+sched-entry <gate state> <interval> [ <internal priority> <max octets> ]
+
+.br
+<gate state> is the gate state: 'open' keeps the gate open, 'close'
+keeps the gate closed.
+.br
+<interval> is the duration of this time slot in nanoseconds.
+.br
+<internal priority> is the internal priority value, selecting the
+internal receiving queue for this stream. "-1" means wildcard.
+If <internal priority> is omitted, both it and <max octets> default to
+"-1" for this <sched-entry>.
+.br
+<max octets> is the maximum number of octets permitted to pass during
+this time slot; frames over the limit are dropped. "-1" means wildcard.
+If <max octets> is omitted, it defaults to "-1" for this <sched-entry>.
+.br
+Note that <internal priority> and <max octets> have no meaning when the
+gate state of a "sched-entry" is "close"; all frames are dropped while
+a "close" "sched-entry" is active.
+
+.SH EXAMPLES
+
+The following example shows a tc filter matching frames with source IP
+address 192.168.0.20. The gate stays open for 200ms, limiting the
+traffic to 8000000 octets in that sched-entry, then stays closed for
+100ms. Frames arriving while the gate is closed are dropped. The cycle
+then repeats the gate entries periodically. The schedule starts at
+instant 200.0s using the reference CLOCK_TAI, and is composed of two
+entries with a total cycle time of 300ms.
+
+.EX
+# tc qdisc add dev eth0 ingress
+# tc filter add dev eth0 parent ffff: protocol ip \\
+ flower skip_hw src_ip 192.168.0.20 \\
+ action gate index 2 clockid CLOCK_TAI \\
+ base-time 200000000000 \\
+ sched-entry open 200000000 -1 8000000 \\
+ sched-entry close 100000000
+
+.EE
+
+The following commands are an example of filtering a stream whose
+destination MAC address matches 10:00:80:00:00:00: its ICMP frames are
+dropped at any time, with a 200ms cycle, the default base-time of 0,
+and the default clockid CLOCK_TAI.
+
+.EX
+# tc qdisc add dev eth0 ingress
+# tc filter add dev eth0 parent ffff: protocol ip \\
+ flower ip_proto icmp dst_mac 10:00:80:00:00:00 \\
+ action gate index 12 sched-entry close 200000000
+
+.EE
+
+.SH AUTHORS
+Po Liu <[email protected]>
--
2.17.1

2020-05-05 00:08:54

by Stephen Hemminger

Subject: Re: [v3,iproute2 1/2] iproute2:tc:action: add a gate control action

On Sun, 3 May 2020 14:32:50 +0800
Po Liu <[email protected]> wrote:

> + print_string(PRINT_ANY, "gate state", "\tgate-state %-8s",

NAK
Space in a json tag is not valid.

Please run a dump command and feed it into a JSON validation checker, like Python's json module.

2020-05-05 00:09:42

by Stephen Hemminger

Subject: Re: [v3,iproute2 1/2] iproute2:tc:action: add a gate control action

On Sun, 3 May 2020 14:32:50 +0800
Po Liu <[email protected]> wrote:

> Introduce a ingress frame gate control flow action.
> Tc gate action does the work like this:
> Assume there is a gate allow specified ingress frames can pass at
> specific time slot, and also drop at specific time slot. Tc filter
> chooses the ingress frames, and tc gate action would specify what slot
> does these frames can be passed to device and what time slot would be
> dropped.
> Tc gate action would provide an entry list to tell how much time gate
> keep open and how much time gate keep state close. Gate action also
> assign a start time to tell when the entry list start. Then driver would
> repeat the gate entry list cyclically.
> For the software simulation, gate action require the user assign a time
> clock type.
>
> Below is the setting example in user space. Tc filter a stream source ip
> address is 192.168.0.20 and gate action own two time slots. One is last
> 200ms gate open let frame pass another is last 100ms gate close let
> frames dropped.
>
> # tc qdisc add dev eth0 ingress
> # tc filter add dev eth0 parent ffff: protocol ip \
>
> flower src_ip 192.168.0.20 \
> action gate index 2 clockid CLOCK_TAI \
> sched-entry open 200000000 -1 -1 \
> sched-entry close 100000000
>
> # tc chain del dev eth0 ingress chain 0
>
> "sched-entry" follow the name taprio style. Gate state is
> "open"/"close". Follow the period nanosecond. Then next -1 is internal
> priority value means which ingress queue should put to. "-1" means
> wildcard. The last value optional specifies the maximum number of
> MSDU octets that are permitted to pass the gate during the specified
> time interval.
>
> Below example shows filtering a stream with destination mac address is
> 10:00:80:00:00:00 and ip type is ICMP, follow the action gate. The gate
> action would run with one close time slot which means always keep close.
> The time cycle is total 200000000ns. The base-time would calculate by:
>
> 1357000000000 + (N + 1) * cycletime
>
> When the total value is the future time, it will be the start time.
> The cycletime here would be 200000000ns for this case.
>
> #tc filter add dev eth0 parent ffff: protocol ip \
> flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
> action gate index 12 base-time 1357000000000 \
> sched-entry CLOSE 200000000 \
> clockid CLOCK_TAI
>
> Signed-off-by: Po Liu <[email protected]>

These changes are specific to net-next and should be assigned to iproute2-next.
Will change delegation.

2020-05-05 09:30:29

by Po Liu

Subject: RE: Re: [v3,iproute2 1/2] iproute2:tc:action: add a gate control action

Hi Stephen,

> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: May 5, 2020 8:07
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> Claudiu Manoil <[email protected]>; Vladimir Oltean
> <[email protected]>; Alexandru Marginean
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]
> Subject: Re: [v3,iproute2 1/2] iproute2:tc:action: add a gate control
> action
>
> On Sun, 3 May 2020 14:32:50 +0800
> Po Liu <[email protected]> wrote:
>
> > + print_string(PRINT_ANY, "gate state", "\tgate-state %-8s",
>
> NAK
> Space in a json tag is not valid.
>
> Please run a dump command and feed it into JSON validation checker like
> Python.

I will test the json format and upload a new version of the patch set.

Thanks!
Br,
Po Liu

2020-05-06 11:25:49

by Po Liu

Subject: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate control action

Introduce an ingress frame gate control flow action.
The tc gate action works like this:
assume there is a gate that allows specified ingress frames to pass at
specific time slots and drops them at other time slots. A tc filter
selects the ingress frames, and the tc gate action specifies in which
time slots those frames may be passed to the device and in which time
slots they are dropped.
The tc gate action provides an entry list telling how long the gate
keeps open and how long it keeps closed. The gate action also assigns
a start time telling when the entry list starts, and the driver then
repeats the gate entry list cyclically.
For the software simulation, the gate action requires the user to
assign a clock type.

Below is a configuration example in user space. The tc filter matches a
stream whose source IP address is 192.168.0.20, and the gate action owns
two time slots: the first lasts 200ms with the gate open, letting frames
pass; the second lasts 100ms with the gate closed, dropping frames.

# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 parent ffff: protocol ip \

flower src_ip 192.168.0.20 \
action gate index 2 clockid CLOCK_TAI \
sched-entry open 200000000 -1 -1 \
sched-entry close 100000000

# tc chain del dev eth0 ingress chain 0

"sched-entry" follow the name taprio style. Gate state is
"open"/"close". Follow the period nanosecond. Then next -1 is internal
priority value means which ingress queue should put to. "-1" means
wildcard. The last value optional specifies the maximum number of
MSDU octets that are permitted to pass the gate during the specified
time interval.
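The driver repeats the gate entry list cyclically, as noted above. A minimal sketch of that replay logic, with illustrative struct and function names (not the kernel's act_gate implementation), could look like:

```c
#include <stdint.h>

/* Illustrative sketch of replaying a gate entry list cyclically.
 * Struct and function names are assumptions, not the kernel's. */
struct entry {
	int open;           /* 1 = gate open, 0 = gate closed */
	uint64_t interval;  /* slot duration in nanoseconds */
};

/* Fold `now` into the current cycle, then walk the entries to find
 * the active slot. Returns 1 if the gate is open at `now`. */
static int gate_is_open(const struct entry *e, int n,
			uint64_t base_time, uint64_t now)
{
	uint64_t cycle = 0, offset;
	int i;

	for (i = 0; i < n; i++)
		cycle += e[i].interval; /* cycle-time = sum of intervals */

	offset = (now - base_time) % cycle;
	for (i = 0; i < n; i++) {
		if (offset < e[i].interval)
			return e[i].open;
		offset -= e[i].interval;
	}
	return 0; /* not reached: offset < cycle */
}
```

With the 200ms-open/100ms-close example, frames arriving in the first 200ms of each 300ms cycle pass and the rest are dropped.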

The example below filters a stream whose destination MAC address is
10:00:80:00:00:00 and whose IP protocol is ICMP, followed by the gate
action. The gate action runs with a single "close" time slot, which
means the gate always stays closed. The total cycle time is 200000000ns.
The base-time is calculated by:

1357000000000 + (N + 1) * cycletime

When the resulting value lies in the future, it becomes the start time.
The cycletime here is 200000000ns for this case.
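The start-time adjustment described above can be sketched as follows; the function name is illustrative, not the kernel's:

```c
#include <stdint.h>

/* Sketch of the start-time calculation described above: if base-time is
 * already in the past, advance it by whole cycles until the result lies
 * in the future. The function name is an assumption for illustration. */
static uint64_t gate_start_time(uint64_t base_time, uint64_t cycle_time,
				uint64_t now)
{
	uint64_t n;

	if (base_time > now)
		return base_time;

	n = (now - base_time) / cycle_time;      /* complete cycles elapsed */
	return base_time + (n + 1) * cycle_time; /* first boundary after now */
}
```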

#tc filter add dev eth0 parent ffff: protocol ip \
flower skip_hw ip_proto icmp dst_mac 10:00:80:00:00:00 \
action gate index 12 base-time 1357000000000 \
sched-entry CLOSE 200000000 \
clockid CLOCK_TAI

Signed-off-by: Po Liu <[email protected]>
---
These patches continue the request to support iproute2 tc command input
for the gate action, now that the kernel patch has been applied
(a51c328df310 net: qos: introduce a gate control flow action).
Continued from version 3.

Changes from v2:
Flexible input for a time slot - sched-entry, suggested by Vladimir
Oltean and Vinicius Gomes:
- ipv and maxoctets in a sched-entry may be omitted, defaulting to
wildcard (value -1).

Changes from v3:
- Fix json 'gate state' output; update json format output: add an array
for 'sched-entry'
- Update handling of absent 'sched-entry' parameters


include/uapi/linux/pkt_cls.h | 1 +
include/uapi/linux/tc_act/tc_gate.h | 47 +++
tc/Makefile | 1 +
tc/m_gate.c | 556 ++++++++++++++++++++++++++++
4 files changed, 605 insertions(+)
create mode 100644 include/uapi/linux/tc_act/tc_gate.h
create mode 100644 tc/m_gate.c

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 9f06d29c..fc672b23 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -134,6 +134,7 @@ enum tca_id {
TCA_ID_CTINFO,
TCA_ID_MPLS,
TCA_ID_CT,
+ TCA_ID_GATE,
/* other actions go here */
__TCA_ID_MAX = 255
};
diff --git a/include/uapi/linux/tc_act/tc_gate.h b/include/uapi/linux/tc_act/tc_gate.h
new file mode 100644
index 00000000..f214b3a6
--- /dev/null
+++ b/include/uapi/linux/tc_act/tc_gate.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0+ WITH Linux-syscall-note */
+/* Copyright 2020 NXP */
+
+#ifndef __LINUX_TC_GATE_H
+#define __LINUX_TC_GATE_H
+
+#include <linux/pkt_cls.h>
+
+struct tc_gate {
+ tc_gen;
+};
+
+enum {
+ TCA_GATE_ENTRY_UNSPEC,
+ TCA_GATE_ENTRY_INDEX,
+ TCA_GATE_ENTRY_GATE,
+ TCA_GATE_ENTRY_INTERVAL,
+ TCA_GATE_ENTRY_IPV,
+ TCA_GATE_ENTRY_MAX_OCTETS,
+ __TCA_GATE_ENTRY_MAX,
+};
+#define TCA_GATE_ENTRY_MAX (__TCA_GATE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_ONE_ENTRY_UNSPEC,
+ TCA_GATE_ONE_ENTRY,
+ __TCA_GATE_ONE_ENTRY_MAX,
+};
+#define TCA_GATE_ONE_ENTRY_MAX (__TCA_GATE_ONE_ENTRY_MAX - 1)
+
+enum {
+ TCA_GATE_UNSPEC,
+ TCA_GATE_TM,
+ TCA_GATE_PARMS,
+ TCA_GATE_PAD,
+ TCA_GATE_PRIORITY,
+ TCA_GATE_ENTRY_LIST,
+ TCA_GATE_BASE_TIME,
+ TCA_GATE_CYCLE_TIME,
+ TCA_GATE_CYCLE_TIME_EXT,
+ TCA_GATE_FLAGS,
+ TCA_GATE_CLOCKID,
+ __TCA_GATE_MAX,
+};
+#define TCA_GATE_MAX (__TCA_GATE_MAX - 1)
+
+#endif
diff --git a/tc/Makefile b/tc/Makefile
index e31cbc12..79c9c1dd 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -54,6 +54,7 @@ TCMODULES += m_bpf.o
TCMODULES += m_tunnel_key.o
TCMODULES += m_sample.o
TCMODULES += m_ct.o
+TCMODULES += m_gate.o
TCMODULES += p_ip.o
TCMODULES += p_ip6.o
TCMODULES += p_icmp.o
diff --git a/tc/m_gate.c b/tc/m_gate.c
new file mode 100644
index 00000000..15d9995b
--- /dev/null
+++ b/tc/m_gate.c
@@ -0,0 +1,556 @@
+// SPDX-License-Identifier: (GPL-2.0+ OR BSD-3-Clause)
+/* Copyright 2020 NXP */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <string.h>
+#include <linux/if_ether.h>
+#include "utils.h"
+#include "rt_names.h"
+#include "tc_util.h"
+#include "list.h"
+#include <linux/tc_act/tc_gate.h>
+
+struct gate_entry {
+ struct list_head list;
+ uint8_t gate_state;
+ uint32_t interval;
+ int32_t ipv;
+ int32_t maxoctets;
+};
+
+#define CLOCKID_INVALID (-1)
+static const struct clockid_table {
+ const char *name;
+ clockid_t clockid;
+} clockt_map[] = {
+ { "REALTIME", CLOCK_REALTIME },
+ { "TAI", CLOCK_TAI },
+ { "BOOTTIME", CLOCK_BOOTTIME },
+ { "MONOTONIC", CLOCK_MONOTONIC },
+ { NULL }
+};
+
+static void explain(void)
+{
+ fprintf(stderr,
+ "Usage: gate [ priority PRIO-SPEC ] [ base-time BASE-TIME ]\n"
+ " [ cycle-time CYCLE-TIME ]\n"
+ " [ cycle-time-ext CYCLE-TIME-EXT ]\n"
+ " [ clockid CLOCKID ] [flags FLAGS]\n"
+ " [ sched-entry GATE0 INTERVAL [ INTERNAL-PRIO-VALUE MAX-OCTETS ] ]\n"
+ " [ sched-entry GATE1 INTERVAL [ INTERNAL-PRIO-VALUE MAX-OCTETS ] ]\n"
+ " ......\n"
+ " [ sched-entry GATEn INTERVAL [ INTERNAL-PRIO-VALUE MAX-OCTETS ] ]\n"
+ " [ CONTROL ]\n"
+ " GATEn := open | close\n"
+ " INTERVAL : nanoseconds period of gate slot\n"
+ " INTERNAL-PRIO-VALUE : internal priority decide which\n"
+ " rx queue number direct to.\n"
+ " default to be -1 which means wildcard.\n"
+ " MAX-OCTETS : maximum number of MSDU octets that are\n"
+ " permitted to pas the gate during the\n"
+ " specified TimeInterval.\n"
+ " default to be -1 which means wildcard.\n"
+ " CONTROL := pipe | drop | continue | pass |\n"
+ " goto chain <CHAIN_INDEX>\n");
+}
+
+static void usage(void)
+{
+ explain();
+ exit(-1);
+}
+
+static void explain_entry_format(void)
+{
+ fprintf(stderr, "Usage: sched-entry <open | close> <interval> [ <interval ipv> <octets max bytes> ]\n");
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n);
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg);
+
+struct action_util gate_action_util = {
+ .id = "gate",
+ .parse_aopt = parse_gate,
+ .print_aopt = print_gate,
+};
+
+static int get_clockid(__s32 *val, const char *arg)
+{
+ const struct clockid_table *c;
+
+ if (strcasestr(arg, "CLOCK_") != NULL)
+ arg += sizeof("CLOCK_") - 1;
+
+ for (c = clockt_map; c->name; c++) {
+ if (strcasecmp(c->name, arg) == 0) {
+ *val = c->clockid;
+ return 0;
+ }
+ }
+
+ return -1;
+}
+
+static const char *get_clock_name(clockid_t clockid)
+{
+ const struct clockid_table *c;
+
+ for (c = clockt_map; c->name; c++) {
+ if (clockid == c->clockid)
+ return c->name;
+ }
+
+ return "invalid";
+}
+
+static int get_gate_state(__u8 *val, const char *arg)
+{
+ if (!strcasecmp("OPEN", arg)) {
+ *val = 1;
+ return 0;
+ }
+
+ if (!strcasecmp("CLOSE", arg)) {
+ *val = 0;
+ return 0;
+ }
+
+ return -1;
+}
+
+static struct gate_entry *create_gate_entry(uint8_t gate_state,
+ uint32_t interval,
+ int32_t ipv,
+ int32_t maxoctets)
+{
+ struct gate_entry *e;
+
+ e = calloc(1, sizeof(*e));
+ if (!e)
+ return NULL;
+
+ e->gate_state = gate_state;
+ e->interval = interval;
+ e->ipv = ipv;
+ e->maxoctets = maxoctets;
+
+ return e;
+}
+
+static int add_gate_list(struct list_head *gate_entries, struct nlmsghdr *n)
+{
+ struct gate_entry *e;
+
+ list_for_each_entry(e, gate_entries, list) {
+ struct rtattr *a;
+
+ a = addattr_nest(n, MAX_MSG, TCA_GATE_ONE_ENTRY | NLA_F_NESTED);
+
+ if (e->gate_state)
+ addattr(n, MAX_MSG, TCA_GATE_ENTRY_GATE);
+
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_INTERVAL,
+ &e->interval, sizeof(e->interval));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_IPV,
+ &e->ipv, sizeof(e->ipv));
+ addattr_l(n, MAX_MSG, TCA_GATE_ENTRY_MAX_OCTETS,
+ &e->maxoctets, sizeof(e->maxoctets));
+
+ addattr_nest_end(n, a);
+ }
+
+ return 0;
+}
+
+static void free_entries(struct list_head *gate_entries)
+{
+ struct gate_entry *e, *n;
+
+ list_for_each_entry_safe(e, n, gate_entries, list) {
+ list_del(&e->list);
+ free(e);
+ }
+}
+
+static int parse_gate(struct action_util *a, int *argc_p, char ***argv_p,
+ int tca_id, struct nlmsghdr *n)
+{
+ struct tc_gate parm = {.action = TC_ACT_PIPE};
+ struct list_head gate_entries;
+ __s32 clockid = CLOCKID_INVALID;
+ struct rtattr *tail, *nle;
+ char **argv = *argv_p;
+ int argc = *argc_p;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ int entry_num = 0;
+ char *invalidarg;
+ __u32 flags = 0;
+ int prio = -1;
+
+ int err;
+
+ if (matches(*argv, "gate") != 0)
+ return -1;
+
+ NEXT_ARG();
+ if (argc <= 0)
+ return -1;
+
+ INIT_LIST_HEAD(&gate_entries);
+
+ while (argc > 0) {
+ if (matches(*argv, "index") == 0) {
+ NEXT_ARG();
+ if (get_u32(&parm.index, *argv, 10)) {
+ invalidarg = "index";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "priority") == 0) {
+ NEXT_ARG();
+ if (get_s32(&prio, *argv, 0)) {
+ invalidarg = "priority";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "base-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&base_time, *argv, 10)) {
+ invalidarg = "base-time";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "cycle-time") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time, *argv, 10)) {
+ invalidarg = "cycle-time";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "cycle-time-ext") == 0) {
+ NEXT_ARG();
+ if (get_u64(&cycle_time_ext, *argv, 10)) {
+ invalidarg = "cycle-time-ext";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "clockid") == 0) {
+ NEXT_ARG();
+ if (get_clockid(&clockid, *argv)) {
+ invalidarg = "clockid";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "flags") == 0) {
+ NEXT_ARG();
+ if (get_u32(&flags, *argv, 0)) {
+ invalidarg = "flags";
+ goto err_arg;
+ }
+ } else if (matches(*argv, "sched-entry") == 0) {
+ struct gate_entry *e;
+ uint8_t gate_state = 0;
+ uint32_t interval = 0;
+ int32_t ipv = -1;
+ int32_t maxoctets = -1;
+
+ if (!NEXT_ARG_OK()) {
+ explain_entry_format();
+ fprintf(stderr, "\"sched-entry\" is imcomplete\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ NEXT_ARG();
+
+ if (get_gate_state(&gate_state, *argv)) {
+ explain_entry_format();
+ fprintf(stderr, "\"sched-entry\" is imcomplete\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ if (!NEXT_ARG_OK()) {
+ explain_entry_format();
+ fprintf(stderr, "\"sched-entry\" is imcomplete\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ NEXT_ARG();
+
+ if (get_u32(&interval, *argv, 0)) {
+ explain_entry_format();
+ fprintf(stderr, "\"sched-entry\" is imcomplete\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ if (!NEXT_ARG_OK())
+ goto create_entry;
+
+ NEXT_ARG();
+
+ if (get_s32(&ipv, *argv, 0)) {
+ PREV_ARG();
+ goto create_entry;
+ }
+
+ if (!gate_state)
+ ipv = -1;
+
+ if (!NEXT_ARG_OK())
+ goto create_entry;
+
+ NEXT_ARG();
+
+ if (get_s32(&maxoctets, *argv, 0))
+ PREV_ARG();
+
+ if (!gate_state)
+ maxoctets = -1;
+
+create_entry:
+ e = create_gate_entry(gate_state, interval,
+ ipv, maxoctets);
+ if (!e) {
+ fprintf(stderr, "gate: not enough memory\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ list_add_tail(&e->list, &gate_entries);
+ entry_num++;
+
+ } else if (matches(*argv, "reclassify") == 0 ||
+ matches(*argv, "drop") == 0 ||
+ matches(*argv, "shot") == 0 ||
+ matches(*argv, "continue") == 0 ||
+ matches(*argv, "pass") == 0 ||
+ matches(*argv, "ok") == 0 ||
+ matches(*argv, "pipe") == 0 ||
+ matches(*argv, "goto") == 0) {
+ if (parse_action_control(&argc, &argv,
+ &parm.action, false)) {
+ free_entries(&gate_entries);
+ return -1;
+ }
+ } else if (matches(*argv, "help") == 0) {
+ usage();
+ } else {
+ break;
+ }
+
+ argc--;
+ argv++;
+ }
+
+ parse_action_control_dflt(&argc, &argv, &parm.action,
+ false, TC_ACT_PIPE);
+
+ if (!entry_num && !parm.index) {
+ fprintf(stderr, "gate: must add at least one entry\n");
+ return -1;
+ }
+
+ tail = addattr_nest(n, MAX_MSG, tca_id | NLA_F_NESTED);
+ addattr_l(n, MAX_MSG, TCA_GATE_PARMS, &parm, sizeof(parm));
+
+ if (prio != -1)
+ addattr_l(n, MAX_MSG, TCA_GATE_PRIORITY, &prio, sizeof(prio));
+
+ if (flags)
+ addattr_l(n, MAX_MSG, TCA_GATE_FLAGS, &flags, sizeof(flags));
+
+ if (base_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_BASE_TIME,
+ &base_time, sizeof(base_time));
+
+ if (cycle_time)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME,
+ &cycle_time, sizeof(cycle_time));
+
+ if (cycle_time_ext)
+ addattr_l(n, MAX_MSG, TCA_GATE_CYCLE_TIME_EXT,
+ &cycle_time_ext, sizeof(cycle_time_ext));
+
+ if (clockid != CLOCKID_INVALID)
+ addattr_l(n, MAX_MSG, TCA_GATE_CLOCKID,
+ &clockid, sizeof(clockid));
+
+ nle = addattr_nest(n, MAX_MSG, TCA_GATE_ENTRY_LIST | NLA_F_NESTED);
+ err = add_gate_list(&gate_entries, n);
+ if (err < 0) {
+ fprintf(stderr, "Could not add entries to netlink message\n");
+ free_entries(&gate_entries);
+ return -1;
+ }
+
+ addattr_nest_end(n, nle);
+ addattr_nest_end(n, tail);
+ free_entries(&gate_entries);
+ *argc_p = argc;
+ *argv_p = argv;
+
+ return 0;
+err_arg:
+ invarg(invalidarg, *argv);
+ free_entries(&gate_entries);
+
+ return -1;
+}
+
+static int print_gate_list(struct rtattr *list)
+{
+ struct rtattr *item;
+ int rem;
+
+ rem = RTA_PAYLOAD(list);
+
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+ print_string(PRINT_FP, NULL, "\tschedule:%s", _SL_);
+ open_json_array(PRINT_JSON, "schedule");
+
+ for (item = RTA_DATA(list);
+ RTA_OK(item, rem);
+ item = RTA_NEXT(item, rem)) {
+ struct rtattr *tb[TCA_GATE_ENTRY_MAX + 1];
+ __u32 index = 0, interval = 0;
+ __u8 gate_state = 0;
+ __s32 ipv = -1, maxoctets = -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_ENTRY_MAX, item);
+
+ if (tb[TCA_GATE_ENTRY_INDEX])
+ index = rta_getattr_u32(tb[TCA_GATE_ENTRY_INDEX]);
+
+ if (tb[TCA_GATE_ENTRY_GATE])
+ gate_state = 1;
+
+ if (tb[TCA_GATE_ENTRY_INTERVAL])
+ interval = rta_getattr_u32(tb[TCA_GATE_ENTRY_INTERVAL]);
+
+ if (tb[TCA_GATE_ENTRY_IPV])
+ ipv = rta_getattr_s32(tb[TCA_GATE_ENTRY_IPV]);
+
+ if (tb[TCA_GATE_ENTRY_MAX_OCTETS])
+ maxoctets = rta_getattr_s32(tb[TCA_GATE_ENTRY_MAX_OCTETS]);
+
+ open_json_object(NULL);
+ print_uint(PRINT_ANY, "number", "\t number %4u", index);
+ print_string(PRINT_ANY, "gate_state", "\tgate-state %-8s",
+ gate_state ? "open" : "close");
+
+ print_uint(PRINT_ANY, "interval", "\tinterval %-16u", interval);
+
+ if (ipv != -1) {
+ print_uint(PRINT_ANY, "ipv", "\tipv %-8u", ipv);
+ } else {
+ print_int(PRINT_JSON, "ipv", NULL, ipv);
+ print_string(PRINT_FP, NULL, "\tipv %s", "wildcard");
+ }
+
+ if (maxoctets != -1) {
+ print_uint(PRINT_ANY, "max_octets",
+ "\tmax-octets %-8u", maxoctets);
+ } else {
+ print_string(PRINT_FP, NULL,
+ "\tmax-octets %s", "wildcard");
+ print_int(PRINT_JSON, "max_octets", NULL, maxoctets);
+ }
+
+ close_json_object();
+ print_string(PRINT_FP, NULL, "%s", _SL_);
+ }
+
+ close_json_array(PRINT_ANY, "");
+
+ return 0;
+}
+
+static int print_gate(struct action_util *au, FILE *f, struct rtattr *arg)
+{
+ struct tc_gate *parm;
+ struct rtattr *tb[TCA_GATE_MAX + 1];
+ __s32 clockid = CLOCKID_INVALID;
+ __u64 base_time = 0;
+ __u64 cycle_time = 0;
+ __u64 cycle_time_ext = 0;
+ int prio = -1;
+
+ if (arg == NULL)
+ return -1;
+
+ parse_rtattr_nested(tb, TCA_GATE_MAX, arg);
+
+ if (!tb[TCA_GATE_PARMS]) {
+ fprintf(stderr, "Missing gate parameters\n");
+ return -1;
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ parm = RTA_DATA(tb[TCA_GATE_PARMS]);
+
+ if (tb[TCA_GATE_PRIORITY])
+ prio = rta_getattr_s32(tb[TCA_GATE_PRIORITY]);
+
+ if (prio != -1) {
+ print_int(PRINT_ANY, "priority", "\tpriority %-8d", prio);
+ } else {
+ print_string(PRINT_FP, NULL, "\tpriority %s", "wildcard");
+ print_int(PRINT_JSON, "priority", NULL, prio);
+ }
+
+ if (tb[TCA_GATE_CLOCKID])
+ clockid = rta_getattr_s32(tb[TCA_GATE_CLOCKID]);
+ print_string(PRINT_ANY, "clockid", "\tclockid %s",
+ get_clock_name(clockid));
+
+ if (tb[TCA_GATE_FLAGS]) {
+ __u32 flags;
+
+ flags = rta_getattr_u32(tb[TCA_GATE_FLAGS]);
+ print_0xhex(PRINT_ANY, "flags", "\tflags %#x", flags);
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ if (tb[TCA_GATE_BASE_TIME])
+ base_time = rta_getattr_u64(tb[TCA_GATE_BASE_TIME]);
+
+ print_lluint(PRINT_ANY, "base_time", "\tbase-time %-22lld", base_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME])
+ cycle_time = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME]);
+
+ print_lluint(PRINT_ANY, "cycle_time",
+ "\tcycle-time %-16lld", cycle_time);
+
+ if (tb[TCA_GATE_CYCLE_TIME_EXT])
+ cycle_time_ext = rta_getattr_u64(tb[TCA_GATE_CYCLE_TIME_EXT]);
+
+ print_lluint(PRINT_ANY, "cycle_time_ext", "\tcycle-time-ext %-16lld",
+ cycle_time_ext);
+
+ if (tb[TCA_GATE_ENTRY_LIST])
+ print_gate_list(tb[TCA_GATE_ENTRY_LIST]);
+
+ print_action_control(f, "\t", parm->action, "");
+
+ print_uint(PRINT_ANY, "index", "\n\t index %u", parm->index);
+ print_int(PRINT_ANY, "ref", " ref %d", parm->refcnt);
+ print_int(PRINT_ANY, "bind", " bind %d", parm->bindcnt);
+
+ if (show_stats) {
+ if (tb[TCA_GATE_TM]) {
+ struct tcf_t *tm = RTA_DATA(tb[TCA_GATE_TM]);
+
+ print_tm(f, tm);
+ }
+ }
+
+ print_string(PRINT_FP, NULL, "%s", "\n");
+
+ return 0;
+}
--
2.17.1

2020-05-06 15:23:45

by Stephen Hemminger

Subject: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate control action

On Wed, 6 May 2020 16:40:19 +0800
Po Liu <[email protected]> wrote:

> } else if (matches(*argv, "base-time") == 0) {
> + NEXT_ARG();
> + if (get_u64(&base_time, *argv, 10)) {
> + invalidarg = "base-time";
> + goto err_arg;
> + }
> + } else if (matches(*argv, "cycle-time") == 0) {
> + NEXT_ARG();
> + if (get_u64(&cycle_time, *argv, 10)) {
> + invalidarg = "cycle-time";
> + goto err_arg;
> + }
> + } else if (matches(*argv, "cycle-time-ext") == 0) {
> + NEXT_ARG();
> + if (get_u64(&cycle_time_ext, *argv, 10)) {
> + invalidarg = "cycle-time-ext";
> + goto err_arg;
> + }

Could all these time values use existing TC helper routines?
See get_time(). The way you have it makes sense for hardware
but stands out versus the rest of tc.

It may be that the kernel UAPI is wrong and should be using the same
time units as the rest of tc. I forgot to review that part of the patch.

2020-05-06 23:04:25

by Davide Caratti

Subject: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate control action

On Wed, 2020-05-06 at 16:40 +0800, Po Liu wrote:
> Introduce a ingress frame gate control flow action.
[...]

hello Po Liu,

[...]

> +create_entry:
> + e = create_gate_entry(gate_state, interval,
> + ipv, maxoctets);
> + if (!e) {
> + fprintf(stderr, "gate: not enough memory\n");
> + free_entries(&gate_entries);
> + return -1;
> + }
> +
> + list_add_tail(&e->list, &gate_entries);
> + entry_num++;
> +
> + } else if (matches(*argv, "reclassify") == 0 ||
> + matches(*argv, "drop") == 0 ||
> + matches(*argv, "shot") == 0 ||
> + matches(*argv, "continue") == 0 ||
> + matches(*argv, "pass") == 0 ||
> + matches(*argv, "ok") == 0 ||
> + matches(*argv, "pipe") == 0 ||
> + matches(*argv, "goto") == 0) {
> + if (parse_action_control(&argc, &argv,
> + &parm.action, false)) {
> + free_entries(&gate_entries);
> + return -1;
> + }
> + } else if (matches(*argv, "help") == 0) {
> + usage();
> + } else {
> + break;
> + }
> +
> + argc--;
> + argv++;
> + }
> +
> + parse_action_control_dflt(&argc, &argv, &parm.action,
> + false, TC_ACT_PIPE);

it seems that the control action is parsed twice, and the first time it
does not allow "jump" and "trap". Is that intentional? IOW, are there some
"act_gate" configurations that don't allow jump or trap?

I don't see anything similar in kernel act_gate.c, where tcf_gate_act()
can return TC_ACT_SHOT or whatever is written in parm.action. That's why
I'm asking, if these two control actions are forbidden you should let the
kernel return -EINVAL with a proper extack in tcf_gate_init(). Can you
please clarify?

thank you in advance!
--
davide


2020-05-07 03:23:17

by Po Liu

Subject: RE: [EXT] Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate control action

Hi Davide,


> -----Original Message-----
> From: Davide Caratti <[email protected]>
> Sent: May 6, 2020 20:54
> To: Po Liu <[email protected]>; [email protected]; linux-
> [email protected]; [email protected]
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>
> Subject: [EXT] Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a
> gate control action
>
> Caution: EXT Email
>
> On Wed, 2020-05-06 at 16:40 +0800, Po Liu wrote:
> > Introduce a ingress frame gate control flow action.
> [...]
>
> hello Po Liu,
>
> [...]
>
> > +create_entry:
> > + e = create_gate_entry(gate_state, interval,
> > + ipv, maxoctets);
> > + if (!e) {
> > + fprintf(stderr, "gate: not enough memory\n");
> > + free_entries(&gate_entries);
> > + return -1;
> > + }
> > +
> > + list_add_tail(&e->list, &gate_entries);
> > + entry_num++;
> > +
> > + } else if (matches(*argv, "reclassify") == 0 ||
> > + matches(*argv, "drop") == 0 ||
> > + matches(*argv, "shot") == 0 ||
> > + matches(*argv, "continue") == 0 ||
> > + matches(*argv, "pass") == 0 ||
> > + matches(*argv, "ok") == 0 ||
> > + matches(*argv, "pipe") == 0 ||
> > + matches(*argv, "goto") == 0) {
> > + if (parse_action_control(&argc, &argv,
> > + &parm.action, false)) {
> > + free_entries(&gate_entries);
> > + return -1;
> > + }
> > + } else if (matches(*argv, "help") == 0) {
> > + usage();
> > + } else {
> > + break;
> > + }
> > +
> > + argc--;
> > + argv++;
> > + }
> > +
> > + parse_action_control_dflt(&argc, &argv, &parm.action,
> > + false, TC_ACT_PIPE);
>
> it seems that the control action is parsed twice, and the first time it does
> not allow "jump" and "trap". Is that intentional? IOW, are there some
> "act_gate" configurations that don't allow jump or trap?

Jump and trap are allowed. I didn't notice it was parsed twice. I will correct this and remove one of the parse_action_control() calls.
Thanks a lot!

>
> I don't see anything similar in kernel act_gate.c, where tcf_gate_act() can
> return TC_ACT_SHOT or whatever is written in parm.action. That's why I'm
> asking, if these two control actions are forbidden you should let the kernel
> return -EINVAL with a proper extack in tcf_gate_init(). Can you please
> clarify?
>
> thank you in advance!
> --
> davide
>


Br,
Po Liu

2020-05-07 03:25:38

by Po Liu

Subject: RE: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate control action

Hi Stephen,


> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: May 6, 2020 23:22
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>
> Subject: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a
> gate control action
> On Wed, 6 May 2020 16:40:19 +0800
> Po Liu <[email protected]> wrote:
>
> > } else if (matches(*argv, "base-time") == 0) {
> > + NEXT_ARG();
> > + if (get_u64(&base_time, *argv, 10)) {
> > + invalidarg = "base-time";
> > + goto err_arg;
> > + }
> > + } else if (matches(*argv, "cycle-time") == 0) {
> > + NEXT_ARG();
> > + if (get_u64(&cycle_time, *argv, 10)) {
> > + invalidarg = "cycle-time";
> > + goto err_arg;
> > + }
> > + } else if (matches(*argv, "cycle-time-ext") == 0) {
> > + NEXT_ARG();
> > + if (get_u64(&cycle_time_ext, *argv, 10)) {
> > + invalidarg = "cycle-time-ext";
> > + goto err_arg;
> > + }
>
> Could all these time values use existing TC helper routines?

I agree to keep the tc routines for input.
The names and types of the timer inputs mostly follow the taprio input.

> See get_time(). The way you have it makes sense for hardware but stands
> out versus the rest of tc.
>
> It maybe that the kernel UAPI is wrong, and should be using same time
> units as rest of tc. Forgot to review that part of the patch.

I would also sync with kernel UAPI if needed.


Br,
Po Liu

2020-05-07 12:33:32

by Po Liu

Subject: RE: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate control action

Hi Stephen,

> -----Original Message-----
> From: Po Liu
> Sent: May 7, 2020 10:53
> To: Stephen Hemminger <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>
> Subject: RE: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a gate
> control action
>
> Hi Stephen,
>
>
> > -----Original Message-----
> > From: Stephen Hemminger <[email protected]>
> > Sent: May 6, 2020 23:22
> > To: Po Liu <[email protected]>
> > Cc: [email protected]; [email protected];
> > [email protected]; [email protected];
> [email protected];
> > [email protected]; Claudiu Manoil <[email protected]>; Vladimir
> > Oltean <[email protected]>; Alexandru Marginean
> > <[email protected]>
> > Subject: Re: [v4,iproute2-next 1/2] iproute2-next:tc:action: add a
> > gate control action On Wed, 6 May 2020 16:40:19 +0800 Po Liu
> > <[email protected]> wrote:
> >
> > > } else if (matches(*argv, "base-time") == 0) {
> > > + NEXT_ARG();
> > > + if (get_u64(&base_time, *argv, 10)) {
> > > + invalidarg = "base-time";
> > > + goto err_arg;
> > > + }
> > > + } else if (matches(*argv, "cycle-time") == 0) {
> > > + NEXT_ARG();
> > > + if (get_u64(&cycle_time, *argv, 10)) {
> > > + invalidarg = "cycle-time";
> > > + goto err_arg;
> > > + }
> > > + } else if (matches(*argv, "cycle-time-ext") == 0) {
> > > + NEXT_ARG();
> > > + if (get_u64(&cycle_time_ext, *argv, 10)) {
> > > + invalidarg = "cycle-time-ext";
> > > + goto err_arg;
> > > + }
> >
> > Could all these time values use existing TC helper routines?
>
> I agree to keep the tc routines input.
> The names of timer input and type is more reference the taprio input.
>

Shall I support both input methods? A bare decimal input like 120000 would default to nanoseconds, and the standard time routines would also accept values like 120us.
The tc show command would then print the standard time format (for example 120us) in non-JSON output. JSON output would show a plain decimal number only, as other tc commands already do.

This would stay compatible with the command examples in the kernel commits, and I would note the time-routine input support in the man pages.
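For illustration, the two input styles under discussion (a bare decimal meaning nanoseconds, and a suffixed value such as 120us) can be sketched in a few lines. This is a standalone model of the idea only; it does not reproduce the actual iproute2 get_u64()/get_time() helpers, and the function name is invented for the sketch:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Accept either a plain decimal value (interpreted as nanoseconds) or a
 * value with a time suffix such as "120us". Returns 0 on success. */
int parse_time_ns(const char *arg, uint64_t *ns)
{
	char *end;
	uint64_t val = strtoull(arg, &end, 10);

	if (end == arg)
		return -1;			/* no digits at all */
	if (*end == '\0' || !strcmp(end, "ns"))
		*ns = val;			/* bare number: nanoseconds */
	else if (!strcmp(end, "us"))
		*ns = val * 1000ULL;
	else if (!strcmp(end, "ms"))
		*ns = val * 1000000ULL;
	else if (!strcmp(end, "s"))
		*ns = val * 1000000000ULL;
	else
		return -1;			/* unknown suffix */
	return 0;
}
```

With this shape, "120000" and "120us" both yield 120000 ns, which is the compatibility property described above.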

> > See get_time(). The way you have it makes sense for hardware but
> > stands out versus the rest of tc.
> >
> > It may be that the kernel UAPI is wrong and should be using the same
> > time units as the rest of tc. I forgot to review that part of the patch.
>
> I would also sync with kernel UAPI if needed.

I checked the gate UAPI file; nothing needs to change there for the time format.

>
>
> Br,
> Po Liu



Br,
Po Liu

2020-06-19 06:03:43

by Po Liu

Subject: [v2,net-next] net: qos offload add flow status with dropped count

From: Po Liu <[email protected]>

This patch adds a dropped-frames counter to tc flower offloading.
Reporting hardware-dropped frames is necessary for some actions:
actions such as police and the upcoming stream gate action can drop
frames, and users need to see those drops. The status update now shows
both how many packets matched the filter and how many of them were
dropped.

v2: Changes
- Updated commit comments as suggested by Jiri Pirko.

Signed-off-by: Po Liu <[email protected]>
---
This patch continues the thread [email protected]

drivers/net/dsa/sja1105/sja1105_vl.c | 2 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
.../net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c | 2 +-
drivers/net/ethernet/freescale/enetc/enetc_qos.c | 7 +++++--
drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 2 +-
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++--
drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 2 +-
drivers/net/ethernet/mscc/ocelot_flower.c | 2 +-
drivers/net/ethernet/netronome/nfp/flower/offload.c | 2 +-
drivers/net/ethernet/netronome/nfp/flower/qos_conf.c | 2 +-
include/net/act_api.h | 11 ++++++-----
include/net/flow_offload.h | 5 ++++-
include/net/pkt_cls.h | 5 +++--
net/sched/act_api.c | 10 ++++------
net/sched/act_ct.c | 6 +++---
net/sched/act_gact.c | 7 ++++---
net/sched/act_gate.c | 6 +++---
net/sched/act_mirred.c | 6 +++---
net/sched/act_pedit.c | 6 +++---
net/sched/act_police.c | 4 ++--
net/sched/act_skbedit.c | 5 +++--
net/sched/act_vlan.c | 6 +++---
net/sched/cls_flower.c | 1 +
net/sched/cls_matchall.c | 3 ++-
25 files changed, 60 insertions(+), 50 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105_vl.c b/drivers/net/dsa/sja1105/sja1105_vl.c
index bdfd6c4e190d..9ddc49b7eb8f 100644
--- a/drivers/net/dsa/sja1105/sja1105_vl.c
+++ b/drivers/net/dsa/sja1105/sja1105_vl.c
@@ -771,7 +771,7 @@ int sja1105_vl_stats(struct sja1105_private *priv, int port,

pkts = timingerr + unreleased + lengtherr;

- flow_stats_update(stats, 0, pkts - rule->vl.stats.pkts,
+ flow_stats_update(stats, 0, pkts - rule->vl.stats.pkts, 0,
jiffies - rule->vl.stats.lastused,
FLOW_ACTION_HW_STATS_IMMEDIATE);

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
index 0eef4f5e4a46..4d482d75a20b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c
@@ -1638,7 +1638,7 @@ static int bnxt_tc_get_flow_stats(struct bnxt *bp,
lastused = flow->lastused;
spin_unlock(&flow->stats_lock);

- flow_stats_update(&tc_flow_cmd->stats, stats.bytes, stats.packets,
+ flow_stats_update(&tc_flow_cmd->stats, stats.bytes, stats.packets, 0,
lastused, FLOW_ACTION_HW_STATS_DELAYED);
return 0;
}
diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
index 4a5fa9eba0b6..030de20a5d27 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c
@@ -902,7 +902,7 @@ int cxgb4_tc_flower_stats(struct net_device *dev,
if (ofld_stats->prev_packet_count != packets)
ofld_stats->last_used = jiffies;
flow_stats_update(&cls->stats, bytes - ofld_stats->byte_count,
- packets - ofld_stats->packet_count,
+ packets - ofld_stats->packet_count, 0,
ofld_stats->last_used,
FLOW_ACTION_HW_STATS_IMMEDIATE);

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c
index c88c47a14fbb..c439b5bce9c9 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c
@@ -346,7 +346,7 @@ int cxgb4_tc_matchall_stats(struct net_device *dev,
flow_stats_update(&cls_matchall->stats,
bytes - tc_port_matchall->ingress.bytes,
packets - tc_port_matchall->ingress.packets,
- tc_port_matchall->ingress.last_used,
+ 0, tc_port_matchall->ingress.last_used,
FLOW_ACTION_HW_STATS_IMMEDIATE);

tc_port_matchall->ingress.packets = packets;
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index fd3df19eaa32..fb76903eca90 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -1291,12 +1291,15 @@ static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,

spin_lock(&epsfp.psfp_lock);
stats.pkts = counters.matching_frames_count - filter->stats.pkts;
+ stats.drops = counters.not_passing_frames_count -
+ filter->stats.drops;
stats.lastused = filter->stats.lastused;
filter->stats.pkts += stats.pkts;
+ filter->stats.drops += stats.drops;
spin_unlock(&epsfp.psfp_lock);

- flow_stats_update(&f->stats, 0x0, stats.pkts, stats.lastused,
- FLOW_ACTION_HW_STATS_DELAYED);
+ flow_stats_update(&f->stats, 0x0, stats.pkts, stats.drops,
+ stats.lastused, FLOW_ACTION_HW_STATS_DELAYED);

return 0;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
index 430025550fad..c7107da03212 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c
@@ -672,7 +672,7 @@ mlx5_tc_ct_block_flow_offload_stats(struct mlx5_ct_ft *ft,
return -ENOENT;

mlx5_fc_query_cached(entry->counter, &bytes, &packets, &lastuse);
- flow_stats_update(&f->stats, bytes, packets, lastuse,
+ flow_stats_update(&f->stats, bytes, packets, 0, lastuse,
FLOW_ACTION_HW_STATS_DELAYED);

return 0;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index 7fc84f58e28a..bc9c0ac15f99 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -4828,7 +4828,7 @@ int mlx5e_stats_flower(struct net_device *dev, struct mlx5e_priv *priv,
no_peer_counter:
mlx5_devcom_release_peer_data(devcom, MLX5_DEVCOM_ESW_OFFLOADS);
out:
- flow_stats_update(&f->stats, bytes, packets, lastuse,
+ flow_stats_update(&f->stats, bytes, packets, 0, lastuse,
FLOW_ACTION_HW_STATS_DELAYED);
trace_mlx5e_stats_flower(f);
errout:
@@ -4946,7 +4946,7 @@ void mlx5e_tc_stats_matchall(struct mlx5e_priv *priv,
dpkts = cur_stats.rx_packets - rpriv->prev_vf_vport_stats.rx_packets;
dbytes = cur_stats.rx_bytes - rpriv->prev_vf_vport_stats.rx_bytes;
rpriv->prev_vf_vport_stats = cur_stats;
- flow_stats_update(&ma->stats, dbytes, dpkts, jiffies,
+ flow_stats_update(&ma->stats, dbytes, dpkts, 0, jiffies,
FLOW_ACTION_HW_STATS_DELAYED);
}

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index 51e1b3930c56..61d21043d83a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -633,7 +633,7 @@ int mlxsw_sp_flower_stats(struct mlxsw_sp *mlxsw_sp,
if (err)
goto err_rule_get_stats;

- flow_stats_update(&f->stats, bytes, packets, lastuse, used_hw_stats);
+ flow_stats_update(&f->stats, bytes, packets, 0, lastuse, used_hw_stats);

mlxsw_sp_acl_ruleset_put(mlxsw_sp, ruleset);
return 0;
diff --git a/drivers/net/ethernet/mscc/ocelot_flower.c b/drivers/net/ethernet/mscc/ocelot_flower.c
index 5ce172e22b43..c90bafbd651f 100644
--- a/drivers/net/ethernet/mscc/ocelot_flower.c
+++ b/drivers/net/ethernet/mscc/ocelot_flower.c
@@ -244,7 +244,7 @@ int ocelot_cls_flower_stats(struct ocelot *ocelot, int port,
if (ret)
return ret;

- flow_stats_update(&f->stats, 0x0, ace.stats.pkts, 0x0,
+ flow_stats_update(&f->stats, 0x0, ace.stats.pkts, 0, 0x0,
FLOW_ACTION_HW_STATS_IMMEDIATE);
return 0;
}
diff --git a/drivers/net/ethernet/netronome/nfp/flower/offload.c b/drivers/net/ethernet/netronome/nfp/flower/offload.c
index 695d24b9dd92..234c652700e1 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/offload.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/offload.c
@@ -1491,7 +1491,7 @@ nfp_flower_get_stats(struct nfp_app *app, struct net_device *netdev,
nfp_flower_update_merge_stats(app, nfp_flow);

flow_stats_update(&flow->stats, priv->stats[ctx_id].bytes,
- priv->stats[ctx_id].pkts, priv->stats[ctx_id].used,
+ priv->stats[ctx_id].pkts, 0, priv->stats[ctx_id].used,
FLOW_ACTION_HW_STATS_DELAYED);

priv->stats[ctx_id].pkts = 0;
diff --git a/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c b/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c
index d18a830e4264..bb327d48d1ab 100644
--- a/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c
+++ b/drivers/net/ethernet/netronome/nfp/flower/qos_conf.c
@@ -319,7 +319,7 @@ nfp_flower_stats_rate_limiter(struct nfp_app *app, struct net_device *netdev,
prev_stats->bytes = curr_stats->bytes;
spin_unlock_bh(&fl_priv->qos_stats_lock);

- flow_stats_update(&flow->stats, diff_bytes, diff_pkts,
+ flow_stats_update(&flow->stats, diff_bytes, diff_pkts, 0,
repr_priv->qos_table.last_update,
FLOW_ACTION_HW_STATS_DELAYED);
return 0;
diff --git a/include/net/act_api.h b/include/net/act_api.h
index 8c3934880670..cb382a89ea58 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -106,7 +106,7 @@ struct tc_action_ops {
struct netlink_callback *, int,
const struct tc_action_ops *,
struct netlink_ext_ack *);
- void (*stats_update)(struct tc_action *, u64, u32, u64, bool);
+ void (*stats_update)(struct tc_action *, u64, u64, u64, u64, bool);
size_t (*get_fill_size)(const struct tc_action *act);
struct net_device *(*get_dev)(const struct tc_action *a,
tc_action_priv_destructor *destructor);
@@ -232,8 +232,8 @@ static inline void tcf_action_inc_overlimit_qstats(struct tc_action *a)
spin_unlock(&a->tcfa_lock);
}

-void tcf_action_update_stats(struct tc_action *a, u64 bytes, u32 packets,
- bool drop, bool hw);
+void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, bool hw);
int tcf_action_copy_stats(struct sk_buff *, struct tc_action *, int);

int tcf_action_check_ctrlact(int action, struct tcf_proto *tp,
@@ -244,13 +244,14 @@ struct tcf_chain *tcf_action_set_ctrlact(struct tc_action *a, int action,
#endif /* CONFIG_NET_CLS_ACT */

static inline void tcf_action_stats_update(struct tc_action *a, u64 bytes,
- u64 packets, u64 lastuse, bool hw)
+ u64 packets, u64 drops,
+ u64 lastuse, bool hw)
{
#ifdef CONFIG_NET_CLS_ACT
if (!a->ops->stats_update)
return;

- a->ops->stats_update(a, bytes, packets, lastuse, hw);
+ a->ops->stats_update(a, bytes, packets, drops, lastuse, hw);
#endif
}

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index f2c8311a0433..00c15f14c434 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -389,17 +389,20 @@ static inline bool flow_rule_match_key(const struct flow_rule *rule,
struct flow_stats {
u64 pkts;
u64 bytes;
+ u64 drops;
u64 lastused;
enum flow_action_hw_stats used_hw_stats;
bool used_hw_stats_valid;
};

static inline void flow_stats_update(struct flow_stats *flow_stats,
- u64 bytes, u64 pkts, u64 lastused,
+ u64 bytes, u64 pkts,
+ u64 drops, u64 lastused,
enum flow_action_hw_stats used_hw_stats)
{
flow_stats->pkts += pkts;
flow_stats->bytes += bytes;
+ flow_stats->drops += drops;
flow_stats->lastused = max_t(u64, flow_stats->lastused, lastused);

/* The driver should pass value with a maximum of one bit set.
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index ed65619cbc47..ff017e5b3ea2 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -262,7 +262,7 @@ static inline void tcf_exts_put_net(struct tcf_exts *exts)

static inline void
tcf_exts_stats_update(const struct tcf_exts *exts,
- u64 bytes, u64 packets, u64 lastuse,
+ u64 bytes, u64 packets, u64 drops, u64 lastuse,
u8 used_hw_stats, bool used_hw_stats_valid)
{
#ifdef CONFIG_NET_CLS_ACT
@@ -273,7 +273,8 @@ tcf_exts_stats_update(const struct tcf_exts *exts,
for (i = 0; i < exts->nr_actions; i++) {
struct tc_action *a = exts->actions[i];

- tcf_action_stats_update(a, bytes, packets, lastuse, true);
+ tcf_action_stats_update(a, bytes, packets, drops,
+ lastuse, true);
a->used_hw_stats = used_hw_stats;
a->used_hw_stats_valid = used_hw_stats_valid;
}
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 8ac7eb0a8309..4c4466f18801 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -1059,14 +1059,13 @@ int tcf_action_init(struct net *net, struct tcf_proto *tp, struct nlattr *nla,
return err;
}

-void tcf_action_update_stats(struct tc_action *a, u64 bytes, u32 packets,
- bool drop, bool hw)
+void tcf_action_update_stats(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, bool hw)
{
if (a->cpu_bstats) {
_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);

- if (drop)
- this_cpu_ptr(a->cpu_qstats)->drops += packets;
+ this_cpu_ptr(a->cpu_qstats)->drops += drops;

if (hw)
_bstats_cpu_update(this_cpu_ptr(a->cpu_bstats_hw),
@@ -1075,8 +1074,7 @@ void tcf_action_update_stats(struct tc_action *a, u64 bytes, u32 packets,
}

_bstats_update(&a->tcfa_bstats, bytes, packets);
- if (drop)
- a->tcfa_qstats.drops += packets;
+ a->tcfa_qstats.drops += drops;
if (hw)
_bstats_update(&a->tcfa_bstats_hw, bytes, packets);
}
diff --git a/net/sched/act_ct.c b/net/sched/act_ct.c
index e9f3576cbf71..1b9c6d4a1b6b 100644
--- a/net/sched/act_ct.c
+++ b/net/sched/act_ct.c
@@ -1450,12 +1450,12 @@ static int tcf_ct_search(struct net *net, struct tc_action **a, u32 index)
return tcf_idr_search(tn, a, index);
}

-static void tcf_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, u64 lastuse, bool hw)
{
struct tcf_ct *c = to_ct(a);

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
c->tcf_tm.lastuse = max_t(u64, c->tcf_tm.lastuse, lastuse);
}

diff --git a/net/sched/act_gact.c b/net/sched/act_gact.c
index 416065772719..410e3bbfb9ca 100644
--- a/net/sched/act_gact.c
+++ b/net/sched/act_gact.c
@@ -171,14 +171,15 @@ static int tcf_gact_act(struct sk_buff *skb, const struct tc_action *a,
return action;
}

-static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_gact_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, u64 lastuse, bool hw)
{
struct tcf_gact *gact = to_gact(a);
int action = READ_ONCE(gact->tcf_action);
struct tcf_t *tm = &gact->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, action == TC_ACT_SHOT, hw);
+ tcf_action_update_stats(a, bytes, packets,
+ action == TC_ACT_SHOT ? packets : drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_gate.c b/net/sched/act_gate.c
index 9c628591f452..c818844846b1 100644
--- a/net/sched/act_gate.c
+++ b/net/sched/act_gate.c
@@ -568,13 +568,13 @@ static int tcf_gate_walker(struct net *net, struct sk_buff *skb,
return tcf_generic_walker(tn, skb, cb, type, ops, extack);
}

-static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_gate_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, u64 lastuse, bool hw)
{
struct tcf_gate *gact = to_gate(a);
struct tcf_t *tm = &gact->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 83dd82fc9f40..b2705318993b 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -312,13 +312,13 @@ static int tcf_mirred_act(struct sk_buff *skb, const struct tc_action *a,
return retval;
}

-static void tcf_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, u64 lastuse, bool hw)
{
struct tcf_mirred *m = to_mirred(a);
struct tcf_t *tm = &m->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index d41d6200d9de..66986db062ed 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -409,13 +409,13 @@ static int tcf_pedit_act(struct sk_buff *skb, const struct tc_action *a,
return p->tcf_action;
}

-static void tcf_pedit_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_pedit_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, u64 lastuse, bool hw)
{
struct tcf_pedit *d = to_pedit(a);
struct tcf_t *tm = &d->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 8b7a0ac96c51..0b431d493768 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -288,13 +288,13 @@ static void tcf_police_cleanup(struct tc_action *a)
}

static void tcf_police_stats_update(struct tc_action *a,
- u64 bytes, u32 packets,
+ u64 bytes, u64 packets, u64 drops,
u64 lastuse, bool hw)
{
struct tcf_police *police = to_police(a);
struct tcf_t *tm = &police->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index b125b2be4467..361b863e0634 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -74,12 +74,13 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
}

static void tcf_skbedit_stats_update(struct tc_action *a, u64 bytes,
- u32 packets, u64 lastuse, bool hw)
+ u64 packets, u64 drops,
+ u64 lastuse, bool hw)
{
struct tcf_skbedit *d = to_skbedit(a);
struct tcf_t *tm = &d->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index c91d3958fcbb..a5ff9f68ab02 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -302,13 +302,13 @@ static int tcf_vlan_walker(struct net *net, struct sk_buff *skb,
return tcf_generic_walker(tn, skb, cb, type, ops, extack);
}

-static void tcf_vlan_stats_update(struct tc_action *a, u64 bytes, u32 packets,
- u64 lastuse, bool hw)
+static void tcf_vlan_stats_update(struct tc_action *a, u64 bytes, u64 packets,
+ u64 drops, u64 lastuse, bool hw)
{
struct tcf_vlan *v = to_vlan(a);
struct tcf_t *tm = &v->tcf_tm;

- tcf_action_update_stats(a, bytes, packets, false, hw);
+ tcf_action_update_stats(a, bytes, packets, drops, hw);
tm->lastuse = max_t(u64, tm->lastuse, lastuse);
}

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b2da37286082..391971672d54 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -491,6 +491,7 @@ static void fl_hw_update_stats(struct tcf_proto *tp, struct cls_fl_filter *f,

tcf_exts_stats_update(&f->exts, cls_flower.stats.bytes,
cls_flower.stats.pkts,
+ cls_flower.stats.drops,
cls_flower.stats.lastused,
cls_flower.stats.used_hw_stats,
cls_flower.stats.used_hw_stats_valid);
diff --git a/net/sched/cls_matchall.c b/net/sched/cls_matchall.c
index 8d39dbcf1746..cafb84480bab 100644
--- a/net/sched/cls_matchall.c
+++ b/net/sched/cls_matchall.c
@@ -338,7 +338,8 @@ static void mall_stats_hw_filter(struct tcf_proto *tp,
tc_setup_cb_call(block, TC_SETUP_CLSMATCHALL, &cls_mall, false, true);

tcf_exts_stats_update(&head->exts, cls_mall.stats.bytes,
- cls_mall.stats.pkts, cls_mall.stats.lastused,
+ cls_mall.stats.pkts, cls_mall.stats.drops,
+ cls_mall.stats.lastused,
cls_mall.stats.used_hw_stats,
cls_mall.stats.used_hw_stats_valid);
}
--
2.17.1
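The mechanical core of the patch above is that flow_stats_update() gains a drops argument that is accumulated alongside pkts and bytes. A minimal userspace model of the updated helper (field and parameter names follow the patch; the used_hw_stats bookkeeping of the real kernel helper is omitted):

```c
#include <stdint.h>

/* Userspace model of the patched struct flow_stats / flow_stats_update();
 * not kernel code. */
struct flow_stats {
	uint64_t pkts;
	uint64_t bytes;
	uint64_t drops;
	uint64_t lastused;
};

void flow_stats_update(struct flow_stats *fs, uint64_t bytes, uint64_t pkts,
		       uint64_t drops, uint64_t lastused)
{
	fs->pkts += pkts;
	fs->bytes += bytes;
	fs->drops += drops;	/* new: accumulate hardware drop count */
	if (lastused > fs->lastused)
		fs->lastused = lastused;
}
```

Drivers that have no drop counter simply pass 0 for drops, which is exactly what the bulk of the diff above does.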

2020-06-19 07:06:08

by Simon Horman

Subject: Re: [v2,net-next] net: qos offload add flow status with dropped count

On Fri, Jun 19, 2020 at 02:01:07PM +0800, Po Liu wrote:
> From: Po Liu <[email protected]>
>
> This patch adds a dropped-frames counter to tc flower offloading.
> Reporting hardware-dropped frames is necessary for some actions:
> actions such as police and the upcoming stream gate action can drop
> frames, and users need to see those drops. The status update now shows
> both how many packets matched the filter and how many of them were
> dropped.
>
> v2: Changes
> - Updated commit comments as suggested by Jiri Pirko.
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> This patch continues the thread [email protected]
>
> drivers/net/dsa/sja1105/sja1105_vl.c | 2 +-
> drivers/net/ethernet/broadcom/bnxt/bnxt_tc.c | 2 +-
> drivers/net/ethernet/chelsio/cxgb4/cxgb4_tc_flower.c | 2 +-
> .../net/ethernet/chelsio/cxgb4/cxgb4_tc_matchall.c | 2 +-
> drivers/net/ethernet/freescale/enetc/enetc_qos.c | 7 +++++--
> drivers/net/ethernet/mellanox/mlx5/core/en/tc_ct.c | 2 +-
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 4 ++--
> drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c | 2 +-
> drivers/net/ethernet/mscc/ocelot_flower.c | 2 +-
> drivers/net/ethernet/netronome/nfp/flower/offload.c | 2 +-
> drivers/net/ethernet/netronome/nfp/flower/qos_conf.c | 2 +-
> include/net/act_api.h | 11 ++++++-----
> include/net/flow_offload.h | 5 ++++-
> include/net/pkt_cls.h | 5 +++--
> net/sched/act_api.c | 10 ++++------
> net/sched/act_ct.c | 6 +++---
> net/sched/act_gact.c | 7 ++++---
> net/sched/act_gate.c | 6 +++---
> net/sched/act_mirred.c | 6 +++---
> net/sched/act_pedit.c | 6 +++---
> net/sched/act_police.c | 4 ++--
> net/sched/act_skbedit.c | 5 +++--
> net/sched/act_vlan.c | 6 +++---
> net/sched/cls_flower.c | 1 +
> net/sched/cls_matchall.c | 3 ++-
> 25 files changed, 60 insertions(+), 50 deletions(-)

Netronome portion:

Reviewed-by: Simon Horman <[email protected]>

2020-06-20 04:20:03

by Vlad Buslov

Subject: Re: [v2,net-next] net: qos offload add flow status with dropped count

On Fri 19 Jun 2020 at 09:01, Po Liu <[email protected]> wrote:
> From: Po Liu <[email protected]>
>
> This patch adds a dropped-frames counter to tc flower offloading.
> Reporting hardware-dropped frames is necessary for some actions:
> actions such as police and the upcoming stream gate action can drop
> frames, and users need to see those drops. The status update now shows
> both how many packets matched the filter and how many of them were
> dropped.
>
> v2: Changes
> - Updated commit comments as suggested by Jiri Pirko.
>
> Signed-off-by: Po Liu <[email protected]>
> ---

Reviewed-by: Vlad Buslov <[email protected]>

2020-06-20 04:39:53

by David Miller

Subject: Re: [v2,net-next] net: qos offload add flow status with dropped count

From: Po Liu <[email protected]>
Date: Fri, 19 Jun 2020 14:01:07 +0800

> From: Po Liu <[email protected]>
>
> This patch adds a dropped-frames counter to tc flower offloading.
> Reporting hardware-dropped frames is necessary for some actions:
> actions such as police and the upcoming stream gate action can drop
> frames, and users need to see those drops. The status update now shows
> both how many packets matched the filter and how many of them were
> dropped.
>
> v2: Changes
> - Updated commit comments as suggested by Jiri Pirko.
>
> Signed-off-by: Po Liu <[email protected]>

Applied, thank you.

2020-06-21 10:08:17

by Ido Schimmel

Subject: Re: [RFC,net-next 8/9] net: qos: police action add index for tc flower offloading

On Fri, Mar 06, 2020 at 08:56:06PM +0800, Po Liu wrote:
> Hardware may provide many entries for police flows, so that one (or
> multiple) flows can be policed by a single hardware entry. This patch
> provides the police action index to the driver side so the driver can
> map it to its hardware entry index.
>
> Signed-off-by: Po Liu <[email protected]>

Hi,

I started looking into tc-police offload in mlxsw and remembered your
patch. Are you planning to formally submit it? I'm asking because in
mlxsw it is also possible to share the same policer between multiple
filters.

Thanks

> ---
> include/net/flow_offload.h | 1 +
> net/sched/cls_api.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 54df87328edc..3b78b15ed20b 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -201,6 +201,7 @@ struct flow_action_entry {
> bool truncate;
> } sample;
> struct { /* FLOW_ACTION_POLICE */
> + u32 index;
> s64 burst;
> u64 rate_bytes_ps;
> u32 mtu;
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 363d3991793d..ce846a9dadc1 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3584,6 +3584,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> entry->police.rate_bytes_ps =
> tcf_police_rate_bytes_ps(act);
> entry->police.mtu = tcf_police_mtu(act);
> + entry->police.index = act->tcfa_index;
> } else if (is_tcf_ct(act)) {
> entry->id = FLOW_ACTION_CT;
> entry->ct.action = tcf_ct_action(act);
> --
> 2.17.1
>

2020-06-22 01:21:44

by Po Liu

Subject: RE: Re: [RFC,net-next 8/9] net: qos: police action add index for tc flower offloading

Hi Ido,


> -----Original Message-----
> From: Ido Schimmel <[email protected]>
> Sent: 21 June 2020 18:04
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; Claudiu Manoil
> <[email protected]>; Vladimir Oltean <[email protected]>;
> Alexandru Marginean <[email protected]>; Xiaoliang Yang
> <[email protected]>; Roy Zang <[email protected]>; Mingkai Hu
> <[email protected]>; Jerry Huang <[email protected]>; Leo Li
> <[email protected]>; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [RFC,net-next 8/9] net: qos: police action add index for
> tc flower offloading
>
>
> On Fri, Mar 06, 2020 at 08:56:06PM +0800, Po Liu wrote:
> > Hardware may own many entries for police flow. So that make one(or
> > multi) flow to be policed by one hardware entry. This patch add the
> > police action index provide to the driver side make it mapping the
> > driver hardware entry index.
> >
> > Signed-off-by: Po Liu <[email protected]>
>
> Hi,
>
> I started looking into tc-police offload in mlxsw and remembered your
> patch. Are you planning to formally submit it? I'm asking because in mlxsw
> it is also possible to share the same policer between multiple filters.

Yes, I am preparing the patches and will push again very soon. As a first step, the patches will add mtu and index to the offload.
The next step is to find a way to implement a two-color, two-bucket mode, but the police action seems to be missing the second bucket: the current burst + rate_bytes_ps parameters can only implement one-color, one-bucket policing.
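To illustrate the bucket remark: burst + rate_bytes_ps describe a single token bucket, so only one-color policing falls out of them; a two-color, two-bucket scheme would need a second (excess) bucket with its own burst parameter. A purely illustrative single-bucket byte policer, not kernel code and with invented names:

```c
#include <stdint.h>

/* Single-bucket byte policer: tokens refill at rate_Bps, capped at burst
 * bytes; a frame passes only if enough tokens remain. This mirrors what
 * burst + rate_bytes_ps alone can express. */
struct bucket {
	uint64_t rate_Bps;	/* refill rate, bytes per second */
	uint64_t burst;		/* bucket depth, bytes */
	uint64_t tokens;	/* current fill, bytes */
	uint64_t last_ns;	/* last refill timestamp */
};

/* Returns 1 if the frame passes (green), 0 if it is dropped (red). */
int police_frame(struct bucket *b, uint64_t now_ns, uint64_t frame_len)
{
	uint64_t elapsed = now_ns - b->last_ns;

	b->tokens += elapsed * b->rate_Bps / 1000000000ULL;
	if (b->tokens > b->burst)
		b->tokens = b->burst;
	b->last_ns = now_ns;

	if (frame_len > b->tokens)
		return 0;
	b->tokens -= frame_len;
	return 1;
}
```

A two-color scheme would cascade frames that fail this check into a second such bucket instead of dropping them outright, which is the missing piece referred to above.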

>
> Thanks
>
> > ---
> > include/net/flow_offload.h | 1 +
> > net/sched/cls_api.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> > index 54df87328edc..3b78b15ed20b 100644
> > --- a/include/net/flow_offload.h
> > +++ b/include/net/flow_offload.h
> > @@ -201,6 +201,7 @@ struct flow_action_entry {
> > bool truncate;
> > } sample;
> > struct { /* FLOW_ACTION_POLICE */
> > + u32 index;
> > s64 burst;
> > u64 rate_bytes_ps;
> > u32 mtu;
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index
> > 363d3991793d..ce846a9dadc1 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -3584,6 +3584,7 @@ int tc_setup_flow_action(struct flow_action
> *flow_action,
> > entry->police.rate_bytes_ps =
> > tcf_police_rate_bytes_ps(act);
> > entry->police.mtu = tcf_police_mtu(act);
> > + entry->police.index = act->tcfa_index;
> > } else if (is_tcf_ct(act)) {
> > entry->id = FLOW_ACTION_CT;
> > entry->ct.action = tcf_ct_action(act);
> > --
> > 2.17.1
> >

Thanks a lot!
Br,
Po Liu

2020-06-23 06:35:05

by Po Liu

Subject: [v1,net-next 1/4] net: qos: add tc police offloading action with max frame size limit

From: Po Liu <[email protected]>

Current police offloading supports 'burst' and 'rate_bytes_ps'. Some
hardware has the capability to limit the frame size: if a frame is
larger than the configured size, it is dropped. The police action
itself already accepts an 'mtu' parameter in the tc command, but that
parameter is not passed through to tc flower offloading. So extend tc
flower offloading with 'mtu'.

Signed-off-by: Po Liu <[email protected]>
---
This patch continues the thread [email protected] for the
police action offloading.

include/net/flow_offload.h | 1 +
include/net/tc_act/tc_police.h | 10 ++++++++++
net/sched/cls_api.c | 1 +
3 files changed, 12 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00c15f14c434..c2ef19c6b27d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -234,6 +234,7 @@ struct flow_action_entry {
struct { /* FLOW_ACTION_POLICE */
s64 burst;
u64 rate_bytes_ps;
+ u32 mtu;
} police;
struct { /* FLOW_ACTION_CT */
int action;
diff --git a/include/net/tc_act/tc_police.h b/include/net/tc_act/tc_police.h
index f098ad4424be..cd973b10ae8c 100644
--- a/include/net/tc_act/tc_police.h
+++ b/include/net/tc_act/tc_police.h
@@ -69,4 +69,14 @@ static inline s64 tcf_police_tcfp_burst(const struct tc_action *act)
return params->tcfp_burst;
}

+static inline u32 tcf_police_tcfp_mtu(const struct tc_action *act)
+{
+ struct tcf_police *police = to_police(act);
+ struct tcf_police_params *params;
+
+ params = rcu_dereference_protected(police->params,
+ lockdep_is_held(&police->tcf_lock));
+ return params->tcfp_mtu;
+}
+
#endif /* __NET_TC_POLICE_H */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a00a203b2ef5..6aba7d5ba1ec 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3658,6 +3658,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
entry->police.burst = tcf_police_tcfp_burst(act);
entry->police.rate_bytes_ps =
tcf_police_rate_bytes_ps(act);
+ entry->police.mtu = tcf_police_tcfp_mtu(act);
} else if (is_tcf_ct(act)) {
entry->id = FLOW_ACTION_CT;
entry->ct.action = tcf_ct_action(act);
--
2.17.1
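On the consuming side, a driver handling FLOW_ACTION_POLICE can now program a max-SDU check from entry->police.mtu. The sketch below models that with invented structure and function names; real drivers program hardware registers (as the enetc patch in the next message does) rather than checking frames in software:

```c
#include <stdint.h>

/* Illustrative model of the offload side: the driver copies the police
 * 'mtu' from the flow action into its filter state and rejects
 * oversized frames. All names here are made up for the sketch. */
struct hw_filter {
	uint32_t max_frame_size;	/* 0 means no size limit */
};

void hw_filter_setup(struct hw_filter *f, uint32_t police_mtu)
{
	f->max_frame_size = police_mtu;
}

/* Returns 1 if the frame passes the size check, 0 if it would drop. */
int hw_filter_check(const struct hw_filter *f, uint32_t frame_len)
{
	if (f->max_frame_size && frame_len > f->max_frame_size)
		return 0;
	return 1;
}
```

This is the filtering behaviour the IEEE 802.1Qci maximum-SDU field provides in hardware.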

2020-06-23 06:35:23

by Po Liu

Subject: [v1,net-next 2/4] net: enetc: add support max frame size for tc flower offload

From: Po Liu <[email protected]>

Based on the 'mtu' parameter added to the tc flower police offload
action, a tc flower device driver implementing the IEEE 802.1Qci
stream filter can enforce max frame size filtering. Add this to the
current hardware tc flower stream filter driver.

Signed-off-by: Po Liu <[email protected]>
---
.../net/ethernet/freescale/enetc/enetc_qos.c | 52 +++++++++++++------
1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index fb76903eca90..07f98bf7a06b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -389,6 +389,7 @@ struct enetc_psfp_filter {
u32 index;
s32 handle;
s8 prio;
+ u32 maxsdu;
u32 gate_id;
s32 meter_id;
refcount_t refcount;
@@ -430,6 +431,12 @@ static struct actions_fwd enetc_act_fwd[] = {
BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
FILTER_ACTION_TYPE_PSFP
},
+ {
+ BIT(FLOW_ACTION_POLICE) |
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
/* example for ACL actions */
{
BIT(FLOW_ACTION_DROP),
@@ -594,8 +601,12 @@ static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
/* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
* field as being either an MSDU value or an index into the Flow
* Meter Instance table.
- * TODO: no limit max sdu
*/
+ if (sfi->maxsdu) {
+ sfi_config->msdu =
+ cpu_to_le16(sfi->maxsdu);
+ sfi_config->multi |= 0x40;
+ }

if (sfi->meter_id >= 0) {
sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
@@ -872,6 +883,7 @@ static struct enetc_psfp_filter
hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
if (s->gate_id == sfi->gate_id &&
s->prio == sfi->prio &&
+ s->maxsdu == sfi->maxsdu &&
s->meter_id == sfi->meter_id)
return s;

@@ -979,6 +991,7 @@ static struct actions_fwd *enetc_check_flow_actions(u64 acts,
static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
struct flow_cls_offload *f)
{
+ struct flow_action_entry *entryg = NULL, *entryp = NULL;
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct netlink_ext_ack *extack = f->common.extack;
struct enetc_stream_filter *filter, *old_filter;
@@ -997,9 +1010,12 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

flow_action_for_each(i, entry, &rule->action)
if (entry->id == FLOW_ACTION_GATE)
- break;
+ entryg = entry;
+ else if (entry->id == FLOW_ACTION_POLICE)
+ entryp = entry;

- if (entry->id != FLOW_ACTION_GATE)
+ /* Not supported without a gate action */
+ if (!entryg)
return -EINVAL;

filter = kzalloc(sizeof(*filter), GFP_KERNEL);
@@ -1079,19 +1095,19 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

/* parsing gate action */
- if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ if (entryg->gate.index >= priv->psfp_cap.max_psfp_gate) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
err = -ENOSPC;
goto free_filter;
}

- if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ if (entryg->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
err = -ENOSPC;
goto free_filter;
}

- entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ entries_size = struct_size(sgi, entries, entryg->gate.num_entries);
sgi = kzalloc(entries_size, GFP_KERNEL);
if (!sgi) {
err = -ENOMEM;
@@ -1099,18 +1115,18 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

refcount_set(&sgi->refcount, 1);
- sgi->index = entry->gate.index;
- sgi->init_ipv = entry->gate.prio;
- sgi->basetime = entry->gate.basetime;
- sgi->cycletime = entry->gate.cycletime;
- sgi->num_entries = entry->gate.num_entries;
+ sgi->index = entryg->gate.index;
+ sgi->init_ipv = entryg->gate.prio;
+ sgi->basetime = entryg->gate.basetime;
+ sgi->cycletime = entryg->gate.cycletime;
+ sgi->num_entries = entryg->gate.num_entries;

e = sgi->entries;
- for (i = 0; i < entry->gate.num_entries; i++) {
- e[i].gate_state = entry->gate.entries[i].gate_state;
- e[i].interval = entry->gate.entries[i].interval;
- e[i].ipv = entry->gate.entries[i].ipv;
- e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ for (i = 0; i < entryg->gate.num_entries; i++) {
+ e[i].gate_state = entryg->gate.entries[i].gate_state;
+ e[i].interval = entryg->gate.entries[i].interval;
+ e[i].ipv = entryg->gate.entries[i].ipv;
+ e[i].maxoctets = entryg->gate.entries[i].maxoctets;
}

filter->sgi_index = sgi->index;
@@ -1127,6 +1143,10 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
/* flow meter not support yet */
sfi->meter_id = ENETC_PSFP_WILDCARD;

+ /* Max frame size */
+ if (entryp)
+ sfi->maxsdu = entryp->police.mtu;
+
/* prio ref the filter prio */
if (f->common.prio && f->common.prio <= BIT(3))
sfi->prio = f->common.prio - 1;
--
2.17.1
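The patch above programs the police 'mtu' into the stream filter's Maximum SDU field, and only does so when sfi->maxsdu is non-zero. In software terms the check this enables is a plain length comparison; the sketch below models that semantics with hypothetical names and is not the enetc driver code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the PSFP Maximum SDU check: frames longer than max_sdu
 * are dropped by the stream filter. A max_sdu of 0 means "no
 * limit", mirroring the patch, which only sets the MSDU field
 * when sfi->maxsdu is non-zero.
 */
struct psfp_filter_sketch {
	uint16_t max_sdu;	/* maximum SDU size in octets, 0 = unlimited */
};

static bool psfp_frame_passes_msdu(const struct psfp_filter_sketch *sfi,
				   uint16_t frame_len)
{
	if (sfi->max_sdu && frame_len > sfi->max_sdu)
		return false;	/* over the limit: drop */
	return true;
}
```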

2020-06-23 06:37:02

by Po Liu

Subject: [v1,net-next 4/4] net: enetc add tc flower offload flow metering policing action

From: Po Liu <[email protected]>

Flow metering entries in IEEE 802.1Qci are an optional function of a
flow filtering module. Flow metering uses a two-rate, two-bucket,
three-color marker to police frames. This patch only enables one rate,
one bucket, and color-blind mode. Flow metering instances are as
specified by the algorithm in MEF 10.3 and its Bandwidth Profile
Parameters. They are:

a) Flow meter instance identifier. An integer value identifying the flow
meter instance. This patch uses the police 'index' as this value.
b) Committed Information Rate (CIR), in bits per second. This patch uses
'rate_bytes_ps' to represent this value.
c) Committed Burst Size (CBS), in octets. This patch uses 'burst' to
represent this value.
d) Excess Information Rate (EIR), in bits per second.
e) Excess Burst Size per Bandwidth Profile Flow (EBS), in octets.
There are some other parameters as well. This patch disables EIR/EBS by
default and uses color-blind mode.
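The committed-rate behaviour described above can be modelled as a single token bucket: the bucket refills at CIR up to CBS, a frame that fits in the bucket is marked green, and anything else is policed. This is an illustrative model of the MEF 10.3 CIR/CBS terms in color-blind mode, not the enetc hardware algorithm:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative single-rate, color-blind meter: only CIR/CBS are
 * active (EIR/EBS disabled), as in this patch. Tokens refill at
 * cir bytes per second up to cbs; a frame is "green" (passes) if
 * enough tokens remain, otherwise it is policed.
 */
struct meter_sketch {
	uint64_t cir;		/* committed rate, bytes per second */
	uint64_t cbs;		/* committed burst size, octets */
	uint64_t tokens;	/* current bucket fill, octets */
	uint64_t last_ns;	/* timestamp of last update */
};

static bool meter_frame_green(struct meter_sketch *m, uint64_t now_ns,
			      uint64_t frame_len)
{
	uint64_t delta_ns = now_ns - m->last_ns;

	/* refill: cir bytes/s accumulated over delta_ns nanoseconds */
	m->tokens += m->cir * delta_ns / 1000000000ULL;
	if (m->tokens > m->cbs)
		m->tokens = m->cbs;
	m->last_ns = now_ns;

	if (frame_len <= m->tokens) {
		m->tokens -= frame_len;
		return true;	/* green: within profile */
	}
	return false;		/* not green: policed */
}
```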

Signed-off-by: Po Liu <[email protected]>
---
.../net/ethernet/freescale/enetc/enetc_hw.h | 24 +++
.../net/ethernet/freescale/enetc/enetc_qos.c | 160 ++++++++++++++++--
2 files changed, 172 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 6314051bc6c1..f00c4382423e 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -570,6 +570,7 @@ enum bdcr_cmd_class {
BDCR_CMD_STREAM_IDENTIFY,
BDCR_CMD_STREAM_FILTER,
BDCR_CMD_STREAM_GCL,
+ BDCR_CMD_FLOW_METER,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -736,10 +737,33 @@ struct sgcl_data {
struct sgce sgcl[0];
};

+#define ENETC_CBDR_FMI_MR BIT(0)
+#define ENETC_CBDR_FMI_MREN BIT(1)
+#define ENETC_CBDR_FMI_DOY BIT(2)
+#define ENETC_CBDR_FMI_CM BIT(3)
+#define ENETC_CBDR_FMI_CF BIT(4)
+#define ENETC_CBDR_FMI_NDOR BIT(5)
+#define ENETC_CBDR_FMI_OALEN BIT(6)
+#define ENETC_CBDR_FMI_IRFPP_MASK GENMASK(4, 0)
+
+/* class 10: command 0/1, Flow Meter Instance Set, short Format */
+struct fmi_conf {
+ __le32 cir;
+ __le32 cbs;
+ __le32 eir;
+ __le32 ebs;
+ u8 conf;
+ u8 res1;
+ u8 ir_fpp;
+ u8 res2[4];
+ u8 en;
+};
+
struct enetc_cbd {
union{
struct sfi_conf sfi_conf;
struct sgi_table sgi_table;
+ struct fmi_conf fmi_conf;
struct {
__le32 addr[2];
union {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 07f98bf7a06b..2d79962daf4a 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -408,10 +408,26 @@ struct enetc_psfp_gate {
struct action_gate_entry entries[0];
};

+/* Only enable the green color frame now
+ * Will add eir and ebs color blind, couple flag etc when
+ * policing action add more offloading parameters
+ */
+struct enetc_psfp_meter {
+ u32 index;
+ u32 cir;
+ u32 cbs;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+#define ENETC_PSFP_FLAGS_FMI BIT(0)
+
struct enetc_stream_filter {
struct enetc_streamid sid;
u32 sfi_index;
u32 sgi_index;
+ u32 flags;
+ u32 fmi_index;
struct flow_stats stats;
struct hlist_node node;
};
@@ -422,6 +438,7 @@ struct enetc_psfp {
struct hlist_head stream_list;
struct hlist_head psfp_filter_list;
struct hlist_head psfp_gate_list;
+ struct hlist_head psfp_meter_list;
spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
};

@@ -842,6 +859,47 @@ static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
return err;
}

+static int enetc_flowmeter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_meter *fmi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct fmi_conf *fmi_config;
+ u64 temp = 0;
+
+ cbd.index = cpu_to_le16((u16)fmi->index);
+ cbd.cls = BDCR_CMD_FLOW_METER;
+ cbd.status_flags = 0x80;
+
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ fmi_config = &cbd.fmi_conf;
+ fmi_config->en = 0x80;
+
+ if (fmi->cir) {
+ temp = (u64)8000 * fmi->cir;
+ temp = temp / 3725;
+ }
+
+ fmi_config->cir = cpu_to_le32((u32)temp);
+ fmi_config->cbs = cpu_to_le32(fmi->cbs);
+
+ /* Default for eir ebs disable */
+ fmi_config->eir = 0;
+ fmi_config->ebs = 0;
+
+ /* Default:
+ * mark red disable
+ * drop on yellow disable
+ * color mode disable
+ * couple flag disable
+ */
+ fmi_config->conf = 0;
+
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
{
struct enetc_stream_filter *f;
@@ -875,6 +933,17 @@ static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
return NULL;
}

+static struct enetc_psfp_meter *enetc_get_meter_by_index(u32 index)
+{
+ struct enetc_psfp_meter *m;
+
+ hlist_for_each_entry(m, &epsfp.psfp_meter_list, node)
+ if (m->index == index)
+ return m;
+
+ return NULL;
+}
+
static struct enetc_psfp_filter
*enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
{
@@ -934,9 +1003,27 @@ static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
}
}

+static void flow_meter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_meter *fmi;
+ u8 z;
+
+ fmi = enetc_get_meter_by_index(index);
+ WARN_ON(!fmi);
+ z = refcount_dec_and_test(&fmi->refcount);
+ if (z) {
+ enetc_flowmeter_hw_set(priv, fmi, false);
+ hlist_del(&fmi->node);
+ kfree(fmi);
+ }
+}
+
static void remove_one_chain(struct enetc_ndev_priv *priv,
struct enetc_stream_filter *filter)
{
+ if (filter->flags & ENETC_PSFP_FLAGS_FMI)
+ flow_meter_unref(priv, filter->fmi_index);
+
stream_gate_unref(priv, filter->sgi_index);
stream_filter_unref(priv, filter->sfi_index);

@@ -947,7 +1034,8 @@ static void remove_one_chain(struct enetc_ndev_priv *priv,
static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
struct enetc_streamid *sid,
struct enetc_psfp_filter *sfi,
- struct enetc_psfp_gate *sgi)
+ struct enetc_psfp_gate *sgi,
+ struct enetc_psfp_meter *fmi)
{
int err;

@@ -965,8 +1053,16 @@ static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
if (err)
goto revert_sfi;

+ if (fmi) {
+ err = enetc_flowmeter_hw_set(priv, fmi, true);
+ if (err)
+ goto revert_sgi;
+ }
+
return 0;

+revert_sgi:
+ enetc_streamgate_hw_set(priv, sgi, false);
revert_sfi:
if (sfi)
enetc_streamfilter_hw_set(priv, sfi, false);
@@ -995,6 +1091,7 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct netlink_ext_ack *extack = f->common.extack;
struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_meter *fmi = NULL, *old_fmi;
struct enetc_psfp_filter *sfi, *old_sfi;
struct enetc_psfp_gate *sgi, *old_sgi;
struct flow_action_entry *entry;
@@ -1139,13 +1236,34 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

refcount_set(&sfi->refcount, 1);
sfi->gate_id = sgi->index;
-
- /* flow meter not support yet */
sfi->meter_id = ENETC_PSFP_WILDCARD;

- /* Max frame size */
- if (entryp)
- sfi->maxsdu = entryp->police.mtu;
+ /* Flow meter and max frame size */
+ if (entryp) {
+ if (entryp->police.burst) {
+ u64 temp;
+
+ fmi = kzalloc(sizeof(*fmi), GFP_KERNEL);
+ if (!fmi) {
+ err = -ENOMEM;
+ goto free_sfi;
+ }
+ refcount_set(&fmi->refcount, 1);
+ fmi->cir = entryp->police.rate_bytes_ps;
+ /* Convert to original burst value */
+ temp = entryp->police.burst * fmi->cir;
+ temp = div_u64(temp, 1000000000ULL);
+
+ fmi->cbs = temp;
+ fmi->index = entryp->police.index;
+ filter->flags |= ENETC_PSFP_FLAGS_FMI;
+ filter->fmi_index = fmi->index;
+ sfi->meter_id = fmi->index;
+ }
+
+ if (entryp->police.mtu)
+ sfi->maxsdu = entryp->police.mtu;
+ }

/* prio ref the filter prio */
if (f->common.prio && f->common.prio <= BIT(3))
@@ -1161,7 +1279,7 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
if (sfi->handle < 0) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
err = -ENOSPC;
- goto free_sfi;
+ goto free_fmi;
}

sfi->index = index;
@@ -1177,11 +1295,23 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

err = enetc_psfp_hw_set(priv, &filter->sid,
- sfi_overwrite ? NULL : sfi, sgi);
+ sfi_overwrite ? NULL : sfi, sgi, fmi);
if (err)
- goto free_sfi;
+ goto free_fmi;

spin_lock(&epsfp.psfp_lock);
+ if (filter->flags & ENETC_PSFP_FLAGS_FMI) {
+ old_fmi = enetc_get_meter_by_index(filter->fmi_index);
+ if (old_fmi) {
+ fmi->refcount = old_fmi->refcount;
+ refcount_set(&fmi->refcount,
+ refcount_read(&old_fmi->refcount) + 1);
+ hlist_del(&old_fmi->node);
+ kfree(old_fmi);
+ }
+ hlist_add_head(&fmi->node, &epsfp.psfp_meter_list);
+ }
+
/* Remove the old node if exist and update with a new node */
old_sgi = enetc_get_gate_by_index(filter->sgi_index);
if (old_sgi) {
@@ -1212,6 +1342,8 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

return 0;

+free_fmi:
+ kfree(fmi);
free_sfi:
kfree(sfi);
free_gate:
@@ -1310,9 +1442,13 @@ static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
return -EINVAL;

spin_lock(&epsfp.psfp_lock);
- stats.pkts = counters.matching_frames_count - filter->stats.pkts;
- stats.drops = counters.not_passing_frames_count -
- filter->stats.drops;
+ stats.pkts = counters.matching_frames_count +
+ counters.not_passing_sdu_count -
+ filter->stats.pkts;
+ stats.drops = counters.not_passing_frames_count +
+ counters.not_passing_sdu_count +
+ counters.red_frames_count -
+ filter->stats.drops;
stats.lastused = filter->stats.lastused;
filter->stats.pkts += stats.pkts;
filter->stats.drops += stats.drops;
--
2.17.1
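One step of the patch worth spelling out is the "Convert to original burst value" arithmetic. The 'burst' delivered through the flow offload interface is effectively a transmission time in nanoseconds at the committed rate (that is what the conversion in the patch implies), so multiplying by the byte rate and dividing by 10^9 recovers the Committed Burst Size in octets; the kernel code performs the division with div_u64(). A standalone sketch of that arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the "Convert to original burst value" step in
 * enetc_psfp_parse_clsflower(): burst_ns is a transmission time
 * in nanoseconds at rate_bytes_ps, so the product divided by 1e9
 * yields the burst size (CBS) in octets. Kernel code must use
 * div_u64() here rather than a plain '/'.
 */
static uint64_t police_burst_to_cbs(uint64_t burst_ns,
				    uint64_t rate_bytes_ps)
{
	return burst_ns * rate_bytes_ps / 1000000000ULL;
}
```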

2020-06-23 06:38:12

by Po Liu

Subject: [v1,net-next 3/4] net: qos: police action add index for tc flower offloading

From: Po Liu <[email protected]>

Hardware may own many policer entries, so that one (or multiple) flows
can be policed by one hardware entry. This patch provides the police
action index to the driver side so that it can be mapped to the driver's
hardware entry index.

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 1 +
net/sched/cls_api.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index c2ef19c6b27d..eed98075b1ae 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -232,6 +232,7 @@ struct flow_action_entry {
bool truncate;
} sample;
struct { /* FLOW_ACTION_POLICE */
+ u32 index;
s64 burst;
u64 rate_bytes_ps;
u32 mtu;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6aba7d5ba1ec..fdc4c89ca1fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
entry->police.rate_bytes_ps =
tcf_police_rate_bytes_ps(act);
entry->police.mtu = tcf_police_tcfp_mtu(act);
+ entry->police.index = act->tcfa_index;
} else if (is_tcf_ct(act)) {
entry->id = FLOW_ACTION_CT;
entry->ct.action = tcf_ct_action(act);
--
2.17.1
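With the index available, a driver can key its policer table on the tc action index and reference-count the entries so that several filters share one hardware policer, which is what the enetc meter list in this series does. Below is a self-contained sketch using a hypothetical fixed-size table in place of the kernel's hlist and refcount_t:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch: share one hardware policer entry among several filters,
 * keyed by the tc police action index. A hypothetical fixed-size
 * table stands in for the kernel's hlist; refcount 0 marks a free
 * slot.
 */
#define MAX_POLICERS 8

struct hw_policer_sketch {
	uint32_t index;		/* tc action index */
	uint32_t refcount;	/* 0 = slot free */
};

static struct hw_policer_sketch policers[MAX_POLICERS];

/* Look up an existing entry for 'index', or claim a free slot.
 * Returns NULL when the table is exhausted.
 */
static struct hw_policer_sketch *policer_get(uint32_t index)
{
	struct hw_policer_sketch *free_slot = NULL;
	size_t i;

	for (i = 0; i < MAX_POLICERS; i++) {
		if (policers[i].refcount && policers[i].index == index) {
			policers[i].refcount++;	/* share existing entry */
			return &policers[i];
		}
		if (!policers[i].refcount && !free_slot)
			free_slot = &policers[i];
	}
	if (free_slot) {
		free_slot->index = index;
		free_slot->refcount = 1;
	}
	return free_slot;
}

static void policer_put(struct hw_policer_sketch *p)
{
	if (p && p->refcount)
		p->refcount--;	/* slot becomes free at zero */
}
```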

2020-06-23 07:04:02

by Ido Schimmel

Subject: Re: [v1,net-next 1/4] net: qos: add tc police offloading action with max frame size limit

On Tue, Jun 23, 2020 at 02:34:09PM +0800, Po Liu wrote:
> From: Po Liu <[email protected]>
>
> Current police offloading support the 'burst'' and 'rate_bytes_ps'. Some

s/support/supports/
s/'burst''/'burst'/

> hardware own the capability to limit the frame size. If the frame size
> larger than the setting, the frame would be dropped. For the police
> action itself already accept the 'mtu' parameter in tc command. But not

s/accept/accepts/

> extend to tc flower offloading. So extend 'mtu' to tc flower offloading.

Throughout the submission you are always using the term 'flower
offloading', but this has nothing to do with flower. Flower is the
classifier, whereas you are extending police action which can be tied to
any classifier.

>
> Signed-off-by: Po Liu <[email protected]>
> ---
> continue the thread [email protected] for the police
> action offloading.

For a patch set you need a cover letter (patch 0). It should include
necessary background, motivation and overview of the patches. You can
mention there that some of the patches were sent as RFC back in March
and provide a link:

https://lore.kernel.org/netdev/[email protected]/

The code itself looks good to me.

>
> include/net/flow_offload.h | 1 +
> include/net/tc_act/tc_police.h | 10 ++++++++++
> net/sched/cls_api.c | 1 +
> 3 files changed, 12 insertions(+)
>
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index 00c15f14c434..c2ef19c6b27d 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -234,6 +234,7 @@ struct flow_action_entry {
> struct { /* FLOW_ACTION_POLICE */
> s64 burst;
> u64 rate_bytes_ps;
> + u32 mtu;
> } police;
> struct { /* FLOW_ACTION_CT */
> int action;
> diff --git a/include/net/tc_act/tc_police.h b/include/net/tc_act/tc_police.h
> index f098ad4424be..cd973b10ae8c 100644
> --- a/include/net/tc_act/tc_police.h
> +++ b/include/net/tc_act/tc_police.h
> @@ -69,4 +69,14 @@ static inline s64 tcf_police_tcfp_burst(const struct tc_action *act)
> return params->tcfp_burst;
> }
>
> +static inline u32 tcf_police_tcfp_mtu(const struct tc_action *act)
> +{
> + struct tcf_police *police = to_police(act);
> + struct tcf_police_params *params;
> +
> + params = rcu_dereference_protected(police->params,
> + lockdep_is_held(&police->tcf_lock));
> + return params->tcfp_mtu;
> +}
> +
> #endif /* __NET_TC_POLICE_H */
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index a00a203b2ef5..6aba7d5ba1ec 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3658,6 +3658,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> entry->police.burst = tcf_police_tcfp_burst(act);
> entry->police.rate_bytes_ps =
> tcf_police_rate_bytes_ps(act);
> + entry->police.mtu = tcf_police_tcfp_mtu(act);
> } else if (is_tcf_ct(act)) {
> entry->id = FLOW_ACTION_CT;
> entry->ct.action = tcf_ct_action(act);
> --
> 2.17.1
>

2020-06-23 07:12:11

by Ido Schimmel

Subject: Re: [v1,net-next 3/4] net: qos: police action add index for tc flower offloading

On Tue, Jun 23, 2020 at 02:34:11PM +0800, Po Liu wrote:
> From: Po Liu <[email protected]>
>
> Hardware may own many entries for police flow. So that make one(or
> multi) flow to be policed by one hardware entry. This patch add the
> police action index provide to the driver side make it mapping the
> driver hardware entry index.

Maybe first mention that it is possible for multiple filters in software
to share the same policer. Something like:

"
It is possible for several tc filters to share the same police action by
specifying the action's index when installing the filters.

Propagate this index to device drivers through the flow offload
intermediate representation, so that drivers could share a single
hardware policer between multiple filters.
"

>
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/flow_offload.h | 1 +
> net/sched/cls_api.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index c2ef19c6b27d..eed98075b1ae 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -232,6 +232,7 @@ struct flow_action_entry {
> bool truncate;
> } sample;
> struct { /* FLOW_ACTION_POLICE */
> + u32 index;
> s64 burst;
> u64 rate_bytes_ps;
> u32 mtu;
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 6aba7d5ba1ec..fdc4c89ca1fa 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> entry->police.rate_bytes_ps =
> tcf_police_rate_bytes_ps(act);
> entry->police.mtu = tcf_police_tcfp_mtu(act);
> + entry->police.index = act->tcfa_index;
> } else if (is_tcf_ct(act)) {
> entry->id = FLOW_ACTION_CT;
> entry->ct.action = tcf_ct_action(act);
> --
> 2.17.1
>

2020-06-23 07:41:24

by Po Liu

Subject: RE: [EXT] Re: [v1,net-next 3/4] net: qos: police action add index for tc flower offloading

Hi Ido,



> -----Original Message-----
> From: Ido Schimmel <[email protected]>
> Sent: 2020-06-23 15:10
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Claudiu Manoil <[email protected]>; Vladimir
> Oltean <[email protected]>; Alexandru Marginean
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: [EXT] Re: [v1,net-next 3/4] net: qos: police action add index for tc
> flower offloading
>
> Caution: EXT Email
>
> On Tue, Jun 23, 2020 at 02:34:11PM +0800, Po Liu wrote:
> > From: Po Liu <[email protected]>
> >
> > Hardware may own many entries for police flow. So that make one(or
> > multi) flow to be policed by one hardware entry. This patch add the
> > police action index provide to the driver side make it mapping the
> > driver hardware entry index.
>
> Maybe first mention that it is possible for multiple filters in software to
> share the same policer. Something like:
>
> "
> It is possible for several tc filters to share the same police action by
> specifying the action's index when installing the filters.
>
> Propagate this index to device drivers through the flow offload
> intermediate representation, so that drivers could share a single hardware
> policer between multiple filters.
> "
>
> >
> > Signed-off-by: Po Liu <[email protected]>
> > ---
> > include/net/flow_offload.h | 1 +
> > net/sched/cls_api.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> > index c2ef19c6b27d..eed98075b1ae 100644
> > --- a/include/net/flow_offload.h
> > +++ b/include/net/flow_offload.h
> > @@ -232,6 +232,7 @@ struct flow_action_entry {
> > bool truncate;
> > } sample;
> > struct { /* FLOW_ACTION_POLICE */
> > + u32 index;
> > s64 burst;
> > u64 rate_bytes_ps;
> > u32 mtu;
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index
> > 6aba7d5ba1ec..fdc4c89ca1fa 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action
> *flow_action,
> > entry->police.rate_bytes_ps =
> > tcf_police_rate_bytes_ps(act);
> > entry->police.mtu = tcf_police_tcfp_mtu(act);
> > + entry->police.index = act->tcfa_index;
> > } else if (is_tcf_ct(act)) {
> > entry->id = FLOW_ACTION_CT;
> > entry->ct.action = tcf_ct_action(act);
> > --
> > 2.17.1
> >


2020-06-23 07:45:40

by Po Liu

Subject: RE:Re: [v1,net-next 3/4] net: qos: police action add index for tc flower offloading

Hi Ido,

Sorry, ignore previous email.

> -----Original Message-----
> From: Ido Schimmel <[email protected]>
> Sent: 2020-06-23 15:10
> To: Po Liu <[email protected]>
> Cc: [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; Claudiu Manoil <[email protected]>; Vladimir
> Oltean <[email protected]>; Alexandru Marginean
> <[email protected]>; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected];
> [email protected]; [email protected]
> Subject: Re: [v1,net-next 3/4] net: qos: police action add index for tc
> flower offloading
>
> On Tue, Jun 23, 2020 at 02:34:11PM +0800, Po Liu wrote:
> > From: Po Liu <[email protected]>
> >
> > Hardware may own many entries for police flow. So that make one(or
> > multi) flow to be policed by one hardware entry. This patch add the
> > police action index provide to the driver side make it mapping the
> > driver hardware entry index.
>
> Maybe first mention that it is possible for multiple filters in software to
> share the same policer. Something like:
>
> "
> It is possible for several tc filters to share the same police action by
> specifying the action's index when installing the filters.
>
> Propagate this index to device drivers through the flow offload
> intermediate representation, so that drivers could share a single hardware
> policer between multiple filters.
> "
>

Thanks, I would change this commit message.

> >
> > Signed-off-by: Po Liu <[email protected]>
> > ---
> > include/net/flow_offload.h | 1 +
> > net/sched/cls_api.c | 1 +
> > 2 files changed, 2 insertions(+)
> >
> > diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> > index c2ef19c6b27d..eed98075b1ae 100644
> > --- a/include/net/flow_offload.h
> > +++ b/include/net/flow_offload.h
> > @@ -232,6 +232,7 @@ struct flow_action_entry {
> > bool truncate;
> > } sample;
> > struct { /* FLOW_ACTION_POLICE */
> > + u32 index;
> > s64 burst;
> > u64 rate_bytes_ps;
> > u32 mtu;
> > diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index
> > 6aba7d5ba1ec..fdc4c89ca1fa 100644
> > --- a/net/sched/cls_api.c
> > +++ b/net/sched/cls_api.c
> > @@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action
> *flow_action,
> > entry->police.rate_bytes_ps =
> > tcf_police_rate_bytes_ps(act);
> > entry->police.mtu = tcf_police_tcfp_mtu(act);
> > + entry->police.index = act->tcfa_index;
> > } else if (is_tcf_ct(act)) {
> > entry->id = FLOW_ACTION_CT;
> > entry->ct.action = tcf_ct_action(act);
> > --
> > 2.17.1
> >


Br,
Po Liu

2020-06-23 10:10:44

by Jamal Hadi Salim

Subject: Re: [v1,net-next 3/4] net: qos: police action add index for tc flower offloading

This certainly brings up an interesting point which I brought up earlier
when Jiri was doing offloading of stats.
In this case the action index is being used as the offloaded
policer index (note: there would need to be a check whether the
index is in fact acceptable to the h/w, etc., unless there are
2^32 meters available in the hardware).

My question: Is this any different from how stats are structured?
In this case you can map the s/w action index to a h/w table index
(of meters).
My comment then was: hardware I have encountered (and I pointed to the
P4 model as well) assumes an indexed table of stats.

cheers,
jamal

On 2020-06-23 2:34 a.m., Po Liu wrote:
> From: Po Liu <[email protected]>
>
> Hardware may own many entries for police flow. So that make one(or
> multi) flow to be policed by one hardware entry. This patch add the
> police action index provide to the driver side make it mapping the
> driver hardware entry index.
>
> Signed-off-by: Po Liu <[email protected]>
> ---
> include/net/flow_offload.h | 1 +
> net/sched/cls_api.c | 1 +
> 2 files changed, 2 insertions(+)
>
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index c2ef19c6b27d..eed98075b1ae 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -232,6 +232,7 @@ struct flow_action_entry {
> bool truncate;
> } sample;
> struct { /* FLOW_ACTION_POLICE */
> + u32 index;
> s64 burst;
> u64 rate_bytes_ps;
> u32 mtu;
> diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
> index 6aba7d5ba1ec..fdc4c89ca1fa 100644
> --- a/net/sched/cls_api.c
> +++ b/net/sched/cls_api.c
> @@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
> entry->police.rate_bytes_ps =
> tcf_police_rate_bytes_ps(act);
> entry->police.mtu = tcf_police_tcfp_mtu(act);
> + entry->police.index = act->tcfa_index;
> } else if (is_tcf_ct(act)) {
> entry->id = FLOW_ACTION_CT;
> entry->ct.action = tcf_ct_action(act);
>

2020-06-23 15:32:31

by kernel test robot

Subject: Re: [v1,net-next 4/4] net: enetc add tc flower offload flow metering policing action

Hi Po,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20200622]
[cannot apply to sparc-next/master linux/master linus/master ipvs/master v5.8-rc2 v5.8-rc1 v5.7 v5.8-rc2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Po-Liu/net-qos-add-tc-police-offloading-action-with-max-frame-size-limit/20200623-153724
base: 27f11fea33608cbd321a97cbecfa2ef97dcc1821
config: parisc-allmodconfig (attached as .config)
compiler: hppa-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=parisc

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>, old ones prefixed by <<):

WARNING: modpost: missing MODULE_LICENSE() in lib/test_bits.o
>> ERROR: modpost: "__udivdi3" [drivers/net/ethernet/freescale/enetc/fsl-enetc-vf.ko] undefined!
>> ERROR: modpost: "__divdi3" [drivers/net/ethernet/freescale/enetc/fsl-enetc-vf.ko] undefined!
>> ERROR: modpost: "__udivdi3" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined!
>> ERROR: modpost: "__divdi3" [drivers/net/ethernet/freescale/enetc/fsl-enetc.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]



2020-06-23 16:06:23

by kernel test robot

Subject: Re: [v1,net-next 4/4] net: enetc add tc flower offload flow metering policing action

Hi Po,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20200622]
[cannot apply to sparc-next/master linux/master linus/master ipvs/master v5.8-rc2 v5.8-rc1 v5.7 v5.8-rc2]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Po-Liu/net-qos-add-tc-police-offloading-action-with-max-frame-size-limit/20200623-153724
base: 27f11fea33608cbd321a97cbecfa2ef97dcc1821
config: i386-allyesconfig (attached as .config)
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0
reproduce (this is a W=1 build):
# save the attached .config to linux build tree
make W=1 ARCH=i386

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <[email protected]>

All errors (new ones prefixed by >>):

ld: drivers/net/ethernet/freescale/enetc/enetc_qos.o: in function `enetc_flowmeter_hw_set':
>> enetc_qos.c:(.text+0x66): undefined reference to `__udivdi3'

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/[email protected]


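The undefined '__udivdi3' symbol comes from the plain 64-bit division in enetc_flowmeter_hw_set() ('temp = temp / 3725'): on 32-bit targets gcc lowers a u64 '/' into a libgcc helper that the kernel does not provide, so kernel code divides u64 values with div_u64() from <linux/math64.h>. A userspace sketch of the shape of the fix, where div_u64_sketch stands in for the kernel helper:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's div_u64(): divide a u64 by a u32.
 * In the kernel this avoids emitting a libgcc __udivdi3 call on
 * 32-bit architectures; in userspace plain division is fine.
 */
static uint64_t div_u64_sketch(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* The failing expression from the patch, rewritten the way the
 * kernel expects: bytes/s converted to bits/s (*8) and then
 * scaled by the 3725 constant the hardware conversion uses.
 */
static uint64_t cir_to_hw_units(uint64_t cir_bytes_ps)
{
	uint64_t temp = 8000ULL * cir_bytes_ps;

	return div_u64_sketch(temp, 3725);	/* was: temp / 3725 */
}
```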

2020-06-24 09:35:50

by Po Liu

Subject: [v2,net-next 1/4] net: qos: add tc police offloading action with max frame size limit

From: Po Liu <[email protected]>

Current police offloading supports 'burst' and 'rate_bytes_ps'. Some
hardware has the capability to limit the frame size: if the frame size is
larger than the setting, the frame is dropped. The police action itself
already accepts the 'mtu' parameter in the tc command, but it is not
extended to tc flower offloading. So extend 'mtu' to tc flower offloading.

Signed-off-by: Po Liu <[email protected]>
---
v2:
-- No update.

include/net/flow_offload.h | 1 +
include/net/tc_act/tc_police.h | 10 ++++++++++
net/sched/cls_api.c | 1 +
3 files changed, 12 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index 00c15f14c434..c2ef19c6b27d 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -234,6 +234,7 @@ struct flow_action_entry {
struct { /* FLOW_ACTION_POLICE */
s64 burst;
u64 rate_bytes_ps;
+ u32 mtu;
} police;
struct { /* FLOW_ACTION_CT */
int action;
diff --git a/include/net/tc_act/tc_police.h b/include/net/tc_act/tc_police.h
index f098ad4424be..cd973b10ae8c 100644
--- a/include/net/tc_act/tc_police.h
+++ b/include/net/tc_act/tc_police.h
@@ -69,4 +69,14 @@ static inline s64 tcf_police_tcfp_burst(const struct tc_action *act)
return params->tcfp_burst;
}

+static inline u32 tcf_police_tcfp_mtu(const struct tc_action *act)
+{
+ struct tcf_police *police = to_police(act);
+ struct tcf_police_params *params;
+
+ params = rcu_dereference_protected(police->params,
+ lockdep_is_held(&police->tcf_lock));
+ return params->tcfp_mtu;
+}
+
#endif /* __NET_TC_POLICE_H */
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index a00a203b2ef5..6aba7d5ba1ec 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3658,6 +3658,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
entry->police.burst = tcf_police_tcfp_burst(act);
entry->police.rate_bytes_ps =
tcf_police_rate_bytes_ps(act);
+ entry->police.mtu = tcf_police_tcfp_mtu(act);
} else if (is_tcf_ct(act)) {
entry->id = FLOW_ACTION_CT;
entry->ct.action = tcf_ct_action(act);
--
2.17.1

2020-06-24 09:36:07

by Po Liu

[permalink] [raw]
Subject: [v2,net-next 2/4] net: enetc: add support max frame size for tc flower offload

From: Po Liu <[email protected]>

Based on the tc flower offload police action, add a max frame size limit
via the 'mtu' parameter. A tc flower device driver implementing the IEEE
802.1Qci stream filter can use it for max frame size filtering. Add it to
the current hardware tc flower stream filter driver.

Signed-off-by: Po Liu <[email protected]>
---
v2:
- No update.

.../net/ethernet/freescale/enetc/enetc_qos.c | 52 +++++++++++++------
1 file changed, 36 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index fb76903eca90..07f98bf7a06b 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -389,6 +389,7 @@ struct enetc_psfp_filter {
u32 index;
s32 handle;
s8 prio;
+ u32 maxsdu;
u32 gate_id;
s32 meter_id;
refcount_t refcount;
@@ -430,6 +431,12 @@ static struct actions_fwd enetc_act_fwd[] = {
BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
FILTER_ACTION_TYPE_PSFP
},
+ {
+ BIT(FLOW_ACTION_POLICE) |
+ BIT(FLOW_ACTION_GATE),
+ BIT(FLOW_DISSECTOR_KEY_ETH_ADDRS),
+ FILTER_ACTION_TYPE_PSFP
+ },
/* example for ACL actions */
{
BIT(FLOW_ACTION_DROP),
@@ -594,8 +601,12 @@ static int enetc_streamfilter_hw_set(struct enetc_ndev_priv *priv,
/* Filter Type. Identifies the contents of the MSDU/FM_INST_INDEX
* field as being either an MSDU value or an index into the Flow
* Meter Instance table.
- * TODO: no limit max sdu
*/
+ if (sfi->maxsdu) {
+ sfi_config->msdu =
+ cpu_to_le16(sfi->maxsdu);
+ sfi_config->multi |= 0x40;
+ }

if (sfi->meter_id >= 0) {
sfi_config->fm_inst_table_index = cpu_to_le16(sfi->meter_id);
@@ -872,6 +883,7 @@ static struct enetc_psfp_filter
hlist_for_each_entry(s, &epsfp.psfp_filter_list, node)
if (s->gate_id == sfi->gate_id &&
s->prio == sfi->prio &&
+ s->maxsdu == sfi->maxsdu &&
s->meter_id == sfi->meter_id)
return s;

@@ -979,6 +991,7 @@ static struct actions_fwd *enetc_check_flow_actions(u64 acts,
static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
struct flow_cls_offload *f)
{
+ struct flow_action_entry *entryg = NULL, *entryp = NULL;
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct netlink_ext_ack *extack = f->common.extack;
struct enetc_stream_filter *filter, *old_filter;
@@ -997,9 +1010,12 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

flow_action_for_each(i, entry, &rule->action)
if (entry->id == FLOW_ACTION_GATE)
- break;
+ entryg = entry;
+ else if (entry->id == FLOW_ACTION_POLICE)
+ entryp = entry;

- if (entry->id != FLOW_ACTION_GATE)
+ /* Not support without gate action */
+ if (!entryg)
return -EINVAL;

filter = kzalloc(sizeof(*filter), GFP_KERNEL);
@@ -1079,19 +1095,19 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

/* parsing gate action */
- if (entry->gate.index >= priv->psfp_cap.max_psfp_gate) {
+ if (entryg->gate.index >= priv->psfp_cap.max_psfp_gate) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
err = -ENOSPC;
goto free_filter;
}

- if (entry->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
+ if (entryg->gate.num_entries >= priv->psfp_cap.max_psfp_gatelist) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Gate resource!");
err = -ENOSPC;
goto free_filter;
}

- entries_size = struct_size(sgi, entries, entry->gate.num_entries);
+ entries_size = struct_size(sgi, entries, entryg->gate.num_entries);
sgi = kzalloc(entries_size, GFP_KERNEL);
if (!sgi) {
err = -ENOMEM;
@@ -1099,18 +1115,18 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

refcount_set(&sgi->refcount, 1);
- sgi->index = entry->gate.index;
- sgi->init_ipv = entry->gate.prio;
- sgi->basetime = entry->gate.basetime;
- sgi->cycletime = entry->gate.cycletime;
- sgi->num_entries = entry->gate.num_entries;
+ sgi->index = entryg->gate.index;
+ sgi->init_ipv = entryg->gate.prio;
+ sgi->basetime = entryg->gate.basetime;
+ sgi->cycletime = entryg->gate.cycletime;
+ sgi->num_entries = entryg->gate.num_entries;

e = sgi->entries;
- for (i = 0; i < entry->gate.num_entries; i++) {
- e[i].gate_state = entry->gate.entries[i].gate_state;
- e[i].interval = entry->gate.entries[i].interval;
- e[i].ipv = entry->gate.entries[i].ipv;
- e[i].maxoctets = entry->gate.entries[i].maxoctets;
+ for (i = 0; i < entryg->gate.num_entries; i++) {
+ e[i].gate_state = entryg->gate.entries[i].gate_state;
+ e[i].interval = entryg->gate.entries[i].interval;
+ e[i].ipv = entryg->gate.entries[i].ipv;
+ e[i].maxoctets = entryg->gate.entries[i].maxoctets;
}

filter->sgi_index = sgi->index;
@@ -1127,6 +1143,10 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
/* flow meter not support yet */
sfi->meter_id = ENETC_PSFP_WILDCARD;

+ /* Max frame size */
+ if (entryp)
+ sfi->maxsdu = entryp->police.mtu;
+
/* prio ref the filter prio */
if (f->common.prio && f->common.prio <= BIT(3))
sfi->prio = f->common.prio - 1;
--
2.17.1

2020-06-24 09:36:50

by Po Liu

[permalink] [raw]
Subject: [v2,net-next 3/4] net: qos: police action add index for tc flower offloading

From: Po Liu <[email protected]>

A hardware device may include more than one police entry. Specifying the
action's index makes it possible for several tc filters to share the same
police action when installing the filters.

Propagate this index to device drivers through the flow offload
intermediate representation, so that drivers could share a single
hardware policer between multiple filters.

v1->v2 changes:
- Update the commit message as suggested by Ido Schimmel <[email protected]>

Signed-off-by: Po Liu <[email protected]>
---
include/net/flow_offload.h | 1 +
net/sched/cls_api.c | 1 +
2 files changed, 2 insertions(+)

diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index c2ef19c6b27d..eed98075b1ae 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -232,6 +232,7 @@ struct flow_action_entry {
bool truncate;
} sample;
struct { /* FLOW_ACTION_POLICE */
+ u32 index;
s64 burst;
u64 rate_bytes_ps;
u32 mtu;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 6aba7d5ba1ec..fdc4c89ca1fa 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -3659,6 +3659,7 @@ int tc_setup_flow_action(struct flow_action *flow_action,
entry->police.rate_bytes_ps =
tcf_police_rate_bytes_ps(act);
entry->police.mtu = tcf_police_tcfp_mtu(act);
+ entry->police.index = act->tcfa_index;
} else if (is_tcf_ct(act)) {
entry->id = FLOW_ACTION_CT;
entry->ct.action = tcf_ct_action(act);
--
2.17.1

2020-06-24 09:39:52

by Po Liu

[permalink] [raw]
Subject: [v2,net-next 4/4] net: enetc add tc flower offload flow metering policing action

From: Po Liu <[email protected]>

Flow metering in IEEE 802.1Qci is an optional function of the flow
filtering module. Flow metering uses a two-rate, two-bucket, three-color
marker to police frames. This patch only enables one rate and one bucket,
in color-blind mode. Flow metering instances are as specified by the
algorithm in MEF 10.3 and the Bandwidth Profile Parameters. They are:

a) Flow meter instance identifier. An integer value identifying the flow
meter instance. This patch uses the police 'index' as this value.
b) Committed Information Rate (CIR), in bits per second. This patch uses
'rate_bytes_ps' to represent this value.
c) Committed Burst Size (CBS), in octets. This patch uses 'burst' to
represent this value.
d) Excess Information Rate (EIR), in bits per second.
e) Excess Burst Size per Bandwidth Profile Flow (EBS), in octets.
Plus some other parameters. This patch leaves EIR/EBS disabled by default
and uses color-blind mode.

v1->v2 changes:
- Use div_u64() for division instead of '/', fixing the following report:

Reported-by: kernel test robot <[email protected]>
All errors (new ones prefixed by >>):

ld: drivers/net/ethernet/freescale/enetc/enetc_qos.o: in function `enetc_flowmeter_hw_set':
>> enetc_qos.c:(.text+0x66): undefined reference to `__udivdi3'


Signed-off-by: Po Liu <[email protected]>
---
.../net/ethernet/freescale/enetc/enetc_hw.h | 24 +++
.../net/ethernet/freescale/enetc/enetc_qos.c | 160 ++++++++++++++++--
2 files changed, 172 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/freescale/enetc/enetc_hw.h b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
index 6314051bc6c1..f00c4382423e 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_hw.h
+++ b/drivers/net/ethernet/freescale/enetc/enetc_hw.h
@@ -570,6 +570,7 @@ enum bdcr_cmd_class {
BDCR_CMD_STREAM_IDENTIFY,
BDCR_CMD_STREAM_FILTER,
BDCR_CMD_STREAM_GCL,
+ BDCR_CMD_FLOW_METER,
__BDCR_CMD_MAX_LEN,
BDCR_CMD_MAX_LEN = __BDCR_CMD_MAX_LEN - 1,
};
@@ -736,10 +737,33 @@ struct sgcl_data {
struct sgce sgcl[0];
};

+#define ENETC_CBDR_FMI_MR BIT(0)
+#define ENETC_CBDR_FMI_MREN BIT(1)
+#define ENETC_CBDR_FMI_DOY BIT(2)
+#define ENETC_CBDR_FMI_CM BIT(3)
+#define ENETC_CBDR_FMI_CF BIT(4)
+#define ENETC_CBDR_FMI_NDOR BIT(5)
+#define ENETC_CBDR_FMI_OALEN BIT(6)
+#define ENETC_CBDR_FMI_IRFPP_MASK GENMASK(4, 0)
+
+/* class 10: command 0/1, Flow Meter Instance Set, short Format */
+struct fmi_conf {
+ __le32 cir;
+ __le32 cbs;
+ __le32 eir;
+ __le32 ebs;
+ u8 conf;
+ u8 res1;
+ u8 ir_fpp;
+ u8 res2[4];
+ u8 en;
+};
+
struct enetc_cbd {
union{
struct sfi_conf sfi_conf;
struct sgi_table sgi_table;
+ struct fmi_conf fmi_conf;
struct {
__le32 addr[2];
union {
diff --git a/drivers/net/ethernet/freescale/enetc/enetc_qos.c b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
index 07f98bf7a06b..4f670cbdf186 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc_qos.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc_qos.c
@@ -408,10 +408,26 @@ struct enetc_psfp_gate {
struct action_gate_entry entries[0];
};

+/* Only enable the green color frame now
+ * Will add eir and ebs color blind, couple flag etc when
+ * policing action add more offloading parameters
+ */
+struct enetc_psfp_meter {
+ u32 index;
+ u32 cir;
+ u32 cbs;
+ refcount_t refcount;
+ struct hlist_node node;
+};
+
+#define ENETC_PSFP_FLAGS_FMI BIT(0)
+
struct enetc_stream_filter {
struct enetc_streamid sid;
u32 sfi_index;
u32 sgi_index;
+ u32 flags;
+ u32 fmi_index;
struct flow_stats stats;
struct hlist_node node;
};
@@ -422,6 +438,7 @@ struct enetc_psfp {
struct hlist_head stream_list;
struct hlist_head psfp_filter_list;
struct hlist_head psfp_gate_list;
+ struct hlist_head psfp_meter_list;
spinlock_t psfp_lock; /* spinlock for the struct enetc_psfp r/w */
};

@@ -842,6 +859,47 @@ static int enetc_streamgate_hw_set(struct enetc_ndev_priv *priv,
return err;
}

+static int enetc_flowmeter_hw_set(struct enetc_ndev_priv *priv,
+ struct enetc_psfp_meter *fmi,
+ u8 enable)
+{
+ struct enetc_cbd cbd = { .cmd = 0 };
+ struct fmi_conf *fmi_config;
+ u64 temp = 0;
+
+ cbd.index = cpu_to_le16((u16)fmi->index);
+ cbd.cls = BDCR_CMD_FLOW_METER;
+ cbd.status_flags = 0x80;
+
+ if (!enable)
+ return enetc_send_cmd(priv->si, &cbd);
+
+ fmi_config = &cbd.fmi_conf;
+ fmi_config->en = 0x80;
+
+ if (fmi->cir) {
+ temp = (u64)8000 * fmi->cir;
+ temp = div_u64(temp, 3725);
+ }
+
+ fmi_config->cir = cpu_to_le32((u32)temp);
+ fmi_config->cbs = cpu_to_le32(fmi->cbs);
+
+ /* Default for eir ebs disable */
+ fmi_config->eir = 0;
+ fmi_config->ebs = 0;
+
+ /* Default:
+ * mark red disable
+ * drop on yellow disable
+ * color mode disable
+ * couple flag disable
+ */
+ fmi_config->conf = 0;
+
+ return enetc_send_cmd(priv->si, &cbd);
+}
+
static struct enetc_stream_filter *enetc_get_stream_by_index(u32 index)
{
struct enetc_stream_filter *f;
@@ -875,6 +933,17 @@ static struct enetc_psfp_filter *enetc_get_filter_by_index(u32 index)
return NULL;
}

+static struct enetc_psfp_meter *enetc_get_meter_by_index(u32 index)
+{
+ struct enetc_psfp_meter *m;
+
+ hlist_for_each_entry(m, &epsfp.psfp_meter_list, node)
+ if (m->index == index)
+ return m;
+
+ return NULL;
+}
+
static struct enetc_psfp_filter
*enetc_psfp_check_sfi(struct enetc_psfp_filter *sfi)
{
@@ -934,9 +1003,27 @@ static void stream_gate_unref(struct enetc_ndev_priv *priv, u32 index)
}
}

+static void flow_meter_unref(struct enetc_ndev_priv *priv, u32 index)
+{
+ struct enetc_psfp_meter *fmi;
+ u8 z;
+
+ fmi = enetc_get_meter_by_index(index);
+ WARN_ON(!fmi);
+ z = refcount_dec_and_test(&fmi->refcount);
+ if (z) {
+ enetc_flowmeter_hw_set(priv, fmi, false);
+ hlist_del(&fmi->node);
+ kfree(fmi);
+ }
+}
+
static void remove_one_chain(struct enetc_ndev_priv *priv,
struct enetc_stream_filter *filter)
{
+ if (filter->flags & ENETC_PSFP_FLAGS_FMI)
+ flow_meter_unref(priv, filter->fmi_index);
+
stream_gate_unref(priv, filter->sgi_index);
stream_filter_unref(priv, filter->sfi_index);

@@ -947,7 +1034,8 @@ static void remove_one_chain(struct enetc_ndev_priv *priv,
static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
struct enetc_streamid *sid,
struct enetc_psfp_filter *sfi,
- struct enetc_psfp_gate *sgi)
+ struct enetc_psfp_gate *sgi,
+ struct enetc_psfp_meter *fmi)
{
int err;

@@ -965,8 +1053,16 @@ static int enetc_psfp_hw_set(struct enetc_ndev_priv *priv,
if (err)
goto revert_sfi;

+ if (fmi) {
+ err = enetc_flowmeter_hw_set(priv, fmi, true);
+ if (err)
+ goto revert_sgi;
+ }
+
return 0;

+revert_sgi:
+ enetc_streamgate_hw_set(priv, sgi, false);
revert_sfi:
if (sfi)
enetc_streamfilter_hw_set(priv, sfi, false);
@@ -995,6 +1091,7 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
struct flow_rule *rule = flow_cls_offload_flow_rule(f);
struct netlink_ext_ack *extack = f->common.extack;
struct enetc_stream_filter *filter, *old_filter;
+ struct enetc_psfp_meter *fmi = NULL, *old_fmi;
struct enetc_psfp_filter *sfi, *old_sfi;
struct enetc_psfp_gate *sgi, *old_sgi;
struct flow_action_entry *entry;
@@ -1139,13 +1236,34 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

refcount_set(&sfi->refcount, 1);
sfi->gate_id = sgi->index;
-
- /* flow meter not support yet */
sfi->meter_id = ENETC_PSFP_WILDCARD;

- /* Max frame size */
- if (entryp)
- sfi->maxsdu = entryp->police.mtu;
+ /* Flow meter and max frame size */
+ if (entryp) {
+ if (entryp->police.burst) {
+ u64 temp;
+
+ fmi = kzalloc(sizeof(*fmi), GFP_KERNEL);
+ if (!fmi) {
+ err = -ENOMEM;
+ goto free_sfi;
+ }
+ refcount_set(&fmi->refcount, 1);
+ fmi->cir = entryp->police.rate_bytes_ps;
+ /* Convert to original burst value */
+ temp = entryp->police.burst * fmi->cir;
+ temp = div_u64(temp, 1000000000ULL);
+
+ fmi->cbs = temp;
+ fmi->index = entryp->police.index;
+ filter->flags |= ENETC_PSFP_FLAGS_FMI;
+ filter->fmi_index = fmi->index;
+ sfi->meter_id = fmi->index;
+ }
+
+ if (entryp->police.mtu)
+ sfi->maxsdu = entryp->police.mtu;
+ }

/* prio ref the filter prio */
if (f->common.prio && f->common.prio <= BIT(3))
@@ -1161,7 +1279,7 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
if (sfi->handle < 0) {
NL_SET_ERR_MSG_MOD(extack, "No Stream Filter resource!");
err = -ENOSPC;
- goto free_sfi;
+ goto free_fmi;
}

sfi->index = index;
@@ -1177,11 +1295,23 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,
}

err = enetc_psfp_hw_set(priv, &filter->sid,
- sfi_overwrite ? NULL : sfi, sgi);
+ sfi_overwrite ? NULL : sfi, sgi, fmi);
if (err)
- goto free_sfi;
+ goto free_fmi;

spin_lock(&epsfp.psfp_lock);
+ if (filter->flags & ENETC_PSFP_FLAGS_FMI) {
+ old_fmi = enetc_get_meter_by_index(filter->fmi_index);
+ if (old_fmi) {
+ fmi->refcount = old_fmi->refcount;
+ refcount_set(&fmi->refcount,
+ refcount_read(&old_fmi->refcount) + 1);
+ hlist_del(&old_fmi->node);
+ kfree(old_fmi);
+ }
+ hlist_add_head(&fmi->node, &epsfp.psfp_meter_list);
+ }
+
/* Remove the old node if exist and update with a new node */
old_sgi = enetc_get_gate_by_index(filter->sgi_index);
if (old_sgi) {
@@ -1212,6 +1342,8 @@ static int enetc_psfp_parse_clsflower(struct enetc_ndev_priv *priv,

return 0;

+free_fmi:
+ kfree(fmi);
free_sfi:
kfree(sfi);
free_gate:
@@ -1310,9 +1442,13 @@ static int enetc_psfp_get_stats(struct enetc_ndev_priv *priv,
return -EINVAL;

spin_lock(&epsfp.psfp_lock);
- stats.pkts = counters.matching_frames_count - filter->stats.pkts;
- stats.drops = counters.not_passing_frames_count -
- filter->stats.drops;
+ stats.pkts = counters.matching_frames_count +
+ counters.not_passing_sdu_count -
+ filter->stats.pkts;
+ stats.drops = counters.not_passing_frames_count +
+ counters.not_passing_sdu_count +
+ counters.red_frames_count -
+ filter->stats.drops;
stats.lastused = filter->stats.lastused;
filter->stats.pkts += stats.pkts;
filter->stats.drops += stats.drops;
--
2.17.1

2020-06-25 05:07:13

by David Miller

[permalink] [raw]
Subject: Re: [v2,net-next 2/4] net: enetc: add support max frame size for tc flower offload

From: Po Liu <[email protected]>
Date: Wed, 24 Jun 2020 17:36:29 +0800

> From: Po Liu <[email protected]>
>
> Based on the tc flower offload police action, add a max frame size limit
> via the 'mtu' parameter. A tc flower device driver implementing the IEEE
> 802.1Qci stream filter can use it for max frame size filtering. Add it to
> the current hardware tc flower stream filter driver.
>
> Signed-off-by: Po Liu <[email protected]>

Applied.

2020-06-25 05:08:01

by David Miller

[permalink] [raw]
Subject: Re: [v2,net-next 1/4] net: qos: add tc police offloading action with max frame size limit

From: Po Liu <[email protected]>
Date: Wed, 24 Jun 2020 17:36:28 +0800

> From: Po Liu <[email protected]>
>
> Current police offloading supports 'burst' and 'rate_bytes_ps'. Some
> hardware has the capability to limit the frame size: if a frame is larger
> than the configured value, it is dropped. The police action itself already
> accepts an 'mtu' parameter in the tc command, but it is not propagated to
> tc flower offloading. So extend 'mtu' to tc flower offloading.
>
> Signed-off-by: Po Liu <[email protected]>

Applied.

2020-06-25 05:08:22

by David Miller

[permalink] [raw]
Subject: Re: [v2,net-next 4/4] net: enetc add tc flower offload flow metering policing action

From: Po Liu <[email protected]>
Date: Wed, 24 Jun 2020 17:36:31 +0800

> From: Po Liu <[email protected]>
>
> Flow metering in IEEE 802.1Qci is an optional function of the flow
> filtering module. Flow metering uses a two-rate, two-bucket, three-color
> marker to police frames. This patch only enables one rate and one bucket,
> in color-blind mode. Flow metering instances are as specified by the
> algorithm in MEF 10.3 and the Bandwidth Profile Parameters. They are:
>
> a) Flow meter instance identifier. An integer value identifying the flow
> meter instance. This patch uses the police 'index' as this value.
> b) Committed Information Rate (CIR), in bits per second. This patch uses
> 'rate_bytes_ps' to represent this value.
> c) Committed Burst Size (CBS), in octets. This patch uses 'burst' to
> represent this value.
> d) Excess Information Rate (EIR), in bits per second.
> e) Excess Burst Size per Bandwidth Profile Flow (EBS), in octets.
> Plus some other parameters. This patch leaves EIR/EBS disabled by default
> and uses color-blind mode.
>
> v1->v2 changes:
> - Use div_u64() for division instead of '/', fixing the following report:
>
> Reported-by: kernel test robot <[email protected]>
> All errors (new ones prefixed by >>):
>
> ld: drivers/net/ethernet/freescale/enetc/enetc_qos.o: in function `enetc_flowmeter_hw_set':
>>> enetc_qos.c:(.text+0x66): undefined reference to `__udivdi3'
>
>
> Signed-off-by: Po Liu <[email protected]>

Applied.

2020-06-25 05:08:34

by David Miller

[permalink] [raw]
Subject: Re: [v2,net-next 3/4] net: qos: police action add index for tc flower offloading

From: Po Liu <[email protected]>
Date: Wed, 24 Jun 2020 17:36:30 +0800

> From: Po Liu <[email protected]>
>
> A hardware device may include more than one police entry. Specifying the
> action's index makes it possible for several tc filters to share the same
> police action when installing the filters.
>
> Propagate this index to device drivers through the flow offload
> intermediate representation, so that drivers could share a single
> hardware policer between multiple filters.
>
> v1->v2 changes:
> - Update the commit message as suggested by Ido Schimmel <[email protected]>
>
> Signed-off-by: Po Liu <[email protected]>

Applied.