2021-11-03 09:21:43

by Clément Léger

Subject: [PATCH v2 0/6] Add FDMA support on ocelot switch driver

This series adds support for the Frame DMA present on the VSC7514
switch. The FDMA is able to extract and inject packets on the various
ethernet interfaces present on the switch.

While adding FDMA support, bindings were switched from .txt to .yaml
and MAC address reading from device-tree was added for testing
purposes. Jumbo frame support was also added since it gives a large
performance improvement with FDMA.

------------------
Changes in V2:
- Read MAC for each port and not as switch base MAC address
- Add missing static for some functions in ocelot_fdma.c
- Split change_mtu from fdma commit
- Add jumbo support for register based xmit
- Move precomputed header into ocelot_port struct
- Remove use of QUIRK_ENDIAN_LITTLE due to misconfiguration for tests
- Remove fragmented packet sending which has not been tested

Clément Léger (6):
net: ocelot: add support to get port mac from device-tree
dt-bindings: net: convert mscc,vsc7514-switch bindings to yaml
net: ocelot: pre-compute injection frame header content
net: ocelot: add support for ndo_change_mtu
net: ocelot: add FDMA support
net: ocelot: add jumbo frame support for FDMA

.../bindings/net/mscc,vsc7514-switch.yaml | 184 +++++
.../devicetree/bindings/net/mscc-ocelot.txt | 83 --
drivers/net/ethernet/mscc/Makefile | 1 +
drivers/net/ethernet/mscc/ocelot.c | 23 +-
drivers/net/ethernet/mscc/ocelot.h | 3 +
drivers/net/ethernet/mscc/ocelot_fdma.c | 754 ++++++++++++++++++
drivers/net/ethernet/mscc/ocelot_fdma.h | 60 ++
drivers/net/ethernet/mscc/ocelot_net.c | 37 +-
drivers/net/ethernet/mscc/ocelot_vsc7514.c | 15 +
include/soc/mscc/ocelot.h | 7 +
10 files changed, 1075 insertions(+), 92 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
delete mode 100644 Documentation/devicetree/bindings/net/mscc-ocelot.txt
create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h

--
2.33.0


2021-11-03 09:21:44

by Clément Léger

Subject: [PATCH v2 2/6] dt-bindings: net: convert mscc,vsc7514-switch bindings to yaml

Convert existing bindings to yaml format. At the same time, remove
non-existent properties ("inj" interrupt) and add the fdma register
target and interrupt.

Signed-off-by: Clément Léger <[email protected]>
---
.../bindings/net/mscc,vsc7514-switch.yaml | 184 ++++++++++++++++++
.../devicetree/bindings/net/mscc-ocelot.txt | 83 --------
2 files changed, 184 insertions(+), 83 deletions(-)
create mode 100644 Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
delete mode 100644 Documentation/devicetree/bindings/net/mscc-ocelot.txt

diff --git a/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
new file mode 100644
index 000000000000..0c96eabf9d2d
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
@@ -0,0 +1,184 @@
+# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/net/mscc,vsc7514-switch.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: Microchip VSC7514 Ethernet switch controller
+
+maintainers:
+ - Vladimir Oltean <[email protected]>
+ - Claudiu Manoil <[email protected]>
+ - Alexandre Belloni <[email protected]>
+
+description: |
+ The VSC7514 Industrial IoT Ethernet switch contains four integrated dual media
+ 10/100/1000BASE-T PHYs, two 1G SGMII/SerDes, two 1G/2.5G SGMII/SerDes, and an
+ option for either a 1G/2.5G SGMII/SerDes Node Processor Interface (NPI) or a
+ PCIe interface for external CPU connectivity. The NPI/PCIe can operate as a
+ standard Ethernet port.
+
+ The device provides a rich set of Industrial Ethernet switching features such
+ as fast protection switching, 1588 precision time protocol, and synchronous
+ Ethernet. Advanced TCAM-based VLAN and QoS processing enable delivery of
+ differentiated services. Security is assured through frame processing using
+ Microsemi’s TCAM-based Versatile Content Aware Processor.
+
+ In addition, the device contains a powerful 500 MHz CPU enabling full
+ management of the switch.
+
+properties:
+ $nodename:
+ pattern: "^switch@[0-9a-f]+$"
+
+ compatible:
+ const: mscc,vsc7514-switch
+
+ reg:
+ items:
+ - description: system target
+ - description: rewriter target
+ - description: qs target
+ - description: PTP target
+ - description: Port0 target
+ - description: Port1 target
+ - description: Port2 target
+ - description: Port3 target
+ - description: Port4 target
+ - description: Port5 target
+ - description: Port6 target
+ - description: Port7 target
+ - description: Port8 target
+ - description: Port9 target
+ - description: Port10 target
+ - description: QSystem target
+ - description: Analyzer target
+ - description: S0 target
+ - description: S1 target
+ - description: S2 target
+ - description: fdma target
+
+ reg-names:
+ items:
+ - const: sys
+ - const: rew
+ - const: qs
+ - const: ptp
+ - const: port0
+ - const: port1
+ - const: port2
+ - const: port3
+ - const: port4
+ - const: port5
+ - const: port6
+ - const: port7
+ - const: port8
+ - const: port9
+ - const: port10
+ - const: qsys
+ - const: ana
+ - const: s0
+ - const: s1
+ - const: s2
+ - const: fdma
+
+ interrupts:
+ minItems: 1
+ items:
+ - description: PTP ready
+ - description: register based extraction
+ - description: frame dma based extraction
+
+ interrupt-names:
+ minItems: 1
+ items:
+ - const: ptp_rdy
+ - const: xtr
+ - const: fdma
+
+ ethernet-ports:
+ type: object
+ patternProperties:
+ "^port@[0-9a-f]+$":
+ type: object
+ description: Ethernet ports handled by the switch
+
+ allOf:
+ - $ref: ethernet-controller.yaml#
+
+ properties:
+ '#address-cells':
+ const: 1
+ '#size-cells':
+ const: 0
+
+ reg:
+ description: Switch port number
+
+ phy-handle: true
+
+ mac-address: true
+
+ required:
+ - reg
+ - phy-handle
+
+required:
+ - compatible
+ - reg
+ - reg-names
+ - interrupts
+ - interrupt-names
+ - ethernet-ports
+
+additionalProperties: false
+
+examples:
+ - |
+ switch@1010000 {
+ compatible = "mscc,vsc7514-switch";
+ reg = <0x1010000 0x10000>,
+ <0x1030000 0x10000>,
+ <0x1080000 0x100>,
+ <0x10e0000 0x10000>,
+ <0x11e0000 0x100>,
+ <0x11f0000 0x100>,
+ <0x1200000 0x100>,
+ <0x1210000 0x100>,
+ <0x1220000 0x100>,
+ <0x1230000 0x100>,
+ <0x1240000 0x100>,
+ <0x1250000 0x100>,
+ <0x1260000 0x100>,
+ <0x1270000 0x100>,
+ <0x1280000 0x100>,
+ <0x1800000 0x80000>,
+ <0x1880000 0x10000>,
+ <0x1040000 0x10000>,
+ <0x1050000 0x10000>,
+ <0x1060000 0x10000>,
+ <0x1a0 0x1c4>;
+ reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
+ "port2", "port3", "port4", "port5", "port6",
+ "port7", "port8", "port9", "port10", "qsys",
+ "ana", "s0", "s1", "s2", "fdma";
+ interrupts = <18 21 16>;
+ interrupt-names = "ptp_rdy", "xtr", "fdma";
+
+ ethernet-ports {
+ #address-cells = <1>;
+ #size-cells = <0>;
+
+ port0: port@0 {
+ reg = <0>;
+ phy-handle = <&phy0>;
+ };
+ port1: port@1 {
+ reg = <1>;
+ phy-handle = <&phy1>;
+ };
+ };
+ };
+
+...
+# vim: set ts=2 sw=2 sts=2 tw=80 et cc=80 ft=yaml :
diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt
deleted file mode 100644
index 3b6290b45ce5..000000000000
--- a/Documentation/devicetree/bindings/net/mscc-ocelot.txt
+++ /dev/null
@@ -1,83 +0,0 @@
-Microsemi Ocelot network Switch
-===============================
-
-The Microsemi Ocelot network switch can be found on Microsemi SoCs (VSC7513,
-VSC7514)
-
-Required properties:
-- compatible: Should be "mscc,vsc7514-switch"
-- reg: Must contain an (offset, length) pair of the register set for each
- entry in reg-names.
-- reg-names: Must include the following entries:
- - "sys"
- - "rew"
- - "qs"
- - "ptp" (optional due to backward compatibility)
- - "qsys"
- - "ana"
- - "portX" with X from 0 to the number of last port index available on that
- switch
-- interrupts: Should contain the switch interrupts for frame extraction,
- frame injection and PTP ready.
-- interrupt-names: should contain the interrupt names: "xtr", "inj". Can contain
- "ptp_rdy" which is optional due to backward compatibility.
-- ethernet-ports: A container for child nodes representing switch ports.
-
-The ethernet-ports container has the following properties
-
-Required properties:
-
-- #address-cells: Must be 1
-- #size-cells: Must be 0
-
-Each port node must have the following mandatory properties:
-- reg: Describes the port address in the switch
-
-Port nodes may also contain the following optional standardised
-properties, described in binding documents:
-
-- phy-handle: Phandle to a PHY on an MDIO bus. See
- Documentation/devicetree/bindings/net/ethernet.txt for details.
-
-Example:
-
- switch@1010000 {
- compatible = "mscc,vsc7514-switch";
- reg = <0x1010000 0x10000>,
- <0x1030000 0x10000>,
- <0x1080000 0x100>,
- <0x10e0000 0x10000>,
- <0x11e0000 0x100>,
- <0x11f0000 0x100>,
- <0x1200000 0x100>,
- <0x1210000 0x100>,
- <0x1220000 0x100>,
- <0x1230000 0x100>,
- <0x1240000 0x100>,
- <0x1250000 0x100>,
- <0x1260000 0x100>,
- <0x1270000 0x100>,
- <0x1280000 0x100>,
- <0x1800000 0x80000>,
- <0x1880000 0x10000>;
- reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
- "port2", "port3", "port4", "port5", "port6",
- "port7", "port8", "port9", "port10", "qsys",
- "ana";
- interrupts = <18 21 22>;
- interrupt-names = "ptp_rdy", "xtr", "inj";
-
- ethernet-ports {
- #address-cells = <1>;
- #size-cells = <0>;
-
- port0: port@0 {
- reg = <0>;
- phy-handle = <&phy0>;
- };
- port1: port@1 {
- reg = <1>;
- phy-handle = <&phy1>;
- };
- };
- };
--
2.33.0

2021-11-03 09:21:52

by Clément Léger

Subject: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

IFH preparation can take quite some time on slow processors (up to 5% in
an iperf3 test for instance). In order to reduce the cost of this
preparation, pre-compute the IFH since most of the parameters are fixed per
port. Only rew_op and the vlan tag are set when sending, and only if they
are non-zero. This entirely removes the calls to packing() in the basic
use case. At the same time, export ocelot_ifh_port_set(), which will be
used by FDMA.
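
As a rough illustration of the idea (not the driver code itself), the
following user-space sketch builds a per-port header template once and
then only copies and patches it on the fast path. The byte offsets, the
16-byte TAG_LEN, the port_ctx struct and the put_be16()/put_be32()
helpers are assumptions made up for this example; the real IFH is a
packed bit field and the real destination field is a port mask.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TAG_LEN 16 /* hypothetical size, not the real OCELOT_TAG_LEN */

/* Hypothetical byte offsets, chosen only to keep the example readable. */
enum { OFF_BYPASS = 0, OFF_DEST = 1, OFF_TAG_TYPE = 2, OFF_VLAN = 4, OFF_REW_OP = 8 };

struct port_ctx {
	uint8_t ifh[TAG_LEN]; /* template, built once per port */
};

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

static void put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)((v >> 16) & 0xff);
	p[2] = (uint8_t)((v >> 8) & 0xff);
	p[3] = (uint8_t)(v & 0xff);
}

/* Slow path: fill the fields that never change for a given port. */
void port_init_ifh(struct port_ctx *port, uint8_t port_num)
{
	memset(port->ifh, 0, TAG_LEN);
	port->ifh[OFF_BYPASS] = 1;
	port->ifh[OFF_DEST] = port_num; /* simplified: the driver stores a mask */
	port->ifh[OFF_TAG_TYPE] = 0;
}

/* Fast path: copy the template, then patch only the non-zero fields. */
void ifh_port_set(uint8_t *ifh, const struct port_ctx *port,
		  uint32_t rew_op, uint16_t vlan_tag)
{
	memcpy(ifh, port->ifh, TAG_LEN);

	if (vlan_tag)
		put_be16(ifh + OFF_VLAN, vlan_tag);
	if (rew_op)
		put_be32(ifh + OFF_REW_OP, rew_op);
}
```

With rew_op and vlan_tag both zero, the fast path reduces to a single
memcpy(), which is where the saving is expected to come from.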

Signed-off-by: Clément Léger <[email protected]>
---
drivers/net/ethernet/mscc/ocelot.c | 23 ++++++++++++++++++-----
include/soc/mscc/ocelot.h | 5 +++++
2 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
index e6c18b598d5c..97693772595b 100644
--- a/drivers/net/ethernet/mscc/ocelot.c
+++ b/drivers/net/ethernet/mscc/ocelot.c
@@ -1076,20 +1076,29 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp)
}
EXPORT_SYMBOL(ocelot_can_inject);

+void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
+ u32 vlan_tag)
+{
+ memcpy(ifh, port->ifh, OCELOT_TAG_LEN);
+
+ if (vlan_tag)
+ ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
+ if (rew_op)
+ ocelot_ifh_set_rew_op(ifh, rew_op);
+}
+EXPORT_SYMBOL(ocelot_ifh_port_set);
+
void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
u32 rew_op, struct sk_buff *skb)
{
+ struct ocelot_port *port_s = ocelot->ports[port];
u32 ifh[OCELOT_TAG_LEN / 4] = {0};
unsigned int i, count, last;

ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);

- ocelot_ifh_set_bypass(ifh, 1);
- ocelot_ifh_set_dest(ifh, BIT_ULL(port));
- ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C);
- ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
- ocelot_ifh_set_rew_op(ifh, rew_op);
+ ocelot_ifh_port_set(ifh, port_s, rew_op, skb_vlan_tag_get(skb));

for (i = 0; i < OCELOT_TAG_LEN / 4; i++)
ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp);
@@ -2128,6 +2137,10 @@ void ocelot_init_port(struct ocelot *ocelot, int port)

skb_queue_head_init(&ocelot_port->tx_skbs);

+ ocelot_ifh_set_bypass(ocelot_port->ifh, 1);
+ ocelot_ifh_set_dest(ocelot_port->ifh, BIT_ULL(port));
+ ocelot_ifh_set_tag_type(ocelot_port->ifh, IFH_TAG_TYPE_C);
+
/* Basic L2 initialization */

/* Set MAC IFG Gaps
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index fef3a36b0210..b3381c90ff3e 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -6,6 +6,7 @@
#define _SOC_MSCC_OCELOT_H

#include <linux/ptp_clock_kernel.h>
+#include <linux/dsa/ocelot.h>
#include <linux/net_tstamp.h>
#include <linux/if_vlan.h>
#include <linux/regmap.h>
@@ -623,6 +624,8 @@ struct ocelot_port {

struct net_device *bridge;
u8 stp_state;
+
+ u8 ifh[OCELOT_TAG_LEN];
};

struct ocelot {
@@ -754,6 +757,8 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target,
bool ocelot_can_inject(struct ocelot *ocelot, int grp);
void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
u32 rew_op, struct sk_buff *skb);
+void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
+ u32 vlan_tag);
int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb);
void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp);

--
2.33.0

2021-11-03 09:22:02

by Clément Léger

Subject: [PATCH v2 5/6] net: ocelot: add FDMA support

Ethernet frames can be extracted or injected autonomously to or from the
device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
structures in memory are used for injecting or extracting Ethernet frames.
The FDMA generates interrupts when frame extraction or injection is done
and when the linked lists need updating.

The FDMA is shared between all the ethernet ports of the switch and uses
a linked list of descriptors (DCB) to inject and extract packets.
Before adding descriptors, the FDMA channels must be stopped. It would
be inefficient to do that each time a descriptor is added, so new
descriptors are queued and the channel is restarted only once it has
stopped.

TX path uses multiple lists to handle descriptors. tx_ongoing is the list
of DCB currently owned by the hardware, tx_queued is a list of DCB that
will be given to the hardware when tx_ongoing is done and finally
tx_free_dcb is the list of DCB available for TX.
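
The interplay of the three TX lists can be sketched in plain C as
follows. This is a simplified user-space model, assuming a trivial
singly linked queue instead of struct list_head, a "done" flag standing
in for the hardware status bit, and an hw_starts counter standing in
for channel activation; all names here are hypothetical and there is no
locking.

```c
#include <assert.h>
#include <stddef.h>

struct dcb { struct dcb *next; int done; };
struct queue { struct dcb *head, *tail; };

static void q_push(struct queue *q, struct dcb *d)
{
	d->next = NULL;
	if (q->tail)
		q->tail->next = d;
	else
		q->head = d;
	q->tail = d;
}

static struct dcb *q_pop(struct queue *q)
{
	struct dcb *d = q->head;

	if (d) {
		q->head = d->next;
		if (!q->head)
			q->tail = NULL;
		d->next = NULL;
	}
	return d;
}

struct tx_state {
	struct queue free, ongoing, queued;
	int hw_starts; /* number of times the channel was (re)started */
};

/* Transmit: start the hardware only if it is idle, otherwise queue. */
int tx_submit(struct tx_state *tx)
{
	struct dcb *d = q_pop(&tx->free);

	if (!d)
		return -1; /* no descriptor available */
	d->done = 0;
	if (!tx->ongoing.head) {
		q_push(&tx->ongoing, d);
		tx->hw_starts++;
	} else {
		q_push(&tx->queued, d);
	}
	return 0;
}

/* Completion: reclaim finished descriptors, restart if work is queued. */
void tx_complete(struct tx_state *tx)
{
	struct dcb *d;

	while (tx->ongoing.head && tx->ongoing.head->done)
		q_push(&tx->free, q_pop(&tx->ongoing));

	if (!tx->ongoing.head && tx->queued.head) {
		while ((d = q_pop(&tx->queued)))
			q_push(&tx->ongoing, d);
		tx->hw_starts++;
	}
}

/* Three submissions, one completion: returns the channel start count. */
int demo_tx(void)
{
	struct dcb pool[3] = { { NULL, 0 } };
	struct tx_state tx = { { NULL, NULL }, { NULL, NULL }, { NULL, NULL }, 0 };
	int i;

	for (i = 0; i < 3; i++)
		q_push(&tx.free, &pool[i]);
	for (i = 0; i < 3; i++)
		tx_submit(&tx);       /* first starts the channel, rest queue */
	tx.ongoing.head->done = 1;    /* pretend the hardware sent the frame */
	tx_complete(&tx);             /* reclaim it, then restart once */
	return tx.hw_starts;
}
```

In this model three frames cost two channel starts instead of three,
because the two frames that arrived while the channel was busy are
handed over together.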

RX path uses two lists: rx_hw is the list of DCB currently given to the
hardware and rx_sw is the list of descriptors that have been completed
by the FDMA and will be reinjected when the DMA hits the end of the
linked list.
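
The RX recycling scheme can be modelled the same way. The sketch below
is a user-space approximation under stated assumptions: a "done" flag
replaces the DCB status word, the hw_reached_end argument replaces
reading the channel LLP register, and the queue helpers and rx_state
struct are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>

struct dcb { struct dcb *next; int done; };
struct queue { struct dcb *head, *tail; };

static void q_push(struct queue *q, struct dcb *d)
{
	d->next = NULL;
	if (q->tail)
		q->tail->next = d;
	else
		q->head = d;
	q->tail = d;
}

static struct dcb *q_pop(struct queue *q)
{
	struct dcb *d = q->head;

	if (d) {
		q->head = d->next;
		if (!q->head)
			q->tail = NULL;
		d->next = NULL;
	}
	return d;
}

struct rx_state {
	struct queue hw, sw;
	int restarts; /* number of times the channel was (re)activated */
};

/* Poll: take completed descriptors off rx_hw and park them on rx_sw. */
int rx_poll(struct rx_state *rx, int budget)
{
	int work = 0;

	while (work < budget && rx->hw.head && rx->hw.head->done) {
		struct dcb *d = q_pop(&rx->hw);

		d->done = 0; /* frame handed to the stack, fresh buffer attached */
		q_push(&rx->sw, d);
		work++;
	}
	return work;
}

/* Refill: only when the DMA hit the end of its list, hand rx_sw back. */
void rx_check_stopped(struct rx_state *rx, int hw_reached_end)
{
	struct dcb *d;

	if (!hw_reached_end || !rx->sw.head)
		return;
	while ((d = q_pop(&rx->sw)))
		q_push(&rx->hw, d);
	rx->restarts++;
}

/* Initial start, one full batch received, one refill. */
int demo_rx(void)
{
	struct dcb pool[4] = { { NULL, 0 } };
	struct rx_state rx = { { NULL, NULL }, { NULL, NULL }, 0 };
	int i, work;

	for (i = 0; i < 4; i++)
		q_push(&rx.sw, &pool[i]);
	rx_check_stopped(&rx, 1);     /* initial activation */
	for (i = 0; i < 4; i++)
		pool[i].done = 1;     /* hardware filled every buffer */
	work = rx_poll(&rx, 32);
	assert(work == 4);
	rx_check_stopped(&rx, 1);     /* end of list reached: refill once */
	return rx.restarts;
}
```

Descriptors are therefore returned to the hardware in batches, which
avoids stopping the extraction channel for every received frame.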

Co-developed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Alexandre Belloni <[email protected]>
Signed-off-by: Clément Léger <[email protected]>
---
drivers/net/ethernet/mscc/Makefile | 1 +
drivers/net/ethernet/mscc/ocelot.h | 1 +
drivers/net/ethernet/mscc/ocelot_fdma.c | 693 +++++++++++++++++++++
drivers/net/ethernet/mscc/ocelot_fdma.h | 59 ++
drivers/net/ethernet/mscc/ocelot_net.c | 11 +-
drivers/net/ethernet/mscc/ocelot_vsc7514.c | 15 +
include/soc/mscc/ocelot.h | 2 +
7 files changed, 779 insertions(+), 3 deletions(-)
create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h

diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile
index 722c27694b21..d76a9b78b6ca 100644
--- a/drivers/net/ethernet/mscc/Makefile
+++ b/drivers/net/ethernet/mscc/Makefile
@@ -11,5 +11,6 @@ mscc_ocelot_switch_lib-y := \
mscc_ocelot_switch_lib-$(CONFIG_BRIDGE_MRP) += ocelot_mrp.o
obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot.o
mscc_ocelot-y := \
+ ocelot_fdma.o \
ocelot_vsc7514.o \
ocelot_net.o
diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
index ba0dec7dd64f..ad85ad1079ad 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -9,6 +9,7 @@
#define _MSCC_OCELOT_H_

#include <linux/bitops.h>
+#include <linux/dsa/ocelot.h>
#include <linux/etherdevice.h>
#include <linux/if_vlan.h>
#include <linux/net_tstamp.h>
diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
new file mode 100644
index 000000000000..d8cdf022bbee
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
@@ -0,0 +1,693 @@
+// SPDX-License-Identifier: (GPL-2.0 OR MIT)
+/*
+ * Microsemi SoCs FDMA driver
+ *
+ * Copyright (c) 2021 Microchip
+ */
+
+#include <linux/bitops.h>
+#include <linux/dmapool.h>
+#include <linux/dsa/ocelot.h>
+#include <linux/netdevice.h>
+#include <linux/of_platform.h>
+#include <linux/skbuff.h>
+
+#include "ocelot_fdma.h"
+#include "ocelot_qs.h"
+
+#define MSCC_FDMA_DCB_LLP(x) ((x) * 4 + 0x0)
+
+#define MSCC_FDMA_DCB_STAT_BLOCKO(x) (((x) << 20) & GENMASK(31, 20))
+#define MSCC_FDMA_DCB_STAT_BLOCKO_M GENMASK(31, 20)
+#define MSCC_FDMA_DCB_STAT_BLOCKO_X(x) (((x) & GENMASK(31, 20)) >> 20)
+#define MSCC_FDMA_DCB_STAT_PD BIT(19)
+#define MSCC_FDMA_DCB_STAT_ABORT BIT(18)
+#define MSCC_FDMA_DCB_STAT_EOF BIT(17)
+#define MSCC_FDMA_DCB_STAT_SOF BIT(16)
+#define MSCC_FDMA_DCB_STAT_BLOCKL_M GENMASK(15, 0)
+#define MSCC_FDMA_DCB_STAT_BLOCKL(x) ((x) & GENMASK(15, 0))
+
+#define MSCC_FDMA_CH_SAFE 0xcc
+
+#define MSCC_FDMA_CH_ACTIVATE 0xd0
+
+#define MSCC_FDMA_CH_DISABLE 0xd4
+
+#define MSCC_FDMA_EVT_ERR 0x164
+
+#define MSCC_FDMA_EVT_ERR_CODE 0x168
+
+#define MSCC_FDMA_INTR_LLP 0x16c
+
+#define MSCC_FDMA_INTR_LLP_ENA 0x170
+
+#define MSCC_FDMA_INTR_FRM 0x174
+
+#define MSCC_FDMA_INTR_FRM_ENA 0x178
+
+#define MSCC_FDMA_INTR_ENA 0x184
+
+#define MSCC_FDMA_INTR_IDENT 0x188
+
+#define MSCC_FDMA_INJ_CHAN 2
+#define MSCC_FDMA_XTR_CHAN 0
+
+#define FDMA_MAX_SKB 256
+#define FDMA_WEIGHT 32
+
+#define OCELOT_TAG_WORD_LEN (OCELOT_TAG_LEN / 4)
+
+/* Add 4 for possible misalignment when mapping the data */
+#define FDMA_RX_EXTRA_SIZE \
+ (OCELOT_TAG_LEN + ETH_FCS_LEN + ETH_HLEN + 4)
+
+struct ocelot_fdma_dcb_hw_v2 {
+ u32 llp;
+ u32 datap;
+ u32 datal;
+ u32 stat;
+};
+
+struct ocelot_fdma_dcb {
+ struct ocelot_fdma_dcb_hw_v2 hw;
+ struct list_head node;
+ struct sk_buff *skb;
+ dma_addr_t mapping;
+ size_t mapped_size;
+ dma_addr_t phys;
+};
+
+static int fdma_rx_compute_buffer_size(int mtu)
+{
+ return ALIGN(mtu + FDMA_RX_EXTRA_SIZE, 4);
+}
+
+static void fdma_writel(struct ocelot_fdma *fdma, u32 reg, u32 data)
+{
+ writel(data, fdma->base + reg);
+}
+
+static u32 fdma_readl(struct ocelot_fdma *fdma, u32 reg)
+{
+ return readl(fdma->base + reg);
+}
+
+static void fdma_activate_chan(struct ocelot_fdma *fdma,
+ struct ocelot_fdma_dcb *dcb, int chan)
+{
+ fdma_writel(fdma, MSCC_FDMA_DCB_LLP(chan), dcb->phys);
+ fdma_writel(fdma, MSCC_FDMA_CH_ACTIVATE, BIT(chan));
+}
+
+static void fdma_stop_channel(struct ocelot_fdma *fdma, int chan)
+{
+ u32 safe;
+
+ fdma_writel(fdma, MSCC_FDMA_CH_DISABLE, BIT(chan));
+ do {
+ safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
+ } while (!(safe & BIT(chan)));
+}
+
+static bool ocelot_fdma_dcb_set_data(struct ocelot_fdma *fdma,
+ struct ocelot_fdma_dcb *dcb, void *data,
+ size_t size, enum dma_data_direction dir)
+{
+ u32 offset;
+
+ dcb->mapped_size = size;
+ dcb->mapping = dma_map_single(fdma->dev, data, size, dir);
+ if (unlikely(dma_mapping_error(fdma->dev, dcb->mapping)))
+ return false;
+
+ offset = dcb->mapping & 0x3;
+
+ dcb->hw.llp = 0;
+ dcb->hw.datap = dcb->mapping & ~0x3;
+ /* DATAL must be a multiple of word size */
+ dcb->hw.datal = ALIGN_DOWN(size - offset, 4);
+ dcb->hw.stat = MSCC_FDMA_DCB_STAT_BLOCKO(offset);
+
+ return true;
+}
+
+static bool ocelot_fdma_dcb_set_rx_skb(struct ocelot_fdma *fdma,
+ struct ocelot_fdma_dcb *dcb,
+ struct sk_buff *skb, size_t size)
+{
+ dcb->skb = skb;
+ return ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, size,
+ DMA_FROM_DEVICE);
+}
+
+static bool ocelot_fdma_dcb_set_tx_skb(struct ocelot_fdma *fdma,
+ struct ocelot_fdma_dcb *dcb,
+ struct sk_buff *skb)
+{
+ if (!ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, skb->len,
+ DMA_TO_DEVICE))
+ return false;
+
+ dcb->skb = skb;
+ dcb->hw.stat |= MSCC_FDMA_DCB_STAT_BLOCKL(skb->len);
+ dcb->hw.stat |= MSCC_FDMA_DCB_STAT_SOF | MSCC_FDMA_DCB_STAT_EOF;
+
+ return true;
+}
+
+static struct ocelot_fdma_dcb *fdma_dcb_alloc(struct ocelot_fdma *fdma)
+{
+ struct ocelot_fdma_dcb *dcb;
+ dma_addr_t phys;
+
+ dcb = dma_pool_zalloc(fdma->dcb_pool, GFP_KERNEL, &phys);
+ if (unlikely(!dcb))
+ return NULL;
+
+ dcb->phys = phys;
+
+ return dcb;
+}
+
+static struct net_device *fdma_get_port_netdev(struct ocelot_fdma *fdma,
+ int port_num)
+{
+ struct ocelot_port_private *port_priv;
+ struct ocelot *ocelot = fdma->ocelot;
+ struct ocelot_port *port;
+
+ if (port_num >= ocelot->num_phys_ports)
+ return NULL;
+
+ port = ocelot->ports[port_num];
+
+ if (!port)
+ return NULL;
+
+ port_priv = container_of(port, struct ocelot_port_private, port);
+
+ return port_priv->dev;
+}
+
+static bool ocelot_fdma_rx_process_skb(struct ocelot_fdma *fdma,
+ struct ocelot_fdma_dcb *dcb,
+ int budget)
+{
+ struct sk_buff *skb = dcb->skb;
+ struct net_device *ndev;
+ u64 src_port;
+ void *xfh;
+
+ dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
+ DMA_FROM_DEVICE);
+
+ xfh = skb->data;
+ ocelot_xfh_get_src_port(xfh, &src_port);
+
+ skb_put(skb, MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat));
+ skb_pull(skb, OCELOT_TAG_LEN);
+
+ ndev = fdma_get_port_netdev(fdma, src_port);
+ if (unlikely(!ndev)) {
+ napi_consume_skb(dcb->skb, budget);
+ return false;
+ }
+
+ skb->dev = ndev;
+ skb->protocol = eth_type_trans(skb, skb->dev);
+ skb->dev->stats.rx_bytes += skb->len;
+ skb->dev->stats.rx_packets++;
+
+ netif_receive_skb(skb);
+
+ return true;
+}
+
+static void ocelot_fdma_rx_refill(struct ocelot_fdma *fdma)
+{
+ struct ocelot_fdma_dcb *dcb, *last_dcb;
+
+ WARN_ON(list_empty(&fdma->rx_sw));
+
+ dcb = list_first_entry(&fdma->rx_sw, struct ocelot_fdma_dcb, node);
+ /* Splice old hardware DCB list + new one */
+ if (!list_empty(&fdma->rx_hw)) {
+ last_dcb = list_last_entry(&fdma->rx_hw, struct ocelot_fdma_dcb,
+ node);
+ last_dcb->hw.llp = dcb->phys;
+ }
+
+ /* Move software list to hardware list */
+ list_splice_tail_init(&fdma->rx_sw, &fdma->rx_hw);
+
+ /* Finally reactivate the channel */
+ fdma_activate_chan(fdma, dcb, MSCC_FDMA_XTR_CHAN);
+}
+
+static void ocelot_fdma_list_add_dcb(struct list_head *list,
+ struct ocelot_fdma_dcb *dcb)
+{
+ struct ocelot_fdma_dcb *last_dcb;
+
+ if (!list_empty(list)) {
+ last_dcb = list_last_entry(list, struct ocelot_fdma_dcb, node);
+ last_dcb->hw.llp = dcb->phys;
+ }
+
+ list_add_tail(&dcb->node, list);
+}
+
+static bool ocelot_fdma_rx_add_dcb_sw(struct ocelot_fdma *fdma,
+ struct ocelot_fdma_dcb *dcb)
+{
+ struct sk_buff *new_skb;
+
+ /* Add DCB to end of list with new SKB */
+ new_skb = napi_alloc_skb(&fdma->napi, fdma->rx_buf_size);
+ if (unlikely(!new_skb)) {
+ pr_err("skb_alloc failed\n");
+ return false;
+ }
+
+ ocelot_fdma_dcb_set_rx_skb(fdma, dcb, new_skb, fdma->rx_buf_size);
+ ocelot_fdma_list_add_dcb(&fdma->rx_sw, dcb);
+
+ return true;
+}
+
+static bool ocelot_fdma_rx_get(struct ocelot_fdma *fdma, int budget)
+{
+ struct ocelot_fdma_dcb *dcb;
+ bool valid = true;
+ u32 stat;
+
+ dcb = list_first_entry_or_null(&fdma->rx_hw, struct ocelot_fdma_dcb,
+ node);
+ if (!dcb || MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat) == 0)
+ return false;
+
+ list_del(&dcb->node);
+
+ stat = dcb->hw.stat;
+ if (stat & MSCC_FDMA_DCB_STAT_ABORT || stat & MSCC_FDMA_DCB_STAT_PD)
+ valid = false;
+
+ if (!(stat & MSCC_FDMA_DCB_STAT_SOF) ||
+ !(stat & MSCC_FDMA_DCB_STAT_EOF))
+ valid = false;
+
+ if (likely(valid)) {
+ if (!ocelot_fdma_rx_process_skb(fdma, dcb, budget))
+ pr_err("Process skb failed, stat %x\n", stat);
+ } else {
+ napi_consume_skb(dcb->skb, budget);
+ }
+
+ return ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
+}
+
+static void ocelot_fdma_rx_check_stopped(struct ocelot_fdma *fdma)
+{
+ u32 llp = fdma_readl(fdma, MSCC_FDMA_DCB_LLP(MSCC_FDMA_XTR_CHAN));
+ /* LLP is non NULL, FDMA is still fetching packets */
+ if (llp)
+ return;
+
+ fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
+ ocelot_fdma_rx_refill(fdma);
+}
+
+static void ocelot_fdma_tx_free_dcb(struct ocelot_fdma *fdma,
+ struct list_head *list)
+{
+ struct ocelot_fdma_dcb *dcb;
+
+ if (list_empty(list))
+ return;
+
+ /* Free all SKBs that have been used for TX */
+ list_for_each_entry(dcb, list, node) {
+ dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
+ DMA_TO_DEVICE);
+ dev_consume_skb_any(dcb->skb);
+ dcb->skb = NULL;
+ }
+
+ /* All DCBs can now be given to free list */
+ spin_lock(&fdma->tx_free_lock);
+ list_splice_tail_init(list, &fdma->tx_free_dcb);
+ spin_unlock(&fdma->tx_free_lock);
+}
+
+static void ocelot_fdma_tx_cleanup(struct ocelot_fdma *fdma)
+{
+ struct list_head tx_done = LIST_HEAD_INIT(tx_done);
+ struct ocelot_fdma_dcb *dcb, *temp;
+
+ spin_lock(&fdma->tx_enqueue_lock);
+ if (list_empty(&fdma->tx_ongoing))
+ goto out_unlock;
+
+ list_for_each_entry_safe(dcb, temp, &fdma->tx_ongoing, node) {
+ if (!(dcb->hw.stat & MSCC_FDMA_DCB_STAT_PD))
+ break;
+
+ list_move_tail(&dcb->node, &tx_done);
+ }
+
+out_unlock:
+ spin_unlock(&fdma->tx_enqueue_lock);
+
+ ocelot_fdma_tx_free_dcb(fdma, &tx_done);
+}
+
+static void ocelot_fdma_tx_restart(struct ocelot_fdma *fdma)
+{
+ struct ocelot_fdma_dcb *dcb;
+ u32 safe;
+
+ spin_lock(&fdma->tx_enqueue_lock);
+
+ if (!list_empty(&fdma->tx_ongoing) || list_empty(&fdma->tx_queued))
+ goto out_unlock;
+
+ /* Ongoing list is empty, channel should be in safe mode */
+ do {
+ safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
+ } while (!(safe & BIT(MSCC_FDMA_INJ_CHAN)));
+
+ /* Move queued DCB to ongoing and restart the DMA */
+ list_splice_tail_init(&fdma->tx_queued, &fdma->tx_ongoing);
+ /* List can't be empty, no need to check */
+ dcb = list_first_entry(&fdma->tx_ongoing, struct ocelot_fdma_dcb, node);
+
+ fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
+
+out_unlock:
+ spin_unlock(&fdma->tx_enqueue_lock);
+}
+
+static int ocelot_fdma_napi_poll(struct napi_struct *napi, int budget)
+{
+ struct ocelot_fdma *fdma = container_of(napi, struct ocelot_fdma, napi);
+ int work_done = 0;
+
+ ocelot_fdma_tx_cleanup(fdma);
+ ocelot_fdma_tx_restart(fdma);
+
+ while (work_done < budget) {
+ if (!ocelot_fdma_rx_get(fdma, budget))
+ break;
+
+ work_done++;
+ }
+
+ ocelot_fdma_rx_check_stopped(fdma);
+
+ if (work_done < budget) {
+ napi_complete(&fdma->napi);
+ fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
+ BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
+ }
+
+ return work_done;
+}
+
+static irqreturn_t ocelot_fdma_interrupt(int irq, void *dev_id)
+{
+ u32 ident, llp, frm, err, err_code;
+ struct ocelot_fdma *fdma = dev_id;
+
+ ident = fdma_readl(fdma, MSCC_FDMA_INTR_IDENT);
+ frm = fdma_readl(fdma, MSCC_FDMA_INTR_FRM);
+ llp = fdma_readl(fdma, MSCC_FDMA_INTR_LLP);
+
+ fdma_writel(fdma, MSCC_FDMA_INTR_LLP, llp & ident);
+ fdma_writel(fdma, MSCC_FDMA_INTR_FRM, frm & ident);
+ if (frm | llp) {
+ fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
+ napi_schedule(&fdma->napi);
+ }
+
+ err = fdma_readl(fdma, MSCC_FDMA_EVT_ERR);
+ if (unlikely(err)) {
+ err_code = fdma_readl(fdma, MSCC_FDMA_EVT_ERR_CODE);
+ dev_err_ratelimited(fdma->dev,
+ "Error ! chans mask: %#x, code: %#x\n",
+ err, err_code);
+
+ fdma_writel(fdma, MSCC_FDMA_EVT_ERR, err);
+ fdma_writel(fdma, MSCC_FDMA_EVT_ERR_CODE, err_code);
+ }
+
+ return IRQ_HANDLED;
+}
+
+static struct ocelot_fdma_dcb *fdma_tx_get_dcb(struct ocelot_fdma *fdma)
+{
+ struct ocelot_fdma_dcb *dcb = NULL;
+
+ spin_lock_bh(&fdma->tx_free_lock);
+ dcb = list_first_entry_or_null(&fdma->tx_free_dcb,
+ struct ocelot_fdma_dcb, node);
+ if (dcb)
+ list_del(&dcb->node);
+
+ spin_unlock_bh(&fdma->tx_free_lock);
+
+ return dcb;
+}
+
+int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
+ struct sk_buff *skb, struct net_device *dev)
+{
+ struct ocelot_port *port_s = fdma->ocelot->ports[port];
+ struct ocelot_fdma_dcb *dcb;
+ struct sk_buff *new_skb;
+ void *ifh;
+
+ if (unlikely(skb_shinfo(skb)->nr_frags != 0)) {
+ netdev_err(dev, "Unsupported fragmented packet\n");
+ dev_kfree_skb_any(skb);
+ return NETDEV_TX_OK;
+ }
+
+ if (skb_headroom(skb) < OCELOT_TAG_LEN ||
+ skb_tailroom(skb) < ETH_FCS_LEN) {
+ new_skb = skb_copy_expand(skb, OCELOT_TAG_LEN, ETH_FCS_LEN,
+ GFP_ATOMIC);
+ dev_consume_skb_any(skb);
+ if (!new_skb)
+ return NETDEV_TX_OK;
+
+ skb = new_skb;
+ }
+
+ ifh = skb_push(skb, OCELOT_TAG_LEN);
+ skb_put(skb, ETH_FCS_LEN);
+ ocelot_ifh_port_set(ifh, port_s, rew_op, skb_vlan_tag_get(skb));
+
+ dcb = fdma_tx_get_dcb(fdma);
+ if (unlikely(!dcb))
+ return NETDEV_TX_BUSY;
+
+ if (!ocelot_fdma_dcb_set_tx_skb(fdma, dcb, skb)) {
+ dev_kfree_skb_any(skb);
+ spin_lock_bh(&fdma->tx_free_lock);
+ list_add_tail(&dcb->node, &fdma->tx_free_dcb);
+ spin_unlock_bh(&fdma->tx_free_lock);
+ return NETDEV_TX_OK;
+ }
+
+ spin_lock_bh(&fdma->tx_enqueue_lock);
+
+ if (list_empty(&fdma->tx_ongoing)) {
+ ocelot_fdma_list_add_dcb(&fdma->tx_ongoing, dcb);
+ fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
+ } else {
+ ocelot_fdma_list_add_dcb(&fdma->tx_queued, dcb);
+ }
+
+ spin_unlock_bh(&fdma->tx_enqueue_lock);
+ return NETDEV_TX_OK;
+}
+
+static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
+ struct list_head *list,
+ enum dma_data_direction dir)
+{
+ struct ocelot_fdma_dcb *dcb;
+
+ if (list_empty(list))
+ return;
+
+ list_for_each_entry(dcb, list, node) {
+ if (dcb->skb) {
+ dma_unmap_single(fdma->dev, dcb->mapping,
+ dcb->mapped_size, dir);
+ dev_kfree_skb_any(dcb->skb);
+ }
+ }
+}
+
+static int fdma_init_tx(struct ocelot_fdma *fdma)
+{
+ int i;
+ struct ocelot_fdma_dcb *dcb;
+
+ for (i = 0; i < FDMA_MAX_SKB; i++) {
+ dcb = fdma_dcb_alloc(fdma);
+ if (!dcb)
+ return -ENOMEM;
+
+ list_add_tail(&dcb->node, &fdma->tx_free_dcb);
+ }
+
+ return 0;
+}
+
+static int fdma_init_rx(struct ocelot_fdma *fdma)
+{
+ struct ocelot_port_private *port_priv;
+ struct ocelot *ocelot = fdma->ocelot;
+ struct ocelot_fdma_dcb *dcb;
+ struct ocelot_port *port;
+ struct net_device *ndev = NULL;
+ int max_mtu = 0;
+ int i;
+ u8 port_num;
+
+ for (port_num = 0; port_num < ocelot->num_phys_ports; port_num++) {
+ port = ocelot->ports[port_num];
+ if (!port)
+ continue;
+
+ port_priv = container_of(port, struct ocelot_port_private,
+ port);
+ ndev = port_priv->dev;
+
+ ndev->needed_headroom = OCELOT_TAG_LEN;
+ ndev->needed_tailroom = ETH_FCS_LEN;
+
+ if (READ_ONCE(ndev->mtu) > max_mtu)
+ max_mtu = READ_ONCE(ndev->mtu);
+ }
+
+ if (!ndev)
+ return -ENODEV;
+
+ fdma->rx_buf_size = fdma_rx_compute_buffer_size(max_mtu);
+ netif_napi_add(ndev, &fdma->napi, ocelot_fdma_napi_poll,
+ FDMA_WEIGHT);
+
+ for (i = 0; i < FDMA_MAX_SKB; i++) {
+ dcb = fdma_dcb_alloc(fdma);
+ if (!dcb)
+ return -ENOMEM;
+
+ ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
+ }
+
+ napi_enable(&fdma->napi);
+
+ return 0;
+}
+
+struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
+ struct ocelot *ocelot)
+{
+ struct ocelot_fdma *fdma;
+ int ret;
+
+ fdma = devm_kzalloc(&pdev->dev, sizeof(*fdma), GFP_KERNEL);
+ if (!fdma)
+ return ERR_PTR(-ENOMEM);
+
+ fdma->ocelot = ocelot;
+ fdma->base = devm_platform_ioremap_resource_byname(pdev, "fdma");
+ if (IS_ERR(fdma->base))
+ return ERR_CAST(fdma->base);
+
+ fdma->dev = &pdev->dev;
+ fdma->dev->coherent_dma_mask = DMA_BIT_MASK(32);
+
+ spin_lock_init(&fdma->tx_enqueue_lock);
+ spin_lock_init(&fdma->tx_free_lock);
+
+ fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
+
+ fdma->irq = platform_get_irq_byname(pdev, "fdma");
+ ret = devm_request_irq(&pdev->dev, fdma->irq, ocelot_fdma_interrupt, 0,
+ dev_name(&pdev->dev), fdma);
+ if (ret)
+ return ERR_PTR(ret);
+
+ /* Create a pool of consistent memory blocks for hardware descriptors */
+ fdma->dcb_pool = dmam_pool_create("ocelot_fdma", &pdev->dev,
+ sizeof(struct ocelot_fdma_dcb),
+ __alignof__(struct ocelot_fdma_dcb),
+ 0);
+ if (!fdma->dcb_pool) {
+ dev_err(&pdev->dev, "unable to allocate DMA descriptor pool\n");
+ return ERR_PTR(-ENOMEM);
+ }
+
+ INIT_LIST_HEAD(&fdma->tx_ongoing);
+ INIT_LIST_HEAD(&fdma->tx_free_dcb);
+ INIT_LIST_HEAD(&fdma->tx_queued);
+ INIT_LIST_HEAD(&fdma->rx_sw);
+ INIT_LIST_HEAD(&fdma->rx_hw);
+
+ return fdma;
+}
+
+int ocelot_fdma_start(struct ocelot_fdma *fdma)
+{
+ struct ocelot *ocelot = fdma->ocelot;
+ int ret;
+
+ ret = fdma_init_tx(fdma);
+ if (ret)
+ return ret;
+
+ ret = fdma_init_rx(fdma);
+ if (ret)
+ return ret;
+
+ /* Reconfigure for extraction and injection using DMA */
+ ocelot_write_rix(ocelot, QS_INJ_GRP_CFG_MODE(2), QS_INJ_GRP_CFG, 0);
+ ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(0), QS_INJ_CTRL, 0);
+
+ ocelot_write_rix(ocelot, QS_XTR_GRP_CFG_MODE(2), QS_XTR_GRP_CFG, 0);
+
+ fdma_writel(fdma, MSCC_FDMA_INTR_LLP, 0xffffffff);
+ fdma_writel(fdma, MSCC_FDMA_INTR_FRM, 0xffffffff);
+
+ fdma_writel(fdma, MSCC_FDMA_INTR_LLP_ENA,
+ BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
+ fdma_writel(fdma, MSCC_FDMA_INTR_FRM_ENA, BIT(MSCC_FDMA_XTR_CHAN));
+ fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
+ BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
+
+ ocelot_fdma_rx_refill(fdma);
+
+ return 0;
+}
+
+int ocelot_fdma_stop(struct ocelot_fdma *fdma)
+{
+ fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
+
+ fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
+ fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
+
+ /* Free potentially pending SKBs in DCB lists */
+ fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
+ fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
+ fdma_free_skbs_list(fdma, &fdma->tx_ongoing, DMA_TO_DEVICE);
+ fdma_free_skbs_list(fdma, &fdma->tx_queued, DMA_TO_DEVICE);
+
+ netif_napi_del(&fdma->napi);
+
+ return 0;
+}
diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
new file mode 100644
index 000000000000..6c5c5872abf5
--- /dev/null
+++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
@@ -0,0 +1,59 @@
+/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
+/*
+ * Microsemi SoCs FDMA driver
+ *
+ * Copyright (c) 2021 Microchip
+ */
+#ifndef _MSCC_OCELOT_FDMA_H_
+#define _MSCC_OCELOT_FDMA_H_
+
+#include "ocelot.h"
+
+/**
+ * struct ocelot_fdma - FDMA struct
+ *
+ * @ocelot: Pointer to ocelot struct
+ * @base: base address of FDMA registers
+ * @dcb_pool: Pool used for DCB allocation
+ * @irq: FDMA interrupt
+ * @dev: Ocelot device
+ * @napi: napi handle
+ * @rx_buf_size: Size of RX buffer
+ * @tx_ongoing: List of DCB handed out to the FDMA
+ * @tx_queued: pending list of DCBs to be given to the hardware
+ * @tx_enqueue_lock: Lock used for tx_queued and tx_ongoing
+ * @tx_free_dcb: List of DCB available for TX
+ * @tx_free_lock: Lock used to access tx_free_dcb list
+ * @rx_hw: RX DCBs currently owned by the hardware and not completed
+ * @rx_sw: RX DCBs completed
+ */
+struct ocelot_fdma {
+ struct ocelot *ocelot;
+ void __iomem *base;
+ struct dma_pool *dcb_pool;
+ int irq;
+ struct device *dev;
+ struct napi_struct napi;
+ size_t rx_buf_size;
+
+ struct list_head tx_ongoing;
+ struct list_head tx_queued;
+ /* Lock for tx_queued and tx_ongoing lists */
+ spinlock_t tx_enqueue_lock;
+
+ struct list_head tx_free_dcb;
+ /* Lock for tx_free_dcb list */
+ spinlock_t tx_free_lock;
+
+ struct list_head rx_hw;
+ struct list_head rx_sw;
+};
+
+struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
+ struct ocelot *ocelot);
+int ocelot_fdma_start(struct ocelot_fdma *fdma);
+int ocelot_fdma_stop(struct ocelot_fdma *fdma);
+int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
+ struct sk_buff *skb, struct net_device *dev);
+
+#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index 5916492fd6d0..3971b810c5b4 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -15,6 +15,7 @@
#include <net/pkt_cls.h>
#include "ocelot.h"
#include "ocelot_vcap.h"
+#include "ocelot_fdma.h"

#define OCELOT_MAC_QUIRKS OCELOT_QUIRK_QSGMII_PORTS_MUST_BE_UP

@@ -457,7 +458,7 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
int port = priv->chip_port;
u32 rew_op = 0;

- if (!ocelot_can_inject(ocelot, 0))
+ if (!ocelot->fdma && !ocelot_can_inject(ocelot, 0))
return NETDEV_TX_BUSY;

/* Check if timestamping is needed */
@@ -475,9 +476,13 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
rew_op = ocelot_ptp_rew_op(skb);
}

- ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
+ if (ocelot->fdma) {
+ ocelot_fdma_inject_frame(ocelot->fdma, port, rew_op, skb, dev);
+ } else {
+ ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);

- kfree_skb(skb);
+ kfree_skb(skb);
+ }

return NETDEV_TX_OK;
}
diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
index 38103b0255b0..985d584db3a1 100644
--- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c
+++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
@@ -18,6 +18,7 @@

#include <soc/mscc/ocelot_vcap.h>
#include <soc/mscc/ocelot_hsio.h>
+#include "ocelot_fdma.h"
#include "ocelot.h"

static const u32 ocelot_ana_regmap[] = {
@@ -1080,6 +1081,10 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
ocelot->targets[io_target[i].id] = target;
}

+ ocelot->fdma = ocelot_fdma_init(pdev, ocelot);
+ if (IS_ERR(ocelot->fdma))
+ ocelot->fdma = NULL;
+
hsio = syscon_regmap_lookup_by_compatible("mscc,ocelot-hsio");
if (IS_ERR(hsio)) {
dev_err(&pdev->dev, "missing hsio syscon\n");
@@ -1139,6 +1144,12 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
if (err)
goto out_ocelot_devlink_unregister;

+ if (ocelot->fdma) {
+ err = ocelot_fdma_start(ocelot->fdma);
+ if (err)
+ goto out_ocelot_devlink_unregister;
+ }
+
err = ocelot_devlink_sb_register(ocelot);
if (err)
goto out_ocelot_release_ports;
@@ -1166,6 +1177,8 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
out_ocelot_release_ports:
mscc_ocelot_release_ports(ocelot);
mscc_ocelot_teardown_devlink_ports(ocelot);
+ if (ocelot->fdma)
+ ocelot_fdma_stop(ocelot->fdma);
out_ocelot_devlink_unregister:
ocelot_deinit(ocelot);
out_put_ports:
@@ -1179,6 +1192,8 @@ static int mscc_ocelot_remove(struct platform_device *pdev)
{
struct ocelot *ocelot = platform_get_drvdata(pdev);

+ if (ocelot->fdma)
+ ocelot_fdma_stop(ocelot->fdma);
devlink_unregister(ocelot->devlink);
ocelot_deinit_timestamp(ocelot);
ocelot_devlink_sb_unregister(ocelot);
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
index b3381c90ff3e..33e1559bdea3 100644
--- a/include/soc/mscc/ocelot.h
+++ b/include/soc/mscc/ocelot.h
@@ -695,6 +695,8 @@ struct ocelot {
/* Protects the PTP clock */
spinlock_t ptp_clock_lock;
struct ptp_pin_desc ptp_pins[OCELOT_PTP_PINS_NUM];
+
+ struct ocelot_fdma *fdma;
};

struct ocelot_policer {
--
2.33.0

2021-11-03 09:22:38

by Clément Léger

Subject: [PATCH v2 6/6] net: ocelot: add jumbo frame support for FDMA

When using the FDMA, jumbo frames can bring a large performance
improvement. When the MTU changes, the RX buffer size must be
increased so that it is large enough to receive jumbo frames. Since the
FDMA is shared amongst all interfaces, all the ports must be down before
changing the MTU. Buffers are sized to accept the largest MTU configured
on any port.
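
As a rough illustration of the sizing rule described above, the sketch below
mirrors the arithmetic of fdma_rx_compute_buffer_size() and the "largest MTU
wins" loop in user-space C. The sketch_* names are hypothetical, and the tag
length of 16 bytes is an assumption about OCELOT_TAG_LEN; the header and FCS
lengths are the usual Ethernet values.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed constants: the tag length is a guess at OCELOT_TAG_LEN, the
 * other two are the standard Ethernet header and FCS lengths.
 */
#define SKETCH_TAG_LEN		16
#define SKETCH_ETH_HLEN		14
#define SKETCH_ETH_FCS_LEN	4

/* 4 extra bytes for possible misalignment when mapping the data */
#define SKETCH_RX_EXTRA_SIZE \
	(SKETCH_TAG_LEN + SKETCH_ETH_FCS_LEN + SKETCH_ETH_HLEN + 4)

/* Round x up to the next multiple of 4 */
size_t sketch_align4(size_t x)
{
	return (x + 3) & ~(size_t)3;
}

/* RX buffer size needed to receive a frame of the given MTU */
size_t sketch_rx_buf_size(size_t mtu)
{
	return sketch_align4(mtu + SKETCH_RX_EXTRA_SIZE);
}

/* Largest MTU among the existing ports and the newly requested one,
 * since a single RX buffer pool serves every port.
 */
size_t sketch_max_mtu(const size_t *port_mtu, size_t nports, size_t new_mtu)
{
	size_t max = new_mtu;
	size_t i;

	assert(port_mtu || nports == 0);

	for (i = 0; i < nports; i++) {
		if (port_mtu[i] > max)
			max = port_mtu[i];
	}

	return max;
}
```

With these assumed constants, lowering the MTU of one port while another
still uses 9000 leaves the computed buffer size unchanged, which is the case
where the function in the patch returns early without touching the channels.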

Signed-off-by: Clément Léger <[email protected]>
---
drivers/net/ethernet/mscc/ocelot_fdma.c | 61 +++++++++++++++++++++++++
drivers/net/ethernet/mscc/ocelot_fdma.h | 1 +
drivers/net/ethernet/mscc/ocelot_net.c | 7 +++
3 files changed, 69 insertions(+)

diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
index d8cdf022bbee..bee1a310caa6 100644
--- a/drivers/net/ethernet/mscc/ocelot_fdma.c
+++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
@@ -530,6 +530,67 @@ static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
}
}

+int ocelot_fdma_change_mtu(struct net_device *dev, int new_mtu)
+{
+ struct ocelot_port_private *priv = netdev_priv(dev);
+ struct ocelot_port *port = &priv->port;
+ struct ocelot *ocelot = port->ocelot;
+ struct ocelot_fdma *fdma = ocelot->fdma;
+ struct ocelot_fdma_dcb *dcb, *dcb_temp;
+ struct list_head tmp = LIST_HEAD_INIT(tmp);
+ size_t old_rx_buf_size = fdma->rx_buf_size;
+ bool all_ports_down = true;
+ u8 port_num;
+
+ /* The FDMA RX list is shared amongst all the ports, get the max MTU from
+ * all of them
+ */
+ for (port_num = 0; port_num < ocelot->num_phys_ports; port_num++) {
+ port = ocelot->ports[port_num];
+ if (!port)
+ continue;
+
+ priv = container_of(port, struct ocelot_port_private, port);
+
+ if (READ_ONCE(priv->dev->mtu) > new_mtu)
+ new_mtu = READ_ONCE(priv->dev->mtu);
+
+ /* All ports must be down to change the RX buffer length */
+ if (netif_running(priv->dev))
+ all_ports_down = false;
+ }
+
+ fdma->rx_buf_size = fdma_rx_compute_buffer_size(new_mtu);
+ if (fdma->rx_buf_size == old_rx_buf_size)
+ return 0;
+
+ if (!all_ports_down)
+ return -EBUSY;
+
+ priv = netdev_priv(dev);
+
+ fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
+
+ /* Discard all pending RX software and hardware descriptors */
+ fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
+ fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
+
+ /* Move all DCBs to a temporary list that will be injected in sw list */
+ if (!list_empty(&fdma->rx_hw))
+ list_splice_tail_init(&fdma->rx_hw, &tmp);
+ if (!list_empty(&fdma->rx_sw))
+ list_splice_tail_init(&fdma->rx_sw, &tmp);
+
+ list_for_each_entry_safe(dcb, dcb_temp, &tmp, node) {
+ list_del(&dcb->node);
+ ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
+ }
+
+ ocelot_fdma_rx_refill(fdma);
+
+ return 0;
+}
+
static int fdma_init_tx(struct ocelot_fdma *fdma)
{
int i;
diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
index 6c5c5872abf5..74514a0b291a 100644
--- a/drivers/net/ethernet/mscc/ocelot_fdma.h
+++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
@@ -55,5 +55,6 @@ int ocelot_fdma_start(struct ocelot_fdma *fdma);
int ocelot_fdma_stop(struct ocelot_fdma *fdma);
int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
struct sk_buff *skb, struct net_device *dev);
+int ocelot_fdma_change_mtu(struct net_device *dev, int new_mtu);

#endif
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index 3971b810c5b4..d5e88d7b15c7 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -492,6 +492,13 @@ static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
struct ocelot_port_private *priv = netdev_priv(dev);
struct ocelot_port *ocelot_port = &priv->port;
struct ocelot *ocelot = ocelot_port->ocelot;
+ int ret;
+
+ if (ocelot->fdma) {
+ ret = ocelot_fdma_change_mtu(dev, new_mtu);
+ if (ret)
+ return ret;
+ }

ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
WRITE_ONCE(dev->mtu, new_mtu);
--
2.33.0

2021-11-03 09:23:00

by Clément Léger

Subject: [PATCH v2 4/6] net: ocelot: add support for ndo_change_mtu

This commit adds support for changing the MTU on the ocelot register-based
interface. For ocelot, the jumbo frame size can be set up to 25000 bytes,
but it has been capped at 9000, which is a saner value and already gives
the maximum performance gain. Frames larger than 9000 bytes do not yield
a noticeable improvement.

Signed-off-by: Clément Léger <[email protected]>
---
drivers/net/ethernet/mscc/ocelot.h | 2 ++
drivers/net/ethernet/mscc/ocelot_net.c | 14 ++++++++++++++
2 files changed, 16 insertions(+)

diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
index e43da09b8f91..ba0dec7dd64f 100644
--- a/drivers/net/ethernet/mscc/ocelot.h
+++ b/drivers/net/ethernet/mscc/ocelot.h
@@ -32,6 +32,8 @@

#define OCELOT_PTP_QUEUE_SZ 128

+#define OCELOT_JUMBO_MTU 9000
+
struct ocelot_port_tc {
bool block_shared;
unsigned long offload_cnt;
diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
index d76def435b23..5916492fd6d0 100644
--- a/drivers/net/ethernet/mscc/ocelot_net.c
+++ b/drivers/net/ethernet/mscc/ocelot_net.c
@@ -482,6 +482,18 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
return NETDEV_TX_OK;
}

+static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
+{
+ struct ocelot_port_private *priv = netdev_priv(dev);
+ struct ocelot_port *ocelot_port = &priv->port;
+ struct ocelot *ocelot = ocelot_port->ocelot;
+
+ ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
+ WRITE_ONCE(dev->mtu, new_mtu);
+
+ return 0;
+}
+
enum ocelot_action_type {
OCELOT_MACT_LEARN,
OCELOT_MACT_FORGET,
@@ -768,6 +780,7 @@ static const struct net_device_ops ocelot_port_netdev_ops = {
.ndo_open = ocelot_port_open,
.ndo_stop = ocelot_port_stop,
.ndo_start_xmit = ocelot_port_xmit,
+ .ndo_change_mtu = ocelot_change_mtu,
.ndo_set_rx_mode = ocelot_set_rx_mode,
.ndo_set_mac_address = ocelot_port_set_mac_address,
.ndo_get_stats64 = ocelot_get_stats64,
@@ -1699,6 +1712,7 @@ int ocelot_probe_port(struct ocelot *ocelot, int port, struct regmap *target,

dev->netdev_ops = &ocelot_port_netdev_ops;
dev->ethtool_ops = &ocelot_ethtool_ops;
+ dev->max_mtu = OCELOT_JUMBO_MTU;

dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXFCS |
NETIF_F_HW_TC;
--
2.33.0

2021-11-03 10:47:51

by Vladimir Oltean

Subject: Re: [PATCH v2 2/6] dt-bindings: net: convert mscc,vsc7514-switch bindings to yaml

On Wed, Nov 03, 2021 at 10:19:39AM +0100, Clément Léger wrote:
> Convert existing bindings to yaml format. At the same time, remove
> non-existent properties ("inj" interrupt) and add fdma.
>
> Signed-off-by: Clément Léger <[email protected]>
> ---
> .../bindings/net/mscc,vsc7514-switch.yaml | 184 ++++++++++++++++++
> .../devicetree/bindings/net/mscc-ocelot.txt | 83 --------
> 2 files changed, 184 insertions(+), 83 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> delete mode 100644 Documentation/devicetree/bindings/net/mscc-ocelot.txt
>
> diff --git a/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> new file mode 100644
> index 000000000000..0c96eabf9d2d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> @@ -0,0 +1,184 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/mscc,vsc7514-switch.yaml
> +$schema: http://devicetree.org/meta-schemas/core.yaml
> +
> +title: Microchip VSC7514 Ethernet switch controller
> +
> +maintainers:
> + - Vladimir Oltean <[email protected]>
> + - Claudiu Manoil <[email protected]>
> + - Alexandre Belloni <[email protected]>
> +
> +description: |
> + The VSC7514 Industrial IoT Ethernet switch contains four integrated dual media
> + 10/100/1000BASE-T PHYs, two 1G SGMII/SerDes, two 1G/2.5G SGMII/SerDes, and an
> + option for either a 1G/2.5G SGMII/SerDes Node Processor Interface (NPI) or a
> + PCIe interface for external CPU connectivity. The NPI/PCIe can operate as a
> + standard Ethernet port.

Technically any port can serve as NPI, not just the SERDES ones. People
are even using internal PHY ports as NPI.
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#24381029

Honestly I would not bother talking about NPI, it is confusing to see it here.
Anything having to do with the NPI port is the realm of DSA.

Just say how the present driver expects to control the device, don't
just copy stuff from marketing slides. In this case PCIe is irrelevant
too, this driver is for a platform device, and it only runs on the
embedded processor as far as I can tell.

> +
> + The device provides a rich set of Industrial Ethernet switching features such
> + as fast protection switching, 1588 precision time protocol, and synchronous
> + Ethernet. Advanced TCAM-based VLAN and QoS processing enable delivery of
> + differentiated services. Security is assured through frame processing using
> + Microsemi’s TCAM-based Versatile Content Aware Processor.

Above you say Microchip, and here you say Microsemi.

> +
> + In addition, the device contains a powerful 500 MHz CPU enabling full
> + management of the switch.

~powerful~

> +
> +properties:
> + $nodename:
> + pattern: "^switch@[0-9a-f]+$"
> +
> + compatible:
> + const: mscc,vsc7514-switch
> +
> + reg:
> + items:
> + - description: system target
> + - description: rewriter target
> + - description: qs target
> + - description: PTP target
> + - description: Port0 target
> + - description: Port1 target
> + - description: Port2 target
> + - description: Port3 target
> + - description: Port4 target
> + - description: Port5 target
> + - description: Port6 target
> + - description: Port7 target
> + - description: Port8 target
> + - description: Port9 target
> + - description: Port10 target
> + - description: QSystem target
> + - description: Analyzer target
> + - description: S0 target
> + - description: S1 target
> + - description: S2 target
> + - description: fdma target
> +
> + reg-names:
> + items:
> + - const: sys
> + - const: rew
> + - const: qs
> + - const: ptp
> + - const: port0
> + - const: port1
> + - const: port2
> + - const: port3
> + - const: port4
> + - const: port5
> + - const: port6
> + - const: port7
> + - const: port8
> + - const: port9
> + - const: port10
> + - const: qsys
> + - const: ana
> + - const: s0
> + - const: s1
> + - const: s2
> + - const: fdma
> +
> + interrupts:
> + minItems: 1
> + items:
> + - description: PTP ready
> + - description: register based extraction
> + - description: frame dma based extraction
> +
> + interrupt-names:
> + minItems: 1
> + items:
> + - const: ptp_rdy
> + - const: xtr
> + - const: fdma
> +
> + ethernet-ports:
> + type: object
> + patternProperties:
> + "^port@[0-9a-f]+$":
> + type: object
> + description: Ethernet ports handled by the switch
> +
> + allOf:
> + - $ref: ethernet-controller.yaml#

I'm pretty sure Rob will comment that this can be simplified to:

$ref: ethernet-controller.yaml#

without the allOf: and "-".

> +
> + properties:
> + '#address-cells':
> + const: 1
> + '#size-cells':
> + const: 0
> +
> + reg:
> + description: Switch port number
> +
> + phy-handle: true
> +
> + mac-address: true
> +
> + required:
> + - reg
> + - phy-handle

Shouldn't there be additionalProperties: false for the port node as well?

And actually, phy-handle is not strictly required, if you have a
fixed-link. I think you should use oneOf.

And you know what else is required? phy-mode. See commits e6e12df625f2
("net: mscc: ocelot: convert to phylink") and eba54cbb92d2 ("MIPS: mscc:
ocelot: mark the phy-mode for internal PHY ports").

> +
> +required:
> + - compatible
> + - reg
> + - reg-names
> + - interrupts
> + - interrupt-names
> + - ethernet-ports
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + switch@1010000 {
> + compatible = "mscc,vsc7514-switch";
> + reg = <0x1010000 0x10000>,
> + <0x1030000 0x10000>,
> + <0x1080000 0x100>,
> + <0x10e0000 0x10000>,
> + <0x11e0000 0x100>,
> + <0x11f0000 0x100>,
> + <0x1200000 0x100>,
> + <0x1210000 0x100>,
> + <0x1220000 0x100>,
> + <0x1230000 0x100>,
> + <0x1240000 0x100>,
> + <0x1250000 0x100>,
> + <0x1260000 0x100>,
> + <0x1270000 0x100>,
> + <0x1280000 0x100>,
> + <0x1800000 0x80000>,
> + <0x1880000 0x10000>,
> + <0x1040000 0x10000>,
> + <0x1050000 0x10000>,
> + <0x1060000 0x10000>,
> + <0x1a0 0x1c4>;
> + reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
> + "port2", "port3", "port4", "port5", "port6",
> + "port7", "port8", "port9", "port10", "qsys",
> + "ana", "s0", "s1", "s2", "fdma";
> + interrupts = <18 21 16>;
> + interrupt-names = "ptp_rdy", "xtr", "fdma";
> +
> + ethernet-ports {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + port0: port@0 {
> + reg = <0>;
> + phy-handle = <&phy0>;
> + };
> + port1: port@1 {
> + reg = <1>;
> + phy-handle = <&phy1>;
> + };
> + };
> + };
> +
> +...
> +# vim: set ts=2 sw=2 sts=2 tw=80 et cc=80 ft=yaml :
> diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt
> deleted file mode 100644
> index 3b6290b45ce5..000000000000
> --- a/Documentation/devicetree/bindings/net/mscc-ocelot.txt
> +++ /dev/null
> @@ -1,83 +0,0 @@
> -Microsemi Ocelot network Switch
> -===============================
> -
> -The Microsemi Ocelot network switch can be found on Microsemi SoCs (VSC7513,
> -VSC7514)
> -
> -Required properties:
> -- compatible: Should be "mscc,vsc7514-switch"
> -- reg: Must contain an (offset, length) pair of the register set for each
> - entry in reg-names.
> -- reg-names: Must include the following entries:
> - - "sys"
> - - "rew"
> - - "qs"
> - - "ptp" (optional due to backward compatibility)
> - - "qsys"
> - - "ana"
> - - "portX" with X from 0 to the number of last port index available on that
> - switch
> -- interrupts: Should contain the switch interrupts for frame extraction,
> - frame injection and PTP ready.
> -- interrupt-names: should contain the interrupt names: "xtr", "inj". Can contain
> - "ptp_rdy" which is optional due to backward compatibility.
> -- ethernet-ports: A container for child nodes representing switch ports.
> -
> -The ethernet-ports container has the following properties
> -
> -Required properties:
> -
> -- #address-cells: Must be 1
> -- #size-cells: Must be 0
> -
> -Each port node must have the following mandatory properties:
> -- reg: Describes the port address in the switch
> -
> -Port nodes may also contain the following optional standardised
> -properties, described in binding documents:
> -
> -- phy-handle: Phandle to a PHY on an MDIO bus. See
> - Documentation/devicetree/bindings/net/ethernet.txt for details.
> -
> -Example:
> -
> - switch@1010000 {
> - compatible = "mscc,vsc7514-switch";
> - reg = <0x1010000 0x10000>,
> - <0x1030000 0x10000>,
> - <0x1080000 0x100>,
> - <0x10e0000 0x10000>,
> - <0x11e0000 0x100>,
> - <0x11f0000 0x100>,
> - <0x1200000 0x100>,
> - <0x1210000 0x100>,
> - <0x1220000 0x100>,
> - <0x1230000 0x100>,
> - <0x1240000 0x100>,
> - <0x1250000 0x100>,
> - <0x1260000 0x100>,
> - <0x1270000 0x100>,
> - <0x1280000 0x100>,
> - <0x1800000 0x80000>,
> - <0x1880000 0x10000>;
> - reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
> - "port2", "port3", "port4", "port5", "port6",
> - "port7", "port8", "port9", "port10", "qsys",
> - "ana";
> - interrupts = <18 21 22>;
> - interrupt-names = "ptp_rdy", "xtr", "inj";
> -
> - ethernet-ports {
> - #address-cells = <1>;
> - #size-cells = <0>;
> -
> - port0: port@0 {
> - reg = <0>;
> - phy-handle = <&phy0>;
> - };
> - port1: port@1 {
> - reg = <1>;
> - phy-handle = <&phy1>;
> - };
> - };
> - };
> --
> 2.33.0
>

2021-11-03 10:48:52

by Denis Kirjanov

Subject: Re: [PATCH v2 0/6] Add FDMA support on ocelot switch driver



11/3/21 12:19 PM, Clément Léger wrote:
> This series adds support for the Frame DMA present on the VSC7514
> switch. The FDMA is able to extract and inject packets on the various
> ethernet interfaces present on the switch.
>
> While adding FDMA support, bindings were switched from .txt to .yaml
> and MAC address reading from device-tree was added for testing
> purposes. Jumbo frame support was also added since it gives a large
> performance improvement with FDMA.

The series should be prefixed with net-next

>
> ------------------
> Changes in V2:
> - Read MAC for each port and not as switch base MAC address
> - Add missing static for some functions in ocelot_fdma.c
> - Split change_mtu from fdma commit
> - Add jumbo support for register based xmit
> - Move precomputed header into ocelot_port struct
> - Remove use of QUIRK_ENDIAN_LITTLE due to misconfiguration for tests
> - Remove fragmented packet sending which has not been tested
>
> Clément Léger (6):
> net: ocelot: add support to get port mac from device-tree
> dt-bindings: net: convert mscc,vsc7514-switch bindings to yaml
> net: ocelot: pre-compute injection frame header content
> net: ocelot: add support for ndo_change_mtu
> net: ocelot: add FDMA support
> net: ocelot: add jumbo frame support for FDMA
>
> .../bindings/net/mscc,vsc7514-switch.yaml | 184 +++++
> .../devicetree/bindings/net/mscc-ocelot.txt | 83 --
> drivers/net/ethernet/mscc/Makefile | 1 +
> drivers/net/ethernet/mscc/ocelot.c | 23 +-
> drivers/net/ethernet/mscc/ocelot.h | 3 +
> drivers/net/ethernet/mscc/ocelot_fdma.c | 754 ++++++++++++++++++
> drivers/net/ethernet/mscc/ocelot_fdma.h | 60 ++
> drivers/net/ethernet/mscc/ocelot_net.c | 37 +-
> drivers/net/ethernet/mscc/ocelot_vsc7514.c | 15 +
> include/soc/mscc/ocelot.h | 7 +
> 10 files changed, 1075 insertions(+), 92 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> delete mode 100644 Documentation/devicetree/bindings/net/mscc-ocelot.txt
> create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
> create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h
>

2021-11-03 11:28:11

by Denis Kirjanov

Subject: Re: [PATCH v2 5/6] net: ocelot: add FDMA support



11/3/21 12:19 PM, Clément Léger wrote:
> Ethernet frames can be extracted or injected autonomously to or from the
> device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
> structures in memory are used for injecting or extracting Ethernet frames.
> The FDMA generates interrupts when frame extraction or injection is done
> and when the linked lists need updating.
>
> The FDMA is shared between all the ethernet ports of the switch and uses
> a linked list of descriptors (DCB) to inject and extract packets.
> Before adding descriptors, the FDMA channels must be stopped. It would
> be inefficient to do that each time a descriptor would be added,
>
> TX path uses multiple lists to handle descriptors. tx_ongoing is the list
> of DCB currently owned by the hardware, tx_queued is a list of DCB that
> will be given to the hardware when tx_ongoing is done and finally
> tx_free_dcb is the list of DCB available for TX.
>
> RX path uses two lists: rx_hw is the list of DCB currently given to the
> hardware and rx_sw is the list of descriptors that have been completed
> by the FDMA and will be reinjected when the DMA hits the end of the
> linked list.
>
> Co-developed-by: Alexandre Belloni <[email protected]>
> Signed-off-by: Alexandre Belloni <[email protected]>
> Signed-off-by: Clément Léger <[email protected]>
> ---
> drivers/net/ethernet/mscc/Makefile | 1 +
> drivers/net/ethernet/mscc/ocelot.h | 1 +
> drivers/net/ethernet/mscc/ocelot_fdma.c | 693 +++++++++++++++++++++
> drivers/net/ethernet/mscc/ocelot_fdma.h | 59 ++
> drivers/net/ethernet/mscc/ocelot_net.c | 11 +-
> drivers/net/ethernet/mscc/ocelot_vsc7514.c | 15 +
> include/soc/mscc/ocelot.h | 2 +
> 7 files changed, 779 insertions(+), 3 deletions(-)
> create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
> create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h
>
> diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile
> index 722c27694b21..d76a9b78b6ca 100644
> --- a/drivers/net/ethernet/mscc/Makefile
> +++ b/drivers/net/ethernet/mscc/Makefile
> @@ -11,5 +11,6 @@ mscc_ocelot_switch_lib-y := \
> mscc_ocelot_switch_lib-$(CONFIG_BRIDGE_MRP) += ocelot_mrp.o
> obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot.o
> mscc_ocelot-y := \
> + ocelot_fdma.o \
> ocelot_vsc7514.o \
> ocelot_net.o
> diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
> index ba0dec7dd64f..ad85ad1079ad 100644
> --- a/drivers/net/ethernet/mscc/ocelot.h
> +++ b/drivers/net/ethernet/mscc/ocelot.h
> @@ -9,6 +9,7 @@
> #define _MSCC_OCELOT_H_
>
> #include <linux/bitops.h>
> +#include <linux/dsa/ocelot.h>
> #include <linux/etherdevice.h>
> #include <linux/if_vlan.h>
> #include <linux/net_tstamp.h>
> diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
> new file mode 100644
> index 000000000000..d8cdf022bbee
> --- /dev/null
> +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
> @@ -0,0 +1,693 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +/*
> + * Microsemi SoCs FDMA driver
> + *
> + * Copyright (c) 2021 Microchip
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/dmapool.h>
> +#include <linux/dsa/ocelot.h>
> +#include <linux/netdevice.h>
> +#include <linux/of_platform.h>
> +#include <linux/skbuff.h>
> +
> +#include "ocelot_fdma.h"
> +#include "ocelot_qs.h"
> +
> +#define MSCC_FDMA_DCB_LLP(x) ((x) * 4 + 0x0)
> +
> +#define MSCC_FDMA_DCB_STAT_BLOCKO(x) (((x) << 20) & GENMASK(31, 20))
> +#define MSCC_FDMA_DCB_STAT_BLOCKO_M GENMASK(31, 20)
> +#define MSCC_FDMA_DCB_STAT_BLOCKO_X(x) (((x) & GENMASK(31, 20)) >> 20)
> +#define MSCC_FDMA_DCB_STAT_PD BIT(19)
> +#define MSCC_FDMA_DCB_STAT_ABORT BIT(18)
> +#define MSCC_FDMA_DCB_STAT_EOF BIT(17)
> +#define MSCC_FDMA_DCB_STAT_SOF BIT(16)
> +#define MSCC_FDMA_DCB_STAT_BLOCKL_M GENMASK(15, 0)
> +#define MSCC_FDMA_DCB_STAT_BLOCKL(x) ((x) & GENMASK(15, 0))
> +
> +#define MSCC_FDMA_CH_SAFE 0xcc
> +
> +#define MSCC_FDMA_CH_ACTIVATE 0xd0
> +
> +#define MSCC_FDMA_CH_DISABLE 0xd4
> +
> +#define MSCC_FDMA_EVT_ERR 0x164
> +
> +#define MSCC_FDMA_EVT_ERR_CODE 0x168
> +
> +#define MSCC_FDMA_INTR_LLP 0x16c
> +
> +#define MSCC_FDMA_INTR_LLP_ENA 0x170
> +
> +#define MSCC_FDMA_INTR_FRM 0x174
> +
> +#define MSCC_FDMA_INTR_FRM_ENA 0x178
> +
> +#define MSCC_FDMA_INTR_ENA 0x184
> +
> +#define MSCC_FDMA_INTR_IDENT 0x188
> +
> +#define MSCC_FDMA_INJ_CHAN 2
> +#define MSCC_FDMA_XTR_CHAN 0
> +
> +#define FDMA_MAX_SKB 256
> +#define FDMA_WEIGHT 32
> +
> +#define OCELOT_TAG_WORD_LEN (OCELOT_TAG_LEN / 4)
> +
> +/* Add 4 for possible misalignment when mapping the data */
> +#define FDMA_RX_EXTRA_SIZE \
> + (OCELOT_TAG_LEN + ETH_FCS_LEN + ETH_HLEN + 4)
> +
> +struct ocelot_fdma_dcb_hw_v2 {
> + u32 llp;
> + u32 datap;
> + u32 datal;
> + u32 stat;
> +};
> +
> +struct ocelot_fdma_dcb {
> + struct ocelot_fdma_dcb_hw_v2 hw;
> + struct list_head node;
> + struct sk_buff *skb;
> + dma_addr_t mapping;
> + size_t mapped_size;
> + dma_addr_t phys;
> +};
> +
> +static int fdma_rx_compute_buffer_size(int mtu)
> +{
> + return ALIGN(mtu + FDMA_RX_EXTRA_SIZE, 4);
> +}
> +
> +static void fdma_writel(struct ocelot_fdma *fdma, u32 reg, u32 data)
> +{
> + writel(data, fdma->base + reg);
> +}
> +
> +static u32 fdma_readl(struct ocelot_fdma *fdma, u32 reg)
> +{
> + return readl(fdma->base + reg);
> +}
> +
> +static void fdma_activate_chan(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb, int chan)
> +{
> + fdma_writel(fdma, MSCC_FDMA_DCB_LLP(chan), dcb->phys);
> + fdma_writel(fdma, MSCC_FDMA_CH_ACTIVATE, BIT(chan));
> +}
> +
> +static void fdma_stop_channel(struct ocelot_fdma *fdma, int chan)
> +{
> + u32 safe;
> +
> + fdma_writel(fdma, MSCC_FDMA_CH_DISABLE, BIT(chan));
> + do {
> + safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
> + } while (!(safe & BIT(chan)));
timeout?
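
To make the concern concrete: if the channel never reports safe, the loop
above spins forever. In-kernel code would normally reach for a helper such as
readl_poll_timeout() from <linux/iopoll.h>; the user-space sketch below only
illustrates the bounded-poll idea. The sketch_* names and the fake register
are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the CH_SAFE register: it reads back 0 for the
 * first reads_until_ready reads, then returns value.
 */
struct sketch_reg {
	uint32_t value;
	unsigned int reads_until_ready;
};

uint32_t sketch_read(struct sketch_reg *reg)
{
	if (reg->reads_until_ready > 0) {
		reg->reads_until_ready--;
		return 0;
	}

	return reg->value;
}

/* Poll until bit chan is set, giving up after max_polls reads.
 * Returns 0 on success and -1 on timeout, so the caller can report
 * the failure instead of hanging.
 */
int sketch_wait_chan_safe(struct sketch_reg *reg, unsigned int chan,
			  unsigned int max_polls)
{
	unsigned int i;

	assert(chan < 32);

	for (i = 0; i < max_polls; i++) {
		if (sketch_read(reg) & (UINT32_C(1) << chan))
			return 0;
	}

	return -1;
}
```

A real driver would bound the wait in time rather than in iterations and
would propagate an error such as -ETIMEDOUT to the caller.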

> +}
> +
> +static bool ocelot_fdma_dcb_set_data(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb, void *data,
> + size_t size, enum dma_data_direction dir)
> +{
> + u32 offset;
> +
> + dcb->mapped_size = size;
> + dcb->mapping = dma_map_single(fdma->dev, data, size, dir);
> + if (unlikely(dma_mapping_error(fdma->dev, dcb->mapping)))
> + return false;
> +
> + offset = dcb->mapping & 0x3;
> +
> + dcb->hw.llp = 0;
> + dcb->hw.datap = dcb->mapping & ~0x3;
> + /* DATAL must be a multiple of word size */
> + dcb->hw.datal = ALIGN_DOWN(size - offset, 4);
> + dcb->hw.stat = MSCC_FDMA_DCB_STAT_BLOCKO(offset);
> +
> + return true;
> +}
> +
> +static bool ocelot_fdma_dcb_set_rx_skb(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb,
> + struct sk_buff *skb, size_t size)
> +{
> + dcb->skb = skb;
> + return ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, size,
> + DMA_FROM_DEVICE);
> +}
> +
> +static bool ocelot_fdma_dcb_set_tx_skb(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb,
> + struct sk_buff *skb)
> +{
> + if (!ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, skb->len,
> + DMA_TO_DEVICE))
> + return false;
> +
> + dcb->skb = skb;
> + dcb->hw.stat |= MSCC_FDMA_DCB_STAT_BLOCKL(skb->len);
> + dcb->hw.stat |= MSCC_FDMA_DCB_STAT_SOF | MSCC_FDMA_DCB_STAT_EOF;
> +
> + return true;
> +}
> +
> +static struct ocelot_fdma_dcb *fdma_dcb_alloc(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb;
> + dma_addr_t phys;
> +
> + dcb = dma_pool_zalloc(fdma->dcb_pool, GFP_KERNEL, &phys);
> + if (unlikely(!dcb))
> + return NULL;
> +
> + dcb->phys = phys;
> +
> + return dcb;
> +}
> +
> +static struct net_device *fdma_get_port_netdev(struct ocelot_fdma *fdma,
> + int port_num)
> +{
> + struct ocelot_port_private *port_priv;
> + struct ocelot *ocelot = fdma->ocelot;
> + struct ocelot_port *port;
> +
> + if (port_num >= ocelot->num_phys_ports)
> + return NULL;
> +
> + port = ocelot->ports[port_num];
> +
> + if (!port)
> + return NULL;
> +
> + port_priv = container_of(port, struct ocelot_port_private, port);
> +
> + return port_priv->dev;
> +}
> +
> +static bool ocelot_fdma_rx_process_skb(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb,
> + int budget)
> +{
> + struct sk_buff *skb = dcb->skb;
> + struct net_device *ndev;
> + u64 src_port;
> + void *xfh;
> +
> + dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
> + DMA_FROM_DEVICE);
> +
> + xfh = skb->data;
> + ocelot_xfh_get_src_port(xfh, &src_port);
> +
> + skb_put(skb, MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat));
> + skb_pull(skb, OCELOT_TAG_LEN);
> +
> + ndev = fdma_get_port_netdev(fdma, src_port);
> + if (unlikely(!ndev)) {
> + napi_consume_skb(dcb->skb, budget);
> + return false;
> + }
> +
> + skb->dev = ndev;
> + skb->protocol = eth_type_trans(skb, skb->dev);
> + skb->dev->stats.rx_bytes += skb->len;
> + skb->dev->stats.rx_packets++;
> +
> + netif_receive_skb(skb);
> +
> + return true;
> +}
> +
> +static void ocelot_fdma_rx_refill(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb, *last_dcb;
> +
> + WARN_ON(list_empty(&fdma->rx_sw));
> +
> + dcb = list_first_entry(&fdma->rx_sw, struct ocelot_fdma_dcb, node);
> + /* Splice old hardware DCB list + new one */
> + if (!list_empty(&fdma->rx_hw)) {
> + last_dcb = list_last_entry(&fdma->rx_hw, struct ocelot_fdma_dcb,
> + node);
> + last_dcb->hw.llp = dcb->phys;
> + }
> +
> + /* Move software list to hardware list */
> + list_splice_tail_init(&fdma->rx_sw, &fdma->rx_hw);
> +
> + /* Finally reactivate the channel */
> + fdma_activate_chan(fdma, dcb, MSCC_FDMA_XTR_CHAN);
> +}
> +
> +static void ocelot_fdma_list_add_dcb(struct list_head *list,
> + struct ocelot_fdma_dcb *dcb)
> +{
> + struct ocelot_fdma_dcb *last_dcb;
> +
> + if (!list_empty(list)) {
> + last_dcb = list_last_entry(list, struct ocelot_fdma_dcb, node);
> + last_dcb->hw.llp = dcb->phys;
> + }
> +
> + list_add_tail(&dcb->node, list);
> +}
> +
> +static bool ocelot_fdma_rx_add_dcb_sw(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb)
> +{
> + struct sk_buff *new_skb;
> +
> + /* Add DCB to end of list with new SKB */
> + new_skb = napi_alloc_skb(&fdma->napi, fdma->rx_buf_size);
> + if (unlikely(!new_skb)) {
> + pr_err("skb_alloc failed\n");
> + return false;
> + }
> +
> + ocelot_fdma_dcb_set_rx_skb(fdma, dcb, new_skb, fdma->rx_buf_size);
> + ocelot_fdma_list_add_dcb(&fdma->rx_sw, dcb);
> +
> + return true;
> +}
> +
> +static bool ocelot_fdma_rx_get(struct ocelot_fdma *fdma, int budget)
> +{
> + struct ocelot_fdma_dcb *dcb;
> + bool valid = true;
> + u32 stat;
> +
> + dcb = list_first_entry_or_null(&fdma->rx_hw, struct ocelot_fdma_dcb,
> + node);
> + if (!dcb || MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat) == 0)
> + return false;
> +
> + list_del(&dcb->node);
> +
> + stat = dcb->hw.stat;
> + if (stat & MSCC_FDMA_DCB_STAT_ABORT || stat & MSCC_FDMA_DCB_STAT_PD)
> + valid = false;
> +
> + if (!(stat & MSCC_FDMA_DCB_STAT_SOF) ||
> + !(stat & MSCC_FDMA_DCB_STAT_EOF))
> + valid = false;
> +
> + if (likely(valid)) {
> + if (!ocelot_fdma_rx_process_skb(fdma, dcb, budget))
> + pr_err("Process skb failed, stat %x\n", stat);
> + } else {
> + napi_consume_skb(dcb->skb, budget);
> + }
> +
> + return ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> +}
> +
> +static void ocelot_fdma_rx_check_stopped(struct ocelot_fdma *fdma)
> +{
> + u32 llp = fdma_readl(fdma, MSCC_FDMA_DCB_LLP(MSCC_FDMA_XTR_CHAN));
> + /* LLP is non NULL, FDMA is still fetching packets */
> + if (llp)
> + return;
> +
> + fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
> + ocelot_fdma_rx_refill(fdma);
> +}
> +
> +static void ocelot_fdma_tx_free_dcb(struct ocelot_fdma *fdma,
> + struct list_head *list)
> +{
> + struct ocelot_fdma_dcb *dcb;
> +
> + if (list_empty(list))
> + return;
> +
> + /* Free all SKBs that have been used for TX */
> + list_for_each_entry(dcb, list, node) {
> + dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
> + DMA_TO_DEVICE);
> + dev_consume_skb_any(dcb->skb);
> + dcb->skb = NULL;
> + }
> +
> + /* All DCBs can now be given to free list */
> + spin_lock(&fdma->tx_free_lock);
> + list_splice_tail_init(list, &fdma->tx_free_dcb);
> + spin_unlock(&fdma->tx_free_lock);
> +}
> +
> +static void ocelot_fdma_tx_cleanup(struct ocelot_fdma *fdma)
> +{
> + struct list_head tx_done = LIST_HEAD_INIT(tx_done);
> + struct ocelot_fdma_dcb *dcb, *temp;
> +
> + spin_lock(&fdma->tx_enqueue_lock);
> + if (list_empty(&fdma->tx_ongoing))
> + goto out_unlock;
> +
> + list_for_each_entry_safe(dcb, temp, &fdma->tx_ongoing, node) {
> + if (!(dcb->hw.stat & MSCC_FDMA_DCB_STAT_PD))
> + break;
> +
> + list_move_tail(&dcb->node, &tx_done);
> + }
> +
> +out_unlock:
> + spin_unlock(&fdma->tx_enqueue_lock);
> +
> + ocelot_fdma_tx_free_dcb(fdma, &tx_done);
> +}
> +
> +static void ocelot_fdma_tx_restart(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb;
> + u32 safe;
> +
> + spin_lock(&fdma->tx_enqueue_lock);
> +
> + if (!list_empty(&fdma->tx_ongoing) || list_empty(&fdma->tx_queued))
> + goto out_unlock;
> +
> + /* Ongoing list is empty, channel should be in safe mode */
> + do {
> + safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
> + } while (!(safe & BIT(MSCC_FDMA_INJ_CHAN)));
Same here: timeout? This loop also spins with tx_enqueue_lock held.
> +
> + /* Move queued DCB to ongoing and restart the DMA */
> + list_splice_tail_init(&fdma->tx_queued, &fdma->tx_ongoing);
> + /* List can't be empty, no need to check */
> + dcb = list_first_entry(&fdma->tx_ongoing, struct ocelot_fdma_dcb, node);
> +
> + fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
> +
> +out_unlock:
> + spin_unlock(&fdma->tx_enqueue_lock);
> +}
> +
> +static int ocelot_fdma_napi_poll(struct napi_struct *napi, int budget)
> +{
> + struct ocelot_fdma *fdma = container_of(napi, struct ocelot_fdma, napi);
> + int work_done = 0;
> +
> + ocelot_fdma_tx_cleanup(fdma);
> + ocelot_fdma_tx_restart(fdma);
> +
> + while (work_done < budget) {
> + if (!ocelot_fdma_rx_get(fdma, budget))
> + break;
> +
> + work_done++;
> + }
> +
> + ocelot_fdma_rx_check_stopped(fdma);
> +
> + if (work_done < budget) {
> + napi_complete(&fdma->napi);
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
> + BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
> + }
> +
> + return work_done;
> +}
> +
> +static irqreturn_t ocelot_fdma_interrupt(int irq, void *dev_id)
> +{
> + u32 ident, llp, frm, err, err_code;
> + struct ocelot_fdma *fdma = dev_id;
> +
> + ident = fdma_readl(fdma, MSCC_FDMA_INTR_IDENT);
> + frm = fdma_readl(fdma, MSCC_FDMA_INTR_FRM);
> + llp = fdma_readl(fdma, MSCC_FDMA_INTR_LLP);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_LLP, llp & ident);
> + fdma_writel(fdma, MSCC_FDMA_INTR_FRM, frm & ident);
> + if (frm | llp) {
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> + napi_schedule(&fdma->napi);
> + }
> +
> + err = fdma_readl(fdma, MSCC_FDMA_EVT_ERR);
> + if (unlikely(err)) {
> + err_code = fdma_readl(fdma, MSCC_FDMA_EVT_ERR_CODE);
> + dev_err_ratelimited(fdma->dev,
> + "Error ! chans mask: %#x, code: %#x\n",
> + err, err_code);
> +
> + fdma_writel(fdma, MSCC_FDMA_EVT_ERR, err);
> + fdma_writel(fdma, MSCC_FDMA_EVT_ERR_CODE, err_code);
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +static struct ocelot_fdma_dcb *fdma_tx_get_dcb(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb = NULL;
> +
> + spin_lock_bh(&fdma->tx_free_lock);
> + dcb = list_first_entry_or_null(&fdma->tx_free_dcb,
> + struct ocelot_fdma_dcb, node);
> + if (dcb)
> + list_del(&dcb->node);
> +
> + spin_unlock_bh(&fdma->tx_free_lock);
> +
> + return dcb;
> +}
> +
> +int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> + struct sk_buff *skb, struct net_device *dev)
> +{
> + struct ocelot_port *port_s = fdma->ocelot->ports[port];
> + struct ocelot_fdma_dcb *dcb;
> + struct sk_buff *new_skb;
> + void *ifh;
> +
> + if (unlikely(skb_shinfo(skb)->nr_frags != 0)) {
> + netdev_err(dev, "Unsupported fragmented packet");
> + dev_kfree_skb_any(skb);
> + return NETDEV_TX_OK;
> + }
> +
> + if (skb_headroom(skb) < OCELOT_TAG_LEN ||
> + skb_tailroom(skb) < ETH_FCS_LEN) {
> + new_skb = skb_copy_expand(skb, OCELOT_TAG_LEN, ETH_FCS_LEN,
> + GFP_ATOMIC);
> + dev_consume_skb_any(skb);
> + if (!new_skb)
> + return NETDEV_TX_OK;
> +
> + skb = new_skb;
> + }
> +
> + ifh = skb_push(skb, OCELOT_TAG_LEN);
> + skb_put(skb, ETH_FCS_LEN);
> + ocelot_ifh_port_set(ifh, port_s, rew_op, skb_vlan_tag_get(skb));
> +
> + dcb = fdma_tx_get_dcb(fdma);
> + if (unlikely(!dcb))
> + return NETDEV_TX_BUSY;
> +
> + if (!ocelot_fdma_dcb_set_tx_skb(fdma, dcb, skb)) {
> + dev_kfree_skb_any(skb);
> + spin_lock_bh(&fdma->tx_free_lock);
> + list_add_tail(&dcb->node, &fdma->tx_free_dcb);
> + spin_unlock_bh(&fdma->tx_free_lock);
> + return NETDEV_TX_OK;
> + }
> +
> + spin_lock_bh(&fdma->tx_enqueue_lock);
> +
> + if (list_empty(&fdma->tx_ongoing)) {
> + ocelot_fdma_list_add_dcb(&fdma->tx_ongoing, dcb);
> + fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
> + } else {
> + ocelot_fdma_list_add_dcb(&fdma->tx_queued, dcb);
> + }
> +
> + spin_unlock_bh(&fdma->tx_enqueue_lock);
> + return NETDEV_TX_OK;
> +}
> +
> +static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
> + struct list_head *list,
> + enum dma_data_direction dir)
> +{
> + struct ocelot_fdma_dcb *dcb;
> +
> + if (list_empty(list))
> + return;
> +
> + list_for_each_entry(dcb, list, node) {
> + if (dcb->skb) {
> + dma_unmap_single(fdma->dev, dcb->mapping,
> + dcb->mapped_size, dir);
> + dev_kfree_skb_any(dcb->skb);
> + }
> + }
> +}
> +
> +static int fdma_init_tx(struct ocelot_fdma *fdma)
> +{
> + int i;
> + struct ocelot_fdma_dcb *dcb;
> +
> + for (i = 0; i < FDMA_MAX_SKB; i++) {
> + dcb = fdma_dcb_alloc(fdma);
> + if (!dcb)
> + return -ENOMEM;
> +
> + list_add_tail(&dcb->node, &fdma->tx_free_dcb);
> + }
> +
> + return 0;
> +}
> +
> +static int fdma_init_rx(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_port_private *port_priv;
> + struct ocelot *ocelot = fdma->ocelot;
> + struct ocelot_fdma_dcb *dcb;
> + struct ocelot_port *port;
> + struct net_device *ndev;
> + int max_mtu = 0;
> + int i;
> + u8 port_num;
> +
> + for (port_num = 0; port_num < ocelot->num_phys_ports; port_num++) {
> + port = ocelot->ports[port_num];
> + if (!port)
> + continue;
> +
> + port_priv = container_of(port, struct ocelot_port_private,
> + port);
> + ndev = port_priv->dev;
> +
> + ndev->needed_headroom = OCELOT_TAG_LEN;
> + ndev->needed_tailroom = ETH_FCS_LEN;
> +
> + if (READ_ONCE(ndev->mtu) > max_mtu)
> + max_mtu = READ_ONCE(ndev->mtu);
> + }
> +
> + if (!ndev)
> + return -ENODEV;
> +
> + fdma->rx_buf_size = fdma_rx_compute_buffer_size(max_mtu);
> + netif_napi_add(ndev, &fdma->napi, ocelot_fdma_napi_poll,
> + FDMA_WEIGHT);
> +
> + for (i = 0; i < FDMA_MAX_SKB; i++) {
> + dcb = fdma_dcb_alloc(fdma);
> + if (!dcb)
> + return -ENOMEM;
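This needs cleanup on error: the DCBs and SKBs set up by earlier
iterations are not released, and the NAPI instance stays registered.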
> +
> + ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> + }
> +
> + napi_enable(&fdma->napi);
> +
> + return 0;
> +}
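
To illustrate the cleanup being asked for, a minimal userspace sketch of
an allocation loop that unwinds on failure. All names here are
hypothetical, and malloc()/free() stand in for the DCB and SKB
allocations. In the driver the unwind path would presumably use
dma_pool_free() plus unmapping and freeing of the SKBs, followed by
netif_napi_del().

```c
#include <errno.h>
#include <stdlib.h>

#define NUM_DESC 8	/* stand-in for FDMA_MAX_SKB */

/* Hypothetical allocator hook so a failure can be injected for testing. */
typedef void *(*alloc_fn)(size_t size);

static int live_allocs;	/* tracks outstanding allocations in this sketch */

static void *counting_alloc(size_t size)
{
	void *p = malloc(size);

	if (p)
		live_allocs++;
	return p;
}

static void counting_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/* Fails on the fourth call to exercise the unwind path. */
static void *failing_alloc(size_t size)
{
	static int calls;

	if (++calls == 4)
		return NULL;
	return counting_alloc(size);
}

/*
 * Fill @descs with NUM_DESC allocations. On failure, release everything
 * allocated so far and return -ENOMEM, leaving no partial state behind.
 */
static int init_descs(void *descs[NUM_DESC], alloc_fn alloc)
{
	int i;

	for (i = 0; i < NUM_DESC; i++) {
		descs[i] = alloc(16);
		if (!descs[i])
			goto err_free;
	}

	return 0;

err_free:
	while (--i >= 0) {
		counting_free(descs[i]);
		descs[i] = NULL;
	}

	return -ENOMEM;
}
```

The point is only the shape of the error path: every resource taken before
the failing step is released in reverse order before returning.
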
> +
> +struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
> + struct ocelot *ocelot)
> +{
> + struct ocelot_fdma *fdma;
> + int ret;
> +
> + fdma = devm_kzalloc(&pdev->dev, sizeof(*fdma), GFP_KERNEL);
> + if (!fdma)
> + return ERR_PTR(-ENOMEM);
> +
> + fdma->ocelot = ocelot;
> + fdma->base = devm_platform_ioremap_resource_byname(pdev, "fdma");
> + if (IS_ERR_OR_NULL(fdma->base))
> + return fdma->base;
> +
> + fdma->dev = &pdev->dev;
> + fdma->dev->coherent_dma_mask = DMA_BIT_MASK(32);
> +
> + spin_lock_init(&fdma->tx_enqueue_lock);
> + spin_lock_init(&fdma->tx_free_lock);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> +
> + fdma->irq = platform_get_irq_byname(pdev, "fdma");
> + ret = devm_request_irq(&pdev->dev, fdma->irq, ocelot_fdma_interrupt, 0,
> + dev_name(&pdev->dev), fdma);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + /* Create a pool of consistent memory blocks for hardware descriptors */
> + fdma->dcb_pool = dmam_pool_create("ocelot_fdma", &pdev->dev,
> + sizeof(struct ocelot_fdma_dcb),
> + __alignof__(struct ocelot_fdma_dcb),
> + 0);
> + if (!fdma->dcb_pool) {
> + dev_err(&pdev->dev, "unable to allocate DMA descriptor pool\n");
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + INIT_LIST_HEAD(&fdma->tx_ongoing);
> + INIT_LIST_HEAD(&fdma->tx_free_dcb);
> + INIT_LIST_HEAD(&fdma->tx_queued);
> + INIT_LIST_HEAD(&fdma->rx_sw);
> + INIT_LIST_HEAD(&fdma->rx_hw);

You're leaking resources on error conditions.

> + return fdma;
> +}
> +
> +int ocelot_fdma_start(struct ocelot_fdma *fdma)
> +{
> + struct ocelot *ocelot = fdma->ocelot;
> + int ret;
> +
> + ret = fdma_init_tx(fdma);
> + if (ret)
> + return ret;
> +
> + ret = fdma_init_rx(fdma);
> + if (ret)
> + return ret;
> +
> + /* Reconfigure for extraction and injection using DMA */
> + ocelot_write_rix(ocelot, QS_INJ_GRP_CFG_MODE(2), QS_INJ_GRP_CFG, 0);
> + ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(0), QS_INJ_CTRL, 0);
> +
> + ocelot_write_rix(ocelot, QS_XTR_GRP_CFG_MODE(2), QS_XTR_GRP_CFG, 0);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_LLP, 0xffffffff);
> + fdma_writel(fdma, MSCC_FDMA_INTR_FRM, 0xffffffff);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_LLP_ENA,
> + BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
> + fdma_writel(fdma, MSCC_FDMA_INTR_FRM_ENA, BIT(MSCC_FDMA_XTR_CHAN));
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
> + BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
> +
> + ocelot_fdma_rx_refill(fdma);
> +
> + return 0;
> +}
> +
> +int ocelot_fdma_stop(struct ocelot_fdma *fdma)
> +{
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> +
> + fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
> + fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
> +
> + /* Free potentially pending SKBs in DCB lists */
> + fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->tx_ongoing, DMA_TO_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->tx_queued, DMA_TO_DEVICE);
> +
> + netif_napi_del(&fdma->napi);
> +
> + return 0;
> +}
> diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
> new file mode 100644
> index 000000000000..6c5c5872abf5
> --- /dev/null
> +++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
> +/*
> + * Microsemi SoCs FDMA driver
> + *
> + * Copyright (c) 2021 Microchip
> + */
> +#ifndef _MSCC_OCELOT_FDMA_H_
> +#define _MSCC_OCELOT_FDMA_H_
> +
> +#include "ocelot.h"
> +
> +/**
> + * struct ocelot_fdma - FDMA struct
> + *
> + * @ocelot: Pointer to ocelot struct
> + * @base: base address of FDMA registers
> + * @dcb_pool: Pool used for DCB allocation
> + * @irq: FDMA interrupt
> + * @dev: Ocelot device
> + * @napi: napi handle
> + * @rx_buf_size: Size of RX buffer
> + * @tx_ongoing: List of DCB handed out to the FDMA
> + * @tx_queued: pending list of DCBs to be given to the hardware
> + * @tx_enqueue_lock: Lock used for tx_queued and tx_ongoing
> + * @tx_free_dcb: List of DCB available for TX
> + * @tx_free_lock: Lock used to access tx_free_dcb list
> + * @rx_hw: RX DCBs currently owned by the hardware and not completed
> + * @rx_sw: RX DCBs completed
> + */
> +struct ocelot_fdma {
> + struct ocelot *ocelot;
> + void __iomem *base;
> + struct dma_pool *dcb_pool;
> + int irq;
> + struct device *dev;
> + struct napi_struct napi;
> + size_t rx_buf_size;
> +
> + struct list_head tx_ongoing;
> + struct list_head tx_queued;
> + /* Lock for tx_queued and tx_ongoing lists */
> + spinlock_t tx_enqueue_lock;
> +
> + struct list_head tx_free_dcb;
> + /* Lock for tx_free_dcb list */
> + spinlock_t tx_free_lock;
> +
> + struct list_head rx_hw;
> + struct list_head rx_sw;
> +};
> +
> +struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
> + struct ocelot *ocelot);
> +int ocelot_fdma_start(struct ocelot_fdma *fdma);
> +int ocelot_fdma_stop(struct ocelot_fdma *fdma);
> +int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> + struct sk_buff *skb, struct net_device *dev);
> +
> +#endif
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index 5916492fd6d0..3971b810c5b4 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -15,6 +15,7 @@
> #include <net/pkt_cls.h>
> #include "ocelot.h"
> #include "ocelot_vcap.h"
> +#include "ocelot_fdma.h"
>
> #define OCELOT_MAC_QUIRKS OCELOT_QUIRK_QSGMII_PORTS_MUST_BE_UP
>
> @@ -457,7 +458,7 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> int port = priv->chip_port;
> u32 rew_op = 0;
>
> - if (!ocelot_can_inject(ocelot, 0))
> + if (!ocelot->fdma && !ocelot_can_inject(ocelot, 0))
> return NETDEV_TX_BUSY;
>
> /* Check if timestamping is needed */
> @@ -475,9 +476,13 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> rew_op = ocelot_ptp_rew_op(skb);
> }
>
> - ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
> + if (ocelot->fdma) {
> + ocelot_fdma_inject_frame(ocelot->fdma, port, rew_op, skb, dev);
> + } else {
> + ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
>
> - kfree_skb(skb);
> + kfree_skb(skb);
> + }
>
> return NETDEV_TX_OK;
> }
> diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> index 38103b0255b0..985d584db3a1 100644
> --- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> +++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> @@ -18,6 +18,7 @@
>
> #include <soc/mscc/ocelot_vcap.h>
> #include <soc/mscc/ocelot_hsio.h>
> +#include "ocelot_fdma.h"
> #include "ocelot.h"
>
> static const u32 ocelot_ana_regmap[] = {
> @@ -1080,6 +1081,10 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> ocelot->targets[io_target[i].id] = target;
> }
>
> + ocelot->fdma = ocelot_fdma_init(pdev, ocelot);
> + if (IS_ERR(ocelot->fdma))
> + ocelot->fdma = NULL;
> +
> hsio = syscon_regmap_lookup_by_compatible("mscc,ocelot-hsio");
> if (IS_ERR(hsio)) {
> dev_err(&pdev->dev, "missing hsio syscon\n");
> @@ -1139,6 +1144,12 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> if (err)
> goto out_ocelot_devlink_unregister;
>
> + if (ocelot->fdma) {
> + err = ocelot_fdma_start(ocelot->fdma);
> + if (err)
> + goto out_ocelot_devlink_unregister;
> + }
> +
> err = ocelot_devlink_sb_register(ocelot);
> if (err)
> goto out_ocelot_release_ports;
> @@ -1166,6 +1177,8 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> out_ocelot_release_ports:
> mscc_ocelot_release_ports(ocelot);
> mscc_ocelot_teardown_devlink_ports(ocelot);
> + if (ocelot->fdma)
> + ocelot_fdma_stop(ocelot->fdma);
> out_ocelot_devlink_unregister:
> ocelot_deinit(ocelot);
> out_put_ports:
> @@ -1179,6 +1192,8 @@ static int mscc_ocelot_remove(struct platform_device *pdev)
> {
> struct ocelot *ocelot = platform_get_drvdata(pdev);
>
> + if (ocelot->fdma)
> + ocelot_fdma_stop(ocelot->fdma);
> devlink_unregister(ocelot->devlink);
> ocelot_deinit_timestamp(ocelot);
> ocelot_devlink_sb_unregister(ocelot);
> diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> index b3381c90ff3e..33e1559bdea3 100644
> --- a/include/soc/mscc/ocelot.h
> +++ b/include/soc/mscc/ocelot.h
> @@ -695,6 +695,8 @@ struct ocelot {
> /* Protects the PTP clock */
> spinlock_t ptp_clock_lock;
> struct ptp_pin_desc ptp_pins[OCELOT_PTP_PINS_NUM];
> +
> + struct ocelot_fdma *fdma;
> };
>
> struct ocelot_policer {
>

2021-11-03 12:33:14

by Vladimir Oltean

Subject: Re: [PATCH v2 5/6] net: ocelot: add FDMA support

On Wed, Nov 03, 2021 at 10:19:42AM +0100, Clément Léger wrote:
> Ethernet frames can be extracted or injected autonomously to or from the
> device’s DDR3/DDR3L memory and/or PCIe memory space. Linked list data
> structures in memory are used for injecting or extracting Ethernet frames.
> The FDMA generates interrupts when frame extraction or injection is done
> and when the linked lists need updating.
>
> The FDMA is shared between all the ethernet ports of the switch and uses
> a linked list of descriptors (DCB) to inject and extract packets.
> Before adding descriptors, the FDMA channels must be stopped. It would
> be inefficient to do that each time a descriptor would be added,
>
> TX path uses multiple lists to handle descriptors. tx_ongoing is the list
> of DCB currently owned by the hardware, tx_queued is a list of DCB that
> will be given to the hardware when tx_ongoing is done and finally
> tx_free_dcb is the list of DCB available for TX.
>
> RX path uses two lists, rx_hw is the list of DCB currently given to the
> hardware and rx_sw is the list of descriptors that have been completed
> by the FDMA and will be reinjected when the DMA hits the end of the
> linked list.
>
> Co-developed-by: Alexandre Belloni <[email protected]>
> Signed-off-by: Alexandre Belloni <[email protected]>
> Signed-off-by: Clément Léger <[email protected]>
> ---

Honestly, my mind exploded when I saw locking between TX and TX
confirmation. Can you not constrain the list of TX DCBs to act like a
ring-based device? Meaning that the linked list is always constant in
size, and you just update the Linked List Pointer of the last entry
populated by software to be NULL, to make the hardware stop processing
beyond that point. This could help you avoid keeping a list in software
and a DMA pool for the DCBs: just have one contiguous memory mapping of
all the DCBs for TX, and then you shouldn't need a spin_lock for a list
you no longer keep.
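
The ring idea above can be sketched roughly as follows. This is a
userspace model only: struct tx_ring, the index names, RING_SIZE and the
bus address are all hypothetical, and it assumes the hardware tolerates
the LLP of an already queued DCB being updated, which this thread does
not confirm (the commit message says channels must be stopped before
descriptors are added).

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 4		/* arbitrary; a real ring would be larger */
#define RING_BASE 0x1000u	/* made-up bus address of the DCB array */

/* Layout mirrors struct ocelot_fdma_dcb_hw_v2 from the patch. */
struct dcb_hw {
	uint32_t llp;
	uint32_t datap;
	uint32_t datal;
	uint32_t stat;
};

struct tx_ring {
	struct dcb_hw dcbs[RING_SIZE];	/* one contiguous allocation */
	unsigned int next_to_use;	/* producer index (xmit path) */
	unsigned int next_to_clean;	/* consumer index (TX confirmation) */
};

/* Bus address of entry @idx, derived from its position in the array. */
static uint32_t ring_dcb_addr(unsigned int idx)
{
	return RING_BASE + idx * (uint32_t)sizeof(struct dcb_hw);
}

/* One slot is kept empty so that full and empty can be told apart. */
static bool ring_full(const struct tx_ring *r)
{
	return (r->next_to_use + 1) % RING_SIZE == r->next_to_clean;
}

/* Queue one buffer: terminate the chain here, then link the previous entry. */
static bool ring_enqueue(struct tx_ring *r, uint32_t datap, uint32_t datal)
{
	unsigned int idx = r->next_to_use;
	unsigned int prev = (idx + RING_SIZE - 1) % RING_SIZE;

	if (ring_full(r))
		return false;

	r->dcbs[idx].datap = datap;
	r->dcbs[idx].datal = datal;
	r->dcbs[idx].stat = 0;
	r->dcbs[idx].llp = 0;		/* hardware stops after this entry */

	if (idx != r->next_to_clean)	/* an earlier entry is still pending */
		r->dcbs[prev].llp = ring_dcb_addr(idx);

	r->next_to_use = (idx + 1) % RING_SIZE;
	return true;
}

/* Reclaim the oldest entry once the hardware has marked it done. */
static bool ring_reclaim(struct tx_ring *r, uint32_t done_bit)
{
	unsigned int idx = r->next_to_clean;

	if (idx == r->next_to_use || !(r->dcbs[idx].stat & done_bit))
		return false;

	r->next_to_clean = (idx + 1) % RING_SIZE;
	return true;
}
```

In this model the xmit path only writes next_to_use and the confirmation
path only writes next_to_clean, which is what would remove the need for a
lock around a software list. A real driver would still need the usual
memory barriers and queue stop/wake handling.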

I haven't even gotten to reviewing RX properly...

> drivers/net/ethernet/mscc/Makefile | 1 +
> drivers/net/ethernet/mscc/ocelot.h | 1 +
> drivers/net/ethernet/mscc/ocelot_fdma.c | 693 +++++++++++++++++++++
> drivers/net/ethernet/mscc/ocelot_fdma.h | 59 ++
> drivers/net/ethernet/mscc/ocelot_net.c | 11 +-
> drivers/net/ethernet/mscc/ocelot_vsc7514.c | 15 +
> include/soc/mscc/ocelot.h | 2 +
> 7 files changed, 779 insertions(+), 3 deletions(-)
> create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
> create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h
>
> diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile
> index 722c27694b21..d76a9b78b6ca 100644
> --- a/drivers/net/ethernet/mscc/Makefile
> +++ b/drivers/net/ethernet/mscc/Makefile
> @@ -11,5 +11,6 @@ mscc_ocelot_switch_lib-y := \
> mscc_ocelot_switch_lib-$(CONFIG_BRIDGE_MRP) += ocelot_mrp.o
> obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot.o
> mscc_ocelot-y := \
> + ocelot_fdma.o \
> ocelot_vsc7514.o \
> ocelot_net.o
> diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
> index ba0dec7dd64f..ad85ad1079ad 100644
> --- a/drivers/net/ethernet/mscc/ocelot.h
> +++ b/drivers/net/ethernet/mscc/ocelot.h
> @@ -9,6 +9,7 @@
> #define _MSCC_OCELOT_H_
>
> #include <linux/bitops.h>
> +#include <linux/dsa/ocelot.h>
> #include <linux/etherdevice.h>
> #include <linux/if_vlan.h>
> #include <linux/net_tstamp.h>
> diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
> new file mode 100644
> index 000000000000..d8cdf022bbee
> --- /dev/null
> +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
> @@ -0,0 +1,693 @@
> +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> +/*
> + * Microsemi SoCs FDMA driver
> + *
> + * Copyright (c) 2021 Microchip
> + */
> +
> +#include <linux/bitops.h>
> +#include <linux/dmapool.h>
> +#include <linux/dsa/ocelot.h>
> +#include <linux/netdevice.h>
> +#include <linux/of_platform.h>
> +#include <linux/skbuff.h>
> +
> +#include "ocelot_fdma.h"
> +#include "ocelot_qs.h"
> +
> +#define MSCC_FDMA_DCB_LLP(x) ((x) * 4 + 0x0)
> +
> +#define MSCC_FDMA_DCB_STAT_BLOCKO(x) (((x) << 20) & GENMASK(31, 20))
> +#define MSCC_FDMA_DCB_STAT_BLOCKO_M GENMASK(31, 20)
> +#define MSCC_FDMA_DCB_STAT_BLOCKO_X(x) (((x) & GENMASK(31, 20)) >> 20)
> +#define MSCC_FDMA_DCB_STAT_PD BIT(19)
> +#define MSCC_FDMA_DCB_STAT_ABORT BIT(18)
> +#define MSCC_FDMA_DCB_STAT_EOF BIT(17)
> +#define MSCC_FDMA_DCB_STAT_SOF BIT(16)
> +#define MSCC_FDMA_DCB_STAT_BLOCKL_M GENMASK(15, 0)
> +#define MSCC_FDMA_DCB_STAT_BLOCKL(x) ((x) & GENMASK(15, 0))
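
As an aid for reading the STAT macros above, a small standalone sketch of
how the word is packed and unpacked. The bit positions are copied from the
patch; the STAT_* names and the helper functions are hypothetical, and the
masks are written out as constants because GENMASK() is not available
outside the kernel.

```c
#include <stdint.h>

/* Field layout copied from the MSCC_FDMA_DCB_STAT_* macros in the patch. */
#define STAT_BLOCKO_SHIFT 20
#define STAT_BLOCKO_MASK  UINT32_C(0xfff00000)	/* GENMASK(31, 20) */
#define STAT_PD           (UINT32_C(1) << 19)
#define STAT_ABORT        (UINT32_C(1) << 18)
#define STAT_EOF          (UINT32_C(1) << 17)
#define STAT_SOF          (UINT32_C(1) << 16)
#define STAT_BLOCKL_MASK  UINT32_C(0x0000ffff)	/* GENMASK(15, 0) */

/* Build the STAT word for a single-DCB TX frame (SOF and EOF both set). */
static uint32_t stat_pack_tx(uint32_t offset, uint32_t len)
{
	return ((offset << STAT_BLOCKO_SHIFT) & STAT_BLOCKO_MASK) |
	       (len & STAT_BLOCKL_MASK) | STAT_SOF | STAT_EOF;
}

/* Extract the block offset (bits 31:20). */
static uint32_t stat_block_offset(uint32_t stat)
{
	return (stat & STAT_BLOCKO_MASK) >> STAT_BLOCKO_SHIFT;
}

/* Extract the block length (bits 15:0). */
static uint32_t stat_block_len(uint32_t stat)
{
	return stat & STAT_BLOCKL_MASK;
}
```

For an offset of 2 and a length of 1518 bytes, stat_pack_tx() yields
0x002305ee: 0x00200000 for the offset, 0x00030000 for SOF and EOF, and
0x5ee for the length.
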
> +
> +#define MSCC_FDMA_CH_SAFE 0xcc
> +
> +#define MSCC_FDMA_CH_ACTIVATE 0xd0
> +
> +#define MSCC_FDMA_CH_DISABLE 0xd4
> +
> +#define MSCC_FDMA_EVT_ERR 0x164
> +
> +#define MSCC_FDMA_EVT_ERR_CODE 0x168
> +
> +#define MSCC_FDMA_INTR_LLP 0x16c
> +
> +#define MSCC_FDMA_INTR_LLP_ENA 0x170
> +
> +#define MSCC_FDMA_INTR_FRM 0x174
> +
> +#define MSCC_FDMA_INTR_FRM_ENA 0x178
> +
> +#define MSCC_FDMA_INTR_ENA 0x184
> +
> +#define MSCC_FDMA_INTR_IDENT 0x188
> +
> +#define MSCC_FDMA_INJ_CHAN 2
> +#define MSCC_FDMA_XTR_CHAN 0
> +
> +#define FDMA_MAX_SKB 256
> +#define FDMA_WEIGHT 32
> +
> +#define OCELOT_TAG_WORD_LEN (OCELOT_TAG_LEN / 4)
> +
> +/* Add 4 for possible misalignment when mapping the data */
> +#define FDMA_RX_EXTRA_SIZE \
> + (OCELOT_TAG_LEN + ETH_FCS_LEN + ETH_HLEN + 4)
> +
> +struct ocelot_fdma_dcb_hw_v2 {
> + u32 llp;
> + u32 datap;
> + u32 datal;
> + u32 stat;
> +};
> +
> +struct ocelot_fdma_dcb {
> + struct ocelot_fdma_dcb_hw_v2 hw;
> + struct list_head node;
> + struct sk_buff *skb;
> + dma_addr_t mapping;
> + size_t mapped_size;
> + dma_addr_t phys;
> +};
> +
> +static int fdma_rx_compute_buffer_size(int mtu)
> +{
> + return ALIGN(mtu + FDMA_RX_EXTRA_SIZE, 4);
> +}
> +
> +static void fdma_writel(struct ocelot_fdma *fdma, u32 reg, u32 data)
> +{
> + writel(data, fdma->base + reg);
> +}
> +
> +static u32 fdma_readl(struct ocelot_fdma *fdma, u32 reg)
> +{
> + return readl(fdma->base + reg);
> +}
> +
> +static void fdma_activate_chan(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb, int chan)
> +{
> + fdma_writel(fdma, MSCC_FDMA_DCB_LLP(chan), dcb->phys);
> + fdma_writel(fdma, MSCC_FDMA_CH_ACTIVATE, BIT(chan));
> +}
> +
> +static void fdma_stop_channel(struct ocelot_fdma *fdma, int chan)
> +{
> + u32 safe;
> +
> + fdma_writel(fdma, MSCC_FDMA_CH_DISABLE, BIT(chan));
> + do {
> + safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
> + } while (!(safe & BIT(chan)));
> +}
> +
> +static bool ocelot_fdma_dcb_set_data(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb, void *data,
> + size_t size, enum dma_data_direction dir)
> +{
> + u32 offset;
> +
> + dcb->mapped_size = size;
> + dcb->mapping = dma_map_single(fdma->dev, data, size, dir);
> + if (unlikely(dma_mapping_error(fdma->dev, dcb->mapping)))
> + return false;
> +
> + offset = dcb->mapping & 0x3;
> +
> + dcb->hw.llp = 0;
> + dcb->hw.datap = dcb->mapping & ~0x3;
> + /* DATAL must be a multiple of word size */
> + dcb->hw.datal = ALIGN_DOWN(size - offset, 4);
> + dcb->hw.stat = MSCC_FDMA_DCB_STAT_BLOCKO(offset);
> +
> + return true;
> +}
> +
> +static bool ocelot_fdma_dcb_set_rx_skb(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb,
> + struct sk_buff *skb, size_t size)
> +{
> + dcb->skb = skb;
> + return ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, size,
> + DMA_FROM_DEVICE);
> +}
> +
> +static bool ocelot_fdma_dcb_set_tx_skb(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb,
> + struct sk_buff *skb)
> +{
> + if (!ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, skb->len,
> + DMA_TO_DEVICE))
> + return false;
> +
> + dcb->skb = skb;
> + dcb->hw.stat |= MSCC_FDMA_DCB_STAT_BLOCKL(skb->len);
> + dcb->hw.stat |= MSCC_FDMA_DCB_STAT_SOF | MSCC_FDMA_DCB_STAT_EOF;
> +
> + return true;
> +}
> +
> +static struct ocelot_fdma_dcb *fdma_dcb_alloc(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb;
> + dma_addr_t phys;
> +
> + dcb = dma_pool_zalloc(fdma->dcb_pool, GFP_KERNEL, &phys);
> + if (unlikely(!dcb))
> + return NULL;
> +
> + dcb->phys = phys;
> +
> + return dcb;
> +}
> +
> +static struct net_device *fdma_get_port_netdev(struct ocelot_fdma *fdma,
> + int port_num)
> +{
> + struct ocelot_port_private *port_priv;
> + struct ocelot *ocelot = fdma->ocelot;
> + struct ocelot_port *port;
> +
> + if (port_num >= ocelot->num_phys_ports)
> + return NULL;
> +
> + port = ocelot->ports[port_num];
> +
> + if (!port)
> + return NULL;
> +
> + port_priv = container_of(port, struct ocelot_port_private, port);
> +
> + return port_priv->dev;
> +}
> +
> +static bool ocelot_fdma_rx_process_skb(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb,
> + int budget)
> +{
> + struct sk_buff *skb = dcb->skb;
> + struct net_device *ndev;
> + u64 src_port;
> + void *xfh;
> +
> + dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
> + DMA_FROM_DEVICE);
> +
> + xfh = skb->data;
> + ocelot_xfh_get_src_port(xfh, &src_port);
> +
> + skb_put(skb, MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat));
> + skb_pull(skb, OCELOT_TAG_LEN);
> +
> + ndev = fdma_get_port_netdev(fdma, src_port);
> + if (unlikely(!ndev)) {
> + napi_consume_skb(dcb->skb, budget);
> + return false;
> + }
> +
> + skb->dev = ndev;
> + skb->protocol = eth_type_trans(skb, skb->dev);
> + skb->dev->stats.rx_bytes += skb->len;
> + skb->dev->stats.rx_packets++;
> +
> + netif_receive_skb(skb);
> +
> + return true;
> +}
> +
> +static void ocelot_fdma_rx_refill(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb, *last_dcb;
> +
> + WARN_ON(list_empty(&fdma->rx_sw));
> +
> + dcb = list_first_entry(&fdma->rx_sw, struct ocelot_fdma_dcb, node);
> + /* Splice old hardware DCB list + new one */
> + if (!list_empty(&fdma->rx_hw)) {
> + last_dcb = list_last_entry(&fdma->rx_hw, struct ocelot_fdma_dcb,
> + node);
> + last_dcb->hw.llp = dcb->phys;
> + }
> +
> + /* Move software list to hardware list */
> + list_splice_tail_init(&fdma->rx_sw, &fdma->rx_hw);
> +
> + /* Finally reactivate the channel */
> + fdma_activate_chan(fdma, dcb, MSCC_FDMA_XTR_CHAN);
> +}
> +
> +static void ocelot_fdma_list_add_dcb(struct list_head *list,
> + struct ocelot_fdma_dcb *dcb)
> +{
> + struct ocelot_fdma_dcb *last_dcb;
> +
> + if (!list_empty(list)) {
> + last_dcb = list_last_entry(list, struct ocelot_fdma_dcb, node);
> + last_dcb->hw.llp = dcb->phys;
> + }
> +
> + list_add_tail(&dcb->node, list);
> +}
> +
> +static bool ocelot_fdma_rx_add_dcb_sw(struct ocelot_fdma *fdma,
> + struct ocelot_fdma_dcb *dcb)
> +{
> + struct sk_buff *new_skb;
> +
> + /* Add DCB to end of list with new SKB */
> + new_skb = napi_alloc_skb(&fdma->napi, fdma->rx_buf_size);
> + if (unlikely(!new_skb)) {
> + pr_err("skb_alloc failed\n");
> + return false;
> + }
> +
> + ocelot_fdma_dcb_set_rx_skb(fdma, dcb, new_skb, fdma->rx_buf_size);
> + ocelot_fdma_list_add_dcb(&fdma->rx_sw, dcb);
> +
> + return true;
> +}
> +
> +static bool ocelot_fdma_rx_get(struct ocelot_fdma *fdma, int budget)
> +{
> + struct ocelot_fdma_dcb *dcb;
> + bool valid = true;
> + u32 stat;
> +
> + dcb = list_first_entry_or_null(&fdma->rx_hw, struct ocelot_fdma_dcb,
> + node);
> + if (!dcb || MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat) == 0)
> + return false;
> +
> + list_del(&dcb->node);
> +
> + stat = dcb->hw.stat;
> + if (stat & MSCC_FDMA_DCB_STAT_ABORT || stat & MSCC_FDMA_DCB_STAT_PD)
> + valid = false;
> +
> + if (!(stat & MSCC_FDMA_DCB_STAT_SOF) ||
> + !(stat & MSCC_FDMA_DCB_STAT_EOF))
> + valid = false;
> +
> + if (likely(valid)) {
> + if (!ocelot_fdma_rx_process_skb(fdma, dcb, budget))
> + pr_err("Process skb failed, stat %x\n", stat);
> + } else {
> + napi_consume_skb(dcb->skb, budget);
> + }
> +
> + return ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> +}
> +
> +static void ocelot_fdma_rx_check_stopped(struct ocelot_fdma *fdma)
> +{
> + u32 llp = fdma_readl(fdma, MSCC_FDMA_DCB_LLP(MSCC_FDMA_XTR_CHAN));
> + /* LLP is non NULL, FDMA is still fetching packets */
> + if (llp)
> + return;
> +
> + fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
> + ocelot_fdma_rx_refill(fdma);
> +}
> +
> +static void ocelot_fdma_tx_free_dcb(struct ocelot_fdma *fdma,
> + struct list_head *list)
> +{
> + struct ocelot_fdma_dcb *dcb;
> +
> + if (list_empty(list))
> + return;
> +
> + /* Free all SKBs that have been used for TX */
> + list_for_each_entry(dcb, list, node) {
> + dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
> + DMA_TO_DEVICE);
> + dev_consume_skb_any(dcb->skb);
> + dcb->skb = NULL;
> + }
> +
> + /* All DCBs can now be given to free list */
> + spin_lock(&fdma->tx_free_lock);
> + list_splice_tail_init(list, &fdma->tx_free_dcb);
> + spin_unlock(&fdma->tx_free_lock);
> +}
> +
> +static void ocelot_fdma_tx_cleanup(struct ocelot_fdma *fdma)
> +{
> + struct list_head tx_done = LIST_HEAD_INIT(tx_done);
> + struct ocelot_fdma_dcb *dcb, *temp;
> +
> + spin_lock(&fdma->tx_enqueue_lock);
> + if (list_empty(&fdma->tx_ongoing))
> + goto out_unlock;
> +
> + list_for_each_entry_safe(dcb, temp, &fdma->tx_ongoing, node) {
> + if (!(dcb->hw.stat & MSCC_FDMA_DCB_STAT_PD))
> + break;
> +
> + list_move_tail(&dcb->node, &tx_done);
> + }
> +
> +out_unlock:
> + spin_unlock(&fdma->tx_enqueue_lock);
> +
> + ocelot_fdma_tx_free_dcb(fdma, &tx_done);
> +}
> +
> +static void ocelot_fdma_tx_restart(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_fdma_dcb *dcb;
> + u32 safe;
> +
> + spin_lock(&fdma->tx_enqueue_lock);
> +
> + if (!list_empty(&fdma->tx_ongoing) || list_empty(&fdma->tx_queued))
> + goto out_unlock;
> +
> + /* Ongoing list is empty, channel should be in safe mode */
> + do {
> + safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
> + } while (!(safe & BIT(MSCC_FDMA_INJ_CHAN)));
> +
> + /* Move queued DCB to ongoing and restart the DMA */
> + list_splice_tail_init(&fdma->tx_queued, &fdma->tx_ongoing);
> + /* List can't be empty, no need to check */
> + dcb = list_first_entry(&fdma->tx_ongoing, struct ocelot_fdma_dcb, node);
> +
> + fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
> +
> +out_unlock:
> + spin_unlock(&fdma->tx_enqueue_lock);
> +}
> +
> +static int ocelot_fdma_napi_poll(struct napi_struct *napi, int budget)
> +{
> + struct ocelot_fdma *fdma = container_of(napi, struct ocelot_fdma, napi);
> + int work_done = 0;
> +
> + ocelot_fdma_tx_cleanup(fdma);
> + ocelot_fdma_tx_restart(fdma);
> +
> + while (work_done < budget) {
> + if (!ocelot_fdma_rx_get(fdma, budget))
> + break;
> +
> + work_done++;
> + }
> +
> + ocelot_fdma_rx_check_stopped(fdma);
> +
> + if (work_done < budget) {
> + napi_complete(&fdma->napi);

Documentation says you should consider calling napi_complete_done(&fdma->napi, work_done);
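
For illustration only, here is a standalone userspace model of that pattern, not the driver code. `struct model_napi`, `model_napi_complete_done()` and `model_poll()` are hypothetical stand-ins for the kernel NAPI objects. The point it tries to show: the work count is reported on completion, and the interrupt is re-armed only when the completion call returns true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct napi_struct plus device state. */
struct model_napi {
	int pending;		/* frames waiting in the RX ring */
	bool scheduled;		/* poll is currently scheduled */
	bool irq_enabled;	/* device interrupt is unmasked */
	int last_work_done;	/* value reported on completion */
};

/* Models napi_complete_done(): records the work count, ends polling. */
static bool model_napi_complete_done(struct model_napi *n, int work_done)
{
	n->last_work_done = work_done;
	n->scheduled = false;
	return true;	/* the real call may return false, e.g. when busy polling */
}

/* Poll callback shape: consume up to budget, complete only if under budget. */
static int model_poll(struct model_napi *n, int budget)
{
	int work_done = 0;

	assert(budget > 0);

	while (work_done < budget && n->pending > 0) {
		n->pending--;
		work_done++;
	}

	/* Re-arm the interrupt only once polling has really completed. */
	if (work_done < budget && model_napi_complete_done(n, work_done))
		n->irq_enabled = true;

	return work_done;
}
```

With three pending frames and a budget of two, the first poll keeps the interrupt masked and the second one completes and unmasks it.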

> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
> + BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
> + }
> +
> + return work_done;
> +}
> +
> +static irqreturn_t ocelot_fdma_interrupt(int irq, void *dev_id)
> +{
> + u32 ident, llp, frm, err, err_code;
> + struct ocelot_fdma *fdma = dev_id;
> +
> + ident = fdma_readl(fdma, MSCC_FDMA_INTR_IDENT);
> + frm = fdma_readl(fdma, MSCC_FDMA_INTR_FRM);
> + llp = fdma_readl(fdma, MSCC_FDMA_INTR_LLP);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_LLP, llp & ident);
> + fdma_writel(fdma, MSCC_FDMA_INTR_FRM, frm & ident);
> + if (frm | llp) {

Bitwise OR? Strange.
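
To be clear about what is being questioned: for two unsigned register values the bitwise and the logical form have the same truth value, so this is about readability rather than behaviour. A small standalone check, with hypothetical helper names, assuming both values are plain 32-bit words as in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* As written in the patch: true if any bit is set in either word. */
static bool pending_bitwise(uint32_t frm, uint32_t llp)
{
	return (frm | llp) != 0;
}

/* Conventional spelling of the same test. */
static bool pending_logical(uint32_t frm, uint32_t llp)
{
	return frm || llp;
}

/* Self-check over a few representative register values. */
static void check_equivalence(void)
{
	static const uint32_t v[] = { 0, 1, 0x80000000u, 0xffffffffu };
	size_t i, j;

	for (i = 0; i < sizeof(v) / sizeof(v[0]); i++)
		for (j = 0; j < sizeof(v) / sizeof(v[0]); j++)
			assert(pending_bitwise(v[i], v[j]) ==
			       pending_logical(v[i], v[j]));
}
```

Whether the test was also meant to be masked with `ident` is a separate question that only the author can answer.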

> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> + napi_schedule(&fdma->napi);
> + }
> +
> + err = fdma_readl(fdma, MSCC_FDMA_EVT_ERR);
> + if (unlikely(err)) {
> + err_code = fdma_readl(fdma, MSCC_FDMA_EVT_ERR_CODE);
> + dev_err_ratelimited(fdma->dev,
> + "Error ! chans mask: %#x, code: %#x\n",
> + err, err_code);
> +
> + fdma_writel(fdma, MSCC_FDMA_EVT_ERR, err);
> + fdma_writel(fdma, MSCC_FDMA_EVT_ERR_CODE, err_code);
> + }
> +
> + return IRQ_HANDLED;
> +}
> +
> +static struct ocelot_fdma_dcb *fdma_tx_get_dcb(struct ocelot_fdma *fdma)

Please name these functions consistently and make them start with ocelot_.

> +{
> + struct ocelot_fdma_dcb *dcb = NULL;
> +
> + spin_lock_bh(&fdma->tx_free_lock);
> + dcb = list_first_entry_or_null(&fdma->tx_free_dcb,
> + struct ocelot_fdma_dcb, node);
> + if (dcb)
> + list_del(&dcb->node);
> +
> + spin_unlock_bh(&fdma->tx_free_lock);
> +
> + return dcb;
> +}
> +
> +int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> + struct sk_buff *skb, struct net_device *dev)
> +{
> + struct ocelot_port *port_s = fdma->ocelot->ports[port];
> + struct ocelot_fdma_dcb *dcb;
> + struct sk_buff *new_skb;
> + void *ifh;
> +
> + if (unlikely(skb_shinfo(skb)->nr_frags != 0)) {
> + netdev_err(dev, "Unsupported fragmented packet");
> + dev_kfree_skb_any(skb);

Consider skb_linearize() here instead of dropping the frame.

Also, please don't print from the hot path without rate limiting
(net_ratelimit(), or the net_err_ratelimited() helper).

> + return NETDEV_TX_OK;
> + }
> +
> + if (skb_headroom(skb) < OCELOT_TAG_LEN ||
> + skb_tailroom(skb) < ETH_FCS_LEN) {

Don't you also need to copy the skb (to ensure it's writable) if it's cloned
(like would be the case for packets with two-step PTP TX timestamping)?
I don't see any calls to skb_unshare().

You can test with:

ptp4l -i swp0 -2 -P -m --tx_timestamp_timeout 20

on two back-to-back boards.

> + new_skb = skb_copy_expand(skb, OCELOT_TAG_LEN, ETH_FCS_LEN,
> + GFP_ATOMIC);
> + dev_consume_skb_any(skb);

I think you can use pskb_expand_head() and avoid creating a new_skb.
Look at dsa_realloc_skb().
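
Roughly, the decision could look like the userspace model below. This is a sketch loosely following the shape of dsa_realloc_skb(), not a drop-in replacement: `struct model_skb`, `struct model_expand` and `model_realloc_needed()` are hypothetical, and the two length constants are assumed values. In the driver, the two computed amounts would be what gets passed to pskb_expand_head() as nhead and ntail.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_TAG_LEN	16	/* assumed value of OCELOT_TAG_LEN */
#define MODEL_FCS_LEN	4	/* assumed value of ETH_FCS_LEN */

/* Hypothetical summary of the skb properties that matter here. */
struct model_skb {
	int headroom;	/* bytes free before the data */
	int tailroom;	/* bytes free after the data */
	bool cloned;	/* data is shared with another skb */
};

struct model_expand {
	bool needed;	/* true if the head must be reallocated */
	int extra_head;	/* additional headroom to request */
	int extra_tail;	/* additional tailroom to request */
};

static int clamp_to_zero(int v)
{
	return v > 0 ? v : 0;
}

static struct model_expand model_realloc_needed(const struct model_skb *skb)
{
	struct model_expand e;

	assert(skb->headroom >= 0 && skb->tailroom >= 0);

	e.extra_head = clamp_to_zero(MODEL_TAG_LEN - skb->headroom);
	e.extra_tail = clamp_to_zero(MODEL_FCS_LEN - skb->tailroom);
	/* A cloned skb must get a private head even if there is room. */
	e.needed = e.extra_head || e.extra_tail || skb->cloned;

	return e;
}
```

The cloned case is the one the current check misses: there is enough room, yet the data must not be written in place.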

> + if (!new_skb)
> + return NETDEV_TX_OK;
> +
> + skb = new_skb;
> + }
> +
> + ifh = skb_push(skb, OCELOT_TAG_LEN);
> + skb_put(skb, ETH_FCS_LEN);
> + ocelot_ifh_port_set(ifh, port_s, rew_op, skb_vlan_tag_get(skb));
> +
> + dcb = fdma_tx_get_dcb(fdma);
> + if (unlikely(!dcb))
> + return NETDEV_TX_BUSY;
> +
> + if (!ocelot_fdma_dcb_set_tx_skb(fdma, dcb, skb)) {
> + dev_kfree_skb_any(skb);
> + spin_lock_bh(&fdma->tx_free_lock);
> + list_add_tail(&dcb->node, &fdma->tx_free_dcb);
> + spin_unlock_bh(&fdma->tx_free_lock);
> + return NETDEV_TX_OK;
> + }
> +
> + spin_lock_bh(&fdma->tx_enqueue_lock);
> +
> + if (list_empty(&fdma->tx_ongoing)) {
> + ocelot_fdma_list_add_dcb(&fdma->tx_ongoing, dcb);
> + fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
> + } else {
> + ocelot_fdma_list_add_dcb(&fdma->tx_queued, dcb);
> + }
> +
> + spin_unlock_bh(&fdma->tx_enqueue_lock);

I think you don't need _bh locking from ndo_start_xmit() context.
__dev_queue_xmit() calls rcu_read_lock_bh(). On the other hand, I think
you might need to use spin_lock_bh from ocelot_fdma_napi_poll(), since
that runs from NET_RX softirq, and ndo_start_xmit() can run from loads
of other contexts.

> + return NETDEV_TX_OK;
> +}
> +
> +static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
> + struct list_head *list,
> + enum dma_data_direction dir)
> +{
> + struct ocelot_fdma_dcb *dcb;
> +
> + if (list_empty(list))
> + return;

I'm not sure this is really needed.

> +
> + list_for_each_entry(dcb, list, node) {
> + if (dcb->skb) {
> + dma_unmap_single(fdma->dev, dcb->mapping,
> + dcb->mapped_size, dir);
> + dev_kfree_skb_any(dcb->skb);

dcb->skb = NULL?

> + }
> + }
> +}
> +
> +static int fdma_init_tx(struct ocelot_fdma *fdma)
> +{
> + int i;
> + struct ocelot_fdma_dcb *dcb;
> +
> + for (i = 0; i < FDMA_MAX_SKB; i++) {
> + dcb = fdma_dcb_alloc(fdma);
> + if (!dcb)
> + return -ENOMEM;
> +
> + list_add_tail(&dcb->node, &fdma->tx_free_dcb);
> + }
> +
> + return 0;
> +}
> +
> +static int fdma_init_rx(struct ocelot_fdma *fdma)
> +{
> + struct ocelot_port_private *port_priv;
> + struct ocelot *ocelot = fdma->ocelot;
> + struct ocelot_fdma_dcb *dcb;
> + struct ocelot_port *port;
> + struct net_device *ndev;
> + int max_mtu = 0;
> + int i;
> + u8 port_num;

Please declare variables in the order of descending line length (aka
"reverse Christmas tree"). Here, in fdma_init_tx(), and in other
places.

> +
> + for (port_num = 0; port_num < ocelot->num_phys_ports; port_num++) {

The naming convention is "int port", "struct ocelot_port *ocelot_port".
Please keep it. Thanks.

> + port = ocelot->ports[port_num];
> + if (!port)
> + continue;
> +
> + port_priv = container_of(port, struct ocelot_port_private,
> + port);
> + ndev = port_priv->dev;
> +
> + ndev->needed_headroom = OCELOT_TAG_LEN;
> + ndev->needed_tailroom = ETH_FCS_LEN;
> +
> + if (READ_ONCE(ndev->mtu) > max_mtu)
> + max_mtu = READ_ONCE(ndev->mtu);

This seems silly, you use READ_ONCE twice... what's the point?
Also, what is this racing with?
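
If a lockless read is needed at all, the usual shape is to load the value once into a local and use that for both the comparison and the assignment. A standalone sketch of the idea, where `read_once_uint()` is a hypothetical userspace stand-in for READ_ONCE() and `model_max_mtu()` is made up for the example:

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's READ_ONCE() on an unsigned int. */
static unsigned int read_once_uint(const unsigned int *p)
{
	return *(const volatile unsigned int *)p;
}

/* Largest MTU across ports, loading each value exactly once. */
static unsigned int model_max_mtu(const unsigned int *mtu, size_t n)
{
	unsigned int max_mtu = 0;
	size_t i;

	assert(mtu || n == 0);

	for (i = 0; i < n; i++) {
		/* One load: the compared and the stored value cannot differ. */
		unsigned int cur = read_once_uint(&mtu[i]);

		if (cur > max_mtu)
			max_mtu = cur;
	}

	return max_mtu;
}
```

With two separate loads, a concurrent writer could change the value between the comparison and the assignment, which defeats the purpose of the annotation.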

> + }
> +
> + if (!ndev)
> + return -ENODEV;
> +
> + fdma->rx_buf_size = fdma_rx_compute_buffer_size(max_mtu);
> + netif_napi_add(ndev, &fdma->napi, ocelot_fdma_napi_poll,
> + FDMA_WEIGHT);
> +
> + for (i = 0; i < FDMA_MAX_SKB; i++) {
> + dcb = fdma_dcb_alloc(fdma);
> + if (!dcb)
> + return -ENOMEM;
> +
> + ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> + }
> +
> + napi_enable(&fdma->napi);
> +
> + return 0;
> +}
> +
> +struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
> + struct ocelot *ocelot)
> +{
> + struct ocelot_fdma *fdma;
> + int ret;
> +
> + fdma = devm_kzalloc(&pdev->dev, sizeof(*fdma), GFP_KERNEL);
> + if (!fdma)
> + return ERR_PTR(-ENOMEM);
> +
> + fdma->ocelot = ocelot;
> + fdma->base = devm_platform_ioremap_resource_byname(pdev, "fdma");

Don't you want to look up the resource by name before allocating stuff?
Maybe the allocation won't be needed, then you'll have to live with it,
since you use devres. Although my personal recommendation would be to
just not use devres, it makes you think more.

> + if (IS_ERR_OR_NULL(fdma->base))
> + return fdma->base;

Just return NULL and simplify the caller, you aren't using the ERR value
anyway (or do something with the error value at the call site).

> +
> + fdma->dev = &pdev->dev;
> + fdma->dev->coherent_dma_mask = DMA_BIT_MASK(32);
> +
> + spin_lock_init(&fdma->tx_enqueue_lock);
> + spin_lock_init(&fdma->tx_free_lock);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> +
> + fdma->irq = platform_get_irq_byname(pdev, "fdma");
> + ret = devm_request_irq(&pdev->dev, fdma->irq, ocelot_fdma_interrupt, 0,
> + dev_name(&pdev->dev), fdma);
> + if (ret)
> + return ERR_PTR(ret);
> +
> + /* Create a pool of consistent memory blocks for hardware descriptors */
> + fdma->dcb_pool = dmam_pool_create("ocelot_fdma", &pdev->dev,
> + sizeof(struct ocelot_fdma_dcb),
> + __alignof__(struct ocelot_fdma_dcb),
> + 0);
> + if (!fdma->dcb_pool) {
> + dev_err(&pdev->dev, "unable to allocate DMA descriptor pool\n");
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + INIT_LIST_HEAD(&fdma->tx_ongoing);
> + INIT_LIST_HEAD(&fdma->tx_free_dcb);
> + INIT_LIST_HEAD(&fdma->tx_queued);
> + INIT_LIST_HEAD(&fdma->rx_sw);
> + INIT_LIST_HEAD(&fdma->rx_hw);
> +
> + return fdma;
> +}
> +
> +int ocelot_fdma_start(struct ocelot_fdma *fdma)
> +{
> + struct ocelot *ocelot = fdma->ocelot;
> + int ret;
> +
> + ret = fdma_init_tx(fdma);
> + if (ret)
> + return ret;
> +
> + ret = fdma_init_rx(fdma);
> + if (ret)

Don't you want to undo the fdma_dcb_alloc() from fdma_init_tx() if this fails?
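
What I have in mind is the usual goto-based unwind. A minimal userspace model of it, using malloc() in place of the DCB pool; every name in it (`struct model_fdma`, `model_init_ring()`, `model_start()`, the ring depth) is hypothetical, and the struct is assumed to start out zeroed:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MODEL_RING_SIZE 4	/* hypothetical ring depth */

struct model_fdma {
	void *tx[MODEL_RING_SIZE];
	void *rx[MODEL_RING_SIZE];
	int fail_rx;	/* test hook: force the RX step to fail */
};

static void model_free_ring(void **ring)
{
	int i;

	for (i = 0; i < MODEL_RING_SIZE; i++) {
		free(ring[i]);	/* free(NULL) is a no-op */
		ring[i] = NULL;
	}
}

/* Fills a zeroed ring; on failure, releases what it allocated itself. */
static int model_init_ring(void **ring, int fail)
{
	int i;

	for (i = 0; i < MODEL_RING_SIZE; i++) {
		ring[i] = fail ? NULL : malloc(64);
		if (!ring[i]) {
			model_free_ring(ring);
			return -ENOMEM;
		}
	}

	return 0;
}

static int model_start(struct model_fdma *f)
{
	int ret;

	assert(f);

	ret = model_init_ring(f->tx, 0);
	if (ret)
		return ret;

	ret = model_init_ring(f->rx, f->fail_rx);
	if (ret)
		goto err_free_tx;

	return 0;

err_free_tx:
	/* Undo the earlier step so a failed start leaks nothing. */
	model_free_ring(f->tx);
	return ret;
}
```

The same applies inside each init step: a failure halfway through the loop should release the entries already allocated.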

> + return ret;
> +
> + /* Reconfigure for extraction and injection using DMA */
> + ocelot_write_rix(ocelot, QS_INJ_GRP_CFG_MODE(2), QS_INJ_GRP_CFG, 0);
> + ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(0), QS_INJ_CTRL, 0);
> +
> + ocelot_write_rix(ocelot, QS_XTR_GRP_CFG_MODE(2), QS_XTR_GRP_CFG, 0);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_LLP, 0xffffffff);
> + fdma_writel(fdma, MSCC_FDMA_INTR_FRM, 0xffffffff);
> +
> + fdma_writel(fdma, MSCC_FDMA_INTR_LLP_ENA,
> + BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
> + fdma_writel(fdma, MSCC_FDMA_INTR_FRM_ENA, BIT(MSCC_FDMA_XTR_CHAN));
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
> + BIT(MSCC_FDMA_INJ_CHAN) | BIT(MSCC_FDMA_XTR_CHAN));
> +
> + ocelot_fdma_rx_refill(fdma);
> +
> + return 0;
> +}
> +
> +int ocelot_fdma_stop(struct ocelot_fdma *fdma)
> +{
> + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> +
> + fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
> + fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
> +
> + /* Free potentially pending SKBs in DCB lists */
> + fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->tx_ongoing, DMA_TO_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->tx_queued, DMA_TO_DEVICE);
> +
> + netif_napi_del(&fdma->napi);
> +
> + return 0;
> +}
> diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
> new file mode 100644
> index 000000000000..6c5c5872abf5
> --- /dev/null
> +++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
> @@ -0,0 +1,59 @@
> +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
> +/*
> + * Microsemi SoCs FDMA driver
> + *
> + * Copyright (c) 2021 Microchip
> + */
> +#ifndef _MSCC_OCELOT_FDMA_H_
> +#define _MSCC_OCELOT_FDMA_H_
> +
> +#include "ocelot.h"
> +
> +/**
> + * struct ocelot_fdma - FMDA struct
> + *
> + * @ocelot: Pointer to ocelot struct
> + * @base: base address of FDMA registers
> + * @dcb_pool: Pool used for DCB allocation
> + * @irq: FDMA interrupt
> + * @dev: Ocelot device
> + * @napi: napi handle
> + * @rx_buf_size: Size of RX buffer
> + * @tx_ongoing: List of DCB handed out to the FDMA
> + * @tx_queued: pending list of DCBs to be given to the hardware
> + * @tx_enqueue_lock: Lock used for tx_queued and tx_ongoing
> + * @tx_free_dcb: List of DCB available for TX
> + * @tx_free_lock: Lock used to access tx_free_dcb list
> + * @rx_hw: RX DCBs currently owned by the hardware and not completed
> + * @rx_sw: RX DCBs completed
> + */
> +struct ocelot_fdma {
> + struct ocelot *ocelot;
> + void __iomem *base;
> + struct dma_pool *dcb_pool;
> + int irq;
> + struct device *dev;
> + struct napi_struct napi;
> + size_t rx_buf_size;
> +
> + struct list_head tx_ongoing;
> + struct list_head tx_queued;
> + /* Lock for tx_queued and tx_ongoing lists */
> + spinlock_t tx_enqueue_lock;
> +
> + struct list_head tx_free_dcb;
> + /* Lock for tx_free_dcb list */
> + spinlock_t tx_free_lock;
> +
> + struct list_head rx_hw;
> + struct list_head rx_sw;
> +};
> +
> +struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
> + struct ocelot *ocelot);
> +int ocelot_fdma_start(struct ocelot_fdma *fdma);
> +int ocelot_fdma_stop(struct ocelot_fdma *fdma);
> +int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> + struct sk_buff *skb, struct net_device *dev);
> +
> +#endif
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index 5916492fd6d0..3971b810c5b4 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -15,6 +15,7 @@
> #include <net/pkt_cls.h>
> #include "ocelot.h"
> #include "ocelot_vcap.h"
> +#include "ocelot_fdma.h"
>
> #define OCELOT_MAC_QUIRKS OCELOT_QUIRK_QSGMII_PORTS_MUST_BE_UP
>
> @@ -457,7 +458,7 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> int port = priv->chip_port;
> u32 rew_op = 0;
>
> - if (!ocelot_can_inject(ocelot, 0))
> + if (!ocelot->fdma && !ocelot_can_inject(ocelot, 0))
> return NETDEV_TX_BUSY;
>
> /* Check if timestamping is needed */
> @@ -475,9 +476,13 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> rew_op = ocelot_ptp_rew_op(skb);
> }
>
> - ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
> + if (ocelot->fdma) {
> + ocelot_fdma_inject_frame(ocelot->fdma, port, rew_op, skb, dev);
> + } else {
> + ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
>
> - kfree_skb(skb);
> + kfree_skb(skb);

I know this is unrelated, but.. consume_skb maybe?

> + }
>
> return NETDEV_TX_OK;
> }
> diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> index 38103b0255b0..985d584db3a1 100644
> --- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> +++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> @@ -18,6 +18,7 @@
>
> #include <soc/mscc/ocelot_vcap.h>
> #include <soc/mscc/ocelot_hsio.h>
> +#include "ocelot_fdma.h"
> #include "ocelot.h"
>
> static const u32 ocelot_ana_regmap[] = {
> @@ -1080,6 +1081,10 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> ocelot->targets[io_target[i].id] = target;
> }
>
> + ocelot->fdma = ocelot_fdma_init(pdev, ocelot);
> + if (IS_ERR(ocelot->fdma))
> + ocelot->fdma = NULL;
> +
> hsio = syscon_regmap_lookup_by_compatible("mscc,ocelot-hsio");
> if (IS_ERR(hsio)) {
> dev_err(&pdev->dev, "missing hsio syscon\n");
> @@ -1139,6 +1144,12 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> if (err)
> goto out_ocelot_devlink_unregister;
>
> + if (ocelot->fdma) {
> + err = ocelot_fdma_start(ocelot->fdma);
> + if (err)
> + goto out_ocelot_devlink_unregister;
> + }
> +
> err = ocelot_devlink_sb_register(ocelot);
> if (err)
> goto out_ocelot_release_ports;
> @@ -1166,6 +1177,8 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> out_ocelot_release_ports:
> mscc_ocelot_release_ports(ocelot);
> mscc_ocelot_teardown_devlink_ports(ocelot);
> + if (ocelot->fdma)
> + ocelot_fdma_stop(ocelot->fdma);
> out_ocelot_devlink_unregister:
> ocelot_deinit(ocelot);
> out_put_ports:
> @@ -1179,6 +1192,8 @@ static int mscc_ocelot_remove(struct platform_device *pdev)
> {
> struct ocelot *ocelot = platform_get_drvdata(pdev);
>
> + if (ocelot->fdma)
> + ocelot_fdma_stop(ocelot->fdma);

Are you sure you want to call netif_napi_del() while the net devices are
still registered? :-/

> devlink_unregister(ocelot->devlink);
> ocelot_deinit_timestamp(ocelot);
> ocelot_devlink_sb_unregister(ocelot);
> diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> index b3381c90ff3e..33e1559bdea3 100644
> --- a/include/soc/mscc/ocelot.h
> +++ b/include/soc/mscc/ocelot.h
> @@ -695,6 +695,8 @@ struct ocelot {
> /* Protects the PTP clock */
> spinlock_t ptp_clock_lock;
> struct ptp_pin_desc ptp_pins[OCELOT_PTP_PINS_NUM];
> +
> + struct ocelot_fdma *fdma;
> };
>
> struct ocelot_policer {
> --
> 2.33.0
>

2021-11-03 12:39:53

by Vladimir Oltean

Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Wed, Nov 03, 2021 at 10:19:40AM +0100, Clément Léger wrote:
> IFH preparation can take quite some time on slow processors (up to 5% in
> a iperf3 test for instance). In order to reduce the cost of this
> preparation, pre-compute IFH since most of the parameters are fixed per
> port. Only rew_op and vlan tag will be set when sending if different
> than 0. This allows to remove entirely the calls to packing() with basic
> usage. In the same time, export this function that will be used by FDMA.
>
> Signed-off-by: Clément Léger <[email protected]>
> ---

Honestly, this feels a bit cheap/gimmicky, and not really the
fundamental thing to address. In my testing of a similar idea (see
commits 67c2404922c2 ("net: dsa: felix: create a template for the DSA
tags on xmit") and then 7c4bb540e917 ("net: dsa: tag_ocelot: create
separate tagger for Seville")), the net difference is not that stark,
considering that now you need to access one more memory region which you
did not need before, do a memcpy, and then patch the IFH anyway for the
non-constant stuff.

Certainly, for the calls to ocelot_port_inject_frame() from DSA, I would
prefer not having this pre-computed IFH.

Could you provide some before/after performance numbers and perf counters?
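
To make the trade-off concrete, here is a standalone model of the two approaches. The byte layout and all names (`model_ifh_build()`, `model_ifh_template()`, `model_ifh_from_template()`, the `OFF_*` offsets) are hypothetical; the real IFH packs its fields at bit offsets through packing(). The sketch only shows that both paths produce the same header, it says nothing about which one is faster.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_IFH_LEN 16	/* assumed to match OCELOT_TAG_LEN */

/* Hypothetical byte layout; the real IFH packs fields at bit offsets. */
enum { OFF_BYPASS = 0, OFF_DEST = 1, OFF_VLAN = 4, OFF_REW_OP = 8 };

static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/* Slow path: every field is written for every frame. */
static void model_ifh_build(uint8_t *ifh, uint8_t port, uint16_t vlan,
			    uint16_t rew_op)
{
	memset(ifh, 0, MODEL_IFH_LEN);
	ifh[OFF_BYPASS] = 1;
	ifh[OFF_DEST] = port;
	put_be16(&ifh[OFF_VLAN], vlan);
	put_be16(&ifh[OFF_REW_OP], rew_op);
}

/* Once per port: fill in the fields that never change. */
static void model_ifh_template(uint8_t *tmpl, uint8_t port)
{
	model_ifh_build(tmpl, port, 0, 0);
}

/* Per frame: copy the template, then patch only the non-zero fields. */
static void model_ifh_from_template(uint8_t *ifh, const uint8_t *tmpl,
				    uint16_t vlan, uint16_t rew_op)
{
	assert(ifh && tmpl);

	memcpy(ifh, tmpl, MODEL_IFH_LEN);
	if (vlan)
		put_be16(&ifh[OFF_VLAN], vlan);
	if (rew_op)
		put_be16(&ifh[OFF_REW_OP], rew_op);
}
```

The template path still touches a second memory region and still patches the header for tagged or timestamped frames, which is why measurements are needed.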

> drivers/net/ethernet/mscc/ocelot.c | 23 ++++++++++++++++++-----
> include/soc/mscc/ocelot.h | 5 +++++
> 2 files changed, 23 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
> index e6c18b598d5c..97693772595b 100644
> --- a/drivers/net/ethernet/mscc/ocelot.c
> +++ b/drivers/net/ethernet/mscc/ocelot.c
> @@ -1076,20 +1076,29 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp)
> }
> EXPORT_SYMBOL(ocelot_can_inject);
>
> +void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
> + u32 vlan_tag)
> +{
> + memcpy(ifh, port->ifh, OCELOT_TAG_LEN);
> +
> + if (vlan_tag)
> + ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
> + if (rew_op)
> + ocelot_ifh_set_rew_op(ifh, rew_op);
> +}
> +EXPORT_SYMBOL(ocelot_ifh_port_set);
> +
> void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
> u32 rew_op, struct sk_buff *skb)
> {
> + struct ocelot_port *port_s = ocelot->ports[port];
> u32 ifh[OCELOT_TAG_LEN / 4] = {0};
> unsigned int i, count, last;
>
> ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
> QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
>
> - ocelot_ifh_set_bypass(ifh, 1);
> - ocelot_ifh_set_dest(ifh, BIT_ULL(port));
> - ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C);
> - ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
> - ocelot_ifh_set_rew_op(ifh, rew_op);
> + ocelot_ifh_port_set(ifh, port_s, rew_op, skb_vlan_tag_get(skb));
>
> for (i = 0; i < OCELOT_TAG_LEN / 4; i++)
> ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp);
> @@ -2128,6 +2137,10 @@ void ocelot_init_port(struct ocelot *ocelot, int port)
>
> skb_queue_head_init(&ocelot_port->tx_skbs);
>
> + ocelot_ifh_set_bypass(ocelot_port->ifh, 1);
> + ocelot_ifh_set_dest(ocelot_port->ifh, BIT_ULL(port));
> + ocelot_ifh_set_tag_type(ocelot_port->ifh, IFH_TAG_TYPE_C);
> +
> /* Basic L2 initialization */
>
> /* Set MAC IFG Gaps
> diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> index fef3a36b0210..b3381c90ff3e 100644
> --- a/include/soc/mscc/ocelot.h
> +++ b/include/soc/mscc/ocelot.h
> @@ -6,6 +6,7 @@
> #define _SOC_MSCC_OCELOT_H
>
> #include <linux/ptp_clock_kernel.h>
> +#include <linux/dsa/ocelot.h>
> #include <linux/net_tstamp.h>
> #include <linux/if_vlan.h>
> #include <linux/regmap.h>
> @@ -623,6 +624,8 @@ struct ocelot_port {
>
> struct net_device *bridge;
> u8 stp_state;
> +
> + u8 ifh[OCELOT_TAG_LEN];
> };
>
> struct ocelot {
> @@ -754,6 +757,8 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target,
> bool ocelot_can_inject(struct ocelot *ocelot, int grp);
> void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
> u32 rew_op, struct sk_buff *skb);
> +void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
> + u32 vlan_tag);
> int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb);
> void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp);
>
> --
> 2.33.0
>

2021-11-03 12:43:10

by Vladimir Oltean

Subject: Re: [PATCH v2 4/6] net: ocelot: add support for ndo_change_mtu

On Wed, Nov 03, 2021 at 10:19:41AM +0100, Clément Léger wrote:
> This commit adds support for changing MTU for the ocelot register based
> interface. For ocelot, JUMBO frame size can be set up to 25000 bytes
> but has been set to 9000 which is a saner value and allow for maximum
> gain of performances. Frames larger than 9000 bytes do not yield
> a noticeable improvement.
>
> Signed-off-by: Clément Léger <[email protected]>
> ---
> drivers/net/ethernet/mscc/ocelot.h | 2 ++
> drivers/net/ethernet/mscc/ocelot_net.c | 14 ++++++++++++++
> 2 files changed, 16 insertions(+)
>
> diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
> index e43da09b8f91..ba0dec7dd64f 100644
> --- a/drivers/net/ethernet/mscc/ocelot.h
> +++ b/drivers/net/ethernet/mscc/ocelot.h
> @@ -32,6 +32,8 @@
>
> #define OCELOT_PTP_QUEUE_SZ 128
>
> +#define OCELOT_JUMBO_MTU 9000
> +
> struct ocelot_port_tc {
> bool block_shared;
> unsigned long offload_cnt;
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index d76def435b23..5916492fd6d0 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -482,6 +482,18 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> return NETDEV_TX_OK;
> }
>
> +static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + struct ocelot_port_private *priv = netdev_priv(dev);
> + struct ocelot_port *ocelot_port = &priv->port;
> + struct ocelot *ocelot = ocelot_port->ocelot;
> +
> + ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
> + WRITE_ONCE(dev->mtu, new_mtu);

The WRITE_ONCE seems absolutely gratuitous to me.

> +
> + return 0;
> +}
> +
> enum ocelot_action_type {
> OCELOT_MACT_LEARN,
> OCELOT_MACT_FORGET,
> @@ -768,6 +780,7 @@ static const struct net_device_ops ocelot_port_netdev_ops = {
> .ndo_open = ocelot_port_open,
> .ndo_stop = ocelot_port_stop,
> .ndo_start_xmit = ocelot_port_xmit,
> + .ndo_change_mtu = ocelot_change_mtu,
> .ndo_set_rx_mode = ocelot_set_rx_mode,
> .ndo_set_mac_address = ocelot_port_set_mac_address,
> .ndo_get_stats64 = ocelot_get_stats64,
> @@ -1699,6 +1712,7 @@ int ocelot_probe_port(struct ocelot *ocelot, int port, struct regmap *target,
>
> dev->netdev_ops = &ocelot_port_netdev_ops;
> dev->ethtool_ops = &ocelot_ethtool_ops;
> + dev->max_mtu = OCELOT_JUMBO_MTU;
>
> dev->hw_features |= NETIF_F_HW_VLAN_CTAG_FILTER | NETIF_F_RXFCS |
> NETIF_F_HW_TC;
> --
> 2.33.0
>

2021-11-03 12:45:55

by Vladimir Oltean

Subject: Re: [PATCH v2 6/6] net: ocelot: add jumbo frame support for FDMA

On Wed, Nov 03, 2021 at 10:19:43AM +0100, Clément Léger wrote:
> When using the FDMA, using jumbo frames can lead to a large performance
> improvement. When changing the MTU, the RX buffer size must be
> increased to be large enough to receive jumbo frame. Since the FDMA is
> shared amongst all interfaces, all the ports must be down before
> changing the MTU. Buffers are sized to accept the maximum MTU supported
> by each port.
>
> Signed-off-by: Clément Léger <[email protected]>
> ---

Instead of draining buffers and refilling with a different size, which
impacts the user experience, can you not just use scatter/gather RX
processing for frames larger than the fixed buffer size, like a normal
driver would?
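
As a rough model of what I mean: keep the buffers at a fixed size and let a large frame span several descriptors, using the SOF/EOF status bits to delimit it. The sketch below is standalone userspace code with hypothetical names (`struct model_rx_desc`, `model_rx_gather()`) and a tiny buffer size. It copies the pieces into one output buffer for simplicity, whereas a real driver would more likely attach the buffers to the skb as fragments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_BUF_SIZE 8	/* tiny fixed RX buffer, for illustration */

/* Hypothetical RX descriptor: one fixed-size buffer plus status. */
struct model_rx_desc {
	unsigned char buf[MODEL_BUF_SIZE];
	size_t len;	/* valid bytes in buf */
	bool sof;	/* first buffer of a frame */
	bool eof;	/* last buffer of a frame */
};

/*
 * Gather one frame starting at desc[0]. Returns the frame length, or 0
 * if the chain is malformed or does not fit in out.
 */
static size_t model_rx_gather(const struct model_rx_desc *desc, size_t ndesc,
			      unsigned char *out, size_t out_size)
{
	size_t total = 0;
	size_t i;

	assert(desc && out);

	if (ndesc == 0 || !desc[0].sof)
		return 0;

	for (i = 0; i < ndesc; i++) {
		/* A second SOF before EOF means the chain is corrupt. */
		if (i > 0 && desc[i].sof)
			return 0;
		if (desc[i].len > MODEL_BUF_SIZE ||
		    total + desc[i].len > out_size)
			return 0;

		memcpy(out + total, desc[i].buf, desc[i].len);
		total += desc[i].len;

		if (desc[i].eof)
			return total;
	}

	return 0;	/* ran out of descriptors before EOF */
}
```

With this scheme an MTU change does not need to touch the RX ring at all, so the ports would not have to be down.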

> drivers/net/ethernet/mscc/ocelot_fdma.c | 61 +++++++++++++++++++++++++
> drivers/net/ethernet/mscc/ocelot_fdma.h | 1 +
> drivers/net/ethernet/mscc/ocelot_net.c | 7 +++
> 3 files changed, 69 insertions(+)
>
> diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
> index d8cdf022bbee..bee1a310caa6 100644
> --- a/drivers/net/ethernet/mscc/ocelot_fdma.c
> +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
> @@ -530,6 +530,67 @@ static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
> }
> }
>
> +int ocelot_fdma_change_mtu(struct net_device *dev, int new_mtu)
> +{
> + struct ocelot_port_private *priv = netdev_priv(dev);
> + struct ocelot_port *port = &priv->port;
> + struct ocelot *ocelot = port->ocelot;
> + struct ocelot_fdma *fdma = ocelot->fdma;
> + struct ocelot_fdma_dcb *dcb, *dcb_temp;
> + struct list_head tmp = LIST_HEAD_INIT(tmp);
> + size_t old_rx_buf_size = fdma->rx_buf_size;
> + bool all_ports_down = true;
> + u8 port_num;
> +
> + /* The FDMA RX list is shared amongst all the port, get the max MTU from
> + * all of them
> + */
> + for (port_num = 0; port_num < ocelot->num_phys_ports; port_num++) {
> + port = ocelot->ports[port_num];
> + if (!port)
> + continue;
> +
> + priv = container_of(port, struct ocelot_port_private, port);
> +
> + if (READ_ONCE(priv->dev->mtu) > new_mtu)
> + new_mtu = READ_ONCE(priv->dev->mtu);
> +
> + /* All ports must be down to change the RX buffer length */
> + if (netif_running(priv->dev))
> + all_ports_down = false;
> + }
> +
> + fdma->rx_buf_size = fdma_rx_compute_buffer_size(new_mtu);
> + if (fdma->rx_buf_size == old_rx_buf_size)
> + return 0;
> +
> + if (!all_ports_down)
> + return -EBUSY;
> +
> + priv = netdev_priv(dev);
> +
> + fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
> +
> + /* Discard all pending RX software and hardware descriptor */
> + fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
> + fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
> +
> + /* Move all DCBs to a temporary list that will be injected in sw list */
> + if (!list_empty(&fdma->rx_hw))
> + list_splice_tail_init(&fdma->rx_hw, &tmp);
> + if (!list_empty(&fdma->rx_sw))
> + list_splice_tail_init(&fdma->rx_sw, &tmp);
> +
> + list_for_each_entry_safe(dcb, dcb_temp, &tmp, node) {
> + list_del(&dcb->node);
> + ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> + }
> +
> + ocelot_fdma_rx_refill(fdma);
> +
> + return 0;
> +}
> +
> static int fdma_init_tx(struct ocelot_fdma *fdma)
> {
> int i;
> diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
> index 6c5c5872abf5..74514a0b291a 100644
> --- a/drivers/net/ethernet/mscc/ocelot_fdma.h
> +++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
> @@ -55,5 +55,6 @@ int ocelot_fdma_start(struct ocelot_fdma *fdma);
> int ocelot_fdma_stop(struct ocelot_fdma *fdma);
> int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> struct sk_buff *skb, struct net_device *dev);
> +int ocelot_fdma_change_mtu(struct net_device *dev, int new_mtu);
>
> #endif
> diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> index 3971b810c5b4..d5e88d7b15c7 100644
> --- a/drivers/net/ethernet/mscc/ocelot_net.c
> +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> @@ -492,6 +492,13 @@ static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
> struct ocelot_port_private *priv = netdev_priv(dev);
> struct ocelot_port *ocelot_port = &priv->port;
> struct ocelot *ocelot = ocelot_port->ocelot;
> + int ret;
> +
> + if (ocelot->fdma) {
> + ret = ocelot_fdma_change_mtu(dev, new_mtu);
> + if (ret)
> + return ret;
> + }
>
> ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
> WRITE_ONCE(dev->mtu, new_mtu);
> --
> 2.33.0
>

2021-11-03 13:09:13

by Clément Léger

Subject: Re: [PATCH v2 4/6] net: ocelot: add support for ndo_change_mtu

On Wed, 3 Nov 2021 12:40:55 +0000,
Vladimir Oltean <[email protected]> wrote:

> > +static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
> > +{
> > + struct ocelot_port_private *priv = netdev_priv(dev);
> > + struct ocelot_port *ocelot_port = &priv->port;
> > + struct ocelot *ocelot = ocelot_port->ocelot;
> > +
> > + ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
> > + WRITE_ONCE(dev->mtu, new_mtu);
>
> The WRITE_ONCE seems absolutely gratuitous to me.

I applied what netdevice.h recommends for the mtu field of the
netdev
(https://elixir.bootlin.com/linux/v5.15/source/include/linux/netdevice.h#L1989),
which is also what __dev_set_mtu() does
(https://elixir.bootlin.com/linux/v5.15/source/net/core/dev.c#L8849).

>
> > +
> > + return 0;
> > +}
> > +
> > enum ocelot_action_type {
> > OCELOT_MACT_LEARN,
> > OCELOT_MACT_FORGET,


--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-03 13:59:28

by Clément Léger

Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Wed, 3 Nov 2021 12:38:12 +0000,
Vladimir Oltean <[email protected]> wrote:

> On Wed, Nov 03, 2021 at 10:19:40AM +0100, Clément Léger wrote:
> > IFH preparation can take quite some time on slow processors (up to
> > 5% in a iperf3 test for instance). In order to reduce the cost of
> > this preparation, pre-compute IFH since most of the parameters are
> > fixed per port. Only rew_op and vlan tag will be set when sending
> > if different than 0. This allows to remove entirely the calls to
> > packing() with basic usage. In the same time, export this function
> > that will be used by FDMA.
> >
> > Signed-off-by: Clément Léger <[email protected]>
> > ---
>
> Honestly, this feels a bit cheap/gimmicky, and not really the
> fundamental thing to address. In my testing of a similar idea (see
> commits 67c2404922c2 ("net: dsa: felix: create a template for the DSA
> tags on xmit") and then 7c4bb540e917 ("net: dsa: tag_ocelot: create
> separate tagger for Seville"), the net difference is not that stark,
> considering that now you need to access one more memory region which
> you did not need before, do a memcpy, and then patch the IFH anyway
> for the non-constant stuff.

The memcpy is negligible and the patching happens only in a few
cases (at least compared with the packing() call). The VSC7514 CPU is
really slow, which leads to between 2.5% and 5% of the time being spent
in packing() when running iperf3, depending on the use case (according
to ftrace).

>
> Certainly, for the calls to ocelot_port_inject_frame() from DSA, I
> would prefer not having this pre-computed IFH.
>
> Could you provide some before/after performance numbers and perf
> counters?

I will do another round of measurements to confirm my previous numbers
and check the impact on the injection rate on ocelot.

>
> > drivers/net/ethernet/mscc/ocelot.c | 23 ++++++++++++++++++-----
> > include/soc/mscc/ocelot.h | 5 +++++
> > 2 files changed, 23 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
> > index e6c18b598d5c..97693772595b 100644
> > --- a/drivers/net/ethernet/mscc/ocelot.c
> > +++ b/drivers/net/ethernet/mscc/ocelot.c
> > @@ -1076,20 +1076,29 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp)
> > }
> > EXPORT_SYMBOL(ocelot_can_inject);
> >
> > +void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32
> > rew_op,
> > + u32 vlan_tag)
> > +{
> > + memcpy(ifh, port->ifh, OCELOT_TAG_LEN);
> > +
> > + if (vlan_tag)
> > + ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
> > + if (rew_op)
> > + ocelot_ifh_set_rew_op(ifh, rew_op);
> > +}
> > +EXPORT_SYMBOL(ocelot_ifh_port_set);
> > +
> > void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int
> > grp, u32 rew_op, struct sk_buff *skb)
> > {
> > + struct ocelot_port *port_s = ocelot->ports[port];
> > u32 ifh[OCELOT_TAG_LEN / 4] = {0};
> > unsigned int i, count, last;
> >
> > ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
> > QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
> >
> > - ocelot_ifh_set_bypass(ifh, 1);
> > - ocelot_ifh_set_dest(ifh, BIT_ULL(port));
> > - ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C);
> > - ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
> > - ocelot_ifh_set_rew_op(ifh, rew_op);
> > + ocelot_ifh_port_set(ifh, port_s, rew_op,
> > skb_vlan_tag_get(skb));
> > for (i = 0; i < OCELOT_TAG_LEN / 4; i++)
> > ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp);
> > @@ -2128,6 +2137,10 @@ void ocelot_init_port(struct ocelot *ocelot,
> > int port)
> > skb_queue_head_init(&ocelot_port->tx_skbs);
> >
> > + ocelot_ifh_set_bypass(ocelot_port->ifh, 1);
> > + ocelot_ifh_set_dest(ocelot_port->ifh, BIT_ULL(port));
> > + ocelot_ifh_set_tag_type(ocelot_port->ifh, IFH_TAG_TYPE_C);
> > +
> > /* Basic L2 initialization */
> >
> > /* Set MAC IFG Gaps
> > diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> > index fef3a36b0210..b3381c90ff3e 100644
> > --- a/include/soc/mscc/ocelot.h
> > +++ b/include/soc/mscc/ocelot.h
> > @@ -6,6 +6,7 @@
> > #define _SOC_MSCC_OCELOT_H
> >
> > #include <linux/ptp_clock_kernel.h>
> > +#include <linux/dsa/ocelot.h>
> > #include <linux/net_tstamp.h>
> > #include <linux/if_vlan.h>
> > #include <linux/regmap.h>
> > @@ -623,6 +624,8 @@ struct ocelot_port {
> >
> > struct net_device *bridge;
> > u8 stp_state;
> > +
> > + u8 ifh[OCELOT_TAG_LEN];
> > };
> >
> > struct ocelot {
> > @@ -754,6 +757,8 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target,
> > bool ocelot_can_inject(struct ocelot *ocelot, int grp);
> > void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
> > u32 rew_op, struct sk_buff *skb);
> > +void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
> > + u32 vlan_tag);
> > int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb);
> > void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp);
> > --
> > 2.33.0
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-03 14:25:01

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH v2 5/6] net: ocelot: add FDMA support

On Wed, 3 Nov 2021 12:31:54 +0000,
Vladimir Oltean <[email protected]> wrote:

> On Wed, Nov 03, 2021 at 10:19:42AM +0100, Clément Léger wrote:
> > Ethernet frames can be extracted or injected autonomously to or
> > from the device’s DDR3/DDR3L memory and/or PCIe memory space.
> > Linked list data structures in memory are used for injecting or
> > extracting Ethernet frames. The FDMA generates interrupts when
> > frame extraction or injection is done and when the linked lists
> > need updating.
> >
> > The FDMA is shared between all the ethernet ports of the switch and
> > uses a linked list of descriptors (DCB) to inject and extract
> > packets. Before adding descriptors, the FDMA channels must be
> > stopped. It would be inefficient to do that each time a descriptor
> > would be added,
> >
> > TX path uses multiple lists to handle descriptors. tx_ongoing is
> > the list of DCB currently owned by the hardware, tx_queued is a
> > list of DCB that will be given to the hardware when tx_ongoing is
> > done and finally tx_free_dcb is the list of DCB available for TX.
> >
> > RX path uses two list, rx_hw is the list of DCB currently given to
> > the hardware and rx_sw is the list of descriptors that have been
> > completed by the FDMA and will be reinjected when the DMA hits the
> > end of the linked list.
> >
> > Co-developed-by: Alexandre Belloni <[email protected]>
> > Signed-off-by: Alexandre Belloni <[email protected]>
> > Signed-off-by: Clément Léger <[email protected]>
> > ---
>
> Honestly, my mind exploded when I saw locking between TX and TX
> confirmation. Can you not constrain the list of TX DCBs to act like a
> ring-based device? Meaning that the linked list is always constant in
> size, and you just update the Linked List Pointer of the last entry
> populated by software to be NULL, to make the hardware stop processing
> beyond that point. This could help you avoid keeping a list in
> software, and a DMA pool for the DCBs, just have a contiguous memory
> mapping of all the DCBs for TX, and then you shouldn't need a
> spin_lock for a list you no longer keep.

Indeed, this will simplify DCB handling. I will make these modifications.
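
For reference, a minimal userspace sketch of the ring-based scheme suggested above, as I understand it. All names (struct tx_ring, dcb_model, RING_SIZE) are hypothetical, the "done" flag stands in for the hardware completion bit, and llp holds an index instead of a DMA address. A real implementation would also need the appropriate memory barriers and channel restart handling, which are left out here.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 8 /* hypothetical; must be a power of two in this sketch */

struct dcb_model {
	uint32_t llp; /* index + 1 of the next descriptor, 0 means "stop here" */
	bool done;    /* stand-in for the hardware completion bit */
};

struct tx_ring {
	struct dcb_model dcbs[RING_SIZE]; /* one contiguous block of DCBs */
	unsigned int next_to_use;         /* producer counter (xmit path) */
	unsigned int next_to_clean;       /* consumer counter (completion path) */
};

static unsigned int ring_space(const struct tx_ring *r)
{
	return RING_SIZE - (r->next_to_use - r->next_to_clean);
}

/* Producer: claim a slot, end the chain there, link the previous slot to it. */
static int ring_enqueue(struct tx_ring *r)
{
	unsigned int idx;

	if (ring_space(r) == 0)
		return -1;

	idx = r->next_to_use & (RING_SIZE - 1);
	r->dcbs[idx].llp = 0;
	r->dcbs[idx].done = false;

	if (r->next_to_use != r->next_to_clean) {
		unsigned int prev = (r->next_to_use - 1) & (RING_SIZE - 1);

		r->dcbs[prev].llp = idx + 1;
	}

	r->next_to_use++;
	return (int)idx;
}

/* Consumer: reclaim completed slots in order, return how many were freed. */
static unsigned int ring_clean(struct tx_ring *r)
{
	unsigned int cleaned = 0;

	while (r->next_to_clean != r->next_to_use) {
		unsigned int idx = r->next_to_clean & (RING_SIZE - 1);

		if (!r->dcbs[idx].done)
			break;

		r->next_to_clean++;
		cleaned++;
	}

	return cleaned;
}
```

Since the producer only writes next_to_use and the consumer only writes next_to_clean, the software lists and the DMA pool go away in this model, which is the simplification I am after.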

>
> I haven't even gotten to reviewing RX properly...
>
> > drivers/net/ethernet/mscc/Makefile | 1 +
> > drivers/net/ethernet/mscc/ocelot.h | 1 +
> > drivers/net/ethernet/mscc/ocelot_fdma.c | 693 +++++++++++++++++++++
> > drivers/net/ethernet/mscc/ocelot_fdma.h | 59 ++
> > drivers/net/ethernet/mscc/ocelot_net.c | 11 +-
> > drivers/net/ethernet/mscc/ocelot_vsc7514.c | 15 +
> > include/soc/mscc/ocelot.h | 2 +
> > 7 files changed, 779 insertions(+), 3 deletions(-)
> > create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.c
> > create mode 100644 drivers/net/ethernet/mscc/ocelot_fdma.h
> >
> > diff --git a/drivers/net/ethernet/mscc/Makefile b/drivers/net/ethernet/mscc/Makefile
> > index 722c27694b21..d76a9b78b6ca 100644
> > --- a/drivers/net/ethernet/mscc/Makefile
> > +++ b/drivers/net/ethernet/mscc/Makefile
> > @@ -11,5 +11,6 @@ mscc_ocelot_switch_lib-y := \
> > mscc_ocelot_switch_lib-$(CONFIG_BRIDGE_MRP) += ocelot_mrp.o
> > obj-$(CONFIG_MSCC_OCELOT_SWITCH) += mscc_ocelot.o
> > mscc_ocelot-y := \
> > + ocelot_fdma.o \
> > ocelot_vsc7514.o \
> > ocelot_net.o
> > diff --git a/drivers/net/ethernet/mscc/ocelot.h b/drivers/net/ethernet/mscc/ocelot.h
> > index ba0dec7dd64f..ad85ad1079ad 100644
> > --- a/drivers/net/ethernet/mscc/ocelot.h
> > +++ b/drivers/net/ethernet/mscc/ocelot.h
> > @@ -9,6 +9,7 @@
> > #define _MSCC_OCELOT_H_
> >
> > #include <linux/bitops.h>
> > +#include <linux/dsa/ocelot.h>
> > #include <linux/etherdevice.h>
> > #include <linux/if_vlan.h>
> > #include <linux/net_tstamp.h>
> > diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
> > new file mode 100644
> > index 000000000000..d8cdf022bbee
> > --- /dev/null
> > +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
> > @@ -0,0 +1,693 @@
> > +// SPDX-License-Identifier: (GPL-2.0 OR MIT)
> > +/*
> > + * Microsemi SoCs FDMA driver
> > + *
> > + * Copyright (c) 2021 Microchip
> > + */
> > +
> > +#include <linux/bitops.h>
> > +#include <linux/dmapool.h>
> > +#include <linux/dsa/ocelot.h>
> > +#include <linux/netdevice.h>
> > +#include <linux/of_platform.h>
> > +#include <linux/skbuff.h>
> > +
> > +#include "ocelot_fdma.h"
> > +#include "ocelot_qs.h"
> > +
> > +#define MSCC_FDMA_DCB_LLP(x) ((x) * 4 + 0x0)
> > +
> > +#define MSCC_FDMA_DCB_STAT_BLOCKO(x)   (((x) << 20) & GENMASK(31, 20))
> > +#define MSCC_FDMA_DCB_STAT_BLOCKO_M    GENMASK(31, 20)
> > +#define MSCC_FDMA_DCB_STAT_BLOCKO_X(x) (((x) & GENMASK(31, 20)) >> 20)
> > +#define MSCC_FDMA_DCB_STAT_PD          BIT(19)
> > +#define MSCC_FDMA_DCB_STAT_ABORT       BIT(18)
> > +#define MSCC_FDMA_DCB_STAT_EOF         BIT(17)
> > +#define MSCC_FDMA_DCB_STAT_SOF         BIT(16)
> > +#define MSCC_FDMA_DCB_STAT_BLOCKL_M    GENMASK(15, 0)
> > +#define MSCC_FDMA_DCB_STAT_BLOCKL(x)   ((x) & GENMASK(15, 0))
> > +
> > +#define MSCC_FDMA_CH_SAFE 0xcc
> > +
> > +#define MSCC_FDMA_CH_ACTIVATE 0xd0
> > +
> > +#define MSCC_FDMA_CH_DISABLE 0xd4
> > +
> > +#define MSCC_FDMA_EVT_ERR 0x164
> > +
> > +#define MSCC_FDMA_EVT_ERR_CODE 0x168
> > +
> > +#define MSCC_FDMA_INTR_LLP 0x16c
> > +
> > +#define MSCC_FDMA_INTR_LLP_ENA 0x170
> > +
> > +#define MSCC_FDMA_INTR_FRM 0x174
> > +
> > +#define MSCC_FDMA_INTR_FRM_ENA 0x178
> > +
> > +#define MSCC_FDMA_INTR_ENA 0x184
> > +
> > +#define MSCC_FDMA_INTR_IDENT 0x188
> > +
> > +#define MSCC_FDMA_INJ_CHAN 2
> > +#define MSCC_FDMA_XTR_CHAN 0
> > +
> > +#define FDMA_MAX_SKB 256
> > +#define FDMA_WEIGHT 32
> > +
> > +#define OCELOT_TAG_WORD_LEN (OCELOT_TAG_LEN / 4)
> > +
> > +/* Add 4 for possible misalignment when mapping the data */
> > +#define FDMA_RX_EXTRA_SIZE \
> > + (OCELOT_TAG_LEN + ETH_FCS_LEN + ETH_HLEN + 4)
> > +
> > +struct ocelot_fdma_dcb_hw_v2 {
> > + u32 llp;
> > + u32 datap;
> > + u32 datal;
> > + u32 stat;
> > +};
> > +
> > +struct ocelot_fdma_dcb {
> > + struct ocelot_fdma_dcb_hw_v2 hw;
> > + struct list_head node;
> > + struct sk_buff *skb;
> > + dma_addr_t mapping;
> > + size_t mapped_size;
> > + dma_addr_t phys;
> > +};
> > +
> > +static int fdma_rx_compute_buffer_size(int mtu)
> > +{
> > + return ALIGN(mtu + FDMA_RX_EXTRA_SIZE, 4);
> > +}
> > +
> > +static void fdma_writel(struct ocelot_fdma *fdma, u32 reg, u32
> > data) +{
> > + writel(data, fdma->base + reg);
> > +}
> > +
> > +static u32 fdma_readl(struct ocelot_fdma *fdma, u32 reg)
> > +{
> > + return readl(fdma->base + reg);
> > +}
> > +
> > +static void fdma_activate_chan(struct ocelot_fdma *fdma,
> > + struct ocelot_fdma_dcb *dcb, int
> > chan) +{
> > + fdma_writel(fdma, MSCC_FDMA_DCB_LLP(chan), dcb->phys);
> > + fdma_writel(fdma, MSCC_FDMA_CH_ACTIVATE, BIT(chan));
> > +}
> > +
> > +static void fdma_stop_channel(struct ocelot_fdma *fdma, int chan)
> > +{
> > + u32 safe;
> > +
> > + fdma_writel(fdma, MSCC_FDMA_CH_DISABLE, BIT(chan));
> > + do {
> > + safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
> > + } while (!(safe & BIT(chan)));
> > +}
> > +
> > +static bool ocelot_fdma_dcb_set_data(struct ocelot_fdma *fdma,
> > + struct ocelot_fdma_dcb *dcb,
> > void *data,
> > + size_t size, enum
> > dma_data_direction dir) +{
> > + u32 offset;
> > +
> > + dcb->mapped_size = size;
> > + dcb->mapping = dma_map_single(fdma->dev, data, size, dir);
> > + if (unlikely(dma_mapping_error(fdma->dev, dcb->mapping)))
> > + return false;
> > +
> > + offset = dcb->mapping & 0x3;
> > +
> > + dcb->hw.llp = 0;
> > + dcb->hw.datap = dcb->mapping & ~0x3;
> > + /* DATAL must be a multiple of word size */
> > + dcb->hw.datal = ALIGN_DOWN(size - offset, 4);
> > + dcb->hw.stat = MSCC_FDMA_DCB_STAT_BLOCKO(offset);
> > +
> > + return true;
> > +}
> > +
> > +static bool ocelot_fdma_dcb_set_rx_skb(struct ocelot_fdma *fdma,
> > + struct ocelot_fdma_dcb *dcb,
> > + struct sk_buff *skb, size_t
> > size) +{
> > + dcb->skb = skb;
> > + return ocelot_fdma_dcb_set_data(fdma, dcb, skb->data, size,
> > + DMA_FROM_DEVICE);
> > +}
> > +
> > +static bool ocelot_fdma_dcb_set_tx_skb(struct ocelot_fdma *fdma,
> > + struct ocelot_fdma_dcb *dcb,
> > + struct sk_buff *skb)
> > +{
> > + if (!ocelot_fdma_dcb_set_data(fdma, dcb, skb->data,
> > skb->len,
> > + DMA_TO_DEVICE))
> > + return false;
> > +
> > + dcb->skb = skb;
> > + dcb->hw.stat |= MSCC_FDMA_DCB_STAT_BLOCKL(skb->len);
> > + dcb->hw.stat |= MSCC_FDMA_DCB_STAT_SOF |
> > MSCC_FDMA_DCB_STAT_EOF; +
> > + return true;
> > +}
> > +
> > +static struct ocelot_fdma_dcb *fdma_dcb_alloc(struct ocelot_fdma
> > *fdma) +{
> > + struct ocelot_fdma_dcb *dcb;
> > + dma_addr_t phys;
> > +
> > + dcb = dma_pool_zalloc(fdma->dcb_pool, GFP_KERNEL, &phys);
> > + if (unlikely(!dcb))
> > + return NULL;
> > +
> > + dcb->phys = phys;
> > +
> > + return dcb;
> > +}
> > +
> > +static struct net_device *fdma_get_port_netdev(struct ocelot_fdma
> > *fdma,
> > + int port_num)
> > +{
> > + struct ocelot_port_private *port_priv;
> > + struct ocelot *ocelot = fdma->ocelot;
> > + struct ocelot_port *port;
> > +
> > + if (port_num >= ocelot->num_phys_ports)
> > + return NULL;
> > +
> > + port = ocelot->ports[port_num];
> > +
> > + if (!port)
> > + return NULL;
> > +
> > + port_priv = container_of(port, struct ocelot_port_private,
> > port); +
> > + return port_priv->dev;
> > +}
> > +
> > +static bool ocelot_fdma_rx_process_skb(struct ocelot_fdma *fdma,
> > + struct ocelot_fdma_dcb *dcb,
> > + int budget)
> > +{
> > + struct sk_buff *skb = dcb->skb;
> > + struct net_device *ndev;
> > + u64 src_port;
> > + void *xfh;
> > +
> > + dma_unmap_single(fdma->dev, dcb->mapping, dcb->mapped_size,
> > + DMA_FROM_DEVICE);
> > +
> > + xfh = skb->data;
> > + ocelot_xfh_get_src_port(xfh, &src_port);
> > +
> > + skb_put(skb, MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat));
> > + skb_pull(skb, OCELOT_TAG_LEN);
> > +
> > + ndev = fdma_get_port_netdev(fdma, src_port);
> > + if (unlikely(!ndev)) {
> > + napi_consume_skb(dcb->skb, budget);
> > + return false;
> > + }
> > +
> > + skb->dev = ndev;
> > + skb->protocol = eth_type_trans(skb, skb->dev);
> > + skb->dev->stats.rx_bytes += skb->len;
> > + skb->dev->stats.rx_packets++;
> > +
> > + netif_receive_skb(skb);
> > +
> > + return true;
> > +}
> > +
> > +static void ocelot_fdma_rx_refill(struct ocelot_fdma *fdma)
> > +{
> > + struct ocelot_fdma_dcb *dcb, *last_dcb;
> > +
> > + WARN_ON(list_empty(&fdma->rx_sw));
> > +
> > + dcb = list_first_entry(&fdma->rx_sw, struct
> > ocelot_fdma_dcb, node);
> > + /* Splice old hardware DCB list + new one */
> > + if (!list_empty(&fdma->rx_hw)) {
> > + last_dcb = list_last_entry(&fdma->rx_hw, struct
> > ocelot_fdma_dcb,
> > + node);
> > + last_dcb->hw.llp = dcb->phys;
> > + }
> > +
> > + /* Move software list to hardware list */
> > + list_splice_tail_init(&fdma->rx_sw, &fdma->rx_hw);
> > +
> > + /* Finally reactivate the channel */
> > + fdma_activate_chan(fdma, dcb, MSCC_FDMA_XTR_CHAN);
> > +}
> > +
> > +static void ocelot_fdma_list_add_dcb(struct list_head *list,
> > + struct ocelot_fdma_dcb *dcb)
> > +{
> > + struct ocelot_fdma_dcb *last_dcb;
> > +
> > + if (!list_empty(list)) {
> > + last_dcb = list_last_entry(list, struct
> > ocelot_fdma_dcb, node);
> > + last_dcb->hw.llp = dcb->phys;
> > + }
> > +
> > + list_add_tail(&dcb->node, list);
> > +}
> > +
> > +static bool ocelot_fdma_rx_add_dcb_sw(struct ocelot_fdma *fdma,
> > + struct ocelot_fdma_dcb *dcb)
> > +{
> > + struct sk_buff *new_skb;
> > +
> > + /* Add DCB to end of list with new SKB */
> > + new_skb = napi_alloc_skb(&fdma->napi, fdma->rx_buf_size);
> > + if (unlikely(!new_skb)) {
> > + pr_err("skb_alloc failed\n");
> > + return false;
> > + }
> > +
> > + ocelot_fdma_dcb_set_rx_skb(fdma, dcb, new_skb,
> > fdma->rx_buf_size);
> > + ocelot_fdma_list_add_dcb(&fdma->rx_sw, dcb);
> > +
> > + return true;
> > +}
> > +
> > +static bool ocelot_fdma_rx_get(struct ocelot_fdma *fdma, int
> > budget) +{
> > + struct ocelot_fdma_dcb *dcb;
> > + bool valid = true;
> > + u32 stat;
> > +
> > + dcb = list_first_entry_or_null(&fdma->rx_hw, struct
> > ocelot_fdma_dcb,
> > + node);
> > + if (!dcb || MSCC_FDMA_DCB_STAT_BLOCKL(dcb->hw.stat) == 0)
> > + return false;
> > +
> > + list_del(&dcb->node);
> > +
> > + stat = dcb->hw.stat;
> > + if (stat & MSCC_FDMA_DCB_STAT_ABORT || stat &
> > MSCC_FDMA_DCB_STAT_PD)
> > + valid = false;
> > +
> > + if (!(stat & MSCC_FDMA_DCB_STAT_SOF) ||
> > + !(stat & MSCC_FDMA_DCB_STAT_EOF))
> > + valid = false;
> > +
> > + if (likely(valid)) {
> > + if (!ocelot_fdma_rx_process_skb(fdma, dcb, budget))
> > + pr_err("Process skb failed, stat %x\n",
> > stat);
> > + } else {
> > + napi_consume_skb(dcb->skb, budget);
> > + }
> > +
> > + return ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> > +}
> > +
> > +static void ocelot_fdma_rx_check_stopped(struct ocelot_fdma *fdma)
> > +{
> > + u32 llp = fdma_readl(fdma,
> > MSCC_FDMA_DCB_LLP(MSCC_FDMA_XTR_CHAN));
> > + /* LLP is non NULL, FDMA is still fetching packets */
> > + if (llp)
> > + return;
> > +
> > + fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
> > + ocelot_fdma_rx_refill(fdma);
> > +}
> > +
> > +static void ocelot_fdma_tx_free_dcb(struct ocelot_fdma *fdma,
> > + struct list_head *list)
> > +{
> > + struct ocelot_fdma_dcb *dcb;
> > +
> > + if (list_empty(list))
> > + return;
> > +
> > + /* Free all SKBs that have been used for TX */
> > + list_for_each_entry(dcb, list, node) {
> > + dma_unmap_single(fdma->dev, dcb->mapping,
> > dcb->mapped_size,
> > + DMA_TO_DEVICE);
> > + dev_consume_skb_any(dcb->skb);
> > + dcb->skb = NULL;
> > + }
> > +
> > + /* All DCBs can now be given to free list */
> > + spin_lock(&fdma->tx_free_lock);
> > + list_splice_tail_init(list, &fdma->tx_free_dcb);
> > + spin_unlock(&fdma->tx_free_lock);
> > +}
> > +
> > +static void ocelot_fdma_tx_cleanup(struct ocelot_fdma *fdma)
> > +{
> > + struct list_head tx_done = LIST_HEAD_INIT(tx_done);
> > + struct ocelot_fdma_dcb *dcb, *temp;
> > +
> > + spin_lock(&fdma->tx_enqueue_lock);
> > + if (list_empty(&fdma->tx_ongoing))
> > + goto out_unlock;
> > +
> > + list_for_each_entry_safe(dcb, temp, &fdma->tx_ongoing,
> > node) {
> > + if (!(dcb->hw.stat & MSCC_FDMA_DCB_STAT_PD))
> > + break;
> > +
> > + list_move_tail(&dcb->node, &tx_done);
> > + }
> > +
> > +out_unlock:
> > + spin_unlock(&fdma->tx_enqueue_lock);
> > +
> > + ocelot_fdma_tx_free_dcb(fdma, &tx_done);
> > +}
> > +
> > +static void ocelot_fdma_tx_restart(struct ocelot_fdma *fdma)
> > +{
> > + struct ocelot_fdma_dcb *dcb;
> > + u32 safe;
> > +
> > + spin_lock(&fdma->tx_enqueue_lock);
> > +
> > + if (!list_empty(&fdma->tx_ongoing) ||
> > list_empty(&fdma->tx_queued))
> > + goto out_unlock;
> > +
> > + /* Ongoing list is empty, channel should be in safe mode */
> > + do {
> > + safe = fdma_readl(fdma, MSCC_FDMA_CH_SAFE);
> > + } while (!(safe & BIT(MSCC_FDMA_INJ_CHAN)));
> > +
> > + /* Move queued DCB to ongoing and restart the DMA */
> > + list_splice_tail_init(&fdma->tx_queued, &fdma->tx_ongoing);
> > + /* List can't be empty, no need to check */
> > + dcb = list_first_entry(&fdma->tx_ongoing, struct
> > ocelot_fdma_dcb, node); +
> > + fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
> > +
> > +out_unlock:
> > + spin_unlock(&fdma->tx_enqueue_lock);
> > +}
> > +
> > +static int ocelot_fdma_napi_poll(struct napi_struct *napi, int
> > budget) +{
> > + struct ocelot_fdma *fdma = container_of(napi, struct
> > ocelot_fdma, napi);
> > + int work_done = 0;
> > +
> > + ocelot_fdma_tx_cleanup(fdma);
> > + ocelot_fdma_tx_restart(fdma);
> > +
> > + while (work_done < budget) {
> > + if (!ocelot_fdma_rx_get(fdma, budget))
> > + break;
> > +
> > + work_done++;
> > + }
> > +
> > + ocelot_fdma_rx_check_stopped(fdma);
> > +
> > + if (work_done < budget) {
> > + napi_complete(&fdma->napi);
>
> Documentation says you should consider calling
> napi_complete_done(&fdma->napi, work_done);

Acked.
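
As a note to self, a small userspace model of the poll loop with the completion call reporting the work count. The model_* names and struct poll_model are hypothetical stand-ins; in the driver the real call would be napi_complete_done(&fdma->napi, work_done), and I am assuming its return value is used to decide whether interrupts can be re-enabled.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the driver state. */
struct poll_model {
	int pending;       /* frames waiting in the RX queue */
	bool irq_enabled;  /* stand-in for the interrupt enable register */
	int last_reported; /* work count passed at completion, -1 if never */
};

/* Stand-in for napi_complete_done(napi, work_done). */
static bool model_complete_done(struct poll_model *m, int work_done)
{
	m->last_reported = work_done;
	return true;
}

static int model_poll(struct poll_model *m, int budget)
{
	int work_done = 0;

	while (work_done < budget && m->pending > 0) {
		m->pending--;
		work_done++;
	}

	/* Complete and re-enable interrupts only if the budget was not used up. */
	if (work_done < budget && model_complete_done(m, work_done))
		m->irq_enabled = true;

	return work_done;
}
```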

>
> > + fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
> > + BIT(MSCC_FDMA_INJ_CHAN) |
> > BIT(MSCC_FDMA_XTR_CHAN));
> > + }
> > +
> > + return work_done;
> > +}
> > +
> > +static irqreturn_t ocelot_fdma_interrupt(int irq, void *dev_id)
> > +{
> > + u32 ident, llp, frm, err, err_code;
> > + struct ocelot_fdma *fdma = dev_id;
> > +
> > + ident = fdma_readl(fdma, MSCC_FDMA_INTR_IDENT);
> > + frm = fdma_readl(fdma, MSCC_FDMA_INTR_FRM);
> > + llp = fdma_readl(fdma, MSCC_FDMA_INTR_LLP);
> > +
> > + fdma_writel(fdma, MSCC_FDMA_INTR_LLP, llp & ident);
> > + fdma_writel(fdma, MSCC_FDMA_INTR_FRM, frm & ident);
> > + if (frm | llp) {
>
> Bitwise OR? Strange.

I will use a logical OR, even though both work here.
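
To spell out why both forms behave the same in this condition, a tiny standalone sketch (the helper names are hypothetical): the bitwise OR of two u32 values is non-zero exactly when at least one of them is non-zero, which is also what the logical OR tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Condition as written in the patch: if (frm | llp) */
static bool any_pending_bitwise(uint32_t frm, uint32_t llp)
{
	return (frm | llp) != 0;
}

/* Condition as it will be written in the next version: if (frm || llp) */
static bool any_pending_logical(uint32_t frm, uint32_t llp)
{
	return frm || llp;
}
```

The logical form states the intent more directly, since no combined bitmask is actually needed.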

>
> > + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> > + napi_schedule(&fdma->napi);
> > + }
> > +
> > + err = fdma_readl(fdma, MSCC_FDMA_EVT_ERR);
> > + if (unlikely(err)) {
> > + err_code = fdma_readl(fdma,
> > MSCC_FDMA_EVT_ERR_CODE);
> > + dev_err_ratelimited(fdma->dev,
> > + "Error ! chans mask: %#x,
> > code: %#x\n",
> > + err, err_code);
> > +
> > + fdma_writel(fdma, MSCC_FDMA_EVT_ERR, err);
> > + fdma_writel(fdma, MSCC_FDMA_EVT_ERR_CODE,
> > err_code);
> > + }
> > +
> > + return IRQ_HANDLED;
> > +}
> > +
> > +static struct ocelot_fdma_dcb *fdma_tx_get_dcb(struct ocelot_fdma
> > *fdma)
>
> Please name these functions consistently and make them start with
> ocelot_.

Acked

>
> > +{
> > + struct ocelot_fdma_dcb *dcb = NULL;
> > +
> > + spin_lock_bh(&fdma->tx_free_lock);
> > + dcb = list_first_entry_or_null(&fdma->tx_free_dcb,
> > + struct ocelot_fdma_dcb,
> > node);
> > + if (dcb)
> > + list_del(&dcb->node);
> > +
> > + spin_unlock_bh(&fdma->tx_free_lock);
> > +
> > + return dcb;
> > +}
> > +
> > +int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port,
> > u32 rew_op,
> > + struct sk_buff *skb, struct
> > net_device *dev) +{
> > + struct ocelot_port *port_s = fdma->ocelot->ports[port];
> > + struct ocelot_fdma_dcb *dcb;
> > + struct sk_buff *new_skb;
> > + void *ifh;
> > +
> > + if (unlikely(skb_shinfo(skb)->nr_frags != 0)) {
> > + netdev_err(dev, "Unsupported fragmented packet");
> > + dev_kfree_skb_any(skb);
>
> skb_linearize()

I can use multiple TX DCBs to send that packet but I wanted to do so in
another series.

>
> Also please don't print stuff from the hot path without
> net_ratelimited()

Ok.

>
> > + return NETDEV_TX_OK;
> > + }
> > +
> > + if (skb_headroom(skb) < OCELOT_TAG_LEN ||
> > + skb_tailroom(skb) < ETH_FCS_LEN) {
>
> Don't you also need to copy the skb (to ensure it's writable) if it's
> cloned (like would be the case for packets with two-step PTP TX
> timestamping)? I don't see any calls to skb_unshare().
>
> You can test with:
>
> ptp4l -i swp0 -2 -P -m --tx_timestamp_timeout 20
>
> on two back-to-back boards.
>
> > + new_skb = skb_copy_expand(skb, OCELOT_TAG_LEN,
> > ETH_FCS_LEN,
> > + GFP_ATOMIC);
> > + dev_consume_skb_any(skb);
>
> I think you can use pskb_expand_head() and avoid creating a new_skb.
> Look at dsa_realloc_skb().

Ok thanks.

>
> > + if (!new_skb)
> > + return NETDEV_TX_OK;
> > +
> > + skb = new_skb;
> > + }
> > +
> > + ifh = skb_push(skb, OCELOT_TAG_LEN);
> > + skb_put(skb, ETH_FCS_LEN);
> > + ocelot_ifh_port_set(ifh, port_s, rew_op,
> > skb_vlan_tag_get(skb)); +
> > + dcb = fdma_tx_get_dcb(fdma);
> > + if (unlikely(!dcb))
> > + return NETDEV_TX_BUSY;
> > +
> > + if (!ocelot_fdma_dcb_set_tx_skb(fdma, dcb, skb)) {
> > + dev_kfree_skb_any(skb);
> > + spin_lock_bh(&fdma->tx_free_lock);
> > + list_add_tail(&dcb->node, &fdma->tx_free_dcb);
> > + spin_unlock_bh(&fdma->tx_free_lock);
> > + return NETDEV_TX_OK;
> > + }
> > +
> > + spin_lock_bh(&fdma->tx_enqueue_lock);
> > +
> > + if (list_empty(&fdma->tx_ongoing)) {
> > + ocelot_fdma_list_add_dcb(&fdma->tx_ongoing, dcb);
> > + fdma_activate_chan(fdma, dcb, MSCC_FDMA_INJ_CHAN);
> > + } else {
> > + ocelot_fdma_list_add_dcb(&fdma->tx_queued, dcb);
> > + }
> > +
> > + spin_unlock_bh(&fdma->tx_enqueue_lock);
>
> I think you don't need _bh locking from ndo_start_xmit() context.
> __dev_queue_xmit() calls rcu_read_lock_bh(). On the other hand, I
> think you might need to use spin_lock_bh from
> ocelot_fdma_napi_poll(), since that runs from NET_RX softirq, and
> ndo_start_xmit() can run from loads of other contexts.

Ok, I'll rework that.
>
> > + return NETDEV_TX_OK;
> > +}
> > +
> > +static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
> > + struct list_head *list,
> > + enum dma_data_direction dir)
> > +{
> > + struct ocelot_fdma_dcb *dcb;
> > +
> > + if (list_empty(list))
> > + return;
>
> I'm not sure this is really needed.
>
> > +
> > + list_for_each_entry(dcb, list, node) {
> > + if (dcb->skb) {
> > + dma_unmap_single(fdma->dev, dcb->mapping,
> > + dcb->mapped_size, dir);
> > + dev_kfree_skb_any(dcb->skb);
>
> dcb->skb = NULL?
>
> > + }
> > + }
> > +}
> > +
> > +static int fdma_init_tx(struct ocelot_fdma *fdma)
> > +{
> > + int i;
> > + struct ocelot_fdma_dcb *dcb;
> > +
> > + for (i = 0; i < FDMA_MAX_SKB; i++) {
> > + dcb = fdma_dcb_alloc(fdma);
> > + if (!dcb)
> > + return -ENOMEM;
> > +
> > + list_add_tail(&dcb->node, &fdma->tx_free_dcb);
> > + }
> > +
> > + return 0;
> > +}
> > +
> > +static int fdma_init_rx(struct ocelot_fdma *fdma)
> > +{
> > + struct ocelot_port_private *port_priv;
> > + struct ocelot *ocelot = fdma->ocelot;
> > + struct ocelot_fdma_dcb *dcb;
> > + struct ocelot_port *port;
> > + struct net_device *ndev;
> > + int max_mtu = 0;
> > + int i;
> > + u8 port_num;
>
> Please declare variables in the order of descending line length (aka
> "reverse Christmas tree"). Here, and in fdma_init_rx(), and in other
> places.


Acked.

>
> > +
> > + for (port_num = 0; port_num < ocelot->num_phys_ports;
> > port_num++) {
>
> The naming convention is "int port", "struct ocelot_port
> *ocelot_port". Please keep it. Thanks.

Ok.

>
> > + port = ocelot->ports[port_num];
> > + if (!port)
> > + continue;
> > +
> > + port_priv = container_of(port, struct
> > ocelot_port_private,
> > + port);
> > + ndev = port_priv->dev;
> > +
> > + ndev->needed_headroom = OCELOT_TAG_LEN;
> > + ndev->needed_tailroom = ETH_FCS_LEN;
> > +
> > + if (READ_ONCE(ndev->mtu) > max_mtu)
> > + max_mtu = READ_ONCE(ndev->mtu);
>
> This seems silly, you use READ_ONCE twice... what's the point?
> Also, what is this racing with?
>
> > + }
> > +
> > + if (!ndev)
> > + return -ENODEV;
> > +
> > + fdma->rx_buf_size = fdma_rx_compute_buffer_size(max_mtu);
> > + netif_napi_add(ndev, &fdma->napi, ocelot_fdma_napi_poll,
> > + FDMA_WEIGHT);
> > +
> > + for (i = 0; i < FDMA_MAX_SKB; i++) {
> > + dcb = fdma_dcb_alloc(fdma);
> > + if (!dcb)
> > + return -ENOMEM;
> > +
> > + ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> > + }
> > +
> > + napi_enable(&fdma->napi);
> > +
> > + return 0;
> > +}
> > +
> > +struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
> > + struct ocelot *ocelot)
> > +{
> > + struct ocelot_fdma *fdma;
> > + int ret;
> > +
> > + fdma = devm_kzalloc(&pdev->dev, sizeof(*fdma), GFP_KERNEL);
> > + if (!fdma)
> > + return ERR_PTR(-ENOMEM);
> > +
> > + fdma->ocelot = ocelot;
> > + fdma->base = devm_platform_ioremap_resource_byname(pdev,
> > "fdma");
>
> Don't you want to look up the resource by name before allocating
> stuff? Maybe the allocation won't be needed, then you'll have to live
> with it, since you use devres. Although my personal recommendation
> would be to just not use devres, it makes you think more.

Ok, I will see if it's simpler without all devm_ variants.

>
> > + if (IS_ERR_OR_NULL(fdma->base))
> > + return fdma->base;
>
> Just return NULL and simplify the caller, you aren't using the ERR
> value anyway (or do something with the error value at the call site).

Acked, error path will be reworked.
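
A rough userspace sketch of what the reworked error path could look like, assuming the caller only needs to know whether FDMA is usable. struct fdma_model and both functions are hypothetical; the resource checks come before the allocation, as suggested earlier in the review.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct ocelot_fdma. */
struct fdma_model {
	int irq;
};

/* Returns NULL on any failure so the caller has a single check to make. */
static struct fdma_model *fdma_model_init(bool have_regs, int irq)
{
	struct fdma_model *fdma;

	/* Look up the resources first, so nothing is allocated needlessly. */
	if (!have_regs || irq < 0)
		return NULL;

	fdma = calloc(1, sizeof(*fdma));
	if (!fdma)
		return NULL;

	fdma->irq = irq;
	return fdma;
}

/* Caller side: a NULL handle means "keep the register-based path". */
static bool use_fdma(const struct fdma_model *fdma)
{
	return fdma != NULL;
}
```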

>
> > +
> > + fdma->dev = &pdev->dev;
> > + fdma->dev->coherent_dma_mask = DMA_BIT_MASK(32);
> > +
> > + spin_lock_init(&fdma->tx_enqueue_lock);
> > + spin_lock_init(&fdma->tx_free_lock);
> > +
> > + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> > +
> > + fdma->irq = platform_get_irq_byname(pdev, "fdma");
> > + ret = devm_request_irq(&pdev->dev, fdma->irq,
> > ocelot_fdma_interrupt, 0,
> > + dev_name(&pdev->dev), fdma);
> > + if (ret)
> > + return ERR_PTR(ret);
> > +
> > + /* Create a pool of consistent memory blocks for hardware
> > descriptors */
> > + fdma->dcb_pool = dmam_pool_create("ocelot_fdma",
> > &pdev->dev,
> > + sizeof(struct
> > ocelot_fdma_dcb),
> > + __alignof__(struct
> > ocelot_fdma_dcb),
> > + 0);
> > + if (!fdma->dcb_pool) {
> > + dev_err(&pdev->dev, "unable to allocate DMA
> > descriptor pool\n");
> > + return ERR_PTR(-ENOMEM);
> > + }
> > +
> > + INIT_LIST_HEAD(&fdma->tx_ongoing);
> > + INIT_LIST_HEAD(&fdma->tx_free_dcb);
> > + INIT_LIST_HEAD(&fdma->tx_queued);
> > + INIT_LIST_HEAD(&fdma->rx_sw);
> > + INIT_LIST_HEAD(&fdma->rx_hw);
> > +
> > + return fdma;
> > +}
> > +
> > +int ocelot_fdma_start(struct ocelot_fdma *fdma)
> > +{
> > + struct ocelot *ocelot = fdma->ocelot;
> > + int ret;
> > +
> > + ret = fdma_init_tx(fdma);
> > + if (ret)
> > + return ret;
> > +
> > + ret = fdma_init_rx(fdma);
> > + if (ret)
>
> Don't you want to undo the fdma_dcb_alloc() from fdma_init_tx() if
> this fails?

Indeed. Since I will switch to your proposal of using a single large
area of coherent descriptors, this will be removed anyway.
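
For completeness, the kind of unwinding that would be needed if the per-descriptor allocations were kept, shown as a userspace model. N_DCB, struct dma_model, and the rx_fail_at parameter (used only to force a failure) are hypothetical, and malloc/free stand in for the DMA pool calls.

```c
#include <stdlib.h>

#define N_DCB 4 /* hypothetical descriptor count */

struct dma_model {
	void *tx[N_DCB];
	void *rx[N_DCB];
};

static void free_slots(void **slots, int count)
{
	while (count--) {
		free(slots[count]);
		slots[count] = NULL;
	}
}

/* Allocate every slot, or none: a partial allocation is undone on error. */
static int init_slots(void **slots, int fail_at)
{
	int i;

	for (i = 0; i < N_DCB; i++) {
		slots[i] = (i == fail_at) ? NULL : malloc(16);
		if (!slots[i])
			goto err;
	}

	return 0;

err:
	free_slots(slots, i);
	return -1;
}

static int model_start(struct dma_model *m, int rx_fail_at)
{
	if (init_slots(m->tx, -1))
		return -1;

	if (init_slots(m->rx, rx_fail_at))
		goto err_free_tx;

	return 0;

err_free_tx:
	/* RX setup failed: give back what the TX setup allocated. */
	free_slots(m->tx, N_DCB);
	return -1;
}
```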

>
> > + return ret;
> > +
> > + /* Reconfigure for extraction and injection using DMA */
> > + ocelot_write_rix(ocelot, QS_INJ_GRP_CFG_MODE(2),
> > QS_INJ_GRP_CFG, 0);
> > + ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(0),
> > QS_INJ_CTRL, 0); +
> > + ocelot_write_rix(ocelot, QS_XTR_GRP_CFG_MODE(2),
> > QS_XTR_GRP_CFG, 0); +
> > + fdma_writel(fdma, MSCC_FDMA_INTR_LLP, 0xffffffff);
> > + fdma_writel(fdma, MSCC_FDMA_INTR_FRM, 0xffffffff);
> > +
> > + fdma_writel(fdma, MSCC_FDMA_INTR_LLP_ENA,
> > + BIT(MSCC_FDMA_INJ_CHAN) |
> > BIT(MSCC_FDMA_XTR_CHAN));
> > + fdma_writel(fdma, MSCC_FDMA_INTR_FRM_ENA,
> > BIT(MSCC_FDMA_XTR_CHAN));
> > + fdma_writel(fdma, MSCC_FDMA_INTR_ENA,
> > + BIT(MSCC_FDMA_INJ_CHAN) |
> > BIT(MSCC_FDMA_XTR_CHAN)); +
> > + ocelot_fdma_rx_refill(fdma);
> > +
> > + return 0;
> > +}
> > +
> > +int ocelot_fdma_stop(struct ocelot_fdma *fdma)
> > +{
> > + fdma_writel(fdma, MSCC_FDMA_INTR_ENA, 0);
> > +
> > + fdma_stop_channel(fdma, MSCC_FDMA_XTR_CHAN);
> > + fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
> > +
> > + /* Free potentially pending SKBs in DCB lists */
> > + fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
> > + fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
> > + fdma_free_skbs_list(fdma, &fdma->tx_ongoing,
> > DMA_TO_DEVICE);
> > + fdma_free_skbs_list(fdma, &fdma->tx_queued, DMA_TO_DEVICE);
> > +
> > + netif_napi_del(&fdma->napi);
> > +
> > + return 0;
> > +}
> > diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
> > new file mode 100644
> > index 000000000000..6c5c5872abf5
> > --- /dev/null
> > +++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
> > @@ -0,0 +1,59 @@
> > +/* SPDX-License-Identifier: (GPL-2.0 OR MIT) */
> > +/*
> > + * Microsemi SoCs FDMA driver
> > + *
> > + * Copyright (c) 2021 Microchip
> > + */
> > +#ifndef _MSCC_OCELOT_FDMA_H_
> > +#define _MSCC_OCELOT_FDMA_H_
> > +
> > +#include "ocelot.h"
> > +
> > +/**
> > + * struct ocelot_fdma - FMDA struct
> > + *
> > + * @ocelot: Pointer to ocelot struct
> > + * @base: base address of FDMA registers
> > + * @dcb_pool: Pool used for DCB allocation
> > + * @irq: FDMA interrupt
> > + * @dev: Ocelot device
> > + * @napi: napi handle
> > + * @rx_buf_size: Size of RX buffer
> > + * @tx_ongoing: List of DCB handed out to the FDMA
> > + * @tx_queued: pending list of DCBs to be given to the hardware
> > + * @tx_enqueue_lock: Lock used for tx_queued and tx_ongoing
> > + * @tx_free_dcb: List of DCB available for TX
> > + * @tx_free_lock: Lock used to access tx_free_dcb list
> > + * @rx_hw: RX DCBs currently owned by the hardware and not
> > completed
> > + * @rx_sw: RX DCBs completed
> > + */
> > +struct ocelot_fdma {
> > + struct ocelot *ocelot;
> > + void __iomem *base;
> > + struct dma_pool *dcb_pool;
> > + int irq;
> > + struct device *dev;
> > + struct napi_struct napi;
> > + size_t rx_buf_size;
> > +
> > + struct list_head tx_ongoing;
> > + struct list_head tx_queued;
> > + /* Lock for tx_queued and tx_ongoing lists */
> > + spinlock_t tx_enqueue_lock;
> > +
> > + struct list_head tx_free_dcb;
> > + /* Lock for tx_free_dcb list */
> > + spinlock_t tx_free_lock;
> > +
> > + struct list_head rx_hw;
> > + struct list_head rx_sw;
> > +};
> > +
> > +struct ocelot_fdma *ocelot_fdma_init(struct platform_device *pdev,
> > + struct ocelot *ocelot);
> > +int ocelot_fdma_start(struct ocelot_fdma *fdma);
> > +int ocelot_fdma_stop(struct ocelot_fdma *fdma);
> > +int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> > + struct sk_buff *skb, struct net_device *dev);
> > +
> > +#endif
> > diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> > index 5916492fd6d0..3971b810c5b4 100644
> > --- a/drivers/net/ethernet/mscc/ocelot_net.c
> > +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> > @@ -15,6 +15,7 @@
> > #include <net/pkt_cls.h>
> > #include "ocelot.h"
> > #include "ocelot_vcap.h"
> > +#include "ocelot_fdma.h"
> >
> > #define OCELOT_MAC_QUIRKS OCELOT_QUIRK_QSGMII_PORTS_MUST_BE_UP
> > @@ -457,7 +458,7 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> > int port = priv->chip_port;
> > u32 rew_op = 0;
> >
> > - if (!ocelot_can_inject(ocelot, 0))
> > + if (!ocelot->fdma && !ocelot_can_inject(ocelot, 0))
> > return NETDEV_TX_BUSY;
> >
> > /* Check if timestamping is needed */
> > @@ -475,9 +476,13 @@ static netdev_tx_t ocelot_port_xmit(struct sk_buff *skb, struct net_device *dev)
> > rew_op = ocelot_ptp_rew_op(skb);
> > }
> >
> > - ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
> > + if (ocelot->fdma) {
> > + ocelot_fdma_inject_frame(ocelot->fdma, port, rew_op, skb, dev);
> > + } else {
> > + ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
> > - kfree_skb(skb);
> > + kfree_skb(skb);
>
> I know this is unrelated, but.. consume_skb maybe?

Acked.
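
For context on the suggestion above: in the kernel, both kfree_skb() and consume_skb() release the buffer, but kfree_skb() reports the packet as dropped to tracing and drop monitoring, whereas consume_skb() marks a normal end of life. The following is only a userspace sketch of that distinction; struct fake_skb, struct free_stats and the model_* helpers are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff. */
struct fake_skb {
	size_t len;
};

/* Counters standing in for what drop monitoring would observe. */
struct free_stats {
	unsigned int consumed;
	unsigned int dropped;
};

static struct fake_skb *model_alloc_skb(size_t len)
{
	struct fake_skb *skb = malloc(sizeof(*skb));

	assert(skb != NULL);
	skb->len = len;
	return skb;
}

/* Models consume_skb(): the frame was sent, nothing was lost. */
static void model_consume_skb(struct free_stats *st, struct fake_skb *skb)
{
	st->consumed++;
	free(skb);
}

/* Models kfree_skb(): the frame is being discarded. */
static void model_kfree_skb(struct free_stats *st, struct fake_skb *skb)
{
	st->dropped++;
	free(skb);
}

/*
 * Models the register-based branch of the xmit path: once the frame has
 * been copied into the injection registers the skb is no longer needed,
 * so it is consumed rather than reported as a drop.
 */
static void model_register_xmit(struct free_stats *st, struct fake_skb *skb,
				bool injected)
{
	if (injected)
		model_consume_skb(st, skb);
	else
		model_kfree_skb(st, skb);
}
```

With this model, a successfully injected frame only bumps the consumed counter, so a drop monitor would not see regular transmissions as losses.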

>
> > + }
> >
> > return NETDEV_TX_OK;
> > }
> > diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> > index 38103b0255b0..985d584db3a1 100644
> > --- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> > +++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c
> > @@ -18,6 +18,7 @@
> >
> > #include <soc/mscc/ocelot_vcap.h>
> > #include <soc/mscc/ocelot_hsio.h>
> > +#include "ocelot_fdma.h"
> > #include "ocelot.h"
> >
> > static const u32 ocelot_ana_regmap[] = {
> > @@ -1080,6 +1081,10 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> > ocelot->targets[io_target[i].id] = target;
> > }
> >
> > + ocelot->fdma = ocelot_fdma_init(pdev, ocelot);
> > + if (IS_ERR(ocelot->fdma))
> > + ocelot->fdma = NULL;
> > +
> > hsio = syscon_regmap_lookup_by_compatible("mscc,ocelot-hsio");
> > if (IS_ERR(hsio)) {
> > dev_err(&pdev->dev, "missing hsio syscon\n");
> > @@ -1139,6 +1144,12 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> > if (err)
> > goto out_ocelot_devlink_unregister;
> >
> > + if (ocelot->fdma) {
> > + err = ocelot_fdma_start(ocelot->fdma);
> > + if (err)
> > + goto out_ocelot_devlink_unregister;
> > + }
> > +
> > err = ocelot_devlink_sb_register(ocelot);
> > if (err)
> > goto out_ocelot_release_ports;
> > @@ -1166,6 +1177,8 @@ static int mscc_ocelot_probe(struct platform_device *pdev)
> > out_ocelot_release_ports:
> > mscc_ocelot_release_ports(ocelot);
> > mscc_ocelot_teardown_devlink_ports(ocelot);
> > + if (ocelot->fdma)
> > + ocelot_fdma_stop(ocelot->fdma);
> > out_ocelot_devlink_unregister:
> > ocelot_deinit(ocelot);
> > out_put_ports:
> > @@ -1179,6 +1192,8 @@ static int mscc_ocelot_remove(struct platform_device *pdev)
> > {
> > struct ocelot *ocelot = platform_get_drvdata(pdev);
> >
> > + if (ocelot->fdma)
> > + ocelot_fdma_stop(ocelot->fdma);
>
> Are you sure you want to call netif_napi_del() while the net devices
> are still registered? :-/

Indeed, the NAPI removal must be done later; I will split fdma_stop().
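
One possible shape for that split, as a userspace sketch only: a first step quiesces the hardware and a second step removes NAPI once the net devices are gone. struct model_switch and the model_* functions are hypothetical names for illustration, not the driver's actual API.

```c
#include <stdbool.h>

struct model_switch {
	bool channels_running;   /* FDMA channels and interrupt active */
	bool netdevs_registered; /* ports still visible to the stack */
	bool napi_registered;    /* NAPI context still attached */
};

/* First half: quiesce the hardware; safe while net devices exist. */
static void model_fdma_stop(struct model_switch *sw)
{
	sw->channels_running = false;
}

/* Second half: remove NAPI; refuses to run before the net devices are gone. */
static int model_fdma_deinit(struct model_switch *sw)
{
	if (sw->netdevs_registered || sw->channels_running)
		return -1;

	sw->napi_registered = false;
	return 0;
}

/* Remove path in the order discussed above. */
static int model_remove(struct model_switch *sw)
{
	model_fdma_stop(sw);
	sw->netdevs_registered = false; /* stands in for releasing the ports */
	return model_fdma_deinit(sw);
}
```

The point of the model is the ordering constraint: the deinit step fails if it is attempted while ports are still registered.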

>
> > devlink_unregister(ocelot->devlink);
> > ocelot_deinit_timestamp(ocelot);
> > ocelot_devlink_sb_unregister(ocelot);
> > diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> > index b3381c90ff3e..33e1559bdea3 100644
> > --- a/include/soc/mscc/ocelot.h
> > +++ b/include/soc/mscc/ocelot.h
> > @@ -695,6 +695,8 @@ struct ocelot {
> > /* Protects the PTP clock */
> > spinlock_t ptp_clock_lock;
> > struct ptp_pin_desc ptp_pins[OCELOT_PTP_PINS_NUM];
> > +
> > + struct ocelot_fdma *fdma;
> > };
> >
> > struct ocelot_policer {
> > --
> > 2.33.0
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-03 14:33:08

by Clément Léger

Subject: Re: [PATCH v2 6/6] net: ocelot: add jumbo frame support for FDMA

On Wed, 3 Nov 2021 12:43:20 +0000,
Vladimir Oltean <[email protected]> wrote:

> On Wed, Nov 03, 2021 at 10:19:43AM +0100, Clément Léger wrote:
> > When using the FDMA, using jumbo frames can lead to a large performance
> > improvement. When changing the MTU, the RX buffer size must be
> > increased to be large enough to receive jumbo frames. Since the FDMA is
> > shared amongst all interfaces, all the ports must be down before
> > changing the MTU. Buffers are sized to accept the maximum MTU supported
> > by each port.
> >
> > Signed-off-by: Clément Léger <[email protected]>
> > ---
>
> Instead of draining buffers and refilling with a different size, which
> impacts the user experience, can you not just use scatter/gather RX
> processing for frames larger than the fixed buffer size, like a normal
> driver would?

I could do that, yes, but I am not sure it would improve FDMA
performance much in that case. I will check.
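
For reference, with scatter/gather RX a frame larger than the fixed buffer size is spread over a chain of buffers. A minimal sketch of the arithmetic involved follows; the helper names are hypothetical and the 2048-byte buffer size used in the note below is an assumption for illustration, not a value taken from the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Number of fixed-size RX buffers needed to hold one frame. */
static size_t rx_bufs_for_frame(size_t frame_len, size_t buf_size)
{
	assert(buf_size > 0);
	return (frame_len + buf_size - 1) / buf_size;
}

/* Bytes used in the last buffer of the chain. */
static size_t rx_last_buf_len(size_t frame_len, size_t buf_size)
{
	size_t rem;

	assert(buf_size > 0);
	rem = frame_len % buf_size;
	/* An exact multiple fills the last buffer completely. */
	return (rem || !frame_len) ? rem : buf_size;
}
```

Under the assumed 2048-byte buffers, a 1500-byte frame fits in one buffer while a 9000-byte jumbo frame spans five, the last one holding 808 bytes, so the buffer size would not have to change with the MTU.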

>
> > drivers/net/ethernet/mscc/ocelot_fdma.c | 61 +++++++++++++++++++++++++
> > drivers/net/ethernet/mscc/ocelot_fdma.h | 1 +
> > drivers/net/ethernet/mscc/ocelot_net.c | 7 +++
> > 3 files changed, 69 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c
> > index d8cdf022bbee..bee1a310caa6 100644
> > --- a/drivers/net/ethernet/mscc/ocelot_fdma.c
> > +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c
> > @@ -530,6 +530,67 @@ static void fdma_free_skbs_list(struct ocelot_fdma *fdma,
> > }
> > }
> >
> > +int ocelot_fdma_change_mtu(struct net_device *dev, int new_mtu)
> > +{
> > + struct ocelot_port_private *priv = netdev_priv(dev);
> > + struct ocelot_port *port = &priv->port;
> > + struct ocelot *ocelot = port->ocelot;
> > + struct ocelot_fdma *fdma = ocelot->fdma;
> > + struct ocelot_fdma_dcb *dcb, *dcb_temp;
> > + struct list_head tmp = LIST_HEAD_INIT(tmp);
> > + size_t old_rx_buf_size = fdma->rx_buf_size;
> > + bool all_ports_down = true;
> > + u8 port_num;
> > +
> > + /* The FDMA RX list is shared amongst all the port, get the max MTU from
> > + * all of them
> > + */
> > + for (port_num = 0; port_num < ocelot->num_phys_ports; port_num++) {
> > + port = ocelot->ports[port_num];
> > + if (!port)
> > + continue;
> > +
> > + priv = container_of(port, struct ocelot_port_private, port);
> > +
> > + if (READ_ONCE(priv->dev->mtu) > new_mtu)
> > + new_mtu = READ_ONCE(priv->dev->mtu);
> > +
> > + /* All ports must be down to change the RX buffer length */
> > + if (netif_running(priv->dev))
> > + all_ports_down = false;
> > + }
> > +
> > + fdma->rx_buf_size = fdma_rx_compute_buffer_size(new_mtu);
> > + if (fdma->rx_buf_size == old_rx_buf_size)
> > + return 0;
> > +
> > + if (!all_ports_down)
> > + return -EBUSY;
> > +
> > + priv = netdev_priv(dev);
> > +
> > + fdma_stop_channel(fdma, MSCC_FDMA_INJ_CHAN);
> > +
> > + /* Discard all pending RX software and hardware descriptor */
> > + fdma_free_skbs_list(fdma, &fdma->rx_hw, DMA_FROM_DEVICE);
> > + fdma_free_skbs_list(fdma, &fdma->rx_sw, DMA_FROM_DEVICE);
> > +
> > + /* Move all DCBs to a temporary list that will be injected in sw list */
> > + if (!list_empty(&fdma->rx_hw))
> > + list_splice_tail_init(&fdma->rx_hw, &tmp);
> > + if (!list_empty(&fdma->rx_sw))
> > + list_splice_tail_init(&fdma->rx_sw, &tmp);
> > +
> > + list_for_each_entry_safe(dcb, dcb_temp, &tmp, node) {
> > + list_del(&dcb->node);
> > + ocelot_fdma_rx_add_dcb_sw(fdma, dcb);
> > + }
> > +
> > + ocelot_fdma_rx_refill(fdma);
> > +
> > + return 0;
> > +}
> > +
> > static int fdma_init_tx(struct ocelot_fdma *fdma)
> > {
> > int i;
> > diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.h b/drivers/net/ethernet/mscc/ocelot_fdma.h
> > index 6c5c5872abf5..74514a0b291a 100644
> > --- a/drivers/net/ethernet/mscc/ocelot_fdma.h
> > +++ b/drivers/net/ethernet/mscc/ocelot_fdma.h
> > @@ -55,5 +55,6 @@ int ocelot_fdma_start(struct ocelot_fdma *fdma);
> > int ocelot_fdma_stop(struct ocelot_fdma *fdma);
> > int ocelot_fdma_inject_frame(struct ocelot_fdma *fdma, int port, u32 rew_op,
> > struct sk_buff *skb, struct net_device *dev);
> > +int ocelot_fdma_change_mtu(struct net_device *dev, int new_mtu);
> >
> > #endif
> > diff --git a/drivers/net/ethernet/mscc/ocelot_net.c b/drivers/net/ethernet/mscc/ocelot_net.c
> > index 3971b810c5b4..d5e88d7b15c7 100644
> > --- a/drivers/net/ethernet/mscc/ocelot_net.c
> > +++ b/drivers/net/ethernet/mscc/ocelot_net.c
> > @@ -492,6 +492,13 @@ static int ocelot_change_mtu(struct net_device *dev, int new_mtu)
> > struct ocelot_port_private *priv = netdev_priv(dev);
> > struct ocelot_port *ocelot_port = &priv->port;
> > struct ocelot *ocelot = ocelot_port->ocelot;
> > + int ret;
> > +
> > + if (ocelot->fdma) {
> > + ret = ocelot_fdma_change_mtu(dev, new_mtu);
> > + if (ret)
> > + return ret;
> > + }
> >
> > ocelot_port_set_maxlen(ocelot, priv->chip_port, new_mtu);
> > WRITE_ONCE(dev->mtu, new_mtu);
> > --
> > 2.33.0
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-08 15:44:32

by Clément Léger

Subject: Re: [PATCH v2 2/6] dt-bindings: net: convert mscc,vsc7514-switch bindings to yaml

On Wed, 3 Nov 2021 10:45:12 +0000,
Vladimir Oltean <[email protected]> wrote:

> On Wed, Nov 03, 2021 at 10:19:39AM +0100, Clément Léger wrote:
> > Convert existing bindings to yaml format. At the same time, remove
> > non-existing properties ("inj" interrupt) and add fdma.
> >
> > Signed-off-by: Clément Léger <[email protected]>
> > ---
> > .../bindings/net/mscc,vsc7514-switch.yaml | 184 ++++++++++++++++++
> > .../devicetree/bindings/net/mscc-ocelot.txt | 83 --------
> > 2 files changed, 184 insertions(+), 83 deletions(-)
> > create mode 100644 Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> > delete mode 100644 Documentation/devicetree/bindings/net/mscc-ocelot.txt
> >
> > diff --git a/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> > new file mode 100644
> > index 000000000000..0c96eabf9d2d
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> > @@ -0,0 +1,184 @@
> > +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> > +%YAML 1.2
> > +---
> > +$id: http://devicetree.org/schemas/net/mscc,vsc7514-switch.yaml
> > +$schema: http://devicetree.org/meta-schemas/core.yaml
> > +
> > +title: Microchip VSC7514 Ethernet switch controller
> > +
> > +maintainers:
> > + - Vladimir Oltean <[email protected]>
> > + - Claudiu Manoil <[email protected]>
> > + - Alexandre Belloni <[email protected]>
> > +
> > +description: |
> > + The VSC7514 Industrial IoT Ethernet switch contains four integrated dual media
> > + 10/100/1000BASE-T PHYs, two 1G SGMII/SerDes, two 1G/2.5G SGMII/SerDes, and an
> > + option for either a 1G/2.5G SGMII/SerDes Node Processor Interface (NPI) or a
> > + PCIe interface for external CPU connectivity. The NPI/PCIe can operate as a
> > + standard Ethernet port.
>
> Technically any port can serve as NPI, not just the SERDES ones. People
> are even using internal PHY ports as NPI.
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#24381029
>
> Honestly I would not bother talking about NPI, it is confusing to see it here.
> Anything having to do with the NPI port is the realm of DSA.
>
> Just say how the present driver expects to control the device, don't
> just copy stuff from marketing slides. In this case PCIe is irrelevant
> too, this driver is for a platform device, and it only runs on the
> embedded processor as far as I can tell.
>

Ack

> > +
> > + The device provides a rich set of Industrial Ethernet switching features such
> > + as fast protection switching, 1588 precision time protocol, and synchronous
> > + Ethernet. Advanced TCAM-based VLAN and QoS processing enable delivery of
> > + differentiated services. Security is assured through frame processing using
> > + Microsemi’s TCAM-based Versatile Content Aware Processor.
>
> Above you say Microchip, and here you say Microsemi.
>
> > +
> > + In addition, the device contains a powerful 500 MHz CPU enabling full
> > + management of the switch.
>
> ~powerful~
>
> > +
> > +properties:
> > + $nodename:
> > + pattern: "^switch@[0-9a-f]+$"
> > +
> > + compatible:
> > + const: mscc,vsc7514-switch
> > +
> > + reg:
> > + items:
> > + - description: system target
> > + - description: rewriter target
> > + - description: qs target
> > + - description: PTP target
> > + - description: Port0 target
> > + - description: Port1 target
> > + - description: Port2 target
> > + - description: Port3 target
> > + - description: Port4 target
> > + - description: Port5 target
> > + - description: Port6 target
> > + - description: Port7 target
> > + - description: Port8 target
> > + - description: Port9 target
> > + - description: Port10 target
> > + - description: QSystem target
> > + - description: Analyzer target
> > + - description: S0 target
> > + - description: S1 target
> > + - description: S2 target
> > + - description: fdma target
> > +
> > + reg-names:
> > + items:
> > + - const: sys
> > + - const: rew
> > + - const: qs
> > + - const: ptp
> > + - const: port0
> > + - const: port1
> > + - const: port2
> > + - const: port3
> > + - const: port4
> > + - const: port5
> > + - const: port6
> > + - const: port7
> > + - const: port8
> > + - const: port9
> > + - const: port10
> > + - const: qsys
> > + - const: ana
> > + - const: s0
> > + - const: s1
> > + - const: s2
> > + - const: fdma
> > +
> > + interrupts:
> > + minItems: 1
> > + items:
> > + - description: PTP ready
> > + - description: register based extraction
> > + - description: frame dma based extraction
> > +
> > + interrupt-names:
> > + minItems: 1
> > + items:
> > + - const: ptp_rdy
> > + - const: xtr
> > + - const: fdma
> > +
> > + ethernet-ports:
> > + type: object
> > + patternProperties:
> > + "^port@[0-9a-f]+$":
> > + type: object
> > + description: Ethernet ports handled by the switch
> > +
> > + allOf:
> > + - $ref: ethernet-controller.yaml#
>
> I'm pretty sure Rob will comment that this can be simplified to:
>
> $ref: ethernet-controller.yaml#
>
> without the allOf: and "-".

Ok

>
> > +
> > + properties:
> > + '#address-cells':
> > + const: 1
> > + '#size-cells':
> > + const: 0
> > +
> > + reg:
> > + description: Switch port number
> > +
> > + phy-handle: true
> > +
> > + mac-address: true
> > +
> > + required:
> > + - reg
> > + - phy-handle
>
> Shouldn't there be additionalProperties: false for the port node as well?
>
> And actually, phy-handle is not strictly required, if you have a
> fixed-link. I think you should use oneOf.

Ok

>
> And you know what else is required? phy-mode. See commits e6e12df625f2
> ("net: mscc: ocelot: convert to phylink") and eba54cbb92d2 ("MIPS: mscc:
> ocelot: mark the phy-mode for internal PHY ports").

Ok, so I guess the binding text file was not updated back then. I'll fix
that.

>
> > +
> > +required:
> > + - compatible
> > + - reg
> > + - reg-names
> > + - interrupts
> > + - interrupt-names
> > + - ethernet-ports
> > +
> > +additionalProperties: false
> > +
> > +examples:
> > + - |
> > + switch@1010000 {
> > + compatible = "mscc,vsc7514-switch";
> > + reg = <0x1010000 0x10000>,
> > + <0x1030000 0x10000>,
> > + <0x1080000 0x100>,
> > + <0x10e0000 0x10000>,
> > + <0x11e0000 0x100>,
> > + <0x11f0000 0x100>,
> > + <0x1200000 0x100>,
> > + <0x1210000 0x100>,
> > + <0x1220000 0x100>,
> > + <0x1230000 0x100>,
> > + <0x1240000 0x100>,
> > + <0x1250000 0x100>,
> > + <0x1260000 0x100>,
> > + <0x1270000 0x100>,
> > + <0x1280000 0x100>,
> > + <0x1800000 0x80000>,
> > + <0x1880000 0x10000>,
> > + <0x1040000 0x10000>,
> > + <0x1050000 0x10000>,
> > + <0x1060000 0x10000>,
> > + <0x1a0 0x1c4>;
> > + reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
> > + "port2", "port3", "port4", "port5", "port6",
> > + "port7", "port8", "port9", "port10", "qsys",
> > + "ana", "s0", "s1", "s2", "fdma";
> > + interrupts = <18 21 16>;
> > + interrupt-names = "ptp_rdy", "xtr", "fdma";
> > +
> > + ethernet-ports {
> > + #address-cells = <1>;
> > + #size-cells = <0>;
> > +
> > + port0: port@0 {
> > + reg = <0>;
> > + phy-handle = <&phy0>;
> > + };
> > + port1: port@1 {
> > + reg = <1>;
> > + phy-handle = <&phy1>;
> > + };
> > + };
> > + };
> > +
> > +...
> > +# vim: set ts=2 sw=2 sts=2 tw=80 et cc=80 ft=yaml :
> > diff --git a/Documentation/devicetree/bindings/net/mscc-ocelot.txt b/Documentation/devicetree/bindings/net/mscc-ocelot.txt
> > deleted file mode 100644
> > index 3b6290b45ce5..000000000000
> > --- a/Documentation/devicetree/bindings/net/mscc-ocelot.txt
> > +++ /dev/null
> > @@ -1,83 +0,0 @@
> > -Microsemi Ocelot network Switch
> > -===============================
> > -
> > -The Microsemi Ocelot network switch can be found on Microsemi SoCs (VSC7513,
> > -VSC7514)
> > -
> > -Required properties:
> > -- compatible: Should be "mscc,vsc7514-switch"
> > -- reg: Must contain an (offset, length) pair of the register set for each
> > - entry in reg-names.
> > -- reg-names: Must include the following entries:
> > - - "sys"
> > - - "rew"
> > - - "qs"
> > - - "ptp" (optional due to backward compatibility)
> > - - "qsys"
> > - - "ana"
> > - - "portX" with X from 0 to the number of last port index available on that
> > - switch
> > -- interrupts: Should contain the switch interrupts for frame extraction,
> > - frame injection and PTP ready.
> > -- interrupt-names: should contain the interrupt names: "xtr", "inj". Can contain
> > - "ptp_rdy" which is optional due to backward compatibility.
> > -- ethernet-ports: A container for child nodes representing switch ports.
> > -
> > -The ethernet-ports container has the following properties
> > -
> > -Required properties:
> > -
> > -- #address-cells: Must be 1
> > -- #size-cells: Must be 0
> > -
> > -Each port node must have the following mandatory properties:
> > -- reg: Describes the port address in the switch
> > -
> > -Port nodes may also contain the following optional standardised
> > -properties, described in binding documents:
> > -
> > -- phy-handle: Phandle to a PHY on an MDIO bus. See
> > - Documentation/devicetree/bindings/net/ethernet.txt for details.
> > -
> > -Example:
> > -
> > - switch@1010000 {
> > - compatible = "mscc,vsc7514-switch";
> > - reg = <0x1010000 0x10000>,
> > - <0x1030000 0x10000>,
> > - <0x1080000 0x100>,
> > - <0x10e0000 0x10000>,
> > - <0x11e0000 0x100>,
> > - <0x11f0000 0x100>,
> > - <0x1200000 0x100>,
> > - <0x1210000 0x100>,
> > - <0x1220000 0x100>,
> > - <0x1230000 0x100>,
> > - <0x1240000 0x100>,
> > - <0x1250000 0x100>,
> > - <0x1260000 0x100>,
> > - <0x1270000 0x100>,
> > - <0x1280000 0x100>,
> > - <0x1800000 0x80000>,
> > - <0x1880000 0x10000>;
> > - reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
> > - "port2", "port3", "port4", "port5", "port6",
> > - "port7", "port8", "port9", "port10", "qsys",
> > - "ana";
> > - interrupts = <18 21 22>;
> > - interrupt-names = "ptp_rdy", "xtr", "inj";
> > -
> > - ethernet-ports {
> > - #address-cells = <1>;
> > - #size-cells = <0>;
> > -
> > - port0: port@0 {
> > - reg = <0>;
> > - phy-handle = <&phy0>;
> > - };
> > - port1: port@1 {
> > - reg = <1>;
> > - phy-handle = <&phy1>;
> > - };
> > - };
> > - };
> > --
> > 2.33.0
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-12 20:06:12

by Rob Herring (Arm)

Subject: Re: [PATCH v2 2/6] dt-bindings: net: convert mscc,vsc7514-switch bindings to yaml

On Wed, Nov 03, 2021 at 10:19:39AM +0100, Clément Léger wrote:
> Convert existing bindings to yaml format. At the same time, remove
> non-existing properties ("inj" interrupt) and add fdma.
>
> Signed-off-by: Clément Léger <[email protected]>
> ---
> .../bindings/net/mscc,vsc7514-switch.yaml | 184 ++++++++++++++++++
> .../devicetree/bindings/net/mscc-ocelot.txt | 83 --------
> 2 files changed, 184 insertions(+), 83 deletions(-)
> create mode 100644 Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> delete mode 100644 Documentation/devicetree/bindings/net/mscc-ocelot.txt
>
> diff --git a/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> new file mode 100644
> index 000000000000..0c96eabf9d2d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/mscc,vsc7514-switch.yaml
> @@ -0,0 +1,184 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/net/mscc,vsc7514-switch.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: Microchip VSC7514 Ethernet switch controller
> +
> +maintainers:
> + - Vladimir Oltean <[email protected]>
> + - Claudiu Manoil <[email protected]>
> + - Alexandre Belloni <[email protected]>
> +
> +description: |
> + The VSC7514 Industrial IoT Ethernet switch contains four integrated dual media
> + 10/100/1000BASE-T PHYs, two 1G SGMII/SerDes, two 1G/2.5G SGMII/SerDes, and an
> + option for either a 1G/2.5G SGMII/SerDes Node Processor Interface (NPI) or a
> + PCIe interface for external CPU connectivity. The NPI/PCIe can operate as a
> + standard Ethernet port.
> +
> + The device provides a rich set of Industrial Ethernet switching features such
> + as fast protection switching, 1588 precision time protocol, and synchronous
> + Ethernet. Advanced TCAM-based VLAN and QoS processing enable delivery of
> + differentiated services. Security is assured through frame processing using
> + Microsemi’s TCAM-based Versatile Content Aware Processor.
> +
> + In addition, the device contains a powerful 500 MHz CPU enabling full
> + management of the switch.
> +
> +properties:
> + $nodename:
> + pattern: "^switch@[0-9a-f]+$"
> +
> + compatible:
> + const: mscc,vsc7514-switch
> +
> + reg:
> + items:
> + - description: system target
> + - description: rewriter target
> + - description: qs target
> + - description: PTP target
> + - description: Port0 target
> + - description: Port1 target
> + - description: Port2 target
> + - description: Port3 target
> + - description: Port4 target
> + - description: Port5 target
> + - description: Port6 target
> + - description: Port7 target
> + - description: Port8 target
> + - description: Port9 target
> + - description: Port10 target
> + - description: QSystem target
> + - description: Analyzer target
> + - description: S0 target
> + - description: S1 target
> + - description: S2 target
> + - description: fdma target
> +
> + reg-names:
> + items:
> + - const: sys
> + - const: rew
> + - const: qs
> + - const: ptp
> + - const: port0
> + - const: port1
> + - const: port2
> + - const: port3
> + - const: port4
> + - const: port5
> + - const: port6
> + - const: port7
> + - const: port8
> + - const: port9
> + - const: port10
> + - const: qsys
> + - const: ana
> + - const: s0
> + - const: s1
> + - const: s2
> + - const: fdma
> +
> + interrupts:
> + minItems: 1
> + items:
> + - description: PTP ready
> + - description: register based extraction
> + - description: frame dma based extraction
> +
> + interrupt-names:
> + minItems: 1
> + items:
> + - const: ptp_rdy
> + - const: xtr
> + - const: fdma
> +
> + ethernet-ports:
> + type: object

additionalProperties: false

> + patternProperties:
> + "^port@[0-9a-f]+$":
> + type: object
> + description: Ethernet ports handled by the switch
> +
> + allOf:

You can drop 'allOf'.

> + - $ref: ethernet-controller.yaml#

unevaluatedProperties: false

> +
> + properties:
> + '#address-cells':
> + const: 1
> + '#size-cells':
> + const: 0

Wrong level for these.

> +
> + reg:
> + description: Switch port number
> +
> + phy-handle: true
> +
> + mac-address: true
> +
> + required:
> + - reg
> + - phy-handle
> +
> +required:
> + - compatible
> + - reg
> + - reg-names
> + - interrupts
> + - interrupt-names
> + - ethernet-ports
> +
> +additionalProperties: false
> +
> +examples:
> + - |
> + switch@1010000 {
> + compatible = "mscc,vsc7514-switch";
> + reg = <0x1010000 0x10000>,
> + <0x1030000 0x10000>,
> + <0x1080000 0x100>,
> + <0x10e0000 0x10000>,
> + <0x11e0000 0x100>,
> + <0x11f0000 0x100>,
> + <0x1200000 0x100>,
> + <0x1210000 0x100>,
> + <0x1220000 0x100>,
> + <0x1230000 0x100>,
> + <0x1240000 0x100>,
> + <0x1250000 0x100>,
> + <0x1260000 0x100>,
> + <0x1270000 0x100>,
> + <0x1280000 0x100>,
> + <0x1800000 0x80000>,
> + <0x1880000 0x10000>,
> + <0x1040000 0x10000>,
> + <0x1050000 0x10000>,
> + <0x1060000 0x10000>,
> + <0x1a0 0x1c4>;
> + reg-names = "sys", "rew", "qs", "ptp", "port0", "port1",
> + "port2", "port3", "port4", "port5", "port6",
> + "port7", "port8", "port9", "port10", "qsys",
> + "ana", "s0", "s1", "s2", "fdma";
> + interrupts = <18 21 16>;
> + interrupt-names = "ptp_rdy", "xtr", "fdma";
> +
> + ethernet-ports {
> + #address-cells = <1>;
> + #size-cells = <0>;
> +
> + port0: port@0 {
> + reg = <0>;
> + phy-handle = <&phy0>;
> + };
> + port1: port@1 {
> + reg = <1>;
> + phy-handle = <&phy1>;
> + };
> + };
> + };
> +
> +...
> +# vim: set ts=2 sw=2 sts=2 tw=80 et cc=80 ft=yaml :

Please drop this.

emacs settings are fine, but not vim. ;) JK, I don't use either.


2021-11-15 10:18:10

by Clément Léger

Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Wed, 3 Nov 2021 14:53:51 +0100,
Clément Léger <[email protected]> wrote:

> On Wed, 3 Nov 2021 12:38:12 +0000,
> Vladimir Oltean <[email protected]> wrote:
>
> > On Wed, Nov 03, 2021 at 10:19:40AM +0100, Clément Léger wrote:
> > > IFH preparation can take quite some time on slow processors (up to
> > > 5% in an iperf3 test for instance). In order to reduce the cost of
> > > this preparation, pre-compute the IFH since most of the parameters are
> > > fixed per port. Only rew_op and vlan tag will be set when sending
> > > if different from 0. This allows removing the calls to packing()
> > > entirely with basic usage. At the same time, export this function
> > > that will be used by FDMA.
> > >
> > > Signed-off-by: Clément Léger <[email protected]>
> > > ---
> >
> > Honestly, this feels a bit cheap/gimmicky, and not really the
> > fundamental thing to address. In my testing of a similar idea (see
> > commits 67c2404922c2 ("net: dsa: felix: create a template for the DSA
> > tags on xmit") and then 7c4bb540e917 ("net: dsa: tag_ocelot: create
> > separate tagger for Seville"), the net difference is not that stark,
> > considering that now you need to access one more memory region which
> > you did not need before, do a memcpy, and then patch the IFH anyway
> > for the non-constant stuff.
>
> The memcpy is negligible and the patching happens only in a few
> cases (at least vs the packing function call). The VSC7514 CPU is really
> slow, which leads to 2.5% up to 5% of the time being spent in packing()
> when using iperf3, depending on the use case (according to ftrace).
>
> >
> > Certainly, for the calls to ocelot_port_inject_frame() from DSA, I
> > would prefer not having this pre-computed IFH.
> >
> > Could you provide some before/after performance numbers and perf
> > counters?
>
> I will make another round of measurements to confirm my previous numbers
> and check the impact on the injection rate on ocelot.

I re-checked my bandwidth numbers (obtained with iperf3) with and
without the pre-computed header:

Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
- With pre-computed header: UDP TX: 33Mbit/s
- Without UDP TX: 31Mbit/s
-> 6.5% improvement

Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
- With pre-computed header: UDP TX: 15.8Mbit/s
- Without UDP TX: 16.4Mbit/s
-> 4.3% improvement
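
The percentage for the 1460-byte test can be re-derived as the relative gain over the baseline. A minimal sketch; relative_gain_percent() is a hypothetical helper, and only the 33 and 31 Mbit/s figures are taken from the measurements above.

```c
/* Relative throughput gain of `with` over `without`, in percent. */
static double relative_gain_percent(double with, double without)
{
	return 100.0 * (with - without) / without;
}
```

Calling relative_gain_percent(33.0, 31.0) evaluates to roughly 6.45, which rounds to the 6.5% quoted for the first test.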

The improvement is not huge, but it is not negligible either.
Please tell me if you want me to drop it or not based on those numbers.
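
To make the trade-off under discussion concrete, the template approach amounts to one memcpy() plus conditional patching of the per-frame fields. The sketch below is a userspace model only; the byte layout, MODEL_TAG_LEN and the model_* names are assumptions for illustration and do not match the real IFH bit layout or the packing() helpers.

```c
#include <stdint.h>
#include <string.h>

#define MODEL_TAG_LEN 16 /* illustrative header size */

/* Assumed layout: [0] bypass, [1] destination port, [2..3] VLAN TCI, [4] rew_op */
struct model_port {
	uint8_t ifh[MODEL_TAG_LEN];
};

/* Done once per port: fill in the fields that never change. */
static void model_init_port(struct model_port *p, uint8_t port)
{
	memset(p->ifh, 0, sizeof(p->ifh));
	p->ifh[0] = 1;
	p->ifh[1] = port;
}

/* Done per frame: copy the template, then patch only the non-zero fields. */
static void model_ifh_port_set(uint8_t *ifh, const struct model_port *p,
			       uint32_t rew_op, uint32_t vlan_tag)
{
	memcpy(ifh, p->ifh, MODEL_TAG_LEN);

	if (vlan_tag) {
		ifh[2] = (vlan_tag >> 8) & 0xff;
		ifh[3] = vlan_tag & 0xff;
	}
	if (rew_op)
		ifh[4] = rew_op & 0xff;
}
```

In the common case (no VLAN tag, no rew_op) the per-frame cost in this model is the copy alone, which is the effect the measurements above try to quantify.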

>
> >
> > > drivers/net/ethernet/mscc/ocelot.c | 23 ++++++++++++++++++-----
> > > include/soc/mscc/ocelot.h | 5 +++++
> > > 2 files changed, 23 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c
> > > index e6c18b598d5c..97693772595b 100644
> > > --- a/drivers/net/ethernet/mscc/ocelot.c
> > > +++ b/drivers/net/ethernet/mscc/ocelot.c
> > > @@ -1076,20 +1076,29 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp)
> > > }
> > > EXPORT_SYMBOL(ocelot_can_inject);
> > >
> > > +void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
> > > + u32 vlan_tag)
> > > +{
> > > + memcpy(ifh, port->ifh, OCELOT_TAG_LEN);
> > > +
> > > + if (vlan_tag)
> > > + ocelot_ifh_set_vlan_tci(ifh, vlan_tag);
> > > + if (rew_op)
> > > + ocelot_ifh_set_rew_op(ifh, rew_op);
> > > +}
> > > +EXPORT_SYMBOL(ocelot_ifh_port_set);
> > > +
> > > void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
> > > u32 rew_op, struct sk_buff *skb)
> > > {
> > > + struct ocelot_port *port_s = ocelot->ports[port];
> > > u32 ifh[OCELOT_TAG_LEN / 4] = {0};
> > > unsigned int i, count, last;
> > >
> > > ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) |
> > > QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
> > >
> > > - ocelot_ifh_set_bypass(ifh, 1);
> > > - ocelot_ifh_set_dest(ifh, BIT_ULL(port));
> > > - ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C);
> > > - ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
> > > - ocelot_ifh_set_rew_op(ifh, rew_op);
> > > + ocelot_ifh_port_set(ifh, port_s, rew_op, skb_vlan_tag_get(skb));
> > > for (i = 0; i < OCELOT_TAG_LEN / 4; i++)
> > > ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp);
> > > @@ -2128,6 +2137,10 @@ void ocelot_init_port(struct ocelot *ocelot, int port)
> > > skb_queue_head_init(&ocelot_port->tx_skbs);
> > >
> > > + ocelot_ifh_set_bypass(ocelot_port->ifh, 1);
> > > + ocelot_ifh_set_dest(ocelot_port->ifh, BIT_ULL(port));
> > > + ocelot_ifh_set_tag_type(ocelot_port->ifh, IFH_TAG_TYPE_C);
> > > +
> > > /* Basic L2 initialization */
> > >
> > > /* Set MAC IFG Gaps
> > > diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h
> > > index fef3a36b0210..b3381c90ff3e 100644
> > > --- a/include/soc/mscc/ocelot.h
> > > +++ b/include/soc/mscc/ocelot.h
> > > @@ -6,6 +6,7 @@
> > > #define _SOC_MSCC_OCELOT_H
> > >
> > > #include <linux/ptp_clock_kernel.h>
> > > +#include <linux/dsa/ocelot.h>
> > > #include <linux/net_tstamp.h>
> > > #include <linux/if_vlan.h>
> > > #include <linux/regmap.h>
> > > @@ -623,6 +624,8 @@ struct ocelot_port {
> > >
> > > struct net_device *bridge;
> > > u8 stp_state;
> > > +
> > > + u8 ifh[OCELOT_TAG_LEN];
> > > };
> > >
> > > struct ocelot {
> > > @@ -754,6 +757,8 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target,
> > > bool ocelot_can_inject(struct ocelot *ocelot, int grp);
> > > void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp,
> > > u32 rew_op, struct sk_buff *skb);
> > > +void ocelot_ifh_port_set(void *ifh, struct ocelot_port *port, u32 rew_op,
> > > + u32 vlan_tag);
> > > int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb);
> > > void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp);
> > > --
> > > 2.33.0
> >
>
>
>



--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-15 10:52:03

by Vladimir Oltean

Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Mon, Nov 15, 2021 at 11:13:44AM +0100, Clément Léger wrote:
> On Wed, 3 Nov 2021 14:53:51 +0100,
> Clément Léger <[email protected]> wrote:
>
> > On Wed, 3 Nov 2021 12:38:12 +0000,
> > Vladimir Oltean <[email protected]> wrote:
> >
> > > On Wed, Nov 03, 2021 at 10:19:40AM +0100, Clément Léger wrote:
> > > > IFH preparation can take quite some time on slow processors (up to
> > > > 5% in an iperf3 test for instance). In order to reduce the cost of
> > > > this preparation, pre-compute the IFH since most of the parameters are
> > > > fixed per port. Only rew_op and vlan tag will be set when sending
> > > > if different from 0. This allows removing the calls to packing()
> > > > entirely with basic usage. At the same time, export this function
> > > > that will be used by FDMA.
> > > >
> > > > Signed-off-by: Cl?ment L?ger <[email protected]>
> > > > ---
> > >
> > > Honestly, this feels a bit cheap/gimmicky, and not really the
> > > fundamental thing to address. In my testing of a similar idea (see
> > > commits 67c2404922c2 ("net: dsa: felix: create a template for the DSA
> > > tags on xmit") and then 7c4bb540e917 ("net: dsa: tag_ocelot: create
> > > separate tagger for Seville")), the net difference is not that stark,
> > > considering that now you need to access one more memory region which
> > > you did not need before, do a memcpy, and then patch the IFH anyway
> > > for the non-constant stuff.
> >
> > The memcpy is negligible and the patching happens only in a few
> > cases (at least compared to the packing() call). The VSC7514 CPU is
> > really slow, which leads to between 2.5% and 5% of the time being spent
> > in packing() when using iperf3, depending on the use case (according to
> > ftrace).
> >
> > >
> > > Certainly, for the calls to ocelot_port_inject_frame() from DSA, I
> > > would prefer not having this pre-computed IFH.
> > >
> > > Could you provide some before/after performance numbers and perf
> > > counters?
> >
> > I will do another round of measurements to confirm my previous numbers
> > and check the impact on the injection rate on ocelot.
>
> I checked my bandwidth numbers again (obtained with iperf3) with and
> without the pre-computed header:
>
> Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
> - With pre-computed header: UDP TX: 33Mbit/s
> - Without UDP TX: 31Mbit/s
> -> 6.5% improvement
>
> Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> - With pre-computed header: UDP TX: 15.8Mbit/s
> - Without UDP TX: 16.4Mbit/s
> -> 4.3% improvement
>
> The improvement might not be huge but also not negligible at all.
> Please tell me if you want me to drop it or not based on those numbers.

Is this with manual injection or with FDMA? Do you have before/after
numbers with FDMA as well? At 31 vs 33 Mbps, this isn't going to compete
for any races anyway :)

2021-11-15 11:07:36

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Mon, 15 Nov 2021 10:51:45 +0000,
Vladimir Oltean <[email protected]> wrote:

> > I checked my bandwidth numbers again (obtained with iperf3) with and
> > without the pre-computed header:
> >
> > Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
> > - With pre-computed header: UDP TX: 33Mbit/s
> > - Without UDP TX: 31Mbit/s
> > -> 6.5% improvement
> >
> > Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> > - With pre-computed header: UDP TX: 15.8Mbit/s
> > - Without UDP TX: 16.4Mbit/s
> > -> 4.3% improvement
> >
> > The improvement might not be huge but also not negligible at all.
> > Please tell me if you want me to drop it or not based on those numbers.
>
> Is this with manual injection or with FDMA? Do you have before/after
> numbers with FDMA as well? At 31 vs 33 Mbps, this isn't going to compete
> for any races anyway :)

These numbers were measured with the FDMA. With CPU-based injection,
throughput is even lower because more time is spent pushing bytes
through registers... But agreed, this isn't going to break any records!


--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-15 14:08:11

by Jakub Kicinski

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Mon, 15 Nov 2021 11:13:44 +0100 Clément Léger wrote:
> Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
> - With pre-computed header: UDP TX: 33Mbit/s
> - Without UDP TX: 31Mbit/s
> -> 6.5% improvement
>
> Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> - With pre-computed header: UDP TX: 15.8Mbit/s
> - Without UDP TX: 16.4Mbit/s
> -> 4.3% improvement

Something's wrong with these numbers or I'm missing context.
You say improvement in both cases yet in the latter case the
new number is lower?

2021-11-15 14:11:28

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Mon, 15 Nov 2021 06:08:00 -0800,
Jakub Kicinski <[email protected]> wrote:

> On Mon, 15 Nov 2021 11:13:44 +0100 Clément Léger wrote:
> > Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
> > - With pre-computed header: UDP TX: 33Mbit/s
> > - Without UDP TX: 31Mbit/s
> > -> 6.5% improvement
> >
> > Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> > - With pre-computed header: UDP TX: 15.8Mbit/s
> > - Without UDP TX: 16.4Mbit/s
> > -> 4.3% improvement
>
> Something's wrong with these numbers or I'm missing context.
> You say improvement in both cases yet in the latter case the
> new number is lower?

You are right, Jakub, I swapped the last two results. Corrected:

Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
- With pre-computed header: UDP TX: 16.4Mbit/s
- Without UDP TX: 15.8Mbit/s
-> 4.3% improvement
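
For reference, the percentages can be derived from the two throughput
figures as in the minimal sketch below. improvement_pct() is a
hypothetical helper written for illustration, not something from the
driver. With the rounded figures quoted above it yields about 6.5% for
the 1460-byte test and about 3.8% for the 700-byte test, so the 4.3%
figure may have been computed from unrounded measurements.

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of the ocelot driver): relative
 * throughput gain, in percent, of "after" over "before".
 */
double improvement_pct(double before_mbps, double after_mbps)
{
	/* A zero or negative baseline makes the ratio meaningless. */
	assert(before_mbps > 0.0);

	return (after_mbps - before_mbps) / before_mbps * 100.0;
}
```

Calling improvement_pct(31.0, 33.0) covers the standard-packet case and
improvement_pct(15.8, 16.4) the small-packet case.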

--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com

2021-11-15 14:33:23

by Vladimir Oltean

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Mon, Nov 15, 2021 at 03:06:20PM +0100, Clément Léger wrote:
> On Mon, 15 Nov 2021 06:08:00 -0800,
> Jakub Kicinski <[email protected]> wrote:
>
> > On Mon, 15 Nov 2021 11:13:44 +0100 Clément Léger wrote:
> > > Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
> > > - With pre-computed header: UDP TX: 33Mbit/s
> > > - Without UDP TX: 31Mbit/s
> > > -> 6.5% improvement
> > >
> > > Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> > > - With pre-computed header: UDP TX: 15.8Mbit/s
> > > - Without UDP TX: 16.4Mbit/s
> > > -> 4.3% improvement
> >
> > Something's wrong with these numbers or I'm missing context.
> > You say improvement in both cases yet in the latter case the
> > new number is lower?
>
> You are right Jakub, I swapped the last two results,
>
> Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> - With pre-computed header: UDP TX: 16.4Mbit/s
> - Without UDP TX: 15.8Mbit/s
> -> 4.3% improvement

Even in reverse, something still seems wrong with the numbers.
My DSPI controller can transfer at a higher data rate than that.
Where is the rest of the time spent? Computing checksums?

2021-11-15 16:08:00

by Clément Léger

[permalink] [raw]
Subject: Re: [PATCH v2 3/6] net: ocelot: pre-compute injection frame header content

On Mon, 15 Nov 2021 14:31:06 +0000,
Vladimir Oltean <[email protected]> wrote:

> On Mon, Nov 15, 2021 at 03:06:20PM +0100, Clément Léger wrote:
> > On Mon, 15 Nov 2021 06:08:00 -0800,
> > Jakub Kicinski <[email protected]> wrote:
> >
> > > On Mon, 15 Nov 2021 11:13:44 +0100 Clément Léger wrote:
> > > > Test on standard packets with UDP (iperf3 -t 100 -l 1460 -u -b 0 -c *)
> > > > - With pre-computed header: UDP TX: 33Mbit/s
> > > > - Without UDP TX: 31Mbit/s
> > > > -> 6.5% improvement
> > > >
> > > > Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> > > > - With pre-computed header: UDP TX: 15.8Mbit/s
> > > > - Without UDP TX: 16.4Mbit/s
> > > > -> 4.3% improvement
> > >
> > > Something's wrong with these numbers or I'm missing context.
> > > You say improvement in both cases yet in the latter case the
> > > new number is lower?
> >
> > You are right Jakub, I swapped the last two results,
> >
> > Test on small packets with UDP (iperf3 -t 100 -l 700 -u -b 0 -c *)
> > - With pre-computed header: UDP TX: 16.4Mbit/s
> > - Without UDP TX: 15.8Mbit/s
> > -> 4.3% improvement
>
> Even in reverse, something still seems wrong with the numbers.
> My DSPI controller can transfer at a higher data rate than that.
> Where is the rest of the time spent? Computing checksums?

While adding FDMA support, I was surprised by the low performance I
encountered, so I spent some time trying to understand where the time
was spent. First, I ran iperf3 in loopback (using lo) and it yielded
the following results (of course RX/TX run on the same CPU in this
case):

TCP (iperf3 -c localhost):
- RX/TX: 84.0Mbit/s

UDP (iperf3 -u -b 0 -c localhost):
- RX/TX: 65.0Mbit/s

So even over loopback, the CPU is already really slow and can only
sustain a very low throughput. I then measured the performance using
CPU-based injection/extraction and obtained the following results:

TCP: (iperf3 -u -b 0 -c)
- TX: 11.8Mbit/s
- RX: 21.6Mbit/s

UDP (iperf3 -u -b 0 -c)
- TX: 13.4Mbit/s
- RX: Not even possible, the CPU never managed to extract a single packet

I then tried to find where the time was spent with ftrace (I kept only
the relevant functions that consume most of the time). The following
results were recorded when using iperf3 with CPU-based
injection/extraction.

In TCP TX, a lot of time is spent copying from user space:

41.71% iperf3 [kernel.kallsyms] [k] __raw_copy_to_user
6.65% iperf3 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
3.23% iperf3 [kernel.kallsyms] [k] do_ade
2.10% iperf3 [kernel.kallsyms] [k] __ocelot_write_ix
2.10% iperf3 [kernel.kallsyms] [k] handle_adel_int
...

In TCP RX, numbers are even worse for the time spent in
__raw_copy_to_user:

62.95% iperf3 [kernel.kallsyms] [k] __raw_copy_to_user
1.97% iperf3 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.15% iperf3 [kernel.kallsyms] [k] __copy_page_start
1.07% iperf3 [kernel.kallsyms] [k] __skb_datagram_iter
...


In UDP TX, some time is spent handling locking and unaligned copies
as well as pushing packets. The unaligned copies are due to the driver
directly accessing the bytes of the packet as words, which is costly
when the data is misaligned.

17.97% iperf3 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
11.94% iperf3 [kernel.kallsyms] [k] do_ade
9.07% iperf3 [kernel.kallsyms] [k] __ocelot_write_ix
7.74% iperf3 [kernel.kallsyms] [k] handle_adel_int
5.78% iperf3 [kernel.kallsyms] [k] copy_from_kernel_nofault
4.71% iperf3 [kernel.kallsyms] [k] __compute_return_epc_for_insn
2.51% iperf3 [kernel.kallsyms] [k] regmap_write
2.31% iperf3 [kernel.kallsyms] [k] __compute_return_epc
...
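
The do_ade and handle_adel_int entries above are MIPS address-error
handlers, which is where unaligned word loads end up being fixed up. A
minimal user-space sketch of the alternative, assuming the payload is
copied one 32-bit word at a time, is shown below. load_u32_unaligned()
and copy_frame_as_words() are hypothetical names for illustration, not
driver functions; in kernel code get_unaligned() serves the same purpose
as the memcpy() used here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: read a native-endian 32-bit word from a buffer
 * that may not be 4-byte aligned. memcpy() lets the compiler pick
 * loads that are safe on the target instead of trapping.
 */
uint32_t load_u32_unaligned(const uint8_t *buf)
{
	uint32_t word;

	memcpy(&word, buf, sizeof(word));
	return word;
}

/*
 * Hypothetical stand-in for the injection loop: copy a frame of len
 * bytes into out[] one word at a time, zero-padding the final partial
 * word. Returns the number of words written, or 0 if out[] is too small.
 */
size_t copy_frame_as_words(const uint8_t *data, size_t len,
			   uint32_t *out, size_t out_words)
{
	size_t full_words = len / 4;
	size_t total_words = (len + 3) / 4;
	size_t i;

	assert(data != NULL && out != NULL);

	if (total_words > out_words)
		return 0;

	for (i = 0; i < full_words; i++)
		out[i] = load_u32_unaligned(data + 4 * i);

	if (len % 4) {
		uint32_t last = 0;

		/* Copy only the remaining bytes; the rest stays zero. */
		memcpy(&last, data + 4 * full_words, len % 4);
		out[full_words] = last;
	}

	return total_words;
}
```

Here out[] stands in for the injection register writes. Whether this is
faster than the trap-and-emulate path on the VSC7514 would need to be
measured.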

In UDP RX (iperf3 with -b 5M to ensure packets are received), time is
spent in floating point emulation and various other functions.

7.26% iperf3 [kernel.kallsyms] [k] cop1Emulate
2.84% iperf3 [kernel.kallsyms] [k] do_select
2.08% iperf3 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.06% iperf3 [kernel.kallsyms] [k] fpu_emulator_cop1Handler
2.01% iperf3 [kernel.kallsyms] [k] tcp_poll
2.00% iperf3 [kernel.kallsyms] [k] __raw_copy_to_user


When using the FDMA, the results are the following:

In TCP TX, the copy from user space is still present and checksumming
takes quite some time.

31.31% iperf3 [kernel.kallsyms] [k] __raw_copy_to_user
10.48% iperf3 [kernel.kallsyms] [k] __csum_partial_copy_to_user
3.73% iperf3 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
2.08% iperf3 [kernel.kallsyms] [k] tcp_ack
1.68% iperf3 [kernel.kallsyms] [k] ocelot_fdma_napi_poll
1.63% iperf3 [kernel.kallsyms] [k] tcp_write_xmit
1.05% iperf3 [kernel.kallsyms] [k] finish_task_switch

In TCP RX, the majority of time is still taken by __raw_copy_to_user.

63.95% iperf3 [kernel.kallsyms] [k] __raw_copy_to_user
1.29% iperf3 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.23% iperf3 [kernel.kallsyms] [k] tcp_recvmsg_locked
1.23% iperf3 [kernel.kallsyms] [k] __skb_datagram_iter
1.07% iperf3 [kernel.kallsyms] [k] vfs_read

In UDP TX, time is spent in softirq entry and in checksumming.

9.01% iperf3 [kernel.kallsyms] [k] __softirqentry_text_start
7.07% iperf3 [kernel.kallsyms] [k] __csum_partial_copy_to_user
2.28% iperf3 [kernel.kallsyms] [k] __ip_append_data.isra.0
2.10% iperf3 [kernel.kallsyms] [k] __dev_queue_xmit
2.08% iperf3 [kernel.kallsyms] [k] siphash_3u32
2.06% iperf3 [kernel.kallsyms] [k] udp_sendmsg
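
For context on the checksumming cost, the sum being computed is the
16-bit one's-complement Internet checksum described in RFC 1071, which
has to read every byte of the payload. The sketch below is a plain C
illustration of that algorithm, not the kernel's implementation:
inet_checksum() is a hypothetical name, and the kernel's csum_partial()
family is architecture-optimized and returns an unfolded 32-bit partial
sum.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical illustration of the RFC 1071 Internet checksum:
 * one's-complement sum of big-endian 16-bit words, then inverted.
 */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint64_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	/* Add up all complete 16-bit words. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is treated as the high byte of a word. */
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A receiver can validate a buffer by running the same function over the
data with the checksum appended; a result of 0 means the sum matched.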

And in UDP RX, again, time is spent in floating point emulation and
checksumming.

10.33% iperf3 [kernel.kallsyms] [k] cop1Emulate
7.62% iperf3 [kernel.kallsyms] [k] csum_partial
3.32% iperf3 [kernel.kallsyms] [k] do_select
2.69% iperf3 [kernel.kallsyms] [k] ieee754dp_sub
2.68% iperf3 [kernel.kallsyms] [k] fpu_emulator_cop1Handler
2.56% iperf3 [kernel.kallsyms] [k] ieee754dp_add
2.33% iperf3 [kernel.kallsyms] [k] ieee754dp_div

After all these measurements, the CPU appears to be the bottleneck and
simply spends a lot of time in various functions. I did not go further
with perf events since there was no real reason to dig deeper in that
direction.

--
Clément Léger,
Embedded Linux and Kernel engineer at Bootlin
https://bootlin.com