Subject: [PATCH 00/14] can: m_can: Optimizations for m_can/tcan part 2

Hi Marc, Simon, Martin and everyone,

v7 is a rebase onto v6.8-rc1. During my am62 tests I discovered some
problems which are fixed with this update.

@Simon: I had to drop most of your Reviewed-by tags again because of a
few fixes I made.

The series implements several small and larger throughput improvements
and adds rx/tx coalescing at the end.

Based on v6.8-rc1. Also available at
https://gitlab.baylibre.com/msp8/linux/-/tree/topic/mcan-optimization/v6.8?ref_type=heads

Best,
Markus

Changes in v7:
- Rebased to v6.8-rc1
- Fixed a NULL pointer dereference in m_can_clean() on am62 that happened
when doing ip link up, ip link down, ip link up
- Fixed a race condition on am62 observed in high-throughput tests.
netdev_completed_queue() was called before netdev_sent_queue() because
the interrupt was processed so quickly. netdev_sent_queue() is now
called before the actual send is started (a rough sketch of the new
ordering follows this list).
- Fixed an initialization issue on am62 where active interrupts got lost
between runs. Fixed by resetting cdev->active_interrupts in
m_can_disable_all_interrupts()
- Removed m_can_start_fast_xmit() because of a reordering of operations
due to the above-mentioned race condition
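
For reference, a rough sketch of the new netdev_sent_queue() ordering
(illustrative only and simplified; m_can_example_hw_submit() is a
hypothetical stand-in for the FIFO write or workqueue hand-off):

  #include <linux/can/dev.h>
  #include <linux/netdevice.h>
  #include <linux/skbuff.h>

  static void m_can_example_hw_submit(struct net_device *dev,
				      struct sk_buff *skb);

  static netdev_tx_t example_start_xmit(struct sk_buff *skb,
					struct net_device *dev)
  {
	  unsigned int frame_len = can_skb_get_frame_len(skb);

	  /* Report the frame to BQL before handing it to the hardware.
	   * The TX-complete interrupt can be processed so quickly that
	   * netdev_completed_queue() would otherwise run before
	   * netdev_sent_queue().
	   */
	  netdev_sent_queue(dev, frame_len);
	  m_can_example_hw_submit(dev, skb);

	  return NETDEV_TX_OK;
  }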

Changes in v6:
- Rebased to v6.6-rc2
- Added two small changes for the newly integrated polling feature
- Reuse the polling hrtimer for coalescing, as the timer used for
coalescing has a similar purpose to the one for polling. Also, polling
and coalescing will never be active at the same time.

Changes in v5:
- Add back parentheses in m_can_set_coalesce(). This will make
checkpatch unhappy but gcc happy.
- Remove unused fifo_header variable in m_can_tx_handler().
- Rebased to v6.5-rc1

Changes in v4:
- Create and use struct m_can_fifo_element in m_can_tx_handler
- Fix memcpy_and_pad to copy the full buffer
- Fixed a few checkpatch warnings
- Change putidx to be unsigned
- Print hard_xmit error only once when TX FIFO is full

Changes in v3:
- Remove parentheses in error messages
- Use memcpy_and_pad for buffer copy in 'can: m_can: Write transmit
header and data in one transaction'.
- Replace spin_lock with spin_lock_irqsave. I got a report of an
interrupt that called start_xmit just after the netqueue was woken up,
before the locked region was exited. spin_lock_irqsave should fix this
(see the sketch after this list). I attached the full stack trace at
the end of the mail for anyone interested.
- Rebased to v6.3-rc1.
- Removed tcan4x5x patches from this series.
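
For reference, a minimal sketch of the spin_lock_irqsave() pattern
mentioned above (illustrative only; the names are made up and do not
match the driver code):

  #include <linux/netdevice.h>
  #include <linux/spinlock.h>

  /* With plain spin_lock(), an interrupt arriving while the lock is held
   * can end up calling start_xmit(), which needs the same lock, right
   * after the queue was woken. Disabling local interrupts for the
   * critical section prevents that.
   */
  static void example_wake_queue(struct net_device *dev, spinlock_t *lock)
  {
	  unsigned long flags;

	  spin_lock_irqsave(lock, flags);
	  /* ... update TX state shared with start_xmit() ... */
	  netif_wake_queue(dev);
	  spin_unlock_irqrestore(lock, flags);
  }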

Changes in v2:
- Rebased on v6.2-rc5
- Fixed missing/broken accounting for non-peripheral m_can devices.

previous versions:
v1 - https://lore.kernel.org/lkml/[email protected]
v2 - https://lore.kernel.org/lkml/[email protected]
v3 - https://lore.kernel.org/lkml/[email protected]/
v4 - https://lore.kernel.org/lkml/[email protected]/
v5 - https://lore.kernel.org/lkml/[email protected]
v6 - https://lore.kernel.org/lkml/[email protected]

Markus Schneider-Pargmann (14):
can: m_can: Start/Cancel polling timer together with interrupts
can: m_can: Move hrtimer init to m_can_class_register
can: m_can: Write transmit header and data in one transaction
can: m_can: Implement receive coalescing
can: m_can: Implement transmit coalescing
can: m_can: Add rx coalescing ethtool support
can: m_can: Add tx coalescing ethtool support
can: m_can: Use u32 for putidx
can: m_can: Cache tx putidx
can: m_can: Use the workqueue as queue
can: m_can: Introduce a tx_fifo_in_flight counter
can: m_can: Use tx_fifo_in_flight for netif_queue control
can: m_can: Implement BQL
can: m_can: Implement transmit submission coalescing

drivers/net/can/m_can/m_can.c | 551 ++++++++++++++++++-------
drivers/net/can/m_can/m_can.h | 34 +-
drivers/net/can/m_can/m_can_platform.c | 4 -
3 files changed, 439 insertions(+), 150 deletions(-)

--
2.43.0



Subject: [PATCH 14/14] can: m_can: Implement transmit submission coalescing

m_can supports submitting multiple transmits with one register write.
This is an interesting option to reduce the number of SPI transfers for
peripheral chips.

struct m_can_tx_op is extended with a bool that signals whether this is
the last transmission and the submit should be executed immediately.

The worker then writes the skb to the FIFO and submits it only if the
submit bool is set. If it isn't set, the worker writes the next skb
waiting in the workqueue to the FIFO, and so on.
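
With this, the number of frames submitted with one TXBAR write follows
the tx-frames coalescing parameter. As a usage example (interface name
and value are arbitrary):

  ethtool -C can0 tx-frames 8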

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 56 ++++++++++++++++++++++++++++++++---
drivers/net/can/m_can/m_can.h | 6 ++++
2 files changed, 58 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 48968da69ae9..b7dbce4c342a 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1539,6 +1539,9 @@ static int m_can_start(struct net_device *dev)
if (ret)
return ret;

+ netdev_queue_set_dql_min_limit(netdev_get_tx_queue(cdev->net, 0),
+ cdev->tx_max_coalesced_frames);
+
cdev->can.state = CAN_STATE_ERROR_ACTIVE;

m_can_enable_all_interrupts(cdev);
@@ -1835,8 +1838,13 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev,
*/
can_put_echo_skb(skb, dev, putidx, frame_len);

- /* Enable TX FIFO element to start transfer */
- m_can_write(cdev, M_CAN_TXBAR, (1 << putidx));
+ if (cdev->is_peripheral) {
+ /* Delay enabling TX FIFO element */
+ cdev->tx_peripheral_submit |= BIT(putidx);
+ } else {
+ /* Enable TX FIFO element to start transfer */
+ m_can_write(cdev, M_CAN_TXBAR, BIT(putidx));
+ }
cdev->tx_fifo_putidx = (++cdev->tx_fifo_putidx >= cdev->can.echo_skb_max ?
0 : cdev->tx_fifo_putidx);
}
@@ -1849,6 +1857,17 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev,
return NETDEV_TX_BUSY;
}

+static void m_can_tx_submit(struct m_can_classdev *cdev)
+{
+ if (cdev->version == 30)
+ return;
+ if (!cdev->is_peripheral)
+ return;
+
+ m_can_write(cdev, M_CAN_TXBAR, cdev->tx_peripheral_submit);
+ cdev->tx_peripheral_submit = 0;
+}
+
static void m_can_tx_work_queue(struct work_struct *ws)
{
struct m_can_tx_op *op = container_of(ws, struct m_can_tx_op, work);
@@ -1857,11 +1876,15 @@ static void m_can_tx_work_queue(struct work_struct *ws)

op->skb = NULL;
m_can_tx_handler(cdev, skb);
+ if (op->submit)
+ m_can_tx_submit(cdev);
}

-static void m_can_tx_queue_skb(struct m_can_classdev *cdev, struct sk_buff *skb)
+static void m_can_tx_queue_skb(struct m_can_classdev *cdev, struct sk_buff *skb,
+ bool submit)
{
cdev->tx_ops[cdev->next_tx_op].skb = skb;
+ cdev->tx_ops[cdev->next_tx_op].submit = submit;
queue_work(cdev->tx_wq, &cdev->tx_ops[cdev->next_tx_op].work);

++cdev->next_tx_op;
@@ -1872,7 +1895,17 @@ static void m_can_tx_queue_skb(struct m_can_classdev *cdev, struct sk_buff *skb)
static netdev_tx_t m_can_start_peripheral_xmit(struct m_can_classdev *cdev,
struct sk_buff *skb)
{
- m_can_tx_queue_skb(cdev, skb);
+ bool submit;
+
+ ++cdev->nr_txs_without_submit;
+ if (cdev->nr_txs_without_submit >= cdev->tx_max_coalesced_frames ||
+ !netdev_xmit_more()) {
+ cdev->nr_txs_without_submit = 0;
+ submit = true;
+ } else {
+ submit = false;
+ }
+ m_can_tx_queue_skb(cdev, skb, submit);

return NETDEV_TX_OK;
}
@@ -2015,6 +2048,7 @@ static int m_can_get_coalesce(struct net_device *dev,

ec->rx_max_coalesced_frames_irq = cdev->rx_max_coalesced_frames_irq;
ec->rx_coalesce_usecs_irq = cdev->rx_coalesce_usecs_irq;
+ ec->tx_max_coalesced_frames = cdev->tx_max_coalesced_frames;
ec->tx_max_coalesced_frames_irq = cdev->tx_max_coalesced_frames_irq;
ec->tx_coalesce_usecs_irq = cdev->tx_coalesce_usecs_irq;

@@ -2059,6 +2093,18 @@ static int m_can_set_coalesce(struct net_device *dev,
netdev_err(dev, "tx-frames-irq and tx-usecs-irq can only be set together\n");
return -EINVAL;
}
+ if (ec->tx_max_coalesced_frames > cdev->mcfg[MRAM_TXE].num) {
+ netdev_err(dev, "tx-frames %u greater than the TX event FIFO %u\n",
+ ec->tx_max_coalesced_frames,
+ cdev->mcfg[MRAM_TXE].num);
+ return -EINVAL;
+ }
+ if (ec->tx_max_coalesced_frames > cdev->mcfg[MRAM_TXB].num) {
+ netdev_err(dev, "tx-frames %u greater than the TX FIFO %u\n",
+ ec->tx_max_coalesced_frames,
+ cdev->mcfg[MRAM_TXB].num);
+ return -EINVAL;
+ }
if (ec->rx_coalesce_usecs_irq != 0 && ec->tx_coalesce_usecs_irq != 0 &&
ec->rx_coalesce_usecs_irq != ec->tx_coalesce_usecs_irq) {
netdev_err(dev, "rx-usecs-irq %u needs to be equal to tx-usecs-irq %u if both are enabled\n",
@@ -2069,6 +2115,7 @@ static int m_can_set_coalesce(struct net_device *dev,

cdev->rx_max_coalesced_frames_irq = ec->rx_max_coalesced_frames_irq;
cdev->rx_coalesce_usecs_irq = ec->rx_coalesce_usecs_irq;
+ cdev->tx_max_coalesced_frames = ec->tx_max_coalesced_frames;
cdev->tx_max_coalesced_frames_irq = ec->tx_max_coalesced_frames_irq;
cdev->tx_coalesce_usecs_irq = ec->tx_coalesce_usecs_irq;

@@ -2086,6 +2133,7 @@ static const struct ethtool_ops m_can_ethtool_ops = {
.supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS_IRQ |
ETHTOOL_COALESCE_RX_MAX_FRAMES_IRQ |
ETHTOOL_COALESCE_TX_USECS_IRQ |
+ ETHTOOL_COALESCE_TX_MAX_FRAMES |
ETHTOOL_COALESCE_TX_MAX_FRAMES_IRQ,
.get_ts_info = ethtool_op_get_ts_info,
.get_coalesce = m_can_get_coalesce,
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index 76b1ce1b7c1b..2986c4ce0b2f 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -74,6 +74,7 @@ struct m_can_tx_op {
struct m_can_classdev *cdev;
struct work_struct work;
struct sk_buff *skb;
+ bool submit;
};

struct m_can_classdev {
@@ -102,6 +103,7 @@ struct m_can_classdev {
u32 active_interrupts;
u32 rx_max_coalesced_frames_irq;
u32 rx_coalesce_usecs_irq;
+ u32 tx_max_coalesced_frames;
u32 tx_max_coalesced_frames_irq;
u32 tx_coalesce_usecs_irq;

@@ -116,6 +118,10 @@ struct m_can_classdev {
int tx_fifo_size;
int next_tx_op;

+ int nr_txs_without_submit;
+ /* bitfield of fifo elements that will be submitted together */
+ u32 tx_peripheral_submit;
+
struct mram_cfg mcfg[MRAM_CFG_NUM];

struct hrtimer hrtimer;
--
2.43.0


Subject: [PATCH 09/14] can: m_can: Cache tx putidx

m_can_tx_handler() is the only place where data is written to the TX
FIFO. We can calculate the putidx in the driver here to avoid the
dependency on the TXFQS register.
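
The cached index is advanced with a simple wrap-around, roughly
(illustrative, the actual code is in the diff below):

  /* advance the cached put index instead of re-reading TXFQS */
  u32 putidx = cdev->tx_fifo_putidx;

  cdev->tx_fifo_putidx = (cdev->tx_fifo_putidx + 1 >= cdev->can.echo_skb_max) ?
	  0 : cdev->tx_fifo_putidx + 1;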

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
---
drivers/net/can/m_can/m_can.c | 8 +++++++-
drivers/net/can/m_can/m_can.h | 3 +++
2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 1b62613f195c..a8e7b910ef81 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1504,6 +1504,10 @@ static int m_can_start(struct net_device *dev)

m_can_enable_all_interrupts(cdev);

+ if (cdev->version > 30)
+ cdev->tx_fifo_putidx = FIELD_GET(TXFQS_TFQPI_MASK,
+ m_can_read(cdev, M_CAN_TXFQS));
+
return 0;
}

@@ -1793,7 +1797,7 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
}

/* get put index for frame */
- putidx = FIELD_GET(TXFQS_TFQPI_MASK, txfqs);
+ putidx = cdev->tx_fifo_putidx;

/* Construct DLC Field, with CAN-FD configuration.
* Use the put index of the fifo as the message marker,
@@ -1827,6 +1831,8 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)

/* Enable TX FIFO element to start transfer */
m_can_write(cdev, M_CAN_TXBAR, (1 << putidx));
+ cdev->tx_fifo_putidx = (++cdev->tx_fifo_putidx >= cdev->can.echo_skb_max ?
+ 0 : cdev->tx_fifo_putidx);

/* stop network queue if fifo full */
if (m_can_tx_fifo_full(cdev) ||
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index 1e461d305bce..0de42fc5ef1e 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -101,6 +101,9 @@ struct m_can_classdev {
u32 tx_max_coalesced_frames_irq;
u32 tx_coalesce_usecs_irq;

+ // Store this internally to avoid fetch delays on peripheral chips
+ int tx_fifo_putidx;
+
struct mram_cfg mcfg[MRAM_CFG_NUM];

struct hrtimer hrtimer;
--
2.43.0


Subject: [PATCH 08/14] can: m_can: Use u32 for putidx

putidx is not a signed integer; it is an unsigned field used in
hardware registers. Use a u32 for it.

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
---
drivers/net/can/m_can/m_can.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index b31df3e3ceeb..1b62613f195c 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -486,7 +486,7 @@ static void m_can_clean(struct net_device *net)
struct m_can_classdev *cdev = netdev_priv(net);

if (cdev->tx_skb) {
- int putidx = 0;
+ u32 putidx = 0;

net->stats.tx_errors++;
if (cdev->version > 30)
@@ -1695,12 +1695,12 @@ static int m_can_close(struct net_device *dev)
return 0;
}

-static int m_can_next_echo_skb_occupied(struct net_device *dev, int putidx)
+static int m_can_next_echo_skb_occupied(struct net_device *dev, u32 putidx)
{
struct m_can_classdev *cdev = netdev_priv(dev);
/*get wrap around for loopback skb index */
unsigned int wrap = cdev->can.echo_skb_max;
- int next_idx;
+ u32 next_idx;

/* calculate next index */
next_idx = (++putidx >= wrap ? 0 : putidx);
@@ -1719,7 +1719,7 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
u32 cccr, fdflags;
u32 txfqs;
int err;
- int putidx;
+ u32 putidx;

cdev->tx_skb = NULL;

--
2.43.0


Subject: [PATCH 11/14] can: m_can: Introduce a tx_fifo_in_flight counter

Keep track of the number of transmits in flight.

This patch prepares the driver to control the network interface queue
based on this counter. By itself this counter could be implemented with
an atomic, but as we need to do other things in the critical sections
later I am using a spinlock instead.
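
For comparison, the atomic-only variant would look roughly like this
(hypothetical sketch, not what this series does):

  #include <linux/atomic.h>

  static atomic_t tx_fifo_in_flight = ATOMIC_INIT(0);

  static void example_start_tx(void)
  {
	  /* called from the ndo_start_xmit() path */
	  atomic_inc(&tx_fifo_in_flight);
  }

  static void example_finish_tx(int transmitted)
  {
	  /* called from the TX-complete paths */
	  atomic_sub(transmitted, &tx_fifo_in_flight);
  }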

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 30 ++++++++++++++++++++++++++++++
drivers/net/can/m_can/m_can.h | 4 ++++
2 files changed, 34 insertions(+)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 8d7dbf2eb46c..2c68b1a60887 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -484,6 +484,7 @@ static u32 m_can_get_timestamp(struct m_can_classdev *cdev)
static void m_can_clean(struct net_device *net)
{
struct m_can_classdev *cdev = netdev_priv(net);
+ unsigned long irqflags;

if (cdev->tx_ops) {
for (int i = 0; i != cdev->tx_fifo_size; ++i) {
@@ -497,6 +498,10 @@ static void m_can_clean(struct net_device *net)

for (int i = 0; i != cdev->can.echo_skb_max; ++i)
can_free_echo_skb(cdev->net, i, NULL);
+
+ spin_lock_irqsave(&cdev->tx_handling_spinlock, irqflags);
+ cdev->tx_fifo_in_flight = 0;
+ spin_unlock_irqrestore(&cdev->tx_handling_spinlock, irqflags);
}

/* For peripherals, pass skb to rx-offload, which will push skb from
@@ -1067,6 +1072,24 @@ static void m_can_tx_update_stats(struct m_can_classdev *cdev,
stats->tx_packets++;
}

+static void m_can_finish_tx(struct m_can_classdev *cdev, int transmitted)
+{
+ unsigned long irqflags;
+
+ spin_lock_irqsave(&cdev->tx_handling_spinlock, irqflags);
+ cdev->tx_fifo_in_flight -= transmitted;
+ spin_unlock_irqrestore(&cdev->tx_handling_spinlock, irqflags);
+}
+
+static void m_can_start_tx(struct m_can_classdev *cdev)
+{
+ unsigned long irqflags;
+
+ spin_lock_irqsave(&cdev->tx_handling_spinlock, irqflags);
+ ++cdev->tx_fifo_in_flight;
+ spin_unlock_irqrestore(&cdev->tx_handling_spinlock, irqflags);
+}
+
static int m_can_echo_tx_event(struct net_device *dev)
{
u32 txe_count = 0;
@@ -1076,6 +1099,7 @@ static int m_can_echo_tx_event(struct net_device *dev)
int i = 0;
int err = 0;
unsigned int msg_mark;
+ int processed = 0;

struct m_can_classdev *cdev = netdev_priv(dev);

@@ -1105,12 +1129,15 @@ static int m_can_echo_tx_event(struct net_device *dev)

/* update stats */
m_can_tx_update_stats(cdev, msg_mark, timestamp);
+ ++processed;
}

if (ack_fgi != -1)
m_can_write(cdev, M_CAN_TXEFA, FIELD_PREP(TXEFA_EFAI_MASK,
ack_fgi));

+ m_can_finish_tx(cdev, processed);
+
return err;
}

@@ -1192,6 +1219,7 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
timestamp = m_can_get_timestamp(cdev);
m_can_tx_update_stats(cdev, 0, timestamp);
netif_wake_queue(dev);
+ m_can_finish_tx(cdev, 1);
}
} else {
if (ir & (IR_TEFN | IR_TEFW)) {
@@ -1890,6 +1918,8 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
return NETDEV_TX_OK;
}

+ m_can_start_tx(cdev);
+
if (cdev->is_peripheral)
return m_can_start_peripheral_xmit(cdev, skb);
else
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index be1d2119bd53..76b1ce1b7c1b 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -108,6 +108,10 @@ struct m_can_classdev {
// Store this internally to avoid fetch delays on peripheral chips
u32 tx_fifo_putidx;

+ /* Protects shared state between start_xmit and m_can_isr */
+ spinlock_t tx_handling_spinlock;
+ int tx_fifo_in_flight;
+
struct m_can_tx_op *tx_ops;
int tx_fifo_size;
int next_tx_op;
--
2.43.0