Subject: [PATCH 00/18] can: m_can: Optimizations for m_can/tcan part 2

Hi Marc and everyone,

this is the second part now. I know the merge window is open right now,
but I am quite sure this won't be merged immediately anyway, so if you
have some time for comments I would appreciate it. The series is still
based on v6.1-rc8 plus the patches that were already applied.

I tried to keep the individual patches as small as possible so they are
easier to understand. The series changed a lot compared to the v1 I
sent, so I didn't call it v2. There are a lot of new patches as well.

The series starts with a few small fixes and optimizations, then adds
coalescing support, and finally removes the restriction on the number
of parallel transmits in flight.

Note that the last patch, 'Implement transmit submission coalescing',
does not perform well for me in a loopback test setup. However, I think
it may work well in normal test cases. I attached this mechanism to the
tx-frames coalescing option; let me know if this is the correct option.

Best,
Markus

part 1:
v1 - https://lore.kernel.org/lkml/[email protected]
v2 - https://lore.kernel.org/lkml/[email protected]

Markus Schneider-Pargmann (18):
can: tcan4x5x: Remove reserved register 0x814 from writable table
can: tcan4x5x: Check size of mram configuration
can: m_can: Remove repeated check for is_peripheral
can: m_can: Always acknowledge all interrupts
can: m_can: Remove double interrupt enable
can: m_can: Disable unused interrupts
can: m_can: Keep interrupts enabled during peripheral read
can: m_can: Write transmit header and data in one transaction
can: m_can: Implement receive coalescing
can: m_can: Implement transmit coalescing
can: m_can: Add rx coalescing ethtool support
can: m_can: Add tx coalescing ethtool support
can: m_can: Cache tx putidx
can: m_can: Use the workqueue as queue
can: m_can: Introduce a tx_fifo_in_flight counter
can: m_can: Use tx_fifo_in_flight for netif_queue control
can: m_can: Implement BQL
can: m_can: Implement transmit submission coalescing

drivers/net/can/m_can/m_can.c | 498 ++++++++++++++++++------
drivers/net/can/m_can/m_can.h | 36 +-
drivers/net/can/m_can/tcan4x5x-core.c | 5 +
drivers/net/can/m_can/tcan4x5x-regmap.c | 1 -
4 files changed, 418 insertions(+), 122 deletions(-)

--
2.38.1


Subject: [PATCH 04/18] can: m_can: Always acknowledge all interrupts

The code already exits the function on !ir before this condition, so ir
is known to be non-zero here. As IR_ALL_INT is 0xffffffff, there is no
need to check again whether any interrupt bit is set.
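
For reference, the surrounding flow in m_can_isr() looks roughly like
this (abridged sketch, not part of the diff):

    ir = m_can_read(cdev, M_CAN_IR);
    if (!ir)
        return IRQ_NONE;

    /* ir is non-zero here; since IR_ALL_INT is 0xffffffff,
     * (ir & IR_ALL_INT) can never be false at this point.
     */
    m_can_write(cdev, M_CAN_IR, ir);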

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index a43abc667757..4b387403a7c7 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1073,8 +1073,7 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
return IRQ_NONE;

/* ACK all irqs */
- if (ir & IR_ALL_INT)
- m_can_write(cdev, M_CAN_IR, ir);
+ m_can_write(cdev, M_CAN_IR, ir);

if (cdev->ops->clear_interrupts)
cdev->ops->clear_interrupts(cdev);
--
2.38.1

Subject: [PATCH 11/18] can: m_can: Add rx coalescing ethtool support

Add the possibility to set coalescing parameters with ethtool.

rx-frames-irq and rx-usecs-irq can only be set and unset together as the
implemented mechanism would not work otherwise. rx-frames-irq can't be
greater than the RX FIFO size.

Also, all values can only be changed while the chip is not active.
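
For example, assuming the interface is named can0 and is currently
down, the new parameters could be set like this (the values are made
up; rx-frames-irq must not exceed the RX FIFO size):

    # interrupt after 32 frames or 500 usecs, whichever comes first
    ethtool -C can0 rx-frames-irq 32 rx-usecs-irq 500

    # switch coalescing off again; both values must be zeroed together
    ethtool -C can0 rx-frames-irq 0 rx-usecs-irq 0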

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 46 +++++++++++++++++++++++++++++++++++
1 file changed, 46 insertions(+)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index a84a17386c02..4d6fc8ade4d6 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1909,8 +1909,54 @@ static const struct net_device_ops m_can_netdev_ops = {
.ndo_change_mtu = can_change_mtu,
};

+static int m_can_get_coalesce(struct net_device *dev,
+ struct ethtool_coalesce *ec,
+ struct kernel_ethtool_coalesce *kec,
+ struct netlink_ext_ack *ext_ack)
+{
+ struct m_can_classdev *cdev = netdev_priv(dev);
+
+ ec->rx_max_coalesced_frames_irq = cdev->rx_max_coalesced_frames_irq;
+ ec->rx_coalesce_usecs_irq = cdev->rx_coalesce_usecs_irq;
+
+ return 0;
+}
+
+static int m_can_set_coalesce(struct net_device *dev,
+ struct ethtool_coalesce *ec,
+ struct kernel_ethtool_coalesce *kec,
+ struct netlink_ext_ack *ext_ack)
+{
+ struct m_can_classdev *cdev = netdev_priv(dev);
+
+ if (cdev->can.state != CAN_STATE_STOPPED) {
+ netdev_err(dev, "Device is in use, please shut it down first\n");
+ return -EBUSY;
+ }
+
+ if (ec->rx_max_coalesced_frames_irq > cdev->mcfg[MRAM_RXF0].num) {
+ netdev_err(dev, "rx-frames-irq (%u) greater than the RX FIFO (%u)\n",
+ ec->rx_max_coalesced_frames_irq,
+ cdev->mcfg[MRAM_RXF0].num);
+ return -EINVAL;
+ }
+ if ((ec->rx_max_coalesced_frames_irq == 0) != (ec->rx_coalesce_usecs_irq == 0)) {
+ netdev_err(dev, "rx-frames-irq and rx-usecs-irq can only be set together\n");
+ return -EINVAL;
+ }
+
+ cdev->rx_max_coalesced_frames_irq = ec->rx_max_coalesced_frames_irq;
+ cdev->rx_coalesce_usecs_irq = ec->rx_coalesce_usecs_irq;
+
+ return 0;
+}
+
static const struct ethtool_ops m_can_ethtool_ops = {
+ .supported_coalesce_params = ETHTOOL_COALESCE_RX_USECS_IRQ |
+ ETHTOOL_COALESCE_RX_MAX_FRAMES_IRQ,
.get_ts_info = ethtool_op_get_ts_info,
+ .get_coalesce = m_can_get_coalesce,
+ .set_coalesce = m_can_set_coalesce,
};

static int register_m_can_dev(struct net_device *dev)
--
2.38.1

Subject: [PATCH 14/18] can: m_can: Use the workqueue as queue

The current implementation uses a workqueue to submit transmits for
peripheral chips. Only a single work item exists and is queued at any
time.

To be able to keep more than one transmit in flight, prepare the
workqueue to hold multiple work items at the same time.

Each work item now has separate storage for an skb and a pointer to the
cdev. This ensures that each work item can be processed individually.

The workqueue is replaced by an ordered workqueue, which makes sure
that only a single worker processes the queued items, in the order they
were enqueued. This removes most of the concurrency a workqueue
normally offers, which is not necessary for this driver.

The cleanup functions have to be adapted a bit to handle this new
mechanism.
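
In outline, submission round-robins over the preallocated work items;
this is just a sketch of the m_can_tx_queue_skb() helper added below:

    /* Store the skb in the next free slot and queue its work item */
    struct m_can_tx_op *op = &cdev->tx_ops[cdev->next_tx_op];

    op->skb = skb;
    queue_work(cdev->tx_wq, &op->work);

    /* Advance the slot index, wrapping around after nr_tx_ops */
    if (++cdev->next_tx_op >= cdev->nr_tx_ops)
        cdev->next_tx_op = 0;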

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 109 ++++++++++++++++++++--------------
drivers/net/can/m_can/m_can.h | 12 +++-
2 files changed, 74 insertions(+), 47 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 5b882c2fec52..42cde31fc0a8 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -442,17 +442,16 @@ static void m_can_clean(struct net_device *net)
{
struct m_can_classdev *cdev = netdev_priv(net);

- if (cdev->tx_skb) {
- int putidx = 0;
+ for (int i = 0; i != cdev->nr_tx_ops; ++i) {
+ if (!cdev->tx_ops[i].skb)
+ continue;

net->stats.tx_errors++;
- if (cdev->version > 30)
- putidx = FIELD_GET(TXFQS_TFQPI_MASK,
- m_can_read(cdev, M_CAN_TXFQS));
-
- can_free_echo_skb(cdev->net, putidx, NULL);
- cdev->tx_skb = NULL;
+ cdev->tx_ops[i].skb = NULL;
}
+
+ for (int i = 0; i != cdev->can.echo_skb_max; ++i)
+ can_free_echo_skb(cdev->net, i, NULL);
}

/* For peripherals, pass skb to rx-offload, which will push skb from
@@ -1632,8 +1631,9 @@ static int m_can_close(struct net_device *dev)
m_can_clk_stop(cdev);
free_irq(dev->irq, dev);

+ m_can_clean(dev);
+
if (cdev->is_peripheral) {
- cdev->tx_skb = NULL;
destroy_workqueue(cdev->tx_wq);
cdev->tx_wq = NULL;
can_rx_offload_disable(&cdev->offload);
@@ -1660,19 +1660,17 @@ static int m_can_next_echo_skb_occupied(struct net_device *dev, int putidx)
return !!cdev->can.echo_skb[next_idx];
}

-static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)
+static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev,
+ struct sk_buff *skb)
{
- struct canfd_frame *cf = (struct canfd_frame *)cdev->tx_skb->data;
+ struct canfd_frame *cf = (struct canfd_frame *)skb->data;
struct net_device *dev = cdev->net;
- struct sk_buff *skb = cdev->tx_skb;
struct id_and_dlc fifo_header;
u32 cccr, fdflags;
u32 txfqs;
int err;
int putidx;

- cdev->tx_skb = NULL;
-
/* Generate ID field for TX buffer Element */
/* Common to all supported M_CAN versions */
if (cf->can_id & CAN_EFF_FLAG) {
@@ -1796,10 +1794,36 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev)

static void m_can_tx_work_queue(struct work_struct *ws)
{
- struct m_can_classdev *cdev = container_of(ws, struct m_can_classdev,
- tx_work);
+ struct m_can_tx_op *op = container_of(ws, struct m_can_tx_op, work);
+ struct m_can_classdev *cdev = op->cdev;
+ struct sk_buff *skb = op->skb;

- m_can_tx_handler(cdev);
+ op->skb = NULL;
+ m_can_tx_handler(cdev, skb);
+}
+
+static void m_can_tx_queue_skb(struct m_can_classdev *cdev, struct sk_buff *skb)
+{
+ cdev->tx_ops[cdev->next_tx_op].skb = skb;
+ queue_work(cdev->tx_wq, &cdev->tx_ops[cdev->next_tx_op].work);
+
+ ++cdev->next_tx_op;
+ if (cdev->next_tx_op >= cdev->nr_tx_ops)
+ cdev->next_tx_op = 0;
+}
+
+static netdev_tx_t m_can_start_peripheral_xmit(struct m_can_classdev *cdev,
+ struct sk_buff *skb)
+{
+ if (cdev->can.state == CAN_STATE_BUS_OFF) {
+ m_can_clean(cdev->net);
+ return NETDEV_TX_OK;
+ }
+
+ netif_stop_queue(cdev->net);
+ m_can_tx_queue_skb(cdev, skb);
+
+ return NETDEV_TX_OK;
}

static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
@@ -1810,30 +1834,10 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
if (can_dev_dropped_skb(dev, skb))
return NETDEV_TX_OK;

- if (cdev->is_peripheral) {
- if (cdev->tx_skb) {
- netdev_err(dev, "hard_xmit called while tx busy\n");
- return NETDEV_TX_BUSY;
- }
-
- if (cdev->can.state == CAN_STATE_BUS_OFF) {
- m_can_clean(dev);
- } else {
- /* Need to stop the queue to avoid numerous requests
- * from being sent. Suggested improvement is to create
- * a queueing mechanism that will queue the skbs and
- * process them in order.
- */
- cdev->tx_skb = skb;
- netif_stop_queue(cdev->net);
- queue_work(cdev->tx_wq, &cdev->tx_work);
- }
- } else {
- cdev->tx_skb = skb;
- return m_can_tx_handler(cdev);
- }
-
- return NETDEV_TX_OK;
+ if (cdev->is_peripheral)
+ return m_can_start_peripheral_xmit(cdev, skb);
+ else
+ return m_can_tx_handler(cdev, skb);
}

static int m_can_open(struct net_device *dev)
@@ -1861,15 +1865,17 @@ static int m_can_open(struct net_device *dev)

/* register interrupt handler */
if (cdev->is_peripheral) {
- cdev->tx_skb = NULL;
- cdev->tx_wq = alloc_workqueue("mcan_wq",
- WQ_FREEZABLE | WQ_MEM_RECLAIM, 0);
+ cdev->tx_wq = alloc_ordered_workqueue("mcan_wq",
+ WQ_FREEZABLE | WQ_MEM_RECLAIM);
if (!cdev->tx_wq) {
err = -ENOMEM;
goto out_wq_fail;
}

- INIT_WORK(&cdev->tx_work, m_can_tx_work_queue);
+ for (int i = 0; i != cdev->nr_tx_ops; ++i) {
+ cdev->tx_ops[i].cdev = cdev;
+ INIT_WORK(&cdev->tx_ops[i].work, m_can_tx_work_queue);
+ }

err = request_threaded_irq(dev->irq, NULL, m_can_isr,
IRQF_ONESHOT,
@@ -2153,6 +2159,19 @@ int m_can_class_register(struct m_can_classdev *cdev)
{
int ret;

+ if (cdev->is_peripheral) {
+ cdev->nr_tx_ops = min(cdev->mcfg[MRAM_TXB].num,
+ cdev->mcfg[MRAM_TXE].num);
+ cdev->tx_ops =
+ devm_kzalloc(cdev->dev,
+ cdev->nr_tx_ops * sizeof(*cdev->tx_ops),
+ GFP_KERNEL);
+ if (!cdev->tx_ops) {
+ dev_err(cdev->dev, "Failed to allocate tx_ops for workqueue\n");
+ return -ENOMEM;
+ }
+ }
+
if (cdev->pm_clock_support) {
ret = m_can_clk_start(cdev);
if (ret)
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index 185289a3719c..bf2d710c982f 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -70,6 +70,12 @@ struct m_can_ops {
int (*init)(struct m_can_classdev *cdev);
};

+struct m_can_tx_op {
+ struct m_can_classdev *cdev;
+ struct work_struct work;
+ struct sk_buff *skb;
+};
+
struct m_can_classdev {
struct can_priv can;
struct can_rx_offload offload;
@@ -80,8 +86,6 @@ struct m_can_classdev {
struct clk *cclk;

struct workqueue_struct *tx_wq;
- struct work_struct tx_work;
- struct sk_buff *tx_skb;
struct phy *transceiver;

struct hrtimer irq_timer;
@@ -105,6 +109,10 @@ struct m_can_classdev {
// Store this internally to avoid fetch delays on peripheral chips
int tx_fifo_putidx;

+ struct m_can_tx_op *tx_ops;
+ int nr_tx_ops;
+ int next_tx_op;
+
struct mram_cfg mcfg[MRAM_CFG_NUM];
};

--
2.38.1

Subject: [PATCH 17/18] can: m_can: Implement BQL

Implement byte queue limiting in preparation for the use of xmit_more().
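
The BQL core API, as used by the changes below, pairs each submitted
frame length with a completion (sketch only; frame_len comes from
can_skb_get_frame_len()):

    netdev_sent_queue(dev, frame_len);        /* in .ndo_start_xmit */
    netdev_completed_queue(dev, pkts, bytes); /* on TX completion */
    netdev_reset_queue(dev);                  /* when flushing queued TX */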

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 43 ++++++++++++++++++++++++++---------
1 file changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 076fa60317c2..719a7dfe154a 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -443,6 +443,8 @@ static void m_can_clean(struct net_device *net)
for (int i = 0; i != cdev->can.echo_skb_max; ++i)
can_free_echo_skb(cdev->net, i, NULL);

+ netdev_reset_queue(cdev->net);
+
spin_lock(&cdev->tx_handling_spinlock);
cdev->tx_fifo_in_flight = 0;
spin_unlock(&cdev->tx_handling_spinlock);
@@ -988,23 +990,25 @@ static int m_can_poll(struct napi_struct *napi, int quota)
* echo. timestamp is used for peripherals to ensure correct ordering
* by rx-offload, and is ignored for non-peripherals.
*/
-static void m_can_tx_update_stats(struct m_can_classdev *cdev,
- unsigned int msg_mark,
- u32 timestamp)
+static unsigned int m_can_tx_update_stats(struct m_can_classdev *cdev,
+ unsigned int msg_mark, u32 timestamp)
{
struct net_device *dev = cdev->net;
struct net_device_stats *stats = &dev->stats;
+ unsigned int frame_len;

if (cdev->is_peripheral)
stats->tx_bytes +=
can_rx_offload_get_echo_skb(&cdev->offload,
msg_mark,
timestamp,
- NULL);
+ &frame_len);
else
- stats->tx_bytes += can_get_echo_skb(dev, msg_mark, NULL);
+ stats->tx_bytes += can_get_echo_skb(dev, msg_mark, &frame_len);

stats->tx_packets++;
+
+ return frame_len;
}

static int m_can_echo_tx_event(struct net_device *dev)
@@ -1017,6 +1021,7 @@ static int m_can_echo_tx_event(struct net_device *dev)
int err = 0;
unsigned int msg_mark;
int processed = 0;
+ int processed_frame_len = 0;

struct m_can_classdev *cdev = netdev_priv(dev);

@@ -1045,10 +1050,14 @@ static int m_can_echo_tx_event(struct net_device *dev)
fgi = (++fgi >= cdev->mcfg[MRAM_TXE].num ? 0 : fgi);

/* update stats */
- m_can_tx_update_stats(cdev, msg_mark, timestamp);
+ processed_frame_len += m_can_tx_update_stats(cdev, msg_mark,
+ timestamp);
+
++processed;
}

+ netdev_completed_queue(cdev->net, processed, processed_frame_len);
+
if (ack_fgi != -1)
m_can_write(cdev, M_CAN_TXEFA, FIELD_PREP(TXEFA_EFAI_MASK,
ack_fgi));
@@ -1148,10 +1157,12 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
if (ir & IR_TC) {
/* Transmission Complete Interrupt*/
u32 timestamp = 0;
+ unsigned int frame_len;

if (cdev->is_peripheral)
timestamp = m_can_get_timestamp(cdev);
- m_can_tx_update_stats(cdev, 0, timestamp);
+ frame_len = m_can_tx_update_stats(cdev, 0, timestamp);
+ netdev_completed_queue(cdev->net, 1, frame_len);
netif_wake_queue(dev);
}
} else {
@@ -1667,6 +1678,7 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev,
u32 cccr, fdflags;
int err;
int putidx;
+ unsigned int frame_len = can_skb_get_frame_len(skb);

/* Generate ID field for TX buffer Element */
/* Common to all supported M_CAN versions */
@@ -1712,7 +1724,7 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev,
}
m_can_write(cdev, M_CAN_TXBTIE, 0x1);

- can_put_echo_skb(skb, dev, 0, 0);
+ can_put_echo_skb(skb, dev, 0, frame_len);

m_can_write(cdev, M_CAN_TXBAR, 0x1);
/* End of xmit function for version 3.0.x */
@@ -1750,7 +1762,7 @@ static netdev_tx_t m_can_tx_handler(struct m_can_classdev *cdev,
/* Push loopback echo.
* Will be looped back on TX interrupt based on message marker
*/
- can_put_echo_skb(skb, dev, putidx, 0);
+ can_put_echo_skb(skb, dev, putidx, frame_len);

/* Enable TX FIFO element to start transfer */
m_can_write(cdev, M_CAN_TXBAR, (1 << putidx));
@@ -1833,14 +1845,23 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
struct m_can_classdev *cdev = netdev_priv(dev);
+ netdev_tx_t ret;
+ unsigned int frame_len;

if (can_dev_dropped_skb(dev, skb))
return NETDEV_TX_OK;

+ frame_len = can_skb_get_frame_len(skb);
+
if (cdev->is_peripheral)
- return m_can_start_peripheral_xmit(cdev, skb);
+ ret = m_can_start_peripheral_xmit(cdev, skb);
else
- return m_can_start_fast_xmit(cdev, skb);
+ ret = m_can_start_fast_xmit(cdev, skb);
+
+ if (ret == NETDEV_TX_OK)
+ netdev_sent_queue(dev, frame_len);
+
+ return ret;
}

static int m_can_open(struct net_device *dev)
--
2.38.1

Subject: [PATCH 15/18] can: m_can: Introduce a tx_fifo_in_flight counter

Keep track of the number of transmits in flight.

This patch prepares the driver to control the network interface queue
based on this counter. By itself this counter could be implemented with
an atomic, but as we need to do other things in the same critical
sections later, I am using a spinlock instead.
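
For illustration, the atomic-only variant rejected here would look
roughly like this (sketch, not used by this patch):

    atomic_inc(&cdev->tx_fifo_in_flight);            /* on submit */
    atomic_sub(processed, &cdev->tx_fifo_in_flight); /* on completion */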

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 27 ++++++++++++++++++++++++++-
drivers/net/can/m_can/m_can.h | 4 ++++
2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 42cde31fc0a8..90c9ff474149 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -452,6 +452,10 @@ static void m_can_clean(struct net_device *net)

for (int i = 0; i != cdev->can.echo_skb_max; ++i)
can_free_echo_skb(cdev->net, i, NULL);
+
+ spin_lock(&cdev->tx_handling_spinlock);
+ cdev->tx_fifo_in_flight = 0;
+ spin_unlock(&cdev->tx_handling_spinlock);
}

/* For peripherals, pass skb to rx-offload, which will push skb from
@@ -1022,6 +1026,7 @@ static int m_can_echo_tx_event(struct net_device *dev)
int i = 0;
int err = 0;
unsigned int msg_mark;
+ int processed = 0;

struct m_can_classdev *cdev = netdev_priv(dev);

@@ -1051,12 +1056,17 @@ static int m_can_echo_tx_event(struct net_device *dev)

/* update stats */
m_can_tx_update_stats(cdev, msg_mark, timestamp);
+ ++processed;
}

if (ack_fgi != -1)
m_can_write(cdev, M_CAN_TXEFA, FIELD_PREP(TXEFA_EFAI_MASK,
ack_fgi));

+ spin_lock(&cdev->tx_handling_spinlock);
+ cdev->tx_fifo_in_flight -= processed;
+ spin_unlock(&cdev->tx_handling_spinlock);
+
return err;
}

@@ -1821,11 +1831,26 @@ static netdev_tx_t m_can_start_peripheral_xmit(struct m_can_classdev *cdev,
}

netif_stop_queue(cdev->net);
+
+ spin_lock(&cdev->tx_handling_spinlock);
+ ++cdev->tx_fifo_in_flight;
+ spin_unlock(&cdev->tx_handling_spinlock);
+
m_can_tx_queue_skb(cdev, skb);

return NETDEV_TX_OK;
}

+static netdev_tx_t m_can_start_fast_xmit(struct m_can_classdev *cdev,
+ struct sk_buff *skb)
+{
+ spin_lock(&cdev->tx_handling_spinlock);
+ ++cdev->tx_fifo_in_flight;
+ spin_unlock(&cdev->tx_handling_spinlock);
+
+ return m_can_tx_handler(cdev, skb);
+}
+
static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
struct net_device *dev)
{
@@ -1837,7 +1862,7 @@ static netdev_tx_t m_can_start_xmit(struct sk_buff *skb,
if (cdev->is_peripheral)
return m_can_start_peripheral_xmit(cdev, skb);
else
- return m_can_tx_handler(cdev, skb);
+ return m_can_start_fast_xmit(cdev, skb);
}

static int m_can_open(struct net_device *dev)
diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index bf2d710c982f..adbd4765accc 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -109,6 +109,10 @@ struct m_can_classdev {
// Store this internally to avoid fetch delays on peripheral chips
int tx_fifo_putidx;

+ /* Protects shared state between start_xmit and m_can_isr */
+ spinlock_t tx_handling_spinlock;
+ int tx_fifo_in_flight;
+
struct m_can_tx_op *tx_ops;
int nr_tx_ops;
int next_tx_op;
--
2.38.1

Subject: [PATCH 09/18] can: m_can: Implement receive coalescing

m_can offers the possibility to raise an interrupt when the receive
FIFO reaches a watermark level. This can be used to implement
coalescing. Unfortunately there is no hardware timeout available to
trigger an interrupt if only a few messages were received within a
given time. To solve this I am using an hrtimer to wake up the irq
thread after the configured rx-usecs-irq timeout.

The timer is started whenever receive coalescing is enabled and new
frames were received during an interrupt. The timer is stopped if no
new data was available while handling an interrupt.

While the timer is running, the new-item interrupt (IR_RF0N) is
disabled and the watermark interrupt (IR_RF0W) takes over. If the timer
is not started again, the new-item interrupt is enabled again,
notifying the handler about every new item received.

Signed-off-by: Markus Schneider-Pargmann <[email protected]>
---
drivers/net/can/m_can/m_can.c | 69 ++++++++++++++++++++++++++++++++---
drivers/net/can/m_can/m_can.h | 7 ++++
2 files changed, 70 insertions(+), 6 deletions(-)

diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 9b5ad222aef7..2e664313101b 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -1060,15 +1060,55 @@ static int m_can_echo_tx_event(struct net_device *dev)
return err;
}

+static void m_can_interrupt_enable(struct m_can_classdev *cdev, u32 interrupts)
+{
+ if (cdev->active_interrupts == interrupts)
+ return;
+ cdev->ops->write_reg(cdev, M_CAN_IE, interrupts);
+ cdev->active_interrupts = interrupts;
+}
+
+static void m_can_coalescing_disable(struct m_can_classdev *cdev)
+{
+ u32 new_interrupts = cdev->active_interrupts | IR_RF0N;
+
+ hrtimer_cancel(&cdev->irq_timer);
+ m_can_interrupt_enable(cdev, new_interrupts);
+}
+
+static void m_can_coalescing_update(struct m_can_classdev *cdev, u32 ir)
+{
+ u32 new_interrupts = cdev->active_interrupts;
+ bool enable_timer = false;
+
+ if (cdev->rx_coalesce_usecs_irq > 0 && (ir & (IR_RF0N | IR_RF0W))) {
+ enable_timer = true;
+ new_interrupts &= ~IR_RF0N;
+ } else if (!hrtimer_active(&cdev->irq_timer)) {
+ new_interrupts |= IR_RF0N;
+ }
+
+ m_can_interrupt_enable(cdev, new_interrupts);
+ if (enable_timer) {
+ hrtimer_start(&cdev->irq_timer,
+ ns_to_ktime(cdev->rx_coalesce_usecs_irq * NSEC_PER_USEC),
+ HRTIMER_MODE_REL);
+ }
+}
+
static irqreturn_t m_can_isr(int irq, void *dev_id)
{
struct net_device *dev = (struct net_device *)dev_id;
struct m_can_classdev *cdev = netdev_priv(dev);
u32 ir;

- if (pm_runtime_suspended(cdev->dev))
+ if (pm_runtime_suspended(cdev->dev)) {
+ m_can_coalescing_disable(cdev);
return IRQ_NONE;
+ }
+
ir = m_can_read(cdev, M_CAN_IR);
+ m_can_coalescing_update(cdev, ir);
if (!ir)
return IRQ_NONE;

@@ -1083,13 +1123,17 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
* - state change IRQ
* - bus error IRQ and bus error reporting
*/
- if ((ir & IR_RF0N) || (ir & IR_ERR_ALL_30X)) {
+ if (ir & (IR_RF0N | IR_RF0W | IR_ERR_ALL_30X)) {
cdev->irqstatus = ir;
if (!cdev->is_peripheral) {
m_can_disable_all_interrupts(cdev);
napi_schedule(&cdev->napi);
- } else if (m_can_rx_peripheral(dev, ir) < 0) {
- goto out_fail;
+ } else {
+ int pkts;
+
+ pkts = m_can_rx_peripheral(dev, ir);
+ if (pkts < 0)
+ goto out_fail;
}
}

@@ -1125,6 +1169,15 @@ static irqreturn_t m_can_isr(int irq, void *dev_id)
return IRQ_HANDLED;
}

+static enum hrtimer_restart m_can_irq_timer(struct hrtimer *timer)
+{
+ struct m_can_classdev *cdev = container_of(timer, struct m_can_classdev, irq_timer);
+
+ irq_wake_thread(cdev->net->irq, cdev->net);
+
+ return HRTIMER_NORESTART;
+}
+
static const struct can_bittiming_const m_can_bittiming_const_30X = {
.name = KBUILD_MODNAME,
.tseg1_min = 2, /* Time segment 1 = prop_seg + phase_seg1 */
@@ -1258,7 +1311,7 @@ static void m_can_chip_config(struct net_device *dev)
/* Disable unused interrupts */
interrupts &= ~(IR_ARA | IR_ELO | IR_DRX | IR_TEFF | IR_TEFW | IR_TFE |
IR_TCF | IR_HPM | IR_RF1F | IR_RF1W | IR_RF1N |
- IR_RF0F | IR_RF0W);
+ IR_RF0F);

m_can_config_endisable(cdev, true);

@@ -1302,6 +1355,7 @@ static void m_can_chip_config(struct net_device *dev)

/* rx fifo configuration, blocking mode, fifo size 1 */
m_can_write(cdev, M_CAN_RXF0C,
+ FIELD_PREP(RXFC_FWM_MASK, cdev->rx_max_coalesced_frames_irq) |
FIELD_PREP(RXFC_FS_MASK, cdev->mcfg[MRAM_RXF0].num) |
cdev->mcfg[MRAM_RXF0].off);

@@ -1360,7 +1414,7 @@ static void m_can_chip_config(struct net_device *dev)
else
interrupts &= ~(IR_ERR_LEC_31X);
}
- m_can_write(cdev, M_CAN_IE, interrupts);
+ m_can_interrupt_enable(cdev, interrupts);

/* route all interrupts to INT0 */
m_can_write(cdev, M_CAN_ILS, ILS_ALL_INT0);
@@ -2030,6 +2084,9 @@ int m_can_class_register(struct m_can_classdev *cdev)

of_can_transceiver(cdev->net);

+ hrtimer_init(&cdev->irq_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+ cdev->irq_timer.function = m_can_irq_timer;
+
dev_info(cdev->dev, "%s device registered (irq=%d, version=%d)\n",
KBUILD_MODNAME, cdev->net->irq, cdev->version);

diff --git a/drivers/net/can/m_can/m_can.h b/drivers/net/can/m_can/m_can.h
index d2c584232c1a..4943e1e9aff0 100644
--- a/drivers/net/can/m_can/m_can.h
+++ b/drivers/net/can/m_can/m_can.h
@@ -84,6 +84,8 @@ struct m_can_classdev {
struct sk_buff *tx_skb;
struct phy *transceiver;

+ struct hrtimer irq_timer;
+
struct m_can_ops *ops;

int version;
@@ -92,6 +94,11 @@ struct m_can_classdev {
int pm_clock_support;
int is_peripheral;

+ // Cached M_CAN_IE register content
+ u32 active_interrupts;
+ u32 rx_max_coalesced_frames_irq;
+ u32 rx_coalesce_usecs_irq;
+
struct mram_cfg mcfg[MRAM_CFG_NUM];
};

--
2.38.1

Subject: Re: [PATCH 00/18] can: m_can: Optimizations for m_can/tcan part 2

Hi everyone,

On Wed, Dec 21, 2022 at 04:25:19PM +0100, Markus Schneider-Pargmann wrote:
> Hi Marc and everyone,
>
> this is the second part now. I know the merge window is open right now,
> but I am quite sure this won't be merged immediately anyway, so if you
> have some time for comments I would appreciate it. The series is still
> based on v6.1-rc8 plus the patches that were already applied.
>
> I tried to keep the individual patches as small as possible so they are
> easier to understand. The series changed a lot compared to the v1 I
> sent, so I didn't call it v2. There are a lot of new patches as well.
>
> The series starts with a few small fixes and optimizations, then adds
> coalescing support, and finally removes the restriction on the number
> of parallel transmits in flight.
>
> Note that the last patch, 'Implement transmit submission coalescing',
> does not perform well for me in a loopback test setup. However, I think
> it may work well in normal test cases. I attached this mechanism to the
> tx-frames coalescing option; let me know if this is the correct option.

I introduced a bug with this series in the non-peripheral m_can driver,
and maybe for older m_can versions as well. No need to review at the
moment; I will send a new version once the issues are fixed.

Best,
Markus
