2021-05-09 12:53:00

by Dario Binacchi

Subject: [PATCH 0/3] can: c_can: cache frames to operate as a true FIFO


Performance tests of the c_can driver led to the patch that gives the
series its name. We have also added a patch for ethtool support and a
patch to remove a variable that is no longer used.


Dario Binacchi (3):
  can: c_can: remove the rxmasked unused variable
  can: c_can: add ethtool support
  can: c_can: cache frames to operate as a true FIFO

 drivers/net/can/c_can/Makefile                |  3 +
 drivers/net/can/c_can/c_can.h                 |  6 +-
 drivers/net/can/c_can/c_can_ethtool.c         | 46 +++++++++++++
 .../net/can/c_can/{c_can.c => c_can_main.c}   | 65 +++++++++++++++----
 4 files changed, 107 insertions(+), 13 deletions(-)
 create mode 100644 drivers/net/can/c_can/c_can_ethtool.c
 rename drivers/net/can/c_can/{c_can.c => c_can_main.c} (95%)

--
2.17.1


2021-05-09 12:53:25

by Dario Binacchi

Subject: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO

As noted in a comment in c_can_start_xmit(), the TX buffers did not
behave as a FIFO: the C/D_CAN controller sends out the buffers by
priority, and the lowest buffer number wins.

What did c_can_start_xmit() do if it found tx_active = 0x80000000? It
waited until the only frame in the FIFO had actually been transmitted by
the controller. With just one message in the FIFO, we still had to wait
for it to empty completely to ensure that the messages were transmitted
in the order in which they were loaded.

By storing frames in the FIFO without requesting their transmission, we
can use the full size of the FIFO even in cases such as the one described
above. The transmission interrupt triggers their transmission only when
all the messages previously loaded, but stored in lower-priority buffer
positions, have been transmitted.

Suggested-by: Gianluca Falavigna <[email protected]>
Signed-off-by: Dario Binacchi <[email protected]>


---
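The caching scheme described in the commit message can be modelled in user
space. The following is only a sketch under stated assumptions, not the
driver code: TX_NUM, struct tx_model and the model_*() helpers are
hypothetical names, model_fls()/model_ffs() are portable stand-ins for the
kernel's fls()/ffs(), and locking, atomics and queue handling are left out.

```c
#include <assert.h>
#include <stdint.h>

#define TX_NUM 32u /* assumed number of TX message objects (hypothetical) */

/* Stand-in for fls(): 1-based index of the highest set bit, 0 if x == 0. */
static unsigned int model_fls(uint32_t x)
{
	unsigned int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Stand-in for ffs(): 1-based index of the lowest set bit, 0 if x == 0. */
static unsigned int model_ffs(uint32_t x)
{
	unsigned int n = 1;

	if (!x)
		return 0;
	while (!(x & 1u)) {
		x >>= 1;
		n++;
	}
	return n;
}

struct tx_model {
	uint32_t tx_active;    /* objects with a transmission request pending */
	uint32_t tx_cached;    /* objects loaded, transmission not yet requested */
	unsigned int tx_avail; /* number of free objects */
};

static void model_init(struct tx_model *m)
{
	m->tx_active = 0;
	m->tx_cached = 0;
	m->tx_avail = TX_NUM;
}

/* Pick an object for a new frame. Returns its index, or -1 if none is free.
 * *cached is set to 1 when the frame is only stored, 0 when it is requested.
 */
static int model_xmit(struct tx_model *m, int *cached)
{
	unsigned int idx;

	if (m->tx_avail == 0)
		return -1;

	idx = model_fls(m->tx_active);
	*cached = 0;
	if (idx > TX_NUM - 1) {
		/* Last object still active: store the frame in a low object. */
		idx = model_fls(m->tx_cached);
		assert(idx < TX_NUM);
		*cached = 1;
		m->tx_cached |= 1u << idx;
	} else {
		m->tx_active |= 1u << idx;
	}
	m->tx_avail--;
	return (int)idx;
}

/* Complete the objects in 'done'. Returns the mask of cached objects whose
 * transmission gets requested because no active object is left.
 */
static uint32_t model_tx_done(struct tx_model *m, uint32_t done)
{
	uint32_t flushed = 0, pend;
	unsigned int idx;

	done &= m->tx_active;
	m->tx_active &= ~done;
	for (pend = done; (idx = model_ffs(pend)); pend &= ~(1u << (idx - 1)))
		m->tx_avail++;

	if (m->tx_active == 0) {
		flushed = m->tx_cached;
		m->tx_cached = 0;
		m->tx_active = flushed;
	}
	return flushed;
}
```

Starting from tx_active = 0x80000000, two calls to model_xmit() store frames
in objects 0 and 1 without requesting them; model_tx_done() for object 31
then moves both from tx_cached to tx_active.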

 drivers/net/can/c_can/c_can.h      |  3 ++
 drivers/net/can/c_can/c_can_main.c | 63 ++++++++++++++++++++++++------
 2 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/c_can/c_can.h b/drivers/net/can/c_can/c_can.h
index 4247ff80a29c..6abde6cbc0b1 100644
--- a/drivers/net/can/c_can/c_can.h
+++ b/drivers/net/can/c_can/c_can.h
@@ -191,6 +191,9 @@ struct c_can_priv {
 	unsigned int msg_obj_tx_last;
 	u32 msg_obj_rx_mask;
 	atomic_t tx_active;
+	atomic_t tx_cached;
+	spinlock_t tx_cached_lock;
+	atomic_t tx_avail;
 	atomic_t sie_pending;
 	unsigned long tx_dir;
 	int last_status;
diff --git a/drivers/net/can/c_can/c_can_main.c b/drivers/net/can/c_can/c_can_main.c
index 7588f70ca0fe..d2f44c07d47f 100644
--- a/drivers/net/can/c_can/c_can_main.c
+++ b/drivers/net/can/c_can/c_can_main.c
@@ -124,6 +124,9 @@
 				 IF_COMM_TXRQST | \
 				 IF_COMM_DATAA | IF_COMM_DATAB)
 
+#define IF_COMM_TX_FRAME	(IF_COMM_ARB | IF_COMM_CONTROL | \
+				 IF_COMM_DATAA | IF_COMM_DATAB)
+
 /* For the low buffers we clear the interrupt bit, but keep newdat */
 #define IF_COMM_RCV_LOW		(IF_COMM_MASK | IF_COMM_ARB | \
 				 IF_COMM_CONTROL | IF_COMM_CLR_INT_PND | \
@@ -432,19 +435,36 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 {
 	struct can_frame *frame = (struct can_frame *)skb->data;
 	struct c_can_priv *priv = netdev_priv(dev);
-	u32 idx, obj;
+	u32 idx, obj, tx_active, tx_cached;
 
 	if (can_dropped_invalid_skb(dev, skb))
 		return NETDEV_TX_OK;
-	/* This is not a FIFO. C/D_CAN sends out the buffers
-	 * prioritized. The lowest buffer number wins.
-	 */
-	idx = fls(atomic_read(&priv->tx_active));
-	obj = idx + priv->msg_obj_tx_first;
 
-	/* If this is the last buffer, stop the xmit queue */
-	if (idx == priv->msg_obj_tx_num - 1)
+	if (atomic_read(&priv->tx_avail) == 0)
 		netif_stop_queue(dev);
+
+	tx_active = atomic_read(&priv->tx_active);
+	tx_cached = atomic_read(&priv->tx_cached);
+	idx = fls(tx_active);
+	if (idx > priv->msg_obj_tx_num - 1) {
+		idx = fls(tx_cached);
+
+		obj = idx + priv->msg_obj_tx_first;
+		spin_lock_bh(&priv->tx_cached_lock);
+		/* prepare message object for transmission */
+		c_can_setup_tx_object(dev, IF_TX, frame, idx);
+		/* Store the message but don't ask for its transmission */
+		c_can_object_put(dev, IF_TX, obj, IF_COMM_TX_FRAME);
+		spin_unlock_bh(&priv->tx_cached_lock);
+		priv->dlc[idx] = frame->len;
+		can_put_echo_skb(skb, dev, idx, 0);
+		atomic_dec(&priv->tx_avail);
+		atomic_add(BIT(idx), &priv->tx_cached);
+		return NETDEV_TX_OK;
+	}
+
+	obj = idx + priv->msg_obj_tx_first;
+
 	/* Store the message in the interface so we can call
 	 * can_put_echo_skb(). We must do this before we enable
 	 * transmit as we might race against do_tx().
@@ -453,6 +473,7 @@ static netdev_tx_t c_can_start_xmit(struct sk_buff *skb,
 	priv->dlc[idx] = frame->len;
 	can_put_echo_skb(skb, dev, idx, 0);
 
+	atomic_dec(&priv->tx_avail);
 	/* Update the active bits */
 	atomic_add(BIT(idx), &priv->tx_active);
 	/* Start transmission */
@@ -599,6 +620,8 @@ static int c_can_chip_config(struct net_device *dev)
 
 	/* Clear all internal status */
 	atomic_set(&priv->tx_active, 0);
+	atomic_set(&priv->tx_cached, 0);
+	atomic_set(&priv->tx_avail, priv->msg_obj_tx_num);
 	priv->tx_dir = 0;
 
 	/* set bittiming params */
@@ -723,14 +746,31 @@ static void c_can_do_tx(struct net_device *dev)
 	/* Clear the bits in the tx_active mask */
 	atomic_sub(clr, &priv->tx_active);
 
-	if (clr & BIT(priv->msg_obj_tx_num - 1))
-		netif_wake_queue(dev);
-
 	if (pkts) {
+		atomic_add(pkts, &priv->tx_avail);
+
+		if (netif_queue_stopped(dev))
+			netif_wake_queue(dev);
+
 		stats->tx_bytes += bytes;
 		stats->tx_packets += pkts;
 		can_led_event(dev, CAN_LED_EVENT_TX);
 	}
+
+	if (atomic_read(&priv->tx_active) == 0) {
+		pend = atomic_read(&priv->tx_cached);
+
+		clr = pend;
+		while ((idx = ffs(pend))) {
+			idx--;
+			pend &= ~(1 << idx);
+
+			obj = idx + priv->msg_obj_tx_first;
+			c_can_object_put(dev, IF_TX, obj, IF_COMM_TXRQST);
+		}
+		atomic_sub(clr, &priv->tx_cached);
+		atomic_add(clr, &priv->tx_active);
+	}
 }
 
 /* If we have a gap in the pending bits, that means we either
@@ -1193,6 +1233,7 @@ struct net_device *alloc_c_can_dev(int msg_obj_num)
 		return NULL;
 
 	priv = netdev_priv(dev);
+	spin_lock_init(&priv->tx_cached_lock);
 	priv->msg_obj_num = msg_obj_num;
 	priv->msg_obj_rx_num = msg_obj_num - msg_obj_tx_num;
 	priv->msg_obj_rx_first = 1;
--
2.17.1

2021-05-10 13:03:46

by Marc Kleine-Budde

Subject: Re: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO

On 09.05.2021 14:43:09, Dario Binacchi wrote:
> As reported by a comment in the c_can_start_xmit() this was not a FIFO.
> C/D_CAN controller sends out the buffers prioritized so that the lowest
> buffer number wins.
>
> What did c_can_start_xmit() do if it found tx_active = 0x80000000 ? It
> waited until the only frame of the FIFO was actually transmitted by the
> controller. Only one message in the FIFO but we had to wait for it to
> empty completely to ensure that the messages were transmitted in the
> order in which they were loaded.
>
> By storing the frames in the FIFO without requiring its transmission, we
> will be able to use the full size of the FIFO even in cases such as the
> one described above. The transmission interrupt will trigger their
> transmission only when all the messages previously loaded but stored in
> less priority positions of the buffers have been transmitted.

The algorithm you implemented looks a bit too complicated to me. Let me
sketch the algorithm that's implemented by several other drivers.

- have a power of two number of TX objects
- add a number of objects to struct priv (tx_num)
  (or make it a define, if the number of tx objects is compile time fixed)
- add two "unsigned int" variables to your struct priv,
  one "tx_head", one "tx_tail"
- the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
- increment tx_head
- stop the tx_queue if there is no space or if the object with the
  lowest prio has been written
- in TX complete IRQ, handle priv->tx_tail object
- increment tx_tail
- wake queue if there is space but don't wake if we wait for the lowest
  prio object to be TX completed.

Special care needs to be taken to implement that lock-less and race
free. I suggest looking at the mcp251xfd driver.
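
The steps listed above can be sketched as a single-threaded user-space
model. This is a minimal illustration under assumptions, not code from
any driver: TX_NUM, struct tx_ring and the ring_*() helpers are
hypothetical names, the queue_stopped flag stands in for
netif_stop_queue()/netif_wake_queue(), and the memory barriers needed for
a lock-less, race-free version are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define TX_NUM 4u /* hypothetical number of TX objects; must be a power of two */

struct tx_ring {
	unsigned int tx_head; /* free-running count of frames queued */
	unsigned int tx_tail; /* free-running count of frames completed */
	bool queue_stopped;   /* stand-in for the netdev TX queue state */
};

/* Map a free-running counter to an object index. */
static unsigned int ring_slot(unsigned int counter)
{
	return counter & (TX_NUM - 1);
}

/* Frames queued but not yet completed; unsigned wrap-around is harmless. */
static unsigned int ring_pending(const struct tx_ring *r)
{
	return r->tx_head - r->tx_tail;
}

/* Models hard_start_xmit(): returns the object index that was written. */
static unsigned int ring_xmit(struct tx_ring *r)
{
	unsigned int slot;

	assert(!r->queue_stopped);
	slot = ring_slot(r->tx_head);
	r->tx_head++;

	/* Stop if there is no space left or the lowest-priority (last)
	 * object has just been written. In this model the two conditions
	 * coincide; both are kept to mirror the list above.
	 */
	if (ring_pending(r) == TX_NUM || slot == TX_NUM - 1)
		r->queue_stopped = true;

	return slot;
}

/* Models the TX complete IRQ: returns the object index that was handled. */
static unsigned int ring_tx_done(struct tx_ring *r)
{
	unsigned int slot;

	assert(ring_pending(r) > 0);
	slot = ring_slot(r->tx_tail);
	r->tx_tail++;

	/* Do not wake while the lowest-priority object is still pending:
	 * a frame written to object 0 now would overtake it.
	 */
	if (r->queue_stopped && ring_pending(r) == 0)
		r->queue_stopped = false;

	return slot;
}
```

With TX_NUM = 4, writing objects 0 to 3 stops the queue, and it is woken
only after object 3 has completed, so the next frame goes to object 0
without overtaking anything.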

Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2021-05-10 13:09:03

by Marc Kleine-Budde

Subject: Re: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO

On 10.05.2021 14:25:15, Marc Kleine-Budde wrote:
> On 09.05.2021 14:43:09, Dario Binacchi wrote:
> > As reported by a comment in the c_can_start_xmit() this was not a FIFO.
> > C/D_CAN controller sends out the buffers prioritized so that the lowest
> > buffer number wins.
> >
> > What did c_can_start_xmit() do if it found tx_active = 0x80000000 ? It
> > waited until the only frame of the FIFO was actually transmitted by the
> > controller. Only one message in the FIFO but we had to wait for it to
> > empty completely to ensure that the messages were transmitted in the
> > order in which they were loaded.
> >
> > By storing the frames in the FIFO without requiring its transmission, we
> > will be able to use the full size of the FIFO even in cases such as the
> > one described above. The transmission interrupt will trigger their
> > transmission only when all the messages previously loaded but stored in
> > less priority positions of the buffers have been transmitted.
>
> The algorithm you implemented looks a bit too complicated to me. Let me
> sketch the algorithm that's implemented by several other drivers.
>
> - have a power of two number of TX objects
> - add a number of objects to struct priv (tx_num)
> (or make it a define, if the number of tx objects is compile time fixed)
> - add two "unsigned int" variables to your struct priv,
> one "tx_head", one "tx_tail"
> - the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
> - increment tx_head
> - stop the tx_queue if there is no space or if the object with the
> lowest prio has been written
> - in TX complete IRQ, handle priv->tx_tail object
> - increment tx_tail
> - wake queue if there is space but don't wake if we wait for the lowest
> prio object to be TX completed.
>
> Special care needs to be taken to implement that lock-less and race
> free. I suggest looking at the mcp251xfd driver.

After converting the driver to the implementation outlined above, it
should be more straightforward to add the caching you implemented.

regards,
Marc

--
Pengutronix e.K. | Marc Kleine-Budde |
Embedded Linux | https://www.pengutronix.de |
Vertretung West/Dortmund | Phone: +49-231-2826-924 |
Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |



2021-05-13 18:49:50

by Dario Binacchi

Subject: Re: [PATCH 3/3] can: c_can: cache frames to operate as a true FIFO

Hi Marc,

> On 10/05/2021 14:36 Marc Kleine-Budde <[email protected]> wrote:
>
>
> On 10.05.2021 14:25:15, Marc Kleine-Budde wrote:
> > On 09.05.2021 14:43:09, Dario Binacchi wrote:
> > > As reported by a comment in the c_can_start_xmit() this was not a FIFO.
> > > C/D_CAN controller sends out the buffers prioritized so that the lowest
> > > buffer number wins.
> > >
> > > What did c_can_start_xmit() do if it found tx_active = 0x80000000 ? It
> > > waited until the only frame of the FIFO was actually transmitted by the
> > > controller. Only one message in the FIFO but we had to wait for it to
> > > empty completely to ensure that the messages were transmitted in the
> > > order in which they were loaded.
> > >
> > > By storing the frames in the FIFO without requiring its transmission, we
> > > will be able to use the full size of the FIFO even in cases such as the
> > > one described above. The transmission interrupt will trigger their
> > > transmission only when all the messages previously loaded but stored in
> > > less priority positions of the buffers have been transmitted.
> >
> > The algorithm you implemented looks a bit too complicated to me. Let me
> > sketch the algorithm that's implemented by several other drivers.
> >
> > - have a power of two number of TX objects
> > - add a number of objects to struct priv (tx_num)
> > (or make it a define, if the number of tx objects is compile time fixed)
> > - add two "unsigned int" variables to your struct priv,
> > one "tx_head", one "tx_tail"
> > - the hard_start_xmit() writes to priv->tx_head & (priv->tx_num - 1)
> > - increment tx_head
> > - stop the tx_queue if there is no space or if the object with the
> > lowest prio has been written
> > - in TX complete IRQ, handle priv->tx_tail object
> > - increment tx_tail
> > - wake queue if there is space but don't wake if we wait for the lowest
> > prio object to be TX completed.
> >
> > Special care needs to be taken to implement that lock-less and race
> > free. I suggest looking at the mcp251xfd driver.
>
> After converting the driver to the implementation outlined above, it
> should be more straightforward to add the caching you implemented.
>

I took some time to think about your suggestions.
The submitted patch was developed to improve CAN transmission
while keeping the current driver design, in order to minimize
the risk of introducing bugs.
If I'm not missing something, you are suggesting that I change the
driver design as a precondition for applying an updated version
of my patch. IMHO this would increase the risk of introducing
bugs, even in parts of the code that are considered stable.
If the algorithm I have implemented is a bit too complicated,
let's try to simplify it, starting from the submitted patch.

Waiting for your reply, thanks and regards
Dario

> regards,
> Marc
>
> --
> Pengutronix e.K. | Marc Kleine-Budde |
> Embedded Linux | https://www.pengutronix.de |
> Vertretung West/Dortmund | Phone: +49-231-2826-924 |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |