2021-02-11 17:19:21

by Sven Van Asbroeck

Subject: [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer

From: Sven Van Asbroeck <[email protected]>

On cpu architectures without dma cache snooping, dma_unmap() is a
very expensive operation, because its resulting sync needs to
invalidate cpu caches.

Increase efficiency/performance by syncing only those sections
of the lan743x's rx ring buffers that are actually in use.

Signed-off-by: Sven Van Asbroeck <[email protected]>
---

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

drivers/net/ethernet/microchip/lan743x_main.c | 32 +++++++++++++------
1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 0c48bb559719..36cc67c72851 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1968,35 +1968,49 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
struct lan743x_rx_buffer_info *buffer_info;
+ unsigned int buffer_length, packet_length;
struct lan743x_rx_descriptor *descriptor;
struct sk_buff *skb;
dma_addr_t dma_ptr;
- int length;

- length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
+ buffer_length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;

descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+ skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
if (!skb)
return -ENOMEM;
- dma_ptr = dma_map_single(dev, skb->data, length, DMA_FROM_DEVICE);
+ dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
if (dma_mapping_error(dev, dma_ptr)) {
dev_kfree_skb_any(skb);
return -ENOMEM;
}
- if (buffer_info->dma_ptr)
- dma_unmap_single(dev, buffer_info->dma_ptr,
- buffer_info->buffer_length, DMA_FROM_DEVICE);
+ if (buffer_info->dma_ptr) {
+ /* unmap from dma */
+ packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
+ (le32_to_cpu(descriptor->data0));
+ if (packet_length == 0 ||
+ packet_length > buffer_info->buffer_length)
+ /* buffer is part of multi-buffer packet: fully used */
+ packet_length = buffer_info->buffer_length;
+ /* sync used part of buffer only */
+ dma_sync_single_for_cpu(dev, buffer_info->dma_ptr,
+ packet_length,
+ DMA_FROM_DEVICE);
+ dma_unmap_single_attrs(dev, buffer_info->dma_ptr,
+ buffer_info->buffer_length,
+ DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ }

buffer_info->skb = skb;
buffer_info->dma_ptr = dma_ptr;
- buffer_info->buffer_length = length;
+ buffer_info->buffer_length = buffer_length;
descriptor->data1 = cpu_to_le32(DMA_ADDR_LOW32(buffer_info->dma_ptr));
descriptor->data2 = cpu_to_le32(DMA_ADDR_HIGH32(buffer_info->dma_ptr));
descriptor->data3 = 0;
descriptor->data0 = cpu_to_le32((RX_DESC_DATA0_OWN_ |
- (length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
+ (buffer_length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
lan743x_rx_update_tail(rx, index);

return 0;
--
2.17.1


2021-02-12 22:49:42

by Sven Van Asbroeck

Subject: Re: [PATCH net-next v2 2/5] lan743x: sync only the received area of an rx ring buffer

Hi Bryan,

On Fri, Feb 12, 2021 at 3:45 PM <[email protected]> wrote:
>
> According to the document I have, FRAME_LENGTH is only valid when LS bit is set, and reserved otherwise.
> Therefore, I'm not sure you can rely on it being zero when LS is not set, even if your experiments say it is.
> Future chip revisions might use those bits differently.

That's good to know. I didn't find any documentation related to
multi-buffer frames, so I had to go with what I saw the chip do
experimentally. It's great that you were able to double-check against
the official docs.

>
> Can you change this so the LS bit is checked.
> If set you can use the smaller of FRAME_LENGTH or buffer length.
> If clear you can just use buffer length.

Will do. Are you planning to hold off your tests until v3? It
shouldn't take too long.
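
For readers following the thread, the rule requested in the review (check the LS bit; if it is set, sync the smaller of FRAME_LENGTH and the buffer length; if it is clear, sync the whole buffer) could be sketched roughly as below. This is only an illustration, not the code that went into v3: the EX_ macro names, their bit positions, and the helper ex_rx_used_length() are assumptions made up for this example, and the real masks should be taken from the driver's header.

```c
#include <stdint.h>

/*
 * Illustrative descriptor layout only (assumed for this sketch):
 * a "last segment" flag and a frame-length field inside data0.
 */
#define EX_RX_DESC_DATA0_LS               (UINT32_C(1) << 30)
#define EX_RX_DESC_DATA0_FRAME_LEN_MASK   UINT32_C(0x3FFF0000)
#define EX_RX_DESC_DATA0_FRAME_LEN_SHIFT  16

/*
 * Return how many bytes of an rx buffer need to be synced for the cpu.
 * data0 is the descriptor word already converted to cpu byte order.
 */
static inline uint32_t ex_rx_used_length(uint32_t data0,
					 uint32_t buffer_length)
{
	uint32_t frame_length;

	/* FRAME_LENGTH is reserved unless LS is set: assume fully used */
	if (!(data0 & EX_RX_DESC_DATA0_LS))
		return buffer_length;

	frame_length = (data0 & EX_RX_DESC_DATA0_FRAME_LEN_MASK) >>
		       EX_RX_DESC_DATA0_FRAME_LEN_SHIFT;

	/* never sync more than the buffer that was actually mapped */
	return frame_length < buffer_length ? frame_length : buffer_length;
}
```

In the patch above, the result of such a helper would take the place of packet_length in the dma_sync_single_for_cpu() call, while dma_unmap_single_attrs() would still be passed the full buffer_length together with DMA_ATTR_SKIP_CPU_SYNC.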