2021-02-16 01:09:51

by Sven Van Asbroeck

[permalink] [raw]
Subject: [PATCH net-next v3 0/5] lan743x speed boost

From: Sven Van Asbroeck <[email protected]>

Tree: git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git # 9ec5eea5b6ac

v2 -> v3:
- Bryan Whitehead:
+ add Bryan's reviewed-by tag to patch 1/5.
+ Only use FRAME_LENGTH if the LS bit is checked.
If set use the smaller of FRAME_LENGTH or buffer length.
If clear use buffer length.
+ Correct typo in cover letter history (swap "packet" <-> "buffer").

v1 -> v2:
- Andrew Lunn:
+ always keep to Reverse Christmas Tree.
+ "changing the cache operations to operate on the received length" should
go in its own, separate patch, so it can be easily backed out if
"interesting things" should happen with it.

- Bryan Whitehead:
+ multi-buffer patch concept "looks good".
As a result, I will squash the intermediate "dma buffer only" patch which
demonstrated the speed boost using an inflexible solution
(w/o multi-buffers).
+ Rename lan743x_rx_process_packet() to lan743x_rx_process_buffer()
+ Remove unused RX_PROCESS_RESULT_PACKET_DROPPED
+ Rename RX_PROCESS_RESULT_PACKET_RECEIVED to
RX_PROCESS_RESULT_BUFFER_RECEIVED
+ Fold "unmap from dma" into lan743x_rx_init_ring_element() to prevent
use-after-dma-unmap issue
+ ensure that skb allocation issues do not result in the driver sending
incomplete packets to the OS. E.g. a three-buffer packet, with the
middle buffer missing

- Willem De Bruyn: skb_hwtstamps(skb) always returns a non-null value, if the
skb parameter points to a valid skb.

Summary of my tests below.
Suggestions for better tests are very welcome.

Tests with debug logging enabled (add #define DEBUG).

1. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Ping to chip, verify correct packet size is sent to OS.
Ping large packets to chip (ping -s 1400), verify correct
packet size is sent to OS.
Ping using packets around the buffer size, verify number of
buffers is changing, verify correct packet size is sent
to OS:
$ ping -s 472
$ ping -s 473
$ ping -s 992
$ ping -s 993
Verify that each packet is followed by extension processing.

2. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run iperf3 -s on chip, verify that packets come in 3 buffers
at a time.
Verify that packet size is equal to mtu.
Verify that each packet is followed by extension processing.

3. Set mtu to 2000 on chip and host.
Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
Run iperf3 -s on chip, verify that packets come in 4 buffers
at a time.
Verify that packet size is equal to mtu.
Verify that each packet is followed by extension processing.

Tests with debug logging DISabled (remove #define DEBUG).

4. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run iperf3 -s on chip, note sustained rx speed.
Set mtu to 2000, so mtu takes 4 buffers.
Run iperf3 -s on chip, note sustained rx speed.
Verify no packets are dropped in both cases.
Verify speeds are roughly comparable.

Tests with DEBUG_KMEMLEAK on:
$ mount -t debugfs nodev /sys/kernel/debug/
$ echo scan > /sys/kernel/debug/kmemleak

5. Limit rx buffer size to 500, so mtu (1500) takes 3 buffers.
Run the following tests concurrently for at least one hour:
- iperf3 -s on chip
- ping -> chip

Monitor reported memory leaks.

6. Set mtu to 2000.
Limit rx buffer size to 500, so mtu (2000) takes 4 buffers.
Run the following tests concurrently for at least one hour:
- iperf3 -s on chip
- ping -> chip

Monitor reported memory leaks.

7. Simulate low-memory in lan743x_rx_allocate_skb(): fail once every
100 allocations.
Repeat (5) and (6).
Monitor reported memory leaks.

8. Simulate low-memory in lan743x_rx_allocate_skb(): fail 10
allocations in a row in every 100.
Repeat (5) and (6).
Monitor reported memory leaks.

9. Simulate low-memory in lan743x_rx_trim_skb(): fail 1 allocation
in every 100.
Repeat (5) and (6).
Monitor reported memory leaks.

Tests with debug logging enabled (add #define DEBUG).

10. Set the chip mtu to 1500, generate lots of network traffic.
Stop all network traffic.
Set the chip and remote mtus to 8000.
Ping remote -> chip: $ ping <chip ip> -s 7000
Verify that the first few received packets are multi-buffer.
Verify no pings are dropped.

Tests with DEBUG_KMEMLEAK on:
$ mount -t debugfs nodev /sys/kernel/debug/
$ echo scan > /sys/kernel/debug/kmemleak

11. Start with chip mtu at 1500, host mtu at 8000.
Run concurrently:
- iperf3 -s on chip
- ping -> chip

Cycle the chip mtu between 1500 and 8000 every 10 seconds.

Scan kmemleak periodically to watch for memory leaks.

Verify that the mtu changeover happens smoothly, i.e.
the iperf3 test does not report periods where speed
drops and recovers suddenly.

Note: iperf3 occasionally reports dropped packets on
changeover. This behaviour also occurs on the original
driver, it's not a regression. Possibly related to the
chip's mac rx being disabled when the mtu is changed.

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

Sven Van Asbroeck (5):
lan743x: boost performance on cpu archs w/o dma cache snooping
lan743x: sync only the received area of an rx ring buffer
TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes
TEST ONLY: lan743x: skb_alloc failure test
TEST ONLY: lan743x: skb_trim failure test

drivers/net/ethernet/microchip/lan743x_main.c | 352 +++++++++---------
drivers/net/ethernet/microchip/lan743x_main.h | 5 +-
2 files changed, 174 insertions(+), 183 deletions(-)

--
2.17.1


2021-02-16 01:11:07

by Sven Van Asbroeck

[permalink] [raw]
Subject: [PATCH net-next v3 1/5] lan743x: boost performance on cpu archs w/o dma cache snooping

From: Sven Van Asbroeck <[email protected]>

The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.

Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.

Optimize by keeping the rx ring buffer size as close as possible
to the mtu. This limits the amount of cache that requires
invalidation.

This optimization would normally force us to re-allocate all
ring buffers when the mtu is changed - a disruptive event,
because it can only happen when the network interface is down.

Remove the need to re-allocate all ring buffers by adding support
for multi-buffer frames. Now any combination of mtu and ring
buffer size will work. When the mtu changes from mtu1 to mtu2,
consumed buffers of size mtu1 are lazily replaced by newly
allocated buffers of size mtu2.

These optimizations double the rx performance on armv7.
Third parties report 3x rx speedup on armv8.

Tested with iperf3 on a freescale imx6qp + lan7430, both sides
set to mtu 1500 bytes, measure rx performance:

Before:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-20.00 sec 550 MBytes 231 Mbits/sec 0
After:
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-20.00 sec 1.33 GBytes 570 Mbits/sec 0

Signed-off-by: Sven Van Asbroeck <[email protected]>
Reviewed-by: Bryan Whitehead <[email protected]>
---

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

drivers/net/ethernet/microchip/lan743x_main.c | 324 ++++++++----------
drivers/net/ethernet/microchip/lan743x_main.h | 5 +-
2 files changed, 148 insertions(+), 181 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index f1f6eba4ace4..c2633efe6067 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1955,15 +1955,6 @@ static int lan743x_rx_next_index(struct lan743x_rx *rx, int index)
return ((++index) % rx->ring_size);
}

-static struct sk_buff *lan743x_rx_allocate_skb(struct lan743x_rx *rx)
-{
- int length = 0;
-
- length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
- return __netdev_alloc_skb(rx->adapter->netdev,
- length, GFP_ATOMIC | GFP_DMA);
-}
-
static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
{
/* update the tail once per 8 descriptors */
@@ -1972,36 +1963,40 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}

-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index,
- struct sk_buff *skb)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
{
+ struct net_device *netdev = rx->adapter->netdev;
+ struct device *dev = &rx->adapter->pdev->dev;
struct lan743x_rx_buffer_info *buffer_info;
struct lan743x_rx_descriptor *descriptor;
- int length = 0;
+ struct sk_buff *skb;
+ dma_addr_t dma_ptr;
+ int length;
+
+ length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;

- length = (LAN743X_MAX_FRAME_SIZE + ETH_HLEN + 4 + RX_HEAD_PADDING);
descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- buffer_info->skb = skb;
- if (!(buffer_info->skb))
+ skb = __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+ if (!skb)
return -ENOMEM;
- buffer_info->dma_ptr = dma_map_single(&rx->adapter->pdev->dev,
- buffer_info->skb->data,
- length,
- DMA_FROM_DEVICE);
- if (dma_mapping_error(&rx->adapter->pdev->dev,
- buffer_info->dma_ptr)) {
- buffer_info->dma_ptr = 0;
+ dma_ptr = dma_map_single(dev, skb->data, length, DMA_FROM_DEVICE);
+ if (dma_mapping_error(dev, dma_ptr)) {
+ dev_kfree_skb_any(skb);
return -ENOMEM;
}
+ if (buffer_info->dma_ptr)
+ dma_unmap_single(dev, buffer_info->dma_ptr,
+ buffer_info->buffer_length, DMA_FROM_DEVICE);

+ buffer_info->skb = skb;
+ buffer_info->dma_ptr = dma_ptr;
buffer_info->buffer_length = length;
descriptor->data1 = cpu_to_le32(DMA_ADDR_LOW32(buffer_info->dma_ptr));
descriptor->data2 = cpu_to_le32(DMA_ADDR_HIGH32(buffer_info->dma_ptr));
descriptor->data3 = 0;
descriptor->data0 = cpu_to_le32((RX_DESC_DATA0_OWN_ |
(length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
- skb_reserve(buffer_info->skb, RX_HEAD_PADDING);
lan743x_rx_update_tail(rx, index);

return 0;
@@ -2050,16 +2045,32 @@ static void lan743x_rx_release_ring_element(struct lan743x_rx *rx, int index)
memset(buffer_info, 0, sizeof(*buffer_info));
}

-static int lan743x_rx_process_packet(struct lan743x_rx *rx)
+static struct sk_buff *
+lan743x_rx_trim_skb(struct sk_buff *skb, int frame_length)
+{
+ if (skb_linearize(skb)) {
+ dev_kfree_skb_irq(skb);
+ return NULL;
+ }
+ frame_length = max_t(int, 0, frame_length - RX_HEAD_PADDING - 2);
+ if (skb->len > frame_length) {
+ skb->tail -= skb->len - frame_length;
+ skb->len = frame_length;
+ }
+ return skb;
+}
+
+static int lan743x_rx_process_buffer(struct lan743x_rx *rx)
{
- struct skb_shared_hwtstamps *hwtstamps = NULL;
- int result = RX_PROCESS_RESULT_NOTHING_TO_DO;
int current_head_index = le32_to_cpu(*rx->head_cpu_ptr);
+ struct lan743x_rx_descriptor *descriptor, *desc_ext;
+ struct net_device *netdev = rx->adapter->netdev;
+ int result = RX_PROCESS_RESULT_NOTHING_TO_DO;
struct lan743x_rx_buffer_info *buffer_info;
- struct lan743x_rx_descriptor *descriptor;
+ int frame_length, buffer_length;
int extension_index = -1;
- int first_index = -1;
- int last_index = -1;
+ bool is_last, is_first;
+ struct sk_buff *skb;

if (current_head_index < 0 || current_head_index >= rx->ring_size)
goto done;
@@ -2067,163 +2078,120 @@ static int lan743x_rx_process_packet(struct lan743x_rx *rx)
if (rx->last_head < 0 || rx->last_head >= rx->ring_size)
goto done;

- if (rx->last_head != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[rx->last_head];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
- goto done;
+ if (rx->last_head == current_head_index)
+ goto done;

- if (!(le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_))
- goto done;
+ descriptor = &rx->ring_cpu_ptr[rx->last_head];
+ if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
+ goto done;
+ buffer_info = &rx->buffer_info[rx->last_head];

- first_index = rx->last_head;
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
- last_index = rx->last_head;
- } else {
- int index;
-
- index = lan743x_rx_next_index(rx, first_index);
- while (index != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[index];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_OWN_)
- goto done;
-
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_) {
- last_index = index;
- break;
- }
- index = lan743x_rx_next_index(rx, index);
- }
- }
- if (last_index >= 0) {
- descriptor = &rx->ring_cpu_ptr[last_index];
- if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
- /* extension is expected to follow */
- int index = lan743x_rx_next_index(rx,
- last_index);
- if (index != current_head_index) {
- descriptor = &rx->ring_cpu_ptr[index];
- if (le32_to_cpu(descriptor->data0) &
- RX_DESC_DATA0_OWN_) {
- goto done;
- }
- if (le32_to_cpu(descriptor->data0) &
- RX_DESC_DATA0_EXT_) {
- extension_index = index;
- } else {
- goto done;
- }
- } else {
- /* extension is not yet available */
- /* prevent processing of this packet */
- first_index = -1;
- last_index = -1;
- }
- }
- }
- }
- if (first_index >= 0 && last_index >= 0) {
- int real_last_index = last_index;
- struct sk_buff *skb = NULL;
- u32 ts_sec = 0;
- u32 ts_nsec = 0;
-
- /* packet is available */
- if (first_index == last_index) {
- /* single buffer packet */
- struct sk_buff *new_skb = NULL;
- int packet_length;
-
- new_skb = lan743x_rx_allocate_skb(rx);
- if (!new_skb) {
- /* failed to allocate next skb.
- * Memory is very low.
- * Drop this packet and reuse buffer.
- */
- lan743x_rx_reuse_ring_element(rx, first_index);
- goto process_extension;
- }
+ is_last = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_;
+ is_first = le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_FS_;

- buffer_info = &rx->buffer_info[first_index];
- skb = buffer_info->skb;
- descriptor = &rx->ring_cpu_ptr[first_index];
-
- /* unmap from dma */
- if (buffer_info->dma_ptr) {
- dma_unmap_single(&rx->adapter->pdev->dev,
- buffer_info->dma_ptr,
- buffer_info->buffer_length,
- DMA_FROM_DEVICE);
- buffer_info->dma_ptr = 0;
- buffer_info->buffer_length = 0;
- }
- buffer_info->skb = NULL;
- packet_length = RX_DESC_DATA0_FRAME_LENGTH_GET_
- (le32_to_cpu(descriptor->data0));
- skb_put(skb, packet_length - 4);
- skb->protocol = eth_type_trans(skb,
- rx->adapter->netdev);
- lan743x_rx_init_ring_element(rx, first_index, new_skb);
- } else {
- int index = first_index;
+ if (is_last && le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_EXT_) {
+ /* extension is expected to follow */
+ int index = lan743x_rx_next_index(rx, rx->last_head);

- /* multi buffer packet not supported */
- /* this should not happen since
- * buffers are allocated to be at least jumbo size
- */
+ if (index == current_head_index)
+ /* extension not yet available */
+ goto done;
+ desc_ext = &rx->ring_cpu_ptr[index];
+ if (le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_OWN_)
+ /* extension not yet available */
+ goto done;
+ if (!(le32_to_cpu(desc_ext->data0) & RX_DESC_DATA0_EXT_))
+ goto move_forward;
+ extension_index = index;
+ }

- /* clean up buffers */
- if (first_index <= last_index) {
- while ((index >= first_index) &&
- (index <= last_index)) {
- lan743x_rx_reuse_ring_element(rx,
- index);
- index = lan743x_rx_next_index(rx,
- index);
- }
- } else {
- while ((index >= first_index) ||
- (index <= last_index)) {
- lan743x_rx_reuse_ring_element(rx,
- index);
- index = lan743x_rx_next_index(rx,
- index);
- }
- }
- }
+ /* Only the last buffer in a multi-buffer frame contains the total frame
+ * length. The chip occasionally sends more buffers than strictly
+ * required to reach the total frame length.
+ * Handle this by adding all buffers to the skb in their entirety.
+ * Once the real frame length is known, trim the skb.
+ */
+ frame_length =
+ RX_DESC_DATA0_FRAME_LENGTH_GET_(le32_to_cpu(descriptor->data0));
+ buffer_length = buffer_info->buffer_length;
+
+ netdev_dbg(netdev, "%s%schunk: %d/%d",
+ is_first ? "first " : " ",
+ is_last ? "last " : " ",
+ frame_length, buffer_length);
+
+ /* save existing skb, allocate new skb and map to dma */
+ skb = buffer_info->skb;
+ if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ /* failed to allocate next skb.
+ * Memory is very low.
+ * Drop this packet and reuse buffer.
+ */
+ lan743x_rx_reuse_ring_element(rx, rx->last_head);
+ /* drop packet that was being assembled */
+ dev_kfree_skb_irq(rx->skb_head);
+ rx->skb_head = NULL;
+ goto process_extension;
+ }
+
+ /* add buffers to skb via skb->frag_list */
+ if (is_first) {
+ skb_reserve(skb, RX_HEAD_PADDING);
+ skb_put(skb, buffer_length - RX_HEAD_PADDING);
+ if (rx->skb_head)
+ dev_kfree_skb_irq(rx->skb_head);
+ rx->skb_head = skb;
+ } else if (rx->skb_head) {
+ skb_put(skb, buffer_length);
+ if (skb_shinfo(rx->skb_head)->frag_list)
+ rx->skb_tail->next = skb;
+ else
+ skb_shinfo(rx->skb_head)->frag_list = skb;
+ rx->skb_tail = skb;
+ rx->skb_head->len += skb->len;
+ rx->skb_head->data_len += skb->len;
+ rx->skb_head->truesize += skb->truesize;
+ } else {
+ /* packet to assemble has already been dropped because one or
+ * more of its buffers could not be allocated
+ */
+ netdev_dbg(netdev, "drop buffer intended for dropped packet");
+ dev_kfree_skb_irq(skb);
+ }

process_extension:
- if (extension_index >= 0) {
- descriptor = &rx->ring_cpu_ptr[extension_index];
- buffer_info = &rx->buffer_info[extension_index];
-
- ts_sec = le32_to_cpu(descriptor->data1);
- ts_nsec = (le32_to_cpu(descriptor->data2) &
- RX_DESC_DATA2_TS_NS_MASK_);
- lan743x_rx_reuse_ring_element(rx, extension_index);
- real_last_index = extension_index;
- }
+ if (extension_index >= 0) {
+ u32 ts_sec;
+ u32 ts_nsec;

- if (!skb) {
- result = RX_PROCESS_RESULT_PACKET_DROPPED;
- goto move_forward;
- }
+ ts_sec = le32_to_cpu(desc_ext->data1);
+ ts_nsec = (le32_to_cpu(desc_ext->data2) &
+ RX_DESC_DATA2_TS_NS_MASK_);
+ if (rx->skb_head)
+ skb_hwtstamps(rx->skb_head)->hwtstamp =
+ ktime_set(ts_sec, ts_nsec);
+ lan743x_rx_reuse_ring_element(rx, extension_index);
+ rx->last_head = extension_index;
+ netdev_dbg(netdev, "process extension");
+ }

- if (extension_index < 0)
- goto pass_packet_to_os;
- hwtstamps = skb_hwtstamps(skb);
- if (hwtstamps)
- hwtstamps->hwtstamp = ktime_set(ts_sec, ts_nsec);
+ if (is_last && rx->skb_head)
+ rx->skb_head = lan743x_rx_trim_skb(rx->skb_head, frame_length);

-pass_packet_to_os:
- /* pass packet to OS */
- napi_gro_receive(&rx->napi, skb);
- result = RX_PROCESS_RESULT_PACKET_RECEIVED;
+ if (is_last && rx->skb_head) {
+ rx->skb_head->protocol = eth_type_trans(rx->skb_head,
+ rx->adapter->netdev);
+ netdev_dbg(netdev, "sending %d byte frame to OS",
+ rx->skb_head->len);
+ napi_gro_receive(&rx->napi, rx->skb_head);
+ rx->skb_head = NULL;
+ }

move_forward:
- /* push tail and head forward */
- rx->last_tail = real_last_index;
- rx->last_head = lan743x_rx_next_index(rx, real_last_index);
- }
+ /* push tail and head forward */
+ rx->last_tail = rx->last_head;
+ rx->last_head = lan743x_rx_next_index(rx, rx->last_head);
+ result = RX_PROCESS_RESULT_BUFFER_RECEIVED;
done:
return result;
}
@@ -2242,12 +2210,12 @@ static int lan743x_rx_napi_poll(struct napi_struct *napi, int weight)
DMAC_INT_BIT_RXFRM_(rx->channel_number));
}
for (count = 0; count < weight; count++) {
- result = lan743x_rx_process_packet(rx);
+ result = lan743x_rx_process_buffer(rx);
if (result == RX_PROCESS_RESULT_NOTHING_TO_DO)
break;
}
rx->frame_count += count;
- if (count == weight || result == RX_PROCESS_RESULT_PACKET_RECEIVED)
+ if (count == weight || result == RX_PROCESS_RESULT_BUFFER_RECEIVED)
return weight;

if (!napi_complete_done(napi, count))
@@ -2359,9 +2327,7 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)

rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- struct sk_buff *new_skb = lan743x_rx_allocate_skb(rx);
-
- ret = lan743x_rx_init_ring_element(rx, index, new_skb);
+ ret = lan743x_rx_init_ring_element(rx, index);
if (ret)
goto cleanup;
}
diff --git a/drivers/net/ethernet/microchip/lan743x_main.h b/drivers/net/ethernet/microchip/lan743x_main.h
index 751f2bc9ce84..40dfb564c4f7 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.h
+++ b/drivers/net/ethernet/microchip/lan743x_main.h
@@ -698,6 +698,8 @@ struct lan743x_rx {
struct napi_struct napi;

u32 frame_count;
+
+ struct sk_buff *skb_head, *skb_tail;
};

struct lan743x_adapter {
@@ -831,8 +833,7 @@ struct lan743x_rx_buffer_info {
#define LAN743X_RX_RING_SIZE (65)

#define RX_PROCESS_RESULT_NOTHING_TO_DO (0)
-#define RX_PROCESS_RESULT_PACKET_RECEIVED (1)
-#define RX_PROCESS_RESULT_PACKET_DROPPED (2)
+#define RX_PROCESS_RESULT_BUFFER_RECEIVED (1)

u32 lan743x_csr_read(struct lan743x_adapter *adapter, int offset);
void lan743x_csr_write(struct lan743x_adapter *adapter, int offset, u32 data);
--
2.17.1

2021-02-16 01:11:17

by Sven Van Asbroeck

[permalink] [raw]
Subject: [PATCH net-next v3 3/5] TEST ONLY: lan743x: limit rx ring buffer size to 500 bytes

From: Sven Van Asbroeck <[email protected]>

Signed-off-by: Sven Van Asbroeck <[email protected]>
---

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

drivers/net/ethernet/microchip/lan743x_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 6b642691a676..90738ec5e7ec 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1973,7 +1973,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
struct sk_buff *skb;
dma_addr_t dma_ptr;

- buffer_length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
+ buffer_length = 500 + ETH_HLEN + 4 + RX_HEAD_PADDING;

descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
--
2.17.1

2021-02-16 01:12:44

by Sven Van Asbroeck

[permalink] [raw]
Subject: [PATCH net-next v3 2/5] lan743x: sync only the received area of an rx ring buffer

From: Sven Van Asbroeck <[email protected]>

On cpu architectures w/o dma cache snooping, dma_unmap() is a
is a very expensive operation, because its resulting sync
needs to invalidate cpu caches.

Increase efficiency/performance by syncing only those sections
of the lan743x's rx ring buffers that are actually in use.

Signed-off-by: Sven Van Asbroeck <[email protected]>
---

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

drivers/net/ethernet/microchip/lan743x_main.c | 35 ++++++++++++++-----
1 file changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index c2633efe6067..6b642691a676 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1968,35 +1968,52 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
struct lan743x_rx_buffer_info *buffer_info;
+ unsigned int buffer_length, used_length;
struct lan743x_rx_descriptor *descriptor;
struct sk_buff *skb;
dma_addr_t dma_ptr;
- int length;

- length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;
+ buffer_length = netdev->mtu + ETH_HLEN + 4 + RX_HEAD_PADDING;

descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+ skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
if (!skb)
return -ENOMEM;
- dma_ptr = dma_map_single(dev, skb->data, length, DMA_FROM_DEVICE);
+ dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
if (dma_mapping_error(dev, dma_ptr)) {
dev_kfree_skb_any(skb);
return -ENOMEM;
}
- if (buffer_info->dma_ptr)
- dma_unmap_single(dev, buffer_info->dma_ptr,
- buffer_info->buffer_length, DMA_FROM_DEVICE);
+ if (buffer_info->dma_ptr) {
+ /* sync used area of buffer only */
+ if (le32_to_cpu(descriptor->data0) & RX_DESC_DATA0_LS_)
+ /* frame length is valid only if LS bit is set.
+ * it's a safe upper bound for the used area in this
+ * buffer.
+ */
+ used_length = min(RX_DESC_DATA0_FRAME_LENGTH_GET_
+ (le32_to_cpu(descriptor->data0)),
+ buffer_info->buffer_length);
+ else
+ used_length = buffer_info->buffer_length;
+ dma_sync_single_for_cpu(dev, buffer_info->dma_ptr,
+ used_length,
+ DMA_FROM_DEVICE);
+ dma_unmap_single_attrs(dev, buffer_info->dma_ptr,
+ buffer_info->buffer_length,
+ DMA_FROM_DEVICE,
+ DMA_ATTR_SKIP_CPU_SYNC);
+ }

buffer_info->skb = skb;
buffer_info->dma_ptr = dma_ptr;
- buffer_info->buffer_length = length;
+ buffer_info->buffer_length = buffer_length;
descriptor->data1 = cpu_to_le32(DMA_ADDR_LOW32(buffer_info->dma_ptr));
descriptor->data2 = cpu_to_le32(DMA_ADDR_HIGH32(buffer_info->dma_ptr));
descriptor->data3 = 0;
descriptor->data0 = cpu_to_le32((RX_DESC_DATA0_OWN_ |
- (length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
+ (buffer_length & RX_DESC_DATA0_BUF_LENGTH_MASK_)));
lan743x_rx_update_tail(rx, index);

return 0;
--
2.17.1

2021-02-16 01:14:11

by Sven Van Asbroeck

[permalink] [raw]
Subject: [PATCH net-next v3 4/5] TEST ONLY: lan743x: skb_alloc failure test

From: Sven Van Asbroeck <[email protected]>

Simulate low-memory in lan743x_rx_allocate_skb(): fail 10
allocations in a row in every 100.

Signed-off-by: Sven Van Asbroeck <[email protected]>
---

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

drivers/net/ethernet/microchip/lan743x_main.c | 21 +++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 90738ec5e7ec..6e1b3c996bd7 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1963,7 +1963,20 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}

-static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
+static struct sk_buff *
+lan743x_alloc_skb(struct net_device *netdev, int length, bool can_fail)
+{
+ static int rx_alloc;
+ int counter = rx_alloc++ % 100;
+
+ if (can_fail && counter >= 20 && counter < 30)
+ return NULL;
+
+ return __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
+}
+
+static int
+lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index, bool can_fail)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1977,7 +1990,7 @@ static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)

descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
+ skb = lan743x_alloc_skb(netdev, buffer_length, can_fail);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2139,7 +2152,7 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)

/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head, true)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2344,7 +2357,7 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)

rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index);
+ ret = lan743x_rx_init_ring_element(rx, index, false);
if (ret)
goto cleanup;
}
--
2.17.1

2021-02-16 01:14:24

by Sven Van Asbroeck

[permalink] [raw]
Subject: [PATCH net-next v3 5/5] TEST ONLY: lan743x: skb_trim failure test

From: Sven Van Asbroeck <[email protected]>

Simulate low-memory in lan743x_rx_trim_skb(): fail one allocation
in every 100.

Signed-off-by: Sven Van Asbroeck <[email protected]>
---

To: Bryan Whitehead <[email protected]>
To: [email protected]
To: "David S. Miller" <[email protected]>
To: Jakub Kicinski <[email protected]>
Cc: Andrew Lunn <[email protected]>
Cc: Alexey Denisov <[email protected]>
Cc: Sergej Bauer <[email protected]>
Cc: Tim Harvey <[email protected]>
Cc: Anders Rønningen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Willem de Bruijn <[email protected]>
Cc: [email protected]
Cc: [email protected]

drivers/net/ethernet/microchip/lan743x_main.c | 28 ++++++++-----------
1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/microchip/lan743x_main.c b/drivers/net/ethernet/microchip/lan743x_main.c
index 6e1b3c996bd7..4751626f4c0f 100644
--- a/drivers/net/ethernet/microchip/lan743x_main.c
+++ b/drivers/net/ethernet/microchip/lan743x_main.c
@@ -1963,20 +1963,7 @@ static void lan743x_rx_update_tail(struct lan743x_rx *rx, int index)
index);
}

-static struct sk_buff *
-lan743x_alloc_skb(struct net_device *netdev, int length, bool can_fail)
-{
- static int rx_alloc;
- int counter = rx_alloc++ % 100;
-
- if (can_fail && counter >= 20 && counter < 30)
- return NULL;
-
- return __netdev_alloc_skb(netdev, length, GFP_ATOMIC | GFP_DMA);
-}
-
-static int
-lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index, bool can_fail)
+static int lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index)
{
struct net_device *netdev = rx->adapter->netdev;
struct device *dev = &rx->adapter->pdev->dev;
@@ -1990,7 +1977,7 @@ lan743x_rx_init_ring_element(struct lan743x_rx *rx, int index, bool can_fail)

descriptor = &rx->ring_cpu_ptr[index];
buffer_info = &rx->buffer_info[index];
- skb = lan743x_alloc_skb(netdev, buffer_length, can_fail);
+ skb = __netdev_alloc_skb(netdev, buffer_length, GFP_ATOMIC | GFP_DMA);
if (!skb)
return -ENOMEM;
dma_ptr = dma_map_single(dev, skb->data, buffer_length, DMA_FROM_DEVICE);
@@ -2078,6 +2065,13 @@ static void lan743x_rx_release_ring_element(struct lan743x_rx *rx, int index)
static struct sk_buff *
lan743x_rx_trim_skb(struct sk_buff *skb, int frame_length)
{
+ static int trim_cnt;
+
+ if ((trim_cnt++ % 100) == 77) {
+ dev_kfree_skb_irq(skb);
+ return NULL;
+ }
+
if (skb_linearize(skb)) {
dev_kfree_skb_irq(skb);
return NULL;
@@ -2152,7 +2146,7 @@ static int lan743x_rx_process_buffer(struct lan743x_rx *rx)

/* save existing skb, allocate new skb and map to dma */
skb = buffer_info->skb;
- if (lan743x_rx_init_ring_element(rx, rx->last_head, true)) {
+ if (lan743x_rx_init_ring_element(rx, rx->last_head)) {
/* failed to allocate next skb.
* Memory is very low.
* Drop this packet and reuse buffer.
@@ -2357,7 +2351,7 @@ static int lan743x_rx_ring_init(struct lan743x_rx *rx)

rx->last_head = 0;
for (index = 0; index < rx->ring_size; index++) {
- ret = lan743x_rx_init_ring_element(rx, index, false);
+ ret = lan743x_rx_init_ring_element(rx, index);
if (ret)
goto cleanup;
}
--
2.17.1

2021-02-16 21:33:00

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [PATCH net-next v3 2/5] lan743x: sync only the received area of an rx ring buffer

Hi Bryan,

On Tue, Feb 16, 2021 at 3:50 PM <[email protected]> wrote:
>
> Looks Good, Thanks Sven
> Our testing is in progress, We will let you know our results soon.
>
> Reviewed-by: Bryan Whitehead <[email protected]>
>

Thank you Bryan, I really appreciate your help and expertise.

2021-02-17 22:06:24

by Sven Van Asbroeck

[permalink] [raw]
Subject: Re: [PATCH net-next v3 0/5] lan743x speed boost

Hi Jakub and Bryan,

On Wed, Feb 17, 2021 at 4:43 PM <[email protected]> wrote:
>
> Just to let you know, my colleague tested the patches 1 and 2 on x86 PC and we are satisfied with the result.
> We confirmed some performance improvements.
> We also confirmed PTP is working.
>
> Thanks for your work on this.
>
> Tested-by: [email protected]
>

Bryan, that is great news. My pleasure, thank you for your guidance
and considerable expertise.

Jakub, is there anything else you'd like to see from us, before you
are satisfied that patches 1/5 and 2/5 can be merged into your tree?

2021-02-17 22:35:13

by David Miller

[permalink] [raw]
Subject: Re: [PATCH net-next v3 0/5] lan743x speed boost

From: Sven Van Asbroeck <[email protected]>
Date: Wed, 17 Feb 2021 17:04:05 -0500

> Hi Jakub and Bryan,
>
> Jakub, is there anything else you'd like to see from us, before you
> are satisfied that patches 1/5 and 2/5 can be merged into your tree?

They are already merged into net-next