2022-12-21 09:00:07

by Leesoo Ahn

[permalink] [raw]
Subject: [PATCH v3] usbnet: optimize usbnet_bh() to reduce CPU load

The current code pushes the skb onto the dev->done queue by calling
skb_queue_tail() and then pops it with skb_dequeue() in order to branch
to the rx_cleanup state and free the urb/skb in usbnet_bh(). This adds
extra CPU load, 2.21% (skb_queue_tail), as shown below.

- 11.58% 0.26% swapper [k] usbnet_bh
   - 11.32% usbnet_bh
      - 6.43% skb_dequeue
           6.34% _raw_spin_unlock_irqrestore
      - 2.21% skb_queue_tail
           2.19% _raw_spin_unlock_irqrestore
      - 1.68% consume_skb
         - 0.97% kfree_skbmem
              0.80% kmem_cache_free
           0.53% skb_release_data

To reduce this extra CPU load, use the return value of rx_process() to
jump directly to the rx_cleanup state and free the urb/skb, instead of
pushing and popping them with skb_queue_tail() and skb_dequeue().

- 7.87% 0.25% swapper [k] usbnet_bh
   - 7.62% usbnet_bh
      - 4.81% skb_dequeue
           4.74% _raw_spin_unlock_irqrestore
      - 1.75% consume_skb
         - 0.98% kfree_skbmem
              0.78% kmem_cache_free
           0.58% skb_release_data
        0.53% smsc95xx_rx_fixup

Signed-off-by: Leesoo Ahn <[email protected]>
---
v3:
- Replace return values with proper -ERR values in rx_process()

v2:
- Replace goto label with return statement to reduce goto entropy
- Add CPU load information by perf in commit message

v1 at:
https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

---
drivers/net/usb/usbnet.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index 64a9a80b2309..98d594210df4 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -555,32 +555,30 @@ static int rx_submit (struct usbnet *dev, struct urb *urb, gfp_t flags)

/*-------------------------------------------------------------------------*/

-static inline void rx_process (struct usbnet *dev, struct sk_buff *skb)
+static inline int rx_process(struct usbnet *dev, struct sk_buff *skb)
 {
 	if (dev->driver_info->rx_fixup &&
 	    !dev->driver_info->rx_fixup (dev, skb)) {
 		/* With RX_ASSEMBLE, rx_fixup() must update counters */
 		if (!(dev->driver_info->flags & FLAG_RX_ASSEMBLE))
 			dev->net->stats.rx_errors++;
-		goto done;
+		return -EPROTO;
 	}
 	// else network stack removes extra byte if we forced a short packet
 
 	/* all data was already cloned from skb inside the driver */
 	if (dev->driver_info->flags & FLAG_MULTI_PACKET)
-		goto done;
+		return -EALREADY;
 
 	if (skb->len < ETH_HLEN) {
 		dev->net->stats.rx_errors++;
 		dev->net->stats.rx_length_errors++;
 		netif_dbg(dev, rx_err, dev->net, "rx length %d\n", skb->len);
-	} else {
-		usbnet_skb_return(dev, skb);
-		return;
+		return -EPROTO;
 	}
 
-done:
-	skb_queue_tail(&dev->done, skb);
+	usbnet_skb_return(dev, skb);
+	return 0;
 }

/*-------------------------------------------------------------------------*/
@@ -1528,13 +1526,14 @@ static void usbnet_bh (struct timer_list *t)
 		entry = (struct skb_data *) skb->cb;
 		switch (entry->state) {
 		case rx_done:
-			entry->state = rx_cleanup;
-			rx_process (dev, skb);
+			if (rx_process(dev, skb))
+				goto cleanup;
 			continue;
 		case tx_done:
 			kfree(entry->urb->sg);
 			fallthrough;
 		case rx_cleanup:
+cleanup:
 			usb_free_urb (entry->urb);
 			dev_kfree_skb (skb);
 			continue;
--
2.34.1


2022-12-22 11:48:21

by Paolo Abeni

[permalink] [raw]
Subject: Re: [PATCH v3] usbnet: optimize usbnet_bh() to reduce CPU load

On Wed, 2022-12-21 at 16:59 +0900, Leesoo Ahn wrote:
> The current code pushes the skb onto the dev->done queue by calling
> skb_queue_tail() and then pops it with skb_dequeue() in order to branch
> to the rx_cleanup state and free the urb/skb in usbnet_bh(). This adds
> extra CPU load, 2.21% (skb_queue_tail), as shown below.
>
> - 11.58% 0.26% swapper [k] usbnet_bh
>    - 11.32% usbnet_bh
>       - 6.43% skb_dequeue
>            6.34% _raw_spin_unlock_irqrestore
>       - 2.21% skb_queue_tail
>            2.19% _raw_spin_unlock_irqrestore
>       - 1.68% consume_skb
>          - 0.97% kfree_skbmem
>               0.80% kmem_cache_free
>            0.53% skb_release_data
>
> To reduce this extra CPU load, use the return value of rx_process() to
> jump directly to the rx_cleanup state and free the urb/skb, instead of
> pushing and popping them with skb_queue_tail() and skb_dequeue().
>
> - 7.87% 0.25% swapper [k] usbnet_bh
>    - 7.62% usbnet_bh
>       - 4.81% skb_dequeue
>            4.74% _raw_spin_unlock_irqrestore
>       - 1.75% consume_skb
>          - 0.98% kfree_skbmem
>               0.78% kmem_cache_free
>            0.58% skb_release_data
>         0.53% smsc95xx_rx_fixup
>
> Signed-off-by: Leesoo Ahn <[email protected]>
> ---
> v3:
> - Replace return values with proper -ERR values in rx_process()
>
> v2:
> - Replace goto label with return statement to reduce goto entropy
> - Add CPU load information by perf in commit message
>
> v1 at:
> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/

This looks like net-next material.

We have already submitted the networking pull request to Linus
for v6.2 and therefore net-next is closed for new drivers, features,
code refactoring and optimizations. We are currently accepting
bug fixes only.

Please repost when net-next reopens after Jan 2nd, including the
expected 'net-next' tag in the subject line.

RFC patches sent for review only are obviously welcome at any time.

[...]

> @@ -1528,13 +1526,14 @@ static void usbnet_bh (struct timer_list *t)
>  		entry = (struct skb_data *) skb->cb;
>  		switch (entry->state) {
>  		case rx_done:
> -			entry->state = rx_cleanup;
> -			rx_process (dev, skb);
> +			if (rx_process(dev, skb))
> +				goto cleanup;

You can avoid this additional label (which is a little confusing inside
a switch) by factoring out a usb_free_skb(skb) helper and calling it
both here and under the rx_cleanup case.

Cheers,

Paolo

2022-12-26 06:40:06

by Leesoo Ahn

[permalink] [raw]
Subject: Re: [PATCH v3] usbnet: optimize usbnet_bh() to reduce CPU load



On 22. 12. 22. 20:13, Paolo Abeni wrote:
> On Wed, 2022-12-21 at 16:59 +0900, Leesoo Ahn wrote:
>> The current code pushes the skb onto the dev->done queue by calling
>> skb_queue_tail() and then pops it with skb_dequeue() in order to branch
>> to the rx_cleanup state and free the urb/skb in usbnet_bh(). This adds
>> extra CPU load, 2.21% (skb_queue_tail), as shown below.
>>
>> - 11.58% 0.26% swapper [k] usbnet_bh
>>    - 11.32% usbnet_bh
>>       - 6.43% skb_dequeue
>>            6.34% _raw_spin_unlock_irqrestore
>>       - 2.21% skb_queue_tail
>>            2.19% _raw_spin_unlock_irqrestore
>>       - 1.68% consume_skb
>>          - 0.97% kfree_skbmem
>>               0.80% kmem_cache_free
>>            0.53% skb_release_data
>>
>> To reduce this extra CPU load, use the return value of rx_process() to
>> jump directly to the rx_cleanup state and free the urb/skb, instead of
>> pushing and popping them with skb_queue_tail() and skb_dequeue().
>>
>> - 7.87% 0.25% swapper [k] usbnet_bh
>>    - 7.62% usbnet_bh
>>       - 4.81% skb_dequeue
>>            4.74% _raw_spin_unlock_irqrestore
>>       - 1.75% consume_skb
>>          - 0.98% kfree_skbmem
>>               0.78% kmem_cache_free
>>            0.58% skb_release_data
>>         0.53% smsc95xx_rx_fixup
>>
>> Signed-off-by: Leesoo Ahn <[email protected]>
>> ---
>> v3:
>> - Replace return values with proper -ERR values in rx_process()
>>
>> v2:
>> - Replace goto label with return statement to reduce goto entropy
>> - Add CPU load information by perf in commit message
>>
>> v1 at:
>> https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/
>
> This looks like net-next material.
>
> We have already submitted the networking pull request to Linus
> for v6.2 and therefore net-next is closed for new drivers, features,
> code refactoring and optimizations. We are currently accepting
> bug fixes only.
>
> Please repost when net-next reopens after Jan 2nd, including the
> expected 'net-next' tag in the subject line.
>
> RFC patches sent for review only are obviously welcome at any time.
>
> [...]
>
>> @@ -1528,13 +1526,14 @@ static void usbnet_bh (struct timer_list *t)
>>  		entry = (struct skb_data *) skb->cb;
>>  		switch (entry->state) {
>>  		case rx_done:
>> -			entry->state = rx_cleanup;
>> -			rx_process (dev, skb);
>> +			if (rx_process(dev, skb))
>> +				goto cleanup;
>
> You can avoid this additional label (which is a little confusing inside
> a switch) by factoring out a usb_free_skb(skb) helper and calling it
> both here and under the rx_cleanup case.

Thank you for the information and feedback; the helper will be in v4
when net-next reopens.

Best regards,
Leesoo