This series addresses some of the suboptimal constructions used
in the main GRO entry point.
The main part is the simplification of gro_list_prepare() and the
optimization of pointer usage in dev_gro_receive() itself. Despite
being mostly cosmetic, the changes give about +10 Mbps on my setup
for both TCP and UDP (both single- and multi-flow).
Since v1 [0]:
- drop the replacement of the bucket index calculation with
  reciprocal_scale() since it makes absolutely no sense (Eric);
- improve stack usage in dev_gro_receive() (Eric);
- reverse the order of the patches so that later changes don't
  supersede earlier ones.
[0] https://lore.kernel.org/netdev/[email protected]
Alexander Lobakin (3):
gro: simplify gro_list_prepare()
gro: consistentify napi->gro_hash[x] access in dev_gro_receive()
gro: give 'hash' variable in dev_gro_receive() a less confusing name
net/core/dev.c | 40 ++++++++++++++++++----------------------
1 file changed, 18 insertions(+), 22 deletions(-)
--
2.30.2
gro_list_prepare() always returns &napi->gro_hash[bucket].list,
without any variation. Moreover, it uses the 'napi' argument only
to access this list, and to do so it calculates the bucket index a
second time (it is first calculated at the beginning of
dev_gro_receive()).
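For reference, this is how the bucket index ends up being computed
twice on the current code path (condensed from the removed lines in
the diff below):

    /* in dev_gro_receive() */
    u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
    ...
    gro_head = gro_list_prepare(napi, skb);

    /* in gro_list_prepare(), derived again from the same skb */
    u32 hash = skb_get_hash_raw(skb);
    ...
    head = &napi->gro_hash[hash & (GRO_HASH_BUCKETS - 1)].list;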
Given that dev_gro_receive() already has a pointer to the needed
list, just pass it as the first argument to eliminate the redundant
calculation, and make gro_list_prepare() return void.
Also, both arguments of gro_list_prepare() can be constified, since
the function only modifies the skbs from the bucket list.
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/core/dev.c | 12 ++++--------
1 file changed, 4 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 2bfdd528c7c3..1317e6b6758a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5858,15 +5858,13 @@ void napi_gro_flush(struct napi_struct *napi, bool flush_old)
}
EXPORT_SYMBOL(napi_gro_flush);
-static struct list_head *gro_list_prepare(struct napi_struct *napi,
- struct sk_buff *skb)
+static void gro_list_prepare(const struct list_head *head,
+ const struct sk_buff *skb)
{
unsigned int maclen = skb->dev->hard_header_len;
u32 hash = skb_get_hash_raw(skb);
- struct list_head *head;
struct sk_buff *p;
- head = &napi->gro_hash[hash & (GRO_HASH_BUCKETS - 1)].list;
list_for_each_entry(p, head, list) {
unsigned long diffs;
@@ -5892,8 +5890,6 @@ static struct list_head *gro_list_prepare(struct napi_struct *napi,
maclen);
NAPI_GRO_CB(p)->same_flow = !diffs;
}
-
- return head;
}
static void skb_gro_reset_offset(struct sk_buff *skb)
@@ -5957,10 +5953,10 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
{
u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
+ struct list_head *gro_head = &napi->gro_hash[hash].list;
struct list_head *head = &offload_base;
struct packet_offload *ptype;
__be16 type = skb->protocol;
- struct list_head *gro_head;
struct sk_buff *pp = NULL;
enum gro_result ret;
int same_flow;
@@ -5969,7 +5965,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (netif_elide_gro(skb->dev))
goto normal;
- gro_head = gro_list_prepare(napi, skb);
+ gro_list_prepare(gro_head, skb);
rcu_read_lock();
list_for_each_entry_rcu(ptype, head, list) {
--
2.30.2
'hash' stores not the flow hash itself, but the index of the GRO
bucket corresponding to it.
Rename it to 'bucket' to avoid confusion when reading lines like
'__set_bit(hash, &napi->gro_bitmask)'.
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/core/dev.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index b635467087f3..5a2847a19cf2 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5952,8 +5952,8 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
{
- u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
- struct gro_list *gro_list = &napi->gro_hash[hash];
+ u32 bucket = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
+ struct gro_list *gro_list = &napi->gro_hash[bucket];
struct list_head *head = &offload_base;
struct packet_offload *ptype;
__be16 type = skb->protocol;
@@ -6047,10 +6047,10 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
gro_pull_from_frag0(skb, grow);
ok:
if (gro_list->count) {
- if (!test_bit(hash, &napi->gro_bitmask))
- __set_bit(hash, &napi->gro_bitmask);
- } else if (test_bit(hash, &napi->gro_bitmask)) {
- __clear_bit(hash, &napi->gro_bitmask);
+ if (!test_bit(bucket, &napi->gro_bitmask))
+ __set_bit(bucket, &napi->gro_bitmask);
+ } else if (test_bit(bucket, &napi->gro_bitmask)) {
+ __clear_bit(bucket, &napi->gro_bitmask);
}
return ret;
--
2.30.2
The GRO bucket index doesn't change throughout the function.
Store a pointer to the corresponding bucket instead of its 'list'
member and use it consistently through the function.
This is performance-safe since &gro_list->list == gro_list.
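For reference, 'list' is the first member of struct gro_list (as
defined in include/linux/netdevice.h), so both expressions resolve
to the same address:

    struct gro_list {
        struct list_head list;
        int count;
    };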
Misc: remove superfluous braces around single-line branches.
Signed-off-by: Alexander Lobakin <[email protected]>
---
net/core/dev.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c
index 1317e6b6758a..b635467087f3 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5953,7 +5953,7 @@ static void gro_flush_oldest(struct napi_struct *napi, struct list_head *head)
static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff *skb)
{
u32 hash = skb_get_hash_raw(skb) & (GRO_HASH_BUCKETS - 1);
- struct list_head *gro_head = &napi->gro_hash[hash].list;
+ struct gro_list *gro_list = &napi->gro_hash[hash];
struct list_head *head = &offload_base;
struct packet_offload *ptype;
__be16 type = skb->protocol;
@@ -5965,7 +5965,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (netif_elide_gro(skb->dev))
goto normal;
- gro_list_prepare(gro_head, skb);
+ gro_list_prepare(&gro_list->list, skb);
rcu_read_lock();
list_for_each_entry_rcu(ptype, head, list) {
@@ -6001,7 +6001,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
pp = INDIRECT_CALL_INET(ptype->callbacks.gro_receive,
ipv6_gro_receive, inet_gro_receive,
- gro_head, skb);
+ &gro_list->list, skb);
break;
}
rcu_read_unlock();
@@ -6020,7 +6020,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (pp) {
skb_list_del_init(pp);
napi_gro_complete(napi, pp);
- napi->gro_hash[hash].count--;
+ gro_list->count--;
}
if (same_flow)
@@ -6029,16 +6029,16 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (NAPI_GRO_CB(skb)->flush)
goto normal;
- if (unlikely(napi->gro_hash[hash].count >= MAX_GRO_SKBS)) {
- gro_flush_oldest(napi, gro_head);
- } else {
- napi->gro_hash[hash].count++;
- }
+ if (unlikely(gro_list->count >= MAX_GRO_SKBS))
+ gro_flush_oldest(napi, &gro_list->list);
+ else
+ gro_list->count++;
+
NAPI_GRO_CB(skb)->count = 1;
NAPI_GRO_CB(skb)->age = jiffies;
NAPI_GRO_CB(skb)->last = skb;
skb_shinfo(skb)->gso_size = skb_gro_len(skb);
- list_add(&skb->list, gro_head);
+ list_add(&skb->list, &gro_list->list);
ret = GRO_HELD;
pull:
@@ -6046,7 +6046,7 @@ static enum gro_result dev_gro_receive(struct napi_struct *napi, struct sk_buff
if (grow > 0)
gro_pull_from_frag0(skb, grow);
ok:
- if (napi->gro_hash[hash].count) {
+ if (gro_list->count) {
if (!test_bit(hash, &napi->gro_bitmask))
__set_bit(hash, &napi->gro_bitmask);
} else if (test_bit(hash, &napi->gro_bitmask)) {
--
2.30.2
On Sat, Mar 13, 2021 at 9:30 PM Alexander Lobakin <[email protected]> wrote:
>
> This series addresses some of the suboptimal constructions used
> in the main GRO entry point.
>
> [...]
>
SGTM, thanks.
Reviewed-by: Eric Dumazet <[email protected]>