2022-04-28 18:23:10

by Pavel Begunkov

Subject: [PATCH net-next 00/11] UDP/IPv6 refactoring

Refactor UDP/IPv6 and especially the udpv6_sendmsg() paths. The end result looks
cleaner than before, and the series also removes a bunch of instructions
and other overhead from the hot path, which improves performance.

This was part of a larger series, which came with some perf numbers; see
https://lore.kernel.org/netdev/[email protected]/

Pavel Begunkov (11):
ipv6: optimise ipcm6 cookie init
udp/ipv6: refactor udpv6_sendmsg udplite checks
udp/ipv6: move pending section of udpv6_sendmsg
udp/ipv6: prioritise the ip6 path over ip4 checks
udp/ipv6: optimise udpv6_sendmsg() daddr checks
udp/ipv6: optimise out daddr reassignment
udp/ipv6: clean up udpv6_sendmsg's saddr init
ipv6: partially inline fl6_update_dst()
ipv6: refactor opts push in __ip6_make_skb()
ipv6: improve opt-less __ip6_make_skb()
ipv6: clean up ip6_setup_cork

include/net/ipv6.h | 24 +++----
net/ipv6/datagram.c | 4 +-
net/ipv6/exthdrs.c | 15 ++--
net/ipv6/ip6_output.c | 53 +++++++-------
net/ipv6/raw.c | 8 +--
net/ipv6/udp.c | 158 ++++++++++++++++++++----------------------
net/l2tp/l2tp_ip6.c | 8 +--
7 files changed, 122 insertions(+), 148 deletions(-)

--
2.36.0


2022-04-28 20:13:13

by Pavel Begunkov

Subject: Re: [PATCH net-next 00/11] UDP/IPv6 refactoring

On 4/28/22 15:04, Paolo Abeni wrote:
> On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
>> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
>> cleaner than it was before and the series also removes a bunch of instructions
>> and other overhead from the hot path positively affecting performance.
>>
>> It was a part of a larger series, there were some perf numbers for it, see
>> https://lore.kernel.org/netdev/[email protected]/
>>
>> Pavel Begunkov (11):
>> ipv6: optimise ipcm6 cookie init
>> udp/ipv6: refactor udpv6_sendmsg udplite checks
>> udp/ipv6: move pending section of udpv6_sendmsg
>> udp/ipv6: prioritise the ip6 path over ip4 checks
>> udp/ipv6: optimise udpv6_sendmsg() daddr checks
>> udp/ipv6: optimise out daddr reassignment
>> udp/ipv6: clean up udpv6_sendmsg's saddr init
>> ipv6: partially inline fl6_update_dst()
>> ipv6: refactor opts push in __ip6_make_skb()
>> ipv6: improve opt-less __ip6_make_skb()
>> ipv6: clean up ip6_setup_cork
>>
>> include/net/ipv6.h | 24 +++----
>> net/ipv6/datagram.c | 4 +-
>> net/ipv6/exthdrs.c | 15 ++--
>> net/ipv6/ip6_output.c | 53 +++++++-------
>> net/ipv6/raw.c | 8 +--
>> net/ipv6/udp.c | 158 ++++++++++++++++++++----------------------
>> net/l2tp/l2tp_ip6.c | 8 +--
>> 7 files changed, 122 insertions(+), 148 deletions(-)
>
> Just a general comment here: IMHO the above diffstat is quite
> significant and some patches look completely non trivial to me.
>
> I think we need a quite significant performance gain to justify the
> above, could you please share your performance data, comprising the
> testing scenario?

As mentioned, I benchmarked it with a UDP/IPv6 max-throughput kind of
test, and only as part of a larger series [1]. The result was "2090K vs
2229K tx/s, +6.6%". Taking into account the +3% from the split-out sock_wfree
optimisations, half if not most of the rest should be attributed to this
series, so, a bit hand-wavingly, +1-3%. I can spend some extra time
retesting this particular series if strongly required...
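
To make the hand-waving explicit, here is a rough sketch of the arithmetic. It only uses the figures quoted above (2090K and 2229K tx/s, and the +3% credited to the sock_wfree changes); the function names are made up for illustration, and subtracting percentages is itself only an approximation.

```c
#include <stdio.h>

/* Relative gain of `after` over `before`, in percent. */
static double gain_percent(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Share of the total gain left once another contribution is subtracted
 * (a plain subtraction of percentages, i.e. an approximation). */
static double remaining_percent(double total, double other)
{
	return total - other;
}

/* Print the breakdown for the numbers quoted in this thread. */
static void print_breakdown(void)
{
	double total = gain_percent(2090.0, 2229.0);
	double rest = remaining_percent(total, 3.0);

	printf("total gain:            +%.2f%%\n", total);
	printf("minus sock_wfree (+3): +%.2f%%\n", rest);
	printf("half..all of the rest: +%.2f%%..+%.2f%%\n", rest / 2, rest);
}
```

Half to all of the remainder is what is rounded above to "+1-3%".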

I was using [2], which is basically an io_uring copy of the send paths of
selftests/net/msg_zerocopy. The gain should be visible with other tools too;
this one just reduces context switch and similar overhead by using io_uring.

./send-zc -6 udp -D <address> -t <time> -s16 -z0

It sends a number of 16-byte UDP/IPv6 (non-zerocopy) send requests over
io_uring, then waits for them and repeats. It was 8 (default) requests
per iteration (i.e. syscall). I was using a dummy netdev, so there is no
actual receiver, but the results correlate well with my server setup with mlx
cards, which just takes more effort for me to test. All of it was run with
mitigations=off.

There might be some fatter targets to optimise, but udpv6_sendmsg()
and the functions around it take a good chunk of cycles as well, though without
particular hotspots. If we want a better justification than 1-3%,
then more work needs to be added on top, adding even more to the diffstat...
a vicious cycle.


[1] https://lore.kernel.org/netdev/[email protected]/
[2] https://github.com/isilence/liburing/blob/zc_v3/test/send-zc.c

--
Pavel Begunkov

2022-04-28 21:22:50

by Pavel Begunkov

Subject: [PATCH net-next 03/11] udp/ipv6: move pending section of udpv6_sendmsg

Move the up->pending section of udpv6_sendmsg() to the beginning of the
function. Even though it requires some code duplication for sin6 parsing,
it clearly localises the pending handling in one place, removes an extra
if and, more importantly, prepares the code for further patches.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/udp.c | 67 ++++++++++++++++++++++++++++++--------------------
1 file changed, 40 insertions(+), 27 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 705eea080f5e..d6aedd4dab25 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1317,6 +1317,44 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
ipc6.sockc.tsflags = sk->sk_tsflags;
ipc6.sockc.mark = sk->sk_mark;

+ /* Rough check on arithmetic overflow,
+ better check is made in ip6_append_data().
+ */
+ if (unlikely(len > INT_MAX - sizeof(struct udphdr)))
+ return -EMSGSIZE;
+
+ /* There are pending frames. */
+ if (up->pending) {
+ if (up->pending == AF_INET)
+ return udp_sendmsg(sk, msg, len);
+
+ /* Do a quick destination sanity check before corking. */
+ if (sin6) {
+ if (msg->msg_namelen < offsetof(struct sockaddr, sa_data))
+ return -EINVAL;
+ if (sin6->sin6_family == AF_INET6) {
+ if (msg->msg_namelen < SIN6_LEN_RFC2133)
+ return -EINVAL;
+ if (ipv6_addr_any(&sin6->sin6_addr) &&
+ ipv6_addr_v4mapped(&np->saddr))
+ return -EINVAL;
+ } else if (sin6->sin6_family != AF_UNSPEC) {
+ return -EINVAL;
+ }
+ }
+
+ /* The socket lock must be held while it's corked. */
+ lock_sock(sk);
+ if (unlikely(up->pending != AF_INET6)) {
+ /* Just now it was seen corked, userspace is buggy */
+ err = up->pending ? -EAFNOSUPPORT : -EINVAL;
+ release_sock(sk);
+ return err;
+ }
+ dst = NULL;
+ goto do_append_data;
+ }
+
/* destination address check */
if (sin6) {
if (addr_len < offsetof(struct sockaddr, sa_data))
@@ -1342,12 +1380,11 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
default:
return -EINVAL;
}
- } else if (!up->pending) {
+ } else {
if (sk->sk_state != TCP_ESTABLISHED)
return -EDESTADDRREQ;
daddr = &sk->sk_v6_daddr;
- } else
- daddr = NULL;
+ }

if (daddr) {
if (ipv6_addr_v4mapped(daddr)) {
@@ -1364,30 +1401,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
}

- /* Rough check on arithmetic overflow,
- better check is made in ip6_append_data().
- */
- if (len > INT_MAX - sizeof(struct udphdr))
- return -EMSGSIZE;
-
- if (up->pending) {
- if (up->pending == AF_INET)
- return udp_sendmsg(sk, msg, len);
- /*
- * There are pending frames.
- * The socket lock must be held while it's corked.
- */
- lock_sock(sk);
- if (likely(up->pending)) {
- if (unlikely(up->pending != AF_INET6)) {
- release_sock(sk);
- return -EAFNOSUPPORT;
- }
- dst = NULL;
- goto do_append_data;
- }
- release_sock(sk);
- }
ulen += sizeof(struct udphdr);

memset(fl6, 0, sizeof(*fl6));
--
2.36.0
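
For readers who want to poke at the new pre-cork checks outside the kernel, below is a rough userspace sketch of the length check and the destination sanity check from this patch. It is not kernel code: `check_len()`, `check_pending_dest()`, `UDP_HDR_LEN` and the `saddr_is_v4mapped` flag (standing in for `ipv6_addr_v4mapped(&np->saddr)`) are hypothetical names, and `SIN6_LEN_RFC2133` is redefined locally with the kernel's value of 24.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define SIN6_LEN_RFC2133 24	/* kernel constant, redefined for this sketch */
#define UDP_HDR_LEN 8		/* stand-in for sizeof(struct udphdr) */

/* Rough overflow check: payload plus UDP header must fit in an int. */
static int check_len(size_t len)
{
	return len > (size_t)INT_MAX - UDP_HDR_LEN ? -EMSGSIZE : 0;
}

/* Destination sanity check done before appending to a corked socket.
 * saddr_is_v4mapped is an assumption standing in for the socket state. */
static int check_pending_dest(const struct sockaddr_in6 *sin6,
			      size_t namelen, int saddr_is_v4mapped)
{
	if (!sin6)
		return 0;	/* no address given, nothing to validate */
	if (namelen < offsetof(struct sockaddr, sa_data))
		return -EINVAL;	/* too short to even hold the family */
	if (sin6->sin6_family == AF_INET6) {
		if (namelen < SIN6_LEN_RFC2133)
			return -EINVAL;
		if (IN6_IS_ADDR_UNSPECIFIED(&sin6->sin6_addr) &&
		    saddr_is_v4mapped)
			return -EINVAL;
	} else if (sin6->sin6_family != AF_UNSPEC) {
		return -EINVAL;
	}
	return 0;
}
```

Note that, as in the patch, an AF_INET destination is rejected here: the AF_INET pending case has already been handed to udp_sendmsg() before this check runs.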

2022-04-28 23:33:34

by Pavel Begunkov

Subject: [PATCH net-next 11/11] ipv6: clean up ip6_setup_cork

Do a bit of refactoring for ip6_setup_cork(). Cache the xfrm_dst_path()
result to avoid calling it twice, reshuffle the ifs so that some parts are
not repeated, and so on.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/ip6_output.c | 30 +++++++++++++-----------------
1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 416d14299242..a17b26d5f34d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1358,15 +1358,13 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
struct ipv6_pinfo *np = inet6_sk(sk);
unsigned int mtu;
struct ipv6_txoptions *nopt, *opt = ipc6->opt;
+ struct dst_entry *xrfm_dst;

/* callers pass dst together with a reference, set it first so
* ip6_cork_release() can put it down even in case of an error.
*/
cork->base.dst = &rt->dst;

- /*
- * setup for corking
- */
if (opt) {
if (WARN_ON(v6_cork->opt))
return -EINVAL;
@@ -1399,28 +1397,26 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
}
v6_cork->hop_limit = ipc6->hlimit;
v6_cork->tclass = ipc6->tclass;
- if (rt->dst.flags & DST_XFRM_TUNNEL)
- mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
- READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
+
+ xrfm_dst = xfrm_dst_path(&rt->dst);
+ if (dst_allfrag(xrfm_dst))
+ cork->base.flags |= IPCORK_ALLFRAG;
+
+ if (np->pmtudisc < IPV6_PMTUDISC_PROBE)
+ mtu = dst_mtu(rt->dst.flags & DST_XFRM_TUNNEL ? &rt->dst : xrfm_dst);
else
- mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
- READ_ONCE(rt->dst.dev->mtu) : dst_mtu(xfrm_dst_path(&rt->dst));
- if (np->frag_size < mtu) {
- if (np->frag_size)
- mtu = np->frag_size;
- }
+ mtu = READ_ONCE(rt->dst.dev->mtu);
+
+ if (np->frag_size < mtu && np->frag_size)
+ mtu = np->frag_size;
+
cork->base.fragsize = mtu;
cork->base.gso_size = ipc6->gso_size;
cork->base.tx_flags = 0;
cork->base.mark = ipc6->sockc.mark;
sock_tx_timestamp(sk, ipc6->sockc.tsflags, &cork->base.tx_flags);
-
- if (dst_allfrag(xfrm_dst_path(&rt->dst)))
- cork->base.flags |= IPCORK_ALLFRAG;
cork->base.length = 0;
-
cork->base.transmit_time = ipc6->sockc.transmit_time;
-
return 0;
}

--
2.36.0
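
One way to convince yourself that the reshuffled MTU selection is equivalent to the old one is a small userspace model. The sketch below is an illustration only: `struct mtu_in` and its fields are hypothetical stand-ins for the kernel state (the flag test, the pmtudisc comparison and the three candidate MTU values), not real kernel structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs modelling what ip6_setup_cork() looks at. */
struct mtu_in {
	bool xfrm_tunnel;	/* rt->dst.flags & DST_XFRM_TUNNEL */
	bool probe;		/* np->pmtudisc >= IPV6_PMTUDISC_PROBE */
	unsigned int dev_mtu;	/* READ_ONCE(rt->dst.dev->mtu) */
	unsigned int rt_mtu;	/* dst_mtu(&rt->dst) */
	unsigned int path_mtu;	/* dst_mtu(xfrm_dst_path(&rt->dst)) */
	unsigned int frag_size;	/* np->frag_size */
};

/* Branch structure before the patch. */
static unsigned int mtu_old(const struct mtu_in *in)
{
	unsigned int mtu;

	if (in->xfrm_tunnel)
		mtu = in->probe ? in->dev_mtu : in->rt_mtu;
	else
		mtu = in->probe ? in->dev_mtu : in->path_mtu;
	if (in->frag_size < mtu) {
		if (in->frag_size)
			mtu = in->frag_size;
	}
	return mtu;
}

/* Branch structure after the patch. */
static unsigned int mtu_new(const struct mtu_in *in)
{
	unsigned int mtu;

	if (!in->probe)
		mtu = in->xfrm_tunnel ? in->rt_mtu : in->path_mtu;
	else
		mtu = in->dev_mtu;
	if (in->frag_size < mtu && in->frag_size)
		mtu = in->frag_size;
	return mtu;
}

/* Compare both versions over every flag combination and a few sizes. */
static void check_mtu_equivalence(void)
{
	static const unsigned int frag[] = { 0, 1000, 1280, 1500, 9000 };

	for (int t = 0; t < 2; t++)
		for (int p = 0; p < 2; p++)
			for (size_t i = 0; i < sizeof(frag) / sizeof(frag[0]); i++) {
				struct mtu_in in = { t, p, 1500, 1400, 1300, frag[i] };

				assert(mtu_old(&in) == mtu_new(&in));
			}
}
```

The sample MTU values (1500, 1400, 1300) are arbitrary; they are only chosen to be distinct so that a wrong branch would show up.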

2022-04-29 00:31:28

by Paolo Abeni

Subject: Re: [PATCH net-next 00/11] UDP/IPv6 refactoring

On Thu, 2022-04-28 at 11:56 +0100, Pavel Begunkov wrote:
> Refactor UDP/IPv6 and especially udpv6_sendmsg() paths. The end result looks
> cleaner than it was before and the series also removes a bunch of instructions
> and other overhead from the hot path positively affecting performance.
>
> It was a part of a larger series, there were some perf numbers for it, see
> https://lore.kernel.org/netdev/[email protected]/
>
> Pavel Begunkov (11):
> ipv6: optimise ipcm6 cookie init
> udp/ipv6: refactor udpv6_sendmsg udplite checks
> udp/ipv6: move pending section of udpv6_sendmsg
> udp/ipv6: prioritise the ip6 path over ip4 checks
> udp/ipv6: optimise udpv6_sendmsg() daddr checks
> udp/ipv6: optimise out daddr reassignment
> udp/ipv6: clean up udpv6_sendmsg's saddr init
> ipv6: partially inline fl6_update_dst()
> ipv6: refactor opts push in __ip6_make_skb()
> ipv6: improve opt-less __ip6_make_skb()
> ipv6: clean up ip6_setup_cork
>
> include/net/ipv6.h | 24 +++----
> net/ipv6/datagram.c | 4 +-
> net/ipv6/exthdrs.c | 15 ++--
> net/ipv6/ip6_output.c | 53 +++++++-------
> net/ipv6/raw.c | 8 +--
> net/ipv6/udp.c | 158 ++++++++++++++++++++----------------------
> net/l2tp/l2tp_ip6.c | 8 +--
> 7 files changed, 122 insertions(+), 148 deletions(-)

Just a general comment here: IMHO the above diffstat is quite
significant and some patches look completely non trivial to me.

I think we need a quite significant performance gain to justify the
above, could you please share your performance data, comprising the
testing scenario?

Thanks!

Paolo

2022-04-29 01:05:00

by Pavel Begunkov

Subject: [PATCH net-next 06/11] udp/ipv6: optimise out daddr reassignment

There is nothing that checks daddr placement in udpv6_sendmsg(), so the
check reassigning it to ->sk_v6_daddr looks like an artifact from the past
that is no longer needed. Remove it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/udp.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 1f05e165eb17..34c5919afa3e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1417,14 +1417,6 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
}
}

- /*
- * Otherwise it will be difficult to maintain
- * sk->sk_dst_cache.
- */
- if (sk->sk_state == TCP_ESTABLISHED &&
- ipv6_addr_equal(daddr, &sk->sk_v6_daddr))
- daddr = &sk->sk_v6_daddr;
-
if (addr_len >= sizeof(struct sockaddr_in6) &&
sin6->sin6_scope_id &&
__ipv6_addr_needs_scope_id(__ipv6_addr_type(daddr)))
--
2.36.0

2022-04-29 06:21:43

by Pavel Begunkov

Subject: [PATCH net-next 05/11] udp/ipv6: optimise udpv6_sendmsg() daddr checks

All paths in udpv6_sendmsg() that reach the ipv6_addr_v4mapped() check set a
non-NULL daddr, so we can safely kill the NULL check just before it.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/udp.c | 23 +++++++++++------------
1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 78ce5fc53b59..1f05e165eb17 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1383,19 +1383,18 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
daddr = &sk->sk_v6_daddr;
}

- if (daddr) {
- if (ipv6_addr_v4mapped(daddr)) {
- struct sockaddr_in sin;
- sin.sin_family = AF_INET;
- sin.sin_port = sin6 ? sin6->sin6_port : inet->inet_dport;
- sin.sin_addr.s_addr = daddr->s6_addr32[3];
- msg->msg_name = &sin;
- msg->msg_namelen = sizeof(sin);
+ if (ipv6_addr_v4mapped(daddr)) {
+ struct sockaddr_in sin;
+
+ sin.sin_family = AF_INET;
+ sin.sin_port = sin6 ? sin6->sin6_port : inet->inet_dport;
+ sin.sin_addr.s_addr = daddr->s6_addr32[3];
+ msg->msg_name = &sin;
+ msg->msg_namelen = sizeof(sin);
do_udp_sendmsg:
- if (__ipv6_only_sock(sk))
- return -ENETUNREACH;
- return udp_sendmsg(sk, msg, len);
- }
+ if (__ipv6_only_sock(sk))
+ return -ENETUNREACH;
+ return udp_sendmsg(sk, msg, len);
}

ulen += sizeof(struct udphdr);
--
2.36.0

2022-04-29 06:52:24

by Pavel Begunkov

Subject: [PATCH net-next 04/11] udp/ipv6: prioritise the ip6 path over ip4 checks

For AF_INET6 sockets we care the most about IPv6 rather than IPv4 mappings,
as the latter require some extra hops anyway. Take the AF_INET6 case out of
the address parsing switch and add an explicit path for it. This removes some
extra ifs from the path as well as the switch overhead.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/udp.c | 37 +++++++++++++++++--------------------
1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d6aedd4dab25..78ce5fc53b59 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1357,30 +1357,27 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)

/* destination address check */
if (sin6) {
- if (addr_len < offsetof(struct sockaddr, sa_data))
- return -EINVAL;
+ if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
+ if (addr_len < offsetof(struct sockaddr, sa_data))
+ return -EINVAL;

- switch (sin6->sin6_family) {
- case AF_INET6:
- if (addr_len < SIN6_LEN_RFC2133)
+ switch (sin6->sin6_family) {
+ case AF_INET:
+ goto do_udp_sendmsg;
+ case AF_UNSPEC:
+ msg->msg_name = sin6 = NULL;
+ msg->msg_namelen = addr_len = 0;
+ goto no_daddr;
+ default:
return -EINVAL;
- daddr = &sin6->sin6_addr;
- if (ipv6_addr_any(daddr) &&
- ipv6_addr_v4mapped(&np->saddr))
- ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK),
- daddr);
- break;
- case AF_INET:
- goto do_udp_sendmsg;
- case AF_UNSPEC:
- msg->msg_name = sin6 = NULL;
- msg->msg_namelen = addr_len = 0;
- daddr = NULL;
- break;
- default:
- return -EINVAL;
+ }
}
+
+ daddr = &sin6->sin6_addr;
+ if (ipv6_addr_any(daddr) && ipv6_addr_v4mapped(&np->saddr))
+ ipv6_addr_set_v4mapped(htonl(INADDR_LOOPBACK), daddr);
} else {
+no_daddr:
if (sk->sk_state != TCP_ESTABLISHED)
return -EDESTADDRREQ;
daddr = &sk->sk_v6_daddr;
--
2.36.0
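
The reordered dispatch can be modelled in userspace as follows. This is only a sketch of the control flow, with assumptions spelled out: `enum dest_path` and `classify_dest()` are hypothetical names, the return values stand in for the goto targets and error returns in the patch, and `SIN6_LEN_RFC2133` is redefined locally with the kernel's value of 24.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define SIN6_LEN_RFC2133 24	/* kernel constant, redefined for this sketch */

/* Hypothetical outcome labels for the sketch. */
enum dest_path {
	DEST_IPV6,	/* fast path: use sin6->sin6_addr */
	DEST_IPV4,	/* goto do_udp_sendmsg */
	DEST_CONNECTED,	/* goto no_daddr: fall back to sk->sk_v6_daddr */
	DEST_INVALID,	/* return -EINVAL */
};

static enum dest_path classify_dest(const struct sockaddr_in6 *sin6,
				    size_t addr_len)
{
	if (!sin6)
		return DEST_CONNECTED;

	/* One combined test keeps the common AF_INET6 case branch-light;
	 * everything else drops into the slow path below. */
	if (addr_len < SIN6_LEN_RFC2133 || sin6->sin6_family != AF_INET6) {
		if (addr_len < offsetof(struct sockaddr, sa_data))
			return DEST_INVALID;

		switch (sin6->sin6_family) {
		case AF_INET:
			return DEST_IPV4;
		case AF_UNSPEC:
			return DEST_CONNECTED;
		default:
			/* includes AF_INET6 with a too-short address */
			return DEST_INVALID;
		}
	}
	return DEST_IPV6;
}
```

A too-short AF_INET6 address still ends up rejected, as before the patch; it just gets there through the default case of the slow path.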

2022-04-29 07:12:21

by Pavel Begunkov

Subject: [PATCH net-next 10/11] ipv6: improve opt-less __ip6_make_skb()

We do a bit of network header pointer shuffling in __ip6_make_skb(),
expecting that ipv6_push_*frag_opts() might change the layout. Avoid it
and the associated overhead when there are no opts.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/ip6_output.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 43a541bbcf5f..416d14299242 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1880,22 +1880,20 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,

/* Allow local fragmentation. */
skb->ignore_df = ip6_sk_ignore_df(sk);
- __skb_pull(skb, skb_network_header_len(skb));
-
final_dst = &fl6->daddr;
if (v6_cork->opt) {
struct ipv6_txoptions *opt = v6_cork->opt;

+ __skb_pull(skb, skb_network_header_len(skb));
if (opt->opt_flen)
ipv6_push_frag_opts(skb, opt, &proto);
if (opt->opt_nflen)
ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+ skb_push(skb, sizeof(struct ipv6hdr));
+ skb_reset_network_header(skb);
}

- skb_push(skb, sizeof(struct ipv6hdr));
- skb_reset_network_header(skb);
hdr = ipv6_hdr(skb);
-
ip6_flow_hdr(hdr, v6_cork->tclass,
ip6_make_flowlabel(net, skb, fl6->flowlabel,
ip6_autoflowlabel(net, np), fl6));
--
2.36.0

2022-05-03 00:21:37

by Pavel Begunkov

Subject: [PATCH net-next 09/11] ipv6: refactor opts push in __ip6_make_skb()

Don't preload v6_cork->opt before we actually need it; it is likely to be
saved on the stack and read again for no good reason.

Signed-off-by: Pavel Begunkov <[email protected]>
---
net/ipv6/ip6_output.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 976554d0fdec..43a541bbcf5f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1855,7 +1855,6 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
struct ipv6_pinfo *np = inet6_sk(sk);
struct net *net = sock_net(sk);
struct ipv6hdr *hdr;
- struct ipv6_txoptions *opt = v6_cork->opt;
struct rt6_info *rt = (struct rt6_info *)cork->base.dst;
struct flowi6 *fl6 = &cork->fl.u.ip6;
unsigned char proto = fl6->flowi6_proto;
@@ -1884,10 +1883,14 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
__skb_pull(skb, skb_network_header_len(skb));

final_dst = &fl6->daddr;
- if (opt && opt->opt_flen)
- ipv6_push_frag_opts(skb, opt, &proto);
- if (opt && opt->opt_nflen)
- ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+ if (v6_cork->opt) {
+ struct ipv6_txoptions *opt = v6_cork->opt;
+
+ if (opt->opt_flen)
+ ipv6_push_frag_opts(skb, opt, &proto);
+ if (opt->opt_nflen)
+ ipv6_push_nfrag_opts(skb, opt, &proto, &final_dst, &fl6->saddr);
+ }

skb_push(skb, sizeof(struct ipv6hdr));
skb_reset_network_header(skb);
--
2.36.0