2022-02-24 12:34:12

by Harold Huang

Subject: [PATCH] tun: support NAPI to accelerate packet processing

In tun, NAPI is supported and we can also use NAPI in the path of
batched XDP buffs to accelerate packet processing. What is more, after
we use NPAI, GRO is also supported. The iperf shows that the throughput
could be improved from 4.5Gbps to 9.2Gbps per stream.

Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
Signed-off-by: Harold Huang <[email protected]>
---
drivers/net/tun.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed85447701a..4e1cea659b42 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
struct virtio_net_hdr *gso = &hdr->gso;
struct bpf_prog *xdp_prog;
struct sk_buff *skb = NULL;
+ struct sk_buff_head *queue;
u32 rxhash = 0, act;
int buflen = hdr->buflen;
int err = 0;
@@ -2464,7 +2465,14 @@ static int tun_xdp_one(struct tun_struct *tun,
!tfile->detached)
rxhash = __skb_get_hash_symmetric(skb);

- netif_receive_skb(skb);
+ if (tfile->napi_enabled) {
+ queue = &tfile->sk.sk_write_queue;
+ spin_lock(&queue->lock);
+ __skb_queue_tail(queue, skb);
+ spin_unlock(&queue->lock);
+ } else {
+ netif_receive_skb(skb);
+ }

/* No need to disable preemption here since this function is
* always called with bh disabled
@@ -2507,6 +2515,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
if (flush)
xdp_do_flush();

+ if (tfile->napi_enabled)
+ napi_schedule(&tfile->napi);
+
rcu_read_unlock();
local_bh_enable();

--
2.27.0
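
The NAPI branch in the hunk above does not deliver the skb immediately. It appends the skb to a per-file queue under the queue lock and leaves delivery to the NAPI poll routine. What follows is a minimal userspace sketch of that producer/consumer pattern, not the driver code: struct pkt, struct pkt_queue and all pkt_queue_*() helpers are hypothetical stand-ins for sk_buff, sk_buff_head and the kernel queue helpers, and a C11 atomic_flag stands in for the kernel spinlock. The consumer side is an assumption about how a poll routine might drain the queue in one batch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the list linkage is modelled. */
struct pkt {
	struct pkt *next;
	int id;
};

/* Hypothetical stand-in for struct sk_buff_head: list, length and a lock. */
struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
	size_t qlen;
	atomic_flag lock;
};

static void pkt_queue_init(struct pkt_queue *q)
{
	q->head = q->tail = NULL;
	q->qlen = 0;
	atomic_flag_clear(&q->lock);
}

static void pkt_queue_lock(struct pkt_queue *q)
{
	/* Busy-wait until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		;
}

static void pkt_queue_unlock(struct pkt_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Producer: models spin_lock(); __skb_queue_tail(); spin_unlock(). */
static void pkt_queue_tail(struct pkt_queue *q, struct pkt *p)
{
	assert(p != NULL);
	p->next = NULL;
	pkt_queue_lock(q);
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->qlen++;
	pkt_queue_unlock(q);
}

/* Consumer (assumed): detach the whole list in one locked step and
 * report how many packets were taken, leaving the queue empty. */
static struct pkt *pkt_queue_splice(struct pkt_queue *q, size_t *n)
{
	struct pkt *list;

	pkt_queue_lock(q);
	list = q->head;
	*n = q->qlen;
	q->head = q->tail = NULL;
	q->qlen = 0;
	pkt_queue_unlock(q);
	return list;
}
```

In this model the producer only holds the lock for a pointer update, and the consumer receives the packets in arrival order as one batch.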


2022-02-24 17:28:38

by Paolo Abeni

Subject: Re: [PATCH] tun: support NAPI to accelerate packet processing

Hello,

On Thu, 2022-02-24 at 18:38 +0800, Harold Huang wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NPAI, GRO is also supported. The iperf shows that the throughput

Very minor nit: typo above NPAI -> NAPI

> could be improved from 4.5Gbps to 9.2Gbps per stream.
>
> Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
> Signed-off-by: Harold Huang <[email protected]>

Additionally, please explicitly specify the target tree in the patch
subject.

Cheers,

Paolo

2022-02-25 07:52:46

by Jason Wang

Subject: Re: [PATCH] tun: support NAPI to accelerate packet processing

On Thu, Feb 24, 2022 at 6:39 PM Harold Huang <[email protected]> wrote:
>
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NPAI, GRO is also supported. The iperf shows that the throughput
> could be improved from 4.5Gbps to 9.2Gbps per stream.

It would be better to describe the test setup.

And we need to tweak the title, since NAPI is already supported in some paths.
Something like "support NAPI for packets received from msg_control"?

>
> Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
> Signed-off-by: Harold Huang <[email protected]>
> ---
> drivers/net/tun.c | 13 ++++++++++++-
> 1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..4e1cea659b42 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> struct virtio_net_hdr *gso = &hdr->gso;
> struct bpf_prog *xdp_prog;
> struct sk_buff *skb = NULL;
> + struct sk_buff_head *queue;
> u32 rxhash = 0, act;
> int buflen = hdr->buflen;
> int err = 0;
> @@ -2464,7 +2465,14 @@ static int tun_xdp_one(struct tun_struct *tun,
> !tfile->detached)
> rxhash = __skb_get_hash_symmetric(skb);
>
> - netif_receive_skb(skb);
> + if (tfile->napi_enabled) {
> + queue = &tfile->sk.sk_write_queue;
> + spin_lock(&queue->lock);
> + __skb_queue_tail(queue, skb);
> + spin_unlock(&queue->lock);
> + } else {
> + netif_receive_skb(skb);
> + }
>
> /* No need to disable preemption here since this function is
> * always called with bh disabled
> @@ -2507,6 +2515,9 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> if (flush)
> xdp_do_flush();
>
> + if (tfile->napi_enabled)
> + napi_schedule(&tfile->napi);

It would be better to check whether we've queued anything, to avoid an
unnecessary NAPI schedule.

Thanks

> +
> rcu_read_unlock();
> local_bh_enable();
>
> --
> 2.27.0
>

2022-02-25 10:58:17

by Harold Huang

Subject: Re: [PATCH] tun: support NAPI to accelerate packet processing

On Fri, Feb 25, 2022 at 01:22, Paolo Abeni <[email protected]> wrote:
>
> Hello,
>
> On Thu, 2022-02-24 at 18:38 +0800, Harold Huang wrote:
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NPAI, GRO is also supported. The iperf shows that the throughput
>
> Very minor nit: typo above NPAI -> NAPI

I will fix it in the next version.

>
> > could be improved from 4.5Gbps to 9.2Gbps per stream.
> >
> > Reported-at: https://lore.kernel.org/netdev/CAHJXk3Y9_Fh04sakMMbcAkef7kOTEc-kf84Ne3DtWD7EAp13cg@mail.gmail.com/T/#t
> > Signed-off-by: Harold Huang <[email protected]>
>
> Additionally, please explicitly specify the target tree in the patch
> subject.

I will fix it in the next version.

>
> Cheers,
>
> Paolo
>

Thanks,

Harold

2022-02-25 13:29:35

by Harold Huang

Subject: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs

In tun, NAPI is supported and we can also use NAPI in the path of
batched XDP buffs to accelerate packet processing. What is more, after
we use NAPI, GRO is also supported. The iperf shows that the throughput of
single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
Gbps nearly reaches the line speed of the phy nic and there is still about
15% idle cpu core remaining on the vhost thread.

Test topology:

[iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]

Iperf stream:

Before:
...
[ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
[ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
[ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
[ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
[ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
[ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver

After:
...
[ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
[ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
[ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
[ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
[ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
[ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
....

Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
Signed-off-by: Harold Huang <[email protected]>
---
v1 -> v2
- fix commit messages
- add queued flag to avoid unnecessary napi scheduling, as suggested by Jason

drivers/net/tun.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed85447701a..c7d8b7c821d8 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
}

static int tun_xdp_one(struct tun_struct *tun,
- struct tun_file *tfile,
+ struct tun_file *tfile, int *queued,
struct xdp_buff *xdp, int *flush,
struct tun_page *tpage)
{
@@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
struct virtio_net_hdr *gso = &hdr->gso;
struct bpf_prog *xdp_prog;
struct sk_buff *skb = NULL;
+ struct sk_buff_head *queue;
u32 rxhash = 0, act;
int buflen = hdr->buflen;
int err = 0;
@@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
!tfile->detached)
rxhash = __skb_get_hash_symmetric(skb);

- netif_receive_skb(skb);
+ if (tfile->napi_enabled) {
+ queue = &tfile->sk.sk_write_queue;
+ spin_lock(&queue->lock);
+ __skb_queue_tail(queue, skb);
+ spin_unlock(&queue->lock);
+ (*queued)++;
+ } else {
+ netif_receive_skb(skb);
+ }

/* No need to disable preemption here since this function is
* always called with bh disabled
@@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
if (ctl && (ctl->type == TUN_MSG_PTR)) {
struct tun_page tpage;
int n = ctl->num;
- int flush = 0;
+ int flush = 0, queued = 0;

memset(&tpage, 0, sizeof(tpage));

@@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)

for (i = 0; i < n; i++) {
xdp = &((struct xdp_buff *)ctl->ptr)[i];
- tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+ tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
}

if (flush)
xdp_do_flush();

+ if (tfile->napi_enabled && queued > 0)
+ napi_schedule(&tfile->napi);
+
rcu_read_unlock();
local_bh_enable();

--
2.27.0

2022-02-28 06:36:50

by Harold Huang

Subject: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs

In tun, NAPI is supported and we can also use NAPI in the path of
batched XDP buffs to accelerate packet processing. What is more, after
we use NAPI, GRO is also supported. The iperf shows that the throughput of
single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
Gbps nearly reaches the line speed of the phy nic and there is still about
15% idle cpu core remaining on the vhost thread.

Test topology:
[iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]

Iperf stream:
iperf3 -c 10.0.0.2 -i 1 -t 10

Before:
...
[ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
[ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
[ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
[ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
[ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
[ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver

After:
...
[ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
[ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
[ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
[ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
[ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
[ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver

Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
Signed-off-by: Harold Huang <[email protected]>
---
v2 -> v3
- return the number of queued NAPI packets from tun_xdp_one

drivers/net/tun.c | 43 ++++++++++++++++++++++++++++++-------------
1 file changed, 30 insertions(+), 13 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index fed85447701a..969ea69fd29d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2388,9 +2388,10 @@ static int tun_xdp_one(struct tun_struct *tun,
struct virtio_net_hdr *gso = &hdr->gso;
struct bpf_prog *xdp_prog;
struct sk_buff *skb = NULL;
+ struct sk_buff_head *queue;
u32 rxhash = 0, act;
int buflen = hdr->buflen;
- int err = 0;
+ int ret = 0;
bool skb_xdp = false;
struct page *page;

@@ -2405,13 +2406,13 @@ static int tun_xdp_one(struct tun_struct *tun,
xdp_set_data_meta_invalid(xdp);

act = bpf_prog_run_xdp(xdp_prog, xdp);
- err = tun_xdp_act(tun, xdp_prog, xdp, act);
- if (err < 0) {
+ ret = tun_xdp_act(tun, xdp_prog, xdp, act);
+ if (ret < 0) {
put_page(virt_to_head_page(xdp->data));
- return err;
+ return ret;
}

- switch (err) {
+ switch (ret) {
case XDP_REDIRECT:
*flush = true;
fallthrough;
@@ -2435,7 +2436,7 @@ static int tun_xdp_one(struct tun_struct *tun,
build:
skb = build_skb(xdp->data_hard_start, buflen);
if (!skb) {
- err = -ENOMEM;
+ ret = -ENOMEM;
goto out;
}

@@ -2445,7 +2446,7 @@ static int tun_xdp_one(struct tun_struct *tun,
if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
atomic_long_inc(&tun->rx_frame_errors);
kfree_skb(skb);
- err = -EINVAL;
+ ret = -EINVAL;
goto out;
}

@@ -2455,16 +2456,27 @@ static int tun_xdp_one(struct tun_struct *tun,
skb_record_rx_queue(skb, tfile->queue_index);

if (skb_xdp) {
- err = do_xdp_generic(xdp_prog, skb);
- if (err != XDP_PASS)
+ ret = do_xdp_generic(xdp_prog, skb);
+ if (ret != XDP_PASS) {
+ ret = 0;
goto out;
+ }
}

if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
!tfile->detached)
rxhash = __skb_get_hash_symmetric(skb);

- netif_receive_skb(skb);
+ if (tfile->napi_enabled) {
+ queue = &tfile->sk.sk_write_queue;
+ spin_lock(&queue->lock);
+ __skb_queue_tail(queue, skb);
+ spin_unlock(&queue->lock);
+ ret = 1;
+ } else {
+ netif_receive_skb(skb);
+ ret = 0;
+ }

/* No need to disable preemption here since this function is
* always called with bh disabled
@@ -2475,7 +2487,7 @@ static int tun_xdp_one(struct tun_struct *tun,
tun_flow_update(tun, rxhash, tfile);

out:
- return err;
+ return ret;
}

static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
@@ -2492,7 +2504,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
if (ctl && (ctl->type == TUN_MSG_PTR)) {
struct tun_page tpage;
int n = ctl->num;
- int flush = 0;
+ int flush = 0, queued = 0;

memset(&tpage, 0, sizeof(tpage));

@@ -2501,12 +2513,17 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)

for (i = 0; i < n; i++) {
xdp = &((struct xdp_buff *)ctl->ptr)[i];
- tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+ ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+ if (ret > 0)
+ queued += ret;
}

if (flush)
xdp_do_flush();

+ if (tfile->napi_enabled && queued > 0)
+ napi_schedule(&tfile->napi);
+
rcu_read_unlock();
local_bh_enable();

--
2.27.0
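
The v3 diff above changes what tun_xdp_one() returns: a negative errno on failure, 0 when the buffer was handled without being queued, and 1 when the skb was queued for NAPI. The caller adds up only the positive values and schedules NAPI once per batch. A small userspace model of that convention is sketched below. The enum buf_outcome values and the functions xdp_one_model() and batch_model() are hypothetical names invented for illustration and do not exist in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-buffer outcomes, invented for this model. */
enum buf_outcome {
	BUF_BUILD_FAILED,	/* e.g. build_skb() returned NULL */
	BUF_XDP_CONSUMED,	/* XDP verdict other than XDP_PASS */
	BUF_PASSED		/* packet should go up the stack */
};

/* Models the v3 return convention:
 * < 0 error, 0 handled without queueing, 1 queued for NAPI. */
static int xdp_one_model(enum buf_outcome outcome, bool napi_enabled)
{
	switch (outcome) {
	case BUF_BUILD_FAILED:
		return -ENOMEM;
	case BUF_XDP_CONSUMED:
		return 0;
	case BUF_PASSED:
		/* Without NAPI the packet is delivered directly, so
		 * nothing is left on the queue. */
		return napi_enabled ? 1 : 0;
	}
	return -EINVAL;
}

/* Models the batch loop: errors and zeros do not count, and the
 * return value is the number of NAPI schedules (0 or 1). */
static int batch_model(const enum buf_outcome *bufs, int n,
		       bool napi_enabled, int *queued_out)
{
	int queued = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++) {
		int ret = xdp_one_model(bufs[i], napi_enabled);

		if (ret > 0)
			queued += ret;
	}
	*queued_out = queued;
	return (napi_enabled && queued > 0) ? 1 : 0;
}
```

Folding the count into the return value avoids the extra int *queued parameter that v2 added, at the cost of overloading a value that used to carry only an error code.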

2022-02-28 06:53:06

by Jason Wang

Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs

On Mon, Feb 28, 2022 at 12:59 PM Eric Dumazet <[email protected]> wrote:
>
>
>
> On Sun, Feb 27, 2022 at 8:20 PM Jason Wang <[email protected]> wrote:
>>
>> On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <[email protected]> wrote:
>>
>> > How big n can be ?
>> >
>> > BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
>> >
>> > struct tun_msg_ctl *ctl = m->msg_control;
>> >
>> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
>> >
>> > int n = ctl->num; // can be set to values in [0..65535]
>> >
>> > for (i = 0; i < n; i++) {
>> >
>> > xdp = &((struct xdp_buff *)ctl->ptr)[i];
>> >
>> >
>> > I really do not understand how we prevent malicious user space from
>> > crashing the kernel.
>>
>> It looks to me the only user for this is vhost-net which limits it to
>> 64, userspace can't use sendmsg() directly on tap.
>>
>
> Ah right, thanks for the clarification.
>
> (IMO, either remove the "msg.msg_controllen = sizeof(ctl);" from handle_tx_zerocopy(), or add sanity checks in tun_sendmsg())
>
>

Right, Harold, want to do that?

Thanks

2022-02-28 07:48:46

by Eric Dumazet

Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs


On 2/25/22 01:02, Harold Huang wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
>
> After:
> ...
> [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <[email protected]>
> ---
> v1 -> v2
> - fix commit messages
> - add queued flag to avoid unnecessary napi scheduling, as suggested by Jason
>
> drivers/net/tun.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> }
>
> static int tun_xdp_one(struct tun_struct *tun,
> - struct tun_file *tfile,
> + struct tun_file *tfile, int *queued,
> struct xdp_buff *xdp, int *flush,
> struct tun_page *tpage)
> {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> struct virtio_net_hdr *gso = &hdr->gso;
> struct bpf_prog *xdp_prog;
> struct sk_buff *skb = NULL;
> + struct sk_buff_head *queue;
> u32 rxhash = 0, act;
> int buflen = hdr->buflen;
> int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> !tfile->detached)
> rxhash = __skb_get_hash_symmetric(skb);
>
> - netif_receive_skb(skb);
> + if (tfile->napi_enabled) {
> + queue = &tfile->sk.sk_write_queue;
> + spin_lock(&queue->lock);
> + __skb_queue_tail(queue, skb);
> + spin_unlock(&queue->lock);
> + (*queued)++;
> + } else {
> + netif_receive_skb(skb);
> + }
>
> /* No need to disable preemption here since this function is
> * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
> struct tun_page tpage;
> int n = ctl->num;
> - int flush = 0;
> + int flush = 0, queued = 0;
>
> memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
> for (i = 0; i < n; i++) {
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
> - tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> + tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);


How big n can be ?

BTW I could not find where m->msg_controllen was checked in tun_sendmsg().

struct tun_msg_ctl *ctl = m->msg_control;

if (ctl && (ctl->type == TUN_MSG_PTR)) {

    int n = ctl->num;  // can be set to values in [0..65535]

    for (i = 0; i < n; i++) {

        xdp = &((struct xdp_buff *)ctl->ptr)[i];


I really do not understand how we prevent malicious user space from
crashing the kernel.



> }
>
> if (flush)
> xdp_do_flush();
>
> + if (tfile->napi_enabled && queued > 0)
> + napi_schedule(&tfile->napi);
> +
> rcu_read_unlock();
> local_bh_enable();
>

2022-02-28 09:03:44

by Jason Wang

Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs

On Mon, Feb 28, 2022 at 3:27 PM Harold Huang <[email protected]> wrote:
>
> Thanks for the suggestions.
>
> On Mon, Feb 28, 2022 at 1:17 PM Jason Wang <[email protected]> wrote:
> >
> > On Mon, Feb 28, 2022 at 12:59 PM Eric Dumazet <[email protected]> wrote:
> > >
> > >
> > >
> > > On Sun, Feb 27, 2022 at 8:20 PM Jason Wang <[email protected]> wrote:
> > >>
> > >> On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <[email protected]> wrote:
> > >>
> > >> > How big n can be ?
> > >> >
> > >> > BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
> > >> >
> > >> > struct tun_msg_ctl *ctl = m->msg_control;
> > >> >
> > >> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
> > >> >
> > >> > int n = ctl->num; // can be set to values in [0..65535]
> > >> >
> > >> > for (i = 0; i < n; i++) {
> > >> >
> > >> > xdp = &((struct xdp_buff *)ctl->ptr)[i];
> > >> >
> > >> >
> > >> > I really do not understand how we prevent malicious user space from
> > >> > crashing the kernel.
> > >>
> > >> It looks to me the only user for this is vhost-net which limits it to
> > >> 64, userspace can't use sendmsg() directly on tap.
> > >>
> > >
> > > Ah right, thanks for the clarification.
> > >
> > > (IMO, either remove the "msg.msg_controllen = sizeof(ctl);" from handle_tx_zerocopy(), or add sanity checks in tun_sendmsg())
> > >
> > >
> >
> > Right, Harold, want to do that?
>
> I am very willing to do that, but I am not quite sure about it.
>
> If we remove the "msg.msg_controllen = sizeof(ctl);" from
> handle_tx_zerocopy(), it seems msg.msg_controllen is always 0. What
> does it stand for?

It means msg_controllen is not used. But see below (adding sanity
check seems to be better).

>
> I see tap_sendmsg in drivers/net/tap.c also uses msg_control to
> send batched xdp buffers. Do we need to add the same sanity checks to
> tap_sendmsg as to tun_sendmsg?
>

I think the point is to make sure the caller doesn't send us a msg_control
that is too short. E.g. msg_controllen should be sizeof(struct tun_msg_ctl).

So we probably need to check in both places. (And initialize
msg_controllen in vhost_tx_batch().)

Thanks
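
The sanity check discussed above could look roughly like the following userspace sketch. It is an illustration under stated assumptions, not the fix that was merged: struct ctl_model is a simplified stand-in for struct tun_msg_ctl using the field names from the quoted code, CTL_TYPE_PTR stands in for TUN_MSG_PTR, check_msg_control() is a hypothetical helper, and the upper bound of 64 is taken from the vhost-net batch limit mentioned earlier in the thread. Whether the kernel should enforce that bound at all is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for struct tun_msg_ctl. */
struct ctl_model {
	unsigned short type;
	unsigned short num;
	void *ptr;
};

#define CTL_TYPE_PTR	2	/* stand-in for TUN_MSG_PTR */
#define BATCH_MAX	64	/* batch limit quoted for vhost-net */

/* The bound must be representable in the 16-bit num field. */
static_assert(BATCH_MAX <= USHRT_MAX, "batch limit must fit in num");

/* Hypothetical helper: returns 0 if the control data is acceptable,
 * -EINVAL if it is too short, too long, or describes an oversized
 * or pointerless batch. Absent control data is accepted. */
static int check_msg_control(const void *control, size_t controllen)
{
	const struct ctl_model *ctl = control;

	if (!control)
		return 0;
	/* Reject a buffer that cannot hold the whole control struct
	 * before reading any of its fields. */
	if (controllen != sizeof(*ctl))
		return -EINVAL;
	if (ctl->type == CTL_TYPE_PTR &&
	    (ctl->num > BATCH_MAX || (ctl->num > 0 && !ctl->ptr)))
		return -EINVAL;
	return 0;
}
```

The length test comes first so that type, num and ptr are only read once the buffer is known to be large enough. The same helper shape would apply to tap_sendmsg, since both paths receive the same control structure.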

2022-02-28 09:04:11

by Jason Wang

Subject: Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs

On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <[email protected]> wrote:
>
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
> iperf3 -c 10.0.0.2 -i 1 -t 10
>
> Before:
> ...
> [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
>
> After:
> ...
> [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <[email protected]>

Acked-by: Jason Wang <[email protected]>

> ---
> v2 -> v3
> - return the number of queued NAPI packets from tun_xdp_one
>
> drivers/net/tun.c | 43 ++++++++++++++++++++++++++++++-------------
> 1 file changed, 30 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..969ea69fd29d 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2388,9 +2388,10 @@ static int tun_xdp_one(struct tun_struct *tun,
> struct virtio_net_hdr *gso = &hdr->gso;
> struct bpf_prog *xdp_prog;
> struct sk_buff *skb = NULL;
> + struct sk_buff_head *queue;
> u32 rxhash = 0, act;
> int buflen = hdr->buflen;
> - int err = 0;
> + int ret = 0;
> bool skb_xdp = false;
> struct page *page;
>
> @@ -2405,13 +2406,13 @@ static int tun_xdp_one(struct tun_struct *tun,
> xdp_set_data_meta_invalid(xdp);
>
> act = bpf_prog_run_xdp(xdp_prog, xdp);
> - err = tun_xdp_act(tun, xdp_prog, xdp, act);
> - if (err < 0) {
> + ret = tun_xdp_act(tun, xdp_prog, xdp, act);
> + if (ret < 0) {
> put_page(virt_to_head_page(xdp->data));
> - return err;
> + return ret;
> }
>
> - switch (err) {
> + switch (ret) {
> case XDP_REDIRECT:
> *flush = true;
> fallthrough;
> @@ -2435,7 +2436,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> build:
> skb = build_skb(xdp->data_hard_start, buflen);
> if (!skb) {
> - err = -ENOMEM;
> + ret = -ENOMEM;
> goto out;
> }
>
> @@ -2445,7 +2446,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
> atomic_long_inc(&tun->rx_frame_errors);
> kfree_skb(skb);
> - err = -EINVAL;
> + ret = -EINVAL;
> goto out;
> }
>
> @@ -2455,16 +2456,27 @@ static int tun_xdp_one(struct tun_struct *tun,
> skb_record_rx_queue(skb, tfile->queue_index);
>
> if (skb_xdp) {
> - err = do_xdp_generic(xdp_prog, skb);
> - if (err != XDP_PASS)
> + ret = do_xdp_generic(xdp_prog, skb);
> + if (ret != XDP_PASS) {
> + ret = 0;
> goto out;
> + }
> }
>
> if (!rcu_dereference(tun->steering_prog) && tun->numqueues > 1 &&
> !tfile->detached)
> rxhash = __skb_get_hash_symmetric(skb);
>
> - netif_receive_skb(skb);
> + if (tfile->napi_enabled) {
> + queue = &tfile->sk.sk_write_queue;
> + spin_lock(&queue->lock);
> + __skb_queue_tail(queue, skb);
> + spin_unlock(&queue->lock);
> + ret = 1;
> + } else {
> + netif_receive_skb(skb);
> + ret = 0;
> + }
>
> /* No need to disable preemption here since this function is
> * always called with bh disabled
> @@ -2475,7 +2487,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> tun_flow_update(tun, rxhash, tfile);
>
> out:
> - return err;
> + return ret;
> }
>
> static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> @@ -2492,7 +2504,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
> struct tun_page tpage;
> int n = ctl->num;
> - int flush = 0;
> + int flush = 0, queued = 0;
>
> memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2513,17 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
> for (i = 0; i < n; i++) {
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
> - tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> + ret = tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> + if (ret > 0)
> + queued += ret;
> }
>
> if (flush)
> xdp_do_flush();
>
> + if (tfile->napi_enabled && queued > 0)
> + napi_schedule(&tfile->napi);
> +
> rcu_read_unlock();
> local_bh_enable();
>
> --
> 2.27.0
>

2022-02-28 10:51:04

by Harold Huang

Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs

Thanks for the suggestions.

On Mon, Feb 28, 2022 at 1:17 PM Jason Wang <[email protected]> wrote:
>
> On Mon, Feb 28, 2022 at 12:59 PM Eric Dumazet <[email protected]> wrote:
> >
> >
> >
> > On Sun, Feb 27, 2022 at 8:20 PM Jason Wang <[email protected]> wrote:
> >>
> >> On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <[email protected]> wrote:
> >>
> >> > How big n can be ?
> >> >
> >> > BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
> >> >
> >> > struct tun_msg_ctl *ctl = m->msg_control;
> >> >
> >> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
> >> >
> >> > int n = ctl->num; // can be set to values in [0..65535]
> >> >
> >> > for (i = 0; i < n; i++) {
> >> >
> >> > xdp = &((struct xdp_buff *)ctl->ptr)[i];
> >> >
> >> >
> >> > I really do not understand how we prevent malicious user space from
> >> > crashing the kernel.
> >>
> >> It looks to me the only user for this is vhost-net which limits it to
> >> 64, userspace can't use sendmsg() directly on tap.
> >>
> >
> > Ah right, thanks for the clarification.
> >
> > (IMO, either remove the "msg.msg_controllen = sizeof(ctl);" from handle_tx_zerocopy(), or add sanity checks in tun_sendmsg())
> >
> >
>
> Right, Harold, want to do that?

I am very willing to do that, but I am not quite sure about it.

If we remove the "msg.msg_controllen = sizeof(ctl);" from
handle_tx_zerocopy(), it seems msg.msg_controllen is always 0. What
does it stand for?

I see tap_sendmsg in drivers/net/tap.c also uses msg_control to
send batched xdp buffers. Do we need to add the same sanity checks to
tap_sendmsg as to tun_sendmsg?

2022-02-28 11:34:46

by Jason Wang

Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs

On Fri, Feb 25, 2022 at 5:03 PM Harold Huang <[email protected]> wrote:
>
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> Test topology:
>
> [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
>
> Iperf stream:
>
> Before:
> ...
> [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
>
> After:
> ...
> [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> - - - - - - - - - - - - - - - - - - - - - - - - -
> [ ID] Interval Transfer Bitrate Retr
> [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> ....
>
> Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> Signed-off-by: Harold Huang <[email protected]>
> ---
> v1 -> v2
> - fix commit messages
> - add queued flag to avoid unnecessary napi scheduling, as suggested by Jason
>
> drivers/net/tun.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index fed85447701a..c7d8b7c821d8 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> }
>
> static int tun_xdp_one(struct tun_struct *tun,
> - struct tun_file *tfile,
> + struct tun_file *tfile, int *queued,
> struct xdp_buff *xdp, int *flush,
> struct tun_page *tpage)

Nit: how about simply returning the number of packets queued here?

Thanks
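
For illustration only, a minimal userspace sketch of what "returning the number of packets queued" could look like, compared with threading an `int *queued` out-parameter through the call. The names `fake_buf`, `process_one` and `process_batch` are hypothetical stand-ins and are not taken from drivers/net/tun.c; the real tun_xdp_one() also returns error codes, which is why the sketch only adds up positive return values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one batched buffer; not the kernel's xdp_buff. */
struct fake_buf {
    int valid;              /* 0 means the buffer is dropped */
};

/*
 * Hypothetical analogue of tun_xdp_one(): returns a negative value on
 * failure, otherwise the number of packets queued for NAPI (0 or 1).
 */
int process_one(const struct fake_buf *buf, int napi_enabled)
{
    if (!buf->valid)
        return -1;                  /* dropped: nothing queued */
    return napi_enabled ? 1 : 0;    /* 0: delivered directly, not queued */
}

/* The caller sums the positive return values instead of passing int *queued. */
int process_batch(const struct fake_buf *bufs, size_t n, int napi_enabled)
{
    int queued = 0;

    assert(bufs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int ret = process_one(&bufs[i], napi_enabled);

        if (ret > 0)
            queued += ret;
    }
    return queued;  /* a caller would schedule NAPI only when this is > 0 */
}
```

The trade-off is that the return value now carries two meanings (error or count), so every caller has to check the sign before adding it up.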

> {
> @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> struct virtio_net_hdr *gso = &hdr->gso;
> struct bpf_prog *xdp_prog;
> struct sk_buff *skb = NULL;
> + struct sk_buff_head *queue;
> u32 rxhash = 0, act;
> int buflen = hdr->buflen;
> int err = 0;
> @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> !tfile->detached)
> rxhash = __skb_get_hash_symmetric(skb);
>
> - netif_receive_skb(skb);
> + if (tfile->napi_enabled) {
> + queue = &tfile->sk.sk_write_queue;
> + spin_lock(&queue->lock);
> + __skb_queue_tail(queue, skb);
> + spin_unlock(&queue->lock);
> + (*queued)++;
> + } else {
> + netif_receive_skb(skb);
> + }
>
> /* No need to disable preemption here since this function is
> * always called with bh disabled
> @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
> struct tun_page tpage;
> int n = ctl->num;
> - int flush = 0;
> + int flush = 0, queued = 0;
>
> memset(&tpage, 0, sizeof(tpage));
>
> @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
>
> for (i = 0; i < n; i++) {
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
> - tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> + tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
> }
>
> if (flush)
> xdp_do_flush();
>
> + if (tfile->napi_enabled && queued > 0)
> + napi_schedule(&tfile->napi);
> +
> rcu_read_unlock();
> local_bh_enable();
>
> --
> 2.27.0
>

2022-02-28 16:02:04

by Jason Wang

Subject: Re: [PATCH net-next v2] tun: support NAPI for packets received from batched XDP buffs

On Mon, Feb 28, 2022 at 12:06 PM Eric Dumazet <[email protected]> wrote:
>
>
> On 2/25/22 01:02, Harold Huang wrote:
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > Gbps nearly reaches the line speed of the phy nic and there is still about
> > 15% idle cpu core remaining on the vhost thread.
> >
> > Test topology:
> >
> > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> >
> > Iperf stream:
> >
> > Before:
> > ...
> > [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> > [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> > [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> > [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> > [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> > [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
> >
> > After:
> > ...
> > [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> > [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> > [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> > [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> > [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> > [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> > ....
> >
> > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > Signed-off-by: Harold Huang <[email protected]>
> > ---
> > v1 -> v2
> > - fix commit messages
> > - add queued flag to avoid unnecessary napi scheduling, as suggested by Jason
> >
> > drivers/net/tun.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > index fed85447701a..c7d8b7c821d8 100644
> > --- a/drivers/net/tun.c
> > +++ b/drivers/net/tun.c
> > @@ -2379,7 +2379,7 @@ static void tun_put_page(struct tun_page *tpage)
> > }
> >
> > static int tun_xdp_one(struct tun_struct *tun,
> > - struct tun_file *tfile,
> > + struct tun_file *tfile, int *queued,
> > struct xdp_buff *xdp, int *flush,
> > struct tun_page *tpage)
> > {
> > @@ -2388,6 +2388,7 @@ static int tun_xdp_one(struct tun_struct *tun,
> > struct virtio_net_hdr *gso = &hdr->gso;
> > struct bpf_prog *xdp_prog;
> > struct sk_buff *skb = NULL;
> > + struct sk_buff_head *queue;
> > u32 rxhash = 0, act;
> > int buflen = hdr->buflen;
> > int err = 0;
> > @@ -2464,7 +2465,15 @@ static int tun_xdp_one(struct tun_struct *tun,
> > !tfile->detached)
> > rxhash = __skb_get_hash_symmetric(skb);
> >
> > - netif_receive_skb(skb);
> > + if (tfile->napi_enabled) {
> > + queue = &tfile->sk.sk_write_queue;
> > + spin_lock(&queue->lock);
> > + __skb_queue_tail(queue, skb);
> > + spin_unlock(&queue->lock);
> > + (*queued)++;
> > + } else {
> > + netif_receive_skb(skb);
> > + }
> >
> > /* No need to disable preemption here since this function is
> > * always called with bh disabled
> > @@ -2492,7 +2501,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> > if (ctl && (ctl->type == TUN_MSG_PTR)) {
> > struct tun_page tpage;
> > int n = ctl->num;
> > - int flush = 0;
> > + int flush = 0, queued = 0;
> >
> > memset(&tpage, 0, sizeof(tpage));
> >
> > @@ -2501,12 +2510,15 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
> >
> > for (i = 0; i < n; i++) {
> > xdp = &((struct xdp_buff *)ctl->ptr)[i];
> > - tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
> > + tun_xdp_one(tun, tfile, &queued, xdp, &flush, &tpage);
>
>
> How big can n be?
>
> BTW I could not find where m->msg_controllen was checked in tun_sendmsg().
>
> struct tun_msg_ctl *ctl = m->msg_control;
>
> if (ctl && (ctl->type == TUN_MSG_PTR)) {
>
> int n = ctl->num; // can be set to values in [0..65535]
>
> for (i = 0; i < n; i++) {
>
> xdp = &((struct xdp_buff *)ctl->ptr)[i];
>
>
> I really do not understand how we prevent malicious user space from
> crashing the kernel.

It looks to me like the only user of this path is vhost-net, which limits
the batch to 64; userspace can't call sendmsg() directly on tap.

Thanks
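
As a hedged illustration of the concern raised above (a caller-supplied count used as a loop bound without validation), here is a small userspace sketch of a defensive bound check. Nothing in this thread proposes adding such a check to tun_sendmsg(); `batch_ctl`, `consume_batch` and `MAX_BATCH` are hypothetical names, and the value 64 simply mirrors the vhost-net limit mentioned in the reply.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_BATCH 64        /* hypothetical cap mirroring the 64 mentioned above */

struct batch_ctl {          /* hypothetical, loosely modelled on tun_msg_ctl */
    unsigned short num;     /* number of entries the caller claims to pass */
    const int *ptr;         /* entries; plain ints here for illustration */
};

/*
 * Walks the batch only after checking the claimed count against the cap.
 * Returns 0 and stores the sum of the entries, or -EINVAL if the control
 * block is missing or the count is not trusted.
 */
int consume_batch(const struct batch_ctl *ctl, long *sum_out)
{
    long sum = 0;

    assert(sum_out != NULL);
    if (ctl == NULL || ctl->ptr == NULL || ctl->num > MAX_BATCH)
        return -EINVAL;

    for (size_t i = 0; i < ctl->num; i++)
        sum += ctl->ptr[i];

    *sum_out = sum;
    return 0;
}
```

Whether such a check is needed depends on who can reach the code path; the argument in the reply is that only an in-kernel caller can, and that caller already bounds the batch.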

>
>
>
> > }
> >
> > if (flush)
> > xdp_do_flush();
> >
> > + if (tfile->napi_enabled && queued > 0)
> > + napi_schedule(&tfile->napi);
> > +
> > rcu_read_unlock();
> > local_bh_enable();
> >
>

2022-02-28 17:54:24

by Stephen Hemminger

Subject: Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs

On Mon, 28 Feb 2022 15:46:56 +0800
Jason Wang <[email protected]> wrote:

> On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <[email protected]> wrote:
> >
> > In tun, NAPI is supported and we can also use NAPI in the path of
> > batched XDP buffs to accelerate packet processing. What is more, after
> > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > > Gbps nearly reaches the line speed of the phy nic and there is still about
> > 15% idle cpu core remaining on the vhost thread.
> >
> > Test topology:
> > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> >
> > Iperf stream:
> > iperf3 -c 10.0.0.2 -i 1 -t 10
> >
> > Before:
> > ...
> > [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> > [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> > [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> > [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> > [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> > [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
> >
> > After:
> > ...
> > [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> > [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> > [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> > [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> > [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> > - - - - - - - - - - - - - - - - - - - - - - - - -
> > [ ID] Interval Transfer Bitrate Retr
> > [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> > [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> >
> > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > Signed-off-by: Harold Huang <[email protected]>
>
> Acked-by: Jason Wang <[email protected]>

Would this help when using sendmmsg and recvmmsg on the TAP device?
Asking because interested in speeding up another use of TAP device, and wondering
if this would help.

2022-03-01 02:50:04

by Harold Huang

Subject: Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs

On Tue, Mar 1, 2022 at 1:15 AM Stephen Hemminger
<[email protected]> wrote:
>
> On Mon, 28 Feb 2022 15:46:56 +0800
> Jason Wang <[email protected]> wrote:
>
> > On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <[email protected]> wrote:
> > >
> > > In tun, NAPI is supported and we can also use NAPI in the path of
> > > batched XDP buffs to accelerate packet processing. What is more, after
> > > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > > > Gbps nearly reaches the line speed of the phy nic and there is still about
> > > 15% idle cpu core remaining on the vhost thread.
> > >
> > > Test topology:
> > > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> > >
> > > Iperf stream:
> > > iperf3 -c 10.0.0.2 -i 1 -t 10
> > >
> > > Before:
> > > ...
> > > [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> > > [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> > > [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> > > [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> > > [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> > > [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
> > >
> > > After:
> > > ...
> > > [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> > > [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> > > [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> > > [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> > > [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> > > [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> > >
> > > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > > Signed-off-by: Harold Huang <[email protected]>
> >
> > Acked-by: Jason Wang <[email protected]>
>
> Would this help when using sendmmsg and recvmmsg on the TAP device?
> Asking because interested in speeding up another use of TAP device, and wondering
> if this would help.

As Jason said, sendmmsg()/recvmmsg() cannot be used on tuntap. But I
think another option is to use writev()/readv() directly on the tuntap fd.
writev() calls tun_get_user() to send the packet, and that path already
supports NAPI.

2022-03-01 04:20:58

by Jason Wang

Subject: Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs

On Tue, Mar 1, 2022 at 1:15 AM Stephen Hemminger
<[email protected]> wrote:
>
> On Mon, 28 Feb 2022 15:46:56 +0800
> Jason Wang <[email protected]> wrote:
>
> > On Mon, Feb 28, 2022 at 11:38 AM Harold Huang <[email protected]> wrote:
> > >
> > > In tun, NAPI is supported and we can also use NAPI in the path of
> > > batched XDP buffs to accelerate packet processing. What is more, after
> > > we use NAPI, GRO is also supported. The iperf shows that the throughput of
> > > single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> > > > Gbps nearly reaches the line speed of the phy nic and there is still about
> > > 15% idle cpu core remaining on the vhost thread.
> > >
> > > Test topology:
> > > [iperf server]<--->tap<--->dpdk testpmd<--->phy nic<--->[iperf client]
> > >
> > > Iperf stream:
> > > iperf3 -c 10.0.0.2 -i 1 -t 10
> > >
> > > Before:
> > > ...
> > > [ 5] 5.00-6.00 sec 558 MBytes 4.68 Gbits/sec 0 1.50 MBytes
> > > [ 5] 6.00-7.00 sec 556 MBytes 4.67 Gbits/sec 1 1.35 MBytes
> > > [ 5] 7.00-8.00 sec 556 MBytes 4.67 Gbits/sec 2 1.18 MBytes
> > > [ 5] 8.00-9.00 sec 559 MBytes 4.69 Gbits/sec 0 1.48 MBytes
> > > [ 5] 9.00-10.00 sec 556 MBytes 4.67 Gbits/sec 1 1.33 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 5.39 GBytes 4.63 Gbits/sec 72 sender
> > > [ 5] 0.00-10.04 sec 5.39 GBytes 4.61 Gbits/sec receiver
> > >
> > > After:
> > > ...
> > > [ 5] 5.00-6.00 sec 1.07 GBytes 9.19 Gbits/sec 0 1.55 MBytes
> > > [ 5] 6.00-7.00 sec 1.08 GBytes 9.30 Gbits/sec 0 1.63 MBytes
> > > [ 5] 7.00-8.00 sec 1.08 GBytes 9.25 Gbits/sec 0 1.72 MBytes
> > > [ 5] 8.00-9.00 sec 1.08 GBytes 9.25 Gbits/sec 77 1.31 MBytes
> > > [ 5] 9.00-10.00 sec 1.08 GBytes 9.24 Gbits/sec 0 1.48 MBytes
> > > - - - - - - - - - - - - - - - - - - - - - - - - -
> > > [ ID] Interval Transfer Bitrate Retr
> > > [ 5] 0.00-10.00 sec 10.8 GBytes 9.28 Gbits/sec 166 sender
> > > [ 5] 0.00-10.04 sec 10.8 GBytes 9.24 Gbits/sec receiver
> > >
> > > Reported-at: https://lore.kernel.org/all/CACGkMEvTLG0Ayg+TtbN4q4pPW-ycgCCs3sC3-TF8cuRTf7Pp1A@mail.gmail.com
> > > Signed-off-by: Harold Huang <[email protected]>
> >
> > Acked-by: Jason Wang <[email protected]>
>
> Would this help when using sendmmsg and recvmmsg on the TAP device?

We haven't exported the socket object of tuntap to userspace, so we
can't use sendmmsg()/recvmmsg() on it now.

> Asking because interested in speeding up another use of TAP device, and wondering
> if this would help.
>

Yes, it would be interesting. We need someone to work on that.

Thanks

2022-03-02 05:34:27

by patchwork-bot+netdevbpf

Subject: Re: [PATCH net-next v3] tun: support NAPI for packets received from batched XDP buffs

Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <[email protected]>:

On Mon, 28 Feb 2022 11:38:05 +0800 you wrote:
> In tun, NAPI is supported and we can also use NAPI in the path of
> batched XDP buffs to accelerate packet processing. What is more, after
> we use NAPI, GRO is also supported. The iperf shows that the throughput of
> single stream could be improved from 4.5Gbps to 9.2Gbps. Additionally, 9.2
> Gbps nearly reaches the line speed of the phy nic and there is still about
> 15% idle cpu core remaining on the vhost thread.
>
> [...]

Here is the summary with links:
- [net-next,v3] tun: support NAPI for packets received from batched XDP buffs
https://git.kernel.org/netdev/net-next/c/fb3f903769e8

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html