Received: by 2002:ac0:a594:0:0:0:0:0 with SMTP id m20-v6csp56614imm; Mon, 21 May 2018 02:07:12 -0700 (PDT) X-Google-Smtp-Source: AB8JxZpfCM2b2wdg/08DDaIULei8FlrERVkGHXRHBU7zzJNy6D7yOwHnxBMAJoSNCakR9IACCF57 X-Received: by 2002:a65:6085:: with SMTP id t5-v6mr15324312pgu.362.1526893632446; Mon, 21 May 2018 02:07:12 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1526893632; cv=none; d=google.com; s=arc-20160816; b=Lw5u0qe7cYAYmtPBjIcT6wp7UOwDeKrw5iwsJp11UgQKJnORN1Axa62WKTqn3SJEc7 UE/MNDzFiNxTI1bWbhl9GTqrb4MaolchmGHmx9Dz/4Tl+sCoMV1s+J3e+hjGfcb8nMvd BU7EONlg85D+w3ck9nidxkCIFslMy8QUhHgofiFLamnOAxDyzvSXpiWQx/9EiZa8NnIv t2dk/bF2noB0+As7QIi7Hm4LkmuKjtW3ITD88foWX5YJpFQArwcaGvIunFP94IsY2ibe 4yedJGolq/b3x7xkClj1ArvtRaD20qpA+I/xxs6e3iWtSD3fTViKEA61VOwJxvGvT+3r 2Ucg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:arc-authentication-results; bh=RO8oAfcgINWyOLQwPBLvsF9y10fdNQlPbvlYkXw3e6Y=; b=Z2i9pYO7Tsb/vf34GmWX3K2J68VFgccKVo2z/o5JDk3YHIcEyy9+tg03VAYj4tSS+n ntOBEEVUx6fB+dtktAzXigrXPku068EmKWBX6p7kj7XBwE+A82jQbVKp4lYJL+a9/Vp6 p/amRo7yKjLNo3Hc/mTfkGsWxElLQLOnN35qez6JmUN4ticEs0u985TtiRAmT9sMH47V SA5ldpxE1qaccrB9gL7OX/Hw2qfVULGST3K3FUZzGWXaiy43Asg8Qk0EB/GBuGogN1f9 RxUJLfU+FzaKFj4gn72nBHrRBNoHhf1WY7CfWffazF2Ai0s3VSPKcqmhoZzkAr0fhsoQ cB1g== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z6-v6si13714543pfk.194.2018.05.21.02.06.58; Mon, 21 May 2018 02:07:12 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=redhat.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752811AbeEUJFn (ORCPT + 99 others); Mon, 21 May 2018 05:05:43 -0400 Received: from mx3-rdu2.redhat.com ([66.187.233.73]:46344 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP id S1752766AbeEUJFd (ORCPT ); Mon, 21 May 2018 05:05:33 -0400 Received: from smtp.corp.redhat.com (int-mx05.intmail.prod.int.rdu2.redhat.com [10.11.54.5]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 1252D401EF0C; Mon, 21 May 2018 09:05:33 +0000 (UTC) Received: from jason-ThinkPad-T450s.redhat.com (ovpn-12-30.pek2.redhat.com [10.72.12.30]) by smtp.corp.redhat.com (Postfix) with ESMTP id CB1C64E68B; Mon, 21 May 2018 09:05:28 +0000 (UTC) From: Jason Wang To: mst@redhat.com, jasowang@redhat.com Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: [RFC PATCH net-next 12/12] vhost_net: batch submitting XDP buffers to underlayer sockets Date: Mon, 21 May 2018 17:04:33 +0800 Message-Id: <1526893473-20128-13-git-send-email-jasowang@redhat.com> In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.11.54.5 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 09:05:33 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 09:05:33 +0000 (UTC) for IP:'10.11.54.5' DOMAIN:'int-mx05.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'jasowang@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org This patch implements XDP batching for vhost_net with tun. This is done by batching XDP buffs in vhost and submit them when: - vhost_net can not build XDP buff (mostly because of the size of packet) - #batched exceeds the limitation (VHOST_NET_RX_BATCH). - tun accept a batch of XDP buff through msg_control and process them in a batch With this tun XDP can benefit from e.g batch transmission during XDP_REDIRECT or XDP_TX. Tests shows 21% improvement on TX pps (from ~3.2Mpps to ~3.9Mpps) while transmitting through testpmd from guest to host by xdp_redirect_map between tap0 and ixgbe. Signed-off-by: Jason Wang --- drivers/net/tun.c | 36 +++++++++++++++++---------- drivers/vhost/net.c | 71 ++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 71 insertions(+), 36 deletions(-) diff --git a/drivers/net/tun.c b/drivers/net/tun.c index b586b3f..5d16d18 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1616,7 +1616,6 @@ static u32 tun_do_xdp(struct tun_struct *tun, switch (act) { case XDP_REDIRECT: *err = xdp_do_redirect(tun->dev, xdp, xdp_prog); - xdp_do_flush_map(); if (*err) break; goto out; @@ -1624,7 +1623,6 @@ static u32 tun_do_xdp(struct tun_struct *tun, *err = tun_xdp_tx(tun->dev, xdp); if (*err) break; - tun_xdp_flush(tun->dev); goto out; case XDP_PASS: goto out; @@ -2400,9 +2398,6 @@ static int tun_xdp_one(struct tun_struct *tun, int err = 0; bool skb_xdp = false; - preempt_disable(); - rcu_read_lock(); - xdp_prog = rcu_dereference(tun->xdp_prog); if (xdp_prog) { if (gso->gso_type) { @@ -2461,15 +2456,12 @@ static int tun_xdp_one(struct tun_struct *tun, tun_flow_update(tun, rxhash, tfile); out: - rcu_read_unlock(); - preempt_enable(); - return err; } static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) { - int ret; + int ret, i; struct tun_file *tfile = container_of(sock, struct tun_file, socket); struct tun_struct *tun = tun_get(tfile); struct tun_msg_ctl *ctl = m->msg_control; @@ -2477,10 +2469,28 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len) if (!tun) return -EBADFD; - if (ctl && ctl->type == TUN_MSG_PTR) { - ret = tun_xdp_one(tun, tfile, ctl->ptr); - if (!ret) - ret = total_len; + if (ctl && ((ctl->type & 0xF) == TUN_MSG_PTR)) { + int n = ctl->type >> 16; + + preempt_disable(); + rcu_read_lock(); + + for (i = 0; i < n; i++) { + struct xdp_buff *x = (struct xdp_buff *)ctl->ptr; + struct xdp_buff *xdp = &x[i]; + + xdp_set_data_meta_invalid(xdp); + xdp->rxq = &tfile->xdp_rxq; + tun_xdp_one(tun, tfile, xdp); + } + + xdp_do_flush_map(); + tun_xdp_flush(tun->dev); + + rcu_read_unlock(); + preempt_enable(); + + ret = total_len; goto out; } diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 0d84de6..bec4109 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -118,6 +118,7 @@ struct vhost_net_virtqueue { struct ptr_ring *rx_ring; struct vhost_net_buf rxq; struct xdp_buff xdp[VHOST_RX_BATCH]; + struct vring_used_elem heads[VHOST_RX_BATCH]; }; struct vhost_net { @@ -511,7 +512,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, void *buf; int copied; - if (len < nvq->sock_hlen) + if (unlikely(len < nvq->sock_hlen)) return -EFAULT; if (SKB_DATA_ALIGN(len + pad) + @@ -567,11 +568,37 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, return 0; } +static void vhost_tx_batch(struct vhost_net *net, + struct vhost_net_virtqueue *nvq, + struct socket *sock, + struct msghdr *msghdr, int n) +{ + struct tun_msg_ctl ctl = { + .type = n << 16 | TUN_MSG_PTR, + .ptr = nvq->xdp, + }; + int err; + + if (n == 0) + return; + + msghdr->msg_control = &ctl; + err = sock->ops->sendmsg(sock, msghdr, 0); + + if (unlikely(err < 0)) { + /* FIXME vq_err() */ + vq_err(&nvq->vq, "sendmsg err!\n"); + return; + } + vhost_add_used_and_signal_n(&net->dev, &nvq->vq, nvq->vq.heads, n); +} + +/* Expects to be always run from workqueue - which acts as + * read-size critical section for our kind of RCU. */ static void handle_tx_copy(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; - struct xdp_buff xdp; unsigned out, in; int head; struct msghdr msg = { @@ -586,7 +613,6 @@ static void handle_tx_copy(struct vhost_net *net) size_t hdr_size; struct socket *sock; struct vhost_net_ubuf_ref *uninitialized_var(ubufs); - struct tun_msg_ctl ctl; int sent_pkts = 0; s16 nheads = 0; @@ -631,22 +657,24 @@ static void handle_tx_copy(struct vhost_net *net) vq->heads[nheads].id = cpu_to_vhost32(vq, head); vq->heads[nheads].len = 0; - err = vhost_net_build_xdp(nvq, &msg.msg_iter, &xdp); - if (!err) { - ctl.type = TUN_MSG_PTR; - ctl.ptr = &xdp; - msg.msg_control = &ctl; - } else - msg.msg_control = NULL; - total_len += len; - if (total_len < VHOST_NET_WEIGHT && - vhost_has_more_pkts(net, vq)) { - msg.msg_flags |= MSG_MORE; - } else { - msg.msg_flags &= ~MSG_MORE; + err = vhost_net_build_xdp(nvq, &msg.msg_iter, + &nvq->xdp[nheads]); + if (!err) { + if (++nheads == VHOST_RX_BATCH) { + vhost_tx_batch(net, nvq, sock, &msg, nheads); + nheads = 0; + } + goto done; + } else if (unlikely(err != -ENOSPC)) { + vq_err(vq, "Fail to build XDP buffer\n"); + break; } + vhost_tx_batch(net, nvq, sock, &msg, nheads); + msg.msg_control = NULL; + nheads = 0; + /* TODO: Check specific error and bomb out unless ENOBUFS? */ err = sock->ops->sendmsg(sock, &msg, len); if (unlikely(err < 0)) { @@ -657,11 +685,9 @@ static void handle_tx_copy(struct vhost_net *net) if (err != len) pr_debug("Truncated TX packet: " " len %d != %zd\n", err, len); - if (++nheads == VHOST_RX_BATCH) { - vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, - nheads); - nheads = 0; - } + + vhost_add_used_and_signal(&net->dev, vq, head, 0); +done: if (vhost_exceeds_weight(++sent_pkts, total_len)) { vhost_poll_queue(&vq->poll); break; @@ -669,8 +695,7 @@ static void handle_tx_copy(struct vhost_net *net) } out: if (nheads) - vhost_add_used_and_signal_n(&net->dev, vq, vq->heads, - nheads); + vhost_tx_batch(net, nvq, sock, &msg, nheads); mutex_unlock(&vq->mutex); } -- 2.7.4