From: Jason Wang
To: mst@redhat.com, jasowang@redhat.com
Cc: kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH net-next 11/12] vhost_net: passing raw xdp buff to tun
Date: Mon, 21 May 2018 17:04:32 +0800
Message-Id: <1526893473-20128-12-git-send-email-jasowang@redhat.com>
In-Reply-To: <1526893473-20128-1-git-send-email-jasowang@redhat.com>
References: <1526893473-20128-1-git-send-email-jasowang@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

This patch implements a TUN-specific msg_control:

#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR 2
struct tun_msg_ctl {
	int type;
	void *ptr;
};

The first supported type is ubuf, which is already used by the vhost_net
zerocopy code. The second is an XDP buff, which allows vhost_net to pass
an XDP buff to TUN. The following patches can build on this to accept an
array of XDP buffs from vhost_net.

Signed-off-by: Jason Wang
---
 drivers/net/tun.c      | 91 +++++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/vhost/net.c    | 21 ++++++++++--
 include/linux/if_tun.h |  7 ++++
 3 files changed, 116 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 2560378..b586b3f 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2387,18 +2387,107 @@ static void tun_sock_write_space(struct sock *sk)
 	kill_fasync(&tfile->fasync, SIGIO, POLL_OUT);
 }
 
+static int tun_xdp_one(struct tun_struct *tun,
+		       struct tun_file *tfile,
+		       struct xdp_buff *xdp)
+{
+	struct virtio_net_hdr *gso = xdp->data_hard_start + sizeof(int);
+	struct tun_pcpu_stats *stats;
+	struct bpf_prog *xdp_prog;
+	struct sk_buff *skb = NULL;
+	u32 rxhash = 0, act;
+	int buflen = *(int *)xdp->data_hard_start;
+	int err = 0;
+	bool skb_xdp = false;
+
+	preempt_disable();
+	rcu_read_lock();
+
+	xdp_prog = rcu_dereference(tun->xdp_prog);
+	if (xdp_prog) {
+		if (gso->gso_type) {
+			skb_xdp = true;
+			goto build;
+		}
+		xdp_set_data_meta_invalid(xdp);
+		xdp->rxq = &tfile->xdp_rxq;
+		act = tun_do_xdp(tun, tfile, xdp_prog, xdp, &err);
+		if (err)
+			goto out;
+		if (act != XDP_PASS)
+			goto out;
+	}
+
+build:
+	skb = build_skb(xdp->data_hard_start, buflen);
+	if (!skb) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	if (skb_xdp) {
+		err = do_xdp_generic(xdp_prog, skb);
+		if (err != XDP_PASS)
+			goto out;
+	}
+
+	skb_reserve(skb, xdp->data - xdp->data_hard_start);
+	skb_put(skb, xdp->data_end - xdp->data);
+
+	if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
+		this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
+		kfree_skb(skb);
+		err = -EINVAL;
+		goto out;
+	}
+
+	skb->protocol = eth_type_trans(skb, tun->dev);
+	skb_reset_network_header(skb);
+	skb_probe_transport_header(skb, 0);
+
+	if (!rcu_dereference(tun->steering_prog))
+		rxhash = __skb_get_hash_symmetric(skb);
+
+	netif_receive_skb(skb);
+
+	stats = get_cpu_ptr(tun->pcpu_stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_packets++;
+	stats->rx_bytes += skb->len;
+	u64_stats_update_end(&stats->syncp);
+	put_cpu_ptr(stats);
+
+	if (rxhash)
+		tun_flow_update(tun, rxhash, tfile);
+
+out:
+	rcu_read_unlock();
+	preempt_enable();
+
+	return err;
+}
+
 static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 {
 	int ret;
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
+	struct tun_msg_ctl *ctl = m->msg_control;
 
 	if (!tun)
 		return -EBADFD;
 
-	ret = tun_get_user(tun, tfile, m->msg_control, &m->msg_iter,
+	if (ctl && ctl->type == TUN_MSG_PTR) {
+		ret = tun_xdp_one(tun, tfile, ctl->ptr);
+		if (!ret)
+			ret = total_len;
+		goto out;
+	}
+
+	ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
+out:
 	tun_put(tun);
 	return ret;
 }
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1209e84..0d84de6 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -117,6 +117,7 @@ struct vhost_net_virtqueue {
 	struct vhost_net_ubuf_ref *ubufs;
 	struct ptr_ring *rx_ring;
 	struct vhost_net_buf rxq;
+	struct xdp_buff xdp[VHOST_RX_BATCH];
 };
 
 struct vhost_net {
@@ -570,6 +571,7 @@ static void handle_tx_copy(struct vhost_net *net)
 {
 	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
 	struct vhost_virtqueue *vq = &nvq->vq;
+	struct xdp_buff xdp;
 	unsigned out, in;
 	int head;
 	struct msghdr msg = {
@@ -584,6 +586,7 @@ static void handle_tx_copy(struct vhost_net *net)
 	size_t hdr_size;
 	struct socket *sock;
 	struct vhost_net_ubuf_ref *uninitialized_var(ubufs);
+	struct tun_msg_ctl ctl;
 	int sent_pkts = 0;
 	s16 nheads = 0;
 
@@ -628,6 +631,14 @@ static void handle_tx_copy(struct vhost_net *net)
 		vq->heads[nheads].id = cpu_to_vhost32(vq, head);
 		vq->heads[nheads].len = 0;
 
+		err = vhost_net_build_xdp(nvq, &msg.msg_iter, &xdp);
+		if (!err) {
+			ctl.type = TUN_MSG_PTR;
+			ctl.ptr = &xdp;
+			msg.msg_control = &ctl;
+		} else
+			msg.msg_control = NULL;
+
 		total_len += len;
 		if (total_len < VHOST_NET_WEIGHT &&
 		    vhost_has_more_pkts(net, vq)) {
@@ -734,16 +745,21 @@ static void handle_tx_zerocopy(struct vhost_net *net)
 		/* use msg_control to pass vhost zerocopy ubuf info to skb */
 		if (zcopy_used) {
 			struct ubuf_info *ubuf;
+			struct tun_msg_ctl ctl;
+
 			ubuf = nvq->ubuf_info + nvq->upend_idx;
 
+			ctl.type = TUN_MSG_UBUF;
+			ctl.ptr = ubuf;
+
 			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
 			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
 			refcount_set(&ubuf->refcnt, 1);
-			msg.msg_control = ubuf;
-			msg.msg_controllen = sizeof(ubuf);
+			msg.msg_control = &ctl;
+			msg.msg_controllen = sizeof(ctl);
 			ubufs = nvq->ubufs;
 			atomic_inc(&ubufs->refcount);
 			nvq->upend_idx = (nvq->upend_idx + 1) % UIO_MAXIOV;
@@ -751,6 +767,7 @@ static void handle_tx_zerocopy(struct vhost_net *net)
 			msg.msg_control = NULL;
 			ubufs = NULL;
 		}
+
 		total_len += len;
 		if (total_len < VHOST_NET_WEIGHT &&
 		    vhost_has_more_pkts(net, vq)) {
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 3d2996d..ba46dce 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -19,6 +19,13 @@
 
 #define TUN_XDP_FLAG 0x1UL
 
+#define TUN_MSG_UBUF 1
+#define TUN_MSG_PTR 2
+struct tun_msg_ctl {
+	int type;
+	void *ptr;
+};
+
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
 struct ptr_ring *tun_get_tx_ring(struct file *file);
-- 
2.7.4
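
For readers following the msg_control change, the branching that the patched tun_sendmsg() performs on struct tun_msg_ctl can be sketched in plain userspace C. This is only an illustration, not kernel code: the struct and the two TUN_MSG_* constants mirror the patch, while demo_sendmsg(), struct demo_result and enum demo_path are hypothetical names that stand in for the real tun_xdp_one() and tun_get_user() calls, and both paths are assumed to succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors the definitions this patch adds to include/linux/if_tun.h. */
#define TUN_MSG_UBUF 1
#define TUN_MSG_PTR  2

struct tun_msg_ctl {
	int type;
	void *ptr;
};

/* Hypothetical: records which receive path the handler would take. */
enum demo_path { DEMO_PATH_XDP, DEMO_PATH_GET_USER };

struct demo_result {
	enum demo_path path;	/* path selected by the control type */
	void *payload;		/* pointer handed to that path */
	long ret;		/* value the sendmsg handler would return */
};

/*
 * Hypothetical model of the patched tun_sendmsg() branching:
 * TUN_MSG_PTR takes the XDP path and reports total_len on success;
 * anything else falls through to the copy path, which receives
 * ctl->ptr (the ubuf for TUN_MSG_UBUF) or NULL when no control is set.
 */
struct demo_result demo_sendmsg(const struct tun_msg_ctl *ctl, size_t total_len)
{
	struct demo_result res;

	if (ctl && ctl->type == TUN_MSG_PTR) {
		res.path = DEMO_PATH_XDP;
		res.payload = ctl->ptr;
		res.ret = (long)total_len;	/* assumes the XDP path returned 0 */
		return res;
	}

	res.path = DEMO_PATH_GET_USER;
	res.payload = ctl ? ctl->ptr : NULL;
	res.ret = (long)total_len;	/* assumes the copy path consumed everything */
	return res;
}
```

Calling demo_sendmsg() with a control of type TUN_MSG_PTR selects the XDP path, while TUN_MSG_UBUF or a NULL control selects the copy path, which matches how handle_tx_copy() and handle_tx_zerocopy() fill in ctl in the diff.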