Date: Tue, 22 May 2018 01:21:11 +0300
From: "Michael S. Tsirkin" <mst@redhat.com>
Tsirkin" To: Jesse Brandeburg Cc: Jason Wang , kvm@vger.kernel.org, virtualization@lists.linux-foundation.org, netdev@vger.kernel.org, linux-kernel@vger.kernel.org Subject: Re: [RFC PATCH net-next 10/12] vhost_net: build xdp buff Message-ID: <20180522012008-mutt-send-email-mst@kernel.org> References: <1526893473-20128-1-git-send-email-jasowang@redhat.com> <1526893473-20128-11-git-send-email-jasowang@redhat.com> <20180521095611.00005caa@intel.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20180521095611.00005caa@intel.com> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 22:21:13 +0000 (UTC) X-Greylist: inspected by milter-greylist-4.5.16 (mx1.redhat.com [10.11.55.5]); Mon, 21 May 2018 22:21:13 +0000 (UTC) for IP:'10.11.54.4' DOMAIN:'int-mx04.intmail.prod.int.rdu2.redhat.com' HELO:'smtp.corp.redhat.com' FROM:'mst@redhat.com' RCPT:'' Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, May 21, 2018 at 09:56:11AM -0700, Jesse Brandeburg wrote: > On Mon, 21 May 2018 17:04:31 +0800 Jason wrote: > > This patch implement build XDP buffers in vhost_net. The idea is do > > userspace copy in vhost_net and build XDP buff based on the > > page. Vhost_net can then submit one or an array of XDP buffs to > > underlayer socket (e.g TUN). TUN can choose to do XDP or call > > build_skb() to build skb. To support build skb, vnet header were also > > stored into the header of the XDP buff. > > > > This userspace copy and XDP buffs building is key to achieve XDP > > batching in TUN, since TUN does not need to care about userspace copy > > and then can disable premmption for several XDP buffs to achieve > > batching from XDP. > > > > TODO: reserve headroom based on the TUN XDP. > > > > Signed-off-by: Jason Wang > > --- > > drivers/vhost/net.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++ > > 1 file changed, 74 insertions(+) > > > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > > index f0639d7..1209e84 100644 > > --- a/drivers/vhost/net.c > > +++ b/drivers/vhost/net.c > > @@ -492,6 +492,80 @@ static bool vhost_has_more_pkts(struct vhost_net *net, > > likely(!vhost_exceeds_maxpend(net)); > > } > > > > +#define VHOST_NET_HEADROOM 256 > > +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD) > > + > > +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq, > > + struct iov_iter *from, > > + struct xdp_buff *xdp) > > +{ > > + struct vhost_virtqueue *vq = &nvq->vq; > > + struct page_frag *alloc_frag = ¤t->task_frag; > > + struct virtio_net_hdr *gso; > > + size_t len = iov_iter_count(from); > > + int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)); > > + int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + VHOST_NET_HEADROOM > > + + nvq->sock_hlen); > > + int sock_hlen = nvq->sock_hlen; > > + void *buf; > > + int copied; > > + > > + if (len < nvq->sock_hlen) > > + return -EFAULT; > > + > > + if (SKB_DATA_ALIGN(len + pad) + > > + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE) > > + return -ENOSPC; > > + > > + buflen += SKB_DATA_ALIGN(len + pad); > > maybe store the result of SKB_DATA_ALIGN in a local instead of doing > the work twice? I don't mind, but I guess gcc can always do it itself? 

> > +	alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> > +	if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
> > +		return -ENOMEM;
> > +
> > +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > +
> > +	/* We store two kinds of metadata in the header which will be
> > +	 * used for XDP_PASS to do build_skb():
> > +	 * offset 0: buflen
> > +	 * offset sizeof(int): vnet header
> > +	 */
> > +	copied = copy_page_from_iter(alloc_frag->page,
> > +				     alloc_frag->offset + sizeof(int),
> > +				     sock_hlen, from);
> > +	if (copied != sock_hlen)
> > +		return -EFAULT;
> > +
> > +	gso = (struct virtio_net_hdr *)(buf + sizeof(int));
> > +
> > +	if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
> > +	    vhost16_to_cpu(vq, gso->csum_start) +
> > +	    vhost16_to_cpu(vq, gso->csum_offset) + 2 >
> > +	    vhost16_to_cpu(vq, gso->hdr_len)) {
> > +		gso->hdr_len = cpu_to_vhost16(vq,
> > +			       vhost16_to_cpu(vq, gso->csum_start) +
> > +			       vhost16_to_cpu(vq, gso->csum_offset) + 2);
> > +
> > +		if (vhost16_to_cpu(vq, gso->hdr_len) > len)
> > +			return -EINVAL;
> > +	}
> > +
> > +	len -= sock_hlen;
> > +	copied = copy_page_from_iter(alloc_frag->page,
> > +				     alloc_frag->offset + pad,
> > +				     len, from);
> > +	if (copied != len)
> > +		return -EFAULT;
> > +
> > +	xdp->data_hard_start = buf;
> > +	xdp->data = buf + pad;
> > +	xdp->data_end = xdp->data + len;
> > +	*(int *)(xdp->data_hard_start)= buflen;
>
> space before =
>
> > +
> > +	get_page(alloc_frag->page);
> > +	alloc_frag->offset += buflen;
> > +
> > +	return 0;
> > +}
> > +
> >  static void handle_tx_copy(struct vhost_net *net)
> >  {
> >  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
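
As a reference for the metadata layout above (buflen at offset 0, vnet header
at offset sizeof(int), packet data at the pad offset), here is a rough,
untested sketch of how a consumer such as TUN could rebuild an skb on
XDP_PASS; the helper name tun_build_skb_from_xdp() is made up for the
example and is not part of this series:

	static struct sk_buff *tun_build_skb_from_xdp(struct xdp_buff *xdp)
	{
		/* offset 0 of the headroom holds the true frag size ... */
		int buflen = *(int *)xdp->data_hard_start;
		/* ... followed by the vnet header copied in by vhost_net */
		struct virtio_net_hdr *gso = xdp->data_hard_start + sizeof(int);
		struct sk_buff *skb;

		/* buflen already accounts for the skb_shared_info tailroom */
		skb = build_skb(xdp->data_hard_start, buflen);
		if (!skb)
			return NULL;

		skb_reserve(skb, xdp->data - xdp->data_hard_start);
		skb_put(skb, xdp->data_end - xdp->data);

		/* endianness handling elided for brevity */
		if (virtio_net_hdr_to_skb(skb, gso, true)) {
			kfree_skb(skb);
			return NULL;
		}

		return skb;
	}

The point is just that the stored buflen is the full frag size including the
skb_shared_info tailroom, which is why it has to be stashed at offset 0 for
build_skb() to use later.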