Subject: Re: [RFC PATCH net-next 10/12] vhost_net: build xdp buff
To: Jesse Brandeburg
Cc: mst@redhat.com, kvm@vger.kernel.org,
    virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
    linux-kernel@vger.kernel.org
From: Jason Wang
Date: Tue, 22 May 2018 20:41:09 +0800
Message-ID: <000c64e8-9b70-a7f5-dbd3-c6112711d7df@redhat.com>
In-Reply-To: <20180521095611.00005caa@intel.com>
References: <1526893473-20128-1-git-send-email-jasowang@redhat.com>
 <1526893473-20128-11-git-send-email-jasowang@redhat.com>
 <20180521095611.00005caa@intel.com>

On 2018/05/22 00:56, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:31 +0800 Jason wrote:
>> This patch implements building XDP buffers in vhost_net. The idea is
>> to do the userspace copy in vhost_net and build the XDP buff based on
>> the page. Vhost_net can then submit one or an array of XDP buffs to
>> the underlying socket (e.g. TUN). TUN can choose to do XDP or call
>> build_skb() to build an skb. To support building the skb, the vnet
>> header is also stored into the header of the XDP buff.
>>
>> This userspace copy and XDP buff building is key to achieving XDP
>> batching in TUN, since TUN does not need to care about the userspace
>> copy and can then disable preemption for several XDP buffs to achieve
>> batching from XDP.
>>
>> TODO: reserve headroom based on the TUN XDP.
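As a rough sketch only (the helper name below is made up, not part of
the patch), this is how an XDP_PASS consumer such as TUN could read
back the metadata that the patch stores at the head of the buffer,
assuming the layout described in the code comment further down (buflen
at offset 0, then the vnet header at offset sizeof(int)):

/*
 * Sketch only: recover the metadata vhost_net stores in the XDP
 * headroom.  Helper name is illustrative, not from the patch.
 */
static void example_read_vhost_xdp_metadata(struct xdp_buff *xdp,
					     int *buflen,
					     struct virtio_net_hdr **hdr)
{
	void *head = xdp->data_hard_start;

	*buflen = *(int *)head;		/* offset 0: buflen        */
	*hdr = head + sizeof(int);	/* then the vnet header    */
}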
>>
>> Signed-off-by: Jason Wang
>> ---
>>   drivers/vhost/net.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 74 insertions(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index f0639d7..1209e84 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -492,6 +492,80 @@ static bool vhost_has_more_pkts(struct vhost_net *net,
>>   	       likely(!vhost_exceeds_maxpend(net));
>>   }
>>
>> +#define VHOST_NET_HEADROOM 256
>> +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
>> +
>> +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>> +			       struct iov_iter *from,
>> +			       struct xdp_buff *xdp)
>> +{
>> +	struct vhost_virtqueue *vq = &nvq->vq;
>> +	struct page_frag *alloc_frag = &current->task_frag;
>> +	struct virtio_net_hdr *gso;
>> +	size_t len = iov_iter_count(from);
>> +	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +	int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + VHOST_NET_HEADROOM
>> +				 + nvq->sock_hlen);
>> +	int sock_hlen = nvq->sock_hlen;
>> +	void *buf;
>> +	int copied;
>> +
>> +	if (len < nvq->sock_hlen)
>> +		return -EFAULT;
>> +
>> +	if (SKB_DATA_ALIGN(len + pad) +
>> +	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
>> +		return -ENOSPC;
>> +
>> +	buflen += SKB_DATA_ALIGN(len + pad);
> maybe store the result of SKB_DATA_ALIGN in a local instead of doing
> the work twice?

Ok.

>
>> +	alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
>> +	if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
>> +		return -ENOMEM;
>> +
>> +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>> +
>> +	/* We store two kinds of metadata in the header which will be
>> +	 * used for XDP_PASS to do build_skb():
>> +	 * offset 0: buflen
>> +	 * offset sizeof(int): vnet header
>> +	 */
>> +	copied = copy_page_from_iter(alloc_frag->page,
>> +				     alloc_frag->offset + sizeof(int),
>> +				     sock_hlen, from);
>> +	if (copied != sock_hlen)
>> +		return -EFAULT;
>> +
>> +	gso = (struct virtio_net_hdr *)(buf + sizeof(int));
>> +
>> +	if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
>> +	    vhost16_to_cpu(vq, gso->csum_start) +
>> +	    vhost16_to_cpu(vq, gso->csum_offset) + 2 >
>> +	    vhost16_to_cpu(vq, gso->hdr_len)) {
>> +		gso->hdr_len = cpu_to_vhost16(vq,
>> +			       vhost16_to_cpu(vq, gso->csum_start) +
>> +			       vhost16_to_cpu(vq, gso->csum_offset) + 2);
>> +
>> +		if (vhost16_to_cpu(vq, gso->hdr_len) > len)
>> +			return -EINVAL;
>> +	}
>> +
>> +	len -= sock_hlen;
>> +	copied = copy_page_from_iter(alloc_frag->page,
>> +				     alloc_frag->offset + pad,
>> +				     len, from);
>> +	if (copied != len)
>> +		return -EFAULT;
>> +
>> +	xdp->data_hard_start = buf;
>> +	xdp->data = buf + pad;
>> +	xdp->data_end = xdp->data + len;
>> +	*(int *)(xdp->data_hard_start)= buflen;
> space before =

Yes. Thanks

>
>> +
>> +	get_page(alloc_frag->page);
>> +	alloc_frag->offset += buflen;
>> +
>> +	return 0;
>> +}
>> +
>>   static void handle_tx_copy(struct vhost_net *net)
>>   {
>>   	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];
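Regarding the SKB_DATA_ALIGN comment above, an untested sketch of what
the reworked check inside vhost_net_build_xdp() could look like, only
to illustrate the suggestion of computing the aligned length once:

	/* Compute the aligned data length once and reuse it for both
	 * the PAGE_SIZE check and the final buflen.
	 */
	int aligned_len = SKB_DATA_ALIGN(len + pad);

	if (aligned_len +
	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
		return -ENOSPC;

	buflen += aligned_len;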