References: <340f1dfa40416dd966a56e08507daba82d633088.1611236588.git.xuanzhuo@linux.alibaba.com>
 <20210122114729.1758-1-alobakin@pm.me> <20210122115519.2183-1-alobakin@pm.me>
In-Reply-To: <20210122115519.2183-1-alobakin@pm.me>
From: Magnus Karlsson
Date: Fri, 22 Jan 2021 13:18:47 +0100
Subject: Re: [PATCH bpf-next v3 3/3] xsk: build skb by page
To: Alexander Lobakin
Cc: Eric Dumazet, Xuan Zhuo, "Michael S. Tsirkin", Jason Wang,
 "David S. Miller", Jakub Kicinski, Björn Töpel, Magnus Karlsson,
 Jonathan Lemon, Alexei Starovoitov, Daniel Borkmann,
 Jesper Dangaard Brouer, John Fastabend, Andrii Nakryiko,
 Martin KaFai Lau, Song Liu, Yonghong Song, KP Singh,
 virtualization@lists.linux-foundation.org, bpf, Network Development,
 open list
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jan 22, 2021 at 12:57 PM Alexander Lobakin wrote:
>
> From: Alexander Lobakin
> Date: Fri, 22 Jan 2021 11:47:45 +0000
>
> > From: Eric Dumazet
> > Date: Thu, 21 Jan 2021 16:41:33 +0100
> >
> > > On 1/21/21 2:47 PM, Xuan Zhuo wrote:
> > > > This patch is used to construct skb based on page to save memory copy
> > > > overhead.
> > > >
> > > > This function is implemented based on IFF_TX_SKB_NO_LINEAR.
> > > > Only the network card priv_flags supports IFF_TX_SKB_NO_LINEAR will use
> > > > page to directly construct skb. If this feature is not supported, it is
> > > > still necessary to copy data to construct skb.
> > > >
> > > > ---------------- Performance Testing ------------
> > > >
> > > > The test environment is Aliyun ECS server.
> > > > Test cmd:
> > > > ```
> > > > xdpsock -i eth0 -t -S -s
> > > > ```
> > > >
> > > > Test result data:
> > > >
> > > > size    64       512      1024     1500
> > > > copy    1916747  1775988  1600203  1440054
> > > > page    1974058  1953655  1945463  1904478
> > > > percent 3.0%     10.0%    21.58%   32.3%
> > > >
> > > > Signed-off-by: Xuan Zhuo
> > > > Reviewed-by: Dust Li
> > > > ---
> > > >  net/xdp/xsk.c | 104 ++++++++++++++++++++++++++++++++++++++++++++++++----------
> > > >  1 file changed, 86 insertions(+), 18 deletions(-)
> > > >
> > > > diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
> > > > index 4a83117..38af7f1 100644
> > > > --- a/net/xdp/xsk.c
> > > > +++ b/net/xdp/xsk.c
> > > > @@ -430,6 +430,87 @@ static void xsk_destruct_skb(struct sk_buff *skb)
> > > >  	sock_wfree(skb);
> > > >  }
> > > >
> > > > +static struct sk_buff *xsk_build_skb_zerocopy(struct xdp_sock *xs,
> > > > +					      struct xdp_desc *desc)
> > > > +{
> > > > +	u32 len, offset, copy, copied;
> > > > +	struct sk_buff *skb;
> > > > +	struct page *page;
> > > > +	void *buffer;
> > > > +	int err, i;
> > > > +	u64 addr;
> > > > +
> > > > +	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
> > > > +	if (unlikely(!skb))
> > > > +		return ERR_PTR(err);
> > > > +
> > > > +	addr = desc->addr;
> > > > +	len = desc->len;
> > > > +
> > > > +	buffer = xsk_buff_raw_get_data(xs->pool, addr);
> > > > +	offset = offset_in_page(buffer);
> > > > +	addr = buffer - xs->pool->addrs;
> > > > +
> > > > +	for (copied = 0, i = 0; copied < len; i++) {
> > > > +		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
> > > > +
> > > > +		get_page(page);
> > > > +
> > > > +		copy = min_t(u32, PAGE_SIZE - offset, len - copied);
> > > > +
> > > > +		skb_fill_page_desc(skb, i, page, offset, copy);
> > > > +
> > > > +		copied += copy;
> > > > +		addr += copy;
> > > > +		offset = 0;
> > > > +	}
> > > > +
> > > > +	skb->len += len;
> > > > +	skb->data_len += len;
> > >
> > > > +	skb->truesize += len;
> > >
> > > This is not the truesize, unfortunately.
> > >
> > > We need to account for the number of pages, not number of bytes.
> >
> > The easiest solution is:
> >
> > 	skb->truesize += PAGE_SIZE * i;
> >
> > i would be equal to skb_shinfo(skb)->nr_frags after exiting the loop.
>
> Oops, pls ignore this. I forgot that XSK buffers are not
> "one per page".
> We need to count the number of pages manually and then do
>
> skb->truesize += PAGE_SIZE * npages;
>

Right. There are two possible packet buffer (chunk) sizes in a umem:
2K and 4K on a system with a PAGE_SIZE of 4K. If I remember correctly,
and please correct me if I am wrong, truesize is used for memory
accounting. But in this code, no kernel memory has been allocated
(apart from the skb). The page is just a part of the umem that user
space has already allocated beforehand. So what should truesize be in
this case? Do we add 0, chunk_size * i, the exact number of 4K pages
in use (the complicated case, since with a 2K chunk_size two chunks can
occupy the same page), or just the upper bound PAGE_SIZE * i, which is
likely a good approximation in most cases? Note that there might be
other uses of truesize that I am unaware of that could affect this
choice.

> > > > +
> > > > +	refcount_add(len, &xs->sk.sk_wmem_alloc);
> > > > +
> > > > +	return skb;
> > > > +}
> > > > +
> >
> > Al
>
> Thanks,
> Al
>