From: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
To: magnus.karlsson@intel.com
Cc: Björn Töpel, Jonathan Lemon, "David S. Miller", Jakub Kicinski,
    Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
    John Fastabend, Andrii Nakryiko, Martin KaFai Lau, Song Liu,
    Yonghong Song, KP Singh,
    netdev@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)),
    bpf@vger.kernel.org (open list:XDP SOCKETS (AF_XDP)),
    linux-kernel@vger.kernel.org (open list)
Subject: [PATCH bpf-next] xsk: build skb by page
Date: Tue, 29 Dec 2020 16:32:41 +0800
Message-Id: <3fab080e36b322b6749190ad6054d53c67658050.1609230615.git.xuanzhuo@linux.alibaba.com>
X-Mailer: git-send-email 1.8.3.1

This patch constructs the skb directly from the umem pages instead of
allocating a linear buffer and copying the frame into it, which saves
the memory copy overhead. The code takes into account that the
descriptor address may not be page aligned, and that the frame size
may become greater than a page in the future.

The test environment is an Aliyun ECS server.

Test cmd:
```
xdpsock -i eth0 -t -S -s
```

Test result data:

size       64      512     1024    1500
copy  1916747 1775988 1600203 1440054
page  1974058 1953655 1945463 1904478
gain     3.0%   10.0%  21.58%   32.3%

Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
---
 net/xdp/xsk.c | 68 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 17 deletions(-)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index ac4a317..7cab40f 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -430,6 +430,55 @@ static void xsk_destruct_skb(struct sk_buff *skb)
 	sock_wfree(skb);
 }
 
+static struct sk_buff *xsk_build_skb_bypage(struct xdp_sock *xs, struct xdp_desc *desc)
+{
+	char *buffer;
+	u64 addr;
+	u32 len, offset, copy, copied;
+	int err, i;
+	struct page *page;
+	struct sk_buff *skb;
+
+	skb = sock_alloc_send_skb(&xs->sk, 0, 1, &err);
+	if (unlikely(!skb))
+		return NULL;
+
+	addr = desc->addr;
+	len = desc->len;
+
+	buffer = xsk_buff_raw_get_data(xs->pool, addr);
+	offset = offset_in_page(buffer);
+	addr = buffer - (char *)xs->pool->addrs;
+
+	for (copied = 0, i = 0; copied < len; ++i) {
+		page = xs->pool->umem->pgs[addr >> PAGE_SHIFT];
+
+		get_page(page);
+
+		copy = min((u32)(PAGE_SIZE - offset), len - copied);
+
+		skb_fill_page_desc(skb, i, page, offset, copy);
+
+		copied += copy;
+		addr += copy;
+		offset = 0;
+	}
+
+	skb->len += len;
+	skb->data_len += len;
+	skb->truesize += len;
+
+	refcount_add(len, &xs->sk.sk_wmem_alloc);
+
+	skb->dev = xs->dev;
+	skb->priority = xs->sk.sk_priority;
+	skb->mark = xs->sk.sk_mark;
+	skb_shinfo(skb)->destructor_arg = (void *)(long)addr;
+	skb->destructor = xsk_destruct_skb;
+
+	return skb;
+}
+
 static int xsk_generic_xmit(struct sock *sk)
 {
 	struct xdp_sock *xs = xdp_sk(sk);
@@ -445,40 +494,25 @@ static int xsk_generic_xmit(struct sock *sk)
 		goto out;
 
 	while (xskq_cons_peek_desc(xs->tx, &desc, xs->pool)) {
-		char *buffer;
-		u64 addr;
-		u32 len;
-
 		if (max_batch-- == 0) {
 			err = -EAGAIN;
 			goto out;
 		}
 
-		len = desc.len;
-		skb = sock_alloc_send_skb(sk, len, 1, &err);
+		skb = xsk_build_skb_bypage(xs, &desc);
 		if (unlikely(!skb))
 			goto out;
 
-		skb_put(skb, len);
-		addr = desc.addr;
-		buffer = xsk_buff_raw_get_data(xs->pool, addr);
-		err = skb_store_bits(skb, 0, buffer, len);
 		/* This is the backpressure mechanism for the Tx path.
 		 * Reserve space in the completion queue and only proceed
 		 * if there is space in it. This avoids having to implement
 		 * any buffering in the Tx path.
 		 */
-		if (unlikely(err) || xskq_prod_reserve(xs->pool->cq)) {
+		if (xskq_prod_reserve(xs->pool->cq)) {
 			kfree_skb(skb);
 			goto out;
 		}
 
-		skb->dev = xs->dev;
-		skb->priority = sk->sk_priority;
-		skb->mark = sk->sk_mark;
-		skb_shinfo(skb)->destructor_arg = (void *)(long)desc.addr;
-		skb->destructor = xsk_destruct_skb;
-
 		err = __dev_direct_xmit(skb, xs->queue_id);
 		if (err == NETDEV_TX_BUSY) {
 			/* Tell user-space to retry the send */
-- 
1.8.3.1