Date: Thu, 3 Aug 2023 17:01:37 +0200
Subject: Re: [RFC Optimizing veth xsk performance 00/10]
From: Jesper Dangaard Brouer
To: "huangjie.albert", davem@davemloft.net, edumazet@google.com,
 kuba@kernel.org, pabeni@redhat.com, Maryam Tahhan, Keith Wiles, Liang Chen
Cc: Alexei Starovoitov, Daniel Borkmann, John Fastabend, Björn Töpel,
 Magnus Karlsson, Maciej Fijalkowski, Jonathan Lemon, Pavel Begunkov,
 Yunsheng Lin, Kees Cook, Richard Gobert, "open list:NETWORKING DRIVERS",
 open list, "open list:XDP (eXpress Data Path)"
References: <20230803140441.53596-1-huangjie.albert@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 03/08/2023 16.04, huangjie.albert wrote:
> AF_XDP is a kernel bypass technology that can greatly improve performance.
> However, for virtual devices like veth, even with the use of AF_XDP sockets,
> there are still many additional software paths that consume CPU resources.
> This patch series focuses on optimizing the performance of AF_XDP sockets
> for veth virtual devices. Patches 1 to 4 mainly involve preparatory work.
> Patch 5 introduces a tx queue and tx napi for packet transmission, while
> patch 9 primarily implements zero-copy, and patch 10 adds support for
> batch sending of IPv4 UDP packets. These optimizations significantly reduce
> the software path and support checksum offload.
>
> I tested these features with a typical topology, shown below:
> veth<-->veth-peer                                    veth1-peer<--->veth1
>   1       |                                                   |   7
>           |2                                                6|
>           |                                                   |
>         bridge<------->eth0(mlnx5)- switch -eth1(mlnx5)<--->bridge1
>                   3                    4                 5
>              (machine1)                              (machine2)
> An AF_XDP socket is attached to veth and veth1, and sends packets to the
> physical NIC (eth0).
> veth:(172.17.0.2/24)
> bridge:(172.17.0.1/24)
> eth0:(192.168.156.66/24)
>
> eth1(172.17.0.2/24)
> bridge1:(172.17.0.1/24)
> eth0:(192.168.156.88/24)
>
> After setting the default route, SNAT, and DNAT, we can run tests
> to get the performance results.
>
> packets sent from veth to veth1:
> af_xdp test tool:
> link:https://github.com/cclinuxer/libxudp
> send:(veth)
> ./objs/xudpperf send --dst 192.168.156.88:6002 -l 1300
> recv:(veth1)
> ./objs/xudpperf recv --src 172.17.0.2:6002
>
> udp test tool:iperf3
> send:(veth)
> iperf3 -c 192.168.156.88 -p 6002 -l 1300 -b 60G -u
> recv:(veth1)
> iperf3 -s -p 6002
>
> performance: (tested with libxdp lib)
> UDP                             : 250 Kpps (with 100% cpu)
> AF_XDP no zerocopy + no batch   : 480 Kpps (with ksoftirqd 100% cpu)
> AF_XDP with zerocopy + no batch : 540 Kpps (with ksoftirqd 100% cpu)
> AF_XDP with batch + zerocopy    : 1.5 Mpps (with ksoftirqd 15% cpu)
>
> With af_xdp batch, the libxdp user-space program reaches a bottleneck.

Do you mean libxdp [1] or libxudp?

 [1] https://github.com/xdp-project/xdp-tools/tree/master/lib/libxdp

> Therefore, the softirq did not reach the limit.
>
> This is just an RFC patch series, and some code details still need
> further consideration. Please review this proposal.
>

I find this performance work interesting as we have customer requests
(via Maryam (cc)) to improve AF_XDP performance both native and on veth.

Our benchmark is stored at:
 https://github.com/maryamtahhan/veth-benchmark

Great to see other companies also interested in this area.

--Jesper

> thanks!
>
> huangjie.albert (10):
>   veth: Implement ethtool's get_ringparam() callback
>   xsk: add dma_check_skip for skipping dma check
>   veth: add support for send queue
>   xsk: add xsk_tx_completed_addr function
>   veth: use send queue tx napi to xmit xsk tx desc
>   veth: add ndo_xsk_wakeup callback for veth
>   sk_buff: add destructor_arg_xsk_pool for zero copy
>   xdp: add xdp_mem_type MEM_TYPE_XSK_BUFF_POOL_TX
>   veth: support zero copy for af xdp
>   veth: af_xdp tx batch support for ipv4 udp
>
>  drivers/net/veth.c          | 729 +++++++++++++++++++++++++++++++++++-
>  include/linux/skbuff.h      |   1 +
>  include/net/xdp.h           |   1 +
>  include/net/xdp_sock_drv.h  |   1 +
>  include/net/xsk_buff_pool.h |   1 +
>  net/xdp/xsk.c               |   6 +
>  net/xdp/xsk_buff_pool.c     |   3 +-
>  net/xdp/xsk_queue.h         |  11 +
>  8 files changed, 751 insertions(+), 2 deletions(-)
>
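
For reference, the relative gains implied by the figures quoted in the cover letter can be worked out with a few lines of arithmetic. This is only a sketch: the packet rates are copied from the quoted results, while the function names are made up for illustration, and it assumes that `-l 1300` means 1300 bytes of UDP payload per packet (headers are not counted).

```python
# Packet rates quoted in the cover letter, in packets per second.
RESULTS_PPS = {
    "UDP": 250_000,
    "AF_XDP no zerocopy + no batch": 480_000,
    "AF_XDP with zerocopy + no batch": 540_000,
    "AF_XDP with batch + zerocopy": 1_500_000,
}

# Assumption: "-l 1300" is the UDP payload size per packet, in bytes.
PAYLOAD_BYTES = 1300


def speedup(pps, baseline_pps):
    """Ratio of a packet rate to a baseline packet rate."""
    return pps / baseline_pps


def payload_gbps(pps, payload_bytes=PAYLOAD_BYTES):
    """Payload-only throughput in Gbit/s (protocol headers excluded)."""
    return pps * payload_bytes * 8 / 1e9


def summarize(results=RESULTS_PPS, baseline="UDP"):
    """Return (name, speedup vs. baseline, payload Gbit/s) for each result."""
    base = results[baseline]
    return [
        (name, speedup(pps, base), payload_gbps(pps))
        for name, pps in results.items()
    ]


if __name__ == "__main__":
    for name, ratio, gbps in summarize():
        print(f"{name:32s} {ratio:5.2f}x  {gbps:5.2f} Gbit/s payload")
```

Under these assumptions, batch + zerocopy comes out at 6x the plain UDP rate, or roughly 15.6 Gbit/s of payload, while ksoftirqd usage drops from 100% to 15%.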