From: Albert Huang
To: davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Cc: Albert Huang, Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
    John Fastabend, Björn Töpel, Magnus Karlsson, Maciej Fijalkowski,
    Jonathan Lemon, Pavel Begunkov, Yunsheng Lin, Kees Cook, Richard Gobert,
    "open list:NETWORKING DRIVERS", open list,
    "open list:XDP (eXpress Data Path)"
Subject: [RFC v3 Optimizing veth xsk performance 0/9]
Date: Tue, 8 Aug 2023 11:19:04 +0800
Message-Id: <20230808031913.46965-1-huangjie.albert@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

AF_XDP is a kernel bypass technology that can greatly improve performance.
However, for virtual devices like veth, even with AF_XDP sockets, there are
still many additional software paths that consume CPU resources. This patch
series focuses on optimizing the performance of AF_XDP sockets for veth
virtual devices. Patches 1 to 4 are mainly preparatory work. Patch 5
introduces a tx queue and tx napi for packet transmission, patch 8 implements
batch sending for IPv4 UDP packets, and patch 9 adds support for the AF_XDP
tx need_wakeup feature. These optimizations significantly shorten the
software path and support checksum offload.

I tested these features with the typical topology shown below:

client(send):                                        server:(recv)
veth<-->veth-peer                                    veth1-peer<--->veth1
  1       |                                                  |   7
          |2                                                6|
          |                                                  |
        bridge<------->eth0(mlnx5)- switch -eth1(mlnx5)<--->bridge1
                  3                    4                 5
             (machine1)                              (machine2)

AF_XDP sockets are attached to veth and veth1, and packets are sent to the
physical NIC (eth0).

veth:(172.17.0.2/24)
bridge:(172.17.0.1/24)
eth0:(192.168.156.66/24)

eth1:(172.17.0.2/24)
bridge1:(172.17.0.1/24)
eth0:(192.168.156.88/24)

After setting up the default route, SNAT and DNAT, we can run tests to get
the performance results.
Packets are sent from veth to veth1.

af_xdp test tool:
link: https://github.com/cclinuxer/libxudp
send:(veth)
./objs/xudpperf send --dst 192.168.156.88:6002 -l 1300
recv:(veth1)
./objs/xudpperf recv --src 172.17.0.2:6002

udp test tool: iperf3
send:(veth)
iperf3 -c 192.168.156.88 -p 6002 -l 1300 -b 0 -u
recv:(veth1)
iperf3 -s -p 6002

performance:(test with libxudp lib)
UDP                              : 320 Kpps (with 100% cpu)
AF_XDP no zerocopy + no batch    : 480 Kpps (with ksoftirqd 100% cpu)
AF_XDP with batch + zerocopy     : 1.5 Mpps (with ksoftirqd 15% cpu)

With af_xdp batch, the libxudp user-space program becomes the bottleneck,
so the softirq does not reach its limit.

This is just an RFC patch series, and some code details still need further
consideration. Please review this proposal.

v2->v3:
- fix build error found by kernel test robot.

v1->v2:
- all the patches pass the checkpatch.pl test, as suggested by Simon Horman.
- iperf3 tested with -b 0, test results updated, as suggested by Paolo Abeni.
- refactor code to make the code structure clearer.
- delete some useless code logic in the veth_xsk_tx_xmit function.
- add support for the AF_XDP tx need_wakeup feature.

Albert Huang (9):
  veth: Implement ethtool's get_ringparam() callback
  xsk: add dma_check_skip for skipping dma check
  veth: add support for send queue
  xsk: add xsk_tx_completed_addr function
  veth: use send queue tx napi to xmit xsk tx desc
  veth: add ndo_xsk_wakeup callback for veth
  sk_buff: add destructor_arg_xsk_pool for zero copy
  veth: af_xdp tx batch support for ipv4 udp
  veth: add support for AF_XDP tx need_wakup feature

 drivers/net/veth.c          | 679 +++++++++++++++++++++++++++++++++++-
 include/linux/skbuff.h      |   2 +
 include/net/xdp_sock_drv.h  |   5 +
 include/net/xsk_buff_pool.h |   1 +
 net/xdp/xsk.c               |   6 +
 net/xdp/xsk_buff_pool.c     |   3 +-
 net/xdp/xsk_queue.h         |  10 +
 7 files changed, 704 insertions(+), 2 deletions(-)

-- 
2.20.1