Message-ID: <8fd0313b-8f6f-9814-247d-c2687d053e2a@kernel.org>
Date: Mon, 7 Aug 2023 15:15:22 +0200
From: Jesper Dangaard Brouer
To: Wei Fang, Jesper Dangaard Brouer, Jesper Dangaard Brouer, Jakub Kicinski
Cc: davem@davemloft.net, edumazet@google.com, pabeni@redhat.com, Shenwei Wang, Clark Wang, ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com, netdev@vger.kernel.org, dl-linux-imx, linux-kernel@vger.kernel.org, bpf@vger.kernel.org, Andrew Lunn
Subject: Re: [PATCH V3 net-next] net: fec: add XDP_TX feature support

On 07/08/2023 12.30, Wei Fang wrote:
>>> The flow-control was not disabled before, so according to your
>>> suggestion, I disable the flow-control on the both boards and run the
>>> test again, the performance is slightly improved, but still can not
>>> see a clear difference between the two methods. Below are the results.
>>
>> Something else must be stalling the CPU.
>> When looking at fec_main.c code, I noticed that
>> fec_enet_txq_xmit_frame() will do a MMIO write for every xdp_frame (to
>> trigger transmit start), which I believe will stall the CPU.
>> The ndo_xdp_xmit/fec_enet_xdp_xmit does bulking, and should be the
>> function that does the MMIO write to trigger transmit start.
>>
> We'd better keep a MMIO write for every xdp_frame on txq, as you know,
> the txq will be inactive when no additional ready descriptors remain in the
> tx-BDR. So it may increase the delay of the packets if we do a MMIO write
> for multiple packets.
>

You know this hardware better than me, so I will leave it to you.

>> $ git diff
>> diff --git a/drivers/net/ethernet/freescale/fec_main.c
>> b/drivers/net/ethernet/freescale/fec_main.c
>> index 03ac7690b5c4..57a6a3899b80 100644
>> --- a/drivers/net/ethernet/freescale/fec_main.c
>> +++ b/drivers/net/ethernet/freescale/fec_main.c
>> @@ -3849,9 +3849,6 @@ static int fec_enet_txq_xmit_frame(struct
>> fec_enet_private *fep,
>>
>>         txq->bd.cur = bdp;
>>
>> -       /* Trigger transmission start */
>> -       writel(0, txq->bd.reg_desc_active);
>> -
>>         return 0;
>>  }
>>
>> @@ -3880,6 +3877,9 @@ static int fec_enet_xdp_xmit(struct net_device
>> *dev,
>>                 sent_frames++;
>>         }
>>
>> +       /* Trigger transmission start */
>> +       writel(0, txq->bd.reg_desc_active);
>> +
>>         __netif_tx_unlock(nq);
>>
>>         return sent_frames;
>>
>>
>>> Result: use "sync_dma_len" method
>>> root@imx8mpevk:~# ./xdp2 eth0
>>
>> The xdp2 (and xdp1) program(s) have a performance issue (due to using
>>
>> Can I ask you to test using xdp_rxq_info, like:
>>
>>   sudo ./xdp_rxq_info --dev mlx5p1 --action XDP_TX
>>
> Yes, below are the results, the results are also basically the same.
> Result 1: current method
> ./xdp_rxq_info --dev eth0 --action XDP_TX
>  Running XDP on dev:eth0 (ifindex:2) action:XDP_TX options:swapmac
>  XDP stats       CPU     pps         issue-pps
>  XDP-RX CPU      0       259,102     0
>  XDP-RX CPU      total   259,102
>  RXQ stats       RXQ:CPU pps         issue-pps
>  rx_queue_index    0:0   259,102     0
>  rx_queue_index    0:sum 259,102
>  Running XDP on dev:eth0 (ifindex:2) action:XDP_TX options:swapmac
>  XDP stats       CPU     pps         issue-pps
>  XDP-RX CPU      0       259,498     0
>  XDP-RX CPU      total   259,498
>  RXQ stats       RXQ:CPU pps         issue-pps
>  rx_queue_index    0:0   259,496     0
>  rx_queue_index    0:sum 259,496
>  Running XDP on dev:eth0 (ifindex:2) action:XDP_TX options:swapmac
>  XDP stats       CPU     pps         issue-pps
>  XDP-RX CPU      0       259,408     0
>  XDP-RX CPU      total   259,408
>
> Result 2: dma_sync_len method
>  Running XDP on dev:eth0 (ifindex:2) action:XDP_TX options:swapmac
>  XDP stats       CPU     pps         issue-pps
>  XDP-RX CPU      0       258,254     0
>  XDP-RX CPU      total   258,254
>  RXQ stats       RXQ:CPU pps         issue-pps
>  rx_queue_index    0:0   258,254     0
>  rx_queue_index    0:sum 258,254
>  Running XDP on dev:eth0 (ifindex:2) action:XDP_TX options:swapmac
>  XDP stats       CPU     pps         issue-pps
>  XDP-RX CPU      0       259,316     0
>  XDP-RX CPU      total   259,316
>  RXQ stats       RXQ:CPU pps         issue-pps
>  rx_queue_index    0:0   259,318     0
>  rx_queue_index    0:sum 259,318
>  Running XDP on dev:eth0 (ifindex:2) action:XDP_TX options:swapmac
>  XDP stats       CPU     pps         issue-pps
>  XDP-RX CPU      0       259,554     0
>  XDP-RX CPU      total   259,554
>  RXQ stats       RXQ:CPU pps         issue-pps
>  rx_queue_index    0:0   259,553     0
>  rx_queue_index    0:sum 259,553
>

Thanks for running this.

>>
>>> proto 17:     258886 pkt/s
>>> proto 17:     258879 pkt/s
>>
>> If you provide numbers for xdp_redirect, then we could better evaluate if
>> changing the lock per xdp_frame, for XDP_TX also, is worth it.
>>
> For XDP_REDIRECT, the performance show as follow.
> root@imx8mpevk:~# ./xdp_redirect eth1 eth0
> Redirecting from eth1 (ifindex 3; driver st_gmac) to eth0 (ifindex 2; driver fec)

This is not exactly the same as the XDP_TX setup, as here you chose to
redirect from eth1 (driver st_gmac) to eth0 (driver fec).

I would like to see eth0 to eth0 XDP_REDIRECT, so we can compare to
XDP_TX performance. Sorry for all the requests, but can you provide
those numbers?

> eth1->eth0                 221,642 rx/s   0 err,drop/s   221,643 xmit/s

So, XDP_REDIRECT is approx (1-(221825/259554))*100 = 14.54% slower.
But as this is 'eth1->eth0', it isn't a true comparison to XDP_TX.

> eth1->eth0                 221,761 rx/s   0 err,drop/s   221,760 xmit/s
> eth1->eth0                 221,793 rx/s   0 err,drop/s   221,794 xmit/s
> eth1->eth0                 221,825 rx/s   0 err,drop/s   221,825 xmit/s
> eth1->eth0                 221,823 rx/s   0 err,drop/s   221,821 xmit/s
> eth1->eth0                 221,815 rx/s   0 err,drop/s   221,816 xmit/s
> eth1->eth0                 222,016 rx/s   0 err,drop/s   222,016 xmit/s
> eth1->eth0                 222,059 rx/s   0 err,drop/s   222,059 xmit/s
> eth1->eth0                 222,085 rx/s   0 err,drop/s   222,089 xmit/s
> eth1->eth0                 221,956 rx/s   0 err,drop/s   221,952 xmit/s
> eth1->eth0                 222,070 rx/s   0 err,drop/s   222,071 xmit/s
> eth1->eth0                 222,017 rx/s   0 err,drop/s   222,017 xmit/s
> eth1->eth0                 222,069 rx/s   0 err,drop/s   222,067 xmit/s
> eth1->eth0                 221,986 rx/s   0 err,drop/s   221,987 xmit/s
> eth1->eth0                 221,932 rx/s   0 err,drop/s   221,936 xmit/s
> eth1->eth0                 222,045 rx/s   0 err,drop/s   222,041 xmit/s
> eth1->eth0                 222,014 rx/s   0 err,drop/s   222,014 xmit/s
>
>   Packets received    : 3,772,908
>   Average packets/s   : 221,936
>   Packets transmitted : 3,772,908
>   Average transmit/s  : 221,936
>> And also find out of moving the MMIO write have any effect.
>>
> I move the MMIO write to fec_enet_xdp_xmit(), the result shows as follow,
> the performance is slightly improved.
>

I'm puzzled that moving the MMIO write doesn't change performance.

Can you please verify that the packet generator machine is sending more
frames than the system can handle?
(i.e. is the pktgen_sample03_burst_single_flow.sh script fast enough?)

> root@imx8mpevk:~# ./xdp_redirect eth1 eth0
> Redirecting from eth1 (ifindex 3; driver st_gmac) to eth0 (ifindex 2; driver fec)
> eth1->eth0                 222,666 rx/s   0 err,drop/s   222,668 xmit/s
> eth1->eth0                 221,663 rx/s   0 err,drop/s   221,664 xmit/s
> eth1->eth0                 222,743 rx/s   0 err,drop/s   222,741 xmit/s
> eth1->eth0                 222,917 rx/s   0 err,drop/s   222,923 xmit/s
> eth1->eth0                 221,810 rx/s   0 err,drop/s   221,808 xmit/s
> eth1->eth0                 222,891 rx/s   0 err,drop/s   222,888 xmit/s
> eth1->eth0                 222,983 rx/s   0 err,drop/s   222,984 xmit/s
> eth1->eth0                 221,655 rx/s   0 err,drop/s   221,653 xmit/s
> eth1->eth0                 222,827 rx/s   0 err,drop/s   222,827 xmit/s
> eth1->eth0                 221,728 rx/s   0 err,drop/s   221,728 xmit/s
> eth1->eth0                 222,790 rx/s   0 err,drop/s   222,789 xmit/s
> eth1->eth0                 222,874 rx/s   0 err,drop/s   222,874 xmit/s
> eth1->eth0                 221,888 rx/s   0 err,drop/s   221,887 xmit/s
> eth1->eth0                 223,057 rx/s   0 err,drop/s   223,056 xmit/s
> eth1->eth0                 222,219 rx/s   0 err,drop/s   222,220 xmit/s
>
>   Packets received    : 3,336,711
>   Average packets/s   : 222,447
>   Packets transmitted : 3,336,710
>   Average transmit/s  : 222,447
>
>> I also noticed driver does a MMIO write (on rxq) for every RX-packet in
>> fec_enet_rx_queue() napi-poll loop. This also looks like a potential
>> performance stall.
>>
> The same as txq, the rxq will be inactive if the rx-BDR has no free BDs, so we'd
> better do a MMIO write when we recycle a BD, so that the hardware can timely
> attach the received packets on the rx-BDR.
>
> In addition, I also tried to avoid using xdp_convert_buff_to_frame(), but the
> performance of XDP_TX is still not improved. :(
>

I would not expect much performance improvement from this anyhow.

> After these days of testing, I think it's best to keep the solution in V3, and then
> make some optimizations on the V3 patch.

I agree.

I think you need to send a V5, and then I can ACK that.

Thanks for all this testing,
--Jesper