Date: Wed, 17 Jan 2024 18:04:47 -0800
From: Jakub Kicinski
To: Toke Høiland-Jørgensen
Cc: Sebastian Andrzej Siewior, Alexei Starovoitov, LKML,
 Network Development, "David S. Miller", Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Alexei Starovoitov, Andrii Nakryiko, Cong Wang, Hao Luo,
 Jamal Hadi Salim, Jesper Dangaard Brouer, Jiri Olsa, Jiri Pirko,
 John Fastabend, KP Singh, Martin KaFai Lau, Ronak Doshi, Song Liu,
 Stanislav Fomichev, VMware PV-Drivers Reviewers, Yonghong Song, bpf
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
Message-ID: <20240117180447.2512335b@kernel.org>
In-Reply-To: <87ttnb6hme.fsf@toke.dk>
References: <20231215171020.687342-1-bigeasy@linutronix.de>
 <20231215171020.687342-16-bigeasy@linutronix.de>
 <87r0iw524h.fsf@toke.dk>
 <20240112174138.tMmUs11o@linutronix.de>
 <87ttnb6hme.fsf@toke.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, 17 Jan 2024 17:37:29 +0100 Toke Høiland-Jørgensen wrote:
> I am not contesting that latency is important, but it's a pretty
> fundamental trade-off and we don't want to kill throughput entirely
> either. Especially since this is global to the whole kernel; and there
> are definitely people who want to use XDP on an RT kernel and still
> achieve high PPS rates.
>
> (Whether those people really strictly speaking need to be running an RT
> kernel is maybe debatable, but it does happen).
>
> > I expected the lock operation (under RT) to always succeeds and not
> > cause any delay because it should not be contended.
>
> A lock does cause delay even when it's not contended.
> Bear in mind that at 10 Gbps line rate, we have a budget of 64
> nanoseconds to process each packet (for 64-byte packets). So just the
> atomic op to figure out whether there's any contention (around 10ns on
> the Intel processors I usually test on) will blow a huge chunk of the
> total processing budget. We can't actually do the full processing
> needed in those 64 nanoseconds (not to mention the 6.4 nanoseconds we
> have available at 100Gbps), which is why it's essential to amortise as
> much as we can over multiple packets.
>
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?

A potentially stupid idea which I have been turning over in my head is
how we could get away from having the driver handle the details of NAPI
budgeting. It's a source of bugs and endless review comments.

All drivers end up maintaining a counter of "how many packets have
I processed" and comparing that against the budget. Would it be crazy
if we put that inside napi_struct? Add a "budget" member inside
napi_struct as well, and:

struct napi_struct {
..
	// poll state
	unsigned int budget;
	unsigned int rx_used;
..
};

static inline bool napi_rx_has_budget(struct napi_struct *napi)
{
	return napi->budget > napi->rx_used;
}

poll(napi) // no budget argument
{
	while (napi_rx_has_budget(napi)) {
		napi_gro_receive(napi, skb); /* does napi->rx_used++ */

		// maybe add explicit napi_rx_count() if
		// driver did something funny with the frame.
	}
}

We can also create napi_tx_has_budget() so that people stop being
confused about whether budget is for Tx or not. And
napi_xdp_comp_has_budget() so that people stop completing XDP in
hard irq context (budget == 0)...

And we can pass napi into napi_consume_skb() instead of passing in
budget, which is presumably inexplicable to a newcomer.
And napi_complete_done() can lose the work_done argument, too.

Oh, and I'm bringing it up here because CONFIG_RT can throw a
need_resched() check into napi_rx_has_budget(), obviously.
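
To make the shape of the idea concrete, here is a minimal userspace sketch of the budget accounting described above. It is not kernel code: struct napi_sketch, sketch_need_resched(), sketch_gro_receive() and sketch_poll() are hypothetical stand-ins invented for illustration, the skb argument is left out, and the resched hook is modelled as a plain flag. Only napi_rx_has_budget() and the budget/rx_used fields come from the proposal itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for napi_struct; only the two proposed
 * poll-state fields are modelled. */
struct napi_sketch {
	unsigned int budget;   /* set by the core before calling poll */
	unsigned int rx_used;  /* bumped by the core's receive helper */
};

/* Assumption: models need_resched() as a flag the caller can set. */
static bool sketch_resched_pending;

static bool sketch_need_resched(void)
{
	return sketch_resched_pending;
}

/* The proposed helper, with the RT-style resched check folded in. */
static inline bool napi_rx_has_budget(const struct napi_sketch *napi)
{
	if (sketch_need_resched())
		return false;
	return napi->budget > napi->rx_used;
}

/* Stand-in for napi_gro_receive(): the core does the accounting,
 * so the driver keeps no packet counter of its own. */
static void sketch_gro_receive(struct napi_sketch *napi)
{
	/* A driver that checks the helper first never overruns. */
	assert(napi->rx_used < napi->budget);
	napi->rx_used++;
}

/* Driver poll routine without a budget argument. 'pending' models
 * the frames waiting in the ring; returns the frames left behind. */
static unsigned int sketch_poll(struct napi_sketch *napi,
				unsigned int pending)
{
	while (pending && napi_rx_has_budget(napi)) {
		sketch_gro_receive(napi);
		pending--;
	}
	return pending;
}
```

With budget = 64 and 100 frames pending, sketch_poll() stops after 64 receives and reports 36 frames left; setting sketch_resched_pending makes it stop before touching any frame.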