From: Toke Høiland-Jørgensen
To: Jakub Kicinski
Cc: Sebastian Andrzej Siewior, Alexei Starovoitov, LKML,
 Network Development, "David S. Miller", Boqun Feng, Daniel Borkmann,
 Eric Dumazet, Frederic Weisbecker, Ingo Molnar, Paolo Abeni,
 Peter Zijlstra, Thomas Gleixner, Waiman Long, Will Deacon,
 Alexei Starovoitov, Andrii Nakryiko, Cong Wang, Hao Luo,
 Jamal Hadi Salim, Jesper Dangaard Brouer, Jiri Olsa, Jiri Pirko,
 John Fastabend, KP Singh, Martin KaFai Lau, Ronak Doshi, Song Liu,
 Stanislav Fomichev, VMware PV-Drivers Reviewers, Yonghong Song, bpf
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
In-Reply-To: <20240117180447.2512335b@kernel.org>
References: <20231215171020.687342-1-bigeasy@linutronix.de>
 <20231215171020.687342-16-bigeasy@linutronix.de>
 <87r0iw524h.fsf@toke.dk>
 <20240112174138.tMmUs11o@linutronix.de>
 <87ttnb6hme.fsf@toke.dk>
 <20240117180447.2512335b@kernel.org>
Date: Thu, 18 Jan 2024 12:51:18 +0100
Message-ID: <87bk9i6ert.fsf@toke.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

Jakub Kicinski writes:

> On Wed, 17 Jan 2024 17:37:29 +0100 Toke Høiland-Jørgensen wrote:
>> I am not contesting that latency is important, but it's a pretty
>> fundamental trade-off and we don't want to kill throughput entirely
>> either. Especially since this is global to the whole kernel; and there
>> are definitely people who want to use XDP on an RT kernel and still
>> achieve high PPS rates.
>>
>> (Whether those people really strictly speaking need to be running an RT
>> kernel is maybe debatable, but it does happen).
>>
>> > I expected the lock operation (under RT) to always succeed and not
>> > cause any delay because it should not be contended.
>>
>> A lock does cause delay even when it's not contended. Bear in mind that
>> at 10 Gbps line rate, we have a budget of 64 nanoseconds to process each
>> packet (for 64-byte packets). So just the atomic op to figure out
>> whether there's any contention (around 10ns on the Intel processors I
>> usually test on) will blow a huge chunk of the total processing budget.
>> We can't actually do the full processing needed in those 64 nanoseconds
>> (not to mention the 6.4 nanoseconds we have available at 100Gbps), which
>> is why it's essential to amortise as much as we can over multiple
>> packets.
>>
>> This is all back-of-the-envelope calculations, of course. Having some
>> actual numbers to look at would be great; I don't suppose you have a
>> setup where you can run xdp-bench and see how your patches affect the
>> throughput?
>
> A potentially stupid idea which I have been turning in my head is
> how we could get away from having the driver handle details of NAPI
> budgeting. It's a source of bugs and endless review comments.
>
> All drivers end up maintaining a counter of "how many packets have
> I processed" and comparing that against the budget. Would it be crazy
> if we put that inside napi_struct? Add a "budget" member inside
> napi_struct as well, and:
>
> struct napi_struct {
>         ...
>         // poll state
>         unsigned int budget;
>         unsigned int rx_used;
>         ...
> }
>
> static inline bool napi_rx_has_budget(napi)
> {
>         return napi->budget > napi->rx_used;
> }
>
> poll(napi) // no budget
> {
>         while (napi_rx_has_budget(napi)) {
>                 napi_gro_receive(napi, skb); /* does napi->rx_used++ */
>                 // maybe add explicit napi_rx_count() if
>                 // driver did something funny with the frame.
>         }
> }
>
> We can also create napi_tx_has_budget() so that people stop being
> confused whether budget is for Tx or not. And napi_xdp_comp_has_budget()
> so that people stop completing XDP in hard irq context (budget==0)...
>
> And we can pass napi into napi_consume_skb(), instead of, presumably
> inexplicably to a newcomer, passing in budget.
> And napi_complete_done() can lose the work_done argument, too.

I do agree that conceptually it makes a lot of sense to encapsulate the
budget like this so drivers don't have to do all this state tracking
themselves. It does appear that drivers are doing different things with
the budget as it is today, though. For instance, the Intel drivers seem
to divide the budget over all the enabled RX rings(?); so I'm wondering
if it'll be possible to unify drivers around a more opaque NAPI poll
API?

-Toke
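
To make the encapsulation idea a bit more concrete, here is a rough
user-space model of it. This is only a sketch under assumptions: the
names struct napi_model, napi_model_begin(), napi_model_rx_has_budget(),
napi_model_receive() and model_poll() are all hypothetical and are not
kernel APIs. It only shows the counter living next to the budget, so
that a poll loop asks "is there budget left?" instead of keeping its own
count.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the proposed poll state. This is NOT the
 * kernel's struct napi_struct, just a model of the two new members. */
struct napi_model {
	unsigned int budget;	/* packets allowed in this poll round */
	unsigned int rx_used;	/* packets consumed so far in this round */
};

/* Assumed to be done by the core before it calls the driver's poll. */
static inline void napi_model_begin(struct napi_model *napi,
				    unsigned int budget)
{
	napi->budget = budget;
	napi->rx_used = 0;
}

/* Mirrors the napi_rx_has_budget() helper from the quoted proposal. */
static inline bool napi_model_rx_has_budget(const struct napi_model *napi)
{
	return napi->budget > napi->rx_used;
}

/* Stand-in for the receive call, which would do the accounting itself. */
static inline void napi_model_receive(struct napi_model *napi)
{
	napi->rx_used++;
}

/* Model of a driver poll loop over one ring with 'pending' frames queued.
 * The driver keeps no budget counter of its own. Returns frames handled. */
static unsigned int model_poll(struct napi_model *napi, unsigned int pending)
{
	unsigned int done = 0;

	while (pending > 0 && napi_model_rx_has_budget(napi)) {
		napi_model_receive(napi);
		pending--;
		done++;
	}
	return done;
}
```

In this model, calling model_poll() for several rings in turn against
the same napi_model caps the total at the budget without dividing it up
front. Whether that behaviour would suit drivers that split the budget
per ring today is exactly the open question.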