Date: Thu, 18 Jan 2024 08:35:40 +0100
From: Sebastian Andrzej Siewior
To: Toke Høiland-Jørgensen
Cc: Alexei Starovoitov, LKML, Network Development, "David S. Miller",
    Boqun Feng, Daniel Borkmann, Eric Dumazet, Frederic Weisbecker,
    Ingo Molnar, Jakub Kicinski, Paolo Abeni, Peter Zijlstra,
    Thomas Gleixner, Waiman Long, Will Deacon, Alexei Starovoitov,
    Andrii Nakryiko, Cong Wang, Hao Luo, Jamal Hadi Salim,
    Jesper Dangaard Brouer, Jiri Olsa, Jiri Pirko, John Fastabend,
    KP Singh, Martin KaFai Lau, Ronak Doshi, Song Liu,
    Stanislav Fomichev, VMware PV-Drivers Reviewers, Yonghong Song, bpf
Subject: Re: [PATCH net-next 15/24] net: Use nested-BH locking for XDP redirect.
Message-ID: <20240118073540.GIobmYpD@linutronix.de>
In-Reply-To: <87ttnb6hme.fsf@toke.dk>
References: <20231215171020.687342-1-bigeasy@linutronix.de>
    <20231215171020.687342-16-bigeasy@linutronix.de>
    <87r0iw524h.fsf@toke.dk>
    <20240112174138.tMmUs11o@linutronix.de>
    <87ttnb6hme.fsf@toke.dk>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?

No, but I could probably set it up.

> I chatted with Jesper about this, and he had an idea not too far from
> this: split up the XDP and regular stack processing in two stages, each
> with their individual batching. So whereas right now we're doing
> something like:
>
> run_napi()
>   bh_disable()
>   for pkt in budget:
>     act = run_xdp(pkt)
>     if (act == XDP_PASS)
>       run_netstack(pkt)  // this is the expensive bit
>   bh_enable()
>
> We could instead do:
>
> run_napi()
>   bh_disable()
>   for pkt in budget:
>     act = run_xdp(pkt)
>     if (act == XDP_PASS)
>       add_to_list(pkt, to_stack_list)
>   bh_enable()
>   // sched point
>   bh_disable()
>   for pkt in to_stack_list:
>     run_netstack(pkt)
>   bh_enable()
>
>
> This would limit the batching that blocks everything to only the XDP
> processing itself, which should limit the maximum time spent in the
> blocking state significantly compared to what we have today. The caveat
> being that rearranging things like this is potentially a pretty major
> refactoring task that needs to touch all the drivers (even if some of
> the logic can be moved into the core code in the process). So not really
> sure if this approach is feasible, TBH.

This does not work because bh_disable() does not disable scheduling.
Scheduling may still happen. bh_disable() acquires a lock, which is
currently the only synchronisation point between two, say, network
drivers doing NAPI. And this is what I want to get rid of.

Regarding the expensive bit, as in XDP_PASS: this doesn't need locking
as per the proposal, just the REDIRECT piece.

> > Daniel said netkit doesn't need this locking because it is not
> > supporting this redirect and it made me think. Would it work to make
> > the redirect structures part of the bpf_prog-structure instead of
> > per-CPU? My understanding is that eBPF's programs data structures are
> > part of it and contain locking allowing one eBPF program preempt
> > another one.
> > Having the redirect structures part of the program would obsolete
> > locking. Do I miss anything?
>
> This won't work, unfortunately: the same XDP program can be attached to
> multiple interfaces simultaneously, and for hardware with multiple
> receive queues (which is most of the hardware that supports XDP), it can
> even run simultaneously on multiple CPUs on the same interface. This is
> the reason why this is all being kept in per-CPU variables today.

So I started hacking on this and noticed yesterday that you can run
multiple bpf programs. This is how I learned that it won't work.
My plan B is now to move it into task_struct.

> -Toke

Sebastian
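
To make the per-program vs. per-task point above concrete, here is a toy
model in Python. It is only a sketch: plain objects stand in for kernel
structures, and RedirectInfo and interleaved_run are hypothetical names,
not kernel APIs. It assumes the redirect state is written by the program
and read back by the caller afterwards, and that a second context can run
the same program in between.

```python
# Toy model only: plain Python objects stand in for kernel structures.
# RedirectInfo and interleaved_run are hypothetical names, not kernel APIs.


class RedirectInfo:
    """Scratch state: the program stores a redirect target here and the
    caller reads it back after the program returns."""

    def __init__(self):
        self.target = None


def interleaved_run(info_a, info_b):
    """Run the same program in two contexts, A and B, with B running to
    completion between A's store and A's read (preemption, or simply a
    second CPU). Returns the targets A and B end up acting on."""
    info_a.target = "dev1"      # A: program picks its redirect target
    info_b.target = "dev2"      # B: same program, different packet
    seen_by_b = info_b.target   # B: caller consumes the target
    seen_by_a = info_a.target   # A: resumes and consumes its target
    return seen_by_a, seen_by_b


# One instance hanging off the program: both contexts share it, so A
# ends up acting on the target that B stored.
per_prog = RedirectInfo()
shared_result = interleaved_run(per_prog, per_prog)

# One instance per task: each context owns its state, no lock needed
# between them.
per_task_result = interleaved_run(RedirectInfo(), RedirectInfo())

print(shared_result)
print(per_task_result)
```

In this model the shared instance loses A's target, while the per-task
instances keep both apart, which is the property plan B relies on.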