From: Marco Elver
Date: Wed, 31 Jan 2024 21:09:30 +0100
Subject: Re: [PATCH] bpf: Separate bpf_local_storage_lookup() fast and slow paths
References: <20240131141858.1149719-1-elver@google.com>
To: Martin KaFai Lau
Cc: Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko, Song Liu,
    Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo,
    Jiri Olsa, Mykola Lysenko, Shuah Khan, bpf@vger.kernel.org,
    linux-kernel@vger.kernel.org,
    linux-kselftest@vger.kernel.org

On Wed, 31 Jan 2024 at 20:52, Martin KaFai Lau wrote:
>
> On 1/31/24 6:18 AM, Marco Elver wrote:
> > To allow the compiler to inline the bpf_local_storage_lookup() fast-path,
> > factor it out by making bpf_local_storage_lookup() a static inline
> > function and move the slow-path to bpf_local_storage_lookup_slowpath().
> >
> > Base on results from './benchs/run_bench_local_storage.sh' this produces
> > improvements in throughput and latency in the majority of cases:
> >
> > | Hashmap Control
> > | ===============
> > |     num keys: 10
> > |         hashmap (control) sequential get:
> > |                                      <before>               | <after>
> > |           hits throughput:           13.895 ± 0.024 M ops/s | 14.022 ± 0.095 M ops/s (+0.9%)
> > |           hits latency:              71.968 ns/op           | 71.318 ns/op (-0.9%)
> > |           important_hits throughput: 13.895 ± 0.024 M ops/s | 14.022 ± 0.095 M ops/s (+0.9%)
> > |
> > |     num keys: 1000
> > |         hashmap (control) sequential get:
> > |                                      <before>               | <after>
> > |           hits throughput:           11.793 ± 0.018 M ops/s | 11.645 ± 0.370 M ops/s (-1.3%)
> > |           hits latency:              84.794 ns/op           | 85.874 ns/op (+1.3%)
> > |           important_hits throughput: 11.793 ± 0.018 M ops/s | 11.645 ± 0.370 M ops/s (-1.3%)
> > |
> > |     num keys: 10000
> > |         hashmap (control) sequential get:
> > |                                      <before>               | <after>
> > |           hits throughput:           7.113 ± 0.012 M ops/s  | 7.037 ± 0.051 M ops/s (-1.1%)
> > |           hits latency:              140.581 ns/op          | 142.113 ns/op (+1.1%)
> > |           important_hits throughput: 7.113 ± 0.012 M ops/s  | 7.037 ± 0.051 M ops/s (-1.1%)
>
> My understanding is the change in this patch should not affect the hashmap
> control result, so the above +/- ~1% change could be mostly noise.

Yes, I think they are noise.
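A note on reading the columns: in the rows above, latency in ns/op looks like the reciprocal of throughput in M ops/s (1000 / 13.895 is roughly 71.968), which would explain why a throughput change and the matching latency change have different magnitudes. The sketch below only assumes that relation holds; the helper names latency_ns_per_op() and percent_change() are made up for illustration and are not part of the benchmark scripts.

```c
/*
 * Hypothetical helpers, not taken from the bench scripts.
 * Assumption: latency [ns/op] = 1e9 ns/s / (throughput [M ops/s] * 1e6 ops/s)
 *                             = 1000 / throughput.
 */
static inline double latency_ns_per_op(double mops_per_sec)
{
	return 1000.0 / mops_per_sec;
}

/* Relative change from 'before' to 'after', in percent. */
static inline double percent_change(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

With these, latency_ns_per_op(13.895) lands close to the 71.968 ns/op reported for the 10-key case, and a percent_change() on a throughput pair will generally not be the negation of the one on the corresponding latency pair, because the two quantities are reciprocals.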

> > |
> > |     num keys: 100000
> > |         hashmap (control) sequential get:
> > |                                      <before>               | <after>
> > |           hits throughput:           4.793 ± 0.034 M ops/s  | 4.990 ± 0.025 M ops/s (+4.1%)
> > |           hits latency:              208.623 ns/op          | 200.401 ns/op (-3.9%)
> > |           important_hits throughput: 4.793 ± 0.034 M ops/s  | 4.990 ± 0.025 M ops/s (+4.1%)
> > |
> > |     num keys: 4194304
> > |         hashmap (control) sequential get:
> > |                                      <before>               | <after>
> > |           hits throughput:           2.088 ± 0.008 M ops/s  | 2.962 ± 0.004 M ops/s (+41.9%)
> > |           hits latency:              478.851 ns/op          | 337.648 ns/op (-29.5%)
> > |           important_hits throughput: 2.088 ± 0.008 M ops/s  | 2.962 ± 0.004 M ops/s (+41.9%)
>
> The last one has a big difference. Did you run it a couple of times without the
> change and check if the result was consistent ?

Based on what you say above this might be noise. I will rerun a few
times (and also rebase against the latest v6.8-rc).

> > |
> > | Local Storage
> > | =============
> > |     num_maps: 1
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           32.598 ± 0.008 M ops/s | 38.480 ± 0.054 M ops/s (+18.0%)
> > |           hits latency:              30.676 ns/op           | 25.988 ns/op (-15.3%)
> > |           important_hits throughput: 32.598 ± 0.008 M ops/s | 38.480 ± 0.054 M ops/s (+18.0%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           36.963 ± 0.045 M ops/s | 43.847 ± 0.037 M ops/s (+18.6%)
> > |           hits latency:              27.054 ns/op           | 22.807 ns/op (-15.7%)
> > |           important_hits throughput: 36.963 ± 0.045 M ops/s | 43.847 ± 0.037 M ops/s (+18.6%)
> > |
> > |     num_maps: 10
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           32.078 ± 0.004 M ops/s | 37.813 ± 0.020 M ops/s (+17.9%)
> > |           hits latency:              31.174 ns/op           | 26.446 ns/op (-15.2%)
> > |           important_hits throughput: 3.208 ± 0.000 M ops/s  | 3.781 ± 0.002 M ops/s (+17.9%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           34.564 ± 0.011 M ops/s | 40.082 ± 0.037 M ops/s (+16.0%)
> > |           hits latency:              28.932 ns/op           | 24.949 ns/op (-13.8%)
> > |           important_hits throughput: 12.344 ± 0.004 M ops/s | 14.315 ± 0.013 M ops/s (+16.0%)
> > |
> > |     num_maps: 16
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           32.493 ± 0.023 M ops/s | 38.147 ± 0.029 M ops/s (+17.4%)
> > |           hits latency:              30.776 ns/op           | 26.215 ns/op (-14.8%)
> > |           important_hits throughput: 2.031 ± 0.001 M ops/s  | 2.384 ± 0.002 M ops/s (+17.4%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           34.380 ± 0.521 M ops/s | 41.605 ± 0.095 M ops/s (+21.0%)
> > |           hits latency:              29.087 ns/op           | 24.035 ns/op (-17.4%)
> > |           important_hits throughput: 10.939 ± 0.166 M ops/s | 13.238 ± 0.030 M ops/s (+21.0%)
> > |
> > |     num_maps: 17
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           28.748 ± 0.028 M ops/s | 32.248 ± 0.080 M ops/s (+12.2%)
> > |           hits latency:              34.785 ns/op           | 31.009 ns/op (-10.9%)
> > |           important_hits throughput: 1.693 ± 0.002 M ops/s  | 1.899 ± 0.005 M ops/s (+12.2%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           31.313 ± 0.030 M ops/s | 35.911 ± 0.020 M ops/s (+14.7%)
> > |           hits latency:              31.936 ns/op           | 27.847 ns/op (-12.8%)
> > |           important_hits throughput: 9.533 ± 0.009 M ops/s  | 10.933 ± 0.006 M ops/s (+14.7%)
> > |
> > |     num_maps: 24
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           18.475 ± 0.027 M ops/s | 19.000 ± 0.006 M ops/s (+2.8%)
> > |           hits latency:              54.127 ns/op           | 52.632 ns/op (-2.8%)
> > |           important_hits throughput: 0.770 ± 0.001 M ops/s  | 0.792 ± 0.000 M ops/s (+2.9%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           21.361 ± 0.028 M ops/s | 22.388 ± 0.099 M ops/s (+4.8%)
> > |           hits latency:              46.814 ns/op           | 44.667 ns/op (-4.6%)
> > |           important_hits throughput: 6.009 ± 0.008 M ops/s  | 6.298 ± 0.028 M ops/s (+4.8%)
> > |
> > |     num_maps: 32
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           14.220 ± 0.006 M ops/s | 14.168 ± 0.020 M ops/s (-0.4%)
> > |           hits latency:              70.323 ns/op           | 70.580 ns/op (+0.4%)
> > |           important_hits throughput: 0.445 ± 0.000 M ops/s  | 0.443 ± 0.001 M ops/s (-0.4%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           17.250 ± 0.011 M ops/s | 16.650 ± 0.021 M ops/s (-3.5%)
> > |           hits latency:              57.971 ns/op           | 60.061 ns/op (+3.6%)
> > |           important_hits throughput: 4.815 ± 0.003 M ops/s  | 4.647 ± 0.006 M ops/s (-3.5%)
> > |
> > |     num_maps: 100
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           5.212 ± 0.012 M ops/s  | 5.878 ± 0.004 M ops/s (+12.8%)
> > |           hits latency:              191.877 ns/op          | 170.116 ns/op (-11.3%)
> > |           important_hits throughput: 0.052 ± 0.000 M ops/s  | 0.059 ± 0.000 M ops/s (+13.5%)
> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           6.521 ± 0.053 M ops/s  | 7.086 ± 0.010 M ops/s (+8.7%)
> > |           hits latency:              153.343 ns/op          | 141.116 ns/op (-8.0%)
> > |           important_hits throughput: 1.703 ± 0.014 M ops/s  | 1.851 ± 0.003 M ops/s (+8.7%)
> > |
> > |     num_maps: 1000
> > |         local_storage cache sequential  get:
> > |                                      <before>               | <after>
> > |           hits throughput:           0.357 ± 0.005 M ops/s  | 0.325 ± 0.005 M ops/s (-9.0%)
> > |           hits latency:              2803.738 ns/op         | 3076.923 ns/op (+9.7%)
>
> Is it understood why the slow down here? The same goes for the "num_maps: 32"
> case above but not as bad as here.

num_maps:32 could be noise.

> > |           important_hits throughput: 0.000 ± 0.000 M ops/s  | 0.000 ± 0.000 M ops/s
>
> The important_hits is very little in this case?

It seems to be below 0.000M on the test machine.

> > |         local_storage cache interleaved get:
> > |                                      <before>               | <after>
> > |           hits throughput:           0.434 ± 0.007 M ops/s  | 0.447 ± 0.007 M ops/s (+3.0%)
> > |           hits latency:              2306.539 ns/op         | 2237.687 ns/op (-3.0%)
> > |           important_hits throughput: 0.109 ± 0.002 M ops/s  | 0.112 ± 0.002 M ops/s (+2.8%)
> >
> > Signed-off-by: Marco Elver
> > ---
> >  include/linux/bpf_local_storage.h               | 17 ++++++++++++++++-
> >  kernel/bpf/bpf_local_storage.c                  | 14 ++++----------
> >  .../selftests/bpf/progs/cgrp_ls_recursion.c     |  2 +-
> >  .../selftests/bpf/progs/task_ls_recursion.c     |  2 +-
> >  4 files changed, 22 insertions(+), 13 deletions(-)
> >
> > diff --git a/include/linux/bpf_local_storage.h b/include/linux/bpf_local_storage.h
> > index 173ec7f43ed1..c8cecf7fff87 100644
> > --- a/include/linux/bpf_local_storage.h
> > +++ b/include/linux/bpf_local_storage.h
> > @@ -130,9 +130,24 @@ bpf_local_storage_map_alloc(union bpf_attr *attr,
> >                              bool bpf_ma);
> >
> >  struct bpf_local_storage_data *
> > +bpf_local_storage_lookup_slowpath(struct bpf_local_storage *local_storage,
> > +                                  struct bpf_local_storage_map *smap,
> > +                                  bool cacheit_lockit);
> > +static inline struct bpf_local_storage_data *
> >  bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
> >                           struct bpf_local_storage_map *smap,
> > -                         bool cacheit_lockit);
> > +                         bool cacheit_lockit)
> > +{
> > +	struct bpf_local_storage_data *sdata;
> > +
> > +	/* Fast path (cache hit) */
> > +	sdata = rcu_dereference_check(local_storage->cache[smap->cache_idx],
> > +	                              bpf_rcu_lock_held());
> > +	if (likely(sdata && rcu_access_pointer(sdata->smap) == smap))
> > +		return sdata;
> > +
> > +	return bpf_local_storage_lookup_slowpath(local_storage, smap, cacheit_lockit);
> > +}
> >
> >  void bpf_local_storage_destroy(struct bpf_local_storage *local_storage);
> >
> > diff --git a/kernel/bpf/bpf_local_storage.c b/kernel/bpf/bpf_local_storage.c
> > index 146824cc9689..2ef782a1bd6f 100644
> > --- a/kernel/bpf/bpf_local_storage.c
> > +++ b/kernel/bpf/bpf_local_storage.c
> > @@ -415,20 +415,14 @@ void bpf_selem_unlink(struct bpf_local_storage_elem *selem, bool reuse_now)
> >  }
> >
> >  /* If cacheit_lockit is false, this lookup function is lockless */
> > -struct bpf_local_storage_data *
> > -bpf_local_storage_lookup(struct bpf_local_storage *local_storage,
> > -                         struct bpf_local_storage_map *smap,
> > -                         bool cacheit_lockit)
> > +noinline struct bpf_local_storage_data *
>
> Is noinline needed ?

Yes, so that this TU or LTO kernels do not inline the slowpath, which
would cause worse codegen in the caller.

> > +bpf_local_storage_lookup_slowpath(struct bpf_local_storage *local_storage,
> > +                                  struct bpf_local_storage_map *smap,
> > +                                  bool cacheit_lockit)
> >  {
> >  	struct bpf_local_storage_data *sdata;
> >  	struct bpf_local_storage_elem *selem;
> >
> > -	/* Fast path (cache hit) */
> > -	sdata = rcu_dereference_check(local_storage->cache[smap->cache_idx],
> > -	                              bpf_rcu_lock_held());
> > -	if (sdata && rcu_access_pointer(sdata->smap) == smap)
> > -		return sdata;
> > -
> >  	/* Slow path (cache miss) */
> >  	hlist_for_each_entry_rcu(selem, &local_storage->list, snode,
> >  	                         rcu_read_lock_trace_held())
> > diff --git a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
> > index a043d8fefdac..9895087a9235 100644
> > --- a/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
> > +++ b/tools/testing/selftests/bpf/progs/cgrp_ls_recursion.c
> > @@ -21,7 +21,7 @@ struct {
> >  	__type(value, long);
> >  } map_b SEC(".maps");
> >
> > -SEC("fentry/bpf_local_storage_lookup")
> > +SEC("fentry/bpf_local_storage_lookup_slowpath")
>
> The selftest is trying to catch recursion. The change here cannot test the same
> thing because the slowpath will never be hit in the test_progs. I don't have a
> better idea for now also.
>
> It has a conflict with the bpf-next tree also. Was the patch created against an
> internal tree?

Base was v6.7. I will do a rebase and rerun benchmarks.
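
For readers less familiar with the pattern discussed in this thread, here is a minimal userspace sketch of splitting a lookup into an inlinable fast path and an out-of-line slow path. Everything in it (struct store, store_lookup(), store_lookup_slowpath(), the modulo cache index) is hypothetical and only mirrors the shape of the patch. It has no RCU or locking and, unlike the kernel function, it always caches on a slow-path hit rather than consulting a flag such as cacheit_lockit.

```c
#include <stddef.h>

/* Hypothetical userspace analogue; names are illustrative only. */
#define CACHE_SLOTS 16

#if defined(__GNUC__)
#define likely(x)  __builtin_expect(!!(x), 1)
#define NOINLINE   __attribute__((noinline))
#else
#define likely(x)  (x)
#define NOINLINE
#endif

struct entry {
	int key;
	int value;
	struct entry *next;
};

struct store {
	struct entry *cache[CACHE_SLOTS]; /* small per-store cache */
	struct entry *list;               /* full list, walked on a miss */
	unsigned long slow_lookups;       /* counts slow-path calls */
};

/* Declared separately, as a header would do for an out-of-line function. */
struct entry *store_lookup_slowpath(struct store *s, int key);

/*
 * Slow path (cache miss): walk the list. Marked noinline so the compiler
 * does not fold the loop back into every caller of store_lookup().
 */
NOINLINE struct entry *store_lookup_slowpath(struct store *s, int key)
{
	struct entry *e;

	s->slow_lookups++;
	for (e = s->list; e; e = e->next) {
		if (e->key == key) {
			s->cache[(unsigned int)key % CACHE_SLOTS] = e;
			return e;
		}
	}
	return NULL;
}

/* Fast path (cache hit): small enough to be inlined at each call site. */
static inline struct entry *store_lookup(struct store *s, int key)
{
	struct entry *e = s->cache[(unsigned int)key % CACHE_SLOTS];

	if (likely(e && e->key == key))
		return e;

	return store_lookup_slowpath(s, key);
}
```

In this sketch a first store_lookup() for a key goes through store_lookup_slowpath() and bumps slow_lookups; a repeated lookup of the same key is served from the cache slot without calling out of line. That is the effect the noinline annotation is meant to preserve in the caller's generated code.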