X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240305020153.2787423-1-almasrymina@google.com> <6208950d-6453-e797-7fc3-1dcf15b49dbe@huawei.com>
From: Mina Almasry
Date: Tue, 26 Mar 2024 13:14:39 -0700
Subject: Re: [RFC PATCH net-next v6 00/15] Device Memory TCP
To: Yunsheng Lin, shakeel.butt@linux.dev
Cc: YiFei Zhu, netdev@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-alpha@vger.kernel.org, linux-mips@vger.kernel.org,
 linux-parisc@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, linux-arch@vger.kernel.org,
 bpf@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
 "David S. Miller", Eric Dumazet, Jakub Kicinski, Paolo Abeni,
 Jonathan Corbet, Richard Henderson, Ivan Kokshaysky, Matt Turner,
 Thomas Bogendoerfer, "James E.J. Bottomley", Helge Deller,
 Andreas Larsson, Jesper Dangaard Brouer, Ilias Apalodimas,
 Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann,
 Alexei Starovoitov, Daniel Borkmann, Andrii Nakryiko,
 Martin KaFai Lau, Eduard Zingerman, Song Liu, Yonghong Song,
 John Fastabend, KP Singh, Stanislav Fomichev, Hao Luo, Jiri Olsa,
 David Ahern, Willem de Bruijn, Shuah Khan, Sumit Semwal,
 Christian König, Pavel Begunkov, David Wei, Jason Gunthorpe,
 Shailend Chand, Harshitha Ramamurthy, Jeroen de Borst,
 Praveen Kaligineedi

On Tue, Mar 26, 2024 at 5:47 AM Yunsheng Lin wrote:
>
> On 2024/3/26 8:28, Mina Almasry wrote:
> > On Tue, Mar 5, 2024 at 11:38 AM Mina Almasry wrote:
> >>
> >> On Tue, Mar 5, 2024 at 4:54 AM Yunsheng Lin wrote:
> >>>
> >>> On 2024/3/5 10:01, Mina Almasry wrote:
> >>>
> >>> ...
> >>>
> >>>>
> >>>> Perf - page-pool benchmark:
> >>>> ---------------------------
> >>>>
> >>>> bench_page_pool_simple.ko tests with and without these changes:
> >>>> https://pastebin.com/raw/ncHDwAbn
> >>>>
> >>>> AFAIK the number that really matters in the perf tests is the
> >>>> 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
> >>>> cycles without the changes, but there is some 1 cycle noise in some
> >>>> results.
> >>>>
> >>>> With the patches this regresses to 9 cycles, but there is 1 cycle of
> >>>> noise occasionally when running this test repeatedly.
> >>>>
> >>>> Lastly, I tried disabling the static_branch_unlikely() in the
> >>>> netmem_is_net_iov() check. To my surprise, disabling the
> >>>> static_branch_unlikely() check reduces the fast path back to 8 cycles,
> >>>> but the 1 cycle noise remains.
> >>>>
> >>>
> >>> The last sentence seems to suggest that the above 1 ns regression is
> >>> caused by the static_branch_unlikely() check?
> >>
> >> Note it's not a 1ns regression; it looks like maybe a 1 cycle
> >> regression (slightly less than 1ns if I'm reading the output of the
> >> test correctly):
> >>
> >> # clean net-next
> >> time_bench: Type:tasklet_page_pool01_fast_path Per elem: 8 cycles(tsc)
> >> 2.993 ns (step:0)
> >>
> >> # with patches
> >> time_bench: Type:tasklet_page_pool01_fast_path Per elem: 9 cycles(tsc)
> >> 3.679 ns (step:0)
> >>
> >> # with patches and with diff that disables static branching:
> >> time_bench: Type:tasklet_page_pool01_fast_path Per elem: 8 cycles(tsc)
> >> 3.248 ns (step:0)
> >>
> >> I do see noise in the test results from run to run, and any
> >> regression is slightly obscured by the noise, so it's a bit
> >> hard to make confident statements. So far it looks like a ~0.25ns
> >> regression without the static branch and ~0.65ns with it.
> >>
> >> Honestly, when I saw all 3 results were within the noise I did not
> >> investigate further, but if this looks concerning to you I can dig
> >> deeper. I likely need to gather a few test runs to filter out the
> >> noise, and maybe inspect the assembly my compiler is generating to
> >> narrow down what changes there.
> >>
> >
> > I did some more investigation here to gather more data to filter out
> > the noise, and recorded the summary here:
> >
> > https://pastebin.com/raw/v5dYRg8L
> >
> > Long story short, the page_pool benchmark results are consistent, with
> > some outlier noise results that I'm discounting here. Currently the
> > page_pool fast path is at 8 cycles:
> >
> > [ 2115.724510] time_bench: Type:tasklet_page_pool01_fast_path Per
> > elem: 8 cycles(tsc) 3.187 ns (step:0) - (measurement period
> > time:0.031870585 sec time_interval:31870585) - (invoke count:10000000
> > tsc_interval:86043192)
> >
> > and with this patch series it degrades to 10 cycles, or about a 0.7ns
> > degradation or so:
>
> Even if the absolute value of the overhead is small, we seem to have a
> degradation of about 20% for the tasklet_page_pool01_fast_path testcase,
> which seems scary.
>
> I am assuming that every page is recyclable for the
> tasklet_page_pool01_fast_path testcase, and that code path matters for
> page_pool, so it would be good to remove any additional checking from
> that code path.
>

We can remove the use of static_branch_unlikely() in the net_iov
check, which reduces the overhead to 1 cycle (8->9), only 12.5%
overhead. The static_branch_unlikely() is not improving the
performance of devmem TCP anyway. In previous discussions, Jesper
deemed a 1 cycle degradation acceptable. He hasn't commented in a
while, so he may have changed his mind, but so far there have been no
complaints.

We can additionally add the check only if CONFIG_DMA_SHARED_BUFFER is
enabled. I've tested that and the fast path goes back to 8 cycles (0
overhead). If CONFIG_DMA_SHARED_BUFFER is not enabled, netmem can't be
dma-buf memory anyway, so there is no reason to check.

> And we already have the pool->has_init_callback check when we have to
> use a new page; it may make sense to refactor that to share the same
> check for the provider, to avoid the overhead as much as possible.
>
> Also, I am not sure if it really matters that much, as with
> netmem_is_net_iov() checks spreading through the networking stack, the
> overhead might add up for other cases too.

-- 
Thanks,
Mina