X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240305020153.2787423-1-almasrymina@google.com>
 <6208950d-6453-e797-7fc3-1dcf15b49dbe@huawei.com>
From: Mina Almasry
Date: Mon, 25 Mar 2024 17:28:12 -0700
Subject: Re: [RFC PATCH net-next v6 00/15] Device Memory TCP
To: Yunsheng Lin, YiFei Zhu
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-doc@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-mips@vger.kernel.org, linux-parisc@vger.kernel.org,
 sparclinux@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 linux-arch@vger.kernel.org, bpf@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-media@vger.kernel.org,
 dri-devel@lists.freedesktop.org, "David S. Miller", Eric Dumazet,
 Jakub Kicinski, Paolo Abeni, Jonathan Corbet, Richard Henderson,
 Ivan Kokshaysky, Matt Turner, Thomas Bogendoerfer,
 "James E.J. Bottomley", Helge Deller, Andreas Larsson,
 Jesper Dangaard Brouer, Ilias Apalodimas, Steven Rostedt,
 Masami Hiramatsu, Mathieu Desnoyers, Arnd Bergmann, Alexei Starovoitov,
 Daniel Borkmann, Andrii Nakryiko, Martin KaFai Lau, Eduard Zingerman,
 Song Liu, Yonghong Song, John Fastabend, KP Singh, Stanislav Fomichev,
 Hao Luo, Jiri Olsa, David Ahern, Willem de Bruijn, Shuah Khan,
 Sumit Semwal, Christian König, Pavel Begunkov, David Wei,
 Jason Gunthorpe, Shailend Chand, Harshitha Ramamurthy, Shakeel Butt,
 Jeroen de Borst, Praveen Kaligineedi

On Tue, Mar 5, 2024 at 11:38 AM Mina Almasry wrote:
>
> On Tue, Mar 5, 2024 at 4:54 AM Yunsheng Lin wrote:
> >
> > On 2024/3/5 10:01, Mina Almasry wrote:
> >
> > ...
> >
> > >
> > > Perf - page-pool benchmark:
> > > ---------------------------
> > >
> > > bench_page_pool_simple.ko tests with and without these changes:
> > > https://pastebin.com/raw/ncHDwAbn
> > >
> > > AFAIK the number that really matters in the perf tests is the
> > > 'tasklet_page_pool01_fast_path Per elem'. This one measures at about 8
> > > cycles without the changes but there is some 1 cycle noise in some
> > > results.
> > >
> > > With the patches this regresses to 9 cycles with the changes but there
> > > is 1 cycle noise occasionally running this test repeatedly.
> > >
> > > Lastly I tried disable the static_branch_unlikely() in
> > > netmem_is_net_iov() check. To my surprise disabling the
> > > static_branch_unlikely() check reduces the fast path back to 8 cycles,
> > > but the 1 cycle noise remains.
> > >
> >
> > The last sentence seems to be suggesting the above 1 ns regresses is caused
> > by the static_branch_unlikely() checking?
>
> Note it's not a 1ns regression, it's looks like maybe a 1 cycle
> regression (slightly less than 1ns if I'm reading the output of the
> test correctly):
>
> # clean net-next
> time_bench: Type:tasklet_page_pool01_fast_path Per elem: 8 cycles(tsc)
> 2.993 ns (step:0)
>
> # with patches
> time_bench: Type:tasklet_page_pool01_fast_path Per elem: 9 cycles(tsc)
> 3.679 ns (step:0)
>
> # with patches and with diff that disables static branching:
> time_bench: Type:tasklet_page_pool01_fast_path Per elem: 8 cycles(tsc)
> 3.248 ns (step:0)
>
> I do see noise in the test results between run and run, and any
> regression (if any) is slightly obfuscated by the noise, so it's a bit
> hard to make confident statements. So far it looks like a ~0.25ns
> regression without static branch and about ~0.65ns with static branch.
>
> Honestly when I saw all 3 results were within some noise I did not
> investigate more, but if this looks concerning to you I can dig
> further. I likely need to gather a few test runs to filter out the
> noise and maybe investigate the assembly my compiler is generating to
> maybe narrow down what changes there.
>

I did some more investigation here to gather more data to filter out
the noise, and recorded the summary here:

https://pastebin.com/raw/v5dYRg8L

Long story short, the page_pool benchmark results are consistent with
some outlier noise results that I'm discounting here.
Currently the page_pool fast path is at 8 cycles:

[ 2115.724510] time_bench: Type:tasklet_page_pool01_fast_path Per elem:
8 cycles(tsc) 3.187 ns (step:0) - (measurement period time:0.031870585
sec time_interval:31870585) - (invoke count:10000000
tsc_interval:86043192)

and with this patch series it degrades to 10 cycles, a degradation of
about 0.7ns:

[ 498.226127] time_bench: Type:tasklet_page_pool01_fast_path Per elem:
10 cycles(tsc) 3.944 ns (step:0) - (measurement period time:0.039442539
sec time_interval:39442539) - (invoke count:10000000
tsc_interval:106485268)

I took the time to dig into where the degradation comes from, and to my
surprise we can shave off 1 cycle in perf by removing the
static_branch_unlikely check in netmem_is_net_iov() like so:

diff --git a/include/net/netmem.h b/include/net/netmem.h
index fe354d11a421..2b4310ac1115 100644
--- a/include/net/netmem.h
+++ b/include/net/netmem.h
@@ -122,8 +122,7 @@ typedef unsigned long __bitwise netmem_ref;
 static inline bool netmem_is_net_iov(const netmem_ref netmem)
 {
 #ifdef CONFIG_PAGE_POOL
-	return static_branch_unlikely(&page_pool_mem_providers) &&
-	       (__force unsigned long)netmem & NET_IOV;
+	return (__force unsigned long)netmem & NET_IOV;
 #else
 	return false;
 #endif

With this change, the fast path is 9 cycles, only a 1 cycle (~0.35ns)
regression:

[ 199.184429] time_bench: Type:tasklet_page_pool01_fast_path Per elem:
9 cycles(tsc) 3.552 ns (step:0) - (measurement period time:0.035524013
sec time_interval:35524013) - (invoke count:10000000
tsc_interval:95907775)

I did some digging with YiFei on why the static_branch_unlikely appears
to be causing a 1 cycle regression, but could not get an answer that
makes sense. The number of instructions in page_pool_return_page() is
about the same with and without the static_branch_unlikely in the
compiled .o file, and my understanding is that static_branch will cause
code re-writing anyway, so looking at the compiled code may not be
representative.

Worth noting: I get ~95% line rate with devmem TCP with or without the
static_branch_unlikely(), so the impact of the static_branch is not
large enough to be measurable end-to-end. I'm thinking of dropping the
static_branch_unlikely() in the next RFC, since it doesn't improve the
end-to-end throughput number and removing it gives a measurable
improvement in the page pool benchmark.

-- 
Thanks,
Mina