From: Eric Dumazet
Date: Fri, 24 Jun 2022 07:45:00 +0200
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
To: Feng Tang
Cc: Jakub Kicinski, Xin Long, Marcelo Ricardo Leitner, kernel test robot,
    Shakeel Butt, Soheil Hassas Yeganeh, LKML,
    Linux Memory Management List, network dev, linux-s390@vger.kernel.org,
    MPTCP Upstream, "linux-sctp @ vger . kernel . org", lkp@lists.01.org,
    kbuild test robot, Huang Ying, zhengjun.xing@linux.intel.com,
    fengwei.yin@intel.com, Ying Xu
References: <20220619150456.GB34471@xsang-OptiPlex-9020>
    <20220622172857.37db0d29@kernel.org>
    <20220623185730.25b88096@kernel.org>
    <20220624051351.GA72171@shbuild999.sh.intel.com>
In-Reply-To: <20220624051351.GA72171@shbuild999.sh.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Jun 24, 2022 at 7:14 AM Feng Tang wrote:
>
> Hi Eric,
>
> On Fri, Jun 24, 2022 at 06:13:51AM +0200, Eric Dumazet wrote:
> > On Fri, Jun 24, 2022 at 3:57 AM Jakub Kicinski wrote:
> > >
> > > On Thu, 23 Jun 2022 18:50:07 -0400 Xin Long wrote:
> > > > From the perf data, we can see __sk_mem_reduce_allocated() is the one
> > > > using CPU the most more than before, and mem_cgroup APIs are also
> > > > called in this function. It means the mem cgroup must be enabled in
> > > > the test env, which may explain why I couldn't reproduce it.
> > > >
> > > > The Commit 4890b686f4 ("net: keep sk->sk_forward_alloc as small as
> > > > possible") uses sk_mem_reclaim(checking reclaimable >= PAGE_SIZE) to
> > > > reclaim the memory, which is *more frequent* to call
> > > > __sk_mem_reduce_allocated() than before (checking reclaimable >=
> > > > SK_RECLAIM_THRESHOLD). It might be cheap when
> > > > mem_cgroup_sockets_enabled is false, but I'm not sure if it's still
> > > > cheap when mem_cgroup_sockets_enabled is true.
> > > >
> > > > I think SCTP netperf could trigger this, as the CPU is the bottleneck
> > > > for SCTP netperf testing, which is more sensitive to the extra
> > > > function calls than TCP.
> > > >
> > > > Can we re-run this testing without mem cgroup enabled?
> > >
> > > FWIW I defer to Eric, thanks a lot for double checking the report
> > > and digging in!
> >
> > I did tests with TCP + memcg and noticed a very small additional cost
> > in memcg functions,
> > because of suboptimal layout:
> >
> > Extract of an internal Google bug, update from June 9th:
> >
> > --------------------------------
> > I have noticed a minor false sharing to fetch (struct
> > mem_cgroup)->css.parent, at offset 0xc0,
> > because it shares the cache line containing struct mem_cgroup.memory,
> > at offset 0xd0
> >
> > Ideally, memcg->socket_pressure and memcg->parent should sit in a read
> > mostly cache line.
> > -----------------------
> >
> > But nothing that could explain a "-69.4% regression"
>
> We can double check that.
>
> > memcg has a very similar strategy of per-cpu reserves, with
> > MEMCG_CHARGE_BATCH being 32 pages per cpu.
>
> We have proposed a patch to increase the batch number for stats
> updates, which was not accepted as it hurts the accuracy and
> the data is used by many tools.
>
> > It is not clear why SCTP with 10K writes would overflow this reserve constantly.
> >
> > Presumably memcg experts will have to rework structure alignments to
> > make sure they can cope better
> > with more charge/uncharge operations, because we are not going back to
> > gigantic per-socket reserves,
> > this simply does not scale.
>
> Yes, the memcg statistics and charge/uncharge updates are very sensitive
> to the data alignment layout, and can easily trigger performance
> changes, as we've seen quite some similar cases in the past several
> years.
>
> One pattern we've seen is, even if a memcg stats updating or charge
> function only takes about 2%~3% of the CPU cycles in perf-profile data,
> once it got affected, the performance change could be amplified to up to
> 60% or more.
>

Reorganizing "struct mem_cgroup" to put "struct page_counter memory"
in a separate cache line would be beneficial.
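
To illustrate the layout point, here is a minimal userspace sketch and
not the actual struct mem_cgroup. It assumes 64-byte cache lines;
hot_counter, packed_layout, split_layout and the helper functions are
hypothetical names made up for this example. With that assumption, the
offsets quoted above (0xc0 and 0xd0) both land in cache line 3, which is
the sharing being described.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64	/* assumed; the real value is per-architecture */

/* Hypothetical stand-in for struct page_counter: written on every charge. */
struct hot_counter {
	long usage;
};

/* Read-mostly pointer and hot counter packed next to each other. */
struct packed_layout {
	void *parent;			/* read-mostly */
	struct hot_counter memory;	/* frequently written */
};

/* Same fields, with the hot counter aligned to its own cache line. */
struct split_layout {
	void *parent;
	alignas(CACHE_LINE_SIZE) struct hot_counter memory;
};

/* The aligned member must start exactly on a cache-line boundary. */
static_assert(offsetof(struct split_layout, memory) % CACHE_LINE_SIZE == 0,
	      "hot counter is not cache-line aligned");

/* Index of the cache line that a given structure offset falls into. */
size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}

/* Returns 1 when both fields of packed_layout sit in the same line. */
int packed_fields_share_line(void)
{
	return cache_line_of(offsetof(struct packed_layout, parent)) ==
	       cache_line_of(offsetof(struct packed_layout, memory));
}

/* Returns 1 when both fields of split_layout sit in the same line. */
int split_fields_share_line(void)
{
	return cache_line_of(offsetof(struct split_layout, parent)) ==
	       cache_line_of(offsetof(struct split_layout, memory));
}
```

The offsets are relative to the start of the structure, so the
separation only holds at run time when the object itself is
cache-line aligned.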
Many low hanging fruits, assuming nobody will use __randomize_layout on it ;)

Also some fields are written even if their value is not changed.

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe64100f4be5b029c7161b3a6077a74..53d9c1e581e78303ef73942e2b34338567987b74 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -7037,10 +7037,12 @@ bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages,
 		struct page_counter *fail;
 
 		if (page_counter_try_charge(&memcg->tcpmem, nr_pages, &fail)) {
-			memcg->tcpmem_pressure = 0;
+			if (READ_ONCE(memcg->tcpmem_pressure))
+				WRITE_ONCE(memcg->tcpmem_pressure, 0);
 			return true;
 		}
-		memcg->tcpmem_pressure = 1;
+		if (!READ_ONCE(memcg->tcpmem_pressure))
+			WRITE_ONCE(memcg->tcpmem_pressure, 1);
 		if (gfp_mask & __GFP_NOFAIL) {
 			page_counter_charge(&memcg->tcpmem, nr_pages);
 			return true;
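
The check-before-write idea in the diff above can be sketched in
userspace roughly as follows. This is only an analogue under stated
assumptions: relaxed C11 atomics stand in for the kernel's READ_ONCE()
and WRITE_ONCE(), and struct pressure_flag, its stores counter and
set_flag_if_changed() are hypothetical names that exist only in this
example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a flag such as memcg->tcpmem_pressure. */
struct pressure_flag {
	atomic_int value;
	unsigned long stores;	/* stores actually issued; demo only */
};

/*
 * Store new_value only when it differs from the current value, so an
 * unchanged flag is only read and its cache line is not dirtied.
 * Returns true when a store was issued.
 */
bool set_flag_if_changed(struct pressure_flag *f, int new_value)
{
	assert(f != NULL);

	if (atomic_load_explicit(&f->value, memory_order_relaxed) == new_value)
		return false;

	atomic_store_explicit(&f->value, new_value, memory_order_relaxed);
	f->stores++;
	return true;
}
```

Calling set_flag_if_changed() repeatedly with the same value issues at
most one store; the later calls only load the flag, which lets the cache
line stay shared between CPUs instead of bouncing on every call.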