From: Shakeel Butt
Date: Thu, 23 Jun 2022 23:34:15 -0700
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
To: Eric Dumazet, Linux MM, Andrew Morton, Roman Gushchin, Michal Hocko,
    Johannes Weiner, Muchun Song
Cc: Jakub Kicinski, Xin Long, Marcelo Ricardo Leitner, kernel test robot,
    Soheil Hassas Yeganeh, LKML, network dev, linux-s390@vger.kernel.org,
    MPTCP Upstream, linux-sctp@vger.kernel.org, lkp@lists.01.org,
    kbuild test robot, Huang Ying, "Tang, Feng", Xing Zhengjun,
    Yin Fengwei, Ying Xu
References: <20220619150456.GB34471@xsang-OptiPlex-9020>
    <20220622172857.37db0d29@kernel.org>
    <20220623185730.25b88096@kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org
org" , lkp@lists.01.org, kbuild test robot , Huang Ying , "Tang, Feng" , Xing Zhengjun , Yin Fengwei , Ying Xu Content-Type: text/plain; charset="UTF-8" X-Spam-Status: No, score=-17.6 required=5.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF, ENV_AND_HDR_SPF_MATCH,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_PASS, T_SCC_BODY_TEXT_LINE,USER_IN_DEF_DKIM_WL,USER_IN_DEF_SPF_WL autolearn=unavailable autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on lindbergh.monkeyblade.net Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org CCing memcg folks. The thread starts at https://lore.kernel.org/all/20220619150456.GB34471@xsang-OptiPlex-9020/ On Thu, Jun 23, 2022 at 9:14 PM Eric Dumazet wrote: > > On Fri, Jun 24, 2022 at 3:57 AM Jakub Kicinski wrote: > > > > On Thu, 23 Jun 2022 18:50:07 -0400 Xin Long wrote: > > > From the perf data, we can see __sk_mem_reduce_allocated() is the one > > > using CPU the most more than before, and mem_cgroup APIs are also > > > called in this function. It means the mem cgroup must be enabled in > > > the test env, which may explain why I couldn't reproduce it. > > > > > > The Commit 4890b686f4 ("net: keep sk->sk_forward_alloc as small as > > > possible") uses sk_mem_reclaim(checking reclaimable >= PAGE_SIZE) to > > > reclaim the memory, which is *more frequent* to call > > > __sk_mem_reduce_allocated() than before (checking reclaimable >= > > > SK_RECLAIM_THRESHOLD). It might be cheap when > > > mem_cgroup_sockets_enabled is false, but I'm not sure if it's still > > > cheap when mem_cgroup_sockets_enabled is true. > > > > > > I think SCTP netperf could trigger this, as the CPU is the bottleneck > > > for SCTP netperf testing, which is more sensitive to the extra > > > function calls than TCP. > > > > > > Can we re-run this testing without mem cgroup enabled? > > > > FWIW I defer to Eric, thanks a lot for double checking the report > > and digging in! > > I did tests with TCP + memcg and noticed a very small additional cost > in memcg functions, > because of suboptimal layout: > > Extract of an internal Google bug, update from June 9th: > > -------------------------------- > I have noticed a minor false sharing to fetch (struct > mem_cgroup)->css.parent, at offset 0xc0, > because it shares the cache line containing struct mem_cgroup.memory, > at offset 0xd0 > > Ideally, memcg->socket_pressure and memcg->parent should sit in a read > mostly cache line. > ----------------------- > > But nothing that could explain a "-69.4% regression" > > memcg has a very similar strategy of per-cpu reserves, with > MEMCG_CHARGE_BATCH being 32 pages per cpu. > > It is not clear why SCTP with 10K writes would overflow this reserve constantly. > > Presumably memcg experts will have to rework structure alignments to > make sure they can cope better > with more charge/uncharge operations, because we are not going back to > gigantic per-socket reserves, > this simply does not scale. Yes I agree. As you pointed out there are fields which are mostly read-only but sharing cache lines with fields which get updated and definitely need work. However can we first confirm if memcg charging is really the issue here as I remember these intel lkp tests are configured to run in root memcg and the kernel does not associate root memcg to any socket (see mem_cgroup_sk_alloc()). If these tests are running in non-root memcg, is this cgroup v1 or v2? The memory counter and the 32 pages per cpu stock are only used on v2. 
Yes, I agree. As you pointed out, there are fields which are mostly
read-only but share cache lines with fields which get updated, and that
definitely needs work.

However, can we first confirm that memcg charging is really the issue
here? As I remember, these Intel lkp tests are configured to run in the
root memcg, and the kernel does not associate the root memcg with any
socket (see mem_cgroup_sk_alloc()).

If these tests are running in a non-root memcg, is it cgroup v1 or v2?
The memory counter and the 32-pages-per-cpu stock are only used on v2.
For v1, there is no per-cpu stock and there is a separate tcpmem page
counter; moreover, on v1 the network memory accounting has to be
enabled explicitly, i.e. it is not enabled by default. There is a
definite possibility of a slowdown on v1, but let's first confirm the
memcg setup used for this testing environment.

Feng, can you please explain the memcg setup on these test machines,
and whether the tests are run in the root memcg or a non-root memcg?

thanks,
Shakeel
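To make the v2 charging path discussed above concrete, here is a rough
userspace model of the per-CPU charge stock. MEMCG_CHARGE_BATCH really
is 32 pages in the kernel, but consume_stock() and charge() below are
loose paraphrases of the helpers in mm/memcontrol.c with simplified,
invented signatures, not the real API.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define MEMCG_CHARGE_BATCH 32	/* pages kept in the per-CPU stock */

struct page_counter {
	_Atomic long usage;	/* shared counter, contended across CPUs */
};

struct memcg_stock {		/* one instance per CPU in the kernel */
	long nr_pages;		/* pre-charged pages cached locally */
};

/* Fast path: satisfy the charge from this CPU's stock without
 * touching shared memory at all. */
static bool consume_stock(struct memcg_stock *stock, long nr_pages)
{
	if (stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		return true;
	}
	return false;
}

/* Slow path: charge a whole batch to the shared counter and keep the
 * surplus in the per-CPU stock for later fast-path charges.
 * (Sketch only: assumes nr_pages <= MEMCG_CHARGE_BATCH.) */
static void charge(struct page_counter *pc, struct memcg_stock *stock,
		   long nr_pages)
{
	if (consume_stock(stock, nr_pages))
		return;
	atomic_fetch_add(&pc->usage, MEMCG_CHARGE_BATCH);
	stock->nr_pages += MEMCG_CHARGE_BATCH - nr_pages;
}

int main(void)
{
	struct page_counter pc = { 0 };
	struct memcg_stock cpu0 = { 0 };

	for (int i = 0; i < 100; i++)	/* 100 one-page charges... */
		charge(&pc, &cpu0, 1);

	/* ...but only 4 batched writes (128 pages) ever reached the
	 * shared counter. */
	printf("shared counter: %ld pages\n", atomic_load(&pc.usage));
	return 0;
}

In this model the stock absorbs roughly 31 of every 32 page charges,
which is why the per-cpu reserve is normally cheap; the open question
in the thread is why the SCTP workload would keep falling through to
the shared counter.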