Date: Tue, 28 Jun 2022 11:49:26 +0800
From: Feng Tang
To: Eric Dumazet
Cc: Shakeel Butt, Linux MM, Andrew Morton, Roman Gushchin, Michal Hocko,
    Johannes Weiner, Muchun Song, Jakub Kicinski, Xin Long,
    Marcelo Ricardo Leitner, kernel test robot, Soheil Hassas Yeganeh,
    LKML, network dev, linux-s390@vger.kernel.org, MPTCP Upstream,
    "linux-sctp @ vger . kernel . org", lkp@lists.01.org,
    kbuild test robot, Huang Ying, Xing Zhengjun, Yin Fengwei, Ying Xu
Subject: Re: [net] 4890b686f4: netperf.Throughput_Mbps -69.4% regression
Message-ID: <20220628034926.GA69004@shbuild999.sh.intel.com>
References: <20220624070656.GE79500@shbuild999.sh.intel.com>
    <20220624144358.lqt2ffjdry6p5u4d@google.com>
    <20220625023642.GA40868@shbuild999.sh.intel.com>
    <20220627023812.GA29314@shbuild999.sh.intel.com>
    <20220627123415.GA32052@shbuild999.sh.intel.com>
    <20220627144822.GA20878@shbuild999.sh.intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Jun 27, 2022 at 06:25:59PM +0200, Eric Dumazet wrote:
> On Mon, Jun 27, 2022 at 4:48 PM Feng Tang wrote:
> >
> > Yes, I also analyzed the perf-profile data, and made some layout changes
> > which could recover the
> > changes from 69% to 40%.
> >
> > 7c80b038d23e1f4c 4890b686f4088c90432149bd6de 332b589c49656a45881bca4ecc0
> > ---------------- --------------------------- ---------------------------
> >      15722            -69.5%      4792            -40.8%      9300       netperf.Throughput_Mbps
> >
>
> I simply did the following and got much better results.
>
> But I am not sure if updates to ->usage are really needed that often...
>
>
> diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
> index 679591301994d316062f92b275efa2459a8349c9..e267be4ba849760117d9fd041e22c2a44658ab36 100644
> --- a/include/linux/page_counter.h
> +++ b/include/linux/page_counter.h
> @@ -3,12 +3,15 @@
>  #define _LINUX_PAGE_COUNTER_H
>
>  #include
> +#include
>  #include
>  #include
>
>  struct page_counter {
> -	atomic_long_t usage;
> -	unsigned long min;
> +	/* contended cache line. */
> +	atomic_long_t usage ____cacheline_aligned_in_smp;
> +
> +	unsigned long min ____cacheline_aligned_in_smp;
>  	unsigned long low;
>  	unsigned long high;
>  	unsigned long max;
> @@ -27,12 +30,6 @@ struct page_counter {
>  	unsigned long watermark;
>  	unsigned long failcnt;
>
> -	/*
> -	 * 'parent' is placed here to be far from 'usage' to reduce
> -	 * cache false sharing, as 'usage' is written mostly while
> -	 * parent is frequently read for cgroup's hierarchical
> -	 * counting nature.
> -	 */
>  	struct page_counter *parent;
>  };

I just tested it, and it does perform better (the 4th column is with your
patch). Some perf-profile data is also listed.

7c80b038d23e1f4c 4890b686f4088c90432149bd6de 332b589c49656a45881bca4ecc0 e719635902654380b23ffce908d
---------------- --------------------------- --------------------------- ---------------------------
     15722            -69.5%      4792            -40.8%      9300            -27.9%     11341       netperf.Throughput_Mbps
      0.00             +0.3       0.26 ±  5%       +0.5       0.51             +1.3       1.27 ±  2% pp.self.__sk_mem_raise_allocated
      0.00             +0.3       0.32 ± 15%       +1.7       1.74 ±  2%       +0.4       0.40 ±  2% pp.self.propagate_protected_usage
      0.00             +0.8       0.82 ±  7%       +0.9       0.90             +0.8       0.84       pp.self.__mod_memcg_state
      0.00             +1.2       1.24 ±  4%       +1.0       1.01             +1.4       1.44       pp.self.try_charge_memcg
      0.00             +2.1       2.06             +2.1       2.13             +2.1       2.11       pp.self.page_counter_uncharge
      0.00             +2.1       2.14 ±  4%       +2.7       2.71             +2.6       2.60 ±  2% pp.self.page_counter_try_charge
      1.12 ±  4%       +3.1       4.24             +1.1       2.22             +1.4       2.51       pp.self.native_queued_spin_lock_slowpath
      0.28 ±  9%       +3.8       4.06 ±  4%       +0.2       0.48             +0.4       0.68       pp.self.sctp_eat_data
      0.00             +8.2       8.23             +0.8       0.83             +1.3       1.26       pp.self.__sk_mem_reduce_allocated

And the size of 'mem_cgroup' is increased from 4224 bytes to 4608 bytes.

Another note: the perf hotspots are slightly different between the tcp and
sctp test cases.

Thanks,
Feng
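
For anyone who wants to see the layout effect outside the kernel, here is a
rough user-space sketch. It is not kernel code and rests on several
assumptions: a 64-byte cache line, C11 alignas standing in for
____cacheline_aligned_in_smp, and only the fields visible in the quoted diff
(the struct names counter_packed and counter_split and the helper
cache_line_of are made up for illustration). It only shows that aligning
'usage' and 'min' separately puts them on different cache lines and makes the
struct larger, which is the same kind of growth as the 'mem_cgroup' size
increase mentioned above.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines (common on x86-64, not universal). */
#define CACHE_LINE 64

/* Original-style layout: 'usage' shares a cache line with its neighbours. */
struct counter_packed {
	atomic_long usage;
	unsigned long min;
	unsigned long low;
	unsigned long high;
	unsigned long max;
	unsigned long watermark;
	unsigned long failcnt;
	struct counter_packed *parent;
};

/* Patched-style layout: 'usage' gets a cache line to itself. */
struct counter_split {
	alignas(CACHE_LINE) atomic_long usage;	/* contended, written often */
	alignas(CACHE_LINE) unsigned long min;	/* starts the next cache line */
	unsigned long low;
	unsigned long high;
	unsigned long max;
	unsigned long watermark;
	unsigned long failcnt;
	struct counter_split *parent;
};

/* Index of the cache line that a given byte offset falls into. */
static inline size_t cache_line_of(size_t offset)
{
	return offset / CACHE_LINE;
}

/* Compile-time checks: 'min' starts a new line, and the struct is padded. */
static_assert(offsetof(struct counter_split, min) % CACHE_LINE == 0,
	      "min must start on a cache-line boundary");
static_assert(sizeof(struct counter_split) % CACHE_LINE == 0,
	      "split struct must be a whole number of cache lines");
```

Comparing sizeof(struct counter_packed) with sizeof(struct counter_split) shows
the cost of the padding; whether that trade-off pays off still has to be
measured with the actual workload, as in the numbers above.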