Date: Thu, 2 Sep 2021 21:39:24 +0800
From: Feng Tang
To: Michal Koutný
Cc: Andi Kleen, Johannes Weiner, Linus Torvalds, andi.kleen@intel.com,
	kernel test robot, Roman Gushchin, Michal Hocko, Shakeel Butt,
	Balbir Singh, Tejun Heo, Andrew Morton, LKML, lkp@lists.01.org,
	kernel test robot, "Huang, Ying", Zhengjun Xing
Subject: Re: [mm] 2d146aa3aa: vm-scalability.throughput -36.4% regression
Message-ID: <20210902133924.GA72811@shbuild999.sh.intel.com>
References: <20210818023004.GA17956@shbuild999.sh.intel.com>
 <20210831063036.GA46357@shbuild999.sh.intel.com>
 <20210831092304.GA17119@blackbody.suse.cz>
 <20210901045032.GA21937@shbuild999.sh.intel.com>
 <877dg0wcrr.fsf@linux.intel.com>
 <20210902013558.GA97410@shbuild999.sh.intel.com>
 <20210902034628.GA76472@shbuild999.sh.intel.com>
 <20210902105306.GC17119@blackbody.suse.cz>
In-Reply-To: <20210902105306.GC17119@blackbody.suse.cz>

On Thu, Sep 02, 2021 at 12:53:06PM +0200, Michal Koutný wrote:
> Hi.
>
> On Thu, Sep 02, 2021 at 11:46:28AM +0800, Feng Tang wrote:
> > > Narrowing it down to a single prefetcher seems good enough to me. The
> > > behavior of the prefetchers is fairly complicated and hard to predict,
> > > so I doubt you'll ever get a 100% step-by-step explanation.
>
> My layman explanation with the available information is that the
> prefetcher somehow behaves as if it marked the offending cacheline as
> modified (even though it is only reading), therefore slowing down the
> remote reader.

But this can't explain the test where adding 128 bytes of padding before
css->cgroup restores/improves the performance.

> On Thu, Sep 02, 2021 at 09:35:58AM +0800, Feng Tang wrote:
> > @@ -139,10 +139,21 @@ struct cgroup_subsys_state {
> >  	/* PI: the cgroup that this css is attached to */
> >  	struct cgroup *cgroup;
> >
> > +	struct cgroup_subsys_state *parent;
> > +
> >  	/* PI: the cgroup subsystem that this css is attached to */
> >  	struct cgroup_subsys *ss;
>
> Hm, an interesting move; be mindful of commit b8b1a2e5eca6 ("cgroup:
> move cgroup_subsys_state parent field for cache locality"). It might be
> a regression for systems with the cpuacct root css present. (That is
> likely a large share of systems nowadays; that may be the reason why
> you don't see full recovery? For the future, we may at least guard
> cpuacct_charge() with a cgroup_subsys_enabled() static branch.)

Good catch! Actually I also tested moving only 'destroy_work' and
'destroy_rwork' ('parent' is not touched, at the cost of 8 more bytes of
padding), which has a similar effect: it shrinks the regression to about
15%. (A rough sketch of the static-branch guard you suggest is appended
below.)

> > [snip]
> > Yes, I'm afraid so, given that the policy/algorithm used by the
> > prefetcher keeps changing from generation to generation.
>
> Exactly. I'm afraid of re-laying out the structure with each new
> generation. A robust solution is putting all frequently accessed members
> into individual cache lines + separating them with one more cache line? :-/

Yes, this is hard. Even for my debug patch, we can only say that it
partially mitigates the regression, without knowing the exact reason.
(A sketch of the per-member cache-line idea is appended after the guard
below.)

Thanks,
Feng

>
> Michal
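---
For illustration, a rough, untested sketch of the static-branch guard
discussed above. cgroup_subsys_enabled() and cpuacct_charge() are
existing kernel interfaces; the cpuacct_account() wrapper name and its
placement are hypothetical.

/*
 * Skip the cpuacct charge path entirely when the cpuacct controller
 * is not enabled, so its css cacheline traffic never happens on such
 * systems.  cgroup_subsys_enabled() expands to a static branch, which
 * is runtime-patched and costs roughly a NOP when disabled.
 */
static inline void cpuacct_account(struct task_struct *tsk, u64 cputime)
{
	if (cgroup_subsys_enabled(cpuacct_cgrp_subsys))
		cpuacct_charge(tsk, cputime);
}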
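And a minimal sketch of the "one cache line per hot member" layout; the
member subset is abridged, and the use of ____cacheline_aligned_in_smp
here is my illustration, not the posted patch:

struct cgroup_subsys_state {
	/* PI: the cgroup that this css is attached to */
	struct cgroup *cgroup ____cacheline_aligned_in_smp;

	/* PI: the parent css, walked in hot hierarchical paths */
	struct cgroup_subsys_state *parent ____cacheline_aligned_in_smp;

	/* PI: the cgroup subsystem that this css is attached to */
	struct cgroup_subsys *ss ____cacheline_aligned_in_smp;

	/* ... colder members follow ... */
};

Each annotated member starts on its own cache line, so a prefetcher
touching one of them cannot create false sharing with its neighbours,
whatever heuristic a given CPU generation uses; the price is the extra
padding this alignment inserts.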