From: "Huang, Ying"
To: Michal Hocko
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Arjan Van De Ven, Andrew Morton, Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner, Dave Hansen, Pavel Tatashin, Matthew Wilcox
Subject: Re: [RFC 0/6] mm: improve page allocator scalability via splitting zones
References: <20230511065607.37407-1-ying.huang@intel.com>
Date: Fri, 12 May 2023 10:55:21 +0800
In-Reply-To: (Michal Hocko's message of "Thu, 11 May 2023 17:05:51 +0200")
Message-ID: <87r0rm8die.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Michal,

Thanks for your comments!

Michal Hocko writes:

> On Thu 11-05-23 14:56:01, Huang Ying wrote:
>> The patchset is based on upstream v6.3.
>>
>> More and more cores are put in one physical CPU (usually one NUMA node
>> too).  In 2023, one high-end server CPU has 56, 64, or more cores.
>> Even more cores per physical CPU are planned for future CPUs.
>> However, all cores in one physical CPU will contend for page allocation
>> on one zone in most cases.  This causes heavy zone lock contention in
>> some workloads, and the situation will get worse in the future.
>>
>> For example, on a 2-socket Intel server machine with 224 logical
>> CPUs, if the kernel is built with `make -j224`, the zone lock
>> contention cycles% can reach up to about 12.7%.
>>
>> To improve the scalability of page allocation, in this series we
>> generally create one zone instance for about every 256 GB of memory
>> of a zone type.  That is, one large zone type will be split into
>> multiple zone instances.  Then, different logical CPUs will prefer
>> different zone instances based on the logical CPU number, so the
>> total number of logical CPUs contending on one zone will be reduced.
>> Thus the scalability is improved.
>
> It is not really clear to me why you need a new zone for all this rather
> than partition free lists internally within the zone? Essentially to
> increase the current two level system to 3: per cpu caches, per cpu
> arenas and global fallback.

Sorry, I didn't get your idea here.  What are per cpu arenas?  What's
the difference between them and per cpu caches (PCP)?

> I am also missing some information why pcp caches tuning is not
> sufficient.

PCP does improve the page allocation scalability greatly!  But it
doesn't help much for workloads that allocate pages on one CPU and
free them on different CPUs.

PCP tuning can improve the page allocation scalability for a workload
greatly.  But it's not trivial to find the best tuning parameters for
various workloads and workload run time statuses (workloads may have
different loads and memory requirements at different times).  And we may
run different workloads on different logical CPUs of the system.  This
also makes it hard to find the best PCP tuning globally.  It would be
better to find a solution that improves the page allocation scalability
out of the box or automatically.
Do you agree?

Best Regards,
Huang, Ying
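
For illustration only, the zone splitting described in the quoted cover
letter can be modelled in user space roughly as follows.  This is a
minimal sketch, not code from the patchset: the helper names, the
round-up rule, and the plain modulo mapping from CPU number to zone
instance are all assumptions, and the 512 GB / 112 CPU figures in the
usage part are hypothetical.  Only the "about 256 GB per zone instance"
figure and the idea of choosing by logical CPU number come from the
text above.

```python
# User-space model of the idea only; this is not kernel code.
ZONE_INSTANCE_GB = 256  # from the cover letter: about 256 GB per zone instance


def nr_zone_instances(zone_type_gb: int) -> int:
    """Hypothetical helper: instances for one zone type.

    Rounding up and keeping at least one instance are assumptions.
    """
    return max(1, -(-zone_type_gb // ZONE_INSTANCE_GB))


def preferred_zone_instance(cpu: int, nr_instances: int) -> int:
    """Hypothetical helper: pick an instance from the logical CPU number.

    A plain modulo is assumed here; the series may map CPUs differently.
    """
    return cpu % nr_instances


def cpus_per_instance(nr_cpus: int, nr_instances: int) -> list[int]:
    """Count how many logical CPUs prefer each zone instance."""
    counts = [0] * nr_instances
    for cpu in range(nr_cpus):
        counts[preferred_zone_instance(cpu, nr_instances)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical node: 512 GB of one zone type, 112 logical CPUs.
    n = nr_zone_instances(512)
    print(n, cpus_per_instance(112, n))
```

Under these assumptions, splitting a zone type into N instances divides
the number of logical CPUs that prefer any one zone lock by roughly N,
which is the contention reduction the cover letter is aiming for.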