From: Huang Ying
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Arjan Van De Ven, Andrew Morton, Huang Ying, Mel Gorman, Vlastimil Babka, David Hildenbrand, Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin, Matthew Wilcox
Subject: [RFC 0/6] mm: improve page allocator scalability via splitting zones
Date: Thu, 11 May 2023 14:56:01 +0800
Message-Id: <20230511065607.37407-1-ying.huang@intel.com>

The patchset is based on upstream v6.3.

More and more cores are being put into one physical CPU (which is usually also one NUMA node). In 2023, a high-end server CPU has 56, 64, or more cores.
Even more cores per physical CPU are planned for future generations. Yet in most cases all cores in one physical CPU contend for page allocation on a single zone, which causes heavy zone lock contention in some workloads, and the situation will only get worse as core counts grow. For example, on a 2-socket Intel server machine with 224 logical CPUs, when the kernel is built with `make -j224`, the zone lock contention can reach up to about 12.7% of CPU cycles.

To improve the scalability of page allocation, this series creates one zone instance for roughly every 256 GB of memory of a zone type. That is, one large zone type is split into multiple zone instances. Different logical CPUs then prefer different zone instances based on their logical CPU number (a small illustrative sketch of this mapping is appended after the sign-off), so fewer logical CPUs contend on any one zone and scalability improves.

With the series, the zone lock contention drops to less than 1.6% of cycles in the kbuild test case above when 4 zone instances are created for ZONE_NORMAL. We also tested the series with will-it-scale/page_fault1 using 16 processes; with the optimization, the benchmark score increases by up to 18.2% and the zone lock contention drops from 13.01% to 0.56%.

An alternative way to create multiple zone instances for a zone type is to base their number on the total number of logical CPUs. We chose memory size because it is easier to implement, and in most cases more cores come with more memory; moreover, systems with more memory usually place higher performance requirements on the page allocator.

Best Regards,
Huang, Ying
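
P.S. For readers who want the mapping spelled out, below is a minimal userspace sketch. It is illustration only, not the kernel code in this series: the names ZONE_INSTANCE_BYTES, nr_zone_instances() and preferred_zone_instance() are made up for the example, and the modulo-by-CPU-number preference is an assumption about the policy, which may differ in the actual patches. The sketch only shows the arithmetic described above: split a zone type into one instance per roughly 256 GB, then let each logical CPU prefer one instance so that fewer CPUs contend on any single zone lock.

/*
 * Illustrative userspace sketch only -- NOT the code in this series.
 * ZONE_INSTANCE_BYTES, nr_zone_instances() and preferred_zone_instance()
 * are hypothetical names used to demonstrate the arithmetic from the
 * cover letter: one zone instance per ~256 GB of a zone type, and a
 * per-CPU preferred instance chosen from the logical CPU number.
 */
#include <stdio.h>
#include <stdint.h>

#define ZONE_INSTANCE_BYTES	(256ULL << 30)	/* ~256 GB per zone instance */

/* How many instances a zone type spanning 'zone_bytes' would be split into. */
static unsigned int nr_zone_instances(uint64_t zone_bytes)
{
	unsigned int nr = (zone_bytes + ZONE_INSTANCE_BYTES - 1) / ZONE_INSTANCE_BYTES;

	return nr ? nr : 1;
}

/* Which instance a given logical CPU prefers for its allocations. */
static unsigned int preferred_zone_instance(unsigned int cpu, unsigned int nr_instances)
{
	return cpu % nr_instances;
}

int main(void)
{
	uint64_t zone_bytes = 1024ULL << 30;	/* e.g. 1 TB of ZONE_NORMAL */
	unsigned int nr = nr_zone_instances(zone_bytes);
	unsigned int cpu;

	printf("ZONE_NORMAL split into %u instances\n", nr);
	for (cpu = 0; cpu < 8; cpu++)
		printf("logical CPU %3u prefers zone instance %u\n",
		       cpu, preferred_zone_instance(cpu, nr));
	return 0;
}

With, say, 1 TB of ZONE_NORMAL this yields 4 instances (the ZONE_NORMAL configuration mentioned for the kbuild test above), and logical CPUs are spread across the instances round-robin, so roughly a quarter of the CPUs contend on each zone lock instead of all of them on one.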