From: Huang Ying <ying.huang@intel.com>
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, Arjan Van De Ven, Huang Ying,
    Mel Gorman, Andrew Morton, Vlastimil Babka, David Hildenbrand,
    Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
    Matthew Wilcox, Christoph Lameter
Subject: [PATCH 05/10] mm, page_alloc: scale the number of pages that are
 batch allocated
Date: Wed, 20 Sep 2023 14:18:51 +0800
Message-Id: <20230920061856.257597-6-ying.huang@intel.com>
In-Reply-To: <20230920061856.257597-1-ying.huang@intel.com>
References: <20230920061856.257597-1-ying.huang@intel.com>
MIME-Version: 1.0

When a task is allocating a large number of
order-0 pages, it may acquire the zone->lock multiple times, allocating
pages in batches.  This may unnecessarily contend on the zone lock when
allocating a very large number of pages.  This patch adapts the size of
the batch based on the recent pattern, scaling the batch size up for
subsequent allocations.

On a 2-socket Intel server with 224 logical CPUs, we tested kbuild on
one socket with `make -j 112`.  With the patch, the cycles% of the
spinlock contention (mostly for zone lock) decreases from 40.5% to 37.9%
(with PCP size == 361).

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Mel Gorman
Cc: Andrew Morton
Cc: Vlastimil Babka
Cc: David Hildenbrand
Cc: Johannes Weiner
Cc: Dave Hansen
Cc: Michal Hocko
Cc: Pavel Tatashin
Cc: Matthew Wilcox
Cc: Christoph Lameter
---
 include/linux/mmzone.h |  3 ++-
 mm/page_alloc.c        | 52 ++++++++++++++++++++++++++++++++++--------
 2 files changed, 44 insertions(+), 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 4132e7490b49..4f7420e35fbb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -685,9 +685,10 @@ struct per_cpu_pages {
 	int high;		/* high watermark, emptying needed */
 	int batch;		/* chunk size for buddy add/remove */
 	u8 flags;		/* protected by pcp->lock */
+	u8 alloc_factor;	/* batch scaling factor during allocate */
 	u8 free_factor;		/* batch scaling factor during free */
 #ifdef CONFIG_NUMA
-	short expire;		/* When 0, remote pagesets are drained */
+	u8 expire;		/* When 0, remote pagesets are drained */
 #endif
 
 	/* Lists of pages, one per migrate type stored on the pcp-lists */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 30554c674349..30bb05fa5353 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2376,6 +2376,12 @@ static void free_unref_page_commit(struct zone *zone, struct per_cpu_pages *pcp,
 	int pindex;
 	bool free_high = false;
 
+	/*
+	 * On freeing, reduce the number of pages that are batch allocated.
+	 * See nr_pcp_alloc() where alloc_factor is increased for subsequent
+	 * allocations.
+	 */
+	pcp->alloc_factor >>= 1;
 	__count_vm_events(PGFREE, 1 << order);
 	pindex = order_to_pindex(migratetype, order);
 	list_add(&page->pcp_list, &pcp->lists[pindex]);
@@ -2682,6 +2688,41 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
 	return page;
 }
 
+static int nr_pcp_alloc(struct per_cpu_pages *pcp, int order)
+{
+	int high, batch, max_nr_alloc;
+
+	high = READ_ONCE(pcp->high);
+	batch = READ_ONCE(pcp->batch);
+
+	/* Check for PCP disabled or boot pageset */
+	if (unlikely(high < batch))
+		return 1;
+
+	/*
+	 * Double the number of pages allocated each time there is subsequent
+	 * refilling of order-0 pages without drain.
+	 */
+	if (!order) {
+		max_nr_alloc = max(high - pcp->count - batch, batch);
+		batch <<= pcp->alloc_factor;
+		if (batch <= max_nr_alloc && pcp->alloc_factor < PCP_BATCH_SCALE_MAX)
+			pcp->alloc_factor++;
+		batch = min(batch, max_nr_alloc);
+	}
+
+	/*
+	 * Scale batch relative to order if batch implies free pages
+	 * can be stored on the PCP.  Batch can be 1 for small zones or
+	 * for boot pagesets which should never store free pages as
+	 * the pages may belong to arbitrary zones.
+	 */
+	if (batch > 1)
+		batch = max(batch >> order, 2);
+
+	return batch;
+}
+
 /* Remove page from the per-cpu list, caller must protect the list */
 static inline
 struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
@@ -2694,18 +2735,9 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
 
 	do {
 		if (list_empty(list)) {
-			int batch = READ_ONCE(pcp->batch);
+			int batch = nr_pcp_alloc(pcp, order);
 			int alloced;
 
-			/*
-			 * Scale batch relative to order if batch implies
-			 * free pages can be stored on the PCP. Batch can
-			 * be 1 for small zones or for boot pagesets which
-			 * should never store free pages as the pages may
-			 * belong to arbitrary zones.
-			 */
-			if (batch > 1)
-				batch = max(batch >> order, 2);
 			alloced = rmqueue_bulk(zone, order, batch,
 					list, migratetype, alloc_flags);
-- 
2.39.2