From: Mel Gorman
To: Andrew Morton
Cc: Michal Hocko, Vlastimil Babka, Thomas Gleixner, Matt Fleming,
    Borislav Petkov, Linux-MM, Linux Kernel Mailing List, Mel Gorman
Subject: [PATCH 2/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
Date: Fri, 18 Oct 2019 11:56:05 +0100
Message-Id: <20191018105606.3249-3-mgorman@techsingularity.net>
In-Reply-To: <20191018105606.3249-1-mgorman@techsingularity.net>
References: <20191018105606.3249-1-mgorman@techsingularity.net>
X-Mailing-List: linux-kernel@vger.kernel.org

Deferred memory initialisation updates
zone->managed_pages during the initialisation phase but before that
finishes, the per-cpu page allocator (pcpu) calculates the number of
pages allocated/freed in batches as well as the maximum number of pages
allowed on a per-cpu list. As zone->managed_pages is not up to date yet,
the pcpu initialisation calculates inappropriately low batch and high
values.

This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone. A private report indicated that kernel build times were
excessive with extremely high system CPU usage. A perf profile indicated
that a large chunk of time was lost on zone->lock contention.

This patch recalculates the pcpu batch and high values after deferred
initialisation completes on each node. It was tested on a 2-socket AMD
EPYC 2 machine using a kernel compilation workload -- allmodconfig and
all available CPUs.

mmtests configuration: config-workload-kernbench-max
Configuration was modified to build on a fresh XFS partition.

kernbench
                                5.4.0-rc3              5.4.0-rc3
                                  vanilla         resetpcpu-v1r1
Amean     user-256    13249.50 (   0.00%)    15928.40 * -20.22%*
Amean     syst-256    14760.30 (   0.00%)     4551.77 *  69.16%*
Amean     elsp-256      162.42 (   0.00%)      118.46 *  27.06%*
Stddev    user-256       42.97 (   0.00%)       50.83 ( -18.30%)
Stddev    syst-256      336.87 (   0.00%)       33.70 (  90.00%)
Stddev    elsp-256        2.46 (   0.00%)        0.81 (  67.01%)

                      5.4.0-rc3       5.4.0-rc3
                        vanilla  resetpcpu-v1r1
Duration User          39766.24        47802.92
Duration System        44298.10        13671.93
Duration Elapsed         519.11          387.65

The patch reduces system CPU usage by 69.16% and total build time by
27.06%. The variance of system CPU usage is also much reduced.

Cc: stable@vger.kernel.org # v4.15+
Signed-off-by: Mel Gorman
---
 mm/page_alloc.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cafe568d36f6..0a0dd74edc83 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1818,6 +1818,14 @@ static int __init deferred_init_memmap(void *data)
 	 */
 	while (spfn < epfn)
 		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
+
+	/*
+	 * The number of managed pages has changed due to the initialisation
+	 * so the pcpu batch and high limits need to be updated or the limits
+	 * will be artificially small.
+	 */
+	zone_pcp_update(zone);
+
 zone_empty:
 	pgdat_resize_unlock(pgdat, &flags);
 
@@ -8516,7 +8524,6 @@ void free_contig_range(unsigned long pfn, unsigned int nr_pages)
 	WARN(count != 0, "%d pages are still in use!\n", count);
 }
 
-#ifdef CONFIG_MEMORY_HOTPLUG
 /*
  * The zone indicated has a new number of managed_pages; batch sizes and percpu
  * page high values need to be recalulated.
@@ -8527,7 +8534,6 @@ void __meminit zone_pcp_update(struct zone *zone)
 	__zone_pcp_update(zone);
 	mutex_unlock(&pcp_batch_high_lock);
 }
-#endif
 
 void zone_pcp_reset(struct zone *zone)
 {
-- 
2.16.4
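
For readers who want to see why a stale zone->managed_pages leads to
small limits, the following is a rough userspace sketch of how the pcpu
batch and high values scale with the number of managed pages. It is not
part of the patch. It is modelled on my reading of zone_batchsize() and
pageset_set_batch() in mm/page_alloc.c around v5.4, so the constants
(the divide by 1024, the 1MB cap, high = 6 * batch) are assumptions that
should be checked against the tree. The names struct pcp_limits,
sketch_rounddown_pow2() and sketch_pcp_limits() are hypothetical and
exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical container for the two per-cpu list limits. */
struct pcp_limits {
	unsigned long high;	/* max pages allowed on a per-cpu list */
	unsigned long batch;	/* pages moved to/from the zone at once */
};

/* Largest power of two <= n; userspace stand-in for rounddown_pow_of_two(). */
static unsigned long sketch_rounddown_pow2(unsigned long n)
{
	unsigned long p = 1;

	assert(n > 0);
	while (p <= n / 2)
		p *= 2;
	return p;
}

/*
 * Assumed model of the calculation: batch starts at roughly 1/1024th of
 * the zone, is capped so a batch covers at most 1MB, is divided by 4 and
 * then rounded to a power of two minus one. page_size is assumed to be a
 * power of two no larger than 1MB, so comparing against max_batch is
 * equivalent to comparing batch * page_size against 1MB.
 */
static struct pcp_limits sketch_pcp_limits(unsigned long managed_pages,
					   unsigned long page_size)
{
	struct pcp_limits lim;
	unsigned long max_batch, batch;

	assert(page_size > 0 && page_size <= 1024UL * 1024UL);
	max_batch = (1024UL * 1024UL) / page_size;

	batch = managed_pages / 1024;
	if (batch > max_batch)
		batch = max_batch;
	batch /= 4;
	if (batch < 1)
		batch = 1;
	batch = sketch_rounddown_pow2(batch + batch / 2) - 1;

	/* Assumed: high is six batches, batch itself never drops below 1. */
	lim.high = 6 * batch;
	lim.batch = batch < 1 ? 1 : batch;
	return lim;
}
```

Under these assumptions and 4K pages, a zone with 4194304 managed pages
(16GB) ends up with batch 63 and high 378, while a zone that still
reports zero managed pages ends up with batch 1 and high 0. In the second
case nearly every allocation and free has to take zone->lock, which is
consistent with the contention described in the changelog.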