Date: Fri, 18 Oct 2019 15:01:27 +0200
From: Michal Hocko
To: Mel Gorman
Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner, Matt Fleming,
	Borislav Petkov, Linux-MM, Linux Kernel Mailing List
Subject: Re: [PATCH 2/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
Message-ID: <20191018130127.GP5017@dhcp22.suse.cz>
References: <20191018105606.3249-1-mgorman@techsingularity.net>
 <20191018105606.3249-3-mgorman@techsingularity.net>
In-Reply-To: <20191018105606.3249-3-mgorman@techsingularity.net>
User-Agent: Mutt/1.10.1 (2018-07-13)

On Fri 18-10-19 11:56:05, Mel Gorman wrote:
> Deferred memory initialisation updates zone->managed_pages during
> the initialisation phase but, before that finishes, the per-cpu page
> allocator (pcpu) calculates the number of pages allocated/freed in
> batches as well as the maximum number of pages allowed on a per-cpu list.
> As zone->managed_pages is not up to date yet, the pcpu initialisation
> calculates inappropriately low batch and high values.
>
> This increases zone lock contention quite severely in some cases, with the
> degree of severity depending on how many CPUs share a local zone and the
> size of the zone. A private report indicated that kernel build times were
> excessive with extremely high system CPU usage. A perf profile indicated
> that a large chunk of time was lost on zone->lock contention.
>
> This patch recalculates the pcpu batch and high values after deferred
> initialisation completes on each node. It was tested on a 2-socket AMD
> EPYC 2 machine using a kernel compilation workload -- allmodconfig and
> all available CPUs.
>
> mmtests configuration: config-workload-kernbench-max
> Configuration was modified to build on a fresh XFS partition.
>
> kernbench
>                                 5.4.0-rc3              5.4.0-rc3
>                                   vanilla         resetpcpu-v1r1
> Amean     user-256    13249.50 (   0.00%)    15928.40 * -20.22%*
> Amean     syst-256    14760.30 (   0.00%)     4551.77 *  69.16%*
> Amean     elsp-256      162.42 (   0.00%)      118.46 *  27.06%*
> Stddev    user-256       42.97 (   0.00%)       50.83 ( -18.30%)
> Stddev    syst-256      336.87 (   0.00%)       33.70 (  90.00%)
> Stddev    elsp-256        2.46 (   0.00%)        0.81 (  67.01%)
>
>                      5.4.0-rc3       5.4.0-rc3
>                        vanilla  resetpcpu-v1r1
> Duration User         39766.24        47802.92
> Duration System       44298.10        13671.93
> Duration Elapsed        519.11          387.65
>
> The patch reduces system CPU usage by 69.16% and total build time by
> 27.06%. The variance of system CPU usage is also much reduced.

The fix makes sense. It would be nice to see the difference in the batch
sizes between the initial setup and the one after the deferred
initialization is done.

> Cc: stable@vger.kernel.org # v4.15+

Hmm, are you sure about 4.15? Doesn't this go all the way down to
deferred initialization? I do not see any recent changes on when
setup_per_cpu_pageset is called.

> Signed-off-by: Mel Gorman

Acked-by: Michal Hocko

> ---
>  mm/page_alloc.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index cafe568d36f6..0a0dd74edc83 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1818,6 +1818,14 @@ static int __init deferred_init_memmap(void *data)
>  	 */
>  	while (spfn < epfn)
>  		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
> +
> +	/*
> +	 * The number of managed pages has changed due to the initialisation
> +	 * so the pcpu batch and high limits needs to be updated or the limits
> +	 * will be artificially small.
> +	 */
> +	zone_pcp_update(zone);
> +
>  zone_empty:
>  	pgdat_resize_unlock(pgdat, &flags);
>
> @@ -8516,7 +8524,6 @@ void free_contig_range(unsigned long pfn, unsigned int nr_pages)
>  	WARN(count != 0, "%d pages are still in use!\n", count);
>  }
>
> -#ifdef CONFIG_MEMORY_HOTPLUG
>  /*
>   * The zone indicated has a new number of managed_pages; batch sizes and percpu
>   * page high values need to be recalulated.
> @@ -8527,7 +8534,6 @@ void __meminit zone_pcp_update(struct zone *zone)
>  	__zone_pcp_update(zone);
>  	mutex_unlock(&pcp_batch_high_lock);
>  }
> -#endif
>
>  void zone_pcp_reset(struct zone *zone)
>  {
> --
> 2.16.4
>

-- 
Michal Hocko
SUSE Labs
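
For readers following the thread, below is a rough, userspace-only sketch of
how the pcpu limits under discussion scale with zone->managed_pages. It
approximates the logic of zone_batchsize() and pageset_set_batch() in
mm/page_alloc.c around this kernel version (the /1024 scaling, the 1MB cap,
and the high = 6 * batch relationship); the batch_for() helper, the constants,
and the example zone sizes are illustrative assumptions, not kernel code.

/*
 * Standalone userspace sketch (not kernel code): approximates how
 * zone_batchsize() and pageset_set_batch() derive the pcpu batch and
 * high marks from the number of managed pages in a zone.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Simplified stand-in for the kernel's rounddown_pow_of_two(). */
static unsigned long rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p * 2 <= n)
		p *= 2;
	return p;
}

/* Roughly zone_batchsize(): ~1/1024th of the zone, capped at 1MB worth. */
static unsigned long batch_for(unsigned long managed_pages)
{
	unsigned long batch = managed_pages / 1024;

	if (batch * PAGE_SIZE > 1024 * 1024)
		batch = (1024 * 1024) / PAGE_SIZE;
	batch /= 4;			/* scaled back up via the high mark */
	if (batch < 1)
		batch = 1;

	/* Clamp to a 2^n - 1 value, as the kernel does. */
	return rounddown_pow_of_two(batch + batch / 2) - 1;
}

int main(void)
{
	/* managed_pages early in deferred init vs. after it completes. */
	unsigned long early = (64UL << 20) / PAGE_SIZE;		/* ~64MB populated */
	unsigned long late  = (128UL << 30) / PAGE_SIZE;	/* ~128GB when done */
	unsigned long b_early = batch_for(early);
	unsigned long b_late = batch_for(late);

	/* high is roughly 6 * batch, as in pageset_set_batch(). */
	printf("early: batch=%lu high=%lu\n", b_early, 6 * b_early);
	printf("late:  batch=%lu high=%lu\n", b_late, 6 * b_late);
	return 0;
}

With the sketch's example numbers, a zone with only ~64MB initialised ends up
with batch=3/high=18, while the same zone at its full ~128GB gets
batch=63/high=378 -- the kind of gap that zone_pcp_update() closes once
deferred initialisation finishes.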