Date: Tue, 15 Aug 2017 12:36:36 +0200
From: Jesper Dangaard Brouer
To: Kemi Wang
Cc: Andrew Morton, Michal Hocko, Mel Gorman, Johannes Weiner, Dave, Andi Kleen, Ying Huang, Aaron Lu, Tim Chen, Linux MM, Linux Kernel, brouer@redhat.com
Subject: Re: [PATCH 0/2] Separate NUMA statistics from zone statistics
Message-ID: <20170815123636.3788230c@redhat.com>
In-Reply-To: <1502786736-21585-1-git-send-email-kemi.wang@intel.com>

On Tue, 15 Aug 2017 16:45:34 +0800
Kemi Wang wrote:

> Each page allocation updates a set of per-zone statistics with a call to
> zone_statistics(). As discussed in 2017 MM submit, these are a substantial
                                             ^^^^^^ should be "summit"
> source of overhead in the page allocator and are very rarely consumed. This
> significant overhead in cache bouncing caused by zone counters (NUMA
> associated counters) update in parallel in multi-threaded page allocation
> (pointed out by Dave Hansen).

Hi Kemi

Thanks a lot for following up on this work. A link to the MM summit slides:
 http://people.netfilter.org/hawk/presentations/MM-summit2017/MM-summit2017-JesperBrouer.pdf

> To mitigate this overhead, this patchset separates NUMA statistics from
> zone statistics framework, and update NUMA counter threshold to a fixed
> size of 32765, as a small threshold greatly increases the update frequency
> of the global counter from local per cpu counter (suggested by Ying Huang).
> The rationality is that these statistics counters don't need to be read
> often, unlike other VM counters, so it's not a problem to use a large
> threshold and make readers more expensive.
>
> With this patchset, we see 26.6% drop of CPU cycles(537-->394, see below)
> for per single page allocation and reclaim on Jesper's page_bench03
> benchmark. Meanwhile, this patchset keeps the same style of virtual memory
> statistics with little end-user-visible effects (see the first patch for
> details), except that the number of NUMA items in each cpu
> (vm_numa_stat_diff[]) is added to zone->vm_numa_stat[] when a user *reads*
> the value of NUMA counter to eliminate deviation.

I'm very happy to see that you found my kernel module for benchmarking
useful :-)

> I did an experiment of single page allocation and reclaim concurrently
> using Jesper's page_bench03 benchmark on a 2-Socket Broadwell-based server
> (88 processors with 126G memory) with different size of threshold of pcp
> counter.
>
> Benchmark provided by Jesper D Broucer(increase loop times to 10000000):
                                 ^^^^^^^
You mis-spelled my last name, it is "Brouer".

> https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm/bench
>
>  Threshold   CPU cycles    Throughput(88 threads)
>      32          799         241760478
>      64          640         301628829
>     125          537         358906028 <==> system by default
>     256          468         412397590
>     512          428         450550704
>    4096          399         482520943
>   20000          394         489009617
>   30000          395         488017817
>   32765          394(-26.6%) 488932078(+36.2%) <==> with this patchset
>     N/A          342(-36.3%) 562900157(+56.8%) <==> disable zone_statistics
>
> Kemi Wang (2):
>   mm: Change the call sites of numa statistics items
>   mm: Update NUMA counter threshold size
>
>  drivers/base/node.c    |  22 ++++---
>  include/linux/mmzone.h |  25 +++++---
>  include/linux/vmstat.h |  33 ++++++++++
>  mm/page_alloc.c        |  10 +--
>  mm/vmstat.c            | 162 +++++++++++++++++++++++++++++++++++++++++++++++--
>  5 files changed, 227 insertions(+), 25 deletions(-)
>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
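
For readers unfamiliar with the scheme the quoted cover letter describes
(per-cpu diffs folded into a global counter once they pass a threshold, with
the reader folding whatever is still pending), here is a minimal userspace
sketch. It is not the code from the patchset: counter_inc(), counter_read(),
counter_global_updates(), local_diff[] and NR_CPUS are hypothetical names
chosen for illustration, and the sketch is single-threaded, so it shows only
the bookkeeping and not the cache-line behaviour.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS   4       /* hypothetical CPU count, for illustration only */
#define THRESHOLD 32765   /* fixed threshold quoted in the cover letter */

static long global_count;              /* shared counter, costly to touch */
static uint16_t local_diff[NR_CPUS];   /* per-CPU pending increments */
static unsigned long global_updates;   /* folds triggered by counter_inc() */

/* Count one event on `cpu`; touch the shared counter only past THRESHOLD. */
void counter_inc(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (++local_diff[cpu] > THRESHOLD) {
		global_count += local_diff[cpu];
		local_diff[cpu] = 0;
		global_updates++;
	}
}

/* The reader pays instead: fold every pending per-CPU diff, then report. */
long counter_read(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		global_count += local_diff[cpu];
		local_diff[cpu] = 0;
	}
	return global_count;
}

/* How many times counter_inc() had to update the shared counter. */
unsigned long counter_global_updates(void)
{
	return global_updates;
}
```

In this sketch, 100000 calls to counter_inc(0) update the shared counter only
three times, and counter_read() still returns 100000 because it folds the
remaining per-cpu diff before reporting.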