Received: by 2002:a25:ef43:0:0:0:0:0 with SMTP id w3csp132113ybm; Thu, 28 May 2020 18:11:53 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyB1jHOQ0mkD78HiKIVE0XBbxR0nEJomP2MvNAqLY6nx7alT/EkjV380lHyCsE2mPWVyx0H X-Received: by 2002:a17:906:3d69:: with SMTP id r9mr5378487ejf.20.1590714713675; Thu, 28 May 2020 18:11:53 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1590714713; cv=none; d=google.com; s=arc-20160816; b=RVu3yi4ji6fs2UArbZR6G9hx6gRowm5Ys8ODezqwqmi9WEJ7PFop85724/2jtLowtD NkWD15DvMwkC3vOGBYl2rJZUGemOvx9z0OT4cGLjdu14P398tNWJ7XLfizZbgmQkpol4 qDGzl32J0IBjb0sfFJF+vBHPS+ckHkEyoqn4x4dEOS9PyGxfHlwLYyB+YUATScRd9P2i oNMIMReYhPs7whrWakoj9A5Fm8hutPVcF31lS7g+crf/lG4HzrsYom1NV8L6X6lH7ggU SUiOH83Mfu1BabirUS65TxHGVnJftX446/eZlh3zlLGBJwud5jVM3NaaCsm9kdT94hex KPGw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:references:in-reply-to:message-id:date :subject:cc:to:from:ironport-sdr:ironport-sdr; bh=HUMdW9Wfg4xMmDRPqdB4zf/VsqAH2SDnwYiD5zYXu5M=; b=O5lz2f7b2Rbpl2jkpeiF8F/q7JfWX5MGJBZK+jADlNo1RLhocbMdBNqP5d+yiG35nC jtn1EABdMRA8XlMN8e/b7mvLZwgrXgfIAFiUnzOqFXdKXMtfni6wMcR8qfta9vFnzdXm wnT8oQahYs/L2f6gGjZJpGYseW32P0rVymnycqNtoCiaLLDkZInbLcmRPujhdgmvhP2u nDTmcyoHB9eUt4xatDBnI8aXoyLIRWGnkNH0HvUf9T/Dsbh44ZG+0SGQDFdITraxZqJW 7WEs5wUjDns6n+aujFyejRGZAbZIcVaApuEI+9LGvplAR1kT15IZ02TCKqohGH71rhCf EK3w== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Return-Path: Received: from vger.kernel.org (vger.kernel.org. [23.128.96.18]) by mx.google.com with ESMTP id bx1si5011110edb.573.2020.05.28.18.11.30; Thu, 28 May 2020 18:11:53 -0700 (PDT) Received-SPF: pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) client-ip=23.128.96.18; Authentication-Results: mx.google.com; spf=pass (google.com: domain of linux-kernel-owner@vger.kernel.org designates 23.128.96.18 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=intel.com Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S2438328AbgE2BGe (ORCPT + 99 others); Thu, 28 May 2020 21:06:34 -0400 Received: from mga17.intel.com ([192.55.52.151]:2379 "EHLO mga17.intel.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S2438312AbgE2BGa (ORCPT ); Thu, 28 May 2020 21:06:30 -0400 IronPort-SDR: 9BDnshHHdQuAVfUn+t3dTiW3/lHT+J+se9lKk09DgrDvulO6sEUMPv/J5qqrft0BliA2T02jDp 5Xd7vCwBNdGg== X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga005.jf.intel.com ([10.7.209.41]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 May 2020 18:06:29 -0700 IronPort-SDR: RxXsvuJCMcDX8IEzpOUSMFasbltQTH5HRxJ9mhwLY3p5ixHUaP3fzc7TY7mYV8oqtworpBpMFZ kW1YSkdx7yqw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.73,446,1583222400"; d="scan'208";a="443187429" Received: from shbuild999.sh.intel.com ([10.239.146.107]) by orsmga005.jf.intel.com with ESMTP; 28 May 2020 18:06:26 -0700 From: Feng Tang To: Andrew Morton , Michal Hocko , Johannes Weiner , Matthew Wilcox , Mel Gorman , Kees Cook , Qian Cai , andi.kleen@intel.com, tim.c.chen@intel.com, dave.hansen@intel.com, ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Feng Tang Subject: [PATCH v4 4/4] mm: adjust vm_committed_as_batch according to vm overcommit policy Date: Fri, 29 May 2020 09:06:10 +0800 Message-Id: <1590714370-67182-5-git-send-email-feng.tang@intel.com> X-Mailer: git-send-email 2.7.4 In-Reply-To: <1590714370-67182-1-git-send-email-feng.tang@intel.com> References: <1590714370-67182-1-git-send-email-feng.tang@intel.com> Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org When checking a performance change for will-it-scale scalability mmap test [1], we found very high lock contention for spinlock of percpu counter 'vm_committed_as': 94.14% 0.35% [kernel.kallsyms] [k] _raw_spin_lock_irqsave 48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap; 45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap; Actually this heavy lock contention is not always necessary. The 'vm_committed_as' needs to be very precise when the strict OVERCOMMIT_NEVER policy is set, which requires a rather small batch number for the percpu counter. So keep 'batch' number unchanged for strict OVERCOMMIT_NEVER policy, and lift it to 64X for OVERCOMMIT_ALWAYS and OVERCOMMIT_GUESS policies. Also add a sysctl handler to adjust it when the policy is reconfigured. Benchmark with the same testcase in [1] shows 53% improvement on a 8C/16T desktop, and 2097%(20X) on a 4S/72C/144T server. We tested with test platforms in 0day (server, desktop and laptop), and 80%+ platforms shows improvements with that test. And whether it shows improvements depends on if the test mmap size is bigger than the batch number computed. And if the lift is 16X, 1/3 of the platforms will show improvements, though it should help the mmap/unmap usage generally, as Michal Hocko mentioned: : I believe that there are non-synthetic worklaods which would benefit from : a larger batch. E.g. large in memory databases which do large mmaps : during startups from multiple threads. [1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/ Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com Signed-off-by: Feng Tang Cc: Michal Hocko Cc: Matthew Wilcox (Oracle) Cc: Johannes Weiner Cc: Mel Gorman Cc: Kees Cook Cc: Andi Kleen Cc: Tim Chen Cc: Dave Hansen Cc: Huang Ying Signed-off-by: Andrew Morton --- include/linux/mm.h | 2 ++ include/linux/mman.h | 4 ++++ kernel/sysctl.c | 2 +- mm/mm_init.c | 18 ++++++++++++++---- mm/util.c | 12 ++++++++++++ 5 files changed, 33 insertions(+), 5 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 573947c..c2efea6 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -206,6 +206,8 @@ int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *, loff_t *); int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *, loff_t *); +int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *, + loff_t *); #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n)) diff --git a/include/linux/mman.h b/include/linux/mman.h index 4b08e9c..91c93c1 100644 --- a/include/linux/mman.h +++ b/include/linux/mman.h @@ -57,8 +57,12 @@ extern struct percpu_counter vm_committed_as; #ifdef CONFIG_SMP extern s32 vm_committed_as_batch; +extern void mm_compute_batch(void); #else #define vm_committed_as_batch 0 +static inline void mm_compute_batch(void) +{ +} #endif unsigned long vm_memory_committed(void); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index db1ce7a..9456c86 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -2650,7 +2650,7 @@ static struct ctl_table vm_table[] = { .data = &sysctl_overcommit_memory, .maxlen = sizeof(sysctl_overcommit_memory), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = overcommit_policy_handler, .extra1 = SYSCTL_ZERO, .extra2 = &two, }, diff --git a/mm/mm_init.c b/mm/mm_init.c index 435e5f7..c5a6fb1 100644 --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "internal.h" #ifdef CONFIG_DEBUG_MEMORY_INIT @@ -144,14 +145,23 @@ EXPORT_SYMBOL_GPL(mm_kobj); #ifdef CONFIG_SMP s32 vm_committed_as_batch = 32; -static void __meminit mm_compute_batch(void) +void mm_compute_batch(void) { u64 memsized_batch; s32 nr = num_present_cpus(); s32 batch = max_t(s32, nr*2, 32); - - /* batch size set to 0.4% of (total memory/#cpus), or max int32 */ - memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff); + unsigned long ram_pages = totalram_pages(); + + /* + * For policy of OVERCOMMIT_NEVER, set batch size to 0.4% + * of (total memory/#cpus), and lift it to 25% for other + * policies to easy the possible lock contention for percpu_counter + * vm_committed_as, while the max limit is INT_MAX + */ + if (sysctl_overcommit_memory == OVERCOMMIT_NEVER) + memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX); + else + memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX); vm_committed_as_batch = max_t(s32, memsized_batch, batch); } diff --git a/mm/util.c b/mm/util.c index fe63271..580d268 100644 --- a/mm/util.c +++ b/mm/util.c @@ -746,6 +746,18 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer, return ret; } +int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos) +{ + int ret; + + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); + if (ret == 0 && write) + mm_compute_batch(); + + return ret; +} + int overcommit_kbytes_handler(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) { -- 2.7.4