From: Feng Tang
To: Andrew Morton, Michal Hocko, Johannes Weiner, Matthew Wilcox,
 Mel Gorman, Kees Cook, Luis Chamberlain, Iurii Zaikin,
 andi.kleen@intel.com, tim.c.chen@intel.com, dave.hansen@intel.com,
 ying.huang@intel.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Feng Tang
Subject: [PATCH v5 3/3] mm: adjust
 vm_committed_as_batch according to vm overcommit policy
Date: Sun, 21 Jun 2020 15:36:40 +0800
Message-Id: <1592725000-73486-4-git-send-email-feng.tang@intel.com>
In-Reply-To: <1592725000-73486-1-git-send-email-feng.tang@intel.com>
References: <1592725000-73486-1-git-send-email-feng.tang@intel.com>
X-Mailing-List: linux-kernel@vger.kernel.org

When checking a performance change for the will-it-scale scalability
mmap test [1], we found very high lock contention on the spinlock of
the percpu counter 'vm_committed_as':

    94.14%     0.35%  [kernel.kallsyms]         [k] _raw_spin_lock_irqsave
    48.21% _raw_spin_lock_irqsave;percpu_counter_add_batch;__vm_enough_memory;mmap_region;do_mmap;
    45.91% _raw_spin_lock_irqsave;percpu_counter_add_batch;__do_munmap;

Actually, this heavy lock contention is not always necessary. The
'vm_committed_as' counter needs to be very precise only when the strict
OVERCOMMIT_NEVER policy is set, which requires a rather small batch
number for the percpu counter.

So keep the 'batch' number unchanged for the strict OVERCOMMIT_NEVER
policy, and lift it to 64X for the OVERCOMMIT_ALWAYS and
OVERCOMMIT_GUESS policies. Also add a sysctl handler to adjust it when
the policy is reconfigured.

A benchmark with the same testcase as in [1] shows a 53% improvement on
an 8C/16T desktop, and 2097% (20X) on a 4S/72C/144T server. We tested on
the test platforms in 0day (server, desktop and laptop), and 80%+ of
the platforms show improvements with that test. Whether a platform
shows an improvement depends on whether the test mmap size is bigger
than the computed batch number.

If the lift is 16X, 1/3 of the platforms show improvements, though it
should help mmap/unmap usage generally, as Michal Hocko mentioned:

: I believe that there are non-synthetic workloads which would benefit from
: a larger batch. E.g. large in memory databases which do large mmaps
: during startups from multiple threads.

[1] https://lore.kernel.org/lkml/20200305062138.GI5972@shao2-debian/

Link: http://lkml.kernel.org/r/1589611660-89854-4-git-send-email-feng.tang@intel.com
Signed-off-by: Feng Tang
Acked-by: Michal Hocko
Cc: Matthew Wilcox (Oracle)
Cc: Johannes Weiner
Cc: Mel Gorman
Cc: Kees Cook
Cc: Andi Kleen
Cc: Tim Chen
Cc: Dave Hansen
Cc: Huang Ying
---
 include/linux/mm.h   |  2 ++
 include/linux/mman.h |  4 ++++
 kernel/sysctl.c      |  2 +-
 mm/mm_init.c         | 18 ++++++++++++++----
 mm/util.c            | 12 ++++++++++++
 5 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e6ff54a..d00facb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -206,6 +206,8 @@ int overcommit_ratio_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
 int overcommit_kbytes_handler(struct ctl_table *, int, void *, size_t *,
 		loff_t *);
+int overcommit_policy_handler(struct ctl_table *, int, void *, size_t *,
+		loff_t *);
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
diff --git a/include/linux/mman.h b/include/linux/mman.h
index 4b08e9c..91c93c1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -57,8 +57,12 @@ extern struct percpu_counter vm_committed_as;
 
 #ifdef CONFIG_SMP
 extern s32 vm_committed_as_batch;
+extern void mm_compute_batch(void);
 #else
 #define vm_committed_as_batch 0
+static inline void mm_compute_batch(void)
+{
+}
 #endif
 
 unsigned long vm_memory_committed(void);
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 40180cd..10dcc06 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2650,7 +2650,7 @@ static struct ctl_table vm_table[] = {
 		.data		= &sysctl_overcommit_memory,
 		.maxlen		= sizeof(sysctl_overcommit_memory),
 		.mode		= 0644,
-		.proc_handler	= proc_dointvec_minmax,
+		.proc_handler	= overcommit_policy_handler,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= &two,
 	},
diff --git a/mm/mm_init.c b/mm/mm_init.c
index 435e5f7..c5a6fb1 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include "internal.h"
 
 #ifdef CONFIG_DEBUG_MEMORY_INIT
@@ -144,14 +145,23 @@ EXPORT_SYMBOL_GPL(mm_kobj);
 #ifdef CONFIG_SMP
 s32 vm_committed_as_batch = 32;
 
-static void __meminit mm_compute_batch(void)
+void mm_compute_batch(void)
 {
 	u64 memsized_batch;
 	s32 nr = num_present_cpus();
 	s32 batch = max_t(s32, nr*2, 32);
-
-	/* batch size set to 0.4% of (total memory/#cpus), or max int32 */
-	memsized_batch = min_t(u64, (totalram_pages()/nr)/256, 0x7fffffff);
+	unsigned long ram_pages = totalram_pages();
+
+	/*
+	 * For policy of OVERCOMMIT_NEVER, set batch size to 0.4%
+	 * of (total memory/#cpus), and lift it to 25% for other
+	 * policies to ease the possible lock contention for percpu_counter
+	 * vm_committed_as, while the max limit is INT_MAX
+	 */
+	if (sysctl_overcommit_memory == OVERCOMMIT_NEVER)
+		memsized_batch = min_t(u64, ram_pages/nr/256, INT_MAX);
+	else
+		memsized_batch = min_t(u64, ram_pages/nr/4, INT_MAX);
 
 	vm_committed_as_batch = max_t(s32, memsized_batch, batch);
 }
diff --git a/mm/util.c b/mm/util.c
index 1c9d097..52ed9c1 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -746,6 +746,18 @@ int overcommit_ratio_handler(struct ctl_table *table, int write, void *buffer,
 	return ret;
 }
 
+int overcommit_policy_handler(struct ctl_table *table, int write, void *buffer,
+		size_t *lenp, loff_t *ppos)
+{
+	int ret;
+
+	ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);
+	if (ret == 0 && write)
+		mm_compute_batch();
+
+	return ret;
+}
+
 int overcommit_kbytes_handler(struct ctl_table *table, int write, void *buffer,
 		size_t *lenp, loff_t *ppos)
 {
-- 
2.7.4
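
For illustration only, the batch arithmetic of the patched
mm_compute_batch() can be modeled in userspace as below. This is a
sketch under stated assumptions, not kernel code: model_compute_batch()
and its parameters are hypothetical names introduced here, the inputs
that the kernel reads from totalram_pages(), num_present_cpus() and
sysctl_overcommit_memory are passed in as plain arguments, and the enum
values are assumed to match the kernel's OVERCOMMIT_* constants.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Assumed to mirror the kernel's OVERCOMMIT_* policy values. */
enum overcommit_policy {
	OVERCOMMIT_GUESS  = 0,
	OVERCOMMIT_ALWAYS = 1,
	OVERCOMMIT_NEVER  = 2,
};

/*
 * Hypothetical userspace model of the patched mm_compute_batch().
 * ram_pages and nr_cpus stand in for totalram_pages() and
 * num_present_cpus(); the result is the batch size in pages.
 */
static int32_t model_compute_batch(uint64_t ram_pages, int32_t nr_cpus,
				   enum overcommit_policy policy)
{
	int32_t floor_batch;
	uint64_t divisor, memsized_batch;

	assert(nr_cpus > 0);

	/* Lower bound, as in the kernel code: max(2 * nr_cpus, 32). */
	floor_batch = nr_cpus * 2 > 32 ? nr_cpus * 2 : 32;

	/* 1/256 (about 0.4%) for OVERCOMMIT_NEVER, 1/4 (25%) otherwise. */
	divisor = (policy == OVERCOMMIT_NEVER) ? 256 : 4;
	memsized_batch = ram_pages / (uint64_t)nr_cpus / divisor;

	/* Clamp to INT_MAX so the value fits the s32 batch variable. */
	if (memsized_batch > INT_MAX)
		memsized_batch = INT_MAX;

	return (int32_t)memsized_batch > floor_batch ?
		(int32_t)memsized_batch : floor_batch;
}
```

As a worked case, 4194304 pages (16 GiB with 4 KiB pages) on 16 CPUs
gives a batch of 1024 pages under OVERCOMMIT_NEVER and 65536 pages under
the other two policies, which is the 64X lift described in the commit
message.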