Received: by 2002:ac0:a582:0:0:0:0:0 with SMTP id m2-v6csp1129813imm; Tue, 2 Oct 2018 03:18:06 -0700 (PDT) X-Google-Smtp-Source: ACcGV63DTN1wmQNjyeOGbCYX+c6mNFPFP+scNNe5qwW/JiRALaJfc3aNBu6Mb5Mygo2tIydAmBkw X-Received: by 2002:a63:b10f:: with SMTP id r15-v6mr13916110pgf.442.1538475486172; Tue, 02 Oct 2018 03:18:06 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1538475486; cv=none; d=google.com; s=arc-20160816; b=u2TtekFp6nxDRwNmg0Qrjkrueyvgj9m7tcjn/pUBebozLZD2lxnO3JKwLjgkpJYCL2 tlgPdJGO7dvcisxClm4+82nkWn29TCxO3K8olBWeh/vHgHzQnYLqQSe6EFjy1AE2vmna 1WRgPPsu3BcKWE/2K/tm9r5jyowRLUaLBL6Z67D/h1Y0nr4jDigoOm6VQXEdHa/ZvCvh GoBVADGBPYSqQssLhCF2+myb3WNxiIrsEeRVViiK6E7DmSZCdoWHN+W81an+ff+qzZht BbiiSAYmZ7e9XKJCEkT0wmt5BYdy+/TiEsA1SuruTAB23b3RlLwp3EAPdZmlxBXGF3Xf 0w/w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-disposition :content-transfer-encoding:mime-version:robot-unsubscribe:robot-id :git-commit-id:subject:to:references:in-reply-to:reply-to:cc :message-id:from:date; bh=Qc1LlRMF7wx+EWlaFGjmiE3NEsHXcJGiU0E9RfpZ+40=; b=Enms3lzi+X7Kf8Fdy/MRj+gZlwP6S6E4cl+AWtmpT2WjKlgPZgQrCmIM9opoKFSoWg 5YEUdGC9pHMoBQt78f/5uezMXy2pOR+rjq9/DxOSZmFDiH9B2cBU6jTA95qH2bQRddM+ hCoEwGVHIFsjZfk5zjeSCQBAdIxt49t6ZDUJTqOKUBBOwCVDxVSZGKE78esnkV3RmYz7 KC1SBOfp03QzEixmY65YtNMyYPklihCqtXTO+rYxXRyHwKn6fA2l9awgpPYvnsF+t1GP mx22R13ecycUlZojXCkEQQ9psBUluEiLWJdeeIupzMiRHG4WFfQegmqO+RKBjGYfGioq KTIg== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id 64-v6si15773277pft.177.2018.10.02.03.17.51; Tue, 02 Oct 2018 03:18:06 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727433AbeJBRAA (ORCPT + 99 others); Tue, 2 Oct 2018 13:00:00 -0400 Received: from terminus.zytor.com ([198.137.202.136]:44323 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726178AbeJBQ77 (ORCPT ); Tue, 2 Oct 2018 12:59:59 -0400 Received: from terminus.zytor.com (localhost [127.0.0.1]) by terminus.zytor.com (8.15.2/8.15.2) with ESMTPS id w92AH2vj1917437 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=NO); Tue, 2 Oct 2018 03:17:02 -0700 Received: (from tipbot@localhost) by terminus.zytor.com (8.15.2/8.15.2/Submit) id w92AH2mJ1917434; Tue, 2 Oct 2018 03:17:02 -0700 Date: Tue, 2 Oct 2018 03:17:02 -0700 X-Authentication-Warning: terminus.zytor.com: tipbot set sender to tipbot@zytor.com using -f From: tip-bot for Mel Gorman Message-ID: Cc: torvalds@linux-foundation.org, linux-kernel@vger.kernel.org, jhladky@redhat.com, mingo@kernel.org, a.p.zijlstra@chello.nl, mgorman@techsingularity.net, riel@surriel.com, tglx@linutronix.de, hpa@zytor.com, srikar@linux.vnet.ibm.com, linux-mm@kvack.org Reply-To: linux-mm@kvack.org, srikar@linux.vnet.ibm.com, tglx@linutronix.de, hpa@zytor.com, riel@surriel.com, mgorman@techsingularity.net, a.p.zijlstra@chello.nl, mingo@kernel.org, jhladky@redhat.com, linux-kernel@vger.kernel.org, torvalds@linux-foundation.org In-Reply-To: <20181001100525.29789-2-mgorman@techsingularity.net> References: <20181001100525.29789-2-mgorman@techsingularity.net> To: linux-tip-commits@vger.kernel.org Subject: [tip:sched/urgent] mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration Git-Commit-ID: efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline X-Spam-Status: No, score=-2.9 required=5.0 tests=ALL_TRUSTED,BAYES_00, DATE_IN_FUTURE_96_Q autolearn=ham autolearn_force=no version=3.4.2 X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on terminus.zytor.com Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf Gitweb: https://git.kernel.org/tip/efaffc5e40aeced0bcb497ed7a0a5b8c14abfcdf Author: Mel Gorman AuthorDate: Mon, 1 Oct 2018 11:05:24 +0100 Committer: Ingo Molnar CommitDate: Tue, 2 Oct 2018 11:31:14 +0200 mm, sched/numa: Remove rate-limiting of automatic NUMA balancing migration Rate limiting of page migrations due to automatic NUMA balancing was introduced to mitigate the worst-case scenario of migrating at high frequency due to false sharing or slowly ping-ponging between nodes. Since then, a lot of effort was spent on correctly identifying these pages and avoiding unnecessary migrations and the safety net may no longer be required. Jirka Hladky reported a regression in 4.17 due to a scheduler patch that avoids spreading STREAM tasks wide prematurely. However, once the task was properly placed, it delayed migrating the memory due to rate limiting. Increasing the limit fixed the problem for him. Currently, the limit is hard-coded and does not account for the real capabilities of the hardware. Even if an estimate was attempted, it would not properly account for the number of memory controllers and it could not account for the amount of bandwidth used for normal accesses. Rather than fudging, this patch simply eliminates the rate limiting. However, Jirka reports that a STREAM configuration using multiple processes achieved similar performance to 4.16. In local tests, this patch improved performance of STREAM relative to the baseline but it is somewhat machine-dependent. Most workloads show little or not performance difference implying that there is not a heavily reliance on the throttling mechanism and it is safe to remove. STREAM on 2-socket machine 4.19.0-rc5 4.19.0-rc5 numab-v1r1 noratelimit-v1r1 MB/sec copy 43298.52 ( 0.00%) 44673.38 ( 3.18%) MB/sec scale 30115.06 ( 0.00%) 31293.06 ( 3.91%) MB/sec add 32825.12 ( 0.00%) 34883.62 ( 6.27%) MB/sec triad 32549.52 ( 0.00%) 34906.60 ( 7.24% Signed-off-by: Mel Gorman Reviewed-by: Rik van Riel Acked-by: Peter Zijlstra Cc: Jirka Hladky Cc: Linus Torvalds Cc: Linux-MM Cc: Srikar Dronamraju Cc: Thomas Gleixner Link: http://lkml.kernel.org/r/20181001100525.29789-2-mgorman@techsingularity.net Signed-off-by: Ingo Molnar --- include/linux/mmzone.h | 6 ---- include/trace/events/migrate.h | 27 ------------------ mm/migrate.c | 65 ------------------------------------------ mm/page_alloc.c | 2 -- 4 files changed, 100 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 1e22d96734e0..3f4c0b167333 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -671,12 +671,6 @@ typedef struct pglist_data { #ifdef CONFIG_NUMA_BALANCING /* Lock serializing the migrate rate limiting window */ spinlock_t numabalancing_migrate_lock; - - /* Rate limiting time interval */ - unsigned long numabalancing_migrate_next_window; - - /* Number of pages migrated during the rate limiting time interval */ - unsigned long numabalancing_migrate_nr_pages; #endif /* * This is a per-node reserve of pages that are not available diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h index 711372845945..705b33d1e395 100644 --- a/include/trace/events/migrate.h +++ b/include/trace/events/migrate.h @@ -70,33 +70,6 @@ TRACE_EVENT(mm_migrate_pages, __print_symbolic(__entry->mode, MIGRATE_MODE), __print_symbolic(__entry->reason, MIGRATE_REASON)) ); - -TRACE_EVENT(mm_numa_migrate_ratelimit, - - TP_PROTO(struct task_struct *p, int dst_nid, unsigned long nr_pages), - - TP_ARGS(p, dst_nid, nr_pages), - - TP_STRUCT__entry( - __array( char, comm, TASK_COMM_LEN) - __field( pid_t, pid) - __field( int, dst_nid) - __field( unsigned long, nr_pages) - ), - - TP_fast_assign( - memcpy(__entry->comm, p->comm, TASK_COMM_LEN); - __entry->pid = p->pid; - __entry->dst_nid = dst_nid; - __entry->nr_pages = nr_pages; - ), - - TP_printk("comm=%s pid=%d dst_nid=%d nr_pages=%lu", - __entry->comm, - __entry->pid, - __entry->dst_nid, - __entry->nr_pages) -); #endif /* _TRACE_MIGRATE_H */ /* This part must be outside protection */ diff --git a/mm/migrate.c b/mm/migrate.c index 4f1d894835b5..5e285c1249a0 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1855,54 +1855,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page, return newpage; } -/* - * page migration rate limiting control. - * Do not migrate more than @pages_to_migrate in a @migrate_interval_millisecs - * window of time. Default here says do not migrate more than 1280M per second. - */ -static unsigned int migrate_interval_millisecs __read_mostly = 100; -static unsigned int ratelimit_pages __read_mostly = 128 << (20 - PAGE_SHIFT); - -/* Returns true if the node is migrate rate-limited after the update */ -static bool numamigrate_update_ratelimit(pg_data_t *pgdat, - unsigned long nr_pages) -{ - unsigned long next_window, interval; - - next_window = READ_ONCE(pgdat->numabalancing_migrate_next_window); - interval = msecs_to_jiffies(migrate_interval_millisecs); - - /* - * Rate-limit the amount of data that is being migrated to a node. - * Optimal placement is no good if the memory bus is saturated and - * all the time is being spent migrating! - */ - if (time_after(jiffies, next_window) && - spin_trylock(&pgdat->numabalancing_migrate_lock)) { - pgdat->numabalancing_migrate_nr_pages = 0; - do { - next_window += interval; - } while (unlikely(time_after(jiffies, next_window))); - - WRITE_ONCE(pgdat->numabalancing_migrate_next_window, next_window); - spin_unlock(&pgdat->numabalancing_migrate_lock); - } - if (pgdat->numabalancing_migrate_nr_pages > ratelimit_pages) { - trace_mm_numa_migrate_ratelimit(current, pgdat->node_id, - nr_pages); - return true; - } - - /* - * This is an unlocked non-atomic update so errors are possible. - * The consequences are failing to migrate when we potentiall should - * have which is not severe enough to warrant locking. If it is ever - * a problem, it can be converted to a per-cpu counter. - */ - pgdat->numabalancing_migrate_nr_pages += nr_pages; - return false; -} - static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) { int page_lru; @@ -1975,14 +1927,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, if (page_is_file_cache(page) && PageDirty(page)) goto out; - /* - * Rate-limit the amount of data that is being migrated to a node. - * Optimal placement is no good if the memory bus is saturated and - * all the time is being spent migrating! - */ - if (numamigrate_update_ratelimit(pgdat, 1)) - goto out; - isolated = numamigrate_isolate_page(pgdat, page); if (!isolated) goto out; @@ -2029,14 +1973,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, unsigned long mmun_start = address & HPAGE_PMD_MASK; unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE; - /* - * Rate-limit the amount of data that is being migrated to a node. - * Optimal placement is no good if the memory bus is saturated and - * all the time is being spent migrating! - */ - if (numamigrate_update_ratelimit(pgdat, HPAGE_PMD_NR)) - goto out_dropref; - new_page = alloc_pages_node(node, (GFP_TRANSHUGE_LIGHT | __GFP_THISNODE), HPAGE_PMD_ORDER); @@ -2133,7 +2069,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, out_fail: count_vm_events(PGMIGRATE_FAIL, HPAGE_PMD_NR); -out_dropref: ptl = pmd_lock(mm, pmd); if (pmd_same(*pmd, entry)) { entry = pmd_modify(entry, vma->vm_page_prot); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 89d2a2ab3fe6..706a738c0aee 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -6197,8 +6197,6 @@ static unsigned long __init calc_memmap_size(unsigned long spanned_pages, static void pgdat_init_numabalancing(struct pglist_data *pgdat) { spin_lock_init(&pgdat->numabalancing_migrate_lock); - pgdat->numabalancing_migrate_nr_pages = 0; - pgdat->numabalancing_migrate_next_window = jiffies; } #else static void pgdat_init_numabalancing(struct pglist_data *pgdat) {}