Date: Thu, 10 Aug 2017 05:07:30 -0700
From: tip-bot for Rik van Riel
To: linux-tip-commits@vger.kernel.org
Cc: torvalds@linux-foundation.org, riel@redhat.com, hpa@zytor.com,
    linux-kernel@vger.kernel.org, tglx@linutronix.de, mingo@kernel.org,
    mgorman@suse.de, peterz@infradead.org
Reply-To: riel@redhat.com, torvalds@linux-foundation.org, mgorman@suse.de,
    peterz@infradead.org, mingo@kernel.org, tglx@linutronix.de,
    linux-kernel@vger.kernel.org, hpa@zytor.com
In-Reply-To: <20170731192847.23050-2-riel@redhat.com>
References: <20170731192847.23050-2-riel@redhat.com>
Subject: [tip:sched/core] sched/numa: Slow down scan rate if shared faults dominate
Git-Commit-ID: 37ec97deb3a8c68a7adfab61beb261ffeab19d09

Commit-ID:  37ec97deb3a8c68a7adfab61beb261ffeab19d09
Gitweb:     http://git.kernel.org/tip/37ec97deb3a8c68a7adfab61beb261ffeab19d09
Author:     Rik van Riel
AuthorDate: Mon, 31 Jul 2017 15:28:46 -0400
Committer:  Ingo Molnar
CommitDate: Thu, 10 Aug 2017 12:18:16 +0200

sched/numa: Slow down scan rate if shared faults dominate

The comment above update_task_scan_period() says the scan period should
be increased (scanning slows down) if the majority of memory accesses
are on the local node, or if the majority of the page accesses are
shared with other tasks.

However, with the current code, a high ratio of shared accesses merely
slows down the rate at which scanning is sped up.

This patch changes things so that either lots of shared accesses or
lots of local accesses will slow down scanning, and NUMA scanning is
sped up only when there are lots of private faults on remote memory
pages.
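[ Editor's note: a quick worked example of the thresholds involved,
  assuming the NUMA_PERIOD_SLOTS == 10 and NUMA_PERIOD_THRESHOLD == 7
  values defined in kernel/sched/fair.c.  Fault counts are scaled into
  tenths: a task with 900 local and 100 remote faults has
  lr_ratio = 900 * 10 / 1000 = 9, and one with 900 private and 100
  shared faults has ps_ratio = 9.  After this patch, either ratio
  reaching 7 lengthens the scan period by (ratio - 7) slots, at least
  one; only when both stay below 7 is the period shortened, by
  (7 - max(lr_ratio, ps_ratio)) slots.  A runnable sketch of this
  logic follows the diff below. ]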
Signed-off-by: Rik van Riel
Signed-off-by: Peter Zijlstra (Intel)
Acked-by: Mel Gorman
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: jhladky@redhat.com
Cc: lvenanci@redhat.com
Link: http://lkml.kernel.org/r/20170731192847.23050-2-riel@redhat.com
Signed-off-by: Ingo Molnar
---
 kernel/sched/fair.c | 39 +++++++++++++++++++++++++--------------
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ef5b66b..cb6b7c8 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -1892,7 +1892,7 @@ static void update_task_scan_period(struct task_struct *p,
 			unsigned long shared, unsigned long private)
 {
 	unsigned int period_slot;
-	int ratio;
+	int lr_ratio, ps_ratio;
 	int diff;
 
 	unsigned long remote = p->numa_faults_locality[0];
@@ -1922,25 +1922,36 @@ static void update_task_scan_period(struct task_struct *p,
 	 *   >= NUMA_PERIOD_THRESHOLD scan period increases (scan slower)
 	 */
 	period_slot = DIV_ROUND_UP(p->numa_scan_period, NUMA_PERIOD_SLOTS);
-	ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
-	if (ratio >= NUMA_PERIOD_THRESHOLD) {
-		int slot = ratio - NUMA_PERIOD_THRESHOLD;
+	lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
+	ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared);
+
+	if (ps_ratio >= NUMA_PERIOD_THRESHOLD) {
+		/*
+		 * Most memory accesses are local. There is no need to
+		 * do fast NUMA scanning, since memory is already local.
+		 */
+		int slot = ps_ratio - NUMA_PERIOD_THRESHOLD;
+		if (!slot)
+			slot = 1;
+		diff = slot * period_slot;
+	} else if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
+		/*
+		 * Most memory accesses are shared with other tasks.
+		 * There is no point in continuing fast NUMA scanning,
+		 * since other tasks may just move the memory elsewhere.
+		 */
+		int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
 		if (!slot)
 			slot = 1;
 		diff = slot * period_slot;
 	} else {
-		diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
-
 		/*
-		 * Scale scan rate increases based on sharing. There is an
-		 * inverse relationship between the degree of sharing and
-		 * the adjustment made to the scanning period. Broadly
-		 * speaking the intent is that there is little point
-		 * scanning faster if shared accesses dominate as it may
-		 * simply bounce migrations uselessly
+		 * Private memory faults exceed (SLOTS-THRESHOLD)/SLOTS,
+		 * yet they are not on the local NUMA node. Speed up
+		 * NUMA scanning to get the memory moved over.
 		 */
-		ratio = DIV_ROUND_UP(private * NUMA_PERIOD_SLOTS, (private + shared + 1));
-		diff = (diff * ratio) / NUMA_PERIOD_SLOTS;
+		int ratio = max(lr_ratio, ps_ratio);
+		diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
 	}
 
 	p->numa_scan_period = clamp(p->numa_scan_period + diff,
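[ Editor's note: below is a minimal user-space sketch, not kernel code,
  that mirrors the patched branch structure so the adjustment can be
  checked with concrete fault counts.  The constants are assumed to
  match the definitions in kernel/sched/fair.c at the time of this
  commit; scan_period_diff() and the sample counts in main() are
  invented for illustration.  A positive return value lengthens the
  scan period (scanning slows down); a negative one shortens it. ]

#include <stdio.h>

#define NUMA_PERIOD_SLOTS	10	/* assumed: matches kernel/sched/fair.c */
#define NUMA_PERIOD_THRESHOLD	7	/* assumed: matches kernel/sched/fair.c */

#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
#define MAX(a, b)		((a) > (b) ? (a) : (b))

/*
 * Mirror of the patched logic in update_task_scan_period().
 * Returns the signed change applied to the scan period:
 * positive = scan slower, negative = scan faster.
 */
static int scan_period_diff(int scan_period,
			    unsigned long local, unsigned long remote,
			    unsigned long private, unsigned long shared)
{
	int period_slot = DIV_ROUND_UP(scan_period, NUMA_PERIOD_SLOTS);
	int lr_ratio = (local * NUMA_PERIOD_SLOTS) / (local + remote);
	int ps_ratio = (private * NUMA_PERIOD_SLOTS) / (private + shared);
	int diff;

	if (ps_ratio >= NUMA_PERIOD_THRESHOLD) {
		/* private/shared ratio crossed the threshold: scan slower */
		int slot = ps_ratio - NUMA_PERIOD_THRESHOLD;
		if (!slot)
			slot = 1;
		diff = slot * period_slot;
	} else if (lr_ratio >= NUMA_PERIOD_THRESHOLD) {
		/* local/remote ratio crossed the threshold: scan slower */
		int slot = lr_ratio - NUMA_PERIOD_THRESHOLD;
		if (!slot)
			slot = 1;
		diff = slot * period_slot;
	} else {
		/* both ratios below the threshold: scan faster */
		int ratio = MAX(lr_ratio, ps_ratio);
		diff = -(NUMA_PERIOD_THRESHOLD - ratio) * period_slot;
	}
	return diff;
}

int main(void)
{
	/* scan period of 1000; the fault counts are made-up examples */
	printf("ps_ratio high (900 private / 100 shared): %+d\n",
	       scan_period_diff(1000, 500, 500, 900, 100));
	printf("lr_ratio high (800 local / 200 remote):   %+d\n",
	       scan_period_diff(1000, 800, 200, 200, 800));
	printf("both ratios below threshold:              %+d\n",
	       scan_period_diff(1000, 300, 700, 400, 600));
	return 0;
}

[ Running the sketch prints +200, +100 and -300: either threshold
  crossing lengthens the period by at least one slot, while the
  speed-up in the final branch is damped by whichever ratio sits
  closer to the threshold. ]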