Date: Mon, 11 Jun 2018 15:11:13 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Jirka Hladky
Cc: Jakub Racek, linux-kernel <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki", Len Brown, linux-acpi@vger.kernel.org,
	kkolakow@redhat.com
Subject: Re: [4.17 regression] Performance drop on kernel-4.17 visible on
 Stream, Linpack and NAS parallel benchmarks
Message-ID: <20180611141113.pfuttg7npch3jtg6@techsingularity.net>

On Mon, Jun 11, 2018 at 12:04:34PM +0200, Jirka Hladky wrote:
> Hi Mel,
>
> your suggestion about the commit which caused the regression was right
> - it is indeed this commit:
>
> 2c83362734dad8e48ccc0710b5cd2436a0323893
>
> The question now is what can be done to improve the results. I have made
> stream run longer and I see that data are moved very slowly from NODE#1
> to NODE#0.
>

Ok, this is somewhat expected, although I suspect the scan rate slowed a
lot in the early phase of the program and that's why the migration is
slow -- a slow scan means fewer samples, so it takes longer to pass the
2-pass filter.

> The process started on NODE#1 where all the memory was allocated. Right
> after the start, the process was moved to NODE#0 but only part of the
> memory was moved to that node. numa_preferred_nid stayed 1 for 30
> seconds. numa_preferred_nid changed to 0 at 2018-Jun-09_03h35m58s and
> most of the memory was finally reallocated. See the logs below.
>
> Could we try to make numa_preferred_nid change faster?
>

What catches us here is that each element makes sense in itself; it's
just not a universal win. The identified patch makes a reasonable choice
in that fork shouldn't necessarily spread across the machine, as that
hurts short-lived or communicating processes. Unfortunately, if a load is
NUMA-aware and the processes are independent, then automatic NUMA
balancing has to take action, which means there is a period of time where
performance is sub-optimal. Similarly, the load balancer is making a
reasonable decision when a socket gets overloaded. Fixing any part of it
for STREAM will end up regressing something else.

The numa_preferred_nid can probably be changed faster by adjusting the
scan rate. Unfortunately, that comes with the penalty that system CPU
overhead will be higher and stalls in the process increase to handle the
PTE updates and the subsequent faults. This might help STREAM, but
anything that is latency-sensitive will be hurt. Worse, if a socket is
over-saturated and there is a high frequency of cross-node migrations to
load balance, then the scan rate might always stay at the maximum
frequency and a very high cost is incurred, so we end up with another
class of regression.
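For reference, the scan rate in question is driven by the automatic NUMA
balancing sysctls. As a rough illustration only (the values below are
arbitrary, assume CONFIG_NUMA_BALANCING, and are not a recommendation),
lowering the scan periods for a test run would look something like:

    # show the current scan-rate tunables
    grep . /proc/sys/kernel/numa_balancing_scan_*

    # scan more aggressively for an experiment (expect higher system CPU usage)
    echo 100  > /proc/sys/kernel/numa_balancing_scan_period_min_ms
    echo 1000 > /proc/sys/kernel/numa_balancing_scan_period_max_ms

numa_preferred_nid itself can be watched via /proc/<pid>/sched on a
SCHED_DEBUG kernel, which is presumably where the logs above came from.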
Srikar Dronamraju did have a series with two patches that increase the
scan rate when there is a cross-node migration. It may be the case that
it also causes numa_preferred_nid to change faster, but it has a real
risk of introducing regressions. Still, for testing purposes you might be
interested in trying the following two patches?

Srikar Dronamraju [PATCH 17/19] sched/numa: Pass destination cpu as a parameter to migrate_task_rq
Srikar Dronamraju [PATCH 18/19] sched/numa: Reset scan rate whenever task moves across nodes

--
Mel Gorman
SUSE Labs