Subject: Re: [RFC PATCH 1/1] vmscan: Support multiple kswapd threads per node
From: Buddy Lumpkin
To: Matthew Wilcox
Cc: Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, hannes@cmpxchg.org, riel@surriel.com, mgorman@suse.de, akpm@linux-foundation.org
Date: Tue, 10 Apr 2018 20:52:55 -0700
Message-Id: <2E72CC2C-871C-41C1-8238-6BA04C361D4E@oracle.com>
In-Reply-To: <20180403190759.GB6779@bombadil.infradead.org>
References: <1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com> <1522661062-39745-2-git-send-email-buddy.lumpkin@oracle.com> <20180403133115.GA5501@dhcp22.suse.cz> <20180403190759.GB6779@bombadil.infradead.org>

> On Apr 3, 2018, at 12:07 PM, Matthew Wilcox wrote:
>
> On Tue, Apr 03, 2018 at 03:31:15PM +0200, Michal Hocko wrote:
>> On Mon 02-04-18 09:24:22, Buddy Lumpkin wrote:
>>> The presence of direct reclaims 10 years ago was a fairly reliable
>>> indicator that too much was being asked of a Linux system. Kswapd was
>>> likely wasting time scanning pages that were ineligible for eviction.
>>> Adding RAM or reducing the working set size would usually make the
>>> problem go away. Since then hardware has evolved to bring a new struggle
>>> for kswapd. Storage speeds have increased by orders of magnitude while
>>> CPU clock speeds stayed the same or even slowed down in exchange for
>>> more cores per package. This presents a throughput problem for a single
>>> threaded kswapd that will get worse with each generation of new hardware.
>>
>> AFAIR we used to scale the number of kswapd workers many years ago. It
>> just turned out to be not all that great. We have had a kswapd reclaim
>> window for quite some time, and that allows tuning how proactive kswapd
>> should be.
>>
>> Also please note that direct reclaim is a way to throttle overly
>> aggressive memory consumers. The more we do in the background context,
>> the easier it will be for them to allocate faster. So I am not really
>> sure that more background threads will solve the underlying problem. It
>> is just a matter of memory hogs tuning to end up in the very same
>> situation AFAICS. Moreover, the more they allocate, the less CPU time
>> _other_ (non-allocating) tasks will get.
>>
>>> Test Details
>>
>> I will have to study this more to comment.
>>
>> [...]
>>> By increasing the number of kswapd threads, throughput increased by ~50%
>>> while kernel mode CPU utilization decreased or stayed the same, likely
>>> due to a decrease in the number of parallel tasks at any given time
>>> doing page replacement.
>>
>> Well, isn't that just an effect of more work being done on behalf of
>> other workloads that might run along with your tests (and which don't
>> really need to allocate a lot of memory)? In other words, how does the
>> patch behave with non-artificial mixed workloads?
>>
>> Please note that I am not saying that we absolutely have to stick with
>> the current single-thread-per-node implementation, but I would really
>> like to see more background on why we should be allowing heavy memory
>> hogs to allocate faster, or how to prevent that. I would also be very
>> interested to see how to scale the number of threads based on how CPUs
>> are utilized by other workloads.
>
> Yes, very much this. If you have a single-threaded workload which is
> using the entirety of memory and would like to use even more, then it
> makes sense to use as many CPUs as necessary getting memory out of its
> way. If you have N CPUs and N-1 threads happily occupying themselves in
> their own reasonably-sized working sets with one monster process trying
> to use as much RAM as possible, then I'd be pretty unimpressed to see
> the N-1 well-behaved threads preempted by kswapd.
A single thread cannot create enough demand to keep any number of kswapd
tasks busy, so a memory hog is going to need multiple threads if it is
going to do any measurable damage to the amount of work performed by the
compute-bound tasks, and once we increase the number of tasks used for the
memory hog, preemption is already happening.

So let's say we are willing to accept that it is going to take multiple
threads to create enough demand to keep multiple kswapd tasks busy; we
just do not want any additional preemptions strictly due to the additional
kswapd tasks. Consider, though, that if we manage to create enough demand
to keep multiple kswapd tasks busy, we are also creating enough demand to
trigger direct reclaims. A _lot_ of direct reclaims, and direct reclaims
consume a _lot_ of CPU. So if multiple kswapd threads are running, they
might be preempting your N-1 threads, but if they were not running, the
memory hog tasks would be preempting your N-1 threads instead.

> My biggest problem with the patch-as-presented is that it's yet one more
> thing for admins to get wrong. We should spawn more threads automatically
> if system conditions are right to do that.

One thing an admin could get wrong with the patch as presented is starting
with a setting of 16, deciding that it didn't help, and reducing it back
to one. It allows up to 16 threads because I actually saw a benefit from
large numbers of kswapd threads when a substantial amount of the memory
pressure was created by anonymous memory mappings that do not involve the
page cache. That really is a special case, and the maximum number of
threads allowed should probably be reduced to a more sensible value like 8,
or even 6, if there is concern about admins doing the wrong thing.
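For reference, here is a minimal sketch of the kind of multi-threaded
anonymous-memory hog described above. It is not the harness used for the
numbers in the patch, and the thread count and mapping size are arbitrary;
each worker just keeps dirtying a private anonymous mapping, so the reclaim
demand comes from anonymous pages rather than the page cache:

/*
 * Illustration only: a multi-threaded anonymous-memory hog.
 * NTHREADS and REGION_SZ are made-up values, not tuned for anything.
 */
#include <pthread.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define NTHREADS   8                       /* arbitrary worker count */
#define REGION_SZ  (1UL << 30)             /* 1 GiB per worker, arbitrary */

static void *hog(void *arg)
{
        long page = sysconf(_SC_PAGESIZE);
        char *buf;

        (void)arg;
        buf = mmap(NULL, REGION_SZ, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) {
                perror("mmap");
                return NULL;
        }

        /* Touch every page over and over to keep generating dirty anon pages. */
        for (;;)
                for (size_t off = 0; off < REGION_SZ; off += page)
                        buf[off] = (char)off;
}

int main(void)
{
        pthread_t tids[NTHREADS];

        for (int i = 0; i < NTHREADS; i++)
                pthread_create(&tids[i], NULL, hog, NULL);
        for (int i = 0; i < NTHREADS; i++)
                pthread_join(tids[i], NULL);
        return 0;
}

Built with -pthread and run with enough workers relative to the machine's
RAM, a hog like this keeps kswapd busy and still falls into direct reclaim,
which is the situation the paragraphs above are arguing about.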