Subject: Re: [RFC PATCH 1/1] vmscan: Support multiple kswapd threads per node
From: Buddy Lumpkin
To: Matthew Wilcox
Cc: Michal Hocko, linux-mm@kvack.org, linux-kernel@vger.kernel.org, hannes@cmpxchg.org, riel@surriel.com, mgorman@suse.de, akpm@linux-foundation.org
Date: Tue, 10 Apr 2018 23:37:53 -0700
Message-Id: <32B9D909-03EA-4852-8AE3-FE398E87EC83@oracle.com>
In-Reply-To: <20180403211253.GC30145@bombadil.infradead.org>
References: <1522661062-39745-1-git-send-email-buddy.lumpkin@oracle.com> <1522661062-39745-2-git-send-email-buddy.lumpkin@oracle.com> <20180403133115.GA5501@dhcp22.suse.cz> <20180403190759.GB6779@bombadil.infradead.org> <20180403211253.GC30145@bombadil.infradead.org>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Apr 3, 2018, at 2:12 PM, Matthew Wilcox wrote:
>
> On Tue, Apr 03, 2018 at
01:49:25PM -0700, Buddy Lumpkin wrote:
>>> Yes, very much this. If you have a single-threaded workload which is
>>> using the entirety of memory and would like to use even more, then it
>>> makes sense to use as many CPUs as necessary getting memory out of its
>>> way. If you have N CPUs and N-1 threads happily occupying themselves in
>>> their own reasonably-sized working sets with one monster process trying
>>> to use as much RAM as possible, then I'd be pretty unimpressed to see
>>> the N-1 well-behaved threads preempted by kswapd.
>>
>> The default value provides one kswapd thread per NUMA node, the same
>> as it was without the patch. Also, I would point out that just because
>> you devote more threads to kswapd, that doesn't mean they are busy. If
>> multiple kswapd threads are busy, they are almost certainly doing work
>> that would have resulted in direct reclaims, which are often
>> substantially more expensive than a couple extra context switches due
>> to preemption.
>
> [...]
>
>> In my previous response to Michal Hocko, I described how I think we
>> could scale watermarks in response to direct reclaims, and launch more
>> kswapd threads when kswapd peaks at 100% CPU usage.
>
> I think you're missing my point about the workload ... kswapd isn't
> "nice", so it will compete with the N-1 threads which are chugging along
> at 100% CPU inside their working sets.

If the memory hog is generating enough demand for multiple kswapd tasks
to be busy, then it is generating enough demand to trigger direct
reclaims. Since direct reclaims are 100% CPU bound, the preemptions you
are concerned about are happening anyway.

> In this scenario, we _don't_
> want to kick off kswapd at all; we want the monster thread to clean up
> its own mess.

This makes direct reclaims sound like a positive thing overall, and that
is simply not the case.
If cleaning is the metaphor to describe direct reclaims, then it's
happening in the kitchen using a garden hose. When conditions for direct
reclaims are present, they can occur in any task that is allocating on
the system. They inject latency in random places and they decrease
filesystem throughput.

When software engineers try to build their own cache, I usually try to
talk them out of it. This rarely works, as they usually have reasons
they believe make the project compelling, so I just ask that they
compare their results using direct IO and a private cache to simply
allowing the page cache to do its thing. I can't make this pitch any
more because direct reclaims have too much of an impact on filesystem
throughput.

The only positive thing that direct reclaims provide is a means to
prevent the system from crashing or deadlocking when it falls too low
on memory.

> If we have idle CPUs, then yes, absolutely, let's have
> them clean up for the monster, but otherwise, I want my N-1 threads
> doing their own thing.
>
> Maybe we should renice kswapd anyway ... thoughts? We don't seem to have
> had a nice'd kswapd since 2.6.12, but maybe we played with that earlier
> and discovered it was a bad idea?
>