Date: Wed, 28 Nov 2018 07:30:40 +0100
From: Michal Hocko
To: Linus Torvalds
Cc: Andrea Arcangeli, rong.a.chen@intel.com, s.priebe@profihost.ag, alex.williamson@redhat.com, mgorman@techsingularity.net, zi.yan@cs.rutgers.edu, Vlastimil Babka, rientjes@google.com, kirill@shutemov.name, Andrew Morton, Linux List Kernel Mailing, lkp@01.org
Subject: Re: [LKP] [mm] ac5b2c1891: vm-scalability.throughput -61.3% regression
Message-ID: <20181128063040.GF6923@dhcp22.suse.cz>
References: <20181127062503.GH6163@shao2-debian> <20181127205737.GI16136@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue 27-11-18 14:50:05, Linus Torvalds wrote:
> On Tue, Nov 27, 2018 at 12:57 PM Andrea Arcangeli wrote:
> >
> > This difference can only happen with defrag=always, and that's not the
> > current upstream default.
>
> Ok, thanks. That makes it a bit less critical.
>
> > That MADV_HUGEPAGE causes fights with NUMA balancing is not great
> > indeed, qemu needs NUMA locality too, but then the badness caused by
> > __GFP_THISNODE was a larger regression in the worst case for qemu.
> [...]
> > So the short term alternative again would be the alternate patch that
> > does __GFP_THISNODE|GFP_ONLY_COMPACT appended below.
>
> Sounds like we should probably do this. Particularly since Vlastimil
> pointed out that we'd otherwise have issues with the back-port for 4.4
> where that "defrag=always" was the default.
>
> The patch doesn't look horrible, and it directly addresses this
> particular issue.
>
> Is there some reason we wouldn't want to do it?

We have discussed it previously, and the biggest concern was that it
introduces a new GFP flag with very weird, one-off semantics. Every time
we have done that in the past, it came back to bite us: people started
to use such a flag, and any further changes became really hard to make.
So I would really prefer a more systematic solution, and I believe we
can have one here.

MADV_HUGEPAGE (and THP when it is always enabled) gained a local memory
policy with the patch that was effectively reverted. I do believe that
conflating "I want THP" with "I want them local" is just wrong from the
API point of view. Different classes of use cases obviously disagree on
the latter.

So I believe that a long-term solution should introduce an
MPOL_NODE_RECLAIM kind of policy. It would effectively reclaim local
nodes (within NODE_RECLAIM distance) before falling back to other nodes.

Apart from that, we need a less disruptive reclaim driven by compaction,
and Mel is already working on that AFAIK.
--
Michal Hocko
SUSE Labs
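To make the contrast between a local-only allocation and the MPOL_NODE_RECLAIM kind of policy proposed in this reply more concrete, here is a toy Python model. It is not kernel code and MPOL_NODE_RECLAIM does not exist; the names Node, try_node, alloc_thisnode and alloc_node_reclaim, the distance table, and the threshold of 30 used for "local" nodes are all hypothetical assumptions for illustration. The model ignores compaction, huge page sizes and fragmentation; it only shows where an allocation lands and where reclaim is allowed.

```python
from dataclasses import dataclass

# Hypothetical threshold (assumption): nodes at or below this distance count
# as "local" for the proposed policy. Illustrative value only.
RECLAIM_DISTANCE = 30

# Illustrative SLIT-style distance table: 10 = same node, larger = farther.
DISTANCE = [
    [10, 20, 40, 40],
    [20, 10, 40, 40],
    [40, 40, 10, 20],
    [40, 40, 20, 10],
]


@dataclass
class Node:
    free: int         # pages available right now
    reclaimable: int  # pages that reclaim could free (e.g. clean page cache)


def try_node(node, pages, allow_reclaim):
    """Allocate `pages` from `node`, reclaiming first if allowed and needed."""
    if node.free < pages and allow_reclaim:
        got = min(pages - node.free, node.reclaimable)
        node.reclaimable -= got
        node.free += got
    if node.free >= pages:
        node.free -= pages
        return True
    return False


def alloc_thisnode(nodes, local, pages):
    """__GFP_THISNODE-like behaviour: local node only, reclaim there or fail."""
    return local if try_node(nodes[local], pages, allow_reclaim=True) else None


def alloc_node_reclaim(nodes, local, pages):
    """Hypothetical MPOL_NODE_RECLAIM-like placement.

    Nodes are visited nearest-first (sorted() is stable, so ties keep index
    order). Nodes within RECLAIM_DISTANCE may be reclaimed; farther nodes are
    used only if they already have enough free pages.
    """
    order = sorted(range(len(nodes)), key=lambda n: DISTANCE[local][n])
    for n in order:
        near = DISTANCE[local][n] <= RECLAIM_DISTANCE
        if try_node(nodes[n], pages, allow_reclaim=near):
            return n
    return None


def fresh_nodes():
    # Node 0 is the local node: small, with some reclaimable memory.
    return [Node(100, 50), Node(200, 0), Node(1000, 0), Node(1000, 0)]


print(alloc_thisnode(fresh_nodes(), 0, 200))      # None: node 0 tops out at 150
print(alloc_node_reclaim(fresh_nodes(), 0, 200))  # 1: falls back to near node 1
```

In this sketch the local-only variant fails outright once local reclaim is exhausted, while the policy variant keeps reclaim confined to nearby nodes and only then spills to distant ones, which is the separation of "I want THP" from "I want them local" the reply argues for.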