Date: Tue, 4 Dec 2018 14:04:10 -0800 (PST)
From: David Rientjes
To: Vlastimil Babka
Cc: Linus Torvalds, Andrea Arcangeli, ying.huang@intel.com, Michal Hocko, s.priebe@profihost.ag, mgorman@techsingularity.net, Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu
Subject: Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

On Tue, 4 Dec 2018, Vlastimil Babka wrote:

> So, AFAIK, the situation is:
>
> - commit 5265047ac301 in 4.1 introduced __GFP_THISNODE for THP. The
> intention came a bit earlier in 4.0 commit 077fcf116c8c. (I admit acking
> both as it seemed to make sense).

Yes, both are based on the preference to fault local thp and fall back to
local pages before allocating remotely, because doing so avoids the
performance regression introduced by not setting __GFP_THISNODE.

> - The resulting node-reclaim-like behavior regressed Andrea's KVM
> workloads, but reverting it (only for madvised or non-default
> defrag=always THP by commit ac5b2c18911f) would regress David's
> workloads starting with 4.20 to pre-4.1 levels.
>

Almost, but the defrag=always case had the subtle difference of also
setting __GFP_NORETRY, whereas MADV_HUGEPAGE did not. This differed from
the comment in __alloc_pages_slowpath(), which expected thp fault
allocations to be caught by checking __GFP_NORETRY.

> If the decision is that it's too late to revert a 4.1 regression for one
> kind of workload in 4.20 because it causes regression for another
> workload, then I guess we just revert ac5b2c18911f (patch 1) for 4.20
> and don't rush a different fix (patch 2) to 4.20. It's not a big
> difference if a 4.1 regression is fixed in 4.20 or 4.21?
>

The revert is certainly needed to prevent the regression, yes, but I
anticipate that Andrea will report back that patch 2 at least improves the
situation for the problem he was addressing: it is pointless to thrash any
node or reclaim unnecessarily when compaction has already failed. That is
what setting __GFP_NORETRY for all thp fault allocations fixes.

> Because there might be other unexpected consequences of patch 2 that
> testing won't be able to catch in the remaining 4.20 rc's. And I'm not
> even sure if it will fix Andrea's workloads. While it should prevent
> node-reclaim-like thrashing, it will still mean that KVM (or anyone)
> won't be able to allocate THP's remotely, even if the local node is
> exhausted of both huge and base pages.
>

Patch 2 does nothing with respect to the remote allocation policy; it
simply prevents reclaim (and potentially thrashing).
Patch 1 sets __GFP_THISNODE to prevent the remote allocation.

Note that the commit reverted by patch 1, if left in place, would cause an
even more severe regression than Andrea's case when remote memory is
fragmented as well: it opens the door to thrashing both local and remote
memory instead of only local memory. I measured this as a 40% allocation
latency regression when purposefully fragmenting all nodes and faulting
without __GFP_THISNODE, and that was on Haswell; I can imagine it would be
even greater on Rome.