Date: Wed, 5 Dec 2018 14:46:50 -0800 (PST)
From: David Rientjes
To: Linus Torvalds
cc: Andrea Arcangeli, mgorman@techsingularity.net, Vlastimil Babka,
    mhocko@kernel.org, ying.huang@intel.com, s.priebe@profihost.ag,
    Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org,
    kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu
Subject: [patch v2 for-4.20] mm, thp: restore node-local hugepage allocations
X-Mailing-List: linux-kernel@vger.kernel.org

This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp:
consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").

By not setting __GFP_THISNODE, applications can allocate remote hugepages
when the local node is fragmented or low on memory when either the thp
defrag setting is "always" or the vma has been madvised with
MADV_HUGEPAGE.

Remote access to hugepages often has much higher latency than local pages
of the native page size.  On Haswell, ac5b2c18911f was shown to cause a
13.9% access regression for binaries that remap their text segment to be
backed by transparent hugepages.

The intent of ac5b2c18911f is to address an issue where a local node is
low on memory or fragmented such that a hugepage cannot be allocated.  In
every scenario where this was described as a fix, there is abundant and
unfragmented remote memory available to allocate from, albeit with greater
access latency.

If remote memory is also low or fragmented, not setting __GFP_THISNODE was
also measured on Haswell to have a 40% regression in allocation latency.

Restore __GFP_THISNODE for thp allocations.

Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")
Signed-off-by: David Rientjes
---
 include/linux/mempolicy.h |  2 --
 mm/huge_memory.c          | 42 +++++++++++++++------------------------
 mm/mempolicy.c            |  2 +-
 3 files changed, 17 insertions(+), 29 deletions(-)

diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
--- a/include/linux/mempolicy.h
+++ b/include/linux/mempolicy.h
@@ -139,8 +139,6 @@ struct mempolicy *mpol_shared_policy_lookup(struct shared_policy *sp,
 struct mempolicy *get_task_policy(struct task_struct *p);
 struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
 		unsigned long addr);
-struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
-						unsigned long addr);
 bool vma_policy_mof(struct vm_area_struct *vma);
 
 extern void numa_default_policy(void);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -632,37 +632,27 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, unsigned long addr)
 {
 	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
-	gfp_t this_node = 0;
-
-#ifdef CONFIG_NUMA
-	struct mempolicy *pol;
-	/*
-	 * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
-	 * specified, to express a general desire to stay on the current
-	 * node for optimistic allocation attempts. If the defrag mode
-	 * and/or madvise hint requires the direct reclaim then we prefer
-	 * to fallback to other node rather than node reclaim because that
-	 * can lead to excessive reclaim even though there is free memory
-	 * on other nodes. We expect that NUMA preferences are specified
-	 * by memory policies.
-	 */
-	pol = get_vma_policy(vma, addr);
-	if (pol->mode != MPOL_BIND)
-		this_node = __GFP_THISNODE;
-	mpol_cond_put(pol);
-#endif
+	const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE;
 
+	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
+		return GFP_TRANSHUGE | __GFP_THISNODE |
+		       (vma_madvised ? 0 : __GFP_NORETRY);
+
+	/* Kick kcompactd and fail quickly */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM | this_node;
+		return gfp_mask | __GFP_KSWAPD_RECLAIM;
+
+	/* Synchronous compaction if madvised, otherwise kick kcompactd */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM :
-							     __GFP_KSWAPD_RECLAIM | this_node);
+		return gfp_mask | (vma_madvised ? __GFP_DIRECT_RECLAIM :
+						  __GFP_KSWAPD_RECLAIM);
+
+	/* Only do synchronous compaction if madvised */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE_LIGHT | (vma_madvised ? __GFP_DIRECT_RECLAIM :
-							     this_node);
-	return GFP_TRANSHUGE_LIGHT | this_node;
+		return gfp_mask | (vma_madvised ? __GFP_DIRECT_RECLAIM : 0);
+
+	return gfp_mask;
 }
 
 /* Caller must hold page table lock. */
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1662,7 +1662,7 @@ struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
  * freeing by another task.  It is the caller's responsibility to free the
  * extra reference for shared policies.
  */
-struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
+static struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
 						unsigned long addr)
 {
 	struct mempolicy *pol = __get_vma_policy(vma, addr);
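
For readers who want to check the resulting mask per defrag mode without
building a kernel, the selection logic in the patched
alloc_hugepage_direct_gfpmask() can be modelled in userspace.  What follows
is only a rough sketch: the SK_* bit values, the sk_defrag enum and
sketch_gfpmask() are hypothetical names made up for illustration and do not
match the kernel's real gfp bit layout.  The one relationship taken as
given is that GFP_TRANSHUGE is GFP_TRANSHUGE_LIGHT plus
__GFP_DIRECT_RECLAIM.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel gfp bits; the values are arbitrary. */
#define SK_KSWAPD_RECLAIM  (1u << 0)	/* models __GFP_KSWAPD_RECLAIM */
#define SK_DIRECT_RECLAIM  (1u << 1)	/* models __GFP_DIRECT_RECLAIM */
#define SK_NORETRY         (1u << 2)	/* models __GFP_NORETRY */
#define SK_THISNODE        (1u << 3)	/* models __GFP_THISNODE */
#define SK_TRANSHUGE_LIGHT (1u << 4)	/* models GFP_TRANSHUGE_LIGHT as one bit */
#define SK_TRANSHUGE       (SK_TRANSHUGE_LIGHT | SK_DIRECT_RECLAIM)

/* One entry per TRANSPARENT_HUGEPAGE_DEFRAG_*_FLAG tested in the patch. */
enum sk_defrag {
	SK_DEFRAG_DIRECT,		/* the "always" setting */
	SK_DEFRAG_KSWAPD,
	SK_DEFRAG_KSWAPD_OR_MADV,
	SK_DEFRAG_REQ_MADV,
	SK_DEFRAG_NONE,			/* no defrag flag set */
};

/* Mirrors the order of the test_bit() checks in the patched function. */
static unsigned int sketch_gfpmask(enum sk_defrag mode, bool vma_madvised)
{
	const unsigned int gfp_mask = SK_TRANSHUGE_LIGHT | SK_THISNODE;

	switch (mode) {
	case SK_DEFRAG_DIRECT:
		/* Always do synchronous compaction */
		return SK_TRANSHUGE | SK_THISNODE |
		       (vma_madvised ? 0 : SK_NORETRY);
	case SK_DEFRAG_KSWAPD:
		/* Kick kcompactd and fail quickly */
		return gfp_mask | SK_KSWAPD_RECLAIM;
	case SK_DEFRAG_KSWAPD_OR_MADV:
		/* Synchronous compaction if madvised, otherwise kick kcompactd */
		return gfp_mask | (vma_madvised ? SK_DIRECT_RECLAIM :
						  SK_KSWAPD_RECLAIM);
	case SK_DEFRAG_REQ_MADV:
		/* Only do synchronous compaction if madvised */
		return gfp_mask | (vma_madvised ? SK_DIRECT_RECLAIM : 0);
	default:
		return gfp_mask;
	}
}
```

Calling sketch_gfpmask() for each mode with vma_madvised set and unset
shows the point of the patch: the node-local bit is present in every
returned mask, and only the reclaim and retry bits vary.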