Date: Mon, 3 Dec 2018 15:50:21 -0800 (PST)
From: David Rientjes
To: Linus Torvalds, Andrea Arcangeli
Cc: ying.huang@intel.com, Michal Hocko, s.priebe@profihost.ag, mgorman@techsingularity.net, Linux List Kernel Mailing, alex.williamson@redhat.com, lkp@01.org, kirill@shutemov.name, Andrew Morton, zi.yan@cs.rutgers.edu, Vlastimil Babka
Subject: [patch 2/2 for-4.20] mm, thp: always fault memory with __GFP_NORETRY

If memory compaction initially fails to free a hugepage, reclaiming and then retrying compaction is more likely to be harmful than beneficial.  Reclaim is unlikely to free pages that form contiguous memory the size of a hugepage without reclaiming far more memory than necessary.
Reclaim is also not guaranteed to benefit compaction if the reclaimed memory is not accessible to the per-zone freeing scanner.  For both of these reasons, independently, all reclaim activity may be entirely fruitless, and retrying compaction afterwards is unlikely to produce a different result.  It is better to fall back to pages of the native page size and allow khugepaged to collapse the memory into a hugepage later, when fragmentation or the availability of local memory has improved.  Setting __GFP_NORETRY, which the page allocator's own comments already expect for these allocations, prevents large amounts of unnecessary reclaim and swap activity that can quickly degrade the performance of other applications.  Furthermore, since reclaim is likely to be more harmful than beneficial for such large-order allocations, it is better to fail early rather than reclaim SWAP_CLUSTER_MAX pages, which is unlikely to make a difference to whether memory compaction succeeds.
Signed-off-by: David Rientjes
---
 drivers/gpu/drm/ttm/ttm_page_alloc.c     |  8 ++++----
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c |  3 +--
 include/linux/gfp.h                      |  3 ++-
 mm/huge_memory.c                         |  3 +--
 mm/page_alloc.c                          | 16 ++++++++++++++++
 5 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc.c b/drivers/gpu/drm/ttm/ttm_page_alloc.c
--- a/drivers/gpu/drm/ttm/ttm_page_alloc.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc.c
@@ -860,8 +860,8 @@ static int ttm_get_pages(struct page **pages, unsigned npages, int flags,
	while (npages >= HPAGE_PMD_NR) {
		gfp_t huge_flags = gfp_flags;

-		huge_flags |= GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
-			__GFP_KSWAPD_RECLAIM;
+		huge_flags |= GFP_TRANSHUGE_LIGHT |
+			__GFP_KSWAPD_RECLAIM;
		huge_flags &= ~__GFP_MOVABLE;
		huge_flags &= ~__GFP_COMP;
		p = alloc_pages(huge_flags, HPAGE_PMD_ORDER);
@@ -978,13 +978,13 @@ int ttm_page_alloc_init(struct ttm_mem_global *glob, unsigned max_pages)
				  GFP_USER | GFP_DMA32, "uc dma", 0);

	ttm_page_pool_init_locked(&_manager->wc_pool_huge,
-				  (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
+				  (GFP_TRANSHUGE_LIGHT |
				   __GFP_KSWAPD_RECLAIM) &
				  ~(__GFP_MOVABLE | __GFP_COMP),
				  "wc huge", order);

	ttm_page_pool_init_locked(&_manager->uc_pool_huge,
-				  (GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
+				  (GFP_TRANSHUGE_LIGHT |
				   __GFP_KSWAPD_RECLAIM) &
				  ~(__GFP_MOVABLE | __GFP_COMP)
				  , "uc huge", order);
diff --git a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
--- a/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
+++ b/drivers/gpu/drm/ttm/ttm_page_alloc_dma.c
@@ -863,8 +863,7 @@ static gfp_t ttm_dma_pool_gfp_flags(struct ttm_dma_tt *ttm_dma, bool huge)
		gfp_flags |= __GFP_ZERO;

	if (huge) {
-		gfp_flags |= GFP_TRANSHUGE_LIGHT | __GFP_NORETRY |
-			__GFP_KSWAPD_RECLAIM;
+		gfp_flags |= GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;
		gfp_flags &= ~__GFP_MOVABLE;
		gfp_flags &= ~__GFP_COMP;
	}
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -298,7 +298,8 @@ struct vm_area_struct;
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
 #define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)
 #define GFP_TRANSHUGE_LIGHT	((GFP_HIGHUSER_MOVABLE | __GFP_COMP | \
-			 __GFP_NOMEMALLOC | __GFP_NOWARN) & ~__GFP_RECLAIM)
+			 __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN) & \
+			 ~__GFP_RECLAIM)
 #define GFP_TRANSHUGE	(GFP_TRANSHUGE_LIGHT | __GFP_DIRECT_RECLAIM)

 /* Convert GFP flags to their corresponding migrate type */
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -636,8 +636,7 @@ static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma, un
	/* Always do synchronous compaction */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
-		return GFP_TRANSHUGE | __GFP_THISNODE |
-		       (vma_madvised ? 0 : __GFP_NORETRY);
+		return GFP_TRANSHUGE | __GFP_THISNODE;

	/* Kick kcompactd and fail quickly */
	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, &transparent_hugepage_flags))
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4139,6 +4139,22 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
		if (compact_result == COMPACT_DEFERRED)
			goto nopage;

+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+		/*
+		 * When faulting a hugepage, it is very unlikely that
+		 * thrashing the zonelist is going to help compaction in
+		 * freeing such a high-order page.  Reclaim would need
+		 * to free contiguous memory itself or guarantee the
+		 * reclaimed memory is accessible by the compaction
+		 * freeing scanner.  Since there is no such guarantee,
+		 * thrashing is more harmful than beneficial.  It is
+		 * better to simply fail and fallback to native pages.
+		 */
+		if (order == HPAGE_PMD_ORDER &&
+		    !(current->flags & PF_KTHREAD))
+			goto nopage;
+#endif
+
		/*
		 * Looks like reclaim/compaction is worth trying, but
		 * sync compaction could be very expensive, so keep