From: Rik van Riel <riel@surriel.com>
To: hughd@google.com
Cc: xuyu@linux.alibaba.com, akpm@linux-foundation.org, mgorman@suse.de,
    aarcange@redhat.com, willy@infradead.org, linux-kernel@vger.kernel.org,
    kernel-team@fb.com, linux-mm@kvack.org, vbabka@suse.cz, mhocko@suse.com,
    Rik van Riel <riel@surriel.com>
Subject: [PATCH 1/3] mm,thp,shmem: limit shmem THP alloc gfp_mask
Date: Tue, 24 Nov 2020 14:49:23 -0500
Message-Id: <20201124194925.623931-2-riel@surriel.com>
X-Mailer: git-send-email 2.25.4
In-Reply-To: <20201124194925.623931-1-riel@surriel.com>
References: <20201124194925.623931-1-riel@surriel.com>

The allocation flags of anonymous transparent huge pages can be
controlled through the file /sys/kernel/mm/transparent_hugepage/defrag,
which can help keep the system from getting bogged down in the page
reclaim and compaction code when many THPs are getting allocated
simultaneously.

However, the gfp_mask for shmem THP allocations was not limited by
those configuration settings, and some workloads ended up with all
CPUs stuck on the LRU lock in the page reclaim code, trying to
allocate dozens of THPs simultaneously.

This patch applies the same configured limitation to shmem hugepage
allocations, to prevent that from happening.

Controlling the gfp_mask of THP allocations through the knobs in
sysfs allows users to determine the balance between how aggressively
the system tries to allocate THPs at fault time, and how much the
application may end up stalling while attempting those allocations.

This way a THP defrag setting of "never" or "defer+madvise" will
result in quick allocation failures without direct reclaim when no
2MB free pages are available.

With this patch applied, THP allocations for tmpfs will be a little
more aggressive than today for files mmapped with MADV_HUGEPAGE, and
a little less aggressive for files that are not mmapped or are mapped
without that flag.

Signed-off-by: Rik van Riel <riel@surriel.com>
---
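A note below the cut line, so it stays out of the git history: the
mm/huge_memory.c hunk below only shows the top of the renamed helper.
For reference, here is a sketch of the complete defrag-setting to
gfp_mask mapping, paraphrased from mm/huge_memory.c with this patch
applied; it is illustrative only and not part of the diff:

	gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
	{
		/* vma is NULL for shmem allocations without a faulting VMA */
		const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);

		/* "always": always stall for synchronous compaction */
		if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG,
			     &transparent_hugepage_flags))
			return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);

		/* "defer": kick kcompactd, never stall the fault */
		if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG,
			     &transparent_hugepage_flags))
			return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM;

		/* "defer+madvise": stall only for madvised mappings */
		if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG,
			     &transparent_hugepage_flags))
			return GFP_TRANSHUGE_LIGHT |
				(vma_madvised ? __GFP_DIRECT_RECLAIM :
						__GFP_KSWAPD_RECLAIM);

		/* "madvise": direct reclaim only for madvised mappings */
		if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_REQ_MADV_FLAG,
			     &transparent_hugepage_flags))
			return GFP_TRANSHUGE_LIGHT |
				(vma_madvised ? __GFP_DIRECT_RECLAIM : 0);

		/* "never": fail fast when no free 2MB page is available */
		return GFP_TRANSHUGE_LIGHT;
	}

GFP_TRANSHUGE includes __GFP_DIRECT_RECLAIM while GFP_TRANSHUGE_LIGHT
does not, which is what makes "never" and "defer" fail quickly rather
than stall in reclaim; the shmem path now gets exactly this mask.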
 include/linux/gfp.h | 2 ++
 mm/huge_memory.c    | 6 +++---
 mm/shmem.c          | 8 +++++---
 3 files changed, 10 insertions(+), 6 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index c603237e006c..c7615c9ba03c 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -614,6 +614,8 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
 
+extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
+
 #ifdef CONFIG_PM_SLEEP
 extern bool pm_suspended_storage(void);
 #else
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9474dbc150ed..c5d03b2f2f2f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -649,9 +649,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
  *	    available
  * never: never stall for any thp allocation
  */
-static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct *vma)
+gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma)
 {
-	const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
+	const bool vma_madvised = vma && (vma->vm_flags & VM_HUGEPAGE);
 
 	/* Always do synchronous compaction */
 	if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, &transparent_hugepage_flags))
@@ -744,7 +744,7 @@ vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf)
 		pte_free(vma->vm_mm, pgtable);
 		return ret;
 	}
-	gfp = alloc_hugepage_direct_gfpmask(vma);
+	gfp = vma_thp_gfp_mask(vma);
 	page = alloc_hugepage_vma(gfp, vma, haddr, HPAGE_PMD_ORDER);
 	if (unlikely(!page)) {
 		count_vm_event(THP_FAULT_FALLBACK);
diff --git a/mm/shmem.c b/mm/shmem.c
index 537c137698f8..6c3cb192a88d 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1545,8 +1545,8 @@ static struct page *shmem_alloc_hugepage(gfp_t gfp,
 		return NULL;
 
 	shmem_pseudo_vma_init(&pvma, info, hindex);
-	page = alloc_pages_vma(gfp | __GFP_COMP | __GFP_NORETRY | __GFP_NOWARN,
-			       HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(), true);
+	page = alloc_pages_vma(gfp, HPAGE_PMD_ORDER, &pvma, 0, numa_node_id(),
+			       true);
 	shmem_pseudo_vma_destroy(&pvma);
 	if (page)
 		prep_transhuge_page(page);
@@ -1802,6 +1802,7 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	struct page *page;
 	enum sgp_type sgp_huge = sgp;
 	pgoff_t hindex = index;
+	gfp_t huge_gfp;
 	int error;
 	int once = 0;
 	int alloced = 0;
@@ -1887,7 +1888,8 @@ static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
 	}
 
 alloc_huge:
-	page = shmem_alloc_and_acct_page(gfp, inode, index, true);
+	huge_gfp = vma_thp_gfp_mask(vma);
+	page = shmem_alloc_and_acct_page(huge_gfp, inode, index, true);
 	if (IS_ERR(page)) {
 alloc_nohuge:
 		page = shmem_alloc_and_acct_page(gfp, inode,
-- 
2.25.4
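P.S. A quick way to observe the behavior from userspace: the test
below is illustrative only; the /dev/shm/thp-test path is made up,
and it assumes /sys/kernel/mm/transparent_hugepage/shmem_enabled is
set to "advise" or "always". It maps a tmpfs file with MADV_HUGEPAGE
and then reports ShmemPmdMapped from smaps. With defrag set to
"never" and memory fragmented, the fault should now fall back to
small pages quickly instead of stalling in direct reclaim.

	#include <fcntl.h>
	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define LEN (4UL << 20)	/* room for at least one 2MB THP */

	int main(void)
	{
		/* hypothetical test file; /dev/shm is tmpfs on most distros */
		int fd = open("/dev/shm/thp-test", O_CREAT | O_RDWR, 0600);
		if (fd < 0 || ftruncate(fd, LEN) < 0)
			return 1;

		char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			       MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		/* request THPs for this mapping, then fault it in */
		madvise(p, LEN, MADV_HUGEPAGE);
		memset(p, 1, LEN);

		/* nonzero ShmemPmdMapped means we got huge pages */
		system("grep ShmemPmdMapped /proc/self/smaps | sort -u");

		munmap(p, LEN);
		close(fd);
		unlink("/dev/shm/thp-test");
		return 0;
	}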