From: Muchun Song
To: corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
 mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
 dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
 viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
 mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
 rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
 jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
 willy@infradead.org, osalvador@suse.de, mhocko@suse.com,
 song.bao.hua@hisilicon.com, david@redhat.com, naoya.horiguchi@nec.com,
 joao.m.martins@oracle.com
Cc: duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 linux-fsdevel@vger.kernel.org, Muchun Song
Subject: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Date: Mon, 8 Feb 2021 16:50:09 +0800
Message-Id:
 <20210208085013.89436-5-songmuchun@bytedance.com>
X-Mailer: git-send-email 2.21.0 (Apple Git-122)
In-Reply-To: <20210208085013.89436-1-songmuchun@bytedance.com>
References: <20210208085013.89436-1-songmuchun@bytedance.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

When we free a HugeTLB page to the buddy allocator, we should allocate
the vmemmap pages associated with it. However, we may be unable to
allocate the vmemmap pages when the system is under memory pressure. In
this case, we just refuse to free the HugeTLB page instead of looping
forever trying to allocate the pages.

Signed-off-by: Muchun Song
---
 include/linux/mm.h   |  2 ++
 mm/hugetlb.c         | 19 ++++++++++++-
 mm/hugetlb_vmemmap.c | 30 +++++++++++++++++++++
 mm/hugetlb_vmemmap.h |  6 +++++
 mm/sparse-vmemmap.c  | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 5 files changed, 130 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d7dddf334779..33c5911afe18 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2981,6 +2981,8 @@ static inline void print_vma_addr(char *prefix, unsigned long rip)
 
 void vmemmap_remap_free(unsigned long start, unsigned long end,
 			unsigned long reuse);
+int vmemmap_remap_alloc(unsigned long start, unsigned long end,
+			unsigned long reuse, gfp_t gfp_mask);
 
 void *sparse_buffer_alloc(unsigned long size);
 struct page * __populate_section_memmap(unsigned long pfn,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4cfca27c6d32..69dcbaa2e6db 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1397,16 +1397,26 @@ static void __free_huge_page(struct page *page)
 		h->resv_huge_pages++;
 
 	if (HPageTemporary(page)) {
-		list_del(&page->lru);
 		ClearHPageTemporary(page);
+
+		if (alloc_huge_page_vmemmap(h, page)) {
+			h->surplus_huge_pages++;
+			h->surplus_huge_pages_node[nid]++;
+			goto enqueue;
+		}
+		list_del(&page->lru);
 		update_and_free_page(h, page);
 	} else if (h->surplus_huge_pages_node[nid]) {
+		if (alloc_huge_page_vmemmap(h, page))
+			goto enqueue;
+
 		/* remove the page from active list */
 		list_del(&page->lru);
 		update_and_free_page(h, page);
 		h->surplus_huge_pages--;
 		h->surplus_huge_pages_node[nid]--;
 	} else {
+enqueue:
 		arch_clear_hugepage_flags(page);
 		enqueue_huge_page(h, page);
 	}
@@ -1693,6 +1703,10 @@ static int free_pool_huge_page(struct hstate *h, nodemask_t *nodes_allowed,
 			struct page *page =
 				list_entry(h->hugepage_freelists[node].next,
 					  struct page, lru);
+
+			if (alloc_huge_page_vmemmap(h, page))
+				break;
+
 			list_del(&page->lru);
 			h->free_huge_pages--;
 			h->free_huge_pages_node[node]--;
@@ -1760,6 +1774,9 @@ int dissolve_free_huge_page(struct page *page)
 			goto retry;
 		}
 
+		if (alloc_huge_page_vmemmap(h, head))
+			goto out;
+
 		/*
 		 * Move PageHWPoison flag from head page to the raw error page,
 		 * which makes any subpages rather than the error page reusable.
diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 0209b736e0b4..3d85e3ab7caa 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -169,6 +169,8 @@
  * (last) level. So this type of HugeTLB page can be optimized only when its
  * size of the struct page structs is greater than 2 pages.
  */
+#define pr_fmt(fmt)	"HugeTLB: " fmt
+
 #include "hugetlb_vmemmap.h"
 
 /*
@@ -198,6 +200,34 @@ static inline unsigned long free_vmemmap_pages_size_per_hpage(struct hstate *h)
 	return (unsigned long)free_vmemmap_pages_per_hpage(h) << PAGE_SHIFT;
 }
 
+int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
+{
+	int ret;
+	unsigned long vmemmap_addr = (unsigned long)head;
+	unsigned long vmemmap_end, vmemmap_reuse;
+
+	if (!free_vmemmap_pages_per_hpage(h))
+		return 0;
+
+	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
+	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
+	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
+
+	/*
+	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
+	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
+	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
+	 * When a HugeTLB page is freed to the buddy allocator, previously
+	 * discarded vmemmap pages must be allocated and remapped.
+	 */
+	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
+				  GFP_ATOMIC | __GFP_NOWARN | __GFP_THISNODE);
+	if (ret == -ENOMEM)
+		pr_info("cannot alloc vmemmap pages\n");
+
+	return ret;
+}
+
 void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 	unsigned long vmemmap_addr = (unsigned long)head;
diff --git a/mm/hugetlb_vmemmap.h b/mm/hugetlb_vmemmap.h
index 6923f03534d5..e5547d53b9f5 100644
--- a/mm/hugetlb_vmemmap.h
+++ b/mm/hugetlb_vmemmap.h
@@ -11,8 +11,14 @@
 #include
 
 #ifdef CONFIG_HUGETLB_PAGE_FREE_VMEMMAP
+int alloc_huge_page_vmemmap(struct hstate *h, struct page *head);
 void free_huge_page_vmemmap(struct hstate *h, struct page *head);
 #else
+static inline int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
+{
+	return 0;
+}
+
 static inline void free_huge_page_vmemmap(struct hstate *h, struct page *head)
 {
 }
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d3076a7a3783..60fc6cd6cd23 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -40,7 +40,8 @@
  * @remap_pte:		called for each lowest-level entry (PTE).
  * @reuse_page:		the page which is reused for the tail vmemmap pages.
  * @reuse_addr:		the virtual address of the @reuse_page page.
- * @vmemmap_pages:	the list head of the vmemmap pages that can be freed.
+ * @vmemmap_pages:	the list head of the vmemmap pages that can be freed
+ *			or is mapped from.
  */
 struct vmemmap_remap_walk {
 	void (*remap_pte)(pte_t *pte, unsigned long addr,
@@ -237,6 +238,78 @@ void vmemmap_remap_free(unsigned long start, unsigned long end,
 	free_vmemmap_page_list(&vmemmap_pages);
 }
 
+static void vmemmap_restore_pte(pte_t *pte, unsigned long addr,
+				struct vmemmap_remap_walk *walk)
+{
+	pgprot_t pgprot = PAGE_KERNEL;
+	struct page *page;
+	void *to;
+
+	BUG_ON(pte_page(*pte) != walk->reuse_page);
+
+	page = list_first_entry(walk->vmemmap_pages, struct page, lru);
+	list_del(&page->lru);
+	to = page_to_virt(page);
+	copy_page(to, (void *)walk->reuse_addr);
+
+	set_pte_at(&init_mm, addr, pte, mk_pte(page, pgprot));
+}
+
+static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
+				   gfp_t gfp_mask, struct list_head *list)
+{
+	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
+	int nid = page_to_nid((struct page *)start);
+	struct page *page, *next;
+
+	while (nr_pages--) {
+		page = alloc_pages_node(nid, gfp_mask, 0);
+		if (!page)
+			goto out;
+		list_add_tail(&page->lru, list);
+	}
+
+	return 0;
+out:
+	list_for_each_entry_safe(page, next, list, lru)
+		__free_pages(page, 0);
+	return -ENOMEM;
+}
+
+/**
+ * vmemmap_remap_alloc - remap the vmemmap virtual address range [@start, @end)
+ *			 to the page which is from the @vmemmap_pages
+ *			 respectively.
+ * @start:	start address of the vmemmap virtual address range that we want
+ *		to remap.
+ * @end:	end address of the vmemmap virtual address range that we want to
+ *		remap.
+ * @reuse:	reuse address.
+ * @gfp_mask:	GFP flag for allocating vmemmap pages.
+ */
+int vmemmap_remap_alloc(unsigned long start, unsigned long end,
+			unsigned long reuse, gfp_t gfp_mask)
+{
+	LIST_HEAD(vmemmap_pages);
+	struct vmemmap_remap_walk walk = {
+		.remap_pte	= vmemmap_restore_pte,
+		.reuse_addr	= reuse,
+		.vmemmap_pages	= &vmemmap_pages,
+	};
+
+	/* See the comment in the vmemmap_remap_free(). */
+	BUG_ON(start - reuse != PAGE_SIZE);
+
+	might_sleep_if(gfpflags_allow_blocking(gfp_mask));
+
+	if (alloc_vmemmap_page_list(start, end, gfp_mask, &vmemmap_pages))
+		return -ENOMEM;
+
+	vmemmap_remap_range(reuse, end, &walk);
+
+	return 0;
+}
+
 /*
  * Allocate a block of memory to be used to back the virtual memory map
  * or to back the page tables that are used to create the mapping.
-- 
2.11.0
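
For readers who want to see the all-or-nothing behaviour of
alloc_vmemmap_page_list() in isolation, here is a minimal user-space
sketch. It is not kernel code: struct fake_page, fake_alloc_page(),
fake_free_page(), alloc_budget and live_pages are hypothetical names
made up for illustration, malloc() stands in for alloc_pages_node(),
and a singly linked list stands in for struct list_head. The sketch
returns -1 where the kernel function returns -ENOMEM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a page on a list; not a kernel type. */
struct fake_page {
	struct fake_page *next;
};

/* Test hooks (assumptions): remaining allocations before failure. */
static long alloc_budget = -1;	/* -1 means "never fail" */
static long live_pages;		/* pages currently allocated */

static struct fake_page *fake_alloc_page(void)
{
	struct fake_page *page;

	if (alloc_budget == 0)
		return NULL;	/* simulated memory pressure */
	if (alloc_budget > 0)
		alloc_budget--;

	page = malloc(sizeof(*page));
	if (page)
		live_pages++;
	return page;
}

static void fake_free_page(struct fake_page *page)
{
	free(page);
	live_pages--;
}

/*
 * Put nr pages on *list, or none at all: on the first failed
 * allocation every page taken so far is released and -1 is returned.
 */
static int alloc_page_list(unsigned long nr, struct fake_page **list)
{
	struct fake_page *page;

	assert(list != NULL);
	*list = NULL;

	while (nr--) {
		page = fake_alloc_page();
		if (!page)
			goto out;
		page->next = *list;
		*list = page;
	}

	return 0;
out:
	while (*list) {
		page = *list;
		*list = page->next;
		fake_free_page(page);
	}
	return -1;
}
```

Because a failed call leaves nothing allocated, a caller can simply
keep the HugeTLB page in the pool and try again later, which is what
the hugetlb.c hunks do with "goto enqueue", "break" and "goto out".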
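
The address arithmetic in alloc_huge_page_vmemmap() can be checked with
a small user-space sketch. The constants below are assumptions for
illustration and are not taken from this patch: a 4 KiB base page, a
64-byte struct page, and two reserved vmemmap pages per HugeTLB page.
The SK_ prefixed macros, struct vmemmap_range and the helper functions
are hypothetical names.

```c
#include <assert.h>

/* Assumed values for illustration only; not taken from the patch. */
#define SK_PAGE_SHIFT		12UL
#define SK_PAGE_SIZE		(1UL << SK_PAGE_SHIFT)
#define SK_STRUCT_PAGE_SIZE	64UL
#define SK_RESERVE_NR		2UL	/* vmemmap pages kept per HugeTLB page */

struct vmemmap_range {
	unsigned long start;	/* first address to remap */
	unsigned long end;	/* one past the last address to remap */
	unsigned long reuse;	/* address of the page that was shared */
};

/* Number of vmemmap pages holding the struct pages of one huge page. */
static unsigned long vmemmap_pages_per_hpage(unsigned long hpage_size)
{
	unsigned long nr_struct_pages = hpage_size >> SK_PAGE_SHIFT;

	return nr_struct_pages * SK_STRUCT_PAGE_SIZE / SK_PAGE_SIZE;
}

/* Number of vmemmap pages that were freed and must be allocated again. */
static unsigned long free_vmemmap_pages(unsigned long hpage_size)
{
	unsigned long total = vmemmap_pages_per_hpage(hpage_size);

	return total > SK_RESERVE_NR ? total - SK_RESERVE_NR : 0;
}

/* Same arithmetic as alloc_huge_page_vmemmap(), on plain integers. */
static struct vmemmap_range vmemmap_range_for(unsigned long head,
					      unsigned long hpage_size)
{
	struct vmemmap_range r;

	r.start = head + SK_RESERVE_NR * SK_PAGE_SIZE;
	r.end = r.start + free_vmemmap_pages(hpage_size) * SK_PAGE_SIZE;
	r.reuse = r.start - SK_PAGE_SIZE;
	return r;
}
```

Under these assumptions a 2 MiB HugeTLB page has 8 vmemmap pages, of
which 6 have to be allocated again before the page can go back to the
buddy allocator. The reuse address always sits exactly one page below
the start of the range, which is the condition vmemmap_remap_alloc()
checks with BUG_ON().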