From: Alex Ghiti <alex@ghiti.fr>
Subject: Re: [PATCH v3] hugetlb: allow to free gigantic pages regardless of the configuration
To: Dave Hansen, Vlastimil Babka, Catalin Marinas, Will Deacon,
 Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
 Martin Schwidefsky, Heiko Carstens, Thomas Gleixner, Ingo Molnar,
 Borislav Petkov, "H. Peter Anvin",
Peter Anvin" , x86@kernel.org, Dave Hansen , Andy Lutomirski , Peter Zijlstra , Alexander Viro , Mike Kravetz , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org References: <20190214193100.3529-1-alex@ghiti.fr> Message-ID: <37046a52-a0eb-cb1a-0a72-601cdee45917@ghiti.fr> Date: Sun, 17 Feb 2019 12:06:26 -0500 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Content-Language: sv-FI Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 2/15/19 12:34 PM, Dave Hansen wrote: >> -#if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA) >> +#ifdef CONFIG_CONTIG_ALLOC >> /* The below functions must be run on a range from a single zone. */ >> extern int alloc_contig_range(unsigned long start, unsigned long end, >> unsigned migratetype, gfp_t gfp_mask); >> -extern void free_contig_range(unsigned long pfn, unsigned nr_pages); >> #endif >> +extern void free_contig_range(unsigned long pfn, unsigned int nr_pages); > There's a lot of stuff going on in this patch. Adding/removing config > options. Please get rid of these superfluous changes or at least break > them out. I agree that this patch does a lot of things. I am going at least to split it into 2 separate patches, one suggested-by Vlastimil regarding the renaming of MEMORY_ISOLATION && COMPACTION || CMA, and another that indeed does what was primarily intended. >> #ifdef CONFIG_CMA >> /* CMA stuff */ >> diff --git a/mm/Kconfig b/mm/Kconfig >> index 25c71eb8a7db..138a8df9b813 100644 >> --- a/mm/Kconfig >> +++ b/mm/Kconfig >> @@ -252,12 +252,17 @@ config MIGRATION >> pages as migration can relocate pages to satisfy a huge page >> allocation instead of reclaiming. >> >> + >> config ARCH_ENABLE_HUGEPAGE_MIGRATION >> bool > Like this. :) My apologies for that. >> config ARCH_ENABLE_THP_MIGRATION >> bool >> >> +config CONTIG_ALLOC >> + def_bool y >> + depends on (MEMORY_ISOLATION && COMPACTION) || CMA >> + >> config PHYS_ADDR_T_64BIT >> def_bool 64BIT > Please think carefully though the Kconfig dependencies. 'select' is > *not* the same as 'depends on'. > > This replaces a bunch of arch-specific "select ARCH_HAS_GIGANTIC_PAGE" > with a 'depends on'. I *think* that ends up being OK, but it absolutely > needs to be addressed in the changelog about why *you* think it is OK > and why it doesn't change the functionality of any of the patched > architetures. Ok. >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index afef61656c1e..e686c92212e9 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -1035,7 +1035,6 @@ static int hstate_next_node_to_free(struct hstate *h, nodemask_t *nodes_allowed) >> ((node = hstate_next_node_to_free(hs, mask)) || 1); \ >> nr_nodes--) >> >> -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE >> static void destroy_compound_gigantic_page(struct page *page, >> unsigned int order) >> { > Whats the result of this #ifdef removal? A universally larger kernel > even for architectures that do not support runtime gigantic page > alloc/free? That doesn't seem like a good thing. Ok, I agree, now that we removed the "wrong" definition of ARCH_HAS_GIGANTIC_PAGE, we can actually use this define for architectures to show they support gigantic pages and avoid the problem you mention. 
Thanks.

>> @@ -1058,6 +1057,12 @@ static void free_gigantic_page(struct page *page, unsigned int order)
>>  	free_contig_range(page_to_pfn(page), 1 << order);
>>  }
>>
>> +static inline bool gigantic_page_runtime_allocation_supported(void)
>> +{
>> +	return IS_ENABLED(CONFIG_CONTIG_ALLOC);
>> +}
>
> Why bother having this function?  Why don't the callers just check the
> config option directly?

Ok. This function is only used once, in set_max_huge_pages, where you
mention the need for a comment, so I can get rid of it. Thanks.

>> +#ifdef CONFIG_CONTIG_ALLOC
>>  static int __alloc_gigantic_page(unsigned long start_pfn,
>>  				unsigned long nr_pages, gfp_t gfp_mask)
>>  {
>> @@ -1143,22 +1148,15 @@ static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>>  static void prep_new_huge_page(struct hstate *h, struct page *page, int nid);
>>  static void prep_compound_gigantic_page(struct page *page, unsigned int order);
>>
>> -#else /* !CONFIG_ARCH_HAS_GIGANTIC_PAGE */
>> -static inline bool gigantic_page_supported(void) { return false; }
>> +#else /* !CONFIG_CONTIG_ALLOC */
>>  static struct page *alloc_gigantic_page(struct hstate *h, gfp_t gfp_mask,
>>  		int nid, nodemask_t *nodemask) { return NULL; }
>> -static inline void free_gigantic_page(struct page *page, unsigned int order) { }
>> -static inline void destroy_compound_gigantic_page(struct page *page,
>> -						unsigned int order) { }
>>  #endif
>>
>>  static void update_and_free_page(struct hstate *h, struct page *page)
>>  {
>>  	int i;
>>
>> -	if (hstate_is_gigantic(h) && !gigantic_page_supported())
>> -		return;
>
> I don't get the point of removing this check.  Logically, this reads
> as checking if the architecture supports gigantic hstates and has
> nothing to do with allocation.

I think this check was wrong from the beginning:
gigantic_page_supported() only checked
(MEMORY_ISOLATION && COMPACTION) || CMA, which has nothing to do with
the capability to free gigantic pages.

I then went through all the architectures to see if removing this test
could affect any of them, and I noticed that if an architecture
supports gigantic pages without advertising it with
ARCH_HAS_GIGANTIC_PAGE, this code decrements the number of free huge
pages without actually freeing the pages. I found at least 2
architectures that have gigantic pages but allow neither runtime
allocation nor freeing of those pages, because they do not define the
(wrong) ARCH_HAS_GIGANTIC_PAGE:

- ia64 has HPAGE_SHIFT_DEFAULT = 28, with PAGE_SHIFT = 14,
- sh has max HPAGE_SHIFT = 29 and max PAGE_SHIFT = 16.

With the default MAX_ORDER = 11, both architectures support gigantic
pages. So I am going to propose a patch that selects the (right)
ARCH_HAS_GIGANTIC_PAGE for those architectures, because I think they
should be able to free their boottime gigantic pages.

Regarding this check, we can either remove it, if we are sure that
every architecture that has gigantic pages selects
ARCH_HAS_GIGANTIC_PAGE, or leave it in case some future architecture
forgets to select it. I would rather patch all architectures so that
they can at least free gigantic pages, and then remove the test, since
hstate_is_gigantic would then imply gigantic_page_supported. I will
propose something like that if you agree.
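For ia64, for example, I expect the change to be as small as this
(sketch only, I still need to check the exact placement of the select
in arch/ia64/Kconfig, and the same idea applies to sh in
arch/sh/Kconfig):

 config IA64
 	...
+	select ARCH_HAS_GIGANTIC_PAGE

 config SUPERH
 	...
+	select ARCH_HAS_GIGANTIC_PAGE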
>>  	h->nr_huge_pages--;
>>  	h->nr_huge_pages_node[page_to_nid(page)]--;
>>  	for (i = 0; i < pages_per_huge_page(h); i++) {
>> @@ -2276,13 +2274,20 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
>>  }
>>
>>  #define persistent_huge_pages(h) (h->nr_huge_pages - h->surplus_huge_pages)
>> -static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
>> +static int set_max_huge_pages(struct hstate *h, unsigned long count,
>>  			nodemask_t *nodes_allowed)
>>  {
>>  	unsigned long min_count, ret;
>>
>> -	if (hstate_is_gigantic(h) && !gigantic_page_supported())
>> -		return h->max_huge_pages;
>> +	if (hstate_is_gigantic(h) &&
>> +		!gigantic_page_runtime_allocation_supported()) {
>
> The indentation here is wrong and reduces readability.  Needs to be
> like this:
>
> 	if (hstate_is_gigantic(h) &&
> 	    !gigantic_page_runtime_allocation_supported()) {

This will disappear with your previous remark, thanks.

>> +		spin_lock(&hugetlb_lock);
>> +		if (count > persistent_huge_pages(h)) {
>> +			spin_unlock(&hugetlb_lock);
>> +			return -EINVAL;
>> +		}
>> +		goto decrease_pool;
>> +	}
>
> Needs comments.
>
> 	/* Gigantic pages can be freed but not allocated */
>
> or something.

Ok, I agree. I will add that, plus another sentence about the removal
of gigantic_page_runtime_allocation_supported.

Thank you Dave for your comments!

Alex
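P.S.: To be concrete, here is roughly the shape I have in mind for the
beginning of set_max_huge_pages once the helper is removed (an untested
sketch; the exact wording of the comment is still open):

static int set_max_huge_pages(struct hstate *h, unsigned long count,
			nodemask_t *nodes_allowed)
{
	unsigned long min_count, ret;

	/*
	 * Gigantic pages can be freed but not allocated at runtime
	 * when CONTIG_ALLOC is not set, so only allow the pool to
	 * shrink in that case.
	 */
	if (hstate_is_gigantic(h) && !IS_ENABLED(CONFIG_CONTIG_ALLOC)) {
		spin_lock(&hugetlb_lock);
		if (count > persistent_huge_pages(h)) {
			spin_unlock(&hugetlb_lock);
			return -EINVAL;
		}
		goto decrease_pool;
	}
	...
}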