Received: by 2002:ac0:bc90:0:0:0:0:0 with SMTP id a16csp5135566img; Wed, 27 Mar 2019 02:50:30 -0700 (PDT) X-Google-Smtp-Source: APXvYqzPiSFteqKtYMwkHr8yYW1B/pRdm6FqCh6pgj2t2XRBvXg+f+S03lBfyhgkX7+fQj6SzGiX X-Received: by 2002:a62:293:: with SMTP id 141mr33824023pfc.245.1553680230245; Wed, 27 Mar 2019 02:50:30 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1553680230; cv=none; d=google.com; s=arc-20160816; b=t3b71Jc9fvJU7XxbjDEfDRwiFEmnTqG4BlBKEaRGVA2/nEZ/2VwSxEh87pv1IJhbHX EwjKy1A4cJzjuu4u30o0GfNC5rJCZTMLofAfXW/Qvh49FLY5cj+VtT1p412Je9xkiwv7 CIjH3r4wL3eZyVhCmaJ3SAK0IAjYVdQShfj1+34hPyGLN33QYna42r4cpgdrnNHtgwLO 1T9wwlRIqEP49aJonB88u1VzmLiRRth1CUBNiM+rnEAVRdwNdj3e70OzDJN5DKreke3f AMcstlR9Yx7hThxdtOecH0z7C23/6zjzaIgQ13NtY4h3GsST8dvo0TOBTEAooRx0rigz nVeg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=list-id:precedence:sender:content-language :content-transfer-encoding:in-reply-to:mime-version:user-agent:date :message-id:from:references:to:subject; bh=KRQOYa8+1E/3fz3sNF3uRKhF9otmQPsHxlcaig41dTs=; b=Z5bNVNm4ka8rkZKQmLSWLWLGmgrM8kM5QxxeqLyOXudV/SlkDJ6w8qCZ7WLoRt6c6B GGpOrSZ+dlOkJThL/xKQvlrIZJaZFUPFE+76KMF9KwIkmzTIJC/9bezhqA5XvmkbPIpL y1UCLWwl9WKkXwMWG868F5vGlzhCW0MSSTOYgWUO/CoFqnQwHndpy4FA7GmxOSHrx7qi anlIN+Kkj0FGoL9ODYXWCqfDENsR6Eu2TesOsJP62/wN0T1dSvjyAiGUD/yFoqEYdc2Z deP6g1NpL8yVWNgVvHCOFOX3p7Dpk47R8KrqTvEzqrFmbCzba+VoAih9HDSgp99u/6qX CyPA== ARC-Authentication-Results: i=1; mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Return-Path: Received: from vger.kernel.org (vger.kernel.org. [209.132.180.67]) by mx.google.com with ESMTP id z64si3224739pgd.106.2019.03.27.02.50.15; Wed, 27 Mar 2019 02:50:30 -0700 (PDT) Received-SPF: pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) client-ip=209.132.180.67; Authentication-Results: mx.google.com; spf=pass (google.com: best guess record for domain of linux-kernel-owner@vger.kernel.org designates 209.132.180.67 as permitted sender) smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1732717AbfC0JtX (ORCPT + 99 others); Wed, 27 Mar 2019 05:49:23 -0400 Received: from relay12.mail.gandi.net ([217.70.178.232]:58115 "EHLO relay12.mail.gandi.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725786AbfC0JtW (ORCPT ); Wed, 27 Mar 2019 05:49:22 -0400 Received: from [10.30.1.20] (lneuilly-657-1-5-103.w81-250.abo.wanadoo.fr [81.250.144.103]) (Authenticated sender: alex@ghiti.fr) by relay12.mail.gandi.net (Postfix) with ESMTPSA id 2175620000E; Wed, 27 Mar 2019 09:49:13 +0000 (UTC) Subject: Re: [PATCH v8 4/4] hugetlb: allow to free gigantic pages regardless of the configuration To: "Aneesh Kumar K.V" , mpe@ellerman.id.au, Andrew Morton , Vlastimil Babka , Catalin Marinas , Will Deacon , Benjamin Herrenschmidt , Paul Mackerras , Martin Schwidefsky , Heiko Carstens , Yoshinori Sato , Rich Felker , "David S . Miller" , Thomas Gleixner , Ingo Molnar , Borislav Petkov , "H . Peter Anvin" , x86@kernel.org, Dave Hansen , Andy Lutomirski , Peter Zijlstra , Mike Kravetz , linux-arm-kernel@lists.infradead.org, linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org, linux-sh@vger.kernel.org, sparclinux@vger.kernel.org, linux-mm@kvack.org References: <20190327063626.18421-1-alex@ghiti.fr> <20190327063626.18421-5-alex@ghiti.fr> From: Alexandre Ghiti Message-ID: Date: Wed, 27 Mar 2019 10:48:54 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: fr Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On 03/27/2019 09:55 AM, Aneesh Kumar K.V wrote: > On 3/27/19 2:14 PM, Alexandre Ghiti wrote: >> >> >> On 03/27/2019 08:01 AM, Aneesh Kumar K.V wrote: >>> On 3/27/19 12:06 PM, Alexandre Ghiti wrote: >>>> On systems without CONTIG_ALLOC activated but that support gigantic >>>> pages, >>>> boottime reserved gigantic pages can not be freed at all. This patch >>>> simply enables the possibility to hand back those pages to memory >>>> allocator. >>>> >>>> Signed-off-by: Alexandre Ghiti >>>> Acked-by: David S. Miller [sparc] >>>> >>>> diff --git a/arch/powerpc/include/asm/book3s/64/hugetlb.h >>>> b/arch/powerpc/include/asm/book3s/64/hugetlb.h >>>> index ec2a55a553c7..7013284f0f1b 100644 >>>> --- a/arch/powerpc/include/asm/book3s/64/hugetlb.h >>>> +++ b/arch/powerpc/include/asm/book3s/64/hugetlb.h >>>> @@ -36,8 +36,8 @@ static inline int hstate_get_psize(struct hstate >>>> *hstate) >>>>       } >>>>   } >>>>   -#ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE >>>> -static inline bool gigantic_page_supported(void) >>>> +#define __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED >>>> +static inline bool gigantic_page_runtime_supported(void) >>>>   { >>>>       /* >>>>        * We used gigantic page reservation with hypervisor assist >>>> in some case. >>>> @@ -49,7 +49,6 @@ static inline bool gigantic_page_supported(void) >>>>         return true; >>>>   } >>>> -#endif >>>>     /* hugepd entry valid bit */ >>>>   #define HUGEPD_VAL_BITS        (0x8000000000000000UL) >>> >>> Is that correct when CONTIG_ALLOC is not enabled? I guess we want >>> >>> gigantic_page_runtime_supported to return false when CONTIG_ALLOC is >>> not enabled on all architectures and on POWER when it is enabled we >>> want it to be conditional as it is now. >>> >>> -aneesh >>> >> >> CONFIG_ARCH_HAS_GIGANTIC_PAGE is set by default when an architecture >> supports gigantic >> pages: on its own, it allows to allocate boottime gigantic pages AND >> to free them at runtime >> (this is the goal of this series), but not to allocate runtime >> gigantic pages. >> If CONTIG_ALLOC is set, it allows in addition to allocate runtime >> gigantic pages. >> >> I re-introduced the runtime checks because we can't know at compile >> time if powerpc can >> or not support gigantic pages. >> >> So for all architectures, gigantic_page_runtime_supported only >> depends on >> CONFIG_ARCH_HAS_GIGANTIC_PAGE enabled or not. The possibility to >> allocate runtime >> gigantic pages is dealt with after those runtime checks. >> > > you removed that #ifdef in the patch above. ie we had > #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE > static inline bool gigantic_page_supported(void) > { >     /* >      * We used gigantic page reservation with hypervisor assist in > some case. >      * We cannot use runtime allocation of gigantic pages in those > platforms >      * This is hash translation mode LPARs. >      */ >     if (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled()) >         return false; > >     return true; > } > #endif Yes, I removed the #ifdef CONFIG_ARCH_HAS_GIGANTIC_PAGE because it was defined only if CONTIG_ALLOC was set. But now, CONFIG_ARCH_HAS_GIGANTIC_PAGE is inconditionally set for powerpc so I think we don't need it anymore. Actually I have doubts now, is this true for all configurations ? I see that it is only set for PPC_RADIX_MMU. I think the problem is here: instead of returning true, it should do like the generic version, ie return IS_ENABLED(CONFIG_ARCH_HAS_GIGANTIC_PAGE). Do you agree ? > > > This is now > #define __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED > static inline bool gigantic_page_runtime_supported(void) > { > if (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled()) >         return false; > >     return true; > } > > > I am wondering whether it should be > > #define __HAVE_ARCH_GIGANTIC_PAGE_RUNTIME_SUPPORTED > static inline bool gigantic_page_runtime_supported(void) > { > >    if (!IS_ENABLED(CONFIG_CONTIG_ALLOC)) >         return false; I don't think this test should happen here, CONFIG_CONTIG_ALLOC only allows to allocate gigantic pages, doing that check here would prevent powerpc to free boottime gigantic pages when not a guest. Note that this check is actually done in set_max_huge_pages. > > if (firmware_has_feature(FW_FEATURE_LPAR) && !radix_enabled()) >         return false; Maybe I did not understand this check: I understood that, in the case the system is virtualized, we do not want it to hand back gigantic pages. Does this check test if the system is currently being virtualized ? If yes, I think the patch is correct: it prevents freeing gigantic pages when the system is virtualized but allows a 'normal' system to free gigantic pages. > >     return true; > } > > or add that #ifdef back. > >> By the way, I forgot to ask you why you think that if an arch cannot >> allocate runtime gigantic >> pages, it should not be able to free boottime gigantic pages ? >> > > on virtualized platforms like powervm which use a paravirtualized page > table update mechanism (we dont' have two level table). The ability to > map a page huge depends on how hypervisor allocated the guest ram. > Hypervisor also allocates the guest specific page table of a specific > size depending on how many pages are going to be mapped by what page > size. > > on POWER we indicate possible guest real address that can be mapped > via hugepage (in this case 16G) using a device tree node > (ibm,expected#pages) . It is expected that we will map these pages > only as 16G pages. Hence we cannot free them back to the buddy where > it could get mapped via 64K page size. Thanks for the explanations. Alex > > -aneesh > >