Date: Wed, 29 Jun 2022 14:39:17 +0800
From: Muchun Song
To: James Houghton
Cc: Mike Kravetz, Peter Xu, David Hildenbrand, David Rientjes,
    Axel Rasmussen, Mina Almasry, Jue Wang, Manish Mishra,
    "Dr . David Alan Gilbert", linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 02/26] hugetlb: sort hstates in hugetlb_init_hstates
References: <20220624173656.2033256-1-jthoughton@google.com>
    <20220624173656.2033256-3-jthoughton@google.com>

On Tue, Jun 28, 2022 at 08:40:27AM -0700, James Houghton wrote:
> On Mon, Jun 27, 2022 at 11:42 AM Mike Kravetz wrote:
> >
> > On 06/24/22 17:36, James Houghton wrote:
> > > When using HugeTLB high-granularity mapping, we need to go through the
> > > supported hugepage sizes in decreasing order so that we pick the largest
> > > size that works. Consider the case where we're faulting in a 1G hugepage
> > > for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
> > > a PUD. By going through the sizes in decreasing order, we will find that
> > > PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.
> > >
> > > Signed-off-by: James Houghton
> > > ---
> > >  mm/hugetlb.c | 40 +++++++++++++++++++++++++++++++++++++---
> > >  1 file changed, 37 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > > index a57e1be41401..5df838d86f32 100644
> > > --- a/mm/hugetlb.c
> > > +++ b/mm/hugetlb.c
> > > @@ -33,6 +33,7 @@
> > >  #include
> > >  #include
> > >  #include
> > > +#include
> > >
> > >  #include
> > >  #include
> > > @@ -48,6 +49,10 @@
> > >
> > >  int hugetlb_max_hstate __read_mostly;
> > >  unsigned int default_hstate_idx;
> > > +/*
> > > + * After hugetlb_init_hstates is called, hstates will be sorted from largest
> > > + * to smallest.
> > > + */
> > >  struct hstate hstates[HUGE_MAX_HSTATE];
> > >
> > >  #ifdef CONFIG_CMA
> > > @@ -3144,14 +3149,43 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
> > >  	kfree(node_alloc_noretry);
> > >  }
> > >
> > > +static int compare_hstates_decreasing(const void *a, const void *b)
> > > +{
> > > +	const int shift_a = huge_page_shift((const struct hstate *)a);
> > > +	const int shift_b = huge_page_shift((const struct hstate *)b);
> > > +
> > > +	if (shift_a < shift_b)
> > > +		return 1;
> > > +	if (shift_a > shift_b)
> > > +		return -1;
> > > +	return 0;
> > > +}
> > > +
> > > +static void sort_hstates(void)
> > > +{
> > > +	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
> > > +
> > > +	/* Sort from largest to smallest. */
> > > +	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
> > > +	     compare_hstates_decreasing, NULL);
> > > +
> > > +	/*
> > > +	 * We may have changed the location of the default hstate, so we need to
> > > +	 * update it.
> > > +	 */
> > > +	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
> > > +}
> > > +
> > >  static void __init hugetlb_init_hstates(void)
> > >  {
> > >  	struct hstate *h, *h2;
> > >
> > > -	for_each_hstate(h) {
> > > -		if (minimum_order > huge_page_order(h))
> > > -			minimum_order = huge_page_order(h);
> > > +	sort_hstates();
> > >
> > > +	/* The last hstate is now the smallest. */
> > > +	minimum_order = huge_page_order(&hstates[hugetlb_max_hstate - 1]);
> > > +
> > > +	for_each_hstate(h) {
> > >  		/* oversize hugepages were init'ed in early boot */
> > >  		if (!hstate_is_gigantic(h))
> > >  			hugetlb_hstate_alloc_pages(h);
> >
> > This may/will cause problems for gigantic hugetlb pages allocated at boot
> > time. See alloc_bootmem_huge_page() where a pointer to the associated hstate
> > is encoded within the allocated hugetlb page. These pages are added to
> > hugetlb pools by the routine gather_bootmem_prealloc() which uses the saved
> > hstate to add prep the gigantic page and add to the correct pool. Currently,
> > gather_bootmem_prealloc is called after hugetlb_init_hstates. So, changing
> > hstate order will cause errors.
> >
> > I do not see any reason why we could not call gather_bootmem_prealloc before
> > hugetlb_init_hstates to avoid this issue.
>
> Thanks for catching this, Mike. Your suggestion certainly seems to
> work, but it also seems kind of error prone. I'll have to look at the
> code more closely, but maybe it would be better if I just maintained a
> separate `struct hstate *sorted_hstate_ptrs[]`, where the original

I don't think this is a good idea. If you really rely on the order of
initialization in this patch, the easier solution is to change
huge_bootmem_page->hstate to huge_bootmem_page->hugepagesz. Then we can use
size_to_hstate(huge_bootmem_page->hugepagesz) in gather_bootmem_prealloc().

Thanks.

> locations of the hstates remain unchanged, as to not break
> gather_bootmem_prealloc/other things.
>
> > --
> > Mike Kravetz
>
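
To illustrate why saving the size rather than the pointer survives the sort,
here is a rough user-space sketch. It is not kernel code: struct hstate is
cut down to a single field, struct bootmem_page is a hypothetical stand-in
for huge_bootmem_page that carries both variants side by side, add_hstate()
and record_then_sort() are made-up helpers, PAGE_SHIFT is assumed to be 12,
and qsort() replaces the kernel's sort().

```c
#include <assert.h>
#include <stdlib.h>

#define PAGE_SHIFT 12	/* assumption: 4K base pages */
#define MAX_HSTATE 4

/* Simplified user-space stand-in for the kernel's struct hstate. */
struct hstate {
	unsigned int order;	/* hugepage size is 1 << (order + PAGE_SHIFT) */
};

/* Hypothetical boot-time record carrying both variants for comparison. */
struct bootmem_page {
	struct hstate *hstate;		/* saved pointer: stale after sorting */
	unsigned long hugepagesz;	/* saved size: still valid after sorting */
};

static struct hstate hstates[MAX_HSTATE];
static int max_hstate;

unsigned long huge_page_size(const struct hstate *h)
{
	return 1UL << (h->order + PAGE_SHIFT);
}

/* Made-up helper: register an hstate in the next free slot. */
struct hstate *add_hstate(unsigned int order)
{
	assert(max_hstate < MAX_HSTATE);
	hstates[max_hstate].order = order;
	return &hstates[max_hstate++];
}

/* Linear lookup by size, modelled on the kernel helper of the same name. */
struct hstate *size_to_hstate(unsigned long size)
{
	for (int i = 0; i < max_hstate; i++)
		if (huge_page_size(&hstates[i]) == size)
			return &hstates[i];
	return NULL;
}

int compare_hstates_decreasing(const void *a, const void *b)
{
	const struct hstate *ha = a, *hb = b;

	if (ha->order < hb->order)
		return 1;
	if (ha->order > hb->order)
		return -1;
	return 0;
}

void sort_hstates(void)
{
	/* Sort from largest to smallest. */
	qsort(hstates, max_hstate, sizeof(*hstates), compare_hstates_decreasing);
}

/* Record a 1G page before sorting, then sort the hstates in place. */
struct bootmem_page record_then_sort(void)
{
	struct bootmem_page m;
	struct hstate *h;

	max_hstate = 0;
	add_hstate(9);		/* 2M, registered first */
	h = add_hstate(18);	/* 1G, registered second */

	m.hstate = h;
	m.hugepagesz = huge_page_size(h);

	sort_hstates();		/* the 1G entry moves to index 0 */
	return m;
}
```

After record_then_sort(), the saved pointer refers to the slot that now holds
the 2M entry, while size_to_hstate() on the saved size still finds the 1G
entry at its new location.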