From: Muchun Song
Date: Thu, 31 Mar 2022 11:45:29 +0800
Subject: Re: [PATCH v6 4/4] mm: hugetlb_vmemmap: add hugetlb_free_vmemmap sysctl
To: Andrew Morton
Cc: Jonathan Corbet, Mike Kravetz, Luis Chamberlain, Kees Cook, Iurii Zaikin, Oscar Salvador, David Hildenbrand, Masahiro Yamada, Linux Doc Mailing List, LKML, Linux Memory Management List, Xiongchun duan, Muchun Song
In-Reply-To: <20220330193657.88f68bbf13fb198fb189bc15@linux-foundation.org>
References: <20220330153745.20465-1-songmuchun@bytedance.com> <20220330153745.20465-5-songmuchun@bytedance.com> <20220330193657.88f68bbf13fb198fb189bc15@linux-foundation.org>

On Thu, Mar 31, 2022 at 10:37 AM Andrew Morton wrote:
>
> On Wed, 30
Mar 2022 23:37:45 +0800 Muchun Song wrote:
>
> > We must add "hugetlb_free_vmemmap=on" to boot cmdline and reboot the
> > server to enable the feature of freeing vmemmap pages of HugeTLB
> > pages. Rebooting usually takes a long time. Add a sysctl to enable
> > or disable the feature at runtime without rebooting.
>
> I forget, why did we add the hugetlb_free_vmemmap option in the first
> place?  Why not just leave the feature enabled in all cases?

The 1st reason is that we disabled the PMD/huge page mapping of vmemmap
pages (in the original version), which increases the number of page table
pages. So if a user/sysadmin only uses a small number of HugeTLB pages
(as a percentage of system memory), they could end up using more memory
with hugetlb_free_vmemmap on as opposed to off. Now this tradeoff is gone.

The 2nd reason is that this feature adds more overhead in the path of
HugeTLB allocation/freeing from/to the buddy system, as Mike said in the
link [1].

"
There are still some instances where huge pages are allocated 'on the
fly' instead of being pulled from the pool. Michal pointed out the case
of page migration. It is also possible for someone to use hugetlbfs
without pre-allocating huge pages to the pool. I remember the use case
pointed out in commit 099730d67417. It says, "I have a hugetlbfs user
which is never explicitly allocating huge pages with 'nr_hugepages'.
They only set 'nr_overcommit_hugepages' and then let the pages be
allocated from the buddy allocator at fault time." In this case, I
suspect they were using 'page fault' allocation for initialization much
like someone using /proc/sys/vm/nr_hugepages. So, the overhead may not
be as noticeable.
"

For those different workloads, we introduced hugetlb_free_vmemmap and
expect users to make decisions based on their workloads.

[1] https://patchwork.kernel.org/comment/23752641/

> Furthermore, why would anyone want to tweak this at runtime?  What is
> the use case?  Where is the end-user value in all of this?
If the workload changes in the future on a server, users need to adjust
this at runtime without rebooting the server.

> > Disabling requires that there are no optimized HugeTLB pages in the
> > system. If you fail to disable it, you can set "nr_hugepages" to 0
> > and then retry.
> >
> > --- a/Documentation/admin-guide/sysctl/vm.rst
> > +++ b/Documentation/admin-guide/sysctl/vm.rst
> > @@ -561,6 +561,20 @@ Change the minimum size of the hugepage pool.
> >  See Documentation/admin-guide/mm/hugetlbpage.rst
> >
> >
> > +hugetlb_free_vmemmap
> > +====================
> > +
> > +Enable (set to 1) or disable (set to 0) the feature of optimizing vmemmap
> > +pages associated with each HugeTLB page. Once true, the vmemmap pages of
> > +subsequent allocation of HugeTLB pages from the buddy system will be
> > +optimized, whereas already allocated HugeTLB pages will not be optimized.
> > +If you fail to disable this feature, you can set "nr_hugepages" to 0 and
> > +then retry, since it is only allowed to be disabled after there are no
> > +optimized HugeTLB pages in the system.
> > +
>
> Pity the poor user who is looking at this and wondering whether it will
> improve or worsen things.  If we don't tell them, who will?  Are they
> supposed to just experiment?
>
> What can we add here to help them understand whether this might be
> beneficial?

My bad. I should explain more details to let users make better decisions.

Thanks.
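For anyone following the thread, a rough sketch of the runtime toggle under discussion, assuming this patch series is applied and the knob is exposed as vm.hugetlb_free_vmemmap (per the quoted vm.rst hunk); exact paths and semantics are subject to the final merged version:

```shell
# Sketch only: requires a kernel with this patch series and root privileges.

# Enable vmemmap optimization; only HugeTLB pages allocated from the buddy
# system after this point will have their vmemmap pages optimized.
sysctl vm.hugetlb_free_vmemmap=1

# Grow the pool; these pages are allocated with the optimization applied.
echo 16 > /proc/sys/vm/nr_hugepages

# Disabling is only allowed once no optimized HugeTLB pages remain, so
# drain the pool first (as the quoted documentation suggests), then clear
# the flag.
echo 0 > /proc/sys/vm/nr_hugepages
sysctl vm.hugetlb_free_vmemmap=0
```

This is a configuration fragment against a live kernel, not something runnable in isolation; it only illustrates the enable/drain/disable ordering the documentation describes.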