From: Muchun Song
Date: Thu, 6 May 2021 10:52:58 +0800
Subject: Re: [External] Re: [PATCH v22 6/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
To: Mike Kravetz
References: <20210430031352.45379-1-songmuchun@bytedance.com>
 <20210430031352.45379-7-songmuchun@bytedance.com>
Cc: Jonathan Corbet, Thomas Gleixner, Ingo Molnar, bp@alien8.de,
 X86 ML, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org,
 Peter Zijlstra, Alexander Viro, Andrew Morton, paulmck@kernel.org,
 pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
 anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry,
 David Rientjes, Matthew Wilcox, Oscar Salvador, Michal Hocko,
 "Song Bao Hua (Barry Song)", David Hildenbrand,
 HORIGUCHI NAOYA(堀口 直也), Joao Martins, Xiongchun duan,
 fam.zheng@bytedance.com, zhengqi.arch@bytedance.com,
 linux-doc@vger.kernel.org, LKML, Linux Memory Management List,
 linux-fsdevel
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, May 6, 2021 at 6:21 AM Mike Kravetz wrote:
>
> On 4/29/21 8:13 PM, Muchun Song wrote:
> > When we free a HugeTLB page to the buddy allocator, we need to allocate
> > the vmemmap pages associated with it. However, we may not be able to
> > allocate the vmemmap pages when the system is under memory pressure. In
> > this case, we just refuse to free the HugeTLB page. This changes behavior
> > in some corner cases as listed below:
> >
> > 1) Failing to free a huge page triggered by the user (decrease nr_pages).
> >
> >    The user needs to try again later.
> >
> > 2) Failing to free a surplus huge page when freed by the application.
> >
> >    Try again later when freeing a huge page next time.
> >
> > 3) Failing to dissolve a free huge page on ZONE_MOVABLE via
> >    offline_pages().
> >
> >    This can happen when we have plenty of ZONE_MOVABLE memory, but
> >    not enough kernel memory to allocate vmemmap pages. We may even
> >    be able to migrate huge page contents, but will not be able to
> >    dissolve the source huge page. This will prevent an offline
> >    operation and is unfortunate as memory offlining is expected to
> >    succeed on movable zones. Users that depend on memory hotplug
> >    to succeed for movable zones should carefully consider whether the
> >    memory savings gained from this feature are worth the risk of
> >    possibly not being able to offline memory in certain situations.
> >
> > 4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
> >    alloc_contig_range() - once we have that handling in place. Mainly
> >    affects CMA and virtio-mem.
> >
> >    Similar to 3). virtio-mem will handle migration errors gracefully.
> >    CMA might be able to fall back on other free areas within the CMA
> >    region.
> >
> > Vmemmap pages are allocated from the page freeing context. __GFP_NORETRY
> > is used so that those allocations are not disruptive (e.g. do not trigger
> > the OOM killer). hugetlb_lock is dropped for the allocation because a
> > non-sleeping allocation would be too fragile and could fail too easily
> > under memory pressure. GFP_ATOMIC or other modes that access memory
> > reserves are not used because we want to prevent consuming reserves
> > under heavy hugetlb freeing.
> >
> > Signed-off-by: Muchun Song
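
The allocation strategy described in that last paragraph boils down to
roughly the following (a simplified sketch, not the exact patch code;
the locking choreography and helper details are trimmed):

	/* Free a HugeTLB page back to the buddy allocator. */
	static void update_and_free_page(struct hstate *h, struct page *page)
	{
		/*
		 * Drop hugetlb_lock so the vmemmap allocation may sleep.
		 * __GFP_NORETRY (rather than GFP_ATOMIC) means we fail
		 * fast under pressure instead of invoking the OOM killer
		 * or dipping into memory reserves.
		 */
		spin_unlock(&hugetlb_lock);

		if (alloc_huge_page_vmemmap(h, page)) {
			/*
			 * Could not restore the vmemmap: refuse to free
			 * the page and put it back on the hugetlb free
			 * list instead.
			 */
			spin_lock(&hugetlb_lock);
			add_hugetlb_page(h, page, true);
			return;
		}

		/* vmemmap is fully populated again; hand the page to buddy. */
		__free_pages(page, huge_page_order(h));
		spin_lock(&hugetlb_lock);
	}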
> > ---
> >  Documentation/admin-guide/mm/hugetlbpage.rst    |  8 ++
> >  Documentation/admin-guide/mm/memory-hotplug.rst | 13 ++++
> >  include/linux/hugetlb.h                         |  3 +
> >  include/linux/mm.h                              |  2 +
> >  mm/hugetlb.c                                    | 98 +++++++++++++++++++++----
> >  mm/hugetlb_vmemmap.c                            | 34 +++++++++
> >  mm/hugetlb_vmemmap.h                            |  6 ++
> >  mm/migrate.c                                    |  5 +-
> >  mm/sparse-vmemmap.c                             | 75 ++++++++++++++++++-
> >  9 files changed, 227 insertions(+), 17 deletions(-)
> >
> > diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
> > index f7b1c7462991..6988895d09a8 100644
> > --- a/Documentation/admin-guide/mm/hugetlbpage.rst
> > +++ b/Documentation/admin-guide/mm/hugetlbpage.rst
> > @@ -60,6 +60,10 @@ HugePages_Surp
> >          the pool above the value in ``/proc/sys/vm/nr_hugepages``. The
> >          maximum number of surplus huge pages is controlled by
> >          ``/proc/sys/vm/nr_overcommit_hugepages``.
> > +        Note: When the feature of freeing unused vmemmap pages associated
> > +        with each hugetlb page is enabled, the number of surplus huge
> > +        pages may temporarily exceed the maximum number of surplus huge
> > +        pages when the system is under memory pressure.
> >  Hugepagesize
> >          is the default hugepage size (in Kb).
> >  Hugetlb
> > @@ -80,6 +84,10 @@ returned to the huge page pool when freed by a task. A user with root
> >  privileges can dynamically allocate more or free some persistent huge pages
> >  by increasing or decreasing the value of ``nr_hugepages``.
> >
> > +Note: When the feature of freeing unused vmemmap pages associated with each
> > +hugetlb page is enabled, freeing huge pages triggered by the user may fail
> > +when the system is under memory pressure. Please try again later.
> > +
> >  Pages that are used as huge pages are reserved inside the kernel and cannot
> >  be used for other purposes. Huge pages cannot be swapped out under
> >  memory pressure.
> > diff --git a/Documentation/admin-guide/mm/memory-hotplug.rst b/Documentation/admin-guide/mm/memory-hotplug.rst
> > index 05d51d2d8beb..c6bae2d77160 100644
> > --- a/Documentation/admin-guide/mm/memory-hotplug.rst
> > +++ b/Documentation/admin-guide/mm/memory-hotplug.rst
> > @@ -357,6 +357,19 @@ creates ZONE_MOVABLE as following.
> >  Unfortunately, there is no information to show which memory block belongs
> >  to ZONE_MOVABLE. This is TBD.
> >
> > +  Memory offlining can fail when dissolving a free huge page on ZONE_MOVABLE
> > +  and the feature of freeing unused vmemmap pages associated with each hugetlb
> > +  page is enabled.
> > +
> > +  This can happen when we have plenty of ZONE_MOVABLE memory, but not enough
> > +  kernel memory to allocate vmemmap pages. We may even be able to migrate
> > +  huge page contents, but will not be able to dissolve the source huge page.
> > +  This will prevent an offline operation and is unfortunate as memory offlining
> > +  is expected to succeed on movable zones. Users that depend on memory hotplug
> > +  to succeed for movable zones should carefully consider whether the memory
> > +  savings gained from this feature are worth the risk of possibly not being
> > +  able to offline memory in certain situations.
> > +
> >  .. note::
> >     Techniques that rely on long-term pinnings of memory (especially, RDMA and
> >     vfio) are fundamentally problematic with ZONE_MOVABLE and, therefore, memory
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index d523a345dc86..d3abaaec2a22 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -525,6 +525,7 @@ unsigned long hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
> >   *	code knows it has only reference.  All other examinations and
> >   *	modifications require hugetlb_lock.
> >   * HPG_freed - Set when page is on the free lists.
> > + * HPG_vmemmap_optimized - Set when the vmemmap pages of the page are freed.
> >   *	Synchronization: hugetlb_lock held for examination and modification.
>
> You just moved the Synchronization comment so that it applies to both
> HPG_freed and HPG_vmemmap_optimized. However, HPG_vmemmap_optimized is
> checked/modified both with and without hugetlb_lock. Nothing wrong with
> that, just need to update/fix the comment.

Thanks, Mike. I will update the comment.

> Everything else looks good to me,
>
> Reviewed-by: Mike Kravetz
>
> --
> Mike Kravetz
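
For reference, the helper that touches HPG_vmemmap_optimized on the
freeing path looks roughly like this (a simplified sketch based on the
patch; the address arithmetic and GFP flags are illustrative). As you
point out, it runs after hugetlb_lock has been dropped, while the flag
is set elsewhere, so the comment should not claim the lock is always
held:

	int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
	{
		unsigned long vmemmap_addr = (unsigned long)head;
		unsigned long vmemmap_end, vmemmap_reuse;
		int ret;

		/* Nothing to do if the vmemmap was never optimized away. */
		if (!HPageVmemmapOptimized(head))
			return 0;

		vmemmap_addr += RESERVE_VMEMMAP_SIZE;
		vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
		vmemmap_reuse = vmemmap_addr - PAGE_SIZE;

		/*
		 * __GFP_NORETRY: fail quickly under memory pressure
		 * instead of retrying hard or triggering the OOM killer
		 * from the page freeing context.
		 */
		ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end,
					  vmemmap_reuse,
					  GFP_KERNEL | __GFP_NORETRY);
		if (!ret)
			ClearHPageVmemmapOptimized(head);
		return ret;
	}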