From: Muchun Song
Date: Sat, 20 Feb 2021 12:20:36 +0800
Subject: Re: [External] Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
To: Michal Hocko
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
    bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
    luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
    paulmck@kernel.org, mchehab+huawei@kernel.org,
    pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
    anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
    Matthew Wilcox, Oscar Salvador, "Song Bao Hua (Barry Song)",
    David Hildenbrand, HORIGUCHI NAOYA, Joao Martins, Xiongchun duan,
    linux-doc@vger.kernel.org, LKML, Linux Memory Management List,
    linux-fsdevel
List-ID: X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, Feb 19, 2021 at 10:12 PM Michal Hocko wrote:
>
> On Fri 19-02-21 18:49:49, Muchun Song wrote:
> > When we free a HugeTLB page to the buddy allocator, we have to allocate
> > the vmemmap pages associated with it. However, we may not be able to
> > allocate vmemmap pages when the system is under memory pressure. In
> > that case, we just refuse to free the HugeTLB page instead of looping
> > forever trying to allocate the pages. This changes some behavior
> > (listed below) in some corner cases.
> >
> > 1) Failing to free a huge page triggered by the user (decrease nr_pages).
> >
> >    The user needs to try again later.
> >
> > 2) Failing to free a surplus huge page when freed by the application.
> >
> >    Try again later when freeing a huge page next time.
>
> This means that surplus pages can accumulate, right? This should be
> rather unlikely, because one released huge page could then be reused for
> normal allocations - including vmemmap. Unlucky timing might still end
> up in accumulation, though. Not something critical.

Agree.

> > 3) Failing to dissolve a free huge page on ZONE_MOVABLE via
> >    offline_pages().
> >
> >    This is a bit unfortunate if we have plenty of ZONE_MOVABLE memory
> >    but are low on kernel memory. For example, migration of huge pages
> >    would still work; however, dissolving the free page does not. This
> >    is a corner case. When the system is that much under memory
> >    pressure, offlining/unplug can be expected to fail.
>
> Please mention that this is unfortunate because it prevents memory
> offlining, which shouldn't happen for movable zones. People depending
> on memory hotplug and the movable zone should carefully consider
> whether savings on unmovable memory are worth losing their hotplug
> functionality in some situations.

Makes sense. I will mention this in the change log. Thanks.

> > 4) Failing to dissolve a huge page on CMA/ZONE_MOVABLE via
> >    alloc_contig_range() - once we have that handling in place. Mainly
> >    affects CMA and virtio-mem.
>
> What about hugetlb page poisoning on HW failure (resp. soft offlining)?

If the HW-poisoned hugetlb page fails to be dissolved, the page will go
back to the free list with PG_HWPoison set. But the page will not be
used, because we check whether the page is HW poisoned when it is
dequeued from the free list. If so, we skip this page.

> >
> >    Similar to 3). virtio-mem will handle migration errors gracefully.
> >    CMA might be able to fall back on other free areas within the CMA
> >    region.
> >
> > We do not want to use GFP_ATOMIC to allocate vmemmap pages, because it
> > grants access to memory reserves and we do not think it is reasonable
> > to use memory reserves here. We use GFP_KERNEL in
> > alloc_huge_page_vmemmap().
>
> This likely needs more context around it. Maybe something like:
> "
> Vmemmap pages are allocated from the page freeing context. In order for
> those allocations to be not disruptive (e.g. trigger the OOM killer),
> __GFP_NORETRY is used. hugetlb_lock is dropped for the allocation
> because a non-sleeping allocation would be too fragile and it could fail
> too easily under memory pressure. GFP_ATOMIC or other modes to access
> memory reserves are not used because we want to prevent consuming
> reserves under heavy hugetlb freeing.
> "

Thanks. I will add this to the change log.

> I haven't gone through the patch in great detail yet. From a high-level
> POV it looks good, although the counter changes and reshuffling seem a
> little wild. That requires a more detailed look I do not have time for
> right now. Mike would be much better for that anyway ;)

Yeah. I hope Mike will review this (I believe he is good in this area).

> I do not see any check for an atomic context in the free_huge_page
> path. I have suggested replacing the in_task check with an in_atomic
> check (with a gotcha that the latter doesn't work without preempt_count,
> but there is work to address that).

Sorry, I forgot it. I will replace in_task with in_atomic in the next
version. Thanks for your suggestions.

> --
> Michal Hocko
> SUSE Labs