Date: Sun, 24 Jan 2021 15:55:37 -0800 (PST)
From: David Rientjes
To: Muchun Song
Cc: corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de,
    mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
    dave.hansen@linux.intel.com, luto@kernel.org, Peter Zijlstra,
    viro@zeniv.linux.org.uk, Andrew Morton, paulmck@kernel.org,
    mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
    rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
    jroedel@suse.de, almasrymina@google.com, Matthew Wilcox,
    osalvador@suse.de, mhocko@suse.com, song.bao.hua@hisilicon.com,
    david@redhat.com, naoya.horiguchi@nec.com, duanxiongchun@bytedance.com,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v13 04/12] mm: hugetlb: defer freeing of HugeTLB pages
In-Reply-To: <20210117151053.24600-5-songmuchun@bytedance.com>
Message-ID: <59d18082-248a-7014-b917-625d759c572@google.com>
References: <20210117151053.24600-1-songmuchun@bytedance.com>
    <20210117151053.24600-5-songmuchun@bytedance.com>

On Sun, 17 Jan 2021, Muchun Song wrote:

> In the subsequent patch, we should allocate the vmemmap pages when
> freeing HugeTLB pages. But update_and_free_page() is always called
> while holding hugetlb_lock, so we cannot use GFP_KERNEL to allocate
> vmemmap pages. However, we can defer the actual freeing to a kworker
> to avoid using GFP_ATOMIC to allocate the vmemmap pages.
>
> update_hpage_vmemmap_workfn() is where the call to allocate vmemmap
> pages will be inserted.
>

I think it's reasonable to assume that userspace can release free hugetlb
pages from the pool on oom conditions when reclaim has become too
expensive.  This approach now requires that we can allocate vmemmap pages
in a potential oom condition as a prerequisite for freeing memory, which
seems less than ideal.

And, by doing this through a kworker, we can presumably get queued behind
another work item that requires memory to make forward progress in this
oom condition.

Two thoughts:

- We're going to be freeing the hugetlb page after we can allocate the
  vmemmap pages, so why do we need to allocate with GFP_KERNEL?  Can't we
  simply dip into memory reserves using GFP_ATOMIC (and thus can be
  holding hugetlb_lock) because we know we'll be freeing more memory than
  we'll be allocating?  I think requiring a GFP_KERNEL allocation to
  block to free memory for vmemmap when we'll be freeing memory ourselves
  is dubious.  This simplifies all of this.

- If the answer is that we actually have to use GFP_KERNEL for other
  reasons, what are your thoughts on pre-allocating the vmemmap as
  opposed to deferring to a kworker?  In other words, preallocate the
  necessary memory with GFP_KERNEL and put it on a linked list in struct
  hstate before acquiring hugetlb_lock; see the sketch below.
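For the second option, here's a rough sketch of the kind of thing I have
in mind, in case it helps (completely untested; hpage_prealloc_vmemmap()
is a name I made up, and I'm assuming a helper along the lines of
free_vmemmap_pages_per_hpage() from this series that returns the number
of vmemmap pages that need to be restored for a page of this hstate):

	/*
	 * Sketch only: preallocate, with GFP_KERNEL and before taking
	 * hugetlb_lock, the vmemmap pages needed to free one hugetlb
	 * page.  On success the caller takes hugetlb_lock and frees
	 * the hugetlb page using the pages on @pages.
	 */
	static int hpage_prealloc_vmemmap(struct hstate *h,
					  struct list_head *pages)
	{
		int i;

		for (i = 0; i < free_vmemmap_pages_per_hpage(h); i++) {
			struct page *page = alloc_page(GFP_KERNEL);

			if (!page)
				goto out_free;
			list_add(&page->lru, pages);
		}
		return 0;

	out_free:
		/* Cannot free the hugetlb page, undo the preallocation. */
		while (!list_empty(pages)) {
			struct page *page = list_first_entry(pages,
						struct page, lru);

			list_del(&page->lru);
			__free_page(page);
		}
		return -ENOMEM;
	}

The preallocated list could hang off struct hstate as suggested above, or
simply be passed down to update_and_free_page(); either way, the
GFP_KERNEL allocation happens before hugetlb_lock is taken and the
freeing path itself never has to allocate.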
> Signed-off-by: Muchun Song
> Reviewed-by: Mike Kravetz
> ---
>  mm/hugetlb.c         | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++--
>  mm/hugetlb_vmemmap.c | 12 ---------
>  mm/hugetlb_vmemmap.h | 17 ++++++++++++
>  3 files changed, 89 insertions(+), 14 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 140135fc8113..c165186ec2cf 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1292,15 +1292,85 @@ static inline void destroy_compound_gigantic_page(struct page *page,
>  						unsigned int order) { }
>  #endif
>
> -static void update_and_free_page(struct hstate *h, struct page *page)
> +static void __free_hugepage(struct hstate *h, struct page *page);
> +
> +/*
> + * Since update_and_free_page() is always called while holding hugetlb_lock,
> + * we cannot use GFP_KERNEL to allocate vmemmap pages. However, we can defer
> + * the actual freeing to a workqueue to avoid using GFP_ATOMIC to allocate
> + * the vmemmap pages.
> + *
> + * update_hpage_vmemmap_workfn() is where the call to allocate vmemmap
> + * pages will be inserted.
> + *
> + * update_hpage_vmemmap_workfn() locklessly retrieves the linked list of
> + * pages to be freed and frees them one-by-one. As the page->mapping pointer
> + * is going to be cleared in update_hpage_vmemmap_workfn() anyway, it is
> + * reused as the llist_node structure of a lockless linked list of huge
> + * pages to be freed.
> + */
> +static LLIST_HEAD(hpage_update_freelist);
> +
> +static void update_hpage_vmemmap_workfn(struct work_struct *work)
>  {
> -	int i;
> +	struct llist_node *node;
> +
> +	node = llist_del_all(&hpage_update_freelist);
> +
> +	while (node) {
> +		struct page *page;
> +		struct hstate *h;
> +
> +		page = container_of((struct address_space **)node,
> +				    struct page, mapping);
> +		node = node->next;
> +		page->mapping = NULL;
> +		h = page_hstate(page);
> +
> +		spin_lock(&hugetlb_lock);
> +		__free_hugepage(h, page);
> +		spin_unlock(&hugetlb_lock);
>
> +		cond_resched();

Wouldn't it be better to hold hugetlb_lock for the iteration rather than
constantly dropping it and reacquiring it?  Use
cond_resched_lock(&hugetlb_lock) instead?
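I.e., something like the following against the hunk quoted above
(untested, just to illustrate the idea):

	static void update_hpage_vmemmap_workfn(struct work_struct *work)
	{
		struct llist_node *node;

		node = llist_del_all(&hpage_update_freelist);

		spin_lock(&hugetlb_lock);
		while (node) {
			struct page *page;
			struct hstate *h;

			page = container_of((struct address_space **)node,
					    struct page, mapping);
			node = node->next;
			page->mapping = NULL;
			h = page_hstate(page);

			__free_hugepage(h, page);

			/*
			 * Drops and reacquires hugetlb_lock only when the
			 * lock is contended or a reschedule is actually
			 * needed, rather than cycling it on every page.
			 */
			cond_resched_lock(&hugetlb_lock);
		}
		spin_unlock(&hugetlb_lock);
	}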