From: Muchun Song
Date: Mon, 15 Feb 2021 19:51:26 +0800
Subject: Re: [External] Re: [PATCH v15 4/8] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
To: Michal Hocko
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
    bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
    luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
    paulmck@kernel.org, mchehab+huawei@kernel.org,
    pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
    anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
    Matthew Wilcox, Oscar Salvador, "Song Bao Hua (Barry Song)",
    David Hildenbrand, HORIGUCHI NAOYA(堀口 直也), Joao Martins,
    Xiongchun duan, linux-doc@vger.kernel.org, LKML,
    Linux Memory Management List, linux-fsdevel
On Mon, Feb 15, 2021 at 6:33 PM Michal Hocko wrote:
>
> On Mon 15-02-21 18:05:06, Muchun Song wrote:
> > On Fri, Feb 12, 2021 at 11:32 PM Michal Hocko wrote:
> [...]
> > > > +int alloc_huge_page_vmemmap(struct hstate *h, struct page *head)
> > > > +{
> > > > +	int ret;
> > > > +	unsigned long vmemmap_addr = (unsigned long)head;
> > > > +	unsigned long vmemmap_end, vmemmap_reuse;
> > > > +
> > > > +	if (!free_vmemmap_pages_per_hpage(h))
> > > > +		return 0;
> > > > +
> > > > +	vmemmap_addr += RESERVE_VMEMMAP_SIZE;
> > > > +	vmemmap_end = vmemmap_addr + free_vmemmap_pages_size_per_hpage(h);
> > > > +	vmemmap_reuse = vmemmap_addr - PAGE_SIZE;
> > > > +
> > > > +	/*
> > > > +	 * The pages which the vmemmap virtual address range [@vmemmap_addr,
> > > > +	 * @vmemmap_end) are mapped to are freed to the buddy allocator, and
> > > > +	 * the range is mapped to the page which @vmemmap_reuse is mapped to.
> > > > +	 * When a HugeTLB page is freed to the buddy allocator, previously
> > > > +	 * discarded vmemmap pages must be allocated and remapped.
> > > > +	 */
> > > > +	ret = vmemmap_remap_alloc(vmemmap_addr, vmemmap_end, vmemmap_reuse,
> > > > +				  GFP_ATOMIC | __GFP_NOWARN | __GFP_THISNODE);
> > >
> > > I do not think that this is a good allocation mode. GFP_ATOMIC is a
> > > non-sleeping allocation, and medium memory pressure might cause it to
> > > fail prematurely. I do not think this is really an atomic context which
> > > couldn't afford memory reclaim. I also do not think we want to grant
> >
> > Because alloc_huge_page_vmemmap is called under hugetlb_lock
> > now, using GFP_ATOMIC indeed makes the code simpler.
>
> You can have a preallocated list of pages prior to taking the lock.

An earlier discussion about this can be found here (a rough sketch of the
idea also appears at the end of this mail):

https://patchwork.kernel.org/project/linux-mm/patch/20210117151053.24600-5-songmuchun@bytedance.com/

> Moreover, do we want to manipulate vmemmaps from under a spinlock in
> general? I have to say I missed that detail when reviewing. Need to
> think more.
>
> > From the kernel documentation, I learned that __GFP_NOMEMALLOC
> > can be used to explicitly forbid access to emergency reserves. So if
> > we do not want to use the reserve memory, how about replacing it with
> >
> > GFP_ATOMIC | __GFP_NOMEMALLOC | __GFP_NOWARN | __GFP_THISNODE
>
> The whole point of GFP_ATOMIC is to grant access to memory reserves so
> the above is quite dubious. If you do not want access to memory reserves

Look at the code of gfp_to_alloc_flags():

static inline unsigned int gfp_to_alloc_flags(gfp_t gfp_mask)
{
	[...]
	if (gfp_mask & __GFP_ATOMIC) {
		/*
		 * Not worth trying to allocate harder for __GFP_NOMEMALLOC even
		 * if it can't schedule.
		 */
		if (!(gfp_mask & __GFP_NOMEMALLOC))
			alloc_flags |= ALLOC_HARDER;
	[...]
}

It seems to allow this combination (GFP_ATOMIC | __GFP_NOMEMALLOC).

> then use GFP_NOWAIT instead. But failures are much more likely to happen
> then.
>
> NOMEMALLOC is meant to be used from paths which are allowed to consume
> memory reserves - e.g. when invoked from the memory reclaim path.
> --
> Michal Hocko
> SUSE Labs
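
P.S. A minimal sketch of the "preallocate before taking the lock" idea
suggested above. This is hypothetical code, not from the patch series:
hugetlb_vmemmap_prealloc() and the caller shape are made up for
illustration, error unwinding of a partially filled list is omitted, and
free_vmemmap_pages_per_hpage() is the helper from the quoted patch.

/*
 * Hypothetical sketch only: allocate the vmemmap pages with a sleepable
 * GFP mask *before* hugetlb_lock is taken, so the remap step under the
 * spinlock never has to allocate.
 */
static int hugetlb_vmemmap_prealloc(struct hstate *h, struct list_head *pages)
{
	unsigned int i, nr = free_vmemmap_pages_per_hpage(h);

	for (i = 0; i < nr; i++) {
		/* GFP_KERNEL may sleep; we are not under hugetlb_lock yet. */
		struct page *page = alloc_page(GFP_KERNEL | __GFP_NOWARN);

		if (!page)
			return -ENOMEM;	/* caller frees what is already on @pages */
		list_add(&page->lru, pages);
	}

	return 0;
}

/*
 * Hypothetical caller shape:
 *
 *	LIST_HEAD(vmemmap_pages);
 *
 *	if (hugetlb_vmemmap_prealloc(h, &vmemmap_pages))
 *		goto out_free;
 *	spin_lock(&hugetlb_lock);
 *	...remap using pages taken from &vmemmap_pages; no allocation here...
 *	spin_unlock(&hugetlb_lock);
 */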
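
P.P.S. For reference, the GFP definitions behind this exchange, as found
in v5.11-era include/linux/gfp.h (reproduced here for convenience; they
were not part of the original mail):

#define GFP_ATOMIC	(__GFP_HIGH|__GFP_ATOMIC|__GFP_KSWAPD_RECLAIM)
#define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM)

So GFP_ATOMIC | __GFP_NOMEMALLOC does skip ALLOC_HARDER, as the
gfp_to_alloc_flags() excerpt shows, but __GFP_HIGH remains set and still
relaxes the watermark check (ALLOC_HIGH), which is presumably why plain
GFP_NOWAIT is the suggestion when no reserve access is wanted at all.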