Date: Tue, 23 Feb 2021 10:27:55 +0100
From: Oscar Salvador <osalvador@suse.de>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Muchun Song <songmuchun@bytedance.com>, corbet@lwn.net, tglx@linutronix.de,
        mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com,
        dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org,
        viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org,
        mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
        rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com,
        jroedel@suse.de, almasrymina@google.com, rientjes@google.com,
        willy@infradead.org, mhocko@suse.com, song.bao.hua@hisilicon.com,
        david@redhat.com, naoya.horiguchi@nec.com, joao.m.martins@oracle.com,
        duanxiongchun@bytedance.com, linux-doc@vger.kernel.org,
        linux-kernel@vger.kernel.org, linux-mm@kvack.org,
        linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v16 4/9] mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page
Message-ID: <20210223092740.GA1998@linux>
References: <20210219104954.67390-1-songmuchun@bytedance.com>
        <20210219104954.67390-5-songmuchun@bytedance.com>
        <13a5363c-6af4-1e1f-9a18-972ca18278b5@oracle.com>
In-Reply-To: <13a5363c-6af4-1e1f-9a18-972ca18278b5@oracle.com>

On Mon, Feb 22, 2021 at 04:00:27PM -0800, Mike Kravetz wrote:
> > -static void update_and_free_page(struct hstate *h, struct page *page)
> > +static int update_and_free_page(struct hstate *h, struct page *page)
> > +        __releases(&hugetlb_lock) __acquires(&hugetlb_lock)
> >  {
> >          int i;
> > +        int nid = page_to_nid(page);
> >
> >          if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
> > -                return;
> > +                return 0;
> >
> >          h->nr_huge_pages--;
> > -        h->nr_huge_pages_node[page_to_nid(page)]--;
> > +        h->nr_huge_pages_node[nid]--;
> > +        VM_BUG_ON_PAGE(hugetlb_cgroup_from_page(page), page);
> > +        VM_BUG_ON_PAGE(hugetlb_cgroup_from_page_rsvd(page), page);
> > +        set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
> > +        set_page_refcounted(page);
>
> I think you added the set_page_refcounted() because the huge page will
> appear as just a compound page without a reference after dropping the
> hugetlb lock?  It might be better to set the reference before modifying
> the destructor.  Otherwise, page scanning code could find the non-hugetlb
> compound page with no reference.  I could not find any code where this
> would be a problem, but I think it would be safer to set the reference
> first.

But set_page_refcounted() was already there before this patchset.
Is the worry only because we now drop the lock? AFAICS, the
"page-scanning" problem could have happened before as well.
Also, what does "page scanning" mean in this context?

I am not opposed to moving it above, but I would like to understand
the concern here.

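Just so we are talking about the same thing, I read the suggestion as
simply swapping the two calls, e.g. (untested):

        /*
         * Grab the reference before clearing the hugetlb destructor, so
         * the page is never visible as an unreferenced non-hugetlb
         * compound page.
         */
        set_page_refcounted(page);
        set_compound_page_dtor(page, NULL_COMPOUND_DTOR);
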
> > +        spin_unlock(&hugetlb_lock);

> I really like the way this code is structured.  It is much simpler than
> previous versions with retries or workqueue.  There is nothing wrong with
> always dropping the lock here.  However, I wonder if we should think about
> optimizing for the case where this feature is not enabled and we are not
> freeing a 1G huge page.  I suspect this will be the most common case for
> some time, and there is no need to drop the lock in this case.
>
> Please do not change the code based on my comment.  I just wanted to bring
> this up for thought.
>
> Is it as simple as checking?
>         if (free_vmemmap_pages_per_hpage(h) || hstate_is_gigantic(h))
>                 spin_unlock(&hugetlb_lock);
>
>         /* before return */
>         if (free_vmemmap_pages_per_hpage(h) || hstate_is_gigantic(h))
>                 spin_lock(&hugetlb_lock);

AFAIK, we at least need the hstate_is_gigantic() check: the comment below
says that free_gigantic_page() might block, so we need to drop the lock
in that case. And I am fine with the change overall.
Unless I am missing something, we should not need to drop the lock unless
we need to allocate vmemmap pages (apart from gigantic pages).

> > +
> > +        if (alloc_huge_page_vmemmap(h, page)) {
> > +                int zeroed;
> > +
> > +                spin_lock(&hugetlb_lock);
> > +                INIT_LIST_HEAD(&page->lru);
> > +                set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> > +                h->nr_huge_pages++;
> > +                h->nr_huge_pages_node[nid]++;

I think prep_new_huge_page() already does this for us?

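For reference, if I remember the current code correctly,
prep_new_huge_page() does roughly the following (sketch from memory,
not necessarily the exact upstream body):

        static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
        {
                INIT_LIST_HEAD(&page->lru);
                set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
                set_hugetlb_cgroup(page, NULL);
                spin_lock(&hugetlb_lock);
                h->nr_huge_pages++;
                h->nr_huge_pages_node[nid]++;
                spin_unlock(&hugetlb_lock);
        }

so the INIT_LIST_HEAD()/set_compound_page_dtor()/counter updates above
look like duplicates of it, modulo the fact that the error path here
already holds hugetlb_lock at that point.
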
> > +
> > +                /*
> > +                 * If we cannot allocate vmemmap pages, just refuse to free the
> > +                 * page and put the page back on the hugetlb free list and treat
> > +                 * as a surplus page.
> > +                 */
> > +                h->surplus_huge_pages++;
> > +                h->surplus_huge_pages_node[nid]++;
> > +
> > +                /*
> > +                 * This page is now managed by the hugetlb allocator and has
> > +                 * no users -- drop the last reference.
> > +                 */
> > +                zeroed = put_page_testzero(page);
> > +                VM_BUG_ON_PAGE(!zeroed, page);

Can this actually happen? AFAIK, a page that lands in
update_and_free_page() should have a zero refcount; we then take the one
reference ourselves, and I cannot see how the refcount could have changed
in the meantime.
I am all for catching corner cases, but I am not sure how realistic this
one is.

Moreover, if we __ever__ get there, things can get nasty: we would end up
with an in-use page in the free hugetlb pool, so corruption would follow.
At that point, a plain BUG_ON might be better.

But as I said, I do not think we need that.

I still need to look further, but what I have seen so far looks good.

-- 
Oscar Salvador
SUSE L3