Subject: Re: [PATCH v4 4/8] hugetlb: perform vmemmap restoration on a list of pages
From: Muchun Song
Date: Wed, 20 Sep 2023 10:56:17 +0800
To: Mike Kravetz
Cc: Linux-MM, LKML, Muchun Song, Joao Martins, Oscar Salvador,
	David Hildenbrand, Miaohe Lin, David Rientjes, Anshuman Khandual,
	Naoya Horiguchi, Barry Song <21cnbao@gmail.com>, Michal Hocko,
	Matthew Wilcox, Xiongchun Duan, Andrew Morton
In-Reply-To: <20230919205756.GB425719@monkey>
References: <20230918230202.254631-1-mike.kravetz@oracle.com>
	<20230918230202.254631-5-mike.kravetz@oracle.com>
	<20230919205756.GB425719@monkey>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Sep 20, 2023, at 04:57, Mike Kravetz wrote:
>
> On 09/19/23 17:52, Muchun Song wrote:
>>
>>
>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>> The routine update_and_free_pages_bulk already performs vmemmap
>>> restoration on the list of hugetlb pages in a separate step. In
>>> preparation for more functionality to be added in this step, create a
>>> new routine hugetlb_vmemmap_restore_folios() that will restore
>>> vmemmap for a list of folios.
>>>
>>> This new routine must provide sufficient feedback about errors and
>>> actual restoration performed so that update_and_free_pages_bulk can
>>> perform optimally.
>>>
>>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>>> ---
>>>  mm/hugetlb.c         | 36 ++++++++++++++++++------------------
>>>  mm/hugetlb_vmemmap.c | 37 +++++++++++++++++++++++++++++++++++++
>>>  mm/hugetlb_vmemmap.h | 11 +++++++++++
>>>  3 files changed, 66 insertions(+), 18 deletions(-)
>>>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index d6f3db3c1313..814bb1982274 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1836,36 +1836,36 @@ static void update_and_free_hugetlb_folio(struct hstate *h, struct folio *folio,
>>>  static void update_and_free_pages_bulk(struct hstate *h, struct list_head *list)
>>>  {
>>> +	int ret;
>>> +	unsigned long restored;
>>>  	struct folio *folio, *t_folio;
>>> -	bool clear_dtor = false;
>>>
>>>  	/*
>>> -	 * First allocate required vmemmmap (if necessary) for all folios on
>>> -	 * list.  If vmemmap can not be allocated, we can not free folio to
>>> -	 * lower level allocator, so add back as hugetlb surplus page.
>>> -	 * add_hugetlb_folio() removes the page from THIS list.
>>> -	 * Use clear_dtor to note if vmemmap was successfully allocated for
>>> -	 * ANY page on the list.
>>> +	 * First allocate required vmemmmap (if necessary) for all folios.
>>>  	 */
>>> -	list_for_each_entry_safe(folio, t_folio, list, lru) {
>>> -		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
>>> -			if (hugetlb_vmemmap_restore(h, &folio->page)) {
>>> -				spin_lock_irq(&hugetlb_lock);
>>> +	ret = hugetlb_vmemmap_restore_folios(h, list, &restored);
>>> +
>>> +	/*
>>> +	 * If there was an error restoring vmemmap for ANY folios on the list,
>>> +	 * add them back as surplus hugetlb pages.  add_hugetlb_folio() removes
>>> +	 * the folio from THIS list.
>>> +	 */
>>> +	if (ret < 0) {
>>> +		spin_lock_irq(&hugetlb_lock);
>>> +		list_for_each_entry_safe(folio, t_folio, list, lru)
>>> +			if (folio_test_hugetlb_vmemmap_optimized(folio))
>>>  				add_hugetlb_folio(h, folio, true);
>>> -				spin_unlock_irq(&hugetlb_lock);
>>> -			} else
>>> -				clear_dtor = true;
>>> -		}
>>> +		spin_unlock_irq(&hugetlb_lock);
>>>  	}
>>>
>>>  	/*
>>> -	 * If vmemmmap allocation was performed on any folio above, take lock
>>> -	 * to clear destructor of all folios on list.  This avoids the need to
>>> +	 * If vmemmmap allocation was performed on ANY folio, take lock to
>>> +	 * clear destructor of all folios on list.  This avoids the need to
>>>  	 * lock/unlock for each individual folio.
>>>  	 * The assumption is vmemmap allocation was performed on all or none
>>>  	 * of the folios on the list.  This is true except in VERY rare cases.
>>>  	 */
>>> -	if (clear_dtor) {
>>> +	if (restored) {
>>>  		spin_lock_irq(&hugetlb_lock);
>>>  		list_for_each_entry(folio, list, lru)
>>>  			__clear_hugetlb_destructor(h, folio);
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index 4558b814ffab..463a4037ec6e 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -480,6 +480,43 @@ int hugetlb_vmemmap_restore(const struct hstate *h, struct page *head)
>>>  	return ret;
>>>  }
>>>
>>> +/**
>>> + * hugetlb_vmemmap_restore_folios - restore vmemmap for every folio on the list.
>>> + * @h:		struct hstate.
>>> + * @folio_list:	list of folios.
>>> + * @restored:	Set to number of folios for which vmemmap was restored
>>> + *		successfully if caller passes a non-NULL pointer.
>>> + *
>>> + * Return: %0 if vmemmap exists for all folios on the list.  If an error is
>>> + *	encountered restoring vmemmap for ANY folio, an error code
>>> + *	will be returned to the caller.  It is then the responsibility
>>> + *	of the caller to check the hugetlb vmemmap optimized flag of
>>> + *	each folio to determine if vmemmap was actually restored.
>>> + */
>>> +int hugetlb_vmemmap_restore_folios(const struct hstate *h,
>>> +					struct list_head *folio_list,
>>> +					unsigned long *restored)
>>> +{
>>> +	unsigned long num_restored;
>>> +	struct folio *folio;
>>> +	int ret = 0, t_ret;
>>> +
>>> +	num_restored = 0;
>>> +	list_for_each_entry(folio, folio_list, lru) {
>>> +		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
>>> +			t_ret = hugetlb_vmemmap_restore(h, &folio->page);
>>
>> I still think we should free a non-optimized HugeTLB page if we
>> encounter an OOM situation instead of continuing to restore
>> vmemmap pages. Restoring vmemmap pages will only aggravate
>> the OOM situation. The suitable approach is to free a non-optimized
>> HugeTLB page to satisfy our allocation of vmemmap pages. What's
>> your opinion, Mike?
>
> I agree.
>
> As you mentioned previously, this may complicate this code path a bit.
> I will rewrite to make this happen.

Maybe we could introduce two lists passed to update_and_free_pages_bulk()
(this will be easy for its callers): one for non-optimized huge pages and
another for optimized ones. In update_and_free_pages_bulk(), we could first
free those non-optimized huge pages and then restore vmemmap pages for the
optimized ones, in which case the code could stay simple.
hugetlb_vmemmap_restore_folios() does not need to add complexity: it can
still continue to restore vmemmap pages and will stop once we encounter an
OOM situation.
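
Something like the following untested sketch is what I have in mind for the
stop-on-OOM behavior (illustrative only, not the actual patch: it reuses
the signature and helpers from the diff above, and assumes
hugetlb_vmemmap_restore() fails with -ENOMEM under memory pressure):

/*
 * Sketch only: stop restoring vmemmap at the first failure instead of
 * continuing, since further vmemmap allocations would only make an OOM
 * situation worse.
 */
int hugetlb_vmemmap_restore_folios(const struct hstate *h,
				   struct list_head *folio_list,
				   unsigned long *restored)
{
	unsigned long num_restored = 0;
	struct folio *folio;
	int ret = 0;

	list_for_each_entry(folio, folio_list, lru) {
		if (!folio_test_hugetlb_vmemmap_optimized(folio))
			continue;

		ret = hugetlb_vmemmap_restore(h, &folio->page);
		if (ret)
			break;	/* likely -ENOMEM; let the caller free
				 * non-optimized folios instead */
		num_restored++;
	}

	if (restored)
		*restored = num_restored;
	return ret;
}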

Thanks.

> --
> Mike Kravetz
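
To make the two-list idea above a bit more concrete, here is an untested
sketch of update_and_free_pages_bulk() under that scheme (illustrative
only: the parameter split, the third argument to
update_and_free_hugetlb_folio(), and the omission of the patch's
destructor-clearing step are simplifying assumptions):

/*
 * Sketch only: callers split folios into two lists.  Freeing the
 * non-optimized folios first releases memory that the vmemmap
 * allocations for the optimized folios may need.
 */
static void update_and_free_pages_bulk(struct hstate *h,
				       struct list_head *non_optimized,
				       struct list_head *optimized)
{
	struct folio *folio, *t_folio;

	/* Step 1: free folios whose vmemmap is already present. */
	list_for_each_entry_safe(folio, t_folio, non_optimized, lru) {
		list_del(&folio->lru);
		update_and_free_hugetlb_folio(h, folio, false);
	}

	/*
	 * Step 2: restore vmemmap for the optimized folios; the restore
	 * routine stops at the first failure (see the sketch above).
	 * Any folio still vmemmap-optimized afterwards goes back as a
	 * surplus page; add_hugetlb_folio() removes it from the list.
	 */
	if (hugetlb_vmemmap_restore_folios(h, optimized, NULL) < 0) {
		spin_lock_irq(&hugetlb_lock);
		list_for_each_entry_safe(folio, t_folio, optimized, lru)
			if (folio_test_hugetlb_vmemmap_optimized(folio))
				add_hugetlb_folio(h, folio, true);
		spin_unlock_irq(&hugetlb_lock);
	}

	/* Step 3: free the folios whose vmemmap was restored. */
	list_for_each_entry_safe(folio, t_folio, optimized, lru) {
		list_del(&folio->lru);
		update_and_free_hugetlb_folio(h, folio, false);
	}
}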