Subject: Re: [PATCH v4 6/8] hugetlb: batch PMD split for bulk vmemmap dedup
From: Muchun Song
Date: Tue, 19 Sep 2023 16:57:00 +0800
To: Joao Martins
Cc: Mike Kravetz, Muchun Song, Oscar Salvador, David Hildenbrand, Miaohe Lin, David Rientjes, Anshuman Khandual, Naoya Horiguchi, Barry Song <21cnbao@gmail.com>, Michal Hocko, Matthew Wilcox, Xiongchun Duan, Linux-MM, Andrew Morton, LKML
Message-Id: <83B874B6-FF22-4588-90A9-31644D598032@linux.dev>
References: <20230918230202.254631-1-mike.kravetz@oracle.com> <20230918230202.254631-7-mike.kravetz@oracle.com> <9c627733-e6a2-833b-b0f9-d59552f6ab0d@linux.dev> <07192BE2-C66E-4F74-8F76-05F57777C6B7@linux.dev>
X-Mailing-List: linux-kernel@vger.kernel.org

> On Sep 19, 2023, at 16:55, Joao Martins wrote:
>
> On 19/09/2023 09:41, Muchun Song wrote:
>>> On Sep 19, 2023, at 16:26, Joao Martins wrote:
>>> On 19/09/2023 07:42, Muchun Song wrote:
>>>> On 2023/9/19 07:01, Mike Kravetz wrote:
>>>>> From: Joao Martins
>>>>>
>>>>> In an effort to minimize amount of TLB flushes, batch all PMD splits
>>>>> belonging to a range of pages in order to perform only 1 (global) TLB
>>>>> flush.
>>>>>
>>>>> Add a flags field to the walker and pass whether it's a bulk allocation
>>>>> or just a single page to decide to remap. First value
>>>>> (VMEMMAP_SPLIT_NO_TLB_FLUSH) designates the request to not do the TLB
>>>>> flush when we split the PMD.
>>>>>
>>>>> Rebased and updated by Mike Kravetz
>>>>>
>>>>> Signed-off-by: Joao Martins
>>>>> Signed-off-by: Mike Kravetz
>>>>> ---
>>>>>  mm/hugetlb_vmemmap.c | 79 +++++++++++++++++++++++++++++++++++++++++---
>>>>>  1 file changed, 75 insertions(+), 4 deletions(-)
>>>>>
>>>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>>>> index 147ed15bcae4..e8bc2f7567db 100644
>>>>> --- a/mm/hugetlb_vmemmap.c
>>>>> +++ b/mm/hugetlb_vmemmap.c
>>>>> @@ -27,6 +27,7 @@
>>>>>   * @reuse_addr: the virtual address of the @reuse_page page.
>>>>>   * @vmemmap_pages: the list head of the vmemmap pages that can be freed
>>>>>   * or is mapped from.
>>>>> + * @flags: used to modify behavior in bulk operations
>>>>
>>>> Better to describe it as "used to modify behavior in vmemmap page table walking
>>>> operations"
>>>>
>>> OK
>>>
>>>>>  void hugetlb_vmemmap_optimize_folios(struct hstate *h, struct list_head
>>>>> *folio_list)
>>>>>  {
>>>>>  	struct folio *folio;
>>>>>  	LIST_HEAD(vmemmap_pages);
>>>>> +	list_for_each_entry(folio, folio_list, lru)
>>>>> +		hugetlb_vmemmap_split(h, &folio->page);
>>>>> +
>>>>> +	flush_tlb_all();
>>>>> +
>>>>>  	list_for_each_entry(folio, folio_list, lru) {
>>>>>  		int ret = __hugetlb_vmemmap_optimize(h, &folio->page,
>>>>>  				&vmemmap_pages);
>>>>
>>>> This is unlikely to be failed since the page table allocation
>>>> is moved to the above
>>>
>>>> (Note that the head vmemmap page allocation
>>>> is not mandatory).
>>>
>>> Good point that I almost forgot
>>>
>>>> So we should handle the error case in the above
>>>> splitting operation.
>>>
>>> But back to the previous discussion in v2... the thinking was that /some/ PMDs
>>> got split, and say could allow some PTE remapping to occur and free some pages
>>> back (each page allows 6 more splits worst case). Then the next
>>> __hugetlb_vmemmap_optimize() will have to split PMD pages again for those
>>> hugepages that failed the batch PMD split (as we only defer the PTE remap tlb
>>> flush in this stage).
>>
>> Oh, yes. Maybe we could break the above traversal as early as possible
>> once we enter an ENOMEM?
>>
>
> Sounds good -- no point in keep trying to split if we are failing with OOM.
>
> Perhaps a comment in both of these clauses (the early break on split and the OOM
> handling in batch optimize) could help make this clear.

Make sense. Thanks.

>
>>>
>>> Unless this isn't something worth handling
>>>
>>> Joao
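
The early break agreed on above can be sketched as a small userspace model. This is only an illustration under stated assumptions: struct folio_model, model_split(), model_flush_tlb_all(), model_optimize() and model_optimize_folios() are hypothetical stand-ins, not the functions in mm/hugetlb_vmemmap.c, and the second pass here does not model the fallback PMD split or its OOM handling.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a hugetlb folio; not the kernel's struct folio. */
struct folio_model {
	int split_result;	/* value the mock split returns: 0 or -ENOMEM */
	int split_attempted;	/* set once the batch pass tried to split it */
	int optimized;		/* set once the optimize pass visited it */
};

/* Counts how many global TLB flushes the model issued. */
static int tlb_flushes;

/* Mock of the per-folio PMD split; it only records the attempt. */
static int model_split(struct folio_model *f)
{
	f->split_attempted = 1;
	return f->split_result;
}

/* Mock of the single global flush that follows the batch split. */
static void model_flush_tlb_all(void)
{
	tlb_flushes++;
}

/* Mock of the per-folio optimize step; always succeeds in this model. */
static int model_optimize(struct folio_model *f)
{
	f->optimized = 1;
	return 0;
}

static void model_optimize_folios(struct folio_model *folios, size_t n)
{
	size_t i;

	/* Pass 1: batch the splits, stopping at the first -ENOMEM. */
	for (i = 0; i < n; i++) {
		if (model_split(&folios[i]) == -ENOMEM)
			break;
	}

	/* One flush covers every split done in pass 1. */
	model_flush_tlb_all();

	/* Pass 2: every folio is still visited, split earlier or not. */
	for (i = 0; i < n; i++)
		model_optimize(&folios[i]);
}
```

With three folios where the second split returns -ENOMEM, the model attempts two splits, issues one flush, and still runs the optimize step on all three.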