Message-ID: <09b0f874-b84a-45a5-ab9b-53e348eae3df@arm.com>
Date: Thu, 10 Aug 2023 11:40:54 +0100
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
To: David Hildenbrand, linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, Andrew Morton,
 Jonathan Corbet, Mike Kravetz, Hugh Dickins, "Matthew Wilcox (Oracle)",
 Yin Fengwei, Yang Shi, Zi Yan
References:
 <20230809083256.699513-1-david@redhat.com>
 <181fcc79-b1c6-412f-9ca1-d1f21ef33e32@arm.com>
 <60b5b2a2-1d1d-661c-d61e-855178fff44d@redhat.com>
From: Ryan Roberts
In-Reply-To: <60b5b2a2-1d1d-661c-d61e-855178fff44d@redhat.com>

On 09/08/2023 20:17, David Hildenbrand wrote:
> On 09.08.23 21:07, Ryan Roberts wrote:
>> On 09/08/2023 09:32, David Hildenbrand wrote:
>>> Let's track the total mapcount for all large folios in the first
>>> subpage.
>>>
>>> The total mapcount is what we actually want to know in folio_mapcount()
>>> and it is also sufficient for implementing folio_mapped(). This also
>>> gets rid of any "raceiness" concerns as expressed in
>>> folio_total_mapcount().
>>>
>>> With sub-PMD THP becoming more important and things looking promising
>>> that we will soon get support for such anon THP, we want to avoid
>>> looping over all pages of a folio just to calculate the total mapcount.
>>> Further, we may soon want to use the total mapcount in other contexts
>>> more frequently, so prepare for reading it efficiently and atomically.
>>>
>>> Make room for the total mapcount in page[1] of the folio by moving
>>> _nr_pages_mapped to page[2] of the folio: it is not applicable to
>>> hugetlb -- and with the total mapcount in place probably also not
>>> desirable even if PMD-mappable hugetlb pages could get PTE-mapped at
>>> some point -- so we can overlay the hugetlb fields.
>>>
>>> Note that we currently don't expect any order-1 compound pages / THP
>>> in rmap code, and that such support is not planned.
>>> If ever desired, we could move the compound mapcount to another page,
>>> because it only applies to PMD-mappable folios that are definitely
>>> larger. Let's avoid consuming more space elsewhere for now -- we might
>>> need more space for a different purpose in some subpages soon.
>>>
>>> Maintain the total mapcount also for hugetlb pages. Use the total
>>> mapcount to implement folio_mapcount(), total_mapcount(),
>>> folio_mapped() and page_mapped().
>>>
>>> We can now get rid of folio_total_mapcount() and
>>> folio_large_is_mapped(), by just inlining reading of the total
>>> mapcount.
>>>
>>> _nr_pages_mapped is now only used in rmap code, so not accidentally
>>> used externally where it might be applied to arbitrary order-1 pages.
>>> The remaining usage is:
>>>
>>> (1) Detect how to adjust stats: NR_ANON_MAPPED and NR_FILE_MAPPED
>>>   -> If we would account the total folio as mapped when mapping a
>>>      page (based on the total mapcount), we could remove that usage.
>>>
>>> (2) Detect when to add a folio to the deferred split queue
>>>   -> If we would apply a different heuristic, or scan using the rmap
>>>      on the memory reclaim path for partially mapped anon folios to
>>>      split them, we could remove that usage as well.
>>>
>>> So maybe we can simplify things in the future and remove
>>> _nr_pages_mapped. For now, leave these things as they are; they need
>>> more thought. Hugh really did a nice job implementing that precise
>>> tracking after all.
>>>
>>> Note: Not adding order-1 sanity checks to the file_rmap functions for
>>>       now. For anon pages, they are certainly not required right now.
>>>
>>> Cc: Andrew Morton
>>> Cc: Jonathan Corbet
>>> Cc: Mike Kravetz
>>> Cc: Hugh Dickins
>>> Cc: "Matthew Wilcox (Oracle)"
>>> Cc: Ryan Roberts
>>> Cc: Yin Fengwei
>>> Cc: Yang Shi
>>> Cc: Zi Yan
>>> Signed-off-by: David Hildenbrand
>>
>> Other than the nits and query on zeroing _total_mapcount below, LGTM.
>> If zeroing is correct:
>>
>> Reviewed-by: Ryan Roberts
>
> Thanks for the review!
>
> [...]
>
>>>     static inline int total_mapcount(struct page *page)
>>
>> nit: couldn't total_mapcount() just be implemented as a wrapper around
>> folio_mapcount()? What's the benefit of the PageCompound() check
>> instead of just getting the folio and checking if it's large? i.e.:
>
> Good point, let me take a look tomorrow if the compiler can optimize
> both cases equally well.
>
> [...]
>
>>> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>>> index 5f498e8025cc..6a614c559ccf 100644
>>> --- a/mm/hugetlb.c
>>> +++ b/mm/hugetlb.c
>>> @@ -1479,7 +1479,7 @@ static void __destroy_compound_gigantic_folio(struct folio *folio,
>>>       struct page *p;
>>>
>>>       atomic_set(&folio->_entire_mapcount, 0);
>>> -    atomic_set(&folio->_nr_pages_mapped, 0);
>>> +    atomic_set(&folio->_total_mapcount, 0);
>>
>> Just checking this is definitely what you intended? _total_mapcount is
>> -1 when it means "no pages mapped", so 0 means 1 page mapped?
>
> I was blindly doing what _entire_mapcount is doing: zeroing out the
> values. ;)
>
> But let's look into the details: in __destroy_compound_gigantic_folio(),
> we're manually dissolving the whole compound page. So instead of
> actually returning a compound page to the buddy (where we would make
> sure the mapcounts are -1, to then zero them out!), we simply zero out
> the fields we use and then dissolve the compound page: to be left with
> a bunch of order-0 pages where the memmap is in a clean state.
>
> (the buddy doesn't handle that page order, so we have to do things
> manually to get to order-0 pages we can reuse or free)

Yeah fair enough, thanks for the explanation.