Date: Tue, 27 Feb 2024 10:53:03 +0000
From: Ryan Roberts
To: David Hildenbrand, Lance Yang
Cc: Barry Song <21cnbao@gmail.com>, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Yin Fengwei
Subject: Re: [PATCH] mm: export folio_pte_batch as a couple of modules might need it
X-Mailing-List: linux-kernel@vger.kernel.org
References: <20240227024050.244567-1-21cnbao@gmail.com> <61b9dfc9-5522-44fd-89a4-140833ede8af@arm.com>

On 27/02/2024 10:30, David Hildenbrand wrote:
> On 27.02.24 11:21, Lance Yang wrote:
>> On Tue, Feb 27, 2024 at 5:14 PM David Hildenbrand wrote:
>>>
>>> On 27.02.24 10:07, Ryan Roberts wrote:
>>>> On 27/02/2024 02:40, Barry Song wrote:
>>>>> From: Barry Song
>>>>>
>>>>> madvise and some others might need folio_pte_batch to check if a range
>>>>> of PTEs are completely mapped to a large folio with contiguous physcial
>>>>> addresses. Let's export it for others to use.
>>>>>
>>>>> Cc: Lance Yang
>>>>> Cc: Ryan Roberts
>>>>> Cc: David Hildenbrand
>>>>> Cc: Yin Fengwei
>>>>> Signed-off-by: Barry Song
>>>>> ---
>>>>>    -v1:
>>>>>    at least two jobs madv_free and madv_pageout depend on it. To avoid
>>>>>    conflicts and dependencies, after discussing with Lance, we prefer
>>>>>    this one can land earlier.
>>>>
>>>> I think this will also ultimately be useful for mprotect too, though I haven't
>>>> looked at it properly yet.
>>>>
>>>
>>> Yes, I think we briefly discussed that.
>>>
>>>>>
>>>>>    mm/internal.h | 13 +++++++++++++
>>>>>    mm/memory.c   | 11 +----------
>>>>>    2 files changed, 14 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/mm/internal.h b/mm/internal.h
>>>>> index 13b59d384845..8e2bc304f671 100644
>>>>> --- a/mm/internal.h
>>>>> +++ b/mm/internal.h
>>>>> @@ -83,6 +83,19 @@ static inline void *folio_raw_mapping(struct folio *folio)
>>>>>       return (void *)(mapping & ~PAGE_MAPPING_FLAGS);
>>>>>    }
>>>>>
>>>>> +/* Flags for folio_pte_batch(). */
>>>>> +typedef int __bitwise fpb_t;
>>>>> +
>>>>> +/* Compare PTEs after pte_mkclean(), ignoring the dirty bit. */
>>>>> +#define FPB_IGNORE_DIRTY            ((__force fpb_t)BIT(0))
>>>>> +
>>>>> +/* Compare PTEs after pte_clear_soft_dirty(), ignoring the soft-dirty bit. */
>>>>> +#define FPB_IGNORE_SOFT_DIRTY               ((__force fpb_t)BIT(1))
>>>>> +
>>>>> +extern int folio_pte_batch(struct folio *folio, unsigned long addr,
>>>>> +            pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
>>>>> +            bool *any_writable);
>>>>> +
>>>>>    void __acct_reclaim_writeback(pg_data_t *pgdat, struct folio *folio,
>>>>>                                               int nr_throttled);
>>>>>    static inline void acct_reclaim_writeback(struct folio *folio)
>>>>> diff --git a/mm/memory.c b/mm/memory.c
>>>>> index 1c45b6a42a1b..319b3be05e75 100644
>>>>> --- a/mm/memory.c
>>>>> +++ b/mm/memory.c
>>>>> @@ -953,15 +953,6 @@ static __always_inline void __copy_present_ptes(struct vm_area_struct *dst_vma,
>>>>>       set_ptes(dst_vma->vm_mm, addr, dst_pte, pte, nr);
>>>>>    }
>>>>>
>>>>> -/* Flags for folio_pte_batch(). */
>>>>> -typedef int __bitwise fpb_t;
>>>>> -
>>>>> -/* Compare PTEs after pte_mkclean(), ignoring the dirty bit. */
>>>>> -#define FPB_IGNORE_DIRTY            ((__force fpb_t)BIT(0))
>>>>> -
>>>>> -/* Compare PTEs after pte_clear_soft_dirty(), ignoring the soft-dirty bit. */
>>>>> -#define FPB_IGNORE_SOFT_DIRTY               ((__force fpb_t)BIT(1))
>>>>> -
>>>>>    static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
>>>>>    {
>>>>>       if (flags & FPB_IGNORE_DIRTY)
>>>>> @@ -982,7 +973,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
>>>>>     * If "any_writable" is set, it will indicate if any other PTE besides the
>>>>>     * first (given) PTE is writable.
>>>>>     */
>>>>
>>>> David was talking in Lance's patch thread, about improving the docs for this
>>>> function now that its exported. Might be worth syncing on that.
>>>
>>> Here is my take:
>>>
>>> Signed-off-by: David Hildenbrand
>>> ---
>>>    mm/memory.c | 22 ++++++++++++++++++----
>>>    1 file changed, 18 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index d0b855a1837a8..098356b8805ae 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -971,16 +971,28 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
>>>          return pte_wrprotect(pte_mkold(pte));
>>>    }
>>>
>>> -/*
>>> +/**
>>> + * folio_pte_batch - detect a PTE batch for a large folio
>>> + * @folio: The large folio to detect a PTE batch for.
>>> + * @addr: The user virtual address the first page is mapped at.
>>> + * @start_ptep: Page table pointer for the first entry.
>>> + * @pte: Page table entry for the first page.
>>
>> Nit:
>>
>> - * @pte: Page table entry for the first page.
>> + * @pte: Page table entry for the first page that must be the first subpage of
>> + *               the folio excluding arm64 for now.
>>
>> IIUC, pte_batch_hint is always 1 excluding arm64 for now.
>> I'm not sure if this modification will be helpful?
>
> IIRC, Ryan made sure that this also works when passing another subpage, after
> when cont-pte is set. Otherwise this would already be broken for fork/zap.
>
> So I don't think this comment would actually be correct.

Indeed, the spec for the function is exactly the same for arm64 as for other
arches. It's just that arm64 can accelerate the implementation by skipping
forward to the next contpte boundary when the current pte is part of a contpte
block.

There is no requirement for pte (or addr or start_ptep) to point to the first
subpage of a folio - they can point to any subpage. pte, addr and start_ptep
must all refer to the same entry, but I think that's clear from the existing text.
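
To make the "same spec, faster implementation" point concrete, here is a rough
userspace model of the batching loop. It is only a sketch and not the kernel
code: struct mock_pte, mock_batch_hint(), mock_pte_batch(), mock_fill() and the
CONT_PTES value of 4 are all made up for illustration, and it ignores the fpb_t
flags and any_writable entirely. The only thing it tries to show is that the
hint changes how many entries get inspected, not the result, and that the
starting entry can be any subpage of the folio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: only the fields this sketch compares. */
struct mock_pte {
	unsigned long pfn;	/* physical frame number the entry maps */
	bool present;		/* entry maps a page at all */
	bool cont;		/* models the arm64 contiguous bit */
};

#define CONT_PTES 4	/* hypothetical contpte block size, in entries */

/*
 * Models pte_batch_hint(): number of entries, starting at idx, that can be
 * assumed to belong to the batch without looking at them. Always 1 unless
 * the entry is part of a contpte block, in which case it is the distance
 * to the next block boundary.
 */
static int mock_batch_hint(const struct mock_pte *ptes, size_t idx)
{
	if (!ptes[idx].cont)
		return 1;
	return CONT_PTES - (int)(idx % CONT_PTES);
}

/*
 * Simplified model of the batch detection: count how many consecutive
 * entries, starting at "start", map consecutive pfns inside the folio.
 * "start" may be any subpage of the folio, not just the first one.
 * The caller must make sure start + max_nr stays within the array.
 */
static int mock_pte_batch(const struct mock_pte *ptes, size_t start,
			  int max_nr, unsigned long folio_start_pfn,
			  unsigned long folio_nr_pages)
{
	unsigned long folio_end_pfn = folio_start_pfn + folio_nr_pages;
	unsigned long expected_pfn;
	size_t end, idx;
	int nr, len;

	assert(max_nr > 0);

	end = start + (size_t)max_nr;
	nr = mock_batch_hint(ptes, start);
	expected_pfn = ptes[start].pfn + (unsigned long)nr;
	idx = start + (size_t)nr;

	while (idx < end) {
		const struct mock_pte *pte = &ptes[idx];

		if (!pte->present || pte->pfn != expected_pfn)
			break;
		if (pte->pfn >= folio_end_pfn)
			break;

		/* Without the cont bit this advances by 1; with it, by up to CONT_PTES. */
		nr = mock_batch_hint(ptes, idx);
		expected_pfn += (unsigned long)nr;
		idx += (size_t)nr;
	}

	/* The hint may have skipped past the requested range; clamp to max_nr. */
	len = (int)(idx - start);
	return len < max_nr ? len : max_nr;
}

/* Test helper: n present entries mapping first_pfn, first_pfn + 1, ... */
static void mock_fill(struct mock_pte *ptes, size_t n,
		      unsigned long first_pfn, bool cont)
{
	size_t i;

	for (i = 0; i < n; i++) {
		ptes[i].pfn = first_pfn + i;
		ptes[i].present = true;
		ptes[i].cont = cont;
	}
}
```

With eight entries filled from pfn 100 and an 8-page folio starting at pfn 100,
mock_pte_batch(p, 2, 6, 100, 8) returns 6 whether or not the cont bit is set.
In the cont case the loop only looks at the entry at index 4 instead of every
entry, which is the acceleration described above.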