From: Dan Williams
Date: Wed, 9 Mar 2022 16:26:33 -0800
Subject: Re: [PATCH v4 3/6] mm: rmap: introduce pfn_mkclean_range() to cleans PTEs
To: Muchun Song
Cc: Matthew Wilcox, Jan Kara, Al Viro, Andrew Morton, Alistair Popple,
    Yang Shi, Ralph Campbell, Hugh Dickins, xiyuyang19@fudan.edu.cn,
    "Kirill A. Shutemov", Ross Zwisler, Christoph Hellwig, linux-fsdevel,
    Linux NVDIMM, Linux Kernel Mailing List, Linux MM,
    duanxiongchun@bytedance.com, Muchun Song
In-Reply-To: <20220302082718.32268-4-songmuchun@bytedance.com>
References: <20220302082718.32268-1-songmuchun@bytedance.com>
            <20220302082718.32268-4-songmuchun@bytedance.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Mar 2, 2022 at 12:29 AM Muchun Song wrote:
>
> The page_mkclean_one() is supposed to be used with a pfn that has an
> associated struct page, but not all pfns (e.g. DAX) have a struct
> page. Introduce a new function, pfn_mkclean_range(), to clean the
> PTEs (including PMDs) mapped with a range of pfns that have no struct
> page associated with them. This helper will be used by the DAX device
> in the next patch to make pfns clean.

This seems unfortunate given the desire to kill off CONFIG_FS_DAX_LIMITED,
which is the only way to get DAX without 'struct page'. I would
special-case these helpers behind CONFIG_FS_DAX_LIMITED so that they can
be deleted when that support is finally removed. (A rough sketch of what I
mean is at the end of this mail.)

>
> Signed-off-by: Muchun Song
> ---
>  include/linux/rmap.h |  3 +++
>  mm/internal.h        | 26 +++++++++++++--------
>  mm/rmap.c            | 65 +++++++++++++++++++++++++++++++++++++++++++---------
>  3 files changed, 74 insertions(+), 20 deletions(-)
>
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index b58ddb8b2220..a6ec0d3e40c1 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -263,6 +263,9 @@ unsigned long page_address_in_vma(struct page *, struct vm_area_struct *);
>   */
>  int folio_mkclean(struct folio *);
>
> +int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> +                      struct vm_area_struct *vma);
> +
>  void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked);
>
>  /*
> diff --git a/mm/internal.h b/mm/internal.h
> index f45292dc4ef5..ff873944749f 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -516,26 +516,22 @@ void mlock_page_drain(int cpu);
>  extern pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma);
>
>  /*
> - * At what user virtual address is page expected in vma?
> - * Returns -EFAULT if all of the page is outside the range of vma.
> - * If page is a compound head, the entire compound page is considered.
> + * * Return the start of user virtual address at the specific offset within
> + * a vma.
>   */
>  static inline unsigned long
> -vma_address(struct page *page, struct vm_area_struct *vma)
> +vma_pgoff_address(pgoff_t pgoff, unsigned long nr_pages,
> +                  struct vm_area_struct *vma)
>  {
> -        pgoff_t pgoff;
>          unsigned long address;
>
> -        VM_BUG_ON_PAGE(PageKsm(page), page);  /* KSM page->index unusable */
> -        pgoff = page_to_pgoff(page);
>          if (pgoff >= vma->vm_pgoff) {
>                  address = vma->vm_start +
>                          ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>                  /* Check for address beyond vma (or wrapped through 0?) */
>                  if (address < vma->vm_start || address >= vma->vm_end)
>                          address = -EFAULT;
> -        } else if (PageHead(page) &&
> -                   pgoff + compound_nr(page) - 1 >= vma->vm_pgoff) {
> +        } else if (pgoff + nr_pages - 1 >= vma->vm_pgoff) {
>                  /* Test above avoids possibility of wrap to 0 on 32-bit */
>                  address = vma->vm_start;
>          } else {
> @@ -545,6 +541,18 @@ vma_address(struct page *page, struct vm_area_struct *vma)
>  }
>
>  /*
> + * Return the start of user virtual address of a page within a vma.
> + * Returns -EFAULT if all of the page is outside the range of vma.
> + * If page is a compound head, the entire compound page is considered.
> + */
> +static inline unsigned long
> +vma_address(struct page *page, struct vm_area_struct *vma)
> +{
> +        VM_BUG_ON_PAGE(PageKsm(page), page);  /* KSM page->index unusable */
> +        return vma_pgoff_address(page_to_pgoff(page), compound_nr(page), vma);
> +}
> +
> +/*
>   * Then at what user virtual address will none of the range be found in vma?
>   * Assumes that vma_address() already returned a good starting address.
>   */
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 723682ddb9e8..ad5cf0e45a73 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -929,12 +929,12 @@ int folio_referenced(struct folio *folio, int is_locked,
>          return pra.referenced;
>  }
>
> -static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
> -                             unsigned long address, void *arg)
> +static int page_vma_mkclean_one(struct page_vma_mapped_walk *pvmw)
>  {
> -        DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC);
> +        int cleaned = 0;
> +        struct vm_area_struct *vma = pvmw->vma;
>          struct mmu_notifier_range range;
> -        int *cleaned = arg;
> +        unsigned long address = pvmw->address;
>
>          /*
>           * We have to assume the worse case ie pmd for invalidation. Note that
> @@ -942,16 +942,16 @@ static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
>           */
>          mmu_notifier_range_init(&range, MMU_NOTIFY_PROTECTION_PAGE,
>                                  0, vma, vma->vm_mm, address,
> -                                vma_address_end(&pvmw));
> +                                vma_address_end(pvmw));
>          mmu_notifier_invalidate_range_start(&range);
>
> -        while (page_vma_mapped_walk(&pvmw)) {
> +        while (page_vma_mapped_walk(pvmw)) {
>                  int ret = 0;
>
> -                address = pvmw.address;
> -                if (pvmw.pte) {
> +                address = pvmw->address;
> +                if (pvmw->pte) {
>                          pte_t entry;
> -                        pte_t *pte = pvmw.pte;
> +                        pte_t *pte = pvmw->pte;
>
>                          if (!pte_dirty(*pte) && !pte_write(*pte))
>                                  continue;
> @@ -964,7 +964,7 @@ static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
>                          ret = 1;
>                  } else {
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -                        pmd_t *pmd = pvmw.pmd;
> +                        pmd_t *pmd = pvmw->pmd;
>                          pmd_t entry;
>
>                          if (!pmd_dirty(*pmd) && !pmd_write(*pmd))
> @@ -991,11 +991,22 @@ static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
>                   * See Documentation/vm/mmu_notifier.rst
>                   */
>                  if (ret)
> -                        (*cleaned)++;
> +                        cleaned++;
>          }
>
>          mmu_notifier_invalidate_range_end(&range);
>
> +        return cleaned;
> +}
> +
> +static bool page_mkclean_one(struct folio *folio, struct vm_area_struct *vma,
> +                             unsigned long address, void *arg)
> +{
> +        DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, PVMW_SYNC);
> +        int *cleaned = arg;
> +
> +        *cleaned += page_vma_mkclean_one(&pvmw);
> +
>          return true;
>  }
>
> @@ -1033,6 +1044,38 @@ int folio_mkclean(struct folio *folio)
>  EXPORT_SYMBOL_GPL(folio_mkclean);
>
>  /**
> + * pfn_mkclean_range - Cleans the PTEs (including PMDs) mapped with range of
> + *                     [@pfn, @pfn + @nr_pages) at the specific offset (@pgoff)
> + *                     within the @vma of shared mappings. And since clean PTEs
> + *                     should also be readonly, write protects them too.
> + * @pfn: start pfn.
> + * @nr_pages: number of physically contiguous pages srarting with @pfn.
> + * @pgoff: page offset that the @pfn mapped with.
> + * @vma: vma that @pfn mapped within.
> + *
> + * Returns the number of cleaned PTEs (including PMDs).
> + */
> +int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
> +                      struct vm_area_struct *vma)
> +{
> +        struct page_vma_mapped_walk pvmw = {
> +                .pfn            = pfn,
> +                .nr_pages       = nr_pages,
> +                .pgoff          = pgoff,
> +                .vma            = vma,
> +                .flags          = PVMW_SYNC,
> +        };
> +
> +        if (invalid_mkclean_vma(vma, NULL))
> +                return 0;
> +
> +        pvmw.address = vma_pgoff_address(pgoff, nr_pages, vma);
> +        VM_BUG_ON_VMA(pvmw.address == -EFAULT, vma);
> +
> +        return page_vma_mkclean_one(&pvmw);
> +}
> +
> +/**
>   * page_move_anon_rmap - move a page to our anon_vma
>   * @page: the page to move to our anon_vma
>   * @vma: the vma the page belongs to
> --
> 2.11.0
>
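
To make the special-casing suggestion above concrete, here is a rough,
untested sketch (illustration only, not a tested patch) of how the
declaration could be hidden behind CONFIG_FS_DAX_LIMITED, with the
definition in mm/rmap.c wrapped in the same #ifdef:

/* include/linux/rmap.h -- rough sketch, gate the pfn-based helper */
#ifdef CONFIG_FS_DAX_LIMITED
int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
                      struct vm_area_struct *vma);
#else
static inline int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages,
                                    pgoff_t pgoff, struct vm_area_struct *vma)
{
        /*
         * Without CONFIG_FS_DAX_LIMITED every DAX pfn has a struct page,
         * so callers can go through folio_mkclean() instead; nothing to
         * do here, report zero PTEs cleaned.
         */
        return 0;
}
#endif

That way the pfn-based path is clearly marked as existing only for the
page-less DAX case and can be deleted wholesale together with
CONFIG_FS_DAX_LIMITED.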
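
And to double-check my understanding of the intended use: a caller in the
DAX path would presumably walk the shared mappings of the file range and
invoke the helper once per vma, along the lines of the hypothetical sketch
below (function name and locking are illustrative, modeled on the existing
pfn walk in fs/dax.c, not the actual next patch):

/* Hypothetical caller, for illustration only. */
static void dax_range_mkclean(struct address_space *mapping, pgoff_t pgoff,
                              unsigned long pfn, unsigned long nr_pages)
{
        struct vm_area_struct *vma;

        i_mmap_lock_read(mapping);
        /* Visit every vma that maps [pgoff, pgoff + nr_pages) of the file. */
        vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff,
                                  pgoff + nr_pages - 1) {
                /* Write-protect and clean the PTEs/PMDs covering these pfns. */
                pfn_mkclean_range(pfn, nr_pages, pgoff, vma);
                cond_resched();
        }
        i_mmap_unlock_read(mapping);
}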