Subject: Re: [PATCH] mm: remove extra ZONE_DEVICE struct page refcount
From: Ralph Campbell
To: Dan Williams
CC: Linux MM, Linux Kernel Mailing List, Ira Weiny, Matthew Wilcox,
    Jerome Glisse, John Hubbard, Alistair Popple, Christoph Hellwig,
    Jason Gunthorpe, Bharata B Rao, Zi Yan, Kirill A. Shutemov,
    Yang Shi, Paul Mackerras, Ben Skeggs, Andrew Morton
Date: Mon, 14 Sep 2020 16:53:25 -0700
Message-ID: <10b4b85c-f1e9-b6b5-74cd-6190ee0aca5d@nvidia.com>
References: <20200914224509.17699-1-rcampbell@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On 9/14/20 4:10 PM, Dan Williams wrote:
> On Mon, Sep 14, 2020 at 3:45 PM Ralph Campbell wrote:
>>
>> ZONE_DEVICE struct pages have an extra reference count that complicates the
>> code for put_page() and several places in the kernel that need to check the
>> reference count to see that a page is not being used (gup, compaction,
>> migration, etc.). Clean up the code so the reference count doesn't need to
>> be treated specially for ZONE_DEVICE.
>>
>> Signed-off-by: Ralph Campbell
>> ---
>>
>> Matthew Wilcox, Ira Weiny, and others have complained that ZONE_DEVICE
>> struct page reference counting is ugly/broken. This is my attempt to
>> fix it and it works for the HMM migration self tests.
>
> Can you link to a technical description of what's broken? Or better
> yet, summarize that argument in the changelog?
>
>> I'm only sending this out as an RFC since I'm not that familiar with the
>> DAX, PMEM, XEN, and other uses of ZONE_DEVICE struct pages allocated
>> with devm_memremap_pages() or memremap_pages(), but my best reading of
>> the code looks like it might be OK. I could use help testing these
>> configurations.
>
> Back in the 4.15 days I could not convince myself that some code paths
> blindly assumed that pages with refcount==0 were on an lru list. Since
> then, struct page has been reorganized to not collide the ->pgmap back
> pointer with the ->lru list and there have been other cleanups for
> page pinning that might make this incremental cleanup viable.
>
> You also need to fix up ext4_break_layouts() and
> xfs_break_dax_layouts() to expect ->_refcount is 0 instead of 1. This
> also needs some fstests exposure.

Got it. Thanks!

>> I have a modified THP migration patch series that applies on top of
>> this one and is cleaner since I don't have to add code to handle the
>> +1 reference count. The link below is for the earlier v2:
>> ("mm/hmm/nouveau: add THP migration to migrate_vma_*")
>> https://lore.kernel.org/linux-mm/20200902165830.5367-1-rcampbell@nvidia.com
>>
>>  arch/powerpc/kvm/book3s_hv_uvmem.c     |  1 -
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c |  1 -
>>  include/linux/memremap.h               |  6 +--
>>  include/linux/mm.h                     | 39 ---------------
>>  lib/test_hmm.c                         |  1 -
>>  mm/gup.c                               | 44 -----------------
>>  mm/memremap.c                          | 20 ++++----
>>  mm/migrate.c                           |  5 --
>>  mm/swap.c                              | 66 +++++++++++---------------
>>  9 files changed, 41 insertions(+), 142 deletions(-)
>
> This diffstat is indeed appealing.
>
>> diff --git a/arch/powerpc/kvm/book3s_hv_uvmem.c b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> index 84e5a2dc8be5..00d97050d7ff 100644
>> --- a/arch/powerpc/kvm/book3s_hv_uvmem.c
>> +++ b/arch/powerpc/kvm/book3s_hv_uvmem.c
>> @@ -711,7 +711,6 @@ static struct page *kvmppc_uvmem_get_page(unsigned long gpa, struct kvm *kvm)
>>
>>  	dpage = pfn_to_page(uvmem_pfn);
>>  	dpage->zone_device_data = pvt;
>> -	get_page(dpage);
>>  	lock_page(dpage);
>>  	return dpage;
>>  out_clear:
>> diff --git a/drivers/gpu/drm/nouveau/nouveau_dmem.c b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> index a13c6215bba8..2a4bbe01a455 100644
>> --- a/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> +++ b/drivers/gpu/drm/nouveau/nouveau_dmem.c
>> @@ -324,7 +324,6 @@ nouveau_dmem_page_alloc_locked(struct nouveau_drm *drm)
>>  		return NULL;
>>  	}
>>
>> -	get_page(page);
>>  	lock_page(page);
>>  	return page;
>>  }
>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>> index 4e9c738f4b31..7dd9802d2612 100644
>> --- a/include/linux/memremap.h
>> +++ b/include/linux/memremap.h
>> @@ -67,9 +67,9 @@ enum memory_type {
>>
>>  struct dev_pagemap_ops {
>>  	/*
>> -	 * Called once the page refcount reaches 1. (ZONE_DEVICE pages never
>> -	 * reach 0 refcount unless there is a refcount bug. This allows the
>> -	 * device driver to implement its own memory management.)
>> +	 * Called once the page refcount reaches 0. The reference count is
>> +	 * reset to 1 before calling page_free(). This allows the
>> +	 * device driver to implement its own memory management.
>
> I'd clarify the ordering of events / responsibility between the common
> core page_free() and the device-specific page_free(). At the same time,
> why not update drivers to expect that the page is already refcount==0 on
> entry? Seems odd to go through all this trouble to make the reference
> count appear to be zero to the wider kernel but expect that drivers
> get a fake reference on entry to their ->page_free() callbacks.

Good point.
Since set_page_refcounted() is defined in mm_internal.h, I would have to
move the definition to someplace like page_ref.h, or have the drivers
call init_page_count() or set_page_count(), since get_page() calls
VM_BUG_ON_PAGE() if the refcount is zero. I'll move set_page_refcounted()
since that is what the page allocator uses and it seems named for the
purpose.
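To make that concrete, here is roughly what I picture a driver allocation path looking like if ->page_free() hands the page back with refcount == 0 (pseudocode sketch only, not compiled — dmem_pop_free_page() is a made-up stand-in for a driver's free-list pop, and it assumes set_page_refcounted() has been made visible to drivers):

```c
/* Pseudocode sketch: a page sits on the driver's free list with a zero
 * refcount, so the allocator must re-initialize the count rather than
 * call get_page(), which would trip VM_BUG_ON_PAGE() on refcount == 0. */
static struct page *dmem_page_alloc_locked(struct dmem *d)
{
	struct page *page = dmem_pop_free_page(d);	/* refcount == 0 here */

	if (!page)
		return NULL;

	set_page_refcounted(page);	/* 0 -> 1, as the page allocator does */
	lock_page(page);
	return page;
}
```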