From: Dan Williams
Date: Wed, 24 Mar 2021 10:45:42 -0700
Subject: Re: [PATCH 3/3] mm/devmap: Remove pgmap accounting in the get_user_pages_fast() path
To: Joao Martins
Cc: Jason Gunthorpe, Christoph Hellwig, Shiyang Ruan, Vishal Verma,
    Dave Jiang, Ira Weiny, Matthew Wilcox, Jan Kara, Andrew Morton, david,
    linux-fsdevel, Linux Kernel Mailing List, Linux MM, linux-nvdimm
In-Reply-To: <66514812-6a24-8e2e-7be5-c61e188fecc4@oracle.com>
References: <161604048257.1463742.1374527716381197629.stgit@dwillia2-desk3.amr.corp.intel.com>
    <161604050866.1463742.7759521510383551055.stgit@dwillia2-desk3.amr.corp.intel.com>
    <66514812-6a24-8e2e-7be5-c61e188fecc4@oracle.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Mar 18, 2021 at 3:02 AM Joao Martins wrote:
>
> On 3/18/21 4:08 AM, Dan Williams wrote:
> > Now that device-dax and filesystem-dax are guaranteed to unmap all user
> > mappings of devmap / DAX pages before tearing down the 'struct page'
> > array, get_user_pages_fast() can rely on its traditional synchronization
> > method "validate_pte(); get_page(); revalidate_pte()" to catch races with
> > device shutdown. Specifically the unmap guarantee ensures that gup-fast
> > either succeeds in taking a page reference (lock-less), or it detects a
> > need to fall back to the slow path where the device presence can be
> > revalidated with locks held.
>
> [...]
>
> > @@ -2087,21 +2078,26 @@ static int gup_pte_range(pmd_t pmd, unsigned long addr, unsigned long end,
> >  #endif /* CONFIG_ARCH_HAS_PTE_SPECIAL */
> >
> >  #if defined(CONFIG_ARCH_HAS_PTE_DEVMAP) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
> > +
> >  static int __gup_device_huge(unsigned long pfn, unsigned long addr,
> >                               unsigned long end, unsigned int flags,
> >                               struct page **pages, int *nr)
> >  {
> >         int nr_start = *nr;
> > -       struct dev_pagemap *pgmap = NULL;
> >
> >         do {
> > -               struct page *page = pfn_to_page(pfn);
> > +               struct page *page;
> > +
> > +               /*
> > +                * Typically pfn_to_page() on a devmap pfn is not safe
> > +                * without holding a live reference on the hosting
> > +                * pgmap. In the gup-fast path it is safe because any
> > +                * races will be resolved by either gup-fast taking a
> > +                * reference or the shutdown path unmapping the pte to
> > +                * trigger gup-fast to fall back to the slow path.
> > +                */
> > +               page = pfn_to_page(pfn);
> >
> > -               pgmap = get_dev_pagemap(pfn, pgmap);
> > -               if (unlikely(!pgmap)) {
> > -                       undo_dev_pagemap(nr, nr_start, flags, pages);
> > -                       return 0;
> > -               }
> >                 SetPageReferenced(page);
> >                 pages[*nr] = page;
> >                 if (unlikely(!try_grab_page(page, flags))) {
>
> So for allowing FOLL_LONGTERM[0] would it be OK if we used page->pgmap after
> try_grab_page() for checking pgmap type to see if we are in a device-dax
> longterm pin?

So, there is an effort to add a new pte bit p{m,u}d_special to disable
gup-fast for huge pages [1]. I'd like to investigate whether we could use
devmap + special as an encoding for "no longterm" and never consult the
pgmap in the gup-fast path.

[1]: https://lore.kernel.org/linux-mm/a1fa7fa2-914b-366d-9902-e5b784e8428c@shipmail.org/
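The quoted commit message depends on gup-fast's "validate_pte(); get_page(); revalidate_pte()" ordering. The sketch below is a userspace C11 model of that ordering only, built on a few assumptions. The "pte" is an atomic word with a hypothetical present bit plus a page index, and a "page" is nothing more than an atomic refcount. Every name (`model_pte`, `pages_array`, `gup_fast_model()` and the helpers) is hypothetical and not a kernel API. The model also leaves out things the real code relies on, such as disabling interrupts so page tables cannot be freed during the lockless walk.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: bit 0 of the pte means "present", the remaining
 * bits hold an index into a small array of model pages. */
#define PTE_PRESENT ((uintptr_t)1)

struct model_page {
    atomic_int refcount;    /* 0 means the page is being torn down */
};

static struct model_page pages_array[4];
static _Atomic uintptr_t model_pte;

/* Speculative reference: succeeds only while the count is non-zero,
 * loosely analogous to what try_grab_page() guards against. */
static bool try_get_page_model(struct model_page *page)
{
    int old = atomic_load(&page->refcount);

    while (old > 0) {
        /* On failure, old is reloaded with the current count. */
        if (atomic_compare_exchange_weak(&page->refcount, &old, old + 1))
            return true;
    }
    return false;
}

static void put_page_model(struct model_page *page)
{
    int old = atomic_fetch_sub(&page->refcount, 1);

    assert(old > 0);    /* a put must pair with a successful get */
}

/* Returns a referenced page, or NULL meaning "fall back to the slow path". */
static struct model_page *gup_fast_model(void)
{
    uintptr_t pte = atomic_load(&model_pte);            /* validate_pte() */
    struct model_page *page;

    if (!(pte & PTE_PRESENT))
        return NULL;
    page = &pages_array[pte >> 1];  /* the model assumes a valid index */

    if (!try_get_page_model(page))                      /* get_page() */
        return NULL;

    if (atomic_load(&model_pte) != pte) {               /* revalidate_pte() */
        /* Raced with an unmap: drop the reference and bail out. */
        put_page_model(page);
        return NULL;
    }
    return page;
}
```

For example, with `model_pte` pointing at `pages_array[2]` (refcount 1), `gup_fast_model()` returns that page and raises its count to 2. Once the modeled shutdown path clears `model_pte`, the same call returns NULL. That is the "unmap first, then tear down" guarantee that lets the slow path, not the pgmap lookup, handle the race.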
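The devmap + special idea in the reply is only something to investigate, but the check it implies can be sketched. The model below makes two assumptions, following the reply's description of [1]. First, a special bit on a huge entry without devmap makes gup-fast bail out entirely. Second, devmap plus special would mean "no longterm pins", so a FOLL_LONGTERM request falls back to the slow path without ever reading page->pgmap. All bit values and names (`MODEL_PMD_*`, `MODEL_FOLL_LONGTERM`, `gup_fast_check_pmd()`) are hypothetical and do not reflect any architecture's real pte layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for a model huge-page entry; NOT a real pte format. */
#define MODEL_PMD_PRESENT   (1u << 0)
#define MODEL_PMD_DEVMAP    (1u << 1)
#define MODEL_PMD_SPECIAL   (1u << 2)

/* Hypothetical stand-in for FOLL_LONGTERM; the only flag this model knows. */
#define MODEL_FOLL_LONGTERM (1u << 0)

enum gup_fast_decision { GUP_FAST_OK, GUP_FAST_FALLBACK };

/* Decide from the entry bits alone; nothing here looks at a pgmap. */
static enum gup_fast_decision gup_fast_check_pmd(uint32_t pmd, uint32_t gup_flags)
{
    bool devmap = pmd & MODEL_PMD_DEVMAP;
    bool special = pmd & MODEL_PMD_SPECIAL;

    assert((gup_flags & ~MODEL_FOLL_LONGTERM) == 0);

    if (!(pmd & MODEL_PMD_PRESENT))
        return GUP_FAST_FALLBACK;
    /* special without devmap: gup-fast disabled for this entry (assumed per [1]) */
    if (special && !devmap)
        return GUP_FAST_FALLBACK;
    /* devmap + special: hypothetical "no longterm" encoding */
    if (devmap && special && (gup_flags & MODEL_FOLL_LONGTERM))
        return GUP_FAST_FALLBACK;
    return GUP_FAST_OK;
}
```

Keeping the decision in the entry bits is what would let gup-fast refuse a longterm pin without the per-page pgmap lookup that the patch above removes. A plain devmap entry, or a devmap + special entry without FOLL_LONGTERM, still takes the fast path.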