From: Dan Williams
Date: Tue, 4 Dec 2018 15:03:02 -0800
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
To: John Hubbard
Cc: John Hubbard, Andrew Morton, Linux MM, Jan Kara, tom@talpey.com,
    Al Viro, benve@cisco.com, Christoph Hellwig, Christopher Lameter,
    "Dalessandro, Dennis", Doug Ledford, Jason Gunthorpe, Jérôme Glisse,
    Matthew Wilcox, Michal Hocko, mike.marciniszyn@intel.com,
    rcampbell@nvidia.com, Linux Kernel Mailing List, linux-fsdevel
References: <20181204001720.26138-1-jhubbard@nvidia.com> <20181204001720.26138-2-jhubbard@nvidia.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Dec 4, 2018 at 1:56 PM John Hubbard wrote:
>
> On 12/4/18 12:28 PM, Dan Williams wrote:
> > On Mon, Dec 3, 2018 at 4:17 PM wrote:
> >>
> >> From: John Hubbard
> >>
> >> Introduces put_user_page(), which simply calls put_page().
> >> This provides a way to update all get_user_pages*() callers,
> >> so that they call put_user_page(), instead of put_page().
> >>
> >> Also introduces put_user_pages(), and a few dirty/locked variations,
> >> as a replacement for release_pages(), and also as a replacement
> >> for open-coded loops that release multiple pages.
> >> These may be used for subsequent performance improvements,
> >> via batching of pages to be released.
> >>
> >> This is the first step of fixing the problem described in [1]. The steps
> >> are:
> >>
> >> 1) (This patch): provide put_user_page*() routines, intended to be used
> >>    for releasing pages that were pinned via get_user_pages*().
> >>
> >> 2) Convert all of the call sites for get_user_pages*(), to
> >>    invoke put_user_page*(), instead of put_page(). This involves dozens of
> >>    call sites, and will take some time.
> >>
> >> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
> >>    implement tracking of these pages. This tracking will be separate from
> >>    the existing struct page refcounting.
> >>
> >> 4) Use the tracking and identification of these pages, to implement
> >>    special handling (especially in writeback paths) when the pages are
> >>    backed by a filesystem. Again, [1] provides details as to why that is
> >>    desirable.
> >
> > I thought at Plumbers we talked about using a page bit to tag pages
> > that have had their reference count elevated by get_user_pages()? That
> > way there is no need to distinguish put_page() from put_user_page() it
> > just happens internally to put_page(). At the conference Matthew was
> > offering to free up a page bit for this purpose.
> >
>
> ...but then, upon further discussion in that same session, we realized that
> that doesn't help. You need a reference count. Otherwise a random put_page
> could affect your dma-pinned pages, etc, etc.

Ok, sorry, I mis-remembered. So, you're effectively trying to capture
the end of the page pin event separate from the final 'put' of the
page? Makes sense.

> I was not able to actually find any place where a single additional page
> bit would help our situation, which is why this still uses LRU fields for
> both the two bits required (the RFC [1] still applies), and the dma_pinned_count.

Except the LRU fields are already in use for ZONE_DEVICE pages...
how does this proposal interact with those?

> [1] https://lore.kernel.org/r/20181110085041.10071-7-jhubbard@nvidia.com
>
> >> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> >>
> >> Reviewed-by: Jan Kara
> >
> > Wish, you could have been there Jan. I'm missing why it's safe to
> > assume that a single put_user_page() is paired with a get_user_page()?
> >
>
> A put_user_page() per page, or a put_user_pages() for an array of pages. See
> patch 0002 for several examples.

Yes, however I was more concerned about validation and trying to
locate missed places where put_page() is used instead of
put_user_page().

It would be interesting to see if we could have a debug mode where
get_user_pages() returned dynamically allocated pages from a known
address range and catch drivers that operate on a user-pinned page
without using the proper helper to 'put' it. I think we might also
need a ref_user_page() for drivers that may do their own get_page()
and expect the dma_pinned_count to also increase.
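
To make the validation concern a bit more concrete, here is a rough
userspace model of a pin count kept separately from the ordinary
refcount. It is only a sketch, not kernel code: struct model_page, every
model_* function, and the pin_leak_detected flag are made up for
illustration, and model_ref_user_page() merely stands in for the
ref_user_page() idea above, which is not part of the posted patches as
far as this thread shows. The model assumes a misplaced put_page() can
only be noticed late, when the last reference goes away while the pin
count is still nonzero.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: this is NOT the kernel's struct page. */
struct model_page {
	int refcount;           /* stands in for the ordinary page refcount */
	int dma_pinned_count;   /* separate count of get_user_pages()-style pins */
	bool pin_leak_detected; /* set if the last reference drops while pinned */
};

/* Plain put: drops one reference and knows nothing about pins. */
static inline void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
	/* Debug check: the page would be freed while a pin is outstanding. */
	if (p->refcount == 0 && p->dma_pinned_count > 0)
		p->pin_leak_detected = true;
}

/* Pin: takes a reference and records it in the separate pin count. */
static inline void model_get_user_page(struct model_page *p)
{
	p->refcount++;
	p->dma_pinned_count++;
}

/* Unpin: ends the pin first, then drops the reference it held. */
static inline void model_put_user_page(struct model_page *p)
{
	assert(p->dma_pinned_count > 0);
	p->dma_pinned_count--;
	model_put_page(p);
}

/* Hypothetical: an extra reference on an already pinned page that
 * should also count as a pin. */
static inline void model_ref_user_page(struct model_page *p)
{
	assert(p->dma_pinned_count > 0);
	p->refcount++;
	p->dma_pinned_count++;
}

/* Array variant: one model_put_user_page() per entry. */
static inline void model_put_user_pages(struct model_page **pages,
					unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		model_put_user_page(pages[i]);
}
```

In this model a driver that pins with model_get_user_page() and then
releases with model_put_page() goes unnoticed until the owner drops the
final reference, at which point pin_leak_detected is set. That delay is
why something stronger, such as the known-address-range debug mode
suggested above, might be needed to point at the offending call site.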