Date: Tue, 4 Dec 2018 19:36:48 -0500
From: Jerome Glisse
To: Dan Williams
Cc: John Hubbard, Andrew Morton, Linux MM, Jan Kara, tom@talpey.com,
    Al Viro, benve@cisco.com, Christoph Hellwig, Christopher Lameter,
    "Dalessandro, Dennis", Doug Ledford, Jason Gunthorpe,
    Matthew Wilcox, Michal Hocko, mike.marciniszyn@intel.com,
    rcampbell@nvidia.com, Linux Kernel Mailing List, linux-fsdevel
Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Message-ID: <20181205003648.GT2937@redhat.com>
References: <20181204001720.26138-1-jhubbard@nvidia.com>
 <20181204001720.26138-2-jhubbard@nvidia.com>

On Tue, Dec 04, 2018 at 03:03:02PM -0800, Dan Williams wrote:
> On Tue, Dec 4, 2018 at 1:56 PM John Hubbard wrote:
> >
> > On 12/4/18 12:28 PM, Dan Williams wrote:
> > > On Mon, Dec 3, 2018 at 4:17 PM wrote:
> > >>
> > >> From: John Hubbard
> > >>
> > >> Introduces put_user_page(), which simply calls put_page(). This
> > >> provides a way to update all get_user_pages*() callers, so that
> > >> they call put_user_page() instead of put_page().
> > >>
> > >> Also introduces put_user_pages(), and a few dirty/locked variations,
> > >> as a replacement for release_pages(), and also as a replacement for
> > >> open-coded loops that release multiple pages. These may be used for
> > >> subsequent performance improvements, via batching of pages to be
> > >> released.
> > >>
> > >> This is the first step of fixing the problem described in [1]. The
> > >> steps are:
> > >>
> > >> 1) (This patch): provide put_user_page*() routines, intended to be
> > >> used for releasing pages that were pinned via get_user_pages*().
> > >>
> > >> 2) Convert all of the call sites for get_user_pages*() to invoke
> > >> put_user_page*() instead of put_page(). This involves dozens of
> > >> call sites, and will take some time.
> > >>
> > >> 3) After (2) is complete, use get_user_pages*() and put_user_page*()
> > >> to implement tracking of these pages. This tracking will be separate
> > >> from the existing struct page refcounting.
> > >>
> > >> 4) Use the tracking and identification of these pages to implement
> > >> special handling (especially in writeback paths) when the pages are
> > >> backed by a filesystem. Again, [1] provides details as to why that
> > >> is desirable.
> > >
> > > I thought at Plumbers we talked about using a page bit to tag pages
> > > that have had their reference count elevated by get_user_pages()?
> > > That way there is no need to distinguish put_page() from
> > > put_user_page(); it just happens internally to put_page(). At the
> > > conference Matthew was offering to free up a page bit for this
> > > purpose.
> >
> > ...but then, upon further discussion in that same session, we realized
> > that that doesn't help. You need a reference count. Otherwise a random
> > put_page() could affect your dma-pinned pages, etc, etc.
>
> Ok, sorry, I mis-remembered. So, you're effectively trying to capture
> the end of the page pin event separately from the final 'put' of the
> page? Makes sense.
>
> > I was not able to actually find any place where a single additional
> > page bit would help our situation, which is why this still uses the
> > LRU fields for both of the two bits required (the RFC [1] still
> > applies), and the dma_pinned_count.
>
> Except the LRU fields are already in use for ZONE_DEVICE pages... how
> does this proposal interact with those?
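
[ Side note for anyone reading along without the patches handy: going
  only from the description above (so modulo the exact kernel-doc and
  the dirty/locked variants), the patch 1 placeholders are essentially
  just thin wrappers:

        /* Release a page that was pinned via get_user_pages*(). */
        static inline void put_user_page(struct page *page)
        {
                put_page(page);
        }

        /* Release an array of pages pinned via get_user_pages*(). */
        static inline void put_user_pages(struct page **pages,
                                          unsigned long npages)
        {
                unsigned long i;

                for (i = 0; i < npages; i++)
                        put_user_page(pages[i]);
        }

  so converting a call site is the mechanical change
  put_page(pages[i]) -> put_user_page(pages[i]), and the tracking in
  steps 3) and 4) can later be added behind these helpers without
  touching the callers again. ]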
> >
> > [1] https://lore.kernel.org/r/20181110085041.10071-7-jhubbard@nvidia.com
> >
> > >> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> > >>
> > >> Reviewed-by: Jan Kara
> > >
> > > Wish you could have been there, Jan. I'm missing why it's safe to
> > > assume that a single put_user_page() is paired with a
> > > get_user_page()?
> >
> > A put_user_page() per page, or a put_user_pages() for an array of
> > pages. See patch 0002 for several examples.
>
> Yes, however I was more concerned about validation and trying to
> locate missed places where put_page() is used instead of
> put_user_page().
>
> It would be interesting to see if we could have a debug mode where
> get_user_pages() returned dynamically allocated pages from a known
> address range, so that we could catch drivers that operate on a
> user-pinned page without using the proper helper to 'put' it. I think
> we might also need a ref_user_page() for drivers that may do their own
> get_page() and expect the dma_pinned_count to also increase.

Total crazy idea, but this is the right time of day for it (for me at
least it is beer time :)). What about mapping all struct pages at two
different ranges of kernel virtual address, and having get_user_pages()
return pointers into the second range? Then in put_page() you know for
sure whether the code putting the page got it from GUP or from
somewhere else. page_to_pfn() would need some trickery to handle that,
and i don't know if we are running out of kernel virtual address space
(outside of 32-bit, which i believe we are trying to quietly shoot down
behind the bar). A rough sketch of the pointer arithmetic i have in
mind (completely untested, all names made up):
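
        /*
         * Completely untested sketch. VMEMMAP_ALIAS_START and the
         * second mapping of the vmemmap are made up; only VMEMMAP_START
         * exists today. This shows the pointer arithmetic only.
         */
        #define VMEMMAP_ALIAS_OFFSET  (VMEMMAP_ALIAS_START - VMEMMAP_START)

        /* get_user_pages*() would hand out the alias pointer ... */
        static inline struct page *page_to_gup_alias(struct page *page)
        {
                return (struct page *)((unsigned long)page +
                                       VMEMMAP_ALIAS_OFFSET);
        }

        /* ... so the pointer itself records how the page was obtained. */
        static inline bool page_is_gup_alias(struct page *page)
        {
                return (unsigned long)page >= VMEMMAP_ALIAS_START;
        }

        void put_page(struct page *page)
        {
                if (page_is_gup_alias(page)) {
                        page = (struct page *)((unsigned long)page -
                                               VMEMMAP_ALIAS_OFFSET);
                        /* ... drop the GUP pin count here ... */
                }
                /* ... existing put_page() logic on the canonical pointer ... */
        }

The alias mapping would cost virtual address space but no extra memory,
since both ranges point at the same struct pages.

Cheers,
Jérôme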