Subject: Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
From: John Hubbard
To: Dan Williams, Jérôme Glisse
CC: John Hubbard, Andrew Morton, Linux MM, Jan Kara, Al Viro,
 Christoph Hellwig, Christopher Lameter, "Dalessandro, Dennis",
 Doug Ledford, Jason Gunthorpe, Matthew Wilcox, Michal Hocko,
 Linux Kernel Mailing List, linux-fsdevel
Date: Tue, 4 Dec 2018 16:59:55 -0800
Message-ID: <6455f657-708d-5b7f-00bf-89ca8a226c8e@nvidia.com>
References: <20181204001720.26138-1-jhubbard@nvidia.com>
 <20181204001720.26138-2-jhubbard@nvidia.com>
 <20181205003648.GT2937@redhat.com>

On 12/4/18 4:40 PM, Dan Williams wrote:
> On Tue, Dec 4, 2018 at 4:37 PM Jerome Glisse wrote:
>>
>> On Tue, Dec 04, 2018 at 03:03:02PM -0800, Dan Williams wrote:
>>> On Tue, Dec 4, 2018 at 1:56 PM John Hubbard wrote:
>>>>
>>>> On 12/4/18 12:28 PM, Dan Williams wrote:
>>>>> On Mon, Dec 3, 2018 at 4:17 PM wrote:
>>>>>>
>>>>>> From: John Hubbard
>>>>>>
>>>>>> Introduces put_user_page(), which simply calls put_page().
>>>>>> This provides a way to update all get_user_pages*() callers,
>>>>>> so that they call put_user_page(), instead of put_page().
>>>>>>
>>>>>> Also introduces put_user_pages(), and a few dirty/locked variations,
>>>>>> as a replacement for release_pages(), and also as a replacement
>>>>>> for open-coded loops that release multiple pages.
>>>>>> These may be used for subsequent performance improvements,
>>>>>> via batching of pages to be released.
>>>>>>
>>>>>> This is the first step of fixing the problem described in [1]. The
>>>>>> steps are:
>>>>>>
>>>>>> 1) (This patch): provide put_user_page*() routines, intended to be
>>>>>> used for releasing pages that were pinned via get_user_pages*().
>>>>>>
>>>>>> 2) Convert all of the call sites for get_user_pages*(), to invoke
>>>>>> put_user_page*(), instead of put_page(). This involves dozens of
>>>>>> call sites, and will take some time.
>>>>>>
>>>>>> 3) After (2) is complete, use get_user_pages*() and put_user_page*()
>>>>>> to implement tracking of these pages. This tracking will be separate
>>>>>> from the existing struct page refcounting.
>>>>>>
>>>>>> 4) Use the tracking and identification of these pages, to implement
>>>>>> special handling (especially in writeback paths) when the pages are
>>>>>> backed by a filesystem. Again, [1] provides details as to why that
>>>>>> is desirable.
>>>>>
>>>>> I thought at Plumbers we talked about using a page bit to tag pages
>>>>> that have had their reference count elevated by get_user_pages()? That
>>>>> way there is no need to distinguish put_page() from put_user_page() it
>>>>> just happens internally to put_page(). At the conference Matthew was
>>>>> offering to free up a page bit for this purpose.
>>>>>
>>>>
>>>> ...but then, upon further discussion in that same session, we realized
>>>> that that doesn't help. You need a reference count. Otherwise a random
>>>> put_page could affect your dma-pinned pages, etc, etc.
>>>
>>> Ok, sorry, I mis-remembered. So, you're effectively trying to capture
>>> the end of the page pin event separate from the final 'put' of the
>>> page? Makes sense.
>>>
>>>> I was not able to actually find any place where a single additional
>>>> page bit would help our situation, which is why this still uses LRU
>>>> fields for both the two bits required (the RFC [1] still applies), and
>>>> the dma_pinned_count.
>>>
>>> Except the LRU fields are already in use for ZONE_DEVICE pages... how
>>> does this proposal interact with those?
>>>
>>>> [1] https://lore.kernel.org/r/20181110085041.10071-7-jhubbard@nvidia.com
>>>>
>>>>>> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
>>>>>>
>>>>>> Reviewed-by: Jan Kara
>>>>>
>>>>> Wish, you could have been there Jan. I'm missing why it's safe to
>>>>> assume that a single put_user_page() is paired with a get_user_page()?
>>>>>
>>>>
>>>> A put_user_page() per page, or a put_user_pages() for an array of
>>>> pages. See patch 0002 for several examples.
>>>
>>> Yes, however I was more concerned about validation and trying to
>>> locate missed places where put_page() is used instead of
>>> put_user_page().
>>>
>>> It would be interesting to see if we could have a debug mode where
>>> get_user_pages() returned dynamically allocated pages from a known
>>> address range and catch drivers that operate on a user-pinned page
>>> without using the proper helper to 'put' it. I think we might also
>>> need a ref_user_page() for drivers that may do their own get_page()
>>> and expect the dma_pinned_count to also increase.

Good idea about a new ref_user_page() call. It's going to be hard to find
those places at all of the call sites, btw.

>>
>> Total crazy idea for this, but this is the right time of day
>> for this (for me at least it is beer time :)) What about mapping
>> all struct page in two different range of kernel virtual address
>> and when get user space is use it returns a pointer from the second
>> range of kernel virtual address to the struct page. Then in put_page
>> you know for sure if the code putting the page got it from GUP or
>> from somewhere else.
>> page_to_pfn() would need some trickery to handle that.
>
> Yes, exactly what I was thinking, if only as a debug mode since
> instrumenting every pfn/page translation would be expensive.
>

That does sound viable as a debug mode. I'll try it out. A reliable way
(in both directions) of sorting out put_page() vs. put_user_page() would
be a huge improvement, even if just in debug mode.

>> Dunno if we are running out of kernel virtual address (outside
>> 32bits that i believe we are trying to shot down quietly behind
>> the bar).
>
> There's room, KASAN is in a roughly similar place.
>

Looks like I'd better post a new version of the entire RFC, rather than
just these two patches. It's still less fully-baked than I'd hoped. :)

thanks,
-- 
John Hubbard
NVIDIA