Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
To: Dan Williams
CC: Christoph Hellwig, Jason Gunthorpe, John Hubbard, Matthew Wilcox,
 Michal Hocko, Christopher Lameter, Jan Kara, Linux MM, LKML, linux-rdma
References: <20180617012510.20139-1-jhubbard@nvidia.com>
 <20180617012510.20139-3-jhubbard@nvidia.com>
 <20180617200432.krw36wrcwidb25cj@ziepe.ca>
 <311eba48-60f1-b6cc-d001-5cc3ed4d76a9@nvidia.com>
 <20180618081258.GB16991@lst.de>
 <3898ef6b-2fa0-e852-a9ac-d904b47320d5@nvidia.com>
From: John Hubbard
Message-ID: <0e6053b3-b78c-c8be-4fab-e8555810c732@nvidia.com>
Date: Mon, 18 Jun 2018 14:36:44 -0700
X-Mailing-List: linux-kernel@vger.kernel.org

On 06/18/2018 12:21 PM, Dan Williams wrote:
> On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard wrote:
>> On 06/18/2018 10:56 AM, Dan Williams wrote:
>>> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard wrote:
>>>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>>>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>>>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>>>>>> these pages are going to be treated specially. In particular, the caller
>>>>>> does not really want or need to support certain file operations, while the
>>>>>> page is flagged this way.
>>>>>>
>>>>>> If necessary, we could add a new API call.
>>>>>
>>>>> That API call is called get_user_pages_longterm.
>>>>
>>>> OK...I had the impression that this was just a semi-temporary API for dax, but
>>>> given that it's an exported symbol, I guess it really is here to stay.
>>>
>>> The plan is to go back and provide API changes that bypass
>>> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>>> not clear what we could do. In the VFIO case the guest would need to
>>> be prepared to handle the revocation.
>>
>> OK, let's see if I understand that plan correctly:
>>
>> 1. Change RDMA users (this could be done entirely in the various device drivers'
>> code, unless I'm overlooking something) to use mmu notifiers, and to do their
>> DMA to/from non-pinned pages.
>
> The problem with this approach is surprising the RDMA drivers with
> notifications of teardowns. It's the RDMA userspace applications that
> need the notification, and it likely needs to be explicit opt-in, at
> least for the non-ODP drivers.
>
>> 2. Return early from get_user_pages_longterm, if the memory is...marked for
>> RDMA? (How? Same sort of page flag that I'm floating here, or something else?)
>> That would avoid the problem with pinned pages getting their buffer heads
>> removed--by disallowing the pinning. Makes sense.
>
> Well, right now the RDMA workaround is DAX specific and it seems we
> need to generalize it for the page-cache case. One thought is to have
> try_to_unmap() take its own reference and wait for the page reference
> count to drop to one so that the truncate path knows the page is
> dma-idle and disconnected from the page cache, but I have not looked
> at the details.
>
>> Also, is there anything I can help with here, so that things can happen sooner?
>
> I do think we should explore a page flag for pages that are "long
> term" pinned. Michal asked for something along these lines at LSF / MM
> so that the core-mm can give up on pages that the kernel has lost
> lifetime control of. Michal, did I capture your ask correctly?

OK, that "refcount == 1" approach sounds promising:

-- still use a page flag, but narrow the scope to get_user_pages_longterm() pages
-- just wait in try_to_unmap, instead of giving up

I'll look into it, while waiting for Michal's thoughts on this.
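
To make the "refcount == 1" idea concrete, here is a rough user-space sketch of the two pieces discussed above: a flag set only for long-term pins, and an unmap-side wait that treats a reference count of one as "dma-idle". This is a toy model, not kernel code. struct toy_page and every toy_* function are hypothetical names invented for illustration, the model assumes at most one long-term pinner per page (a single flag bit cannot count several), and the bounded polling loop only stands in for whatever sleeping wait a real implementation might use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel's definition. */
struct toy_page {
	atomic_int refcount;         /* models the page reference count */
	atomic_bool longterm_pinned; /* hypothetical "long term pinned" flag */
};

/* Start with one reference: the one the unmap path would hold itself. */
static void toy_page_init(struct toy_page *p)
{
	atomic_init(&p->refcount, 1);
	atomic_init(&p->longterm_pinned, false);
}

/* Models a get_user_pages_longterm()-style pin: take a ref, set the flag. */
static void toy_pin_longterm(struct toy_page *p)
{
	/* Assumption: only one long-term pinner per page in this model. */
	assert(!atomic_load(&p->longterm_pinned));
	atomic_fetch_add(&p->refcount, 1);
	atomic_store(&p->longterm_pinned, true);
}

/* Models the matching release: clear the flag, then drop the ref. */
static void toy_unpin_longterm(struct toy_page *p)
{
	assert(atomic_load(&p->longterm_pinned));
	atomic_store(&p->longterm_pinned, false);
	int old = atomic_fetch_sub(&p->refcount, 1);
	assert(old > 1); /* the unmap path's own reference must remain */
	(void)old;
}

/*
 * Models the wait in the unmap path: poll until only our own reference
 * is left. Returns true if the page became dma-idle within max_polls.
 */
static bool toy_wait_dma_idle(struct toy_page *p, unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (atomic_load(&p->refcount) == 1)
			return true;
	}
	return false;
}
```

In this model, toy_wait_dma_idle() returns false while a long-term pin is outstanding and true once toy_unpin_longterm() has run. The flag does not take part in the wait itself; it only marks which pages were pinned through the long-term path, which is the narrowed scope proposed above.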