From: Dan Williams
Date: Mon, 18 Jun 2018 12:21:46 -0700
Subject: Re: [PATCH 2/2] mm: set PG_dma_pinned on get_user_pages*()
To: John Hubbard
Cc: Christoph Hellwig, Jason Gunthorpe, John Hubbard, Matthew Wilcox, Michal Hocko, Christopher Lameter, Jan Kara, Linux MM, LKML, linux-rdma

On Mon, Jun 18, 2018 at 11:14 AM, John Hubbard wrote:
> On 06/18/2018 10:56 AM, Dan Williams wrote:
>> On Mon, Jun 18, 2018 at 10:50 AM, John Hubbard wrote:
>>> On 06/18/2018 01:12 AM, Christoph Hellwig wrote:
>>>> On Sun, Jun 17, 2018 at 01:28:18PM -0700, John Hubbard wrote:
>>>>> Yes. However, my thinking was: get_user_pages() can become a way to indicate that
>>>>> these pages are going to be treated specially.
>>>>> In particular, the caller
>>>>> does not really want or need to support certain file operations, while the
>>>>> page is flagged this way.
>>>>>
>>>>> If necessary, we could add a new API call.
>>>>
>>>> That API call is called get_user_pages_longterm.
>>>
>>> OK...I had the impression that this was just semi-temporary API for dax, but
>>> given that it's an exported symbol, I guess it really is here to stay.
>>
>> The plan is to go back and provide api changes that bypass
>> get_user_pages_longterm() for RDMA. However, for VFIO and others, it's
>> not clear what we could do. In the VFIO case the guest would need to
>> be prepared to handle the revocation.
>
> OK, let's see if I understand that plan correctly:
>
> 1. Change RDMA users (this could be done entirely in the various device drivers'
> code, unless I'm overlooking something) to use mmu notifiers, and to do their
> DMA to/from non-pinned pages.

The problem with this approach is that it surprises the RDMA drivers
with notifications of teardowns. It's the RDMA userspace applications
that need the notification, and it likely needs to be an explicit
opt-in, at least for the non-ODP drivers.

> 2. Return early from get_user_pages_longterm, if the memory is...marked for
> RDMA? (How? Same sort of page flag that I'm floating here, or something else?)
> That would avoid the problem with pinned pages getting their buffer heads
> removed--by disallowing the pinning. Makes sense.

Well, right now the RDMA workaround is DAX-specific, and it seems we
need to generalize it for the page-cache case. One thought is to have
try_to_unmap() take its own reference and wait for the page reference
count to drop to one, so that the truncate path knows the page is
dma-idle and disconnected from the page cache, but I have not looked
at the details.

> Also, is there anything I can help with here, so that things can happen sooner?

I do think we should explore a page flag for pages that are "long
term" pinned.
Michal asked for something along these lines at LSF/MM, so that the
core-mm can give up on pages over which the kernel has lost lifetime
control. Michal, did I capture your ask correctly?
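The try_to_unmap() idea above can be pictured with a small userspace model. This is a hedged sketch, not kernel code: struct model_page, model_pin(), model_truncate_wait_idle(), model_demo() and the other names are hypothetical, a C11 atomic counter stands in for the page reference count, and a pthread stands in for a device finishing its DMA. It assumes the page cache holds exactly one reference and each DMA pin holds one more, so once the truncate side holds its own reference and the page-cache reference is gone, a count of one means no pin remains.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a page-cache page. Assumption: the
 * page cache holds one reference and every DMA pin holds one more. */
struct model_page {
	atomic_int refcount;
	bool in_page_cache;
};

/* Hypothetical stand-ins for a get_user_pages*() pin and its release. */
static void model_pin(struct model_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

static void model_unpin(struct model_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/* Truncate side: take our own reference, disconnect the page from the
 * page cache (dropping the cache's reference), then wait until our
 * reference is the only one left, i.e. the page is DMA-idle. */
static void model_truncate_wait_idle(struct model_page *p)
{
	atomic_fetch_add(&p->refcount, 1);  /* our own reference */
	p->in_page_cache = false;           /* disconnect from the cache */
	atomic_fetch_sub(&p->refcount, 1);  /* drop the cache's reference */
	while (atomic_load(&p->refcount) > 1)
		sched_yield(); /* a kernel version would presumably sleep, not spin */
}

/* Simulated device: finishes its "DMA" after a while, then unpins. */
static void *model_dma_worker(void *arg)
{
	for (int i = 0; i < 1000; i++)
		sched_yield();
	model_unpin(arg);
	return NULL;
}

/* Runs one pin/truncate race and returns the final reference count,
 * which should be 1: only the truncate side's own reference. */
int model_demo(void)
{
	struct model_page page = { .refcount = 1, .in_page_cache = true };
	pthread_t dma;

	model_pin(&page);
	assert(atomic_load(&page.refcount) == 2);
	if (pthread_create(&dma, NULL, model_dma_worker, &page) != 0)
		return -1;
	model_truncate_wait_idle(&page);
	pthread_join(dma, NULL);
	assert(!page.in_page_cache);
	return atomic_load(&page.refcount);
}
```

Whether the device unpins before or after truncate starts, the wait loop exits only once the pin is gone, which is the "dma-idle" guarantee the truncate path would want.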
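The "long term" pinned page flag can likewise be sketched as a single-threaded userspace model, assuming a pin path that sets a flag and a core-mm path that checks it and gives up. All names here are hypothetical: MODEL_PG_LONGTERM_PINNED only stands in for something like the PG_dma_pinned bit named in this patch's subject, model_pin_longterm() only mimics a get_user_pages_longterm()-style pin, and no locking is attempted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bit standing in for a "long-term pinned" page flag
 * such as PG_dma_pinned; this is not a real kernel bit value. */
#define MODEL_PG_LONGTERM_PINNED (1u << 0)

/* Single-threaded userspace model of a page; no locking is attempted. */
struct flagged_page {
	unsigned int flags;
	int longterm_pins; /* hypothetical count of outstanding long-term pins */
};

/* Hypothetical long-term pin: count it and flag the page. */
void model_pin_longterm(struct flagged_page *p)
{
	p->longterm_pins++;
	p->flags |= MODEL_PG_LONGTERM_PINNED;
}

/* Drop one pin; clear the flag only when the last long-term pin goes away. */
void model_unpin_longterm(struct flagged_page *p)
{
	assert(p->longterm_pins > 0);
	if (--p->longterm_pins == 0)
		p->flags &= ~MODEL_PG_LONGTERM_PINNED;
}

/* Core-mm side: rather than waiting on a page whose lifetime it no longer
 * controls, give up early when the page carries the flag. */
int model_try_reclaim(const struct flagged_page *p)
{
	if (p->flags & MODEL_PG_LONGTERM_PINNED)
		return -EBUSY;
	/* writeback, unmap and freeing would proceed here */
	return 0;
}
```

The counter is there because a single bit cannot tell when the last of several pins has been released; where such a count would live in a real struct page is a question this sketch does not answer.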